Optimal Design of Control Systems: Stochastic and Deterministic Problems (Pure and Applied Mathematics: A Series of Monographs and Textbooks 221)

OPTIMAL DESIGN OF CONTROL SYSTEMS PURE AND APPLIED MATHEMATICS A Program of Monographs, Textbooks, and Lecture Notes ...

Author: Gennadii E. Kolosov

27 downloads 575 Views 15MB Size Report

This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!

Report copyright / DMCA form

DOWNLOAD PDF

OPTIMAL DESIGN OF CONTROL SYSTEMS

PURE AND APPLIED MATHEMATICS A Program of Monographs, Textbooks, and Lecture Notes

EXECUTIVE EDITORS Earl J. Taft Rutgers University New Brunswick, New Jersey

Zuhair Nashed University of Delaware Newark, Delaware

EDITORIAL BOARD M. S. Baouendi University of California, San Diego Jane Cronin Rutgers University Jack K. Hale Georgia Institute of Technology

Anil Nerode Cornell University Donald Passman University of Wisconsin, Madison Fred S. Roberts Rutgers University

S. Kobayashi University of California, Berkeley

Gian-Carlo Rota Massachusetts Institute of Technology

Marvin Marcus University of California, Santa Barbara

David L. Russell Virginia Polytechnic Institute and State University

W. S. Massey Yale University

Walter Schempp Universitat Siegen

Mark Teply University of Wisconsin, Milwaukee

MONOGRAPHS AND TEXTBOOKS IN PURE AND APPLIED MATHEMATICS 1. K. Yano, Integral Formulas in Riemannian Geometry (1970) 2. S. Kobayashi, Hyperbolic Manifolds and Holomorphic Mappings (1970) 3. V. S. Vladimirov, Equations of Mathematical Physics (A. Jeffrey, ed.; A. Littlewood, trans.) (1970) 4. B. N. Pshenichnyi, Necessary Conditions for an Extremum (L. Neustadt, translation ed.; K. Makowski, trans.) (1971) 5. L. Narici et a/., Functional Analysis and Valuation Theory (1971) 6. S. S. Passman, Infinite Group Rings (1971) 7. L. Domhoff, Group Representation Theory. Part A: Ordinary Representation Theory. Part B: Modular Representation Theory (1971,1972) 8. W. Boothbyand G. L. Weiss, eds., Symmetric Spaces (1972) 9. Y. Matsushima, Differentiate Manifolds (E. T. Kobayashi, trans.) (1972) 10. L. E. Ward, Jr., Topology (1972) 11. A. Babakhanian, Cohomological Methods in Group Theory (1972) 12. R. Gilmer, Multiplicative Ideal Theory (1972) 13. J. Ye/7, Stochastic Processes and the Wiener Integral (1973) 14. J. Barros-Neto, Introduction to the Theory of Distributions (1973) 15. R Larsen, Functional Analysis (1973) 16. K. Yano and S. Ishihara, Tangent and Cotangent Bundles (1973) 17. C. Procesi, Rings with Polynomial Identities (1973) 18. R. Hermann, Geometry, Physics, and Systems (1973) 19. N. R. Wallach, Harmonic Analysis on Homogeneous Spaces (1973) 20. J. Dieudonne, Introduction to the Theory of Formal Groups (1973) 21. /. Vaisman, Cohomology and Differential Forms (1973) 22. B.-Y. Chen, Geometry of Submanifolds (1973) 23. M. Marcus, Finite Dimensional Multilinear Algebra (in two parts) (1973,1975) 24. R. Larsen, Banach Algebras (1973) 25. R. O. Kujala and A. L Vitter, eds., Value Distribution Theory: Part A; Part B: Deficit and Bezout Estimates by Wilhelm Stall (1973) 26. K. B. Stolarsky, Algebraic Numbers and Diophantine Approximation (1974) 27. A. R. Magid, The Separable Galois Theory of Commutative Rings (1974) 28. B. R. McDonald, Finite Rings with Identity (1974) 29. J. Satake, Linear Algebra (S. Koh et al., trans.) (1975) 30. J. S. Go/an, Localization of Noncommutative Rings (1975) 31. G. Klambauer, Mathematical Analysis (1975) 32. M. K. Agoston, Algebraic Topology (1976) 33. K. R. Goodearl, Ring Theory (1976) 34. L. E. Mansfield, Linear Algebra with Geometric Applications (1976) 35. N. J. Pullman, Matrix Theory and Its Applications (1976) 36. B. R. McDonald, Geometric Algebra Over Local Rings (1976) 37. C. W. Groetsch, Generalized Inverses of Linear Operators (1977) 38. J. E. Kuczkowski and J. L. Gersting, Abstract Algebra (1977) 39. C. O. Christenson and W. L. Voxman, Aspects of Topology (1977) 40. M. Nagata, Field Theory (1977) 41. R. L. Long, Algebraic Number Theory (1977) 42. W. F. Pfeffer, Integrals and Measures (1977) 43. R. L Wheeden and A. Zygmund, Measure and Integral (1977) 44. J. H. Curtiss, Introduction to Functions of a Complex Variable (1978) 45. K. Hrbacek and T. Jech, Introduction to Set Theory (1978) 46. W. S. Massey, Homology and Cohomology Theory (1978) 47. M. Marcus, Introduction to Modern Algebra (1978) 48. £ C. Young, Vector and Tensor Analysis (1978) 49. S. B. Nadler, Jr., Hyperspaces of Sets (1978) 50. S. K. Sega/, Topics in Group Kings (1978) 51. A. C. M. van Rooij, Non-Archimedean Functional Analysis (1978) 52. L. Corwin and R. Szczarba, Calculus in Vector Spaces (1979) 53. C. Sadosky, Interpolation of Operators and Singular Integrals (1979) 54. J. Cronin, Differential Equations (1980) 55. C. W. Groetsch, Elements of Applicable Functional Analysis (1980)

56. /. Vaisman, Foundations of Three-Dimensional Euclidean Geometry (1980) 57. H. /, Freedan, Deterministic Mathematical Models in Population Ecology (1980) 58. S. B. Chae, Lebesgue Integration (1980)

59. 60. 61. 62.

C. S. Rees ef a/., Theory and Applications of Fourier Analysis (1981) L Nachbin, Introduction to Functional Analysis (R. M. Aron, trans.) (1981) G. Orzech and M. Orzech, Plane Algebraic Curves (1981) R. Johnsonbaugh and W. E. Pfaffenberger, Foundations of Mathematical Analysis

(1981) 63. W. L. Voxman and R. H. Goetschel, Advanced Calculus (1981) 64. L J. Corwin and R. H. Szczarba, Multivariable Calculus (1982) 65. V. /. Istratescu, Introduction to Linear Operator Theory (1981)

66. R. D. Jarvinen, Finite and Infinite Dimensional Linear Spaces (1981) 67. J. K. Been? and P. E. Ehriich, Global Lorentzian Geometry (1981)

68. 69. 70. 71.

D. L Armacost, The Structure of Locally Compact Abelian Groups (1981) J. W. Brewer and M. K. Smith, eds., Emmy Noether: A Tribute (1981) K. H. Kim, Boolean Matrix Theory and Applications (1982) T. W. Wieting, The Mathematical Theory of Chromatic Plane Ornaments (1982)

72. D. B.Gauld, Differential Topology (1982)

73. R. L. Faber, Foundations of Euclidean and Non-Euclidean Geometry (1983) 74. M. Carmeli, Statistical Theory and Random Matrices (1983) 75. J. H. Carruth et a/., The Theory of Topological Semigroups (1983) 76. R. L. Faber, Differential Geometry and Relativity Theory (1983)

77. S. Bamett, Polynomials and Linear Control Systems (1983) 78. G. Karpilovsky, Commutative Group Algebras (1983) 79. F. Van Oystaeyen and A. Verschoren, Relative Invariants of Rings (1983)

80. /. Vaisman, A First Course in Differential Geometry (1984) 81. G. W. Swan, Applications of Optimal Control Theory in Biomedicine (1984) 82. T. Petrie andJ. D. Randall, Transformation Groups on Manifolds (1984) 83. K. Goebel and S. Reich, Uniform Convexity, Hyperbolic Geometry, and Nonexpansive Mappings (1984) 84. T. Albu and C. Nastasescu, Relative Finiteness in Module Theory (1984) 85. K. Hrbacek and T. Jech, Introduction to Set Theory: Second Edition (1984) 86. F. Van Oystaeyen and A. Verschoren, Relative Invariants of Rings (1984) 87. B. R. McDonald, Linear Algebra Over Commutative Rings (1984)

88. M. Namba, Geometry of Projective Algebraic Curves (1984) 89. G. F. Webb, Theory of Nonlinear Age-Dependent Population Dynamics (1985) 90. M. R. Bremner et ai, Tables of Dominant Weight Multiplicities for Representations of

Simple Lie Algebras (1985) 91. A. E. Fekete, Real Linear Algebra (1985) 92. S. B. Chae, Holomorphy and Calculus in Normed Spaces (1985) 93. 94. 95. 96. 97.

A. J. Jem, Introduction to Integral Equations with Applications (1985) G. Karpilovsky, Projective Representations of Finite Groups (1985) L. Narici and E. Beckenstein, Topological Vector Spaces (1985) J. Weeks, The Shape of Space (1985) P. R. Gribik and K. O. Kortanek, Extremal Methods of Operations Research (1985)

98. J.-A. Chao and W. A. Woyczynsk!, eds., Probability Theory and Harmonic Analysis (1986)

99. 100. 101. 102.

G. D. Crown et a/.. Abstract Algebra (1986) J. H. Carruth et a/., The Theory of Topological Semigroups, Volume 2 (1986) R. S. Dora/7 and V. A. Belfi, Characterizations of C*-Algebras (1986) M. W. Jeter, Mathematical Programming (1986)

103. M. Altman, A Unified Theory of Nonlinear Operator and Evolution Equations with Applications (1986)

104. A. Verschoren, Relative Invariants of Sheaves (1987) 105. R. A. Usmani, Applied Linear Algebra (1987)

106. P. Blass and J. Lang, Zariski Surfaces and Differential Equations in Characteristic p > 0(1987)

107. J. A. Reneke et ai, Structured Hereditary Systems (1987) 108. H. Busemann and B. B. Phadke, Spaces with Distinguished Geodesies (1987) 109. R. Harte, Invertibility and Singularity for Bounded Linear Operators (1988) 110. G. S. Ladde et at., Oscillation Theory of Differential Equations with Deviating Arguments (1987) 111. L. Dudkin et a/., Iterative Aggregation Theory (1987) 112. T. Okubo, Differential Geometry (1987) 113. D. L. Stand and M. L. Stand, Real Analysis with Point-Set Topology (1987)

114. T. C. Card, Introduction to Stochastic Differential Equations (1988) 115. S. S. Abhyankar, Enumerative Combinatorics of Young Tableaux (1988) 116. H. Sfracfe and R. Famsteiner, Modular Lie Algebras and Their Representations (1988) 117. J. A. Huckaba, Commutative Rings with Zero Divisors (1988)

118. W. D. Wallis, Combinatorial Designs (1988) 119. W. W;{?s/aw, Topological Fields (1988) 120. G. Karpilovsky, Field Theory (1988)

121. S. Caenepeel and F. Van Oystaeyen, Brauer Groups and the Cohomology of Graded Rings (1989) 122. W. Kozlowski, Modular Function Spaces (1988) 123. E. Lowen-Colebunders, Function Classes of Cauchy Continuous Maps (1989)

124. M. Pave/, Fundamentals of Pattern Recognition (1989) 125. V. Lakshmikantham et a/., Stability Analysis of Nonlinear Systems (1989) 126. R. Sivaramakrishnan, The Classical Theory of Arithmetic Functions (1989) 127. N. A. Watson, Parabolic Equations on an Infinite Strip (1989)

128. K. J. Hastings, Introduction to the Mathematics of Operations Research (1989) 129. B. Fine, Algebraic Theory of the Bianchi Groups (1989)

130. D. N. Dikranjan et a/., Topological Groups (1989) 131. J.C. Morgan II, Point Set Theory (1990) 132. P. Biter and A. Witkowski, Problems in Mathematical Analysis (1990)

133. H. J. Sussmann, Nonlinear Controllability and Optimal Control (1990) 134. J.-P. Florens et a/., Elements of Bayesian Statistics (1990) 135. N. Shell, Topological Fields and Near Valuations (1990)

136. B. F. Doolin and C. F. Martin, Introduction to Differential Geometry for Engineers 137. 138. 139. 140. 141. 142. 143. 144. 145. 146.

(1990) S. S. Holland, Jr., Applied Analysis by the Hilbert Space Method (1990) J. Okninski, Semigroup Algebras (1990) K. Zhu, Operator Theory in Function Spaces (1990) G. B. Price, An Introduction to Multicomplex Spaces and Functions (1991) R. B. Darst, Introduction to Linear Programming (1991) P. L Sachdev, Nonlinear Ordinary Differential Equations and Their Applications (1991) T. Husain, Orthogonal Schauder Bases (1991) J. Foran, Fundamentals of Real Analysis (1991) W. C. Brown, Matrices and Vector Spaces (1991) M. M. RaoandZ. D. Ron, Theory of Oriicz Spaces (1991)

147. J. S. Go/an and T. Head, Modules and the Structures of Rings (1991) 148. 149. 150. 151. 152.

C. Small, Arithmetic of Finite Fields (1991) K. Yang, Complex Algebraic Geometry (1991) D. G. Hoffman et a/., Coding Theory (1991) M. O. Gonzalez, Classical Complex Analysis (1992) M. O. Gonzalez, Complex Analysis (1992)

153. L W. Baggett, Functional Analysis (1992) 154. M. Sniedovich, Dynamic Programming (1992) 155. R. P. Agarwal, Difference Equations and Inequalities (1992) 156. 157. 158. 159.

C. Brezinski, Biorthogonality and Its Applications to Numerical Analysis (1992) C. Swartz, An Introduction to Functional Analysis (1992) S. 8. Nadler, Jr., Continuum Theory (1992) M. A. AI-Gwaiz, Theory of Distributions (1992)

160. E. Perry, Geometry: Axiomatic Developments with Problem Solving (1992) 161. E. Castillo and M. R. Ruiz-Cobo, Functional Equations and Modelling in Science and Engineering (1992)

162. A. J. Jerri, Integral and Discrete Transforms with Applications and Error Analysis (1992) 163. A. Charlieret a/., Tensors and the Clifford Algebra (1992)

164. 165. 166. 167. 168. 169. 170.

P. Bilerand T. Nadzieja, Problems and Examples in Differential Equations (1992) E. Hansen, Global Optimization Using Interval Analysis (1992) S. Guerre-Delabriere, Classical Sequences in Banach Spaces (1992) Y. C. Wong, Introductory Theory of Topological Vector Spaces (1992) S. H. Kulkami and B. V. Limaye, Real Function Algebras (1992) W. C. Brown, Matrices Over Commutative Rings (1993) J. LoustauandM. Dillon, Linear Geometry with Computer Graphics (1993)

171. W. V. Petryshyn, Approximation-Solvability of Nonlinear Functional and Differential Equations (1993) 172. E. C. Young, Vector and Tensor Analysis: Second Edition (1993) 173. T. A. Bick, Elementary Boundary Value Problems (1993)

174. M. Pave/, Fundamentals of Pattern Recognition: Second Edition (1993) 175. S. A. Albeverio et al., Noncommutative Distributions (1993) 176. W. Fulks, Complex Variables (1993)

177. M. M. Rao, Conditional Measures and Applications (1993) 178. A. Janicki and A. Weron, Simulation and Chaotic Behavior of a-Stable Stochastic Processes (1994) 179. P. Neittaanmaki and D. Tiba, Optimal Control of Nonlinear Parabolic Systems (1994) 180. J. Cronin, Differential Equations: Introduction and Qualitative Theory, Second Edition (1994)

181. S. HeikkilS and V. Lakshmikantham, Monotone Iterative Techniques for Discontinuous Nonlinear Differential Equations (1994) 182. X. Mao, Exponential Stability of Stochastic Differential Equations (1994)

183. B. S. Thomson, Symmetric Properties of Real Functions (1994) 184. J. E. Rub/o, Optimization and Nonstandard Analysis (1994) 185. J. L Bueso et al., Compatibility, Stability, and Sheaves (1995) 186. A. N. Michel and K. Wang, Qualitative Theory of Dynamical Systems (1995) 187. M. R. Darnel, Theory of Lattice-Ordered Groups (1995)

188. Z. Naniewicz and P. D. Panagiotopoulos, Mathematical Theory of Hemivariational Inequalities and Applications (1995) 189. L. J. Co/w/n and R. H. Szczarta, Calculus in Vector Spaces: Second Edition (1995) 190. L. H. Erbe et al., Oscillation Theory for Functional Differential Equations (1995)

191. S. Agaian etal.. Binary Polynomial Transforms and Nonlinear Digital Filters (1995) 192. 193. 194. 195. 196.

M, I. Gil', Norm Estimations for Operation-Valued Functions and Applications (1995) P. A. Grillet, Semigroups: An Introduction to the Structure Theory (1995) S. Kichenassamy, Nonlinear Wave Equations (1996) V. F. Krotov, Global Methods in Optimal Control Theory (1996) K. /. Beidaret al.. Rings with Generalized Identities (1996)

197. V. I. Amautov et al., Introduction to the Theory of Topological Rings and Modules 198. 199. 200. 201. 202. 203. 204.

(1996) G. S/erfcsma, Linear and Integer Programming (1996) R. Lasser, Introduction to Fourier Series (1996) V. Sima, Algorithms for Linear-Quadratic Optimization (1996) D. Redmond, Number Theory (1996) J. K. Beem et al., Global Lorentzian Geometry: Second Edition (1996) M. Fontana et al., Priifer Domains (1997) H. Tanabe, Functional Analytic Methods for Partial Differential Equations (1997)

205. C. Q. Zhang, Integer Flows and Cycle Covers of Graphs (1997) 206. E. Spiegel and C. J. O'Donnell, Incidence Algebras (1997)

207. B. Jakubczyk and W. Respondek, Geometry of Feedback and Optimal Control (1998) 208. T. W. Haynes et al., Fundamentals of Domination in Graphs (1998) 209. T. W. Haynes et al., Domination in Graphs: Advanced Topics (1998) 210. L. A. D'Alotto et al., A Unified Signal Algebra Approach to Two-Dimensional Parallel

Digital Signal Processing (1998) 211. F. Halter-Koch, Ideal Systems (1998) 212. N. K. Goviletal., Approximation Theory (1998) 213. R. Cross, Multivalued Linear Operators (1998) 214. A. A. Martynyuk, Stability by Liapunov's Matrix Function Method with Applications (1998) 215. A. FaviniandA. Yagi, Degenerate Differential Equations in Banach Spaces (1999)

216. A. Illanes and S. Nadler, Jr., Hyperspaces: Fundamentals and Recent Advances (1999)

217. G. Kato and D. Struppa, Fundamentals of Algebraic Microlocal Analysis (1999) 218. G. X.-Z. Yuan, KKM Theory and Applications in Nonlinear Analysis (1999)

219. D. Motreanu and N. H. Pavel, Tangency, Flow Invariance for Differential Equations, and Optimization Problems (1999) 220. K. Hrbacek and T. Jech, Introduction to Set Theory, Third Edition (1999) 221. G. £ Kolosov, Optimal Design of Control Systems (1999) 222. A. I. Prilepko et al., Methods for Solving Inverse Problems in Mathematical Physics (1999)

Additional Volumes in Preparation

OPTIMAL DESIGN OF CONTROL SYSTEMS Stochastic and Deterministic Problems

G. E. Kolosov Moscow University of Electronics and Mathematics Moscow, Russia

MARCEL

MARCEL DEKKER, INC.

NEW YORK • BASEL

Library of Congress Cataloging-in-Publication Data Kolosov, G. E. (Gennadil Evgen'evich) Optimal design of control systems: stochastic and deterministic problems / G. E.

Kolosov. p. cm.— (Monographs and textbooks in pure and applied mathematics; 221) Includes bibliographical references and index.

ISDN 0-8247-7537-6 (alk. paper) 1. Control theory. 2. Mathematical optimization. I. Title. II. Scries.

QA402.3.K577 1999 629.8312—dc21

99-30940 CIP

This book is printed on acid-free paper. Headquarters

Marcel Dekker, Inc. 270 Madison Avenue, New York, NY 10016 tel: 212-696-9000; fax: 212-685-4540 Eastern Hemisphere Distribution Marcel Dekker AG Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland tel: 41-61-261-8482; fax: 41-61-261-8896

World Wide Web http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above. Copyright © 1999 by Marcel Dekker, Inc. AH Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Current printing (last digit): 10 9 8 7 6 5 4 3 2 1 PRINTED IN THE UNITED STATES OF AMERICA

PREFACE

The rise of optimal control theory is a remarkable example of interaction between practical needs and mathematical theories. Indeed, in the middle of this century the development of various automatic control systems in technology and of systems for control of motion of mechanical objects (in particular, of flying objects such as airplanes and rockets) gave rise to specific mathematical problems concerned with finding the conditional extremum of functions or functionals, which could not be solved by means of the methods of classical mathematical analysis and the calculus of variations. Extreme urgency of these problems for practical needs stimulated the efforts of mathematicians to develop methods for solving these new problems. At the end of the fifties and at the beginning of the sixties, these efforts were crowned with success when new mathematical approaches such as Pontryagin's maximum principle, Bellman's dynamic programming, and linear and convex programming (developed somewhat earlier by L. Kantorovich, G. Dantzig, and others) were established. These new approaches greatly affected the research carried out in control theory at that time. It should be noted that these approaches have played a very important role in the process of formation of optimal control theory as an independent branch of science. One can say that the role of the maximum principle and dynamic programming in the theory of optimal control is as significant as that of Maxwell's equations in electromagnetic theory in physics. Optimal control theory evolved most intensively at the end of the sixties and during the seventies. This period showed a very high degree of cooperation and interaction between mathematicians and all those dealing with applications of control theory in technology, mechanics, physics, chemistry, biology, etc. Later on, a gap between the purely mathematical and the practical approach to solving applied problems of optimal control began to emerge and is now apparent. Although the appearance of this gap can be explained by quite natural reasons, nevertheless, the further growth of this trend seems to be undesirable. The author hopes that this book will to some extent reduce the gap between these two branches of research. iii

IV

Preface

This book is primarily intended for specialists dealing with applications of control theory. It is well known that the use of such approaches as, say, the maximum principle or dynamic programming often leads to optimal control algorithms whose implementation for actual real-time plants encounters great (sometimes insurmountable) difficulties. This is the reason

that for solving control problems in practice one often employs methods based on various simplifications and heuristic concepts.

Naturally, this

results in losses in optimality but makes it possible to obtain control algorithms that allow simple technological implementations. In some cases the use of simplifications and heuristic concepts can also result in significant deviations of the system performance index from its optimal value (Chapter VI). In this book we describe ways for constructing simply realizable algorithms of optimal (suboptimal) control, which are based on the dynamic

programming approach. These algorithms are derived on the basis of exact, approximate analytical, or numerical solutions of differential and functional Bellman equations corresponding to the control problems considered.

The book contains an introduction and seven chapters. Chapter I deals with some general concepts of control theory and the description of mathematical approaches to solving problems of optimal control. We consider both deterministic and stochastic models of controlled systems and discuss

the distinguishing features of stochastic models, which arise due to possible ambiguous interpretation of solutions to stochastic differential equations describing controlled systems with white noise disturbances. We define the synthesis problem as the principal problem of optimal control theory and give a general scheme of the dynamic programming approach. The Bellman equations for deterministic and stochastic control problems (for Markov models and stochastic models with indirect observations) are studied. For problems with infinite horizon we introduce the

concept of stationary operating conditions, which is widely used in further chapters of the book. Exact methods of synthesis are considered in Chapter II. We describe the exceptional cases in which the Bellman equations have exact solutions, and

hence the optimal control algorithms can be obtained in explicit analytical forms.

First (in §2.1), we briefly discuss some well-known results concerned with solution of the so-called LQ-problems. Next, in §§2.2-2.4, we write exact solutions for three specific problems of optimal control with bounded control actions. We consider deterministic and stochastic problems of control of

the population size and the problem of constructing an optimal servomechanism. In these systems, the optimal controllers are of the "bang-bang"

form, and the switch point coordinates are given by finite formulas.

Preface

v

The following four chapters are devoted to the description of approximate methods for synthesis. In this case, the design of suboptimal control systems is based, as a rule, on using the approximate solutions of the corresponding Bellman equations. To obtain these approximate solutions, we mainly use various versions of small parameter methods or successive approximation procedures. In Chapter III we study weakly controlled systems. We consider control problems with bounded controls and assume that the values of admissible control actions are small. This stipulates the appearance of a small parameter in the nonlinear term in the Bellman equation. This, in turn, makes it possible to propose a natural successive approximation procedure for solving the Bellman equation, and thus the synthesis problem, approximately. This procedure is a modification of the well-known Picard and Bellman procedures which provide a way for obtaining approximate solutions of nonlinear differential equations by solving a sequence of linear equations. Chapter III is organized as follows. First (in §3.1), we describe the general scheme of approximate synthesis for controlled systems under stationary operating conditions. Next (in §3.2), by using this general scheme, we calculate a suboptimal controller for an oscillatory system with one degree of freedom. Later (in §3.3 and §3.4), we generalize our approach to nonstationary problems and to the case of correlated disturbances; then we estimate the error obtained. In §3.5 we prove that the successive approximation procedure in question converges asymptotically. Finally (in §3.6), we apply this approach to an approximate design of a stochastic system with distributed parameters. Chapter IV is about stochastic controlled systems with noises of small intensities. In this case, the diffusion terms in the Bellman equation contain small coefficients. Under certain assumptions this allows us to replace the initial stochastic problem by a sequence of auxiliary deterministic problems of optimal control whose solutions (i) can be calculated more easily and (ii) give a way for designing suboptimal control systems (with respect to the initial stochastic problem). This approach is used for calculating suboptimal controllers for two specific servomechanisms. In Chapter V we consider a class of controlled systems whose dynamics are quasiharmonic. The trajectories of such systems are close to harmonic oscillations, and this is the reason that the well-developed techniques of the theory of nonlinear oscillations can be effectively applied for studying these systems. By using polar coordinates as the phase variables, we describe the system state in terms of slowly changing amplitude and phase. The presence of a small parameter on the right-hand sides of the differential equations for these variables allows us to elaborate different versions of approximate solutions for the various problems of optimal control. These

vi

Preface

solutions are based on the use of appropriate asymptotic expansions of the performance index, the optimal control algorithm, etc. in powers of the small parameter. We illustrate these techniques by solving four specific problems of optimal damping of deterministic and stochastic oscillations in a biological predator-prey system and in a mechanical system with oscillatory dynamics. In Chapter VI we discuss some special asymptotic methods of synthesis which do not belong to the classes of control problems studied in Chapters III-V. We consider the problems of control of plants with unknown parameters (the adaptive control problems), in which the a priori uncertainty of their values is small. In addition, we study stochastic control problems with bounded phase variables and a problem of optimal control of the population size whose behavior is governed by a stochastic logistic equation with a large value of the medium capacity. We use small parameter approaches for solving the problems mentioned above. For the construction of suboptimal controls, we employ the asymptotic series expansions for the loss functions and the optimal control algorithms. The error obtained is estimated. Numerical methods of synthesis are covered in the final Chapter VII. We discuss the problem of the assignment of boundary conditions to grid

functions and propose some different schemes for solving specific problems of optimal control. The numerical methods proposed are used for solving specific synthesis problems. The presentation of all the approaches studied in the book is accompanied by numerous examples of actual control problems. All calculations are carried out up to the accuracy level sufficient for comparatively simple implementation of the optimal (suboptimal) algorithms obtained in actual devices. In many cases, the algorithms are presented in the form of analogous circuits or flow charts. The book can be helpful to students, postgraduate students, and specialists working in the field of automatic control and applied mathematics. The book may be of interest to mechanical and electrical engineers, physicists and biologists. Only knowledge of the foundations of probability theory is required for assimilating the subject matter of the book. The reader should be acquainted with basic notions of probability theory such as random events and random variables, the probability distribution function and the probability density of random variables, the mean value of a random variable, inconsistent and independent random events and variables, etc. It is not compulsory to know the foundations of the theory of random processes, since Chapter I provides all necessary facts about the methods for describing random processes that are encountered further in

Preface

vii

the book. This makes the book accessible to a wide circle of students and specialists who are interested in applications of optimal control theory. The author's intention to write this book was supported by R. L. Stratonovich, who was the supervisor of the author's Ph.D thesis and for many

years till his sudden death in 1997 remained the author's friend. The author wishes to express his deep gratitude to V. B. Kolmanovskii,

R. S. Liptser, and all participants of the seminar "Stability and Control" at the Moscow University of Electronics and Mathematics for useful remarks and advice concerning the contents of this book. The author's special thanks go to M. A. Shishkova for translating the manuscript into English and keyboarding.

G. E. Kolosov

CONTENTS

Preface

v

Introduction

1

Chapter I. Synthesis Problems for Control Systems and the Dynamic Programming Approach 1.1. Statement of synthesis problems for optimal control systems 1.2. Differential equations for controlled systems with random functions 1.3. Deterministic control problems. Formal scheme of the dynamic programming approach 1.4. The Bellman equations for Markov controlled processes 1.5. Sufficient coordinates in control problems with indirect observations Chapter II. lems

7 7

32 48 57

75

Exact Methods for Synthesis Prob-

2.1. Linear-quadratic problems of optimal control (LQproblems) 2.2. Problem of optimal tracking a wandering coordinate 2.3. Optimal control of the population size 2.4. Stochastic problem of optimal fisheries management Chapter III. Approximate Synthesis of Stochastic Control Systems With Small Control Actions 3.1. Approximate solution of stationary synthesis problems 3.2. Calculation of a quasioptimal regulator for the oscillatory plant

93

93 103 123 133

141

144 154 IX

Contents 3.3. Synthesis of quasioptimal controls in the case of correlated noises 3.4. Nonstationary problems. Estimates of the quality of approximate synthesis 3.5. Analysis of the asymptotic convergence of successive

approximations (3.0.6)-(3.0.8) as k —>• oo 3.6. Approximate synthesis of some stochastic systems with distributed parameters Chapter IV.

164 175

188 199

Synthesis of Quasioptimal Systems

in the Case of Small Diffusion Terms in the

Bellman Equation 4.1. Approximate synthesis of a servomechanism with small-intensity noise

4.2. Calculation of a quasioptimal system for tracking a discrete Markov process

219 221

233

Chapter V. Control of Oscillatory Systems 5.1. Optimal control of a quasiharmonic oscillator. An

247

asymptotic synthesis method 5.2. Control of the "predator-prey" system. The case of a poorly adapted predator 5.3. Optimal damping of random oscillations 5.4. Optimal control of quasiharmonic systems with noise in the feedback circuit

248

Chapter VI.

267 276 298

Some Special Applications of

Asymptotic Synthesis Methods 6.1. Adaptive problems of optimal control 6.2. Some stochastic control problems with constrained

311 312

phase coordinates 6.3. Optimal control of the population size governed by the stochastic logistic model

328 341

Chapter VII. Numerical Synthesis Methods 7.1. Numerical solution of the problem of optimal damping of random oscillations 7.2. Optimal control for the "predator-prey" system (the general case)

355

Conclusion References Index

383 387 401

356 368

INTRODUCTION

The main problem of the control theory can be formulated as follows. In the design of control systems it is assumed that each control system (see Pig. 1) consists of the following two principal parts (blocks or subsystems): the subsystem P to be controlled (the plant) and the controlling subsystem C (the controller). The plant P is a dynamical system (mechanical, electrical, biological, etc.) whose behavior is described by a well-known operator mapping the input (controlling) actions u(t) into the output trajectories x(t}. This operator can be denned by a system of ordinary differential, functional, functional-differential, or integral equations or by partial differential equations. It is important that the operator (or, in technical terms, the structure or the construction) of the plant P is assumed to be given and fixed from the outset.

x(t)

U(t)

P

C

FIG. 1 As for the controller C, no preliminary restrictions are imposed on its structure. This block must be constructed in such a way that the output

trajectories { x ( t ) : 0 < t < T] (the case T = +00 is not excluded) possess, in a sense, sufficiently "good" properties. Whether the trajectories are "good" or not depends on the specifications imposed on the control system in question. These assumptions are often stated by using the concept of a support (or standard) trajectory x ( t ) , and the control system itself is constructed so that the deviation x(t) — x(t)\

on the time interval 0 < t < T does not exceed a value given in advance. If the "quality" of an individual trajectory {x(t): 0 < t < T} can be estimated by the value of some functional /[»(<)] of this trajectory, then there is a possibility to find an optimal trajectory x* (t) on which the functional

2

Introduction

I [ x ( t ) } attains its extremum value (in this case, the extremum type (minimum or maximum) is determined by the character of the control problem). The functional /[cc(t)] used for estimating the control quality is often called

the optimality criterion or the performance index of the control system designed.

If there are no random actions on the system, the problem of finding the optimal trajectory K*(i) amounts to finding the optimal control program {u*(t): 0 < t < T} that ensures the plant motion along the extremum trajectory { x f ( t } : 0 < t < T}. The optimal control w*(t) can be calculated by using methods of classical calculus of variations [64], or, in more general situations, Pontryagin's maximum principle [156], or various approximate

methods [138] based on these two fundamental approaches. Different methods for calculating the optimal control programs are discussed in [137]. If an optimal control system is constructed without considering stochastic effects, then the system can be open (as in Fig. 1), since the plant trajectory { x ( t ) : 0 < t < T} and hence the value of the optimality criterion /[x(t)] are determined uniquely for a chosen realization {u(t): 0 < t < T} of control actions. (Needless to say that the equation of the plant is assumed to have a unique solution for a given initial state x(Q) = XQ and a given input function u(t].}

y(t}

C

u(t)

P

x(

FIG. 2 The situation is different if the system is subject to noncontrolled random actions. In this case, to obtain an effective control, one needs some information about the actual current state x(t) of the plant, that is, the

optimal system must be a closed-loop (or feedback) system. For example, all servomechanisms are designed according to this principle (see Fig. 2). In this case, in addition to the operator of the plant P, it is necessary to take into account the properties of a source of information, which determines the required value y(t) of the output parameter vector x(t) at each instant t (examples of specific servomechanisms can be found in [2, 20,

38, 50]). The block C measures the current values of the input y(t) and output x(t) variables and forms controlling actions in the form of the functional u(t) — y>(i/o, XQ) of the observed trajectories y*0 = {y(s): 0 < s < t}, xf0 = { x ( s ) : 0 < s < t} so that the equality x ( t ) = y(t) holds, if possible,

Introduction

3

for 0 < t < T. However, the stochastic nature of the assigning action (command signal) y(t) on one side, and the inertial properties of the plant P on the other side do not allow to ensure the required identity between the in-

put and output parameters. Therefore, a problem of optimal control arises

in a natural way. Hence, just as in the deterministic case, the optimality criterion /[|y(t) — x(t)\] is introduced, which is a measure of the "distance" between the functions y ( t ) and x(t) on the time interval 0 < t < T. The final statement of the problem depends on the type of assumptions on the properties of the assigning action y(t). Throughout this book, we use the probability description for all random actions on the system. This means that all assigning actions are treated as random functions with known (completely or partially) probability characteristics. In this approach, the optimal control law that determines the structure of the block C can be found from the condition that the mean value of the criterion j[|j/(<) — £(£)|] attains its minimum. Another approach in which the regions of admissible values of perturbations rather than their probability characteristics are specified

and the optimal system is constructed by methods of the game theory is

described in [23, 114, 115, 145, 195]. If the servomechanism shown in Fig. 2 is significantly affected by noises arising due to measurement errors, instability of voltage sources in electri-

cal circuits, varying properties of the medium surrounding the automatic system, then the block diagram in Fig. 2 becomes more complicated and

can be of the form shown in Fig. 3.

FIG. 3 Here £(<) and rj(t) denote random perturbations distorting information on the command signal y ( t ) and the state x ( t ) of the plant to be controlled; the random function £(t) describes the perturbing actions on the plant P. By '!' and '2' we denote the blocks in which useful signals and noises are combined. It is usually assumed that the structure of such blocks is known.

4

Introduction

In this book we do not consider control systems whose block diagrams are more complicated than that shown in Fig. 3. All control systems studied in the sequel are special cases of the system shown in Pig. 3. The main emphasis of this book is on the methods for calculating the optimal control algorithms

(*)

«»(<) = *(t,XQ,yo),

which determine the structure of the controller C and guarantee the optimal behavior of the feedback control system shown in Fig. 3. Since the methods studied in this book are oriented to solving applied control problems in mechanics, engineering, and biology, much attention is paid to obtaining (*) in a form such that it can easily be used in practice. This means that all optimal control algorithms described in the book for specific problems are

such that the functional (mapping) ip* in (*) has either a finite analytic form or can be implemented by sufficiently simple standard modeling methods. From the mathematical viewpoint, all problems of optimal control are

related to finding a conditional extremum of a functional (the optimality criterion), i.e., are problems of calculus of variations [28, 58, 64, 137]. However, a distinguishing feature of many optimal control problems is that they are "nonclassical" due restrictions imposed on the admissible values of controlling actions u(i). For instance, this often leads to discontinuous

extremals inadmissible in the classical theory [64]. Therefore, problems of optimal control are usually solved by contemporary mathematical methods, the most important being the Pontryagin maximum principle [156] and the Bellman dynamic programming approach [14]. These methods develop and generalize two different approaches to variational problems in the classical theory: the Euler method and the Weierstrass variational principle used for

constructing a separate extremal and the Hamilton-Jacobi method based on the consideration of the entire field of extremals, which leads to partial differential equations for controlled systems with lumped parameters or to equations with functional derivatives for controlled systems with distributed

parameters. The maximum principle, which is a rigorously justified mathematical

method, can be used in general for solving both deterministic and stochastic problems of optimal control [58, 116, 156]. However this method, based on the consideration of individual trajectories of the control process, leads

to certain technical difficulties when one needs to find the structure of the controller C in feedback stochastic systems (see Figs. 2 and 3). In

this situation, the dynamic programming approach looks more attractive. This method however suffers some flaws from the accuracy viewpoint (for example, it is well known that the Bellman differential equations cannot be

Introduction

5

used in some cases of deterministic time-optimal control problems [50, 137,

156]). In systems with lumped parameters where the behavior of the plant P is governed by ordinary differential equations, the dynamic programming

approach allows the reduction of optimal control problem to solving a nonlinear partial differential equation (the Bellman equation). In this case, the structure of the controller C (and hence the form of the function (mapping) iff in (*)) is determined simultaneously with solving this equation. Thus this method provides a straightforward solution of the main problem in control theory, namely, the synthesis of a closed-loop automatic control system. As for the possibility to use this method, so far it has been rigorously proved that the Bellman differential equations are valid and form the basis for solving the synthesis problems for a wide class of stochastic

and deterministic control systems [113, 175]. Therefore, the dynamic programming approach is widely used in this book and underlies practically all methods developed for calculating optimal (or quasioptimal) controls. As noted above, these methods constitute the dominant bulk of the subject matter of this book. As is known, the functional and differential Bellman equations can be used effectively only if the controlled process (or, in more general cases,

the system phase trajectory in some state space) is a process without aftereffects, that is, a Markov type process. In deterministic problems, this Markov property of trajectories readily follows from the corresponding existence and uniqueness theorems for the solutions of the Cauchy problem. To ensure the Markov property of trajectories in stochastic control problems, it is necessary to impose some restrictions on the class of random functions used as mathematical models of random disturbances on the system. To this end, throughout this book, it is assumed that all random actions on the system are either "white noise" type processes or Markov stochastic

processes. When the perturbations are of white noise type, the controlled process

x(t) itself can be Markov. If the noises are of Markov type, then the process x(t) is, generally speaking, a component of a partially observable Markov process of larger dimension. Therefore, to solve the synthesis problem effectively in this case, one needs to use a special state space formed by sufficient statistics, so that the time evolution of these statistics possesses the Markov property. In this case, the controller C consists of two parts: a block that forms sufficient statistics (coordinates) and an actual controller

whose structure can be found by solving the Bellman equation. These topics are studied in more detail in Chapter I.

CHAPTER I

SYNTHESIS PROBLEMS FOR CONTROL SYSTEMS AND THE DYNAMIC PROGRAMMING APPROACH

§1.1. Statement of synthesis problems for optimal control systems

In synthesis problems it is required to find the structure of the control

block (controller) C in a feedback control system (see Figs. 2 and 3). From the mathematical viewpoint, this problem is solved if we know the form of the mapping u =
vector-function u = {u(t) : 0 < t < T} (the system is considered on the time interval [0,T]). The conditions under which algorithm (1.1.1) can physically be implemented impose some restrictions on the form of the mapping in (1.1.1). Usually, it is assumed that the current values of the control vector u(i) = (ui(t], . . . , ur(t)j at time t are independent of the future values x(t') and y(t'}^ t1 > t. Therefore, the mapping (1.1.1) can be written as follows (see (*) in Introduction):

u(t) = y>(*,x£,y 0 '),

0
(1.1.2)

where XQ = {x(s) : 0 < s < t} and y£ = { y ( s ) : 0 < s < t} denote the functions x and y realized at time t. In simpler situations (say, in the case of the servomechanism shown in Fig. 2), the synthesis function ip may depend only on the current values of

the input processes (1.1.3) or even may be of the form

u(t)=
The functions x and y are input functions for the controller C.

(1.1.4)

8

Chapter I

if the command signal y(t) is either absent or a known deterministic function of time. The explicit form of the synthesis function ip is determined by the character of the optimal control problem. To state the synthesis problem for an optimal control system mathematically, we need to know: (1) the dynamic equations of the controlled plant; (2) the goal of control;

(3) the restrictions (if any) on the domain of admissible values of control actions w, on the domain of the phase variables x, etc.; (4) the probability characteristics of the stochastic processes that affect the system. Obviously, in problems of deterministic optimal control we need only the first three objects. 1.1.1. Dynamic equations of the controlled plant. The present

monograph, except for §3.6, deals with control systems in which the plant P can be described by a system of ordinary differential equations in the normal form x = g(t,x,u), (1.1.5) where x = x(t) G Rn and u = u(t) 6 Rr are the current values of an n-dimensional vector of output parameters (the phase variables) and of an r-dimensional control vector, g(t, £, u): R x Rn x Rr i—>• Rn is a given vector-function, and the dot over a letter denotes the derivative with respect to time (that is, x is an n-vector with components dxi/dt, i — l , . . . , n ) . Here and in the sequel, R& denotes the Euclidean space of fc-dimensional vectors. If, in addition to the control u, the controlled plant experiences uncontrolled random perturbations (see Fig. 3), then its behavior is described by the equation x=g(t,x,u,t(t}), (1.1.6)

where £(t) is an m-vector of random functions ( f , i ( t ) , . . . ,{,m(t)).

Differ-

ential equations of the form (1.1.6) with random functions on the righthand sides are called stochastic differential equations. In contrast with the "usual" differential equations of the form (1.1.5), they have some special properties, which we consider in detail in the next section. The form of the vector-functions g(t, x, u) and g(t, x, u, £(t)) on the right in (1.1.5) and (1.1.6) is determined by the physical nature of the plant. In the subsequent chapters, we consider various special cases of Eqs. (1.1.5)

Synthesis Problems for Control Systems

9

and (1.1.6) and solve some specific control problems for mechanical, technical, and biological objects. In the present chapter, we only discuss general restrictions that we need to impose on the function
to obtain a well-posed mathematical statement of the problem of optimal control synthesis. The most important and, in fact, the only restriction on the function g(-) is the existence of a unique solution to the Cauchy problem for Eqs. (1.1.5) and (1.1.6) with any given control function u(t] chosen from a function

class that is called the class of admissible controls. This means that the trajectory x ( t ) of system (1.1.5) or (1.1.6) is uniquely determined2 on the time interval t0 < t < to + T by the initial state x(to) = *o and a chosen

function {u(t}: t0 < t < t0 + T}. The uniqueness of the solution x(t) of system (1.1.5) with the initial

condition x(to) = XQ is guaranteed by well-known existence and uniqueness theorems for systems of ordinary differential equations [137]. The following

theorem [156] presents very general sufficient conditions for the existence and uniqueness of the solution of system (1.1.5) with the initial condition

x(to) = XQ (the Cauchy problem).

THEOREM. Let a vector-function g(t,x,u) be continuous with respect to all variables (t, x, u) and continuously differentiate with respect to the components of the vector x = (xi,..., xn), and let the vector-function u — u(t]

be continuous with respect to time. Then there exists a number T > 0 such that a unique continuous vector-function x(t) satisfies system (1.1.5) with the initial condition x(to) = XQ on the interval to < t < to + T. If T —>• oo, that is, if the domain of existence of the unique solution is arbitrary large, then the solution of the Cauchy problem is said to be infinitely continuable to the right. It should be noted that the functions g(-) and u need not be continuous

with respect to t. The theorem remains valid for piecewise continuous and even for bounded functions
absolutely continuous function [91]. The assumption that the function g(-) is smooth with respect to the

components of the vector x is much more essential. If this condition is not satisfied, then we can encounter situations in which system (1.1.5) does not have any solutions in the "common" classical sense (for example, for some initial vectors x(to) = XQ, it may be impossible to construct a function The solution of the stochastic differential equation (1.1.6) is a stochastic process x(t). The uniqueness of the solution to (1.1.6) is understood in the sense that the initial

condition x(to) = #o a nd the control function u(t): t0 < t < t + T uniquely determine the probability characteristics of the random variables x(t) for all t £ (to, to + T].

10

Chapter I

x(t) that identically satisfies (1.1.5) on an arbitrarily small finite interval t0 (t, x(t)) like in (1.1.4). It is well known (this will be illustrated by numerous special examples considered later) that many problems of optimal control with control constraints often result in control algorithms u* —
is directed towards the surface S in £)_ and away from the surface S in D+, then the solution goes from £>_ to D+, intersecting the surface S only once (Fig. 4). But if the vector ~g is directed towards the surface S in £>_ and

in D+, then the solution, once coming to S, can leave it neither to Z>_ nor to D + . Therefore, there is a problem of continuation of this solution. In [54] it is assumed that after the solution x(t) comes to the surface S, the subsequent motion of system (1.1.7) is realized along the surface S with velocity x = go(t,x) = ag+(t,x) + (1 - a)g_(t, x), (1.1.8) where x (E S and the number a (0 < a < 1) are chosen so that the vector <7o(£, x) is tangent to the surface S at the point x.

The vector ^(t, xj in (1.1.8) can be constructed in the following way.

11

Synthesis Problems for Control Systems

x(t)

D

FIG. 4 At the point x G 5 we construct the vectors cj+(t, x) and g-(t,x) and connect their endpoints with a straight line. The point of intersection of this straight line with the plane tangent to 5 at the point x is the endpoint of the desired vector g0(t,x) (Fig. 5).

FIG. 5

A function x(t) satisfying Eq. (1-1.7) in D+ and in D_ and satisfying Eq. (1.1.8) on the surface 5 is called the generalized solution of Eq. (1.1.7) or a solution in the sense of Filippov. This definition makes sense, since a solution in the sense of Filippov is the limit of a sequence of classical solutions to Eq. (1.1.7) with smoothed (in

12

Chapter I

x) right-hand sides gk(t,x) ifgk(t,x) —>• g(t, x) as k —)• oo. Moreover, the sequence xk(t) of classical solutions of equations with retarded argument

uniquely converges to the same limit if the delay Tk —>• 0 as k —)• oo (see [54]). We also note that, in practice, solutions in the sense of Pilippov can

be realized in some technical, mechanical, and other systems of automatic control, which are sometimes called systems with variable structure [46]. In such systems, the plant is described by Eq. (1.1.5), and the control vector u makes a jump when the phase vector x(t) intersects a given switching surface S. In such systems, if the motion is along the switching surface, the critical segments of the trajectory can be realized by infinitely fast switching of control. In the theory of automatic control such regimes are called

"sliding modes" [2, 46]. Generalized solutions in the sense of Filippov allow us to construct the unique solution of the Cauchy problem for Eq. (1.1.5) with function

g(t, x,u) piecewise continuous in x. Now let us consider the stochastic differential equations (1.1.6). We have already pointed out that these equations substantially differ from ordinary differential equations of the form (1.1.5); the special properties of Eqs. (1.1.6) are studied in §1.2. Here we only briefly dwell on the nature of

special properties of these equations. The stochastic differential equations (1.1.6) have the following fundamental characteristic property. If the random function £ ( t ) on the righthand side of (1.1.6) is a stochastic process of the "white noise" type, then the Cauchy problem for (1.1.6) can have an infinite (larger than a countable) set of different solutions. Everything depends on how we understand the solution of (1.1.6) or, in other words, on how we construct the random

function x ( t ) that satisfies the corresponding Cauchy problem for (1.1.6). It turns out that in this case we can propose infinitely many well-defined solutions of equation (1.1.6). This situation gives an impression that the differential equations (1.1.6) do not make any sense. However, since control systems perturbed by a white noise play an important role, it is necessary to specify how the dynamics of a system is described in this case and in which sense Eq. (1.1.6) must be

understood if it is still used. On the other hand, the existence and uniqueness of the solution to the Cauchy problem for equations of the forms (1.1.5) and (1.1.6) is the basic assumption that allows us to use the dynamic programming approach for solving problems of optimal control synthesis.

In §1.2 we discuss these and some other topics.

Synthesis Problems for Control Systems

13

1.1.2. Goal of control. The requirements imposed on a designed control system determine the form of the functional (the optimality criterion) , which is a numerical estimate of the control process. Let us consider some typical problems of optimal control and write out the cost functionals needed to state these problems. We begin with deterministic problems in which the plant is described by the system of differential equations (1.1.5). First, we assume that the time

interval 0 < t < T (on which we consider the control process) is fixed and the initial position of the plant is given, that is, x(0) = Zo, where x0 is a vector of some given numbers. Such problems are called control problems with variable right endpoint of the trajectory. Suppose that it is required to construct an optimal servomechanism (see Fig. 2) such that the input command signal y(t): 0 < t < T is a known function of time. If the goal of the servomechanism shown in Fig. 2 is to reproduce the input function y(t) via the output function x(t): 0 < t < T most closely, then one of possible criteria for estimating the performance of this servomechanism is the integral

/ Jo

\x(t)- y ( t } \ p d t ,

(1.1.9)

where p is a given positive number, and a denotes the Euclidean norm of __

1 /">

a vector o, that is, a = (S™=i a?) • ^n an "ideal" servomechanism, the controlled output process is identically equal to the command signal, that is, x(t) = y(t), 0 < t < T, and the functional (1.1.9) is equal to zero, which is the least possible value. In other cases, the value of (1.1.9) is a numerical estimate of the proximity between the input and output processes. It may happen that much "effort" is required to ensure a sufficient proximity between the processes x(t] and y(t), that is, the control action u(t) needs to be large at the input of the plant P. However, it is undesirable to use too "large" controls in many actual devices both from the energy and economy viewpoints, as well as from the reliability considerations. In these cases, instead of (1.1.9), it is better to use, for example, the cost functional

r

T [ \ x ( t ) - y(t)\'+ a\U(t)\«] dt,

(1.1.10)

where a, q > 0 are some given numbers. This functional takes into account both the proximity between the output process x(t) and a given input

process y(t) and the total "cost" of control on the time interval [0,T]. Of course, the functionals (1.1.9) and (1.1.10) do not exhaust all methods for stating integral optimality criteria that are used in problems of synthesis of optimal servomechanisrns (Fig. 2). The most general form

14

Chapter I

of integral criteria can be obtained by using the penalty functions introduced by Wald [188]. Suppose that each current state of the system shown in Fig. 2, characterized by the set of vectors (x(t), y(t), u(t)^, is "penalized" by a given nonnegative scalar function c(a;, y, u) of their arguments.

If c(x,y, u) has the meaning of specific penalties per unit time, then the functional fT

Ji[«]= /

Jo

c(x(t),y(t),u(t)}dt

(1.1.11)

is a natural performance criterion on the time interval [0, T]. Obviously, the functionals (1.1.9) and (1.1.10) are special cases of (1.1.11), in which the penalty function c is defined as c(x,y, u) = \x — y\p or c(z,j/, it) = \x — y\p + a u 9, respectively. Another class of optimal control problems is formed by problems of terminal control. Such problems appear when the character of transition processes in the system is not essential for 0 < t < T and we are interested only in the state of the system at the terminal moment of time T. In this case, using the corresponding penalty function i / ) ( x , y ) , we obtain the terminal

optimality criterion (1.1.12) It should be noted that, by an appropriate extension of the phase vector x, we can rewrite the integral criterion (1.1.11) in the form (1.1.12). Thus,

from the mathematical viewpoint, the integral criterion (1.1.11) is a special case of the terminal criterion (1.1.12) (see [1, 34, 137]). Nevertheless, we distinguish these criteria in the sequel, since they have different meanings in applications. In addition to (1.1.11) and (1.1.12), we often use their combination

fT h[u} =

Jo

c(x(t},y(t),u(t))dt

+ ^(x(T),y(T));

(1.1.13)

this criterion depends both on the transition process and on the terminal state of the system. If the worst (with respect to a chosen ' penalty function) state of the controlled system on a fixed time interval [0,T] is a crucial factor, then,

instead of (1.1.11), we must use the criterion

.

(

t

)

)

.

(1.1.14)

An optimal system constructed by the minimization of the criterion (1.1.14) provides the best (in contrast with any other system) result only in the

worst operating mode. Criteria of the form (1.1.14) were studied in [16, 40, 92, 148].

Synthesis Problems for Control Systems

15

If a dynamic system described by Eqs. (1.1.5) is totally controllable,3 then optimal control problems with fixed endpoints or with some fixed terminal set are often considered together with control problems with variable right endpoint of the trajectory. In these problems, the control time T is not fixed in advance and, as admissible controls, we take the control functions u(t): 0 < t < T that transfer system (1.1.5) from a given initial state x(Q) = XQ to a fixed terminal state x(T) = Xi or to a fixed terminal set. An admissible control ut(t) : 0 < t < T is optimal if the integral functional (1.1.11) attains its minimum at u*(t). The control problems with fixed endpoints contain time-optimal problems as an important special case. In these problems, we have the penalty function c = 1 in (1.1.11), and the minimized functional (1.1.11) is equal to the transition time Ii[u] = T from the state XQ to the state Xi (or to a given terminal set). Time-optimal problems find wide application in mechanics, physics, technology, etc. (see [1, 24, 85, 90, 123, 137, 156]). It should be noted that in due time just the time-optimal problems made a great impact on the formation of the theory of optimal control as a subject of independent study. The most of the cited optimal control problems can readily be generalized to the stochastic case in which the plant is described by the stochastic differential equations (1.1.6). It only remains to note that in this case each of the functionals (1.1.11)-(1.1.14) is a random variable for any fixed control algorithm u(t) (given, say, in the form (1.1.3)). These variables are measured to a large extent by their mean values, which determine the values of the mean "losses" or "costs" of control if the control algorithm (1.1.3) is repeated many times. The mean values

h[u] = E/i[«] «] = E / c ( x ( t ) , y ( t ) , u ( t ) ) dt, Jo

(1.1.15)

I6[u] = Eh[u] = E^(x(T),y(T)}, Ir(u] = E/sH

(1.1.16)

= E[ f

c ( x ( t ) , y ( t ) , u(t)) dt + 1>(x(T),y(T))] ,

LJo

I8[u] = EI4[u] = E max c ( x ( t ) , y ( t ) , u ( t ) ) .

J

(1.1.17) (1.1.18)

of functionals (1.1.11)— (1.1.14) usually serve as optimality criteria in sto3

According to [78, 111], the system (1.1.5) is called totally controllable if for any two given vectors XQ and x\ , there always exist a finite number T > 0 and an admissible control function u(t) : 0 < t < T such that system (1.1.5) is transferred from a given initial state x(0) = XQ to a given terminal state x(T) == x\ during time T.

16

Chapter I

chastic problems of optimal control (in (1.1.15)—(1.1.18) and, in what follows, EA denotes the mean value (the mathematical expectation) of A). In controlled stochastic systems, we often encounter control problems in which the terminal moment is a random value, for example, the optimal halting problems [113, 132]. In these problems, we have an additional optimization parameter, namely, the random terminal time r of the process. Therefore, the optimality criterion depends both on the control actions u = UQ = u(t): 0 < t < T and on T as follows:

I9[u, r}=E\ T c ( X ( t ) , y(t), u(t)) dt + $(X(T), y(r)}\. Uo J

(1.1.19)

There is another type of problems with a random terminal time of the process. Suppose that D C R n + m is a subset of the Cartesian product

R n+TO = Rn x Rm of Euclidean spaces of vectors x and y. Suppose that TD is the time instant at which the point ( x ( t ) , y(t)) comes to the boundary 3-D of the set D for the first time. Then we can state the problem [34, 113] of finding the control law (1.1.3) for which the functional

U

TD

-i

c ( x ( t ) , y(t),u(t)) dt + 1>(x(TD), y(TD}}

J

(1.1.20)

attains an extremum. The functional

/n[«] = ETD

(1.1.21)

is a special case of (1.1.20) with c(-) = 1 and i/>(-) = 0; the value of this functional is equal to the mean time during which the point (x,y) comes to the boundary dD of the set D. If the criterion /u is used, then the goal

of control depends on whether the initial state (x(0), j/(0)) of the system is an interior point or, vice versa, ( x ( 0 ) , y ( 0 ) } £ R-n+m \ D. If (z(0),j/(0)) G D, then, as a rule, we need to maximize (1.1.21) (see §2.2); otherwise

((x(0),y(0)) ^ £>), the goal of control is to minimize (1.1.21). The last problem is a stochastic version of the time-optimal problem [1, 85]. The criteria / i , . . . , In considered do not exhaust all possible statements of optimal control problems. The other statements can be found in the

literature devoted to the control theory [1, 3, 5, 24, 34, 43, 58, 111, 112, 128, 156]. The choice of the criterion depends on practical needs, that is, on the special technical problem that arises at the design stage. It

should be noted that in the mathematical approach to optimal systems more attention is paid to general problems of the qualitative theory (the existence and uniqueness of the solution, justification of the optimality principle,

Synthesis Problems for Control Systems

17

estimates and asymptotics of solutions, etc.), while the choice of a criterion is not very essential. Moreover, by introducing some additional variables, one can transform different optimality criteria to some standard form, for example, to the integral form Ii (1$) or to the terminal form /2 (I&)- The situation is different if some quantitative calculations of the optimal control

block (controller) C are required, that is, if we need to write the algorithm (1.1.3) explicitly. The complexity and sometimes the method of calculations

significantly depend on the optimality criterion. On the other hand, some criteria of different form may lead to constructions of optimal systems close

to each other (see §2.2). In such cases, it may be useful to replace the original criterion by a new one that simplifies the calculations of the optimal algorithm (1.1.3) but does not change it essentially. Problems of the rational choice of optimality criteria are considered in [50, 155].

In the present monograph, we assume that an optimality criterion is already chosen from some considerations and is a known functional of trajectories of the system. 1.1.3. Constraints in control problems. In the design of actual control systems (both open- and closed-loop systems, see Figs. 1-3), it is often required to take into account some restrictions on the parameters of the control process. For example, in many problems some constraints are imposed on the control actions u. Suppose that only the value of the control

actions is important. Then the control function u is admissible if for all t it takes values from some bounded set U C R r , that is,

u(t) e U.

(1.1.22)

In particular, (1.1.22) can be of the form M<)|<«?,

i = l,...,r,

£)/*«?(*)<*,

(1.1.23)

(1.1.24)

1=1 where u°, /i,-, and k are given positive numbers. Constraints of the form (1.1.22)-(1.1.24) reflect the fact that control actions of any physical nature (force, torque, electric force, potential, heat quantity, concentration, etc.) always vary in a bounded range. The control is not allowed to take large values, since this may result in mechanical break-downs, damages of electric circuits, etc. Constraints of integral character are also possible. Sometimes they are called constraints on control resources. In this case, the admissible control

18

Chapter I

functions u(t) must satisfy the condition f

I Jo

u(t)\qdt
(1.1.25)

where Q > 0 is a given number. Problems with constraints (1.1.25) were considered in [22, 34]. The same technical considerations often show that it is necessary to impose some restrictions on the domain of admissible values of the phase variables x. If X C Rn is the set of possible values of x, then the related constraints on the phase trajectories x(t) can be of the form

x(t)eX,

(1.1.26)

which is similar to (1.1.22). Constraints of the form (1.1.22)-(1.1.26) are of considerable and some-

times of decisive importance in problems of optimal control. So, control problems often make sense only for constraints of the form (1.1.22)-(1.1.25).

Indeed, let us consider a control problem in which the plant is described by system (1.1.5) and the control performance is estimated by the integral optimality criterion (1.1.11) with penalty function independent of y and u:

r-T I[u] = I c(x(t}} dt. Jo

(1.1.27)

Suppose that the penalty function c ( x ) attains the minimum value at x = XQ (we can always assume that this minimum value is zero). Then, by using an arbitrary large control u (admissible in the absence of constraints (1.1.22)(1.1.25)), we can obtain a trajectory of motion x(t) that is arbitrarily close

to x ( t ) = XQ = const (it is assumed that system (1.1.5) is controllable [78, 111] and the current state of system (1.1.5) can be measured exactly). Thus, if the control function is unbounded, the functional (1.1.27) can be arbitrarily close to the zero value of its absolute minimum. But if the control

u(t) is bounded by some of conditions (1.1.22)-(1.1.25), then the functional (1.1.27) never takes the zero value at z(0) ^ x0 and the minimization problem for (1.1.27) is nondegenerate. In some cases, restrictions on phase variables (1.1.26) allow us to improve the mathematical model of the control process and to describe the actual situation more precisely. Let us consider an illustrative example. Suppose that the plant is a servomotor with bounded speed. The equation of motion

has the form
(1.1.28)

Synthesis Problems for Control Systems

19

where x and u are scalars. Suppose that by solving the synthesis problem, we obtain the optimal control algorithm of the relay type:

t(t,

x) =u 0 sign(:c- x0(t}),

signy=<

0,

y = 0,

(1.1.29)

where x0(t] is a given function of time. In this algorithm the control action instantaneously varies by a finite value when the difference (x — «o(t)) changes the sign. If an actual control device implementing (1.1.29) has some inertial properties (for example, it turns out that the absolute rate v of changing the control action is bounded by VQ) , then it is more convenient to model such a system by a plant whose behavior is described by two phase

coordinates x± = x and x^ = u such that Xi = X2,

X2 = V,

\V

< VQ,

X-2\ < UQ.

(1.1.30)

In this case, v (the rate of change of x% = u) is a control parameter, and the control constraint in (1.1.28) becomes a constraint imposed on the phase coordinate x% in (1.1.30). 1.1.4. Probability characteristics of stochastic processes. As was already pointed out in Introduction, in the present monograph, we consider stochastic processes under the assumption that all random actions

on the system (the variables y ( t ] , £(i), rj(t), and £(t) in Fig. 3) are either white noises or Markov type processes. We restrict our consideration of methods for the mathematical description of such processes to a rather elementary presentation of related notions and problems.

The rigorous

theory of Markov processes based on the measure theory can be found in

the monographs [44, 45]. A stationary scalar stochastic process £(t) is called the standard white noise if it is Gaussian and has the zero mean and a delta type correlation function,

T)=6(r).

(1.1.31)

In (1.1.31) S(r) denotes the Dirac delta-function that is zero for r ^ 0 and becomes infinite for r — 0 (see [65, 91]). Besides, any continuous function f ( t ) satisfies the relation

f(t)6(t-t0)dt

=

f(to), f(b)/2,

< 0 e (a, &), t0 = b,

/(a)/2,

to = «,

0,

t0£[a,b].

(1.1.32)

Chapter I

20

Various nonstationary generalizations of the notion of white noise are combinations (obtained by multiplication and addition) of the standard process (1.1.31) and some deterministic functions of time. Obviously, a Gaussian stochastic process with characteristics (1.1.31) cannot be physically realized, since, as we can see from (1.1.31), this process has the infinite variance

and hence, to realize this process we need to have a source of infinite power. Therefore, a process of the white noise type can be considered on some time interval [0, T] as the limit (as A —>• 0) model of a time sequence of independent identically distributed random variables £,• = £(£,- = iA) (i = 0, 1, . . . , N, N = T/A) with probability density

(1.1.33) From (1.1.33) we can see that D£j = I/A —> oo as A —> 0. This means that on any arbitrarily small finite time interval T with probability 1 a realization of the white noise takes values both larger and smaller than any fixed number. Thus the white noise is a stochastic process that oscillates extremely fast and with infinite amplitude about its mean value. If we try to draw a realization of the white noise on the time interval [to, to + T], then this realization completely fills the infinite band parallel to the a;-axis as shown in Fig. 6.

FIG. 6

Synthesis Problems for Control Systems

21

The white noise is a convenient abstraction of actual stochastic processes. This model of processes is of great advantage for performing mathematical calculations with expressions that contain white noise type processes (in particular, one can readily calculate the mean values of integrals of such processes); this is related to the properties of the delta-function (1.1.32). In mathematical investigations actual stochastic processes (,p(t) with finite correlation time rcor can be replaced by white noise type processes if Tcor
Tn , where Tn is either the characteristic time constant or the transient time in a system under the action of (,p(t) (for more details, see [173, 181]). The rigorous description of generalized stochastic processes of the white noise type can be found in [63, 157].

A multidimensional generalization of the standard white noise is an n-dimensional vector-column of random functions £(<) whose components £,•(<) (i — l , . . . , n ) are independent Gaussian stochastic processes with characteristics (1.1.31). Thus, instead of (1.1.31), the n-dimensional standard white noise is characterized by

,

(1.1.34)

where E is the n x n identity matrix and the superscript T indicates the transpose of a matrix. Now let us consider methods for defining Markov processes. A stochastic process £(<) : 0 < t < T with values from some set (the phase space) X is called a Markov process or a process without aftereffect if the conditional probabilities satisfy the relation

P[£(*n)eG|£(
G C X. Formula (1.1.35) means that the probabilities of future values of Markov processes are completely determined by the last measured state of the process and are independent of the preceding states (the absence of aftereffect). Depending on whether the sets [0, T] and X are discrete or continuous, we

distinguish discrete Markov sequences or Markov chains (the sets [0, T] and X are discrete), continuous sequences (the set [0,T] is discrete and the set X is continuous), and discrete Markov processes (the set [0, T] is continuous and the set X is discrete). But if the phase space X is continuous and the

argument t of the stochastic process £(i) may take any value t G [0, T], then we have the following two types of Markov processes: continuous (all sample paths of the process £ ( t ) : 0 < t < T are continuous functions of time with probability 1) and strictly discontinuous (all sample paths of the process £(t) : 0 < t < T are step-functions, while the moments and amplitudes

of jumps are continuous random variables).

Chapter I

22

There exist more complicated Markov processes that are combinations of processes listed above [181]. Numerous papers and monographs deal with various types of Markov processes (see, for example, [11, 36, 38, 67, 160]). In the present monograph, we consider the following three types of stochastic processes in most detail: discrete, continuous, and strictly discontinuous Markov processes.

FIG. 7 Discrete Markov processes. As was already noted, in this case the time is continuous, but the phase space X is discrete. We assume that the set X consists of finitely many elements xi, . . .,xa, . . ., xm. At each instant of time t € [0,T] (possibly, T —> oo), the process £ ( t ) with probability Pa(t) takes one of the ra possible values xa, a = l,...,m. The transitions to new states are instantaneous and take place at random moments. Thus a sample path of the process £(i) is a step-function of time as shown in Fig. 7. Suppose that the process is in the state £ ( t ) = xa at time t. Then, it follows from (1.1.35) that the probability of the event that the process comes to the state £(r) = xp at time T > t depends only on t, T, xa, and

Xft. The corresponding conditional probability Xa},

(1.1.36)

which is usually called the transition probability, is an important characteristic of the Markov process (.(t). The unconditional probabilities P«(<) = P{£(<) = xa}, a — l , . . . , m , and functions (1.1.36) describe the process £ ( t ) completely.4 Actually, the 4 If the probabilities P{£(*i) = x a ,£(t 2 ) = xp,. . ., £(t n ) = Xu,} are known for any (ti, *2> • • • > * n ) G [0, T] and for any set of numbers (a,f),...,it>), then a stochastic process is said to be well-defined.

Synthesis Problems for Control Systems

23

probability multiplication theorem [52, 67] and the Markov property of the process £ ( t ) imply that for any t-^ < ti < • • • < tn and a, / ? , . . . , w — 1, . . . , m the probability of the event {£(^1) = *cn£(*2) = xp, • • • ,£(tn) — xu} can be expressed in terms of the functions Pa(t) and P a/ 3(i, T) as follows:

(1.1.37) On the other hand, the functions Pa(t) and P a / a (t,r) can be obtained as solutions of some systems of ordinary differential equations. Let us derive the corresponding equations for Pa(t) and Pa0(t, r)- To this end, we first obtain the Chapman-Kolmogorov equation for the transition

probabilities m

P a / 3 (i,r) = ^P a7 (t, ( r)P 7/3 (cr,r),

0 < t <
(1.1.38)

7=1

We write formula (1.1.37) for three instants of time t,
P{£(t) = xa,£(cr) = z 7 ,<e(r) = xp} = P a (t)P a 7 (t,«r)P 7 j 8 (o-,T).

(1.1.39)

Since

-f=l

(1.1.40) we write the right-hand side of (1.1.40) in the form P(t(t)=Xcnt(T)

= Xp)

= Pa(t)Paf}(t,T)

(1.1.41)

and, substituting (1.1.39) and (1.1.41) into (1.1.40), obtain Eq. (1.1.38) after Pa(t) is canceled out. To derive differential equations for Pa/)(t, T) we need some local time characteristics of the Markov process £(i). If we assume that there is at most one change of the state of the process £(t) on a small time interval A,5 then for small r — t we can write the transition probabilities Pap(t, T)

as follows: l-Xa(t)(T-t)+o(r-t),

a = (3,

5 This is a well-known condition that the process £(t) is ordinary [157, 160], which means that the probability of two and more jumps of £(4) is equal to o(A) on a small time interval A.

24

Chapter I

(in (1.1.42), as everywhere in the following, by o(A) we denote expressions of higher order than the infinitesimal A, that is, o(A) is a scalar function such that lim A -^oo(A)/A = 0). The normalization condition X)/3=i Paftfti T) — 1 f°r ^ ne transition probability and formula (1.1.42) imply that

Xa(t)=

£

Xaf)(t).

(1.1.43)

As is known [160, 181], the parameters A a| g(t) determine the intensity of jumps of the process £(t). The variable Xa(t) defined by (1.1.43) is often called the exit intensity or the exit density of the state xa. It determines the time interval on which the process £(t) remains in the state xa in the sense that the probability P» of changing the state on the time interval [t, t + T] under the condition £(t) = xa is

{

rt+T X (s)

— / Jt

a

i

ds >. )

By setting cr = t + A in (1.1.38) and using (1.1.42), we obtain

Pap(t, T) = Pa/3(t + A, r) - Xa(t)&Pa0(t + A, T) m

c-^«) Dividing (1.1.44) by A and passing to the limit as A —>• 0, we arrive at the system of differential equations (a,/3 = 1,..., m)

dfj

7=1 (l^a)

for the transition probabilities Pap(t,T), which are considered as functions of the initial time t. Equations (1.1.45) hold for t < T. The unique solution of system (1.1.45) is determined by the additional conditions on the

functions Paa(t, T} for t — r: (1.1.46)

Synthesis Problems for Control Systems

25

With respect to the variable T, the transition probabilities Pa0(t, T) satisfy the other system of equations (a, (3 = 1, . . . , m) dP

lll=-Xft(r)Pa/)(t,T)

+

T

X^(T)Pa,(t,T),

(1.1.47)

which, by analogy with (1.1.45), can be obtained from (1.1.38) with
r — A by passing to the limit as A —> 0. The initial conditions Pc,p(t,t) = 6at),

(1.1.48)

which are similar to (1.1.46), provide the uniqueness of the solution of (1.1.47) denned for r>t.

Equations (1.1.47) and (1.1.45) are the forward and backward systems of Kolmogorov equations for the transition probabilities. From equations (1.1.47) one can also readily derive equations for the unconditional probabilities P a (2), a = 1, . . .,?7i. It suffices to multiply (1.1.47) by Pa(t) and

sum over a taking into account the fact that

a=l

As a result, after we rename f3 —>• a and r

• t, we obtain the system of

equations (a = 1, . . . , m)

(1.1.49)

P«(t) = •y=l (7/ar)

The initial probabilities P a (0), a = l , . . . , m , ensure that the solution of

system (1.1.49) is unique for t > 0. Thus an ordinary discrete Markov process is completely determined by the probabilities P a (0), a = 1, . . . , TO, of initial states and by the intensities A a/ g(t), a,{3 = 1, . . .,m, a ^ (3, of jumps. Indeed, if we know these characteristics, then we can find the probabilities Pa(t) and P a/ a(i, r) by solving the systems of linear differential equations (1.1.49) and (1.1.47) (or (1.1.45)). Conversely, if we know the probabilities Pa(t) and Pap(t, T), then we can calculate all possible probabilities of the form (1.1.37).

Continuous Markov processes. These processes are continuous in the phase space X and with respect to time. On each time interval t$
with probability 1.

26

Chapter I

First, let us consider a one-dimensional (scalar) continuous stochastic process. In this case, the phase space X = RI is a set of points on the real axis. Since the instant value £(t) — x of the process is a continuous random variable, its probability properties can be determined via the probability density function p(x,t). In a similar way, one can use the multidimensional density function P(XI, X2, . . . , xn; t±, t2, . . . , tn) to describe the set of

instant values £(ii) = xi,£(t2) = x2,...,£(tn) = xn. A stochastic process £(t) : 0 < t < T is considered to be determined if we know all possible joint density functions P(XI, . . . , xn;
• ••

p(xi, ...,xn;tl,...,tn)dxi...dxn = I

with respect to the variables xi,...,xn. With respect to the variables ti, . . . , < „ , these functions satisfy the symmetry conditions [66, 173]

(here (ai, «2, • • • , « „ ) is a permutation of the indices 1, 2, . . . , n) and the

compatibility conditions, which allow us to obtain the marginal distribution [39] by integrating the initial density function p(xa, x/}; ta,tp) = I • • • I p(x-i, ...,xn;ti,..., tn) dxi . . . . . . dxa_idxa+i . . . dxp_idx0+i . . . dxn.

(1.1.50)

It follows from the probability multiplication theorem for joint distributions (for any not necessarily Markov process) that

p(xi,x-2,...,xn;ti,t2,...,tn) = (1.1.51)

where p(a;;,i,- xi, ti; . . . ; K,-_j.,i,-_i), i = 2,...,n, are densities relative to the conditional distributions of the process £(ti) provided that the instant values £(ti) = x - L , £ ( t 2 ) = x2, . . . , £(^-i) = z,--i are chosen. However, if in (1.1.51) the sequence of times
xp(x3,t3

x2,t2)...p(xn,tn

xn_i,
(1.1.52)

Synthesis Problems for Control Systems

27

Relation (1.1.52) is an analog of (1.1.37) in the case of continuous Markov processes. It follows from (1.1.52) that to write any multidimensional density function, one needs to know the unconditional density p(x, t) and the conditional

density p(y, T \ x, t) for any t and r > t. The function p ( y , T \ x,t), just as Pa/)(t,T) in the discrete case, is called the transition probability. One can obtain differential equations for the functions p(x, t) and p(y, T x,t), which

are analogs of Eqs. (1.1.45), (1.1.47), and (1.1.49). However, in contrast with (1.1.45), (1.1.47), and (1.1.49), in this case we have partial differential equations.

We write p(y,r \ x,t) = p(x,t;y,r) for the transition probability and consider this probability as a function of four variables x, t, y, and T. Then, using (1.1.50) and (1.1.52), we readily obtain the relation

p(x,t;y, T} = / p(x,t;z, ff)p(z, a--y,r)dz,

t <
(1.1.53)

which is a continuous analog of the Chapman-Kolmogorov equation; often this equation is also called the Markov [67] or the Smoluchovski equation

[173]. We define the local characteristics of the stochastic process £ ( t ) by the relations

A) - {(t)]

£(t) =x} = A(t, Z )A + o(A),

2

E{K(t + A) - £(t)] | £(<) = x} = B(t, z)A + o(A), k

A) - £(t)} | t(t) =x} = o(A),

(1.1.54)

k > 3.

This Markov process is called a diffusion process. The values A(t, x) and B(t, x) that determine £ ( t ) are called the drift and the diffusion coefficients. Figure 8 illustrates these parameters by showing some sample paths of the diffusion process £(<) issued from the point x at time t$. The straight line AB shows the direction along which the "centroid" of the fan of sample paths drifts for t close to to- The angle a between the line AB and the z-axis is determined by the drift coefficient tana = A(to,x). For small (t — tg), the diffusion coefficient B(x,to) determines the rate of increase in the variance of instant values of £(t) with respect to the line AB] that is, B(x,to) determines the expansion rate of the fan of sample paths issued from the point (to,x). Note that the conditional expectations in (1.1.54) can be calculated by integrating with the transition probability. For example,

-x)p(x,t;z1t + *)dz.

(1.1.55)

Chapter I

28

FIG. 8 In (1.1.53) we set a = t + A, assume that the transition probability p(z,t + A;T/, T) is a sufficiently smooth function, and write its Taylor expansion

k=l

;y,r). (1.1.56)

k\

Substituting (1.1.56) into (1.1.53) and taking into account (1.1.54) and (1.1.55), we obtain

dp

p(x,t; y, T) = p(x,

)—-( (1.1.57)

Dividing (1.1.57) by A and passing to the limit as A —>• 0, we arrive at the backward Kolmogorov equation

dp(x,

t; y, 8t

B(t,

_

dx

dx2

To obtain the forward equation to which the transition probability satisfies as a function of y and T, we need to note that if f,(cr) = z, then the probability properties of the increment £(z, ff) = £(T) — £(
Synthesis Problems for Control Systems

29

known [67, 173], the characteristic function 0(u; z, a) related to p(z,
[ <J«(9-*)p(z,
(1.1.59)

and by the inverse Fourier transform

-*°<3-^6(u;z,
(1.1.60)

is also a universal characteristic of the random variable ("(2,
If

/lift

(1.1.61) (here i/fc(z,cr) = E{[£(r) — £(
8(u\ z, T - A) = 1 + AjuA(z, r - A) - —u2B(z, T - A) + o(A) (1.1.62) for cr = r — A. Applying the inverse Fourier transform to the left- and right-hand sides of this relation, we obtain f\

p(z, T - A; y, T) - % - z) - A(z, r - A) ^-[S(y - z)] A

vy

).

(1.1.63)

In (1.1.63) we have used the well-known formal relation [41] for the deltafunction:

S(y - z) = —- / e-Ju(y-*> du, 2ir J

which has an exact meaning after it is multiplied by an arbitrary continuous

function
p(x, t; y, T) - p(x, t; y, T - A) - A — [A(y, T - A)p(x, t; y, T - A)]

a/ 2

30

Chapter I

Then passing to the limit as A —>• 0, we obtain the forward Kolmogorov equation

(1.1.64) which is also called the Fokker-Planck equation. Equations (1.1.58) and (1.1.64) are linear partial differential equations

of parabolic type. It is well known [166, 179] that the unique solution of such equations is not determined by the initial condition (for (1.1.64)) or the endpoint condition (for (1.1.58)) p(x, t; y, t) = p(x, T; y, T) - 6(y - x)

(1.1.65)

to which the transition probability satisfies for t = r. It is also necessary to take into account the boundary conditions, which in the case of the infinite phase space X, consist of restrictions imposed on the asymptotic behavior

of the function p(x,t;y, T} as \x\, \y\ —>• oo. To obtain unique solutions of (1.1.58) and (1.1.64), it suffices to require that p(x,t;y,r) is bounded, though it follows from the normalization condition that we always have a sharper condition: p(x, t] y, T) —> 0 as \x , \y\ —> oo. If the phase space X is

bounded, then the additional conditions to which the function p(x,t;y,r) need to satisfy at the boundary points of X are determined by the behavior of the phase trajectories £(£) near these boundary points (see §6.2).

Multiplying (1.1.64) by p(x,t), integrating with respect to x, and taking into account the fact that

p(x, t)p(x, t; y, T) dx = p(y, T), we obtain (after the change r —> t and y —>• x) the following equation for

the unconditional density p(x,t):

^j^- = -j-[A(x,t)p(x,t)}+±-JI^[B(x,t)P(x,t)},

(1.1.66)

which is similar to (1.1.64). To solve (1.1.66) for t > t0, it is necessary to set the initial density p(x,to) = po(x) and to take into account the boundary conditions. If Po(x) — S(x — XQ), then the solution of (1.1.66) is the transition probability p(xo,to\ x,t), t > to. If the process (_(t) is an n-dimensional vector-function of time (the phase

space is X = R n ), then the local characteristics of process (1.1.54) are determined by the vector A(x,t) of drift coefficients with components Ai(x,t),

Synthesis Problems for Control Systems

31

i = 1, . . . , n , and by the matrix of diffusion coefficients Bij(x,t), i, j — 1, . . . , n. In the multidimensional case, the Fokker-Planck equation (1.1.66) has the form

t)}.

(1.1.67)

The sums on the right-hand side of (1.1.67) are taken over twice repeated indices. This means that these expressions are understood as

~ (AiP) = J

!=1

',.7=1

Later we shall also use such short notation for sums. It should be noted that the Fokker-Planck equation is not the only

method for describing the properties of continuous Markov processes. Another method for defining the diffusion processes is based on stochastic differential equations. We consider this method in §1.2.

Strictly discontinuous Markov processes. Suppose that the state x of the process £(t) varies by jumps at random instants of time. By analogy with the case of discrete processes, we assume that the moments of jumps form

an ordinary sequence of events; then we denote the intensity of jumps by X(t, x) provided that £(t) = x. A jump at time t transfers the process £(t) to a random state y with probability density ir(x,y,i) = p(£(t + 0) = y \ ^(t) = z). Then for small r — t > 0 the transition probability p(x,t;y, T) of this process can be written in the form

p(x, t- y, T)-[!-(T- t)\(x, t)}S(y - x)

+ (T- t)\(x, t)ir(x, y, t) + O(T - t).

(1.1.68)

Just as previously, in (1.1.53) we first set a = t + A, then we set a = r — A, and apply (1.1.68). Passing to the limit as A —)• 0 in (1.1.53), we obtain the following pair of integro- differential Feller equations for the transition probability:

dp(x,t;y,r] dt

— A(x, t) \p(x,t; y, r) — / Tr(t, x, z)p(z, t; j/, r) i

(1.1.69)

dp(x,t\ y, T

~ A (j/i r )p( a ; ;^j ?/i r ) + / X(T, Z)TT(T, z,y)p(x,t; Z,T) dz.

(1.1.70)

32

Chapter I

It follows from (1.1.70) that the one-dimensional unconditional density p(x,t) satisfies the equation

^M = _ A (z,t)p(z,t) + [x(t,z)*(t,z,x)p(z,t)dz.

at

J

(1.1.71)

Equations (1.1.69)—(1.1.71) describe the probability properties of a strictly discontinuous Markov process. They are analogs of Eqs. (1.1.45), (1.1.47), and (1.1.49) for discrete processes and of Eqs. (1.1.58), (1.1.64), and (1.1.66) for diffusion processes. §1.2. Differential equations for controlled systems with random functions

Let us consider system (1.1.6) of stochastic differential equations describing the dynamics of a controlled plant P shown in Fig. 3, namely,

x = g(t,x,u,£(i)). Recall that x = x(t) is an n-dimensional vector-column with components Xi = X i ( t ) , i = 1,..., n, and g(t, x, u, £) is a given vector-function of time t and vectors x, u, and £. We begin the study of stochastic differential equations with the case in which the vector-function g is independent of the control parameters u (the motion without control): £

___

gU

y,

£(£)).

(1.2.1)

In this case, we mainly consider a special case of Eq. (1.2.1), that is, the scalar equation (1.2.2)

The results obtained for (1.2.2) in what follows can readily be generalized to the general case (1.2.1), as well as to the controlled motion case (1.1.6). We shall do this in the end of the section. If the stochastic process £(<) is the time derivative £(i) = rj(t) = dr)(t)/dt of some random function 77(2), then multiplying (1.2.2) by dt we can write Eq. (1.2.2) in terms of differentials: dx(t) = a(t, x(t}} dt + ff(t, x(t)) dr)(t).

(1.2.3)

The stochastic process x(t): t > to, is called the solution of stochastic differential equations (1.2.2), (1.2.3) with initial condition x(to) = x0; the

Synthesis Problems for Control Systems

33

expression on the right-hand side of (1.2.3) is called the stochastic differential of this process if for any t > to we have the integral representation /"*

a(T,x(T))dT +

Jta

/"*

.

ff(r,

(1.2.4)

Jto

Suppose that £ ( t ) in (1.2.2) is the standard white noise with characteristics (1.1.31). Then the stochastic process

ri(t)=r,(0)+ [ t(r)dT

(1.2.5)

is Gaussian and it follows from (1.1.31) that the increment rj (t) — ??(0) of this process has the following mean and variance over time t:

Efo(t) - i,(0)] = 0,

(1.2.6)

D[rj(t) - 77(0)] = Efo(*) - ??(0)]2 = E [ /' £(T) dr] Uo

1

= E / / Jo Jo = 1!

Jo Jo

o

T2) dr^T2 = t.

o

(1.2.7)

It also follows from (1.1.31) that the increments of rj(t) on nonintersecting time intervals are independent random variables, since we have

- »?(
cess r)(t) is called Brownian motion or a Wiener random process. One can show [66] that with probability 1 the realizations of this process are continuous but (in view of (1.1.31) and (1.2.5)) nowhere differentiable functions of time. Formula (1.2.7) illustrates these properties of the Wiener process. Indeed, the order of the increment

is given by the mean square deviation

A) - r,(t)} =

34

Chapter I

Thus, as A —>• 0 we have |Ar?| ~ \/A ->• 0 (the continuity) while the rate |AT?|/A ~ 1/v/A —>• oo (the nondifferentiability). The important formula [175] JV-l

lim V [77(t i + 1 )-T?(t;)] 2 =t-io A^O ^ (to < ti < t 2 < • • • < t

(1.2.8)

= t; A = max(ti+i — ti)) Z

is an immediate consequence of these specific properties of the Wiener process. In (1.2.8) the convergence of the sequence of random sums on the left-hand side to a nonrandom variable on the right-hand side is understood in the sense of convergence in probability (that is, (1.2.8) is satisfied

almost surely). Let us prove (1.2.8). To this end, we take into account the fact that, according to (1.2.6) and (1.2.7), the increment AT?,- = 77(^+1) — 7?(t,-) is a Gaussian random variable with zero mean and variance

DAT?; = E[ij(ti+1) - r,(r,i)}2 = ti+l - ti.

(1.2.9)

We calculate the initial 4-order moment of this increment

E(AT?,-)4 =

1

r ( r2 "I ^= I a;4exp I - —————— \ dx - 3(t,-+1 - t,-)3.

(1.2.10) Let us consider the random sum JV-l

C/v = ^(AT?j) 2 .

(1.2.11)

i=0

It follows from (1.2.9) that JV-l

EOv = ^(t«+i -ti) =t-t0.

(1.2.12)

t=0

Since AT?,- are independent, we have the variance JV-l

D(jv = y~] D(A77i)2. i=0

Taking into account the inequality D(AT?,-)2 < E(AT?i) 4 = 3(ii+i - ti)2, we obtain JV-l

JV-l

< 3 ^(t,-+i - t f ) 2 <3max(ti+i-t,-) ^(t i + 1 -t;) = 3A(i-t 0 ), i=0

*

i=0

Synthesis Problems for Control Systems

35

that is, the variance of (1.2.11) tends to zero as A —>• 0. Thus Chebyshev's inequality

and formula (1.2.12) imply (1.2.8). Now let us return to formula (1.2.4) for the solution of stochastic differential equations (1.2.2), (1.2.3). The right-hand side of (1.2.4) contains integrals of random functions, that is, stochastic integrals. The rigorous theory of such integrals, presented in [75] for the first time, can be found in [66]. Let us consider some special properties of these integrals that distinguish them from common integrals of sufficiently smooth deterministic functions. From the course of mathematical analysis it is known that (nonstochastic) Riemann and Stieltjes integrals are defined as the limits as A —>• 0 of the integral sums a(r, X(T)) dr = lim V" a(r,-, z(r;))(i i+ i - <,-), A—^0

(1.2.13)

i=0

N-l

,t

I ° ~^ »=o

(1.2.14)

obtained by the A-decomposition of the integration interval [to, t]: t$ < ^i < ti < • • •
i = Q,...,N-l. In the stochastic case in which a(r, X(T)) = a(r), cr(r, x(r)) =
and

Jta

,t / <JT Jt0

can also be defined by formulas (1.2.13), (1.2.14) provided that [160]: (1) the random functions a(r) and • 0 as A —>• 0 uniformly in r £ [t0, t] (the same holds for
E

f* a •/to

2

(r)dr
r*

E / cr2(r)dr
36

Chapter I (3) the limit in (1.2.13), (1.2.14) is understood as the mean square limit (recall that a random variable £ is called the mean square limit of the sequence of random variables £ra ,

if E(£ n -£) 2 - » 0 a s n - ^ o o ) . If assumptions (l)-(3) are satisfied, then the limits in (1.2.13) and (1.2.14) exist and the stochastic integrals are well denned by formulas (1.2.13) and (1.2.14). The fact that the value of the stochastic integral (1.2.14) depends on the choice of the points T,- is essentially new in contrast with the deterministic case. This follows from special properties of the Wiener process and formula (1.2.8) (integral (1.2.13) is independent of the choice of TJ). Let us consider this fact in more detail. It follows from the method for constructing the integral sum (1.2.14) that we can replace the continuous function u(r) = (T(X(T),T) on the integration interval [to,t] by a piecewise constant function
= TV{ = (1 - v}ti + vti+l,

0 < v < 1.

(1.2.15)

We show that in this case the value of integral (1.2.14) depends on v.

1 0

(J

\

{

"/ 7

Y(o VJ

to TO ti TI t-2, T2 ts . . . ti

/

^—

'

Tj ti+ i . . . tN_iTff_itN

FIG. 9

t

Synthesis Problems for Control Systems

37

Since the random function X(T) is continuous, we can approximate its realization on the interval [<j, <;+i] (i = 0, . . . , N — 1) by the segment of the straight line

X(T) = x(ti) + X(ti+i} ~ *(ti] (r - ti) + o(A),

t{ < r < ti+i, (1.2.16)

*i + l — H

(relation (1.2.16), (1.2.8), (1.2.13), and (1.2.14) are satisfied almost surely). From (1.2.16) and (1.2.15), we have

x(7f ) = (1 - v ) x ( t t ) + vx(ti+i) + o(A).

(1.2.17)

We assume that the function

(1.2.18)

Using (1.2.18), we transform (1.2.14), which defines the stochastic integral, as follows: - I Jt

= A-)-0 lim Y, <>•(*«•> (! - "H*0 + vx(ti+i))[n(ti+i) ~ " fci

(1.2.19)

In the integral Iv and in the differential of the Wiener process df,rj(T) in (1.2.19), the subscript v indicates the method by which the sum was constructed. For v = 0, formula (1.2.19) defines the Ito stochastic integral

/„=

ff(T,x(T})d0r,(r)= •'*"

lim

~*° i=l

which is widely used in the theory of stochastic processes of diffusion

type [45, 66, 113, 117, 132]. Let us calculate the difference Iv — IQ for an arbitrary v. Since we assume that the function

+ v(ti,x(ti))[x(ti+l)

- x(ti)} + o[x(ti+1) - x(ti)]

(1.2.21)

38

Chapter I

Passing in (1.2.3) from differentials to finite increments and taking into account the fact that tj+i — ti is small, we obtain

+ < r 7 , X 7

= *(ti, x(tt)) fofr+i) - r,(ti)] + 0(A).

(1.2.22)

We substitute (1.2.22) into (1.2.21); then we substitute the result into (1.2.19) and obtain the desired difference of integrals N-l

„

/„ -Jo = Urn Y, v-j£(ti,*(ti))(ti+i) ~*

-ffr)]2-

(1-2-23)

i=0

Let us calculate the limit on the right-hand side of (1.2.23). To this end, following [175], we consider both the A-decomposition i,-, 0 A. For each fixed g-decomposition we define piecewise constant functions f s ( t ) and / (t) whose constant values on the jth part of the e-decomposition are given by the formulas

Obviously, we have JV-l

JV-l

i=0

JV-l

< Y, f e ( t i ) [ r ) ( t i + i ) - rj(ti)}2.

(1.2.24)

8=0

It follows from (1.2.8) that A—>0

^-—'

3 i

3

therefore, the inequality (1.2.24) implies M-l \—^

JV-l a J~

/

e

j=0

£\ f £

f \

•

\—^ V (T

9

A—^0 ^—^ OX i=0 M-l £ e 'f£ f\ / j W / \f 7 + 1

— f£1 j /l '

lH - L -9 - 6 .9^^ /(O(

Synthesis Problems for Control Systems

39

The last inequality holds for any fixed e-decomposition. Since the function
as e —>• 0. Thus the first and the last sums in (1.2.25) have the same limits as e ->• 0: M-l

M-l

}™ E /.(*y)(*y + i - *5) = j™ E 7.(*5)(*y+i - *5) J=0

j=0

=/' Jta

This relation and (1.2.25) imply N-l „ A=0

=

Jtn

Now we return to (1.2.23) and obtain the following relation between the stochastic integral /„ and the Ito integral JQ: I cr(r, x(r))d I/ ?7(r) = Jta

/
7

(T))
(1.2.26)

Jta

Thus we see that the following similar formula holds for any square integrable function $(r, x(r)) that is continuously differentiable with respect to both arguments (provided that the stochastic process x(t) satisfies

Eq. (1.2.3)): / $(r,x(r))cZ^7 ? (r) = / Jto Jta

, XT

+ v I ^(T,x(T)}a(T,x(T)}dT. •/to

(1.2.27)

aa;

We also note that if the function
40

Chapter I

The stochastic integral /„ with parameter v = 1/2 plays an important role. Such integrals were introduced by R. L. Stratonovich [174] and are called symmetrized. The advantage of such integrals is that they can be cal-

culated by using the rules of integration for deterministic smooth functions. The calculation of the stochastic integral

/' Jtn

of the Wiener process is usually presented as an illustrative example [167, 174, 175]. Indeed, in this case the Ito integral can readily be calculated

from (1.2.20) and (1.2.8). By (1.2.20) we have ft

Io = I rj(r) d0r,(r) = lim I j.

iA —^-U

** t n

i=0

We write the ith summand in the form

take into account the equality ^(t^) — r](t) and (1.2.8), sum up, and thus obtain

/" r,(r) d0(r) = 1 \r,2(t] - r,2(t0)} - \(t - t0). Jta

L L

J

(1.2.28)

Z

A similar symmetrized integral can be found from (1.2.28) and (1.2.27) for v = 1/2, CT(T, x) = 1, and $(r, x(r)) = X(T) = ??(T). Since in this case

the second summand on the right-hand side of (1.2.27) is equal to (t—to)/2, we have

that is, the usual formula of integration of deterministic functions is valid for symmetrized stochastic integrals. It follows from the preceding that if the solution of stochastic differen-

tial equations (1.2.2), (1.2.3) is denned as a random function x(i) satisfying (1.2.4), then this definition must be supplemented with a method for calculating the last integral in (1.2.4), since different stochastic integrals determine different solutions of Eqs. (1.2.2) and (1.2.3).

In fact, it follows from (1.2.4) that for all interpretations of the stochastic integral, the solution x ( t ) is a Markov stochastic process, since the initial

Synthesis Problems for Control Systems

41

value x(t0) = XQ and the future increments of the Wiener process ^(T), T > to, independent of XQ, uniquely determine the future (with respect to to) behavior of the stochastic process x(t], t > IQ. The Markov process x(t) is a continuous stochastic processes of diffusion type, therefore, according

to §1.1, its probability properties are completely determined by the drift and diffusion coefficients (1.1.54)

A(t. x] = lim E A-S-O

(1.2.30) It follows from (1.2.4) that t+A

/

»t+A

a(T,x(r))dT+

cr(r,x(r))d??(r).

(1.2.31)

Since the process X(T) and the function
U

t+A

N

a(r, X(T)) dr \ X(T) = x \ = a(t, z)A + o(A). J

(1.2.32)

The result of averaging the second integral in (1.2.31) depends on the definition of the stochastic integral. If we have an Ito integral, then by definition (1.2.20) we have

er(r, z(r)) d0T?(r) | x ( t ) = x i = 0

(1.2.33)

(for any A not necessarily small). This important property of Ito integrals follows from the fact that the increments r}(r) are independent of the integrand CT(T,X(T)) (here a is not an extrapolating function [132]). By the

same reason, for any A, formulas (1.2.8) and (1.2.20) imply .- r yt+A

E{ / ILJt

-I 2

J

-j

/-t+A

x(t) = x\ = ) Jt

Ev2 (r, x(r)) dr.

(1.2.34) From (1.2.29)-(1.2.34), we obtain the local characteristics

A(t,x) = a(t,x),

B(t,x)=
(1.2.35)

42

Chapter I

of the Markov process defined by (1.2.4) on the basis of the Ito integral. If the second integral in (1.2.31) is understood in the sense of (1.2.19), then formula (1.2.34) remains valid after the change dorj(r) —>• dv-q(T) but, instead of (1.2.33), we obtain the following formula from (1.2.26) and (1.2.33):

(T(T, X(T) /•t+A

= E'

In this case, the diffusion process x(t] has the other characteristics Alt. = alt. x ) + v —{ (t, x]<j(t. cc), V x] V ' ; ' ' dx ' ' { ' ;' B(t,x) =
(1.2.37)

Thus, to avoid misunderstanding, it were more correct from the very beginning to write the stochastic differential equation (1.2.3), say, in the form dx(t) = a ( x ( t ) , t] dt +
dx(t) = A(t, x) [_

t/
£

I

dt +

^/B(t,x)dvi)(t)

(1.2.39)

corresponding to different values of the parameter v. If the Markov process x(t) is denned by the Ito equation dx(t) = a(t, x(t)} dt + a(t, x ( t ] } d0r)(t),

(1.2.40)

then Eq. (1.2.38), equivalent to (1.2.40), has the form

dx(t)

= \a(t, x ( t ) } - v L

a

^X'v(t, x)\dt + a(t, x) dvr)(t). ox \

(1.2.41)

Synthesis Problems for Control Systems

43

Finally, if x(t) is defined by (1.2.38), then the Ito equation corresponding to this process has the form dz(tr) = \a(t, x(t)) + v
ax

L

j

dt +
(1.2.42)

(formulas (1.2.39), (1.2.41), and (1.2.42) readily follow from (1.2.35) and (1.2.37)). From (1.2.38), (1.2.40)-(1.2.42) we can see that different forms of differential equations can readily be transformed into each other. In this connection, the following two questions arise immediately: (1) Do we need different forms of Eq. (1.2.38) at all? (2) Is it possible to use only one definite form of stochastic equations, say, the Ito form, and to consider all differential equations of the form (1.2.3) as Ito equations? The last question has an affirmative answer. Namely, in the majority of mathematical papers [44, 45, 56, 66, 113, 132] it is postulated from the beginning that the motion of a controlled system is described by Ito differential equations and the theory is constructed via Ito stochastic differentials and integrals. The answer to the first question is based on advantages and disadvantages of different forms of stochastic equations and integrals, on whether an actual process is adequate to its mathematical model, and on the character of the problem considered. The main advantage of Ito stochastic differential equations is the fact that their solutions x(t) are martingales; this follows from (1.2.4), the definition of the Ito integral (1.2.20), and formula (1.2.33). This fact allows us to study the processes x(t) by rather general methods from the theory of martingales [132]. Moreover, if we use the Ito equation, then we obtain many formulas, for example, in the theory of filtration (§1.5), in the most compact form. However, sometimes it is inconvenient to use Ito differentials and integrals, because very often we cannot use formulas from the common analysis for operations with Ito processes. This was already pointed out when integral (1.2.28) was calculated. A similar situation arises when we differentiate a function of the stochastic process x(t) that is a solution of the Ito equation (1.2.40). Suppose that
[

A

(^

-(t, x(t))

44

Chapter I

2 + (t, x(t)) d0rj(t).

(1.2.43)

(sX

But if we use the usual differentiation rule, then we have

dip(t,x(t}} = hj£?>(*> : L

cr(t,x(t))^(t,x(t))dr,(t)

(1.2.44)

for the differential of a composite function (p (under the condition that x(t) satisfies Eq. (1.2.40) with usual differential dr)(t)).

The outlined computational difficulties disappear if we use the symmetrized form of stochastic integrals and equations. This has already been shown for integration when we calculated the integral (1.2.28). Let us show

that the usual formula of the composite function differentiation (1.2.44) holds for the stochastic process x ( t ) defined by the symmetrized differential equation (that is, by Eq. (1.2.38) with v = 1/2). The proof of this statement is indirect. Namely, we show that formula (1.2.44) for x(t), defined by the symmetrized stochastic equation

dx(t) = a(t, x ( t ) ) dt + a(t, x ( t ) ) d l / 2 r j ( t ) ,

(1.2.45)

implies formula (1.2.43) for x(t], defined by the Ito equation (1.2.40). Indeed, it follows from (1.2.41) that the symmetrized equation equivalent to (1.2.40) has the form

r 1 ^ ~i dx(t) = \a- -a—- \ dt + cr dl/2r](t)

I

2 axj

(the arguments of a and cr are omitted). From this relation and (1.2.44) we

obtain the symmetrized stochastic differential

*•<'"<<» = Now we note that (1.2.27) implies

*(*, z(t)) dor)(t) +

(t, x(t))a(t, x(t)) dt. (1.2.47)

By setting $ = crdtp/dx in (1.2.47), we obtain the Ito stochastic differential (1.2.43) from (1.2.46) and (1.2.47).

Synthesis Problems for Control Systems

45

Now let us consider another problem, which is extremely important from the viewpoint of applications. This is the question whether our mathematical model is adequate to the actual process in the dynamic system with

random perturbations. One of the starting points in the theory of optimal control is the assumption that the equation of motion of a dynamic sys-

tem is given a priori (§1.1). Suppose that the corresponding equation has the form (1.2.2) or (1.2.3). We have already shown that one can construct infinitely many solutions of such equations by choosing one or other form

of stochastic integrals and differentials. Which solution from these infinitely many ones corresponds to the actual stochastic process in the system? Does this solution exist? The answers can be obtained only by analyzing specific physical premises that lead to Eqs. (1.2.2), (1.2.3). Such investigations were performed in [167, 173, 175, 181], whose basic results relative to Eqs. (1.2.2), (1.2.3) we state without details.

If we consider the solution x ( t ) of Eq. (1.2.3) as a continuous model for a stochastic discrete time process Xk = s(fcA), k — 0 , 1 , 2 , . . . , which is computer simulated according to the formula

Zfc+i = xk + a(fcA, xk)A + (r(kA, xk)£k+l

(1.2.48)

(£fc > k — 1, 2 , . . . , is a sequence of independent identically distributed Gaussian random variables with zero mean and variance D£fc = A), then as A —>• 0 the sequence Xk (under the linear interpolation with respect to t between

the points t^ = kA) converges in probability to the solution x ( t ) of (1.2.3), provided that the latter is the Ito equation. If the motion of a dynamic system is given by (1.2.2) (stochastic equations of the form (1.2.2) are called Langevin equations [127]), where £(i) is a sufficiently wide-band stationary stochastic process (for example, the Gaussian Ornstein-Uhlenbeck process with the autocorrelation function

R^(T) = (a/2) exp{-a T\} for large values of a), then the solution of (1.2.2) coincides with the solution of the symmetrized equation (1.2.3), that is, of (1.2.38) with v = 1/2. In particular, each simulation of the Langevin equation (1.2.2) with an "actual" white noise by using analog computers gives

a symmetrized solution of Eq. (1.2.3) (see [37]). In the present monograph, all stochastic equations given in Langevin

form (1.2.2) with the white noise £(i) are understood in the symmetrized sense. In what follows, the symmetrized form of stochastic equations is used rather often, since this is the most convenient form for calculations relative to transformations of random functions, the change of variables, etc. In this connection, we omit the index i> = 1/2 in the stochastic differential.

The subscript 0 in the differential dor](t) in Ito equations is used if and only if the Ito equation and the corresponding symmetrized equation have

46

Chapter I

different solutions. In other cases, just as in symmetrized equations, we write stochastic differentials without subscripts. Stochastic integrals and differentials that correspond to other values of v [191, 192] are basically of theoretical interest. We shall not consider them in what follows.

In conclusion, let us consider some possible generalizations of the results obtained. First we note that all above-mentioned facts for scalar equations

(1.1.2), (1.2.3) can readily be generalized to the multidimensional case, so that the form of (1.2.2), (1.2.3) is preserved, provided that x 6 Rn and £(t) (»?(<)) are n-dimensional vector-columns of phase coordinates and random functions and the functions a and cr are an n-column and an n x n matrix. If necessary, the corresponding systems of equations can be written in more detail, for example, instead of (1.2.2), we can write

Xi = a i ( t , x ) + < T i j ( t , x ) £ j ( t ) ;

i=l,...,n

(1.2.49)

(as usual, the summation is taken over repeated indices if any, that is, in (1.2.49) we have cr^j = YTj=iaij^j}Systems (1.2.2) and (1.2.3) (or (1.2.49)) determine an n-dimensional Markov process x(t) with the vector of drift coefficients Ai(t, x) = ai(t, x) +

°

' X>
t = l,...,n,

(1.2.50)

and the matrix of diffusion coefficients

B(t,x) =
(1.2.51)

If the process x(t) is denned by the Ito equation (1.2.40), then, instead of (1.2.50) and (1.2.51), we have

A(t,x) = a(t,x),

B(t,x) = ( r ( t , x ) < T T ( t , x ) .

(1.2.52)

According to [173], stochastic equations of the more general form (1.2.1) can always be represented in the form (1.2.2). Indeed, as shown in [173], if

random functions £(t) in (1.2.1) have a small correlation time (for example, one can assume that £(<) is an n- vector of independent stochastic processes of Ornstein-Uhlenbeck type with a large parameter a), then Eq. (1.2.1) determines a Markov process with the following drift and diffusion coefficients:

,

(1.2.53)

Synthesis Problems for Control Systems j(t

+ T,x,t(t + T))]dT

47

(1.2.54)

(here K[a,/3] = E(a — Ea)(/3 — E/3) denotes the covariance of random variables a and /3; moreover, the mean Eg; and the correlation functions in (1.2.53) and (1.2.54) are calculated under the assumption that the argument x is a nonrandom fixed vector). Since similar characteristics of the Markov process defined by (1.2.2) (or by (1.2.49)) have the form (1.2.50), (1.2.51), we can obtain the differential equation (1.2.2), which is stochastically equivalent to (1.2.1), by solving system (1.2.50), (1.2.51) with respect to the unknown variables a,- and
. 1 daij a, - Ai - - ~-o-kj\ 2 dxk

. i-l,...,n.

It follows from the preceding that to study Markov processes of diffusion type, without loss of generality, we can consider stochastic equations only in the form (1.2.2.), (1.2.3) or (1.2.40). Therefore, the most general form of differential equations of motion of a controlled system with random perturbations £(t) of the white noise type is given by the equation

(1.2.55) or by the equivalent equation dx(t) - a(t, x(t),u(t)) dt + a(t, x(t), u(t)) drj(t)

(1.2.56)

(in (1.2.55) £(<) is the standard white noise with the characteristics (1.1.34); in (1.2.56) r,(t)= f £(T)dT,

Jo

77(0) = 0,

is the standard Wiener process). In (1.2.55) and (1.2.56) u = u(t) is understood as the control algorithm (1.1.2). The form of this algorithm can be found by solving the Bellman equation.

48

Chapter I §1.3. Deterministic control problems. Formal scheme of the dynamic programming approach

The dynamic programming approach [14] was proposed by R. Bellman

in the fifties as a method for solving a wide range of problems relative to processes of multistage choice. In this section we briefly discuss the main idea of this method applied to synthesis problems for optimal feedback control systems [16, 17]. We begin with deterministic problems of optimal

control and pay the main attention to the algorithm of the method, that is, to the method for constructing the optimal control in the synthesis form.

Let us consider the control problem with free right endpoint of the trajectory, in which the plant is given by system (1.1.5)

x(t)=g(t,x(t),u(t)),

0
z(0) = x 0 ,

(1.3.1)

the performance criterion is a functional of the form (1.1.11)

(1.3.2) and the control vector u may take values at each moment of time in a given

bounded set U C R r , u(t) e U.

(1.3.3)

In problem (1.3. !)-(!. 3. 3) the time interval [0,T] and the initial vector of phase variables XQ are known; it is required to find the control function u*(t): Q
represented in the form

u,(t) =

(1.3.4)

where the current values of the control vector are expressed in terms of the

current values of phase variables of system (1.3.1). The optimal control of the form (1.3.4) is called optimal control in the synthesis form, and formula (1.3.4) itself is often called the algorithm of optimal control. The dynamic programming approach allows us to obtain the optimal control in the synthesis form (1.3.4) for problem (1.3. 1)-(1. 3.3) as follows. We write

U

t
T

-I

C(T, x(r), U(T)) dr + i>(x(T)}

J

.

(1.3.5)

Synthesis Problems for Control Systems

49

The function F(t,xt), called later the loss function,6 plays an important role in the method of dynamic programming. This function is equal to the minimum value of the functional (1.3.2) provided that the control process is considered on the time interval [t,T], 0 < t < T, and the vector of phase variables is equal to x(i] = xt at the beginning of this interval (that is, at time t). In (1.3.5) the minimum is calculated over all possible strategies U(T) =
(a) these functions take values in an admissible set U] (b) for any t € [0,T] the Cauchy problem for system (1.3.1), x(r)=ff(r,z(T),y»(r,x(r))),

t < T < T,

x ( t ) = xt,

has a unique solution X(T) : t < T < T.

The dynamic programming method is based on the Bellman optimality principle [14, 17], which implies that the loss function (1.3.5) satisfies the basic functional equation r ft

F(t,xt)=

min

_

1

/ c(o-,x(o-),u(er)) da + F(t,x7) \,

n(tT)£U

Ut

(1.3.6)

J

t<<7<*

for all t <E [t,T]. For different statements of the optimality principle and comments see [1, 16, 50]. However, here we do not discuss these statements, since to derive Eq. (1.3.6) it suffices to have the definition of the loss function (1.3.5) and to understand that this is a function of time and of the state x(t) = xt of the controlled system (1.3.1) at time t (recall that the control process is terminated at a fixed time T). To derive Eq. (1.3.6), we write the integral in (1.3.5) as the sum Jt = It ~*~ Jf °f *wo integrals and write the minimum as the succession of minima

min = min u(r)6U t
min .

«(
Then we can write (1.3.5) as follows:

. . t<(T
U

t c(er, x(ff),

u(cr)} dcr

t
I c(p,X(p},u(p)}dp+^(x(T)]\. Jt \

(1.3.7)

The function (1.3.5) is also called a value function, a cost function, or the Bellman

function.

50

Chapter I

Since, by (1.3.1), the control u(p) on the interval [I, T) does not affect the solution x(cr} of (1.3.1) on the preceding interval [t,t), formula (1.3.7) takes the form F(t,xt) = min < / c(a;x(a-),u(cr)) da u(
( Jt

t<er
+ mm \ f

c(p,x(p),-u(p))dp+il>(x(T))\\.

u(p)€U Ut t
(1.3.8)

J )

Now, since by (1.3.5) the second term in the braces in (1.3.8) is the loss

function F(t,Xf),

we finally obtain Eq. (1.3.6) from (1.3.8).

The basic functional equation (1.3.6) of the dynamic programming approach naturally allows us to derive the differential equation for the loss

function F(t,x). To this end, in (1.3.6) we set t = t + A, where A > 0 is small, and obtain

F(t,xt) =

min «(
F /"*+A 1 / c(
1

Since the solutions x(t) of system (1.1.3) are continuous, the increments (E*+A —xt) of the phase vector are small for admissible controls u(t) = (t, x(t)). Taking into account this fact and assuming that the loss function

F(t, x) is continuously differentiable with respect to all its arguments, we can expand the function F(t + A, K«+A) into its Taylor series about the point (t, Xt) as follows:

(1.3.10) In (1.3.10) dF/dx denotes an n-vector column with components dF/dxi, i— 1, 2, . . . , n; therefore, the third term on the right-hand side of (1.3.10) is the scalar product of the vector of increments (CC<+A — xt) and the gradient of the loss function

dF A. .dF xO —— = ^(Xit+^-Xit) —— . the function o(A) denotes the terms whose order is larger than that of the infinitesimal A. It follows from (1.3.1) that for small A the increment of the phase vector x can be written in the form -xt- g(t, xt, w«)A + o(A).

(1.3.11)

Synthesis Problems for Control Systems

51

Writing the first term in the square brackets in (1.3.9) as t+A

c(tr, x(a), u(cr)) da = c(t, xt, w«)A + o(A),

/.

(1.3.12)

substituting (1.3.10) and (1.3.12) into (1.3.9), and taking into account

(1.3.11), we arrive at T dF F(t, xt) = miu c(<, xt, u t )A + F(<, xt) + -^r(t, «t)A tf t G y LI

C7t

ftW

1

+ f f T (f, x ( , w t )-^-(i, a;t)A + o(A) . (1.3.13) ox j Note that only the first and the fourth terms on the right-hand side of (1.3.13) depend on the control ut. Therefore, the minimum is calculated

only over these terms, the other terms in the brackets can be ignored. Dividing (1.3.13) by A, passing to the limit as A —>• 0, and taking into account the fact that lirriA-KD o(A)/A = 0, we obtain the following differential equation for the loss function F(t, x):

at

,

, ,

,,)jr(t,x)

=0

(1.3.14)

(here we omit the subscript t of the phase vector Xt and the control Ut). Note that the loss function F(t, x) satisfies Eq. (3.1.14) on the entire interval of control 0 < t < T except at the endpoint t = T, where, in view of (1.3.5), the loss function satisfies the condition

F(T,x) = i/>(x).

(1.3.15)

The differential equation (1.3.14), called the Bellman equation, plays the central role in applications of the dynamic programming approach to the synthesis of feedback optimal control. The solution of the synthesis problem, that is, the optimal strategy or the control algorithm u* (t) = y>* (t, x) =

( p f ( t , x ( t ) ) can be found simultaneously with the solution of Eq. (1.3.14). Namely, suppose that we have somehow found the function F(t, x) that satisfies (1.3.14) and (1.3.15). Then the expression in the square brackets in (1.3.14) is a known function of i, x, and u. Calculating the minimum of this function with respect to u, we obtain the optimal control u* — y*(<, x) (it* determines the minimum point of this function in U C R r )If the functions c(t, x, u) and g(t, x, u) and the set of admissible controls

U allow us to minimize the function in the square brackets explicitly, then the optimal control can be written in the form

(1.3.16)

52

Chapter I

where dF/dx is a vector of partial derivatives yet unknown; when we minimize the function in the square brackets in (1.3.14), we assume that this vector is given. Using (1.3.16) and denoting

f , , T, , d F . , mm c(t, x, u) + g1 (t, x, u) ——(t, x) u£U \_

OX

dF

9F\\dF —)J — (1.3.17)

we write (1.3.14) without the symbol "min" as follows:

dF ( dF \ — ( t , x ) + 3>(t,x,—(t,x)}=Q, at \ ox J

Q
1.3.18

To complete the synthesis problem, it is necessary to solve (1.3.18) with regard to (1.3.15), that is, to find the function F(t, x) that satisfies (1.3.18) for 0 < t < T and continuously tends to a given function if>(x) as t —>• T,

and to substitute the function F(t, x) obtained into (1.3.16). In practice, the main difficulty in this synthesis procedure, is related to solving the Bellman equation (1.3.14) or (1.3.18), which is a first-order partial differential equation. The main distinguishing feature of the Bellman equation is that it is nonlinear because of the symbol "min" in (1.3.14), which shows that the function $ in (1.3.18) nonlinearly depends on the components of the vector of the partial derivatives dF/dx. The character of this nonlinearity is determined by the form of the functions c(i, x, u) and g(t, x, u), as well as by the set of admissible controls U. Let us consider some typical illustrative examples.

1°. Suppose that c(t,x,u) = ci(t,x) + uTP(t,x)u, where P is a symmetric r x r matrix positive definite for all x 6 R and t £ [0, T], g(t, x, u) — a(t, aj) + Q(t, x)u (a is an n-vector and Q is an n x r matrix), and the control u is unbounded (that is, U = R r ). Then the expression in the square brackets in (1.3.14) takes the form /) W

f) J?

[•} = Cl(t, x) + aT(t, x)^ + uTQT(t, x)——+ uTP(t, x)u.

(1.3.19)

By differentiating this function with respect to u and solving the system d[-]/du = 0, we obtain 1

/) F

^

(s3j

= --P~l(t,x)QT(t,x)--

(1.3.20)

Synthesis Problems for Control Systems

53

(the matrix P~l is the inverse of P). Substituting (1.3.20) into (1.3.19) instead of u, we obtain Q p\ ap t,x, —— J = c i ( t , a ; ) + a T ^ , x ) ——

(

1 f) W

3W

T - rt-,^rQ(t^}P-\t^}Q (t^}-^C*«K C/tC

(1-3-21)

2°. Suppose that c(i,«,w) = ci(t,x), g(t,x,u) = a(t,x) + Q(t,x)u, and the domain U is an r-dimensional parallelepiped, that is, |w,- < tioi, i = 1,..., r, where the numbers u0» > 0 are given. One can readily see that
dF t, x)——

( L3 - 22 )

9F

where sign A and \A\ are matrices obtained from A by replacing each its element a,-j by signojj and |ctjj|, respectively; {MOII • • • > U 0r} denotes the diagonal r x r matrix with itoi, • • •, «0r on its principal diagonal. 3°. Let the functions c(-) and g(-) be the same as in 2°; for the domain U, instead of a parallelepiped, we take an r-dimensional ball of radius RQ centered at the origin. Then, instead of (1.3.22), we obtain the following expressions for the functions tpo and 4>:

dX

——

"-'

1/2

(1.3.23)

Note that in (1.3.23) and in the following, dF/dxT denotes an n-row-vector with components dF/dxt, i = 1,..., n. Therefore, the function •jjrQQ'1'ffis a quadratic form in components of the gradient vector of the loss function, and the matrix QQT is its kernel. As a rule, the nonlinear character of Bellman equations does not allow one to solve these equations (and the synthesis problem) explicitly. There is only one exception, namely, the so-called linear-quadratic problems of optimal control (LQ-problems). In this case the differential equations (1.3.1) of the plant are linear:

x ( t ) = A(t)x(t) +B(t)u(t)

54

Chapter I

(here A(t) and B(t) are given nxn and nxr matrices), the penalty functions c(t, K, u) and i/'(x) m ^ne optimality criterion (1.3.2) are linear-quadratic forms of the phase variables z and controls u, and there are no restrictions

on the domain of admissible controls (that is, U — Rr in (1.3.3)). Let us solve the synthesis problem for the simplest one-dimensional LQproblem with constant coefficients; in this case, the solution of the Bellman equation and the optimal control can be obtained as finite analytic formulas. Suppose that the plant is described by the scalar differential equation

x = ax + bu,

(1.3.24)

and the optimality criterion has the form [•T

I(u) = clX2(T) + I

Jo

[cx2(t) + hu2(t)} dt

(1.3.25)

(ci > 0, c > 0, T > 0, h > 0, and a and b in (1.3.24) and (1.3.25) are given constant numbers). The Bellman equation (1.3.14) and the boundary condition (1.3.15) for problem (1.3.24), (1.3.25) have the form

9F f dF 1 —— (*,£)+min \cx*+ hu* + (ax + bu)——(t,x}\ =0, x

ot

"

" L

J

F(T,x) = Clx2.

(1.3.26)

(1.3.27)

The expression in the square brackets in (1.3.26) considered as a function of M is a quadratic trinomial. Since h > 0, this trinomial has the single minimum

which can readily be obtained from the relation d[-]/du = 0; this is a necessary condition for an extremum. Substituting M* into (1.3.26) instead of u and omitting the symbol "min", we rewrite the Bellman equation in the form

dF dt

,

dF dx

b2 IdF\2 =o, 4h\dxJ

—— + ex2 + ax _ _ _ _

0
~

v(1.3.29)

'

We shall seek the loss function F(t,x) satisfying Eq. (1.3.29) and the boundary condition (1.3.27) in the form

F(t, x) = p(t)x2,

(1.3.30)

Synthesis Problems for Control Systems

55

where p(t) is the desired function of time. If we substitute (1.3.30) into (1.3.29), then we see that p(t) must satisfy the ordinary differential equation

p+c+2ap-~p2 = 0 h

(1.3.31)

for 0 < t < T. Moreover, it follows from (1.3.27) and (1.3.30) that the function p(t] assumes a given value at the right endpoint of the control interval:

p(T) = ci.

(1.3.32)

Equation (1.3.31) can readily be integrated by separation of variables. The boundary condition (1.3.32) determines the unique solution of (1.3.31). Performing the necessary calculations, we obtain the following function p(i) that satisfies Eq. (1.3.31) and the boundary condition (1.3.32):

P(

'~

(J3- a)h - [62d - h(a

Thus it follows from (1.3.28) and (1.3.30) that the optimal control in the synthesis form for problem (1.3.24), (1.3.25) has the form

«, =
(1.3.34)

where p(t) is determined by (1.3.33). Note that problem (1.3.24), (1.3.25) is one of few optimal control problems, for which the Bellman equation can be solved exactly. In Chapter II we consider some other examples of exact solutions to synthesis problems

of optimal control (for deterministic and stochastic control systems). However, the majority of the optimal control problems cannot be solved exactly. In these cases, one usually employs approximate and numerical synthesis methods considered in Chapters III-VII. We complete this section with some remarks. First we note that we have considered only a formal scheme or, as is said sometimes, the algorithmic essence of the dynamic programming approach.

The described method for constructing an optimal control in the synthesis form (1.3.4) is justified by some assumptions, which are violated sometimes. We need to take into account the following. (1) The loss function F(x,t) determined by (1.3.5) is not always differentiable even if the penalty functions c(t,x,u) and i{>(x) are sufficiently

56

Chapter I

smooth (or even analytic) functions. It is well known that, by this reason, the dynamic programming approach cannot be used for solving many time-optimal control problems [50, 156]. (2) Even in the case where the loss function F(x,t) satisfies the Bell-

man equation (1.3.14), the control u*(i, x] minimizing the function in the square brackets in (1.3.14) may not be admissible. In particular, this control can violate the existence and uniqueness conditions for the solution of

the Cauchy problem for system (1.3.1). (3) The Bellman equation (1.3.14) (or (1.3.18)) with the boundary condition (1.3.15) can have nonunique solutions. Nevertheless, we have the following theorem [1].

THEOREM. Suppose that there exists a unique continuously differentiable solution Fo(t,x) of Eq. (1.3.14) with boundary condition (1.3.15) and there exists an admissible control u*(t,x] such that

[

Ft W 1 3F c(t,x,u)+gT(t,x,u)^-(t,x)\ = c(t,x,ut) + gT(t, x, ut)— —(t, x). ox J ox

Then the control w*(t, x) in the synthesis form is optimal, and the function Fo(t,x) coincides with the loss function (1.3.5).

In conclusion, we point out another fact relative to the dynamic programming approach. The matter is that this method can be used for solving problems of optimal control for which the optimal control ut(t, x) does not exist. For example, such situations appear when the domain of admissible controls U in (1.3.3) is an open set. The absence of an optimal control does not prevent us from deriving the

basic equations for the dynamic programming approach. It only suffices to modify the definition of the loss function (1.3.5). So, if we define the function F(t, xt) as the greatest lower bound of the functional in the square brackets in (1.3.5),

U

T

i

C(T, X(T),U(T)) dr + 1>(x(T))

J

,

(1.3.35)

t
then one can readily see that the function (1.3.35) satisfies the equations

c(a, x(a), u(
(1.3.36)

J

t<<7
3F

r

ft F

~\

ot

utu |_

ox

\

—— ( t , x ) + inf \c(t, x, u)+gT(t, x,u)-—(t, x)\ = 0 ,

(1.3.37)

Synthesis Problems for Control Systems

57

which are similar to Eqs. (1.3.6) and (1.3.14). However, in this case the functions u*(t, x) realizing the infimum of the function in the square brackets in (1.3.37) may not exist. Nevertheless, the absence of an optimal control u*(t,x) is of no fundamental importance in applications of the dynamic programming approach,

since if the lower bound in (1.3.37) is not attainable, one can always construct the so-called £-optimal strategy ue(t, x). If this strategy is used in system (1.3.1), then the performance functional (1.3.2) attains the value I(uc) = -F(0, Xo) + e, where e is a given positive number. Obviously, to construct an actual control system, it suffices to know the e-optimal strategy ue(t, x) for a small e. Here we do not describe methods for constructing £-optimal strategies. First, these methods are considered in detail in the literature (see, for example, [113, 137]). Second (and this is the main point), the optimal control always exists in all special problems studied in Chapters II-VII. This is the reason that, from the very beginning, in the definition of the loss function (1.3.5) we use the symbol "min" instead of a more general symbol "inf." §1.4. The Bellman equations for Markov controlled processes

The dynamic programming approach is widely used for solving stochastic problems of optimal control. In this section we consider the control

problems in which the controlled process is a Markov stochastic process. It follows from the definition of the Markov processes given in §1.1 that the probabilities of future states of a controlled system are completely determined by the current states of the vector of phase variables, which are assumed to be known at any time t.

FIG. 10 One can readily see that the servomechanism shown in Fig. 10 possesses the listed Markov properties if the following conditions are satisfied: (1) the joint vector (y(i),x(t)) of instant values that define the input actions and output variables is treated as the phase vector of the system;

58

Chapter I

(2) the input action y(t) is a Markov stochastic process; (3) the random perturbation £(i) is a white noise type process; (4) the controller C is a noninertial device that forms current values of the control actions u(t) according to the rule

u(t) = v(t,x(t),y(t}).

(1.4.1)

Actually, if the plant P is described by equations of the form (1.2.55)

and y(t) is a Markov process with known probability characteristics, then it follows from (1.2.55) and (1.4.1) that the joint vector x(t) = (x(i),y(t)) is a Markov process. In particular, if y(t) is a diffusion process with drift coef-

ficient A(t, y) and diffusion coefficient B(t, j/), then it follows from (1.2.39), (1.2.55), and (1.4.1) that the vector x(t) satisfies a system of stochastic differential equations of the form (1.2.2), that is, x(t) is a diffusion Markov process.

In this section we deal only with systems of the type shown in Fig. 10. In §1.5 we consider the possibilities of applying the dynamic programming approach in a more general situation with non-Markov controlled process (Fig. 3). Later we shall derive the Bellman equations for various stochastic problems of optimal control that are studied in Chapters II— VII. These problems

were stated in §1.1. 1.4.1. Base problem. Optimal tracking of a diffusion process. As the basic problem we consider the synthesis of the optimal ser-

vomechanism shown in Fig. 10 under the following conditions: (i) the controlled plant P is described by a system of stochastic differential equations of the form x(t)=a(t,x(t),u(t))+a(t,x(t})£(t),

z(0) = xo,

0 < * < T, (1.4.2)

where x G Rn is a vector of controlled output variables, u € Rr is a vector of control actions, ^(t) is the n-dimensional standard white noise

with characteristics (1.1.34), a and a are a given matrix and a vectorfunction, and the initial vector x(Q) = x$ and the time interval [0,T] are specified; (ii) the optimal control is sought in the form (1.4.1), the goal of control

is to minimize the functional

o

J

(1.4.3)

(iii) the restrictions on admissible controls have the form

«(*) G U,

(1.4.4)

Synthesis Problems for Control Systems

59

where U is a given bounded closed subset in the space R r ; (iv) the input stochastic process y(t) is independent of £ ( t ) and is an ra-dimensional diffusion Markov process with a known vector Ay(t,y) of drift coefficients and with matrix By(t, y) of diffusion coefficients;7 (v) there are no restrictions on the phase variables, that is, on the components of the vector x = (a;, y) e R n + m ; the current values of the components of the joint vector x can be measured precisely at any instant of time

te[o,T]. By analogy with (1.3.5) we determine the loss function F(t,xt,yt) for problem (i)-(v) as follows: F(t,xt,yt) = min E u(r)£U

r ,-T / C(X(T), j/(r),u(r)) dr Ut

t
+ i>(x(T),y(T))

x ( t ) = xt,y(t)=y^.

(1-4.5)

The loss function (1.4.5) for stochastic problem (i)-(v) differs from the

loss function (1.3.5) in the deterministic case by the additional operation of averaging the functional in the square brackets in (1.4.5). The averaging in (1.4.5) is performed over the set of sample paths xj = [X(T) : t < r < T],

y? = [y(r): t < T < T], that on the interval [t,T] satisfy the stochastic differential equations (1.4.2) and (*) (see the footnote) with initial conditions x(t) = Xt, y(t) = yt and control function u(r) = tf(r, x(r), y ( r ) j ,

t < T < T. Since the process X(T) — (z(r),j/(r)) is Markov, the result of averaging jF<,,(i, Xt) = F,f,(t, xt, yt) = £[•] in (1.4.5) is uniquely determined by the time moment t, by the state vector of the system xt — (xt, yt) at this moment, and by a chosen algorithm of control, that is, by the vector-function ip(-) in (1.4.1). Therefore, it turns out that the loss function (1.4.5) obtained by

minimizing Fv(t,xt) = F^t, xt, yt) over all admissible controls8 (that is, over all admissible vector-functions (•)) depends only on time t and the state (xt,yt) of the servomechanism (Fig. 10) at this time moment. As was shown in §1.2, the coefficients Av(t,y) and Bv(t, y) uniquely determine the system of stochastic differential equations

y(t) = a»(t,y(t))+v»(t,y(t))r,(t),

(*)

whose solutions are sample paths of the Markov process y(t); in (*) rj(t] denotes the standard white noise (1.1.34) independent of £(*). 8 Just as in the deterministic case (§1.3) the control in the form (1.4.1) is called

admissible if (i) for all t 6 [0,T), x 6 R n , and y € R m , the vector-function
60

Chapter I

One can readily see that, for any t (E [t,T], the loss function (1.4.5) satisfies the equation

F(t,xt,yt) = mm E\ f c(x(T),y(T),u(T))dT+F(t,xT,W)\ u(r)£U

IJt

(1.4.6) J

t
that is a stochastic generalization of the functional equation (1.3.6). The

averaging in (1.4.6) is performed over the sample paths x\ and y*, and the symbol E[-] in (1.4.6) indicates the conditional expectation E^j 7, [•]. To prove (1.4.6), we write E^T j, 7 <•(•) for the conditional expectation T '^7 1^*'"*

of a functional of phase trajectories denoted by (•). Here we average over

all possible sample paths x'j = [X(T) : t < T < T], y'j = [y(r) :t
writing the integral in (1.4.5) as the sum Jt

= ft + fj

of two integrals

and writing the minimum as the succession of minima min = min t
min ,

t<<7
we can rewrite (1.4.5) as

r

min

min

v(
ft

T,s[[ / c(x(
(1.4.8) It follows from (1.4.1) and (1.4.2) that the controls u(p) on the time interval t ~t c(x(o-),y(
J t

Synthesis Problems for Control Systems

61

we can rewrite (1.4.8) in the form

F(t,xt,yt)=

min E 7 7,,. u(
i>ytt

U(

| / c(x(a),y(cr),u(cr))

dff

(. Jt

+ min E s T i y T | s T y 7[jr c ( x ( p ) , y ( p ) , u ( p ) ) d p + i > ( x ( T ) , y ( T ) ) ] ^ . t
Since the process (z(£), y(t)} is Markov, the result of averaging in the second term in the braces in (1.4.9) depends only on the terminal state of a fixed sample path (xj, y\}. Thus, replacing E^T^T^T,,! by E^T !,,._,,_ and taking into account the fact that, by (1.4.5), the second term in (1.4.9) is the loss function F ( t , x ^ , i f f ) , we finally obtain the functional equation (1.4.6) from

(1.4.9). Just as in the deterministic case, the functional equation (1.4.6) allows us to obtain a differential equation for the loss function F(t,x,y). By setting

t = t + A, we rewrite (1.4.6) in the form r yt+A

F(t,xt,yt)=

min

E

/

C(X(T), y(r), U(T)) dr

.

(1.4.10)

Assuming that A > 0 is small and the penalty function c(x,y,u) is continuous in its arguments and having in mind that the diffusion processes X(T) and J/(T) are continuous, we can represent the first term in the square brackets in (1.4.10) as

c(x(r), 2/(r), w(r)) dr = c(z t , j/t, ut) A + o(A),

(1.4.11)

where, as usual, the function o(A) denotes infinitesimals of higher order than that of A. Now we assume that the loss function F(t,x,y) has continuous derivatives with respect to t and continuous second-order derivatives with respect to phase variables x and y. Then for small A we can expand the function

62

Chapter I

F(t + A, Xt+Ai Vt+A.) in the Taylor series

= -F^, xt, yt) + A

*9 F

f)F

x T

«+ -yt) + 2

+ o(A) + o(|i« +A - it 2 ) + o(|y t+A - yt\2).

(1.4.12)

Here all derivatives of the loss function are calculated at the point (t, xt, yt),

as usual, dF/dx and dF/dy denote the n- and m-column- vectors of partial derivatives of the loss function with respect to the components of the vectors x and y, respectively; d2F/dxdxT , d2 F / dxdyT , and d2F/dydyT denote the n x n, n x m, and m x ra matrices of second derivatives.

To obtain the desired differential equation for F(t,x,y), we substitute (1.4.11) and (1.4.12) into (1.4.10), average, and pass to the limit as A —>• 0. Note that if we average expressions containing the random increments

(xt+A. — %t) and (y«+A — y t ) , then all derivatives of F in (1.4.12) are considered as constants, since they depend on (t,xt,yt) and the mathematical expectation in (1.4.10) is calculated under the assumption that the values of Xt and yt are known and fixed. The mean values of the increments (xt+& — Xt) can be calculated by integrating Eqs. (1.4.2). However, we can avoid this calculation if we use the results discussed in §1.2. Indeed, if just as in (1.4.11) we assume that the control U(T) is fixed and constant, U(T) = ut, then we see that for t < T < t + A, Eq. (1.4.2) determines a Markov process X(T) such that we can write (see (1.1.54))

E(a; t+ A - x t ) = Ax(t, xt, « t )A + o(A),

(1.4.13)

where Ax(t, xt,Ut) is the vector of drift coefficients of this process. But since (for a fixed u(t) = ut) Eq. (1.4.2) is similar to (1.2.2), it follows from (1.2.50) that the components of this vector have the form9 I

\

< i j . X t

=ai(t,xt,ut) + - ——J-^ ——-a-kj(t,xt), (1.4.14) 9 Recall that formula (1.4.14) holds for the symmetrized stochastic differential equation (1.4.2). But if (1.4.2) is an Ito equation, then we have Ax(t, xt,ut) = a(t,xt,ut) instead of (1.4.14).

Synthesis Problems for Control Systems

63

In a similar way, (1.4.2), (1.1.50) and (1.2.52) imply

E(z t+A - xt)(xt+A - xt)T = Bx(t, x f )A + o(A),

(1.4.15)

where

Bx(t,xt}=a(t,xt)aT(t,xt).

(1.4.16)

The other mean values in (1.4.12) can be expressed in terms of the input Markov process y(t) as follows:

E(J/«+A - yt) = A»(t, yt) A + o(A), (1.4.17) A

°( )-

(1.4.18)

Finally, since the stochastic processes y(t] and £(i) are independent, we have

- Vt)T = o(A).

(1.4.19)

Taking into account (1.4.13)-(1.4.19), we substitute (1.4.11) and (1.4.12) into (1.4.10) and rewrite the resulting expression as follows:

F(t,xt,yt) = min < F(t, xt, yt) + A \c(xt, yt, ut) + -=-

vt£U {

\_

at

For brevity, in (1.4.20) we omit the arguments (t,xt,yt) of all partial derivatives of F and denote the trace of the matrix A — |[oij||™ by Sp^4 = on + a 2 2 + • •• + o nn . By analogy with Eq. (1.3.14), we divide (1.4.20) by A, pass to the limit as A —>• 0, and obtain the following Bellman differential equation for the

loss function F = F(t,x,y):

+ mm\c(x,y,u)+(Ax(t,x,u)f~]

=0.

(1.4.21)

By analogy with (1.3.14), we omit the subscripts of xt, yt, and ut, assuming that the phase variables x, y and the control vector u in (1.4.21) are taken

64

Chapter I

at the current time t. We also note that the loss function F — F(t, x, y) must satisfy Eq. (1.4.21) for 0 < t < T. At the right endpoint of the control interval, this function must satisfy the condition

,y) = i>(x,y),

(1.4.22)

which readily follows from its definition (1.4.5). By using the operator

,

2L

(1.4.23)

oxax1

we can rewrite (1.4.21) in the compact form

mm [Ll,iyF(t, x, y) + c(x, y, u)} = 0.

(1.4.24)

In the theory of Markov processes [45, 157, 175], the operator (1.4.23) is called an infinitesimal operator of the diffusion Markov process x(t) = (*(*)),»(*))•

To obtain the optimal control in the synthesis form ut =
If it is possible to calculate the minimum of the function in the square brackets in (1.4.21) explicitly, then the optimal control can be written as

follows (see §1.3, (1.3.16)-(1.3.18)): u* =
(1.4.25)

and the Bellman equation (1.4.21) can be written without the symbol "min"

Sp

*'<*'»>

+

*'*'*

= °'

(1A26)

where $ denotes a nonlinear function of components of the vector dF/dx 9F

-\=c(x -u *n(t T v —— \\

>x>y>%:)}\ Tir-

(i-4-27)

Synthesis Problems for Control Systems

65

In this case, solving the synthesis problem is equivalent to solving (1.4.26) with the additional condition (1.4.22). After the loss function F(t,x,y) satisfying (1.4.26) and (1.4.22) is found, we can calculate the gradient

dF(t, x, y)/dx = u(t, x, y) and obtain the desired optimal control

u* — 0 ( t , x , y , u ( t , x , y ) ) .

(1.4.28)

Obviously, the main difficulty in this approach to the synthesis problem is to solve Eq. (1.4.26). Comparing this equation with a similar equation (1.3.18) for the deterministic problem (1.3.!)-(!.3.3), we see that, in contrast with (1.3.18), Eq. (1.4.26) is a second-order partial differential equation of parabolic type. By analogy with (1.3.18), Eq. (1.4.26) is nonlinear, but, in contrast with the deterministic case, the nonlinearity of Eq. (1.4.26)

is weak, since (1.4.26) is linear with respect to the higher-order derivatives of the loss function. This is why, in the general theory of parabolic equations [61, 124], equations of type (1.4.26) are usually called quasilinear or semilinear. In the general theory [124] of quasilinear parabolic equations of type

(1.4.26), the existence and uniqueness theorems for their solutions are proved for some classes of nonlinear functions $. The unique solution

of (1.4.26) is selected by initial and boundary conditions on the function F(t,x,y). In our case, condition (1.4.22) that determines the loss function

for t = T plays the role of the "initial" condition. The boundary conditions are determined by the restrictions imposed on the phase variables * and y in the original statement of the synthesis problem. If, as in problem (i)-(v) considered here, there are no restrictions on the phase variables, then it is necessary to solve the Cauchy problem for (1.4.26). In this case, the uniqueness of the solution is ensured by some requirements on the rate of growth of the function F(t, x, y) as x , \y\ —> oo (for details see Chapter III). However, there are no general methods for solving equations of type

(1.4.26) explicitly. Nevertheless, in some specific cases, Eq. (1.4.26) can be solved approximately or numerically, and sometimes, exactly. We describe such special cases in detail in Chapters II-VII. Now let us consider some modifications of problem (i)-(v) that we shall study later. First of all, we trace how the form of the Bellman equation (1.4.21) varies if, in the initial problem (i)-(v), we use optimality criteria that differ from (1.4.3). 1.4.2.

Stationary tracking. We begin by modifying the criterion

(1.4.3), which allows us to examine stationary operating conditions of the servomechanism shown in Fig. 10. We assume that criterion (1.4.3) does not penalize the terminal state

of the controlled system, that is, the penalty function ij}(x,y) = 0 in the

Chapter I

66

functional (1.4.3). Then the servomechanism shown in Fig. 10 can operate in the time-invariant (stationary) tracking mode if the following conditions are satisfied: (1) the input Markov process y(t) is homogeneous in time, namely, its drift and diffusion coefficients are independent of time: Ay(t, y] = Ay (y)

(2) the plant is autonomous, that is, the right-hand sides of Eqs. (1.4.2) do not depend on time explicitly, a(t, x, u) = a(x, u) and cr(t, x) = ff(x)',

3) the system works sufficiently long (the upper integration limit T —>• oo in (1.4.3)).

z(t)

= y(t] - x(t)

FIG.

11

A process of relaxation to the stationary operating conditions is schematically shown in Fig. 11, where the error z(i) = y ( t ) — x(t) between the input action (the command signal) and the controlled value (x and y are scalar variables) is plotted on the ordinate axis. One can see that for large T the operation interval [0, T] can be conventionally divided into two intervals: the time-varying operation interval [0, ti, this correlation disappears, and we can assume that z(t), t 6 [
Synthesis Problems for Control Systems

67

The performance on the time-invariant interval is characterized by the value 7 of mean losses per unit time (the stationary tracking error). If the operation time T increases to T + AT (see Fig. 11), then the loss function (1.4.5) increases by -yAT. Therefore, to study the stationary tracking, it

is expedient, instead of the loss function (1.4.5), to use the loss function f(x, y) that is independent of time and can be written as

f(x,y)=

lim [F(t,x,y)-i(T-t)].

T—*-oo

(1.4.29)

It follows from (1.4.23) and (1.4.24) that function (1.4.29) satisfies the stationary Bellman equation

mm [L« y /(z, y) + c(x, y, u)] = 7,

(1.4.30)

where L™^y denotes the elliptic operator

Obviously, for the optimal control ut =
7=

c(x,y,tpt(x,y))p00(x,y)dxdy

(1.4.32)

and, together with the functions f(x,y) and u* = ipt(x,y), can be found by solving the time-invariant equation (1.4.30). Some methods for solving

the stationary Bellman equations are considered in Chapters II- VI. 1.4.3. Maximization of the mean time of the first passage to the boundary. As previously, we assume that in the servomechanism shown in Fig. 10 the stochastic process y(t) is homogeneous in time and the plant P is autonomous. We also assume that a simply connected closed domain

D C R-n+m is chosen in the (n + ra)-dimensional Euclidean space R n +m of vectors (x,y). It is required to find a control that, for any initial state

(z(0),j/(0)) G D of the system, maximizes the mean time Er during which the representative point (x(t),y(i)') achieves the boundary dD of the domain D (see the criterion (1.1.21) in §1.1). By Wu(t — to, XQ, j/o) we denote the probability of the event that the rep-

resentative point (x, y) does not reach dD during time t— to ifx(to) = XQ and

68

Chapter I

y(to) = 3/0, (£0,2/0) G D, and a control algorithm u(t) — t0,

(1.4.33)

if (s0, 2/0) € 3D.

If t r denotes a random instant of time at which the phase vector x ( t ) = (x(t),2/(f)) comes to the boundary dD for the first time, then the time T = tr — to of coming to the boundary is a random variable and the function

W"(-) can be expressed via the conditional probability

Wu(t~to,x0,yo) = Pu{r>t-t0 \x(t0) = x0,y(t0) = 2/0} = Pn{T>t-t0\x0,y0}.

(1.4.34)

For the mutually disjoint events {T < t—to} and {T > t— to}, the probability addition theorem implies

Pu{rt-to x 0 ,Jto} = l.

(1.4.35)

Expressing the distribution function of the probabilities PW{T < t — to \ ^0,2/0} via the probability density wT(a] of the continuous random variable T, we obtain ft — to

P{r
x0,y0} = I Jo

wT(a)du — 1 - Wu(t - t0,x0, t/0)

from (1.4.34) and (1.4.35). Hence, after the differentiation with respect to t, we have

dWu wr(t-t0) = ————(t-to,x0,yo).

(1.4.36)

Using the same notation for the argument of the density and for the random

value, that is, writing wT(t — to) = w(r), from (1.4.33) and (1.4.36) we obtain the mean time ET of achieving the boundary /•OO

=

Jo

,-00

Tw(r)dT = -

Jo

^yyu

T~—(T,x0,y0)dT

dr

/.OO

= / Jo

|»OO

Wu(T,x0,y0)dT=

Wn(t-t0,x0,yoo)dt. )dt.

Jt

°

This formula holds if lim^oo^ - t0)Wu(t - t0, x 0 , yo) = 0.

(1.4.37)

Synthesis Problems for Control Systems

69

The mean time ET depends both on the initial state (XQ, J/Q) of the controlled system shown in Fig. 10 and on a chosen control algorithm. u = f>(x, y). Therefore, the Bellman function for the problem considered is determined by the relation ,.00

Fi(x,y) = max /

WU(T - t, x, y) dr.

(1.4.38)

u(r)£U Jt T>t

By analogy with (1.4.10), for the function (1.4.38), the basic functional equation of the dynamic programming approach has the form

Fi(xt,yt) =

max

r ,t+A -i / WU(T -t,xt,yt)dT +EFi(xt+&.,yt+&)\.

u(r)£U Ut t
J

(1.4.39) The Bellman differential equation for the function Fi(x,y) can be derived from (1.4.39) by passing to the limit as A —>• 0. In this case, the procedure is almost the same as that used for the derivation of Eq. (1.4.21)

for the basic problem (i)-(v). Expanding Fi(xt+&., J/J+A) in the Taylor series around the point (xt,yt), averaging the expansion with respect to the random increments (XJ+A — %t) and (yt+A — yt), taking into account the relation lirriA-+o W"(A, Xt, yt) = I for all (xt,yt) lying in the interior of D, and passing to the limit as A —)• 0, from (1.4.39) with regard to (1.4.13)-(1.4.19), we obtain the Bellman differential equation for the function F-i(x,y): F^x.y)^-!,

(1.4.40)

where the elliptic operator L">y is given by (1.4.31).

We also note that the function Fi(x,y) satisfies Eq. (1.4.40) in the interior of the domain D. It follows from (1.4.33) and (1.4.38) that at the points of the boundary dD the function F\ vanishes,

= 0.

(1.4.41)

[x,y)edD

In the theory of differential equations of elliptic type, the problem of solving Eq. (1.4.40) with the boundary condition (1.4.41) is called the first

interior boundary-value problem or the Dirichlet problem. Thus, solving the synthesis problem for the optimal control that maximizes the mean time of the first passage to the boundary is equivalent to solving the Dirichlet

problem for the semilinear elliptic equation (1.4.40).

70

Chapter I

1.4.4. Minimization of the maximum penalty. Now let us consider the synthesis problem with optimality criterion (1.1.18) for the optimal control system shown in Fig. 10. In this case, it is reasonable to introduce the loss function

F2(t,xt,yt)=

min E max r T C(Z(T), t/(r), W(T)) .

U(T)
L*> >

J

(1.4.42)

t
In (1.4.42) the averaging has the meaning of the conditional mathematical expectation £[•] = £{[•] x(t) = xt,y(t) = yt}. For small A we have the following basic functional relation for the function F2'.

F2(t,xt,yt)

=

min

{max.[c(xt,yt,ut) + o(A),EF2(t + A,

Let us introduce the notation c°(x,y) = min^gjy c(x, y, u). Then it follows from (1.4.43) that either

F2(t,xt,yt) = c°(xt,yt),

(1.4.44)

or

F2(t,xt,yt) = lim

min

A

EF2(t + A, z t + A , j/ t+A ),

(1.4.45)

-*-° u(r)£U t
provided that the function F2(t,X(,yt) > c°(xt,yt) has been obtained from

(1.4.45). Acting by analogy with Section 1.4.1, that is, expanding the function .£2(2 + A, £ ( _|_ A ,y t + A ) in the series (1.4.12), averaging, and passing to the limit as A ->• 0, from (1.4.44) and (1.4.45) we obtain (with regard to (1.4.13)-(1.4.19)) the Bellman equation in the differential form:

f mmLl F2(t,x,y) = Q I u£U ( F2(t, x, y) = c°(x, y)

if F2(t, x, y) > c°(x, y ) , (1.4.46) otherwise,

where Lf^^y denotes the operator (1.4.23). The unique solvability of (1.4.46) implies the condition

F2(T,x,y)=c°(x,y),

(1.4.47)

Synthesis Problems for Control Systems

71

as well as the matching conditions for the function ^2(^1 x, y) on the interface between the domains on which the equations in (1.4.46) are defined. These conditions of "smooth matching" [113] require the continuity of the function F2(t,x,y) and of its first-order derivatives with respect to the

phase variables x and y on the interface mentioned above. If, by analogy with Sections 1.4.2 and §1.4.3, the characteristics of the input process y(t) and of the controlled plant P are time-independent, then it is often expedient to use a somewhat different statement of the problem considered, which allows us to assume that the loss function is independent

of time. In this case, we do not fix the observation time but assume that the optimal system minimizes the functional I[u] = E maxc(x(r), y(r), u(T))e-^T-t},

(1.4.48)

where (3 > 0 is a given number. This change of the mathematical statement preserves all characteristic features of the problem. Indeed, it follows from

(1.4.48) that the time of observation of the function c(x,

y, u) is bounded

and determined by /?. Namely, this time is large for small /3 and small for large (3. For the criterion (1.4.48) the loss function is determined by the formula10

h(x,y) = min E [maxc(a;(T), 2/(r),-u(T)) e -^ ( r - < ) j . «(r)eu L r > * J

(1.4.49)

T>t

Taking into account the relations min E

max c(a;(r), J/(r), u(

u(r)£U

r>t + A

= e-/3A min E max C(X(T), «(r)€l7

r>t+A

we can rewrite Eq. (1.4.43) for the function f z ( x , y ) in the form h(xt,yt) =

min < max[c(a3 t ,J/t,Wt) +o(A), Ef2(xt+/\,yt+^)e~/3^] «(r)eu I

>. J

(1.4.50) 10

As usual £[•] in (1.4.49) is treated as the conditional mathematical expectation

72

Chapter I

By analogy with the previous reasoning, from. (1.4.50) we obtain the Bellman equation for the function /2(x, y)'

( mini" h(x,y) = ( 3 f 2 ( x , y ) I u&u I h(x,y) = c°(x,y)

if f?(x,y) > c°(x,y),

(1.4.51)

otherwise,

where i" is the elliptic operator (1.4.31) that does not contain the derivative with respect to time t. In §2.2 of Chapter II, we solve Eq. (1.4.51) for a special problem of optimal control. 1.4.5. Optimal tracking of a strictly discontinuous Markov process. Let us consider a version of the synthesis problem for the optimal

tracking system that differs from the basic problem (i)-(v) by conditions (i) and (iv). Namely, we assume that (i) the input process y(t) in the servomechanism (see Fig. 10) is given by a strictly discontinuous Markov process (see §1.1) with known characteristics X ( t , y ) and T r ( y , z , t ) determining the intensity of jumps and the density of the transition probability at the

state (t, y) and that (ii) there are no random perturbations £(t) that act on the plant P. In this case, the plant P is described by the system of

ordinary (nonstochastic) differential equations

x(t) = a(t,x(t},u(t)},

x(Q) = xQ,

0
(1.4.52)

It follows from (1.1.68) that for small A the transition probability p(t, yt', t-\-

A,2/ t + A) = p(y(t + A) — y t+ A | y ( t ) = yt) for the input process y(t) is determined by the formula

-yt) AA(t, j/ t )T(*, yt, y < + A ) + o(A).

(1.4.53)

By analogy with the solution of the basic problem, in the case considered, the loss function Fz(t,x,y) is determined by (1.4.5) if £[•] in (1.4.5) is understood as the averaging of the functional [•] over the set of sample paths yf = [y(r) : t < T < T] issued from a given initial point y(t) = yt.

Obviously, Fs(i, x,y) satisfies the functional equations (1.4.6) and (1.4.10). We rewrite Eq. (1.4.10) for F3 as follows:

, «
U

t+A

c(x(T),y(T),u(T))dT

•

(1-4.54)

Synthesis Problems for Control Systems

73

Note that for small A we can explicitly average (1.4.54) by integrating the function in the square brackets multiplied by the transition probability (1.4.53). Since the sample paths of the input process y(t) are discontinuous, the random increments (y*+A ~ Vt) are> generally speaking, not small. Therefore, in our case, instead of (1.4.12), we use the following representation of

F3(t + A, a; t+ A, 2/t+A) as A ->• 0:

dF3 F3(t + A, zt+A, yt+A) = Fsit, xt, yt+&) + A - — (i, xt, o(A)

(1.4.55)

(in (1.4.55) it is assumed that -Fs(<, z,j/) is a continuously differentiable function with respect to t and z). The Bellman equation for F3(t, x, y) can be derived from (1.4.54) in the

standard way. To this end, we substitute expansion (1.4.55) into (1.4.54), average with probability density (1.4.53), and pass to the limit as A —>• 0 in (1.4.54). Using (1.4.53), we obtain

EF3(t,xt,yt+£,) = I F3(t,xt,yt+&)[(l- A\(t,yt))8(yt+& - yt)

+ AA(<, yt)x(t, yt, J/*+A)] dyt+& + o(A) = F3(t, xt, yt) + A

/ F3(t, xt, z)ir(t, yt, z] dz

- F3(t, xt, &)] A(*, yt) + o(A).

(1.4.56)

In a similar way, it follows from (1.4.52) and (1.4.53) that -xt = a(t, xt, ut)A + o(A),

dF* -^(i,i t ,j/ t )+0(A) ) 5F-,

-^(t, s t , yt) + 0(A),

(1.4.57)

(1.4.58) (1.4.59)

/•«+A

E / 1/4

C(X(T), y ( r ) , u ( T ) ) dr = c(xt, yt, w«)A + o(A) (1.4.60)

(in (1.4.58) and (1.4.59) the functions O(A) denote terms of the order of A such that lirriA->.o O(A)/A = AT, where TV is a finite number).

74

Chapter I

Using (1.4.55)-(1.4.60) and passing to the limit as A -> 0 in (1.4.54), we obtain the following Bellman integro-differential equation for the function FS:

^(t,x,y) + X ( t , y) \ f ir(t, y, z)F3(t, x,z)dz- F3(t, x, y) Ot

LJ

+ min \aT(t,x,u)—^-(t,x,y)+c(x,y,u) = 0, 0 < t < T, (1.4.61) ueu |_ ox J F3(T,x,y)=il>(x,y). (1.4.62) If A(t, y) = \(y), 7r(t, y, z) = ?r(y, z), a(i, z, u) = a(x, u), tf>(x, y) = 0, and T —^ oo, then the system shown in Fig. 10 may operate in the stationary tracking mode (see Section 1.4.2). In this case, instead of (1.4.61), we have the stationary Bellman equation A(y)

Tr(y,z)f3(x,z)dz-f3(x,y)\

aT(x, u)--(x,y) +c(x,y, u) =7, ox J where a stationary loss function f3(x,y)

(1.4.63)

is determined by analogy with

(1.4.29) as f3(x, y) = ^ [F3(t, x, y) - 7(T - t)] and the number 7 > 0 determines mean losses per unit time in the stationary tracking mode under the optimal control. The solution of the timeinvariant equation (1.4.63) for a special synthesis problem is given in §2.2. In conclusion, we make some remarks. First, we note that in this section we have considered only the synthesis problems (and the corresponding Bellman equations) that are studied in

the present monograph. The Bellman equations for other stochastic control problems can be found in [1, 3, 5, 18, 34, 50, 57, 58, 113, 122]. Moreover, the ideas and methods of the dynamic programming approach are widely used for solving problems of optimal control for Markov sequences and processes with finitely or countably many states [151, 152], which we do not consider in this book.

We also point out that many arguments and computations in this section are of rather formal character and sometimes correspond to the "physical level of rigor." To justify the optimality principle, the sufficiency of Markov optimal strategies, the validity of Bellman differential equations, and the solvability of synthesis problems rigorously, it is required to have rather complicated and refined mathematical constructions that are beyond the

framework of this book. The reader interested in a closer examination of these problems is referred to the monographs [58, 59, 175], and especially to [113].

Synthesis Problems for Control Systems

75

§1.5. Sufficient coordinates in control problems with indirect observations

We have already noted that the dynamic programming method in, so to say, its "pure" form can be used only for Markov controlled processes. Let Xt be a current phase state of the system. The probabilities of future states -X"t+A (A > 0) of the process X(t) must be completely determined by the last measured value Xt. However, since the time evolution of X(t) depends on random perturbations and control actions, the process X(t) satisfies the Markov property only if the values Ut of the current control are determined by the instant values of the phase variables and time as follows:

ut =
(1.5.1)

The Markov property of the process X(t) allows us to write the basic functional equation of the optimality principle, then to obtain the Bellman equation, etc., that is, to follow the procedure described in §1.4. To implement the control algorithm in the form (1.5.1), it is necessary to measure the phase variables Xt exactly at each instant of time. This possibility is provided by the servomechanism shown in Fig. 10. In this case, the phase variables Xt = Xt — ( x t - , y t ) are the components of the (n + m)-dimensional vector of instant input (assigning) actions and output (controlled) variables. Now let us consider a more general case of the system shown in Fig. 3. At each instant of time, instead of true values of the vectors Xt and y$, we have only the results of measurements J/Q and XQ, which are sample paths of the stochastic processes { y ( s ) : 0 < s < t] and { x ( s ) : 0 < s < t}. These processes are mixtures of "useful signals" y^, x*0 and "random noises" £Q, ?7oOnly these results of measurements can be used for calculating the current values of the control actions Ut', therefore, the desired control algorithm for the system shown in Fig. 3 has the form of the functional ut = (t,x*,y*).

(1.5.2)

To illustrate the computation of the optimal functional ¥>(<, EQ, J/Q), we consider, as an example, the basic synthesis problem (see §1.4, Section 1.4.1) in the case of indirect observations. Assume that the equation of the controlled plant, the restrictions on the control, and the optimality criterion have the form

x(t) = a(t,x(t),u(t))+cr(t,x(t))£(t),

u(t) <E U C R r ,

0 < t < T,

(1.5.3)

(1.5.4)

76

Chapter I

I(u)

= E\ f c(x(t), y ( t ) , u ( t ) ) dt + ^(x(T), y(T))] Uo J

(1.5.5)

(here we use the notation from (1.4.2), (1.4.3), and (1.4.4) in §1.4). The

observed processes x ( t ) and y ( t ) are determined by the relations

x ( t ) = P ( t } x ( t ) + Q(t)r,(t);

y(t) = H(t)y(t) + G(t)C(t),

0
Here P, Q, H, and G are given matrices whose dimensions agree with the dimensions of the vectors x, x, n, y, y, and £. We also assume that the

vectors x and rj (as well as the vectors y and Q are of the same dimension, and the square matrices Q(t) and G(t) are nondegenerate for all t G [0, T].11

We assume that the stochastic process £(i) in (1.5.3) is the standard white noise (1.1.34) and the other stochastic functions y(t), C(^), and r)(t) are Markov diffusion processes with known characteristics (that is, with given drift and diffusion coefficients). The stochastic processes £ ( t ) , y(i), CWi and r](t) are assumed to be independent. We also note that the stochastic process x(t), which is a solution of the stochastic equation (1.5.3), is not Markov, since in this case the control functions u(t) = Ut on the right-hand side of (1.5.3) have the form of functionals (1.5.2) and depend on the history of the process.

Following the formal scheme of the dynamic programming approach, by analogy with (1.4.5), we can define the loss function for the problem considered as follows:

t
U

~. ~ „

T

-, c(z(r), y(r), u(r))dr + j(x(T), y(T)) \ z«, %< . J

„

(L5 7)

-

Since the functions XQ and y^ are arguments of F in (1.5.7), it would be more correct if expression (1.5.7) were called a loss functional; however, both (1.5.7) and (1.4.5) are called loss functions.

In contrast with §1.4, it is essentially new that we cannot write the optimality principle equation of type (1.4.6) or (1.4.10) for the function

(1.5.7), since this function depends on the stochastic processes x ( t ) and y ( t ) , which are not Markov. Formula (1.5.6) immediately shows that x ( t ) and y(t) have no Markov properties, since the sum of Markov processes

is not a Markov process. Moreover, it was pointed out that the process x ( t ) itself is not Markov. Therefore, we can solve the synthesis problem 11

For simplicity, we assume that Q(t) and G(t) are nondegenerafce, but this condition

; not necessary [132, 175].

Synthesis Problems for Control Systems

77

by using the dynamic programming approach only if we can choose new "phase" variables X(t) = Xt for the loss function (1.5.7) so that, on one hand, they were sufficient for computation of minimum future losses in the sense of

F(t,x*,y*)=F(t,Xt) and, on the other hand, the stochastic process X(t) be Markov. Such phase variables Xt are called sufficient coordinates [171] by analogy with sufficient

statistics used in the mathematical statistics [185]. It turns out that there exist sufficient coordinates for the problem considered and Xt is a collection of instant values of observable processes x(t) = Xt and y ( t ) — yt and of the a posteriori probability density p(t,xt,yt) = p(x(t) = xt,y(t) = yt | *o,5o) of the unobserved vectors Xt and yt,

Xt=(xt,yt,p(t,xt,yt)).

(1.5.8)

In what follows, it will be shown that the coordinates (1.5.8) are sufficient to compute the loss function (1.5.7). In the case of an uncontrolled process x(t), the Markov property of (1.5.8) follows from Theorem 5.9 in [175].

To derive the Bellman differential equation, it is necessary to know equations that determine the time evolution of sufficient coordinates. For the first two components of (1.5.8), that is, for the process x ( t ) and y ( t ) , these equations can be assumed to be known, because one can readily obtain them from the a priori characteristics of the processes j/(<), x(t), C(£), n(t) and formulas (1.5.6). Later we derive the equation for the a posteriori probability density p(t, zj, yt). First, we do not pay attention to the fact that the control ut has the form of a functional (1.5.2). In other words, we assume that u(t) in (1.5.3) is a known deterministic function of time. Then the stochastic process x(t) that satisfies the stochastic equation (1.5.3) is a diffusion Markov process whose characteristics (the drift and diffusion coefficients) are uniquely determined

by the vector a(t,x,u) and the matrix cr(t,x) (see §1.2). Thus, in our case, x(t), y(t), C(i), and ^(t) are independent stochastic Markov diffusion processes with given drift coefficients and matrices of diffusion coefficients. In view of formulas (1.5.6) and the fact that the matrices Q(t) and G(t)

are nondegenerate, it follows that the collection ( x ( t ) , y(t), x ( t ) , y(i)~) is a Markov diffusion process whose characteristics can be expressed via given characteristics of the processes x(t), y(t), C(*)> and *?(<)• Indeed, if we

denote the vectors of drift coefficients by Ax(t, x), Ay(t, y ) , A f ( t , C), A^(t, n) and the diffusion matrices of independent Markov processes x ( t ) , y ( t ) , £ ( t ) ) , and r)(t) by Bx(t,x), By(t,y), -B c (i,C), B^(t,rj), then it follows from (1.5.6) that the drift coefficients Ay and Ay for the components x(t) and y(t) are

78

Chapter I

determined by the relations

A-s = P(t)Ax(t,x) + Q(t)An(t, Q~1(t)(x- P ( t ) x ) ) + P(t)x + Q(t)Q-l(t)(x - P(i)z),

(1.5.9)

f = H(t)Ay(t, y) + G(t)Ac (t, G~\t)(y - H(t)y}} G(t)G-1(t)(y-H(t)y)

(1.5.10)

and the matrix B of the diffusion coefficients of the joint process ( x ( t ) , y ( t ) , has the block form

Bz(t,x,x) 0 Bx(t,x)PT(t) 0

0 By(t,y,y) 0 By(t,y)HT(t)

P(t)Bx(t,x) 0 Bx(t,x) 0

0 H(t)B9(t,; 0 By(t,y)

, (1.5.11)

where Bx(t,x,x) and B^(t,y,y) are square matrices12 determined by the relations

(1.5.12) T

By(t,y,y)=H(t)By(t,y)H (t] + G(t)B((t,G-l(t)(y-H(t)y))GT(t). (1.5.13) Now we point out that in the Markov collection of random functions

( x ( t ) , y(f), x(t), yit)) the components x ( t ) and y(t) are observable, but the components x(y) and y ( t ) are not observable. Partially observable Markov processes are often called conditional Markov processes. The rigorous theory of such processes can be found in [132, 175]. Let us consider the conditional (a posteriori) density p(t, xt, yt) = (x(t) = xt, y(t) = yt XQ, y0') of the probability distribution for unobservable com-

ponents of the partially observable Markov process ( x ( t ) , y ( t ) , x ( t ) , y ( t ) ) . It turns out that the a posteriori density p(t, Xt, yt) satisfies a stochastic partial differential equation, first obtained in [175], This is a generalization of the Fokker-Plank equation (1.1.67) to the case of observation. In what follows, we briefly derive this equation. 12 If P, Q, H, and G in (1.5.6) are row matrices, then B~(t,x,x) and B~(t,y,y) are scalar functions.

Synthesis Problems for Control Systems

79

According to [175], we introduce the following notation. We denote the collection of random functions (x(t), y(t), x(i), y ( t ) ) that forms a Markov process by a single letter z(t) and assume that the dimension of the vector z is equal to n. We assume that the unobservable components of the vector z are numbered from 1 to m and the observable components are numbered from ra+1 to n. For convenience, we write xa (I < a < m) for unobservable components and yp (m + 1 < p < n) for observable ones. We also use

three groups of indices: the indices i,k,t,... vary from 1 to n; the indices a, /3, 7, . . . from 1 to m; and /?,
z ( t ) = z} = Ak(t,z); Hm

Azk = zk(t + A) -

E[Azfc AZ, | z(t) =z}= Bkt(t, z);

Urn iE[Az f c l ...Az f c l . | z(t) = z] = 0

(1.5.14) (r > 2).

It is required to obtain an equation for the a posteriori probability p(t, x$) = p(xt | 2/o)i provided that (1.5.14) and the results of observation of y*Q are known. Using the transition probability PA(Z*+A I z*) = PA(a=t+A,^t+A xt,yt) and the probability multiplication theorem, we obtain

p(xt+A,Vt+A,xt | 3/0) =PA(a5t+A,2/«+A

xt,yt)p(xt \ y*0).

(1.5.15)

Integrating (1.5.15) with respect to xt and taking into account (1.1.50), we obtain

I J/o) = / PA(a: t + A,2/t+A ] xt,yt)p(t,xt] dxt.

(1.5.16)

If we write the left-hand side of (1.5.16) in the form p(z«+A,yt+A I 2/o) =P(«t+A | 2/o'2/«+A)p(y*+A

J/Q)>

then we can write (1.5.16) as follows: P(Z*+A

2/o.2/t+A) = -7———j—JT / PA(**+A,y«+A | xt,yt)p(t,xt)dxt.

p(yt+A i %/ ^

(1.5.17)

Integrating (1.5.16) with respect to xt+&, we obtain = //

| xt,yt)p(t, xt)dxtdxt+&.

(1.5.18)

Chapter I

80

Substituting (1.5.18) into (1.5.17) and taking into account the fact that the equality

is valid, since the arguments are continuous, we obtain

xt,yt)p(t,xt)dxt

,

p(t + A,

a;*,

—-—— ~ ——-

+ o(A).

(1.5.19) Equation (1.5.19) for partially observable Markov processes plays the

same role as the Markov (Smoluchovski) equation (1.1.53) for the complete observation. To derive the differential equation for the a posteriori density p(t, Xi] from (1.5.19), we use the same method as for the derivation of the

Fokker-Planck equation in §1.1 (see (1.1. 59)-(l.l. 64)). Let us introduce two characteristic functions of random increments As a , a = ! ) • • • ) m, and Az^, k = 1, . . . , n.,13

= I exp [jua(xat+& - x a ( )]p A (x t + A,J/«+A J

Xt,yt)dxt+&,

(1.5.20)

i, • • • , un,zt)

(1.5.21)

- I exp[JM

The transition probability can be expressed in terms of inverse Fourier transforms as follows:

zt = —71—— / exp [ - juk(zkt+A - zkt)] l^ "; J x 02(ui,...,un,zt)dui,...,dun,

(1.5.22)

zt = -—— I exp [ - jua(zat+& - zat)] \*n)

J

(1.5.23) 13 In (1.5.20) and (1.5.21), as usual, j = \/—1 and the sum is taken over the repeated indices:

Synthesis Problems for Control Systems

81

Using the expansion of In $2(1*! zt) in the Maclaurin series, we can write

(1.5.24)

where Ks [ A z & , . . . , Az r ] denotes the s-order correlation between the components of the vector of increments Az = z(t + A) — z(i) of the Markov process z(t). Using well-known relations between correlations and initial

moments [173, 181], we see that (1.5.14) gives the following representation for (1.5.24):

0 2 (wi,- • -,un,zt) = exp \&jAkuk - —BkiukUi + o(A) Zi

L

(1.5.25)

-i

(for brevity, in (1.5.25) and in the following we do not write arguments of

Ak and Bkt, namely, Ak = Ak(t, zt) and Bkl — Bki(t, zt)). Comparing (1.5.22) and (1.5.23), we see that 0i(ui, ...,um, zt, y t + A ) = (27r)—— / exp [ - ;«<,(^+A - yat}}

x ^2(1*1,..., un,zt) dum+i.. .dun.

(1.5.26)

After the substitution of (1.5.25), we can calculate the integral (1.5.26) explicitly. As the result of integration, for the characteristic function #1 we obtain the formula14

— Kex.p [L(ui,.. .,u r o ,z t ,2/ t + A )A + o(A)],

(1.5.27)

where

= jua Aa + Ba
-

,

(1.5.28)

14

To obtain (1.5.27) and (1.5.28), it suffices to use the well-known formula [67] /

J

ex

P

L

~ ~Bkt(xk ~ mk)(xe-mt)\

2

J

dxi ...dxn = \

*

V detij

which holds for real symmetric positive definite matrices B = j|Sfe^|jr and any constants 771 ft and 7Ti£.

82

Chapter I

and K is a constant that does not influence the final result of calculations. Note that we calculated (1.5.26) under the assumption that the matrix ||.B<7p||ro+i is nondegenerate and we used the notation ]|-FVp|| = H-Bo-pll" 1 Since the exponent in (1.5.27) is small (~ A), we replace the exponential by the first three terms of the Maclaurin series ex = 1 + x + x 2 /2, truncate the terms whose order with respect to A is larger than 1, and obtain

exp [L(ui, . . . , w m , zt, 2/«+A)A + o(A)] = 1 + L(UI, ...,um, zt, 2/ t+A )A

- —(uaupBavFvpBpft

- ArFrpAp) + juaBa^F^pAp^ + o(A).

(1.5.29)

In (1.5.29) we used the relation FapBpT = BapFpr = 8VT, where 8ffT is the Kronecker delta, and the formula

Ay p At/ r = Bpr(t, z t )A + o(A) = BpT A + o(A),

(1.5.30)

which follows from the properties of Wiener processes and is a multidimensional generalization of formula (1.2.8) (for details, see Lemma 2.2 in [175]). Substituting (1.5.28) into (1.5.29) and collecting the similar terms, we obtain

exp [£(ui,...,w m ,z t ,3/ t + A )A + o(A)] = 1 + (jua)(AaA + Ba
(1.5.31)

Using (1.5.23), (1.5.27), and (1.5.31), we calculate the numerator of the fraction on the right-hand side in (1.5.19): /

rr

[

//

II

.',.

(A

A

Jua\-H-a'-l -

+ o(A).

(1.5.32)

Taking into account the formulas (see (1.1.60)-(1.1.64)) 1 ____ m (2 }

/C e-3ui(!C-ft+^.-x-ft)

/ -,'->,

(2 ?r)m J

m

j

j

_

aui...aum—

(,-ju-f(xyt+A.-x^f)

j

j

'''

Synthesis Problems for Control Systems

dS(xat+A — Xat)

TT

f

83

/

(27T)™

we obtain the numerator in (1.5.19) (we omit the constant K, since K and a similar constant in the denominator of (1.5.19) are canceled): r\

p(t, xt+&) - -^——— [(vla A + Ba
32 ———— 5———— [Bapp(t,xt+±)]

+A<7F(7pp(t,xt+&))&yp + o(&)

Q g 23)

(in (1.5.33) (<, zt+A,J/t) are the arguments of the coefficients Afc and Fffp). The denominator of the right-hand expression in (1.5.19) differs from the numerator by integration with respect to Xt+&. We perform this integration, take into account the fact that the normalization condition for the probability density p(t, xt+&) implies p(t, »«+A) dxt+A = 1;

p(t, x) —>• 0,

x) *0 — — » 0 c/xa

as

\x\ —> oo,

and from (1.5.33) obtain the following expression (without K) for the denominator in (1.5.19): I

o(A)

= 1 + &ypEpsAffFap

+ o(A),

(1.5.34)

where E p s (-) denotes the a posteriori averaging f(-)p(t,x)dx. We assume that the elements of the matrix Bap (and of F<,p, respectively) are independent of unobservable components x and take into account (1.5.30). Then we can write

(1.5.35)

84

Chapter I

Multiplying (1.5.33) by (1.5.35) and substituting the result into (1.5.19), we obtain

1

/)2

AaP - PEpsAff - j(Bavp) OX(y

+ o(A). (1.5.36) -*

As A —> 0, the terms denoted by o(A) in (1.5.36) disappear, and the finite increments become differentials. In this case, according to §1.2, it is necessary to point out in which sense stochastic differentials are understood, since the differential equation obtained is stochastic (it contains the differential of the Markov process dyp(t)). Comparing Eq. (1.5.36) (as A —> 0) with the stochastic equation (1.2.3), we see that now the a posteriori probability density p(t, x) in (1.5.36) plays the role of the random function x(t) in (1.2.3), and the vector-function (1.5.37) plays the role of the function 0, (1.5.36) implies the following differential equation in the Ito form:

dt [d0yp(t) - EpsAp dt}Fpa. pAa - pEpsAa - -— (5a
d2

dp _ 1

r

m-2dx-^[(Baf}--

dyr

2 yp

dxa

Tf>

dyT

Synthesis Problems for Control Systems f\

a

i -&TP

rt

VT

85 i J? <rp-**-rT **-p

by using coupling formulas between stochastic differentials and integrals (see §1.2);15 in (1.5.39) yp = yp(t) = dyp(t)/dt denotes the formal timederivative of the Markov process yp(t).

Equation (1.5.39) is a generalization of the Fokker-Planck equation to the observation case. It should be noted that if some transformations of random functions p(t,x) are necessary (see formulas (1.5.41)-(1.5.44) below),

then it is more convenient to use Eq. (1.5.39), although it is more cumbersome than a similar Ito equation (1.5.38), since (see §1.2) the symmetrized form allows us to treat random functions (even such singular functions as white noises) according to the same formal rules as for deterministic and

sufficiently smooth functions. We can show [132] that VFap[d0yp(t) - EpsApdt}16 is the differential of the standard Wiener process dr](t) studied in §1.2. Therefore, in view of Eq. (1.5.38), the already cited Markov property of the set (j/t,p(<, x)) can be obtained by not completely rigorous but sufficiently illustrative arguments. Indeed, since the increments [^(t + A) — r),,(i)] of the stochastic pro-

cesses riv(t) = ^/Fap[yp(t)

- J0 EpsAp(T,y(T))dt]

in (1.5.38) are mutually

independent, the future values of the a posteriori probability p(t + A, x) are

completely determined by (x t , yt,p(t, *))• Since the vector xt is unobservable, then the probabilities p(t + A, x) of future values are determined by \ytip(t, x)) and the probability of the current value Xt, that is, by the a posteriori density p(t, x) contained in (yt,p(t, «))• On the other hand, since the process z(t) is of Markov character, the probabilities of future values of the observable process yt+A are completely determined by its current state z t — (xt,yt)i that is, by the same set (yt,p(t, x)), since xt is unobservable. This implies that (y(t),p(t, x)) is a Markov process. Now let us recall that Eqs. (1.5.38) and (1.5.39) were derived under the assumption that the control u(t) in (1.5.3) is a known deterministic

function of time. However, if the control u(t) is given by the functional (1.5.2) (in the new notation, following (1.5.14), this functional has the form

u(t] — ut = p(t, j/o))> then this fact does not affect the Markov properties of (j/t,p(t, a:)), since it is assumed that (j/t,p(i,a:)) is determined by the 15 Here we do not show in detail how to transform the Ito equation (1.5.38) to the symmetrized form (1.5.39); the reader is strongly recommended to do this useful exercise

on his own. 16

\AF(Tp denotes an element of the matrix \/F which is the square root of the ma-

trix F; since the matrix |]Bp
86

Chapter I

entire past history of the observations J/Q = { y ( s ) : 0 < s < t}. Thus, for a given state of (yt,p(t, x)) and any chosen functional if> in (1.5.2), the control Ut is a known vector on which the functions o(i, x, u) in (1.5.3) and the functions Aa and Aa in (1.5.38) depend as on a parameter. Hence it

follows that Eqs. (1.5.38) and (1.5.39) are also valid for controlled processes (provided that the control is given in the form (1.5.2)).

Now let us return to the synthesis problem and the dynamic programming approach. Describing the state of a controlled system at time t by (1.5.8) or, briefly, by (yt,p(t,x)^

(recall that after (1.5.14) we introduced

the new notation: Sj, yt —>• yt and xt, yt —)• xt), we can write the loss function (1.5.7) as F(t,i/Q) = F(t, yt,p(t, z)). Using the Markov property of (yt,p(t, z)), we can write the basic equation of the optimality principle for the function F(t,yt,p(t,x)} as follows:

F(t,yt,p(t, x)) =

rnin

«reu

E< \

"*+A c(xT,yT,uT)dr

t
ytt,,PP(t,x)\. (t,x).

(1.5.40)

Generally speaking, by passing to the limit as A —> 0 in (1.5.40) and using (1.5.14) and (1.5.38) (or (1.5.39)), we can obtain the Bellman differen-

tial equations by analogy with §1.4. However, the equation obtained in this way contains the functional derivatives SF/Sp(t,x), 82F/6p(t,x)Sp(t,x), etc.;

usually it is difficult to solve this equation (as pointed out in §1.4

and §1.5, even the solution of "usual" Bellman partial differential equa-

tions is a rather complicated problem). Therefore, it is more convenient in practice, instead of the a posteriori density p(t, x), to use some equivalent set of parameters as arguments of the function F. We show how to do this. Assume that the a posteriori probability density p(t, x) is a unimodal function of a vector variable x for all t £ [0, T]. By the vector mt = m(t] we denote the maximum of the a posteriori density p(t, x) at time t. Expanding lnp(t, x) in the Taylor series with respect to x around the point m(t), we

obtain the following representation of the a posteriori density p(t, x): p(t, x) = exp < a(t) - y~] —

x

•^ 2J

a,/3,...,C=i

aap...dt)(xc.-ma(t))...(x<;-mf(t))^

} J

(1.5.41)

Synthesis Problems for Control Systems

87

(the scalar function a(t) in (1.5.41) is determined by the normalization condition J p(t, x) dx = 1). Using (1.5.41), we readily obtain the system of equations for the parameters (ma(t),aap(t),aapy(t), . . .) instead of the symmetrized equation (1.5.39). To this end, we rewrite (1.5.39) in the more compact form

at

<

,

p

,

,

.

(1.5.42)

Next we replace the functions Aa and $(a;, y) by their Taylor series,17 substitute (1.5.41) into (1.5.42), and successively set the coefficients of equal powers of (xa — ma), (xa — ma)(xi) — mp), ... on the left- and right-hand sides of (1.5.42) equal to each other; thus we obtain the following system of ordinary differential equations for ma(t), aa/}(t), aap^(i), . . .:

dxa' 2

-

5A7 Qi~jft dx 7P 3

dx( 03^ — -^

Q--JCI

'

, — >> 1 -5—apiS + OXct

(1.5.43)

In (1.5.43) the dot over a variable indicates, as usual, the time derivative (mp = dm/}(t)/dt). Moreover, in (1.5.43) we assume that Bap is independent of x and omit the arguments of the functions A, $, and of their derivatives; we assume_that_the values of these functions are taken at the point x = m, that is, Ap — Ap(t,m,y), d$/dxa = d$(t,m,y)/dxa, etc. It follows from (1.5.41) that the set of the parameters ma(t), aap(t),... uniquely determines the a posteriori probability density p(t, x) at time t. 17

The functions Aa and $(a;,j/) are expanded with respect to x in a neighborhood

of the point ra(£).

88

Chapter I

Thus we can use these parameters as new arguments of the loss function, since F(t, yt,p(t, x ) ) = F(t, yt,mat,aa(3t, • • •)• However, in the general case, system (1.5.43) is of infinite order, and therefore, if we use the new sufficient coordinates (yt, mat, aapt,...) instead of the old coordinates (yt,p(tix)}> then we do not gain considerable advantage in solving special problems.

Nevertheless, there is an important class of problems in which the a posteriori probability density (1.5.41) is Gaussian (conditional Gaussian Markov processes are studied in detail in [131, 132]). We have such processes if [175] (1) the elements of the matrix Bap are constant numbers; (2) the functions Aa linearly depend on x; (3) the function <3>(a;,t/) depends on x linearly and quadratically; (4) the initial probability density (the a priori probability density of unobservable components before the observation) p(0, x) is Gaussian. Under these conditions, we have aa/g7 = aap^t = • • • = 0 in

(1.5.41) and (1.5.43), and system (1.5.43) is closed and of finite dimension:

+ &o:/3 —

^*«7"'po j-"yo

f\

•

a,/3,7,(5= I,...,TO.

(1.5.44)

Now let us consider the synthesis problem corresponding to this case. To avoid cumbersome formulas, we deal with a simplified version of problem (1.5.3)-(1.5.6). Namely, we assume that the input y(t) is absent and the system shown in Fig. 3 does not contain Block 1. Suppose that the plant P is described by the system linear with respect to the output (controlled) variables

x = G(t, u)x + b(t, u) +
(1.5.45)

where x = x ( t ) is an m-vector of output variables, G(t, u) and <j(i) are given m x TO matrices, b(t, u} is a given TO-vector-function, and £(t) is an ra-vector of random perturbations of the standard white noise type (1.1.34). More

explicitly, the vector-matrix equation (1.5.45) has the form a,/3= l , . . . , m . We observe the stochastic process

x ( t ) = P(t)x(t) + Q(t)ri(t),

Q
(1.5.46)

where x and 77 are fc-vectors, P and Q are k x n and k x k matrices, the matrix Q(t) is nondegenerate for all 0 < t < T, and rj(t) is the standard white noise (1.1.34) independent of (,(t).

Synthesis Problems for Control Systems

89

Under the assumption that the admissible control satisfies condition (1.5.4), it is required to find the optimal control uf(t) = <£>(£, ZQ) such that the cost functional

c(x(t),u(t))dt + i,(x(T))

(1.5.47) J

attains its minimum value.

We write y(t) = I X(T) dr. Jo

(1.5.48)

Then (x(t), y(t)) is a Markov stochastic process, and it follows from relations (1.5.45), (1.5.46), and (1.5.48) that the characteristics (1.5.14) of this

process have the form

Aa = Ga/}(t, u)xp + ba(t, u), Bap = Q,

Aa = Paa(t}xa, (

B^ =Qpr(t)Q
' '

'

In (1.5.49) the indices a, /3, 7 take values from 1 to m and the indices p, cr, T from TTi+1 to m + k.

In this case, it follows from (1.5.49) and (1.5.39) that in (1.5.42) we have

Bap = Ba!),

Aa = Aa,

*(x,y) = AaFapyp - -Ap \

Zi

'

(1.5.50)

(Pap is an element of the matrix H-B^H"1 = [Q(t)QT (^)] -1 ). It follows from (1.5.49) and (1.5.50) that in this case system (1.5.43) has the form (1.5.44). Substituting (1.5.49) and (1.5.50) into (1.5.44), we obtain the following system of equations for the parameters ma(t), aa/3(t), a,(3 = 1, . . . , m, of the a posteriori density:

b p ( t , u ) ] + P
- Pp/)(t)mf}),

(t, u)a7/3 - a a7 G 7/3 (t, u)

(1.5.51)

+ P<,a(t)Fffp(t)Ppf)(t). System (1.5.51) can be written in a more compact vector-matrix form. So, introducing the matrix A = ||aa/g|[J* and taking into account the fact that yp = xp according to (1.5.48), we see that (1.5.51) implies

Am = A[G(t, u)m+b(t, u)} + PT (t)[Q(t)QT (t)]-l(x - P(t)m),

A - -Aa(t)crT(t)A - GT(t,u)A - AG(t,u) T

T

+ P (t)[Q(t)Q (t)]^P(t).

(1.5.52)

90

Chapter I

Now we note that the right-hand sides of (1.5.52) do not explicitly depend on y(t), and moreover, the cost functional (1.5.47) is independent of the observable process x(t). Therefore, in this case, the current values of the vector yt do not belong to the sufficient coordinates of problem (1.5.45)(1.5.47), which are the current values of the components of the vector mt and of the elements of the matrix At. If instead of the matrix A we consider the matrix D = A~l of a posteriori covariances, then, multiplying the first equation in (1.5.52) by the matrix D from the left and the second equation in (1.5.52) from the left and from the right and taking into account the formulas

DA = AD = E,

DA + DA = 0,

D = -DAD,

we obtain, instead of (1.5.52), the relations

TO = G(t,u)m+b(t,u) + DPT(t)[Q(t)QT(t)}-1(x(t) T

T

D = v(t)
T

- P(t)m),

(1.5.53)

1

- DP (t)[Q(t)Q (t)]- P(t)D. Equations (1.5.53) are well-known equations of the Kalman filter [1, 5, 58, 79, 132]. As is known, the Kalman filter is a device for optimal filtering of the "useful signal" x(t) that is observed on the background of a random noise. In this case, the vector m(t) is an optimal18 estimate of current values of the components of the unobservable stochastic process x(t) that is the result of observation of XQ = {x(s): 0 < s < t}, provided that the observation process is given by (1.5.46). The matrix D(t) that satisfies the second (matrix) equation in (1.5.53) characterizes the accuracy of the estimation of unobservable components of the process x(t) by the vector

m(t) (see [1, 5, 79]). Equations (1.5.53) play the role of "equations of motion" for the controlled system in the space of sufficient coordinates. Since the process Q~l(t)(x(t) — P(t)mj is a white noise, the first equation in (1.5.53) is a stochastic equation of type (1.5.45), and the second equation is a usual differential (matrix) equation. Therefore, the Bellman differential equation for the loss function F(t,mt, Dt) can be derived by a technique similar to that used in §1.4 to derive Eq. (1.4.21) for the function (1.4.5). 18 The optimality of the estimate m(t) is understood as the minimum of the mean square deviation E|a;(t) — m(t)| 2 ; as is known [167, 175, 181], in the Gaussian case, m(t) coincides with the maximum point of the a posteriori probability density p(t, x) = p(x(t) = x | S0f).

Synthesis Problems for Control Systems

91

After similar calculations, we obtain the Bellman equation of the following form (see also [34, 175]) for the function F(t, m, D) in problem (1.5.45)(1.5.47):

dt

I

ft p

(TOTGT (t, u) + 6 T (t, u)) ——

3F

1

+ Sp ^(VGT(t, u) + G(t,«)D) + c(m, D, u) = 0, (1.5.54)

where dF/drn is an w,-vector with components dF/dma, a = l , . . . , m ; 82F/dmdmT is the mxm matrix of the derivatives d2F/dmadmf), a, (3 = 1,..., TO; dF/dD is the mxm matrix of the partial derivatives dF/dDap, a,/3 = 1,..., TO; and c(m, D, w) denotes the a posteriori mean of the penalty

function c(x,u) in the functional (1.5.47), that is,

= [(2 7 r) T O detD]- 1 / 2

(1.5.55) The loss function F(t, TO, D) satisfies (1.5.54) for 0 < t < T.

At the

terminal instant of time t = T, this function is determined by the relation

F(T,m,D) = EpsTp(x),

(1.5.56)

where, by analogy with (1.5.55), E ps (-) denotes integration of (•) with Gaussian density.

We see that (1.5.56) is a generalization of condition

(1.4.22) to the case of indirect observations. As usual, by solving Eq. (1.5.54) with the additional condition (1.5.56), we simultaneously obtain the optimal control u*(t) = ipi(t,m(t), D(t)) (see §1.3 and §1.4). Thus the desired algorithm of optimal control in the functional form u*(t) = (p(t,Xo) for problem (1.5.45)-(1.5.47) is the superposition of the two operations: the optimal filtering of the observed process ( x ( t ) : 0 < t < T) by means of the Kalman filter (1.5.53) and the formation of the current control ut(t) = tpi(t,m(t), D(t)). This situation is typical of other problems with indirect observations. Therefore, in the general case of the servomechanism shown in Fig. 3, the

Chapter I

92

FIG. 12 controller C actually consists of two blocks that are functionally different (see Fig. 12): the sufficient coordinate block SC that models the corresponding filter and the decision block D whose structure is determined by the solution of the Bellman equation.

Some examples of other Bellman equations obtained by using sufficient coordinates, as well as solutions of these equations, will be considered later

in §3.3, §4.2, §5.4, and §6.1.

CHAPTER II

EXACT METHODS FOR SYNTHESIS PROBLEMS

Exact solutions to synthesis problems of optimal control are of deep theoretical and practical interest. However, exact solutions can be obtained only in some special cases. The point is that exact methods are characterized by rather strict restrictions on the assumptions of the synthesis problem, but these assumptions are seldom satisfied in actual practice. It is well known that, for instance, the Bellman equation can be solved exactly under the following assumptions: (1) the dynamic equations of the plant are linear, (2) the optimality criterion of the form (1.1.11) or (1.4.3) contains only quadratic penalty functions, (3) no restrictions are imposed on the control and on the phase coordinates, (4) random actions (if any) on the system are Gaussian Markov processes or processes of the white noise type. The synthesis problems satisfying (l)-(4) are called linear- quadratic problems of optimal control. An extensive literature is devoted to these problems [3, 5, 18, 24, 72, 112, 122, 128, 132, 168]. In the present chapter we restrict our consideration to an outline of methods for solving such problems (§2.1) and consider in more detail less known results concerning the solution of some special synthesis problems with bounded controls (§§2.2-2.4).

§2.1. Linear-quadratic problems of optimal control (LQ-problems)

2.1.1. First, let us consider an elementary optimal stabilization problem of a first-order system perturbed by a Gaussian white noise (see Fig. 13). Suppose that the plant P is described by a linear scalar equation of the form

x - ax + bu+ A/KW,

(2.1.1)

where a, b, and v are given constants (y > 0) and £(t) is the standard white noise (1.1.31). The performance of this system is estimated by the following functional of the form (1.4.3) with quadratic penalty functions:

T

[cx2(t) + hu2(t)} dt + Cl z 2 (T)

(2.1.2) J

93

Chapter II

94

c

u(t)

FIG.

x(t)

P

13

(here c, ci, and ft, are given positive constants). We do not impose any restrictions on the control u and the phase variable x.

Problem (2.1.1), (2.1.2) is a stochastic generalization of the linear-quadratic problem (1.3.24), (1.3.25) considered in §1.3 and a special case of a more general problem (1.4.2)-(1.4.4). Since the stabilization system shown in Fig. 13 is a specific case of the servomechanism shown in Fig. 8, the Bellman equation for problem (2.1.1), (2.1.2),

- ex2 + min I bu—— + hu2\ = 0,

ax

~dt

'

(2.1.3)

can be obtained from (1.4.21) by setting

Ay=By = 0,

Bx = v,

c(x,y,u) = cx2 + hu2,

Ax = ax + bu.

In (2.1.3) the loss function F = F(t,x), determined, as usual, by

t
satisfies Eq. (2.1.3) in the strip ET = {0 < t < T, -oo < x < 00} and becomes a given quadratic function,

F(T, x) — cix2,

(2.1.5)

for t = T. Condition (2.1.5) readily follows from the definition of the loss function (2.1.4) or from formula (1.4.22) with ip(x, y) — c-^x2. The optimal control M* in the form (1.4.25), which minimizes the expression in the square brackets in (2.1.3), is determined by the condition

d[-]/du — 0 as follows:

—sir

<"•"

Exact Methods for Synthesis Problems

95

Substituting, instead of u, the control u* into the expression in the square brackets in (2.1.3) and omitting the symbol "min", we rewrite Eq. (2.1.3) in the form

?L +ari>L+1L*PL + 2

dt

dx

2 dx

r

2_^lf^V=o

4h\dx)

(217} {

' ' '

(Eq. (2.1.7) is just Eq. (1.4.26) for problem (2.1.1), (2.1.2)). Now to solve the synthesis problem, it remains to find the solution F(t, x)

that satisfies Eq. (2.1.7) in the strip HX and is a continuous continuation of (2.1.5) as t —> T. We shall seek such a solution in the form

F(t,x) =p(t)x2+ r ( t ) ,

(2.1.8)

where p(t) and r ( t ) are some functions of time. We choose these functions so that the solution of the form (2.1.8) satisfy (2.1.5) and (2.1.7). Substituting (2.1.8) into (2.1.7) and setting the coefficient of x2 equal to zero, as well

as the terms independent of a;, we obtain the following equations for the unknown functions p(t) and r(t):

b2 p=-c-2ap+--p2, h r=-vp.

(2.1.9) (2.1.10)

It follows from (2.1.5) that the solutions p(t) and r(t) of (2.1.9) and (2.1.10) attain the values p(T) = d, r(T)=0 (2.1.11) at the terminal time t = T. The system of ordinary differential equations (2.1.9), (2.1.10) with additional conditions (2.1.11) can readily be integrated. As a result, we obtain the following expressions for the functions p(t) and r(t): T

Di-D2eW-

)

,__

v\D2. D3 + DtJW-V =

l

D,

n

(2.1.13) where the constants (3, DI, D2, D3, and D4 are related to the parameters

of problem (2.1.1), (2.1.2) as follows: cb2

—,

DI = c + ci(a + /3), 2

b D3=f3-a+—Ci, n

D2 = c+ Cl (o -/?),

b2 D4=/3 + a-—Cl. h

Chapter II

96

From (2.1.6), (2.1.8), and (2.1.12), we obtain the optimal control law

w,(<, x) - p(t)x,

(2.1.14)

p(t) = -

which is the solution of the synthesis problem for the optimal stabilization

system in Pig. 13. It follows from (2.1.14) that in this case the controller C in Fig. 13 is a linear amplifier in the variable x with variable amplification factor p(t). In the sequel, we indicate such amplifiers by a special

mark ">." Therefore, the optimal system for problem (2.1.1), (2.1.2) can be represented as the block diagram shown in Fig. 14.

FIG. 14 Obviously, the minimum value I[u*] of the optimality criterion (2.1.2) with control (2.1.14) and the initial state z(0) — x is equal to F(0, x). From

(2.1.8), (2.1.12), and (2.1.13), we have

J4e-2^T

"fib

2/3 \D4

n

3

Di

+D 4 + r > 3

^—— ^ .

(2.1.15)

To complete the study of problem (2.1.1), (2.1.2), it remains to prove

that the solution (2.1.12)-(2.1.15) of the synthesis problem is unique. It follows from our discussion that the problem of uniqueness of (2.1.12)(2.1.15) is equivalent to the uniqueness of the solution (2.1.8) of Eq. (2.1.7). The general theory of quasilinear parabolic equations [124] implies that

Eq. (2.1.7) with additional condition (2.1.5) has a unique solution in the class of functions F(t, x) whose growth as x —¥ oo does not exceed that of any finite power of x\. On the other hand, an analysis of properties of the

loss function (2.1.4) performed in [113] showed that, for each t e [0,T] and x £ RI, the function (2.1.4) satisfies the estimate

0
97

Exact Methods for Synthesis Problems

where N(T) is bounded for any finite T. Therefore, the function (2.1.8) is a unique solution of Eq. (2.1.7), corresponding to the problem considered, and the synthesis problem has no solutions other than (2.1.12)-(2.1.15).

REMARK. The optimal control (2.1.14) is independent of the parameter v, that is, of the intensity of random actions on the plant P, and coincides with the optimal control algorithm (1.3.33), (1.3.34) for the deterministic problem (1.3.24), (1.3.25). Such a situation is typical of many other linear-quadratic problems of optimal control with perturbations in

the form of a Gaussian white noise. D The exact formulas (2.1.12)-(2.1.15) allow us to examine the process of relaxation of stationary operating conditions (see §1.4, Section 1.4.2) for the stabilization system in question. To this end, let us consider a special case of problem (2.1.1) in which the terminal state x(T)

is not penalized

(ci = 0). In this case, formulas (2.1.12) and (2.1.13) read

_a

a ) e -2/3(T-t)

(2.1.16)

'

vh, r^ -ln[/3-a + (ft-

a e

1/3. -Sln2/3. (2.1.17)

If the operating time is equal to T > t\ = 3/2/3, then the functions p(t) and r(t) determined by (2.1.16) and (2.1.17) have the form shown in Fig. 15.

vcT

/3-a

T- — T t 2/3

T- — T t

2/3

FIG.

15

The functions p(i) and r(t) are characterized by the existence of two time intervals [0, T—
98

Chapter II

ways. The first interval [0, T — ii] corresponds to the stationary operating mode, that is, p(t) ~ c/(/3 — a) = const for t (E [0, T— ti], the function r(t) linearly decreases as t grows, and on this interval the rate of decrease in r(i) is constant and equal to vc/(f) — a). The terminal interval [T — ii,T] is essentially nonstationary. It follows from (2.1.16) and (2.1.17) that the length of this nonstationary interval is of the order of 3/2/3. Obviously, in the case where this nonstationary interval is a small part of the entire operating time [0, T], the control performance is little affected if, instead of the exact optimal control (2.1.14), we use the control

x

(2.1.18)

that corresponds to the stationary operating mode. It follows from (2.1.18) that for large T the controller C in Fig. 13 is a linear amplifier with constant amplification factor, whose technical realization is much simpler than that of the nonstationary control block described by (2.1.14) and (2.1.12). Formulas (2.1.16) and (2.1.17) show that, for large values of T — t, the loss function (2.1.8) satisfies the approximate relation

F(t,x)~-?—x2 +

-(T-t).

(2.1.19)

Comparing (2.1.19) and (1.4.29), we see that in this case the value 7 of stationary mean losses per unit time, introduced in §1.4, is equal to VC

' ~~ p-a a '

(2.1.20)

that is, coincides with the rate of decrease in the function r(t) on the stationary interval [0, T —
It should be noted that to calculate 7 and the function /(a;), we need not have exact formulas for p(t) and r(i) in (2.1.8). It suffices to use the corresponding stationary Bellman equation (1.4.30), which in this cases has the form

df and to substitute the desired solution in the form /(x) = px2 into (2.1.22). We obtain the numbers p and 7, just as in the nonstationary case, by setting

Exact Methods for Synthesis Problems

99

the coefficients of x2 and the free terms on the left- and right-hand sides in (2.1.22) equal to each other. We also note that if at least one of the parameters a, 6, i/, c, and h of problem (2.1.1), (2.1.2) depends on time, then, in general, there does not exist any stationary operating mode. In this case, one cannot obtain finite

formulas for the functions p(t) and r(t) in (2.1.8), since Eq. (2.1.9) is a Riccati equation and,

in general, cannot be integrated exactly. Therefore,

if the problem has variable parameters, the solution is constructed, as a rule, by using numerical integration methods. 2.1.2. All of the preceding can readily be generalized to multidimensional problems of optimal stabilization. Let us consider the system shown in Fig. 13 whose plant P is described by a linear vector-matrix equation of the form

(2.1.23)

where x = x(t] e Rn is an n- vector-column of phase variables, u G Rr is an r- vector of controlling actions, and £(t) £ RTO is an ra- vector of random perturbations of a Gaussian white noise type with characteristics (1.1.34). The dimensions of the matrices A, B, and a are related to the dimensions of the corresponding vectors and are equal to n x n, n x r, and n x m, respectively. The elements of these matrices are continuous functions of

time1 defined for all t from the interval [0, T] on which the controlled system is considered. For the optimality criterion, we take a quadratic functional of the form

= EJ f

[xT(t)G(t)x(t)+uT(t)H(t)u(t)}dt xTt(T)Qx(T}\.

(2.1.24)

Here Q and G(t) are symmetric nonnegative definite n x n matrices and the symmetric r x r matrix H(t) is positive definite for each t e [0, T]. Just as (2.1.3), the Bellman equation for problem (2.1.23), (2.1.24) follows from (1.4.21) if we set A« = B« = 0, Bx = a(t)(TT(t), Ax = A(t)x + B(t)u, and c(x,y,u] = xTGx + UTHu. Thus we obtain

r,. d2F G(t)x 1

+ min \uTBT(t) —— + uTH(t)u\ = 0. « \_ ax J

(2.1.25)

As was shown in [156], it suffices to assume that the elements of the matrices A(t), .B(t), and rr(t) are measurable and bounded.

100

Chapter II

In this case, the additional condition on the loss function (1.4.22) has the form F(T, x) - xTQx.

The further considerations leading to the solution of the synthesis problem

are similar to those in the one-dimensional case. Calculating the minimum value of the expression in the square brackets in (2.1.25), we obtain the optimal control 1

Q EI

(t)-,

(2.1.26)

which is a vector analog of formula (2.1.6). Substituting the expression

obtained for w* into (2.1.25), we arrive at the equation aP

f) p

i

G(T)x - -B(t)H^(t)BT(t)

= 0. (2.1.27)

We seek the solution of (2.1.27) as the following quadratic form with respect to the phase variables:

F(t,x) = xTP(t)x + r(t).

(2.1.28)

Substituting (2.1.28) into (2.1.27) and setting the coefficients of the quadratic (with respect to x) terms and the free terms on the left-hand side in (2.1.27) equal to zero, we obtain the following system of differential equations for the unknown matrix P(t) and the scalar function r(t):

P + AT(t)P + PA(t) + G(t) - PB(i)H-1(t)BT(t)P T

r + Sp Pa-(t)(T (t) = 0,

= 0, P(T) = Q;

r(T) = 0.

(2.1.29)

If system (2.1.29) is solved, then the optimal solution of the synthesis problem has the form

uf(t,x) = -H-1(t)BT(t)P(t)x

= J>(t)x,

(2.1.30)

which follows from (2.1.26) and (2.1.28). Formula (.2.1.30) shows that the controller C in the optimal system in Fig. 13 is a linear amplifier with n inputs and r outputs and variable amplification factors. Let us briefly discuss the possibilities of solving system (2.1.29). The existence and uniqueness of the nonnegative definite matrix P(i) satisfying the matrix-valued Riccati equation (2.1.29) are proved in [72] under the above assumptions on the properties of the matrices A(t), B(t), G(t), H(t),

Exact Methods for Synthesis Problems

101

and Q. One can obtain explicit formulas for elements of the matrix P(t) only by numerical methods,2 which is a rather complicated problem for large dimensions of the phase vector x. In the special case of the zero matrix G(t) = 0, the solution of the matrix equation (2.1.29) has the form [1, 132]

rT

+ Q/ Jt

-i-l

X(T,s)B(s)H-1(s}BT(s)XT(T,s)ds\

QX(T,t).

(2.1.31)

J

Here X(t,s), t > s, denotes the fundamental matrix of system (2.1.23); sometimes this matrix is also called the Cauchy matrix. The properties of the fundamental matrix are described by the relations

X(t, t) = E,

= A(t)X(t, ,),

= -X(t, s}A(s).

(2.1.32) One can construct the matrix X(t, s) if the so-called integral matrix Z(t) of system (2.1.23) is known. According to [111], a square nxn matrix Z(t) is called the integral matrix of system (2.1.23) if its columns consist of any n linearly independent solutions of the homogeneous system x = A(t)x. If the matrix Z(i) is known, then the fundamental matrix X(t, s) has the form X(t,s) = Z(i)Z-l(s). (2.1.33) One can readily see that the matrix (2.1.33) satisfies conditions (2.1.32). The fundamental matrix can readily be calculated if and only if the elements of the matrix A(t) in (2.1.23) are time-independent, that is, if A(t) = A = const. In this case, we have

and the exponential matrix can be expressed in the standard way [62] either via the Lagrange-Silvester interpolation polynomial (in the case of simple eigenvalues of the matrix A) or via the generalized interpolation polynomial (in the case of multiple eigenvalues and not simple elementary divisors of the matrix A). If the matrix A is time-varying, the construction of the fundamental matrix (2.1.33) becomes more complicated and requires, as a rule, the use of numerical integration methods. 2 There also exist approximate analytic methods for calculating the matrices P(t) [ I , 72]. However, for matrices P(t) of larger dimensions, these methods meet serious

computational difficulties.

102

Chapter II

2.1.3. The results obtained by solving the basic linear-quadratic problem (2.1.23), (2.1.24) can readily be generalized to more general statements

of the optimal control problem. Here we only list the basic lines of these generalizations; for a detailed discussion of this subject see [1, 5, 34, 58, 72,

122, 132]. First of all, note that the synthesis problem (2.1.23), (2.1.24) admits an exact solution even if there are noises in the feedback circuit, that is, if

instead of exact values of the phase variables x(i), the controller C (see Fig. 13) receives distorted information of the form

x ( t ) = N(t)x(t) +
(2.1.34)

where N(t) and
process of the white noise type (1.1.34) or a Gaussian Markov process. In this case, the optimal control algorithm coincides with (2.1.30) in which,

instead of the true values of the current phase vector x = x ( t ) , we use the vector of current estimates m — m(i) of the phase vector. These estimates

are formed with the help of Eqs. (1.5.53) for the Kalman filter, which with regard to the notation in (2.1.23) and (2.1.34) have the form 3

m = [A(t) - B(t)H-i(t)BT(t)P(t)}m + DJV^M^Wr 1 ^) - N(t)m), (2.1.35) T

T

D =
well-known separation theorem [58, 193]. The next generalization of the linear-quadratic problem (2.1.23), (2.1.24) is related to a more general model of the plant. Suppose that, in addition to additive noises (,(t), the plant P is subject to perturbations depending on the state x and control u and to pulsed random actions with Poisson distribution of the pulse moments. It is assumed that the behavior of the plant P is described by the special equation

x = A^x+B^u+o-^x^+o-^u&^+a^^+az^Ott), 3

(2.1.37)

Equations (2.1.35) and (2.1.36) correspond to the case in which 77(4) in (2.1.34) is a

white noise.

Exact Methods for Synthesis Problems

103

FIG. 16 where £i(t) and £2^) are scalar Gaussian white noises (1.1.31), 9(t) is an I-vector of independent Poisson processes with intensity coefficients A,- (i = !,...,£), <TI, 0*2, and 173 are given n x n, n x r, and n x I matrices, and

the other variables have the same meaning as in (2.1.23). For the exact solution of problem (2.1.37), (2.1.24), see [34]. We also note that sufficiently effective methods have been developed for infinitely dimensional linear-quadratic problems of optimal control if the

plant P is either a linear dynamic system with distributed parameters or a quantum-mechanical system. Results corresponding to control of distributed parameter systems can be found in [118, 130, 164, 182] and to control of quantum systems in [12, 13]. All linear-quadratic problems of optimal control, treated examples, are characterized by the fact that isfying the Bellman equation is of quadratic form (a and the optimal control law is a linear function (a

as well as the abovethe loss function satquadratic functional) linear operator) with

respect to the phase variables (the state function). To solve the Bellman equation becomes much more difficult if it is necessary to take into account some restrictions on the domain of admissible control values in the design of an optimal system. In this case, exact analytical results can be obtained, as a rule, for one-dimensional synthesis problems (or for problems reducible to one-dimensional problems). Some of such problems are considered in the following sections of this chapter. §2.2. Problem of optimal tracking a wandering coordinate

Let the input (command) signal y ( t ) in the servomechanism shown in Fig. 2 be a scalar Markov process with known characteristics, and let the plant P be a servomotor whose speed is bounded and whose behavior is

104

Chapter II

described by the scalar deterministic equation

x-u,

\u(t)\
(2.2.1)

(here um determines the admissible range of the motor speed, — um < x < um). Equation (2.2.1) adequately describes the dynamics of a constant

current motor controlled by the voltage on the motor armature under the assumption that the moment of inertia and the inductance of the armature

winding are small [2, 50]. We shall show that various synthesis problems stated in §1.4 can be solved for such servomechanisms. 2.2.1. Let y(t] be a diffusion Markov process with constant drift a and

diffusion B coefficients. We need to calculate the controller C (see Fig. 2) that minimizes the integral optimality criterion f

c (x(t),y(t))dt,

(2.2.2)

Jo

where c(x, y) is a given penalty function.

By setting A* = a, B« = B, ax = u, and Bx = 0 in (1.4.21), we readily obtain the following Bellman equation for problem (2.2.1), (2.2.2):

dF dt

dF dy

Bd2F 2 dy

.

.

. r dF] «-— = 0. |«|<« TO L dx J

— — + a — — + - — —2 - + c z,y + mm

2.2.3

We shall consider the penalty functions c(x, y) depending only on the error signal, that is, on the difference z = y — x between the command input y and the controlled variable x. Obviously, in this case, the loss function F(t,x,y) = F(t,y—x) = F(t,z) in (2.2.3) also depends only on z. Instead of (2.2.3), we have

dF at

dF dz

Bd2F 2 dz

r dFi - u—— \= 0. l«l<«™ L dz J

-5T + a1T- + TlTir 2 + c(z + mm

(2.2.4)

The minimum value of the function in the square brackets in (2.2.4) can be obtained by using the control4

ut — um sign f —— J , 4

(2.2.5)

In (2.2.5) sign
sign
Exact Methods for Synthesis Problems

105

which requires to switch the servomotor speed instantly from one admissible limit value to the opposite value when the derivative dF(t, z)/dz of the loss function changes its sign. Control of the form (2.2.5) is naturally called control of relay type (sometimes, this control is called "bang-ban
Substituting (2.2.5), instead of u, into (2.2.4) and omitting the symbol "min", we reduce Eq. (2.2.4) to the form

3F dF B d2F 8F -57- +0-5- + -^-^+ c(z) -Urn -£- = 0 . dt dz 2 dz2 dz

(2.2.6)

In [113, 124] it was shown that in the strip UT = {0 < t < T, -oo < z < 00} Eq. (2.2.6) has a unique solution F(t,z) satisfying the additional condition F(T, z) = 0 if the penalty function c(z) is continuous and does not grow too rapidly as z\ —>• oo.5 In this case, F(t, z) is a function twice continuously differentiable with respect to z and once with respect to t. In particular, since dF/dz is continuous, the condition

—— ( t , z ) = Q

(2.2.7)

must be satisfied at the moment of switching the controlling action. If c(z) > 0 attains its single minimum at the point z = 0 and does not decrease monotonically as z| —> oo, then Eq. (2.2.7) has a single root z f ( t ) for each t. This root determines the switch point of control. On

different sides of the switch point the derivative dF/dz has opposite signs. If dF/dz > 0 for z > z p (<) and dF/dz < 0 for z < zr(t), then we can write the optimal control (2.2.5) in the form

w»(t,z)=umsign(z-zr(t)).

(2.2.8)

Thus, the synthesis problem is reduced to finding the switch point z (t}. To this end, we need to solve Eq. (2.2.6).

Equation (2.2.6) has an exact solution if we consider the stationary tracking. In this case, the terminal time (the upper limit of integration in (2.2.2)) T —>• oo, and Eq. (2.2.6) for the time-invariant loss function (see (1.4.29)) •t)]

(2.2.9)

becomes the ordinary differential equation

B d2f ^d^

df +a

dz~U"

+ c(z)=
(2.2.10)

5 More precisely, the condition that there exist positive constants A\, A?, and a such that 0 < c(z) < A\ + A? z\a for all z implies constraints on the growth of the

function c(z).

106

Chapter II

which can be solved by the matching method [113, 171, 172]. Let us show how to do this. Obviously, the nonlinear equation (2.2.10) is equivalent to two linear equations

f^+ <-->£+•<•>=* »>v £>»• *

l

(2211)

-

for the functions f i ( z ) and f i ( z ) that determine the function f ( z ) on each side of the switch point z p . The unique solutions to linear equations (2.2.11) are determined by the behavior of /i and /2 as \z\ —>• oo. It follows from the statement of the problem that if we take into account the diffusion "divergence" of the trajectories z(t) for large \z , then we only obtain small corrections to the value of the optimality criterion and, in the limit as \z\ —)• oo, the loss functions f i ( z ) and /2(z) must behave just as the solutions to Eqs. (2.2.11) with B = Q. The corresponding solutions of Eqs. (2.2.11) have the form =

z

2 f°°

-fi JIz B

"' ""'

I c ( z ) - f]

2 [*

dz

„ = —S B

r

ex

\

2K.-a)

1

(2 2 12

L z P \_ - - J —5—— ( ~ z) \ B

>-~+.>r }]f

-- '

\

i i —-r- J t c ( z )-7]exp

i -\~ /

J-oo

„

t-2

Z ) | UZ.

L

According to (2.2.7), we have the following relations at the switch point z p :

~(zr) = -^-(zr) = 0.

(2.2.13)

Substituting (2.2.12) into (2.2.13), considering (2.2.13) as a system of equations with respect to two unknown variables z p and -y, and performing some simple transformations, we obtain the equation for the switch point: /

/ Jo

\cI z -\- ________ ) — r\ ? — ________ I \e~v rlit — fl
O 9 14^ ^Z.Z.ltj

and the expression for the stationary tracking error

^ f zZ +7 - r cC

"Jo

Bp

}c-dvr dv

( r 2(um-a))e

-J0

Cc(z Z

( r

BV

}c-*dv dv

2(um+a))e

' (2.2.15)

Exact Methods for Synthesis Problems

107

To obtain explicit formulas for the switch points and stationary errors, it is necessary to choose some special penalty functions c(z). For example, for the quadratic penalty function c(z) = z 2 from (2.2.14), (2.2.15), we have 5^-j,

(2-2.16)

If c(z) — z|, then we have /

n

\

(2.2.18)

,. 2(um-a) ™ + a ) l n 1 + ——

+um-a

(2.2.19) It should be noted that formulas (2.2.16)-(2.2.19) make sense only under the condition um > a. This is due to the fact that the stationary operating mode in the problem considered may exist only for um > a. Otherwise, (for a > um), the mean rate of increase in the command signal y(t) is larger

than the limit admissible rate of change in the output variable x(t), and the error signal z ( t ) = y(t) — x ( t ) is infinitely growing in time. If the switch point zp is found, then we know how to control the servomotor P under the stationary operating conditions. In this case, according to (2.2.8), the optimal control has the form «„ (z(t)) = um sign ( z ( t ) - z r ) ,

(2.2.20)

and hence, the block diagram of the optimal servomechanism has the form

shown in Fig. 17. The optimal system shown in Fig. 17 differs from the optimal systems considered in the preceding section by the presence of an essentially nonlin-

ear ideal-relay-type element in the feedback circuit. The other distinction between the system in Fig. 17 and the optimal linear systems considered in §2.1 is that the control method depends on the diffusion coefficient B

of the input stochastic process (in §2.1, the optimal control is independent of the diffusion coefficients,6 and therefore, the block diagrams of optimal deterministic and stochastic systems coincide).

If B = 0 (the deterministic case), then it follows from (2.2.16)-(2.2.19) that the switch point zf — 0 and the stationary tracking error 7 = 0. These 6

This takes place if the current values of the state vector x(t) are measured exactly.

Chapter II

108

FIG. 17 results readily follow from the statement of the problem; to obtain these results it is not necessary to use the dynamic programming method. Indeed,

if at some instant of time we have y(t] > x(t) (z(t) > 0), then, obviously, it is necessary to increase x at the maximum rate (that is, at u = +um) till the equality y = x (z = 0) is attained. Then the motor can be stopped. In

a similar way, for y < x (z < 0), the control u — —um is switched on and operates till y becomes equal to x. After y = x is attained and the motor is stopped, the zero error z remains constant, since there are no random actions to take the system out of the state z = 0. Therefore, the stationary tracking "error" is zero.7 If the diffusion is taken into account, then the optimal deterministic

control u*et = um sign z is not optimal. This fact can be explained as follows. Let u — um signz, and let 5 ^ 0 . Then the following two factors affect the trajectories z(t): they regularly move downwards with velocity (um —a) for z > 0 and upwards with velocity (um + a) for z < 0 due to the drift a and control u (see Fig. 18), and they "spread" due to the diffusion B that is the same for all z. As a result, the stochastic process z(t) becomes

stationary (since the regular displacement towards the i-axis is proportional to t and the diffusion spreading away from the i-axis is proportional to \/t) and all sample paths of z ( t ) are localized in a strip of finite width containing the t-axis.8 However, since the "returning" velocities in the upper and lower

half-planes are different, the stationary trajectories of z ( t ) are arranged not It is assumed that the penalty function c(z) attains its minimum value at z = 0 and c(0) = 0. 8 More precisely: if 2(0) = 0, then with probability 1 the values of z(t) lie in a strip of finite width for all t > 0.

Exact Methods for Synthesis Problems

109

z(t)

FIG. 18 symmetrically with respect to the line z = 0, as is conventionally shown in Fig. 19. If the penalty function c(z) is an even function (c(z) — c(—z)), then, obviously, the stationary tracking error 7 = Ec(z) (see (1.4.32)) can be decreased by placing the strip AB (where the trajectories are localized) symmetrically with respect to the axis z = 0. This effect can be reached

by switching the control u at some negative value zr rather than at z = 0. The exact position of the switch point zr is determined by formulas (2.2.14), (2.2.16), and (22.2.18).

FIG. 19 In conclusion, we note that all results obtained in this section can readily be generalized to the case where the plant P is subject to additive noncon-

trolled perturbations of the white noise type (see Fig. 10). In this case,

110

Chapter II

instead of Eq. (2.2.1), we have

um,

(2.2.21)

where £(t) is the standard white noise (1.1.31) independent of the input process y(t) and N > 0 is a given number. In this case, the Bellman equation (2.2.3) acquires the form

dF - ^ dt

dF — dy

Bd2F 2 dy22

Nd2F 2 dx2o

.

i

.

.

mm

r dF

|«|<«

and instead of (2.2.4), we obtain

dF -7 at

dF -5az

B + Nd2F 2 dz2

. .

. r dFi A mm - «—— = 0. l«l<«m L oz

This equation differs from (2.2.4) only by a coefficient of the diffusion term. Therefore, all results obtained for systems whose block diagram is shown in Fig. 2 and whose plant is described by Eq. (2.2.1) are automatically

valid for systems in Fig. 10 with Eq. (2.2.21) if in the original problem the diffusion coefficient B is replaced by B + N. In particular, if noises in the plant are taken into account, then formulas (2.2.16) and (2.2.17) for the stationary switch point and the stationary tracking error take the form Sf =

_(B + N)a ~

2

Note also that the problem studied in this section is equivalent to the synthesis problem for servomechanism tracking a Wiener process of intensity B with nonsymmetric constraints on admissible controls — um + a <

u < um + a, since both these problems have the same Bellman equation

(2.2.4). 2.2.2. Now let us consider the synthesis problem that differs from the problem considered in the preceding section only by the optimality criterion. We assume that there is an admissible domain [^1,^2] for the error

z(i) = y(t) - x(t) (ti and £2 are given numbers such that li < I?). We assume that if z(t) leaves this domain, then serious undesirable effects may occur. For example, the system considered or a part of any other more

complicated system containing our system may be destroyed. In this case,

Exact Methods for Synthesis Problems

111

it is natural to look for controls that keep z(t] within the admissible limits for the maximum possible time. General problems of calculating the maximum mean time of the first passage to the boundary were considered in §1.4. In particular, the Bellman equation (1.4.40) was obtained. In the scalar case studied here, this equation has the form

Bd2Fj

S=-1

dj\

~2~dv^ + a~dy~^ H
(2-2-24)

(Eq. (2.2.24) follows from (1.4.40), (1.4.31), since A« = a, Ax = u, BV = B, Bx = 0). Recall that the function Fi(x,y) in (2.2.24) is equal to the maximum mean time of the first passage to the boundary of the domain of admissible phase variables if the initial state of the system is (x, y). In the case where the domain of admissible values (x, y) is determined by the error signal z = y — x, the function FI depends only on the difference FI(X, y) = Fi(y — x) — F\(z] and, instead of the partial differential equation (2.2.24), we have the following ordinary differential equation for the function FI(Z): „ , 22 f a^+ max - «^ =-1. 2 dz dz |«|<«™ L dz J

(2.2.25)

The function -Fi(z) satisfies Eq. (2.2.25) at the interior points of the domain [£1,^2] °f admissible errors z. At the boundary points of this domain, FI vanishes (see (1.4.41)):

The optimal system can be synthesized by solving Eq. (2.2.25) with the boundary conditions (2.2.26). Just as in the preceding section, one can see that the optimal control u f ( z ) is of relay type and is equal to

ux(z) = -um sign (~rL)•

(2.2.27)

Using (2.2.27), we transform Eq. (2.2.25) to the form

dz

(2.2.28)

The condition of smooth matching (see [113], p. 52) implies that the solution -Fi(z) of Eq. (2.2.28) and the derivatives dFi/dz and d2Fi/dz2 are

Chapter II

112

FIG. 20 continuous everywhere in the interior of [li z1 is determined by the condition

dz

\. Therefore, the switch point

= 0.

(2.2.29)

The same continuity conditions and the boundary conditions (2.2.26), as well as the "physical" meaning of the function F±(z), a priori allow us to estimate the qualitative behavior of the functional dependence -Fi(z). The corresponding curve is shown in Fig. 20. It follows from (2.2.29) that the switch point corresponds to the maximum value of FI(Z). In this case, F ( ( z ) < 0 for z > z1, and F[(z] > 0 for z < z1. In particular, this implies that the optimal control (2.2.27) can be written in the form

u*(z) = -u r a sign(z-zj), which is similar to (2.2.20) and differs only by the position of the switch point. Thus, in this case, if the applied constant displacement —zr is replaced by — z1, then the block diagram of the optimal system coincides with that in Fig. 15. The switch point z1 can be found by solving Eq. (2.2.28) with the boundary conditions (2.2.26). Just as in the preceding section, we replace the nonlinear equation (2.2.28) by the following pair of linear equations for the

Exact Methods for Synthesis Problems

113

function Ff(z), zl < z < £ 2 > and the function F^(z), lj_ < z < z^:

Bd2F+ t .dF+ V^H~+ (a-Wm)-7^ = -l, 2 2 dz dz Bd2F~ .dFr , r

1

z1
Y^- + (a + U ™ ) ^ = -1'

, 2>2>30)

^<*<^

The required switch point z1 can be obtained from the matching conditions for F f ( z ) and F^(z). Since FI(Z) is twice continuously differentiable, it follows from (2.2.27) that these conditions have the form

F?(*i) = fT(*1), v

dz

(2-2-31)

dz

r'

The boundary conditions (2.2.26) and (2.2.32) for F?(z) and J'f(z) imply ,-.0. /

\

^ — <-2

um - a

-^

2(um - a)2 {

exp

-

I B v2(um -a)(z-z1}

— exp

B j.j

n (-1

—— 2

um + a

s

f\ 1 . .

i

.

W

.

' \'

(2.2.33)

It

I

\

'

IJ

2(um + a) 2 |_

*L

B -a)(z-zl

By using (2.2.33) and the continuity condition (2.2.31), we obtain the following transcendental equation for the required point z1:

2umz* = (um + a)£2 + (um - a)li

™

— ,

B\um2 I um + .i._. 4- n.

r

9

.

i 1

(2.2.34)

um — a

In the simple special case a = 0, it follows from (2.2.34) that i

4+4

114

Chapter II

that is, the switch point is the midpoint of the interval of admissible errors z. This natural result can be predicted without solving the Bellman equation. In the other special case where —li — I? = t (I > 0) and a
Zr

oB

~ <

at r i + exp(2Mro£/5)1 +

^ [l-exp(2uml/B)\ '

To find z1 in the other cases, it is necessary to solve the transcendental equation (2.2.34). 2.2.3. Assume that the performance of the servomechanism shown in Fig. 2 is determined by the maximum error z(i] = y(z) ~ x(t] on a fixed

time interval 0 < t < T. Then it is natural to minimize the optimality criterion

/[«] = E max z ( t ) \ = E max \y(t) - x(t)|, 0
0
(2.2.35)

which is a special case of the criterion (1.1.18). For convenience, we shall

use the modification (1.4.48) of the criterion (1.1.18), that is, instead of (2.2.35), we shall minimize I[u] = Emax z(r)| e -^ r -*>.

(2.2.36)

T>t

The parameter (3 > 0 determines the observation time for the stochastic process z(r). We assume that the criteria (2.2.35) and (2.2.36) are equivalent if the terminal time T and the variable (3 are matched, for example, as follows: T = c//3, where c > 0 is a constant. The Bellman equation for the problem considered can be obtained from (1.4.51) with regard to the relation /2(#,y) = f z ( y — x) — /2(z). This equation has the form

min \-u~ =/?/ PJ22 , «|<« m L dz\ '

i fJ/ 2 ( z ) > | z , ^ ' ' '

(2.2.37)

otherwise. Just as in the preceding sections, after the expression in the square brackets is minimized, Eq. (2.2.37) acquires the form = /?/2,

if/2(*)>|*,

(2.2.38)

otherwise. In this case, just as in the preceding sections, the optimal control w*(z) is

of relay type and can be written in the form (2.2.20). The only distinction

Exact Methods for Synthesis Problems

115

is that, in general, the switch point z 2 differs from z p and z1. The point z 2 can be found by solving Eq. (2.2.38). Solving Eq. (2.2.38), we shall distinguish two domains on the z-axis: the domain Z-± where f2(z] > \z\ and the domain Z2 where f2(z) = \z\.

Obviously, if f2(z*) = \z*\ for any z*, then / 2 (z) = z| for any z such that \z\ > z* . In other words, the domain Z2 consists of two infinite intervals

(—oo, z') and (z",+oo). In the domain Z\ lying between the boundary points z' < 0 and z" > 0, we have

Bd2f2 2

2 dz

,

d/2

+ a—— - un

dz

d/2 = /3f2. dz

(2.2.39)

Next, the interval [z', z"] is divided by the switch point z 2 into the following two parts: the interval z' < z < z 2 where Eq. (2.2.39) takes the form

=0

(2-2.40)

and zr2 — < z< z" where —

Thus, in this case, we have seven unknown variables: z', z", z 2 , and the four constants obtained by integrating Eqs. (2.2.40) and (2.2.41). They can be obtained from the following seven conditions:

, c 2} I\ Z / I>, r

( =^ df2

.

2.

_

—;—— V I Z ;I —

/2

(2.2.42) 2

_

—— V ( Z ;I — U .

Formulas (2.2.42) are smooth matching conditions for the solutions f2 (z) and f i ~ ( z ) .

The last three conditions show that the solutions and their

first-order derivatives are continuous at the switch point z 2 (see (2.2.31) and (2.2.32)). The first four conditions show that the solutions and their first-order derivatives are continuous at the boundary points of Z\. By solving (2.2.40) and (2.2.41) with regard to (2.2.42), we readily obtain

116

Chapter II

the following three equations for z', z", and z2: 1

A2{exp[-(A1 + A 2 ) ( Z 2 _ z ' ) ] _ 1

' ~ ^ " ~ l; 1

(2.2.44)

1

-

_e

(2.2.45)

In (2.2.43)-(2.2.45) we have used the notation

= •= um - a + v/(wm - a) 2 + 2/3B , a 1

1 o \

uro + "^("m - a) 2 + 2/3B

The desired switch point z2 can be found by solving the system of transcendental equations (2.2.43)-(2.2.45). Usually, this system can be solved by numerical methods. One can obtain the explicit expression for z1 from

Eqs. (2.2.43)-(2.2.45) only if the problem is symmetric, that is, if a — 0. In this case, the domain Zi is symmetric about the origin, z' = —z", and we have the switch point z2 — 0. However, this is a rather trivial case and of no practical interest.

REMARK. It should be noted that the optimal systems considered in Sections 2.2.2 and 2.2.3 are very close to each other (the switch points nearly coincide, zj Kt z2) if the corresponding parameters of the problem agree well with each other. These parameters can be made consistent in

the following way. Assume that the same parameters a, B, and um are given in problems of Sections 2.2.2 and 2.2.3. Then, choosing a value of

Exact Methods for Synthesis Problems

117

the parameter /3, we can calculate three numbers z' = z'(/3), z" = z"(/3), and z2 — z2((3) in dependence on the choice of (3. Now if we use z' and z" as the boundary values of admissible errors (li = z' (/3), ti — z" (/3)) in the problem considered in Section 2.2.2, then by solving Eq. (2.2.34), we obtain the coordinate of the switch point z* and show that z^((3) tel z^(f3)9 for (3 varying from 1.0 to 10~4. This is confirmed by the numerical experiment

described in [92]. Moreover, in [92] these values of the parameter /3.

it is shown that Fi(z^(f3)}

« (3'1 for

D

2.2.4. Now let us consider the synthesis problem of optimal tracking a discontinuous Markov process. Let us assume that the input process y(t) in the problem of Section 2.2.1 is a pure discontinuous Markov process. As shown in §1.1, such processes are completely characterized by the intensity

\(y) of jumps and the density function ir(y,y') describing the transition probabilities at the jump moments. The one-dimensional density p(t, y) of this process satisfies the Feller equation (see (1.1.71))

dp(t, y) - A(j/)p(t, y) + I A(Z)TT(Z, y)p(t, z) dz = 0. dt

(2.2.46)

From (1.4.61) with regard to (2.2.1) and (2.2.2), we obtain the Bellman equation

ir(y,z)F3(t,x,z)dz-F3(t,x,y) + c(x,y)+

min u - ( t , x, y) = 0. l«l<« m L ox J

(2.2.47)

If we denote the integro-differential operator of Eq. (2.2.46) by L f t y , then this equations can be written in the short form

Lt,yP(t,y)=Q.

(2.2.48)

Comparing Eqs. (2.2.46) and (2.2.47) with the Feller equations (1.1.69) and (1.1.70), we see that, for pure discontinuous processes, the Bellman equation (2.2.47) contains the integro-differential operator L^y of the backward Feller equation; this operator is dual to Ltiy. Therefore, Eq. (2.2.47) can be written in the form , x, y) + c(x, y) + min \u~(t, x, y)} = 0. [«|<« m [_ 9

OX

(2.2.49)

J

The approximate relation z^(&) « z*(P) means that \z*(0)-z^(P)\ < z"(/3)-z'(/3).

118

Chapter II

In what follows, we assume that the input Markov process y(t) is homogeneous with respect to the state variable y, that is, A(y) = A = const and •x(y,y') = n(y — y'). In this case, by using the formal method proposed in [176], we can replace the integro-differential operator L^>y in (2.2.47) and

(2.2.49) by an equivalent differential operator. Let us show how to do this. First, we try to write Eqs. (2.2.46) and (2.2.48) in the form

(2 2 50)

--

where L is the required differential operator and Lp is the density of the probability flow [160, 173]. We apply the Fourier transform to (2.2.50) and

(2.2.46). For the Fourier transform of the probability density oo

/

esyp(t,y)dy,

s = u,

=\

•oo

we obtain the following two equations from the well-known property of the Fourier transform of the convolution of two functions:10

),

(2.2.51)

s)-l),

(2.2.52)

where 7f(s) denotes

f°° s W(s) = I e ^(y}dy.

(2.2.53)

J — oo

Comparing (2.2.51) and (2.2.52), we obtain the spectral representation of the desired operator L(8)

= A^" 1 .

(2.2.54)

s

If the expression on the right-hand side of (2.2.54) is the ratio of polynomials 0

=

Q(s)

l

...

m

qo + qis + - - - + qnsn

V

'

(hi and 5,- are constant numbers), then, as follows from the theory of Fourier

transforms [41], the desired operator L(d / dy) can be obtained from L ( s ) 10

RecaII that A(y) = A and ir(z,y) = Tr(y - z) in (2.2.4

Exact Methods for Synthesis Problems

119

by the formal change s —>• d/dy. Using the operator L(d/dy), we transform the Bellman equation (2.2.49) to the form

(note that if L(d/dy) = H(d/dy)/Q(d/dy), then the Li/> =
= Q(d/dy)
!?- =i-c(z)

dz

(2.2.57)

(here the time-invariant loss function /s = fs(z) is determined by analogy with (2.2.9) and 7 is the stationary error.) The optimal control (2.2.58) 3 differs from (2.2.20) only by the^ position puoiuiuii of ui the oiic switch owiu^ii point puniu z 7 , which can

be obtained from the condition dz

v r/

.

To complete the solution of the synthesis problem, we need to calculate z3. By analogy with Section 2.2.1, the switch point z3 divides the z-axis into two parts: the domain z > z3, where df$/dz > 0, it* = um, and

Eq. (2.2.57) has the form

and the domain z < z 3 , where df^/dz the form

w

/ A \

< 0, u* = um, and Eq. (2.2.57) has

AtF + um -^- = 7 - c ( z ) .

(2.2.60)

At the switch point z3 the derivatives of the functions / + (z) and f ~ ( z ) vanish:

120

Chapter II

To solve the linear equations (2.2.59) and (2.2.60) explicitly, we need to specify the form of the linear operator L(d/dz). Assume that the density of the transition probability Tr(y,y') at the jump moment is given by the formula f

1I exp(-fc / L 2 |y I / -2/1)n ffor //
^(s) = 71 IT-•——— fc2 —— + Afc s - s22 '

(2.2.63)

A* = fc2

After the change s —> d/dz, we obtain the following expression for the operator L(d/dz) from (2.2.63) and (2.2.54):

(2.2.64) With regard to (2.2.64), we can write Eqs. (2.2.59) and (2.2.60) in the form x(_d__

dz

*,\df£ I dz

/,,

.. d

Introducing the functions

we transform the system (2.2.65) as follows:

=

um ) dz

,

where •U, 7rn/ 1[_

(U,^. 1 "7

(17 U,/(,

JI

Relations (2.2.61) and (2.2.66) imply the following matching conditions for the functions
(k2um + \&k)p+(z3r) = (-k22um + \&k)
(2.2.69)

Exact Methods for Synthesis Problems

121

The characteristic equations corresponding to Eqs. (2.2.67) are

umj

\

um J

=0.

(2.2.70)

By fj,^ and ^ we denote the roots of the characteristic equation for the function (p+ (z) (correspondingly, by /j^ and /j,^ the roots of the characteristic equation for (f>~(z)). A straightforward verification shows that if

A

(2.2.71)

then (1) all roots /uf )2 are real,

(2) each characteristic equation in (2.2.70) has roots of opposite signs (for definiteness, in what follows, we assume that ^ and /j,^ are positive, and respectively, fj,^ < 0). REMARK. Note that condition (2.2.71) must be satisfied, since this is the existence condition for the stationary tracking considered here. Indeed, the expression on the left-hand side in (2.2.71) is equal to the absolute value of the mean rate of the regular displacement in the command signal y(t) caused by random jumps. Obviously, this rate of change cannot exceed the limit admissible speed of the servomotor. Inequality (2.2.71) just confirms

this fact.

D

Taking into account these properties of the roots of the characteristic equations (2.2.70), one can readily see that the solutions of Eqs. (2.2.67) have the form ±

(z)=

X

f r

"?('-*') ± ( '}

'

~ (/if - / 4 ) L / f o o 6 fZ

±, ^ a. 1 ei*2(*-*)c±(z')dz'\.

(2.2.72)

Using (2.2.72), from the matching conditions (2.2.69), we obtain the following equation for the required switch point z3:

(k2um + AAfc) I" r° elt+yc+(z3_ ZT 04 ~ / 4 ) [J-oo6

}

y

, y

f°° rf9+(3_ ZT Jo

] \

e^yc~(z3r~y)dy\.

(2.2.73)

122

Chapter II

For the quadratic penalty function c(z) = z 2 , Eq. (2.2.73) has an exact solution. Actually, taking into account (2.2.68), we can rewrite Eq. (2.2.73) in the form

t-y-

Oh - A*2~)

f°° M~»/o

L./-OO

>

e"*»c(y) dy + / Jo

1 J

e"^c(t/) dy] = 0, J

(2.2.74)

where

c(y) = co + cl2/ + c 2 y 2 ,

CQ = fc 2 (z 3 ) 2 + 2Afcz 3 - 2,

{ 2i.jL. i 0 1

Calculating the integrals in (2.2.74), we obtain

(-k2um+XAk)\

/I

1N

/

c

2

T —:—T———XT— Ko ~

/ / ' I

— ^2 J

L

v \

^i

1

~

2

i *J '

1

4

//~/

,

-

\ ( III ' 1 "

A*2

1/^

Since /Lt^"2 and /z^ 2 satisfy the quadratic equations (2.2.70), after easily transformation, we obtain the following explicit formula for the switch point: /o n

v • •«

Using (2.2.69), (2.2.72), and (2.2.77), we can readily calculate the stationary specific error 32

r

mAk

+ X)

2Afc

3

2

~ ~ ~ ~

.

(2.2.78)

Exact Methods for Synthesis Problems

123

If instead of condition (2.2.71), we have a stronger inequality

A

F~~ I

then we can substantially simplify (2.2.77) and (2.2.78) by expanding these expressions in power series in the small parameter e = XAk/umk2 and taking only the leading terms of these expansions. In this case, instead of (2.2.77) and (2.2.78), we have Afc/2 jfe2

1

'

7 - K3)2-

(

J

(2-2.80)

For the first time, formulas (2.2.79) and (2.2.80) were derived by somewhat different methods in [176]. §2.3.

Optimal control of the population size

Numerous investigations deal with the dynamics of animal and microorganism populations and control of the population size. For example, an

extensive literature on this subject can be found in [51, 73, 87, 89, 133, 142, 186, 187, 189]. Various mathematical evolutional models depending on the environmental conditions of biological populations are used for describing the variations in the population size. We begin with a brief review of such models. The main attention is given to the models that we shall consider in this book later. 2.3.1. Models describing the population dynamics. Apparently, Malthus was the first who considered the following model for the population

dynamics in 1798: J = az.

(2.3.1)

Here x = x ( t ) is the population size11 at time t, and the constant number a, called the growth factor, is determined as the difference between the birthrate and the death-rate factors. If the birth-rate is larger than the death-rate (a > 0), then, according to the Malthus model (2.3.1), the population size must grow infinitely. llr

The variable x is assumed to be continuous, although the number of individuals in

the population can be only integer. However, if the number of individuals is sufficiently large, then the continuous model (2.3.1) can be used. In this case, the variable x is treated as the population density, that is, as the number of individuals per unit area (or volume) of the population habitat.

124

Chapter II

Usually, this result is not confirmed, which shows that the model (2.3.1) is not perfect. Nevertheless, the basic idea of this model, namely, the assumption that the rate of the population variation is proportional to the current population size proved to be very fruitful. Many more realistic

models were constructed on this basis by introducing the corresponding corrections to the growth factor a. So, for example, if we assume that in (2.3.1) the growth factor a depends

on the population size x as = a(x) = -fj,

ln(x/K)

/

x\

or a = a(x) = r (I - — J ,

then we obtain the Gompertz model (1825)

dx

* - '^ or the Verhulst model (1838)

Equation (2.3.3) is often called the logistic equation. The positive constants r and K are usually called the natural growth factor and the capacity of the medium. Models for more complicated systems of interacting populations are also based on the Malthus model (2.3.1). Assume that in the same habitat there are two different populations of sizes xi and x%, respectively. Let each of these populations be described by the Malthus type equations

at

,

at

= bx2.

, . (2.3.4)

Now we assume that individuals in the second population (predators) can exist only if they eat individuals (prey) from the first population.12 In this case, it is natural to assume that the growth factors a and b in (2.3.4) have the form

12

This model is usually illustrated by the assumption that x\ denotes a community

of hares bares and X? a community of wolves. Hares need vegetable food, and wolves feed on hares •es (and (and only onlv on hares). hares)

Exact Methods for Synthesis Problems

125

Thus, we arrive at the two equations

dxi

—-— = (ai — 02^2)*!,

at

dx2

—;— = ( — t>i + ozxzjx^

at

,

,

(2.3.5)

which are well-known Lotka-Volterra equations that model the behavior of the simplest system of interacting populations, the "predator-prey" model. These equations were studied in detail by V. Volterra [187], who found many remarkable properties of their solutions.

The multidimensional generalization of the Lotka-Volterra model has the form

•fcr

/.

^. .. V

r=i)2,...,n.

(2.3.6)

The dynamics of system (2.3.6) depends on the form of the matrix A = \\aij\\™. If this ma trix is antisymmetric, i.e., if a,-j = — oj,- and an = 0, then Eq. (2.3.6) describes a conservative model for the population interaction. If the quadratic form arsxrxs is positive definite, then model (2.3.6) is called dissipative. Further generalizations of models for the population dynamics were related to more detailed descriptions of the interaction between individuals in the population. For example, in many actual situations, the growth factor a depends on the population size at some preceding moment of time rather than on the current population size. In these cases, it is expedient to use the Hutchinson model (1948) M\

h > 0,

(2.3.7)

which is a generalization of the logistic model (2.3.3). In 1976 Gushing proposed the following more general model, in which both discrete and distributed delays are taken into account: f

x(t)=rx(t)\l-

I

f00

Jo

1

x(t-s)dK(s)\,

1

t > 0,

(2.3.8)

where K(s) is a nondecreasing bounded function and the integral on the right-hand side is of the Stieltjes type. In some special cases, it is necessary to take into account the spatial distribution of the population size. In these cases, the state of the population is described by the density function D(t,x,y] at the point (x,y). If the movement of individuals within the habitat is a diffusion process, then instead of (2.3.7), we have the Hutchinson model with diffusion —— (ft x y } =

dt

' '>

126

Chapter II

Equations (2.3.1)-(2.3.9) model the behavior of isolated biological communities (autonomous models). If there are external actions on the population, then some additional terms responsible for these actions appear on the right-hand sides of Eqs. (2.3.1)-(2.3.9) As usual, we distinguish two

types of external actions: purposeful controlled actions, which can be used to control the population size, and noncontrolled random perturbations. Let us consider a population described by the model (2.3.3). If there are external actions, say, some individuals are taken away from the population, then we obtain the controlled logistic model x = r

/ x\ (l-K — \x-qux, V /

(2.3.10)

where the function u = u(t) > 0 is the intensity of the catching process and

the number q > 0 is the catchability coefficient. In this case, the value

ft,

Q=q

u(t)x(t)dt

(2.3.11)

•>«!

gives the number of individuals caught during the time interval [ti,^]-13 In a similar way, the Lotka-Volterra equations can be generalized to the following controlled system:

xi — (ai - a2x2)xi - qiuixi, •

i

X2 = ( — Oi +

i

\

(Z.O.J-.ZJ

Here the control functions «i = ui(t) > 0 and u^ = u-2(t) > 0 are, respectively, the intensities of catching prey and predators. If individuals are removed only from one population, then we have u\(t) = 0 or U2(t) = 0 in

(2.3.12). If the population behavior is substantially influenced by noncontrolled

random perturbations, then the dynamics of the population is described by stochastic differential equations. For example, in some problems, the population behavior can be satisfactory described by the stochastic logistic model

x = r 1- K- x-qux + Wx£(t), (2.3.13) V J where £ ( t ) is the scalar Gaussian white noise (1.1.31) and the number B > 0 determines the intensity of random perturbations. Many other stochastic models used to describe the dynamics of various biological systems can be found in [51, 83]. 13

Note that Eq. (2.3.10) can be used not only in the case of "mechanical" removal

(catching, shooting, etc.) but also in the case where the population size is controlled by treating the habitat with chemical agents.

Exact Methods for Synthesis Problems

127

2.3.2. Statement of the problem. In this section we consider the optimal control of the size of a population described by the controlled logistic model (2.3.10). The statement of this problem is borrowed from the books [35, 68], where this problem is formulated in conformity to the problem of fisheries management.

We shall assume that the state x = x(t) of the controlled system (2.3.10) characterizes the general quantity (or the mean density) of fish at time t in some chosen habitat. We also assume that the intensity of fishing is bounded by a given value um > 0. In this case, the mathematical model for the dynamics of fish population has the form . (, x\ as = r I 1 — — I a? — qux, \ K)

0 < u(t)
t> 0,

(2.3.14)

x(Q) = ZQ > 0.

By p > 0 we denote the price of the unit mass of caught fish and by c > 0 the price of unit "efforts" u for fishing. Then it is natural to estimate the "quality" of control by the functional fT

I(u)= I (pqx(t)-c)u(t)dt, Jo

(2.3.15)

which, with regard to (2.3.11), gives the profit provided by the fishing process defined by the control function u(i): 0 < t < T. The problem is to find the optimal control u*(t): 0 < t < T for which the functional (2.3.15) attains its maximum. Following [35, 68], instead of (2.3.15), we shall estimate the quality of control (i.e., of fishing) by the functional in which the terminal time T —t oo. In this case, an additional "killing" factor appears in the integrand to ensure the convergence, namely, ,.00

I(u) = I Jo

st

e-

(pqx(t)-c)u(t)dt,

(2.3.16)

where S > 0 is a given positive number. As a result, we arrive at the following problem: for an arbitrary initial state x(Q) = x$ of the controlled system (2.3.14), to find a control function 0 < Ut.(t) < um: t > 0 (or

0 < w* ( x ( t ) } < um: t > 0) for which the functional (2.3.16) attains its maximum on the trajectories of system (2.3.14). REMARK. If the initial population size XQ does not exceed the capacity K of the medium, then it follows from Eq. (2.3.14) that for any time moment

128

Chapter II

t > 0 and any admissible control u(t), the population size has the same property, x(t) < K. Therefore, this problem is well posed if the parameters p, q, and c in the functional (2.3.16) satisfy the condition

— = Xl
(2.3.17)

Otherwise, (x± > K), this problem has only a trivial solution u*(t) = 0, t > O.14 Therefore, in what follows, we assume that the inequality (2.3.17) is satisfied. We also assume that qum > r.

D

2.3.3. The solution of problem (2.3.14), (2.3.16). If we define the function F(x) of the maximum future profit by the relation

U

oo

-|

e~st (pqx(t) - c)u(t) dt z(0) = x , j

(2.3.18)

then, using the standard procedure described in §1.3, we obtain the Bellman equation

max (J^U^Ufn

 = 0 ^

-**- '

I CtiC

(2.3.19)

J

corresponding to problem (2.3.14), (2.3.16). It follows from Eq. (2.3.19) that, depending on the current state (the

population size) x of the system (2.3.14), to perform the optimal control we need to choose u*(x) = 0 for all points x £ R1 C R+ at which the function

p(x) =pqx-c-qx—— ax

(2.3.20)

is negative. Conversely, at all points x G R2 C R+ where 0, we

need to take the maximum admissible control ut(x) = um. If tp(xt) = 0 at a point cc* (in view of continuity, the point x* separating R1 from R2 is the limit point of these domains), then the optimal control u f ( x * ) for Eq. (2.3.19) is formally undetermined. However, one can see that the choice

of any admissible control 0 < u < um at the point x* does not affect the solution of Eq. (2.3.19).

Now let us consider how the population size in system (2.3.14) varies with time. Let x(0) = XQ < xi. Obviously, in this case, there exists an

initial half-interval [0,t*) at all points of which we must set ut(t) = 0. This statement immediately follows from the fact that the expression pqx — c in

the parentheses in (2.3.16) is negative for x(t] close to XQ. Thus, we have 14

Since x\ > K, we have pqx(t) — c < 0 for all t.

Exact Methods for Synthesis Problems

129

x(t) e R1 for all t £ [0,t«). Hence, it follows from Eq. (2.3.14) with u = 0 that, on the interval [0,**), the population size x(t) increases monotonically up to the value xt = x(t*) that separates the sets R1 and R 2 . At the point x f , as was already noted, the control may take any arbitrary value. It is expedient to take this value equal to

and keep it constant for t > t#. It follows from (2.3.14) that the control (2.3.21) preserves the population size xt. REMARK. For u(xt) ^ MI, the representative point of system (2.3.14), starting from the state a?*, comes either to the set R1 (for u(xt) > MI) or to the set R2 (for u(xt) < «i) during an infinitely small time interval. Then the control u = 0 (or u = um) immediately returns the representative point to the state a;*. Thus, for u(x+) ^ ui, the population size ** is preserved by infinitely rapid switches of control (this is the sliding mode). Though, as follows from (2.3.19), the value of the functional (2.3.16) for this control remains the same as for u(xt) = -MI, the constant control u(t) = u(xt) = HI, t > <*, is more convenient, since in this case the existence problem does not arise for the solution x(t), t > t f , of Eq. (2.3.14). The optimal control 0

for

0 < t < t*

«={(1-V)

fo,

«>«„

(x(t)

< a;*),

<"-22>

realizes the generalized solution x*(t) of Eq. (2.3.14) in the Filippov sense (see §1.1). D Thus, for x(0) = XQ < xi the optimal control (2.3.22) is a piecewise function shown in Fig. 21 together with the plot of the function x*(t), which shows the change of the population size corresponding to this control. It

remains to find the moment tt at which the catching of individuals starts or, which is the same, to find the size (density) xt = xt(tf} of the population that we need to keep constant in the area of active catching. These variables can readily be obtained by calculating the functional (2.3.16) and taking its maximum with respect to t«. Indeed, for the control (2.3.22), the functional (2.3.16) is equal to -

K J

. (2.3.23)

We can calculate its maximum with respect to tt by using the fact that x* = z*(i*) as a function of tt satisfies Eq. (2.3.14) for u = 0. After the

Chapter II

130

FIG. 21 differentiation, from the extremum condition

we obtain the following equation for the optimal size x* of the population: x

t - ± ( ± + ( l - i } K X\ x _ ^ - n • 2\pq + ( 2Mr ' r)") *

(

f2324^ '

This equation has only one positive solution

J

(2.3.25)

which has a physical meaning. Note that, in view of (2.3.17), the value xt determined by (2.3.25) always satisfies the inequality x^ < xt < K.

We also note that the condition

XQ < %i, introduced for the sake of obviousness, does not influence the

Exact Methods for Synthesis Problems

131

choice of the optimal control. This strategy is completely determined by x f , and according to this strategy, we do not catch individuals if the current population density x(t) < xf and start catching with constant intensity (2.3.21) if the population size attains the value xt (2.3.25). We can readily calculate the profit function (2.3.18) corresponding to

this strategy. Integrating the equation in (2.3.14) with a;(0) = x and u = 0, we obtain

-rK

x (w t ) = —————-.——r, x + (K - x)e~Tt

t > 0. ~

v(2.3.26)

'

Using (2.3.26), we see that the condition

xK

— = a;,

X -f {J\ — XJC

(2.3.27)

' '

allows us to find the moment tt. From (2.3.23) and (2.3.27), we explicitly calculate the profit function -Z*

(2.3.28) for x < x*. To solve problem (2.3.14), (2.3.16) completely, it remains to consider the case x(0) = XQ > x*, that is, the case where the initial population size is larger than the optimal size (2.3.25). First, we note that, in view of (2.3.28), the profit function F(x) monotonically increases on the interval 0 < x < x* from zero to the maximum value

(x,) =

dq

(pqx* - c) 1 \ KJ

(2.3.29)

We also note that the function ^(x) has only one maximum point

and since the "killing" factor 6 in (2.3.16) is strictly positive, we always

have the strict inequality x 2 > x*. Now if x(0) = XQ = x2 > £*, then using the constant control (2.3.30)

Chapter II

132

we can keep the population size at a level of x? for which the functional (2.3.16) attains the value

I(u(x2)) = However, the constant control (2.3.30) is not optimal. One can readily see that the functional (2.3.16) can take values larger than l(u(x.2)) = if>(x2) if, instead of (2.3.30), we use the piecewise constant control function

0
(2.3.31)

shown in Fig. 22.

A

x(t)

FIG.

22

We choose a time interval A during which the control um is applied, so that at the end of A the population size attains the value (2.3.25), that is,

«(A) = a;*.15 The time interval A for the control um is determined by the equation , - ________x 2 K(r-qu m )________ «t'* —

p

:

:——————^—;——————TT"-

rx2+[K(r-qum)-rx2\e(iu™-r^

(Z.O.OZ)

15

The inequality I(u&(t)) > 7(«(a;2)) = *l>(xi) is obtained by calculating the func-

tional (2.3.16) with regard to Eq. (2.3.14), where control has the form (2.3.31). Here we do not perform the corresponding elementary but cumbersome calculations and leave

them to the reader as an exercise.

Exact Methods for Synthesis Problems

133

Obviously, control functions of the form (2.3.31) can be used not only for the initial population size x(0) = x% but also for arbitrary initial sizes x(0) = x > xf. In this case, we must perform the change £2 —>• x only in Eq. (2.3.32) for the length A of the initial pulse um. One can easily verify

that (2.3.20) implies tp(x) > 0 for all x > x*. Therefore, the optimal control as a function of the current population size (the synthesizing function) for problem (2.3.14), (2.3.16) has the form

for

for for

0 < x < a;*,

x = x#,

(2.3.33)

x > a;*,

where xt is determined by (2.3.25). Formula (2.3.33) gives the mathematical expression for the control strategy that is well known in the theory of optimal fisheries management [35, 68]. The key point in this strategy is the existence of an optimal size xt of fish population given by (2.3.25). The goal of control is to achieve the optimal size xf of the population as soon as possible and to preserve the size x* by using the constant control (2.3.21). This control strategy maximizes

the profit obtained by fishing if the profit is estimated by the functional (2.3.16). In conclusion, we note that the results presented in this section can be generalized to the case in which the dynamics offish population is subject

to the retarded equation (or the equation with delay) =r(l- ^--V] X(t) - qu(t)x(t); K \ J

here we have studied the controlled Hutchinson model. For the results related to this case, see [99].

The stochastic version of problem (2.3.14), (2.3.16), when the behavior of the population is described by the stochastic equation (2.3.13), will be

described in §6.3. §2.4. Stochastic problem of optimal fisheries management

Now let us consider the problem of optimal fisheries management that differs from the problem considered in §2.3 by the stochastic character of the model used for the description of the population dynamics. We assume that the behavior of a fish population is subject to the stochastic differential equation of the form

x ( t ) = [r- qu(t)]x(t) + V2Bx(t)£(t),

x(0)

= z0,

0 < u(t) < um,

t > 0,

(2.4.1)

134

Chapter II

where £ ( t ) is a scalar Gaussian white noise (1.1.31), B > 0 is a given positive number, the natural growth factor r > 0 and the catchability coefficient q > 0 have the same meaning as similar coefficients in (2.3.10), (2.3.13), and (2.3.14). Equation (2.4.1) is a special case (as K ->• oo) of Eq. (2.3.13) and, in accordance with the classification presented in Section 2.3.1, the model described by Eq. (2.4.1) can be called a controlled stochastic Malthus

model. Just as in §2.3, the size x ( t ) of the fish population is controlled by catching a part of this population. In this case, the catching intensity u(t ) has an upper bound um, and therefore, the set of all nonnegative measurable bounded functions of the form u(t) : [0, oo) —>• [0, um] is considered as the

set of admissible controls. The goal of control is to maximize the functional (2.3.16), which, in view of the random character of the functions x(t) and u(t), is replaced by the corresponding mean value. As a result, we have the problem r

/.oo

I(u) = E \

-|

e~5t (pqx(t) - c}u(t) dt ->•

I JO

J

max

.

(2.4.2)

«(t)6[0,« m ] «>0

In what follows, we assume that the decay index 6 in (2.4.2) satisfies the

condition 8 > r. We shall solve problem (2.4.1), (2.4.2) by using the standard procedure of the dynamic programming approach described in §1.4. We define the profit function for problem (2.4.1), (2.4.2) by the relation F(x)=

r r°° e~

max E / «(«)e[o,« m ] Uo t>o

Si

i

(pqx(t) - c)u(t) dt \ x(Q) = x , \

(2.4.3)

where £[(•) [ x(0) = x] denotes the conditional mathematical expectation of (•). As was shown in [113, 175], the second-order derivative of the profit function (2.4.3) is continuous. It follows from Theorem 3.1.5 in [113] that for all x G R + — [0, oo), this function has the upper bound

F(x)
(2.4.4)

where N > 0 is a constant.

The Bellman equation (Fx - ^, Fxx = f^f) max [Bx2 Fxx + (r + B - qu)xFx - SF + (pqx - c)u] = 0

0<«<« m

(2.4.5)

for the profit function (2.4.3) can be obtained as usual (see §1.4). It should

be pointed out that a symmetrized stochastic integral (see [174] and §1.2)

Exact Methods for Synthesis Problems

135

was used for writing (2.4.5). This led to an additional term B in the parentheses in (2.4.5), that is, in the coefficient of xFx.le Equation (2.4.5) allows us to find the optimal control ut as a function u i f ( x ) of the current states of system (2.4.1). First, we note that, according to (2.4.5), the set of all admissible states of (2.4.1) can be divided into the following two subsets (just as in §2.3):

the subset R 1 , where
the subset R 2 , where f>(x) > 0 and u*(x) = um. The boundary between these two subsets is determined by the relation pqx - c - qxFx - 0.

(2.4.6)

The further calculations show that, in this problem, there exists a unique point x* satisfying (2.4.6). Therefore, the subsets R1 and R2 are the intervals R1 = [0, x*) and R2 = (x*,oo). Thus the optimal control in the synthesis form Ut = u*(x) becomes uniquely determined at all points x <E R+

except for the point x*. It follows from (2.4.6) that we can use any admissible control u(z*) 6 [0,itm] at the point x*. Therefore, the optimal control function M*(X) can be represented in the form ( \ ' ,,

u*(x)=<

(2.4.7)

I Mm,

if

X > X*,

and the final solution of the synthesis problem is reduced to calculating the coordinate of the switch point X*. To calculate x*, we need to solve the Bellman equation (2.4.5). As was already noted, the second-order derivative of the profit function F(x) is continuous, thus the profit function F(x) satisfying (2.4.5) can

be obtained by using the matching method (see §2.2). In what follows, we describe the procedure for solving the Bellman equation (2.4.5) and calculating the coordinate of the switch point xf in detail. By F*(x) and F2(x) we denote the profit function F(x) on the intervals

R1 = [0, x,) and R2 = (a:,, oo). It follows from (2.4.5) and (2.4.7) that the functions Fl and F2 satisfy the linear equations

- SF1 = 0, 2

Bx F

2

2

x

0 < x < x,, 2

+ (r + B - qum)xF - SF + (pqx

- c)um = 0,

(2.4.8) x > a;,.

(2.4.9) 16 If the stochastic differential equation in (2.4.1) is the Ito equation, then the second term in the Bellman equation (2.4.5) has the form (r — qu)xFx.

136

Chapter II

Since the profit function F(x) is sufficiently smooth (recall that the secondorder derivative of F(x) is continuous), both functions F1 and F2 must satisfy the condition (2.4.6) at the switch point x*. Taking into account the fact that F(0) = 0 according to (2.4.1) and (2.4.3), we have the following additional boundary condition for the function F!(x):

Fl(0) = 0.

(2.4.10)

The boundary conditions (2.4.6), (2.4.10), and the upper bound (2.4.4) allow us to obtain the explicit analytic solution of Eqs. (2.4.8) and (2.4.9). Equation (2.4.8) is the well-known homogeneous Euler equation. Its general solution has the form

,

(2.4.11)

where AI and A? are constants, and k\ and k\ satisfy the characteristic equation

Bk(k-l) + (r + B)k-S = 0.

(2.4.12)

The constants AI and A^ are determined by two boundary conditions (2.4.10) and (2.4.6) at the points x = 0 and x = a;,. Since the roots

of Eq. (2.4.12) have opposite signs, we conclude that to satisfy the condition (2.4.10), we need to set A2 equal to zero in (2.4.11). The constant c

~k

can be calculated by substituting Fl(x) = Aixk> into (2.4.6) and taking into account the fact that (2.4.6) is valid at the switch point x*. Thus, the

solution of Eq. (2.4.8) is given by the formula

>

0<x<x*.

(2.4.14)

The inhomogeneous Euler equation (2.4.9) can be solved in a similar way. By using the standard method of variation of parameters, we obtain the general solution PqX

Exact Methods for Synthesis Problems where

137

___________ *? = ^ [««m - r + >/(««m - r) 2 + 4B<5] ,

Y ^2 = ^[?w™ -

r

Lti

___________ ~ V(lum -r)2 + 4BS]

(2.4.16)

satisfy the characteristic equation

Bk2 - (qum -r)k-S = 0, and Ay and A± are arbitrary constants.

Since k2 is positive, we must set the constant A$ equal to zero (otherwise, formula (2.4.15) contradicts the upper bound (2.4.4)). The constant A^ can

be calculated from condition (2.4.6) at the switch point x f . Substituting F2(x) (determined by (2.4.15) with A3 = 0) into (2.4.6) instead of the function F, we obtain A

.

(S-r-B)p

——— ________ 1____________________ '___________ rp

4 — r 7/ r

n

,

\

i_fc*

^

__

*

c

_______ /w

T 7

-kl

2,

*

This implies the following expression for the function F2(x):

6 - r - B + qum

l\(6-r-B)px*

c]fx\k'

+ 72" r————— JT^ ——— ~ T

kl [S-r-B + qum

— )

JJV**/

i

x >x

~

*-

(2.4.17)

The two functions F1(x) (2.4.14) and F2(x) (2.4.17) determine the profit function F(x) satisfying the Bellman equation (2.4.5) for all x £ R+ = [0,oo). These functions contain a parameter a;*, which remains unknown.

We can calculate x* by using the continuity property of the profit function F(x). Each of the functions Fl and F2 is continuous. Hence, to ensure the continuity of F(x), it suffices to satisfy the condition

Fl(x*) = F2(x*)

(2.4.18)

at the switch point x». It follows from (2.4.6), (2.4.8), and (2.4.9) that (2.4.18) is equivalent to the condition

F2,(x.) = F2x(x*) at the switch point x*.

(2.4.19)

138

Chapter II

Calculating the second-order derivative of the functions (2.4.14) and (2.4.17), we derive the following equation for x» from (2.4.19):

p(6 — r — B)

c

Hence, the switch point xt is determined by the explicit formula T _ JLtjfi

c(d — r — tf + qum)

——

/1 I

*\

*

I Zf.rt.ZdJ I

pq[S — r — B + ,ki~k2\qum] Formula (2.4.20) and the optimal control algorithm (2.4.7) constitute

the complete analytic solution of the stochastic problem (2.4.1), (2.4.2) of optimal fisheries management. Some final comments and remarks. It is of interest to compare (2.4.20) and (2.3.25), which is the optimal size of the population in the deterministic problem (2.3.14), (2.3.16) of optimal control considered in §2.3. Denoting (2.3.25) by xf, we may expect that the equality

lim x* = lim xt

K-s-oo

B-X)

(2.4.21)

is valid due to continuity reasons (since the deterministic version of problem (2.4.1), (2.4.2) formally coincides with problem (2.3.14), (2.3.16) as K ->• oo). We can verify (2.4.21) by straightforward calculations of the limits on both sides. Indeed, using (2.3.25), we readily calculate the limit on the left-hand side of (2.4.21) for S > r,

lim i, =

K-+OO

f

pq(S - r)

(2.4.22) ^

'

The same result is obtained by calculating the limit of (2.4.20) as B —>• 0, since fej-l v (S-r)(qum-r) k\ — k\ B->o which follows from (2.4.13) and (2.4.16). Formula (2.4.21) shows how the results obtained in this section for problem (2.4.1), (2.4.2) are related to similar results for problem (2.3.14), (2.3.16) obtained in Section 2.3.3 by quite different methods.

There is another interesting specific feature of problem (2.4.1), (2.4.2). Namely, the standard "classical" approach of the dynamic programming

Exact Methods for Synthesis Problems

139

that leads to the exact solution of the stochastic problem (2.4.1), (2.4.2) does not allow us to solve the synthesis problem (that is, to find the switch point Xf) for the deterministic version of problem (2.4.1), (2.4.2), that is, in the case where there are no random perturbations in Eq. (2.4.1). This fact can readily be verified if we consider the deterministic version of the

Bellman equation (2.4.5). max \(r - qu)xFx - SF + (pqx - c)u] = 0

(2.4.23)

0
and calculate the functions

F'( z ) = l(f>z.-£)(±.J :c I

i •"

d \o-l

,

(2.4.24)

-«'l <- I + T^ i / \7 i \ •*-* / a[(r-qu )(a-l) m

I —

q\ \a;*

* r - qum

(2.4.25)

which, in this case, determine the profit function F(x) on the intervals R1 = [0, »*) and R 2 = [a;*, oo). Contrary to the stochastic case in which the continuity condition (2.4.18) for the functions (2.4.14) and (2.4.17) determines the unique switch point (2.4.20), one can readily verify that the same continuity condition F 1 ( x f ) = F2(x*) for the functions (2.4.24) and (2.4.25) holds for any point xf € (0, oo). Therefore, the control problem considered can serve as an example illustrating the well-known idea (see [113, 175]) that the dynamic programming approach is more suited for solving control problems with stochastic

models of plants (which, by the way, describe the actual reality more adequately) . REMARK. If the equation in (2.4.1) is understood as the Ito stochastic equation, then the Bellman equation for problem (2.4.1), (2.4.2) differs from (2.4.5) and has the form max [Bx2Fxx + (r - qu)xFx — SF + (pqx — c)u] = 0.

0<«<« m

The way for solving this equation is quite similar to the above procedure for solving Eq. (2.4.5). However, the population size xf that determines

the switch point for the optimal control (2.4.7) differs from (2.4.20) and is

140

Chapter II

given by the expressions

_

=

c(S - r + qum)

s B + qum - r - y/(B + qum - r) 2 + 46B . D

CHAPTER III

APPROXIMATE SYNTHESIS OF STOCHASTIC CONTROL SYSTEMS WITH SMALL CONTROL ACTIONS

Various approximate synthesis methods can be useful if the Bellman equation cannot be solved exactly. Chapters III-VI deal with some of these

methods. Approximate methods are usually efficient if the initial statement of the optimal control problem contains a small parameter. Quasioptimal control algorithms are constructed by using either the corresponding procedures of successive approximations or asymptotic expansions of the loss function in powers of a small parameter of the problem. The choice of a method for constructing an approximate solution of the synthesis problem essentially

depends on the choice of a parameter that is considered to be small. For example, in this chapter, the values of control actions are assumed to be small. Chapter IV is about the Bellman equation with small diffusion coefficients. In Chapter V, we consider control problems for oscillating systems with small attenuation decrement. In Chapter VI, the role of small parameters is played by the a posteriori covariances of unknown coefficients in the plant equations. Let us formulate the main idea of the approximate synthesis method studied in this chapter. As was already noted, the method is based on the assumption that control actions on the plant P are relatively small. Prom

the physical viewpoint, this assumption means that the effect of the control actions on the phase trajectories of the system is small, and therefore the system dynamics is similar to noncontrolled motion. In particular, this

assumption holds for control problems with constraints if the noises acting on the plant are of large intensity. Indeed, let us assume that the controlled (unperturbed) plant is a stable

mechanical system. Then large random perturbations lead to large deviations of the system from the equilibrium state. In this case, some "internal" inertial and elastic forces arise in the system. These forces can significantly

exceed the (bounded) control forces whose effects on the system turn out to be relatively small.1 1

Note that in this book we do not consider deterministic synthesis problems for

141

142

Chapter III

From the formal mathematical viewpoint, the fact that control actions are small leads to a small parameter in the nonlinear term in the Bellman equation. To verify this fact, let us consider the synthesis problem for the servomechanism (Fig. 10) governed by the Bellman equation (1.4.21). Assume that the dimensions of the region U of admissible controls are bounded by a small value of the order of e. For definiteness, we assume that U is either an r-dimensional parallelepiped (RT D U = (u: u,-| < umi, i — 1, 2, . . . , r; maxj umi = e)) or an r-dimensional ball of radius e, that is, (R, D U = (u: £J=1 «? < e ) ) . In the first case, according to (1.3.22), the solution of the synthesis problem is given by the formula (the control algorithm)

= - { « m l , - . - , W m r } s i g n ( QT (t) -r— (t, X, y) 1,

\

dx

J

(3.0.1)

where the vector of partial derivatives dF/dx is calculated by solving the equation

LF(t, x,y) = -ci ( x , y ) + eu^ ^

w

~ - r, *,,,

(3.0.2)

Here um denotes the r-vector (column) um/e and L is a linear operator of the form 2

"'^^ u -

(see

32

In the second case (where U is a ball), the optimal control has the form (1.3.23))

systems controlled by small forces. Such systems called weakly controllable in [32] were studied in [32, 137]. 2 Recall that relations (3.0.1) and (3.0.2) follow from the Bellman equation (1.4.21)

with c(x,y,u) = ci(x,y) and Ax(t,x) = a(t,x) + Q(t)u; {uml,.. .,umr} denotes a diagonal (r X r)-matrix; for a column A with components AI ,..., Ar, the expressions sign A and \A\ denote j--columns with components sign-A,- and \A{\ (i = 1,..., r), respectively.

Approximate Synthesis of Stochastic Control Systems

143

where the vector dF/dx is the gradient of the loss function satisfying the equation

(

op

1

ft]?\ ~^

QQT-

•

(3.0.4)

If we denote the nonlinear terms in Eqs. (3.0.2) and (3.0.4) in the same way, then we can write both equations in the form

,

(3.0.5)

where $(f, dF/dx) is a given nonlinear function of its arguments. As a rule, equations of the type (3.0.5) cannot be solved exactly. However, the presence of a small parameter in the nonlinear term of this equation yields a rather natural way of solving this equation approximately. To this end, one can use the method of successive approximations in which the

zero-order approximation Fo(t, x,y) satisfies the equation

LF0 = -c1(x,y)

(3.0.6)

and the successive approximations Fk(t,x,y) can be calculated recurrently by solving the sequence of linear equations of the form

A =1,2,....

(3.0.7)

If we know the solution Fk(t, x, y) of the equation for the /sth approximation (fc = 0, 1, . . .), then we can perform an approximate synthesis of the controlled system by setting the quasioptimal control algorithm as (3.0.8)

In this chapter we consider an approximate method for the synthesis of optimal systems whose "algorithmic" essence is given in formulas (3.0.6)-

(3.0.8).3 Needless to say, the practical use of procedure (3.0.6)-(3.08) in special problems leads to additional problems of constructivity and efficiency of 3 The approximate synthesis algorithm (3.0.6)-(3.08) is a modification of the wellknown Bellman method of successive approximations [14, 16]. This method was used by

W. Fleming for solving some stochastic problems of optimal control [55]. The procedure

(3.0.6)-(3.0.8) is a special case of the Bellman method if the trivial strategy uo(t,x,y) = 0 is used as the initial "generating" control strategy in the Bellman method.

144

Chapter III

this approximate synthesis method. In this chapter we shall discuss these problems in detail. All related material is divided in sections as follows. First (§§3.1-3.3), we consider some methods for calculating the successive approximations for stationary synthesis problems. We write out approximate solutions (that correspond to the first two approximations) for some

special control systems with various types of disturbances affecting the system. In §3.1 and §3.2, we consider random perturbations of the white noise type. In §3.3 the results obtained in §3.1 and §3.2 are generalized to the case of correlated noises. In §3.4 we study nonstationary problems and estimate the error of the approximate synthesis (3.0.6)-(3.0.8) for the first two approximations. In §3.5 we study asymptotic properties of successive approximations (3.0.7), (3.0.8) as k —>• oo. We show that, under some special conditions, as k —> oo the sequence F& is convergent to the exact solution of the Bellman equation, and the corresponding quasioptimal control algorithms (3.0.8) to the optimal control ut(t,x,y). In this case, the convergence Uk -> uf is understood in the sense of convergence of values of the functional to be minimized. Finally, in §3.6 the method of successive approximations (3.0.6)^(3.0.8) is used for approximate synthesis of some stochastic control systems with distributed parameters. §3.1.

Approximate solution of stationary synthesis problems

3.1.1. Let us consider the problem of optimal damping of oscillations in a dynamic system subject to random perturbations of the white noise type (Fig. 13). Let the plant P be described by the following system of linear stochastic differential equations with constant coefficients:

(3.1.1)

Here x = x ( t ) is an n- vector (column) of current phase variables of the system (xi(t), . . . , xn(t )) , u = u(t) is an r-vector (column) of control actions (ui(t), . . . ,u r (t)), £(t) is an n-vector (column) of random perturbations with independent components (£i(<), . . • ,£»»(<)) of the standard white noise type (1.1.31), and A, Q, and
J

(3.1.2)

Approximate Synthesis of Stochastic Control Systems

145

where c(x) > 0 is a given convex penalty function attaining the absolute minimum c(0) = 0 at the point x — 0 (the restrictions on c(z) are discussed in detail in §3.4 and §3.5). Let admissible controls be bounded and small. We assume that all components of the control vector u satisfy the conditions

K < £umi,

i=l,...,r,

(3.1.3)

where e > 0 is a small parameter and w m i , . . . , umr > 0 are given numbers

of order 1. The system shown in Fig. 13 is a special case (the input signal y(t) = 0) of the servomechanism shown in Fig. 10. Therefore, the Bellman equation

for problem (3.1.1)-(3.1.3) readily follows from (1.4.21), and taking into account the relations Ay(t, y) = 0, By(t, y) = 0, Ax(t, x, u) = Ax + Qu, and c(x,y,u) = c(x), we obtain dF(t x}

' +LF(t,X)+ Ot

min

T

gT

+ c(x) = 0.

(3.1.4)

Here L denotes a linear elliptic operator of the form

(3 L5)

-

where, according to (1.4.16), the matrix B = aaT , and, as usual, the sum in the last expression on the right-hand side of (3.1.5) is taken over repeated indices from 1 to n. It follows from (3.0.1) and (3.0.2) that in this case the optimal control has the form

(3.1.6) where the loss function F(t, x) satisfies the equation

dF —— (t, a;) + LF(t, x) - -c(x) + eu

(3.1.7)

Some methods for solving Eq. (3.1.7) will be considered in §3.4. In the present section, we restrict our consideration to stationary operating conditions of the stabilization system in question. It follows from §1.4 that the stationary mode of stabilization (damping) can take place if T —)• oo, where T is the terminal instant of the operation interval (the upper

Chapter III

146

integration limit in (3.1.2)). Obviously, in this case, stationary operating conditions exist only if the unperturbed motion of the plant P is stable, that is, in other words, if the real parts of the eigenvalues of the matrix A in (3.1.1) are negative. In what follows, we assume that these conditions

are satisfied. If we define the stationary loss function f ( x ) by the relation (see (1.4.29), (2.2.9))

then (3.1.7) implies the following time-invariant equation for f ( x ) :

Lf(x)

= 7 - c(x) +

(3.1.8)

where the parameter 7 characterizing stationary "specific losses" together

with the function f ( x ) can be found by solving Eq. (3.1.8). We shall solve Eq. (3.1.8) by the method of successive approximations. The scheme for calculating (3.0.6), (3.0.7), applied to the time-invariant equation (3.1.8) leads to the sequence of equations

(3.1.9)

Lfo = 7° - c ( x ) ,

Lfk = Jk- c(x)

Q1

dx

(3.1.10)

It follows from (3.1.9) and (3.1.10) that each time when we are calculating the next approximation, we need to solve a linear inhomogeneous elliptic equation of the form

Lf(x) =

(3.1.11)

We shall consider a method for solving Eq. (3.1.11) with a given function
in eigenfunctions of some Sturm-Liouville problem [179]. 3.1.2. The passage to the adjoint equation. Let us consider the operator

that is the dual of the operator (3.1.5). The equation

— (<, x) = L*p(t,x)

(3.1.13)

Approximate Synthesis of Stochastic Control Systems

147

is the Fokker-Planck equation (1.1.67) for the n-dimensional Gaussian Markov process x(i) [45, 167, 173]. The assumption that the matrix A is stable implies that this process has a stationary density function po(x) such that

L*p0(x) = 0.

(3.1.14)

In this case the stationary probability density po(x) has the form

po(x) = V

. /

1

«

T ——— exp [L - \(x Px)} J . 2 - i

(3.1.15) '

where P"1 is the matrix of covariances of components of the vector x.

We shall present a possible method for solving Eq. (3.1.13) and calculating the matrix P in (3.1.15). The diffusion Markov process x ( t ) described by (3.1.13) satisfies the system of linear stochastic differential equations

x = Ax +
(3.1.16)

describing the uncontrolled motion of the plant (3.1.1). We pass from x to new variables y related to x by the linear transformation

x = Vy

(3.1.17)

with a nondegenerate matrix V. As a result, instead of (3.1.16), we have the following system for the new variables:

(3.1.18) where

A = V~iAV,

a = V-1a.

(3.1.19)

We choose V so that the matrix A is diagonal

I={A1,A2,...,An}.

(3.1.20)

As is known [62], the matrix V always exists and can readily be constructed if the eigenvalues of the matrix A are simple, that is, if the characteristic equation of the matrix A,

det(j4. - \E) = 0,

(3.1.21)

has different roots AI, A 2 , . . . , A n . In this case, the columns of the matrix V are the eigenvectors vl of the matrix A that satisfy the linear equations

Aff=\ivi,

i=l,...,n.

(3.1.22)

148

Chapter III

The system (3.1.18) can readily be solved in the case of (3.1.20). Indeed, writing (3.1.18) in rows, we obtain

yt = Aiyi + rli(t),

£=l,...,n,

(3.1.23)

where the random functions rjt(i) — &tk(,k(t) are processes of the white noise type and have the characteristics

Eife(*) = 0,

Em(t)rjm(t - T) = BtmS(r)-

£ , m = l , . . . , n . (3.1.24)

Here Btm is an element of the matrix B = cfcfT. obtain

yt(t) = yt0e^-^ +

Solving Eq. (3.1.23), we

e^-r,^') dt',

ylo = W (t 0 ),

(3.1.25)

Jto

and taking into account (3.1.24), derive the following expressions for the means and covariances:

(3.1.26)

which determine the transition probability p(y(t) process y(t). It follows from (3.1.26) that

Ey/ = 0,

Eytym = -

y(to)) of the Gaussian

B m

^ .

(3.1.27)

*t + *m

in the stationary case as t —> oo, since, by assumption, ReA^ < 0 (i = 1, . . . , n) for all roots of the characteristic equation (3.1.21) of the matrix A. It follows from (3.1.27) that the stationary density function po(y) can be written in the form

po(y) =

1

=exp[-i(y 1

r

Py)],

^/(27r)«detP-

where each entry of the matrix P"1 is given by the formula

(3.1.28)

Approximate Synthesis of Stochastic Control Systems

149

The stationary density po(y) satisfies the stationary Fokker-Planck equation

L*Po(y) = --(XiViPo) + ^jj-(BijPo) dyi

2 ayidyj

= 0.

(3.1.30)

Since the random processes y(t) and x(t) are related by the linear transformation (3.1.17), the comparison of (3.1.15) with (3.1.28) yields the formula

P = VTPV,

(3.1.31)

which together with (3.1.29) allows us to calculate the matrix P. Now let us return to the Fokker-Planck equation (3.1.14). If the operator (3.1.5) satisfies the potentiality condition (see §4, Section 5 in [173]), then the operator equality4 POL

= L*PQ

(3.1.32)

readily follows from (3.1.5), (3.1.12), and (3.1.14). However, even if the

potentiality conditions are not satisfied and (3.1.32) does not hold, one can choose an operator L\ satisfying a similar relation PoL

= L\PQ.

(3.1.33)

One can readily see that the operator L\ has the form5

ri = - ( G , ^ +

(^),

(3-1.34)

where the matrix G = ||G,-j||" is similar to the transpose matrix AT from (3.1.1) and (3.1.5),

G=P-1ATP.

(3.1.35)

The similarity transform (3.1.35) employs the matrix P from (3.1.15). Relation (3.1.33) allows us to replace Eq. (3.1.11) by a similar equation

for the dual operator. In other words, it follows from (3.1.11) and (3.1.33) that the problem of finding f ( x ) in (3.1.11) is equivalent to the problem of finding z(x) in the equation

L\z(x) = ip(x),

(3.1.36)

where z(x), i>(x), and the functions /(a;), V>(x) from Eq. (3.1.11) satisfy the relations

z(x)=po(x)f(x), 4

if>(x) = Po(x)if>(x).

(3.1.37)

As usual, the operator equality is understood in the sense that it is an ordinary

r e l a t i o n p g ( x ) L w ( x ) = L*po(x)w(x) for any sufficiently smooth function w(x). 5 The verification of (3.1.33) is left to the reader as an exercise.

150

Chapter III

3.1.3. The solution of equations (3.1.36) and (3.1.11). Let us consider the following problem of finding the eigenfunctions zs (x) and eigenvalues A s of the operator L\ (the Sturm-Liouville problem):

L\zs = \,z,.

(3.1.38)

Since L\ is the Fokker-Planck operator, its eigenfunctions zs must satisfy the zero conditions at infinity (as x —> oo).

By passing from x to new variables y (x = Vy) and acting in a way similar to (3.1.17)-(3.1.31), we can transform the operator (3.1.34) to the form £;

=-(A,.tt)+ I ( B

t f

) ,

(3.1.39)

where Bij is an element of the matrix B = a (TT , "a = V cr, V is a nonde__ _ i __ __ generate matrix such that the transformation V GV makes the matrix G diagonal, G = {Ai, . . ., A n }, and A; are roots of Eq. (3.1.21).6 In the new variables the stationary Fokker-Planck equation has the form

*

- 0-

^-

This equation differs from (3.1.30) only by the matrix of diffusion coefficients; therefore, the stationary probability density p0(y) is determined by the formulas

pQ(y) =

1

_= exp [ - |(j/TPy)] ,

(3.1.41)

y(27r)"detP

1 similar to (3.1.28) and (3.1.29). Differentiating (3.1.40) appropriately many times, we see that

_

+ m 2 A 2 + • • • + mn \n) —^——jr^Po = 0,

(3.1.43)

According fco (3.1.35), the matrix G is similar to the transpose AT. Since all similar and transpose matrices have the same eigenvalues, the characteristic equation det(G — AB) = 0 for the matrix G coincides with (3.1.21)).

Approximate Synthesis of Stochastic Control Systems

151

where mi, m?,..., mn are any arbitrary integers between 0 and oo. It follows from v(3.1.43) that the functions -J^j—Q 1" Pn an-d the num' dy-i ---dyn bers (miAi + • • • + m n A n ) can be treated as the eigenfunctions zs and the

eigenvalues Xs of problem (3.1.38), respectively. By using (3.1.41), we can write the functions zs in more detail as follows:7 Zs

= (_l)™i+"-+™»tf rai ... ra J y )exp [- f(y T Pj/)].

(3.1.44)

Here Hmi,_mn(y) — Hmi...mn(yi: • • -,2/n) denote multidimensional Hermitian polynomials (for instance, see [4]) that, by definition, are equal to Hm^,...,mn(y) = (-l) mi +"+ m « exp [f (j/TPt/)]

"

(3.1.45)

It follows from the general theory [4] for Hermitian polynomials with real variables y that these polynomials form a closed and complete system of functions, and an arbitrary function from a sufficiently large class (these functions grow at infinity not faster than any finite power of \y\) can be expanded in an absolutely and uniformly convergent series in this system of functions. Furthermore, the polynomials H are orthogonal to another

group of Hermitian polynomials G given by the formula

"-|(/i T P"V)]-

(3.1.46)

Here the variables fj, and y satisfy the relation or

=

and the orthogonality condition itself has the form

r00

i"*

J — oo

J — oo

_

MetP

..Sv,m,

(S1/imi is the Kronecker delta). 7

The constant coefficient [(2ir) n detP "'j- 1 / 2 in (3.1.44) is omitted.

(3.1.48)

152

Chapter III

However, we often need to use a complex matrix V for the change of variables x —> y (for instance, see the problem in §3.2.). To pass to complex variables, we need to verify some additional statements from the general

theory [4], which hold for real variables. In particular, it is necessary to verify the orthogonality conditions (3.1.48), which are the most important in practical calculations.

This was verified in [107], where it was shown that all properties of the polynomials H and G remain valid for complex variables if only all functions HVl...Vn(y), Gmi...mn(y), exp[|(yTPy)], and exp[|(/iTP fj,)] are considered as functions of the initial real variables x of the problem. To this end, we need to make the change of variables y = V x in all these functions. In particular, in this case, the orthogonality condition (3.1.48) has the form oo

/

/»oo

... / -oo

e-^xTp^HVl...

J — oo

..£„„„,„,

(3.1.49)

where the matrices PI and P satisfy the relation

~P = VTP1V

(3.1.50)

similar to (3.1.31). Thus, we obtain the following algorithm for constructing the solution f ( x ) of Eq. (3.1.11). First, we seek a stationary density po(x) satisfying (3.1.14) and an operator L\ satisfying (3.1.33). Then we transform prob-

lem (3.1.11) to problem (3.1.36). After this, to find the eigenfunctions and eigenvalues of problem (3.1.38), we need to calculate the matrix V that transforms the matrix G to the diagonal form { A i , . . . , A n } by the siml-

larity transform V GV. Next, using the known A,- and V and (3.1.42), we calculate the matrices P and P that determine the stationary distribution (3.1.41). The expression obtained for p0(y) enables us to find the eigenfunctions zs = zmi...mm (3.1.44) for problem (3.1.38) and the orthogonal polynomials G TOl ... TOn (3.1.46). Finally, we seek the function z(x) satisfying (3.1.36) in the form of the series with respect to the eigenfunctions: oo z x

( ) =

X) mj....m n =0

a

m1...mnZmi. ..mn(x),

(3.1.51)

Approximate Synthesis of Stochastic Control Systems

153

where omi...ran are unknown coefficients; the eigenfunctions zmi,,.mn(x) can be calculated by formulas (3.1.44) with y = V x. If we also represent the right-hand side ij)(x) — po(x)(x) in (3.1.36) as the series oo

po(x)
^

bmi...mnzmi...mn(x),

(3.1.52)

where, in view of (3.1.49),

*— r...r

A/det PI (27r) n / 2 mi... !.. .m .,,„„. ^_00 00 n\ J_

Jj.oo _,

T-l i x G r o i ... T O/nT(F~ z)dxi...dz n

(3.1.53)

then we can calculate the unknown coefficients oroi...TOn in (3.1.51) by the formula ^m i...mn

\

ami...mn - T———— ,

\

.

, \

A mi ... mn = Aimi + ••• + Xnmn,

/ o i c x \

(3.1.54)

which follows from (3.1.38) and (3.1.43). Now we see that (3.1.37) implies the expression

for the solution of the initial equation (3.1.11). The algorithm obtained for solving (3.1.11) can be used for calculating

the successive approximations (3.1.9) and (3.1.10) It remains only to solve the problem of how to choose the stationary losses 7^ (k=0,l,2, ... ) in

Eqs. (3.1.9) and (3.1.10). 3.1.4. Calculation of the parameters Vs (k - 0,1,2,...). The structure of the solution (3.1.55) and a natural requirement that the stationary loss function f ( x ) must be finite imply that there is a unique method for choosing 7*. Indeed, since, according to (3.1.54), the eigenvalue AQO...O = 0, the coefficient aoo...o in (3.1.46) is finite if a necessary condition 60o.. .0 = 0 is satisfied, or, more precisely, (in view of (3.1.53) and (3.1.46)) if we have oo

/

/»oo

.../ •oo

J — oo

PQ(X)
154

Chapter III

This relation, (3.1.9) and (3.1.10) imply the following expressions for the stationary losses 7*: OO

/

/»OO

.../ -OO

c(x)pQ(x)dxl...dxn,

(3.1.56)

J — OO

Tdfk-i(x)

Q'

Po(x)dxi...dxn, dx = 1,2,.... (3.1.57)

Thus, we have completely solved the problem of how to calculate the successive approximations (3.1.9), (3.1.10) for the stationary operating conditions

of the optimal stabilization system. If the loss function f k ( x ) in the fcth approximation is calculated, then the quasioptimal control Uk(x) in the feth. approximation is completely defined,

namely, in view of (3.0.8) and (3.1.6), we have

(3.1.58) In the next section, using this general algorithm for approximate synthesis, we shall calculate a special system of optimal damping of random oscillations when the plant is a linear oscillating system with one degree of freedom. §3.2. Calculation of a quasioptimal regulator for the oscillatory plant

In this section we consider the stabilization system shown in Pig. 13, in which the plant P is an oscillatory dynamic system described by the equation

x + /3x + x = u + VB£(t),

(3.2.1)

where the absolute value of the scalar control u is bounded,

\u\ < e,

(3.2.2)

the scalar random process £(t) is the standard white noise (1.1.31), and /?,

B, and s are given positive numbers ((3 < 2). Equations of the type of (3.2.1) describe the motion of a single mass

point under the action of elastic forces, viscous friction, controlling and random perturbations. The same equation describes the dynamics of a direct-current motor controlled by the voltage applied to the armature when

Approximate Synthesis of Stochastic Control Systems

155

the load on the shaft varies randomly. Examples of other actual physical objects described by Eq. (3.2.1) can be found in [2, 19, 27, 136]. For system (3.2.1), (3.2.2), it is required to calculate the optimal regulator (damper) C (see Fig. 13), which will damp, in the best possible way with respect to the mean square error, the oscillations constantly arising

in the system due to random perturbations £(t). More precisely, as the optimality criterion (3.1.2), we shall consider the functional

(x2(t)

=E

x2 (t)) dt] , J

(3.2.3)

which has the meaning of the mean energy of random oscillations in system (3.2.1). Note that the mean square criterion (3.2.3) is used most frequently and this criterion corresponds to the most natural statement of the optimal

damping problem [1, 50]. However, there are other statements of the problem with penalty functions other than the function c(x] = x2 + x2 exploited in (3.2.3). From the viewpoint of the method used here for solving the synthesis problem, the choice of the penalty function is of no fundamental importance. To make the problem (3.2.1)-(3.2.3) consistent with the general statement treated in §3.1, we write Eq. (3.2.1) as the following system of two first-order equations for the phase coordinates x\ and x% (these variables can be considered as the displacement x\ = x and the velocity X2 = x):

(3.2.4)

= —zi —(3x2

Using the vector-matrix notation, we can write system (3.2.4) in the form (3.1.1), where A, Q, and
A=

0

0

Q=

(3.2.5)

According to §3.1, under the stationary operating conditions (T -» oo in (3.2.3)), the desired optimal damper C (Fig. 13) is a relay type regulator described by the equation (see (3.1.6))

u*(xi,x2) = -esign

-— . \OX2'

(3.2.6)

Here / = /(xi^xz) is the loss function satisfying the stationary Bellman equation (see (3.1.8))

df

(3.2.7)

Chapter III

156 where, according to (3.1.5) and (3.2.5),

L — x2-— - (/3x2 +

B d2

-— + IT -5-

(3.2.

ox-2

The equation

21

>.i-j.i.

determines a switching line for the optimal control action (from u — +£ to u = —£ or backwards) on the phase plane (KI, x2). The goal of the present

section is to obtain explicit expressions for the control algorithm (3.2.6) and the switching line (3.2.9). To this end, it is necessary to solve Eq. (3.2.7). We shall solve this equation by the method of successive approximations

discussed in §3.1. First, we shall prepare the mathematical apparatus for calculating the successive approximations. A straightforward verification shows that the stationary distribution with the density function po(x) = po(xi,x2), satisfying the equation (see (3.1.14))

f

=

has the form

(3.2.10) Hence, the matrices P and P~l in (3.1.15) are equal to

2/3 B

B

I 0 0 1

i o o i

(3.2.11)

It follows from (3.2.11) and (3.1.35) that in this case the matrix G of the

operator (3.1.34) coincides with the transpose matrix AT, that is, according to (3.2.5), we have " 0 -1 G= (3.2.12) 1 -0 and the operator (3.1.34) has the form

d

o

B d2 f\ s\ 2

(3.2.13)

One can readily see that the same probability density (3.2.10) satisfies the stationary equation Llpo(xi,x2) = 0. Therefore, in this case, the matrix PI from (3.1.49) and (3.1.50) coincides with the matrix P determined by (3.2.11).

Approximate Synthesis of Stochastic Control Systems

157

The matrix V that reduces (3.2.12) to the diagonal form by the similarity transform is equal to

1

V =

1

—AI

—A 2 (3.2.14)

This expression and formulas (3.1.50) and (3.2.12) imply B

B

Correspondingly, the inverse matrix P

B 2(4-,

2//J

(3.2.15)

-A2

has the form

A2 2//J 2//3 A!

(3.2.16)

The matrices (3.2.15) and (3.2.16) allow us to calculate the two-dimensional Hermitian polynomials .

Gim

r 1 / T7"^" \ T

, e i _,

f)l+m

^

r

1 / T1 ~^ \ i

exp

= (-!)

(3.2.18)

VL = Py.

Then these polynomials must be represented as functions of xi and x? by using the formula x = Vy and expression (3.2.14) for the matrix V. Table 3.2.1 shows some first polynomials H and G. In this case, in view of (3.1.51)-(3.1.55), (3.2.7), (3.2.10), and (3.2.11), the solutions of the equations of successive approximations (3.1.9), (3.1.10) can be written in terms of the Hermitian polynomials H^m(xi,X2) as the

series mA 2 '

(3.2.19) where the coefficients b\m are calculated by the formulas

(3.2.20) £

/V K __

/

-

9

V* ——
^1

9

I

^2 i ^

> 1.

Chapter III

158

TABLE 3.2.1 Polynomials H #00 = 1

-

-

2/3

-H

, p

2/3

//3

\

X2 #10 — -Pnyi + P\iyi — MI — ~TT B »i+ (^-y ) w

-"01 = -i 122/1 + -T22J/2 — M2 — ~^" ±> rr

"p

i

T-T-

n

.

rr

~p

i

'Xl+(P

+js}x,

,,2

-"20 = ~-» 11 + Ml

,,2

-"02 — — •« 22 T P2 #30

— Ml —

~ PllfJ-2

~

#12 =

#03 = A*2 — 3P22/x2 #40 = Ml ~ 3PnPl 2

#31 = MlM2

o-

_ ,,2,,2

rr

3

#"22 — MlM2 3Pl 2 P 2 2

#"13 — PlM2 #04

— fJ-2 ~ 6P22M 2

Polynomials G

GOO = 1

1 2JS 1 2^ Expressions for the polynomials G 2 O j G n , . . . can be obtained from the corresponding expressions for Him by the change Ppq —>• Ppq , fj,p —>• yp. Before we pass to the straightforward calculation of the successive ap-

proximations f k , we make the following two remarks about some singular-

Approximate Synthesis of Stochastic Control Systems

159

ities of the series on the right-hand side of (3.2.19).

REMARK 3.2.1. In practice, the series (3.2.19) is usually replaced by a finite sum. The number of terms of the series (3.2.19) left in this sum is determined by the rate of convergence of the series (3.2.19). Here we do

not discuss this question (see, for example, [26, 166, 179]). However, in our case, the series (3.2.19) cannot be truncated in an arbitrary way, since it contains complex terms such as the polynomials Him and the coefficients

aklm (this follows from (3.2.14)-(3.2.18) and (3.2.20)). At the same time, the loss function /jt(£i, £2) represented by this series has the meaning of a

real function for real arguments. Therefore, truncating the series (3.2.19), we must remember that a finite sum of this series determines a real function

only if the last terms of this sum contain all terms with Him of a certain group (namely, all Htm with l + m = s, where s is the highest order of the

polynomials left in the sum (3.2.19)).

D

REMARK 3.2.2. Equation (3.2.7), as well as the corresponding equations of successive approximations (3.1.9) and (3.1.10), is centrally symmetric (such equations remain unchanged after the substitution (x\, £2) —>• (—£1, —£2))- Therefore, the series (3.2.19) must not contain terms for which the sum (I + TO) is odd, since the polynomials Htm with odd (I + TO) are

not centrally symmetric (see Table 3.2.1). If we take this fact into account, then the body of practical calculations is considerably reduced.

D

In what follows, we present the first two approximations calculated according to (3.1.9) and (3.1.10) and the quasioptimal control algorithms «o(£i,£2) and «i(£i,CB2) corresponding to these approximations. The zero approximation. First of all, let us calculate the parameter 7° of specific stationary losses in the zero approximation. From (3.1.56) with regard to c(x) = x\ + x\ and (3.2.10), we have 0

O

f

r^

II

/•OO

/

2

*) \

F

/3 /

9

9 \ "I

KB JJ_ 0 0 Calculating the integral, we obtain

7° = j.

(3.2.21)

In view of Remark 3.2.2, the first coefficients a°0 and a^ in the series (3.2.19) are equal to zero.8 The coefficients b^Q, b®^ and 6§2 can be calculated by using the formulas for G2o, GII, and G02 from Table 3.2.1 and 8

The same result can be obtained if we formally calculate the coefficients 6j0 and fejjj

by using (3.2.20).

160

Chapter III

(3.2.16). Then, according to (3.2.20), the coefficient b% has the form O<x\

=

f3

x exp [ - f (x2 + x2)} (7° - xl - x2,), (3.2.22) The integral in (3.2.22) can readily be calculated, thus, taking into account (3.2.21) and (3.2.14), we obtain

In a similar way, we can easily find

(3 2 24)

--

All other coefficients b®m with l-\-m > 2 are zero in view of the orthogonality

condition (3.1.49). According to (3.2.19), it follows from (3.2.23) and (3.2.24) that a° =--^-/^-^

(3.2.25) 02

_

**'

IP

,rV

Finally, using the formulas for H20, #11, and H02 from Table 3.2.1 and (3.2.25), we obtain the loss function in the zero approximation t I

\ _

Orj-

| a0 r r

JOv*!) 2-2J — a20-"20 i

ll-"ll +

| a0 r r

02"02

. Ix| _ -B(/? 2 +4)

(3.2.26) This relation and condition (3.2.9) imply the following equation for the zero-approximation switching line F0:

zi+z

2

= 0.

(3.2.27)

Approximate Synthesis of Stochastic Control Systems

161

In this case, the quasioptimal control algorithm UQ(X] in the zero approximation has the form

u0(x) = -g sign (xi + ^x2} .

(3.2.28)

REMARK 3.2.3. The loss function f o ( x i , x 2 ) in the zero approximation (without a constant term) and the parameter of stationary losses (3.2.21) can be calculated in a different way, without using the method considered above. Indeed, if we first seek the solution of the zero approximation equation (L is the operator (3.2.8)) Lf0 = 7° - xl - x\

(3.2.29)

as the quadratic form

fo(xi, x2) = with unknown coefficients /in, hi2, and h22, then, substituting this expression into (3.2.39), we obtain four equations for ha, /ii2, h22, and 7°. However, higher approximations cannot be obtained by this simple reasoning. D

The first approximation. It follows from (3.1.10) and (3.2.26) that in the first approximation we need to solve the equation

2x2 This equation can be solved by analogy with the zero-approximation equation (3.2.29), but the calculations are much more cumbersome due to more complicated expression on the right-hand side.

First, we employ (3.1.57) and (3.2.21) to find the specific stationary losses 1

R

cR

^

f r°°

//

9

f / 3 / 2

2\~l

T

then, after the integral is calculated, we obtain

The coefficients a\0, a\^... in (3.2.19) are calculated by (3.2.19) and (3.2.20) with regard to the formulas for Glm from Table 3.2.1. We omit

162

Chapter III

the intermediate calculations and write the final expression for /I(:EI, £2)Taking only the first terms in the series (3.2.19) up to the fourth order inclusively (that is, omitting the terms for which (I + m) > 4), we obtain the following expression for the loss function in the first approximation: = VX

\ + px\ + puxixi + pmx\ + const . (3.2.31) Here9

ea/3 25

P31 = -^5-,

Pii =

£CX. l+T,

_ ea B

_ ea/3 125

P22 = -=-,

_ ea 165

Pl3 = 77T^, 1

O^O!

^2 = ^ - ^ ,

P04 = T^, ,

/o i->\ — 1 /9

« = (7r/35)

(32.32)

.

The condition f32
(3.2.32). From (3.2.9) and (3.2.31) we obtain the following equation for the switch-

ing line F1 in the first approximation:

It follows from the continuity conditions that for small e the switching

line F1 is close to F° determined by Eq. (3.2.27). Therefore, if we set £ 2 = —(/3/2)xi in the terms of the order of e in (3.2.33), then we obtain errors of the order of e2 in the equation for F1. Using this fact and formulas (3.2.32) and (3.2.33), we arrive at the following expression with accuracy up to the terms of the order of O(e1}:

Figure 23 shows the position of the switching lines F° and F1 on the phase plane (EI, x%). The switching line (3.2.34) determines the quasioptimal control algorithm in the first approximation:

«i(i) = -esign L +

l+

£«a;i -

-Zi •

(3-2.35)

We do not calculate the coefficients v and p and the constant term "const" in (3.2.31) since they do not affect the position of the switching line and the control algorithm in the first approximation.

Approximate Synthesis of Stochastic Control Systems

163

— —e

W* = £

FIG. 23

FIG. 24 This algorithm can easily be implemented with the help of standard blocks of analog computers. The corresponding block diagram of a quasioptimal control system for damping of random oscillations is shown in Fig. 24, where 1 and 2 denote direct-current amplifiers with amplification factors

gg/32 " 45

164

Chapter III

In conclusion, we dwell on another statement that follows from the calculations of the first approximation.

Namely, all expressions with a

small parameter contain this parameter in the form of the product ea = £/^TT/3B, This statement concerns the loss function (3.2.31), the switching line (3.2.34), and the formula (3.2.30) for stationary specific losses, which can be written in the form

or, more briefly,

if the condition /?2
fact, by the parameter e/ ' ^JirfjB. If we recall that, by the conditions of problem (3.2.2), the parameter e determines the values of admissible control, then it turns out that this variable need not be small for the method of successive approximations to be efficient. Only the relation between the limits of the admissible control and the intensity of random perturbations B is important. All this confirms our assertion made at the beginning of this chapter that the method of successive approximations considered here is convenient

for solving problems with bounded controls when the intensity of random perturbations is large.

§3.3. Synthesis of quasioptimal controls in the case of correlated noises

Now we shall show how the method of successive approximations studied in this chapter can be used for constructing quasioptimal controls when random actions on the system are not white noises. Instead of the system shown in Pig. 13, we shall consider a stabilization system of a somewhat more general form (see Fig. 25), where in addition to random actions £(<) on the plant we also take into account the noise r)(t] in the feedback circuit.

Let the controlled plant P be described, just as in §3.1, by the system of linear differential equations with constant coefficients (3.3.1)

Approximate Synthesis of Stochastic Control Systems

165

y(t)

FIG. 25 where XT = (zi, . . ., xn), UT = (MI, . . . , ur), (,T (t) = (&(<), . . .,&«(<)), and the constant matrices vl, Q, and cr are of dimensions n x n, n x r, and

n x ?7i, respectively. Block 1 in Fig. 25 is assumed to be a linear inertialess device described by the equation

y(t) = Cx(t) + Dr](t),

(3.3.2)

where yT = (j/i, . . . , yi, r = (771, . . . , rjt), and C and D are constant matrices of dimensions I x n and I x I, respectively, (det D ^ 0). The goal of control is to minimize a functional of the form - E

c(x(t},u(t}}dt\.

(3.3.3)

We assume that the random perturbations £(<) and r](t) affecting the system are independent diffusion processes with drift coefficients

= Hr),

(3.3.4)

and matrices of local diffusion coefficients B^ and B^ (G and B^ are mxm dimensional constant matrices; H and Bn are £ x £ dimensional constant matrices; the matrices B£ and B^ are symmetric, 5^ is a nonnegative definite matrix and B^ is a positive definite matrix). It is well known that in this case the diffusion processes £(t) and rj(t) are Gaussian. The stated problem is a special case of the synthesis problem treated in

§1.5. This problem is characterized by the fact that the controlled process x(t) is not a Markov process (in contrast, say, with problems considered in §3.1 and §3.2; moreover, x(t) is a nonobservable process), and therefore, to describe the controlled system shown in Fig. 25, we need a special space Xt

Chapter III

166 of states. This space was called the space of sufficient (see

coordinates in §1.5

also [171]). As was shown in §1.5, in this case, as sufficient coordinates,

we must use a system of parameters that determine the current a posteriori probability density of nonobserved stochastic processes:

The a posteriori density (3.3.5) satisfies a stochastic partial differential equation, which is a special case of Eq. (1.5.39). It follows from §1.5 that to write an equation for the density (3.3.5), we need to use a priori probability characteristics of the (n + m + £)-dimensional stochastic Markov process It follows from (3.3.1), (3.3.2), and (3.3.4) that the combined process has the drift coefficients

ax = Ax + Qu + a!;, ay — Ayxx Ayyy +

CQu,

(3.3.6)

and the matrix of local variances

B-

0...0 0...0

0 . . . 00 . . . 0 0...0 0...0

0...0 0...0

0...00...0 0...0

0...0 0...0

0...0

0...0

0...0

(3.3.7)

0...0

B

The matrices introduced in (3.3.6) and (3.3.7) are Ay,

= CA-

Ayy

=

DHD-^,

C,

Ayt: = Ca, By = DBnDT.

(3.3.8)

10 In this case, the control u in (3.3.1) is assumed to be a given known vector at each time instant t.

Approximate Synthesis of Stochastic Control Systems

167

Using (3.3.6) and (3.3.7), we obtain the following equation for the a posteriori probability density (3.3.5):11

dp(t,z)

I

92p(t,z)

ot

2*

Ozoz

(

T D—1

—^—- = -Spg ; — — 1 dy SJV

2\

p C

'

OCLy

d ,

,.

' - Sp^-^(azp(t,z))

-,\ , F . T D - LI /

rp

_l

dy -\- op ——Tfr — C.psd-u &v

^y

_

\

+ \y* B (ay - cEpsay)

uz

r

c

Ody

\

/

*

^y '— ^~p» ^P Tj—^r J P\t) Zi.

^y

J

*'

/ O O O\

{o.o.y ]

Here p(i,z) = pt(x,^) denotes the a posteriori density (3.3.5), z denotes the vector (#,£), az is the vector composed of the vector-columns ax and ct£, the matrix Bz is a part of the matrix (3.3.7) consisting of its first (n + m) rows and columns, Eps denotes the a posteriori averaging of the corresponding expressions (that is, the integration with respect to z with the density p(t, z ) ) .

It follows from (3.3.6)-(3.3.8) that the matrix B2 is constant, the components of the vector az are linear functions of z, and the expression in the square brackets in (3.3.9) linearly and quadratically depends on z. Therefore, as shown in §1.5 (see also [170, 175]), the a posteriori density p(t, z)

satisfying (3.3.9) is Gaussian, that is, (t))TK-l(t)(z

-

- *(<))],

(3.3.10)

if the initial (a priori] density p(0, z) = po(z) is Gaussian (this is assumed in the sequel).

Substituting (3.3.10) into (3.3.9), one can obtain a system of differential equations for the parameters ~z and K~* of the a posteriori probability density (3.3.10). One can readily see that this system has the form

V [y-ay(z,y,u)},

K~l = -2K~1BZK-1 - R-1V - VTK~i - W

(3.3.11)

(3.3.12)

(in our special case, the system (1.5.52) acquires the form (3.3.11), (3.3.12)). If instead of K~[ we use the inverse matrix K (which is the matrix of a posteriori covariances), then the system (3.3.11), (3.3.12) can be written in the form

z = az(z, u) + fflB~\y T

- ay(z, y, u)],

K = 1BZ + VK + K V + KWK. n

(3.3.13) (3.3.14)

To derive (3.3.9) from (1.5.39), we need to recall that, according to the notation

used in (1.5.39), the vector Aa coincides with the vector a z , the vector A& with ay, and

the structure of the diffusion matrix (3.3.7) implies the following relations between the matrices: \\Ba/3\\ = Bz, \\Baa.\\ = 0, \\Fffp\\ = B~l, and ||B,p|| = By.

Chapter III

168 Here
= Ayxkx( A V = 0

a G

yrr _

~

I ^yx-°y AT R"1 ^y A

I

AT R-1 A

»£ »

y*

T

(3.3.15)

"1

AT f!~1A

st y

y

where, in turn, kxx, kx£,... are elements of the block covariance matrix K,

kxx

kxe

(the dimensions of a block are determined by the dimensions of its subscripts; for example, kx£ is of dimension n x TO). The loss function for problem (3.3.1)-(3.3.3)

F(t,zt,Kt) = min Eps

:(X(T),W(T))C!T y*

u(r) t
= min Eps

C X T

= z t , JiC(t) =

u(r) t
(3.3.16) is completely determined by the time instant t and the current values of the parameters ( z t , K f ) of the a posteriori density (3.3.10) at this instant of time. It follows from the definition given in §1.5 that (z(f), K(t)) are sufficient coordinates for problem (3.3.1)-(3.3.3). The Bellman equation (1.5.54) for the function (3.3.16) can readily be obtained in the standard way from the Eqs. (3.3.13), (3.3.14) for the sufficient coordinates. However, it should be noted that, in this case, the system

(3.3.13), (3.3.14) has a special feature that allows us to exclude the a posteriori covariance K(t) from sufficient coordinates. The point is that, in contrast, say, with a similar system (1.5.53), the matrix equation (3.3.14) is independent of controls u and in no way related to the system of differential equations (3.3.13) for the a posteriori means ?(<). This allows us first to solve the system (3.3.14) and calculate the matrix of a posteriori covariances K(t) in the form of a known function of time on the entire control

interval 0 < t < T (we solve (3.3.14) with the initial matrix K(0) = K0, where KQ is the covariance matrix of a priori probability density po(z)). If K(t) is assumed to be known, then in view of (3.3.8) and (3.3.15) we

can also assume that the matrix crz in (3.3.13) is a known function of time, crz(t), and the loss function (3.3.16) depends on the set (i,2t). Therefore,

Approximate Synthesis of Stochastic Control Systems

169

instead of Eq. (1.5.54) for the loss function F(t, 2), we have the Bellman equation of the form

= mn

[c(x,v.)N(z,K(t))dz\ v

\

(3.3.17) (here N(Z,K(t)} denotes the normal probability density (3.3.10) with the vector of mean values ~z and the covariance matrix K(t)).

Just as in §3.1 and §3.2, Eq. (3.3.17) becomes simpler if we consider the stationary operating conditions for the stabilization system shown in

Fig. 25. The stationary operating conditions established during a long operating time interval (which corresponds to large time t) can exist if only there exists a real symmetric nonnegative definite matrix Kf such that

K*WK* + K*VT + VK* + 1BZ = 0

(3.3.18)

and this constant matrix K* is an asymptotically stable solution of (3.3.14). Let us assume that this condition is satisfied. Denoting the mean "control losses" per unit time under the stationary operating conditions, as

usual, by 7, we can define the stationary loss function /(z) = Km [F(t, z) - -j(T - t)], T-KX>

for which, from (3.3.17), we derive the time-invariant Bellman equation

[

rt f

r

aj(z, w ) — + / c(x,u)N(z, K*) dz

In (3.3.19) (TZ> is the matrix az (see (3.3.8) and (3.3.15)), where kxx, k^x,... are the corresponding blocks of the matrix Kf determined by (3.3.18). In some cases, it is convenient to solve Eq. (3.3.19) by the method of successive approximations treated in §3.1 and §3.2. The following example shows how this method can be used. Let us consider the simplest version of

the synthesis problem (3.3.1)-(3.3.3) in which Eqs. (3.3.1), (3.3.2) contain scalar variables instead of vectors and matrices. In (3.3.3) we write the penalty function c(x, u) in the form

c(x,u) = x +e~lu2,

(3.3.20)

170

Chapter III

where e > 0 is a small parameter. From the "physical" viewpoint, this penalty function means that the control actions are penalized much more strongly than the deviations of the phase coordinate x(i) of the control system (3.3.1) from the equilibrium state x — 0. For simplicity, we set Q =
x = — ax -\- £ + u + — [ y — (h — a)x — £ — u • (3.3.21)

£, = —
I

^XX

ff

——

^QifoxX

I

^fox^

I

'

_-j

I

5

-^-,

%£, ——

\®>

' 9)*^x£

ffx = (h - a)kxx + kx^

(3.3.22)

Bn
In this case the Bellman equation (3.3.19) has the form

„..,, s

dxdt

+

4

* eeJ

=

,. (3.3.23,

Here the constants ax* and cr^« are ffx,

= (h- a)k*xx + fc* c ,

where the constant covariances k*x, k*x^ and fc|^ form the stationary solution of the system of differential equations (3.3.22). Passing to the new variables x± = (^/B^/

(3.3.24)

where r =
\+ f x\N(bx1,k*,)dx = f , (3.3.25) J J

Approximate Synthesis of Stochastic Control Systems

171

where b = &x* /\/B^. Taking into account the formulas r°°

I

x\N(bx-L,klx)dx

J — CO

°°

/

x[exp -00

1

--——(v-bx *KXX

V

_ ^rt T.L* ) +I 6' a . 1•*•| Il _ 2 F (\ _ - ^/ rL JT (3.3.26)

27T J -

and minimizing the expression in the square brackets, we obtain from (3.3.25) the optimal control for the stationary stabilization conditions:

uf(xi, x2) = ~~~(xii

a;2

)'

(3.3.27)

where the function /(KI,^) satisfies the nonlinear elliptic equation l, x 2 )

= 7 - "(xi) +

2

-

(3-3.28)

Equation (3.3.28) is similar to Eqs. (3.1.8) and (3.2.7), therefore, in this case, we can use the same method of approximate synthesis as in §3.1 and §3.2. Then the quasioptimal control U]t(xi,X2) in the fcth approximation is determined by the formula

uk(xi,x2) = - — -/^(x1,X2), 20

where the functions fk(xi,x2) approximations

k = 0,1,2,...,

(3.3.29)

satisfy the linear equations of successive

Lfk(xi,x2)=uk(xi,X2),

* = 0,1,2,...,

(3.3.30)

, x2) = U>Q(XI) = 7° -

In this case, the calculations of successive approximations fk(xi,x2) are completely similar to those discussed in §3.1 and §3.2. Therefore, here

Chapter III

172

we restrict our consideration to a brief description of the calculation of fk(xi,xz). We only dwell upon the distinctions in formulas. The operator (3.3.24) can be written in the form (3.1.5) if A = \\Aij\\\ and B = \\Bij\\^ in (3.1.5) are understood as the matrices

A=

1 1 1 1

—a 0

-g

The stationary density po(x) satisfying (3.1.14) has the form (3.1.15), and the matrices P and P~1, as one can readily see, have the form

P ~_ A ,/2

-afj,p

-a/ip -(r +

(3.3.31)

p/ > 1 , = a + g, v — a- g — r, and p = r + 2g). Using (3.3.31), we can find the matrix (see (3.1.35)) p[cf-(r + gY}

p(1a? - 2ar — gr) - g/j,2

(3.3.32)

as well as the matrix p/2gfj, l/2g

V =

(3.3.33)

By the similarity transform, this matrix reduces the matrix (3.3.32) to the diagonal form

\i 0

0 A2

= -a,

A 2 = -g.

(3.3.34)

It follows from (3.1.44), (3.1.51), and (3.1.55) that solutions of the equations of successive approximations (3.3.30) can be represented as the series (3.3.35)

where Him(xi,X2) are two-dimensional Hermitian polynomials calculated __ _ i __ _ -i by the formulas (3.2.17) with y = V x (the matrix V is inverse to (3.3.33)).

Approximate Synthesis of Stochastic Control Systems

173

The coefficients aktm are calculated by the formula (see (3.2.19))

fc

_ ——tfm——_——^m_ ) l\i + m\2 la + mg

(3.3.36)

and the coefficients

,,

00

I 0 /* X* det^P G/ m (/i) ^=T7Tpx ex? [ - ^(xTPx)]uk(x) dXldx2 !m, JJ

(3.3.37) are expressed in terms of the group of Hermitian polynomials Gim(x\, x-z) orthogonal to Him(xi, x-i) and calculated by (3.2.18). Parallel to the calculations of the successive approximations to the loss

function (3.3.35), we calculate specific stationary losses -yfc (corresponding to the Mh approximation) from the condition &QO = 0. So, in the zero approximation we have

hence, performing simple calculations and taking into account (3.3.31), we obtain

g) 2

Next, using the obtained value of 7° and formulas (3.3.26), (3.3.30), (3.3.36), and (3,3.27), we can calculate any desired number of coefficients a°m in the series (3.3.35). With the help of these coefficients, we can construct an approximate expression for the function fo(xi,X2), which allows

us to derive an explicit formula for the quasioptimal control algorithm UQ(XI,X^) in the zero approximation and to calculate the variables -y1, /i (£1,2:2), and MI (3:1,2:2) related to the first approximation. Here we write explicit formulas neither for fo(xi, Xi) nor for /i(a;i, £2), since they are very cumbersome. We only remark that in this case all quasioptimal control algorithms (3.3.29) are nonlinear functions of the phase variables (2:1,2:2); moreover, the character of nonlinearity is determined by

the number of terms left in the series (3.3.35) for the calculations.

Chapter III

174

Thus, from the preceding it follows that the methods for calculations of stationary operating conditions of the stabilization system (Fig. 13) can readily be generalized to the case of a more general system with correlated noise (Fig. 25) if the noise is a Gaussian Markov process. In this case, the

optimal system is characterized by the appearance of an optimal filter in the regulator circuit; this filter is responsible for the formation of sufficient coordinates. In our example (Fig. 25), where x, y, u, £, and f] are scalar, this filter is described by Eqs. (3.3.21). The circuit of functional elements of this closed-loop control system is shown in Fig. 26.

f

——>

" 1 ' -, >

—J

s(i )

Jo

"NT1

M

x(<)

W -*

t( t) ^\

3

Zi

<—

y

5^

Jo

,_ 4<

6

<

7<

f—

d/dt

'

8-

y(*)

!>?(* x

«—

FIG. 26 Blocks P and 1 are units of the initial block diagram (Fig. 25). The rest of the diagram in Fig. 26 determines the structure of the optimal controller. One can see that this controller contains standard linear elements of analog computers such as integrators, amplifiers, adders, etc. and one nonlinear converter NC, which implements the functional dependence (3.3.29).

Units of the diagram marked by ">" and having the numbers 1,2,... ,8 are amplifiers with the following amplification factors Kt:

K5 = -1,

K6 = a - h,

Kr = -1,

K& = h.

Approximate Synthesis of Stochastic Control Systems

175

§3.4. Nonstationary problems. Estimates of the quality of approximate synthesis

3.4.1. Nonstationary synthesis problems. If equations of a plant

are time-dependent or if the operating time T of a system is bounded, then the optimal control algorithm is essentially time-varying, and we cannot find this algorithm by using the methods considered in §§3.1-3.3. In this case, to synthesize an optimal system, it is necessary to solve a time-varying Bellman equation, which, in general, is a more complicated problem. However, if the plant is governed by a system of linear (time-varying) equations, then we can readily write solutions of the successive approximation equa-

tions (3.0.6), (3.0.7) in quadratures. Let us show how this is done. Just as in §3.1,

problem for the stabilization system (Fig.

we consider the synthesis

13) with a plant P described by

equations of the form

x = A(i)x + Q(t)u

+ ar(t)£(t),

(3.4.1)

where x is an n-dimensional vector of phase coordinates, u is an r-dimensional vector of controls, A(t), Q(t), and
n-dimensional standard white noise (1.1.34). To estimate the quality of control, we shall use the following criterion of the type of (1.1.13): (3.4.2)

and assume that the absolute values of the components of the control vector u are bounded by small values (see (3.1.3)):

K <£umi,

i= l,...,r.

(3.4.3)

According to (3.1.6) and (3.1.7), the optimal control u*(t,x) for problem (3.4.1)-(3.4.3) is given by the formula u,(t,x) = -{ew m i,...,£« mr }sign ( QT(t)—-(t,x) ) , \ dx / where the loss function F(t,

(3.4.4)

x) satisfies the equation

LtiXF(t, x) = -c(x)

- £$ (t, —— (<, x)}, \ ox )

(3.4.5)

176

Chapter III

with Lt,x denoting a linear parabolic operator of the form

—+-1

d

fit* L/tt-

*} LJ

For the function 3>(t,dF/dx], we have the expression (3.4.7)

In this case, the function F(t,x) must satisfy (3.4.5) for all x £ R n , 0 < t < T, and be a continuous continuation of the function (3.4.8)

as t -> T (see (1.4.22)). The nonlinear equation (3.4.5) is similar to (3.0.5) and, according to (3.0.6) and (3.0.7), can be solved by the method of successive approximations. To this end, we need to solve the sequence of linear equations

Lt,,F0(t,x) = -c(x),

Lt,,Fk(t, x) = -c(x)

(3.4.9)

- e$ (t, ^±(t, x)}, dx V >

k = 1, 2, . . . (3.4.10)

(all functions Fk(t,x) determined by (3.4.9) and (3.4.10) must satisfy condition (3.4.8)). Next, if we take Fk(t,z) as an approximate solution of Eq. (3.4.5) and substitute Fk into (3.4.4) instead of F, we obtain a quasioptimal control algorithm Uk(t, x) in the fcth approximation. Let us write the solutions Fj. (i, a;), k = 0, 1, 2, . . . , in quadratures. First, let us consider Eq. (3.4.9). Obviously, its solution Fo(t,x) is equal to the value of the cost functional

c(x(T))dT+i>(x(T))

x(t)=x

(3.4.11)

on the time interval [t, T] provided that there are no control actions. In this case, the functional on the right-hand side of (3.4.11) is calculated along the trajectories x(r), t < r < T, that are solutions of the system of stochastic

differential equations

X = A(T)X

+
x ( t ) = x,

describing the uncontrolled motion of the plant (u = 0 in (3.4.1)).

(3.4.12)

Approximate Synthesis of Stochastic Control Systems

177

It follows from §1.1 and §1.2 that the solution of (3.4.12) is a continuous Markov process X(T) (a diffusion process). This process is completely determined by the transitive probability density function p(x, t; z, T), which determines the probability density of the random variable z = X(T) if the stochastic process x(t) was in the state x(f) — x at the preceding time moment t. Obviously, by using p(x,t;z,r), we can write the functional (3.4.11) in the form

FQ(t,x)= f

dr I

Jt

p(x,t-z,T)c(z)dz+

Jnn

f

p(x,t;z,T)4>(z)dz.

(3.4.13)

JRn

On the other hand, we can write the transitive density p(x, t; z, T) for the diffusion process X(T) (3.4.12) as an explicit finite formula if we know the

fundamental matrix X(t, T) for the nonperturbed (deterministic) system z = A(t)z. Indeed, since Eqs. (3.4.12) are linear, the stochastic process X(T) satisfy-

ing this equation is Markov and Gaussian. Therefore, for this process, the transitive probability density has the form

p(x,t;z,T) = [(2ir)B det DJ-^expI-f (z - a)TD~l(z - a)], where a = Ez = E(X(T)

(3.4.14)

x ( t ] = a;) is the vector of mean values and

D = E[(z — Ez)(z — Ez) T ] is the covariance (dispersion) matrix of the random vector z = X(T). On the other hand, using the fundamental matrix X(t,T)12 we can write the solution of system (3.4.12) in the form (the Cauchy formula)

x(r) = X(t,r)x+ f Jt

X(s,T)
Hence, performing the averaging and taking into account properties of the white noise (1.1.34), we obtain the following expressions for the vector a

and the matrix D:

a = Ex(r)=X(t,T)x,

(3.4.15) T T

D = E [ ( x ( r ) - a))((xx((rr))--aa)) ]

!

Jt

X(s,T)o-(s)Et(s)£T(S')o-T(s')XT(s',T)dsds'

12

Recall that the fundamental matrix X(t,r),T > t, is a nondegenerate nxn matrix

whose columns are linearly independent solutions of the system Z ( T ) = A(T)Z(T), so

that X(t, t) = E, where E is the identity matrix. Methods for constructing fundamental matrices and their properties are briefly described on page 101 (for details, see [62, 111]).

178

Chapter III X(s,T)cr(s)6(s'

-s)
X(s,T)B(s)XT(s,T)ds,

=

B(s)=a(s}ffT(s).

J*

(3.4.16)

Formulas (3.4.13)-(3.4.16) determine the solution F0(t,x) of the zeroapproximation equation (3.4.9), satisfying (3.4.8), in quadratures. It follows from (3.4.13)-(3.4.16) that the function F0(t,x) is infinitely many times differentiable with respect to the components of the vector x if the functions c(z) and ij)(z) belong to a rather wide class (it suffices that the functions c ( z ) exp(— |z T D~ 1 z) and ^>(z) exp(— ^zTD~iz) were absolutely integrable [25]). Therefore, by analogy with (3.4.13), we can write the solution Fk(t,x) of the successive approximation equations (3.4.10), satisfying

(3.4.8), in the form

Fk(t,x)=

dr Jt

JRn

= l,2,...

(3.4.17)

To obtain explicit formulas for the functions Fo(t, x), Fi(t, x),. .., which allow us to write the quasioptimal control algorithms Uo(t, x), «i(<, a;), ...

as finite analytic formulas, we need to have the analytic expression of the matrix X(t,r) and to calculate the integrals in (3.4.13) and (3.4.17). For autonomous plants (the case where the matrix A(t) in (3.4.1) and (3.4.12) is constant, A(t) = A = const), the fundamental matrix X(t,r) has the form of a matrix exponential:

X(t,r) = e A ( r -*>

(3.4.18)

whose elements can be calculated by standard methods. On the other hand, it is well known that fundamental matrices of nonautonomous systems can be constructed, as a rule, by numerical methods.13 Thus for A(t) ^ const, it is often difficult to obtain analytical results. If the plant equation (3.4.1) contains a constant matrix A(i) EE A —

const, then formulas (3.4.13) and (3.4.17) allow us to generalize the results obtained in §§3.1-3.3 for the stationary operating conditions to the timevarying case. For example, let us consider a time-varying version of the

problem of optimal damping of random oscillations studied in §3.2. 13 Examples of special matrices A(t) for which the fundamental matrix of the system x = A(t)x can be calculated analytically, can be found, e.g., in [139].

Approximate Synthesis of Stochastic Control Systems

179

Just as in §3.2, we shall consider the optimal control problem (3.2.1)(3.2.3). However, in contrast with §3.2, we now assume that the terminal time (the upper limit T of integration in the functional (3.2.3)) is a finite

fixed value. By writing the plant equation (3.2.1) in the form of system (3.2.4), we see that problem (3.2.1)-(3.2.3) is a special case of problem

(3.4.1)-(3.4.3) if

0

4= a(t) = a =

0

1

Q(t) = Q = 1

-i -p

0

0

C(x) =

o VB

2 2 Xl+X2,

i>(3

(3.4.19) = 0.

Therefore, it follows from the general scheme (3.4.4)-(3.4.10) that in this case the optimal control has the form

u#(t,xi,x2) = -esign f - — ( t , x i , x 2 ) 1,

(3.4.20)

where for 0 < t < T the function F(t,xi,x2) satisfies the equation

Lt>xF(t, xi, x2) = -x\-

(3.4.21)

0z 2 V "

and vanished at the terminal point, that is,

F(T,X!,x2) = Q.

(3.4.22)

According to (3.4.6) and (3.4.13), the operator Lt,x in (3.4.21) has the form

d

d Lt,x = ^T

8

B d2 2 dx\

-—— + — ——r.

dx2

Let us calculate the loss function Fg(t, xi, x2) of the zero approximation. In view of (3.4.9), (3.4.21), and (3.4.22), this function satisfies the linear equation

with the boundary condition

F0(T,xi,x2) = Q.

(3.4.24)

According to (3.4.13), the function F0(t,Xi,x2) can be written in quadratures ,-T

F0(t,xi,x2)=

dr Jt

(3.4.25)

Chapter III

180

where the transition probability density p(x,t;z, T) is given by (3.4.14). It follows from (3.4.15) and (3.4.16) that to find the parameters of the transition density we need to calculate the fundamental matrix (3.4.18). Obviously, the roots AI and A2 of the characteristic equation dei(A — \E) = 0 of the matrix A given by (3.4.19) are

Ai,2 = --

From this and the Lagrange-Silvester formula [62] we obtain the following expression for the fundamental matrix (3.4.18) (here p = (T — t ) ) :

-(A-X1E) -(A-X2E) (Ai-A2 (A2• sin Sp + 6 cos 6p sin 6p — sin 6p 6 cos Sp — | sin 6p (3.4.26)

X(t,r) =

It follows from (3.4.15), (3.4.16), and (3.4.26) that in this case the vector of means a and the variance matrix D of the transitive probability density (3.4.14) have the form

x\ cosSp + |(a:2 + f

(3.4.27)

On

D12

_B

Di 2 B

D 22

25' 6

B

(3.4.28)

Substituting (3.4.14) into (3.4.25) instead ofp(x,t;z,r), integrating, taking into account (3.4.27) and (3.4.28), and performing some easy calcula-

Approximate Synthesis of Stochastic Control Systems

181

tions, we obtain the following final expression for the function Fo(t, x^, x2):

xi

-l-

B(/32+4:); ^2

—e-^sm2S-p(B - 2(3x1 - 4xlX2) 80

(3.4.29)

"? cos 2 W/32 - 4)z? + 40*1*2 + 4zi - 5/3], where ~p = T — t. Let us briefly discuss formula (3.4.29). If we consider the terms on the right-hand side of (3.4.29) as function of "reverse" time ~p — T — t, then these terms can be divided into three groups: infinitely increasing, damping, and independent of ~p as ~p —> oo. These three types of terms have the following physical meaning. The only infinitely growing term (B/(3)~p in (3.4.29) shows how the mean losses (3.4.11) depend on the operating time in the mode of stationary operating conditions. Therefore, the coefficient B//3 has the meaning of the specific mean error 7, which was calculated in §3.2 by other methods and for which we obtained 7° = B//3 in the zero approximation (see (3.2.21)). Next, the terms independent of ~p (in the braces in (3.4.29)) coincide with the expression for the stationary loss function obtained in §3.2 (formula (3.2.26)). Finally, the damping terms in (3.4.29) characterize the deviations of operating conditions of the control system from the stationary ones. Using (3.4.29), we can approximately synthesize the optimal system in the zero approximation, where the control algorithm uo(t, Xi,x2) has the form (3.4.20) with F replaced by FQ. The equation

8F0

=

2x2

— I"

I

2x

+

2cos 2Sp(/3xi + x2) = 0

(3.4.30)

determines the switching line on the phase plane (xi, x 2 ). Formula (3.4.30) shows that this is a straight line coinciding with the K-axis as ~p —)• 0 and rotating clockwise as ~p -> oo (see Fig. 27) till the limit value xi + 2x2/(3 — 0 corresponding to the stationary switching line (see (3.2.27)). Formulas (3.4.29) and (3.4.30) also allow us to estimate whether it is important to take into account the fact that the control algorithm is timevarying. Indeed, (3.4.29) and (3.4.30) show that deviations from the stationary operating conditions are observed only on the time interval lying at

Chapter III

182

FIG. 27 the distance ~ (3~l from the terminal time T. Thus, if the general operating time T is substantially larger than this interval (say, T 3> 3//3), then we can

use the stationary algorithm on the entire interval [0,T], since in this case the value of the optimality criterion (3.2.3) does not practically differ from the optimal value. This fact is important for the practical implementation of optimal systems, since the design of regulators with varying parameters is a rather sophisticated technical problem. 3.4.2. Estimates of the approximate synthesis performance. Up to this point in the present chapter, we have studied the problem of how to find a control syste close to the optimal one by using the method of successive approximations. In this section we shall consider the problem of how the quasioptimal system constructed in this way is close to the optimal

system, that is, the problem of approximate synthesis performance. Let us estimate the approximate synthesis performance for the first two (the zero and the first) approximations calculated by (3.0.6)-(3.0.8). As an example, we use the time-varying problem. (3.4.1)-(3.4.3). We assume that

the entries of the matrices A(t), Q(t), and cr(i) in (3.4.1) are continuous functions of time defined on the interval 0 < t < T. We also assume that the penalty functions c(x) and i}>(x) in (3.4.2) are continuous and bounded for all x £ R n . Then [124] there exists a unique function F(t, x) that satisfies

the Cauchy problem (3.4.5), (3.4.8) for the quasilinear parabolic equation (3.4.5)14 This function is continuous in the strip HT = {\x\ < oo, 0 < t < T} 14

We shall use the following terminology: Eq. (3.4.5) is called a quasilinear (semi-

linear) parabolic equation, the problem of solving Eq. (3.4.5) with the boundary condi-

Approximate Synthesis of Stochastic Control Systems

183

and continuously differentiable once with respect to t and twice with respect to x for 0 < t < T; its first- and second-order derivatives with respect to x are bounded for x 6 HTOne can readily see that in this case

\F(t,x)-F1(t,x)\~e2,

\F(t,x)-F0(t,x)\~e,

(3.4.31)

and hence, for small £, the functions F0(t, x) and Fi(t, x) nicely approximate

the exact solution of Eq. (3.4.5). To prove relations (3.4.31), let us consider the functions So(t,x) = F(t, x) - F0(t, x) and Si(t, x) = F(t, x) - Fi(t, x). It follows from (3.4.5), (3.4.9), and (3.4.10) that these functions satisfy the equations

LS0 = -e* (t, ^-} ,

S0(T, x) = 0,

(3.4.32)

(3.4.33)

Equations (3.4.32) and (3.4.33) differ from (3.4.9) only by the expressions on the right-hand sides and by the initial data. Therefore, according to

(3.4.13), the functions So and Si can be written in the form T

/»

/)/?

dr /

/

S1(t,x)=ef Jt

\

p(x,t;z,T)$ (r,—— (T,Z) ) dz, v

jRn

az

(3.4.34)

'

dr f p (z,i;z,r)[*(r,^)-*(T,^)]dz. v Jnn L v Oz ' J (3 4 35)

Since the function $ is continuous (see (3.4.7)) and the components of the vector dF/dx are bounded, we have |$(T, dF/dz)\ < P for all r 6 [0, T]; hence, we have the estimate

\S0(t ,x)\<e

dr I Jt

p(x,t;z,T)$

dz,

jRn

<eP Ji

dr I p(x,t;z,T)dz = •/R«

(3.4.36)

tion (3.4.8) is called the Cauchy problem, and the boundary condition (3.4.8) itself is sometimes called the "initial" condition for the Cauchy problem (3.4.5), (3.4.8). This terminology corresponds to the universally accepted standards [61, 124] if (as we shall do in §3.5) in Eq. (3.4.5) we perform a change of variables and use the "reverse" time p = T — t instead of t. In this case, the backward parabolic equation (3.4.5) becomes a "usual" parabolic equation, and the boundary value problem (3.4.5), (3.4.8) takes the form of the standard Cauchy problem.

Chapter III

184

The first relation in (3.4.31) is thereby proved. To prove the second relation in (3.4.31), we need to estimate the difference S'0 = (dF/dxi) - (dF0/dxi). To this end, we differentiate (3.4.32) with respect to x;. As a result, we obtain the following equation for the function SQ: (3.4.37)

(in fact, the derivative on the right-hand side of (3.4.37) is formal, since the

function $ (3.4.7) is not differentiable). Using (3.4.13) for SQ, we obtain T

C

dr

(3.4.38)

JRn

Integrating (3.4.38) by parts with respect to z,- and taking into account (3.4.14) and (3.4.15), we arrive at

S'0 = £ f

Jt

dr I

Jnn

p(x,t;z,r)D-j1(zj

-Xjk(t,r)xk}$(r,

-f) dz. (3.4.39)

\ oz /

From (3.4.39) we obtain the following estimate for S10:

\So\ < £P I Jt

dr

I ^i? \zi ~ XJk(t, r)xk\p(x, t; z, T) dz = ePVi, Jnn .

(3.4.40)

J=l Now we note that since Q(i) in (3.4.7) is bounded, the function $(<, satisfies the Lipschitz condition with respect to y:

(3.4.41) Using (3.4.40), (3.4.41), and (3.4.35), we obtain dz <sN f Jt

dr f

p(x,t;z,T}y^\S*0\dz

JRn

i=1

n

<£2NPV(T-t),

V

(3.4.42) 1=1

Approximate Synthesis of Stochastic Control Systems

185

which proves the second relation in (3.4.31). In a similar way, we can also estimate the difference dF/dxi — dFi/dxt = S{. Indeed, just as (3.4.39) was obtained from (3.4.32), we use (3.4.33) to obtain

S{=e ?{=ef Jt

dr dr f

p(x,t-,z,T)D(z-Zkxk)*T,-*T,

dz.

JRn

(3.4.43) This relation and (3.4.40), (3.4.41) for the function S{ readily yield the estimate f> 17

|S1I =

fl Z7

< £2PVViN,

(3.4.44)

which we shall use later. According to (3.0.8), in this case the quasioptimal controls uo(t, x) and MI(<, x) are determined by (3.4.4), where instead of the loss function F(t, x) we use the successive approximations F0(x,t) and Fi(x,t), respectively. By Go(t, a:) and Gi(x,t) we denote the mean values of the functional (3.4.11) calculated on the trajectories of the system (3.4.1)

x = A(T)X + Q(T)MJ(T, a;) +
v(+\ —T \ I — 5

i — 0 1 — ' '

with the use of the quasioptimal controls uo(t, x) and u-\_(t, x). The functions Gi(t, a;), i — 0, 1, estimate the performance of the quasioptimal control algorithms Wi(<, x), i = 0,1. Therefore, it is clear that the approximate synthesis may be considered to be justified if there is only a small difference between the performance criteria Go(t, x) and Gi(i, x) of the suboptimal systems and the exact solution F(t, x} of Eq. (3.4.5) with the initial condition (3.4.8). One can readily see that the functions GO and GI satisfy estimates of type (3.4.31), that is,

\F(t1x)-G0(t1x)\~£,

\F(t,x)-G1(t,x)\~e2.

(3.4.45)

Relations (3.4.45) can be proved by analogy with (3.4.31). Indeed, the functions GO and GI satisfy the linear partial differential equations [45, 157]

LGi(t,x) = -c(x)-euf(t,x)QT(t)-(t,x),

(3.4.46)

Chapter III

186

This fact and (3.4.9), (3.4.10) imply the following equations for the functions H0 = F0 - Go and HI = FI - Gj.:

H0(T, x) = 0,

LH0 =

LH^^W^-*(t^}\,

(3.4.47)

F1(r,,) = o.

(3.4.48)

Since wf Q T '^j£- = $(*, ^")> Eq. (3.4.48) can be rewritten as follows:

(3.4.49) It follows from (3.4.4) that Eqs. (3.1.46), (3.4.49) are linear parabolic equations with discontinuous coefficients. Such equations were studied in [80, 81, 144]. It was shown that if, just as in our case, the coefficients in (3.1.46), (3.1.49) have discontinuities of the first kind, then, under our assumptions about the properties of A(t), Q(t), c ( x ) , and ip(x), the solutions of Eqs. (3.4.46), (3.4.49) and their first-order partial derivatives are bounded.

Using this fact, we can readily verify that the right-hand sides of (3.4.47) and (3.4.49) are of the order of e and e 2 , respectively. For Eq. (3.4.47), this statement readily follows from the boundedness of the components of the

vectors dGo/dx and MO and the elements of the matrix Q. The right-hand side of (3.4.49) can be estimated by the Lipschitz condition (3.4.41) and the inequality

dFi

dF0

8F

dF

which follows from (3.4.40) and (3.4.44). Therefore, for the functions H0 and HI we have

\HO\

•e2.

(3.4.50)

To prove (3.4.45), it suffices to take into account the inequalities |F-Go

FO-GO\< F- FO\ + \HO\,

\F-Gi and to use (3.4.31) and (3.4.50). Thus, relations (3.4.45) show that if the Bellman equation contains a small parameter in nonlinear terms, then the difference between the quasioptimal control system calculated by (3.0.6)-(3.0.8) and the optimal control system is small and, for sufficiently small e, we can restrict our calculations to a small number of approximations. We need either one (the zero)

Approximate Synthesis of Stochastic Control Systems

187

or two (the zero and the first) approximations. This depends on the admissible deviation of the quasioptimal system performance criteria Gi(t, x) from the loss function F(t,x). In conclusion, we make two remarks about (3.4.45).

REMARK 3.4.1. One can readily see that all arguments that lead to the estimates (3.4.45) remain valid for any types of nonlinear functions in (3.4.5) that satisfy the Lipschitz condition (3.4.41). Therefore, in particular, all statements proved above for the function <S> (3.4.7) automatically hold for equations of the form (3.0.4) with r-dimensional ball taken as the set U of admissible controls, instead of an r-dimensional parallelepiped. D REMARK 3.4.2. The estimates of the approximate synthesis accuracy considered in this section are based on the assumption that the solutions of the Bellman equation and their first-order partial derivatives are bounded. At first glance it would seem that this assumption substantially narrows the class of problems for which the approximate synthesis procedure (3.0.6)(3.0.8) can be justified. Indeed, the solutions of Eqs. (3.4.5), (3.4.9), (3.4.10), and (3.4.46) are unbounded for any x G Rn if the functions c(x) and i>(x) infinitely grow as x —>• oo. Therefore, for example, we must eliminate frequently used quadratic penalty functions from consideration. However, if we are interested in the solution of the synthesis problem in a given bounded region XQ of initial states x(0) of the control system, then the procedure (3.0.6)-(3.0.8) can also be used in the case of unbounded penalty functions. This statement is based on the following heuristic arguments. Since the plant equation (3.4.1) is linear and the matrices A(t), Q(t), and ff(t) and the control vector u are bounded, we can always choose a sufficiently large number R such that the probability

P{sup 0 < t < T \x(t)\ > R} becomes arbitrary small [11, 45, 157] for any fixed domain XQ of the initial states x ( Q ) . Therefore, without loss of accuracy, we can replace the unbounded functions c(x) and i{>(x) in (3.4.2) (if, in a certain sense, these functions grow as x\ — R —> oo slower than the probability P{sup 0 < t < T x(i)\ > R} decreases as R —>• oo) by the expressions

c(x] maXj; c(x) \x\ = R,

for for for

ij)(x)

for

\x\ < R, \x\ > R, x

< R, > R,

\x\ = R, for which the solutions of Eqs. (3.4.5), (3.4.9), (3.4.10), and (3.4.46) satisfy the boundedness assumptions. D

188

Chapter III

The question of whether procedure (3.0.6)-(3.0.8) can be used for solving the synthesis problems with unbounded functions c(x) and i/>(x) in the functional (3.4.2) will be rigorously examined in the next section. §3.5. Analysis of the asymptotic convergence of

successive approximations (3.0.6)— (3.0.8) as k —>• oo

The method of successive approximations (3.0.6)-(3.0.8) can also be used for the synthesis of quasioptimal control systems if the Bellman equation

does not contain a small parameter in nonlinear terms. Needless to say that (in contrast with Section 3.4.2 in §3.4) the first two approximations, as a rule, do not approximate the exact solution of the synthesis problem sufficiently well. We only hope that the suboptimal system synthesized on

the basis of (3.0.9) is close to the optimal system for large k. Therefore, we need to investigate the asymptotic behavior as k —>• oo of the functions

Fk(t,x) and U k ( t , x ) in (3.0.6)-(3.0.8). The present section deals with this problem. Let us consider the time-varying synthesis problem of the form (3.4.1)™ (3.4.3) in a more general setting. We assume that the plant is described by

the vector-matrix stochastic differential equation of the form

x = a(t, x) + q(t)u + a(t, x ) £ ( t ) .

(3.5.1)

Here x is an n-dimensional vector of phase coordinates of the system, u is an r-dimensional vector of controls, £(t) is an n-dimensional vector of random actions of the standard white noise type (1.1.34), a(t, x) is a given vectorfunction of the phase coordinates x and time t, and q(t) and
such that for t > t0, 0 < t0 < T, the stochastic equation (3.5.1) has a unique solution x(t) satisfying the condition x(t0) = XQ at least in the weak sense (see §IV.4 in [132]). As an optimality criterion, we take the functional (3.4.2),

U

T

-I

c(x(t))dt + il>(x(T))\. J

(3.5.2)

Here c ( x ) and ij)(x) are given nonnegative scalar penalty functions whose

special form is determined by the character of the problem considered (the requirements on c ( x ) and ij)(x) are given later). The constraints on the domain of admissible controls have the form

(1.1.22), weC/,

(3.5.3)

Approximate Synthesis of Stochastic Control Systems

189

where U C Rr is a closed bounded convex set in the Euclidean space R r . It is required to find a function it, = ut(t,x(t)} satisfying (3.5.3) such that the functional (3.5.2) calculated on the trajectories of system (3.5.1) with the control M* attains its minimum value.

In accordance with the dynamic programming approach, solving this problem is equivalent to solving the Bellman equation that, for problem

(3.5.1)-(3.5.3), reads (see §1.4)

Here a(t, x] is a column of functions with components (see (1.2.48))

-—-^ami, Z Oxm

t=l,...,n.

(3.5.5)

Recall that we assumed in §1.2 that throughout this book all stochastic differential equations written (just as (3.5.1)) in the Langevin form [127] are symmetrized [174].

By definition, the loss function F in (3.5.4) is equal to

F = F(t,x) = min u(T)£U

Here E[(-)

T ( r r E-H / C(X(T))

dr

llJt

x(t) = x] means averaging over all possible realizations of the

controlled stochastic process x(r) =; XU^(T) (r > t) issued from the point

x at T = t. It follows from (3.5.6) that

F(T,x) = j(x).

(3.5.7)

Passing to the "reverse" time p = T — t, we transform Eq. (3.5.4) and the condition (3.5.7) to the form

LF(p,x) = -c(x) - mm \uTqT(p)~(p, x ] \ , F(0,z) = '0(z).

In (3.5.8) we have the following notation:

(3.5.8) (3.5.9)

190

Chapter III

ai(p,x) = a,i(x,T — p), q(p) =q(T — p), b t j ( p , x ) is a general element of the matrix |
x

c(x).

(3.5.11)

In addition, we obtain the function

u*(p,x) =
(3.5.12)

that satisfies the condition T T, ^^F

dF

~ and solves the synthesis problem (after we have solved Eq. (3.5.11) with

the initial condition (3.5.9)). The form of the functions

Eq. (3.0.5) only by a small parameter (there is no small coefficient e of the function $). Nevertheless, in this case, we shall also use the approximate synthesis procedure (3.0.6)-(3.0.8) in which, instead of the exact solution F(p,x) of Eq. (3.5.11), we take the sequence of functions F Q ( P , X ) , F I ( P , X ) , . . . recurrently calculated by solving the following sequence of linear equations:

) = i>(x),

(3.5.13) (3.5.14)

The successive approximations u0(p, x), Ui(p, x), . . . of control are determined by the expressions

fc = 0 , l , . . .

(3.5.15)

Below we shall find the conditions under which the recurrent procedure (3.5.13)-(3.5.15) converges to the exact solution of the synthesis problem.

Approximate Synthesis of Stochastic Control Systems

191

Let us consider Eq.(3.5.11) with the operator L determined by (3.5.10). The solution F(p, x) and the coefficients bij(p, x) and a,(p, x) of the operator L are denned on UT = {[0, T] x R n } = {(p, x): 0 < p < T, x e R»}. We assume that everywhere in UT the matrix \\bij(p, x)\\" satisfies the con-

dition that the operator L is uniformly parabolic, that is, everywhere in UT for any real vector x

we

have

AM2<X%>,*)x
(3-5.16)

where A. and A are some positive constants. Moreover, we assume that the functions bij(p, x) and o;(/9, x) are bounded in E/r, continuous in both variables (p, x), and satisfy the Holder inequality with respect to x uniformly

in p, that is,

\bij(p,x)-bii(p,x°)\
(3.5.17) 0 < a < 1,

We assume that the functions c ( x ) , if>(x), and 3>(p,dF/dx)

A = const.

are continuous

and that c(x) and if>(x) satisfy the following restrictions on the growth

as x

• oo:

(3.5.18) (h is a positive constant). We also assume that the function $(/?, v) satisfies the Lipschitz condition with respect to v — (vi,...,vn) uniformly in p G

[0,2*], that is,

\*(p,v)-*(p,v°)\
$(p,0) = 0.

(3.5.19)

In particular, the functions $ from (3.4.7) and (1.3.23) satisfy (3.5.19). The following three consequences from the above assumptions are well

known [74]. (1) There exists a unique fundamental solution G(x, p;y, cr), satisfies the homogeneous equation LG = 0 in the variables (x,p), and

lim /

G(x, p; y,
(3.5.20)

^P jRn

for any continuous function f ( x ) such that

|/(z)| < const e^l 2 ,

&= -

>

(3.5.21)

192

Chapter III

(here .A is taken from (3.5.16)). (2) Solutions of inhomogeneous equations (3.5.13) and (3.5.14) can be expressed in terms of G(x, p; y,
F0(p,x) =

G(x,p;y,Q)iJ}(y)dy

+

jRn

Fk(p,x)=

f

da Jo

JRn

G(x, p-y,cr)c(y) dy, (3.5.22)

G(x,p-

JRn

+

Jof

P

d
jRn

Gfrw)

i c(y)

~ *-, v

°y

' J dy. (3.5.23)

In this case, formula (3.5.22) holds unconditionally in view of (3.5.18); for-

mula (3.5.23) holds only if the derivatives dFk/dx{ satisfy some inequalities of the form (3.5.18) (or at least of the form (3.5.21)). In the sequel, we show that this condition is always satisfied. The solutions Fk(p, x), k = 0, 1, . . ., are twice continuously differentiate in x, and the derivatives 8Fk/dxi and 92Fk/dxidxj can be calculated by differentiating the integrands on the right-hand sides of (3.5.22) and (3.5.23). (3) The following inequalities hold (for any A < A from (3.5.16)):

(3.5.24)

dG.

—— (x,p;y,(r)

Statements (l)-(3) hold for linear equations (3.5.13), (3.5.14) of successive approximations. Now we return to the synthesis problem and consider the two stages of solving this problem. First, by using the majorant estimates (3.5.24) and (3.5.25), we prove that the successive approximations Fk(p,x) converge as k —>• oo to the solution F(p,x) of Eq. (3.5.11) (in this case, we simultaneously prove that there exists a unique solution of

Eq. (3.5.11) with the initial condition (3.5.9)). Next, we show that the suboptimal systems constructed by the control law (3.5.15) are asymptotically as k —>• oo equivalent to the optimal system.

1. First, we prove that the sequence of functions Fo(p,x),Fi(p,x),... determined by recurrent formulas (3.5.22), (3.5.23) and the sequence of their partial derivatives dFk(p,x)/dxi, k = 0,1,2,... are uniformly con-

193

Approximate Synthesis of Stochastic Control Systems vergent. To this end,

we construct the differences

Qk = Fk+1(p,x)-Fk(p,x)

(3.5.26)

dFk o = J/

>~ady (3.5.27)

(in (3.5.26), (3,5,27) we set k - 0,1,2,... provided that F-i = 0). Using (3.5.19), (3.5.26), and (3.5.27), we obtain the inequalities

\Qk(p,x)\
da JO

\G(x,p;y,
da [

dy,

dy

JRn

(3.5.28) T, y)

dG(x,p;y,
dy

JRn

dy. (3.5.29)

Formulas (3.5.28), (3.5.29) and (3.5.24), (3.5.25) allow us to calculate estimates for the differences (3.5.26), (3.5.27) recurrently. To this end, it is necessary only to estimate \dQo/dxi . It turns out that the estimate of type (3.5.18) holds, that is,

dQ0

(3.5.30)

<

Indeed, since

t~n/2 I

exp ( - j\y\2 + h\y\] dy < K4

JRn

V

*

(3.5.31)

/

for A > 0, we have

P

f da-

Jo

(p-

exp

dy

) (3.5.32)

Chapter III

194

for the derivative 8F0/dxi provided that (3.5.18), (3.5.22), and (3.5.25) are taken into account. By using the inequality

dF0

dF0

<

dy

with regard to (3.5.19), (3.5.27), and (3.5.32), we obtain

dQ0

( / 0 - < T)-("+ 1 )/ 2 (
J" da f Jo

x exp I —

JRn

p-a

• h \ y \ ) dy

and since p is bounded, we arrive at (3.5.30). Using (3.5.30) and applying formulas (3.5.28) and (3.5.29) repeatedly, we estimate the differences (3.5.26) and (3.5.27) for an arbitrary number k > 1 (here F(-) is the gamma function) as follows:

\Qk(p,x)\< (3.5.33)

9Qk(p,x]

,h\x\

(3.5.34)

(formulas (3.5.33) and (3.5.34) are proved by induction over k). The estimates obtained show that the sequences of functions

Fk(p, x) = F0(p, x) + Q0(p,:

Qk-i(p,

(3.5.35)

dFk,

, Ox,

(3.5.36)

converge to some limit functions

F(p,x) = lim Fk(p,x), k— s-oo

-IJT i

^

Wi(p,x)=

i-

dF ,

k hm -r—— ( k—>-oo

In this case, the partial sums on the right-hand side of (3.5.35) uniformly converge in any bounded domain lying in E/r, while in (3.5.36) the partial

Approximate Synthesis of Stochastic Control Systems

195

sums converge uniformly if they begin from the second term. The estimate (3.5.32) shows that the first summand is majorized by a function with singularity at p — 0. However, one can readily see that this is an integrable

singularity. Therefore, we can pass to the limit (as k —>• oo) in (3.5.23) and in the formula obtained by differentiating (3.5.23) with respect to a;;. As a result, we obtain

F(p,x) = I

G(x,p;y,Q}i>(y)dy

JRn

+ I" da I JO

= f

Jnn

G(x, p; y, a) [c(y) - $(
JRu

dy.

ox

i

This implies that Wi(p, x) = dF(p,x)/dxi

and hence the limit function

F(p, x) satisfies the equation

F(p,x) = f

G(x,p;y,0)^(y)dy

jRn

+ [" d
Jo

jRn

G(x,p;y,
I

^ oy

>\

(3.5.37)

Equation (3.5.37) is equivalent to the initial equation (3.5.11) with the

initial condition (3.5.9), which can be readily verified by differentiating with regard to (3.5.20). Thus, we have proved that there exists a solution of Eq. (3.5.11) with the initial condition (3.5.9). The proof of this statement shows that the solution F(p, x) and its derivatives dF/dxi have the following majorants everywhere in HT'-

3F

\F(p,x)\
(3.5.38)

By using (3.5.38), we can prove that the solution of Eq. (3.5.11) with the initial condition (3.5.9) is unique. Indeed, assume that there exist two solutions FI and F2 of Eq. (3.5.11) (or of (3.5.37)). For the difference V = jFi — ^2 we obtain the expression

V(p. <,x) =

rp Jo

do-

r Jnn

i /

G(x,p;y,o-)\$((T,

I ^

dy

dy

which together with (3.5.19) allows us to write

\V(p,x)\
Jo

da f

Js.n

G(x,p;y,

ov

dy.

196

Chapter III

The same reasoning as for the functions Fk leads to the following estimate for the difference V = FI — F-2 that holds for any k: k/2

hM

This implies that V(p, x) = 0, that is, Fj_(p, x) = F2(p, x). We have proved that the successive approximations Fo(p, x), -Fi(p, x), . . . obtained by recurrent formulas (3.5.13) and (3.5.14) converge asymptotically as k —> oo to the solution of the Bellman equation, which exists and is unique. 2. Now let us return to the synthesis problem. Previously, it was proposed to use the functions uk(p, x} given by (3.5.15) for the synthesis of the control system. By Hk(p,x) we denote the functional

calculated on the trajectories of system (3.5.1) that pass through the point x at time t = T — p under the action of control uk. The function Hk(p, x) determines the "quality" of the control uk(p,x) and satisfies the linear equation a rr

LHk(p,x) = -c(x)-ul(p,x}qT(p)-~-(p,x),

Hk(Q,x) = ij)(x).

(3.5.39) From (3.5.14), (3.5.39), and the relation -ulqT8Fk/dx =. 3>(p,dFk/dx), it follows that the difference Ajt(p, x) = Fk(p,x) — Hk(p,x) equation

satisfies the

(3.5.40) A fc (0,a:) = 0. Since the right-hand side of (3.5.40) is small for large k (see (3.5.19),

(3.5.34)), that is,

dFt (3.5.41) and the initial condition in (3.5.40) is zero, we can expect that the difference Afc(/i>, x) considered as the solution of Eq. (3.5.40) is of the same order, that is, (3.5.42)

Approximate Synthesis of Stochastic Control Systems

197

If the functions Uk(p, x) are bounded and sufficiently smooth, so that the coefficients of the operator Lk are Holder continuous, then the operator Lj. is just the same as L and the inequality (3.5.42) can readily be obtained from

(3.5.22), (3.5.24), and (3.5.41). Conversely, if Uk(p, x] are discontinuous functions (but without singularities, for example, such as in (3.0.1) and

(3.0.8)), then the inequality (3.5.42) follows from the results of [81]. Since the series (3.5.35) is convergent, we have \F(p, x) — F k ( p , x ) \ < e'^Kfeh^ (where £% —>• 0 as k —>• oo). Finally, this fact, the inequality

\F-Hk <\F-Fk + \Fk - Hk , and (3.5.42) imply \F(p,x)-Hk (p, x) \ < ekKsehW

(3.5.43)

(efc = max(ej,. , ejj.') and K% = max(^6i -KV))- Formula (3.5.43) proves the asymptotic (as k —->• oo) optimality of suboptimal systems constructed according to the control algorithms Mfc(/o, x) calculated by the recurrent formulas REMARK 3.5.1. If the coefficients of the operator L are unbounded in IT/r, then the estimates (3.5.24) and (3.5.25), generally speaking, do not hold. However, there may be a change of variables that reduces the problem to the case considered above. If, for example, the coefficients a,i(t,x) in (3.5.1) depend on x in a linear way (that is, a(t, x) = A(t)x, where A(t) is an n x n matrix depending only on t), then the change of variables x = X(0,t)y (where X(0,t) is the fundamental matrix of the system x = A(t)x) eliminates unbounded coefficients in the operator L (in the new variables y), which allows us to investigate such systems by the methods considered above.

D

In conclusion, let us consider an example from [96], which illustrates the efficiency of the method of successive approximations for a one-dimensional

synthesis problem that can be solved exactly. Let the control system be described by the scalar equation

penalized. Then the Bellman equation (3.5.8) and the initial condition (3.5.9) take the form

op

= c(x) + min

l«l<«

«

+

2

,

F(0, x) = 0.

(3.5.44)

198

Chapter III

Minimizing the expression in the square brackets, we obtain the optimal control w * ( p , x ) — -um sign —— ( p , x ) ,

and transform the Bellman equation to the form

dF

OF_

, ,

= C(X) -

dx

F(0,x) = Q.

Zlh?'

(3.5.45)

Since the penalty function c(x) is even, it follows from (3.5.45) that for any p the loss function F(p, x) satisfying (3.5.45) is an even function of x, hence we have the explicit formula

ut ( p , x ) =ut(x) = -um sign x. In this case, for x > 0, the loss function F(p, x} is determined by the formula [26] ,

:

f )=/

da f°° \um u2m I ^-r- / C 2 / e x p — — ( x - y -——a-

Jo V27rkr Jo

I b

26
r

+

2b ]

r

J

L

2ba

The successive approximations Fo(p, x), Fi(p, x),... are even functions of the variable x (since c(x) is even). Therefore, in this case, any approximate control (3.5.15) coincides with the optimal control ut, and the efficiency of the method can be estimated by the deviation of the successive approximations FQ, FI, ... from the exact solution F(p, x) written above.

Choosing the quadratic penalty function c(x) = x2 and taking into account the fact that in this case the fundamental solution G(x,p;y,cr} (the transition probability density p(y,
we obtain from (3.5.22) and (3.5.23) the following expressions for the first two approximations:

fp

dcr

[°°

Jo ^/2irb(p-(r)

2

\ - fo ~ y) 2 1

J-^ rP

^(p, x) = F0(p, x) - 1um I JO

[

A

a a

2 ^2

2b(p-
= I

y2TTD{p — (TJ J — oo

2

r

/

\2 l

\y\ exp - ^ ~ V}^ \ dy. L

Approximate Synthesis of Stochastic Control Systems

t

199

F(p= !,»), F0(l,x),

1.5

FIG. 28 The functions FO, FI, F calculated for um = b = p = 1 are shown in Fig. 28. One can see that

\F(l,x)-F0(l,x) that is, the second approximation gives a satisfactory approximation to the exact solution. This example shows that the actual rate of convergence of successive approximations to the exact solution of the Bellman equation can be larger than the theoretical rate of convergence estimated by (3.5.35) and (3.5.33), since the proof of the convergence of the method of successive approximations (3.5.13)-(3.5.15) is based on rather rough estimates (3.5.24) and (3.5.25) for the fundamental solution. §3.6. Approximate synthesis of some stochastic systems with distributed parameters This section occupies a special place in the book, since only here we consider optimal control systems with distributed parameters in which the plant dynamics is described by partial differential equations. So far the theory of optimal control of systems with distributed parameters is characterized by a significant progress, first of all, in its deterministic branch [30, 130]. Important results are also obtained in stochastic problems (the distributed Kalman filter, the separation theorem in the optimal control synthesis for linear systems with quadratic criterion, etc. [118, 182]).

200

Chapter III

However, many problems in the stochastic theory of systems with lumped parameters still remain to be generalized to the case of distributed plants. We do not try to consider these problems in detail but only discuss

the possible use of the approximate synthesis procedure (3.0.6)-(3.0.8) for solving some stochastic control problems for distributed systems. Our consideration is confined to problems in which the plants are described by linear partial equations of parabolic type.

3.6.1. Statement of the problem. Let us consider control systems subject to the equation

^j^- = Cxv(t,x)+u(t,x)+^(t,x),

0<*
v(Q,x)=v0(x). (3.6.1)

Here Cx denotes a smooth elliptic operator with respect to spatial variables X — \%li • • • i -^n)i

Cx = aij (t,x) -A— + bi(t,x)-£-+c(t,x), 0 X j (j X j

(3.6.2)

(sX'i

whose coefficients a,-j(t, x), bi(t,x), and c(t, x) are denned in the cylinder

fi = D x [0,T], where D is the closure of an arbitrary domain in the n-dimensional Euclidean space Rn and the matrix a(t, x] satisfies the inequality ifar) = a,ij(t, x)r]ir)j > 0 (3.6.3) for all (t, x) £ O and all r\ = (rji,..., rjn) (as usual, in (3.6.2) and (3.6.3) the sum is taken over twice repeated indices from 1 to n). If D does not coincide with the entire space R n , then, in addition to (3.6.1), the following boundary conditions must be satisfied at the boundary dD of the domain D:

Mxv(t,x) = ur(t,x),

(3.6.4)

where the linear operator Mx depends on the character of the boundary problem. Thus, for the first, the second, and the third boundary value problems, condition (3.6.4) has the form

v(t,x) = ur(t,x), ^(t,x)=ur(t,x),

dv

— (t, x) + q(t, x)v(t, x) = ur(t, x).

(3.6.4.1) (3.6.4.II) (3.6.4.III)

Here x 6 dD, dv/dcr denotes the outward conormal derivative, and a is the outward conormal vector whose components CTJ (i = 1,.. .,n) and the

Approximate Synthesis of Stochastic Control Systems

201

components of the outward normal v on the boundary dD are related by the formulas cr^ — a.iji>j [61, 124]; in particular, if ||ajj||" is the identity matrix, i.e., dij = <5jj, then the conormal coincides with the normal. For example, equations of the form (3.6.1) with the boundary conditions (3.6.4) describe heat propagation or variation in a substance concentration in diffusion processes in some volume D [166, 179]. In this case, v(t, x) is the temperature (or, respectively, the concentration) at the point x G D at time t. Then the boundary condition (3.6.4.1) determines the temperature (concentration), and the condition (3. 6. 4. II) determines the heat (substance) flux through the boundary dD of the volume D. System (3.6.1) is controlled both by control actions u(t, x) distributed throughout the volume and by variations in the boundary operating conditions up(t, x). The admissible controls are piecewise continuous functions u(t,x) and t/ r (i, x) with values in bounded closed domains:

u(t,x)£U(x),

x£D;

ur(t, x) <E Ur(x),

x & dD.

(3.6.5)

We assume that the spatially distributed random action £(i, x) is of the nature of a spatially correlated normal white noise E£(*,z) = 0,

Et(t,x)t(T,y)=K(t,x,y)6(t-T),

(3.6.6)

where K (t, x, y) is a positive definite kernel-function symmetric in x and y and 6(t) is the delta function. We also assume that, under the above assumptions, the function v(t, x) characterizing the plant state at time t is uniquely determined as the generalized solution of Eq. (3.6.1) that satisfies (3.6.1) for (x,t) E D x (0,T] and is a continuous continuation of a given initial function i>(0, x) = VQ(X) as t —>• 0 and of the boundary conditions (3.6.4) as x —>• dD. The problem is to find functions u*(t,x) and u*(t, x) satisfying (3.6.5) so that to minimize the optimality criterion u[v(t,x1),...,v(t,x'),u(t,x)iur(t,x')] D J dD

xdx1...dx'dxdx'dt\,

(3.6.7)

where x1 = (x\, x*2, ..., x'n), dxl = dx\dx\ . ..dx^ (i = 1, 2, . . . , s), and w is an arbitrary nonnegative integrable function. In this case, the desired functions it* and u* must depend on the current state v(t, x) of the controlled system (the synthesis functions), that is, they must have the operator form ut(t,x) =
u*r(t,x) = il>[t,v(t,x)]

(3.6.8)

(it is assumed that the state function v(t, x) can be measured precisely).

202

Chapter III

3.6.2. The Bellman equation and equations of successive approximations. To find the operators (3.6.8), we shall use the dynamic programming approach. Taking into account the properties of the para-

bolic equation (3.6.1) and the nature of the random actions (3.6.6), we can prove [95] that the time evolution of v(t, x) is Markov in the following sense: for given functions u(t, x) and up(t,x), the probability distribution of the future values of V(T, x) for T > t is completely determined by the value of the function v(t,x) at time t. This allows us to consider the minimum losses on the time interval [t,T\,

F[t,v(t,x)] =

min

E

u(t.x}£U(x) ur(t,x)£Ur(x) t
rT _

uiTdr,

Jt

where

= /•••/

JD

/

U[V(T, x 1 ),...,i)(r, xs),u(r, x),ur(r, x')]dxl.. .dx'dxdx',

JD JdD

(3.6.9)

as a functional depending only on the initial (at time t) state v(t, x) and time t. Therefore, the fundamental difference equation of the dynamic programming approach (see (1.4.6)) can be written as

t
U

t + At

"1

u>T dr + F[t + At, v(t + At, x)] \. ) (3.6.10)

For small At, in view of (3.6.1), we have

v(t + At, x) = v(t, x) + Av(t, x) t+At

/

t(T,x)dr + o(At). (3.6.11)

Taking (3.6.11) into account, we can expand the functional F[t+At, v(t+ At,x)] in the functional Taylor series [91]

F[t + At,v(t + At,x)] „. . ., dF[t,v(t,x)] I" 8F[t, v(t, x) = F[t,v(t,x)}+ —— ——LL '' ; }1}1At+ / ' {

at

Sv(t,x)Sv(t,y)

J

dv(t,x)

-Av(t, x)Av(t, y) dxdy + ... . (3.6.12)

Approximate Synthesis of Stochastic Control Systems

203

The functional derivatives SF/6v and S3F/6v(x)6v(y) in (3.6.12) (for their detailed description, see [91]) can be obtained by calculating the standard derivatives in the formulas

SF ,. 1 = hm -— Sv(t,x) A-).O A"

dvj

AJ —¥x

'

v v . . . Sv(t,x)Sv(t,y)

A-S-O A

(3.6.13)

2n

In (3.6.13) the functional F^(VI,VZ, • • •) denotes a discrete analog of the functional F(t, v(t, a;)), which can be obtained by dividing the volume D into n-dimensional cubes A; of equal volume A™ and replacing the continuous function v(t, x) by a set of discrete values t>i, 1)2,... each of which

is equal to the value of v(t, x) at the center of the cube A,. In this case, the functional F is assumed to be sufficiently smooth, that is, its weak and strong Gateaux and Freshet derivatives [91] exist up to the second order inclusively, are equal to each other, and coincide with (3.6.13).

Substituting the expansion (3.6.12) into (3.6.10), passing to the limit as At —>• 0, and taking into account (3.6.6) and (3.6.11), we obtain the Bellman equation with functional derivatives:

dF

.

{_,

r

SF

r,

S + \ I I K(t,x,y) f "JDJD dv(t,x)6v(t,y)

,.

x

.

,.

„

dxdy.

(3.6.14)

To find the desired optimal control operators (3.6.8), it is necessary to solve Eq. (3.6.14).

The integral in the braces in (3.6.14) depends (in addition to the "solid" controls w(t, a:)) on the control actions u (t, x) that determine the boundary conditions (3.6.4) for the functions v(t, x) obtained by solving Eq. (3.6.1). We can write this dependence explicitly by using the Green formula [61,

124]

f —r

JD $v

- f

JD

r*— x v $

( / A\—— - — (—}]

JdD I

[da 6v

zos(v,Xi) > dx, 1/2

da V Sv / J

(3.6.15)

204

Chapter III

where £* denotes the differential operator dual to Cx in the variables x and v is the outward normal on 3D. In (3.6.15) the integral over the boundary dD of the domain D explicitly depends on the control ur(t, x) of

the boundary operating conditions as it follows from (3.6.4). To be definite, let us consider the third boundary value problem (3.6.4.III). The outward

conormal derivative of the state function v(t, x) on the boundary dD can be written as

dv — ( t , x ) = ur(t,x)-q(t,x)v(t,x)

(x€8D).

(3.6.16)

Substituting (3.6.16) into (3.6.15) and (3.6.15) into (3.6.14), we obtain the following final Bellman equation (for the third boundary value problem):

dF dt

. (_. utu, I

.

--—•= mm
f 5F JD $v

,

r

I -—-udx + I

JdD

ASF

, \

Sv

J

A—ur dx >

«r€fr

+ I t>£* —cb JD &v f [ 8F f dtnj\ . / d /SF\ SF\] + / \v—(bi--z-^-lcosfaxi)-Alv—l-?-}+qv-r-)\dx JdD \_ &v \ oxj / \ do- \ov / 5v / J S + 1 f f K(t, x, y) f dxdy, ZJoJo 6v(t,x)6v(t,y)

F(T, v(T, x}) = 0. (3.6.17)

This equation can be solved only approximately if the penalty functions are arbitrary and the controls u and w r are subject to constraints. Let us consider one of the methods for solving (3.6.17) by using the approximate synthesis procedure of the form (3.0.6)-(3.0.8). As already

noted (§§3.1-3.4), the approximate synthesis method is especially convenient if the controlling actions are small, namely, |[t> — flo||/|H| <si 1> where v is the solution of Eq. (3.6.1) with the boundary condition (3.6.4) and with any admissible functions u and ur satisfying (3.6.5), VQ is the solution of the corresponding homogeneous (in u and ur) problem, and |[ • || is the norm in the space L2'. From a physical viewpoint, this means that the power of (controlled) sources is not large as compared with ||i>||2 or with the intensity ID ID K f t i x> y} dxdy of random perturbations £(t, x). Then, by setting u(t, x) = ur = 0, we obtain the following equation for

the zero approximation instead of (3.6.17):

Approximate Synthesis of Stochastic Control Systems

+ [ L**°.(b.-?*L\ JQD L Sv \ ' dxj . K(t,x,y)

f

' .

' .

f

'Sv(t,x)Sv(t,y)

'f

,dxdy,

205

d fSF

°^

F0(T, v(T, x)) = 0 .

(3.6.18)

Here, according to (3.6.9), w 0 (v(i, a;)) is a functional of the form

uo(v(t,x)) = w(0,0,-y)

= I ... I JO

L

I

u[v(t,x'

),...,v(t,xs),Q,Q}dxl...dxsdxdx'

J D JdD

= I ••• I w 0 [r>(<, x1),..., v(t, xs)] dx1... dx". JD

JD

(3.6.19)

If the functional Fo(t, v(t, 2;)) satisfying (3.6.18) isfound, then the condition

. f_,

, , r SF

r 6F

\

O 0 dx > mm <w(u.u r .v)+ I ——udx+ I ——u «6C7, I JD 6v JdD Sv )

« r €C/ r

= w(^ 0 ,^o,w)+ / 5-^v0dx+ I

JD ov

S

-^^0dx^^(v(t,x)).

J9D Sv

(3.6.20)

allows us to calculate the zero-approximation optimal control functions (operators) uo(t, x) = ip0(t,v(tjx)J and u°(t, x) — i/;o(t,v(t,x)). The expression for oJi (v(t, x)) is used to calculate the first approximation _Fi(i,-u(t, a;)), and so on. In general, the fcth approximation Ftc\t,v(t,x)*) (k — 1,2,...) of the loss functional is determined as the solution of an equation of the form (3.6.18), where the change WQ —^ ^k and FQ —> F^ is performed. Furthermore, simultaneously with Fk, we determine the pair of functions (operators)

uk(t,x) = k[t,v(t,x)],

x&D,

ukr(t,x) = i>k[t,v(t,x)},

x£dD,

which allow us to synthesize a suboptimal control system in the fcth approximation (the functions tfk and ipk can be obtained from Eq. (3.6.20) with FQ replaced by Fk).

3.6.3. Quadrature formulas for functionals of successive approximations Fk[t,v(t, x ) ] , k = 0,1,2,.... To use the above procedure of approximate synthesis in practice, we need to solve Eq. (3.6.18) and the corresponding equations for Fk (k — 1, 2 , . . . ) .

206

Chapter III

First, let us consider the zero-approximation equation (3.6.18). We show that if the influence function G(x, t; £, T) of an instantaneous point source15 is known, then the solution of Eq. (3.6.18) can be written in the form

F0[t,v(t,x)] = I ••• I

JD

CO

/

JD

dx1...dxs I

Jt

dr

/»OO

.../

uo(vi,...,ve)ptT(vi,...,vi;v(t,x))dvi...dvi,

(3.6.21)

J — oo

-oo

where the function ^0(^1, . . . , vt) is defined by (3.6.19) and

P, r (t;i,... ,«,;«(*,*)) = [(27r)'det||Ar|[]" 1/2 x exp < — — rya — / G(xa,T;x,i)v(t,x) dx\ I

2L

JD

J

(3.6.22) Here the entries of the matrix ||-Dtr|| are given by the formulas

t

dp [

f K(^x,y)G(xa,T-x,tJ,)G(x^T-y,n)dxdy,

JD JD

(3.6.23)

and (D^Tl)ap denotes the (a,/3)th element of the inverse matrix ||Dtr|[-1. To prove (3.6.21) and (3.6.22), we need to recall some well-known facts [61, 124] from the theory of adjoint operators and Green functions. Suppose that a smooth elliptic operator Cx of the form (3.6.2) is given in an arbitrary domain D of an n-dimensional Euclidean space Rn . We also

assume that this operator is denned in the space of functions / sufficiently smooth in D and satisfying the equation

Mxf = 0

( z e dD)

(3.6.24)

on the boundary dD of the domain -D; here M.x denotes a certain differential operator with respect to the variables x G dD (a boundary operator). 15

The function G(x,t;£,T), t > r, with respect to the variables (x,t) is the solution

of the homogeneous boundary value problem (3.6.1), (3.6.4) (the case in which u(t, x) =

ur(t,x) = £(t,x) = 0 in (3.6.1) and (3.6.4)) with the initial condition v(r,x) = S(x - C). This function is also called the fundamental solution or the Green function of problem (3.6.1), (3.6.4).

Approximate Synthesis of Stochastic Control Systems

207

DEFINITION 3.6.1. The operators £* 6 D and M*x £ dD are called adjoint operators oiCx and M.x if for arbitrary sufficiently smooth functions /(z), satisfying (3.6.24), and (x) satisfying

l

(3.6.25)

(x G dD),

we have the relation

f C,fdx= f fC*
D

(3.6.26)

In general, the adjoint operators £* and A4* are not uniquely defined. However, if we set £* equal to the adjoint operator denned in the unbounded domain D = Rn [61], that is,

32

d

(3.6.27)

then it follows from Definition 3.6.1 and the Green formula

/
that the operator M*x can be defined uniquely. So, for the first, second, and third homogeneous boundary conditions (that is, for the conditions (3.6.4.I)-(3.6.4.III)) where ur(t,x) - 0, Eq. (3.6.25) takes, respectively, the form dD

(3.6.25.1)

= 0,

(3.6.25.II)

A

JT oa ~

9D

= 0. 9D

(3.6.25.III)

Now let us consider the parabolic operators L — — TT7 + £x,

at

(3.6.28) (3.6.29)

208

Chapter III

DEFINITION 3.6.2. A function G(x,t;C,r) denned and continuous for (x,i), ( C , T ) G £i, 2 > T, is called the influence function of a point source (the Green function) for the equation Lf = 0 in the domain fl if for any T G [0, T) the function G(s;,i;C,T) satisfies the equation

LG = 0

(3.6.30)

in the variables (t, x) in the domain D x (T < t < T) and satisfies the initial and boundary conditions

2;C,T) = <J(a;-C), MXG = 0

for x <E 3D,

(3.6.31)

T
(3.6.32)

In a similar way, the Green function G* (x, t; C, T) is defined for the adjoint parabolic operator (3.6.29). The only difference is that, in this case, the function G* is defined for time t < T. The conditions (similar to (3.6.30)(3.6.32)) that determine the Green function for the adjoint problem have the form

L*G* = 0

for

(t,x)£D x (Q
(3.6.33)

limG*(x.t;C,r) v ;= S(x-C), \ sy, tfr

M*XG* = 0

(3.6.34) v ;

for (t, x)£dDx(Q
D

(3.6.35)

The following statement readily holds for the functions G and G*.

DUALITY THEOREM. If G(x,t^,r) and G*(x,t;£, T) satisfy problems (3.6.30)-(3.6.32) and (3.6.33)-(3.6.35), then

G(x, t; C, T) = G* (C, T- x, t).

(3.6.36)

PROOF. Let us consider the functions G(y,rj;^T) and G*(y,n;x,t) for y G D and T < r\ < t. Taking into account the fact that these functions satisfy (3.6.30) and (3.6.33) in y and 77, in view of Definition 3.6.1 of the adjoint (in y) operator £*, we have

0=/ Jr+e

[ (-G*LG+GL*G*)dydrf JO

dyd D V

or]

or) /

*=

[G G] JD

* -~- dy-

Approximate Synthesis of Stochastic Control Systems

209

Rewriting (3.6.37) in the form

/ Gt(y,t-e;x,t)G(y,t-e;C,T)dy= JD

I G*(y,T+e;x,t)G(y,T+E;(;,T) dy JD

passing to the limit as e —)• 0, and taking into account (3.6.31) and (3.6.34), we obtain (3.6.36). D Now, by using the properties of the Green functions, we shall show that the functional (3.6.21) actually satisfies Eq.(3.6.18). To this end, we need to calculate all derivatives in (3.6.18). Taking into account the relation oo

*»o

- '

... / •00

J-o

xexp[-|K-aa

and the property (3.6.31) of the Green function, we differentiate (3.6.21) with respect to time and obtain

° = - f ... f u0[v(t,xl),...,v(t,xs}]dxl... l JD JD + I ••• [ dx^...dxs f

JD

... I J

-°°

JD

Jt

dr f

J -oo

...

\u0(v-L,...vs)—^(vi,...vs;v(t,x)) L

m

dvi,. ..dvs. J

(3.6.38)

To calculate dptr/dt, we use the rules for differentiating determinants and inverse matrices: - det B = det B • S p - 1 ^ ;

JET1 = -B

(here B is the matrix composed of the time-derivatives of the entries of the matrix B). Performing the necessary calculations, we obtain

dt

= -ptrtSp(D^K) ~

+ '

a,{3= !,...,«,

(3.6.39)

210

Chapter III

where, for brevity, we use the notation

= \\Kap\\i =

f

f K(t,x,y)G(xa,r;x,t)G(xf),T;y,t)dxdy

JD JD

[ ] « = [ » « - / G(xa, r; x, t)v(t, x) dx] ,

L JD Ptr = Ptr(vi,...,vs;v(t,x)).

J

(3.6.40)

By formulas (3.6.13) and (3.6.22), we can readily obtain the first- and second-order functional derivatives

SF

f f f^ f°° f°° 5— = / • • • / dxi...dxsl dr I ... w 0 («i, • • -,v,)ptT

dv(t,x)

S2F

——— °

8v(t,x)6v(t,y)

JD

JD Jt J_oo J_00 ( x G(x *,T;x,t)(D^)al)[]/3dvl...dvs,

f • \

JD

l

dx ...dx

s

fT

fc°

I dr I

Jt

J-oo

/•« ... I

J-c

(3.6.41)

, . . . , v,)ptr

- G(xa, r; x, t)(D^r1)al}G(x13, T; x, t) dv,. . . . dvs. (3.6.42) In view of (3.6.36), the Green functions G(xa,T- x,t) in (3.6.39)-(3.6.42) satisfy (with respect to x and t) the adjoint equation (3.6.33) in the interior of the domain D and the adjoint boundary condition on the boundary 3D. Taking into account the fact that the adjoint boundary condition has the form (3. 6. 25. Ill) for the third boundary value problem (Eq. (3.6.18) was written just for this problem) and substituting (3.6.41) into (3.6.18), we readily verify that the integral over the boundary dD in (3.6.18) is equal to zero. Finally, substituting (3.6.38)-(3.6.42) into (3.6.18), we arrive at an identity, and relation (3.6.21) is thereby proved. The solution of the zero approximation equation (3.6.18) is given by

formulas (3.6.21) and (3.6.22). As a rule, the higher-order approximations -Ffc(t, v(t, x)}, k > 1, are calculated by more complicated formulas, where, in addition, we must pass to the limit, since, in general, w/t (v(t, K)), k > 1, are not integral functionals of the form (3.6.19). Therefore, we can calculate successive approximations FI- (t, v(t, x)), k > 1, by using, instead of (3.6.21),

Approximate Synthesis of Stochastic Control Systems

211

the formula [95] T

/

/»oo

dr I J-OO

/»oo

.../

u£(Vl,...,vr)

J..OO

x Ptr(vi, ...,vr;v(t,x))dvi,.. .,dvr,

(3.6.43)

where u>£(vi,.. .,vr) = u£(v(t, x1),..., v(t, x7")) is a finite-dimensional analog of the functional Uk[v(t, x)] such that lim wfcA («i,..., VT) = uk[v(t,x)].

(3.6.44)

r—>oo

A-+0

The following example illustrates calculations with the help of formula

(3.6.43). 3.6.4. An example. If we choose some special expressions for the functional (3.6.7), the operator (3.6.2), etc., then, using formulas (3.6.21) and (3.6.43), we can obtain a finite approximate solution of the synthesis problem. As an example, we calculate the optimal control of a substance concentration in a cylinder of a finite length. Let us consider a control problem often encountered in chemical industry processes. Suppose that there is a chemical reactor in which the output product is obtained by catalytic synthesis reactions. We assume that the reacting agents diffuse into the catalysis chamber through pipelines. There

may be branches in the pipeline through which reagents are coming to technological units where the concentration of the entering substance varies on random. Simultaneously, to obtain a qualitative output product, it is necessary to maintain the reagent concentrations close to given values. One of possible ways to stabilize the concentration in the catalysis chamber is to change the flow rate at the input of the corresponding pipeline.

After appropriate generalizations and idealizations, this problem can be stated as follows. Let the plant (a pipeline) be a cylinder of length i filled

with a homogeneous porous medium; the assumption r
the cylinder can be affected by changes in the flow rate at the end of the cylinder (the rate of the incoming flow is the controlling action). Assuming

that a random perturbation £(i, x) is a stationary white noise, we obtain the following mathematical model of the plant to be controlled [95]:

=a2+<e(M),

Q<x
Q
«2 =

,

(3.6.45)

Chapter III

212

(here B and C are the diffusion and the porosity coefficients of the medium);

dv ~dx

dv_

dx

x=0

(3.6.46)

= 0,

r)-

(3.6.47)

For the plant (3.6.45)-(3.6.47), we need to synthesize a regulator that minimizes the mean value of the quadratic performance criterion

/=E[/

/

[ 6(x,y)v(t,x)v(t,y)dtdx]

L./O Jo Jo

(3.6.48) J

( 9 ( x , y ) is a given positive definite function, i.e., the kernel) provided that the absolute value of the boundary control action (the boundary flow of the substance) u is bounded, that is,

\u\ < um.

(3.6.49)

In this example the Bellman equation (3.6.17) has the form

&F i rl rl

-

S2F

I K(x.y}-^—.———^-——-dxdy y

2 Jo Jo

' *Sv(t,x}Sv(t,y)

mn

££-*£(£) Sv

. i"[r,,(T,x)]=o. (3.6.50)

Taking into account (3.6.45) and (3.6.46) and calculating the minimum with respect to u, we can rewrite (3.6.50) in the form

dF

f^ f^ f^ f^

f^

= L L «(*' x=0

d2 /

SF

dx

SF ' 'dx\Sv(t,x)J\f=

S2F =,y) Sv(t,x)Sv(t,y) dxdy - a un

SF \Sv(t, x ) / x=o

F[T,v(T,x)} = 0 .

(3.6.51)

Simultaneously, we obtain the optimal control law (3.6.52)

Thus, to obtain the final solution of the synthesis problem, it remains to calculate the functional derivative [SF/Sv(t, x)]x=o in (3.6.52). We calculate it by the method of successive approximations.

Approximate Synthesis of Stochastic Control Systems

213

The zero approximation. Suppose that um is small. To solve (3.6.51), we first set um = 0. As a result, we obtain the following equation of the zero approximation ft ft a2 , op f fft

=

-^ L L

<^2 F

1 r^ /*'

+ -/ K(x,y)——~-^-——dxdy, 2 Jo Jo '<Jr;(<,a;)ch>(t,3/)

F0[T, v(T, x)] = 0.

(3.6.53) Elementary calculations show that its solution (3.6.21) can be written in

the form fT

FQ[t,v(t,x-)]= / Jt

r

ft

ft

dr\ / / fl(x,y) (. Jo Jo x

\ /tl /fl (G(a ( -,T;K,i)G(y, r;«,i)-u(t, x)w(i, y) ;

IJo Jo

v

+ / K(x,y)G(x,T;x,ff)G(y,T;y,. Jt ' J ) (3.6.54) Here the Green function G for the boundary value problem (3.6.45), (3.6.46) can readily be obtained by the separation of variables (the Fourier method) [26, 179] and represented as the series 2 [1 ^rr f / ? r f c \ 2 z, i -nk _. .. G ( x , t ; C , r ) = k + 2^exp - I — a (< - r) cos —-x cos

(3.6.55) The functional derivative of the quadratic functional (3.6.54) can readily be calculated (for example, by using formulas (3.6.13); see also [91]) as follows:

SF

——^— = 2 OV\l, X)

CT

dr

Jt

Fl fi JQ

JQ

fl

e(x,y)G(x,T;x,t)G(y,T;y,t)v(t,y)dydxdy.

JQ

Hence it follows that the optimal control law (3.6.52) has the following form in the zero approximation:

UQ\t, v(t, x)} = um sign rtt

ft ft

/

Ut

dr

-,

x / / / 0(x,y)G(x,T;Q,t)G(y,T;y,t)v(t,y)dydxdy\. Jo Jo Jo J

(3.6.56)

214

Chapter III

The first approximation. Taking into account (3.6.56), we can write Eq. (3.6.51) in the first approximation with respect to um as follows: c) W

f>t

——ST = /

01

fit

/ e(x,y)v(t,x)v(t,y)dxdy

Jo Jo

F1[T,v(T,x)]=Q, (3.6.57)

U

T

fl fl fl dr I I I 0(x,y)G(x,T]Q,t)G(y,T;y,t)v(t,y)dydxdy Jo Jo Jo

Now, formulas (3.6.21) and (3.6.22) are not sufficient for calculating Fi(t,v(t, x)); we need to use a more complicated calculation procedure according to (3.6.43) and (3.6.44). A finite-dimensional analog of the functional 2 can be obtained by dividing the interval [0,1] into the intervals A = t/r and replacing u by

WA — \hiVi + • • •

r

hll = ^l

Jt

ft r ft r

dr \

\ 9(x,y)G(x,T;Q,t)G(y,T;[j,A,t)dxdy.

Jo Jo

Next, we use formulas (3.6.21), (3.6.22), and (3.6.43) as well as the formula OO

/

AOO

.../

|v

J — OO

-OO

x exp - —^—-——-^——-———— I dxi... dxr

P

'"

J

(3.6.58)

Approximate Synthesis of Stochastic Control Systems

215

where

1

hr gr •

As a result, for Fi[t, v(t, a;)], we obtain the expression

fl

2um

Jt

dr

,

(3.6.59)

where Fo[t, v(t, x)] is given by (3.6.54), and moreover, ft

,T

H = H[t,T,v(t,x)} =

da I JT

Jo

,/

ft

I

I

Jo

Jo

ft

/ 6(x,y) Jo

x G(x, a; 0, r)G(y, a- x, r)G(x, r; y, t)v(t, y) dxdydxdy,

(3.6.60)
da f

JT

da f • • • f K(x, y)6(x', y'}6(x", y")

JT

Jo

Jo

x G(x, T; x, a)G(y, T; y, a)G(x', a; 0, r)G(j/, a; x, r)G(x", W; 0, r) x G(y", cf; y, T) dxdydx'dy'dx"dy"dxdy. After the functional derivative (SFi/Sv(t,x)')x_0 is calculated, relations (3.6.52) and (3.6.59) yield the controlling functional

U

L

U

T

dr

f1 ft I I 0(x,y)G(x,T;Q,t)G(y,T;y,t)v(t,y)dydxdy

Jo Jo

H

H

ft f t f t _ _ J1 I I I 6(x,y)G(y,a;x,T)G(x,T;Q,t)dxdydx\ >. Jo Jo Jo 1)

(3.6.61) Formula (3.6.61) enables us to synthesize the quasioptimal control system in the first approximation.

Chapter III

216

Although the quasioptimal control algorithms (3.6.56) and (3.6.61) look somewhat cumbersome (especially, formula (3.6.61)), they admit a transparent technical realization. For example, let us consider the zero-approximation algorithm (3.6.56), which can be written as

U

l

1

Q(y,t)v(t,y)dy\,

(3.6.62)

J

where Q(y,t)= f Jt

dr I I 8(x,y)G(x,T;Q,t)G(y,T;y,t)dxdy Jo Jo

is a known function calculated previously. The current value of the state

function v(t, x ) can be determined by a system of data units that measure the concentration v(t, xi),v(t, #2), • • . , v(t,xp) at the points xi,X2,..-,xP lying along the cylinder. In particular, if the concentration gauges are placed uniformly along the cylinder, then the integral in (3.6.62) can be replaced by the sum

«o(V

= w ro sign

Qi(t) = 8=1

A = -, P

(3.6.63)

xt = iA,

Vi = v(t, xt).

As a result, we obtain an algorithm whose realization does not present any difficulties.

FIG.29

Approximate Synthesis of Stochastic Control Systems

217

Indeed, it follows from (3.6.63) that, besides a system of data units, the control circuit (the feedback circuit) contains a system of linear amplifiers with amplification factors Qi(t), an adder, and a relay type switching device that relates the pipeline [0,1] either to reservoir 1 (for pumping additional substance) or to reservoir 2 (for substance suction at the pipeline input). Figure 29 shows the block diagram of the system realizing the control algorithm (3.6.63). The quasioptimal first-approximation algorithm (3.6.61) can be realized in a similar way. Here only the control circuit, along with a nonlinear unit of an ideal relay type, contains nonlinear transformers that realize the probability error functions $(K). However, it should be noted that an error is inevitably present in the finite-dimensional approximation of the state function v(t, x) (when the algorithm (3.6.56) is replaced by (3.6.63)), since it is impossible to measure the system state v(t, x) precisely (this state is a point in the infinitely dimensional Hilbert state L2). However, if the points X I , . . . , X P of location of the concentration data units lie sufficiently close to each other, then this error can be neglected.

CHAPTER IV

SYNTHESIS OF QUASIOPTIMAL SYSTEMS

IN THE CASE OF SMALL DIFFUSION TERMS IN THE BELLMAN EQUATION

If random actions £(t) on the plant in the closed-loop control system shown in Fig. 3 are of small intensity and the observation errors j](t) and £(t) are large, then the Bellman equation contains a small parameter, namely, the coefficients of the second-order derivatives of the loss function in the phase variables are small. Indeed, considering the synthesis problem for which we derived the Bellman equation in the form (1.4.26) in §1.4, we assume that the matrix

Ft + [A" (t, y)]T Fy + I [ Sp B* (t,x)FM + Sp By0 (*, y) Fyy] + *(*, x, y, Fx) = 0, (4.0.1) where B^t, x) —
which the matrix Q(t) in (1.5.46) has the form Q(t) = e~i/2Q0(t). In this case, the Bellman equation (1.5.54) for the problem considered can be written in the form

Ft +

Sp DRDFmmT + Sp [FD(aaT - eDRD}} + ^(m, D,Fm,FD} = 0, (4.0.2)

where

, D, Fm, FD) = mm [(mTGT(t, u) + bT (t, u))F

Sp FD (DGT(t, u) + G(t, u)D) + c(m, D, u)] ,

219

220

Chapter IV

If the value of the parameter e is small, then the solutions of the above equations and the solutions of the equations

F? + [A*(t, y)}TF° + *(*, x, y, F°) = 0,

(4.0.3)

T

F? + Sp F°D, F%, F°) = 0,

(4.0.4)

obtained from (4.0.1), (4.0.2) by setting e — 0, are expected to be close to each other. The equations for F° are, generally speaking, simpler than the original Bellman equations, since they do not contain second-order derivatives and thus are partial differential equations of the first order. If these simpler equations can be solved exactly, then we can construct solutions of

the original Bellman equations as series in powers of the small parameter e, that is, as F = F°+eF1+e2 .... Here the function F° plays the role of the leading term (generating solution) of the expansion. Taking finitely many terms Fk = F° + £F1 + - - - + ekFk (4.0.5) of the asymptotic series and considering Fk as an approximate solution of the Bellman equation (the feth approximation), we can readily solve the synthesis problem corresponding to this approximation. To this end, it

suffices to make the change F —>• Fk in the expression for the optimal control algorithm w* = <po(t,x,y,dF/dx) (see, for instance, (1.4.25)). In this way, we obtain the quasioptimal algorithm for the fcth approximation:

Uk(t,x,y) — 1, the expressions of Fk (or Fk) can be calculated recurrently. If the functions $ and $1 are sufficiently smooth, then the system of equations

for the successive terms .F1, F2,... in the expansion (4.0.5) can be obtained in the standard way by substituting the expansion (4.0.5) into Eqs. (4.0.1)

or (4.0.2) and setting the coefficients of different powers ek (k > 1) of the small parameter equal to zero. In other cases, it may be convenient to use a somewhat different scheme of calculations in which the successive approximations Fk (k > 1) are obtained as solutions of the sequence of equations: Fkt + [A*(t, y)}TFky + *(<, x, y, Fkx) lyyT},

k > 1,

(4.0.6)

or

Fkt + Sp FkDff(TT

+ $i(m, £>, Fkm, FkD)

= e[Sp Fk-iDcrcrT - ±SpDRDFk_lmmT},

k>l. (4.0.7)

Synthesis of Quasioptimal Systems

221

This approximate synthesis procedure was studied in detail and exploited for solving some special problems in [34, 56, 58, 172, 175]. The accuracy of the approximate synthesis was investigated in [34, 56]. It was shown that, under certain conditions, the use of the quasioptimal control u^ in

the fcth approximation gives an error of the order of ek+1 in the value of the minimized functional. In other words, if instead of the optimal control algorithm ut we use the quasioptimal algorithm «&, then the difference between the value of the optimality criterion /[it*] corresponding to this

control and the minimum possible (optimal) value I[ut] — F is of the order ofe fc+1 , that is,

/[«*] - /[«.] = I[uk] -F< cek+1,

(4.0.8)

where c is a constant. In the present section the main attention is paid to the "algorithmic" aspects of the method, that is, to calculational methods for obtaining quasioptimal controls Ufc. As an example, we consider two specific problems of the optimal servomechanism synthesis. First (in §4.1), we consider the synthesis problem that generalizes the problem considered in §2.2 to the case in which the input process y(t) is a diffusion Markov process inhomogeneous

in the phase variable y. Next (in §4.2), we write an approximate solution of the synthesis problem for an optimal system of tracking a discrete Markov process of a "telegraph signal" type when the command input is observed on the background of a white noise. §4.1.

Approximate synthesis of a

servomechanism with small-intensity noise

Let us consider a servomechanism shown in Fig. 10. Assume that the plant P is described by a scalar equation of the form

x = u + VeN£(t),

(4.1.1)

where £(i) is the standard white noise of unit intensity (1.1.31), e and N are given positive constants (e is a small parameter), and the values of admissible controls u lie in the region1 —a — um
(4.1.2)

1 The nonsymmetric constraints (4.1.2) are, first, more general (see [21]), and second, they allow a more convenient comparison between the results obtained later and the

corresponding formulas constructed in §2.2.

222

Chapter IV

where um > a > 0. The command input y(t) is a £ (^-independent scalar Markov diffusion process with drift and diffusion coefficients

Ay - -/3y,

B« = eB,

(4.1.3)

where (3 and B > 0 are given numbers and e is the same small parameter as in (4.1.1). The performance of the tracking system will be estimated by the value of the integral optimality criterion

c(y(t)-x(t))dt],

o

(4.1.4) J

where the penalty function c(y(t) — x(t)} = c(z(2)) > 0, c(0) = 0, is a given concave function of the error signal z ( t ) = y(t) — x(t). The problem stated above is a generalization of the problem studied in Section 2.2.1 of §2.2 to the case in which the plant is subject to uncontrolled random perturbations and the input Markov process y(t) is inhomogeneous in the phase variable y (the drift coefficient Ay = Ay(y) = —f3y ^ const). The inhomogeneity of the input process y(t) makes the synthesis problem more complicated, since in this case the Bellman equation cannot be reduced to a one-dimensional equation (as in Section 2.2.1 of §2.2).

Since problem (4.1.1)-(4.1.4) is a special case of problem (1.4.2)-(1.4.4), then it follows from (1.4.21), (1.4.22), and (4.1.1)-(4.1.4) that the Bellman equation has the form

-PyFy +

min

—a—um
[uFx] + | (NFXX + BFyy] + c(y - x) = -Ft, 2

0
F(T, x, y) = 0.

(4.1.5)

If like in Section 2.2.1 of §2.2 we introduce a new phase variable z = y — x and replace the loss function F(t,x,y) by F(t,y,z), then Eq. (4.1.5) can readily be written as

- (3y(Fy + FZ) +

min

—a—um
[~uFz]

+ ^(Fyy + Wy* + ?**) + ^» + c(z) = -Ft, Q
F(T,y,z)=Q.

(4.1.6)

We are interested in the stationary tracking when the terminal time T —>• oo. If the stationary loss function f ( y , z) is introduced in the standard

way (see (1.4.29) and (2.2.9)), f ( y , z ) = lim [ F ( t , y, z) - 7(T - <)], 1 —>OO

(4.1.7)

Synthesis of Quasioptimal Systems

223

then (4.1.6) implies the following stationary Bellman equation for the problem considered: — a— um
+ \[B(fyy

+ 2fy, + /«) + #/„] + c(*) = 7-

(4.1.8)

As usual, the number 7 > 0 in (4.1.8) characterizes the mean losses per unit time under stationary operating conditions. This number is an unknown variable and can be obtained together with the solution of Eq. (4.1.8). Let us discuss the possibility that Eq. (4.1.8) can be solved. By R+

we denote the domain on the phase plane (y, z) where fz > 0 and by R_ the domain where fz < 0. It follows from (4.1.8) that the optimal control

«* (y, z) must be equal to u» = um — a in R+ and to u* = —um — a in R_. Denoting by f + ( y , z ) and /_(y, z) the values of the loss function f ( y , z ) in the domains R + and R_, we obtain the following two equations from

(4.1.8):

+ ^ 5 ^ + 2 ^ f + ^f I [ V Oy dyOz Oz

+JV%^ + c ( z ) = 7 Oz J

inR±. ^^

Since in (4.1.8) the first derivatives fy and fz are continuous on the interface F between R+ and R_ [172], both equations in (4.1.9) hold on F, and we have the condition

Since the control action u* is of opposite sign on each side of the inter-

face F, the line F is naturally called a switching line. It follows from the preceding that the problem of the optimal system synthesis is equivalent to the problem of finding the equation for the switching line F.

Equations (4.1.9) cannot be solved exactly. The fact that expressions with second-order derivatives contain a small parameter £ allows us to solve these equations by the method of successive approximations. In the zero

approximation, instead of (4.1.9), we need to solve the system of equations

By /±, 7°, and F° we denote the loss function, the stationary error, and the switching line obtained from the Eq. (4.1.11) for the zero approximation. The successive approximations /£, jk, and F* (k > 1) are calculated

224

Chapter IV

recurrently by solving a sequence of equations of the form

+(«T«™)=7fc-Cfc(2/,*),

* > 1 , (4.1-12)

where

ck(y,z) = ck±(y,z) = c(z)

B

2

-

,f

2

-i

A method for solving Eqs. (4.1.11), (4.1.12) was proposed in [172]. Let us briefly describe the procedure for calculating successive approximations

/ f c , 7 f c , a n d F f c , k = 0 , 1 , 2 , . . . . First of all, note that Eqs. (4.1.11), (4.1.12) are the Bellman equations for deterministic problems of synthesis of secondorder control systems in which the equation of motion has the form ^ = -/3y,

J = aTWro-/%

(4.1.14)

(in the second equation the signs "minus" and "plus" of um correspond

to the domains R^_ and Rk_, respectively). As was shown in [172], the gradient V/ fc of the solution of nondiffusion equations (4.1.11), (4.1.12) remains continuous when we cross the interface Tk , that is, on Ffc we have the conditions

dfl •>+ Q

ay

—

k df•>-

——

Q

oy

dfi J + I

—

Q

dfJ k-

—— Q

oz

dz

JL _ n i o '

"•

—— U )

1

)^;...)

I A 1 1 e;A It.l.lJ I

if the phase trajectories of the deterministic system (4.1.14) either approach the line F on both sides (the switching line of the first kind) or approach Ffc on one side and recede on the other side (the switching line of the second

kind, see Fig. 4). This fact allows us to calculate the gradient V/ fc along Tk. Indeed, in the domain Rk+ we have

dy

(4-1-16)

d.

and in the domain R.^. ,

6y

dz J '

v

,

'

m!

^/-

dz

(4.1.17)

Synthesis of Quasioptimal Systems

225

It follows from the preceding continuity considerations that both equations (4.1.16) and (4.1.17) must be satisfied on F simultaneously. Solving these equations for the first-order derivatives, we find the gradient of the loss function on the interface F* between R/j_ and Rl:

This allows us to write the difference between the values of the loss function at different points on the boundary Tk as a contour integral along the boundary,

fk(Q)-fk(P)=

I Akdy + Akdz. Jp

(4.1.19)

If the part of Ffc between the points P and Q is a boundary of the first kind (that is, the representative point of system (4.1.14) once coming to the boundary moves in the "sliding regime" along the boundary [172]), then formula (4.1.19) makes it possible to obtain a necessary condition for the boundary F* to be optimal. The corresponding equation for the desired switching line z = zk (y} is obtained from the condition that the difference (4.1.19) must be minimal. This equation can be written in the form [172]

8Ak

Equation (4.1.20) is a consequence of the following illustrative arguments. Let y and y be the coordinates of the points Q and P on the j/-axis.

We divide the interval [y , yp] into N equal intervals of length

A = \yp — y \/N and replace the contour integral (4.1.19) by the corresponding integral sum

r / Jp

[Ak(yi, Zj)A + Ak(yi, z,-)(z,-+i - z,-) (4.1.21)

where yi = yp + (i — 1)A and z,- = z(j/ ( -). We need to choose z,- so that to minimize the function $ A ( Z I , . . . , Z ). A necessary extremum condition d
write the following system of equations for optimal z, :

dAk j/i-i,z,--i) = 0. (4.1.22)

226

Chapter IV

If Ay(y,z), A*(y,z): and zk (y) are sufficiently smooth functions of their arguments, then we have a/tfe

(4.1.23) for small A = yt — j/,-_i. Substituting (4.1.23) into (4.1.22), taking into account the relation Zi+i — 2z; + z,-_i — o(A), and passing to the limit as A —>• 0, we obtain the condition ajfc

i,zi) = ^(yi,Zi),

(4.1.24)

which coincides with (4.1.20), since i is an arbitrary number. If we know the gradient of the loss functions along the switching line P* and the equation z = zk(y) for r fc , then we can find a condition for the parameter -jk that is the fcth approximation of the stationary tracking error 7 in the original diffusion equation (4.1.8). By using (4.1.18) and the equation z = zk(y), we obtain the following expression for total derivative dfk/dy along T fc :

f! = ^ y + A*^W(y,7*). dy dy

(4.1.25)

The unknown parameter 7^ can be found from the condition that the derivative (4.1.25) is finite at a stable point; in the problem considered the point y = 0 is stable. More precisely, this condition can be written as

limyw*(j/,7*) = 0.

y-*0

(4.1.26)

The expression ,

dt

dy

is the increment of the loss functions /* on the time interval dt. Hence (4.1.26) means that this increment becomes zero after the controlled deterministic system (4.1.14) arrives at the stable state y — 0. Obviously, in this case, it follows from the above properties of the penalty function c(z) that we also have z = 0. Thus, relation (4.1.26) is a necessary condition for the deterministic Bellman equations (4.1.11), (4.1.12) to have stationary solutions. Let us use the above-treated calculation procedure for solving the equations of successive approximations (4.1.11), (4.1.12). We restrict our calculations to a small number of successive approximations that determine the most important terms of the corresponding asymptotic expansions and primarily affect the structure of the controller C when a quasioptimal control system is designed.

Synthesis of Quasioptimal Systems

227

The zero approximation. To calculate the zero approximation, we need to solve the system of equations (see (4.1.11))

+(

-*«~'

+c(2)=/

(4 L27

-

- >

ftf

ftf

Using (4.1.15) and solving system (4.1.27) with respect to -^ = -—- = f?f°

df°

df°

Qf°

-jj- and -g^- = -gf- = -^-, we obtain the following expressions for the components of the gradient V/° (4.1.18) on the switching line F°: r)f°

f)f°

- = ^(,,,)EO,

r(7\ — -v°

-^fe^iiiL-i.

(4.1.28)

Equation (4.1.20), which is a necessary condition for the switching line of the first kind, and (4.1.28) allow us to obtain the equation for F°:

^=0.

(4.1.29)

Since, by assumption, the penalty function c(z) attains its unique minimum at z = 0, the condition (4.1.29) implies the equation

z = 0,

(4.1.30)

that is, in the zero approximation, the switching line coincides with the y-axis on the plane (y,z). Now let us verify whether (4.1.30) is a switching line of the first kind. An examination of phase trajectories of system (4.1.14) shows that on the segment -

_

,

the phase trajectories approach the j/-axis on both sides;2 therefore, this segment is an actual switching line. For y $ [l-,l+], the equation for the switching line F° will be obtained in the sequel. Now let us calculate the stationary tracking error 7°. From (4.1.25), (4.1.26), and (4.1.28), we have "°(2/,7°) = -,

7° = 0.

(4.1.31)

2 Obviously, in this case, the domain R.5. (R.1_) is the upper (lower) half-plane of the phase plane (y, z). Therefore, to construct the phase trajectories, in the second equation in (4.4.14), we must take «m with sign "minus" for z > 0 and with "plus" for z < 0.

228

Chapter IV

It also follows from (4.1.28) and (4.1.31) (with regard to c(0) = 0) that the loss function is constant on the j/-axis for I- < y < ^+; thus we can set /°(l/,0) = 0 for y e [*-,/+]. To calculate the loss function /° at an arbitrary point (y,z), we need to integrate Eqs. (4.1.27). To this end, let us first write the system of

equations for the integral curves (characteristics):

/3y

/3y-a±um

.

(4.L32)

c(z)

If 3/0 denotes the point at which a given integral curve intersects the y-axis, z — 0, then (4.1.32) implies the following equation for the characteristics

(the

phase trajectories):

,

±(y) =

lny-y,

(4.1.33)

as well as for the zero approximation of the loss function

/ JO

c(z')

dz'_______

t) + z-z']-a±um

(4.1.34)

In (4.1.34) we have J/Q = f>± \.¥±(y) + z]i where f ~ ^ ( y ) is the inverse function of
case, as already noted, the gradient (4.1.15) remains continuous on F°; therefore, the derivatives of the loss function along F° are determined as previously by (4.1.28). However, in general, formula (4.1.20), from which Eq. (4.1.30) was derived, may not be valid any longer. In this case, the equation for F° can be obtained by differentiating (4.1.34), say, with respect to z and by setting, in view of (4.1.28), the expression obtained equal to zero. This implies the following equation for the switching line F°: y

,o

A

-jaz

(4.1.35)

Here we took into account the equality c(0) = 0 and assumed that the condition (dip±/dyo) • (dyo/dz) ^ 0 must be satisfied on F° determined

by (4.1.35).

229

Synthesis of Quasioptimal Systems

An analysis of the phase trajectories (4.1.14) shows that, to find F° for y > l+, we must use the function (p-(y) in Eq. (4.1.35) (correspondingly, £_)_ in the case of the penalty function c(z) =

z2.

In this case, the integral in (4.1.35) can readily be calculated and

Eq. (4.1.35) acquires the form

a + um

2

y

y

yo

yo

——— In — + y0 In — = y - y0

2p

(4.1.36)

(in (4.1.36) we have yo — f>~1[-(y) + z], where -(y) is determined by (4.1.33)). Equation (4.1.36) determines the switching line z — z°(y) for z > t+

implicitly. Near the point y = l+ = (a + um)/(3 at which the switching line changes its type, Eq. (4.1.36) allows us to obtain an approximate formula and thus write the equation for F° explicitly:

Figure 30 shows the position of the switching line F° and the phase trajectories in the zero approximation.

um - a

FIG. 30

um + a

230

Chapter IV

Higher-order approximations. Everywhere in the sequel we assume that the penalty function c(z) = z 2 . Let us consider Eqs. (4.1.12) corresponding to the first approximation:

1

iy

I

«

I

I

oz )

\ ~ T ">mi

(4.1.37)

r,

dz

cl(y,z) = <4(t/,z)

,

e\

2

fd

32/9

fl

d2fl\

= z 2 +2 - [\B V-4^ + fl ^ %2 + ' "2^ '+ 5y(9z ' z2 /

d 2 fJt ±] (4.1-38)

Ar

gz:

'

To simplify the further calculations, we note that, in the case of the stationary tracking mode and of small diffusion coefficients considered here, the probability that the phase variables y and z fluctuate near the origin on phase plane (y, z) is very large. The values y = (a =p um}/j3 at which the switching line F° changes its type are attained very seldom (for the stationary operating conditions); therefore, we are mainly interested in finding the exact position of the switching line in the region — (um — a)/(3 < y < (um + a)//3, where, in the zero approximation, the position of the switching line is given by the equation z = 0. Next, note that the first-approximation equation (4.1.37) differs from the corresponding zero-approximation equation (4.1.27) only by a small (of the order of e) term in the expression for cl(y, z) (see (4.1.38)). Therefore, the continuity conditions imply that the switching line F1 in the first approximation determined by (4.1.37) is sufficiently close to the previous position z = 0. Thus, we can calculate F1 by using, instead of exact formulas, approximate expressions corresponding to small values of z.

Now, taking into account the preceding arguments, let us calculate the function c1(j/, z) = c±(y, z) determined by (4.1.38). To this end, we differentiate the second expression in (4.1.34) and restrict ourselves to the first-

and second-order terms in z. As a result, we obtain3 ft J± f 9 _ o 2

tz

^7

/3y - a ± um dzdy

f3z (f3y-a±um)2

pft II? yz (j3y - a ± u. 2

2

"''

r

3

( 4 -!-39) dy2

3

The functions f^_(y, z) and f^_(y,z), as the solutions of Eqs. (4.1.37), are defined in R\ and R]_. At the same time, the functions fi(y,z) and /£({/,z) are defined in R.5. and R°_. However, since the switching lines F° (between R^. and R°_) and F1 (between R^ and R]_) are close to each other, to calculate (4.1.39), we have used expressions

(4.1.34) for /£ in R^ and R]_.

Synthesis of Quasioptimal Systems

231

Substituting (4.1.39) into (4.1.38) and (4.1.37), we arrive at the equations

-

dy

m

dz

-

.-

(f3y-a±um)

(in Eqs. (4.1.40) we preserve only the most important terms in the functions

c± (t/, z) and neglect the terms of the order higher than or equal to that ofe 3 ). In view of (4.1.15), both equations (4.1.40) hold on the boundary F1. By solving these equations, we obtain the components of the gradient of the loss function V/ 1 (y, z) on the switching line F1:

dz

*

ul-((3y-ar N)z(/3y - a) __ dy ~ y~ {3y Py[ul - (f3y - a

. } \• • >

In this case, the condition (4.1.20) (a necessary condition for the switch-

ing line of the first kind) leads to the equation

_ e(B + N)((3y - o)1 _ e(B + N)2pz({3y - a)

< - (fly - «)2 J

[< - (fly - «)2]2

l

j

Hence, neglecting the e2-order terms, we obtain the following equation for the switching line F1 in the first approximation: u

™ - (fly - «)2

Equation (4.1.43) allows us to calculate the stationary tracking error •y1 in the first approximation. The function w 1 (j/, -y1) readily follows from (4.1.25), (4.1.41), and (4.1.43). Substituting the expression obtained for w 1 (y, 71) into (4.1.26), we see that 71 = O(e2), that is, the stationary tracking error in the first approximation coincides with that in the zero approximation, namely, 71 = 0. The stationary error 7 attains nonzero values only in the second approximation. To calculate the derivative (4.1.25)

with desired accuracy, we need not calculate the loss function /£ (y, z) in the first approximation but can calculate c 2 (y, z) in (4.1.12) and (4.1.13)

232

Chapter IV

by using expressions (4.1.41) for the derivatives df1 /dy and dfl/dz, are satisfied along the switching line F1. Differentiating the first relation in (4.1.41), we obtain

u

which

-

As follows from (4.1.41), the other second-order derivatives d 2 f ^ / d z d y and d2 f1 /dy2 on F1 are higher-order infinitesimals and can be neglected when we calculate j2 . Therefore, (4.1.45) and (4.1.13) yield the following approximation expression for the function c2(y, z):

Taking (4.1.46) into account and solving the system (4.1.16), (4.1.17) (with k = 2) for df2/dy and df2 /dz, we calculate the functions A2 and A2 in (4.1.44) as 2_g/2_ 1 La, e2(B + N)2 Z + « 8y - j3y\ 2[<-(/3j/-a)2]

d A

7

\ /'

d_f_ dz

"' (4.1.47) From (4.1.26), (4.1.43), (4.1.44), and (4.1.47), we derive the equation for the stationary tracking error in the second approximation:

(

*

e2(B + N)2((3y-a)

whence it follows that 72 =

Formula (4.1.48) exactly coincides with the stationary error (2.2.23) obtained for a homogeneous (in y) input process. The inhomogeneity, in other words, the dependence of the stationary error on the parameter (3, begins to manifest itself only in the calculations of higher approximations. However, the drift coefficient —j3y affects the position of the switching line (4.1.43) already in the first approximation. Formula (4.1.43) is a generalization of the corresponding formula (2.2.22); for (3 = 0 these formulas coincide.

Figure 31 shows the analogous circuit diagram of the tracking system that realizes the optimal control algorithm in the first approximation. The

unit NC is an inertialess nonlinear transformer governed by the functional

Synthesis of Quasioptimal Systems

233

y(t)

FIG. 31 dependence (4.1.43). The realization of the unit NC in practice is substantially simplified owing to the fact that the operating region of the input variable y (where (4.1.43) must be maintained) is small. In fact, it suffices to maintain (4.1.43) for \y\ < Ce1/2, where C is a positive constant of the order of O(l). Outside this region, the character of the functional input-output relation describing NC is of no importance. In particular, for \y\ > Ce1/2, the nonlinear transformer NC can be constructed by using the equations for the switching line F° in the zero approximation or, which is even simpler, by using the equation z = 0. This is due to the fact that the system shown in Fig. 31 optimizes only the stationary tracking conditions when the phase variables are fluctuating in a small neighborhood of the origin on the plane ( y , z ) . §4.2. Calculation of a quasioptimal system for tracking a discrete Markov process

As the second example illustrating the approximate synthesis procedure described above, we consider the problem of constructing an optimal system for tracking a Markov "telegraph signal" type process (a discrete process with two states) in the case where the measurement of the input signal is accompanied by a white noise and the plant is subject to random actions. Figure 32 shows the block diagram of the system in question. We assume that y ( t ) is a symmetric Markov process with two states (y(t) = ±1) whose a priori probabilities pt(±l) = P[y(t) = ±1] satisfy the equations (4.2.1)

Chapter IV

234

y(t)

G

y(t)

c

u(t)

P

a

I*)

FIG. 32 Here the number /J, > 0 determines the intensity of transitions between the states y = +1 and y = — 1 per unit time. The system (4.2.1) is a special case of system (1.1.49) with m = 2 and \a(t) = A 7a (i) = //. It readily follows from (4.2.1) that realizations of the input signal y(t) are sequences of random pulses; the lengths r of these pulses and of the intervals between them are independent exponentially distributed random variables, P(T >c) = e'^. The observable process y(t) is an additive mixture of the input signal y(t) and a white noise (independent of y ( t ) ) of intensity x:

(4.2.2) Like in §4.1, the plant P is described by the scalar equation

(4.2.3) where £(i) is the standard white noise independent of y(t) and C(t) and the controlling action is bounded in absolute value,

\u(t)\ < I.

(4.2.4)

To estimate the system performance, we use the integral optimality criterion

(y(t)-x(t))dt,

(4.2.5)

where the penalty function c(y — x) is the same as in (4.1.4). In the method used here for solving problem (4.2.1)-(4.2.5), it is important that c(y — x) is a differentiate function. In the subsequent calculations, this function is quadratic, namely,

c(y-x) = (y-x)2.

(4.2.6)

Synthesis of Quasioptimal Systems

235

A peculiar feature of our problem, in contrast, say, with the problem studied in §4.1, is that the observed pair of stochastic processes ( y ( t ) , x ( t ) ) is not a Markov process. Therefore, as was already noted in §1.5, to use the dynamic programming approach, it is necessary to introduce a special space of states formed by sufficient coordinates that already possess the Markov property. 4.2.1. Sufficient coordinates and the Bellman equation. Let us

show that the current value of the output variable x(t) and the a posteriori probability Wt(l) = P[y(t) = +1 J/Q] are sufficient coordinates Xt in the problem considered. In the sequel, owing to purely technical considerations, it is more convenient to take, instead of wt(l), the variable zt = Wt(^) ~ wt(— 1) as the second component of Xt. It follows from the normalization condition wt(l) + wt(— 1) — 1 that the a posteriori probabilities wt(l) and wt(— 1) can be uniquely expressed via zt as follows:

«,(-!) = l-=~-

(4-2-7)

Obviously, zt randomly varies in time. Let us derive the stochastic equation describing the random function Zt — z(t). Here we shall consider a somewhat more general case of the input signal nonsymmetric with respect to probability. In this case, instead of (4.2.1) the a priori properties of y(t) are described by the equations D,

(4-2.8)

that is, the intensities of transitions between the states y — +1 and y ~ — 1 down from above (/i) and upwards from below (v) are not equal to each other. Let us pass to the discrete time reference. In this case, random functions in (4.2.2) are replaced by sequences of random variables

yn=yn+Cn,

n=l,2,...,

(4.2.9)

where yn, yn, and £n are understood as the mean values of realizations over the interval A of time quantization: ~

1 fn

yn — A" /

~/ \

1 /"*

y(T) dr,

yn = — /

(

C» = V

/

(n-l)A

"~ 1)A

CW dr.

y(T) dr,

(4.2.10)

Chapter IV

236

It follows from (4.2.8) (see also (1.1.42)) that the sequence yn is a simple Markov chain characterized by the following four transition probabilities P,(yn+i I 2/n):

p A (-l | -1) = 1 - ./A

P A (1 | -1) = i/A,

(4.2.11)

(all relations in (4.2.11) hold up to terms of the order of o(A)). It follows from the properties of the white noise (1.1.31) that the random variables C» corresponding to different indices are independent of each other

and have the same probability densities (4.2.12)

P(C») =

Using these properties of the sequences yn and £ n , we can write recurrent formulas relating the a posteriori probabilities of successive time instants (with numbers n and n + 1) and the result yn+\ of the last observation. The probability addition and multiplication theorems yield the formulas

p(yn+1 = l,yf + 1 ) = p(yn = 1, j/i*)p A (l l)p(5;n+i | y«+i = l)

+ p(yn = -l,y")p A (l -l)p(yn+i | yn+i = l), (4.2.13) P(J/»+I = -l,SfiB+1) =p(y» = -l,sf")p A (- 1 I -l)p(j/ B +i | y n +i = -1) + p(yB = l,gfi B )p A (- 1 [ l)p(y»+i 2/n+i = -1). (4.2.14) Taking into account the relation p(yn = ± l , y " ) = wn(±l)p(y™), rewrite (4.2.13) and (4.2.14) as follows:

2/n + l = 1),

we can

(4.2.15)

Wn(-l

x p(y n+ i y n+ i = -1).

(4.2.16)

We write dn = to n (l)/tw n (-l) and note that (4.2.9) and (4.2.12) imply

Synthesis of Quasioptimal Systems

237

Now, dividing (4.2.15) by (4.2.16) and taking into account (4.2.11), we obtain the following recurrent relation for the parameter dn:

i/Al f2A_ \ ^ exp <^ ——j/ n+ i >.

(4.2.17)

By letting the time interval A —>• 0, and taking into account the fact that

liirtA->o(c?n+i — dn)/A = dt and (4.2.17), we derive the following differential equation for the function dt — d(t): fy

I

dt = v + (v - n)dt - fj-dl +—-y(t).

(4.2.18)

Since, in view of (4.2.7), the functions Zt = z(t) and dt satisfy the relation

dt-(l + zt)/(l - zt), Eq. (4.2.18) for zt has the form

zt = z/(l - zt) - //(I + zt) +

~y(t).

(4.2.19)

3£

For a symmetric signal (fj, = z/), instead of (4.2.19), we have ~

(4-2.20)

REMARK. According to (4.2.2), the observable process y ( t ) contains a white noise, and the coefficients o f y ( t ) in (4.2. 18)-(4. 2.20) contain random functions dt = d(t) and zt = z(t). It follows from §1.2 that, in this case, we must indicate in which sense we understand the stochastic integrals used for calculating the solutions of the stochastic differential equations (4.2.18)(4.2.20). A more rigorous analysis (e.g., see [132, 175] shows that all three

equations (4.2. 18)-(4. 2.20) must be treated as symmetrized equations. In particular, just due to this fact we can pass from Eq. (4.2.18) to Eq. (4.2.19) by using the standard rules for differentiating composite functions (instead of a more complicated differentiation rule (1.2.43) for solutions of differential Ito equations). D Now let us verify whether the coordinates Xt — (xt,zt) are sufficient for the solution of the synthesis problem in question. To this end, according to [171] and §1.5, we need to verify whether the coordinates Xt = (xt,zt) are sufficient (1) for obtaining the conditional mean penalties

£,Xo] = E[c(yt,xt) \ y(r),cc(r): 0 < r < t] = E[c(yt,xt)\Xt];

(4.2.21)

238

Chapter IV

(2) for finding constraints on the set of admissible controls u; (3) for determining their future evolution (that is, the probabilities of the future values Xt+&, A > 0). In this problem, in view of (4.2.4), the set of admissible controls is a given interval — 1 < u < 1 of the number axis independent of anything;

therefore, we need not take into account the statement of item (2).4 Obviously, the conditional mean penalties (4.2.21) can be expressed via the a posteriori probabilities as follows:

E[c(x t , yt)

J/(T), X(T) : 0 < r < t] = c(xt, l)wt(l) + c(xt, -l)t»«(-l). (4.2.22)

Since formulas (4.2.7) express the a posteriori probabilities iyj(±l) in terms

of zt, statement (1) is trivially satisfied for the variables (xt,z ( ). Let us study the time evolution of (xt,Zt). The variable Xt = x(t) satisfies an equation of the form (4.2.3). If in this equation the control ut at time t is determined by the current values of ( x t , Z t ) , then, in view of the white noise properties, the probabilities of the future values of X(T), T > t, are completely determined by Xt = (xt,zt). Now, let us consider Eq. (4.2.20). Note that, according to (4.2.2), y(t) = y(t) + V*CW, where y(t] is a Markov process and C(t) is a white noise. Therefore, it follows from

Eq. (4.2.20) that the probabilities of the future values Z <+ A are determined by Zt and the behavior of J/(T), T > t. However, since y(T) is a Markov process, its behavior for T > t is determined by the state yt described

by the probabilities wt(yt = ±1), that is, in view of (4.2.7), still by the coordinate Zj. Thus, statement (3) is proved for Xt = (xy,Zt). Equations (4.2.3) and (4.2.20) allow us to write the Bellman equation for the problem considered. Introducing the loss function

F(t,xt,zt) = min [«(r)|
T r f E /

C(X(T), y(r)} dr

Ut

i

x(t) = xt, z(t) = zt\

1

t
(4.2.23) and using the Markov property of the sufficient coordinates (x(i),z(t)), from (4.2.23) we obtain the basic functional equation of the dynamic programming approach: _

U

t+A

c(x(r),y(T))dT

+ F(t + A, a5 (+ A, ^+A) I *t, *t • (4.2.24) It is necessary to verify the statement of item (2) only in special cases in which the control constraints depend on the state of the control system. Such problems are not considered in this book.

Synthesis of Quasioptimal Systems

239

The Bellman differential equation can be derived from (4.2.24) by the standard method (see §1.4 and §1.5) of expanding F(t + A, X < + A, -Z«+A) in the Taylor series around the point (t, xt,zt), averaging, and passing to the limit as A —>• 0. In this procedure, we use the following obvious formulas that

are consequences of (4.2.3), (4.2.7), and (4.2.20)-(4.2.22): r

,t+A

E

/

-]

c(x(t),y(T))dr

\ xt,zt\ A + o(A), (4.2.25)

E[(z t +A - xt) | xt, zt] = u t A + o(A),

(4.2.26) (4.2.27) ),

(4.2.28) (4.2.29)

E[(zt+A - «»)* I aj t , zt] = o(A),

fc > 3.

(4.2.30)

It is somewhat more difficult to calculate the mean value of the difference (zt+&.—zt). Since, as was already noted, (4.2.20) is a symmetrized stochastic equation, E[(zt+&—Zt) Xt,Zt] = E[(zt+& — Zt) Zt] can be calculated with the help of formulas (1.2.29) and (1.2.37) (with v = 1/2 in (1.2.37)). Then, taking into account the relation E[y* 1 z*] = E[yt | zt] - zt,

from (4.2.20) and (1.2.37), we obtain

zt) | zt]

+ o(A).

(4.2.31)

As A -» 0, relations (4.2.24)-(4.2.31) enable us to write the Bellman differential equation in the form

dF O I

/// C/(-

. \ dF] I

iiiiAJ.

I\ ai\ tt I -S V. "\1

I IM

\ I

_

fit* Lr Jj

\ \

Bd2F

dF

n

-

-

Uf-Uf^

|

fl"7 C/.G

- / LJ

_

ftf^ \JJj

f. 1

(l-z2)2d2F |

_

/ Si IS Li

1 fl ?^ \J /J

+ c(x, 1)——+ = 0. c (a;.-l)^—^ V ' / O V " ' O 1

(4.2.32) *> -*

240

Chapter IV

The second term in Eq. (4.2.32) can also be written as —\dF/dx\. To the equation obtained, we must add a condition on the loss function

in the end of the control process, namely,

F(T,x,z) = 0,

(4.2.33)

and some boundary conditions. Since the input signal takes one of the two values y(t) = ±1 at each instant of time t, we can restrict our consideration to the region x\ < I. Thus the sufficient coordinates are defined on the square — 1 < x < +1, — 1 < z < +1. The boundary conditions on the sides x — — 1 and x = +1 of this square are r)W ^-(*,±l,z) = 0. (4.2.34) These conditions mean that there is no probability flow [11, 173] through the boundary x = ±1.5 On the other sides z — ±1 of the square, the diffusion coefficient coni1 2\ 2 2 T-I tained in the second diffusion term ^ §pr is zero. Therefore, instead of the conditions dF/dz = 0 on these sides of the square, we have the trivial conditions a

< oo.

(4.2.35)

If, by analogy with the problem solved in §4.1, in the space of sufficient coordinates ( x , z ) we denote the regions where dF/dx > 0 and dF/dx < 0 by R+ and R_, respectively, then in these regions the nonlinear equation (4.2.32) is replaced by the corresponding linear equation and the optimal control is formed by the rule

«,(*,x, z ) = | ' ! r^!R+' (. 1-1,

11

{J-jZ)

t I\—.

Since the first-order derivatives of the loss function are continuous [113, 175], on the interface F between R+ and R_, we have

dF, — (<,:c,z):=0. dx

4.2.36)

To solve the synthesis problem is equivalent to find the interface F between R+ and R_ (the switching line for the controlling action). A straightforward way for obtaining the equation for the switching line F is to solve 5 The condition (4.2.34) means that there are reflecting screens on the boundary segments (x = +1,-1 < z < +1) and (x = -1, -1 < z < +1) (for a detailed description of diffusion processes with phase constraints and various screens, see §6.2).

Synthesis of Quasioptimal Systems

241

the original nonlinear equation (4.2.32) with the initial and boundary conditions (4.2.33)-(4.2.35) and then, on the plane (x, z), to find the geometric locus where the condition (4.2.36) is satisfied. However, this method can be implemented only numerically. To solve

the synthesis problem analytically, let us return to the approximate method used in §4.1. 4.2.2. Calculation of the successive approximations. Suppose that the intensity of random actions on the plant is small but the error of measurement of the input signal is large. In this case, we can set B = eBo and x = XQ/£ (where e > 0 is a small parameter). We consider, just as in

§4.1, the stationary tracking operating conditions. Then for the quadratic loss function (4.2.6), the Bellman equation (4.2.32) has the form a

£

Q2

2 2

2

df . \u^-\ r /l = ^(B ( n f^ +(1-2 5 A + x2 2 - 2xz + 1 - 7 2/iz-^min ^————)>--±} 0 dz |«|
R+ and R_, we can replace the nonlinear equation (4.2.37) by the pair of linear equations

dz

dx

_,, (4.2.38) ^

2

each of which is valid only in one of the regions (R+ or R_) on the phase plane (x, z).

We shall solve Eqs. (4.2.38) by the method of successive approximations considered in §4.1. In this case, instead of (4.2.38), we need to

solve a number of simpler equations that successively approximate the original equations (4.2.38). By setting e = 0 in (4.2.38), we obtain the zero-approximation equations

- ± - - = x2 -2xz+l--y°.

dz

ax

(4.2.39)

The next approximations are calculated according to the scheme 9/;r——— 4^p*, IE

dz

i +

*>*

-

dx

— jj T^ — £,J 9-rr 4- j. 1 — J^ -j-

Bc,-^^— I ° dz 2

y

^
^ ^2

'

-,•* 7

'

fr — 1 2 ~ ' '''' '

(C4

2 401 '

242

Chapter IV

By solving the equations for the Mh approximation (k = 0 , 1 , 2 , . . . ) , we obtain the set /± (x, z), Fk, 7* consisting of approximate expressions for the loss function, the switching line, and the stationary tracking error. In what follows, we solve the synthesis problem in the first two approximations, the zero and the first.

The zero approximation. Let us consider Eqs. (4.2.39). By analogy with §4.1, the equation for the interface F° between R/J. and R£, on which both equations for /° and /£ hold, and the stationary tracking error 7° can be found without solving Eqs. (4.2.39). Indeed, using the condition that the gradient V/ fc (see (4.1.15)) is continuous on the switching line Yk, _ a

OX

—

Q

OX

>

a a

OZ

— —

QQ

OZ

j

K — U, 1, 4, . . .,

we obtain from (4.2.39) the following components of the gradient V/° along r°:

X

~ Ox

The condition

(4.2.43) which is necessary for the existence of the switching line of the first kind (see (4.1.20)), together with (4.2.42) implies that the line

z =x

(4.2.44)

is a possible F° for the zero approximation. An analysis of the phase trajectories of the deterministic system

dx - = ±l,

dz - = -2Mz

(4.2.45)

shows that the trajectories actually approach the line (4.2.44) on both sides6 if only 2fj, < 1. In what follows, we assume that this condition is satisfied. The stationary error 7° is obtained from the condition that the derivative df°/dx calculated along F° at the stable point (e.g., at the origin x — 0, 6 In the first equation in (4.2.45), the sign + corresponds to the region z > x and the sign — to z < x.

Synthesis of Quasioptimal Systems

z - 0) is finite (see we have

243

(4.1.25) and (4.1.26)). In view of (4.2.42) and (4.2.44),

dx

x

z

dx

2(j,x

along F°. The condition (4.1.26) in this case has the form lir 0, which implies 7° = 1. Now, to solve Eq. (4.2.39), we write the characteristic equations

dx

dz

df+

(4.2.46)

To solve (4.2.46) uniquely, it is necessary to pose some additional "initial" condition (to pose the Cauchy problem) for the loss function f ( x , z ) . This condition follows from (4.2.42) and (4.2.44). The second relation in (4.2.42) implies that f°(z,z) = -(z 2 /4^) +/°(0,0) on the line (4.2.44). Without loss of generality, we can set /°(0, 0) = 0. Thus, among the solutions /± obtained from (4.2.46), we choose the solution satisfying the condition /<>(*,*) = -£ = -£ = /"(*,*)

(4.2.47)

on the line z = x. We readily obtain this solution

> = where XQ = X±(xiz)

-

± an

,

±

(4.2.48)

d the functions x± are determined as solutions of

the equations x ± e T2MX± = ze*2l>x.

(4.2.49)

The first approximation. Now, using (4.2.48), we can find the switching line F1 in the first approximation. Relations (4.2.40) and (4.2.41) allow us to write the components of the gradient V/1 on the line F1: +

dx

4>co

\ dz2

dz2 )

eB0

4 \ dx2

dx2

— = —;———— I ———2 -}- ———2 ] -j- —— I ——t 2_j_ ——i^ 2 dz 8/j.itoz \ dz dz J 8fj,z \ dx dx X2

X

( 4 ' 2 - 5 0)

2/xz ~ -. + ^~Differentiating (4.2.48) and using the relations

dx± dx

X±

Chapter IV

244

that follow from (4.2.49), we find the components

dz2

~

(4.2.51)

2/j,z2'

Substituting (4.2.51) into (4.2.50), we obtain

l

dz

4/tz

(4.2.52) 2/J,Z

/J,

IfJLZ

Using again the condition (4.2.43), we find F1. The derivatives dA^/dz and dA^/dx are calculated with regard to the fact that the difference between the position of the switching line F1 in the first approximation and the position of F° determined by (4.2.44) is small. Therefore, after the differentiation of (4.2.52), we can replace x+ z

x

an<

i X- by the relation

X+ = X- = — - If this replacement is performed only for the terms of the order of e, then the error caused by this replacement is an infinitesimal of higher order.

-1

FIG. 33

245

Synthesis of Quasioptimal Systems Taking into account this fact, we obtain from (4.2.52):

dx dx

-0(e2

— 4iJ,2x2}

dz dAl

eB0

xo(l — 4/j,2x2)

— - - + 0(e 2 ). /j,z ft

1 — 4fj?x2

Hence, using (4.2.43), we obtain the equation for the switching line F1: 9/iWl _ x T 2ux(l

2

\2

9i 2fj,Box

(4.2.53)

The position of F1 on the plane (x, z ) depends on the values of fj,, >CQ, and BQ. Figure 33 shows one of the possible switching lines and the phase

trajectories of system (4.2.45). By analogy with the zero approximation, we find the stationary tracking error 71 from the condition that the gradient (4.2.52) is finite at the origin. By letting z —>• 0 and x —> 0 in (4.2.52) and taking into account the fact that x+ and %_ tend to zero just as x and z, we obtain

Hence it follows that the stationary error in the first approximation depends on the noise intensity at the input of the system shown in Fig. 32 but is

independent of the noises in the plant.

y(t)

rl y(t)

,

+1|— r -• r —l-i

>/\

1

l/x

J «-

i_ sc

1

< 1

-

-2/z I——

<- X •

-1 FIG.

P

^0

<

<

Kl 1 J

_I

34

ATC

x(t)

246

Chapter IV

Using the equation (4.2.53) for the switching line and Eq. (4.2.20), we construct the analogous circuit (see Fig. 34) for a quasioptimal tracking system in the first approximation. The dotted line indicates the unit SC that produces a sufficient coordinate z(t); the unit NC is an inertialess

transducer that realizes the functional dependence on the right-hand side of (4.2.53). If we have e -C 1 for the small parameter contained in the problem, then the output variable x(t) fluctuates mostly in a small neighborhood of

zero. In this case ( \ x ( t ) \
K =

CHAPTER V

CONTROL OF OSCILLATORY SYSTEMS

The present chapter deals with some synthesis problems for optimal systems with quasiharmonic plants. Here the term "quasiharmonic" means that the plant dynamics is close to harmonic oscillations in the process of control. In this case, through time t = 2?r, the phase trajectories of the second-order systems considered in this chapter are close to circles on the plane (a;, T). There exists an extensive literature on the methods for studying such systems (including controlled systems) (e.g., see [2, 19, 27, 33, 69, 70, 136, 153, 154] and the references therein). These methods are based on the idea (going back to Poincare) that the motion in oscillatory systems

can be divided into "fast" and "slow" motions. This idea along with the averaging method [2] enables one to derive equations for "slow" variables that can readily be integrated. These equations are usually derived by different versions of the method of successive approximations. Various approximate methods based on the first-approximation equation for slowly varying variables play an important role in industrial engineering. For the first time, such a method for studying nonlinear oscillatory systems was proposed by van der Pol [183, 184] (the method of slowly varying amplitudes). Among other first-approximation methods, we also point out

the "mean steepness" method [2] and the harmonic balance method [69, 70], which is widely used in engineering calculations of automatic control systems. More precise results can be obtained by regular asymptotic methods, the most important of which is the asymptotic Krylov-Bogolyubov method [19]. Originally, this method was developed for studying nonlinear oscillations in deterministic uncontrolled systems. Later on, this method was also used for the investigation of stochastic [109, 173] and controlled [33] oscillatory systems. In the present chapter, the Krylov-Bogolyubov method is also

widely used for constructing quasioptimal control algorithms. This chapter consists of four sections, in which we consider four special problems of optimal damping of oscillations in quasiharmonic second-order systems with constrained controlling actions. In the first two sections (§5.1 247

248

Chapter V

and §5.2) we consider deterministic problems; the other two sections (§5.3 and §5.4) deal with stochastic synthesis problems. First, in §5.1 we study the control problem for an arbitrary quasiharmonic oscillator with one degree of freedom. We describe a method for solving the synthesis problem approximately. In this method, the minimized functional and the equation for the switching line are presented as asymptotic expansions in powers of a small parameter contained in the problem. The method of approximate synthesis is illustrated by some examples of solving the optimal control problems for a linear oscillator and a nonlinear van der Pol oscillator. In §5.2 we use the method (considered in §5.1) for solving the control problem for a system of two biological populations, namely, the "predator-prey" model described by the LotkaVolterra equation (see §2.3). We study a special Lotka-Volterra model with a "poorly adapted predator." In this case, the sizes of both interacting populations obey a quasiharmonic dynamics. Next, in §5.3, we consider the stochastic version of the problem studied in §5.1. We consider an asymptotic synthesis method that allows us to construct quasioptimal control systems with an oscillatory plant subject to additive random disturbances. Finally, in §5.4, the method considered in §5.3 is generalized to the case of indirect observation when the measurement of the current state of the oscillator is accompanied by a white noise.

§5.1.

Optimal control of a quasiharmonic

oscillator. An asymptotic synthesis method

According to [2], a mechanical system with one degree of freedom is called a quasiharmonic oscillator if its behavior is described by the system of the form

=

),

\ ,u),

(5.1.1)

where xi and x2 are the phase coordinates, xi and X2 are sufficiently arbitrary (nonlinear, in the general case) functions of their arguments,1 u = (1*1,..., ur) is an r-dimensional vector of controlling actions subject to various restrictions, and the number e is a small parameter. It follows from (5.1.1) that for e — 0 the general solution of system (5.1.1) is a union of two harmonic oscillations

= acos(t + a),

(5.1.2)

'The only assumption is that, for some given functions xi and X2i the Cauchy problem of system (5.1.1) has a unique solution in a chosen domain D in the space of the variables (t,xi,x%) (see §1.1).

Control of Oscillatory Systems

249

with the same period r = 2?r and the phase shift A^> = Tr/2. Note that, in the phase plane (x±, x2), the trajectory that is a circle of radius a corresponds to the solution (5.1.2). If e ^ 0 but is a sufficiently small parameter, then, in view of the continuity, the difference between the solution of system (5.1.1) and the solution (5.1.2) is small on a time interval that is not

too large. More precisely, if for e ^ 0 we seek the solution of system (5.1.1) in the form

xi(t) = a(<)sin (t + «(<)),

x2(t) = a(t) cos (t + a(t)),

then the "amplitude" increment Aa — a(t + 2-Tr) — a(t) and the "phase" increment Aa = a(t + 27r) — a(t) are small during time T = 2?r, that is, Aa ~ £ and Aa ~ £. This fact justifies the term "quasiharmonic" for systems of the form (5.1.1) and serves as a basis for the elaboration of

various asymptotic methods for the analysis of such systems. 5.1.1. Statement of the problem. In the present section we consider controlled oscillators whose behavior is described by an equation of the form

x + ex(x, x)x + x = eu,

(5.1.3)

where x(z, x) is an arbitrary given function (nonlinear in the general case) that is centrally symmetric, e.g., x(x,x) = x(—z, — x). In the phase variables Xi,x2 (determined, as usual, by xi = x and x2 = x), we can replace Eq. (5.1.3) by the following equivalent system of first-order equations:

x-i = X2,

x2 — — xi — ex(xi,x2)x2 + eu,

(5.1.4)

hence it follows that the oscillator (5.1.3) is a special case of the oscillator (5.1.1) with Xi = 0 and X 2 (*i, x2, u) = u - x(xi, X2)x2.

It should be noted that equations of the form (5.1.3) describe a wide class of controlled plants of various physical nature: mechanical (the Froude pendulum [2]), electrical (vacuum-tube and semiconductor generators of harmonic oscillations [2, 19, 183, 184]), electromechanical remote tracking systems for angle reconstruction [2], etc. Numerous examples of actual systems mathematically modeled by Eq. (5.1.3) can be found in [2, 19,

136]. For the controlled oscillator (5.1.3), we shall consider the following optimal control problem with free right-hand endpoint of the trajectory. We assume that the absolute value of the admissible (scalar) control u = u(t) is bounded at each time instant t: \u(t)
(5.1.5)

250

Chapter V

and the goal of control for system (5.1.3) is to minimize the integral functional fT

I[u]— I

Jo

c(x(t},x(t}}dt-+

min |«(«)|<« m , o
v(5.1.6)

over the trajectories {x(t) = x"(t): 0 < t < T} of system (5.1.3) that

correspond to all possible controls u satisfying (5.1.5). The time interval [0,T] and the initial state of the oscillator *(0) = £i(0) = ZIQ, x(Q) = £ 2 (0) = x2o are given. The penalty function c(x, x) = c(xi,x2) in (5.1.6) is assumed to be nonnegative and symmetrical with respect to the origin, c(xi, x2) = c(— cci, —£2), and vanishing only at the point (xi = 0, x2 = 0). In this case, the optimal control w* minimizing the functional (5.1.6) is sought in the synthesis form u* = ut (t, xi(t), £2(2)). Problem (5.1.3)-(5.1.6) is a special case of problem (1.3. !)-(!. 3.3) considered in §1.3. Therefore, if we determine the function of minimum future losses

t
U

T

-I

C(ZI(T), :c2(T)) (fr

xi(t) = Xi,x2(t) = x2\ J

(5.1.7) in the standard way and use the standard derivation procedure described

in §1.3, then, for the function (5.1.7), we obtain the Bellman differential equation

OF at

OF axi

,

.

,

0
.

. dF . (" dF mm eu ' ox2 |«|<«m L 9x2 jF(T,X!,x 2 )=0,

(5.1.8) that corresponds to problem (5.1.3)— (5.1.6). Equation (5.1.8) allow us to obtain some general properties of the optimal control in the synthesis form u*(t, x\, x2), which we shall use later. Indeed, it follows from (5.1.8) that the optimal control ut for which the expression in the square brackets attains its minimum is a relay-type control and can be written in the form

dF u*(t, x^ x2] = -um sign -— (t, xi, x2).

(5.1.9)

REMARK 5.1.1. Rigorously speaking, the optimal control in this problem is not unique. This is related to the fact that at the points (t, Xi, x2), where dF(t, Zi, x2)/dx2 = 0, the optimal control uf is not uniquely determined by Eq. (5.1.8). On the other hand, one can see that at the points

Control of Oscillatory Systems

251

(t,xi,x2), where 8F/dx2 = 0, the choice of any control u° lying in the admissible region [—«TO, um] does not affect the value of the loss function F(t, Xi, x2) that satisfies the Bellman equation. Therefore, in particular,

the control (5.1.9) that requires the choice of w* = 0 at the points (t, x\, X2),

= O,2 is optimal. D

where dF(t,xi,x2)/dx2

Using (5.1.9), we can rewrite the Bellman equation (5.1.8) in the form

OF

dF

,

«^2"o

,

\<"1

dF

,dF

1 ^ / V V ^ - L ) ^ti^L I n

^ ""m

f\

OX2

0 < t < T, (5.1.10) It follows from (5.1.10) and the central symmetry of x(*i; x2) and c(xi, x2) that the loss function (5.1.7) satisfying (5.1.10) is centrally symmetric with

respect to the phase coordinates, namely, F(t, si, x2) = F(t, — X i , — x2)Therefore, for any t, xi, x2 we have

dF

dF

-— (t, xi, x2) = --— (t, -xi, -x2). ox2 ox2 It follows from this relation and (5.1.9) that the optimal control algorithm

u*(t, xi, x2) has an important property of being antisymmetric, namely, w*(<, xi, x2) = — u»(i, —xj_, —x2).

(5.1.11)

The facts that the optimal control in problem (5.1.3)-(5.1.6) is of relay type (5.1.9) and antisymmetric (5.1.11) play an important role in the asymptotic synthesis method discussed in the sequel. We also note that the optimal control algorithm in problem (5.1.3)(5.1.6) can be simplified significantly if we consider the optimal control of system (5.1.3) on an infinite time interval. In this case, the upper limit

of integration T —> oo in (5.1.6) and, instead of (5.1.7), we have the timeindependent3 loss function

min \u(T)\
/

C(ZI(T), x2(r)) dr Xi(t) = xi, x2(t) = x2 ,

Ut

J

T>t

(5.1.12) 2

Recall that the discontinuous function signx is determined by the relation

1 3

+1,

x > 0,

0,

x = 0,

-1,

x<0.

The loss function (5.1.12) is time-independent, since the plant equations (5.1.4) are

time-invariant.

252

Chapter V

and, instead of (5.1.9), we have a time-invariant control algorithm of the form u*(xi, x 2 ) = -Mm sign -— (xj., x2). (5.1.13) In what follows we shall consider just such a time-invariant version of

the optimal control problem (5.1.3)-(5.1.6) on an infinite time interval. REMARK 5.1.2. As T -> oo, problem (5.1.3)-(5.1.6) makes sense only if there exists an admissible control u(xi: X2) in the synthesis form ensuring the convergence of the improper integral4 f0

(5.1.14)

Jo where x"(t) and x2(t) denote solutions of system (5.1.4) with control u

and the initial conditions £i(0) = x\ and 3:2(0) = x2. Simultaneously, for some constraints of the form (5.1.5) imposed on the admissible controls and for some nonlinear functions x(xi,x2) in (5.1.3), (5.1.4), it may happen that none of the admissible controls u ensures the convergence of the integral (5.1.14). For example, if x(xiix2) — x^ — 1, then system (5.1.3) is a controlled van der Pol oscillator. It is well known [2, 183, 184] that undamped quasiharmonic auto-oscillations arise in such systems for u = 0. Moreover, this auto-oscillating process is stable with respect to small disturbances affecting the oscillator. Therefore, for sufficiently small um in

(5.1.5), any admissible control is insufficient to "suppress" auto-oscillations in the oscillator (5.1.3). In its turn, in view of the properties of the penalty function c(xi,x2), it follows from this fact that the integral (5.1.14) does not converge. D Everywhere in the sequel, we assume that the parameters of problem (5.1.3)— (5.1.6) are chosen so that this problem has a solution as T —>• oo.

The solvability conditions for problem (5.1.3)-(5.1.6) as T —» oo will be studied in more detail in Section 5.1.4. 5.1.2. Equations for the amplitude and the phase. Reduction of the synthesis problem. To study the quasiharmonic systems of the

form (5.1.1) and (5.1.3), it is convenient to describe the current state of the system by using, instead of the coordinate x\ and the velocity £ 2 , the polar coordinates A and
0=2 = — Asin$,

^ — t +
(5.1.15)

4 It also follows from the properties of the penalty function c(xi, x%) that the control u(xi,xy) guarantees the asymptotic stability of the trivial solution x\ (t) = x 2 (t) = 0 of system (5.1.4).

Control of Oscillatory Systems

253

The change of variables (5.1.15) transforms system (5.1.4) to the following equations for the slowly changing amplitude and phase (equations in the normal form [2, 19, 136]):

A = eG(A,$,u),

,u),

(5.1.16)

where

H(A,$,u) = , . ,,

A

,

2

< e ,

.

,

(5.1.17)

sn

ws (A, $) = u(A, $) sin $,

w c (-4j $) = w(-4; $) cos $•

Since the optimal control is of relay type (5.1.9), (5.1.13) and antisymmetric (5.1.11), for the control function u(A, r (^))]. (5.1.18)

Note that, in view of the change of variables (5.1.15), controls of the form (5.1.18) are already of relay type and antisymmetric on the phase plane ( x i , X 2 ) . The function r(A) in (5.1.18) determines (in the polar coordinates) an equation for the switching line of the controlling action. Thus, in this case, the synthesis problem is equivalent to the problem of finding the function (p* (A) that minimizes a given optimality criterion. The func-

tion f* (A) is calculated by using the method of successive approximations presented in Section 5.1.4. It is well known [2, 19, 33] that for a sufficiently small parameter e, in-

stead of Eqs. (5.1.16), one can use some other auxiliary equations, which are constructed according to certain rules and are called truncated equations. These equations allow one to obtain approximate solutions of the original equations in a rather simple way (the accuracy is the higher, the smaller is the parameter e).5 In the simplest case, the truncated equations

A = G(A), 5

(5.1.19)

Here we do not justify the approximating properties of the solutions constructed with the help of truncated equations. A detailed discussion of these problems can be found in numerous textbooks and monographs devoted to the theory of nonlinear oscillations (e.g., see [2, 19, 33, 136]).

254

Chapter V

are obtained from (5.1.16) by neglecting the vibrational terms in the expressions for G(A, $, u) and H(A, <&, it) or, which is the same, by averaging the right-hand sides of Eqs. (5.1.16) over the "fast phase" $, while the amplitude A is fixed,6 namely,

G(A) = G(A,,u)= —

G(A,3>,u)d$,

27T Jo0

(5.1.20)

2

i r* 27T Jo

A higher accuracy of approximation to the solution of system (5.1.16) is ensured by the regular asymptotic Krylov-Bogolyubov method [19, 173],

in which the vibrational terms on the right-hand sides of Eqs. (5.1.16) are eliminated by the additional change of variables

A = A* + £v(A*,$*),

$*=t + y>*,

(5.1.21)

where

v(A* $*) = v-i (A* *£*) -f- sv^iA* 3?*) -f- £ 2

(z 1 OO\ ^O.l.ZZj

denote purely vibrational functions such that

———— i r2n v(A,$*)d$* v(A*,$*) — — I

= 0,

27T J0

——————

1

f2n

w(A*,$*) = — I ^^ Jo

w(A , $*)d$* = 0.

By the change of variables (5.1.21), we obtain the following equations for the nonvibrational amplitude A* and phase (p* from (5.1.16):

A* = eG*(A") = eGl(A*) + e2G*2(A*} + £3 ...,

(5.1.23]

In this case, the successive terms GJ, H^, G%, H%,..., vi, wi, v2, w2,... of the asymptotic series (5.1.23) and (5.1.22) are calculated recurrently by the method of successive approximations. 6

This method for obtaining truncated equations is often called the method of slowly

varying amplitudes or the van der Pol method.

Control of Oscillatory Systems

255

Let us illustrate this method. By using (5.1.21), we can write (5.1.16) in the form

A* + ev(A*, $*) = eG(A* + ev(A*,$*), $* + ew(A\ $*), u),

I 0* -L «^^t I

Substituting (5.1.22) and (5.1.23) into (5.1.24) and retaining only the terms of the order of e in (5.1.24), we obtain the first-approximation relations

(5.1.25) Now, by equating the nonvibrational and purely vibrational terms on the left and on the right in (5.1.25), we obtain the following expressions for the first terms of the asymptotic series (5.1.23) and (5.1.22):

>*)-«.(A*,$*) = G(^*), _ i ______ A* vi'(A*,$*)=j '*

/"**

+

[xA(A*,$')-xA]d&-W^A*,®*),

(5.1.26) (5.1.27)

J $*

1 '-^;*c(A*,*t), (5.1.28) where

In (5.1.26)—(5.1.28), as usual, the bar over an expression indicates the averaging over the period, that is, ^ J0 " ... d$*; the lower integration limits i and $2 are chosen so that the functions vi(A*,$*) and iw 1 (.A*,$*), determined by (5.1.27) and (5.1.28), be "purely vibrational" in the variable $*. In a similar way, we can calculate the next terms G^A*), H^A*), v2(A*, $*),... of the asymptotic expansions (5.1.23) and (5.1.22). So, to

256

Chapter V

calculate the functions (?£, H%, v2, w2 in (5.1.24), we need to retain the expressions of the order of e2. Then (5.1.24) implies the second-approximation relations

In its turn, each equality in (5.1.29) splits into two separate relations for the nonvibrational and vibrational terms contained in (5.1.29), respectively. This allows us to calculate the four functions G|(j4*), H%(A*), v2(A*,$*), and 102(^*5 $*)• In particular, for the nonvibrational terms, the first equality in (5.1.29) implies ,

(5.1.30)

,

Using (5.1.17), (5.1.27), and (5.1.28), we can write the right-hand side of (5.1.30) in more details as follows:7

'S8A* du

° f( (5.1.31)

where the expression

/(XA - XA) d** + ^ j(Xv - xv] d**

(5.1.32)

indicates the control-independent terms. We do not write out the expressions for H2(A*), v2(A*, $*),... since we do not need them in the sequel. 7 For brevity, we omit the arguments (j4*,$*) of the functions x; , Xv», ua, $ s , and *c: in in(5.1.31) and and(5.1.32).

Control of Oscillatory Systems

257

5.1.3. Auxiliary formulas. The functions Gl(A*), H$(A*), G*i(A*), H^(A*),... that form the asymptotic series in (5.1.23) depend on the choice of the control algorithm u(A, $), that is, in view of (5.1.18), on the function ipr(A). It follows from (5.1.26) and (5.1.31) that we can write this

dependence explicitly if we know the expressions

**^T,

dA

(5133)

du c

<9$

The average values (5.1.33) can readily be calculated by using (5.1.18), the properties of the (5-function, and the fact that the functions ut(A, $), uc(A, $), ^S(A, <&), and $C(A, <J>) are periodic (with respect to $). 1. If, for defmiteness, we assume that 0 < ipr < 7T/2, then it follows from (5.1.17) and (5.1.18) that /"27r

1

m

27T J0

I

f

0 '+ /

— / L

sin $ d$ — /

sin <

^0

(5.1.34) One can readily see that formula (5.1.34) remains valid for any
2. In a similar way, we obtain 1

/*27T

MC = — /

27T JQ

rt

WTO sign [sin ($ — y> r (.A))] cos$d$ = — —— simpr(A) 7T

(5.1.35) and the relation wlw^=0, which we shall use later. 3. Using the formal relation

-— sign a: = 2S(x) ax

(5.1.36)

258

Chapter V

and formula (5.1.18), we can write

f\

-{M™ sign [sin ($=- 2 u

m

- 6 [ dA

S

m (* - <pr(A))] cos ($ - <pr(A)) sin$.

(5.1.37)

Using (5.1.37) and the properties of the (J-function, after the integration and some elementary calculations, we obtain ——- sin n$ = — /

dA

——- sin n<J> d<&

2?r J0

{

u

dA

d

,

,

^—^—[cos(n + l)tpr — cos(n — l)r\

0

,

,

tor even n ,

for odd n. (5.1.38)

4. By the straightforward integration with regard to

dus ——- = 2Mra5[sin($ - r)] cos($ — y> r ) sin $ + um sign[sin(<3> — y> r )] cos $, we obtain

for odd n. (5.1.39)

5. Since \£ s (yl, $) and dus(A,
Next, using (5.1.27) and (5.1.37), we arrive at n.i/y

(5.1.41)

dA where

,*

<J[sin($' - ^ ] cos$' - y>

sin $' d$'.

259

Control of Oscillatory Systems

7T

2?r

7T + (

FIG. 35 It follows from (5.1.40) that the choice of $1 does not affect the value of vF»f^f- Hence we set $1 = 0. Furthermore, if we consider 0 < y>r < TT,

then the piecewise constant function F(&) in (5.1.41) has jumps of value at the points <pr and TT + (pr as shown in Fig. 35. For this function , one can readily calculate F and usF, namely, Ur n -*»» u,F= —— TT

These relations, (5.1.34), (5.1.40), and (5.1.41) imply

dus

dip

n

—

_

Carrying out similar calculations for — -K < <pr < 0 and comparing the result with the last formula, we finally obtain I sm r 6. Using the relation

dus

= —(uc - uc)us = ucus - ucus

(5.1.42)

260

Chapter V

and expressions (5.1.34)-(5.1.36), we obtain

7. The relation = -uccosn3> n

(5.1.44)

allows us to reduce the calculation of the desired mean value to finding a simpler expression w c cosn$. Using (5.1.17) and (5.1.18) and performing some simple calculations, we obtain

for odd n. (5.1.45)

8. The value ^ „ cos n$ can readily be obtained by using the obvious relation -————— ^

1

s

w „ cos n$ = — —T- -r— cos n$

and formula (5.1.39). The expressions obtained for the average values (5.1.33) will be used

later for solving the synthesis problem. 5.1.4. Approximate solution of the synthesis problem. Now let us return to the basic problem of minimizing the functional (5.1.14). By choosing the nonvibrational amplitude and phase as the state variables, we rewrite (5.1.14) in the form8

f°° /(A , $ * ) = / c*(At*,*;)
(5.1.46)

where c*(A*,<£>*) is obtained from the penalty function c ( x i , X 2 ) by the change of variables (5.1.15), (5.1.21). Note that the functional (5.1.46), treated as a function of the initial state (A*, <&*), is a periodic function in the second variable, namely, I(A* , $*) = 8 The value of the functional (5.1.46) depends both on the initial state A*(0) = A*, $*(0) = $* of the system and on the control algorithm u(A*, $*) : 0 < t < oo. There-

fore, for the functional (5.1.46) it is more correct to use the notation Iu(At > * < ) (A*, $*) or /^rl '(^4*,$*) (which, in view of (5.1.18), is the same). However, for simplicity, we write I(A*,$*).

Control of Oscillatory Systems

261

*,$* + 2ir). Therefore, taking into account (5.1.21) and the second equation in (5.1.23), we obtain •*'27r

*

(5-1.47)

from (5.1.46). In (5.1.47) the integration over the period is performed along a trajectory of the system, and hence the amplitude A^ is treated as

a function of the fast phase £ . This function A£ ($£ ) is determined by the relation

that follows from Eqs. (5.1.23). Note that the amplitude increment A A* = A* ($* + 27r) - .A* (

3/(A*,$*)_

1

'**'

(A* 3>*\ t} l

*'

2

dA*2

1

* (A;)

-AA*-...

(5.1.49)

Since A A* = eGJ(A*)27r in the first approximation with respect to e, it follows from (5.1.49) that

where

" and the function G^(A*) = G^(A*,^ r (A*)) is determined by (5.1.26),

(5.1.17), and (5.1.34). Calculating the right-hand side of (5.1.49) with a higher accuracy (in this case, to calculate the last term in (5.1.49), we need to differentiate (5.1.50)), we obtain

c*(A*) v ;

*

~9c*

____ _ fE—_I { A* $*t)

*

*

'

262

Chapter V

where, just as in (5.1.51), the bar over a letter indicates the averaging over the period with respect to <&£, and the function G^(A*] is determined by (5.1.31). Let us write the functional to be minimized as follows:

-L(A*t,$*}dA*t

(5.1.53)

(note that, by the assumptions of the problem considered, we can set It follows from (5.1.53) that, to minimize the functional (5.1.46), it suffices to find the minimum of the derivative dI(A* , <&*}/dA* for an arbitrary current state (A* , <J>*) of the control system. The accuracy of this minimization procedure depends on the number of terms retained in the expansion of the right-hand side of (5.1.49) in powers of e. Let us perform the corresponding calculations for the first two approximations. According to (5.1.50), to minimize the functional (5.1.46) in the first approximation in e, it suffices to minimize (in <pr) the expression f)T

r*( A*}

(5-1-54) Since the penalty function c(x, x) = c(xi, £3) is nonnegative, we have c*(A*} > 0 for A* ^ 0. Therefore, to minimize (5.1.54), it suffices to minimize the function GJ (A*,

———r~T-——~~'

J.

/•27T /

>*) = — I 2T J0

This fact and (5.1.5) readily imply that the optimal control ui(A*,$*) in

the first approximation must have the form «i (A*,**) =u m sign(sin$*).

(5.1.55)

Comparing (5.1.55) and (5.1.18), we see that <pr(A*) = 0 in the first approximation in e. This means that, in this case, the switching line of the control coincides with the abscissa axis on the phase plane (xi = x , X 2 = x). Indeed, if, instead of the amplitude A* and the phase $*, we take the coordinate x and the velocity x as the state variables, then it follows from

(5.1.15), (5.1.21), and (5.1.55) that, in this approximation, the optimal control of the oscillator (5.1.3) is ensured by the synthesis function of the form

ui(x,x) = —umsignx.

(5.1.56)

Control of Oscillatory Systems

263

From the mechanical viewpoint, this result means that, to obtain the optimal damping of oscillations in the oscillator (5.1.3), we must apply

the maximum admissible controlling force (the torque) and this force (the torque) must always be opposite to the velocity (the angular velocity) of the motion. It must also be emphasized that the control algorithm in the

first approximation is universal, since it depends neither on the nonlinear characteristics of the oscillator (that is, on the function x(xi x) 'm (5.1.3)) nor on the form of the penalty function c(x, x) in the optimality criterion

(5.1-6). To find the quasioptimal control algorithm in the second approximation, we need to calculate the function if>r(Af) that minimizes (5.1.52) or, which is the same, the expression

G*1(^,^r(A*)) + £ G;(^,^ r (^)).

(5.1.57)

Since (5.1.57) differs from Gl(A*, ipr(A*)^ by a term of the order of e, it is natural to assume that the difference between the function <pr(A*) that minimizes (5.1.57) and the function <pr(A*) = 0 in the first approximation is small, that is, it is natural to assume that we have <pr(A*) ~ e for the desired function. Having in mind the fact that tpr(A*) ~ e and using the average values (5.1.33) calculated in Section 5.1.3, we can estimate the order of differ-

ent terms in formula (5.1.31) for the function G^A*, y> r (^4*)). We also note that since the function x 'm (5.1.3) is symmetric, that is, x(«, x) — x(—x, ~x), there are only cosines (sines) of $, 2$,... in the Fourier series for the functions X A (-A, $) (xp(A, $)). Thus, it follows from the results obtained in Section 5.1.3 that, among all terms in (5.1.31), only two terms ^T^efj*- and —y-^c^r are of the order of e. The other terms (depending on the control, that is, on tpr(A*)) in (5.1.31) are of the order of e2 or

e3. This implies that the function r(A*) minimizing (5.1.57) in the second approximation is just the function maximizing the expression = 5 .--L* e

+

.

L

.

(5.1.58)

To obtain some special results, we need to define the function x(z, x) explicitly. Let us consider two examples. EXAMPLE 1 . Suppose that the plant is a linear quasiharmonic oscillator described by Eq. (5.1.3). In this case, x(z,i) = 1 and, in view of (5.1.17),

By using (5.1.34), (5.1.43), and (5.1.45), we obtain ™

• o

T SH1 2 ^r'

264

Chapter V

The desired function <pr(A*) can be found from the condition

(5.1.59)

Since ipr is small (tpr ~ e), it follows from (5.1.59) that9 ,.^

£ .

2ura

(5.1.60)

The function ipr(A*) determines (in the polar coordinates) the switching line equation for the quasioptimal control in the second approximation. The position of this switching line on the phase plane ( x , x) is shown in Pig. 36.

FIG.

36

It follows from (5.1.18) and (5.1.60) that in this case the quasioptimal control algorithm (the synthesis function) in the second approximation has the form = « m s g n sn

(5.1.61)

REMARK 5.1.3. It follows from (5.1.60) that r(A*) ->• oo as A* -^ 0 and formulas (5.1.60) and (5.1.61) do not make sense any more. The reason 9 The terms of the order of e2 and of higher orders on the right-hand side of (5.1.60) are omitted.

Control of Oscillatory Systems

265

is that if we use a control of the form (5.1.18), then there always exists a small neighborhood of the origin on the phase plane (z, z) and the quasiharmonic character of the trajectories of the plant (5.1.3) is violated in

this neighborhood. In Fig. 36, this neighborhood is the circle of radius R

(R ~ e).10 In the interior of this neighborhood, the applicability conditions for the asymptotic (van der Pol, Krylov-Bogolyubov, etc.) methods are violated. Therefore, the quasioptimal control algorithms (5.1.56) and (5.1.61) can be used everywhere except for the interior of this neighborhood. Moreover, it is important to keep in mind that, by using the asymptotic synthesis method discussed in this section, it is in principle impossible to find the optimal control in a small neighborhood of the point (x = 0, x = 0).

D

EXAMPLE 2. Now let x(x,x) = x1 - 1. In this case, the plant (5.1.3) is a self-oscillating system (a self-exciting circuit) sometimes called the van der Pol oscillator or the Thomson generator. It follows from (5.1.17) that,

in this case, we have

A\ A2 1 XA(A,$)= - l-cos2$- — (l-cos4$) . 2 [

4

(5.1.62)

J

Using formulas (5.1.34), (5.1.43), and (5.1.45) for the function (5.1.58), we obtain

um £——

\A*2 / I 1

4W

• Q • , ™ ' O - - sm 3tpr - sin
Just as in Example 1, from the condition dF/d<pr = 0 with regard to the fact that (pr is small (tpr ~ e), we derive the equation of the switching line,

(5 L63

-

- >

and the synthesis function in the second approximation,

[ 10

/ F / A*2 A.-II \ \ 1 sin ( $ * - | ( ± _ - l + ^) I V

Z \

4

7T A

(5.1.64)

/ J J

An elementary analysis of the phase trajectories of a linear oscillator subject to the

control (5.1.56) shows that the phase trajectories of the system, once entering the circle

of radius R = 2eum , not only cease to be quasiharmonic, but cease to be oscillatory in character at all.

Chapter V

266

u2(x, x) — -un

u2(x,x) = un

FIG. 37 The switching line (5.1.63) is shown in Fig. 37. REMARK 5.1.4. It was pointed out in Remark 5.1.2 that the problem of optimal damping of oscillations in system (5.1.3) on an infinite time interval is well posed if the optimal (quasioptimal ) control of the plant (5.1.3) ensures the convergence of the improper integral (5.1.14) (or, which is the same, of the integral (5.1.46)). Let us establish the convergence conditions for these integrals in Example 2. The properties of the penalty function c(x, x) readily imply that the integral (5.1.46) converges if, for a chosen control algorithm and any initial value of the nonvibrational amplitude A*(0), the solution of the first equation in (5.1.23) A* (t) —>• 0 as t —>• oo, and furthermore, if A*(t) tends to zero not too "slowly." Let us consider the special form of Eq. (5.1.23) in Example 2. We confine ourselves to the first approximation A* = eG\(A*). Since the quasioptimal control in the first approximation has the form (5.1.55), it follows from (5.1.26) and (5.1.62) that the nonvibrational amplitude obeys the equation

If um > 7r/3\/3, then for any A* > 0 the function on the right-hand side of (5.1.65) cannot be positive; therefore, A* (t) —>• 0 as t —>• oo for any solution of (5.1.65). If in this case >

3V3

+ 6,

6>0,

(5.1.66)

Control of Oscillatory Systems

267

then the solution A*(t) of Eq. (5.1.65) attains the value A* = 0 on a finite time interval, which guarantees the convergence of the integral (5.1.46). Thus, the inequality (5.1.66) is the solvability condition for problem (5.1.3)(5.1.6) as T -> oo in the case of Example 2.11 D In conclusion we note that, in principle, the approximate method considered here can also be used for calculating the quasioptimal control algorithms in the third, fourth and higher approximations. However, in this case, the number of required calculations increases sharply.

§5.2.

Control of the "predator—prey" system. The case of a poorly adapted predator

In this section, by using the asymptotic synthesis method considered in §5.1, we solve the optimal control problem for a biological system consisting of two different populations interpreted as "predators" and "prey" coexisting in the same habitat (e.g., see §2.3 and [133, 186, 187]). This system is mathematically described by the standard Lotka-Volterra model in which the behavior of an isolated system is subject to the following system of equations (see (2.3.5)):

dx „ ^ = («x - o«3*,

y0>0,

dy , ~ ^~ ^ = (blX-b2)y,

x(t),y(t)>Q,

^^^

t>Q.

Recall that x = x( t ) and y = y ( t ] are the respective population sizes12 of prey and predators at time t and the positive constants 0.1, 02, 61, and 62 have the following meaning: 01 is the rate of growth of the number of prey, a-i is the rate of prey consumption by predators, 61 is the rate at which the prey biomass is processed into the new biomass of predators, and 63 is the rate of predator natural death. In this section we consider a special case of system (5.2.1) in which the predators die at a high natural rate and are "poor" predators, since they consume their prey at a low rate. In the nomenclature of [177], this problem corresponds to the case of predators poorly adapted to the habitat. For system (5.2.1), this means that we can take the ratio 0261/62 = e
tained in the asymptotic series on the right-hand side of Eq. (5.1.23) for the nonvibrational amplitude. 12 If the distribution of species over the habitat is uniform, then x and y denote the densities of the corresponding populations, that is, the numbers of species per unit area (volume) of the habitat.

268

Chapter V

5.2.1. Statement of the problem. We assume that system (5.2.1) is controlled by eliminating prey specimens from the population (by shooting, catching, and using herbicides). Then, instead of (5.2.1), we have the system (see (2.3.12)

dx

,

„„,

_

„,„,

—^ = (GI — a,2y)x — ux, d L

„

a;(0) = XQ, (5.2.2)

at here the control u = u(t) satisfies the constraints

0 < u < 7,

(5.2.3)

where 7 is a given positive number. We consider the control problem for system (5.2.2) with infinite time interval; the goal of control is to take the system from any initial state

xo,yo > 0 to the equilibrium state £* = b%/bi, y* — o-i/o-z of system (5.2.1). For the optimality criterion we use the functional

r°° \ f

/PI] = I

/>o\2

/

ni\2"i

[Cl (x( t) - g) + C2 (y ( t) - -i) J dt,

(5.2.4)

where c\ and c-z are given positive constants. We assume that the integral (5.2.4) is convergent. In (5.2.2) we change the variables as follows:

C>1

,

„ y = ———————— , «2

~ t=-r-, OJ&2

U

> = \J--

,. „ 5 2

KX

- -5

V "2

This allows us to rewrite system (5.2.2) in the form

dx u . , — = x = y + exy - —— (1 + ex), dt eb2u , , ay . s a^bi — =y=-x+-xy, £=——. dt us t>2

,, „•,

0 (5.2.0)

In this case, the functional (5.2.4) to be minimized acquires the form 1 r°° /[£] = —— / (Clalx2(t) + c2bl^y2(t)} dt.

(5.2.7)

W»2 Jo

In the new variables (x, y), the goal of control is to transfer the system to the origin (x = y = 0), and the range of admissible values is bounded by the

Control of Oscillatory Systems

269

quadrant x > — l/£, y < w/e (since the initial variables are nonnegative,

x,y > 0 ) . We assume that the admissible control is bounded by a small value. To this end, we set 7 = e2-y in (5.2.3). Then, changing the scale of the controlling function, u = £2u, we can write system (5.2.6) and the constraint (5.2.3) as fit

x — y + exy — -—- (1 + ez),

y — —x + sxy,

(5.2.8)

»2W

0 < u < -y.

(5.2.9)

Thus the desired optimal control u* can be found from the condition that the functional (5.2.7) attains the minimum value on the trajectories of system (5.2.8) with constraint (5.2.9) imposed on the control actions. In this case, we seek the control in the form uf = w*(a;(£), y ( t ) ] . 5.2.2. Approximate solution of problem (5.2.7)— (5.2.9). In the case of "poorly adapted" predators, the number £ in (5.2.8) is small, and system (5.2.8) is a special case of the controlled quasiharmonic oscillator (5.1.1). Therefore, the method of §5.1 can immediately be used for solving

problem (5.2.7)-(5.2.9). The single distinction is that admissible controls are subject to nonsymmetric constraints (5.2.9); thus the antisymmetry property (5.1.11) of the optimal control is violated. As a result, it is impossible to write the desired controls in the form (5.1.18). However, as is shown later, no special difficulties in calculating the quasioptimal controls in problem (5.2.7)-(5.2.9) arise due to this fact. On the whole, the scheme for solving problem (5.2.7)-(5.2.9) repeats the approximate synthesis procedure described in §5.1. Therefore, in what follows, the main attention is paid to distinctions in expressions and formulas

caused by the special nature of problem (5.2.7)-(5.2.9). Just as in §5.1, by changing variables according to formulas (5.1.15)13 we transform system (5.2.8) to the following equations for the slowly changing amplitude and phase (5.1.16):

Now, instead of (5.1.17), we have the following expressions for the functions

G(A, $) and H(A, $) only:

G(A, $) = g(A, *) - uc(A, $) - eu'c(A, $),

H(A, $) = h(A, *) - us(A, *) - eu't(A, *), With the obvious change in notation: xi = x and x? =• y.

270

Chapter V

g(A, $) = A2 sin $ cos 4>(sin <J> — cos $), h(A, $) = j4sin$cos$(sin$ + cos$),

sin $,

«'. (^, *) = -

(5.2.10)

1

sin $ cos $.

The passage to Eqs. (5.1.23) for the nonvibrational amplitude .A* and phase ip* is performed, as above, by using formulas (5.1.21)-(5.1.24). The

terms G±, H-[, G%, • • • in the asymptotic series in (5.1.23) are calculated from (5.1.24), (5.2.10) by the method of successive approximations. In particular, in the first approximation, instead of (5.1.26)-(5.1.28), we have

*,$*), i **

frj(A*) = -«,(A*,**),

(5.2.11)

[ f f (^,$)- U c (^,$) + «c]d$,

(5.2.12)

*)-«.(^*,$) + «.]d*.

(5.2.13)

In (5.2.11)-(5.2.13) we took into account the fact that, in view of (5.2.10), we have

g = g(A*,$*) =

g(A* ,$) d$ = 0,

h = 0.

For the second term of the asymptotic series on the right-hand side of

Eq.(5.1.23), instead of (5.1.31), we have

By §5.1 the quasioptimal controls ui(A*, 3>*),U2(A*, $*),... are found from the condition that the partial derivative dI(A*,$*)/dA* attains its minimum. In view of (5.1.50) and (5.1.52), this condition is equivalent to the condition that G\(A*) attains its minimum (in the first approximation) or

Control of Oscillatory Systems

271

the sum G\(A*} +£GJ(^4*) attains its minimum (in the second approximation). It follows from (5.2.9), (5.2.10), and (5.2.11) that minimization of GA* means maximization of _______

1

uc = uc(A*, $*) = — / 2-7T

u(jT,$)cos$d$->- max . -

(5.2.15)

This fact immediately implies the following implicit formula for quasioptimal control in the first approximation:

«i(A*,$*) = -(signcos$* +1). Zi

(5.2.16)

Taking into account formulas (5.1.15) and (5.1.21) for the change of variables, we can write x = A* cos 4>* with accuracy up to terms of the order of £. This fact and (5.2.16) readily imply the following expression for the

synthesis control in the first approximation in terms of the variables (x, y):

ui(x,y) = ^(signx + I).

(5.2.17)

Thus, in the course of the control process, the controlling action assumes only the boundary values from the admissible range (5.2.9) and is switched from the state MI = 0 to the state MI = 7 (or conversely) each time when the representative point (x, y) intersects the y-axis (the switching line in the

first approximation). We also point out that, according to (5.2.5), in the variables (x, y) corresponding to the original statement of problem (5.2.2)(5.2.4), this control algorithm leads to the switching line that is the vertical line passing through the point x = x* — 62/^1 on the abscissa axis; this point determines the number of prey if system (5.2.1) is in equilibrium. To find the optimal control in the second approximation, we need to minimize the expression Gi(A*) + £G2(A*) — F(A*,u). The functions

Gl(A*} = Gl(A*,u) and G$(A*) = G$(A*,u) are calculated by formulas (5.2.11) and (5.2.14) with regard to (5.2.10), (5.2.12), and (5.2.13). In actual calculations by these formulas, it is convenient to use the fact that the difference between the optimal control u2 (A* , $* ) in the second approximation and (5.2.16) must be small. More precisely, we can assume that on the one-period interval of the fast phase $* variation, the optimal control in the second approximation has the form of the function shown in Fig. 38 (the solid lines), where AI and A2 are the phase shifts of the switch times with respect to the switch times of the control in the first approximation

(the dashed lines); these variables are small (Ai, A2 ~ e). This fact allows us, without loss of generality, to seek the control algo-

rithm u2(A*,$*) in the second approximation immediately in the form u2(A*, $*) = 1 {sign[cos($* - ^1) - sin p2] + 1} .

(5.2.18)

Chapter V

272

A2 37T/2

7T/2

27T

$*

FIG. 38 Here
are related to AI and A2 as

and y>2 =

A2-Ai

Ai+A2

(5.2.19)

and hence, are also of the order of e. If the desired control in the second approximation is written in the form (5.2.18), then there are at least two advantages. First, in this case, we can minimize F(A*,u) = GJ(A*) + eG^(A*) by finding the minimum of the known function F(A*,
calculate G\ and G*2 by formulas (5.2.11) and (5.2.14) using the fact that ipi and <£>2 are small ((pi, ip^ ~ e ) ) . From (5.2.10), (5.2.11), and (5.2.18), we obtain •=__\_t 27T&2W JO

27T&2w

(5.2.20)

Since y>i,y> 2 ~ e, it follows from (5.2.20) that the maximal terms (depending on ipi and y> 2 ) in the expansion of (5.2.20) in powers of e are of the order of e 2 . Therefore, to calculate the second term eG*2 in the function F(A*, 2 ) = Gi + eG^ to be minimized, we can retain only terms of the order of e2 and neglect the terms of the order of e3 and of higher orders.

273

Control of Oscillatory Systems

R

FIG. 39 With regard to this remark we calculate the mean values on the right-hand side of (5.2.14) and thus see14 that we need to minimize the function

(5.2.21) to obtain the optimal values of (f>\ and if 2 in the second approximation. From the condition dF/dipi = dF/dtp? = 0 necessary for an extremum, we obtain the desired optimal values

i = 0,

ip2 = ¥>2(A*) = -

ub\A*2'

(5.2.22)

Expressions (5.2.22) determine (in the polar coordinates) the switching line for the optimal control in the second approximation. The form of this

line on the phase plane (a;, y) is shown in Fig. 39. The neighborhood of the origin in the interior of the circle of radius R = 2e7/w&2 is the region where the quasiharmonic character of the phase trajectories is violated. Generally speaking, the results obtained here are not authentic, and we need to use some other methods for constructing the switching line in this region. 5.2.3. Comparative analysis of different control algorithms. It

is of interest to compare the results obtained in the preceding subsection 14

Here we omit cumbersome elementary transformations leading to (5.2.21). To ob-

tain (5.2.21), we need to use formulas (5.2.10), (5.2.12), (5.2.13), and (5.2.18) and the technique used in Section 5.1.3 for calculating average values.

274

Chapter V

with the solutions of similar synthesis problems obtained by other methods. To this end, we can use the results discussed in §7.2 (see also [105]), where we present a numerical method for solving the synthesis problem for the "normalized" predator— prey system controlled on a finite time interval. In §7.2 we consider the optimal control problem in which the plant equa-

tions, the constraints on admissible controls, and the optimality criterion have the form

— = x(l - y) - ux, d l_

x(Q) = x0 > 0, (5.2.23) y0>Q,

0 < r < T,

0 < U(T) < 7,

(5.2.24)

[(l-2f(T)) 2 + ( l - y ( r ) ) 2 ] d r - > O

min

.

(5.2.25)

0<«(r)<7 0
In this case, in §7.2 we derive the optimal control ut (r, x, y) in the synthesis form by solving the Bellman equation corresponding to problem (5.2.23)(5.2.25) numerically.

Note that problem (5.2.23)-(5.2.25) turns into problem (5.2.2)-(5.2.4) if the following assumptions are satisfied:

~ T = ait,

_ x

i~ = —x,

_ 2~ y= — y,

,

6= —2 ,

„,

_

7 = 017,

(5.2.26) "l

Cl = 72 , 2

*^2

C2 = ~2 ,

T -)• 00.

Cti-i

We also note that, in view of the changes of variables (5.2.5) and (5.2.26), the quasioptimal control algorithm in the first approximation (5.2.17) acquires the form

«iC=. V) = |[sign(^ - 1) + 1].

(5.2.27)

To estimate the effectiveness of algorithm (5.2.27), we performed a numerical simulation of the normalized system (5.2.23). Namely, we constructed a numerical solution of (5.2.23) on the fixed time interval 0 < T < T = 15 for three different algorithms of control u: (1) the optimal control u — M*(T, x, y); (2) the optimal stationary control u = ust(x,y)

corresponding to the case where the terminal time T —>• oo in problem (5.2.23)-(5.2.25); (3) the quasioptimal control in the first approximation (5.2.27).

275

Control of Oscillatory Systems

2.

10

12

14

T

FIG. 40 For these three control algorithms, the transient processes in system (5.2.23) are shown as functions of time in Fig. 40 and as phase trajectories in Fig. 41. Moreover, the following parameters of problem (5.2.2)-(5.2.4)

were used for the simulation: a\ = «2 = ^i = ^2 = 0.5, 7 = 0.125, e = 0.5, w = 1, 7 = 0.5, GI = 02 = 1 (in problem. (5.2.23)-(5.2.25), to these values there correspond 7 = 0.25 and b = 1).

1 -

2 x

FIG. 41 Comparing the curves in Figs. 40 and 41, we see that these three algorithms lead to close transient processes in the control system. Hence,

276

Chapter V

the second and the third algorithms provide a sufficiently "good" control. This fact is also confirmed by calculating the quality functional (5.2.25) for these three algorithms, namely, we obtain I[ut(T,x,y)] = 4.812, I[u'f(x,y)] = 4.827, and I[ui(x,y)] = 4.901. Thus, any of these algo-

rithms can be used with approximately the same result. Obviously, the simplest practical realization is provided by the first-approximation algorithm (5.2.27) obtained here; by the way, this algorithm corresponds to

reasonable intuitive heuristic considerations of how to control the system. Indeed, according to (5.2.27), it is necessary to start catching (shooting, etc.) every time when the prey population size becomes larger than the equilibrium size (for the normalized dimensionless system (5.2.23), this equilibrium size is equal to 1). Conversely, as soon as the prey population size becomes smaller than the equilibrium size, any external action on

the system must be stopped. It should be noted that the situation when the first-order approximation allows one to obtain a control algorithm close to the optimal control is rather typical not only of this special case but also of other cases where the small parameter methods are used for solving approximate synthesis problems for control systems. This fact is often (and not without success) used in practice for solving special problems [2, 33]. However, it should be noted that this fact is not universal. There are several cases where the first-approximation control leads to considerable increase in the value of

the functional to be minimized with respect to its optimal value. At the same time, the higher-order approximations allow one to obtain control algorithms close to the optimal control. Some examples of such situations (however, related to control problems of different nature) are examined in §6.1 and in [97, 98]. §5.3. Optimal damping of random oscillations

In this section we consider the optimal control problem for a quasiharmonic oscillator, which is a stochastic generalization of the problem studied in §5.1. Therefore, many ideas and calculational formulas from §5.1 are widely used in the sequel. However, it should be pointed out that the foundations underlying the approximate synthesis methods in these two sections are absolutely different. In §5.1 the quasioptimal controls are obtained by straightforward calculations and minimization of the cost functional, while in the present section the approximate synthesis is based on an approximate method for solving the Bellman equation corresponding to the problem in question. 5.3.1. Statement of the problem. Preliminary notes. Here we

consider a stochastic version of problem (5.1.3)-(5.1.6)as the initial synthe-

Control of Oscillatory Systems

277

sis problem. We assume that the quasiharmonic oscillator (5.1.3) is subject to small controls eu = eu(t) and, in addition, to random perturbations of small intensity

x + £x(x,x)x + x = -£u + ^/eB£(t),

(5.3.1)

where £(i) denotes the standard scalar white noise (1.1.31) and B > 0 is a given number. The admissible controls u = u(t), just as in (5.1.5), are subject to the constraints \u(t)\
and the goal of control is to minimize the mean value of the functional

U

T

-I

c(x(t),x(t))dt

J

->•

min . [«(*))<«„

(5.3.3)

0
The nonlinear functions x(x, x) an(i c(x, x) in (5.3.1) and (5.3.3), just as in §5.1, are assumed to be centrally symmetric, %(x, x) = x(— xi — x) and c(x, x) — c(— x, — x). Next, it is assumed that the penalty function c(x, x) is nonnegative and possesses a single minimum at the point (x = 0, x = 0) and c(0, 0) = 0. Let us introduce the coordinates x\ = x, x 2 = x and rewrite (5.3.1) as

xi = x 2 ,

*2 = -xi -ex(xi,x2)x2 +eu + VeB£(t).

(5.3.4)

Then, using the standard procedure from §1.4, for the function of minimum future losses

t
U

T

I

C(XI(T), x 2 (r)) dr \ x^t) = xi, x2(t) = x2\, J

(5.3.5) we obtain the Bellman differential equation

dF

, . 9F xi + £x(xi,x2)x2-—+

. mm

dF <9x 2 _ (5.3.6)

corresponding to problem (5.3.1)~(5.3.3).

278

Chapter V

It follows from (5.3.6) that the desired optimal control ut (t, x\, x2) can be written in the form

dF u*(t, zi, x2) = -um sign -—(t, zi, x2), OX'2

(5.3.7)

where the loss function F(t,xi,x2) satisfies the following semilinear equation of parabolic type:

dF

dF

dF e_B^d2F

T

(5.3.8)

Equation (5.3.8) and the fact that the functions X(KI, x2) and G(XI, x2] are symmetric imply that F = F(t, KI, 3:2)5 satisfying (5.3.8), is symmetric with respect to the phase coordinates, that is, F(t, xi, x2) = F(t, —xi, — x2). This and formula (5.3.7) show that the optimal control (5.3.7) possesses an important property, which will be used in what follows; namely, the optimal control (5.3.7) is antisymmetric (see (5.1.11)):

We also stress that in this section the main attention is paid to solving the stationary version of problem (5.3.1)-(5.3.3), that is, to solving the control problem in which the terminal time T —>• oo. In the nomenclature of [1], problem (5.3.1)-(5.3.3) as T —>• oo is called the problem of optimal stabilization of the oscillator (5.3.1). 5.3.2. Passage to the polar coordinates. The Bellman differen-

tial and functional equations. By using the change of variables (5.1.15), we transform Eqs. (5.3.4) to equations for the slowly changing amplitude A and phase ip:

,u,t),

3>=t + p,

(5.3.10)

where

H(A,$,u,t) = H(A,$,u)- —T-£c(t),

£c(t) =.B 1/2 ^(<)cos$,

(5.3.11) and the functions G(A, $, u) and H(A, ,w) are determined by (5.1.17).

Control of Oscillatory Systems

279

Note that the right-hand sides of the differential equations (5.3.10) for the amplitude and phase contain a random function £(t) that is a white noise. Therefore, Eqs. (5.3.10) are stochastic equations. The expressions (5.3.11) for G and H are derived from (5.3.4) and (5.1.15) by changing the variables according to the usual rules valid for smooth functions (,(t). Thus

it follows from §1.2 that the stochastic equations (5.3.4) and (5.3.10) are equivalent if they are symmetrized.15

We also note that by passing to the polar coordinates (which become the arguments of the loss function (5.3.5)), we can equally use either the set (A,
For the loss function F(t, A, $) defined by analogy with (5.3.5),

F(t,A,*)=

min

E[ /

|«(r)|<« m

Cl(A(r),

$(r)) dr \ A(t) = A, *(*) = $1 ,

Ut

J

t
ci(A,$) = c(Acos$, -ylsin$),

(5.3.12)

we can write the basic functional equation of the dynamic programming approach (see (1.4.6)) as

min

"' T /"'

E /

\u(r)\
+A

ci(Ar, $ r ) dr + F(i + A,

Ut

t
(5.3.13) This equation expresses the "optimality principle." It is important to stress that relation (5.3.13) holds for any time interval A (not necessarily small).

This fact is important in what follows. But if A -> 0 in (5.3.13), than, using (5.3.10) and (5.3.11), we can readily obtain (see §1.4) the following Bellman differential equation for the function (5.3.12):

0
More precisely, for Eqs.

F(T,A,$) = Q,

(5.3.14)

(5.3.10) it is important to take into account the sym-

metrization property, since these equations contain a white noise £(i) multiplicatively with expressions that depend on the state variables A and f. As for Eqs.(5.3.4), they have

the same solutions independent of whether they are understood in the Ito, Stratonovich, or any other sense.

280

Chapter V

where L denotes the operator n 2 $ a2 2

dA

sin 2$

82

1A dAd®

cos 2 $ d2

2A2 cos2 $ d

sin 2$ d

The last two terms in (5.3.15) appear due to the fact that the stochastic equations (5.3.10) are symmetrized. If we change the time scale and pass to the slowly varying time t = et, then Eq. (5.3.14) for the loss function F(t, A, $) acquires the form

^ ' u + H ( A ' ^ - <5-3-16' It follows from (5.3.16) that the derivatives of the loss functions with respect to the amplitude and the fast phase are of different orders of magnitude (if dF/dA ~ 1, then 6F/d3> ~ e). This fact, important for the subsequent considerations, follows from the quasiharmonic character of the motion of system (5.3.4). Equation (5.3.16) can be simplified if, just as in §1.4, §2.2, §3.1, etc., we consider the stationary stabilization of random oscillations in system (5.3.4). In this case, the upper limit of integration T —>• oo in (5.3.5) and (5.3.12). The right-hand side of (5.3.12) also tends to infinity because of random perturbations £(t). Therefore, to suppress the divergence in the

stationary case, we need to consider the following stationary loss function (see (1.4.29), (2.2.9), and (4.1.7)):

/(A, *) = lim [F(t, A, $) - 7(eT - ?)], T—>oo

(5.3.17)

where the constant 7 characterizes the mean losses of control per unit time in the stationary operating conditions. For the function (5.3.17), we have the stationary version of Eq. (5.3.16): df_

3$

(5 3 18)

-"

Just as in §5.1, taking into account the relay (5.3.7) and the antisymmetry (5.3.9) properties, without loss of generality, we can seek the optimal

Control of Oscillatory Systems

281

control u*(A, «fr), which minimizes the expression in the square brackets in (5.3.18), in the set of controlling actions of the form (5.1.18):

u(A, $) = um sign [sin (* - pr(A))].

(5.3.19)

This allows us to rewrite Eq. (5.3.18) in the form

df - min \G(A,$>,

(5.3.20)

where G(A, $, tpr) and H(A, $, y? r ) denote the functions obtained from

(5.1.17) after the substitution of the control u(A,<&) in the form (5.3.19). Thus, solving the synthesis problem is reduced to finding the function f*(A) that minimizes the expression in the square brackets in (5.3.20) and determines (in polar coordinates) the equation for the switching line of the controlling actions ut = ±WTO under the optimal control u* (A, $). To calculate the function
desired function
Now let us write the functional equation (5.3.13) for the time interval

A = 2-7T. With regard to (5.3.19), we can write (•J + 2JT

mm

"~

Pr(Ar

(5.3.22) Since the loss function (5.3.12) is periodic in the variable <J>, we have F(t, A, $) = F(t, A, $ - 27r). This and (5.3.10) imply that relation (5.3.22) can be rewritten as

F(t,At,$t)=mmE\

ci(A r ,* r ) dr t

+ F(t + 27T, At + eAA, $t + gA^) ,

(5.3.23)

282

Chapter V

where * +2jr

eAA

/J,.

~

G(AT,3>T,uT,

t rt+2n

(5.3.24)'

v

«+2?r

H(AT, $ r , UT, T) dr

/ * + 27T

tf(,4 r ,* r ,y> r (A r )

Using, just as in (5.3.16), the "slow" time t = et and expanding F(t + 2ire, At + eAA, $t + zAip) in the Taylor series, we rewrite (5.3.23) in the form

r

/•*

mmEle fr

I Jt

dF ~ dF ~(t + 2ne, At, $ t ) + eA^— (

(5.3.25) In the stationary case considered in what follows, Eq. (5.3.25) acquires the form

27r r /-*+ mm inE\e I c (A

l

I Jt

(5.3.26)

Equation (5.3.26) is of infinite order and looks much more complicated than the differential equation (5.3.20). Nevertheless, since the differences eA.A and e&ip are small, the higher-order derivatives of the loss function in (5.2.26) are, as a rule, of higher orders of magnitude with respect to powers

of the parameter e. This allows us, considering terms of more and more

Control of Oscillatory Systems

283

higher order of e in (5.3.26) successively, to solve equations of comparatively non-high orders and then to use these solutions for approximate solving of the synthesis problem. In practice, in this procedure of approximate synthesis, special attention

must be paid to a very important fact that simplifies the calculations of successive approximations. Namely, in this case, there are two equations, (5.3.20) and (5.3.26), for the same function f(A, $). Thus, combining both these equations, we can exclude the derivatives df/d$, d2f/dAd$, ... of the loss function with respect to the phase from (5.3.26) and thus to decrease the dimension and to turn the two-dimensional equation (5.3.26) into a one-dimensional equation. It is convenient to exclude the derivatives with respect to the phase, just

as to solve Eqs. (5.3.26), by using the method of successive approximations. 5.3.3. Approximate solution of the synthesis problem. To apply the method of successive approximations, we need to calculate the mean

value of the integral /.« /.«+2jr

-I

Cl(AT,^T)dT\

e Jt

\

(5.3.27)

in (5.3.26) and the mean values of the amplitude and phase increments

E(eAA), E(^), E[(eAA) 2 ], E[(e^A)(e
(5.3.28)

over the time 2?r. By using system (5.3.10), we can calculate expressions (5.3.27) and (5.3.28) with arbitrary accuracy in the form of series in powers of the small parameter e. Let us write

G(A, *, um sign[sin($ - <pr(A))], t) = G(A, $, *),

^

H(A, *, um sign[sin($ -
lo.o.^y )

Then it follows from (5.3.10) that the increments of the amplitude A and the slow phase

At+r — At = £6AT = £ I G(At+T', $t+T', T') dr', Jo

Jo

(5.3.30) , $t+T', r1) dr'.

By using formulas (5.3.30) repeatedly, we can present e8AT and £6ipT as the series

£8AT = •e3...,

284

Chapter V

where

e&iAr = e j

G(At, $t + T', T') dr',

(5.3.32)

(5.3.33)

e3AS3AA T = e

T

1

^,

j dT '

(5 3>34)

-

= £ f H(At^t + r', T') dT', ./o

(5.3.35)

Tl

TL ,

Lr

a* (5.3.36)

The increments (5.3.24) are calculated by formulas (5.3.31)-(5.3.36) with regard to (5.1.17), (5.3.10), and (5.3.11) as

eAA — sSA^m

£& — £62*-

(5.3.37)

Finally, we need to use (5.3.31)-(5.3.37) and average the corresponding expressions with respect to (,(t), taking into account (1.1.31).

In a similar way, using formulas (5.3.32)-(5.3.36), we can also calculate the integral in (5.3.27) as a series in powers of e. Indeed, writing

:/ Jt

ci(AT,$T)dT = e

Jo

ci(AT,$t + r)dr

substituting (JiA r ,<5i^ r ,... given by formulas (5.3.32), (5.3.35), ..., and averaging with respect to £,(t), we obtain the desired expansion for (5.3.27). In practice, to use this method for calculating the mean values of (5.3.27) and (5.3.28), we need to remember that formulas (5.3.30)-(5.3.38) possess a

Control of Oscillatory Systems

285

specific distinguishing feature relative to the fact that the random functions in expressions (5.3.29) have the coefficients e"1/2:

G(A, <M) = G(A, $, <pr) - e = XA(A*)-«.( ,-1/2 U

v

c(A*)

(5.3.39)

-

FB

COS®

—

(formulas (5.3.39) follows from (5.1.17), (5.3.11), and (5.3.29)). Thus, terms of the order of e"1 appear in £^2^.2^) E^s^*-, . . ., E^^TD E^^TM • • •• Therefore, in the calculations of the mean values of (5.3.27) and (5.3.28), the number of terms retained in the expansions (5.3.31) must always be larger by 1 than needed for the accuracy desired (if, for example, we need

to calculate the mean values of (5.3.27) and (5.3.28) with accuracy up to terms of order of es , then we need to retain (s + 1) terms in the expansions (5.3.31)). For example, let us calculate the first term in the expansion of the mean value E(eAyl). From (5.3.32) and (5.3.35), we have

o

G(At, *« + r', <pr(At)) - i - £ ( r ' ) sin($« + r') dr', V £ J (5.3.40) dr'.

(5.3.41) Averaging (5.3.40) with respect to £(t) and taking into account the properties of the white noise, we obtain

E6j.Ate = 2TrG(At, <pr) = 2n[xA(At) - us (At, <pr)],

(5.3.42)

where the bar, as usual, indicates averaging with respect to the fast phase over the period (e.g., XA(At) = ^ J^ XA(At, *) d&), and us(At,ipr) = us is given by (5.1.34)). Next, it follows from (5.3.33), (5.3.40), and (5.3.41) that 27T

Qf-i

Chapter V

286 f

L\G(At,

Jo

*« + r', pr(At)) - J^-tV) sin(*« + r')] dr*

v e

J

H(At,

-r.

(5.3.43) Averaging (5.3.43) with respect to £(t) and taking into account (1.1.31) and (1.1.32), we obtain D

o D

i o

/-2JT 2JT F

= —- / zAt JQ

1

/ <J(r' - r) cos($t + r') cos($( + r) dr' \dr + D Uo J

where /»27T

r

'

J0

1

ftC

•jf dr.

Finally, from (5.1.34), (5.3.31), (5.3.37), (5.3.42), and (5.3.44) we obtain the desired mean value

E(eAA) = e27r = e2?r

-e2...

-u,(At,

—— (5.3.45)

= -UTO/TT). In a similar way, we obtain

At

•e2...

(5.3.46)

Control of Oscillatory Systems

287

For the other mean values of (5.3.27) and (5.3.28), in the first approximation in e, we have

r

rt+2x i 2 g/ Cl(Ar,3>T}dT \=e2ircl(At) + e ...,

L Jt

1

E(eAA)2 = eBTr + £2...

(5.3.47) (5.3.48)

All the other mean values E[(eA4)(eAy>)], E(eA^>) 2 ,... in (5.3.28) are higher-order infinitesimals in e. Now let us calculate successive approximations of the Bellman equation (5.3.26). Simultaneously, with the help of Eq. (5.3.20), we shall exclude the derivatives of the loss function with respect to the phase from (5.3.26). The first approximation. We represent the loss function f(A, $) as the series

f ( A , *) = h(A, $) + ef2(A, *) + e2 . . . ,

(5.3.49)

substitute into Eq. (5.3.26), and retain only terms of the order of e (omitting the terms of the order of e2 and of higher orders). Since, in view of (5.3.20), df/d$ ~ e, using (5.3.45)-(5.3.48), we obtain the following equation of the first approximation from (5.3.26):

=0. (5.3.50) In (5.3.50) we calculate the minimum with respect to <pr under the assumption that dfi/dA > 0 and thus obtain the expression

*r(A) = V\(A} = 0.

(5.3.51)

for the minimizing function
Comparing the result obtained with the approximate synthesis result (5.1.55) for a similar deterministic problem, we see that, in the first approximation in e, the perturbation £(t) in no way affects the switching line. Just as in the deterministic case (5.1.55), (5.1.56), the switching line coincides with the abscissa axis on the phase plane (x, x) for any type of nonlinearity, that is, for any function x(*,i) in Eq. (5.3.1).

288

Chapter V

To find the switching line in the second approximation, we need to calculate the derivative djijdA satisfying the differential equation (5.3.52), where the stationary error 7 is not jet found. But we can readily show how to calculate this error. Namely, since the stationary error is defined (in the

probability sense) as the mean penalty value (see (1.4.32)), we have

7= / Jo

Pl(A)cl(A)dA,

(5.3.53)

where pi (A) is the stationary probability density for the distribution of the amplitude A. The Fokker-Planck equation that determines this stationary

density is conjugate to the Bellman equation. Therefore, in the case of (5.3.52), the equation for pi(A) has the form

For the zero probability flow (see §4, item 4 in [173]), Eq. (5.3.54) has the solution (5.3.55) where the constant C is determined by the normalization condition

C=

r°°

f i r fA l) Aex.pl - -[877,4-4 / xA(A')dA'\ \dA.

(5.3.56)

As soon as 7 is known, we can solve Eq. (5.3.52). The unique solution of this equation is specified by the condition that the function dfi/dA must behave as A —> oo just as in the deterministic case (that is, in (5.3.1) the random perturbations £(t) = 0). This assumption on the solution of

Eq. (5.3.52) is quite natural, since, obviously, the role of the diffusion term in the equation decreases as A increases (similar considerations were used in §2.2). It follows from (5.3.52) that if there are no perturbations (B = 7 — 0), then this equation has the solution

_dh_ -Cl(A) dA ~ ~xA(A)-2p Therefore, the diffusion equation (5.3.52) has the solution

=

7 _ 5l

«„ (5.3.58)

Control of Oscillatory Systems

289

Now we can verify whether the derivative dfi/dA is positive (this was our assumption, when we derived (5.3.51)). It follows from (5.3.58) that this assumption is satisfied for not too small values of the amplitude A. Therefore, if we solve the synthesis problem by this method, we need not consider a small neighborhood of the origin on the phase plane (», x). Just

as in the deterministic case in §5.1, it is clear from the "physical" viewpoint that the controlling action u and the perturbations £(t) lead to appearance of a neighborhood where the quasiharmonic character of the phase trajectories is violated. The second approximation. To obtain the Bellman equation in the

second approximation, we retain the following expression in (5.3.26):

rJt

minE<e / fr

ci(AT, $T] dr —

The other terms in (5.3.26) are necessarily of orders larger than that of e2. The derivatives dfi/dQ, d2fi/dAd&, . . . of the loss function with respect to the phase can be eliminated from (5.3.59) by using (5.3.20). Hence we have

(5.3.60) To find the function y>* (A) that minimizes the expression in the braces in (5.3.59), we shall consider only the terms in (5.3.59) that depend on the control (or, which is the same, on <pr(A)). In this case, we shall use the fact that the minimizing function (f* (A) is small in the second approximation:
and ~ e3. Clearly, it is no longer sufficient to have only formulas (5.3.45)-(5.3.48) for the mean values of (5.3.27) and (5.3.28) in the first approximation.

In the expansions (5.3.45)-(5.3.48) we need to calculate the terms ~ e2. Following the way for calculating (5.3.30)-(5.3.38) and retaining only expressions depending on ipc = ey>2 in the terms of the order of e2, we see that, in the second approximation, formulas (5.3.45)-(5.3.48) must be

290

Chapter V

replaced by

, <3>) — UC(A, i (5.3.61)

E(e&A)2 = eBTt - c-^-uc(A, <pr) sin2 $,

(5.3.62)

E(eAp) = _e2™*(Air) ^ .A

(5.3.63)

[

/•4 + 2JT

E e I/

Jt

"I / J Ji N I ci(A T,®T)dT\ = \

A

'

°

'

(5.3.64)

where G(A, $) and ci(yl, $) denote the purely vibrational components of

the functions G(A,^,ipr) = G(A,(pr) + G(A,$) and a(A, $) = ci(^) + By using (5.3.60)-(5.3.64), (5.1.34), (5.3.42), and (5.3.59), we see that the desired function if*(A) = e(p2(A), which determines the switching line in the second approximation, can be found by minimizing the expression

uc (A, pjc! (A, *) - « c (A

uc(A, Vr

(5.3.65) We collect similar terms in (5.3.65) with the help of (5.1.34)-(5.1.36).

Control of Oscillatory Systems

291

As a result, we obtain

TT uc(A, $)ci(.A, $)

2um

/_

B\ .

VA

dfi/dA

1A1

r

(5.3.66)

}-

In the following two examples, we calculate the function if* (A) for which (5.3.66) attains its minimum. EXAMPLE 1. Suppose that the plant to be controlled is a linear system. In this case, x(xi i) = 1 in (5.3.1), and it follows from (5.1.17) that _

A

X A (A) — — — j 2

A

XA(A,$) = —cos2$. 2

For simplicity, we assume that the vibrational component of the penalty function ci(A, $) = 0 (this holds, e.g., if c(x, x) = x2 + x2 in (5.3.3)). Then, in view of (5.1.44) and (5.1.45), the expression (5.3.66) acquires the form

5/Af

, £ \ A (I . A[ 4 V3 A B\ .

— cos ip r + — — — I — sin 3<pr + sin (pr

7 - ciU)

The condition dN/dy>r = 0 leads to the following equation for the desired function if* (A):

sin +£

** -\cos3^ + (\ -

+

cos

* =

Representing the desired functions if* (A) in the form of asymptotic expan-

sion
Formula (5.3.68) determines the switching line of the suboptimal control u2(A, $) = um sign[sin($ — ep^A))] in the second approximation.

292

Chapter V

In (5.3.68), 7 is calculated by formula (5.3.53) with the stationary probability density

F(u) = V '

.0

(5.3.69) Here the derivative dfi/dA, determined by (5.3.58) in the general case, has the form

dA

BA f°° , x / [ci(A') -7]A

,

r 1 ,9 ,1 , exp - — (A'2 + 8/iA') dA'.

, (5.3.70)

Since •7 —>• 0

and

3/i ci(A) —— —> ———^—7—-

u

as

JD —>• 0;

A

one can readily see that formula (5.3.68) coincides as B —> 0 with the

corresponding expression (5.1.60) for the switching line of the deterministic problem. EXAMPLE 2. Let us consider a nonlinear plant with x(xi x) = x2 — I in (5.3.1) (in this case, the plant is a self-exciting van der Pol circuit). For such a system, it follows from (5.1.17) that

x(A) = |--g-,

x A (A,$) = -|cos2$ + ^-cos4$.

(5.3.71)

Substituting (5.3.71) into (5.3.66) and using (5.1.44) and (5.1.45), from (5.3.66) and the condition dN/dipr = 0 we derive the expression for the switching line in the second approximation, which coincides in form with

the expression obtained in the previous example. However, now the loss function and the stationary error in (5.3.68) must be calculated in a different way.

So, in this case, the stationary probability density (5.3.55) for the distribution of the amplitude has the form

(5-3-72)

293

Control of Oscillatory Systems where C is the normalization constant: A4_ _ r00 A„ exp ri| —j— _ /— ! (^ C — I/ CXP B V 8 ~ Jo L~ V"8~ ~

(5.3.73)

The stationary error 7 in (5.3.68) is calculated by formula (5.3.53) with the help of (5.3.72) and (5.3.73). The expression for dfi/dA can be obtained from (5.3.58) with regard to (5.3.71). As a result, we see that the derivative dfi/dA in (5.3.68) has the form

9/1

°°

+

^ A'(

dA'.

Just as in Example 1, formula (5.3.68) coincides as B —>• 0 with the corresponding expression

obtained in §5.1 (see (5.1.63)) for the deterministic problem.

_2

-1

-1 U = Un

FIG.

42

The influence of random perturbations on the position of the switching

line in the second approximation is shown in Fig. 42, where four switching

294

Chapter V

lines for the linear quasiharmonic system from Example 1 are depicted. Curve 1 corresponds to the deterministic problem (B — 0). Curves 2, 3, and 4 show the switching lines in the stochastic case and correspond to the white noise intensities B = 1, B = 5, and B = 20, respectively. These switching lines correspond to the quadratic penalty function c(x, x) =

x2 + x2 in the optimality criterion (5.3.3) and the parameters um = 1 and e = 0.25 in problem (5.3.1)-(5.3.3). The dashed circle in Fig. 42 approximately indicates the domain where the quasiharmonic character of the phase trajectories of the system is violated. In the interior of this domain, the synthesis method studied here may lead to large errors, and we need to employ some other methods for calculating the switching line near the origin.

5.3.4.

Approximate synthesis of control that maximizes the

mean time of the first passage to the boundary. As another ex-

ample of the method of successive approximations treated above, let us consider the synthesis problem for a system maximizing the mean time during which the representative point (x(t), x(t)} first comes to the boundary of some domain on the phase plane (x,x). For defmiteness, we assume that this domain is the disk of radius Ro centered at the origin. As before, we consider a system whose behavior is described by Eq. (5.3.1) with constraints (5.3.2) imposed on control. Passing to the polar coordinates and considering the new state variables A and as functions of the "slow" time t = et, we transform Eq. (5.3.1) to the system of equations of the form

at

, ,,,

at

£

,u,t),

(5.3.74)

where the functions G and H are given by (5.3.11) and (5.1.17). By using Eq. (5.3.74), we can write the Bellman equation for the problem in question. It follows from §1.4 that the maximum mean time during which the representative point (A(r), $(r)) achieves the boundary (the loss function for the synthesis problem considered) can be written as (see (1.4.38))

U

oo

-i

W(T,AT,$~\dT\. J

(5.3.75)

Recall that W(T, Af, 3>j) denotes the probability that the representative

point with the polar coordinates (Af, 3>j-) at time t does not achieve the boundary of the region of admissible values during the time (T — t). For the optimality principle (see

(1.4.39)) corresponding to the function (5.3.75),

Control of Oscillatory Systems

295

we can write the equation

t
U

t+A

_ 1 W(T, Aj, %) dr + F(A^+A, % +A ) . -I

(5.3.76) By letting the time interval A —>• 0, in the usual way (§1.4), we obtain the following differential Bellman equation for the function F(A, <J>):

dF ( — r dF dF —- =s< - l - L F - max \G(A, $, u)—— + H(A, $,u) —— (5.3.77) Here L is the operator (5.3.15), and the functions G and H are determined by formulas (5.1.17). On the other hand, if we set A = 1-xe in (5.3.76), then we arrive at the finite-difference Bellman equation (an analog of (5.3.26))

Here the increments of the amplitude eAA and the "slow" phase eA

Section 5.3.3, to solve Eqs. (5.3.77) and (5.3.78) simultaneously. Here we write out the first two approximations of the function f>r(A) determining the switching line in the optimal regulator, which, just as in Section 5.3.3, is of relay type and has the form (5.3.19). The first approximation. Substituting the expression 9F/d$ from (5.3.77) into Eq. (5.3.78), omitting the terms of the order of e2 and of higher orders, and using (5.3.45)~(5.3.48), we obtain the following Bellman equation in the first approximation:

max (5.3.79) Since, by definition, W(t, Af, $j) = 1 at all points in the interior of the domain of admissible states (that is, for all A-JT < RO), we can transform

296

Chapter V

(5.3.79) with regard to (5.3.45) to the form

*v '

=-1. (5.3.80)

4AJ 8A

The function (p* (A) determining the switching line in the first approximation is found from the condition that the expression in the square brackets in (5.3.80) attains its maximum. For dF^/dA < O16 we obtain p*(A)=0.

(5.3.81)

Comparing (5.3.81) with (5.3.51) as well as with (5.1.55), we conclude that, in the first approximation in e, the switching line of the optimal quasiharmonic stabilization system always coincides with the abscissa axis on the plane (x, x); this fact is independent of the type of system nonlinearity,

the existence of random perturbations, and the optimality criterion. Some distinctions between expressions for (p* (A) appear only in higher-order approximations. The equation for the loss function F\ (^4) in the first approximation with regard to (5.3.81) has the form

4 dA2 A unique solution of this equation is determined by the natural boundary conditions

dA

(0) < oo.

(5.3.83)

For simplicity, we shall consider the case where the plant is a linear quasiharmonic system. In this case, we have x(xi *) — 1 in (5.3.1) and x(-A) = -A/2 in (5.3.82). Solving (5.3.81) with the second condition in (5.3.83), we readily obtain

dA

_ 8 M fe-(A+4rf/BdA\\. B Jo JJ

v(5.3.84)

'

The expression (5.3.84) is used for determining the switching line in the second approximation. 16

It follows from (5.3.84) that the condition 9Fi/9A < 0 is satisfied for all A £ (0,Ro\.

Control of Oscillatory Systems

297

The second approximation. The switching line in the second approximation is calculated by analogy with Section 5.3.3. Namely, in Eq. (5.3.78) we consider the terms of the order of £2 and retain the terms depending on
that the desired function ip* (A) in the second approximation is determined by the condition that the expression

attains its maximum. If the system is linear, then we have X A (^4., <J>) = Acos 2<J>/2, and the desired expression for tp* (A), which follows from the condition dN/dip — 0 with regard to (5.1.44) and (5.1.45), has the form

(5.3.86)

Figure 43 shows the switching line given by (5.3.86).

FIG. 43 In conclusion, let us present the block diagram (Fig. 44) of a quasiop-

timal self-stabilizing feedback control system with plant P described by

Chapter V

298

Eq. (5.3.1). The feed-back circuit (the regulator) of this system contains a differentiator, a multiplier, an adder, an inverter, a relay unit, and two nonlinear transducers NC1 and NC2. Unit NCl realizes the functional that is, produces the current value of the amdependence A = plitude A. Unit NCl models the functional dependence if*(A), which is given either by (5.3.68) or by (5.3.86), depending on the problem considered. Thus, the feed-back circuit in the diagram in Fig. 44 realizes the control law

u(x, x) = -eum sign (x + x
1

e .

FIG. 44 We also note that the diagram in Fig. 44 becomes significantly simpler if system (5.3.1) is controlled by using the quasioptimal algorithm in the first approximation (5.1.55), (5.1.56). In this case, the part of the diagram indicated by the dashed line is absent. §5.4. Optimal control of quasiharmonic systems with noise in the feedback circuit Now we shall show how to generalize the results of the preceding section to the case where the error in the measurement of the output (controlled) variable x(i] cannot be removed. 5.4.1. Statement of the problem. We shall consider the feed-back

control system whose block diagram is shown in Fig. 25. Just as in §5.3, we

Control of Oscillatory Systems

299

assume that the plant P is a quasiharmonic controlled system perturbed by the standard white noise and described by the equation

x + ex(x,x)x + x = su + VeB(,(t).

(5.4.1)

We seek the optimal (scalar) control w* = «* (t) in the class of piecewise continuous functions whose absolute value is bounded by um: \u(t)\
(5.4.2)

It is required to calculate the controller C so that to provide the optimal damping of oscillations x(t) arising in system (5.4.1) under the action of

random perturbations (.(t). In this case, the quality of the damping is estimated by the mean value of the functional

U

T

-1

c(x(t),x(t))dt\. J

(5.4.3)

The functions x(x,x] and c(x,x) in (5.4.1), (5.4.3) are the same as in (5.3.1), (5.3.3). Therefore, problem (5.4.1)-(5.4.3) is completely identical to problem (5.3.1)-(5.3.3). The single but important distinction between these problems is the fact

that now it is impossible to measure the current state of the controlled variable x(t). We assume that the result y(t] of our measurement is an additive mixture of the true value of x(t) and a random error of small intensity: y(t) = x ( t ) + V^r,(t], (5.4.4) where e is a small parameter the same as in (5.4.1) and the random function r)(i) is a white noise (independent of £(i)) with characteristics

Er,(t) = 0,

Erj(t)r,(t - r) = N6(r),

(5.4.5)

where N > 0 is the intensity (spectral density) of the process rj(t). Now to obtain some information about the current state of the plant at time t, we need to use the entire prehistory of the observed process 2/0 = {y(r) : 0 < T < t} from the initial time t = 0 till the current time t. Therefore, in this case, the current values of the control action ut and the function (5.3.5) of minimum future losses depend on the observed realization t/o, that is, are the functionals

«t = wfoS],

(5.4.6)

U

T

-j

c(x(T),x(T))dr

y*0 \. J

(5.4.7)

300

Chapter V

The principal distinction between problems (5.4.1)-(5.4.4) and (5.3.1)(5.3.3) is that, to find the optimal control functional (5.4.6) that minimizes the optimality criterion (5.4.3), we need to choose the space of states of the controlled system (the sufficient coordinates of the problem; see §1.5, §3.3, and §4.2) in a special way, which will allow us to use the dynamic

programming approach for solving the synthesis problem. Let us show how to determine the sufficient coordinates for problem 5.4.2. Equations for the sufficient coordinates. Let us consider the random function z(t) = JQ y(r) dr. Then writing the plant equation (5.4.1) as the system of first-order equations:

(5.4.8) and assuming that the control u is a given function of time, we can readily show that z ( t ) is the observable component of the three-dimensional Markov process ( x i ( t ) , X 2 ( t ) , z ( t ) ) . By using (5.4.4), (5.4.5), and (5.4.8), as well as the results of §1.5, we readily obtain an equation for the a posteriori probability density wpi(t,x) = wpi(t,xi,X2) = w(xi,x<2 \ ZQ) = w(xi,xz \ 2/0) for the components of the unobservable diffusion process determined by system (5.4.8). The corresponding equation is a special case of Eq. (1.5.39)

and has the form

(5.4.9) Here the subscripts a , f 3 take the values 1 and 2, and E2, \Bap\\ =

Ky>(x)

0 0 || 0 gB '

= £U — £x(x

, y(a: yj

'

. _ _1 f ~ si

^fA

(5.4.10)

Equation (5.4.9) for the a posteriori density also remains valid if the control u in (5.4.8) is a functional of the observed process z*Q (or y^} or even of the a posteriori density wps(t, x) itself. This fact is justified in [175] (see also §1.5). It follows from (5.4.4), (5.4.5), (5.4.9), (5.4.10), and the results of §1.5 that the a posteriori probability density wps(t, x), treated as a function of time, is a Markov stochastic process and thus can be used as a sufficient coordinate in the synthesis problem. However, usually, instead of wps(t, x), it is more convenient to use a parameter system equivalent to wps(t, x). If we write x%(t) = x°(, x°(t) = X2t for the coordinates of the maximum

Control of Oscillatory Systems

301

point of the a posteriori probability density wps(t,x) at time t,17 then, expanding wps(t, x] in the Taylor series around this point, we obtain the following representation for wps(t, x) = wps(t, xi, x^) (see (1.5.41)): wps(t, 3=1, £2) = const 00

-I

E

-«»!»,...„.(*)(*„. - < (*)) . . . (xn, - X ° . ( t ) )

S=2

S

(5.4.11)

J

-

(in (5.4.11) the sum is over n;, i = 1,..., s, assuming the values 1 and 2). If we substitute (5.4.11) into (5.4.9) and set the coefficients of equal powers of (xni — o;^)... (xn> — x^J on the left- and right-hand sides equal to each other, then we obtain a system of differential equations for the parameters z°.(<) and a ni ...„,(<) (see (1.5.43)). Note that since Eq. (5.4.9) is symmetrized, the stochastic equations obtained for x^.(t) and ani...n,(t) are also symmetrized. It is convenient to replace the probability density wps(t, x) by a set of parameters, since we often can truncate the infinite system of the parameters x nii ani,...,n, [167, 170, 181] retaining only a comparatively small number of terms in the sum that is the exponent in (5.4.11). The error admitted in this case as compared with the exact expression for wps is the less the higher is the a posteriori accuracy of estimation of the unobservable components Xi and £2 (or, which is the same, the less is the norm of the matrix i|D a/ g|| of the a posteriori variances); here, the norm of the matrix ||Da/g|| is of the order of e, since, in view of (5.4.4), the observation error is a small variable of the order of y'e. It is often assumed [167, 170] that a n i n _, n s = anin2n3n4 = • • • = 0 in (5.4.11) (this is the Gaussian approximation). In the Gaussian approximation, from (5.4.9) and (5.4.10) we have the following system of equations for the parameters of the a posteriori density wps(t, zi, X2): 18

eN

17 The variables xj(t) and Xj(t) are estimates of the current values of the coordinate x(t) and the velocity x(t) of the control system (5.4.1). If the estimation quality is

determined only by the value of the a posteriori probability, then xj(i) and x^(t) are the optimal estimates. 18 For the linear oscillator (when %(a;,i) = 1 in (5.4.1)), the a posteriori density (5.4.11) is exactly Gaussian, and Eqs. (5.4.12) are precise.

302

Chapter V

D12 = -DnD 12

_£

.

_Dlll +e

_

+ D 22

_ - -22^— ,

1 2 -—-— oxidx2

D 22 = sB - 2D12 l + 4^ + e

vx2

- D 2 2 OD

(5.4.12)

To write these equations, we have passed from the parameter system ||aa^||

to the matrix ||Dajg|[ = Hcta/sll" 1 of the a posteriori covariances. Besides of this, in (5.4.12) we have used the notation x2) + x 2 -— (KI, x2),

X2(a=i, x2) = x2x(xi, x2).

OX2

Let us make some remarks concerning Eqs. (5.4.12). First, since (see

(5.4.1), (5.4.4), and (5.4.5)) the noise intensity in the plant and in the feedback circuit is assumed to be small (of the order of e), the covariances of the a posteriori distribution are also small variables of the order of e, that is, we can write DH = eDn, Di2 = eD 12 , and D 2 2 = eD 22 . This implies that the terms in (5.4.12) are of different order of magnitude and thus Eqs. (5.4.12) can be simplified furthermore. So, retaining the most important terms and omitting the terms of the order of e2 and of higher orders, we can rewrite (5.4.12) in the form

•A.VH •"a/ < "2 ' ^^ >

= 2D12 -

fij

D?, eN

D?2 , D 22 = e S - 2 D i 2 - — £ + e 2 . . .

D

(5.4.13)

P Iv

We also note that, in this approximation, the last three equations in (5.4.13)

can be solved independently of the first two equations. In particular, we

Control of Oscillatory Systems

303

see that, for a long observation, the stationary operating conditions occur and the covariances of the a posteriori probability distribution attain some steady-state values D^, D^, and D^ that do not change during the further observation. These limit covariances depend neither on the way of control nor on the type of the plant nonlinearity (the function x(x,x) in (5.4.1)) and are equal to

(5.4.14)

= £D22 = In what follows, we obtain the control algorithm for the optimal stabilizer (controller) C under these stationary observation conditions. 5.4.3. The Bellman equation and the solution of the synthesis problem. In the Gaussian approximation, the loss function (5.4.7) is completely determined by the current values of the a posteriori means x®(t) •=. x®t and x2(t) = x2t and by the values of the a posteriori covariances On, Di2, and D22- Under the stationary observation conditions, the a posteriori covariances (5.4.14) are constant, and therefore, we can take Z

2W> X2(t)i and time t as the arguments of the loss function (5.4.7). Thus, in this case, instead of (5.4.7), we have

t
U

T

_^ __^ _^ -i CI(XIT, x2r) dr | x^t, a^, D*!, D*2, D22 . J (5.4.15)

In (5.4.15) the symbol E of the mathematical expectation means the a posteriori averaging, that is, the averaging with the a posteriori probability

density. In other words, if we write the integral in the square brackets in (5.4.15) as a function of the initial values of the unobservable variable x±t and x-2t, then, to obtain F(t, x^t, x2t), we need to integrate this function with respect to *i« and x2t with the Gaussian probability density exp

-

— — ——=,

[D2'3(x« - x°lt)2 - 2D1*a(x» - x°lt)(x2t - x°2t)

(5.4.16) For the function (5.4.15), the basic functional equation (the optimality

304

Chapter V

principle) of the dynamic programming approach has the form

_

U

t+A

Ci(xiT,X2T)dT

F(t + A, z° t+A , x§ t+A ) |<,^ t , DU.DU, D2*2 . (5.4.17) The differential Bellman equation can be obtained from (5.4.17) by using the standard derivation procedure outlined in §1.4 and §1.5. To this end, we need to expand the function F(t + A, «°t+A, Z^+A) in the Taylor series around the point (t, x®t, a^t); *° calculate the mean values of the increments - xr° E(r° + A - xit) r° I 2> ••• \t)i} cf
and the integral t+A

-,

ci(xlT,x2T)dr

J

,

(5.4.19)

to substitute the expressions obtained for (5.4.18) and (5.4.19) into (5.4.17), and pass to the limit as A —>• 0. To calculate the mean values of (5.4.18), we need Eqs. (5.4.13) and formulas (5.4.4) and (5.4.5). So, from (5.4.13) we obtain r

(5.4.20)

J

Since the stochastic processes XIT — KI(T), x\T — X ° ( T ) , and x^T = «°(T) are continuous, for small A we can replace these stochastic functions by the constant values x^t, x®t, and x\t. The error of this replacement is of the order of o(A). As a result, if we average with respect to r\(£) with regard to (5.4.5), then (5.4.20) implies

o(A). By averaging (*) with respect to xu with the probability density (5.4.16),

we finally obtain

«§,A + o(A).

(5.4.21)

Control of Oscillatory Systems

305

In a similar way, we can find the other expressions for (5.4.18) and (5.4.19):

- z°t) = (eut - x°lt °

£x(x°u,

0

N

v

'

x°2t)x°2t)A + 0 (A),

"

— * 7^" *

-^t)=e^7 i i A + o(A), 0

U

° 2 —iti

N

t+A

-I

/Y+OO

c(xir,x2T)dT\ = A // \

c(xit,X2t)N(x°, D*)dxitdx2t

JJ-OO

+ o(A).

D

(5.4.22)

Using (5.4.21) and (5.4.22) and letting A -)• 0 in (5.4.17), we obtain

dx°2 0 -=r*^*

" /• /> +oo

//

1

2

dF 2

+

(

1

_

, D*)dxi(ia; 2

(5.4.23)

»/ J —

(here we omit the subscript t in «j, x®, zi, 3:2)If the terminal time T in (5.4.3), (5.4.7), and (5.4.15) is sufficiently large, then the fact that F depends on t becomes unimportant (the stationary stabilization conditions take place), since the derivative —dF/dt —>• 7 as T —>• oo (here 7 is a constant that characterizes the mean losses per unit time

under the optimal control). As is usual in such cases (see (1.4.29), (2.2.9), (4.1.7), and (5.3.17)), passing from F(t, x°, x®} to the time-independent loss function

[F(t, xl x°) we arrive at the stationary version of Eq. (5.4.23):

/• f +

// J J —C

(5.4.24)

Chapter V

306

Just as in §5.3, it is more convenient to solve Eq. (5.4.24) in the polar coordinates if, instead of the estimated values of the coordinate x® and the velocity x%, we use, as the arguments of the loss function, the corresponding

values of the amplitude A$ ° = vlocos$o,

an

d the phase $o:

x° = -A0 sin <J>0

o = t + fo] •

(5.4.25)

Performing the change of variables (5.4.25), we transform (5.4.24) to the form

(5.4.26)

- mn

The expressions for G(A 0 , $0, u) and H(Ao, $0, u) coincide with (5.1.17) after the change A, <& —>• AQ, 3>o- The function c* (Ao, <J>o) is determined by the penalty function c(x, x) in (5.4.3) (e.g., for c(x, x) = x2 + x2, we have c*(A 0 ,$o) = Al + eDi! + eD^). In (5.4.26) L0 denotes the differential operator

sin2$ 0

'K*^2|cos2< sin2 $o d

sin2$0

sin 2$o cos2$0

d

dA0

cos 2$o I

". n

(D 12 ) 2 sin2 $0 __

sin2$0

cos2 $0 d2

l

}

Note that as TV —>• 0 formula (5.4.27) passes into formula (5.3.15) for the operator L obtained in §5.3 for systems containing complete information

about the phase coordinates of the plant. We can readily verify this fact by substituting the values (5.4.14) of the steady-state covariances into (5.4.27) and passing to the limit as N -> 0. Then (5.4.27) acquires the form of

(5.3.15), and Eq. (5.4.26) coincides with (5.3.18).

Control of Oscillatory Systems

307

Equation (5.4.26) can be solved by the approximate method outlined in §5.3. Indeed, the principal assumption (necessary for the approximate method to be efficient) that the trajectories of the sufficient coordinates x®(t) and £2 W are quasiharmonic is satisfied in this case, since the noises (,(t) in the plant and the noises rj(t) in the feed-back circuit are small (their

intensity is of the order of e). In view of this fact, the rate of change of the estimated values for the amplitude AQ and the phase
min u(r)\
\

ft+2"

E \e I I

c*(AOT, <J>0r) dr — ZTTSJ +

df

, *

df d$0

Jt

i < V < t + 2)r

+

dA0d$0 (5.4.28)

similar to Eq. (5.3.26). Next, just as in §5.3, by using (5.4.26), we eliminate the derivatives of the loss function with respect to the phase <&o from (5.4.28) and solve the obtained one-dimensional infinite-order equation by the method of successive approximations. Note that the increments of the estimated values of the amplitude sAAo and the phase eA<po on the time interval A — 2-Tr can readily be calculated with the help of Eqs. (5.4.13) for the sufficient coordinates written in the

polar coordinates AQ and 3>o in accordance with the change of variables (5.4.25). In this case, just as in §5.3, we assume in advance that, in view of the symmetry of the problem, the optimal control has the form

w»(4 0 ,$ 0 ) = um sign [sin ($0-<pr(A0))],

(5.4.29)

and thus solving the synthesis problem is equivalent to finding the equation in the polar coordinates for the switching line ipr(Ao). We do not consider the mathematical calculations in detail (they coincide

with those in §5.3), but illustrate the results obtained for the switching line in the first two approximations by way of example of a controlled plant that is a linear quasiharmonic system (in (5.4.1) we have x(x, x] = 1). By using the above-described procedure, we simultaneously solve Eqs. (5.4.26) and

Chapter V

308

(5.4.28) and obtain the following one-dimensional Bellman equation in the first approximation (in the case of quadratic penalties c(x, x) = x2 + x2):

Z/dfi

TTT+

dAf.

(I / \

n\ dfi

-^r)x 2 / dA

r

dA0

(5.4.30)

TT 4N Hence we obtain the following equation for the switching line in the first approximation:

vl(A0) = 0,

(5.4.31)

which corresponds to the control law

FIG. 45 Taking into account (5.4.31), from (5.4.30) we obtain the expression exp -

'

.

(A2 — 7) exp

r

1,. — — (A 2 -

for the derivative dfi/dAo, which enters the formula for the switching line in the second approximation:

309

Control of Oscillatory Systems

Since ip2(A0) is small, it follows from (5.4.25) and (5.4.29) that the quasioptimal control algorithm in the second approximation can be written as

« 2 (z°, x°2) = -um sign (x°2 + zV r (\Af The block diagram of a self-stabilizing system realizing the control algorithm in the second approximation is shown in Fig. 45. The most important distinction between this system and that in Fig. 44 is that the feed-back circuit contains an additional element SC producing the current values of the sufficient coordinates x°(t) and x^t). Figure 46 presents the diagram of this element in detail.

eu

FIG. 46

CHAPTER VI

SOME SPECIAL APPLICATIONS OF ASYMPTOTIC SYNTHESIS METHODS

In this chapter we consider some methods for solving adaptive problems of optimal control (§6.1), as well as problems of control with constrained phase coordinates (§6.2). Furthermore, in §6.3 we solve a problem of controlling the size of a population whose behavior is described by a stochastic logistic model.

"Adaptive problems" are optimal control problems, similar to those considered above, that are solved under the assumption that some system parameters are unknown a priori. In this case, just as in problems with observation noise (§3.3, §4.2, and §5.4), the optimal controller is a combination of the optimal nitration unit and the controlling unit properly producing the required controlling actions on the plant. In §6.1 we present an approximate method for calculating such controllers; this method is effective if the a priori indeterminacy of unknown parameters is relatively small. In §6.2 we present exact and approximate solutions of some stochastic problems of control with constrained phase coordinates. We consider two servomechanisms and a stabilizing system under the assumption that the range of admissible deviations between the command signal and the output coordinate is a fixed interval on the coordinate axis. We consider two cases of reflecting and absorbing screens at the endpoints of this interval. In solving the stabilization problem, we study a two-dimensional problem in which the phase trajectories reflect along the normal on the boundary of the region of admissible phase variables. In §2.4 we have already studied the problem of control of a population size and have exactly solved a special control problem based on the stochastic Malthus model. In §6.3 we shall consider a general case of a stochastic logistic controlled model and construct an optimal control algorithm for this model in terms of generalized power series. We also obtain approximate finite formulas for quasioptimal algorithms, which can be used for large values of the model parameter called the medium capacity. 311

312

Chapter VI §6.1.

Adaptive problems of optimal control

In this section we consider the synthesis problem for controlled dynamic systems perturbed by a white noise and described by equations with unknown parameters. We assume that the system equations contain these parameters linearly and that the a priori indeterminacy of these parameters

is small in some sense. First we present a formal algorithm for solving the Bellman equation approximately (and for the synthesis of a quasioptimal control). The algorithm is based on the method of successive approximations in which the solution of the optimal control problem with completely known values of all parameters is used as the zero approximation (a generative solution). Next, we estimate the quality of the approximate synthesis (for the first two approximations). Finally, we illustrate our method by calculating a quasioptimal stabilization system in which the controlled plant

is an aperiodic dynamic unit with an unknown inertia factor. 6.1.1. We shall consider control systems where the plant is described by stochastic differential equations of the form

x = A0(x)+Bu +
(6.1.1)

Here x is an n-dimensional phase vector, u is an r-dimensional control vector, 6(x) is an n-dimensional vector of known functions, £(t) is an ndimensional vector of random functions of the white noise type (1.1.34),

and A, B, a are constant matrices of the corresponding dimensions. Here B and cr are known matrices (det to, for a given x(t0) = x

and any admissible control u. In the following, it is convenient to denote the unknown parameters of

the matrix A by the special letter a. Numbering all unknown parameters in an arbitrary way and writing them as a column a = (QI, . . . , a*), we can rewrite Eq. (6.1.1) as x = A*0(x)+Q(x)a + Bu + o-£(t),

(6.1.2)

where A* is obtained from the matrix A by substituting zeros instead of all unknown elements and the n x k matrix Q(x) (that consists of the functions Bi(x) and zeros) is uniquely determined by the vector a from the condition A0(x) = A*B(x] + Q(x)a. The goal of control is to minimize with respect to u the mean value of the functional

U

T

}

(c(x(t}} + uT(t)Hu(t)} dt + ij>(x(T)} L J

(6.1.3)

Applications of Asymptotic Synthesis Methods

313

where c(x) and ip(x) are some nonnegative bounded continuous functions, and H is a positive definite constant r x r matrix. We do not impose any restrictions on the admissible values of the control vector u and assume that the state vector x can exactly be measured at any time t G [0,T]. Thus, we can seek the optimal control u* that minimizes the mathematical

expectation (6.1.3) in the form of the functional «,=¥»[*, z*0],

(6.1.4)

where x*0 = {X(T) : 0 < T < t] is an observed realization of the state vector from the initial instant of time to the current time t. 6.1.2. The approximate synthesis algorithm. We assume that the difference between the unknown parameters a and the a priori known

vector ao is small. To obtain a rigorous mathematical statement, we assume that a is a random vector subject to an a priori Gaussian distribution with mean QQ and the covariance matrix DO = eDo (e is a small parameter). This assumption and Eqs. (6.1.2) imply the following two facts that we need in the sequel. 1. The a posteriori probability density p(a

a;0) = Pt(ct) calculated from observations of the process ^(i) 1 is a Gaussian (conditionally Gaussian) density completely described by the vector m = m(t) = mt of a posteriori mean values and the matrix D = D(t) = Dt of a posteriori covariances. The latter are described by the following differential equations (see [132, 175]):

d m = DQ T (x(t))AT- 1 [d0x(t) - (A(m)9(x) + Bu) dt], (6.1.5) U = -DJViD.

(6.1.6)

Throughout this section, N~* is the inverse of the matrix TV = era'T, NI =

QTN~1Q, and the matrix A(m) is obtained from A in (6.1.1) in which all unknown parameters a are replaced by their a posteriori means m.2 We also note that system (6.1.5) contains stochastic differential Ito equations and the differential equations in system (6.1.6) are understood in the usual sense. 2. The elements of the matrix Dt are small variables (~ s) for all t > 0. Indeed, by integrating the matrix equation (6.1.6), we obtain the following explicit formula for the covariance matrix D< in quadratures:

Dt=\E + D0 [ QT(xs)N-1Q(xs)ds] L Jo J

D0

(6.1.7)

1

It follows from (6.1.2) and (6.1.4) that x(t) is a diffusion type process.

As is known [38, 39, 167], the a posteriori means m = mt are optimal estimates of ce with respect to the minimum mean square error criterion.

314

Chapter VI

(E is the k x k identity matrix). Denoting the columns (with the same numbers) of the matrices Dt and DO by yt and t/o, respectively, we obtain from (6.1.7) the relations

R(s) = D0QT(xs)N-lQ(xs).

R(s)dsyt=y0,

(6.1.8)

o

Since the constant matrices D0 and N [ are positive definite, the matrix R(s) is nonnegative definite; R(s) is degenerate if and only if all elements of at least one column of the matrix Q are zero. Let X(s) > 0 be the minimum eigenvalue of the matrix R(s). plying (6.1.8) by yt in the scalar way, we obtain

\\yt\\2 + yt

Jo

R(s)dsyt-(y0,yt)

On multi-

(6.1.9)

(here \\yt\\ is the Euclidean norm of the vector yt). Replacing the quadratic form in (6.1.9) by its lower bound and estimating the inner product (yo, yt) with the help of the Cauchy-Schwarz-Bunyakovskii inequality, we arrive at the inequality

11^*11 S (l + /*W)

\\yo\\i

/"*

fJ-(t) = / X(s)ds. Jo

(6.1.10)

Since \\yo\\ ~ £, it follows from (6.1.10) that \\yt\\ ~ £• Thus we have Dt ~ e for alH G [0,T]. We shall solve the problem of optimal control synthesis by the dynamic programming approach. To this end, we first note that the a posteriori probability density pt(a) (or the current values of its parameters mt and Dt) together with the current values of the phase vector Xt form the suffi-

cient coordinates (see §1.5) for the problem in question. Therefore, these parameters and time t are arguments of the loss function given, as usual, by the formula

_ t<»
U

T [ c ( x ( s ) ) +uT(s)Hu(s)]

x ( t ) = x, m(t)

ds

=m, D(t) = D. D\. (6.1.11)

The expression in the square brackets in (6.1.5) is the differential of the Wiener process (the

innovation process [132]) with the matrix N of

315

Applications of Asymptotic Synthesis Methods

diffusion coefficients. Therefore, it follows from (6.1.2), (6.1.5), and (6.1.6) that the variables (xt,mt, Dt) form a diffusion Markov process (degenerate with respect to D). By applying the standard derivation procedure (see §1.4, as well as [97]), we obtain the following differential Bellman equation for the function F = F(t, x, ra, D): T T T T -Ft = 0T(x)A(m)F x + min [U B Fx + u Hu] +

Sp(NF,,T

F(T,x,m,D)=i>(x).

(6.1.12)

Here Ft = dF/dt, Fx is a vector-column with components ^-, . . . , -j£-,

FxxT F

T

•PramT —

dmpdmq

(i= l,...,n;p= l,...,k),

=

8F dD

are matrices of partial derivatives,

pg

and Sp(-) is the trace of the matrix (•). Since the covariance matrix D is of the order of s, it is now expedient to pass to new variables D according to the formula D = sD. Performing this substitution and minimizing the expression in the square brackets, we

transform Eq. (6.1.12) to the form

-Ft = 0T(x)AT(m)Fx - ^FfBiF, + ^ Sp(NFxxT)

+ c(x)~eSp(DNlDFD)+eSp(DQTFxmT} + -Sp(DNlDFmmT), F(T,x,m:D)=il>(x),

B

(6.1.13)

In this case, the vector

(6.1.14)

316

Chapter VI

at which the function in the square brackets in (6.1.12) attains its minimum, determines the optimal control law, which becomes a known function M* = M* (t, x, ra, D) of the sufficient coordinates, after the loss function F = F(t, x, m, D) is calculated from Eq. (6.1.13). Now let us discuss whether Eqs. (6.1.13) can be solved. Obviously, in the more or less general case, it is hardly possible to obtain an exact solution.

Moreover, one cannot construct the exact solution of Eq. (6.1.13) even in the special case where 8(x) is a linear function and c(x) and i>(x) are quadratic functions of a;, that is, in the case in which the synthesis problem

with known parameters in system (6.1.1) can be solved exactly. The crucial difficulty in this case is related to the bilinear form (in the variables x and m) appearing in the coefficients of the first-order derivatives Fx. On the other hand, a high accuracy of estimating the unknown parameters a, due

to which a small parameter e appeared in the three last terms in (6.1.13), results in a rather natural assumption that the difference between the exact solution of (6.1.13) and the solution of (6.1.13) with e = 0 is small. (In other words, the difference between the solution of the synthesis problem with unknown parameters a and the similar solution with given a = ai is small.)

The above considerations allow us to believe that an efficient approximate solution of Eq. (6.1.13) (that is, of the synthesis problem) can be obtained by means of the regular asymptotic method based on the expansion of the desired loss function F in powers of the small parameter e:

F = F° + eF1 + e2F2 + e3 ... .

(6.1.15)

Substituting (6.1.15) into (6.1.13) and grouping terms of the same order with respect to £, we obtain the following equations for successive approximations: _T j ]_

4

x

x

1

2

xx )

2

\ i

-i

x

x

2 (6.1.17) /

-

4£,v.,

* •

1 <

2

p-i) + I 0
Fs (T 2; ra D) = 0

s> 2

(6.1.18)

Applications of Asymptotic Synthesis Methods

317

The zero-approximation equation (6.1.16) is nonlinear,3 while the successive approximations can be found by solving the linear equations (6.1.17) and (6.1.18), which usually is a simpler computational problem. Thus, the described scheme for solving Eq. (6.1.13) approximately is useful only if Eq. (6.1.16), that is, the Bellman equation for the problem with completely

known parameters a, can be solved exactly. As was already pointed out, the last condition is satisfied if 6i(x] are linear functions and c(x] and i/>(x) are quadratic functions of the phase variables x. In this case, all successive approximations can also be calculated in the form of quadratures (see §3.1 in [34]). The solutions of Eqs. (6.1.16)-(6.1.18) of successive approximations can be used for obtaining approximate solution of the synthesis problem. Namely, the quasioptimal control us(t, x,m, D), corresponding to the sth approximation, is determined by formula (6.1.14) after the function F in (6.1.14) is replaced by the approximate expression F = F° + sF^ + • • • + esFs. 6.1.3. Estimates of the quality of approximate synthesis. We assume that the quasioptimal control us (t, x, ra, D) has already been obtained in the sth approximation. By

(6.1.19) we denote the mean value (calculated from the time instant i] of the optimality criterion (6.1.3) for the control w s . 4 The deviation A* = Gs — F of the function (6.1.19) from the equation (6.1.13) is a natural control us (t, x, m, D). In what first two approximations, that

exact solution F(t,x,m, D) of the Bellman estimate of the quality of the approximate follows, we calculate the order of As in the is, we estimate A° and A1.

Just as in §3.4, we calculate the desired estimates A s (s — 0, 1) in two steps. First, we estimate the differences 5s — F - F*, then, 7* = F' - Gs , which immediately implies the estimates for A s (in view of the triangle inequality) .

Estimation of the differences 6° and 51. Let 9 ( x ) , c ( x ) , and i/>(x) be bounded continuous functions for all x e R n . Then it follows from Theorem 2.8 (for the Cauchy problem) in [124] that the quasilinear equations 3 The partial differential equations (6.1.13) and (6.1.16) of parabolic type are linear with respect to the higher-order derivatives of the loss function. That is why, equations of the form (6.1.13) and (6.1.16) are sometimes called weakly nonlinear (quasilinear or

semi-linear), see [61, 124]. 4

In (6.1.19), U,(T)

= U,(T,XU>(r),mu>(T),D»'(T)),

where xu'(r], m"'(r), and

Du> (T) satisfy Eqs. (6.1.2), (6.1.5), and (6.1.6) with u = U,(T) for r > t and the initial conditions x"' (t) = x, rn"> (t) = m, and D"' (t) = D.

318

Chapter VI

(6.1.13) and (6.1.16) have at most one solution in the class of functions that are continuous in the strip HT — {\x\ < oo; |D| < oo; m < oo; 0 < t <

T}, continuously differentiable once in t, and twice in other variables for 0 < t < T, and possess bounded first- and second-order derivatives with respect to x, m, D in HT- Furthermore, Theorem 2.5 (for quasilinear equations) in [124] implies the following estimate for the solution of the Cauchy problem (6.1.13):

x,m, D)| < Cim&xif>(x) + C 2 max|c

(6.1.20)

(here C\, C^ > 0 are some constants; it is assumed that the function c may depend not only on x as in (6.1.13) but on the other variables t,m, D). The above arguments also hold for linear equations (6.1.17) and (6.1.18) of successive approximations. By introducing a quasilinear operator L, we rewrite Eq. (6.1.13) in the form LF~-c(x), 0<«T; F(T, x, m, D) = ^(x). Then, for <5° = F - F°, we obtain from (6.1.13) and (6.1.16) a quasilinear equation of the form

LS° - -( 0
6°(T,x,m,D)=Q

(6.1.21)

(with regard to the fact that the solution F° of the zero-approximation equation (6.1.16) is independent of D, and therefore, F® = 0). The vector of partial derivatives F® is a bounded continuous function in view of the above-mentioned properties of the solution to Eq. (6.1.16). Hence, (6.1.21) is an equation of the form (6.1.13). To use the estimate (6.1.20), we need to verify whether the right-hand side of (6.1.21) is bounded. The elements of the matrices Q and NI are bounded, since the functions 6(x) are bounded and the matrix N is bounded and nondegenerate. Moreover, it follows from the inequality (6.1.10) that the norm of the matrix D can only decrease with time t. Therefore, the matrix D is bounded for all t £ [0,T] if the matrix DO of the initial (a priori) covariances is bounded, which was assumed in advance. It remains to estimate the matrices F°mT and F^mT of partial derivatives. To this end, we turn to the zero-approximation equation (6.1.16). By writing v* = dF°/dmi (here mi is an arbitrary component of the vector TO) and differentiating (6.1.16) with respect to the parameter m;, we obtain

Applications of Asymptotic Synthesis Methods

319

the linear equation for v':

vi(T,x,m,D) = Q.

0
(6.1.22)

Equation (6.1.22) is written for the case where the unknown parameter en stands on the rth line and in the jth column of the matrix A in the initial system (6.1.1); here 9j = 6j(x) is the jth component of the vector-function 0(x). Since OjFXr 'ls bounded, the solution v* of Eq. (6.1.22) and its partial derivatives vx and v'xxT, as was already noted, are also bounded. Finally, since v'x = Fxm . is bounded and the number i is arbitrary, the matrix FxmT in the first term on the right in (6.1.21) is also bounded. In a similar way, we verify the boundedness of F^mT • Thus, it follows from (6.1.21), (6.1.20) that 6° satisfies the estimate

\S°
(6.1.23)

where C is a positive constant. In a similar way, we can estimate Sl = F — F = F — F° — eF1. From (6.1.13), (6.1.16), and (6.1.17), it follows that Sl satisfies the equation

Q
Sl(T,x,m,D) = Q.

(6.1.24)

The boundedness of Fx, F^, F^xT, and F^mT can be verified by analogy with the case where we estimated 5°. Therefore, (6.1.24) and the inequality

(6.1.20) imply

\Sl
(6.1.25) 1

Estimation of the differences 7° and 7 . For the functions Gs = s

G (t,x,m,D), s — 0,1,2,..., determined by (6.1.19), we have the linear partial differential equations [45]

-G\ = 6T(x)AT(m)G*x + uTsBTG*x + uTa Hus + lcx + -eSp(D7V-1(a;)DG3D), s

0 < t < T,

G (T,x,m, D) =i/>(x).

(6.1.26)

320

Chapter VI

The quasioptimal controls

us = us(t,x,m,D) = -l-H^BTF*x = ~^H~1BT(F° + eF* + • • • + e° F°) contained in (6.1.26) are bounded continuous functions. Therefore, in view of [66], the functions Ga satisfying (6.1.26) are also bounded and twice continuously differentiable, just as the functions F and Fs discussed above. By using the expressions u0 = -H~iBTF°/2 and m - -H~iBT(F° + eF*)/2 for quasioptimal controls, as well as equations (6.1.26), (6.1.16), and (6.1.17), we can readily obtain the following equations for the differences 7° = F° - G° and 71 = F° + eF* - G1:

I°7° = e\ Sp(DQTF°mT) + L

0
7°(T,z,m,D) = 0,

(6.1.27)

IV = £2 \Sp(DQTF}mT) + zi Sp (DN,D(F^mT I

Q
j1(T,x,m, D) = 0,

(6.1.28)

where L° and -t1 are the linear differential operators

^

v J

dxdmT

+ ySp(W( : c )D

2

x

dx

Since the expressions in the square brackets in (6.1.27) and (6.1.28) are bounded, the inequalities (6.1.20) for the solutions 7°(t, x,m, D) and 7 1 (t,z,m, D) of Eqs. (6.1.27) and (6.1.28) yield the estimates

]7°| < Ce,

I71 < Ce2.

(6.1.29)

Finally, from (6.1.29), (6.1.23), and (6.1.25) with regard to the inequality |AS < |(5*[ + |7" , we have

|A°| < Ce,

A1 < Ce2.

(6.1.30)

Applications of Asymptotic Synthesis Methods

321

The estimates (6.1.30) show that the use of the quasioptimal control UQ or ui instead of the optimal control (6.1.14) results in a deviation (an increase) in the functional (6.1.3) by ~ e in the zero approximation and by ~ e2 in the first approximation. Thus, it follows from (6.1.30) that the method

of approximate synthesis of optimal control considered in Section 6.1.2 is asymptotically efficient. 6.1.4. An example. Let us consider the simplest case of system (6.1.1) in which the plant is an aperiodic first-order unit with an unknown inertia factor. In this case, Eq. (6.1.2) is a scalar equation

x = -ax + bu + T/vti(t),

(6.1.31)

where a is an unknown parameter, b and v > 0 are given numbers, and £(;£) is a scalar white noise of intensity 1. We define the optimality criterion

(6.1.3) as fT

I[u] = E / (gx2(t) + hu2(t)) dt Jo

(6.1.32)

where g and h > 0 are given constants. The optimal filtration equations (6.1.5), (6.1.6) and the Bellman equation (6.1.13) for problem (6.1.31), (6.1.32) are

d0m= - — x(t)[d0x(t)+

(mx(t] - bu) dt\,

(6.1.33)

—2

t=——x2(t),

(6.1.34)

-Ft = -

D2 2,„

D2

£( \Jxrmx + —a; J*p — e£——x —x F — fmm n ,.

0
? /y

F(T,x,m,D) = Q

(6.1.35)

( x , m, and D are scalar variables in (6.1.33)^(6.1.35)). The zero approximation (6.1.6) for Eq. (6.1.35) has the form

0
F°(T,x,m) = Q.

(6.1.36)

322

Chapter VI

The exact solution of Eq. (6.1.36) is5

F°(t, «, m) = f°(t, ra)z 2 + r°(t, m),

f(t,m) = f(T-t,m) =

gv(T-t)

(6.1.37)

(p — m)e~2'ii(1 ~t> + p + m

vh

,

2/3

It follows from (6.1.14) and (6.1.37) that the quasioptimal control in the zero approximation has the form

u0(t,x,m) = --f°(t,m)x,

(6.1.38)

it

where f°(t,m) is determined by (6.1.37). To obtain the quasioptimal control in the first approximation, we need to calculate the second term in the asymptotic expansion (6.1.15). In our

case, Eq. (6.1.17) for the function Fl = Fl(t, x, m, D) has the form

0
Fl(T,x,m,D) = Q.

(6.1.39)

Since, in view of (6.1.37), we have F^x — 2/^(i,m)a;, we obtain the following expression for the desired function F^(t,x, m, D):

F1(t,x,m, D) = / 1 (t,m, D ) x 2 + r 1 ( t , m , D), /!(*, m, D) = -2Dexp I - 2 / T x r / fm(T~

fT

^(tjm, D) = v I

(

Jt

L

(6.1.40)

[m + — f°(T - s, m)} ds\ J

h

{ rT rm + — 62 f°(T s,m)exp-^ 2 /

J

-

i i s,m) I d s *> ds,

1 f(s,m,D)ds

(here f^(T—s, m) denotes the partial derivative df°(s, m)/dm of the function f°(s,m) in (6.1.37) with respect to the parameter m). 5

Note that the loss function in the zero approximation is independent of the estimate variance D, i.e., F° = F°(t,x,m).

Applications of Asymptotic Synthesis Methods

323

It follows from (6.1.14), (6.1.15), (6.1.37), and (6.1.40) that the quasioptimal control synthesis in the first approximation is given by the formula

«i(«, x, m, D) = -- [ f ° ( t , TO) + e f 1 (t, TO, D)] x.

(6.1.41)

Comparing (6.1.38) and (6.1.41), we note that the optimal regulators in the zero and first approximations are linear in the phase variable x. However, if higher-order approximations are used, then we obtain nonlinear "laws of control." For example, in the second approximation, we obtain from (6.1.18) and (6.1.35) the following equation for the function F2 = F2(t, x, m, D):

-F? = -mxF2 - ^-[f(t,m)xF2

+ (xfl(t,m, D)) 2 ]

Obviously, its solution has the form

F2(t, x, TO, D) = q(t, m, D)a;4 + f 2 ( t , m, D)x2 + r2(t, m, D), and therefore, it follows from (6.1.14), (6.1.15), (6.1.37), and (6.1.40) that the quasioptimal control in the second approximation

u2(t, x, TO, D) = - f r { [ f ° ( t , m) + e/1^, m, D) + £2f2(t, TO, D)]x + 2e2q(t,m,D)x3\ is a linearly cubic function of x. Figures 47 and 48 show block designs for quasioptimal feedback control systems, which correspond to the first (Fig. 47) and the second (Fig. 48) approximations. By Wi (i = 0,1, 2, 3) we denote linear (in x) amplifiers with varying amplification coefficients

Chapter VI

324

FIG. 47 The plant P is described by Eq. (6.1.31). The unit SC of optimal nitration forms the current values of the sufficient coordinates m = m(t) — mt and

D = D(t) = Dt. It should be noted that the coordinate mt is formed in SC with the aid of the equation

(6.1.42) which differs from Eq. (6.1.33). The reason is that only stochastic equations understood in the symmetrized sense [174] are subject to straightforward simulation. Therefore, the symmetrized equation (6.1.42) is chosen so that

its solution coincides with the solution of the Ito equation (6.1.33). 6.1.5. Some results of numerical experiments. The estimates (6.1.30) establish only the asymptotic optimality of the quasioptimal controls u0 and HI. Roughly speaking, the estimates (6.1.30) only mean that the less the parameter e (i.e., the less the a priori indeterminacy of the components of the vector a), the more grounds we have for using the quasioptimal controls UQ and u\ (calculated according to the algorithm given in Section 6.1.2) instead of the optimal (unknown) control (6.1.4) that solves problem (6.1.1)-(6.1.3). On the other hand, in practice we always deal with problems (6.1.1)-

(6.1.3) in which all parameters (including e) have definite finite values. As a rule, in advance, it is difficult to determine whether a given specific value of the parameter £ is sufficiently small so that the above approximate synthesis procedure can be used effectively. Some ideas about the situations arising

Applications of Asymptotic Synthesis Methods

325

FIG. 48 for various relations between the parameters of problem (6.1.1)-(6.1.3) are given by the results of numerical experiments performed to analyze the efficiency of the quasioptimal algorithms (6.1.38) and (6.1.41) (see the example

considered in Section 6.1.4). As was already noted, it is natural to estimate the quality of the quasioptimal controls us (s = 0 , 1 , 2 , . . . ) by the differences A* = Gs — F,

where the functions G" = Gs(t,x,m, D), given by (6.1.19), satisfy the linear parabolic type equations (6.1.26) and the loss function F = F(t, x, m, D) satisfies the Bellman equation (6.1.13). In the example considered in Section 6.1.4, the Bellman equation has the form (6.1.35), and the functions Gs (s = 0,1, 2 , . . . ) satisfy the equations

-G\ = (-mi + bus)Gsx + hu2 + gx2 + -G",,

D2 v 0
G s (T,x,m, D) =0.

G sran (6.1.43)

Equations (6.1.35) and (6.1.43) were solved numerically (Eq. (6.1.43) was solved for s = 0 and s = I with the quasioptimal controls (6.1.38) and (6.1.41) taken as us, s — 0,1). Here we do not describe finite-difference schemes for constructing numerical solutions of Eqs. (6.1.35) and (6.1.43)6 but only present the results 6

Numerical methods for solving equations of the form (6.1.35) and (6.1.43) are discussed in Chapter VII.

Chapter VI

326

F(p,x,m,D), GQ(p,x,m)

D=

FIG. 49 of the corresponding calculations performed for different values of the parameters of problem (6.1.31), (6.1.32). In Fig. 49 the plots of the loss function F (solid curves) and the function G° (dashed curves) are given for three values of the a posteriori

variance D = eD in the case where m = I , p = T — t = 3, and problem (6.1.31), (6.1.32) has the parameters g = h = b = v=\. (Since the functions F and G° are even with respect to the variable x, that is, F(t,x,m,D) - F(t,-x,m,D) and G°(t,x,m) = G°(t,-x,m), Fig. 49 shows the plots of F and G° only for x > 0.) Since the corresponding curves for F and G° are close to each other, we can state that, in this case, the quasioptimal zero-approximation control (6.1.38) ensures the control

quality close to that of the optimal control. However, this situation is not universal, which is illustrated by the numerical results shown in Fig. 50. Figure 50 shows the plots of the functions F (solid curves), G° (dotand-dash curves), and Gl (dashed curves) for the "reverse" time p —

T — t = 2.5 and the parameters g = h — 1, b = 0,1, and v — 5 of problem (6.31), (6.1.32). One can see that the use of the quasioptimal

zero-approximation control uo(t, x,m) leads to a considerable increase in the value of the functional (6.1.19) compared with the possible minimum (optimal) value F(t, x, TO, D). Therefore, in this case, to ensure a qualitative control of system (6.1.31), we need to use quasioptimal controls in higher-order approximations. In particular, it follows from Fig. 50 that, in

this case, the quasioptimal first-approximation control ui(t, x, TO, D) determined by (6.1.37), (6.1.40) and (6.1.41) provides the control quality close

327

Applications of Asymptotic Synthesis Methods

2000

m= -0.6 = 0.6

1.2

1.6

2.0 x

FIG. 50 to the optimal. Thus, the results of numerical solution of Eqs. (6.1.35) and (6.1.43) confirm that the quasioptimal control algorithm (6.1.41) is "highly qualitative." We point out that this result was obtained in spite of the fact that

the a posteriori variance D, which plays the role of a small parameter in the asymptotic synthesis method considered here, is of the same order of magnitude as the other parameters (
that in the title of this section the problems of optimal control with unknown parameters of the form (6.1.1)—(6.1.3) are called "adaptive." It is well known that problems of adaptive control are very important in the modern control theory, and at present there are numerous publications in

this field (e.g., see [6-9, 190] and the references therein). Thus, it is of interest to compare the results obtained in this section with other approaches to similar problems. The following heuristic idea is very often used for constructing adaptive algorithms of control. For example, suppose that for the feedback control system shown in Fig. 13 it is required to construct a controller C

that provides some desired (not necessarily optimal) behavior of the system in the case where some parameters a of the plant P are not known in advance. Suppose also that for some given parameters a, the required

328

Chapter VI

behavior of system in Fig. 13 is ensured by the well-known control algorithm u —
output process xf0 = {X(T) : 0 < T < t}\ (2) to define the adaptive control by the formula ua — if>(t,x,at). Needless to say, an additional analysis is required to answer the question of whether such control ensures the desired behavior of the system. The corresponding analysis [6-9, 190] shows that

this method for constructing adaptive control is quite acceptable in many specific problems. Now let us discuss the results of this section. Note that the abovementioned heuristic idea is exactly realized if system (6.1.2) is controlled by the quasioptimal zero-approximation control uo(t, x, m). To verify this fact, we return to the example considered in Section 6.1.4. The algorithm of the optimal control for problem (6.1.31), (6.1.32) with a known parameter a = —a is given by formulas (2.1.14) and (2.1.16) in §2.1. Comparing (2.1.14), (2.1.16) with (6.1.37), (6.1.38), we see that the quasioptimal zero-approximation algorithm (6.1.37), (6.1.38) can be obtained from the optimal algorithm (2.1.14), (2.1.16) by replacing the parameter —a by its optimal estimate mt established by means of the filter equations (6.1.33) and (6.1.34). On the other hand, a numerical analysis of the quasioptimal algorithms uo(t, x, m) and ui(t, x, m, D) (see Figs. 49 and 50) shows that the algorithm MI is preferable in contrast with the "heuristic" algorithm UQ in the zero approximation. This result proves that the regular asymptotic method considered in this section is effective for solving adaptive problems of optimal control.

§6.2. Some stochastic control problems with constrained phase coordinates

As was pointed out in §1.1, in the process of constructing actual control systems, one often needs to take into account constraints of various types imposed on the set of possible values of the phase coordinates. These constraints arise due to exploiting some specific systems, additional requirements on the transient processes, allowance to the fact that the time of control switching is finite, and to other courses. In these cases, a region of admissible values is specified in the phase space, so that the representative point of the controlled system must not leave this region. In this case, the equations of the system dynamics that determine the phase trajectories in the interior of this region can be violated at the boundary of this region.

Additional constraints that are imposed on the phase trajectories on the

Applications of Asymptotic Synthesis Methods

329

boundary depend on the type of the problem. In what follows, we consider two one-dimensional and one two-dimensional problems of optimal control synthesis (the problem dimension is determined by the number of phase variables on which, in addition to time i, the loss function depends). In the one-dimensional problems, the controlled

variable z(t) is interpreted as the difference (error signal) between the current values of the random command input y(t) and the controlled variable x(t) in the servomechanism studied in §2.2. However, in contrast with §2.2 where any value of the error signal z ( t ) was admissible, in the present section it is assumed that the region of admissible values of z is an interval [^1,^2]- At the endpoints of this interval, we have either reflecting or absorbing screens [157, 160]. In the first case, if the representative point z(t)

comes to i\ or £2, then it is instantaneously reflected into the interior of the interval; in the second case, on the contrary, the representative point "sticks" to the boundary and remains there forever. In practice, we have the first problem if the error signal values lying outside the admissible interval [£1,^2] are prohibited, and we have the second problem if the tracking is interrupted at the endpoints (just as in radio systems of phase small adjustment [143, 180]). In the two-dimensional problem we consider the optimal control of a diffusion process in the interior of the disk of radius TO centered at the

origin on the phase plane (x,y). The circle bounding this disk is a regular boundary [124] reflecting the phase trajectories along the inward normal. 6.2.1. One-dimensional problems. Reflecting screens. Let us consider, just as in §2.2, the synthesis problem of optimal tracking a wandering coordinate in the case where the servomotor with bounded speed is used as an executive mechanism. By analogy with §2.2, we assume that the

command input y(t) is a continuous Markov diffusion process with known drift a and diffusion B coefficients (a,B — const, B > 0). By using a servomotor with bounded speed (x = u, \u\ < um, um > a|), it is required to "follow" the command signal y(i) on the time interval 0 < t < T so that to minimize the mathematical expectation (mean value) of the integral performance criterion T -i

c(z(t))dt\, 0

1

where z ( t ) = y ( t ) — x(t] is the error signal, c(z) is a nonnegative penalty function attaining its minimum at the unique point z = 0, and c(0) = 0. In this case, as shown in §2.2, solving the synthesis problem (in the case of unbounded phase coordinates) is equivalent to solving the Bellman equation

330

Chapter VI

(see (2.2.4))

2 dzi

oz

min

|«]<« m i

_ _„

»_„,,), at

oz \

0 < ( < T , (6.2.1)

with the loss function

F(t,z)-

f fT 1 E / c(z(s)) ds z(t) = z ,

min |«(»)[<« m

L./*

(6.2.2)

J

t<s
satisfying the following natural condition for t = T:

F(T,z) = 0.

(6.2.3)

According to §1.4, the Bellman equation is defined only by local characteristics of the controlled process z(t). Therefore, for problems with constraints on the error signal, Eq. (6.2.1) also remains valid at all interior points li < z < 12. Indeed, since the stochastic process z ( t ) is continuous, its realizations issued from an interior point z with large probability (almost surely) move to a small distance during a small time At and cannot reach the endpoints t\ and (.-2- Therefore, in a sufficiently small neighborhood of any interior point z, a controlled stochastic process behaves in the same way as if there were no reflecting screens. Hence, the differential equation (6.2.1) is valid at these points. At the points £1 and lz, Eq. (6.2.1) is not valid, and additional conditions on the function F at these points are determined by the character of the

process z(t) near these points. For example, in the case of reflecting screens considered here, we have the conditions [157] (9 W

rtW

^(Mi) = ^(M2) = 0.

(6.2.4)

The conditions (6.2.4) can be explained intuitively by modeling the diffusion process z ( t ) approximately as a discrete random walk [160] in which with some definite probabilities the representative point goes from the point z to neighboring points z ± Az, Az = ^/BAt, during the time At. Then if at any time instant t the point z comes to the boundary, say, z = li, then with probability 1 the process z attains the value £1 + Az at

time t + At, and therefore, we can write the following relation for the loss function (6.2.2):

F(t, 4) = c(li) At + F(t + At, /! + Az).

Applications of Asymptotic Synthesis Methods

331

By expanding the second term in the Taylor series around the point (<, •£].), we obtain

OF —— (

dF —— (

whence, passing to the limit as At —>• 0, we arrive at (6.2.4). Thus, to synthesize an optimal servomechanism with the variable z subject to constraints in the form of reflecting screens at the points z = t\ and z = 12, we need to solve Eq. (6.2.1) with additional conditions (6.2.3) and (6.2.4) on the function F(t,z) (4 < z < £ 2 , 0 < t < T). In this case, the synthesis problem is solved according to the scheme studied in §2.2 for a similar problem without constraints on z. Therefore, here we only briefly recall this scheme paying the main attention to distinctions arising in the calculational formulas due to constraints on the phase variable. Obviously, the expression in the square brackets in (6.2.1) is minimized by an optimal control of the form

dF

uf(t,z) = u m sign —— (t,

(6.2.5)

Substituting (6.2.5) into (6.2.1) and omitting the symbol min, we obtain

Bd2F

dF_

2

dz

2 dz

- u

dF

d_F_

' dt

(6.2.6)

If we pass to the reverse time T = T — t, then the boundary value problem we need to solve acquires the form

B d2F

dF

~o~~5~ 2 dz2

^~ dz

dF

dF

-c(z

t l < Z <

0 < T < T, (6.2.7) (6.2.8)

J?(0,z) = 0.

(6.2.9)

By taking into account the properties of the penalty function c(z), we see that the loss function F ( T , Z ) satisfying the boundary value problem (6.2.7)-(6.2.9) for all r (0 < T < T) has a single minimum (with respect to z) on the interval li < z < £2- Therefore, the optimal control (6.2.5) can be written as (see (2.2.8))

ut (T, z) = um sign (z - z,, (T)),

(6.2.10)

332

Chapter VI

where z* (T) is the minimum point (with respect to z) of the function F(r, z) and simultaneously the switch point of the controlling action. This point can be found from the condition f)F

^-(r,z.) = 0.

(6.2.11)

Thus, to synthesize an optimal system, we need to solve the boundary value problem (6.2.7)-(6.2.9) and to use the condition (6.2.11). Problem (6.2.7)-(6.2.9) can be solved exactly if, just as in §2.2, we consider the stationary operating conditions corresponding to large values of T. In this case, instead of the function F(T, z), we can consider the stationary loss function f(z] given by the relation

(just as in (1.4.29), (2.2.9), (4.1.7), and (5.3.17), the number 7 characterizes the mean losses per unit time in the stationary tracking mode). Therefore, for large T (more precisely, as T —>• oo), the partial differential equation (6.2.7) is replaced by the following ordinary differential equation for the function f ( z ) :

BcPf , df a 2 dz2 dz

= 7-c(z)

with the boundary conditions

(6.2.12)

i"-13)

|«>> =£«•>=»•

In this case, the coordinate of the switch point given by (6.2.11), where F is replaced by /, attains a constant value z* (that is, we have a stationary switch point).

The boundary value problem (6.2.12), (6.2.13) can readily be solved by

the matching method. By analogy with §2.2, let us consider Eq. (6.2.12) on different sides of the switch point z*. Then the nonlinear equation (6.2.12) is replaced by the pair of linear equations

f

"(""*" ° )

= < y

~

C

^'

z

*<*<£2,

(6.2.14)

Solving Eqs. (6.2.14), (6.2.15) with boundary conditions (6.2.13), we arrive at

1(a-um)

(6.2.16)

Applications of Asymptotic Synthesis Methods

333

By using (6.2.11), we obtain the two equations

^M = ^M = 0

(6-2.17)

for two unknown parameters 7 and z*. Substituting (6.2.16) into (6.2.17) and eliminating the parameter 7 from the system obtained, we see that the stationary switch point z* satisfies the transcendental equation p A 2 z,

A2 Jif , ' e*2yc(y] dy V ; "

_ ,,A2/2

(6.2.18)

For the quadratic penalty function c(z) = z 2 , Eq. (6.2.18) acquires the form

wi(z») = w 2 (z*),

(6.2.19) 2 2

\ 2 y 2 _ 2A'j2 -i- 2 _ (A ^ _ 2A'^~ -1- 21 exDfA'f^' _ z 11 WiCz*) = ——————— i = 1,2. If ^i —> —oo and £2 —>• +00 (that is, reflecting screens are absent), then Eqs. (6.2.19) imply the following explicit formula for the switch point zt: - _L \ AI

1 _ \ A 2

aB "? u^ — a9z '

this formula was obtained in §2.2 (see (2.2.16)). In the other special case £2 = —^i and AI = — A2 (the last equality is possible only if a — 0), Eq. (6.2.19) has a single trivial root zt = 0, that is, the optimal control (6.2.10) coincides in sign with the error signal z.

6.2.2. Absorbing screens. Let us see how the tracking system studied in the preceding section operates with absorbing screens. Obviously, in this

case, the loss function (6.2.2) must also satisfy Eq. (6.2.7) in the interior of the interval [^1,^2] and the zero initial condition (6.2.9). At the boundary points, instead of (6.2.8), we have

F(T, 4) = C(^I)T,

F(T, £ 2 ) = c(l2)r.

(6.2.20)

The conditions (6.2.20) follow from formula (6.2.2) and the fact that the trajectories z ( t ) stick to the boundary. Indeed, by using, as above, the discrete random walk model for z(t), we can rewrite (6.2.2) as

334

Chapter VI

and hence, since t and A are arbitrary, we obtain

F(t, li) = c(li)(T -t)= c(li)r,

i = 1, 2.

Just as in the preceding section, the exact solution of synthesis problem with absorbing screens can be obtained only in the stationary case (as

T —>• oo). Suppose that the stationary operating mode exists and that ZQ is the corresponding stationary switch point. Then for large r, the nonlinear equation (6.2.7) can be replaced by two linear equations

c(z)

'

^
B d2F-, 8F-> <9_F> Tirr + ^-^Tr^l^ -'(*)' 2i OZ OZ OT

z o < z < £ 2 . (6.2.22)

For z = ZQ, z = £1, and z = i^-, the functions FI and Jf<2 satisfy (6.2.11) and (6.2.20). In accordance with [26], for large T, we seek the solutions of the linear equations (6.2.21) and (6.2.22) in the form

Fi(r, z) = J>i(z)T + f i ( z ) ,

i = 1, 2.

(6.2.23)

Using (6.2.23), we obtain from (6.2.21), (6.2.11), and (6.2.20) the following system of ordinary differential equations for the functions il>$z) and f i ( z ) :

(6.2.24) From (6.2.24) we obtain

v. (6.2.25) za

In a similar way, for the functions ^>2 and /2 we have

f a ( z ) = c(l2),

h(z) = ^ fZ dy f" [C(12) - C («)]e A '(-») dy (6.2.26)

(here AI and A2 are given by (6.2.16)). It follows from (6.2.23), (6.2.25), and (6.2.26) that Eq. (6.2.7) has a continuous solution only if

c(/i) = c(/ 2 ).

(6.2.27)

Applications of Asymptotic Synthesis Methods

335

The same continuity condition allows us also to obtain the following equation for the switch point ZQ (provided that (6.2.27) is satisfied):

z

\

[c(t \c(t-$1)-c(y)}e —c(y}\e x^-^dy= dy =

dz /

a

Ir

dz

[c(4) - c(y)}ex^-^ dy.

•/£?

(6.2.28) Just as in the case of reflecting screens, Eq. (6.2.28) can be specified by various expressions for the penalty function c(z).

REMARK 6.2.1. If the condition (6.2.27) is violated, then it makes no sense to study the stationary operating mode in the problem with absorbing

boundaries, since in this case the synthesis problem has only a trivial solution. In fact, we can readily see that for c(li) > c(£ 2 ) we always need to set u = —um (correspondingly, for c(£i) < c(l^) we need to set u = +um). This character of control depends on the fact that, in view of its regularity, the diffusion process z(t] sticks to that or other boundary with probability 1 (as t —>• oo). Therefore, it is clear that this algorithm for controlling the process z(t) maximizes the probability of the event that the process sticks to the boundary with the least possible value of the penalty function c(z). D In the general case c(£i) ^ c(£ 2 ), we need to solve the nonstationary

boundary value problem (6.2.7), (6.2.20), and (6.2.9). Since this problem cannot be solved exactly, it is necessary to use approximate synthesis methods. In particular, we can use the method of successive approximations considered in Chapter III for problems with unbounded phase coordinates. According to Chapter III, the approximate solutions F^k\r, z) of Eq. (6.2.7) can be found by recurrently solving the sequence of linear equations

B, 2 dz2 2

fe

B 9 p( ) 7T

2

a 2o

dz

c + a^T=^^dz dr W>

k

Qp( )

Qp( )

+0-^—— = —5—— + un

dz

( 6 - 2 - 29 )

k

dr

dz

(all fC=)(r,z), k = 0,1,2,..., in (6.2.29) satisfy (6.2.9) and (6.2.20)). After F(k\T, z) are calculated, the synthesis of a suboptimal system is established by (6.2.10) and (6.2.11) with F replaced by F^. Just as in Chapter III, we can prove that the function sequence F^(T,Z) asymptotically as k —>• oo converges to the exact solution F(T, z) of the boundary value problem (6.2.7), (6.2.9), (6.2.20), and the corresponding suboptimal systems to the optimal system (the last convergence is estimated by the quality functional).

6.2.3. The two-dimensional problem. Suppose that the motion of a controlled system is similar to the dynamics of a Brownian particle

336

Chapter VI

randomly walking on the plane (x, y] so that along one of the axes, say, along the z-axis, this motion is controlled by variations of the drift velocity within a given region, while along the y-eads we have a purely diffusion

noncontrolled wandering. In this case, the equations describing the system

motion have the form

t),

y=V2B£2(t},

-(um - a) 0 indicate the boundary values of the nonsymmetric region of admissible controls u. We assume that the representative point ( x ( t ) , y ( t ) ) must not go away from the origin on the plane (x, y) to distances larger than TQ. To this end, we assume that the phase trajectories reflect from the circle of radius TO

along the inward normal to this boundary. Under this assumption, it is required to find a control law that minimizes the mean value of the quadratic optimality criterion

r rT ( x 2 ( t )

i

+ y2(t))dt\. J

I[u}=E { I I Jo

(6.2.31)

One can readily see that the Bellman equation related to this problem, written in the reverse time r — T-t, has the form (FT, Fx, Fy indicate the partial derivatives with respect to T, x, y):

B(Fxx + Fyy) +

min

[uFx] = Fr - x2 - y2.

-(« m -a)<«<(« m +o) L

(6.2.32)

v

'

In addition to Eq. (6.2.32) for the function F(r,x,y) such that 0 < T < T and ^/x2 + y2 < TO, the loss function F ( T , X , y) must satisfy the zero initial condition

F(0,x,y) = Q

(6.2.33)

and the boundary condition of the form [157]

OF,

where d/dn is the normal derivative on the circle of radius TQ.

(6.2.34)

Applications of Asymptotic Synthesis Methods

337

In the polar coordinates (r, (p) defined by the formulas x — r cos (p, y — rsinip, the boundary value problem (6.2.32)-(6.2.34) acquires the form B ( Frr + — Fr + —— 2

r

r

mm

/

.

\

I

\

•

'

„

r i i

(6.2.35) (6.2.36) (6.2.37)

It follows from (6.2.35) that, just as in the one-dimensional case, the optimal control is of relay type:

ut (r, r, ) = a — um sign ( cos
(6.2.38)

but now, instead of the switch point, we have a switching line on the plane (x,y). This switching line is given by the equation (in the polar coordinates)

cos
r

-F^ = 0.

(6.2.39)

To obtain an explicit formula for the switching line, we need to solve Eq. (6.2.35) or (since this is impossible) equations of successive approximations obtained by analogy with Eqs. (6.2.29). Now we shall calculate the loss functions and the corresponding switching lines for the first two approximations of Eq. (6.2.35). The zero approximation. Following the algorithm of successive approximations considered in Chapter III (see also (6.2.29)), we set the nonlinear term in the zero approximation of (6.2.35) equal to zero and thus obtain O) r F B 1(pW + -_F<°) + —-T^ F(°AI — - VF( - T. r2 rr ^ r T T D

( 6 2 40) ^O.Z.IUj

It follows from (6.2.40), (6.2.36), and (6.2.37) that the solution F^ is radially symmetric, F^ = F^(r,r), and therefore, instead of (6.2.40), (6.2.36), and (6.2.37), we have B p

+

°

=F

- r\

J ? ° ( 0 , r) = 0,

F^(r, r0) = 0.

(6.2.41)

Chapter VI

338

It is well known [179] that the solution of Eq. (6.2.41) can be found by separation of variables (by the Fourier method) as the series

(6.2.42) (u.2.4o) / r3/o I -^-r 1 dr. Jo \ ro ) Here IQ(X] is the Bessel function of zero order and is the mth root of the equation dlo(fj,)/d/j, = 0. It follows from the properties of zeros of the Bessel function [179] that the series (6.2.42) is rapidly convergent. Therefore, since we are interested only in the qualitative character of suboptimal control laws, it suffices to find only the first term of the series in the sum in (6.2.42). Calculating GI and using the tables of Bessel functions [77], we obtain the following approximate expression (9 = B/rfy for the function Cm — i

n Vorr /

r> \i9

(A)2[Io(A)]2

F(°)(r,r) = i r - 0 . 0 4 2 6 / 0

(6.2.44)

By differentiating (6.2.44) with respect to r and taking into account the

relations dlo(x)/dx = I i ( x ] and /J,® — 3.84, we find ro

1-exp -

(6.2.45)

Applications of Asymptotic Synthesis Methods

339

Since the first-order Bessel function /i(^°r/r 0 ) is positive for 0 < r < r0 (Ii(fJ^) = 0), the derivative (6.2.45) is positive everywhere in the interior of the disk of radius ro on the plane (z,j/). Hence, in view of (6.2.38), the sign of the controlling action in the zero-approximation is determined by the sign of cos
diameter of the disk of radius r0 on the plane (x, y) (in Fig. 51 the switching line is indicated by AOB; the arrows show the direction of the mean drift velocity).

The first approximation. By using the results obtained above, we can write the first-approximation equation as

(6.2.46)

, r,
0 <
(um+a)Fr(0)(T,r)cos
\ <

(6.2.47)

(here the function Fr(0) is given by formula (6.2.45)). The solution of Eq. (6.2.46) may also be written as a series in eigenfunctions, but since now there is no radial symmetry, this series differs from (6.2.42) and has the form [179] .7nT

00 !-AJ

OO !-^J

F(1) (r, r, tp) = COO(T) + / V} ^ (cnro cos n
) •'o n =o m =i

x /„ f ^-r) exp f - B(^-\\T - a)] d \

(6.2.48)

where

Cnm — Cnm\O) —

^»'

—

2~f jy

n

\2

21 r2 / ^ \

'

^o d «U^ m ; - " J - ' n l M m J

(6.2.49)

—————————!lO——!lO———————————————————————————————————————————

^•UOC)2 - "^KnOC) [ 1 ^ 2

CL, r=: <

for n / 0, for n = 0,

'

(6.2.50)

340

Chapter VI

and COO(T) denotes the terms independent of r and y>, and hence, insignificant for the control law (6.2.38). The numbers /^ are the roots of the equation dln(fj,)/dfj, = 0, where / n (/i) is the nth-order Bessel function. By analogy with the case of the zero approximation, we consider only the first most important terms of the series (6.2.48). Namely, we retain only the terms corresponding to the two roots fj,\ and /i° of the equation dln(/j,)/dfj, — 0; according to [77], ^,\ = 1.84 and $ = 3.84. This means that all coefficients in (6.2.48) except for CQI, GU, and c'n must be set equal to zero. The coefficient CQI coincides with ci in (6.2.43) and has been calculated in the zero approximation (therefore, in the series (6.2.48) the term containing CQI coincides with the second term in formula (6.2.44)). By calculating c'n according to (6.2.50) with regard to (6.2.47), we obtain c'u = 0. Thus, to find the loss function F^l\ it suffices to calculate only cu. Substituting (6.2.47) and (6.2.45) into (6.2.49), we obtain

2

_

(/4)2

Z x jT / «P[-»(,
'o

f ,,(r*,)!: Jo \ o ) r

x

i !-±r\rdr

»3>r/2

L

/.jr/2

cos2 if dip- (um - a) I

(wm + a) / ./jr/2

-j

cos2 p efy> .

./-JT/2

J

(6.2.51) Since we have (see [179], §2, Part 1, Appendix 2) pra

/..I \

/ ,,0 \

J,i I( M l 'r 1J, f - ' i l( ^r\rdrriror— Jo V r o / \^o / II

J

r2n°T, (iil1 \T'lii°\ li

r

o^'„ ^> ^^> . , (Mi)2-(w)2

we calculate the other integrals in (6.2.51) and obtain

,

(6.2.52)

Applications of Asymptotic Synthesis Methods

341

Substituting (6.2.52) into (6.2.39) and letting T —>• oo, we arrive at the following equation for the switching line corresponding to the stationary operating conditions:

[3.54v/i(/4v) - 1.91/1 (/4v)] cos2
+ 1.9Ui(fi\v) = 0 (6.2.53) (here v = r/r 0 , £ = Or0/a = B/ar0, and /i(/a) = dli((j,)/d(j,).

FIG. 52 Curves 1, 2, and 3 in Fig. 52 correspond to the three values of the parameter s in Eq. (6.2.53): e = 0.4, 1.0, 3.0. Thus, the optimal control in the first approximation consists in switching the control action from u — —(um — a ) in the region R_ to u = +(um +a) in the region R + , which (in dependence of the value of the parameter e) lies inside one of the closed

curves 1-3 in Fig. 52.

REMARK 6.2.2. The decomposition (Fig. 52) of the phase space into the regions R_ and R_|_ can be refined if the functions F^(r,z} and F^'(T,r^if>) are calculated more precisely (that is, are approximated by a larger number of terms in the series (6.2.42) and (6.2.48)). However, as the corresponding calculations show, curves 1-3 obtained in this case do not practically differ from those shown in Fig. 52. D §6.3. Optimal control of the population size governed by the stochastic logistic model

In this section we return to the problem of optimal control of the population size, which was formulated in §2.4 (but not solved). Let us briefly recall the statement of this problem.

342

Chapter VI

6.3.1. Statement of the problem. We shall consider a single-species population whose dynamics is described by the controlled stochastic logistic model

,

K

,

,

(6.3.1)

where x — x ( t ) is the population size (density) at time t, £(t) is a stochastic process (1.1.31) of the standard white noise type, and r, K q, B, and x° are given positive constants.

Admissible controls belong to the class of nonnegative scalar bounded measurable functions u = u(t] that for all t satisfy a condition of the form 0 < u(t) < um.

(6.3.2)

where um is a given positive number.

We shall consider the control problem on an infinite time interval R+ = [0, oo) with an arbitrary initial population size a;(0) = x° > 0. The goal of control is to maximize the functional

r r°° e~st (pqx(t)

I[u] = E\ Uo

i

- c}u(t) dt -> max , J o<«(*)<« ro

(6.3.3)

t>0

where 6,p,q,c > 0 are given numbers and E denotes the mathematical expectation of the expression in the square brackets (we average over the ensemble of random trajectories issued from a given point x(Q) = x° and satisfying the stochastic differential equation (6.3.1)). It follows from §2.4 that problem (6.3.1)-(6.3.3) is a stochastic general-

ization of optimal fisheries management problems studied in [35, 68, 101]. If, just as in §2.4 and in [35, 68, 101], the number p is the cost of unit mass of caught fish, the number c denotes the cost of unit efforts u(t) for fishing, and q is the catchability coefficient, then the functional (6.3.3) is an estimate of the mean profit obtained by fishing during time of the order of 1/S. The optimal control function u*(t) : R + -» [0, um] maximizing the functional (6.3.3) is a random function of time. To obtain a constructive algorithm for calculating this function, we need to use some results of the general control theory for processes of diffusion type (see [58, 113, 175] as well as §1.4).

We assume that the controlling party has information about the current values of the controlled process x(t). Then it is expedient to choose the control u(t) at time t on the basis of the entire body of information available

Applications of Asymptotic Synthesis Methods

343

on the controlled process. This leads to a controlling function of the form u(t) = u(t, XQ), XQ = {x(s): 0 < s < t} that is sometimes called the natural strategy of control (the function u(t,XQ) can be a probability measure). But if, just as in our case, the controlled system obeys an equation of the form (6.3.1) with perturbations £(t) in the form of a Gaussian white noise, then, as was shown in [113, 175], the prehistory of the controlled process x ( s ) : 0 < s < t does not affect the quality of control. Therefore, to solve the optimization problem (6.3.3), it suffices to consider only the class of controlling functions that are deterministic functions of the current phase variable u(t) = u(t,x(t)) (the nonrandomized Markov strategy). Next, since the stochastic process £(£) is stationary and the coefficients in (6.3.1) are time-invariant, the optimal strategy for the infinite-horizon problem in question does not depend on time explicitly, that is, uf(i) = u*(x(t)}. By using the controlling function (the synthesizing function) in the form u f ( x ) , we can realize the optimal control of system (6.3.1) in the form of an automatic feedback control system. In what follows, we present a method for calculating the synthesizing function u*(x) for problem (6.3.1)-(6.3.3).

6.3.2. Solution of problem (6.3.1)-(6.3.3). By analogy with §2.4 and on the basis of the results obtained in [113, 175], we can assert that the maximum value of the functional (6.3.3) (that is, the cost function)

r r ^^ F(x) =

max 00

E /

e~st(pqx(t) - c)u(t)dt

x(0) = x

IJo

considered as a function of the initial state x is twice continuously differentiable and satisfies the following Bellman equation7 (F1 = dF/dx, F" = d2F/dx2):

Bx2F"+x^r+B-~xjF'+

max

[(pqx-c-qxF')u]-6F

= 0. (6.3.4)

The cost function is defined only for nonnegative values of the variable x; for x = 0, this function satisfies the natural boundary condition F(Q) = 0,

(6.3.5)

which is a straightforward consequence of (6.3.1) and (6.3.3) (indeed, it follows from (6.3.1) that if x(0) = 0, then x(t) = 0 for all t > 0; hence, 7

Equation (6.3.4) is written with regard to the fact that the solution of the stochastic

equation (6.3.1) is understood in the symmetrized sense (see §1.2 and [174]).

344

Chapter VI

it follows from (6.3.3) that in this case the optimal control has the form u*(t) = 0; and hence, (6.3.3) implies (6.3.5)). First, note that Eq. (6.3.4) for 6 > r + B and K —> oo has exact solution (obtained in §2.4) •I

J_

/

C \ / 1* \

TO(PXO~ -}(—) (

7 0

\

>

Q<x<x0,

VOX

C\

F(x)={u m(-———^—————--} v ' \6 — r - B + qum a/ I r (S - r - B)px0 8 - r - B + qum q

(6.3.6)

Here

_ _

c(S-r-B +9Um)

(6_3J)

determines the switch point of the optimal control in the synthesis form

u*(x)=t

f 0,

0 < x < a*,

(. Um,

X > XQ,

(6.3.8)

and the numbers fc° > 0 and k^ < 0 in (6.3.6) and (6.3.7) can be written

in terms of the parameters of problem (6.3.1)-(6.3.3) as r 2 + 4SB),

kl = ~ (qum -r- ^/(qum - r)« + 4SB).

For an arbitrarily chosen value of the parameter (the medium capacity) K > 0, it is impossible to find the solution of Eq. (6.3.4) in the form of

finite formulas like (6.3.6) and (6.3.7). Nevertheless, as is shown below, constructive methods for solving the synthesis problem can also be found in this case. Let us construct a solution of Eq. (6.3.4). First, we note that it follows from (6.3.4) that the optimal control requires only the boundary values u = 0 and u = um of the set [0, um] of admissible controls. The choice of one of these values is determined by the sign of the expression 7(2;) = pqx — c — qxF'x. If 7(2:) = 0, then the choice of control is not determined formally by Eq. (6.3.4). However, one can see that in this case the choice of any admissible value of u does not affect the solution of Eq. (6.3.4), since the nonlinear term of Eq.(6.3.4) vanishes for 7(2:) = 0 and any admissible u. Therefore, we can write the optimal control in the form i

;m,

-\lv\ i\x! ^ ^ n"> 7(2;) > 0.

Applications of Asymptotic Synthesis Methods

345

If the equation 7(2;) = 0 has a single root x f , then the optimal control can be written in the form 0 < x < Xt, J O,

(6.3.9)

= <

1. ^"m t

*^* \ iC.

similar to (6.3.8), where the coordinate of the switch point xt is determined by the equation

pqx - c - qxF'(x) = 0,

(6.3.10)

whose solution can be obtained after the cost function F ( x ] is calculated. By Fo(a;) and FI(X) we shall denote the cost function F(x) on either side of the switch point xf. Then, as it follows from (6.3.4) and (6.3.9), instead of one nonlinear equation (6.3.4), we have two linear equations for FQ and

Bx2F^' + x(r + B-^-x}F^-SF0 = Q, K. I

V

(r + B -qu

*>

m

0 < x < z,,

(6.3.11)

\

- —x\F[-6Fi = um(c-pqx), K

x* < x. (6.3.12)

Since the cost function F(x), as the solution of the Bellman equation (6.3.4), is twice continuously differentiable for all x E [0, oo), the functions FQ and FI satisfy the boundary condition (6.3.10) at the switch point x*. Moreover, it follows from (6.3.5) that -Fo(O) = 0. These boundary conditions allow us

to obtain the unique solution of Eqs. (6.3.11) and (6.3.12), and thus, for all x G [0,oo), to construct the cost function F(x) satisfying the Bellman equation (6.3.4). We shall seek the solution of Eq.(6.3.11) as the generalized power series TTl / \ __ I \ r^\x) ^ x (7(O.Q -j-i a^x +| 0-2^ 2 +I ... j.

/ i ? O 1 O \ (o.o.ioj

By substituting the series (6.3.13) into (6.3.11) and setting the coefficients of x" ,x"+1, . . . equal to zero, we obtain the following system for the characteristic factor cr and the coefficients a,-, i = 0, 1, 2, . . . : - S]a0 = 0,

[Ba-(a - 1) + (r + B)a -6 + 2B
(
- l) + (r + B)
r ~ J^(^ + n- 1)O n-l n = 1,2,3..., .

= 0,

(6.3.14)

346

Chapter VI

If we set a0 7^ 0, then the first relation in (6.3.14) implies the characteristic equation

Ba2 + ra - S — 0, whose roots

2£ v

• . v-

i -—/i

_ 1,0 2 _ __ / _

""~

~2B

_

/

2 i AX n\

-i-"±0-0;

determine two possible values of the characteristic factor (x), (6.3.15) where

i>(x) = x i , ,

.

fcO

^^, "

n=1

,„

'

(6.3.16)

n\\KB)

For the coefficients of the series (6.3.16) we have the estimate

Thus, the series (6.3.16) converges for any finite x > 0, we can differentiate this series term by term, and its sum ^(x) is an entire analytic function satisfying the estimate < xl exp

- .

\2KB1

The constant ao in (6.3.15) can be found from the boundary condition (6.3.10) for the function FQ at the switch point x*. Hence we have the following final expression for the solution of Eq. (6.3.11):

77?T-

(6'3-17)

The nonhomogeneous equation (6.3.12) is of the same type as Eq. (6.3.11) and its solution can also be expressed in terms of generalized power series.

Applications of Asymptotic Synthesis Methods

347

It is well known that the general solution of the nonhomogeneous equation (6.3.12) is the sum of the general solution of the homogeneous equation,

•B-qum-^-x]xF[-SF1 = 0,

(6.3.18)

•K /

and any particular solution of Eq. (6.3.12). Equation (6.3.18) is similar to Eq. (6.3.11), and therefore, its solution can be constructed by analogy with the above-described procedure (6.3.13)-(6.3.17). Performing the required calculations, we obtain the following expression for the general solution of Eq. (6.3.18): (6.3.19) Here ci and 03 are arbitrary constants and the functions i>i(x] and are the sums of generalized power series

(6.3.21) where the numbers k^, k^, and a are determined by the expressions

,>

(qum - r)2 + 4SB.

(6.3.22)

Note that the series (6.3.20) for any finite x can be majorized by a convergent numerical series. Therefore, the series (6.3.20) can be differentiated and integrated term by term, and its sum tl>i(x) is an entire function. Similar statements for the series (6.3.21) hold only for a ^ n (here n is a positive integer number); in what follows, we assume that this inequality is satisfied. A particular solution of the nonhomogeneous equation (6.3.12) can be found by the standard procedure of variation of parameters. We write the desired particular solution $ as

*(x) = ci(z)Vi(z) + c2(x)i>2(x),

(6.3.23)

where the desired functions c\(x) and C 2 (cc) satisfy the condition

ci(a;)^i(a;) + c'2(z)V>2(a;) = 0.

(6.3.24)

348

Chapter VI

By substituting, instead of FI, the relation (6.3.23) into (6.3.12), after simple calculations with regard to (6.3.24), we obtain _ um [ ~ B J

(c - pqx)il>2(x) dx

_ Mm f

(pqx - c)tl>i(x) dx

o'

~ B J x^2(x)^((x)~^(x)^(x)Y

l0

-*' ° j

Note that the expression in the square brackets in the integrands in (6.3.25) and (6.3.26) is the Wronskian of Eq. (6.3.12), which is not zero for all x, since the solutions tj>i(x) and ^(x) are linearly independent. Therefore, we can readily calculate the integrals in (6.3.25) and (6.3.26) and thus find the functions ci(x] and C2(x) as generalized power series obtained by term-by-term integration in (6.3.25) and (6.3.26). Thus the general solution of the nonhomogeneous equation (6.3.12) has the form x) + $(z), (6.3.27) where $(z) is given by (6.3.23), (6.3.25), and (6.3.26). To obtain the unique solution satisfying the Bellman equation (6.3.4) for x > a;*, we need to

choose arbitrary constants ci and c2 in (6.3.27). To this end, we use the boundary condition (6.3.10) for the function FI(X) at the switch point xt. To obtain the second condition, we assume that the functions FQ(X) and

FI(X) coincide as K —>• oo with the known exact solution F ( x ) given by

(6.3.6). It follows from (6.3.16), (6.3.17), (6.3.20), (6.3.21), (6.3.25), and (6.3.26) that this condition is satisfied if we set ci = 0. The condition (6.3.10) for the function FI(X) at the point xf implies 1 / c \ c2 = -777-T- (p- —— - $'(«*) . il>'2(x) \ qxt ')

Thus, the desired solution of the inhomogeneous equation (6.3.12) acquires the form

- + $(x},

x>x..

(6.3.28)

Formulas (6.3.17) and (6.3.28) determine the cost function F ( x ) that satisfies the Bellman equation (6.3.4) for all x € [0, oo). In these formulas, only the coordinate of the switch point x* remains unknown. To find x*,

we use the condition that the cost function F(x) must be continuous at the switch point: F0(x)

= FI(X)

(6.3.29)

Applications of Asymptotic Synthesis Methods

349

or, which is the same due to (6.3.10), the condition that the second-order derivative must be continuous: (6.3.30)

Since the series (6.3.16) and (6.3.21) are convergent, we can calculate z* with any prescribed accuracy, and thus solve our equations numerically. Furthermore, for large values of the medium capacity K, formulas (6.3.29) and (6.3.30) give us approximate analytic formulas for the switch point, and these formulas allows us to construct control algorithms that are close to the optimal control. 6.3.3. The calculation of x* for large K. In the case K —> oo, the functions i}>(x), ij)i(x), and ^2(a?), as it follows from (6.3.16), (6.3.20), and (6.3.21), are given by the finite formulas

Correspondingly, instead of the series (6.3.15) and (6.3.28), we have 1 / \ / \k° F0(x) = —lpx#--\( — J , = un •

(6.3.31)

pqx

S — r — B + qun t

(S-r-B)px*

c\ ( x

N

*' (6.3.32)

By substituting (6.3.31) and (6.3.32) into (6.3.30), we obtain x, = x0, where XQ is given by (6.3.7) (derived in §2.4). If the medium capacity K is a finite number, then the coordinate x* cannot be written as a finite formula. However, it follows from continuity considerations that for large K the coordinate xf is close to XQ, so that we can take XQ as the first approximation to the root of Eqs. (6.3.29) and (6.3.30). Then the corrections for refining this first approximation can be calculated by the following scheme. For large K, the e — r/KB can be considered as a small parameter and, as follows from (6.3.15), (6.3.16), (6.3.20), (6.3.21), and (6.3.28), the functions F0(x) and FI(X) can be represented as power series in e:

F0(x] = Fg(x) + eFt(x) + e2F*(x) + e3 ..., 2

3

F!(X) = F?(x) + sFt(x) + e Fl(x) + e ...

(6.3.33) (6.3.34)

350

Chapter VI

We also seek the root of Eqs. (6.3.29) and (6.3.30), that is, the coordinate xf, as the series

xf = K 0 + eAi + e 2 A 2 + e 3 . . . ,

(6.3.35)

where the numbers x 0 i AI, A 2 , . . . must be calculated. By substituting the

expansions (6.3.33)-(6.3.35) into Eq. (6.3.29) (or (6.3.30)) and setting the coefficients of equal powers of the small parameter e on the left- and righthand sides equal to each other, we obtain a system of equations for successive calculation of the numbers x 0 , AI, A 2 ,... in the expansion (6.3.35). Obviously, the first term XQ in (6.3.35) coincides with (6.3.7). To calculate the first correction AI in the expansions (6.3.33) and (6.3.34), we retain the zero-order and the first-order terms and omit the terms ~ s2 and

higher-order terms. As a result, from (6.3.16), (6.3.17), (6.3.20), (6.3.21), and (6.3.28) we obtain the following expressions for the functions FQ(X) and FI(X) in the first approximation:

^H1 + —TT >

————,^°,o - 7.U / 7~U

i

^——» 1\ _„ '

(6.3.36) \ '

-a:* J

ek\x ft

( AI - —— +eA2xt as* o

^

[ f c 2+

V

^^J

c2 + (p-A1)x,

(6.3.37)

where A

I = T^———;

(r + B- S)(4B - 2qum + 2r-S) 1 — )2 _|_ AX R v v ^—,n

» ; -TIUJJ.

r

By differentiating (6.3.36) and (6.3.37) two times, we rewrite Eq. (6.3.30) p_

cr \ f t 0 \ I «1

qx* J L

a;* r-

\

TZ- 1 — 1

f(^ J- 111

-eA2. (6.3.38) To calculate the first two terms in the expansion (6.3.35), we substitute the root x* = x0 + eAi into Eq. (6.3.38) and collect the terms of the zero

Applications of Asymptotic Synthesis Methods

351

and the first order with respect to the small parameter e. If we retain only the zero-order terms in Eq. (6.3.38), then we can readily see that (6.3.38) implies formula (6.3.7) for XQ. Collecting the terms of the order of e, from (6.3.38) we obtain the first correction

qx0

(6.3.$9)

-ai

Thus, for large values of the parameter K (that is, for small e), the coordinate XQ given by (6.3.7) can be interpreted as the switch point in the

zero approximation. Correspondingly, the formula

where XQ and AI are given by (6.3.7) and (6.3.39), determines the switch point in the first approximation. Let UQ(X) and MI (a;) denote the controls Ui(x)

=

f 0,

0 < x < Xi,

I, Mm,

X{ < X,

~

*:

(6.3.41)

I— 0, 1.

Obviously, by using these algorithms to control system (6.3.1), we can decrease the value of the functional (6.3.3) compared with its maximum value F(x), which can be obtained by using the optimal control (6.3.9). However, it is natural to expect that this decrease in the value of the functional

(6.3.3) is negligible for large K , and moreover, the quasioptimal control MI(Z) is "better" than the zero-approximation algorithm UQ(X) in the sense that /[MI] > I[u0]. 6.3.4. Results of the numerical analysis. Our expectations are confirmed by the following results of numerical analysis of the quasiopti-

mal algorithms (6.3.41). By d(x) we denote the value of the functional (6.3.3) obtained by using the controls Ui and a given initial population size

z(0) = x. Then G,-(x) is a continuously differentiable function of the initial state x and satisfies the linear equation /

r \

V

& J

Bx*G'!+ T+B-—X }xG'i + (pqx-c-qxG'i)ui(x)-SGi

= 0,

G(0) = 0. (6.3.42)

352

Chapter VI

Denoting by GM(X) and Gu(x), just as in Section 6.3.2, the values of the function d(x) on either side of the switch point a;,-, we obtain the following equations for G,-0 and GJI from (6.3.42):

Bx2G'/0 +r + B- K x xG'io - SGn = 0, 0 < x < xt,

/

V

(

r \

Bx2G"1 + [ r + B - qum - — x }xG'i-L - 5Gn = um(c-pqx), K V /

(6.3.43) a;,- < x, (6.3.44)

which are quite similar to Eqs. (6.3.11) and (6.3.12). Therefore, the general solutions of these equations, by analogy with Section 6.3.2, have the form

GiQ = ci^(z),

Ga = c 2 Y> 2 (z) + *(z),

(6.3.45)

where the functions i/>(x), i>z(x), and $(*) are given by formulas (6.3.16), (6.3.20), (6.3.21), (6.3.23), (6.3.25), and (6.3.26). The functions (6.3.45) differ from the corresponding functions (6.3.17) and (6.3.28) in Section 6.3.2 by the method used for calculating the constants ci and 02 in (6.3.45). In Section 6.3.2 the corresponding constants (ao in (6.3.15) and ci,C2 in (6.3.27)) were determined by the condition (6.3.10) at an unknown switch point a;,, while in Eqs. (6.3.42) the switch point Xi was given in advance either by (6.3.7) with i = 0 or by (6.3.40) with i = 1. By substituting (6.3.45) into the formulas Gio(xi) = Gn(xi) and G'io(xi) = Gj-1(a;,-),8 we obtain the following formulas for the coefficients Si and C2 in (6.3.45):

_

=

By choosing specific numerical values of the parameters r, K,
(or (6.3.30)), and then to substitute the obtained value into (6.3.46) instead of X{. In this case, the functions Gio(x) and GH(X) given by (6.3.45) 8

These formulas follow from the condition that the solutions G;(x) of Eqs. (6.3.42)

are continuously differentiable.

353

Applications of Asymptotic Synthesis Methods

coincide, respectively, with the functions F0(x) and FI(X) given by (6.3.17) and (6.3.28), that is, we have Gi(x) = F(x). The above-described procedure for numerical constructing the functions Go(x), GI(:E), and F(x) was realized in the form of software and was used in numerical experiments for estimating the quality of the quasioptimal

control algorithms UQ(X) in the zero approximation and ui(x) in the first approximation. Some results of these experiments are shown in Figs. 53

and 54, where the cost function F ( x ] is plotted by solid curves and the functions GQ(X) and GI(X] by dot-and-dash and dashed curves, respectively.

F(x) 1.6 Gi(x = 7.5

G0(x 1.4

1.2

1.0

0.8 X*

0

X*

1.0 "-—--""1.2 ^-^ 1.4 # = 7.5 # = 11

1.6

FIG. 53 In Fig. 53 these curves are constructed for two values of the parameter K: K = 7.5 and K = 11; the other parameters of problem (6.3.1)-(6.3.3) are: r — 1, S = 3, B = 1, q = 3, um = 1.5, c = 3, and p — 2. In this case, the variable e = r/KB treated as a small parameter in the expansions

(6.3.33)-(6.3.35) attains the values e = 0.091 (the upper group of curves) and £ = 0.133 (the lower group of curves). Figure 53 shows that in this case all three curves F(x), GQ(X), and GI(X), relative to the same group of parameters, are sufficiently close to each other. Hence, the use of the quasioptimal algorithms (6.3.41) ensures the control quality close to that

of the optimal control (obviously, the first-approximation control ui(x) is

Chapter VI

354

preferable than the zero-approximation control U0(x), since the mean cost GI(Z) corresponding to ui(x) is closer to the optimal cost F(x)). 0.07 : F(x)

0.06

Gi(s) G0(x)

0.05 0.04 0.03

0.02 X0

0

0.65

0.55 K -0.17

0.7

FIG. 54 It is of interest to point out that an improvement in the control quality can be obtained by using MI (a) instead of UQ(X} even if the parameter

s — r/KB is not small. This phenomenon is clearly illustrated by the

results of calculations shown in Fig. 54, where the curves F(x), GQ(X), and GI(X) are drawn for the following parameters of problem (6.3.1)-(6.3.3): r = 1, <J = 20, B = 1, q = 3, um = 100, c = 3, p - 2, K = 0.3, and tf = 0.17. Many times in Chapters III, V, and VI we have considered similar situations (in which the formal use of the approximate synthesis procedure developed for problems with a small parameter e <S 1 provides satisfactory results for e ~ 1). Thus we see that the small parameter methods and related methods of successive approximations are very effective tools for

investigation and solution of various specific practical problems of optimal control.

CHAPTER VII

NUMERICAL SYNTHESIS METHODS

Numerical synthesis methods are, in general, mostly universal compared with any other methods for solving problems of optimal control, since numerical methods are highly insensitive to the problem conditions. Indeed, each of the approximate methods described in Chapters III-VI is intended for solving optimal control problems from a certain class characterized by the singularities of the plant dynamics equations, by small parameters, etc. The choice of the method for obtaining quasioptimal control algorithms essentially depends on the singularity of the control problem designed.

On the other hand, if the control problem is solved, just as in the present book, by the dynamic programming method, then the possibility to solve the synthesis problem numerically is determined by the way of constructing

a numerical solution of the Bellman equation corresponding to the problem in question. The type of this Bellman equation is determined by the character of the problem considered. So, the majority of stochastic synthesis problems studied in Chapters II-VI correspond to the Bellman equations in the form of nonlinear second-order partial differential equation of the parabolic type. Correspondingly, the Bellman equations for deterministic synthesis problems are (nonlinear) first-order partial differential advection

type equations. Equations of both types were thoroughly studied long ago. Such equations arise in many problems of mathematical physics and mechanics of continuous media, in modeling chemical and biological processes, etc. Hence, so far numerous different numerical methods have been developed for solving such equations,1 many of which are realized as standard programs that 1 It would be right to note that numerical methods have been developed mostly for solving second-order parabolic equations. Nonlinear advection equations have been less

studied until the present time. However, many papers dealing with qualitative analysis and numerical solution of such equations have appeared most recently. Here we would

like to mention the Italian school of mathematicians (M. Falcone, R. Ferretti, and others) who studied various discrete schemes that allow the construction of numerical solutions

for various types of nonlinear advection equations including those with discontinuous solutions [10, 31, 48, 49, 53],

355

356

Chapter VII

are parts of well-known software packages such as MATLAB, Mathematica, and some others. It should be noted that the existing software can be used for solving synthesis problems in practice rather seldom. This fact is related to some

peculiar features of the Bellman equations (see §3.5 in [34]), which make the application of standard numerical methods rather difficult. For example,

the difficulties arising in solving the Bellman equations of higher dimensions are well known. Furthermore, an obstacle known as the " boundary difficulty" is often encountered in the numerical solution of synthesis problems. Obviously, any numerical procedure allows us to construct the solution of the Bellman equation only in a bounded region D where the arguments of the loss function vary. Therefore, if, for example, we solve the Bellman equation of parabolic type, then we need to pose the initial and boundary conditions on the boundary of D. At the same time, many optimal control problems do not contain any restrictions on the phase coordinates (in this case, to solve the synthesis problem, we need to solve the Cauchy problem for the Bellman equation). Thus, for a reasonable choice of the boundary conditions required for the numerical solution of the problem, we need, in addition, to study the asymptotic behavior of the loss function at infinity. In more detail, these problems are considered in §7.1. In §7.1 and §7.2 we show how one of the most widely used methods

(known as the grid function method] for solving partial differential equations numerically can be applied for the numerical solution of some specific optimal control problems studied in the previous chapters by other methods. §7.1. Numerical solution of the problem of optimal damping of random oscillations

The main results of this section are related to the numerical solution of the problem of optimal damping of random oscillations in a linear oscillator; this problem was studied in §3.2 and §3.4. However, we begin with some general problems concerning methods for stating the boundary conditions for the loss function in solving the synthesis problem numerically. 7.1.1. Choice of the boundary conditions for the loss function. Let us consider a control system governed by the differential Ito equation of the form dx(t)

= [a(t, x) + q(t)u] dt + cr(t, x) d0rj(t),

0 < t < T,

x ( 0 ) =x0.

( • • 1

Numerical Synthesis

357

Here x = x(t) is an n-dimensional vector of phase variables, u = u(t) is an r-dimensional vector of controlling actions, r)(t) is a d-dimensional vector of independent Wiener stochastic processes of unit intensity, a(t, x) is an n-dimensional vector of given functions, and q(t) and a(t, x) are given n x r and n x d matrices.

We assume that admissible control actions are subject to constraints of the form

u(t)

€ U,

(7.1.2)

where U is a given closed bounded set in R r . If the vector of current phase variables x(t) can be measured exactly, then we need to construct a control function w* = ut (t, x(t)^, 0 < t < T,

in the synthesis form so that, for any given initial state cc(0) = KO, the function u* minimizes the following functional denned on the trajectories of Eq. (7.1.1):

r rT i I[tt] = E / c ( x ( t ) ) d t + i/>(x(T))\ LJo J

(7.1.3)

(here £[•] is the mathematical expectation of [•], c(x):i/>(x) > 0 are given penalty functions, and 0 < t < T is a given time interval). According to §1.4, the dynamic programming approach allows one to

reduce problem (7.1.1)-(7.1.3) to solving the partial differential equation (the Bellman equation)

LF + rmn[uTqTFx] = -c(x),

0
F(T, x) = ^(x),

ueU

(7.1.4)

L = — + a T (i, x)— Here F = F(t,x) is the loss function determined as usual by

F(t,x)=

F fT min E / c ( x ( s ) ) ds + i/>(x(T)) «(»)e^ I J t t<s
1 x(t)=x\. J

(7.1.5)

Equation (7.1.4) is a semilinear (linear with respect to higher-order derivatives) equation of parabolic type, and we shall try to solve it numerically by using different versions of well-studied finite-difference procedures in the grid function methods (the grid methods) [135, 162, 163, 179]. However, these calculational scheme allow one to obtain the solution only in a bounded domain D of the phase variables x. To apply these methods, we need to impose some boundary conditions on the loss function F(t, x) on the boundary of D. Since in the initial statement of the problem it is

358

Chapter VII

assumed that the solution of Eq. (7.1.4) must be defined on an unbounded phase space (x G Rn), the boundary conditions for F(t, x) require a special analysis if Eq. (7.1.4) is solved numerically. A possible method for overcoming the boundary indeterminacy in stochastic time-optimal control problems was proposed in [85].

For the problem considered here, the essence of the method suggested in [85] consists in the following. Suppose that it is required to construct a numerical solution of Eq. (7.1.4) in a bounded region D. Let us consider a sequence of expanding bounded regions in the phase space DR D D (DR can be the n-dimensional ball of radius R or the n-dimensional cube with edge R centered at the origin). Then the desired solution F(t, x) is denned

in the region D as the limit of the sequence of numerical solutions of the boundary value problems for Eq. (7.1.4) in the regions DR, corresponding to the increasing sequence of values of the parameter R. In this case, the

boundary conditions posed on the boundaries of the regions DR can be arbitrary (for example, the zero conditions F(t,x}\QD = 0). However, in practice, the use of this procedure in the numerical synthesis requires an extremely large amount of calculations. For example, already

for the second-order system (7.1.1) (a; 6 R-a)) this method is unacceptable, since the time required to compute the solution is too large. Here we present a more economical numerical method based on the use

of the asymptotic behavior of the loss function for large \x\. In this case, we need a priori estimate the asymptotic behavior of F(t, x) satisfying (7.1.4) as \x\ —>• oo. Suppose that q(t) be a piecewise continuous bounded function for all t G [0, T] and a(t, x) and a(t, x) are continuous functions in x, Borel function in (t,x), and satisfy the conditions

\a(t,x)-a(t,y)\ + \\
(

'

for all x,y, £ Rn and t £ [0,T], where N > 0 are constants, a is the

Euclidean norm of the vector a, and \\cr\\ — *\/Sp<7<7T. We assume that the penalty functions c ( x ) , tj)(x) > 0 are continuous and satisfy the condition

Ni\x\m(x)< JV 2 (1+ x\)m

(7.1.7)

for all x G Rn and some m, AT1; JV2 > 0; furthermore,

|c(z) - c(y)\ + \j(x) - i>(y)\ < N(l + R)m~l\x - y\ for all R > 0 and x, y e SR (Sn is a ball of radius R in R n ).

(7.1.8)

Numerical Synthesis

359

By using Theorem IV. 1.1 [113], one can show that the conditions (7.1.6), (7.1.8), together with the upper estimates (7.1.7), guarantee that the function F(t,x) satisfying problem (7.1.1)-(7.1.3) has generalized first-order derivatives in x and the estimate

\Fx(t,x)\
(7.1.9)

is satisfied for any i G [0, T] and for almost all x. The lower bounds for the penalty functions (7.1.7) and the continuity of the phase trajectories x(t) imply the following lower estimate for the loss function:

\x\)m.

(7.1.10)

Let F°(t, x) denote the solution of the linear equation

LF° = -c(x),

0
F°(T,x) = i)(x)

(7.1.11)

(L is the operator in (7.1.4)). Obviously, F° is the value of the functional

U

T

1 c ( x ( s ) ) ds + $(x(T)) | x ( t ) = x\. J

(7.1.12)

This functional is calculated on the trajectories of system (7.1.1) corresponding to the noncontrolled motion (the averaging in (7.1.12) is performed over the set of sample paths x(s) : t < s < T issued from a given point x(t) = x and satisfying the stochastic differential equation (7.1.1) for u = 0). It follows from (7.1.4) and (7.1.11) that the difference G(t, x) = F°(t, x)F(t,x) satisfies the equation

LG = $(t,Fx),

0
G(T,x) = 0.

(7.1.13)

Here $ denotes the nonlinear function $(<, Fx) — — minu€U[uT qT Fx]. Since the set U of admissible controls and the function q(t) are bounded, we have the estimate

\$(t,Fx)\
(7.1.14)

If the transition probability density of a noncontrolled Markov process x(s) satisfying Eq. (7.1.1) for u = 0 is denoted by p(x, t; y, s) (s > 2), then

we can write the solutions of Eqs. (7.1.11) and (7.1.13) in quadratures (see (3.4.13)). In particular, for the function G we have

G(t,x)= I Jt

ds I JRn

$(s,Fy(s,y))P(x,t;y,s)dy.

(7.1.15)

360

Chapter VII

This relation and (7.1.9) imply the following (similar to (7.1.9)) upper bound for the difference G - F - F°:

\G(t,x}\
G(t, x)/F(t, x) = [F°(t, x) - F(t, x ) ] / F ( t , x) -> 0

(7.1.16)

as \x\-> oo. This condition allows us to use F°(t, x) as the asymptotics of the loss function F(t,x] for solving the Bellman equation (7.1.4) numerically. In some cases, for instance, in the example considered below, we succeed in obtaining a finite analytic formula for the function F°(t, x). 7.1.2. Numerical solution of a specific problem. We shall discuss

the method of numerical synthesis in more detail for the problem of optimal damping of random oscillations studied in §3.2 and §3.4. Suppose that the plant to be controlled is a linear oscillator with one degree of freedom governed by an equation of the form

x+/3x + x = u+V2B£(t),

\u\
(7.1.17)

where £(t) is the scalar standard white noise (1.1.31), u is a scalar control, and /?, B, and um are given positive numbers (/? < 2). By setting the penalty functions c(x(t}} — X2(t) + x2(t) and i/>(x) = 0 in (7.1.3), we obtain the Bellman equation

Ft +yFx - (x+(3y}Fy + BFyy = -x2 - y2 + um\Fy\, -oo < x,y< +00, 0
^

for the loss function F(t,x,y) (here x and y = x are the phase variables). By passing to the reverse time p = T — t, we can rewrite (7.1.18) as the standard Cauchy problem for a semilinear parabolic equation. By using the old notation t for the reverse time p, we rewrite (7.1.18) as

Ft = BFyy + yFx - (x+(3y)Fy - um\Fy\ + x2 + y2, 0
^ ( • • I

We shall seek the numerical solution of Eq. (7.1.19) in the quadratic (-L < x < L, —L < y < L) region D of the phase variables (see Fig. 55). We need to pose boundary conditions for the function F(t,x,y) on the boundary of D. It follows form (7.1.17) that the phase trajectories lying in

361

Numerical Synthesis

E

-L

-L

D

FIG. 55 the interior of D cannot terminate on the boundary segments BC and ED indicated by dashed lines in Fig. 55. Therefore, we need not pose boundary

conditions on these segments; on the other parts of the boundary, as it follows from Section 7.1.1, the boundary conditions are posed with the aid of the asymptotics F°(t,x,y) satisfying the linear equation F? = BF°y + yF°x - (x Q
F°(0, x, y) = 0.

(7.1.20)

Up to the notation, Eq. (7.1.20) coincides with Eq. (3.4.23) whose solution was obtained in §3.4 as the finite formula (3.4.29). Rewriting (3.4.29) with

regard to the notation used in the present problem, we obtain the solution of Eq. (7.1.20) in the form

2B

^•sm2St(B-,

j cos 2St(x2(/32 - 2) + 2(3xy + 2y2 - B{3) , 6 =

(7.1.21)

Formula (7.1.21) allows us to pose the boundary conditions for the desired function F = F(t,x,y) on the unhatched parts of the boundary

362

Chapter VII

of D = (-L < x,y < +L). To this end, we set F = F(t,x,y) — F°(t,-L,y) on AB, F = F°(t,x,L) on CF, F = F°(t,L,y) on EF, and F = F°(t,x,-L) on AD. Let us construct a uniform grid in the domain HT = {D x [0,T]} =

{(x,y,t): - L < x, y < L, 0 < t < T}. By F^ we denote the value of the function F(t,x,y) at the point with coordinates (t = kr, x = ih,y = jh), where h and T are the approximation steps in the coordinates x,y and in time t and i, j, k are integer- valued variables with values in —Q < i < +Q, -Q < 3 < +Q, a,ndO
It follows from (7.1.19) that for & = 0 we must set

*t' = 0.

-Q<*,J
(7.1-23)

at all nodes of the grid.

For the difference approximation of Eq. (7.1.19) we shall use a locally one-dimensional solution method (a lengthwise-transverse scheme) [163]. In this case the complete approximation scheme consists in solving the following two one-dimensional (with respect to the phase coordinates) equations successively:

vt = yvx + x2,

(7.1.24) 2

Vt = -(x+0y + um AgnV,)Vy + BVyy + y .

(7.1.25)

Each of Eqs. (7.1.24) and (7.1.25) is replaced by a two-layer difference scheme denned by the three-point pattern (Eq. (7.1.24)) or by the four-point pattern (Eq. (7.1.25)). In this case, since the parts of the boundary of D

indicated by dashed lines in Fig. 55 are inaccessible, we shall approximate vx = dv/dx by the right difference derivative for y > 0 (j > 0) and by the left difference derivative for y < 0 (j < 0). Then the derivatives Vy = dV/dy and Vyy = 92V/dy2 are approximated by the central difference derivatives

vx ~ '+1'J

% J

' ,

j > 0,

-Q < i < Q - 1,

Numerical Synthesis k

363

k

j < 0, -Q + 1 < i < Q,

' ' ~ '

fc T/

*~

,

,-

2h

'

,, y

-

o f c

i

fc

-g + 1
(7.1.25) are' related as follows: F^ = vfj, vk+l = V£, and V^ = Ftk+l. Moreover, since the time-step is assumed to be small (we take r — 0.01), in the difference approximation of Eq. (7.1.25) we can use the sign of the derivative Vk = vk+l instead of sign (V*^ - V^jt\), that is, we shall use m:j — sign(F,-j+1 — VJ*j_i) instead of signVy (a similar replacement was performed in [34, 86]).' It follows from the preceding that the difference approximation transforms Eqs. (7.1.24) and (7.1.25) to the following three difference equations;

(VU -vij)'T = J(

0 < j < Q - 1, -Q < i < Q - 1, (7.1.26)

(7.1.27)

-Q+l
(7.1.28) Formulas (7.1.26) and (7.1.27) together with the boundary conditions (7.1.22) and the initial conditions (7.1.23) allow us to calculate the functions vfj"1 recurrently at all nodes of the grid. Indeed, rewriting (7.1.26) and (7.1.27) in the form

+ JTV

+ r(ih)2}/(l + jr),

j > 0,

(7.1.29)

JT),

j<0,

(7.1.30)

364

Chapter VII

we see that, for given vfj = Fkj and each fixed j > 0, the desired set of the values of v* t1 can be calculated successively from right to left by formula (7.1.29). For the initial value of Vq+f we take F°((k + l)r,L,jh), where F°(t,x,y) is the function (7.1.21). Correspondingly, for j < 0 the values

of uH 1 can be calculated from left to right by formula (7.1.30) with the initial value vk_^j = F°((k + l)r, -L,jh).2 Since t^t1 = V*j, we obtain the grid function Vfj for the fcth time layer, after the grid function •wft 1 is calculated. Now to calculate the grid function Vff1 = -Fft1 on the layer (k + I ) , we need to solve the linear algebraic system (7.1.28). It is convenient to solve this system by the sweep method

[162, 179], which we briefly discuss here. Let us denote the desired values of the grid function on the layer (k + 1) by z, - V*fl . Then system (7.1.28) can be written in the form

AjZj.1-CjZj+MjZj+i

= -ipj,

_Q + i < j < Q _ l ,

(7.1.31)

where Aj, Cj, Mj, and
Cj = Ih2 + 4rB,

Mj - 2rB - hr(ih + jj3h + umuitj),

j = 2h2 (t)f t1 +

(7.1.32) Since the number of equations in (7.1.31) is less than the number of unknown variables Zj, —Q < j < Q, to solve the system (7.1.31) uniquely, we need to complete this system with two conditions

(7.1.33) that follow from the boundary conditions (7.1.22). We seek the solution of problem (7.1.31), (7.1.33) in the form

+ i/,-+i,

-Q < j < Q - 1,

(7.1.34)

where the coefficients fj,j and i/j are calculated by the recurrent formulas

2 The recurrent formulas (7.1.29) and (7.1.30) are used for fc = 0,1, 2, ..., K - 1. It follows from (7.1.23) that in (7.1.29) and (7.1.30) we must set «? . = 0, -Q < i,j < Q, for fc = 0.

Numerical Synthesis

365

with the initial conditions

F°((k + l)T,ih,-L).

(7.1.36)

Thus, the algorithm for solving problem (7.1.31), (7.1.33) by the sweep method consists in the following two steps:

(1) to find Hj and vj recurrently for -Q + 1 < j < Q (from left to right from j to j + 1) by using the initial values (7.1.36) and formulas (7.1.35); (2) employing ZQ from (7.1.33), to calculate (from right to left from j + I to j) the values ZQ_V Zg_ 2 , • • • , z _o + ii z-o successively according to formulas (7.1.34) (note that in this case, in view of (7.1.36), the value of z_Q coincides with that given by (7.1.33)). As was shown in [162, 179], the procedure of calculations by formulas (7.1.34) and (7.1.35) is stable if for any j we have

A, > 0,

MJ > 0,

Cj > AJ + Mj.

It follows from (7.1.32) that these conditions can be reduced to the following one in the problem in question:

IB > h(ih + jfih + umuitj).

Obviously, the last condition can always be satisfied by choosing a sufficiently small approximation step h. This calculational procedure was realized as a software package used for numerical experiments on computers. The parameters of a difference scheme were chosen so that to ensure a prescribed accuracy. It is well known [163] that the total locally one-dimensional approximation scheme (7.1.22),

(7.1.23), (7.1.26)-(7.1.28) is absolutely stable and its error is O(h2 + T). The approximation steps were: T = 0.01 and h = 0.1. The dimensions of the region D were: L = 3 and Q = 30. The other parameters (3,um,B of the problem were different in different specific calculations. The twodimensional data array of the loss function F(t,x,y) was printed for t =

0.25,0.5, 0.75,.... Some results of these calculations are shown in Fig. 56-60. Figure 56 presents the axonometry of the loss function F(t, x, y) in Eq. (7.1.19) with (3 = B = um = 1 at three time moments t — 0.25, 0.5, 1.0. Figure 57 shows curves of constant level F(t,x,y) = 3 and switching lines in an optimal system with f3 = B = um = 1 at three time moments t = 0.5, 2.0, 8.0. In view of the central symmetry of Eqs. (7.1.19), these curves are plotted in two different halves of the region D. The switching line uniquely determines the optimal control of system (7.1.17) as follows: u = —um at the points of

Chapter VII

366 F(t,x,y)

16:

FIG. 56

\ \ -3

-2

-1

0 \ 1 \

2

I 3

x

-1 -2

-3

FIG. 57 the phase plane (a;, y] lying above the switching line, and u = +um below this line.

Figure 58 illustrates how the switching line and the value of the performance criterion of this optimal system depend on the value of the admissible control um for B = /? = 1 and t = 4. In Fig. 58 one can see that an increase in the range of admissible controls uniformly improves the control quality,

367

Numerical Synthesis

that is, decreases the value of the optimality criterion independent of the initial state of system (7.1.17).

um = 1

FIG. 58 Figures 59 and 60 show how the switching lines and the constant level curves depend on the other parameters of the problem.

um=/3=l

FIG. 59

Chapter VII

368

FIG. 60 §7.2. Optimal control for the "predator—prey" system (the general case)

In this section we consider the deterministic problem of optimal control for a biological system consisting of two interacting populations ("predators" and "prey"). We have already considered this system in §5.2 where we studied a special type of this system called in §5.2 the case of a "poorly adapted predator." In what follows, we consider the general case of this problem. The synthesis problem corresponding to this case is solved numerically. Furthermore, we obtain some analytic results for a control problem with infinite horizon.

7.2.1. The normalized Lotka—Voiterra model. Statement of the problem. We assume that the system considered is described by the

Lotka-Volterra model (see [133, 186, 187] as well as §2.3 and §5.2) in which the behavior of the isolated system is governed by a system of the form

yi(r) =

o 4 )t/i

(7.2.1)

Here x\(r) and y\(r) are the sizes (densities) of prey and predator populations at time r, the positive numbers at (i = 1,2,3,4) characterize the intraspecific (oj, 04) and interspecific (02,03) interactions. By changing the variables

x(t) = a3a^1xi(r),

y(t) =

we rewrite system (7.2.1) in the dimensionless (normalized) form x(t) = (1 - y)x,

y ( t ) = b(x - l)y.

(7.2.2)

Numerical Synthesis

369

Just as in §5.2, we assume that the external (controlling) action on system (7.2.2) is to remove some prey species from the habitat (by catching, shooting, or using some chemical substances). In this case, the control system considered is described by equations of the form

x(t) = (1- y)x - ux,

t > 0,

y(t) = b(x - l)y,

z(0) = x0 > 0,

y(0) = y0 > 0,

where u = u(t) is a nonnegative bounded scalar controlling function that for all t > 0 satisfies the constraints 0 < u(t) < um,

(7.2.4)

where um is a given positive number. Let us consider the phase trajectories of the controlled system (7.2.3). They are solutions of the differential equation

(I — y — u)x

-

(7.2.5)

First, we note that in view of Eqs. (7.2.3), the phase variables x(t) and y(t) cannot attain negative values for alH > 0 if the initial values XQ and t/o are

nonnegative (the last assumption is always satisfied, since XQ and J/Q denote the initial sizes of the prey and predator populations, respectively). Therefore, all solutions of Eq. (7.2.5) (the phase trajectories of system (7.2.3)) lie in the first quadrant (x > 0,y > 0) of the phase plane (x,y). Furthermore, we shall consider only the phase trajectories that correspond to the two boundary values of control: u — 0 and u = um. For u = 0 Eqs. (7.2.3) coincide with Eqs. (7.2.2) for an isolated (au-

tonomous) Lotka-Volterra system. The dynamics of system (7.2.2) was studied in detail in [187]. Omitting the details, we only note that in the first quadrant (x > 0, y > 0) there are two singular points (a? = 0, y = 0) and (x = l,y= 1) that are the equilibrium states of system (7.2.2). In this case the origin (x = 0,y = Q) is an unstable equilibrium state, while the

state (x = I , y = 1) is stable and is a center type singular point. All phase trajectories of system (7.2.2) (except for the two that lie on the coordinate axes: (x > 0, y — 0) and (x = 0, y > 0)) form a family of closed concentric curves around the point (a; = l,y = 1). Thus, in a noncontrolled system the sizes of both populations are subject to undecaying oscillations whose period and amplitude depend on the initial state (xo,y0). However, if the

initial state (:EO,J/O) lies on one of the coordinate axes in the plane ( x , y ) , then there arise singular (aperiodic) phase trajectories. In this case it fol-

lows from Eqs. (7.2.2) that the representative point of the system cannot

370

Chapter VII

leave the corresponding coordinate axis and in the course of time either approaches the origin (along the y-axis) or goes to infinity (along the xaxis). The singular phase trajectories correspond to the degenerate case of system (7.2.2). In this case, the biological system considered contains only one population.

If u = um > 0, then the dynamics of system (7.2.3) substantially depends on um. For example, if 0 < um < 1, then the periodic character of solutions of system (7.2.3) is conserved (just as in the case u = 0), while only the center of the family of phase trajectories moves to the point (x = 1, y — 1 —

um). For um > I the solution of system (7.2.3) is aperiodic. In the special case um = 1, Eq. (7.2.5) can easily be solved, and the phase trajectories of system (7.2.3) can be written explicitly as

(7.2.6) For um > 1 Eq. (7.2.5) has a unique singular point (x — 0, y ~ 0), and this equilibrium state is globally asymptotically stable.3 Now let us formulate the goal of control for system (7.2.3). In many cases [90, 105] it is most desirable that system (7.2.3) is in equilibrium for u = 0, that is, the point (x = 1, y = 1) is the most desirable state of system (7.2.3).

In this case, one is interested in a control u* = u f ( x , y ) that takes system (7.2.3) from any initial state (XQ, yo) to the point x — 1, y = 1 in a minimum time. This problem was solved in [90]. Here we consider the problem of constructing a control ut = w*(i, x, y), which, in general, does not guarantee that the system comes to the equilibrium point (a; = l,j/ = 1) but ensures the minimum mean square deviation of the system phase trajectories from the state (x — l,y = 1) in a, given time interval 0 < t < T: ,T

/[«]=

/

[(l-x(t})2 + (l-y(t))2]dt^

JO

min

.

(7.2.7)

0
7.2.2. The Bellman equation and calculation of the boundary conditions. By using the standard procedure of the dynamic programming approach (see §1.3), we obtain the following algorithm for solving

problem (7.2.3.), (7.2.4), (7.2.7). 3

In this case the term "global" means that the trivial solution of system (7.2.3) is

asymptotically stable for any initial values (0:0,2/0) from the first quadrant of the phase plane.

Numerical Synthesis

371

Now we define the loss function (the functional of minimum future losses) by the relation

F(t,x,y)=

min

< /

[(l - x(
0<«(
•(t) = x , y ( i ) = 4 ,

(7.2.8)

and thus write the Bellman equation for problem (7.2.3), (7.2.4), (7.2.7) as

x,»>0,

0
F(T,x,y) = 0. (7.2.9)

If the function F(t, x, y) satisfying (7.2.9) is found, then the desired optimal control u#(t,x,y) in the synthesis form is given by the expression 0,

for

7nr(<, x,j/) < 0,

By using (7.2.10), we can rewrite the Bellman equation in the form 3 F1

(9 J?

t)W

-H=*(i-y- «0^ + M* -1)^- + a - -)2 + a - y) 2 , ( 7 _ 2 _ u ) x)2/>0,

0
F(T,x,y) = Q.

It follows from (7.2.10) that the optimal control is a relay type function, that is, at each time instant the control u is either u = 0 or u = um (this is a bang-bang control). If the loss function (7.2.8) is continuously differentiable with respect to x, then the control is switched from one value to the other each time when the condition

OF -fc(t,x,y) = Q

(7.2.12)

is satisfied. Equation (7.2.12) determines the switching line on the phase

plane (x, y) at each time instant. This switching line divides the phase space x, y > 0 into two regions RQ and Rm where the control u is either u = 0 or u = um, respectively. To find the switching line is equivalent to

solve the problem of optimal control synthesis.

372

Chapter VII

Of course, it must be remembered that the above procedure for solving the synthesis problem can be used only if the loss function (7.2.8) is sufficiently smooth and the Bellman equation (7.2.9) (or (7.2.11)) holds at all points of the domain 13/r = {x,y > 0,0 < t < T] of definition of the loss function. The smoothness properties of solutions satisfying equation of the form (7.2.9) (or (7.2.11)) were studied in detail in [172]. As applied to Eq. (7.2.9), the main result of [172] has the following meaning. The loss function F(t,x,y) satisfying (7.2.9) has continuous first-order derivatives with respect to all its arguments in the regions RO and Rm. On the interface between RQ and Rm, that is, on the switching line, the derivatives dF/dx and dF/dy can be discontinuous (have jumps) depending on type of the switching line. Namely, for the switching lines of the first and second kind, the first-order derivatives of the loss function are continuous everywhere in D/F. On the switching line of the third kind, the partial derivatives dF/dx and dF/dy always have jumps. Recall that, according to the classification given in [172], the type of the switching line is determined by the character of the phase trajectories of system (7.2.3) in the regions RQ and Rm near the switching line. For example, if the phase trajectories approach the switching line on both sides, then such a switching line is called a switching line of the first kind. In this case, the representative point of system (7.2.3), once coming to the switching line, moves along this line in the sliding mode (see §1.1). If the phase trajectories approach the switching line on one side (say, in the region RQ) and leave it on the other side

(in Rm), then we have a switching line of the second kind. Finally, if the switching line coincides with a phase trajectory in the region Rm (or RO), then we have a switching line of the third kind. In what follows, switching lines of the third kind do not occur; thus we can assume that for problem (7.2.3), (7.2.4), (7.2.7) studied here the Bellman equation (7.2.9) (or (7.2.11)) is valid everywhere in the region x > 0, y > 0, 0 < t < T, and in this region the function F(t, z, y) satisfying this equation has continuous first-order derivatives with respect to all its arguments. To solve Eq. (7.2.9) uniquely, we need to pose boundary conditions for the loss function F(t, x, y) on the boundary of the region of admissible phase variables, that is, for x = 0 and y = 0. Such boundary conditions can readily be obtained by a straightforward calculation of the functional on the right in (7.2.8) by using Eqs. (7.2.3) describing the system considered. Let us write F(t, 0, y) = (t,y) and F(t, x, 0) = ip(t,x). Then, using (7.2.3) and (7.2.8), we obtain

t, y) = 2(T - 1) +

[e-* - l] +

[l - e-^-*)] .

(7.2.13)

Numerical Synthesis

373

To find t[>(t, x), we need to solve the following one-dimensional optimization problem

rT

/ Jt

[1 + (1 - x(ff))

2] -+

min , o
x(a) = (1 - u)x(
a > t,

(7.2.14)

v

'

x(t] — x.

Problem (7.2.14) can readily be solved, although the solution of (7.2.14) and hence the form of the function i(>(t, x) substantially depend on the value of um .

(a) Let 0 < um < 1. In this case the points

x1 = e-F-t\

z 2 = 2/[l + e(1-""')(T-')]

(7.2.15)

divide the x-axis into three intervals. On the intervals 0 < x < KI and x-2 < x < oo, the function has the explicit form 2(T -t)- 2z[e(T-*) - 1] + x2[e2(T-') - l]/2,

0 < x < zi,

(7.2.16) On the interval xi < x < x%, the function tj>(t, x) is given by the formula i,(t, x) = 2(T - 1) + 1x -

T

— Um)

where z is the root of the transcendental algebraic equation xznm/(l-um)

[ e (l-« m )(T-«) + z]

= 2.

(7.2.18)

One can readily see that the possible values of the root z of Eq. (7.2.18) always lie in the region 1 < z < eC 1 -"™)^-*) an(j tne boundary values z = 1 and z = e (i—«™)(^-t) correspond to the endpoints (7.2.15) of the interval

Xi < x < £2- The optimal control «*, which solves problem (7.2.14), depends on the variable x(t) = x and is determined as follows: if x < xi,

then u* = 0,

if x > X2,

then M* = U TO ,

t <
Chapter VII

374 if

<x<

then u* =

for x(
0, um,

for x(a) > x*.

(b) Let um = 1. In this case, for u = um the coordinate x(
for x(
Um,

for x( 1.

(7.2.19)

The minimum value of the functional (7.2.14) can readily be calculated for control (7.2.19), and as a result, for the desired function if>(t,x] we obtain the expression

• 2(T — t} — 2x\e(T~t} — 11 + — le 2 ( T ~*) — I] 0 < x < e~( T ~'),

(T-t}-lnx + 2x-x2/2-3/2, 2

I (T-t)(2-2x + x ),

e -(

T

-*) < x < 1,

x > 1.

(7.2.20) (c) Let um > 1. In this case the optimal control solving problem (7.2.14) coincides with (7.2.19).4 After some simple calculations, we obtain

(T-t) -lnx + 2x- x2/2 - 3/2,

e~( T ~*) < x < 1,

(T-t) + (In x - 2x + x2/2 + 3/2)/(« m - 1), Zt\_L — 1 1 — T~H—— L^

—

J

» m )(T-t) _ ]_1

e (« m -l)(T-t)

< j; < go

(7.2.21) 4

For e~( T-t ) < x < e ' UTO ~ 1 )( T - t ) t there always exists a time instant
the solution x( t, x(t) = x, attains the

value x(rro) = 1. After the time
for

x(
for

x(
for

x( 1.

Under this control we can realize the generalized solution in the sense of Filippov of the equation x(
Numerical Synthesis

375

Thus, to find the optimal control in the synthesis form that solves problem (7.2.3), (7.2.4), (7.2.7), we need to solve the following boundary value problem for the loss function F(t,x,y):

z,y>0,

0
, y ) =
(7.2.22)

F(t, x, 0) = tf(t, x),

where w* has the form (7.2.10), (t,x) is given by expressions (7.2.16)-(7.2.18), (7.2.20) or (7.2.21) depending on the value of the maximum admissible control um.

The boundary value problem (7.2.22) was solved numerically. The results obtained are given in Section 7.2.4. 7.2.3. Problem with infinite horizon. Stationary operating mode. Let us consider the control problem (7.2.3), (7.2.4), (7.2.7) on an infinite time interval (in this case the terminal time T —> oo). If the optimal control u*(t,x,y) that solves problem (7.2.3), (7.2.4), (7.2.7) ensures the convergence of the functional (7.2.8) for any initial state (x > 0, y > 0) of the system, then due to time-invariance of Eqs. (7.2.3) the loss function (7.2.8) is also time-invariant, that is, F(t,x,y) —> f ( x , y ) , where the function f ( x , y ) satisfies the equation = 0, (7.2.23)

which is the stationary version of the Bellman equation (7.2.9).

In this case, the optimal control u#(x,y) and the switching line do not depend on time explicitly and are given by formulas (7.2.10) and (7.2.12) with F(t,x,y) replaced by the loss function f(x,y). Let us denote the loss function f ( x , y) in the region RQ (ut = 0) by f o ( x , y ) , and the loss function f ( x , y ) in the region Rm (u* = um) by f m ( x , y ) - In RO the function /o satisfies the equation (7.2.24) \jAJ

\j y

Correspondingly, for the function fm defined on Rm we have

x(l-y- w m )^P + by(x - l)~ + (1 - cc)2 + (1 - y)2 = 0. (7.2.25)

376

Chapter VII

Since the gradient of the loss function is continuous on the switching line, that is, on the interface between RQ and Rm, we have

dfo _ dfm dx dx '

dfo _ dfm dy dy

,,_, 2 2g.

Equations (7.2.24)-(7.2.26) allow us to obtain explicit formulas for the partial derivatives d f / d x and df /dy along the switching line

dx

'

dy

by(x-l)

(l 7 2 2 7 j)

If the switching line contains intervals of sliding mode, then formulas (7.2.27) allow us to find these intervals and to obtain explicit analytic formulas for the switching line on these intervals. As was shown in §4.1 (see also [172]), the second-order mixed partial derivatives of the loss function /(x, y) must coincide on the intervals of sliding mode, that is, we have

d2 7f

dxdy

-

62Jf

dydx

(7.2.28)

By using formulas (7.2.27), one can readily see that the condition (7.2.28) is satisfied along the two lines y = x and y — 2 — x. To verify whether these lines (or some parts of them) are lines of the sliding mode, we need to consider the families of phase trajectories (that is, the solutions of Eq. (7.2.5)) for u = 0 and u = um near these lines. The corresponding analysis of the phase trajectories of system (7.2.3) shows that the sliding mode may take place along the straight line y = x for x < 1 and along the line y = 2 — x for x > 1. In this case the representative point of system (7.2.3) once coming to the line y = x (x < 1) moves along this lines (due to the sliding mode) away from the equilibrium state (x — 1, y = 1). On the other hand, along the line y — 2 — x (x > 1), system (7.2.3) asymptotically as t —>• oo approaches the point (x = l,y = 1) due to the sliding mode. That is why, only the straight line segment

y = 2-x,

1 < x < x° < 2,

(7.2.29)

can be considered as the switching line for the optimal control in the stationary operating mode. If u = um, then the integral curve of Eq. (7.2.5) is tangent to the line y = 2 — x at the endpoint x° of the segment (7.2.29). By using (7.2.5), we can write the tangency condition as

Numerical Synthesis

377

For different values of the parameters in problem (7.2.3), (7.2.4), (7.2.7) (that is, of the numbers b > 0 and um > 0), the solution of Eq. (7.2.30) has the form

[36- 1 - um - V ( 3 6 - l - M m ) 2 - 8 6 ( 6 - l ) ] / 2 ( 6 - 1), 2/(2-wm),

.2,

if

0 < um < 1,

6 ^ 1 or

if

0<wro
&=1,

if wm > 1,

um > 1,

b > bm,

b < bm, (7.2.31)

where the number bm is equal to

2 - um + <, - v/8MTO(«ro - 1)

One can easily obtain a finite formula for the stationary loss function f ( x , y ) along the switching line (7.2.29). By using the second equation in (7.2.3) and formula (7.2.29), we see that the coordinate y(t) is governed by the differential equation

y = b(y- y2)

(7.2.32)

while moving along the straight line (7.2.29). By integrating (7.2.32) with the initial condition y(0) = y, we obtain

y(t) =

^,. y , u. y+(l-y)e~bt

(7.2.33)

Using (7.2.33) and the relation x(t) = 2 — y(t) and calculating the functional I in (7.2.7) for T — oo, we find the desired stationary loss function

/(2 -y,y)= h(y) = -(y - Inj/ - 1).

(7.2.34)

Here y is an arbitrary point in the interval 2 — x° < y < 1.

7.2.4. Numerical solution of the nonstationary synthesis problem. If the control time T is finite, then the algorithm of the optimal control ut(t, x, y) depends on time and, to find this control, we need to solve the nonstationary Bellman equation (7.2.22). This equation is solved numerically in the bounded region fl = {0 < x < x max , 0 < y < ymax, 0 < i < T}. To this end, in fi we construct the grid W = iXi {, * = lhjjr.7 i = 0,} 1,i . . . .i N-r, d i ^ j ih^Nr ^ ^ d i = tfjXmav', max i ?/.

__ -'I,

-' __ n

1

AT

if3 — j*"yi J — * - ' i - i - 5 ' ' ' ) - i ' l j / 5

L

y

RT

__

.

y ~~ c/max5

tk = kr, k = 0,1,..., JV, rN = T},

(7.2.35)

378

Chapter VII

and define the grid function F^ that approximates the desired continuous solution F(t,x,y) of Eq. (7.2.22) at the nodes of the grid (a,-, %•,<*). The values of the grid function F^ at the nodes of the grid (7.2.35) are related

to each other by algebraic equations obtained by the difference approximation of the Bellman equation (7.2.22). In what follows, we use well-known methods for constructing difference schemes [60, 135, 162, 163], therefore, here we restrict our consideration only to a formal description of the difference equations used for solving Eq. (7.2.22) numerically. We stress that the problems of approximation accuracy and stability and of the convergence of the grid function FJj to the exact solution F(t,x,y) of Eq. (7.2.22) as hx,hy,T^ 0 are studied in detail in [49, 53, 135, 162, 163, 179]. Just as in §7.1, by using the alternating direction method [163], we replace the two-dimensional (with respect to the phase variables) equation (7.2.22) by the following pair of one-dimensional equations:

-^ = z(l - < / - « * ) £ + ( I - * ) 2 ,

(7-2.36)

-dY- = by(x-l}d^ + (l-y)\

(7.2.37)

each of which is approximated by a finite-difference scheme with fractional steps in the variable t. To ensure the stability of the difference approximation of Eqs. (7.2.36), (7.2.37), we use the scheme of "oriented differences"

[163]. For 0 < i < Nx and 0 < j < Ny, 0 < k < N, we replace Eq. (7.2.36) by the difference scheme fc-0.5 _

k

(7-2.38)

where

k

and the approximation steps hx and T satisfy the condition T\TX < hx for all rx on the grid u.

For Eq. (7.2.37) we used the difference approximation rk-l

vrk-0.5

(7.2.39)

Numerical Synthesis

379

where

)yjb,

r+ = 0.5(ry + \ry\),

ry = Q.5(ry - \ry\),

and the steps hy and T are specified by the condition r\ry\ < hy for all ry

on the grid (7.2.35). The grid functions for the initial Bellman equation (7.2.22) and for the auxiliary equations (7.2.36), (7.2.37) are related as Fkj = vfj, v^°-5 = k 5 k l k l 11111 vV ~ =f F ~ V v,3.~°- > -and ij • tj The grid functions are calculated backwards over the time layers (numbered by fc) from k = N to an arbitrary number 0 < k < N. The grid function F^ approximates the loss function F(T-kr, ihx,jhy) corresponding to Eq. (7.2.22).

To obtain the unknown values of the grid functions v\j and Vj* uniquely from the algebraic equations (7.2.38) and (7.2.39), in view of (7.2.22), we need to complete these equations with the zero "initial" conditions

f£=0,

Q
G<j
(7.2.40)

and the boundary conditions of the form

hx),

0
Q.5)T,jhy),

0 < k < N,

0<j
0
where the function ip(t, y) is determined by (7.2.13), and the function i(>(t, x] is calculated either by formulas (7.2.16)-(7.2.18) or by formula (7.2.20) (or (7.2.21)) depending on the value of the admissible control um. According to [163], the difference scheme (7.2.38)-(7.2.41) approximates the loss function F(t, x, y) of Eq. (7.2.22) up to O(hx + hy + T).

Calculations according to formulas (7.2.38)-(7.2.41) were performed by using computers, and some numerical results are shown in Figs. 61-64. Figure 61 shows the position of the switching lines (7.2.12) on the phase plane (x,y) for different values of the "reverse" time p = T — t. The curves in Fig. 61 were constructed for the problem parameters b = um = 0.5 and the parameters hx = hy = 0.1, T = 0.01, and Nx = Ny = 20

of the grid (7.2.35). Curves 1-5 correspond to the values of the reverse time p = 1.5, 2.5, 3.5, 5.0, 7.0, respectively. The dashed line in Fig. 61 indicates the segment of the line (7.2.29) that is the part of the switching line corresponding to the sliding mode of control in the limit case p = T — t —>• oo. Figures 62 and 63 show similar results for the maximum values um — 1.0 and um = 1.5 of the admissible control. Curves 1-3 in

Figs. 62 and 63 are the switching lines corresponding to three values of the

Chapter VII

380

2.0

54321

1.6 1.2 0.8

0.4

0

0.4

0.8

1.2

1.6 x

FIG. 61

321

2.0 1.6 1.2 0.8 0.4

0.4

0.8

1.2

1.6

2.0 x

FIG. 62 reverse time p = 3.5, 6.0, 12.0. Figure 64 illustrates the variation of the loss function F(t, x, y) along a part of the line (7.2.29) for different time moments. The dotted line in Fig. 64 shows the stationary loss function

(7.2.34).

381

Numerical Synthesis

2.0 v

321

1.6 1.2

0.4

0

0.4

1.2

0.

FIG.

1.0

1.1

63

1.2

FIG.

2.0 x

1.6

1.3

x

64

Figures 61-64 show that the results of numerical solution of Eq. (7.2.22) (and of the synthesis problem) as p —>• oo allow us to study the passage to the stationary control of population sizes. Moreover, these data confirm the results of the theoretical analysis of the stationary mode carried out in Section 7.2.3.

382

Chapter VII

We also point out that the nonstationary ut(t, x,y) and the stationary uf(x,y) = linip-j.oo u*(t, x, y] algorithms of optimal control, obtained by solving the Bellman equation (7.2.22) numerically, were used for the numerical simulation of transient processes in system (7.2.3) when the com-

parative analysis of different control algorithms was carried out. The results of this simulation and comparative analysis were discussed in §5.2.

CONCLUSION

Design methods that use the frequency approach to the analysis and synthesis of control systems [119-121, 146, 147] are widely applied in modern control engineering. Based on such notions as the transfer functions of open- or closed-loop systems, these methods allow one to evaluate the control quality by the position of zeros and poles of these transfer functions in the frequency domain. The frequency methods are very illustrative and effective in studying linear feedback control systems. As for the methods for the calculation of optimal (suboptimal) control algorithms in the state space, shown in this book, modern engineering most frequently deals with results obtained by solving problems of linear quadratic optimization, which lead to linear optimal control systems. So far linear quadratic problems of optimal control have been studied comprehensively, the literature on this subject is quite extensive, and therefore these problems are only briefly outlined here. It should be noted that the practical realization of linear optimal systems often involves difficulties, as one needs to solve the matrix-valued Riccati equation and to use the solution of this equation on the actual time scale. These problems are discussed in [47, 126, 134, 149, 150]. It is well known that a large number of practically important problems of optimal control cannot be reduced to linear quadratic problems. In particular, this is true for control problems in which constraints imposed on the values of the admissible control play an important role. Although practically important, there is currently no universal approach to solving these optimal control problems with constraints in the form that ensures a simple technical realization of the optimal control algorithm. The author hopes that the results obtained in this book will help to develop new engineering methods for solving such problems by using constructive methods for solving the Bellman equations. Some remarks concerning the prospects for solving applied problems of optimal control on the basis of the dynamic programming approach should be made. The existing methods of optimal control synthesis could be categorized as exact, approximate analytic, and numerical. If a synthesis problem can

383

384

Conclusion

be solved exactly, then the optimal control algorithm can be written as a finite formula obtained by analytically solving the corresponding Bellman equation. Then the block C (the controller) in the functional diagram (see

Figs. 2 and 3) is a device simulating the analytic expression derived for the optimal algorithm. Unfortunately, it is seldom that the Bellman equations can be solved exactly (as a rule, for one-dimensional control problems).

The same holds in the case of linear quadratic problems, for which the dynamic programming approach only simplifies the procedure of solving the synthesis problem by reducing the problem of solving a nonlinear partial differential equation to solving a finite system of ordinary differential equations (a matrix-valued Riccati equation). In general, one could say that intuition and conjecture are crucial in search of exact solutions to the Bellman equations. Therefore, the construction of exact solution resembles a kind of art rather than a formal scientific approach.1 Thus, we cannot expect that exact synthesis methods would be widely used for solving actual control problems. The "practical" value of exact solutions to Bellman equations (and to synthesis problems) is that they, as a rule, form the basis for a family of approximate analytic synthesis methods, which in turn enable one to find control algorithms close to optimal algorithms for a significantly larger class of specific applied problems. The most common approximate synthesis methods employ various versions of the methods of a small parameter and of successive approximations for solving the Bellman equation. On one hand, a large variety of different versions of asymptotic synthesis methods (described in this book and by other authors, see [22, 33, 34, 56-58, 110]) is available which allow one to obtain solutions for many important classes of optimal control problems often encountered in practice. On the other hand, the asymptotic synthesis methods usually have a remarkable feature (multiply shown in this book) that ensures a high effectiveness of asymptotic methods in practice. Namely, quasioptimal control algorithms derived according to some scheme with small parameters are often sufficient when the parameter supposed to be small is in fact of a finite value, which is comparable to the other parameters of the problem. In the design of actual control systems, this allows one to obtain reasonable control algorithms by introducing a purely formal small parameter into a specific problem considered. Moreover, by formally applying the method of a small parameter, it is often possible to significantly improve various heuristic control algorithms commonly used in engineering (a typical example of such an improvement is given in §6.1). All this makes approximate synthesis methods based on the use of asympJ A similar situation arises in the search of Liapunov functions in the theory of stability [1, 29, 125, 129]. This fact was pointed out by T. Burton [29, p. 166]: " . . . Beyond any

doubt, construction of Liapunov functions is an art."

Conclusion

385

totic methods for solving the Bellman equations one of the most promising trends in the engineering design of optimal control systems. Another important branch of applied methods for solving problems of optimal control is the development of numerical methods for solving the

Bellman equations (and synthesis problems). This field has recently received much attention [10, 31, 48, 49, 53, 86, 104, 169]. The main benefit of numerical synthesis methods is their high universality. It is worth to note

that numerical methods also play an important role in problems of evaluating the performance index of quasioptimal control algorithms calculated by other methods. Currently, the widespread use of numerical synthesis methods in modern engineering is somewhat hampered by the following two factors: (i) the approximate properties of discrete schemes for solving some classes of Bellman equations still remain to be rigorously mathematically justified, and (ii) the calculation of grid functions requires a great

number of operations. All this makes it difficult to solve control problems of higher dimension and those with unbounded phase space. However, one must not consider these facts as an obstacle to using numerical methods in engineering. Recent developments in numerical methods for solving the Bellman equations and in the decomposition of multidimensional problems

[31], continuous advances in parallel computing, and the progress in computer technology itself suggest that the numerical methods for the synthesis of optimal systems will soon become a regular tool for all those dealing with the design of actual control systems.

REFERENCES

1. V. N. Afanasiev, V. B. Kolmanovskii, and V.R. Nosov, Mathematical Theory of Control Systems Design, Dordrecht: Kluwer Academic Publishers, 1996. 2. A. A. Andronov, A. A. Vitt, and S. E. Khaikin, Theory of Oscillations, Moscow: Fizmatgiz, 1971. 3. M. Aoki, Optimization of Stochastic Systems, New York-London: Academic Press, 1967. 4. P. Appel et J. Kampe de Feriet, Fonktions hypergeometriques et hyperspheriques, Polynomes d'Hermite. Paris, 1926. 5. K. J. Astrom, Introduction to Stochastic Control Theory. New York: Academic Press, 1970.

6. K. J. Astrom, Theory and Applications of Adaptive Control - a Survey. Automatica-J. IFAC, 19: 471-486, 1992. 7. K. J. Astrom, Adaptive control. In: Antoulas, ed., Mathematical System Theory, Berlin: Springer, 1991, pp. 437-450.

8. K. J. Astrom, Adaptive control around 1960. IEEE Control Systems, 16, No. 3: 44-49, 1996.

9. K. J. Astrom and B. Wittenmark, A survey of adaptive control applications. Proceedings 34th IEEE Conference on Decision and Control, New Orleans, Louisiana, 1995, pp. 649-654. 10. M. Bardi, S. Bottacin, and M. Falcone, Convergence of discrete

schemes for discontinuous value functions of pursuit-evasion games. In: G. J. Olsder, ed., New Trends in Dynamic Games and Applications, Basel-Boston: Birkhauser, 1995, pp. 273-304.

11. A. T. Bharucha-Reid, Elements of the Theory of Markov Processes and Their Applications, New York: McGrow-Hill, 1960. 12. V. P. Belavkin, Optimization of quantum observation and control. Proceedings of 9th IFIP Conference on Optimizations Techniques, Warszawa, 1979, Springer, 1980, pp. 141-149. 387

388

References

13. V. P. Belavkin, Nondemolition measurement and control in quantum dynamic systems. Proceedings of CISM Seminar on Information Complexity and Control in Quantum Physics, Springer, 1987,

pp. 311-329. 14. R. Bellman, Dynamic Programming. Princeton: Princeton University Press, 1957. 15. R. Bellman and E. Angel, Dynamic Programming and Partial Differential Equations. New York: Academic Press, 1972. 16. R. Bellman, I. Gliksberg, and O. A. Gross, Some Aspects of the Mathematical Theory of Control Processes. Santa Monica, California: Rand Corporation, 1958. 17. R. Bellman and R. Kalaba, Theory of dynamic programming and feedback systems. Proceedings of 1st IFAC Congress, Theory of Discrete, Optimal, and Self-Tuning Systems, Moscow: Akad. Nauk USSR, 1961. 18. D. P. Bertsekas, Dynamic Prograrnming and Stochastic Control. London: Academic Press, 1976.

19. N. N. Bogolyubov and Yu. A. Mitropolskii, Asymptotic Methods in

Nonlinear Oscillation Theory. Moscow: Pizmatgiz, 1974. 20. I. A. Boguslavskii, Navigation and Control under Incomplete Statistical Information. Moscow: Mashinostroenie, 1970. 21. L A . Boguslavskii and A. V. Egorova, Stochastic optimal control of motion with nonsymmetric constraints. Avtomat. i Telemekh., 33, No. 8, 1972. 22. M. Y. Borodovskii, A. S. Bratus, and F. L. Chernous'ko, Optimal pulse correction under random disturbances. Prikl. Mat. Mekh.,

39, No. 5, 1975. 23. N. D. Botkin and V. S. Patsko, Universal strategy in a differential game with fixed terminal time. Problems Control Inform. Theory,

11, No. 6: 419-432, 1982. 24. A. E. Bryson and Y. C. Ho, Applied Optimal Control. TorontoLondon: Blaisdell, 1969. 25. B. M. Budak and S. V. Fomin, Multiple Integrals and Series. Moscow: Nauka, 1965. 26. B. M. Budak, A. A. Samarskii, A. N. Tikhonov, Collection of Problems in Mathematical Physics. Moscow: Nauka, 1972. 27. B. V. Bulgakov, Oscillations. Moscow: Gostekhizdat, 1954. 28. R. Bulirsch and H. J. Pesch, The maximum principle, Bellman's equation, and Caratheodory's work. J. Optim. Theory and Appl.,

References

389

80, No. 2: 203-229, 1994. 29. T. A. Burton, Volterra Integral and Differential Equations. New

York: Academic Press, 1983. 30. A. G. Butkovskii, Distributed Control Systems. New York: Else-

vier, 1969. 31. F. Camili, M. Palcone, P. Lanucara, and A. Seghini, A domain

decomposition method for Bellman equations. In: D. E. Keyes and J. Xu, eds., Domain Decomposition Methods in Scientific and Engineering. Contemp. Math., Providence: Amer. Math. Soc.,

32.

33. 34. 35.

36. 37.

1994, 180: 477-483, 1994. F. L. Chernous'ko, Some problems of optimal control with a small parameter. Prikl. Mat. Mekh., 32, No. 1, 1968. F. L. Chernous'ko, L. D. Akulenko, and B. N. Sokolov, Control of Oscillations. Moscow: Nauka, 1980. F. L. Chernous'ko and V. B. Kolmanovskii, Optimal Control under Random Disturbances. Moscow: Nauka, 1978. C. W. Clark, Bioeconomic Modeling and Fisheries Managements. New York: Wiley, 1985. D. R. Cox and H. D. Miller, The Theory of Stochastic Processes. Methuen, 1965. M. L. Dashevskiy and R. S. Liptser, Analog modeling of stochastic differential equations connected with change point problem. Avtomat. i Telemekh., 27, No. 4, 1966.

38. M. H. A. Davis and R. B. Vinter, Stochastic Modeling and Control. London: Chapman and Hall, 1985.

39. M. H. DeGroot, Optimal Statistical Decisions. New York: McGrowHill, 1970. 40. V. F. Dem'yanov, On minimization of maximal deviation. Vestnik Leningrad Univ. Math., No. 7, 1966. 41. V. A. Ditkyn and A. P. Prudnikov, Integral Transforms and Operational Calculus. Moscow: Fizmatgiz, 1961. 42. A. L. Dontchev, Error estimates for a discrete approximation to constrained control problems. SIAM J. Numer. Anal., 18: 500514, 1981.

43. A. L. Dontchev, Perturbations, Approximations, and Sensitivity Analysis of Optimal Control Systems. Lecture Notes in Control and Inform. Sci., Vol. 52, Berlin: Springer, 1983. 44. J. L. Doob, Stochastic Processes. New York: Wiley, 1953.

390

References

45. E. B. Dynkin, Markov Process. Berlin: Springer, 1965.

46. S. V. Emel'yanov, ed., Theory of Variable-Structure Systems. Mos-

cow: Nauka, 1970. 47. C. Endrikat and I. Hartmann, Optimal design of discrete-time

MIMO systems in the frequency domain. Internat. J. Control, 48, No. 4: 1569-1582, 1988. 48. M. Falcone, Numerical solution of dynamic programming equations. Appendix to the monograph by M. Bardi, I. Capuzzo Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations. Basel-Boston: Birkhauser, 1997. 49. M. Falcone and R. Ferretti, Convergence analysis for a class of semi-

Lagrangian advection schemes. SIAM J. Numer. Anal., 38, 1998. 50. A. A. Feldbaum, Foundations of the Theory of Optimal Automatic Systems. Moscow: Nauka, 1966. 51. M. Feldman and J. Roughgarden, A populations's stationary distribution and chance of extinction in stochastic environments with

remarks on the theory of species packing. Theor. Pop. Biol., 7, No. 12: 197-207, 1975. 52. W. Feller, An Introduction to Probability Theory and Its Applications. New York: Wiley, 1970.

53. R. Ferretti, On a Class of Approximation Schemes for Linear Boundary Control Problems. Lecture Notes in Pure and Appl. Math., Vol. 163, New York: Marcel Dekker, 1994.

54. A. F. Filippov, Differential Equations with Discontinuous RightHand Sides. Dordrecht: Kluwer Academic Publishers, 1986.

55. W. H. Fleming, Some Markovian optimization problems. J. Math, and Mech., 12 No. 1, 1963. 56. W. H. Fleming, Stochastic control for small noise intensities. SIAM J. Control, 9, No. 3, 1971. 57. W. H. Fleming and M. R. James, Asymptotic series and exit time

probabilities. Ann. Probab., 20, No. 3: 1369-1384, 1992.

58. W. H. Fleming and R. W. Rishel, Deterministic and Stochastic Optimal Control: Berlin: Springer, 1975. 59. W. H. Fleming and H. M. Soner, Controlled Markov Processes and Viscosity Solutions. Berlin: Springer, 1993. 60. G. E. Forsythe, M. A. Malcolm, and C. B. Moler, Computer Methods for Mathematical Computation. Englewood Cliffs, N.J.: Prentice Hall, 1977.

References

391

61. A. Friedman, Partial Differential Equations of Parabolic Type. Englewood Cliffs, N.J.: Prentice Hall, 1964. 62. F. R. Gantmacher, The Theory of Matrices. Vol. 1, New York:

Chelsea, 1964. 63. I. M. Gelfand, Generalized Stochastic Processes. Dokl. Akad. Nauk

SSSR, 100, No. 5, 1955. 64. I. M. Gelfand and S. V. Fomin, Calculus of Variations. Moscow: Fizmatgiz, 1961. 65. I. M. Gelfand and G. I. Shilov, Generalized Functions and Their Calculations. Moscow: Fizmatgiz, 1959.

66. I. I. Gikhman and A. V. Skorokhod, The Theory of Stochastic Processes. Berlin: Springer, Vol. 1, 1974; Vol. 2, 1975. 67. B. V. Gnedenko, Theory of Probabilities. Moscow: Nauka, 1969. 68. B. S. Goh, Management and Analysis of Biological Populations. Amsterdam: Elsevier Sci., 1980. 69. L. S. Goldfarb, On some nonlinearities in automatic regulation systems. Avtomat. i Telemekh., 8, No. 5, 1947.

70. L. S. Goldfarb, Research method for nonlinear regulation systems based on harmonic balance principle. In: Theory of Automatic Regulation, Moscow: Mashgiz, 1951. 71. E. Goursat, Cours d'Analyse Mathematique. V. 3, Paris: GauthierVillars, 1927. 72. R. Z. Hasminskii, Stochastic Stability of Differential Equations. Alphen: Sijtjoff and Naordhoff, 1980. 73. G. E. Hutchinson, Circular control systems in ecology. Ann. New

York Acad. Sci., 50, 1948. 74. A. M. Il'in, A. S. Kalashnikov, and O. L. Oleynik, Second-order parabolic linear equations. Uspekhi Mat. Nauk, 17, No. 3, 1962. 75. K. Ito, Stochastic integral. Proc. Imp. Acad., Tokyo, 20, 1944. 76. K. Ito, On a formula concerning stochastic differentials. Nagoya Math. J., 3: 55-65, 1951. 77. E. Janke, F. Emde, and F. Losch, Tafeln hoherer Funktionen. Stuttgart: Teubner, 1960. 78. R. E. Kalman, On general theory of control systems. In: Proceedings of the 1 IFAC Congress, Vol. 2, Moscow: Acad. Nauk SSSR, 1960. 79. R. E. Kalman and R. S. Bucy, New results in linear filtering and prediction theory. Trans. ASME Ser. D (J. Basic Engineering), 83:

95-108, 1961.

392

References

80. L. I. Kamynin, Methods of heat potentials for a parabolic equation with discontinuous coefficients. Siberian Math. J., 4, No. 5, 1963. 81. L. I. Kamynin, On existence of boundary problem solution for parabolic equations with discontinuous coefficients. Izv. Akad. Nauk

SSSR Ser. Mat., 28, No. 4, 1964. 82. V. A. Kazakov, Introduction to the Theory of Markov Processes and Radio Engineering Problems. Moscow: Sovetskoe Radio, 1973.

83. M. Kimura, Some problems of stochastic processes in genetics. Ann. Math. Statist., 28: 882-901, 1957. 84. V. B. Kolmanovskii, On approximate synthesis of some stochastic systems. Avtomat. i Telemekh., 36, No. 1, 1975. 85. V. B. Kolmanovskii, Some time-optimal control problems for stochastic systems. Problems Control Inform. Theory, 4, No. 4, 1975. 86. V. B. Kolmanovskii and G. E. Kolosov, Approximate and numerical methods to design optimal control of stochastic systems. Izv. Akad.

Nauk SSSR Tekhn. Kibernet., No. 4: 64-79, 1989. 87. V. B. Kolmanovskii and A. D. Myshkis, Applied Theory of Functional Differential Equations. Dordrecht: Kluwer Academic Pub-

lishers, 1992. 88. V. B. Kolmanovskii and V. R. Nosov, Stability of Functional Differential Equations. London: Academic Press, 1986. 89. V. B. Kolmanovskii and L. E. Shaikhet, Control of Systems with Aftereffect. Transl. Math. Monographs, Vol. 157, Providence: Amer. Math. Soc., 1996. 90. V. B. Kolmanovskii and A. K. Spivak, Time-optimal control in a predator-prey system. Prikl. Mat. Mekh., 54, No. 3: 502-506,

1990. 91. A. N. Kolmogorov and S. V. Fomin, Elements of Function Theory and Functional Analysis. Moscow: Nauka, 1968. 92. G. E. Kolosov, Synthesis of statistical feedback systems optimal with respect to different performance indices.

Vestnik Moskov.

Univ. Ser. Ill, No. 1: 3-14, 1966. 93. G. E. Kolosov, Optimal control of quasiharmonic plants under incomplete information about the current values of phase variables.

Avtomat. i Telemekh., 30, No. 3: 33-41, 1969. 94. G. E. Kolosov, Some problems of optimal control of Markov plants. Avtomat. i Telemekh., 35, No. 2: 16-24, 1974. 95. G. E. Kolosov, Analytical solution of problems in synthesis of optimal distributed-parameter control systems subject to random per-

References

393

turbations. Automat. Remote Control, No. 11: 1612-1622, 1978. 96. G. E. Kolosov, Synthesis of optimal stochastic control systems by the method of successive approximations. Prikl. Mat. Mekh., 43, No. 1: 7-16, 1979.

97. G. E. Kolosov, Approximate synthesis of stochastic control systems with random parameters. Avtomat. i Telemekh., 43, No. 6: 107116, 1982. 98. G. E. Kolosov, Approximate method for design of stochastic adaptive optimal control systems. In: G. S. Ladde and M. Sambandham, eds., Proceedings of Dynamic Systems and Applications, Vol. 1, 1994, pp. 173-180. 99. G. E. Kolosov, On a problem of population size control. Izv. Ross. Akad. Nauk Teor. Sist. Upravlen., No. 2: 181-189, 1995. 100. G. E. Kolosov, Numerical analysis of some stochastic suboptimal controlled systems. In: Z. Deng, Z. Liang, G. Lu, and S. Ruan, eds., Differential Equations and Control Theory. Lecture Notes in Pure and Appl. Math., Vol. 176, New York: Marcel Dekker, 1996, pp. 143-148. 101. G. E. Kolosov, Exact solution of a stochastic problem of optimal control by population size. Dynamic Systems and Appl., 5, No. 1:

153-161, 1996. 102. G. E. Kolosov, Size control of a population described by a stochastic logistic model. Automat. Remote Control, 58, No. 4: 678-686, 1997. 103. G. E. Kolosov and D. V. Nezhmetdinova, Stochastic problems of optimal fisheries managements. In: Proceedings of the 15th IMACS Congress on Scientific Computation. Modelling and Applied Mathematics, Vol. 5, Berlin: Springer, 1997, pp. 15-20. 104. G. E. Kolosov and M. M. Sharov, Numerical method of design of stochastic optimal control systems. Automat. Remote Control, 49,

No. 8: 1053-1058, 1988. 105. G. E. Kolosov and M. M. Sharov, Optimal damping of population size fluctuations in an isolated "predator-prey" ecological system. Automation and Remote Control, 53 No. 6: 912-920, 1992.

106. G. E. Kolosov and M. M. Sharov, Optimal control of population sizes in a predator-prey system. Approximate design in the case of an ill-adapted predator. Automat. Remote Control, 54, No. 10:

1476-1484, 1993. 107. G. E. Kolosov and R. L. Stratonovich, An asymptotic method for

394

References solution of the problems of optimal regulators design. Avtomat. i

Telemekh., 25, No. 12: 1641-1655, 1964. 108. G. E. Kolosov and R. L. Stratonovich, On optimal control of quasi-

harmonic systems. Avtomat. i Telemekh., 26, No. 4:601-614, 1965. 109. G. E. Kolosov and R. L. Stratonovich, Asymptotic method for solution of stochastic problems of optimal control of quasiharmonic systems. Avtomat. i Telemekh., 28, No. 2: 45-58, 1967. 110. N. N. Krasovskii and E. A. Lidskii, Analytical design of regulators in the systems with random properties. Avtomat. i Telemekh., 22, No. 9-11, 1961. 111. N. N. Krasovskii, Theory of the Control of Motion. Moscow: Nauka, 1968. 112. V. F. Krotov, Global Methods in Optimal Control Theory. New York: Marcel Dekker, 1996. 113. N. V. Krylov, Controlled Diffusion Process. New York: Springer, 1980. 114. S. I. Kumkov and V. S. Patsko, Information sets in the problem of pulse control. Avtomat. i Telemekh., 22, No. 7: 195-206, 1997. 115. A. B. Kurzhanskii, Control and Observation under Uncertainty. Moscow: Nauka, 1977.

116. H. J. Kushner and A. Schweppe, Maximum principle for stochastic control systems. J. Math. Anal. Appl., No. 8, 1964. 117. H. J. Kushner, Stochastic Stability and Control. New York-London:

Academic Press, 1967. 118. H. J. Kushner, On the optimal control of a system governed by a linear parabolic equation with white noise inputs. SIAM J. Control,

6, No. 4, 1968. 119. H. Kwakernaak, The polynomial approach to H^o optimal regu-

lation. In: E. Mosca and L. Pandolfi, eds., .Hoc-Control Theory, Como, 1990. Lecture Notes in Math., Vol. 1496, Berlin: Springer, 1991. 120. H. Kwakernaak, Robust control and .Hoo-optimization. Automatica-J. IFAC, 29, No. 2: 255-273, 1993. 121. H. Kwakernaak, Symmetries in control system design. In: Alberto Isidori, ed., Trends in Control, A European Perspective, Rome. Berlin: Springer, 1995. 122. H. Kwakernaak and R. Sivan, Linear Optimal Control Systems. New York-London: Wiley, 1972.

References

395

123. J. P. La Salle, The time-optimal control problem. In: Contribution to Differential Equations, Vol. 5, Princeton, N.J.: Princeton Univ.

Press, 1960. 124. O. Ladyzhenskaya, V. Solonnikov, and N. Uraltseva, Linear and Quasilinear Equations of Parabolic Type. Transl. Math. Monographs, Vol. 23, Providence: Amer. Math. Soc., 1968. 125. V. Lakshmikantham, S. Leela and A. A. Martynyuk, Stability Analysis of Nonlinear Systems. New York: Marcel Dekker, 1988. 126. P. Lancaster and L. Rodman, Solutions of the continuous and discrete time algebraic Riccati equations. In: S. Bittanti, A. J. Laub,

and J. G. Willems, eds., The Riccati Equation. Berlin: Springer, 1991. 127. P. Langevin, Sur la theorie du mouvment brownien. Comptes Rendus Acad. Sci. Paris, 146, No. 10, 1908. 128. E. B. Lee and L. Marcus, Foundation of Optimal Control Theory. New York-London: Wiley, 1969. 129. X. X. Liao, Mathematical Theory and Application of Stability, Wuhan, China: Huazhong Normal Univ. Press, 1988. 130. J. L. Lions, Optimal Control of Systems Governed by Partial Dif-

ferential Equations. Berlin: Springer, 1971. 131. R. S. Liptser and A. N. Shiryaev, Statistics of conditionally Gaussian random sequences. In: Proc. of the 6th Berkeley Symp. of Mathem. Statistics and Probability. University of California, 1970. 132. R. S. Liptser and A. N. Shiryaev, Statistics of Random Processes.

Berlin: Springer, Vol. 1, 1977 and Vol. 2, 1978.

133. A. J. Lotka, Elements of Physical Biology. Baltimore: Williams and Wilkins, 1925. 134. R. Luttmann, A. Munack, and M. Thoma, Mathematical modelling, parameter identification, and adaptive control of single cell protein processes in tower loop bioreactors. In: Advances in Biochemical

Engineering, Biotechnology, Vol. 32, Berlin-Heidelberg: Springer, 1985, pp. 95-205. 135. G. I. Marchuk, Methods of Numerical Mathematics. New York-

Berlin: Springer, 1975. 136. N. N. Moiseev, Asymptotical Methods of Nonlinear Analysis. Moscow: Nauka, 1969. 137. N. N. Moiseev, Foundations of the Theory of Optimal Systems. Moscow: Nauka, 1975.

396

References

138. B. S. Mordukhovich, Approximation Methods in Problems of Optimization and Control. Moscow: Nauka, 1988.

139. V. M. Morozov and I. N. Kalenkova, Estimation and Control in

Nonstationary Systems. Moscow: Moscow State Univ. Press, 1988.

140. E. M. Moshkov, On accuracy of optimal control of terminal condition. Prikl. Mat. i Mekh., 34, No. 3, 1970. 142. J. D. Murray, Lectures on Nonlinear Differential Equation Model in

Biology. Oxford: Claremon Press, 1977. 143. G. V. Obrezkov and V. D. Razevig, Methods of Analysis of Tracking Breakdowns. Moscow: Sovetskoe Radio, 1972.

144. O. A. Oleynik, Boundary problems for linear elliptic and parabolic equation with discontinuous coefficients. Izv. Acad. Nauk SSSR

Ser. Mat., 25, No. 1, 1961. 145. V. S. Patsko, et al., Control of an aircraft landing in windshear. J. Optim. Theory and Appl., 83, No. 2: 237-267, 1994. 146. A. E. Pearson, Y. Shen, and J. Q. Pan, Discrete frequency formats

for linear differential system identification. In: Proc. of 12th World Congress IFAC, Sydney, Australia, Vol. VII, 1993, pp. 143-148 147. A. E. Pearson and A. A. Pandiscio, Control of time lag systems via reducing transformations. In: Proc. of 15th IMACS World Congress. A. Sydow, ed., Systems Engineering, Vol. 5, Berlin: Wissenschaft & Technik, 1997, pp. 9-14. 148. A. A. Pervozvanskii, On minimum of maximal deviation of controlled linear system. Izv. Acad. Nauk SSSR Mekhanika, No. 2,

1965. 149. H. J. Pesch, Real-time computation of feedback controls for constrained extremals (Part 1: Neighboring extremals; Part 2: A cor-

rection method based on multiple shooting). Optimal Control Appl. Methods, 10, No. 2: 129-171, 1989. 150. H. J. Pesch, A practical guide to the solution of real-life optimal

control problems. Control Cybernet., 23, No. 1 and 2: 7-60, 1994. 151. A. B. Piunovskiy, Optimal control of stochastic sequences with constraints. Stochastic Anal. Appl., 15, No. 2: 231-254, 1997. 152. A. B. Piunovskiy, Optimal Control of Random Sequences in Problems with Constraints. Dordrecht: Kluwer Academic Publishers,

1997. 153. H. Poincare, Sur le probleme de troits corps et les equations de la dynamiques. Acta Math., 13, 1890.

References

397

154. H. Poincare, Les Methodes Nouvelles de la Maechanique Celeste. Paris: Gauthier-Villars, 1892-1899. 155. I. I. Poletayeva, Choice of optimality criterion. In: Engineering

Cybernetics, Moscow: Nauka, 1965. 156. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mischenko, The Mathematical Theory of Optimal Processes. New York: Interscience, 1962. 157. Yu. V. Prokhorov and Yu. A. Rozanov, Probability Theory, Foundations, Limit Theorems, and Stochastic Processes. Moscow: Nauka, 1967. 158. N. S. Rao and E. O. Roxin, Controlled growth of competing species.

SIAM J. Appl. Math., 50, No. 3: 853-864, 1990. 159. V. I. Romanovskii, Discrete Markov Chains. Moscow: Gostekhizdat, 1949. 160. Yu. A. Rozanov, Stochastic Processes. Moscow: Nauka, 1971.

161. A. P. Sage and J. L. Melsa, Estimation Theory with Applications to Communication and Control. New York: McGraw-Hill, 1971. 162. A. A. Samarskii, Introduction to Theory of Difference Schemes. Moscow: Nauka, 1971. 163. A. A. Samarskii and A. V. Gulin, Numerical Methods. Moscow:

Nauka, 1989. 164. M. S. Sholar and D. M. Wiberg, Canonical equation for boundary feedback control of stochastic distributed parameter systems. Automatica-J. IFAC, 8, 1972. 165. H. L. Smith, Competitive coexistence in an oscillating chemostat.

SIAM J. Appl. Math., 40, No. 3: 498-552, 1981. 166. S. L. Sobolev, Equations of Mathematical Physics. Moscow: Nauka,

1966. 167. Yu. G. Sosulin, Theory of Detection and Estimation of Stochastic Signals. Moscow: Sovetskoe Radio, 1978. 168. J. Song and J. Yu, Population System Control. Berlin: Springer, 1987. 169. J. Stoer, Principles of sequential quadratic programming methods for solving nonlinear programs. In: K. Schittkowski, ed., Computational Mathematical Programming. NATO ASI Series, F15, 1985,

pp.165-207. 170. R. L. Stratonovich, Application of Markov processes theory for optimal filtering of signals. Radiotekhn. i Elektron., 5, No. 11, 1960.

398

References

171. R. L. Stratonovich, On the optimal control theory. Sufficient coordinates. Avtomat. i Telemekh., 23, No. 7, 1962. 172. R. L. Stratonovich, On the optimal control theory. Asymptotic method for solving the diffusion alternative equation. Avtomat. i

Telemekh., 23, No. 11, 1962. 173. R. L. Stratonovich, Topics in the Theory of Random Noise. New York: Gordon and Breach, Vol. 1, 1963 and Vol. 2, 1967. 174. R. L. Stratonovich, New form of stochastic integrals and equations. Vestnik Moskov. Univ. Ser. I Mat. Mekh., No. 1, 1964. 175. R. L. Stratonovich, Conditional Markov Processes and Their Application to the Theory of Optimal Control. New York: Elsevier, 1968. 176. R. L. Stratonovich and V. I. Shmalgauzen, Some stationary problems of dynamic programming, Izv. Akad. Nauk SSSR Energetika i Avtomatika, No. 5, 1962. 177. Y. M. Svirezhev, Nonlinear Waves, Dissipative Structures, and Catastrophes in Ecology. Moscow: Nauka, 1987. 178. G. W. Swan, Role of optimal control theory in cancer chemotherapy, Math. Biosci., 101: 237-284, 1990.

179. A. N. Tikhonov and A. A. Samarskii, Equations of Mathematical Physics. Moscow: Nauka, 1972. 180. V. I. Tikhonov, Phase small adjustment of frequency in presence of noises. Avtomat. i Telemekh., 21, No. 3, 1960. 181. V. I. Tikhonov and M. A. Mironov, Markov Processes. Moscow: Sovetskoe Radio, 1977. 182. S. G. Tzafestas and J. M. Nightingale, Optimal control of a class

of linear stochastic distributed parameter systems. Proc. IEE, 115, No. 8, 1968. 183. B. van der Pol, A theory of the amplitude of free and forced triode

vibration. Radio Review, 1, 1920. 184. B. van der Pol, Nonlinear theory of electrical oscillations. Proc.

IRE, 22, No. 9, 1934. 185. B. L. van der Waerden, Mathematische Statistik. Berlin: Springer, 1957.

186. V. Volterra, Variazione fluttuazioni del numero d'individui in specie animali convivelnti. Mem. Acad. Lincei, 2: 31-113, 1926. 187. V. Volterra, Lecons sur la theorie mathematique de la lutte pour la vie. Paris: Gauthier-Villars, 1931.

References

399

188. A. Wald, Sequential Analysis, New York: Wiley, 1950. 189. K. E. F. Watt, Ecology and Resource Management. New York:

McGraw-Hill, 1968. 190. B. Wittenmark and K. J. Astrom, Practical issues in the implementation of self-tuning control. Automatica-J. IFAC, 20: 595-605,

1984. 191. E. Wong and M. Zakai, On the relation between ordinary and stochastic differential equations. Internat. J. Engrg. Sci., 3, 1965. 192. E. Wong and M. Zakai, On the relation between ordinary and stochastic differential equations and applications to stochastic problems in control theory. Proc. Third Intern. Congress IFAC, London, 1966. 193. W. M. Wonham, On the separation theorem of stochastic control.

SIAM J. Control, 6: 312-326, 1968. 194. W. M. Wonham, Random differential equations in control theory. In: A. T. Bharucha-Reid, ed., Probabilistic Methods in Applied Mathematics, Vol. 2, New York: Academic Press, 1970. 195. M. A. Zarkh and V. S. Patsko, Strategy of second payer in the linear differential game. Prikl. Math. Mekh., 51, No. 2: 193-200, 1987.

INDEX

program, 2

Adaptive problems of optimal control, 9 A posteriori mean values, 91

A posteriori covariances, 90 Asymptotic synthesis method, 248 Asymptotic series, 220

of relay type, 105, 111 Control problem with infinite horizon, 343 Controller, 1, 7

Constraints, control, 17 on control resources, 17

on phase variables, 18 Cost function (functional), 49

Covariance matrix, 147

D Bellman equation, 47, 51 differentional, 63 functional, 278 integro-differentional, 74 stationary, 67

Bellman optimality principle, 49

Diffusion process, 27 Dynamic programming approach, 47

E

Brownian motion, 33

Equations, Langevin 45 C Cauchy problem, 9 Capacity of the medium, 124 Chapman-Kolmogorov equation, 23 Control, admissible, 9

bang-bang, 105

logistic, 124 of a single population, 342

stochastic differential, 32 truncated, 253 Error signal, 104 Error, stationary tracking, 67, 226 Estimate,

boundary, 212

of approximate synthesis, 182

distributed, 201

of unknown parameters, 316

401

Index

402

Euler equation, 136

Lotka-Volterra, equation, 125

normalized model, 274, 368 F M Feedback control system, 2 Filippov generalized solution, 12 Fokker-Planck equation, 29 Functional, cost, 19 quadratic, 93, 99

Gaussian, conditionally, 313 probability density, 92 process, 20

Markov, process, 21 discrete, 22 continuous, 25 conditional, 79 strictly discontinuous, 31 Mathematical expectation, 15 conditional, 60 Matrix, fundamental, 177 Method, alternating direction, 378 grid function, 356 small parameter, 220

H Hutchinson, model, 125

of successive approximation, 143 sweep, 364 Model, stochastic logistic, 126, 311 Malthus, model, 123

Integral criterion, 14 Ito, equation, 42

N

stochastic integral, 37

K Kalman filter, 91 Kolmogorov, backward equation, 25 forward equation, 25

Krylov-Bogolyubov method, 254

Loss function, 49

Natural growth factor, 124 Nonvibrational amplitude, 254 phase, 254 O

Optimal, damping of random oscillations, 276 fisheries

management, 133, 342 Optimality criterion, 2, 13 terminal, 14

Index

403

Oscillator, quasiharmonic, 248 Oscillatory systems, 247

lengthwise-transverse, 362

Screen, reflecting, 329 absorbing, 333 Servomechanism, 7 Sliding mode, 12

Performance index, 2

Stationary operating

Plant, 1, 7 Plant with distributed parameters, 199 Poorly adapted predator, 267 Population models, 123

conditions, 65 Sufficient coordinates, 75 Switch point, 105 Switching line, 156 Symmetrized (Stratonovich) stochastic integral, 40 Synthesis, numerical, 355 Synthesis problem, 7

Predator-prey model, 125

Probability, density, 20 Problem, boundary-value, 70 linear-quadratic (LQ-), 53 with free endpoint, 48 Process, stochastic, 19

optimal stabilization, 278

R Regulator, 154 Riccati equation, 100

Transition probability, 22

V Van-der-Pol oscillator, 252 Van-der-Pol method, 254

W Sample path, 108 Scheme,

White noise, 19 Wiener random process, 33

Geometry of Feedback and Optimal Control (Pure and Applied Mathematics)

Read more

Fourier Analysis and Approximation Volume 1. (Pure and applied mathematics; a series of monographs and textbooks)

Read more

Monomial Algebras (Monographs and Textbooks in Pure and Applied Mathematics)

Read more

Nonlinear Controllability and Optimal Control (Pure and Applied Mathematics)

Read more

Coding Theory: The Essentials (Pure and Applied Mathematics : a Series of Monographs and Textbooks, 150)

Read more

Banach algebras: An introduction (Pure and Applied Mathematics: A Series of Monographs and Textbooks 24)

Read more

Set Theory (Pure and Applied Mathematics: A Series of Monographs and Textbooks Vol 79)

Read more

Topological Embeddings (Pure and applied mathematics; a series of monographs and textbooks)

Read more

Linear Systems Control: Deterministic and Stochastic Methods

Read more

Linear Systems Control: Deterministic and Stochastic Methods

Read more

Linear Systems Control: Deterministic and Stochastic Methods

Read more

Stochastic versus Deterministic Systems of Differential Equations (Pure and Applied Mathematics)

Read more

Design of Experiments (Statistics: a Series of Textbooks and Monographs)

Read more

Approximation-Solvability of Nonlinear Functional and Differential Equations (Monographs and Textbooks in Pure and Applied Mathematics)

Read more

Combinatorics of Spreads and Parallelisms (Monographs and Textbooks in Pure and Applied Mathematics)

Read more

Principles of Differential Equations (Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts)

Read more

Revolutions of Geometry (Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts)

Read more

Principles of Differential Equations (Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts)

Read more

An Introduction to Differentiable Manifolds and Riemannian Geometry (Pure and applied mathematics, a series of monographs and textbooks)

Read more

An Introduction to Differentiable Manifolds and Riemannian Geometry (Pure and applied mathematics, a series of monographs and textbooks)

Read more

Linear and Integer Programming: Theory and Practice (Monographs and Textbooks in Pure and Applied Mathematics)

Read more

Linear Operators, Part III: Spectral Operators (Pure and Applied Mathematics: A Series of Texts and Monographs)

Read more

Galois Theory (Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts)

Read more

Functional Analysis (Pure and Applied Mathematics: A Wiley-Interscience Series of Texts, Monographs and Tracts)

Read more

Galois Theory (Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts)

Read more

Time Dependent Problems and Difference Methods (Pure and Applied Mathematics: A Wiley-Interscience Series of Texts, Monographs and Tracts)

Read more

Boundary Value Problems and Singular Pseudo-Differential Operators (Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts)

Read more

Green's Functions and Boundary Value Problems (Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts)

Read more

Biorthogonality and its Applications to Numerical Analysis (Monographs and Textbooks in Pure and Applied Mathematics)

Read more

Vector and Tensor Analysis (Monographs and Textbooks in Pure and Applied Mathematics)

Read more

Recommend Documents

Geometry of Feedback and Optimal Control (Pure and Applied Mathematics)

GEOMETRY OF FEEDBACK AND OPTIMAL CONTROL PURE AND APPLIED MATHEMATICS A Program of Monographs, Textbooks,and Lecture ...

Fourier Analysis and Approximation Volume 1. (Pure and applied mathematics; a series of monographs and textbooks)

FOURIER ANALYSIS AND APPROXIMATION Volume I This is Volume 40 in PURE AND APPLIED MATHEMATICS A Series of Monographs ...

Monomial Algebras (Monographs and Textbooks in Pure and Applied Mathematics)

MONOMIAL ALGEBRAS PURE AND APPLIED MATHEMATICS A Program of Monographs, Textbooks, and Lecture Notes EXECUTIVE EDITO...

Nonlinear Controllability and Optimal Control (Pure and Applied Mathematics)

Coding Theory: The Essentials (Pure and Applied Mathematics : a Series of Monographs and Textbooks, 150)

Banach algebras: An introduction (Pure and Applied Mathematics: A Series of Monographs and Textbooks 24)

PUR E AND A Series of I\ I'PI. I ED McJ ll o~rap " .I' MATHEMAT I CS and Textbooks BANACH ALGEBRAS an introductio...

Set Theory (Pure and Applied Mathematics: A Series of Monographs and Textbooks Vol 79)

SET THEORY This is a volume in PURE AND APPLIED MATHEMATICS A Series of Monographs and Textbooks Editors: SAMUEL EILE...

Topological Embeddings (Pure and applied mathematics; a series of monographs and textbooks)

Topological Embeddings This is Volume 52 in PURE AND APPLIED MATHEMATICS A series of Monographs and Textbooks Editors...

Linear Systems Control: Deterministic and Stochastic Methods

Linear Systems Control: Deterministic and Stochastic Methods

Time: 5:54pm t1-v1.0 DP RO OF Linear Systems Control UN CO RR EC TE May 7, 2008 t1-v1.0 DP RO OF Time: 5:54pm...