Advances in Industrial Control
Other titles published in this Series: Digital Controller Implementation and Fragility Robert S.H. Istepanian and James F. Whidborne (Eds.) Optimisation of Industrial Processes at Supervisory Level Doris Sáez, Aldo Cipriano and Andrzej W. Ordys Robust Control of Diesel Ship Propulsion Nikolaos Xiros Hydraulic Servo-systems Mohieddine Jelali and Andreas Kroll Strategies for Feedback Linearisation Freddy Garces, Victor M. Becerra, Chandrasekhar Kambhampati and Kevin Warwick Robust Autonomous Guidance Alberto Isidori, Lorenzo Marconi and Andrea Serrani Dynamic Modelling of Gas Turbines Gennady G. Kulikov and Haydn A. Thompson (Eds.) Control of Fuel Cell Power Systems Jay T. Pukrushpan, Anna G. Stefanopoulou and Huei Peng Fuzzy Logic, Identification and Predictive Control Jairo Espinosa, Joos Vandewalle and Vincent Wertz Optimal Real-time Control of Sewer Networks Magdalene Marinaki and Markos Papageorgiou Process Modelling for Control Benoît Codrons Computational Intelligence in Time Series Forecasting Ajoy K. Palit and Dobrivoje Popovic Modelling and Control of mini-Flying Machines Pedro Castillo, Rogelio Lozano and Alejandro Dzul
Rudder and Fin Ship Roll Stabilization Tristan Perez Hard Disk Drive Servo Systems (2nd Edition) Ben M. Chen, Tong H. Lee, Kemao Peng and Venkatakrishnan Venkataramanan Measurement, Control, and Communication Using IEEE 1588 John Eidson Piezoelectric Transducers for Vibration Control and Damping S.O. Reza Moheimani and Andrew J. Fleming Windup in Control Peter Hippe Manufacturing Systems Control Design Stjepan Bogdan, Frank L. Lewis, Zdenko Kovačić and José Mireles Jr. Practical Grey-box Process Identification Torsten Bohlin Modern Supervisory and Optimal Control Sandor A. Markon, Hajime Kita, Hiroshi Kise and Thomas Bartz-Beielstein Publication due July 2006 Wind Turbine Control Systems Fernando D. Bianchi, Hernán De Battista and Ricardo J. Mantz Publication due August 2006 Soft Sensors for Monitoring and Control of Industrial Processes Luigi Fortuna, Salvatore Graziani, Alessandro Rizzo and Maria Gabriella Xibilia Publication due August 2006 Advanced Fuzzy Logic Technologies in Industrial Applications Ying Bai, Hanqi Zhuang and Dali Wang (Eds.) Publication due September 2006 Practical PID Control Antonio Visioli Publication due November 2006
Murad Abu-Khalaf, Jie Huang and Frank L. Lewis
Nonlinear H2/H∞ Constrained Feedback Control: A Practical Design Approach Using Neural Networks
With 47 Figures
Murad Abu-Khalaf, PhD Automation & Robotics Research Institute The University of Texas at Arlington Fort Worth, Texas USA
Jie Huang, PhD Department of Automation and Computer-aided Engineering Chinese University of Hong Kong Shatin, New Territories Hong Kong
Frank L. Lewis, PhD Automation & Robotics Research Institute The University of Texas at Arlington Fort Worth, Texas USA
British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library Library of Congress Control Number: 2006925302 Advances in Industrial Control series ISSN 1430-9491 ISBN-10: 1-84628-349-3 e-ISBN 1-84628-350-7 ISBN-13: 978-1-84628-349-9
Printed on acid-free paper
© Springer-Verlag London Limited 2006 MATLAB® and Simulink® are registered trademarks of The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098, U.S.A. http://www.mathworks.com Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers. The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use. The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made. Printed in Germany 987654321 Springer Science+Business Media springer.com
Advances in Industrial Control Series Editors Professor Michael J. Grimble, Professor of Industrial Systems and Director Professor Michael A. Johnson, Professor (Emeritus) of Control Systems and Deputy Director Industrial Control Centre Department of Electronic and Electrical Engineering University of Strathclyde Graham Hills Building 50 George Street Glasgow G1 1QE United Kingdom
Series Advisory Board Professor E.F. Camacho Escuela Superior de Ingenieros Universidad de Sevilla Camino de los Descobrimientos s/n 41092 Sevilla Spain Professor S. Engell Lehrstuhl für Anlagensteuerungstechnik Fachbereich Chemietechnik Universität Dortmund 44221 Dortmund Germany Professor G. Goodwin Department of Electrical and Computer Engineering The University of Newcastle Callaghan NSW 2308 Australia Professor T.J. Harris Department of Chemical Engineering Queen’s University Kingston, Ontario K7L 3N6 Canada Professor T.H. Lee Department of Electrical Engineering National University of Singapore 4 Engineering Drive 3 Singapore 117576
Professor Emeritus O.P. Malik Department of Electrical and Computer Engineering University of Calgary 2500, University Drive, NW Calgary Alberta T2N 1N4 Canada Professor K.-F. Man Electronic Engineering Department City University of Hong Kong Tat Chee Avenue Kowloon Hong Kong Professor G. Olsson Department of Industrial Electrical Engineering and Automation Lund Institute of Technology Box 118 S-221 00 Lund Sweden Professor A. Ray Pennsylvania State University Department of Mechanical Engineering 0329 Reber Building University Park PA 16802 USA Professor D.E. Seborg Chemical Engineering 3335 Engineering II University of California Santa Barbara Santa Barbara CA 93106 USA Doctor K.K. Tan Department of Electrical Engineering National University of Singapore 4 Engineering Drive 3 Singapore 117576 Professor Ikuo Yamamoto Kyushu University Graduate School Marine Technology Research and Development Program MARITEC, Headquarters, JAMSTEC 2-15 Natsushima Yokosuka Kanagawa 237-0061 Japan
To my parents Suzan and Muhammad Samir M. Abu-Khalaf To Qingwei, Anne and Jane J. Huang To Galina F. L. Lewis
Series Editors’ Foreword
The series Advances in Industrial Control aims to report and encourage technology transfer in control engineering. The rapid development of control technology has an impact on all areas of the control discipline. New theory, new controllers, actuators, sensors, new industrial processes, computer methods, new applications, new philosophies, new challenges. Much of this development work resides in industrial reports, feasibility study papers and the reports of advanced collaborative projects. The series offers an opportunity for researchers to present an extended exposition of such new work in all aspects of industrial control for wider and rapid dissemination. Almost all physical systems are nonlinear and the success of linear control techniques depends on the extent of the nonlinear system behaviour and the careful attention given to switching linear controllers through the range of nonlinear system operations. In many industrial and process-control applications, good engineering practice, linear control systems and classical PID control can give satisfactory performance because the process nonlinearity is mild and the control system performance specification is not particularly demanding; however, there are other industrial system applications where the requirement for high-performance control can only be achieved if nonlinear control design techniques are used. Thus, in some industrial and technological domains there is a strong justification for more applications of nonlinear methods. One prevailing difficulty with nonlinear control methods is that they are not so easily understood nor are they easy to reduce to formulaic algorithms for routine application. The abstract and often highly mathematical tools needed for nonlinear control systems design means that there is often an “education gap” between the control theorist and the industrial applications engineer; a gap that is difficult to bridge and that prevents the widespread implementation of many nonlinear control methods. The theorist/applications engineer “education gap” is only one aspect of the complex issues involved in the technology transfer of nonlinear control systems into industry. A second issue lies in the subject itself and involves the question of whether nonlinear control design methods are sufficiently mature actually to make the transfer to industry feasible and worthwhile. A look at the nonlinear control literature reveals many novel approaches being developed by the theorist but often
these methods are neither tractable nor feasible nor has sufficient attention been given to the practical relevance of the techniques for industrial application. We hope through the Advances in Industrial Control series to explore these themes through suitable volumes and to try to create a corpus of monograph texts on applicable nonlinear control methods. Typically such volumes will make contributions to the range of applicable nonlinear-control-design tools, will provide reviews of industrially applicable techniques that try to unify groups of nonlinear control design methods and will provide detailed presentations of industrial applications of nonlinear control methods and system technology. This particular volume in Advances in Industrial Control by M. Abu-Khalaf, J. Huang and F.L. Lewis makes a contribution to increasing the range of applicable nonlinear control design tools. It starts from a very classical viewpoint that performance can be captured by a suitably constructed cost function and that the appropriate control law emerges from the optimisation of the cost function. The difficulty is that the solution of these optimal control problems for the class of nonlinear state-space systems selected leads to intractable equations of the Hamilton–Jacobi type. The authors then propose and develop a solution route that exploits the approximation properties of various levels of complexity within nonlinear network structures. Namely, they use neural networks and exploit their “universal function approximation property” to compute tractable solutions to the posed nonlinear H2- and H∞-optimal-control problems. Demonstrations of the methods devised are given for various numerical examples in Chapter 3; these include a nonlinear oscillator, a minimum-time control problem and a parabolic tracking system. Later in the volume, the nonlinear benchmark problem of a Rotational–Translational Actuator (RTAC) system is used to illustrate the power of the methods devised. An aerospace example using the control design for the F-16 aircraft normal acceleration regulator illustrates a high-performance output feedback control system application. Thus, the volume has an interesting set of applications examples to test the optimal control approximation techniques and demonstrate the performance enhancements possible. This welcome entry to the Advances in Industrial Control monograph series will be of considerable interest to the academic research community particularly those involved in developing applicable nonlinear-control-system methods. Research fellows and postgraduate students should find many items giving research inspiration or requiring further development. The industrial engineer will be able to use the volume’s examples to see what the nonlinear control laws look like and by how much levels of performance can be improved by the use of nonlinear optimal control. M.J. Grimble and M.A. Johnson Industrial Control Centre Glasgow, Scotland, U.K.
Preface
Modern Control Theory has revolutionized the design of control systems for aerospace systems, vehicles including automobiles and ships, industrial processes, and other highly complex systems in today’s world. Modern Control Theory was introduced during the late 1950s and 1960s. Key features of Modern Control are the use of matrices, optimality design conditions, and probabilistic methods. It allows the design of control systems with guaranteed performance for multi-input/multi-output systems through the solution of formal matrix design equations. For linear state-space systems, the design equations are quadratic in form and belong to the general class known as Riccati equations. For systems in polynomial form, the design equations belong to the class known as Diophantine equations. The availability of excellent solution techniques for the Riccati and Diophantine design equations has brought forward a revolution in the design of control systems for linear systems. Moreover, mathematical analysis techniques have been effectively used to provide guaranteed performance and closed-loop stability results for these linear system controllers. This has provided confidence in modern control systems designed for linear systems, resulting in their general acceptance in communities including aerospace, process control, military systems, and vehicle systems, where performance failures can bring catastrophic disasters. Physical systems are nonlinear. The push to extend the operating envelopes of such systems, for instance hyper-velocity and super-maneuverability performance in aerospace systems and higher data storage densities for computer hard disk drive systems, means that linear approximation techniques for controls design no longer work effectively. Therefore the design of efficient modern control systems hinges on the ability to use nonlinear system models. It is known that control systems design for general nonlinear systems can be performed by solving equations that are in the Hamilton–Jacobi (HJ) class. Unfortunately, control design for modern-day nonlinear systems is hampered because the HJ equations are impossible to solve exactly for general nonlinear systems. This book presents computationally effective and rigorous methods for solving control design equations in the HJ class for nonlinear systems. The approach taken
is the approximation of the value functions of the HJ equations by nonlinear network structures such as neural networks. It is known that neural networks have many properties, some of them remarkable and none more important than the “universal function approximation property”. In this book, we use neural networks to solve HJ equations to obtain nearly optimal solutions. The convergence of the solutions and the guaranteed performance properties of the controllers derived from them are rigorously shown using mathematical analysis techniques. The result of the nearly optimal solution procedures provided in this book is an extension to modern nonlinear systems of accepted and proven results like those already known for linear systems. Included are optimal controls design for nonlinear systems, H-infinity design for nonlinear systems, constrained-input controllers including minimum-time design for nonlinear systems, and other results that are essential for effective utilization of the full envelope of capabilities of modern systems. The book is organized into eight chapters. In Chapter 1, preliminary results from four main areas are collected. These results can be thought of as the building blocks upon which the rest of the book relies. Chapter 2 introduces the policy iterations technique for constrained nonlinear optimal control systems. It is shown that one can solve the optimal control problem by iterative optimization. Chapter 3 introduces neural network training as a means to solve the iterative optimizations introduced in Chapter 2. Chapters 2 and 3 therefore introduce neural networks to the solution of optimal control problems for constrained nonlinear systems by using iterative optimization techniques based on policy iterations, dynamic programming principles, function approximation, and neural network training. In Chapter 4, the application of reinforcement learning to zero-sum games appearing in H-infinity control is discussed. The result is an iterative optimization technique that solves the zero-sum game. Chapter 5 shows an implementation of neural networks for the solution of the iterative optimization problems in the case of zero-sum games. In Chapters 6 and 7, a systematic approach to the solution of the value function for the case of zero-sum games is shown in continuous time and discrete time, respectively. In this case, unlike the previous chapters, the solution is aimed at directly without using neural networks or iterative optimizations. Chapter 8 addresses constraints on the measured output. The static output feedback control for H-infinity control is treated. An iterative method to solve for the static output feedback gain for the case of linear systems is presented. The work in Chapter 8 is based on collaborative research with Jyotirmay Gadewadikar, who contributed this chapter. Simulations presented in this book are implemented using The MathWorks MATLAB software package. Funding of the work reported by the first and third authors, mainly Chapters 1–5 and Chapter 8, was provided by the National Science Foundation through the Electrical and Communications Systems division under grant ECS-0501451, and by the Army Research Office under grant W91NF-05-1-0314. Jie Huang’s work, which is limited to Chapters 6 and 7, was supported by the Hong Kong Research
Grants Council under grant CUHK 4168/03E, and by the National Natural Science Foundation of China under grant No. 60374038.
April 2006 Arlington, Texas
Murad Abu-Khalaf Jie Huang Frank L. Lewis
Contents
Mathematical Notation  xix

1 Preliminaries and Introduction  1
  1.1 Nonlinear Systems  1
    1.1.1 Continuous-time Nonlinear Systems  1
    1.1.2 Discrete-time Nonlinear Systems  2
  1.2 Stability of Nonlinear Systems  3
    1.2.1 Lyapunov Stability of Continuous-time Nonlinear Systems  4
    1.2.2 Lyapunov Stability of Discrete-time Nonlinear Systems  7
  1.3 Dissipativity of Nonlinear Systems  8
    1.3.1 Dissipativity of Continuous-time Nonlinear Systems  9
    1.3.2 Dissipativity of Discrete-time Nonlinear Systems  12
  1.4 Optimal Control of Nonlinear Systems  14
    1.4.1 Dynamic Programming and the HJB Equation  14
    1.4.2 Discrete-time HJB Equation  17
  1.5 Policy Iterations and Optimal Control  18
    1.5.1 Policy Iterations and H2 Optimal Control  19
    1.5.2 Policy Iterations and the Bounded Real Lemma  21
  1.6 Zero-sum Games of Nonlinear Systems  23
    1.6.1 Continuous-time Zero-sum Games: The HJI Equation  23
    1.6.2 Linear Quadratic Zero-sum Games and H∞ Optimal Control  25
    1.6.3 Discrete-time HJI Equation  26
  1.7 Neural Networks and Function Approximation  28
    1.7.1 Neural Networks  28
    1.7.2 Function Approximation Theorems  30
  1.8 Bibliographical Notes  31

2 Policy Iterations and Nonlinear H2 Constrained State Feedback Control  33
  2.1 Introduction  33
  2.2 Optimal Regulation of Systems with Actuator Saturation  34
  2.3 Policy Iterations for Constrained-input Systems  37
  2.4 Nonquadratic Performance Functionals for Minimum-time and Constrained States Control  41
    2.4.1 Minimum-time Problems  41
    2.4.2 Constrained States  41
  2.5 Bibliographical Notes  42

3 Nearly H2 Optimal Neural Network Control for Constrained-input Systems  43
  3.1 A Neural Network Solution to the LE(V,u)  43
  3.2 Convergence of the Method of Least Squares to the Solution of the LE(V,u)  45
  3.3 Convergence of the Method of Least Squares to the Solution of the HJB Equation  52
  3.4 Algorithm for Nearly Optimal Neurocontrol Design with Saturated Controls: Introducing a Mesh in $\mathbb{R}^n$  54
  3.5 Numerical Examples  56
    3.5.1 Constrained-input Linear System  56
    3.5.2 Nonlinear Oscillator with Constrained Input  62
    3.5.3 Constrained State Linear System  65
    3.5.4 Minimum-time Control  68
    3.5.5 Parabolic Tracker  71
  3.6 Policy Iterations Without Solving the LE(V,u)  75
  3.7 Bibliographical Notes  76

4 Policy Iterations and Nonlinear H∞ Constrained State Feedback Control  77
  4.1 Introduction  77
  4.2 Policy Iterations and the Nonlinear Bounded Real Lemma  78
  4.3 L2-gain of Nonlinear Control Systems with Input Saturation  83
  4.4 The HJI Equation and the Saddle Point  86
  4.5 Solving the HJI Equation Using Policy Iterations  90
  4.6 Bibliographical Notes  94

5 Nearly H∞ Optimal Neural Network Control for Constrained-input Systems  95
  5.1 Neural Network Representation of Policies  96
  5.2 Stability and Convergence of Least Squares Neural Network Policy Iterations  100
  5.3 RTAC: The Nonlinear Benchmark Problem  104
  5.4 Bibliographical Notes  113

6 Taylor Series Approach to Solving HJI Equation  115
  6.1 Introduction  115
  6.2 Power Series Solution of HJI Equation  118
  6.3 Explicit Expression for Hk  126
  6.4 The Disturbance Attenuation of RTAC System  135
  6.5 Bibliographical Notes  146

7 An Algorithm to Solve Discrete HJI Equations Arising from Discrete Nonlinear H∞ Control Problems  147
  7.1 Introduction  147
  7.2 Taylor Series Solution of Discrete Hamilton–Jacobi–Isaacs Equation  151
  7.3 Disturbance Attenuation of Discretized RTAC System  164
  7.4 Computer Simulation  172
  7.5 Bibliographical Notes  175

8 H∞ Static Output Feedback  177
  8.1 Introduction  177
  8.2 Intermediate Mathematical Analysis  178
  8.3 Coupled HJ Equations for H∞ Static Output Feedback Control  182
  8.4 Existence of Static Output Feedback Game Theoretic Solution  185
  8.5 Iterative Solution Algorithm  187
  8.6 H∞ Static Output Feedback Design for F-16 Normal Acceleration Regulator  188
  8.7 Bibliographical Notes  192

References  193
Index  201
Mathematical Notation

$\mathbb{R}^n$ : Euclidean $n$-dimensional space
$A^T$ : transpose of matrix $A$
$V(x)$ : value or cost of $x$
$V_x$ : column vector corresponding to the gradient of $V(x)$ with respect to $x$. In Chapters 5 and 6, this is a row vector.
$x$ : state vector of the dynamical system
$\|x\|$ : the 2-norm of vector $x$
$x'$ : transpose of the vector $x$
$H_2$ : 2-norm on the Hardy space
$H_\infty$ : $\infty$-norm on the Hardy space
$L_2$ : 2-norm on the Lebesgue space of integrable functions
$\Omega$ : compact set of the state space
$\bar{\Omega}$ : complement of the set $\Omega$
$C^m(\Omega)$ : continuous and differentiable up to the $m$th degree on $\Omega$
$x \in \Omega$ : $x$ belongs to $\Omega$
$x \notin \Omega$ : $x$ does not belong to $\Omega$
$w$ : neural network weight vector
$\sigma$ : neural network activation function
$\boldsymbol{\sigma}$ : neural network activation functions vector
$\nabla V$ : column vector denoting the gradient of $V$ with respect to $x$
ARE : algebraic Riccati equation
HJ : Hamilton–Jacobi equation
HJB : Hamilton–Jacobi–Bellman equation
HJI : Hamilton–Jacobi–Isaacs equation
LE : Lyapunov equation
DOV : domain of validity
$\exists$ : there exists
$A \otimes B$ : Kronecker product of $A$ and $B$
$A \wedge B$ : $A$ and $B$
$\sup_{x \in \Omega}$ : supremum of a function with respect to $x$ on $\Omega$
$\min_u$ : minimum with respect to $u$
$\max_d$ : maximum with respect to $d$
$\langle a, b \rangle$ : inner product, i.e. the integral $\int a(x) b(x)\, dx$ for scalar $a(x)$ and $b(x)$
1 Preliminaries and Introduction
In this chapter, basic concepts and background material related to the analysis and control of nonlinear systems are reviewed. The topics covered in this chapter are based on a variety of well-established research areas upon which the rest of this book is built. In Section 1.1, the classes of continuous-time and discrete-time nonlinear dynamical systems that appear throughout the book are described using the state-space formulation. Section 1.2 reviews the main stability results concerning nonlinear dynamical systems. Section 1.3 reviews the important notions of dissipativity and the bounded real lemma. In Section 1.4, optimal control of nonlinear dynamical systems is reviewed, and the well-known Hamilton–Jacobi–Bellman equation for continuous-time and discrete-time systems is introduced along with its relation to the $H_2$ norm. In Section 1.5, the concept of policy iterations found in the reinforcement learning literature is reviewed. Its relation to the optimal control problem is discussed in the framework of Riccati equations. In Section 1.6, zero-sum game theory is reviewed, and the well-known Hamilton–Jacobi–Isaacs equation for continuous-time and discrete-time domains is introduced along with its relation to the $H_\infty$ norm. Finally, Section 1.7 reviews the basics of neural networks and their function approximation property.
1.1 Nonlinear Systems

In this section, the continuous-time and discrete-time systems considered in this book are described. These systems are autonomous, i.e. time-invariant, and affine in the input.

1.1.1 Continuous-time Nonlinear Systems

The affine-in-input continuous-time nonlinear dynamical systems considered in this book are described by

$$\dot{x}(t) = f(x(t)) + g(x(t))u(t) + k(x(t))d(t)$$
$$y(t) = h(x(t)) \qquad (1.1)$$

where $t$ is the continuous-time index, $x(t) \in \mathbb{R}^n$ is the internal state vector, $f(x) \in \mathbb{R}^n$, $g(x) \in \mathbb{R}^{n \times m_1}$ and $k(x) \in \mathbb{R}^{n \times m_2}$. $u(t) \in \mathbb{R}^{m_1}$ is the control input, and $y(t) \in \mathbb{R}^p$ is the measured system output. $d(t) \in \mathbb{R}^{m_2}$ is a disturbance input determined by the surrounding environment. Note that the dynamics of many physical systems can be described by (1.1). For instance, (1.1) may be derived from the physics of the system by using the Lagrangian or Hamiltonian dynamics. The equation $y = h(x)$ is called the output or measurement equation and represents how we choose to measure the system variables. It depends on the type and availability of sensors. Throughout this book, it is assumed that a unique continuous-time solution exists locally for all $t \ge 0$. To guarantee this, it is assumed throughout that $f$, $g$, and $k$ are sufficiently smooth, or at least locally Lipschitz, to guarantee the uniqueness of local solutions.

A special and important class of the dynamical systems (1.1) is the class of linear time-invariant (LTI) systems described by

$$\dot{x} = Ax + Bu + Kd$$
$$y = Hx \qquad (1.2)$$

with $A$, $B$, $K$, and $H$ constant matrices. Hence, results applicable to (1.2) will be highlighted throughout the book.
1.1.2 Discrete-time Nonlinear Systems

The discrete-time nonlinear dynamical systems considered in this book are affine in the input and can be described by

$$x_{k+1} = f(x_k) + g_1(x_k)u_k + g_2(x_k)d_k$$
$$y_k = h(x_k) \qquad (1.3)$$

where $k$ is the discrete-time index, $x_k \in \mathbb{R}^n$ is the internal state vector, $f(x) \in \mathbb{R}^n$, $g_1(x_k) \in \mathbb{R}^{n \times m_1}$ and $g_2(x_k) \in \mathbb{R}^{n \times m_2}$. $u_k \in \mathbb{R}^{m_1}$ is the control input, and $y_k \in \mathbb{R}^p$ is the measured system output. $d_k \in \mathbb{R}^{m_2}$ is a disturbance input determined by the surrounding environment.

A special and important class of (1.3) is the class of linear time-invariant systems described by

$$x_{k+1} = Ax_k + Bu_k + Kd_k$$
$$y_k = Hx_k \qquad (1.4)$$

with $A$, $B$, $K$, and $H$ constant matrices. The results tailored to (1.4) will be highlighted and emphasized throughout the book.
In both continuous-time and discrete-time systems with zero disturbance, the choice of $f(x)$ determines the stability of the unforced system, i.e. $u = 0$. Moreover, the choice of the input matrix $g(x)$ determines the controllability of the system, and the choice of the measurement matrix $h(x)$ determines the observability of the system, in other words the suitability of the measurements taken in a system.
1.2 Stability of Nonlinear Systems

In this section, we study the stability of an equilibrium point of the system with respect to changes in the initial conditions. These definitions will be stated in terms of continuous-time nonlinear systems, with the understanding that discrete-time nonlinear systems admit similar definitions. Consider the following unforced, i.e. no inputs, continuous-time nonlinear dynamical system

$$\dot{x} = f(x) \qquad (1.5)$$

where $x$ and $f$ are $n \times 1$ vectors. It is assumed that $f$ is Lipschitz continuous on a set $\Omega \subseteq \mathbb{R}^n$ containing the origin of the system. Under this assumption, a unique continuous-time solution $x(t)$ that satisfies (1.5) exists. To discuss the stability of (1.5), the following definitions are required.

Definition 1.1 (Equilibrium Point) A vector $x_e \in \mathbb{R}^n$ is a fixed or equilibrium point of (1.5) if $f(x_e) = 0$.

Definition 1.2 In all parts of this definition, $x_e$ is an equilibrium point of (1.5) and $\|\cdot\|$ denotes a vector norm.
1. Stability: $x_e$ is stable in the sense of Lyapunov if, starting close enough to $x_e$, the state will always stay close to $x_e$ at later times. More precisely, $x_e$ is stable in the sense of Lyapunov if for any given $\varepsilon > 0$ there exists a positive constant $\delta(\varepsilon)$ such that if $\|x_0 - x_e\| < \delta(\varepsilon)$ then $\|x(t) - x_e\| < \varepsilon$.
2. Asymptotic Stability: $x_e$ is asymptotically stable if it is stable in the sense of Lyapunov and the state eventually converges to $x_e$ as time goes to infinity.
3. Domain of Attraction: A region such that asymptotic stability will result for any state starting inside this region but not for states starting outside it.
4. Global Asymptotic Stability: $x_e$ is globally asymptotically stable if the equilibrium point is asymptotically stable, and the corresponding region of attraction is $\mathbb{R}^n$.

An isolated equilibrium point $x_e$ can always be brought to the origin by redefinition of co-ordinates; therefore, let us assume without loss of generality that the origin is an equilibrium point. In this case, if an output equation is considered, then one can relate the stability of the internal state vector $x$ by observing the measured output $y = h(x)$. In that case, the following definitions become relevant.

Definition 1.3 (Zero-state Observability) System (1.5) is zero-state observable if $y(t) = 0 \ \forall t \ge 0$ implies that $x(t) = 0 \ \forall t \ge 0$.

Definition 1.4 (Zero-state Detectability) System (1.5) is zero-state detectable if $y(t) = 0 \ \forall t \ge 0$ implies that $\lim_{t \to \infty} x(t) = 0$.
If control inputs are introduced to system (1.5), so that it becomes

$$\dot{x} = f(x) + g(x)u$$
$$y = h(x) \qquad (1.6)$$

and when $\dot{x} = f(x)$ is not necessarily stable, then it is important to study the stabilizability and controllability of system (1.6).

Definition 1.5 (Controllability) System (1.6) is locally controllable around the origin if there exists a neighbourhood $\Omega$ of the origin such that, given any initial state $x_0 \in \Omega$, there exists a final time $T$ and a control input $u(t)$ on $[0, T]$ that drives the state from $x_0$ to the origin.

Definition 1.6 (Stabilizability) System (1.6) is locally stabilizable around the origin if there exists a neighbourhood $\Omega$ of the origin and a control input $u(t)$ that drives the state from $x_0 \in \Omega$ to the origin asymptotically.

Determining whether a nonlinear system is stable or not is largely based on the Lyapunov theorems discussed in what follows for both continuous-time and discrete-time nonlinear systems.
1.2.1 Lyapunov Stability of Continuous-time Nonlinear Systems

Consider the autonomous dynamical system

$$\dot{x} = f(x), \quad y = h(x) \qquad (1.7)$$

with $x \in \mathbb{R}^n$, which could represent an unforced system with $u = 0$, or a closed-loop system after the controller has been designed and specified as a function of the state $x(t)$, i.e. $u(x) = l(x)$. In both cases, the stability of (1.7) around the origin can be determined by the following theorems.
Theorem 1.1 (Lyapunov Stability) If there exists a locally positive definite function $V(x) > 0$ such that its time derivative along the trajectories of (1.7) in some neighbourhood of the origin satisfies

$$\dot{V}(x) = V_x^T \dot{x} = V_x^T f(x) \le 0 \qquad (1.8)$$

then the origin is stable in the sense of Lyapunov and $V(x)$ is called a Lyapunov function. Moreover, if in some neighbourhood of the origin

$$\dot{V}(x) = V_x^T \dot{x} = V_x^T f(x) < 0 \qquad (1.9)$$

then the origin is asymptotically stable. The origin is globally stable, respectively globally asymptotically stable, if in addition (1.8), respectively (1.9), holds for all $x \in \mathbb{R}^n$ with the Lyapunov function satisfying the radially unbounded property, i.e. $V(x) \to \infty$ as $\|x\| \to \infty$.
Theorem 1.2 Let $V(x) \ge 0$ be a solution to

$$V_x^T f(x) = -h(x)^T h(x) \qquad (1.10)$$

and suppose that

$$\dot{x} = f(x), \quad y = h(x) \qquad (1.11)$$

is zero-state detectable. Then $x = 0$ is an asymptotically stable equilibrium of (1.11). If additionally (1.10) holds for all $x \in \mathbb{R}^n$ and $V(x)$ is radially unbounded, then $x = 0$ is globally asymptotically stable.

For the special case of linear time-invariant systems, the following Lyapunov theorems apply.
Theorem 1.3 (Lyapunov Theorem for Linear Systems) The system

$$\dot{x} = Ax, \quad y = Hx \qquad (1.12)$$

is stable in the sense of Lyapunov if there exist matrices $P > 0$ and $Q = H^T H \ge 0$ that satisfy the Lyapunov equation

$$A^T P + PA = -Q \qquad (1.13)$$

Moreover, if $Q = H^T H > 0$ and there exists a $P > 0$ that solves (1.13), then system (1.12) is asymptotically stable.
³ y(t )
y (t ) dt
(1.14)
0
over the state trajectories of (1.12). Hence one may write f T
³ x(t )
Qx(t )dt
0
§f · x0c ¨ ³ (e At )T Qe At dt ¸ x0 ©0 ¹
x0T Px0
(1.15)
If the linear time-invariant system

$$\dot{x} = Ax + Bu, \quad u = -Kx \qquad (1.16)$$

is considered, one can define the following closed-loop Lyapunov equation

$$(A - BK)^T P + P(A - BK) = -Q \qquad (1.17)$$

and analyze the stability of the solutions of this Lyapunov equation to determine the stability of the closed-loop system. There are many extensions to the results appearing in Theorem 1.3. Using the concepts of observability and detectability, the following extensions follow.
Theorem 1.4 A necessary and sufficient condition for $A$ to be strictly stable is that for any symmetric $Q > 0$, the unique solution of the linear matrix equation

$$A^T P + PA + Q = 0$$

is $P > 0$ and symmetric.

Theorem 1.5 If $Q \ge 0$ and $A$ is strictly stable, the linear matrix equation

$$A^T P + PA + Q = 0$$

has a unique solution $P$, and $P \ge 0$. Moreover, if $(Q^{1/2}, A)$ is observable, then $P > 0$.

Theorem 1.6 Suppose $P \ge 0$, $Q \ge 0$, $(Q^{1/2}, A)$ is detectable and

$$A^T P + PA + Q = 0$$

Then $A$ is strictly stable. Moreover, if $(Q^{1/2}, A)$ is observable, then $P > 0$.
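The Lyapunov tests in Theorems 1.4–1.6 are straightforward to carry out numerically. The following minimal sketch, written in Python with SciPy rather than the MATLAB used for the simulations in this book, checks strict stability of an example matrix; the matrices are illustrative assumptions, not taken from the text:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Theorem 1.4: A is strictly stable iff, for a symmetric Q > 0, the unique
# solution P of A^T P + P A + Q = 0 is symmetric positive definite.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # illustrative system matrix (eigenvalues -1, -2)
Q = np.eye(2)

# solve_continuous_lyapunov(a, q) solves a X + X a^H = q,
# so pass a = A^T and q = -Q to obtain A^T P + P A = -Q.
P = solve_continuous_lyapunov(A.T, -Q)

print("P =\n", P)
print("A strictly stable:", np.all(np.linalg.eigvalsh(P) > 0))
```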
1.2.2 Lyapunov Stability of Discrete-time Nonlinear Systems

Consider the autonomous discrete-time dynamical system

$$x_{k+1} = f(x_k) \qquad (1.18)$$

with $x \in \mathbb{R}^n$, which could represent a closed-loop system after the controller has been designed and specified as a function of the state $x_k$. Stability of (1.18) around the origin can be determined by the following theorems.

Theorem 1.7 (Lyapunov Stability) If there exists a positive definite function $V(x) > 0$ such that the forward difference along the trajectories of (1.18) satisfies

$$\Delta V(x_k) = V(x_{k+1}) - V(x_k) = V(f(x_k)) - V(x_k) \le 0 \qquad (1.19)$$

then the origin is stable in the sense of Lyapunov and $V(x_k)$ is called a Lyapunov function. Moreover, if

$$\Delta V(x_k) = V(f(x_k)) - V(x_k) < 0 \qquad (1.20)$$

then the origin is asymptotically stable. The origin is globally stable, respectively globally asymptotically stable, if in addition (1.19), respectively (1.20), holds for all $x \in \mathbb{R}^n$ with the Lyapunov function satisfying the radially unbounded property, i.e. $V(x) \to \infty$ as $\|x\| \to \infty$.

For the special case of linear time-invariant systems, the following theorem applies.
Theorem 1.8 (Lyapunov Theorem for Linear Systems) The system

$$x_{k+1} = Ax_k \qquad (1.21)$$

is stable in the sense of Lyapunov if there exist matrices $P > 0$, $Q \ge 0$ that satisfy the Lyapunov equation

$$P = A^T PA + Q \qquad (1.22)$$

If there exists a solution such that both $P$ and $Q$ are positive definite, the system is asymptotically stable.
One may think of $P$ in (1.22) as a cost function that is the outcome of evaluating the following performance functional

$$\sum_{k=0}^{\infty} x_k^T Q x_k \qquad (1.23)$$

over the state trajectories of (1.21). Hence one may write

$$\sum_{k=0}^{\infty} x_k^T Q x_k = x_0^T \left( \sum_{k=0}^{\infty} (A^k)^T Q A^k \right) x_0 = x_0^T P x_0 \qquad (1.24)$$
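The discrete-time counterpart is just as direct. The sketch below, with illustrative (assumed) data, solves (1.22) and numerically confirms the cost interpretation (1.24) by truncating the infinite sum:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.5, 0.1],
              [0.0, 0.8]])     # illustrative Schur-stable matrix
Q = np.eye(2)

# solve_discrete_lyapunov(a, q) solves a X a^H - X + q = 0,
# so pass a = A^T to obtain P = A^T P A + Q as in (1.22).
P = solve_discrete_lyapunov(A.T, Q)

# Check (1.24): x0^T P x0 should equal sum_k x_k^T Q x_k along x_{k+1} = A x_k.
x0 = np.array([1.0, -1.0])
cost, x = 0.0, x0.copy()
for _ in range(500):             # truncation of the infinite sum
    cost += x @ Q @ x
    x = A @ x
print(x0 @ P @ x0, cost)         # the two values agree closely
```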
1.3 Dissipativity of Nonlinear Systems

Dissipativity, when it holds, is a property of nonlinear systems that plays an important role in the study of control systems from an input–output perspective. The focus in this section is mainly on the treatment of square-integrable signals, i.e. finite-energy signals. These are natural to work with when studying the effect of disturbances on closed-loop systems. This study requires the following function norms.
Definition 1.7 ($L_p$-norm) Given a continuous-time function $f(t) : [0, \infty) \to \mathbb{R}^n$, its $L_p$ function norm, $\|f(t)\|_{L_p}$, is given in terms of the vector $p$-norm $\|f(t)\|_p$ at each value of $t$ by

$$\|f(t)\|_{L_p} = \left( \int_0^\infty \|f(t)\|_p^p \, dt \right)^{1/p}$$

and if $p = \infty$,

$$\|f(t)\|_{L_\infty} = \sup_{t \ge 0} \|f(t)\|_\infty$$

Definition 1.8 ($l_p$-norm) Let $Z = \{0, 1, 2, 3, \ldots\}$ be the set of natural numbers and $f(k) : Z \to \mathbb{R}^n$. The $l_p$ function norm, $\|f(k)\|_{l_p}$, is given in terms of the vector $p$-norm $\|f(k)\|_p$ at each value of $k$ by

$$\|f(k)\|_{l_p} = \left( \sum_{k=0}^\infty \|f(k)\|_p^p \right)^{1/p}$$

and if $p = \infty$,

$$\|f(k)\|_{l_\infty} = \sup_{k \ge 0} \|f(k)\|_\infty$$

If the $L_p$-norm ($l_p$-norm) is finite, then $f(t) \in L_p$ ($f(k) \in l_p$). A continuous-time signal (discrete-time signal) with $f(t) \in L_2$ ($f(k) \in l_2$) is called a square-integrable signal or, equivalently, a finite-energy signal. Therefore, the $L_2$-norm ($l_2$-norm) is essential in the study of dissipativity and robustness of dynamical systems.
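As a quick illustration of Definition 1.7 (the signal here is an assumed example, not from the text), the scalar signal $f(t) = e^{-at}$ with $a > 0$ has

$$\|f\|_{L_2} = \left( \int_0^\infty e^{-2at}\, dt \right)^{1/2} = \frac{1}{\sqrt{2a}} < \infty$$

so $f \in L_2$ and is a finite-energy signal, whereas the constant signal $f(t) = 1$ has an infinite $L_2$-norm and belongs only to $L_\infty$.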
1.3.1 Dissipativity of Continuous-time Nonlinear Systems

Consider the system described by

$$\dot{x} = f(x) + k(x)d$$
$$z = h(x) \qquad (1.25)$$

where $f(0) = 0$ and hence $x = 0$ is assumed to be an equilibrium point of the system. $d(t)$ is considered a disturbance input and $z(t)$ is a fictitious penalty output.
Definition 1.9 (Dissipative Systems) System (1.25) with supply rate $w(d, z)$ is said to be dissipative if there exists $V \ge 0$, called the storage function, such that

$$V(x(t_0)) + \int_{t_0}^{t_1} w(d(t), z(t))\, dt \ge V(x(t_1)) \qquad (1.26)$$

Definition 1.10 ($L_2$-gain Stability) System (1.25) has an $L_2$-gain less than or equal to $\gamma$, where $\gamma \ge 0$, if

$$\|z(t)\|_{L_2} \le \gamma \|d(t)\|_{L_2} \qquad (1.27)$$

for all $d(t) \in L_2$ for which the state trajectory remains within the domain of attraction of the system, with $z(t) = h(x(t))$ denoting the output of (1.25) resulting from $d$ when the initial state $x(0) = 0$.

Dissipativity and $L_2$-gain stability are related. It can be shown that if the dynamical system (1.25) is dissipative with supply rate $w(t) = \gamma^2 \|d(t)\|^2 - \|z(t)\|^2$, then it is also $L_2$-gain stable. To see this, suppose there exists $V \ge 0$ satisfying (1.26) with $x_0 = 0$, $V(x_0) = 0$, and $w(t) = \gamma^2 \|d(t)\|^2 - \|z(t)\|^2$. Then

$$\int_{t_0}^{t_1} w(d(t), z(t))\, dt \ge V(x_1) \ge 0 \;\Rightarrow\; \int_{t_0}^{t_1} \|z(t)\|^2 dt \le \gamma^2 \int_{t_0}^{t_1} \|d(t)\|^2 dt$$

It has been shown that a lower bound on the storage function $V(x)$ is given by the so-called available storage. The existence of the available storage is essential in determining whether or not a system is dissipative.
Definition 1.11 (Available Storage) The available storage $V_a \ge 0$ of (1.25) is given by the following optimal control problem

$$V_a(x) = \sup_{d(t)} \left\{ -\int_0^\infty w(d(t), z(t))\, dt \right\}$$

The optimal maximizing policy $d^*$ associated with the available storage can be thought of as the policy for extracting the maximum energy from the system. It can be interpreted as the worst possible $L_2$ disturbance when the supply rate is given by $w(t) = \gamma^2 \|d(t)\|^2 - \|z(t)\|^2$. For a system to be dissipative, $V_a$ needs to be finite.

The available storage $V_a \ge 0$ provides a lower bound on the storage function of the dynamical system, $0 \le V_a \le V$. If $V_a \in C^1$ then it solves the following Hamilton–Jacobi equation

$$\frac{dV_a}{dx}^T f + \frac{1}{4\gamma^2} \frac{dV_a}{dx}^T kk^T \frac{dV_a}{dx} + h^T h = 0, \quad V_a(0) = 0 \qquad (1.28)$$

To find the available storage, one needs to solve an optimization problem, which can be approached by solving a variational problem as in optimal control theory. The Hamiltonian of this optimization problem is given by

$$H(x, p, d) = p^T (f + kd) + h^T h - \gamma^2 d^T d \qquad (1.29)$$

This Hamiltonian is a polynomial of degree two in $d$, and has a unique maximum at

$$d^* = \frac{1}{2\gamma^2} k(x)^T p$$

given by

$$H(x, p) = p^T f(x) + \frac{1}{4\gamma^2} p^T k(x) k(x)^T p + h(x)^T h(x) \qquad (1.30)$$

Setting the right-hand side of Equation (1.30) to zero and replacing $p$ with $dV/dx$, one has

$$\frac{dV}{dx}^T f(x) + \frac{1}{4\gamma^2} \frac{dV}{dx}^T k(x) k(x)^T \frac{dV}{dx} + h(x)^T h(x) = 0$$

and this is the same as Equation (1.28). It can be shown that any $V(x) \ge 0$ that solves the following Hamilton–Jacobi inequality

$$\frac{dV}{dx}^T f + \frac{1}{4\gamma^2} \frac{dV}{dx}^T kk^T \frac{dV}{dx} + h^T h \le 0, \quad V(0) = 0 \qquad (1.31)$$

is a possible storage function. The relationship between Hamilton–Jacobi equations, $L_2$-gain stability and dissipativity of dynamical systems is discussed in the Bounded Real Lemma theorems.
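To make (1.31) concrete, consider an illustrative scalar example (assumed here, not from the text): $\dot{x} = -ax + d$, $z = x$ with $a > 0$, so that $f = -ax$, $k = 1$, $h = x$. Trying the candidate $V(x) = px^2$ with $p > 0$, the left-hand side of (1.31) becomes

$$2px(-ax) + \frac{1}{4\gamma^2}(2px)^2 + x^2 = \left( \frac{p^2}{\gamma^2} - 2ap + 1 \right) x^2$$

which is nonpositive for some real $p > 0$ exactly when $\gamma \ge 1/a$. This agrees with the transfer function $H(s) = 1/(s + a)$, whose $L_2$-gain is $\sup_\omega |H(j\omega)| = 1/a$.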
Theorem 1.9 (Nonlinear Bounded Real Lemma) Consider the nonlinear time-invariant system (1.25). Suppose there is a continuously differentiable, positive semidefinite function $V(x)$ that satisfies

$$\frac{dV}{dx}^T f + \frac{1}{4\gamma^2} \frac{dV}{dx}^T kk^T \frac{dV}{dx} + h^T h \le 0$$

with $\gamma$ a positive constant. Then, the system (1.25) is finite-gain $L_2$-stable and its $L_2$-gain is less than or equal to $\gamma$. If $V(x) \ge 0$ solves

$$\frac{dV}{dx}^T f + \frac{1}{4\gamma^2} \frac{dV}{dx}^T kk^T \frac{dV}{dx} + h^T h = 0$$

with

$$\dot{x} = f + \frac{1}{2\gamma^2} kk^T \frac{dV}{dx}$$

asymptotically stable, then the system is finite-gain $L_2$-stable, its $L_2$-gain is strictly less than $\gamma$, and $V(x) \ge 0$ is called the available storage. If in addition zero-state observability is assumed, then the available storage is positive definite.

Theorem 1.9 is the nonlinear analogue of the Bounded Real Lemma known for linear time-invariant systems.
Theorem 1.10 (Bounded Real Lemma) Consider the system

$$\dot{x} = Ax + Kd$$
$$z = Hx$$

Then the following statements are equivalent:

1. $\sigma(A) \subset \mathbb{C}^-$ and the $L_2$-gain is strictly less than $\gamma$.

2. The algebraic Riccati equation

$$A^T P + PA + H^T H + \frac{1}{\gamma^2} PKK^T P = 0 \qquad (1.32)$$

has a unique symmetric solution $P \ge 0$ such that $\sigma(A + \frac{1}{\gamma^2} KK^T P) \subset \mathbb{C}^-$.

3. There exists a symmetric $X > 0$ such that

$$A^T X + XA + H^T H + \frac{1}{\gamma^2} XKK^T X < 0$$

Note that for the special case of linear systems, the $L_2$-gain can be found exactly and is equivalent to the infinity norm, $H_\infty$, of the transfer function from $d$ to $z$:

$$\frac{\|z\|_{L_2}^2}{\|d\|_{L_2}^2} = \frac{\int_{-\infty}^{\infty} \|H(j\omega) d(j\omega)\|_2^2 \, d\omega}{\int_{-\infty}^{\infty} \|d(j\omega)\|_2^2 \, d\omega} \le \sup_\omega \frac{\|H(j\omega) d(j\omega)\|_2^2}{\|d(j\omega)\|_2^2} \le \sup_\omega \|H(j\omega)\|_2^2 \qquad (1.33)$$

where

$$\gamma^* = \sup_\omega \|H(j\omega)\|_2$$

is the lower bound on $\gamma$ for which Theorem 1.10 holds.
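For linear systems, (1.33) suggests a direct numerical estimate of the $L_2$-gain: sweep the frequency axis and take the largest singular value of $H(j\omega)$. The Python sketch below does this for illustrative (assumed) matrices; the frequency grid is also an assumption and is not guaranteed to bracket the true peak:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # illustrative stable system matrix
K = np.array([[0.0], [1.0]])   # disturbance input matrix
H = np.array([[1.0, 0.0]])     # penalty output matrix

gain = 0.0
for w in np.logspace(-3, 3, 4000):
    # H(jw) = H (jw I - A)^{-1} K
    Hjw = H @ np.linalg.solve(1j * w * np.eye(2) - A, K)
    gain = max(gain, np.linalg.svd(Hjw, compute_uv=False)[0])

print("estimated L2-gain:", gain)  # Theorem 1.10 then holds for any gamma above this
```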
1.3.2 Dissipativity of Discrete-time Nonlinear Systems

Consider the following nonlinear time-invariant discrete-time system

$$x_{k+1} = f(x_k) + k(x_k)d_k$$
$$z_k = h(x_k) \qquad (1.34)$$

where $f(0) = 0$ and hence $x = 0$ is assumed to be an equilibrium point of the system. $d_k$ is considered a disturbance input and $z_k$ is a fictitious penalty output.

Definition 1.12 (Dissipative Systems) System (1.34) with supply rate $w(z_k, d_k)$ is said to be dissipative if there exists $V \ge 0$, called the storage function, such that

$$V(x_0) + \sum_{k=0}^{N} w(z_k, d_k) \ge V(x_{N+1}) \qquad (1.35)$$

The storage function in (1.35) satisfies

$$V(x_{k+1}) - V(x_k) \le w(z_k, d_k)$$

It can be shown that any $V(x) \ge 0$ that solves the following discrete-time Hamilton–Jacobi inequality

$$V(x_k) \ge \sup_{d_k} \left\{ -w(z_k, d_k) + V(x_{k+1}) \right\} \qquad (1.36)$$

is a possible storage function.
Definition 1.13 ($l_2$-gain Stability) System (1.34) has an $l_2$-gain less than or equal to $\gamma$, where $\gamma \ge 0$, if

$$\|z_k\|_{l_2} \le \gamma \|d_k\|_{l_2} \qquad (1.37)$$

for all $d_k \in l_2$, with $z_k = h(x_k)$ denoting the output of (1.34) resulting from $d$ for initial state $x(0) = 0$, with the state trajectory remaining within the domain of attraction of the system.

Definition 1.14 (Available Storage) The available storage $V_a \ge 0$ of (1.34) is given by the following optimal control problem

$$V_a(x_0) = \max_{d_k} \left\{ -\sum_{k=0}^{N} w(z_k, d_k) \right\}$$

The optimal policy $d_k^*$ associated with the available storage can be thought of as the policy for extracting the maximum energy from the system. It can be interpreted as the worst possible $l_2$ disturbance when the supply rate is given by $w_k = \gamma^2 \|d_k\|^2 - \|z_k\|^2$.

The relationship between the discrete-time Hamilton–Jacobi equation, $l_2$-gain stability and dissipativity of dynamical systems is discussed in the discrete-time Bounded Real Lemma theorems.

Theorem 1.11 (Nonlinear Bounded Real Lemma) Consider the time-invariant nonlinear system (1.34). Suppose there is a positive semidefinite function $V(x_k)$ that satisfies the following discrete-time version of the Hamilton–Jacobi equation

$$V(x_k) = \max_{d_k} \left[ \|z_k\|^2 - \gamma^2 \|d_k\|^2 + V(f(x_k) + k(x_k)d_k) \right]$$

with $\gamma$ a positive constant. Then, the system (1.34) is finite-gain $l_2$-stable and its $l_2$-gain is less than or equal to $\gamma$.

Theorem 1.11 is the nonlinear analogue of the Bounded Real Lemma known for discrete-time linear time-invariant systems.
Theorem 1.12 (Bounded Real Lemma) Consider the system

$$x_{k+1} = Ax_k + Bu_k$$
$$y_k = Cx_k$$

Then the following statements are equivalent:

1. $A$ is asymptotically stable and the $l_2$-gain is strictly less than $\gamma$.

2. There exists a $P > 0$ such that

$$P > A^T PA + C^T C - A^T PB (B^T PB - \gamma^2 I)^{-1} B^T PA, \quad B^T PB - \gamma^2 I < 0$$

3. The algebraic Riccati equation

$$P = A^T PA + C^T C - A^T PB (B^T PB - \gamma^2 I)^{-1} B^T PA \qquad (1.38)$$

has a unique symmetric solution $P \ge 0$ such that $A - B(B^T PB - \gamma^2 I)^{-1} B^T PA$ is asymptotically stable.
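A matching numerical check applies in discrete time: for an asymptotically stable system, the $l_2$-gain equals the peak of $\|C(e^{j\omega}I - A)^{-1}B\|$ over the unit circle. A sketch with assumed illustrative matrices:

```python
import numpy as np

A = np.array([[0.5, 0.1],
              [0.0, 0.8]])     # illustrative Schur-stable matrix
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

gain = 0.0
for w in np.linspace(0.0, np.pi, 2000):
    # transfer function on the unit circle: C (e^{jw} I - A)^{-1} B
    Hw = C @ np.linalg.solve(np.exp(1j * w) * np.eye(2) - A, B)
    gain = max(gain, np.linalg.svd(Hw, compute_uv=False)[0])

print("estimated l2-gain:", gain)  # Theorem 1.12 holds for any gamma above this value
```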
1.4 Optimal Control of Nonlinear Systems

Optimal control theory involves the design of controllers that, besides meeting the required control objective, also minimize specific performance functionals. The value function of the optimal control problem satisfies Bellman's Optimality Principle, which is described for both continuous-time and discrete-time systems.

1.4.1 Dynamic Programming and the HJB Equation

Consider the system described by

$$\dot{x} = f(x) + g(x)u \qquad (1.39)$$

where $f(0) = 0$ and hence $x = 0$ is assumed to be an equilibrium point of the system. $u(t)$ is considered a control input. It is desired to find $u(t)$ such that the following infinite-horizon performance functional is minimized

$$V(x(t)) = \int_t^\infty r(x(\tau), u(\tau))\, d\tau \qquad (1.40)$$

This can be equivalently written as

$$V(x(t)) = \min_{\substack{u(\tau) \\ t \le \tau < \infty}} \int_t^\infty r(x(\tau), u(\tau))\, d\tau \qquad (1.41)$$

which after subdividing the interval becomes

$$V(x(t)) = \min_{\substack{u(\tau) \\ t \le \tau < \infty}} \left\{ \int_t^{t+\Delta t} r(x(\tau), u(\tau))\, d\tau + \int_{t+\Delta t}^\infty r(x(\tau), u(\tau))\, d\tau \right\} \qquad (1.42)$$

To solve this optimal control problem, Bellman's Optimality Principle requires that

$$V(x(t)) = \min_{u(t)} \left\{ \int_t^{t+\Delta t} r(x(\tau), u(\tau))\, d\tau + V(x(t + \Delta t)) \right\} \qquad (1.43)$$

Taking the infinitesimal version of (1.43), under the assumption that $V(x)$ is continuously differentiable, one obtains

$$0 = \min_{u(t)} \left\{ r(x(t), u(t)) + V_x'(x(t)) \left[ f(x) + g(x)u \right] \right\} \qquad (1.44)$$

which is known as the Hamilton–Jacobi–Bellman (HJB) equation and provides a sufficient condition in optimal control theory. For a performance integrand of the form $r(x, u) = Q(x) + u^T Ru$, as in (1.47) below, the optimal control law is then given as

$$u^*(x) = -\frac{1}{2} R^{-1} g^T \frac{dV^*}{dx} \qquad (1.45)$$
For the special case of linear time-invariant systems

$$\dot{x} = Ax + Bu \qquad (1.46)$$

with the quadratic infinite-horizon cost

$$V(x_0) = \min_{\substack{u(t) \\ 0 \le t < \infty}} \int_0^\infty \left[ x^T Qx + u^T Ru \right] dt \qquad (1.47)$$

the Hamilton–Jacobi–Bellman equation (1.44) becomes the algebraic Riccati equation

$$A^T P + PA + Q - PBR^{-1} B^T P = 0 \qquad (1.48)$$

The value of the optimal control problem is quadratic in the state

$$V(x) = x^T Px \qquad (1.49)$$

with $P$ being the positive semidefinite solution of (1.48). The optimal controller in this case, known as the linear quadratic regulator (LQR), is given as

$$u(x) = -R^{-1} B^T Px \qquad (1.50)$$
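Numerically, the ARE (1.48) and the LQR gain (1.50) are one call away in standard software. A minimal sketch in Python/SciPy (the book's simulations use MATLAB; the double-integrator data here are an assumed example):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])     # double integrator (illustrative)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Solve A^T P + P A + Q - P B R^{-1} B^T P = 0, Equation (1.48)
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)          # u = -K x, Equation (1.50)

print("P =\n", P)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))  # all in C^-
```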
For linear systems, the optimal control problem (1.47) is related to a general class of problems known as $H_2$ optimal control. To illustrate this fact, consider the system described in Figure 1.1

$$\dot{x} = Ax + B_1 w + B_2 u$$
$$z = C_1 x + D_{11} w + D_{12} u$$
$$y = C_2 x + D_{21} w + D_{22} u$$

Figure 1.1. The standard plant configuration: the disturbance $w$ and the control $u$ drive the plant; $z$ is the controlled output and $y$ is the measured output, from which a feedback controller generates $u$.

where $u$ is the control input, $w$ is a disturbance input, $y$ is the measured output and $z$ is the controlled output. In $H_2$ optimal control theory, one is interested in finding a controller that minimizes the 2-norm of the transfer function from $w$ to $z$

$$\|H(s)\|_2^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} \mathrm{trace}\left[ H^T(j\omega) H(j\omega) \right] d\omega = \int_0^\infty \mathrm{trace}\left[ h(t)^T h(t) \right] dt \qquad (1.51)$$

This can be thought of as minimizing the response of the transfer function $H(s)$ to the impulsive input $w$. To see why (1.46) and (1.47) constitute an $H_2$ optimal control problem, note that

$$\dot{x} = Ax + Bu, \quad x(0) = x_0 \qquad (1.52)$$

can be written as

$$\dot{x} = Ax + B_1 w + B_2 u$$
$$z = C_1 x + D_{12} u$$
$$y = x \qquad (1.53)$$

with

$$B_1 = x_0, \quad B_2 = B, \quad C_1 = \begin{bmatrix} Q^{1/2} \\ 0 \end{bmatrix}, \quad D_{12} = \begin{bmatrix} 0 \\ R^{1/2} \end{bmatrix}$$

Moreover, $w = \delta(t)$, $x(0) = 0$, and the optimized functional is

$$\int_0^\infty z(t)^T z(t)\, dt = \int_0^\infty \left[ x^T Qx + u^T Ru \right] dt \qquad (1.54)$$

In this book, we will refer to the optimal control problem (1.41) as an $H_2$ optimal control problem.
1.4.2 Discrete-time HJB Equation

Consider the following nonlinear time-invariant discrete-time system

$$x_{k+1} = f(x_k) + g(x_k)u_k \qquad (1.55)$$

where $f(0) = 0$. $u_k$ is a control input and $x = 0$ is assumed to be an equilibrium point of the system. It is desired to find $u_k$ such that the following infinite-horizon performance functional is minimized

$$V(x_k) = \sum_{i=k}^{\infty} r(x_i, u_i) \qquad (1.56)$$

This can be equivalently written as

$$V(x_k) = \min_{\substack{u_k \\ 0 \le k < \infty}} \sum_{i=k}^{\infty} r(x_i, u_i) \qquad (1.57)$$

which after subdividing the interval becomes

$$V(x_k) = \min_{\substack{u_k \\ 0 \le k < \infty}} \left\{ r(x_k, u_k) + \sum_{i=k+1}^{\infty} r(x_i, u_i) \right\} \qquad (1.58)$$

To solve this optimal control problem, Bellman's Optimality Principle requires that

$$V(x_k) = \min_{u_k} \left\{ r(x_k, u_k) + V(x_{k+1}) \right\} = \min_{u_k} \left\{ r(x_k, u_k) + V(f(x_k) + g(x_k)u_k) \right\} \qquad (1.59)$$

Equation (1.59) is the discrete-time Hamilton–Jacobi–Bellman equation. Unlike the case of continuous-time systems, the optimal control policy is related to the optimal cost-to-go through

$$u^*(x_k) = -\frac{1}{2} R^{-1} g(x_k)^T \frac{dV^*(x_{k+1})}{dx_{k+1}} \qquad (1.60)$$

and hence it is very difficult, except in very special cases, to find a closed-form solution for $u^*(x_k)$ in terms of $V(x_k)$ when $u^*(x_k)$ is related to $V(x_{k+1})$ as shown in (1.60).

Consider the special case of linear time-invariant systems
$$x_{k+1} = Ax_k + Bu_k \qquad (1.61)$$

with the quadratic infinite-horizon cost

$$V(x_k) = \min_{\substack{u_k \\ 0 \le k < \infty}} \sum_{i=k}^{\infty} \left[ x_i^T Qx_i + u_i^T Ru_i \right] \qquad (1.62)$$

In this case, Equation (1.59) becomes the algebraic Riccati equation

$$P = A^T PA + Q - A^T PB (B^T PB + R)^{-1} B^T PA \qquad (1.63)$$

The value of the optimal control problem is quadratic in the state

$$V(x_k) = x_k^T Px_k \qquad (1.64)$$

with $P$ the positive semidefinite solution of (1.63). The optimal controller in this case, known as the discrete-time linear quadratic regulator (LQR), is given by

$$u_k = -(B^T PB + R)^{-1} B^T PAx_k \qquad (1.65)$$
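As in the continuous-time case, (1.63) and (1.65) are directly computable. A minimal sketch with an assumed discretized double integrator:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])     # illustrative discretized double integrator
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

# Solve P = A^T P A + Q - A^T P B (B^T P B + R)^{-1} B^T P A, Equation (1.63)
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)   # u_k = -K x_k, Equation (1.65)

print("spectral radius of A - B K:",
      max(abs(np.linalg.eigvals(A - B @ K))))        # < 1: asymptotically stable
```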
1.5 Policy Iterations and Optimal Control

Solving optimal control problems requires solving a dynamic programming problem that evolves backward in time. One can instead solve for the optimal control policy through a sequence of policy iterations that eventually converges to the optimal control policy. This idea has its roots in the reinforcement learning literature.

In this section, policy iterations for continuous-time linear time-invariant systems are introduced. In this case, policy iterations require one to solve a sequence of linear matrix equations, i.e. Lyapunov equations, rather than a quadratic matrix equation, i.e. the algebraic Riccati equation. Preliminary results concerning the policy iterations technique to solve the algebraic Riccati equations (1.48) and (1.32) are discussed. These concepts are generalized later in this book to the case of nonlinear continuous-time systems to solve the Hamilton–Jacobi–Bellman equation and the Hamilton–Jacobi–Isaacs equation. Discrete-time policy iteration methods do not appear in this book.
1.5.1 Policy Iterations and H2 Optimal Control

Consider the following linear system

$$\dot{x} = Ax + Bu, \quad u = -Kx \qquad (1.66)$$

with $u = -Kx$ a stabilizing controller. Consider the following cost-to-go

$$V(x_0) = \int_0^\infty \left[ x^T Qx + u^T Ru \right] dt \qquad (1.67)$$

which, when evaluated over the trajectories of the closed-loop system (1.66), gives

$$V(x_0) = x_0^T \left[ \int_0^\infty (e^{(A-BK)t})^T \left[ Q + K^T RK \right] e^{(A-BK)t}\, dt \right] x_0 = x_0^T Px_0 \qquad (1.68)$$

Taking the infinitesimal version of (1.68)

$$\dot{V}(x) = V_x^T (A - BK)x = -x^T Qx - u^T Ru = -x^T (Q + K^T RK)x$$

one obtains the following closed-loop Lyapunov equation

$$0 = (A - BK)^T P + P(A - BK) + Q + K^T RK$$

If one has two different stabilizing controllers $u_1 = -K_1 x$ and $u_2 = -K_2 x$, then one has

$$\dot{V}_1(x) = V_{1x}^T (A - BK_1)x = -x^T Qx - u_1^T Ru_1 = -x^T (Q + K_1^T RK_1)x$$

$$\dot{V}_2(x) = V_{2x}^T (A - BK_2)x = -x^T Qx - u_2^T Ru_2 = -x^T (Q + K_2^T RK_2)x$$

One can differentiate $V_1(x) - V_2(x)$ over the trajectories of controller $u_2 = -K_2 x$ to obtain

$$\dot{V}_1 - \dot{V}_2 = (V_{1x} - V_{2x})^T (A - BK_2)x = -(u_1 - u_2)^T R(u_1 - u_2) - (V_{1x}^T B + 2u_2^T R)(u_1 - u_2)$$

which implies that

$$V_1(x_0) - V_2(x_0) = \int_0^\infty \left[ (u_1 - u_2)^T R(u_1 - u_2) + (V_{1x}^T B + 2u_2^T R)(u_1 - u_2) \right] dt$$

Hence, one has

$$P_1 - P_2 = \int_0^\infty (e^{(A-BK_2)t})^T \left[ (K_1 - K_2)^T R(K_1 - K_2) - (B^T P_1 - RK_2)^T (K_1 - K_2) - (K_1 - K_2)^T (B^T P_1 - RK_2) \right] e^{(A-BK_2)t}\, dt \qquad (1.69)$$

If the control policy $u_2 = -K_2 x$ is selected such that

$$K_2 = R^{-1} B^T P_1$$

then it can be shown that this policy is a stable policy by noticing that $V_1(x)$ is a Lyapunov function for $u_2$:

$$\dot{V}_1 = (V_{1x})^T (A - BK_2)x = -x^T Qx - (u_1 - u_2)^T R(u_1 - u_2) - u_2^T Ru_2$$
Theorem 1.13 Let Pi , i Lyapunov equation
0
(1.70)
0,1,! be the unique positive definite solution of the
( A BK i )T Pi Pi ( A BK i ) Q K iT RK i
where the policies Ki
R 1 BT Pi 1
(1.71)
and K 0 is such that A BK 0 has all eigenvalues with negative real parts. Then
Preliminaries and Introduction
P d Pi 1 d Pi d " , i
lim Pi o P
21
0,1,!
i of
where P solves AT P PA Q PBR 1 BT P
0
Proof. From (1.70), the policies given by (1.71) are stabilizing. From (1.69), one , can show that P d Pi 1 d Pi d " , i 0,1,! . 1.5.2 Policy Iterations and the Bounded Real Lemma Consider the algebraic Riccati equation (1.32) in Theorem 1.10. The next theorem shows the existence and convergence of policy iterations that solve for the stabilizing solution of (1.32).
Theorem 1.14 Assume there exists a P t 0 that solves
AT P PA J12 PDDT P H T H with V ( A J12 DDT P) ^ Pi , i 0,1,! that solves
and
0
(1.72)
H ! 0 . Then, there exists a sequence
( A DK i )T Pi Pi ( A DK i ) H T H J 2 K iT K i
0 where the policies Ki
1 J2
DT Pi 1
(1.73)
with P0 0 . Then, i, V ( A DK i ) ^
Pi d Pi 1 d " d P , i
0,1,!
lim Pi o P ! 0 i of
where P solves (1.72).
Proof. Existence: Assume that there is Pi 1 such that V ( A DK i ) ^ , then one has f
Pi
³ (e
( A DKi ) t T
0
and Pi is the unique solution of
) ª¬ H T H J 2 K iT K i º¼ e ( A DKi )t dt
22
Nonlinear H2/H Constrained Feedback Control
( A DK i )T Pi Pi ( A DK i )
H T H J 2 K iT K i
(1.74)
Moreover, from Theorem 1.10, there exists a symmetric X ! 0 such that ( X )
AT X XA H T H J12 XDDT X 0
(1.75)
Equation (1.75) can be rewritten to have ( A DK i )T X X ( A DK i )
H T H J 2 K iT K i J12 ( X Pi 1 ) DDT ( X Pi 1 ) ( X )
(1.76)
Combining (1.76) with (1.74), one has ( A DK i )T ( X Pi ) ( X Pi )( A DK i ) J12 ( X Pi 1 ) DDT ( X Pi 1 ) ( X ) (1.77)
<0 and one concludes from Theorem 1.5 that X Pi ! 0 . Now consider the following policy update K i 1
1
J2
DT Pi
Rearranging terms in (1.74) one has ( A DK i 1 )T Pi Pi ( A DK i 1 )
H T H J 2 K i 1T K i 1 J12 ( Pi Pi 1 ) DDT ( Pi Pi 1 )
(1.78)
and rearranging (1.76) one obtains ( A DK i 1 )T X X ( A DK i 1 )
H T H J 2 K i 1T K i 1 J12 ( X Pi ) DDT ( X Pi ) ( X )
(1.79)
Subtracting (1.78) from (1.79), on has ( A DK i 1 )T ( X Pi ) ( X Pi )( A DK i 1 ) J12 ( X Pi ) DDT ( X Pi ) J12 ( Pi Pi 1 ) DDT ( Pi Pi 1 ) ( X ) <0
(1.80)
Preliminaries and Introduction
23
Using X Pi ! 0 as a Lyapunov function candidate for the dynamics of the policy update K i 1 J12 DT Pi , it follows from Theorem 1.4 that V ( A DK i 1 ) ^ . Existence then follows by induction by initializing with K1 J12 DT P0 where P0 0 and noting that V ( A DK1 ) ^ because of V ( A) ^ . Convergence: Note that ( A DK i 1 )T Pi 1 Pi 1 ( A DK i 1 )
( A DK i 1 )T Pi Pi ( A DK i 1 )
H T H J 2 K i 1T K i 1
(1.81)
H T H J 2 K i 1T K i 1 J12 ( Pi Pi 1 ) DDT ( Pi Pi 1 )
(1.82)
Substracting (1.82) from (1.81), one has ( A DK i 1 )T ( Pi 1 Pi ) ( Pi 1 Pi )( A DK i 1 ) = J 2 ( K i 1T K iT )( K i 1 K i )
(1.83)
which implies that f
Pi 1 Pi
J 2 ³ (e( A DK
i 1 ) t
)T ( K i 1T K iT )( K i 1 K i )e( A DKi1 )t dt t 0
(1.84)
0
Hence, Pi d Pi 1 d " d P , i note that P1 ! 0 since H ! 0 .
0,1,! and lim Pi o P . To show that P ! 0 , i of ,
1.6 Zero-sum Games of Nonlinear Systems In this section, we introduce zero-sum games of concern in this book for both nonlinear continuous-time and discrete-time systems. Then results applicable to the linear time-invariant case are emphasized. Finally, the relation between continuous-time linear quadratic zero-sum games and H f optimal control theory is discussed.
1.6.1 Continuous-time Zero-sum Games: The HJI Equation Consider the system described by x
f ( x) g ( x)u k ( x)d
(1.85)
where f (0) 0 . u (t ) is considered a control input while d (t ) is considered a disturbance input. In the zero-sum games considered in this book, the two players u (t ) and d (t ) are required to optimize the infinite horizon cost
24
Nonlinear H2/H Constrained Feedback Control f
V ( x0 )
³ > r x(W ), u(W ), d (W ) @ dW
(1.86)
0
where u (t ) is the minimizer player and d (t ) is the maximizer player. It is assumed that the information structure is dynamic, that is both controller and disturbance have access to some measurement model. In this section, we assume that both players have access to the state information. Under this closedloop information structure, a partial differential equation known as the Hamilton– Jacobi–Isaacs equation
0
T ° ½° dV min max ®r x(t ), u (t ), d (t ) f ( x ) g ( x)u k ( x)d @¾ > u (t ) d (t ) dx ¯° ¿°
(1.87)
0
T ½ dV max min ®r x(t ), u (t ), d (t ) > f ( x) g ( x)u k ( x)d @¾ d (t ) u (t ) dx ¯ ¿
(1.88)
or
provides a sufficient condition for optimality as did the Hamilton–Jacobi–Bellman equation in (1.44). If V V V then the zero-sum game has a value V . Moreover, if
0
T ° ½° dV min max ®r x(t ), u (t ), d (t ) f ( x ) g ( x)u k ( x)d @¾ > u (t ) d (t ) dx ¯° ¿° T ½ dV max min ®r x(t ), u (t ), d (t ) > f ( x) g ( x)u k ( x)d @¾ d (t ) u (t ) dx ¯ ¿ T
r x( x), u ( x), d ( x)
dV ª f ( x) g ( x)u k ( x) d º¼ dx ¬
(1.89)
then the pair of policies (u , d ) is in saddle point equilibrium. A specific zero-sum cost that has applications to H f optimal control is
r x(t ), u (t ), d (t )
hT h u T Ru J 2 d T d
(1.90)
In this case, the saddle point policies are then given by
1 dV R 1 g T 2 dx
1 T dV d ( x) k dx 2J 2
u ( x)
(1.91)
Preliminaries and Introduction
25
with V solving dV dx
0
T
T
f
dV § 1 T 1 · k k g T R 1 g ¸ h T h ¨ 2 dx © 4J 4 ¹
(1.92)
In this case, d ( x ) is thought to be the worst possible L2 disturbance.
1.6.2 Linear Quadratic Zero-sum Games and H Optimal Control For the special case of linear time-invariant systems x
Ax Bu Kd
(1.93)
with the quadratic cost
xT Qx u T Ru J 2 d T d
r x(t ), u (t ), d (t )
(1.94)
In this case, Equation (1.92) becomes the following algebraic Riccati equation
§ 1 AT P PA P ¨ 2 KK T BBT ©J
· ¸PQ ¹
0
(1.95)
and (1.91) becomes u ( x)
Lu x
d ( x)
Ld x
R 1 BT Px 1
J
2
(1.96)
K T Px
To see the relation to L2 -gain stability, note that Equation (1.95) becomes 0
AcT P PAc Qc
Qc
Q Lu T RLu
Ac
A BLu
1
J2
PKK T P (1.97)
which implies that z (t ) with
2 L2
d J 2 d (t )
2 L2
(1.98)
26
Nonlinear H2/H Constrained Feedback Control f
z (t )
³ x Qx u T
L2
T
Ru dt
0
Hence, solving the zero-sum game guarantees that the L2 -gain from d to z is bounded by J . Designing the control system to minimize the L2 -gain from d to z as shown in Figure 1.1 is equivalent to finding a solution to the zero-sum game (1.95) for the smallest possible J . This is known as the H f optimal control problem.
1.6.3 Discrete-time HJI Equation In a discrete-time zero-sum game setting, one has the following nonlinear timeinvariant discrete-time system xk 1
f ( xk ) g1 ( xk )uk g 2 ( xk )d k
(1.99)
where f (0) 0 , uk is a control input and d k is the disturbance input. It is desired to find uk to optimize the infinite-horizon performance functional f
V ( xk )
¦ r x ,u , d i
i
i
(1.100)
i k
where u is the minimizer player and d is the maximizer player. It is assumed that the information structure is dynamic, that is both controller and disturbance have access to some measurement model. Here we assume that both players have access to the state information. Under this closed-loop information structure, applying the dynamic programming principle results in the following Isaacs equation V ( xk )
min max ^r xk , uk , d k V ( f ( xk ) g1 ( xk )uk g 2 ( xk )uk )` uk
dk
max min ^r xk , uk , d k V ( f ( xk ) g1 ( xk )uk g 2 ( xk )d k )` dk
uk
r xk , uk , d k V ( f ( xk ) g1 ( xk )uk g 2 ( xk )d k )
(1.101)
which is a discrete-time analogue of the Hamilton–Jacobi–Isaacs equation. This equation provides a sufficient condition for the existence of a saddle point for a dynamic game. A specific zero-sum cost that has applications to H f optimal control is
r xk , uk , d k
hk T hk u T Ruk J 2 d k T d k
(1.102)
Unlike the case of continuous-time systems, the optimal control policy is related to the optimal cost-to-go through
Preliminaries and Introduction
u ( xk )
dV ( xk 1 ) 1 R 1 g1T ( xk ) 2 dxk 1
d ( xk )
27
(1.103)
dV ( xk 1 ) 1 1 T R g 2 ( xk ) 2 dxk 1 2J
and hence, it is very difficult to find a closed-form solutions for the policies u ( xk ) and d ( xk ) in terms of V ( xk ) except in very special cases. Consider the special case of linear time-invariant systems Axk Buk Ewk
xk 1
(1.104)
with the quadratic cost
r xk , uk , wk
xk T Qxk uk T Ruk J 2 wk T wk
(1.105)
In this case, Equation (1.101) becomes the following algebraic Riccati equation: AT PA R
P
1
BT PE º ª BT PA º ª I BT PB AT PE ] « T » « » E T PE J 2 I ¼ ¬ E T PA¼ ¬ E PA
[ AT PB
(1.106)
The value of the optimal control problem is quadratic in the state V ( xk )
xk T Pxk
(1.107)
where P is the positive semidefinite solution of (1.106). Moreover, (1.103) is given by u ( x)
w ( x)
Lx
(1.108)
Kx
with
ªLº «K » ¬ ¼
1
BT PE º ª BT PA º ª I BT PB « T » « » E T PE J 2 I ¼ ¬ E T PA¼ ¬ E PB
(1.109)
28
Nonlinear H2/H Constrained Feedback Control
1.7 Neural Networks and Function Approximation Neural networks have many applications and structures. In this book, an important property of neural networks is utilized that is known as the function approximation property. Neural networks will be the tool by which closed-form solutions to the various Hamilton–Jacobi equations arising in this book are obtained. This will result in closed-form representations of optimal policies for both H 2 and H f constrained nonlinear control.
1.7.1 Neural Networks In this book, we are interested in two-layer static feedforward neural networks that are shown in Figure 1.2.
Figure 1.2. Two-layer feedforward static neural networks
In Figure 1.2, ª x1 º «x » « 2» «#» « » ¬ xn ¼
is the input vector, while
Preliminaries and Introduction
29
ª y1 º «y » « 2» « # » « » ¬ ym ¼
is the output vector and V () is known as the activation function of the neural network. The output of the neural network is given by yi
§
K
§
n
·
·
©
j 1
¹
¹
V ¨ ¦ wik V ¨ ¦ vkj x j vk 0 ¸ wi 0 ¸ i 1, 2,! , m ©k
1
(1.110)
where vkj are the weights of the first layer and wik are the weights of the second layer. vk 0 and wi 0 are threshold or bias weights which have a constant input signal of value 1. Equation (1.110) can also be written in vector form as y
V m W T V L Vx
(1.111)
where in this case
x
V m
VT
ª v10 «v « 20 « # « ¬ vL 0
ª1º «x » « 1» « x2 » « » «#» «¬ xn »¼ ª V 1 () º « V () » « 2 » « # » « » ¬V m () ¼
v11
v12
v21
v22
# vL1
# vL 2
" v1n º " v2 n » » # » » " vLn ¼
30
Nonlinear H2/H Constrained Feedback Control
WT
ª w10 «w « 20 « # « ¬ wm 0
w11
w12
w21
w22
# wm1
# wm 2
" w1L º " w2 L » » # » » " wmL ¼
A special form of the two-layer neural network is the “linear in parameters” neural network which has linear activation functions on the output layer and is given as y W T V L V T x
(1.112)
If the weights and thresholds of the first layer are predetermined, then only the second-layer weights and thresholds need to be tuned, and effectively the neural network has one layer of weights y W T V L V T x W T I ( x)
(1.113)
1.7.2 Function Approximation Theorems The neural network (1.113) is special and is known as the functional-link neural network. It can be thought of as a Fourier series and can be used to approximate any smooth function with all its derivatives. In general, I ( x ) in (1.113) needs to be a basis that must satisfy the following two requirements on a compact simply-connected set : of R n : 1. 2.
A constant function on : can be expressed as (1.113) with a finite number L of hidden-layer neurons. The functional range of (1.113) is dense in the space of continuous functions from : to R m for countable L .
For example, if I ( x) are only polynomial functions, then it has been shown by the High Order Weierstrass Approximation Theorem that polynomials can approximate uniformly a function and its derivatives uniformly.
Definition 1.15 (Uniform Convergence) A sequence of functions ^ f n ` converges uniformly to f on a set : if H ! 0, N (H ) : n ! N f n ( x) f ( x) H x : , is the absolute value. or equivalently sup f n ( x) f ( x) H , where x:
Theorem 1.15 (High-Order Weierstrass Approximation Theorem) Let f ( x) C m (:) in the compact set : , then there exists a polynomial, f N ( x) , such that it converges uniformly to f ( x) C m (:) , and such that all its partial derivatives up to order m converges uniformly.
Preliminaries and Introduction
31
Activation functions other than polynomials can also be used. In the literature, it has been noted that if I ( x) is built from sigmoid functions, then these sigmoid functions can approximate functions in a Sobolev space setting. Sobolev spaces are important to study the convergence of a neural network in approximating a function and its derivative at the same time.
Definition 1.16. (Sobolev Space H m , p (:) ) Let : be an open set in \ n and let u C m (:) . Define a norm on u by 1 p
u
m, p
§ · p D ¨ ³ D u ( x) dx ¸ , 1 d p f ¦ 0d D d m © : ¹
This is the Sobolev norm in which the integration is the Lebesgue integration. The completion of u C m (:) : u m , p f with respect to is the Sobolev space m, p H m , p (:) . For p 2 , the Sobolev space is a Hilbert space [6]. Therefore, a basis defined in the Sobolev space can uniformly approximate any function in the same Sobolev space, i.e. a continuous function and its derivatives. In much of the book, polynomials are used as the activation function of choice. However, other activation functions can also be used. The choice of the activation function depends on the complexity of the problem and the way the training of the neural network is taking place.
^
`
1.8 Bibliographical Notes A good introduction to nonlinear system theory is found in Khalil’s book [51] on which many of the stability theorems are based. For the special case of linear systems, one can consult the book by Kailath [49]. The dissipativity of dynamical systems in Section 1.3 was first introduced by Willems in [93][94]. See also the work of Hill and Moylan for the nonlinear case [40]. The book by Van der Schaft collects many of these results with an emphasis on Hamilton–Jacobi equations [91]. For the discrete-time case, dissipative dynamical systems and their related discrete-time Hamilton–Jacobi equations appears in [62]. For optimal control theory, one may consult the book by Lewis and Syrmos [61] and by Kirk [53]. The material in Section 1.5 is based on the concept of policy iterations established in Artificial Intelligence and Reinforcement Learning [22]. The application of policy iterations to optimal control can be traced back to [54], [85] and [58]. A rigorous treatment of its application to Riccati equations can be found in [57] and [96]. The H f optimal control theory is due to Zames [95]. It was put in a zero-sum game framework by Baúar [14]. This approach is well developed in two valuable textbooks [15][16]. Finally, a very good introduction to the material in Section 1.7 can be found in the book written by Lewis et al [60]. Proofs related to the function approximation property in the Sobolev space setting can be found in [41]. An introduction to Sobolev space can be found in [6] while the study of mathematical analysis and convergence analysis is introduced in [7].
2 Policy Iterations and Nonlinear H2 Constrained State Feedback Control
2.1 Introduction In this chapter, the constrained optimal control problem is studied through the framework of the Hamilton–Jacobi–Bellman equation (HJB). It is shown how to break the HJB equation formulated for constrained input systems into a sequence of Lyapunov equations that are easier to handle. Solution of the HJB equation is a challenging problem due to its inherently nonlinear nature. For linear systems with no constraints, the HJB equation results in the well-known Riccati equation used to derive linear state feedback control. However, even when the system is linear, the saturated control requirement makes the value function, and hence the required control law, nonlinear. In the general nonlinear case, the HJB equation generally cannot be solved explicitly. There has been a great deal of effort devoted to this issue. Approximate HJB solutions have been found using many techniques. In this presentation the focus is on solving the HJB equation for constrained input systems using the method of policy iterations. Policy iterations have been used with the HJB method using a sequence of nonlinear Lyapunov equations (LE) which can be thought of as the nonlinear counterpart of the matrix Lyapunov equation. The policy iterations method improves a given initial stabilizing control policy. This method reduces to the well-known Kleinman iterative method for solving the algebraic Riccati equation for linear systems using a sequence of Lyapunov equations. In summary, the objective of this chapter is to study the application of the policy iteration method to the HJB equation formulated using nonquadratic performance functionals to confront the saturation issue. For constrained-input systems, two optimal control problems are presented. The first is an optimal regulator, while the second is a minimum-time optimal controller. Therefore, in Section 2.2, the HJB equation for constrained input systems is introduced using nonquadratic performance functions. In Section 2.3, a LE is introduced that will be
34
Nonlinear H2/H Constrained Feedback Control
useful in implementing the policy iteration method and studying its convergence properties. It will be shown that instead of solving for the value function using the HJB directly, one can solve for a sequence of cost functions through the LE equation that converges uniformly to the value function that solves the HJB equation. In Section 2.4, it is shown how to construct a nonquadratic performance functional to address minimum-time and constrained-state problems. In the next chapter, neural networks are used to solve for the value function of the HJB equation derived in this chapter and hence to develope a framework to construct nearly optimal constrained state feedback controllers.
2.2 Optimal Regulation of Systems with Actuator Saturation Consider an affine in the control nonlinear dynamical system of the form x
f ( x) g ( x)u ( x)
(2.1)
where x \ n , f ( x) \ n , g ( x) \ num , and the input u U with the set
^u u ,! , u \
U
1
m
m
: D i d ui d E i , i 1,! , m`
and D i , E i are constants. Assume that f gu is Lipschitz continuous on a set : \ n containing the origin, and that the system (2.1) is stabilizable in the sense that there exists a continuous control on : that asymptotically stabilizes the system. It is desired to find u that minimizes a generalized nonquadratic functional V ( x0 )
³
f
0
[Q( x) W (u )]dt
(2.2)
where Q( x) and W (u ) are positive definite functions on : , x z 0 Q( x) ! 0 and x 0 Q ( x ) 0 . For unbounded control inputs, a common choice for W (u ) is W (u ) u cRu , where R \ mum . Note that in optimal control of nonlinear systems, the control u must not only stabilize the system on : , but also make the integral finite. Such controls are defined to be admissible [19].
Definition 2.1 (Admissible Controls) A control u is defined to be admissible with respect to (2.2) on : , denoted by u < (:) , if u is continuous on : ; u (0) 0 ; u stabilizes (2.1) on : ; x0 : , V ( x0 ) is finite. Equation (2.2) can be expanded as follows
Policy Iterations and Nonlinear H2 Constrained State Feedback Control
f
T
V ( x0 )
35
³ >Q( x) W (u )@ dt ³ >Q( x) W (u)@ dt T
0
T
³ >Q( x) W (u )@ dt V ( x(T )
(2.3)
0
If the cost function V is differentiable at x0 , then rewriting Equation (2.3)
V ( x0 ) V ( x(T )) T o0 T V lim
T
1 >Q( x) W (u )@ dt T o0 T ³ 0 lim
VxT f gu
Q( x) W (u )
(2.4)
Equation (2.4) is the infinitesimal version of Equation (2.2) and is a nonlinear Lyapunov equation given by VxT f gu Q W (u )
0, V (0)
0
(2.5)
where V ( x) can be thought of as a Lyapunov function for the admissible control policy u ( x) . The LE equation becomes the well-known HJB equation on substitution of the optimal control u * ( x)
1 R 1 g T Vx 2
(2.6)
where V * ( x) is the value function of the optimal control problem which solves the HJB equation 1 Vx T f Q Vx T gR 1 g T Vx 4 * V (0) 0
0
(2.7)
Note that the value function V ( x) obtained from (2.7) serves as a Lyapunov function on : for the optimal controller given by (2.6). To confront bounded controls, Lyshevski [70], [68] introduced a generalized nonquadratic functional u
W (u )
2³ I 1 (v) Rdv 0
where v \ m , I \ m and
(2.8)
36
Nonlinear H2/H Constrained Feedback Control
ª I (v1 ) º « # » I (v ) « » «¬I (vm ) »¼ ª I 1 (u1 ) º « » I (u ) « # » «I 1 (um ) » ¬ ¼ with I () a bounded one-to-one function that belongs to C p ( p t 1) and is L2 (:) . Moreover it is a monotonic odd function with its first derivative bounded by the constant M . An example of such a function is the hyperbolic tangent I () tanh() . R is a positive definite constant matrix and assumed to be symmetric for simplicity of analysis. Note that W (u ) is positive definite because I (u ) is monotonic odd and R is positive definite. Substituting (2.8) in (2.5), one obtains the following Lyapunov equation u
LE (V , u ) VxT f gu Q 2³ I 1 (v) Rdv
0, V (0)
0
(2.9)
0
Note that the LE equation (2.9) becomes the HJB equation upon substituting the constrained optimal state feedback control policy u ( x)
I I( 12 R 1 g T Vx )
(2.10)
where V ( x) solves the following HJB equation
T x
*
HJB(V ) V
f g I(
1 2
R g V ) Q 2 1
T
x
I ( 12 R 1 g cVx )
³ 0
V (0)
I 1 (v) Rdv
0
(2.11)
0
This is a nonlinear differential equation for which there may be many solutions. Existence and uniqueness of the value function has been shown in [66]. This HJB equation cannot generally be solved. There is no method suitable to solve this type of equation to find the value function of the associated optimal control problem. Moreover, current solutions are not well defined over a specific region in the state space.
Remark 2.1 Optimal control problems do not necessarily have smooth or even continuous value functions, [43][13]. In [63], using the theory of viscosity solutions, it is shown that for infinite horizon optimal control problems with unbounded cost functionals and under certain continuity and controllability assumptions of the dynamics, the value function is continuous, V ( x) C (:) . Moreover, if the Hamiltonian is strictly convex and if the continuous viscosity
Policy Iterations and Nonlinear H2 Constrained State Feedback Control
37
solution is semiconcave, then V ( x) C1 (:) [13] satisfying the HJB equation everywhere. Note that for affine input systems (2.1), the Hamiltonian is strictly convex if the system dynamics are not bilinear, and if the integrand of the performance functional (2.2) does not have cross-terms of the states and the input. In this chapter, all derivations are performed under the assumption of smooth solutions to (2.9) and (2.11) with all that this requires of necessary conditions. See [90][85] for a similar framework of solutions. If this smoothness assumption is relaxed, then one needs to use the theory of viscosity solutions [13] to show that the continuous cost solutions of (2.9) do converge to the continuous value function of (2.11).
2.3 Policy Iterations for Constrained-Input Systems It is important to note that the LE is linear in the cost function derivative, while the HJB is nonlinear in the value function derivative. Solving the LE for the cost function requires solving a linear partial differential equation, while the HJB equation solution involves a nonlinear partial differential equation, which may be impossible to solve. This is the reason for introducing the policy iteration technique for the solution of the HJB equation, which is based on a sound proof as will be seen in Lemma 2.1. Policy iterations using the LE have not yet been rigorously applied to bounded controls. In this section, it is shown that the policy iterations technique can be used for constrained controls when certain restrictions on the control input are met. The policy iteration technique is now applied to the new set of Equation (2.9), (2.10). The following lemma shows how Equation (2.10) can be used to improve a control policy. It will be required that the bounding function I () is nondecreasing.
Lemma 2.1 If u j < (:) , and V j C1 (:) satisfies the equation LE (V j , u j ) with the boundary condition V j (0) 0 , then the new control derived as u j 1 ( x)
I I( 12 R 1 g T Vx j )
0
(2.12)
is an admissible control for the system on : . Moreover, if the bounding function I () is monotone odd function, and V j 1 is the unique positive definite function satisfying equation LE (V j 1 , u j 1 ) 0 , with the boundary condition V j 1 (0) 0 , then V ( x) d V j 1 ( x) d V j ( x) x : .
Proof. To show the admissibility part, since V j C1 (:) , the continuity assumption on g implies that u j 1 is continuous. Since V j 1 is positive definite it attains a minimum at the origin, and thus, Vx j dV j dx must vanish there. This implies that u j 1 (0) 0 . Taking the derivative of V j along the solution trajectory of the system x f gu j 1 , one has Vj ( x,u j 1 ) Vx j T f Vx j T gu j 1
(2.13)
38
Nonlinear H2/H Constrained Feedback Control
uj
Vx j T f
Vx j T gu j Q 2 ³ I 1 (v) Rdv
(2.14)
0
Therefore Equation (2.13) becomes uj
Vj ( x,u j 1 )
Vx j T gu j Vx j T gu j 1 Q 2 ³ I 1 (v) Rdv
(2.15)
0
Since Vx jc g ( x)
2I 1c (u ) R , one has
Vj ( x,u j 1 )
uj ª 1 º T Q 2 «I (u j 1 ) R (u j u j 1 ) ³ I 1 (v) Rdv » «¬ »¼ 0
(2.16)
The second term in the previous equation is negative when I 1 , and hence I , is nondecreasing. To see this, note that the design matrix R is symmetric positive definite, this means that one can rewrite it as R /6/ where 6 is a triangular matrix with its values being the singular values of R , and / is an orthogonal symmetric matrix. Substituting for R in (2.16), one has Vj ( x,u j 1 )
uj ª º Q 2 «I 1 (u j 1 )T /6/ (u j u j 1 ) ³ I 1 (v)/6/dv » 0 ¬« ¼»
Applying the coordinate change u Vj ( x,u j 1 )
(2.17)
/ 1 z to (2.17)
Q I 1 (/ 1 z j 1 )/6/ (/ 1 z j / 1 z j 1 ) zj
2 ³ I 1 (/ 1] )/6// 1d ] 0
zj
Q I 1 (/ 1 z j 1 )T /6( z j z j 1 ) 2 ³ I 1 (/ 1] )/6d ] 0
zj
Q 2 ʌ( z j 1 )T 6( z j z j 1 ) 2 ³ ʌ(] )6d ]
(2.18)
0
where ʌ( z j )T I 1 (/ 1 z j )T / . Since 6 is a triangular matrix, one can now decouple the transformed input vector such that
Policy Iterations and Nonlinear H2 Constrained State Feedback Control
zk
Vj ( x,u j 1 )
39
j
Q 2 ʌ( z j 1 )T 6( z j z j 1 ) 2 ³ ʌ(] )6d ] 0
ª Q 2¦ 6 kk «S ( zk k 1 «¬
zk
m
j 1
)( zk j zk
j 1
)
j
º
³ S (] k )d] k » 0
»¼
(2.19)
Since the matrix R is positive definite, then one has the singular values 6 kk being all positive. Moreover, from the geometrical meaning of zk
S ( zk j 1 )( zk j zk j 1 )
j
³ S (]
k
)d ] k
0
this term is always negative if S () is monotone and odd. Because I () is monotone and odd one to one function, it follows that I 1 () is odd and monotone. Hence, since ʌ( z j )T I 1 (/ 1 z j )T / , it follows that S () is monotone and odd. This implies that V j ( x,u j 1 ) 0 and that V j ( x) is a Lyapunov function for u j 1 on : . Following Definition 2.1, u j 1 is admissible on : . To show the second part of the lemma, note that x0 along the solution trajectories of f gu j 1 , one has u j 1 x (W , x0 , u j 1 ) ½ ° ° 1 Q ( x ( W , x , u )) 2 I ( v ) Rdv ® ¾dW 0 j 1 ³0 ³0 °¯ °¿ u j x (W , x0 , u j ) f ½ ° ° 1 ³ ®Q( x(W , x0 , u j )) 2 I ( v ) Rdv ¾dW ³ 0° 0 °¿ ¯ f
V j 1 V j
f
³ Vx j 1T Vx j T ª¬ f gu j 1 º¼ dW
(2.20)
0
Because LE (V j 1 , u j 1 )
0 , LE (V j , u j )
0
uj
Vx j T gu j Q 2 ³ I 1 (v) Rdv
Vx j T f
(2.21)
0
u j 1 T x j 1
V
f
T x j 1
V
gu j 1 Q 2 ³ I 1 (v) Rdv 0
Substituting (2.21) and (2.22) in (2.20), one obtains
(2.22)
40
Nonlinear H2/H Constrained Feedback Control
V j 1 ( x0 ) V j ( x0 )
u j 1 f ½° ° 2 ³ ®I 1 (u j 1 )T R(u j 1 u j ) ³ I 1 (v) Rdv ¾ dW uj 0° ¯ ¿°
(2.23)
By decoupling Equation (2.23) using R /6/ , it can be shown that V j 1 ( x0 ) V j ( x0 ) d 0 when I () is nondecreasing. Moreover, it can be shown by , contradiction that V ( x0 ) d V j 1 ( x0 ) . The next theorem is a key result on which the rest of the chapter is justified. It shows that policy iterations on the saturated control law converges uniformly to the optimal saturated control law for the given actuator saturation model I () . Theorem 2.1 If u0 < (:) , then u j < (:), j t 0 . Moreover, V j o V , u j o u uniformly on : . Proof. From Lemma 2.1, it can be shown by induction that u j < (:), j t 0 . Furthermore, Lemma 2.1 shows that V j is a monotonically decreasing sequence and bounded below by V ( x) . Hence V j converges pointwise to Vf . Because : is compact, then uniform convergence follows immediately from Dini’s theorem [7]. Due to the uniqueness of the value function [61][66], it follows that Vf V . Controllers u j are admissible, therefore they are continuous having unique trajectories due to the locally Lipschitz continuity assumptions on the dynamics. Since (2.2) converges uniformly to V , this implies that the system’s trajectories converge x0 : . Therefore u j o uf uniformly on : . If dV j dx converges uniformly to dV dx , one concludes that uf u . To prove that dV j dx o dV dx uniformly on : , note that dV j dx converges uniformly to some continuous function J . Since V j o V uniformly and dV j dx exists j , it follows that the sequence dV j dx is term-by-term differentiable [7], and J dV dx . ,
The following guarantees that improving the control law does not reduce the region of asymptotic stability of the initial saturated control law. Corollary 2.1 If : denotes the region of asymptotic stability (RAS) of the constrained optimal control u , then : is the largest region of asymptotic stability of any other admissible control law. Proof. The proof is by contradiction. Lemma 1 showed that the saturated control u is asymptotically stable on : 0 , where : 0 is the stability region of the saturated control u0 . Assume that uLargest is an admissible controller with the largest region of asymptotic stability : Largest . Then, there is x0 : Largest , x0 : . From Theorem 2.1, x0 : which completes the proof. ,
Note that there may be stabilizing saturated controls that have larger stability regions than u * , but are not admissible with respect to Q( x ) and the system ( f , g) .
Policy Iterations and Nonlinear H2 Constrained State Feedback Control
41
2.4 Nonquadratic Performance Functionals for Minimum-time and Constrained States Control In this section, various performance functionals are developed to encode different constraints on the dynamical system. 2.4.1 Minimum-time Problems
For a system with saturated actuators, one may be interested in finding the control signal required to drive the system to the origin in minimum time. This requirement can be addressed by the following nonquadratic performance functional f
V
u ª º T tanh( x Qx ) 2 « ³0 ¬ ³0 I (v) Rdv »¼ dt
(2.24)
By choosing the coefficients of the weighting matrix R very small, and for xcQ x 0 , the performance functional becomes, ts
V
³ 1 dt
(2.25)
0
and for xcQ x | 0 , the performance functional becomes f
V
u ª T º x Qx 2 « ³t ¬ ³0 I (v) Rdv »¼ dt s
(2.26)
Equation (2.25) usually represents performance functionals used in minimumtime optimization because the only way to minimize (2.25) is by minimizing ts . Around the time ts , one has the performance functional slowly switching to a nonquadratic regulator that takes into account the actuator saturation. Note that this method allows easy formulation of a minimum-time problem, and that the solution will follow using the policy iteration technique. The solution is a nearly minimumtime controller that is easier to find compared with techniques aimed at finding the exact minimum-time controller. Finding an exact minimum-time controller requires finding a bang–bang controller based on a switching surface that is in most cases hard to determine [61], [53]. 2.4.2 Constrained States
In literature, there exist several techniques that finds a domain of initial states such that starting within this domain guarantees that a specific control policy will not violate the state constraints [35]. However, one is interested in improving given
42
Nonlinear H2/H Constrained Feedback Control
control laws so that they do not violate specific state-space constraints. For this the following nonquadratic performance functional can be chosen:
Q ( x, k )
nc § xl · xT Qx ¦ ¨ ¸ l 1 © Bl D l ¹
2k
(2.27)
where nc , Bl , are the number of constrained states and the upper bound on xl respectively. The integer k is positive, and D l is a small positive number. As k increases, and D l o 0 , the nonquadratic term will dominate the quadratic term when the state-space constraints are violated. However, the nonquadratic term will be dominated by the quadratic term when the state-space constraints are not violated. Note that in this approach, the constraints are considered soft constraints that can be hardened by using higher values for k and smaller values for D l .
2.5
Bibliographical Notes
The HJB equation used in this chapter was derived originally in [70]. Approximate HJB solutions have been found using many techniques such as those developed by Saridis [85], Beard [18][19], Lendaris [76], Lee [59], Bertsekas and Tsitsiklis [22], Munos [75], Lewis and Kim [52], Balakrishnan [38], [64], [65], Lyshevski [67][68][69][70], and Huang [46]. Policy iterations have been used with the HJB method in [17] and [85] using a sequence of nonlinear Lyapunov equations (LE), which can be thought of as the nonlinear counterpart of the matrix Lyapunov equation [61]. In [85], Saridis et al. developed a policy iteration method that improves a given initial stabilizing control policy. This method reduces to the wellknown Kleinman iterative method for solving the algebraic Riccati equation for linear systems [54] using a sequence of Lyapunov equations.
3 Nearly H2 Optimal Neural Network Control for Constrained-Input Systems
For nonlinear systems, it is unclear how to solve the LE equation explicitly. The solution technique described in this chapter combines the policy iteration method with the method of weighted residuals to get a neural network least squares solution of the HJB that is formulated using a nonquadratic functional to encode constraints on the input. Although Equation (2.9) is linear in the Lyapunov function V j ( x ) for any admissible control policy u j ( x) , it is still difficult to solve for the cost function V j ( x) . Therefore, in Section 3.1 neural networks are used to approximate the solution of the Lyapunov equation V j ( x ) at each policy iteration j to solve in a least squares sense over : \ n . Convergence of this neural network solution technique is presented in Sections 3.2 and 3.3. In Section 3.4, a mesh is introduced on : . This yields an efficient, practical and computationally tractable solution algorithm for general nonlinear systems with saturated controls. Finally, Section 3.5 utilizes some numerical examples to demonstrate the techniques presented in this chapter and serves as a tutorial for the proposed nearly optimal neural network approach. Section 3.6 presents an alternative policy iteration method that does not require solving a Lyapunov equation.
3.1 A Neural Network Solution to the LE(V,u) It is well known that neural networks can be used to approximate smooth functions on prescribed compact sets [60]. Since our analysis is restricted to a set within the stability region, neural networks are naturally suited to our application. Therefore, to successively solve (2.9), (2.10) for bounded controls, one can approximate V j with
Vˆj ( x)
L
¦w
k j
k 1
V k ( x) w j T ı L ( x)
(3.1)
44
Nonlinear H2/H Constrained Feedback Control
which is a neural network with activation functions V k ( x) C1 (:) , V j (0) 0 . The neural network weights are wk j and L is the number of hidden-layer neurons. T T Vectors ı L ( x ) { >V 1 ( x) V 2 ( x) " V L ( x) @ , w j { ª¬ w1 j w2 j " wL j º¼ are the vector activation function and the vector weight respectively. The neural network weights will be tuned to minimize the residual error in a least squares sense over a set of points within the stability region : of the initial stabilizing control policy. The least squares solution attains the lowest possible residual error with respect to the weights of the neural network. For LE (V , u ) 0 , the solution V is replaced with VL having a residual error § LE ¨ Vˆ ( x) ©
L
¦w V k
k
k 1
· ( x), u ¸ ¹
eL ( x)
(3.2)
To find the least squares solution, the method of weighted residuals is used [31]. The weights w L are determined by projecting the residual error onto deL ( x ) dw L and setting the result to zero x : using the inner product, i.e.
deL ( x) , eL ( x) dw where f,g
³ fgdx
0
(3.3)
is a Lebesgue integral. Equation (3.3) becomes
:
ı L ( f gu ), ı L ( f gu ) w u
Q 2³ I 1 (v) Rdv, ı L ( f gu )
0
(3.4)
0
The following technical results are needed. L
Lemma 3.1 If the set ^V k `1 is linearly independent and u < (:) , then the set L
T ° dV k °½ ( f gu ) ¾ ® °¯ dx °¿1
(3.5)
is also linearly independent.
Proof. See [19].
,
Owing to Lemma 3.1, ı L ( f gu ), ı L ( f gu ) is of full rank, and thus is invertible. Therefore a unique solution for w exists and is computed as
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
w
ı L ( f gu ), ı L ( f gu )
45
1
u
Q 2³ I 1 (v) Rdv, ı L ( f gu )
.
(3.6)
0
Having solved for the neural net weights, the improved control is given by uˆ
§1 · I ¨ R 1 g ( x)T ı LT w ¸ 2 © ¹
(3.7)
Equations (3.6) and (3.7) are successively solved at each policy iteration i until convergence.
3.2 Convergence of the Method of Least Squares to the Solution of the LE(V,u) In what follows, convergence results associated with use of the method of approach to solve the LE equation for the cost function using the Fourier series expansion (3.1) are shown. Before this, the following notation and definitions associated with convergence issues are considered. Definition 3.1 Convergence in the mean: A sequence of functions ^ f n ` that is Lebesgue-integrable on a set : , L2 (:) , is said to converge in the mean to f on : if H ! 0, N (H ) : n ! N f n ( x) f ( x) where f
2 L2 ( : )
L2 ( : )
H
f, f .
The convergence proofs for the least squares method is done in the Sobolev function space setting. This space allows one to define functions that are L2 (:) with their partial derivatives. The LE equation can be written using the linear operator A defined on the Hilbert space H 1,2 (:) AV
VxT f gu
P
Q W (u )
L
In [71], it is shown that if the set ^V j ` is complete and the operator A and its 1 inverse are bounded, then AVˆ AV
L2 ( : )
o 0 and Vˆ V
L2 ( : )
o0
46
Nonlinear H2/H Constrained Feedback Control
However, for the LE equation, it can be shown that these sufficiency conditions are violated. Neural networks based on power series have an important property that they are differentiable. This means that they can approximate uniformly a continuous function with all its partial derivatives of order m using the same polynomial, by differentiating the series termwise. This type of series is m -uniformly dense. This is known as the High Order Weierstrass Approximation theorem. Other types of neural networks not necessarily based on power series that are m -uniformly dense are studied in [41].
Lemma 3.2 High Order Weierstrass Approximation Theorem: Let f ( x) C m (:) in the compact set : , then there exists a polynomial, f N ( x) , such that it converges uniformly to f ( x) C m (:) , and such that all its partial derivatives up to order m converges uniformly, [31], [41]. Lemma 3.3 Given N linearly independent sets of functions 2
DN fN
o 0 DN
L2 ( : )
2
^ f n ` . Then
o0
l2
Proof. To show the sufficiency part, note that the Gram matrix, G f N , f N , is 2 positive definite. Therefore, D NT GN D N t O (GN ) D N l , O (GN ) ! 0 N . If 2 2 D N cGN D N o 0 , then D N l D N cGN D N O (GN ) o 0 because O (GN ) ! 0 N . 2 To show the necessity part, note that
DN
2 L2 ( : )
2 DN fN
2 DN fN
2
2
2
L2 ( : )
DN
L2 ( : )
fN
2
fN
2
L2 ( : )
DN fN
2
DN fN
2
L2 ( : ) L2 ( : )
L2 ( : ) L2 ( : )
Using the Parallelogram Law 2
DN fN
L2 ( : )
DN fN
2
2
2 DN
L2 ( : )
L2 ( : )
2 fN
2 L2 ( : )
As N o f
DN fN
2 L2 ( : )
DN fN
o0
2 2 D N L (:) 2 f N
2 L2 ( : )
2
DN fN
2
DN fN
2
L2 ( : ) L2 ( : )
2 L2 ( : )
o fN
2
o fN
2
L2 ( : ) L2 ( : )
As N o f o fN
o0
2 DN fN
2 L2 ( : )
DN
2 L2 ( : )
fN
2 L2
2
L2 ( : )
2 D N f N L (:) o 0 (:) 2
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
Therefore, D N
2 l2
o 0 DN fN
2 L2 ( : )
47
,
o0.
Before discussing the convergence results for the method of least squares, the following four assumptions are needed. Assumption 3.1 The LE solution is positive definite. This is guaranteed for stabilizable dynamics and when the performance functional satisfies zero-state observability. Assumption 3.2 The system’s dynamics and the performance integrands Q ( x) W u ( x) are such that the solution of the LE is continuous and differentiable, therefore, belonging to the Sobolev space V H 1,2 (:) . f
Assumption 3.3 One can choose complete coordinate elements ^V j ` H 1,2 (:) 1 and its partial derivatives such that the solution V H 1,2 (:) ^wV wx1 ,!f , wV wxn ` can be approximated uniformly by the infinite series built from ^V j ` . 1
Assumption 3.4 The sequence ^\ j
AV j ` is linearly independent and complete.
In generalf the infinite series, constructed from the complete coordinate elements ^V j ` , need not be differentiable. However, from Lemma 3.1 and [41], it 1 is known that several types of neural networks can approximate a function and all its partial derivatives uniformly. Linear independence of ^\ j ` follows from Lemma 3.1. While completeness follows from Lemma 3.2 and [41], L : Vˆ V H and k wVˆ wxk wV wxk H
V , H
This implies that L o 0 sup AVˆ AV o 0 AVˆ AV x:
L2 ( : )
o0
and therefore completeness of the set ^\ j ` is established. The next theorem uses these assumptions to conclude convergence results of the least squares method which is placed in the Sobolev space H 1,2 (:) . Theorem 3.1 If assumptions 3.1–3.4 hold, then approximate solutions exist for the LE equation using the method of least squares and are unique for each L . In addition, the following results are achieved: o0 R1) LE (Vˆ ( x)) LE (V ( x)) L2 ( : )
R2) Vˆx Vx
L2 ( : )
o0
48
Nonlinear H2/H Constrained Feedback Control
R3) VxT f 4J1 2 VxT kk T Vx hT h d 0, V (0)
0
Proof. Existence of a least squares solution for the LE equation can easily be shown. The least squares solution VL is nothing but the solution of the minimization problem
AVˆ P
2
min A3 P
2
3S L
min w T ȥ L P
2
w
where S L is the span of ^V 1 ,! , V L ` . Uniqueness follows from the linear independence of ^\ 1 ,! ,\ L ` . The first results, R1, follows from the completeness of ^\ j ` .To show the second result, R2, write the LE equation in terms of its series expansion on : with coefficients c j 0
§ LE ¨ Vˆ ©
L
¦wV i
i 1
i
· ¸ LE (V ¹
(w c L )T ı L f gu
f
¦c V ) i
i
H L ( x)
i 1
L ( x)
e T f dV i H L ( x) ¦ ci f gu . dx i L 1
Note that eL ( x) converges uniformly to zero due to Lemma 3.2, and hence converges in the mean. On the other hand H L ( x) is shown to converge in the mean to zero using the least squares method as seen in R1. Therefore, (w c L )T ı L ( f gu )
2
H L ( x) eL ( x)
L2 ( : )
d 2 H L ( x)
2 L2 ( : )
2 L2 ( : )
2 eL ( x)
Because
ı L f gu is linearly independent, using Lemma 3.3, one concludes that w cL
2 l2
o0
Therefore, because the set
^d V i
d x`
is linearly independent, one concludes from Lemma 3.3 that
2 L2 ( : )
o0
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
(w c L )T ı L
2
49
o0
L2 ( : )
Because the infinite series with c j converges uniformly it follows that Vˆx Vx
L2 ( : )
o0
Finally, the third result, R3, follows by noting that g ( x ) is continuous and therefore bounded on : , this implies using R2 that 1 R 1 g T (Vˆx Vx ) 2
2
L2 ( : )
1 d R 1 g T 2
2
(Vˆx Vx ) L2 ( : )
2 L2 ( : )
o0
Denote 1 2
1 2
Dˆ k ( x) g k T Vˆx , D k ( x) g k T Vx
uL u
1 1 I g T Vˆx ) I g T Vx ) 2 2 ª I Dˆ1 ( x) I D1 ( x) º « » # « » «¬I Dˆ m ( x) I D m ( x) »¼
Because I () is smooth, and under the assumption that its first derivative is bounded by a constant M, then one has
I (Dˆ j ) I (D j ) d M (Dˆ j ( x) D j ( x)) therefore
Dˆ j ( x) D j ( x)
L2 ( : )
o 0 I (Dˆ j ) I (D j )
L2 ( : )
o0 ,
hence R3 follows. Corollary 3.1 If the results of Theorem 3.1 hold, then
sup Vˆx Vx o 0, sup Vˆ V o 0, sup uˆ u o 0 x:
x:
x:
Proof. As the coefficients of the neural network, w j , series converge to the 2 coefficient of the uniformly convergent series, c j , w c L l 2 o 0 . And since the mean error goes to zero in R2 and R3, hence uniform convergence follows. ,
50
Nonlinear H2/H Constrained Feedback Control
The next theorem is required to show the admissibility of the controller derived using the technique presented in this chapter. Corollary 3.2 Admissibility of uˆ( x) :
M : L t M , uˆ < (:) Proof. Consider the following LE equation: u j 1
Vj ( x,uˆ j 1 )
Q 2I 1 (u j 1 )T R (uˆ j 1 u j 1 ) 2
³I
1
(v) Rdv
0
d0
uj
2 ³ I 1 (v) Rdv 2I 1 (u j 1 )T R (u j u j 1 ) u j 1
Since uˆ j 1 is guaranteed to be within a tube around u j 1 because uˆ j 1 o u j 1 uniformly. Therefore one can easily see that u j 1
I 1 (u j 1 )T Ruˆ j 1 t 1/ 2 I 1 (u j 1 )T Ru j 1 D
³I
1
Rdv
0
with D ! 0 is satisfied x :{x : x :1 (H L )} where :1 (H L ) : containing the origin. Hence Vj ( x,uˆ j 1 ) 0 x :{x : x :1 (H L )} . Given that uˆ j 1 (0) 0 , and from the continuity of uˆ j 1 , there exists : 2 (H L ) :1 (H L ) containing the origin for which Vj ( x,uˆ j 1 ) 0 . As L increases, :1 (H L ) gets smaller while : 2 (H L ) gets larger and the inequality is satisfied x : . Therefore, L0 : L t L0 , Vj ( x,uˆ j 1 ) 0 x : and hence uˆ < (:) . , Corollary 3.3 (Positive Definiteness of Vˆ ( x) ): Vˆ ( x ) Vˆ ( x) ! 0 .
0 x
0 , elsewhere
Proof. The proof is going to be by contradiction. Assuming that u < (:) , then Lemma 3.1 is satisfied. Therefore w
ı L ( f gu ), ı L ( f gu )
1
u
Q 2 ³ I 1 (v) Rdv, ı L ( f gu ) 0
Assume also that L
xa z 0, s.t.
¦w V j
j 1
Then,
j
( xa )
w T ı L ( xa )
0
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
51
T
u
Q 2 ³ I 1 (v) Rdv, ı L ( f gu )
ı L ( f gu ), ı L ( f gu )
1T
ı L ( xa )
0
0
Note that because Lemma 3.1 is satisfied then
ı L ( f gu ), ı L ( f gu )
1
is a positive definite constant matrix. This implies that T
u
Q 2³ I 1 (v ) Rdv, ı L ( f gu )
ı L ( xa )
0
0
One can expand this matrix representation into a series form, T
u 1
Q 2 ³ I (v) Rdv, ı L ( f gu )
ı L ( xa )
0
u
L
¦ j 1
Q 2 ³ I 1 (v) Rdv, 0
dV j dx
T
( f gu ) V j ( xa )
0 Note that u
Q 2³ I 1 (v) Rdv, 0
dV j dx
T u °§ · ½° · § dV j 1 ¨ 2 I ( ) ( f gu ) ¸ ¾dx Q v Rdv ¸¨ ³: ®¨© ³0 ¸ ¹ © dx °¯ ¹ ¿°
T
( f gu )
Thus T u °§ · ½° · § dV j 1 ¨ 2 I ( ) ( f gu ) ¸ ¾dx V j ( xa ) Q v Rdv ¸¨ ®¨ ¦ ³ ³ ¸ j 1 : °© 0 ¹ © dx ¹ °¿ ¯ L
0
Using the mean value theorem, [ : such that u º ° ª °½ T 1 Q 2 ® « ³: °¬ ³0 I (v) Rdv »¼ u ª¬ı L ( xa )ı L ( f gu)º¼ ¾°dx ¯ ¿
° ª
u
º
°½
¼
¿°
P (:) ® «Q 2³ I 1 (v) Rdv » u ª¬ıTL ( xa )ı L ( f gu ) º¼ ¾ [ ¯° ¬
0
where P (:) is the Lebesgue measure of : . This implies that
52
Nonlinear H2/H Constrained Feedback Control
0
L
ª§
u
j 1
¬«©
0
· dV j
¦ P (:) «¨ Q 2³ I1 (v) Rdv ¸
¹ dx
u
ª
º
L
¼
j 1
T
º ( f gu ) » ([ ) u V j ( xa ) ¼»
ª dV j T
P (:) «Q 2³ I 1 (v) Rdv » ([ ) ¦ « ¬
0
«¬ dx
º ( f gu ) » ([ ) u V j ( xa ) »¼
This implies that L
ª dV j T
¦« j 1
«¬ dx
º ( f gu ) » ([ ) u V j ( xa ) »¼
0
Now, one can select a constant V j ( xa ) to be equal to a constant c j . Thus one can rewrite the above formula as L
¦ j 1
ª dV j T º ( f gu ) » ([ ) cj « ¬« dx ¼»
0
Since [ depends on : , which is arbitrarily, this means that, dV j dx f gu is , not linearly independent, which contradicts our assumption.
Corollary 3.4 It can be shown that sup uˆ ( x) u ( x ) o 0 x:
implies that sup J ( x) V ( x) o 0 x:
where LE ( J , uˆ )
0 , LE (V , u )
0.
3.3 Convergence of the Method of Least Squares to the Solution of the HJB Equation In this section, a theorem analogous to Theorem 3.1 which guarantees that least squares policy iterations converge to the value function of the HJB equation (2.11) is presented.
Theorem 3.2 Under the assumptions of Theorem 3.1, the following are satisfied j t 0 : i. sup Vˆ V o 0 x:
j
j
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
ii.
53
sup uˆ j 1 u j 1 o 0 x:
iii. N : L t N , uˆ j 1 < (:)
Proof. This proof is by induction. Basis Step: Using Corollary 3.1 and 3.2, it follows that for any u0 < (:) , one has A. sup Vˆ V o 0 x:
B.
0
0
sup uˆ1 u1 o 0 x:
C.
N : L t N , uˆ1 < (:)
Inductive Step: Assume that a. sup Vˆ V o 0 x:
b.
j 1
j 1
sup uˆ j u j o 0 x:
c.
N : L t N , uˆ j < (:)
If J j is such that LE ( J j , uˆ j ) 0 . Then from Corollary 3.1, J j can be uniformly approximated by Vˆj . Moreover from assumption b and Corollary 3.4. It follows that as uˆ j o u j uniformly, then J j o V j uniformly. Therefore Vˆj o V j uniformly. Because Vˆj o V j uniformly, then uˆ j 1 o u j 1 uniformly by Corollary 3.1. From Corollary 3.2, M : L t M uˆ j 1 < (:) . , Hence the proof by induction is complete. The next theorem is an important result upon which the algorithm proposed in Figure 3.1 in the next section is based.
Theorem 3.3 H ! 0, M , N : j t M , L t N the following is satisfied A. sup Vˆ V H x:
B.
j
sup uˆ j u H x:
C.
uˆ j < (:)
Proof. The proof follows directly from Theorem 2.1 and Theorem 3.2.
,
54
Nonlinear H2/H Constrained Feedback Control
3.4 Algorithm for Nearly Optimal Neurocontrol Design with Saturated Controls: Introducing a Mesh in \n Solving the integration in (3.6) is expensive computationally. However, an integral can be approximated by replacing the integral with a summation series over a mesh of points on the integration region. This results in a nearly optimal, computationally tractable solution procedure. By introducing a mesh on :, with mesh size equal to 'x, one can rewrite some terms of (3.6) as follows:
X
«ı ( f gu ) x1 «¬ L
" ı L ( f gu ) x »» p ¼
u « «Q 2 I 1 (v) Rdv ³0 « x1 ¬
Y
T
» " Q 2 ³ I 1 (v) Rdv » » 0 xp ¼
(3.8)
T
u
(3.9)
where p in x p represents the number of points of the mesh. This number increases as the mesh size is reduced. Note that ı L ( f gu ), ı L ( f gu )
lim X T X 'x
'x o 0
u
Q 2³ I 1 (v) Rdv, ı L ( f gu ) 0
lim X T Y 'x
(3.10)
'x o 0
This implies that one can calculate w L as
w
( X T X ) 1 ( X T Y )
(3.11)
One can also use Monte Carlo integration techniques in which the mesh points are sampled stochastically instead of being selected in a deterministic fashion, [30]. This allows a more efficient numerical integration technique. In any case, however, the numerical algorithm at the end requires solving (3.11), which is a least squares computation of the neural network weights. Numerically stable routines that compute equations like (3.11) do exists in several software packages like MATLAB, which is used to perform the simulations in this chapter. A flowchart of the proposed computational algorithm presented in this chapter is shown in Figure 3.1. This is an offline algorithm run a priori to obtain a neural network feedback controller that is a nearly optimal solution to the HJB equation for the constrained control input case. The neurocontrol law structure is shown in
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
55
Figure 3.2. It is a neural network with activation functions given by ı , multiplied by a vector function of the system’s state variables. Start
Initialization
L : Number of neurons or activation function. ı L : Neurons. p: Number of mesh points. u0 : Initial asymptotically stable control. M : Number of policy iterations. :: The neural network region of approximation. Q( x ), R: Performance criteria.
j
Xj Yj
«ı ( f gu ) j x «¬ L 1
0
" ı L ( f gu j )
xp
» »¼
T
uj uj « » «Q 2 I 1 (v) Rdv " Q 2 I 1 (v ) Rdv » ³ ³ « » 0 0 x1 xp ¼ » ¬« iT i 1 iT i w j ( X j X j ) ( X j Y j )
j o j 1, uˆ j 1
T
I( 12 g T ı LT w j ). No
j!M
Yes
Finish
Figure 3.1. Policy iterations algorithm for nearly optimal saturated neurocontrol
56
Nonlinear H2/H Constrained Feedback Control
Figure 3.2. Neural-network-based nearly optimal saturated control law
3.5 Numerical Examples The power of the neural network control technique to find nearly optimal nonlinear saturated controls for general systems is demonstrated using five examples. In all these examples, neural networks are based on polynomials, i.e. Volterra neural networks are used. 3.5.1 Constrained-input Linear System
The algorithm obtained is applied to the following linear system x1 x2
2 x1 x2 x3
x3
x3 u1
x1 x2 u2
It is desired to control the system with input constraints u1 d 3, u2 d 20 . This system when uncontrolled has eigenvalues with positive real parts. This systems is not asymptotically null controllable, therefore global asymptotic stabilization cannot be achieved [88]. The algorithm developed in this chapter is used to derive a nearly optimal neurocontrol law for a specified region of stability around the origin. The following smooth function is used to approximate the value function of the system:
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
V21 ( x1 , x2 , x3 )
57
w1 x12 w2 x22 w3 x32 w4 x1 x2 w5 x1 x3 w6 x2 x3 w7 x14 w8 x24 w9 x34 w10 x12 x22 w11 x12 x32 w12 x22 x32 w13 x12 x2 x3 w14 x1 x22 x3 w15 x1 x2 x32 w16 x13 x2 w17 x13 x3 w18 x1 x23 w19 x1 x33 w20 x2 x33 w21 x23 x3
Selecting the approximation for V ( x) is usually a natural choice guided by engineering experience and intuition. With this selection, one guarantees that V (0) 0 . This is a neural net with polynomial activation functions, a Volterra neural network. It has 21 activation functions containing powers of the state variable of the system up to the 4th power. Neurons with 4th-order power of the states variables were selected because for neurons with 2nd-order power of the states, the algorithm did not converge. Moreover, it is found that 6th power polynomials did not improve the performance over 4th power ones. The number of neurons required is chosen to guarantee uniform convergence of the algorithm. If fewer neurons are used, then the algorithm might not properly approximate the cost function associated with the initial stabilizing control, and thus the improved control using this approximated cost might not be admissible. The activation functions for the neural network neurons selected in this example satisfy the properties of activation functions discussed in Section 3.1 and [60]. To initialize the algorithm, a stabilizing control is needed. It is very easy to find this using a linear quadratic regulator (LQR) for unconstrained controls. In this case, the performance functional is f
³x
2 1
x22 x32 u12 u22 dt
0
Solving the corresponding Riccati equation, unconstrained state feedback control is obtained:
the
u1
8.31x1 2.28 x2 4.66 x3
u2
8.57 x1 2.27 x2 2.28 x3
following
stabilizing
However, when the LQR controller works through saturated actuators, the stability region shrinks. Further, this optimal control law derived for the linear case will not be optimal anymore working under saturated actuators. Figure 3.3 shows the performance of this controller assuming working with unsaturated actuators for the initial conditions xi (0) 1.2, i 1, 2,3 . Figure 3.4 shows the performance when this control signal is bounded by u1 d 3, u2 d 20 . Note how the bounds destroy the performance.
58
Nonlinear H2/H Constrained Feedback Control
State trajectory for LQR control law 2 x1
Systems states
1 0 x2
-1 -2
x3 -3
0
2
4 Time (s) 6
8
10
LQR control signal 5 u1
Control input u(x)
0 u2 -5
-10
-15
-20
0
2
4 Time (s) 6
8
10
Figure 3.3. LQR optimal unconstrained control
In order to model the saturation of the actuators, a nonquadratic cost performance term (2.8) is used as explained before. To show how to do this for the general case of u d A , it is assumed that the function I (cs ) is given as A * tanh(1/ A cs ) , where cs is assumed to be the command signal to the actuator. Figure 3.5 shows this for the case u d 3 .
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
59
State trajectory for LQR control law with bounds 3 x1
2
Systems states
1 0 -1 x3
-2 -3 -4 -5
x2 0
2
4 Time (s) 6
8
10
The Initial stabilizing control: LQR control signal with bounds 5 u1
Control input u(t)
0 u2 -5
-10
-15
-20
0
2
4
6
8
Time (s) Figure 3.4. LQR control with actuator saturation
Following that, the nonquadratic cost performance is calculated to be u
W (u )
2 ³ A tanh 1 (v / A) Rdv 0
2 A u R u u u tanh 1 (u / A) A2 u R u ln 1 u 2 / A2
10
60
Nonlinear H2/H Constrained Feedback Control
This nonquadratic cost performance is then used in the algorithm to calculate the optimal bounded control. The improved bounded control law is found using the technique presented in the previous section. The algorithm is run over the region 1.2 d x1 d 1.2, 1.2 d x2 d 1.2, 1.2 d x3 d 1.2 with the design parameters R I 2 x 2 , Q I 3 x 3 . This region falls within the region of asymptotic stability of the initial stabilizing control. Methods to estimate the region of asymptotic stability are discussed in [51]. Modeled vs. true actuator saturation Saturated output of the actuator
5 True saturation Modeled saturation
0
-5 -10
-5
0 Command signal
5
10
Figure 3.5. Model of saturation
After 20 policy iterations, the algorithm converges to
u1
§ 7.7 x1 2.44 x2 4.8 x3 2.45x13 2.27 x12 x2 ½ · ¨1° °¸ 3 tanh ¨ ®3.7 x1 x2 x3 0.71x1 x22 5.8 x12 x3 4.8 x1 x32 ¾ ¸ ¨¨ 3 ° ° ¸¸ 3 2 2 3 ¿¹ © ¯0.08 x2 0.6 x2 x3 1.6 x2 x3 1.4 x3
u2
§ 9.8 x1 2.94 x2 2.44 x3 0.2 x13 0.02 x12 x2 ½ · ¨ 1 ° °¸ 20 tanh ¨ ®1.42 x1 x2 x3 0.12 x1 x22 2.3 x12 x3 1.9 x1 x32 ¾ ¸ ¨¨ 20 ° ° ¸¸ 3 2 2 3 ¿¹ © ¯0.02 x2 0.23 x2 x3 0.57 x2 x3 0.52 x3
This is a nearly optimal saturated control law in feedback strategy form. It is given in terms of the state variables and a neural net following the structure shown in Figure 3.2. The suitable performance of this saturated control law is revealed in Figure 3.6.
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
61
State trajectory for the nearly optimal control law 4 x1
Systems states
2
0 x3
-2
-4 x2 -6
0
2
4 Time (s) 6
8
10
Nearly optimal control signal with input constraints 5 u1 Control input u(t)
0
-5
u2
-10
-15
-20
0
2
4 Time (s) 6
8
Figure 3.6. Nearly optimal nonlinear neural control law
10
62
Nonlinear H2/H Constrained Feedback Control
3.5.2 Nonlinear Oscillator with Constrained Input Consider the nonlinear oscillator having the dynamics x1
x1 x2 x1 ( x12 x22 )
x2
x1 x2 x2 ( x12 x22 ) u
It is desired to control the system with control limits of u d 1 . The following smooth function is used to approximate the value function of the system,
V24 ( x1 , x2 )
w1 x12 w2 x22 w3 x1 x2 w4 x14 w5 x24 w6 x13 x2 w7 x12 x22 w8 x1 x23 w9 x16 w10 x26 w11 x15 x2 w12 x14 x22 w13 x13 x23 w14 x12 x24 w15 x1 x25 w16 x18 w17 x28 w18 x17 x2 w19 x16 x22 w20 x15 x23 w21 x14 x24 w22 x13 x25 w23 x12 x26 w24 x1 x27
This neural net has 24 activation functions containing powers of the state variable of the system up to the 8th power. In this example, the order of the neurons is higher than in the previous example to guarantee uniform convergence. The complexity of the neural network is selected to guarantee convergence of the algorithm to an admissible control law. When only up to the 6th-order powers are used, convergence of the iteration to admissible controls was not observed. The unconstrained state feedback control u 5 x1 3 x2 , is used as an initial stabilizing control for the iteration. This is found after linearizing the nonlinear system around the origin, and building an unconstrained state feedback control which makes the eigenvalues of the linear system all negative. Figure 3.7 shows the performance of the bounded controller u sat11 5 x1 3 x2 , when running it through a saturated actuator for x1 (0) 0, x2 (0) 1 . The nearly optimal saturated control law is now found through the technique presented in Figure 3.1. R 1, Q I 2 x 2 and the algorithm is run over the region 1 d x1 d 1 1 d x2 d 1 After 20 policy iterations, the nearly optimal saturated controller is found to be
u
§ 2.6 x1 4.2 x2 0.4 x23 4.0 x13 8.7 x12 x2 8.9 x1 x22 · ¨ ¸ 5 5 4 3 2 2 3 4 ¨ 5.5 x2 2.26 x1 5.8 x1 x2 11x1 x2 2.6 x1 x2 2.00 x1 x2 ¸ tanh ¨ 7 7 6 5 2 4 3 3 4¸ ¨ 2.1x2 0.5 x1 1.7 x1 x2 2.71x1 x2 2.19 x1 x2 0.8 x1 x2 ¸ ¨ 1.8 x 2 x5 0.9 x x 6 ¸ 1 2 1 2 © ¹
This is the control law in terms of a neural network following the structure shown in Figure 3.2. The performance of this saturated control law is shown in Figure 3.8. Note that the states and the saturated input in Figure 3.8 have fewer oscillations when compared to those of Figure 3.7.
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
63
State trajectory for initial stabilizing control 1 x1 Systems states
0.5
x2
0
-0.5
-1
0
5
10
15
20 Time (s)
25
30
35
40
35
40
Control signal for the initial stabilizing control 1
Control input u(t)
0.5
0
-0.5
-1
0
5
10
15
20 Time (s)
25
30
Figure 3.7. Performance of the initial stabilizing control when saturated
64
Nonlinear H2/H Constrained Feedback Control
State trajectory for the nearly optimal control law 1 0.8
Systems states
0.6 x1
0.4 0.2 0 -0.2
x2
-0.4 -0.6
0
5
10
15
20 25 30 35 Time (s) Nearly optimal control signal with input constraints
5
10
15
40
1
Control input u(t)
0.5
0
-0.5
-1
0
20 Time (s)
25
30
35
40
Figure 3.8. Nearly optimal nonlinear control law for the nonlinear oscillator considering actuator saturation
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
65
3.5.3 Constrained State Linear System
Consider the following system x1 x2
x2 x1 x2 u
x1 d 3 For this, select the following performance functional 10
§ x · x12 x22 ¨ 1 ¸ © 3 1 ¹
Q( x,14) W (u )
u2
Note that the coefficient k is chosen to be 10, and B1 3 , and D1 1 . k is selected to be 10 because a larger value for k requires using many activation functions in which a large number of them will have to have powers higher than the value k . However, since this simulation was carried on a double precision computer, then power terms higher than 14 do not add up nicely and round-off errors seriously affect determination of the weights of the neural network by causing a rank deficiency. An initial stabilizing controller, LQR u ( x) 2.4x1 3.6x2 , that violates the state constraints is shown in Figure 3.9. The performance of this controller is improved by stochastically sampling from the region 3.5 d x1 d 3.5, 5 d x2 d 5 , where the number of mesh points p 3000 , and running the policy iterations algorithm 20 times. It can be seen that the nearly optimal control law that considers the state constraint (Figure 3.10) tends not to violate the state constraint as the LQR controller does. It is important to realize that as the order k in the performance functional is increased, then one gets larger and larger control signals at the starting time of the control process to avoid violating the state constraints. A smooth function of the order 45 that resembles the one used for the nonlinear oscillator in the previous example is used to approximate the value function of the system. The weights w are found by the policy iteration method. Since R 1 , the final control law becomes u ( x)
1 w wT ı L ( x) 2 wx2
It was noted that the nonquadratic performance functional returns an overall cost of 212.33 when the initial conditions are x1 2.4, x2 5.0 for the optimal controller, while this cost increases to 316.07 when the linear controller is used. It is this increase in cost detected by the nonquadratic performance functional that causes the system to avoid violating the state constraints. If this difference in costs is made bigger, then one actually increases the set of initial conditions that do not
66
Nonlinear H2/H Constrained Feedback Control
violate the constraint. This, however, requires a larger neural network, and high precision computing machines. State trajectory for quadratic performance functional 5 4
States
3 x1
2 1 0 -1
x2 -2
0
5
10 Time (s)
15
20
Control input for quadratic performance functional, LQR control 5 0
Control u(t)
-5 -10 -15 -20 -25
0
5
10 Time (s)
15
Figure 3.9. LQR control without considering the state constraint
20
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
67
State trajectory for the nonquadratic performance functional 5 4
States
3 2
x1
1 0 -1 -2
x2 0
5
10 Time (s)
15
20
Control input for the nonquadratic performance functional 10
Control u(t)
0
-10
-20
-30
-40
0
5
10 Time (s)
15
Figure 3.10. Nearly optimal nonlinear control law considering the state constraint
20
68
Nonlinear H2/H Constrained Feedback Control
3.5.4 Minimum-time Control Consider the following system
x1 x2
x2 x2 u
It is desired to control the system with control limits of u d 1 to drive it to the origin in minimum time. Typically, from classical optimal control theory [53], one finds that the control law required is a bang–bang controller that switches back and forth, based on a switching surface that is calculated using Pontryagin’s minimum principle. It follows that the minimum time control law for this system is given by
x2 ln x2 1 x2 , x2
s x
x1
u * ( x)
1, ° °+1, ® °1, ° 0, ¯
for x such that s x >0 for x such that s x <0 for x such that s x =0 and x2 <0 for x
0
The response of this controller is shown in Figure 3.11. It can be seen that this is a highly nonlinear control law that requires the calculation of a switching surface. This is, however, a formidable task even for linear systems with state dimension larger than 3. However, when using the method presented in this chapter, finding a nearly minimum-time controller becomes a less complicated matter. The following nonquadratic performance functional is used
Q( x)
tanh x12 / 0.12 x22 / 0.12 , W (u )
u
0.001u 2³ tanh 1 ( P )d P 0
A smooth function of the order 35 is used to approximate the value function of the system. This neural network is solved by stochastic sampling, Monte Carlo methods [30]. Let p 5000 for 0.5 d x1 d 0.5, 0.5 d x2 d 0.5 . The weights w are found after iterating for 20 times. Since R 1 , the final control law becomes
u ( x)
§1 w · tanh ¨ w T ı L ( x) ¸ © 2 wx2 ¹
Figure 3.12 shows the result of using the nearly minimum-time neural network controller. Figure 3.13 plots the state trajectory of both controllers. Note that the nearly minimum-time controller behaves as a bang–bang controller until the states come close to the origin, when it starts behaving as a regulator.
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
Figure 3.11. Performance of the exact minimum-time controller
69
70
Nonlinear H2/H Constrained Feedback Control
State trajectory for the nonquadratic performance functional 0.6 0.4
x1
States
0.2 0 -0.2 -0.4 -0.6 -0.8
x2 0
1
2
3
4
Time (s)
Figure 3.12. Performance of the nearly minimum-time controller
5
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
71
State evolution for both controllers 0.6 0.4 Switching surface method Nonquadratic functionals method
0.2
x2
0 -0.2 -0.4 -0.6 -0.8 -0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
x1 Figure 3.13. State evolution for both minimum-time controllers
3.5.5 Parabolic Tracker In this example, an academic example taken from [76] that describes a parabolic track as seen in Figure 3.14.
Figure 3.14. A mass constrained to a parabolic track
72
Nonlinear H2/H Constrained Feedback Control
The dynamics are given by
ª x1 º « x » ¬ 2¼
ª 4 x12 x2 2 gx2 cx1 º ª 2 x2 « » m » «« m(1 4 x22 ) 1 4 x22 « «¬ »¼ «¬ 0 x1
1 º ªf º m(1 4 x22 ) » « V » » f ¬ H¼ 0 ¼»
In this problem, a unit mass ( m 1 ) is constrained to follow a parabolic track given by ( y u 2 ) and the influence of gravity ( g ), and a vertical fV and horizontal f H forces. The surface of the parabola has a viscous damping c 0.1 . Both inputs fV and f H are constrained as
1 d fV d 1 1 d f H d 1 An initial stabilizing policy is fV 0 and f H 0 . That is the unforced dynamics are stable. This can be seen by considering the total energy of this system as a Lyapunov function
E
m (1 4 x22 ) x12 mgx22 2
which, when evaluated over the system trajectory, becomes
E
c(1 4 x22 ) x12
Figure 3.15 shows the state trajectory of the unforced system. Figure 3.16 shows the cost of this initial policy for fV 0 and f H 0 . To find the optimal controller for this system, one can use a polynomial based neural network with terms up to the order six. Policy iterations are implemented using this neural network over
5 d x1 d 5 5 d x2 d 5 Figure 3.17 shows the trajectories of the system, while Figure 3.18 shows the nearly optimal policy of both controllers. Figure 3.19 shows the cost of this nearly optimal policy. Note that this is a major improvement over that of Figure 3.16.
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
Initial Controller State Trajectories 6 x1 x2
4
x 1,x 2
2
0
-2
-4
-6
0
10
20
30
40
50 60 Time (sec)
70
80
90
100
90
100
Figure 3.15. State trajectories of the unforced system
Initial Controller Cost 120
100
cost
80
60
40
20
0
0
10
20
30
40
50 60 Time (sec)
70
Figure 3.16. Cost of the initial policy
80
73
Nonlinear H2/H Constrained Feedback Control
Nearly Optimal Controller State Trajectories 6 x1 x2
4
x 1,x 2
2
0
-2
-4
-6
0
5
10
15
20
25
Time (sec)
Figure 3.17. State trajectories of the nearly optimal controller
Nearly Optimal Controller 1 fV 0.8
fH
0.6
0.4
0.2
u(t)
74
0
-0.2
-0.4
-0.6
-0.8
-1
0
5
10
15
20
Time (sec)
Figure 3.18. Performance of the nearly optimal controller
25
Nearly H2 Optimal Neural Network Control for Constrained-input Systems
75
Nearly Optimal Controller Cost 30
25
cost
20
15
10
5
0
0
10
20
30
40
50 60 Time (sec)
70
80
90
100
Figure 3.19. Cost of the nearly optimal controller
3.6 Policy Iterations Without Solving the LE(V,u) In the algorithm and numerical examples presented so far, policy iterations is implemented by iterating between the solution of the LE uj
Vx j ( f gu j ) Q 2 ³ I 1 (v) Rdv T
0, V (0)
0
(3.12)
0
and u j 1 ( x)
I I( 12 R 1 g T Vx j )
(3.13)
One can rewrite the policy iteration as f
V j ( x0 )
³
u j 1 ( x)
I I( 12 R 1 g T Vx j )
0
[Q( x) W (u j )]dt
(3.14)
and (3.15)
76
Nonlinear H2/H Constrained Feedback Control
In this case, one may use a neural network to approximate V j ( x) directly without using a Lyapunov equation as in (3.12) since u j ( x) is admissible and the existence of V j ( x) is guaranteed. Hence one can solve for it without the need for f ( x) in
x
f ( x) g ( x)u
One may use any neural network basis i.e. sigmoid functions, radial basis functions, polynomial, as long as the function approximation property, discussed in Chapter 1, of the resulting neural network holds. Hence one may solve for the examples in Section 3.5 using policy iterations between (3.14) and (3.15).
3.7
Bibliographical Notes
Approximate HJB solutions have been found using many techniques such as those developed by Saridis [85], Beard [18][19], Lendaris [76], Lee [59], Bertsekas and Tsitsiklis [22], Munos [75], Lewis and Kim [52], Balakrishnan [38], [64], [65] Lyshevski [67][68][69][70], and Huang [46]. The results in this chapter appear in [2]. The policy iterations scheme in Section 3.6 appears in [76] and [58]. Successful application of the LE was limited until the novel work of Beard [18][19][17], where a Galerkin spectral approximation method was used to find approximate solutions to the LE at each policy iteration on a given compact set. This method requires the computation of several integrals to arrive at the solution and did not consider systems with input saturation. The results in this chapter appear in [1].
4 Policy Iterations and Nonlinear H Constrained State Feedback Control
4.1 Introduction In this chapter, the HJI equation for systems with input constraints is derived and then an algorithmic solution to solve the HJI equation using policy iterations on the corresponding zero-sum game is developed. This chapter has three objectives. First, prove the existence of policy iterations on the disturbance input, converging to the available storage of the associated dissipative closed-loop dynamics. Hence, this is a way to solve the HJB equation of the nonlinear bounded real lemma. Second, a formal solution is given to the suboptimal H control problem of dynamical systems with constraints on the input using a special quasi-norm to perform the L2-gain analysis and derive the corresponding HJI equation. Third, policy iterations on both players are used to break the HJI of constrained controls into a sequence of linear partial differential equations. This is analogous to the work in Chapter 2 and [1] where the second and third objectives have been applied to the HJB equation appearing in optimal control theory. Remark 4.1 Necessary conditions for the existence of smooth solutions of the HJI equation in the case of systems with no input constraints have been studied earlier [47], [90]. Other lines of research study the nonsmooth solutions of the HJI equation using the theory of viscosity solutions [13]. This notion of solutions was studied for the H f control problem [10]. In this note, the proposed results are valid under regularity assumptions as made in [47], [90] and is justified by assumptions on the quasi-norm described later in the note. See [1] for the HJB case.
78
Nonlinear H2/H Constrained Feedback Control
4.2 Policy Iterations and the Nonlinear Bounded Real Lemma Consider the system described by x
f ( x) k ( x)d
z
h( x )
(4.1)
where f (0) 0 , d (t ) is considered a disturbance, and z (t ) is a fictitious output. x 0 is assumed to be an equilibrium point of the system. It is said that the system (4.1) has an L2 -gain d J , J t 0 , if T
³ 0
T 2
2
z (t ) dt d J 2 ³ d (t ) dt
(4.2)
0
for all T t 0 and all d L2 (0, T ) , with x(0) 0 . Dynamical systems that are finite L2 -gain stable are said to be dissipative, [93].
Definition 4.1 System (4.1) with supply rate w(t ) is said to be dissipative if there exists V t 0 , called the storage function, such that t1
V ( x0 ) ³ w(t )dt t V ( x1 )
(4.3)
t0
where x1
M (t1 , t0 , x0 , d ) .
If x0 0 and V t 0 satisfying (4.3) exists such that 2 2 w(t ) J 2 d (t ) z (t ) , then t1
t1
2
t1
V ( x0 )
0 and
2
2 ³ w(t )dt t V ( x1 ) t 0 ³ z(t ) dt d J ³ d (t ) dt
t0
t0
t0
It has been shown that a lower bound on the storage function is given by the socalled available storage. The existence of the available storage is essential in determining whether or not a system is dissipative.
Definition 4.2 The available storage Va of (4.1) is given by the following optimal control problem T
Va ( x)
sup
³ w(d , z )dt
d ( ), T t 0 0
(4.4)
Policy Iterations and Nonlinear H Constrained State Feedback Control
79
It was shown in [94][93] that for a system to be dissipative, the so-called available storage Va needs to be finite. The available storage, Va t 0 , provides a lower bound on the storage function of the dynamical system 0 d Va d V . To find the available storage, one needs to solve an optimization problem, which can be approached by solving a variational problem as in optimal control theory [53][61]. The Hamiltonian of the optimization problem is given by H ( x, p , d )
pT f kd hch J 2 d cd
(4.5)
The Hamiltonian is a polynomial of degree two in d , and has a unique maximum at
d*
1 2J 2
k ( x)T p
given by
H ( x, p )
pT f ( x)
1 T p k ( x) k ( x)T p h( x)T h( x) 4J 2
(4.6)
Therefore, the value function of the optimization problem (4.4), the available storage, when smooth Va t 0 C1 , is the stabilizing solution of the following Hamilton–Jacobi–Bellman equation Va xT f 4J1 2 Va xT kk T Va x hT h
0, Va (0)
0
(4.7)
The optimal policy is given by
d*
1 2J 2
k ( x)T Va x ( x)
(4.8)
which can be thought of as the policy for extracting the maximum energy from the 2 2 system for a supply rate given by w(t ) J 2 d (t ) z (t ) . It can be interpreted as the worst possible L2-disturbance that can affect the system (4.1). It is assumed that system (4.1) is zero-state observable and hence Va ! 0 with a certain domain of validity as defined next [23].
Definition 4.3 The set : of all x satisfying (4.7) is said to be the domain of validity (DOV) of Va ( x) . Lemma 4.1 If (4.7) has a solution V ( x) C1 , then the solution to (4.7) is positive definite whenever the system is zero-state observable. Moreover it follows that the free system x f ( x) is at least locally asymptotically stable. Global asymptotic stability follows if V ( x) is also a proper function, or radially unbounded.
80
Nonlinear H2/H Constrained Feedback Control
Proof. From (4.7), it follows that dV dx
T
f ( x) d h( x)T h( x)
Hence positive definiteness follows from zero-state observability as shown in Lemma 1 [40]. Since V ! 0 , asymptotic stability follows from LaSalle’s invariance principle, and zero-state observability. ,
Lemma 4.2 If the system dynamics x
f
1 dV kk T dx 2J 2
(4.9)
is asymptotically stable, where V solves (4.7), then L2 -gain<J . Proof. See [90], [55].
,
Lemma 4.3 If system (4.1) has L2 -gain<J , then one has P ( x) such that PxT f hT h 4J1 2 PxT kk T Px
Q( x) 0
(4.10)
,
Proof. See [91].
Lemma 4.4 It can be also been shown that any V ( x) t 0 that solves the following Hamilton–Jacobi inequality
VxT f 4J1 2 VxT kk T Vx hT h d 0, V (0)
0
(4.11)
is a possible storage function.
,
Proof. See [91].
Equation (4.7) is nonlinear in Va ( x) , therefore it is hard if not impossible to solve. In Theorem 4.1, policy iterations on d is used to break (4.7) into a sequence of equations that are linear in V ( x ) . This type of policy iterations, also known as Newton’s method, has been used to solve
AT P PA J12 PBBT P C T C
0
(4.12)
appearing in the Bounded Real Lemma problem for linear systems and was demonstrated in Section 1.5.2. Existence of iterative policies to solve (4.12) appears in [57]. Theorem 4.1 generalizes this to (4.1).
Policy Iterations and Nonlinear H Constrained State Feedback Control
81
Theorem 4.1 Let V ! 0 C 1 be the stabilizing of (4.7). Then under the assumption that i V i C1 one can solve for V by policy iterations starting with d 0 0 , and solving for V i (Vxi )T ( f kd i ) hT h J 2 d i
2
0
(4.13)
and updating the disturbance at each iteration according to
d i 1 with x
1 2J 2
k T Vxi
(4.14)
f kd i 1 asymptotically stable i . Moreover, i o f sup V i V o 0 x:
with 0 V i (:i ) d V i 1 (:i 1 ) and : i 1 :i .
Proof. Existence: Assume that there is d i such that x stable. Then since (Vxi )T ( f 2J1 2 kk T Vxi 1 ) PxT ( f 2J1 2 kk T Vxi 1 )
f kd i is asymptotically
hT h 4J1 2 (Vxi 1 )T kk T Vxi 1 hT h Q( x) 4J1 2 (Vxi 1 )T kk T Vxi 1 4J1 2 ( Px Vxi 1 )T kk T ( Px Vxi 1 )
therefore : i :i 1 and
( Px Vxi )T ( f 2J1 2 kk T Vxi 1 )
Q( x) 4J1 2 ( Px Vxi 1 )T kk T ( Px Vxi 1 ) 0
Since the vector field x f kd i is locally asymptotically stable, it follows that P V i ! 0 is a Lyapunov function. To show local asymptotic stability of x f kd i 1 , differentiating V i over the trajectories of x f kd i 1 one has (Vxi )T ( f 2J1 2 kk T Vxi ) PxT ( f 2J1 2 kk T Vxi )
hT h 4J1 2 (Vxi )T kk T Vxi 4J1 2 (Vxi Vxi 1 )T kk T (Vxi Vxi 1 ) hT h Q( x) 4J1 2 (Vxi )T kk T Vxi 4J1 2 ( Px Vxi )T kk T ( Px Vxi )
then asymptotic stability of x
f kd i 1 follows from
( Px Vxi )T ( f 2J1 2 kk T Vxi )
Q( x) 4J1 2 ( Px Vxi )T kk T ( Px Vxi ) 4J1 2 (Vxi Vxi 1 )T kk T (Vxi Vxi 1 ) 0
82
Nonlinear H2/H Constrained Feedback Control
Starting with d 0 { 0 , and by asymptotic stability of x f , the proof follows by induction. Convergence: Since (d i , V i ) exists and is asymptotically stable. Then, i, V i 1 t V i . This is shown by integrating V i and V i 1 over the state trajectory of x f kd i 1 for x0 :i :i 1 . Since (Vxi 1 )T ( f kd i 1 )
hT h J 2 d i 1
2
(Vxi )T f
(Vxi )T kd i hT h J 2 d i
(Vxi )T k
2J 2 (d i 1 )T
2
Then it follows that f
³ ^V i 1 ( x0 ) V i ( x0 )` dt
V i 1 ( x0 ) V i ( x0 )
0
f
³ ^(V
) ( f kd i 1 ) (Vxi 1 )T ( f kd i 1 )` dt
i T x
0
f
³ ^J
2
di
2
2J 2 (d i 1 )T (d i 1 d i ) J 2 d i 1
2
0
f
^
J 2 ³ d i 1 d i 0
2
` dt
` dt t 0
and hence pointwise convergence to the solution of (4.7) follows. Since : is compact, uniform convergence of V i to V on : follows from Dini’s theorem, [7]. ,
Theorem 4.2 If (4.1) satisfies (4.2) for J 2 d J 1 and if f 2J1 2 kk T Vx J 1 and x
x
f 2J1 2 kk T Vx J 2
are asymptotically stable on :J1 and :J 2 . Then :J 2 :J1 and VJ 2 t VJ 1 .
Proof. Since for J 2 , the available storage VJ 2 satisfies
Vx J 2 T f 4J1 2 Vx J 2 T kk T Vx J 2 hT h 2
0 Vx J 2 T f 4J1 2 Vx J 2 T kk T Vx J 2 hT h d 0 1
VJ 2 is a possible storage function with gain J 1 . Therefore, VJ 1 is valid on :J 2 and :J 2 :J1 . Integrating over the trajectory of x f kdJ 1 it follows that f
VJ 2 ( x0 ) VJ 1 ( x0 )
³ ^V
J1
0
`
f
^
( x0 , dJ 1 ) VJ 2 ( x0 , dJ 1 ) dt t ³ J 22 dJ 2 dJ 1
and this completes the proof.
0
2
` dt t 0 ,
Policy Iterations and Nonlinear H Constrained State Feedback Control
83
Example 4.1. Consider the nonlinear system x
x3 d ,
z
x3
(4.15)
The corresponding HJ equation is Vx ( x3 )
1 2 Vx x 6 4J 2
(4.16)
0
The available storage V ( x) 2J 2 (1 (1 J 2 )1/ 2 ) x 4 / 4 . Note that the available storage ceases to exists for J 1 . Hence the L2 -gain=1. Note that the closed-loop dynamics
x
(1 J 2 )1/ 2 x3
(4.17)
and hence the closed-loop dynamics is asymptotically stable for J ! 1 . To solve the HJ Equation (4.16) by policy iterations, note that V i ( x) pi x 4 with pi a constant. Hence Equation (4.13) gives 4 pi x 3 ( x 3 J22 pi 1 x 3 ) x 6 J 2
2
J2
pi 1 x3
2
0
(4.18)
which is equivalent to
pi (4 8J 2 pi 1 ) 1 4J 2 pi21 For the case when J 2 , V ( x ) converges to pf 2 3 .
0
(4.19)
(2 3) x 4 . Iterating on (4.19) with p0
0
4.3 L2-gain of Nonlinear Control Systems with Input Saturation Consider the following nonlinear system ° x 6:® ¯° z
2
f ( x) g ( x)u k ( x)d ½° ¾ 2 2 h u ¿°
(4.20)
where x \ n , u \ m , d \ q , f (0) 0, x 0 is an equilibrium point of the system, z (t ) a fictitious output, d (t ) L2 [0, f) is the disturbance, and u (t ) U is the control with U defined as U
^u (t ) L2 [0, f) | D i d ui d D i , i
1,! , m`
84
Nonlinear H2/H Constrained Feedback Control
d
z
x f ( x) g ( x)u k ( x)d z \ ( x, u ), y x
u
y
u D ( y) Figure 4.1. State feedback nonlinear H f controller
In the L2 -gain problem, one is interested in u which for some prescribed J renders 2
(t ) § z · ¨ T 2 2 ¸ 2 ³0 ¨ h h u J d ¸ dt ¨ ¸ © ¹
f
V ( x0 )
nonpositive for all d (t ) L2 (0, f) and x(0) f
³ 0
f
2
(4.21)
0 . In other words
2
z (t ) dt d J 2 ³ d (t ) dt
(4.22)
0
It is well known [16] that the L2 -gain problem is equivalent to the solvability of the zero-sum game f
V ( x0 )
2
min max ³ hT h u t J 2 d uU
d
0
2
dt
(4.23)
The Hamiltonian of the previous zero-sum game is H ( x, p , u , d )
2
pT ( f gu kd ) hT h u J 2 d
2
(4.24)
Finding the stationarity conditions of this Hamiltonian requires solving for min max H ( x, p, u, d ) and max min H ( x, p, u , d ) uU
d
d
uU
(4.25)
which is a constrained optimization with respect to the control policy u U . To overcome this difficulty of the constrained optimization problem of the Hamiltonian, a quasi-norm is used to transform the constrained optimization problem (4.23) into
Policy Iterations and Nonlinear H Constrained State Feedback Control f
V ( x0 )
2
min max ³ hT h u q J 2 d u
d
0
2
dt
85
(4.26)
Definition 4.4 A quasi-norm, q , on a vector space X , has the following properties x
q
0 x
0, x y
q
d x q y q, x
q
x
q
I (.) I (.) sat(.)
D
-D
-D
D
0
Figure 4.2. Approximation of control saturation
This definition is weaker than the definition of a norm, in which the third property is replaced by homogeneity D x q D x q D \ [7]. A suitable quasinorm to confront control saturation is u
u
2 q
2 ³ I 1 (v)dv 0
m
uk
k 1
0
¦2³ I
1
(4.27)
(v)dv
where u q C1 one to one, and I 1 is assumed to be monotonically increasing, 2 2 i.e. I () tanh() for u d 1 . Hence u t q u t and is locally quadratic in u . The Hamiltonian of this modified zero-sum game (4.26) is u
H ( x, p , u , d )
pT ( f gu kd ) hT h 2³ I 1 (v)dv J 2 d 0
2
(4.28)
86
Nonlinear H2/H Constrained Feedback Control
Finding the stationarity conditions of this Hamiltonian requires solving for min max H ( x, p, u, d ) and max min H ( x, p, u, d ) u
d
(4.29)
u
d
where the minimization of the Hamiltonian with respect to u is unconstrained. See [1][69], and Chapter 2 for similar work done in the framework of HJB equations. The next lemma shows a property that is satisfied by the quasi-norm (4.27).
Lemma 4.5 If I 1 is monotonically increasing, then a
³I
1
(v)dv I 1 (b)T (a b) ! 0, a z b
b
4.4 The HJI Equation and the Saddle Point To study the HJI equation corresponding to (4.26), we first investigate the finitehorizon game under feedback strategy information structure for both players [16]. It is shown that Isaacs’s condition is satisfied and there is a unique saddle point solving the finite-horizon zero-sum game
V ( x0 , T )
T u § 2· min max ³ ¨ hT h 2³ I 1 (v)dv J 2 d ¸ dt u d 0© 0 ¹
(4.30)
The Hamiltonian of the game (4.30) is u
H ( x, p , u , d )
pT ( f gu kd ) hT h 2³ I 1 (v)dv J 2 d
2
(4.31)
0
Lemma 4.6 Isaacs’s condition: min max H u
d
max min H . d
Proof. Applying the stationarity conditions wH wu gives u ( x)
I 12 g ( x)T p), d ( x)
H ( x, p , u , d )
1 2J 2
k ( x)T p
u
0 and wH wd
0 to (4.31)
(4.32)
pT f 2I 1 (u )T u hT h u
2 ³ I 1 (v)dv 4J1 2 pT kk T p 0
(4.33)
Policy Iterations and Nonlinear H Constrained State Feedback Control
87
Rewriting (4.31) in terms of (4.33) gives
H ( x, p , u , d )
H ( x, p , u , d ) J
2
d d
2
° u ½° 2 ® ³ I1 (v)dv I 1 (u )T (u u ) ¾ °¯u ¿°
2
From Lemma 4.5, one has
H ( x0 , u , d ) d H ( x0 , u , d ) d H ( x0 , u, d )
(4.34)
,
and Isaacs’s condition follows. The Hamilton–Jacobi–Isaacs equation (HJI) corresponding to (4.30) is
wV (t ; x) wt
V (T ; x)
wV (t ; x) , u, d ) wx wV (t ; x) max min H ( x, , u, d ) u d wx wV (t ; x) f ( x) g ( x)u k ( x)d wx 0 min max H ( x, u
d
(4.35)
Under regularity assumptions, from Theorem 2.6 [16], if there exists V ( x0 ) C1 solving the HJI (4.35), then
V ( x0 , u , d ) d V ( x0 , u , d ) d V ( x0 , u, d )
(4.36)
and the zero-sum game has a value and the pair of policies (4.32) are in saddle point equilibrium. The zero-sum game (4.26) is an infinite-horizon zero-sum game. Therefore, it is important to see the behavior of the finite-horizon game (4.30) as T o f . It is seen that as T o f in (4.30), one obtains the following Isaacs equation: H ( x, p , u , d ) u
VxT ( f gu kd ) hT h 2 ³ I 1 (v)dv J 2 d
2
(4.37)
0
0
On substitution of (4.32) in (4.37), the HJI equation is obtained I
V f V g I g Vx h h 2 T x
T x
1 2
T
T
1 gT V x 2
³ 0
V (0)
0
I 1 (v)dv 4J1 2 VxT kk T Vx
0
(4.38)
88
Nonlinear H2/H Constrained Feedback Control
and hence the infinite-horizon game has a value. In the next theorem, it is shown that (4.32) remains in saddle point equilibrium as T o f if they are sought among finite energy strategies. See [15] for unconstrained policies.
Theorem 4.3 Suppose that there exists a V ( x) C1 satisfying the HJI Equation (4.38) and that x
f g I 12 g T Vx ) 2J1 2 kk T Vx
(4.39)
is asymptotically stable, then u ( x)
I 12 g T Vx ), d ( x)
1 2J 2
k T Vx
(4.40)
are in saddle point equilibrium for the infinite-horizon game among strategies u U , d L2 [0, f) .
Proof. The proof is made by completing the squares, T
J T (u, d ; x0 )
³ h
T
2
h u (t ) q J 2 d
2
dt V ( x )
0
0
(4.41)
T
V ( xT ) ³ V dt
0
where V solves (4.38). This becomes T
J T (u , d ; x0 )
³ h
2
2
dt
2
2
dt V ( x ) V ( x )
T
h u (t ) q J 2 d
T
h u (t ) q J 2 d
0
T
³ h
0
0
T
T
³ (Vx* )T ( f gu kd )dt 0
T
³ h 0
T
2
h u (t ) q J 2 d
2
(Vx* )T ( f gu kd ) dt V ( x0 ) V ( xT )
§ u 1 2 1
T
³0 ¨¨ 2 ³ I (v)dv 2I (u ) (u u ) J d d © u
V ( x0 ) V ( xT ) T
2
· ¸¸ dt ¹
Since u (t ), d (t ) L2 [0, f) , and since the game has a finite value as T o f , this implies that x(t ) L2 [0, f) , therefore x(t ) o 0 , V ( x(f)) 0 and
Policy Iterations and Nonlinear H Constrained State Feedback Control
89
J f (u, d ; x0 ) V ( x0 ) f § u ³ ¨ 2 ³ I 1 (v)dv 2I 1 ( u )T (u u ) J 2 d d ¨ 0 © u
2
· ¸¸ dt ¹
(4.42)
Hence u , d are in saddle point equilibrium in the class of finite energy strategies. , Since (4.40) satisfies the Isaacs equation, it can be shown that the feedback saddle point is unique in the sense that it is strongly time consistent and noise insensitive [15]. It is important to see how the solution of the infinite-horizon zero-sum game with the quasi-norm relates to the original constrained input L2-gain control problem. To see this, note that substituting u in (4.42), one has u
V ( f gu kd ) h h 2 ³ I 1 (v) dv J 2 d T x
T
2
J 2 d d
2
0
u
V (u , d ) hT h 2 ³ I 1 (v)dv J 2 d
2
d0
(4.43)
0
u ª 2º 2
1 T ³0 ««V (u , d ) h h 2 ³0 I (v)dv J d »» dt d 0 ¬ ¼
T
Integrating both sides, one has
u ª º (u , d ) hT h 2 I 1 (v)dv J 2 d 2 » dt d 0 V « ³0 « ³0 ¬ ¼» T ª u T º 2 V ( x(T )) V ( x(0)) ³ « hT h 2 ³ I 1 (v)dv » dt d J 2 ³ d dt « 0 ¬ 0 0 ¼» T
(4.44)
If the closed-loop system is asymptotically stable and d () L2 > 0, f , then u
hT h 2 ³ I 1 (v) L2 > 0, f 0
Thus (4.45) follows from x(0)
0 and lim x(T ) T of
u f § T · 2 1 2 h h 2 I ( v ) dv dt J d ¨ ¸ ³0 ¨ ³0 ³0 d dt ¸ © ¹
f
0
(4.45)
90
Nonlinear H2/H Constrained Feedback Control
4.5 Solving the HJI Equation Using Policy Iterations To solve (4.38) by policy iterations, one starts by showing the existence and convergence of policy iterations on the constrained input as in [90] for systems with no input constraints. Then policy iterations on both players as proposed in [20] are performed on the constrained controller and d .
Theorem 4.4 Assume that the closed-loop dynamics for the constrained stabilizing controller u j ,
x
f ( x) g ( x)u j k ( x)d { f j ( x) k ( x)d
satisfy all assumptions of Theorem 2.2. If the constrained controller is updated according to
u j 1
I I 12 g T Vx j
(4.46)
where V j is the available storage that solves uj
Vx j T f j hT h 2 ³ I 1 (v)dv 4J1 2 Vx j T kk T Vx j
0
(4.47)
0
then x f j 1 kd remains dissipative with respect to d (t ) for the same J . Moreover, j o f sup V j V o 0 x:0
with V j 1 d V j with V j 1 valid on :0 , and V is the stabilizing solution of (4.38).
Proof. To show the first part, Vx j T f j 1
Vx j T f Vx j T gu j 1 Vx j T f Vx j T gu j Vx j T g (u j 1 u j ) uj
hT h 4J1 2 Vx j T kk T Vx j 2 ³ I 1 (v)dv 2I 1 (u j 1 )T (u j 1 u j ) 0
u j 1
hT h 2
³I
1
(v)dv 4J1 2 Vx j T kk T Vx j
0
u j 1
2
³I
1
(v)dv 2I 1 (u j 1 )T (u j 1 u j )
uj
From Lemma 4.5, one has the following HJ inequality:
Policy Iterations and Nonlinear H Constrained State Feedback Control
91
u j 1
Vx j T f j 1 hT h 2
³I
1
(v)dv 4J1 2 Vx j T kk T Vx j d 0
0
From Lemma 4.4, V j is a possible storage for x
f j 1 . Hence one has
u j 1
Vx j 1T f j 1 hT h 2
³I
1
(v)dv 4J1 2 Vx j 1T kk T Vx j 1
0
0
where V j 1 d V j and V j 1 is valid on : j and hence valid on :0 . V j converges pointwise to V follows, and since : is compact, uniform convergence of V j to V on : follows by Dini’s theorem [7]. ,
Corollary 4.1 The available storage V of u , (4.40), has the largest DOV of any other constrained controller guaranteeing (4.22) a prescribed J . Proof. The proof follows immediately from Theorem 4.4 since V is valid for any :0 , the DOV of the available storage of any u guaranteeing (4.22). , This implies that u has the largest DOV within which L2 -performance for a given J is guaranteed. Policy iterations in Theorem 4.4 and Theorem 4.1 can be combined to provide a two-loop policy iterations solution method for the HJI equation. Specifically, select u j and find V j that solves (4.47) by inner loop policy iterations on uj
Vxi j T ( f j kd i ) hT h 2 ³ I 1 (v)dv J 2 d i
2
0
(4.48)
0
and the disturbance as in Theorem 4.1 until V jf o V j . Then by Theorem 4.4, use (4.46) in an outer loop policy iteration on the constrained control. Equation (4.48) can be considered as PI (V ji , u j , d i ) 0 , where PI stands for Policy Iteration. It becomes equivalent to the LE equation in Chapter 2 when J f . Controllers derived using (4.38) for a fixed J are suboptimal H f controllers. Optimal H f are achieved for the lowest possible J for which the HJI is solvable. The next theorem demonstrates what happens to the DOV of the value of the game as J decreases.
Theorem 4.5 If J 1 t J 2 ! J , then : J1 :J 2 where : J1 and : J 2 denotes the DOV of the available storage functions VJ 1 and VJ 2 solving (4.38) for J 1 and J 2 respectively with J being the smallest gain for which a stabilizing solution of the HJI exists. Proof. Follows from Theorem 4.4, and Corollary 4.1.
,
92
Nonlinear H2/H Constrained Feedback Control
This implies that once the HJI is solved for a particular attenuation J 1 , one can use the converged control policy as an initial stabilizing solution to try to solve for the HJI with a smaller attenuation J 2 . This is summarized in Figure 4.3.
Remark 4.2 It may be possible that the DOV of the HJI shrinks to null as one approaches J . See [91] for unconstrained control cases. Example 4.2 Consider the following nonlinear system x
x3 u d ,
2
ln ª¬1 tanh 2 (2 x 3 ) º¼ 2 ³ tanh 1 (v)dv
1 d u d 1 u
z
(4.49)
0
Note that hc( x)h( x) ln ª¬1 tanh 2 (2 x3 ) º¼ ! 0 and is monotonically increasing in x . It follows that the HJI Equation (4.38) in this case is given by tanh( 0.5Vx )
0 Vx ( x 3 ) Vx tanh(0.5Vx ) 2
³
tanh 1 (v)dv
0
1 2 Vx2 ln ª¬1 tanh 2 (2 x3 ) º¼ 4J 0 Vx ( x 3 ) Vx tanh(0.5Vx ) 2 tanh(0.5Vx ) tanh 1 (tanh(0.5Vx )) (4.50) 1 2 Vx ln ª¬1 tanh 2 (2 x3 ) º¼ 4J 2 1 0 Vx ( x 3 ) ln ª¬1 tanh 2 (0.5Vx ) º¼ 2 Vx2 ln ª¬1 tanh 2 (2 x3 ) º¼ 4J ln ª¬1 tanh 2 (0.5Vx ) º¼
Assume that J 1 , then the available storage of the HJI equation exists and is given by V ( x) x 4 and the closed-loop dynamics x
f g I 12 g T Vx ) 2J1 2 kk T Vx x 3 tanh(2 x 3 )
(4.51)
is locally asymptotically stable and hence the L2 -gain<1 . The constrained input HJI equation along with two players policy iterations provide a sequence of differential equations for which approximate closed-form solutions are easier to obtain. In the next chapter, it is shown how to use neural networks to obtain a least squares solution of the HJI equation. It is demonstrated how to obtain an approximate solution for V ji in PI (V ji , u j , d i ) 0 at each iteration on i and j . Therefore, one obtains a practical method to derive L2 -gain optimal, or suboptimal H f , controllers of nonlinear systems affine in input and experiencing actuator saturation.
Policy Iterations and Nonlinear H Constrained State Feedback Control
Start Initialization u0 : Initial asymptotically stable control. I1 , I 2 : Number of policy iterations. h( x), J 0 : States related performance criteria.
j
0
0, d i
i
0
Solve for u j
i T x j
V
( f j kd ) h h 2 ³ I 1 (v)dv J 2 d i i
T
2
0
0
i o i 1, d i 1
No No
1 2J 2
k 'Vxi j
i ! I1 Yes
j o j 1, u j 1
I I( 12 g T Vxi j )
j ! I2
Reduce J Let u0 be uI 2
Yes Is the HJI solvable?
No
Yes Finish Figure 4.3. Policy iterations to solve the constrained-input HJI
93
94
4.6
Nonlinear H2/H Constrained Feedback Control
Bibliographical Notes
The formulation of the nonlinear theory of H control has been well developed [10][13][16][90] and [93]. The H norm has played an important role in the study and analysis of robust optimal control theory since its original formulation in an input–output setting by Zames [95]. Earlier solution techniques involved operatortheoretic methods. State space solutions were rigorously derived in [29] for the linear system case that required solving several associated Riccati equations. Later, more insight into the problem was given after the H linear control problem was posed as a zero-sum two-person differential game by Baúar [16]. The nonlinear counterpart of the H control theory was developed by Van der Schaft [90]. He utilized the notion of dissipativity introduced by Willems [93][94] and formulated the H control theory into a nonlinear L2-gain optimal control problem. The L2gain optimal control problem requires solution of a Hamilton–Jacobi equation, namely the Hamilton–Jacobi–Isaacs (HJI) equation. Conditions for the existence of smooth solutions of the Hamilton–Jacobi equation were studied through invariant manifolds of Hamiltonian vector fields and the relation with the Hamiltonian matrices of the corresponding Riccati equation for the linearized problem [90]. Later some of these conditions were relaxed by Isidori and Astolfi [47] into critical and noncritical cases. In [90], it was proven that there exist a sequence of iterative policies to pursue the smooth solution of the HJI equation. Later Beard and McLain [20] proposed, for the first time, to use policy iterations on the disturbance, if they exist, as well as policy iterations on the controller. For linear systems, policy iterations on the disturbance are used to prove the existence of maximal and minimal solutions of Riccati equations in [96].
5 Nearly H Optimal Neural Network Control for Constrained-Input Systems
In our earlier work presented in the fourth chapter of this book and appearing in [3], the zero-sum game for L2-gain optimal control, suboptimal H control, of affine in input nonlinear systems with control constraints was treated. Moreover, the Hamilton–Jacobi–Isaacs (HJI) equation was derived using performance functionals with quasi-norms to encode input constraints. As for unconstrained inputs, once the game value function of the HJI equation is smooth and computed, a feedback controller can be synthesized that results in closed-loop asymptotic stability and provides L2-gain disturbance attenuation. However, computing the value of the game is a formidable task when solutions of the HJI are approached directly. For unconstrained affine in input nonlinear systems, a direct approach to solve the HJI equation is given by the second coauthor [46], where the assumed smooth solution is found by solving for the Taylor series expansion coefficients in a very efficient and organized manner. In [20], an indirect method to solve the HJI equation for unconstrained systems based on policy iterations is proposed where the solution of a sequence of differential equations, linear in the associated cost, converges to the solution of the related HJI equation, which is nonlinear in the available storage. Galerkin techniques are used to solve the sequence of linear differential equations, resulting in a numerically efficient algorithm that, however, requires computing numerous integrals over a well-defined region of the statespace. In [3], policy iterations were proposed to solve the constrained-input HJI equation. In this chapter, we build on the results in [3] by using neural networks to solve for the sequence of linear differential equations in a least squares sense on a prescribed compact set of the state-space. This is an extension to our earlier neural network policy iteration approach to solve the constrained-input HJB equation [1]. The importance of this chapter is that a practical solution method based on neural networks is provided to solve for suboptimal H control of constrained input systems. The remainder of this chapter is organized as follows. Section 5.1 describes a neural network least squares based algorithm to practically solve the
96
Nonlinear H2/H Constrained Feedback Control
constrained-input HJI equation. Section 5.2 demonstrates the stability and convergence of the proposed neural network algorithm. Section 5.3 illustrates a successful application of the proposed algorithm to the Rotational/Translational Actuator (RTAC) nonlinear benchmark problem under actuator saturation. Conclusions are given in Section 5.4. In the next section, it is shown how to approximate V ji in PI (V ji , u j , d i ) 0 at each iteration on i and j using neural networks.
5.1 Neural Network Representation of Policies Although equation uj i T x j
V
( f j kd ) h h 2 ³ I 1 (v)dv J 2 d i i
T
2
0
(5.1)
0
is in principle easier to solve for V ji than solving the HJI (4.38) directly, it remains difficult to get an exact closed-form solution for V ji at each iteration. Therefore, one seeks to solve approximately for V ji at each iteration. In this section, a computationally practical neural network based algorithm is presented that solves for V ji on a compact set domain of the state-space in a least squares sense. Proofs of convergence and stability of the neural network policies are discussed in Section 5.2. It is well known that neural networks can be used to approximate smooth functions on prescribed compact sets [60]. Therefore, V ji is approximated at each inner loop iteration i over a prescribed region of the state-space with a neural net,
Vˆji ( x)
L
¦w
i j, k
V k ( x) w ij T ı L ( x)
(5.2)
k 1
wheref the activation functions V j ( x ) : : o , are continuous, V j (0) 0 , span ^V j `1 L2 (:) . The neural network weights are wk and L is the number of hidden-layer neurons. Vectors T
ı L ( x) { >V 1 ( x) V 2 ( x) " V L ( x) @ T
w { > w1 w2 " wL @
are the vector activation function and the vector weight, respectively. The neural network weights are tuned to minimize the residual error in a least squares sense over a set of points within the stability region : of the initial stabilizing control. The least squares solution attains the lowest possible residual error with respect to the neural network weights. Replacing V ji in PI (V ji , u j , d i ) 0 with Vˆji , one has
Nearly H Optimal Neural Network Control for Constrained-input Systems
§ PI ¨ Vˆji ( x) ©
L
¦w V k
k
k 1
· ( x), u j , d i ¸ ¹
97
(5.3)
eL ( x)
where eL ( x) is the residual error. To find the least squares solution, the method of weighted residuals is used [31]. The weights, w ij , are determined by projecting the residual error onto deL ( x) dwij and setting the result to zero x : using the inner product, i.e.
deL ( x) , eL ( x) dw ij where f,g has w ij
³ fgdx
0
(5.4)
is a Lebesgue integral. Rearranging the resulting terms, one
:
ı L Fji , ı L Fji
1
< H ij , ı L F ji
(5.5)
where F ji
f gu j kd i
H ij
h ' h 2 ³ I 1 (v)dv J 2 d i
uj
2
0
Equation (5.5) involves a matrix inversion. The following lemma discusses the invertibility of this matrix. L
Lemma 5.1 If the set ^V j ` is linearly independent, then 1
^V
T j
Fji `
L
1
is also linearly independent.
Proof. This follows from the asymptotic stability of the vector field x in [3], and from [1].
Fji shown ,
Because of Lemma 5.1, the term ı L F ji , ı L F ji is guaranteed to have full rank, and thus is invertible as long as x Fji is asymptotically stable. This in turn guarantees a unique w ij of (5.5). Having solved for the neural net weights, the disturbance policy is updated as dˆ i 1
1 2J 2
k T ı LT w ij
(5.6)
98
Nonlinear H2/H Constrained Feedback Control
It is important that the new dynamics x f gu j kdˆ i 1 are asymptotically stable in order to be able to solve for w ij1 in (5.5). Theorem 1 in the next section discusses the asymptotic stability of x f gu j kdˆ i 1 . Policy iterations on the disturbance requires solving iteratively between Equation (5.5) and (5.6) at each inner loop iteration on i until the sequence of neural network weights w ij converges to some value, denoted by w j . Then the control is updated using w j as
uˆ j 1
I( 12 g T ı LT w j )
(5.7)
in the outer-loop iteration on j . Finally, one can approximate the integrals needed to solve (5.5) by introducing a mesh on : with mesh size equal to 'x . Equation (5.5) becomes X ij
«ı F i "" ı F i L j ¬« L j x1
T
» , Yi j xp ¼ »
«H i ¬« j
x1
"" H ij
» xp ¼ »
T
(5.8)
where p in x p represents the number of mesh points and H and F are as shown in (5.5). The number p increases as the mesh size is reduced. Therefore
ı L Fji , ı L Fji H ij , ı L Fji
lim ( X ij T X ij ) 'x
'x o 0
lim ( X ij T Y ji ) 'x
(5.9)
'x o 0
This implies that one can calculate w ij as
w ij
( X ijT X ij ) 1 ( X ij T Y ji )
(5.10)
An interesting observation is that Equation (5.10) is the standard least squares method of estimation for a mesh on : . Note that the mesh size ' should be such that the number of points p is greater than or equal to the order of approximation L . This guarantees full rank for ( X ij T X ij ) . There do exist various ways to efficiently approximate integrals such as those appearing in (5.5). Monte Carlo integration techniques can be used. Here the mesh points are sampled stochastically instead of being selected in a deterministic fashion [30]. In any case however, the numerical algorithm at the end requires solution of (5.10), which is a least squares computation of the neural network weights. Numerically stable routines to compute equations like (5.10) exist in several software packages, including MATLAB, which is used the next section. A flowchart of the proposed computational algorithm is shown in Figure 5.1. This is an offline algorithm run a priori to obtain a neural network constrained state feedback controller that is nearly L2-gain optimal.
Nearly H Optimal Neural Network Control for Constrained-input Systems
Start
Initialization ı L : Neurons.
p: Number of mesh points. u0 : Initial asymptotically stable control. I1 , I 2 : Number of policy iterations. :: The neural network region of approximation. h( x), J 0 : States related performance criteria. R: Controls related performance criteria.
j i
X ij
0
0, dˆ (i )
0
«ı F i "" ı F i L j ¬« L j x1 « H i "" H i j ¬« j x1
w ij
( X ij T X ij ) 1 ( X ij T Y ji )
1 2J 2
» ¼»
T
T
Y ji
i o i 1, dˆ i 1
» xp ¼ »
xp
k T ı LT w ij
No i ! I1
Reduce J Let u0 be uˆ I 2
Yes
I( 12 g T ı LT w j )
j o j 1, uˆ j 1
No
j ! I2
Is the HJI solvable?
Yes
No Finish
Figure 5.1. Flowchart of the algorithm
99
100
Nonlinear H2/H Constrained Feedback Control
In this algorithm, once the policies converge for some J 1 , one may use the control policy as an initial policy for new inner/outer loop policy iterations with J 2 J 1 . The attenuation J is reduced until the HJI equation is no longer solvable on the desired compact set.
5.2 Stability and Convergence of Least Squares Neural Network Policy Iterations In this section, the stability and convergence of policy iterations between (5.5), (5.6) and (5.7) is studied. It is shown that the closed-loop dynamics resulting from the inner loop iterations on the disturbance (5.6) is asymptotically stable as dˆ i 1 uniformly converges to d i 1 . Then later, it is shown that the updated uˆ j 1 is also stabilizing. Hence, this section starts by showing convergence results of the method of least squares when neural networks are used to solve for V ji in. Note that (5.2) is a Fourier series expansion. In this chapter, a linear in parameters Volterra neural network is used. This gives a power series neural network that has the important property of being differentiable. This means that they can approximate uniformly a continuous function with all its partial derivatives up to order m using the same polynomial, by differentiating the series termwise. This type of series is m -uniformly dense as shown in [1]. Other m -uniformly dense neural networks, not necessarily based on power series, are studied in [41]. To study the convergence properties of the developed neural network algorithm, the following assumptions are required.
Assumption 5.1 It is assumed that the available storage exists and is positive definite. This is guaranteed for stabilizable dynamics and when the performance functional satisfies zero-state observability. Assumption 5.2 The system dynamics and the performance integrands are such that the solution of PI (V ji , u j , d i ) 0 is continuous and differentiable for all i and j , therefore, belonging to the Sobolev space V H 1,2 (:) [6]. f
Assumption 5.3 One can choose complete coordinate elements ^V j ` H 1,2 (:) 1 such that the solution V H 1,2 (:) and ^wV wx1f,!, wV wxn ` can be uniformly approximated by the infinite series built from ^V j ` . 1
Assumption 5.4 The sequence ^\ j and given by
AV j
AV j ` is linearly independent and complete,
dV j dx
T
( f gu kd )
Assumptions 5.1–5.3 are standard in H control theory and neural network control literature. Lemma 5.1 assures the linear independence required in the fourth
Nearly H Optimal Neural Network Control for Constrained-input Systems
101
assumption while the high-order Weierstrass approximation theorem (Theorem 1.15) shows that for all V and H , there exists L and w L such that Vˆ V H , dVˆ dxk dV dxk H k This implies that as L o f sup AVˆ AV o 0 AVˆ AV x:
L2 ( : )
o0
Therefore completeness of ^\ j ` is established, and Assumption 5.4 is satisfied. Similar to the HJB Equation [1], one can use the previous assumptions to conclude the uniform convergence of the least squares method which is placed in the Sobolev space H 1,2 (:) [6].
Theorem 5.1 The neural network least squares approach converges uniformly for
sup dVˆji dx dV ji dx o 0, sup Vˆji V ji o 0, sup dˆ i 1 d i 1 o 0 x:
x:
x:
sup uˆ j 1 u j 1 o 0 x:
Next, it is shown that the system x f j kdˆ i 1 is asymptotically stable, and hence Equation (5.5) can be used to find Vˆ i 1 .
Theorem 5.2 L0 : L t L0 such that x
f j kdˆ i 1 is asymptotically stable.
Proof. Since the system x f j kd is dissipative with respect to J , this implies [90] that there exists P ( x ) ! 0 such that uj
PxT f j hT h 2 ³ I 1 (v)dv 4J1 2 PxT kk cPx
Q( x) 0
(5.11)
0
where i, P ( x ) t V i ( x ) . Since uj
(Vxi 1 )T ( f j 2J1 2 kk T Vxi )
hT h 2 ³ I 1 (v) dv 4J1 2 (Vxi )T kk T Vxi 0
one can write the following using Equation (5.12) and (5.11):
(5.12)
102
Nonlinear H2/H Constrained Feedback Control
uj
( Px Vxi 1 )T ( f j kd i 1 )
PxT f j hT h 2 ³ I 1 (v)dv 4J1 2 PxT kk T Px 0
4J1 2 ( Px Vxi )T kk T ( Px Vxi )
(5.13)
Q( x) 4J1 2 ( Px Vxi )T kk T ( Px Vxi ) 0 Since x f j kd i 1 and the right-hand side of (5.13) is negative definite, it follows that P ( x ) V i 1 ( x ) ! 0 . Using P ( x ) V i 1 ( x ) ! 0 as a Lyapunov function candidate for the dynamics x f j kdˆ i 1 , one has uj
( Px Vxi 1 )T ( f j 2J1 2 kk T Vˆxi )
Pxc f j hT h 2 ³ I 1 (v)dv 4J1 2 PxT kk T Px 0
4J 2 ( Px V ) kk T ( Px Vxi ) 1
i T x
2J1 2 ( Px Vxi 1 )T kk T (Vˆxi Vxi ) d Q( x) 2J1 2 ( Px Vxi 1 )T kk T (Vˆxi Vxi ) From uniform convergence of Vˆ i to V i , L0 : L t L0 such that
x :,
1 2J 2
( Px Vxi 1 )T kk T (Vˆxi Vxi ) ! Q ( x)
This implies that
x :, ( Px Vxi 1 )T ( f j 2J1 2 kk T Vˆxi ) 0 , Next, it is shown that neural network policy iteration on the control as given by (5.7) is asymptotically stabilizing and L2-gain stable for the same attenuation J on :.
Theorem 5.3 L0 : L t L0 such that x
f uˆ j 1 is asymptotically stable.
Proof. This proof is in essence contained in Corollary 3 in [1] where the positive definiteness of h( x ) is utilized to show that uniform convergence of Vˆj to V j , implies that L0 : L t L0 such that
x :, (Vx j )T ( f uˆ j 1 ) 0 , Theorem 5.4 If x f gu j 1 kd has L2-gain less than J , then it can be shown that L0 : L t L0 such that x f guˆ j 1 kd has L2-gain less than J .
Nearly H Optimal Neural Network Control for Constrained-input Systems
103
Proof. Since x f gu j 1 kd has L2-gain less than J , then this implies that there exists a P ( x ) ! 0 such that u j 1
PxT ( f gu j 1 ) hT h 2 ³ I 1 (v)dv 4J1 2 PxT kk T Px
Q( x) 0
0
Hence, one can show that uˆ j 1
PxT ( f guˆ j 1 ) hT h 2 ³ I 1 (v)dv 4J1 2 PxT kk T Px 0
uˆ j 1
Q( x) PxT g (uˆ j 1 u j 1 ) 2
³I
1
(v) dv
u j 1
From uniform convergence of uˆ j 1 to u j 1 , L0 : L t L0 such that uˆ j 1
x :, PxT g (uˆ j 1 u j 1 ) 2
³I
1
(v)dv ! Q( x)
u j 1
This implies that uˆ j 1
x :, Px g ( f uˆ j 1 ) h h 2 ³ I 1 (v)dv 4J1 2 PxT kk T Px 0 T
T
0
, The importance of Theorem 5.4 is that it justifies solving for the available storage for the new updated dynamics x f guˆ j 1 kd . Hence, all of the preceding theorems can be used to show by induction the following main convergence results. The next theorem is an important result upon which the algorithm proposed in Figure 5.1 is justified. Theorem 5.5 L0 : L t L0 such that A. For all j , x f guˆ j 1 kd is dissipative with L2-gain less than J on : . B. For all j and i , x C.
f guˆ j 1 kd i is asymptotically stable on : .
H , L1 ! L0 such that sup uˆ j u H and sup Vˆji V H . x:
x:
Proof. The proof follows directly by indution from Theorems 5.1–5.4.
,
104
Nonlinear H2/H Constrained Feedback Control
5.3 RTAC: The Nonlinear Benchmark Problem In this section, we will consider the disturbance attenuation problem associated with the so called Rotational/Translational Actuator (RTAC) system depicted in Figure 5.2.
Figure 5.2. Rotational actuator to control a translational oscillator
The RTAC system is described in [25] and is abstracted to study a dual-spin spacecraft. It consists of a cart of mass M connected to a fixed wall by a linear spring of stiffness k. The cart is constrained to have one-dimensional travel. The proof-mass actuator attached to the cart has mass m and moment of inertia I about its center of mass, which is located at a distance e from the point about which the proof-mass rotates. Its motion occurs in a horizontal plane so that no gravitational forces need to be considered. The motion of RTAC is derived in [25] and is repeated as follows:
[ [
H (T 2 sin T T cos T ) F
(5.14)
T H[ cos T u
where [ is the one-dimensional displacement of the cart, T the angular position of the proof body, w the disturbance, and u the control input. The coupling between the translational and rotational motion is captured by the parameter H , which is defined by
H
me ( I me 2 )( M m)
where 0 H 1 is the eccentricity of the proof body. Letting x col ( x1 x2 x3 x4 ) col ([ [ T T) , d F , and following state-space representation of (5.14):
y
[
yields the
Nearly H Optimal Neural Network Control for Constrained-input Systems
x
f ( x ) g ( x )u k ( x ) d
y
x1
105
(5.15)
where
f
g
k
x2 ª º « » 2 x1 H x4 sin x3 « » « » 1 H 2 cos 2 x3 « » x4 « » « H cos x ( x H x 2 sin x ) » 3 1 4 3 » « «¬ »¼ 1 H 2 cos 2 x3 0 ª º « H cos x » 3 « » «1 H 2 cos 2 x3 » « » 0 « » « » 1 « » 2 2 «¬1 H cos x3 »¼ 0 ª º « » 1 « » 2 2 «1 H cos x3 » « » 0 « » « H cos x3 » « » 2 2 ¬«1 H cos x3 ¼»
(5.16)
where 1 H 2 cos 2 x3 z 0 for all x3 and H 1. The dynamics of this nonlinear plant pose a challenge as both the rotational and translation motions are coupled as shown. In [89] and [74], unconstrained controls were obtained to solve the L2 disturbance problem of the RTAC system based on Taylor series solutions of the HJI equation. In [74], unconstrained controllers based on the state-dependent Riccati equation (SDRE) were obtained. The SDRE is easier to solve than the HJI equation and results in a time varying controller that was shown to be suboptimal. In this section, a neural network constrained-input H state feedback controller is computed for the RTAC shown in Figure 5.2. To our knowledge, this is the first treatment in which input constraints are explicitly considered during the design of the optimal H controller that guarantees optimal disturbance attenuation. The dynamics of the nonlinear plant are given as
106
Nonlinear H2/H Constrained Feedback Control
x z'z
H me
f ( x) g ( x)u k ( x)d , 2 1
2 2
u d2
2 3
x 0.1x 0.1x 0.1x42 u
I me M m 2
0.2, J
2
(5.17)
q
10
with f ( x) , g ( x) and k ( x) as defined in (5.16). The design steps procedure goes as follows: Initial control selection The following H f controller of the linear system resulting from Jacobian linearization of (5.17) is chosen u0
2 tanh(2.4182 x1 1.1650 x2 0.3416 x3 1.0867 x4 )
and forced to obey the u d 2 constraint. This is a stabilizing controller that guarantees that L2 -gain<6 for the Jacobian linearized system [89]. The neural network is going to be trained on the following region of the state-space xi d 2 i 1, 2,3, 4 which is a subset of the region of asymptotic stability of u0 that can be estimated using techniques in [34]. Policy iterations The iterative algorithm starts by approximately solving for the HJI with J 30 . The approximate solution is done by inner loop iterations between (5.6) and (5.10) followed by outer loop policy iterations (5.7). In the simulation performed, the neurons of the neural network were chosen from the 6th-order series expansion of the value function. Only polynomial terms of even order were considered, therefore the total number of neural networks is L 129 and is shown in Figure 5.3. A 6th-order series approximation of the value function was satisfactory for our purposes, and it results in a 5th-order controller as done for the unconstrained case in [46]. Once the neural network algorithm converges, and an approximate solution is found for (4.38) with J 30 , the resulting controller can be used as an initial controller for a new inner/outer loop iterations to solve (4.38) with a smaller J . The computational routine was successful in obtaining approximate solutions to (4.38) with J 10 with the final weights as given Figure 5.4. The controller is finally given as u
1 g T ( x)ı LT w 2
The neural network activation functions are shown in Figure 5.3. Note that this is a Volterra type neural network.
Nearly H Optimal Neural Network Control for Constrained-input Systems
VL
$ x12, x1 x2, x1 x3, x1 x4, x22, x2 x3, x2 x4, x32, x3 x4, x42, x14,
x13 x2, x13 x3, x13 x4, x12 x22, x12 x2 x3, x12 x2 x4, x12 x32, x12 x3 x4, x12 x42, x1 x23, x1 x22 x3, x1 x22 x4, x1 x2 x32, x1 x2 x3 x4, x1 x2 x42, x1 x33, x1 x32 x4, x1 x3 x42, x1 x43, x24, x23 x3, x23 x4, x22 x32, x22 x3 x4, x22 x42, x2 x33, x2 x32 x4, x2 x3 x42, x2 x43, x34, x33 x4, x32 x42, x3 x43, x44, x16, x15 x2, x15 x3, x15 x4, x14 x22, x14 x2 x3, x14 x2 x4, x14 x32, x14 x3 x4, x14 x42, x13 x23, x13 x22 x3, x13 x22 x4, x13 x2 x32, x13 x2 x3 x4, x13 x2 x42, x13 x33, x13 x32 x4, x13 x3 x42, x13 x43, x12 x24, x12 x23 x3, x12 x23 x4, x12 x22 x32, x12 x22 x3 x4, x12 x22 x42, x12 x2 x33, x12 x2 x32 x4, x12 x2 x3 x42, x12 x2 x43, x12 x34, x12 x33 x4, x12 x32 x42, x12 x3 x43, x12 x44, x1 x25, x1 x24 x3, x1 x24 x4, x1 x23 x32, x1 x23 x3 x4, x1 x23 x42, x1 x22 x33, x1 x22 x32 x4, x1 x22 x3 x42, x1 x22 x43, x1 x2 x34, x1 x2 x33 x4, x1 x2 x32 x42, x1 x2 x3 x43, x1 x2 x44, x1 x35, x1 x34 x4, x1 x33 x42, x1 x32 x43, x1 x3 x44, x1 x45, x26, x25 x3, x25 x4, x24 x32, x24 x3 x4, x24 x42, x23 x33, x23 x32 x4, x23 x3 x42, x23 x43, x22 x34, x22 x33 x4, x22 x32 x42, x22 x3 x43, x22 x44, x2 x35, x2 x34 x4, x2 x33 x42, x2 x32 x43, x2 x3 x44, x2 x45, x36, x35 x4, x34 x42, x33 x43, x32 x44, x3 x45, x46( Figure 5.3. Volterra neural network used in the RTAC example
107
108
Nonlinear H2/H Constrained Feedback Control
w [7.5591 -0.5592 -0.0398 -2.0616 7.5212 1.7514" 3.0072 0.3526 1.2436 1.3561 0.0910 0.0082" -0.1817 -0.1380 0.1958 0.1807 0.4315 0.2912 0.0057 -0.1288 0.3864 0.1383 -0.2192 0.4320 0.1107 0.1727 0.2055 0.0897 -0.4341 -1.9855 -0.1703 -0.0064 -0.2915 0.0071 -0.0082 0.0052
0.1441 0.3113" -0.0817 0.2979" 0.1636 0.0131" 0.3292 0.3234" 0.1540 -0.1364"
0.0053 0.0407 0.0029 -0.0125 0.0142" 0.0061 -0.0099 -0.0072 -0.0060 -0.0123" -0.0110 0.0289 0.0193 0.0033 -0.0147" 0.0074 0.0098 0.0001 0.0016 0.0047 "
-0.0138 -0.0084 -0.0047 -0.0192 -0.0258 -0.0177" -0.0408 -0.0187 -0.0053 -0.0012 -0.0144 -0.0260" -0.0080 0.0062 -0.0011 0.0140 0.0109 -0.0031" -0.0127 -0.0051 -0.0041 -0.0134 -0.0131 -0.0141" -0.0292 -0.0178 -0.0089 -0.0243 -0.0125 0.0022" -0.0482 -0.0388 0.0184 0.0366 0.0064 0.0011" -0.0063 -0.0042 -0.0004 -0.0102 -0.0150 -0.0141" -0.0515 -0.0319 -0.0144 0.0157 0.0003 0.0200" 0.0398 0.0091 0.0346 0.1461 -0.0217 -0.0407" -0.0048 -0.0008 -0.0273 0.0100 0.0493 0.0037" -0.0105 -0.0167 -0.0058]T . Figure 5.4. Weight of the Volterra neural network used in the RTAC example
Simulation Figures 5.5 and 5.6 show the states trajectories when the system is at rest and experiencing a disturbance d (t ) 5sin(t )e t . Figure 5.7 shows the control signal, while Figure 5.8 shows the attenuation f
³ 0
2
z (t ) dt
f
³
2
d (t ) dt
0
Figures 5.9 and 5.10 shows the states trajectories when the system is at rest and experiencing a disturbance d (t ) 5sin(t )e t . Figures 5.11 and 5.12 show the control signal and attenuation respectively.
Nearly H Optimal Neural Network Control for Constrained-input Systems
Initial Controller State Trajectories 8 r theta
6
x 1,x 3
4 2 0 -2 -4 -6
0
20
40 60 Time in seconds
80
100
Figure 5.5. r , T state trajectories
Initial Controller State Trajectories 4 rdot thetadot
x 2,x 4
2
0
-2
-4
0
20
40 60 Time in seconds
Figure 5.6. r , T state trajectories
80
100
109
Nonlinear H2/H Constrained Feedback Control
Initial Controller 2
control
1
0
-1
-2
0
20
40 60 Time in seconds
80
100
80
100
Figure 5.7. u (t ) control input
Initial Controller Cost 300 250
Attenuation
110
200 150 100 50 0
0
20
40 60 Time in seconds
Figure 5.8. Disturbance attenuation
Nearly H Optimal Neural Network Control for Constrained-input Systems
Nearly Optimal Controller State Trajectories 3 r theta
2
x 1,x 3
1
0
-1
-2
0
20
40 60 Time in seconds
80
100
Figure 5.9. Nearly optimal r , T state trajectories
Nearly Optimal Controller State Trajectories 2 rdot thetadot
1
x 2,x 4
0
-1
-2
-3
0
20
40 60 Time in seconds
80
Figure 5.10. Nearly optimal r , T state trajectories
100
111
Nonlinear H2/H Constrained Feedback Control
Nearly Optimal Controller 1.5 1
control
0.5 0 -0.5 -1 -1.5
0
20
40 60 Time in seconds
80
100
Figure 5.11. Nearly optimal u (t ) control input
Nearly Optimal Controller Cost 25
20 Attenuation
112
15
10
5
0
0
20
40 60 Time in seconds
80
Figure 5.12. Nearly optimal disturbance attenuation
100
Nearly H Optimal Neural Network Control for Constrained-input Systems
113
The nearly optimal nonlinear constrained input H f controller is shown to perform much better than the initial controller the algorithm started with. It is a novel utilization of the neural networks approximation property to obtain a closedform solution to the constrained-input H control policy, which is very hard to find otherwise.
5.4
Bibliographical Notes
The work in this chapter appears in [4]. The systems considered are affine in input with control saturation. The algorithm relies on policy iterations that have been proposed for unconstrained [20] and constrained [3] control case. The nonlinear benchmark problem is originally proposed in [25]. In [89] and [74], unconstrained controls were obtained to solve the L2 -disturbance problem of the RTAC system based on Taylor series solutions of the HJI equation. In [74], unconstrained controllers based on the state-dependent Riccati equation (SDRE) were obtained. The SDRE is easier to solve than the HJI equation and results in a time varying controller that was shown to be suboptimal.
6 Taylor Series Approach to Solving HJI Equation
It is known from the previous chapters that the solvability of the L2-gain attenuation problem by state feedback control is boiled down to the solvability of the Hamilton–Jacobi–Isaacs (HJI) equation. However, due to the nonlinear nature, it is almost impossible to obtain the closed-form solution of the HJI equation. In this chapter, we develop an approximation approach to solving the HJI equation in terms of the Taylor series. The chapter is organized as follows. In Section 6.1 we introduce the notation and review the state feedback H control problem. Section 6.2 shows that the coefficients of the Taylor series solution are governed by a Riccati equation and a sequence of linear algebraic equations. The solvability conditions for these equations are also derived. In Section 6.3, we give explicit expressions for linear algebraic equations that lend themselves to an iterative algorithm. In Section 6.4, the algorithm is applied to solve the disturbance attenuation problem for the well-known benchmark rotational/translational actuator system approximately.
6.1 Introduction We will focus on the full information case since the HJI equation arising from the state estimation has the same form as that arising from the full information case. For convenience of introducing our approach, the notation of this chpater is different from that in Chapter 1 and we will consider a nonlinear system described as follows:
x
f ( x) g1 ( x) w g 2 ( x)u
z
h( x) l ( x)u
(6.1)
where x \ n is the plant state, u \ m2 the plant input, w \ m1 a set of exogenous input variables, and z \ p a penalty variable. It is assumed that all
116
Nonlinear H2/H Constrained Feedback Control
functions involved in this setup are smooth and defined in a neighborhood X of n the origin in \ and f (0) 0 and h(0) 0. To simplify the derivation of the controller, we also make the following assumptions about the plant: hT ( x)l ( x) T
l ( x)l ( x)
0
(6.2)
R2
with R2 a nonsingular constant matrix. The feedback control law takes the form: u
k ( x)
(6.3)
where k ( x) is a locally defined sufficiently smooth function satisfying k (0) 0. As discussed in Chapter 1, the purpose of control is twofold: to achieve closedloop stability and to attenuate the effect of the disturbance input w to the penalty variable z. Here closed-loop stability means that the equilibrium x 0 of the closed-loop system with w 0 is asymptotically stable. The disturbance attenuation is characterized in the following way. Given a real number 0 J , it is said that the exogenous signals are locally attenuated by J if there exists a neighborhood X of the point x 0 such that for every T ! 0 and for every piecewise continuous function w : [0, T ] o \ m1 for which the state trajectory of the closed-loop system starting from x(0) 0 remains in X for all t [0, T ], and the response z : [0, T ] o \ p of (6.1) and (6.3) satisfies T
T
T T 2 ³ z (s) z (s)ds d J ³ w (s)w(s)ds 0
(6.4)
0
The HJI equation has been derived in Chapter 1 using the zero-sum game theory. Here we will give a different derivation of the HJI equation based on the work of [45]. For this purpose, let the Hamiltonian function associated with the above problem be H a ( x, p, w, u )
pT ( f ( x) g1 ( x) w g 2 ( x)u ) 1 2 2 ( h( x) l ( x)u J 2 w ) 2
(6.5)
Under assumption (6.2), Equation (6.5) can be rewritten as follows: H a ( x, p,w,u )
ª wº 1 pT f ( x) hT ( x)h( x) [ pT g1 pT g 2 ] « » 2 ¬u ¼ T
1 ª wº ª wº « » R« » 2 ¬u ¼ ¬u ¼
(6.6)
Taylor Series Approach to Solving HJI Equation
117
where R
ª J 2 I « ¬ 0
0º » R2 ¼
(6.7)
Let def
H ( x, p )
H a ( x, p,D1 ( x, p ),D 2 ( x, p ))
(6.8)
where
D1 ( x, p )
1
J2
g1T p, D 2 ( x, p )
R21 g 2T p
(6.9)
Then H ( x, p )
g gT 1 1 pT f ( x) h1T h1 pT ( 1 21 g 2 R21 g 2 ) pT 2 2 J
(6.10)
It is shown in [47] that if there exists a positive definite C1 function V ( x) , locally n defined in a neighborhood of the origin in \ that satisfies
H a ( x,VxT ,D1 ( x,VxT ),D 2 ( x,VxT ))
0
(6.11)
or, more explicitly, g gT 1 1 Vx f ( x) h1T h1 Vx ( 1 21 g 2 R21 g 2 )VxT 2 2 J
0
(6.12)
where Vx denotes the gradient of V ( x ) , then the state feedback control law given by u
R21 g 2T ( x)VxT
(6.13)
achieves disturbance attenuation with performance level specified by J .
Remark 6.1 Equation (6.12), although called the Hamilton–Jacobi–Isaacs equation, is slightly different from (1.91). This difference can be eliminated by replacing Vx in (6.12) with Vx / 2 .
118
Nonlinear H2/H Constrained Feedback Control
Due to the nonlinear nature, it is rarely possible to find a closed form solution for the HJI equation. Therefore, in what follows, we will find a Taylor series solution of (6.12). We will show that the coefficient vectors of the power series solution of (6.12) are governed by one algebraic Riccati equation and a sequence of linear algebraic equations. And, further, we will give conditions for these equations to be solvable. This set of equations lend themselves to a systematic procedure to generate the coefficient vectors of the Taylor series expansion of V ( x) .
6.2 Power Series Solution of HJI Equation To obtain a Taylor series solution of (6.12), we first express the solution of (6.12) as follows V ( x)
1 T x Px V [3 ] ( x) 2
(6.14)
where P R nu n is a symmetric matrix and V [3 ] ( x) consists of the cubic and higher order terms of V ( x ). Expand H ( x, p ) as follows
H ( x, p )
1 T 1 x H xx x pT H px x pT H pp p H [3 ] ( x, p ) 2 2
(6.15)
where H xx , H px , H pp R nun with H xx and H pp symmetric, and H [3 ] ( x, p ) consists of cubic and high order terms of H ( x, p ). Denote the gradient of V [3 ] by Vx[3 ] . Then Vx
xT P Vx[3 ]
(6.16)
Letting p VxT in (6.15) and then using (6.16) gives
H ( x,VxT )
1 T 1 x H xx x Vx H px x Vx H ppVxT H [3 ] ( x,VxT ) 2 2 1 T 1 T x H xx x x PH px x xT PH pp Px 2 2 1 [3 ] [3 ] Vx ( H px H pp P ) x Vx H pp (Vx[3 ] )T H [3 ] ( x,VxT ) (6.17) 2
Clearly, the quantity Vx[3 ] ( H px H pp P ) x Vx[3 ] H pp (Vx[3 ] )T / 2 H [3 ] ( x, VxT ) does not contain the quadratic term in x of H ( x,VxT ). Therefore, Equation (6.12) can be split into two equations as follows:
Taylor Series Approach to Solving HJI Equation
1 T 1 x H xx x xT PH px x xT PH pp Px 2 2
119
0
(6.18)
1 Vx[3 ] H pp (Vx[3 ] )T H [3 ] ( x,VxT ) 2
(6.19)
and Vx[3 ] ( H px H pp P ) x
Equation (6.18) governs the quadratic term in x of V ( x) while Equation (6.17) governs the cubic and higher order terms of V ( x ). From (6.18), it is clear that P can be obtained by solving the algebraic Riccati equation H Tpx P PH px PH pp P H xx
(6.20)
0
Therefore, we only need to concentrate on obtaining the power series solution V [3 ] ( x) of (6.17). For this purpose, let us introduce the following notation. For any matrix K , we define K (0)
1, K (1)
K , K (i )
K
K
"
K , i 2,3," i factors
where
stands for the Kronecker product. Also, for any n-dimensional vector x [ x1 ," , xn ]T , we define x[0] x
[k ]
1, x[1] k 1
x k 1 1 2
[ x x
x " x1k 1 xn x1k 2 x22 x1k 2 x2 x3 " x1k 2 x2 xn " xnk ]T k t 1
It is clear that there exist constant matrices M k and N k such that x[ k ]
M k x(k ) , x(k )
For example, with n
N k x[ k ]
(6.21)
2, x (2) and x[2] are given by, respectively,
x (2)
ª x12 º « » « x1 x2 » , x[2] « x2 x1 » « 2 » ¬« x2 ¼»
ª x12 º « » « x1 x2 » « x22 » ¬ ¼
Correspondongly, M 2 and N 2 are given by, respectively,
120
Nonlinear H2/H Constrained Feedback Control
M2
ª1 0 0 0 º «0 1 0 0» , N 2 « » «¬ 0 0 0 1 »¼
ª1 «0 « «0 « ¬0
0 0º 1 0 »» 1 0» » 0 1¼
It is noted that the dimensions of M k and N k are C (n,k ) by n k and, respectively, n k by C (n,k ) where def
C (n,k )
Cnk k 1
ik 1 (n k i ) k!
Also, M k N k I n[ k ] where I n[ k ] is an identity matrix of dimension C (n,k ) . Using the above notation gives the following unique expression for V ( x ) : V ( x)
f def 1 T 1 T x Px ¦ Pk x[ k ] x Px V [3 ] ( x) 2 2 k 3
(6.22)
Our purpose is to derive explicit equations that govern all row vectors Pk , k 3, 4,". To this end, we list some useful identities involving the Kronecker product as follows:
Lemma 6.1 (i) For any matrices A, B, C , D of conformable dimensions, ( A
B )( A
D)
( AC )
( BD)
(6.23)
(ii) For k t 1, wx ( k ) wx
k
¦x
( i 1)
I n
x ( k i )
(6.24)
i 1
Where I n denotes an n by n identity matrix. (iii) For x R n , y R m and A R nu m ,
xT Ay
row( A)( x
y )
(6.25)
where row : R nu m o R1u nm is an operator that maps n by m matrix A [aij ] to a 1 by mn row vector row( A) in the following way: row( A) [a11 a12 "a1m "an1 "anm ] k
(iv) For any integers i, j , k t 0, and matrix T R nun ,
Taylor Series Approach to Solving HJI Equation
( x (i )
I n
x ( j ) )Tx ( k )
( I n( i )
T
I n( j ) ) x ( i j k )
121
(6.26)
(v) For any integers i tk 11 and k t i, and row vector Pk of dimension n k , there exists a matrix Pki R nu n determined by Pk such that Pk ( x ( i 1)
I n
x ( k i ) )
( Pki x ( k 1) )T
(6.27)
Proof. Equations (6.23) through (6.25) follow straightforwardly from the definition of the Kronecker product. Equation (6.26) can be proved as follows: ( x (i )
I n
x ( j ) )Tx ( k )
( x ( i )
( I n
x ( j ) ))(1
(Tx ( k )
1)) x (i )
( I n
x ( j ) )(Tx ( k )
1) x (i )
(Tx ( k )
x ( j ) ) ( I n(i ) x (i ) )
(Tx ( k ) )
x ( j ) ( I n(i )
T ) x (i k )
x ( j ) ( I n(i )
T ) x (i k )
( I n( j ) x ( j ) ) ( I n(i )
T
I n( j ) ) x ( i j k )
Note that in deriving (6.26), we have repeatedly utilized identity (6.23). To prove (6.27), note that
x (i 1)
I n
x ( k i )
ª x1(i 1) x ( k i ) « 0 « « # « 0 « « x (i 2) x x ( k i ) 2 « 1 0 « « # « « 0 « # « « # « (i 1) ( k i ) « xn x « 0 « # « « 0 ¬
0 ( i 1) 1
x
x
( k i )
# 0 0 ( i 2) 1 2
x x( k i )
x
# 0
# # 0 ( i 1) n
x
x( k i ) #
0
º " 0 » " 0 » » # # » ( i 1) ( k i ) " x1 x » » " 0 » " 0 » » # # » ( i 2) ( k i ) » " x1 x2 x » # # » » # # » " 0 » » " 0 » # # » ( i 1) ( k i ) » " xn x ¼
122
Nonlinear H2/H Constrained Feedback Control
Next, partition Pk as a 1 by ni block matrix as follows:
Pk
[P
N 1"11
"P 1"1n P 1"21 "P 1"2 n "" P n"n1 "P n"nn ] N N N N N
i tuple
i tuple
i tuple
i tuple
i tuple
i tuple
where, for 1 d j1 ," , ji d n, Pj1" ji1 ji is a row vector of dimension n k i . Then it holds that Pk x (i 1)
I n
x ( k i )
( Pki x ( k 1) )T
(6.28)
where
Pki
P 1"21 " P n"n1 º ªP N 1"11 N N « i tuple i tuple i tuple » « » P 1"22 " P n"n 2 » 1"12 «P N N N i tuple i tuple » « i tuple « # # # # » « » P 1"2 n " P n"nn » «P N 1"1n N N «¬ i tuple i tuple i tuple » ¼
, With these preparations, we are ready to state our major result of this section:
Lemma 6.2 The coefficient vectors Pk , k equations: PkU k
3, 4," , of (6.22) satisfy the following
(6.29)
Hk
where k
Uk
M k (¦ I n( i 1)
( H px H pp P )
I n( k i ) ) N k
(6.30)
i 1
and H k depends only on P,P3 ,",Pk 1 .
Proof. Let us first show that, for k t 3, wx[ k ] ( H px H pp P ) x U k x[ k ] wx
(6.31)
For this purpose, note that, by (6.21), (6.24) and (6.26), for k t 1, and any square matrix T of dimension n,
Taylor Series Approach to Solving HJI Equation
wx[ k ] Tx wx
Mk
123
wx ( k ) Tx wx k
M k (¦ x (i 1)
I n
x ( k i ) )Tx i 1 k
M k (¦ x (i 1)
T
x ( k i ) ) x ( k ) i 1 k
M k (¦ x (i 1)
T
x ( k i ) ) N k x[ k ] i 1
Letting T ( H px H pp P ) in the above equation gives (6.31). Next, by definition, Vx[3 ]
f
¦P
k
k 3
wx[ k ] wx
(6.32)
Substituting (6.32) into the left-hand side of (6.19) and using (6.31) gives Vx[3 ] ( H px H pp P ) x
f
¦ PU k
k
x[ k ]
(6.33)
k 3
On the other hand, by inspection, it is not difficult to find that the right-hand side of (6.19) can be arranged as follows 1 Vx[3 ] H pp (Vx[3 ] )T H [3 ] ( x, VxT ) 2
f
¦H
k
x[ k ]
(6.34)
k 3
where H k depends only on P,P3 ," ,Pk 1 . In fact, it is possible to explicitly express H k in terms of P,P3 ," ,Pk 1 as will be done in the next section. ,
Theorem 6.1 Let the eigenvalues of H px H pp P be O1 ," ,On . Then the eigenvalues of U k are given by
O
l1O1 " ln On
where l1 "ln
k , l1 ," ,ln
(6.35) 0,1, " ,k .
Proof. To establish (6.35), we note that, for any k t 1, U k is such that wx[ k ] Tx U k x[ k ] wx where T
( H px H pp P ).
(6.36)
124
Nonlinear H2/H Constrained Feedback Control
Note that the components of x[ k ] consist of all products of the variables x1 ," ,xn taken k at a time. Therefore if we define / k as the vector space of all homogeneous polynomials in x1 ," ,xn of degree k , then the components of x[ k ] : k o / k such that, for each give a basis of / k . Now define a linear mapping LTx / k I / ,
LTx (I )
wI Tx wx
(6.37)
Then using (6.36) shows LTx ( x1k ,x1k 1 x2 ," , x1k 1 xn , x1k 2 x22 , x1k 2 x2 x3 ," , x1k 2 x2 xn ," , xnk ) wx k wx1k wx k 1 x Tx, 1 2 Tx," , n Tx) wx wx wx [k ] wx ( Tx)T wx (U k x[ k ] )T (
( x1k , x1k 1 x2 ," , x1k 1 xn ,x1k 2 x22 , x1k 2 x2 x3 ," , x1k 2 x2 xn ," , xnk )(U k )T Then, (U k )T is the matrix of the linear mapping under the ordered basis {x1k , x1k 1 x2 ," , x1k 1 xn , x1k 2 x22 , x1k 2 x2 x3 ," ,x1k 2 x2 xn ," ,xnk }
(6.38)
Thus, the spectrum of U k is the same as that of the linear mapping (6.37). Next let the Jordan canonical form of T be
T
ª J1 0 «0 J 2 « «" " « ¬0 0
"
0º 0 »» " "» » " J l ¼ nu n "
(6.39)
where
Ji
ª Oi 1 0 " 0 º «0 O 1 " 0» i « » «" " " " "» « » ¬ 0 0 0 " Oi ¼ ni u ni
is a ni u ni Jordan block with eigenvalues Oi . Suppose the generalized row eigenvalues of T are
Taylor Series Approach to Solving HJI Equation
] 11 ,] 12 ," ,] 1n ,] 21 ," ,] 2 n ,] l1 ," ,] l n 1
2
125
(6.40)
l
satisfying
] ijT
j ni ° Oi] ij ® 1 d j ni °¯Oi] ij ] i ( j 1)
(6.41)
Clearly, D l nl
D1 n1
" (] l1 x)Dl 1 " (] l nl x)
{(] 11 x )D11 (] 12 x)D12 " (] 1n1 x)
l
ni
¦¦ D
ij
k} (6.42)
i 1 j 1
also constitutes a basis for / k . Furthermore LTx ((] ij x) s )
° sOi (] ij x) s ® s s 1 °¯ sOi (] ij x) s (] ij x) ] i ( j 1) x
j
ni
(6.43)
j ni
Now define an order on (6.42) in the following “lexicographic” way D l nl
(] 11 x)D11 " (] l nl x)
! (] 11 x) E11 " (] l nl x)
E l nl
(6.44)
if and only if there exist positive integers i0 and j0 d ni0 such that
Di
0 j0
E i0 j0
D ij
E ij
and
If i i0 , j d ni or i Using (6.43) gives
i0 , j j0 . Then (6.42) constitutes an ordered basis of / k . D l nl
LTx ((] 11 x)D11 "(] l nl x)
l
)
ni
D l nl
(¦¦ D ij Oi )(] 11 x)D11 "(] l nl x) i 1 j 1
D l nl
terms greater than (] 11 x)D11 "(] l nl x)
Thus, the matrix of the linear mapping LTx on / k under the ordered basis (6.42) with the order (6.44) is upper triangular with the diagonal elements being
O
n1
n2
nl
j 1
j 1
j 1
¦ D1 j O1 ¦ D 2 j O2 " ¦ D lj Ol
126
Nonlinear H2/H Constrained Feedback Control
Therefore, the eigenvalues of LTx on / k , hence the eigenvalues of U k , are given , exactly by Equation (6.35). The above theorem immediately leads to the existence condition of the Taylor series solution of (6.19) as follows.
Corollary 6.1 There exists a unique solution Pk of (6.29) for k t 3 if and only if 2," ,k } such that l1 l2 " ln k for all k t 3, all l1 ,l2 ," ,ln {0,1, l1O1 " ln On z 0
(6.45)
Remark 6.2 If we denote the Taylor series expansion of functions f ( x), g1 ( x), g 2 ( x) and h( x ) as follows: f ( x)
Ax f [2 ] ( x), h( x)
Cx h[2 ] ( x)
g1 ( x)
B1 g1[1 ] ( x), g 2 ( x)
B2 g 2[1 ] ( x)
(6.46)
then it is easy to identify that
H px H pp
A, B1 B1T
J2
H xx
CT C
B2 R21 B2T
Therefore H px H pp P corresponds to the closed-loop system matrix of the linearized plant under linear state feedback H f control. As a result, condition (6.45) always holds as long as the standard linear state feedback H f suboptimal control for the linearized plant corresponding to J is solvable.
6.3 Explicit Expression for Hk In the last section, we have shown that the coefficient vectors Pk , k t 3, of the Taylor series expansion of V ( x) satisfy the linear algebraic Equation (6.29) and we have also derived the explicit expression for U k . In this section, we will further derive the explicit expression for H k . First note that there exists a constant matrix S k of dimension n by n k 1 determined by Pk such that k
Pk M k (¦ x (i 1)
I n
x ( k i ) )
( xT )( k 1) Sk
(6.47)
i 1
In fact, let P k Pk M k . Then using (6.27) shows the existence of a matrix Pki of dimension n by n k 1 such that
Taylor Series Approach to Solving HJI Equation
P k ( x (i 1)
I n
x ( k i ) )
( Pki x ( k 1) )T
127
(6.48)
Letting k
Sk
¦ (P )
i T k
, k
3,4,"
(6.49)
i 1
gives (6.47). f
¦ (x
Remark 6.3 As a result of (6.47), we have Vx[3 ] let S 2 P. Then f
¦S
VxT
T k
T ( k 1)
)
S k . For convenience,
k 3
x ( k 1)
k 2
Now we can proceed as follows f
f
l 3
m 3
(¦ ( xT )(l 1) Sl ) H pp ( ¦ SmT x ( m 1) )
Vx[3 ] H pp (Vx[3 ] )T
f
¦ ¦
(( xT )(l 1) Sl H pp S mT x ( m 1) )
k 4 l m k 2 l , mt3 f
¦ ¦
row( Sl H pp S mT )x ( k )
k 4 l m k 2 l , mt3 f k 1
¦¦ row(S H l
pp
S kTl 2 )x ( k )
k 4 l 3 f
¦Z
k
x( k )
(6.50)
k 4
where k 1
Zk
¦ row(S H l
l 3
Next we note that
pp
S kT l 2 ), k
4,5,"
(6.51)
128
Nonlinear H2/H Constrained Feedback Control
H [3 ] ( x, p)
B BT 1 T T 1 x C Cx pT Ax pT ( 1 21 B2 R21 B2T ) p J 2 2 1 pT ( f ( x) Ax) (hT h xT C T Cx) 2 1 T g1 g1T B1 B1T p ( ( g 2 R21 g 2 B2 R21 B2T )) p J2 2 H ( x, p )
Using the notation defined in (6.46) gives H [3 ] ( x, p ) 1 pT ( f ( x) Ax) ( hT h xT C T Cx) 2 1 T g1[1 ] ( g1[1 ] )T B1 ( g1[1 ] )T g1[1 ] B1T p ( g 2[1 ] R21 ( g 2[1 ] )T J2 2 B2 R21 ( g 2[1 ] )T g 2[1 ] R21 B2T ) p 1 pT ( f ( x) Ax) ( hT h xT C T Cx) 2 [1 ] [1 ] T g ( g1 ) 2 g1[1 ] B1T 1 pT ( 1 g 2[1 ] R21 ( g 2[1 ] )T 2 g 2[1 ] R21 B2T ) p 2 J2
(6.52)
Now let f
f ( x)
¦A
m
f
¦C
x ( m ) , h( x )
m 1
where A1
A and C1
m
x(m)
C. Then f
Vx ( f ( x) Ax)
(6.53)
m 1
f
(¦ ( x (l 1) )T Sl )( ¦ Am x ( m ) ) l 2
m 2
f
¦ ¦
(x
( l 1) T
) Sl Am x ( m )
k 3 l m k 1 l ,mt2 f
¦ ¦
row( Sl Am )
k 3 l m k 1 l ,mt2 f k 1
¦¦ row(S A l
k l 1
)
k 3 l 2 f
¦E k 3
k
x(k )
(6.54)
Taylor Series Approach to Solving HJI Equation
129
where k 1
Ek
¦ row(S A
k l 1
l
),
k
3,4,"
(6.55)
l 2
h ( x )T h ( x )
f
f
l 1
m 1
(¦ Cl x (l ) )T ¦ Cm x ( m ) f
¦¦
( x (l ) )T ClT Cm x ( m )
k 2 lm k l , m t1 f
¦¦
row(ClT Cm ) x ( k )
k 2 lm k l , m t1 f k 1
¦¦ row(C
T l
Ck l ) x ( k )
k 2 l 1 f
¦F x
(k )
(6.56)
k
k 2
where k 1
Fk
¦ row(C
T l
Ck l ),k
2,3,"
(6.57)
l 1
To obtain the Taylor series expansion of the term g [1 ] ( g1[1 ] )T 2 g1[1 ] B1T 1 Vx ( 1 g 2[1 ] R21 ( g 2[1 ] )T 2 g 2[1 ] R21 B2T )VxT 2 J2
(6.58)
let B1
[ B11 " B1m1 ], B2
[ B21 " B2 m2 ]
g1[1 ] ( x) [ g11 ( x)" g1m1 ( x)], g 2[1 ] ( x) [ g 21 ( x)" g 2 m2 ( x)] and write the Taylor series g 2l ( x), l 1,2," ,m2 as follows: f
g1l ( x)
¦B
m 1l
expansions
of
g1l ( x), l 1,2," ,m1
and
f
x ( m ) , g 2l ( x)
m 1
Then for i 1,2, j 1," ,mi
¦B
m 2l
m 1
x(m)
(6.59)
130
Nonlinear H2/H Constrained Feedback Control f
Vx Bij
BijT (¦ S kT x ( k 1) )
BijT VxT
k 2
f
¦B
(k ) T k 1
T ij
S
k ij
x(k )
x
k 1 f
¦Y
(6.60)
k 1
where Yijk
BijT S kT1 , k
1, 2,"
(6.61)
Also for i 1,2, j 1," ,mi ,
gijT ( x)VxT
f
¦ ¦
f
f
l 2
m 1
(¦ ( x (l 1) )T Sl )(¦ Bijm x ( m ) )
Vx gij ( x)
( x (l 1) )T Sl Bijm x ( m )
k 2 l m k 1 l t 2, m t1 f
¦ ¦
row( Sl Bijm ) x ( k )
k 2 l m k 1 l t 2, m t1 f
k
¦¦ row(S B l
k 1 l ij
) x( k )
k 2 l 2 f
¦W
k ij
x(k )
(6.62)
k 2
where k
Wijk
¦ row(S B l
l 2
Therefore,
k 1 l ij
), k
2,3,"
(6.63)
Taylor Series Approach to Solving HJI Equation
Vx g1[1 ] B1T VxT
131
m1
¦V g
1j
x
B1TjVxT
j 1 m1
f
¦ (¦ ( x j 1
f
) (W1lj )T )(¦ Y1mj x ( m ) )
(l ) T
l 2
m 1
m1
f
¦ ¦ ¦ (x
(l ) T
) (W1lj )T Y1mj x ( m )
k 3 l m k j 1 l t 2, m t1 m1
f
¦ ¦ ¦ row((W
) Y1mj ) x ( k )
l T 1j
k 3 l m k j 1 l t 2, m t1 f k 1 m1
¦¦ ¦ row((W
) Y1kj l ) x ( k )
l T 1j
k 3 l 2 j 1 f
¦I
1 k
x( k )
(6.64)
k 3
where k 1 m1
I k1
¦¦ row((W
) Y1kj l ), k
l T 1j
3,4,"
(6.65)
l 2 j 1
and Vx g1[1 ] ( g1[1 ] )T VxT
m1
¦V g x
1j
g1TjVxT
j 1 m1
f
¦ (¦ ( x j 1
f
) (W1lj )T )( ¦ W1mj x ( m ) )
(l ) T
l 2
m 2
m1
f
¦ ¦ ¦ (x
(l ) T
) (W1lj )T W1mj x ( m )
k 4 l m k j 1 l t 2, m t 2 m1
f
¦ ¦ ¦ row((W
l T 1j
) W1mj ) x ( k )
k 4 l m k j 1 l t 2, m t 2 f k 2 m1
¦¦¦ row((W
) W1kj l ) x ( k )
l T 1j
k 4 l 2 j 1 f
¦G x 1 k
k 4
(k )
(6.66)
132
Nonlinear H2/H Constrained Feedback Control
where k 2 m1
Gk1
¦¦ row((W
) W1kj l ), k
l T 1j
4,5,"
(6.67)
l 2 j 1
Similarly, m2
¦V g
Vx g 2[1 ] R21 B2T VxT
x
2j
2 k
x(k )
R21 B2T jVxT
j 1 f
¦I
(6.68)
k 3
where k 1 m2
I k2
¦¦ row((W
) R21Y2kjl ),k
l T 2j
3,4,"
(6.69)
l 2 j 1
and m2
¦V g
Vx g 2[1 ] R21 ( g 2[1 ] )T VxT
x
2j
R21 g 2T jVxT
j 1 f
¦G
2 k
x(k )
(6.70)
k 4
where k 2 m1
Gk2
¦¦ row((W
) R21W2kj l ),k
l T 2j
4,5,"
(6.71)
l 2 j 1
Substituting (6.54), (6.56), (6.64), (6.66), (6.68), and (6.70), into (6.52) gives H [3 ] ( x, Vx )
( E3 f
F3 2 I 32 I 31 2 ) N 3 x[3] 2 J
¦ ( Ek k 4
Fk 2 I k2 Gk2 2 I k1 Gk1 )N k x[ k ] 2 2J 2
Finally substituting (6.50) and (6.72) into (6.34) gives
(6.72)
Taylor Series Approach to Solving HJI Equation
F3 2 I 32 I 31 2 ) N3 2 J
H3
( E3
Hk
2 I 1 G1 1 ( Z k 2 Ek Fk k 2 k 2 I k2 Gk2 ) N k ,k 2 J
133
(6.73) 4,5,"
It is clear from the explicit expressions of Z k , Ek , Fk , I ki , and Gki that H k depends only on P, P3 ," , Pk 1 . l
¦ row(S
Remark 6.4 Using W1lj
p
B1l j 1 p ) and Y1kj l
B1Tj S kT l 1 in (6.65) gives a
p 2
more explicit expression of I k1 as follows k 1 m1
I k1
¦¦ row((W
) Y1kj l )
l T 1j
l 2 j 1 k 1 m1
l
¦¦ row(¦ (row(S l 2 j 1
p
B1l j 1 p ))T B1Tj SkT l 1 )
p
B1l j 1 p )T B1Tj S kT l 1 ),k
p 2
m1 k 1
l
¦¦¦ row((row(S
3,4,"
j 1 l 2 p 2 l
Also using W1lj l 2," k 2,
¦ row(S
p
B1l j 1 p ) and W1kj l
k l 1 q 1j
) gives, for
(row( S p B1l j 1 p ))T row( S q B1kj l 1 q )
(6.74)
p 2
k l
l
p 2 k
¦
q
q 2
¦ (row(S
(W1lj )T W1kj 1
k l
¦ row(S B
p
B1l j 1 p ))T ¦ row( S q B1kj l 1 q ) q 2
¦
s 4 p q s 2 d p d l ,2 d q d k l
Substituting (6.74) into (6.67) gives k 2 m1
Gk1
¦¦ row((W
) W1kj l )
l T 1j
l 2 j 1
k 2 m1
k
¦¦ row(¦ l 2 j 1
m1 k 2 k
¦¦¦
¦
¦
j 1 l 2 s 4 p q s 2 d p d l ,2 d q d k l
Similarly, we have
(row( S p B1l j 1 p ))T row( S q B1kj l 1 q ))
s 4 p q s 2 d p d l ,2 d q d k l
row((row( S p B1l j 1 p ))T row( Sq B1kj l 1 q )),k
4,5,"
134
Nonlinear H2/H Constrained Feedback Control
k 1 m2
¦¦ row((W
I k2
) R21Y2kj l )
l T 2j
l 2 j 1
m2 k 1
l
¦¦¦ row((row( S
p
B2l j1 p )T R21 B2T j S kT l 1 ),k
3,4,"
j 1 l 2 p 2
and k 2 m2
Gk2
¦¦ row((W
) R21W2kj l )
l T 2j
l 2 j 1
m2 k 2 k
¦¦¦ j 1 l 2 s 4
¦
row((row( S p B2l j1 p ))T R21row( S q B2k j l 1 q )),k
4,5,"
pq s 2 d p d l ,2 d q d k l
Remark 6.5 We can also put the state feedback control law (6.13) in Taylor series form. For this purpose, using (6.60) and (6.62) gives B2T VxT ( g 2[1 ] )T VxT
g 2T ( x)VxT
T T ª B21 VxT º ª g 21 VxT º « » « » « " »« " » « B2Tm VxT » « g 2Tm VxT » ¬ 2 ¼ ¬ 2 ¼ T T ª B21 Sk 1 W21k º f « » T [k ] " B2 Px ¦ « »N k x k 2 « B2Tm SkT1 W2km » 2 ¼ ¬ 2
(6.75)
Substituting (6.75) into (6.13) gives u
R21 g 2T ( x)VxT (6.76)
f
¦Q x
[k ]
k
k 1
where Q1
R21 B2 P,
Qk
T T ª B21 S k 1 W21k º « » R21 « " » N k ,k t 2 « B2Tm S kT1 W2km » ¬ 2 2 ¼
In particular, when m2
1, we have B2
B21 . Thus, for k t 2,
(6.77)
Taylor Series Approach to Solving HJI Equation
135
R21 ( B2T SkT1 W21k ) N k
Qk
(6.78)
k
R21 ( B2T SkT1 ¦ row( Sl B2k 1 l )) N k l 2
6.4 The Disturbance Attenuation of RTAC System In this section, we will consider the disturbance attenuation problem associated with the Rotational/Translational Actuator (RTAC) system depicted in Figure 5.2. The motion equation of the system has been derived in Chapter 5. Under a mild persistent disturbance, the position of the cart will oscillate or even go unbounded. We will consider how to design a state feedback controller such that the affect of the disturbance on the position of the cart can be attenuated to a certain degree. This control problem can be formulated as a nonlinear H f control problem by introducing the following performance output z h( x) l ( x)u where h( x )
ª Dx º « 0 » , l ( x) ¬ ¼
ªO4u1 º « 1 », D ¬ ¼
diag (1, G , G , G )
where G is a nonnegative real number introduced for making the design trade-off. The nonlinearity of the system has prohibited a solution of the pertinent HJI equation. Here, we will consider how to systematically obtain the Taylor series solution of the HJI equation. To this end, let us first give the coefficient matrices of the Taylor series expansion of various functions described in (5.16) up to 5th-order assuming H 0.2, G 0.1, and J 6. The results are given as follows: A2 k
A1
A5
0, k
1 ª 0 « 25 « 0 « 24 « 0 0 « « 1 «¬ 24 0 ª (2, 256) « 25 « ¬« 1536
B12 k 1
1,2,"
(6.79)
0 0º » 0 0» ª(2,11) (2, 48) (4,11) (4, 48) º », « 25 A 5 65 1 »» 3 « 0 1» » 24 576 24 ¼» 4u64 ¬« 576 » 0 0» ¼ (2, 688) (4,171) (4, 688) º ," 5 25 17 »» 576 1536 576 ¼» 4u1024
0,B22 k 1
0, k
1,2," ,
(6.80)
136
Nonlinear H2/H Constrained Feedback Control
B10
0 2
B
ª 0 º « 25 » « » « 24 » , « 0 » « » « 5» «¬ 24 »¼ ª 0 º « 5» « » « 24 » , « 0 » « » « 25 » «¬ 24 »¼
C1
B12
ª (2,11) (4,11) º « 25 65 »» , « ¬« 576 576 »¼ 4u16
B14
ª (2,171) (4,171) º « 25 ," 25 »» « 1536 »¼ 4u 256 ¬« 1536
2 2
ª (2,11) (4,11) º « 65 25 »» , « «¬ 576 576 »¼ 4u16
4 2
ª (2,171) (4,171) º « 25 ," 25 »» « «¬ 1536 1536 »¼ 4u 256
B
ªC º « 0 » , Ck ¬ ¼
B
2,3,"
0 for k
R2
(6.81)
1
Remark 6.6 We have adopted a compact form to denote the sparse matrices A3 , A5 and so on. With our notation, only the non-zero elements of a matrix together with its coordinates are given. For example, the matrix A3 is a 4 u 64 matrix whose nonzero elements only appear in row 2 and column 11, row 2 and column 48, row 4 and column 11, and row 4 and column 48, with values 25 / 576, 5 / 24, 65 / 576, 1/ 24, respectively. To be specific, we will solve HJI equation up to the 6th-order. Let us first point out that P2l 1 0, for all l 1, 2," , which can be concluded thanks to equations (6.79), (6.80), and (6.81). As a result, we only need to obtain P, P4 , and P6 . It is known that P is the solution to the following algebraic Riccati equation AT P PA P( B2 R21 B2T
1
J2
B1 B1T ) P C T C
0
and can be obtained easily as follows
P
S2
0.14405 0.17539 0.94317 º ª 10.835 « 0.14405 9.1398 0.11685 0.40559 »» « « 0.17539 0.11685 0.050209 0.12061 » « » 0.12061 0.56475 ¼ ¬ 0.94317 0.40559
(6.82)
Taylor Series Approach to Solving HJI Equation
137
To obtain P4 and P6 iteratively, we need to make use of the algorithm in the last two sections. Let us first consider P4 using P4 H 4U 41 . By (6.30) and (6.73), we can calculate U 4 and H 4 as follows: H4
2 I 1 G1 1 ( Z 4 2 E4 F4 4 2 4 2 I 42 G42 ) N 4 2 2J
U4
M 4 (¦ I 4(i 1)
( H px H pp P )
I 4(4 i ) ) N 4
4
i 1
M 4 (( H px H pp P )
I 4(3) I 4
( H px H pp P )
I 4(2) I 4(2)
( H px H pp P )
I 4 I 4(3)
( H px H pp P )) N 4
(6.83)
Clearly, the two matrices U 4 and H 4 depend on the following intermediate quantities: 3
E4
¦ row(S A
4 l 1
l
) row( S2 A3 ) row( S3 A2 )
l 2 3
F4
¦ row(C
T l
C4 l ) row(C1T C3 ) row(C2T C2 ) row(C3T C1 )
l 1
Z4
row( S3 H pp S3T )
Y111
B1T S2T
Y112
B1T S3T
W112
row( S 2 B11 )
W113
¦ row(S B
3
4 l 1
l
) row( S 2 B12 ) row( S3 B11 )
l 2 3
I 41
¦ row((W
) Y114 l ) row((W112 )T Y112 ) row((W113 )T Y111 )
l T 11
l 2
Y211
B2T S2T
Y212
B2T S3T
W212
row( S 2 B21 ) 3
W213
¦ row(S B
4 l 2
l
) row( S 2 B22 ) row( S3 B21 )
l 2 3
I 42
¦ row((W
) R21Y214 l ) row((W212 )T R21Y212 ) row((W213 )T R21Y211 )
l T 21
l 2 2
1 4
G
¦ row((W
) W114 l ) row((W112 )T W112 )
l T 11
l 2 2
G42
¦ row((W
) R21W214 l ) row((W212 )T R21W212 )
l T 21
l 2
138
Nonlinear H2/H Constrained Feedback Control
Note that since m1 m2 1, we have made use of the notation B1 B2 B21 in the above equations. Finally, using P4 H 4U 41 gives P4
B11 and
[37.148 48.909 16.107 50.479 54.887 18.389 50.657 2.5827 15.492 24.097 34.776 15.924 45.004 0.80968 8.9895 14.252 0.013263 0.064319 2.6391 3.7588 15.398 11.18 24.834 2.7804 12.28 14.289 0.040014 0.066337 2.7555 2.141 0.0012059 0.016393 0.024149 0.18431 0.20317]1u35 (6.84)
From P4 , we can obtain S 4 P41 P42 P43 P44 which will be used for calculating P6 and the Taylor series expansion of the control law. Similarly, we can compute P6 using P6 H 6U 61 . By (6.30) and (6.73), we can obtain U 6 and H 6 as follows: 6
U6
M 6 (¦ I 4(i 1)
( H px H pp P )
I 4(6 i ) ) N 6 i 1
M 6 (( H px H pp P )
I 4(5) I 4
( H px H pp P )
I 4(4) I 4(2)
( H px H pp P)
I 4(3) I 4(3)
( H px H pp P)
I 4(2) I H6
(4) 4
( H px H pp P)
I 4 I
(5) 4
(6.85)
( H px H pp P )) N 6
2 I 1 G1 1 ( Z 6 2 E6 F6 6 2 6 2 I 62 G62 ) N 6 2 2J
where intermediate quantities are computed as follows: 5
E6
¦ row(S A l
6 l 1
) row( S2 A5 ) row( S3 A4 ) row( S 4 A3 ) row( S5 A2 )
l 2 5
F6
¦ row(C
T l
C6 l )
l 1
row(C1T C5 ) row(C2T C4 ) row(C3T C3 ) row(C4T C2 ) row(C5T C1 ) 5
Z6
¦ row(S H l
pp
S8T l )
row( S3 H pp S5T ) row( S4 H pp S4T ) row( S5 H pp S3T )
l 3
Y113
B1T S 4T
Y114
B1T S5T
W114
¦ row(S B
4
l
l 2
5l 1
)
row( S2 B13 ) row( S3 B12 ) row( S 4 B11 )
Taylor Series Approach to Solving HJI Equation
139
5
¦ row(S B
W115
6 l 1
l
)
row( S 2 B14 ) row( S3 B13 ) row( S 4 B12 ) row( S5 B11 )
l 2 5
¦ row((W
) Y116 l )
l T 11
I 61
l 2
row((W112 )T Y114 ) row((W113 )T Y113 ) row((W114 )T Y112 ) row((W115 )T Y111 ) Y213
B2T S4T
Y214
B2T S5T
W214
¦ row(S B
4
5l 2
l
)
row( S 2 B23 ) row( S3 B22 ) row( S 4 B21 )
)
row( S 2 B24 ) row( S3 B23 ) row( S 4 B22 ) row( S5 B21 )
l 2 5
¦ row(S B
5 21
W
6 l 2
l
l 2 5
¦ row((W
) R21Y216 l )
l T 21
I 62
l 2
row((W212 )T R21Y214 ) row((W213 )T R21Y213 ) row((W214 )T R21Y212 ) row((W215 )T R21Y211 ) 4
¦ row((W
G61
) W116 l ) row((W112 )T W114 ) row((W113 )T W113 ) row((W114 )T W112 )
l T 11
l 2 4
¦ row((W
G62
) R21W216 l )
l T 21
l 2
row((W214 )T R21W212 ) row((W213 )T R21W213 ) row((W212 )T R21W214 ) Using P6 P6
H 6U 61 gives [981.45 732.87 439.5 1486.2 1088.8 19.12 121.76 21.214 322.76 634.91 932.12 330.63 962.35 53.861 380.64 637.55 9.0045 34.001 47.988 0.84132 478.2 72.518 249.34 3.8088 28.362 38.032 1.5946 54.298 312.67 378.13 0.016103 2.1404 21.668 66.028 49.944 374.97 79.038 189.96 50.322 291.4 417.1 11.401 75.627 225.52 193.47 0.0085719 0.92724 26.741 98.066 83.407 0.024746 0.43614 1.2571 2.8829 13.062 9.5653 116.81 95.439 244.02 18.159 55.461 39.764 1.2045 24.8 118.21 124.73 0.26491 4.3047 29.456 68.167 43.57 0.032143 0.24508 0.39798 6.4559 13.892 7.8167 4.7213e 005 0.003559 0.035064 0.13441 0.33689 1.0005 0.59147]1u84
140
Nonlinear H2/H Constrained Feedback Control
From P6 , we can obtain S6 by using S6 P61 P62 P63 P64 P65 P66 . Using (6.76) and (6.78) and notingt that m2 1, S3 0 and S5 0, we can obtain the fifth-order controller as follows: u5
Q1 x Q3 x[3] Q5 x[5] R21 B2 Px R21[ B2T S4T row( PB22 )]N 2 x[3] R21[ B2T S6T row( PB24 ) row( S4 B22 )]N 5 x[5]
The numerical values for Q1 Q1
[0.95246 1.4816 0.10129 0.50378]
and for Q3
[42.393 29.898 12.306 36.649 25.144
Q3
2.7298 10.94 0.26036 3.6253 8.7769 13.036 5.804 14.249 0.075606 0.62405 0.7366 0.00078754 0.042873 0.0019096 0.4005]1u 20 and for Q5 is Q5
[1395.4 326.83 332.23 1297.4 419.88 534.26 1729.2 27.525 20.675 130.19 138.78 15.78 76.603 68.337 663.21 1197.5 3.3001 37.455 141.2 129.32 192.71 369.41 1027.3 57.147 287.71 343.9 0.58078 33.119 212.49 266.92 0.34536 2.0269 4.557 33.996 32.443 108.18 41.644 171.35 46.839 292.49 422.91 6.9832 53.034 139.14 103.59 0.11682 1.5678 10.848 29.482 22.558 0.00075384 0.0098191 0.21599 0.26505 2.3165 2.0682]1u56
We now compare the performance of the linear and third-order controllers by computer simulations. Let us take a look at Figure 6.1 which shows the time response of the four states of the open-loop system with zero initial condition under the disturbance F (t ) 0.5sin(3t ).
Taylor Series Approach to Solving HJI Equation
141
Figure 6.1. Time response of the four states of the open-loop system with zero initial condition under the disturbance F (t ) 0.5sin(3t )
It can be seen that, without control, the magnitude of the steady-state oscillation is about half of the amplitude of the disturbance. Moreover, if we increase the amplitude of the disturbance to 5.5, the states of the system become unbounded. Next, we compare the stabilization performance of the linear and third-order controllers. Figure 6.2 shows the time response of the four states of the closed-loop system resulting from both linear and third-order controllers with initial condition x(0) [0.1 0.1 0.1 0.1] and zero external disturbance. Figure 6.3 repeats the scenario of Figure 6.2 with initial condition
x(0) [0.45 0.45 0.45 0.45]
142
Nonlinear H2/H Constrained Feedback Control
Figure 6.2. Time response of the four states of the closed-loop system with initial condition x(0) [0.1 0.1 0.1 0.1]
It can be seen that, in both cases, both controllers can stabilize the system, but the third-order controller gives a better transient response, especially, when the initial condition is large. Next we will compare the performance of the disturbance attenuation of the linear and third-order controllers. Part (a) and part (b) of Figure 6.4 compares the profiles of the output of the closed-loop system and the control input under linear and third-order control, respectively, where all initial conditions are zero and the disturbance is F (t ) 0.5sin(3t ). It can be seen that performance of the two controllers are quite similar, and the magnitude of the steady-state output is less than 15% of the amplitude of the disturbance. Part (a) and part (b) of Figure 6.5 repeats the scenario of Figure 6.4 with a larger disturbance F (t ) 5.5sin(3t ). In this case, the linear controller cannot stabilize the system while the third-order controller can still maintain its performance.
Taylor Series Approach to Solving HJI Equation
143
Figure 6.3. Time response of the four states of the closed-loop system with initial condition x(0) [0.45 0.45 0.45 0.45]
144
Nonlinear H2/H Constrained Feedback Control
Figure 6.4. Profiles of the output of the closed-loop system and the control input under linear and third-order controllers with the disturbance F (t ) 0.5sin(3t ) and zero initial condition
Taylor Series Approach to Solving HJI Equation
145
Figure 6.5. Profiles of the output of the closed-loop system and the control input under the linear and third-order controllers with the disturbance F (t ) 5.5sin(3t ) and zero initial condition
146
Nonlinear H2/H Constrained Feedback Control
6.5 Bibliographical Notes Nonlinear H f control problems were formulated and studied in the early 1990s in several papers [12], [47], and [90]. Using Taylor series to approximately solve the HJI equation was first suggested in [90]. Solvability conditions for the existence of the former Taylor series solution were given in [50]. A systematic numerical approach to obtaining the formal Taylor series solution was developed in [46]. An iterative approach for obtaining the formal Taylor series solution was further elaborated in [45]. Section 6.1 is based on the treatment of [47]. Sections 6.2 and 6.3 are a modified version of [46]. The nonlinear benchmark control problem, i.e. the disturbance attenuation problem of RTAC system, was described in [25]. Applications using the nonlinear H f control method to study the nonlinear benchmark control problem were first reported in [89] where a third-order polynomial solution of the HJI equation was obtained by a direct calculation. A fifth-order approximate nonlinear H f control law for studying the nonlinear benchmark control problem was given in [28] based on the systematic approach given in [45]. The computer simulation of Section 6.4 was performed by Peng Jia, currently a master student of the second author of the book.
7 An Algorithm to Solve Discrete HJI Equations Arising from Discrete Nonlinear H Control Problems
In this chapter, we study the approximate solution of the nonlinear H control problem for the class of discrete-time nonlinear systems. Like the continuous-time case, it can be shown that the discrete-time nonlinear H control problem boils down to the solution of a set of algebraic and partial differential equations known as the discrete Hamilton–Jacobi–Isaacs (DHJI) equation, an extension of the discrete algebraic Riccati equation arising in the linear discrete H control problem. Also due to its nonlinear nature, it is rarely possible to obtain a closedform solution for the DHJI equation. Thus the approximation method is considered as the single feasible way to make the discrete nonlinear H control theory a practical design tool. This chapter will detail an approximation approach to solving the DHJI equation in terms of the Taylor series. Similar to the continuous-time case, it can be shown that the coefficients of the Taylor series solution for DHJI equation are governed by one discrete algebraic Riccati equation and a sequence of linear algebraic equations, respectively. This result lends itself to a systematic algorithm to approximately synthesize a discrete nonlinear H control law. The chapter is organized as follows. In Section 7.1, we present the formulation of the discrete-time H control problem and summarize the solvability conditions. In Section 7.2, we present the Taylor series solution of the DHJI equation. In Section 7.3, we apply the approach detailed in Section 7.2 to design an approximate discrete nonlinear H control law to achieve the disturbance attenuation problem for the discretized RTAC system.
7.1 Introduction Consider the following discrete-time multi-input, multi-output nonlinear systems x(t 1) z (t )
A( x(t )) B( x(t ))u (t ) E ( x(t )) w(t ), t C ( x(t )) D( x(t ))u (t ) F ( x(t )) w(t )
0,1, 2,!
(7.1)
148
Nonlinear H2/H Constrained Feedback Control
where x \ n is the plant state, u \ m the plant input, w \ r the exogenous input, and z \ p the penalty variable. It is assumed that all functions involved in this setup are smooth and defined in a neighborhood X of the origin in \ n . Without loss of generality, we assume A(0) 0 and C (0) 0 . We will only focus on the full information feedback case in which the control law takes the following form: u (t )
k ( x(t ), w(t ))
(7.2)
Where k (, ) is a locally defined smooth function satisfying k (0, 0) 0 . The notation used here for describing discrete-time nonlinear systems is somehow different from previous chapters mainly in that the value of a discretetime variable x at a time instant t is represented by x(t) instead of xt with t denoting the discrete-time index. The reason for this change is to make the results of this chapter be stated in a way more similar to the continuous case studied in Chapter 6. It should also be noted that the nonlinear systems studied in this chapter is more general than those in Chapter 1. Therefore, the discrete-time HJI equation to be derived is also more complex than the one introduced in Chapter 1.
Problem of Local Disturbance Attenuation and Internal Stability: Find a controller of the form (7.2) such that the closed-loop system is internally stable, and the L2-gain of the disturbance input w to the penalty variable z is attenuated by a prescribed real number Ȗ. Here internal stability means that the equilibrium x 0 of the closed-loop system with w 0 is locally asymptotically stable. The disturbance attenuation is characterized in the following way. Given a real number 0 J , it is said that the exogenous input is locally attenuated by Ȗ if there exists a neighborhood X of the point x 0 such that for every integer N ! 0 and for every w l2 ([0, N ], \ r ) for which the state trajectory of the closed-loop system starting from x (0) 0 remains in X for all t [0, N ], the response z l2 ([0, N ], \ p ) of Equation (7.1) and (7.2) satisfies N
¦ z (t ) t 0
2
N
dJ2
¦ w(t )
2
, N
1, 2,"
(7.3)
t 0
Like the continuous time systems, the solvability of the discrete time H control problem is closely related to a partial differential equation called discrete Hamilton–Jacobi–Issacs equation (DHJI). To introduce this equation, let V ( x ) be a smooth positive definite function, locally defined in a neighborhood of the origin of \ n . Then the Hamiltonian function H ( x, u , w) associated with the above problem is defined as H ( x, u , w) V ( A( x) B ( x)u E ( x) w) V ( x)
1 2 2 ( C ( x) D( x)u F ( x) w J 2 w ) 2
An Algorithm to Solve Discrete HJI Equations
149
The partial derivatives of the Hamiltonian function H ( x, u , w) with respect to u and w are calculated as follows: wH wu wV wD D def
H u ( x, u, w)
wH wu wV wD D
B ( x ) (C ( x) D( x)u F ( x) w)T D( x) A( x ) B ( x )u E ( x ) w
def
H w ( x, u , w)
E ( x) (C ( x) D( x)u F ( x) w)T F ( x) J 2 wT A( x ) B ( x )u E ( x ) w
The second-order derivatives of the Hamiltonian function H ( x, u, w) with respect to u and w can be calculated as follows:
w2 H ( x, u , w) w (u , w) 2
ª H uu ( x, u, w) H wu ( x, u, w) º « H ( x, u, w) H ( x, u, w) » ww ¬ uw ¼
ª T º w 2V w 2V BT ( x ) 2 E ( x ) » « B ( x) 2 B( x) wD wD « » « DT ( x ) D( x) » DT ( x ) F ( x ) « » 2 2 « E T ( x) w V B ( x) E T ( x) w V E ( x) » » « wD 2 wD 2 » « T 2 T F ( x) F ( x) J I r ¼ D ¬ F ( x) D( x)
(7.4)
A( x ) B ( x )u E ( x ) w
If w2 H ( x, u , w) w (u , w) 2 is invertible in a neighborhood of the origin, then, by the implicit function theorem, there exist two smooth functions u* ( x) and w* ( x) defined in a neighborhood of the origin x 0 satisfying u * (0) 0 and w* (0) 0 such that H u ( x, u * ( x), w* ( x))
0
(7.5)
H w ( x, u * ( x), w* ( x ))
0
(7.6)
The discrete Hamilton–Jacobi–Issacs equation associated with the discrete-time H control problem is then defined as follows:
150
Nonlinear H2/H Constrained Feedback Control
H ( x, u* ( x), w* ( x))
0
(7.7)
In terms of the Hamiltonian function H ( x, u , w), the solvability of the above problem for the full information case can be summarized as follows.
Theorem 7.1 Consider the discrete-time nonlinear system described in (7.1), and suppose there exists a smooth positive definite function V ( x) , locally defined in a neighborhood of the origin of \ n , such that (A1) H uu (0, 0, 0) ! 0,H ww (0, 0, 0) H wu (0, 0, 0) H uu1 (0, 0, 0) H uw (0, 0, 0) 0 (7.8)
(A2) The discrete Hamilton–Jacobi–Issacs equation H ( x, u * ( x), w* ( x))
(7.9)
0
holds in a neighborhood of 0 R n for u*(x) and w*(x) defined by (7.5) and (7.6). (A3) The equilibrium x = 0 of the system x(t 1)
A( x(t )) B( x(t ))u * ( x(t )) E ( x(t )) w* ( x(t ))
(7.10)
is locally asymptotically stable. Then, the full information feedback control law
u
k ( x, w)
u* ( x) H uu1 ( x, u* ( x), w* ( x)) H wu ( x, u* ( x), w* ( x))( w w* ( x))
(7.11)
solves the problem of disturbance attenuation with internal stability with the performance level specified by J .
Remark 7.1 Under assumption A1, the matrix w2 H ( x, u , w) w (u , w) 2 is invertible in a neighborhood of the origin. Thus, u*(x) and w*(x) are well-defined smooth functions in a neighborhood of 0 R n by Equation (7.5) and (7.6) and satisfy u * (0) 0 and w* (0) 0 . It is clear that the synthesis of the nonlinear discrete-time H control law (7.11) relies on the availability of the solution of (7.5) to (7.7). However, the nonlinear nature of (7.5) to (7.7) precludes the possibility of obtaining the closed-form solution for these equations except for some simple cases. Thus approximation
An Algorithm to Solve Discrete HJI Equations
151
methods may be the only feasible way to solve (7.5) to (7.7). In this chapter, we will consider using the Taylor series to approximately solve (7.5) to (7.7) in a fashion similar to what has been done for the continuous-time case described in Chapter 6. However, what makes the approximation problem in the discrete domain distinct from the continuous-time case is that the discrete HJI equation involves two additional functions u*(x) and w*(x) that are implicitly defined by two partial differential equations, (7.5) and (7.6). This difficulty arises from the fact that the discrete Hamiltonian function is a nonlinear algebraic equation as opposed to the continuous case where the Hamiltonian function is a partial differential equation. As a result, the discrete Hamiltonian function is not a quadratic function in (u, w) which rules out the possibility to obtain the explicit solution (u*(x), u*(x)) from (7.5) and (7.6). This discrete peculiarity makes the problem of approximately solving the discrete HJI equation more tedious than its continuous counterpart since we actually need to solve a set of mixed algebraic and partial differential equations with three unknown functions V(x), u*(x) and w*(x). To make the notation more tractable, we will focus on the single input, single output systems. That is, we assume m r p 1 for the rest of the chapter. Our major result, which will be presented in Section 7.2, is that the coefficients of the Taylor series solution for (7.5) to (7.7) is governed by one discrete Riccati equation and a sequence of linear algebraic equations. This result lends itself to an iterative algorithm that is able to systematically obtain the Taylor series solution for (7.5) to (7.7).
7.2 Taylor Series Solution of Discrete Hamilton–Jacobi–Isaacs Equation Using the same notations as those in Chapter 6 gives the following unique expressions for the Taylor series expansions of the functions V(x), u*(x) and w*(x) as follows:
V ( x)
f 1 T x Px ¦ Pk x[ k ] 2 k 3
f
u* ( x)
¦ k 1
(7.12)
f
U k x[ k ] ,w* ( x)
¦W x
[k ]
k
(7.13)
k 1
where P is n×n symmetric matrix and Pk, Uk, Wk are row vectors. Our purpose is to derive explicit equations that govern P and all other coefficient vectors Pk, Uk and Wk. To this end, we will expand the left-hand side of (7.5) to (7.7) into power series in x[k] with the unknown functions V(x), u*(x) and w*(x) represented by (7.12) and (7.13), and then let all coefficients of x[k] identically zero. This process entails the two more useful identities in addition to those introduced in Lemma 6.1 of Chapter 6.
152
Nonlinear H2/H Constrained Feedback Control
Lemma 7.1 (i) For x R n , y R m and A R nu p row( yxT A) (ii)
xT ( yT
A)
(7.14)
For x R n , y R m and z R m xyT z
( I n
yT )( x
z )
(7.15)
Proof. Equation (7.14) can be proved as follows: row( yxT A) [ y1 xT A y2 xT A" ym xT A] xT [ y1 A y2 A" ym A]
xT ( yT
A)
As for (7.15), we have xyT z
x
( yT z )
( I n x)
( yT z )
( I n
yT )( x
z )
, Next, we introduce more notations:
Lemma 7.2 Let f
A( x)
Ax ¦ Ak x ( k ) ,B( x) k 2 f
C ( x)
Cx ¦ Ck x ,D( x) (k )
k 2
f
B ¦ Bk x ( k ) ,E ( x) k 1 f
D ¦ Dk x ,F ( x) (k )
k 1
f
E ¦ Ek x ( k ) k 1
(7.16)
f
F ¦ Fk x
(k )
k 1
Then f
A( x) B ( x)u* ( x) E ( x) w* ( x)
¦I x
(k )
k
k 1
(7.17)
f
*
*
C ( x) D( x)u ( x) F ( x) w ( x)
¦\ k 1
where for k t 1,
k
x
(k )
An Algorithm to Solve Discrete HJI Equations
Ik
BU k M k EWk M k ( Ak Bku Ekw )
\k
DU k M k FWk M k (Ck Dku Fkw )
Bku
Ekw
Dku
Fkw
k 1 0, ° ( ) i ® i j k B ( I
U M ),k ! 1 j j °¯¦ i , j t1 i n k 1 0, ° i ( ) ® i j k E ( I
W M ),k ! 1 j j °¯¦ i , j t1 i n k 1 0, ° i ( ) ® D ( I
U j M j ),k ! 1 °¯¦ ii ,j jt1k i n k 1 0, ° (i ) ® F ( I
W j M j ),k ! 1 °¯¦ ii ,j jt1k i n
153
(7.18)
Proof. f
B( x)u * ( x)
¦
(B
f
Bi x (i ) )(
i 1
¦U
¦
f
Bi x (i ) )(
i 1
¦U M j
¦ BU M k
(7.15) f
¦
x( j ) )
f
k
x( k )
¦ ¦ Bx
(i )
i
U j M j x( j )
k 2 i j k i , j t1
k 1
j
j 1
f
x[ j ] )
j 1
f
( B
j
f
BU k M k x ( k )
¦ ¦ B (I
U M i
i n
j
j
) x( k )
k 2 i j k i , j t1
k 1
f
BU1 x
¦ ( BU M k
k
Bku ) x ( k )
(7.19)
k 2
Similarly, we can show f
E ( x) w* ( x)
EW1 x ¦ ( EWk M k Ekw ) x ( k ) k 2 f
D( x)u * ( x)
DU1 x ¦ ( DU k M k Dku ) x ( k ) k 2 f
F ( x) w* ( x)
FW1 x ¦ ( FWk M k Fkw ) x ( k ) k 2
(7.20)
154
Nonlinear H2/H Constrained Feedback Control
,
Thus (7.17) follows.
Remark 7.2 It is importment to note that ( Bku ,Dku ) , and ( Ekw ,Fkw ) depend only on U1 ," ,U k 1 and, respectively, W1 ," ,Wk 1 . Theorem 7.2 Under assumption A1, the coefficient vectors of the Taylor series expansions of u*(x) and w*(x) are given by ª BT PA DT C º R 1 « T T » ¬ E PA F C ¼
ªU1 º «W » ¬ 1¼
(7.21)
where R
ª BT PB DT D BT PE DT F º « T 2» T T T ¬ E PB F D E PE F F J ¼
(7.22)
ª P M B k Gku º Nk R 1 « k 1 k 1 kk 1 w» ¬ Pk 1 M k 1 Ek 1 Gk ¼
(7.23)
and for k 2, ªU k º «W » ¬ k¼ where ª Gku º « w» ¬Gk ¼
ª BT P( Ak Bku Ekw ) DT (Ck Dku « T u w T u ¬ E P ( Ak Bk Ek ) F (Ck Dk ª i j k row( BT PI DT\ ) k i j i j ¦j « ¦ i , j t1 « k «¦ i j k row( EiT PI j FiT\ j ) ¦ j i j , 1 t ¬
Fkw ) º » Fkw ) ¼ Pj M j B kj º » » k P M E » j j j 3 ¼
(7.24)
3
and
¦
Bkj
Bkts , j t k 1
(7.25)
t s j s t 0,t t k 1
k
Bkts
¦( ¦ i 1
l m t l t i 1,m t k i
*l (i 1)
Bs
* m ( k i ) ),s t 0
(7.26)
An Algorithm to Solve Discrete HJI Equations
¦
Ekj
Ekts , j t k 1
155
(7.27)
t s j s t 0,t t k 1
k
¦( ¦
Ekts
*l (i 1)
Es
* m ( k i ) ),s t 0
(7.28)
l m t l t i 1,m t k i
i 1
j 0, k ! 0 °01unk , ° k j 0 ®1, ° ° ii1 ,i i2,"",ii jt1k Ii1
Ii2
"
Ii j ,k t j t 1 ¯ 12 j
* kj
(7.29)
¦
def
def
B and E0
In (7.26) and (7.28), B0
E.
Proof. As pointed out in Remark 7.1, under assumption A1, u*(x) and w*(x) are well defined smooth functions in a neighborhood of 0 \ n satisfying u (0) 0 and w (0) 0. Also, note that
w2 H (0, 0, 0) w (u , w) 2
R
Thus, under assumption A1, R is invertible. Next, we break down the proof into six steps. Step 1: Define D (0) 1 . By straightforward calculation, we have for j t 1 , f
D ( j)
D
¦
f
I x k 1 k
¦ ¦
(k )
Ii
Ii
"
Ii x ( k ) 1
2
j
k j i1 i2 " i j k i1 , i2 ,", i j t1
Thus, for j t 0 , we have f
D ( j)
D
¦
f
I x k 1 k
(k )
¦*
kj
x( k )
(7.30)
k j
Step 2: Show, for k 3, f
k
(
¦ D (i 1)
I n
D (k i ) i 1
)( D
¦
f j
I x 1 j
( j)
¦ s 0
f
Bs x ( s ) )
¦Bx j k
j k 1
( j)
(7.31)
156
Nonlinear H2/H Constrained Feedback Control
f
k
(
¦ D (i 1)
I n
D (k i ) i 1
)( D
¦
f j
I x 1 j
¦
f
Es x ( s ) )
¦E
j k
x( j )
(7.32)
j k 1
s 0
( j)
For this purpose, note that, using (7.30), we have, for k 3, k
¦D
( i 1)
I n
D ( k i )
i 1
D f
k
¦¦
l i 1
I x j 1 j
( j)
¦*
m ( k i )
x( m) )
m k i
f
k
f
f
*l (i 1) x (l ) )
I n
(
(
i 1
¦
f
¦¦ ¦ *
l ( i 1)
x(l )
I n
* m ( k i ) x( m )
(7.33)
i 1 l i 1 m k i f
k
¦¦ ¦
(*l (i 1)
I n
* m ( k i ) )( x (l )
I n
x ( m ) )
i 1 t k 1 l m t l t i 1,m t k i f
k
¦¦ ¦
(*l (i 1)
I n
* m ( k i ) )( x (l )
I n
x ( m ) )
t k 1 i 1 l m t l t i 1,m t k i
Thus using (7.33), (6.23), (6.24), (7.26), and (7.25) successively gives f
k
(
¦D
( i 1)
I n
D ( k i )
i 1
)( D
f
k
s 0
i 1
f
f
¦ (¦ D
( i 1)
f
¦j
I x( j ) 1 j
¦B x s
(s)
)
s 0
I n
D ( k i )
)Bs x ( s ) D
f
¦ j 1I j x
( j)
k
¦¦¦ ¦
(*l (i 1)
I n
* m ( k i ) )( x (l )
I n
x ( m ) )Bs x ( s )
s 0 t k 1 i 1 l m t l t i 1,m t k i f
f
k
¦¦¦ ¦
(*l (i 1)
I n
* m ( k i ) )( I n (l )
Bs
I n ( m ) )x (l m s )
s 0 t k 1 i 1 l m t l t i 1,m t k i f
f
k
¦ ¦ ¦( ¦ s 0 t k 1 i 1 f
f
¦¦B
ts k
l m t l t i 1,m t k i
x(t s )
s 0 t k 1 f
¦ ¦
j k 1 t s j s t 0,t t k 1
Bkts x ( j )
*l (i 1)
Bs
* m ( k i ) )x (l m s )
An Algorithm to Solve Discrete HJI Equations
157
and hence f
k
(
¦D
( i 1)
I n
D ( k i )
i 1
)( D
¦
f
I x j 1 j
¦B x
f
(s)
s
j k
x( j )
(7.34)
j k 1
s 0
( j)
¦B
)
Equation (7.32) can be proved similarly. Step 3: Show, for k 3,
wV wD
B ( x) D
¦
f
I x j 1 j
BT PI1 x
( j)
f
¦ ( BT PIk k 2
wV wD
E ( x) D
¦
f j
¦
i j k i , j t1
k 1
row(BiT PI j ) ¦ Pj M j B kj ) x ( k )
(7.35)
j 3
E T PI1 x
I x( j ) 1 j f
¦ ( E T PIk k 2
¦ row(E
k 1
T i
i j k i , j t1
PI j ) ¦ Pj M j E kj ) x ( k )
(7.36)
j 3
To this end, first note that by Equation (7.12) and (6.24) of Chapter 6,
wV wD
f
k
k 3
i 1
D T P ¦ Pk M k (¦ D (i 1)
I n
D ( k i ) )
(7.37)
An elementary manipulation gives D T PB ( x)
D
¦
f
I x j 1 j
( j)
BT ( x) PD
D
¦
f
f
I x j 1 j
( j)
f
f
i 1
j 1
BT P (¦ Ik x ( k ) ) (¦ Bi x ( i ) )T P(¦ I j x ( j ) ) k 1
f
¦ (B k 1
f
T
PIk ) x ( k ) ¦
¦ row(B
PI j ) x ( k )
k 2 i j k i , j t1
f
BT PI1 x ¦ ( BT PIk k 2
On the other hand, using (7.31) gives
T i
¦ row(B
T i
i j k i , j t1
PI j )) x ( k ) (7.38)
158
Nonlinear H2/H Constrained Feedback Control
f
j
j 3
i 1
¦ Pk M k (¦ D (i 1)
I n
D ( j i ) )
B( x) ¦
D f
f k
I x( k ) 1 k
f
¦P M ¦ B j
j
k j
x(k )
(7.39)
k j 1
j 3
f k 1
¦¦ P M j
j
B kj x ( k )
k 2 j 3
Combining (7.38) and (7.39) gives (7.35). Equation (7.36) can be proved similarly. Step 4: A simple caculation gives DT ( x)(C ( x) D( x)u * ( x ) F ( x) w* ( x)) f
(
¦D x
f
¦\
j
¦ (D \
k
(i ) T
) (
i
i 0
x( j ) )
j 1
f
DT\ 1 x
T
¦ row(D \ T i
j
)) x ( k )
(7.40)
i j k i , j t1
k 2
Step 5: Combining (7.35) and (7.40) gives wV wD
B ( x) D A ( x ) B ( x ) u* ( x ) E ( x ) w* ( x )
(C ( x) D( x )u * ( x) F ( x) w* ( x))T D( x) f
( BT PI1 DT\ 1 ) x (
¦ (B
T
PIk DT\ k
k 2
¦
k 1
row(BiT PI j DiT\ j )
i j k i , j t1
¦P M j
j
B kj ) N k x[ k ]
(7.41)
j 3
Similarly, we have
wV wD
E ( x) D A ( x ) B ( x ) u* ( x ) E ( x ) w* ( x )
(C ( x) D( x)u* ( x) F ( x) w* ( x))T F ( x) J 2 ( w* ( x))T f
( E T PI1 F T\ 1 J 2W1 ) x (
¦ (E
T
PIk F T\ k
k 2
¦
i j k i , j t1
k 1
row(EiT PI j FiT\ j )
¦P M j
j 3
j
E kj J 2Wk ) N k x[ k ] (7.42)
An Algorithm to Solve Discrete HJI Equations
159
Step 6: Derive (7.21) and (7.23). Letting the coefficients of (7.41) and (7.42) be identically zero gives BT PI1 DT\ 1
0
(7.43)
E T PI1 F T\ 1 J 2W1
(7.44)
0
and for k 2, ( BT PIk DT\ k
k 1
¦ row(B
PI j DiT\ j )
¦ row(E
PI j FiT\ j )
T i
i j k i , j t1
( E T PIk F T\ k
¦P M j
j
B kj ) N k
0
(7.45)
j 3
T i
i j k i , j t1 k 1
¦P M j
j
E kj J 2Wk ) N k
0
(7.46)
j 3
Substituting the following relations
Ik N k
BU k EWk ( Ak Bku Ekw ) N k
(7.47)
\ k Nk
DU k FWk (Ck Dku Fkw ) N k
(7.48)
into (7.43) and (7.44), and, respectively, into (7.45) and (7.46), and rearranging , terms gives (7.21) and (7.23). The proof concludes.
Remark 7.3 Let R 1
ªr « 11 ¬ r21
r12 º r22 »¼
Then we can put (7.23) into the following form
ªU k º «W » ¬ k¼ where
ª Pk 1[ ku K ku º ,k t 2 « w w» ¬ Pk 1[ k K k ¼
(7.49)
160
Nonlinear H2/H Constrained Feedback Control
ª[ ku º « w» ¬[ k ¼
ª M k 1 ( r11 Bkk1 r12 Ekk1 ) N k º « » k k ¬ M k 1 ( r21 Bk 1 r22 Ek 1 ) N k ¼
(7.50)
ªKku º « w» ¬Kk ¼
ª Gu º R 1 « kw » N k ¬Gk ¼
(7.51)
A nice thing about Theorem 7.2 is that (U k , Wk ) is an affine linear function of Pk+1 in the sense that [ ku , [ kw , Kku , and Kkw depend only on P,P3 ," ,Pk and (U1 ,W1 ),! , (U k 1 , Wk 1 ). This fact can be verified by showing that, for j 3," ,k 1, B kj and E kj depend only on (U1 ,W1 ),! , (U k 1 , Wk 1 ). To this end, we note that, for k t j t 0, * kj depends only on I1 ,", Ik j 1 , hence only on (U1 , W1 ),! , (U k j 1 , Wk j 1 ) . Also, we note that B kj and E kj depend only on those *lm such that l d k and m d ( j 1) . Thus, B kj and E kj depend only on I1 ,", Ik j 2 , hence only on (U1 , W1 ),! , (U k j 2 ,Wk j 2 ) . Thus,for j 3," ,k 1, B kj and E kj depend only on (U1 ,W1 ),! , (U k 1 , Wk 1 ). . As a result of Theorem 7.2, we can solve (7.7) for P and Pk, for k 3, with the aid of (7.49). It turns out that P and Pk, for k 3, can be obtained by solving a discrete algebraic Riccati equation and a sequence of linear algebraic equations as described below.
Theorem 7.3 Under assumptions A1 and A2, P is governed by the following discrete algebraic Riccati equation T
ª BT PA DT C º 1 ª BT PA DT C º A PA C C « T R « T T » T » ¬ E PA F C ¼ ¬ E PA F C ¼ T
P
T
(7.52)
and, for k t 3 , Pk is governed by the following sequence of linear algebraic equations Pk Lk
Hk
(7.53)
where Lk
M k I1( k ) N k I n[ k ] ((I1T PB \ 1T D)T
[ ku1 M k 1 (I1T PE \ 1T F J 2W1T )T
[ kw1 M k 1 ) N k
(7.54)
An Algorithm to Solve Discrete HJI Equations
Hk
161
1 ¦ row(IiT PI j \ iT\ j J 2 M iT WiT W j M j ) N k 2 i j k i , j t 2
k 1
¦ Pj M j * kj N k j 3
§ ª PI1 ºT ¨ row ¨ «« \ 1 »» ¨¨ « J 2W » 1¼ ©¬
ª ( BKku1 EKkw1 ) M k 1 Ak 1 Bku1 Ekw1 º · ¸ « u w u w » «( DKk 1 FKk 1 ) M k 1 Ck 1 Dk 1 Fk 1 » ¸ N k « »¸ K kw1 M k 1 ¬ ¼ ¹¸
(7.55)
Proof. Again, we will break down the proof into several steps. Step 1: Show that V ( A( x) B( x)u* ( x) E ( x) w* ( x)) k 1 T T 1 f x I1 PI1 x ¦ ( ¦ row(IiT PI j ) ¦ Pj M j * kj )x ( k ) 2 2 k 3 i j k j 3
(7.56)
i , j t1
For this purpose, note that V ( A( x) B ( x)u * ( x) E ( x) w* ( x)) 1 D T PD 2 1 2
f
¦P M D j
f
¦ (I x
( j)
j
)
j 3
D f
(i ) T
i
) P
i 1
¦ (I x
f
¦ k 1 Ik x ( k ) f
( j)
j
)
j 1
f
¦ P M (¦ (I x j
j
k
j 3
1 f ( x (i ) )T IiT PI j x ( j ) 2 k 2 i j k
¦¦
(k ) ( j )
)
k 1
f
f
¦ P M ( ¦ (* j
j
j 3
kj
x( k ) )
k j
i , j t1
f k 1 T T 1 f x I1 PI1 x ( row(IiT PI j )) x ( k ) ( Pj M j * kj )x ( k ) 2 2 k 3 i j k k 3 j 3
¦ ¦
¦¦
(7.57)
i , j t1
Step 2: A simple calculation gives C ( x) D( x)u* ( x) F ( x) w* ( x) f
(
¦\ x i
2
f
(i ) T
) (
i 1
¦\
j
x( j ) )
j 1
f
xT\ 1T\ 1 x
¦ ( ¦ row(\ k 3 i j k i , j t1
T i
\ j )) x ( k )
(7.58)
162
Nonlinear H2/H Constrained Feedback Control
Step 3: Show that
H ( x, u* ( x), w* ( x)) 1 T T x (I1 PI1 P \ 1T\ 1 J 2W1T W1 ) x 2 f 1 row(IiT PI j \ iT\ j J 2 M iT WiT W j M j ) ( 2 k 3 i j k
¦ ¦
i , j t1
k
+
¦P M * j
j
kj
Pk M k ) N k x[ k ]
(7.59)
j 3
which is obtained by substituting (7.56) and (7.58) and the Taylor series expansions of V(x) and w*(x) into (7.7). Step 4: Derive equations (7.52) and (7.53). To this end, letting the coefficients of (7.59) be identically zero gives
I1T PI1 P \ 1T\ 1 J 2W1T W1
(7.60)
0
and for k t 3 (
1 row(IiT PI j \ iT\ j J 2 M iT WiT W j M j ) 2 i j k
¦
i , j t1
k
¦P M * j
j
kj
Pk M k ) N k
0
(7.61)
j 3
Now a simple manipulation gives I1T PI1 P \ 1T\ 1 J 2W1T W1 ( A BU1 EW1 )T PI1 P (C DU1 FW1 )T \ 1 J 2W1T W1 T
º BT PI1 DT\ 1 ªU1 º ª T T » A PI1 C \ 1 P «W » « T 2 T I \ J E P F W ¬ 1¼ ¬ 1 1 1¼
(7.62)
Substituting (7.43) and (7.44) into (7.62) shows that (7.60) is the same as AT PI1 C T\ 1 P
0
which gives, after some simple manipulations, (7.52). Next, we rewrite (7.61) into the following
An Algorithm to Solve Discrete HJI Equations
163
row(I1T PIk 1 \ 1T\ k 1 J 2W1T Wk 1 M k 1 ) N k Pk ( M k * kk M k ) N k
1 row(IiT PI j \ iT\ j J 2 M iT WiT W j M j ) N k 2 i j k
¦
i , j t 2
k 1
¦P M * j
j
kj
Nk
(7.63)
j 3
Now, for k t 3 , we have
I1T PIk 1 \ 1T\ k 1 J 2W1T Wk 1 M k 1 ª PI1 º « \ » « 1 » «¬ J 2W1 »¼
T
ª Ik 1 º « \ » k 1 « » ¬«Wk 1 M k 1 ¼» T ª Ak 1 Bku1 Ekw1 º ½ ª PI1 º ª B E º « \ » ° « D F » ªU k 1 º M «C D u F w » ° k 1 k 1 » ¾ « 1 » ®« » «W » k 1 « k 1 « »° «¬ J 2W1 ¼» ° ¬« 0 1 ¼» ¬ k 1 ¼ 0 ¬ ¼¿ ¯ T ª Ak 1 Bku1 Ekw1 º ½ ª PI1 º ª B E º u u « \ » ° « D F » ª Pk [ k 1 Kk 1 º M «C Du F w » ° k 1 k 1 » ¾ « 1 » ®« » « P [ w K w » k 1 « k 1 « »° «¬ J 2W1 »¼ ° «¬ 0 1 »¼ ¬ k k 1 k 1 ¼ 0 ¬ ¼¿ ¯ 2 T T u T T T w (I1 PB \ 1 D) Pk [ k 1 M k 1 (I1 PE \ 1 F J W1 ) Pk [ k 1 M k 1 ª PI1 º «« \ 1 »» «¬ J 2W1 »¼
T
ª ( BKku1 EKkw1 ) M k 1 Ak 1 Bku1 Ekw1 º « u w u w » « ( DKk 1 FKk 1 ) M k 1 Ck 1 Dk 1 Fk 1 » « » Kkw1 M k 1 ¬ ¼
(7.64)
Now using (7.14) gives row((I1T PB \ 1T D) Pk [ ku1 M k 1 )
Pk (I1T PB \ 1T D)T
[ ku1 M k 1
row((I1T PE \ 1T F J 2W1T ) Pk [ kw1 M k 1 )
Pk (I1T PE \ 1T F J 2W1 )T
[ kw1 M k 1
Thus, row(I1T PIk 1 \ 1T\ k 1 J 2W1T Wk 1 M k 1 )
Pk ((I1T PB \ 1T D)T
[ ku1 M k 1 (I1T PE \ 1T F J 2W1 )T
[ kw1 M k 1 ) § ª PI1 ºT ¨ row ¨ «« \ 1 »» ¨¨ «¬ J 2W1 »¼ ©
ª ( BK ku1 EK kw1 ) M k 1 Ak 1 Bku1 Ekw1 º · ¸ « w » u w u «( DK k 1 FKk 1 ) M k 1 Ck 1 Dk 1 Fk 1 » ¸ (7.65) « » ¸¸ Kkw1 M k 1 ¬ ¼¹
164
Nonlinear H2/H Constrained Feedback Control
Using (7.65) and (7.61), noting * kk I1( k ) ,M k N k I n[ k ] , and rearranging terms gives (7.53) with Lk and H k given by (7.54) and (7.55). ,
Remark 7.4 Using the same argument as used in Remark 7.3, it is not difficult to see that, for k 3, Lk and Hk depend only on P,P3 ," ,Pk and (U1 , W1 ),! , (U k 2 , Wk 2 ). Remark 7.5 Let the linearized system of (7.1) at the origin be denoted by x(t 1) z (t )
Ax(t ) Bu (t ) Ew(t ) Cx (t ) Du (t ) Fw(t )
(7.66)
Then (7.52) is exactly the discrete algebraic Riccati equation arising in the linear discrete H control problem, and the linearization of the control law (7.2) is exactly the linear control law that solves the problem of disturbance attenuation with internal stability for the linear system (7.66). Combining Theorems 7.2 and 7.3 leads to an iterative algorithm to solve (7.5) to (7.7) as described below. Assuming P, and (Uk, Wk, Pk+1) for k = 2, 3,ಹ, N are desirable, then the algorithm goes as follows. x Step 1: Solve the discrete Riccati equation (7.52) for P, obtain U1 and W1 from Equation (7.21), and set k = 2. x Step 2: From Lk+1 and H k+1 which depend only on P,P3 ," ,Pk and (U1, W1), " ,(Uk-1, Wk-1), and obtain P k+1 from (7.53). x Step 3: From [ ku , [ kw , K ku , and Kkw which depend only on P,P3 ," ,Pk and (U1, W1), " , (Uk-1, Wk-1), and obtain Uk and Wk from (7.49). x Step 4: If k = N stop. Otherwise set k = k+1 and return to Step 2. It is seen that, in a way similar to the continuous-time case, the problem of approximating discrete nonlinear H control laws is reduced to solving one discrete algebraic Riccati equation and a sequence of linear algebraic equations.
7.3 Disturbance Attenuation of Discretized RTAC System In this section, we will consider the disturbance attenuation problem for the discretized model of the RTAC system. The continuous-time model is given in Equation (5.15) and (5.16) of Chapter 5. Discretizing (5.15) via Euler's method with h as the sampling period gives the discrete-time state-space equation of the RTAC system as follows. x(t 1)
A( x(t )) B( x(t ))u (t ) E ( x(t )) w(t ),t
0,1, 2,"
(7.67)
where x(t), u(t) and w(t) are shorthand notation for x(th), u(th) and w(th) with t = 0, 1, 2, " , and
An Algorithm to Solve Discrete HJI Equations
A( x )
hf ( x) x,B( x)
hg 2 ( x),E ( x)
hg1 ( x)
165
(7.68)
Our objective is to find a full information approximate nonlinear H control law to achieve disturbance attenuation with internal stability for the discrete-time RTAC system. For this purpose, let us introduce a performance output variable z defined as follows: z (t )
C ( x) D( x)u F ( x) w
(7.69)
where C ( x) x1 G ( x2 x3 x4 ) with G a nonnegative real number, D(x) = d with d a nonnegative real number, and F ( x ) 0 . When G d 0, the performance output z = x1 is just the cart position, and when G > 0 and d >0, the performance output z is a linear combination of the state variable x and the control input u. It is expected that a nonzero G and/or nonzero d can be used to trade-off the performance and the required control effort. To be specific, we will design a third-order controller in this section. For this purpose, we expand the functions A(x), B(x), E(x), C(x), D(x) and F(x) up to third order. Doing so leads to the following matrices with the notation defined in (7.16): h 0 0º ª 1 « 25 » « h 1 0 0» ª (2,11) (2, 48) 24 « » A « , A2 04u16 , A3 « 25 5 « 0 0 1 h» h h « » «¬ 576 24 5 « » «¬ 24 h 0 0 1 »¼ ª 0 º « 5 » « h» ª(2,11) (4,11) º B «« 24 »» , B1 04u4 , B2 « 65 25 »» « 0 h h « » 576 ¼» 4u16 ¬« 576 « 25 » «¬ 24 h »¼ ª 0 º « 25 » « h» ª(2,11) (4,11) º E «« 24 »» , E1 04u4 , E2 « 25 65 »» « 0 h h « » 576 ¼» 4u16 ¬« 576 « 5 » «¬ 24 h »¼ C >1 G G G @ , C2 01u16 , C3 01u64 D
d , D1
F
0, F1
01u4 , D2 01u4 , F2
01u16 01u16
(4,11) 65 h 576
(4, 48) º 1 » h» 24 »¼ 4u64
166
Nonlinear H2/H Constrained Feedback Control
Remark 7.6 As in Chapter 6, we have adopted a compact form to denote the sparse matrices A3, B2 and so on. With our notation, only the non-zero elements of a matrix together with its coordinates are given. For example, the matrix A3 is a 4 u 64 matrix whose non-zero elements only appear in row 2 and column 11, row 2 and column 48, row 4 and column 11, and row 4 and column 48, with values 25 5 65 1 h , h , h , h , respectively. 576 24 576 24 Let us first find the linear term of the solution of the DHJI equation with 0.2,d 1,h 0.01 and J which can be straightforwardly obtained by solving (7.52) and (7.21) and the results are as follows
G
P
ª 402.17 « 27.642 « « 22.978 « ¬ 37.215
27.642 22.978 37.215º 453.29 8.7132 17.22 »» 8.7132 1.5475 1.792 » » 3.92 ¼ 17.22 1.792
(7.70)
and U1
[0.5648 0.92584 0.19988 0.27587]
W1
[0.0032238 0.047527 0.00089446 0.0018349]
Since A2 = 0, B1 = 0, E1 = 0, C2 = 0, D1 = 0, F1 = 0, it can be verified that the second-order term of the solution of the DHJI equation is as follows: P3
01u20 ,U 2
01u10 ,W2
01u10
(7.71)
To obtain the third-order term of the solution of the DHJI equation, let us first form L4 and H4 according to Theorem 7.3 as follows. L4
M 4I1(4) N 4 I 4(4)
((I1T PB \ 1T D)T
[3u M 3 (I1T PE \ 1T F J 2W1T )T
[3w M 3 ) N 4 where
[3u
M 4 (r11 B43 r12 E43 ) N3
[3w
M 4 (r21 B43 r22 E43 ) N 3
B43
¦
t s 3 s t 0,t t 3
4
B4ts
B430
¦( ¦ i 1
*l (i 1)
B0
* m (4 i ) )
l m 3 l t i 1,m t 4 i
* 00
B0
*33 *11
B0
* 22 * 22
B0
*11 *33
B0
* 00
An Algorithm to Solve Discrete HJI Equations
¦
E43
4
E4ts
E430
t s 3 s t 0,t t 3
¦( ¦ i 1
*l (i 1)
E0
* m (4 i ) )
l m 3 l t i 1,m t 4 i
* 00
E0
*33 *11
E0
* 22 * 22
E0
*11 * 33
E0
* 00 * 00
1,*11
I1 ,* 22
B0
B,E0
E
I1
I1 ,*33
I1
I1
I1
I1 BU1 M 1 EW1 M 1 A \ 1 DU1 M 1 FW1 M 1 C The row vector H4 can be calculated as follows: 1 row(I2T PI2 \ 2T\ 2 J 2 M 2T W2T W2 M 2 ) N 4 P3 M 3 * 43 N 4 2 § ª PI1 ºT ª ( BK3u EK3w ) M 3 A3 B3u E3w º · ¨ « »¸ row ¨ «« \ 1 »» «( DK3u FK3w ) M 3 C3 D3u F3w » ¸ N 4 ¨¨ « J 2W » « »¸ K3w M 3 1¼ ¬ ¼ ¹¸ ©¬ H4
where
\2
DU 2 M 2 FW2 M 2 (C2 D2u F2w )
* 43
I1
I1
I2 I2
I1
I1 I1
I2
I1
u 3 w 3
u ªK º 1 ª G3 º « » R « w » N3 ¬K ¼ ¬G3 ¼ u T G3 B P ( A3 B3u E3w ) DT (C3 D3u F3w )
row( B1T PI2 D1T\ 2 ) row( B2T PI1 D2T\ 1 ) P3 M 3 B33 G3w
E T P ( A3 B3u E3w ) F T (C3 D3u F3w )
row( E1T PI2 F1T\ 2 ) row( E2T PI1 F2T\ 1 ) P3 M 3 E33 and B2u
B1 ( I 4
U1 M 1 )
w 2
E1 ( I 4
W1 M 1 )
u 2
D1 ( I 4
U1 M 1 )
w 2
F1 ( I 4
W1 M 1 )
u 3
B
B1 ( I 4
U 2 M 2 ) B2 ( I 4(2)
U1 M 1 )
E3u
E1 ( I 4
W2 M 2 ) E2 ( I 4(2)
W1 M 1 )
D3u
D1 ( I 4
U 2 M 2 ) D2 ( I 4(2)
U1 M 1 )
E
D F
167
168
Nonlinear H2/H Constrained Feedback Control
F3u B33
F1 ( I 4
W2 M 2 ) F2 ( I 4(2)
W1 M 1 )
¦
B3ts
B330 B321
t s 3 s t 0,t t 2 3
i 1
3
¦( ¦
*l (i 1)
B
* m (3i ) )
l m 3 l t i 1,m t 3 i
i 1
¦
(*l 0
B
* m 2 )
¦
(*l 0
B1
* m 2 )
l m 3 l t 0,m t 2
¦( ¦
¦
(*l1
B
* m1 )
¦
(*l1
B1
* m1 )
l m 3 l t1,m t1
l m 2 l t 0,m t 2
*l (i 1)
B1
* m (3i ) )
l m 2 l t i 1,m t 3 i
¦
(*l 2
B
* m 0 )
l m 3 l t 2,m t 0
l m 2 l t1,m t1
¦
(*l 2
B1
* m 0 )
l m 2 l t 2,m t 0
* 00
B
*32 *10
B
* 22 *11
B
* 21 * 21
B
*11 *32
B
* 00 * 22
B
*10 * 00
B1
* 22 *11
B1
*11 * 22
B1
* 00 E33
¦
E330 E321
E3ts
t s 3 s t 0,t t 2 3
¦( ¦ i 1
3
*l (i 1)
E
* m (3i ) )
l m 3 l t i 1,m t 3 i
¦( ¦ i 1
*l (i 1)
E1
* m (3i ) )
l m 2 l t i 1,m t 3 i
* 00
E
*32 *10
E
* 22 *11
E
* 21 * 21
E
*11 *32
E
* 00 * 22
E
*10 * 00
E1
* 22 *11
E1
*11 * 22
E1
* 00 where *32
I2
I1
I2 I2
I1 ,*10
u 2
P4
P4[3u K3u , W3
w 2
BU 2 M 2 EW2 M 2 ( A2 B E )
Using (7.53) and (7.49) with k
U3
I2
01u4 ,* 21
4 gives
P4[3w K3w
H 4 L41 [178.42 83.024 8.9588 194.29 275.14 75.947 28.702 13.255 69.447 98.195 5.2063 126.36 1.1299 41.294 71.673 95.619 47.355 21.619
29.016 73.57 28.639 26.429
4.9062 7.3569 3.9893]1u35
An Algorithm to Solve Discrete HJI Equations
U3
169
[2.1846 0.85148 0.88087 0.54069 2.5123 0.63429 1.8631 0.46251 1.6363 1.1538 1.2889 0.025736 1.1789 0.080127 0.55424 1.3863 0.00027142 0.019299 0.294560.26984]1u20
W3
[0.0119150.056461 0.0093656 0.00238170.033882 0.0051857 0.03302 0.00080421 0.010459 0.014978 0.040698 0.014287 0.01618 0.0033511 0.0042611 0.0059355 0.00093166 0.0048367 0.00374940.0055233]1u 20
To obtain the third-order controller from (7.11), we calculate H uu ( x, u , w ) and H wu ( x, u , w ) up to the second order using the following formulas: BT ( x )
H uu ( x, u* , w* )
E T ( x)
H wu ( x, u* , w* )
w 2V wD 2
w 2V wD 2
B ( x ) DT ( x ) D ( x ) D A( x ) B ( x ) u* E ( x ) w*
B( x) F T ( x) D( x) D A ( x ) B ( x ) u* E ( x ) w*
We first look for the second-order term of D in purpose, let V [4] (D ) P4 M 4D (4) . Then w 2V [4] wD 2
w 2V wD 2 D A( x) B( x)u* E ( x) w*
. For this
w 2 ( P4 M 4D (4) ) wD 2 w[ P4 M 4
¦
T (3)
D A ( x ) B ( x ) u* E ( x ) w*
4 i 1
(D (i 1)
I 4
D (4 i ) )]T
T
w[(D ) S4 ] wD
[ S4T
¦
3 i 1
wD w[ S4T D (3) ] wD
(D (i 1)
I 4
D (3i ) )]T
(D (0)
I 4
D (2) D (1)
I 4
D (1) D (2)
I 4
D ( 0) )T S4 S (D ) S4
where S4 is defined in (6.49) of Chapter 6 and is uniquely determined by P4 and (D (0)
I 4
D (2) D (1)
I 4
D (1) D (2)
I 4
D (0) )T
S (D ) Thus, we have
w 2V wD 2
P S (D ) S4 "
(7.72)
170
Nonlinear H2/H Constrained Feedback Control
Using identity (6.26) of Chapter 6 gives
BT S (D ) S4 B
BT (D (0)
I 4
D (2) D (1)
I 4
D (1) D (2)
I 4
D (0) )T S4 B
[(D (0)
I 4
D (2) ) BD (0) (D (1)
I 4
D (1) ) BD (0) (D (2)
I 4
D (0) ) BD (0) ]T S4 B [( I 4(0)
B
I 4(2) )D (2) ( I 4(1)
B
I 4(1) )D (2) ( I 4(2)
B
I 4(0) )D (2) ]T S4 B BT S4T [( I 4(0)
B
I 4(2) ) ( I 4(1)
B
I 4(1) ) ( I 4(2)
B
I 4(0) )]D (2) BT S4T B4D (2) where
B4
( I 4(0)
B
I 4(2) ) ( I 4(1)
B
I 4(1) ) ( I 4(2)
B
I 4(0) )
Similarly, we have
E T S (D ) S4 B
BT S4T E4D (2)
where
E4
( I 4(0)
E
I 4(2) ) ( I 4(1)
E
I 4(1) ) ( I 4(2)
E
I 4(0) )
Also, we have
D (2)
(
¦
f
f
I x ( k ) )
(¦ k 1Ik x ( k ) )
k 1 k
I1(2) x (2) 2[3 ] ( x) where O[3+](x) denotes any smooth function that vanishes at the origin together with its first- and second-order derivatives. Now, we have, noting D ( x) D, H uu ( x, u* ( x ), w* ( x ))
T
( BT x (2) B2T 2[3 ] ( x))
w 2V ( B B2 x (2) 2[3 ] ( x)) DT D wD 2
T
( BT x (2) B2T 2[3 ] ( x))( P S (D ) S4 ") u ( B B2 x (2) 2[3 ] ( x)) DT D T
BT PB DT D x (2) B2T PB BT S (D ) S4 B BT PB2 x (2) 2[3 ] ( x) BT PB DT D BT PT B2 x (2) BT S4T B4D (2) BT PB2 x (2) 2[3 ] ( x) BT PB DT D ( BT PT B2 BT S4T B4I1(2) BT PB2 ) x (2) 2[3 ] ( x) BT PB DT D B x (2) 2[3 ] ( x ) 2
An Algorithm to Solve Discrete HJI Equations
171
where B 2
BT PT B2 BT S 4T B4I1(2) BT PB2
Similarly, we have, noting F ( x)
H wu ( x, u* ( x), w* ( x))
0, T
( E T x (2) E2T 2[3 ] ( x))
w 2V ( B B2 x (2) 2[3 ] ( x)) wD 2
E T PB E 2 x (2) 2[3 ] ( x) where
E 2
BT PT E2 BT S4T E4I1(2) E T PB2
Finally, noting that
H uu1 ( x, u * ( x), w* ( x)) H wu ( x, u* ( x), w* ( x)) ( BT PB DT D B 2 x (2) ) 1 ( E T PB E 2 x (2) ) 2[3 ] ( x) B 2 x (2) 1 (1 )( E T PB E 2 x (2) ) 2[3 ] ( x) BT PB DT D BT PB DT D E T PBB 2 E 2 E T PB T ( ) x (2) 2[3 ] ( x) 2 T T T T B PB D D ( B PB D D) B PB DT D gives the third-order controller as follows: u[3]
U1 x U 3 x[3] E T PB ( w W1 x W3 x[3] ) BT PB DT D E T PBB 2 E 2 ) x (2) ( w W1 x ) ( T T 2 T ( B PB D D) B PB DT D
(7.73)
Dropping the third-order terms in (7.73) gives the linear controller as follows: u
U1 x ( BT PB DT D ) 1 ( E T PB )( w W1 x)
(7.74)
172
Nonlinear H2/H Constrained Feedback Control
7.4 Computer Simulation Extensive computer simulation has been conducted to evaluate the performance of our control law. Figure 7.1 shows the time responses of the four states of the openloop system under a sinusoidal disturbance F (t ) sin(2.5t ) with zero initial condition. Figure 7.2 shows the time responses of the position output x1 of the closed-loop system and the control input under linear and third-order control, respectively, with the disturbance F (t ) sin(2.5t ) and zero initial condition. Figure 7.3 repeats the scenario of Figure 7.2 but with the disturbance F (t ) 5sin(2.5t ).
Figure 7.1. Time response of the four states of the open-loop system under a sinusoidal disturbance F (t ) sin(2.5t ) with zero initial condition
An Algorithm to Solve Discrete HJI Equations
173
Figure 7.2. Profiles of the output of the closed-loop system and the control input under linear and third-order control, respectively, with the disturbance F (t ) sin(2.5t ) and zero initial condition
To demonstrate more scenarios, Table 7.1 lists the steady state amplitudes of the time response of the position variable x1 for several different frequencies with Am 3.5 and Table 7.2 lists the steady state amplitudes of the time response of the position variable x1 for several different amplitudes with w 2. It can be seen that when the amplitude of the disturbance is relatively small, the performance of the linear and third-order controller is similar. But as the amplitude of the disturbance increases or as the frequency of the disturbance decreases, the third-order controller performs better than the linear controller. In particular, the stability domain of the closed-loop system resulting from the third-order controller is larger than that of the closed-loop system resulting from the linear controller.
174
Nonlinear H2/H Constrained Feedback Control
Figure 7.3. Profiles of the output of the closed-loop system and the control input under linear and third-order control, respectively, with the disturbance F (t ) 5sin(2.5t ) and zero initial condition
Table 7.1. Maximal steady state amplitudes of x1 with Am
Frequency w 2 w 3 w 4
Nonlinear controller 1.0335 0.43105 0.23858
Linear controller 1.1543 0.44706 0.24053
Table 7.2. Maximal steady state amplitudes of x1 with w
Frequency Am=0.5 Am=2 Am=3.5
Nonlinear controller 0.16455 0.61135 1.0335
3.5
2
Linear controller 0.16485 0.65917 1.1543
An Algorithm to Solve Discrete HJI Equations
175
7.5 Bibliographical Notes The discrete H control problem has been studied in several papers [14], [36], and [62]. Theorem 7.1 is adapted from [62]. Existence of an analytical solution to the discrete HJI equation was studied in [37], where an iterative procedure for computing the Taylor series solution of the discrete HJI equation was also given. It was shown in [44] that the coefficients of the Taylor series solution of the discrete HJI equation are governed by one discrete algebraic Riccati equation and a sequence of linear algebraic equations. A numerical algorithm for calculating these coefficients was also given in [44]. Sections 7.2 to 7.4 are based on [44]. Section 7.5 is based on the result presented in [48].
8 Hf Static Output Feedback
In this chapter our objective is to illustrate H f static output feedback design techniques for linear time-invariant systems. Static output feedback is applied in many areas of control engineering including process and flight control. The main advantage of the static output feedback is the simplicity of its implementation and ability it provides for designing compensators of prescribed structure. In this chapter we present a design method for deriving the static output feedback that satisfies prescribed H f performance criteria. The chapter is organized as follows. In Section 8.1, we introduce the notations and design objective briefly. Section 8.2 presents intermediate mathematical analysis which will be used in the Section 8.3. Necessary and sufficient conditions are derived in Section 8.3 and include two coupled matrix design equations. In Section 8.4 conditions are given under which static output feedback H f solution yield a well-defined saddle point for the zerosum differential game. A numerically efficient solution algorithm to solve the coupled equations is presented in Section 8.5. In Section 8.6, the algorithm is applied to design an F-16 normal acceleration design to illustrate the power of the proposed approach.
8.1 Introduction Consider the linear time-invariant system of Figure 8.1 with control input u (t ) output y (t ) , and disturbance d (t ) given by x
Ax Bu Dd , y
Cx
(8.1)
and a performance output z (t ) that satisfies
z (t )
2
xT Qx u T Ru, y
Cx
178
Nonlinear H2/H Constrained Feedback Control
x Ax Bu Dd y Cx 2 z xT Qx u T Ru
d
u
u
z
y
Ky
Figure 8.1. System description
for some positive matrices Q t 0 and R ! 0 . It is assumed that C has full row rank, a standard assumption to avoid redundant measurements. By definition the pair ( A, B ) is said to be stabilizable if there exists a real matrix K such that A BK is (asymptotically) stable. The pair ( A, C ) is said to be detectable if there exists a real matrix L such that A LC is stable. System (8.1) is said to be output feedback stabilizable if there exists a real matrix K such that A BKC is stable. The system L2 -gain is said to be bounded or attenuated by J if f
³
0 f
³
f
2
T
³ ( x Qx u
z (t ) dt
T
Ru )dt
0
f
2
³ (d
d (t ) dt
0
T
dJ2
(8.2)
d )dt
0
Bounded L2-gain design problem: Defining a constant output feedback control as u
Ky
KCx
(8.3)
it is desired to find a constant output feedback gain K such that the system is stable and the L2 -gain is bounded by a prescribed value J .
8.2 Intermediate Mathematical Analysis To find a constant output feedback gain K as described in the previous section, one may define the value functional f
J (K , d )
T
³ ( x Qx u
T
Ru J 2 d T d )dt
0
f
³[x 0
T
(Q C T K T RKC ) x J 2 d T d ]dt
(8.4)
Hf Static Output Feedback
179
the corresponding Hamiltonian is defined as T
H ( x, Vx , K , d )
wV [( A BKC ) x Dd ] wx xT (Q C T K T RKC ) x J 2 d T d
(8.5)
with co-state Vx . It is known that for linear systems the value functional V ( x ) is quadratic and may be taken in the form V xT Px ! 0 without loss of generality. We shall do so throughout the chapter. Two lemmas simplify the presentation of our main theorem in the next section, which solves this output feedback control problem. Lemma 8.1 is a mathematical description of Hamiltonian H , (8.5) at given predefined disturbance d , (8.6) and gain K , (8.7). It shows that if the gain K exists, the Hamiltonian takes on a special form.
Lemma 8.1 For the disturbance defined as d (t )
1
J2
DT Px
(8.6)
if there exists K satisfying
K C
R 1 ( BT P L)
(8.7)
for some matrix L , then one can write H ( x, Vx , K , d )
xT [ PA AT P Q
1 PDDT P J2
PBR 1 BT P LT R 1 L]x
(8.8)
Proof. Introduce a quadratic form V ( x) V
Then Vx
xT Px ! 0
(8.9)
2 Px , and substitution in (8.5) gives H ( x, Vx , K , d )
2 xT P [( A BKC ) x Dd ] xT (Q C T K T RKC ) x J 2 d T d
(8.10)
Note that H ( x,Vx K , d ) is globally concave in d . To find a maximizing disturbance, set
180
Nonlinear H2/H Constrained Feedback Control
wH wd
0
2 DT Px 2J 2 d
This defines the maximizing or worst case disturbance (8.6). Substitute (8.6) into (8.10) to get H ( x, Vx , K , d )
2 xT P[( A BKC ) x D
1
J2
DT Px] T
§ 1 · 1 x (Q C K RKC ) x J ¨ 2 DT Px ¸ 2 DT Px ©J ¹ J 1 xT [ PA AT P Q 2 PDDT P PBKC T
T
2
T
J
T
T
T
T
T
C K B P C K RKC ]x Completing the squares yields H ( x, Vx , K , d )
xT [ PA AT P Q
1
J2
PDDT P PBR 1 BT P
( KC R 1 BT P )T R ( KC R 1 BT P)]x
(8.11)
Substituting the gain defined by (8.7) into (8.11) yields (8.8) H ( x, Vx , K , d )
xT [ PA AT P Q 1
1
J
2
PDDT P PBR 1 BT P
( R B P R L R 1 BT P)T R( R 1 BT P R 1 L R 1 BT P)] x T
1
or H ( x, Vx , K , d )
xT [ PA AT P Q
1
J2
PDDT P PBR 1 BT P LT R 1 L]x
, The next lemma expresses the Hamiltonian for any K and d (t ) in terms of the Hamiltonian for K and d (t ). Lemma 8.2 Suppose there exists K so that Lemma 8.1 holds, then for any x(t ), K and d (t ) one can write H ( x, Vx , K , d )
H ( x, Vx , K , d ) xT [ LT R 1 L]x J 2 d d
2
xT [ L R ( K K )C ]T R 1[ L R ( K K )C ]x for K satisfying (8.7), d satisfying (8.6).
(8.12)
Hf Static Output Feedback
181
Proof. One has for any x(t ) , K , d (t ) , and a quadratic form V ( x) defined by (8.9) 2 xT P[( A BKC ) x Dd ]
H ( x, Vx , K , d )
xT (Q C T K T RKC ) x J 2 d T d
(8.13)
whence, one may derive H ( x, Vx , K , d )
xT [ PA AT P Q
1
J2
PDDT P PBR 1 BT P LT R 1 L] x
xT [ PBKC C T K T BT P C T K T RKC
1
J2
PDDT P
PBR 1 BT P LT R 1 L]x xT 2 PDd J 2 d T d or H ( x, Vx , K , d )
H ( x, Vx K , d ) (8.14) xT [ PB ( KC R 1 BT P) C T K T ( BT P RKC ) 1 LT R 1 L]x xT [ 2 PDDT P ]x xT 2 PDd J 2 d T d
J
Substituting R 1 ( BT P L) K C , R 1 BT P K C R 1 L, BT P RK C L, and PB C T ( K )T R LT into the first term in square brackets yields, after some manipulations
xT [ PB( KC R 1 BT P) C T K T ( BT P RKC ) LT R 1 L]x xT [C T ( K K )T R( K K )C ]x xT [ LT R 1 L C T ( K K )T L LT ( K K )C LT R 1 L]x The result contains nonsquare terms. One must change these into square form and study the contribution in order to reach any conclusion. Therefore complete the square to see that xT [ LT R 1 L C T ( K K )T L LT ( K K )C LT R 1 L]x xT [^ LT C T ( K K )T R` R 1 ^ L R ( K K )C` C T ( K K )T RR 1 R( K K )C LT R 1 L]x Therefore one has
182
Nonlinear H2/H Constrained Feedback Control
xT [ PB( KC R 1 BT P) C T K T ( BT P RKC ) LT R 1 L]x xT [^ LT C T ( K K )T R` R 1 ^ L R ( K K )C` LT R 1 L]x
(8.15)
Consider now the remaining three terms on the right-hand side of (8.14). One has 1
d
J2
DT Px
so that 1
( d )T
J2
xT PD xT PD
J 2 ( d )T J 4 d ( d )T
xT [ PDDT P]x
Therefore one can show xT [
1
J
2
PDDT P]x xT 2 PDd J 2 d T d
J 2 d d
2
Substituting now Equations (8.16) and (8.15) into (8.14) yields (8.12).
(8.16)
,
Remark 8.1 According to the proof and the form of the Hamiltonian in (8.12), d (t ) given by (8.6) can be interpreted as a worst-case disturbance since the equation is negative definite in d d * . The form (8.12) of the Hamiltonian does not allow the interpretation of K defined by (8.7) as a minimizing control. More shall be said about this in Section 8.4.
8.3 Coupled HJ Equations for H Static Output Feedback Control The following main theorem presents necessary and sufficient conditions for output feedback stabilizability with prescribed attenuation J .
Theorem 8.1 (Necessary and Sufficient Conditions for H Static Output Feedback Control): Assume that Q t 0 and ( A, Q ) is detectable. Then, the system defined by (8.1) is output feedback stabilizable with L2-gain bounded by J , , if and only if i. ( A, B ) is stabilizable and ( A, C ) is detectable ii. There exist matrices K and L such that K C
R 1 ( BT P L)
where P ! 0 , PT
P , is a solution of
(8.17)
Hf Static Output Feedback
PA AT P Q
1
J2
PDDT P PBR 1 BT P LT R 1 L
183
(8.18)
0
Proof. Sufficiency. To prove sufficiency first note that Lemma 8.1 shows that H ( x, Vx , K , d ) 0 if (ii) holds. It is next required to show bounded L2 -gain if (ii) holds. From Lemma 8.1 and Lemma 8.2, one has for any K , x(t ) , and d (t ) H ( x, Vx , K , d )
xT [ L R( K K )C ]T R 1[ L R( K K )C ]x xT [ LT R 1 L]x J 2 d d
2
(8.19) Ky
Note that one has, along the system trajectories, for u
KCx
T
wV wV x wt wx T wV ( Ax Bu Dd ) wx T wV [( A BKC ) x Dd ] wx
dV dt
so that from (8.5)
H ( x, Vx , K , d )
dV xT (Q C T K T RKC ) x J 2 d T d dt
(8.20)
With Equations (8.19) and (8.20) xT [ L R ( K K )C ]T R 1[ L R( K K )C ]x xT [ LT R 1 L]x J 2 d d
2
dV xT (Q C T K T RKC ) x J 2 d T d dt Selecting K
K , for all d (t ) , and x(t )
dV xT (Q C T K T RKC ) x J 2 d T d dt xT [ LT R 1 L]x J 2 d d
2
d0
(8.21)
Integrating this equation yields T
V ( x(T )) V ( x(0)) ³ [ xT (Q C T K T RKC ) x J 2 d T d ] dt d 0 0
(8.22)
184
Nonlinear H2/H Constrained Feedback Control
Selecting x(0) obtains
0 and noting that non-negativity implies V ( x (T )) t 0 T , one
T
T
T T T T 2 ³ x (Q C K RKC ) xdt d J ³ d d dt 0
(8.23)
0
for all T ! 0 , so that the L2 gain is less than J Finally, to prove the stability of the closed-loop system, letting d (t ) one has dV d xT (Q C T K T RKC ) x d xT Qx dt
0 in (8.21)
(8.24)
Now detectability of ( A, Q ) shows that the system is locally AS with Lyapunov function V ( x) . Necessity. To prove necessity, suppose that there exists an output feedback gain K that stabilizes the system and satisfies L2 gain J . It follows that Ac { A BKC is stable. Since A BKC A LC A BK , then (i) follows. Consider the equation AcT P PAc
1
J2
PDDT P Q C T K T RKC
0
(8.25)
From [55], Theorem 2.3.1, closed-loop stability and L2 gain boundedness implies that (8.25) has a unique symmetric solution such that P t 0 . Rearranging (8.25) and completing the square will yield 0
PA AT P Q
1
J2
PDDT P PBR 1 BT P
( KC R 1 BT P )T R ( KC R 1 BT P )
(8.26)
(8.18) is obtained from (8.26) for the gain defined by (8.17) and (ii) is verified.
,
Note that (8.25) is a Lyapunov equation referred to the output z (t ) , since 2 z (t ) xT Qx u T Ru . Moreover, this theorem reveals the importance of the Hamiltonian H ( x,Vx , K , d ) since the equation H ( x, Vx , K , d ) 0 must hold for a stabilizing output feedback with bounded H f gain. Note further that if C I , L 0 , this theorem reduces to known results for full state variable feedback. Table 8.1 summarizes the results of this section.
Hf Static Output Feedback Table 8.1.
185
H output feedback regulator
System model: x
Ax Bu Dd
y
Cx
Control: u
Ky
Value functional: f T
³ ( x Qx u
J (K , d )
T
Ru J 2 d T d ) dt
0
with Q t 0, R ! 0 H f gain design equations: PA AT P Q K C
1
J2
PDDT P PBR 1 BT P LT R 1 L
0
R 1 ( BT P L)
8.4 Existence of Static Output Feedback Game Theoretic Solution The form of (8.12) does not allow the interpretation of ( K , d ) as a well-defined saddle point. The purpose of this section is to study when the two policies are in saddle point equilibrium for H f static output feedback. This means one has Nash equilibrium in the game theoretic sense as discussed in [16], so that the H f output feedback problem has a unique solution for the resulting L . In fact, this is the case when Theorem 8.1 is satisfied with L 0 , as we now show using notions from two-player, zero-sum differential game theory [16][55]. The minimizing player controls u (t ) and maximizing player controls d (t ).
Theorem 8.2. (Existence of Well-defined Game Theory Solution): ( K , d ) is a well-defined game theoretic saddle point corresponding to a zero-sum differential game if and only if L is such that
186
Nonlinear H2/H Constrained Feedback Control
M { LT ( K K )C C T ( K K )T L C T ( K K )T R( K K )C t0
(8.27)
when K z K . Note that this is always true if L
0.
Proof. Equation (8.12) becomes H ( x, Vx , K , d )
H ( x, Vx , K , d )
xT [ L R ( K K )C ]T R 1[ L R ( K K )C ]x xT [ LT R 1 L]x J 2 d d
2
H ( x, Vx , K , d ) xT LT R 1 Lx xT LT ( K K )Cx xT [ R( K K )C ]T R 1 Lx xT [ R ( K K )C ]T R 1[ R( K K )C ]x xT [ LT R 1 L]x J 2 d d
2
Hence one has H ( x, Vx , K , d )
H ( x, Vx , K , d ) xT Mx J 2 d d
2
(8.28)
Under the condition defined by (8.27), one has H ( x, Vx , K , d ) d H ( x, Vx , K , d ) d H ( x,Vx , K , d )
(8.29)
w2 H ! 0, wu 2
(8.30)
or w2 H 0 wd 2
at ( K , d ). It is known that a saddle point at the Hamiltonian implies a saddle point at the value function J when considering finite-horizon zero-sum games. For the infinite-horizon case, the same strategies remain in saddle point equilibrium when sought among the class of stabilizing strategies [15]. Therefore, this implies that w2 J ! 0, wu 2
w2 J 0 wd 2
which guarantees a game theoretic saddle point.
(8.31)
,
Hf Static Output Feedback
187
Remark 8.2 To complete the discussion in the Remarks following Lemma 8.2, note that Theorem 8.2 allows the interpretation of K defined by (8.7), when L 0 , as a minimizing control in a game theoretic sense. It is important to understand that introducing L in Theorem 8.1 provides the extra design freedom needed to provide necessary and sufficient conditions for the existence of the H f output feedback solution. If L z 0, then there may exist a saddle point in some cases. However counter-examples are easy to find.
8.5 Iterative Solution Algorithm The importance of this H f control design is that the matrix equations in Table 8.1 are used to solve all the m u p elements of K at once. This corresponds to closing all the feedback loops simultaneously. Moreover, as long as certain reasonable conditions on the plant and the value functional hold, the closed-loop system is guaranteed to be stable. Most existing iterative algorithms for output feedback design require the determination of an initial stabilizing gain [73], which can be very difficult for practical systems. The following algorithm in Table 8.2 is proposed in [32] to solve the two coupled design equations in Theorem 8.1. Table 8.2.
1. Initialize: Set n 0 , L0
Hf Static Output feedback Solution Algorithm
0 , and select J , Q and R.
2. nth iteration: Solve for Pn in Pn A AT Pn Q
1
J2
Pn DDT Pn Pn BR 1 BT Pn LnT R 1 Ln
0
(8.32)
Evaluate gain and update L K n 1
R 1 ( BT Pn Ln )C T (CC T ) 1
(8.33)
Ln 1
RK n 1C BT Pn
(8.34)
If K n 1 and K n are close enough to each other, go to 3, otherwise set n and go to 2.
3. Terminate: Set K K n 1
n 1
188
Nonlinear H2/H Constrained Feedback Control
Note that this algorithm uses well-developed techniques for solving Riccati equations available, for instance, in MATLAB. It generalizes the algorithm in [56] to the case of nonzero initial gain.
Lemma 8.3 If this algorithm converges, it provides the solution to (8.17) and (8.18). Proof. Clearly at convergence (8.18) holds for Pn . Note that substitution of (8.33) into (8.34) yields Ln 1 Defining C
R[ R 1 ( BT Pn Ln )C T (CC T ) 1 ]C BT Pn
C T (CC T ) 1 as the right inverse of C one has ( BT Pn Ln )C C BT Pn
Ln 1
BT Pn ( I C C ) Ln C C At convergence Ln 1
BT P L
Ln { L , Pn { P so that 0
L( I C C ) BT P( I C C )
0
( BT P L)( I C C )
( BT P L)C C
(8.35)
This guarantees that there exists a solution K to (8.17) given by K
R 1 ( BT P L)C
,
8.6 H Static Output Feedback Design for F-16 Normal Acceleration Regulator In this example, we shall show that using the H-infinity design equations in Table 8.1, we can close all the loops simultaneously using output feedback aircraft control design for an F-16 normal acceleration regulator. It is very important to design feedback control regulators of prescribed structure for both stability augmentation systems (SAS) and control augmentation systems (CAS). Therefore, as opposed to full state-variable feedback, static output feedback design is required. This example shows the power of the proposed static H f output feedback design technique, since it is easy to include model dynamics, sensor processing dynamics, and actuator dynamics, but no additional dynamics (e.g. regulator) are needed.
Hf Static Output Feedback
189
The output feedback design algorithm in Table 8.2 is applied to the problem of designing an output feedback normal acceleration regulator for the F-16 aircraft in [87]. The control system is shown in Figure 8.2, where nz is the normal acceleration, r is the reference input in g ’s, and the control input u is the elevator actuator angle. To ensure zero steady-state error an integrator has been added in the feedforward path, this corresponds to the compensator dynamics. The integrator output is H . The short period approximation is used so the aircraft states are pitch rate q and angle of attack D . Since alpha measurements are quite noisy, a low pass filter with the cutoff frequency 10 rad/s is used to provide filtered measurements D F of the angle of attack. An additional state G e is introduced by the elevator actuator.
d(t)
ke r
-
+
e
1 s
H
kI
-
-
-
z u
20 . 2 s 20 . 2
įe
nz
Aircraft D ,q
D +
+
kĮ kq
10 s + 10
q
Figure 8.2. G command system
The state vector is: x(1) D : Angle of attack. x(2) q : Pitch rate. x(3) G e : Elevator actuator. x(4) D F : Filtered measurement of angle of attack. x(5) H : Integral controller. The measurement outputs are y [D F q e H ]T . We use the short period approximation to the F-16 dynamics linearized about the nominal flight condition described in [87], Table 3.4-3 (502 ft/s, level flight, dynamic pressure of 300 psf, xcg 0.35c ) and the dynamics are augmented to include the elevator actuator, angle-of-attack filter, and compensator dynamics. The result is
with,
x
Ax Bu Dd
y
Cx
(8.36)
190
Nonlinear H2/H Constrained Feedback Control
ª 1.01887 0.90506 0.00215 0 « 0.82225 1.07741 0.17555 0 « « 0 0 20.2 0 « 0 0 10 « 10 «¬ 16.26 0.4852 0 0.9788
A
B
ª 0 º « 0 » « » « 20.2 » , D « » « 0 » «¬ 0 »¼
ª0 º «0 » « » «1 » , C « » «0 » «¬ 0 »¼
0º 0 »» 0» » 0» 0 »¼
0 0 57.2958 0 º ª 0 « 0 57.2958 0 0 0 »» « « 16.26 0.9788 0.4852 0 0» « » 0 0 0 1¼ ¬ 0
The factor of 57.2958 is added to convert angles from radians to degrees. The control input is u Ky [kD kq ke k I ] y. It is required to select the output feedback feedback gains to yield stability with good closed-loop response. Note that kD and kq are feedback gains, while ke and k I are feedforward gains. This approach allows the adjustment of both for the best bounded L2 -gain performance. The algorithm presented above was used to design an H f pitch-rate regulator for a prescribed value of J . For computation of the output feedback gain K it is necessary to select Q and R . Using the algorithm described above for the given J , Q , and R the control gain K is easily found using MATLAB in a few seconds. If this gain is not suitable in terms of time responses and closed-loop poles, the elements of Q and R can be changed and the design repeated. After repeating the design several times we selected the design matrices as
Q
ª 264 16 1 0 0 º « 16 60 0 0 0 » « » « 1 0 0 0 0 » , R [0.1] « » 0 0 0 0 » « 0 «¬ 0 0 0 0 100 »¼
which yields the feedback matrix K
>0
0.1778 12.4336 31.7201@
The resulting closed-loop poles are at s
28.3061 , 1.4974 r 1.2148i , 3.1809 , 10
The resulting gains are applied to the system, and a unit step disturbance d (t ) is introduced in simulations to verify robustness of the design. The resulting time
Hf Static Output Feedback
191
responses shown in Figures 8.3 and 8.4 are very good. Note that, though we designed an H f regulator, the structure of the static output feedback controller with the prescribed loops also guarantees good tracking.
Figure 8.3. Angle of attack
Figure 8.4. Pitch rate
Note on selecting J The gain parameter J defines the L2 -bound for a given disturbance. One can quickly perform the design using the above algorithm for a prescribed value of J in a few seconds using MATLAB If the algorithm converges, then the
192
Nonlinear H2/H Constrained Feedback Control
parameter J may be reduced. If J is taken too small, the algorithm will not converge since the algebraic Riccati equation has no positive definite solution. This provides an efficient and fast trial-and-error method for determining the smallest allowable J for given Q and R design matrices which solves the H f problem. For this example, the H f value of J is found to be equal to 0.2, for which the above results were obtained.
8.7
Bibliographical Notes
This work appears in [32], it is well known that the output feedback optimal control solution is prescribed in terms of three coupled matrix equations [61], namely two associated Riccati equations and a spectral radius coupling equation. A sequential numerical algorithm to solve these equations is presented in [73]. Output feedback stabilizability conditions that require the solution of only two coupled matrices are given in [56]. It is shown in [32] that static output feedback H f solution does not yield a well-defined saddle point for a zero-sum differential game; conditions are given under which it is does. A numerical algorithm for calculating static output feedback gains was also given in [32]. Section 8.6 applies a design algorithm to the problem of designing an output feedback normal acceleration regulator for the F-16 aircraft in [87].
References
[1] Abu-Khalaf, M., F. L. Lewis, “Nearly optimal controls laws for nonlinear systems with saturating actuators using a neural network HJB approach,” Automatica, Issue 5, pp. 779–791, 2005. [2] Abu-Khalaf, M., F. L. Lewis, “Nearly optimal state feedback control of constrained nonlinear systems using a neural networks HJB approach,” IFAC Annual Reviews in Control, Vol. 28, pp. 239–251, 2004. [3] Abu-Khalaf, M., F. L. Lewis, J. Huang, “Hamilton-Jacobi-Isaacs formulation for constrained input nonlinear systems,” Proceedings of the 43rd IEEE CDC, pp. 5034–5040, Atlantis, Paradise Island, Bahamas, 2004. [4] Abu-Khalaf, M., F. L. Lewis, “Neural network policy iterations with saturated control policy for the disturbance attenuation zero-sum game,” IEEE Transactions on Neural Networks, (to appear in October 2006). [5] Abu-Khalaf, M., F. L. Lewis, J. Huang, “Policy iterations on the Hamilton– Jacobi–Isaacs equation for H state feedback control with input saturation,” IEEE Transactions on Automatic Control, (to appear in 2006). [6] Adams, R., Fournier, J., Sobolev Spaces, 2nd edition, Academic Press, 2003. [7] Apostol, T., Mathematical Analysis, Addison-Wesley, USA 1974. [8] Astolfi, A., P. Colaneri, “A Hamilton-Jacobi setup for the static output feedback stabilization of nonlinear systems,” IEEE Transactions on Automatic Control, Vol. 47, pp. 2038–2041, December 2002. [9] Astolfi, A., P. Colaneri, “Static output feedback stabilization: from linear to nonlinear and back,” Nonlinear and Adaptive Control in 2000, Vol. 1, Springer-Verlag, New York, 2000. [10] Ball, J., J. W. Helton, “Viscosity solutions of Hamilton-Jacobi equations arising in nonlinear Hf-control”, Journal of Mathematical Systems, Estimation, and Control, Vol. 6, No. 1, pp. 1–22, 1996.
194
References
[11] Ball, J., J. W. Helton, M. Walker, “ Hf Control for Nonlinear Systems with Output Feedback”, IEEE Transactions on Automatic Control, Vol. 38, pp. 546– 559 , April 1993. [12] Ball, J.A., J.W. Helton, “Hf-control for nonlinear plants: connection with differential games”, Proceedings of the IEEE Conference on Decision and Control, pp. 956–962, 1989. [13] Bardi, M., I. Capuzzo-Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Birkhauser, Boston, MA, 1997. [14] Baúar, T., “A dynamic game approach to controller design: disturbance rejection in discrete-time,” Proceedings of the 28th Conference on Decision and Control, pp. 407–414, 1989. [15] Baúar, T., G. J. Olsder, Dynamic Noncooperative Game Theory, 2nd edition, SIAM’s Classic in Applied Mathematics 23, SIAM, Philadelphia, 1999. [16] Baúar, T., P. Bernard, Hf Optimal Control and Related Minimax Design Problems, Birkhäuser, 1995. [17] Beard, R., Improving the Closed-Loop Performance of Nonlinear Systems, Ph.D. thesis, Rensselaer Polytechnic Institute, Troy, NY 12180, 1995. [18] Beard, R., G. Saridis, J. Wen, “Approximate solutions to the time-invariant Hamilton-Jacobi-Bellman equation,” Journal of Optimization Theory and Application, Vol. 96, No. 3, pp. 589–626, March 1998. [19] Beard, R., G. Saridis, J. Wen, “Galerkin Approximations of the Generalized Hamilton-Jacobi-Bellman Equation,” Automatica 33, Issue 12, pp. 2159– 2177, 1997. [20] Beard, R., T. McLain, “Successive Galerkin approximation algorithms for nonlinear optimal and robust control,” International Journal of Control, Vol. 71, No. 5, pp. 717–743, 1998. [21] Bernstein, D. S., “Optimal nonlinear, but continuous, feedback control of systems with saturating actuators,” International Journal of Control, Vol. 62, No. 5, pp. 1209–1216, 1995. [22] Bertsekas, D. P., J. N. Tsitsiklis, Neuro-dynamic Programming, Athena Scientific, Belmont, MA, 1996. [23] Bianchini, G., R. Genesio, A. Parentri, A. Tesi, “Global Hf controllers for a class of nonlinear systems”, IEEE Transactions on Automatic Control, Vol. 49, pp. 244–249, February 2004. [24] Bitsoris, G., E. Gravalou. “Design techniques for the control of discrete-time systems subject to state and control constraints,” IEEE Transactions on. Automatic control, Vol. 44, pp. 1057–1061, May 1999. [25] Bupp R., D. Bernstein, V. Coppola, “A benchmark problem for nonlinear control design,” International Journal of Robust and Nonlinear Control, Vol. 8, pp. 307–310, 1998.
References
195
[26] Burk, F., Lebesgue Measure and Integration, John Wiley & Sons, New York, NY, 1998. [27] Chen, F.-C., C.-C. Liu, “Adaptively controlling nonlinear continuous-time systems using multilayer neural networks,” IEEE Transactions on Automatic Control, Vol. 39, pp. 1306–1310, June 1994. [28] Deng, F., J. Huang, “Computer-aided design of nonlinear H control law: The benchmark problem,” Proceeding of 2001 Chinese Control Conference, Dalin, China, 840-845, 2001. [29] Doyle, J. H., K. Glover, P. Khargonekar, B. Francis, “State-space solutions to standard H2 and Hf control problems,” IEEE Transactions on Automatic Control, Vol. 34, pp. 831-847, August 1989. [30] Evans, M., Swartz, T., Approximating Integrals Via Monte Carlo and Deterministic Methods, Oxford University Press, 2000. [31] Finlayson, B. A., The Method of Weighted Residuals and Variational Principles, Academic Press, New York, NY, 1972. [32] Gadewadikar, J., F. Lewis, M. Abu-Khalaf, “Necessary and sufficient conditions for H-infinity static output-feedback control,” Journal of Guidance, Control, and Dynamics, (to appear in 2006). [33] Ge, S. S., C. C Hang, T. H. Lee, T. Zhang, Stable Adaptive Neural Network Control, Asian Studies in Computer and Information Science, Kluwer Academic Publishers, MA, 2002. [34] Genesio, R., M. Tartaglia, “On the estimation of asymptotic stability regions: state of the art and new proposals,” IEEE Transactions on Automatic Control, Vol. 30, pp. 747–755, August 1985. [35] Gilbert, E., K. T. Tan, “Linear systems with state and control constraints: the theory and application of maximal output admissible sets,” IEEE Transactions on Automatic Control, Vol. 36, pp. 1008–1020, Sep. 1991. [36] Guillard, H., S. Monaco, and D. Normand-Cyrot, “An approach to nonlinear discrete-Time H control,” Proceedings of the 32nd Conference on Decision and Control, pp. 178–183, 1993. [37] Guillard, H., S. Monaco, and D. Normand-Cyrot, “Approximated solutions to nonlinear discrete-time H control,” IEEE Transactions on Automatic Control, pp. 2143–2148, December 1995. [38] Han, D., S. N. Balakrishnan, “State-constrained agile missile control with adaptive-critic based neural networks,” Proceedings of the American Control Conference, pp. 1929–1933, June 2000. [39] Henrion, D., S. Tarbouriech, V. Kucera, “Control of linear systems subject to input constraints: a polynomial approach,” Automatica, Vol. 37, Issue 4, pp.597–604, 2001.
196
References
[40] Hill, D., P. Moylan, “The stability of nonlinear dissipative systems,” IEEE Transactions on. Automatic Control, Vol. 21, pp. 708–711, October 1976. [41] Hornik, K., M. Stinchcombe, H. White, “Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks,” Neural Networks, Vol. 3, pp. 551–560, 1990. [42] Hu, T., Z. Lin, B. M. Chen, “An analysis and design method for linear systems subject to actuator saturation and disturbance,” Automatica, Vol. 38, Issue 2, pp. 351–359, 2002. [43] Huang, C.-S., S. Wang, K. L. Teo, “Solving Hamilton-Jacobi-Bellman equations by a modified method of characteristics,” Nonlinear Analysis, Vol. 40, pp. 279–293, 2000. [44] Huang, J., “An algorithm to solve the discrete HJI equation arising in the L2gain optimization problem,” International Journal of Control, Vol. 72, No.1, pp. 49–57, 1999. [45] Huang, J., “An efficient algorithm to solve a sequence of linear equations arising in nonlinear Hf control”, Applied numerical Mathematics, Vol. 26, pp. 293–306, 1998. [46] Huang, J., C. F. Lin, “Numerical approach to computing nonlinear Hf control laws,” Journal of Guidance, Control, and Dynamics, Vol. 18, No. 5, pp. 989– 994, September-October 1995. [47] Isidori, A., A. Astolfi, “Disturbance attenuation and Hf-control via measurement feedback in nonlinear systems,” IEEE Transactions on Automatic Control, Vol. 37, pp. 1283–1293, September 1992. [48] Jia, P., J. Huang, “Disturbance attenuation of the nonlinear benchmark system by approximate discrete-time nonlinear H-infinity control,” Proceedings of the 2005 Chinese Control Conference, pp. 1776–1781, July 2005. [49] Kailath, T. Linear Systems, Prentice Hall, NJ, 1980. [50] Kang, W., P.K. De, A. Isidori, “Flight control in a windshear via nonlinear H methods,” Proceedings of the IEEE Control and Decision Conference, pp. 1135–1142, 1992. [51] Khalil, H., Nonlinear Systems, 3rd Edition, Prentice Hall, Upper Saddle River, NJ, 2003. [52] Kim, Y. H., F. L. Lewis, D. Dawson, “Intelligent optimal control of robotic manipulators using neural networks,” Automatic 36, Issue 9, pp. 1355–1364, 2000. [53] Kirk, D., Optimal Control Theory: An Introduction, Prentice Hall, New Jersey, 1970. [54] Kleinman, D., “On an iterative technique for Riccati equation computations,” IEEE Transactions on Automatic Control, Vol. 13, pp. 114–115, February 1968.
References
197
[55] Knobloch, H., Isidori, A., Flockerzi, D., Topics in Control Theory, Springer Verlag, Boston, 1993. [56] Kucera, V., De Souza, C. E., “A necessary and sufficient condition for output feedback stabilizability,” Automatica, Vol. 31, Issue. 9, pp. 1357–1359, 1995. [57] Lancaster, P., L. Rodman, Algebraic Riccati Equations, Oxford University Press Inc., New York, 1995. [58] Landelius, T., Reinforcement Learning and Distributed Local Model Synthesis, Ph.D. thesis, LinkSping University, 1997. [59] Lee, H. W. J., K. L. Teo, W. R. Lee, S. Wang, “Construction of suboptimal feedback control for chaotic systems using B-splines with optimally chosen knot points,” International Journal of Bifurcation and Chaos, Vol. 11, No. 9, pp. 2375–2387, 2001. [60] Lewis, F. L., S. Jagannathan, A. Yesildirek, Neural Network Control of Robot Manipulators and Nonlinear Systems, Taylor & Francis, London, 1999. [61] Lewis, F. L., V. L. Syrmos, Optimal Control, John Wiley & Sons, Inc. New York, NY, 1995. [62] Lin, W., C.I. Byrnes, “H control of discrete-time nonlinear systems,” IEEE Transaction on Automatic Control, Vol. 41, pp. 494–510, April 1996. [63] Lio, F. D., “On the Bellman equation for infinite horizon problems with unbounded cost functional,” Applied Mathematics and Optimization, Vol. 41, pp.171–197, 2000. [64] Liu, X., S. N. Balakrishnan, “Adaptive critic based neuro-observer,” Proceedings of the American Control Conference, pp. 1616–1621, June 2001. [65] Liu, X., S. N. Balakrishnan, "Convergence analysis of adaptive critic based optimal control,” Proceedings of the American Control Conference, pp.1929– 1933, June 2000. [66] Lyshevski, S. E., “Constrained optimization and control of nonlinear systems: new results in optimal control,” Proceedings of the IEEE Conference on Decision and Control, pp. 541–546, December. 1996. [67] Lyshevski, S. E., Control Systems Theory with Engineering Applications, Birkhauser, Boston, MA, 2001. [68] Lyshevski, S. E., “Optimal control of nonlinear continuous-time systems: design of bounded controllers via generalized nonquadratic functionals,” Proceedings of the American Control Conference, pp. 205–209, June 1998. [69] Lyshevski, S. E., “Role of performance functionals in control laws design,” Proceedings of the American Control Conference, pp. 2400–2405, June 2001. [70] Lyshevski, S. E., A. U. Meyer, “Control system analysis and design upon the Lyapunov method,” Proceedings of the American Control Conference, pp. 3219–3223, June 1995.
[71] Mikhlin, S. G., Variational Methods in Mathematical Physics, Pergamon, Oxford, 1964.
[72] Miller, W. T., R. Sutton, P. Werbos, Neural Networks for Control, The MIT Press, Cambridge, Massachusetts, 1990.
[73] Moerder, D. D., A. J. Calise, “Convergence of a numerical algorithm for calculating optimal output feedback gains,” IEEE Transactions on Automatic Control, Vol. 30, pp. 900–903, September 1985.
[74] Mracek, C., J. Cloutier, “A preliminary control design for the nonlinear benchmark problem,” Proceedings of the 1996 International Conference on Control Applications, Dearborn, MI, pp. 265–272, 1996.
[75] Munos, R., L. C. Baird, A. Moore, “Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation,” International Joint Conference on Neural Networks (IJCNN), Vol. 3, pp. 2152–2157, 1999.
[76] Murray, J., C. Cox, G. Lendaris, R. Saeks, “Adaptive dynamic programming,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 32, No. 2, pp. 140–153, 2002.
[77] Narendra, K. S., F. L. Lewis (eds.), “Special issue on neural network feedback control,” Automatica, Vol. 37, Issue 8, pp. 1147–1148, 2001.
[78] Naylor, A. W., G. R. Sell, Linear Operator Theory in Engineering and Science, Holt, Rinehart and Winston, Inc., 1971.
[79] Parisini, T., R. Zoppoli, “Neural approximations for infinite-horizon optimal control of nonlinear stochastic systems,” IEEE Transactions on Neural Networks, Vol. 9, No. 6, pp. 1388–1408, November 1998.
[80] Polycarpou, M., “Stable adaptive neural control scheme for nonlinear systems,” IEEE Transactions on Automatic Control, Vol. 41, pp. 447–451, March 1996.
[81] Rovithakis, G. A., M. A. Christodoulou, “Adaptive control of unknown plants using dynamical neural networks,” IEEE Transactions on Systems, Man, and Cybernetics, Vol. 24, No. 3, pp. 400–412, 1994.
[82] Saberi, A., Z. Lin, A. Teel, “Control of linear systems with saturating actuators,” IEEE Transactions on Automatic Control, Vol. 41, No. 3, pp. 368–378, March 1996.
[83] Sadegh, N., “A perceptron network for functional identification and control of nonlinear systems,” IEEE Transactions on Neural Networks, Vol. 4, No. 6, pp. 982–988, November 1993.
[84] Sanner, R. M., J.-J. E. Slotine, “Stable adaptive control and recursive identification using radial Gaussian networks,” Proceedings of the IEEE Conference on Decision and Control, Brighton, pp. 2116–2123, 1991.
[85] Saridis, G., C. S. Lee, “An approximation theory of optimal control for trainable manipulators,” IEEE Transactions on Systems, Man, and Cybernetics, Vol. 9, No. 3, pp. 152–159, March 1979.
[86] Seshagiri, S., H. K. Khalil, “Output feedback control of nonlinear systems using RBF neural networks,” IEEE Transactions on Neural Networks, Vol. 11, No. 1, pp. 69–79, January 2000.
[87] Stevens, B. L., F. L. Lewis, Aircraft Control and Simulation, 2nd Edition, Wiley Interscience, New York, 2003.
[88] Sussmann, H., E. D. Sontag, Y. Yang, “A general result on the stabilization of linear systems using bounded controls,” IEEE Transactions on Automatic Control, Vol. 39, No. 12, pp. 2411–2425, December 1994.
[89] Tsiotras, P., M. Corless, M. Rotea, “An L2 disturbance attenuation solution to the nonlinear benchmark problem,” International Journal of Robust and Nonlinear Control, Vol. 8, pp. 311–330, 1998.
[90] Van Der Schaft, A. J., “L2-gain analysis of nonlinear systems and nonlinear state feedback H∞ control,” IEEE Transactions on Automatic Control, Vol. 37, No. 6, pp. 770–784, 1992.
[91] Van Der Schaft, A. J., L2-gain and Passivity Techniques in Nonlinear Control, Springer-Verlag, London, U.K., 1999.
[92] Vinter, R., J. Clark, M. James, “The interpretation of discontinuous state feedback control laws as nonanticipative control strategies in differential games,” IEEE Transactions on Automatic Control, Vol. 49, pp. 1360–1365, August 2004.
[93] Willems, J. C., “Dissipative dynamical systems part II: linear systems with quadratic supply rates,” Archive for Rational Mechanics and Analysis, Vol. 45, No. 1, pp. 352–393, 1972.
[94] Willems, J. C., “Dissipative dynamical systems part I: general theory,” Archive for Rational Mechanics and Analysis, Vol. 45, No. 1, pp. 321–351, 1972.
[95] Zames, G., “Feedback and optimal sensitivity: model reference transformations, multiplicative seminorms, and approximate inverses,” IEEE Transactions on Automatic Control, Vol. 26, pp. 301–320, February 1981.
[96] Zhou, K., J. Doyle, Essentials of Robust Control, Prentice Hall, 1997.
Index
A
activation functions, 30, 44, 106
actuator saturation, 34, 60
admissible, 34
aircraft, 188, 192
algorithm, 55, 93, 99, 115, 187
ARE, 11, 14, 118, 136, 192
asymptotic stability, 3, 5, 34, 40, 178
attenuation, 9, 92, 104, 116, 135, 148, 182

B
bang–bang, 34, 41, 68–71
bounded controls, 35, 37, 43
C
completeness, 47, 48, 101
concavity, 37, 179
constrained-input, 37, 60, 85, 104
constraints on the states, 41, 65
contradiction, 40, 50
convergence: pointwise, 40, 82, 91
convergence: in the mean, 45, 48
convergence: uniform, 30, 34, 40, 46, 48, 82, 91, 101
convexity, 36
cost, 72, 110
cost-to-go, 6, 19, 24, 37

D
dense, 30, 46, 100
deterministic, 54, 98
discrete-time, 2, 7, 12, 17, 26, 147–175
dissipative, 9, 92, 104, 116, 135, 148, 182
disturbance, 2, 8, 16, 23, 78, 97, 108, 117, 177, 182
domain of validity, 79, 91, 92

E
existence, 9, 21, 26, 36, 48, 76, 78, 81, 90, 126, 185
F
F16, 188, 192
flowchart, 55, 93, 99, 187
full rank, 44, 97, 98
function approximation, 1, 28, 30, 76
functional, 6, 14, 34, 41, 100
G
game theory, 23, 77, 84, 94, 116, 185
game value function, 24, 87, 88, 91, 178, 185
Gram matrix, 46

H
Hamiltonian, 2, 10, 36, 79, 84, 116, 148, 179
hidden, 30, 44, 96
Hilbert space, 31, 45
HJB, 1, 15, 18, 19, 24, 33, 79
HJI, 24, 26, 87, 117, 151
horizon: finite, 87, 186
horizon: infinite, 14, 17, 24, 26, 36, 87, 186
hyperbolic tangent, 36
I
induction, 23, 40, 53, 82, 103
infinite series, 47, 49, 100
inner/outer, 91, 96, 98, 100, 106
invertible, 44, 97, 149, 150, 155

K
Kleinman, 33, 42

L
least squares, 44, 45, 97
Lebesgue integral, 31, 44, 45, 97
Lebesgue measure, 51
linearly independent, 44–52, 97–100
Lipschitz, 2, 3, 34, 40
LQR, 15, 57, 65
Lyapunov, 3–8, 19–23, 72, 81, 102, 184
Lyapunov equation, 5, 33–36, 76

M
MATLAB, 54, 98, 188, 190, 191
mean convergence, 45, 48
mean value theorem, 51
mesh, 54, 55, 65, 98, 99
method of weighted residuals, 43, 44, 97
minimum time, 34, 41, 68–71
monotone, 36–40, 85, 92
Monte Carlo integration, 54, 68, 98

N
nearly optimal, 34, 54, 111
necessary, 6, 37, 77, 177, 182, 187
neural network, 28, 43, 96
nondecreasing, 37–40
nonquadratic, 34, 41
null controllable, 56
numerical, 54, 75, 98, 140, 146, 175, 187

O
odd, 36, 37, 39
operator, 45, 94, 120
orthogonal, 38
oscillator, 62, 104

P
pointwise convergence, 40, 82, 91
policy iteration, 18, 33, 37, 45, 55, 75, 78, 90, 93, 99
polynomial, 10, 30, 46, 56, 72, 79, 100, 105, 124, 146
positive definite, 5, 7, 11, 20, 34, 46, 50, 79, 100, 117, 148, 192

Q
quasi-norm, 85

R
region of asymptotic stability, 40, 60, 106
residual error, 44, 96, 97
Riccati equation, 11, 14, 105, 118, 136, 160, 188, 192
RTAC, 96, 104, 135, 164

S
saddle, 24, 26, 86, 185
series, 30, 45, 100, 115, 151
singular values, 38, 39
smooth, 2, 30, 36, 78, 94, 116, 148
Sobolev space, 31, 47, 100
static output feedback, 177
stochastic, 54, 65, 68, 98
storage function, 9–13, 78–80, 82, 91
sufficiency, 46, 183
switching surface, 41, 68

T
trajectory, 9, 13, 37, 68, 72, 82, 116, 148

U
uniform convergence, 30, 34, 40, 46, 48, 82, 91, 101
uniqueness, 2, 36, 40, 48

V
value function, 14, 24, 34, 40, 52, 79, 87, 88, 91, 178, 185
viscosity solutions, 36, 77
Volterra, 56, 57, 100, 106–108

W
Weierstrass, 30, 46, 101

Z
zero-sum, 23, 78, 84, 94, 116, 185
Other titles published in this Series (continued):

Analysis and Control Techniques for Distribution Shaping in Stochastic Processes
Michael G. Forbes, J. Fraser Forbes, Martin Guay and Thomas J. Harris
Publication due August 2006

Process Control Performance Assessment
Andrzej Ordys, Damien Uduehi and Michael A. Johnson (Eds.)
Publication due August 2006

Adaptive Voltage Control in Power Systems
Giuseppe Fusco and Mario Russo
Publication due September 2006

Distributed Embedded Control Systems
Matjaž Colnarič, Domen Verber and Wolfgang A. Halang
Publication due October 2006

Modelling and Analysis of Hybrid Supervisory Systems
Emilia Villani, Paulo E. Miyagi and Robert Valette
Publication due November 2006

Model-based Process Supervision
Belkacem Ould Bouamama and Arun K. Samantaray
Publication due February 2007

Continuous-time Model Identification from Sampled Data
Hugues Garnier and Liuping Wang (Eds.)
Publication due May 2007

Magnetic Control of Tokamak Plasmas
Marco Ariola and Alfredo Pironti
Publication due May 2007

Process Control
Jie Bao and Peter L. Lee
Publication due June 2007

Optimal Control of Wind Energy Systems
Iulian Munteanu, Antoneta Iuliana Bratcu, Nicolas-Antonio Cutululis and Emil Ceanga
Publication due November 2007