ADVANCED PROCESS IDENTIFICATION AND CONTROL
CONTROL ENGINEERING
A Series of Reference Books and Textbooks

Editor
NEIL MUNRO, PH.D., D.Sc.
Professor, Applied Control Engineering
University of Manchester Institute of Science and Technology
Manchester, United Kingdom
1. Nonlinear Control of Electric Machinery, Darren M. Dawson, Jun Hu, and Timothy C. Burg
2. Computational Intelligence in Control Engineering, Robert E. King
3. Quantitative Feedback Theory: Fundamentals and Applications, Constantine H. Houpis and Steven J. Rasmussen
4. Self-Learning Control of Finite Markov Chains, A. S. Poznyak, K. Najim, and E. Gómez-Ramírez
5. Robust Control and Filtering for Time-Delay Systems, Magdi S. Mahmoud
6. Classical Feedback Control: With MATLAB, Boris J. Lurie and Paul J. Enright
7. Optimal Control of Singularly Perturbed Linear Systems and Applications: High-Accuracy Techniques, Zoran Gajić and Myo-Taeg Lim
8. Engineering System Dynamics: A Unified Graph-Centered Approach, Forbes T. Brown
9. Advanced Process Identification and Control, Enso Ikonen and Kaddour Najim
10. Modern Control Engineering, P. N. Paraskevopoulos
Additional Volumes in Preparation

Sliding Mode Control in Engineering, Wilfrid Perruquetti and Jean Pierre Barbot
Actuator Saturation Control, edited by Vikram Kapila and Karolos Grigoriadis
ADVANCED PROCESS IDENTIFICATION AND CONTROL

Enso Ikonen
University of Oulu
Oulu, Finland

Kaddour Najim
Institut National Polytechnique de Toulouse
Toulouse, France

MARCEL DEKKER, INC.
NEW YORK · BASEL
ISBN: 0-8247-0648-X

This book is printed on acid-free paper.

Headquarters
Marcel Dekker, Inc.
270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540

Eastern Hemisphere Distribution
Marcel Dekker AG
Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-261-8482; fax: 41-61-261-8896

World Wide Web
http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.

Copyright © 2002 by Marcel Dekker, Inc.
All Rights Reserved.
Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Current printing (last digit): 10 9 8 7 6 5 4 3 2 1

PRINTED IN THE UNITED STATES OF AMERICA
Series Introduction
Many textbooks have been written on control engineering, describing new techniques for controlling systems, or new and better ways of mathematically formulating existing methods to solve the ever-increasing complex problems faced by practicing engineers. However, few of these books fully address the applications aspects of control engineering. It is the intention of this new series to redress this situation. The series will stress applications issues, and not just the mathematics of control engineering. It will provide texts that present not only both new and well-established techniques, but also detailed examples of the application of these methods to the solution of real-world problems. The authors will be drawn from both the academic world and the relevant applications sectors.

There are already many exciting examples of the application of control techniques in the established fields of electrical, mechanical (including aerospace), and chemical engineering. We have only to look around in today's highly automated society to see the use of advanced robotics techniques in the manufacturing industries; the use of automated control and navigation systems in air and surface transport systems; the increasing use of intelligent control systems in the many artifacts available to the domestic consumer market; and the reliable supply of water, gas, and electrical power to the domestic consumer and to industry. However, there are currently many challenging problems that could benefit from wider exposure to the applicability of control methodologies, and the systematic systems-oriented basis inherent in the application of control techniques.

This series presents books that draw on expertise from both the academic world and the applications domains, and will be useful not only as academically recommended course texts but also as handbooks for practitioners in many applications domains.
Advanced Process Identification and Control is another outstanding entry to Dekker's Control Engineering series.

Neil Munro
Preface

The study of control systems has gained momentum in both theory and applications. Identification and control techniques have emerged as powerful tools to analyze, understand and improve the performance of industrial processes. The application of modeling, identification and control techniques is an extremely wide field. Process identification and control methods play an increasingly important role in the solution of many engineering problems.

There is extensive literature concerning the field of systems identification and control. Far too often, an engineer faced with the identification and control of a given process cannot find the relevant material in this vast literature, which looks like the cavern of Ali Baba. This book will introduce the basic concepts of advanced identification, prediction and control for engineers. We have selected recent ideas and results in areas of growing importance in systems identification, parameter estimation, prediction and process control.

This book is intended for advanced undergraduate students of process engineering (chemical, mechanical, electrical, etc.), or can serve as a textbook of an introductory course for postgraduate students. Practicing engineers will find this book especially useful. The level of mathematical competence expected of the reader is that covered by most basic control courses.

This book consists of nine chapters, two appendices, a bibliography and an index. A detailed table of contents provides a general idea of the scope of the book. The main techniques detailed in this book are given in the form of algorithms, in order to emphasize the main tools and facilitate their implementation. In most books it is important to read all chapters in consecutive order. This is not necessarily the only way to read this book.

Modeling is an essential part of advanced control methods. Models are extensively used in the design of advanced controllers, and the success of the methods relies on the accurate modeling of relevant features of the process to be controlled. Therefore the first part (Chapters 1-6) of the book is dedicated to process identification, the experimental approach to process modeling.
Linear models, considered in Chapters 1-3, are by far the most common in industrial practice. They are simple to identify and allow analytical solutions for many problems in identification and control. For many real-world problems, however, sufficient accuracy can be obtained only by using nonlinear system descriptions. In Chapter 4, a number of structures for the identification of nonlinear systems are considered: power series, neural networks, fuzzy systems, and so on. Dynamic nonlinear structures are considered in Chapter 5, with a special focus on Wiener and Hammerstein systems. These systems consist of a combination of linear dynamic and nonlinear static structures. Practical methods of parameter estimation in nonlinear and constrained systems are briefly introduced in Chapter 6, including both gradient-based and random search techniques.

Chapters 7-9 constitute the second part of the book. This part focuses on advanced control methods, predictive control methods in particular. The basic ideas behind the predictive control technique, as well as the generalized predictive controller (GPC), are presented in Chapter 7, together with an application example.

Chapter 8 is devoted to the control of multivariable systems. The control of MIMO systems can be handled by two approaches, i.e., the implementation of either global multi-input-multi-output controllers or distributed controllers (a set of SISO controllers for the considered MIMO system). To achieve the design of a distributed controller it is necessary to select the best input-output pairing. We present a well-known and efficient technique, the relative gain array method. As an example of decoupling methods, a multivariable PI-controller based on decoupling at both low and high frequencies is presented. The design of a multivariable GPC based on a state-space representation ends this chapter.
Finally, in order to solve complex problems faced by practicing engineers, Chapter 9 deals with the development of predictive controllers for nonlinear systems (adaptive control, Hammerstein and Wiener control, neural control, etc.). Predictive controllers can be used to design both fixed parameter and adaptive strategies, to solve unconstrained and constrained control problems.

Application of the control techniques presented in this book is illustrated by several examples: fluidized-bed combustor, valve, binary distillation column, two-tank system, pH neutralization, fermenter, tubular chemical reactor. The techniques presented are general and can be easily applied to many processes. Because the example concerning fluidized bed combustion (FBC) is repeatedly used in several sections of the book, an appendix is included on the modeling of the FBC process. An ample bibliography is given at the end of the book to allow readers to pursue their interests further.

Any book on advanced methods is predetermined to be incomplete. We have selected a set of methods and approaches based on our own preferences, reflected by our experience (and, undoubtedly, lack of experience) with many of the modern approaches. In particular, we concentrate on the discrete time approaches, largely omitting the issues related to sampling, such as multi-rate sampling, handling of missing data, etc. In parameter estimation, subspace methods have drawn much interest during the past years. We strongly suggest that the reader pursue a solid understanding of the bias-variance dilemma and its implications in the estimation of nonlinear functions. Concerning the identification of nonlinear dynamic systems, we only scratch the surface of Wiener and Hammerstein systems, not to mention the multiplicity of the other paradigms available. Process control can hardly be considered a mere numerical optimization problem, yet we have largely omitted all frequency domain considerations so invaluable for any designer of automatic feedback control. Many of our colleagues would certainly have preferred to include robust control in a cookbook of advanced methods. Many issues in adaptive and learning control would have deserved inspection, such as identification in closed-loop, input-output linearization, or iterative control. Despite all this, we believe we have put together a solid package of material on the relevant methods of advanced process control, valuable to students in process, mechanical, or electrical engineering, as well as to engineers solving control problems in the real world.

We would like to thank Professor M. M'Saad, Professor U. Kortela, and M.Sc. H. Aaltonen for providing valuable comments on the manuscript. Financial support from the Academy of Finland (Projects 45925 and 48545) is gratefully acknowledged.
Enso Ikonen
Kaddour Najim
Contents

Series Introduction
Preface

I Identification

1 Introduction to Identification
1.1 Where are models needed?
1.2 What kinds of models are there?
1.2.1 Identification vs. first-principle modeling
1.3 Steps of identification
1.4 Outline of the book

2 Linear Regression
2.1 Linear systems
2.2 Method of least squares
2.2.1 Derivation
2.2.2 Algorithm
2.2.3 Matrix representation
2.2.4 Properties
2.3 Recursive LS method
2.3.1 Derivation
2.3.2 Algorithm
2.3.3 A posteriori prediction error
2.4 RLS with exponential forgetting
2.4.1 Derivation
2.4.2 Algorithm
2.5 Kalman filter
2.5.1 Derivation
2.5.2 Algorithm
2.5.3 Kalman filter in parameter estimation

3 Linear Dynamic Systems
3.1 Transfer function
3.1.1 Finite impulse response
3.1.2 Transfer function
3.2 Deterministic disturbances
3.3 Stochastic disturbances
3.3.1 Offset in noise
3.3.2 Box-Jenkins
3.3.3 Autoregressive exogenous
3.3.4 Output error
3.3.5 Other structures
3.3.6 Diophantine equation
3.3.7 i-step-ahead predictions
3.3.8 Remarks

4 Nonlinear Systems
4.1 Basis function networks
4.1.1 Generalized basis function network
4.1.2 Basis functions
4.1.3 Function approximation
4.2 Nonlinear black-box structures
4.2.1 Power series
4.2.2 Sigmoid neural networks
4.2.3 Nearest neighbor methods
4.2.4 Fuzzy inference systems

5 Nonlinear Dynamic Structures
5.1 Nonlinear time-series models
5.1.1 Gradients of nonlinear time-series models
5.2 Linear dynamics and static nonlinearities
5.2.1 Wiener systems
5.2.2 Hammerstein systems
5.3 Linear dynamics and steady-state models
5.3.1 Transfer function with unit steady-state gain
5.3.2 Wiener and Hammerstein predictors
5.3.3 Gradients of the Wiener and Hammerstein predictors
5.4 Remarks
5.4.1 Inverse of Hammerstein and Wiener systems
5.4.2 ARX dynamics

6 Estimation of Parameters
6.1 Prediction error methods
6.1.1 First-order methods
6.1.2 Second-order methods
6.1.3 Step size
6.1.4 Levenberg-Marquardt algorithm
6.2 Optimization under constraints
6.2.1 Equality constraints
6.2.2 Inequality constraints
6.3 Guided random search methods
6.3.1 Stochastic learning automaton
6.4 Simulation examples
Pneumatic valve: identification of a Wiener system
Binary distillation column: identification of Hammerstein model under constraints
Two-tank system: Wiener modeling under constraints
Conclusions

II Control

7 Predictive Control
7.1 Introduction to model-based control
7.2 The basic idea
7.3 Linear quadratic predictive control
7.3.1 Plant and model
7.3.2 i-step ahead predictions
7.3.3 Cost function
7.3.4 Remarks
7.3.5 Closed-loop behavior
7.4 Generalized predictive control
7.4.1 ARMAX/ARIMAX model
7.4.2 i-step-ahead predictions
7.4.3 Cost function
7.4.4 Remarks
7.4.5 Closed-loop behavior
7.5 Simulation example

8 Multivariable Systems
8.1 Relative gain array method
8.1.1 The basic idea
8.1.2 Algorithm
8.2 Decoupling of interactions
8.2.1 Multivariable PI-controller
8.3 Multivariable predictive control
8.3.1 State-space model
8.3.2 i-step ahead predictions
8.3.3 Cost function
8.3.4 Remarks
8.3.5 Simulation example

9 Time-varying and Nonlinear Systems
9.1 Adaptive control
9.1.1 Types of adaptive control
9.1.2 Simulation example
9.2 Control of Hammerstein and Wiener systems
9.2.1 Simulation example
9.2.2 Second order Hammerstein systems
9.3 Control of nonlinear systems
9.3.1 Predictive control
9.3.2 Sigmoid neural networks
9.3.3 Stochastic approximation
9.3.4 Control of a fermenter
9.3.5 Control of a tubular reactor

III Appendices

A State-Space Representation
A.1 State-space description
A.1.1 Control and observer canonical forms
A.2 Controllability and observability
A.2.1 Pole placement
A.2.2 Observers

B Fluidized Bed Combustion
B.1 Model of a bubbling fluidized bed
B.1.1 Bed
B.1.2 Freeboard
B.1.3 Power
B.1.4 Steady-state
B.2 Tuning of the model
B.2.1 Initial values
B.2.2 Steady-state behavior
B.2.3 Dynamics
B.2.4 Performance of the model
B.3 Linearization of the model

Bibliography

Index
Part I Identification
Chapter 1 Introduction to Identification
Identification is the experimental approach to process modeling [5]. In the following chapters, an introductory overview of some important topics in process modeling is given. The emphasis is on methods based on the use of measurements from the process. In general, these types of methods do not require detailed knowledge of the underlying process; the chemical and physical phenomena need not be fully understood. Instead, good measurements of the plant behavior need to be available. In this chapter, the role of identification in process engineering is discussed, and the steps of identification are briefly outlined. Various methods, techniques and algorithms are considered in detail in the chapters to follow.
1.1 Where are models needed?
An engineer who is faced with the characterization or the prediction of the plant behavior has to model the considered process. A modeling effort always reflects the intended use of the model. The need for process models arises from various requirements:

• In process design, one wants to formalize the knowledge of the chemical and physical phenomena taking place in the process, in order to understand and develop the process. Because of safety and/or financial reasons, it might be difficult or even impossible to perform experiments on the real process. If a proper model is available, experimenting can be conducted using the model instead. Process models can also help to scale up the process, or integrate a given system in a larger production scheme.

• In process control, the short-term behavior and dynamics of the process may need to be predicted. The better one is able to predict the output of a system, the better one is able to control it. A poor control system may lead to a loss of production time and valuable raw materials.

• In plant optimization, an optimal process operating strategy is sought. This can be accomplished by using a model of the plant for simulating the process behavior under different conditions, or using the model as a part of a numerical optimization procedure. The models can also be used in an operator decision support system, or in training the plant personnel.

• In fault detection, anomalies in different parts of the process are monitored by comparing models of known behavior with the measured behavior.

• In process monitoring, we are interested in physical states (concentrations, temperatures, etc.) which must be monitored but that are not directly (or reliably) available through measurements. Therefore, we try to deduce their values by using a model. Intelligent sensors are used, e.g., for inferring process outputs that are subject to long measurement delays, by using other measurements which may be available more rapidly.
1.2 What kinds of models are there?
Several approaches and techniques are available for deriving the desired process model. Standard modeling approaches include two main streams:

• the first-principle (white-box) approach and

• the identification of a parameterized black-box model.

The first-principle approach (white-box models) denotes models based on the physical laws and relationships (mass and energy balances, etc.) that are supposed to govern the system's behavior. In these models, the structure reflects all physical insight about the process, and all the variables and parameters have direct physical interpretations (heat transfer coefficients, chemical reaction constants, etc.).

Example 1 (Conservation principle) A typical first-principle law is the general conservation principle:

\[ \text{Accumulation} = \text{Input} - \text{Output} + \text{Internal production} \tag{1.1} \]

The fundamental quantities that are being conserved in all cases are either mass, momentum, or energy, or combinations thereof.
Example 2 (Bioreactor) Many biotechnological processes consist of fermentation, oxidation and/or reduction of feedstuff (substrate) by microorganisms such as yeasts and bacteria. Let us consider a continuous-flow fermentation process. Mass balance considerations lead to the following model:

\[ \frac{dx}{dt} = (\mu - u)\,x \tag{1.2} \]

\[ \frac{ds}{dt} = -\frac{1}{R}\,\mu x + u\,(s_{in} - s) \tag{1.3} \]

where $x$ is the biomass concentration, $s$ is the substrate concentration, $u$ is the dilution rate, $s_{in}$ is the influent substrate concentration, $R$ is the yield coefficient and $\mu$ is the specific growth rate. The specific growth rate $\mu$ is known to be a complex function of several parameters (concentrations of biomass, $x$, and substrate, $s$, pH, etc.). Many analytical formulae for the specific growth rate have been proposed in the literature [1] [60]. The Monod equation is frequently used as the kinetic description for growth of microorganisms and the formation of metabolic products:

\[ \mu = \mu_{max}\,\frac{s}{K_M + s} \tag{1.4} \]

where $\mu_{max}$ is the maximum growth rate and $K_M$ is the Michaelis-Menten parameter.
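As an illustration of how a first-principle model such as (1.2)-(1.4) can be put to use, the following is a minimal simulation sketch. The numerical values of the parameters, the initial state, and the choice of the explicit Euler method are assumptions made for this illustration only; they are not taken from the text. The function names (`monod`, `bioreactor_rhs`, `simulate`) are likewise hypothetical.

```python
def monod(s, mu_max, k_m):
    """Specific growth rate according to the Monod equation (1.4)."""
    return mu_max * s / (k_m + s)


def bioreactor_rhs(x, s, u, s_in, R, mu_max, k_m):
    """Right-hand sides of the mass balances (1.2) and (1.3)."""
    mu = monod(s, mu_max, k_m)
    dx = (mu - u) * x
    ds = -mu * x / R + u * (s_in - s)
    return dx, ds


def simulate(x0, s0, u, s_in, R, mu_max, k_m, t_end, dt):
    """Integrate the model with the explicit Euler method (an assumed, simple choice)."""
    x, s = x0, s0
    for _ in range(int(round(t_end / dt))):
        dx, ds = bioreactor_rhs(x, s, u, s_in, R, mu_max, k_m)
        x += dt * dx
        s += dt * ds
    return x, s


# Illustrative (assumed) parameter values, constant dilution rate u.
params = dict(u=0.2, s_in=10.0, R=0.5, mu_max=0.5, k_m=1.0)
x_end, s_end = simulate(x0=0.1, s0=10.0, t_end=200.0, dt=0.01, **params)

# Non-washout steady state obtained by setting (1.2)-(1.3) to zero:
# mu(s*) = u  and  x* = R (s_in - s*).
s_star = params["k_m"] * params["u"] / (params["mu_max"] - params["u"])
x_star = params["R"] * (params["s_in"] - s_star)

print(f"simulated: x = {x_end:.4f}, s = {s_end:.4f}")
print(f"analytic:  x = {x_star:.4f}, s = {s_star:.4f}")
```

With a constant dilution rate below the maximum growth rate, the simulated state settles at the steady state computed directly from the balance equations, which is a simple consistency check on the reconstructed model.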
Often, such a direct modeling may not be possible. One may say that:

The physical models are as different from the world as a geographic map is from the surface of the earth (Brillouin).

The reason may be that the

• knowledge of the system's mechanisms is incomplete, or the

• properties exhibited by the system may change in an unpredictable manner.

Furthermore,

• modeling may be time-consuming and

• may lead to models that are unnecessarily complex.
In such cases, variables characterizing the behavior of the considered system can be measured and used to construct a model. This procedure is usually called identification [55]. Identification covers many types of methods. The models used in identification are referred to as black-box models (or experimental models), since the parameters are obtained through identification from experimental data.

Between the two extremes of white-box and black-box models lie the semi-physical grey-box models. They utilize physical insight about the underlying process, but not to the extent that a formal first-principle model is constructed.

Example 3 (Heating system) If we are dealing with the modeling of an electric heating system, it is preferable to use the electric power $V^2$ as a control variable, rather than the voltage, $V$. In fact, the heater power, rather than the voltage, causes the temperature to change. Even if the heating system is nonlinear, a linear relationship between the power and the temperature will lead to a good representation of the behavior of this system.

Example 4 (Tank outflow) Let us consider a laboratory-scale tank system [53]. The purpose is to model how the water level $y(t)$ changes with the inflow that is generated by the voltage $u(t)$ applied to the pump. Several experiments were carried out, and they showed that the best linear black-box model is the following:

\[ y(t) = a_1 y(t-1) + a_2 u(t-1) \tag{1.5} \]

Simulated outputs from this model were compared to real tank measurements. They showed that the fit was not bad, yet the model output was physically impossible since the tank level was negative at certain time intervals. As a matter of fact, all linear models tested showed this kind of behavior. Observe that the outflow can be approximated by Bernoulli's law, which states that the outflow is proportional to the square root of the level $y(t)$. Combining these facts, it is straightforward to arrive at the following nonlinear model structure:

\[ y(t) = a_1 y(t-1) + a_2 u(t-1) + a_3 \sqrt{y(t-1)} \tag{1.6} \]

This is a grey-box model. The simulation behavior of this model was found to be better than that of the previous one (the linear black-box model), as the constraint on the origin of the output (level) was no longer violated.
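Both model structures of Example 4 are linear in their parameters, so they can be fitted by ordinary least squares. The sketch below illustrates this on synthetic data; it does not reproduce the laboratory experiment of [53]. The data are generated from the structure (1.6) itself with assumed coefficients (1.0, 0.1, -0.2) and an assumed square-wave input, so the comparison only shows that the square-root regressor captures behavior that the linear structure (1.5) cannot. All function names are hypothetical, and the normal equations are solved with a small hand-written elimination routine to keep the example self-contained.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, targets):
    """Least-squares parameters from the normal equations (A^T A) theta = A^T b."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve(ata, atb)


def sse(rows, targets, theta):
    """Sum of squared one-step-ahead prediction errors."""
    return sum((t - sum(p * v for p, v in zip(theta, r))) ** 2
               for r, t in zip(rows, targets))


# Synthetic data: assumed "true" coefficients of structure (1.6), square-wave input.
TRUE = (1.0, 0.1, -0.2)
N = 300
u = [1.5 if (t // 25) % 2 == 0 else 0.5 for t in range(N)]
y = [0.5]
for t in range(1, N):
    y.append(TRUE[0] * y[t - 1] + TRUE[1] * u[t - 1] + TRUE[2] * math.sqrt(y[t - 1]))

targets = y[1:]
lin_rows = [[y[t - 1], u[t - 1]] for t in range(1, N)]                          # model (1.5)
grey_rows = [[y[t - 1], u[t - 1], math.sqrt(y[t - 1])] for t in range(1, N)]  # model (1.6)

theta_lin = least_squares(lin_rows, targets)
theta_grey = least_squares(grey_rows, targets)
sse_lin = sse(lin_rows, targets, theta_lin)
sse_grey = sse(grey_rows, targets, theta_grey)

print("linear black-box:", theta_lin, "SSE =", sse_lin)
print("grey-box:        ", theta_grey, "SSE =", sse_grey)
```

The point of the design is that physical insight enters only through the choice of regressors; the estimation step itself is unchanged.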
Modeling always involves approximations since all real systems are, to some extent, nonlinear, time-varying, and distributed. Thus it is highly improbable that any set of models will contain the 'true' system structure. All that can be hoped for is a model which provides an acceptable level of approximation, as measured by the use to which the model will be dedicated.

Another problem is that we are striving to build models not just for the fun of it, but to use the model for analysis, whose outcome will affect our decisions in the future. Therefore we are always faced with the problem of having a model that is 'accurate enough,' i.e., reflecting enough of the important aspects of the problem. The question of what is 'accurate enough' can only, eventually, be settled by real-world experiments.

In this book, emphasis will be on the discrete time approaches. Most processes encountered in process engineering are continuous time in nature. However, the development of discrete-time models arises frequently in practical situations where system measurements (observations) are made, and control policies are implemented, at discrete time instants on computer systems. Discrete time systems (discrete event systems) also exist, for example in manufacturing systems and assembly lines. In general, for a digital controller it is convenient to use discrete time models. Several techniques are also available to transform continuous time models to a time discrete form.
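As one example of such a transformation, the sketch below discretizes a first-order continuous-time model $dx/dt = -a x + b u$ under the assumption that the input is held constant between sampling instants (zero-order hold). The model, the parameter values and the function names are assumptions chosen for illustration; the text does not prescribe a particular discretization technique. For this model the discrete-time form is $x(k+1) = \phi\, x(k) + \gamma\, u(k)$ with $\phi = e^{-aT}$ and $\gamma = (b/a)(1 - e^{-aT})$, where $T$ is the sampling period.

```python
import math


def discretize_first_order(a, b, T):
    """Exact zero-order-hold discretization of dx/dt = -a*x + b*u (requires a > 0)."""
    phi = math.exp(-a * T)
    gamma = (b / a) * (1.0 - phi)
    return phi, gamma


def step_response_discrete(phi, gamma, n):
    """Response of x(k+1) = phi*x(k) + gamma*u(k) to a unit step, starting from x(0) = 0."""
    x = 0.0
    out = [x]
    for _ in range(n):
        x = phi * x + gamma * 1.0
        out.append(x)
    return out


def step_response_continuous(a, b, t):
    """Analytical unit-step response of the continuous-time model."""
    return (b / a) * (1.0 - math.exp(-a * t))


# Illustrative (assumed) values.
a, b, T = 0.5, 2.0, 0.4
phi, gamma = discretize_first_order(a, b, T)
xs = step_response_discrete(phi, gamma, 10)

# At the sampling instants the discrete model matches the continuous one.
max_err = max(abs(x - step_response_continuous(a, b, k * T)) for k, x in enumerate(xs))
print(f"phi = {phi:.4f}, gamma = {gamma:.4f}, max error at samples = {max_err:.2e}")
```

Because the input is piecewise constant here, the discrete model reproduces the continuous response at the sampling instants up to rounding; for inputs that vary between samples the agreement would only be approximate.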
1.2.1 Identification vs. first-principle modeling
Provided that adequate theoretical knowledge is available, it may seem obvious that the first-principle modeling approach should be preferred. The model is justified by the underlying laws and principles, and can be easily transferred and used in any other context bearing similar assumptions. However, these assumptions may become very limiting. This can be due to the complexity of the process itself, which forces the designer to use strong simplifications and/or to fix the model components too tightly. Also, advances in process design together with different local conditions often mean that no two plants are identical.

Example 5 (Power plant constructions) Power plant constructions are usually strongly tailored to match the local conditions of each individual site. The construction depends on factors such as the local fuels available, the ratio and amount of thermal and electrical power required, new technological innovations towards better thermal efficiency and emission control, etc. To make the existing models suit a new construction, a considerable amount of redesign and tuning is required.
Solving the model equations might also pose problems with highly detailed first-principle models. Either the cleverness of a mathematician is required from the engineer developing the model, or time-consuming iterative computations need to be performed.

In addition to the technical point of view, first-principle models can be criticized due to their costs. The more complex and a priori unknown the various chemical/physical phenomena are to the model developer, or to the scientific community as a whole, the more time and effort the building of these models requires. Although the new information adds to the general knowledge of the considered process, this might not be the target of the model development project. Instead, as in projects concerning plant control and optimization, the final target is in improving the plant behavior and productivity. Just as plants are built and run in order to fabricate a product with a competitive price, the associated development projects are normally assessed against this criterion.

The description of the process phenomena given by the model might also be incomprehensible for users other than the developer, and the obtained knowledge of the underlying phenomena may be wasted. It might turn out to be difficult to train the process operators to use a highly detailed theoretical model, not to mention teaching them to understand the model equations. Furthermore, the intermediate results, describing the sub-phenomena of the process, are more difficult to put to use in a process automation system. Even an advanced modern controller, such as a predictive controller, typically requires only estimates of the future behavior of the controlled variable.

Having accepted these points of view, a semi- or fully parameterized approach seems much more meaningful. This is mainly due to the saved design time, although collecting valid input-output observations from a process might be time consuming. Note, however, that it is very difficult to outperform the first-principle approach in the case where few measurements are available, or when a good understanding of the plant behavior has already been gained. In process design, for example, there are no full-scale measurement data at all (as the plant has not been built yet) and the basic phenomena are (usually) understood. In many cases, however, parameterized experimental models can be justified by the reduced time and effort required in building the models, and their flexibility in real-world modeling problems.
1.3 Steps of identification

Identification is the experimental approach to process modeling [5]. Identification is an iterative process of the following components:

• experimental planning (data acquisition),

• selection of the model structure,

• parameter estimation, and

• model validation.

The basis for the identification procedure is experimental planning, where process experiments are designed and conducted so that suitable data for the following three steps is obtained. The purpose is to maximize the information content in the data, within the limits imposed by the process.

In modeling of dynamic systems, the sampling period¹ must be small enough so that significant process information is not lost. A peculiar effect called aliasing may also occur if the sampled signal contains frequencies that are higher than half of the sampling frequency: In general, if a process measurement is sampled with a sampling frequency $\omega_s$, high frequency components of the process variable with a frequency greater than $\omega_s/2$ appear as low-frequency components in the sampled signal, and may cause problems if they appear in the same frequency range as the normal process variations. The sampling frequency should be, if at all possible, ten times the maximum system bandwidth. For low signal-to-noise ratios, a filter should be considered. In some cases, a time-varying sampling period may be useful (related, e.g., to the throughflow of a process). The signal must also be persistently exciting, such as a pseudo random (binary) sequence, PRBS, which exhibits spectral properties similar to those of white noise.

Selection of the model structure is referred to as structure estimation, where the model input-output signals and the internal components of the model are determined. In general, the model structure is derived using prior knowledge.

¹ When a digital computer is used for data acquisition, real-valued continuous signals are converted into digital form. The time interval between successive samples is referred to as sampling period (sampling rate). In recursive identification the length of the time interval between two successive measurements can be different from the sampling rate associated with data acquisition (for more details, see e.g. [5]).
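Two of the points made about experimental planning, aliasing and PRBS excitation, can be illustrated with a short sketch. The sampling frequency, the signal frequency and the shift-register configuration below are assumptions chosen for illustration (frequencies are in Hz rather than the angular frequency $\omega_s$ of the text). The PRBS generator uses a 7-bit linear feedback shift register with the polynomial $x^7 + x^6 + 1$, one commonly used maximal-length configuration; the text itself does not specify a generator.

```python
import math


def alias_frequency(f, fs):
    """Apparent frequency (Hz) of a sinusoid of frequency f when sampled at fs."""
    return abs(f - fs * round(f / fs))


# A 9 Hz cosine sampled at 10 Hz (above half the sampling frequency, 5 Hz) ...
fs, f_high = 10.0, 9.0
f_alias = alias_frequency(f_high, fs)
samples_high = [math.cos(2 * math.pi * f_high * k / fs) for k in range(50)]
# ... yields the same samples as a cosine at the alias frequency.
samples_low = [math.cos(2 * math.pi * f_alias * k / fs) for k in range(50)]
max_diff = max(abs(p - q) for p, q in zip(samples_high, samples_low))


def prbs7(n_bits, state=0x02):
    """Generate n_bits from a 7-bit LFSR with polynomial x^7 + x^6 + 1 (period 127)."""
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


sequence = prbs7(254)                                 # two full periods
levels = [1.0 if bit else -1.0 for bit in sequence]   # two-level test signal

print(f"alias of {f_high} Hz at fs = {fs} Hz: {f_alias} Hz, max sample difference {max_diff:.1e}")
print("first 20 PRBS bits:", sequence[:20])
```

In an experiment, each PRBS level would typically be held for one or more sampling periods and scaled to the admissible input range of the process; those choices depend on the process and are not covered by this sketch.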
10
CHAPTER 1.
INTRODUCTION TO IDENTIFICATION
Most of the suggested criteria can be seen as a minimization of a loss function (prediction error, Akaike Information Criterion, etc.). In dynamic systems, the choice of the order of the model is a nontrivial problem. It is a compromise between reducing the unmodelled dynamics and increasing the complexity of the model, which can lead to model stabilizability difficulties. In many practical cases, a second-order (or even a first-order) model is adequate. Various model structures will be discussed in detail in the following chapters.

In general, conditioning of data is necessary: scaling and normalization of data (to bring the variables to approximately the same scale), and filtering (to remove noise from the measurements). Scaling is commonly used in several areas of applied physics (heat transfer, fluid mechanics, etc.). It leads to dimensionless parameters (the Reynolds number of fluid mechanics, etc.) which are used as an aid to understanding similitude and scaling. In [9] a theory of scaling for linear systems using methods from Lie theory is described. The scaling of the input and output units has very significant effects for multivariable systems [16]. It affects interaction, design aims, weighting functions, model order reduction, etc.

The unmodelled dynamics result from the use of input-output models to represent complex systems: parts of the process dynamics are neglected and these introduce extra modeling errors which are not necessarily bounded. It is therefore advisable to normalize the input-output data before they are processed by the identification procedure. Normalization based on the norm of the regressor is commonly used [62].

Data filtering makes it possible to focus the parameter estimator on an appropriate bandwidth. There are two aspects, namely high-pass filtering to eliminate offsets, load disturbances, etc., and low-pass filtering to eliminate irrelevant high-frequency components including noise and system response. The rule of thumb governing the design of the filter is that the upper frequency should be about twice the desired system bandwidth and the lower frequency should be about one-tenth of the desired bandwidth.

In parameter estimation, the values of the unknown parameters of a parameterized model structure are estimated. The choice of the parameter estimation method depends on the structure of the model, as well as the
properties of the data. Parameter estimation techniques will be discussed in detail in the following chapters.

In validation, the goodness of the identified model is assessed. The validation methods depend on the properties that are desired from the model. Usually, accuracy and good generalization (interpolation/extrapolation) abilities are desired; transparency and computational efficiency may also be of interest. Simulations provide a useful tool for model validation. Accuracy and generalization can be tested by cross-validation techniques, where the model is tested on a test data set previously unseen by the model. Statistical tests on the prediction error may also prove useful. With dynamic systems, stability, zeros and poles, and the effect of the variation of the poles are of interest.

Most model validation tests are based simply on the difference between the simulated and measured output. Model validation is really about model falsification. The validation problem deals with demonstrating confidence in the model. Often prior knowledge concerning the process to be modeled and statistical tests involving confidence limits are used to validate a model.
1.4 Outline of the book
In the remaining chapters, various model structures, parameter estimation techniques, and predictive control of different kinds of systems (linear, nonlinear, SISO and MIMO) are discussed.

In the second chapter, linear regression models and methods for estimating model parameters are presented. The method of least squares (LS) is a very commonly used batch method. It can be written in a recursive form, so that the components of the recursive least squares (RLS) algorithm can be updated with new information as soon as it becomes available. The Kalman filter, commonly used both for state estimation and for parameter estimation, is also presented in Chapter 2.

Chapter 3 considers linear dynamic systems. The polynomial time-series representation and stochastic disturbance models are introduced. An i-step-ahead predictor for a general linear dynamic system is derived.

Structures for capturing the behavior of nonlinear systems are discussed in Chapter 4. A general framework of generalized basis function networks is introduced. As special cases of the basis function network, commonly used nonlinear structures such as power series, sigmoid neural networks and Sugeno fuzzy models are obtained.

Chapter 5 extends to nonlinear dynamical systems. The general nonlinear time-series approaches are briefly reviewed.
A detailed presentation of Wiener and Hammerstein systems, consisting of linear dynamics coupled with nonlinear static systems, is given.

To conclude the chapters on identification, parameter estimation techniques are presented in Chapter 6. Discussion is limited to prediction error methods, as they are sufficient for most practical problems encountered in process engineering. An extension to optimization under constraints is made, to emphasize the practical aspects of identification of industrial processes. A brief introduction to learning automata, and guided random search methods in general, is also given.

The basic ideas behind predictive control are presented in Chapter 7. First, a simple predictive controller is considered. This is followed by an extension including a noise model: the generalized predictive controller (GPC). State space representation is used, and various practical features are illustrated. Appendix A gives some background on state space systems.

Chapter 8 is devoted to the control of multiple-input multiple-output (MIMO) systems. There are two main approaches to the control of MIMO systems: the implementation of a global MIMO controller, or the implementation of a distributed controller (a set of SISO controllers for the considered MIMO system). To design a distributed controller it is necessary to be able to select the best input-output pairing. In this chapter we present a well-known and efficient technique, the relative gain array (RGA) method. As an example of decoupling methods, a multivariable PI controller based on decoupling at both low and high frequencies is presented. Finally, the design of a multivariable GPC based on a state space representation is considered.

In order to solve the increasingly complex problems faced by practicing engineers, Chapter 9 deals with the development of predictive controllers for nonlinear systems. Various approaches (adaptive control, control based on Hammerstein and Wiener models, or neural networks) are considered to deal with the time-varying and nonlinear behavior of systems. Detailed descriptions of the predictive control algorithms to be used are provided. Using the inverse model of the nonlinear part of both Hammerstein and Wiener models, we show that any linear control strategy can be easily implemented in order to achieve the desired performance for nonlinear systems.

The applications of the different control techniques presented in this book are illustrated by several examples, including a fluidized-bed combustor, a valve, a binary distillation column, a two-tank system, pH neutralization, a fermenter, a tubular chemical reactor, etc. The example concerning fluidized bed combustion is used repeatedly in several sections of the book. The book ends with Appendix B, concerning the description and modeling of a fluidized bed combustion process.
Chapter 2 Linear Regression

A major decision in identification is how to parameterize the characteristics and properties of a system using a model of a suitable structure. Linear models usually provide a good starting point in the structure selection of the identification procedure. In general, linear structures are simpler than nonlinear ones, and analytical solutions may be found. In this chapter, linear structures and parameter estimation in such structures are considered.
2.1 Linear systems
The dominating distinction between linear and nonlinear systems is the principle of superposition [19].

Definition 1 (Principle of superposition) The following holds only if $a$ is linearly dependent on $b$: if $a_1$ is the output due to $b_1$ and $a_2$ is the output due to $b_2$, then

$$\alpha a_1 + \beta a_2 \ \text{is the output due to} \ \alpha b_1 + \beta b_2. \tag{2.1}$$

Above, $\alpha$ and $\beta$ are constant parameters, and $a_i$ and $b_i$ ($i = 1, 2$) are some values assumed by the variables $a$ and $b$.

The characterization of linear time-invariant dynamic systems, in general, is virtually complete because the principle of superposition applies to all such systems. As a consequence, a large body of knowledge concerning the analysis and design of linear time-invariant systems exists. By contrast, the state of nonlinear systems analysis is not nearly complete.
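The superposition test of Definition 1 can be sketched numerically. The two example systems below (a pure gain and a square law) and the function name `satisfies_superposition` are hypothetical illustrations; note that a single test point can only refute linearity, never prove it.

```python
def satisfies_superposition(system, b1, b2, alpha, beta, tol=1e-9):
    """Check (2.1) at one point: output due to alpha*b1 + beta*b2
    versus alpha*a1 + beta*a2, with a_i the output due to b_i."""
    combined_input_output = system(alpha * b1 + beta * b2)
    combined_outputs = alpha * system(b1) + beta * system(b2)
    return abs(combined_input_output - combined_outputs) <= tol

def linear_gain(b):
    return 3.0 * b        # output linearly dependent on the input

def square_law(b):
    return b ** 2         # nonlinear static system

linear_ok = satisfies_superposition(linear_gain, 1.5, -2.0, 0.7, 4.0)
square_ok = satisfies_superposition(square_law, 1.5, -2.0, 0.7, 4.0)
```

Here the gain passes the check while the square law fails it, which is the distinction the definition draws.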
With parameterized structures $f(\varphi, \theta)$, two types of linearity are of importance: linearity of the model output with respect to the model inputs $\varphi$, and linearity of the model output with respect to the model parameters $\theta$. The former considers the mapping capabilities of the model, while the latter affects the estimation of the model parameters. If at least one parameter appears nonlinearly, models are referred to as nonlinear regression models [78]. In this chapter, linear regression models are considered.

Consider the following model of the relation between the inputs and output of a system [55]:

$$y(k) = \theta^T \varphi(k) + \xi(k) \tag{2.2}$$

where

$$\theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_I \end{bmatrix} \tag{2.3}$$

and

$$\varphi(k) = \begin{bmatrix} \varphi_1(k) \\ \vdots \\ \varphi_I(k) \end{bmatrix} \tag{2.4}$$

The model describes the observed variable $y(k)$ as an unknown linear combination of the observed vector $\varphi(k)$ plus noise $\xi(k)$. Such a model is called a linear regression model, and is a very common type of model in control and systems engineering. $\varphi(k)$ is commonly referred to as the regression vector; $\theta$ is a vector of constants containing the parameters of the system; $k$ is the sample index. Often, one of the inputs is chosen to be a constant, $\varphi_I \equiv 1$, which enables the modeling of bias.

If the statistical characteristics of the disturbance term are not known, we can think of

$$\hat{y}(k) = \theta^T \varphi(k) \tag{2.5}$$
as a natural prediction of what $y(k)$ will be. The expression (2.5) becomes a prediction in an exact statistical (mean-squares) sense if $\{\xi(k)\}$ is a sequence of independent random variables, independent of the observations $\varphi$, with zero mean and finite variance¹.

In many practical cases, the parameters $\theta$ are not known, and need to be estimated. Let $\hat{\theta}$ be the estimate of $\theta$:

$$\hat{y}(k) = \hat{\theta}^T \varphi(k) \tag{2.6}$$
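Note that the output $\hat{y}(k)$ is linearly dependent on both $\hat{\theta}$ and $\varphi(k)$.

Example 6 (Static system) The structure (2.2) can be used to describe many kinds of systems. Consider a noiseless static system with input variables $u_1$, $u_2$ and $u_3$ and output $y$:

$$y(k) = a_1 u_1(k) + a_2 u_2(k) + a_3 u_3(k) + a_4 \tag{2.7}$$

where $a_i$ ($i = 1, 2, 3, 4$) are constants. It can be presented in the form of (2.2) by choosing

$$\theta = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} \tag{2.8}$$

$$\varphi(k) = \begin{bmatrix} u_1(k) \\ u_2(k) \\ u_3(k) \\ 1 \end{bmatrix} \tag{2.9}$$

and we have

$$y(k) = \theta^T \varphi(k) \tag{2.10}$$

¹ We are looking for a predictor $\hat{y}(k)$ which minimizes the mean-square error criterion $E\{(y(k) - \hat{y})^2\}$. Replacing $y(k)$ by its expression $\theta^T\varphi(k) + \xi(k)$, it follows:

$$E\{(y(k) - \hat{y})^2\} = E\{(\theta^T\varphi(k) + \xi(k) - \hat{y})^2\} = E\{(\theta^T\varphi(k) - \hat{y})^2\} + E\{\xi(k)^2\} + 2E\{\xi(k)(\theta^T\varphi(k) - \hat{y})\}$$

If the sequence $\{\xi(k)\}$ is independent of the observations $\varphi(k)$,

$$E\{\xi(k)(\theta^T\varphi(k) - \hat{y})\} = E\{\xi(k)\}\,E\{\theta^T\varphi(k) - \hat{y}\}$$

In view of the fact that $\{\xi(k)\}$ is a sequence of independent random variables with zero mean value, it follows that $E\{\xi(k)(\theta^T\varphi(k) - \hat{y})\} = 0$. As a consequence,

$$E\{(y(k) - \hat{y})^2\} = E\{(\theta^T\varphi(k) - \hat{y})^2\} + E\{\xi(k)^2\} \geq E\{\xi(k)^2\}$$

and the minimum is obtained for (2.5). The minimum value of the criterion is equal to $E\{(\xi(k))^2\}$, the variance of the noise.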
Example 7 (Dynamic system) Consider a dynamic system with input signals $\{u(k)\}$ and output signals $\{y(k)\}$, sampled at discrete time instants² $k = 1, 2, 3, \ldots$. If the values are related through a linear difference equation

$$y(k) + a_1 y(k-1) + \cdots + a_{n_A} y(k-n_A) = b_0 u(k-d) + \cdots + b_{n_B} u(k-d-n_B) + \xi(k) \tag{2.11}$$

where $a_i$ ($i = 1, \ldots, n_A$) and $b_i$ ($i = 0, \ldots, n_B$) are constants and $d$ is the time delay, we can introduce a parameter vector $\theta$

$$\theta = \begin{bmatrix} -a_1 \\ \vdots \\ -a_{n_A} \\ b_0 \\ \vdots \\ b_{n_B} \end{bmatrix} \tag{2.12}$$

and a vector of lagged input-output data $\varphi(k)$

$$\varphi(k) = \begin{bmatrix} y(k-1) \\ \vdots \\ y(k-n_A) \\ u(k-d) \\ \vdots \\ u(k-d-n_B) \end{bmatrix} \tag{2.13}$$

and represent the system in the form of (2.2)

$$y(k) = \theta^T \varphi(k) + \xi(k) \tag{2.14}$$

² Observed at sampling instant $k$ ($k \in \{1, 2, \ldots\}$) at time $t = kT$, where $T$ is referred to as the sampling interval, or sampling period. Two related terms are used: the sampling frequency $f = 1/T$, and the angular sampling frequency $\omega = 2\pi/T$.
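As an illustration of Example 7, the following sketch stacks the lagged data of (2.13) into regression rows. It assumes the sign convention in which the regressor holds the lagged outputs $y(k-i)$ and the parameter vector holds $-a_i$; the function name `build_regression`, the coefficient values and the zero-based array indexing are assumptions made for the example.

```python
import numpy as np

def build_regression(y, u, n_a, n_b, d):
    """Stack phi(k) of (2.13) row by row, together with the matching outputs y(k)."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    k_first = max(n_a, d + n_b)          # first (0-based) index with all lags available
    rows, targets = [], []
    for k in range(k_first, len(y)):
        past_y = [y[k - i] for i in range(1, n_a + 1)]
        past_u = [u[k - d - i] for i in range(0, n_b + 1)]
        rows.append(past_y + past_u)
        targets.append(y[k])
    return np.array(rows), np.array(targets)

# Noise-free first-order system y(k) + a1*y(k-1) = b0*u(k-2): n_a = 1, n_b = 0, d = 2.
a1, b0 = -0.8, 0.5
rng = np.random.default_rng(0)
u = rng.standard_normal(30)
y = np.zeros(30)
for k in range(2, 30):
    y[k] = -a1 * y[k - 1] + b0 * u[k - 2]

Phi, targets = build_regression(y, u, n_a=1, n_b=0, d=2)
theta = np.array([-a1, b0])              # assumed sign convention: theta holds -a_i
residual = targets - Phi @ theta         # zero up to round-off when xi(k) = 0
```

With noise-free data the residual vanishes, which confirms that the difference equation and the regression form describe the same relation.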
The backward shift $d$ is a convenient way to deal with process time delays. Often, there is a noticeable delay between the instant when a change in the process input is implemented and the instant when the effect can be observed from the process output. When a process involves mass or energy transport, a transportation lag (time delay) is associated with the movement. This time delay is equal to the ratio $L/V$, where $L$ represents the length of the process (a furnace, for example) and $V$ is the velocity (e.g., of the raw material).

In system identification, both the structure and the true parameters $\theta$ of a system may be a priori unknown. Linear structures are a very useful starting point in black-box identification, and in most cases provide predictions that are accurate enough. Since the structure is simple, it is also simple to validate the performance of the model. The selection of a model structure is largely based on experience and the information that is available about the process. Similarly, parameter estimates $\hat{\theta}$ may be based on the available a priori information concerning the process (physical laws, phenomenological models, etc.). If these are not available, efficient techniques exist for estimating some or all of the unknown parameters using sampled data from the process. In what follows, we shall be concerned with some methods related to the estimation of the parameters in linear systems. These methods assume that a set of input-output data pairs is available, either off-line or on-line, giving examples of the system behavior.
2.2 Method of least squares
The method of least squares³ is essential in systems and control engineering. It provides a simple tool for estimating the parameters of a linear system. In this section, we deal with linear regression models. Consider the model (2.2):

$$y(k) = \theta^T \varphi(k) + \xi(k) \tag{2.15}$$

where $\theta$ is a column vector of parameters to be estimated from observations $y(k)$, $\varphi(k)$, $k = 1, 2, \ldots, K$, and where the regressor $\varphi(k)$ is independent of $\theta$ (linear regression)⁴. $K$ is the number of observations. This type of model is commonly used by engineers to develop correlations between physical quantities. Notice that $\varphi(k)$ may correspond to a priori known functions (log, exp, etc.) of a measured quantity.

The goal of parameter estimation is to obtain an estimate of the parameters of the model, so that the model fit becomes 'good' in the sense of some criterion. A commonly accepted method for a 'good' fit is to calculate the values of the parameter vector that minimize the sum of the squared residuals. Let us consider the following estimation criterion

$$J(\theta) = \frac{1}{K} \sum_{k=1}^{K} \alpha_k \left[ y(k) - \theta^T \varphi(k) \right]^2 \tag{2.16}$$

³ The least squares method was developed by Carl Friedrich Gauss. He was interested in the estimation of six parameters characterising the motions of planets and comets, using telescopic measurements.
This quadratic cost function (to be minimized with respect to $\theta$) expresses the average of the weighted squared errors between the $K$ observed outputs, $y(k)$, and the predictions provided by the model, $\theta^T\varphi(k)$. The scalar coefficients $\alpha_k$ allow the weighting of different observations.

The important benefit of having a quadratic cost function is that it can be minimized analytically. Remember that a quadratic function has the shape of a parabola, and thus possesses a single optimum point. The optimum (minimum or maximum) can be solved analytically by setting the derivative to zero, and examination of the second derivative shows whether a minimum or a maximum is in question.
2.2.1 Derivation
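Let us minimize the cost function $J$ with respect to the parameters $\theta$:

$$\hat{\theta} = \arg\min_{\theta} J \tag{2.17}$$

where $J$ is given by (2.16):

$$J = \frac{1}{K} \sum_{k=1}^{K} \alpha_k \left[ y(k) - \theta^T \varphi(k) \right]^2 \tag{2.18}$$

⁴ Note that this poses restrictions on the choice of $\varphi(k)$.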
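Assuming that $\varphi(k)$ is not a function of $\theta$, the partial derivative with respect to the $i$'th parameter can be calculated, which gives

$$\frac{\partial J}{\partial \theta_i} = \frac{\partial}{\partial \theta_i} \frac{1}{K} \sum_{k=1}^{K} \alpha_k \left[ y(k) - \theta^T \varphi(k) \right]^2 \tag{2.19}$$

$$= \frac{1}{K} \sum_{k=1}^{K} \alpha_k \left\{ 2 \left[ y(k) - \theta^T \varphi(k) \right] \left[ -\varphi_i(k) \right] \right\}$$

$$= -\frac{2}{K} \sum_{k=1}^{K} \alpha_k \left[ y(k) - \theta^T \varphi(k) \right] \varphi_i(k)$$

For the second derivative we have

$$\frac{\partial^2 J}{\partial \theta_i \partial \theta_j} = \frac{2}{K} \sum_{k=1}^{K} \alpha_k \varphi_i(k) \varphi_j(k) \tag{2.24}$$

The first derivatives can be written as a row vector:

$$\left[ \frac{\partial J}{\partial \theta_1}, \ldots, \frac{\partial J}{\partial \theta_I} \right] = \frac{2}{K}\, \theta^T \sum_{k=1}^{K} \alpha_k \varphi(k) \varphi^T(k) - \frac{2}{K} \sum_{k=1}^{K} \alpha_k y(k) \varphi^T(k)$$

Taking the transpose gives

$$\frac{\partial J}{\partial \theta} = \frac{2}{K} \sum_{k=1}^{K} \alpha_k \varphi(k) \varphi^T(k)\, \theta - \frac{2}{K} \sum_{k=1}^{K} \alpha_k y(k) \varphi(k) \tag{2.29}$$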
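The optimum of a quadratic function is found by setting all partial derivatives to zero:

$$\frac{\partial J}{\partial \theta} = 0 \tag{2.30}$$

$$\sum_{k=1}^{K} \alpha_k \varphi(k) \varphi^T(k)\, \hat{\theta} - \sum_{k=1}^{K} \alpha_k y(k) \varphi(k) = 0 \tag{2.31}$$

$$\left[ \sum_{k=1}^{K} \alpha_k \varphi(k) \varphi^T(k) \right] \hat{\theta} = \sum_{k=1}^{K} \alpha_k \varphi(k) y(k) \tag{2.32}$$

The second derivatives can be collected in a matrix:

$$\frac{\partial^2 J}{\partial \theta^2} = \left[ \frac{\partial^2 J}{\partial \theta_i \partial \theta_j} \right]_{i,j} \tag{2.33}$$

$$= \frac{2}{K} \left[ \sum_{k=1}^{K} \alpha_k \varphi_i(k) \varphi_j(k) \right]_{i,j} \tag{2.34}$$

$$= \frac{2}{K} \sum_{k=1}^{K} \alpha_k \varphi(k) \varphi^T(k) \tag{2.35}$$

For the optimum to be a minimum, we require that this matrix is positive definite⁵. Finally, the parameter vector $\hat{\theta}$ minimizing the cost function $J$ is given by (if the inverse of the matrix exists):

$$\hat{\theta} = \left[ \sum_{k=1}^{K} \alpha_k \varphi(k) \varphi^T(k) \right]^{-1} \sum_{k=1}^{K} \alpha_k \varphi(k) y(k) \tag{2.36}$$

The optimum is a minimum if the second derivative is positive, i.e. the matrix $\partial^2 J / \partial \theta^2$ is positive definite.

2.2.2 Algorithm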
Let us represent the celebrated least squares parameter estimate as an algorithm.

⁵ A matrix $A$ is positive definite if $x^T A x > 0$ for all $x \neq 0$.
Algorithm 1 (Least squares method for a fixed data set) Let a system be given by

$$y(k) = \theta^T \varphi(k) + \xi(k) \tag{2.37}$$

where $y(k)$ is the scalar output of the system; $\theta$ is the true parameter vector of the system of size $I \times 1$; $\varphi(k)$ is the regression vector of size $I \times 1$; and $\xi(k)$ is system noise. The least squares parameter estimate $\hat{\theta}$ of $\theta$ that minimizes the cost function

$$J = \frac{1}{K} \sum_{k=1}^{K} \alpha_k \left[ y(k) - \theta^T \varphi(k) \right]^2 \tag{2.38}$$

where $\alpha_k$ are scalar weighting factors, is given by

$$\hat{\theta} = \left[ \sum_{k=1}^{K} \alpha_k \varphi(k) \varphi^T(k) \right]^{-1} \sum_{k=1}^{K} \alpha_k \varphi(k) y(k) \tag{2.39}$$

If

$$\sum_{k=1}^{K} \alpha_k \varphi(k) \varphi^T(k) \tag{2.40}$$

is invertible, then there is a unique solution. The inverse exists if the matrix is positive definite.

Hence, a linear regression model

$$\hat{y}(k) = \hat{\theta}^T \varphi(k) \tag{2.41}$$

was identified using sampled measurements of the plant behavior, where $\hat{y}(k)$ is the output of the model (predicted output of the system) and $\hat{\theta}$ is a parameter estimate (based on $K$ samples).

2.2.3 Matrix representation
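Often, it is more convenient to calculate the least squares estimate from a compact matrix form. Let us collect the observations at the input of the model to a $K \times I$ matrix

$$\Phi = \begin{bmatrix} \varphi^T(1) \\ \varphi^T(2) \\ \vdots \\ \varphi^T(K) \end{bmatrix} = \begin{bmatrix} \varphi_1(1) & \varphi_2(1) & \cdots & \varphi_I(1) \\ \varphi_1(2) & \varphi_2(2) & \cdots & \varphi_I(2) \\ \vdots & \vdots & & \vdots \\ \varphi_1(K) & \varphi_2(K) & \cdots & \varphi_I(K) \end{bmatrix} \tag{2.42}$$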
and observations at the output to a $K \times 1$ vector

$$\mathbf{y} = \begin{bmatrix} y(1) \\ y(2) \\ \vdots \\ y(K) \end{bmatrix} \tag{2.43}$$

The $K$ equations can be represented by a matrix equation

$$\mathbf{y} = \Phi \theta + \Xi \tag{2.44}$$

where $\Xi$ is a $K \times 1$ column vector of modeling errors. Now the least squares algorithm (assuming $\alpha_k = 1$ for all $k$) that minimizes

$$J = \frac{1}{K} \left( \mathbf{y} - \Phi\theta \right)^T \left( \mathbf{y} - \Phi\theta \right) \tag{2.45}$$

can be represented in a more compact form by

$$\hat{\theta} = \left[ \Phi^T \Phi \right]^{-1} \Phi^T \mathbf{y} \tag{2.46}$$

where

$$\frac{\partial^2 J}{\partial \theta^2} = \frac{2}{K} \Phi^T \Phi \tag{2.47}$$

must be positive definite.

Consider Example 7 (dynamic system). If the input signal is constant, say $\bar{u}$, the right side of equation (2.11) may be written as follows

$$\bar{u} \sum_{i=0}^{n_B} b_i + \xi(k) \tag{2.48}$$

It is clear that we cannot identify the parameters $b_i$ ($i = 0, \ldots, n_B$) separately. Mathematically, the matrix $\Phi^T\Phi$ is singular. From the point of view of process operation, the constant input fails to excite all the dynamics of the system. In order to be able to identify all the model parameters, the input signal must fluctuate enough, i.e. it has to be persistently exciting.

Let us illustrate singularity by considering the following matrix:

$$A = \begin{bmatrix} 1 & \varepsilon \\ 1/\varepsilon & 1 \end{bmatrix} \tag{2.49}$$
which is singular for all $\varepsilon \in \mathbb{R}$, $\varepsilon \neq 0$. However, if $\varepsilon$ is very small we can neglect the term $a_{1,2} = \varepsilon$, and obtain

$$A_1 = \begin{bmatrix} 1 & 0 \\ 1/\varepsilon & 1 \end{bmatrix} \tag{2.50}$$

The determinant of $A_1$ is equal to 1. Thus, the determinant provides no information on the closeness to singularity of a matrix. Recall that the determinant of a matrix is equal to the product of its eigenvalues. We might therefore think that the eigenvalues contain more information. The eigenvalues of the matrix $A_1$ are both equal to 1, and thus the eigenvalues give no additional information. The singular values of a matrix (the positive square roots of the eigenvalues of the matrix $A^TA$) represent a good quantitative measure of the near singularity of a matrix. The ratio of the largest to the smallest singular value is called the condition number of the considered matrix. It provides a measure of the closeness of a given matrix to being singular. Observe that the condition number associated with the matrix $A_1$ tends to infinity as $\varepsilon \to 0$.

Let us illustrate the least squares method with two examples.

Example 8 (Effective heat value) Let us consider a simple application of the least squares method. The following steady-state data (Table 2.1) were measured from an FBC plant (see Appendix B).

Table 2.1: Steady-state data from an FBC plant.

Qc [kg/s]    P [MW]
2.2          19.1
2.3          19.3
2.3          19.2
2.3          19.1
1.6          13.1
1.7          15.1
1.7          14.3
3.1          26.0
3.0          27.0
3.0          25.6

In steady state, the power $P$ is related to the fuel feed by

$$P = H Q_C + h_0 \tag{2.51}$$

where $H$ is the effective heat value [MJ/kg] and $h_0$ is due to losses. Based on the data, let us determine the least squares estimate of the effective heat value of the fuel.
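The estimate asked for in this example can be computed with a few lines of NumPy. The sketch below takes the values of Table 2.1 and solves the normal equations of (2.46) rather than forming the inverse explicitly; the function name `least_squares` is an assumption for illustration. Because the tabulated data are rounded, the result should land close to, but not necessarily digit for digit on, the estimate reported in this example.

```python
import numpy as np

# Steady-state data of Table 2.1: fuel feed Qc [kg/s] and power P [MW]
Qc = np.array([2.2, 2.3, 2.3, 2.3, 1.6, 1.7, 1.7, 3.1, 3.0, 3.0])
P = np.array([19.1, 19.3, 19.2, 19.1, 13.1, 15.1, 14.3, 26.0, 27.0, 25.6])

def least_squares(Phi, y):
    """Solve the normal equations [Phi^T Phi] theta = Phi^T y, cf. (2.46)."""
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

Phi = np.column_stack([Qc, np.ones_like(Qc)])   # rows phi^T(k) = [Qc(k), 1]
H_hat, h0_hat = least_squares(Phi, P)

# Cross-check against NumPy's general least-squares solver
theta_ref = np.linalg.lstsq(Phi, P, rcond=None)[0]
```

Solving the linear system is numerically preferable to inverting $\Phi^T\Phi$, although for a well-conditioned two-parameter problem like this one both routes give the same answer.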
Substituting $\theta \leftarrow [H, h_0]^T$, $\varphi \leftarrow [Q_C, 1]^T$, $y \leftarrow P$ we have

$$\Phi = \begin{bmatrix} 2.2 & 1 \\ 2.3 & 1 \\ \vdots & \vdots \\ 3.0 & 1 \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} 19.1 \\ 19.3 \\ \vdots \\ 25.6 \end{bmatrix} \tag{2.52}$$

Using (2.46), or Algorithm 1, we obtain

$$\hat{\theta} = \left[ \Phi^T \Phi \right]^{-1} \Phi^T \mathbf{y} = \begin{bmatrix} 8.7997 \\ -0.6453 \end{bmatrix} \tag{2.53}$$

Thus, $H = 8.7997$ is the least squares estimate of the effective heat value of the fuel. Fig. 2.1 shows the data points (dots) and the estimated linear relation (solid line).

Figure 2.1: Least squares estimate of the heat value (horizontal axis: Qc [kg/s]).

Example 9 (O2 dynamics) From an FBC plant (see Appendix B), fuel feed $Q_C$ [kg/s] and flue gas oxygen content $C_F$ [Nm³/Nm³] were measured with a sampling interval of 4 seconds. The data set consisted of 91 noisy measurement patterns from step experiments around a steady-state operating point: fuel feed $\bar{Q}_C = 2.6$ [kg/s], primary air 3.6 [Nm³/s], secondary air 8.4 [Nm³/s]. Based on the measurements, let us determine the parameters $a$, $b$, and $c$ of the following difference equation:

$$\left[ C_F(k) - \bar{C}_F \right] = a \left[ C_F(k-1) - \bar{C}_F \right] + b \left[ Q_C(k-6) - \bar{Q}_C \right] + c \tag{2.54}$$
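Let us construct the input matrix

$$\Phi = \begin{bmatrix} C_F(6) - \bar{C}_F & Q_C(1) - \bar{Q}_C & 1 \\ C_F(7) - \bar{C}_F & Q_C(2) - \bar{Q}_C & 1 \\ \vdots & \vdots & \vdots \\ C_F(90) - \bar{C}_F & Q_C(85) - \bar{Q}_C & 1 \end{bmatrix} \tag{2.55}$$

and the vector of measured outputs

$$\mathbf{y} = \begin{bmatrix} C_F(7) - \bar{C}_F \\ C_F(8) - \bar{C}_F \\ \vdots \\ C_F(91) - \bar{C}_F \end{bmatrix} \tag{2.56}$$

The least squares estimate of $[a, b, c]^T$ is then calculated by (2.46), resulting in:

$$\hat{\theta} = \begin{bmatrix} 0.648 \\ -0.0172 \\ 0.0000 \end{bmatrix} \tag{2.57}$$

Thus, the dynamics of the O2 content from fuel feed are described by

$$\left[ C_F(k) - \bar{C}_F \right] = 0.648 \left[ C_F(k-1) - \bar{C}_F \right] - 0.0172 \left[ Q_C(k-6) - \bar{Q}_C \right] \tag{2.58}$$

or, equivalently, using the backward shift operator $q^{-1}$, $x(k-1) = q^{-1}x(k)$:

$$\left( 1 - 0.648\,q^{-1} \right) \left[ C_F(k) - \bar{C}_F \right] = -0.0172\,q^{-6} \left[ Q_C(k) - \bar{Q}_C \right] \tag{2.59}$$

The data (dots) and a simulation with the estimated model (solid lines) are illustrated in Fig. 2.2.

2.2.4 Properties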
Next we will be concerned with the properties of the least squares estimator $\hat{\theta}$. Owing to the fact that the measurements are disturbed, the parameter estimate $\hat{\theta}$ is a random vector. An estimator is said to be unbiased if the mathematical expectation of the parameter estimate is equal to the true parameters $\theta$. The least squares estimate is unbiased if the noise $\Xi$ has zero mean and if the noise and the data $\Phi$ are statistically independent. Notice that the statistical independence of the observations and a zero-mean noise is sufficient but not necessary for carrying out unbiased estimation of the parameter vector [62].

Figure 2.2: Prediction by the estimated model. The upper plot shows the predicted (solid line) and measured (circles) flue gas oxygen content. The lower plot shows the model input, fuel feed. (Horizontal axis: t [min].)

The estimation error is given by
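$$\tilde{\theta} = \theta - \hat{\theta} \tag{2.60}$$

The mathematical expectation is given by

$$E\{\tilde{\theta}\} = E\left\{ \theta - \left[ \Phi^T\Phi \right]^{-1} \Phi^T \mathbf{y} \right\} \tag{2.61}$$

$$= E\left\{ \theta - \left[ \Phi^T\Phi \right]^{-1} \Phi^T \left[ \Phi\theta + \Xi \right] \right\} \tag{2.62}$$

$$= -E\left\{ \left[ \Phi^T\Phi \right]^{-1} \Phi^T \right\} E\{\Xi\} \tag{2.63}$$

since $\left[ \Phi^T\Phi \right]^{-1} \Phi^T\Phi = I$, and $\Xi$ and $\Phi$ are statistically independent. It follows that if $\Xi$ has zero mean, the LS estimator is unbiased, i.e.

$$E\{\tilde{\theta}\} = 0 \quad \text{and} \quad E\{\hat{\theta}\} = \theta \tag{2.64}$$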
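The unbiasedness result can be checked empirically with a small Monte Carlo experiment. Everything in the sketch below (the true parameters, the number of runs, the noise level, the random seed) is an assumed test setup and not data from the text; the point is only that, with zero-mean noise independent of $\Phi$, the average estimation error shrinks toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([2.0, -1.0])
K, n_runs, noise_std = 50, 2000, 0.5

# Fixed regressor matrix: one excitation column and a constant column for the bias
Phi = np.column_stack([rng.standard_normal(K), np.ones(K)])
ls_operator = np.linalg.solve(Phi.T @ Phi, Phi.T)     # equals [Phi^T Phi]^{-1} Phi^T

estimates = np.empty((n_runs, 2))
for run in range(n_runs):
    noise = noise_std * rng.standard_normal(K)        # zero mean, independent of Phi
    estimates[run] = ls_operator @ (Phi @ theta_true + noise)

# Sample counterpart of E{theta - theta_hat}; should be close to the zero vector
mean_error = theta_true - estimates.mean(axis=0)
```

Individual estimates still scatter around the true values; how widely they scatter is what the covariance matrix discussed next quantifies.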
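Let us now consider the covariance matrix of the estimation error, which represents the dispersion of $\tilde{\theta}$ about its mean value⁶. The covariance matrix of the estimation error is given by

$$P = E\{\tilde{\theta}\tilde{\theta}^T\} = E\left\{ \left[ \Phi^T\Phi \right]^{-1} \Phi^T\, \Xi\, \Xi^T\, \Phi \left[ \Phi^T\Phi \right]^{-1} \right\} = E\left\{ \left[ \Phi^T\Phi \right]^{-1} \right\} \sigma^2$$

since $\Xi$ has zero mean and variance $\sigma^2$ (and its components are identically distributed), and $\Xi$ and $\Phi$ are statistically independent. It is a measure of how well we can estimate the unknown $\theta$. In the least squares approach we operate on given data, so $\Phi$ is known. This results in

$$P = \left[ \Phi^T\Phi \right]^{-1} \sigma^2 \tag{2.71}$$

The square roots of the diagonal elements of $P$, $\sqrt{P_{i,i}}$, represent the standard errors of each element $\hat{\theta}_i$ of the estimate $\hat{\theta}$. The variance can be estimated using the sum of squared errors divided by the degrees of freedom

$$\hat{\sigma}^2 = \frac{1}{K - I} \sum_{k=1}^{K} \left[ y(k) - \hat{\theta}^T \varphi(k) \right]^2 \tag{2.72}$$

where $I$ is the number of parameters to estimate.

Example 10 (Effective heat value: continued) Consider Example 8. We have $K = 10$ data points and two parameters, $I = 2$. Using (2.72) we obtain $\hat{\sigma} = 0.6369$, a standard error of 0.3782 for the estimate of $H$, and 0.8927 for the bias $h_0$.

Remark 1 (Covariance matrix) For $\sigma^2 = 1$ we obtain

$$P = \left[ \Phi^T\Phi \right]^{-1} \tag{2.73}$$

Therefore, in the framework of parameter estimation, the matrix $P = \left[ \Phi^T\Phi \right]^{-1}$ is called the error covariance matrix.

⁶ The covariance of a random variable $x$ is defined by $\mathrm{cov}(x) = E\{[x - E\{x\}][x - E\{x\}]^T\}$. If $x$ is zero mean, $E\{x\} = 0$, then $\mathrm{cov}(x) = E\{xx^T\}$.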
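The standard errors discussed above follow directly from (2.71) and (2.72). The sketch below applies them to the rounded values of Table 2.1; the function name `ls_standard_errors` is hypothetical, and because the table is rounded the numbers need not reproduce those quoted in Example 10 digit for digit.

```python
import numpy as np

def ls_standard_errors(Phi, y):
    """Return (theta_hat, sigma2_hat, standard errors) using (2.46), (2.72) and (2.71)."""
    K, I = Phi.shape
    gram_inv = np.linalg.inv(Phi.T @ Phi)
    theta_hat = gram_inv @ Phi.T @ y
    residuals = y - Phi @ theta_hat
    sigma2_hat = residuals @ residuals / (K - I)   # (2.72): SSE over degrees of freedom
    P = gram_inv * sigma2_hat                      # (2.71) with sigma^2 replaced by its estimate
    return theta_hat, sigma2_hat, np.sqrt(np.diag(P))

# Table 2.1 data: fuel feed Qc [kg/s] and power P [MW]
Qc = np.array([2.2, 2.3, 2.3, 2.3, 1.6, 1.7, 1.7, 3.1, 3.0, 3.0])
power = np.array([19.1, 19.3, 19.2, 19.1, 13.1, 15.1, 14.3, 26.0, 27.0, 25.6])
Phi = np.column_stack([Qc, np.ones_like(Qc)])

theta_hat, sigma2_hat, std_err = ls_standard_errors(Phi, power)
```

The standard error of the bias term comes out larger than that of the slope, reflecting that the data lie far from $Q_C = 0$, where the intercept is defined.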
2.3 Recursive LS method
The least squares method provides an estimate for the model parameters, based on a set of observations. Consider the situation when the observation pairs are obtained one by one from the process, and we would like to update the parameter estimate whenever new information becomes available. This can be done by adding the new observation to the previous set of observations and recomputing (2.39). In what follows, however, a recursive formulation is derived [55]. Instead of recomputing the estimates with all available data, the previous parameter estimates are updated with the new data sample. In order to do this, the least squares estimation formula is written in the form of a recursive algorithm.

Definition 2 (Recursive algorithm) A recursive algorithm has the form

$$\text{new estimate} = \text{old estimate} + \text{correction factor} \times \left( \text{new observation} - \text{prediction with old estimate} \right) \tag{2.74}$$

2.3.1 Derivation
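The least squares estimate at sample instant $k-1$ is given by (2.39):

$$\hat{\theta}(k-1) = \left[ \sum_{i=1}^{k-1} \alpha_i \varphi(i) \varphi^T(i) \right]^{-1} \sum_{i=1}^{k-1} \alpha_i \varphi(i) y(i) \tag{2.75}$$

At sample instant $k$, new information is obtained and the least squares estimate is given by

$$\hat{\theta}(k) = \left[ \sum_{i=1}^{k-1} \alpha_i \varphi(i) \varphi^T(i) + \alpha_k \varphi(k) \varphi^T(k) \right]^{-1} \times \left( \sum_{i=1}^{k-1} \alpha_i \varphi(i) y(i) + \alpha_k \varphi(k) y(k) \right) \tag{2.76}$$

Define

$$R(k) = \sum_{i=1}^{k} \alpha_i \varphi(i) \varphi^T(i) \tag{2.77}$$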
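which leads to the following recursive formula for $R(k)$:

$$R(k) = R(k-1) + \alpha_k \varphi(k) \varphi^T(k) \tag{2.78}$$

Using (2.77), the least squares estimate (2.76) can be rewritten as

$$\hat{\theta}(k) = R^{-1}(k) \left[ \sum_{i=1}^{k-1} \alpha_i \varphi(i) y(i) + \alpha_k \varphi(k) y(k) \right] \tag{2.79}$$

Based on (2.77), the estimate at iteration $k-1$, (2.75), can be rewritten as follows:

$$\hat{\theta}(k-1) = R^{-1}(k-1) \sum_{i=1}^{k-1} \alpha_i \varphi(i) y(i) \tag{2.80}$$

which gives

$$\sum_{i=1}^{k-1} \alpha_i \varphi(i) y(i) = R(k-1)\, \hat{\theta}(k-1) \tag{2.81}$$

Substituting this equation into (2.79), we find

$$\hat{\theta}(k) = R^{-1}(k) \left[ R(k-1)\, \hat{\theta}(k-1) + \alpha_k \varphi(k) y(k) \right] \tag{2.82}$$

From (2.78), we have $R(k-1) = R(k) - \alpha_k \varphi(k) \varphi^T(k)$, which is substituted in (2.82):

$$\hat{\theta}(k) = R^{-1}(k) \left[ \left[ R(k) - \alpha_k \varphi(k) \varphi^T(k) \right] \hat{\theta}(k-1) + \alpha_k \varphi(k) y(k) \right] \tag{2.83}$$

Reorganizing gives:

$$\hat{\theta}(k) = \hat{\theta}(k-1) + R^{-1}(k)\, \alpha_k \varphi(k) \left[ y(k) - \varphi^T(k)\, \hat{\theta}(k-1) \right] \tag{2.84}$$

which, together with (2.78), is a recursive formula for the least squares estimate.

In the algorithm given by (2.84), the matrix $R(k)$ needs to be inverted at each time step. In order to avoid this, introduce

$$P(k) = R^{-1}(k) \tag{2.85}$$

The recursion of $R(k)$, (2.78), now becomes

$$P^{-1}(k) = P^{-1}(k-1) + \alpha_k \varphi(k) \varphi^T(k) \tag{2.86}$$

The target is to be able to update $P(k)$ directly, without the need for matrix inversion. This can be done by using the matrix inversion lemma.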
Lemma 1 (Matrix inversion lemma) Let $A$, $B$, $C$ and $D$ be matrices of compatible dimensions so that $A + BCD$ exists. Then

$$\left[ A + BCD \right]^{-1} = A^{-1} - A^{-1} B \left[ D A^{-1} B + C^{-1} \right]^{-1} D A^{-1} \tag{2.87}$$

The verification of the lemma can be obtained by multiplying the right-hand side by $A + BCD$ from the right, which gives the unit matrix (for proof, see [64], p. 64).

Making the following substitutions

$$A \leftarrow P^{-1}(k-1) \tag{2.88}$$

$$B \leftarrow \varphi(k) \tag{2.89}$$

$$C \leftarrow \alpha_k \tag{2.90}$$

$$D \leftarrow \varphi^T(k) \tag{2.91}$$

and applying Lemma 1 to (2.86) gives

$$P(k) = \left[ P^{-1}(k-1) + \varphi(k)\, \alpha_k\, \varphi^T(k) \right]^{-1} \tag{2.92}$$

$$= P(k-1) - \frac{P(k-1) \varphi(k) \varphi^T(k) P(k-1)}{\frac{1}{\alpha_k} + \varphi^T(k) P(k-1) \varphi(k)} \tag{2.93}$$

Thus the inversion of a square matrix of size $\dim\theta$ is replaced by the inversion of a scalar.

The algorithm can be more conveniently expressed by defining a gain vector $L(k)$:

$$L(k) = \frac{P(k-1) \varphi(k)}{\frac{1}{\alpha_k} + \varphi^T(k) P(k-1) \varphi(k)} = \alpha_k P(k) \varphi(k) \tag{2.94}$$

where the second equality can be verified by substituting (2.93) for $P(k)$ and reorganizing.

The recursive algorithm needs some initial values to be started up. In the absence of prior knowledge, one possibility to obtain initial values is to use the least squares method on the first $k_0 > \dim\theta$ samples. Another common choice is to set the initial parameter vector to zero

$$\hat{\theta}(k_0) = 0 \tag{2.95}$$

and let the initial error covariance matrix be a multiple of the identity matrix

$$P(k_0) = C I \tag{2.96}$$

where $C$ is some large positive constant. A large value implies that the confidence in $\hat{\theta}(k_0)$ is poor and ensures a high initial degree of correction (adaptation). Notice that this makes the updating direction coincide with the negative gradient of the least squares criterion.
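2.3.2 Algorithm

The recursive least squares algorithm can now be given, using (2.94), (2.84)-(2.85), and (2.93).

Algorithm 2 (Recursive least squares method) The recursive least squares algorithm is given by

$$L(k) = \frac{P(k-1) \varphi(k)}{\frac{1}{\alpha_k} + \varphi^T(k) P(k-1) \varphi(k)} \tag{2.97}$$

$$\hat{\theta}(k) = \hat{\theta}(k-1) + L(k) \left[ y(k) - \hat{\theta}^T(k-1) \varphi(k) \right] \tag{2.98}$$

$$P(k) = P(k-1) - L(k) \varphi^T(k) P(k-1) \tag{2.99}$$

where $k = k_0 + 1, k_0 + 2, k_0 + 3, \ldots$ The initial values $\hat{\theta}(k_0)$ and $P(k_0)$ are obtained by using the LS on the first $k_0 > \dim\theta$ samples

$$P(k_0) = \left[ \sum_{i=1}^{k_0} \alpha_i \varphi(i) \varphi^T(i) \right]^{-1} \tag{2.100}$$

$$\hat{\theta}(k_0) = P(k_0) \sum_{i=1}^{k_0} \alpha_i \varphi(i) y(i) \tag{2.101}$$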
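A minimal sketch of the recursion (2.97)-(2.99) is given below, under stated assumptions: the synthetic data, the true parameter values, the noise level and the initial value $P = 10^6 I$ with a zero parameter vector are all chosen for illustration, and the function name `rls_update` is hypothetical. The final estimate is compared with a batch least squares fit of the same data.

```python
import numpy as np

def rls_update(theta, P, phi, y, alpha=1.0):
    """One step of (2.97)-(2.99); returns the updated (theta, P)."""
    P_phi = P @ phi
    L = P_phi / (1.0 / alpha + phi @ P_phi)      # gain vector (2.97)
    theta = theta + L * (y - theta @ phi)        # (2.98), driven by the a priori error
    P = P - np.outer(L, phi @ P)                 # (2.99): P - L phi^T P
    return theta, P

rng = np.random.default_rng(2)
theta_true = np.array([0.6, -0.02, 0.1])
Phi = np.column_stack([rng.standard_normal((200, 2)), np.ones(200)])
y = Phi @ theta_true + 0.01 * rng.standard_normal(200)

# Start from a zero parameter vector and a large multiple of the identity matrix
theta, P = np.zeros(3), 1e6 * np.eye(3)
for phi_k, y_k in zip(Phi, y):
    theta, P = rls_update(theta, P, phi_k, y_k)

theta_batch = np.linalg.lstsq(Phi, y, rcond=None)[0]
```

With a large initial covariance the recursive estimate should agree with the batch estimate to within a small tolerance; a smaller initial value would pull the estimate toward the zero initial guess.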
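The RLS method is one of the most widely used recursive parameter estimation techniques, due to its robustness and ease of implementation.

Example 11 (O2 dynamics: continued) Let us consider the same problem as in Example 9, where the parameters of the following model were to be estimated:

$$\left[ 1 - a q^{-1} \right] \left[ C_F(k) - \bar{C}_F \right] = b q^{-6} \left[ Q_C(k) - \bar{Q}_C \right] + c \tag{2.102}$$

Using the recursive LS method with the initial values $k_0 = 7$:

$$\hat{\theta}(7) = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}; \qquad P(7) = 10^9 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{2.103}$$

and substituting for $k = 7, 8, \ldots, 91$

$$\varphi(k) = \begin{bmatrix} C_F(k-1) - \bar{C}_F \\ Q_C(k-6) - \bar{Q}_C \\ 1 \end{bmatrix}, \qquad y(k) = C_F(k) - \bar{C}_F \tag{2.104}$$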
CHAPTER 2. O.Od
oI
LINEAR I
00 0
0
REGRESSION I
~ 0.030.03
0.0%
i
2
3
4
5
6
2.8 ~2.7 22.6 2.5 I
I
3 t [minl
I
4
5
6
~ 3
~ 4
~ 5
6
0.5 10
~ 1
~ 2
Figure 2.3: Online prediction by the estimated model. Upper plot shows the predicted (solid line) and measured (circles) flue gas oxygen content; middle plot shows the model input, fuel feed. The evolution of the values of the estimated parameters is shown at the bottom of the figure. we have the following parameters at k = 91: ~(91)
0.646 0.0172 0.0000
(2.1o
which are the same (up to two digits) as in Example 9. Fig. 2.3 illustrates the evolution of the parameters a, b and c, as well as the online prediction by the model. Remark 2 (Factorization) The covariance matrix must remain positive definite. However, even if the initial matrix P (0) satisfies the second order condition of optimality (least squares optimization problem), the positive definiteness of P (k) can be lost, owing the numerical roundoff errors in
2.3.
RECURSIVE LS METHOD
33
long term behavior (adaptive context, etc.). In order to maintain numerical accuracy it is more advisable to update the estimator in a factorized form which guarantees that P (k) remains positive definite and that the roundoff errors, unavoidablein computerapplications, do not affect the solution significantly. Oneof the most popular methodsis the UDfactorization which is based on the decompositionof P (k) P (k) = V (k) D UT (k) where the factors U (k) and D (k) are, respectively, a unitary upper triangulax matrix and a diagonal matrix.
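To make the decomposition concrete, the sketch below computes a UD factorization of a given symmetric positive definite matrix. It only illustrates the factorization itself; it is not the factorized update that propagates $U(k)$ and $D(k)$ through the RLS recursion. The function name `ud_factorize` and the random test matrix are assumptions.

```python
import numpy as np

def ud_factorize(P):
    """Factor a symmetric positive definite P as U @ diag(d) @ U.T,
    with U unit upper triangular and d the diagonal of D."""
    work = np.array(P, dtype=float)        # local copy, overwritten during the sweep
    n = work.shape[0]
    U = np.eye(n)
    d = np.zeros(n)
    for j in range(n - 1, -1, -1):         # process columns from last to first
        d[j] = work[j, j]
        U[:j, j] = work[:j, j] / d[j]
        # Deflate: remove the contribution of column j from the leading block
        work[:j, :j] -= d[j] * np.outer(U[:j, j], U[:j, j])
    return U, d

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
P = M @ M.T + 4.0 * np.eye(4)              # symmetric positive definite test matrix
U, d = ud_factorize(P)
P_rebuilt = U @ np.diag(d) @ U.T
```

Positive entries in `d` certify positive definiteness of the rebuilt matrix, which is the property the factorized form is meant to preserve.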
2.3.3 A posteriori prediction error
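In the previous developments, the RLS was derived using the a priori prediction error

$$e(k|k-1) = y(k) - \hat{\theta}^T(k-1)\varphi(k) \tag{2.106}$$

In some cases, the a posteriori version may be preferred [51]

$$e(k|k) = y(k) - \hat{\theta}^T(k)\varphi(k) \tag{2.107}$$

The connection between these can be obtained using (2.106) and (2.107):

$$e(k|k) = y(k) - \hat{\theta}^T(k-1)\varphi(k) - \varphi^T(k)\left[\hat{\theta}(k) - \hat{\theta}(k-1)\right] \tag{2.108}$$

$$\phantom{e(k|k)} = e(k|k-1) - \varphi^T(k)\left[\hat{\theta}(k) - \hat{\theta}(k-1)\right] \tag{2.109}$$

From (2.98) we derive

$$\left[\hat{\theta}(k) - \hat{\theta}(k-1)\right] = L(k)\, e(k|k-1) \tag{2.110}$$

Substituting (2.97) into this equation leads to

$$\left[\hat{\theta}(k) - \hat{\theta}(k-1)\right] = \frac{P(k-1)\varphi(k)}{\frac{1}{\alpha_k} + \varphi^T(k)P(k-1)\varphi(k)}\, e(k|k-1) \tag{2.111}$$

Thus, substituting (2.111) into (2.109) gives

$$e(k|k) = e(k|k-1) - \frac{\varphi^T(k)P(k-1)\varphi(k)}{\frac{1}{\alpha_k} + \varphi^T(k)P(k-1)\varphi(k)}\, e(k|k-1) \tag{2.112}$$

$$\phantom{e(k|k)} = \frac{e(k|k-1)}{1 + \alpha_k \varphi^T(k)P(k-1)\varphi(k)} \tag{2.113}$$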
which is the relation between the a priori and a posteriori prediction errors. The modified RLS algorithm is then given by (2.97),

$$e(k|k-1) = y(k) - \hat{\theta}^T(k-1)\varphi(k) \tag{2.114}$$

$$e(k|k) = \frac{e(k|k-1)}{1 + \alpha_k \varphi^T(k)P(k-1)\varphi(k)} \tag{2.115}$$

$$\hat{\theta}(k) = \hat{\theta}(k-1) + L(k)\, e(k|k-1) \tag{2.116}$$

and (2.99). It can be observed that $e(k|k)$ can tend to zero if $\varphi(k)$ becomes unbounded, even if $e(k|k-1)$ does not.
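The relation between the two prediction errors can be checked numerically. The following is a minimal sketch with arbitrary, assumed values for the regressor, the previous estimate and the covariance; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
theta_prev = rng.normal(size=n)   # previous estimate (assumed values)
P_prev = 10.0 * np.eye(n)         # previous covariance (assumed values)
phi = rng.normal(size=n)          # regressor at instant k
y = 1.5                           # measured output at instant k
alpha = 1.0                       # weighting factor alpha_k

# A priori error and the usual RLS gain and parameter update.
e_prior = y - theta_prev @ phi
L = P_prev @ phi / (1.0 / alpha + phi @ P_prev @ phi)
theta = theta_prev + L * e_prior

# A posteriori error, computed directly and via the closed-form relation.
e_post_direct = y - theta @ phi
e_post_formula = e_prior / (1.0 + alpha * (phi @ P_prev @ phi))
```

Both routes give the same a posteriori error, and its magnitude is never larger than that of the a priori error because the denominator is at least one.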
2.4 RLS with exponential forgetting
The criterion (2.16) gives an estimate based on the average behavior of the system, as expressed by the samples used in the identification. This resulted in Algorithms 1 and 2. However, if we believe that the system is time-varying, we need an estimate that is representative of the current properties of the system. This can be accomplished by putting more weight on newer samples, i.e. by forgetting old information. These types of algorithms are referred to as adaptive algorithms.

In the time-varying case, it is necessary to infer the model at the same time as the data is collected. The model is then updated at each time instant when new data becomes available. The need to cope with time-varying processes is not the only motivation for adaptive algorithms. Adaptive identification may need to be considered, e.g., for processes that are nonlinear to the extent that one set of model parameters may not adequately describe the process over its operating region [85].

In order to obtain an estimate that is representative of the current properties of the system at sample instant $k$, consider a criterion where older measurements are discounted ([55], pp. 56-59):

$$J_k(\theta) = \sum_{i=1}^{k} \beta(k,i)\left[y(i) - \theta^T\varphi(i)\right]^2 \tag{2.117}$$

where $\beta(k,i)$ is increasing in $i$ for a given $k$. The criterion is still quadratic in $\theta$ and the minimizing off-line estimate is given by

$$\hat{\theta}(k) = \left[\sum_{i=1}^{k}\beta(k,i)\varphi(i)\varphi^T(i)\right]^{-1}\sum_{i=1}^{k}\beta(k,i)\varphi(i)y(i) \tag{2.118}$$
[Figure 2.4: The effect of $\lambda$ ($\alpha_i = 1$ for all $i$). The weight $\beta(k,i)$ is plotted against $i$ for $k = 100$, with curves for $\lambda = 1$ ($H = \infty$) and $\lambda = 0.95$ ($H = 20$).]

Consider the following structure for $\beta(k,i)$:

$$\beta(k,i) = \lambda(k)\,\beta(k-1,i) \tag{2.119}$$

where $1 \le i \le k-1$ and $\lambda(k)$ is a scalar. This can also be written

$$\beta(k,i) = \left[\prod_{j=i+1}^{k}\lambda(j)\right]\alpha_i \tag{2.120}$$

where

$$\beta(i,i) = \alpha_i \tag{2.121}$$

If $\lambda(i)$ is a constant $\lambda$, we get

$$\beta(k,i) = \lambda^{k-i}\alpha_i \tag{2.122}$$
which gives an exponential forgetting profile in the criterion (2.117). In such a case, the coefficient $\lambda$ is referred to as the forgetting factor. Figure 2.4 illustrates the weighting obtained using a constant $\lambda$. The effect of $\lambda$ can be illustrated by computing the equivalent memory horizon $H = \frac{1}{1-\lambda}$ ($\alpha_i = 1$ for all $i$). A common choice of $\lambda$ is 0.95 to 0.99. When $\lambda$ is close to 1, the time constant for the exponential decay is approximately $H$. Thus choosing $\lambda$ in the range 0.95 to 0.99 corresponds, roughly, to remembering the 20 to 100 most recent data.
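As a small numerical illustration of the memory horizon, the sketch below evaluates $H$ and the weights $\lambda^{k-i}$ for a few forgetting factors. The function names are illustrative assumptions; $\alpha_i = 1$ is assumed throughout.

```python
import numpy as np


def memory_horizon(lam):
    """Equivalent memory horizon H = 1 / (1 - lam) for 0 < lam < 1."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    return 1.0 / (1.0 - lam)


def forgetting_weights(lam, k):
    """Weights lam**(k - i) for i = 1..k, assuming alpha_i = 1."""
    i = np.arange(1, k + 1)
    return lam ** (k - i)


horizons = {lam: memory_horizon(lam) for lam in (0.95, 0.97, 0.99)}
weights = forgetting_weights(0.95, 100)
```

The sum of the weights approaches $H$ as $k$ grows, which is one way to read $H$ as the effective number of samples remembered.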
2.4.1 Derivation
We are now ready to derive a recursive form for the previous equations. Let us introduce the following notation (see (2.77))

$$R(k) = \sum_{i=1}^{k}\beta(k,i)\varphi(i)\varphi^T(i) \tag{2.123}$$

Separating the old and the new information

$$R(k) = \sum_{i=1}^{k-1}\beta(k,i)\varphi(i)\varphi^T(i) + \beta(k,k)\varphi(k)\varphi^T(k) \tag{2.124}$$

and substituting (2.119) and (2.122) into this equation leads to

$$R(k) = \lambda\sum_{i=1}^{k-1}\beta(k-1,i)\varphi(i)\varphi^T(i) + \alpha_k\varphi(k)\varphi^T(k) \tag{2.125}$$

Using (2.123) for $R(k-1)$, we have a recursive formula for $R(k)$:

$$R(k) = \lambda R(k-1) + \alpha_k\varphi(k)\varphi^T(k) \tag{2.126}$$

In a similar way to the RLS, we can write a recursive formula for the parameter update

$$\hat{\theta}(k) = \hat{\theta}(k-1) + R^{-1}(k)\,\alpha_k\varphi(k)\left[y(k) - \hat{\theta}^T(k-1)\varphi(k)\right] \tag{2.127}$$

This is exactly the same as (2.84). Again, we can denote $P(k) = R^{-1}(k)$ and use the matrix inversion lemma (Lemma 1) to avoid the matrix inversion in (2.127) (select $A = \lambda P^{-1}(k-1)$ and $B = \varphi(k)$; $C = \alpha_k$; $D = \varphi^T(k)$).
2.4.2 Algorithm
Now the recursive least squares algorithm with exponential forgetting can be given.

Algorithm 3 (RLS with exponential forgetting) The recursive least squares algorithm with exponential forgetting is given by

$$L(k) = \frac{P(k-1)\varphi(k)}{\frac{\lambda}{\alpha_k} + \varphi^T(k)P(k-1)\varphi(k)} \tag{2.128}$$

$$\hat{\theta}(k) = \hat{\theta}(k-1) + L(k)\left[y(k) - \hat{\theta}^T(k-1)\varphi(k)\right] \tag{2.129}$$

$$P(k) = \frac{1}{\lambda}\left[P(k-1) - L(k)\varphi^T(k)P(k-1)\right] \tag{2.130}$$

where $0 < \lambda \le 1$, and $\lambda = 1$ gives the RLS algorithm with no forgetting.

The effect of the forgetting factor $\lambda$ is that $P(k)$, and hence the gain $L(k)$, are kept larger. With $\lambda < 1$, $P(k)$ will not tend to zero and the algorithm will always be alert to changes in $\theta$.

Example 12 (O2 dynamics: continued) Let us illustrate the performance of the RLS with exponential forgetting. Consider the identification problem in an FBC plant in Example 9, and let an unmeasured 20% decrease in the char feed occur (e.g., due to an increase in the fuel moisture). Fig. 2.5 illustrates the prediction and the on-line estimated parameters when using a forgetting factor $\lambda = 0.97$. The change occurs at t = 8 min. The algorithm is able to follow the changes in the process.

There exists a large number of other forgetting schemes. Many (if not most) of them are inspired by the robustness of the Kalman filter, discussed in the next section.
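Algorithm 3 can be sketched in a few lines. The function name `rls_forgetting_step`, the simulated two-parameter system and the noise level are assumptions made for illustration; they are not the FBC example of the text.

```python
import numpy as np


def rls_forgetting_step(theta, P, phi, y, lam=0.97, alpha=1.0):
    """One step of RLS with exponential forgetting, (2.128)-(2.130)."""
    P_phi = P @ phi
    L = P_phi / (lam / alpha + phi @ P_phi)           # gain (2.128)
    theta_new = theta + L * (y - theta @ phi)         # update (2.129)
    P_new = (P - np.outer(L, phi @ P)) / lam          # covariance (2.130)
    return theta_new, P_new


# Assumed test system: y = theta_true^T phi + noise, with an abrupt
# parameter change halfway through the data.
rng = np.random.default_rng(1)
theta_true = np.array([1.0, 0.5])
theta_hat = np.zeros(2)
P = 1e3 * np.eye(2)
for k in range(400):
    if k == 200:
        theta_true = np.array([1.5, 0.5])
    phi = rng.normal(size=2)
    y = theta_true @ phi + 0.01 * rng.normal()
    theta_hat, P = rls_forgetting_step(theta_hat, P, phi, y, lam=0.97)
```

With $\lambda = 0.97$ the horizon is about 33 samples, so 200 samples after the change the estimate reflects almost exclusively the new parameter values. Setting `lam=1.0` recovers the RLS without forgetting.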
2.5 Kalman filter

In the Bayesian approach to the parameter estimation problem, the parameter itself is thought of as a random variable. Based on the observations of other random variables that are correlated with the parameter, we may infer information about its value. The Kalman filter is developed in such a framework. The unobservable state vector is assumed to be correlated with the output of a system. So, based on the observations of the output, the value of the state vector can be estimated. In what follows, the Kalman filter is first introduced for state estimation. This is followed by an application to the parameter estimation problem.

Assume that a stationary stochastic vector signal $\{x(k)\}$ can be described by the following Markov process

$$x(k+1) = A(k)x(k) + v(k) \tag{2.131}$$
[Figure 2.5: Online prediction by the estimated model. The upper plot shows the predicted (solid line) and measured (circles) flue gas oxygen content (for clarity, only every third measurement is shown). The middle plot shows the model input, fuel feed. The evolution of the values of the estimated parameters is shown at the bottom of the figure. Horizontal axes: t [min].]
with measurement equation

$$y(k) = C(k)x(k) + e(k) \tag{2.132}$$

where $x(k)$ is an $S \times 1$ dimensional column state vector, $v(k)$ is an $S \times 1$ dimensional column vector containing the system noise; and $y(k)$ and $e(k)$ are $O \times 1$ dimensional column vectors of measurable outputs and the output noise. $A(k)$ is an $S \times S$ dimensional system state transition matrix describing the internal dynamics of the system (Markov process). $C(k)$ is the $O \times S$ output matrix, describing the relation between states and the measurable outputs. In state estimation, a stationary system is often assumed, $A(k) = A$, $C(k) = C$.

The objective is to estimate the state vector $x(k)$ based on measurements of the outputs $y(k)$, contaminated by noise $e(k)$. The system model at sample instant $k$ is assumed to be known:

$$A(k),\; C(k) \tag{2.133}$$

and the processes $\{v(k)\}$ and $\{e(k)\}$ are zero mean, independent Gaussian processes with known mean values and covariances:

$$E\{v(k)\} = 0; \quad E\{v(k)v^T(j)\} = V(k)\delta_{kj} \tag{2.134}$$

$$E\{e(k)\} = 0; \quad E\{e(k)e^T(j)\} = Y(k)\delta_{kj} \tag{2.135}$$

$$E\{e(k)v^T(j)\} = 0 \tag{2.136}$$

where $\delta_{ij}$ is the Kronecker delta function (see footnote 7). $v(k)$ and $e(k)$ have covariances $V(k)$ and $Y(k)$, respectively, which are nonnegative and symmetric. It is assumed that $\{y(k)\}$ is available for measurement, but $\{x(k)\}$ is not. It is desirable to predict $\{x(k)\}$ from the measurements of $\{y(k)\}$.

The Kalman filter can be derived in a number of ways. In what follows, the mean square error approach for the Kalman predictor is considered [41]. We then proceed by giving the algorithm for the Kalman filter (the proof for the filter case is omitted as it is lengthy).

Footnote 7: The Kronecker delta function is given by $\delta_{ij} = 1$ if $i = j$, and $\delta_{ij} = 0$ otherwise.
2.5.1 Derivation
Let us introduce the following predictor for the state $x$ at instant $k+1$

$$\hat{x}(k+1) = A(k)\hat{x}(k) + K(k)\left[y(k) - C(k)\hat{x}(k)\right] \tag{2.137}$$

which consists of two terms: a prediction based on the system model and the previous estimate, and a correction term from the difference between the measured output and the output predicted using the system model. The gain matrix $K(k)$ needs to be chosen. Let us consider the following cost function to be minimized

$$J(k+1) = E\left\{\tilde{x}(k+1)\tilde{x}^T(k+1)\right\} \tag{2.138}$$

where $\tilde{x}$ is the prediction error

$$\tilde{x}(k+1) = \hat{x}(k+1) - x(k+1) \tag{2.139}$$

The optimal solution is given by

$$K(k) = A(k)P(k)C^T(k)\left[Y(k) + C(k)P(k)C^T(k)\right]^{-1} \tag{2.140}$$

where

$$P(k+1) = A(k)P(k)A^T(k) + V(k) - K(k)C(k)P(k)A^T(k) \tag{2.141}$$

Proof. Substituting (2.137) into (2.139) we have

$$\tilde{x}(k+1) = A(k)\hat{x}(k) + K(k)\left[y(k) - C(k)\hat{x}(k)\right] - x(k+1) \tag{2.142}$$

$$\phantom{\tilde{x}(k+1)} = \left[A(k) - K(k)C(k)\right]\hat{x}(k) + K(k)y(k) - x(k+1) \tag{2.143}$$

and substituting (2.131)-(2.132) we have

$$\tilde{x}(k+1) = \left[A(k) - K(k)C(k)\right]\hat{x}(k) + K(k)C(k)x(k) + K(k)e(k) - A(k)x(k) - v(k) \tag{2.144}$$

Reorganizing and using (2.139), we have the following prediction error dynamics

$$\tilde{x}(k+1) = \left[A(k) - K(k)C(k)\right]\tilde{x}(k) + K(k)e(k) - v(k) \tag{2.145}$$
The cost function (2.138) can now be expressed as

$$J(k+1) = E\left\{\left[\left[A(k) - K(k)C(k)\right]\tilde{x}(k) + K(k)e(k) - v(k)\right]\left[\left[A(k) - K(k)C(k)\right]\tilde{x}(k) + K(k)e(k) - v(k)\right]^T\right\} \tag{2.146}$$

$$\phantom{J(k+1)} = \left[A(k) - K(k)C(k)\right]E\left\{\tilde{x}(k)\tilde{x}^T(k)\right\}\left[A(k) - K(k)C(k)\right]^T + V(k) + K(k)Y(k)K^T(k) \tag{2.147}$$

since $e(k)$, $v(k)$, and $\tilde{x}(k)$ are statistically independent (see footnote 8) and $K(k)$ and $[A(k) - K(k)C(k)]$ are known. Let us use the following notation

$$P(k) = E\left\{\tilde{x}(k)\tilde{x}^T(k)\right\} \tag{2.148}$$

$$Q(k) = Y(k) + C(k)P(k)C^T(k) \tag{2.149}$$

where $P(k)$ is the covariance matrix of the estimation error. Rewrite (2.147) as

$$P(k+1) = A(k)P(k)A^T(k) - K(k)C(k)P(k)A^T(k) - A(k)P(k)C^T(k)K^T(k) + V(k) + K(k)Q(k)K^T(k) \tag{2.150}$$

By completing squares of terms containing $K(k)$ we find

$$P(k+1) = A(k)P(k)A^T(k) + V(k) - A(k)P(k)C^T(k)Q^{-1}(k)C(k)P(k)A^T(k) + \left[K(k) - A(k)P(k)C^T(k)Q^{-1}(k)\right]Q(k)\left[K(k) - A(k)P(k)C^T(k)Q^{-1}(k)\right]^T \tag{2.151}$$

Now only the last term of the sum depends on $K(k)$, and the minimization can be done by choosing $K(k)$ such that the last term disappears:

$$K(k) = A(k)P(k)C^T(k)\left[Y(k) + C(k)P(k)C^T(k)\right]^{-1} \tag{2.152}$$

Footnote 8: By assumption, $v(k)$ and $e(k)$ are statistically independent. $\tilde{x}(k)$ is given by $\hat{x}(k) - x(k)$. The prediction $\hat{x}(k)$ depends on the past measurement $y(k-1)$, hence it is dependent on $e(k-1)$. The state $x(k)$ is dependent on the noise $v(k-1)$ disturbing the state. Thus the prediction error $\tilde{x}(k)$ depends on $e(k-1)$ and $v(k-1)$, but not on $e(k)$ or $v(k)$. Thus, $v(k)$, $e(k)$, and $\tilde{x}(k)$ are statistically independent.
Since the last term disappears, we have

$$P(k+1) = A(k)P(k)A^T(k) + V(k) - K(k)C(k)P(k)A^T(k) \tag{2.153}$$

which completes the proof.

Collecting the results, we have the following algorithm for an optimal estimate (in the mean square error sense) of the next state $x(k+1)$, based on information up to $k$:

$$K(k) = A(k)P(k)C^T(k)\left[Y(k) + C(k)P(k)C^T(k)\right]^{-1} \tag{2.154}$$

$$\hat{x}(k+1) = A(k)\hat{x}(k) - K(k)\left[C(k)\hat{x}(k) - y(k)\right] \tag{2.155}$$

$$P(k+1) = A(k)P(k)A^T(k) + V(k) - K(k)C(k)P(k)A^T(k) \tag{2.156}$$

If the disturbances $\{v(k)\}$ and $\{e(k)\}$ as well as the initial state $x(0)$ are Gaussian (with mean values 0, 0, and $x_0$ and covariances $V(k)$, $Y(k)$ and $P(0)$, respectively), the estimate $\hat{x}(k+1)$ is the mean of the conditional distribution of $x(k+1)$, $\hat{x}(k+1) = E\{x(k+1)\,|\,y(0), y(1), \ldots, y(k)\}$. $P(k+1)$ is the covariance of the conditional distribution of $x(k+1)$.

2.5.2 Algorithm
Let us denote the estimate (2.155), based on information up to time $k$, by $\hat{x}(k+1|k)$. A Kalman filter can also be derived for estimating the state $x(k+1)$, assuming now that the measurement $y(k+1)$ has become available, i.e.

$$\hat{x}(k+1|k+1) = E\{x(k+1)\,|\,y(0), y(1), \ldots, y(k+1)\} \tag{2.157}$$

Consider now a filter of the form

$$\hat{x}(k+1|k+1) = \hat{x}(k+1|k) + K(k+1)\left[y(k+1) - C(k+1)\hat{x}(k+1|k)\right] \tag{2.158}$$
The following algorithm can be derived. (Note that an extended state space model is used with an additional deterministic input $u(k)$ and a noise transition matrix $G(k)$.)

Algorithm 4 (Kalman filter) Estimate the state vectors $x(k)$ of a system described by the following equations

$$x(k+1) = A(k)x(k) + B(k)u(k) + G(k)v(k) \tag{2.159}$$

$$y(k) = C(k)x(k) + e(k) \tag{2.160}$$

where $x(k)$ is an $S \times 1$ state vector; $u(k)$ and $v(k)$ are $I \times 1$ vectors containing the system inputs and Gaussian noise; $y(k)$ and $e(k)$ are $O \times 1$ vectors of measurable outputs and the output Gaussian noise, respectively. $A(k)$ is the $S \times S$ system state transition matrix; $B(k)$ and $G(k)$ are the $S \times I$ and $S \times I$ system input and noise transition matrices; $C(k)$ is the $O \times S$ output matrix. The following are known for $k = k_0, k_0+1, k_0+2, \ldots$, $j \le k$:

$$A(k),\; B(k),\; C(k),\; G(k) \tag{2.161}$$

$$E\{v(k)\} = 0; \quad E\{v(k)v^T(j)\} = V(k)\delta_{kj} \tag{2.162}$$

$$E\{e(k)\} = 0; \quad E\{e(k)e^T(j)\} = Y(k)\delta_{kj} \tag{2.163}$$

$$E\{e(k)v^T(j)\} = 0 \tag{2.164}$$

$$E\{x(k_0)\} = x_{k_0}; \quad \operatorname{cov}\{x(k_0)\} = P_{k_0} \tag{2.165}$$
1. Set $k = k_0$. Initialize $\hat{x}(k_0|k_0) = x_{k_0}$ and $P(k_0|k_0) = P_{k_0}$.

2. Time update: Compute the state estimate at $k+1$, given data up to $k$:

$$\hat{x}(k+1|k) = A(k)\hat{x}(k|k) + B(k)u(k) \tag{2.166}$$

and update the covariance matrix of the error in the estimate:

$$P(k+1|k) = A(k)P(k|k)A^T(k) + G(k)V(k)G^T(k) \tag{2.167}$$

3. Measurement update: Observe the new measurement $y(k+1)$, at time $t = (k+1)T$. Compute the Kalman filter gain matrix:

$$K(k+1) = P(k+1|k)C^T(k+1)\left[Y(k+1) + C(k+1)P(k+1|k)C^T(k+1)\right]^{-1} \tag{2.168}$$

Correct the state estimate at $k+1$, given data up to $k+1$:

$$\hat{x}(k+1|k+1) = \hat{x}(k+1|k) + K(k+1)\left[y(k+1) - C(k+1)\hat{x}(k+1|k)\right] \tag{2.169}$$

and update the new error covariance matrix (see footnote 9):

$$P(k+1|k+1) = \left[I - K(k+1)C(k+1)\right]P(k+1|k)\left[I - K(k+1)C(k+1)\right]^T + K(k+1)Y(k+1)K^T(k+1) \tag{2.170}$$

4. Increase the sample index $k = k+1$ and return to step 2.
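The two update steps of Algorithm 4 can be sketched as follows. The function names, the two-state example system and all numerical values are assumptions chosen for illustration only.

```python
import numpy as np


def kalman_time_update(x, P, A, B, u, G, V):
    """Time update, (2.166)-(2.167)."""
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + G @ V @ G.T
    return x_pred, P_pred


def kalman_measurement_update(x_pred, P_pred, y, C, Y):
    """Measurement update, (2.168)-(2.170)."""
    S = Y + C @ P_pred @ C.T
    K = P_pred @ C.T @ np.linalg.inv(S)               # gain (2.168)
    x = x_pred + K @ (y - C @ x_pred)                 # correction (2.169)
    I_KC = np.eye(len(x_pred)) - K @ C
    P = I_KC @ P_pred @ I_KC.T + K @ Y @ K.T          # covariance (2.170)
    return x, P, K


# Assumed example: two states, one input, one measured output.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
G = np.eye(2)
V = 0.01 * np.eye(2)
C = np.array([[1.0, 0.0]])
Y = np.array([[0.25]])

x_est, P_est = np.zeros(2), np.eye(2)
x_pred, P_pred = kalman_time_update(x_est, P_est, A, B, np.array([1.0]), G, V)
x_est, P_est, K = kalman_measurement_update(x_pred, P_pred, np.array([0.8]), C, Y)

# Shorter covariance form mentioned in the footnote to (2.170).
P_short = P_pred - K @ C @ P_pred
```

At the optimal gain the two covariance forms agree algebraically; the longer form is a sum of symmetric nonnegative terms, which is why it tends to behave better under roundoff.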
2.5.3 Kalman filter in parameter estimation
Suppose that the data is generated according to

$$y(k) = \varphi^T(k)\theta + e(k) \tag{2.171}$$

where $e(k)$ is a sequence of independent Gaussian variables with zero mean and variance $\sigma^2(k)$. Suppose also that the prior distribution of $\theta$ is Gaussian with mean $\hat{\theta}_0$ and covariance $P_0$. The model (2.171) can be seen as a linear state-space model:

$$\theta(k+1) = \theta(k) \tag{2.172}$$

$$y(k) = \varphi^T(k)\theta(k) + e(k) \tag{2.173}$$

Comparing with (2.159)-(2.165) shows that these equations are identical when making the following substitutions:

$$A(k) \leftarrow I; \quad x(k) \leftarrow \theta(k) \tag{2.174}$$

$$B(k) \leftarrow 0; \quad u(k) \leftarrow 0 \tag{2.175}$$

$$G(k) \leftarrow 0; \quad v(k) \leftarrow 0; \quad V(k) \leftarrow 0 \tag{2.176}$$

$$C(k) \leftarrow \varphi^T(k); \quad y(k) \leftarrow y(k) \tag{2.177}$$

$$e(k) \leftarrow e(k); \quad Y(k) \leftarrow \sigma^2(k) \tag{2.178}$$

$$\hat{x}(0|0) \leftarrow \hat{\theta}_0; \quad P(0|0) \leftarrow P_0 \tag{2.179}$$

Footnote 9: This is a numerically better form of $P(k+1|k+1) = P(k+1|k) - K(k+1)C(k+1)P(k+1|k)$ (see [39], p. 270).
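The Kalman filter algorithm, (2.159)-(2.170), is now given by (note that $P(k) \leftarrow P(k+1|k) = P(k|k)$; $\hat{\theta}(k) \leftarrow \hat{x}(k+1|k) = \hat{x}(k|k)$)

$$K(k+1) = \frac{P(k)\varphi(k+1)}{\sigma^2(k+1) + \varphi^T(k+1)P(k)\varphi(k+1)} \tag{2.180}$$

$$\hat{\theta}(k+1) = \hat{\theta}(k) + K(k+1)\left(y(k+1) - \hat{\theta}^T(k)\varphi(k+1)\right) \tag{2.181}$$

$$P(k+1) = P(k) - K(k+1)\varphi^T(k+1)P(k) \tag{2.182}$$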
Comparing with the RLS (Algorithm 2) shows that the Kalman filter holds the RLS as its special case. Note that the initial conditions of the RLS now have a clear interpretation: $\hat{\theta}(0)$ is the prior mean and $P(0)$ is the prior covariance of the parameters $\theta$. Furthermore, the posterior distribution of $\theta$ at sample instant $k$ is also Gaussian with mean $\hat{\theta}(k)$ and covariance $P(k)$ (see [55], pp. 33-36). The Kalman filter approach also shows that the optimal weighting factor $\alpha_k$ in the least squares criterion is the inverse of the variance of the noise term, $\alpha_k = 1/\sigma^2(k)$, at least when the noise is white and Gaussian.

If the dynamics of the system are changing with time, i.e. the model parameters are time-varying, we can assume that the parameter vector varies according to

$$\theta(k+1) = \theta(k) + v(k) \tag{2.183}$$

Now $V \neq 0$ and the covariance update becomes (see (2.167)):

$$P(k+1) = P(k) - K(k+1)\varphi^T(k+1)P(k) + V \tag{2.184}$$

This prevents the covariance matrix from tending to zero. In fact $P(k) \to V$ when the number of iterations increases, and the algorithm remains alert to changes in model parameters. For example, in [23] the addition (regularization) of a constant scaled identity matrix at each sample interval is suggested, $V \propto I$. The bounded information algorithm [70] ensures both lower and upper bounds $a_{\min}$ and $a_{\max}$ for $P(k)$:

$$P(k|k) = \frac{a_{\max} - a_{\min}}{a_{\max}}P(k|k-1) + a_{\min}I \tag{2.185}$$

An advantage of the Kalman filter approach, compared to the least squares algorithm with exponential forgetting, is that the nature of the parameter changes can be easily incorporated, and interpreted as the covariance matrix $V$.
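The effect of a nonzero $V$ on the covariance can be sketched as below. The function name `kf_param_step`, the random regressors and the value $V = 10^{-3} I$ are illustrative assumptions, not values taken from the text.

```python
import numpy as np


def kf_param_step(theta, P, phi, y, sigma2=1.0, V=None):
    """Kalman filter parameter update, (2.180)-(2.182).

    If V is given, it is added to the covariance as in (2.184).
    """
    P_phi = P @ phi
    K = P_phi / (sigma2 + phi @ P_phi)
    theta_new = theta + K * (y - theta @ phi)
    P_new = P - np.outer(K, phi @ P)
    if V is not None:
        P_new = P_new + V
    return theta_new, P_new


rng = np.random.default_rng(2)
theta_true = np.array([1.0, -0.5])
V = 1e-3 * np.eye(2)                   # assumed random-walk covariance

theta_a, P_a = np.zeros(2), 10.0 * np.eye(2)   # constant-parameter model
theta_b, P_b = np.zeros(2), 10.0 * np.eye(2)   # random-walk model
for k in range(500):
    phi = rng.normal(size=2)
    y = theta_true @ phi + rng.normal()
    theta_a, P_a = kf_param_step(theta_a, P_a, phi, y)
    theta_b, P_b = kf_param_step(theta_b, P_b, phi, y, V=V)
```

Without $V$ the covariance keeps shrinking as data accumulates, whereas with $V$ added at every step it cannot fall below $V$, so the gain stays away from zero.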
Chapter 3 Linear Dynamic Systems
In this chapter, our attention is focused on the discrete-time black-box modeling of linear dynamic systems. This type of model is commonly used in process identification, and is essential in digital process control. From the point of view of control, the simplicity of black-box models has established them as a fundamental means for obtaining input-output representations of processes. The transfer function approach provides a basic tool for representing dynamic systems. Stochastic disturbance models provide a tool for characterizing (unmeasured) disturbances, present in all real systems.
3.1 Transfer function

Let us first consider two commonly used transfer function (see footnote 1) representations of process dynamics.

3.1.1 Finite impulse response

A finite impulse response (FIR) system is given by

$$y(k) = B(q^{-1})u(k-d) \tag{3.1}$$

where

Footnote 1: In order to avoid unnecessary complexity in notation and terminology, the backward shift operator notation, $q^{-1}$, will be used, $x(k-i) = q^{-i}x(k)$. Strictly speaking, a division of two polynomials in $q^{-1}$ is not meaningful (whereas the division of two functions in $z^{-1}$ is). However, the reader should consider this transfer operation as a symbolic notation (or as an equivalent $z$ transform). With this loose terminology, we allow ourselves to use the term 'transfer function' for descriptions that use polynomials in the backward shift operator.
• $\{y(k)\}$ is a sequence of system outputs, and
• $\{u(k)\}$ is a sequence of system inputs,

sampled from the process at instants $k = 1, 2, \ldots$, which are usually equidistant time intervals:

$$kT = t \tag{3.2}$$

where $t$ is the time and $T$ is the sampling interval (e.g., in seconds). The process is characterized by

$$B(q^{-1}) = b_0 + b_1q^{-1} + \ldots + b_{n_B}q^{-n_B} \tag{3.3}$$

which is a polynomial in the backward shift operator $q^{-1}$:

$$q^{-1}x(k) = x(k-1) \tag{3.4}$$

$d$ is the time delay (in sampling instants) between process input and output. The system behavior is determined by its coefficients or parameters $b_n$, $n = 0, 1, 2, \ldots, n_B$, $b_n \in \Re$.

FIR structures are among the simplest used for describing dynamic processes. They involve:

• no complex calculations, and
• no assumption on the process order.

The parameters can be obtained directly from the elements of the impulse response of the system. The choice of $n_B$ and $d$ is less critical, if chosen large enough and small enough, respectively. The disadvantages of FIR are that:

• unstable processes cannot be modelled,
• a large number of parameters need to be estimated (especially for processes containing slow modes, i.e. slow dynamics).
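The FIR structure can be simulated directly from its coefficients. The following sketch uses zero-based sample indices and assumes that inputs before the first sample are zero; the function name `fir_output` and the coefficient values are illustrative assumptions.

```python
import numpy as np


def fir_output(b, u, d=1):
    """Simulate y(k) = sum_n b[n] * u(k - d - n) with zero initial input.

    b = [b_0, ..., b_nB]; u is the input sequence; d is the time delay
    in sampling instants. Samples are indexed from zero here.
    """
    b = np.asarray(b, dtype=float)
    u = np.asarray(u, dtype=float)
    y = np.zeros(len(u))
    for k in range(len(u)):
        for n, b_n in enumerate(b):
            idx = k - d - n
            if idx >= 0:
                y[k] += b_n * u[idx]
    return y


# Unit impulse at k = 0: the response reproduces the coefficients b_n,
# shifted by the delay d.
b_example = [0.5, 0.3, 0.2]
impulse = np.zeros(8)
impulse[0] = 1.0
y_impulse = fir_output(b_example, impulse, d=1)
```

The impulse response is zero after $d + n_B$ samples, which shows why slow dynamics require many coefficients.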
Residence time

Process engineers are often confronted with the calculation of residence time in continuous flow systems (reactors, columns, etc.) [62]. The residence time is the time needed for the fluid to travel from one end of the process to the other. The residence time is a convenient time base for normalization (usually, the state variables are made dimensionless and scaled to take the value of unity at their target value). The residence time is also directly related to the efficiency and productivity of a given chemical process. Tracer tests (isotopic, etc.) are commonly used in chemical engineering for determining the residence time. An amount of tracer is fed into the process as quickly as possible (impulse input). The output is then measured and interpreted as the process impulse response. For linear systems, the residence time is directly calculated from the impulse response or from the parameters of their transfer function [97].

A linear system can be defined by its continuous-time impulse response $g(t)$. Its output equation is given by:

$$y(t) = \int_{\tau=0}^{t} g(t-\tau)u(\tau)\,d\tau \tag{3.5}$$

where $y(t)$ and $u(t)$ represent respectively the output and the input. The residence time [97] is given by:

$$T_{res} = \frac{\int_{t=0}^{\infty} t\,g(t)\,dt}{\int_{t=0}^{\infty} g(t)\,dt} \tag{3.6}$$

In a continuous flow system, the residence time can be interpreted as the expected time it takes for a molecule to pass through the flow system. The residence time can also be connected to the input-output signals without using a phenomenological model description of the considered process. Based on the concept of impulse response function, the residence time can be calculated as follows:

• Continuous-time systems:

$$T_{res} = -\frac{TF'(0)}{TF(0)} \tag{3.7}$$

where $TF(s)$ is the Laplace transform of the impulse response $g(t)$ and $TF'(\cdot)$ is the derivative of $TF(\cdot)$ with respect to $s$.
• Discrete-time systems:

$$T_{res} = \frac{\sum_{k=0}^{\infty} k\,b_k}{\sum_{k=0}^{\infty} b_k} \tag{3.8}$$

$B(z)$ represents the discrete impulse response function defined as

$$B(z) = \sum_{k=0}^{\infty} b_k z^{-k} \tag{3.9}$$
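The discrete-time residence time is a weighted mean of the sample index, with the impulse response coefficients as weights. A minimal sketch, with assumed coefficient values and an assumed sampling interval used to convert the result to time units:

```python
import numpy as np

# Assumed impulse response coefficients b_0, ..., b_4 (illustrative values).
b = np.array([0.0, 0.2, 0.5, 0.2, 0.1])
k = np.arange(len(b))

# Residence time in sampling instants: sum(k * b_k) / sum(b_k).
T_res_samples = np.sum(k * b) / np.sum(b)

# Conversion to time units with an assumed sampling interval T (seconds).
T_sampling = 30.0
T_res_seconds = T_res_samples * T_sampling
```

For these coefficients the weighted mean is 2.2 sampling instants, i.e. 66 seconds for the assumed sampling interval.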
It is easy to verify that in the discrete case the residence time is

$$T_{res} = -\frac{B'(1)}{B(1)} \tag{3.10}$$

where $B'(\cdot)$ is the derivative of $B(\cdot)$ with respect to $z$. The results concerning the calculation of the residence time for linear systems can also be extended to multidimensional continuous flow in nonlinear systems [65].

3.1.2 Transfer function
A more general structure is the transfer function (TF) structure. It holds the FIR structure as its special case.

Definition 3 (Transfer function) Transfer function is given by

$$y(k) = \frac{B(q^{-1})}{A(q^{-1})}u(k-d) \tag{3.11}$$

where $A$ is a monic polynomial of degree $n_A$

$$A(q^{-1}) = 1 + a_1q^{-1} + \ldots + a_{n_A}q^{-n_A} \tag{3.12}$$

and $B$ is a polynomial of degree $n_B$

$$B(q^{-1}) = b_0 + b_1q^{-1} + \ldots + b_{n_B}q^{-n_B} \tag{3.13}$$

where $a_n \in \Re$, $n = 1, 2, \ldots, n_A$ and $b_n \in \Re$, $n = 0, 1, \ldots, n_B$.

The main advantages of the TF model are that:
• a minimal number of parameters is required,
• both stable and unstable processes can be described.

Disadvantages include that

• an assumption of process order, $n_A$, is needed (in addition to $n_B$ and $d$),
• the prediction is more complex to compute.

Poles and zeros give a convenient way of characterizing the behavior of systems described using transfer functions. Note that switching into the $z$-transform gives

$$\frac{Y(z^{-1})}{U(z^{-1})} = z^{-d}\frac{B(z^{-1})}{A(z^{-1})} \tag{3.14}$$

$$\phantom{\frac{Y(z^{-1})}{U(z^{-1})}} = z^{-d}\frac{b_0 + b_1z^{-1} + \ldots + b_{n_B}z^{-n_B}}{1 + a_1z^{-1} + \ldots + a_{n_A}z^{-n_A}} \tag{3.15}$$

Multiplying the numerator and the denominator by $z^{n_B+n_A+d}$ gives

$$\frac{Y(z)}{U(z)} = \frac{z^{n_A}\left(b_0z^{n_B} + b_1z^{n_B-1} + \ldots + b_{n_B}\right)}{z^{n_B+d}\left(z^{n_A} + a_1z^{n_A-1} + \ldots + a_{n_A}\right)} = \frac{B(z)}{A(z)} \tag{3.16}$$

The roots of the polynomials give the poles (roots of $A(z) = 0$) and the zeros (roots of $B(z) = 0$) of the system.

Definition 4 (Poles and zeros) For a transfer function (Definition 3) the $n_B$ zeros of the system are obtained from the roots of

$$B(z) = b_0z^{n_B} + b_1z^{n_B-1} + \ldots + b_{n_B} = 0 \tag{3.17}$$

The $n_A$ poles of the system are obtained from the roots of

$$A(z) = z^{n_A} + a_1z^{n_A-1} + \ldots + a_{n_A} = 0 \tag{3.18}$$

$A(z)$ can be represented as

$$A(z) = \prod_{n=1}^{n_R}(z - p_n)\cdot\prod_{n=1}^{n_C}\left(z^2 + \alpha_nz + \beta_n\right) \tag{3.19}$$

where $p_n$ are the $n_R$ real poles and $z^2 + \alpha_nz + \beta_n$ contain the $n_C$ complex pairs of poles of the system. In a similar way, $B(z)$ can be represented as

$$B(z) = b_0\prod_{n=1}^{n_R}(z - r_n)\cdot\prod_{n=1}^{n_C}\left(z^2 + \alpha_nz + \beta_n\right) \tag{3.20}$$

where $r_n$ are the $n_R$ real zeros and $z^2 + \alpha_nz + \beta_n$ contain the $n_C$ complex pairs of zeros of the system.
The steady-state gain is obtained when $z \to 1$:

$$\lim_{z \to 1}\frac{Y(z)}{U(z)} \tag{3.21}$$

From (3.16) it is simple to derive the following result.

Algorithm 5 (Steady-state gain) The steady-state gain of a system described using a transfer function (Definition 3) is given by

$$K_{ss} = \frac{\sum_{n=0}^{n_B} b_n}{1 + \sum_{n=1}^{n_A} a_n} \tag{3.22}$$

where $K_{ss} \in \Re$ denotes the steady-state gain of the system.

Example 13 (Pole and steady-state gain) Consider the following first order system:

$$y(k) = ay(k-1) + u(k-1) \tag{3.23}$$

The system can be written as

$$\frac{Y(z^{-1})}{U(z^{-1})} = z^{-1}\frac{B(z^{-1})}{A(z^{-1})} = \frac{z^{-1}}{1 - az^{-1}} \tag{3.24}$$

and

$$\frac{Y(z)}{U(z)} = \frac{B(z)}{A(z)} = \frac{1}{z - a} \tag{3.25}$$

This system has one pole in $z = a$. The steady-state gain is given by

$$K_{ss} = \frac{1}{1 - a} \tag{3.26}$$

In general, a system is stable (see footnote 2) if all its poles are located inside the unit circle. If at least one pole is on or outside the unit circle, the system is not stable.

Example 14 (Stability) Consider the system in Example 13 with initial condition $y(0) = y_0$ and control input $u(k) = 0$. The future values of the system for $k = 1, 2, \ldots$ are given by

$$y(k) = a^ky_0 \tag{3.27}$$
If $|a| < 1$, then $y(k)$ tends to zero and the system is stable.

Footnote 2: BIBO stability: bounded input, bounded output. A system is BIBO stable if, for every bounded input, we have a bounded output.
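Poles, zeros, steady-state gain and the pole-based stability test can be computed from the polynomial coefficients. In the sketch below, the function name `poles_zeros_gain` and the value $a = 0.8$ for the first order system of Example 13 are illustrative assumptions.

```python
import numpy as np


def poles_zeros_gain(a, b):
    """Poles, zeros, steady-state gain and stability of B(q^-1)/A(q^-1).

    a = [a_1, ..., a_nA] and b = [b_0, ..., b_nB] are the coefficients
    of the polynomials in Definition 3.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    poles = np.roots(np.concatenate(([1.0], a)))   # roots of A(z) = 0
    zeros = np.roots(b)                            # roots of B(z) = 0
    gain = np.sum(b) / (1.0 + np.sum(a))           # steady-state gain (3.22)
    stable = bool(np.all(np.abs(poles) < 1.0))
    return poles, zeros, gain, stable


# Example 13 with an assumed a = 0.8: A(q^-1) = 1 - 0.8 q^-1, B = 1.
a_coef = 0.8
poles, zeros, gain, stable = poles_zeros_gain([-a_coef], [1.0])

# Example 14: free response y(k) = a * y(k-1) from y(0) = y0, u(k) = 0.
y0 = 2.0
y = np.empty(30)
y[0] = y0
for k in range(1, 30):
    y[k] = a_coef * y[k - 1]

# The same computation for an assumed unstable pole at z = 1.2.
_, _, _, stable_unstable_case = poles_zeros_gain([-1.2], [1.0])
```

The simulated free response coincides with $a^k y_0$ and decays towards zero because the single pole lies inside the unit circle.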
3.2 Deterministic disturbances
In general, a real process is always subjected to disturbances. The effects of the system environment and approximation errors are modelled as disturbance processes. Models of disturbance processes should capture the essential characteristics of the disturbances. In control, the disturbances that affect the control performance without making the resulting controller implementation uneconomical are of interest.

Consider a TF structure with a disturbance:

$$y(k) = \frac{B(q^{-1})}{A(q^{-1})}u(k-d) + \xi(k) \tag{3.28}$$

where $\xi(k)$ represents the totality of all disturbances at the output of the process. It is the sum of both deterministic and stochastic disturbances.

In some cases, deterministic disturbances are exactly predictable. Assume that the disturbances are described by the following model

$$C_\xi(q^{-1})\,\xi(k) = 0 \tag{3.29}$$

Typical exactly predictable deterministic disturbances include

• a constant

$$C_\xi(q^{-1}) = 1 - q^{-1} \tag{3.30}$$

• a sinusoid

$$C_\xi(q^{-1}) = 1 - 2q^{-1}\cos(\omega T_s) + q^{-2} \tag{3.31}$$

Example 15 (Constant deterministic disturbance) A constant disturbance gives

$$\xi(k) = \xi(k-1) \tag{3.32}$$

Thus, the effect of a disturbance at sampling instant $k-1$ remains also at instant $k$.
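That these polynomials annihilate the corresponding disturbances can be verified numerically. The amplitude, phase, frequency and sampling interval below are assumed values for illustration.

```python
import numpy as np

k = np.arange(50)

# Constant disturbance: (1 - q^-1) xi(k) = xi(k) - xi(k-1) = 0.
xi_const = 3.0 * np.ones(len(k))
res_const = xi_const[1:] - xi_const[:-1]

# Sinusoidal disturbance with assumed frequency w and sampling interval Ts:
# (1 - 2 cos(w Ts) q^-1 + q^-2) xi(k) = 0.
w, Ts = 0.3, 1.0
xi_sin = 2.0 * np.sin(w * Ts * k + 0.4)
res_sin = xi_sin[2:] - 2.0 * np.cos(w * Ts) * xi_sin[1:-1] + xi_sin[:-2]
```

Both residual sequences vanish up to roundoff, regardless of the amplitude and phase, which is what makes these disturbances exactly predictable from their own past values.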
3.3 Stochastic disturbances

The most serious difficulty in applying identification and control techniques to industrial processes is the lack of good models. The effect of the environment of the process to be modeled, and approximation errors, are modeled as disturbance processes. These disturbances are classified into two categories: measured (e.g., ambient temperature) and unmeasured (e.g., particle size distributions, or composition of raw materials). Usually, random disturbances are assumed to be stationary. Let us recall the definition of stationary processes.

Definition 5 (Stationary process) A process $\{x(k), k \in T\}$ is said to be stationary if, for any $\{k_1, k_2, \ldots, k_N\}$, i.e. any finite subset of $T$, the joint distribution of $(x(k_1+\tau), x(k_2+\tau), \ldots, x(k_N+\tau))$ does not depend upon $\tau$.

The modeling of unmeasured perturbations is based on a sequence $\{e(k)\}$ of independent random variables with zero mean, $E\{e(k)\} = 0$, and variance $\sigma^2 = 1$. These assumptions are not restrictive. In fact, a random sequence $\{b(k)\}$ such that $E\{b(k)\} = m$ and $E\{[b(k) - m]^2\} = \sigma^2$ can be expressed as a function of $e(k)$ as follows

$$b(k) = \sigma e(k) + m \tag{3.33}$$

Remark 3 (Gaussian stationary processes) The usual argument given in favor of Gaussian stationary processes hinges upon the central limit theorem. Roughly, a large number of small independent fluctuations, when averaged, give a Gaussian random variable. Notice also that linear operations upon a Gaussian process leave it Gaussian. Physically independent sources (linear systems or linear regime) of small disturbances produce Gaussian processes.

Example 16 (Fluidized bed) Consider a bubbling fluidized bed [20]. Theoretically it is possible to understand and predict the mechanism and coalescence for two or three isolated bubbles in a deterministic manner. However, we are unable to extend the deterministic model to accurately predict the behavior of a large swarm of bubbles, since we do not have exact and complete knowledge about the initial conditions (start-up of a fluidized bed) and external forces acting on the system (particle size distributions, etc.). Such a process appears to us to be stochastic, and we speak of the random coalescence and movement of the bubbles, which leads to pressure and density fluctuations.
3.3.1 Offset in noise

The following model

$$\xi(k) = C(q^{-1})e(k) \tag{3.34}$$

where $C$ is a polynomial in the backward shift operator $q^{-1}$, can be used to describe the noise affecting the plant under consideration. The model consists of a zero mean random noise sequence, $\{e(k)\}$, colored by the polynomial $C$. The offset is not modeled by (3.34). To take the offset into account, the following solution has been proposed

$$\xi(k) = C(q^{-1})e(k) + d \tag{3.35}$$

where $d$ is a constant depending on the plant operating point. However, it has been shown that even if $d$ is a constant, or slowly varying, there are inherent problems in estimating its value (appearance of 1 in the regressor, which is not a persistently exciting signal). Thus, the parameter $d$ is inherently different from the other parameters of the model.

A better solution, which does not involve the estimation of the offset, is to assume that the perturbation process has stationary increments, i.e.

$$\xi(k) = \frac{C(q^{-1})}{\Delta(q^{-1})}e(k) \tag{3.36}$$

where $\Delta$ is the difference operator

$$\Delta(q^{-1}) = 1 - q^{-1} \tag{3.37}$$

This disturbance model is more realistic. It can be interpreted as random step disturbances occurring at random intervals (e.g., sudden change of load or variation in the quality of feed flow). The model described in (3.36) corresponds to the inherent inclusion of an integrator in the closed loop system.

In general, the perturbation is described by $\frac{C}{D}$, where $D$ is a polynomial in $q^{-1}$. The choice of $D = \Delta D^*$ allows the incorporation of an explicit integral action into the design, where $D^*$ is a polynomial in $q^{-1}$. In particular, the choice $D = \Delta A$ is common. Various system structures with stochastic disturbances will be considered in the following.

3.3.2 Box-Jenkins
The representation of process dynamics is usually achieved with a disturbed linear discrete-time model. Practically all linear black-box SISO model structures can be seen as special cases of the Box-Jenkins (BJ) model structure.

Definition 6 (Box-Jenkins) Box-Jenkins (BJ) structure is given by

$$y(k) = \frac{B(q^{-1})}{A(q^{-1})}u(k-d) + \frac{C(q^{-1})}{D(q^{-1})}e(k) \tag{3.38}$$

where $\{y(k)\}$ is a sequence of process outputs, $\{u(k)\}$ is a sequence of process inputs, and $\{e(k)\}$ is a discrete white noise sequence (zero mean with finite variance $\sigma^2$), sampled from the process at instants $k = 1, 2, \ldots$;

$$A(q^{-1}) = 1 + a_1q^{-1} + \ldots + a_{n_A}q^{-n_A} \tag{3.39}$$

$$B(q^{-1}) = b_0 + b_1q^{-1} + \ldots + b_{n_B}q^{-n_B} \tag{3.40}$$

$$C(q^{-1}) = 1 + c_1q^{-1} + \ldots + c_{n_C}q^{-n_C} \tag{3.41}$$

$$D(q^{-1}) = 1 + d_1q^{-1} + \ldots + d_{n_D}q^{-n_D} \tag{3.42}$$

are polynomials in the backward shift operator $q^{-1}$, $q^{-1}x(k) = x(k-1)$.

Basically, this type of black-box system is used for four main purposes:

1. characterizing (understanding) the input-output behavior of a process,
2. predicting future responses of a plant,
3. developing control systems and tuning controllers, and
4. filtering and smoothing of signals.

Items 1-2 are related to process modeling (monitoring, fault detection, etc.) and items 2-3 to process control (controller design, especially model-based control). The fourth topic concerns signal processing (handling of measurements in process engineering).

$d$ is the time delay (in sampling instants) between the process input and the output. In process modeling, $d \ge 1$ assures causality: the process output cannot change before (or at exactly the same time as) a change in the process input occurs. $d \le 0$ is used in filtering (smoothing) signals: $d = 0$ can be used in on-line filtering to remove measurement noise; $d < 0$ can be applied only in off-line filtering (to compute the filtered signal, future values of the signal are required).
In what follows, interest is focused on process modeling, $d \ge 1$, where $d$ is not a design parameter, but depends on the time delay observed in the process to be modeled. Assume that the current sample instant is $k$, and that the following information is available: current and past process inputs $u(k), u(k-1), \ldots$ and process outputs $y(k), y(k-1), \ldots, y(k-n_A)$. Let us denote by $\hat{y}(k+1)$ the prediction of $y(k+1)$ obtained using the model. Let us assume further that

• the predictions $\hat{y}(k), \hat{y}(k-1), \ldots, \hat{y}(k - \max(n_A, n_C))$

are available as well.

In practice, an exact mathematical description of the dynamic response of an industrial process may be either impossible or impractical. The use of linear models involves a loss of information (approximation errors, neglected dynamics). When selecting a structure for a stochastic process model, an assumption on the effect of noise is made. In the following, some commonly used transfer function models (input-output model of the process) with stochastic noise models (effect of unmeasured noise on the process output) are discussed.

3.3.3 Autoregressive exogenous
A variety of real-world processes can be described well by the autoregressive (AR) model. The AR process can be viewed as the result of passing white noise through a linear all-pole filter. In the acronym ARX, the X denotes the presence of an exogenous variable.
Definition 7 (ARX structure) The ARX (autoregressive exogenous) structure is obtained by setting $C = 1$, $D = A$ in the general structure (Definition 6):
$$y(k) = \frac{B(q^{-1})}{A(q^{-1})} u(k-d) + \frac{1}{A(q^{-1})} e(k) \tag{3.43}$$
Let us rewrite the ARX system for $k+1$ and multiply by $A$:
$$A(q^{-1})\, y(k+1) = B(q^{-1})\, u(k-d+1) + e(k+1) \tag{3.44}$$
For the system output at $k+1$ we get
$$y(k+1) = B(q^{-1})\, u(k-d+1) - A_1(q^{-1})\, y(k) + e(k+1) \tag{3.45}$$
where $A = 1 + q^{-1} A_1$. Noticing that the first two terms on the right side can be calculated exactly from the available data up to time $k$, and that the noise term $e(k+1)$ will act on the process in the future, we have that
$$\hat{y}(k+1) = B(q^{-1})\, u(k-d+1) - A_1(q^{-1})\, y(k) \tag{3.46}$$
which minimizes the expected squared prediction error.³
Algorithm 6 (ARX predictor) The predictor for an ARX system (Definition 7) is given by:
$$\hat{y}(k+1) = B(q^{-1})\, u(k-d+1) - A_1(q^{-1})\, y(k) \tag{3.48}$$
where $A_1(q^{-1}) = a_1 + \cdots + a_{n_A} q^{-(n_A-1)}$. The prediction is a function of the process measurements.
³ The objective is to find a linear predictor depending on the information available up to and including $k$ which minimizes the expectation of the squared prediction error, i.e.
$$\hat{y}(k+1) = \arg\min_{\hat{y}} E\{[y(k+1) - \hat{y}]^2\} \tag{3.47}$$
where $E\{\cdot\}$ represents the conditional expectation (on the available data). Introducing (3.45) in (3.47), we have
$$E\{[y(k+1) - \hat{y}]^2\} = E\{[B(q^{-1}) u(k-d+1) - A_1(q^{-1}) y(k) + e(k+1) - \hat{y}]^2\}$$
$$= E\{[B(q^{-1}) u(k-d+1) - A_1(q^{-1}) y(k) - \hat{y}]^2\} + 2E\{[B(q^{-1}) u(k-d+1) - A_1(q^{-1}) y(k) - \hat{y}]\, e(k+1)\} + E\{e^2(k+1)\}$$
Since $e(k+1)$ is independent of $u(k-d), u(k-d-1), \ldots, y(k), y(k-1), \ldots$, and $\hat{y}$ is generated as a linear combination of these variables, the second term will be zero. The third term does not depend on the choice of $\hat{y}$, and the criterion will be minimized if the first term becomes null. This leads to (3.46).
The prediction given by the ARX structure can further be written out as scalar computations:
$$\hat{y}(k+1) = b_0 u(k-d+1) + b_1 u(k-d) + \cdots + b_{n_B} u(k-d+1-n_B) - a_1 y(k) - a_2 y(k-1) - \cdots - a_{n_A} y(k-n_A+1) \tag{3.49}$$
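The scalar form (3.49) maps directly to a few lines of code. The following is a minimal sketch, not the authors' implementation: the function name `arx_predict`, the list-based signal storage, and the convention that samples before instant 0 are zero are all assumptions made for illustration.

```python
def arx_predict(b, a, u, y, k, d=1):
    """One-step-ahead ARX prediction of y(k+1), written out as in (3.49).

    b = [b0, ..., b_nB], a = [a1, ..., a_nA]; u and y are lists indexed by
    sampling instant. Samples before instant 0 are taken as zero (assumption).
    """
    def at(seq, i):
        return seq[i] if 0 <= i < len(seq) else 0.0

    # b0*u(k-d+1) + b1*u(k-d) + ... + b_nB*u(k-d+1-nB)
    forward = sum(bj * at(u, k - d + 1 - j) for j, bj in enumerate(b))
    # a1*y(k) + a2*y(k-1) + ... + a_nA*y(k-nA+1), using measured outputs
    feedback = sum(ai * at(y, k - i) for i, ai in enumerate(a))
    return forward - feedback


# Noise-free first-order process y(k+1) = 0.5 u(k) + 0.8 y(k),
# i.e. b0 = 0.5, a1 = -0.8, d = 1 (illustrative values).
u = [1.0] * 6
y = [0.0]
for k in range(5):
    y.append(0.5 * u[k] + 0.8 * y[k])

predictions = [arx_predict([0.5], [-0.8], u, y, k) for k in range(5)]
```

With no noise and exact parameters, each prediction coincides with the next measured output, which is a quick sanity check of the sign conventions.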
Note that the predictor can be written as a linear regression
$$\hat{y}(k+1) = \theta^T \varphi(k+1) \tag{3.50}$$
where $\theta$ collects the parameters $b_0, \ldots, b_{n_B}, a_1, \ldots, a_{n_A}$ and $\varphi$ the corresponding input and output terms of (3.49), and the LS method can be used for estimating the parameters. In general, the predictor can be written as
$$\hat{y}(k+1) = f(u(k-d+1), \ldots, y(k), \ldots) \tag{3.51}$$
where $f$ is a linear function of the process inputs and past process outputs. If $f$ is a nonlinear function, these models are referred to as NARX models. The prediction is a function of the process inputs and the past (real, measured) process outputs. This prevents the model from drifting far from the true process in the case of modeling errors.
3.3.4 Output error
Definition 8 (OE structure) The output error (OE) structure is obtained by setting $C = D = 1$ in the general system (Definition 6):
$$y(k) = \frac{B(q^{-1})}{A(q^{-1})} u(k-d) + e(k) \tag{3.52}$$
In the OE system, the process output is disturbed by white noise only. Let us calculate the output of such a system at the future sampling instant $k+1$ (one-step-ahead prediction); assume that $A$ and $B$ are known. We can rewrite the OE structure for $k+1$:
$$y(k+1) = \frac{B(q^{-1})}{A(q^{-1})} u(k-d+1) + e(k+1) \tag{3.53}$$
The first term on the right side is a deterministic process, with $u(k-d+1), u(k-d), \ldots$ available. The second term is unknown (not available at instant $k$), but $\{e(k)\}$ is assumed to have zero mean. Thus, we have
$$\hat{y}(k+1) = \frac{B(q^{-1})}{A(q^{-1})} u(k-d+1) \tag{3.54}$$
where the hat in $\hat{y}$ indicates that a prediction of $y$ is considered. It is easy to show that this predictor minimizes the expected squared prediction error.⁴
Algorithm 7 (OE 1-step-ahead predictor) The predictor for an OE system (Definition 8) is given by
$$\hat{y}(k+1) = \frac{B(q^{-1})}{A(q^{-1})} u(k-d+1) \tag{3.55}$$
The predictor operates 'in parallel' with the process. Only a sequence of system inputs is required; the measured process outputs $y(k)$ are not needed. The predictor can be written in a more explicit way as
$$\hat{y}(k+1) = B(q^{-1})\, u(k-d+1) - A_1(q^{-1})\, \hat{y}(k) \tag{3.56}$$
where $A_1(q^{-1}) = a_1 + \cdots + a_{n_A} q^{-(n_A-1)}$ (containing the model coefficients corresponding to past predictions), i.e. given by
$$A(q^{-1}) = 1 + q^{-1} A_1(q^{-1}) \tag{3.57}$$
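The 'parallel' operation of (3.56) can be sketched as a simulation loop that is fed by the input sequence only. This is an illustrative sketch: the name `oe_simulate` and the zero initial condition for the prediction are assumptions, not part of the text.

```python
def oe_simulate(b, a, u, d=1):
    """Run the OE predictor (3.56) over an input sequence.

    Returns yhat, where yhat[k] is the prediction for instant k.
    The initial prediction yhat[0] = 0 and zero inputs before instant 0
    are assumptions of this sketch.
    """
    def at(seq, i):
        return seq[i] if 0 <= i < len(seq) else 0.0

    yhat = [0.0]
    for k in range(len(u) - 1):
        forward = sum(bj * at(u, k - d + 1 - j) for j, bj in enumerate(b))
        # Past *predictions* are fed back, not measured outputs.
        feedback = sum(ai * at(yhat, k - i) for i, ai in enumerate(a))
        yhat.append(forward - feedback)
    return yhat


# Step response of B = 0.5, A = 1 - 0.8 q^-1 (illustrative values).
yhat = oe_simulate([0.5], [-0.8], [1.0] * 200)
```

Since no measurement enters the recursion, output measurement noise cannot affect the prediction; the price is that nothing pulls the prediction back toward the process if the model is wrong. For this stable example the prediction settles at the static gain 0.5 / (1 - 0.8) = 2.5.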
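⁴ Let us minimize the expected squared prediction error,
$$\hat{y}(k+1) = \arg\min_{\hat{y}} E\{[y(k+1) - \hat{y}]^2\}$$
Substituting (3.53) to the above, we get
$$E\left\{\left[\frac{B(q^{-1})}{A(q^{-1})} u(k-d+1) - \hat{y}\right]^2\right\} + 2E\left\{\left[\frac{B(q^{-1})}{A(q^{-1})} u(k-d+1) - \hat{y}\right] e(k+1)\right\} + E\{e^2(k+1)\}$$
The second term will be zero. The third term does not depend on the choice of $\hat{y}$. The criterion will be minimized if the first term becomes null. This leads to (3.54) [51].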
The prediction given by the OE structure can further be written out as scalar computations:
$$\hat{y}(k+1) = b_0 u(k-d+1) + b_1 u(k-d) + \cdots + b_{n_B} u(k-d+1-n_B) - a_1 \hat{y}(k) - \cdots - a_{n_A} \hat{y}(k-n_A+1) \tag{3.58}$$
Note that the prediction has the form
$$\hat{y}(k+1) = f(u(k-d+1), \ldots, \hat{y}(k), \ldots) \tag{3.59}$$
where $f$ is a linear function (superposition) of the process inputs and past predictions. Nonlinear models are referred to as NOE models (nonlinear output error). The prediction is a function of the past predicted outputs. Notice that the output measurement noise does not affect the prediction. Notice also that we can write the predictor as $\hat{y}(k+1) = \theta^T \varphi(k)$; however, the $\hat{y}$'s in the regression vector are functions of the parameters $\theta$ (see Section 3.3.8 for how this affects the parameter estimation).
3.3.5 Other structures
A third important system structure is the ARMAX (autoregressive moving average exogenous) structure.
Definition 9 (ARMAX structure) The ARMAX structure is obtained by setting $D = A$ in the general structure (Definition 6):
$$A(q^{-1})\, y(k) = B(q^{-1})\, u(k-d) + C(q^{-1})\, e(k) \tag{3.60}$$
Let us again rewrite the system for $k+1$:
$$A(q^{-1})\, y(k+1) = B(q^{-1})\, u(k-d+1) + C(q^{-1})\, e(k+1) \tag{3.61}$$
Defining $C_1$ ($C = 1 + q^{-1} C_1$) and $A_1$ ($C$ and $A$ are monic), we can write
$$y(k+1) = -A_1(q^{-1})\, y(k) + B(q^{-1})\, u(k-d+1) + e(k+1) + C_1(q^{-1})\, e(k) \tag{3.62}$$
Taking into account that the random variable $e(k+1)$ will act on the process (system) in the future, we obtain an expression of the ARMAX predictor
$$\hat{y}(k+1) = -A_1(q^{-1})\, y(k) + B(q^{-1})\, u(k-d+1) + C_1(q^{-1})\, e(k) \tag{3.63}$$
which is the prediction minimizing the expected squared error.⁵ In view of (3.62), it follows that the prediction error is equal to
$$e(k) = y(k) - \hat{y}(k) \tag{3.64}$$
The past noise terms can be calculated from data. Alternatively, we can obtain the expression from (3.60) for computing past noise terms
$$e(k) = \frac{A(q^{-1})}{C(q^{-1})} y(k) - \frac{B(q^{-1})}{C(q^{-1})} u(k-d) \tag{3.65}$$
Algorithm 8 (ARMAX predictor) The predictor for an ARMAX structure (Definition 9) is given by:
$$\hat{y}(k+1) = -A_1(q^{-1})\, y(k) + B(q^{-1})\, u(k-d+1) + C_1(q^{-1})\, e(k) \tag{3.66}$$
where $e(k) = y(k) - \hat{y}(k)$. It is a function of three terms: the measurements, system inputs, and known errors.
Substituting (3.64) to (3.63) and reorganizing, we can see that the prediction can be written as
$$\hat{y}(k+1) = \left(C_1(q^{-1}) - A_1(q^{-1})\right) y(k) + B(q^{-1})\, u(k-d+1) - C_1(q^{-1})\, \hat{y}(k) \tag{3.67}$$
which has the form
$$\hat{y}(k+1) = f(u(k-d+1), \ldots, y(k), \ldots, \hat{y}(k), \ldots) \tag{3.68}$$
where $f$ is a linear function of the process inputs, past process outputs, and past predictions; nonlinear models are referred to as NARMAX models.
⁵ Let us minimize the expected squared prediction error. Let the ARMAX system be given by (3.62). We have that
$$J = E\{[y(k+1) - \hat{y}]^2\} = E\{[-A_1(q^{-1}) y(k) + B(q^{-1}) u(k-d+1) + C_1(q^{-1}) e(k) + e(k+1) - \hat{y}]^2\}$$
Reorganizing gives
$$J = E\{[-A_1(q^{-1}) y(k) + B(q^{-1}) u(k-d+1) + C_1(q^{-1}) e(k) - \hat{y}]^2\} + E\{e^2(k+1)\} + 2E\{e(k+1)\,[-A_1(q^{-1}) y(k) + B(q^{-1}) u(k-d+1) + C_1(q^{-1}) e(k) - \hat{y}]\}$$
where the last term vanishes since $e(k+1)$ is independent of all previous observations. The minimum of $J$ is obtained at (3.63).
Another important form is obtained by rewriting the noise term
$$\xi(k+1) = \frac{C(q^{-1})}{A(q^{-1})} e(k+1) \tag{3.69}$$
Using the definitions for $C_1$ and $A_1$,
$$\xi(k+1) + A_1(q^{-1})\, \xi(k) = e(k+1) + C_1(q^{-1})\, e(k) \tag{3.70}$$
and $\xi(k) = \frac{C(q^{-1})}{A(q^{-1})} e(k)$ from (3.69), we have
$$\xi(k+1) = e(k+1) + \left[C_1(q^{-1}) - A_1(q^{-1}) \frac{C(q^{-1})}{A(q^{-1})}\right] e(k) \tag{3.71}$$
From (3.65) we get an expression for the past noise terms. Substituting (3.65) to (3.71) and reorganizing, we have for the noise term
$$\xi(k+1) = e(k+1) + \frac{C_1(q^{-1}) - A_1(q^{-1})}{C(q^{-1})} \left[y(k) - \frac{B(q^{-1})}{A(q^{-1})} u(k-d)\right] \tag{3.72}$$
Substituting (3.72) for the noise term, we obtain another expression for the ARMAX predictor.
Algorithm 9 (ARMAX predictor: continued) The predictor for an ARMAX structure (Definition 9) is given by:
$$\hat{y}(k+1) = \frac{B(q^{-1})}{A(q^{-1})} u(k+1-d) + \frac{C_1(q^{-1}) - A_1(q^{-1})}{C(q^{-1})} \left[y(k) - \frac{B(q^{-1})}{A(q^{-1})} u(k-d)\right] \tag{3.73}$$
Thus the ARMAX predictor can be seen as consisting of an OE predictor and a correction term.
Example 17 (ARMA) Let us consider the following stochastic process [2]
$$y(k) + a\, y(k-1) = e(k) + c\, e(k-1) \tag{3.74}$$
where $\{e(k)\}$ is a sequence of equally distributed normal random variables with zero mean. The process can be written as
$$y(k) = \frac{1 + c q^{-1}}{1 + a q^{-1}} e(k) \tag{3.75}$$
Consider the situation at sampling instant $k$ when $y(k), y(k-1), \ldots$ are observed and we want to determine $y(k+1)$. (3.75) gives
$$y(k+1) = \frac{1 + c q^{-1}}{1 + a q^{-1}} e(k+1) \tag{3.76}$$
$$= e(k+1) + \frac{c - a}{1 + a q^{-1}} e(k) \tag{3.77}$$
The term $e(k+1)$ is independent of all observations. The last term is a linear combination of $e(k), e(k-1), \ldots$, to be computed from the available data:
$$e(k) = \frac{1 + a q^{-1}}{1 + c q^{-1}} y(k) \tag{3.78}$$
Eliminating $e(k)$ from (3.77), we obtain
$$y(k+1) = e(k+1) + \frac{c - a}{1 + c q^{-1}} y(k) \tag{3.79}$$
The problem now is to find the prediction $\hat{y}(k+1)$ of $y(k+1)$, based on the available data at instant $k$, such that the criterion
$$J = E\{\varepsilon^2(k+1)\} \tag{3.80}$$
is minimized, where $\varepsilon(k+1)$ is the prediction error
$$\varepsilon(k+1) = y(k+1) - \hat{y}(k+1) \tag{3.81}$$
Equations (3.79)-(3.81) lead to
$$J = E\{e^2(k+1)\} + E\left\{\left[\frac{c - a}{1 + c q^{-1}} y(k) - \hat{y}\right]^2\right\} + 2E\left\{e(k+1) \left[\frac{c - a}{1 + c q^{-1}} y(k) - \hat{y}\right]\right\} \tag{3.82}$$
As $e(k+1)$ is independent of the observations available at instant $k$, it follows that the last term vanishes. Hence, we can write
$$J = E\{\varepsilon^2(k+1)\} \ge E\{e^2(k+1)\} \tag{3.83}$$
where the equality is obtained for
$$\hat{y} = \hat{y}(k+1) = \frac{c - a}{1 + c q^{-1}} y(k) \tag{3.84}$$
The prediction error is given by $\varepsilon(k+1) = e(k+1)$.
Example 18 (ARMA: continued) Let us obtain the same result using Algorithm 9. From the system given by (3.75) we get
$$C(q^{-1}) = 1 + c q^{-1} \tag{3.85}$$
$$B(q^{-1}) = 0 \tag{3.86}$$
$$A(q^{-1}) = 1 + a q^{-1} \tag{3.87}$$
Using Algorithm 9 we get
$$\hat{y}(k+1) = \frac{C_1(q^{-1}) - A_1(q^{-1})}{C(q^{-1})} y(k)$$
Substituting $C$, $C_1 = c$ and $A_1 = a$ gives
$$\hat{y}(k+1) = \frac{c - a}{1 + c q^{-1}} y(k) \tag{3.88}$$
Definition 10 (ARIMAX structure) The ARIMAX (autoregressive integral moving average exogenous) structure is obtained by setting $D = \Delta A$ in the general structure (Definition 6):
$$y(k) = \frac{B(q^{-1})}{A(q^{-1})} u(k-d) + \frac{C(q^{-1})}{\Delta A(q^{-1})} e(k) \tag{3.89}$$
where $\Delta = 1 - q^{-1}$. Multiplying (3.89) by $\Delta A$, reorganizing, and shifting to $k+1$ gives
$$\Delta A(q^{-1})\, y(k+1) = B(q^{-1})\, \Delta u(k-d+1) + C(q^{-1})\, e(k+1) \tag{3.90}$$
The ARIMAX system can be seen as an ARMAX process, where $A(q^{-1}) \leftarrow \Delta A(q^{-1})$ and $u(k) \leftarrow \Delta u(k)$. Then, using Algorithm 8, we have the predictor.
Algorithm 10 (ARIMAX predictor) The predictor for an ARIMAX system is given by
$$\hat{y}(k+1) = -[\Delta A]_1(q^{-1})\, y(k) + B(q^{-1})\, \Delta u(k-d+1) + C_1(q^{-1})\, e(k) \tag{3.91}$$
where $e(k) = y(k) - \hat{y}(k)$ and $\Delta A = 1 + q^{-1} [\Delta A]_1$.
In the ARIMAX process, the noise (filtered by $C$) is integrated to the process output, which makes it possible to model disturbances of random-walk type. The ARIMAX model (also referred to as the CARIMA model) is used in Generalized Predictive Control (GPC). Due to the integral term present in the noise model, an additional integral-of-error term is not needed in the controller.
3.3.6 Diophantine equation
Prediction is intimately related to the separation of available and unavailable data. This separation is performed by the Diophantine equation, which will be presented next. The Diophantine equation
$$\frac{X(q^{-1})}{Y(q^{-1})} = E_i(q^{-1}) + q^{-i} \frac{F_i(q^{-1})}{Y(q^{-1})} \tag{3.92}$$
is used for separating a transfer operator into future and known parts (unavailable and available information). The solution to this equation will be needed in the next sections.
Equation (3.92) can be solved in a recursive fashion, so that the polynomials $E_{i+1}$ and $F_{i+1}$ are obtained given the values of $E_i$ and $F_i$. In the following, this recursive solution will be derived. Let us assume that $Y$ is monic. Hence, the polynomials are given by
$$Y(q^{-1}) = 1 + y_1 q^{-1} + \cdots + y_{n_Y} q^{-n_Y} \tag{3.93}$$
$$X(q^{-1}) = x_0 + x_1 q^{-1} + \cdots + x_{n_X} q^{-n_X} \tag{3.94}$$
$$E_i(q^{-1}) = e_{i,0} + e_{i,1} q^{-1} + \cdots + e_{i,n_{E_i}} q^{-n_{E_i}} \tag{3.95}$$
$$F_i(q^{-1}) = f_{i,0} + f_{i,1} q^{-1} + \cdots + f_{i,n_{F_i}} q^{-n_{F_i}} \tag{3.96}$$
Consider two Diophantine equations
$$X(q^{-1}) = Y(q^{-1})\, E_{i+1}(q^{-1}) + q^{-(i+1)} F_{i+1}(q^{-1}) \tag{3.97}$$
$$X(q^{-1}) = Y(q^{-1})\, E_i(q^{-1}) + q^{-i} F_i(q^{-1}) \tag{3.98}$$
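Subtracting (3.98) from (3.97) yields
$$0 = Y(q^{-1}) \left[E_{i+1}(q^{-1}) - E_i(q^{-1})\right] + q^{-i} \left[q^{-1} F_{i+1}(q^{-1}) - F_i(q^{-1})\right] \tag{3.99}$$
The polynomial $E_{i+1} - E_i$ can be split into two parts (by simply taking out one element)
$$E_{i+1}(q^{-1}) - E_i(q^{-1}) = \tilde{E}(q^{-1}) + e_{i+1,i}\, q^{-i} \tag{3.100}$$
Substituting (3.100) into (3.99) gives
$$0 = Y(q^{-1}) \left[\tilde{E}(q^{-1}) + e_{i+1,i}\, q^{-i}\right] + q^{-i} \left[q^{-1} F_{i+1}(q^{-1}) - F_i(q^{-1})\right] \tag{3.101}$$
$$= Y(q^{-1})\, \tilde{E}(q^{-1}) + q^{-i} \left[q^{-1} F_{i+1}(q^{-1}) - F_i(q^{-1}) + Y(q^{-1})\, e_{i+1,i}\right] \tag{3.102}$$
Hence, it follows that
$$\tilde{E}(q^{-1}) = 0 \tag{3.103}$$
and
$$q^{-1} F_{i+1}(q^{-1}) - F_i(q^{-1}) + Y(q^{-1})\, e_{i+1,i} = 0 \tag{3.104}$$
In order to derive the coefficients of the polynomial $q^{-1} F_{i+1}$, let us rewrite this equation into the following form:
$$q^{-1}\left[f_{i+1,0} + f_{i+1,1} q^{-1} + \cdots + f_{i+1,n_{F_{i+1}}} q^{-n_{F_{i+1}}}\right] - \left[f_{i,0} + f_{i,1} q^{-1} + \cdots + f_{i,n_{F_i}} q^{-n_{F_i}}\right] + \left[1 + y_1 q^{-1} + \cdots + y_{n_Y} q^{-n_Y}\right] e_{i+1,i} = 0 \tag{3.105}$$
Finally, we obtain
$$e_{i+1,i} = f_{i,0} \tag{3.106}$$
$$f_{i+1,0} = f_{i,1} - y_1 e_{i+1,i}$$
$$f_{i+1,1} = f_{i,2} - y_2 e_{i+1,i}$$
$$\vdots$$
$$f_{i+1,j} = f_{i,j+1} - y_{j+1} e_{i+1,i} \tag{3.107}$$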
where (3.107) is for $j = 0, 1, \ldots$ Thus, a recursive formula for computing $F_{i+1}$ was obtained. Using (3.100) and (3.103), we also obtain a recursive formula for $E_{i+1}$:
$$E_{i+1}(q^{-1}) = E_i(q^{-1}) + e_{i+1,i}\, q^{-i} \tag{3.108}$$
Now all that is needed are the initial values $E_1$ and $F_1$ for the recursive formula. Setting $i = 1$ in (3.92) gives
$$\frac{X(q^{-1})}{Y(q^{-1})} = E_1(q^{-1}) + q^{-1} \frac{F_1(q^{-1})}{Y(q^{-1})} \tag{3.109}$$
$$X(q^{-1}) = E_1(q^{-1})\, Y(q^{-1}) + q^{-1} F_1(q^{-1}) \tag{3.110}$$
Since $Y$ is monic, we get
$$E_1(q^{-1}) = x_0 \tag{3.111}$$
and substituting (3.111) into (3.110) gives
$$F_1(q^{-1}) = q \left[X(q^{-1}) - x_0 Y(q^{-1})\right] \tag{3.112}$$
The Diophantine equation (3.92) for (3.93)-(3.96) can thus be solved starting from the initial values $E_1$ and $F_1$ given by (3.111) and (3.112). The solutions $E_i$ and $F_i$, $i = 2, 3, \ldots$, are then obtained recursively using (3.106), (3.107), and (3.108) with $i = 1, 2, 3, \ldots$
Algorithm 11 (Solution of the Diophantine equation) The solution of the Diophantine equation
$$\frac{X(q^{-1})}{Y(q^{-1})} = E_i(q^{-1}) + q^{-i} \frac{F_i(q^{-1})}{Y(q^{-1})} \tag{3.113}$$
where
$$Y(q^{-1}) = 1 + y_1 q^{-1} + \cdots + y_{n_Y} q^{-n_Y} \tag{3.114}$$
$$X(q^{-1}) = x_0 + x_1 q^{-1} + \cdots + x_{n_X} q^{-n_X} \tag{3.115}$$
$$E_i(q^{-1}) = e_{i,0} + e_{i,1} q^{-1} + \cdots + e_{i,n_{E_i}} q^{-n_{E_i}} \tag{3.116}$$
$$F_i(q^{-1}) = f_{i,0} + f_{i,1} q^{-1} + \cdots + f_{i,n_{F_i}} q^{-n_{F_i}} \tag{3.117}$$
$n_Y > 0$, can be computed recursively using
$$E_1(q^{-1}) = x_0 \tag{3.118}$$
$$F_1(q^{-1}) = q \left[X(q^{-1}) - x_0 Y(q^{-1})\right] \tag{3.119}$$
and for $i = 1, 2, \ldots$ and $j = 0, 1, \ldots, \max(n_X - i, n_Y - 1)$
$$e_{i+1,i} = f_{i,0} \tag{3.120}$$
$$f_{i+1,j} = f_{i,j+1} - y_{j+1}\, e_{i+1,i} \tag{3.121}$$
$$E_{i+1}(q^{-1}) = E_i(q^{-1}) + e_{i+1,i}\, q^{-i} \tag{3.122}$$
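The recursion (3.118)-(3.122) can be sketched in a few lines. This is a sketch under stated assumptions, not the authors' code: the function name `diophantine`, the coefficient-list representation (index $j$ holds the coefficient of $q^{-j}$), and the choice to keep $F_i$ zero-padded to a fixed length are all illustrative.

```python
def diophantine(x, y, n_steps):
    """Solve X/Y = E_i + q^-i F_i / Y for i = 1, ..., n_steps.

    x = [x0, x1, ...], y = [1, y1, ...] (Y monic). Returns a list of
    (E_i, F_i) coefficient lists. F_i is kept zero-padded to a fixed
    length, so trailing zeros may appear (a choice of this sketch).
    """
    if y[0] != 1:
        raise ValueError("Y must be monic")
    n = max(len(x), len(y))
    xp = list(x) + [0.0] * (n - len(x))
    yp = list(y) + [0.0] * (n - len(y))

    # Initial values (3.118)-(3.119): E_1 = x0, F_1 = q [X - x0 Y]
    E = [xp[0]]
    F = [xp[j + 1] - xp[0] * yp[j + 1] for j in range(n - 1)]
    solutions = [(E[:], F[:])]

    for _ in range(1, n_steps):
        e_new = F[0] if F else 0.0                          # (3.120)
        F = [(F[j + 1] if j + 1 < len(F) else 0.0) - yp[j + 1] * e_new
             for j in range(n - 1)]                         # (3.121)
        E = E + [e_new]                                     # (3.122)
        solutions.append((E[:], F[:]))
    return solutions


# Illustrative case: X = 1 + 0.5 q^-1, Y = 1 - 0.8 q^-1
steps = diophantine([1.0, 0.5], [1.0, -0.8], 3)
E3, F3 = steps[2]
```

For this example the recursion gives $E_3 = 1 + 1.3 q^{-1} + 1.04 q^{-2}$ and $F_3 = 0.832$, and multiplying out $Y E_3 + q^{-3} F_3$ recovers $X$. The coefficients of $E_i$ are simply the first $i$ terms of the long division of $X$ by $Y$.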
The degrees of the polynomials are given by
$$n_{E_i} = i - 1 \tag{3.123}$$
$$n_{F_i} = \max(n_X - i, n_Y - 1) \tag{3.124}$$
3.3.7 i-step-ahead predictions
Let us consider a Box-Jenkins structure (Definition 6)
$$y(k) = \frac{B(q^{-1})}{A(q^{-1})} u(k-d) + \frac{C(q^{-1})}{D(q^{-1})} e(k) \tag{3.125}$$
where the disturbance is given by
$$\xi(k) = \frac{C(q^{-1})}{D(q^{-1})} e(k) \tag{3.126}$$
and let us calculate a 'one-step' algorithm for obtaining i-step-ahead predictions (see [88]). Thus, we wish to have a prediction $\hat{y}(k+i)$ for the plant output $y(k+i)$, provided with information up to instant $k$: $y(k), y(k-1), \ldots$, $u(k), u(k-1), \ldots$ and $\hat{y}(k), \hat{y}(k-1), \ldots$ Observe that the future output values
$$y(k+i) = \frac{B(q^{-1})}{A(q^{-1})} u(k+i-d) + \frac{C(q^{-1})}{D(q^{-1})} e(k+i) \tag{3.127}$$
can only be predicted with uncertainty, since the future noise terms $e(k+1), e(k+2), \ldots, e(k+i)$ are unknown. The minimization of this uncertainty is the objective of the predictor design problem. This is a crucial issue in predictive control, to be discussed in later chapters.
Separation of disturbance
Let us start by separating unknown (future) terms and known terms by introducing the Diophantine equation for the disturbance process
$$\frac{C(q^{-1})}{D(q^{-1})} = E_i(q^{-1}) + q^{-i} \frac{F_i(q^{-1})}{D(q^{-1})} \tag{3.128}$$
where
$$\deg E_i(q^{-1}) = \begin{cases} i - 1 & \text{if } n_D > 0 \\ \min(i-1, n_C) & \text{otherwise} \end{cases} \tag{3.129}$$
$$\deg F_i(q^{-1}) = \max(n_C - i, n_D - 1) \tag{3.130}$$
The disturbance at $k+i$ can be decomposed into unknown (future) and known (current and past) parts
$$\xi(k+i) = E_i(q^{-1})\, e(k+i) + \frac{F_i(q^{-1})}{D(q^{-1})} e(k) \tag{3.131}$$
The polynomials $E_i$ and $F_i$ are usually solved recursively (see Section 3.3.6). Assume that the solutions $E_i$ and $F_i$ are available. The second term on the right side can be computed by multiplying (3.125) by $F_i(q^{-1}) / C(q^{-1})$
$$\frac{F_i(q^{-1})}{C(q^{-1})} y(k) = \frac{F_i(q^{-1})\, B(q^{-1})}{C(q^{-1})\, A(q^{-1})} u(k-d) + \frac{F_i(q^{-1})}{D(q^{-1})} e(k) \tag{3.132}$$
and rearranging
$$\frac{F_i(q^{-1})}{D(q^{-1})} e(k) = \frac{F_i(q^{-1})}{C(q^{-1})} \left[y(k) - \frac{B(q^{-1})}{A(q^{-1})} u(k-d)\right] \tag{3.133}$$
The process output i steps ahead then becomes
$$y(k+i) = q^{-d} \frac{B(q^{-1})}{A(q^{-1})} u(k+i) + \frac{F_i(q^{-1})}{C(q^{-1})} \left[y(k) - \frac{B(q^{-1})}{A(q^{-1})} u(k-d)\right] + E_i(q^{-1})\, e(k+i) \tag{3.134}$$
The third term depends on future noise terms $e(k+i)$, which are unknown. However, $\{e(k)\}$ was assumed to have zero mean, and we can take the conditional expectation of $y(k+i)$, given all data up to $k$ and the future process inputs. The best i-step-ahead predictor (in the sense that the variance of the prediction error is minimal) then becomes
$$\hat{y}(k+i) = q^{-d} \frac{B(q^{-1})}{A(q^{-1})} u(k+i) + \frac{F_i(q^{-1})}{C(q^{-1})} \left[y(k) - \frac{B(q^{-1})}{A(q^{-1})} u(k-d)\right] \tag{3.135}$$
Notice that (3.135) represents the i-step-ahead prediction as a function of system inputs and prediction errors. The prediction error for the i'th predictor is given by
$$\varepsilon(k+i) = y(k+i) - \hat{y}(k+i) = E_i(q^{-1})\, e(k+i) \tag{3.136}$$
which consists of future noise only (white noise with zero mean and variance $\sigma^2$). The variance is given by
$$E\{\varepsilon^2(k+i)\} = \sigma^2 \sum_{j=0}^{i-1} e_{i,j}^2 \tag{3.137}$$
where $e_{i,j}$ is the j'th element of $E_i$. Thus, the variance of the prediction error is minimal.
Let us continue a bit further and write (3.135) strictly as a function of system inputs and past outputs. Multiplying both sides of the Diophantine equation (3.128) by $BD/AC$ we obtain
$$\frac{B(q^{-1})}{A(q^{-1})} = \frac{B(q^{-1})\, D(q^{-1})\, E_i(q^{-1})}{A(q^{-1})\, C(q^{-1})} + q^{-i} \frac{B(q^{-1})\, F_i(q^{-1})}{A(q^{-1})\, C(q^{-1})} \tag{3.138}$$
which with (3.135) yields:
$$\hat{y}(k+i) = q^{-d} \left[\frac{B(q^{-1})\, D(q^{-1})\, E_i(q^{-1})}{A(q^{-1})\, C(q^{-1})} + q^{-i} \frac{B(q^{-1})\, F_i(q^{-1})}{A(q^{-1})\, C(q^{-1})}\right] u(k+i) + \frac{F_i(q^{-1})}{C(q^{-1})} \left[y(k) - \frac{B(q^{-1})}{A(q^{-1})} u(k-d)\right] \tag{3.139}$$
Simple algebraic calculations lead to the following i-step-ahead predictor.
Algorithm 12 (i-step-ahead BJ predictor) The i-step-ahead predictor for a Box-Jenkins system (Definition 6) is given by
$$\hat{y}(k+i) = q^{-d} \frac{B(q^{-1})\, D(q^{-1})\, E_i(q^{-1})}{A(q^{-1})\, C(q^{-1})} u(k+i) + \frac{F_i(q^{-1})}{C(q^{-1})} y(k) \tag{3.140}$$
where $E_i$ and $F_i$ are obtained from the Diophantine equation
$$\frac{C(q^{-1})}{D(q^{-1})} = E_i(q^{-1}) + q^{-i} \frac{F_i(q^{-1})}{D(q^{-1})} \tag{3.141}$$
Example 19 (1-step-ahead OE predictor) Let us derive the one-step-ahead predictor for an OE system (Definition 8). The Diophantine equation becomes
$$1 = E_1(q^{-1}) + q^{-1} F_1(q^{-1}) \tag{3.142}$$
for which the solution is
$$E_1(q^{-1}) = 1 \tag{3.143}$$
$$F_1(q^{-1}) = 0 \tag{3.144}$$
The predictor becomes
$$\hat{y}(k+1) = \frac{B(q^{-1})}{A(q^{-1})} u(k-d+1) \tag{3.145}$$
If $A$ is a factor of $D$, numerical problems may occur (notice that in the ARX and ARMAX structures $D = A$, and in the ARIMAX structure $D = \Delta A$). To avoid these problems, let us rewrite the algorithm for this particular case.
Algorithm 13 (i-step-ahead BJ predictor: continued) If $A$ is a factor of $D$, denote
$$D(q^{-1}) = D_1(q^{-1})\, A(q^{-1}) \tag{3.146}$$
The i-step-ahead predictor for a Box-Jenkins system (Definition 6) is then given by
$$\hat{y}(k+i) = q^{-d} \frac{B(q^{-1})\, D_1(q^{-1})\, E_i(q^{-1})}{C(q^{-1})} u(k+i) + \frac{F_i(q^{-1})}{C(q^{-1})} y(k) \tag{3.147}$$
where $E_i$ and $F_i$ are obtained from the Diophantine equation
$$\frac{C(q^{-1})}{D(q^{-1})} = E_i(q^{-1}) + q^{-i} \frac{F_i(q^{-1})}{D(q^{-1})} \tag{3.148}$$
Example 20 (1-step-ahead ARX predictor) Let us derive a one-step-ahead predictor for an ARX system (Definition 7). Since $A = D$, $D_1 = 1$. The Diophantine equation becomes
$$\frac{1}{A(q^{-1})} = E_1(q^{-1}) + q^{-1} \frac{F_1(q^{-1})}{A(q^{-1})} \tag{3.149}$$
The solution for the Diophantine equation is given by
$$E_1(q^{-1}) = 1 \tag{3.150}$$
$$F_1(q^{-1}) = q \left[1 - A(q^{-1})\right] = -A_1(q^{-1}) \tag{3.151}$$
The predictor becomes
$$\hat{y}(k+1) = B(q^{-1})\, u(k-d+1) - A_1(q^{-1})\, y(k) \tag{3.152}$$
Separation of inputs
In control, the future process inputs are of interest (they are to be determined by the controller). The input term in (3.140) can be further separated into future and known parts using a Diophantine equation:
$$\frac{B(q^{-1})\, D(q^{-1})\, E_i(q^{-1})}{A(q^{-1})\, C(q^{-1})} = G_i(q^{-1}) + q^{-i+d} \frac{H_i(q^{-1})}{A(q^{-1})\, C(q^{-1})} \tag{3.153}$$
which gives the algorithm for the i-step-ahead prediction.
Algorithm 14 (i-step-ahead BJ predictor: continued) Using a model with separated available and unavailable information, the i-step-ahead prediction is given by
$$\hat{y}(k+i) = G_i(q^{-1})\, u(k+i-d) + \frac{H_i(q^{-1})}{A(q^{-1})\, C(q^{-1})} u(k) + \frac{F_i(q^{-1})}{C(q^{-1})} y(k) \tag{3.154}$$
where $E_i$ and $F_i$ are obtained from the Diophantine equation
$$\frac{C(q^{-1})}{D(q^{-1})} = E_i(q^{-1}) + q^{-i} \frac{F_i(q^{-1})}{D(q^{-1})} \tag{3.155}$$
and $H_i$ and $G_i$ are obtained from the Diophantine equation
$$\frac{B(q^{-1})\, D(q^{-1})\, E_i(q^{-1})}{A(q^{-1})\, C(q^{-1})} = G_i(q^{-1}) + q^{-i+d} \frac{H_i(q^{-1})}{A(q^{-1})\, C(q^{-1})} \tag{3.156}$$
Finally, let us give the corresponding algorithm for the case of having $A$ as a factor of $D$.
Algorithm 15 (i-step-ahead BJ predictor: continued) Consider a model with separated available and unavailable information and where $A$ is a factor of $D$. Denote
$$D(q^{-1}) = D_1(q^{-1})\, A(q^{-1}) \tag{3.157}$$
The i-step-ahead predictor for a Box-Jenkins system (Definition 6) is then given by
$$\hat{y}(k+i) = G_i(q^{-1})\, u(k+i-d) + \frac{H_i(q^{-1})}{C(q^{-1})} u(k) + \frac{F_i(q^{-1})}{C(q^{-1})} y(k) \tag{3.158}$$
where $E_i$ and $F_i$ are obtained from the Diophantine equation
$$\frac{C(q^{-1})}{D(q^{-1})} = E_i(q^{-1}) + q^{-i} \frac{F_i(q^{-1})}{D(q^{-1})} \tag{3.159}$$
and $H_i$ and $G_i$ are obtained from the Diophantine equation
$$\frac{B(q^{-1})\, D_1(q^{-1})\, E_i(q^{-1})}{C(q^{-1})} = G_i(q^{-1}) + q^{-i+d} \frac{H_i(q^{-1})}{C(q^{-1})} \tag{3.160}$$
3.3.8 Remarks
Let us conclude this chapter by making a few remarks concerning the practical use of the stochastic time-series models.
Incremental estimation
In practice, differencing of data is often preferred, i.e. working with signals $\Delta y(k)$ and $\Delta u(k)$, where $\Delta = 1 - q^{-1}$. However, differencing data with high-frequency noise components degrades the signal-to-noise ratio. A simple way to overcome this is appropriate signal filtering.
Gradients
The estimation of the parameters in the polynomials $A$ and $B$ of the process model is usually based on gradient-based techniques. For the ARX structure, the predictor is given by
$$\hat{y}(k+1) = B(q^{-1})\, u(k-d+1) - A_1(q^{-1})\, y(k) \tag{3.161}$$
Since the regressors (inputs and measured outputs) are independent of the predictor parameters, LS, RLS, etc. (see Chapter 2) can be used. In the OE structure (as well as ARMAX, etc.), the predictor output depends on the past predictions, and the regression vector is thus a function of the parameters themselves. In order to estimate the parameters, alternative methods must be used. Following chapters will present the prediction error methods (nonlinear LS methods), for which the gradients of the predictor output with respect to the parameters are required. The OE predictor is given by
$$\hat{y}(k+1) = B(q^{-1})\, u(k-d+1) - A_1(q^{-1})\, \hat{y}(k) \tag{3.162}$$
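The gradients with respect to the parameters in the feedforward part $B$ are given by
$$\frac{\partial \hat{y}(k)}{\partial b_n} = u(k-d-n) - \sum_{m=1}^{n_A} a_m \frac{\partial \hat{y}(k-m)}{\partial b_n} \tag{3.163}$$
where $n = 0, 1, \ldots, n_B$; and with respect to parameters in the feedback part $A$
$$\frac{\partial \hat{y}(k)}{\partial a_n} = -\hat{y}(k-n) - \sum_{m=1}^{n_A} a_m \frac{\partial \hat{y}(k-m)}{\partial a_n} \tag{3.164}$$
where $n = 1, 2, \ldots, n_A$.⁶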
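⁶ The gradients with respect to the parameters in the feedback part $A$ are given by
$$\frac{\partial \hat{y}(k)}{\partial a_n} = \frac{\partial}{\partial a_n} B(q^{-1})\, u(k-d) - \frac{\partial}{\partial a_n} A_1(q^{-1})\, \hat{y}(k-1)$$
The first term on the right hand side does not depend on $a_n$. For the second term, since $A_1 = a_1 + a_2 q^{-1} + \cdots + a_{n_A} q^{-(n_A-1)}$, we can write
$$\frac{\partial \hat{y}(k)}{\partial a_n} = -\frac{\partial}{\partial a_n} \left[a_1 \hat{y}(k-1) + \cdots + a_n \hat{y}(k-n) + \cdots + a_{n_A} \hat{y}(k-n_A)\right]$$
which can be written as (3.164). Similarly, the gradients with respect to the parameters in $B$ are given by
$$\frac{\partial \hat{y}(k)}{\partial b_n} = \frac{\partial}{\partial b_n} B(q^{-1})\, u(k-d) - \frac{\partial}{\partial b_n} A_1(q^{-1})\, \hat{y}(k-1)$$
The first term on the right hand side gives
$$\frac{\partial}{\partial b_n} B(q^{-1})\, u(k-d) = \frac{\partial}{\partial b_n} b_0 u(k-d) + \cdots + \frac{\partial}{\partial b_n} b_n u(k-d-n) + \cdots + \frac{\partial}{\partial b_n} b_{n_B} u(k-d-n_B) = u(k-d-n)$$
and the second term gives
$$\frac{\partial}{\partial b_n} A_1(q^{-1})\, \hat{y}(k-1) = \sum_{m=1}^{n_A} a_m \frac{\partial \hat{y}(k-m)}{\partial b_n}$$
Combining these, we have (3.163).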
Assuming that the parameters change slowly during the estimation,⁷ the past gradients can be stored and the computations performed in a recursive fashion. Let us collect the results in this more convenient form.
Algorithm 16 (Gradients of the OE predictor) The derivatives of the output of the OE predictor with respect to the parameters in $A$ and $B$ are given by
$$\Psi_{b_n}(k) = \frac{\partial \hat{y}}{\partial b_n}(k); \quad \Psi_{a_n}(k) = \frac{\partial \hat{y}}{\partial a_n}(k) \tag{3.165}$$
$$\Psi_{b_n}(k) = u(k-d-n) - \sum_{m=1}^{n_A} a_m \Psi_{b_n}(k-m) \tag{3.166}$$
$$\Psi_{a_n}(k) = -\hat{y}(k-n) - \sum_{m=1}^{n_A} a_m \Psi_{a_n}(k-m) \tag{3.167}$$
where $n = 0, 1, \ldots, n_B$ and $n = 1, 2, \ldots, n_A$, respectively.
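The gradient recursions of Algorithm 16 can be sketched as follows and checked against a finite-difference approximation. This is an illustrative sketch with fixed parameters: the function name `oe_gradients`, the list layout of the returned gradients, and the zero initial conditions are assumptions.

```python
def oe_gradients(b, a, u, d=1):
    """OE predictor yhat(k) = B u(k-d) - A_1 yhat(k-1) with its gradients.

    Returns (yhat, psi_b, psi_a) where psi_b[n][k] = d yhat(k) / d b_n and
    psi_a[n-1][k] = d yhat(k) / d a_n, computed with (3.166)-(3.167).
    Signals and gradients before instant 0 are taken as zero (assumption).
    """
    def at(seq, i):
        return seq[i] if 0 <= i < len(seq) else 0.0

    yhat = []
    psi_b = [[] for _ in b]
    psi_a = [[] for _ in a]
    for k in range(len(u)):
        for n in range(len(b)):
            past = sum(am * at(psi_b[n], k - m) for m, am in enumerate(a, start=1))
            psi_b[n].append(at(u, k - d - n) - past)                 # (3.166)
        for n in range(1, len(a) + 1):
            past = sum(am * at(psi_a[n - 1], k - m) for m, am in enumerate(a, start=1))
            psi_a[n - 1].append(-at(yhat, k - n) - past)             # (3.167)
        forward = sum(bn * at(u, k - d - n) for n, bn in enumerate(b))
        feedback = sum(am * at(yhat, k - m) for m, am in enumerate(a, start=1))
        yhat.append(forward - feedback)
    return yhat, psi_b, psi_a


u = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0]
b, a = [0.5], [-0.8]                     # illustrative stable model
yhat, psi_b, psi_a = oe_gradients(b, a, u)

# Central finite difference with respect to a_1 at the last instant
h = 1e-6
y_plus, _, _ = oe_gradients(b, [a[0] + h], u)
y_minus, _, _ = oe_gradients(b, [a[0] - h], u)
fd_a1 = (y_plus[-1] - y_minus[-1]) / (2 * h)
```

The gradient filters share the denominator $A$ with the predictor itself, so they inherit its stability properties.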
Notice that the system needs to be stable, since otherwise the gradients will grow (unbounded).
Estimation of noise polynomials
In the system model
$$y(k) = \frac{B(q^{-1})}{A(q^{-1})} u(k-d) + \frac{C(q^{-1})}{D(q^{-1})} e(k) \tag{3.168}$$
only the process dynamics, $B$ and $A$, are usually identified. $D$ is a design parameter, the selection of which results in the OE structure, ARX structure, etc. Estimating $C$ is generally difficult in practice, because of the nature of $C$ and the fact that $e(k)$ is never available and must be approximated by a priori or a posteriori prediction errors, thus reducing the convergence rate of the parameters. For estimating $C$, a simple solution is to filter the data (using the prior information about the process noise) with a low-pass filter, $F$, thus removing high-frequency components of the signals. It is then possible to use a fixed estimate of $C$ (often denoted by $T$), representing prior knowledge about the process noise. One interpretation of $T$ is that of a fixed observer. In the estimation, e.g., the RLS can be used.
⁷ The assumptions are that $\frac{\partial \hat{y}}{\partial b_n}(k-m)\big|_{\theta=\theta(k)} \approx \frac{\partial \hat{y}}{\partial b_n}(k-m)\big|_{\theta=\theta(k-m)}$ and $\frac{\partial \hat{y}}{\partial a_n}(k-m)\big|_{\theta=\theta(k)} \approx \frac{\partial \hat{y}}{\partial a_n}(k-m)\big|_{\theta=\theta(k-m)}$, where $\theta$ contains the time-varying components of the model.
Chapter 4
Nonlinear Systems
Identification can be justified by the reduced time and effort required in building the models, and the flexibility of parameterized experimental models in real-world modeling problems. For simple input-output relations, linear models are a relatively robust alternative. Linear models are simple and efficient also when extending to the identification of adaptive and/or dynamic models, and readily available control design methods can be found in the literature. However, most industrial processes are nonlinear.
If the nonlinear characteristics of the process are known, a seemingly nonlinear identification problem may often be converted to a linear identification problem. Using the available a priori knowledge of the nonlinearities, the model input-output data can be pre-processed, or the model reparameterized. This is in fact what is often done in gray-box modeling. As the processes become more complex, a sufficiently accurate nonlinear input-output behavior is more difficult to obtain using linear descriptions. If more detailed models are required, then the engineer needs to turn to methods of identification of nonlinear systems.
Many types of model structures have been considered for the identification of nonlinear systems. Traditionally, model structures with constrained nonlinearities have been considered (see, e.g., [78]). Lately, a number of new structures have been proposed (see, e.g., [86]) and shown to be useful in applications. Particular interest has been focused on fields such as neural computation [29][27] and fuzzy systems [47][73]. These fields, among many other topics, are a part of the field of artificial intelligence.
In this chapter, a brief introduction to some basic topics in the identification of nonlinear systems is given. The target of this chapter is to provide the reader with a basic understanding and overview of some common parameterized (black-box) structures used for approximating nonlinear functions. In particular, the basis function networks are introduced. They provide a general framework for most nonlinear model structures, which should help the reader in understanding and clustering the multitude of different specific paradigms, structures and methods available. The power series, one-hidden-layer sigmoid neural networks and 0-order Sugeno fuzzy models are considered in detail, including linearization of the mappings and the computation of gradients.
4.1 Basis function networks
In this section, the basis function networks [86] are introduced. They provide a general framework for most nonlinear model structures.
4.1.1 Generalized basis function network
Most nonlinear model structures can be presented as decomposed into two parts:
• a mapping $\varphi$ from the input space to regressors; and
• a mapping $f$ from regressors to model output.
The selection of regressors $\varphi$ is mainly based on utilizing physical insight into the problem. Obviously, all the necessary input signals should be included. Some transformation (pre-processing, filtering) of the raw measurements could also be used in order to facilitate the estimation of the parameters. In dynamic time-series modeling, the 'orders' of the system (number of past inputs, outputs and predictions) need to be chosen. Such semi-physical regressors are formed in view of what is known about the system. In the remaining sections, we will be interested in the mapping $f$.
The nonlinear mapping $f$ can be viewed as a function expansion [86]. In a generalized basis function network [31], the mapping $f$ is formed by
$$\hat{y}(k) = f(\varphi(k), \cdot) = \sum_{h=1}^{H} h_h(\varphi(k), \cdot)\, g_h(\varphi(k), \cdot) \tag{4.1}$$
where $g_h$ are the basis functions and $h_h$ are weighting functions, $h = 1, 2, \ldots, H$. $\hat{y}$ denotes the model output.¹ The dot indicates that there may be some parameters associated with these functions. The output of each basis function is multiplied by the weighting function and these values are summed to form the function output. With constants as weighting functions, the structure is referred to as a standard basis function network.
¹ Usually the models are to be used as predictors. We will use this notation throughout the remaining chapters.
The $k$'s in (4.1) refer to the fact that these models will be used for sampled systems. The mapping $f$, however, is not dependent on the sampling, just as the operations of multiplication and summing in linear systems are not dependent on the sampling. In the remainder of this chapter, simplified notation will be used
$$\hat{y} = f(\varphi, \cdot) = \sum_{h=1}^{H} h_h(\varphi, \cdot)\, g_h(\varphi, \cdot) \tag{4.2}$$
An important interpretation of the basis function network is that of local models [31]: in (4.2), each function $h_h$ can be viewed as a local model, the validity of which is defined by the activation value of $g_h$. Hence the $g_h$'s partition the input space into operating regions, on each of which a local model is defined. The network smoothly joins these local models together through interpolation to form an overall global model $f$.
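The local-model interpretation of (4.2) can be illustrated with a small sketch for a scalar regressor. The specific choices here are assumptions for illustration only: normalized Gaussian validity functions for $g_h$, affine local models for $h_h$, and the function name `gbfn`.

```python
import math


def gbfn(phi, centers, widths, local_models):
    """Evaluate yhat = sum_h h_h(phi) * g_h(phi) for a scalar regressor phi.

    g_h: Gaussian validity functions normalized to sum to one (assumption).
    h_h: affine local models given as (slope, offset) pairs (assumption).
    """
    raw = [math.exp(-((phi - c) / w) ** 2) for c, w in zip(centers, widths)]
    total = sum(raw)
    g = [r / total for r in raw]
    return sum(gh * (slope * phi + offset)
               for gh, (slope, offset) in zip(g, local_models))


centers, widths = [0.0, 1.0], [0.2, 0.2]
local_models = [(1.0, 0.5), (-1.0, 2.0)]    # valid near phi = 0 and phi = 1

at_zero = gbfn(0.0, centers, widths, local_models)   # close to 1.0*0 + 0.5
at_one = gbfn(1.0, centers, widths, local_models)    # close to -1.0*1 + 2.0
midway = gbfn(0.5, centers, widths, local_models)    # blend of both models
```

Near each center the output follows the corresponding local model, and between the centers the two are interpolated. Replacing the affine local models with constants would give a standard basis function network.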
4.1.2 Basis functions
Usually the basis functions are obtained by parameterizing a single-variable mother basis function, $g$, and repeating it a large number of times in the expansion. Single-variable basis functions can be classified into local and global basis functions. Local basis functions have a gradient with a bounded support (at least in a practical sense), whereas global basis functions have an infinitely spreading gradient. This means, roughly, that with local basis functions there are large areas in the input space where a change in the input variable causes no change in the function output; a change in the input of a global basis function always causes a change in the function output. Different kinds of single-variable basis functions are illustrated in Fig. 4.1.
In the multi-variable case, the basis functions can be classified into three main groups [86]: tensor products, radial constructions and ridge constructions.
• The tensor product construction is the product of single-variable functions
$$g_h(\varphi) = \prod_{i=1}^{I} g(\varphi_i, \cdot) \tag{4.3}$$
where the subscript $i$ indexes the elements of the regression vector.
[Figure 4.1: plots of single-variable basis functions on axes from 0 to 1; recoverable panel titles: sine, semicircle, sigmoid.]
Figure 4.1: Examples of single-variable basis functions, $g$. A global basis function (sine) has a gradient with an infinite support. Local basis functions (semicircle) have a bounded support, at least in a practical sense (Gaussian and sigmoid functions).
• Radial construction is based on taking some norm on the space of the regression vector and passing the result through a single-variable function.
• In ridge constructions, a linear combination of the regression vector is passed through a single-variable function
$$g_h(\varphi) = g(\beta_h^T \varphi + \gamma_h) \tag{4.5}$$
The parameters $\gamma_h$ and $\beta_h$ are typically related to the scale and position of $g_h$.
4.1.3 Function approximation
The powerful function approximation capabilities of some basis function networks are a major reason for their popularity in the identification of nonlinear systems. Let us call a universal approximator something that can uniformly approximate continuous functions to any degree of accuracy on compact sets [12]. Proofs of universal approximation for basis function networks have been published by several authors. Hornik [30] showed that multilayer feedforward networks with one hidden layer using arbitrary squashing functions (e.g., sigmoid neural networks) are capable of approximating any measurable function from one finite-dimensional space to another. This can be done to any desired degree of accuracy, provided that sufficiently many basis functions are available.
The function approximation capability can be explained in the following intuitive way [29]: any reasonable function $f(x)$ can be represented by a linear combination of localized bumps that are each nonzero only in a small region of the domain of $x$. Such bumps can be constructed with local basis functions and the associated weighting functions. Not surprisingly, universal function approximation capability can be proved for many types of networks.
All the proofs are existence proofs, showing that approximations are possible: there exists a set of basis functions with a set of parameters that produces a mapping with given accuracy. Unfortunately, less can be said about how to find this mapping: how to find the correct parameters from data, or what is a (smallest) sufficient
number of basis functions for a particular problem. A typical framework is to approximate an unknown function $F$
$y = F(\varphi) + e$    (4.6)
based on sampled data $\varphi(k)$, $y(k)$, $k = 1, 2, \ldots, K$, where the observed outputs are corrupted by zero-mean noise $e(k)$ with finite variance. Notice that in a standard basis function network
$\hat{y} = f(\varphi) = \sum_{h=1}^{H} \alpha_h g_h(\varphi, \beta_h, \gamma_h)$    (4.7)
the parameters $\alpha_h$ appear linearly. If only the $\alpha_h$ are of interest, they can be estimated from data, e.g., using least squares (the regressor containing the evaluated basis functions). If there are parameters in the basis functions to be estimated ($\beta_h$, $\gamma_h$), they typically appear nonlinearly. These types of parameters are commonly estimated using iterative gradient-based methods (see Chapter 6).
The structure selection problem (roughly, the selection of $H$) can also be guided by data (see, e.g., [18][26]). The main obstacle in structure selection is the fundamental trade-off between bias (due to insufficient model structure) and variance (due to noise in a finite data set), the bias-variance dilemma. With increased network size the bias decreases but the variance increases, and vice versa. In practice, however, data-driven structure selection (smoothing) algorithms can be computationally expensive and their performance is sometimes questionable, and it is more common to experiment with several fixed network sizes $H$. The 'optimal' network size is then found as the smallest network which gives sufficient accuracy both on the data and on independent test data (roughly, cross-validation). The bias-variance dilemma can also be tackled in parameter estimation by posing constraints on the functional form of the mapping (see Chapter 6).
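As a minimal sketch of the linear-in-parameters case in (4.7), the following Python fragment estimates the weights $\alpha_h$ by least squares when the basis functions are held fixed. The Gaussian shape of the basis functions, the shared width, and all function names (`gaussian`, `fit_weights`, `predict`) are assumptions made for illustration only; they are not taken from the text.

```python
import math

def gaussian(phi, center, width):
    # Assumed basis function shape (illustrative choice, not from the text).
    return math.exp(-((phi - center) / width) ** 2)

def design_row(phi, centers, width):
    # Regressor containing the evaluated basis functions g_h(phi).
    return [gaussian(phi, c, width) for c in centers]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_weights(phis, ys, centers, width):
    """Least squares estimate of alpha via the normal equations G'G alpha = G'y."""
    G = [design_row(p, centers, width) for p in phis]
    H = len(centers)
    GtG = [[sum(g[i] * g[j] for g in G) for j in range(H)] for i in range(H)]
    Gty = [sum(g[i] * y for g, y in zip(G, ys)) for i in range(H)]
    return solve(GtG, Gty)

def predict(phi, alpha, centers, width):
    # Weighted sum of basis function activations, as in (4.7).
    return sum(a * g for a, g in zip(alpha, design_row(phi, centers, width)))
```

The centers and the width play the role of the nonlinear parameters; here they are simply fixed in advance, so only a linear problem remains. Repeating the fit for several values of $H$ and comparing the errors on independent test data corresponds to the rough cross-validation described above.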
4.2 Nonlinear black-box structures
Nonlinear system identification can be difficult because a nonlinear system can be nonlinear in so many different ways. Traditionally, only model structures with constrained nonlinearities have had success in practice. Lately, a number of new model structures have been proposed and shown to be useful in applications (see, e.g., [34]). Most interest has been focused on artificial neural networks (such as sigmoid neural networks and radial basis function networks) and fuzzy systems.
To start with, recall the structure of the generalized basis function network (4.2)
$\hat{y} = f(\varphi, \cdot) = \sum_{h=1}^{H} h_h(\varphi, \cdot)\, g_h(\varphi, \cdot)$    (4.8)
The overall mapping is obtained by taking a weighted sum of the activation of the $H$ basis functions. In what follows, some commonly used structures are presented and shown to fit the above generalized basis function network scheme.
4.2.1 Power series
When global basis functions are used, each weighting function $h_h$ has an effect on the model outcome at every operating region. Typical examples include the linear and multilinear models, special cases of power series, or polynomial developments. In power series, the powers of the regressor generate the basis functions; in multilinear systems only first order terms of each regressor component are used. The static mapping can be seen as a special case of the identification of nonlinear dynamic systems using Volterra series (see Chapter 5). Other common structures include, for example, the Fourier series. These belong to the class of series estimators, an extension of linear regression where the components of the regression vector represent the basis functions. A convenient feature of these structures is that all the parameters appear linearly, and can be estimated, e.g., using the least squares method.
Linear regression
A linear regression model uses global basis functions
$\hat{y} = \hat{\theta}^T \varphi$    (4.9)
where $\hat{y}$ is the model output, $\varphi = [\varphi_1, \varphi_2, \ldots, \varphi_I, \varphi_{I+1} = 1]^T$ are the $I$ inputs to the model with bias, and $\hat{\theta} = [\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_I, \hat{\theta}_{I+1}]^T$ are the corresponding parameters. A linear model can be presented in the framework of the generalized basis function network by assigning
(4.10)
Quite obviously, only linear functions can be mapped using the above model structure.
Alternatively, we can also consider using the observed data points as basis functions. Assume that a linear model is based on $K$ available data points $(\varphi(k), y(k))$, $k = 1, 2, \ldots, K$. Let a linear model be given by (4.9), with $\hat{\theta} = [\Phi^T\Phi]^{-1}\Phi^T y$ (see Section 2.2.3). Then
$\hat{y} = \varphi^T Z y$    (4.11)
where $Z = [\Phi^T\Phi]^{-1}\Phi^T$. Denote the $k$'th column of $Z$ by $z_k$. The presentation in the framework of the generalized basis function network is obtained by assigning
$\hat{y} \leftarrow \hat{y}$,  $H \leftarrow K$
$g_1(\varphi, \cdot) \leftarrow \varphi^T z_1$,  $h_1(\varphi, \cdot) \leftarrow y(1)$
$\vdots$
$g_k(\varphi, \cdot) \leftarrow \varphi^T z_k$,  $h_k(\varphi, \cdot) \leftarrow y(k)$
$\vdots$
$g_K(\varphi, \cdot) \leftarrow \varphi^T z_K$,  $h_K(\varphi, \cdot) \leftarrow y(K)$    (4.12)
This type of formulation is important in smoothing ([26][25]). The smoothed values for each observed data point are given by
$\hat{y} = \Phi[\Phi^T\Phi]^{-1}\Phi^T y = S y$    (4.13)
where $S = \Phi[\Phi^T\Phi]^{-1}\Phi^T$ (a $K \times K$ smoother matrix), and its rows are referred to as equivalent kernels of a linear smoother.
Multilinear systems
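In many practical cases, multilinear developments are sufficient. A function $g(\varphi)$, $\varphi = [\varphi_1, \ldots, \varphi_i, \ldots, \varphi_I]^T$, is multilinear if it is linear in each component $\varphi_i$ when all other components $\varphi_j$, $j \neq i$, are fixed. A general form is given by
$\hat{y} = \theta_0 + \sum_{i_1=1}^{I} \theta_{i_1}\varphi_{i_1} + \sum_{i_1 < i_2} \theta_{i_1,i_2}\varphi_{i_1}\varphi_{i_2} + \cdots + \theta_{1,2,\ldots,I}\,\varphi_1\varphi_2\cdots\varphi_I$    (4.15)
where each sum runs over the index combinations with $i_1 < i_2 < \cdots$.
Example 21 (Multilinear system) For a two-input system ($I = 2$), a multilinear development is given by
$\hat{y} = \theta_0 + \theta_1\varphi_1 + \theta_2\varphi_2 + \theta_{1,2}\varphi_1\varphi_2$    (4.16)
Example 22 (Fluidized bed combustion) In an FBC (see Appendix) the steady-state relation between the fuel feed rate, $Q_C$, and the combustion power is given by
$P = H_C(1 - V)Q_C + H_V V Q_C$    (4.17)
Let us now consider $Q_C$ and $V$ (fraction of volatiles in fuel) as non-constant inputs to the system. Using the following input transformations
$\varphi_1 \leftarrow Q_C$,  $\varphi_2 \leftarrow V$    (4.18)
the equation can be written as
$P = H_C\varphi_1 + (H_V - H_C)\varphi_1\varphi_2$    (4.19)
which is a multilinear mapping from $\varphi$ to $P$.
Example 23 (Multilinear system: continued) For a three-input system ($I = 3$) a multilinear development is given by
$\hat{y} = \theta_0 + \theta_1\varphi_1 + \theta_2\varphi_2 + \theta_3\varphi_3 + \theta_{1,2}\varphi_1\varphi_2 + \theta_{1,3}\varphi_1\varphi_3 + \theta_{2,3}\varphi_2\varphi_3 + \theta_{1,2,3}\varphi_1\varphi_2\varphi_3$    (4.20)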
CHAPTER 4.
NONLINEAR SYSTEMS
Power series In powerseries, the basis functions are formedby taking tensor products of integervaluedpowers(j~ = 0, 1, 2, ...; i = 1, 2, ..., I) of the input variables I
gh(90,.) =H
(4.21)
i=1
up to a given order j I
j = Zji
(4.22)
i=1
The model is then produced by taking a weighted sum of the activations of the basis functions and the associated parameters. Example24 (Power series) In manypractical situations, a second order development(j = 2) is sufficient. For a twoinput system (I = 2), a polynomial development model would be as follows
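$\hat{y} = \theta_0 + \theta_1\varphi_1 + \theta_2\varphi_2 + \theta_{1,1}\varphi_1^2 + \theta_{1,2}\varphi_1\varphi_2 + \theta_{2,2}\varphi_2^2$    (4.23)
The corresponding presentation in the framework of the generalized basis function network is obtained by substituting
$\hat{y} \leftarrow \hat{y}$,  $H \leftarrow 6$
$g_1(\varphi, \cdot) \leftarrow 1$,  $h_1(\varphi, \cdot) \leftarrow \theta_0$
$g_2(\varphi, \cdot) \leftarrow \varphi_1$,  $h_2(\varphi, \cdot) \leftarrow \theta_1$
$g_3(\varphi, \cdot) \leftarrow \varphi_2$,  $h_3(\varphi, \cdot) \leftarrow \theta_2$
$g_4(\varphi, \cdot) \leftarrow \varphi_1^2$,  $h_4(\varphi, \cdot) \leftarrow \theta_{1,1}$
$g_5(\varphi, \cdot) \leftarrow \varphi_1\varphi_2$,  $h_5(\varphi, \cdot) \leftarrow \theta_{1,2}$
$g_6(\varphi, \cdot) \leftarrow \varphi_2^2$,  $h_6(\varphi, \cdot) \leftarrow \theta_{2,2}$    (4.24)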
In multilinear and polynomial developments, the model output $\hat{y}$ is linear with respect to the parameters $\theta$, yet nonlinear with respect to the inputs $\varphi$. From the parameter estimation point of view, the methods could also be seen as a method of preprocessing the input data, and then applying simple linear regression. However, nonlinear functions can be mapped using the above model constructions.
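The preprocessing view of power series can be sketched as below: the inputs are first expanded into the basis functions of (4.21), after which any linear regression routine could be applied to the expanded regressor. The function names are hypothetical, and the enumeration order of the exponent sets is an arbitrary choice of this sketch.

```python
from itertools import product
from math import prod

def exponent_sets(n_inputs, order):
    """All exponent tuples (j_1, ..., j_I) whose sum does not exceed the order."""
    return [j for j in product(range(order + 1), repeat=n_inputs)
            if sum(j) <= order]

def basis_values(phi, exponents):
    """Evaluate the tensor products of integer powers of the inputs."""
    return [prod(p ** j for p, j in zip(phi, js)) for js in exponents]

# Second order development for a two-input system (I = 2, j = 2).
exponents = exponent_sets(2, 2)
print(len(exponents))
print(basis_values([2.0, 3.0], exponents))
```

For $I = 2$ and $j = 2$ this yields six basis functions, in agreement with $H = 6$ in Example 24.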
Inverse power series
The inverse of a function is commonly needed in applications of control, for example. In practice it is often simplest to choose the structure of the inverse model (of the power series expansion) and to estimate its coefficients. As a side note, however, we discuss in the following a less known approach for finding the inverse of a power series.
The Bürmann-Lagrange series constitutes a generalization of the Taylor series. These series appear when we expand an analytical function $f(\varphi)$ into a series of ascending powers of another analytical function $w(\varphi)$:
$f(\varphi) = \sum_{h=0}^{\infty} \alpha_h w^h(\varphi)$    (4.25)
For $n \geq 1$, it follows:
$\alpha_n = \frac{1}{n!} \lim_{\varphi \to a} \frac{d^{n-1}}{d\varphi^{n-1}} \left\{ f'(\varphi) \left[ \frac{\varphi - a}{w(\varphi)} \right]^n \right\}$    (4.26)
where
$f'(\varphi) = \frac{df(\varphi)}{d\varphi}$    (4.27)
Example 25 (Function approximation) Consider a function $f(x) = x^2$ to be approximated with $w(x) = x$ around $x = 0$. Then $f'(x) = 2x$ and using (4.26) we have for the coefficients:
$\alpha_2 = \frac{1}{2!} \lim_{x \to 0} \frac{d}{dx} \left\{ 2x \left[ \frac{x}{x} \right]^2 \right\} = 1$    (4.28)
$\alpha_h = 0$ for $h = 0, 1, 3, 4, \ldots$    (4.29)
Let us apply this result for inverting a power series. Consider the following power series:
$w(\varphi) = c_1\varphi + c_2\varphi^2 + \cdots + c_n\varphi^n + \cdots$    (4.30)
$c_1 \neq 0$, which is convergent in the neighborhood of the point $\varphi = 0$. Find the expansion of the function $\varphi(w)$ with respect to the ascending powers of $w$:
$\varphi(w) = \alpha_0 + \alpha_1 w + \cdots + \alpha_n w^n + \cdots$    (4.31)
This particular problem can be solved using the Bürmann-Lagrange series:
$\alpha_n = \frac{1}{n!} \lim_{\varphi \to 0} \frac{d^{n-1}}{d\varphi^{n-1}} \left( \frac{\varphi}{w(\varphi)} \right)^n, \quad (n = 1, 2, \ldots)$    (4.32)
because in our case
$f(\varphi) = \varphi$ and $f'(\varphi) = 1$    (4.33)
Example 26 (Numerical example) Consider the following power series model
$y = w(x) = c_1 x + c_2 x^2$    (4.34)
For an approximation of its inverse in the neighborhood of $x = 0$
$x = v(y) = \alpha_0 + \alpha_1 y + \alpha_2 y^2 + \cdots$    (4.35)
the following coefficients are obtained using (4.32)
$\alpha_1 = \frac{1}{c_1}$,  $\alpha_2 = -\frac{c_2}{c_1^3}$,  $\alpha_3 = \frac{2c_2^2}{c_1^5}$,  $\alpha_4 = -\frac{5c_2^3}{c_1^7}$,  $\alpha_5 = \frac{14c_2^4}{c_1^9}$, ...    (4.36)-(4.39)
Figure 4.2 illustrates the approximation with $c_1 = 1$ and $c_2 = 10$.
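The coefficients of Example 26 can be checked numerically with a short sketch such as the one below, which compares the truncated inverse series with the closed-form inverse of the quadratic $y = c_1 x + c_2 x^2$. The closed-form branch through the origin and the function names are additions for illustration; they are not part of the original example.

```python
import math

def inverse_coefficients(c1, c2):
    """First five inverse-series coefficients alpha_1..alpha_5 of y = c1*x + c2*x**2."""
    return [
        1 / c1,
        -c2 / c1**3,
        2 * c2**2 / c1**5,
        -5 * c2**3 / c1**7,
        14 * c2**4 / c1**9,
    ]

def inverse_series(y, c1, c2, n_terms):
    """Truncated inverse series using the first n_terms coefficients (at most five)."""
    coeffs = inverse_coefficients(c1, c2)[:n_terms]
    return sum(a * y ** (k + 1) for k, a in enumerate(coeffs))

def exact_inverse(y, c1, c2):
    """Root of c2*x**2 + c1*x - y = 0 on the branch passing through the origin."""
    return (-c1 + math.sqrt(c1**2 + 4 * c2 * y)) / (2 * c2)

c1, c2 = 1.0, 10.0
for n in range(1, 6):
    error = abs(inverse_series(0.01, c1, c2, n) - exact_inverse(0.01, c1, c2))
    print(n, error)
```

With $c_1 = 1$ and $c_2 = 10$ the closed-form inverse has a branch point at $y = -1/40$, so the series can only be expected to converge for $|y| < 0.025$; the test point $y = 0.01$ lies inside this range.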
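Example 27 (Exponential function) Consider the following series
$w(\varphi) = \varphi e^{-a\varphi} = \sum_{n=0}^{\infty} \frac{(-a)^n}{n!} \varphi^{n+1}$    (4.40)
Using (4.32) we derive
$\alpha_n = \frac{1}{n!} \lim_{\varphi \to 0} \frac{d^{n-1}}{d\varphi^{n-1}} \exp(an\varphi) = \frac{(an)^{n-1}}{n!}$    (4.41)
which leads to:
$\varphi(w) = \sum_{n=1}^{\infty} \frac{(an)^{n-1}}{n!} w^n$    (4.42)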
Figure 4.2: A power series (solid line) and its inverses (dotted lines) approximated around $x = 0$ using the $n = 1, 2, 3, 4$ and $5$ first terms of the series.
4.2.2 Sigmoid neural networks
Neural networks consist of multiple techniques related loosely to each other by the background of the algorithms: the neural circuitry in a living brain. There are three basic perspectives on neural networks. One can consider them as a form of artificial intelligence, as a means for enabling computers to perform intelligent tasks. On the other hand, neural networks can be seen from a biological point of view, as a way of modeling the neural circuitry observed in living creatures. The approach taken in engineering, the more practical view, considers neural networks from a purely technical perspective of data classification, filtering and identification of nonlinear systems. To a large extent, the research in neural computation overlaps with the fields of statistical analysis and optimization.
In general, neural networks are modeling structures characterized by:
• A large number of simple interconnected elements (units, nodes, neurons); and
• A learning mechanism for adjusting the connections (weights, parameters) between the nodes, based on observed patterns of the system behavior.
There are several alternative ways to categorize neural network models and techniques. From the pragmatic point of view, we can categorize them roughly into two classes:
• Multilayer perceptron networks, such as sigmoid neural networks (SNN), deal with function approximation. Perhaps the most important result brought by the neural research has been to show that any reasonable function can be approximated to any degree of accuracy, and to provide model structures that are also viable in practice.
• Self-organizing maps (SOM) consider the problems of clustering and quantization. Among the main new contributions is the introduction of an internal topology into the clustering process.
In what follows, we will focus on function approximation tasks. The SOM will be briefly discussed in connection with nearest neighbor methods (Section 4.2.3).
Sigmoid neural networks are probably the most common neural network structure used for nonlinear function approximation. They are also commonly referred to as multilayer perceptrons (MLPs), backpropagation networks, or feedforward artificial neural nets. These names come from the different properties of the standard sigmoid neural network.
Sigmoid neural networks use (practically) local basis functions. However, due to the use of the ridge construction (4.5), the interpretation as local models is not very useful, as this type of structure estimates nonlinear hypersurfaces rather than local models for various operating regions. In practice, the fact that hypersurfaces are estimated provides advantages in interpolation. This type of structure is often referred to as semi-global. Other examples of similar structures include perceptrons and hinging hyperplanes.
In sigmoid neural networks, the basis functions have a sigmoidal shape. The network units are organized as layers. A typical structure is that of a layered² feedforward³ sigmoid neural network. In the neural network terminology, the model inputs reside at the input layer. The input layer then feeds the hidden layer units. The network units at the hidden layer compute a linear combination of the input variables and pass this sum through a sigmoid function
$g_h(\varphi, \beta_h, \gamma_h) = \frac{1}{1 + e^{-(\beta_h^T\varphi + \gamma_h)}}$    (4.43)
The outputs of the multiple units at the hidden layer are then fed to the output unit(s). The output unit computes a weighted sum of the activations of the hidden layer units. The value of the weighted sum is then the output of the model. Figure 4.3 illustrates a standard sigmoid neural network. Structures with multiple hidden layers can also be constructed, where the basic mappings are further convolved with each other by treating basis function outputs as new regressors.
² In the simplest class of layered networks, every unit feeds signals only to the units located at the next layer (and receives signals only from the units located at the preceding layer). Hence there are no connections leading from a unit to units in preceding layers, nor to other units in the same layer, nor to units more than one layer ahead.
³ A network topology is feedforward if it does not contain any closed loops (feedback).
Figure 4.3: Two network constructions. Both models compute a function, $\hat{y} = f(\varphi)$. Left: linear model consisting of a single summing node. Right: standard sigmoid neural network with five hidden nodes (each consisting of a summing element followed by a nonlinear sigmoid element) at the hidden layer, and a single summing output node.
One-hidden-layer sigmoid neural net
For most practical purposes in process engineering, a single-hidden-layer network topology is sufficient. Let us consider a standard one-hidden-layer sigmoid neural net with $H$ hidden units (see Fig. 4.3) including bias parameters. The network computes a function $f$ from an $I$ dimensional column vector $\varphi$
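of model inputs:
$\hat{y} = f(\varphi, \alpha, \beta) = \sum_{h=1}^{H} \alpha_h g_h(\varphi, \beta_h) + \alpha_{H+1}$    (4.44)
where
$g_h(\varphi, \beta_h) = \frac{1}{1 + \exp\left(-\sum_{i=1}^{I} \beta_{h,i}\varphi_i - \beta_{h,I+1}\right)}$    (4.45)
The $H + 1 + H(I + 1)$ network parameters are contained in an $H + 1$ dimensional column vector $\alpha$:
$\alpha = [\alpha_1, \ldots, \alpha_H, \alpha_{H+1}]^T$    (4.46)
and a matrix $\beta$:
$\beta = \begin{bmatrix} \beta_{1,1} & \cdots & \beta_{1,I+1} \\ \vdots & & \vdots \\ \beta_{H,1} & \cdots & \beta_{H,I+1} \end{bmatrix}$    (4.47)
Note that, for convenience, the bias parameters $\gamma$ in (4.43) are now integrated into the structure of the matrix $\beta$. It is common to include bias constants in the linear summing.
Usually, the parameters in the sigmoid neural network are estimated using some gradient-based method. To do this, the derivatives with respect to the parameters need to be computed. Let us derive the required gradients. For the parameters $\alpha$ at the output node we have
$\frac{\partial f}{\partial \alpha_h} = \frac{\partial}{\partial \alpha_h}\left[\sum_{h=1}^{H} \alpha_h g_h(\varphi, \beta_h) + \alpha_{H+1}\right] = g_h(\varphi, \beta_h)$    (4.48)
$\frac{\partial f}{\partial \alpha_{H+1}} = 1$    (4.49)
For the parameters $\beta$ at the hidden layer nodes, the derivative can be written
$\frac{\partial f}{\partial \beta_{h,i}} = \alpha_h \frac{\partial}{\partial \beta_{h,i}} g_h(\varphi, \beta_h)$    (4.50)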
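The derivative $\frac{\partial}{\partial \beta_{h,i}} g_h(\varphi, \beta_h)$ of a sigmoid function $\kappa$ and ridge construction $\kappa(\phi)$ can be rewritten using the chain rule
$\frac{\partial}{\partial \beta_{h,i}} g_h(\varphi, \beta_h) = \frac{\partial}{\partial \beta_{h,i}} \kappa(\phi_h(\varphi, \beta_h)) = \frac{\partial \kappa}{\partial \phi_h}(\phi_h)\, \frac{\partial \phi_h}{\partial \beta_{h,i}}(\varphi, \beta_h)$    (4.51)
For $\kappa$ as the sigmoid function
$\kappa(\phi_h) = \frac{1}{1 + \exp(-\phi_h)}$    (4.52)
the derivative can be expressed in terms of the function output
$\frac{\partial}{\partial \phi_h} \kappa(\phi_h) = \kappa(\phi_h)[1 - \kappa(\phi_h)]$    (4.53)
For the linear sum $\phi_h$
$\phi_h = \sum_{i=1}^{I} \beta_{h,i}\varphi_i + \beta_{h,I+1}$    (4.54)
we have that
$\frac{\partial \phi_h}{\partial \beta_{h,i}} = \varphi_i$    (4.55)
$\frac{\partial \phi_h}{\partial \beta_{h,I+1}} = 1$    (4.56)
Using (4.50), (4.51), (4.53) and (4.55) the derivative $\frac{\partial f}{\partial \beta_{h,i}}$ can be written as
$\frac{\partial f}{\partial \beta_{h,i}} = \alpha_h g_h(\varphi, \beta_h)[1 - g_h(\varphi, \beta_h)]\varphi_i$    (4.57)
where $g_h$ is given by (4.45). For the bias parameters we have, using (4.50), (4.51), (4.53) and (4.56)
$\frac{\partial f}{\partial \beta_{h,I+1}} = \alpha_h g_h(\varphi, \beta_h)[1 - g_h(\varphi, \beta_h)]$    (4.58)
In a similar way, it is possible to linearize the one-hidden-layer sigmoid neural net in the neighborhood of its operating point $\bar{\varphi}$ using the Taylor series approximation:
$\tilde{f}(\varphi, \bar{\varphi}) = \tilde{a}_1\varphi_1 + \cdots + \tilde{a}_I\varphi_I + \tilde{a}_{I+1}$    (4.59)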
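where $\tilde{a}_i = \frac{\partial}{\partial \varphi_i} f(\bar{\varphi}, \alpha, \beta)$. To do this, the derivatives with respect to the system inputs need to be calculated:
$\tilde{a}_i = \frac{\partial f}{\partial \varphi_i} = \sum_{h=1}^{H} \alpha_h \frac{\partial}{\partial \varphi_i} g_h(\varphi, \beta_h)$    (4.60)
Again, the chain rule can be applied. The sigmoid and its derivative have already been given in (4.52)-(4.53). For the linear sum, we have
$\frac{\partial \phi_h}{\partial \varphi_i} = \beta_{h,i}$    (4.61)
Substituting these into (4.60) and evaluating at the point of linearization, the linearized parameters are obtained from
$\tilde{a}_i = \sum_{h=1}^{H} \alpha_h g_h(\bar{\varphi}, \beta_h)[1 - g_h(\bar{\varphi}, \beta_h)]\beta_{h,i}$    (4.62)
For zero error at the operating point $\bar{\varphi}$, the bias can be taken as
$\tilde{a}_{I+1} = f(\bar{\varphi}, \alpha, \beta) - \sum_{i=1}^{I} \tilde{a}_i\bar{\varphi}_i$    (4.63)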
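Let us collect the results.
Algorithm 17 (One-hidden-layer sigmoid neural network) The output of a one-hidden-layer sigmoid neural network with $H$ hidden nodes is given by
$\hat{y} = f(\varphi, \alpha, \beta) = \sum_{h=1}^{H} \alpha_h g_h(\varphi, \beta_h) + \alpha_{H+1}$    (4.64)
where
$g_h(\varphi, \beta_h) = \frac{1}{1 + \exp\left(-\sum_{i=1}^{I} \beta_{h,i}\varphi_i - \beta_{h,I+1}\right)}$    (4.65)
and $\varphi$ is the $I$ dimensional input vector. The parameters of the network are contained in an $H + 1$ dimensional vector $\alpha$ and an $H \times (I + 1)$ dimensional matrix $\beta$. The gradients with respect to the parameters are given by
$\frac{\partial f}{\partial \alpha_h} = g_h(\varphi, \beta_h)$    (4.66)
$\frac{\partial f}{\partial \alpha_{H+1}} = 1$    (4.67)
$\frac{\partial f}{\partial \beta_{h,i}} = \alpha_h g_h(\varphi, \beta_h)[1 - g_h(\varphi, \beta_h)]\varphi_i$    (4.68)
$\frac{\partial f}{\partial \beta_{h,I+1}} = \alpha_h g_h(\varphi, \beta_h)[1 - g_h(\varphi, \beta_h)]$    (4.69)
where $h = 1, 2, \ldots, H$ and $i = 1, 2, \ldots, I$. A linearized approximation in the neighborhood of an operating point $\bar{\varphi}$ is given by
$\tilde{f}(\varphi, \bar{\varphi}) = \tilde{a}_1(\bar{\varphi})\varphi_1 + \cdots + \tilde{a}_I(\bar{\varphi})\varphi_I + \tilde{a}_{I+1}$    (4.70)
where the linearized parameters are given by
$\tilde{a}_i(\bar{\varphi}) = \sum_{h=1}^{H} \alpha_h g_h(\bar{\varphi}, \beta_h)[1 - g_h(\bar{\varphi}, \beta_h)]\beta_{h,i}$    (4.71)
$\tilde{a}_{I+1} = f(\bar{\varphi}, \alpha, \beta) - \sum_{i=1}^{I} \tilde{a}_i\bar{\varphi}_i$    (4.72)
$i = 1, 2, \ldots, I$.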
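Algorithm 17 can be sketched in plain Python as follows. The data layout is an assumption of this sketch: `alpha` is a list of $H + 1$ weights with the output bias last, and `beta` is a list of $H$ rows of $I + 1$ weights with the hidden bias last. The function names are hypothetical.

```python
import math

def hidden(phi, beta_h):
    """Sigmoid activation of one hidden unit; beta_h[-1] is its bias."""
    s = sum(b * p for b, p in zip(beta_h[:-1], phi)) + beta_h[-1]
    return 1.0 / (1.0 + math.exp(-s))

def output(phi, alpha, beta):
    """Network output: weighted sum of hidden activations plus output bias."""
    return sum(a * hidden(phi, b) for a, b in zip(alpha[:-1], beta)) + alpha[-1]

def gradients(phi, alpha, beta):
    """Derivatives of the output with respect to alpha and beta."""
    g = [hidden(phi, b) for b in beta]
    d_alpha = g + [1.0]                      # last entry: output bias
    extended = list(phi) + [1.0]             # input vector with bias entry
    d_beta = [[a * gh * (1.0 - gh) * x for x in extended]
              for a, gh in zip(alpha[:-1], g)]
    return d_alpha, d_beta

def linearize(phi_bar, alpha, beta):
    """Slopes and bias of the linearization around the operating point phi_bar."""
    g = [hidden(phi_bar, b) for b in beta]
    slopes = [sum(a * gh * (1.0 - gh) * b[i]
                  for a, gh, b in zip(alpha[:-1], g, beta))
              for i in range(len(phi_bar))]
    bias = output(phi_bar, alpha, beta) - sum(s * p for s, p in zip(slopes, phi_bar))
    return slopes, bias

# Hypothetical network with I = 2 inputs and H = 2 hidden units.
alpha = [0.5, -1.0, 0.2]
beta = [[1.0, -0.5, 0.1], [0.3, 0.8, -0.2]]
phi = [0.4, -0.7]
print(output(phi, alpha, beta))
print(linearize(phi, alpha, beta))
```

A quick way to gain confidence in such gradient code is to compare each analytical derivative with a finite-difference estimate, and to confirm that the linearization reproduces the network output exactly at the operating point.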
4.2.3 Nearest neighbor methods
There exists a large variety of paradigms using local basis functions. Probably the simplest paradigm is the nearest neighbor method. Let us consider a pool of data of $K$ input-output measurement pairs: $(\varphi(1), y(1)), (\varphi(2), y(2)), \ldots, (\varphi(K), y(K))$. In order to obtain an estimate of the output given an input pattern $\varphi$, $\varphi$ is compared with all the input patterns $\varphi(1), \ldots, \varphi(K)$ in the data pool. A pattern in the pool is found that is closest (e.g., in the Euclidean sense) to the given input (competition between the patterns). Denote this closest pattern by $\varphi(c)$, $c \in \{1, 2, \ldots, K\}$. The estimate of the output for the given $\varphi$ is then $\hat{y} = y(c)$. Let us choose $\kappa$ as the indicator function in (4.4):
$g_k(\varphi) = \begin{cases} 1 & \text{if } k \in \Gamma(\varphi) \\ 0 & \text{otherwise} \end{cases}$    (4.73)
where $\Gamma(\varphi) = \arg\min_{k=1,2,\ldots,K} \|\varphi - \varphi(k)\|_2$; and in (4.2):
$\hat{y} = \sum_{k=1}^{K} y(k)\, g_k(\varphi)$    (4.74)
Then the standard basis function network has as many basis functions as there are data points, $H \leftarrow K$, and the local models are given by the observed data, $h_k \leftarrow y(k)$. As can be easily seen, this type of approach belongs to the radial constructions [86].
In order to cope with noise and storage limits, an estimate can be computed based on prototypes representing average local behavior, instead of direct observations. In this case, $H < K$. The problem then is to find a set of suitable centers $\beta_h$ of the basis functions. There are several ways to look for centers. The $\beta_{h,i}$'s can be spread on the domain of each $\varphi_i$ using, e.g., equidistant intervals, thus forming a grid of points. In the Kohonen networks (learning vector quantization, self-organizing map (SOM) [49], see also [32][80]), the distribution of the basis function centers resembles the probability distribution of the data patterns $k = 1, 2, \ldots, K$.
All the above methods find a single winner among the basis functions; there is competition between the basis functions (non-overlapping partitions). One can also consider multiple winners at the same time (overlapping partitions). For example, in the k-nearest neighbors estimate, $\hat{y}$ is taken to be the average of those $\lambda$ observed $y(k)$'s that are associated with the inputs $\varphi(k)$ closest to the given $\varphi$. The k-nearest neighbors method can be represented by using (4.2) with:
$g_k(\varphi) = \begin{cases} 1/\lambda & \text{if } k \in \Gamma(\varphi) \\ 0 & \text{otherwise} \end{cases}$    (4.75)
where $\Gamma(\varphi)$ contains the indexes for the $\lambda$ nearest neighbors of $\varphi$.
Notice that the nearest neighbors estimators have no parameters to be estimated, and that no a priori assumption on the shape of the function is made. Therefore these methods are typical examples of non-parametric regression methods [18][25][26]. The $\lambda$ is seen as a smoothing parameter concerning structure selection. Proper selection of $\lambda$ is important: notice that as $\lambda$ increases the bias increases and the variance decreases, and vice versa. The linear smoother is given by $\hat{y} = Sy$, where $S$ is, roughly, a $\lambda$-banded matrix (provided a suitable ordering). For $\lambda = 1$, the smoother matrix is given by the identity matrix, $S = I$. For $\lambda > 1$, the equivalent kernel is a rectangular one; note that many other types of kernels can be considered.
When each basis function is associated with a constant output at the model output space, piecewise constant functions can be constructed. With overlapping partitioning the resolution of the mapping is enhanced, since the model outcome is an average of the activated constant values. However, the result is still a piecewise constant function. Often, this is an undesirable property for a model. Consider applications of control, for example: based on a piecewise constant type of model, the gain of the system is either zero or undefined! There are many ways to arrive at smoother models. It is common to use prototypes (instead of direct data points), multiple winners (instead of a single winner) and Gaussian kernels (instead of rectangular kernels), as in the radial basis function networks to be considered in the next subsection.
Radial basis function networks
In radial basis function (RBF) networks, the input space is partitioned using radial basis functions. The center and width of each basis function are adjustable. Each basis function is associated with an adjustable weight, and the output estimate is produced by computing a weighted sum at the output unit. Hence, the output of the network is a linear superposition of the activities of all the radial functions in the network. The RBF network is often given in the normalized form:
$\hat{y} = f(\varphi, \beta, \gamma) = \sum_{h=1}^{H} \alpha_h g_h(\varphi, \beta_h, \gamma_h)$    (4.76)
The normalization makes the total sum of all basis functions $g_h$ unity in the whole operating region. Gaussian functions are typically used with RBF networks:
(4.77)
The use of RBF networks requires that a suitable partitioning of the input space can be obtained. In simple methods, the locations of the Gaussians are taken from arbitrary data points or from a uniform lattice in the input space. Alternatively, clustering techniques, such as the SOM, can be used to choose the centers. In the orthogonal least-squares method, locations are selected from data points one by one, maximizing the increment of the explained variance of the desired output. Fig. 4.4 illustrates function approximation with non-overlapping partitioning of the input space (left) and overlapping partitioning (right).
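The difference between single-winner, multiple-winner and normalized Gaussian estimates can be sketched as below. The Gaussian kernel used here, its width parameterization, and the function names are assumptions for illustration; they are not the definition of the basis function given in the source.

```python
import math

def nearest_neighbors(phi, data, n_neighbors=1):
    """Average the outputs of the n_neighbors patterns closest to phi.

    data is a list of (input_tuple, output) pairs; distance is Euclidean.
    With n_neighbors=1 this is the plain nearest neighbor estimate.
    """
    ranked = sorted(data, key=lambda pair: math.dist(phi, pair[0]))
    winners = ranked[:n_neighbors]
    return sum(y for _, y in winners) / len(winners)

def normalized_rbf(phi, centers, widths, weights):
    """Weighted sum of Gaussian activations, normalized to sum to one.

    The Gaussian form is an assumed, illustrative choice. Far from all
    centers the activations may underflow to zero; that case is not handled.
    """
    act = [math.exp(-(math.dist(phi, c) / w) ** 2)
           for c, w in zip(centers, widths)]
    total = sum(act)
    return sum(a * k for a, k in zip(weights, act)) / total

# Hypothetical one-dimensional data pool.
data = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 4.0)]
print(nearest_neighbors((0.9,), data))
print(nearest_neighbors((0.9,), data, n_neighbors=2))
print(normalized_rbf((0.5,), [(0.0,), (1.0,)], [0.5, 0.5], [0.0, 1.0]))
```

The nearest neighbor estimates are piecewise constant in the input, whereas the normalized Gaussian estimate varies smoothly between the weights associated with neighboring centers.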
Figure 4.4 (panels: SOM; Normalized Gaussians [0.67]): Examples of identification using local basis functions. The lower pictures illustrate the partitioning found by SOM (left), and normalized Gaussians (right) placed at the same centers, as well as the associated values of the LS weighting constants. The upper figures depict the data patterns (dots) and the mapping obtained at the output of the structure (solid line).
4.2.4 Fuzzy inference systems
Fuzzy modeling [73][50] stems from advances in logic and cybernetics. Originally, fuzzy systems were developed in the 1960s as an outcome of fuzzy set theory. Fuzzy sets are a mathematical means to represent vague information. In applications to process modeling and control, the uncertainty handling aspects have, however, received less interest. Instead, the focus has been on extending the interpolation capabilities of rule-based expert systems. In what follows, we will focus on fuzzy rule-based systems. For more general fuzzy systems, please see the remarks at the end of this section.
In rule-based inference systems, the universe is partitioned using concepts, modeled via sets. Reasoning is then based on expressions of logical relationships between the concepts: if-then rules. In expert systems, binary-valued logic is applied; fuzzy systems belong to the class of multi-valued logic systems.
Expert systems are rule-based systems. The knowledge about the process is represented using rules, such as
if premise then consequent    (4.78)
Traditional expert systems use crisp rules based on two-state logic, where elements either belong or do not belong to a given class. The propositions (premise and consequent) represented by the rule can be either true or false. In real life, the classes are ill-defined, overlapping, or fuzzy, and a pattern may belong to more than one class. Such nuances can be described with the help of fuzzy sets. In a fuzzy context, a pattern may be assigned a membership value, which represents its degree of membership in a fuzzy set, $\mu \in [0, 1]$.
Example 28 (Fuzzy and crisp sets) It might be difficult to classify the speed of a car as 'fast' or 'not fast' because human reasoning recognizes different shades of meaning between the two concepts [50]. Fig. 4.5 illustrates crisp and fuzzy concepts of 'fast'. The domain knowledge is expressed as if-then rules, which relate the input fuzzy sets with the model outcome (if speed is fast then move away). Note how the use of the adjective 'fast' to characterize the speed of an approaching car is entirely sufficient to signal the necessity to move away; the precise velocity of the car at this moment is not important.
The if-then rule structure of fuzzy inference systems is convenient in that it is easily comprehensible, being close to one of the ways humans store knowledge. It also provides explanations for the model outcome, since it is always possible to find out the exact rules that were fired and to convert these into semantically meaningful sentences. In this sense, the fuzzy inference systems also provide insight and understanding of the considered process, and support 'what if' types of analysis.
Fuzzy systems in process modeling
From the process modeling point of view, the applications have shown two main contributions due to the use of fuzzy systems:
• A reduction of the complexity of systems, based on the use of fuzzy sets; and
• A transparent form of reasoning (similar to the conscious reasoning by humans).
Figure 4.5 (panels: fuzzy 'FAST'; crisp 'FAST'; speed axis marked at 60, 80, 100 and 120): Examples of a crisp set and a fuzzy set describing the 'fast' speed of a car. A crisp set has sharp boundaries and its membership function assumes binary values in {0, 1}. A fuzzy set has vague boundaries and its membership function takes values in [0, 1].
Neural networks have been shown to be very efficient in their function approximation capabilities (see e.g. [27][29]), that is, in mimicking the observed input-output behavior. Unfortunately, neural networks appear as black-box models to the developer and end-user. The disadvantage of black-box models is that, although they seem to provide the correct functional mappings, they do not easily give any additional explanation of what this mapping is composed of, or make it easier to understand the nature of the relation between the function inputs and outputs. This lack of transparency might lead to difficulties if human intervention or man-machine interaction is required or expected. This is often the case when models are utilized for optimization or monitoring purposes. Obviously, some transparency would also help the model developer to evaluate the validity of the model and to locate unsatisfactory behavior when further model development is needed. The need for transparency has motivated the use of fuzzy systems.
In process modeling, fairly simple fuzzy models have been applied. In general, fuzzy modeling can be an efficient way to quickly build a model or
a controller for a process, when only rough information is available. Also nonlinear systems can be considered without extra effort. In a 'standard' learning approach:
1. fuzzy sets and rules are stated by the experts (plant operators, engineers),
2. the system structure is established, and
3. the membership functions and/or output constants are fine-tuned using data.
This allows us to build a model of a system based on experimental human knowledge. Alternatively, one may start from a nominal model, in which case the motivation for using the fuzzy approach comes from the ease of validation and the possibility to tune the system manually.
Sugeno and Mamdani fuzzy models
Fuzzy models can also be seen as based on local basis functions [86]. In a fuzzy system, the input partitioning is given by the premises of rules. In Mamdani fuzzy models, both the premise and the consequent of a fuzzy rule are specified using fuzzy sets:
if {($\varphi_1$ is $A_{h,1}$) and ... and ($\varphi_I$ is $A_{h,I}$)} then ($\hat{y}$ is $B_h$)    (4.79)
where $A_{h,i}$ and $B_h$ are fuzzy sets specifying the $h$'th rule and $I$ is the input dimension. In order to get a crisp output from the fuzzy inference, defuzzification is needed to convert the inferred fuzzy output into a crisp singleton.
In Sugeno fuzzy models, the consequent of a rule is a parameterized function of the input variables. Hence the rules assume the form:
if {($\varphi_1$ is $A_{h,1}$) and ... and ($\varphi_I$ is $A_{h,I}$)} then ($\hat{y} = f_h(\varphi, \cdot)$)    (4.80)
Typically, the functions $f_h$ are constants (0-order Sugeno model) or linear polynomials. With Sugeno models, the consequences of multiple rules are combined by summing, weighting the rules with the normalized activation level of each rule. 0-order Sugeno models can be viewed as a special case of Mamdani models, in which each rule's consequent is specified by a constant (a singleton fuzzy set).
In order to compute the fuzzy rules, the operations on fuzzy sets (is, and) need to be specified, as well as the inference (if premise then consequent). A common choice is to implement the '$\varphi_i$ is $A_{h,i}$' by evaluating a triangular
102
CHAPTER 4.
NONLINEAR SYSTEMS
membershipfunction (or Gaussian, or bellshaped function), and the ’p and q’ operation as a product (or minimum).The ifthen inference (implication) is usually seen as a binaryvaluedrelation, true for the sets contai~.~edin the rule, zero elsewhere. Fuzzy inference
systems
Let us have a brief look at the logic background concerning fuzzy sets and reasoning in fuzzy systems. For more information, see, e.g., [40]. A fuzzy set A of X is expressed by its membership function $\mu_A$ from the universe of discourse to the unit interval

$$\mu_A : X \to [0,1]$$    (4.81)

$\mu_A(\varphi)$ expresses the extent to which $\varphi$ fulfills the category specified by $\mu_A$, where X is the universe of discourse (domain) of $\varphi$.

Fuzzy inference systems consist of five blocks:
• fuzzification
• data base
• rule base
• decision logic
• defuzzification

The fuzzification block converts the system input $\varphi = \varphi_0 \in \mathbb{R}$ into a fuzzy set $A'$ on X. Its membership function $\mu_{A'}(\varphi)$ is usually defined by the point fuzzification

$$\mu_{A'}(\varphi) = \begin{cases} 1 & \text{if } \varphi = \varphi_0 \\ 0 & \text{otherwise} \end{cases}$$    (4.82)

Alternative fuzzifications can be used if information about the uncertainty of the measurement $\varphi = \varphi_0$ is available, or if the measurement itself is not crisp. The data base contains information about the fuzzy sets, the $\mu_{A'}(\varphi)$'s (fuzzification) and the $\mu_A(\varphi)$'s and $\mu_B(y)$'s (rules), and the associated linguistic terms, the A's and B's (rules). The rule base is a set of linguistic statements: rules. The rules assume the form

if ($\varphi_1$ is $A_1$) and ... and ($\varphi_I$ is $A_I$) then ($\hat{y}$ is $B$)    (4.83)
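where A and B are linguistic terms defined by fuzzy sets in the data base. This can be translated into a simpler form using the fuzzy and⁴:

$$\mu_A(\varphi) = T\left(\mu_{A_1}(\varphi_1), \ldots, \mu_{A_I}(\varphi_I)\right)$$    (4.84)

A rule can be seen as a fuzzy implication function I

$$I\left(\mu_A(\varphi), \mu_B(\hat{y})\right)$$    (4.85)

which is often modeled using a t-norm. The decision logic processes the input fuzzy signals using linguistic rules. Let us derive the modus ponens inference assuming a point fuzzification.

if ($\varphi$ is $A$) then ($\hat{y}$ is $B$)
$\varphi$ is $A'$
$\Longrightarrow$ $\hat{y}$ is $B'$    (4.86)

where

$$\mu_{B'}(\hat{y}) = \sup_{\varphi} \left( T\left\{ \mu_{A'}(\varphi), I\left[\mu_A(\varphi), \mu_B(\hat{y})\right] \right\} \right)$$    (4.87)

Since the input is a point $\varphi_0$ (we assume point fuzzification here), then

$$\mu_{A'}(\varphi) = \begin{cases} 1 & \text{if } \varphi = \varphi_0 \\ 0 & \text{otherwise} \end{cases}$$    (4.88)

and the result can be expressed as

$$\mu_{B'}(\hat{y}) = T\left\{ 1, I\left[\mu_A(\varphi_0), \mu_B(\hat{y})\right] \right\}$$    (4.89)
$$\phantom{\mu_{B'}(\hat{y})} = I\left[\mu_A(\varphi_0), \mu_B(\hat{y})\right]$$    (4.90)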
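In general, the inference of the h'th rule, h = 1, 2, ..., H, for input $\varphi_0$ can be expressed as

$$\mu_{B'_h}(\hat{y}) = \begin{cases} I\left[\mu_{A_h}(\varphi_0), \mu_{B_h}(\hat{y})\right] & \text{if } \mu_{A_h}(\varphi_0) > 0 \\ 0 & \text{otherwise} \end{cases}$$    (4.91)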
⁴ Basic operations (intersection ⇒ fuzzy and, union ⇒ fuzzy or) on fuzzy sets can be defined using t-norms and s-norms. T-norms are monotonic nondecreasing: a ≤ b ⇒ T(a, c) ≤ T(b, c); commutative: T(a, b) = T(b, a); associative: T(a, T(b, c)) = T(T(a, b), c); and have 1 as unit element: T(1, a) = a. Any t-norm is related to its s-norm (t-conorm) by the De Morgan law S(a, b) = 1 − T(1 − a, 1 − b). The common t-norms include the product, T(a, b) = ab, and the minimum, T(a, b) = min(a, b). The related s-norms are the probabilistic sum, S(a, b) = a + b − ab, and the maximum, S(a, b) = max(a, b).
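The t-norm and s-norm pairs listed in the footnote can be checked numerically. The following is a minimal sketch, not part of the original text; the function names (`t_product`, `t_minimum`, `s_from_t`) are illustrative choices. It builds each s-norm from its t-norm through the De Morgan law S(a, b) = 1 − T(1 − a, 1 − b).

```python
def t_product(a, b):
    """Product t-norm: T(a, b) = a * b."""
    return a * b


def t_minimum(a, b):
    """Minimum t-norm: T(a, b) = min(a, b)."""
    return min(a, b)


def s_from_t(t_norm):
    """Return the s-norm dual to t_norm via S(a, b) = 1 - T(1 - a, 1 - b)."""
    def s_norm(a, b):
        return 1.0 - t_norm(1.0 - a, 1.0 - b)
    return s_norm


s_probabilistic_sum = s_from_t(t_product)  # equals a + b - a*b
s_maximum = s_from_t(t_minimum)            # equals max(a, b)

a, b = 0.3, 0.6
print(t_product(a, b), s_probabilistic_sum(a, b))
print(t_minimum(a, b), s_maximum(a, b))
```

Deriving the s-norms from the t-norms, rather than coding them separately, keeps each pair consistent by construction.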
which in the case of t-norm implications can be further simplified to

$$\mu_{B'_h}(\hat{y}) = T\left[\mu_{A_h}(\varphi_0), \mu_{B_h}(\hat{y})\right]$$    (4.92)

The combination of all fuzzy inferences is made by means of an s-norm

$$\mu_{B'}(\hat{y}) = \mathop{S}_{h=1,2,\ldots,H} \mu_{B'_h}(\hat{y})$$    (4.93)

The defuzzification determines (converts) the fuzzy output of the decision logic into a crisp output value. A common choice is the center of area method

$$\hat{y} = \frac{\int_Y y\, \mu_{B'}(y)\, dy}{\int_Y \mu_{B'}(y)\, dy}$$    (4.94)

In the case of 0-order Sugeno fuzzy models, if ($\varphi$ is $A_h$) then ($\hat{y} = y_h$), we can think of the output sets in (4.92) as given by singleton sets: $\mu_{B_h}(\hat{y}) = 1$ if $\hat{y} = y_h$, zero elsewhere. The defuzzification can then be replaced by a weighted average

$$\hat{y} = \frac{\sum_{h=1}^{H} y_h\, \mu_{A_h}(\varphi_0)}{\sum_{h=1}^{H} \mu_{A_h}(\varphi_0)}$$    (4.95)
0-order Sugeno fuzzy model

0-order Sugeno fuzzy models are common in many process engineering applications. They represent a simple case of the more general fuzzy inference systems. Very often, the following choices are made:
• system inputs are crisp,
• product is chosen for the fuzzy and, and
• weighted sum is chosen for the defuzzification.

With these choices we arrive at the following 0-order Sugeno fuzzy model.

Definition 11 (0-order Sugeno fuzzy model) A 0-order Sugeno model with H rules is a function f from an I-dimensional column vector of model inputs $\varphi$ to the model output $\hat{y}$:

$$\hat{y} = f(\varphi, \alpha, \cdot) = \frac{\sum_{h=1}^{H} \alpha_h\, g_h(\varphi, \cdot)}{\sum_{h=1}^{H} g_h(\varphi, \cdot)}$$    (4.96)
where

$$g_h(\varphi, \cdot) = \prod_{i=1}^{I} \mu_{h,i}(\varphi_i, \cdot)$$    (4.97)

where $\mu_{h,i}(\varphi_i, \cdot) \in [0, 1]$ is the degree of membership of the i'th input in the premise of the h'th rule. Let us slightly change the notation. Assume that an add-one partitioning is used, see Fig. 4.6, where the domain of each input $\varphi_i$ is partitioned separately such that

$$\sum_{p_i=1}^{P_i} \tilde{\mu}_{i,p_i}(\varphi_i, \cdot) = 1 \quad \text{for all } \varphi_i$$    (4.98)

where $P_i$ is the number of fuzzy sets used for partitioning the domain of the i'th input. Notice that it is usually simple to set the membership functions such that an add-one partitioning is obtained. The tilde emphasizes this difference in the notation (strong fuzzy partition).

Figure 4.6: Add-one partition of the domain of the i'th input $\varphi_i$ (i = 1, 2, ..., I) using triangular fuzzy sets. The centers of the sets are given by $\beta_{i,p}$ (p = 1, 2, ..., $P_i$). The bold line shows the membership function $\tilde{\mu}_{i,3}(\varphi_i)$.

The following result can be derived:

Theorem 1 (Add-one partition) Assume that each input domain i is partitioned such that

$$\sum_{p_i=1}^{P_i} \tilde{\mu}_{i,p_i}(\varphi_i, \cdot) = 1$$    (4.99)
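where $\tilde{\mu}_{i,p_i} \in [0, 1]$ are the $P_i$ membership functions used for partitioning the domain of the i'th input $\varphi_i \in \mathbb{R}$, i = 1, 2, ..., I. In addition, suppose that the rule base is complete and that the product t-norm is used. Then, the sum of basis functions in a 0-order Sugeno model is given by

$$\sum_{h=1}^{H} g_h(\varphi, \cdot) = \sum_{p_1=1}^{P_1} \cdots \sum_{p_I=1}^{P_I} \prod_{i=1}^{I} \tilde{\mu}_{i,p_i}(\varphi_i, \cdot)$$    (4.100)

where $h = 1, 2, \ldots, H = \prod_{i=1}^{I} P_i$ are the H rules. The sum of basis functions is equal to one

$$\sum_{h=1}^{H} g_h(\varphi, \cdot) = 1$$    (4.101)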
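for all $\varphi_i$.

A typical add-one partition is obtained using triangular membership functions:

$$\tilde{\mu}_{i,p_i}(\varphi_i, \cdot) = \begin{cases} \dfrac{\varphi_i - \beta_{i,p_i-1}}{\beta_{i,p_i} - \beta_{i,p_i-1}} & \text{if } \beta_{i,p_i-1} < \varphi_i \le \beta_{i,p_i} \\[2ex] \dfrac{\beta_{i,p_i+1} - \varphi_i}{\beta_{i,p_i+1} - \beta_{i,p_i}} & \text{if } \beta_{i,p_i} < \varphi_i < \beta_{i,p_i+1} \\[2ex] 0 & \text{otherwise} \end{cases}$$    (4.103)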
where $\beta_{i,p_i-1} < \beta_{i,p_i}$. Observe how any crisp input $\varphi_i$ can have nonzero degrees of membership in at most two fuzzy sets $\tilde{\mu}_{i,p_i}$. Hence, we have the following simpler result.

Algorithm 18 (0-order add-1 Sugeno fuzzy model) Assume crisp system inputs, an add-one partition, product t-norm, and weighted average defuzzification. Then, a 0-order Sugeno model is given by

$$\hat{y} = \sum_{h=1}^{H} \alpha_h \prod_{i=1}^{I} \mu_{h,i}(\varphi_i, \cdot)$$    (4.104)

where $\mu_{h,i}(\varphi_i, \cdot)$ is the membership function associated with the h'th rule and the i'th input. Equivalently, we can write

$$\hat{y} = \sum_{p_1=1}^{P_1} \sum_{p_2=1}^{P_2} \cdots \sum_{p_I=1}^{P_I} \tilde{\alpha}_{p_1,p_2,\ldots,p_I}\, \tilde{\mu}_{1,p_1}(\varphi_1, \cdot)\, \tilde{\mu}_{2,p_2}(\varphi_2, \cdot) \cdots \tilde{\mu}_{I,p_I}(\varphi_I, \cdot)$$    (4.105)

where $\tilde{\mu}_{i,p_i}(\varphi_i, \cdot)$ is the membership function associated with the $p_i$'th set partitioning the i'th input ($p_i = 1, 2, \ldots, P_i$).
Notice that at each point in the input space which is a center of some triangular fuzzy set, i.e., $\varphi_i = \beta_{i,p_i}\ \forall i$, the output of the system is given by $\tilde{\alpha}_{p_1,p_2,\ldots,p_I}$.
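The 0-order add-one Sugeno model above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the helper names (`triangular_memberships`, `sugeno0`), the dictionary layout of the rule constants, and the clipping of out-of-range inputs to the outermost centers are all choices made for this example.

```python
from itertools import product


def triangular_memberships(x, centers):
    """Degrees of membership of x in triangular add-one sets.

    centers must be sorted and contain at least two values. Inputs outside
    [centers[0], centers[-1]] are clipped to the nearest end (an assumption).
    """
    x = min(max(x, centers[0]), centers[-1])
    mu = [0.0] * len(centers)
    for p in range(len(centers) - 1):
        left, right = centers[p], centers[p + 1]
        if left <= x <= right:
            w = (x - left) / (right - left)
            mu[p], mu[p + 1] = 1.0 - w, w   # at most two nonzero memberships
            break
    return mu


def sugeno0(phi, centers, alpha):
    """Model output: sum over all rules of alpha times the product of memberships.

    alpha maps an index tuple (p1, ..., pI) to the constant of that rule.
    """
    mus = [triangular_memberships(x, c) for x, c in zip(phi, centers)]
    y = 0.0
    for idx in product(*(range(len(c)) for c in centers)):
        g = 1.0
        for i, p in enumerate(idx):
            g *= mus[i][p]
        y += alpha[idx] * g
    return y


# Hypothetical two-input example: 3 sets on the first input, 2 on the second.
centers = [[0.0, 1.0, 2.0], [0.0, 10.0]]
alpha = {(p1, p2): 10.0 * p1 + p2 for p1 in range(3) for p2 in range(2)}

print(sugeno0([1.0, 10.0], centers, alpha))  # input at set centers
print(sugeno0([0.5, 5.0], centers, alpha))   # input between centers
```

Because the memberships of each input sum to one, no normalization is needed, and an input placed exactly on a set center returns that rule's constant.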
Example 29 (Fuzzy PI controller) A PI controller is given by

$$u = K_P\, e + K_I \int e\, dt$$    (4.106)

and can be rewritten in an incremental form

$$\Delta u(k) = K_P\, \Delta e(k) + K_I\, e(k)$$    (4.107)

where the control applied to the plant is given by $u(k) = u(k-1) + \Delta u(k)$. Let us develop a fuzzy PI-type controller. Clearly, the system has two inputs: the error e(k) and the change-of-error Δe(k), I = 2. For simplicity, let us choose $P_1 = 5$ with linguistic labels negative big (NB), negative small (NS), zero (Z), positive small (PS), positive big (PB) defined by the centers of triangular add-1 fuzzy sets $\beta_1 = [\beta_{1,1}, \ldots, \beta_{1,5}]$, and $P_2 = 3$ (negative (N), zero (Z), positive (P), set by $\beta_2 = [\beta_{2,1}, \beta_{2,2}, \beta_{2,3}]$). This can be written as a Sugeno model

$$\Delta u(k) = \sum_{p_1=1}^{5} \sum_{p_2=1}^{3} \tilde{\alpha}_{p_1,p_2}\, \tilde{\mu}_{1,p_1}(e(k))\, \tilde{\mu}_{2,p_2}(\Delta e(k))$$    (4.108)
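where $\tilde{\mu}_{1,1}, \tilde{\mu}_{1,2}, \ldots$ represent the degrees of membership for the propositions (fuzzy predicates)

error is negative big
error is negative small    (4.109)

etc. Similarly, the products $\tilde{\mu}_{1,1}\tilde{\mu}_{2,1}, \tilde{\mu}_{1,1}\tilde{\mu}_{2,2}, \ldots$ can be interpreted as the truth values of the propositions

error is negative big and change-of-error is negative
error is negative big and change-of-error is zero    (4.110)

etc. The entire rule base can be collected in a table format, showing the values of Δu(k) (rows: error e(k); columns: change-of-error Δe(k)):

        N                         Z                         P
NB      $\tilde{\alpha}_{1,1}$    $\tilde{\alpha}_{1,2}$    $\tilde{\alpha}_{1,3}$
NS      $\tilde{\alpha}_{2,1}$    $\tilde{\alpha}_{2,2}$    $\tilde{\alpha}_{2,3}$
Z       $\tilde{\alpha}_{3,1}$    $\tilde{\alpha}_{3,2}$    $\tilde{\alpha}_{3,3}$
PS      $\tilde{\alpha}_{4,1}$    $\tilde{\alpha}_{4,2}$    $\tilde{\alpha}_{4,3}$
PB      $\tilde{\alpha}_{5,1}$    $\tilde{\alpha}_{5,2}$    $\tilde{\alpha}_{5,3}$
                                                                                  (4.111)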
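Often, linguistic labels are also assigned for the output singletons $\tilde{\alpha}_{p_1,p_2}$, in order to further enhance the transparency of the controller.

Next, let us consider the derivatives of a 0-order add-1 Sugeno model. The model is given by

$$f(\varphi, \cdot) = \sum_{p_1=1}^{P_1} \sum_{p_2=1}^{P_2} \cdots \sum_{p_I=1}^{P_I} \tilde{\alpha}_{p_1,p_2,\ldots,p_I} \prod_{i=1}^{I} \tilde{\mu}_{i,p_i}(\varphi_i, \cdot)$$    (4.112)

The derivatives with respect to the parameters $\tilde{\alpha}_{p_1,p_2,\ldots,p_I}$ are simple to calculate. It follows

$$\frac{\partial f(\varphi, \cdot)}{\partial \tilde{\alpha}_{p_1,p_2,\ldots,p_I}} = \prod_{i=1}^{I} \tilde{\mu}_{i,p_i}(\varphi_i, \cdot)$$    (4.113)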
If only the $\tilde{\alpha}$'s are of interest, notice that these parameters appear linearly and, e.g., least squares can be used for their estimation. Also the gradients with respect to the parameters $\beta$ can be calculated. However, the tuning of fuzzy sets using data is more complicated for various reasons (in particular, the transparency of the model may easily be lost). Therefore, this is omitted here. In order to get a linearized approximation of the Sugeno model in the neighborhood of its operating point $\bar{\varphi}$ (4.114) the derivatives with respect to the inputs need to be computed:

$$\frac{\partial}{\partial \varphi_i} f(\varphi, \cdot) = \frac{\partial}{\partial \varphi_i} \sum_{p_1=1}^{P_1} \sum_{p_2=1}^{P_2} \cdots \sum_{p_I=1}^{P_I} \tilde{\alpha}_{p_1,p_2,\ldots,p_I} \prod_{j=1}^{I} \tilde{\mu}_{j,p_j}(\varphi_j, \cdot)$$    (4.115)
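Separating terms not depending on $\varphi_i$ and moving the derivation operator inside the summation gives

$$\frac{\partial}{\partial \varphi_i} f(\varphi, \cdot) = \sum_{p_1=1}^{P_1} \sum_{p_2=1}^{P_2} \cdots \sum_{p_I=1}^{P_I} \tilde{\alpha}_{p_1,p_2,\ldots,p_I} \left[ \prod_{j=1;\, j \ne i}^{I} \tilde{\mu}_{j,p_j}(\varphi_j, \cdot) \right] \frac{\partial \tilde{\mu}_{i,p_i}(\varphi_i, \cdot)}{\partial \varphi_i}$$    (4.116)

where the derivative of the triangular membership function, (4.103), is given by

$$\frac{\partial \tilde{\mu}_{i,p_i}(\varphi_i, \cdot)}{\partial \varphi_i} = \begin{cases} \dfrac{1}{\beta_{i,p_i} - \beta_{i,p_i-1}} & \text{if } \beta_{i,p_i-1} < \varphi_i < \beta_{i,p_i} \\[2ex] -\dfrac{1}{\beta_{i,p_i+1} - \beta_{i,p_i}} & \text{if } \beta_{i,p_i} < \varphi_i < \beta_{i,p_i+1} \\[2ex] 0 & \text{otherwise} \end{cases}$$    (4.117)

Notice that the gradient (4.117) is piecewise constant. Thus, the considered Sugeno model can be seen as a piecewise multilinear system and the interpolation properties of the system are particularly well defined.

Example 30 (Fuzzy PI controller: continued) Assume that at present a plant operates under a linear PI controller (or that this has been designed using, e.g., the Ziegler-Nichols rules), and that this PI controller is to be improved by designing a fuzzy PI controller. Thus, the parameters $K_P$ and $K_I$ of a linear PI controller $\Delta u(k) = K_P\, \Delta e(k) + K_I\, e(k)$ are a priori known. In order to use the nominal system as a starting point in the design of a fuzzy PI control, the equivalent fuzzy representation is needed. First, the input space needs to be partitioned. Assume that reasonable bounds $[e_{\min}, e_{\max}]$ and $[\Delta e_{\min}, \Delta e_{\max}]$ can be set. Initially, we can place the centers of add-1 triangular fuzzy sets, e.g., at equidistant intervals (using $P_1 = 5$, $P_2 = 3$):

$$\beta_1 = [e_{\min},\ e_{\min} + z_e,\ e_{\min} + 2 z_e,\ e_{\min} + 3 z_e,\ e_{\max}]$$    (4.118)
$$\beta_2 = [\Delta e_{\min},\ \Delta e_{\min} + z_{\Delta e},\ \Delta e_{\max}]$$    (4.119)

where $z_e = (e_{\max} - e_{\min})/4$ and $z_{\Delta e} = (\Delta e_{\max} - \Delta e_{\min})/2$. The $\tilde{\alpha}_{p_1,p_2}$ remain to be specified. As the nominal system is given, we can set the $\tilde{\alpha}_{1,1}, \tilde{\alpha}_{2,1}, \ldots, \tilde{\alpha}_{5,3}$ to correspond to the system output at points $(\beta_{1,1}, \beta_{2,1}), (\beta_{1,2}, \beta_{2,1}), \ldots, (\beta_{1,5}, \beta_{2,3})$ by assigning

$$\tilde{\alpha}_{p_1,p_2} = K_P\, \beta_{2,p_2} + K_I\, \beta_{1,p_1}$$    (4.120)

Since this type of Sugeno model is (piecewise multi)linear, the resulting fuzzy PI now produces exactly the same function as the nominal linear PI, as long as the inputs are within the given ranges, i.e., $e \in [e_{\min}, e_{\max}]$ and $\Delta e \in [\Delta e_{\min}, \Delta e_{\max}]$.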
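The initialization in Example 30 can be checked numerically. The sketch below is illustrative only: the gains, the input bounds, and the helper names (`tri`, `fuzzy_pi`) are hypothetical, and the rule constants are set to the nominal PI output at the set centers, Kp·Δe + Ki·e, as the example describes.

```python
def tri(x, centers):
    """Memberships of x in triangular add-one sets (sorted centers, at least two)."""
    x = min(max(x, centers[0]), centers[-1])   # clip to the partitioned range
    mu = [0.0] * len(centers)
    for p in range(len(centers) - 1):
        left, right = centers[p], centers[p + 1]
        if left <= x <= right:
            w = (x - left) / (right - left)
            mu[p], mu[p + 1] = 1.0 - w, w
            break
    return mu


def fuzzy_pi(e, de, beta1, beta2, alpha):
    """Control increment of the 0-order Sugeno PI controller (double sum over rules)."""
    mu1, mu2 = tri(e, beta1), tri(de, beta2)
    return sum(alpha[p1][p2] * mu1[p1] * mu2[p2]
               for p1 in range(len(beta1)) for p2 in range(len(beta2)))


# Hypothetical nominal PI gains and input bounds.
Kp, Ki = 2.0, 0.5
e_min, e_max = -1.0, 1.0
de_min, de_max = -0.2, 0.2

# Equidistant centers: 5 sets for the error, 3 for the change-of-error.
beta1 = [e_min + p * (e_max - e_min) / 4 for p in range(5)]
beta2 = [de_min + p * (de_max - de_min) / 2 for p in range(3)]

# Rule constants: nominal PI output evaluated at the set centers.
alpha = [[Kp * b2 + Ki * b1 for b2 in beta2] for b1 in beta1]

e, de = 0.37, -0.11
print(fuzzy_pi(e, de, beta1, beta2, alpha), Kp * de + Ki * e)
```

Inside the bounds the two printed values agree up to rounding, since multilinear interpolation reproduces an affine function exactly. From this starting point, individual rule constants can be retuned to introduce nonlinear behavior.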
Let us conclude this section by making a few remarks.

Remark 4 (Extension principle) The extension principle, or compositional rule of inference, is a means for extending any mapping of a fuzzy set from one space to another. Let A be a fuzzy set defined in X, and f be a mapping from X to Y, $f: X \to Y$. Then a mapping of A via f is a fuzzy set $\mu_B(A)$ defined in Y. The membership function is computed according to:

$$[\mu_B(A)](y) = \sup_{\text{all } x \in X \text{ for which } y = f(x)} [\mu_A(x)]$$    (4.121)
assuming that $\sup \emptyset = 0$ (when no element of X is mapped to y). In the MISO case, we have

$$[\mu_B(A)](y) = \sup_{x_1 \in X_1, \ldots, x_I \in X_I \text{ for which } y = f(x_1, \ldots, x_I)} [\mu_A(x)]$$

where $\mu_A(x) = T[\mu_{A_1}(x_1), \ldots, \mu_{A_I}(x_I)]$.

Example 31 (Extension principle) Figure 4.7 illustrates the extension principle for mapping a fuzzy set A (characterized by $\mu_A$) through a function f. The result is a fuzzy set B (characterized by $\mu_B$).

Figure 4.7: Mapping a fuzzy set A through a function f.

Example 32 (0-order Sugeno fuzzy system) Figure 4.8 shows an illustration of the extension principle for a fuzzy input A' and a function given by sampled data points {y} = f{x}. The output is a fuzzy set on a discrete domain Y. Note that the fuzzy input A' may be a result of fuzzification of a nonfuzzy input $x_0$. The 'fuzziness' in A' together with a defuzzification method then determines the interpolation/smoothing properties of the system.

Figure 4.8: Mapping a fuzzy input A through a 'function' given by sampled data points.

Example 33 (0-order Sugeno fuzzy system: continued) Figure 4.9 shows an illustration of the extension principle for a crisp input $x_0$ and a function given by 'fuzzy rules' (constant local models). The outputs are fuzzy singletons on a discrete domain Y. Note that the input is crisp (or point fuzzification is used). The fuzziness in rule antecedents, as well as the defuzzification method, determines the interpolation properties of the system.

Figure 4.9: Mapping a crisp input through a 'function' given by fuzzy rules.

Remark 5 (Fuzzy neural networks) During the past few years, the close connections between fuzzy and neural systems have been recognized (see, e.g., [86][34]). Fuzzy neural networks try to benefit from the advantages of both neural and fuzzy approaches. Functional equivalence of some neural and fuzzy paradigms has been established, and common frameworks, such as the (generalized) basis function network, have been introduced. The links to the 'old' methods of parameter estimation have become apparent, which has enabled the application of efficient parameter estimation methods. Fuzzy neural networks emphasize that the model contents can be presented as linguistic rules or as numerical parameters. The former allows the use of human experimental knowledge in initializing model parameters, complementing missing data, and validating the identified model. The latter enables, e.g., the application of efficient optimization methods for parameter estimation. In most fuzzy neural network approaches found in the literature, the learning abilities of neural networks are applied to structures sharing the transparent logical interpretability of fuzzy systems.
Chapter 5 Nonlinear Dynamic Structures
The best approach for describing nonlinear dynamic systems is to consider the a priori physical information about the system to be characterized. In many cases, suitable information is not available and the designer needs to turn to semi-empirical or black-box methods. In nonlinear dynamic systems, the output of the system depends, often in a complex way, on the past outputs, inputs or internal components of the system. The main problems in system identification are in structure selection, whereas efficient gradient-based or guided random search methods are available for solving the associated parameter estimation problems, even if the model is not linear with respect to the parameters.

A direct extension of linear dynamic models is the Volterra series representation [92]. The Volterra representation is very general. In practice, however, a finite truncation of the series must be used, and a discrete approximation of the series made. For a SISO system, a Volterra model can be given by

$$\hat{y}(k) = w_0 + \sum_{n=1}^{N} w_n\, u(k-n) + \sum_{n_1=1}^{N} \sum_{n_2=1}^{N} w_{n_1,n_2}\, u(k-n_1)\, u(k-n_2) + \cdots + \sum_{n_1=1}^{N} \cdots \sum_{n_P=1}^{N} w_{n_1,\ldots,n_P}\, u(k-n_1) \cdots u(k-n_P)$$    (5.1)

where y(k) and u(k) are the system output and input at discrete instant k. The system parameters are given by $w_0, w_n, \ldots, w_{n_1,n_2,\ldots,n_P}$ ($n, n_i = 1, 2, \ldots, N$; $i = 1, 2, \ldots, P$). N and P are the orders of the system. The order N is related to the length of the time window (zeros of the polynomials), and the order P is related to the nonlinearity of the mapping. The model output is linear with respect to its parameters, which makes the parameter estimation simple. Extension into the MISO case is straightforward.

However, there are theoretical and practical drawbacks associated with the Volterra series [92]. In particular, the system may contain a large number of parameters and suffer from the curse of dimensionality. Due to this, practical applications of Volterra series are often limited to first and second order terms. The static nonlinearity in the Volterra models can be approximated by alternative structures, providing more convenient means for
• including a priori knowledge,
• handling of incomplete and noisy data sets,
• more efficient parameterization,
• increasing the transparency of the model,
• improved data compression, etc.

In what follows, two types of nonlinear black-box dynamic structures are considered:
• nonlinear time-series, and
• Wiener and Hammerstein models.

In both structures, the nonlinear function is a static one. The capability to characterize dynamical process behavior is obtained using delayed inputs and external feedback (nonlinear time-series), or internal feedback using linear dynamic filters (Wiener and Hammerstein models).
5.1 Nonlinear time-series models

There are a large number of different black-box approaches for describing nonlinear dynamic systems. In process identification, nonlinear dynamic black-box time-series structures are common. The ability to characterize dynamical process behavior is obtained by using delayed inputs and external feedback. For most practical purposes in process identification, MISO nonlinear dynamic systems can be described with sufficient accuracy using the NARX, NOE and NARMAX time-series structures. The structure determines the inputs to the model, where only externally recurrent feedback connections are allowed. For modelling very complex nonlinear dynamic systems, fully recurrent systems can also be considered.
Denote a nonlinear static function by f, a function of some parameters w. In the NOE time-series structure the predictor input consists of past inputs of the process and the past predictions of the process output:

$$\hat{y}(k) = f\left(u(k-d), \ldots, u(k-d-n_B),\ \hat{y}(k-1), \ldots, \hat{y}(k-n_A),\ w\right)$$    (5.2)

In the NARX structure the input consists of past inputs and outputs of the process:

$$\hat{y}(k) = f\left(u(k-d), \ldots, u(k-d-n_B),\ y(k-1), \ldots, y(k-n_A),\ w\right)$$    (5.3)

In the NARMAX structure the input consists of past inputs and outputs of the process, as well as past predictions:

$$\hat{y}(k) = f\left(u(k-d), \ldots, u(k-d-n_B),\ y(k-1), \ldots, y(k-n_A),\ \hat{y}(k-1), \ldots, \hat{y}(k-n_C),\ w\right)$$    (5.4)
The NARMAX structure is shown in Fig. 5.1. Notice that the NOE and NARX structures can be seen as special cases of the NARMAX structure. The structure of the mapping f between the inputs and the output is not determined. If no a priori information about the structure of the process is available, it is common to choose some black-box structure: power series, sigmoid neural networks, or a 0-order Sugeno fuzzy system (among many others, see Chapter 4).

Figure 5.1: NARMAX time-series predictor.

In practice, process modelling using NOE, NARX and NARMAX structures can give accurate predictions on a fixed data set. Most importantly, it is possible to model a wide class of nonlinearities. If some nonlinear black-box structure is chosen for the static function f, practically all reasonable dynamic functions can be approximated (provided that the input data windows are long enough, and the size (number of parameters) of the black-box model is sufficiently large). The approach is simple, as it extends the linear dynamic time-series structures to nonlinear combinations of the inputs. If the mapping f is a linear one, ARX, OE and ARMAX structures result (see Chapter 3).

The main problem with these structures concerns the identification of the static nonlinear function f. The complexity (degrees of freedom) of the mapping depends on the structure chosen for the nonlinear (parameterized) function. In nonlinear black-box structures, the number of degrees of freedom is usually large since the restriction of linearity of the mapping is removed. Technically, it is simple to apply some gradient-based optimization method with the NARX structure; with NOE and NARMAX the need to take into account the dynamics when computing the gradients slightly increases the computational load. However, too many degrees of freedom in f make the parameters w sensitive to noise in data, and poor interpolation can be expected if the data set does not contain enough information (covering the whole operating range and all dynamic situations of interest). In general, the extrapolation properties of nonlinear time-series models are poor. These problems can be tackled in parameter estimation by using optimization under constraints, where constraints can be posed on the structure (regularization), based on a priori known properties of the process, or, e.g., deviation from a nominal model [63]. Alternatively, the degrees of freedom in the mapping can be reduced.

Let us next consider the gradients of the general nonlinear black-box time-series models. These are required by gradient-based parameter estimation techniques (Chapter 6).
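The practical difference between the NARX and NOE predictors is which past output enters the regressor: the measured one or the model's own prediction. The following is a minimal sketch, assuming a hypothetical first-order static map `f` with one delayed input and one delayed output (d = 1); it is not a structure taken from the text.

```python
def f(u_prev, y_prev, w):
    """Hypothetical static nonlinearity with parameters w = [w0, w1]."""
    return w[0] * y_prev + w[1] * u_prev / (1.0 + y_prev ** 2)


def narx_predict(u, y, w):
    """One-step-ahead prediction: the regressor uses the measured y(k-1)."""
    return [f(u[k - 1], y[k - 1], w) for k in range(1, len(u))]


def noe_predict(u, y0, w):
    """Simulation: the regressor uses the model's own past prediction."""
    y_hat = [y0]
    for k in range(1, len(u)):
        y_hat.append(f(u[k - 1], y_hat[k - 1], w))
    return y_hat[1:]


w = [0.6, 1.0]
u = [1.0, 1.0, 0.0, -1.0, 0.5, 0.5, 0.0, 0.0]

# Noise-free "process" data generated by the same map: both predictors agree.
y_clean = [0.0] + noe_predict(u, 0.0, w)
same = narx_predict(u, y_clean, w) == noe_predict(u, 0.0, w)

# With a disturbed measurement the NARX predictor follows the data,
# while the NOE predictor depends on the input sequence only.
y_noisy = list(y_clean)
y_noisy[3] += 0.2
differs = narx_predict(u, y_noisy, w) != noe_predict(u, 0.0, w)
print(same, differs)
```

The NOE loop is also what makes its gradients dynamic: each prediction depends on the parameters through all earlier predictions.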
5.1.1 Gradients of nonlinear time-series models
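For simplicity of notation, let us restrict to SISO systems (extending to MISO is straightforward). Consider a nonlinear time-series NARMAX predictor (5.4), see also Fig. 5.1. Let us calculate the gradient $\partial \hat{y} / \partial w_j$ of the system output $\hat{y}$ with respect to its parameters $w_j$, $w = [w_1, \ldots, w_j, \ldots, w_J]^T$, j = 1, 2, ..., J. For simplicity of notation, denote $f(u(k-d), y(k-1), \hat{y}(k-1), w) = f(k, w)$. Let us linearize the function f (5.4) around an operating point $\{\bar{u}, \bar{y}, \bar{\hat{y}}, \bar{w}\}$. Using Taylor series, we have that

$$f(k, w) \approx \left[\frac{\partial f}{\partial u(k-d)}(k, w)\right]_{x=\bar{x}} \tilde{u}(k-d) + \left[\frac{\partial f}{\partial y(k-1)}(k, w)\right]_{x=\bar{x}} \tilde{y}(k-1) + \left[\frac{\partial f}{\partial \hat{y}(k-1)}(k, w)\right]_{x=\bar{x}} \tilde{\hat{y}}(k-1) + c(k, w)$$

where c is a constant ($c(k, w) = [f(k, w)]_{x=\bar{x}}$) and the tilded variables are the deviations from the point of linearization ($u = \bar{u} + \tilde{u}$, etc.). The notation $[\cdot]_{x=\bar{x}}$ indicates that the expression is evaluated at the point of linearization.

We then have that

$$\frac{\partial \hat{y}(k)}{\partial w_j} = \frac{\partial}{\partial w_j}\left[\frac{\partial f}{\partial u(k-d)}(k, w)\right]_{x=\bar{x}} \tilde{u}(k-d) + \frac{\partial}{\partial w_j}\left[\frac{\partial f}{\partial y(k-1)}(k, w)\right]_{x=\bar{x}} \tilde{y}(k-1) + \frac{\partial}{\partial w_j}\left\{\left[\frac{\partial f}{\partial \hat{y}(k-1)}(k, w)\right]_{x=\bar{x}} \tilde{\hat{y}}(k-1)\right\} + \frac{\partial}{\partial w_j} c(k, w)$$    (5.7)

since $\tilde{u}$ and $\tilde{y}$ do not depend on $w_j$, j = 1, 2, ..., J, whereas f and $\tilde{\hat{y}}$ do. For the third term on the right hand side we have that

$$\frac{\partial}{\partial w_j}\left\{\left[\frac{\partial f}{\partial \hat{y}(k-1)}(k, w)\right]_{x=\bar{x}} \tilde{\hat{y}}(k-1)\right\} = \frac{\partial}{\partial w_j}\left[\frac{\partial f}{\partial \hat{y}(k-1)}(k, w)\right]_{x=\bar{x}} \tilde{\hat{y}}(k-1) + \left[\frac{\partial f}{\partial \hat{y}(k-1)}(k, w)\right]_{x=\bar{x}} \frac{\partial \tilde{\hat{y}}(k-1, w)}{\partial w_j}$$    (5.8)

Substituting (5.8) to (5.7) and reorganizing, so that all terms in which the signals are held fixed are collected into the derivative of f with respect to $w_j$, gives

$$\frac{\partial \hat{y}(k)}{\partial w_j} = \frac{\partial f}{\partial w_j}(k, w) + \left[\frac{\partial f}{\partial \hat{y}(k-1)}(k, w)\right]_{x=\bar{x}} \frac{\partial \tilde{\hat{y}}(k-1, w)}{\partial w_j}$$    (5.9)
$$\phantom{\frac{\partial \hat{y}(k)}{\partial w_j}} = \frac{\partial f}{\partial w_j}(k, w) + \left[\frac{\partial f}{\partial \hat{y}(k-1)}(k, w)\right]_{x=\bar{x}} \frac{\partial \hat{y}(k-1, w)}{\partial w_j}$$    (5.10)

since $\frac{\partial}{\partial w_j} \tilde{\hat{y}}(k-1, w) = \frac{\partial}{\partial w_j} \hat{y}(k-1, w)$. Thus, the gradient is composed of two terms: the static gradient (first term on the right) and the dynamic effect of the gradient (second term). Let us summarize the results by writing the above in a more convenient form.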
Algorithm 19 (Gradients of NARMAX predictor) The gradients for a NARMAX time-series model

$$\hat{y}(k) = f\left(u(k-d), \ldots, u(k-d-n_B),\ y(k-1), \ldots, y(k-n_A),\ \hat{y}(k-1), \ldots, \hat{y}(k-n_C),\ w\right)$$    (5.11)

with parameters $w = [w_1, \ldots, w_j, \ldots, w_J]^T$ are obtained from

$$\Psi_j(k) = \frac{\partial f}{\partial w_j}(k, w) + \sum_{m=1}^{n_C} \Phi_{\hat{y}(k-m)}(k, w)\, \Psi_j(k-m)$$    (5.12)

where $\Psi_j$ denotes the gradient of the model output with respect to its parameters:

$$\Psi_j(k) = \frac{\partial \hat{y}}{\partial w_j}(k)$$    (5.13)

$\frac{\partial f}{\partial w_j}(k, w)$ (j = 1, 2, ..., J) are the static gradients of the nonlinear function with respect to its parameters. The second term gives the dynamic effect of the feedback in the network to the gradients, a correction by the linearized gain:

$$\Phi_{\hat{y}(k-m)}(k, w) = \frac{\partial f}{\partial \hat{y}(k-m)}(k, w)$$    (5.14)

Example 34 (ARMAX structure) Let us illustrate the above using a simple linear dynamic system

$$\hat{y}(k) = a\, y(k-1) + b\, u(k-1) + c\, \hat{y}(k-1)$$    (5.15)

The system has three parameters, $w = [a, b, c]^T$, and the function f is linear. The gradients of the static (linear) function are given by:

$$\frac{\partial f}{\partial w}(k, w) = \begin{bmatrix} y(k-1) \\ u(k-1) \\ \hat{y}(k-1, w) \end{bmatrix}$$    (5.16)

$$\frac{\partial f}{\partial \hat{y}(k-1)}(k, w) = c$$    (5.17)

The gradient of the system output with respect to its parameters is given by

$$\Psi(k) = \begin{bmatrix} y(k-1) \\ u(k-1) \\ \hat{y}(k-1, w) \end{bmatrix} + c\, \Psi(k-1)$$    (5.18)

Notice that although f is linear, the system output is not linear with respect to the parameters, since the gradients depend also on past data. A similar result was derived in section 3.3.8.

Figure 5.2: A Wiener system. The system input u is put through a linear filter, and a nonlinear mapping of the intermediate signal z gives the system output y.

Figure 5.3: A Hammerstein system. A nonlinear mapping of the input signal u gives the intermediate signal z. The system output y is the output of a linear filter.
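The recursion of Example 34 can be verified against finite differences. The sketch below is illustrative: the data, the parameter values, and the function name `simulate` are hypothetical, and the initial prediction is assumed not to depend on the parameters, so the gradient is initialized to zero.

```python
def simulate(w, u, y, y_hat0=0.0):
    """Run y_hat(k) = a*y(k-1) + b*u(k-1) + c*y_hat(k-1) and its gradient recursion.

    Returns the predictions and, for each k, the gradient [d/da, d/db, d/dc].
    """
    a, b, c = w
    y_hat = [y_hat0]
    psi = [[0.0, 0.0, 0.0]]          # assumed: y_hat(0) does not depend on w
    for k in range(1, len(u)):
        y_hat.append(a * y[k - 1] + b * u[k - 1] + c * y_hat[k - 1])
        static = [y[k - 1], u[k - 1], y_hat[k - 1]]            # static gradient
        psi.append([s + c * p for s, p in zip(static, psi[k - 1])])  # + dynamic part
    return y_hat, psi


# Hypothetical data and parameters.
u = [1.0, 0.5, -0.3, 0.8, 0.0, 0.2]
y = [0.0, 0.4, 0.6, 0.1, 0.5, 0.3]
w = [0.5, 1.0, 0.3]

y_hat, psi = simulate(w, u, y)

# Central finite differences of the last prediction, one parameter at a time.
h = 1e-6
numeric = []
for j in range(3):
    w_plus = list(w)
    w_minus = list(w)
    w_plus[j] += h
    w_minus[j] -= h
    numeric.append((simulate(w_plus, u, y)[0][-1] - simulate(w_minus, u, y)[0][-1]) / (2 * h))

print(psi[-1])
print(numeric)
```

The two printed vectors should agree closely. Dropping the `c * p` term gives the static gradient only, which is correct for an ARX model but not once past predictions are fed back.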
5.2 Linear dynamics and static nonlinearities

In many cases, the dynamics of a nonlinear process can be approximated using linear transfer functions. Wiener and Hammerstein structures are typical examples of such structures. A restricted class of Wiener and Hammerstein systems will be considered next. Wiener and Hammerstein structures consist of a linear dynamic part and a nonlinear static part. In a Wiener structure (see Fig. 5.2), the linear dynamic part is followed by the nonlinear part. In a Hammerstein structure (see Fig. 5.3), the nonlinear part precedes the linear dynamic part.
5.2.1 Wiener systems

Assume a SISO Wiener system given by

$$y(k) = f(z(k))$$    (5.19)

where

$$z(k) = \frac{B(q^{-1})}{A(q^{-1})}\, u(k-d)$$    (5.20)

y(k) is the output of the Wiener system, f is a nonlinear static SISO function, z(k) is an intermediate variable, u(k) is the input to the system and d is the time delay. $A(q^{-1})$ and $B(q^{-1})$ are polynomials in the backward shift operator $q^{-1}$:

$$A(q^{-1}) = 1 + a_1 q^{-1} + \ldots + a_{n_A} q^{-n_A}$$    (5.21)
$$B(q^{-1}) = b_0 + b_1 q^{-1} + \ldots + b_{n_B} q^{-n_B}$$    (5.22)
Obviously, a predictor for the above deterministic system is given by

$$\hat{y}(k) = f(\hat{z}(k))$$    (5.23)

where $A(q^{-1})\, \hat{z}(k) = B(q^{-1})\, u(k-d)$. But this is also the predictor for an OE system. This leads us to consider the following stochastic process:

$$y(k) = f\left[\frac{B(q^{-1})}{A(q^{-1})}\, u(k-d) + e(k)\right]$$    (5.24)

Let us rewrite the noise term:

$$y(k) = f\left[\frac{B(q^{-1})}{A(q^{-1})}\, u(k-d)\right] + e_y(k)$$

where the prediction error $e_y(k)$ now appears at the output of the Wiener system.

Although this may seem nice, the transition from e(k) to $e_y(k)$ is critical. From a statistical point of view, in linear systems the properties of {e(k)} convey to {$e_y(k)$} (e.g., if e(k) has a Gaussian distribution, then $e_y(k)$ is Gaussian, too). For nonlinear systems, this is not the case, and even as an
approximation it is valid only locally around an operating point provided that the function f is smooth enough¹.

Let us consider the following stochastic Wiener system

$$y(k) = f\left(\frac{B(q^{-1})}{A(q^{-1})}\, u(k-d)\right) + e(k)$$    (5.29)

where {e(k)} is a sequence of independent random variables with zero mean and finite variance. The predictor for such a system is given by

$$\hat{y}(k) = f\left(\frac{B(q^{-1})}{A(q^{-1})}\, u(k-d)\right)$$    (5.30)

and minimizes the expectation of the squared prediction error². In general, the nonlinear system is a function of some parameters w. Hence we have the expression for a SISO Wiener predictor:

$$\hat{y}(k) = f\left(\frac{B(q^{-1})}{A(q^{-1})}\, u(k-d),\ w\right)$$    (5.31)

It is straightforward to extend these results for MISO systems with multiple linear dynamic systems (one for each input). Let us first define a Wiener system.

¹ Under some conditions related to the nonlinear mapping f and its inverse $f^{-1}$ (continuity, differentiability, etc.), the density function of the output of the nonlinear system can be expressed as a function of the density function of the input (see [39], p. 34, see also [71]).

² Let us find $\hat{y}(k) = \arg\min_{\hat{y}} E\{[y(k) - \hat{y}]^2\}$. Substituting (5.29) we have that

$$E\{[y(k) - \hat{y}]^2\} = E\left\{\left[f\left(\tfrac{B(q^{-1})}{A(q^{-1})} u(k-d)\right) - \hat{y}\right]^2\right\} + 2 E\left\{e(k)\left[f\left(\tfrac{B(q^{-1})}{A(q^{-1})} u(k-d)\right) - \hat{y}\right]\right\} + E\{e^2(k)\}$$

where the second term is zero (due to the independence of e(k) with respect to u(k − d) and that E{e(k)} = 0). If the variance is finite, the criterion is minimized by (5.30).
Definition 12 (Wiener system) Define a MISO Wiener system

$$y(k) = f(z(k))$$    (5.32)

where $z(k) = [z_1(k), z_2(k), \ldots, z_i(k), \ldots, z_I(k)]^T$ are given by

$$z_i(k) = \frac{B_i(q^{-1})}{A_i(q^{-1})}\, u_i(k - d_i)$$    (5.33)

y(k) is the output of the Wiener system, f is a nonlinear static MISO function, $z_i(k)$ are intermediate variables, $u_i(k)$ are the inputs to the system and $d_i$ are the time delays. $A_i(q^{-1})$ and $B_i(q^{-1})$ are polynomials in the backward shift operator $q^{-1}$:

$$A_i(q^{-1}) = 1 + a_{i,1} q^{-1} + \ldots + a_{i,n_{A_i}} q^{-n_{A_i}}$$    (5.34)
$$B_i(q^{-1}) = b_{i,0} + b_{i,1} q^{-1} + \ldots + b_{i,n_{B_i}} q^{-n_{B_i}}$$    (5.35)

and i = 1, 2, ..., I, where I is the number of inputs to the system.
Note that a general Wiener system may have a single MIMO linear dynamic part. Here we restrict to the case of multiple SISO linear dynamic parts. The MISO predictor can be derived in a way similar to the SISO case.

Algorithm 20 (Wiener predictor) A predictor for a MISO Wiener system is given by

$$\hat{y}(k) = f(\hat{z}(k), w)$$    (5.36)

where

$$\hat{z}_i(k) = \frac{B_i(q^{-1})}{A_i(q^{-1})}\, u_i(k - d_i)$$    (5.37)

$\hat{y}(k)$ is the predicted output of the Wiener system, f is the nonlinear static MISO function of parameters w, $u_i(k)$ and $\hat{z}_i(k)$ are the inputs to the system and the intermediate variables, respectively, and $d_i$ are the time delays. $A_i(q^{-1})$ and $B_i(q^{-1})$ are polynomials in the backward shift operator $q^{-1}$:

$$A_i(q^{-1}) = 1 + a_{i,1} q^{-1} + \ldots + a_{i,n_{A_i}} q^{-n_{A_i}}$$    (5.38)
$$B_i(q^{-1}) = b_{i,0} + b_{i,1} q^{-1} + \ldots + b_{i,n_{B_i}} q^{-n_{B_i}}$$    (5.39)

and i = 1, 2, ..., I, where I is the number of inputs to the system.

Let us next consider Hammerstein systems.
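The Wiener predictor of Algorithm 20 can be sketched as one linear filter per input followed by the static map. This is a minimal illustration, not the authors' implementation: the helper names (`lin_filter`, `wiener_predict`), the zero initial conditions, and the quadratic example nonlinearity are assumptions.

```python
def lin_filter(b, a, u, d=0):
    """Filter u through B(q^-1)/A(q^-1) with delay d.

    b = [b0, b1, ...], a = [1, a1, ...]; samples before k = 0 are taken as zero.
    """
    z = []
    for k in range(len(u)):
        acc = sum(bn * u[k - d - n] for n, bn in enumerate(b) if k - d - n >= 0)
        acc -= sum(a[n] * z[k - n] for n in range(1, len(a)) if k - n >= 0)
        z.append(acc)
    return z


def wiener_predict(inputs, filters, f, w):
    """inputs: one signal per input channel; filters: one (b, a, d) triple per channel."""
    z_hat = [lin_filter(b, a, u, d) for u, (b, a, d) in zip(inputs, filters)]
    n_samples = len(inputs[0])
    return [f([z[k] for z in z_hat], w) for k in range(n_samples)]


def f_quadratic(z, w):
    """Hypothetical static nonlinearity of the intermediate variables."""
    return w[0] * z[0] ** 2


# One input, first-order filter with unit gain: z(k) = 0.5 z(k-1) + 0.5 u(k-1).
u = [1.0] * 60
filters = [([0.5], [1.0, -0.5], 1)]
y_hat = wiener_predict([u], filters, f_quadratic, [2.0])
print(y_hat[:3], y_hat[-1])
```

For the unit step, the intermediate signal settles at one, so the prediction approaches the value of the static map at that point.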
5.2.2 Hammerstein systems

Definition 13 (Hammerstein system) Define a MISO Hammerstein system by

$$y(k) = \frac{B(q^{-1})}{A(q^{-1})}\, f(u(k), w) + e(k)$$    (5.40)

where $u(k) = [u_1(k), u_2(k), \ldots, u_i(k), \ldots, u_I(k)]^T$, i = 1, 2, ..., I, where I is the number of inputs to the system, y(k) is the output of the Hammerstein system, and f is a nonlinear static MISO function of parameters w. $A(q^{-1})$ and $B(q^{-1})$ are polynomials in the backward shift operator $q^{-1}$:

$$A(q^{-1}) = 1 + a_1 q^{-1} + \ldots + a_{n_A} q^{-n_A}$$    (5.41)
$$B(q^{-1}) = b_0 q^{-d} + b_1 q^{-1-d} + \ldots + b_{n_B} q^{-n_B-d}$$    (5.42)

d is the time delay.

Since the multiple-input nonlinearity appears at the input and the system contains just one linear dynamic filter, the prediction is simple to derive.

Algorithm 21 (Hammerstein predictor) A predictor for a MISO Hammerstein system is given by

$$\hat{y}(k) = \frac{B(q^{-1})}{A(q^{-1})}\, f(u(k), w)$$    (5.43)

where $u(k) = [u_1(k), u_2(k), \ldots, u_i(k), \ldots, u_I(k)]^T$, i = 1, 2, ..., I, where I is the number of inputs to the system. $\hat{y}(k)$ is the predicted output of the Hammerstein system, and f is a nonlinear static MISO function of parameters w. $A(q^{-1})$ and $B(q^{-1})$ are polynomials in the backward shift operator $q^{-1}$:

$$A(q^{-1}) = 1 + a_1 q^{-1} + \ldots + a_{n_A} q^{-n_A}$$    (5.44)
$$B(q^{-1}) = b_0 q^{-d} + b_1 q^{-1-d} + \ldots + b_{n_B} q^{-n_B-d}$$    (5.45)

d is the time delay. In the SISO case, the input u(k) is a scalar u(k).
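In the Hammerstein predictor the order of the two blocks is reversed: the static map is applied to the input vector first, and a single linear filter follows. The sketch below is an illustration under assumptions: the helper names, the zero initial conditions, and the product-type example nonlinearity are hypothetical, and the delay d is passed separately instead of being folded into the B polynomial.

```python
def lin_filter(b, a, v, d=0):
    """Filter v through B(q^-1)/A(q^-1) with delay d.

    b = [b0, b1, ...], a = [1, a1, ...]; samples before k = 0 are taken as zero.
    """
    y = []
    for k in range(len(v)):
        acc = sum(bn * v[k - d - n] for n, bn in enumerate(b) if k - d - n >= 0)
        acc -= sum(a[n] * y[k - n] for n in range(1, len(a)) if k - n >= 0)
        y.append(acc)
    return y


def hammerstein_predict(u, b, a, d, f, w):
    """u: list of input vectors u(k); returns the filtered static map f(u(k), w)."""
    v = [f(uk, w) for uk in u]        # static nonlinearity first
    return lin_filter(b, a, v, d)     # then the single linear dynamic part


def f_product(uk, w):
    """Hypothetical two-input static nonlinearity."""
    return w[0] * uk[0] * uk[1]


u = [[2.0, 1.5]] * 60                 # constant two-dimensional input
y_hat = hammerstein_predict(u, [0.5], [1.0, -0.5], 1, f_product, [1.0])
print(y_hat[:3], y_hat[-1])
```

With a unit-gain filter and constant inputs, the prediction converges to the static map evaluated at those inputs, here 3.0.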
5.3 Linear dynamics and steady-state models

The Wiener and Hammerstein systems consist of two parts: the linear dynamic (transfer function) part and the static (nonlinear) part. In the practice of industrial process engineering, the steady-state characteristics of a process are of main interest, and the dynamic behavior is often poorly known. In fact, often only the control engineers seem to be interested in the modeling of the dynamics of a process, while the system designers and production engineers largely ignore the dynamics. In order to provide models that both parties can understand, and in order to employ already existing (steady-state) models of the process, among other reasons (such as increased simplicity in identification and control design), it is reasonable to consider the case where the nonlinear static part in Wiener and Hammerstein models represents the steady-state behavior of a process. Hence, we assume that the static (nonlinear) function is given by the steady-state function of the process

$$y_{ss} = f_{ss}(u_{ss})$$    (5.46)

where the subscript ss denotes steady state. The Wiener system is given by

$$y(k) = f_{ss}(z(k), w)$$    (5.47)

where $z(k) = [z_1(k), z_2(k), \ldots, z_i(k), \ldots, z_I(k)]^T$ are given by

$$z_i(k) = \frac{B_i(q^{-1})}{A_i(q^{-1})}\, u_i(k - d_i)$$    (5.48)

In order to preserve the steady-state function, the steady-state gain of the linear dynamic part has to be equal to one,

$$\lim_{z \to 1} \frac{B_i(z)}{A_i(z)} = 1$$    (5.49)

i.e., in steady state

$$z_{i,ss} = u_{i,ss}$$    (5.50)

for all i = 1, 2, ..., I. Similarly, for the Hammerstein system:

$$y(k) = \frac{B(q^{-1})}{A(q^{-1})}\, f_{ss}(u(k), w) + e(k)$$    (5.51)
we must have

$$\lim_{z \to 1} \frac{B(z)}{A(z)} = 1$$    (5.52)

in order to preserve the steady-state function

$$y_{ss} = f_{ss}(u_{ss}, w)$$    (5.53)

for all i = 1, 2, ..., I.
5.3.1 Transfer function with unit steady-state gain

There are several ways to fulfill the requirements (5.49) and (5.52). Let us consider the following constraint on the coefficients of the transfer function, where a substitution for $b^*_{n_B}$ fulfills the requirement.

Algorithm 22 (TF with unit steady-state gain) Let the transfer polynomial, $n_B \ge 0$, be given by

$$\frac{B^*(q^{-1})}{A(q^{-1})} = \frac{b_0 + b_1 q^{-1} + \cdots + b_{n_B-1} q^{-(n_B-1)} + b^*_{n_B} q^{-n_B}}{1 + a_1 q^{-1} + \cdots + a_{n_A} q^{-n_A}} \tag{5.54}$$

A unit steady-state gain is ensured by

$$b^*_{n_B} = 1 + \sum_{n=1}^{n_A} a_n - \sum_{n=0}^{n_B-1} b_n \tag{5.55}$$

Proof. Substituting (5.55) into the z-transform equivalent of (5.54) and letting $z \to 1$ gives

$$\lim_{z \to 1} \frac{B^*(z)}{A(z)} = \frac{b_0 + b_1 + \cdots + b_{n_B-1} + 1 + \sum_{n=1}^{n_A} a_n - \sum_{n=0}^{n_B-1} b_n}{1 + a_1 + \cdots + a_{n_A}} = 1 \tag{5.56}$$

which shows that the steady-state gain of the dynamic part is one, as desired.
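As a quick numerical check of Algorithm 22, the following minimal Python sketch computes $b^*_{n_B}$ from (5.55) and evaluates the steady-state gain $B^*(1)/A(1)$. The function names and the coefficient values are illustrative assumptions, not taken from the text.

```python
def unit_gain_coefficient(a, b):
    """Return b*_nB from (5.55).

    a = [a_1, ..., a_nA], b = [b_0, ..., b_{nB-1}] (free coefficients only).
    """
    return 1.0 + sum(a) - sum(b)


def steady_state_gain(a, b_full):
    """Evaluate B*(z)/A(z) at z = 1: sum of numerator over sum of denominator."""
    return sum(b_full) / (1.0 + sum(a))


# Hypothetical coefficients: A = 1 - 0.5 q^-1 + 0.1 q^-2, B free part = 0.2 + 0.3 q^-1
a = [-0.5, 0.1]
b = [0.2, 0.3]
b_star = unit_gain_coefficient(a, b)        # last numerator coefficient
gain = steady_state_gain(a, b + [b_star])   # should equal one up to rounding
print(b_star, gain)
```

Only the free coefficients are estimated; the last numerator coefficient is always recomputed from them, so the unit-gain constraint holds at every iteration of an estimation routine.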
5.3.2 Wiener and Hammerstein predictors

Combining the result in Section 5.3.1 with the steady-state Wiener structure, we get the following predictor.

Algorithm 23 (Wiener predictor: continued) A predictor for a MISO Wiener system with a steady-state nonlinear function is given by

$$\hat y(k) = f_{ss}(\hat z(k), w) \tag{5.57}$$

where

$$\hat z_i(k) = \frac{B_i^*(q^{-1})}{A_i(q^{-1})}\, u_i(k - d_i) \tag{5.58}$$

$\hat y(k)$ is the predicted output of the Wiener system, $f_{ss}$ is the nonlinear static steady-state MISO function of parameters $w$, $u_i(k)$ and $\hat z_i(k)$ are the inputs to the system and the intermediate variables, respectively, and $d_i$ are the time delays. $A_i(q^{-1})$ and $B_i^*(q^{-1})$ are polynomials in the backward shift operator $q^{-1}$:

$$A_i(q^{-1}) = 1 + a_{i,1} q^{-1} + \cdots + a_{i,n_{A_i}} q^{-n_{A_i}} \tag{5.59}$$

$$B_i^*(q^{-1}) = b_{i,0} + b_{i,1} q^{-1} + \cdots + b_{i,n_{B_i}-1} q^{-(n_{B_i}-1)} + b^*_{i,n_{B_i}} q^{-n_{B_i}} \tag{5.60}$$

where

$$b^*_{i,n_{B_i}} = 1 + \sum_{n=1}^{n_{A_i}} a_{i,n} - \sum_{n=0}^{n_{B_i}-1} b_{i,n} \tag{5.61}$$

and $i = 1, 2, \ldots, I$, where $I$ is the number of inputs to the system.
For the Hammerstein system we get a similar result.

Algorithm 24 (Hammerstein predictor: continued) A predictor for a MISO Hammerstein system with a steady-state nonlinear function is given by

$$\hat y(k) = \frac{B^*(q^{-1})}{A(q^{-1})}\, f_{ss}(u(k), w) \tag{5.62}$$

where $\hat y(k)$ is the predicted output of the Hammerstein system, and $f_{ss}$ is the nonlinear static steady-state MISO function of parameters $w$. $u(k) = [u_1(k), \ldots, u_i(k), \ldots, u_I(k)]^T$, $i = 1, 2, \ldots, I$, where $I$ is the number of inputs to the system. $A(q^{-1})$ and $B^*(q^{-1})$ are polynomials in the backward shift operator $q^{-1}$:

$$A(q^{-1}) = 1 + a_1 q^{-1} + \cdots + a_{n_A} q^{-n_A} \tag{5.63}$$

$$B^*(q^{-1}) = b_0 q^{-d} + b_1 q^{-1-d} + \cdots + b_{n_B-1} q^{-(n_B-1)-d} + b^*_{n_B} q^{-n_B-d} \tag{5.64}$$

where

$$b^*_{n_B} = 1 + \sum_{n=1}^{n_A} a_n - \sum_{n=0}^{n_B-1} b_n \tag{5.65}$$

and $d$ is the time delay.
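The two predictors of Algorithms 23 and 24 can be sketched for the SISO case as follows. This is a minimal illustration under stated assumptions: the static curve `f_ss` is a hypothetical `tanh` shape, signals are taken as zero before the first sample, and all names and coefficient values are chosen for the example only.

```python
import math


def filter_unit_gain(u, a, b, d=0):
    """Simulate z(k) = B*(q^-1)/A(q^-1) u(k-d), with b* chosen by (5.55)."""
    b_full = list(b) + [1.0 + sum(a) - sum(b)]
    z = []
    for k in range(len(u)):
        acc = 0.0
        for n, bn in enumerate(b_full):          # numerator taps on delayed input
            if k - n - d >= 0:
                acc += bn * u[k - n - d]
        for m, am in enumerate(a, start=1):      # recursion on past outputs
            if k - m >= 0:
                acc -= am * z[k - m]
        z.append(acc)
    return z


def f_ss(x):
    """Hypothetical steady-state curve, for illustration only."""
    return math.tanh(2.0 * x)


def wiener_predict(u, a, b, d=0):
    # Linear dynamics first, static nonlinearity at the output (5.57)-(5.58)
    return [f_ss(z) for z in filter_unit_gain(u, a, b, d)]


def hammerstein_predict(u, a, b, d=0):
    # Static nonlinearity at the input, linear dynamics at the output (5.62)
    return filter_unit_gain([f_ss(x) for x in u], a, b, d)


u = [0.4] * 200                 # constant input, long enough to settle
a, b = [-0.7], [0.1]            # A = 1 - 0.7 q^-1, free part of B* = 0.1
y_w = wiener_predict(u, a, b, d=1)
y_h = hammerstein_predict(u, a, b, d=1)
print(y_w[-1], y_h[-1], f_ss(0.4))
```

For a constant input both predictors settle at `f_ss(0.4)`, which is the point of the unit-gain constraint: the steady-state model is preserved regardless of the dynamics.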
5.3.3 Gradients of the Wiener and Hammerstein predictors

If the parameters of the Wiener or Hammerstein model are unknown, they need to be estimated. Often, the most convenient way is to estimate the parameters from input-output data observed from the process. If gradient-based techniques are used, the gradients with respect to the parameters need to be computed. Assume that all parameters are unknown:

• parameters of the static (steady-state) mapping, $w$; and

• parameters of the linear transfer function(s), i.e., coefficients of the polynomials $A$ and $B$, as well as the delay(s) $d$.

An estimate for the delay(s) is usually obtained by simply looking at the process behavior (step response, etc.), whereas the parameters $w$, $A$ and $B$ are estimated by minimizing the prediction error. Sometimes the estimation of $d$ may require several iterative rounds, where a value for $d$ is suggested, the parameters in $A$, $B$ and $w$ are estimated, and, if the model is unsatisfactory, new value(s) for $d$ are suggested.

Let us compute the gradients for the parameters in $A$, $B$, and $w$ in the Wiener predictor (Algorithm 23). For the static part, denote the gradients with respect to the parameters by

$$\frac{\partial \hat y}{\partial w_j}(k) = \Psi_j(k) \tag{5.66}$$

and the gradients with respect to the inputs by

$$\frac{\partial \hat y}{\partial \hat z_i}(k) = \Phi_i(k) \tag{5.67}$$

where $w = [w_1, \ldots, w_j, \ldots, w_J]^T$ and $J$ is the number of parameters in the nonlinear part. $I$ is the number of inputs to the system, $i = 1, 2, \ldots, I$. These parameters depend on the structure chosen for the static part.
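The chain rule can be applied for calculating the gradients with respect to the parameters of the dynamic part:

$$\frac{\partial \hat y}{\partial a_{i,n}}(k) = \Phi_i(k)\, \frac{\partial \hat z_i}{\partial a_{i,n}}(k) \tag{5.68}$$

and

$$\frac{\partial \hat y}{\partial b_{i,n}}(k) = \Phi_i(k)\, \frac{\partial \hat z_i}{\partial b_{i,n}}(k) \tag{5.69}$$

Hence, in order to compute the gradients, only the gradients

$$\frac{\partial \hat z_i}{\partial a_{i,n}}(k) \quad \text{and} \quad \frac{\partial \hat z_i}{\partial b_{i,n}}(k) \tag{5.70}$$

are further needed, where $n = 1, 2, \ldots, n_{A_i}$ and $n = 0, 1, \ldots, n_{B_i} - 1$, respectively, $i = 1, 2, \ldots, I$. For simplicity, omit the input index $i$ for a moment. The output of the linear part can be written as

$$\hat z(k) = \left[ b_0 q^{-d} + \cdots + b_{n_B-1} q^{-(n_B-1)-d} + b^*_{n_B} q^{-n_B-d} \right] u(k) - \left[ a_1 + a_2 q^{-1} + \cdots + a_{n_A} q^{-(n_A-1)} \right] \hat z(k-1) \tag{5.71}$$

where $b^*_{n_B} = 1 + \sum_{n=1}^{n_A} a_n - \sum_{n=0}^{n_B-1} b_n$. It is now simple to compute the derivatives with respect to the parameters:

$$\frac{\partial \hat z}{\partial b_n}(k) = u(k-n-d) - u(k-n_B-d) - \sum_{m=1}^{n_A} a_m \frac{\partial \hat z}{\partial b_n}(k-m) \tag{5.72}$$

$$\frac{\partial \hat z}{\partial a_n}(k) = u(k-n_B-d) - \sum_{m=1}^{n_A} a_m \frac{\partial \hat z}{\partial a_n}(k-m) - \hat z(k-n) \tag{5.73}$$

where $n = 0, 1, \ldots, n_B - 1$ and $n = 1, 2, \ldots, n_A$, respectively. Assuming that the parameters change slowly, past gradients $\partial \hat z / \partial b_n$ and $\partial \hat z / \partial a_n$ can be stored and the equations computed recursively, thus avoiding excessive computations. Let us collect the previous results in what follows.

Algorithm 25 (Gradients of the Wiener predictor) The gradients of the Wiener predictor with steady-state nonlinear part are given by

$$\frac{\partial \hat y}{\partial w_j}(k) = \Psi_j(k) \tag{5.74}$$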
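$$\frac{\partial \hat y}{\partial a_{i,n}}(k) = \Phi_i(k)\, \frac{\partial \hat z_i}{\partial a_{i,n}}(k) \tag{5.75}$$

$$\frac{\partial \hat y}{\partial b_{i,n}}(k) = \Phi_i(k)\, \frac{\partial \hat z_i}{\partial b_{i,n}}(k) \tag{5.76}$$

where

$$\Phi_i(k) = \frac{\partial \hat y}{\partial \hat z_i}(k) \tag{5.77}$$

$$\frac{\partial \hat z_i}{\partial b_{i,n}}(k) = u_i(k-n-d_i) - u_i(k-n_{B_i}-d_i) - \sum_{m=1}^{n_{A_i}} a_{i,m} \frac{\partial \hat z_i}{\partial b_{i,n}}(k-m) \tag{5.78}$$

$$\frac{\partial \hat z_i}{\partial a_{i,n}}(k) = u_i(k-n_{B_i}-d_i) - \sum_{m=1}^{n_{A_i}} a_{i,m} \frac{\partial \hat z_i}{\partial a_{i,n}}(k-m) - \hat z_i(k-n) \tag{5.79}$$

where $i = 1, 2, \ldots, I$ (system inputs), $j = 1, 2, \ldots, J$ (parameters of the static steady-state part), and $n = 0, 1, \ldots, n_{B_i} - 1$ and $n = 1, 2, \ldots, n_{A_i}$, respectively (orders of the polynomials associated with each input).

Owing to the recursion in the computation of the gradients, the polynomial $A_i$ needs to be stable. When instability is encountered, the parameters need to be projected toward a stable region. A simple method consists of multiplying the coefficients of $A_i$ with powers of a constant $\gamma$, $0 \ll \gamma < 1$:

$$A_i^{\gamma}(q^{-1}) = 1 + \gamma a_{i,1} q^{-1} + \cdots + \gamma^{n_{A_i}} a_{i,n_{A_i}} q^{-n_{A_i}} \tag{5.80}$$

until all roots of the polynomial lie inside the unit circle.

Let us next give the gradients for the Hammerstein predictor. The predictor is given by

$$\hat y(k) = \frac{B^*(q^{-1})}{A(q^{-1})}\, f_{ss}(u(k), w) \tag{5.81}$$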
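in which the linear dynamic subsystem at the output can be written out as

$$\hat y(k) = \left[ b_0 q^{-d} + b_1 q^{-1-d} + \cdots + b_{n_B-1} q^{-(n_B-1)-d} + \Big( 1 + \sum_{n=1}^{n_A} a_n - \sum_{n=0}^{n_B-1} b_n \Big) q^{-n_B-d} \right] \hat z(k) - \left[ a_1 + a_2 q^{-1} + \cdots + a_{n_A} q^{-(n_A-1)} \right] \hat y(k-1) \tag{5.82}$$

where the output of the static subsystem is given by

$$\hat z(k) = f_{ss}(u(k), w) \tag{5.83}$$

It is simple to calculate the derivatives with respect to the parameters:

$$\frac{\partial \hat y}{\partial b_n}(k) = \hat z(k-n-d) - \hat z(k-n_B-d) - \sum_{m=1}^{n_A} a_m \frac{\partial \hat y}{\partial b_n}(k-m) \tag{5.84}$$

$$\frac{\partial \hat y}{\partial a_n}(k) = \hat z(k-n_B-d) - \sum_{m=1}^{n_A} a_m \frac{\partial \hat y}{\partial a_n}(k-m) - \hat y(k-n) \tag{5.85}$$

where $n = 0, 1, \ldots, n_B - 1$ and $n = 1, 2, \ldots, n_A$, respectively. The gradient of the output of the Hammerstein predictor with respect to the parameters of the static part, denoted by

$$\frac{\partial \hat y}{\partial w_j}(k) = \Xi_j(k) \tag{5.86}$$

($j = 1, 2, \ldots, J$) is still required. Denote the gradient of the output of the static part with respect to its parameters by

$$\frac{\partial \hat z}{\partial w_j}(k) = \Psi_j(k) \tag{5.87}$$

where $w = [w_1, \ldots, w_j, \ldots, w_J]^T$ contains the parameters of the static mapping. The gradient of the linear subsystem (5.82) is given by

$$\frac{\partial \hat y}{\partial w_j}(k) = \left[ b_0 q^{-d} + \cdots + b_{n_B-1} q^{-(n_B-1)-d} + \Big( 1 + \sum_{n=1}^{n_A} a_n - \sum_{n=0}^{n_B-1} b_n \Big) q^{-n_B-d} \right] \frac{\partial \hat z}{\partial w_j}(k) - \left[ a_1 + a_2 q^{-1} + \cdots + a_{n_A} q^{-(n_A-1)} \right] \frac{\partial \hat y}{\partial w_j}(k-1) \tag{5.88}$$

which is more conveniently expressed as

$$\Xi_j(k) = \sum_{n=0}^{n_B-1} b_n \Psi_j(k-n-d) + b^*_{n_B} \Psi_j(k-n_B-d) - \sum_{n=1}^{n_A} a_n \Xi_j(k-n) \tag{5.89}$$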
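Assuming that the parameters change slowly, past gradients $\Xi_j$ and $\Psi_j$ can be stored and the equations computed recursively, thus avoiding excessive computations. Let us collect the results for the Hammerstein system.

Algorithm 26 (Gradients of the Hammerstein predictor) The gradients of the Hammerstein predictor with steady-state nonlinear part are given by

$$\frac{\partial \hat y}{\partial b_n}(k) = \hat z(k-n-d) - \hat z(k-n_B-d) - \sum_{m=1}^{n_A} a_m \frac{\partial \hat y}{\partial b_n}(k-m) \tag{5.90}$$

$$\frac{\partial \hat y}{\partial a_n}(k) = \hat z(k-n_B-d) - \sum_{m=1}^{n_A} a_m \frac{\partial \hat y}{\partial a_n}(k-m) - \hat y(k-n) \tag{5.91}$$

$$\frac{\partial \hat y}{\partial w_j}(k) = \Xi_j(k) \tag{5.92}$$

where

$$\Xi_j(k) = \sum_{n=0}^{n_B-1} b_n \Psi_j(k-n-d) + b^*_{n_B} \Psi_j(k-n_B-d) - \sum_{n=1}^{n_A} a_n \Xi_j(k-n) \tag{5.93}$$

$$\Psi_j(k) = \frac{\partial \hat z}{\partial w_j}(k) \tag{5.94}$$

where $j = 1, 2, \ldots, J$ (parameters of the static steady-state part) and $n = 0, 1, \ldots, n_B - 1$ and $n = 1, 2, \ldots, n_A$, respectively (orders of the polynomials associated with the output).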
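The recursive gradient computation for the linear block can be sketched as below. The code follows the Wiener form (5.71)-(5.73) for a single input; for the Hammerstein form the same recursion applies with the static-part output in place of `u`. The function name, the test signal and the coefficient values are assumptions made for illustration, and signals are taken as zero before the first sample. A central finite difference is used as a sanity check.

```python
def linear_part_with_gradients(u, a, b, d=0):
    """Simulate z(k) per (5.71) and its gradients per (5.72)-(5.73).

    a = [a_1, ..., a_nA], b = [b_0, ..., b_{nB-1}]; b*_nB follows (5.55).
    """
    n_a, n_b, K = len(a), len(b), len(u)
    b_star = 1.0 + sum(a) - sum(b)
    z = [0.0] * K
    dz_db = [[0.0] * K for _ in range(n_b)]   # dz_db[n][k] = dz(k)/db_n
    dz_da = [[0.0] * K for _ in range(n_a)]   # dz_da[n][k] = dz(k)/da_{n+1}

    def past(seq, idx):
        return seq[idx] if idx >= 0 else 0.0  # zero initial conditions

    for k in range(K):
        z[k] = (sum(b[n] * past(u, k - n - d) for n in range(n_b))
                + b_star * past(u, k - n_b - d)
                - sum(a[m] * past(z, k - 1 - m) for m in range(n_a)))
        for n in range(n_b):                  # (5.72)
            dz_db[n][k] = (past(u, k - n - d) - past(u, k - n_b - d)
                           - sum(a[m] * past(dz_db[n], k - 1 - m)
                                 for m in range(n_a)))
        for n in range(n_a):                  # (5.73)
            dz_da[n][k] = (past(u, k - n_b - d)
                           - sum(a[m] * past(dz_da[n], k - 1 - m)
                                 for m in range(n_a))
                           - past(z, k - 1 - n))
    return z, dz_da, dz_db


u = [((7 * k) % 5) / 5.0 for k in range(40)]  # arbitrary test input
a, b = [-0.6, 0.1], [0.3]
z, dz_da, dz_db = linear_part_with_gradients(u, a, b, d=1)

# Finite-difference check of dz/da_1 at the last sample
eps = 1e-6
z_hi, _, _ = linear_part_with_gradients(u, [a[0] + eps, a[1]], b, d=1)
z_lo, _, _ = linear_part_with_gradients(u, [a[0] - eps, a[1]], b, d=1)
fd_a1 = (z_hi[-1] - z_lo[-1]) / (2.0 * eps)
print(fd_a1, dz_da[0][-1])
```

With fixed parameters the recursion is exact; the "slowly changing parameters" assumption in the text only matters when the stored past gradients were computed with older parameter values.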
5.4 Remarks

Let us conclude this chapter by making a few remarks on the practical use of Wiener and Hammerstein systems. For discussion on the Wiener, Hammerstein, and related structures see, e.g., [72].

5.4.1 Inverse of Hammerstein and Wiener systems
Wiener and Hammerstein models are counterparts: the inverse of a Hammerstein structure is a Wiener structure, and vice versa. This is important in applications to control. To show this, let us assume that a system is described by a Hammerstein model

$$y(k) = q^{-d}\, \frac{B(q^{-1})}{A(q^{-1})}\, f(u(k)) \tag{5.95}$$

where $d$ is the time delay of the dynamic system; thus we can require that $b_0 \neq 0$. The linear dynamic part is given by

$$y(k) = q^{-d}\, \frac{B(q^{-1})}{A(q^{-1})}\, z(k) \tag{5.96}$$

which can be written out as a difference equation:

$$y(k) = -a_1 y(k-1) - \cdots - a_{n_A} y(k-n_A) + b_0 z(k-d) + \cdots + b_{n_B} z(k-d-n_B) \tag{5.97}$$

Its inverse is given by

$$z(k) = q^{d}\, \frac{A(q^{-1})}{B(q^{-1})}\, y(k) \tag{5.98}$$

Writing out, we have

$$z(k) = \frac{1}{b_0} y(k+d) + \frac{a_1}{b_0} y(k+d-1) + \cdots + \frac{a_{n_A}}{b_0} y(k+d-n_A) - \frac{b_1}{b_0} z(k-1) - \cdots - \frac{b_{n_B}}{b_0} z(k-n_B) \tag{5.99}$$

The nonlinear static part is given by

$$z(k) = f(u(k)) \tag{5.100}$$

Let us assume that the inverse(s) of the nonlinear static part exist:

$$u_i(k) = f_i^{-1}(z(k)) \tag{5.101}$$

Then, the inverse(s) of a Hammerstein system can be expressed as

$$u_i(k) = f_i^{-1}\!\left( \frac{A(q^{-1})}{B(q^{-1})}\, y(k+d) \right) \tag{5.102}$$

which is a Wiener system with input sequence $\{y(k+d)\}$ filtered by $A/B$ and scaled through $f_i^{-1}$, to produce an output sequence $\{u_i(k)\}$.
Example 35 (Hammerstein inverse) Let a SISO system be given by a Hammerstein model

$$y(k) = 0.7\, z(k-1) + 0.3\, y(k-1) \tag{5.103}$$

$$z(k) = u^2(k) \tag{5.104}$$

with $u \in \mathbb{R}^+$ (positive real). Thus

$$A(q^{-1}) = 1 - 0.3 q^{-1}; \quad B(q^{-1}) = 0.7; \quad d = 1 \tag{5.105}$$

The inverse is given by

$$z(k) = \frac{1}{0.7}\, y(k+1) - \frac{0.3}{0.7}\, y(k) \tag{5.106}$$

$$u(k) = \sqrt{z(k)} \tag{5.107}$$

which has the structure of a Wiener system. Notice that for $u \in \mathbb{R}$ the inverse does not exist (it is not unique).
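Example 35 is easy to verify numerically. The sketch below simulates the Hammerstein model (5.103)-(5.104) for an arbitrary positive input sequence (the values are made up for illustration, with a zero initial output assumed) and then recovers the input with the Wiener-type inverse (5.106)-(5.107).

```python
import math

# Arbitrary positive input sequence (hypothetical values)
u = [0.5, 1.2, 0.8, 2.0, 1.5, 0.3, 0.9]

# Forward Hammerstein model: z(k) = u(k)^2, y(k) = 0.7 z(k-1) + 0.3 y(k-1)
z = [x * x for x in u]
y = [0.0] * (len(u) + 1)          # y[0] = 0 is an assumed initial condition
for k in range(1, len(y)):
    y[k] = 0.7 * z[k - 1] + 0.3 * y[k - 1]

# Inverse: z(k) = y(k+1)/0.7 - 0.3 y(k)/0.7, then u(k) = sqrt(z(k))
z_rec = [(y[k + 1] - 0.3 * y[k]) / 0.7 for k in range(len(u))]
u_rec = [math.sqrt(v) for v in z_rec]
print(u_rec)
```

Note that the inverse needs `y(k+1)` to reconstruct `u(k)`: the delay of the forward model becomes a one-step look-ahead in the inverse.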
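5.4.2 ARX dynamics

So far, only OE-type dynamics have been considered. Let us next consider briefly systems with ARX dynamics. An ARX Hammerstein predictor is given by

$$\hat y(k) = -A_1(q^{-1})\, y(k-1) + B(q^{-1})\, z(k) \tag{5.108}$$

where $A = 1 + q^{-1} A_1$, and $z(k) = f(u(k))$. Since $\{y\}$ is known ($y$ is measurable), a one-step-ahead ARX Hammerstein predictor can be directly implemented. In general, ARX-type dynamics cannot be recommended for the identification of processes where measurements are strongly contaminated by noise. The gradients are simple to calculate: with $A_1(q^{-1}) = a_1 + a_2 q^{-1} + \cdots$ and $B(q^{-1}) = b_0 + b_1 q^{-1} + \cdots$, differentiating (5.108) gives

$$\frac{\partial \hat y}{\partial w_j}(k) = B(q^{-1})\, \Psi_j(k) \tag{5.109}$$

$$\frac{\partial \hat y}{\partial b_n}(k) = z(k-n) \tag{5.110}$$

$$\frac{\partial \hat y}{\partial a_n}(k) = -y(k-n) \tag{5.111}$$

where $\Psi_j(k) = \frac{\partial z}{\partial w_j}(k)$.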
For Wiener systems the intermediate variable $z(k)$ is not available. Thus an ARX implementation is not straightforward. However, if the inverses of the nonlinear part exist and are known, we can obtain the intermediate variables using the inverses

$$z_i(k) = f_i^{-1}(y(k), w) \tag{5.112}$$

Then the gradients of the system can be given as

$$\frac{\partial \hat y}{\partial w_j}(k) = \Psi_j(k) \tag{5.113}$$

$$\frac{\partial \hat y}{\partial b_{i,n}}(k) = \Phi_i(k)\, u_i(k-n) \tag{5.114}$$

$$\frac{\partial \hat y}{\partial a_{i,n}}(k) = -\Phi_i(k)\, z_i(k-n) \tag{5.115}$$

where $\Phi_i = \partial \hat y / \partial \hat z_i$. The identification of Wiener systems with ARX-type dynamics requires the identification of the inverse(s). The derivation of the corresponding equations for a system with unit steady-state-gain linear dynamics is straightforward and omitted here.
Chapter 6 Estimation of Parameters

This chapter considers parameter estimation techniques. These techniques are essential in system identification, as they provide the means for determining (off-line) or adjusting (on-line) the parameters of a chosen model structure, using sampled data (measurements).

The least squares method can be applied when the system output is linear with respect to its parameters. This is true for linear static mappings (linear regression models), as well as for some linear dynamic structures (such as ARX structures) and some nonlinear systems (e.g., power series and multilinear systems). However, usually the least squares method cannot be directly applied in nonlinear or dynamic systems, since these types of systems are, in general, nonlinear with respect to their parameters. In the previous chapters we have pointed out structures such as the OE (Chapter 3), the sigmoid neural networks (Chapter 4), or the Wiener structure (Chapter 5), with which the least squares method cannot be directly applied. In this chapter, the least squares method will be extended to the case of nonlinear systems. The parameter estimation problem is seen as an optimization problem, minimizing a cost function consisting of a sum of squared prediction errors.

In general, the nonlinear parameter estimation techniques are iterative: a fixed set of data is used repeatedly, and at each iteration the parameters are adjusted towards a smaller cost. This is because usually the criterion will not be quadratic in the parameters, as in the least squares method, and therefore an analytical solution is not available. Note the difference from recursive algorithms, such as recursive least squares (RLS). In RLS, new data patterns are added one by one, possibly forgetting the older ones (e.g., exponential forgetting), but they are not used repeatedly. The methods discussed in this chapter are batch methods, like the least squares method, where a fixed data set is used. Recursive forms can also be derived. Unfortunately, on-line identification using nonlinear models is less robust, for several reasons, and cannot be recommended.

In practice, gradient-based methods dominate. Their main drawback is that they can get stuck in local minima of the cost function. However, these methods have been shown to be efficient in practice. For complex systems, suitable methods can be found, e.g., from random search and probabilistic methods. Unfortunately, their practical implementations are often inefficient.

First, a general overview of prediction error methods is given, together with the algorithm for the Levenberg-Marquardt method. This is followed by the presentation of the Lagrange multipliers approach for the case of optimization under constraints. The guided random search methods are considered in the next section, with special emphasis on learning automata. In order to illustrate the feasibility and performance of the methods presented in this chapter, a number of simulation examples on process identification are presented at the end of this chapter.
6.1 Prediction error methods

The family of methods that minimize the error between the predicted and the observed values of the output are called prediction error methods¹. Many other parameter estimation techniques can be regarded as special cases of these methods. Iterative prediction error minimization methods are based on the following principle [87]: given a function $J(\theta)$ to be minimized

$$J(\theta) = \frac{1}{K} \sum_{k=1}^{K} \left( y(k) - \hat y(k) \right)^2 \tag{6.1}$$

and an initial state for $\hat\theta(0)$, find a minimization direction $\nu(l)$ and step size $\eta(l)$, and update the parameters

$$\hat\theta(l+1) = \hat\theta(l) + \eta(l)\, \nu(l) \tag{6.2}$$

The prediction $\hat y(k) = f(\hat\theta, k)$ is a function of the parameter vector $\hat\theta = [\hat\theta_1, \ldots, \hat\theta_P]^T$, and is computed using the current parameter estimates, $\hat\theta = \hat\theta(l)$; $y(k)$ is the corresponding target observation (measurement); $K$ is the number of data patterns in the training set.

¹ Principle of least squares prediction error [3]: Postulate a model that gives the prediction in terms of past data and parameters. Given the observations of past data, adjust the parameters in such a way that the sum of the squares of the prediction errors is as small as possible.
The task of the minimization is to find optimal values for the direction and the step size, when only local information of the function is available. Repeated application of (6.2), each time with the optimal direction and step size, will bring $J(\theta)$ to a minimum. As a result of the search, a parameter estimate is obtained which minimizes the cost function

$$\hat\theta = \arg\min_{\theta} J(\theta) \tag{6.3}$$

Note, however, that the globality of the minimum cannot be guaranteed. The optimization techniques are not necessarily restricted to cost functions of the specific form given by (6.1), as long as the derivatives are computed accordingly.
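6.1.1 First-order methods

Let us now focus on finding the minimization direction. Let $\bar\theta$ be some fixed parameter vector. The cost function (6.1) can be written as a Taylor expansion around $\bar\theta$:

$$J(\theta) = J(\bar\theta) + \sum_{p=1}^{P} \left[ \frac{\partial J}{\partial \theta_p} \right]_{\theta = \bar\theta} \tilde\theta_p + \frac{1}{2} \sum_{p=1}^{P} \sum_{p^*=1}^{P} \left[ \frac{\partial^2 J}{\partial \theta_p\, \partial \theta_{p^*}} \right]_{\theta = \bar\theta} \tilde\theta_p \tilde\theta_{p^*} + \cdots \tag{6.4}$$

where $\tilde\theta$ is the deviation from $\bar\theta$, $\theta = \bar\theta + \tilde\theta$. In the first-order methods, only the first non-constant term of the Taylor expansion is used:

$$J(\theta) \approx J(\bar\theta) + \sum_{p=1}^{P} \left[ \frac{\partial J}{\partial \theta_p} \right]_{\theta = \bar\theta} \tilde\theta_p \tag{6.5}$$

The derivative of $J(\theta)$, approximated by (6.5), is given by

$$\frac{\partial J}{\partial \theta_p} \approx \left[ \frac{\partial J}{\partial \theta_p} \right]_{\theta = \bar\theta} \tag{6.6}$$

A natural fixed point $\bar\theta$ is the current parameter estimate $\hat\theta(l)$. To compute the new estimate, the minimization direction is given by the negative gradient. The learning rule then becomes

$$\hat\theta(l+1) = \hat\theta(l) - \eta(l) \left[ \frac{\partial J}{\partial \theta} \right]_{\theta = \hat\theta(l)} \tag{6.7}$$

The step size $\eta(l)$ is often replaced by a fixed constant, due to lower computational cost. These types of methods are often also referred to as steepest descent, gradient descent, least mean squares, or error backpropagation techniques.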
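A first-order method with a fixed step size can be sketched in a few lines. The example below is an illustration under assumptions, not the authors' implementation: it fits a hypothetical one-parameter model $\hat y = e^{-\theta u}$ to noise-free data, using the mean-squared cost and its analytic gradient; the data, the starting point and the step size are made up.

```python
import math


def model(theta, u):
    """Hypothetical one-parameter model, nonlinear in theta."""
    return math.exp(-theta * u)


def cost_gradient(theta, us, ys):
    """dJ/dtheta for J = (1/K) sum (y - yhat)^2, with dyhat/dtheta = -u * yhat."""
    K = len(us)
    return -2.0 / K * sum((y - model(theta, u)) * (-u * model(theta, u))
                          for u, y in zip(us, ys))


# Noise-free data generated with theta = 1.5 (illustrative)
us = [0.1 * k for k in range(21)]
ys = [model(1.5, u) for u in us]

theta = 0.5          # initial estimate
eta = 1.0            # fixed step size replacing eta(l)
for _ in range(2000):
    theta -= eta * cost_gradient(theta, us, ys)   # negative-gradient update
print(theta)
```

The large iteration count reflects the slow final convergence typical of steepest descent: near the minimum the gradient is small and the fixed step removes only a few percent of the remaining error per iteration.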
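6.1.2 Second-order methods

If the second non-constant term from the Taylor expansion is also considered, the cost function becomes

$$J(\theta) \approx J(\bar\theta) + \sum_{p=1}^{P} \left[ \frac{\partial J}{\partial \theta_p} \right]_{\theta = \bar\theta} \tilde\theta_p + \frac{1}{2} \sum_{p=1}^{P} \sum_{p^*=1}^{P} \left[ \frac{\partial^2 J}{\partial \theta_p\, \partial \theta_{p^*}} \right]_{\theta = \bar\theta} \tilde\theta_p \tilde\theta_{p^*} \tag{6.8}$$

and can be written as

$$J(\theta) = \bar J - b^T \tilde\theta + \frac{1}{2} \tilde\theta^T H \tilde\theta \tag{6.9}$$

where $\bar J = J(\bar\theta)$,

$$b = -\left[ \frac{\partial J}{\partial \theta} \right]_{\theta = \bar\theta} \tag{6.10}$$

and the elements of the Hessian $H$ are given by

$$h_{p,p^*} = \left[ \frac{\partial^2 J}{\partial \theta_p\, \partial \theta_{p^*}} \right]_{\theta = \bar\theta} \tag{6.11}$$

The minimum of (6.8) is found by setting the derivative to zero, and is located at

$$H \tilde\theta - b = 0 \tag{6.12}$$

from where the optimal $\tilde\theta$ can be obtained. It is given by

$$\tilde\theta = H^{-1} b \tag{6.13}$$

Thus, the optimization reduces to matrix inversion. Unfortunately, the calculation of the Hessian $H$ is computationally prohibitive in practice (analytical solutions are rare; see, e.g., [10]) and approximation methods must be applied. Alternatives include quasi-Newton methods such as BFGS and DFP, or the conjugate gradient methods (see, e.g., [71187][84]).

Levenberg-Marquardt method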
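A commonly used second-order method is the Levenberg-Marquardt method (see, e.g., [7]). Define a vector $R$ whose $K$ components $r_k$ are the residuals

$$r_k = \hat y(k) - y(k) \tag{6.14}$$

$k = 1, 2, \ldots, K$, where $K$ is the number of data samples. The cost function and its derivatives can now be expressed as

$$J(\theta) = \frac{1}{K} \sum_{k=1}^{K} r_k^2 = \frac{1}{K} R(\theta)^T R(\theta) \tag{6.15}$$

$$\frac{\partial J(\theta)}{\partial \theta} = \frac{2}{K} \sum_{k=1}^{K} r_k \frac{\partial r_k}{\partial \theta} = \frac{2}{K}\, G(\theta)^T R(\theta) \tag{6.16}$$

$$\frac{\partial^2 J(\theta)}{\partial \theta^2} = \frac{2}{K} \left[ G(\theta)^T G(\theta) + S(\theta) \right] \tag{6.17}$$

where $G(\theta)$ is the Jacobian matrix, whose elements $g_{k,p}$ are given by

$$g_{k,p} = \frac{\partial r_k}{\partial \theta_p} \tag{6.18}$$

(note that $\frac{\partial r_k}{\partial \theta_p} = \frac{\partial \hat y}{\partial \theta_p}(k)$) and $S(\theta)$ is the part of the Hessian matrix that contains the second derivatives of $r_k$. The Newton iteration is given by

$$\hat\theta(l+1) = \hat\theta(l) - \left[ G(\theta)^T G(\theta) + S(\theta) \right]^{-1} G(\theta)^T R(\theta) \tag{6.19}$$

where the Jacobian $G(\theta)$ is easy to calculate, while $S(\theta)$ is not. In the Gauss-Newton method, $S(\theta)$ is simply neglected. In the Levenberg-Marquardt method, the step is defined as

$$\hat\theta(l+1) = \hat\theta(l) - \left[ G(\theta)^T G(\theta) + \mu(l) I \right]^{-1} G(\theta)^T R(\theta) \tag{6.20}$$

where $\mu(l)$ is increased whenever the step would result in an increased value of $J$. When a step reduces $J$, $\mu(l)$ is reduced.

6.1.3 Step size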
When the minimization direction is available, the problem is to decide how far to go along this line before a new direction is chosen. The step size may be constant or time-varying. Often, the step size parameter is chosen such that it is a decaying function of time, such as $\eta(k) = 1/k$, for example. This choice is made due to theoretical requirements (to ensure infinite search range: $\sum_{l=1}^{\infty} \eta(l) = \infty$, and convergence of the estimates: $\sum_{l=1}^{\infty} \eta^2(l) < \infty$). Heuristically, the step size should be large when far away from the optimum (at the beginning of the search), and tend towards zero in the neighborhood of the optimum point (at the end of the search). In practice, however, it may be difficult to find an efficient step size coefficient fulfilling these requirements.

The following simple procedure was suggested for $\mu$ in the Levenberg-Marquardt method in [24]: whenever a step would result in an increased value of $J$, $\mu(l)$ is increased by multiplying it by some factor larger than 1; when a step reduces the value of $J$, it is divided by some factor larger than 1. Note that when $\mu$ is large the algorithm becomes steepest descent with step size equal to $1/\mu$, while for small $\mu$ the algorithm becomes Gauss-Newton.

Another common method is the three-point method: given three values $\theta_a < \theta_b < \theta_c$ such that the function at $\theta_b$, $J(\theta_b)$, is the lowest, the sign of the derivative at $\theta_b$ indicates whether a minimum is located in $[\theta_a, \theta_b]$ or in $[\theta_b, \theta_c]$. This section is then linearly interpolated from its endpoints, and the procedure is repeated.

6.1.4 Levenberg-Marquardt algorithm
Let us summarize the Levenberg-Marquardt method in the following algorithm.

Algorithm 27 (Levenberg-Marquardt) Given a function $J(\theta)$ of a sum of prediction errors

$$J(\theta) = \frac{1}{K} \sum_{k=1}^{K} \left[ y(k) - \hat y(k) \right]^2 \tag{6.21}$$

the minimizing parameters $\hat\theta = \arg\min_{\theta} J(\theta)$ can be found by the following algorithm.

1. Initialize: Set iteration index $l = 1$. Initialize $\hat\theta(1)$ and $\mu(1)$ and specify $\eta$.

2. Evaluate the model and the residuals: Evaluate $\hat y(k)$ and $\frac{\partial \hat y}{\partial \theta_p}(k)$ for all patterns $k$ and parameters $p$, using $\hat\theta(l)$. Compose the residual vector $R$

$$R = \left[ \hat y(1) - y(1), \ldots, \hat y(K) - y(K) \right]^T \tag{6.23}$$

and compute $J(\hat\theta(l))$

$$J(\hat\theta(l)) = \frac{1}{K} R^T R \tag{6.24}$$

Compose the Jacobian matrix $G$

$$g_{k,p} = \frac{\partial \hat y}{\partial \theta_p}(k) \tag{6.25}$$

3. Solve the parameter update

$$\Delta\hat\theta(l) = -\left[ G^T G + \mu(l) I \right]^{-1} G^T R \tag{6.26}$$

4. Repeat Step 2 using $\hat\theta(l) + \Delta\hat\theta(l)$, i.e., compute $J(\hat\theta(l) + \Delta\hat\theta(l))$. If $J(\hat\theta(l) + \Delta\hat\theta(l)) < J(\hat\theta(l))$, increase the step size (reduce $\mu$)

$$\mu(l+1) = \mu(l) / \eta \tag{6.27}$$

and update the parameters

$$\hat\theta(l+1) = \hat\theta(l) + \Delta\hat\theta(l) \tag{6.28}$$

otherwise reduce the step size (increase $\mu$)

$$\mu(l+1) = \eta\, \mu(l) \tag{6.29}$$

5. Set $l = l + 1$ and return to Step 2, or quit.

Example 36 (pH neutralization) Let us consider a simple simulated example related to pH neutralization. The process model is given in detail in Chapter 8. Here, we will concentrate on the problem of identifying a SISO steady-state titration curve: the effect of the base stream on the effluent pH. The data was obtained by solving the steady-state model for randomly selected inputs, scaling both variables to [0, 1], and adding Gaussian noise $N(0, 0.05^2)$ to the output measurement.
Figure 6.1: Titration curve data. Plot a) shows the observed data. Training data is indicated by circles, test data by crosses. Plot b) shows the estimated linear model, $f(\varphi) = 1.183\varphi + 0.001$.

Let us now examine the identification of a sigmoid neural network model for the process, estimating the parameters with the Levenberg-Marquardt (LM) method.

Data. Assume that two sets of data from the process have been measured, both containing 75 input-output observations of the plant behavior. The first set will be used for parameter estimation (training data), while the second is reserved for model validation (test set). In addition, in this simulated example, we can evaluate the true noiseless function. This gives us a third set (500 observations). The data are shown in Fig. 6.1a, where the training data is indicated by circles and the test data by crosses.

Model structure. Given the data, the next thing to do is to select the model structure. First, a linear regression model was estimated. The prediction is shown in Fig. 6.1b. Clearly, the plot indicates that the process may possess some nonlinearities. Since we are not aware of any data transformation for a titration curve that would convert the parameter estimation into a linear problem, we consider a sigmoid neural network (SNN) black-box structure. Let us stick to the simple one-hidden-layer network. With SNN the number of hidden nodes $H$ still needs to be set. The nonlinearities seem mild, so let us experiment with several moderate network sizes, say, 3, 5, and 10.

Parameter estimation. For the LM method, the initial parameter vector $\hat\theta(1)$, the initial step size $\mu(1)$ and its adjustment factor $\eta$ need to be specified. For SNN, a reasonable starting point is obtained by initializing $\hat\theta$ with small random values, say $\hat\theta_p \in N(0, 0.01)\ \forall p$. Let us set the initial step size to $\mu(1) = 0.01$ and a relatively moderate adjustment factor $\eta = 1.2$.
Figure 6.2: Evolution of the criteria $J = \frac{1}{K} \sum_{k} (\hat y(k) - y(k))^2$ during parameter estimation as a function of the LM iterations $l$. $y(k)$ are the output data in the noiseless ($J$), test ($J_{test}$) and training ($J_{train}$) data sets and $\hat y(k)$ are the corresponding predictions by the model. $H = 5$.

With these values set, we can proceed to estimating the parameters. Let us start with the medium-sized network ($H = 5$). The evolution of several criteria is shown in Fig. 6.2. The LM algorithm performs the task of minimizing $J_{train}$, which is the average of the squared residual errors between observations in the training data and the corresponding model predictions. As seen from Fig. 6.2 (the curve with circles), $J_{train}$ drops rapidly during the first 50 iteration rounds, and for $l > 50$ the evolution is slower. With the stopping criterion of $\| G^T R \| < 10^{-3}$ the model shown in Fig. 6.3a is obtained (predictions connected with a solid line).

Validation. At first sight, this seems to be a reasonable fit to the training data (circles in Fig. 6.3a). To have a better view of the model performance, we can compute the linearized parameters. These are shown in Fig. 6.3b. They show peaks in the areas where the prediction grows rapidly. Also, the gain is positive throughout the operating area, indicating that the model is monotone. Thus, the model seems to coincide with our expectations of what a titration curve should look like.

It is interesting to look at the estimated basis functions, see Fig. 6.3c, showing the adjusted sigmoid functions. The mapping seems to be composed of a practically linear term (with coefficient 3.3), a slowly increasing term (coefficient 0.4) and two sharper correction terms (coefficients 0.3 and 0.2). The model output is obtained by multiplying the sigmoids with the associated coefficients, and adding up the results (see Fig. 6.3d).

Figure 6.3: Performance of the model ($H = 5$). Plot a) shows the training data (circles), the prediction by the model (solid curve) and the noiseless function (dotted curve). The upper left corner shows the values of the performance criteria at the end of training (see caption of Fig. 6.2): $J_{train}(330) = 0.0018$, $J(330) = 0.0005$, $J_{test}(330) = 0.0032$. Plot b) shows the linearized parameters, i.e., the gradients with respect to the system input (solid curve) and the constant (dotted curve). Plot c) shows the adjusted basis functions and the associated coefficients $\alpha_h$. Plot d) shows the basis functions multiplied by their constants (solid curves) and the final mapping (dotted curve) obtained by summing the components and adding the bias term. All plots are shown as a function of the system input $\varphi$.

If the network has too many degrees of freedom, the noise in the finite data set will be captured by the model parameters. A rough cross-validation method is to spare a set of observations for model validation. Computing the minimization criterion for the test data can reveal whether this is the case. Fig. 6.2 (the curve with crosses) shows the evolution of $J_{test}$ during the estimation. It can be seen to drop rapidly during the first 50 iteration rounds and then grow, with a larger increase at $l = 200$. This indicates that our model may have captured some noise. Indeed, looking at the criterion $J$ (the dashed curve in Fig. 6.2), computed using the model prediction and the noiseless function (notice that this is not available in practical problems), shows that the minimum occurs at $l = 180$, after which $J$ starts to grow.

Figure 6.4: Evolution of the criteria as a function of the LM iterations $l$. $H = 10$ (see the legend of Fig. 6.2).

This effect is even more pronounced for the largest network experimented with ($H = 10$), as shown in Fig. 6.4. Figure 6.5 shows the prediction by the largest network, which indeed contains some spurious wiggles at $\varphi \in [0, 0.15]$. During the parameter estimation for the smallest network ($H = 3$) such phenomena could not be observed, and the prediction is the smoothest among the SNNs experimented with (see Fig. 6.6).

Figure 6.5: Prediction by the largest SNN model ($H = 10$, solid curve), training data (circles) and the noiseless function (dotted curve). The upper left corner shows the values of the performance criteria at the end of training (see legend of Fig. 6.2): $J_{train}(462) = 0.0016$, $J(462) = 0.0007$, $J_{test}(462) = 0.0035$.

Figure 6.6: Prediction by the smallest SNN model (solid curve), training data (circles) and the noiseless function (dotted curve). The upper left corner shows the values of the performance criteria at the end of training (see legend of Fig. 6.2): $J_{train}(66) = 0.0019$, $J(66) = 0.0006$, $J_{test}(66) = 0.0031$.

Comparing the final values of $J_{test}$ (see Figs. 6.3a, 6.5 and 6.6) would suggest the smallest network ($H = 3$) as the 'optimal' one. For this simulated example, we can also compute the deviation of the model from the noiseless function ($J$s in Figs. 6.3a, 6.5 and 6.6), and we find the medium network ($H = 5$) to have the smallest $J$. This gives a small taste of the difficulties associated with automatic data-driven structure selection methods. Taking into account that, in general, different values for the initial $\hat\theta$, the initial $\mu(1)$, $\eta$ and the stopping criterion result in different parameter estimates, it is easy to see that common sense engineering and process knowledge help a lot in assessing the validity of an identified model.
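The steps of Algorithm 27 can be sketched in plain Python as follows. This is a minimal illustration on a synthetic problem, not the pH example above: the two-parameter model $\hat y = \theta_1 \tanh(\theta_2 u)$, the noise-free data and the starting point are assumptions, and the small linear solver is included only to keep the sketch self-contained. The values $\mu(1) = 0.01$ and $\eta = 1.2$ mirror those used in Example 36.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def model(theta, u):
    """Hypothetical model: a single scaled sigmoid-type basis function."""
    return theta[0] * math.tanh(theta[1] * u)


def jacobian_row(theta, u):
    """Gradient of the prediction with respect to theta, cf. (6.25)."""
    t = math.tanh(theta[1] * u)
    return [t, theta[0] * u * (1.0 - t * t)]


def cost(theta, us, ys):
    return sum((model(theta, u) - y) ** 2 for u, y in zip(us, ys)) / len(us)


def levenberg_marquardt(theta, us, ys, mu=0.01, eta=1.2, iters=300):
    P = len(theta)
    J = cost(theta, us, ys)
    for _ in range(iters):
        R = [model(theta, u) - y for u, y in zip(us, ys)]        # (6.23)
        G = [jacobian_row(theta, u) for u in us]
        GtG = [[sum(g[p] * g[q] for g in G) + (mu if p == q else 0.0)
                for q in range(P)] for p in range(P)]
        GtR = [sum(g[p] * r for g, r in zip(G, R)) for p in range(P)]
        step = solve(GtG, GtR)                                   # (6.26)
        trial = [t - s for t, s in zip(theta, step)]
        J_trial = cost(trial, us, ys)
        if J_trial < J:                  # accept step, reduce mu (6.27)-(6.28)
            theta, J, mu = trial, J_trial, mu / eta
        else:                            # reject step, increase mu (6.29)
            mu *= eta
    return theta, J


us = [0.1 * k for k in range(1, 31)]
ys = [model([2.0, 0.8], u) for u in us]          # noise-free synthetic data
theta_hat, J_final = levenberg_marquardt([1.0, 1.0], us, ys)
print(theta_hat, J_final)
```

A fixed iteration count stands in for the "or quit" of Step 5; in practice a test such as the gradient-norm criterion used in Example 36 would be a natural replacement.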
6.2 Optimization under constraints

Parameter estimation rarely consists only of minimizing the prediction error on a fixed data set. Instead, more or less vague conditions are imposed on the model, either implicitly or explicitly. One way to include a priori knowledge explicitly into the identification is to consider optimization under constraints. A common technique for solving this kind of problem is the Lagrange multipliers approach (see, e.g., [15]).

6.2.1 Equality constraints

Consider a function $J(\theta)$ of a parameter vector $\theta = [\theta_1, \ldots, \theta_P]^T$ of $P$ parameters to be minimized

$$\min_{\theta} J(\theta) \tag{6.30}$$

subject to $C$ equality constraints

$$h_1(\theta) = 0, \quad \ldots, \quad h_C(\theta) = 0 \tag{6.31}$$

Construct a Lagrange function

$$L(\theta, \lambda) = J(\theta) + \sum_{c=1}^{C} \lambda_c h_c(\theta) \tag{6.32}$$

which is to be minimized with respect to $\theta$ and maximized with respect to $\lambda$. $\lambda = [\lambda_1, \ldots, \lambda_C]$ are the Lagrange multipliers, also referred to as the Kuhn-Tucker parameters. The Taylor expansion of this Lagrange function is given by

$$L(\theta, \lambda) = L(\bar\theta, \bar\lambda) + \sum_{p=1}^{P} \left[ \frac{\partial L(\theta, \lambda)}{\partial \theta_p} \right]_{\theta = \bar\theta;\, \lambda = \bar\lambda} \tilde\theta_p + \sum_{c=1}^{C} \left[ \frac{\partial L(\theta, \lambda)}{\partial \lambda_c} \right]_{\theta = \bar\theta;\, \lambda = \bar\lambda} \tilde\lambda_c + \cdots \tag{6.34}$$
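The optimality conditions are given by

$$\frac{\partial L(\theta, \lambda)}{\partial \theta} = 0 \tag{6.35}$$

$$\frac{\partial L(\theta, \lambda)}{\partial \lambda} = 0 \tag{6.36}$$

Example 37 (Parameter estimation algorithm) Consider a linear regression model

$$\hat y(k) = \hat\theta^T(k)\, \varphi(k-1) \tag{6.37}$$

and a criterion

$$J = \frac{1}{2} \left\| \hat\theta(k) - \hat\theta(k-1) \right\|^2 \tag{6.38}$$

Determine an identification algorithm for $\hat\theta(k)$ which minimizes $J$. Consider the following Lagrange function:

$$L = \frac{1}{2} \sum_{p=1}^{P} \left( \hat\theta_p(k) - \hat\theta_p(k-1) \right)^2 + \lambda \left[ y(k) - \hat\theta^T(k)\, \varphi(k-1) \right] \tag{6.39}$$

From the optimality conditions it follows that

$$\hat\theta(k) - \hat\theta(k-1) - \lambda \varphi(k-1) = 0 \tag{6.40}$$

$$y(k) - \hat\theta^T(k)\, \varphi(k-1) = 0 \tag{6.41}$$

Solving for $\hat\theta(k)$ we derive

$$y(k) - \varphi^T(k-1) \left[ \hat\theta(k-1) + \lambda \varphi(k-1) \right] = 0 \tag{6.42}$$

which gives for $\lambda$

$$\lambda = \frac{y(k) - \varphi^T(k-1)\, \hat\theta(k-1)}{\varphi^T(k-1)\, \varphi(k-1)} \tag{6.43}$$

The identification algorithm can be summarized as follows:

$$\hat\theta(k) = \hat\theta(k-1) + \frac{\varphi(k-1)}{\varphi^T(k-1)\, \varphi(k-1)} \left[ y(k) - \varphi^T(k-1)\, \hat\theta(k-1) \right] \tag{6.44}$$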
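The update (6.44) can be tried out directly. The sketch below is a minimal illustration: the "true" parameters and the regressor vectors are made-up values, the data are noise-free, and the guard against a zero regressor is an addition of this sketch rather than part of the derivation.

```python
def projection_update(theta, phi, y):
    """One step of (6.44): smallest parameter change that reproduces y exactly."""
    denom = sum(p * p for p in phi)
    if denom == 0.0:                  # no information in a zero regressor
        return list(theta)
    err = y - sum(t * p for t, p in zip(theta, phi))
    return [t + p * err / denom for t, p in zip(theta, phi)]


theta_true = [0.5, -1.0]              # hypothetical system parameters
theta = [0.0, 0.0]                    # initial estimate
regressors = [[1.0, 0.2], [0.3, 1.0], [1.0, -1.0], [0.5, 0.5]] * 25
for phi in regressors:
    y = sum(t * p for t, p in zip(theta_true, phi))   # noise-free observation
    theta = projection_update(theta, phi, y)
print(theta)

# After any single update the constraint (6.41) holds for the current pattern
phi_chk, y_chk = [2.0, 1.0], 0.3
theta_chk = projection_update([0.1, 0.1], phi_chk, y_chk)
fit_chk = sum(t * p for t, p in zip(theta_chk, phi_chk))
```

Because each update satisfies the newest observation exactly, the algorithm is sensitive to measurement noise; this follows from the equality constraint (6.41) rather than from the implementation.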
6.2.2 Inequality constraints

Let us extend the previous results in order to deal with inequality constraints. Assume that a function $J(\theta)$ of a parameter vector $\theta = [\theta_1, \ldots, \theta_P]^T$ of $P$ parameters is to be minimized

$$\min_{\theta} J(\theta) \tag{6.45}$$

subject to $C$ inequality constraints

$$q_1(\theta) \le 0, \quad \ldots, \quad q_C(\theta) \le 0 \tag{6.46}$$

Note that an equality constraint can be constructed using two inequality constraints. Again, a Lagrange function $L(\theta, \lambda)$ can be constructed

$$L(\theta, \lambda) = J(\theta) + \sum_{c=1}^{C} \lambda_c q_c(\theta) \tag{6.47}$$

where now $\lambda_c \ge 0$ (for more sophisticated approaches see, e.g., [15], pp. 302-319, pp. 334-342). The Lagrange function is simultaneously minimized with respect to the parameter vector $\theta$, and maximized with respect to the multipliers $\lambda$. A simple recursive algorithm solving this problem can be written as follows

$$\hat\theta(l+1) = \hat\theta(l) - \eta(l)\, \frac{\partial L(\theta, \lambda)}{\partial \theta(l)} \tag{6.48}$$

$$\lambda(l+1) = \max\left( 0,\ \lambda(l) + \eta(l)\, \frac{\partial L(\theta, \lambda)}{\partial \lambda(l)} \right) \tag{6.49}$$

where $\eta(l)$ is the learning rate (step size). The derivatives with respect to the Lagrange multipliers are given directly by the constraints

$$\frac{\partial L(\theta, \lambda)}{\partial \lambda_c(l)} = q_c(\hat\theta(l)) \tag{6.50}$$

where $c = 1, 2, \ldots, C$. In order to calculate the gradients with respect to the parameters

$$\frac{\partial L(\theta, \lambda)}{\partial \theta_p(l)} = \frac{\partial}{\partial \theta_p} J(\hat\theta(l)) + \sum_{c=1}^{C} \lambda_c \frac{\partial}{\partial \theta_p} q_c(\hat\theta(l)) \tag{6.51}$$
152
CHAPTER 6.
(p = 1, 2,...,
ESTIMATION
OF PARAMETERS
P), the gradients
o
__oj 00, need to be available. ences
(~(l))
In general, they can be approximated by finite
"00 J00p( ) \~ (1)/ J (0 0~ ~qc
J (0epA0v) +e~,A0P)2A0p
~ )[~(1)~
qc(O+ep~Op)qe(Oep~Op) 2~Op
(6.52) differ
(6.53)
(~.~)
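where $\Delta\theta_p$, $p = 1, 2, \ldots, P$, are some small variations of the parameters; $e_p$ is a column vector with 1 as the $p$'th element and zeros elsewhere. In some cases, an analytical form can be given. In the case of parameter estimation, the gradients $\partial J(\theta)/\partial\theta_p$ ($p = 1, 2, \ldots, P$) are usually available. In prediction error methods, the cost function is given by

$$ J(\theta) = \frac{1}{2}\sum_{k=1}^{K}\left(y(k) - \hat{y}(k)\right)^2 \tag{6.55} $$

where $y$ is the target output and $\hat{y}$ is the output predicted by the model. Thus,

$$ \frac{\partial J}{\partial \theta_p}(\theta) = -\sum_{k=1}^{K}\left(y(k) - \hat{y}(k)\right)\frac{\partial \hat{y}(k)}{\partial \theta_p} \tag{6.56} $$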
In black-box models, the gradient $\partial \hat{y}/\partial\theta_p$ is required by (unconstrained) parameter estimation methods, and is usually easily available. The availability of the gradients of the constraints, $\partial q_c(\theta)/\partial\theta_p$, depends on the type of constraints. For simple constraints (upper and lower bounds on the output or parameters, fixed points), analytic forms of the gradients are easy to obtain. For other typical constraints, such as constraints on gains, poles, deviation from a nominal model, etc., these may be difficult to obtain. Let us collect the results into an algorithm.

Algorithm 28 (Lagrange multipliers) Problem: minimize $J(\theta)$ subject to $q_c(\theta) \le 0$, $c = 1, 2, \ldots, C$, with respect to $\theta = [\theta_1, \ldots, \theta_P]$.

1. Initialize: Set iteration index $l = 1$. Initialize $\theta(l)$ and $\lambda(l)$.

2. Evaluate model and constraints: Evaluate $\hat{y}(k)$ and $\partial\hat{y}(k)/\partial\theta_p$ for all patterns $k$. Evaluate $q_c$ and $\partial q_c/\partial\theta_p$ for all constraints $c$. Evaluate $\partial J/\partial\theta_p$ for all parameters $p$.

3. Compose gradients of the Lagrangian, $\partial L/\partial\lambda_c$ and $\partial L/\partial\theta_p$:

$$ \frac{\partial L}{\partial \lambda_c(l)} = q_c(\theta(l)) \tag{6.57} $$

$$ \frac{\partial L}{\partial \theta_p(l)} = \frac{\partial J}{\partial \theta_p}(\theta(l)) + \sum_{c=1}^{C} \lambda_c(l)\, \frac{\partial q_c}{\partial \theta_p}(\theta(l)) \tag{6.58} $$

4. Update the parameters and Lagrange multipliers:

$$ \theta(l+1) = \theta(l) - \eta(l)\,\frac{\partial L}{\partial \theta(l)} \tag{6.59} $$

$$ \lambda(l+1) = \max\left(0,\; \lambda(l) + \eta(l)\,\frac{\partial L}{\partial \lambda(l)}\right) \tag{6.60} $$

5. Quit, or set $l = l + 1$ and return to Step 2.
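The steps of Algorithm 28 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the learning rate is held constant, all gradients are approximated by central differences as in (6.53)-(6.54), and the names `central_diff` and `lagrange_minimize`, as well as the small test problem, are hypothetical and not taken from the text.

```python
def central_diff(f, theta, p, delta=1e-5):
    """Central-difference approximation of df/dtheta_p, cf. (6.53)-(6.54)."""
    up = list(theta)
    up[p] += delta
    down = list(theta)
    down[p] -= delta
    return (f(up) - f(down)) / (2.0 * delta)


def lagrange_minimize(J, constraints, theta0, eta=0.05, iterations=2000):
    """Gradient descent in theta, projected gradient ascent in lambda."""
    theta = list(theta0)
    lam = [0.0] * len(constraints)
    for _ in range(iterations):
        # Step 3: gradients of the Lagrangian, (6.57) and (6.58).
        q = [qc(theta) for qc in constraints]
        grad = []
        for p in range(len(theta)):
            g = central_diff(J, theta, p)
            g += sum(l * central_diff(qc, theta, p)
                     for l, qc in zip(lam, constraints))
            grad.append(g)
        # Step 4: updates (6.59) and (6.60); max() keeps lambda_c >= 0.
        theta = [t - eta * g for t, g in zip(theta, grad)]
        lam = [max(0.0, l + eta * qv) for l, qv in zip(lam, q)]
    return theta, lam


# Hypothetical test problem: minimize (t1 - 2)^2 + (t2 - 1)^2
# subject to t1 + t2 - 2 <= 0.
theta_opt, lam_opt = lagrange_minimize(
    lambda t: (t[0] - 2.0) ** 2 + (t[1] - 1.0) ** 2,
    [lambda t: t[0] + t[1] - 2.0],
    [0.0, 0.0],
)
```

For this small problem the constraint is active at the optimum, and the iteration should settle near $\theta = (1.5, 0.5)$ with $\lambda \approx 1$, which is the point of the feasible half-plane closest to the unconstrained minimum.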
6.3 Guided random search methods
For solving optimization tasks, such as in parameter estimation, the most popular approaches are gradient-based. However, a cost function may have several local optima, and it is well known that gradient-based estimation routines may converge to a local optimum instead of a global one. A common solution is to repeat the gradient-based search several times, starting at different (random) initial locations. An alternative is to use some (guided) random search method instead of a gradient-based method, or to use both as in hybrid methods. With random search methods, the computation of the gradients at each iteration (often the most time-consuming phase in the implementation and computation of gradient-based methods) can be avoided. As well, constraints can be easily included. Most importantly, random search methods perform a global search in the search space, and are thus not easily fooled by local optima. As this has, however, not been shown to be a severe problem in many practical applications, various gradient methods are commonly used due to their efficacy.

The Matyas random optimization method [6] uses the idea of 'contaminating' the current solution in order to explore the search space around the current solution. At each iteration, a Gaussian random vector is added to the current solution. If the new solution improves the model performance, it will be used in the next iteration. If no improvement occurs, the old solution is kept and a new Gaussian random vector is generated.

In simulated annealing [46], occasional upward steps in the criterion are allowed. The acceptance of upward steps is treated probabilistically, so that as the optimization proceeds, the probability is decreased until the system 'freezes'.

A different approach was taken by Luus and Jaakola [56], whose method is based on direct search and systematic search region reduction. In each iteration, a number of random vectors belonging to the search space are generated and evaluated against the criterion. In order to improve the solution, instead of concentrating on the space close to the best solution, the direct search is performed in a much larger space. In each iteration, the search space is, however, slightly reduced until it becomes so small that a desired accuracy has been obtained.

Whereas simulated annealing finds its background in statistical mechanics and evolutionary processes, genetic algorithms [22] are motivated by the mechanisms of natural selection and genetics. From an initial population of solutions, a genetic algorithm chooses the fittest solutions (in the sense of a given criterion) using a selection operator.
In order to generate new solutions, operators such as crossover and mutation are used. They are based on a specific form of coding of the solutions as strings, which allows recombination operators similar to those observed with chromosomes. On the new population of solutions, selection and recombination operators are used repeatedly until the population converges, giving a population of fittest solutions.

Optimization techniques based on learning automata [75] also belong to the class of random search techniques. The concept of learning automata was initially introduced in connection with the modeling of biological systems. They have been widely used to solve problems for which an analytical solution is not possible, or which are mathematically intractable. They have also attracted interest due to their potential usefulness in engineering problems of optimization and control characterized by nonlinearity and a high level of uncertainty. In general, learning automata are very simple machines, and have few and transparent tuning parameters. As an example of the random search paradigms, we will next consider the stochastic learning automata in more detail.
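As a point of comparison before turning to learning automata, the Matyas random optimization idea described above (perturb the current solution with a Gaussian vector, keep the perturbation only if the criterion improves) can be sketched as follows. The function name `matyas_search`, the fixed standard deviation `sigma`, and the quadratic test criterion are assumptions chosen for illustration only.

```python
import random


def matyas_search(J, x0, sigma=0.3, iterations=5000, seed=0):
    """Random optimization: accept a Gaussian perturbation of the
    current solution only when it lowers the criterion J."""
    rng = random.Random(seed)
    best = list(x0)
    best_cost = J(best)
    for _ in range(iterations):
        candidate = [x + rng.gauss(0.0, sigma) for x in best]
        cost = J(candidate)
        if cost < best_cost:
            # Improvement: the new solution is used in the next iteration.
            best, best_cost = candidate, cost
        # Otherwise the old solution is kept and a new vector is drawn.
    return best, best_cost


# Hypothetical criterion with its minimum at (1, -2).
start = [0.0, 0.0]
criterion = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
solution, cost = matyas_search(criterion, start)
```

No gradient is evaluated anywhere, which is the main attraction of this family of methods; the price is a large number of criterion evaluations.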
6.3.1 Stochastic learning automaton
Optimization techniques based on learning automata (LA) belong to the class of (guided) random search techniques. In general, random search methods have attracted fairly little interest in optimization, although they have some very appealing features. Learning automata can be applied to a large class of optimization problems, since there are only few assumptions concerning the function to be optimized. They are simple, transparent and easy to apply, even for complexly structured or constrained systems. The main advantages, compared to the more popular gradient-based algorithms, are that the gradients need not be computed and that the search for the global minimum is not easily fooled by local minima.

A learning automaton is a sequential machine characterized by:

• a set of actions,
• an action probability distribution and
• a reinforcement learning scheme.

It is connected in a feedback loop (see Fig. 6.7) to the random environment (the function to be optimized, the process to be controlled, etc.). At every sampling instant, the automaton chooses randomly an action from a finite set of actions on the basis of a probability distribution. The selected action causes a reaction of the environment, which in turn is the input signal for the automaton. With a reinforcement scheme, the learning automaton recursively updates its probability distribution, and should be capable of changing its structure and/or parameters to achieve the desired goal or optimal performance in the sense of a given criterion.

[Figure 6.7: block diagram with the blocks 'random environment', 'normalization procedure' and 'stochastic automaton'; signals $u(k)$ and $\bar{\xi}(k)$.]

Figure 6.7: A learning automaton with a normalization procedure connected in a feedback loop with the environment. The automaton produces an action, $u(k)$, based on the probabilities of the actions. The environment response, $\xi(k)$, is normalized and fed back to the automaton. The automaton adjusts its action probabilities and produces a new action.

To describe an automaton, introduce the following [75]:

1. $U$ denotes the set $\{u_1, u_2, \ldots, u_A\}$ of the $A$ actions of the automaton, $A \in [2, \infty)$.

2. $\{u(k)\}$ is a sequence of automaton outputs (actions), $u(k) \in U$.

3. $p(k) = [p_1(k), \ldots, p_A(k)]^T$ is a vector of action probabilities at iteration $k$, for which

$$ \sum_{a=1}^{A} p_a(k) = 1, \quad \forall k \tag{6.61} $$

4. $\{\xi(k)\}$ is a sequence of automaton inputs (environment responses). Automaton inputs are provided by the environment either in a binary (P-model environment) or continuous (S-model environment) form.

5. $T$ represents the reinforcement scheme which changes the probability vector

$$ p(k+1) = p(k) + \eta(k)\, T\left(p(k), \{\xi(\kappa)\}_{\kappa=1,2,\ldots,k}, \{u(\kappa)\}_{\kappa=1,2,\ldots,k}\right) \tag{6.62} $$

$$ p_a(1) > 0 \quad \forall a \tag{6.63} $$

where $\eta(k)$ is a scalar learning rate that may be time-varying. The vector $T = [T_1(\cdot), \ldots, T_A(\cdot)]^T$ satisfies the following conditions for preserving the probability measure:

$$ \sum_{a=1}^{A} T_a(\cdot) = 0, \quad \forall k \tag{6.64} $$

$$ p_a(k) + \eta(k)\, T_a(\cdot) \in [0, 1], \quad \forall k, \forall a \tag{6.65} $$
The operation of a learning automaton can be summarized as follows (see Fig. 6.7):

1. Select randomly an action $u(k)$ from the action set $U$ according to the probability distribution $p(k)$.
2. Calculate the normalized environment response $\bar{\xi}(k)$.
3. Adjust the probability vector $p(k)$.
4. Return to Step 1, or quit.

A practical method for choosing an action according to a probability distribution (Step 1) is to generate a uniformly distributed random variable $\zeta = U(0, 1)$. The $a$'th action $u(k) = u_a$ is then chosen such that $a$ is equal to the least value of $i$ satisfying the following constraint:

$$ \sum_{j=1}^{i} p_j(k) \ge \zeta \tag{6.66} $$
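The selection rule (6.66) amounts to walking through the cumulative sum of the action probabilities. A minimal sketch follows; the function name `select_action`, the zero-based indexing, and the final fallback for rounding errors are assumptions made here.

```python
import random


def select_action(p, zeta):
    """Return the smallest (zero-based) index a for which
    p[0] + ... + p[a] >= zeta, cf. (6.66)."""
    cumulative = 0.0
    for a, pa in enumerate(p):
        cumulative += pa
        if cumulative >= zeta:
            return a
    # Guard: rounding may leave the total slightly below zeta near 1.
    return len(p) - 1


# Usage: draw zeta from U(0, 1) and pick an action.
rng = random.Random(0)
action = select_action([0.2, 0.5, 0.3], rng.random())
```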
In the S-model environment, the continuous environment responses (Step 2) need to be in the range $\bar{\xi}(k) \in [0, 1]$. To achieve this, a normalization procedure can be applied, e.g.,

$$ \bar{\xi}(k) = \frac{s_a(k) - \min_{i=1,\ldots,A} s_i(k)}{\max_{i=1,\ldots,A} s_i(k) - \min_{i=1,\ldots,A} s_i(k)} \tag{6.67} $$

where $u_a$ is the chosen action, $s_a$ is the expectation of the environment response $\xi$ for action $a$, and $\bar{\xi}(k)$ denotes the normalized environment response, $\bar{\xi}(k) \in [0, 1]$.

A number of reinforcement schemes (Step 3) have been described in the literature [75]. A general nonlinear reinforcement scheme is of the form [67]:

• if $u(k) = u_a$:

$$ p_a(k+1) = p_a(k) + \left(1 - \bar{\xi}(k)\right)\sum_{j=1,\, j \ne a}^{A} g_j(p(k)) - \bar{\xi}(k)\sum_{j=1,\, j \ne a}^{A} h_j(p(k)) \tag{6.68} $$

• if $u(k) \ne u_a$:

$$ p_a(k+1) = p_a(k) - \left(1 - \bar{\xi}(k)\right) g_a(p(k)) + \bar{\xi}(k)\, h_a(p(k)) \tag{6.69} $$

where the functions $g_a$ and $h_a$ are associated with reward and penalty, respectively. A simple reinforcement scheme, the linear reward-penalty ($L_{R-P}$) scheme [11], for an automaton of $A$ actions operating in an S-model environment is obtained by selecting

$$ g_a(p(k)) = \eta\, p_a(k) \tag{6.70} $$

$$ h_a(p(k)) = \eta\left(\frac{1}{A-1} - p_a(k)\right) \tag{6.71} $$

Substituting the preceding equations into (6.68)-(6.69) we have the following $L_{R-P}$ learning scheme:

• if $u(k) = u_a$:

$$ p_a(k+1) = p_a(k) + \eta\left(1 - p_a(k) - \bar{\xi}(k)\right) \tag{6.72} $$

• if $u(k) \ne u_a$:

$$ p_a(k+1) = p_a(k) - \eta\left(p_a(k) - \frac{\bar{\xi}(k)}{A-1}\right) \tag{6.73} $$
Learning automata can be applied to a large variety of complex optimization problems, since there are only few assumptions concerning the function to be optimized. Typical applications include multimodal function optimization problems, see Fig. 6.8. What makes the automata approach particularly interesting is the existence of theoretical proofs (ε-accuracy, ε-optimality, convergence with probability one, convergence in the mean square sense, rate of convergence) (see, e.g., [75]). There are almost no conditions concerning the function to be optimized (continuity, unimodality, differentiability, convexity, etc.). Learning automata perform a global search on the search space (action space), and they are not easily fooled by local minima. In general, learning automata are simple machines, and have few and transparent tuning parameters. The main drawback is the lack of efficiency (slow convergence rates), in particular for large action spaces.
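A linear reward-penalty automaton for function minimization over a quantized search region can be sketched as below. The update follows the usual S-model $L_{R-P}$ form for a response normalized to $[0, 1]$, where 0 is the most favourable response: the chosen action receives $p_a + \eta(1 - p_a - \bar{\xi})$ and every other action receives $p_a - \eta(p_a - \bar{\xi}/(A-1))$. Everything else is an assumption made for illustration: the running-mean estimate of the expected response per action, the neutral response 0.5 used while all estimates coincide, the constant learning rate, and the test function.

```python
import math
import random


def lrp_update(p, chosen, xi, eta):
    """Linear reward-penalty update for a normalized response xi in [0, 1]."""
    A = len(p)
    updated = []
    for a, pa in enumerate(p):
        if a == chosen:
            updated.append(pa + eta * (1.0 - pa - xi))
        else:
            updated.append(pa - eta * (pa - xi / (A - 1)))
    return updated


def normalize(s, chosen):
    """Scale the estimated response of the chosen action to [0, 1]
    using the smallest and largest estimates seen so far."""
    known = [v for v in s if v is not None]
    low, high = min(known), max(known)
    if high == low:
        return 0.5  # assumption: neutral response until estimates differ
    return (s[chosen] - low) / (high - low)


def run_automaton(loss, actions, eta=0.05, steps=3000, seed=1):
    rng = random.Random(seed)
    A = len(actions)
    p = [1.0 / A] * A      # uniform initial probabilities, p_a(1) > 0
    s = [None] * A         # running mean of the response per action
    counts = [0] * A
    for _ in range(steps):
        chosen = rng.choices(range(A), weights=p)[0]
        response = loss(actions[chosen])
        counts[chosen] += 1
        if s[chosen] is None:
            s[chosen] = response
        else:
            s[chosen] += (response - s[chosen]) / counts[chosen]
        p = lrp_update(p, chosen, normalize(s, chosen), eta)
    return p


# Hypothetical multimodal loss on a grid of 13 fixed points in [-3, 3].
grid = [-3.0 + 0.5 * i for i in range(13)]
probabilities = run_automaton(lambda x: math.sin(3.0 * x) + 0.1 * x * x, grid)
```

Both branches of the update add up to zero over all actions, so the probabilities keep summing to one, and each updated probability is a convex combination of values in $[0, 1]$.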
[Figure 6.8: block diagram; the environment consists of the optimized function, observation noise and a loss function, and is connected in a loop with the learning automaton.]

Figure 6.8: Multimodal function optimization using a learning automaton. The search region is quantified using $X_a, X_b \subset X$; $X_a \cap_{a \ne b} X_b = \emptyset$; $\cup_a X_a = X$, where $u(k) \in \{u_1, \ldots, u_A\}$; and $u_a \in X_a$ are fixed points.
6.4 Simulation examples
This section concerns three applications related to process engineering. First, a SISO Wiener model for a simulated pneumatic valve is identified. Both grey-box and black-box approaches for modeling the static part are considered. The results are compared with those from the literature [95], and the estimated parameters are examined. The second example considers the estimation of the parameters in a MISO Hammerstein model. The data is drawn from a binary distillation column model, considered also in [17]. Parameter estimation under constraints posed on the properties of the static part is illustrated. The third example illustrates identification of a two-tank system under constraints using a Wiener model.

All simulations were performed on a Pentium PC (450 MHz) with Matlab 5.2. In the parameter estimation, the functions leastsq.m (Levenberg-Marquardt method with a mixed quadratic and cubic line search) and constr.m (mechanization of the Lagrange multipliers approach) from the Matlab optimization toolbox were used. The differential equations were solved using the ode23.m function, and inverse problems with fsolve.m.
6.4.1 Pneumatic valve: identification of a Wiener system
Let us first consider a simple example on the identification of a Wiener system. A simple model for a pneumatic valve for fluid flow control is given in [95], where also a Wiener model for the system is identified. Some of the simulation results can also be found in [36].

Process and data

Pneumatic as well as electrical valves are commonly used for fluid flow control. The static characteristics of a valve for fluid flow control vary with operating conditions. The input of the model represents the pneumatic control signal applied to the stem, while the internal variable represents the stem position, or equivalently, the position of the valve plug. Linear dynamics describe the dynamic balance between the control signal and the stem position:

$$ \frac{z(k)}{u(k)} = \frac{0.1044\, q^{-1} + 0.0883\, q^{-2}}{1 - 1.4138\, q^{-1} + 0.6065\, q^{-2}} \tag{6.74} $$

The flow through the valve is given by a nonlinear function of the stem position:

$$ y(k) = \frac{z(k)}{\sqrt{c_1 + c_2 z^2(k)}}; \quad c_1 = 0.1;\; c_2 = 0.9 \tag{6.75} $$
Based on the above model, data sets of 1000 input-output pairs were generated in the same fashion as in [95] (see Fig. 6.9): a pseudo random sequence (PRS) was used as input. In practice it is impossible to obtain perfect measurements; to make the simulations more realistic in the case of noisy observations, Gaussian noise was introduced at the output measurement. For reference purposes, a noiseless data set was also generated, as well as two test sets of 200 input-output patterns.

Model structure and parameter estimation

The structure of the linear dynamic SISO model was assumed to be known ($n_B = 1$, $n_A = 2$, $d = 1$)

$$ z(k) = \frac{b_0 + b_1^{*} q^{-1}}{1 - a_1 q^{-1} + a_2 q^{-2}}\, u(k-1) \tag{6.76} $$
[Figure 6.9: two panels titled 'Training data', horizontal axes 'time (samples)' from 0 to 1000.]

Figure 6.9: Training data for the pneumatic valve model. The system input (upper plot) was a PRS signal with a basic clock period of seven samples. The system output (lower plot) was corrupted by Gaussian noise $N(0, 0.05^2)$.
Thus the linear dynamic part contained three parameters to be estimated from data (remember that $b_1^{*}$ is fixed, see Algorithm 23). Two model structures were considered for the static part. In the first case, the SQRT structure of (6.75), with the two parameters $c_1$ and $c_2$ to be estimated, was considered. The required derivatives are simple to compute, resulting in

$$ \frac{\partial y}{\partial z} = \frac{c_1}{\left(c_1 + c_2 z^2\right)^{3/2}} \tag{6.77} $$

$$ \frac{\partial y}{\partial c_1} = -\frac{z}{2\left(c_1 + c_2 z^2\right)^{3/2}} \tag{6.78} $$

$$ \frac{\partial y}{\partial c_2} = -\frac{z^3}{2\left(c_1 + c_2 z^2\right)^{3/2}} \tag{6.79} $$
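The derivatives of the SQRT nonlinearity $y = z/\sqrt{c_1 + c_2 z^2}$ can be checked numerically. The sketch below evaluates the analytic expressions ($\partial y/\partial z = c_1 d^{-3/2}$, $\partial y/\partial c_1 = -z\, d^{-3/2}/2$, $\partial y/\partial c_2 = -z^3 d^{-3/2}/2$, with $d = c_1 + c_2 z^2$) and compares them with central differences; the function names and the test point $z = 0.6$ are arbitrary choices made here.

```python
import math


def valve_static(z, c1=0.1, c2=0.9):
    """Static SQRT nonlinearity of the valve, y = z / sqrt(c1 + c2 z^2)."""
    return z / math.sqrt(c1 + c2 * z * z)


def valve_gradients(z, c1=0.1, c2=0.9):
    """Analytic derivatives of y with respect to z, c1 and c2."""
    d = (c1 + c2 * z * z) ** 1.5
    return c1 / d, -z / (2.0 * d), -z ** 3 / (2.0 * d)


# Central-difference check at an arbitrary operating point.
z0, h = 0.6, 1e-6
numeric = (
    (valve_static(z0 + h) - valve_static(z0 - h)) / (2 * h),
    (valve_static(z0, c1=0.1 + h) - valve_static(z0, c1=0.1 - h)) / (2 * h),
    (valve_static(z0, c2=0.9 + h) - valve_static(z0, c2=0.9 - h)) / (2 * h),
)
analytic = valve_gradients(z0)
```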
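In the second case, a 0-order Sugeno fuzzy model was used (see Algorithm 18). Let us assume crisp system input(s), triangular fuzzy sets, add-one partition, product t-norm, and weighted average defuzzification. Then, a 0-order Sugeno model can be expressed as

$$ \hat{y} = \sum_{p_1=1}^{P_1}\sum_{p_2=1}^{P_2}\cdots\sum_{p_I=1}^{P_I} \tilde{\alpha}_{p_1,p_2,\ldots,p_I} \prod_{i=1}^{I} \mu_{i,p_i}(z_i, \cdot) \tag{6.80} $$

where the input degrees of membership are given by

$$ \mu_{i,p_i}(z_i, \tilde{\beta}_i) = \max\left(\min\left(\frac{z_i - \tilde{\beta}_{i,p_i-1}}{\tilde{\beta}_{i,p_i} - \tilde{\beta}_{i,p_i-1}},\; \frac{\tilde{\beta}_{i,p_i+1} - z_i}{\tilde{\beta}_{i,p_i+1} - \tilde{\beta}_{i,p_i}}\right),\; 0\right) \tag{6.81} $$

where $\mu_{i,p_i}(z_i, \cdot)$ is the membership function associated with the $p_i$'th fuzzy set partitioning the $i$'th input ($p_i = 1, 2, \ldots, P_i$). The input partition is given by $\tilde{\beta}_i$; $\tilde{\beta}_{i,p_i-1} < \tilde{\beta}_{i,p_i}$. The derivatives with respect to the inputs are given by

$$ \frac{\partial \hat{y}}{\partial z_i} = \sum_{p_1=1}^{P_1}\sum_{p_2=1}^{P_2}\cdots\sum_{p_I=1}^{P_I} \tilde{\alpha}_{p_1,p_2,\ldots,p_I}\, \frac{\partial \mu_{i,p_i}(z_i, \cdot)}{\partial z_i} \prod_{j=1;\, j \ne i}^{I} \mu_{j,p_j}(z_j, \cdot) \tag{6.82} $$

where

$$ \frac{\partial \mu_{i,p_i}(z_i, \cdot)}{\partial z_i} = \begin{cases} \dfrac{1}{\tilde{\beta}_{i,p_i} - \tilde{\beta}_{i,p_i-1}} & \text{if } \tilde{\beta}_{i,p_i-1} < z_i \le \tilde{\beta}_{i,p_i} \\[2mm] -\dfrac{1}{\tilde{\beta}_{i,p_i+1} - \tilde{\beta}_{i,p_i}} & \text{if } \tilde{\beta}_{i,p_i} < z_i < \tilde{\beta}_{i,p_i+1} \\[2mm] 0 & \text{otherwise} \end{cases} \tag{6.83} $$

and with respect to the consequent parameters by

$$ \frac{\partial \hat{y}}{\partial \tilde{\alpha}_{p_1,p_2,\ldots,p_I}} = \prod_{i=1}^{I} \mu_{i,p_i}(z_i, \cdot) \tag{6.84} $$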
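For a single input, a 0-order Sugeno model with triangular sets on an add-one partition reduces to linear interpolation between the knots. The sketch below illustrates this for the valve example, using the knots $[0, 0.2, 0.4, 0.7, 1]$ and the consequent parameters reported for the noiseless 0-Sugeno case in Table 6.5. The function names and the shouldered treatment of the two outermost sets are assumptions made here.

```python
def memberships(z, knots):
    """Triangular membership degrees of z, one set centred at each knot."""
    P = len(knots)
    mu = []
    for p in range(P):
        # Rising edge from the previous knot (outermost set: shoulder).
        left = (z - knots[p - 1]) / (knots[p] - knots[p - 1]) if p > 0 else 1.0
        # Falling edge towards the next knot (outermost set: shoulder).
        right = (knots[p + 1] - z) / (knots[p + 1] - knots[p]) if p < P - 1 else 1.0
        mu.append(max(min(left, right), 0.0))
    return mu


def sugeno_siso(z, knots, consequents):
    """Model output: consequents weighted by the membership degrees.
    With an add-one partition the degrees already sum to one."""
    return sum(w * m for w, m in zip(consequents, memberships(z, knots)))


knots = [0.0, 0.2, 0.4, 0.7, 1.0]
consequents = [0.0545, 0.5743, 0.8253, 0.9626, 1.0021]  # Table 6.5, no noise
y_at_knot = sugeno_siso(0.2, knots, consequents)
y_between = sugeno_siso(0.3, knots, consequents)
```

At a knot the output equals the corresponding consequent parameter, and halfway between two knots it equals the mean of the two neighbouring parameters, which is the piecewise linear behaviour.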
In the SISO problem ($I = 1$) considered here, five triangular membership functions were used ($P_1 = 5$). This results in a piecewise linear structure which is functionally equivalent to that used in [95]. The antecedent parameters (knots) were set to $\tilde{\beta}_1 = [0, 0.2, 0.4, 0.7, 1]$ as in [95], and the consequent parameters $\tilde{\alpha}$ were to be estimated from data.

The parameters of the two model structures were estimated using the Levenberg-Marquardt method, using both noiseless and noisy data, resulting in four simulations. In addition, the results were compared with those reported in [95].

Analysis

Table 6.1 shows the number of iterations required in the four cases. The training was fast, completed in a few minutes. Note that the recursive prediction error method used in [95] is not a batch method, and thus the results are not directly comparable with [95] (bottom line of Table 6.1). The root-mean-squared errors (RMSE)

$$ \mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{k=1}^{K}\left(y(k) - \hat{y}(k)\right)^2} \tag{6.85} $$

on the corresponding training set and on noiseless test data are given in Table 6.2. When the model structure was exactly known (the SQRT case), the match was perfect; the RMSE on training and test set were zero up to four digits in the noiseless case, and close to the standard deviation of the noise in the case of noisy data. Also in the case of a more general black-box model (0-Sugeno) the accuracy of all the identified models was good.

Tables 6.3, 6.4 and 6.5 show the true values of the parameters, estimated values, and the parameter values estimated in [95]. Table 6.3 shows the coefficients of the polynomials A and B, i.e., the linear dynamics. The maximum deviation from the true zero at $q = -0.8458$ was less than 1% in the case of the SQRT model, as well as in the case of the 0-Sugeno model identified from noiseless data. For the 0-Sugeno models identified from noisy data, however, the deviation was larger, yet less than 20%. In all cases, deviation from the true poles located at $q = 0.7069 \pm 0.3268i$ was less than 2%.
Training time            training epochs
SQRT, no noise           47
SQRT, noise              64
0-Sugeno, no noise       37
0-Sugeno, noise          37
0-Sugeno, noise [95]     (1)

Table 6.1: Number of iterations required by the Levenberg-Marquardt method.
Table 6.4 gives the parameters of the SQRT model, and Table 6.5 the consequent parameters of the 0-Sugeno model. Note that the redundancy in gains is removed by fixing the gain of the dynamic part; thus parameter $b_1^{*}$ is not independently estimated. In [95], the problem of redundancy was solved by fixing the gain of the static part on an interval $u \in [0.2, 0.4]$ to 1.5; thus only the bias for the interval was identified (0.2402). The 'non-identified' parameters are indicated by the parentheses in the tables. However, as the steady-state gain of the transfer function of the true system (6.74)-(6.75) is one, the parameters are directly comparable. Table 6.4 shows that the parameters were correctly estimated up to two digits in the case where the correct form of the nonlinearity was known. Table 6.5 indicates that the difference between the consequent parameters estimated here and the knot parameters estimated in [95] was small; the largest difference appeared in the first parameter ($|\Delta w_1| < 0.08$), for which region the training set contained only a few data points.

The performance of the identified 0-Sugeno model on test set data (previously unseen by the model) is illustrated in Fig. 6.10. The upper part of the figure shows the steps in the system input $u$, and the filtered intermediate variable $z$. Note that the steady-state gain of the linear dynamic filter is one. The lower part of Fig. 6.10 shows the output of the true system $y$ and the prediction by the model $\hat{y}$, which is obtained by putting the intermediate variable $z$ through the static (nonlinear) part. The fit between the desired and obtained signals is close, although some deviation can be seen at lower values of $y$. This mismatch is due to the noise in the few data samples from that operating area.

[Figure 6.10: two panels titled 'Test data', horizontal axes 'time (samples)' from 0 to 200.]

Figure 6.10: Prediction by the 0-Sugeno model identified on noisy data. The intermediate signal $z$ (upper plot, dashed line) is obtained by filtering the input signal $u$ (upper plot, solid line). The model output (lower plot, dashed line) is computed by putting the intermediate signal through the nonlinear static function. The true output is shown by a solid line.

RMSE                      training data   test data
true model, no noise      0               0
SQRT, no noise            0.0000          0.0000
SQRT, noise               0.0488          0.0014
0-Sugeno, no noise        0.0297          0.0315
0-Sugeno, noise           0.0525          0.0126
0-Sugeno, noise [95]      0.0554          0.0111

Table 6.2: Root-mean-squared error on training and test data.

Linear dynamics               b0       b1 (b1*)   a1       a2
true parameters, no noise     0.1044   0.0883     1.4138   0.6065
SQRT, no noise                0.1044   (0.0883)   1.4138   0.6065
SQRT, noise                   0.1046   (0.0902)   1.4104   0.6052
0-Sugeno, no noise            0.1058   (0.0884)   1.3983   0.5925
0-Sugeno, noise               0.0997   (0.0991)   1.3926   0.5914
0-Sugeno, noise [95]          0.0980   0.0984     1.4026   0.5990

Table 6.3: Estimated parameters of the linear dynamic block.

SQRT parameters               c1       c2
true parameters, no noise     0.1      0.9
SQRT, no noise                0.1000   0.9000
SQRT, noise                   0.0992   0.8999

Table 6.4: Estimated parameters of the static SQRT block.

0-Sugeno parameters       w1       w2         w3         w4       w5
0-Sugeno, no noise        0.0545   0.5743     0.8253     0.9626   1.0021
0-Sugeno, noise           0.0397   0.5770     0.8252     0.9608   1.0030
0-Sugeno, noise [95]      0.0221   (0.5402)   (0.8402)   0.9607   0.9852

Table 6.5: Estimated consequent parameters of the static 0-Sugeno block.

Results

In this example we found that the correct parameter values were estimated using the Wiener structure and the Levenberg-Marquardt method in parameter estimation. Both noisy and noiseless cases were examined. As expected, the performance of the approach was similar to that in [95], when comparable. However, the approach suggested here is not restricted to any particular form of the static mapping. As illustrated, the suggested identification procedure does not restrict the type of the nonlinear static model, as long as the gradients can be computed (or approximated). In the example, it was shown how a grey-box SQRT model with a structure justified by physical background could also be used, as well as a fuzzy black-box model. In the next example, we will consider the identification of a more complicated MISO system.
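The role of the fixed parameter, and the pole and zero locations quoted in the analysis, can be reproduced with a short calculation. The sketch below assumes that the denominator of the linear block is written as $1 - a_1 q^{-1} + a_2 q^{-2}$, with $a_1$ and $a_2$ taken as the magnitudes listed in Table 6.3, so that it matches the true system $1 - 1.4138 q^{-1} + 0.6065 q^{-2}$; under a different sign convention the formulas change accordingly. The function names are hypothetical.

```python
import cmath


def fixed_b1(b0, a1, a2, gain=1.0):
    """Second numerator coefficient implied by a fixed steady-state gain:
    (b0 + b1) / (1 - a1 + a2) = gain."""
    return gain * (1.0 - a1 + a2) - b0


def poles(a1, a2):
    """Roots of q^2 - a1 q + a2 = 0."""
    root = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (a1 + root) / 2.0, (a1 - root) / 2.0


def zero(b0, b1):
    """Root of b0 q + b1 = 0."""
    return -b1 / b0


# 'SQRT, noise' row of Table 6.3: only b0, a1 and a2 are estimated.
b1_sqrt_noise = fixed_b1(0.1046, 1.4104, 0.6052)

# True system: poles and zero quoted in the analysis.
true_poles = poles(1.4138, 0.6065)
true_zero = zero(0.1044, 0.0883)
```

With the tabulated estimates, the unit-gain condition reproduces the parenthesized entries of Table 6.3, which illustrates why those entries are not independently estimated.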
6.4.2 Binary distillation column: identification of Hammerstein model under constraints
In a second example, identification of a binary distillation column was studied. Hammerstein modeling of this process was considered in [17], see also [37]. In what follows, special emphasis is placed on the role of process identification under constraints.

Process and data
Distillation is a complex chemical operation for separation of components of liquid mixtures and purification of products. It is widely used in the petroleum, chemical and pharmaceutical industries. In a typical distillation column, the feed enters near the center of the column and flows down. Vapors that are released by heating are condensed and can be removed as overhead product or distillate. Any liquid that is returned to the column is called reflux. The reflux flows down the column and joins the feed stream. The reflux rate has a major influence on the separation process. Too much reflux makes the product excessively pure, but wastes energy because more reflux liquid has to be vaporized, while too little reflux causes an impure product.

A simple model for a binary distillation column was given in [90] (pp. 70-74). The model is described by a set of differential equations. The composition dynamics at the bottom, 1st tray, feed tray, top tray and condenser are given by

$$ \frac{dx_b}{dt} = \frac{1}{M_b}\left(L_1 x_1 - B x_b - V y_b\right) \tag{6.86} $$

$$ \frac{dx_1}{dt} = \frac{1}{M_1}\left(V\left(y_b - y_1\right) + L_2 x_2 - L_1 x_1\right) \tag{6.87} $$

$$ \frac{dx_{n_f}}{dt} = \frac{1}{M_{n_f}}\left(V\left(y_{n_f-1} - y_{n_f}\right) + L_{n_f+1} x_{n_f+1} - L_{n_f} x_{n_f} + F z_f\right) \tag{6.88} $$

$$ \frac{dx_N}{dt} = \frac{1}{M_N}\left(V\left(y_{N-1} - y_N\right) + R x_d - L_N x_N\right) \tag{6.89} $$

$$ \frac{dx_d}{dt} = \frac{1}{M_d}\left(V y_N - R x_d - D x_d\right) \tag{6.90} $$

respectively. At other trays, the composition dynamics are given by

$$ \frac{dx_n}{dt} = \frac{1}{M_n}\left(V\left(y_{n-1} - y_n\right) + L_{n+1} x_{n+1} - L_n x_n\right) \tag{6.91} $$

The vapor composition at trays $n = 1, 2, \ldots, N$ and at the bottom are computed using a constant relative volatility:

$$ y_n = \frac{\alpha x_n}{1 + (\alpha - 1)\, x_n} \tag{6.92} $$

$$ y_b = \frac{\alpha x_b}{1 + (\alpha - 1)\, x_b} \tag{6.93} $$

The relations between the changes in flows are assumed to be immediate:

$$ B = F - V + R \tag{6.94} $$

$$ L_n = \begin{cases} R + F & \text{if } n \le n_f \\ R & \text{if } n > n_f \end{cases} \tag{6.95} $$

$$ D = V - R \tag{6.96} $$

where $B$, $D$, and $L_n$ are flows of the bottom, distillate and liquid at tray $n$, respectively. The steady-state operating parameters of the distillation column model are given in Table 6.6. The model is distinguished by an open-book-like steady-state nonlinearity between reflux and top composition and a strong variation of the apparent time constant with change in reflux.

Parameter                           Value
reflux R                            1.477
vapour boilup V                     1.977
feed flow F                         1
feed composition z_f                0.5
number of trays N                   25
feed tray n_f                       12
relative volatility α               2
holdups M_b = M_n = M_d             0.5
bottom composition x_b              0.005
top composition x_d                 0.995

Table 6.6: Steady-state operating parameters of the binary distillation column model.

Using the model, a data set of 900 data patterns was generated by varying the values of reflux flow $R$ and vapour boilup $V$ (PRS with a maximum amplitude of 5%, sampling time 10 min). The top composition $x_d$ was observed and a model for it was to be identified.

Model structure and parameter estimation

In [17], a Hammerstein model for the process was considered: $u_1 \leftarrow R$, $u_2 \leftarrow V$, $y \leftarrow x_d$, with first order dynamics, $n_B = 0$, $n_A = 1$, $d = 1$. The same setting for the dynamic part was used here.
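The flow relations of the column model ($B = F - V + R$, $D = V - R$, $L_n = R + F$ at and below the feed tray and $L_n = R$ above it) and the constant relative volatility relation can be checked against the operating point $R = 1.477$, $V = 1.977$, $F = 1$, $z_f = 0.5$, $N = 25$, $n_f = 12$, $\alpha = 2$, $x_b = 0.005$, $x_d = 0.995$. The sketch below is a consistency check only, not a simulation of the column; the function names are hypothetical.

```python
def column_flows(R, V, F, n_trays, n_feed):
    """Bottom flow, distillate flow and liquid flow on each tray."""
    B = F - V + R
    D = V - R
    L = [R + F if n <= n_feed else R for n in range(1, n_trays + 1)]
    return B, D, L


def vapour_composition(x, alpha):
    """Vapour composition in equilibrium with liquid composition x."""
    return alpha * x / (1.0 + (alpha - 1.0) * x)


B, D, L = column_flows(R=1.477, V=1.977, F=1.0, n_trays=25, n_feed=12)

# Steady-state balances: total mass F = B + D and component
# F * z_f = D * x_d + B * x_b at the tabulated operating point.
total_residual = 1.0 - (B + D)
component_residual = 1.0 * 0.5 - (D * 0.995 + B * 0.005)
```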
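The steady state mapping was identified using a sigmoid neural network (SNN) structure. The output of a one-hidden-layer SNN with $H$ hidden nodes (see Algorithm 17) is given by

$$ \hat{y} = \sum_{h=1}^{H} \alpha_h\, g_h(u, \beta_h) + \alpha_{H+1} \tag{6.97} $$

$$ g_h(u, \beta_h) = \frac{1}{1 + \exp\left(-\sum_{i=1}^{I} \beta_{h,i} u_i - \beta_{h,I+1}\right)} \tag{6.98} $$

where $u$ is the $I$ dimensional input vector. The parameters of the network are contained in an $H + 1$ dimensional vector $\alpha$ and an $H \times (I + 1)$ dimensional matrix $\beta$. The gradients with respect to the inputs and the parameters are given by

$$ \frac{\partial \hat{y}}{\partial u_i} = \sum_{h=1}^{H} \alpha_h\, g_h(u, \beta_h)\left[1 - g_h(u, \beta_h)\right]\beta_{h,i} \tag{6.99} $$

$$ \frac{\partial \hat{y}}{\partial \alpha_h} = g_h(u, \beta_h), \qquad \frac{\partial \hat{y}}{\partial \alpha_{H+1}} = 1 \tag{6.100} $$

$$ \frac{\partial \hat{y}}{\partial \beta_{h,i}} = \alpha_h\, g_h(u, \beta_h)\left[1 - g_h(u, \beta_h)\right] u_i \tag{6.101} $$

$$ \frac{\partial \hat{y}}{\partial \beta_{h,I+1}} = \alpha_h\, g_h(u, \beta_h)\left[1 - g_h(u, \beta_h)\right] \tag{6.102} $$

where $h = 1, 2, \ldots, H$ and $i = 1, 2, \ldots, I$. In the distillation column example, $H = 6$ was used.

In the parameter estimation, optimization under constraints was considered. Constraints were posed on the gain of the static mapping so that

$$ \frac{\partial \hat{y}}{\partial u_1} > 0 \tag{6.103} $$

$$ \frac{\partial \hat{y}}{\partial u_2} < 0 \tag{6.104} $$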
was required. The constraints were evaluated at 625 points, forming a grid with regular intervals on the input space spanned by $R$ and $V$: $u_1 \in \{0.95u_1^{ss}, \ldots, 1.05u_1^{ss}\}$, $u_2 \in \{0.95u_2^{ss}, \ldots, 1.05u_2^{ss}\}$, $u_1^{ss} = 1.477$, $u_2^{ss} = 1.977$. This results in 1250 constraint evaluations at each iteration. The sum of squared errors on the training set was then to be minimized under these constraints. The parameters were estimated using the Lagrange multipliers approach.

Analysis

The training data and the prediction of the identified model on training data are illustrated in Fig. 6.11. The RMSE on training data was 0.6731. For reference purposes, the parameters were also estimated using the Levenberg-Marquardt method (no constraints). This resulted in an RMSE of 0.2048 on training data. Hence, a more accurate description of training data points was obtained using the Levenberg-Marquardt method. However, the examination of the static mapping shows a significant problem with the unconstrained model. Fig. 6.12 shows the mappings obtained in the constrained and unconstrained cases. In the unconstrained case, the static mapping is non-monotonic. This is due to the small amount of data and the mismatch in the structure of the plant and the model. The constrained case corresponds better to the a priori knowledge of the process behavior (monotonic increasing with respect to $R$, monotonic decreasing with respect to $V$).

Compared with [17], visual inspection of model predictions reveals that more accurate descriptions of the process were identified with the approach suggested here. This can be attributed mainly to the more flexible structure used for the static part (a power series was used in [17]). As pointed out in [17], however, a logarithmic transformation of the output measurement would provide a more reasonable resolution for real applications.
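The gain constraints on the static mapping can be evaluated as sketched below for a one-hidden-layer sigmoid network. The sketch assumes the standard logistic activation with the bias stored as the last entry of each row of `beta`, a 25 by 25 grid (625 points) covering ±5% around the steady-state inputs 1.477 and 1.977, and constraints written in the form $q \le 0$, so the strict inequalities are relaxed to non-strict ones. The small two-node network used for the check is arbitrary and is not an identified model.

```python
import math


def snn(u, alpha, beta):
    """Network output and hidden activations.
    alpha: H+1 output weights (last one is the output bias);
    beta: H rows of I+1 hidden weights (last entry is the bias)."""
    g = [1.0 / (1.0 + math.exp(-(sum(b * x for b, x in zip(row, u)) + row[-1])))
         for row in beta]
    y = sum(a * gh for a, gh in zip(alpha, g)) + alpha[-1]
    return y, g


def snn_input_gradient(u, alpha, beta):
    """Derivative of the output with respect to each input."""
    _, g = snn(u, alpha, beta)
    return [sum(a * gh * (1.0 - gh) * row[i] for a, gh, row in zip(alpha, g, beta))
            for i in range(len(u))]


def gain_constraints(alpha, beta, u_ss=(1.477, 1.977), n=25):
    """Constraint values q <= 0 on a regular n x n grid of inputs."""
    q = []
    for j in range(n):
        for k in range(n):
            u = [u_ss[0] * (0.95 + 0.10 * j / (n - 1)),
                 u_ss[1] * (0.95 + 0.10 * k / (n - 1))]
            d1, d2 = snn_input_gradient(u, alpha, beta)
            q.append(-d1)  # dy/du1 should be positive
            q.append(d2)   # dy/du2 should be negative
    return q


# Arbitrary small network (H = 2, I = 2), used only to exercise the code.
alpha = [0.8, -0.3, 0.1]
beta = [[1.0, -2.0, 0.5], [-0.7, 0.4, 0.2]]
q_values = gain_constraints(alpha, beta)
```

Each grid point contributes two constraint values, which is where the count of 1250 evaluations per iteration comes from.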
[Figure 6.11: two panels, horizontal axes 'time (samples)' from 0 to 900.]

Figure 6.11: Prediction by the Hammerstein model identified under constraints on static gains. The upper plot shows the model inputs; the lower plot shows the plant response (solid line) and model responses: the intermediate variable (dotted line) and the prediction by the model (dash-dot line).
[Figure 6.12: surface plot of the static mapping over the inputs R and V.]

Figure 6.12: Static mapping in the constrained (solid lines) and unconstrained (dashed lines) cases.

Results

In this example, a Hammerstein model for a MISO process was identified. Parameters were estimated under constraints, where constraints were posed on the static mapping. The suggested approach makes it possible to pose constraints directly based on the a priori knowledge of the steady-state behavior. Typically, information such as minimum and maximum bounds of the plant output, knowledge of the sign or bounds of the gains, fixed equilibrium points, etc., is available. With linear dynamics, it is simple to pose constraints also on the dynamical part, such as bounds on the location of poles and zeros. This applies both for the Hammerstein and Wiener approaches. For purposes such as process control, the constrained model can clearly be expected to give better performance.

6.4.3 Two-tank system: Wiener modeling under constraints
As a final example, let us illustrate the identification of a two-tank process under constraints, using a Wiener structure.

Process

Consider a two-tank system [64], see Fig. 6.13. Mass balance considerations lead to the following nonlinear model:

$$ \frac{dY_1(t)}{dt} = \ldots \tag{6.105} $$

$$ \frac{dY_2(t)}{dt} = \ldots \tag{6.106} $$

[right-hand sides of (6.105) and (6.106) illegible in the source]

where $Y_1$ and $Y_2$ are the levels, $A_1$ and $A_2$ are the cross-sectional areas, and $k_1$ and $k_2$ are the coefficients involved in the modeling of the restrictions of the two tanks, respectively. The following values were adopted: $A_1 = A_2 = 1$, $k_1 = 1$, $k_2 = 0.9$.

Figure 6.13: A two-tank system.
Experiment design
The system was simulated using an input consisting of a pseudo random sequence. The output measurement was corrupted with a normally distributed random sequence with a variance equal to 0.04. From the simulations, a set of 398 measurements describing the behavior of the system were sampled using T = 1.
Model structure

A SISO, $I = 1$, Wiener model was constructed from the input flow, $Q(t)$, to the level of the second tank, $Y_2(t)$. Second order linear dynamics, $N = M = 2$, with a delay of one sample, $d = 1$, were considered. A sigmoid neural network with six hidden nodes, $H = 6$, was used to model the nonlinear static behavior of the system.

Parameter estimation (under constraints)
Anumberof constraints were considered for the static part. Theseconstraints wereevaluatedin C = 56 points Qc= {0.0, 0.02, 0.04, ..., 1.1}, c = 1, 2, ..., C. Constraints on the output: Jc(0) = f(Q~,0)  Ym~;Ymax 1.203 Jc+c (~) = Ymin
f
(6.107)
(6.10s)
(Q~, ~) ; Ymin =
Constraints on the static gain:
J2c+ (0)=of(Q,o)
Km~; Km~x = 2.5
Of(Q~,~). J3c+c (0)
 gmin
Kmin 
OQ~ ’
0
(6.109)
(6.110)
Fixed point in the origin: f(0, 0) J4c+1(~) = f (0, J4c+2(~) = f (0, ~)
(6.111) (6.112)
In addition, the poles pl and P2 were restricted to belongto the circle centered at the origin with radius ps: J4c+a (O) = IPl (0)1 Ps;Ps = 0.95
(6.113)
Jac+a (0) = IP2 (0)1 P~; P~ =
(6.114)
Figure 6.14: Performance on training data. Upper plot shows the input flow Q (t). Lower plot shows the level of the second tank, Y2 (t). Dots indicate the points contained in the training set. Solid lines show the corresponding predictions given by the constrained and unconstrained Wiener models.
Hence, a total number of 228 constraints were posed to the model.
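
The constraint set can be sketched in a few lines of Python to check the count of $4C + 4 = 228$. This is only an illustration: `static_map` is a hypothetical stand-in for the trained sigmoid network $f(Q, \theta)$, the lower output bound `Y_MIN = 0` is an assumption, and the convention that the model is feasible when every $J_i \le 0$ is also assumed here.

```python
import numpy as np

# Grid of constraint evaluation points: Q_c = 0.0, 0.02, ..., 1.1 (C = 56).
Q_grid = np.linspace(0.0, 1.1, 56)

Y_MAX, K_MAX, K_MIN, P_S = 1.203, 2.5, 0.0, 0.95
Y_MIN = 0.0  # assumption: lower output bound taken as the zero level


def static_map(q):
    """Hypothetical static mapping standing in for the network f(Q, theta)."""
    return 1.2 * np.tanh(q)


def static_gain(q, h=1e-6):
    """Central-difference approximation of df/dQ."""
    return (static_map(q + h) - static_map(q - h)) / (2.0 * h)


def constraint_vector(poles):
    """Stack all constraints J_i (assumed feasible when every J_i <= 0)."""
    f = static_map(Q_grid)
    g = static_gain(Q_grid)
    f0 = static_map(0.0)
    return np.concatenate([
        f - Y_MAX,              # output upper bound, C entries
        Y_MIN - f,              # output lower bound, C entries
        g - K_MAX,              # gain upper bound, C entries
        K_MIN - g,              # gain lower bound, C entries
        [f0, -f0],              # fixed point f(0) = 0 as two inequalities
        np.abs(poles) - P_S,    # both poles inside the circle of radius p_s
    ])


J = constraint_vector(np.array([0.6476 + 0.143j, 0.6476 - 0.143j]))
print(len(J), bool(np.all(J <= 1e-12)))
```

Writing the equality $f(0, \theta) = 0$ as the pair $f \le 0$ and $-f \le 0$ keeps all constraints in a single inequality form, which is why the fixed point contributes two entries.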
Using the training data, the parameters were estimated under the constraints given by Eqs. (6.107)-(6.114). For comparison, the same data was used for training a Wiener model with the same structure using the Levenberg-Marquardt method (without constraints). Figure 6.14 shows the performance of the Wiener models after training (8000 iterations). The results indicate that the information contained in the training set was well captured in both cases.

Figure 6.15 shows the static mappings provided by the two models. In both cases, the static mapping is accurate on the operating area for which measurements were provided in the training set. However, extrapolation outside the operating area contained in the training set gives poor results with the unconstrained model. It is simple to include additional a priori information using the constraints. Figure 6.15 shows that the constraints posed on the output of the model, on the gain, and on the fixed point are satisfied by the Wiener model identified under constraints. At the same time, the prediction error on measured data is minimized.

Figure 6.15: Nonlinear static mappings identified by the Wiener models (horizontal axis: input). Solid line shows the response for the static part of the Wiener model in the unconstrained case. Dotted line shows the behavior of the model identified under constraints. The circle indicates the equality constraint.

For the dynamic part, the following linear model was identified:

$$\hat{z}(k) = 0.14\,Q(k) + 1.30\,\hat{z}(k-1) - 0.44\,\hat{z}(k-2) \tag{6.115}$$

with poles $p_1 = 0.6476 + i0.143$, $p_2 = 0.6476 - i0.143$, $|p_1| = |p_2| = 0.6632$. Thus the constraints on the dynamic part were fulfilled, too.

Figure 6.16 depicts the performance of the Wiener models on test set data. Note that in the test data the input varies in a wider range than in the training data. The performance of the unconstrained Wiener model is poor, whereas for the constrained Wiener model the performance is much better. All the prior information was captured by the constrained Wiener model. Note that the static output of the Wiener model was constrained to be always less than $Y_{\max}$ (height of the second tank), which was not taken into account in the simulation of the plant, Eqs. (6.105)-(6.106), as shown in Fig. 6.16.
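
The pole constraint can be checked numerically from the denominator of (6.115). The sketch below assumes the coefficients as printed (1.30 and 0.44); because these are rounded, the computed roots (about $0.65 \pm i0.132$) differ slightly from the quoted poles, while the modulus agrees to roughly three decimals.

```python
import numpy as np

# Denominator of (6.115): 1 - 1.30 q^-1 + 0.44 q^-2, i.e. z^2 - 1.30 z + 0.44.
poles = np.roots([1.0, -1.30, 0.44])
radii = np.abs(poles)

P_S = 0.95  # radius of the admissible circle, from (6.113)
print(poles, radii, bool(np.all(radii < P_S)))
```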
Figure 6.16: Performance on test data (horizontal axis: time). Upper plot shows the input flow Q(t). Middle and lower plots show the level of the second tank, Y2(t). Solid lines show the corresponding predictions given by the constrained Wiener model (middle plot) and unconstrained Wiener model (lower plot).

6.4.4 Conclusions

The application of Wiener and Hammerstein structures in the identification of industrial processes was considered. Structures and associated parameter estimation methods were proposed, which resulted in a nonlinear steady-state description of the process with dynamics identified as linear OE-type filters. In many cases, the dynamics of a nonlinear process can be approximated using linear transfer functions, and the system nonlinearities can be presented by a nonlinear gain only. This provides many benefits in the form of robustness in dealing with the bias-variance dilemma, availability of well-developed tools for handling both linear dynamic and nonlinear static systems, and increased transparency of the plant description. In this section, examples of identifying a steady-state static plant model were presented, thus emphasizing the transparency aspects.

In industrial practice, it is common that the steady-state behavior of a process is much better known than its dynamic characteristics. With the approach considered in the examples, it is simple to use this knowledge in the initialization and validation of a black-box model. If a reliable steady-state model is available, it can be used as a white-box or grey-box static mapping in the Wiener or Hammerstein structure. Furthermore, there were few restrictions posed on the form of the static mapping; no specific properties of a certain paradigm were used. This enables a nonlinear structure to be chosen depending on the application requirements (good transparency: fuzzy systems; high accuracy: neural networks; efficiency and speed: power series; expectable interpolation: piecewise linear systems; etc.). These properties are important from the practical point of view of process modeling.

In addition, the identification of OE-type linear dynamics was considered. This type of model is more robust towards noisy measurements, and particularly suitable for long-range simulation purposes.
Part II Control

Chapter 7 Predictive Control

7.1 Introduction to model-based control
Models are a basic tool in modern process control. Explicit models are required by many of the modern control methods, or models are required during control design. In the control of nonlinear processes the role of models is even more emphasized. In the model-based approaches, the controller can be seen as an algorithm operating on a model of the process (subject to disturbances), and optimized in order to reach given control design objectives.

In modeling, the choice of both the model structure and the associated parameter estimation techniques is constrained by the function approximation and interpolation capabilities (e.g., linear approximations, smoothness of nonlinearities, a priori information). From the control design point of view, the need for convenient ways to characterize a desired closed-loop performance gives additional restrictions (e.g., existence of derivatives and analytic solutions). In addition, many other properties may be of importance (handling of uncertainties, nonideal sampling, data fusion, tuning, transparency, etc.). Clearly, the choice of a modeling method is of essential importance, and therefore a large part of this book has been devoted to explaining the various approaches.

In some cases, the behavior of the process operator is modeled (common, e.g., in fuzzy control), or a model of a control-oriented cost function is directly desired (e.g., in some passivity-based control approaches). Usually, however, the characterization of the input-output behavior of the process (or the closed-loop control relevant characteristics) is the target of modeling (on/off-line, in open/closed loop, etc.).

The theory of modeling and control of linear systems is well developed. In the control of nonlinear systems, a common approach has been to consider a nonlinear model, to linearize it around an operating point, and to design a controller based on the linear description. This is simple and efficient, suits most regulation problems, and can be seen as gain scheduling or indirect adaptive control. In particular, linear approaches are difficult to beat in the analysis of dynamical systems. For servo problems, fully nonlinear approaches have been considered, based on the properties of known nonlinearities or on the exploitation of raw computing power (e.g., nonlinear predictive control).

Predictive control is a model-based control approach that explicitly uses a process model in order to determine the control actions. In this chapter, the predictive control approach will be discussed for the case of linear SISO models.
7.2 The basic idea
Predictive controllers are based on a guess, a prediction, of the future behavior of the process, forecast using a model of the process. There exists a multitude of predictive control schemes, which all have four major features in common:

1. A model of the process to be controlled. The model is used to predict the process output, with given inputs, over the prediction horizon.
2. A criterion function (usually quadratic) that is minimized in order to obtain the optimal controller output sequence over the prediction horizon.
3. A reference trajectory for the process output, i.e. a sequence of desired future outputs.
4. A minimization procedure.

The basic concept of predictive control is simple. A predictive controller calculates a future controller output sequence such that the predicted output of the process is close to the desired process output. Predictive controllers use the receding horizon principle: only the first element of the controller output sequence is applied to control the process, and the whole procedure is repeated at the next sample.

Any model that describes the relationship between the input and the output of the process can be used, including disturbance models, nonlinear models, or constrained models. The approach can also be extended to multivariable control. Calculation of the controller output sequence is an optimization (minimization) problem. In general, solving it requires an iterative procedure.

Although many types of models can be considered, a major problem in deriving predictive controllers for nonlinear process models is the nonlinear optimization problem that must be solved at every sample. The way this problem is solved depends on the type of nonlinearity of the process model. However, if:

• the criterion is quadratic,
• the model is linear, and
• there are no constraints,

then an analytical solution is available. The resulting controller is linear, and time-invariant if the model is time-invariant. This appealing case will be considered in the following sections.

Example 38 (Car driver) Consider the process of driving a car. This process can be likened to a SISO system where the input is the variation of the position of the steering wheel relative to a given fixed point on the dashboard. The output is the position of the car with respect to the direction of the road ahead. At each sampling instant the driver of the car calculates the variation of the control variable and implements it, based on his observations of the road and the traffic ahead (seeing further than the end of one's nose) and his prediction of the behavior of the car. This procedure is repeated at each sampling period, which depends on the driver.
7.3 Linear quadratic predictive control
In this section, the state-space formulation is adopted (see, e.g., [69] [83] [96]). Remember that a transfer function model can always be converted into a state-space form; in fact, for each transfer function, there is an infinite number of state-space representations (see Appendix A for a brief recap on state-space models). First, the state-space model and the principle of certainty equivalence control are introduced. The i-step-ahead predictors for the model in state-space form will be derived. A simple quadratic cost function is then formulated and the optimal solution minimizing the cost function is derived. Finally, the issues of control horizon, integral control action, state estimation and closed-loop behavior are briefly discussed.
7.3.1 Plant and model
Let a SISO system (plant, process) be described by a state-space model

$$x(k+1) = A x(k) + B u(k) \tag{7.1}$$

$$y(k) = C x(k) \tag{7.2}$$

where

x is the state vector (n × 1),
u is the system input (controller output) (1 × 1),
y is the system output (measured) (1 × 1),
A is the state transition matrix (n × n),
B is the input transition vector (n × 1),
C is the state observer vector (1 × n).

Let us assume that a model (approximation) for the system is known and given by $\hat{A}$, $\hat{B}$ and $\hat{C}$, and that the states x and output y are measurable. In certainty equivalence control, the uncertainty in the parameters is not considered; the estimated parameters are used as if they were the true ones ($A \leftarrow \hat{A}$, $B \leftarrow \hat{B}$, $C \leftarrow \hat{C}$). Thus, in what follows, we allow ourselves to simplify the notation by dropping the 'hats'.

The target is to find the control input $u(k)$ so that the desired control objectives are fulfilled. The objectives concern the future behavior of the process, from the next-to-current state up to the prediction horizon, $H_p$. The prediction horizon is generally chosen to be at least equal to the equivalent time delay (the maximum time delay augmented by the number of unstable zeros). Let the cost function (to be minimized) be given by

$$J = \sum_{i=1}^{H_p} \left[ (w(k+i) - \hat{y}(k+i))^2 + r\,u(k+i-1)^2 \right] \tag{7.3}$$

where $w(k+i)$ is the desired system output at instant $k+i$, and $r$ is a scalar which can be used for balancing the relative importance of the two squared terms in (7.3). The minimization

$$\{u(k), \ldots, u(k+H_p-1)\} = \arg\min_{u(k), \ldots, u(k+H_p-1)} J \tag{7.4}$$

gives a sequence of future controls $\{u(k), u(k+1), \ldots, u(k+H_p-1)\}$. Only the first value of the sequence, $u(k)$, is applied to control the system; at the next control instant the optimization is repeated (receding horizon control).
7.3.2 i-step-ahead predictions
Let us consider the i-step-ahead predictions. At instant k, the measured state vector $x(k)$ is available. For future values of x, the model has to be used. The prediction for $y(k+1)$, based on information at k, is given by

$$\hat{y}(k+1) = C[A x(k) + B u(k)] \tag{7.5}$$

For $y(k+2)$ we have

$$\hat{y}(k+2) = C[A \hat{x}(k+1) + B u(k+1)] \tag{7.6}$$

where the estimate for $x(k+1)$ can be obtained using the model, $\hat{x}(k+1) = A x(k) + B u(k)$. Substituting this gives

$$\hat{y}(k+2) = C[A[A x(k) + B u(k)] + B u(k+1)] \tag{7.7}$$

$$= CA^2 x(k) + CAB\,u(k) + CB\,u(k+1) \tag{7.8}$$

In a similar way we have that

$$\hat{y}(k+3) = CA^3 x(k) + CA^2 B\,u(k) + CAB\,u(k+1) + CB\,u(k+2) \tag{7.9}$$

and, by induction, for the i-step-ahead prediction

$$\hat{y}(k+i) = CA^i x(k) + \sum_{j=1}^{i} CA^{i-j} B\,u(k+j-1) \tag{7.10}$$

Let us use a more compact matrix notation. Collect the predicted system outputs, the system inputs, and the desired future outputs at instant k into vectors of size ($H_p \times 1$):

$$\hat{\mathbf{y}}(k+1) = [\hat{y}(k+1), \ldots, \hat{y}(k+H_p)]^T \tag{7.11}$$

$$\mathbf{u}(k) = [u(k), \ldots, u(k+H_p-1)]^T \tag{7.12}$$

$$\mathbf{w}(k+1) = [w(k+1), \ldots, w(k+H_p)]^T \tag{7.13}$$

The future predictions can be calculated from

$$\hat{\mathbf{y}}(k+1) = K_{CA}\,x(k) + K_{CAB}\,\mathbf{u}(k) \tag{7.14}$$

where

$$K_{CA} = \begin{bmatrix} CA \\ \vdots \\ CA^{H_p} \end{bmatrix} \tag{7.15}$$

$$K_{CAB} = \begin{bmatrix} CB & \cdots & 0 \\ \vdots & \ddots & \vdots \\ CA^{H_p-1}B & \cdots & CB \end{bmatrix} \tag{7.16}$$
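
As a minimal sketch of (7.14)-(7.16), the prediction matrices can be assembled with NumPy and checked against a step-by-step simulation of (7.1)-(7.2). The function name `prediction_matrices` is illustrative, and the numerical model is borrowed from Example 39 purely as test data.

```python
import numpy as np


def prediction_matrices(A, B, C, Hp):
    """Return K_CA (Hp x n) and K_CAB (Hp x Hp) as in (7.15)-(7.16)."""
    n = A.shape[0]
    K_CA = np.zeros((Hp, n))
    K_CAB = np.zeros((Hp, Hp))
    CAi = C.copy()                        # C A^i at the top of iteration i
    for i in range(Hp):
        markov = (CAi @ B).item()         # C A^i B fills the i-th subdiagonal
        for j in range(Hp - i):
            K_CAB[i + j, j] = markov
        CAi = CAi @ A                     # now C A^(i+1)
        K_CA[i, :] = CAi.ravel()          # row i holds C A^(i+1)
    return K_CA, K_CAB


A = np.array([[0.97, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
B = np.array([[1.0], [0.0], [0.0]])
C = np.array([[0.0, 0.0, 0.1989]])
Hp = 5
K_CA, K_CAB = prediction_matrices(A, B, C, Hp)

rng = np.random.default_rng(0)
x0 = rng.normal(size=(3, 1))
u = rng.normal(size=(Hp, 1))
y_matrix = K_CA @ x0 + K_CAB @ u          # all Hp predictions at once, (7.14)

# Reference: iterate x(k+1) = A x(k) + B u(k), y(k) = C x(k) step by step.
x, y_loop = x0.copy(), []
for i in range(Hp):
    x = A @ x + B * u[i, 0]
    y_loop.append((C @ x).item())
y_loop = np.array(y_loop).reshape(-1, 1)
print(np.allclose(y_matrix, y_loop))
```

The lower-triangular Toeplitz structure of `K_CAB` reflects causality: the input at instant $k+j-1$ cannot affect predictions earlier than $k+j$.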
7.3.3 Cost function
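The cost function (7.3) can be expressed in a vector form

$$J = (\mathbf{w}(k+1) - \hat{\mathbf{y}}(k+1))^T (\mathbf{w}(k+1) - \hat{\mathbf{y}}(k+1)) + \mathbf{u}(k)^T R\,\mathbf{u}(k) \tag{7.17}$$

where $R = rI$. The solution for $\mathbf{u}$ minimizing J is given by

$$\mathbf{u}(k) = [R + K_{CAB}^T K_{CAB}]^{-1} K_{CAB}^T (\mathbf{w}(k+1) - K_{CA}\,x(k)) \tag{7.18}$$

Proof. Let us simplify the notations by dropping out the sample indexes k related to time. Minimization can be done analytically by setting the derivative $\partial J / \partial \mathbf{u} = 0$. The derivative is given by

$$\frac{\partial J}{\partial \mathbf{u}} = 2\left[\frac{\partial (\mathbf{w} - \hat{\mathbf{y}})}{\partial \mathbf{u}}\right]^T (\mathbf{w} - \hat{\mathbf{y}}) + \left[\frac{\partial (R\mathbf{u})}{\partial \mathbf{u}}\right]^T \mathbf{u} + \left[\frac{\partial \mathbf{u}}{\partial \mathbf{u}}\right]^T R\,\mathbf{u} \tag{7.19}$$

For the partial derivatives we get

$$\frac{\partial (\mathbf{w} - \hat{\mathbf{y}})}{\partial \mathbf{u}} = -\frac{\partial \hat{\mathbf{y}}}{\partial \mathbf{u}} = -K_{CAB} \tag{7.20}$$

$$\frac{\partial (R\mathbf{u})}{\partial \mathbf{u}} = R; \quad \frac{\partial \mathbf{u}}{\partial \mathbf{u}} = I \tag{7.21}$$

Thus, since R is symmetric, the derivative (7.19) can be written as

$$\frac{\partial J}{\partial \mathbf{u}} = -2K_{CAB}^T (\mathbf{w} - \hat{\mathbf{y}}) + 2R^T \mathbf{u} \tag{7.22}$$

Setting the derivative to zero and substituting the vector of future predictions from (7.14) we have

$$K_{CAB}^T (\mathbf{w} - K_{CA}\,x - K_{CAB}\,\mathbf{u}) = R^T \mathbf{u} \tag{7.23}$$

Solving for $\mathbf{u}$ gives the optimal control sequence (7.18).

Let us introduce a gain matrix K:

$$K = [R + K_{CAB}^T K_{CAB}]^{-1} K_{CAB}^T \tag{7.24}$$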
Denote the first row of K by $K_1$. Since only the first element of the optimal sequence is applied to the process, the on-line control computations are reduced to

$$u(k) = K_1 (\mathbf{w}(k+1) - K_{CA}\,x(k)) \tag{7.25}$$

If the system parameters, A, B and C, are constant, the gain matrices $K_1$ and $K_{CA}$ can be computed beforehand.

7.3.4 Remarks
In many cases, it is useful to consider an additional parameter in the tuning of the predictive controller, the control horizon. The control horizon $H_c$ specifies the allowed number of changes in the control signal during optimization, i.e.

$$\Delta u(k+i) = 0 \quad \text{for } i \geq H_c \tag{7.26}$$

where $\Delta = 1 - q^{-1}$. A simple way to implement the control horizon is to modify the $K_{CAB}$ matrix. Let us decompose the matrix in two parts. The first part, $K^a_{CAB}$, contains the first $H_c - 1$ columns from the left of the $K_{CAB}$ matrix. The second part, vector $K^b_{CAB}$, sums row-wise the remaining elements of the $K_{CAB}$ matrix, i.e.

$$k^b_i = \sum_{j=H_c}^{H_p} k_{i,j} \tag{7.27}$$

where $k^b_i$ and $k_{i,j}$ are the elements (ith row and jth column) of the $K^b_{CAB}$ and $K_{CAB}$ matrices. The new $K_{CAB}$ matrix is then formed by

$$K_{CAB} = \begin{bmatrix} K^a_{CAB} & K^b_{CAB} \end{bmatrix} \tag{7.28}$$

In practice, it is useful to introduce also a minimum horizon, which specifies the beginning of the horizon to be used in the cost function, i.e. $J = \sum_{i=H_m}^{H_p} (\cdot)$ in (7.3). A simple implementation can be done by removing the first $H_m - 1$ rows from $K_{CA}$ and $K_{CAB}$ in (7.15) and (7.16), respectively.

Notice that there is no integral action present. Thus, in the case of modeling errors, a steady-state error may occur. A simple way to include an integral term in the controller is to use an augmented state-space model, with an additional state constructed of the integral of error, $x_I(k) = x_I(k-1) + y(k) - \hat{y}(k)$. This state then has a gain $k_I$ from the augmented state $x_I$ to the controller output u.
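
The control-horizon modification of (7.27)-(7.28) amounts to folding columns of $K_{CAB}$ together. The sketch below illustrates this; the helper name `apply_control_horizon` and the numerical impulse-response values are illustrative only. It checks that the reduced matrix reproduces the full prediction whenever the input is held constant from the $H_c$-th element onward.

```python
import numpy as np


def apply_control_horizon(K_CAB, Hc):
    """Keep the first Hc-1 columns and sum the remaining ones row-wise."""
    K_a = K_CAB[:, :Hc - 1]                              # first Hc-1 columns
    K_b = K_CAB[:, Hc - 1:].sum(axis=1, keepdims=True)   # row-wise sum of the rest
    return np.hstack([K_a, K_b])                         # Hp x Hc matrix


# Illustrative lower-triangular Toeplitz K_CAB built from values C A^i B.
g = np.array([0.0, 0.0, 0.1989, 0.1929, 0.1871])
Hp = len(g)
K_CAB = np.array([[g[i - j] if i >= j else 0.0 for j in range(Hp)]
                  for i in range(Hp)])

Hc = 2
K_red = apply_control_horizon(K_CAB, Hc)

u_red = np.array([[0.7], [-0.3]])                        # Hc free control moves
u_full = np.vstack([u_red[:Hc - 1],                      # free part
                    np.repeat(u_red[-1:], Hp - Hc + 1, axis=0)])  # held constant
print(K_red.shape, np.allclose(K_red @ u_red, K_CAB @ u_full))
```

With `Hc = 1` the first part is empty and the whole matrix collapses to a single column, so the optimization searches for one constant input over the horizon.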
In general, the states x are not directly measurable. When noise is not present, an observer is used for state "recovering". In the presence of noise, a Kalman filter can be used to estimate the states (see Chapter 3). Provided that the covariances of the input and output noises are available or can be estimated, a state estimate minimizing the variance of the state estimation error can then be constructed. The Kalman filter uses both the system model (A, B, C) and system input-output measurements u, y in order to provide an optimal state estimate.

The behavior of this dynamic system under the feedback, which is simply a function that maps the state space into the space of control variables, is analyzed in the next subsection.
7.3.5 Closed-loop behavior
In order to analyze the behavior of the closed-loop system, let us derive its characteristic function. Taking into account the control strategy (7.25), from the state-space model (7.1)-(7.2) we derive the relation between the output $y(k)$ and the desired system output $\mathbf{w}(k) = [w(k), \ldots, w(k+H_p-1)]^T$. Substituting (7.25) into (7.1), with $k \leftarrow k-1$, gives

$$x(k) = A x(k-1) + B K_1 (\mathbf{w}(k) - K_{CA}\,x(k-1)) \tag{7.29}$$

Reorganizing gives

$$x(k) = [I - q^{-1}(A - BK_1K_{CA})]^{-1} B K_1 \mathbf{w}(k) \tag{7.30}$$

Substituting into (7.2) gives the relation between $y(k)$ and $\mathbf{w}(k)$:

$$y(k) = C[I - q^{-1}(A - BK_1K_{CA})]^{-1} B K_1 \mathbf{w}(k) \tag{7.31}$$

and the characteristic polynomial

$$\det[I - q^{-1}(A - BK_1K_{CA})] \tag{7.32}$$
Example 39 (Characteristic polynomial) Let a process be described by the following transfer function

$$y(k) = \frac{0.1989\,q^{-3}}{1 - 0.9732\,q^{-1}}\,u(k) \tag{7.33}$$

(this example is discussed in more detail at the end of this chapter). The equivalent control canonical state-space presentation is given by

$$A = \begin{bmatrix} 0.97 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}; \quad B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}; \quad C = \begin{bmatrix} 0 & 0 & 0.1989 \end{bmatrix} \tag{7.34}$$

Let us design a predictive controller using $H_p = 5$ and $r = 1$. This results in a gain vector

$$K_1 = \begin{bmatrix} 0 & 0 & 0.1799 & 0.1623 & 0.1514 \end{bmatrix} \tag{7.35}$$

and

$$K_{CA} = \begin{bmatrix} 0 & 0.1989 & 0 \\ 0.1989 & 0 & 0 \\ 0.1929 & 0 & 0 \\ 0.1871 & 0 & 0 \\ 0.1815 & 0 & 0 \end{bmatrix} \tag{7.36}$$

The matrix $A - BK_1K_{CA}$ is given by

$$\begin{bmatrix} 0.8774 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \tag{7.37}$$

and the characteristic polynomial will be $1 - 0.8774\,q^{-1}$. For $r = 0.01$, which penalizes the control actions less, the characteristic polynomial will be $1 - 0.1692\,q^{-1}$, a much faster response.

Note that the control strategy (7.18) associated with the cost function (7.17) is linear with respect to the system input, output and the desired output. It can be easily expressed in the RST form:

$$R(q^{-1})\,u(k) = S(q^{-1})\,y(k) + T(q^{-1})\,w(k) \tag{7.38}$$

In the next section, the approach of generalized predictive control is considered, where a disturbance model is included in the plant description.
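
The numbers of Example 39 can be reproduced with a short NumPy sketch of (7.15), (7.16) and (7.24). The function name `lq_predictive_gain` is illustrative; the model uses the pole rounded to 0.97, as printed in (7.34).

```python
import numpy as np


def lq_predictive_gain(A, B, C, Hp, r):
    """Return K1 (first row of the gain matrix K in (7.24)) and K_CA from (7.15)."""
    K_CA = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(1, Hp + 1)])
    K_CAB = np.zeros((Hp, Hp))
    for i in range(Hp):
        for j in range(i + 1):
            K_CAB[i, j] = (C @ np.linalg.matrix_power(A, i - j) @ B).item()
    R = r * np.eye(Hp)
    K = np.linalg.solve(R + K_CAB.T @ K_CAB, K_CAB.T)
    return K[:1, :], K_CA


# Model (7.34); the pole is rounded to 0.97 as printed there.
A = np.array([[0.97, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
B = np.array([[1.0], [0.0], [0.0]])
C = np.array([[0.0, 0.0, 0.1989]])

results = {}
for r in (1.0, 0.01):
    K1, K_CA = lq_predictive_gain(A, B, C, Hp=5, r=r)
    A_cl = A - B @ K1 @ K_CA
    # For this model only the (0, 0) entry of A changes, so it is the
    # single nonzero closed-loop pole of det[I - q^-1 A_cl].
    results[r] = (K1.ravel(), A_cl[0, 0])
    print(f"r = {r}: K1 = {np.round(K1.ravel(), 4)}, closed-loop pole = {A_cl[0, 0]:.4f}")
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical design choice; the result is the same gain matrix K.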
7.4 Generalized predictive control
An appealing formulation of long-range predictive control, called generalized predictive control (GPC), was derived by Clarke and co-workers [13]. It represents a unification of many long-range predictive control algorithms (IDCOM [79], DMC [14]) and a computationally simple approach. In the GPC, an ARMAX/ARIMAX representation of the plant is used. In what follows, i-step-ahead predictors for the ARMAX/ARIMAX model in state-space form will be derived, a cost function formulated and the optimal solution minimizing the cost function derived. In the next section, a simulation example illustrates the performance and tuning of the GPC controller.
7.4.1 ARMAX/ARIMAX model
Recall the ARMAX and ARIMAX structures from Chapter 3. An ARMAX/ARIMAX model in the polynomial form is given by:

$$F(q^{-1})\,y(k) = B(q^{-1})\,v(k) + C(q^{-1})\,e(k) \tag{7.39}$$

where $f_j$, $b_j$ and $c_j$ are the coefficients of the polynomials $F(q^{-1})$, $B(q^{-1})$ and $C(q^{-1})$, $j = 1, 2, \ldots, n$. For notational convenience, without loss of generality, we assume that the polynomials are all of order n, that $F(q^{-1})$ and $C(q^{-1})$ are monic, and that $b_0 = 0$. Substituting $v(k) \leftarrow u(k)$ and $F(q^{-1}) \leftarrow A(q^{-1})$ in (7.39) gives the ARMAX model, and substituting $v(k) \leftarrow \Delta u(k)$ and $F(q^{-1}) \leftarrow \Delta A(q^{-1})$ gives the ARIMAX model structure. In what follows, we denote the controller output by $v(k)$. In the ARIMAX case, the final controller output to be applied to the plant will be $u(k) = u(k-1) + \Delta u(k)$.

The ARMAX/ARIMAX model can be represented in the state-space form¹ as

$$x(k+1) = A x(k) + B v(k) + G e(k) \tag{7.40}$$

$$y(k) = C x(k) + e(k) \tag{7.41}$$

¹ The relation between the state-space description and the input-output description is given by

$$\frac{B(q)}{F(q)} = C[qI - A]^{-1} B; \quad \frac{C(q)}{F(q)} = C[qI - A]^{-1} G + 1$$

and

$$F(q) = \det[qI - A]; \quad B(q) = C\,\mathrm{adj}[qI - A]\,B; \quad C(q) = C\,\mathrm{adj}[qI - A]\,G + \det[qI - A]$$

Note that the polynomials are given in terms of the forward shift operator q.
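where

$$A = \begin{bmatrix} -f_1 & 1 & 0 & \cdots & 0 \\ -f_2 & 0 & 1 & & 0 \\ \vdots & & & \ddots & \\ -f_{n-1} & 0 & 0 & & 1 \\ -f_n & 0 & 0 & \cdots & 0 \end{bmatrix} \tag{7.42}$$

$$B = \begin{bmatrix} b_1 & b_2 & \cdots & b_{n-1} & b_n \end{bmatrix}^T \tag{7.43}$$

$$G = \begin{bmatrix} c_1 - f_1 & c_2 - f_2 & \cdots & c_n - f_n \end{bmatrix}^T \tag{7.44}$$

$$C = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix} \tag{7.45}$$

If the coefficients of the polynomials $F(q^{-1})$ and $B(q^{-1})$ are unknown, they can be obtained through identification (see previous chapters). An estimate of $C(q^{-1})$ may also be identified. One can also consider estimating the matrices A, B and C (and G) directly from input-output data using subspace methods [4S11541.

7.4.2 i-step-ahead predictions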
The prediction is simple to derive. Let us consider a 1-step-ahead prediction

$$y(k+1) = C x(k+1) + e(k+1) \tag{7.46}$$

$$= C[A x(k) + B v(k) + G e(k)] + e(k+1) \tag{7.47}$$

$$= C(A - GC)\,x(k) + CB\,v(k) + CG\,y(k) + e(k+1) \tag{7.48}$$

where the last equality is obtained by substituting $e(k) = y(k) - C x(k)$ from (7.41), and the future noise is not known but assumed zero mean. The 1-step-ahead predictor² becomes

$$\hat{y}(k+1) = C(A - GC)\,x(k) + CB\,v(k) + CG\,y(k) \tag{7.49}$$

Similarly, for the 2-step-ahead prediction, we have

$$y(k+2) = C x(k+2) + e(k+2) \tag{7.50}$$

$$= C[A x(k+1) + B v(k+1) + G e(k+1)] + e(k+2) \tag{7.51}$$

$$= C[A[A x(k) + B v(k) + G e(k)] + B v(k+1)] + CG\,e(k+1) + e(k+2) \tag{7.52}$$

and the 2-step-ahead predictor³ becomes

$$\hat{y}(k+2) = CA[A - GC]\,x(k) + CAB\,v(k) + CB\,v(k+1) + CAG\,y(k) \tag{7.53}$$

By induction, we have the following formula for an i-step-ahead prediction

$$\hat{y}(k+i) = \sum_{j=1}^{i} CA^{i-j} B\,v(k+j-1) + CA^{i-1}[A - GC]\,x(k) + CA^{i-1} G\,y(k) \tag{7.54}$$

² The task is to find $\hat{y}(k+1)$ such that $E\{[y(k+1) - \hat{y}]^2\}$ is minimized:

$$E\{[y(k+1) - \hat{y}]^2\} = E\{[C(A - GC)x(k) + CBv(k) + CGy(k) + e(k+1) - \hat{y}]^2\}$$

$$= E\{[C(A - GC)x(k) + CBv(k) + CGy(k) - \hat{y}]^2 + 2[C(A - GC)x(k) + CBv(k) + CGy(k) - \hat{y}]\,e(k+1) + e^2(k+1)\}$$

$$= E\{[C(A - GC)x(k) + CBv(k) + CGy(k) - \hat{y}]^2\} + E\{e^2(k+1)\}$$

since $e(k+1)$ does not correlate with $x(k)$, $v(k)$, $y(k)$ or $\hat{y}$. The minimum is attained when the first term is zero, i.e. (7.49).

³ Proceeding in the same way as with the 1-step-ahead predictor, we have

$$E\{[y(k+2) - \hat{y}]^2\} = E\{[CA(A - GC)x(k) + CABv(k) + CBv(k+1) + CAGy(k) - \hat{y}]^2\} + E\{[CGe(k+1) + e(k+2)]^2\}$$

since $e(k+1)$ and $e(k+2)$ do not correlate with $x(k)$, $v(k)$, $y(k)$, $\hat{y}$ or with each other. The variance is minimized when (7.53) holds.
Let us use a more compact matrix notation. Collect the predicted system outputs, the system inputs, and the desired future outputs at instant k into vectors of size ($H_p \times 1$):

$$\hat{\mathbf{y}}(k+1) = [\hat{y}(k+1), \ldots, \hat{y}(k+H_p)]^T \tag{7.55}$$

$$\mathbf{v}(k) = [v(k), \ldots, v(k+H_p-1)]^T \tag{7.56}$$

$$\mathbf{w}(k+1) = [w(k+1), \ldots, w(k+H_p)]^T \tag{7.57}$$

The future predictions can be calculated from

$$\hat{\mathbf{y}}(k+1) = K_{CAGC}\,x(k) + K_{CAB}\,\mathbf{v}(k) + K_{CAG}\,y(k) \tag{7.58}$$

where

$$K_{CAGC} = \begin{bmatrix} C[A - GC] \\ \vdots \\ CA^{H_p-1}[A - GC] \end{bmatrix} \tag{7.59}$$

$$K_{CAB} = \begin{bmatrix} CB & \cdots & 0 \\ \vdots & \ddots & \vdots \\ CA^{H_p-1}B & \cdots & CB \end{bmatrix} \tag{7.60}$$

$$K_{CAG} = \begin{bmatrix} CG \\ \vdots \\ CA^{H_p-1}G \end{bmatrix} \tag{7.61}$$

7.4.3 Cost function
Let us minimize the following cost function, expressed in a vector form:

$$J = (\mathbf{w}(k+1) - \hat{\mathbf{y}}(k+1))^T Q\,(\mathbf{w}(k+1) - \hat{\mathbf{y}}(k+1)) + \mathbf{v}(k)^T R\,\mathbf{v}(k) \tag{7.62}$$

where $Q = \mathrm{diag}[q_1, \ldots, q_{H_p}]$ and $R = \mathrm{diag}[r_1, \ldots, r_{H_p}]$. Notice that if $v(k) \leftarrow \Delta u(k)$, the control costs are taken on the increments of the control action, whereas if $v(k) \leftarrow u(k)$, the costs are on the absolute values of control, as in (7.17). The introduction of diagonal weighting matrices Q and R enables the weighting of the terms in the cost function also with respect to their appearance in time. The optimal sequence is given by

$$\mathbf{v}(k) = [R + K_{CAB}^T Q K_{CAB}]^{-1} K_{CAB}^T Q \times (\mathbf{w}(k+1) - K_{CAGC}\,x(k) - K_{CAG}\,y(k)) \tag{7.63}$$
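Proof. Let us simplify the notations by dropping out the sample indexes k. Minimization can be done analytically by setting the derivative $\partial J / \partial \mathbf{v} = 0$. The derivative is given by

$$\frac{\partial J}{\partial \mathbf{v}} = 2\left[\frac{\partial (\mathbf{w} - \hat{\mathbf{y}})}{\partial \mathbf{v}}\right]^T Q\,(\mathbf{w} - \hat{\mathbf{y}}) + \left[\frac{\partial (R\mathbf{v})}{\partial \mathbf{v}}\right]^T \mathbf{v} + \left[\frac{\partial \mathbf{v}}{\partial \mathbf{v}}\right]^T R\,\mathbf{v} \tag{7.64}$$

For the partial derivatives we get

$$\frac{\partial (\mathbf{w} - \hat{\mathbf{y}})}{\partial \mathbf{v}} = -\frac{\partial \hat{\mathbf{y}}}{\partial \mathbf{v}} = -K_{CAB} \tag{7.65}$$

$$\frac{\partial (R\mathbf{v})}{\partial \mathbf{v}} = R; \quad \frac{\partial \mathbf{v}}{\partial \mathbf{v}} = I \tag{7.66}$$

Thus, since Q and R are diagonal, the derivative can be written as

$$\frac{\partial J}{\partial \mathbf{v}} = -2K_{CAB}^T Q\,(\mathbf{w} - \hat{\mathbf{y}}) + 2R^T \mathbf{v} \tag{7.67}$$

Setting the derivative to zero and substituting the vector of future predictions from (7.58), we have

$$K_{CAB}^T Q\,(\mathbf{w} - K_{CAGC}\,x - K_{CAB}\,\mathbf{v} - K_{CAG}\,y) = R^T \mathbf{v} \tag{7.68}$$

Solving for $\mathbf{v}$ gives the optimal control sequence (7.63).

Let us introduce a gain matrix K:

$$K = [R + K_{CAB}^T Q K_{CAB}]^{-1} K_{CAB}^T Q \tag{7.69}$$

and denote the first row of K by $K_1$. Since only the first element of the optimal sequence is applied to the process, the on-line control computations are reduced to

$$v(k) = K_1 [\mathbf{w}(k+1) - K_{CAGC}\,x(k) - K_{CAG}\,y(k)] \tag{7.70}$$

If the system parameters, A, B, G, and C, are constant, the gain matrices $K_1$, $K_{CAGC}$ and $K_{CAG}$ can be computed beforehand.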
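
The optimality of the gain (7.69) can be checked numerically. The sketch below uses random test data (not taken from the text) and the illustrative names `gpc_gain` and `cost`; `e_free` stands for the term $\mathbf{w} - K_{CAGC}x - K_{CAG}y$, which does not depend on $\mathbf{v}$.

```python
import numpy as np


def gpc_gain(K_CAB, Q, R):
    """K = [R + K_CAB^T Q K_CAB]^-1 K_CAB^T Q, cf. (7.69)."""
    return np.linalg.solve(R + K_CAB.T @ Q @ K_CAB, K_CAB.T @ Q)


def cost(v, e_free, K_CAB, Q, R):
    """J of (7.62), using w - y_hat = e_free - K_CAB v."""
    err = e_free - K_CAB @ v
    return (err.T @ Q @ err + v.T @ R @ v).item()


rng = np.random.default_rng(0)
Hp = 6
K_CAB = np.tril(rng.normal(size=(Hp, Hp)))     # random lower-triangular test matrix
Q = np.diag(rng.uniform(0.5, 2.0, Hp))         # positive diagonal output weights
R = np.diag(rng.uniform(0.1, 1.0, Hp))         # positive diagonal control weights
e_free = rng.normal(size=(Hp, 1))

K = gpc_gain(K_CAB, Q, R)
v_opt = K @ e_free

# Gradient (7.67) evaluated at the optimum should vanish.
grad = -2 * K_CAB.T @ Q @ (e_free - K_CAB @ v_opt) + 2 * R.T @ v_opt
J_opt = cost(v_opt, e_free, K_CAB, Q, R)
J_perturbed = min(
    cost(v_opt + 1e-2 * rng.normal(size=(Hp, 1)), e_free, K_CAB, Q, R)
    for _ in range(20)
)
print(np.allclose(grad, 0.0, atol=1e-9), J_perturbed > J_opt)
```

Because R is positive definite here, the cost is strictly convex in $\mathbf{v}$, so any perturbation away from `v_opt` increases J.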
7.4.4 Remarks

The disturbance model in the ARIMAX structure

$$y(k) = \frac{B(q^{-1})}{A(q^{-1})}\,u(k-d) + \frac{C(q^{-1})}{\Delta(q^{-1})\,A(q^{-1})}\,e(k) \tag{7.71}$$

allows a versatile design of disturbance control in predictive control. In particular:

• with $C(q^{-1}) = \Delta(q^{-1})\,A(q^{-1})$, the approach reduces to that of section 7.3, with no integral action;
• with $C(q^{-1}) = A(q^{-1})$, a pure integral control of disturbances is obtained (noise characteristics $1/\Delta$);
• with $C(q^{-1}) = C_1(q^{-1})$, an ARIMAX model with noise characteristics $C_1/(\Delta A)$ is obtained;
• with $C(q^{-1}) = \Delta(q^{-1})\,C_1(q^{-1})$, an ARMAX model with noise characteristics $C_1/A$ is obtained;
• with $C(q^{-1}) = \Delta(q^{-1})\,A(q^{-1})\,C_1(q^{-1})$, an arbitrary FIR filter can be designed for the noise (no integral action); etc.

Since the controller is operating on $\Delta u$, the control horizon is simple to implement. A control horizon $H_c$ is obtained when only the first $H_c$ columns of the matrix $K_{CAB}$ in (7.63) are used. Accordingly, the control weighting matrix R, associated with future v's, has to be adjusted by specifying only the first $H_c$ rows and columns. The future control increments $v(k+H_c), v(k+H_c+1), \ldots$ are then assumed to be equal to zero. $H_c = 1$ results in mean-level control, where the optimization seeks a constant control input (only one change in u allowed), which minimizes the difference between targets w and predictions $\hat{y}$ in the given horizon. With large $H_p$, the plant is driven to a constant reference trajectory (in the absence of disturbances) with the same dynamics as the open-loop plant.

A minimum horizon specifies the beginning of the horizon to be used in the cost function. If the plant model has a dead time of d [assuming that $b_0$ is nonzero in (7.71)], then only the predicted outputs at $k+d, k+d+1, \ldots$ are affected by a change in $u(k)$. Thus, the calculation of earlier predictions would be unnecessary. If d is not known, or is variable, $H_m$ can be set to 1. A simple implementation can be done by removing the first $H_m - 1$ rows from $K_{CAGC}$, $K_{CAB}$ and $K_{CAG}$ in (7.59)-(7.61). The corresponding (first $H_m - 1$) rows and columns of the weighting matrix Q need to be removed, too. With $H_c = n_A + 1$, $H_p = n_A + n_B + 1$, $H_m = n_B + 1$ a deadbeat control [8] results, where the output of the process is driven to a constant reference trajectory in $n_B + 1$ samples; $n_A + 1$ controller outputs are required to do so.

The GPC represents a unification of many long-range predictive control algorithms, as well as a computationally simple approach. For example, the generalized minimum variance controller corresponds to the GPC in which both $H_m$ and $H_p$ are set equal to the time delay and only one control signal is weighted.

In some cases it is more relevant to consider a cost function with weights on the non-incremental control input
$$J = (\mathbf{w}(k+1) - \hat{\mathbf{y}}(k+1))^T Q\,(\mathbf{w}(k+1) - \hat{\mathbf{y}}(k+1)) + \mathbf{u}(k)^T R\,\mathbf{u}(k) \tag{7.72}$$

The above equations are still valid with substitutions $F(q^{-1}) \leftarrow A(q^{-1})$ and $v(k) \leftarrow u(k)$ (ARMAX structure). This is a good choice, e.g., if the process already includes an integrator in itself. Note that the control horizon is then implemented as by (7.27) and (7.28).

The ARMAX/ARIMAX model can be seen as a fixed gain state observer. For the noise, we always have $e(k) = y(k) - C x(k)$. In general, the states are not known (not measured, or there is noise in the measurements). Using the state-space model (7.40)-(7.41), a prediction $\hat{x}(k)$ of the state $x(k)$, based on y and u up to and including instant $k-1$, can be written as

$$\hat{x}(k) = [A - GC]\,\hat{x}(k-1) + B v(k-1) + G y(k-1) \tag{7.73}$$

or, equivalently,

$$\hat{x}(k) = A \hat{x}(k-1) + B v(k-1) + G[y(k-1) - C\hat{x}(k-1)] \tag{7.74}$$
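
A short simulation illustrates the fixed-gain observer. The sketch assumes the matrices of the simulation example in section 7.5 as test data, with arbitrary random input and noise sequences; it checks that the forms (7.73) and (7.74) coincide, and that an observer started from a wrong initial state converges, since the estimation error is propagated by $A - GC$.

```python
import numpy as np

# ARIMAX model matrices (taken from the simulation example, section 7.5).
A = np.array([[1.9732, 1.0, 0.0], [-0.9732, 0.0, 1.0], [0.0, 0.0, 0.0]])
B = np.array([[0.0], [0.0], [0.1989]])
C = np.array([[1.0, 0.0, 0.0]])
G = np.array([[1.0], [-0.9732], [0.0]])

rng = np.random.default_rng(1)
N = 500
x = np.array([[1.0], [-0.5], [0.2]])   # true initial state (arbitrary)
x_a = np.zeros((3, 1))                 # observer in form (7.73)
x_b = np.zeros((3, 1))                 # observer in form (7.74)
initial_error = np.linalg.norm(x - x_a)

for k in range(N):
    v = rng.normal()                   # arbitrary input increment
    e = 0.1 * rng.normal()             # white noise e(k)
    y = (C @ x).item() + e             # measurement, (7.41)
    x_a = (A - G @ C) @ x_a + B * v + G * y
    x_b = A @ x_b + B * v + G * (y - (C @ x_b).item())
    x = A @ x + B * v + G * e          # true state update, (7.40)

final_error = np.linalg.norm(x - x_a)
print(np.allclose(x_a, x_b), final_error / initial_error)
```

Subtracting (7.73) from (7.40), with $e(k) = y(k) - Cx(k)$, gives $x(k) - \hat{x}(k) = (A - GC)[x(k-1) - \hat{x}(k-1)]$, so the noise does not enter the error recursion.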
The prediction $\hat{x}(k)$ is then used for $x(k)$ in the GPC equations. The above observer is also called an asymptotic state estimate [69], the estimate to which the optimal estimate tends when time tends to infinity. An optimal estimate of the state can be obtained from a Kalman filter:

$$\hat{x}(k) = [A - GC]\,\hat{x}(k-1) + B v(k-1) + G y(k-1) + K(k)[y(k) - \hat{y}(k)] \tag{7.75}$$

where

$$\hat{y}(k) = C(A - GC)\,\hat{x}(k-1) + CB\,v(k-1) + CG\,y(k-1) \tag{7.76}$$

and the Kalman filter gain vector is obtained from the following recursive equations

$$K(k) = (A - GC)P(k-1)(A - GC)^T C^T \times [V + C(A - GC)P(k-1)(A - GC)^T C^T]^{-1} \tag{7.77}$$

$$P(k) = (A - GC)P(k-1)(A - GC)^T - K(k)\,C(A - GC)P(k-1)(A - GC)^T \tag{7.78}$$

where the initial condition is $P(0)$, the covariance matrix of the initial state estimation error, $P(0) = E\{(x(0) - \hat{x}(0))(x(0) - \hat{x}(0))^T\}$, and V is the variance of $e(k)$. The asymptotic estimate is obtained when $\lim_{k \to \infty} K(k+1) = 0$, which is true if the eigenvalues of the matrix $(A - GC)$ are less than one in magnitude.

7.4.5 Closed-loop behavior
The GPC control strategy is a linear combination of the system input, output and the desired output. It can be expressed in the RST form. As for the linear quadratic predictive controller, the characteristic function can be derived, proceeding in a similar way as in section 7.3.5. The controller is given by (7.70). Substituting (7.41) for $y(k)$ in (7.70), substituting the result into (7.40), regrouping and solving for $x(k)$, and using (7.41) again, we have

$$y(k) = C\{I - q^{-1}[A - BK_1(K_{CAGC} + K_{CAG}C)]\}^{-1} \times [BK_1\mathbf{w}(k) + (-BK_1K_{CAG} + G)\,e(k-1)] + e(k) \tag{7.79}$$

and the characteristic polynomial is given by

$$\det\{I - q^{-1}[A - BK_1(K_{CAGC} + K_{CAG}C)]\} \tag{7.80}$$

The next section is dedicated to a control problem originating from an industrial process.
7.5 Simulation example
Let us consider an example of the control of a fluidized-bed combustor (see Appendix B).
Consider a nominal steadystate point given by Qc = 2.6 k~ (fuel feed 3 (se condary air rate), F1  3 1N’’’3 (primary air flow) and F2 Nm flow). The following linearized and discretized description betweencombustion power and fuel feed is obtained from the plant model using a sampling time of 10 seconds: 3 0.1989q P (k) = 1  0.9732q1Vc (k) (7.81) Assumingan ARIMAXmodel structure with C (ql) = A (ql) (integrating output measurementnoise) we have the following statespace model for the system x(k+l) y(k)
(7.82) (7.83)
Ax (k)+BAu(k)+Ge(k)  Cx(k)+e(k)
where y ~ P, u ,: Qc, and the matrices are given by 1.9732 0.9732 0.0000
I
1 0 0 1 0 0
0.0000 , B = 0.0000 0.1989 1 3 0.9732 t
c [1 o 0],a
o.ooooj
(7.84)
(7.85)
Let us first design a meanlevel controller: Hc = 1 (control horizon), Hp = 360 (large prediction horizon corresponding to 1 hour of operation). The gain matrices are then given by
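$$K_{CAB} = \begin{bmatrix} 0 \\ 0 \\ 0.1989 \\ 0.3925 \\ \vdots \\ 7.4212 \end{bmatrix}, \quad K_{CAG} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \tag{7.86}$$

$$K_{CAGC} = \begin{bmatrix} 0.9732 & 1 & 0 \\ 1.9203 & 1.9732 & 1 \\ 2.8421 & 2.9203 & 1.9732 \\ 3.7393 & 3.8421 & 2.9203 \\ \vdots & \vdots & \vdots \\ 36.3114 & 37.3113 & 37.3113 \end{bmatrix} \tag{7.87}$$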
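The entries of (7.86) and (7.87) can be checked numerically. The following is a minimal sketch in plain Python, assuming that row $i$ of the gain matrices is $CA^{i-1}B$, $CA^{i-1}G$ and $CA^{i-1}[A - GC]$, respectively (the same structure as in (8.83)-(8.85)), and that $H_c = 1$ keeps only the first column of $K_{CAB}$. The helper names (`vec_mat`, `dot`, `gain_rows`) are illustrative and not taken from the text.

```python
# State-space model (7.84)-(7.85) of the combustion power loop.
A = [[1.9732, 1.0, 0.0],
     [-0.9732, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
B = [0.0, 0.0, 0.1989]      # column vector
G = [1.0, -0.9732, 0.0]     # column vector
C = [1.0, 0.0, 0.0]         # row vector


def vec_mat(v, M):
    """Row vector v times matrix M (list of rows)."""
    return [sum(v[r] * M[r][c] for r in range(len(M))) for c in range(len(M[0]))]


def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


# A - G C, where G C is the outer product of column G and row C.
A_GC = [[A[r][c] - G[r] * C[c] for c in range(3)] for r in range(3)]


def gain_rows(Hp):
    """Rows 1..Hp of K_CAB (first column only, Hc = 1), K_CAG and K_CAGC."""
    K_cab, K_cag, K_cagc = [], [], []
    CAi = C[:]                          # holds C A^(i-1)
    for _ in range(Hp):
        K_cab.append(dot(CAi, B))
        K_cag.append(dot(CAi, G))
        K_cagc.append(vec_mat(CAi, A_GC))
        CAi = vec_mat(CAi, A)           # advance to C A^i
    return K_cab, K_cag, K_cagc


K_cab, K_cag, K_cagc = gain_rows(360)
print([round(v, 4) for v in K_cab[:4]], round(K_cab[-1], 4))
print([round(v, 4) for v in K_cagc[1]])
```

Because the noise polynomial was chosen as $C(q^{-1}) = A(q^{-1})$, every element of $K_{CAG}$ evaluates to one, and the last element of $K_{CAB}$ approaches the steady-state gain $0.1989 / (1 - 0.9732)$ of (7.81).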
Figure 7.1: Mean-level control. $H_c = 1$, $H_m = 3$, $H_p = 360$, $R = 0$, $Q = I$. The upper plot shows the combustion power, $P$ [MW], controlled by the fuel feed rate $Q_C$ [kg/s]. Horizontal axis: $t$ [min].
The minimum horizon $H_m$ can be set equal to the time delay, $H_m = 3$. The 'ideal' mean-level control result (using weighting matrices $Q = I$ and $R = 0$) is shown in Fig. 7.1, where the linear model (7.81) is used as the process to be controlled. In mean-level control, the plant has open-loop dynamics: the closed-loop characteristic polynomial, (7.80), is $1 - 0.97q^{-1}$. A tighter control can be obtained by reducing the length of the prediction horizon ($H_p = 30$ in Fig. 7.2) and/or increasing the control horizon ($H_p = 30$, $H_c = 2$ in Fig. 7.3). The characteristic polynomials are given by $1 - 0.93q^{-1}$ and 1, respectively. Notice, however, that in the latter simulation the control signal is bounded, whereas the computation of the characteristic polynomial was based on an (unconstrained) linear model. Figure 7.4 shows a more realistic simulation, where the differential equation model was used for simulating the plant. Measurement noise with a
Figure 7.2: A typical GPC setting. $H_p = 30$, see Fig. 7.1 for other details.
Figure 7.3: Deadbeat type of setting. $H_p = 30$, $H_c = 2$ (see Fig. 7.1 for other details). Note that the input was constrained on the range [0.5, 5].
Figure 7.4: GPC control. $H_c = 1$, $H_m = 3$, $H_p = 30$, $R = 100I$, $Q = I$. The upper plot shows the combustion power, $P$ [MW], controlled by the fuel feed rate $Q_C$ [kg/s]. The plant was simulated using the differential equation model, with output noise $N(0, 0.21)$. An unmeasured 25% heat value loss affects the process at $t = 55$ min.

standard deviation of 1% of the nominal value was added to the output. In addition, an unmeasured disturbance (a 25% stepwise drop in fuel power) affects the simulated process at $t = 55$ min. An ARIMAX model with $C(q^{-1}) = 1 - 0.9q^{-1}$ was designed for disturbance rejection. In addition, a nonzero control weight, $R = 100I$, was used to reduce jitter in the controller output.
Chapter 8
Multivariable Systems
In this chapter, the control of linear multivariable systems is considered. First, the design of a MIMO control system is reduced to several SISO design problems. The relative gain array (RGA) method aims at helping to choose suitable pairs of control and controlled variables. If the interactions between the variables are strong, the system may not be satisfactorily controlled by SISO controllers only. In this case the interactions can be actively reduced by decouplers, and the control of the decoupled system can then be designed using SISO methods. Decoupling is considered in the second section, and a simple multivariable PI controller (MPI) based on decoupling both low and high frequencies is presented. The third approach considered in this chapter is a 'true' multivariable control approach. The design of a multivariable generalized predictive controller (MGPC) is considered, which solves the MIMO control design problem by minimizing a quadratic cost function. Simulation examples conclude this chapter.

All methods are based on models of the system. However, only steady-state gains are required by the RGA method, and only steady-state and high-frequency gains by the MPI approach. These can be determined experimentally using relatively simple plant experiments. The MGPC approach requires a dynamic model of the MIMO system, the identification of which may be a more laborious task and require more extensive experimenting with the plant.

For MIMO systems, the state-space formulation is simpler than, e.g., that of polynomial matrices. Therefore, state-space models are assumed in what follows. In the case of MGPC, the conversion of a system model from a polynomial matrix form to a state-space form is also considered.
8.1 Relative gain array method
For processes with $N$ controlled outputs and $N$ manipulated variables, there are $N!$ different ways to select input-output pairs for SISO control loops. One way to select the 'best' possible SISO controllers among the configurations is to consider all the $N!$ loops and select those input-output pairs that minimize the amount of interaction between the SISO controllers. This is the relative gain array (RGA) method, also known as Bristol's method (see, e.g., [90], pp. 494-503).

The RGA method tries to minimize the interactions between SISO loops by selecting an appropriate pairing. It does not eliminate the interactions; it merely tries to minimize their effect. It relies only upon steady-state information. If dynamic interactions are more important than those occurring at steady state, then RGA is clearly not a good method for such systems.
8.1.1 The basic idea
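Consider a stable $N$-input $N$-output process. Let us define a relative gain between an output $y_o$ ($o = 1, 2, ..., O$) and a manipulated variable $u_i$ ($i = 1, 2, ..., I$) ($O = I = N$):

$$\lambda_{o,i} = \frac{\left[\dfrac{\Delta y_o}{\Delta u_i}\right]_{u_k \text{ constant } \forall k \neq i}}{\left[\dfrac{\Delta y_o}{\Delta u_i}\right]_{y_k \text{ constant } \forall k \neq o}} \tag{8.1}$$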
where the notation '$u_k$ constant $\forall k \neq i$' denotes that the values of the manipulated variables other than $u_i$ are kept constant. Similarly, '$y_k$ constant $\forall k \neq o$' denotes that all outputs except the $o$'th one are kept constant by some control loops. Thus, the numerator in (8.1) is the open-loop steady-state gain of the system (the difference between the initial and final steady states in output $o$, divided by the amplitude of the step change in input $i$). The denominator in (8.1) is the closed-loop steady-state gain, where all other outputs except the $o$'th one are controlled using a controller which eliminates steady-state error (e.g., a PI controller). The ratio of the two gains defines the relative gain $\lambda_{o,i}$.

The value of $\lambda_{o,i}$ is a useful measure of interaction. In particular (see [52]):

1. If $\lambda_{o,i} = 1$, the output $y_o$ is completely decoupled from all other inputs than the $i$'th one. This pair of variables is a perfect choice for SISO control.

2. If $0 < \lambda_{o,i} < 1$, there is interaction between the output $y_o$ and input variables other than $u_i$. The smaller the $\lambda_{o,i}$, the smaller the interaction between output $y_o$ and input $u_i$.

3. If $\lambda_{o,i} = 0$, then output $y_o$ does not respond to changes in input $u_i$. Consequently, the input $u_i$ cannot be used to control the $o$'th output.

4. If $\lambda_{o,i} < 0$, then the gains of the open-loop and the closed-loop systems have different signs. This is dangerous, as the system is only conditionally stable¹.

5. If $\lambda_{o,i} > 1$, the open-loop gain is greater than the closed-loop gain. This case is also undesirable².

An $N \times N$ matrix of relative gains (Bristol's matrix) collects all the relative gains into a matrix form:

$$\Lambda = \begin{bmatrix} \lambda_{1,1} & \lambda_{1,2} & \cdots & \lambda_{1,N} \\ \vdots & \vdots & & \vdots \\ \lambda_{N,1} & \lambda_{N,2} & \cdots & \lambda_{N,N} \end{bmatrix} \tag{8.2}$$

The sum of each row and each column of the matrix is equal to one. The RGA method recommends the following way to pair the controlled outputs with the manipulated variables:

Proposition 1 (Bristol's method) Select the control loops in such a way that the relative gains $\lambda_{o,i}$ are positive and as close to unity as possible.

In other words, those pairs of input and output variables are selected that minimize the amount of interaction among the resulting loops.

¹ Assume, for example, that the system is in open loop, and that the gain between $y_o$ and $u_i$ is positive. This would then fix the sign of the gain(s) of the controller (e.g., positive gains in PI control $\Delta u_i = k_P \Delta e_o + k_I e_o$, with $e_o = w_o - y_o$). If the other loops are then put to automatic mode (controlled), the gain between $y_o$ and $u_i$ changes sign (since $\lambda_{o,i} < 0$). Consequently, the controller designed for the open-loop system has a gain with the wrong sign, which results in instability.

² In most instances the $y_o - u_i$ controller will be tuned with the other control loops in manual mode. When the other control loops are then put into automatic mode, the gain between $y_o$ and $u_i$ will reduce (since $\lambda_{o,i} > 1$) and the control performance for $y_o$ will probably degrade. If the $y_o - u_i$ controller is then retuned with a higher gain, a potential problem may arise: if the other loops are put back in manual mode, the gain between $y_o$ and $u_i$ would increase. Coupled with the new high-gain controller, instability could result. The greater $\lambda_{o,i}$ is, the more pronounced this effect is.
8.1.2 Algorithm
When a model of the system is available, Bristol's method is simple to compute. Consider a static model of an $N$-input $N$-output process:

$$y = K_{ss}u \tag{8.3}$$

Without loss of generality we can assume for a linear system that the initial state is at $y = 0$, $u = 0$. The open-loop gains for a unit step are given by the coefficients of the gain matrix, $[K_{ss}]_{o,i} = k^{ss}_{o,i}$:

$$k^{ss}_{o,i} = \left[\frac{\Delta y_o}{\Delta u_i}\right]_{u_k \text{ constant } \forall k \neq i} \tag{8.4}$$

In order to solve the closed-loop gains, let us compute the inverse of the system

$$u = K_{ss}^{-1}y = My \tag{8.5}$$

and denote the inverse matrix by $M$, $[M]_{o,i} = m_{o,i}$. In closed loop, all the other outputs are controlled so that the steady state remains the same, except for the $o$'th one ($\Delta y_j = 0$, $\forall j \neq o$, $\Delta y_o = \Delta y_o^*$). We can then write the following steady-state relation between the $i$'th input and the $o$'th output:

$$\Delta u = M\Delta y \tag{8.6}$$

$$\Delta u = M\begin{bmatrix} 0 \\ \vdots \\ 0 \\ \Delta y_o^* \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} m_{1,o} \\ \vdots \\ m_{i,o} \\ \vdots \\ m_{N,o} \end{bmatrix}\Delta y_o^* \tag{8.7}$$

Taking the $i$'th row of the above system of equations gives

$$\Delta u_i = m_{i,o}\,\Delta y_o^* \tag{8.8}$$

and

$$\left[\frac{\Delta y_o}{\Delta u_i}\right]_{y_k \text{ constant } \forall k \neq o} = \frac{1}{m_{i,o}} \tag{8.9}$$

where $m_{i,o}$ is the $(i, o)$'th element of the inverse of the process' steady-state gain matrix. Thus, the elements of Bristol's matrix are given by

$$\lambda_{o,i} = k^{ss}_{o,i}\,m_{i,o} \tag{8.10}$$
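Let us give an algorithm for computing Bristol's matrix when a linear model for the system is available.

Algorithm 29 (Bristol's method) Given a steady-state process model

$$y = K_{ss}u \tag{8.11}$$

the Bristol matrix is given by

$$\Lambda = K_{ss} \otimes \left(K_{ss}^{-1}\right)^T \tag{8.12}$$

where $\otimes$ denotes the element-wise multiplication.

Example 40 (Bristol's method) Consider a $2 \times 2$ system.

• Let the following steady-state information be available

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = K_{ss}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \tag{8.13}$$

where

$$K_{ss} = \begin{bmatrix} 1 & 0 \\ 0.15 & 0.2 \end{bmatrix} \tag{8.14}$$

This results in the following matrix of relative gains

$$\Lambda = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{8.15}$$

Bristol's method then suggests selecting SISO controllers for the pairs $y_1 - u_1$ and $y_2 - u_2$, which is intuitively clear since the input $u_2$ has no effect on the output $y_1$.

• Let the system be given by

$$K_{ss} = \begin{bmatrix} 1 & -2 \\ 0.15 & 0.2 \end{bmatrix} \tag{8.16}$$

This results in the following matrix of relative gains

$$\Lambda = \begin{bmatrix} 0.4 & 0.6 \\ 0.6 & 0.4 \end{bmatrix} \tag{8.17}$$

Bristol's method then suggests selecting SISO controllers for the pairs $y_1 - u_2$ and $y_2 - u_1$.

• Let the system be given by

$$K_{ss} = \begin{bmatrix} 1 & 2 \\ 0.15 & 0.2 \end{bmatrix} \tag{8.18}$$

This results in the following matrix of relative gains

$$\Lambda = \begin{bmatrix} -2 & 3 \\ 3 & -2 \end{bmatrix} \tag{8.19}$$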
Bristol's method then suggests selecting SISO controllers for the pairs $y_1 - u_2$ and $y_2 - u_1$. There may be problems in switching between automatic and manual modes, but at least the gains in open and closed loop will have the same signs.

Example 41 (Fluidized bed combustion) A steady-state model for the FBC plant (see Appendix B) in the neighborhood of an operating point is given by

$$\begin{bmatrix} C_F \\ T_B \\ T_F \\ P \end{bmatrix} = \begin{bmatrix} 0.0688 & 0.0155 & 0.0155 \\ 212.29 & 93.73 & 0 \\ 162.72 & 5.87 & 18.29 \\ 8.103 & 0 & 0 \end{bmatrix}\begin{bmatrix} Q_C \\ F_1 \\ F_2 \end{bmatrix} \tag{8.20}$$

• Let us first consider that the outputs $C_F$ (flue gas O$_2$), $T_B$ (bed temperatures) and $P$ (combustion power) are controlled by the three inputs (fuel feed, primary and secondary airs). Bristol's matrix becomes

$$\Lambda = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \tag{8.21}$$

Thus the suggestion is to control oxygen with secondary air, power with fuel feed, and bed temperatures with primary air. For the first two, this is common practice in reality; the bed temperatures are not usually under automatic control.

• Let us consider controlling the freeboard temperatures $T_F$ instead of the bed temperatures. Bristol's matrix is given by

$$\Lambda = \begin{bmatrix} 0 & 1.4734 & -0.4734 \\ 0 & -0.4734 & 1.4734 \\ 1 & 0 & 0 \end{bmatrix} \tag{8.22}$$

The suggestion is still to control the power by fuel feed (note that this is simple to reason using physical arguments, too). For the temperatures and air flows the situation is more complicated. The suggestion is now to use primary air for O$_2$ control and secondary air for the freeboard temperatures; if chosen otherwise, the open-loop and closed-loop gains will have different signs³. In practice, freeboard temperatures are not under automatic control.

If the number of input and output variables is not the same, then several Bristol's matrices need to be formed. Assume that there are $O$ output variables and $I$ ($O \leq I$) possible manipulated variables. Then an $O \times O$ matrix of relative gains can be formed for each of the different combinations of $O$ manipulated variables. All the matrices need to be examined before selecting the $O$ loops with minimal interaction. The rule for the selection of control loops remains the same, i.e. the control loops that have relative gains positive and as close to unity as possible are recommended.

The RGA method indicates how the inputs should be coupled (paired) with the outputs to form loops with the smallest amount of interaction. However, this interaction may not be small enough, even if it is the smallest possible. In this case, decouplers can be applied. These will be considered in the next section.
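The element-wise formula $\lambda_{o,i} = k^{ss}_{o,i}\,m_{i,o}$ of (8.10) and (8.12) can be sketched in a few lines of plain Python. The function names `inverse` and `bristol_matrix` are illustrative, and the three gain matrices are assumptions chosen so that the computed relative gains agree with the ones quoted in Example 40; they share the second row $[0.15\ \ 0.2]$ and differ only in the gain from $u_2$ to $y_1$.

```python
def inverse(K):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(K)
    aug = [[float(v) for v in row] + [float(r == c) for c in range(n)]
           for r, row in enumerate(K)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("steady-state gain matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def bristol_matrix(K_ss):
    """Relative gain array: lambda[o][i] = K_ss[o][i] * M[i][o], M = inverse(K_ss)."""
    M = inverse(K_ss)
    n = len(K_ss)
    return [[K_ss[o][i] * M[i][o] for i in range(n)] for o in range(n)]


# Assumed gain matrices (see the lead-in); only the (1, 2) element changes.
for k12 in (0.0, -2.0, 2.0):
    lam = bristol_matrix([[1.0, k12], [0.15, 0.2]])
    print(k12, [[round(v, 4) for v in row] for row in lam])
```

A quick sanity check on any result is that each row and each column of the returned matrix sums to one, as stated for Bristol's matrix.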
8.2 Decoupling of interactions
The purpose of decouplers is to cancel the interaction between the loops. The remaining system can then be considered (and designed) as having no interactions at all. Hence, a multivariable control design problem is converted into a set of SISO control design problems by introducing artificial decoupling compensations (see, e.g., [90], pp. 504-509).

The interactions can be perfectly decoupled only if the process is perfectly known. In practice, a perfect model is rarely available. Thus only a partial decoupling can be obtained, with some (weak) interactions persisting. It may also be that the decouplers are not realizable, or that the degree of the decouplers would be too high for a practical implementation. In this case, some realizable form of the decoupler can be considered. For a stable process, a steady-state decoupler is always realizable. Remember that for a severely interacting system, static decoupling is better than no decoupling at all.

There are a number of different approaches, the most famous being perhaps Rosenbrock's (inverse) Nyquist array method (see, e.g., [57]), which is a frequency response method seeking to reduce the interaction by using a compensator to first make the system diagonally dominant. In what follows, a simple scheme for designing a discrete-time multivariable PI controller is presented.

³ The gains from $F_1$ and $F_2$ to $C_F$ are equal, but the gain from $F_1$ to $T_F$ is significantly smaller than that from $F_2$. If $F_2$ is used for O$_2$ control, and $F_1$ for temperature control, then each action taken by $F_2$ would need to be compensated by a (larger) counteraction in $F_1$. Thus the open-loop and closed-loop gains would have different signs depending on whether the $T_F - F_1$ controller is on or off.
8.2.1 Multivariable PI controller
In [74] a multivariable PI controller was suggested. The main idea is to decouple the system both at low and high frequencies. The original derivation was based on a continuous-time state-space model; in what follows, the discrete-time case is considered.

Consider a linear time-invariant stable multivariable plant described by the following discrete-time equations

$$x(k+1) = Ax(k) + Bu(k) \tag{8.23}$$

$$y(k) = Cx(k) \tag{8.24}$$

and controlled by a multivariable PI controller

$$\Delta u(k) = K_P\alpha_P\Delta e(k) + K_I\alpha_I e(k) \tag{8.25}$$

where

$$e(k) = w(k) - y(k) \tag{8.26}$$

and $\alpha_P$ and $\alpha_I$ are tuning variables (diagonal matrices) and $w(k)$ contains the set points. The idea is that the P-part decouples the system at high frequencies, while the I-part decouples the system at low frequencies (steady state). Let us first consider the P- and I-controllers separately, and then combine them together.

P-controller

Let us first assume that the system is controlled by a P-controller and that the aim is to drive the error $e$ to zero as fast as possible. For the high frequencies we can write ($\Delta y$ is the component with the highest frequency that can be described by the discrete-time model):

$$\Delta y(k+1) = C\Delta x(k+1) \tag{8.27}$$

$$= C\left[x(k+1) - x(k)\right] \tag{8.28}$$

$$= C\left[Ax(k) + Bu(k) - x(k)\right] \tag{8.29}$$

$$= C(A - I)x(k) + CBu(k) \tag{8.30}$$

$$= C(A - I)x(k) + K_{high}u(k) \tag{8.31}$$

where

$$K_{high} = CB \tag{8.32}$$

Consider that, at sample instant $k$, $x$ is initially in a desired steady state and a step change in the reference signal, $\Delta w(k+1)$, occurs at $k+1$. We then have

$$e(k+1) = \Delta w(k+1) \tag{8.33}$$

$$= w(k+1) - w(k) \tag{8.34}$$

$$= w(k+1) - y(k) \tag{8.35}$$

In order to drive the error $w(k+1) - y(k+1)$ to zero in one control sample (if possible), we need to have

$$y(k+1) = w(k+1) \tag{8.36}$$

$$\Delta y(k+1) = \Delta w(k+1) \tag{8.37}$$

$$C(A - I)x(k) + K_{high}u(k) = e(k+1) \tag{8.38}$$

and we can solve for the manipulated variables

$$u(k) = K_{high}^{-1}\left[e(k+1) - C(A - I)x(k)\right] \tag{8.39}$$

$$\Delta u(k) = K_{high}^{-1}\Delta e(k+1) \tag{8.40}$$
where the last equality is obtained using $\Delta x(k) = 0$, since $x(k)$ was a steady state. Thus, $K_P = K_{high}^{-1}$ in (8.25), if the inverse exists.

I-controller

Let us now consider the case where the system is controlled by an I-controller. From the system model we obtain the following relationship for a steady state (by setting $x(k+1) = x(k) = x_{ss}$):

$$x_{ss} = Ax_{ss} + Bu_{ss} \tag{8.41}$$

$$x_{ss} = (I - A)^{-1}Bu_{ss} \tag{8.42}$$

and

$$y_{ss} = Cx_{ss} \tag{8.43}$$

which gives

$$y_{ss} = C(I - A)^{-1}Bu_{ss} \tag{8.44}$$

$$y_{ss} = K_{ss}u_{ss} \tag{8.45}$$

where

$$K_{ss} = C(I - A)^{-1}B \tag{8.46}$$

In order to drive the steady-state error (for a step change in the reference signal) to zero, we need to have

$$y_{ss,new} = w \tag{8.47}$$

$$K_{ss}u_{ss,new} = \Delta w + y(k) \tag{8.48}$$

$$K_{ss}\left(u_{ss,old} + \Delta u_{ss,new}\right) = \Delta w + y_{ss,old} \tag{8.49}$$

$$K_{ss}\Delta u_{ss,new} = \Delta w \tag{8.50}$$

The required change at the controller output at $k+1$ is then

$$\Delta u(k) = K_{ss}^{-1}e(k+1) \tag{8.51}$$
Thus, $K_I = K_{ss}^{-1}$ in (8.25), if the inverse exists.

PI-controller

The PI controller can now be constructed by combining the tuning for the P- and I-controllers. The controller was given by (8.25)

$$\Delta u(k) = K_P\alpha_P\Delta e(k) + K_I\alpha_I e(k) \tag{8.52}$$

where

$$e(k) = w(k) - y(k) \tag{8.53}$$

$$\Delta e(k) = e(k) - e(k-1) \tag{8.54}$$

and

$$K_P = [CB]^{-1} \tag{8.55}$$

$$K_I = \left[C(I - A)^{-1}B\right]^{-1} \tag{8.56}$$

$K_P$ and $K_I$ provide decoupling at high and low frequencies. The tuning of the controller is conducted by adjusting $\alpha_P$ and $\alpha_I$, starting with small positive values ($0 < \alpha_{P,i} \ll 1$, $0 < \alpha_{I,i} \ll 1$):

$$\alpha_P = \begin{bmatrix} \alpha_{P,1} & & & 0 \\ & \alpha_{P,2} & & \\ & & \ddots & \\ 0 & & & \alpha_{P,N} \end{bmatrix} \tag{8.57}$$

$$\alpha_I = \begin{bmatrix} \alpha_{I,1} & & & 0 \\ & \alpha_{I,2} & & \\ & & \ddots & \\ 0 & & & \alpha_{I,N} \end{bmatrix} \tag{8.58}$$

Setting $\alpha_{P,i} = 0$ results in an I-controller only; similarly, $\alpha_{I,i} = 0$ results in pure P-control. With $\alpha_{P,i} = 1$, an aggressive P-control is obtained, which tries to drive the error to zero in one sample. With $\alpha_{I,i} = 1$, the controller output at $k+1$ is set to the value which provides the new steady state (mean-level control). In the presence of noisy measurements and modelling errors, these settings can make the closed-loop system unstable and produce unrealizable control signals. Therefore, smoother control is usually desired, at the cost of closed-loop performance.
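The decoupling gains (8.55)-(8.56) and one controller update (8.52) can be sketched as follows in plain Python. This is a minimal illustration, not a tuned controller: the plant matrices `A`, `B`, `C` are hypothetical values invented for the example, and the helper names (`matmul`, `inverse`, `mpi_gains`, `mpi_increment`) are not taken from [74].

```python
def matmul(X, Y):
    """Matrix product of X and Y, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r == c) for c in range(n)]
           for r, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular; the decoupling gain does not exist")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mpi_gains(A, B, C):
    """Return K_P = (CB)^-1 and K_I = (C (I - A)^-1 B)^-1."""
    n = len(A)
    I_minus_A = [[float(r == c) - A[r][c] for c in range(n)] for r in range(n)]
    K_high = matmul(C, B)
    K_ss = matmul(matmul(C, inverse(I_minus_A)), B)
    return inverse(K_high), inverse(K_ss)


def mpi_increment(K_P, K_I, alpha_P, alpha_I, e, e_prev):
    """Delta u(k) = K_P alpha_P Delta e(k) + K_I alpha_I e(k).
    alpha_P and alpha_I hold the diagonal elements of the tuning matrices."""
    p_in = [[a * (x - xp)] for a, x, xp in zip(alpha_P, e, e_prev)]
    i_in = [[a * x] for a, x in zip(alpha_I, e)]
    du_p = matmul(K_P, p_in)
    du_i = matmul(K_I, i_in)
    return [p[0] + i[0] for p, i in zip(du_p, du_i)]


# Hypothetical stable 2-input 2-output plant (values chosen for illustration).
A = [[0.8, 0.1], [0.0, 0.5]]
B = [[1.0, 0.0], [0.5, 1.0]]
C = [[1.0, 0.0], [0.0, 1.0]]
K_P, K_I = mpi_gains(A, B, C)

# Cautious starting point, as recommended above: small positive alphas.
du = mpi_increment(K_P, K_I, [0.1, 0.1], [0.05, 0.05], e=[1.0, 2.0], e_prev=[0.0, 0.0])
print(K_P, K_I, du)
```

With both $\alpha$ vectors set to ones and zeros respectively, the same function reproduces the limiting cases described above: one-sample P-control from a steady state, or a mean-level I-action that moves the input directly to its new steady-state value.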
8.3 Multivariable predictive control
In Chapter 7, the generalized predictive control (GPC) for SISO systems was considered. In this section, we will extend the concepts of SISO GPC to the control of MIMO processes (see [45], [96]).

8.3.1 State-space model
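A MIMO system can be conveniently described by a state-space model. Let us consider a multivariable input-output polynomial model of the form

$$F(q^{-1})\,y(k) = B(q^{-1})\,v(k) + C(q^{-1})\,e(k) \tag{8.59}$$

where the polynomial matrices $F$, $B$ and $C$ are given by

$$F(q^{-1}) = I + F_1q^{-1} + \cdots + F_Nq^{-N} \tag{8.60}$$

$$B(q^{-1}) = B_1q^{-1} + \cdots + B_Nq^{-N} \tag{8.61}$$

$$C(q^{-1}) = I + C_1q^{-1} + \cdots + C_Nq^{-N} \tag{8.62}$$

The output and noise vectors $y$ and $e$, respectively, are of size $O \times 1$:

$$y(k) = \left[y_1(k), \ldots, y_O(k)\right]^T \tag{8.63}$$

$$e(k) = \left[e_1(k), \ldots, e_O(k)\right]^T \tag{8.64}$$

Consequently, matrices $F(q^{-1})$ and $C(q^{-1})$ are of size $O \times O$ (as well as matrices $F_n$ and $C_n$). The input vector $v$ is of size $I \times 1$:

$$v(k) = \left[v_1(k), v_2(k), \ldots, v_I(k)\right]^T \tag{8.65}$$

and matrix $B(q^{-1})$ is of size $O \times I$ (as well as matrices $B_n$). Without loss of generality we assume that all polynomials are of order $N$.

The above polynomial model can be represented in a (canonical observable) state-space form:

$$x(k+1) = Ax(k) + Bv(k) + Ge(k) \tag{8.66}$$

$$y(k) = Cx(k) + e(k) \tag{8.67}$$

Please note that $A$, $B$, $C$ and $G$ are matrices in the state-space model, whereas $F(q^{-1})$, $B(q^{-1})$ and $C(q^{-1})$ are polynomial matrices. The matrices $A$, $B$, $G$ and $C$ are given by

$$A = \begin{bmatrix} -F_1 & I & 0 & \cdots & 0 \\ -F_2 & 0 & I & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ -F_{N-1} & 0 & 0 & \cdots & I \\ -F_N & 0 & 0 & \cdots & 0 \end{bmatrix} \tag{8.68}$$

$$B = \begin{bmatrix} B_1 & B_2 & \cdots & B_{N-1} & B_N \end{bmatrix}^T \tag{8.69}$$

$$G = \begin{bmatrix} C_1 - F_1 & C_2 - F_2 & \cdots & C_{N-1} - F_{N-1} & C_N - F_N \end{bmatrix}^T \tag{8.70}$$

$$C = \begin{bmatrix} I & 0 & \cdots & 0 \end{bmatrix} \tag{8.71}$$

where the transposes in (8.69) and (8.70) denote stacking of the blocks on top of each other. The matrices will now have the following sizes: $[A] = NO \times NO$, $[B] = NO \times I$, $[G] = NO \times O$ and $[C] = O \times NO$.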
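Example 42 (Representation of a 2 × 2 system) For a $2 \times 2$ P-canonical system with common denominators we have, from (8.59),

$$\begin{bmatrix} 1 + f_{1,1,1}q^{-1} + \cdots + f_{N,1,1}q^{-N} & 0 \\ 0 & 1 + f_{1,2,2}q^{-1} + \cdots + f_{N,2,2}q^{-N} \end{bmatrix}\begin{bmatrix} y_1(k) \\ y_2(k) \end{bmatrix}$$

$$= \begin{bmatrix} b_{1,1,1}q^{-1} + \cdots + b_{N,1,1}q^{-N} & b_{1,1,2}q^{-1} + \cdots + b_{N,1,2}q^{-N} \\ b_{1,2,1}q^{-1} + \cdots + b_{N,2,1}q^{-N} & b_{1,2,2}q^{-1} + \cdots + b_{N,2,2}q^{-N} \end{bmatrix}\begin{bmatrix} v_1(k) \\ v_2(k) \end{bmatrix}$$

$$+ \begin{bmatrix} 1 + c_{1,1,1}q^{-1} + \cdots + c_{N,1,1}q^{-N} & 0 \\ 0 & 1 + c_{1,2,2}q^{-1} + \cdots + c_{N,2,2}q^{-N} \end{bmatrix}\begin{bmatrix} e_1(k) \\ e_2(k) \end{bmatrix} \tag{8.72}$$

With this notation, the elements $(i, j)$ of the matrices $F_n$ consist of the scalar coefficients $f_{n,i,j}$; the matrices $B_n$ and $C_n$ are constructed in a similar way. We then have a state-space representation (8.66)-(8.67) with matrices $A$, $B$, $C$ and $G$ given by (8.73)-(8.76):

$$A = \begin{bmatrix} -f_{1,1,1} & 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -f_{1,2,2} & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & & & \ddots & & \vdots \\ -f_{N-1,1,1} & 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & -f_{N-1,2,2} & 0 & 0 & \cdots & 0 & 1 \\ -f_{N,1,1} & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & -f_{N,2,2} & 0 & 0 & \cdots & 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} b_{1,1,1} & b_{1,1,2} \\ b_{1,2,1} & b_{1,2,2} \\ \vdots & \vdots \\ b_{N-1,1,1} & b_{N-1,1,2} \\ b_{N-1,2,1} & b_{N-1,2,2} \\ b_{N,1,1} & b_{N,1,2} \\ b_{N,2,1} & b_{N,2,2} \end{bmatrix}$$

$$C = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \end{bmatrix}, \quad G = \begin{bmatrix} c_{1,1,1} - f_{1,1,1} & 0 \\ 0 & c_{1,2,2} - f_{1,2,2} \\ \vdots & \vdots \\ c_{N-1,1,1} - f_{N-1,1,1} & 0 \\ 0 & c_{N-1,2,2} - f_{N-1,2,2} \\ c_{N,1,1} - f_{N,1,1} & 0 \\ 0 & c_{N,2,2} - f_{N,2,2} \end{bmatrix}$$

where all the elements of the matrices are scalars.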
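The conversion from the polynomial matrices to the canonical observable form (8.68)-(8.71) can be sketched in plain Python as below. The function name `canonical_state_space` and the nested-list data layout (one $O \times O$ or $O \times I$ matrix per lag) are assumptions made for this illustration. As a check, the SISO model of Section 7.5 is used, with $F(q^{-1}) = \Delta A(q^{-1}) = 1 - 1.9732q^{-1} + 0.9732q^{-2}$ and $C(q^{-1}) = 1 - 0.9732q^{-1}$, which should give back the matrices (7.84)-(7.85).

```python
def canonical_state_space(F, Bq, Cq):
    """Build A, B, G, C of (8.68)-(8.71).

    F, Bq, Cq are lists over the lags n = 1..N of the coefficient matrices
    F_n (O x O), B_n (O x I) and C_n (O x O), each given as a list of rows.
    """
    N, O, I = len(F), len(F[0]), len(Bq[0][0])
    n = N * O
    A = [[0.0] * n for _ in range(n)]
    B = [[0.0] * I for _ in range(n)]
    G = [[0.0] * O for _ in range(n)]
    for blk in range(N):
        for r in range(O):
            row = blk * O + r
            for c in range(O):
                A[row][c] = -F[blk][r][c]                 # first block column: -F_n
                G[row][c] = Cq[blk][r][c] - F[blk][r][c]  # C_n - F_n
            if blk < N - 1:
                A[row][(blk + 1) * O + r] = 1.0           # identity on the block superdiagonal
            for c in range(I):
                B[row][c] = Bq[blk][r][c]
    C = [[1.0 if c == r else 0.0 for c in range(n)] for r in range(O)]
    return A, B, G, C


# SISO check against (7.84)-(7.85); orders are padded to N = 3 with zeros.
F = [[[-1.9732]], [[0.9732]], [[0.0]]]
Bq = [[[0.0]], [[0.0]], [[0.1989]]]
Cq = [[[-0.9732]], [[0.0]], [[0.0]]]
A, B, G, C = canonical_state_space(F, Bq, Cq)
print(A)
print(B, G, C)
```

For a $2 \times 2$ system such as the one in Example 42, the same function is called with $2 \times 2$ coefficient matrices per lag, and the resulting $A$ has size $2N \times 2N$.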
8.3.2 i-step ahead predictions
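The optimal predictions at sample instant $k + i$ will be

$$\hat{y}(k+i) = \sum_{j=1}^{i} CA^{i-j}Bv(k+j-1) + CA^{i-1}\left[A - GC\right]x(k) + CA^{i-1}Gy(k) \tag{8.77}$$

Let us use the following condensed notation:

$$\hat{Y}(k+1) = \left[\hat{y}^T(k+1), \ldots, \hat{y}^T(k+H_p)\right]^T \tag{8.78}$$

$$V(k) = \left[v^T(k), \ldots, v^T(k+H_p-1)\right]^T \tag{8.79}$$

that is

$$\hat{Y}(k+1) = \begin{bmatrix} \hat{y}(k+1) \\ \hat{y}(k+2) \\ \vdots \\ \hat{y}(k+H_p) \end{bmatrix} = \begin{bmatrix} \hat{y}_1(k+1) \\ \vdots \\ \hat{y}_O(k+1) \\ \hat{y}_1(k+2) \\ \vdots \\ \hat{y}_O(k+2) \\ \vdots \\ \hat{y}_1(k+H_p) \\ \vdots \\ \hat{y}_O(k+H_p) \end{bmatrix} \tag{8.80}$$

$$V(k) = \begin{bmatrix} v(k) \\ v(k+1) \\ \vdots \\ v(k+H_p-1) \end{bmatrix} = \begin{bmatrix} v_1(k) \\ v_2(k) \\ \vdots \\ v_I(k) \\ v_1(k+1) \\ v_2(k+1) \\ \vdots \\ v_I(k+1) \\ \vdots \\ v_1(k+H_p-1) \\ v_2(k+H_p-1) \\ \vdots \\ v_I(k+H_p-1) \end{bmatrix} \tag{8.81}$$

We then have a global predictive model for the $H_p$ future time instants $k+1$, $k+2$, ..., $k+H_p$:

$$\hat{Y}(k+1) = K_{CAGC}\,x(k) + K_{CAB}\,V(k) + K_{CAG}\,y(k) \tag{8.82}$$

where

$$K_{CAGC} = \begin{bmatrix} C[A - GC] \\ \vdots \\ CA^{H_p-1}[A - GC] \end{bmatrix} \tag{8.83}$$

$$K_{CAB} = \begin{bmatrix} CB & \cdots & 0 \\ \vdots & \ddots & \vdots \\ CA^{H_p-1}B & \cdots & CB \end{bmatrix} \tag{8.84}$$

$$K_{CAG} = \begin{bmatrix} CG \\ \vdots \\ CA^{H_p-1}G \end{bmatrix} \tag{8.85}$$

8.3.3 Cost function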
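Let us consider the following cost function

$$J = \left[W(k+1) - \hat{Y}(k+1)\right]^T Q \left[W(k+1) - \hat{Y}(k+1)\right] + V^T(k)RV(k) \tag{8.86}$$

where

$$W(k+1) = \left[w^T(k+1), \ldots, w^T(k+H_p)\right]^T \tag{8.87}$$

that is

$$W(k+1) = \begin{bmatrix} w(k+1) \\ w(k+2) \\ \vdots \\ w(k+H_p) \end{bmatrix} = \begin{bmatrix} w_1(k+1) \\ w_2(k+1) \\ \vdots \\ w_O(k+1) \\ w_1(k+2) \\ w_2(k+2) \\ \vdots \\ w_O(k+2) \\ \vdots \\ w_1(k+H_p) \\ w_2(k+H_p) \\ \vdots \\ w_O(k+H_p) \end{bmatrix} \tag{8.88}$$

The optimal control sequence is given by

$$V(k) = \left[K_{CAB}^TQK_{CAB} + R\right]^{-1}K_{CAB}^TQ\left[W(k+1) - K_{CAGC}\,x(k) - K_{CAG}\,y(k)\right] \tag{8.89}$$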
Comparing with the SISO case, we see that the equations have the same form. Only the sizes of the vectors and matrices are different, since an $O$-output, $I$-input system is considered instead of a SISO system. Instead of a scalar prediction, the predictions $\hat{y}$, (8.77), and targets $w$ are given by $O \times 1$ vectors. Likewise, instead of a scalar input and noise, the system inputs are now given by an $I \times 1$ vector $v$, and the noise by the vector $e$. The collections of all future predictions $\hat{Y}$ and future targets $W$ are now long vectors of length $OH_p$; the system inputs $V$ are contained in a vector of length $IH_p$. Similarly, the gain matrices (8.83)-(8.85) have been composed by piling the future predictions on top of each other. However, from a technical point of view, the future predictions, controller outputs and targets are still given in column vectors just as in the SISO case. Therefore, the solution also has a similar form and we will give no proof for it.
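Although no proof is given, the form of (8.89) can be checked with a short calculation. The following is a sketch that assumes symmetric weighting matrices $Q$ and $R$ and an invertible matrix $K_{CAB}^TQK_{CAB} + R$; the symbol $f(k)$ for the part of the prediction that does not depend on $V(k)$ is introduced here only for brevity.

```latex
\begin{align*}
% split the prediction (8.82) into forced and free parts
\hat{Y}(k+1) &= K_{CAB}V(k) + f(k), \qquad f(k) = K_{CAGC}\,x(k) + K_{CAG}\,y(k) \\
% cost function (8.86) written in terms of V(k)
J &= \left[W(k+1) - f(k) - K_{CAB}V(k)\right]^T Q \left[W(k+1) - f(k) - K_{CAB}V(k)\right] + V^T(k)RV(k) \\
% gradient with respect to V(k), set to zero
\frac{\partial J}{\partial V(k)} &= -2K_{CAB}^TQ\left[W(k+1) - f(k) - K_{CAB}V(k)\right] + 2RV(k) = 0 \\
% regroup and solve
\left[K_{CAB}^TQK_{CAB} + R\right]V(k) &= K_{CAB}^TQ\left[W(k+1) - f(k)\right]
\end{align*}
```

Solving the last line for $V(k)$ and inserting $f(k)$ gives (8.89).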
8.3.4 Remarks
The implementation of the minimum horizons $H_m$, prediction horizons $H_p$ and control horizons $H_c$ can be done in the same manner as in the SISO case, by 'cutting' the matrices. For simplicity, in the above derivation the same horizons were assumed for all inputs and outputs. Note, however, that when different horizons are used, $H_{m,o} \neq H_{m,j}$, $H_{p,o} \neq H_{p,j}$ and/or $H_{c,i} \neq H_{c,j}$ for input $i \neq j$ and/or output $o \neq j$, the 'removed' elements in the matrices need to be replaced by zeros. If $R$, (8.86), is a nonzero matrix, this results in numerical problems. To avoid this, all the rows and columns of $K_{CAB}$ containing only zero elements need to be removed, as well as the corresponding rows and columns of $Q$ and $R$, $K_{CAGC}$, $K_{CAG}$ and $W$.

In the same way as in the SISO case, a fixed gain observer is given by

$$\hat{x}(k) = \left[A - GC\right]\hat{x}(k-1) + Bv(k-1) + Gy(k-1) \tag{8.90}$$

This is the estimate that the Kalman filter tends to when time approaches infinity, i.e. the asymptotic estimate obtained under the condition that the eigenvalues of the matrix $(A - GC)$ are less than one in magnitude.

8.3.5 Simulation example
Let us illustrate the multivariable predictive control on the FBC process. From Appendix B we obtain a model for the relations between combustion power $P$ and flue gas oxygen $C_F$ (controlled variables) and fuel feed $Q_C$ and secondary air $F_2$ (manipulated variables), using a sampling time of 10 seconds. Linearizing, discretizing, and converting to a state-space form, we have a state-space description in the form

$$x(k+1) = Ax(k) + B\Delta u(k) \tag{8.91}$$

$$y(k) = Cx(k) \tag{8.92}$$

where

$$y = \begin{bmatrix} C_F \\ P \end{bmatrix}, \quad u = \begin{bmatrix} Q_C \\ F_2 \end{bmatrix}$$

Let us design a multivariable GPC controller with the following setting: $H_c = [\,3\ \ 3\,]$ and $H_p = [\,90\ \ 90\,]$, corresponding to a 15 min prediction horizon. The weighting matrix $Q$ was determined such that the varying interval (different scales) of the corresponding input and output variables was taken into account, resulting in $q_1 = 278$ and $q_2 = 0.01$, where $q_1$ are the diagonal elements of $Q$ corresponding to $C_F$ and $q_2$ the elements corresponding to $P$. The control weighting was set to zero. An integral noise model was assumed for both outputs ($C_{i,i}(q^{-1}) = A_{i,i}(q^{-1})$, $i = 1, 2$; $C_{i,j}(q^{-1}) = 0$ for $i \neq j$).

The plant was simulated using the differential equation model (Appendix B). In addition, an unmeasured disturbance (a 25% drop in fuel power) affected the process at $t = 55$ min. Simulation results are shown in Fig. 8.1.
Figure 8.1: Multivariable GPC of an FBC. The upper plots show the process outputs: flue gas oxygen $C_F$ and combustion power $P$ [MW]. The dashed line indicates the targets. The lower plots show the manipulated variables: fuel feed $Q_C$ [kg/s] and secondary air $F_2$ [Nm$^3$/s]. At $t = 55$ min an unmeasured disturbance affects the process. Horizontal axis: $t$ [min].
Figure 8.2: Multivariable GPC of an FBC. See legend of Fig. 8.1 for notation. Measured steady states (see Appendix B) were used as target values.

In a second simulation, the steady states measured from the true plant (see Appendix B) were used as reference targets. Fig. 8.2 illustrates these simulations.
Chapter 9
Time-varying and Nonlinear Systems
In this chapter, some aspects of the control of time-varying and nonlinear systems are considered. The field of control of nonlinear systems is wide. The aim of this chapter is to give the interested reader, with the help of illustrative examples, a flavor of the problems encountered and the solutions available.

First, a brief introduction to adaptive control is given and the two main approaches of gain scheduling and indirect adaptive control are presented. In nonlinear control, the Wiener and Hammerstein systems are of particular interest, as the nonlinear control problem can be reformulated such that linear control design methods can be applied. A general approach for Wiener and Hammerstein systems, based on the availability of an inverse of the static part, is introduced and illustrated via a simulation example using Wiener GPC for the control of a pH neutralization process. Then a special case of second-order Hammerstein systems is considered. This chapter is concluded by a presentation of a 'pure' nonlinear predictive control approach, using SNN and optimization under constraints. The control method is illustrated using two examples concerned with a fermentor and a tubular reactor.
9.1 Adaptive control
Let us use the following definition for adaptive control systems ([38], p. 362) as a starting point:

Definition 14 (Adaptive control) Adaptive control systems adapt (adjust) their behavior to the changing properties of controlled processes and their signals.
Basically, there are two main motivations for adaptive control:

• nonlinear processes, and
• time-varying processes.

In real life, all industrial processes exhibit nonlinear time-varying behavior. Adaptive control may need to be considered for a process that changes with time, or when the process is nonlinear to the extent that one set of control system parameters is not sufficient to adequately describe the process over its operating region [85].

A major assumption in conventional control design is that the underlying processes are linear time-invariant (LTI) dynamical systems. Control design is almost always based on linear descriptions of the process to be controlled, or on the assumption of linearity of the process in its operating region. This is due to the relative ease of identifying linear models, as well as to the availability of analytical results in the derivation of control laws based on linear process descriptions. A complete nonlinear theory does not exist, and linear design methods work rather well even when applied to nonlinear processes.

Nonlinear processes

All industrial processes, however, are inherently nonlinear. Nonlinearities may be due to constraints, saturations or hysteresis in the process variables (such as upper and lower bounds of the position of an actuator, or an overflow in a tank). Typically these nonlinearities occur at the boundaries of the operating areas of the process. Nonlinearities may also be present during the normal operation of the process due to the nonlinearity of the process phenomena, for example transport phenomena (transfer of heat by conduction or by radiation, etc.). These nonlinearities are typically smooth and close to linear, which justifies the use of linear approximations.

A certain linear model (say, model A) may be valid in the neighborhood of its operating point (a), or in a part of the operating area, and a controller may be designed based on the model description. When the operating point is changed, the model may no longer match the process. Instead, another linear model (model B) describes well the behavior of the process at the new operating point (point b). Consequently, the controller needs to be redesigned (using model B) in order to maintain satisfactory behavior of the controlled process.
Timevarying processes A major part of conventional system theory is based upon the assumption that the systems have constant coefficients. This assumption of timeinvariance is fundamental to conventional design procedures. In real life, however, all industrial processes exhibit timevarying behavior. The properties of the process and/or its signals change with time due to component wearing, changes in process instrumentation, updates in process equipment, failures, etc. Whenchangesin the process are significant, the controller needs to be redesigned in order to maintain satisfactory behavior of the controlled process. With timevarying processes, the model parameters need to be updated online. Online identification maybe performed continuously, so that the model is updated at each sample instant. Alternatively, the model maybe updated at certain times ’whennecessary’. The necessity for identification maybe indicated from outside of the system (e.g., by a process operator), or sought out by the adaptive control systemitself (e.g., triggered by passing a certain value of an index of performance). (Note howlinear control based online identified linear modelscan be applied to a wide variety of processes, including nonlinear timevarying plants.) An adaptive system is able to adjust to the current environment: to gain information about the current environment, and to use it. Anadaptive systemis memorylessin the sense that it is not able to store this information for later use; all newinformation replaces the old one and only the current information is available. Alearning system, instead, is able to recognize the current environmentand to recall previously learned associated information from its memory. A learning system is an adaptive system with memory. Learning then meansthat the system adapts to its current environment and stores this information to be recalled later. 
Thus, one may expect that a learning system improves its behavior with time; an adaptive system merely adjusts to its current environment.

9.1.1 Types of adaptive control

Methods of adaptive control are commonly categorized into three classes:
• gain scheduling,
• indirect adaptive control, and
• direct adaptive control.
Gain scheduling

In gain scheduling, Fig. 9.1, the controller parameters are computed beforehand for each operating region. The computation of controller parameters may be based on a known nonlinear model linearized at each operating point, or on linear models identified for each operating region. The model parameters (or, rather, the precomputed controller parameters) are then tabulated. A scheduling variable is used to select which parameter values to use; the tabulated information is then applied for control. Since there is no feedback from the closed-loop signals to the controller, gain scheduling is feedforward adaptation.

Figure 9.1: Gain scheduling. [Block diagram; blocks: monitoring of process, gain schedule, controller, process; signals: operating conditions, auxiliary process variables, controller parameters, reference.]

In gain scheduling, the process operating conditions are monitored, possibly using some auxiliary process variables. Based on the observed operating conditions, precomputed controller parameters are selected using the 'gain schedule', and then used in process control. The controller is switched between the precomputed settings as the operating parameters vary. The name 'gain scheduling' is historical: the scheme was originally used to accommodate changes in the process gain only.

The design of gain scheduling can be seen as consisting of two steps:
• finding suitable scheduling variables, and
• designing the controller at a number of operating conditions.

The main task in the design is to find suitable scheduling variables. This is normally done based on physical knowledge of the system. In process control, the production rate can often be chosen as a scheduling variable, since time constants and time delays are often inversely proportional to production rate. When the scheduling variables have been found, the controller is designed at a number of operating conditions, and the controller parameters are stored for each specific operating region. The stability and performance of the system are typically evaluated by simulation, with particular attention given to the transition between different operating conditions.

The application of gain scheduling is usually a straightforward process. An advantage of gain scheduling is that the controller parameters can be changed very quickly in response to process changes, since there is no estimation involved. The lack of estimation also brings about the main drawback of the scheme: there is no feedback to compensate for an incorrect schedule. Gain scheduling is feedforward adaptation (or open-loop adaptation): there is no feedback from the performance of the closed loop to the controller parameters.

Indirect adaptive control

Indirect adaptive controllers [5] [38] try to attain an optimal control performance, subject to the design criterion of the controller and to the obtainable information on the process. The indirect adaptive control scheme is illustrated in Fig. 9.2. Three stages can be distinguished:
• the identification of the process (in closed loop);
• the controller design; and
• the adjustment of the controller.

Conceptually, an indirect adaptive control scheme is simple to develop. A control design procedure is taken that is based on the use of a process model; the chosen controller design procedure is automated; and the procedure is applied every time a new process model has been identified. Thus there exists a large number of different indirect adaptive (self-tuning) controllers, based on different identification procedures and control laws. The model parameters are estimated in real time, the estimates are then used as if they were equal to the true ones (certainty equivalence principle), and the uncertainties of the estimates are, in general, not taken into account.

Figure 9.2: Indirect adaptive self-tuning control. [Block diagram; blocks: identification, model, controller design, controller, process; signals: reference, process output.]

Indirect adaptive control has been shown to yield good results in practice. Unfortunately, analysis of adaptive control systems is difficult due to the interaction between the controller design and the parameter estimation. This interaction can, however, play a key role in determining the convergence and stability of the adaptive control. Often, this problem is handled by looking at the states of the adaptive control as separated into two categories which change at different rates. This introduces the idea of two time scales: the fast scale is the ordinary feedback, and the slower one is for updating the controller parameters.

In direct adaptive control methods, the controller parameters are directly identified based on data (without first identifying a model of the process).
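Of the three classes above, gain scheduling is the easiest to make concrete. The following Python sketch shows only the table lookup: a scheduling variable selects one of several precomputed controller settings, with no estimation and no feedback from closed-loop performance. The breakpoints, the PI settings and the names `BREAKPOINTS`, `SCHEDULE` and `scheduled_parameters` are hypothetical illustration values, not taken from the text.

```python
from bisect import bisect_right

# Hypothetical schedule. The scheduling variable is the production rate
# (arbitrary units); BREAKPOINTS separate three operating regions.
BREAKPOINTS = [10.0, 20.0]  # regions: rate < 10, 10 <= rate < 20, rate >= 20

# Precomputed PI settings, one entry per operating region (made-up numbers).
SCHEDULE = [
    {"kp": 2.0, "ti": 40.0},
    {"kp": 1.2, "ti": 20.0},
    {"kp": 0.8, "ti": 10.0},
]


def scheduled_parameters(rate):
    """Return the precomputed controller settings for the current rate.

    This is pure table lookup (feedforward adaptation): nothing is
    estimated, so an incorrect table entry is never corrected.
    """
    return SCHEDULE[bisect_right(BREAKPOINTS, rate)]


if __name__ == "__main__":
    for rate in (5.0, 10.0, 25.0):
        print(rate, scheduled_parameters(rate))
```

In a real design the transitions between regions would deserve the same attention the text gives them; a hard switch as above may need interpolation or bumpless transfer.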
9.1.2 Simulation example

Let us look at a simulation example of the performance of an adaptive version of the generalized predictive control. In GPC, an explicit process model is required, i.e. a model structure needs to be selected [delay $d$ and orders of the model polynomials $A(q^{-1})$ and $B(q^{-1})$]. In addition, the coefficients of $A$ and $B$ need to be determined. In an adaptive version of GPC, the model parameters are updated using online measurement information. The updated process model is then instantly used in minimizing the GPC cost function. Thus, a typical indirect adaptive controller is obtained.
Process

Consider a linear process (from [13]) described by its Laplace transform:

$$\frac{y(s)}{u(s)} = \frac{1}{1 + 10s + 40s^2} \tag{9.1}$$

The process can be written as a discrete-time model:
$$y(k) = \frac{0.0114 + 0.0106\,q^{-1}}{(1 - q^{-1})(1 - 1.7567\,q^{-1} + 0.7788\,q^{-2})}\,\Delta u(k-1) \tag{9.2}$$
In the following simulations, a sudden change in the process at sampling instant $k = 100$ is considered, so that the process gain is reduced to one-fifth of the initial gain ($B(q^{-1}) = 0.0023 + 0.0021q^{-1}$).
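The coefficients of the discrete-time model can be checked against the continuous-time process (9.1). The sketch below uses the standard closed-form zero-order-hold discretization of an underdamped second-order system. The sampling period is not stated in the text; a period of 1 time unit is an assumption here, inferred from $e^{-0.25} \approx 0.7788$ matching the last denominator coefficient. The function name is hypothetical.

```python
import math


def zoh_underdamped_second_order(gain, c1, c2, h):
    """Discretize gain / (1 + c1*s + c2*s**2) with a zero-order hold.

    Returns (b0, b1, a1, a2) of the difference equation
        y(k) = -a1*y(k-1) - a2*y(k-2) + b0*u(k-1) + b1*u(k-2).
    The closed form is valid only for complex poles (damping ratio < 1).
    """
    w0 = math.sqrt(1.0 / c2)           # natural frequency
    zeta = c1 / (2.0 * c2 * w0)        # damping ratio
    if zeta >= 1.0:
        raise ValueError("formula assumes an underdamped system")
    w = w0 * math.sqrt(1.0 - zeta ** 2)
    alpha = math.exp(-zeta * w0 * h)
    beta = math.cos(w * h)
    gamma = math.sin(w * h)
    ratio = zeta * w0 / w
    a1 = -2.0 * alpha * beta
    a2 = alpha ** 2
    b0 = gain * (1.0 - alpha * (beta + ratio * gamma))
    b1 = gain * (alpha ** 2 + alpha * (ratio * gamma - beta))
    return b0, b1, a1, a2


if __name__ == "__main__":
    # Process (9.1) with the assumed sampling period h = 1.
    b0, b1, a1, a2 = zoh_underdamped_second_order(1.0, 10.0, 40.0, 1.0)
    print("B:", b0, b1)
    print("A: 1,", a1, a2)
    # One-fifth of the gain, as used after k = 100 in the simulation.
    print("B after gain change:", b0 / 5.0, b1 / 5.0)
```

Under this assumption the result agrees with the printed coefficients to within rounding of the last digit, and dividing the numerator by five gives the reduced-gain polynomial quoted above.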
GPC with a fixed process model

The initial process model was assumed to be known (correct structure and parameters). The parameters of the GPC were set to $H_m = 1$ (minimum output horizon), $H_p = 10$ (prediction horizon), $H_c = 1$ (control horizon) and $r = 0$ (control weighting). Fig. 9.3 depicts the resulting mean-level control with fixed parameters. Clearly, after $k = 100$, the mean-level type of control performance is no longer obtained. Instead, a significant overshoot appears and the settling time increases.
Adaptive GPC

In a second simulation, the parameters of the process model were updated using RLS with exponential forgetting (forgetting factor $\lambda = 0.99$). The evolution of the parameters in $B(q^{-1})$ during estimation is illustrated in Fig. 9.4. The updated model was then used in the GPC computations. The result of the control is shown in Fig. 9.5. The original design specifications are fulfilled even though the process changes with time.
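The estimation step of this example can be sketched as follows. This is a minimal open-loop illustration, not the closed-loop simulation of the text: the process is driven by a pseudo-random binary input (an assumption), there is no noise, and the GPC computation is omitted. The rounded coefficients are those of (9.2), the gain drops to one-fifth at $k = 100$, and the function names are hypothetical.

```python
import random


def rls_update(theta, P, phi, y, lam):
    """One recursive least squares step with exponential forgetting."""
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    error = y - sum(phi[i] * theta[i] for i in range(n))
    theta = [theta[i] + gain[i] * error for i in range(n)]
    # P <- (P - gain * phi^T P) / lam, using the symmetry of P.
    P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
         for i in range(n)]
    return theta, P


def memory_horizon(lam):
    """Equivalent memory horizon of exponential forgetting."""
    return 1.0 / (1.0 - lam)


def simulate_and_estimate(n_samples=400, lam=0.99, seed=0):
    """Estimate [a1, a2, b0, b1] while the process gain drops at k = 100."""
    rng = random.Random(seed)
    a1, a2 = -1.7567, 0.7788
    theta = [0.0, 0.0, 0.0, 0.0]
    P = [[1e6 if i == j else 0.0 for j in range(4)] for i in range(4)]
    y1 = y2 = u1 = u2 = 0.0
    history = []
    for k in range(n_samples):
        b0, b1 = (0.0114, 0.0106) if k < 100 else (0.0023, 0.0021)
        y = -a1 * y1 - a2 * y2 + b0 * u1 + b1 * u2
        phi = [-y1, -y2, u1, u2]
        theta, P = rls_update(theta, P, phi, y, lam)
        history.append(theta)
        u = 1.0 if rng.random() < 0.5 else -1.0  # assumed excitation
        y2, y1 = y1, y
        u2, u1 = u1, u
    return history


if __name__ == "__main__":
    hist = simulate_and_estimate()
    print("memory horizon:", memory_horizon(0.99))
    print("b0, b1 before change:", hist[99][2], hist[99][3])
    print("b0, b1 at the end:   ", hist[-1][2], hist[-1][3])
```

With $\lambda = 0.99$ the equivalent memory horizon is $1/(1-\lambda) = 100$ samples, which is the "relatively slow estimator" referred to in the caption of Fig. 9.4. In an indirect adaptive controller the estimate `theta` would be passed to the control design at every sample, following the certainty equivalence principle.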
Figure 9.3: GPC using a process model with constant parameters. At $k = 100$, the process gain is decreased abruptly to one-fifth of the original. The performance of the designed mean-level GPC deteriorates at sampling instants $k > 100$, with large overshoots and a longer settling time.
Figure 9.4: The coefficients in the $B$ polynomial are correctly estimated. A relatively slow estimator was designed, with equivalent memory horizon equal to 100 samples.

Figure 9.5: With adaptive GPC, the performance of the GPC remains as designed for the original process, even when the process gain changes (at $k = 100$).
Figure 9.6: Schematic drawing of the control of a Wiener system based on the inverse of the model of the static nonlinear part. [Block diagram; blocks: controller, Wiener system (linear dynamic system followed by a nonlinear static system), inverse model.]
9.2 Control of Hammerstein and Wiener systems

For a given process, the predictive control problem can be tackled in many ways. This section provides a very simple and interesting approach for the design of control strategies on the basis of Hammerstein and Wiener models. The control based on nonlinear models such as the Hammerstein or Wiener models can be simplified and reduced to the design of controllers for linear systems. Figures 9.6-9.7 illustrate the control structures for Wiener and Hammerstein systems. Provided that the inverse of the nonlinear static part exists and is available [71], the remaining control problem is linear. Thus any linear control design method (such as the GPC, for example) can be applied in a straightforward way.

What is required is the model of the inverse static system. In many cases, it is simplest to identify the inverse model directly from input-output data (process inverse). Alternatively, the static (forward) nonlinearity can be identified and its inverse mapping then identified (model inverse). If a forward static model is available, the inverse may also be solved 'online' by iterating (online solution of model inverse). In some cases it may also be possible to have the inverse model from other sources, such as first principle-based process models, etc.

We will next illustrate this large scale linearization approach with a simulated Wiener GPC example.
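The 'online solution of model inverse' mentioned above can be sketched in a few lines. The static map below is a hypothetical S-shaped curve, loosely resembling a titration curve, chosen only for illustration. Bisection is used here because it is simple and robust for a monotone map; it is a stand-in for whatever iterative search a real implementation would use, and it assumes the solution lies inside the given bracket.

```python
import math


def static_map(z):
    """Hypothetical monotone static nonlinearity (illustration only)."""
    return 7.0 + 3.0 * math.tanh(0.5 * (z - 10.0))


def invert_static_map(y, lo=0.0, hi=20.0, iterations=60):
    """Solve static_map(z) = y for z by bisection.

    Assumes static_map is increasing on [lo, hi] and that y lies between
    static_map(lo) and static_map(hi).
    """
    if not (static_map(lo) <= y <= static_map(hi)):
        raise ValueError("target outside the range covered by the bracket")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if static_map(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    y_measured = static_map(12.3)   # output of the nonlinear static part
    w_target = 8.0                  # desired process output
    z_measured = invert_static_map(y_measured)
    z_target = invert_static_map(w_target)
    # The linear controller then works on the intermediate signals only.
    print("z measured:", z_measured, "z target:", z_target)
```

Mapping both the measured output and the target through the inverse leaves a linear problem in the intermediate signal, which is what allows a linear design such as GPC to be reused unchanged.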
Figure 9.7: Schematic drawing of the control of a Hammerstein system based on the inverse of the model of the static nonlinear part. [Block diagram; blocks: inverse model, Hammerstein system (nonlinear static system followed by a linear dynamic system).]
9.2.1 Simulation example

A MISO Wiener model for the pH neutralization process is identified, and generalized predictive control (GPC) is applied for the control of the process. The control of this process has been studied in a number of papers, e.g., [61] [68] [28].

Process and data

Acid ($q_1$), buffer ($q_2$) and base ($q_3$) streams are mixed in a tank and the effluent pH is measured. The process model [61] consists of three nonlinear ordinary differential equations and a nonlinear output equation for the pH
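($pH_4$):

$$\frac{dh}{dt} = \frac{1}{A}\left(q_1 + q_2 + q_3 - C_v (h+z)^n\right) \tag{9.3}$$

$$\frac{dW_{a4}}{dt} = \frac{1}{Ah}\left[(W_{a1} - W_{a4})\,q_1 + (W_{a2} - W_{a4})\,q_2 + (W_{a3} - W_{a4})\,q_3\right] \tag{9.4}$$

$$\frac{dW_{b4}}{dt} = \frac{1}{Ah}\left[(W_{b1} - W_{b4})\,q_1 + (W_{b2} - W_{b4})\,q_2 + (W_{b3} - W_{b4})\,q_3\right] \tag{9.5}$$

$$0 = W_{a4} + 10^{pH_4 - 14} + W_{b4}\,\frac{1 + 2 \times 10^{pH_4 - pK_2}}{1 + 10^{pK_1 - pH_4} + 10^{pH_4 - pK_2}} - 10^{-pH_4} \tag{9.6}$$

where $h$ is the liquid level in the tank, and $W_{a4}$ and $W_{b4}$ are the reaction invariants of the effluent stream. Table 9.1 gives the nominal values used in the simulations.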
variable                                              symbol     value
tank area                                             $A$        207 cm$^2$
valve coefficient                                     $C_v$      8.75
log of equilibrium constant                           $pK_1$     6.35
log of equilibrium constant                           $pK_2$     10.25
reaction invariant                                    $W_{a1}$   $3 \times 10^{-3}$ M
reaction invariant                                    $W_{a2}$   $-3 \times 10^{-2}$ M
reaction invariant                                    $W_{a3}$   $-3.05 \times 10^{-3}$ M
reaction invariant                                    $W_{b1}$   0 M
reaction invariant                                    $W_{b2}$   $3 \times 10^{-2}$ M
reaction invariant                                    $W_{b3}$   $5 \times 10^{-5}$ M
time delay                                            $\theta$   0 min
acid flowrate                                         $q_1$      16.6 ml/s
buffer flowrate                                       $q_2$      0.55 ml/s
base flowrate                                         $q_3$      15.6 ml/s
liquid level in tank                                  $h$        14 cm
effluent pH                                           $pH_4$     7.0
vertical distance between outlet and bottom of tank   $z$        0 cm
valve exponent                                        $n$        0.5
reaction invariant                                    $W_{a4}$   $-4.32 \times 10^{-4}$ M
reaction invariant                                    $W_{b4}$   $5.28 \times 10^{-4}$ M

Table 9.1: Estimated parameters of the linear dynamic block.
            $b_0$     $b_1$      $a_1$    $a_2$
$B_1, A_1$  0.2763   (0.0757)   0.6737   0.1257
$B_2, A_2$  0.2230   (0.2799)   0.0509   0.5480
$B_3, A_3$  0.2329   (0.0653)   0.8900   0.0193

Table 9.2: Identified parameters of the linear dynamic transfer polynomials.

Training data of 500 samples was generated by simulating the model (9.3)-(9.6). The input signal consisted of pseudo-random sequences for each input, with a maximum amplitude of 50% of the nominal value. A test set of 500 data patterns was generated in a similar way.

Model structure and parameter estimation

A Wiener model was identified using a SNN of 5 hidden nodes for the static part; the linear dynamics were identified using $n_{B_1} = n_{B_2} = n_{B_3} = 1$, $n_{A_1} = n_{A_2} = n_{A_3} = 2$, and $d_1 = d_2 = d_3 = 1$. Model inputs consisted of the three input flows to the tank, $q_1$, $q_2$ and $q_3$; the system output was the pH at the outlet. For reference purposes, a Wiener model with a linear static part was also identified ($y = K^T z$). Parameters were estimated using the Levenberg-Marquardt method.

Results on identification

The performance of the identified model is illustrated in Figs. 9.8-9.10. Fig. 9.8 shows the performance on the training set. The Wiener model output follows closely the output of the true plant. Simulation on the test set, Fig. 9.9, reveals that the mapping is not perfect. This is also indicated by the root-mean-squared errors on the training set (RMSE = 0.1192) and the test set (RMSE = 0.2576). However, a reasonably accurate description of the pH process was obtained. The static mapping is illustrated in Fig. 9.10, showing the titration curves for each input flow when the other flows have their nominal values. Table 9.2 shows the estimated parameters of the transfer polynomials.

Control design

The objective was to control the pH ($pH_4$) in the tank by manipulating the base flow rate ($q_3$). In order to fulfill the objective, a GPC controller was designed for the plant.
Figure 9.8: Training data for the pH neutralization model. The system inputs are shown in the three upper plots. The system output is the lower plot (solid line). The predicted output (dashed line) follows closely the training data; dashed lines on the upper plots show the corresponding intermediate variables. [Horizontal axis: time (samples).]

Figure 9.9: Performance on test data. [Horizontal axis: time (samples).]
Figure 9.10: True (solid line) and identified (dashed line) static mappings. [Three panels, with horizontal axes $q_1$, $q_2$ and $q_3$.]

For Wiener and Hammerstein systems, a linear control design can be accomplished if the inverse of the nonlinear static part is available. In the Wiener system, when the process output and the target are mapped through an inverse nonlinear mapping of the static part, the remaining system is a linear one (see Fig. 9.6).

In the simulations, the inverse problem was solved online: the process model and a Gauss-Newton search were used in order to find a $z_3$ (SNN input) such that $\hat{y}(k)$ (SNN output) equals $y(k)$ (measured output)¹. The control problem was then based on the error between the desired output $z_w(k)$ and the intermediate signal $z_3$, so that the desired performance characteristics were fulfilled. The GPC cost function is of the form
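$$J = \sum_{i=H_m}^{H_p} \left[z_w(k+i) - \hat{z}(k+i)\right]^2 + r \sum_{i=0}^{H_c-1} \left[\Delta u(k+i)\right]^2 \tag{9.7}$$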
where $z_w(k+i)$ are the desired future responses. In the case of Wiener systems, $z_w$ and $z$ can be obtained from the desired and measured process outputs $w$ and $y$, by solving the inverse of the static mapping.

¹ Note that in this case the (global) inverse does not need to exist, as the local solution (depending on the initial value of the search) was found. For the considered process, the global inverse would exist and could be identified.

Control simulations

First, a deadbeat controller was designed for the system: $H_m = \deg B + d + 1 = 3$, $H_c = \deg A + 1 = 3$, $H_p \ge \deg A + \deg B + 1$ ($H_p = 8$) and $r = 0$. The ideal response is shown in Fig. 9.11, obtained using the identified Wiener model both in the controller and as the simulated plant. The upper part of Fig. 9.11 shows the control signal $u$ and the intermediate signal $z$. The lower part shows the target $w$ and the plant output $y$. The desired target sequence consisted of four step changes to the process (at $t = \{375, 750, 1125, 1500\}$ seconds). At $t = \{1875, 2250, 2625\}$ seconds three disturbances affect the process (unmeasured $\pm 10\%$ changes in $q_1$).

As shown by the simulation responses, the ideal deadbeat response is fast and accurate. Note that the steady-state gain of the $u$-$z$ controller is one. However, the magnitude and rate of the control input signal make it unrealistic for many real process control applications. (Note how in the simulation the input $q_3$ was restricted to be non-negative, which affects the control at $t = 375$, Fig. 9.11.) A correct estimate of the I/O behavior of the process also needs to be available.

In order to get an implementable control signal, it is common in GPC to decrease the control horizon $H_c$ (often $H_c = 1$). In mean-level control ($H_c = 1$, $H_p$ large) the closed loop will have open-loop plant dynamics; with smaller $H_p$, a tighter control is obtained. Alternatively (the two options are not exclusive), a realizable controller can be obtained by introducing a non-zero value for the parameter $r$. Fig. 9.12 shows the simulation when using the true process, (9.3)-(9.6), deadbeat control settings, and $r = 0.1$. This parameter setting results in a relatively smooth control input and a small overshoot. The small deviations from unit static gain are due to the process-model mismatch. Due to the integral action in GPC, there is no steady-state error.

For comparison, a GPC controller based on the linear pH model was tested. The simulations are shown in Fig. 9.13. Let us assume that the desired performance was a small overshoot and smooth control actions, as in Fig. 9.12. Then the system response at pH = 7 is as desired. At pH = 9, however, the system is overdamped, and at pH = 5 the system is underdamped. Clearly, the changes in the gain of the system are reflected in the control, and the closed-loop performance changes depending on the operating point. These design difficulties were avoided in the Wiener control approach.

Results

In the example, identification and control of a pH neutralization process were considered. First, a MISO Wiener model was identified for the process. Using the identified model, a GPC controller was designed. Simulations showed good results. Simplicity of nonlinear process control is one of the main motivations for Wiener and Hammerstein systems. Provided that the inverse of the static part is available, linear control design methods (e.g., pole placement) can be directly applied for the control of the linear subsystem. In the example we showed that an explicit inverse model is not always necessary. (Note that in some cases a global inverse may not exist.) Although fixed parameter models
Figure 9.11: Ideal deadbeat control. Upper part: controller output $u$ (dashed line) and 'measured' intermediate signal $z$. Lower part: plant output $y$ (solid line) and target $w$ (dotted line). [Panel title: Wiener GPC: Deadbeat with a perfect model; horizontal axis: time (seconds).]

Figure 9.12: Simulation of Wiener GPC control of the pH neutralization process. [Panel title: Wiener GPC: Deadbeat with control weighting; horizontal axis: time (seconds).]

Figure 9.13: Simulation of linear GPC control of the pH neutralization process. [Panel title: Linear GPC: Deadbeat with control weighting; horizontal axis: time (seconds).]
were applied in the example, adaptive control applications are straightforward to conduct, provided that robustness of the closed-loop learning system can be guaranteed. From this point of view, the Wiener and Hammerstein approaches do not pose any additional difficulties.

9.2.2 Second order Hammerstein systems

In this subsection, we will consider the special case of predictive control for a second order Hammerstein system.

Second order Hammerstein model

In order for a control system to function properly, it should not be unduly sensitive to inaccuracies in the process model. We shall be concerned with a class of SISO discrete-time Hammerstein models

$$A(q^{-1})\,y(k) = \sum_{p=1}^{P} B_p(q^{-1})\,u^p(k-1) + \frac{\xi(k)}{\Delta(q^{-1})} \tag{9.8}$$
where $B_p(q^{-1})$ is a polynomial of degree $n_{B_p}$. This model belongs to the following class

$$A(q^{-1})\,y(k) = B(q^{-1})\,f(u(k)) \tag{9.9}$$

where $f(\cdot)$ is a nonlinear function. Let us introduce the following auxiliary (pseudo) input [42] [43]

$$x(k) = \sum_{p=1}^{P} B_p(q^{-1})\,u^p(k) \tag{9.10}$$

The process model (9.8) will be rewritten as follows:

$$A(q^{-1})\,y(k) = q^{-1}x(k) + \frac{\xi(k)}{\Delta(q^{-1})} \tag{9.11}$$

Let us derive a predictive controller for the special case $P = 2$:

$$y(k) = \frac{1}{A(q^{-1})}\left[B_1(q^{-1})\,u(k-1) + B_2(q^{-1})\,u^2(k-1)\right] + \frac{1}{A(q^{-1})\,\Delta(q^{-1})}\,\xi(k) \tag{9.12}$$
Prediction

In order to separate the available information from the unavailable (separate past and future noise), let us consider the following polynomial identity (see Chapter 3, Section 3.3.6)

$$\frac{1}{A(q^{-1})\,\Delta(q^{-1})} = E_j(q^{-1}) + q^{-j}\,\frac{F_j(q^{-1})}{A(q^{-1})\,\Delta(q^{-1})} \tag{9.13}$$

from which we have

$$E_j(q^{-1})\,A(q^{-1})\,\Delta(q^{-1}) = 1 - q^{-j}F_j(q^{-1}) \tag{9.14}$$

Multiplying both sides of the model (9.12) by $q^{j}E_j(q^{-1})\,A(q^{-1})\,\Delta(q^{-1})$ and substituting (9.14) leads to

$$y(k+j) = G_{1,j}(q^{-1})\,\Delta u(k+j-1) + G_{2,j}(q^{-1})\,\Delta u^2(k+j-1) + F_j(q^{-1})\,y(k) + E_j(q^{-1})\,\xi(k+j) \tag{9.15}$$

where

$$G_{i,j}(q^{-1}) = E_j(q^{-1})\,B_i(q^{-1}) = g^i_{j,0} + g^i_{j,1}q^{-1} + \cdots + g^i_{j,n_{B_i}+j-1}\,q^{-(n_{B_i}+j-1)} \tag{9.16}$$

Since the degree of the polynomial $E_j(q^{-1})$ is $j-1$, the noise components $E_j(q^{-1})\,\xi(k+j)$ are all in the future, and since $\xi(k)$ is assumed to be zero-mean white noise, the prediction $\hat{y}(k+j)$ of $y(k+j)$ in the mean squares sense is given by

$$\hat{y}(k+j) = G_{1,j}(q^{-1})\,\Delta u(k+j-1) + G_{2,j}(q^{-1})\,\Delta u^2(k+j-1) + F_j(q^{-1})\,y(k). \tag{9.17}$$

Notice that the prediction $\hat{y}(k+j)$ depends upon: i) past and present measured outputs; ii) past known control increments; and iii) present and future control increments yet to be determined. Let us denote by $f(k+j)$ the component of the prediction $\hat{y}(k+j)$ composed of signals known (available) at sampling instant $k$. For example, the expressions of $f(k+1)$ and $f(k+m)$ are respectively given by

$$f(k+1) = \left[G_{1,1}(q^{-1}) - g^1_{1,0}\right]\Delta u(k) + \left[G_{2,1}(q^{-1}) - g^2_{1,0}\right]\Delta u^2(k) + F_1(q^{-1})\,y(k) \tag{9.18}$$
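$$\begin{aligned} f(k+m) ={}& \left[G_{1,m}(q^{-1}) - g^1_{m,0} - g^1_{m,1}q^{-1} - \cdots - g^1_{m,m-1}q^{-(m-1)}\right]\Delta u(k+m-1) \\ &+ \left[G_{2,m}(q^{-1}) - g^2_{m,0} - g^2_{m,1}q^{-1} - \cdots - g^2_{m,m-1}q^{-(m-1)}\right]\Delta u^2(k+m-1) \\ &+ F_m(q^{-1})\,y(k) \end{aligned} \tag{9.19}$$

Let us rewrite equation (9.17) for $j = 1, \ldots, H_p$ in the following matrix form

$$\hat{Y} = G_1 U + G_2 U^2 + F \tag{9.20}$$

where

$$\hat{Y} = \left[\hat{y}(k+1) \;\cdots\; \hat{y}(k+H_p)\right]^T \tag{9.21}$$

$$U = \left[\Delta u(k) \;\; \Delta u(k+1) \;\cdots\; \Delta u(k+H_p-1)\right]^T = \left[u_0 \;\; u_1 \;\cdots\; u_{H_p-1}\right]^T \tag{9.22}$$

$$U^2 = \left[\Delta u^2(k) \;\; \Delta u^2(k+1) \;\cdots\; \Delta u^2(k+H_p-1)\right]^T \tag{9.23}$$

$$F = \left[f(k+1) \;\cdots\; f(k+H_p)\right]^T \tag{9.24}$$

$$G_i = \begin{bmatrix} g^i_{1,0} & 0 & \cdots & 0 \\ g^i_{2,1} & g^i_{2,0} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ g^i_{H_p,H_p-1} & g^i_{H_p,H_p-2} & \cdots & g^i_{H_p,0} \end{bmatrix}, \quad i = 1, 2. \tag{9.25}$$

We denote by $g^i_k$ the $k$th column of the matrices $G_i$ ($i = 1, 2$):

$$G_i = \left[\, g^i_0 \;\; g^i_1 \;\cdots\; g^i_{H_p-1} \,\right] \tag{9.26}$$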
Cost function

In what follows, we shall be concerned with the minimization of the following control objective

$$J = E\left\{\sum_{j=1}^{H_p}\left[\hat{y}(k+j) - w(k+j)\right]^2 + \sum_{j=1}^{H_p} r\left[\Delta u(k+j-1)\right]^2\right\} \tag{9.27}$$

which, with (9.20), yields

$$J = E\left\{\left(G_1U + G_2U^2 + F - W\right)^T\left(G_1U + G_2U^2 + F - W\right) + rU^TU\right\} \tag{9.28}$$

where

$$W = \left[w(k+1) \;\cdots\; w(k+H_p)\right]^T \tag{9.29}$$

Let us set

$$V = G_1U + G_2U^2 + F - W \tag{9.30}$$

Then (9.28) is equivalent to

$$J = E\left\{V^TV + rU^TU\right\} \tag{9.31}$$

Minimization of cost function

To minimize this cost function, we have to calculate the gradient of $J$ with respect to the control increments $u_i$ and their squares $u_i^2$ ($i = 0, \ldots, H_p - 1$):

$$\frac{\partial J}{\partial u_i} = 2V^T\frac{\partial V}{\partial u_i} + 2ru_i. \tag{9.32}$$

From (9.30), it follows that the gradient $\partial V / \partial u_i$ is given by

$$\frac{\partial V}{\partial u_i} = g^1_i + 2u_i\,g^2_i \tag{9.33}$$

which leads to

$$\frac{\partial J}{\partial u_i} = 2V^Tg^1_i + 4u_iV^Tg^2_i + 2ru_i \tag{9.34}$$
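The partial derivative of the criterion $J$ with respect to $U$

$$\frac{\partial J}{\partial U} = \left[\frac{\partial J}{\partial u_0} \;\; \frac{\partial J}{\partial u_1} \;\cdots\; \frac{\partial J}{\partial u_{H_p-1}}\right] \tag{9.35}$$

may be written as

$$\begin{aligned} \frac{\partial J}{\partial U} ={}& \left[\,2V^Tg^1_0 \;\; 2V^Tg^1_1 \;\cdots\; 2V^Tg^1_{H_p-1}\,\right] \\ &+ \left[\,4u_0V^Tg^2_0 \;\; 4u_1V^Tg^2_1 \;\cdots\; 4u_{H_p-1}V^Tg^2_{H_p-1}\,\right] \\ &+ \left[\,2ru_0 \;\; 2ru_1 \;\cdots\; 2ru_{H_p-1}\,\right] \end{aligned} \tag{9.36}$$

This gradient can be rewritten in a more compact form as

$$\frac{\partial J}{\partial U} = 2V^TG_1 + 4V^TG_2\,\mathrm{diag}(U) + 2rU^T \tag{9.37}$$

The current control law is given as the first element of the vector $U$, obtained by equating the gradient $\partial J / \partial U$ to zero, i.e.,

$$\frac{\partial J}{\partial U} = 2V^TG_1 + 4V^TG_2\,\mathrm{diag}(U) + 2rU^T = 0. \tag{9.38}$$

The complexity of this expression makes it intractable to calculate the optimal control increment analytically. In the sequel, we shall present Newton's method for the computation of this current control. For this purpose, let us calculate the elements of the Hessian

$$\frac{\partial^2 J}{\partial U^2} = \begin{bmatrix} \dfrac{\partial^2 J}{\partial u_0^2} & \dfrac{\partial^2 J}{\partial u_0\,\partial u_1} & \cdots & \dfrac{\partial^2 J}{\partial u_0\,\partial u_{H_p-1}} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 J}{\partial u_{H_p-1}\,\partial u_0} & \dfrac{\partial^2 J}{\partial u_{H_p-1}\,\partial u_1} & \cdots & \dfrac{\partial^2 J}{\partial u_{H_p-1}^2} \end{bmatrix} \tag{9.39}$$

From (9.32), we derive

$$\frac{\partial^2 J}{\partial u_i\,\partial u_j} = \frac{\partial}{\partial u_i}\left[\frac{\partial J}{\partial u_j}\right] \tag{9.40}$$

$$\phantom{\frac{\partial^2 J}{\partial u_i\,\partial u_j}} = \frac{\partial}{\partial u_i}\left[2V^T\frac{\partial V}{\partial u_j}\right] + 2r\delta_{i,j} \tag{9.41}$$

where $\delta_{i,j}$ represents the Kronecker symbol. By (9.33), we obtain

$$\frac{\partial^2 J}{\partial u_i\,\partial u_j} = \frac{\partial}{\partial u_i}\left[2V^Tg^1_j + 4u_jV^Tg^2_j\right] + 2r\delta_{i,j} \tag{9.42}$$

$$\phantom{\frac{\partial^2 J}{\partial u_i\,\partial u_j}} = 2\left(\frac{\partial V}{\partial u_i}\right)^T\left[g^1_j + 2u_j\,g^2_j\right] + 4\delta_{i,j}V^Tg^2_j + 2r\delta_{i,j} \tag{9.43}$$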
Again, taking (9.33) into account, we deduce that

$$\frac{\partial^2 J}{\partial u_i\,\partial u_j} = 2\left[g^1_i + 2u_i\,g^2_i\right]^T\left[g^1_j + 2u_j\,g^2_j\right] + 4\delta_{i,j}V^Tg^2_j + 2r\delta_{i,j}.$$

Finally, the second partial derivative of the criterion $J$ with respect to $U$ is given by

$$\begin{aligned} \frac{\partial^2 J}{\partial U^2} ={}& 2G_1^TG_1 + 4\,\mathrm{diag}(U)\left[G_2^TG_1\right] + 4\left[G_1^TG_2\right]\mathrm{diag}(U) \\ &+ 8\,\mathrm{diag}(U)\left[G_2^TG_2\right]\mathrm{diag}(U) + 4\,\mathrm{diag}\left(V^TG_2\right) + 2rI \end{aligned} \tag{9.47}$$

The optimal control increment can be obtained iteratively on the basis of the Hessian (9.47). Even if the considered model (second order Hammerstein model) is relatively simple, the development of the long-range predictive control strategy requires many calculations, which alter considerably the inherent robustness of long-range predictive controllers.

The next section is dedicated to the development of predictive control strategies for both unconstrained and constrained systems. This development is based on stochastic approximation techniques and sigmoid neural networks. Any nonlinear basis function approach can also be used.
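The Newton iteration on the gradient (9.34) and the Hessian (9.47) derived in this subsection is easiest to follow in the scalar case $H_p = 1$, where $G_1$, $G_2$, $F$, $W$ and $U$ reduce to numbers. The sketch below is an illustration under that simplification; the numerical values of `g1`, `g2`, `f`, `w` and `r` are hypothetical, and the second-order term is treated as the square of the increment, as in the derivative (9.33).

```python
def residual(u, g1, g2, f, w):
    """Scalar version of V = G1*U + G2*U^2 + F - W."""
    return g1 * u + g2 * u * u + f - w


def gradient(u, g1, g2, f, w, r):
    """Scalar version of the gradient (9.34)."""
    v = residual(u, g1, g2, f, w)
    return 2.0 * v * (g1 + 2.0 * g2 * u) + 2.0 * r * u


def hessian(u, g1, g2, f, w, r):
    """Scalar version of the Hessian (9.47)."""
    v = residual(u, g1, g2, f, w)
    return 2.0 * (g1 + 2.0 * g2 * u) ** 2 + 4.0 * g2 * v + 2.0 * r


def newton_control_increment(g1, g2, f, w, r, u0=0.0, tol=1e-12, max_iter=50):
    """Iterate u <- u - gradient/hessian until the gradient vanishes."""
    u = u0
    for _ in range(max_iter):
        grad = gradient(u, g1, g2, f, w, r)
        if abs(grad) < tol:
            break
        hess = hessian(u, g1, g2, f, w, r)
        if hess <= 0.0:
            raise ValueError("Hessian not positive; Newton step is unsafe")
        u -= grad / hess
    return u


if __name__ == "__main__":
    # Hypothetical one-step problem: drive g1*u + g2*u^2 + f towards w.
    u_opt = newton_control_increment(g1=1.0, g2=0.2, f=0.0, w=1.0, r=0.1)
    print("control increment:", u_opt)
    print("gradient at solution:", gradient(u_opt, 1.0, 0.2, 0.0, 1.0, 0.1))
```

Because the cost is a quartic in the increment, the iteration finds a stationary point that depends on the starting value, and the Hessian check is a simple guard rather than a full globalization strategy.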
9.3 Control of nonlinear systems

Several studies on the use of SNNs (sigmoid neural networks) as the basis for model predictive controllers (with finite prediction horizons) have been published [82] [77] [91] [81]. To avoid the use of locally linearized models of the process to be controlled (the classical GPC approach), and complex optimization techniques, we present a simple solution for deriving long-range predictive control based on SNN [66]². The design of this solution is based on the training of two dynamic neural networks. The NOE and the NARX neural networks are respectively used as a multistep predictor and for calculating the control signal (neural controller). The multilayer feedforward SNN is trained so as to achieve the control objective. The main idea presented in this study concerns the use of stochastic recursive approximation techniques as a learning tool for the design of neural network controllers to solve both unconstrained and constrained predictive control problems (minimization of a long-range quadratic cost function and preventing violations of process constraints). The control approach described below is general and does not depend on the structure of the control objective and the constraints.

² This section is based on K. Najim, A. Rusnak, A. Meszaros and M. Fikar. Constrained Long-Range Predictive Control Based on Artificial Neural Networks. International Journal of Systems Science, 28(12): 1211-1226, 1997. Reproduced with permission from Taylor & Francis Ltd.
9.3.1 Predictive control

The formulation is based on a NARIMAX model, and on the minimization of the conditional expectation of a quadratic function measuring the control effort and the distance between the predicted system output and some predicted reference sequence over the receding horizon, i.e.

$$J = E\left\{\sum_{j=H_m}^{H_p}\left(w(k+j) - y(k+j)\right)^2 + r\sum_{j=1}^{H_c}\left(\Delta u(k+j-1)\right)^2 \,\Big|\, k\right\} \tag{9.48}$$

where $y$, $\Delta u$, and $w$ are the controlled variable, future control increments, and set point, respectively. $H_m$ and $H_p$ are, respectively, the minimum and the maximum prediction horizon. The weighting factor $r$ serves for penalization of the future control increments $\Delta u$. In what follows, the use of neural networks (see Chapter 5) for prediction and control is considered.
9.3.2 Sigmoid neural networks

Consider the problem concerning the design of an algorithm which at time $k$ predicts simultaneously the outcomes of the process $\{y(k)\}$ at times $k+1, k+2, \ldots, k+H_p$, where $H_p$ is the prediction horizon. NARX SNNs, using a one-step-ahead structure, generally perform poorly over a trajectory (prediction horizon) because errors are amplified when inaccurate network outputs are recycled to the input layer. Recurrent SNNs are more appropriate for application in model predictive control [94], and in this study we have used a NOE SNN as a multistep predictor. Unlike NARX SNNs, where information flows only from the input layer to the outputs, recurrent SNNs include delayed information flow back to preceding layers. A NOE SNN predictor is depicted in Fig. 9.14. The SNN inputs consisted of the previous and current values of process inputs and predicted plant outputs. The values of process outputs come into the NOE SNN only indirectly in the process of training, when the future predicted output is compared with the actual process output. The training process of this SNN predictor was carried out using a backpropagation through time algorithm [94].

Figure 9.14: Structure of a sigmoid neural network (SNN) predictor.

A multilayer feedforward SNN was used as a controller. The proposed structure of this SNN controller is depicted in Fig. 9.15. The controller inputs consisted of the plant predictions, which are provided by the predictor, and the desired value of the plant output. The outputs correspond to the present and future increments of the control signal. The weights of this SNN controller were updated directly using a stochastic approximation algorithm, which minimizes the control objective (9.48) subject to constraints (any control objective can be considered in this control approach). These weights are considered as the controller parameters.

Figure 9.15: Structure of a SNN controller.

Since control action is based on the prediction of the plant behavior, offset can occur due to disturbances and model mismatch when the SNN is used as a dynamic model of the controlled process. Therefore the plant output is predicted at each sampling time as follows:

$$y(k) = f(U, \theta) + d(k) \tag{9.49}$$

where $U$ is the SNN input vector, $\theta$ is a vector of the weights to be optimized, and $d$ is a disturbance. This disturbance (correction) of the prediction is computed by the following equation:

$$d(k+i) = d(k) = y(k) - \hat{y}(k) \tag{9.50}$$

where $y$ is the current value of the plant output and $\hat{y}$ is the prediction of $y$ generated by the SNN predictor. The disturbance is assumed to be constant over the prediction horizon.

The general schematic diagram of the predictive controller is depicted in Fig. 9.16. At each sampling period, the following signals are fed into the SNN predictors:
• past and present plant outputs,
• past values of control actions applied to the process, and
• the calculated future control sequence from the last sampling period.

The predictor calculates predictions of the plant outputs over the relevant horizon. These are corrected with the calculated deviation between actual process output and predicted process output at time $k$. Next the predictions are fed, together with the set point value (or sequence of future set point values for programmed set points), into the SNN controller. This minimizes the criterion (9.48), constructs the future control increments, and closes the inner loop. This procedure is repeated until the future control increments converge. The algorithm used for training this neural network controller is described in the following section.

Figure 9.16: Structure of a predictive control system using sigmoid neural networks. [Block diagram; blocks: SNN predictor, SNN controller, process; signals: past and present process outputs, past and present control actions, model predictions, disturbances, set point, future control actions, new control action, process output.]

9.3.3 Stochastic approximation
In this section our main emphasis will be on stochastic approximation techniques [44]. The synthesis of the neural network controller is formulated as the determination of the weights which minimize the unconstrained (or constrained) control objective function $J$, i.e.,

$$\theta^{*} = \arg\min_{\theta} J(\theta) \tag{9.51}$$
where $\theta$ is the weight vector. The optimization problem (9.51) can be solved using stochastic approximation techniques. A key feature of many practical control problems is the presence of constraints on both controlled and manipulated variables. Inequality constraints arise commonly in process control problems due to physical limitations of plant equipment. For example, the control objective may be to minimize some quadratic cost function while satisfying constraints of product quality and avoiding undesirable operating regimes (flooding in a liquid-liquid extraction column, etc.).

Let us consider the following constrained optimization problem:

$$\min_{\theta} J_0(\theta) \tag{9.52}$$

under the constraints

$$J_i(\theta) \leq 0, \quad (i = 1, \ldots, m) \tag{9.53}$$
$J_0(\theta)$ is associated with the control objective defined by (9.48). The constraints $J_i(\theta) \leq 0$ $(i = 1, \ldots, m)$ are usually associated with physical limitations of the process (valves, reactor volume, etc.). Let us introduce the Lagrange function [93] defined by

$$L(\theta, \Phi) = J_0(\theta) + \sum_{j=1}^{m} \phi_j J_j(\theta) \tag{9.54}$$
where $\Phi = [\phi_1, \ldots, \phi_m]^T$ is the Kuhn-Tucker vector. To solve the optimization problem (9.52)-(9.53), an iterative algorithm based on stochastic approximation techniques was proposed by Walk [93]. This algorithm simultaneously maximizes $L(\theta, \Phi)$ with respect to $\Phi$ and minimizes it with respect to $\theta$. It is presented below.

Let the estimates $(\theta_k, \Phi_k)$ be available at time $k$, where $\theta_k$ is a $P$-dimensional random vector and $\Phi_k = (\phi_{k,i})$, $i = 1, \ldots, m$, is an $m$-dimensional random vector with $\phi_{k,i} > 0$ $(k \in \mathbb{N})$ on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$. Let $a_k$, $c_k$ $(k \in \mathbb{N})$ be real positive sequences tending to zero and satisfying:

$$\sum_k a_k = \infty, \qquad \sum_k a_k c_k < \infty, \qquad \sum_k a_k^2 c_k^{-2} < \infty \tag{9.55}$$
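The observation noise (the contamination of function values) is modeled by square-integrable real random variables $V_k^i$, $V_{k,l}^i$ $(i = 0, \ldots, m;\ l = 1, \ldots, P;\ k \in \mathbb{N})$. The optimization algorithm is given by [93]:

$$\Phi_{k+1} = \Phi_k + a_k D_{\Phi} L(\theta_k, \Phi_k) \tag{9.56}$$

where

$$\left(D_{\Phi} L(\theta_k, \Phi_k)\right)_i = \max\left\{ J_i(\theta_k) - V_k^i,\ -\frac{\phi_{k,i}}{a_k} \right\}, \quad (i = 1, \ldots, m) \tag{9.57}$$

and

$$\theta_{k+1} = \theta_k - a_k D_{\theta} L(\theta_k, \Phi_k) \tag{9.58}$$

where

$$\left(D_{\theta} L(\theta_k, \Phi_k)\right)_l = (2c_k)^{-1}\Big\{ \left[J_0(\theta_k + c_k e_l) - J_0(\theta_k - c_k e_l) - V_{k,l}^0\right] + \sum_{i=1}^{m} \phi_{k,i}\left[J_i(\theta_k + c_k e_l) - J_i(\theta_k - c_k e_l) - V_{k,l}^i\right] \Big\} \tag{9.59}$$

$e_l$ is a $P$-dimensional null vector with 1 as the $l$'th coordinate $(l = 1, \ldots, P)$. With the $\sigma$-algebra $\mathcal{A}_k$ defined as follows

$$\mathcal{A}_k = \sigma\left(\theta_1, \Phi_1, V_1^i, \ldots, V_{k-1}^i\ (i = 1, \ldots, m),\ V_{1,l}^i, \ldots, V_{k-1,l}^i\ (i = 0, \ldots, m;\ l = 1, \ldots, P)\right)$$

it is assumed that the observation noise satisfies the conditions (9.60)-(9.63) [expressions illegible in the source].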
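The iteration (9.56)-(9.59) can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used for the experiments reported in this chapter: the function name `constrained_kw`, the toy objective and constraint, the gain sequences $a_k = a_0/k$ and $c_k = c_0 k^{-1/3}$ (one choice consistent with (9.55)), and the Gaussian noise model are all assumptions made for the example.

```python
import numpy as np


def constrained_kw(J0, constraints, theta0, n_iter=3000,
                   a0=1.0, c0=0.1, noise_std=0.0, seed=0):
    """Primal-dual finite-difference iteration in the spirit of (9.56)-(9.59).

    J0 and every entry of `constraints` map a parameter vector to a scalar;
    the constraints are meant as J_i(theta) <= 0.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    phi = np.zeros(len(constraints))      # Kuhn-Tucker estimates, kept >= 0
    P = theta.size

    def observe(f, x):
        # Function value contaminated by zero-mean noise (assumed Gaussian).
        return f(x) + noise_std * rng.standard_normal()

    for k in range(1, n_iter + 1):
        a_k = a0 / k                      # assumed gain sequence
        c_k = c0 / k ** (1.0 / 3.0)       # assumed finite-difference width
        grad = np.zeros(P)
        for l in range(P):
            e_l = np.zeros(P)
            e_l[l] = 1.0
            up, down = theta + c_k * e_l, theta - c_k * e_l
            diff = observe(J0, up) - observe(J0, down)
            for phi_i, Ji in zip(phi, constraints):
                diff += phi_i * (observe(Ji, up) - observe(Ji, down))
            grad[l] = diff / (2.0 * c_k)  # central difference of L in theta_l
        d_phi = np.array([observe(Ji, theta) for Ji in constraints])
        theta = theta - a_k * grad                   # descent in theta
        phi = np.maximum(phi + a_k * d_phi, 0.0)     # ascent in phi, clipped at 0
    return theta, phi


# Hypothetical toy problem: minimize the distance to (2, 1) subject to
# theta_1 + theta_2 <= 2.
def J0(th):
    return (th[0] - 2.0) ** 2 + (th[1] - 1.0) ** 2


def J1(th):
    return th[0] + th[1] - 2.0


theta_hat, phi_hat = constrained_kw(J0, [J1], [0.0, 0.0])
```

For this toy problem the constrained minimizer is (1.5, 0.5) with multiplier 1, which the iterates approach slowly, as expected for gains of order $1/k$. Clipping the multiplier at zero plays the role of the second argument of the max in (9.57).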
Under these assumptions, it has been shown [93] that this algorithm converges almost surely to the optimal solution. In this algorithm, the components of the gradients of the Lagrange function with respect to the weights $\theta$ and the Kuhn-Tucker parameters $\Phi$ are estimated by finite differences. The convergence of this algorithm has been proved by Walk [93], as well as a central limit theorem with convergence order k, which is also achieved for the Kiefer-Wolfowitz method, to which the considered algorithm reduces if there are no constraints. To demonstrate the performance and the feasibility of this approach we applied it to control a continuous-flow, stirred biochemical reactor and a fixed-bed tubular chemical reactor.

9.3.4 Control of a fermenter
Control problems in biotechnological processes have gained increasing interest because of the great number of applications, mainly in the pharmaceutical industry and biological depollution [62]. We considered a model of a continuous-flow, stirred biochemical reactor (fermenter). This model, which describes the growth of Saccharomyces cerevisiae on glucose with continuous feeding, was adopted from [58]. It is based on a hypothesis in [89]: a limited oxidation capacity leading to formation of ethanol under conditions of oxygen limitation or an excessive glucose concentration.

Process

The dynamic model is derived from mass balance considerations. It is described by the following equations:

Cell mass concentration

$$\frac{dc_x}{dt} = \ldots \tag{9.64}$$

[right-hand side illegible in the source]

Glucose (substrate) concentration

$$\frac{dc_s}{dt} = D_l(c_{s,in} - c_s) - Q_s c_x \tag{9.65}$$

Ethanol (product) concentration

$$\frac{dc_e}{dt} = D_l(c_{e,in} - c_e) + (Q_{e,pr} - Q_e) c_x \tag{9.66}$$

Carbon dioxide concentration

$$\frac{dc_c}{dt} = D_g(c_{c,in} - c_c) + Q_c c_x \tag{9.67}$$

Dissolved oxygen concentration

$$\frac{dc_o}{dt} = D_l(c_{o,in} - c_o) + Na - Q_o c_x \tag{9.68}$$

Gas phase oxygen concentration

$$\frac{dc_g}{dt} = D_g(c_{g,in} - c_g) - Na \tag{9.69}$$

where

$$D_l = \frac{q_l}{V_l} \quad \text{and} \quad D_g = \frac{q_g}{V_g} \tag{9.70}$$
The mathematical description of the kinetic model mechanisms is given in Table 9.3. The model parameters are given in Table 9.4. The initial conditions for equations (9.64)-(9.69) are given in Table 9.5. The fermenter model was simulated using the Runge-Kutta method. The static behavior of the bioprocess is depicted in Fig. 9.17; the variation of the steady-state gain shown there gives information about the nonlinear behavior of the bioprocess. The experiments described here illustrate the use of the predictive control algorithm when applied to controlling the dissolved oxygen concentration, $c_o(t)$, using the dilution rate, $D_g(t)$, as the manipulated variable.
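As an illustration of the simulation step, the classical fourth-order Runge-Kutta scheme can be sketched as follows. The text does not say which Runge-Kutta variant or step size was used, so the scheme, the helper names `rk4_step` and `simulate`, and the test problem are assumptions: only the dilution term of a balance such as (9.65) is integrated (the kinetic term is dropped), with hypothetical values $D = 1.0$ h$^{-1}$ and $c_{in} = 10^{-3}$ mol/l, so that the result can be compared with the analytic solution.

```python
import math


def rk4_step(f, t, x, h):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h / 2.0, x + h / 2.0 * k1)
    k3 = f(t + h / 2.0, x + h / 2.0 * k2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def simulate(f, x0, t_end, h):
    """Integrate from t = 0 to t_end with a fixed step h; return the final state."""
    x, t = x0, 0.0
    for _ in range(int(round(t_end / h))):
        x = rk4_step(f, t, x, h)
        t += h
    return x


# Hypothetical test problem: pure dilution, dc/dt = D * (c_in - c).
D, c_in = 1.0, 1.0e-3          # [1/h], [mol/l] (illustrative values)


def dilution(t, c):
    return D * (c_in - c)


c_num = simulate(dilution, 0.0, t_end=10.0, h=0.5)
c_exact = c_in * (1.0 - math.exp(-D * 10.0))
```

With a step equal to the 0.5 h sampling period, the numerical and analytic values agree to well within $10^{-5}\, c_{in}$ after 10 h. The full model (9.64)-(9.69) would use the same stepping routine with a vector-valued state.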
Table 9.3: The kinetic model mechanism description. Expressions that cannot be read in the source are marked [illegible].

Mechanism                        Description
Glucose uptake                   $Q_s = Q_{s,\max}\, \dfrac{c_s}{k_s + c_s}$
Oxidation capacity               $Q_{o,\lim} = Q_{o,\max}\, \dfrac{c_o}{k_o + c_o}$
Oxidative glucose metabolism     $Q_{s,ox} = \min\{\ldots\}$ [illegible]
Reductive glucose metabolism     $Q_{s,red} = Q_s - Q_{s,ox}$
Ethanol uptake                   $Q_{e,up} = Q_{e,\max}\, \dfrac{c_e}{k_e + c_e}\, \dfrac{k_l}{k_l + c_s}$
Oxidative ethanol metabolism     $Q_{e,ox} = \min\{\ldots\}$ [illegible]
Ethanol production               [illegible]
Growth                           [illegible]
Carbon dioxide production        [illegible]
Oxygen consumption               [illegible]
Oxygen transfer                  [illegible]
Maximum consumption rates, where induction or repression factors are [illegible]
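Table 9.4: Model parameters. (1 Cmol of biomass has the composition CH~.saN0.~700.~). Subscripts that cannot be read in the source are shown as "?".

Saturation and transfer constants, maximum rates, operating data:
$k_{?} = 2.2 \cdot 10^{-3}$ mol/l
$k_l = 5.6 \cdot 10^{-4}$ mol/l
$k_m = 1.7 \cdot 10^{-4}$ mol/l
$k_n = 3.6 \cdot 10^{-4}$ mol/l
$k_o = 3.0 \cdot 10^{-6}$ mol/l
$k_{?} = 5.6 \cdot 10^{-4}$ mol/l
$k_L a = 592$ h$^{-1}$
$Q_{e,\max} = 0.13$ mol/(Cmol h)
$Q_{?,\max} = 0.50$ mol/(Cmol h)
$Q_{?,\max} = 0.20$ mol/(Cmol h)
$m = 35$ mol/mol
$V_l = 0.70$ l
$D_g = 1.0$ h$^{-1}$
$c_{?,in} = c_{?,in} = 1.0 \cdot 10^{-3}$ mol/l
$c_{o,in} = c_{c,in} = c_{?,in} = 0.0$ mol/l

Yield coefficients (in the order printed):
$Y_{?} = 0.68$ mol/mol
$Y_{?} = 1.28$ mol/mol
$Y_{?} = 1.32$ Cmol/mol
$Y_{?} = 1.88$ mol/mol
$Y_{?} = 2.27$ mol/mol
$Y_{?} = 2.35$ mol/mol
$Y_{?} = 1.89$ mol/mol
$Y_{?} = 3.65$ Cmol/mol
$Y_{?} = 0.36$ Cmol/mol

Time constants (in the order printed):
$\tau_{?} = 2.80$ h
$\tau_{o} = 1.60$ h
$\tau_{?} = 2.50$ h
$\tau_{?} = 2.50$ h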
Table 9.5: Bioprocess initial conditions. Subscripts and exponents that cannot be read in the source are shown as "?".

$c_{?}(0) = 1.55 \cdot 10^{-3}$ mol/l
$c_g(0) = 8.72 \cdot 10^{-5}$ mol/l
$c_s(0) = 2.42 \cdot 10^{-4}$ mol/l
$c_{?}(0) = 1.0 \cdot 10^{-3}$ mol/l
$c_o(0) = 2.43 \cdot 10^{-6}$ mol/l
$c_x(0) = 1.0 \cdot 10^{?}$ Cmol/l

Figure 9.17: Static behavior of dissolved oxygen $c_o$ [mol/l] (left) and static gain (right) as a function of the dilution rate $D$.
The structure of the SNN predictor used was [6, 5, 1]: six neurons in the input layer with inputs $[y(k-1), y(k-2), y(k-3), u(k), u(k-1), u(k-3)]$, five neurons in the hidden layer and one neuron in the output layer. The sampling period was set equal to 0.5 hours. The training set contained 600 input-output pairs. The structure of the SNN controller used was [14, 8, 4], and the inputs consisted of the predictions of the process behavior obtained using the SNN predictor. Apart from predictor and controller parametrization (i.e., the choice of the number of nodes), a few design parameters must still be specified a priori: the prediction horizon, the control weighting factor, and the values of the parameters involved in the stochastic approximation algorithm. The following choices were made:
• the horizons related to the control objective were set equal to $H_m = 1$, $H_p = 13$.
• the weighting factor $r$ in the control objective was fixed to $r = 0.1$.
• the values of the parameters involved in the stochastic approximation algorithm were fixed to $a_k = 0.3$, $c_k = 0.01$.
Notice that $a_k$ must decrease in order to remove the influence of disturbances, according to (9.55). In the noise-free case, $a_k$ can be either a constant or a decreasing sequence that converges to a constant value.
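The [6, 5, 1] predictor structure and the offset correction (9.49)-(9.50) can be sketched as follows. This is only an illustration under stated assumptions: the class name `SNNPredictor`, the random (untrained) weights, and the sigmoid output neuron are hypothetical, and the correction $d(k)$ is taken here as the difference between the measured output and the uncorrected network output.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class SNNPredictor:
    """Feedforward sigmoid network with one hidden layer, e.g. [6, 5, 1]."""

    def __init__(self, n_in=6, n_hidden=5, seed=0):
        rng = np.random.default_rng(seed)
        # Untrained, randomly initialized weights (placeholders only).
        self.W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.5, size=(1, n_hidden))
        self.b2 = np.zeros(1)

    def predict(self, U):
        """Uncorrected model output f(U, theta) for one input vector U."""
        hidden = sigmoid(self.W1 @ U + self.b1)
        return float(sigmoid(self.W2 @ hidden + self.b2)[0])


def corrected_prediction(net, U_now, y_measured, U_future):
    """Apply the constant disturbance correction of (9.49)-(9.50)."""
    d = y_measured - net.predict(U_now)   # d(k), held constant over the horizon
    return net.predict(U_future) + d


net = SNNPredictor()
# Input layout assumed as [y(k-1), y(k-2), y(k-3), u(k), u(k-1), u(k-3)].
U_now = np.array([0.50, 0.48, 0.47, 0.60, 0.60, 0.55])
U_next = np.array([0.52, 0.50, 0.48, 0.65, 0.60, 0.60])
y_now = 0.52
y_next_hat = corrected_prediction(net, U_now, y_now, U_next)
```

By construction, the corrected prediction reproduces the measurement when it is evaluated at the current input vector, which is how the correction removes steady offset.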
Control simulations

For the first set of tests, the future reference was considered to be a known square wave, as shown in the upper graph of Fig. 9.18. Figure 9.19 gives an enlargement of Fig. 9.18 for the first 200 hours of the simulation run. The lower graph of Fig. 9.18 represents the control signal $u(k)$ derived from the predictive control calculation. Figures 9.18 and 9.19 show the performance of the control. It can be seen that both steady-state and transient behavior are satisfactory. The variation of the control variable is very smooth. Notice that the set point changes led to variations in the bioprocess dynamics (steady-state gain change, etc.).

The second experiment considers level constraints on the input. The following constraints on the manipulated variable, the dilution rate $D_g$, were used: $D_{g,\min} \leq D_g(k) \leq D_{g,\max}$, with $D_{g,\min} = 0.4$ and $D_{g,\max} = 0.85$. Figure 9.20 shows the evolution of the dissolved oxygen concentration, $c_o(k)$, and the dilution rate, $D_g(k)$. The evolution of the bioprocess output as well as the control variable for the first 200 hours of operation are depicted in Fig. 9.21. These simulation results show that the bioprocess operates well under the constrained control.

In the third experiment, a rate constraint on the input was considered, $|\Delta D_g(k)| \leq 0.1$. Figure 9.22 shows the evolution of the dissolved oxygen concentration $c_o(k)$ and the dilution rate $D_g(k)$. Figure 9.23 gives an enlargement of Fig. 9.22 for the first 200 hours of this simulation run. There are a large number of set point changes. Some of them occur randomly (see Fig. 9.22 for $k \geq 200$ hours). Because the constraint is fulfilled, the control signal has no large variations and thus corresponds to industrial requirements.

In practice, it is impossible to obtain perfect measurements and uniform dissolved oxygen concentrations. We therefore introduced measurement noise with zero mean and variance equal to 0.02. Figure 9.24 shows the dissolved oxygen concentration and the dilution rate. The control action is smooth. These results demonstrate that the presented control algorithm has good regulation and tracking properties. Next, we will consider the control of a chemical reactor.
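The level and rate constraints used in the second and third experiments can be written in the form $J_i(\theta) \leq 0$ required by (9.53). The sketch below is one possible way of doing so and is an assumption of this example, not the formulation used for the reported runs: the controller output is taken to be a vector of future control increments, and the function names are hypothetical.

```python
import numpy as np


def control_sequence(u_prev, du):
    """Future control values obtained by accumulating the increments."""
    return u_prev + np.cumsum(du)


def level_constraints(u_prev, du, u_min=0.4, u_max=0.85):
    """Values J_i <= 0 when u_min <= u(k+j) <= u_max for every j."""
    u = control_sequence(u_prev, du)
    return np.concatenate([u - u_max, u_min - u])


def rate_constraints(du, du_max=0.1):
    """Values J_i <= 0 when |du(k+j)| <= du_max for every j."""
    return np.abs(du) - du_max


# Hypothetical increments proposed by the controller (four outputs).
du = np.array([0.05, 0.05, -0.20, 0.0])
u_prev = 0.80

level = level_constraints(u_prev, du)
rate = rate_constraints(du)
tol = 1e-9   # guards against round-off when a value sits on the bound
level_violations = np.flatnonzero(level > tol)
rate_violations = np.flatnonzero(rate > tol)
```

Here the second control value (0.90) exceeds the upper level bound and the third increment (-0.20) exceeds the rate bound, so each check flags exactly one entry. Functions of this kind could be passed as the constraint list of a stochastic approximation routine operating on the controller weights.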
Figure 9.18: Predictive control of a fermenter: unconstrained case. Top: the measured and desired dissolved oxygen, $c_o$, as a function of time [h]. Bottom: the manipulated variable, $D_g$.

Figure 9.19: Enlargement of Fig. 9.18 for the first 200 hours of operation.

Figure 9.20: Predictive control of a fermenter: level constraint on the manipulated variable, $0.4 \leq D_g \leq 0.85$. Top: the measured and desired dissolved oxygen, $c_o$, as a function of time [h]. Bottom: the manipulated variable, $D_g$.

Figure 9.21: Enlargement of Fig. 9.20 for the first 200 hours of operation.

Figure 9.22: Predictive control of a fermenter: rate constraint on the manipulated variable, $|\Delta D_g| \leq 0.1$. Top: the measured and desired dissolved oxygen, $c_o$, as a function of time [h]. Bottom: the manipulated variable, $D_g$.

Figure 9.23: Enlargement of Fig. 9.22 for the first 200 hours of operation.

Figure 9.24: Predictive control of a fermenter: noisy measurements. Top: the measured and desired dissolved oxygen, $c_o$, as a function of time [h]. Bottom: the manipulated variable, $D_g$.
9.3.5 Control of a tubular reactor

In a second example we consider the temperature control in a tubular chemical reactor. A tubular reactor is a significant and widely used piece of equipment in chemical technology. The object of our study is such a reactor with fixed-bed catalyst and cooling. Efficient control of this type of process is often hampered by its highly nonlinear behavior and hazardous operating conditions. Assuming that $j_1$ reversible exothermic first-order reactions take place in the reactor and some specified simplifying circumstances hold, a structured nonlinear mathematical model of the process can be developed [59]. The model was proposed on the basis of both mass and heat balances and its final form is given by a set of nonlinear hyperbolic partial differential equations, as follows.

Mass balance for the $i$'th component

$$\frac{\partial c_i}{\partial t} + u \frac{\partial c_i}{\partial z} = -\sum_{j=1}^{j_1} r_{ij}(c_i, T_k) \tag{9.71}$$

Energetic balance of reactant mixture

$$\frac{\partial T_g}{\partial t} + u \frac{\partial T_g}{\partial z} = A_1 (T_k - T_g) - A_2 (T_g - T_w) \tag{9.72}$$

Energetic balance of catalyst

$$\frac{\partial T_k}{\partial t} = B_1 \left[ \sum_{j=1}^{j_1} (-\Delta H_j)\, r_j - B_2 (T_k - T_g) - B_3 (T_k - T_w) \right] \tag{9.73}$$

Energetic balance of the reactor's wall

$$\frac{\partial T_w}{\partial t} = C_1 \left[ C_2 (T_k - T_w) + C_3 (T_g - T_w) - C_4 (T_w - T_c) \right] \tag{9.74}$$

The initial and boundary conditions are:

$$c_i(z, 0) = c_{is}(z); \quad T_g(z, 0) = T_{gs}(z); \quad T_k(z, 0) = T_{ks}(z) \tag{9.75}$$

$$T_w(z, 0) = T_{ws}(z); \quad c_i(0, t) = c_{i0}(t); \quad T_g(0, t) = T_{g0}(t) \tag{9.76}$$

The symbols used stand for the following physical quantities: $c_i$ - concentration of the $i$'th component, $t$ - dimensionless time, $z$ - dimensionless spatial variable, $r_{ij}$ - rate of chemical reactions ($i$'th component in $j$'th reaction), $T_g$ - reactant mixture temperature, $T_w$ - wall temperature, $T_k$ - catalyst temperature, $T_c$ - coolant temperature. The coefficients $A_1$, $A_2$, $B_1$-$B_3$, $C_1$-$C_4$ include the technological parameters of the reactor.
For simulation, two reactions of the following rates were considered (the numerator of the exponent in (9.77) is illegible in the source):

$$r_1(c, T_k) = 8.7 \cdot 10^{3} \exp\left[ -\frac{\ldots}{1.98\, T_k} \right] c \tag{9.77}$$

$$r_2(c, T_k) = 4.57 \cdot 10^{5} \exp\left[ -\frac{19800}{1.98\, T_k} \right] \tag{9.78}$$

This case corresponds to the ethylene oxide production in an industrial-scale reactor. The parameters involved take the following values (one subscript is illegible in the source and is shown as "?"):

$$A_1 = 51.356307; \quad A_2 = 23.796894; \quad B_1 = 0.000614 \tag{9.79}$$

$$B_1 B_2 = 2.301454; \quad B_1 B_3 = 0.266606; \quad C_1 C_2 = 0.080613 \tag{9.80}$$

$$C_1 C_{?} = 0.322451; \quad C_1 C_3 = 1.048619 \tag{9.81}$$

For simulation, the following values of initial and boundary conditions were considered:

$$c_s(z) = 0.015037\ \text{kmol m}^{-3}; \quad T_{gs}(z) = 522.266948\ \text{K} \tag{9.82}$$

$$T_{ks}(z) = 526.844683\ \text{K}; \quad T_{ws}(z) = 514.216915\ \text{K} \tag{9.83}$$

$$c_0(t) = 0.015600\ \text{kmol m}^{-3}; \quad T_{g0}(t) = 499.579969\ \text{K} \tag{9.84}$$

The coolant temperature was chosen as the control variable. The goal of the control was to maintain a desired profile of the gas mixture average temperature in the reactor. The partial differential equations of the model were solved by dividing the reactor into 10 segments along the spatial variable. Experiments included both unconstrained and constrained cases for the control. After some 'pre-experimental' runs we found the appropriate structure for the SNN predictor and the SNN controller. The structure of the SNN predictor used was [4, 6, 1] with inputs $[y(k), y(k-1), u(k), u(k-1)]$ and output $[y(k+1)]$. The sampling period was set equal to 0.5 min. The training data contained 700 input-output pairs. The structure of the SNN controller used was [14, 8, 4]. The future predictions of the process behavior were applied to the input of the SNN controller, and the controller generated the future control actions. The experiments reported here were carried out with the following choices:
• the horizons relating to the control objective were set equal to $H_m = 4$, $H_p = 13$.
• the weighting factor $r$ in the control objective was fixed to $r = 0.1$.
• the values of the parameters involved in the stochastic approximation algorithm were fixed to $a_k = 0.3$, $c_k = 0.01$.

Figure 9.25: Static gain of the chemical reactor [66] (horizontal axis: $T_c$ [K]).

Figure 9.25 shows the nonlinear steady-state (static) gain. The behavior of the controlled reactor in the unconstrained case is depicted in Fig. 9.26. The upper graph shows the reactor output, and the middle graph shows the control signal. In the lower graph, the control action increments are depicted. The behavior of the chemical reactor in the constrained case is illustrated in Fig. 9.27. In this experiment, a constraint of $|\Delta T_c(k)| < 15$ K on the control variable was considered. The description of this figure is similar to that of Fig. 9.26. From the lower graph of Fig. 9.26, it can be seen that the increments of the control signal lie in the interval [0, 100] K. On the other hand, in the constrained case, the control variable increments vary within the interval [0, 15] K and the control signal is smooth.

In this section, we presented some experiments concerning the implementation of the constrained predictive control algorithm based on neural networks. There are a number of other potential applications, since many industrial plants (chemical, mineral, etc.) are characterized by nonlinear and time-varying behavior, and are subject to several kinds of disturbances.
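The spatial discretization mentioned above (dividing the reactor into 10 segments) can be illustrated on a much simpler problem than (9.71)-(9.74). The sketch below is not the reactor model: it treats a single isothermal first-order reaction, $\partial c/\partial t + u\, \partial c/\partial z = -k_r c$, with an upwind difference in space and explicit Euler stepping in time. The scheme, the function name, and all numerical values are assumptions chosen for the example.

```python
import numpy as np


def simulate_plug_flow(n_seg=10, u=1.0, k_rate=2.0, c_in=1.0, t_end=5.0):
    """Upwind method-of-lines solution of dc/dt + u dc/dz = -k_rate * c."""
    dz = 1.0 / n_seg              # dimensionless reactor length is 1
    dt = 0.5 * dz / u             # time step chosen below the CFL limit
    c = np.zeros(n_seg)           # reactor initially free of reactant
    for _ in range(int(round(t_end / dt))):
        # Concentration entering each segment: inlet value, then upstream cell.
        upstream = np.concatenate(([c_in], c[:-1]))
        c = c + dt * (-u * (c - upstream) / dz - k_rate * c)
    return c


profile = simulate_plug_flow()
# Steady state of this discrete scheme: each segment attenuates the incoming
# concentration by the factor 1 / (1 + k_rate * dz / u) = 1 / 1.2.
expected = 1.0 / 1.2 ** np.arange(1, 11)
```

After the transient has died out, the computed profile coincides with the steady state of the discretized equations. The full model would carry one such set of segment states for each concentration and temperature, coupled through the reaction rates.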
Figure 9.26: The output, desired output and the manipulated variable for predictive control [66].

Figure 9.27: The output, desired output and the manipulated variable for the constrained predictive control of the chemical reactor, with the constraint $|\Delta T_c(k)| \leq 15$ K on the manipulated variable [66].
Part III Appendices

Appendix A

State-Space Representation

The primary purpose of this appendix is to introduce a number of concepts which are fundamental in the state-space representation and analysis of dynamic systems. We formally define and illustrate the concepts of controllability and observability.

A.1 State-space description

Let a SISO system (plant, process) be described by a state-space model

$$x(k+1) = A x(k) + B u(k) \tag{A.1}$$

$$y(k) = C x(k) \tag{A.2}$$
where
$x$ is the state vector ($n \times 1$),
$u$ is the system input (controller output) ($1 \times 1$),
$y$ is the system output (measured) ($1 \times 1$),
$A$ is the state transition matrix ($n \times n$),
$B$ is the input transition vector ($n \times 1$),
$C$ is the state observer vector ($1 \times n$).

Remark 6 (Characteristic equation) The characteristic equation is given by

$$\det(zI - A) = 0$$
Remark 7 (Number of representations) For a given system there exists no unique state-space representation. In fact, from any state representation we can obtain a new one by using any linear transformation, i.e., $\bar{x}(k) = T x(k)$, where $T$ is a nonsingular matrix.

Let us next introduce two state representations, namely, the control and observer canonical forms.

A.1.1 Control and observer canonical forms

Consider a system given by a transfer polynomial

$$y(k) = \frac{B(q^{-1})}{A(q^{-1})}\, u(k) \tag{A.3}$$

where $B(q^{-1}) = b_1 q^{-1} + \cdots + b_n q^{-n}$ and $A(q^{-1}) = 1 + a_1 q^{-1} + \cdots + a_n q^{-n}$. For notational convenience, without loss of generality, we assume that the polynomials are all of order $n$. The control canonical form

$$x_c(k+1) = A_c x_c(k) + B_c u(k) \tag{A.4}$$

$$y(k) = C_c x_c(k) \tag{A.5}$$

is obtained by substituting

$$A_c = \begin{bmatrix} -a_1 & -a_2 & \cdots & -a_{n-1} & -a_n \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \tag{A.6}$$

$$B_c = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{A.7}$$

$$C_c = \begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix} \tag{A.8}$$

The key idea is that the elements of the first row of $A_c$ are exactly the coefficients of the characteristic polynomial of the system (the matrix $A_c$ is known as the companion matrix of the polynomial $z^n + a_1 z^{n-1} + \cdots + a_n$, see [64]). Similarly, the observer canonical form

$$x_o(k+1) = A_o x_o(k) + B_o u(k) \tag{A.9}$$

$$y(k) = C_o x_o(k) \tag{A.10}$$

is obtained by substituting

$$A_o = \begin{bmatrix} -a_1 & 1 & 0 & \cdots & 0 \\ -a_2 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ -a_{n-1} & 0 & 0 & \cdots & 1 \\ -a_n & 0 & 0 & \cdots & 0 \end{bmatrix} \tag{A.11}$$

$$B_o = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \tag{A.12}$$

$$C_o = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix} \tag{A.13}$$

A.2 Controllability and observability
We will now consider two fundamental notions concerning dynamic systems that are represented in the state-space form. The first is whether it is possible to transfer (drive, force) a system from a given initial state to any other arbitrary final state. The second is how we can observe (determine) the state of a given system if the only available information consists of input and output measurements. These concepts have been introduced by Kalman as the concepts of controllability and observability.

Definition 15 (Controllability) The system (A.1) is said to be completely state controllable (reachable), or simply controllable, if it is possible to find a control sequence which steers it from any initial state $x(k_i)$ at any instant $k_i$ to any arbitrary final state $x(k_f)$ at any instant $k_f > k_i \geq 0$. Otherwise, the system is said to be uncontrollable.

Definition 16 (Observability) The system (A.1) is said to be completely state observable, or simply observable, if and only if the complete state of the system can be determined over any finite time interval $[k_i, k_f]$ from the available input and output measurements over the time interval $[k_i, k_f]$ with $k_f > k_i \geq 0$. Otherwise, the system is said to be unobservable.

The following theorems state the conditions under which a given system is controllable or observable.

Theorem 2 (Controllability) The system (A.1)-(A.2) is controllable if and only if the controllability matrix defined by

$$\begin{bmatrix} B & AB & A^2 B & \cdots & A^{n-1} B \end{bmatrix} \tag{A.14}$$

has rank $n$.

Theorem 3 (Observability) The system (A.1)-(A.2) is observable if and only if the observability matrix defined by

$$\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix} \tag{A.15}$$

has rank $n$.

In order to illustrate the usefulness of the controllability notion and the state transformation, we shall consider two applications: a feedback controller based on the pole placement control approach, and the reconstruction of the state variables.

A.2.1 Pole placement
The design of state-space feedback controllers can be seen as consisting of two independent steps: the design of the control law, and the design of an observer [21]. The final control algorithm will consist of a combination of the control law and the estimator, with control-law calculations based on the estimated states. The state feedback control law is simply a linear combination of the components of the state-space vector

$$u(k) = -K x(k) \tag{A.16}$$

where $K = [k_1, \ldots, k_n]$. Substituting this into the state equation (A.1) gives

$$x(k+1) = A x(k) - B K x(k) \tag{A.17}$$

and its z-transform is given by

$$(zI - A + BK)\, x(z) = 0 \tag{A.18}$$

with a characteristic equation

$$\det(zI - A + BK) = 0 \tag{A.19}$$

In pole placement, the control-law design consists of finding the elements of $K$ so that the roots of (A.19), the poles of the closed-loop system, are in the desired locations. The desired characteristic equation is given by

$$P(z) = z^n + \alpha_{n-1} z^{n-1} + \cdots + \alpha_1 z + \alpha_0 = 0 \tag{A.20}$$
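In order to do this, let us consider a linear transformation defined by a matrix $M$

$$x = M \bar{x} \tag{A.21}$$

The system equations then become

$$\bar{x}(k+1) = M^{-1} A M\, \bar{x}(k) + M^{-1} B\, u(k) \tag{A.22}$$

$$y(k) = C M\, \bar{x}(k), \qquad u(k) = -K M\, \bar{x}(k) \tag{A.23}$$

Let us introduce the following notations [64]:

$$\bar{A} = M^{-1} A M \tag{A.24}$$

$$\bar{B} = M^{-1} B \tag{A.25}$$

$$\bar{K} = K M \tag{A.26}$$

where $\bar{A}$ and $\bar{B}$ have the following form

$$\bar{A} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} \end{bmatrix}, \qquad \bar{B} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{A.27}$$

The matrix $\bar{A}$ is a companion matrix. The characteristic polynomial is given by

$$\det(zI - \bar{A}) = z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0 \tag{A.28}$$

Let us denote the matrix $M$ as follows:

$$M = [m_1, m_2, \ldots, m_n] \tag{A.29}$$

where $m_i$ $(i = 1, \ldots, n)$ represents the columns of the matrix $M$. We obtain from (A.24)

$$M \bar{A} = A M \tag{A.30}$$

that is,

$$\begin{aligned} m_{n-1} - a_{n-1} m_n &= A m_n \\ m_{n-2} - a_{n-2} m_n &= A m_{n-1} \\ &\ \,\vdots \\ m_1 - a_1 m_n &= A m_2 \\ -a_0 m_n &= A m_1 \end{aligned} \tag{A.31}$$

Hence,

$$\begin{aligned} m_{n-1} &= (A + a_{n-1} I)\, m_n \\ m_{n-2} &= (A^2 + a_{n-1} A + a_{n-2} I)\, m_n \\ &\ \,\vdots \\ m_1 &= (A^{n-1} + a_{n-1} A^{n-2} + \cdots + a_2 A + a_1 I)\, m_n \\ 0 &= (A^n + a_{n-1} A^{n-1} + \cdots + a_1 A + a_0 I)\, m_n \end{aligned} \tag{A.32}$$

The equation (A.25)

$$B = M \bar{B} \tag{A.33}$$

leads to

$$m_n = B \tag{A.34}$$

From (A.32), we derive

$$\begin{aligned} m_{n-1} &= (A + a_{n-1} I)\, B \\ m_{n-2} &= (A^2 + a_{n-1} A + a_{n-2} I)\, B \\ &\ \,\vdots \\ m_1 &= (A^{n-1} + a_{n-1} A^{n-2} + \cdots + a_2 A + a_1 I)\, B \end{aligned} \tag{A.35}$$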
CONTROLLABILITY AND OBSERVABILITY
279
The inverse of the matrix Mexists if and only if (A.36)
rank (B, AB,... ,A~IB) = This correspondsto the controllability condition. Based on the desired characteristic equation (A.20) we derive det (zI  X + ~=’~ +a, \ ]
~_~z’~1 + .. . + alz +
(A.a7)
O~ o
We have 0 0
1 0
0 1
0 0
(A.a8)
:
0 ~0
0
0
~1
~2
0 0
O~0 o a
0 0
0 0
0
0 al
0~1 
0~2 
0 0
a2
O~n1  an1
(A.39) Wealso have
[ ~0 "~1
"’"
"~n1
]
(A.40)
It then follows in view of (A.39) and (A.40) ~, = a,  a,
(A.41)
(i = 0, ..., n  1) and K = M~K. The ideas behind this control design can also be used in the context of state observation, whichwill be considered next.
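The design steps (A.29)-(A.41) translate directly into a short numerical procedure. The following Python sketch is an illustration under stated assumptions: the function names and the example system (a sampled double integrator) are hypothetical, `numpy.poly` is used to obtain the characteristic polynomial coefficients, and the sign convention $u(k) = -Kx(k)$ matching (A.19) is assumed.

```python
import numpy as np


def controllability_matrix(A, B):
    """Matrix [B, AB, ..., A^(n-1) B] of (A.14)."""
    cols = [B]
    for _ in range(A.shape[0] - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)


def place_poles(A, B, poles):
    """State feedback gain K such that eig(A - B K) equals `poles`."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float).reshape(-1)
    n = A.shape[0]
    if np.linalg.matrix_rank(controllability_matrix(A, B)) < n:
        raise ValueError("system is not controllable, M is singular")
    # Coefficients ordered as a[i] = a_i and alpha[i] = alpha_i, i = 0..n-1.
    a = np.real(np.poly(A))[::-1][:n]
    alpha = np.real(np.poly(poles))[::-1][:n]
    # Columns of M: m_n = B and m_(j-1) = A m_j + a_(j-1) m_n, cf. (A.31).
    m = [None] * n
    m[n - 1] = B
    for i in range(n - 1, 0, -1):
        m[i - 1] = A @ m[i] + a[i] * B
    M = np.column_stack(m)
    K_bar = alpha - a                      # (A.41)
    return K_bar @ np.linalg.inv(M)        # K = K_bar M^(-1)


# Hypothetical example: double integrator sampled with period 0.1.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([0.005, 0.1])
K = place_poles(A, B, [0.8, 0.9])
closed_loop_poles = np.sort(np.linalg.eigvals(A - np.outer(B, K)).real)
```

For this example the procedure gives $K = [2.0,\ 2.9]$, and the eigenvalues of $A - BK$ are 0.8 and 0.9 as requested. Inverting $M$ explicitly is adequate for a small illustration; for larger or poorly conditioned problems a dedicated routine would normally be preferred.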
A.2.2 Observers

The problem of determining the states of a given system arises in many contexts, e.g., for purposes of control, soft sensor development, diagnosis, etc. The state vector can be directly calculated from the available measurements (input-output data). From (A.1)-(A.2), we have

$$\begin{aligned} y(k-n+1) &= C x(k-n+1) \\ y(k-n+2) &= C A x(k-n+1) + C B u(k-n+1) \\ &\ \,\vdots \\ y(k) &= C A^{n-1} x(k-n+1) + C A^{n-2} B u(k-n+1) + \cdots + C B u(k-1) \end{aligned} \tag{A.42}$$

These equations can be conveniently arranged in matrix form as, say,

$$\begin{bmatrix} y(k-n+1) \\ y(k-n+2) \\ \vdots \\ y(k) \end{bmatrix} = Q\, x(k-n+1) + \begin{bmatrix} 0 & 0 & \cdots & 0 \\ CB & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ CA^{n-2}B & CA^{n-3}B & \cdots & CB \end{bmatrix} \begin{bmatrix} u(k-n+1) \\ u(k-n+2) \\ \vdots \\ u(k-1) \end{bmatrix} \tag{A.43}$$

where

$$Q = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix} \tag{A.44}$$

This matrix is nothing else than the observability matrix. It is clear that in order to calculate the state $x(k-n+1)$, this matrix must not be singular (these developments represent the proof of Theorem 3). In other words, the nonsingularity of the matrix $Q$ is crucial to the problem of observing the states. This state reconstruction approach has the drawback that it may be sensitive to disturbances [5].

Another approach for state reconstruction is based on the use of a dynamic system. Let us consider the following observer for the system states

$$\hat{x}(k+1) = A \hat{x}(k) + B u(k) + L\left[y(k) - C \hat{x}(k)\right] \tag{A.45}$$
based on the system model and the correction term with gain $L = [l_1, l_2, \ldots, l_n]^T$. The estimation error is given by

$$\tilde{x}(k+1) = x(k+1) - \hat{x}(k+1) = [A - LC]\, \tilde{x}(k) \tag{A.46}$$

Thus, if the matrix $[A - LC]$ represents an asymptotically stable system, $\tilde{x}(\cdot)$ will converge to zero for any initial error $\tilde{x}(0)$. For the design of the gain $L$, we can use the same approach as for the design of the state-feedback control law: the characteristic equation associated with the system governing the dynamics of the estimator error is

$$\det(zI - A + LC) = 0 \tag{A.47}$$

and should be identical to the desired estimator characteristic equation. Notice that this characteristic equation is similar to (A.37), and therefore the mathematical tools used for solving the state reconstruction problem are similar to those employed in the pole placement control design.
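The observer (A.45) and its error dynamics (A.46) can be checked numerically on a small example. In the sketch below the system matrices, the gain $L$, and the function name `run_observer` are assumptions of the example: for the chosen second-order system, $L = [0.9,\ 2.0]^T$ was obtained by hand so that (A.47) becomes $z^2 - 1.1z + 0.30 = 0$, i.e., observer poles at 0.5 and 0.6.

```python
import numpy as np

# Hypothetical second-order system in the form (A.1)-(A.2).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([0.005, 0.1])
C = np.array([1.0, 0.0])
L = np.array([0.9, 2.0])        # hand-computed gain for poles 0.5 and 0.6


def run_observer(x0, x_hat0, u_seq):
    """Simulate plant and observer (A.45); return the estimation error norms."""
    x = np.array(x0, dtype=float)
    x_hat = np.array(x_hat0, dtype=float)
    errors = []
    for u in u_seq:
        y = C @ x                                        # measurement y(k)
        x_hat = A @ x_hat + B * u + L * (y - C @ x_hat)  # observer update
        x = A @ x + B * u                                # plant update
        errors.append(np.linalg.norm(x - x_hat))
    return np.array(errors)


observability = np.vstack([C, C @ A])                    # matrix Q of (A.44)
observer_poles = np.sort(np.linalg.eigvals(A - np.outer(L, C)).real)
err = run_observer(x0=[1.0, -1.0], x_hat0=[0.0, 0.0], u_seq=np.ones(50))
```

The estimation error decays at the rate set by the observer poles regardless of the input sequence, as (A.46) predicts, and the observability matrix of the example has full rank.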
Appendix B

Fluidized Bed Combustion

In fluidized bed combustion (FBC), the combustion chamber contains a quantity of finely divided particles such as sand or ash. The combustion air entering from below lifts these particles until they form a turbulent bed, which behaves like a boiling fluid. The fuel is added to the bed and the mixed material is kept in constant movement by the combustion air. The heat released as the material burns maintains the bed temperature, and the turbulence keeps the temperature uniform throughout the bed.

The main purpose of an FBC plant is to generate power (energy flux [W] = [J/s]). Several powers can be distinguished. The fuel power is the power in the fuel (heat value times feed); the combustion power is the power released in combustion (dependent on the completeness of the combustion and the furnace dynamics). Boiler power depends further on the efficiency of the heat exchangers as well as their dynamics. Often, a part of the heat is used for generating electricity in the turbines, while the remaining heat is used for the generation of steam and hot water. We can then distinguish electrical power and thermal power. Depending on plant constructions these are roughly of order 40%-0% (electrical plant), 30%-55% (cogenerating plant), or 0%-80% (thermal plant) of the fuel power. In what follows a simplified model of a thermal plant is considered. For a more realistic modeling of the thermal power (steam mass flow), including the drum pressure, see [4].
B.1 Model of a bubbling fluidized bed

A rough model for a bubbling fluidized bed combustor can be formulated [76] based on mass and energy balances (see also [33] [35]). The model divides the furnace into two parts: the bed and the freeboard, see Fig. B.1. Combustion takes place in both: oxygen is consumed and heat is released and removed.

Figure B.1: A schematic drawing of a typical FBC plant. A mixture of inert/sorbent bed material and fuel is fluidized by the primary air. Complete combustion is ensured by the secondary air flow inserted from above the bed. The heat released in combustion is captured by heat exchangers and used for the generation of electricity, steam, or both. (Labels in the drawing: stack, throat temperature, throat heat exchangers, freeboard temperature, freeboard, secondary air flow, fuel feed, bed temperature, bed, primary air flow.)
The control inputs of the system are the fuel feed $Q_C$ [kg/s] and the primary and secondary air flows $F_1$ and $F_2$ [Nm$^3$/s]. Measurable system outputs are the flue gas O$_2$ content [Nm$^3$/Nm$^3$], and the bed and the freeboard temperatures $T_B$ and $T_F$.

B.1.1 Bed

The solids (char) in the fuel combust in the bed. When fed to the combustor, the solids are stored in the fuel inventory (the amount of unburned char in the bed). The combustion rate $Q_B$ [kg/s] depends on the availability of oxygen in the bed as well as the fuel properties:

$$Q_B(t) = \frac{W_C(t)}{t_c}\, \frac{C_B(t)}{C_1} \tag{B.1}$$

where $W_C$ is the fuel inventory [kg], and $t_c$ is the (average) char combustion time [s]. $C_B$ and $C_1$ are the oxygen contents [Nm$^3$/Nm$^3$] in the bed and in the primary air, respectively.

The dynamics of the fuel inventory are given by the difference between the fraction of the fuel feed rate $Q_C$ [kg/s] that combusts in the bed and the combustion rate $Q_B(t)$:

$$\frac{dW_C(t)}{dt} = (1 - V)\, Q_C(t) - Q_B(t) \tag{B.2}$$

where $V$ is the fraction of volatiles in the fuel [kg/kg].

Combustion in the bed consumes oxygen. O$_2$ comes into the bed in the primary air flow $F_1(t)$ [Nm$^3$/s], which is naturally provided by the environment and has an oxygen content $C_1 = 0.21$ [Nm$^3$/Nm$^3$]. O$_2$ is consumed in the combustion and transported to the freeboard:

$$\frac{dC_B(t)}{dt} = \frac{1}{V_B} \left[ C_1 F_1(t) - X_C Q_B(t) - C_B(t) F_1(t) \right] \tag{B.3}$$

where $X_C$ [Nm$^3$/kg] is the coefficient describing the amount of O$_2$ consumed by the fuel and $V_B$ [m$^3$] is the volume of the bed.

As a result of combustion, heat is released. The amount of released heat depends on the heat value $H_C$ [J/kg] of the solids in the fuel. Heat is removed from the bed by cooling water tubes. The energy balance for the bed temperature $T_B$ [K] is given by

$$\frac{dT_B(t)}{dt} = \frac{1}{c_I W_I} \left\{ H_C Q_B(t) - a_{Bt} A_{Bt} \left[ T_B(t) - T_{Bt} \right] + c_1 F_1(t) T_1 - c_F F_1(t) T_B(t) \right\} \tag{B.4}$$

where $c_I$ and $W_I$ are the specific heat [J/(kg K)] and mass [kg] of the bed material (inert sand), $a_{Bt}$ and $A_{Bt}$ are the heat transfer coefficient [W/(m$^2$ K)] and surface [m$^2$] of the cooling tubes, and $T_{Bt}$ is the temperature [K] of the cooling water. The incoming primary air at temperature $T_1$ [K], with specific heat $c_1$ [J/(Nm$^3$ K)], conveys some heat into the system. The remaining air, heated to the bed temperature, is transported into the freeboard, where $c_F$ [J/(Nm$^3$ K)] is the specific heat of the flue gases.

B.1.2 Freeboard
The gaseous components (volatiles) in the fuel are released and transported by the fluidizing air to the freeboard, where immediate combustion occurs. The combustion of the volatile fraction of the fuel consumes oxygen in the freeboard. Oxygen comes to the freeboard from the bed and with the secondary air flow $F_2$ [Nm$^3$/s], which has the O$_2$ content $C_2$ [Nm$^3$/Nm$^3$]. The dynamics of the freeboard oxygen content $C_F$ [Nm$^3$/Nm$^3$] (flue gas oxygen) are given by

$$\frac{dC_F(t)}{dt} = \frac{1}{V_F} \left\{ C_B(t) F_1(t) + C_2 F_2(t) - X_V V Q_C(t) - C_F(t) \left[ F_1(t) + F_2(t) \right] \right\} \tag{B.5}$$

where $X_V$ [Nm$^3$/kg] is the coefficient describing the amount of O$_2$ consumed by the volatiles, and $V_F$ [m$^3$] is the freeboard volume.

The volatiles release energy when combusted. Heat is removed from the freeboard by cooling water tubes located at the walls of the furnace. The energy balance for the freeboard temperature $T_F$ [K] is given by

$$\frac{dT_F(t)}{dt} = \frac{1}{c_F V_F} \left\{ H_V V Q_C(t) - a_{Ft} A_{Ft} \left[ T_F(t) - T_{Ft} \right] + c_F F_1(t) T_B(t) + c_2 F_2(t) T_2(t) - c_F \left[ F_1(t) + F_2(t) \right] T_F(t) \right\} \tag{B.6}$$

where $a_{Ft}$ and $A_{Ft}$ are the heat transfer coefficient [W/(m$^2$ K)] and surface [m$^2$] of the cooling tubes, $T_{Ft}$ is the temperature [K] of the cooling water, $c_2$ is the specific heat [J/(Nm$^3$ K)] of the secondary air at temperature $T_2$, and $H_V$ is the heat value of the volatiles [J/kg].

B.1.3 Power
The combustion power $P_C$ [W] is the rate of energy released in combustion

$$P_C(t) = H_C Q_B(t) + H_V V Q_C(t) \tag{B.7}$$

and simple first order dynamics were assumed for the thermal power $P$ [W]

$$\frac{dP(t)}{dt} = \frac{1}{\tau_{mix}}\left[P_C(t) - P(t)\right] \tag{B.8}$$

where $\tau_{mix}$ [s] is a time constant.
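The balances (B.2)-(B.8) can be integrated numerically. The following is a minimal sketch, not the authors' implementation. It assumes the combustion rate has the form $Q_B = W_C C_B / (t_C C_1)$ (the form implied by the linearized equation (B.19)), uses the constants of Table B.1 scaled by the tuned factors of Section B.2.2 with $H_V = H_C$ and $X_V = X_C$, and applies a plain forward-Euler scheme. The function names `derivatives` and `simulate`, the step size and the initial state are illustrative choices.

```python
# Constants from Table B.1, with H_C, V and X_C scaled by the tuned
# factors p = [0.2701, 0.9956, 0.4238] (Section B.2.2); H_V = H_C, X_V = X_C.
P1, P2, P3 = 0.2701, 0.9956, 0.4238
C1 = C2 = 0.21              # O2 content of primary/secondary air [Nm3/Nm3]
c1 = c2 = cF = 1305.0       # specific heats of air and flue gas [J/(Nm3 K)]
T1 = T2 = 328.0             # air temperatures [K]
cI, WI, VB = 800.0, 25000.0, 26.3            # bed material and bed volume
aBt, ABt, TBt = 210.0, 26.8, 573.0           # bed cooling tubes
VF, aFt, AFt, TFt = 128.1, 210.0, 130.7, 573.0  # freeboard
tC, tau_mix = 50.0, 300.0                    # combustion time, power lag [s]
HC = HV = P1 * 30e6         # tuned heat value [J/kg]
V = P2 * 0.75               # tuned fraction of volatiles
XC = XV = P3 * 1.886        # tuned O2 consumption [Nm3/kg]


def derivatives(x, u):
    """Right-hand sides of (B.2)-(B.6) and (B.8) for state x and input u."""
    WC, CB, CF, TB, TF, P = x
    QC, F1, F2 = u
    QB = WC * CB / (tC * C1)                      # assumed form of (B.1)
    dWC = (1.0 - V) * QC - QB                     # (B.2)
    dCB = (C1 * F1 - XC * QB - CB * F1) / VB      # (B.3)
    dTB = (HC * QB - aBt * ABt * (TB - TBt)
           + c1 * F1 * T1 - cF * F1 * TB) / (cI * WI)            # (B.4)
    dCF = (CB * F1 + C2 * F2 - XV * V * QC
           - CF * (F1 + F2)) / VF                                # (B.5)
    dTF = (HV * V * QC - aFt * AFt * (TF - TFt) + cF * F1 * TB
           + c2 * F2 * T2 - cF * (F1 + F2) * TF) / (cF * VF)     # (B.6)
    PC = HC * QB + HV * V * QC                    # (B.7)
    dP = (PC - P) / tau_mix                       # (B.8)
    return [dWC, dCB, dCF, dTB, dTF, dP]


def simulate(x0, u, dt, n_steps):
    """Forward-Euler integration with constant inputs u."""
    x = list(x0)
    for _ in range(n_steps):
        dx = derivatives(x, u)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# 30000 s of simulated time from an arbitrary (hypothetical) initial state.
x_end = simulate([100.0, 0.05, 0.05, 900.0, 800.0, 0.0],
                 (2.6, 3.1, 8.4), dt=0.5, n_steps=60000)
```

With constant inputs the state should settle to the steady state of Section B.1.4; the slowest modes here have time constants of the order of a thousand seconds or more, so a long horizon is needed before comparing.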
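B.1.4 Steady state

The equations can be solved in steady state. Bed fuel inventory and bed oxygen content are functions of the fuel feed and the primary air flows:

$$W_C = \frac{C_1 t_C (1 - V) F_1 Q_C}{C_1 F_1 - X_C (1 - V) Q_C} \tag{B.9}$$

$$C_B = C_1 - \frac{X_C (1 - V) Q_C}{F_1} \tag{B.10}$$

Let us solve the equations, eliminating variables other than $Q_C$, $F_1$ and $F_2$. Bed temperatures depend on the fuel power:

$$T_B = \frac{H_C (1 - V) Q_C + c_1 F_1 T_1 + a_{Bt} A_{Bt} T_{Bt}}{a_{Bt} A_{Bt} + c_F F_1} \tag{B.11}$$

Flue gas oxygen is influenced mainly by the secondary air flow and the fuel feed:

$$C_F = \frac{C_1 F_1 + C_2 F_2 - X_C (1 - V) Q_C - X_V V Q_C}{F_1 + F_2} \tag{B.12}$$

Freeboard temperatures depend on the heat released by the volatiles:

$$T_F = \frac{\left[H_V V Q_C + a_{Ft} A_{Ft} T_{Ft} + c_2 F_2 T_2\right]\left[a_{Bt} A_{Bt} + c_F F_1\right] + c_F F_1\left[H_C (1 - V) Q_C + c_1 F_1 T_1 + a_{Bt} A_{Bt} T_{Bt}\right]}{\left[a_{Bt} A_{Bt} + c_F F_1\right]\left[a_{Ft} A_{Ft} + c_F (F_1 + F_2)\right]} \tag{B.13}$$

Power depends entirely on the fuel feed and its heat value:

$$P = H_C (1 - V) Q_C + H_V V Q_C \tag{B.14}$$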
B.2 Tuning of the model

The above model is very simple. Its weakness is that it describes only a few of the phenomena involved in a complex process such as FBC combustion. For example, the assumptions on fluidization, combustion and heat transfer are very elementary. Therefore, the model needs to be tuned in order to match plant measurements. The advantage is that a model with a simple structure is also simple to tune using standard methods. Much more detailed models have been constructed for the FBC process. From a practical point of view, the calculation times and the lack of accurate measurements restrict the use of these models. It is common in practice that a simple mass or energy balance cannot be closed based on plant measurements, due to systematic errors in measurements. Many of the internal parameters related to combustion, fluidization, and heat transfer can be accurately measured only in laboratory conditions, and are not applicable to a real plant. The advanced models can be extremely useful in helping to develop and understand the process. In automatic process control, however, their significance is smaller.
B.2.1 Initial values

The tuning of the model was divided into three phases. The initial values were found from the literature (heat values and heat transfer coefficients, O2 consumption, plant geometry). These are given in Table B.1.
B.2.2 Steady-state behavior

The steady-state behavior of the model was adjusted first. The following tuning knobs were used:

$$H_C \leftarrow p_1 H_C, \qquad V \leftarrow p_2 V, \qquad X_C \leftarrow p_3 X_C \tag{B.15}$$

taking further that $H_V = H_C$ and $X_V = X_C$. A cost function was formulated as a weighted sum of squared errors between measured steady-state values and the predicted ones. The weights $w_i$ were chosen according to the scales of the variables. Using a data set of 11 steady-state points, Table B.2, the values of $p_i$ were estimated as $p = [0.2701,$
Table B.1: Constants for the FBC model.

Bed:
  bed material specific heat        $c_I = 800$                  [J/(kg K)]
  bed inert material                $W_I = 25000$                [kg]
  volume                            $V_B = 26.3$                 [m^3]
  heat transfer coefficient         $a_{Bt} = 210$               [W/(m^2 K)]
  heat exchange surface             $A_{Bt} = 26.8$              [m^2]
  cooling water temperature         $T_{Bt} = 573$               [K]
Freeboard:
  volume                            $V_F = 128.1$                [m^3]
  heat transfer coefficient         $a_{Ft} = 210$               [W/(m^2 K)]
  heat exchange surface             $A_{Ft} = 130.7$             [m^2]
  cooling water temperature         $T_{Ft} = 573$               [K]
Air flows:
  primary air O2 content            $C_1 = 0.21$                 [Nm^3/Nm^3]
  primary air specific heat         $c_1 = 1305$                 [J/(Nm^3 K)]
  primary air temperature           $T_1 = 328$                  [K]
  secondary air O2 content          $C_2 = 0.21$                 [Nm^3/Nm^3]
  secondary air specific heat       $c_2 = 1305$                 [J/(Nm^3 K)]
  secondary air temperature         $T_2 = 328$                  [K]
  flue gas specific heat            $c_F = 1305$                 [J/(Nm^3 K)]
Fuel feed:
  O2 consumed in combustion         $X_C = 1.886$                [Nm^3/kg]
  heat value of char                $H_C = 30 \times 10^6$       [J/kg]
  mean combustion rate              $t_C = 50$                   [s]
  fraction of volatiles             $V = 0.75$                   [kg/kg]
  O2 consumed in combustion         $X_V = 1.225$                [Nm^3/kg]
  heat value of volatiles           $H_V = 50 \times 10^6$       [J/kg]
Other:
  time constant                     $\tau_{mix} = 300$           [s]
Table B.2: Steady-state data from a FBC plant.

  $Q_C$ [kg/s]   $F_1$ [Nm^3/s]   $F_2$ [Nm^3/s]   $C_F$ [% vol]   $T_B$ [°C]   $T_F$ [°C]   $P$ [MW]
  2.2            3.5              7.9              5.1             696          556          19.1
  2.3            3.5              6.5              3.0             662          607          19.3
  2.3            3.5              9.8              6.9             696          550          19.2
  2.3            3.5              8.0              5.1             686          572          19.1
  1.6            2.5              4.4              2.9             650          581          13.1
  1.7            2.8              5.2              4.0             668          569          15.1
  1.7            2.8              6.4              5.4             696          530          14.3
  3.1            3.7              10.2             3.9             691          646          26.0
  3.7            3.0              8.6              2.5             681          628          27.0
  3.0            3.7              11.0             5.0             659          599          25.6

$0.9956, 0.4238]^T$ using the Levenberg-Marquardt method. The effective heat value was found to be only about 30 % of the heat value of dry fuel. As the fuel feed was taken as the measured mass input flux [kg/s] to the furnace, and moisture was not taken into account, this is acceptable. For volatiles, the 3/4 assumption was reasonable. The O2 consumption coefficient reflects the fact that less than half of the input feed consists of combustible components.

B.2.3 Dynamics
The dynamics were tuned by hand by comparing measurements from step-response experiments and corresponding simulations. First, the delays were examined and found negligible (equal to zero) from air flows $F_1$ and $F_2$, and 20 seconds from fuel feed $Q_C$. This was judged reasonable, as there is a transport delay from a change in the fuel belt conveyor speed to the introduction of a change in the flow to the furnace. In addition, some delay is due to the ignition of fuel, which was not taken into account in the model. Thus we have

$$Q_C(t) = Q_{C,actuator}(t - 20) \tag{B.17}$$

where $Q_{C,actuator}$ is the measured fuel flow. The transport delay in air flows was insignificant. The time constants for $C_F$ were found adequate (bed and freeboard volumes). The time constants for temperatures were adjusted by altering the mass of the bed inert material $W_I$ and the freeboard time constant $c_F V_F$ (in the $dT_F/dt$ equation (B.6) only); for power, these were adjusted by setting the time constant $\tau_{mix}$. For $W_I$ a value of 480 kg was found reasonable; the $c_F V_F$ for $T_F$ was multiplied by 35 in order to have reasonable responses.
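In a sampled simulation, the fuel feed delay (B.17) amounts to shifting the measured actuator signal by a whole number of samples. The sketch below is one simple way to do this with a FIFO buffer. The helper name `delayed`, the initial fill value and the sampling time of 4 s (borrowed from the example in Section B.3, so that 20 s corresponds to 5 samples) are assumptions made for illustration.

```python
from collections import deque


def delayed(signal, delay_samples, initial=0.0):
    """Return the signal shifted by delay_samples steps.

    The first delay_samples outputs are filled with `initial`, i.e. the
    value the input is assumed to have had before the record started.
    """
    buffer = deque([initial] * delay_samples)
    out = []
    for value in signal:
        buffer.append(value)          # newest sample enters the queue
        out.append(buffer.popleft())  # oldest sample leaves it
    return out


Ts = 4.0                       # sampling time [s], assumed
n_delay = round(20.0 / Ts)     # 20 s transport delay as samples
qc_actuator = [2.6, 2.6, 2.8, 2.8, 2.8, 2.8, 2.8, 2.8]
qc_furnace = delayed(qc_actuator, n_delay, initial=2.6)
```

Filling the buffer with the pre-step value of the fuel flow, rather than zero, avoids an artificial transient at the start of the simulation.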
Figure B.2: Response of the tuned FBC model (solid lines) against measurements (dotted lines) for a 25 MW FBC plant. Time axis: t [min].
B.2.4 Performance of the model

Figures B.2-B.3 illustrate the performance of the model with respect to data measured from a 25 MW semi-circulated FBC plant, for step-like changes in $Q_C$, $F_1$ and $F_2$. Figure B.2 illustrates the steady-state performance. Note that the data used for tuning the steady states of the model was not taken from these experiments. The main characteristics of the O2 in flue gases, as well as of the freeboard temperatures, are captured by the model. For bed temperatures the response is poor. Figure B.3 shows a smaller section of the same simulation. For O2 the dynamic response is good. For freeboard temperatures the response seems like an acceptable first order approximation of a second order process. Again, the predictability for bed temperatures is poor.
Figure B.3: Response of the tuned FBC model (solid lines) against measurements (dotted lines) for a 25 MW FBC plant. Time axis: t [min].
B.3 Linearization of the model

The differential equation model (B.1)-(B.6) is a nonlinear one, containing bilinear terms such as $W_C C_B$ in (B.1) and $C_B(t) F_1(t)$ in (B.3). This can be linearized around an operating point. Let the operating point be given by:

$$\left\{\bar Q_C, \bar F_1, \bar F_2, \bar W_C, \bar C_B, \bar C_F, \bar T_B, \bar T_F, \bar P\right\} \tag{B.18}$$
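If this is a steady state point, then $\{\bar W_C, \bar C_B, \bar C_F, \bar T_B, \bar T_F, \bar P\}$ can be determined by specifying $\{\bar Q_C, \bar F_1, \bar F_2\}$ and using the steady state equations (B.9)-(B.14). Using a Taylor-series expansion, we obtain the following linearized continuous time model:

$$\frac{dW_C(t)}{dt} = (1 - V)\left[Q_C(t) - \bar Q_C\right] - \frac{\bar C_B}{t_C C_1}\left[W_C(t) - \bar W_C\right] - \frac{\bar W_C}{t_C C_1}\left[C_B(t) - \bar C_B\right] + (1 - V)\,\bar Q_C - \frac{\bar W_C \bar C_B}{t_C C_1} \tag{B.19}$$

$$\frac{dC_B(t)}{dt} = \frac{C_1 - \bar C_B}{V_B}\left[F_1(t) - \bar F_1\right] - \frac{X_C \bar C_B}{V_B t_C C_1}\left[W_C(t) - \bar W_C\right] - \frac{1}{V_B}\left(\frac{X_C \bar W_C}{t_C C_1} + \bar F_1\right)\left[C_B(t) - \bar C_B\right] + \frac{1}{V_B}\left(C_1 \bar F_1 - \frac{X_C \bar W_C \bar C_B}{t_C C_1} - \bar C_B \bar F_1\right) \tag{B.20}$$

$$\frac{dC_F(t)}{dt} = -\frac{X_V V}{V_F}\left[Q_C(t) - \bar Q_C\right] + \frac{\bar C_B - \bar C_F}{V_F}\left[F_1(t) - \bar F_1\right] + \frac{C_2 - \bar C_F}{V_F}\left[F_2(t) - \bar F_2\right] + \frac{\bar F_1}{V_F}\left[C_B(t) - \bar C_B\right] - \frac{\bar F_1 + \bar F_2}{V_F}\left[C_F(t) - \bar C_F\right] + \frac{1}{V_F}\left(\bar C_B \bar F_1 + C_2 \bar F_2 - X_V V \bar Q_C - \bar C_F\left(\bar F_1 + \bar F_2\right)\right) \tag{B.21}$$

$$\frac{dT_B(t)}{dt} = \frac{c_1 T_1 - c_F \bar T_B}{c_I W_I}\left[F_1(t) - \bar F_1\right] + \frac{H_C \bar C_B}{c_I W_I t_C C_1}\left[W_C(t) - \bar W_C\right] + \frac{H_C \bar W_C}{c_I W_I t_C C_1}\left[C_B(t) - \bar C_B\right] - \frac{a_{Bt} A_{Bt} + c_F \bar F_1}{c_I W_I}\left[T_B(t) - \bar T_B\right] + \frac{1}{c_I W_I}\left(\frac{H_C \bar W_C \bar C_B}{t_C C_1} - a_{Bt} A_{Bt}\left(\bar T_B - T_{Bt}\right) + c_1 \bar F_1 T_1 - c_F \bar F_1 \bar T_B\right) \tag{B.22}$$

$$\frac{dT_F(t)}{dt} = \frac{H_V V}{c_F V_F}\left[Q_C(t) - \bar Q_C\right] + \frac{c_F\left(\bar T_B - \bar T_F\right)}{c_F V_F}\left[F_1(t) - \bar F_1\right] + \frac{c_2 T_2 - c_F \bar T_F}{c_F V_F}\left[F_2(t) - \bar F_2\right] + \frac{c_F \bar F_1}{c_F V_F}\left[T_B(t) - \bar T_B\right] - \frac{c_F\left(\bar F_1 + \bar F_2\right) + a_{Ft} A_{Ft}}{c_F V_F}\left[T_F(t) - \bar T_F\right] + \frac{H_V V \bar Q_C - c_F\left(\bar F_1 + \bar F_2\right)\bar T_F - a_{Ft} A_{Ft}\left(\bar T_F - T_{Ft}\right) + c_2 \bar F_2 T_2 + c_F \bar F_1 \bar T_B}{c_F V_F} \tag{B.23}$$

$$\frac{dP(t)}{dt} = \frac{1}{\tau_{mix}}\left\{H_V V\left[Q_C(t) - \bar Q_C\right] + \frac{H_C \bar C_B}{t_C C_1}\left[W_C(t) - \bar W_C\right] + \frac{H_C \bar W_C}{t_C C_1}\left[C_B(t) - \bar C_B\right] - \left[P(t) - \bar P\right] + \frac{H_C \bar W_C \bar C_B}{t_C C_1} + H_V V \bar Q_C - \bar P\right\} \tag{B.24}$$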
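Equations (B.19)-(B.24) can be expressed as a linear state-space model around an operating point

$$\frac{dx(t)}{dt} = A_c x(t) + B_c u(t) \tag{B.25}$$

$$y(t) = C_c x(t) \tag{B.26}$$

where the vectors $x$, $u$ and $y$ are given by deviations from the point of linearization

$$x(t) = \begin{bmatrix} W_C(t) - \bar W_C \\ C_B(t) - \bar C_B \\ C_F(t) - \bar C_F \\ T_B(t) - \bar T_B \\ T_F(t) - \bar T_F \\ P(t) - \bar P \end{bmatrix}; \qquad u(t) = \begin{bmatrix} Q_C(t) - \bar Q_C \\ F_1(t) - \bar F_1 \\ F_2(t) - \bar F_2 \end{bmatrix} \tag{B.27}$$

The coefficients of the matrices $A_c$ and $B_c$ are obtained from (B.19)-(B.24), $a_{1,1} = -\bar C_B / (t_C C_1)$, $b_{1,1} = (1 - V)$, etc. Usually the flue gas O2, bed and freeboard temperatures and power outtake are measured:

$$y(t) = \begin{bmatrix} C_F(t) - \bar C_F \\ T_B(t) - \bar T_B \\ T_F(t) - \bar T_F \\ P(t) - \bar P \end{bmatrix} \quad \text{and} \quad C_c = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \tag{B.28}$$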
For many practical purposes, the model needs to be discretized. The approximation of a continuous-time state-space model by a discrete-time model is straightforward (see, e.g., [64]), and results in

$$x(k + 1) = A x(k) + B u(k) \tag{B.29}$$

$$y(k) = C x(k) \tag{B.30}$$
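where $t = kT_s$ and $T_s$ is the sampling time.

Example 43 (Linearization) Let us linearize the model in a steady state given by

$$\bar Q_C = 2.6\ \text{kg/s}, \quad \bar F_1 = 3.1\ \text{Nm}^3/\text{s}, \quad \bar F_2 = 8.4\ \text{Nm}^3/\text{s} \tag{B.31}$$

The remaining states of the operating point are then given by

$$\bar W_C = 165\ \text{kg}, \quad \bar C_B = 0.042, \quad \bar C_F = 0.031 \tag{B.32}$$

$$\bar T_B = 749\ ^\circ\text{C}, \quad \bar T_F = 650\ ^\circ\text{C}, \quad \bar P = 21.1\ \text{MW} \tag{B.33}$$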
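The operating point can be checked against the steady-state equations (B.9)-(B.14). The sketch below assumes the constants of Table B.1 scaled by the printed tuning factors $p = [0.2701, 0.9956, 0.4238]$, with $H_V = H_C$ and $X_V = X_C$. The function name `steady_state` is illustrative, and the freeboard temperature is evaluated in the equivalent two-step form (first $T_B$, then the freeboard energy balance) rather than in the expanded form. Because the printed factors are rounded, the inventory and oxygen values agree with (B.32) only to within a few percent, while temperatures and power agree closely.

```python
# Table B.1 constants with tuned H_C, V, X_C (Section B.2.2); assumed values.
P1, P2, P3 = 0.2701, 0.9956, 0.4238
C1 = C2 = 0.21
c1 = c2 = cF = 1305.0
T1 = T2 = 328.0
aBt, ABt, TBt = 210.0, 26.8, 573.0
aFt, AFt, TFt = 210.0, 130.7, 573.0
tC = 50.0
HC = HV = P1 * 30e6
V = P2 * 0.75
XC = XV = P3 * 1.886


def steady_state(QC, F1, F2):
    """Steady state of the FBC model for constant inputs, (B.9)-(B.14)."""
    WC = C1 * tC * (1 - V) * F1 * QC / (C1 * F1 - XC * (1 - V) * QC)  # (B.9)
    CB = C1 - XC * (1 - V) * QC / F1                                  # (B.10)
    TB = ((HC * (1 - V) * QC + c1 * F1 * T1 + aBt * ABt * TBt)
          / (aBt * ABt + cF * F1))                                    # (B.11)
    CF = ((C1 * F1 + C2 * F2 - XC * (1 - V) * QC - XV * V * QC)
          / (F1 + F2))                                                # (B.12)
    # Freeboard energy balance at rest, using TB from (B.11); this is
    # algebraically the same as the expanded expression (B.13).
    TF = ((HV * V * QC + aFt * AFt * TFt + c2 * F2 * T2 + cF * F1 * TB)
          / (aFt * AFt + cF * (F1 + F2)))
    P = HC * (1 - V) * QC + HV * V * QC                               # (B.14)
    return {"WC": WC, "CB": CB, "CF": CF, "TB": TB, "TF": TF, "P": P}


op = steady_state(2.6, 3.1, 8.4)
TB_celsius = op["TB"] - 273.15
TF_celsius = op["TF"] - 273.15
P_megawatt = op["P"] / 1e6
```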
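For a continuous time state-space model we obtain

$$A_c = \begin{bmatrix}
-0.0040 & -15.6819 & 0 & 0 & 0 & 0 \\
-0.0001 & -0.5908 & 0 & 0 & 0 & 0 \\
0 & 0.0242 & -0.0898 & 0 & 0 & 0 \\
0.0027 & 10.5892 & 0 & -0.0008 & 0 & 0 \\
0 & 0 & 0 & 0.0005 & -0.0051 & 0 \\
0.0001 & 0.4236 & 0 & 0 & 0 & -0.0033
\end{bmatrix} \tag{B.34}$$

$$B_c = \begin{bmatrix}
0.2533 & 0 & 0 \\
0 & 0.0064 & 0 \\
-0.0046 & 0.0001 & 0.0014 \\
0 & -0.755 & 0 \\
0.7238 & 0.0155 & -0.0929 \\
0.0202 & 0 & 0
\end{bmatrix} \tag{B.35}$$

$$C_c = \begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix} \tag{B.36}$$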
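The entries of $A_c$ and $B_c$ are partial derivatives of the model right-hand sides at the operating point, so they can be cross-checked numerically. The sketch below does this by central differences for the mass-balance states $(W_C, C_B, C_F)$ only, since these rows do not depend on the hand-tuned thermal time constants. It assumes $Q_B = W_C C_B / (t_C C_1)$, the tuned parameters of Section B.2.2 as printed, and the operating point (B.31)-(B.32); the names `f` and `jacobians` are illustrative.

```python
# Tuned constants (Table B.1 scaled by p of Section B.2.2); assumed values.
C1 = C2 = 0.21
VB, VF, tC = 26.3, 128.1, 50.0
V = 0.9956 * 0.75
XC = XV = 0.4238 * 1.886


def f(x, u):
    """Right-hand sides of (B.2), (B.3) and (B.5)."""
    WC, CB, CF = x
    QC, F1, F2 = u
    QB = WC * CB / (tC * C1)          # assumed form of the combustion rate
    return [(1 - V) * QC - QB,
            (C1 * F1 - XC * QB - CB * F1) / VB,
            (CB * F1 + C2 * F2 - XV * V * QC - CF * (F1 + F2)) / VF]


def jacobians(x0, u0, rel_step=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du at (x0, u0)."""
    def column(z, i, evaluate):
        h = rel_step * max(1.0, abs(z[i]))
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        fp, fm = evaluate(zp), evaluate(zm)
        return [(a - b) / (2.0 * h) for a, b in zip(fp, fm)]

    a_cols = [column(x0, i, lambda x: f(x, u0)) for i in range(len(x0))]
    b_cols = [column(u0, j, lambda u: f(x0, u)) for j in range(len(u0))]
    # Each `column` result is one Jacobian column; transpose to row-major.
    A = [list(row) for row in zip(*a_cols)]
    B = [list(row) for row in zip(*b_cols)]
    return A, B


A, B = jacobians([165.0, 0.042, 0.031], [2.6, 3.1, 8.4])
```

The upper-left blocks of the printed matrices are reproduced to within rounding of the operating point, and the signs of the entries follow from the derivatives (for example, the diagonal entries are negative).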
Using $T_s = 4$ s, a linearized discrete-time state-space model is obtained (this is simple to accomplish numerically using suitable software, such as Matlab). For convenience, the model is given in a transfer polynomial form. We have from the fuel feed, for $C_F(q^{-1})/Q_C(q^{-1})$, $T_B(q^{-1})/Q_C(q^{-1})$, $T_F(q^{-1})/Q_C(q^{-1})$ and $P(q^{-1})/Q_C(q^{-1})$:
0.01549q 6 (1  0.9957q1) (1  0.09308q~) (R 5 (1  0.6233q ~)(1 + ~) 0.003362q 0.5543q 2 ~) (1  0.9968q~) (1  0.09293q 2.866q 6 (1  1.994q~ 2 + )0.9936q (1  0.9799q1) 2(1  9968q1)
(B.38) (B.39)
0.08027q6 (1  0.0923q~) q) (1  0.9958q (1
~
~~’(i
~~
(i
_~~qq1)(B.40)
From primary air flow:
C (q 1) f2(q 1) TB(q
1) F2(q TF(q P(q~) 1) F2(q
0.00084931q~ (1  0.9872q1) 1) (1 + 0.2433q (B.41) ~) 1) (1 0.6983q (1  0.9968q (1  :t) 0.09293q 0.020027q1 (1  1.006q1) (1  ~) 7,9q (B.42) ~ a) (1  0.9968q’) (1  0.09293q 0.06135q’ (1  0.095q~) (1  0.OSTZq’) (1  1.00Zq(~),,,, (1  0.9799q1) (1  0.9968q1)2 (1  0.09293q1)~’’" "6) ~) 0.011219q’ (1  q’)(1 + 0.4645q (1  0.9868q1) (1  0.9968q’) (1  0.09293q’)
(B.44)
and from the secondary air flow:

$$\frac{C_F(q^{-1})}{F_2(q^{-1})} = \frac{0.0046901\,q^{-1}}{1 - 0.6983\,q^{-1}} \tag{B.45}$$

$$\frac{T_B(q^{-1})}{F_2(q^{-1})} = 0 \tag{B.46}$$

$$\frac{T_F(q^{-1})}{F_2(q^{-1})} = \frac{-0.36785\,q^{-1}}{1 - 0.9799\,q^{-1}} \tag{B.47}$$

$$\frac{P(q^{-1})}{F_2(q^{-1})} = 0 \tag{B.48}$$

The performance of the linearized model is depicted in Fig. B.4.

Figure B.4: Step responses of the linearized model, linearized at steady state $Q_C = 2.6$ [kg/s], $F_1 = 3.1$ [Nm^3/s], $F_2 = 8.4$ [Nm^3/s]. Time axis: t [min].
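The secondary air flow acts on $C_F$ and $T_F$ only through their own first-order states, so the transfer functions (B.45) and (B.47) can be checked by hand. The sketch below applies the standard zero-order-hold discretization of a scalar system $\dot x = a x + b u$, using the rounded diagonal and input entries of $A_c$ and $B_c$ from (B.34)-(B.35). The function name `zoh_first_order` is illustrative, and small differences from the printed coefficients are expected because the matrix entries are rounded to four decimals.

```python
import math


def zoh_first_order(a, b, Ts):
    """Zero-order-hold discretization of dx/dt = a*x + b*u.

    Returns (a_d, b_d) such that x(k+1) = a_d*x(k) + b_d*u(k), i.e. the
    transfer function b_d*q^-1 / (1 - a_d*q^-1). Requires a != 0.
    """
    a_d = math.exp(a * Ts)
    b_d = b * (a_d - 1.0) / a
    return a_d, b_d


Ts = 4.0  # sampling time [s]

# Flue gas O2 from secondary air: a = a_{3,3}, b = b_{3,3}
cf_pole, cf_gain = zoh_first_order(-0.0898, 0.0014, Ts)

# Freeboard temperature from secondary air: a = a_{5,5}, b = b_{5,3}
tf_pole, tf_gain = zoh_first_order(-0.0051, -0.0929, Ts)
```

The negative gain for $T_F$ reflects that additional secondary air, entering at 328 K, cools the freeboard.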
Bibliography

[1] J Andrews. A mathematical model for the continuous culture of microorganisms using inhibitory substrate. Biotechnology and Bioengineering, 10:707-723, 1968.
[2] K Åström. Introduction to Stochastic Control Theory. Academic Press, New York, 1970.
[3] K Åström. Maximum likelihood and prediction error methods. Automatica, 16:551-574, 1980.
[4] K Åström and R Bell. Drum-boiler dynamics. Automatica, 36:363-378, 2000.
[5] K Åström and K Wittenmark. Computer Controlled Systems: Theory and Design. Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1990.
[6] N Baba. A new approach for finding the global minimum of error function of neural networks. Neural Networks, 2:367-373, 1989.
[7] R Battiti. First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4:141-166, 1992.
[8] R Bitmead, M Gevers, and V Wertz. Adaptive Optimal Control - The Thinking Man's GPC. Prentice-Hall, New York, 1990.
[9] R Brockett and P Krishnaprasad. A scaling theory for linear systems. IEEE Transactions on Automatic Control, 25(2):197-207, 1980.
[10] W Buntine and A Weigend. Computing second derivatives in feed-forward networks: A review. IEEE Transactions on Neural Networks, 5:480-488, 1994.
[11] R Bush and F Mosteller. Stochastic Models for Learning. John Wiley and Sons, New York, 1958.
[12] J Castro and M Delgado. Fuzzy systems with defuzzification are universal approximators. IEEE Transactions on Systems, Man and Cybernetics, 26:149-152, 1996.
[13] D Clarke, C Mohtadi, and P Tufts. Generalized predictive control - part. Automatica, 23(2):137-148, 1989.
[14] C Cutler and B Ramaker. Dynamic matrix control: A computer control algorithm. JAAC, pages 00, 1980.
[15] T Edgar and D Himmelblau. Optimization of Chemical Processes. McGraw-Hill, New York, 1989.
[16] J Edmunds. Input and output scaling and reordering for diagonal dominance and block diagonal dominance. IEE Proceedings - Control Theory and Applications, 145(6):523-530, 1998.
[17] E Eskinat, S Johnson, and W Luyben. Use of Hammerstein models in identification of nonlinear systems. AIChE Journal, 37(2):255-268, 1991.
[18] R L Eubank. Nonparametric Regression and Spline Smoothing. Marcel Dekker, New York, 1999.
[19] P Eykhoff. System Identification: Parameter and State Estimation. John Wiley and Sons, New York, 1974.
[20] R Fox and L Fan. Stochastic modeling of chemical process systems: Parts I-III. Chemical Engineering Education, XXIV:56-59, 88-92, 164-167, 1990.
[21] G Franklin, J Powell, and M Workman. Digital Control of Dynamic Systems. Addison Wesley Longman Inc., Menlo Park, U.S.A., 1998.
[22] D Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Massachusetts, 1989.
[23] G Goodwin and K Sin. Adaptive Filtering, Prediction and Control. Prentice-Hall, New Jersey, 1984.
[24] M Hagan and M Menhaj. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, 5:989-993, 1994.
[25] W Härdle. Applied Nonparametric Regression. Cambridge University Press, New York, 1990.
[26] T Hastie and R Tibshirani. Generalized Additive Models. Chapman and Hall, London, 1990.
[27] S Haykin. Neural Networks: A Comprehensive Foundation. MacMillan, New York, 1994.
[28] M Henson and D Seborg. Adaptive nonlinear control of a pH neutralization process. IEEE Transactions on Control Systems Technology, 2(3):169-182, 1994.
[29] J Hertz, A Krogh, and R Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, 1991.
[30] K Hornik, M Stinchcombe, and H White. Multilayer feedforward neural networks are universal approximators. Neural Networks, 2:359-366, 1989.
[31] K Hunt, R Haas, and R Murray-Smith. Extending the functional equivalence of radial basis function networks and fuzzy inference systems. IEEE Transactions on Neural Networks, 7:776-781, 1996.
[32] H Hyötyniemi. Self-Organizing Artificial Neural Networks in Dynamic Systems Modeling and Control. PhD thesis, Helsinki University of Technology, 1994.
[33] E Ikonen. Pedin polttoainekertymän mallintaminen leijukerrospoltossa. Master's thesis, University of Oulu, 1991.
[34] E Ikonen. Algorithms for Process Modelling Using Fuzzy Neural Networks: A Distributed Logic Processor Approach. PhD thesis, University of Oulu, 1996.
[35] E Ikonen and U Kortela. Dynamic model of a fluidized bed coal combustor. Control Engineering Practice, 2(6):1001-1006, 1994.
[36] E Ikonen and K Najim. Non-linear process modelling based on a Wiener approach. Journal of Systems and Control Engineering - Proceedings of the Institution of Mechanical Engineers Part I, to appear.
[37] E Ikonen, K Najim, and U Kortela. Process identification using Hammerstein systems. In IASTED International Conference on Modelling, Identification and Control (MIC 2000), Innsbruck, Austria, 2000. IASTED.
[38] R Isermann. Digital Control Systems. Springer-Verlag, Heidelberg, 1981.
[39] A Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, New York, 1970.
[40] M Johansson. A Primer on Fuzzy Control (Report). Lund Institute of Technology, 1996.
[41] R Johansson. System Modeling and Identification. Prentice-Hall, New Jersey, 1993.
[42] E Katende and A Jutan. Nonlinear predictive control of complex processes. Industrial Engineering Chemistry Research, 35:3539-3546, 1996.
[43] E Katende, A Jutan, and R Corless. Quadratic nonlinear predictive control. Industrial Engineering Chemistry Research, 37(7):2721-2728, 1998.
[44] J Kiefer and J Wolfowitz. Stochastic estimation of the maximum of a regression. Annals of Mathematics and Statistics, 23:462-466, 1952.
[45] M Kinnaert. Adaptive generalized predictive controller for MIMO systems. International Journal of Control, 50(1):161-172, 1989.
[46] S Kirkpatrick, C Gelatt, and M Vecchi. Optimization by simulated annealing. Science, 220:671-680, 1983.
[47] G Klir and T Folger. Fuzzy Sets, Uncertainty and Information. Prentice-Hall, 1988.
[48] T Knudsen. Consistency analysis of subspace identification methods based on a linear regression approach. Automatica, 37(1):81-89, 2001.
[49] T Kohonen. The self-organising map. IEEE Proceedings, 78:1464-1486, 1990.
[50] R Kruse, J Gebhardt, and F Klawonn. Foundations of Fuzzy Systems. John Wiley and Sons, Chichester, 1994.
[51] I Landau, R Lozano, and M M'Saad. Adaptive Control. Springer Verlag, London, 1997.
[52] P Lee, R Newell, and I Cameron. Process Management and Control. http://wwweng2.murdoch.edu.au/m288/resources/textbook/title.htm, 2000.
[53] P Lindskog. Methods, Algorithms and Tools for System Identification Based on Prior Knowledge. PhD thesis, Linköping University, 1996.
[54] L Ljung and T McKelvey. A least squares interpretation of subspace methods for system identification. In Proceedings of the 35th Conference on Decision and Control, pages 335-342. IEEE, 1996.
[55] L Ljung and T Söderström. Theory and Practice of Recursive Identification. MIT Press, Cambridge, Massachusetts, 1983.
[56] R Luus and T Jaakola. Optimization by direct search and systematic reduction of the size of the search region. AIChE Journal, 19:760-766, 1973.
[57] J Maciejowski. Multivariable Feedback Design. Addison-Wesley, Wokingham, England, 1989.
[58] A Mészaros, M Brdys, P Tatjewski, and P Lednicky. Multilayer adaptive control of continuous bioprocesses using optimising control techniques. Case study: Baker's yeast culture. Bioprocess Engineering, 12:1-9, 1995.
[59] A Mészaros, P Dostal, and J Mikles. Development of tubular chemical reactor models for control purposes. Chemical Papers, 48:69-72, 1994.
[60] J Monod. Recherche sur la Croissance des Cultures Bactériennes. Herman, Paris, 1942.
[61] E Nahas, M Hanson, and D Seborg. Nonlinear internal model control strategy for neural network models. Computers and Chemical Engineering, 29(4):1039-1057, 1992.
[62] K Najim. Process Modeling and Control in Chemical Engineering. Marcel Dekker Inc., New York, 1988.
[63] K Najim and E Ikonen. Distributed logic processors trained under constraints using stochastic approximation techniques. IEEE Transactions on Systems, Man, and Cybernetics - A, 29:421-426, 1999.
[64] K Najim and E Ikonen. Outils Mathématiques pour le Génie des Procédés - Cours et Exercices Corrigés. Dunod, Paris, 1999.
[65] K Najim, A Poznyak, and E Ikonen. Calculation of residence time for non linear systems. International Journal of Systems Science, 27:661-667, 1996.
[66] K Najim, A Rusnak, A Mészaros, and M Fikar. Constrained long-range predictive control based on artificial neural networks. International Journal of Systems Science, 28:1211-1226, 1997.
[67] K Narendra and M A L Thathachar. Learning Automata an Introduction. Prentice-Hall, Englewood Cliffs, New Jersey, 1989.
[68] S Norquay, A Palazoglu, and J Romagnoli. Application of Wiener model predictive control (WMPC) to a pH neutralization experiment. IEEE Transactions on Control Systems Technology, 7(4):437-445, 1999.
[69] A Ordys and D Clarke. A state-space description for GPC controllers. International Journal of Systems Science, 24(9):1727-1744, 1993.
[70] J Parkum, J Poulsen, and J Holst. Recursive forgetting algorithms. International Journal of Control, 55(1):109-128, 1990.
[71] T Parthasarathy. On Global Univalence Theorems. Springer-Verlag, Berlin, 1983.
[72] R Pearson. Discrete-Time Dynamic Models. Oxford University Press, Oxford, 1999.
[73] W Pedrycz. Fuzzy Control and Fuzzy Systems. John Wiley and Sons, New York, 1989.
[74] J Penttinen and H Koivo. Multivariable tuning regulators for unknown system. Automatica, 16:393-398, 1980.
[75] A Poznyak and K Najim. Learning Automata and Stochastic Optimization. Springer, Berlin, 1997.
[76] M Pyykkö. Leijupetikattilan tulipesän säätöjen simulointi. Tampere University of Technology, Finland, 1989.
[77] J M Quero, E F Camacho, and L G Franquelo. Neural network for constrained predictive control. IEEE Transactions on Circuits and Systems - I: Fundamental Theory and Applications, 40:621-626, 1993.
[78] D Ratkowsky. Nonlinear Regression Modeling: A Unified Practical Approach. Marcel Dekker Inc., New York, 1983.
[79] J Richalet, A Rault, J Testud, and J Papon. Model predictive heuristic control: Applications to industrial processes. Automatica, 14:413-428, 1978.
[80] V Ruoppila, T Sorsa, and H Koivo. Recursive least-squares approach to self-organizing maps. In IEEE International Conference on Neural Networks, San Francisco, 1993.
[81] A Rusnak, M Fikar, K Najim, and A Mészaros. Generalized predictive control based on neural networks. Neural Processing Letters, 4:107-112, 1996.
[82] J Saint-Donat, N Bhat, and T McAvoy. Neural net based model predictive control. International Journal of Control, 54:1453-1468, 1991.
[83] D Sbarbaro, N Filatov, and H Unbehauen. Adaptive predictive controllers based on orthonormal series representation. International Journal of Adaptive Control and Signal Processing, 13:621-631, 1999.
[84] R Setiono and L Hui. Use of quasi-Newton method in a feedforward neural network construction algorithm. IEEE Transactions on Neural Networks, 6:273-277, 1995.
[85] S Shah and W Cluett. Recursive least squares based estimation schemes for self-tuning control. Canadian Journal of Chemical Engineering, 69:89-96, 1991.
[86] J Sjöberg, Q Zhang, L Ljung, A Benveniste, B Delyon, P Glorennec, H Hjalmarsson, and A Juditsky. Nonlinear black-box modelling in system identification: A unified overview. Automatica, 31:1691-1724, 1995.
[87] P van der Smagt. Minimisation methods for training feedforward neural networks. Neural Networks, 7:1-11, 1994.
[88] R Soeterboek. Predictive Control: A Unified Approach. Prentice Hall International, London, 1992.
[89] B Sonnleitner and O Käppeli. Growth of saccharomyces cerevisiae is controlled by its limited respiratory capacity. Formulation and verification of a hypothesis. Biotechnology and Bioengineering, 28:81-88, 1986.
[90] G Stephanopoulos. Chemical Process Control: An Introduction to Theory and Practice. Prentice-Hall, New York, 1984.
[91] K O Temeng, P D Schnelle, and T J McAvoy. Model predictive control of an industrial packed bed reactor using neural networks. Journal of Process Control, 5:19-27, 1995.
[92] A Visala. Modeling of Nonlinear Processes Using Wiener-NN Representation and Multiple Models. PhD thesis, Helsinki University of Technology, 1997.
[93] H Walk. Stochastic iteration for a constrained optimization problem. Communications in Statistics, Sequential Analysis, 2:369-385, 1983-1984.
[94] P Werbos. Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, 78:1550-1560, 1990.
[95] T Wigren. Recursive prediction error identification using the nonlinear Wiener model. Automatica, 29(4):1011-1025, 1993.
[96] C Yue and W Qinglin. A multivariable unified predictive control (UPC) algorithm based on the state space model. In K Seki, editor, 38th SICE Conference '99, pages 949-952, Morioka, Japan, 1999. The Society of Instrument and Control Engineers.
[97] K Zenger. Analysis and Control Design of a Class of Time Varying Systems. Report 88, Helsinki University of Technology, Control Engineering Laboratory, 1992.