Lecture Notes in Control and Information Sciences 404
Editors: M. Thoma, F. Allgöwer, M. Morari
Fouad Giri and Er-Wei Bai (Eds.)
Block-oriented Nonlinear System Identification
Series Advisory Board P. Fleming, P. Kokotovic, A.B. Kurzhanski, H. Kwakernaak, A. Rantzer, J.N. Tsitsiklis
Editors
Prof. Fouad Giri, Université de Caen, GREYC Lab, CNRS UMR 6072, 14032 Caen, France. E-mail: [email protected]
Dr. Er-Wei Bai, University of Iowa, Dept. of Electrical & Computer Engineering, Iowa Advanced Technology Laboratories, Iowa City, IA 52244, USA. E-mail: [email protected]
ISBN 978-1-84996-512-5
e-ISBN 978-1-84996-513-2
DOI 10.1007/978-1-84996-513-2
Lecture Notes in Control and Information Sciences, ISSN 0170-8643
Library of Congress Control Number: 2010932194
MATLAB: MATLAB and Simulink are registered trademarks of The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098, U.S.A.
© 2010 Springer-Verlag Berlin Heidelberg
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.
Printed on acid-free paper.
springer.com
Preface
Identification of block-oriented nonlinear systems has been an active research area for over five decades. In particular, over the last fifteen years there has been a resurgence of research interest: on the one hand, a large number of works are reported each year in diverse journals and conferences; on the other hand, new and unsolved problems keep arising from practical applications. A number of very successful invited sessions on the topic have been organised at recent IFAC SYSID symposia, the IEEE CDC, the ACC and other meetings, representing significant advances from theory to design and applications.

The idea of writing this book originated in an invited session organised by the editors at SYSID 2009 in Saint-Malo (France). We thought it was timely and highly beneficial to put together a monograph reflecting the wide variety of problems as well as methods, concepts and results in the field of block-oriented nonlinear systems. The present book summarises the state of the art in designing, analysing and implementing identification algorithms for this class of systems. It reports the most recent and significant developments in the area and presents some new research directions. It is intended for a broad spectrum of readers from academia and industry, including researchers, graduate students, engineers and practitioners from various fields. The book provides a thorough theoretical grounding while also addressing the needs of practitioners.

We are grateful to all our colleagues from around the world who contributed so brilliantly to this work, bringing together considerable knowledge from as wide a range of aspects of the area as possible. We would also like to thank Mr. Oliver Jackson and Ms. Charlotte Cross from Springer, UK, for their encouragement and help in editing the book. Special and warm thanks go to Vincent Van Assche for his valuable help in the preparation of the camera-ready copy.

Caen, France
Iowa City, IA, USA
June 2010

Fouad Giri
Er-Wei Bai
Contents
Part I: Block-oriented Nonlinear Models

1 Introduction to Block-oriented Nonlinear Systems
Er-Wei Bai, Fouad Giri
1.1 Block-oriented Nonlinear Systems
1.2 About This Book
1.2.1 Book Topics
1.2.2 Who Can Use This Book
References

2 Nonlinear System Modelling and Analysis from the Volterra and Wiener Perspective
Martin Schetzen
2.1 Introduction
2.2 General System Considerations
2.3 The Volterra Series
2.4 Applications of the Volterra Series
2.5 The Wiener Theory
2.6 The Wiener G-functionals
2.7 System Modelling with the G-functionals
2.8 General Wiener Model
2.9 The Gate Function Model
2.10 An Optimum System Calculator
References

Part II: Iterative and Overparameterization Methods

3 An Optimal Two-stage Identification Algorithm for Hammerstein–Wiener Nonlinear Systems
Er-Wei Bai
3.1 Introduction
3.2 Optimal Two-stage Algorithm
3.3 Concluding Remarks
References

4 Compound Operator Decomposition and Its Application to Hammerstein and Wiener Systems
Jozef Vörös
4.1 Introduction
4.2 Decomposition
4.2.1 Serial Application
4.2.2 Parallel Application
4.3 Decomposition of Block-oriented Nonlinear Systems
4.3.1 Hammerstein System
4.3.2 Wiener System
4.4 Identification of Hammerstein–Wiener Systems
4.4.1 Hammerstein–Wiener Systems
4.4.2 Piecewise-linear Characteristics
4.4.3 Algorithm
4.4.4 Example
4.4.5 Conclusions
References

5 Iterative Identification of Hammerstein Systems
Yun Liu, Er-Wei Bai
5.1 Introduction
5.2 Hammerstein System with IIR Linear Part
5.3 Non-smooth Nonlinearities
5.4 Examples
5.5 Conclusion
References

Part III: Stochastic Methods

6 Recursive Identification for Stochastic Hammerstein Systems
Han-Fu Chen
6.1 Introduction
6.2 Nonparametric f(·)
6.2.1 Identification of A(z)
6.2.2 Identification of B(z)
6.2.3 Identification of f(u)
6.3 Piecewise Linear f(·)
6.4 Parameterized Nonlinearity
6.5 Concluding Remarks
6.6 Appendix
References

7 Wiener System Identification Using the Maximum Likelihood Method
Adrian Wills, Lennart Ljung
7.1 Introduction
7.2 An Output-error Approach
7.3 The Maximum Likelihood Method
7.3.1 Likelihood Function for White Disturbances
7.3.2 Likelihood Function for Coloured Process Noise
7.4 Maximum Likelihood Algorithms
7.4.1 Direct Gradient-based Search Approach
7.4.2 Expectation Maximisation Approach
7.5 Simulation Examples
7.5.1 Example 1: White Process and Measurement Noise
7.5.2 Example 2: Coloured Process Noise
7.5.3 Example 3: Blind Estimation
7.6 Conclusion
References

8 Parametric Versus Nonparametric Approach to Wiener Systems Identification
Grzegorz Mzyk
8.1 Introduction to Wiener Systems
8.2 Nonlinear Least Squares Method
8.3 Nonparametric Identification Tools
8.3.1 Inverse Regression Approach
8.3.2 Cross-correlation Analysis
8.3.3 A Censored Sample Mean Approach
8.4 Combined Parametric-nonparametric Approach
8.4.1 Kernel Method with the Correlation-based Internal Signal Estimation
8.4.2 Identification of IIR Wiener Systems with Non-Gaussian Input
8.4.3 Recent Ideas
8.5 Conclusion
References

9 Identification of Block-oriented Systems: Nonparametric and Semiparametric Inference
M. Pawlak
9.1 Introduction
9.2 Nonparametric and Semiparametric Inference
9.3 Semiparametric Block-oriented Systems
9.3.1 Semiparametric Hammerstein Systems
9.3.2 Semiparametric Parallel Systems
9.4 Concluding Remarks
References

10 Identification of Block-oriented Systems Using the Invariance Property
Martin Enqvist
10.1 Introduction
10.2 Preliminaries
10.3 The Invariance Property and Separable Processes
10.4 Block-oriented Systems
10.5 Discussion
References

Part IV: Frequency Methods

11 Frequency Domain Identification of Hammerstein Models
Er-Wei Bai
11.1 Introduction
11.2 Problem Statement and Point Estimation
11.2.1 Continuous Time Frequency Response
11.2.2 Point Estimation of G(jω) Based on YT and UT
11.2.3 Implementation Using Sampled Data
11.3 Identification of G(s)
11.3.1 Finite-order Rational Transfer Function G(s)
11.3.2 Non-parametric G(s)
11.4 Identification of the Nonlinear Part f(u)
11.4.1 Unknown Nonlinearity Structure
11.4.2 Polynomial Nonlinearities
11.5 Simulation
11.6 Concluding Remarks
References

12 Frequency Identification of Nonparametric Wiener Systems
Fouad Giri, Youssef Rochdi, Jean-Baptiste Gning, Fatima-Zahra Chaoui
12.1 Introduction
12.2 Identification Problem Statement
12.3 Frequency Behaviour Geometric Interpretations
12.3.1 Characterisation of the Loci (x_n(t), w(t)) and (x_n^-(t), w(t))
12.3.2 Estimation of the Loci Cψ(U, ω)
12.4 Wiener System Identification Method
12.4.1 Phase Estimation (PE)
12.4.2 Nonlinearity Estimation (NLE)
12.4.3 Frequency Gain Modulus Estimation
12.4.4 Simulation Results
12.5 Further Expressions
12.5.1 Geometric Area
12.5.2 Signal Spread
12.6 Conclusion
References

13 Identification of Wiener–Hammerstein Systems Using the Best Linear Approximation
Lieve Lauwers, Johan Schoukens
13.1 Introduction
13.1.1 Block-oriented versus Black-box Models
13.1.2 Identification Issues
13.2 The Best Linear Approximation
13.2.1 Definition
13.2.2 Class of Excitations
13.3 Nonlinear Block Structure Selection Method
13.3.1 Two-stage Nonparametric Approach
13.3.2 Some Nonlinear Block Structures
13.3.3 Theoretical Results
13.3.4 Experimental Results
13.3.5 Concluding Remarks
13.4 Initial Estimates for Wiener–Hammerstein Models
13.4.1 Set-up
13.4.2 Initialisation Procedure
13.4.3 Experimental Results
13.4.4 Concluding Remarks
13.5 Conclusions
References

Part V: SVM, Subspace and Separable Least-squares

14 Subspace Identification of Hammerstein–Wiener Systems Operating in Closed-loop
Jan-Willem van Wingerden, Michel Verhaegen
14.1 Introduction
14.2 Problem Formulation
14.2.1 Problem Formulation
14.2.2 Concept of Basis Functions
14.2.3 Assumptions and Notation
14.3 Hammerstein–Wiener Predictor-based Subspace Identification
14.3.1 Predictors
14.3.2 Extended Observability Times Controllability Matrix
14.3.3 Estimation of the Wiener Nonlinearity
14.3.4 Recovery of the System Matrices
14.3.5 Estimation of the Hammerstein Nonlinearity
14.4 Example
14.5 Conclusions
References

15 NARX Identification of Hammerstein Systems Using Least-Squares Support Vector Machines
Ivan Goethals, Kristiaan Pelckmans, Tillmann Falck, Johan A.K. Suykens, Bart De Moor
15.1 Introduction
15.2 Hammerstein Identification Using an Overparametrisation Approach
15.2.1 Implementation of Overparametrisation
15.2.2 Potential Problems in Overparametrisation
15.3 Function Approximation Using Least Squares Support Vector Machines
15.4 NARX Hammerstein Identification as a Componentwise LS-SVM
15.4.1 SISO Systems
15.4.2 Identification of Hammerstein MIMO Systems
15.5 Example
15.6 Extensions
15.7 Outlook
References

16 Identification of Linear Systems with Hard Input Nonlinearities of Known Structure
Er-Wei Bai
16.1 Problem Statement
16.2 Deterministic Approach
16.2.1 Identification Algorithm
16.2.2 Consistency Analysis and Computational Issues
16.3 Correlation Analysis Method
16.4 Concluding Remarks
References

Part VI: Blind Methods

17 Blind Maximum-likelihood Identification of Wiener and Hammerstein Nonlinear Block Structures
Laurent Vanbeylen, Rik Pintelon
17.1 Introduction: Blind Nonlinear Modelling
17.1.1 Nonlinear Sensor Calibration
17.1.2 Outline
17.2 Introduction of Models and Related Assumptions
17.2.1 Class of Discrete-time Wiener and Hammerstein Systems Considered
17.2.2 Parametrisation
17.2.3 Stochastic Framework
17.2.4 Identifiability
17.3 The Gaussian Maximum-likelihood Estimator (MLE)
17.3.1 The Negative Log-likelihood (NLL) Function
17.3.2 The Simplified MLE Cost Function
17.3.3 Asymptotic Properties
17.3.4 Loss of Consistency in the Case of a Non-Gaussian Input
17.3.5 Non-white Gaussian Inputs
17.4 Generation of Initial Estimates
17.4.1 Subproblem 1: Monotonically Increasing Static Nonlinearity Driven by Gaussian Noise
17.4.2 Subproblem 2: LTI Driven by White Input Noise
17.5 Minimisation of the Cost Function
17.6 The Cramér-Rao Lower Bound
17.7 Impact of Output Noise
17.8 Simulation Results
17.8.1 Setup: Presentation of the Example
17.8.2 Graphical Presentation of the Results
17.8.3 Monte Carlo Analysis Showing the Impact of Output Noise
17.9 Laboratory Experiment
17.10 Conclusion
References

18 A Blind Approach to Identification of Hammerstein Systems
Jiandong Wang, Akira Sano, Tongwen Chen, Biao Huang
18.1 Introduction
18.2 Problem Description
18.3 Estimation of n_a, τ and A(q)
18.4 Estimation of x(t)
18.5 Numerical Examples
18.6 Experimental Example
18.6.1 Hammerstein Model of MR Dampers
18.6.2 Experiment Setup
18.6.3 Experiment Result
18.7 Conclusion
References

19 A Blind Approach to the Hammerstein-Wiener Model Identification
Er-Wei Bai
19.1 Introduction
19.2 Problem Statement and Preliminaries
19.3 Identification of the Hammerstein-Wiener Model
19.3.1 Output Nonlinearity Estimation
19.3.2 Linear Transfer Function Estimation
19.3.3 Input Nonlinearity Estimation
19.3.4 Algorithm and Simulations
19.3.5 Discussions
19.4 Concluding Remarks
References

Part VII: Decoupling Inputs and Bounded Error Methods

20 Decoupling the Linear and Nonlinear Parts in Hammerstein Model Identification
Er-Wei Bai
20.1 Problem Statement
20.2 Nonlinearity with the PRBS Inputs
20.3 Linear Part Identification
20.3.1 Non-parametric Identification
20.3.2 Parametric Identification
20.4 Nonlinear Part Identification
20.5 Concluding Remarks
References

21 Hammerstein System Identification in Presence of Hard Memory Nonlinearities
Youssef Rochdi, Vincent Van Assche, Fatima-Zahra Chaoui, Fouad Giri
21.1 Introduction
21.2 Identification Problem Formulation
21.3 Linear Subsystem Identification
21.3.1 Model Reforming
21.3.2 Model Centring and Linear Subsystem Parameter Estimation
21.3.3 A Class of Exciting Input Signal
21.3.4 Consistency of Linear Subsystem Parameter Estimates
21.3.5 Simulation
21.4 Nonlinear Element Estimation
21.4.1 Estimation of m1
21.4.2 Estimation of (h1, h2)
21.4.3 Simulation
21.5 Conclusion
References

22 Bounded Error Identification of Hammerstein Systems with Backlash
Vito Cerone, Dario Piga, Diego Regruto
22.1 Introduction
22.2 Problem Formulation
22.3 Assessment of Tight Bounds on the Nonlinear Static Block Parameters
22.3.1 Definitions and Preliminary Results
22.3.2 Exact Description of Dγl
22.3.3 Tight Orthotope Description of Dγl
22.4 Bounding the Parameters of the Linear Dynamic Model
22.5 A Simulated Example
22.6 Conclusion
References

Part VIII: Application of Block-oriented Models

23 Block Structured Modelling in the Study of the Stretch Reflex
David T. Westwick
23.1 Introduction
23.2 Preliminaries
23.3 Initial Applications
23.4 Hard Nonlinearities
23.5 The Parallel Cascade Stiffness Model
23.5.1 Iterative, Correlation Based Approach
23.5.2 Separable Least Squares Optimisation
23.6 Conclusions
References

24 Application of Block-oriented System Identification to Modelling Paralysed Muscle Under Electrical Stimulation
Zhijun Cai, Er-Wei Bai, Richard K. Shield
24.1 Introduction
24.2 Problem Statement
24.3 The Wiener–Hammerstein Fatigue Model
24.4 Identification of the Wiener–Hammerstein System
24.4.1 Identification of the Wiener–Hammerstein Non-fatigue Model (Single Train Stimulation Model)
24.4.2 Identification of the Wiener–Hammerstein Fatigue Model
24.5 Collection of SCI Patient Data
24.6 Results
24.7 Discussion and Conclusions
References

Index
List of Contributors
E.-W. Bai, University of Iowa, Iowa City, IA, USA ([email protected])
Z. Cai, University of Iowa, Iowa City, IA, USA ([email protected])
V. Cerone, Politecnico di Torino, Torino, Italy ([email protected])
F.Z. Chaoui, University of Caen Basse-Normandie, Caen, France ([email protected])
H.F. Chen, Chinese Academy of Sciences, Beijing, China ([email protected])
T. Chen, University of Alberta, Edmonton, AB, Canada ([email protected])
B. De Moor, Katholieke Universiteit Leuven, Leuven, Belgium ([email protected])
M. Enqvist, Linköpings universitet, Linköping, Sweden ([email protected])
T. Falck, Katholieke Universiteit Leuven, Leuven, Belgium ([email protected])
F. Giri, University of Caen Basse-Normandie, Caen, France ([email protected])
J.B. Gning, Crouzet Automatismes, Valence, France ([email protected])
I. Goethals, ING Life Belgium, Etterbeek, Belgium ([email protected])
B. Huang, University of Alberta, Edmonton, AB, Canada ([email protected])
L. Lauwers, Vrije Universiteit Brussel, Brussels, Belgium ([email protected])
Y. Liu, University of Iowa, Iowa City, IA, USA ([email protected])
L. Ljung, Linköpings universitet, Linköping, Sweden ([email protected])
G. Mzyk, Wroclaw University of Technology, Wroclaw, Poland ([email protected])
M. Pawlak, University of Manitoba, Winnipeg, Canada ([email protected])
K. Pelckmans, Uppsala University, Uppsala, Sweden ([email protected])
D. Piga, Politecnico di Torino, Torino, Italy ([email protected])
R. Pintelon, Vrije Universiteit Brussel, Brussels, Belgium ([email protected])
D. Regruto, Politecnico di Torino, Torino, Italy ([email protected])
Y. Rochdi, University of Cadi Ayyad, Marrakech, Morocco ([email protected])
A. Sano, Keio University, Yokohama, Japan ([email protected])
M. Schetzen, Northeastern University, Boston, MA, USA ([email protected])
J. Schoukens, Vrije Universiteit Brussel, Brussels, Belgium ([email protected])
R.K. Shield, University of Iowa, Iowa City, IA, USA ([email protected])
J.A.K. Suykens, Katholieke Universiteit Leuven, Leuven, Belgium ([email protected])
V. Van Assche, University of Caen Basse-Normandie, Caen, France ([email protected])
J.W. van Wingerden, Delft University of Technology, Delft, The Netherlands ([email protected])
L. Vanbeylen, Vrije Universiteit Brussel, Brussels, Belgium ([email protected])
M. Verhaegen, Delft University of Technology, Delft, The Netherlands ([email protected])
J. Vörös, Slovak University of Technology, Bratislava, Slovakia ([email protected])
J. Wang, Peking University, Beijing, China ([email protected])
D.T. Westwick, University of Calgary, AB, Canada ([email protected])
A. Wills, University of Newcastle, Callaghan, Australia ([email protected])
Chapter 1
Introduction to Block-oriented Nonlinear Systems

Er-Wei Bai and Fouad Giri
1.1 Block-oriented Nonlinear Systems

System identification refers to the experimental approach that consists of determining system models by fitting experimental data to a suitable model structure [14] in some optimal way. Linear model structures can be relied upon when the physical system remains in the vicinity of a nominal operating point, so that the linearity assumption is satisfied. When a wide range of operating modes is involved, the linearity assumption may not be valid and a nonlinear model structure becomes necessary to capture the system's (nonlinear) behaviour. In relatively simple cases, suitable nonlinear model structures are obtained using the mathematical modelling approach, which consists of describing the system phenomena using basic laws of physics, chemistry, etc. System identification methods may then be used to assign suitable numerical values to the (unknown) model parameters. When the mathematical modelling approach is insufficient, system identification must rely on 'universal' black-box or grey-box nonlinear model structures. These include NARMAX models [9], multi-model representations [15], neuro-fuzzy models [3], Volterra series [19], nonparametric models [14] and others.

In the present monograph, the focus is on block-oriented nonlinear (BONL) models, which consist of interconnections of linear time-invariant (LTI) dynamic subsystems and static nonlinear elements (Figure 1.1). The linear subsystems may be parametric (transfer functions, state-space representations, FIR, IIR, ...) or nonparametric (impulse response, frequency response, ...). The nonlinear elements may in turn be parametric or nonparametric, with memory or memoryless. Finally, the system components may be interconnected in different ways (series, parallel, feedback).

Fig. 1.1: Example of a BONL system consisting of LTI subsystems (L) and static nonlinearities (N)

This high flexibility
provides these models with a remarkable ability to capture a large class of complex nonlinear systems, and it explains the great deal of interest paid to BONL models over the past fifteen years. It is worth noting that the model (linear and nonlinear) blocks may (and generally do) not correspond to physical components. Consequently, the connection points between blocks are generally artificial, i.e. they cannot be assumed to be accessible to measurement. The inaccessibility of such measurements, together with the system nonlinearities, makes BONL system identification a quite complex problem. Therefore, most currently available solutions concern relatively simple structures.

The simplest and best-known BONL structures are composed of just two blocks connected in series (Figures 1.2 and 1.3). The first one, the Hammerstein system, introduced in 1930 by the German mathematician A. Hammerstein [6], involves an input static nonlinear element in series with a dynamic LTI subsystem. The nonlinear element may account for actuator nonlinearities and other nonlinear effects that can be brought to the system input. Despite their simplicity, Hammerstein models have proved able to accurately describe a wide variety of nonlinear systems, e.g. chemical processes [5], electrically stimulated muscles [7], power amplifiers [12], electrical drives [1], thermal microsystems [21], physiological systems [4], sticky control valves [20], solid oxide fuel cells [10], and magneto-rheological dampers [22].

Fig. 1.2: Hammerstein model
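To fix ideas, here is a minimal Python sketch of a Hammerstein system simulation; it is not taken from the book, and the first-order linear block, the saturation level and all coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def hammerstein(u, a=0.8, b=0.5, sat=1.0):
    """Static saturation (N) followed by the LTI block y(t) = a*y(t-1) + b*x(t) (L)."""
    x = np.clip(u, -sat, sat)            # input nonlinearity (actuator-type saturation)
    y = np.zeros_like(x)
    for t in range(1, len(x)):           # first-order IIR dynamics
        y[t] = a * y[t - 1] + b * x[t]
    return y

u = rng.normal(scale=2.0, size=500)      # excitation rich enough to reach the saturation
y = hammerstein(u)
```

Note that y depends linearly on the internal signal x(t) = f(u(t)), which is not measurable; only u and y are available to an identification algorithm.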
1
Introduction to Block-oriented Nonlinear Systems
u
5
y
NL
L
Fig. 1.3: Wiener Model
The permutation of the linear and nonlinear elements in the Hammerstein model leads to what is commonly referred to as the Wiener model (Figure 1.3), as a model of this type was first studied in 1958 by N. Wiener [23]. In this model, the output nonlinear element may represent sensor nonlinearities as well as any nonlinear effects that can be brought to the system output. For instance, limit-switch devices in mechanical systems and overflow valves can be modelled by output saturating nonlinearities. Moreover, the ability of Wiener models to capture complex nonlinear phenomena has been formally established: it was shown that almost any nonlinear system can be approximated by a Wiener model with arbitrarily high accuracy [2]. This theoretical fact has been experimentally checked through several practical applications including chemical processes [24, 11], biological systems [8] and others.

A series combination of a Hammerstein and a Wiener model immediately yields a new model structure called the Hammerstein–Wiener system (Figure 1.4). The inverse combination leads to what is referred to as the Wiener–Hammerstein structure (Figure 1.5). These new structures offer higher modelling capabilities. Clearly, the Hammerstein–Wiener model is more convenient when both actuator and sensor nonlinearities are present. It has also been successfully applied to the modelling of several physical processes, e.g. polymerase reactors [13], ionospheric processes [16], pH processes [18] and magnetospheric dynamics [17]. The Wiener–Hammerstein model (Figure 1.5) also finds applications. Closed-loop model structures (e.g. Figure 1.6) are resorted to when feedback phenomena are involved. For instance, most industrial actuator valves are equipped with local feedback actions (commonly called positioners) to compensate Coulomb friction and other nonlinear effects. A model structure like that of Figure 1.6, with a saturation-type nonlinearity, may then be quite suitable.
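For comparison, here is a hedged sketch of the dual Wiener structure, with the same illustrative blocks applied in reverse order: the input is filtered first, and a saturation (e.g. of sensor type) acts at the output.

```python
import numpy as np

def wiener(u, a=0.8, b=0.5, sat=1.0):
    """LTI block w(t) = a*w(t-1) + b*u(t) (L) followed by a static saturation (N)."""
    w = np.zeros_like(u)
    for t in range(1, len(u)):           # first-order IIR dynamics
        w[t] = a * w[t - 1] + b * u[t]
    return np.clip(w, -sat, sat)         # output nonlinearity (sensor-type saturation)
```

Although both structures use the same two blocks, they generally produce different input-output behaviour, which is why structure selection matters in practice.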
Fig. 1.4: Hammerstein–Wiener model

Fig. 1.5: Wiener–Hammerstein model
6
E.-W. Bai and F. Giri
u
y
N
L
L
Fig. 1.6: Example of feedback block-oriented system
Fig. 1.7: LFT representation of block-oriented systems with feedback actions
Fig. 1.8: Example of multichannel block-oriented nonlinear system
Interestingly, any block-oriented nonlinear system involving feedback elements may be given the so-called linear fractional transformation (LFT) representation of Figure 1.7, where the block L (resp. N) accounts for all linear (resp. nonlinear) operators. Parallel block-oriented models (Figure 1.8) are suitable for modelling systems with a multichannel topology, e.g. electric power distribution, communication networks, multicell parallel power converters, etc.
1.2 About This Book

1.2.1 Book Topics

The previous section has emphasised the large diversity of BONL systems. The present book is not intended to be an encyclopedic monograph surveying all relevant identification methods; it rather aims at pointing out the most recent and significant approaches, i.e. those backed by formal (convergence, consistency) analyses and/or practical applications. As a matter of fact, the most complete and successful solutions concern Hammerstein and Wiener systems or their simple series and parallel combinations. The present book makes a rigorous and illustrative presentation of these solutions. It consists of 24 chapters grouped into eight homogeneous parts. Each chapter includes an introduction, a conclusion and its own reference list.

In addition to the present chapter, Part I contains a second chapter discussing some theoretical considerations about nonlinear system modelling. In particular, the Wiener theory, which is an orthogonalisation of the Volterra series when the system input is a Gaussian random process, is discussed. It allows the determination of optimum systems from a desired system response. The limitation of the Wiener theory, namely that the input must be a Gaussian random process and the error criterion must be the mean square, is obviated by the use of gate functions.

Parts II to VII together make up a survey of the most recent and significant identification methods for various BONL systems. First, iterative and over-parametrisation methods are presented in Part II. In Chapter 3, Hammerstein–Wiener system identification is dealt with using an optimal two-stage identification method combining least squares parameter estimation with a singular value decomposition of two matrices whose dimensions are fixed and do not increase as the number of data points increases; the underlying over-parametrisation and SVD idea is sketched below. The algorithm is shown to be convergent in the absence of noise and convergent with probability one in the presence of white noise. In Chapter 4, the technique of compound mapping decomposition, which reduces the complexity of system description, is applied to general Hammerstein and Wiener types. In the case of Hammerstein–Wiener systems with piecewise-affine nonlinearities, several different decompositions can be performed to simplify the system description. Chapter 5 presents a normalised iterative identification algorithm for Hammerstein systems containing common non-smooth piecewise-affine nonlinearities with saturation and preload characteristics. It is shown that the algorithms converge in one iteration step when the number of data samples is large.
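To make the over-parametrisation idea behind such two-stage methods concrete, consider a Hammerstein system y(t) = sum_i a_i f(u(t-i)) with f(u) = sum_j c_j g_j(u): the products theta[i, j] = a_i * c_j enter the regression linearly and form a rank-one matrix, so a and c can be recovered from the SVD of a least-squares estimate. The following Python sketch is a hedged illustration with an assumed polynomial basis, assumed orders and noise level; it is not the exact algorithm of Chapter 3.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 3, 2, 2000                        # FIR order, basis size, sample count
a_true = np.array([1.0, 0.5, -0.2])         # linear part (a[0] = 1 fixes the scale)
c_true = np.array([0.7, 0.3])               # nonlinearity coefficients

u = rng.normal(size=N)
g = np.stack([u, u**3])                     # assumed basis: g_1(u) = u, g_2(u) = u^3
x = c_true @ g                              # unmeasured internal signal x(t) = f(u(t))
y = np.convolve(x, a_true)[:N] + 0.01 * rng.normal(size=N)

# Stage 1: over-parametrised least squares on theta[i, j] = a[i] * c[j]
Phi = np.zeros((N - n, n * m))
for t in range(n, N):
    Phi[t - n] = [g[j, t - i] for i in range(n) for j in range(m)]
theta = np.linalg.lstsq(Phi, y[n:N], rcond=None)[0].reshape(n, m)

# Stage 2: theta is (nearly) rank one, so its dominant singular vectors
# recover a and c up to a scale factor, fixed here by normalising a[0] = 1.
U, s, Vt = np.linalg.svd(theta)
a_hat, c_hat = U[:, 0], s[0] * Vt[0]
c_hat, a_hat = c_hat * a_hat[0], a_hat / a_hat[0]
print(np.round(a_hat, 3), np.round(c_hat, 3))   # close to a_true and c_true
```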
Stochastic identification methods are considered in Part III. Hammerstein systems are dealt with in Chapter 6, considering ARX and ARMAX linear subsystems and parametric, nonparametric and piecewise-affine nonlinearities. Recursive parameter estimation algorithms are designed to identify the unknown parameters and the points of the (nonparametric) nonlinearity. All recursive estimators are shown to converge to the true values with probability one. Stochastic Wiener system
identification is dealt with in Chapter 7 using output error and maximum likelihood algorithms. The disturbances are allowed to be coloured, making blind estimation of Wiener models possible. This case is accommodated by using the Expectation-Maximisation algorithm in combination with particle methods. Alternative two-stage, combined parametric-nonparametric algorithms are presented in Chapter 8. These involve estimation of the interconnecting signal and decomposition of the identification task into two subroutines. Chapter 9 presents a statistical framework for identification of parametric and nonparametric block-oriented systems, possibly including modelling errors. The developed identification approach makes classical nonparametric estimates amenable to incorporating constraints and able to overcome the celebrated curse of dimensionality and system complexity. The proposed methodology is illustrated by examining semi-parametric Hammerstein and parallel systems. Chapter 10 gives an introduction to the invariance property and explains why it is so useful for identification of block-oriented systems. For example, Bussgang's classic theorem shows that this property holds for a static nonlinearity with a Gaussian input signal.
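The invariance property lends itself to a quick numerical check. For a zero-mean Gaussian input, Bussgang's theorem implies that the cross-covariance between the output of a static nonlinearity and its input equals the input autocovariance up to a single constant; the hedged sketch below (the nonlinearity, the colouring filter and the set of lags are arbitrary illustrative choices) estimates that ratio at several lags.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
# Coloured, zero-mean Gaussian input: moving average of white Gaussian noise
u = np.convolve(rng.normal(size=N), np.ones(5) / 5, mode="same")
y = np.tanh(2.0 * u)                        # arbitrary static nonlinearity

def xcov(p, q, lag):
    """Sample cross-covariance E[p(t + lag) * q(t)]."""
    p, q = p - p.mean(), q - q.mean()
    return np.mean(p[lag:] * q[:len(q) - lag])

# Bussgang's theorem: R_yu(tau) / R_uu(tau) is the same constant for every lag
ratios = [xcov(y, u, tau) / xcov(u, u, tau) for tau in range(4)]
print(np.round(ratios, 3))                  # four nearly identical values
```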
Frequency domain methods are presented in Part IV. Hammerstein models with a nonparametric nonlinearity are dealt with in Chapter 11. The identification method is based on the fundamental frequency and, therefore, no a priori information about the nonlinearity structure is required. Wiener systems with nonparametric nonlinearities are dealt with in Chapter 12. The identification method relies on the investigation of generalised Lissajous curves. In Chapter 13, parametric Wiener–Hammerstein system identification is addressed using the concept of the best linear approximation. The identification method makes use of the frequency response function of the nonlinear system as a function of the root-mean-square (rms) value and the colouring of the input signal.

Identification methods based on Support Vector Machines (SVM), subspace and least-squares techniques are presented in Part V. Parameter identification of multiple-input multiple-output parametric Hammerstein and Wiener systems is dealt with in Chapter 14. The identification method combines the predictor-based subspace method with ideas from linear algebra and estimation theory. An alternative solution based on Least Squares Support Vector Machines is presented in Chapter 15. The method is essentially based on Bai's over-parametrisation technique of Chapter 3 and combines it with a regularisation framework and a suitable model description which fits nicely within the LS-SVM framework with primal and dual model representations. Hammerstein systems with input nonlinearities of known structure are studied in Chapter 16. The case of input nonlinearities parametrised by one parameter is dealt with using a deterministic identification method based on the idea of separable least squares. The identification problem is shown to be equivalent to a one-dimensional minimisation problem. For a general input nonlinearity, a correlation-analysis-based identification algorithm is used.

Blind identification methods are described in Part VI. In Chapter 17, Wiener and Hammerstein models are dealt with assuming that the unobserved input is white
Gaussian noise, that the linear block is minimum phase, that the static nonlinearity is invertible, and that the output measurements are noise-free. A Gaussian maximum-likelihood estimator is first constructed. Then, a Gauss-Newton iterative scheme is used to minimise the derived maximum-likelihood cost function, and asymptotic uncertainty bounds are computed. Hammerstein systems with nonparametric static and backlash nonlinearities are dealt with in Chapter 18. The (blind) identification method involves piecewise-constant inputs and subspace direct equalisation. Hammerstein–Wiener model identification is studied in Chapter 19. The proposed (blind) identification approach requires no prior structural knowledge of the input nonlinearity and no white-noise assumption on the input.

Decoupling input methods are presented in Part VII. The main result of Chapter 20 is to show that, in Hammerstein model identification, the linear subsystem can be decoupled from the nonlinear element. Identification of the linear part of a Hammerstein model therefore becomes a linear problem and accordingly enjoys the same convergence and consistency results as if the unknown nonlinearity were absent. A similar result is presented in Chapter 21 for Hammerstein systems with backlash nonlinearity. Bounded-error identification approaches are illustrated in Chapter 22 for Hammerstein systems with backlash nonlinearities. The identification method is a two-stage procedure using suitable tools such as Pseudo-Random Binary input Sequences (PRBS), relaxation techniques and linear matrix inequalities.

The last part of the book illustrates the modelling capability of BONL models through biomedical and physiological applications. Chapter 23 deals with modelling and identification of physiological joint dynamics using BONL models. The focus is on the stretch reflex, i.e. the relationship between the angular velocity of a joint and the resulting muscle activity. In this context, the Hammerstein structure and the parallel block structure turn out to be quite appropriate. Different identification techniques are used, i.e. iterative, correlation and separable least-squares methods. In Chapter 24, the Wiener–Hammerstein structure is used to model a paralysed muscle under electrical stimulation. An effective identification method is developed specifically for this system.
1.2.2 Who Can Use This Book

This book will be of interest to all those who are concerned with nonlinear system identification, e.g. senior undergraduate and graduate students, practitioners and theorists in industries and academic institutions in areas related to system identification, control and signal processing. The book can be used by active researchers and newcomers to the area of nonlinear system identification. It is also useful for industrial practitioners and control technology designers. For students and newcomers, the main prerequisite is to have previously followed a postgraduate course on linear system identification or equivalent courses.
Chapter 2
Nonlinear System Modelling and Analysis from the Volterra and Wiener Perspective

Martin Schetzen
Professor Emeritus, Department of Electrical and Computer Engineering, Northeastern University, Boston, MA
2.1 Introduction

A fundamental activity of modern science is the development and analysis of models. A scientific model is just a representation of a phenomenon. It is desired because it can be used to obtain new insights concerning a phenomenon and also to predict the outcome of experiments involving the phenomenon [5]. System theory, which is the theory of models, is thus fundamental to science. An important problem in system theory is the development of nonlinear models of phenomena. The Wiener theory of nonlinear systems is surveyed in this chapter. The survey begins with a basic discussion of the Volterra theory, which lies at the base of the Wiener theory. Some of the basic theoretical extensions of the Volterra theory will also be outlined, since they illustrate the broad usefulness of the Volterra series in modelling nonlinear phenomena and have been the basis of a number of practical applications of the Volterra theory.
2.2 General System Considerations

Systems are classified as being either autonomous or nonautonomous. An autonomous system, such as the solar system, is one in which there is no external input. A nonautonomous system is one in which there is an external input which results in a system output. The discussion in this chapter will be limited to nonautonomous systems. Further, we limit our discussion to nonexplosive systems, which are systems for which bounded inputs result only in bounded outputs. This condition is often referred to as the BIBO stability criterion. Note that a linear time-invariant (LTI) system is unstable in accordance with this criterion if there exists just one bounded input for which the output is unbounded. For example, a causal LTI system with the impulse response $\frac{1}{1+t}u(t)$, in which $u(t)$ is the unit step function, is BIBO unstable since its response to a step is unbounded: the step response is $\int_0^t \frac{d\tau}{1+\tau} = \ln(1+t) \to \infty$. We note from this example that an LTI system cannot have infinite memory if it is to be nonexplosive since, as a past input recedes further into the past, its contribution to the present value of the system output must decrease. This is not necessarily so for systems which are not LTI. In fact, many physical nonexplosive systems do have infinite memory. Generally, systems with more than one state of stable equilibrium, and which can be switched from one stable state to another by the input, possess infinite memory. A simple example of a system in this class is the fuse. Our discussion will not apply to the class of systems with infinite memory, and we consider systems with only one stable state. For convenience, our discussion will centre on time-invariant systems in which the input and output are functions of time. For physicality, we thus will require the systems to be causal although, with a slight change of viewpoint, the theory can also be applied to noncausal systems such as occur with mechanical systems in which an input about $x = 0$ can have effects on the system over both positive and negative values of $x$. The Volterra and Wiener class of nonlinear systems which we shall discuss thus is restricted to non-infinite-memory nonautonomous systems which are not explosive.
2.3 The Volterra Series

LTI system theory is the basis for modelling many areas such as circuits, dynamics, and electromagnetics and it has been very successful in enabling the development of important practical applications as well as deep insights into these areas [4]. Since many practical nonlinear systems appear to be well approximated as a linear system for inputs with sufficiently small amplitude, Wiener considered representing the output, $y(t)$, of a nonlinear system as a functional series expansion of its input, $x(t)$. The type of expansion he chose was the Volterra series:

$$y(t) = h_0 + \sum_{n=1}^{\infty} H_n[x(t)] \qquad (2.1)$$

in which

$$H_n[x(t)] = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} h_n(\tau_1, \ldots, \tau_n)\, x(t-\tau_1) \cdots x(t-\tau_n)\, d\tau_1 \cdots d\tau_n. \qquad (2.2)$$
In these equations, $H_n$ is called a Volterra operator and, in the $n$th-order integral, $h_n(\tau_1, \ldots, \tau_n)$ is called the Volterra kernel of the Volterra operator. This type of expansion is named in recognition of Volterra's contributions [10]. The Volterra series can be viewed as a power series with memory. This can be seen by changing the input by a gain factor, $c$, so that the new input is $cx(t)$. The new series for the system output is then seen to be

$$y(t) = h_0 + \sum_{n=1}^{\infty} H_n[cx(t)] = h_0 + \sum_{n=1}^{\infty} c^n H_n[x(t)] \qquad (2.3)$$
which is a power series in the amplitude factor, c. It is a series with memory since the functionals, Hn [x(t)], are convolutions.
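As a concrete illustration of (2.1)-(2.3), the following sketch (our own, not part of the original text) evaluates a discrete-time Volterra model truncated at second order and numerically checks the power-series scaling property (2.3); the kernels h1 and h2 are arbitrary illustrative choices.

```python
import numpy as np

def volterra_response(x, h0, h1, h2):
    """Output of a discrete-time Volterra model truncated at second order:
    y(k) = h0 + sum_i h1[i] x(k-i) + sum_{i,j} h2[i,j] x(k-i) x(k-j)."""
    M, N = len(h1), len(x)
    y = np.full(N, h0, dtype=float)
    for k in range(N):
        # window of past inputs x(k), x(k-1), ..., x(k-M+1); zero before start
        past = np.array([x[k - i] if k >= i else 0.0 for i in range(M)])
        y[k] += h1 @ past + past @ h2 @ past
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
h1 = np.array([1.0, 0.5, 0.25])      # illustrative first-order kernel
h2 = 0.1 * np.outer(h1, h1)          # illustrative second-order kernel

# Power-series property (2.3): the H1 term scales as c, the H2 term as c**2
y1 = volterra_response(x, 0.0, h1, np.zeros_like(h2))   # H1[x] alone
y2 = volterra_response(x, 0.0, np.zeros_like(h1), h2)   # H2[x] alone
c = 0.5
assert np.allclose(volterra_response(c * x, 0.0, h1, h2), c * y1 + c**2 * y2)
```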
2.4 Applications of the Volterra Series

Many important nonlinear problems have been successfully analysed by use of the Volterra series. The first application of the Volterra series was by Wiener, who analysed a nonlinear RLC circuit. This was accomplished by making use of its power series property to expand the solution of the differential equation describing the circuit [11]. Methods were then further developed for the measurement and synthesis of the Volterra series [6]. Also, by making use of its power series property, the Volterra series of nonlinear feedback systems and the pth-order inverses of nonlinear systems were determined [6]. A pth-order inverse is one for which the tandem connection of a nonlinear system and its pth-order inverse is a system for which the second- through the pth-order nonlinearities are zero. The determination of the pth-order inverse of a given system requires only the first p Volterra kernels of the given system. This is important since, in practice, only the first p Volterra kernels of a given system are known. One physical application of the pth-order inverse is the reduction of the undesirable distortion in an audio system due to various nonlinearities, such as those of the loudspeaker and of the air through which the sound wave travels; this distortion can be reduced by placing a pth-order inverse before the loudspeaker. One can reduce the distortion introduced by audio or video tape recorders in a similar manner. Also, the suppression of the effect of slight transceiver nonlinearities in broadcast satellites can be accomplished without disturbing the satellite by installing the pth-order inverse of the nonlinearity at the output of the ground station transmitter or at the input of the ground station receiver. These are examples of predistorting a signal to combat the distortion introduced by the system through which the signal will pass. In effect, the predistortion of a signal using a pth-order inverse as a pre-inverse is similar to coding a signal for transmission through a corrupting medium, while the use of a pth-order inverse as a post-inverse is similar to decoding a signal which has been corrupted in its transmission through a medium.

Another important application of the Volterra series is the analysis of nonlinear networks. The power series property of the Volterra series was used to develop a multilinear analysis technique for the analysis of networks which are slightly nonlinear [6]. Instead of expanding the network differential equations in a Volterra series, a general multilinear model of the network is derived directly from the network itself, from which the effect of the network nonlinearities on each branch current and node voltage is made evident. Initial conditions are incorporated by replacing them with equivalent sources. In this model, the network nonlinearities are equivalent to known voltage and current sources embedded in a linear network. In consequence, all the techniques and insights obtained from linear circuit theory can be used. An important aspect of this procedure, as opposed to the usual methods, is that multilinear network theory maintains the network topology, so that the effect of a given
nonlinear element on the network can be studied. It is from this that a theory for the synthesis of nonlinear networks can be developed. For example, one type of synthesis problem which occurs in electronics is the linearisation of a given nonlinear network. From knowledge of the specific effect of a given nonlinear element on the network, one can determine the placement of other nonlinear elements in the network required to cancel, or at least reduce, the undesirable effects of the given nonlinear element. An illustration of this linearisation technique is the classic push-pull circuit. Push-pull circuits are often used in electronic power amplifiers, in which a large amount of distortion is introduced by operating a transistor in class B. The distortion is significantly reduced by introducing a second similar transistor and connecting it in a push-pull arrangement. In this manner, the power amplifier distortion is reduced significantly by the introduction of a second nonlinear element. Since multilinear analysis only requires the sequential analysis of the same linear network, the analysis can be done on a computer by using any of a number of software packages available for the analysis of linear networks. A computer program in which the software package is called for each sequential term would facilitate the analysis and enable one to obtain a rather high-order Volterra representation of a nonlinear network.

It also turns out that the response of a given network to a particular input can be expressed as the linear combination of a particular set of waveforms which are invariant waveforms of the network response. The network nonlinearities and source amplitudes affect only the specific linear combination of these waveforms but not the waveforms themselves. These invariant waveforms thus are the building blocks of the nonlinear network response. With them, it becomes evident why the convergence of the Volterra series often is not uniform. Also, the effect of the network nonlinearities and the source amplitudes upon the network response can be studied in terms of these invariant waveforms. In this manner, multilinear theory, which is based on the Volterra series, allows a major step forward to be taken in the theory of nonlinear networks and opens the possibility of greater advances to be made.

The ability to obtain practical insights into a system by its Volterra series analysis is well illustrated by the analysis of the laser diode. Useful insight into its operation was obtained from its Volterra model [8, 13]. From this, an analysis of the diode intermodulation distortion (IMD) and its sources was obtained, from which methods for reducing the distortion follow [9]. In addition, an understanding of methods for optimising the diode operation was obtained from the Volterra model [7]. The structure of a system's block diagram determines the form of its Volterra kernels, which can be exploited to determine general properties of the system. A good example is the class of systems which appears to be a good approximation of many biological systems: those composed of a linear system, followed by a no-memory nonlinear system, followed by a linear system (LNL systems). Although I have discussed systems in which the input and output are functions of time, this need not be so. For example, in applications to mechanical systems such as the static bending of a beam due to loading, the input and output would be functions only of position. Note that for such applications, the system Volterra
kernels are not constrained by the requirements of causality as they are for systems in which the input and output are functions of time.
2.5 The Wiener Theory

As a consequence of its power series character, there are some limitations associated with the application of the Volterra series to nonlinear problems. One major problem is the convergence of the Volterra series. In function theory, analytic functions are those that are the sums of a Taylor series. Similarly, systems which can be represented by a convergent Volterra series are called analytic systems. The important successes of the Volterra series in practical applications make the question of the class of nonlinear systems that can be represented by a set of Volterra functionals an important one. That is, what is the class of nonlinear systems for which the Volterra operators are complete? Since the Volterra series is a functional power series, we first look to the approximation of functions. From the Weierstrass Approximation Theorem, we know that the functions $1, x, x^2, x^3, \ldots$ form a complete set of functions in every closed interval. As a result, any function which is continuous in a closed interval can be approximated uniformly by polynomials in the interval. Fréchet extended this result concerning functions to functionals [1]. He proved that any continuous functional can be represented by a Volterra series whose convergence is uniform on all compact sets of continuous functions. A functional, $y(t) = H[x(t)]$, is said to be continuous if the values $y_1(t) = H[x_1(t)]$ and $y_2(t) = H[x_2(t)]$ are close whenever the corresponding input functions, $x_1(t)$ and $x_2(t)$, are close. Analogously, then, the Fréchet theorem is a statement that the set of Volterra functionals, $H_n[x(t)]$, is complete. However, since many systems are not analytic, the Volterra series expansion of a system response may not converge.

In the theory of functions, the convergence problem associated with a power series representation of a function can be circumvented by using a different convergence criterion. One common method of accomplishing this is to use an L2-norm as the measure of the error and to expand the function in an orthogonal series. One of Wiener's basic innovations was to extend this notion of orthogonality of functions to functionals. In this manner, he circumvented the convergence problem in system modelling associated with the Volterra series representation of a nonlinear system. First, Wiener chose Brownian motion as the input for the characterisation of a nonlinear system [12]. To understand why he made this choice, first consider the characterisation of a linear system. It is only as a consequence of the superposition property of a linear system that any one waveform of a wide class of waveforms can be used as an input probe to characterise its input-output mapping. Since superposition does not hold for nonlinear systems, Wiener realised that the input probe required to characterise a nonlinear system would have to be a waveform which, over a time interval, eventually approximates any given waveform. That is, since the response of a nonlinear system to some arbitrary input cannot be determined by superposition of its responses to some restricted class of input waveforms, the required input probe should, as time evolves, approximate all possible input waveforms.
The waveform representation of a Brownian particle's wanderings was just such a waveform. To represent a nonlinear operator with a Brownian input, Wiener used a Stieltjes form of the Volterra functional, which is equivalent to our familiar Riemann form of the Volterra functionals with the input, $x(t)$, being a white Gaussian time function¹. With these, he formed an orthogonal set of functionals for the analysis of nonlinear systems. He called the orthogonal set he formed G-functionals. A time function, $x(t)$, is said to be white if its power density spectrum is a constant, $\phi_{xx}(j\omega) = A$, for which its autocorrelation function is $\phi_{xx}(\tau) = \overline{x(t)x(t+\tau)} = A\delta(\tau)$, in which the overbar indicates the time average and $\delta(\tau)$ is the unit impulse. A white time function is not a physical waveform since its total average power is infinite. However, it is an accurate idealisation of a physical waveform for which the power density spectrum is flat over a band of frequencies considerably wider than the bandwidth of the system to which it is being applied as an input. This idealisation results in great analytical simplification.
2.6 The Wiener G-functionals

As I stated above, Wiener called his orthogonal set of functionals G-functionals since they are orthogonal with respect to a white Gaussian input. That is, he formed the G-functionals, $G_p[k_p; x(t)]$, in which $k_p$ is the kernel and $x(t)$ is the input of the $p$th-order G-functional, to satisfy the condition

$$\overline{G_p[k_p; x(t)]\, G_q[k_q; x(t)]} = 0, \qquad p \neq q \qquad (2.4)$$

when $x(t)$ is from a white Gaussian process with the power density $\phi_{xx}(j\omega) = A$. A $p$th-order G-functional, $G_p[k_p; x(t)]$, is the sum of Volterra functionals of orders less than or equal to $p$. The form of a $p$th-degree G-functional is

$$G_p[k_p; x(t)] = \sum_{n=0}^{p} K_{n(p)}[x(t)]. \qquad (2.5)$$

All kernels of order less than $p$ are determined by the leading kernel, $k_p$, to satisfy the orthogonality condition. A $p$th-degree G-functional, $G_p[k_p; x(t)]$, thus is the sum of Volterra functionals $K_{n(p)}$ of orders less than or equal to $p$ which are specified only by the leading Volterra kernel, $k_p$. The first few G-functionals are:

$$G_0[k_0; x(t)] = k_0, \qquad (2.6)$$

¹ Wiener expressed the Volterra operators as Stieltjes integrals with the input being a member of an ensemble of Brownian waveforms. Since Brownian motion is differentiable almost nowhere, the Riemann form of these integrals does not formally exist. However, the statistical properties of the Volterra operators expressed in the Riemann form with a white Gaussian input are identical with those of the Volterra operators expressed in Stieltjes form with the input being a Brownian waveform. For convenience, we use the Riemann form with a white Gaussian input.
$$G_1[k_1; x(t)] = \int_{-\infty}^{+\infty} k_1(\tau_1)\, x(t-\tau_1)\, d\tau_1, \qquad (2.7)$$

$$G_2[k_2; x(t)] = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} k_2(\tau_1, \tau_2)\, x(t-\tau_1)\, x(t-\tau_2)\, d\tau_1\, d\tau_2 + k_{0(2)}, \qquad (2.8)$$

in which

$$k_{0(2)} = -A \int_{-\infty}^{+\infty} k_2(\tau_1, \tau_1)\, d\tau_1, \qquad (2.9)$$

and

$$G_3[k_3; x(t)] = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} k_3(\tau_1, \tau_2, \tau_3)\, x(t-\tau_1)\, x(t-\tau_2)\, x(t-\tau_3)\, d\tau_1\, d\tau_2\, d\tau_3 + \int_{-\infty}^{+\infty} k_{1(3)}(\tau_1)\, x(t-\tau_1)\, d\tau_1, \qquad (2.10)$$

in which

$$k_{1(3)}(\tau_1) = -3A \int_{-\infty}^{+\infty} k_3(\tau_1, \tau_2, \tau_2)\, d\tau_2. \qquad (2.11)$$
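The orthogonality (2.4) and the role of the correction terms (2.9) can be checked numerically. The sketch below (our own illustration) works in discrete time with unit sampling, so a white input of power density A becomes an i.i.d. Gaussian sequence of variance A; the kernels are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
A, N = 1.0, 200_000
x = rng.normal(0.0, np.sqrt(A), N)       # discrete white Gaussian input

k1 = np.array([1.0, -0.5, 0.25])          # illustrative leading kernels
k2 = np.array([[0.5, 0.2, 0.0],
               [0.2, -0.3, 0.1],
               [0.0, 0.1, 0.4]])
M = len(k1)

# Row k of X holds the delayed inputs x(k), x(k-1), ..., x(k-M+1)
X = np.stack([np.roll(x, i) for i in range(M)], axis=1)[M:]

G1 = X @ k1                                                  # eq. (2.7)
G2 = np.einsum('ki,ij,kj->k', X, k2, X) - A * np.trace(k2)   # eqs. (2.8)-(2.9)

print(np.mean(G2))        # ~0: G2 is orthogonal to the constant G0
print(np.mean(G1 * G2))   # ~0: G1 and G2 are orthogonal, as in (2.4)
```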
2.7 System Modelling with the G-functionals

In terms of the G-functionals, the $p$th-order model of a nonlinear system is

$$y_p(t) = \sum_{n=0}^{p} G_n[k_n; x(t)]. \qquad (2.12)$$
The kernels, $k_n$, are called the Wiener kernels of the nonlinear system. A practical method was developed [2] by which the Wiener kernels can be determined by crosscorrelating the system response, $y(t)$, with the white Gaussian input, $x(t)$, as

$$k_n(\tau_1, \tau_2, \ldots, \tau_n) = \begin{cases} \dfrac{1}{n!A^n}\, \overline{y(t)\, x(t-\tau_1)\, x(t-\tau_2) \cdots x(t-\tau_n)}, & \tau_i \geq 0,\ i = 1, 2, \ldots, n, \quad n = 0, 1, 2, \ldots, p, \\ 0, & \text{otherwise.} \end{cases} \qquad (2.13)$$

In the L2-norm, this representation results in the optimum $p$th-order model of the given system. Note that it is optimum only for the white Gaussian input. Once the optimum $p$th-order model of a system has been determined, the corresponding $p$th-order Volterra series can be obtained simply by summing the Volterra kernels of the same order from each orthogonal functional. Wiener originally showed that his set of G-functionals is complete in the class of nonexplosive nonlinear systems whose memory is not infinite, which I call the Wiener class of systems [6]. By nonexplosive in this context I mean systems for which the output has a finite mean-square value when the input is from a white Gaussian process.
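A minimal sketch of the crosscorrelation formula (2.13) for the first-order kernel follows, using a toy system of our own choosing (the rigorous procedure for higher-order kernels subtracts the lower-order model outputs first; that refinement is omitted here).

```python
import numpy as np

rng = np.random.default_rng(2)
A, N = 1.0, 400_000
x = rng.normal(0.0, np.sqrt(A), N)       # white Gaussian probe

# "Unknown" system: v = h * x followed by a static nonlinearity
h = np.array([1.0, 0.6, 0.3, 0.1])
v = np.convolve(x, h)[:N]
y = v + 0.5 * v**2

# First-order Wiener kernel via (2.13) with n = 1:
# k1(tau) = time-average[y(t) x(t - tau)] / A
M = 6
k1_hat = np.array([np.mean(y[M:] * x[M - tau:N - tau]) for tau in range(M)]) / A
print(np.round(k1_hat, 2))   # ~ [1.0, 0.6, 0.3, 0.1, 0.0, 0.0]
```

The quadratic term contributes nothing to the first-order crosscorrelation because odd moments of a zero-mean Gaussian vanish, so the estimate recovers h itself.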
Although infinite-memory linear systems are not physical, there is a large class of infinite-memory nonlinear systems which are physical, such as a fuse, which never forgets whether the current through it ever exceeded its rating. Systems which switch their states are generalisations of the fuse and so are infinite-memory systems which are not members of the Wiener class of systems. Note that the model obtained is the optimum system, so that the mean-square error obtained with a white Gaussian input with $\phi_{xx}(j\omega) = A$ will not be smaller for any other system of the Wiener class. The $p$th-order Volterra series obtained by summing the kernels of the same order is thus the optimum $p$th-order Volterra model of the actual system for the given white Gaussian input, even though the system may not be representable by a convergent Volterra series. In practical applications, it is desirable to model a given system with an input with the same spectrum as that used in normal operation. For this, I have extended the G-functional theory to non-white Gaussian inputs whose spectra are factorable and also to non-Gaussian inputs for which there exists a one-to-one map to a Gaussian process. These extensions, as well as other related concepts, are detailed in [6]. With these extensions, optimum Volterra models of a given system also can be determined for these classes of inputs. Optimum models for any input will be discussed below.
2.8 General Wiener Model

From a study of the Wiener theory, it can be shown that any system of the Wiener class can be represented by the model shown in Figure 2.1.

Fig. 2.1: Wiener Model

Section A of the model is a single-input multi-output linear system. Section B is a multi-input multi-output no-memory system composed of just multipliers and adders. The single output of Section C is just a linear combination of its inputs. Sections A and B are the same for all systems of the Wiener class. The specific system being modelled is determined only by the amplifier gains in Section C. The outputs of Section A are the coefficients of a representation of the past of the system input. The nonlinear operations on the past of the input are performed in Section B. In a practical model, Sections A and B are constructed with only a finite number of outputs. The outputs of Section A then are the coefficients of what the model can represent of the past of the system input, and the outputs of Section B then are coordinates of the class of polynomial nonlinear operations that the model can represent. For an input from a Gaussian process, the outputs of Section A are jointly Gaussian and statistically independent, so that the outputs of Section B can be made
linearly independent, which enables the determination of the amplifier gains in Section C individually from certain simple averages. However, if the input, x(t), is not Gaussian, then the outputs of Section A are not jointly Gaussian and their joint distribution is extremely difficult to determine. The determination of a structure of Section B for which its outputs are linearly independent is then not possible. In such cases, one technique to determine the optimum amplifier gains in Section C is a surface search technique. If a surface search technique is used, the optimum system for any convex error criterion can be determined. However, another procedure, for inputs of any class, is to use the Gate function model, with which optimum systems even for certain non-convex error criteria can be determined.
2.9 The Gate Function Model

If the input, x(t), is not Gaussian or the desired error criterion is not the mean-square criterion, a practical approach is obtained by using Gate functions. In a Gate function model, Section B is composed of Gate functions, for which a given output of Section B is non-zero only if the output amplitudes of Section A are in a given multidimensional cell. As I've shown [6], a great advantage of the Gate function model is that the optimum amplifier gains can be determined individually for any weighted function of any error criterion, even many non-convex ones. However, the Gate function model usually will require more amplifier gains in Section C than required for the Wiener model with the same modelling error. This is the price that must be paid for the added generality and flexibility which is obtained with the Gate function model. But the number of amplifier gains can be substantially reduced by a judicious choice of the set of linear operators used in Section A. Some techniques for accomplishing this are detailed in [3, 6].

For a given input of a system, the output error is the difference between its response and some desired response. The measure of the size of the error used is called the error criterion. For a given error criterion, an optimum system is that system for which the size of the error is the smallest possible. Thus, different error criteria generally will result in different optimum systems. That is, there is no such thing as the optimum system; an optimum system is only optimum relative to some given error criterion. Classically, the mean-square error criterion has been used to determine optimum linear systems because of its analytic tractability and because the only data required are the autocorrelation function of the input and the crosscorrelation function of the input and desired output. However, other error criteria are often desired. For example, consider attempting to hit the bull's-eye of a target by using a controller to aim the gun. For such a system, small errors are well tolerated; it is just the large errors which it is desired to reduce significantly. The error criterion should then be one for which the measure of the size of the error is small for misses of the target centre less than the bull's-eye radius but increases rapidly for misses larger than the bull's-eye radius. Note that in all our discussion above, there is no requirement that the output, y(t), of the system being modelled be the output of a physical system. The output thus can be just some desired output, z(t), of some unknown system.
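To make the per-cell determination concrete, the following one-dimensional toy sketch (our own illustration, not from the original text) fits a gate function model to a desired response; under the mean-square criterion the optimum gain of a cell is the conditional mean of the desired output over that cell, while under the mean-absolute criterion it is the conditional median, and each gain is computed individually.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50_000
s = rng.normal(size=N)                    # a single Section-A output (toy case)
z = np.sin(2.0 * s) + 0.1 * rng.standard_normal(N)   # desired response

edges = np.linspace(-3.0, 3.0, 25)        # boundaries of the 1-D gate cells
cell = np.digitize(s, edges)              # gate index of each sample

def per_cell_gain(stat):
    """Optimum gain of each gate, fitted individually with statistic `stat`."""
    return np.array([stat(z[cell == c]) if np.any(cell == c) else 0.0
                     for c in range(edges.size + 1)])

gain_mse = per_cell_gain(np.mean)     # optimum under the mean-square criterion
gain_mae = per_cell_gain(np.median)   # optimum under the mean-absolute criterion
y_model = gain_mse[cell]              # gate-model output for the same input
```

The per-cell fits are independent of one another, which is exactly what permits arbitrary, even non-convex, error criteria to be handled gate by gate.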
The optimum system is called an optimum predictor if the desired output is a prediction of the system input as, for example, in predictive control in which it is desired to lead a target. The determination of an optimum predictor assumes that the future is not affected by the prediction. Thus, the prediction of the future of your speech waveform is valid only if you are not informed of the prediction, which would allow you to purposely modify its future. The only waveforms which can be predicted with zero error are quasianalytic ones, which are not physical waveforms, so that, in any practical case, there always will be some prediction error. The optimum system is called an optimum filter if the system input is a corrupted version of a signal and the desired system output is the uncorrupted signal. Except for the determination of an optimum LTI system using the mean-square error criterion, the analytic determination of an optimum system is generally not possible. The reason is that all the required data are normally not available and, even if they were known, the resulting equations are difficult, if not impossible, to solve analytically. In consequence, an approach that can be taken is to consider a system model that can represent any member of a representative class of systems by the choice of a set of model coefficients. The desired optimum system is assumed to be representable by a member of this class. The set of coefficients of the model which represents the optimum system is then determined. This determination can be done experimentally. If the optimum system is not a member of the class of systems that can be represented, then the experimental procedure results in the class member with the smallest error. This approach obviates the difficulties involved with analytical techniques.
2.10 An Optimum System Calculator

Of the various system models, the gate function model has the great advantage over the other models discussed in that it can be used to determine optimum systems for any input and any desired proper error criterion. However, the gate model has the disadvantage of requiring the determination of many more coefficients than the Wiener model. This disadvantage, though, can be overcome by using the gate function model only as an optimum system calculator. Advances over the last few years in computer technology have made such a calculator practical. The gate model so determined is not very practical to use as an optimum system since the number of its coefficients is large. However, the gate model only needs to be constructed once and used as a computer to determine optimum systems. For practical applications, one could then make a much simpler model of the gate model by constructing a Volterra model of it. This can be accomplished, for example, by first determining a Wiener model of it as described above. Note that this determination also can be done digitally by constructing a digital model and using a sample of a Gaussian random process as the input. This determination would not require much time since the number of coefficients of the Wiener model is so much smaller. The Wiener model then can be realised as a Volterra series in digital or analog form by summing the Volterra terms of the Wiener model with the same order.
If it appears that the error obtained with this Volterra model could possibly be reduced significantly, one could determine a second optimum gate model using, for example, a different set of linear systems and/or gates. The desired output used to determine the second optimum gate model would be the original desired output minus the output of the Volterra model that has been determined. The Volterra model of the second gate model could then be added in parallel to the original Volterra model to form a new Volterra model with a smaller error. The realisation of the new Volterra system is simplified by first summing terms of the same order. Note that the gate function model computer, which has not yet been constructed, is only used as a calculator to determine desired optimum systems, from which optimum systems with a simpler realisation are obtained. The philosophy of this procedure, to experimentally determine an optimum system for some desired error criterion from a representative sample of a system input and some desired system response, replaces the procedure of determining the properties of the input and desired system response approximately and then using these data to approximately determine the desired system analytically. This is the basic philosophic approach of the Gate function calculator for optimum system determination. Also note that, as opposed to analytical methods, the direct experimental determination of an optimum system with the Gate function calculator, for any desired error criterion with any input and desired response, results in a system for which the closeness of the determined system to the optimum system is controlled only by the number of coefficients used. Also, as discussed above, the residual error of the resulting system can easily be made smaller if desired. This chapter has been only a survey illustrating the importance of the Volterra series and Wiener theory in system theory and their usefulness in system applications. Full details of the developments discussed and possible future developments are contained in [6].
References

1. Fréchet, M.: Sur les fonctionnelles continues. Annales Scientifiques de l'École Normale Supérieure, 3rd Ser. 27, 193–216 (1910)
2. Lee, Y.W., Schetzen, M.: Measurement of the Wiener kernels of a nonlinear system by crosscorrelation. International Journal of Control 2(3), 237–254 (1965)
3. Schetzen, M.: Asymptotic optimum Laguerre series. IEEE Transactions on Circuit Theory CT-18(5), 493–500 (1971)
4. Schetzen, M.: Linear Time-Invariant Systems. John Wiley & Sons, New York (2003)
5. Schetzen, M.: Airborne Doppler Radar: Applications, Theory, and Philosophy, vol. 215. American Institute of Aeronautics and Astronautics, Reston (2006)
6. Schetzen, M.: The Volterra and Wiener Theories of Nonlinear Systems. Krieger Publishing Co., Malabar, FL, Reprint Edition with Additional Material (2006)
7. Schetzen, M.: Analysis of the single-mode laser-diode linear model. Optics Communication 282, 2901–2905 (2009)
8. Schetzen, M., Yildirim, R.: System theory of the single-mode laser-diode. Optics Communication 219, 341–350 (2003)
9. Schetzen, M., Yildirim, R., Çelebi, F.V.: Intermodulation distortion of the single-mode laser-diode. Applied Physics B 93, 837–847 (2008)
10. Volterra, V.: Theory of Functionals and of Integral and Integro-Differential Equations. Dover Publications, Inc., New York (1959)
11. Wiener, N.: Response of a Nonlinear Device to Noise. Report 129, Radiation Laboratory, MIT, Cambridge, MA (April 1942). Also published as U.S. Department of Commerce Publication PB-58087
12. Wiener, N.: Nonlinear Problems in Random Theory. Technology Press, MIT & Wiley, New York (1958)
13. Yildirim, R., Schetzen, M.: Applications of the single-mode laser-diode. Optics Communication 219, 351–355 (2003)
Chapter 3
An Optimal Two-stage Identification Algorithm for Hammerstein–Wiener Nonlinear Systems

Er-Wei Bai
Dept. of Electrical and Computer Engineering, University of Iowa. e-mail: [email protected]
3.1 Introduction

Consider a scalar stable discrete-time nonlinear dynamic system represented by

$$y(k) = \sum_{i=1}^{p} a_i \Big\{ \sum_{l=1}^{q} d_l g_l[y(k-i)] \Big\} + \sum_{j=1}^{n} b_j \Big\{ \sum_{t=1}^{m} c_t f_t[u(k-j)] \Big\} + \eta(k) \qquad (3.1)$$
where $y(k)$, $u(k)$ and $\eta(k)$ are the system output, input and disturbance at time $k$, respectively. The $g_l(\cdot)$'s and $f_t(\cdot)$'s are nonlinear functions and

$$a = (a_1, \ldots, a_p), \quad b = (b_1, \ldots, b_n), \quad c = (c_1, \ldots, c_m), \quad d = (d_1, \ldots, d_q)$$
denote the system parameter vectors. The model (3.1) may be considered as a system where two static nonlinear elements N1 and N2 surround a linear block. It is different from the well-known Wiener–Hammerstein model [2], where two linear blocks surround a static nonlinear element, and also different from the Hammerstein model discussed in [3, 4, 7, 8, 9], composed of a static nonlinear element followed by a linear block. The purpose of identification is to estimate the unknown parameter vectors $a$, $b$, $c$ and $d$ from the observed input-output measurements. Throughout the chapter, the $f_t$'s ($t = 1, 2, \ldots, m$) and $g_l$'s ($l = 1, 2, \ldots, q$) are assumed to be a priori known smooth nonlinear functions, and the orders $q$, $n$, $p$ and $m$ are assumed to be known as well. Although applied to the modelling of some physical systems (e.g., see [10]), the system (3.1) has received little attention as far as identification is concerned. On the contrary, there exists a large body of work on identification of the Wiener–Hammerstein model and the Hammerstein model. Unfortunately, schemes developed for either the Wiener–Hammerstein model or the Hammerstein model do not apply to the system (3.1) directly. The widely suggested algorithm for the Wiener–Hammerstein
model is the correlation technique [2], which relies on a separability assumption. Besides computational complexity, this assumption is restrictive and does not appear to work for the system (3.1) because of the presence of two nonlinear elements instead of one. The most popular identification schemes for the Hammerstein model are the Narendra-Gallman (NG) algorithm [7] and its variations [8]. The original NG algorithm may be divergent [9]. But, with normalisation of the estimates at each iteration [8], it is globally asymptotically convergent provided that the linear block in the Hammerstein model is FIR and the input $u(k)$ to the system is white. The NG algorithm does not, however, apply to the system (3.1), because of the non-FIR linear block and also because of the presence of the output nonlinear element. In fact, it is verified that the NG algorithm is not convergent for the system (3.1), even when normalised at each iteration.

The purpose of this chapter is to propose an algorithm for identification of the system (3.1). It is an optimal two-stage approach, also referred to as the overparametrisation method. First, the least squares estimate of an over-sized parameter vector is obtained. Then, the optimal estimates $\hat a$, $\hat b$, $\hat c$ and $\hat d$ are calculated based on this least squares estimate. The estimates $\hat a$, $\hat b$, $\hat c$ and $\hat d$ provided by the proposed algorithm converge to the true $a$, $b$, $c$ and $d$ if the noise sequence is absent, and converge to $a$, $b$, $c$ and $d$ with probability one if the noise sequence is white. The proposed algorithm is based on the works in [3, 4, 5] suggested for identification of the Hammerstein model, but it extends them in some important ways:

• The proposed algorithm applies to the system (3.1) and thus is more general than those in [4, 5]. Note that the model discussed in [4, 5] contains only one nonlinear block, a subset of the system (3.1).
• The estimates given by the proposed algorithm in this chapter are globally optimal, while those presented in [4, 5] are only locally optimal, and no optimality can be claimed for the estimate obtained in [3]. In particular, if proper pre-filtering on the observed data is allowed, the proposed algorithm is globally optimal for arbitrary noises.
• One of the key assumptions in [3] is that the parameter vector of a bilinear Hammerstein system can be modelled with the first part being linear, which implies that the system can only contain one nonlinear block and, moreover, that the exact delay of the unknown system has to be known a priori. Our proposed algorithm does not require this assumption.

Finally, it is important to point out that the computation involved in the proposed algorithm is very simple. The first stage is the least squares estimate, which can be calculated recursively as the number of samples increases. The second stage is a global search which is elegantly transformed into the computation of the singular value decompositions (SVD) of two matrices with fixed dimensions $n \times m$ and $p \times q$, independent of the number of data points.
3.2 Optimal Two-stage Algorithm

Notice that the parametrisation of the system (3.1) is not unique. For instance, any parameter vectors $\bar b = \alpha b$ and $\bar c = \alpha^{-1} c$, or $\bar a = \beta a$ and $\bar d = \beta^{-1} d$, for some non-zero constant $\alpha$ or $\beta$, provide an identical system to the one in (3.1). In other words, no identification experiment can distinguish between the parameter vector sets $(b, c)$ and $(\alpha b, \alpha^{-1} c)$, or the sets $(a, d)$ and $(\beta a, \beta^{-1} d)$. To obtain a unique parametrisation, the first elements of the vectors $b$ and $a$ may be fixed to 1, a technique often used in the literature. The problem with this technique is that it indirectly presumes the delay of the system to be 1, which may not be the case. To avoid this problem, a different approach will be used:

Assumption 3.1 (Uniqueness). Consider the system (3.1). Assume that $ad^T$ and $bc^T$ are both non-zero. Moreover, assume that $\|a\|_2 = 1$ and $\|b\|_2 = 1$ ($\|\cdot\|_2$ stands for the 2-norm) and that the signs of the first non-zero elements of $a$ and $b$ are positive.

Under the Uniqueness Assumption, it can be easily verified that the parametrisation of the system (3.1) is unique. Non-zeroness of $bc^T$ and $ad^T$ is assumed to avoid trivialising the identification problem. Define
$$\theta = (b_1 c_1, \ldots, b_1 c_m, b_2 c_1, \ldots, b_2 c_m, \ldots, b_n c_1, \ldots, b_n c_m, a_1 d_1, \ldots, a_1 d_q, \ldots, a_p d_1, \ldots, a_p d_q)^T = (\theta_1, \ldots, \theta_{nm}, \theta_{nm+1}, \ldots, \theta_{nm+pq})^T,$$

$$\Theta_{bc} = bc^T = \begin{pmatrix} b_1 c_1 & b_1 c_2 & \cdots & b_1 c_m \\ b_2 c_1 & b_2 c_2 & \cdots & b_2 c_m \\ \vdots & \vdots & \ddots & \vdots \\ b_n c_1 & b_n c_2 & \cdots & b_n c_m \end{pmatrix}, \qquad \Theta_{ad} = ad^T = \begin{pmatrix} a_1 d_1 & a_1 d_2 & \cdots & a_1 d_q \\ a_2 d_1 & a_2 d_2 & \cdots & a_2 d_q \\ \vdots & \vdots & \ddots & \vdots \\ a_p d_1 & a_p d_2 & \cdots & a_p d_q \end{pmatrix},$$

and

$$\phi(k) = \big(f_1[u(k-1)], \ldots, f_m[u(k-1)], \ldots, f_1[u(k-n)], \ldots, f_m[u(k-n)],\ g_1[y(k-1)], \ldots, g_q[y(k-1)], \ldots, g_1[y(k-p)], \ldots, g_q[y(k-p)]\big)^T.$$

The system (3.1) can now be written as $y(k) = \phi^T(k)\theta + \eta(k)$. For a given data set $\{u(k), y(k)\}_{k=1}^{N}$, let $Y_N = (y(1), \ldots, y(N))$, $\eta_N = (\eta(1), \ldots, \eta(N))$ and $\Phi_N = (\phi^T(1), \ldots, \phi^T(N))$. Then,

$$Y_N = \Phi_N \theta + \eta_N. \qquad (3.2)$$

Further, let

$$\hat\theta(N) = (\hat\theta_1, \ldots, \hat\theta_{nm}, \hat\theta_{nm+1}, \ldots, \hat\theta_{nm+pq})^T, \quad \hat a(N) = (\hat a_1, \hat a_2, \ldots, \hat a_p)^T, \quad \hat b(N) = (\hat b_1, \hat b_2, \ldots, \hat b_n)^T,$$
$$\hat c(N) = (\hat c_1, \hat c_2, \ldots, \hat c_m)^T, \quad \text{and} \quad \hat d(N) = (\hat d_1, \hat d_2, \ldots, \hat d_q)^T \qquad (3.3)$$

denote the estimates of $\theta$, $a$, $b$, $c$ and $d$, respectively, using the data set $\{u(k), y(k)\}_{k=1}^{N}$, and let
$$\hat\Theta_{bc}(N) = \begin{pmatrix} \hat\theta_1 & \cdots & \hat\theta_m \\ \hat\theta_{m+1} & \cdots & \hat\theta_{2m} \\ \vdots & \ddots & \vdots \\ \hat\theta_{(n-1)m+1} & \cdots & \hat\theta_{nm} \end{pmatrix} \quad \text{and} \quad \hat\Theta_{ad}(N) = \begin{pmatrix} \hat\theta_{nm+1} & \cdots & \hat\theta_{nm+q} \\ \hat\theta_{nm+q+1} & \cdots & \hat\theta_{nm+2q} \\ \vdots & \ddots & \vdots \\ \hat\theta_{nm+(p-1)q+1} & \cdots & \hat\theta_{nm+pq} \end{pmatrix} \qquad (3.4)$$

denote the estimates of $\Theta_{bc}$ and $\Theta_{ad}$, respectively. Now, the two-stage identification algorithm for a given data set $\{u(k), y(k)\}_{k=1}^{N}$ can be summarised as follows.

Two-stage Identification Algorithm: Consider the system (3.1) under the Uniqueness Assumption. For a given data set $\{u(k), y(k)\}_{k=1}^{N}$:

Step 1: Calculate the least squares estimate $\hat\theta(N) = \hat\theta_{ls}(N) = (\Phi_N^T \Phi_N)^{-1} \Phi_N^T Y_N$.
Step 2: Construct $\hat\Theta_{bc}(N)$ and $\hat\Theta_{ad}(N)$ from $\hat\theta(N) = \hat\theta_{ls}(N)$ as in (3.4) and let

$$\hat\Theta_{bc}(N) = \sum_{i=1}^{\min(n,m)} \sigma_i \mu_i \nu_i^T \quad \text{and} \quad \hat\Theta_{ad}(N) = \sum_{i=1}^{\min(p,q)} \delta_i \xi_i \zeta_i^T$$

be their singular value decompositions (SVD), where the $\mu_i$'s ($i = 1, 2, \ldots, n$), $\nu_i$'s ($i = 1, 2, \ldots, m$), $\xi_i$'s ($i = 1, 2, \ldots, p$) and $\zeta_i$'s ($i = 1, 2, \ldots, q$) are $n$-, $m$-, $p$- and $q$-dimensional orthonormal vectors, respectively.

Step 3: Let $s_\mu$ denote the sign of the first non-zero element of $\mu_1$ and $s_\xi$ the sign of the first non-zero element of $\xi_1$. Define the estimates

$$\hat b(N) = s_\mu \mu_1, \quad \hat c(N) = s_\mu \sigma_1 \nu_1, \quad \hat a(N) = s_\xi \xi_1, \quad \hat d(N) = s_\xi \delta_1 \zeta_1.$$

The idea of the above algorithm is simple. First, the least squares estimate $\hat\theta_{ls}(N)$, or equivalently $\hat\Theta_{bc}(N)$ and $\hat\Theta_{ad}(N)$, is sought to minimise the prediction error. Since $\hat\theta_{ls}(N)$ is of high dimension, usually higher than $p + n + m + q = \dim(a) + \dim(b) + \dim(c) + \dim(d)$, a lower $(q + p + n + m)$-dimensional vector $(\hat a^T(N), \hat b^T(N), \hat c^T(N), \hat d^T(N))^T$, representing the estimates of $a$, $b$, $c$ and $d$, is to be determined. Note that $\theta = (b_1 c^T, b_2 c^T, \ldots, b_n c^T, a_1 d^T, \ldots, a_p d^T)^T$ has a special structure in terms of $a$, $b$, $c$ and $d$, which the least squares estimate $\hat\theta(N) = \hat\theta_{ls}(N)$ need not share. Let $\mathrm{vec}(H)$ indicate the column vector obtained by stacking the columns of $H$, i.e., $\mathrm{vec}(H) = (h_1^T, h_2^T, \ldots, h_l^T)^T$ if $H = (h_1, h_2, \ldots, h_l)$. Then,

$$\big\|(\hat b_1(N)\hat c^T(N), \ldots, \hat b_n(N)\hat c^T(N), \hat a_1(N)\hat d^T(N), \ldots, \hat a_p(N)\hat d^T(N))^T - \hat\theta(N)\big\|_2^2 = \left\|\begin{pmatrix} \mathrm{vec}(\hat b(N)\hat c^T(N)) \\ \mathrm{vec}(\hat a(N)\hat d^T(N)) \end{pmatrix} - \hat\theta(N)\right\|_2^2 = \big\|\hat b(N)\hat c^T(N) - \hat\Theta_{bc}(N)\big\|_F^2 + \big\|\hat a(N)\hat d^T(N) - \hat\Theta_{ad}(N)\big\|_F^2,$$

where $\|\cdot\|_F$ stands for the matrix Frobenius norm. Thus, the closest $\hat a(N)$, $\hat b(N)$, $\hat c(N)$ and $\hat d(N)$ to $\hat\theta(N)$ in the 2-norm sense are given by

$$(\hat a(N), \hat d(N)) = \arg\min_{x \in \mathbb{R}^p,\, w \in \mathbb{R}^q} \big\|\hat\Theta_{ad}(N) - x w^T\big\|_F^2 \quad \text{and} \quad (\hat b(N), \hat c(N)) = \arg\min_{z \in \mathbb{R}^n,\, v \in \mathbb{R}^m} \big\|\hat\Theta_{bc}(N) - z v^T\big\|_F^2.$$
The solutions $\hat b(N)$, $\hat c(N)$, $\hat a(N)$ and $\hat d(N)$ are provided by the SVD decompositions of $\hat\Theta_{bc}(N)$ and $\hat\Theta_{ad}(N)$ as given in Step 2 of the proposed Identification Algorithm. The key difference between the above Two-stage Identification Algorithm and the existing results along this line, e.g. in [4, 5], is that the estimates $\hat a(N)$, $\hat d(N)$, $\hat b(N)$, $\hat c(N)$ here are obtained through a search over the entire parameter space, while in [4, 5] they were searched only over a small subset. Now, the following results can be derived [1].

Theorem 3.1. Consider the system (3.1) and the Two-stage Identification Algorithm under the Uniqueness Assumption. Then,

1. For any $N > 0$, if $\Phi_N$ is full column rank and the disturbance $\eta(k) \equiv 0$, then
$$\hat a(N) = a, \quad \hat b(N) = b, \quad \hat c(N) = c, \quad \hat d(N) = d. \qquad (3.5)$$
2. Let the disturbance $\eta(k)$ be white with zero mean and finite variance, and independent of $u(k)$. Suppose the input $u(k)$ is bounded and the regressor $\phi(k)$ is persistently exciting (PE), i.e.,

$$\alpha_2 I \geq \sum_{k=k_0}^{k_0 + l_0} \phi(k)\phi^T(k) \geq \alpha_1 I > 0$$
for any k0 ≥ 0 and some l0 > 0. Then, with probability 1 as N −→ ∞,
$$\hat a(N) \to a, \quad \hat b(N) \to b, \quad \hat c(N) \to c, \quad \hat d(N) \to d. \qquad (3.6)$$
Theorem 3.1 is also valid even if the disturbance $\eta(k)$ is not zero mean, by slightly modifying the system equation. To this end, let $e = E\eta(k) \neq 0$ be the mean value of $\eta(k)$. Then, the system (3.1) can be re-written as

$$y(k) = \sum_{i=1}^{p} a_i \Big\{ \sum_{l=1}^{q} d_l g_l[y(k-i)] \Big\} + \sum_{j=1}^{n} b_j \Big\{ \sum_{t=1}^{m} c_t f_t[u(k-j)] \Big\} + e + v(k)$$

where $v(k) = \eta(k) - e$ is white and zero mean. Re-name
$$\theta = (b_1 c_1, \ldots, b_1 c_m, \ldots, b_n c_1, \ldots, b_n c_m, a_1 d_1, \ldots, a_1 d_q, \ldots, a_p d_1, \ldots, a_p d_q, e)^T$$

and

$$\phi(k) = \big(f_1[u(k-1)], \ldots, f_m[u(k-1)], \ldots, f_1[u(k-n)], \ldots, f_m[u(k-n)],\ g_1[y(k-1)], \ldots, g_q[y(k-1)], \ldots, g_1[y(k-p)], \ldots, g_q[y(k-p)],\ 1\big)^T.$$

The above system is given by $y(k) = \phi^T(k)\theta + v(k)$, which is exactly of the same form as one with a zero-mean disturbance $v(k)$. Thus, by applying the proposed Identification Algorithm with the re-named $\phi(k)$ and $\hat\theta(N)$, all the results of Theorem 3.1 follow.

To illustrate the method, a simulation example is provided. Consider the system

$$y(k) = a_1\big(d_1 y(k-1) + d_2 \sin(y(k-1))\big) + b_1\big(c_1 u(k-1) + c_2 u^2(k-1)\big) + b_2\big(c_1 u(k-2) + c_2 u^2(k-2)\big) + \eta(k)$$

where $a = (a_1) = 1$, $d = (d_1, d_2)^T = (0.5, 0.25)^T$, $b = (b_1, b_2)^T = (1/\sqrt{5}, -2/\sqrt{5})^T = (0.4472, -0.8944)^T$, and $c = (c_1, c_2)^T = (1, 4)^T$. For the simulation, the input is

$$u(k) = 2\sin(2k) + 2\sin(4k) + 0.15\sin(6k) + 0.15\sin(8k) + 0.1\sin(10k)$$

and the $\eta(k)$ are i.i.d. random variables, uniform in $[-0.5, 0.5]$. For $N = 100$, the proposed two-stage algorithm gives the following estimates:

$$\hat a(100) = 1, \quad \hat d(100) = (0.5006, 0.2404)^T, \quad \hat b(100) = (0.4463, -0.8949)^T, \quad \hat c(100) = (1.0054, 4.0091)^T.$$

They are very close to the true but unknown $a$, $b$, $c$ and $d$, even for a small $N = 100$.
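The example can be reproduced with a few lines of numerical code. The sketch below is our own rendering of the Two-stage Identification Algorithm on this example; the random seed and the handling of the first two start-up samples are incidental choices, so the digits will differ slightly from those quoted above.

```python
import numpy as np

rng = np.random.default_rng(4)
d, b, c = np.array([0.5, 0.25]), np.array([1, -2]) / np.sqrt(5), np.array([1.0, 4.0])

N = 100
t = np.arange(N + 2)
u = (2*np.sin(2*t) + 2*np.sin(4*t) + 0.15*np.sin(6*t)
     + 0.15*np.sin(8*t) + 0.1*np.sin(10*t))

# Simulate the example system (p = 1, q = 2, n = 2, m = 2), with a1 = 1
y = np.zeros(N + 2)
for k in range(2, N + 2):
    y[k] = (d[0]*y[k-1] + d[1]*np.sin(y[k-1])
            + b[0]*(c[0]*u[k-1] + c[1]*u[k-1]**2)
            + b[1]*(c[0]*u[k-2] + c[1]*u[k-2]**2)
            + rng.uniform(-0.5, 0.5))

# Stage 1: least squares for the overparametrised theta (nm + pq = 6 entries)
Phi = np.array([[u[k-1], u[k-1]**2, u[k-2], u[k-2]**2,
                 y[k-1], np.sin(y[k-1])] for k in range(2, N + 2)])
theta = np.linalg.lstsq(Phi, y[2:], rcond=None)[0]

def rank_one(T):
    """Steps 2-3: best rank-one factor pair of T by SVD, with the sign fixed
    so that the first non-zero element of the left factor is positive."""
    U, S, Vt = np.linalg.svd(T)
    s = np.sign(U[np.flatnonzero(U[:, 0])[0], 0])
    return s * U[:, 0], s * S[0] * Vt[0]

b_hat, c_hat = rank_one(theta[:4].reshape(2, 2))   # Theta_bc(N)
a_hat, d_hat = rank_one(theta[4:].reshape(1, 2))   # Theta_ad(N)
print(a_hat, d_hat, b_hat, c_hat)                  # close to a, d, b, c
```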
In Theorem 3.1, convergence of the estimates to the true but unknown parameter vectors is pursued. Clearly, convergence to the true values can be obtained only for some disturbances, e.g., white noises. In general, convergence cannot be guaranteed for arbitrary noises. Thus, a more interesting question is to find estimates $\hat a(N)$, $\hat b(N)$, $\hat c(N)$, $\hat d(N)$ that minimise the weighted prediction error:

$$(\hat a(N), \hat b(N), \hat c(N), \hat d(N)) = \arg\min_{\hat a, \hat b, \hat c, \hat d} \big\|Y_N - \hat Y_N(\hat a, \hat b, \hat c, \hat d)\big\|_{A^T A}^2, \qquad (3.7)$$

where $\hat Y_N(\hat a, \hat b, \hat c, \hat d) = (\hat y(1, \hat a, \hat b, \hat c, \hat d), \hat y(2, \hat a, \hat b, \hat c, \hat d), \ldots, \hat y(N, \hat a, \hat b, \hat c, \hat d))^T$ with

$$\hat y(k, \hat a, \hat b, \hat c, \hat d) = \sum_{i=1}^{p} \hat a_i \Big\{ \sum_{l=1}^{q} \hat d_l g_l[y(k-i)] \Big\} + \sum_{j=1}^{n} \hat b_j \Big\{ \sum_{t=1}^{m} \hat c_t f_t[u(k-j)] \Big\}.$$

Here $\|X\|_{A^T A}^2 = X^T A^T A X$, with some weighting matrix $A$. This is a difficult problem in general, requiring a nonlinear global search. However, for some matrices $A$, the proposed two-stage algorithm in fact provides a solution to this problem, as shown below [1].
Theorem 3.2. Consider the system (3.1) with some weighting matrix $A$ as in (3.7). Let the least squares estimate in the first step of the Two-stage Identification Algorithm be re-defined as $\hat\theta_{ls}(N) = (\bar\Phi_N^T \bar\Phi_N)^{-1} \bar\Phi_N^T \bar Y_N$ with $\bar\Phi_N = A\Phi_N$ and $\bar Y_N = AY_N$. Then, the estimates derived from the proposed Two-stage Identification Algorithm are the solution of
$$(\hat a(N), \hat b(N), \hat c(N), \hat d(N)) = \arg\min_{\hat a, \hat b, \hat c, \hat d} \big\|Y_N - \hat Y_N(\hat a, \hat b, \hat c, \hat d)\big\|_{A^T A}^2 \qquad (3.8)$$
for any $A$ such that all the singular values of $A\Phi_N$ are the same and non-zero. The weighting matrix $A$ in Theorem 3.2 may be considered as a pre-filtering of the observed raw data $Y_N$ and $\Phi_N$. An important observation in identification is that the regressor vectors $\phi(k)$ should span all the directions in the parameter space, preferably with equal energy in each direction. This is equivalent to saying that the maximum eigenvalue and the minimum eigenvalue of the matrix $\Phi_N^T \Phi_N$ should not differ by too much, a well-known concept referred to as the condition number of a matrix. In fact, if the condition number goes to infinity, parameter convergence cannot be guaranteed even for arbitrarily small noises. The meaning of the weighting matrices $A$ allowable in Theorem 3.2 is to pre-treat the data so that the condition number of the pre-treated data $(A\Phi_N)^T A\Phi_N$ is 1.
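As an aside, one data-dependent way to build such an $A$ (our own construction, offered only to show that an allowable $A$ is computable, not prescribed by the chapter) follows from the thin SVD $\Phi_N = U S V^T$: take $A = U S^{-1} U^T + (I - UU^T)$, which is non-singular and gives $A\Phi_N = UV^T$, whose singular values are all 1.

```python
import numpy as np

def equalising_prefilter(Phi):
    """Build a non-singular A with all singular values of A @ Phi equal to 1.
    From the thin SVD Phi = U S V^T, A = U S^{-1} U^T + (I - U U^T) acts as
    S^{-1} on range(Phi) and as the identity on its orthogonal complement."""
    U, S, Vt = np.linalg.svd(Phi, full_matrices=False)
    N = Phi.shape[0]
    return U @ np.diag(1.0 / S) @ U.T + (np.eye(N) - U @ U.T)

Phi = np.random.default_rng(5).normal(size=(50, 6))
A = equalising_prefilter(Phi)
print(np.linalg.svd(A @ Phi, compute_uv=False))   # all (numerically) 1.0
```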
3.3 Concluding Remarks

It was shown that the proposed two-stage identification algorithm converges to the true but unknown parameters in the noise-free and white-noise cases. In the presence of arbitrary noises, it still converges to the globally optimal values for certain weighting matrices. The problem of finding optimal estimates for arbitrary weighting matrices remains open. However, the following observation is useful. For any non-singular weighting matrix $A$, let $\bar\Phi_N = A\Phi_N$, $\bar Y_N = AY_N$ and $U\Lambda V^T = \bar\Phi_N$ be the SVD decomposition of $\bar\Phi_N$. Then,

$$\|Y_N - \Phi_N\theta\|_{A^T A}^2 = \|\bar Y_N - \bar\Phi_N \hat\theta_{ls}(N)\|_2^2 + \|\bar\Phi_N(\hat\theta_{ls}(N) - \theta)\|_2^2 = \|\bar Y_N - \bar\Phi_N \hat\theta_{ls}(N)\|_2^2 + \|\Lambda V^T(\hat\theta_{ls}(N) - \theta)\|_2^2.$$

Therefore, for any non-singular $A$,

$$\theta = \arg\min_{\theta} \|Y_N - \Phi_N\theta\|_{A^T A}^2 \iff \theta = \arg\min_{\theta} \|\Lambda V^T(\hat\theta_{ls}(N) - \theta)\|_2^2.$$

A special case is that in which all the singular values of $\bar\Phi_N = A\Phi_N$ are the same and non-zero. In this case, $\theta = \arg\min_\theta \|Y_N - \Phi_N\theta\|_{A^T A}^2 \iff \theta = \arg\min_\theta \|\hat\theta_{ls}(N) - \theta\|_2^2$, and the solution is provided by Theorem 3.2.

The chapter is based on [1] with permission from Automatica/Elsevier.
References 1. Bai, E.W.: An optimal two-stage identification algorithm for a class of nonlinear systems. Automatica 34, 333–338 (1998) 2. Billings, S.A., Fakhouri, S.Y.: Identification of a class of nonlinear systems using correlation analysis. Proc. of IEE 125, 691–697 (1978) 3. Boutayeb, M., Rafaralahy, H., Darouach, M.: A robust and recursive identification method for Hammerstein model. In: IFAC World Congress, San Francisco, pp. 447–452 (1996) 4. Chang, F., Luus, R.: A non-iterative method for identification using Hammerstein model. IEEE Trans. on Auto. Contr. 16, 464–468 (1971) 5. Hsia, T.: A multi-stage least squares method for identifying Hammerstein model nonlinear systems. In: Proc. of CDC, Clearwater Florida, pp. 934–938 (1976) 6. Ljung, L.: Consistency of the least squares identification method. IEEE Trans. on Auto. Contr. 21, 779–781 (1976) 7. Narendra, K.S., Gallman, P.G.: An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Trans. on Auto. Contr. 11, 546–550 (1966) 8. Rangan, S., Wolodkin, G., Poolla, K.: Identification methods for Hammerstein systems. In: Proc. of CDC, New Orleans, pp. 697–702 (1995) 9. Stoica, P.: On the convergence of an iterative algorithm used for Hammerstein system identification. IEEE Trans. on Auto. Contr. 26, 967–969 (1981) 10. Zhang, Y.K., Bai, E.W.: Simulation of spring discharge from a limestone aquifer in Iowa. Hydrogeology Journal 4, 41–54 (1996)
Chapter 4
Compound Operator Decomposition and Its Application to Hammerstein and Wiener Systems

Jozef Vörös
Slovak University of Technology, Faculty of Electrical Engineering and Information Technology, Institute of Control and Industrial Informatics, Ilkovicova 3, 812 19 Bratislava, Slovakia. e-mail: [email protected]
4.1 Introduction

One of the main difficulties in automatic control is caused by the fact that real systems are generally nonlinear. For better mathematical tractability of control system descriptions, we are usually forced to make proper assumptions and to use approximations and/or simplifications, hoping that any possible side effects will not be too severe. Therefore, the approaches dealing with nonlinear dynamic systems are generally restrictive in their assumptions and applicable to special cases only. In the case of block-oriented nonlinear systems, we are confronted with system descriptions given by compositions of mappings or operators $f = f_1 \circ f_2 \circ \cdots \circ f_n$ defined on nonempty sets $X_i$, where $i = 1, 2, \ldots$, and $f_i: X_i \to X_{i+1}$. This description can seldom be used in analytic form (if one even exists), because the corresponding mathematical expression for the input-output relation is often too intricate. Appropriate decompositions of the compound mapping $f$ may provide suitable mathematical models, which simplify the identification of block-oriented nonlinear systems based on the input and output variables, even though one or more generally unmeasurable internal variables will have to be considered.

In the following, a technique for compound mapping decomposition is presented that can reduce the complexity of the system description. Then its application to the block-oriented nonlinear dynamic systems of Hammerstein and Wiener types is shown. Finally, the identification of Hammerstein–Wiener systems with piecewise
36
J. V¨or¨os
linear characteristics is presented where more and different ways of decomposition of the system operator are performed to simplify the system description.
4.2 Decomposition Let f , g and h be (one-to-one) mappings defined on nonempty sets U, X , and Y as follows: f :U →X
(4.1)
g:X →Y
(4.2)
h = g◦ f :U →Y
(4.3)
then where the mapping g ◦ f is called the composition of f and g. The notation g ◦ f reminds us that f (inner mapping) acts before g (outer mapping). The mapping composed from mappings f and g assigns just one element y ∈ Y to u ∈ U, or the corresponding x ∈ X , according to: y = g(x) = g [ f (u)] = h(u)
(4.4)
x = f (u) .
(4.5)
where
U
X
f
g
Y
h=g○f Fig. 4.1: Compound mapping
Let us assume the mapping g can be decomposed (split) and uniquely expressed by two mappings α : X →Y (4.6)
β : X →Y .
(4.7)
Then the mapping g can be defined on the following Cartesian product of two identical copies of set X g = α ⊕β : X ⊗X →Y . (4.8)
4
Compound Operator Decomposition
37
Replacement of the set forming the domain of the mapping by the Cartesian product of two copies of the same set is correct; it does not change the domain topology and does not require any assumptions or restrictions. Choice of an appropriate form of the decomposition (4.8) may simplify the mathematical description of the relations in (4.4) in some cases, mainly by splitting the mapping into additive or multiplicative forms given by a sum or a product of two mappings. This means that two mappings α and β with the domain X will exist for the mapping g such that in the additive case: g(x) = α (x) + β (x) ,
(4.9)
while in the multiplicative case: g(x) = α (x).β (x) ,
(4.10)
for every x ∈ X. Note that in the case of analytic mappings such decomposition forms always exist. The above mentioned way of decomposition is the basis for the so-called key term separation principle consisting of the following steps. Assume the outer mapping of (4.4) can be decomposed into the additive form (4.9). Then we substitute (4.5) only for x in the first term of (4.9), i.e. in the so-called key term as follows: g(x) = α [ f (u)] + β (x) .
(4.11)
After this half-substitution the domain of the compound mapping has been changed to the Cartesian product of inner and outer mapping domains g :U ⊗X →Y
(4.12)
and the couple of mappings given by (4.5) and (4.11) uniquely represents the original compound mapping (4.4). As no restrictions were imposed on the mappings α and β , it will be advantageous to choose the mapping α as the identity mapping. Then in the additive case: g(x) = x + β (x) = f (u) + β (x)
(4.13)
while in the multiplicative case: g(x) = x.β (x) = f (u).β (x) .
(4.14)
In some cases, the application of the key term separation principle may significantly simplify the mathematical description of the relations in (4.4).
4.2.1 Serial Application The key term separation principle can be applied more times within a compound mapping, namely in series, i.e. sequentially and repeatedly. Let Xi , i = 0, 1, . . . , n, be an ensemble of nonempty sets and let fi be mappings defined as follows:
38
J. V¨or¨os
fi : Xi−1 → Xi ,
i = 1, 2, . . . , n .
(4.15)
Let the composition of these mappings h = fn ◦ fn−1 ◦ · · · ◦ f1 : X0 → Xn
(4.16)
be given by the following relations x1 = f1 (x0 ) x2 = f2 (x1 ) = f2 ( f1 (x0 )) .. . xn = fn (xn−1 ) = fn ( fn−1 (· · · f2 ( f1 (x0 )) · · · ) = h(x0 )
(4.17)
where xi ∈ Xi . Assume that the mapping fn can be decomposed in the abovedescribed way, i.e. it can be defined on the Cartesian product of sets fn : Xn−1 ⊗ Xn−1 → Xn .
(4.18)
By analogy with (4.12) the domain of this mapping can be rewritten as follows: fn : Xn−2 ⊗ Xn−1 → Xn .
(4.19)
Further assuming that the mappings (4.15) for i = 1, 2, 3, . . . , n − 1, can be decomposed in the same way, then the compound mapping (4.16) can be replaced by the following equivalent mapping h : X0 ⊗ X1 ⊗ X2 ⊗ · · · ⊗ Xn−2 ⊗ Xn−1 → Xn
(4.20)
supplemented with the ensemble of mappings (4.15). If we choose the additive form of decompositions (4.13) for (4.17) and half-substitute only for the separated key term x1 : x2 = x1 + β2(x1 ) = f1 (x0 ) + β2 (x1 ) x3 = x2 + β3(x2 ) .. . xn = xn−1 + βn (xn−1 ) ,
(4.21)
the compound mapping (4.16) can be described by the equation n
xn = f1 (x0 ) + ∑ βi (xi−1 ) ,
(4.22)
i=2
i.e. by a sum of mappings. From the mathematical point of view, the equations (4.21) and (4.22) may be simpler than the original mapping (4.16).
4
Compound Operator Decomposition
39
4.2.2 Parallel Application The key term separation principle can be applied more times within a compound mapping also in parallel way — simultaneously. For example, let ϕ1 , . . . , ϕn be mappings defined on sets U1 ,U2 , . . . ,Un , X1 , X2 , . . . , Xn as
ϕi = Ui → Xi
i = 1, 2, . . . , n
(4.23)
and let γ be a mapping defined on sets X1 , X2 , . . . , Xn and Y as
γ = X1 ⊗ X2 ⊗ · · · ⊗ Xn → Y ,
(4.24)
y = γ (x1 , x2 , . . . , xn ) i = 1, 2, . . . , n xi = ϕi (ui )
(4.25) (4.26)
then
where ui ∈ Ui , xi ∈ Xi and y ∈ Y . Now we can perform the same procedure as above, i.e. we separate all xi , i = 1, 2, . . . , n, in (4.25) as the key terms and then half-substitute (4.26) only for the separated xi n
y = ∑ ϕi (ui ) + Γ (x1 , x2 , . . . , xn )
(4.27)
i=1
where Γ (.) represents the “remainder” of the original mapping γ (.) after separation of all xi s. Finally note that the decomposition of compound mappings can be performed by more and different combinations of the above-described approaches.
4.3 Decomposition of Block-oriented Nonlinear Systems The compound mapping decomposition can be directly applied to operators, i.e. mappings which operate on mathematical entities of higher complexity than real numbers, such as vectors, deterministic or stochastic variables, or mathematical expressions. In general, if either the domain or the co-domain (or both) of a mapping contains elements significantly more complex than real numbers, that mapping is referred to as an operator. In the control theory an operator can perform a mapping on any number of operands (inputs) to produce corresponding outputs. We shall use the word operator for mappings the domains and co-domains of which contain time dependent variables (e.g. u(t), x(t), y(t)).
4.3.1 Hammerstein System The key term separation principle can be very simply applied to the decomposition of Hammerstein system, which is a cascade connection of a nonlinear static subsystem
40
J. V¨or¨os
x(t) = f [u(t)]
(4.28)
where u(t) are the inputs and x(t) are the outputs, followed by a linear dynamic subsystem q−d B q−1 x(t) (4.29) y(t) = 1 + A (q−1 ) where A q−1 = a1 q−1 + . . . + am q−m B q−1 = b1 q−1 + . . . + bnq−n
(4.30) (4.31)
and x(t) and y(t) are the inputs and outputs, respectively. Then the Hammerstein system output equation can be written as y(t) = B q−1 x(t − d) − A q−1 y(t) . (4.32) Application of the key term separation principle is as follows. First, we separate a key term (4.33) y(t) = b1 x(t − d) + B q−1 − b1 x(t − d) − A q−1 y(t) i.e. the term containing variable x(t − d), and then we half-substitute (4.28) only for the key term y(t) = f [u(t − d)] + B q−1 − 1 x(t − d) − A q−1 y(t) , (4.34) assuming b1 = 1. Namely, if the nonlinear static function is multiplied by a nonzero real constant and if the linear dynamic part is divided by the same constant, the resulting model has the same input-output behaviour. After choosing an appropriate parametrisation for f (.), which is linear in parameters, the Hammerstein system can be described by the output equation, where all the parameters are separated and the equation is linear in parameters, but nonlinear in variables. This approach was applied to modelling and identification of Hammerstein systems with different types of nonlinearities, i.e. polynomial [17, 18, 5, 10],
u(t)
x(t) f (·)
y(t) q
−d
B q −1 − 1 1 + A (q −1 ) Fig. 4.2: Hammerstein system
4
Compound Operator Decomposition
41
discontinuous [19] two-segment polynomial [20], multisegment piecewise linear [22]. The parameter estimation was solved as quasi-linear problem by iterative methods with internal variable estimation. Also recursive estimation methods were proposed for on line identification of decomposed Hammerstein systems [24, 26, 27, 15].
4.3.2 Wiener System Similar approach can be chosen for the decomposition of Wiener system, which is a cascade connection of a linear dynamic subsystem (4.35) x(t) = B q−1 u(t − d) − A q−1 x(t) with the inputs u(t) and the outputs x(t), where A(q−1 ) and B(q−1 ) are given by (4.30) and (4.31), followed by a nonlinear static subsystem y(t) = g [x(t)]
(4.36)
where x(t) and y(t) are the inputs and outputs, respectively. First, we separate a key term y(t) = g1 x(t) + G[x(t)] . (4.37) Then, assuming g1 = 1, we half-substitute (4.35) only for the key term y(t) = B q−1 u(t − d) − A q−1 x(t) + G [x(t)] .
(4.38)
After choosing an appropriate parametrisation for G(.) being linear in parameters, the Wiener system can be described by the output equation, which is linear in all the separated parameters, but nonlinear in variables. The Wiener system description given by (4.38) was applied to modelling and identification of systems with different types of output nonlinearities, i.e. polynomial [17, 18, 5, 10], discontinuous [21], two-segment polynomial [23], multisegment piecewise linear [28], where iterative methods with internal variable estimation
u(t)
q −d B q −1 1 + A (q −1 )
x(t)
G(·)
Fig. 4.3: Wiener system
y(t)
42
J. V¨or¨os
were used. Also recursive estimation methods were proposed for on line identification of decomposed Wiener systems [30, 31, 34, 15].
4.4 Identification of Hammerstein–Wiener Systems Hammerstein and Wiener systems are the simplest types of block-oriented nonlinear dynamic systems and many methods have been proposed for their identification. However, there exist only few works reported in the literature on the so-called Hammerstein–Wiener system. Very little has been done in the identification of nonlinear systems using parametric Hammerstein–Wiener models [1, 2, 3, 6, 12, 13, 14, 35, 36] where more restrictions on nonlinear blocks are assumed, special sets of input-output data are used and more estimation steps and cost functions are considered. A significant disadvantage of these approaches is that the dimension of parameter vector is usually very high because of over-parametrisation. The following approach to parameter identification of Hammerstein–Wiener systems with piecewise linear characteristics is illustrating the multiple application of the key term separation principle in both serial and parallel ways. The resulting mathematical model for this type of block-oriented systems contains explicit information on all the blocks of given system without cross-multiplication of parameters but also more internal variables, which are generally unmeasurable. Application of an iterative algorithm enables, on the basis of one set of measured input/output data, the estimation of all internal variables and all model parameters, i.e.: 1. the coefficients determining the subdomains of the input static block function and the slopes of corresponding linear segments; 2. the parameters of linear block transfer function; and 3. the coefficients determining the subdomains of the output static block function and the slopes of corresponding linear segments.
4.4.1 Hammerstein–Wiener Systems The Hammerstein–Wiener system is given by the cascade connection of an input static nonlinearity block (N1) followed by a linear dynamic system (LS) which is followed by an output static nonlinearity block (N2) (Figure 4.4). The first nonlinear static block N1 can be described as: v(t) = C [u(t)]
(4.39)
where u(t) and v(t) are the inputs and outputs, respectively. The difference equation model of the linear dynamic block is: x(t) = B q−1 v(t) − A q−1 x(t) (4.40) where v(t) and x(t) are the inputs and outputs, respectively, A q−1 and B q−1 are scalar polynomials in the unit delay operator q−1
4
Compound Operator Decomposition
43
A q−1 = a1 q−1 + . . . + am q−m , B q−1 = b1 q−1 + . . . + bn q−n .
(4.41) (4.42)
The second nonlinear static block N2 can be described as y(t) = D [x(t)]
(4.43)
with inputs x(t) and outputs y(t). The Hammerstein–Wiener system inputs u(t) and outputs y(t) are measurable, while the internal variables v(t) and x(t) are not.
u(t)
v(t)
LS
x(t)
y(t)
Fig. 4.4: Hammerstein–Wiener system
The input-output description of the Hammerstein–Wiener system resulting from direct substitutions of the corresponding variables from (4.39) into (4.40) and then into (4.43) would be strongly nonlinear both in the variables and in the parameters, hence not very suitable for the parameter estimation. Therefore, the serial decomposition will be applied with the aim to derive a simpler form of the system description. The second nonlinear block can be decomposed and written as follows: y(t) = d1 x(t) + D [x(t)]
(4.44)
where the internal variable x(t) is separated. The linear dynamic block equation can be written as x(t) = b1 v(t − 1) + B q−1 − b1 v(t − 1) − A q−1 x(t) (4.45) where the internal variable v(t − 1) is separated. Now, to complete the serial decomposition, the corresponding half-substitutions can be performed, i.e.: (i) from (4.39) into (4.45) only for v(t − 1) in the first term, and (ii) from (4.45) into (4.44) again only for x(t) in the first term. The resulting output equation of the Hammerstein– Wiener system will be y(t) = d1 b1C [u(t − 1)] + B q−1 − b1 v(t − 1) − A q−1 x(t) + D [x(t)] . (4.46) Appropriate parametrisations of two nonlinear block descriptions can significantly simplify the system output equation and possibly lead to linearity in parameters. However, the system contains two internal variables, which are generally unmeasurable. As the Hammerstein–Wiener system consists in the cascade connection of three subsystems, the parametrisation of (4.46) is not unique, as many combinations of parameters can be found. Therefore, one parameter in at least two blocks has to be fixed in (4.46) to make the mathematical description unique. Evidently, the choices
44
J. V¨or¨os
d1 = 1 and b1 = 1 (more precisely bi = 1, where bi is the first nonzero parameter considered) in (4.46) will simplify the Hammerstein–Wiener system description.
4.4.2 Piecewise-linear Characteristics The general form (4.46) of the Hammerstein–Wiener system description can be used for the parameter identification of systems with different types of static nonlinearities in both N1 and N2 blocks [25]. Some technical systems possess the structure where the input and output nonlinearities are or can be modelled by the piecewise linear characteristics [11]. Although some approaches to identification of blockoriented systems with piecewise linear characteristics were proposed [4, 7, 16], the following application of key term separation principle simplifies the description of this type of systems. Let the output v(t) of the first nonlinear static block N1 be described by the following equations: if 0 ≤ u(t) ≤ dR1 mR1 u(t) (4.47) v(t) = mR2 [u(t) − dR1] + mR1dR1 if u(t) > dR1 if dL1 ≤ u(t) < 0 mL1 u(t) (4.48) v(t) = if u(t) < dL1 mL2 [u(t) − dL1] + mL1dL1 where |mR1 | < ∞, |mR2 | < ∞ are the corresponding segment slopes and 0 ≤ dR1 < ∞ is the constant for the positive inputs of N1, |mL1 | < ∞, |mL2 | < ∞ are the corresponding segment slopes and −∞ < dL1 < 0 is the constant for the negative inputs of N1. Let us introduce two internal variables f1 (t) = f1 [u(t)] = (mR2 − mR1 ) h [dR1 − u(t)] and
(4.49)
f2 (t) = f2 [u(t)] = (mL2 − mL1) h [u(t) − dL1]
(4.50)
where the switching function h(.) is defined as follows: 0 if s ≥ 0 h[s] = 1 if s < 0
(4.51)
and switches between two sets of variable s ∈ (−∞, ∞). Then the piecewise-linear mapping given by two equations (4.47) and (4.48) can be rewritten into the following input/output form [22]: v(t) = mR1 h [−u(t)] u(t) + [u(t) − dR1] f1 (t) +mL1 h [u(t)]u(t) + [u(t) − dL1] f2 (t) .
(4.52)
Now we can apply the parallel decomposition, i.e. separate two key terms, namely u(t) f1 (t) and u(t) f2 (t) and then half-substitute (4.49) and (4.50). So we obtain the following decomposed output equation for the block N1
4
Compound Operator Decomposition
45
v(t) = mR1 h [−u(t)] u(t) + (mR2 − mR1 ) h [dR1 − u(t)]u(t) − dR1 f1 (t) +mL1 h [u(t)] u(t) + (mL2 − mL1 ) h [u(t) − dL1] u(t) − dL1 f2 (t) (4.53) which is linear in parameters. Let the output y(t) of the second nonlinear static block N2 be described by the following equations: MR1 x(t) if 0 ≤ x(t) ≤ DR1 y(t) = (4.54) MR2 [x(t) − DR1 ] + MR1 DR1 if x(t) > DR1 ML1 x(t) if DL1 ≤ x(t) < 0 y(t) = (4.55) if x(t) < DL1 ML2 [x(t) − DL1] + ML1 DL1 where |MR1 | < ∞, |MR2 | < ∞ are the corresponding segment slopes and 0 ≤ DR1 < ∞ is the constant for the positive inputs of N2, |ML1 | < ∞, |ML2 | < ∞ are the corresponding segment slopes and −∞ < DL1 < 0 is the constant for the negative inputs of N2. In this case, the form proposed in [28] for the description of the first segment on the right-hand side and the left-hand side of the origin can be used leading to the following form: y(t) = MR1 x(t) + (ML1 − MR1 ) h [x(t)] x(t) + [x(t) − DR1] F1 (t) + [x(t) − DL1] F2 (t)
(4.56)
where the internal variables are defined as F1 (t) = F1 [x(t)] = (MR2 − MR1 ) h [DR1 − x(t)] and F2 (t) = F2 [x(t)] = (ML2 − ML1) h [x(t) − DL1 ] .
(4.57) (4.58)
We can again apply the parallel decomposition, i.e. separate two key terms, namely x(t)F1 (t) and x(t)F2 (t) and then half-substitute (4.57) and (4.58). We obtain the following decomposed equation for the block N2 y(t) = MR1 x(t) + (ML1 − MR1 ) h [x(t)] x(t) + (MR2 − MR1 ) h [DR1 − x(t)]x(t) − DR1 F1 (t) + (ML2 − ML1) h [x(t) − DL1] x(t) − DL1F2 (t)
(4.59)
and the equation is linear in parameters of nonlinear block N2. The above descriptions of the nonlinear blocks N1 and N2, i.e. (4.53) and (4.59), can be incorporated into (4.46). Choosing MR1 = 1, the half-substitution of (4.45) into (4.59) for the first term only gives m
n
i=2
j=1
y(t) = b1 v(t − 1) + ∑ bi v(t − i) + ∑ a j x(t − j) + (ML1 − 1)h [x(t)] x(t) + (MR2 − 1)h [DR1 − x(t)]x(t) −DR1 F1 (t) + (ML2 − ML1) h [x(t) − DL1] x(t) − DL1F2 (t)
(4.60)
46
J. V¨or¨os
and then choosing b1 = 1 the half-substitution of (4.53) into (4.60) for the first term only gives the resulting Hammerstein–Wiener system output equation y(t) = mR1 h [−u(t − 1)]u(t − 1) + (mR2 − mR1 ) h [dR1 − u(t − 1)]u(t − 1) − dR1 f1 (t − 1) +mL1h [u(t − 1)]u(t − 1) + (mL2 − mL1 ) h [u(t − 1) − dL1] u(t − 1) m
n
i=2
j=1
−dL1 f2 (t − 1) + ∑ bi v(t − i) − ∑ a j x(t − j) + (ML1 − 1)h [x(t)] x(t) + (MR2 − 1)h [DR1 − x(t)] x(t) −DR1 F1 (t) + (ML2 − ML1 ) h [x(t) − DL1] x(t) − DL1F2 (t) .
(4.61)
The Hammerstein–Wiener system is described by (4.49), (4.50) and (4.53) defining the internal variables f1 (t), f2 (t), and v(t) connected with N1; by (4.45) defining the internal variable x(t) being the output of LS; by (4.57), (4.58) defining the internal variables F1 (t), F2 (t) connected with N2; and by the output equation (4.61). All the parameters to be estimated are separated in (4.61) hence the proposed form of the Hammerstein–Wiener system description contains the least possible number of parameters to be estimated. Now the above description can be used as a mathematical model for the identification of Hammerstein–Wiener systems with piecewise linear characteristics. Defining the data vector
ϕ T (t) = {h [−u(t − 1)]u(t − 1), h [dR1 − u(t − 1)]u(t − 1), − f1 (t − 1), h [u(t − 1)]u(t − 1), h [u(t − 1) − dL1] u(t − 1), − f2 (t − 1), v(t − 2), . . . , v(t − m), −x(t − 1), . . . , −x(t − n), h [x(t)] x(t), h [DR1 − x(t)]x(t), −F1 (t), h [x(t) − DL1] x(t), −F2 (t)} (4.62) and the vector of parameters
θ T = [mR1 , mR2 − mR1 , dR1 , mL1 , mL2 − mL1, dL1 , b2 , . . . , bm , a1 , . . . , an , ML1 − 1, MR2 − 1, DR1 , ML2 − ML1, DL1 ]
(4.63)
the Hammerstein–Wiener model with piecewise-linear nonlinearities given by (4.61) can be put into a concise form y(t) = ϕ T (t) · θ + e(t)
(4.64)
where e(t) is an additive noise and the problem of model parameters estimation can be solved as a pseudo-linear estimation problem.
4
Compound Operator Decomposition
47
4.4.3 Algorithm As the data vector (4.62) contains unmeasurable variables f1 (t), f2 (t), v(t), x(t), F1 (t), F2 (t) and depends on the unknown parameters, no one-shot algorithm can be used for the estimation of parameter vector (4.63). However, the parameter estimation can be performed using an iterative method with internal variables estimation similarly as in [22, 28]. This means that an error criterion is repeatedly minimised using the estimates of internal variables resulting from the previous estimates of parameters included. In the case of the mean squares error criterion the following functional 2 1 N s J = ∑ y(t) − s−1ϕ T (t) · s θ (4.65) N t=1 is repeatedly minimised for the parameter vector estimate s θ , where N is the number of samples and s−1 ϕ (t) is the data vector with the (s − 1)-estimates of internal variables. The steps in the iterative procedure may be stated as follows: 1. Minimising (4.65) the estimates of parameters s θ are obtained using the data vectors s−1 ϕ (t) with the previous estimates of internal variables. 2. The estimates of internal variables s f1 (t), s f2 (t) are evaluated using s
f1 (t) = (s mR2 − s mR1 ) h [s dR1 − u(t)]
(4.66)
s
f2 (t) = ( mL2 − mL1 ) h [u(t) − dL1 ] .
(4.67)
s
s
s
3. The estimates of internal variable s v(t) are evaluated using s
v(t) = s mR1 h [−u(t)] u(t) + (s mR2 − s mR1 ) h [s dR1 − u(t)]u(t) −s dR1 s f1 (t) + s mL1 h [u(t)] u(t) + (s mL2 − s mL1 ) h [u(t) − sdL1 ] u(t) − s dL1 s f2 (t) .
(4.68)
4. The estimates of internal variable s x(t) are evaluated using s
m
n
i=2
j=1
x(t) = s v(t − 1) + ∑ s bi s v(t − i) + ∑ s a j s x(t − j) .
(4.69)
5. The estimates of internal variables s F1 (t), s F2 (t) are evaluated using F1 (t) = (s MR2 − s MR1 ) h [s DR1 − s x(t)] , s F2 (t) = (s ML2 − s ML1 ) h [s x(t) − s DL1 ] . s
(4.70) (4.71)
6. If the value of s J is less than a prescribed value, the procedure ends; else it continues with repeating steps 1–5. In the first iteration, only the parameters of N1 and LS are estimated and nonzero (small) initial values of 1 dR1 and 1 dL1 are used to start up the iterative algorithm. The initial values of the linear system parameters can be chosen as zero. Then nonzero initial values of 1 DR1 and 1 DL1 have to be used for N2. In the early steps of the
48
J. V¨or¨os
iterative procedure, the possible high values of s dR1 and s DR1 estimates and low values of s dL1 and s DL1 estimates may cause insufficient excitation for the estimation algorithm. Namely they may null the estimates of corresponding internal variables s f (t) and s f (t), as well as s F (t) and s F (t). However, proper limits for the values 1 2 1 2 of these parameters during the estimation process can overcome this problem.
4.4.4 Example To illustrate the feasibility of the proposed identification technique, the following example shows the process of parameters and internal variables estimation for the Hammerstein–Wiener system, where the nonlinear static block N1 (piecewise linear function with dead zone) and N2 (piecewise linear function with saturation) are given by the parameters in Table 4.1. Table 4.1: Parameters of the nonlinear blocks BLOCK N1 p11 p12 p13 p14 p15 p16
= = = = = =
mR1 mR2 mR2 − mR1 mL2 − mL1 dR1 dL1
= = = = = =
BLOCK N2 0.0 0.0 1.0 0.8 0.35 −0.3
p21 p22 p23 p24 p25 p26
= = = = = =
MR1 ML1 − MR1 MR2 − MR1 ML2 − ML1 DR1 DL1
= = = = = =
1.0 0.1 −1.0 −1.1 0.7 −0.8
The linear dynamic system is given by the difference equation x(t) = v(t − 1) + 0.15v(t − 2) + 0.2x(t − 1) − 0.35x(t − 2) . The initial values of the parameters were chosen as zero except 1 dR1 = 1 DR1 = 0.45 and 1 dL1 = 1 DL1 = −0.45 for the first estimates of corresponding internal variables. The identification was carried out with 2000 samples, using uniformly distributed random inputs and simulated outputs. Normally distributed random noise with zero mean and signal-to-noise ratio SNR = 50 was added to the simulated outputs (SNR − the square root of the ratio of output and noise variances). The parameter
Fig. 4.5 Parameter estimates for N1
4
Compound Operator Decomposition
49
Fig. 4.6 Parameter estimates for LS
Fig. 4.7 Parameter estimates for N2
estimates for the nonlinear block N1 are graphically shown in Figure 4.5 (the topdown order of parameters is p13 , p14 , p15 , p11 = p12 , p16 ), for the linear dynamic subsystem in Figure 4.6 (the top-down order of parameters is a2 , b2 , a1 ), and for the nonlinear block N2 in Figure 4.7 (the top-down order of parameters is p21 , p25 , p22 , p26 , p23 , p24 ). The parameter estimates are almost equal to the correct values after about 15 iterations and the iterative process shows good convergence. There is no general proof of convergence for iterative algorithms with internal variables estimation. However, the numerical accuracy and convergence of the proposed approach to parameter identification using Hammerstein–Wiener model with piecewise-linear nonlinearities is good despite the fact that 6 internal variables are estimated.
4.4.5 Conclusions From the mathematical point of view the presented form of compound mapping decomposition may appear as trivial or even inconvenient due to a certain redundancy. However, in the praxis, it may help to master problems with the description of systems, behaviour of which is characterised (indeed or expectedly) by compound mappings or operators. In control system analysis, it may be convenient to choose an unmeasurable internal variable as the key term of given compound operator and then to apply the presented principle. Compound mapping decomposition using the key term separation principle can be simply implemented and it does not change the domains of mappings. Note that the key term separation principle can be also used in decomposition of Wiener–Hammerstein systems [29, 15]. Moreover, it can be applied to descriptions of some dynamic nonlinearities as hysteresis [32] or backlash [33]. Consequently, it can be extended also to block-oriented systems with dynamic nonlinearities, e.g., to cascade systems with input backlash [33, 9] or output backlash [8].
50
J. V¨or¨os
References 1. Bai, E.W.: An optimal two-stage identification algorithm for Hammerstein–Wiener nonlinear systems. Automatica 34, 333–338 (1998) 2. Bai, E.W.: A blind approach to the Hammerstein–Wiener model identification. Automatica 38, 967–979 (2002) 3. Bauer, D., Ninness, B.: Asymptotic properties of least-squares estimates of Hammerstein–Wiener models. Int. J. Control 75, 34–51 (2002) 4. Chen, H.F.: Recursive identification for Wiener model with discontinuous piece-wise linear function. IEEE Trans. Automatic Control 51, 390–400 (2006) 5. Chen, H.T., Hwang, S.H., Chang, C.T.: Iterative identification of continuous-time Hammerstein and Wiener systems using a two-stage estimation algorithm. Industrial & Engineering Chemistry Research 48, 1495–1510 (2009) 6. Crama, P., Schoukens, J.: Hammerstein–Wiener system estimator initialization. Automatica 40, 1543–1550 (2004) 7. Dolanc, G., Strmcnik, S.: Identification of nonlinear systems using a piecewise-linear Hammerstein model. Systems and Control Letters 54, 145–158 (2005) 8. Dong, R., Tan, Q., Tan, Y.: Recursive identification algorithm for dynamic systems with output backlash and its convergence. Int. J. Appl. Math. Comput. Sci. 19, 631–638 (2009) 9. Dong, R., Tan, Y., Chen, H.: Recursive identification for dynamic systems with backlash. Asian Journal of Control 12, 26–38 (2010) 10. Guo, F.: A new identification method for Wiener and Hammerstein systems. PhD Dissertation, University Karlsruhe, Germany (2004) ˇ 11. Kalaˇs, V., Juriˇsica, L., Zalman, M., et al.: Nonlinear and Numerical Servosystems. Alfa/SNTL, Bratislava (in Slovak) (1985) 12. Lee, Y.J., Sung, S.W., Park, S., Park, S.: Input test signal design and parameter estimation method for the Hammerstein–Wiener processes. Industrial & Engineering Chemistry Research 43, 7521–7530 (2004) 13. Park, H.C., Sung, S.W., Lee, J.T.: Modeling of Hammerstein–Wiener processes with special input test signals. Industrial & Engineering Chemistry Research 45, 1029–1038 (2006) 14. Pupeikis, R.: On the identification of Hammerstein–Wiener systems. Lietuvos matem. rink 45, 509–514 (2005) 15. Tan, Y., Dong, R., Li, R.: Recursive identification of sandwich systems with dead zone and application. IEEE Trans. Control Systems Technology 17, 945–951 (2009) 16. van Pelt, T.H., Bernstein, D.S.: Non-linear system identification using Hammerstein and non-linear feedback models with piecewise linear static maps. Int. J. Control 74, 1807– 1823 (2001) 17. V¨or¨os, J.: Nonlinear system identification with internal variable estimation. In: Barker, H.A., Young, P.C. (eds.) Preprints 7th IFAC Symposium on Identification and System Parameter Estimation, York, pp. 439–443 (1985) 18. V¨or¨os, J.: Identification of nonlinear dynamic systems using extended Hammerstein and Wiener models. Control-Theory and Advanced Technology 10, 1203–1212 (1995) 19. V¨or¨os, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33, 1141–1146 (1997) 20. V¨or¨os, J.: Iterative algorithm for parameter identification of Hammerstein systems with two-segment nonlinearities. IEEE Trans. Automatic Control 44, 2145–2149 (1999) 21. V¨or¨os, J.: Parameter identification of Wiener systems with discontinuous nonlinearities. Systems and Control Letters 44, 363–372 (2001)
4
Compound Operator Decomposition
51
22. V¨or¨os, J.: Modeling and parameter identification of systems with multisegment piecewise-linear characteristics. IEEE Trans. Automatic Control 47, 184–188 (2002) 23. V¨or¨os, J.: Modeling and identification of Wiener systems with two-segment nonlinearities. IEEE Trans. Control Systems Technology 11, 253–257 (2003a) 24. V¨or¨os, J.: Recursive identification of Hammerstein systems with discontinuous nonlinearities containing dead-zones. IEEE Trans. Automatic Control 48, 2203–2206 (2003b) 25. V¨or¨os, J.: An iterative method for Hammerstein–Wiener systems parameter identification. J. Electrical Engineering 55, 328–331 (2004) 26. V¨or¨os, J.: Identification of Hammerstein systems with time-varying piecewise-linear characteristics. IEEE Trans. Circuits and Systems - II: Express Briefs 52, 865–869 (2005) 27. V¨or¨os, J.: Recursive identification of Hammerstein systems with polynomial nonlinearities. J. Electrical Engineering 57, 42–46 (2006) 28. V¨or¨os, J.: Parameter identification of Wiener systems with multisegment piecewiselinear nonlinearities. Systems and Control Letters 56, 99–105 (2007) 29. V¨or¨os, J.: An iterative method for Wiener–Hammerstein systems parameter identification. J. Electrical Engineering 58, 114–117 (2007) 30. V¨or¨os, J.: Recursive identification of Wiener systems with two-segment polynomial nonlinearities. J. Electrical Engineering 59, 40–44 (2008) 31. V¨or¨os, J.: Recursive identification of time-varying Wiener systems with polynomial nonlinearities. Int. J. Automation and Control 2, 90–98 (2008) 32. V¨or¨os, J.: Modeling and identification of hysteresis using special forms of the ColemanHodgdon model. J. Electrical Engineering 60, 100–105 (2009) 33. V¨or¨os, J.: Modeling and identification of systems with backlash. Automatica 46, 369– 374 (2010) 34. V¨or¨os, J.: Recursive identification of systems with noninvertible output nonlinearities. Informatica 21, 139–148 (2010) 35. Wang, D., Ding, F.: Extended stochastic gradient identification algorithms for Hammerstein–Wiener ARMAX systems. Computers & Mathematics with Applications 56, 3157–3164 (2008) 36. Zhu, Y.: Estimation of an N-L-N Hammerstein–Wiener model. Automatica 38, 1607– 1614 (2002)
Chapter 5
Iterative Identification of Hammerstein Systems Yun Liu and Er-Wei Bai
5.1 Introduction The iterative method was first proposed to estimate Hammerstein system in [9]. It is a very simple and efficient algorithm. In general, the convergence of iterative algorithm can be a problem [12]. It was recently shown that iterative algorithms with normalisation possess some global convergence properties in identification of Hammerstein system with smooth nonlinearity and finite impulse response (FIR) linear part [3, 4, 11]. The convergence for an infinite impulse response (IIR) system was however not solved. In this chapter, the results are extended to Hammerstein systems with an IIR linear part. The global convergence for an IIR system is established for the odd nonlinearities. The chapter also extends results to Hammerstein systems with non-smooth nonlinearities and FIR linear block [2]. One way to deal with such nonlinearities is to use nonparametric approach. A problem with nonparametric approach is the slow convergence [6, 7], if it converges. To this end, iterative algorithms were proposed and developed to identify Hammerstein systems with piecewise nonlinearity [13, 14]. But the convergence property of the algorithms is unknown. This chapter presents a normalised iterative identification algorithm for two common non-smooth piecewise-linear nonlinear structures, i.e., nonlinearities with saturation and preload characteristics. It is shown that the algorithms converge in one iteration step when the number of sample data is large. The chapter is based on [1] with permission from Automatica/Elsevier.
5.2 Hammerstein System with IIR Linear Part We study the scalar system in the block diagram Figure 5.1. The input signal is ut , xt is the internal signal, vt and yt are the noise and output, respectively. In this chapter, Yun Liu and Er-Wei Bai Dept. of Electrical and Computer Engineering, University of Iowa e-mail:
[email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 53–65. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com
54
Y. Liu and E.-W. Bai
vt
?
1 D(q−1 )
ut
-
xt
f (·)
-
+ - ?
B(q−1 ) D(q−1 )
+
-yt
Fig. 5.1: Hammerstein system
the noise vt is assumed to be i.i.d. random sequences with zero mean, Evt2 = σv2 < ∞, and E|vt |4+δ < ∞ for some δ > 0, where E denotes expectation of a random variable. The Hammerstein system with an infinite impulse response (IIR) linear subsystem has the following representation. The linear part is an unknown stable system which can be described by: yt = d1 yt−1 + d2 yt−2 + · · · + dnyt−n + b1xt−1 + b2xt−2 + · · · + bm xt−m + vt . The nonlinear part is an unknown odd symmetric function that can be expressed as xt = f (ut ) = a1 g1 (ut ) + a2g2 (ut ) + · · · + al gl (ut ).
(5.1)
Here gi (u), i = 1, ..., l are known odd functions of u. We can write the Hammerstein system into the following matrix form, n
m
l
i=1
j=1
i=1
yt = ∑ di yt−i + ∑ b j [ ∑ ai gi (ut− j )] + vt = dT φt (y) + bT Gt (u)a + vt where
(5.2)
⎛
⎞ g1 (ut−1 ) · · · gl (ut−1 ) ⎜ ⎟ .. .. .. φt (y) = (yt−1 , yt−2 , · · · , yt−n )T , Gt (u) = ⎝ ⎠. . . . g1 (ut−m ) · · · gl (ut−m )
The unknown parameters needed to be estimated are: a = (a1 , a2 , ..., al )T , d = (d1 , d2 , ..., dn )T , b = (b1 , b2 , ..., bm )T . The purpose of identification is to estimate the unknown parameters based on the observed input and output data {ut , yt },t = 1, 2..., N, for large enough N. ˆ bˆ are sought to For Hammerstein systems linear part (5.2), the estimates aˆ , d, minimise the least squares cost function, ˆ b} ˆ = argmina¯ ,d, ¯ b) ¯ = argmina¯ ,d, a, d, {ˆa, d, ¯ b¯ JN (¯ ¯ b¯ = argmina¯ ,d, ¯ b¯
1 N ∑ (yˆt − yt )2 N t=1
1 N ¯T ∑ (d φt (y) + b¯ T Gt (u)¯a − yt )2 . N t=1
(5.3)
5
Iterative Identification of Hammerstein Systems
55
Note that the model (5.2) is not well defined for identification purpose. Any pair λ a and b/λ for some non-zero constant λ provides the same input-output data. To have identifiability, we adopt the normalisation constraint on a for model (5.2). Assume that ||a|| = 1, and the first non-zero entry of a is positive. We propose the following normalised iterative algorithms to estimate a and d, b for system (5.2): Given the initial estimate bˆ 0 = (1, 0, ..., 0)T , and an arbitrary n-dimension vector dˆ 0 , aˆ k = arg mina¯ JN (¯a, dˆ k−1 , bˆ k−1 ); Normalise aˆ k to have positive first non-zero entry ; (5.4) ¯ b); ¯ {dˆ k , bˆ k } = arg mind, ak , d, ¯ b¯ JN (ˆ Replace k by k + 1 and the process is repeated.
Convergence Analysis To analyse the performance of the algorithm (5.4), we write the cost function ˆ b) ˆ in (5.3) as follows, JN (ˆa, d, ˆ b) ˆ = JN (ˆa, d,
l 1 N ˆ [(d − d)T φt (y) + ∑ (gi (ut−1 ), · · · , gi (ut−m ))(aˆi bˆ − ai b) + vt ]2 . ∑ N t=1 i=1
Let N → ∞, and define ⎛
⎞ yt−1 1 ⎜ ⎟ C = lim ∑ ⎝ ... ⎠ (yt−1 , ..., yt−n ), N→∞ N t=1 yt−n N
⎛
⎞ yt−1 1 ⎜ ⎟ A(i) = lim ∑ ⎝ ... ⎠ (gi (ut−1 ), ..., gi (ut−m )), N→∞ N t=1 yt−n N
where i = 1, ..., l, ⎛
⎞ gi (ut−1 ) 1 ⎜ ⎟ B(i, j) = lim ∑ ⎝ ... ⎠ · (g j (ut−1 ), ..., g j (ut−m )) N→∞ N t=1 gi (ut−m ) N
where i, j = 1, 2, ..., l. When the noise vt is i.i.d. random sequence with zero mean and finite variance σv2 , the cost function (5.5) becomes l
J = (dˆ − d)T C(dˆ − d) + ∑ (dˆ − d)T A(i)(aˆi bˆ − ai b) i=1
(5.5)
56
Y. Liu and E.-W. Bai l
l
+ ∑ ∑ (aˆi bˆ − ai b)T B(i, j)(aˆ j bˆ − a j b) + σv2 .
(5.6)
i=1 j=1
If the input signal is i.i.d. with symmetric distribution, the following convergence result can be derived for the IIR Hammerstein system (5.2). Theorem 5.1. Consider the Hammerstein system (5.2) and the cost function (5.6). Suppose the first element of b is nonzero, i.e., b1 = 0, and the input nonlinearity is odd, i.e., −gi (u) = gi (−u), i = 1, ..., l. Further, assume that the input ut is i.i.d. which has a symmetric probability density function with positive supports at no less than l points, say at r1 , ..., rl , so that gi ’s are linear independent with respect to these l points, i.e., ⎛ ⎞ g1 (r1 ) . . . g1 (rl ) ⎜ ⎟ rank ⎝ ... . . . ... ⎠ = l . gl (r1 ) . . . gl (rl ) Then, the normalised iterative algorithm (5.4) converges to the true parameters a, d, and b in one iteration step provided N → ∞.
5.3 Non-smooth Nonlinearities In this section, we apply the iterative algorithm to two types of non-smooth Hammerstein system with finite impulse response (FIR) linear subsystem. The linear part has the following representation: yt = d1 xt−1 + d2 xt−2 + · · · + dn xt−n .
(5.7)
The saturation and preload nonlinearities that are encountered often in practical applications are considered in this chapter as shown in Figures 5.2 and 5.3 . Let us express the saturation nonlinear part in Figure 5.2 as ⎧ ut > c; ⎨ a2 , xt = f (ut ) = a1 u, − c ≤ ut ≤ c; (5.8) ⎩ −a2 , ut < −c. Define a switching function h(u) as follows: 1, u > 0; h(u) = 0, u ≤ 0. The signal xt now can be expressed in an additive form: xt = a1 g1 (u, c) + a2g2 (u, c) where g1 (u, c) = uh(|c| − |u|), g2 (u, c) = h(u − c) − h(−c − u).
(5.9)
5
Iterative Identification of Hammerstein Systems
57
f (u) a2 = a1 c a1 u −c .
−a2
.. ... .. ... .... .. ...
.. ... .. ... .. ... .. ... ...
c
u
Fig. 5.2: Nonlinear part with saturation
Note that both g1 (u, c) and g2 (u, c) are odd functions of u and here xt is continuous piecewise-linear function of ut . Similarly the preload nonlinear part in Figure 5.3 can be expressed by xt = a1 g1 (u, c) + a2g2 (u, c)
(5.10)
where g1 (u, c) = u[h(u − c) + h(−c − u)], g2(u, c) = h(u − c) − h(−c − u). and both g1 (·) and g2 (·) are odd functions. xt is a discontinuous function of ut here. Now we can write both types of Hammerstein systems into the following compact form, yt = where
n
2
j=1
i=1
∑ d j [ ∑ ai gi(ut− j , c)] + vt = dT G(ut , c)a + vt ⎞ g1 (ut−1 , c) g2 (ut−1 , c) ⎟ ⎜ .. .. G(ut , c) = ⎝ ⎠. . . ⎛
g1 (ut−n , c) g2 (ut−n , c) The unknown parameters needed to be estimated are: a = (a1 , a2 )T , d = (d1 , d2 , ..., dn )T and the discontinuity point c in the Hammerstein system with preload nonlinear part. To identify the Hammerstein systems with non-smooth nonlinearities (5.11), the ˆ cˆ are sought to minimise the least squares cost function estimates aˆ , d,
58
Y. Liu and E.-W. Bai
ˆ c} ¯ c) {ˆa, d, ˆ = argmina¯ ,d, a, d, ¯ = argmina¯ ,d, ¯ c¯ JN (¯ ¯ c¯ = argmina¯ ,d, ¯ c¯
1 N ∑ (yt − yˆt )2 N t=1
1 N ¯ a)2 . ∑ (yt − d¯ T G(ut , c)¯ N t=1
(5.11)
Without loss of generality, we adopt the normalisation constraint on d for model (5.11). Assume that ||d|| = 1, and the first non-zero entry of d is positive. The following normalised iterative algorithm is proposed to estimate a and d in Hammerstein system with saturation nonlinearity in Figure 5.2. Given the initial estimate aˆ 0 = 0 and an estimate of an upper bound of the threshold parameter cˆ1 > c, ¯ cˆ1 ); dˆ k = arg mind¯ JN (ˆak−1 , d, Normalise dˆ k to have positive first non-zero entry ; aˆ k = arg mina¯ JN (¯a, dˆ k , cˆ1 ); {aˆ2 }k = the second element of aˆ k , Given an estimate of an lower bound of the threshold parameter cˆ2 < c, aˆ k = arg mina¯ JN (¯ak−1 , dˆ k , cˆ2 ); {aˆ1 }k = the first element of aˆ k , Replace k by k + 1 and the process is repeated.
(5.12)
Similarly, we propose the following normalised iterative algorithms to estimate a and d in the Hammerstein system with preload nonlinearity in Figure 5.3. Given the initial estimate aˆ 0 = 0 and an estimate of an upper bound of the threshold parameter cˆ > c, define ¯ c); ˆ dˆ k = arg mind¯ JN (ˆak−1 , d, Normalise dˆ k to have positive first non-zero entry; ˆ aˆ k = arg mina¯ JN (¯ak−1 , dˆ k , c); Replace k by k + 1 and the process is repeated.
(5.13)
In fact, the algorithm can also be started with dˆ 0 : Given the initial estimate dˆ 0 = 0, define ˆ aˆ k = arg mina¯ JN (¯a, dˆ k−1 , c); ¯ c); dˆ k = arg mind¯ JN (ˆak , d, ˆ Normalise dˆ k to have positive first non-zero entry; update aˆ k accordingly with the normalisation factor; Replace k by k + 1 and the process is repeated.
(5.14)
To analyse the convergence properties of the above algorithms (5.12), (5.13), and ˆ c) (5.14), we write the cost function JN (ˆa, d, ˆ in (5.11) as follows, 1 N n g1 (ut−i , c) g1 (ut−i , c) ˆ ˆ ˆ ˆ = ∑ [ ∑ (di (a1 , a2 ) − di (aˆ1 , aˆ2 ) + vt ]2 . JN (ˆa, d, c) ˆ g2 (ut−i , c) g2 (ut−i , c) N t=1 i=1 We are interested in the behaviour of the algorithm when N is large enough. Let N → ∞, and define
5
Iterative Identification of Hammerstein Systems
ˆ ˆ j) = E[ g1 (ut−i , c) ˆ g2 (ut− j , c))], ˆ A(i, · (g1 (ut− j , c), g2 (ut−i , c) ˆ g (u , c) · (g1 (ut− j , c), g2 (ut− j , c))], A(i, j) = E[ 1 t−i g2 (ut−i , c) g (u , c) ˆ g2 (ut− j , c))], ˆ · (g1 (ut− j , c), A¯ 1 (i, j) = E[ 1 t−i g2 (ut−i , c) ˆ g (u , c) · (g1 (ut− j , c), g2 (ut− j , c))] A¯ 2 (i, j) = A¯ T1 ( j, i) = E[ 1 t−i g2 (ut−i , c) ˆ
59
(5.15)
where i, j = 1, 2, ..., n. When the noise vt is i.i.d. random sequence with zero mean and finite variance σv2 , the cost function will become n
n
ˆ c) ˆ j)ˆadˆj ˆ c) ˆ = ∑ ∑ [di aT A(i, j)ad j + dˆi aˆ T A(i, J(ˆa, d, ˆ = lim JN (ˆa, d, N→∞
i=1 j=1
−di a A¯ 1 (i, j)ˆaT dˆj − dˆi aˆ T A¯ 2 (i, j)ad j ] + σv2 . (5.16) T
Similarly, define ⎛
⎞ gi (ut−1 , c) ⎜ gi (ut−2 , c) ⎟ ⎜ ⎟ B(i, j) = E[⎜ ⎟ · (g j (ut−1 , c), g j (ut−2 , c), · · · g j (ut−n , c))], .. ⎝ ⎠ . gi (ut−n , c)
⎞ gi (ut−1 , c) ˆ ⎜ gi (ut−2 , c) ˆ ⎟ ⎟ ˆ j) = E[⎜ ˆ g j (ut−2 , c), ˆ · · · g j (ut−n , c))], ˆ B(i, ⎟ · (g j (ut−1 , c), ⎜ .. ⎠ ⎝ . ⎛
gi (ut−n , c) ˆ
⎞ gi (ut−1 , c) ⎜ gi (ut−2 , c) ⎟ ⎟ ⎜ ˆ g j (ut−2 , c), ˆ · · · g j (ut−n , c))], ˆ B¯ 1 (i, j) = E[⎜ ⎟ · (g j (ut−1 , c), .. ⎠ ⎝ . ⎛
B¯ 2 (i, j) =
gi (ut−n , c) T B¯ 1 ( j, i)
(5.17)
where i, j = 1, 2. The cost function in (5.16) can be written in the following equivalent form, 2
2
ˆ c)) ˆ c)) ˆ j)dˆ aˆ j (5.18) J(ˆa, d, ˆ = lim JN (ˆa, d, ˆ = ∑ ∑ [ai dT B(i, j)da j + aˆidˆ T B(i, N→∞
i=1 j=1
−ai dT B¯ 1 (i, j)dˆ aˆ j − aˆi dˆ T B¯ 2 (i, j)da j ] + σv2 (5.19) The convergence property of algorithm (5.12) for Hammerstein system with saturation nonlinear part is shown in the following theorem.
60
Y. Liu and E.-W. Bai
f (u)
−c
.. .... .. .... ... ... .... .
... ... .. ... ... .. ... .. ...
a1 u + a2
c
ut
a1 u − a2
Fig. 5.3: Nonlinear part with preload
Theorem 5.2. Consider the Hammerstein system with saturation nonlinearity (5.11). Assume an upper bound cˆ = cˆ1 > c, and lower bound cˆ = cˆ2 < c for the threshold value c are available. In addition, assume the input ut is i.i.d. with symmetric distribution that have positive support at points both larger and smaller than cˆ1 , cˆ2 such that Eg2i (u, cˆ j ) > 0, i, j = 1, 2, and the initial estimate dˆ T0 d = 0. Then the normalised iterative algorithm (5.12) converges to the true parameters a and d in one step provided that N → ∞. The similar convergence results can be established for the pre-load nonlinearity. However, the proof is tedious. For simplicity, we only present the result for a uniform distribution input. Theorem 5.3. Consider the Hammerstein system with preload nonlinearity (5.11), ˆ i) > 0, i = 1, ..., n. Assume the initial the cost function (5.16) or (5.19) with A(i, estimate dˆ T0 d = 0 and the input ut is i.i.d. uniformly distributed on [−U,U], where U ≥ c is a known upper bound of c. Then, the normalised iterative algorithm (5.13) or (5.14) converges to the true parameters a and d in one step provided that N → ∞. Next we need to obtain an estimate for the discontinuity point c in the Hammerstein system with preload. When the input ut be i.i.d. uniformly distributed in [−U,U], 1 the probability density function of ut is q(u) = 2U . Consider the correlation between yt and ut−1 . We have U 3 − c3 U 2 − c2 + a2 d1 3U 2U a1 d1 3 a2 d1 2 a1 d1U 2 a2 d1U c − c + + − E(yt ut−1 ) = 0 ⇒− 3U 2U 3 2
E(yt ut−1 ) = a1 d1 E(ug1 (u)) + a2d1 E(ug2 (u)) = a1 d1
We can solve last equation (5.20) above to get an estimate c. ˆ The correlation can be estimated by the sample mean,
5
Iterative Identification of Hammerstein Systems
yt ut−1 =
61
1 N ∑ yt ut−1 . N − 1 t=2
The parameters a1 , a2 , d1 in (5.20) are replaced by the estimates from the iterative algorithm. It can be proved that cˆ → c as N → ∞. Thus all the parameters in the systems with preload nonlinear part are estimated.
5.4 Examples The first example shows the convergence of the iterative algorithm (5.4). The true parameters in the Hammerstein system with IIR nonlinear part are a = (0.9545, 0.2983)T , b = (3, −2)T , d = (0.3, 0.2, 0.1)T . The simulation conditions are as follows. ut is uniformly distributed in [−1, 1], N = 2500, bˆ 0 = (1, 0)T , dˆ 0 = (0.1, 0.4, 0.4)T . The nonlinear functions are g1 (u) = u, g2 (u) = u3 , the noise vt is white Gaussian with zero mean and standard deviation 0.1. Using the normalised iterative algorithm, after one step we get aˆ = (0.9537, 0.3008)T , bˆ = (2.9793, −1.9850)T , dˆ = (0.2997, 0.2000, 0.1000)T . The estimates are very close to the true values. To show how the estimates beˆ b) ˆ − have with different values of N, we calculated the squared error eT e, e = (ˆa, d, (a, d, b) for N = 100 to 10000. For each N, the squared error is the average of 100 Monte Carlo simulation. The error is plotted in Figure 5.4 showing that the squared error goes to zero as N → ∞ in probability. The convergence of algorithm (5.4) depends on the odd symmetry of the nonlinear block. When the nonlinear block is not perfectly odd, we give the following example to show the method is not very sensitive on the odd assumption. Let the nonlinear functions be g1 (u) = u3 , g2 (u) = u2 . The true parameters are a = (0.9998, 0.0200)T , b = (3, −2)T , d = (0.4, 0.2, 0.1)T . The nonlinear block has a small even component and is not perfectly odd here. The simulation conditions are as follows. ut is uniformly distributed in [−1, 1], N = 5000, bˆ 0 = (1, 0)T , dˆ 0 = (0.05, 0.05, 0.05)T . The noise vt is white Gaussian with zero mean and standard deviation 0.1. After one iterative step, we get the estimates aˆ = (0.9998, 0.0188)T , bˆ = (2.9999, −1.9993)T , dˆ = (0.3998, 0.2000, 0.1000)T . The estimates are still close to the true values in this example of non-odd case. We then provide a numerical example to show the efficiency of the iterative algorithm (5.12) in Hammerstein system with saturation nonlinear part. The true parameters are a = (1, 3.5)T , d = (0.9058, 0.3397, −0.2265, −0.1132)T , c = 3.5.
62
Y. Liu and E.-W. Bai
0.16 0.14 0.12
Squared Error
0.1 0.08 0.06 0.04 0.02 0
0
2000
4000
6000
8000
10000
N
Fig. 5.4: Estimation error for Hammerstein models with IIR linear block
In simulation, the input ut is uniformly distributed in [−10, 10], N = 4000, aˆ 0 = (0.5, 4.9)T . Noise vt is white Gaussian with zero mean and standard deviation 0.1. Using the normalised iterative algorithm, after one step we get dˆ = (0.9052, 0.3399, −0.2274, −0.1161)T . Using an upper bound cˆ1 = 9.8 and lower bound cˆ2 = 1 of true parameter, we get aˆ = (0.9878, 3.3960)T , It then can be used to estimate the parameter c, cˆ = 3.4380. The mean squared errors for different value of N are plotted in Figure 5.5. Thus the algorithm (5.12) gives very close estimates for the true parameters. The last example is a Hammerstein system with preload piecewise-linear nonlinear part. We will use it to verify the algorithm (5.13) and (5.14). The true parameters are a = (6, 2)T , d = (0.7947, 0.2649, −0.5298, −0.1325)T , and c = 2. The simulation conditions are as follows. ut is uniformly distributed in [−10, 10], N = 5000, aˆ 0 = (4, −3)T , cˆ = 4 is an upper bound of the true value c = 2. Noise vt is white Gaussian with zero mean and standard deviation 0.1. Using the normalised iterative algorithm, after one step we get aˆ = (5.9495, 2.0648)T , dˆ = (0.7973, 0.2685, −0.5249, −0.1292)T .
5
Iterative Identification of Hammerstein Systems
63
0.1 0.09 0.08
Squared Error
0.07 0.06 0.05 0.04 0.03 0.02 0.01 0
0
2000
4000
6000
8000
10000
N
Fig. 5.5: Estimation error for Hammerstein models with saturation nonlinear block
7
6
Squared Error
5
4
3
2
1
0 0.2
0.4
0.6
0.8
1
1.2 N
1.4
1.6
1.8
2 x 10
4
Fig. 5.6: The true (solid) and the estimated (dash-dot) nonlinearities
To estimate c more accurately, we use the input ut uniformly distributed in [−4, 4], and choose the real solution of the equation (5.20). The estimate is cˆ = 2.0140. Figure 5.6 demonstrates the average squared errors with different value of N. This shows that the algorithm is effective.
64
Y. Liu and E.-W. Bai
One point that need to be emphasised is that the estimates in the examples are ˆ b), ˆ not J(ˆa, d, ˆ b). ˆ But given the assumptions on the obtained by minimising JN (ˆa, d, noise, the following consistency result is available [8, 10], lim θˆN = θo w.p. 1, θo = argminθ ∈Rm lim EJN (θ ).
N→∞
N→∞
ˆ b}, ˆ θo = {a, d, b}. Here θˆN = {ˆa, d,
5.5 Conclusion Iterative algorithm is a simple and efficient approach in parameter estimation. The convergence of the iterative method on IIR Hammerstein system is not known in current literature. A normalised iterative algorithm with some given initial conditions is proposed for the system in this chapter. The convergence property is analysed and illustrated with an example. Hammerstein systems with non-smooth piecewise-linear nonlinear part are often encountered in real processes operating differently in different input intervals. In this chapter, normalised iterative algorithms are applied to identification of Hammerstein systems with two kinds of odd piecewise-linear nonlinearities, the saturation and the preload nonlinear structures. By using a random input with symmetric distribution, the iterative algorithm is shown to be convergent in one step. The results are supported by examples of both systems. The convergence results of the iterative algorithms in such systems are new. The algorithms converge fast and is easy to compute. Although some prior knowledge of the nonlinear part structure and lower and upper bound of the critical points is needed, the method is still useful and provides a simple way to get convergent estimates. The numerical examples show that the suggested approach could be a viable method to try in such systems.
References 1. Bai, E.W.: Iterative identification of Hammerstein systems. Automatica 43, 346–354 (2006) 2. Bai, E.W.: Identification of linear system with hard input nonlinearities of known structure. Automatica 38, 853–860 (2002) 3. Bai, E.W., Li, D.: Convergence of the iterative Hammerstein system identification algorithm. IEEE Trans. on Auto. Contr. 49, 1929–1940 (2004) 4. Bai, E.W., Liu, Y.: Least squares solutions of bilinear equations. System & Control Letters 55, 466–472 (2006) 5. Cerone, V., Regruto, D.: Parameter bounds for discrete-time Hammerstein models with bounded output errors. IEEE Trans. on Auto. Contr. 48, 1855–1860 (2003) 6. Chen, H.F.: Pathwise convergence of recursive identification algorithms for Hammerstein systems. IEEE Trans. on Auto. Contr. 49, 1641–1649 (2004) 7. Greblicki, W.: Continuous time Hammerstein system identification. IEEE Trans. on Auto. Contr. 45, 1232–1236 (2000)
5
Iterative Identification of Hammerstein Systems
65
8. Ljung, L.: System Identification: Theory for the User. Prentice-Hall, New Jersey (1999) 9. Narendra, K.S., Gallman, P.G.: Continuous time Hammerstein system identification. IEEE Trans. on Auto. Contr. 11, 546–550 (1966) 10. Ninness, B., Gibson, S.: Quantifying the accuracy of Hammerstein model estimation. Automatica 38, 2037–2051 (2002) 11. Wolodkin, G., Rangan, S., Poolla, K.: New results for Hammerstein system identification. In: Proc. of CDC, pp. 697–702 (1995) 12. Stoica, P.: On the convergence of an iterative algorithm used for Hammerstein systems. IEEE Trans. on Auto. Contr. 26, 967–969 (1981) 13. V¨or¨os, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33, 1141–1146 (1997) 14. V¨or¨os, J.: Recursive identification of Hammerstein systems with discontinuous nonlinearities containing dead-zones. IEEE Trans. on Auto. Contr. 48, 2203–2206 (2003)
Chapter 6
Recursive Identification for Stochastic Hammerstein Systems Han-Fu Chen
6.1 Introduction By the Hammerstein system we mean a cascading system composed of a nonlinear memoryless block f (·) followed by a linear subsystem. In a deterministic setting, the linear part of the system is characterised by a rational transfer function (RTF) and the system output yk is exactly observed. However, in practice the system itself may be random and the observations may be corrupted by noises. So, it is of practical importance to consider stochastic Hammerstein systems as shown in Figure 6.1.
𝜉𝑘 𝑢𝑘
𝑓(⋅)
𝑣𝑘
Linear Subsystem
𝑦𝑘
C
𝑧𝑘
Fig. 6.1: Hammerstein system
Identification of Hammerstein systems has been an active research area for many years. The topic of the chapter is to recursively identify the SISO stochastic Hammerstein system. Let us specify systems to be identified in the chapter. The linear subsystem is described by an ARMAX system A(z)yk+1 = B(z) f (uk ) + C(z)wk+1 , k ≥ 0
(6.1)
or by an ARX system Han-Fu Chen Key Laboratory of Systems and Control, Institute of Systems Science, AMSS, Chinese Academy of Sciences, Beijing 100190, P.R. China e-mail:
[email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 69–87. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com
70
H.-F. Chen
A(z)yk+1 = B(z) f (uk ) + wk+1 ,
(6.2)
where $\{w_k\}$ is the system noise and $z$ is the backward-shift operator: $zy_k = y_{k-1}$. So the stochastic Hammerstein system is considered here. The system input is $\{u_k\}$, but the input of the linear subsystem is

$$v_k \triangleq f(u_k),$$

where $f(\cdot)$ is an unknown static nonlinearity. The output of the system is $\{y_k\}$, but the observation $z_k$ is corrupted by an additive noise $\xi_k$:

$$z_k = y_k + \xi_k. \tag{6.3}$$

Assuming the orders $(p, q, r)$ to be known, identification of the linear subsystem amounts to estimating the unknown coefficients of the polynomials

$$A(z) \triangleq 1 + a_1 z + \cdots + a_p z^p, \quad p \ge 0, \ a_p \ne 0,$$
$$B(z) \triangleq 1 + b_1 z + \cdots + b_q z^q, \quad q \ge 0, \ b_q \ne 0, \tag{6.4}$$
$$C(z) \triangleq 1 + c_1 z + \cdots + c_r z^r, \quad r \ge 0, \ c_r \ne 0.$$

It is worth noting that requiring all polynomials to be monic is not a restriction. First, dividing by a constant we can make $A(z)$ monic; then, by changing the variance of $w_k$, we can make $C(z)$ monic. Finally, any constant factor $b_0$ of $B(z)$ may be regarded as a factor of $f(\cdot)$: instead of $f(\cdot)$, the new unknown function $g(\cdot) \triangleq b_0 f(\cdot)$ serves as the system nonlinearity.

As far as the nonlinearity is concerned, both nonparametric and parametric $f(\cdot)$ are considered in this chapter. For nonparametric $f(\cdot)$, the value $f(u)$ is estimated for any fixed $u$. In the parametric case, $f(\cdot)$ is either expressed as a linear combination of known basis functions with unknown coefficients, or is a piecewise linear function with unknown joints and slopes; identification of the nonlinear block is then equivalent to estimating unknown parameters. Various parameter estimation methods, such as extended least squares (ELS), instrumental variables, iterative optimisation, and many others, have been used to identify Hammerstein systems with parametrised nonlinearities; see, e.g., [1-3, 7, 10, 15-19] among others. When the nonlinearity is not parametrised, it is usually estimated with the help of kernel functions; see, e.g., [11, 12, 21, 22] among others.

When the data set is of fixed size, an optimisation approach to estimating the unknown parameters may be appropriate. But when the data size is growing, or the number of data points is extremely large even though it is fixed, recursive estimation algorithms are preferable, because the estimates can then be conveniently updated whenever a new data point is taken into account. In this chapter we consider only recursive algorithms, estimating not only the nonlinearity of the system but also the unknown parameters contained in the linear part of the system. All estimates are proved to converge a.s. to the corresponding true values.

The rest of the chapter is arranged as follows. In Section 6.2 the problem is solved for the Hammerstein system with nonparametric $f(\cdot)$. In Section 6.3 systems whose nonlinearities are piecewise linear functions are identified, while Section 6.4 deals with Hammerstein system identification when the nonlinearity is expanded in a series of basis functions with unknown coefficients. Some concluding remarks are given in Section 6.5. The Appendix presents a general convergence theorem (GCT), which is used for the convergence analysis of the proposed recursive identification algorithms.
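Before proceeding, the following minimal Python sketch simulates data from a stochastic Hammerstein system of the form (6.2)-(6.3); the particular coefficients, nonlinearity, and noise levels are illustrative choices only, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative true system of the form (6.2)-(6.3) with p = q = 1:
#   y_{k+1} + a1*y_k = f(u_k) + b1*f(u_{k-1}) + w_{k+1},   z_k = y_k + xi_k.
a1, b1 = -0.5, 0.3                     # hypothetical coefficients (root of A outside unit disk)
f = lambda u: u + 0.5 * u ** 2         # hypothetical static nonlinearity

N = 10_000
u = rng.uniform(-1.0, 1.0, N)          # bounded iid input, as assumed in Section 6.2
w = 0.1 * rng.standard_normal(N)       # system noise {w_k}
xi = 0.1 * rng.standard_normal(N)      # observation noise {xi_k}

v = f(u)                               # internal signal v_k = f(u_k), not observed
y = np.zeros(N)
for k in range(1, N - 1):
    y[k + 1] = -a1 * y[k] + v[k] + b1 * v[k - 1] + w[k + 1]

z = y + xi                             # the algorithms below see only u and z
```

Only $u_k$ and $z_k$ would be available to the identification algorithms of the following sections; $v_k$ and $y_k$ are internal signals.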
6.2 Nonparametric f(·)
In this section we consider identification of the Hammerstein system consisting of (6.2) and (6.3) with $f(\cdot)$ non-parametrised [6, 22]. Let us fix an arbitrary $u$ and estimate $f(u)$. As the system input, take $\{u_k\}$ to be a sequence of independent and identically distributed (iid) random variables with density $p(\cdot)$ such that $Eu_k = 0$, $|u_k| \le u^*$, where $u^* > 0$ is a constant with $u^* > |u|$, and $p(\cdot)$ is continuous at $u$ with $p(u) > 0$. Assume $u_k = 0$ for $k < 0$. Let

$$\mu \triangleq Ef(u_k), \quad R_1 \triangleq E(f(u_k) - \mu)^2, \quad \rho \triangleq Eu_k f(u_k) \ne 0.$$

We need the following assumptions:

A1. All roots of $A(z) = 0$ are outside the closed unit disk;
A2. $1 + b_1 + \cdots + b_q \ne 0$;
A3. Both $\{\xi_k\}$ and $\{w_k\}$ are sequences of iid random variables with $Ew_k = E\xi_k = 0$, $R_2 \triangleq Ew_k^2 < \infty$, $R_3 \triangleq E\xi_k^2 < \infty$, and $\xi_k = w_k = 0$ for $k < 0$; the sequences $\{u_k\}$, $\{\xi_k\}$, and $\{w_k\}$ are mutually independent;
A4. The rational functions $B(z)B(z^{-1})R_1 + R_2$ and $A(z)A(z^{-1})$ have no common zero;
A5. The function $f(\cdot)$ is measurable, locally bounded, and continuous at the point $u$ where $f(u)$ is estimated.
6.2.1 Identification of A(z)
It is clear that the process $\{z_k\}$ with the selected $\{u_k\}$ is asymptotically stationary if A1 and A3 hold. By stability of $A(z)$ the influence of initial values decays exponentially. So, without loss of generality and for simplicity of writing, we may directly assume that $\{z_k\}$ is stationary:

$$\mu^* \triangleq Ez_k \ \Big(= \frac{b}{a}\mu\Big) \quad \text{and} \quad \gamma(\tau) \triangleq E(z_{k+\tau} - \mu^*)(z_k - \mu^*), \ \tau \ge 0, \tag{6.5}$$

where

$$a \triangleq \sum_{i=0}^{p} a_i, \quad b \triangleq \sum_{i=0}^{q} b_i \quad \text{with } a_0 = 1, \ b_0 = 1. \tag{6.6}$$

From (6.2) and (6.3) it follows that $A(z)z_{k+1} = B(z)f(u_k) + A(z)\xi_{k+1} + w_{k+1}$, which can be rewritten as

$$A(z)\Big(z_{k+1} - \frac{b}{a}\mu\Big) = B(z)\big(f(u_k) - \mu\big) + A(z)\xi_{k+1} + w_{k+1}. \tag{6.7}$$
Multiplying both sides of (6.7) by $\big(z_{k-s} - \frac{b}{a}\mu\big)$, $s \ge p \vee q$, and taking expectation, by A3 we obtain

$$\gamma(s+1) + a_1\gamma(s) + \cdots + a_p\gamma(s+1-p) = 0, \quad s \ge p \vee q. \tag{6.8}$$

Setting $s = p \vee q,\ p \vee q + 1,\ \cdots,\ p \vee q + p - 1$ in (6.8), we derive the following linear algebraic equation:

$$T\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = -\begin{bmatrix} \gamma(p \vee q + 1) \\ \gamma(p \vee q + 2) \\ \vdots \\ \gamma(p \vee q + p) \end{bmatrix}, \tag{6.9}$$

where

$$T \triangleq \begin{bmatrix} \gamma(p \vee q) & \cdots & \gamma(p \vee q + 1 - p) \\ \gamma(p \vee q + 1) & \cdots & \gamma(p \vee q + 2 - p) \\ \vdots & & \vdots \\ \gamma(p \vee q + p - 1) & \cdots & \gamma(p \vee q) \end{bmatrix}. \tag{6.10}$$

The coefficients of $A(z)$ are estimated as follows. By ergodicity of $\{z_k\}$, we recursively estimate $\mu^*$ and $\gamma(\tau)$ by

$$\mu_k^* = \Big(1 - \frac{1}{k}\Big)\mu_{k-1}^* + \frac{1}{k}z_k \tag{6.11}$$

and

$$\gamma_k(\tau) = \Big(1 - \frac{1}{k}\Big)\gamma_{k-1}(\tau) + \frac{1}{k}\big(z_k - \mu_{k-\tau}^*\big)\big(z_{k-\tau} - \mu_{k-\tau}^*\big). \tag{6.12}$$

Replacing $\gamma(i)$ in (6.9) with $\gamma_k(i)$ obtained from (6.11) and (6.12), $i = p \vee q + 1 - p, \cdots, p \vee q + p$, we derive the following equation defining the estimates $a_k(j)$ of $a_j$, $j = 1, \cdots, p$:

$$\begin{bmatrix} \gamma_k(p \vee q) & \cdots & \gamma_k(p \vee q + 1 - p) \\ \gamma_k(p \vee q + 1) & \cdots & \gamma_k(p \vee q + 2 - p) \\ \vdots & & \vdots \\ \gamma_k(p \vee q + p - 1) & \cdots & \gamma_k(p \vee q) \end{bmatrix}\begin{bmatrix} a_k(1) \\ a_k(2) \\ \vdots \\ a_k(p) \end{bmatrix} = -\begin{bmatrix} \gamma_k(p \vee q + 1) \\ \gamma_k(p \vee q + 2) \\ \vdots \\ \gamma_k(p \vee q + p) \end{bmatrix}. \tag{6.13}$$
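As an illustration, the following Python sketch evaluates the estimates defined by (6.11)-(6.13); for clarity it computes the batch averages whose limits coincide with those of the running recursions (function and variable names are hypothetical).

```python
import numpy as np

def estimate_A(z, p, q):
    """Correlation-based estimate of a_1, ..., a_p from the observations z,
    following (6.11)-(6.13); batch averages replace the running recursions."""
    n = len(z)
    m = max(p, q)                      # p v q
    mu_star = z.mean()                 # limit of the recursion (6.11)
    zc = z - mu_star
    # gamma(tau) for tau = 0, ..., m + p, cf. (6.12)
    gamma = np.array([np.dot(zc[tau:], zc[:n - tau]) / n for tau in range(m + p + 1)])
    g = lambda tau: gamma[abs(tau)]    # gamma is symmetric in tau
    T = np.array([[g(m + 1 + i - j) for j in range(1, p + 1)] for i in range(p)])
    rhs = -np.array([g(m + 1 + i) for i in range(p)])
    return np.linalg.solve(T, rhs)     # a_k(1), ..., a_k(p) of (6.13)

# usage with the data generated in Section 6.1 (p = q = 1):
# a_hat = estimate_A(z, p=1, q=1)      # should approach a1 a.s.
```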
Theorem 6.1. Assume A1, A3, A4, and A5 hold. Then $a_k(i) \xrightarrow[k\to\infty]{} a_i$ a.s. for $i = 1, \cdots, p$.

Proof. By ergodicity of $\{z_k\}$ we have

$$\mu_k^* \xrightarrow[k\to\infty]{} \mu^* \ \text{a.s.} \quad \text{and} \quad \gamma_k(\tau) \xrightarrow[k\to\infty]{} \gamma(\tau) \ \text{a.s.} \ \forall\tau \ge 0.$$
This means that the key point is to prove that the matrix $T$ defined by (6.10) is nonsingular. The proof is similar to that given in [20] for the case $R_2 = 0$.

By A1 there is an $R > 1$ such that the spectral function $g(z)$ of $\{z_k\}$ is analytic in $B \triangleq \{z : R^{-1} < |z| < R\}$, where

$$g(z) = \frac{R_1 B(z)B(z^{-1}) + R_2 + R_3 A(z)A(z^{-1})}{A(z)A(z^{-1})} = \sum_{i=-\infty}^{\infty}\gamma(i)z^i. \tag{6.14}$$

Assume the converse: $T$ is singular. Then there exists a vector $h = [h_1, \cdots, h_p]^T \ne 0$ such that $Th = 0$, i.e.,

$$\sum_{j=1}^{p}h_j\gamma(p \vee q + i - j) = 0, \quad i = 1, \cdots, p, \tag{6.15}$$

which by (6.8) implies

$$\sum_{j=1}^{p}h_j\gamma(p \vee q + p + 1 - j) = -\sum_{j=1}^{p}h_j\sum_{l=1}^{p}a_l\gamma(p \vee q + p + 1 - j - l) = -\sum_{l=1}^{p}a_l\sum_{j=1}^{p}h_j\gamma(p \vee q + p + 1 - j - l) = 0,$$

and hence (6.15) is valid for all $i \ge 1$.

Consider $h(z) \triangleq g(z)z^{p \vee q - p}\sum_{j=1}^{p}h_jz^{p-j}$. By (6.14) and (6.15) we have

$$h(z) = \sum_{j=1}^{p}h_j\sum_{l=0}^{\infty}\gamma(l + j - p \vee q)z^l. \tag{6.16}$$

Thus, $h(z)$ is bounded in $|z| < 1$. On the other hand, by A4 there is no pole-zero cancellation in

$$g(z)z^{p \vee q - p} = \frac{\big(R_1 B(z)B(z^{-1}) + R_2 + R_3 A(z)A(z^{-1})\big)z^{p \vee q}}{A(z)A(z^{-1})z^p}.$$

By A1, $A(z)A(z^{-1})z^p$ has $p$ roots in the open disk $\{|z| < R^{-1}\}$. Therefore, $h(z)$ has at least one pole in this open disk. This contradicts
the boundedness of $h(z)$ in $|z| < 1$. The obtained contradiction shows that $T$ is nonsingular. Then, (6.13) gives consistent estimates of $a_i$, $i = 1, \cdots, p$. $\square$
6.2.2 Identification of B(z)
In what follows, $\beta_k(0)$ is the estimate of $\rho \triangleq Eu_1 f(u_1)$ and $\beta_k(i)$ is that of $\rho b_i$. So $\frac{\beta_k(i)}{\beta_k(0)}$ is the estimate of $b_i$, $i = 1, \cdots, q$, whenever $\beta_k(0) \ne 0$. Since $\mu^* = \frac{b}{a}\mu$,

$$\mu_k \triangleq \mu_k^*\,\frac{1 + \sum_{i=1}^{p}a_k(i)}{1 + \frac{1}{\beta_k(0)}\sum_{i=1}^{q}\beta_k(i)} \tag{6.17}$$

serves as the estimate of $\mu\ (= Ef(u_1))$, provided $\rho \ne 0$ and A2 holds.

We apply the stochastic approximation algorithm with expanding truncations (SAAWET) [5] to estimate $b_i$, $i = 0, 1, \cdots, q$. For convenience of reading, the GCT for SAAWET is attached at the end of the chapter. Let $\{M_k\}$ be a sequence of positive numbers increasingly diverging to infinity: $M_k > 0$, $M_{k+1} > M_k\ \forall k$, and $M_k \xrightarrow[k\to\infty]{} \infty$.
Setting $\sigma_0(i) = 0$, $i = 0, 1, \cdots, q$, with an arbitrary initial value $\beta_0(i)$, recursively define

$$\beta_{k+1}(i) = \Big[\beta_k(i) - \tfrac{1}{k}\big(\beta_k(i) - u_k(z_{k+1+i} + a_{k-p}(1)z_{k+i} + \cdots + a_{k-p}(p)z_{k+1+i-p})\big)\Big]\cdot I_{\big[\big|\beta_k(i) - \frac{1}{k}\big(\beta_k(i) - u_k(z_{k+1+i} + a_{k-p}(1)z_{k+i} + \cdots + a_{k-p}(p)z_{k+1+i-p})\big)\big| \le M_{\sigma_k(i)}\big]}, \tag{6.18}$$

$$\sigma_k(i) = \sum_{j=1}^{k-1}I_{\big[\big|\beta_j(i) - \frac{1}{j}\big(\beta_j(i) - u_j(z_{j+1+i} + a_{j-p}(1)z_{j+i} + \cdots + a_{j-p}(p)z_{j+1+i-p})\big)\big| > M_{\sigma_j(i)}\big]}.$$
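The following Python sketch shows one possible implementation of the recursion (6.18). For simplicity it uses a fixed, previously computed estimate of the $A(z)$-coefficients in place of the online $a_{k-p}(i)$, and the truncation bounds $M_k$ and the zero re-initialisation after a truncation are illustrative choices.

```python
import numpy as np

def saawet_beta(u, z, a_hat, p, q, M0=1.0, M_growth=2.0):
    """Expanding-truncation recursion (6.18) for beta_k(0), ..., beta_k(q);
    M_k = M0 * M_growth**k is an illustrative bound sequence."""
    N = len(z)
    beta = np.zeros(q + 1)               # beta_k(0), ..., beta_k(q)
    sigma = np.zeros(q + 1, dtype=int)   # truncation counters sigma_k(i)
    for k in range(p + 1, N - q - 1):
        for i in range(q + 1):
            # regressor: z_{k+1+i} + a(1) z_{k+i} + ... + a(p) z_{k+1+i-p}
            s = z[k + 1 + i] + np.dot(a_hat, z[k + i::-1][:p])
            cand = beta[i] - (beta[i] - u[k] * s) / k
            if abs(cand) <= M0 * M_growth ** sigma[i]:
                beta[i] = cand           # accept the stochastic-approximation step
            else:
                beta[i] = 0.0            # truncate: reset and enlarge the bound
                sigma[i] += 1
    return beta                          # beta[i] / beta[0] estimates b_i
```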
Theorem 6.2. Assume A1, A3, A4, and A5 hold. Then the recursive estimates $\beta_k(i)$, $i = 0, 1, \cdots, q$, are strongly consistent:

$$\beta_k(i) \xrightarrow[k\to\infty]{} \rho b_i \quad \text{and} \quad b_k(i) \triangleq \frac{\beta_k(i)}{\beta_k(0)} \xrightarrow[k\to\infty]{} b_i \quad \text{a.s.,} \ i = 0, 1, \cdots, q.$$
Proof. We rewrite (6.18) as

$$\beta_{k+1}(i) = \Big[\beta_k(i) - \tfrac{1}{k}(\beta_k(i) - \rho b_i) - \tfrac{1}{k}\varepsilon_{k+1}(i)\Big]\cdot I_{\big[\big|\beta_k(i) - \frac{1}{k}(\beta_k(i) - \rho b_i) - \frac{1}{k}\varepsilon_{k+1}(i)\big| \le M_{\sigma_k(i)}\big]}, \tag{6.19}$$

where

$$\varepsilon_{k+1}(i) = \rho b_i - u_k\big(z_{k+1+i} + a_{k-p}(1)z_{k+i} + \cdots + a_{k-p}(p)z_{k+1+i-p}\big), \quad i = 0, 1, \cdots, q. \tag{6.20}$$

Define

$$m(n,t) \triangleq \max\Big\{m : m \ge n, \ \sum_{i=n}^{m}\frac{1}{i} \le t\Big\}.$$

By the GCT it suffices to show, for $i = 0, 1, \cdots, q$,

$$\lim_{T\to 0}\limsup_{n\to\infty}\frac{1}{T}\Big|\sum_{j=n}^{m(n,t)}\frac{1}{j}\varepsilon_{j+1}(i)\Big| = 0 \ \text{a.s.,} \quad \forall t \in (0, T]. \tag{6.21}$$

By (6.2), we rewrite (6.20) as

$$\varepsilon_{k+1}(i) = \varepsilon_{k+1}^{(1)}(i) + \varepsilon_{k+1}^{(2)}(i) + \varepsilon_{k+1}^{(3)}(i),$$

where

$$\varepsilon_{k+1}^{(1)}(i) \triangleq \rho b_i - u_k\big(f(u_{k+i}) + b_1 f(u_{k+i-1}) + \cdots + b_q f(u_{k+i-q}) + \xi_{k+1+i}\big),$$
$$\varepsilon_{k+1}^{(2)}(i) \triangleq u_k\big((a_1 - a_{k-p}(1))y_{k+i} + \cdots + (a_p - a_{k-p}(p))y_{k+1+i-p}\big),$$
$$\varepsilon_{k+1}^{(3)}(i) \triangleq -u_k\big(w_{k+1+i} + a_{k-p}(1)\xi_{k+i} + \cdots + a_{k-p}(p)\xi_{k+1+i-p}\big).$$

By the convergence theorem for sums of martingale difference sequences [4, 9], we have for $i = 0, 1, \cdots, q$

$$\sum_{k=1}^{\infty}\frac{1}{k}\varepsilon_{k+1}^{(1)}(i) < \infty \ \text{a.s.} \quad \text{and} \quad \sum_{k=1}^{\infty}\frac{1}{k}\varepsilon_{k+1}^{(3)}(i) < \infty \ \text{a.s.}$$

Therefore, (6.21) is satisfied with $\varepsilon_{j+1}(i)$ replaced by $\varepsilon_{j+1}^{(1)}(i)$ and $\varepsilon_{j+1}^{(3)}(i)$. By A1, A3, A5, and the boundedness of $\{u_k\}$, applying Theorem 6.1 leads to

$$\sum_{k=1}^{\infty}\frac{1}{k}\varepsilon_{k+1}^{(2)}(i) < \infty \ \text{a.s.}$$

Thus, (6.21) has been verified, and the proof is completed. $\square$
Remark 6.1. In the proofs of Theorems 6.1 and 6.2 the condition A5 is only partly used. As a matter of fact, the requirement that "$f(\cdot)$ is continuous at the point $u$ where $f(u)$ is estimated" is not needed there.
6.2.3 Identification of f(u)
For estimating $f(u)$ we will use the kernel function defined below:

$$\chi_k \triangleq \frac{e^{-\big(\frac{u_k - u}{\tau_k}\big)^2}}{\int_{-u^*}^{u^*}e^{-\big(\frac{x - u}{\tau_k}\big)^2}p(x)\,dx},$$

where $\tau_k = \frac{1}{k^{\delta}}$ with $\delta$ fixed, $\delta \in (0, \frac{1}{2})$. Similarly to [6], it can be shown that

$$E\chi_k = 1, \quad E\chi_k f(u_k) \xrightarrow[k\to\infty]{} f(u), \quad \text{and} \quad \sup_k E\big(\sqrt{\tau_k}\,\chi_k\big)^2 < \infty. \tag{6.22}$$

Setting $\lambda_0(u) = 0$, with an arbitrary initial value $\eta_0(u)$, we define the estimate $f_k(u)$ of $f(u)$. Let $\eta_k(u)$ be recursively calculated according to the following algorithm:

$$\eta_{k+1}(u) = \Big[\eta_k(u) - \tfrac{1}{k}\big(\eta_k(u) - (\chi_k - 1)(z_{k+1} + a_{k-p}(1)z_k + \cdots + a_{k-p}(p)z_{k+1-p})\big)\Big]\cdot I_{\big[\big|\eta_k(u) - \frac{1}{k}\big(\eta_k(u) - (\chi_k - 1)(z_{k+1} + a_{k-p}(1)z_k + \cdots + a_{k-p}(p)z_{k+1-p})\big)\big| \le M_{\lambda_k(u)}\big]}, \tag{6.23}$$

$$\lambda_k(u) = \sum_{j=1}^{k-1}I_{\big[\big|\eta_j(u) - \frac{1}{j}\big(\eta_j(u) - (\chi_j - 1)(z_{j+1} + a_{j-p}(1)z_j + \cdots + a_{j-p}(p)z_{j+1-p})\big)\big| > M_{\lambda_j(u)}\big]}.$$

Then $f_k(u) \triangleq \eta_k(u) + \mu_k$ serves as the estimate of $f(u)$.
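A minimal Python sketch of the resulting estimator is given below; the input density, the bandwidth exponent $\delta$, the truncation bounds, and the use of the final estimates a_hat and mu_hat in place of their online counterparts are all simplifying assumptions.

```python
import numpy as np

def chi(u_k, u, k, p_density, u_star, delta=0.25, grid=2001):
    """Kernel weight chi_k of Section 6.2.3, with the normalising integral
    evaluated on a fixed grid; delta and the grid size are illustrative."""
    tau = k ** (-delta)
    x = np.linspace(-u_star, u_star, grid)
    dx = x[1] - x[0]
    denom = np.sum(np.exp(-((x - u) / tau) ** 2) * p_density(x)) * dx
    return np.exp(-((u_k - u) / tau) ** 2) / denom

def estimate_f_at(u, u_seq, z_seq, a_hat, p, mu_hat, p_density, u_star,
                  M0=1.0, M_growth=2.0):
    """SAAWET recursion (6.23) for eta_k(u); returns f_k(u) = eta_k(u) + mu_k."""
    eta, lam = 0.0, 0
    for k in range(p, len(z_seq) - 1):
        s = z_seq[k + 1] + np.dot(a_hat, z_seq[k::-1][:p])
        c = chi(u_seq[k], u, k, p_density, u_star)
        cand = eta - (eta - (c - 1.0) * s) / k
        if abs(cand) <= M0 * M_growth ** lam:
            eta = cand
        else:
            eta, lam = 0.0, lam + 1
    return eta + mu_hat

# e.g. for u_k ~ Uniform(-1, 1): p_density = lambda x: 0.5 * (np.abs(x) <= 1.0)
```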
Theorem 6.3. Assume A1-A5 hold and $\rho \ne 0$. Then

$$\eta_k(u) \xrightarrow[k\to\infty]{} f(u) - Ef(u_1) \ \text{a.s.} \quad \text{and} \quad \eta_k(u) + \mu_k \xrightarrow[k\to\infty]{} f(u) \ \text{a.s.,}$$

where $\{\eta_k(u)\}$ is generated by (6.23) and $\{\mu_k\}$ is given by (6.17).

Proof. Similarly to the proof of Theorem 6.2, by the GCT given in the Appendix we only need to show

$$\lim_{T\to 0}\limsup_{n\to\infty}\frac{1}{T}\Big|\sum_{j=n}^{m(n,t)}\frac{1}{j}e_{j+1}\Big| = 0 \ \text{a.s.,} \quad \forall t \in (0, T], \tag{6.24}$$

where

$$e_{k+1} = \big(f(u) - Ef(u_1)\big) - (\chi_k - 1)\big(z_{k+1} + a_{k-p}(1)z_k + \cdots + a_{k-p}(p)z_{k+1-p}\big). \tag{6.25}$$

By (6.2), $e_{k+1}$ can be expressed as

$$e_{k+1} = \big(f(u) - Ef(u_1)\big) - (\chi_k - 1)\big(f(u_k) + b_1 f(u_{k-1}) + \cdots + b_q f(u_{k-q}) + \xi_{k+1} + (a_{k-p}(1) - a_1)y_k + \cdots + (a_{k-p}(p) - a_p)y_{k+1-p} + w_{k+1} + a_{k-p}(1)\xi_k + \cdots + a_{k-p}(p)\xi_{k+1-p}\big).$$

By (6.22), A1, A3, and Theorem 6.1, using the boundedness of $\{f(u_k)\}$ and the convergence theorem for sums of martingale difference sequences [4, 9], similarly to the proof of Theorem 6.2 we have
$$\sum_{k=1}^{\infty}\frac{1}{k}(\chi_k - 1)\big(b_1 f(u_{k-1}) + \cdots + b_q f(u_{k-q}) + \xi_{k+1}\big) < \infty \ \text{a.s.,}$$

$$\sum_{k=1}^{\infty}\frac{1}{k}(\chi_k - 1)\big(w_{k+1} + a_{k-p}(1)\xi_k + \cdots + a_{k-p}(p)\xi_{k+1-p}\big) < \infty \ \text{a.s.,}$$

$$\sum_{k=1}^{\infty}\frac{1}{k}\big(f(u_k) - Ef(u_1)\big) < \infty \ \text{a.s.,}$$

and

$$\sum_{k=1}^{\infty}\frac{1}{k}(\chi_k - 1)(a_{k-p}(1) - a_1)y_k < \infty \ \text{a.s.,} \quad \cdots, \quad \sum_{k=1}^{\infty}\frac{1}{k}(\chi_k - 1)(a_{k-p}(p) - a_p)y_{k+1-p} < \infty \ \text{a.s.}$$

So, for (6.24) it remains to show

$$\lim_{T\to 0}\limsup_{n\to\infty}\frac{1}{T}\Big|\sum_{j=n}^{m(n,t)}\frac{1}{j}\big(f(u) - \chi_j f(u_j)\big)\Big| = 0 \ \text{a.s.,} \quad \forall t \in (0, T]. \tag{6.26}$$

This is verified by writing

$$f(u) - \chi_j f(u_j) = f(u) - E\chi_j f(u_j) + E\chi_j f(u_j) - \chi_j f(u_j)$$

and noticing

$$\sum_{k=1}^{\infty}\frac{1}{k}\big(\chi_k f(u_k) - E\chi_k f(u_k)\big) < \infty \ \text{a.s.}$$

and

$$\Big|\sum_{j=n}^{m(n,t)}\frac{1}{j}\big(f(u) - E\chi_j f(u_j)\big)\Big| = o(1)\cdot O(T) = o(T),$$

which follows from (6.22). Thus $\eta_k(u) \xrightarrow[k\to\infty]{} f(u) - Ef(u_1)$, and the last conclusion of the theorem follows from (6.17) and Theorems 6.1 and 6.2. $\square$
Remark 6.2. If $E\xi_k^4 < \infty$ and $Ew_k^4 < \infty$, more accurate results can be obtained. In fact, it can be shown that $\mu_k^* - \mu^* = O\Big(\frac{(\log\log k)^{1/2}}{k^{1/2}}\Big)$ a.s., $\gamma_k(\tau) - \gamma(\tau) = O\Big(\frac{(\log k)^{1/2}}{k^{1/2}}\Big)$ a.s. for $\tau \ge 0$, and $a_k(i) - a_i = O\Big(\frac{(\log k)^{1/2}}{k^{1/2}}\Big)$ a.s., where "$x_k = O(y_k)$" means $|x_k| \le \alpha y_k$ with $\alpha$ a constant which is independent of $k$ but may depend on the sample path.
6.3 Piecewise Linear f(·)
We continue to consider the Hammerstein system consisting of (6.2) and (6.3), now with $f(\cdot)$ being a piecewise linear function (possibly discontinuous) [7] containing six unknown parameters $c^+, c^-, b^+, b^-, d^+, d^-$:

$$f(u) = \begin{cases} c^+(u - d^+) + b^+, & u > d^+, \\ 0, & -d^- \le u \le d^+, \\ c^-(u + d^-) - b^-, & u < -d^-. \end{cases} \tag{6.27}$$
We keep Assumptions A2-A4 made in Section 6.2 unchanged, but strengthen A1 to A1' and replace A5 with A5'.

A1'. All roots of $A(z) = 0$ and $B(z) = 0$ are outside the closed unit disk.
A5'. The unknown nonlinearity is expressed by (6.27), and an upper bound $U$ for $d^+$ and $d^-$ is available: $0 \le d^+ < U$ and $0 \le d^- < U$.

Let us take $\{u_k\}$ to be a sequence of iid random variables uniformly distributed over $[-2U, 2U]$ and independent of $\{\xi_k\}$ and $\{w_k\}$. In view of Remark 6.1, under A1', A2-A4, and A5' Theorems 6.1 and 6.2 remain valid for the Hammerstein system with nonlinearity (6.27). Hence, as before, we have the strongly consistent estimates $a_k(i)$ and $b_k(j)$ defined in Section 6.2 for $a_i$ and $b_j$, respectively, $i = 1, \cdots, p$, $j = 1, \cdots, q$.
Define the $q$-dimensional vector $H^T \triangleq [1, 0, \cdots, 0]$ and

$$B \triangleq \begin{bmatrix} -b_1 & 1 & 0 & \cdots & 0 \\ -b_2 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ -b_q & 0 & 0 & \cdots & 0 \end{bmatrix} \quad \text{and} \quad B_k \triangleq \begin{bmatrix} -b_k(1) & 1 & 0 & \cdots & 0 \\ -b_k(2) & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ -b_k(q) & 0 & 0 & \cdots & 0 \end{bmatrix}. \tag{6.28}$$

From (6.2) it follows that $A(z)y_{k+1} = B(z)v_k + w_{k+1}$, $v_k = H^Tx_k$, where $x_k = Bx_{k-1} + H\big(A(z)y_{k+1} - w_{k+1}\big)$ is recursively defined with some initial value $x_0$.

In what follows, by $A_k(z)$ we mean the polynomial obtained from $A(z)$ with the coefficients $a_i$ replaced by their estimates $a_k(i)$, $i = 1, \cdots, p$, given in Section 6.2. Therefore,

$$A_k(z)y_{k+1} \triangleq y_{k+1} + a_k(1)y_k + \cdots + a_k(p)y_{k-p+1},$$

and $A_k(z)\xi_{k+1}$ and $A_k(z)z_{k+1}$ are defined similarly. Then the estimate $\hat{v}_k$ of $v_k$ is naturally defined as

$$\hat{v}_k \triangleq H^T\hat{x}_k, \quad \hat{x}_k = B_k\hat{x}_{k-1} + HA_k(z)z_{k+1},$$

with an arbitrary initial value $\hat{x}_0$.

It is clear that $f(u) = c^+u - h^+$ for $u \ge U$, where $h^+ \triangleq c^+d^+ - b^+$; equivalently, $v_k = \mu^{+T}\phi_k^+$ for $u_k \ge U$, where

$$\mu^+ \triangleq [c^+, h^+]^T, \quad \phi_k^+ \triangleq [u_k, -1]^TI_{[u_k \ge U]}. \tag{6.29}$$

Define

$$m_k^+ \triangleq \hat{v}_kI_{[u_k \ge U]}. \tag{6.30}$$
Then $\mu^+$ can be estimated by the following least squares (LS) algorithm [4, 14]:

$$\mu_k^+ = \mu_{k-1}^+ + a_k^+P_k^+\phi_k^+\big(m_k^+ - \phi_k^{+T}\mu_{k-1}^+\big), \tag{6.31}$$

$$P_{k+1}^+ = P_k^+ - a_k^+P_k^+\phi_k^+\phi_k^{+T}P_k^+, \quad a_k^+ = \big(1 + \phi_k^{+T}P_k^+\phi_k^+\big)^{-1}. \tag{6.32}$$

The estimate of $\mu^- \triangleq [c^-, h^-]^T$ is calculated in a similar way (for $u \le -U$, $f(u) = c^-u + h^-$ with $h^- \triangleq c^-d^- - b^-$). Defining

$$m_k^- \triangleq \hat{v}_kI_{[u_k \le -U]}, \quad \phi_k^- \triangleq [u_k, 1]^TI_{[u_k \le -U]}, \tag{6.33}$$

we estimate $\mu^-$ by the recursive LS algorithm:

$$\mu_k^- = \mu_{k-1}^- + a_k^-P_k^-\phi_k^-\big(m_k^- - \phi_k^{-T}\mu_{k-1}^-\big), \tag{6.34}$$

$$P_{k+1}^- = P_k^- - a_k^-P_k^-\phi_k^-\phi_k^{-T}P_k^-, \quad a_k^- = \big(1 + \phi_k^{-T}P_k^-\phi_k^-\big)^{-1}. \tag{6.35}$$
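For concreteness, the recursions (6.31)-(6.32) (and, symmetrically, (6.34)-(6.35)) can be implemented as follows; the initialisation values are illustrative.

```python
import numpy as np

def rls_step(mu, P, phi, m):
    """One recursive least-squares update of the form (6.31)-(6.32);
    mu is the current parameter estimate and P the 2x2 gain matrix."""
    a = 1.0 / (1.0 + phi @ P @ phi)
    mu = mu + a * (P @ phi) * (m - phi @ mu)
    P = P - a * np.outer(P @ phi, P @ phi)
    return mu, P

# sketch of the loop over the data, assuming v_hat, u, and U are available:
# mu_p, P_p = np.zeros(2), 100.0 * np.eye(2)       # illustrative initialisation
# for k in range(len(u)):
#     phi = np.array([u[k], -1.0]) * (u[k] >= U)   # phi_k^+ of (6.29)
#     m = v_hat[k] * (u[k] >= U)                   # m_k^+ of (6.30)
#     mu_p, P_p = rls_step(mu_p, P_p, phi, m)      # mu_p -> [c+, h+]
```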
Write $\mu_k^+$ and $\mu_k^-$ in component form:

$$\mu_k^+ = [c_k^+, h_k^+]^T \quad \text{and} \quad \mu_k^- = [c_k^-, h_k^-]^T.$$

We now define estimates for $d^+, b^+, d^-$, and $b^-$. It is clear that

$$Ev_kI_{[u_k \ge 0]} = \frac{1}{4U}\int_{d^+}^{2U}(c^+u - h^+)\,du = -\frac{c^+}{8U}d^{+2} + \frac{h^+}{4U}d^+ + \frac{c^+U}{2} - \frac{h^+}{2}, \tag{6.36}$$

and from here it follows that

$$d^+ = \begin{cases} \frac{1}{c^+}\Big[h^+ - \mathrm{sign}(h^+)\big(h^{+2} + 4c^+U(c^+U - h^+ - 2Ev_kI_{[u_k \ge 0]})\big)^{\frac{1}{2}}\Big], & \text{if } |c^+| > 0, \\ \frac{4U}{h^+}\Big(\frac{h^+}{2} + Ev_kI_{[u_k \ge 0]}\Big), & \text{if } c^+ = 0, \end{cases}$$

where "$-\mathrm{sign}(h^+)$" is taken to make $d^+$ continuous with respect to $c^+$ as $c^+ \to 0$ for a fixed $Ev_kI_{[u_k \ge 0]}$. From here it is natural to define the estimates for $d^+$ and $b^+$ as follows:

$$d_k^+ = \frac{h_k^+ - \mathrm{sign}(h_k^+)\big(h_k^{+2} + 4\bar{c}_k^+U(\bar{c}_k^+U - h_k^+ - 2\bar{m}_k^+)\big)^{\frac{1}{2}}}{\bar{c}_k^+}, \quad b_k^+ \triangleq c_k^+d_k^+ - h_k^+, \tag{6.37}$$

where $\bar{c}_k^+$ is a modification of $c_k^+$ that avoids a possible division by zero:
$$\bar{c}_k^+ \triangleq \begin{cases} c_k^+, & \text{if } |c_k^+| \ge \frac{1}{k}, \\ \frac{\mathrm{sign}(c_k^+)}{k}, & \text{if } |c_k^+| < \frac{1}{k}, \end{cases} \tag{6.38}$$

and $\bar{m}_k^+$ is the time average of $\{\hat{v}_kI_{[u_k \ge 0]}\}$, recursively defined by

$$\bar{m}_k^+ = \frac{k-1}{k}\bar{m}_{k-1}^+ + \frac{1}{k}\hat{v}_kI_{[u_k \ge 0]} \quad \text{with } \bar{m}_0^+ = 0. \tag{6.39}$$

Similarly, modify $c_k^-$ to $\bar{c}_k^-$:
$$\bar{c}_k^- \triangleq \begin{cases} c_k^-, & \text{if } |c_k^-| \ge \frac{1}{k}, \\ \frac{\mathrm{sign}(c_k^-)}{k}, & \text{if } |c_k^-| < \frac{1}{k}, \end{cases} \tag{6.40}$$

and define the estimates $d_k^-$ and $b_k^-$ of $d^-$ and $b^-$, respectively:

$$d_k^- = \frac{h_k^- - \mathrm{sign}(h_k^-)\big((h_k^-)^2 + 4\bar{c}_k^-U(\bar{c}_k^-U - h_k^- + 2\bar{m}_k^-)\big)^{\frac{1}{2}}}{\bar{c}_k^-} \tag{6.41}$$

and

$$b_k^- \triangleq c_k^-d_k^- - h_k^-, \tag{6.42}$$

where $\bar{m}_k^-$ is the time average of $\{\hat{v}_kI_{[u_k \le 0]}\}$:

$$\bar{m}_k^- = \frac{k-1}{k}\bar{m}_{k-1}^- + \frac{1}{k}\hat{v}_kI_{[u_k \le 0]} \quad \text{with } \bar{m}_0^- = 0. \tag{6.43}$$
Theorem 6.4. Assume that A1', A2-A4, and A5' hold and that $u_k$ is uniformly distributed over $[-2U, 2U]$. Then, with probability one,

$$\mu_k^+ \xrightarrow[k\to\infty]{} [c^+, h^+]^T, \quad \mu_k^- \xrightarrow[k\to\infty]{} [c^-, h^-]^T, \quad d_k^+ \xrightarrow[k\to\infty]{} d^+, \quad d_k^- \xrightarrow[k\to\infty]{} d^-, \quad b_k^+ \xrightarrow[k\to\infty]{} b^+, \quad b_k^- \xrightarrow[k\to\infty]{} b^-.$$
Proof. The LS estimate $\mu_k^+$ given by (6.31)-(6.32) equals

$$\mu_k^+ = \Big(\sum_{i=1}^{k}\phi_i^+\phi_i^{+T}\Big)^{-1}\sum_{i=1}^{k}\phi_i^+m_i^+ \tag{6.44}$$

whenever the matrix $\sum_{i=1}^{k}\phi_i^+\phi_i^{+T}$ is nonsingular [4, 14].

Since $\{u_k\}$ is iid with uniform distribution over $[-2U, 2U]$, by the strong law of large numbers [9] we have

$$\frac{1}{k}\sum_{i=1}^{k}\phi_i^+\phi_i^{+T} = \frac{1}{k}\sum_{i=1}^{k}\begin{bmatrix} u_i^2 & -u_i \\ -u_i & 1 \end{bmatrix}I_{[u_i \ge U]} \xrightarrow[k\to\infty]{} \begin{bmatrix} \frac{7}{12}U^2 & -\frac{3}{8}U \\ -\frac{3}{8}U & \frac{1}{4} \end{bmatrix} \ \text{a.s.,} \tag{6.45}$$

which is nondegenerate. Let $\nu_k$ be the estimation error for $v_k$:

$$\nu_k = \hat{v}_k - v_k.$$

Noticing that $v_kI_{[u_k \ge U]} = \mu^{+T}\phi_k^+$ and

$$m_k^+ = \mu^{+T}\phi_k^+ + \nu_kI_{[u_k \ge U]},$$

from (6.44) we have

$$\mu_k^+ = \Big(\sum_{i=1}^{k}\phi_i^+\phi_i^{+T}\Big)^{-1}\sum_{i=1}^{k}\phi_i^+\Big[\phi_i^{+T}\mu^+ + \nu_iI_{[u_i \ge U]}\Big].$$

In view of (6.45), for $\mu_k^+ \xrightarrow[k\to\infty]{} \mu^+$ a.s. it suffices to show

$$\frac{1}{n}\sum_{k=1}^{n}\nu_kI_{[u_k \in B]} \xrightarrow[n\to\infty]{} 0 \ \text{a.s. for any Borel set } B. \tag{6.46}$$

Define $\bar{\hat{x}}_k = B_k\bar{\hat{x}}_{k-1} + HA_k(z)y_{k+1}$. Then we have

$$\hat{x}_k - \bar{\hat{x}}_k = B_k(\hat{x}_{k-1} - \bar{\hat{x}}_{k-1}) + HA_k(z)\xi_{k+1} \tag{6.47}$$

and

$$\bar{\hat{x}}_k - x_k = B_k(\bar{\hat{x}}_{k-1} - x_{k-1}) + (B_k - B)x_{k-1} + Hw_{k+1}. \tag{6.48}$$

By (6.47)-(6.48) it follows that

$$\begin{aligned} \nu_kI_{[u_k \in B]} &= H^T\hat{x}_kI_{[u_k \in B]} - H^Tx_kI_{[u_k \in B]} = H^T(\hat{x}_k - \bar{\hat{x}}_k)I_{[u_k \in B]} + H^T(\bar{\hat{x}}_k - x_k)I_{[u_k \in B]} \\ &= H^TB_{k1}(\hat{x}_0 - \bar{\hat{x}}_0)I_{[u_k \in B]} + H^T\sum_{i=1}^{k}B_{k,i+1}HA_i(z)\xi_{i+1}I_{[u_k \in B]} \\ &\quad + H^TB_{k1}(\bar{\hat{x}}_0 - x_0)I_{[u_k \in B]} + H^T\sum_{i=1}^{k}B_{k,i+1}(B_i - B)x_{i-1}I_{[u_k \in B]} + H^T\sum_{i=1}^{k}B_{k,i+1}Hw_{i+1}I_{[u_k \in B]}, \end{aligned} \tag{6.49}$$

where

$$B_{ni} \triangleq B_nB_{n-1}\cdots B_i \ \text{for } n \ge i, \quad B_{ji} \triangleq I \ \text{for } j < i.$$

By Theorems 6.1 and 6.2, $A_k(z) \xrightarrow[k\to\infty]{} A(z)$ and $B_k(z) \xrightarrow[k\to\infty]{} B(z)$. Then, by stability of $B(z)$, there exist constants $c > 0$ and $\lambda \in (0, 1)$ such that

$$\|B_{kj}\| \le c\lambda^{k-j} \quad \text{and} \quad \|B^{k-j}\| \le c\lambda^{k-j} \quad \forall k \ge j. \tag{6.50}$$

Since $\{u_k\}$, $\{w_k\}$, and $\{\xi_k\}$ are mutually independent and $\{u_k\}$ is bounded, by (6.50) it is shown in [7] that the time average of each term on the right-hand side of (6.49) tends to zero. Hence (6.46) is verified.

Notice that (6.46) also implies $\bar{m}_k^+ \xrightarrow[k\to\infty]{} Ev_kI_{[u_k \ge 0]}$. Comparing (6.37) with the expression for $d^+$, by the strong consistency of $c_k^+$ and $h_k^+$ we conclude that $d_k^+ \xrightarrow[k\to\infty]{} d^+$. The strong consistency of $\mu_k^-$, $\bar{m}_k^-$, and $d_k^-$ is established in a similar way. $\square$
6.4 Parameterized Nonlinearity
We now consider identification of the system (6.1) with $f(\cdot)$ expressed as a linear combination of basis functions with unknown coefficients [21]. To be precise, let $\{g_1(x), \cdots, g_s(x)\}$, $x \in \mathbb{R}$, be a set of basis functions and let

$$f(x) = \sum_{j=1}^{s}d_jg_j(x) \quad \text{with unknown } d_1, \cdots, d_s.$$

Assume the system output $\{y_k\}$ is observed without noise, i.e., $\xi_k = 0$ in (6.3). Then the system (6.1) is expressed as

$$A(z)y_{k+1} = \sum_{j=1}^{s}d_jg_j(u_k) + b_1\sum_{j=1}^{s}d_jg_j(u_{k-1}) + \cdots + b_q\sum_{j=1}^{s}d_jg_j(u_{k-q}) + C(z)w_{k+1}. \tag{6.51}$$

By setting

$$\theta^T \triangleq [-a_1 \cdots -a_p \ \vdots\ d_1 \cdots d_s \ \vdots\ b_1d_1 \cdots b_1d_s \ \vdots\ \cdots \ \vdots\ b_qd_1 \cdots b_qd_s \ \vdots\ c_1 \cdots c_r], \tag{6.52}$$

$$\varphi_k^{0T} \triangleq [y_k \cdots y_{k+1-p} \ \vdots\ g_1(u_k) \cdots g_s(u_k) \ \cdots\ g_1(u_{k-q}) \cdots g_s(u_{k-q}) \ \vdots\ w_k \cdots w_{k+1-r}], \tag{6.53}$$

the system (6.51) is written as

$$y_{k+1} = \theta^T\varphi_k^0 + w_{k+1}. \tag{6.54}$$
To identify $\theta$ in this model we may use various estimation algorithms [4], e.g., ELS, weighted least squares (WLS), the stochastic gradient (SG) algorithm, and other modifications. Let us apply ELS with an arbitrary $\theta_0$ and $P_0 = \alpha_0 I$ for some $\alpha_0 > 0$:

$$\theta_{k+1} = \theta_k + \rho_kP_k\varphi_k\big(y_{k+1} - \theta_k^T\varphi_k\big), \tag{6.55}$$

$$P_{k+1} = P_k - \rho_kP_k\varphi_k\varphi_k^TP_k, \tag{6.56}$$

$$\hat{w}_{k+1} = y_{k+1} - \theta_{k+1}^T\varphi_k, \quad \rho_k = \frac{1}{1 + \varphi_k^TP_k\varphi_k}, \tag{6.57}$$

$$\varphi_k^T = [y_k \cdots y_{k+1-p} \ \ g_1(u_k) \cdots g_s(u_k) \ \cdots\ g_1(u_{k-q}) \cdots g_s(u_{k-q}) \ \ \hat{w}_k \cdots \hat{w}_{k+1-r}]. \tag{6.58}$$
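A compact Python sketch of the ELS recursion (6.55)-(6.58) is given below; the regressor layout follows (6.58), and the initialisation constants are illustrative.

```python
import numpy as np

def els(y, u, basis, p, q, r, alpha0=100.0):
    """Extended least squares (6.55)-(6.58) for the model (6.54);
    `basis` is a list of the functions g_1, ..., g_s."""
    s = len(basis)
    dim = p + (q + 1) * s + r
    theta = np.zeros(dim)
    P = alpha0 * np.eye(dim)
    w_hat = np.zeros(len(y))                       # post-fit residuals \hat{w}
    start = max(p, q, r)
    for k in range(start, len(y) - 1):
        g = [gj(u[k - i]) for i in range(q + 1) for gj in basis]
        phi = np.concatenate([y[k - np.arange(p)], g, w_hat[k - np.arange(r)]])
        rho = 1.0 / (1.0 + phi @ P @ phi)
        theta = theta + rho * (P @ phi) * (y[k + 1] - phi @ theta)   # (6.55)
        P = P - rho * np.outer(P @ phi, P @ phi)                     # (6.56)
        w_hat[k + 1] = y[k + 1] - phi @ theta                        # (6.57)
    return theta
```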
Remark 6.3. Let us write $\theta_k$ component-wise:

$$\theta_k^T \triangleq [-a_{1,k}, \cdots, -a_{p,k}, d_{1,k}, \cdots, d_{s,k}, (b_1d_1)_k, \cdots, (b_1d_s)_k, \cdots, (b_qd_1)_k, \cdots, (b_qd_s)_k, c_{1,k}, \cdots, c_{r,k}].$$

It is clear that $\frac{(b_id_j)_k}{d_{j,k}}$ may serve as an estimate of $b_i$, $i = 1, \cdots, q$, whenever $d_{j,k} \ne 0$, $j = 1, \cdots, s$.
To establish $\theta_k \xrightarrow[k\to\infty]{} \theta$ we need the following assumptions.

B1. $C^{-1}(z) - \frac{1}{2}$ is strictly positive real (SPR), i.e., $C^{-1}(e^{i\lambda}) + C^{-1}(e^{-i\lambda}) - 1 > 0$, $\forall\lambda \in [0, 2\pi]$;
B2. $\{w_n, \mathcal{F}_n\}$ is a martingale difference sequence with $\sup_{n\ge 0}E[|w_{n+1}|^{\beta}\,|\,\mathcal{F}_n] < \infty$ a.s. for some $\beta \ge 2$, where $\{\mathcal{F}_n\}$ is a sequence of nondecreasing $\sigma$-algebras and $u_n$ is $\mathcal{F}_n$-measurable;
B3. $A(z)$ is stable, i.e., $A(z) \ne 0$ $\forall |z| \le 1$;
B4. $A(z)$, $B(z)$, and $C(z)$ have no common factor and $[a_p\ \vdots\ b_q\ \vdots\ c_r] \ne 0$;
B5. The set of functions $\{g_i, i = 0, \cdots, s\}$ with $g_0 = 1$ is linearly independent over some interval $[\delta_1, \delta_2]$; $\{u_k\}$ is a sequence of iid random variables with density $p(x)$, which is positive and continuous over $[\delta_1, \delta_2]$, and $0 < Eg_i^2(u_k) < \infty$, $i = 1, \cdots, s$; the sequences $\{u_k\}$ and $\{w_k\}$ are mutually independent;
B6. $\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}w_i^2 = R_w > 0$ a.s.;
B7. $\sum_{i=1}^{s}d_i^2 \ne 0$.

Theorem 6.5. If B1-B7 hold, then as $n$ tends to infinity

$$\|\theta_{n+1} - \theta\|^2 = O\Big(\frac{\log n\,(\log\log n)^{\delta(\beta-2)}}{n}\Big) \ \text{a.s.,} \tag{6.59}$$

where $\delta(x) = 0$ for $x \ne 0$ and $\delta(0) = c$ for some $c > 1$.

Proof. Denote by $\lambda_{\max}^0(n)$ and $\lambda_{\min}^0(n)$ the maximal and minimal eigenvalues of $\sum_{i=0}^{n}\varphi_i^0\varphi_i^{0T} + \alpha_0 I$, respectively. By Theorem 4.2 in [4] it follows that
$$\|\theta_{n+1} - \theta\|^2 = O\Big(\frac{\log\lambda_{\max}^0(n)\,\big(\log\log\lambda_{\max}^0(n)\big)^{\delta(\beta-2)}}{\lambda_{\min}^0(n)}\Big) \ \text{a.s. as } n\to\infty, \tag{6.60}$$

if $\log\lambda_{\max}^0(n)\big(\log\log\lambda_{\max}^0(n)\big)^{\delta(\beta-2)} = o\big(\lambda_{\min}^0(n)\big)$ a.s. The conclusion (6.59) will follow from (6.60) if we can show that

$$\liminf_{n\to\infty}\frac{1}{n}\lambda_{\min}^0(n) > 0 \ \text{a.s.}$$

The proof is similar to that given in Theorem 6.2 of [4] (for (6.79) there); here we only outline the key points. It suffices to show

$$\liminf_{n\to\infty}\lambda_{\min}\Big\{\frac{1}{n}\sum_{i=0}^{n}f_if_i^T\Big\} > 0 \ \text{a.s.,}$$

where $f_i \triangleq A(z)\varphi_i^0$ and $\lambda_{\min}\{A\}$ denotes the minimal eigenvalue of a matrix $A$. For a fixed $\omega$ (sample path), the converse assumption leads to the existence of a subsequence

$$\eta_{n_k}^T \triangleq \big[\alpha_{n_k}^{(0)}, \cdots, \alpha_{n_k}^{(p-1)}, \beta_{n_k}^{(01)}, \cdots, \beta_{n_k}^{(0s)}, \cdots, \beta_{n_k}^{(q1)}, \cdots, \beta_{n_k}^{(qs)}, \gamma_{n_k}^{(0)}, \cdots, \gamma_{n_k}^{(r-1)}\big]$$

with $\|\eta_{n_k}\| = 1$ such that

$$\lim_{k\to\infty}\frac{1}{n_k}\sum_{i=0}^{n_k}\big(\eta_{n_k}^Tf_i\big)^2 = 0, \tag{6.61}$$

where

$$\eta_{n_k}^Tf_i = \Big[\sum_{j=0}^{\mu}h_{n_k}^{(1j)}z^j, \cdots, \sum_{j=0}^{\mu}h_{n_k}^{(sj)}z^j, \sum_{j=0}^{\mu}h_{n_k}^{(0j)}z^j\Big]\cdot[g_1(u_i), \cdots, g_s(u_i), w_i]^T,$$

$$\sum_{j=0}^{\mu}h_{n_k}^{(lj)}z^j = \sum_{j=0}^{p-1}\alpha_{n_k}^{(j)}B(z)z^{j+1}d_l + \sum_{j=0}^{q}\beta_{n_k}^{(jl)}A(z)z^j, \quad l = 1, \cdots, s,$$

$$\sum_{j=0}^{\mu}h_{n_k}^{(0j)}z^j = \sum_{j=0}^{p-1}\alpha_{n_k}^{(j)}C(z)z^j + \sum_{j=0}^{r-1}\gamma_{n_k}^{(j)}A(z)z^j,$$

with $\mu$ denoting the corresponding maximal degree. By boundedness of $\{\eta_{n_k}\}$, without loss of generality we may assume that $\eta_{n_k} \xrightarrow[k\to\infty]{} \eta$ with $\|\eta\| = 1$. By a treatment similar to that used for Theorem 6.4 of [4] it is shown that (6.61) implies $\eta = 0$. But this is impossible, since $1 \equiv \|\eta_{n_k}\| \xrightarrow[k\to\infty]{} \|\eta\|$. The obtained contradiction proves the theorem. For details we refer to [21]. $\square$
6.5 Concluding Remarks
We have proposed recursive algorithms to identify Hammerstein systems with nonlinearities that are i) non-parametrised; ii) piecewise linear functions; iii) represented as a sum of basis functions with unknown coefficients. In all three cases the estimates are proved to converge to the true values with probability one.

In Sections 6.2 and 6.3 the results are obtained for (6.2), where the system noise $\{w_k\}$ is uncorrelated. However, similar results may also be obtained for (6.1) with correlated system noise $C(z)w_k$, with the help of the technique developed in [8], without imposing SPR-like conditions on $C(z)$. In Section 6.3 the assumed availability of the upper bound $U$ for $d^+$ and $d^-$ may be removed by the treatment used in [13], where a recursive algorithm is proposed to generate such an upper bound. It would be of interest to consider the case where an internal noise exists, i.e., where the input of the linear subsystem includes not only $v_k$ but also an additive noise. Besides, it might also be of interest to consider coloured observation noise $\xi_k$.
6.6 Appendix
General Convergence Theorem (GCT) for SA

Let $f(\cdot) : \mathbb{R}^l \to \mathbb{R}^l$ be an unknown function, and denote its root set by $J \triangleq \{x \in \mathbb{R}^l : f(x) = 0\}$. Starting from any initial value $x_0$, the problem is to recursively construct $\{x_k\}$ approaching $J$ on the basis of the noisy observations $\{y_k\}$, where $y_{k+1} = f(x_k) + \varepsilon_{k+1}$ denotes the observation at time $k+1$ and $\varepsilon_{k+1}$ the observation noise, which may depend on $x_k$.

Take a sequence $\{M_k\}$ of positive numbers increasingly diverging to infinity and a fixed point $x^* \in \mathbb{R}^l$. Define $\{x_k\}$ by SAAWET as follows:

$$x_{k+1} = (x_k + a_ky_{k+1})I_{[\|x_k + a_ky_{k+1}\| \le M_{\sigma_k}]} + x^*I_{[\|x_k + a_ky_{k+1}\| > M_{\sigma_k}]}, \tag{6.62}$$

$$\sigma_k = \sum_{i=1}^{k-1}I_{[\|x_i + a_iy_{i+1}\| > M_{\sigma_i}]}, \quad \sigma_0 = 0. \tag{6.63}$$
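For reference, a generic implementation of (6.62)-(6.63) might look as follows; the step sizes and truncation bounds are illustrative choices satisfying S1 below, and `observe` is a hypothetical callable returning the noisy observation $y_{k+1}$.

```python
import numpy as np

def saawet(observe, x0, x_star, N, a=lambda k: 1.0 / k, M=lambda s: 2.0 ** s):
    """Generic SAAWET recursion (6.62)-(6.63); observe(x, k) must return
    y_{k+1} = f(x_k) + eps_{k+1}."""
    x, sigma = np.asarray(x0, dtype=float), 0
    for k in range(1, N + 1):
        cand = x + a(k) * observe(x, k)
        if np.linalg.norm(cand) <= M(sigma):
            x = cand                                 # ordinary step
        else:
            x = np.asarray(x_star, dtype=float)      # truncate: restart from x*
            sigma += 1                               # enlarge the bound
    return x
```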
The following assumptions will be used.

S1. $a_k > 0$, $a_k \xrightarrow[k\to\infty]{} 0$, and $\sum_{k=1}^{\infty}a_k = \infty$.
S2. There is a continuously differentiable function (not necessarily nonnegative) $v(\cdot) : \mathbb{R}^l \to \mathbb{R}$ such that

$$\sup_{\delta \le d(x,J) \le \Delta}f^T(x)v_x(x) < 0$$

for any $\Delta > \delta > 0$, where $d(x, J) \triangleq \inf_y\{\|x - y\| : y \in J\}$ and $v_x(\cdot)$ denotes the gradient of $v(\cdot)$. Further, $v(J) \triangleq \{v(x) : x \in J\}$ is nowhere dense, and $x^*$ used in (6.62) is such that $v(x^*) < \inf_{\|x\| = c_0}v(x)$ with $\|x^*\| < c_0$ for some $c_0 > 0$.
S3. $f(\cdot)$ is measurable and locally bounded.

To introduce the condition on the noise, denote by $(\Omega, \mathcal{F}, P)$ the underlying probability space, and let $\varepsilon_{k+1}(\cdot, \cdot) : (\mathbb{R}^l \times \Omega, \mathcal{B}^l \times \mathcal{F}) \to (\mathbb{R}^l, \mathcal{B}^l)$ be a measurable function defined on the product space. Let the noise $\varepsilon_{k+1}$ be given by

$$\varepsilon_{k+1} = \varepsilon_{k+1}(x_k, \omega), \quad \omega \in \Omega.$$

S4. For the fixed sample path $\omega$ under consideration,

$$\lim_{T\to 0}\limsup_{k\to\infty}\frac{1}{T}\Big\|\sum_{i=n_k}^{m(n_k,T_k)}a_i\varepsilon_{i+1}(x_i(\omega), \omega)\Big\| = 0, \quad \forall T_k \in [0, T], \tag{6.64}$$

along the subscripts $\{n_k\}$ of any convergent subsequence $x_{n_k}(\omega)$, where

$$m(k, T) \triangleq \max\Big\{m : \sum_{i=k}^{m}a_i \le T\Big\}. \tag{6.65}$$

The algorithm (6.62)-(6.63) is considered for a fixed $\omega$, but the $\omega$ in $x_i(\omega)$ is often suppressed.

Theorem 6.6 (GCT). Let $\{x_k\}$ be given by (6.62)-(6.63) for a given initial value $x_0$, and assume S1-S3 hold. Then $d(x_k, J) \xrightarrow[k\to\infty]{} 0$ for any sample path $\omega$ for which S4 holds.

For the proof we refer to Theorem 2.2.1 in [5]. It is worth noting that Condition S4 is imposed only along convergent subsequences, so the condition is of a local character. In contrast, a condition would be of a global character if it had to be verified along the whole sequence $\{x_n\}$.

Acknowledgements. This work is supported by NSFC Grants No. 60821091 and 60874001 and by a grant from the National Laboratory of Space Intelligent Control.
References
1. Bai, E.W.: Convergence of the iterative Hammerstein system identification algorithm. IEEE Trans. Automat. Contr. 49(11), 1929–1940 (2004)
2. Bai, E.W., Li, D.: Convergence of the iterative Hammerstein system identification algorithm. IEEE Transactions on Automatic Control 49, 1929–1940 (2004)
3. Boutayeb, M., Rafaralahy, H., Drouach, M.: A robust and recursive identification method for the Hammerstein model. In: Proc. 13th IFAC World Congress, San Francisco, CA, vol. I, pp. 447–452 (1996)
4. Chen, H.F., Guo, L.: Identification and Stochastic Adaptive Control. Birkhäuser, Boston (1991)
5. Chen, H.F.: Stochastic Approximation and Its Applications. Kluwer, Dordrecht (2002)
6. Chen, H.F.: Pathwise convergence of recursive identification algorithms for Hammerstein systems. IEEE Trans. Automat. Contr. 49(10), 1641–1649 (2004)
7. Chen, H.F.: Strong consistency of recursive identification for Hammerstein systems with discontinuous piecewise-linear memoryless block. IEEE Trans. Automat. Contr. 50(10), 1612–1617 (2005)
8. Chen, H.F.: New approach to identification for ARMAX systems. IEEE Trans. Autom. Control 55, 868–879 (2010)
9. Chow, Y.S., Teicher, H.: Probability Theory. Springer, New York (1978)
10. Eskinat, E., Johnson, S., Luyben, W.L.: Use of Hammerstein models in identification of nonlinear systems. AIChE J. 37(2), 255–268 (1991)
11. Greblicki, W.: Stochastic approximation in nonparametric identification of Hammerstein systems. IEEE Trans. Automat. Contr. 47(11), 1800–1810 (2002)
12. Greblicki, W., Pawlak, M.: Nonparametric identification of Hammerstein systems. IEEE Trans. Inform. Theory 35(3), 409–418 (1989)
13. Huang, Y.Q., Chen, H.F., Fang, H.T.: Identification of Wiener systems with nonlinearity being piecewise-linear function. Science in China, Series F 51(1), 1–12 (2008)
14. Ljung, L.: System Identification. Prentice Hall, Upper Saddle River (1987)
15. Narendra, K.S., Gallman, P.G.: An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Trans. Automat. Contr. 11(7), 546–550 (1966)
16. Pawlak, M.: On the series expansion approach to the identification of Hammerstein systems. IEEE Trans. Automat. Contr. 36, 459–476 (1982)
17. Stoica, P.: On the convergence of an iterative algorithm used for Hammerstein system identification. IEEE Trans. Automat. Contr. 26(4), 967–969 (1981)
18. Verhaegen, M., Westwick, D.: Identifying MIMO Hammerstein systems in the context of subspace model identification methods. Int. J. Control 63(2), 331–349 (1996)
19. Vörös, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33(6), 1141–1146 (1997)
20. Walker, A.M.: Large-sample estimation of parameters for autoregressive processes with moving-average residuals. Biometrika 49(1-2) (1962)
21. Zhao, W.X.: Parametric identification of Hammerstein systems with consistency results using stochastic inputs. IEEE Trans. Automat. Contr. 55, 474–480 (2010)
22. Zhao, W.X., Chen, H.F.: Recursive identification for Hammerstein systems with ARX subsystem. IEEE Trans. Automat. Contr. 51(12), 1966–1974 (2006)
Chapter 7
Wiener System Identification Using the Maximum Likelihood Method
Adrian Wills and Lennart Ljung

Dedicated to Anna Hagenblad (1971-2009). Much of the research presented in this chapter was initiated and pursued by Anna as part of her work towards a Ph.D. thesis, which she sadly never had the opportunity to finish. Her interest in this research area spanned nearly ten years and her contributions were significant. She will be missed. We dedicate this work to the memory of Anna.
Adrian Wills: School of Electrical Engineering and Computer Science, University of Newcastle, Callaghan, NSW 2308, Australia.
Lennart Ljung: Division of Automatic Control, Linköpings universitet, SE-581 80 Linköping, Sweden.

7.1 Introduction
Within the class of nonlinear system models, the so-called block-oriented models have gained wide recognition and attention in the system identification and automatic control community. Typically, these models are constructed by joining linear dynamic system blocks with static nonlinear mappings in various forms of interconnection. The Wiener model depicted in Figure 7.1 is one such block-oriented model; see, e.g., [2], [18] or [9]. It is typically comprised of two blocks, where the first is linear and dynamic and the second is nonlinear and static.

[Fig. 7.1: The Wiener model. The input u(t) and the output y(t) are measurable, but not the intermediate signal x(t); w(t) and e(t) are noise sources; x0(t) denotes the output of the linear dynamic system G(q, ϑ); f(·, η) is nonlinear and static (memoryless).]

From one perspective, these models are reasonable since they often reflect the physical realities of a system. Some examples of this include distillation columns [24], pH control processes [11], and biological examples [10]. More generally, they accurately model situations where the output of a linear system is obtained using a nonlinear measurement device. From another perspective, if the blocks of a Wiener model are multivariable, then it can be shown [3] that almost any nonlinear system can be approximated arbitrarily well using them. However, this is not the focus of the current chapter, where single-input single-output systems are considered.

With this as motivation, in this chapter we are concerned with estimating Wiener models based on input and/or output measurements. To make these ideas more precise, we adopt the notation used in Figure 7.1 throughout the remainder of this chapter. In particular, the input signal is denoted by $u(t)$, the output signal by $y(t)$, and $x(t)$ denotes the intermediate unmeasured signal. The disturbance term $w(t)$ is henceforth called the process noise and $e(t)$ is called the measurement noise, as usual. These noise terms are assumed to be mutually independent. Using this notation, the Wiener system can be described by the following equations:

$$x_0(t) = G(q, \vartheta)u(t), \quad x(t) = x_0(t) + w(t), \quad y(t) = f\big(x(t), \eta\big) + e(t). \tag{7.1}$$
Throughout this chapter it is assumed that $f$ and $G$ each belong to a parametrised model class. Typical classes for the nonlinear term $f$ include basis function expansions such as polynomials, splines, or neural networks. The nonlinearity $f$ may also be a piecewise linear function, such as a dead-zone or saturation function. Typical classes for the linear term $G$ include rational transfer functions and linear state-space models.

It is important to note that if the process noise $w$ and the intermediate signal $x$ are unknown, then the parametrisation of the Wiener model is not unique. For example, scaling the linear block $G$ to $\kappa G$ and scaling the nonlinear block $f$ to $f(\frac{1}{\kappa}\,\cdot)$ will result in identical input-output behaviour. (It may be necessary to scale the process noise variance by a factor $\kappa^2$.)

Based on the above description, the problem addressed in this chapter is to estimate the parameters $\vartheta$ within the model class for $G$ and $\eta$ within the model class for $f$ that best match the measured output data from the system. For convenience, we define the joint parameter vector

$$\theta = [\vartheta^T, \eta^T]^T, \tag{7.2}$$

which will be used throughout this chapter.
7.2 An Output-error Approach
While there are several methods for identifying Wiener models proposed in the literature, the most dominant is to parametrise the linear and nonlinear blocks, and then estimate the parameters from data by minimising an output-error criterion (this has been used in [1], [21] and [22], for example). In particular, if the process noise $w(t)$ in Figure 7.1 is ignored, then a natural criterion to minimise is

$$V_N(\theta) = \frac{1}{N}\sum_{t=1}^{N}\Big[y(t) - f\big(G(q, \vartheta)u(t), \eta\big)\Big]^2. \tag{7.3}$$

This approach is standard in software packages such as [23] and [12]. If it is true that the process noise $w(t)$ is zero, then (7.3) becomes the prediction-error criterion. Furthermore, if the measurement noise is white and Gaussian, (7.3) is also the Maximum Likelihood criterion and the estimate is therefore consistent [13].

Even in the case where there is process noise, it may still seem reasonable to use an output-error criterion like (7.3) to obtain an estimate. However, $f\big(G(q, \vartheta)u(t), \eta\big)$ is not the true predictor in this case, and it has been shown in [8] that this can result in biased estimates. A further difficulty with this approach is that it cannot directly handle the case of blind Wiener model estimation, where the process noise is assumed to be zero but the input $u(t)$ is not measured. Related criteria to (7.3) have been derived for this case, but they assume that the nonlinearity is invertible and/or that the measurement noise is not present [20, 19].

By way of motivating the main tool in this chapter, namely Maximum Likelihood estimation, the next section provides conditions for the estimates of (7.3) to be consistent. It is shown by example that using the output-error criterion can produce biased estimates. These results appeared in [8].
Consistent Estimates
Consider a Wiener system in the form of Figure 7.1 and Equation (7.1), and assume we have measurements of the input and output generated according to some "true" parameters $(\vartheta_0, \eta_0)$, i.e.,

$$y(t) = f\big(G(q, \vartheta_0)u(t) + w(t), \eta_0\big) + e(t). \tag{7.4}$$

Based on the measured inputs and outputs, we would like to find estimates of these parameter values, $(\hat{\vartheta}, \hat{\eta})$ say, that are close to the true parameters. A more precise way of describing this is to say that an estimate is consistent if the parameters converge to their true values as the number of data $N$ tends to infinity.

In order to make this idea concrete for the output-error criterion in (7.3), we write the true system (7.4) as

$$y(t) = f\big(G(q, \vartheta_0)u(t), \eta_0\big) + \tilde{w}(t) + e(t), \tag{7.5}$$

where

$$\tilde{w}(t) = f\big(G(q, \vartheta_0)u(t) + w(t), \eta_0\big) - f\big(G(q, \vartheta_0)u(t), \eta_0\big). \tag{7.6}$$

The new disturbance term $\tilde{w}(t)$ may be regarded as an (input-dependent) transformation of the process noise $w(t)$ to the output. This transformation will most likely distort the stochastic properties of $w(t)$, such as mean and variance. By inserting the expression for $y$ in (7.5) into the criterion (7.3), we obtain

$$V_N(\theta) = \frac{1}{N}\sum_{t=1}^{N}\big[f_0 - f + \tilde{w}(t) + e(t)\big]^2 = \frac{1}{N}\sum_{t=1}^{N}\big[f_0 - f\big]^2 + \frac{1}{N}\sum_{t=1}^{N}\big[\tilde{w}(t) + e(t)\big]^2 + \frac{2}{N}\sum_{t=1}^{N}\big[f_0 - f\big]\big[\tilde{w}(t) + e(t)\big], \tag{7.7}$$

where

$$f_0 \triangleq f\big(G(q, \vartheta_0)u(t), \eta_0\big), \quad f \triangleq f\big(G(q, \vartheta)u(t), \eta\big). \tag{7.8}$$

Further, assume that all noise terms are ergodic, so that time averages tend to their mathematical expectations as $N$ tends to infinity. Assume also that $u$ is a (quasi-)stationary sequence [13], so that it also has well-defined sample averages. Let $E$ denote both mathematical expectation and averaging over time signals (cf. $\bar{E}$ in [13]). Using the fact that the measurement noise $e$ is zero mean and independent of the input $u$ and the process noise $w$, several cross terms disappear, and the criterion tends to

$$\bar{V}(\theta) = E\big[f_0 - f\big]^2 + E\tilde{w}^2(t) + Ee^2(t) + 2E\big[(f_0 - f)\tilde{w}(t)\big]. \tag{7.9}$$

The last term in this expression cannot necessarily be removed, since the transformed process noise $\tilde{w}$ need not be independent of $u$. The criterion (7.9) has a quadratic form, and the true values $(\vartheta_0, \eta_0)$ will minimise the criterion if and essentially only if

$$E\Big[\Big(f\big(G(q, \vartheta_0)u(t), \eta_0\big) - f\big(G(q, \vartheta)u(t), \eta\big)\Big)\tilde{w}(t)\Big] = 0. \tag{7.10}$$
Typically, this will not hold, due to the possible dependence between $u$ and $\tilde{w}$. The parameter estimates will therefore be biased in general. To illustrate this, we provide an example below.

Example 7.1. Consider the following Wiener system, with linear dynamic part described by

$$x_0(t) + 0.5x_0(t-1) = u(t-1), \quad x(t) = x_0(t) + w(t), \tag{7.11}$$

followed by a static nonlinearity described by a second-order polynomial,

$$f\big(x(t)\big) = c_0 + c_1x^2(t), \quad y(t) = f\big(x(t)\big) + e(t). \tag{7.12}$$

The goal is to estimate the nonlinearity parameters, denoted here by $\hat{c}_0$ and $\hat{c}_1$. In this case it is possible to provide expressions for the analytical minimum of criterion (7.3). Recall that in (7.3) the process noise $w(t)$ is assumed to be zero. Therefore, the predicted output can be expressed as

$$\hat{y}(t) = f\big(G(q, \vartheta)u(t), \eta\big) = f\big(x_0(t), \eta\big) = \hat{c}_0 + \hat{c}_1x_0^2(t). \tag{7.13}$$

Assume that all signals, noises as well as inputs, are Gaussian, zero mean, and ergodic. Let $\lambda_x$ denote the variance of $x_0$, $\lambda_w$ the variance of $w$, and $\lambda_e$ the variance of $e$. As $N$ tends to infinity, the criterion (7.3) tends to the limit (7.9):

$$\bar{V} = E(y - \hat{y})^2 = E\big[c_0 + c_1(x_0 + w)^2 + e - \hat{c}_0 - \hat{c}_1x_0^2\big]^2 = E\big[(c_1 - \hat{c}_1)x_0^2 + c_0 - \hat{c}_0 + 2c_1x_0w + c_1w^2 + e\big]^2.$$

All cross terms involving $e$ or odd powers of the independent zero-mean Gaussian variables vanish. The fourth-order moments are $Ex_0^4 = 3\lambda_x^2$ and $Ew^4 = 3\lambda_w^2$. This leaves

$$\bar{V} = 3(c_1 - \hat{c}_1)^2\lambda_x^2 + (c_0 - \hat{c}_0)^2 + 4c_1^2\lambda_x\lambda_w + 3c_1^2\lambda_w^2 + \lambda_e + 2(c_0 - \hat{c}_0)(c_1 - \hat{c}_1)\lambda_x + 2c_1(c_1 - \hat{c}_1)\lambda_x\lambda_w + 2c_1(c_0 - \hat{c}_0)\lambda_w.$$

From this expression the gradient with respect to each $\hat{c}_i$ can be computed, and the minimum is found by solving

$$(c_0 - \hat{c}_0) + (c_1 - \hat{c}_1)\lambda_x + c_1\lambda_w = 0, \qquad 3(c_1 - \hat{c}_1)\lambda_x^2 + (c_0 - \hat{c}_0)\lambda_x + c_1\lambda_x\lambda_w = 0,$$

with the solution

$$\hat{c}_0 = c_0 + c_1\lambda_w, \qquad \hat{c}_1 = c_1.$$

Therefore, the estimate of $c_0$ is clearly biased.
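The bias $c_1\lambda_w$ is easy to reproduce numerically. The following sketch (with illustrative values for $c_0$, $c_1$, and $\lambda_w$, and the linear block assumed known) fits (7.13) by linear least squares, which minimises the output-error criterion (7.3) exactly in this linear-in-parameters case.

```python
import numpy as np

rng = np.random.default_rng(1)
N, c0, c1, lam_w = 200_000, 1.0, 2.0, 0.5            # illustrative true values

u = rng.standard_normal(N)
x_lin = np.zeros(N)
for t in range(1, N):
    x_lin[t] = -0.5 * x_lin[t - 1] + u[t - 1]        # linear block (7.11)
x = x_lin + np.sqrt(lam_w) * rng.standard_normal(N)  # add process noise w(t)
y = c0 + c1 * x ** 2 + 0.1 * rng.standard_normal(N)  # nonlinearity + e(t), (7.12)

# Output-error fit of (7.13): linear least squares in (c0_hat, c1_hat)
Phi = np.column_stack([np.ones(N), x_lin ** 2])
c_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]
print(c_hat, "expected bias: c0 + c1*lam_w =", c0 + c1 * lam_w)
```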
Motivated by the above example, the next section investigates the use of the Maximum Likelihood criterion to estimate the system parameters, which is known to produce consistent estimates under the assumptions of this chapter [13].

7.3 The Maximum Likelihood Method
The maximum likelihood method provides estimates of the parameter values $\theta$ based on an observed data set $Y_N = \{y(1), y(2), \ldots, y(N)\}$ by maximising a likelihood function. In order to use this method it is therefore necessary to first derive an expression for the likelihood function itself.

The likelihood function is the probability density function (PDF) of the outputs, parametrised by $\theta$. We shall assume for the moment that the input sequence $U_N = \{u(1), u(2), \ldots, u(N)\}$ is a given, deterministic sequence (the case of blind Wiener estimation, where the input is assumed to be stochastic, is subsumed by the coloured process noise case in Section 7.3.2). This likelihood will be denoted by $p_\theta(Y_N)$, and the Maximum Likelihood (ML) estimate is obtained via

$$\hat{\theta} = \arg\max_\theta p_\theta(Y_N). \tag{7.14}$$

This approach enjoys a long and fruitful history within the system identification community because of its statistical efficiency in producing consistent estimates (see, e.g., [13]).

In the following sections we provide expressions for the likelihood function for various Wiener models. In particular, we first consider the system depicted in Figure 7.1, and then a related one in which the process noise is allowed to be coloured. Finally, we consider the case where the input signal is unknown (the so-called blind estimation problem). Based on these expressions, Section 7.4 provides algorithms for computing the ML estimate. This includes the direct gradient-based approach for models in the form of Figure 7.1, which was presented in [8]. In addition, the Expectation-Maximisation approach is presented for the case of coloured process noise.
7.3.1 Likelihood Function for White Disturbances
For the Wiener model in Figure 7.1 we assume that the disturbance sequences $e(t)$ and $w(t)$ are each white noise. This means that, for a given input sequence $U_N$, $y(t)$ will also be a sequence of independent variables. This in turn implies that the PDF of $Y_N$ is the product of the PDFs of $y(t)$, $t = 1, \ldots, N$. Therefore, it is sufficient to derive the PDF of $y(t)$. To simplify notation we write $y(t) = y$, $x(t) = x$.

As a means of expressing this PDF, we first introduce the intermediate signal $x$ (see Figure 7.1) as a nuisance parameter. The benefit of introducing this term is that the PDF of $y$ given $x$ is basically a reflection of the PDF of $e$, since $y(t) = f\big(x(t)\big) + e(t)$; hence

$$p_y(y|x) = p_e\big(y - f(x, \eta)\big), \tag{7.15}$$

where $p_e$ is the PDF of $e$. In a similar manner, the PDF of $x$ given $U_N$ can be obtained by noting that

$$x(t) = G(q, \vartheta)u(t) + w(t) = x_0(t, \vartheta) + w(t). \tag{7.16}$$

For a given $U_N$ and $\vartheta$, $x_0$ is a known, deterministic variable, and hence

$$p_x(x) = p_w\big(x - x_0(\vartheta)\big) = p_w\big(x - G(q, \vartheta)u(t)\big), \tag{7.17}$$

where $p_w$ is the PDF of $w$. Since $x(t)$ is not measured, we must integrate over all $x \in \mathbb{R}$ in order to eliminate it from the expressions, and we obtain

$$p_y(y) = \int_{x\in\mathbb{R}}p_{x,y}(x, y)\,dx = \int_{x\in\mathbb{R}}p_y(y|x)\,p_x(x)\,dx = \int_{x\in\mathbb{R}}p_e\big(y - f(x, \eta)\big)\,p_w\big(x - G(q, \vartheta)u(t)\big)\,dx. \tag{7.18}$$

In order to proceed further, it is necessary to assume a PDF for $e$ and $w$. We assume that the process noise $w(t)$ and the measurement noise $e(t)$ are Gaussian, with zero means and variances $\lambda_w$ and $\lambda_e$ respectively, i.e.,

$$p_e(\varepsilon) = \frac{1}{\sqrt{2\pi\lambda_e}}e^{-\frac{1}{2\lambda_e}\varepsilon^2} \quad \text{and} \quad p_w(v) = \frac{1}{\sqrt{2\pi\lambda_w}}e^{-\frac{1}{2\lambda_w}v^2}. \tag{7.19}$$

Since the noise is white, the joint likelihood can be expressed as the product over all time instants:

$$p_\theta(Y_N) = \Big(\frac{1}{2\pi\sqrt{\lambda_e\lambda_w}}\Big)^N\prod_{t=1}^{N}\int_{-\infty}^{\infty}e^{-\frac{1}{2}\varepsilon(t,\theta)}\,dx(t), \tag{7.20}$$

where

$$\varepsilon(t, \theta) = \frac{1}{\lambda_e}\big[y(t) - f\big(x(t), \eta\big)\big]^2 + \frac{1}{\lambda_w}\big[x(t) - G(q, \vartheta)u(t)\big]^2. \tag{7.21}$$

Therefore, when provided with the observed data $U_N$ and $Y_N$, we can calculate $p_\theta(Y_N)$ and its gradients for each $\theta$. This means that the ML criterion (7.14) can be maximised numerically. This approach is detailed in Section 7.4.1. The derivation of the likelihood function appeared in [7] and [8].
7.3.2 Likelihood Function for Coloured Process Noise
If the process noise is coloured, we may represent the Wiener system as in Figure 7.2. In this case, the equations for the output are given by

$$x(t) = G(q, \vartheta)u(t) + H(q, \vartheta)w(t), \quad y(t) = f\big(x(t), \eta\big) + e(t). \tag{7.22}$$

By using the predictor form, see [13], we may write this as

$$x(t) = \hat{x}(t|X_{t-1}, U_t, \vartheta) + w(t), \tag{7.23}$$
$$\hat{x}(t|X_{t-1}, U_t, \vartheta) = H^{-1}(q, \vartheta)G(q, \vartheta)u(t) + \big(1 - H^{-1}(q, \vartheta)\big)x(t), \tag{7.24}$$
$$y(t) = f\big(x(t), \eta\big) + e(t). \tag{7.25}$$

In the above, $X_{t-1}$ denotes the sequence $X_{t-1} = \{x(1), \ldots, x(t-1)\}$, and similarly for $U_t$. The only stochastic parts are $e$ and $w$; hence, for a given sequence $X_N$, the joint PDF of $Y_N$ is obtained in the standard way:

$$p_{Y_N}(Y_N|X_N) = \prod_{t=1}^{N}p_e\big(y(t) - f(x(t), \eta)\big). \tag{7.26}$$

On the other hand, the joint PDF of $X_N$ is given by (cf. Equation (5.74), Lemma 5.1, in [13])

$$p_{X_N}(X_N) = \prod_{t=1}^{N}p_w\big(x(t) - \hat{x}(t|X_{t-1}, U_t, \vartheta)\big). \tag{7.27}$$

The likelihood function for $Y_N$ is thus obtained from (7.26) by integrating out the nuisance parameter $X_N$ using its PDF (7.27):

$$p_\theta(Y_N) = \int\prod_{t=1}^{N}p_w\big(H^{-1}(q, \vartheta)[x(t) - G(q, \vartheta)u(t)]\big)\,p_e\big(y(t) - f(x(t), \eta)\big)\,dX_N. \tag{7.28}$$

Unfortunately, in this case filtered versions of $x(t)$ enter the integral, which means that the integration is a true multidimensional integral over the entire sequence $X_N$.

[Fig. 7.2: Wiener model with coloured process noise. Both w(t) and e(t) are white noise sources, but w(t) is filtered through H(q, ϑ) before being added to x0(t).]

This is likely to be intractable using direct integration methods in practice, unless the inverse noise filters are short FIR filters. Motivated by this, we here adopt another approach, whereby the noise filter $H$ is described in state-space form as

$$H(q, \vartheta) = C(\vartheta)\big(qI - A(\vartheta)\big)^{-1}B(\vartheta), \tag{7.29}$$

where $A$, $B$, $C$ are state-space matrices, and the state update is described via

$$\xi(t+1) = A(\vartheta)\xi(t) + B(\vartheta)w(t). \tag{7.30}$$

Therefore, according to Figure 7.2, the output can be expressed as

$$y(t) = f\big(C(\vartheta)\xi(t) + G(q, \vartheta)u(t), \eta\big) + e(t). \tag{7.31}$$

Equations (7.30) and (7.31) are in the form of a nonlinear state-space model, which has recently been considered in [17]. In that paper the authors use the Expectation-Maximisation algorithm in conjunction with particle methods to compute the ML estimate. We also adopt this technique here; it is detailed in Section 7.4.2.

Blind estimation
Note that if the linear term $G$ were zero, then the above system would become a blind Wiener model, so that (7.31) becomes

$$y(t) = f\big(C(\vartheta)\xi(t), \eta\big) + e(t), \tag{7.32}$$

and the parameters in $H$ and $f$ must be estimated from the output measurements only. This case is profiled via a simulation example in Section 7.5.3.
7.4 Maximum Likelihood Algorithms
For the case of white Gaussian process and measurement noise described in Section 7.3.1, it was mentioned that numerical methods can be used to evaluate the likelihood integral in Equation (7.20). At the same time, these methods can be used to compute the gradient for use in a gradient-based search procedure for finding the maximum likelihood estimate. This is the approach outlined in Section 7.4.1 below and profiled in Section 7.5 by way of simulation examples.

While this method is very useful and practical, it does not handle the estimation of the parameters of a colouring filter for the case discussed in Section 7.3.2, nor does it handle the blind estimation case discussed there. Therefore, we present an alternative method based on the Expectation Maximisation (EM) approach in Section 7.4.2 below. A key point to note is that this method requires a nonlinear smoothing operation, which is achieved via particle methods. Again, the resulting algorithm is profiled in Section 7.5 by way of simulation studies.
7.4.1 Direct Gradient-based Search Approach
In this section we are concerned with maximising the likelihood function described in (7.20) and (7.21) via gradient-based search. In order to avoid numerical conditioning issues, we consider the equivalent problem of maximising the log-likelihood function:

$$\hat{\theta} = \arg\max_\theta L(\theta), \tag{7.33}$$

where

$$L(\theta) \triangleq \log p_\theta(Y_N) \tag{7.34}$$
$$= -N\log(2\pi) - \frac{N}{2}\log(\lambda_w\lambda_e) + \sum_{t=1}^{N}\log\Big(\int_{-\infty}^{\infty}e^{-\frac{1}{2}\varepsilon(t,\theta)}\,dx\Big), \tag{7.35}$$

and $\varepsilon(t, \theta)$ is given by Equation (7.21).

To solve (7.33) we employ an iterative gradient-based approach. Typically, such an approach proceeds by computing a "search direction" and then increasing the function $L$ along this direction to obtain a new parameter estimate. The search direction is usually determined so that it forms an acute angle with the gradient, since under these conditions it can be shown to increase the cost when added to the current estimate.
(7.36)
where gk is the derivative of L with respect to θ evaluated at θk and Hk−1 is a symmetric matrix. If a Newton direction is desired, then Hk−1 would be the inverse of Hessian matrix, but the Hessian matrix itself may be quite expensive to compute. However, the structure in (7.34) is directly amenable to using Gauss-Newton gradient based search [4], which provides a good approximation to the Hessian. Here, however, we employ a quasi-Newton method where Hk is updated at each iteration based on local gradient information so that it resembles the Hessian matrix in the limit. In particular, we use the well-known BFGS update strategy [15, Section 6.1], which can guarantee that Hk is negative definite and symmetric so that pk = −Hk gk
(7.37)
maximises (7.36). The new parameter estimate θk+1 is then obtained by updating the previous one via θk+1 = θk + αk pk , (7.38) where αk is selected such that L(θk + αk pk ) > L(θk ).
(7.39)
Evaluating the cost $L(\theta_k)$ and its derivative $g_k$ is essential to the success of the above approach. For the cost, we see from (7.34) that this requires the evaluation of an integral. Similarly, the $i$'th element of the gradient vector $g_k$, denoted $g_k(i)$, is given by

$$g_k(i) = -\left[\frac{N}{2}\frac{\partial\log\lambda_w}{\partial\theta(i)} + \frac{N}{2}\frac{\partial\log\lambda_e}{\partial\theta(i)} + \frac{1}{2}\sum_{t=1}^{N}\frac{\int_{-\infty}^{\infty}\frac{\partial\varepsilon(t,\theta)}{\partial\theta(i)}e^{-\frac{1}{2}\varepsilon(t,\theta)}\,dx}{\int_{-\infty}^{\infty}e^{-\frac{1}{2}\varepsilon(t,\theta)}\,dx}\right]_{\theta=\theta_k}, \tag{7.40}$$

so that computing the gradient vector also requires the evaluation of integrals.

The integrals in (7.34) and (7.40) are evaluated numerically in this chapter. In particular, we employ a fixed-interval grid over $x$ and use the composite Simpson's rule to obtain the approximation [16, Chapter 4]. The reason for employing a fixed grid (it need not be of fixed interval, as used here) is that it allows straightforward computation of $L(\theta_k)$ and its derivative $g_k$ at the same grid points. This is detailed in Algorithm 7.1 below and used in the simulations in Section 7.5.

Algorithm 7.1 (Numerical computation of likelihood and derivatives). Given an odd number of grid points $M$, the parameter vector $\theta$, and the data $U_N$ and $Y_N$, perform the following steps. (After the algorithm has terminated, $\bar{L}$ approximates the negative log-likelihood and $\bar{g}$ its gradient: $L \approx -\bar{L}$ and $g \approx -\bar{g}$.)

1. Simulate the system $x_0(t) = G(\vartheta, q)u(t)$.
2. Specify the grid vector $\Delta \in \mathbb{R}^M$ as $M$ equidistant points between the limits $[a, b]$, so that $\Delta(1) = a$ and $\Delta(i+1) = \Delta(i) + (b-a)/M$ for all $i = 1, \ldots, M-1$.
3. Set $\bar{L} = N\log(2\pi) + \frac{N}{2}\log(\lambda_w\lambda_e)$ and $\bar{g}(i) = 0$ for $i = 1, \ldots, n_\theta$.
4. For $t = 1 : N$:
   a. For $j = 1 : M$, compute
      $$x = x_0(t) + \Delta(j), \quad \alpha = x - x_0(t), \quad \beta = y(t) - f(x, \eta),$$
      $$\gamma_j = e^{-\frac{1}{2}(\alpha^2/\lambda_w + \beta^2/\lambda_e)}, \quad \delta_j(i) = \gamma_j\frac{\partial\varepsilon(t,\theta)}{\partial\theta(i)}, \quad i = 1, \ldots, n_\theta.$$
   b. Compute
      $$\kappa = \frac{b-a}{3M}\Big(\gamma_1 + 4\sum_{j=1}^{\frac{M-1}{2}}\gamma_{2j} + 2\sum_{j=1}^{\frac{M-3}{2}}\gamma_{2j+1} + \gamma_M\Big),$$
      $$\pi(i) = \frac{b-a}{3M}\Big(\delta_1(i) + 4\sum_{j=1}^{\frac{M-1}{2}}\delta_{2j}(i) + 2\sum_{j=1}^{\frac{M-3}{2}}\delta_{2j+1}(i) + \delta_M(i)\Big), \quad i = 1, \ldots, n_\theta,$$
      $$\bar{L} = \bar{L} - \log(\kappa), \quad \bar{g}(i) = \bar{g}(i) + \frac{1}{2}\Big(\frac{\partial\log(\lambda_w\lambda_e)}{\partial\theta(i)} + \frac{\pi(i)}{\kappa}\Big), \quad i = 1, \ldots, n_\theta.$$
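A Python rendering of Algorithm 7.1 might look as follows. It assumes, for simplicity, that the noise variances are held fixed and that $\theta$ enters only through the nonlinearity $f$, so that $\partial\varepsilon/\partial\theta(i) = -(2/\lambda_e)\beta\,\partial f/\partial\theta(i)$; `sim_G`, `f`, and `dfdtheta` are hypothetical callables.

```python
import numpy as np

def neg_log_lik_and_grad(theta, u, y, sim_G, f, dfdtheta, lam_w, lam_e,
                         a=-6.0, b=6.0, M=1001):
    """Sketch of Algorithm 7.1 (fixed lam_w, lam_e); returns (L_bar, g_bar),
    the negative log-likelihood and its gradient w.r.t. theta."""
    N, n_theta = len(y), len(theta)
    x0 = sim_G(theta, u)                       # step 1: x0(t) = G(theta, q) u(t)
    grid = np.linspace(a, b, M)                # step 2 (M odd)
    # Simpson weights 1, 4, 2, 4, ..., 2, 4, 1 scaled by (b - a) / (3 M)
    wts = np.ones(M); wts[1:-1:2] = 4.0; wts[2:-1:2] = 2.0
    wts *= (b - a) / (3.0 * M)
    L_bar = N * np.log(2 * np.pi) + 0.5 * N * np.log(lam_w * lam_e)   # step 3
    g_bar = np.zeros(n_theta)
    for t in range(N):                         # step 4
        x = x0[t] + grid
        alpha, beta = x - x0[t], y[t] - f(x, theta)
        gam = np.exp(-0.5 * (alpha ** 2 / lam_w + beta ** 2 / lam_e))
        deps = -2.0 / lam_e * beta[None, :] * dfdtheta(x, theta)  # (n_theta, M)
        kappa = wts @ gam
        pi_i = (wts[None, :] * gam[None, :] * deps).sum(axis=1)
        L_bar -= np.log(kappa)
        g_bar += 0.5 * pi_i / kappa
    return L_bar, g_bar
```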
7.4.2 Expectation Maximisation Approach
In this section we address the coloured process noise case introduced in Section 7.3.2. As mentioned in that section, the likelihood function as expressed in (7.28) involves the evaluation of a high-dimensional integral, which is not tractable on desktop computers. To tackle this problem, the output $y(t)$ was expressed as a nonlinear state-space model via (7.31), (7.29), and (7.30). In this form, the problem is directly amenable to the recently developed Expectation Maximisation (EM) algorithm described in [17]. This section details the EM approach as applied to the coloured process noise case; it is also directly applicable to the blind estimation case discussed in Section 7.3.2.

In keeping with the notation already defined in Section 7.4.1, the EM algorithm is a method for computing $\hat{\theta}$ in (7.33) that is very general and addresses a wide range of applications. Key to both its implementation and theoretical underpinnings is the consideration of a joint log-likelihood function of both the measurements $Y_N$ and the so-called "missing data" $Z$:

$$L_{Z,Y_N}(\theta) = \log p_\theta(Z, Y_N). \tag{7.41}$$

In some cases, the missing data is quite literally measurements that are absent for some reason. More generally though, the missing data $Z$ consists of "measurements" that, while not available, would be useful to the estimation problem. As such, the choice of $Z$ is a design variable in the deployment of the EM algorithm. For the case in Section 7.3.2, this choice is naturally the missing state sequence

$$Z = \{\xi_1, \ldots, \xi_N\}, \tag{7.42}$$
since if it were known or measured, then the problem would reduce to one in the form of (7.3), which is more readily soluble.

It is of vital importance to understand the connection between the joint likelihood in (7.41) and the likelihood (7.34) that we are trying to optimise. Accordingly, note that by the definition of conditional probability, the likelihood (7.34) and the joint likelihood (7.41) are related by

$$\log p_\theta(Y_N) = \log p_\theta(Z, Y_N) - \log p_\theta(Z\,|\,Y_N). \tag{7.43}$$

Let $\theta_k$ denote an estimate of the likelihood maximiser $\hat{\theta}$ in (7.33). Further, denote by $p_{\theta_k}(Z\,|\,Y_N)$ the conditional density of the missing data $Z$, given observations of the available data $Y_N$ and depending on the choice $\theta_k$. These definitions allow the following expression, which is obtained by taking conditional expectations of both sides of (7.43) relative to $p_{\theta_k}(Z\,|\,Y_N)$:

$$\log p_\theta(Y_N) = \underbrace{\int\log p_\theta(Z, Y_N)\,p_{\theta_k}(Z\,|\,Y_N)\,dZ}_{Q(\theta,\theta_k)\,=\,E_{\theta_k}\{\log p_\theta(Z,Y_N)\,|\,Y_N\}} - \underbrace{\int\log p_\theta(Z\,|\,Y_N)\,p_{\theta_k}(Z\,|\,Y_N)\,dZ}_{V(\theta,\theta_k)\,=\,E_{\theta_k}\{\log p_\theta(Z\,|\,Y_N)\,|\,Y_N\}}. \tag{7.44}$$

Employing these newly defined $Q$ and $V$ functions, we can express the difference between the likelihood $L(\theta_k)$ at the estimate $\theta_k$ and the likelihood $L(\theta)$ at an arbitrary value of $\theta$ as

$$L(\theta) - L(\theta_k) = \big(Q(\theta, \theta_k) - Q(\theta_k, \theta_k)\big) + \underbrace{\big(V(\theta_k, \theta_k) - V(\theta, \theta_k)\big)}_{\ge 0}. \tag{7.45}$$
The positivity of the last term in the above equation can be established by noting that it is the Kullback-Leibler divergence between two densities [5]. As a consequence, if we obtain a new estimate $\theta_{k+1}$ such that $Q(\theta_{k+1}, \theta_k) > Q(\theta_k, \theta_k)$, then it follows that $L(\theta_{k+1}) > L(\theta_k)$: by increasing the $Q$ function we also increase the likelihood (7.34). This leads to the EM algorithm, which iterates between forming $Q(\theta, \theta_k)$ and maximising it with respect to $\theta$ to obtain a better estimate $\theta_{k+1}$ (for further information regarding the EM algorithm, the text [14] is an excellent reference).

Algorithm 7.2 (Expectation Maximisation Algorithm).
1. Set $k = 0$ and initialise $\theta_0$ such that $L(\theta_0)$ is finite.
2. Expectation (E) step: compute
$$Q(\theta, \theta_k) = E_{\theta_k}\{\log p_\theta(Z, Y_N)\,|\,Y_N\}. \tag{7.46}$$
3. Maximisation (M) step: compute
$$\theta_{k+1} = \arg\max_\theta Q(\theta, \theta_k). \tag{7.47}$$
4. If not converged, update $k := k + 1$ and return to step 2.
The Expectation and Maximisation steps are treated separately in Sections 7.4.2.1 and 7.4.2.2 below.

7.4.2.1 Expectation Step
The first challenge in implementing the EM algorithm is the computation of $Q(\theta, \theta_k)$ according to (7.44). To address this, note that via Bayes' rule and the Markov property associated with the model in (7.30) and (7.31), and with the choice (7.42) for $Z$,

$$L_\theta(Z, Y_N) = \log p_\theta(Y_N|Z) + \log p_\theta(Z) = \sum_{t=1}^{N-1}\log p_\theta(\xi_{t+1}|\xi_t) + \sum_{t=1}^{N}\log p_\theta(y_t|\xi_t). \tag{7.48}$$

Applying the conditional expectation operator $E_{\theta_k}\{\cdot\,|\,Y_N\}$ to both sides of (7.48) yields

$$Q(\theta, \theta_k) = I_1(\theta, \theta_k) + I_2(\theta, \theta_k), \tag{7.49}$$

where

$$I_1(\theta, \theta_k) = \sum_{t=1}^{N-1}\int\!\!\int\log p_\theta(\xi_{t+1}|\xi_t)\,p_{\theta_k}(\xi_{t+1}, \xi_t|Y_N)\,d\xi_t\,d\xi_{t+1}, \tag{7.50a}$$

$$I_2(\theta, \theta_k) = \sum_{t=1}^{N}\int\log p_\theta(y_t|\xi_t)\,p_{\theta_k}(\xi_t|Y_N)\,d\xi_t. \tag{7.50b}$$

Hence, computing $Q(\theta, \theta_k)$ requires knowledge of densities such as $p_{\theta_k}(\xi_t|Y_N)$ and $p_{\theta_k}(\xi_{t+1}, \xi_t|Y_N)$ associated with a nonlinear smoothing problem. Unfortunately, due to the nonlinear nature of the Wiener model, these densities are unlikely to have analytical expressions. This chapter therefore takes the numerical approach of evaluating (7.50a)-(7.50b) via particle methods, more formally known as sequential importance resampling (SIR) methods [6]. This results in an approximation $\widehat{Q}$ of $Q$ via

$$\widehat{Q}(\theta, \theta_k) = \widehat{I}_1(\theta, \theta_k) + \widehat{I}_2(\theta, \theta_k), \tag{7.51}$$
where Î_1 and Î_2 are approximations to (7.50a) and (7.50b). These approximations are provided by the particle smoothing Algorithm 7.3 below (see [17] for background and a more detailed explanation). To use this algorithm, we require the ability to draw new samples from the distribution p_{θ_k}(ξ̃_t | ξ^i_{t−1}), but this is straightforward since ξ_t is given by a linear state-space equation in (7.30) with white Gaussian disturbance w(t). Therefore, according to (7.30), for each ξ^i_{t−1} we can draw ξ̃^i_t via
ξ̃^i_t = A ξ^i_{t−1} + B ω^i,                                                    (7.60)
where ω^i is a realisation from the appropriate Gaussian distribution for w(t). In addition, we require the ability to evaluate the probabilities p_{θ_k}(y_t | ξ̃^j_t) and p_{θ_k}(ξ̃^k_{t+1} | ξ̃^i_t). Again, this is straightforward in the Wiener model case described by (7.29)-(7.31) since
p_{θ_k}(y_t | ξ̃^j_t) = p_e(y_t − f(C ξ̃^j_t + G(q)u_t)),                          (7.61)
p_{θ_k}(ξ̃^k_{t+1} | ξ̃^i_t) = p_w(B† [ξ̃^k_{t+1} − A ξ̃^i_t]),                     (7.62)
where B† is the Moore–Penrose pseudo-inverse of B.
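As a rough illustration, the densities (7.61)-(7.62) can be evaluated as follows for a scalar state. This is only a sketch under the assumption that p_e and p_w are zero-mean Gaussians with variances lam_e and lam_w; the helper names (e.g. `Gu_t` for the precomputed value of G(q)u_t) are ours, not the authors'.

```python
import numpy as np

def gauss_pdf(r, var):
    # Zero-mean Gaussian density evaluated at residual r.
    return np.exp(-0.5 * r**2 / var) / np.sqrt(2.0 * np.pi * var)

def p_y_given_xi(y_t, xi, C, Gu_t, f, lam_e):
    # Eq. (7.61): p(y_t | xi) = p_e(y_t - f(C*xi + G(q)u_t)); vectorised over
    # an array of particles xi.
    return gauss_pdf(y_t - f(C * xi + Gu_t), lam_e)

def p_xi_next(xi_next, xi, A, B, lam_w):
    # Eq. (7.62): p(xi_{t+1} | xi_t) = p_w(B^+ [xi_{t+1} - A xi_t]);
    # for a nonzero scalar B the pseudo-inverse B^+ is simply 1/B.
    return gauss_pdf((xi_next - A * xi) / B, lam_w)
```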
Algorithm 7.3 Particle Smoother
Given the current estimate θ_k, choose the number of particles M and complete the following steps.
1. Initialise particles, {ξ^i_0}_{i=1}^{M} ∼ p_{θ_k}(ξ_0), and set t = 1.
2. Predict the particles by drawing M i.i.d. samples according to
   ξ̃^i_t ∼ p_{θ_k}(ξ̃_t | ξ^i_{t−1}),  i = 1, ..., M.                            (7.52)
3. Compute the importance weights {w^i_t}_{i=1}^{M},
   w^i_t ≜ w(ξ̃^i_t) = p_{θ_k}(y_t | ξ̃^i_t) / ∑_{j=1}^{M} p_{θ_k}(y_t | ξ̃^j_t),  i = 1, ..., M.   (7.53)
4. For each j = 1, ..., M draw a new particle ξ^j_t with replacement (resample) according to
   P(ξ^j_t = ξ̃^i_t) = w^i_t,  i = 1, ..., M.                                    (7.54)
5. If t < N increment t → t + 1 and return to step 2, otherwise proceed to step 6.
6. Initialise the smoothed weights to be the terminal filtered weights {w^i_t} at time t = N,
   w^i_{N|N} = w^i_N,  i = 1, ..., M,                                            (7.55)
   and set t = N − 1.
7. Compute the following smoothed weights
   w^i_{t|N} = w^i_t ∑_{k=1}^{M} w^k_{t+1|N} p_{θ_k}(ξ̃^k_{t+1} | ξ̃^i_t) / v^k_t,                 (7.56)
   v^k_t ≜ ∑_{i=1}^{M} w^i_t p_{θ_k}(ξ̃^k_{t+1} | ξ̃^i_t),                                         (7.57)
   w^{ij}_{t|N} ≜ w^i_t w^j_{t+1|N} p_{θ_k}(ξ̃^j_{t+1} | ξ̃^i_t) / ∑_{l=1}^{M} w^l_t p_{θ_k}(ξ̃^j_{t+1} | ξ̃^l_t).   (7.58)
8. Update t → t − 1. If t > 0 return to step 7, otherwise proceed to step 9.
9. Compute the approximations
   Î_1(θ, θ_k) = ∑_{t=1}^{N−1} ∑_{i=1}^{M} ∑_{j=1}^{M} w^{ij}_{t|N} log p_θ(ξ̃^j_{t+1} | ξ̃^i_t),   (7.59a)
   Î_2(θ, θ_k) = ∑_{t=1}^{N} ∑_{i=1}^{M} w^i_{t|N} log p_θ(y_t | ξ̃^i_t).                          (7.59b)
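The following Python sketch outlines steps 1-8 of Algorithm 7.3 for a scalar state. The callables `draw_xi0`, `predict`, `p_y` and `p_trans` are hypothetical stand-ins for p_{θ_k}(ξ_0), the prediction (7.52), the likelihood used in (7.53) and the transition density (7.62); the pairwise weights (7.58) follow the same pattern as (7.56)-(7.57) and are omitted to keep the sketch short.

```python
import numpy as np

def particle_smoother(y, M, draw_xi0, predict, p_y, p_trans, rng):
    """Returns the predicted particles, filtered weights w_t and smoothed
    weights w_{t|N} of Algorithm 7.3 (rows 1..N are used)."""
    N = len(y)
    xi_pred = np.zeros((N + 1, M))
    w = np.zeros((N + 1, M))
    xi = draw_xi0(M)                         # step 1
    for t in range(1, N + 1):
        xi_pred[t] = predict(xi)             # step 2, Eq. (7.52)
        wt = p_y(t, xi_pred[t])
        w[t] = wt / wt.sum()                 # step 3, Eq. (7.53)
        idx = rng.choice(M, size=M, p=w[t])
        xi = xi_pred[t][idx]                 # step 4, resampling, Eq. (7.54)
    w_s = np.zeros((N + 1, M))
    w_s[N] = w[N]                            # step 6, Eq. (7.55)
    for t in range(N - 1, 0, -1):            # steps 7-8
        # trans[k, i] = p(xi_{t+1}^k | xi_t^i), assuming p_trans broadcasts.
        trans = p_trans(xi_pred[t + 1][:, None], xi_pred[t][None, :])
        v = trans @ w[t]                     # v_t^k, Eq. (7.57)
        w_s[t] = w[t] * ((w_s[t + 1] / v) @ trans)   # Eq. (7.56)
    return xi_pred, w, w_s
```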
7.4.2.2 Maximisation Step
With an approximation Q̂(θ, θ_k) of the function Q(θ, θ_k) made available, attention now turns to the maximisation step (7.47). This requires that the approximation
Q̂(θ, θ_k) is maximised with respect to θ in order to compute a new iterate θ_{k+1} of the maximum likelihood estimate.
In general, a closed-form maximiser of Q̂ will not be available. As such, this section again employs a gradient-based search technique, as already utilised in Section 7.4.1. For this purpose, note that via (7.51) and (7.59) the gradient of Q̂(θ, θ_k) with respect to θ is simply computable via
∂Q̂(θ, θ_k)/∂θ = ∂Î_1(θ, θ_k)/∂θ + ∂Î_2(θ, θ_k)/∂θ,                                              (7.63a)
∂Î_1(θ, θ_k)/∂θ = ∑_{t=1}^{N−1} ∑_{i=1}^{M} ∑_{j=1}^{M} w^{ij}_{t|N} ∂ log p_θ(ξ̃^j_{t+1} | ξ̃^i_t)/∂θ,   (7.63b)
∂Î_2(θ, θ_k)/∂θ = ∑_{t=1}^{N} ∑_{i=1}^{M} w^i_{t|N} ∂ log p_θ(y_t | ξ̃^i_t)/∂θ.                   (7.63c)
In the above, we require partial derivatives of p_θ(y_t | ξ̃^j_t) and p_θ(ξ̃^j_{t+1} | ξ̃^i_t) with respect to θ. These derivatives may be obtained via simple calculus on the expressions provided in (7.61) and (7.62). Note that for a given θ_k, the particle smoother algorithm will provide the particles ξ̃^i_t and all the weights required to calculate the above gradients (and indeed Q̂ itself). Importantly, these particles and weights remain valid as long as θ_k remains the same (which it does throughout the Maximisation step). With this gradient available, we can apply the same strategy that was presented in Section 7.4.1 for maximising L to the case of maximising Q̂. Indeed, this was used in the simulations in Section 7.5.
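A minimal sketch of such an M-step, assuming the approximations Q̂ and its gradient (7.63) are available as callables (names ours), could use an off-the-shelf quasi-Newton routine:

```python
from scipy.optimize import minimize

def m_step(Q_hat, Q_hat_grad, theta_k):
    """Maximisation step (7.47): maximise the particle approximation
    Q_hat(., theta_k) by gradient-based search started at theta_k.
    Q_hat and Q_hat_grad stay fixed here, since the particles and
    weights do not change while theta_k is held fixed."""
    res = minimize(lambda th: -Q_hat(th), x0=theta_k,
                   jac=lambda th: -Q_hat_grad(th), method="BFGS")
    return res.x
```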
7.5 Simulation Examples

In this section we profile three different algorithms on various simulation examples. To streamline the presentation it is helpful to give each algorithm an abbreviation. The output error approach outlined in Section 7.2 is denoted by OE. Secondly, the direct gradient-based search method of Section 7.4.1 is denoted by ML-DGBS. Thirdly, the expectation maximisation method of Section 7.4.2 is labelled ML-EM. For the implementation of ML-DGBS we chose the limits for the integration [a, b] (see Algorithm 7.1) as ±6√λ_w, where λ_w is the variance of the process noise w(t). This corresponds to a confidence interval of 99.9999 % for the signal x(t) if the process noise is indeed Gaussian and white. The number of grid points was chosen to be 1001.
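For reference, a minimal sketch of this grid construction, assuming λ_w is the (known or currently estimated) process-noise variance:

```python
import numpy as np

lam_w = 1.0                                       # process-noise variance
a, b = -6.0 * np.sqrt(lam_w), 6.0 * np.sqrt(lam_w)  # limits +/- 6*sqrt(lam_w)
x_grid = np.linspace(a, b, 1001)                  # 1001 equidistant grid points
```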
7.5.1 Example 1: White Process and Measurement Noise The first example is a second order system with complex poles for the linear part G(ϑ , q), followed by a deadzone function for the nonlinear part f (·, η ). The input
u and process noise w are Gaussian, each with zero mean and variance 1, while the measurement noise e is Gaussian with zero mean and variance 0.1. The system is given by
x_0(t) + a_1 x_0(t−1) + a_2 x_0(t−2) = u(t) + b_1 u(t−1) + b_2 u(t−2),
x(t) = x_0(t) + w(t),
f(x(t)) = x(t) − c_1   for x(t) < c_1,
        = 0            for c_1 ≤ x(t) ≤ c_2,                                     (7.64)
        = x(t) − c_2   for x(t) > c_2,
y(t) = f(x(t)) + e(t).
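As an illustration of how data of the form (7.64) could be generated, consider the following sketch; the parameter values are those of the example, but the routine itself is our own illustrative construction.

```python
import numpy as np
from scipy.signal import lfilter

def deadzone(x, c1, c2):
    # Deadzone nonlinearity from (7.64).
    return np.where(x < c1, x - c1, np.where(x > c2, x - c2, 0.0))

def simulate_ex1(N, a1, a2, b1, b2, c1, c2, lam_w=1.0, lam_e=0.1, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, 1.0, N)
    # x0 + a1*x0(t-1) + a2*x0(t-2) = u + b1*u(t-1) + b2*u(t-2)
    x0 = lfilter([1.0, b1, b2], [1.0, a1, a2], u)
    x = x0 + rng.normal(0.0, np.sqrt(lam_w), N)     # white process noise
    y = deadzone(x, c1, c2) + rng.normal(0.0, np.sqrt(lam_e), N)
    return u, y

u, y = simulate_ex1(1000, 0.6, -0.6, -0.6, 0.6, -0.3, 0.5)
```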
Here, we estimate the parameters a_1, a_2, b_1, b_2, c_1, c_2. A Monte-Carlo simulation with 1000 data sets was generated, each using N = 1000 samples. The true values of the parameters, and the results of the OE approach (see Section 7.2) and the ML-DGBS method (see Section 7.3.1), are summarised in Table 7.1. The estimates of the deadzone function f(x(t)) from Equation (7.64) are plotted in Figure 7.3.

Table 7.1: Parameter estimates with standard deviations for Example 1, using the OE and ML-DGBS methods. The mean and standard deviations are computed over 1000 runs. The notation n.e. stands for "not estimated", as the noise variances are not estimated with the output error method.

Par   True      OE                 ML-DGBS
a1    0.6000    0.5486 ± 0.0463    0.6017 ± 0.0444
a2   -0.6000   -0.5482 ± 0.0492   -0.6015 ± 0.0480
b1   -0.6000   -0.6002 ± 0.0146   -0.6002 ± 0.0141
b2    0.6000    0.6006 ± 0.0130    0.6007 ± 0.0126
c1   -0.3000   -0.1600 ± 0.0632   -0.3064 ± 0.0610
c2    0.5000    0.3500 ± 0.0652    0.5061 ± 0.0641
λw    1.0000    n.e.               0.9909 ± 0.0634
λe    0.1000    n.e.               0.1033 ± 0.0273
This simulation confirms that the output error approach provides biased estimates. On the other hand, the Maximum Likelihood method provides consistent estimates, including noise variances.
Fig. 7.3: Example 1: The true deadzone function as a thick black line and the 1000 estimated deadzones, appearing in grey. Above: OE. Below: ML-DGBS.

7.5.2 Example 2: Coloured Process Noise

The second example considers the Wiener model in Figure 7.2. It is similar to the first example in that the linear part G is a second order system with complex poles, but differs in that the deadzone function is replaced with a saturation function for the nonlinear part f(·, η), and in that the process noise is coloured by
H(q) = q^{−1} / (1 − h_1 q^{−1}),                                                (7.65)
which corresponds to the state-space system
ξ(t+1) = h_1 ξ(t) + w(t).                                                        (7.66)
Therefore, the overall Wiener system is given by
x_0(t) + a_1 x_0(t−1) + a_2 x_0(t−2) = u(t) + b_1 u(t−1) + b_2 u(t−2),
f(x) = c_1   for x ≤ c_1,
     = x     for c_1 < x ≤ c_2,                                                  (7.67)
     = c_2   for c_2 < x,
y(t) = f(ξ(t) + x_0(t)) + e(t).
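A sketch of how (7.65)-(7.67) could be simulated, the main difference from Example 1 being the coloured noise state ξ; the routine and its names are again illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def saturation(x, c1, c2):
    # Saturation nonlinearity from (7.67).
    return np.clip(x, c1, c2)

def simulate_ex2(N, a1, a2, b1, b2, c1, c2, h1, lam_w=1.0, lam_e=0.1, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, 1.0, N)
    w = rng.normal(0.0, np.sqrt(lam_w), N)
    x0 = lfilter([1.0, b1, b2], [1.0, a1, a2], u)   # linear part G
    xi = lfilter([0.0, 1.0], [1.0, -h1], w)         # xi(t+1) = h1*xi(t) + w(t)
    y = saturation(xi + x0, c1, c2) + rng.normal(0.0, np.sqrt(lam_e), N)
    return u, y
```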
The goal is to estimate the parameters a_1, a_2, b_1, b_2, c_1, c_2, h_1 based on input and output measurements. In this case, three different algorithms were employed, namely the OE method from Section 7.2, the ML-DGBS approach from Section 7.4.1, and the ML-EM particle-based method from Section 7.4.2. It should be mentioned that the former two algorithms do not cater for estimating the filter parameter h_1. It is interesting nonetheless to observe their performance based on the wrong assumptions that each makes about the process noise, i.e. that it does not exist in the first case and that it is white in the second.

As before, we ran a Monte-Carlo simulation with 1000 runs, and in each we generated a new data set with N = 1000 points. The signals u(t), w(t) and e(t) were generated in the same way as for Example 1. For the EM approach, we used M = 200 particles in approximating Q̂ (see (7.51)). The results are summarised in Table 7.2.

Table 7.2: Parameter estimates with standard deviations for Example 2 with coloured noise, using the OE, ML-DGBS and ML-EM methods.

Par   True      OE                 ML-DGBS            ML-EM
a1    0.6000    0.5683 ± 0.2424    0.6163 ± 0.1798    0.5874 ± 0.1376
a2   -0.6000   -0.5677 ± 0.2718   -0.6258 ± 0.2570   -0.5820 ± 0.1649
b1   -0.6000   -0.5995 ± 0.0642   -0.5989 ± 0.0510   -0.5980 ± 0.0392
b2    0.6000    0.6027 ± 0.0545    0.6022 ± 0.0403    0.6017 ± 0.0333
c1   -0.5000   -0.3032 ± 0.0385   -0.4974 ± 0.0278   -0.5000 ± 0.0184
c2    0.3000    0.1108 ± 0.0397    0.2991 ± 0.0250    0.3003 ± 0.0173
h1    0.9000    n.e.               n.e.               0.8986 ± 0.0227
λw    1.0000    n.e.               5.4671 ± 1.8681    0.9765 ± 0.2410
λe    0.1000    n.e.               0.1000 ± 0.0069    0.1000 ± 0.0054

It can be observed that the output error approach again provides biased estimates of the nonlinearity parameters. The direct gradient-based search procedure produces reasonable results, but the expectation maximisation approach produces slightly more accurate results (this is perhaps surprising given that only M = 200 particles were used).

It is worth asking whether the consistency of the ML-DGBS approach for coloured process noise is surprising or not. It is well known from linear identification that the output error approach gives consistent estimates even when the output error disturbance is coloured, and thus an erroneous likelihood criterion is used [13]. The Wiener model resembles the output error model in that, in essence, it is a static model: for a given input u, noise is added to the deterministic variable β(t) = G(q)u(t), either as β(t) + e(t) (linear output error) or as f(β(t) + w(t)) + e(t) (Wiener model). The spectrum or time correlation of the noises does not seem essential. However, a formal proof of this does not appear to be straightforward in the Wiener case.

Therefore, given the relative simplicity of implementing the ML-DGBS method compared with the EM approach, and given that the estimates for both approaches are comparable, it is worth asking whether or not the noise model really needs to be estimated. On the other hand, if it is essential that the noise model be identified, then the output error and ML-DGBS methods are not really suitable since they do not handle this case. In line with this, the next section discusses the blind estimation problem, where identifying the noise filter is essential.
7.5.3 Example 3: Blind Estimation

In the third simulation, we again consider the Wiener model depicted in Figure 7.2, but with G = 0. This can be interpreted as a blind Wiener model estimation problem, where the unknown input signal w(t) is first passed through a filter H(q) and then mapped through a static nonlinearity f. Finally, the measurements are corrupted by the disturbance e(t) to provide y(t). In particular, we assume as in Example 2 that the process noise is coloured by
H(ϑ, q) = q^{−1} / (1 − h_1 q^{−1}),                                             (7.68)
and the resulting signal is then mapped through a saturation nonlinearity, so that the overall Wiener system is given by
y(t) = f(ξ(t)) + e(t),
ξ(t+1) = h_1 ξ(t) + w(t),
f(ξ(t)) = c_1    for ξ(t) ≤ c_1,
        = ξ(t)   for c_1 < ξ(t) ≤ c_2,                                           (7.69)
        = c_2    for c_2 < ξ(t).
Here we are trying to estimate the parameters h_1, c_1, c_2 and the variance parameters λ_w, λ_e of the process noise w(t) and the measurement noise e(t), respectively. This is to be done based on the output measurements alone. The EM method described in Section 7.4.2 is directly applicable to this case, and was employed here. As usual, we ran a Monte-Carlo simulation with 1000 runs, and in each we generated a new data set with N = 1000 points. The signals w(t) and e(t) were generated as Gaussian random numbers with variance 1 and 0.1, respectively. In this case, we used only M = 50 particles in approximating Q̂. The results are summarised in Table 7.3. Even with this modest number of particles, the estimates are consistent and appear to be accurate.
Table 7.3: Parameter estimates with standard deviations for Example 3, using the EM method.

Par   True      ML-EM
h1    0.9000    0.8995 ± 0.0237
c1   -0.5000   -0.4967 ± 0.0204
c2    0.3000    0.2968 ± 0.0193
λw    1.0000    1.0293 ± 0.1744
λe    0.1000    0.1019 ± 0.0063
7.6 Conclusion

The dominant approach for estimating Wiener models is to parametrise the linear and nonlinear parts and then minimise, with respect to these parameters, the squared error between the measured output and a simulated one from the Wiener model. This approach implicitly assumes that no process noise is present. It was confirmed that this leads to biased estimates if the assumption is wrong. To overcome this problem, the chapter presents two algorithms for providing maximum likelihood estimates of Wiener models that include both process and measurement noise. The first is based on the assumption that the process noise is white, and the second assumes that the process noise has been coloured by a linear filter. In the latter case, the likelihood function involves the evaluation of a high-dimensional integral, which is not tractable using traditional numerical integration techniques. Motivated by this, the chapter casts the Wiener model in the form of a nonlinear state-space model, which is directly amenable to a recently developed Expectation Maximisation algorithm. Of vital importance is that the expectation step can be approximated using sequential importance resampling (or particle) methods, which are easily implemented using standard desktop computing. This approach was profiled for the case of coloured process noise with very promising results. Finally, the case of blind Wiener model estimation can be directly handled using the expectation maximisation method presented here. The efficacy of this method was demonstrated via a simulation example.
References
1. Bai, E.-W.: Frequency domain identification of Wiener models. Automatica 39(9), 1521–1530 (2003)
2. Billings, S.A.: Identification of non-linear systems - a survey. IEE Proc. D 127, 272–285 (1980)
3. Boyd, S., Chua, L.O.: Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems CAS-32(11), 1150–1161 (1985)
4. Dennis Jr, J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs (1983)
5. Gibson, S., Ninness, B.: Robust maximum-likelihood estimation of multivariable dynamic systems. Automatica 41(10), 1667–1682 (2005)
6. Gordon, N.J., Salmond, D.J., Smith, A.F.M.: A novel approach to nonlinear/non-Gaussian Bayesian state estimation. In: IEE Proceedings on Radar and Signal Processing, vol. 140, pp. 107–113 (1993)
7. Hagenblad, A., Ljung, L.: Maximum likelihood estimation of Wiener models. In: Proc. 39th IEEE Conf. on Decision and Control, Sydney, Australia, pp. 2417–2418 (2000)
8. Hagenblad, A., Ljung, L., Wills, A.: Maximum likelihood identification of Wiener models. Automatica 44(11), 2697–2705 (2008)
9. Hsu, K., Vincent, T., Poolla, K.: A kernel based approach to structured nonlinear system identification part I: Algorithms, part II: Convergence and consistency. In: Proc. IFAC Symposium on System Identification, Newcastle (2006)
10. Hunter, I.W., Korenberg, M.J.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics 55, 135–144 (1986)
11. Kalafatis, A., Arifin, N., Wang, L., Cluett, W.R.: A new approach to the identification of pH processes based on the Wiener model. Chemical Engineering Science 50(23), 3693–3701 (1995)
12. Ljung, L., Singh, R., Zhang, Q., Lindskog, P., Juditski, A.: Developments in Mathworks system identification toolbox. In: Proc. 15th IFAC Symposium on System Identification, Saint-Malo, France (2009)
13. Ljung, L.: System Identification, Theory for the User, 2nd edn. Prentice-Hall, Englewood Cliffs (1999)
14. McLachlan, G., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. John Wiley & Sons, Chichester (2008)
15. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
16. Press, W.H., Teukolsky, S.A., Vetterling, W.A., Flannery, B.P.: Numerical Recipes in C, the Art of Scientific Computing, 2nd edn. Cambridge University Press, Cambridge (1992)
17. Schön, T., Wills, A., Ninness, B.: System identification of nonlinear state-space models. Automatica (provisionally accepted) (2009)
18. Schoukens, J., Nemeth, J.G., Crama, P., Rolain, Y., Pintelon, R.: Fast approximate identification of nonlinear systems. Automatica 39(7), 1267–1274 (2003)
19. Vanbaylen, L., Pintelon, R., de Groen, P.: Blind maximum likelihood identification of Wiener systems with measurement noise. In: Proc. 15th IFAC Symposium on System Identification, Saint-Malo, France, pp. 1686–1691 (2009)
20. Vanbaylen, L., Pintelon, R., Schoukens, J.: Blind maximum-likelihood identification of Wiener systems. IEEE Transactions on Signal Processing 57(8), 3017–3029 (2009)
21. Westwick, D., Verhaegen, M.: Identifying MIMO Wiener systems using subspace model identification methods. Signal Processing 52, 235–258 (1996)
22. Wigren, T.: Recursive prediction error identification using the nonlinear Wiener model. Automatica 29(4), 1011–1025 (1993)
23. Wills, A.G., Mills, A.J., Ninness, B.: A MATLAB software environment for system identification. In: Proc. 15th IFAC Symposium on System Identification, Saint-Malo, France (2009)
24. Zhu, Y.: Distillation column identification for control using Wiener model. In: 1999 American Control Conference, Hyatt Regency San Diego, California, USA (1999)
Chapter 8
Parametric Versus Nonparametric Approach to Wiener Systems Identification Grzegorz Mzyk
8.1 Introduction to Wiener Systems

The problem of modelling nonlinear dynamic systems by means of block-oriented models has been studied intensively for the last four decades, owing to a vast variety of applications. The concept of block-oriented models assumes that the real plant, as a whole, can be treated as a system of interconnected blocks, static nonlinearities (N) and linear dynamics (L), where the interaction signals cannot be measured. The most popular in this class are two-element cascade structures, i.e., Hammerstein-type (N-L), Wiener-type (L-N), and sandwich-type (L-N-L) representations. In particular, since in the Wiener system (Figure 8.1) the nonlinear block is preceded by the linear dynamics and the nonlinearity input is correlated, its identification is much more difficult in comparison with the Hammerstein system. However, the Wiener model allows for better approximation of many real processes. Such difficulties in theoretical analysis forced the authors to consider special cases and to adopt somewhat restrictive assumptions on the input signal, the impulse response of the linear dynamic block, and the shape of the nonlinear characteristic. In particular, for Gaussian input the problem of Wiener system identification becomes much easier. Since the internal signal {x_k} is then also Gaussian, the linear block can be simply identified by the cross-correlation approach, and the static characteristic can be recovered, e.g., by the nonparametric inverse regression approach ([14]-[16]). Non-Gaussian random input is very rarely met in the literature. It is allowed, e.g., in [38], but the algorithm presented there requires prior knowledge of the parametric representation of the linear subsystem. Most recent methods for Wiener system identification assume FIR linear dynamics or invertible nonlinearity, or require the use of specially designed input excitations ([2], [12]).
Grzegorz Mzyk
Institute of Computer Engineering, Control and Robotics, Wroclaw University of Technology, Poland
e-mail:
[email protected]
Fig. 8.1: Wiener system
In this chapter we compare and combine two kinds of methods, parametric ([1], [2], [5]-[12], [27]-[32], [39]-[43], [45], [47], [48]) and nonparametric ([14]-[26], [33]-[38], [44]). A method is called 'parametric' if both the linear and nonlinear subsystems are described with the use of a finite number of unknown parameters, e.g. when an FIR linear dynamic model and a polynomial characteristic with known orders are assumed. The popular parametric methods elaborated for Wiener system identification are not free of approximation error and do not allow full decentralisation of the identification task of a complex system. Moreover, the theoretical analysis of identifiability and convergence of parametric estimates remains relatively difficult. On the other hand, the nonparametric approach offers simple algorithms which are asymptotically free of approximation error, i.e. they converge to the true system characteristics. However, purely nonparametric methods are not commonly exploited in practice for the following reasons: (i) they depend on various tuning parameters and functions; in particular, proper selection of the kernel and the bandwidth parameter, or of the orthonormal basis and the scale factor, is critical for the obtained results, (ii) the prior knowledge of the subsystems is completely neglected; the estimates are based on measurements only, and the resulting model may not be satisfactory when the number of measurements is small, and (iii) a bulk number of estimates must be computed when the model complexity grows large.

In Section 8.2 we recollect the traditional parametric least-squares method for Wiener system identification and discuss its weak points. Next, in Section 8.3, we present several purely nonparametric methods, i.e., the correlation-based estimate of the linear dynamics, the kernel estimate of the inverse regression, and a censored sample mean approach to nonlinearity recovering. Finally, selected parametric and nonparametric methods are combined and the properties of the proposed two-stage procedures are discussed in Section 8.4.
8.2 Nonlinear Least Squares Method

The Wiener system, i.e. the linear dynamics with the discrete-time impulse response {λ_j}_{j=0}^{∞}, connected in cascade with the static nonlinear block characterised by μ(), is described by the following equation
y_k = μ( ∑_{j=0}^{∞} λ_j u_{k−j} ) + z_k,
where u_k, y_k, and z_k are the input, output and random disturbance, respectively. The goal of identification is to recover both elements, i.e. {λ_j}_{j=0}^{∞} and μ(x) for each x ∈ R, using the set of input-output measurements {(u_k, y_k)}_{k=1}^{N}. In the traditional (parametric) approach we also assume finite-dimensional functionals, e.g. the ARMA-type dynamic block
x_k + a*_1 x_{k−1} + ... + a*_r x_{k−r} = b*_0 u_k + b*_1 u_{k−1} + ... + b*_s u_{k−s},
i.e. x_k = φ_k^T θ*, where                                                       (8.1)
φ_k = (−x_{k−1}, ..., −x_{k−r}, u_k, u_{k−1}, ..., u_{k−s})^T,
θ* = (a*_1, ..., a*_r, b*_0, b*_1, ..., b*_s)^T,
and a given formula μ(x, c*) = μ(x) including a finite number of unknown true parameters c* = (c*_1, c*_2, ..., c*_m)^T. The respective Wiener model is thus represented by r + (s + 1) + m parameters, i.e.,
x̄_k = φ̄_k^T θ, with x̄_k = 0 for k ≤ 0, where                                   (8.2)
φ̄_k = (−x̄_{k−1}, ..., −x̄_{k−r}, u_k, u_{k−1}, ..., u_{k−s})^T,
θ = (a_1, ..., a_r, b_0, b_1, ..., b_s)^T,
and ȳ(x, c) = μ(x, c), where c = (c_1, c_2, ..., c_m)^T. If the assumed model (8.2) agrees with the true system description (8.1), then the results of identification can be significantly improved in comparison with the nonparametric approach, particularly if the number of measurements is small. On the other hand, the risk of bad parametrisation and the existence of a systematic approximation error must be taken into account, together with the issue of guaranteeing convergence of the parameter estimates. If x_k had been accessible for measurements, then the true system parameters could have been estimated by the following minimisation
θ̂ = arg min_θ ∑_{k=1}^{N} (x_k − x̄_k(θ))²,   ĉ = arg min_c ∑_{k=1}^{N} (y_k − ȳ(x_k, c))².   (8.3)
Here we assume that only the input-output measurements (u_k, y_k) of the whole Wiener system are accessible, and the internal signal x_k is hidden. This observation leads to the following nonlinear least squares problem
(θ̂, ĉ) = arg min_{θ,c} ∑_{k=1}^{N} [y_k − ȳ(x̄_k(θ), c)]²,                        (8.4)
which is usually very complicated numerically. Moreover, uniqueness of the solution in (8.4) cannot be guaranteed in general, as it depends on the input distribution, the types of models, and the values of parameters. Recent publications concerning the application of neural networks or soft computing methods to the Wiener system identification problem do not include any theoretical analysis of convergence.
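A minimal sketch of how the criterion (8.4) could be attacked numerically, assuming a user-supplied parametric nonlinearity `mu(x, c)`; the routine and its names are illustrative, and the caveats above (local minima, non-uniqueness) apply in full.

```python
import numpy as np
from scipy.optimize import least_squares

def wiener_residuals(params, u, y, r, s, mu):
    """Residuals y_k - mu(xbar_k(theta), c) of (8.4); params stacks the
    ARMA coefficients theta = (a_1..a_r, b_0..b_s) of (8.2) and the
    nonlinearity parameters c."""
    a, b, c = params[:r], params[r:r + s + 1], params[r + s + 1:]
    xbar = np.zeros_like(y, dtype=float)
    for k in range(len(y)):
        xbar[k] = (-sum(a[i] * xbar[k - 1 - i] for i in range(r) if k - 1 - i >= 0)
                   + sum(b[j] * u[k - j] for j in range(s + 1) if k - j >= 0))
    return y - mu(xbar, c)

# e.g.: res = least_squares(wiener_residuals, p0, args=(u, y, r, s, mu))
```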
8.3 Nonparametric Identification Tools

The nonparametric approach to block-oriented system identification was introduced in the eighties by Greblicki and Pawlak ([20], [37]). For the system of Hammerstein structure, the reverse connection, described by the equation y^H_k = ∑_{j=0}^{∞} γ_j m(u_{k−j}) + z_k, it was noticed that the input-output regression function is equivalent to the nonlinear static characteristic, up to some scale and offset factors:
R(u) ≜ E{y^H_{k+l} | u_k = u} = γ_l m(u) + ∑_{j≠l} γ_j E m(u_1).
Since then, two kinds of nonparametric methods have been examined – the first based on kernel regression estimation, and the second employing an orthogonal series expansion of the nonlinearity. Also, the cross-correlation method was proposed for estimation of the impulse response of the linear dynamic block in the Hammerstein system. Analogous ideas were also applied by Greblicki for a class of Wiener systems, with Gaussian input and locally invertible characteristics [16]. The respective algorithms are briefly recalled in Sections 8.3.1 and 8.3.2. In Section 8.3.3 we introduce and analyse a new kind of nonlinearity estimate for the Wiener system, which works under the least possible prior knowledge, i.e. under non-Gaussian input, IIR linear dynamics, and any continuous, but not necessarily invertible, static characteristic.
8.3.1 Inverse Regression Approach

Assume that the input u_k and the noise ε_k are white Gaussian, mutually independent processes with finite variances σ_u², σ_ε² < ∞, the noise ε_k is zero-mean, i.e. E ε_k = 0, and the output measurement noise z_k is not present. The nonparametric estimation of the inverted regression relies on the following lemma.

Lemma 8.1. [21] If μ() is invertible then for any y ∈ μ(R) it holds that
E{u_k | y_{k+p} = y} = α_p μ^{−1}(y),                                            (8.5)
where α_p = λ_p σ_u²/σ_v².

Since for any time lag p the function μ^{−1}(y) can be identified only up to the multiplicative constant α_p, let us denote, for convenience, v(y) = α_p μ^{−1}(y). The nonparametric estimate of v(y) has the form
v̂(y) = ∑_{k=1}^{N} u_k K((y − y_{k+p})/h(N)) / ∑_{k=1}^{N} K((y − y_{k+p})/h(N)),   (8.6)
where K() and h(N) are a kernel function and a bandwidth parameter, respectively. The following theorem holds.

Theorem 8.1. [21] If μ() is invertible, K() is Lipschitz and such that c_1 H(|y|) ≤ K(y) ≤ c_2 H(|y|) for some c_1 and c_2, where H() is a nonnegative and non-increasing function defined on [0, ∞), continuous and positive at t = 0, and such that tH(t) → 0 as t → ∞, then for h(N) → 0 and Nh²(N) → ∞ as N → ∞ it holds that
v̂(y) → v(y) in probability as N → ∞,                                            (8.7)
at every point y at which the probability density f(y) is positive and continuous.

The rate of convergence in (8.7) depends on the smoothness of the identified characteristic and is provided by the following lemma.

Lemma 8.2. [21] Let us define g(y) ≜ v(y) f(y) and denote v(y) = g(y) f(y)^{−1}. If μ(), f() and g() have q bounded derivatives in a neighbourhood of y, then
|v̂(y) − v(y)| = O(N^{−1/2 + 1/(2q+2)}) in probability,
e.g. O(N^{−1/4}) for q = 1, O(N^{−1/3}) for q = 2, and O(N^{−1/2}) for q large.

In [16] and [19], the estimate (8.6) was generalised to a larger class of Wiener systems, admitting 'locally invertible' nonlinear static blocks and correlated excitation. The strongest limitation of the inverse regression approach is thus the assumption of Gaussianity of the input signal.
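A minimal sketch of the inverse regression estimate (8.6); the Gaussian kernel is our illustrative choice (any kernel satisfying the assumptions of Theorem 8.1 would do), and the bandwidth h must be supplied by the user.

```python
import numpy as np

def inverse_regression(y_grid, u, y, p, h,
                       kernel=lambda v: np.exp(-0.5 * v**2)):
    """Kernel estimate (8.6) of v(y) = alpha_p * mu^{-1}(y): a locally
    weighted average of the inputs u_k given that y_{k+p} is near y."""
    u_lag = u[:len(u) - p]          # u_k
    y_lead = y[p:]                  # y_{k+p}
    est = np.empty(len(y_grid))
    for i, yy in enumerate(y_grid):
        Kv = kernel((yy - y_lead) / h)
        est[i] = np.dot(Kv, u_lag) / Kv.sum()
    return est
```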
8.3.2 Cross-correlation Analysis

The nonparametric identification of the linear dynamic block is based on the following property.

Lemma 8.3. [21] If E|v_k μ(v_k)| < ∞ then E{u_k y_{k+p}} = β λ_p, where β = (σ_u²/σ_v²) E{v_k μ(v_k)}.

Since λ_p can be identified only up to the multiplicative constant β, let us denote, for convenience, κ_p ≜ β λ_p, and consider its natural estimate of the form
κ̂_p = (1/N) ∑_{k=1}^{N} u_k y_{k+p}.                                            (8.8)

Theorem 8.2. [21] If μ() is a Lipschitz function, then
lim_{N→∞} E(κ̂_p − κ_p)² = O(1/N).                                               (8.9)
Consequently, when the stable IIR linear subsystem is modelled by the filter with the impulse response κ̂_0, κ̂_1, ..., κ̂_{n(N)}, the model is free of asymptotic approximation error if n(N) → ∞ and n(N)/N → 0 as N → ∞.
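A one-line sketch of (8.8) in Python; note that, as a small implementation detail of ours, each lag is averaged over the N − p available pairs rather than N.

```python
import numpy as np

def kappa_hat(u, y, n):
    """Sample cross-correlation estimates (8.8) of kappa_p = beta*lambda_p
    for p = 0, 1, ..., n (an FIR model of the linear block)."""
    N = len(u)
    return np.array([np.mean(u[:N - p] * y[p:]) for p in range(n + 1)])
```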
8.3.3 A Censored Sample Mean Approach

In this section we assume that the input {u_k} is an i.i.d., bounded (|u_k| < u_max; unknown u_max < ∞), but not necessarily Gaussian random process. There exists a probability density of the input, ϑ_u(u_k) say, which is a continuous and strictly positive function around the estimation point x, i.e., ϑ_u(x) ≥ ε > 0. The unknown impulse response {λ_j}_{j=0}^{∞} of the linear IIR filter is exponentially upper bounded, that is
|λ_j| ≤ c_1 λ^j, for some unknown 0 < c_1 < ∞,                                   (8.10)
where 0 < λ < 1 is an a priori known constant. The nonlinearity μ(x) is an arbitrary function, continuous almost everywhere on x ∈ (−u_max, u_max) (in the sense of Lebesgue measure). The output noise {z_k} is a zero-mean stationary and ergodic process, which is independent of the input {u_k}. For simplicity of presentation we also let L ≜ ∑_{j=0}^{∞} λ_j = 1 and u_max = 1/2. We note that the members of the family of Wiener systems composed by series connection of the linear filters with impulse responses {λ̄_j} = {λ_j/c_2}_{j=0}^{∞} and the nonlinearities μ̄(x) = μ(c_2 x) are, for c_2 ≠ 0, indistinguishable from the input-output point of view. In consequence, the characteristic μ() can be recovered in general only up to some domain scaling factor c_2, independently of the applied identification method. Observe that, in particular, for FIR linear dynamics the condition (8.10) is fulfilled for an arbitrarily small constant λ > 0. Moreover, it holds that |x_k| < x_max < ∞, where x_max ≜ u_max ∑_{j=0}^{∞} |λ_j|. Since ∑_{j=0}^{∞} |λ_j| ≥ L and L = 1, the support of the random variables x_k, i.e. (−x_max, x_max), is generally wider than the estimation interval x ∈ (−u_max, u_max). We introduce and analyse a nonparametric estimate of the part of the characteristic μ(x) for x ∈ (−u_max, u_max), and next we extend the obtained results to x ∈ (−x_max, x_max), when parametric knowledge of μ() is provided. Let x be a chosen estimation point of μ(·). For a given x let us define a "weighted distance" between the measurements u_k, u_{k−1}, u_{k−2}, ..., u_1 and x as
δ_k(x) ≜ ∑_{j=0}^{k−1} |u_{k−j} − x| λ^j = |u_k − x| λ^0 + |u_{k−1} − x| λ^1 + ... + |u_1 − x| λ^{k−1},   (8.11)
i.e. δ_1(x) = |u_1 − x|, δ_2(x) = |u_2 − x| + |u_1 − x| λ, δ_3(x) = |u_3 − x| + |u_2 − x| λ + |u_1 − x| λ², etc., which can be computed recursively as follows
δ_k(x) = λ δ_{k−1}(x) + |u_k − x|.                                               (8.12)
Under the above assumptions we obtain
|x_k − x| = |∑_{j=0}^{∞} λ_j u_{k−j} − ∑_{j=0}^{∞} λ_j x| = |∑_{j=0}^{∞} λ_j (u_{k−j} − x)|
          = |∑_{j=0}^{k−1} λ_j (u_{k−j} − x) + ∑_{j=k}^{∞} λ_j (u_{k−j} − x)|
          ≤ ∑_{j=0}^{k−1} |λ_j| |u_{k−j} − x| + 2 u_max ∑_{j=k}^{∞} |λ_j| ≤ δ_k(x) + λ^k/(1 − λ) ≜ Δ_k(x).   (8.13)
Observe that if, in turn,
Δ_k(x) ≤ h(N),                                                                   (8.14)
then the true (but unknown) interaction input x_k is located close to x, provided that h(N) (further, a calibration parameter) is small. The distance given in (8.13) may be easily computed, as the point x and the data u_k, u_{k−1}, u_{k−2}, ..., u_1 are each time at one's disposal. In turn, the condition (8.14) selects k's for which the input sequences {u_k, u_{k−1}, u_{k−2}, ..., u_1} are such that the true nonlinearity inputs {x_k} surely belong to the neighbourhood of the estimation point x with radius h(N). Let us also notice that asymptotically, as k → ∞, it holds that
δ_k(x) = Δ_k(x),                                                                 (8.15)
with probability 1.

Proposition 8.1. If, for each j = 0, 1, ..., ∞ and some d > 0, it holds that
|u_{k−j} − x| ≤ d/λ^j,                                                           (8.16)
then
|x_k − x| ≤ d log_λ d + d + d λ/(1 − λ).                                         (8.17)

Proof. The condition (8.16) is fulfilled with probability 1 for each j > j_0, where j_0 = log_λ d is the solution of the following inequality
d/λ^j ≥ 2 u_max = 1.
Analogously as in (8.13), we obtain
|x_k − x| ≤ ∑_{j=0}^{j_0} (d/λ^j) λ^j + λ^{j_0+1}/(1 − λ),
which yields (8.17). We propose the following nonparametric kernel-like estimate of the nonlinear characteristic μ() at the given point x, exploiting the distance δ_k(x) between x_k and x, and having the form
# " δk (x) ∑Nk=1 yk · K h(N)
N (x) = # , " μ δk (x) ∑Nk=1 K h(N) where K() is the window kernel function of the form 1, as |v| 1 K(v) = . 0, elsewhere
(8.18)
(8.19)
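A compact sketch of (8.18)-(8.19) using the recursion (8.12) to update the weighted distance online; as noted just below, the 0/0 case of the ratio is resolved as 0.

```python
import numpy as np

def censored_sample_mean(x, u, y, lam, h):
    """Censored estimate (8.18) with the window kernel (8.19): average the
    outputs y_k over those k whose weighted distance delta_k(x) from
    (8.11)-(8.12) does not exceed the bandwidth h."""
    delta, selected, total = 0.0, 0, 0.0
    for uk, yk in zip(u, y):
        delta = lam * delta + abs(uk - x)   # recursion (8.12)
        if delta <= h:                      # window kernel (8.19)
            selected += 1
            total += yk
    return total / selected if selected > 0 else 0.0   # 0/0 treated as 0
```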
Since the estimate (8.18) is of the ratio form, we treat the case 0/0 as 0.

Theorem 8.3. If h(N) = d(N) log_λ d(N), where d(N) = N^{−γ(N)} and γ(N) = (log_{1/λ} N)^{−w}, then for each w ∈ (1/2, 1) the estimate (8.18) is consistent in the mean square sense, i.e., it holds that
lim_{N→∞} E(μ̂_N(x) − μ(x))² = 0.                                                (8.20)

Proof. Let us denote the probability of selection as p(N) ≜ P(Δ_k(x) ≤ h(N)). To prove (8.20) it suffices to show that (see (19) and (22) in [33])
h(N) → 0,                                                                        (8.21)
N p(N) → ∞,                                                                      (8.22)
as N → ∞. These assure vanishing of the bias and variance of μ̂_N(x), respectively. Since under the assumptions of Theorem 8.3
d(N) → 0 ⇒ h(N) → 0,                                                             (8.23)
in view of (8.17), the bias-condition (8.21) is obvious. For the variance-condition (8.22) we have
p(N) ≥ P{ ⋂_{j=0}^{min(k,j_0)} |u_{k−j} − x| < d(N)/λ^j } = ∏_{j=0}^{j_0} P{ |u_{k−j} − x| < d(N)/λ^j }
     ≥ ε d(N)/λ^0 · ε d(N)/λ^1 · ... · ε d(N)/λ^{j_0} = (ε d(N))^{j_0+1} / λ^{j_0(j_0+1)/2}
     = (ε d(N)/λ^{j_0/2})^{j_0+1} = ε · d(N)^{(1/2) log_λ d(N) + log_λ ε + 1/2}.   (8.24)
1
(8.25)
" #−w For γ (N) = log1/λ N and w ∈ 12 , 1 from (8.25) we simply conclude (8.22) and consequently (8.20). To establish the asymptotic rate of convergence we additionally assume that the nonlinear characteristic μ (x) is a Lipschitz function, i.e., it exists a positive constant l < ∞, such that for each xa , xb ∈ R it holds that |μ (xa ) − μ (xb )| l |xa − xb |. 0
N (x) = S1 ∑Si=1 For a window kernel (8.19) we can rewrite (8.18) as μ y[i] , where 0 " δ (x) # [i] [i]’s are indexes, for which K h(N) = 1, and S0 is a random number of semeasurements. For each y[i] , i = 1, 2, ..., S0 , respective x[i] is such that (lected output ( ( ( (x[i] − x( h(N) and consequently ( ( ( ( (μ (x[i] ) − μ (x)( lh(N), which for Ezk = 0 leads to ( ( ( (
N (x)| = (Ey[i] − μ (x)( = ((E μ (x[i] ) − μ (x)(( lh(N), |biasμ
N (x) = O h2 (N) . bias2 μ For the variance we have
(8.26)
' 1 n
N (x) = ∑ P(S0 = n) · var(μ
N (x)|S0 = n) = ∑ P(S0 = n) · var varμ ∑ y[i] . n i=1 n=0 n=1 N
N
&
Since, under strong law of large numbers and Chebychev inequality, it holds that limN→∞ P (S0 > α ES0 ) = 1 for each 0 < α < 1 (see [33]), we obtain asymptotically & ' 1 n
N (x) = ∑ P(S0 = n) · var varμ (8.27) ∑ y[i] n i=1 n>α ES0 with probability 1. Taking into account that y[i] = y[i] + z[i] , where y[i] and z[i] are independent random variables we obtain ' & ' & ' & 1 n 1 n 1 n (8.28) var ∑ y[i] = var n ∑ y[i] + var n ∑ z[i] . n i=1 i=1 i=1 Since the process z[i] is ergodic, under strong law of large numbers, it holds that & var
1 n ∑ z[i] n i=1
'
=O
1 N p(N)
1 =O . N
(8.29)
120
G. Mzyk
The process {y[i] } is in general not ergodic, but in consequence of (8.14) it has compact support [μ (x) − lh(N), μ (x) + lh(N)] and the following inequality holds & ' 1 n var (8.30) ∑ y[i] vary[i] (2lh(N))2 . n i=1 From (8.27), (8.28), (8.29) and (8.30) we conclude that
N (x) = O(h2 (N)), varμ
(8.31)
which in view of (8.26) leads to
N (x) − μ (x)| = O(h2 (N)) |μ
(8.32)
in the mean square sense. A relatively slow rate of convergence, guaranteed in a general case, for h(N) as in Theorem 8.3, is a consequence of small amount of a priori information. We emphasise that for, e.g., often met in applications
N (x) = 0 and piecewise constant functions μ (x), it exists N < ∞, such that bias2 μ # " 1 n
N (x) − μ (x)| = O( N1 ) as N → ∞ var n ∑i=1 y[i] = 0 for N > N, and consequently |μ (see (8.29)).
8.4 Combined Parametric-nonparametric Approach

The idea of the combined parametric-nonparametric approach to system identification was introduced by Hasiewicz and Mzyk in [24], and continued in [25], [33], [35], and [38]. The algorithms decompose the complex system identification task into independent identification problems for each component. The decomposition is based on the estimation of the interaction inputs x_k. Next, using the resulting pairs (u_k, x̂_k) and (x̂_k, y_k), both the linear dynamic and the static nonlinear subsystems are identified separately. In contrast to the Hammerstein system, where x^H_k = m(u_k) may be estimated directly by any regression estimation method, for the Wiener system the situation is more complicated, as x_k = ∑_{j=0}^{∞} λ_j u_{k−j}, and the impulse response of the linear dynamics must be estimated first, to provide indirect estimates of x_k.
8.4.1 Kernel Method with the Correlation-based Internal Signal Estimation

Here we assume that the input u_k is white and Gaussian, the nonlinear characteristic μ() is bounded by a polynomial of some finite order, cov(u_1, y_1) ≠ 0, and the linear dynamics is FIR with known order s, i.e. x_k = ∑_{j=0}^{s} λ_j u_{k−j}. Observe that E{y_k | x_k = x} = μ(x). Since the internal signal x_k cannot be measured, the following kernel regression estimate is proposed (see [44])
# " x− x ∑Nk=1 yk K h(N)k
(x) = # , " μ x− x ∑Nk=1 K h(N)k
121
(8.33)
where x k is indirect estimate of β xk (i.e. scaled xk ) x k =
s
∑ κ j uk− j
j=0
based on the input-output sample correlation (see (8.8)). The following theorem holds.
Theorem 8.4. [44] If K() is Lipschitz then μ̂(x) → μ(x/β) in probability as N → ∞. Moreover, if both μ() and K() are twice differentiable, then it holds that
|μ̂(x) − μ(x/β)| = O(N^{−2/5+ε})                                                 (8.34)
for any small ε > 0, provided that h(N) ∼ N^{−1/5}.

In practice, due to the assumed Gaussianity of excitations, the algorithm (8.33) is rather recommended for tasks in which the input process can be freely generated.
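A two-stage sketch of (8.33): the cross-correlations (8.8) yield the internal-signal estimates x̂_k, on which the outputs are then kernel-regressed. The Gaussian kernel is our illustrative choice.

```python
import numpy as np

def mu_hat_combined(x_grid, u, y, s, h,
                    kernel=lambda v: np.exp(-0.5 * v**2)):
    """Combined estimate (8.33): correlation-based xhat_k followed by a
    kernel regression of y_k on xhat_k."""
    N = len(u)
    # Stage 1: kappa_hat_p = (1/N) sum u_k y_{k+p}, cf. (8.8).
    kappa = np.array([np.mean(u[:N - p] * y[p:]) for p in range(s + 1)])
    # Stage 2: xhat_k = sum_j kappa_hat_j u_{k-j}, an FIR filter of order s.
    xhat = np.convolve(u, kappa)[:N]
    est = np.empty(len(x_grid))
    for i, xx in enumerate(x_grid):
        Kv = kernel((xx - xhat) / h)
        est[i] = np.dot(Kv, y) / Kv.sum()
    return est
```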
8.4.2 Identification of IIR Wiener Systems with Non-Gaussian Input

The kernel-type algorithm (8.18) presented above is applied in this section to support the estimation of parameters when our prior knowledge about the system is large and, in particular, the parametric model of the characteristic is known. Assume that we are given the class μ(x, c), c = (c_1, c_2, ..., c_m)^T, to which μ(x) belongs, i.e. for the vector of true parameters c* = (c*_1, c*_2, ..., c*_m)^T it holds that μ(x, c*) = μ(x). Let moreover the function μ(x, c) be by assumption differentiable with respect to c, and the gradient ∇_c μ(x, c) be bounded in some convex neighbourhood of c* for each x. We assume that c* is identifiable, i.e., there exists a sequence x^(1), x^(2), ..., x^(N_0) of estimation points such that
μ(x^(i), c) = μ(x^(i)), i = 1, 2, ..., N_0  ⟹  c = c*.
The proposed estimate has two steps.
Step 1. For the sequence x^(1), x^(2), ..., x^(N_0) compute the N_0 pairs
{(x^(i), μ̂_N(x^(i)))}_{i=1}^{N_0},
using the estimate (8.18).
Step 2. Perform the minimisation of the cost function
Q_{N_0,N}(c) = ∑_{i=1}^{N_0} (μ̂_N(x^(i)) − μ(x^(i), c))²,
with respect to the variable vector c, and take
ĉ_{N_0,N} = arg min_c Q_{N_0,N}(c)                                               (8.35)
as the estimate of c*.

Theorem 8.5. Since in Step 1 (nonparametric) for the estimate (8.18) it holds that μ̂_N(x^(i)) → μ(x^(i)) in probability as N → ∞ for each i = 1, 2, ..., N_0, thus
ĉ_{N_0,N} → c* in probability, as N → ∞.

Proof. The proof is analogous to that of Theorem 1 in [25].
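A minimal sketch of this two-step procedure, assuming the pilot estimate μ̂_N of (8.18) and the parametric class μ(x, c) are supplied as callables (`mu_hat_N` and `mu_model` are our names):

```python
import numpy as np
from scipy.optimize import least_squares

def two_stage_fit(x_pts, mu_hat_N, mu_model, c0):
    """Step 1: nonparametric pilot values on the grid of estimation points;
    Step 2: least-squares fit of the parametric class, i.e. the
    minimisation (8.35)."""
    targets = np.array([mu_hat_N(x) for x in x_pts])          # Step 1
    res = least_squares(                                      # Step 2
        lambda c: np.array([mu_model(x, c) for x in x_pts]) - targets, c0)
    return res.x
```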
8.4.3 Recent Ideas

An interesting new approach to the impulse response estimation of the linear block in the Wiener system is presented in [44]. It is assumed that the input probability density f(u) has compact support, both μ() and f() have continuous derivatives, and the linear dynamics is FIR with known order s. We emphasise that, similarly as in the correlation-based algorithm (see Section 8.3.2), the characteristic μ() need not be invertible, and moreover the input density is not assumed to be Gaussian. The idea follows from the chain rule. Introducing the vectors u_k = (u_k, u_{k−1}, ..., u_{k−s})^T and λ = (λ_0, λ_1, ..., λ_s)^T one can describe the Wiener system by the following formula
y_k = F(u_k) + z_k, where F(u_k) = μ(λ^T u_k).
Let D_F(u) be the gradient of F(). It holds that D_F(u_k) = μ′(λ^T u_k) λ and consequently
E{D_F(u_k)} = c_0 λ, where c_0 = E{μ′(λ^T u_k)}.                                  (8.36)
This leads to the idea of estimating the scaled vector λ, including the true elements of the impulse response, by gradient averaging. Since for a given u_k, μ′(λ^T u_k) is unknown, D_F(u_k) cannot be computed directly. Introducing f_u() – the joint probability density of u_k – the property (8.36) can be transformed to the more applicable form ([44])
E{y_k D_f(u_k)} = c_1 λ, where c_1 = (1/2) E{f_u(u_k) μ′(λ^T u_k)},
and D_f(u) is the gradient of f_u(). Since for white u_k we have f_u(u_k) = ∏_{j=0}^{s} f(u_{k−j}), this leads to the following scalar estimates of the impulse response
λ̂_j = (1/N) ∑_{k=1}^{N} y_k d_{f,j}(u_k), where d_{f,j}(u_k) = D_f(u_k)[j] = f′(u_{k−j}) ∏_{i=0, i≠j}^{s} f(u_{k−i}).   (8.37)
If the input probability density function f(u) is unknown, it can be simply estimated, e.g., by the kernel method. An open question is the generalisation of the approach to IIR linear subsystems and correlated input cases.
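A sketch of the gradient-averaging estimates (8.37), assuming the input density f and its derivative f′ are given (or kernel-estimated) as array-accepting callables; the loop structure is our own illustrative choice.

```python
import numpy as np

def lambda_hat(u, y, s, f, f_prime):
    """Gradient-averaging estimates (8.37): lambda_hat_j is the sample mean
    of y_k * d_{f,j}(u_k), with d_{f,j} the j-th coordinate of the gradient
    of the joint input density."""
    N = len(u)
    lam = np.zeros(s + 1)
    for k in range(s, N):
        uk = u[k - s:k + 1][::-1]        # (u_k, u_{k-1}, ..., u_{k-s})
        dens = f(uk)                     # f evaluated coordinatewise
        for j in range(s + 1):
            d_fj = f_prime(uk[j]) * np.prod(np.delete(dens, j))
            lam[j] += y[k] * d_fj
    return lam / N
```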
8.5 Conclusion

The principal question in the Wiener system identification problem is the selection of an adequate method. The scope of application of each estimate is limited by the specific set of associated assumptions. Most of them require an a priori known parametric type of model, Gaussian input, FIR dynamics, or an invertible characteristic. In fact, the authors address particular cases, and the problems they solve are quite different (see the references below). Since the general Wiener system identification problem includes many difficult aspects, the existence of one universal algorithm cannot be expected. In the light of this, the nonparametric approach seems to be a good tool, which allows for combining selected methods, depending on the specificity of the particular task. Moreover, pure nonparametric estimates are the only possible choice when the prior knowledge of the system is poor.
References
1. Bai, E.W.: A blind approach to Hammerstein–Wiener model identification. Automatica 38, 969–979 (2002)
2. Bai, E.W.: Frequency domain identification of Wiener models. Automatica 39, 1521–1530 (2003)
3. Bai, E.W., Reyland, J.: Towards identification of Wiener systems with the least amount of a priori information on the nonlinearity. Automatica 44, 910–919 (2008)
4. Bai, E.W., Reyland, J.: Towards identification of Wiener systems with the least amount of a priori information: IIR cases. Automatica 45(4), 956–964 (2009)
5. Bershad, N.J., Bouchired, S., Castanie, F.: Stochastic analysis of adaptive gradient identification of Wiener–Hammerstein systems for Gaussian inputs. IEEE Transactions on Signal Processing 48(2), 557–560 (2000)
6. Bershad, N.J., Celka, P., Vesin, J.M.: Analysis of stochastic gradient tracking of time-varying polynomial Wiener systems. IEEE Transactions on Signal Processing 48(6), 1676–1686 (2000)
7. Billings, S.A., Fakhouri, S.Y.: Identification of nonlinear systems using the Wiener model. Automatica 13(17), 502–504 (1977)
8. Billings, S.A., Fakhouri, S.Y.: Identification of systems containing linear dynamic and static nonlinear elements. Automatica 18(1), 15–26 (1982)
9. Celka, P., Bershad, N.J., Vesin, J.M.: Fluctuation analysis of stochastic gradient identification of polynomial Wiener systems. IEEE Transactions on Signal Processing 48(6), 1820–1825 (2000)
10. Celka, P., Bershad, N.J., Vesin, J.M.: Stochastic gradient identification of polynomial Wiener systems: analysis and application. IEEE Transactions on Signal Processing 49(2), 301–313 (2001)
11. Chen, H.F.: Recursive identification for Wiener model with discontinuous piece-wise linear function. IEEE Transactions on Automatic Control 51(3), 390–400 (2006)
12. Giri, F., Rochdi, Y., Chaoui, F.Z.: An analytic geometry approach to Wiener system frequency identification. IEEE Transactions on Automatic Control 54(4), 683–696 (2009)
13. Giannakis, G.B., Serpedin, E.: A bibliography on nonlinear system identification. Signal Processing 81, 533–580 (2001)
14. Greblicki, W.: Nonparametric identification of Wiener systems. IEEE Transactions on Information Theory 38, 1487–1493 (1992)
15. Greblicki, W.: Nonparametric identification of Wiener systems by orthogonal series. IEEE Transactions on Automatic Control 39(10), 2077–2086 (1994)
16. Greblicki, W.: Nonparametric approach to Wiener system identification. IEEE Transactions on Circuits and Systems – I: Fundamental Theory and Applications 44(6), 538–545 (1997)
17. Greblicki, W.: Continuous-time Wiener system identification. IEEE Transactions on Automatic Control 43(10), 1488–1493 (1998)
18. Greblicki, W.: Recursive identification of Wiener systems. International Journal of Applied Mathematics and Computer Science 11(4), 977–991 (2001)
19. Greblicki, W.: Nonlinearity recovering in Wiener system driven with correlated signal. IEEE Transactions on Automatic Control 49(10), 1805–1810 (2004)
20. Greblicki, W., Pawlak, M.: Identification of discrete Hammerstein systems using kernel regression estimates. IEEE Transactions on Automatic Control 31, 74–77 (1986)
21. Greblicki, W., Pawlak, M.: Nonparametric System Identification. Cambridge University Press, Cambridge (2008)
22. Greblicki, W., Mzyk, G.: Semiparametric approach to Hammerstein system identification. In: Proceedings of the 15th IFAC Symposium on System Identification, Saint-Malo, France, July 6-8, pp. 1680–1685 (2009)
23. Hasiewicz, Z.: Identification of a linear system observed through zero-memory nonlinearity. International Journal of Systems Science 18, 1595–1607 (1987)
24. Hasiewicz, Z., Mzyk, G.: Combined parametric-nonparametric identification of Hammerstein systems. IEEE Transactions on Automatic Control 49, 1370–1376 (2004)
25. Hasiewicz, Z., Mzyk, G.: Hammerstein system identification by non-parametric instrumental variables. International Journal of Control 82(3), 440–455 (2009)
26. Hasiewicz, Z., Śliwiński, P., Mzyk, G.: Nonlinear system identification under various prior knowledge. In: Proceedings of the 17th World Congress of the IFAC, Seoul, Korea, pp. 7849–7858 (2008)
27. Hagenblad, A., Ljung, L., Wills, A.: Maximum likelihood identification of Wiener models. Automatica 44, 2697–2705 (2008)
28. Hu, X.L., Chen, H.F.: Strong consistence of recursive identification of Wiener systems. Automatica 41, 1905–1916 (2005)
29. Hughes, M.C., Westwick, D.T.: Identification of IIR Wiener systems with spline nonlinearities that have variable knots. IEEE Transactions on Automatic Control 50(10), 1617–1622 (2005)
30. Hunter, I.W., Korenberg, M.J.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics 55, 135–144 (1986)
31. Korenberg, M.J., Hunter, I.W.: The identification of nonlinear biological systems: LNL cascade models. Biological Cybernetics 55, 125–134 (1986)
32. Lacy, S.L., Bernstein, D.S.: Identification of FIR Wiener systems with unknown, noninvertible, polynomial non-linearities. International Journal of Control 76(15), 1500–1507 (2003)
33. Mzyk, G.: A censored sample mean approach to nonparametric identification of nonlinearities in Wiener systems. IEEE Transactions on Circuits and Systems – II: Express Briefs 54(10), 897–901 (2007)
34. Mzyk, G.: Generalized kernel regression estimate for the identification of Hammerstein systems. International Journal of Applied Mathematics and Computer Science 17(2), 101–109 (2007)
35. Mzyk, G.: Nonlinearity recovering in Hammerstein system from short measurement sequence. IEEE Signal Processing Letters 16(9), 762–765 (2009)
36. Mzyk, G.: Kernel-type identification of IIR Wiener systems with non-Gaussian input. IEEE Transactions on Signal Processing (2010)
37. Nadaraya, E.A.: On estimating regression. Theory of Probability and its Applications 9, 141–142 (1964)
38. Pawlak, M., Hasiewicz, Z., Wachel, P.: On nonparametric identification of Wiener systems. IEEE Transactions on Signal Processing 55(2), 482–492 (2007)
39. Pupeikis, R.: On the identification of Wiener systems having saturation-like functions with positive slopes. Informatica 16(1), 131–144 (2005)
40. Rafajłowicz, E.: Non-parametric identification with errors in independent variables. International Journal of Systems Science 25(9), 1513–1520 (1994)
41. Vanbeylen, L., Pintelon, R., Schoukens, J.: Blind maximum-likelihood identification of Wiener systems. IEEE Transactions on Signal Processing 57(8), 3017–3029 (2009)
42. Vandersteen, G., Schoukens, J.: Measurement and identification of nonlinear systems consisting of linear dynamic blocks and one static nonlinearity. IEEE Transactions on Automatic Control 44(6), 1266–1271 (1999)
43. Vörös, J.: Parameter identification of Wiener systems with multisegment piecewise-linear nonlinearities. Systems and Control Letters 56, 99–105 (2007)
44. Wachel, P.: Parametric-Nonparametric Identification of Wiener Systems. PhD Thesis (in Polish), Wrocław University of Technology, Poland (2008), http://diuna.iiar.pwr.wroc.pl/wachel/rozprawa.pdf
45. Westwick, D., Verhaegen, M.: Identifying MIMO Wiener systems using subspace model identification methods. Signal Processing 52, 235–258 (1996)
46. Wiener, N.: Nonlinear Problems in Random Theory. Wiley, New York (1958)
47. Wigren, T.: Convergence analysis of recursive identification algorithms based on the nonlinear Wiener model. IEEE Transactions on Automatic Control 39, 2191–2206 (1994)
48. Zhao, Y., Wang, L.Y., Yin, G.G., Zhang, J.F.: Identification of Wiener systems with binary-valued output observations. Automatica 43, 1752–1765 (2007)
Chapter 9
Identification of Block-oriented Systems: Nonparametric and Semiparametric Inference M. Pawlak
9.1 Introduction Block-oriented nonlinear systems are represented by a certain composition of linear dynamical and nonlinear static models. Hence, a block-oriented system is defined by the pair (λ , m(•)), where λ defines infinite-dimensional parameter representing impulse response sequences of linear dynamical subsystems, whereas m(•) is a vector of nonparametric multidimensional functions describing nonlinear elements. In the parametric identification approach to block-oriented systems one assumes that both λ and m(•) are known up to unknown finite dimensional parameters, i.e., λ = λ (ϑ ) and m(•) = m(•; ζ ) for ϑ , ζ being finite dimensional unknown parameters. There are numerous identification algorithms for estimating ϑ , ζ representing specific block-oriented systems, see [5] for an overview of the subject. Although such methods are quite accurate, it is well known, however, that parametric models carry a risk of incorrect model specification. On the other hand, in the nonparametric setting λ and m(•) are completely unspecified and therefore the corresponding nonparametric block-oriented system does not suffer from risk of misspecification. Nevertheless, since nonparametric estimation algorithms make virtually no assumptions about the form of (λ , m(•)) they tend to be slower to converge to the true characteristics of a block-oriented system than correctly specified parametric algorithms. Moreover, the convergence rate of nonparametric methods is inversely proportional to the dimensionality of input and interconnecting signals. This is commonly referred to as the “curse of dimensionality”. Nonparametric methods have attracted a great deal of attention in the statistical science, see [14] for a recent overview and a list of a large number of texts written by statisticians. The number of texts on nonparametric algorithms tailored to the needs of engineering and system identification in particular is much smaller, see [6] and [10] for recent contributions. M. Pawlak Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, Manitoba, R3T5V6 Canada e-mail:
[email protected]
In practice, we can accept intermediate models which lie between the parametric and fully nonparametric cases. For these so-called semiparametric models we specify a parametric form for some part of the model, but we do not require the parametric assumption for the remaining parts of the model. Hence, the nonparametric description (λ, m(•)) of the system is now replaced by (θ, g(•)), where θ is a finite-dimensional vector and g(•) is a set of nonparametric nonlinearities, typically univariate functions. The parameter θ represents the characteristics of all linear dynamical subsystems and low-dimensional approximations of multivariate nonlinearities. The fundamental issue is to characterise the accuracy of approximation of (λ, m(•)) by the selected semiparametric model (θ, g(•)). This challenging problem will be addressed in this chapter in some specific cases. Once such a characterisation is resolved, we can make use of this low-complexity semiparametric representation to design practical identification algorithms that share the efficiency of parametric modelling while preserving the high flexibility of the nonparametric case. In fact, in many situations we are able to identify the linear and nonlinear parts of a block-oriented system under much weaker conditions on the system characteristics and underlying probability distributions.

A semiparametric inference is based on the concept of blending together parametric and nonparametric estimation methods. The basic idea is to first analyse the parametric part of the block-oriented structure as if all nonparametric parts were known. To eliminate the dependence of a parametric fitting criterion on the characteristics of the nonparametric parts, we form pilot nonparametric estimates of the characteristics, indexed by a finite-dimensional vector of the admissible values of the parameter. This is then used to establish a parametric fitting criterion (such as least squares) with random functions representing all estimated nonparametric characteristics. On the other hand, nonparametric estimates of the characteristics use estimates of the parametric part. As a result of this interchange, we need some data resampling schemes in order to achieve some statistical independence between the estimators of the parametric and nonparametric parts of the system. This improves the efficiency of the estimates and facilitates the mathematical analysis immensely.

In Section 2 we examine sufficient conditions for the convergence of our identification algorithms for a general class of semiparametric block-oriented systems. This general theory is illustrated (Section 3) in the case of semiparametric versions of the multivariate Hammerstein system and of parallel systems. We show in this context that the semiparametric strategy leads to consistent estimates of (θ, g(•)) with rates which are independent of the signal dimensionality. These results are also verified in simulation studies. An overview of the theory and applications of semiparametric inference in statistics and economics can be found in [8], [15].
9.2 Nonparametric and Semiparametric Inference

Modern nonparametric inference provides a plethora of estimation methods allowing us to recover system characteristics with minimum knowledge about
their functional forms. This includes classical methods like k-nearest neighbours, kernel and series estimators. On the other hand, sparse basis functions, regularisation techniques, support vector machines, and boosting methods define modern alternatives [14], [9]. For a given set of training data D_N = {(X_1, Y_1), ..., (X_N, Y_N)} taken at the input and output of a certain system, a typical nonparametric estimate of a regression function m(x) = E{Y_t | X_t = x} can be written as
m̂_N(x) = ∑_{t=1}^{N} α_t Y_t K(x, X_t),                                          (9.1)
where K(x, v) is a kernel function and {α_t} is a normalising factor. For instance, in the classical kernel estimate α_t = (∑_{i=1}^{N} K(x, X_i))^{-1} for each t, where K(x, v) is a compactly supported kernel of the convolution type, i.e., K(x, v) = K(x − v). On the other hand, in support vector kernel machines, α_t is selected by the optimisation algorithm defined by the maximal-margin separation principle and the kernel function is of the inner product type (Mercer kernels), K(x, v) = ∑_l φ_l(x) φ_l(v). In order to achieve the desired consistency property, i.e., that m̂_N(x) tends to m(x) as N → ∞, the kernel function must be tuned locally. This can be achieved by introducing smoothing parameters that control the amount of local information employed in the estimation process. Consistency is an indispensable property and is met by most classical nonparametric techniques. Some modern techniques like support vector machines are not local since they use the entire training data in the design process. This can be partially overcome by using proper, so-called universal, kernels.

A serious limitation in the use of nonparametric estimators is the fact that their accuracy is sensitive to the dimensionality of the observed signals as well as to the smoothness of the estimated characteristics [14], [6]. To illustrate this point, let us consider the following multiple-input, single-output (MISO) nonlinear autoregressive model of order p:

$$Y_n = m(Y_{n-1}, Y_{n-2}, \ldots, Y_{n-p}, U_n) + Z_n, \qquad (9.2)$$

where U_n ∈ R^d is the input signal, Z_n is a noise process, and m(•, •) is a (p + d)-dimensional function. It is clear that m(•, •) is the regression function of Y_n on the past outputs Y_{n−1}, Y_{n−2}, ..., Y_{n−p} and the current input U_n. Thus, it is straightforward to form a multivariate nonparametric regression estimate such as the one in (9.1), where the signal X_t ∈ R^{p+d} is defined as X_t = (Y_{t−1}, Y_{t−2}, ..., Y_{t−p}, U_t). The convergence analysis, see [6], of such an estimate strongly depends on the stability conditions of the nonlinear recursive difference equation y_n = m(y_{n−1}, y_{n−2}, ..., y_{n−p}, u_n). In this respect, a fading-memory type assumption along with the Lipschitz continuity of m(•, •) seems to be sufficient for the consistency of nonparametric regression estimates. Hence, for m(•, •) being a Lipschitz continuous function, the best possible rate is $O_P(N^{-1/(2+p+d)})$, where O_P(•) denotes the rate in probability. For a second-order system (p = 2) with a double input (d = 2) this gives the very slow rate O_P(N^{−1/6}). This apparent curse of dimensionality also exists in the case of the MISO Hammerstein system, which will be examined in the next section. To overcome this problem, one can approximate the regression function m(x) = E{Y_t | X_t = x}, x ∈ R^q, by some low-dimensional structure. We are looking for a parsimonious semiparametric alternative which can be represented by a finite-dimensional parameter and a set of single-variable nonlinearities. The following is a simple semiparametric model:
μ (x) = g(θ T x),
(9.3)
where the function g : R → R and the parameter θ ∈ R^q are unknown and must be estimated. We note that g(•) is a function of a single variable, and thus the curse of dimensionality for the model μ(x) is removed. The model μ(x) is not uniquely defined. In fact, if g(•) is linear then we cannot identify θ. Moreover, the scaling of the vector θ does not influence the values of g(•). Hence, we need to normalise θ, either by setting one of its coefficients to one, e.g., θ₁ = 1, or by imposing the restriction ||θ|| = 1. We will denote the set of all such normalised values of θ by Θ. Often the vector of covariates X_t can be decomposed as X_t = (U_t, V_t), where U_t ∈ R^d has the interpretation of the input signal and V_t ∈ R^p consists of past values of the output signal. Then we can propose a few alternatives to the model in (9.3), e.g.,
μ (u, v) = ρ T v + g(γ T u),
(9.4)
where ρ ∈ R p and γ ∈ Rd are unknown parameters. This semiparametric model applied in (9.2) would result in a partially linear nonlinear system of the form Yn = ρ1Yn−1 + ρ2Yn−2 + · · · + ρ pYn−p + g(γ T Un ) + Zn .
(9.5)
The statistical inference for the model in (9.3), i.e., estimation of (θ , g(•)) requires the characterisation of the “true” characteristics (θ ∗ , g∗ (•)). This can be done by finding the optimal L2 projection of the original system onto the model defined in (9.3). Hence, we wish to minimise Q(θ , g(•)) = E{(Yt − g(θ T Xt ))2 }
(9.6)
with respect to θ ∈ Θ and g(•) such that E{g²(θᵀX_t)} < ∞. The minimiser of Q(θ, g(•)) will be denoted by (θ*, g*(•)). The minimisation of Q(θ, g(•)) over g(•) amounts to the minimisation of the conditional mean-square error; this is an L2 projection, and for a given θ ∈ Θ the solution is g(w; θ) = E{Y_t | θᵀX_t = w}. This is just the regression function of Y_t on θᵀX_t, i.e., the best predictor of the output signal Y_t by the projection of X_t onto the direction defined by the vector θ. Plugging this choice
into Q(θ, g(•)), i.e., calculating Q(θ, g(•; θ)), we readily obtain the following error function:

$$Q(\theta) = E\{(Y_t - g(\theta^T X_t; \theta))^2\} = E\{\mathrm{var}(Y_t \mid \theta^T X_t)\}. \qquad (9.7)$$
The minimiser of Q(θ) with respect to θ ∈ Θ defines the optimal θ* and, consequently, the corresponding optimal nonlinearity g*(w) = g(w; θ*). In practice, it is difficult to find an explicit formula for g(w; θ) and to characterise the choice of θ*. It is clear that the smoothness and shape of g(w; θ) are controlled by the smoothness of m(x) and the conditional distribution of X_t given θᵀX_t. To shed some light on this issue, let us consider a simple example.

Example 9.1. Let Y_t = m(X_{1t}, X_{2t}) + Z_t be the bivariate regression model with X_{1t} = U_t, X_{2t} = U_{t−1} and m(x₁, x₂) = x₁x₂. Assume that {U_t} is a zero-mean, unit-variance stationary Gaussian process with the correlation E{U_{t+τ} U_t} = ρ(τ). Let us also denote ρ = ρ(1). The noise process {Z_t} is assumed to be a stationary process with zero mean and unit variance, being, moreover, independent of {U_t}. Hence, we wish to determine the best L2 approximation of the system Y_t = U_t U_{t−1} + Z_t by a model of the form g(U_t + θU_{t−1}). The aforementioned discussion reveals that we first have to determine the regression function g(w; θ) = E{Y_t | U_t + θU_{t−1} = w} and next find the optimal θ* by minimising Q(θ) = E{var(Y_t | U_t + θU_{t−1})}. To do so, let us first note that the random vector (U_{t−1}, U_t + θU_{t−1}) has a bivariate Gaussian distribution with zero mean and covariance matrix

$$\begin{pmatrix} 1 & \rho + \theta \\ \rho + \theta & 1 + \theta^2 + 2\theta\rho \end{pmatrix}.$$

This fact and some algebra yield

$$g(w; \theta) = a(\theta) w^2 + b(\theta), \qquad (9.8)$$
where a(θ) = (ρ + θ(1 − θ))/(1 + θ² + 2θρ) and b(θ) = −θ(1 − ρ²)/(1 + θ² + 2θρ). Further algebra also leads to an explicit formula for the projection error Q(θ) for a given θ. In Figure 9.1 we depict Q(θ) as a function of θ for the values of the correlation coefficient ρ equal to 0, 0.4, 0.8. The dependence of Q(θ) on negative values of ρ is just a mirror reflection of the curves shown in Figure 9.1. Interestingly, we observe that in the case of ρ = 0 we have two values of θ minimising Q(θ), i.e., θ* = ±1/√3. When |ρ| is increasing, the optimal θ is unique and slowly decreases from θ* = 0.577 for ρ = 0 to θ* = 0.505 for ρ = 0.9. On the other hand, the value of the minimal error Q(θ*) decreases fast, from Q(θ*) = 0.75 for ρ = 0 to Q(θ*) = 0.067 for ρ = 0.9. Figure 9.2 shows the optimal nonlinearities g*(w) = g(w; θ*) corresponding to the values ρ = 0, 0.4, 0.8. It is worth noting that g*(w) for ρ = 0 is smaller than g*(w) for any ρ > 0. Similar relationships hold for ρ < 0. Thus, we can conclude that the best approximation (for ρ = 0) of the system Y_t = U_t U_{t−1} + Z_t by the class of models {g(U_t + θU_{t−1}) : θ ∈ Θ} is the model Y_t = g*(U_t + θ*U_{t−1}) + Z_t, where θ* = ±1/√3 and g*(w) = ±((√3 − 1)/4) w² ∓ √3/4. In the case when, e.g., ρ = 0.5 we obtain θ* = 0.532 and g*(w) = 0.412w² − 0.219.
Fig. 9.1: The projection error Q(θ ) versus θ for the values of the correlation coefficient ρ = 0, 0.4, 0.8
Fig. 9.2: The optimal nonlinearity g*(w) for the values of the correlation coefficient ρ = 0, 0.4, 0.8
We should also observe that our model represents the Wiener cascade system with the impulse response (1, θ*) and the nonlinearity g*(w). The fact that the correlation reduces the value of the projection error Q(θ) can be interpreted as follows: with increasing correlation between the input variables, the bivariate function m(U_t, U_{t−1}) = U_t U_{t−1} behaves like a function of a single variable. In fact, from (9.8) we have that b(θ) → 0 as |ρ| → 1. Thus far, we have discussed the preliminary aspects of semiparametric inference concerning the characterisation of the optimal characteristics (θ*, g*(•)) of the model in (9.3). Next, we wish to set up estimators of θ* and g*(w). If the regression function g(w; θ) = E{Y_t | θᵀX_t = w}, θ ∈ Θ, were known, then, due to (9.7), an obvious estimator of θ* would be a minimiser of the following empirical counterpart of Q(θ):
$$Q_N(\theta) = N^{-1} \sum_{t=1}^{N} (Y_t - g(\theta^T X_t; \theta))^2. \qquad (9.9)$$
Since g(w; θ) is unknown, this is not a feasible estimator. We can, however, estimate the regression function g(w; θ) by standard nonparametric methods like kernel algorithms, see (9.1). Let ĝ(w; θ) denote a nonparametric estimate of g(w; θ). As a concrete example we can use the classical kernel estimate

$$\hat g(w;\theta) = \sum_{t=1}^{N} Y_t K\big((w - \theta^T X_t)/b\big) \Big/ \sum_{l=1}^{N} K\big((w - \theta^T X_l)/b\big), \qquad (9.10)$$
where b is the bandwidth parameter that controls the accuracy of the estimate. In the limit, any reasonable nonparametric estimate ĝ(w; θ) is expected to tend to g(w; θ), which, in turn, satisfies the restriction g(w; θ*) = g*(w). Hence, substituting ĝ(w; θ) for g(w; θ) in (9.9), we obtain the following criterion depending solely on θ:

$$\hat Q_N(\theta) = N^{-1} \sum_{t=1}^{N} (Y_t - \hat g(\theta^T X_t; \theta))^2. \qquad (9.11)$$
It is now natural to define an estimate θˆ of θ ∗ as the minimiser of Qˆ N (θ ), i.e.,
$$\hat\theta = \arg\min_{\theta \in \Theta} \hat Q_N(\theta). \qquad (9.12)$$
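To make the procedure in (9.10)–(9.12) concrete, the following Python sketch implements the kernel estimate ĝ(w; θ), the criterion Q̂_N(θ), and a grid search over a normalised two-dimensional parameter set. All names and the data-generating model are hypothetical illustrations, not the author's implementation; the kernel is the compactly supported one used later in Example 9.2.

```python
import numpy as np

def ghat(w, theta, X, Y, b):
    """Kernel regression estimate of g(w; theta), cf. (9.10)."""
    u = (w - X @ theta) / b                     # scaled distances to the projections
    K = np.clip(1.0 - u**2, 0.0, None)**2       # kernel K(w) = (1 - w^2)^2, |w| <= 1
    s = K.sum()
    return (K @ Y) / s if s > 0 else 0.0

def Qhat(theta, X, Y, b):
    """Empirical least-squares criterion (9.11)."""
    W = X @ theta
    fitted = np.array([ghat(w, theta, X, Y, b) for w in W])
    return np.mean((Y - fitted)**2)

# toy single-index data: Y = g(theta*' X) + noise, with ||theta*|| = 1
rng = np.random.default_rng(0)
N = 400
X = rng.normal(size=(N, 2))
theta_star = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])
Y = np.arctan(2.0 * X @ theta_star) + 0.1 * rng.normal(size=N)

# grid search over Theta = {theta : ||theta|| = 1}, cf. (9.12)
b = N**(-1 / 5)                                 # bandwidth of order N^(-1/5)
angles = np.linspace(0.0, np.pi, 181)
crit = [Qhat(np.array([np.cos(a), np.sin(a)]), X, Y, b) for a in angles]
a_hat = angles[int(np.argmin(crit))]
theta_hat = np.array([np.cos(a_hat), np.sin(a_hat)])
```

Restricting the search to ||θ|| = 1 implements the normalisation of Θ discussed after (9.3).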
This approach may lead to an effective estimator of θ*, subject to some limitations. First, as we have already noted, we should be able to estimate the projection g(w; θ) for a given θ. In the context of block-oriented systems, the difficulty of this step depends on the complexity of the studied nonlinear system, i.e., whether the nonlinear components can be easily estimated as if the parametric part of the system were known. In the next section we will demonstrate that this is the case for MISO Hammerstein and parallel systems. Second, we must minimise the criterion Q̂_N(θ), which may be an expensive task, especially if θ is high-dimensional and if the gradient vector of Q̂_N(θ) is difficult to evaluate. To partially overcome these computational difficulties, we can use the following simplified iterative method (a code sketch follows the steps below):

Step 1: Select an initial θ̂⁽⁰⁾ and form ĝ(w; θ̂⁽⁰⁾).
Step 2: Minimise the criterion

$$\tilde Q_N(\theta) = N^{-1} \sum_{t=1}^{N} \big(Y_t - \hat g(\theta^T X_t; \hat\theta^{(0)})\big)^2$$

with respect to θ, and use the obtained value θ̂⁽¹⁾ to update ĝ(w; θ), i.e., go to Step 1 to get ĝ(w; θ̂⁽¹⁾).
Step 3: Iterate between the above two steps until a certain stopping rule is satisfied.
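The iteration admits an equally short sketch. It reuses ghat from the previous fragment (hypothetical names again) and freezes the pilot nonlinearity at the current parameter value, exactly as in Step 2.

```python
def Qtilde(theta, theta_old, X, Y, b):
    """Criterion of Step 2: the pilot nonlinearity is frozen at theta_old."""
    W = X @ theta
    fitted = np.array([ghat(w, theta_old, X, Y, b) for w in W])
    return np.mean((Y - fitted)**2)

def iterate_semiparametric(X, Y, b, theta0, n_iter=5):
    theta = theta0 / np.linalg.norm(theta0)
    angles = np.linspace(0.0, np.pi, 181)
    for _ in range(n_iter):                     # Step 3: alternate until stable
        crit = [Qtilde(np.array([np.cos(a), np.sin(a)]), theta, X, Y, b)
                for a in angles]
        a_best = angles[int(np.argmin(crit))]
        theta = np.array([np.cos(a_best), np.sin(a_best)])  # update, refresh ghat
    return theta
```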
Note that in the above algorithm the criterion Q̃_N(θ) has a weaker dependence on θ than the original criterion Q̂_N(θ). In fact, in Q̃_N(θ) the nonlinearity ĝ(w; θ) is already specified.

Concerning the recovery of the model's optimal nonlinearity g*(•), we can use the estimate θ̂ and plug it back into our pilot estimate ĝ(w; θ) to obtain ĝ(w; θ̂). This can define a nonparametric estimate of g*(•). Nevertheless, one can use any other nonparametric estimate g̃(•; θ) with θ replaced by θ̂. Yet another important issue is the problem of selecting a smoothing parameter, like the bandwidth b in the kernel estimate in (9.10), which tunes the nonparametric estimates ĝ(•; θ) and ĝ(•). The former estimate is used as a preliminary estimator of the projection g(•; θ), so that θ* can be estimated in the process of minimising Q̂_N(θ) in (9.11), whereas the latter is used as a final estimate of g*(•). Hence, we may be forced to select two separate smoothing parameters: one for ĝ(•; θ), and the other for ĝ(•). This can be done by augmenting the definition of Q̂_N(θ) in (9.11) with the smoothing parameter as an additional variable: we define Q̂_N(θ, b) and then minimise Q̂_N(θ, b) simultaneously with respect to θ and b. The bandwidth obtained in this process is primarily tailored to the selection of the estimate θ̂; whether it is also the proper choice for the accurate estimation of ĝ(•) is not quite clear, see [7] for an affirmative answer to this question in the context of the classical regression problem with i.i.d. data.

In order to establish consistency of the aforementioned estimates, we first need to show that the criterion Q̂_N(θ) in (9.11) tends (P), as N → ∞, to the limit criterion Q(θ) in (9.7) for a given θ ∈ Θ. This holds under fairly general conditions due to the law of large numbers. Furthermore, as we have already argued, we identify the optimal θ* with the minimum of Q(θ). Note, however, that Q̂_N(θ) is not a convex function of θ and therefore need not achieve a unique minimum. This, however, is of no serious importance for consistency, since we may weaken our requirement on the minimiser θ̂ of Q̂_N(θ) and define θ̂ as any estimator that nearly minimises Q̂_N(θ), i.e.,

$$\hat Q_N(\hat\theta) \le \inf_{\theta\in\Theta} \hat Q_N(\theta) + \delta_N, \qquad (9.13)$$

for any random sequence δ_N such that δ_N → 0 (P). It is clear that (9.13) implies Q̂_N(θ̂) ≤ Q̂_N(θ*) + δ_N, and this is sufficient for the convergence of θ̂ defined in (9.13) to θ*. Since θ̂ depends on the whole mapping θ → Q̂_N(θ), the convergence of θ̂ to θ* requires uniform consistency of the corresponding criterion function, i.e., we need

$$\sup_{\theta\in\Theta} \big|\hat Q_N(\theta) - Q(\theta)\big| \to 0 \quad (P).$$

This uniform convergence is the essential step in proving the convergence of θ̂ to θ*. It can be established by verifying that the class of functions {(y − ĝ(w; θ))² : θ ∈ Θ} satisfies a uniform law of large numbers, a property often referred to as the Glivenko–Cantelli property, see [13]. This along with the assumption that the limit criterion
Q(θ) is a continuous function on the compact set Θ ⊂ R^q, with θ* ∈ Θ, implies that for any sequence of estimators θ̂ satisfying (9.13) we have

$$\hat\theta \to \theta^* \quad (P) \text{ as } N \to \infty.$$
A related issue of interest pertaining to a given estimate is the rate of convergence, i.e., how fast the estimate tends to the true characteristic. Under differentiability of the mapping θ → (y − ĝ(w; θ))², we can consider the problem of the convergence rate. Hence, if Q(θ) admits a second-order Taylor expansion at θ = θ*, then for θ̂ defined in (9.13) with Nδ_N → 0 (P), we have

$$\hat\theta = \theta^* + O_P(N^{-1/2}). \qquad (9.14)$$
This is the usual parametric rate of convergence. Since θ̂ → θ*, it is reasonable to expect that the estimate ĝ(•; θ̂) converges to g(•; θ*) = g*(•). The following decomposition facilitates this claim:

$$\hat g(\bullet;\hat\theta) - g^*(\bullet) = \{\hat g(\bullet;\hat\theta) - \hat g(\bullet;\theta^*)\} + \{\hat g(\bullet;\theta^*) - g^*(\bullet)\}. \qquad (9.15)$$
The convergence (P) of the second term to zero in the above decomposition represents a classical problem in nonparametric estimation. The difficulty of establishing this convergence depends on the nature of the dependence between random signals within the underlying system. Concerning the first term in (9.15), we can apply the linearisation technique, i.e.,

$$\hat g(\bullet;\hat\theta) - \hat g(\bullet;\theta^*) = \left[\frac{\partial \hat g(\bullet;\theta)}{\partial\theta}\Big|_{\theta=\theta^*}\right]^{T} (\hat\theta - \theta^*) + o_P(\hat\theta - \theta^*).$$
To show the convergence (P) of the first term to zero, it suffices to prove that the derivative has a finite limit (P). This fact can be directly verified for a specific estimate ĝ(•; θ) of g(•; θ). Hence, the statistical accuracy of ĝ(•; θ̂) is determined by the second term of the decomposition in (9.15). Since for typical nonparametric estimates we have ĝ(•; θ*) = g*(•) + O_P(N^{−β}), where 1/3 ≤ β < 1, we obtain

$$\hat g(\bullet;\hat\theta) = g^*(\bullet) + O_P(N^{-\beta}). \qquad (9.16)$$

For instance, if g*(•) is a Lipschitz continuous function and ĝ(•; θ̂) is the kernel nonparametric estimate, then (9.16) holds with β = 1/3. For twice differentiable nonlinearities, it holds with β = 2/5.

The criterion Q̂_N(θ) in (9.11) utilises the same data to form the pilot nonparametric estimate ĝ(w; θ) and to define Q̂_N(θ). This is generally not a good strategy, and some form of resampling scheme should be applied in order to separate the measurements into a testing sequence (used to form Q̂_N(θ)) and a training sequence (used to form the estimate ĝ(w; θ)). Such separation facilitates not only the mathematical analysis of the estimation algorithms, but also gives a desirable separation of the parametric and nonparametric estimation problems, which allows one to evaluate the parametric and nonparametric estimates more precisely. One such strategy is the leave-one-out method, which modifies Q̂_N(θ) as follows:
$$\bar Q_N(\theta) = N^{-1} \sum_{t=1}^{N} (Y_t - \hat g_t(\theta^T X_t; \theta))^2, \qquad (9.17)$$
where ĝ_t(w; θ) is the version of the estimate ĝ(w; θ) with the training data pair (X_t, Y_t) omitted from the calculation. For instance, in the case of the kernel estimate in (9.10) this takes the form

$$\hat g_t(w;\theta) = \sum_{i \neq t} Y_i K\big((w - \theta^T X_i)/b\big) \Big/ \sum_{l \neq t} K\big((w - \theta^T X_l)/b\big).$$

Yet another efficient resampling scheme is based on the partition strategy, which reorganises the set of training data D_N into two non-overlapping subsets that are as weakly dependent as possible. Hence, we define two non-overlapping subsets D_{N,1}, D_{N,2} of the training set D_N such that D_{N,1} is used to estimate the regression function g(w; θ), whereas D_{N,2} is used as a testing sequence to form the least-squares criterion Q̂_N(θ) in (9.11). There are various strategies to split the data for the efficient estimation of θ* and g*(•). The machine-learning principle says that the testing sequence D_{N,2} should consist (if feasible) of independent observations, whereas the training sequence D_{N,1} can be arbitrary.
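The leave-one-out criterion (9.17) is also easy to realise; a minimal sketch (hypothetical names, same kernel as in the earlier fragments):

```python
import numpy as np

def ghat_loo(t, theta, X, Y, b):
    """Leave-one-out kernel estimate ghat_t evaluated at w = theta' X_t."""
    keep = np.arange(len(Y)) != t               # omit the training pair (X_t, Y_t)
    u = (X[t] @ theta - X[keep] @ theta) / b
    K = np.clip(1.0 - u**2, 0.0, None)**2
    s = K.sum()
    return (K @ Y[keep]) / s if s > 0 else 0.0

def Qbar(theta, X, Y, b):
    """Leave-one-out criterion (9.17)."""
    fitted = np.array([ghat_loo(t, theta, X, Y, b) for t in range(len(Y))])
    return np.mean((Y - fitted)**2)
```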
9.3 Semiparametric Block-oriented Systems

In this section, we will illustrate the nonparametric/semiparametric methodology developed in Section 2 by examining a few important cases of block-oriented systems. This includes MISO Hammerstein and parallel systems.
9.3.1 Semiparametric Hammerstein Systems

Let us begin with the multiple-input, single-output (MISO) Hammerstein system depicted in Figure 9.3. The fully nonparametric Hammerstein system is given by the following input-output relationship:

$$Y_n = \Lambda(z) V_n + H(z) e_n, \qquad V_n = m(U_n), \qquad (9.18)$$
where Λ(z) is a causal transfer function defined as $\Lambda(z) = \sum_{i=0}^{\infty} \lambda_i z^{-i}$, with $z^{-1}$ being the backward-shift operator.
Fig. 9.3: MISO Hammerstein system
Moreover, H(z) is a stable and inversely stable noise model driven by a white process {e_n}. The system is excited by the d-dimensional input U_n, which throughout the paper is assumed to be a sequence of i.i.d. random vectors. The output of the linear dynamic subsystem is corrupted by an additive noise Z_n, independent of {U_n}. The system nonlinearity m(•) is a nonparametric function defined on R^d. It is known, see [6], that if Λ(∞) = 1 and E{m(U_n)} = 0, then m(u) = E{Y_n | U_n = u}. This fact holds for any correlated noise process and allows us to recover m(•) by applying nonparametric regression estimates such as those defined in (9.1). Let m̂_N(•) be such an estimate based on the training data D_N = {(U_1, Y_1), ..., (U_N, Y_N)}. It can be demonstrated (under common smoothness conditions on m(u)) that for a large class of nonparametric regression estimates, see [6], we have

$$\hat m(u) = m(u) + O_P\big(N^{-2/(d+4)}\big). \qquad (9.19)$$

Hence, the estimates suffer from the curse of dimensionality, since the rate of convergence gets slower as d increases. It is also worth noting that the linear part of the system, Λ(z), can be recovered via the correlation method independently of the form of the system nonlinearity and the noise structure, see [6]. This defines a fully nonparametric identification strategy for the MISO Hammerstein system. The statistical accuracy of such estimation algorithms is, however, rather low due to the generality of the problem. In many practical situations, and due to the inherent complexity of the nonparametric Hammerstein system, it is sufficient to resort to the following semiparametric alternative of (9.18) (see Figure 9.4):

$$Y_n = \Lambda(z;\lambda) V_n + H(z;\lambda) e_n, \qquad V_n = g(\gamma^T U_n), \qquad (9.20)$$
where Λ(z; λ) and H(z; λ) are parametrised rational transfer functions. The function g(•) and the d-dimensional parameter γ define the one-dimensional semiparametric approximation of m(•) which was already introduced in Section 2, see (9.3). Note that the class of dynamical systems represented by rational transfer functions covers a wide range of linear autoregressive and moving-average processes. Hence, the semiparametric model in (9.20) is characterised by the pair (θ, g(•)), where θ = (λ, γ). Since the identifiability of the model requires that Λ(∞; λ) = 1
Fig. 9.4: Semiparametric MISO Hammerstein model
and γ₁ = 1, we can define the parameter space as Θ = {(λ, γ) : Λ(∞; λ) = 1, γ₁ = 1}, such that Θ is a compact subset of R^{p+d}, where p is the dimensionality of λ. In order to develop constructive identification algorithms, let us define the concept of the true Hammerstein system corresponding to (9.20). We may assume without loss of generality that the true system has the form (9.20); it will be marked by the asterisk sign, i.e., the true system is defined as
$$Y_n = \Lambda(z;\lambda^*) V_n + H(z;\lambda^*) e_n, \qquad V_n = g^*(\gamma^{*T} U_n), \qquad (9.21)$$
where it is natural to expect that θ* ∈ Θ. Since the system is linear between the signal V_n and the output Y_n, we can recall, see [11], that the one-step-ahead prediction error for a given θ ∈ Θ is given by
εn (θ ) = H −1 (z; λ ) [Yn − Λ (z; λ )Vn (γ )] ,
(9.22)
where V_n(γ) is the counterpart of the true signal V_n corresponding to the value γ. Under our normalisation we note that, for a given γᵀU_n, the best L2 predictor of V_n(γ) is the regression E{V_n(γ) | γᵀU_n} = E{Y_n | γᵀU_n}. Hence, let g(w; γ) = E{Y_n | γᵀU_n = w} be the regression function predicting the unobserved signal V_n(γ). It is worth noting that g(w; γ*) = g*(w). All these considerations lead to the following form of (9.22):

$$\varepsilon_n(\theta) = H^{-1}(z;\lambda)\left[ Y_n - \Lambda(z;\lambda)\, g(\gamma^T U_n; \gamma) \right]. \qquad (9.23)$$

Reasoning as in Section 2, we can readily form a selection criterion for estimating θ*:

$$Q_N(\theta) = N^{-1} \sum_{n=1}^{N} \varepsilon_n^2(\theta).$$
This is a direct counterpart of the criterion defined in (9.9). As we have already noted, the regression g(w; γ) is unknown but can be directly estimated by nonparametric regression estimates. For example, using the kernel method, see (9.10),

$$\hat g(w;\gamma) = \sum_{n=1}^{N} Y_n K\big((w - \gamma^T U_n)/b\big) \Big/ \sum_{l=1}^{N} K\big((w - \gamma^T U_l)/b\big). \qquad (9.24)$$
Using this or any other nonparametric regression estimate in Q_N(θ), we can form our final selection criterion for estimating θ*:

$$\hat Q_N(\theta) = N^{-1} \sum_{n=1}^{N} \hat\varepsilon_n^2(\theta), \qquad (9.25)$$
where ε̂_n(θ) is the version of (9.23) with g(γᵀU_n; γ) replaced by ĝ(γᵀU_n; γ). The minimiser of Q̂_N(θ) = Q̂_N(λ, γ) defines an estimate (λ̂, γ̂) of (λ*, γ*). Following the reasoning from Section 2, we can show that (λ̂, γ̂) tends (P) to (λ*, γ*). Furthermore, under additional mild conditions we find that (λ̂, γ̂) converges at the optimal O_P(N^{−1/2}) rate.

It is worth noting that if the linear subsystem of the Hammerstein structure is of the moving-average type of order p, i.e., $\Lambda(z;\lambda^*) = 1 + \sum_{i=1}^{p} \lambda_i^* z^{-i}$, then we can estimate Λ(z; λ*) (independently of g*(•) and γ*) via the correlation method, see [6]. In fact, for a given function η : R^d → R such that E{η(U_n)} = 0 and E{η(U_n) g*(W_n)} ≠ 0, we have the following estimate of λ*:

$$\tilde\lambda_t = \frac{N^{-1}\sum_{i=1}^{N-t} Y_{t+i}\,\eta(U_i)}{N^{-1}\sum_{i=1}^{N} Y_i\,\eta(U_i)}, \qquad t = 1, \ldots, p.$$
This applied in (9.25) gives the simplified least-squares criterion for selecting γ:

$$\hat Q_N(\gamma) = N^{-1} \sum_{i=p+1}^{N} \Big( Y_i - \sum_{t=0}^{p} \tilde\lambda_t\, \hat g\big(\gamma^T U_{i-t}; \gamma\big) \Big)^2, \qquad (9.26)$$

where λ̃₀ = 1 by the adopted normalisation.
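A sketch of the correlation estimate of the moving-average coefficients (hypothetical names; here η is simply the centred first coordinate of the input, one admissible choice whenever E{η(U_n)g*(W_n)} ≠ 0):

```python
import numpy as np

def lambda_tilde(Y, U, p):
    """Correlation estimates of lambda_1, ..., lambda_p for a Hammerstein
    system with an FIR linear part, cf. the ratio formula above."""
    N = len(Y)
    h = U[:, 0] - U[:, 0].mean()                # eta(U_i): centred first coordinate
    den = np.sum(Y * h)                          # proportional to E{Y_i eta(U_i)}
    return np.array([np.sum(Y[t:] * h[:N - t]) / den for t in range(1, p + 1)])
```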
Once the parametric part of the Hammerstein system is obtained, one can define the following nonparametric estimate of the system nonlinearity: ĝ(w) = ĝ(w; γ̂), where ĝ(w; γ) is any consistent nonparametric estimate of g(w; γ) and γ̂ is the minimiser of Q̂_N(λ, γ). Recalling the arguments given in Section 2, we can conclude that if g*(w) is twice differentiable and if we select the bandwidth as b = cN^{−1/5}, then we have ĝ(w) = g*(w) + O_P(N^{−2/5}). This rate is independent of the dimensionality of the input signal and is known to be optimal [14]. It should be contrasted with the accuracy of fully nonparametric Hammerstein system identification, see (9.19). The bandwidth choice is critical for the precision of the kernel estimate. The choice b = cN^{−1/5} is only asymptotically optimal, and in practice one would like to specify b depending on the data at hand. One possibility, as we have already pointed out, would be to extend the criterion Q̂_N(θ) in (9.25) and include b in the minimisation process; hence we would have the modified criterion Q̂_N(θ, b).
It is worth noting that in the above two-step scheme the estimate ĝ(w; γ) in (9.25) and the criterion function Q̂_N(θ) share the same training data. This is usually not the recommended strategy, since it may lead to estimates with unacceptably large variance. Indeed, some resampling scheme would be useful here, partitioning the training data into testing and training sequences. The former should be used to form the criterion Q̂_N(λ, γ), whereas the latter should be used to obtain the nonparametric estimate ĝ(w; γ). The aforementioned concepts are illustrated in the following simulation example; see also [12] for further details.

Example 9.2. In our simulation example, the d-dimensional input signal U_n is generated according to an uncorrelated Gaussian distribution N_d(0, σ²I). We assume that the actual system can be exactly represented by the semiparametric model, with the characteristics γ = (cos(θ), sin(θ)/√(d−1), ..., sin(θ)/√(d−1))ᵀ and g(w) = 0.7 arctan(βw). Note that with this parametrisation ||γ|| = 1. The true value of γ corresponds to θ* = π/4. The slope parameter β defining g(w) is varied in some experiments; note that a large β gives a nonlinearity with a very rapid change at w = 0. The FIR(3) linear subsystem is used, with the transfer function Λ*(z) = 1 + 0.8z⁻¹ − 0.6z⁻² + 0.4z⁻³. The noise Z_t is N(0, 0.1). In our simulation examples we generate L different independent training sets and determine the estimates γ̂ and ĝ(·) described in this section. The local linear kernel estimate with the kernel function K(w) = (1 − w²)², |w| ≤ 1, was employed. In implementing the kernel estimate, the window length b was selected simultaneously with the choice of γ. Furthermore, in the partition resampling strategy the size of the training subset was set to 55% of the complete training data of size n = 150. It is also worth noting that the optimal b needed for estimating the preliminary regression estimate ĝ(w; γ) has been observed to differ from that required for the final estimate ĝ(w). Figure 9.5 shows the mean squared error (MSE) of γ̂ versus the parameter β. Figure 9.6 shows the same dependence for the mean integrated squared error (MISE) of ĝ(·). In both figures we have d = 2. We observe little influence of the complexity of the nonlinearity g(w) on the accuracy of the estimate γ̂. This is not the case for estimating g(w): clearly, a faster-changing function is harder to estimate than one that changes slowly. Figures 9.7 and 9.8 show the influence of the input dimensionality on the accuracy of γ̂ and ĝ(·). The slope parameter was set to β = 2. As d varies from d = 2 to d = 10 we observe very little change in the error values. This supports the observation that the semiparametric approach may behave favourably in high dimensions.
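For reference, the data generation in this example can be sketched as follows (a hypothetical reconstruction from the stated characteristics, with σ = 1; it is not the code behind the reported figures):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, beta = 150, 2, 2.0
theta_star = np.pi / 4
gamma = np.r_[np.cos(theta_star),
              np.full(d - 1, np.sin(theta_star) / np.sqrt(d - 1))]  # ||gamma|| = 1
lam = np.array([1.0, 0.8, -0.6, 0.4])           # FIR(3): Lambda*(z)

U = rng.normal(size=(n, d))                      # i.i.d. N_d(0, I) input (sigma = 1)
V = 0.7 * np.arctan(beta * (U @ gamma))          # nonlinearity g(w) = 0.7 arctan(beta w)
Y = np.convolve(V, lam)[:n] + np.sqrt(0.1) * rng.normal(size=n)  # Z_t ~ N(0, 0.1)
```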
9.3.2 Semiparametric Parallel Systems

In this section we make use of the semiparametric methodology in the context of the parallel system with a single (without loss of generality) input and a finite-memory linear subsystem. Hence, the system shown in Figure 9.9 is assumed to be the true system, with the following input-output description:
Fig. 9.5: MSE(γ̂) versus the slope parameter β; n = 150, d = 2
Fig. 9.6: MISE(ĝ) versus the slope parameter β; n = 150, d = 2

$$Y_n = m^*(U_n) + \sum_{j=0}^{p} \lambda_j^* U_{n-j} + Z_n. \qquad (9.27)$$
The identifiability condition for this system is that λ₀* = 1. Hence, let Λ = {λ ∈ R^{p+1} : λ₀ = 1} be the set of all admissible parameters, assumed to be a compact subset of R^{p+1}.
Fig. 9.7: MSE(γ̂) versus the input dimensionality d; n = 150, β = 2

Fig. 9.8: MISE(ĝ) versus the input dimensionality d; n = 150, β = 2
As we have already discussed, the semiparametric least-squares strategy begins with the elimination of the nonlinear characteristic from the optimisation process. To this end, let

$$W_n(\lambda) = \sum_{j=0}^{p} \lambda_j U_{n-j} \qquad (9.28)$$
be the output of the linear subsystem for a given λ ∈ Λ. Clearly W_n(λ*) = W_n. Next, let

$$m(u;\lambda) = E\{Y_n - W_n(\lambda) \mid U_n = u\} \qquad (9.29)$$
Fig. 9.9: Semiparametric nonlinear parallel model
be the best model (regression function) of m*(u) for a given λ ∈ Λ. Indeed, the signal Y_n − W_n(λ) − Z_n is the output of the nonlinear subsystem for λ ∈ Λ. Noting that

$$m(u;\lambda) = m^*(u) + \sum_{j=0}^{p} (\lambda_j^* - \lambda_j)\, E\{U_{n-j} \mid U_n = u\},$$
we can conclude that m(u; λ*) = m*(u). For a given training set D_N = {(U_1, Y_1), ..., (U_N, Y_N)} we can easily form a nonparametric estimate of the regression function m(u; λ). Hence, let

$$\hat m(u;\lambda) = \frac{\sum_{t=p+1}^{N} (Y_t - W_t(\lambda))\, K\big(\frac{u-U_t}{b}\big)}{\sum_{t=1}^{N} K\big(\frac{u-U_t}{b}\big)} \qquad (9.30)$$

be the kernel regression estimate of m(u; λ). The mean-squared criterion for estimating λ* can now be defined as follows:

$$\hat Q_N(\lambda) = N^{-1} \sum_{t=p+1}^{N} \big(Y_t - \hat m(U_t;\lambda) - W_t(\lambda)\big)^2. \qquad (9.31)$$
The minimiser of the prediction error Q̂_N(λ) defines an estimate λ̂ of λ*. As soon as λ̂ is determined, we can estimate m*(u) by the two-stage process, i.e., we have

$$\hat m(u) = \hat m(u; \hat\lambda). \qquad (9.32)$$
Thus far we have used the same data for the pilot regression estimate m̂(u; λ) and the criterion function Q̂_N(λ). This may lead to consistent estimates, but the mathematical analysis of such algorithms is lengthy. In Section 2 we suggested the partition resampling scheme, which gives a desirable separation of the training and testing data sets and reduces the mathematical complications. This strategy can easily be applied here, i.e., we can use the subset D_{N,1} of D_N to derive the kernel estimate in (9.30) and then utilise the remaining part of D_N to compute the criterion function Q̂_N(λ). For estimates λ̂ and m̂(u) obtained as outlined above, we can follow the arguments given in Section 2 and show that λ̂ → λ* (P) and, consequently, m̂(u; λ̂) → m(u; λ*) = m*(u) (P).
The minimisation procedure required to obtain λ̂ can be involved due to the highly nonlinear nature of Q̂_N(λ). A reduced-complexity algorithm can be developed based on the general iterative scheme described in Section 2. Hence, for a given λ̂^(old), set m̂(u; λ̂^(old)). Then we form the modified criterion

$$\tilde Q_N(\lambda) = N^{-1} \sum_{t=p+1}^{N} \big(Y_t - \hat m(U_t;\hat\lambda^{(old)}) - W_t(\lambda)\big)^2, \qquad (9.33)$$
and find

$$\hat\lambda^{(new)} = \arg\min_{\lambda\in\Lambda} \tilde Q_N(\lambda).$$
Next, we use λ̂^(new) to get m̂(u; λ̂^(new)) and iterate the above process until the criterion Q̃_N(λ) does not change significantly. It is worth noting that W_t(λ) in (9.33) is a linear function of λ, and therefore we can explicitly find the λ̂^(new) that minimises Q̃_N(λ). Indeed, this is the classical linear least-squares problem, with the solution
$$\hat\lambda^{(new)} = (\mathbf{U}^T \mathbf{U})^{-1} \mathbf{U}^T \mathbf{O}, \qquad (9.34)$$
where O is the (N − p) × 1 vector with t-th coordinate equal to Y_t − m̂(U_t; λ̂^(old)), t = p + 1, ..., N, and U is the (N − p) × (p + 1) matrix U = (U_{p+1}ᵀ, ..., U_Nᵀ)ᵀ with U_t = (U_t, ..., U_{t−p})ᵀ. We should note that the above algorithm can work with a dependent input process {U_n}. However, if {U_n} is a sequence of i.i.d. random variables, then the correlation method provides the following explicit solution for recovering λ*:

$$\lambda_j^* = \frac{\mathrm{cov}(Y_n, U_{n-j})}{\mathrm{var}(U_0)}, \qquad j = 1, \ldots, p.$$

Note also that

$$m^*(u) = E\{Y_n \mid U_n = u\} - u,$$
which allows us to recover m*(u). Empirical counterparts of cov(Y_n, U_{n−j}), var(U_0), and the regression function E{Y_n | U_n = u} define the estimates of the system characteristics. Although these are explicit estimates, they are often difficult to generalise to more complex cases. On the other hand, the semiparametric approach can easily be extended to a large class of interconnected complex systems.
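A compact sketch of the explicit least-squares step (9.33)–(9.34) (hypothetical names; m_old is any pilot estimate of the nonlinearity, e.g., the kernel estimate (9.30) at λ̂^(old); the identifiability constraint λ₀ = 1 is handled by moving the known term to the left-hand side, a slight variant of the unconstrained formula (9.34)):

```python
import numpy as np

def ls_step(Y, U, p, m_old):
    """One pass of the iterative scheme: with the nonlinearity frozen at
    m_old, the criterion is quadratic in lambda and solved in closed form."""
    N = len(Y)
    rows = np.array([U[t - np.arange(1, p + 1)] for t in range(p, N)])  # U_{t-1..t-p}
    o = Y[p:] - m_old(U[p:]) - U[p:]            # Y_t - m_old(U_t) - U_t (lambda_0 = 1)
    lam_rest, *_ = np.linalg.lstsq(rows, o, rcond=None)
    return np.r_[1.0, lam_rest]                  # lambda = (1, lambda_1, ..., lambda_p)
```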
9.4 Concluding Remarks

In this paper we have compared the nonparametric and semiparametric approaches to the problem of recovering characteristics of block-oriented systems. We have argued that semiparametric inference can offer an attractive strategy for
identification of large-scale composite systems, where one faces an inherent problem of dimensionality and model complexity. In fact, the semiparametric paradigm allows us to project the original system onto a parsimonious alternative. The semiparametric version of the least-squares method employed in this paper determines such a projection via an optimisation procedure. We have examined a very simple class of semiparametric models, see (9.3), characterised by a single function of one variable and a projection parameter whose dimensionality equals the number of input-output signals used in the identification problem. The following is a natural generalisation of the approximation in (9.3):

$$\mu(x) = \sum_{l=1}^{L} g_l(\theta_l^T x), \qquad (9.35)$$
where now we wish to specify the univariate functions {g_l(•), 1 ≤ l ≤ L} and the parameters {θ_l, 1 ≤ l ≤ L}. Often one also needs to estimate the degree L of this approximation network. The approximation properties of (9.35) have been examined in [1]. It is worth noting that the nonlinear characteristic in Example 9.1, i.e., m(x₁, x₂) = x₁x₂, can be exactly reproduced by the network in (9.35). In fact, we have

$$x_1 x_2 = \frac{1}{4}(x_1 + x_2)^2 - \frac{1}{4}(x_1 - x_2)^2.$$

This corresponds to (9.35) with g₁(w) = w²/4, g₂(w) = −w²/4 and θ₁ = (1, 1)ᵀ, θ₂ = (1, −1)ᵀ.

Semiparametric models have been extensively examined in the econometric literature, see [8], [15]. There, they have been introduced as a more flexible extension of the standard linear regression model; popular examples include partially linear and multiple-index models. These are static models, and this paper can be viewed as a generalisation of these models to dynamic nonlinear block-oriented systems. In fact, partially linear models fall into the category of parallel models, whereas multiple-index models correspond to Hammerstein/Wiener connections. Semiparametric models have recently been introduced in the nonlinear time series literature [3], [4]. Some empirical results on the identification of the partially linear model have been reported in [2]. Comprehensive studies of semiparametric Hammerstein/Wiener models have been given in [6].

There are a number of issues worth further study. First of all, one can consider a more robust version of the least-squares criterion with a general loss function. This would lead to a semiparametric alternative of M-estimation [13]. As a result, we could examine semiparametric counterparts of maximum-likelihood estimation and some penalised M-estimators. The latter would allow us to incorporate shape constraints like convexity and monotonicity of the underlying characteristics.

Acknowledgements. The author wishes to thank Mount First and Jiaqing Lv for assistance.
References

1. Diaconis, P., Shahshahani, M.: On nonlinear functions of linear combinations. SIAM Journal on Scientific Computing 5(1), 175–191 (1984)
2. Espinoza, M., Suykens, J.A.K., De Moor, B.: Kernel based partially linear models and nonlinear identification. IEEE Transactions on Automatic Control 50, 1602–1606 (2005)
3. Fan, J., Yao, Q.: Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York (2003)
4. Gao, J., Tong, H.: Semiparametric non-linear time series model selection. Journal of the Royal Statistical Society B 66, 321–336 (2004)
5. Giannakis, G.B., Serpedin, E.: A bibliography on nonlinear system identification. Signal Processing 81, 533–580 (2001)
6. Greblicki, W., Pawlak, M.: Nonparametric System Identification. Cambridge University Press, Cambridge (2008)
7. Härdle, W., Hall, P., Ichimura, H.: Optimal smoothing in single-index models. The Annals of Statistics 21, 157–178 (1993)
8. Härdle, W., Müller, M., Sperlich, S., Werwatz, A.: Nonparametric and Semiparametric Models. Springer, Heidelberg (2004)
9. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, Heidelberg (2009)
10. Kvam, P.H., Vidakovic, B.: Nonparametric Statistics with Applications to Science and Engineering. Wiley, Chichester (2007)
11. Ljung, L.: System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs (1999)
12. Pawlak, M., Lv, J.: On nonparametric identification of MISO Hammerstein systems (submitted, 2010)
13. van der Vaart, A.W.: Asymptotic Statistics. Cambridge University Press, Cambridge (1998)
14. Wasserman, L.: All of Nonparametric Statistics. Springer, Heidelberg (2006)
15. Yatchew, A.: Semiparametric Regression for the Applied Econometrician. Cambridge University Press, Cambridge (2003)
Chapter 10
Identification of Block-oriented Systems Using the Invariance Property

Martin Enqvist
Division of Automatic Control, Department of Electrical Engineering, Linköping University, SE-58183 Linköping, Sweden, e-mail: [email protected]

10.1 Introduction

Identification of systems that can be written as interconnected linear time-invariant (LTI) dynamical subsystems and static nonlinearities has been an active research area for several decades. These systems are often referred to as block-oriented systems since their structures can be characterised using linear dynamical and static nonlinear blocks. In particular, block-oriented systems where the blocks are connected in series have received special attention. For example, Wiener and Hammerstein systems are common examples of series-connected block-oriented systems. A Wiener system consists of an LTI subsystem followed by a static nonlinearity, and a Hammerstein system has the same subsystems but in the opposite order. Wiener and Hammerstein systems can both be viewed as special cases of the more general Wiener–Hammerstein systems, which have one LTI subsystem before the static nonlinearity and one after.

Many results about identification of Wiener, Hammerstein or Wiener–Hammerstein systems are directly or indirectly related to a particular invariance property which, in some cases, can be shown using Bussgang's theorem [9] or the theory of separable processes [29, 30]. The reason for the usefulness of this invariance property is that it makes it possible to identify the linear parts of a Wiener, Hammerstein or Wiener–Hammerstein system without compensating for or estimating the static nonlinearity in the system. Hence, the identification problem can be divided into two easier problems. The purpose of this chapter is to give an introduction to the invariance property and its use for identification of block-oriented systems and also to describe some other related results.
10.2 Preliminaries

One of the most popular methods for system identification is the prediction-error method [25], which has proved to produce accurate models in a large number of estimation settings. The results that will be presented later in this chapter can be motivated using this method, but are useful also for other approaches to identification of block-oriented systems. The prediction-error method can be applied if a finite sequence

$$Z^N = (u(t), y(t))_{t=1}^{N}$$

of simultaneous measurements of the input signal u(t) and output signal y(t) from the studied system is available and a parametrised model

$$y(t,\theta) = G(q,\theta)u(t) + H(q,\theta)e(t) \qquad (10.1)$$
has been selected. Here, θ is a d-dimensional vector of parameters and q denotes the shift operator, qu(t) = u(t + 1). Under the assumption that e(t) is white noise (i.e., a sequence of independent random variables) and that the model (10.1) is an exact description of the true system, the mean-square error optimal predictor ŷ(t, θ) of y(t) is

$$\hat y(t,\theta) = H^{-1}(q,\theta)G(q,\theta)u(t) + \big(1 - H^{-1}(q,\theta)\big)y(t). \qquad (10.2)$$

The basic idea in the prediction-error method is to compare how well the predictor (10.2) can predict the measured output y(t) for different θ values and to select a parameter estimate θ̂_N by minimising some criterion V_N(θ, Z^N). For example, this criterion can be chosen to be quadratic such that

$$\hat\theta_N = \arg\min_{\theta\in D_M} V_N(\theta, Z^N) = \arg\min_{\theta\in D_M} \frac{1}{N}\sum_{t=1}^{N}\big(y(t) - \hat y(t,\theta)\big)^2.$$

Here, θ is restricted to some pre-specified set D_M ⊂ R^d. Usually, D_M is the set of parameters that make the predictor (10.2) stable. In many cases, the minimisation of V_N(θ, Z^N) has to be performed using some kind of numerical method, e.g., a Gauss–Newton or a damped Gauss–Newton method. The convergence properties of θ̂_N have been analysed, and in [24] it is shown that under rather weak conditions on the true system and on the input and output signals, it holds that
$$\hat\theta_N \to \theta^* = \arg\min_{\theta\in D_M} E\big((y(t) - \hat y(t,\theta))^2\big), \quad \text{w.p.1 as } N \to \infty, \qquad (10.3)$$
where E denotes the expected value. With some abuse of notation, y(t) and ŷ(t, θ) here denote the stochastic signals, while previously in this section they have denoted realisations of these signals. The convergence result (10.3) shows that many asymptotic properties of the prediction-error method can be derived by analysing the asymptotic cost function E((y(t) − ŷ(t, θ))²). This is the approach that will be used here and, for simplicity, only output error models, with H(q, θ) = 1 in (10.1), will be discussed. However, it is straightforward to modify all results to the general case with an arbitrary noise model.

More specifically, the identification problem for block-oriented systems that will be described here is formulated in a stochastic framework where all signals are stationary stochastic processes. For simplicity, all signals are assumed to have zero mean, i.e., E(v(t)) = 0 for any signal v(t) and all t ∈ Z. Furthermore, both the covariance function R_v(τ) = E(v(t)v(t − τ)) and its z-transform
$$\Phi_v(z) = \sum_{\tau=-\infty}^{\infty} R_v(\tau)\, z^{-\tau}$$
are assumed to be well-defined. The function Φ_v(z) is called the z-spectrum of the signal, and it is assumed that its region of convergence contains the unit circle. Similarly, when two signals u(t) and y(t) are considered, it is assumed that they are jointly stationary and that the cross-covariance function R_yu(τ) = E(y(t)u(t − τ)) exists. Furthermore, it will be assumed that also this function has a z-transform Φ_yu(z) whose region of convergence contains the unit circle.

Since system identification deals with the input and output signals of a system, any assumptions on these signals can be viewed as implicit assumptions on the system itself. This is a convenient alternative to making explicit assumptions about the system in order to guarantee certain properties of the input and output signals, in particular in the nonlinear setting studied here. Hence, only nonlinear systems with input and output signals with the following properties will be studied in this chapter.

Assumption 10.1. Assume that

(a) The input u(t) is a real-valued stationary stochastic process with E(u(t)) = 0, ∀t ∈ Z.
(b) There exist K > 0 and α, 0 < α < 1, such that the second-order moment R_u(τ) = E(u(t)u(t − τ)) satisfies |R_u(τ)| < Kα^{|τ|}, ∀τ ∈ Z.
(c) The z-spectrum Φ_u(z) has a unique canonical spectral factorisation

$$\Phi_u(z) = L(z)\, r_u\, L(z^{-1}), \qquad (10.4)$$

where L(z) and 1/L(z) are causal transfer functions that are analytic in the set {z ∈ C : |z| ≥ 1}, L(∞) := lim_{|z|→∞} L(z) = 1, and r_u is a positive constant.
Assumption 10.2. Assume that

(a) The output y(t) is a real-valued stationary stochastic process with E(y(t)) = 0, ∀t ∈ Z.
(b) There exist K > 0 and α, 0 < α < 1, such that the second-order moments R_yu(τ) = E(y(t)u(t − τ)) and R_y(τ) = E(y(t)y(t − τ)) satisfy |R_yu(τ)| < Kα^{|τ|} and |R_y(τ)| < Kα^{|τ|}, ∀τ ∈ Z.
Assumptions 10.1 and 10.2 are satisfied for a wide range of nonlinear systems, including many block-oriented systems. Under these assumptions, and using only output error models, the optimal LTI approximation of a particular nonlinear system can be defined as the stable and causal LTI model G_{0,OE} that minimises the mean-square error E((y(t) − G(q)u(t))²). This model is often called the Wiener filter for prediction of y(t) from (u(t − k))_{k=0}^{∞} [33, 32, 16]. However, in order to avoid any ambiguities concerning which type of Wiener filter is referred to, and in order to emphasise the approximation properties of G_{0,OE}, we will not use the term Wiener filter here but instead call G_{0,OE} the Output Error LTI Second Order Equivalent (OE-LTI-SOE) of the nonlinear system. This terminology has been used previously in, for example, [26], [15] and [13].

Definition 10.1. Consider a nonlinear system with input u(t) and output y(t) such that Assumptions 10.1 and 10.2 are fulfilled. The Output Error LTI Second Order Equivalent (OE-LTI-SOE) of this system is the stable and causal LTI model G_{0,OE}(q) that minimises the mean-square error E((y(t) − G(q)u(t))²), i.e.,

$$G_{0,OE}(q) = \arg\min_{G\in\mathcal{G}} E\big((y(t) - G(q)u(t))^2\big),$$
where 𝒢 denotes the set of all stable and causal LTI models. Here, stability means bounded-input bounded-output stability [19]. It should be noted that the OE-LTI-SOE of a nonlinear system is input dependent, i.e., different input signals in general result in different OE-LTI-SOEs for one particular nonlinear system. Using classic Wiener filter theory, a closed-form expression for the OE-LTI-SOE can be obtained.

Theorem 10.1 (OE-LTI-SOEs). Consider a nonlinear system with input u(t) and output y(t) such that Assumptions 10.1 and 10.2 are fulfilled. Then the OE-LTI-SOE G_{0,OE} of this system is

$$G_{0,OE}(z) = \frac{1}{r_u L(z)} \left[\frac{\Phi_{yu}(z)}{L(z^{-1})}\right]_{\mathrm{causal}}, \qquad (10.5)$$

where [...]_{causal} denotes taking the causal part, and where L(z) is the canonical spectral factor of Φ_u(z) from (10.4).
Proof. See [19] or any other textbook on the theory of Wiener filters.
A simple, but useful, corollary to Theorem 10.1 shows that the expression (10.5) for the OE-LTI-SOE can sometimes be simplified.

Corollary 10.1. Consider a nonlinear system with input u(t) and output y(t) such that Assumptions 10.1 and 10.2 are fulfilled, and assume that the ratio Φ_yu(z)/Φ_u(z) defines a stable and causal LTI system. Then

$$G_{0,OE}(z) = \frac{\Phi_{yu}(z)}{\Phi_u(z)}. \qquad (10.6)$$
Proof. This is a direct consequence of Wiener filter theory. A short proof can be found in [15].

The OE-LTI-SOE of a system will be called regular if (10.6) holds. Hence, we have the following definition.

Definition 10.2. An OE-LTI-SOE G_{0,OE}(z) is regular if it can be written

$$G_{0,OE}(z) = \frac{\Phi_{yu}(z)}{\Phi_u(z)}.$$
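For a regular OE-LTI-SOE, (10.6) suggests a simple nonparametric frequency-domain estimate: replace Φ_yu(z) and Φ_u(z) on the unit circle by windowed sample spectra. A minimal sketch under this assumption (hypothetical names, Bartlett lag window):

```python
import numpy as np

def xcov(a, c, taus):
    """Sample covariances R_ac(tau) = E(a(t) c(t - tau)) for the given lags."""
    N = len(a)
    return np.array([np.mean(a[max(t, 0):N + min(t, 0)] *
                             c[max(-t, 0):N - max(t, 0)]) for t in taus])

def oe_ltisoe_ratio(y, u, M=50, n_freq=256):
    """Estimate G_0,OE(e^{i w}) = Phi_yu / Phi_u on a frequency grid, cf. (10.6)."""
    taus = np.arange(-M, M + 1)
    win = 1.0 - np.abs(taus) / (M + 1)          # Bartlett lag window
    Ryu = xcov(y - y.mean(), u - u.mean(), taus) * win
    Ru = xcov(u - u.mean(), u - u.mean(), taus) * win
    omega = np.linspace(0.0, np.pi, n_freq)
    E = np.exp(-1j * np.outer(taus, omega))     # e^{-i tau omega}
    return omega, (Ryu @ E) / (Ru @ E)
```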
In most applications, the order of the OE-LTI-SOE is unknown. However, if a parametrised model, possibly of lower order than the OE-LTI-SOE, is estimated from data, this model will approximate the OE-LTI-SOE according to the following theorem, which is a special case of Theorem 4.1 in [26].

Theorem 10.2. Consider a nonlinear system with input u(t) and output y(t) such that Assumptions 10.1 and 10.2 are fulfilled. Let G_{0,OE} be the corresponding OE-LTI-SOE according to Theorem 10.1. Suppose that a parametrised stable and causal output error model G(q, θ) is fitted to the signals u and y according to

$$\hat\theta = \arg\min_\theta E\big(\eta(t,\theta)^2\big),$$

where

$$\eta(t,\theta) = y(t) - G(q,\theta)u(t). \qquad (10.7)$$

Then it follows that

$$\hat\theta = \arg\min_\theta \int_{-\pi}^{\pi} \big|G_{0,OE}(e^{i\omega}) - G(e^{i\omega},\theta)\big|^2\, \Phi_u(e^{i\omega})\, d\omega.$$

Proof. See [15].
Theorem 10.2 shows that a low-order model will approximate the OE-LTI-SOE in the same way as a low-order model approximates the true system in a linear identification problem (cf. Section 8.5 and Problem 8G.5 in [25]).
In the remaining sections, the theory of OE-LTI-SOEs will be used to analyse linear approximations of block-oriented systems. Depending on the type of input signal used, such an approximation can be useful when estimating a complete model of the system.
10.3 The Invariance Property and Separable Processes

A key feature of block-oriented systems like Wiener, Hammerstein and Wiener–Hammerstein systems is that the nonlinear characteristics of the system are contained in a single static nonlinearity. Hence, it is natural that there is a strong connection between identification results for such systems and results about static nonlinearities. Because of their simplicity and frequent occurrence in many applications, static nonlinearities are a well-studied topic in the control and identification literature. For example, the behaviour of static nonlinearities in closed-loop systems is a classic topic and many useful results exist, e.g., the describing function framework for analysis of oscillations [2] and the circle criterion for stability analysis [20]. One of the most useful results about stochastic signals is the particular invariance property that holds for some signals.

Definition 10.3. Consider a stationary stochastic process u(t) with E(u(t)) = 0 and R_u(τ) < ∞ for all τ ∈ Z, and a static nonlinearity y(t) = f(u(t)) such that E(y(t)) = 0 and R_yu(τ) < ∞ for all τ ∈ Z. The invariance property holds if

$$R_{yu}(\tau) = b_0 R_u(\tau), \quad \forall \tau \in \mathbb{Z}, \qquad (10.8)$$
for some constant b₀.

An early result about the invariance property is Bussgang's theorem [9], which says that the invariance property holds, with b₀ = E(f′(u(t))), for a large class of static nonlinearities when u(t) is Gaussian. For such a Gaussian signal, the constant b₀ is called the equivalent gain in [6] and can be viewed as a describing function for a random input signal. Just like ordinary describing functions, it can be used to analyse nonlinear closed-loop systems [2]. Bussgang's theorem has been generalised to functions of several Gaussian variables [1, 31, 27] and to other classes of signals than Gaussian in [3], [8], [29] and [30]. Nuttall's generalisation [29, 30] uses the concept of separable processes and is particularly interesting.

Definition 10.4 (Separability). A stationary stochastic process u(t) with E(u(t)) = 0 is separable (in Nuttall's sense) if

$$E(u(t-\tau) \mid u(t)) = a(\tau)\, u(t) \qquad (10.9)$$
for some function a(τ ). A number of separable signals are listed in [29] and [30], e.g., Gaussian signals, random binary signals and several types of modulated signals. In addition, it has
been shown that signals with elliptically symmetric distributions are separable [28], as well as random multisine signals with flat amplitude spectra [14]. It is easy to show that the function a(τ) in (10.9) can be expressed using the covariance function of u(t).

Lemma 10.1. Consider a separable stationary stochastic process u(t) with E(u(t)) = 0. The function a(τ) from (10.9) can then be written

$$a(\tau) = \frac{R_u(\tau)}{R_u(0)}. \qquad (10.10)$$
Proof. The result is shown in [29] and [30], but since the proof is quite short and a good example of how the separability property can be used, it has been included here. Actually, the result follows immediately from the fact that

$$R_u(\tau) = E(u(t)u(t-\tau)) = E\big(u(t)\, E(u(t-\tau) \mid u(t))\big) = a(\tau)\, E(u(t)^2) = a(\tau)\, R_u(0)$$

if u(t) is separable. Here, we have used the facts that

$$E(Y) = E(E(Y \mid X)), \qquad (10.11a)$$
$$E(g(X)Y \mid X) = g(X)\, E(Y \mid X) \qquad (10.11b)$$

for two random variables X and Y [17].
Furthermore, it is easy to show that the separability of u(t) is a sufficient condition for the invariance property (10.8) to hold. Consider a separable process u(t) with zero mean and a static nonlinearity such that y(t) has zero mean too. Then it follows that

$$R_{yu}(\tau) = E\big(f(u(t))u(t-\tau)\big) = E\Big(E\big(f(u(t))u(t-\tau) \mid u(t)\big)\Big) = E\Big(f(u(t))\, E\big(u(t-\tau) \mid u(t)\big)\Big) = a(\tau)\, E\big(f(u(t))u(t)\big) = b_0 R_u(\tau), \qquad (10.12)$$

where b₀ = E(f(u(t))u(t))/R_u(0) and where (10.10) has been used in the last equality. This result can be found in [29] and [30], together with the converse result, which says that separability, in a certain sense, is also a necessary condition for (10.8) to hold. Consider an arbitrary stationary stochastic process u(t) with zero mean and let D_u be a class of Lebesgue integrable functions such that

$$D_u = \big\{ f : \mathbb{R} \to \mathbb{R} \mid E(f(u(t))) = 0,\ E(f(u(t))^2) < \infty,\ R_{yu}(\tau) = E(f(u(t))u(t-\tau)) \text{ exists } \forall \tau \in \mathbb{Z} \big\}. \qquad (10.13)$$

The following result shows a certain equivalence between the invariance property and separability of the input signal.

Theorem 10.3. Consider a stationary stochastic process u(t) with E(u(t)) = 0 and R_u(τ) < ∞ for all τ ∈ Z. The invariance property (10.8) holds for all f ∈ D_u if and only if u(t) is separable.
Proof. See [29] or [30].
Theorem 10.3 shows that it is impossible to find a more general signal class for which the invariance property holds for any function in D_u. In particular, it explains exactly which feature of Gaussian signals is crucial for Bussgang's theorem to hold. Theorem 10.3 has been generalised to nonlinear finite impulse response systems using a separability concept where the conditioning in (10.9) is with respect to several signal components [15]. Using z-spectra, the invariance property can be written

$$\Phi_{yu}(z) = b_0 \Phi_u(z).$$

Hence, Corollary 10.1 gives that the OE-LTI-SOE of a static nonlinearity is static when the invariance property holds. It is easy to show that this result does not hold for all input signals. As will be shown in the next section, the invariance property has turned out to be quite useful for identification of some classes of block-oriented systems.
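The invariance property is easy to inspect numerically. The following sketch (an illustration only, not from the original text; b₀ = E(f′(u(t))) is computed empirically for the chosen nonlinearity) compares R_yu(τ) with b₀R_u(τ) for a coloured Gaussian input:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
u = np.convolve(rng.normal(size=N), [1.0, 0.5, 0.25])[:N]  # coloured Gaussian input
y = np.arctan(2.0 * u)                          # static nonlinearity f(u)
y -= y.mean()                                   # enforce E(y(t)) = 0

b0 = np.mean(2.0 / (1.0 + 4.0 * u**2))          # b0 = E(f'(u)), f'(u) = 2/(1 + 4u^2)
for tau in range(5):
    Ru = np.mean(u[tau:] * u[:N - tau])
    Ryu = np.mean(y[tau:] * u[:N - tau])
    print(tau, round(Ryu, 4), round(b0 * Ru, 4))  # Ryu(tau) ~ b0 * Ru(tau)
```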
10.4 Block-oriented Systems

The invariance property is particularly useful for block-oriented systems that contain one static nonlinearity, such as Wiener–Hammerstein systems. In this case, the OE-LTI-SOE turns out to be equal to the product of the transfer functions of the two linear subsystems and a constant.

Theorem 10.4. Consider a Wiener–Hammerstein system y(t) = G₂(q)v(t) + w(t), where v(t) = f(n(t)) and n(t) = G₁(q)u(t), and where G₁(q) and G₂(q) are stable and causal LTI systems. Assume that u(t) and y(t) fulfil Assumptions 10.1 and 10.2 and that w(t) is uncorrelated with u(t − τ) for all τ ∈ Z. Assume also that n(t) and v(t) fulfil Assumptions 10.1 and 10.2 and that the invariance property holds for n(t) and v(t) = f(n(t)) such that Φ_vn(z) = b₀Φ_n(z). Then the OE-LTI-SOE of this system is

$$G_{0,OE}(z) = b_0 G_2(z) G_1(z). \qquad (10.14)$$

Proof. We have

$$\Phi_{yu}(z) = G_2(z)\,\Phi_{vu}(z), \qquad (10.15a)$$
$$\Phi_{vn}(z) = \Phi_{vu}(z)\, G_1(z^{-1}), \qquad (10.15b)$$
$$\Phi_n(z) = G_1(z)\,\Phi_u(z)\, G_1(z^{-1}). \qquad (10.15c)$$
In addition, the invariance property (10.8) gives that
Φvn (z) = b0 Φn (z).
(10.16)
Inserting (10.15b) and (10.15c) in (10.16) gives
Φvu (z) = b0 G1 (z)Φu (z),
(10.17)
and inserting (10.17) in (10.15a) gives
Φyu (z) = G2 (z)b0 G1 (z)Φu (z). Hence, (10.14) follows from Corollary 10.1.
Theorem 10.4 shows that the OE-LTI-SOE of a Wiener–Hammerstein system will be b₀G₂(z)G₁(z) when the invariance property holds for the static nonlinearity v(t) = f(n(t)). Hence, an estimated output error model will approach this model when the number of measurements tends to infinity, and it is possible to obtain accurate information about the linear subsystems without compensating for or estimating the static nonlinearity in the system. This information is particularly useful if either G₁ or G₂ is equal to one, i.e., if we have either a Hammerstein or a Wiener system. In these cases, the OE-LTI-SOE will simply be a scaled version of the LTI subsystem, and it is straightforward to estimate the static nonlinearity in a second step. However, for a Wiener–Hammerstein system, there is also the problem of factorising the OE-LTI-SOE into G₁(z) and G₂(z).

It is easy to see that Theorem 10.4 can be applied for all Hammerstein systems with separable input signals, but the Wiener and Wiener–Hammerstein cases are harder to analyse. The main problem is that separability is not preserved under linear transformations. This means that there is no guarantee that a separable input signal u(t) will imply that the input n(t) = G₁(q)u(t) to the static nonlinearity is also separable. However, since a Gaussian input signal to an LTI system produces a Gaussian output signal, the separability of the signal is preserved in this case. Hence, it is natural that the special case of a Gaussian input signal has been studied in detail. Some results about identification of Wiener, Hammerstein and Wiener–Hammerstein systems using Gaussian input signals can, for example, be found in [7], [4], [5], [21] and [18]. All these papers cover complete Wiener–Hammerstein systems and use third-order moments to factorise the OE-LTI-SOE into G₁(z) and G₂(z). Some related results for random multisine signals exist too [10, 11, 12]. It should also be mentioned that the approach of first identifying the linear subsystems in a Wiener, Hammerstein or Wiener–Hammerstein system using the invariance property is based on asymptotic properties of the model estimates. Hence, it is useful mainly for relatively large datasets. The use of the invariance property is illustrated in the following simple example.

Example 10.1. Consider the Hammerstein system y(t) = G(q)f(u(t)) + w(t), where

$$G(q) = \frac{1 + 0.4q^{-1}}{1 - q^{-1} + 0.24q^{-2}}, \qquad f(x) = \arctan(4x),$$
156
M. Enqvist
and w(t) is Gaussian white noise with E(w(t)) = 0 and E(w(t)2 ) = 1. The input to the system is 0.3 e(t), u(t) = 1 + 0.4q−1 where e(t) is Gaussian white noise with E(e(t)) = 0 and E(e(t)2 ) = 1. The signals w(t) and e(t) are independent. The behaviour of this system has been simulated for a particular realisation of the input and noise signals and a dataset with 20 000 input and output measurements has been collected. An output error model has been estimated from this dataset using the System Identification Toolbox in MATLABTM and the result was ˆ G(q) =
1 + 0.415q−1 . 1 − 0.973q−1 + 0.220q−2
Here, the numerator coefficients have been rescaled such that the first coefficient is one in order to make the comparison with the true linear subsystem easier. As can be ˆ seen, G(q) is a rather accurate approximation of G(q). This result is obtained since the invariance property holds for a Gaussian input signal.
10.5 Discussion The invariance property has been used for identification of block-oriented systems in numerous publications and is a well-established tool in the system identification community. However, the underlying theoretical results have also given rise to a large number of results in the statistical literature. Many of these results seem to have been developed independently without a lot of interaction with the system identification community. It seems likely that some methods that have been designed for particular problems in, for example, regression analysis should be useful also for identification of block-oriented dynamical systems. In [22], a nonlinear regression model y = f (ST x, e) is studied. Here, y is a scalar response variable, S is a p × q-dimensional matrix, x is a p-dimensional vector of explanatory variables and e is some unknown disturbance which is independent of x. The main result in [22] shows that a basis for the column space of the matrix S can be estimated using a method called sliced inverse regression, provided that the distribution of the x vector satisfies a condition about linear conditional expectations which is similar to the definition of separability. The main idea in sliced inverse regression is to estimate E(x|y) for different values of y and to extract information about the column space of S from these estimates using principal component analysis. It is mentioned in [22] that the method is applicable when, for example, x has an elliptical distribution. The method is particularly useful when q is much smaller than p. In this case, knowledge about S, or at least
10
Identification of Block-oriented Systems Using the Invariance Property
157
its column space, can be used to transform a high-dimensional nonlinear regression problem into a more low-dimensional and tractable one. A related approach to the same problem is presented in [23], where a method based on principal Hessian directions is used instead of sliced inverse regression. Numerous modifications and improvements of these original methods can also be found in literature. Investigation of the available methods for dimension reduction in the context of block-oriented systems, possibly with some modifications to handle also the infinite impulse response case, seems like an interesting topic for future research.
References 1. Atalik, T.S., Utku, S.: Stochastic linearization of multi-degree-of-freedom non-linear systems. Earthquake Engineering and Structural Dynamics 4, 411–420 (1976) 2. Atherton, D.P.: Nonlinear Control Engineering, student edn. Van Nostrand Reinhold, New York (1982) 3. Barrett, J.F., Lampard, D.G.: An expansion for some second-order probability distributions and its application to noise problems. IRE Transactions on Information Theory 1(1), 10–15 (1955) 4. Billings, S.A., Fakhouri, S.Y.: Theory of separable processes with applications to the identification of nonlinear systems. Proceedings of the IEE 125(9), 1051–1058 (1978) 5. Billings, S.A., Fakhouri, S.Y.: Identification of systems containing linear dynamic and static nonlinear elements. Automatica 18(1), 15–26 (1982) 6. Booton Jr., R.C.: Nonlinear control systems with random inputs. IRE Transactions on Circuit Theory 1(1), 9–18 (1954) 7. Brillinger, D.R.: The identification of a particular nonlinear time series system. Biometrika 64(3), 509–515 (1977) 8. Brown Jr., J.L.: On a cross-correlation property for stationary random processes. IRE Transactions on Information Theory 3(1), 28–31 (1957) 9. Bussgang, J.J.: Crosscorrelation functions of amplitude-distorted Gaussian signals. Technical Report 216, MIT Research Laboratory of Electronics, Cambridge, Massachusetts (1952) 10. Crama, P., Schoukens, J.: Initial estimates of Wiener and Hammerstein systems using multisine excitation. IEEE Transactions on Instrumentation and Measurement 50(6), 1791–1795 (2001) 11. Crama, P., Schoukens, J.: Hammerstein-Wiener system estimator initialization. Automatica 40(9), 1543–1550 (2004) 12. Crama, P., Schoukens, J.: Computing an initial estimate of a Wiener-Hammerstein system with a random phase multisine excitation. IEEE Transactions on Instrumentation and Measurement 54(1), 117–122 (2005) 13. Enqvist, M.: Linear Models of Nonlinear Systems. PhD thesis, Link¨oping University, Link¨oping, Sweden (2005) 14. Enqvist, M.: Identification of Hammerstein systems using separable random multisines. In: Proceedings of the 14th IFAC Symposium on System Identification, Newcastle, Australia, pp. 768–773 (2006) 15. Enqvist, M., Ljung, L.: Linear approximations of nonlinear FIR systems for separable input processes. Automatica 41(3), 459–473 (2005)
158
M. Enqvist
16. Gardner, W.A.: Introduction to Random Processes. Macmillan Publishing Company, New York (1986) 17. Gut, A.: An Intermediate Course in Probability. Springer, New York (1995) 18. Hunter, I.W., Korenberg, M.J.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics 55, 135–144 (1986) 19. Kailath, T., Sayed, A.H., Hassibi, B.: Linear Estimation. Prentice Hall, Upper Saddle River (2000) 20. Khalil, H.K.: Nonlinear Systems, 3rd edn. Prentice Hall, Upper Saddle River (2002) 21. Korenberg, M.J.: Identifying noisy cascades of linear and static nonlinear systems. In: Proceedings of the 7th IFAC Symposium on Identification and System Parameter Estimation, York, UK, pp. 421–426 (1985) 22. Li, K.-C.: Sliced inverse regression for dimension reduction. Journal of the American Statistical Association 86(414), 316–327 (1991) 23. Li, K.-C.: On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. Journal of the American Statistical Association 87(420), 1025–1039 (1992) 24. Ljung, L.: Convergence analysis of parametric identification methods. IEEE Transactions on Automatic Control 23(5), 770–783 (1978) 25. Ljung, L.: System Identification: Theory for the User, 2nd edn. Prentice Hall, Upper Saddle River (1999) 26. Ljung, L.: Estimating linear time-invariant models of nonlinear time-varying systems. European Journal of Control 7(2-3), 203–219 (2001) 27. Lutes, L.D., Sarkani, S.: Stochastic Analysis of Structural and Mechanical Vibrations. Prentice Hall, Upper Saddle River (1997) 28. McGraw, D.K., Wagner, J.F.: Elliptically symmetric distributions. IEEE Transactions on Information Theory 14(1), 110–120 (1968) 29. Nuttall, A.H.: Theory and application of the separable class of random processes. Technical Report 343, MIT Research Laboratory of Electronics, Cambridge, Massachusetts (1958) 30. Nuttall, A.H.: Theory and Application of the Separable Class of Random Processes. PhD thesis, MIT, Cambridge, Massachusetts (1958) 31. Scarano, G., Caggiati, D., Jacovitti, G.: Cumulant series expansion of hybrid nonlinear moments of n variates. IEEE Transactions on Signal Processing 41(1), 486–489 (1993) 32. Schetzen, M.: The Volterra & Wiener Theories of Nonlinear Systems. John Wiley & Sons, Chichester (1980) 33. Wiener, N.: Extrapolation, Interpolation and Smoothing of Stationary Time Series. Technology Press & Wiley, New York (1949)
Chapter 11
Frequency Domain Identification of Hammerstein Models Er-Wei Bai
11.1 Introduction In this chapter, we discuss a frequency approach for Hammerstein model identification. The method is based on the fundamental frequency and therefore, no a priori information on the structure of the nonlinearity is required. Moreover, the method is not limited to Hammerstein models whose linear part is a finite-order rational transfer functions, but applies to Hammerstein models with a non-parametric linear part. The method can be easily extended to Wiener models with minor modifications. The chapter is based on an article [1], IEEE Trans. on Automatic Control, Vol. 48, pp.530-542, 2003 with permission from IEEE Intellectual Property Rights Office. All the proofs can be found in [1]. Use of sinusoidal inputs in identification of Hammerstein models has certain advantages. The periodicity of the input signals implies that all the signals inside the system consist of frequencies that are integer multiples of the input frequencies. Subharmonics or chaos can never happen. This makes identification simple. Another important observation in our approach is that with sinusoidal inputs, the output of the nonlinearity permits a Fourier series representation. Moreover, the Fourier coefficients are invariant with respect to the input frequencies. We remark that the idea of frequency domain identification to Hammerstein models is not new and appeared in the study of identification for Hammerstein models [6, 11, 17]. Though there were several approaches in the literature, they are more or less the same ideas as in [6, 9, 10]. In [10], the nonlinearity is assumed to be a polynomial with a known order. The reason is that once the order is known, the highest harmonic has a known frequency and behaves in a linear manner [10]. Thus, linear techniques based on the highest harmonic can be applied to identify the linear part. The problem is that the nonlinearity may not be a polynomial with a known order. Even it is a polynomial with known order, the coefficient of the highest order is usually very small. For Er-Wei Bai Dept. of Electrical and Computer Engineering, University of Iowa e-mail:
[email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 161–180. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com
162
E.-W. Bai
instance, in a practical situation, the unknown nonlinearity may be approximated by a polynomial. To have a reasonably good approximation, the order has to be high. Moreover, the coefficient with the highest order is usually small. This implies that the very signal used in identification with the highest harmonic has a small amplitude. This has a significant impact on the signal to noise ratio in identification and makes the method sensitive. The frequency approach of [6] also assumes the exact order knowledge of the nonlinearity. With a known order, the output is a known nonlinear function of input magnitude and frequency with some unknown parameters. By repeatedly applying different magnitudes and frequencies, these unknown parameters can be uniquely solved. Without the exact order information, uniqueness is however lost. The other frequency approach [9] also needs a parametrisation of the unknown nonlinearity.
11.2 Problem Statement and Point Estimation Consider the Hammerstein model shown in Figure 11.1, where u(t), v(t) y(t) and y f (t) are the system input, noise, output and filtered output respectively. x(t) denotes the unavailable internal signal. These are continuous time signals. u(iTs ) and y f (iTs ) denote the sampled input and sampled filtered output signals respectively with the sampling interval Ts that will be specified later. The filter is a lowpass filter at designer’s disposal. The goal of the frequency domain identification is to apply inputs of the form u(t) = Acos(ωk t), ωk = 0, t ∈ [0, T ] ˆ and then, to determine a pair of the estimates fˆ(·) and G(s) based on finite sampled ˆ ˆ → G(s) in inputs and filtered outputs u(iTs ) and y f (iTs ) so that f (·) → f (·), G(s) some sense. Note that the continuous time model G(s), not its discretised model, ˆ is our interests. The exact forms of fˆ(·) and G(s) will be given later. In fact, the ˆ ˆ forms of f (·) and G(s) depend on whether they are parametric or not. Just like the frequency identification approaches for linear systems, the proposed method may also have to be repeated for a number of frequencies. Assumption 11.1. Throughout the chapter, we assume 1. The nonlinearity x = f (u) is a static function that is continuous and piecewise smooth for u ∈ [−A, A], where [−A, A] is the input range of interest. 2. The linear part G(s) is an exponentially stable continuous time system which can be represented by a rational transfer function or simply non-parametric. 3. Noise v(t) is a continuous time random signal that is the output of an unknown stable and proper linear system driven by a white noise source with zero mean and finite variance. No rationality on the transfer function G(s) and no a priori knowledge on the structure of the unknown nonlinearity f (·) are assumed. The standard polynomial nonlinearities as well as many hard input nonlinearities, e.g., deadzone and saturation, belong to the class specified by Assumption 11.1.
11
Frequency Domain Identification of Hammerstein Models
163
v(t) u(t)
f(u)
-
x(t)
-
G(s)
-
y(t)
+
?
-
+ -
?
filter
u(iTs)
-
yf( iTs )
Fig. 11.1: The Hammerstein model
11.2.1 Continuous Time Frequency Response Note that the nonlinearity x = f (u) is continuous and piecewise smooth. If the input u(t) = Acos(ωk t) which is an even and periodic function with the period 2ωπk , then, x(t) is also an even and periodic function that is continuous and piecewise smooth, and consequently it permits a Fourier series representation ∞
x(t) = ∑ ri cos(iωk t)
(11.1)
i=0
where the Fourier coefficients are given by r0 =
ωk 2π
2π ωk
0
f (Acos(ωk t))dt, ri =
ωk π
2π ωk
0
f (Acos(ωk t))cos(iωk t)dt, i = 1, 2, ...
Moreover, since f (u)|t=0 = f (u)|t=2π /ωk and x(t) is continuous and piecewise ¯ smooth, it follows from [15] that |x(t) − ∑ii=0 ri cos(iωk t)| → 0 uniformly in t as i¯ → ∞. Lemma 11.1. Let u(t) = Acos(ωk t) and x(t) be represented by the Fourier series representation (11.1). Then, 1. The Fourier coefficients ri ’s are independent of the input frequency ωk . In other words, the Fourier series expression (11.1) is valid for any non-zero input frequency with the identical Fourier coefficients ri ’s. 2. If the nonlinearity is odd, i.e., f (−u) = − f (u), then r2i = 0, i = 0, 1, 2, .... 3. If the nonlinearity is even, i.e., f (−u) = f (u), then r2i+1 = 0, i = 0, 1, 2, .... The lemma shows that ri ’s are independent of the input frequency ωk . This observation is the key that makes the frequency domain identification of Hammerstein models possible. We now define the finite time Fourier transform. Given the input frequency ωk and with the observation interval
164
E.-W. Bai
T =L
2π ωk
(11.2)
for some integer L > 0, let VT (ω ) YT (ω )
!
= 0T v(t)e− jω t dt, UT (ω ) = ! = 0T y(t)e− jω t dt, Y f ,T (ω ) =
!T u(t)e− jω t dt, !0T − jω t 0
y f (t)e
(11.3)
dt
denote the finite time Fourier transforms of v(t), u(t), y(t) and y f (t) respectively, where u(t) = Acos(ωk t) is the input, and v(t), y(t) and y f (t) are the noise, output and filtered output respectively. When u(t) = Acos(ωk t), since x(t) = ∑∞ i=0 ri cos(iωk t) and the linear part is an unknown transfer function G(s), it follows that y(t) = ∑∞ i=0 ri |G( j ωk i)|cos(iωk t + ∠G( jωk i)).
11.2.2 Point Estimation of G( jω ) Based on YT and UT In this section, we develop theoretical framework in continuous time domain based on the continuous time model G(s) and continuous time signals u(t), x(t) and y(t). Digital implementation using only the sampled u(iTs ) and y f (iTs ) will be discussed in the next section. ¯ jωk ) of G( jω ) at Given the input u(t) = Acos(ωk t), define the point estimate G( ω = ωk as 1 Y (ω ) ¯ j ωk ) = A T T k G( (11.4) 1 T UT (ωk ) where YT (ωk ) and UT (ωk ) are the finite time Fourier transforms as defined in (11.3). It is a straightforward calculation to show that T1 UT (ωk ) = A2 and because ∞ ∑i=0 ri |G( jωk i)|cos(iωk t + ∠G( jωk i)) is absolutely integrable [15], we have 1 1 YT (ωk ) = T T
T ∞
1
∑ ri |G( jωk i)|cos(iωk t + ∠G( jωk i))e− jωk t dt + T VT (ωk )
0 i=0
r1 1 G( jωk ) + VT (ωk ) . 2 T It is important to note at this point that in the characterisation of the Hammerstein model, the gains of f (u) and G(s) are actually not unique because of the product. Any pair (α f (u), G(s)/α ), α = 0, would produce identical input and output measurements. There are several ways to make the representation unique, e.g., either the gain of f (u) or G(s) can be fixed to be unit. In this chapter we take a different approach and assume =
Assumption 11.2. The coefficient r1 in (11.1) is normalised to be one, i.e., r1 = 1.
11
Frequency Domain Identification of Hammerstein Models
165
Normalisation of r1 = 1 is arbitrary. In theory, r1 may be normalised to be any fixed number or we can also normalise any ri , i = 1, to be unity in the case that r1 = 0. From the above assumption, it follows that ¯ j ωk ) = A G(
1 T YT (ωk ) 1 T UT (ωk )
= G( jωk ) +
2 VT (ωk ) . T
(11.5)
¯ jωk ) using YT (ωk ) and UT (ωk ). The following theorem gives the quality of G( Theorem 11.1. Consider the Hammerstein model under Assumptions 11.1 and ¯ jωk ) in (11.4) by frequency domain identifi11.2. Consider the point estimate G( cation. Then, uniformly in k, ¯ jωk ) → G( jωk ) G( in probability as T → ∞. ¯ jωk ) of G( jωk ) can be accurately Similar to the linear case, the point estimate G( obtained in the presence of the unknown nonlinearity f (·) provided that the continuous time data u(t) and y(t) are available. However, in most applications, only sampled values are available and thus, we discuss implementation of this point estimation algorithm by using the sampled data u(iTs ) and y f (iTs ). We will show that the identification results remain almost identical if the lowpass filter and the sampling interval Ts are properly chosen as suggested in [14].
11.2.3 Implementation Using Sampled Data ¯ jωk ) is the calculations of 1 UT (ωk ) and 1 YT (ωk ) that involves A key to find G( T T continuous time integrations. We show in this section that these quantities are computable by applying DFTs (discrete Fourier transform) on the sampled input and filtered output u(iTs ) and y f (iTs ). There are three steps involved: the choice of lowpass filter cutoff frequency ω¯ , the determination of the sampling interval Ts and the calculation of DFTs. Filter choice: When u(t) = Acos(ωk t), the output is given by ∞
y(t) = ∑ ri |G( jωk i)|cos(iωk t + θi ) + v(t) i=0
with θi = ∠G( jωk i) . Recall that the purpose of the point estimation is to estimate G( jωk ). In the absence of any structural prior information on the unknown nonlinearity, we will see later however that the pair (u(iTs ), x(iT ˆ s )) plays an impor¯ tant role, where x(t) ˆ = ∑ii=0 rˆi cos(iωk t) is the estimate of unknown internal variable x(t) = ∑∞ i=0 ri cos(iωk t) and rˆi is the estimate of ri . To estimate ri based on the sampled data u(iTs ) and y f (iTs ), y f (t) must contain frequencies up to i¯ωk . To this end, let the cutoff frequency ω¯ of the lowpass filter be
166
E.-W. Bai
i¯ωk < ω¯ < (i¯ + 1)ωk
(11.6)
for some integer i¯ ≥ 1. Then, the output y f (t) of the lowpass filter is in the form of i¯
y f (t) = ∑ ri |G( jωk i)|cos(iωk t + θi ) + v f (t)
(11.7)
i=0
where v f (t) is the filtered noise. Here, we assume that the higher order terms i > i¯ are negligible. How to deal with these small errors will be provided later in discussions of Sections 3.1 and 3.2. Determination of the sampling interval Ts . Since the highest frequency remaining in y f (t) due to the input is i¯ωk , we define the sampling interval by 2π 1 , M>2. (11.8) Ts = ¯ iωk M The choice of the integer M > 2 is to make sure that the sampling frequency is always higher than the Nyquist frequency i¯ωk /π . Obviously, from (11.8), we have T =L
2π ¯ s , Ts /T = 1 , ωk Ts = 2π . = LiMT i¯M ωk Li¯M
We comment that a large number of simulation seems to suggest that in many cases identification results are similar with or without the lowpass filter. One ex¯ planation is that because | ∑∞ i=i¯+1 ri cos(iωk t)| → 0 as i gets larger, when rˆi → ri and ∞ | ∑i=i¯+1 ri cos(iωk t)| is already small, the use of the lowpass does not make too much difference. In the absence of the lowpass filter, the choice of the sampling interval (11.8) remains valid. DFT implementation: With the sampled input and filtered output data u(iTs ) and y f (iTs ), i = 0, 1, ..., Li¯M − 1, we now define the DFTs of u(iTs ) and y f (iTs ). ¯
1 LiM−1 Y f ,DFT (pωk ) = ¯ ∑ y f (lTs )e− j pωk lTs LiM l=0 =
1 T
Li¯M−1
∑
(11.9)
y f (lTs )e− j pωk lTs Ts , p = 0, 1, ..., i¯
l=0
and ¯
1 LiM−1 1 UDFT (pωk ) = ¯ ∑ u(lTs )e− j pωk lTs = T LiM l=0
Li¯M−1
∑
u(lTs )e− j pωk lTs Ts .
(11.10)
l=0
These DFTs Y f ,DFT (ωk ) and UDFT (ωk ) have a very clear interpretation with respect to the continuous time integrations T1 UT (ωk ) and T1 Y f ,T (ωk ). In fact,
11
Frequency Domain Identification of Hammerstein Models
Y f ,DFT (ωk ) and UDFT (ωk ) are numerical integrations of T1 UT (ωk ) and by Li¯M rectangular of equal width Ts . For UDFT (ωk ), recall again that u(t) = Acos(ωk t). This implies
167 1 T Y f ,T (ωk )
u(iTs ) = Acos(ωk iTs ) and
¯
1 LiM−1 A jωk lTs UDFT (ωk ) = ¯ + e− jωk lTs )e− jωk lTs ∑ 2 (e LiM l=0 ¯
1 LiM−1 A A = ¯ ∑ 2 (1 + e−2 jωklTs ) = 2 . LiM l=0 This implies that UDFT (ωk ) = A2 = T1 UT (ωk ). In other words, the continuous time integration T1 UT (ωk ) can be obtained exactly by UDFT (ωk ) which is computable ¯ − 1. From (11.7), we now calusing only the sampled input u(iTs ), i = 0, 1, ..., LiM culate ¯
¯
1 LiM−1 i Y f ,DFT (pωk ) = ¯ ∑ ∑ ri |G( jωk i)|cos(iωk lTs + θi)e− j pωk lTs LiM l=0 i=0 ¯
1 LiM−1 + ¯ ∑ v f (lTs )e− j pωk lTs . LiM l=0 The second term is exactly the DFT V f ,DFT (pωk ) of v f (lTs ) and the first term can be rewritten as ¯
¯
1 i 1 LiM−1 j(i−p)ωk lTs jθi r |G( j ω i)| e + e− j(i+p)ωklTs e− jθi ) i k ∑ ∑ (e Li¯M i=0 2 l=0 =
rp 2 G( j ωk p) r0 G( j0)
p = 1, 2, ..., i¯ p = 0.
In particular, when p = 1, Y f ,DFT (ωk ) =
r1 1 G( jωk ) + V f ,DFT (ωk ) = G( jωk ) + V f ,DFT (ωk ) . 2 2
We comment that the calculation of UDFT (ωk ) and Y f ,DFT (ωk ) is well known in the literature [7]. We now define the point estimate G¯ d ( jωk ) using only the sampled u(iTs ) and y f (iTs ) by Y f ,DFT (ωk ) . (11.11) G¯ d ( jωk ) = A UDFT (ωk )
168
E.-W. Bai
From the calculation of Y f ,DFT (ωk ) and UDFT (ωk ), it follows that G¯ d ( jωk ) = G( jωk ) + 2V f ,DFT (ωk ) where V f ,DFT (ω ) is the DFT of v f (t), and the estimation error is G¯ d ( jωk ) − G( jωk ) = 2V f ,DFT (ωk ) . We now summarise the algorithm for estimating G( jωk ) using only the sampled u(iTs ) and y f (iTs ). Identification algorithm for estimating G( jωk ) using only the sampled data. Given u(t) = Acos(ωk t), let i¯ωk < ω¯ < (i¯+ 1)ωk for some integer i¯ ≥ 1, Ts = i¯2ωπ M1 k
for some integer M > 2 and T = L 2ωπ for some integer L > 0. k Step 1: Collect u(iTs ) and y f (iTs ), i = 0, 1, ..., Li¯M − 1. Step 2: Calculate Y f ,DFT (ωk ) and UDFT (ωk ). Y (ω ) Step 3: Define the estimate G¯ d ( jωk ) = A Uf ,DFT(ω k) . DFT k The estimate G¯ d ( jωk ) is computable using only the sampled data. Moreover, as its continuous counterpart, G¯ d ( jωk ) → G( jωk ) as T → ∞ as shown in the following theorem. Theorem 11.2. Consider the point estimate G¯ d ( jωk ) of (11.11) with T and Ts defined in (11.2) and (11.8) respectively. Then, uniformly in k, G¯ d ( jωk ) = G( jωk ) + 2V f ,DFT (ωk ) → G( jωk ) in probability as T → ∞.
11.3 Identification of G(s) Given the point estimates G¯ d ( jωk )’s, to find a G( jω ) is a curve fitting problem. Whether a particular method is effective for identification of G( jω ) depends on the assumptions of G( jω ). If G( jω ) is non-parametric, it is expected that the method is complicated and tedious. On the other hand, the identification is much easier if the unknown G(ω ) is known to be an nth order rational transfer function.
11.3.1 Finite-order Rational Transfer Function G(s) In this section, we will discuss a simple case when the unknown G(s) is characterised by an nth order stable rational transfer function G(s) =
b1 sn−1 + b2 sn−2 + ...... + bn . sn + a1sn−1 + a2sn−2 + ...... + an
(11.12)
11
Frequency Domain Identification of Hammerstein Models
169
The unknown coefficient vector θ and its estimate θˆ are denoted by
θ = (b1 , ......, bn , a1 , ....., an ) , θˆ = (bˆ 1 , ......, bˆ n , aˆ1 , ....., aˆn ) . The simplest way to find θˆ is to solve the least squares minimisation [12]. Let e(θˆ , ωk ) = (( jωk )n−1 bˆ 1 + ... + bˆ n) − G¯ d ( jωk )(( jωk )n + ( jωk )n−1 aˆ1 + ... + aˆ n) . Then, the estimate θˆ is obtained by N
θˆ = arg min ∑ e(θˆ , ωk )2
(11.13)
k=1
for some N ≥ n. Clearly, if G¯ d ( jωk ) = G( jωk ), e(θ , ωk ) = 0 and θˆ = θ . Now, we ˆ define the estimate G(s) as ˆ = G(s)
bˆ 1 sn−1 + ... + bˆ n n s + aˆ1sn−1 + ... + aˆ n
(11.14)
The following theorem can be easily derived that gives the estimation error analysis. Theorem 11.3. Let G( jω ) = 0, ∀ω and let the parameter vector estimate θˆ and ˆ the transfer function estimate G(s) be defined by (11.13) and (11.14) with N ≥ n. Suppose G¯ d ( jωk ) → G( jωk ) in probability as T → ∞. Then, ˆ jω ) − G( jω )| → 0 θˆ → θ , sup |G( ω
in probability as T → ∞. ˆ jω ) of (11.13) and (11.14) are consisWe remark that the least squares solutions G( ¯ tent in theory because Gd (ωi ) → G( jω ) as T → ∞. In some applications, however, the least squares may not perform well due to various reasons. For instance, (1) When T is finite which is always the case in reality, G¯ d ( jωi ) = G( jωi ) and this ˆ jω ), (2) The lowpass filter is not introduces errors on the least squares estimate G( ideal or the noise may not be completely captured by the assumptions. This again causes errors on the point estimate G¯ d ( jωi ) and consequently the least squares esˆ jω ) and (3) A large range of input frequencies can over-emphasise high timate G( frequency errors and results in a poor low frequency fit. To overcome these difficulties, the iterative least squares can be used. Let θˆ (l) be the estimate obtained at the lth iteration, the iterative least squares solution of θˆ (l+1) consists of minimising N
θˆ (l+1) = arg min ∑
e(θˆ (l+1) , ωk )2
(l) (l) k=1 ( j ωk )n + ( j ωk )n−1 aˆ1 + ... + aˆ n 2
.
170
E.-W. Bai
Alternatively, the nonlinear least squares estimate can be defined N
θˆ = arg min ∑ G¯ d (ωk ) − k=1
bˆ 1 ( jωk )n−1 + ... + bˆ n 2 ( jωk )n + aˆ1 ( jωk )n−1 + ... + aˆ n
and solved using numerical methods, e.g., the Newton-Gauss scheme. For both the iterative least squares and the nonlinear least squares, the linear least squares solutions provided by (11.13) and (11.14) can be used as an initial estimate to begin with. It was shown in [14] that, if convergent, the iterative least squares and the nonlinear least squares tend to give a smaller estimation error. For details, see Sections 7.8 and 7.9 of [14].
11.3.2 Non-parametric G(s) Given the point estimates, how to find the transfer function is a classical problem. There exist some methods in the literature that could be modified and used here. For instance, the well known spectral analysis method [13] aims to determine the transfer function based on spectral estimation and smoothing. Here, we adopt an approach based on interpolation technique which is used in H∞ identification setting. To this end, consider the standard bilinear transformation s=
1 λ −1 1 + sγ or λ = γ λ +1 1 − sγ
for some γ > 0. In this section, we set the numerical value γ = 1. Now define H(λ ) = G(s)|s= λ −1 = λ +1
∞
∑ hk λ −k .
k=0
Since G(s) is unknown but exponentially stable and the bilinear transformation preserves the stability, the unknown H(λ ) satisfies |hk | ≤ M1 ρ k for some constants M1 > 0 and 0 < ρ < 1. Further, let s = jω and λ = e jΩ , we have jω =
e jΩ − 1 Ω Ω = jtan or ω = tan . j Ω e +1 2 2
Our idea of identification is to use the point estimate G¯ d ( jωk ) of (11.11) at ωk = ˆ λ) = tan kNπ or Ωk = 2kNπ , k = 0, 1, ..., N − 1 for some N > 0. Then, we construct H( N−1 ˆ −k ∑k=0 hk λ such that ˆ jΩk ) = H(e
N−1
¯ jωk ), k = 0, 1, ..., N − 1 . ∑ hˆ l e− jΩk l = G(
(11.15)
l=0
ˆ Finally, we define the estimate G(s) of G(s) as ˆ = H( ˆ λ )| 1+s . G(s) λ= 1−s
(11.16)
11
Frequency Domain Identification of Hammerstein Models
171
Theorem 11.4. Let ωk = tan kNπ , Tk = Lk 2ωπk , k = 0, 1, ..., N − 1 and T = mink Tk . ˆ Consider the estimate G(s) of (11.16). Suppose G¯ d (ωk ) → G(ωk ) in probability, and N → ∞ and N/T → 0 as T → ∞. Then, uniformly in ω , ˆ jω ) − G( jω ))| ≤ O(ρ N ), E|G( ˆ jω ) − G( jω )|2 ≤ O(ρ 2N ) + O( N ) → 0 |E(G( T ˆ jω ) converges to G( jω ) uniformly in probability. as T → ∞. In other words, G( The idea of the above estimate is the polynomial interpolation of (11.15). Under Assumptions 11.1 and 11.2, G¯ d ( jωk ) → G( jωk ) in probability and consequently ˆ jω ) → G( jω ). However, if G¯ d ( jωk ) → G( jωk ) for various reasons, e.g., T is G( finite, then Gˆ d ( jωk ) = G( jωk ) and, say only |G¯ d ( jωk ) − G( jωk )| ≤ ε can be guaranteed for some small but non-zero ε > 0. Then, the polynomial interpolation of (11.15) tends to show some overshooting for very large N. In fact, the overshooting is in the order of ε ln N as N → ∞. To avoid this problem, several methods can be used, e.g., interpolations using splines. We discuss the following two robustness modifications: • If only a finite frequency range is interested, we can apply Fejer interpolation which matches the given data but also limits the magnitude of the derivatives. ˆ jω ) − G( jω )| = O(ε ) as N → ∞ ˆ jω ) satisfies |G( Then, the obtained estimate G( in the frequency range of interest. This algorithm is linear and see [4] for details. • If the whole frequency range (−∞, ∞) is interested, it is well known that there does not exist any linear algorithm which ensures the robustness in the presence of small but non-vanishing errors in the point estimation of G¯ d ( jωk ). Here, linear algorithm means that the algorithm is linear from the given data to the estimate. In this case, a two stage nonlinear algorithm [8] can be applied. The first stage of the algorithm is to find a non-causal system and the second stage is to apply the Nehari approximation to find a best fit which is stable and causal. The error between the estimate and the true system is in the order of O(ε ) as N → ∞ [8]. Since the above two modifications are available in the literature, we only provide a brief discussion and interested readers can find more from [6, 8]. We also comment that the approach adopted here is based on the interpolation technique. Other techniques, e.g., the well known spectral analysis method can also be modified and used to determine the transfer function. The key is the reliable point estimation provided in the previous sections.
11.4 Identification of the Nonlinear Part f (u) ˆ Once the linear part G(s) is identified, we can estimate the nonlinear part x = fˆ(u). Two cases are discussed: (1) There is no a priori knowledge on the structure of the unknown f (u) and (2) f (u) is represented by a polynomial with a known order. In both cases, we need to estimate the ri ’s.
172
E.-W. Bai
11.4.1 Unknown Nonlinearity Structure Although the structure of the nonlinearity is assumed to be unknown, the nonlinearity is static and can be determined by the graph information using pairs (u(iTs ), x(iTs )). The input u(iTs ) is available and therefore, recovery of x(t) and, consequently x(iTs ), becomes a key in determining the nonlinearity. The input is in the form of u(t) = Acos(ωk t) and the output of the nonlinearity ¯ is x(t) = ∑∞ ˆ = ∑ii=0 rˆi cos(iωk t) for i=0 ri cos(iωk t). Define the estimate of x(t) as x(t) some integer i¯ > 0. Here rˆi denotes the estimate of ri . Clearly the estimation error is given by i¯
|x(t) − x(t)| ˆ = | ∑ (ri − rˆi )cos(iωk t) + i=0
i¯
≤ ∑ |ri − rˆi | + | i=0
∞
∑
∞
∑
ri cos(iωk t)|
i=i¯+1
ri cos(iωk t)| .
(11.17)
i=i¯+1
By the continuity and piecewise smoothness condition on f (·), the second term converges to zero uniformly [15] as i¯ → ∞. We need now to find the estimates rˆi ’s so that the first term also converges to zero. Recall when u(t) = Acos(ωk t), UDFT (ωk ) = A/2 and ri ¯ 2 G( j ωk i) + V f ,DFT (iωk ) i = 1, 2, ..., i , Y f ,DFT (iωk ) = r0 G( j0) + V f ,DFT (0) i=0. Define the estimate rˆi ’s by rˆ0 =
rˆi =
G( j0) V f ,DFT (0) A Y f ,DFT (0) = r0 + ˆ ˆ j0) ˆ j0) 2G( j0) UDFT (ωk ) G( G(
(11.18)
Y f ,DFT (iωk ) G( jiωk ) 2V f ,DFT (iωk ) A = ri + , i = 1, 2, ..., i¯ . (11.19) ˆ ˆ jiωk ) ˆ jiωk ) U ( ω ) G( jiωk ) DFT k G( G(
ˆ jω ) − G( jω ), this implies ˜ jω ) = G( With G( |ˆr0 − r0 | = | −
˜ j0) V f ,DFT (0) G( ˜ j0)|| r0 | + | V f ,DFT (0) | r + | ≤ |G( ˆ j0) 0 ˆ j0) ˆ j0) ˆ j0) G( G( G( G(
and for i = 1, 2, ..., i¯, |ˆri − ri | = | −
˜ jiωk ) 2V f ,DFT (iωk ) 2V f ,DFT (iωk ) G( ˜ jiωk )|| ri ri + | ≤ |G( |+| |. ˆ ˆ ˆ ˆ jiωk ) G( jiωk ) G( jiωk ) G( jiω0 ) G(
˜ jiω0 ) → 0 in probability as T → ∞, we have Clearly, if G( jω ) = 0, ∀ω and G( rˆi → ri , i = 0, 1, ..., i¯ and this results in the following theorem.
11
Frequency Domain Identification of Hammerstein Models
173
ˆ jωk i) → Theorem 11.5. Assume that G( jω ) = 0 ∀ω , and uniformly in ωk i G( G( jωk i) in probability. Let rˆi ’s be given by (11.18) and (11.19). Suppose i¯ → ∞, ¯2 N → ∞, i¯2 ρ 2N → 0, i TN → 0 as T → ∞, where T is defined in Theorem 11.4. Then, in probability |x(t) − x(t)| ˆ →0 ˆ s )| → 0. uniformly as T → ∞ and consequently, |x(iTs ) − x(iT We comment that for each input frequency ωk , we could derive a set of estimates rˆi (ωk ), i = 0, 1, ..., i¯, where ωk emphasises that the estimate is derived when input frequency is at ωk . In applications, an average is recommended rˆi =
rˆi (ω1 ) + rˆi (ω2 ) + ... + rˆi(ωN ) , i = 0, 1, ..., i¯ . N
In this section, the structure of the input nonlinearity is assumed to be unknown and thus estimation relies on the graph given by the pairs (u, x). ˆ Once the graph is obtained, its structure can be determined. The next step is to parametrise this nonlinearity by using appropriate base functions, e.g., x = f (u) = ∑ fl (u, αl ) for some known nonlinear functions fl ’s and unknown coefficients αl ’s. The choice of fl ’s of course depends on the structure shown in the graph. Then, the optimal αˆ can be calculated αˆ = arg min ∑(∑ fl (u(iTs ), αl ) − x(iT ˆ s ))2 . (11.20) α
i
l
In the case that the nonlinearity f (·) is even or odd, then from Lemma 11.1, the number of the Fourier coefficients ri ’s which have to be estimated can be cut into half.
11.4.2 Polynomial Nonlinearities In this section, we discuss a simple case when the unknown nonlinearity is parametrised by a polynomial x = ∑li=0 βi ui . The exact order of the polynomial is not necessarily known. However, an upper bound l is assumed to be available. Note that the identification of such a nonlinearity can be carried out without using the structure as discussed in the previous section section. However, if such information is available, these information should be taken into consideration in identification. Denote l l l1 = , l2 = − rem(l + 1, 2) 2 2
(11.21)
where 2l rounds l/2 to the nearest integer towards zero and rem(l + 1, 2) is the remainder after division (l + 1)/2. When u(t) = Acos(ωk t), it follows that l
x(t) = ∑ βi ui (t) = i=0
l
∑
i=0,even
βi Ai cosi (ωk t) +
l
∑
i=0,odd
βi Ai cosi (ωk t)
(11.22)
174
E.-W. Bai
= β 0 A0 +
l1
m−1
1
∑ β2m A2m 22m−1 [ ∑ ci2m cos((2m − 2i)ωkt) +
m=1
i=0
l2
+
m
l
i=0
i=0
1
cm 2m ] 2
∑ β2m+1A2m+1 22m ∑ ci2m+1cos((2m − 2i + 1)ωkt) = ∑ ri cos(iωkt)
m=0
where l1
r0 = β0 +
A2m
∑ β2m 22m cm2m
(11.23)
m=1 l2
r2k+1 =
∑ β2m+1
m=k l1
r2k =
A2m+1 m−k c , k = 0, 1, ......, l2 22m 2m+1
A2m
∑ β2m 22m−1 cm−k 2m ,
(11.24)
k = 1, 2, ......, l1
(11.25)
m=k
with
m(m − 1)...(m − k + 1) . (11.26) 1 · 2... · ... · k From equations (11.23), (11.24) and (11.25), we see that βi ’s and rk ’s satisfy the following equations. ckm =
⎛
1
⎜ ⎜0 ⎜ ⎜0 ⎜ ⎜. ⎜. ⎝. 0 ) ⎛A ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ )
20
c01
0 0 .. . 0
A2 1 c 22 2 A2 0 c 21 2
0 .. . 0
A4 2 c 24 4 A4 1 c 23 4 A4 0 c 23 4
.. . 0
... ... ... .. . *+
⎞ A2l1 l1 c ⎛ 22l1 2l1 A2l1 l1 −1 ⎟ ⎟ c 22l1 −1 2l1 ⎟ ⎜ A2l1 l1 −2 ⎟ ⎜ c 22l1 −1 2l1 ⎟ ⎜ .. .
A2l1 0 c 22l1 −1 2l1
...
Σ0
A3 1 c 22 3 A3 0 c 22 3
0 .. . 0
A5 2 c 24 5 A5 1 c 24 5 A5 0 c 24 5
.. . 0
*+ Σ1
... ... ... .. . ...
β0 β2 . ⎟ ⎝ .. ⎟ ⎠ β2l
2
⎛
r0 ⎟ ⎜ r2 ⎟ ⎜ ⎟ = ⎜ .. ⎠ ⎝ .
⎞ ⎟ ⎟ ⎟, ⎠
(11.27)
r2l1
1
,
⎞ A2l2 +1 l2 c 2l 2l +1 ⎛ 2 2 2 A2l2 +1 l2 −1 ⎟ ⎟ c 22l2 2l2 +1 ⎟ ⎜ A2l2 +1 l2 −2 ⎟ ⎜ c ⎜ 22l2 2l2 +1 ⎟ ⎟⎝ .. . A2l2 +1 0 2l2 c2l2 +1
⎞
β1 β3 .. .
⎞
⎛
r1 r3 .. .
⎞
⎟ ⎜ ⎟ ⎟ ⎜ ⎟ ⎟=⎜ ⎟, ⎠ ⎝ ⎠ ⎟ ⎠ β2l +1 r2l2 +1 2
(11.28)
,
where l1 and l2 are defined in (11.21). The matrices Σ0 and Σ1 are independent of unknown βi ’s and rk ’s. Since there is one-to-one map between βi ’s and rk ’s, the estimates βˆi ’s and fˆ = l ∑i=0 βˆi ui can be easily obtained based on the estimates of rˆi ’s,
11
Frequency Domain Identification of Hammerstein Models
⎛ ˆ ⎞ ⎛ ⎞ β0 rˆ0 ⎜ βˆ2 ⎟ ⎜ rˆ2 ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ . ⎟ = Σ0−1 ⎜ .. ⎟ , ⎝ .. ⎠ ⎝ . ⎠ ˆ rˆ2l1 β2l1
⎛ ˆ ⎞ ⎛ ⎞ β1 rˆ1 ⎜ βˆ3 ⎟ ⎜ rˆ3 ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ . ⎟ = Σ1−1 ⎜ .. ⎟ , ⎝ .. ⎠ ⎝ . ⎠ rˆ2l2 +1 βˆ2l2 +1
and
175
(11.29)
l
fˆ(u) = ∑ βˆi ui (t) .
(11.30)
i=0
Clearly, if rˆi → ri in probability as T → ∞, then βˆi → βi and supu∈[−A,A] | fˆ(u) − f (u)| → 0 in probability as T → ∞. We now summarise the above discussion into the following theorem. Theorem 11.6. Let rˆi ’s be given as in (11.18) and (11.19). Consider the estimates βˆi ’s and fˆ(u) = ∑li=0 βˆi ui derived from (11.29) and (11.30). Then, under the conditions of Theorem 11.5, in probability as T → ∞
βˆi → βi ,
sup | fˆ(u) − f (u)| → 0 . u∈[−A,A]
We comment that the inverses Σ0−1 and Σ1−1 are involved in calculating the estimates. In theory, the matrices Σ0 and Σ1 are always nonsingular. However, these matrices become ill-conditioned very soon, see Table 11.1 for condition numbers. For a low dimension polynomial, the method of (11.29) and (11.30) are fairly effective. For a higher dimensional polynomial, caution has to be exercised because of large condition numbers. For a really high order polynomial, a two step method preˆ s ) to determine sented in the previous section can be used, i.e., using u(iTs ) and x(iT the nonlinearity and then using the optimisation (11.20) to find the polynomial coefficients. Alternately, an orthogonal polynomial approach may be used to overcome this difficulty. Table 11.1: Condition numbers of Σ0 and Σ1 dimension 1 2
3
4
5
6
cond(Σ0 ) 1 2.6 14.4 82.4 447 2780 cond(Σ1 ) 1 6.3 37.6 220 1294 7570
11.5 Simulation In this section, we consider two numerical examples. Example 1: The unknown nonlinear and linear parts are given, respectively, by 3
f (u) = u + u2 = ∑ βi ui , G(s) = i=0
s+1 . s2 + 5s + 6
176
E.-W. Bai
The nonlinearity is known to be a polynomial with the maximum order 3 and the linear part is a second order transfer function. The noise v(t) is a random signal uniformly distributed in [−0.25, 0.25] and the input is u(t) = Acos(ωit), A = 1, i = 1, 2, 3 with ω1 = 0.5 ω2 = 1, ω3 = 5 and Ti = 100 2ωπi . For input frequency ωi , the sampling interval is set to be 100 · 2ωπi /10000 = 50πωi . No lowpass filter is used in simulation, i.e., y f (t) = y(t). Because the linear part is parametric, we use the estimate of (11.14). The identified linear and nonlinear coefficients are shown in the tables 11.2, 11.3, 11.4. Table 11.2: The true values and the estimates of ri ’s
true values estimates, ω1 = 0.5 estimates, ω2 = 1 estimates, ω3 = 5 average
r0
r1
r2
r3
0.5 0.5073 0.5020 0.4947 0.5013
1 1.0027 0.9975 1.0001 1.0001
0.5 0.4947 0.4977 0.5108 0.5011
0 0.0055 0.0097 0.0052 0.0068
Table 11.3: The true values and the estimates of βi ’s
β0 true values 0
β1
β2
β3
1
1
0
estimates 0.003 1.0052 1.0022 -0.0068 Table 11.4: The true values and the estimates of θ = (θ1 , θ2 , θ3 , θ4 )
θ1 true values 1
θ2
θ3
θ4
1
5
6
estimates 0.9861 1.0128 4.8555 6.0745
ˆ Thus, the estimates of fˆ(·) and G(s) are given by ˆ = fˆ(u) = 0.003 + 1.0052u + 1.0022u2 − 0.0068u3, G(s)
0.9861s + 1.0128 s2 + 4.8555s + 6.0745
which are very close to the true but unknown f (u) and G(s). The true (solid line) and the estimated (dash-dot) nonlinearities are shown in Figure 11.2, and the Bode plots of the true (solid line) and the estimated transfer functions are shown in Figure 11.3. They are basically indistinguishable.
11
Frequency Domain Identification of Hammerstein Models
177
Fig. 11.2: The true (solid) and the estimated (dash-dot) nonlinearities
Example 2: The linear part is the same as in Example 1. However, the nonlinear part is a saturation nonlinearity as shown in Figure 11.4 in solid line, ⎧ u ≥ 0.808 , ⎨ 0.808 u −0.808 < u < 0.808 , x= ⎩ −0.808 u ≤ −0.808 . The structure is unknown in simulation. The noise v(t) is a random signal uniformly distributed in [−0.1, 0.1] and the input is u(t) = 2 ∗ cos(ωit), i = 1, 2 with ω1 = 0.5, ω2 = 2 and Ti = 100 2ωπi . The estimate of the transfer function is given by ˆ = 0.9951s + 0.9879 G(s) s2 + 4.9907s + 5.9404 which is very close to the true but unknown G(s). For the nonlinearity, because its structure is unknown, we estimate the Fourier coefficients rˆi ’s first, which are shown in Table 11.5.
178
E.-W. Bai
Fig. 11.3: The true (solid) and the estimated (dash-dot) Bode plots Table 11.5: The true values and the estimates ri ’s (saturation) i
0
12
3
4
5
6
7
8
9
true ri rˆi , ω = 0.5 rˆi , ω = 2 average
0 .0032 -.0001 .0015
1 1 1 1
-.2625 -.2642 -.2673 -.2658
0 .0012 -.0007 .0003
.0889 .0859 .0827 .0843
0 .0004 -.0117 -.0057
-.0141 -.0151 -.0195 -.0173
0 0 -.0141 -.0070
-.0153 -.0092 -.0136 -.0114
0 .0014 .0002 .0008
The estimated nonlinearity (circle) is shown in Figure 11.4 by using the pairs 9
u(mTs ) = 2 ∗ cos(ωk mTs ), x(mT ˆ s ) = ∑ rˆi cos(iωk mTs ) i=0
Either input frequency ω1 or ω2 can be used. In our simulation, they really do not make any difference. Although the structure of the nonlinearity is not known a priori, the graph using the pair (u, x) ˆ gives a good estimate of the unknown nonlinearity. Further, if the form of f (u) is unknown, but it is known that f (u) is odd. Then, from Lemma 11.1, all the even coefficients r2i ’s are zero. In this case, we only have to identify the odd coefficients 9
x(t) ˆ =
∑
i=1,odd
rˆi cos(iω t) .
11
Frequency Domain Identification of Hammerstein Models
179
Fig. 11.4: The true (solid) and the estimated (circle) nonlinearities
11.6 Concluding Remarks In this chapter, we have proposed a frequency domain identification approach for Hammerstein models. By exploring the fundamental frequency, the linear part and the nonlinear part can be identified. No information on the form of the nonlinearity is assumed. The method is simple. Note that in the absence of prior information on the structure of the nonlinearity, the estimation is based on the Fourier series and thus, the rate of the convergence of the Fourier series becomes important. For those with rapidly decreasing coefficients, the first a few terms suffice to give a quite accurate approximation. This leads naturally to the question of how to speed up the convergence. There are some interesting ideas along this direction in [15]. It will certainly be a very interesting topic to pursue this further in the context of identification for Hammerstein models as discussed.
References 1. Bai, E.W.: A frequency domain approach Hammerstein model identification. IEEE Trans. on Auto. Contr. 48, 530–542 (2003) 2. Bai, E.W.: Identification of linear systems with hard input nonlinearities of known structure. Automatica 38, 853–860 (2002)
180
E.-W. Bai
3. Bai, E.W.: A blind approach to the Hammerstein–Wiener model identification. Automatica 38, 967–979 (2002) 4. Bai, E.W., Raman, S.: A linear interpolatory algorithm for robust system identification with corrupted measurement data. IEEE Trans. on Auto. Contr. 38, 1236–1241 (1993) 5. Bauer, D., Ninness, B.: Asymptotic properties of Hammerstein model estimates. In: Proc. of IEEE CDC, Sydney, Australia, pp. 2855–2860 (2000) 6. Baumgartner, S., Rugh, W.: Complete identification of a class of nonlinear systems from steady state frequency response. IEEE Trans. on Circuits and Systems 22, 753–759 (1975) 7. Brigham, E.O.: The fast Fourier transform. Prentice-Hall, Englewood Cliffs (1974) 8. Chen, J., Gu, G.: Control Oriented System Identification: An H∞ Approach. John Wiley & Sons, New York (2000) 9. Crama, P., Schoukens, J.: First estimates of Wiener and Hammerstein systems using multisine excitation. In: Proc. of IEEE Instrumentation and Measurement Conf., Budapest, Hungary, pp. 1365–1369 (2001) 10. Gardiner, A.: Frequency domain identification of nonlinear systems. In: 3rd IFAC Symp. on Identification and System Parameter Estimation, Hague, Netherlands, pp. 831–834 (1973) 11. Krzyzak, A.: On nonparametric estimation of nonlinear dynamic systems by the Fourier series estimate. Signal Processing 52, 299–321 (1996) 12. Levi, E.C.: Complex curve fitting. IEEE Trans. on Auto. Contr. 4, 37–43 (1959) 13. Ljung, L.: System Identification: Theory for the users, 2nd edn. Prentice-Hall, Upper Saddle River (1999) 14. Pintelon, R., Schoukens, J.: System Identification: A Frequency Domain Approach. IEEE Press, Piscataway (2001) 15. Tolstov, G.: Fourier Series. Prentice-Hall, Englewood Cliffs (1962) 16. Vandersteen, G., Rolain, Y., Schoukens, J.: Nonparametric estimation of the frequencyresponse function of the linear blocks of a Wiener–Hammerstein models. Automatica 33, 1351–1355 (1997) 17. Zadeh, L.: On the identification problem. IRE Trans. on Circuit Theory 3, 277–281 (1956)
Chapter 12
Frequency Identification of Nonparametric Wiener Systems Fouad Giri, Youssef Rochdi, Jean-Baptiste Gning, and Fatima-Zahra Chaoui
12.1 Introduction A great deal of interest has recently been paid to Wiener system identification Figure 12.1. However, most proposed solutions have been developed in the case of parametric systems, see e.g. [13, 14, 15, 17, 19]. As the internal signal x(t) is not accessible for measurement, and may even be of no physical meaning, the system output then turns out to be a bilinear (but fully known) function of the unknown parameters (those of the nonlinearity, on one hand, and those of the linear subsystem, on the other hand). Such bilinearity feature has been carried out following different approaches. One of them is the iterative optimisation method that consists in computing alternatively the parameters of the linear subsystem and those of the nonlinear subsystem. When optimisation is performed with respect to one set of parameters, the other set is fixed. Such an iterative procedure is shown to be efficient provided that it converges, e.g. [17]. But, convergence cannot be guaranteed except under restrictive conditions, e.g. [20]. In [2] a separable nonlinear optimisation solution is suggested. It consists in expressing a part of the parameters (namely, those of the linear subsystem) in function of the others, using first order optimality conditions. The dimension of the parameter space is thus reduced making easier the optimisation problem. Frequency-type solutions have also been proposed, see e.g. [5]. The idea is to apply repeatedly a sine input with different amplitudes and frequencies. Then, exploiting the polynomial nature of the nonlinearity, the input-output equation can Fouad Giri GREYC Lab,University of Caen, France Youssef Rochdi University of Cadi Ayyad, Marrakech, Morocco Jean-Baptiste Gning Crouzet Automatismes, Valence, France Fatima-Zahra Chaoui ENSET, Rabat, Morocco F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 181–207. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com
182
F. Giri et al.
be uniquely solved with respect to the unknown parameters. Further details on the separable least squares approach can be found in Chapter 16. Nonparametric nonlinearities have also been dealt with using different approaches. In [7, 8, 9, 10] the identification problem is dealt with using stochastic tools. But, the nonlinearity is supposed to be invertible and smooth. Another stochastic approach has been proposed in [12] where the linear subsystem coefficients are first estimated using a least-squares type algorithm and the obtained estimates are used to recover the nonlinearity N(x) at any fixed x. Consistency is established supposing that the linear subsystem is MA with known and nonzero leading coefficient and, in addition, the nonlinearity is continuous with growth not faster than a polynomial. In [1] a frequency method is proposed for noninvertible nonlinearities. It consists in applying repeatedly sine input signals and operating a discrete-Fourier transformation of the obtained (steady-state) periodic output signals to estimate the frequency gain (at different frequencies) and the nonlinearity. In [4], a recursive identification scheme is proposed for Wiener systems including a dead-zone preload nonlinearity. There too, consistency is only achieved in presence of MA subsystems with known and nonzero leading coefficient. More recently, a semi-parametric identification approach has been presented in [16]. The impulse response function of the linear subsystem is identified via the nonlinear least squares approach with the system nonlinearity estimated by a pilot nonparametric kernel regression estimate. The linear subsystem estimate is then used to form a nonparametric kernel estimate of the nonlinearity. The price paid is that the impulse response function is a finite order which actually amounts to suppose the linear subsystem to be MA. The maximum likelihood method recently proposed in [11] constitutes an interesting alternative. Then, consistency of parameter estimates is guaranteed provided all inputs (control and noises) are Gaussian processes. In the present monograph, the above stochastic approaches are dealt with in Chapters 6 to 10. In the present chapter, a frequency-domain identification scheme is presented for Wiener systems involving possibly noninvertible and nonsmooth nonlinearities. The identification purpose is to estimate the system nonlinearity f (.) and the system phase and gain (∠G ( jωk ) , |G ( jωk ) |), for a set of a priori chosen frequencies ωk (k = 1 . . . m). As a matter of fact, the complexity of the identification problem partly depends on the assumptions made on the system. Presently, no assumption is made on the linear part G(s), except stability, while the nonlinear element is allowed to be noninvertible and nonsmooth. The problem complexity also lies in the fact that the (unavailable) internal signal (x (t)) cannot be uniquely reconstituted from the input-output signals. Consequently, the system cannot be uniquely modelled i.e. any couple of the form (G(s)/K, f (Kx)) (K = 0) is a possible model. The present frequency-domain identification method relies on analytic geometry tools first introduced in [6]. It particularly involves Lissajous like curves and their geometrical characteristics e.g. area and spread. These tools and other portions of [6] are presently reused with permission from the IEEE Intellectual Property Rights Office.
12
Frequency Identification of Nonparametric Wiener Systems
183
Fig. 12.1: Wiener model structure
12.2 Identification Problem Statement We are considering nonlinear systems that can be described by the Wiener model of Fig12.1 where G(s) denotes the transfer function of the linear subsystem and f (.) is a memoryless nonlinear function. Analytically, the Wiener model is described by the equations x(t) = g(t) ∗ u(t) , y(t) = w(t) + v(t)
with w(t) = f (x(t)) ,
(12.1) (12.2)
where the symbol ∗ refers to the convolution operation, g(t) = L−1 (G(s)) is the inverse Laplace transform of G(s) (g(t) is also the impulse response of the linear subsystem). Only the input u(t) and the output y(t) are accessible to measurement, the internal signals x(t) and w(t) are not. The equation error v(t) is a random signal that accounts for output noise. The linear subsystem is supposed to be BIBO stable (which is normal when system identification is carried out in open loop). Also, {v(t)} is assumed to be a stationary ergodic sequence of zero-mean independent random variables. Ergodicity makes possible the substitution of arithmetic averages to probabilistic means, simplifying forthcoming developments. As this is usually the case, the frequency-domain identification method we are developing necessitates the application of sine signals, u(t) = U cos(ω t), for a set of a priori chosen amplitudes and frequencies (U, ω ) ∈ {(Uk , ωk ); k = 1, . . . , m}. Thanks to linear subsystem stability, the steady-state internal signal turns out to be de f
of the form x(t) = U|G( jω )| cos(ω t − ϕ (ω )) with ϕ (ω ) = −∠G( jω ). With these notations, it is supposed that f (.) is defined on all intervals −Uk |G( jωk )| ≤ x ≤ Uk |G( jωk )| and falls in the following class of functions: Assumption 12.1 a. If f (.) is even then, f (0) should be known and f −1 ( f (0)) = {0} b. If f (.) is not even then, there should exist , −1 ≤ σ0 ≤ σ1 ≤ 1 such that f (.) is locally invertible on the subsets σ0Uk |G( jωk )| σ1Uk |G( jωk )| (k ∈ {1, . . . , m}) , i.e. one has, for all x ∈ σ0Uk |G( jωk )| σ1Uk |G( jωk )| and all z ∈ [−Uk |G( jωk )|, Uk |G( jωk )|]; f (x) = f (z) ⇒ x = z.
184
F. Giri et al.
Without loss of generality we suppose that |G( jωk )| = 0, for all k. Indeed, if |G( jωk )| were zero (for some k) then x(t) would be null and the output y(t) would in turn be null (up to noise). This case can easily be recognised in practice and discarded, observing the output. Except for the above assumption, the system is arbitrary. In particular, G(s) and f (.) are not necessarily parametric and the latter is allowed to be noninvertible and nonsmooth. On the other hand, note that Part b of Assumption 12.1 is not too restrictive because the sizes of the subintervals (σ0Uk |G( jωk )| σ1Uk |G( jωk )|) are unknown and may be arbitrarily small (i.e. σ1 may be too close to σ0 ); this feature guarantees a wide application field of the proposed identification approach. The identification scheme must estimates of the sys# " be able to provide accurate
tem frequency characteristics |G( jωk )|, ∠G( jωk ), f (x) (k = 1, . . . , m). As previously mentioned, any couple of the form (G(s)/K, f (Kx)) (K = 0) is a possible model for the considered system. This naturally leads to the question: what particular model should the identification method focuses on? This question will be answered later in Subsection 12.4.2 . At this point, let us just emphasise that, as long as the system phase is concerned, all models are similar to either (G, f ) or (G− , f − ), de f
de f
depending on the sign of K, where G− (s) = −G(s), f − (x) = f (−x). Therefore, we begin the identification process designing a phase estimator. The main design ingredients are developed in the next section.
12.3 Frequency Behaviour Geometric Interpretations All along this section, the Wiener system is excited by a sine input signal u(t) = U cos(ω t) where (U, ω ) is any fixed couple belonging to the set {(Uk , ωk ) ; k = 1, . . . , m} Using the models (G, f ) and (G− , f − ) the resulting steady-state internal signals are respectively defined by the equations x(t) = U|G( jω )| cos (ω t − ϕ (ω )) and w(t) = f (x(t)) and (12.3) x− (t) = U|G( jω )| cos (ω t − ϕ (ω ) − π ) and w(t) = f − (x− (t)) .(12.4) Let us also define the corresponding known normalised signals: xn (t) = cos (ω t − ϕ (ω )) , x− n (t) = cos (ω t − ϕ (ω ) − π ) .
(12.5)
12.3.1 Characterisation of the Loci (xn (t), w(t)) and (x− n (t), w(t)) The aim of this subsection is to establish key properties that characterise the parametrised curves (xn (t), w(t)) and (x− n (t), w(t)). First, notice that the signals xn (t) and x− n (t) are particular elements of the more general class of signals
χψ (t) = cos(ω t − ψ ), ψ ∈ R .
(12.6)
12
Frequency Identification of Nonparametric Wiener Systems
185
Indeed, it is readily seen that: xn (t) = cos (ω t − ϕ (ω )) = χϕ (ω ) (t), x− n (t) = − cos (ω t − ϕ (ω )) = χϕ (ω )+π (t) .
(12.7)
Now, let C_ψ(U, ω) be the parametrised locus constituted of all points of coordinates (χ_ψ(t), w(t)) (t ≥ 0), i.e.

C_ψ(U, ω) = {(χ_ψ(t), w(t)); t ≥ 0}.   (12.8)

As χ_ψ(t) is periodic (with period T = 2π/ω) and, in steady state, w(t) is in turn periodic with period nT (for some n ∈ ℕ*), the curve C_ψ(U, ω) turns out to be an oriented closed locus. In general, C_ψ(U, ω) is constituted of one or several loops (see e.g. Figure 12.2). In the particular case where w(t) is a sine signal, C_ψ(U, ω) is a Lissajous curve¹ and may present different shapes depending on the value of ψ, e.g. an ellipse, a circle or a line [18]. As, presently, w(t) is not necessarily a sine signal, the curve C_ψ(U, ω) is referred to as Lissajous-like. Furthermore, C_ψ(U, ω) and C_{ψ+π}(U, ω) are symmetric with respect to the w-axis, a property that will prove to be useful in the forthcoming sections.

For now, the only characteristic of C_ψ(U, ω) that is of interest is its geometric area, denoted A(ψ). The geometric area of a single-loop curve ignores the curve orientation and so is positive. This is to be distinguished from the algebraic area, which may be positive or negative depending on the orientation sense of the curve. In the case of a multi-loop locus, the global geometric area equals the sum of the geometric areas of the different single loops. Figure 12.2 shows an example of such an oriented closed locus composed of several loops. Now, we are ready to introduce the next important definition.

Definition 12.1. The closed locus C_ψ(U, ω) is said to be static if its geometric area is null (A(ψ) = 0). Then C_ψ(U, ω) looks like a standard (non-closed) curve. Inversely, C_ψ(U, ω) is said to be non-static (or memory) when A(ψ) ≠ 0.

Proposition 12.1. Consider the Wiener system described by equations (12.1)-(12.2), subject to Assumption 12.1, excited by sine inputs u(t) = U cos(ωt), with (U, ω) ∈ {(U_k, ω_k); k = 1, …, m}. Then, the following facts hold:
1. For any h and any ω, the curve C_{ϕ(ω)+2hπ}(U, ω) is symmetric to C_{ϕ(ω)+(2h+1)π}(U, ω) with respect to the ordinate axis (w-axis).
2. The curves C_{ϕ(ω)+hπ}(U, ω) (h = 0, ±1, ±2, …; ω = ω₁, …, ω_m) are all static.

Proof. 1. From (12.2), (12.3), (12.6) and (12.7) it follows that:

w(t) = f(U|G(jω)| χ_{ϕ(ω)}(t)) = f⁻(U|G(jω)| χ_{ϕ(ω)+π}(t)).   (12.9)
¹ Lissajous curves are the family of parametric curves (x(t), y(t)) with x(t) = A cos(ω_x t + δ_x) and y(t) = B cos(ω_y t + δ_y). They were studied in 1857 by the French physicist Jules-Antoine Lissajous [18].
Fig. 12.2: An example of a closed locus with null algebraic area, while its geometric area is the sum of the areas of its four loops
On the other hand, one has:

χ_{ϕ(ω)+hπ}(t) = χ_{ϕ(ω)}(t) if h is even;  χ_{ϕ(ω)+hπ}(t) = χ_{ϕ(ω)+π}(t) = −χ_{ϕ(ω)}(t) if h is odd.   (12.10)
Then, it follows from (12.8) that C_{ϕ(ω)+2hπ}(U, ω) and C_{ϕ(ω)+(2h+1)π}(U, ω) coincide respectively with C_{ϕ(ω)}(U, ω) and C_{ϕ(ω)+π}(U, ω). Recall that C_{ϕ(ω)+2hπ}(U, ω) is the set of all points (cos(ωt − ϕ(ω)), w(t)) and C_{ϕ(ω)+(2h+1)π}(U, ω) is the set of all points (−cos(ωt − ϕ(ω)), w(t)). Therefore C_{ϕ(ω)+2hπ}(U, ω) and C_{ϕ(ω)+(2h+1)π}(U, ω) are symmetric with respect to the ordinate axis (0, w).
2. Since f(.) and f⁻(.) are functions (in the standard sense), it follows from (12.8) that C_{ϕ(ω)}(U, ω) and C_{ϕ(ω)+π}(U, ω) are static curves.

Proposition 12.2. Consider the problem statement of Proposition 12.1. If C_ψ(U, ω) is static for some ψ, then one has:
1. f(U|G(jω)| cos(θ)) = f(U|G(jω)| cos(θ − 2(ψ − ϕ))), for all θ.
2. f⁻(U|G(jω)| cos(θ)) = f⁻(U|G(jω)| cos(θ − 2(ψ − ϕ))), for all θ.
3. If the function f(.) is not even, then ψ − ϕ(ω) = hπ, for some h = 0, ±1, ±2, …
4. If the function f(.) is even, then ψ − ϕ(ω) = hπ or π/2 + hπ, for some h = 0, ±1, ±2, …

Proof. From (12.3)-(12.4) one has, for all t:

w(t) = f(U|G(jω)| cos(ωt − ϕ(ω))) = f⁻(−U|G(jω)| cos(ωt − ϕ(ω))).
On the other hand, if C_ψ is static then there exists a function g(.) such that:

w(t) = g(cos(ωt − ψ)).   (12.11)

Then, one has for all t:

f(U|G(jω)| cos(ωt − ϕ(ω))) = g(cos(ωt − ψ)),   (12.12)
f⁻(−U|G(jω)| cos(ωt − ϕ(ω))) = g(cos(ωt − ψ)).   (12.13)
On the other hand, it can be easily checked that:

cos(ω(t + ψ/ω) − ψ) = cos(ω(T − t + ψ/ω) − ψ) (for all t),   (12.14)

where T = 2π/ω. Then, one gets that:

g(cos(ω(t + ψ/ω) − ψ)) = g(cos(ω(T − t + ψ/ω) − ψ)) (for all t),

which, together with equations (12.12) and (12.13), yields (for all t):

f(U|G(jω)| cos(ω(t + ψ/ω) − ϕ(ω))) = f(U|G(jω)| cos(ω(T − t + ψ/ω) − ϕ(ω))),
f⁻(−U|G(jω)| cos(ω(t + ψ/ω) − ϕ(ω))) = f⁻(−U|G(jω)| cos(ω(T − t + ψ/ω) − ϕ(ω))).

These immediately imply, respectively:

f(U|G(jω)| cos(ωt + ψ − ϕ(ω))) = f(U|G(jω)| cos(ωT − ωt + ψ − ϕ(ω)))
  = f(U|G(jω)| cos(ωt − ψ + ϕ(ω))),
f⁻(−U|G(jω)| cos(ωt + ψ − ϕ(ω))) = f⁻(−U|G(jω)| cos(ωT − ωt + ψ − ϕ(ω)))
  = f⁻(−U|G(jω)| cos(ωt − ψ + ϕ(ω))).

Parts 1 and 2 of the current proposition follow from these expressions by letting θ = ωt + ψ − ϕ(ω). To prove Parts 3 and 4, let us introduce the following notations:
2(ψ − ϕ(ω)) = δ(ω) + 2hπ,   (12.15)

where:

0 ≤ δ(ω) < 2π and h = 0, ±1, ±2, …   (12.16)
Then, using Parts 1 and 2, it follows that, for all θ ∈ [0, 2π):

f(U|G(jω)| cos(θ)) = f(U|G(jω)| cos(θ − δ(ω))).   (12.17)
Case 1: f(.) is not even. It follows from (12.10)-(12.11), using Assumption 12.1 (Part b), that the function f is invertible in the subinterval (σ₀U|G(jω)|, σ₁U|G(jω)|). Then, from (12.17) it follows that for any θ ∈ (0, 2π) such that cos(θ) ∈ (σ₀, σ₁), one has:

cos(θ) = cos(θ − δ(ω)).   (12.18)

Now, it can be easily checked that if δ(ω) ≠ 0, then for all θ ∈ (0, 2π) \ {δ(ω)/2, π + δ(ω)/2}:

cos(θ) ≠ cos(θ − δ(ω)).   (12.19)

But this clearly contradicts (12.18) unless δ(ω) = 0, in which case (12.15) gives ψ − ϕ(ω) = hπ; this proves Part 3 of the current proposition.

Case 2: f(.) is even. Letting θ = π/2 in (12.17), one gets

f(0) = f(U|G(jω)| cos(π/2 − δ(ω))).

Then it follows from Assumption 12.1 (Part a) that cos(π/2 − δ(ω)) = 0, i.e.
δ(ω) = −2iπ or δ(ω) = π − 2iπ, for some integer i = 0, ±1, ±2, …   (12.20)

This, together with (12.15), implies that either ψ − ϕ = (h − i)π or ψ − ϕ = π/2 + (h − i)π, which proves Part 4 and completes the proof of Proposition 12.2.

Proposition 12.3. Consider the problem statement of Proposition 12.1. If C_ψ(U, ω) is static, then there exists a unique function g(.) such that g(cos(ωt − ψ)) = w(t), ∀t, and g(0) = f(0). More specifically, for all z ∈ [−1, +1]:

g(z) = f(U|G(jω)|z) if ψ − ϕ = 2hπ (for some integer h),
g(z) = f⁻(U|G(jω)|z) if ψ − ϕ = π + 2hπ.
Proof. The fact that C_ψ(U, ω) is static guarantees the existence of a g(.) such that:

g(cos(ωt − ψ)) = w(t), ∀t.   (12.21)
The uniqueness of g(.) is proved by separating the two cases referred to in Assumption 12.1.

Case 1: the function f(.) is not even. Using Proposition 12.2 (Part 3), it follows that ψ − ϕ(ω) = hπ, for some integer h. Then (12.21) implies:

g((−1)^h cos(ωt − ϕ(ω))) = w(t), ∀t.   (12.22)

Comparing (12.22) with (12.3) yields:

g((−1)^h cos(ωt − ϕ(ω))) = f(U|G(jω)| cos(ωt − ϕ(ω))),

which implies that, for all z ∈ [−1, +1]:
g(z) = f(U|G(jω)|z) if h is even,   (12.23)
g(z) = f⁻(U|G(jω)|z) if h is odd.   (12.24)
Hence, Proposition 12.3 holds in Case 1.

Case 2: the function f(.) is even. Using Proposition 12.2 (Part 4), it follows that
ψ − ϕ(ω) = kπ or ψ − ϕ(ω) = π/2 + kπ (for some k = 0, ±1, ±2, …).   (12.25)

Let us show, by contradiction, that the second solution in (12.25) cannot hold. To this end, assume that, for some integer k:

ψ − ϕ(ω) = π/2 + kπ.   (12.26)

It follows from (12.21) that, for all t,

g((−1)^k sin(ωt − ϕ(ω))) = w(t).   (12.27)
Comparing (12.27) with (12.3) yields:

g((−1)^k sin(ωt − ϕ(ω))) = f(U|G(jω)| cos(ωt − ϕ(ω))).

This can be given a more compact form by letting θ = ωt − ϕ(ω):

g((−1)^k sin(θ)) = f(U|G(jω)| cos(θ)), ∀θ.   (12.28)

Substituting θ + π for θ in (12.28) implies that, for all θ:

g(−(−1)^k sin(θ)) = f(−U|G(jω)| cos(θ)).   (12.29)
As f is even, it follows, comparing (12.28)-(12.29), that g is in turn even. Then, one gets from (12.28) that, for all θ :
f(U|G(jω)| cos(θ)) = g(√(1 − cos²(θ))).   (12.30)

Using the variable change z = √(1 − cos²(θ)), it follows from (12.30) that, ∀z ∈ [−1, +1]:

g(z) = f(U|G(jω)| √(1 − z²)).   (12.31)

Let us check that such a solution g(.) is not admissible in the sense of Assumption 12.1 (Part a). Indeed, one readily gets from (12.31) that:

g(0) = f(U|G(jω)|).   (12.32)
On the other hand, it follows from Assumption 12.1 (Part a) that:

f(x) = f(0) ⟹ x = 0.   (12.33)
Since |G(jω)| ≠ 0 (for all ω ∈ {ω₁, …, ω_m}), it follows from (12.32)-(12.33) that g(0) ≠ f(0). This clearly shows that g(.) does not satisfy Assumption 12.1 (Part a) and, so, is not admissible. Hence, the solution (12.26) must be discarded. Then, in view of (12.25), one necessarily has ψ − ϕ = kπ. The rest of the proof is similar to Case 1. The proof of Proposition 12.3 is complete.
12.3.2 Estimation of the Loci C_ψ(U, ω)

Proposition 12.3 is quite interesting as it shows that ϕ(ω) = −∠G(jω) can be recovered (modulo π) by just tuning the parameter ψ until the closed locus C_ψ(U, ω) becomes static. Furthermore, this proposition also says that the obtained static curve is precisely the graphical plot of either f(U|G(jω)|x) or f(−U|G(jω)|x). However, these results cannot be directly used because the plotting of the loci C_ψ(U, ω) necessitates the availability of the signal w(t), which is not accessible to measurement. This is presently coped with by making full use of the information at hand, namely the periodicity (with period 2π/ω) of both χ_ψ(t) and w(t), and the ergodicity of the noise v(t). Bearing these in mind, the relation y(t) = w(t) + v(t) suggests the following estimator:

ŵ(t, N) ≝ (1/N) Σ_{k=1}^{N} y(t + kT);  t ∈ [0, T),   (12.34)

where T = 2π/ω and N is a sufficiently large integer. Specifically, for a fixed time instant t, the quantity ŵ(t, N) is the mean value of the (measured) sequence {y(t + kT); k = 1, 2, …, N}. Then, an estimate Ĉ_{ψ,N}(U, ω) of C_ψ(U, ω) is simply obtained by substituting ŵ(t, N) for w(t) when constructing C_ψ(U, ω). Accordingly, Ĉ_{ψ,N}(U, ω) turns out to be the parametrised locus including all points (χ_ψ(t), ŵ(t, N)) (t ≥ 0), i.e.

Ĉ_{ψ,N}(U, ω) = {(χ_ψ(t), ŵ(t, N)); t ≥ 0}.   (12.35)
The above remarks lead to the following proposition:

Proposition 12.4. Consider the problem statement of Proposition 12.1. Then, one has:
1. ŵ(t, N) converges in probability to w(t) (as N → ∞).
2. Ĉ_{ψ,N}(U, ω) converges in probability to C_ψ(U, ω) (as N → ∞), i.e. one has, for all t ≥ 0:

lim_{N→∞} (χ_ψ(t), ŵ(t, N)) = (χ_ψ(t), w(t)) (w.p.1).   (12.36)

3. Consequently, if ψ = ϕ(ω) + hπ then lim_{N→∞} Ĉ_{ψ,N}(U, ω) is static (w.p.1).
4. Inversely, if lim_{N→∞} Ĉ_{ψ,N}(U, ω) is static for some ψ, then one of the following statements holds w.p.1:
a. ψ = ϕ(ω) + 2hπ (for some h = 0, ±1, ±2, ±3, …) and the mapping cos(ωt − ψ) → ŵ(t, N) coincides with the curve of the function f(U|G(jω)|x) with x ∈ [−1, +1];
b. ψ = ϕ(ω) + π + 2hπ (for some h = 0, ±1, ±2, ±3, …) and the mapping cos(ωt − ψ) → ŵ(t, N) coincides with the curve of the function f⁻(U|G(jω)|x) with x ∈ [−1, +1].

Proof. From (12.2) one gets that, for all t ∈ [0, T):

y(t) = w(t) + v(t).   (12.37)
Then, using the fact that w(t) is periodic with period T, it follows from (12.37) that, for all t ∈ [0, T) and all integers k, y(t + kT) = w(t) + v(t + kT), which in turn implies that:

ŵ(t, N) = (1/N) Σ_{k=1}^{N} y(t + kT) = w(t) + (1/N) Σ_{k=1}^{N} v(t + kT).   (12.38)

Since {v(t_k)}, where t_k = t + kT, is zero mean and ergodic, the last term on the right side vanishes w.p.1 as N → ∞. This proves Part 1 of the proposition. To prove Part 2, notice that, using the fact that both χ_ψ(t) and w(t) are periodic (with period T), one has (χ_ψ(t + kT), y(t + kT)) = (χ_ψ(t), w(t)) + (0, v(t + kT)). Averaging both sides (with respect to k) gives:

(χ_ψ(t), (1/N) Σ_{k=1}^{N} y(t + kT)) = (χ_ψ(t), w(t)) + (0, (1/N) Σ_{k=1}^{N} v(t + kT)).   (12.39)
Again, the last term on the right side vanishes w.p.1 as N → ∞, for the same reasons as above. This proves Part 2 of the proposition. Part 3 is a consequence of Part 2 and Proposition 12.1. Part 4 follows from Part 2 and Proposition 12.3.

Remark 12.1. In the light of Proposition 12.4 (Part 4), it is seen that, for sufficiently large N, all curves belonging to the family {Ĉ_{ϕ(ω_k),N}(U_k, ω_k); k = 1, …, m} are static and concentric, and any one of them is a more or less spread version of all the others. The same remark applies to the family {Ĉ_{ϕ(ω_k)+π,N}(U_k, ω_k); k = 1, …, m}. The notion of spread is illustrated by Figure 12.3 and analytically defined in Section 12.5.
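To make the averaging step concrete, here is a minimal Python sketch of (12.34)-(12.35) (the helper names are ours, not the chapter's), assuming a uniformly sampled record containing an integer number of periods, with P samples per period:

import numpy as np

def average_periods(y, P):
    # Estimate w(t) on one period by averaging the N recorded periods, cf. (12.34)
    N = len(y) // P
    return y[:N * P].reshape(N, P).mean(axis=0)  # w_hat(t_i, N), t_i in [0, T)

def locus(w_hat, omega, dt, psi):
    # Points (chi_psi(t_i), w_hat(t_i, N)) of the estimated closed locus, cf. (12.35)
    t = np.arange(len(w_hat)) * dt
    return np.cos(omega * t - psi), w_hat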
12.4 Wiener System Identification Method

12.4.1 Phase Estimation (PE)

Proposition 12.4 and Remark 12.1 suggest the following phase estimator:

PE1. For a fixed (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}, apply the input signal u(t) = U cos(ωt) to the nonlinear system of interest. Get a recording of the output y(t) over a sufficiently large interval of time 0 ≤ t ≤ NT.

PE2. Generate the continuous signals ŵ(t, N−1) and ŵ(t, N) using (12.34). Compute the L²[0, T] norms, i.e.:

‖ŵ(N)‖₂ = (∫₀ᵀ ŵ(t, N)² dt)^{1/2}

and

‖ŵ(N) − ŵ(N−1)‖₂ = (∫₀ᵀ (ŵ(t, N) − ŵ(t, N−1))² dt)^{1/2}.

Choose a threshold 0 < ε ≪ 1. If

‖ŵ(N) − ŵ(N−1)‖₂ / ‖ŵ(N)‖₂ < ε,

then go to Step PE3. Otherwise, increase the value of N and repeat Step PE2 from the beginning.

PE3. Plot the (closed) parametrised curve

Ĉ_{ψ,N}(U, ω) = {(cos(ωt − ψ), ŵ(t, N)), 0 ≤ t ≤ NT}

for different values of ψ ∈ [0, π].

PE4. Let Φ_s denote the set of values of ψ for which Ĉ_{ψ,N}(U, ω) is static. If Φ_s contains a single value (this happens when f(.) is non-even), then denote that value ψ* and let φ̂_N(ω) = ψ*. This estimate corresponds to either ϕ(ω) or ϕ(ω) + π. If Φ_s contains two values (this happens when f(.) is even), then denote ψ* the value of ψ for which Ĉ_{ψ,N}(U, ω) passes through the known point (0, f(0)), and let φ̂_N(ω) = ψ*.
PE5. Repeat Steps PE2 to PE4 with all couples (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}, applying the following correction when necessary, for any i = 2, …, m: if the curve Ĉ_{φ̂_N(ω_i),N}(U_i, ω_i) is not a (more or less) spread version of the previously obtained curves Ĉ_{φ̂_N(ω_j),N}(U_j, ω_j) (j = 1, …, i−1), then correct φ̂_N(ω_i) by adding π to it (i.e. φ̂_N(ω_i) = ψ* + π), where ψ* is as in PE4.

Remark 12.2. It is worth noting that the correction (by π) in Step PE5 is useful in the case of non-even nonlinearities f(.), to ensure that the phase estimator φ̂_N(ω) focuses either on ϕ(ω) = −∠G(jω) (for all ω ∈ {ω₁, …, ω_m}) or on ϕ(ω) + π (for all ω). It does not matter which case is actually being focused on. The main point is that the estimator must focus either on G(s) or on G⁻(s) (but not on both). In the case of even nonlinearities, the correction by π is useless because it is then structurally impossible to make the phase estimator φ̂_N(ω) focus either on ϕ(ω) = −∠G(jω) (for all ω) or on ϕ(ω) + π (for all ω). Indeed, from (12.3) one has w(t) = f(U|G(jω)| cos(ωt − ϕ(ω))), but this in turn implies that w(t) = f(U|G(jω)| cos(ωt − ϕ(ω) ± π)) when f(.) is even. Therefore, the two possibilities ϕ(ω) and ϕ(ω) ± π turn out to be indistinguishable from the system output. Hence, in the case of even nonlinearities, the phase estimator will be considered consistent if φ̂_N(ω) converges in probability to ϕ(ω) (modulo π) as N → ∞.

The above remark, together with the result of Proposition 12.4, guarantees the consistency of the estimator φ̂_N(ω). This is formalised in the following theorem:

Theorem 12.1. Consider the problem statement of Proposition 12.1. In the case of non-even nonlinearities f(.), the phase estimator φ̂_N(ω), defined by the procedure PE1-PE5, is consistent in the sense that one has w.p.1:

lim_{N→∞} φ̂_N(ω) = ϕ(ω) (for all ω)  or  lim_{N→∞} φ̂_N(ω) = ϕ(ω) + π (for all ω).

In the case of even nonlinearities, φ̂_N(ω) is consistent in the sense that lim_{N→∞} φ̂_N(ω) = ϕ(ω) (modulo π), w.p.1.
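The stopping test of Step PE2 is straightforward on sampled data. A minimal sketch (our own discretisation of the L²[0, T] norms by the trapezoidal rule; helper names are hypothetical):

import numpy as np

def w_hat(y, P, N):
    # Average the first N periods of y (P samples per period), cf. (12.34)
    return y[:N * P].reshape(N, P).mean(axis=0)

def periods_needed(y, P, dt, eps=1e-3):
    # Increase N until ||w_hat(N) - w_hat(N-1)||_2 / ||w_hat(N)||_2 < eps (Step PE2)
    N = 2
    while N * P <= len(y):
        w_prev, w_cur = w_hat(y, P, N - 1), w_hat(y, P, N)
        rel = np.sqrt(np.trapz((w_cur - w_prev) ** 2, dx=dt) / np.trapz(w_cur ** 2, dx=dt))
        if rel < eps:
            return N, w_cur
        N += 1
    return N - 1, w_hat(y, P, N - 1)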
12.4.2 Nonlinearity Estimation (NLE)

Let us consider the family of (static) curves Γ̂_N(U, ω) ≝ Ĉ_{φ̂_N(ω),N}(U, ω) constructed using the phase estimation procedure PE1-PE5. In the light of Proposition 12.4 and Theorem 12.1, it is clear that for any fixed couple (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}, the curve Γ̂_N(U, ω) converges in probability to (the graphical plot of) either f(U|G(jω)|x) or f⁻(U|G(jω)|x) (with −1 ≤ x ≤ 1). Therefore, a consistent estimate of either f(U_k|G(jω_k)|x) or f⁻(U_k|G(jω_k)|x) can be recovered from the curve
Γ̂_N(U, ω) (whatever (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}). The question is: what use can be made of an estimate of f(U_k|G(jω_k)|x) or f⁻(U_k|G(jω_k)|x) for a given (U_k, ω_k)? This question is answered by exploiting the model multiplicity pointed out in Section 12.2. Indeed, the above functions are nothing other than the nonlinearities of the particular couple of models:

M(U_k, ω_k) ≝ (G(s)/(U_k|G(jω_k)|), f(U_k|G(jω_k)|x)),
M⁻(U_k, ω_k) ≝ (G⁻(s)/(U_k|G(jω_k)|), f⁻(U_k|G(jω_k)|x)),

respectively. Then, for any k ∈ {1, …, m}, the curve Γ̂_N(U_k, ω_k) converges (in probability) to the nonlinearity of either M(U_k, ω_k) or M⁻(U_k, ω_k) (it does not matter which of these two models is actually the limit). That is, the search domain is thus reduced to the model family {M(U_k, ω_k), M⁻(U_k, ω_k); k = 1, …, m}. It remains to decide what particular element of this family one should focus on. To this end, notice that the functions f(U_k|G(jω_k)|x) and f(−U_k|G(jω_k)|x) are more or less spread versions of f(x) and f(−x), respectively. Specifically, f(U_k|G(jω_k)|x) is more spread than f(x) if U_k|G(jω_k)| < 1. Otherwise, f(U_k|G(jω_k)|x) is less spread than (and so is a concentrated version of) f(x). The same remark applies to f(−U_k|G(jω_k)|x) compared to f(−x). The spread notion is illustrated by Figure 12.3. Now, since the functions of interest, namely f(U_k|G(jω_k)|x) and f(−U_k|G(jω_k)|x), are all defined on the same interval, i.e. −1 ≤ x ≤ 1, it is judicious to focus on the couple of models that involves the least spread (or, equivalently, the most concentrated) nonlinearity. Let (M(U̲, ω̲), M⁻(U̲, ω̲)) denote such a couple, where (U̲, ω̲) ∈ {(U₁, ω₁), …, (U_m, ω_m)}. For convenience, the corresponding elements are respectively denoted (Ḡ(s), f̄(x)) and (Ḡ⁻(s), f̄⁻(x)), i.e.

Ḡ(s) ≝ G(s)/(U̲|G(jω̲)|),  Ḡ⁻(s) ≝ −Ḡ(s),   (12.40)
f̄(x) ≝ f(U̲|G(jω̲)|x),  f̄⁻(x) ≝ f̄(−x);  (x ∈ [−1, +1]).   (12.41)

Focusing on (M(U̲, ω̲), M⁻(U̲, ω̲)) will prove to be most convenient when it comes to estimating the gain modulus |Ḡ(jω)| (ω ∈ {ω₁, …, ω_m}). On the other hand, as f̄(x) = f(U̲|G(jω̲)|x) is the least spread function of the family {f(U_k|G(jω_k)|x); k = 1, …, m}, one necessarily has:

U_k|G(jω_k)| ≤ U̲|G(jω̲)| (k = 1, …, m).   (12.42)

Finally, recall that ∠Ḡ(jω) = ∠G(jω) and ∠Ḡ⁻(jω) = ∠G⁻(jω), for all ω. That is, the estimator defined by PE1-PE5 is still convenient in the sense that φ̂_N(ω) converges in probability to ϕ(ω) ≝ −∠G(jω) for all ω ∈ {ω₁, …, ω_m} or it
converges to ϕ(ω) + π for all ω. Taking these remarks into account, we propose the following algorithm to get a nonlinearity estimate f̂_N(.):

NLE1. Consider the family of static curves Γ̂_N(U, ω) ≝ Ĉ_{φ̂_N(ω),N}(U, ω), constructed in the phase estimation procedure PE1-PE5. Select the most concentrated (i.e. least spread) curve and let (U̲, ω̲) denote the corresponding amplitude and frequency couple.

NLE2. Let f̂_N(x) (with x ∈ [−1, +1]) be the function induced by the static curve Γ̂_N(U̲, ω̲).
Theorem 12.2. Consider the problem statement of Proposition 12.1. The nonlinearity estimator NLE1-NLE2 is consistent in the sense that one has w.p.1:

a. if lim_{N→∞} φ̂_N(ω̲) = ϕ(ω̲), then lim_{N→∞} f̂_N(x) = f̄(x), for all x ∈ [−1, +1];
b. if lim_{N→∞} φ̂_N(ω̲) = ϕ(ω̲) + π, then lim_{N→∞} f̂_N(x) = f̄⁻(x), for all x ∈ [−1, +1].
Remark 12.3.
1. In case this may be of some interest, a polynomial representation can, a posteriori, be given to the nonlinearity f̂_N(x) by interpolating a number of points selected on the curve Γ̂_N(U̲, ω̲). However, the polynomial approximation thus obtained is not a consistent estimate of f̂_N(x), except at the selected points.
2. In [1], the model rescaling (12.40)-(12.41) has also been used to cope with the estimation of the nonlinearity. However, there, it was suggested that ω̲ may be chosen arbitrarily. Doing so, the gains |G(jω_k)| cannot be consistently estimated, as explained in the following subsection.
12.4.3 Frequency Gain Modulus Estimation

The phase estimates φ̂_N(ω) (ω = ω₁, …, ω_m) and the nonlinearity estimate f̂_N obtained previously will now be used to get estimates |Ĝ(jω)| of the gains |Ḡ(jω)|
Fig. 12.3: Two functions and their spread versions
for ω = ω₁, …, ω_m. To keep the forthcoming presentation simple, it is temporarily assumed that f̂_N = f̄, φ̂_N(ω) = ϕ(ω) = −∠Ḡ(jω) (for ω = ω₁, …, ω_m) and v(t) = 0 (no noise). In the procedure PE1-PE5, the Wiener system has been excited by the family of signals u(t) = U cos(ωt) ((U, ω) = (U₁, ω₁), …, (U_m, ω_m)). Using the model (Ḡ(s), f̄(x)), the Wiener system signals can be expressed as follows:

y(t) = w(t) = f̄(|Ḡ(jω)|U cos(ωt − ϕ(ω))) with ϕ(ω) = −∠Ḡ(jω).   (12.43)

On the other hand, it readily follows from (12.40) and (12.42) that:

U|Ḡ(jω)| ≤ 1 for all (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}.   (12.44)
If f̄ were invertible, then the internal signal x(t) = f̄⁻¹(y(t)) = |Ḡ(jω)|U cos(ωt − ϕ(ω)) could be made available and, so, |Ḡ(jω)| would be easily recovered. The situation becomes much more problematic when f̄ is not invertible. Nevertheless, the gains |Ḡ(jω)| can still be uniquely determined if the system nonlinearity satisfies, in addition to Assumption 12.1, the following assumption:

Assumption 12.2. If the system nonlinearity f is even, then there exist 0 ≤ σ₀ < σ₁ ≤ 1 such that, for all (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}, all x ∈ [σ₀U|G(jω)|, σ₁U|G(jω)|], and all z ∈ [0, U|G(jω)|], one has: f(x) = f(z) ⇒ x = z.

Remark 12.4. It readily follows from Assumptions 12.1-12.2 that:
1. If f is even, then it is invertible at x = 0 and f(0) is known (Assumption 12.1, Part a). Furthermore, one gets from Assumption 12.2 that, for all (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}, the branch of f corresponding to the (unilateral) interval [0, U|G(jω)|] is locally invertible on the subinterval [σ₀U|G(jω)|, σ₁U|G(jω)|]. For instance, the function f(x) = x² satisfies both Assumption 12.1 (Part a) and Assumption 12.2; the numbers (σ₀, σ₁) are then arbitrary.
2. It has already been noticed that f̄ is a concentrated version of f. Therefore, f̄ also satisfies Assumptions 12.1-12.2. More precisely,
a. in the case where f̄ is even, replace in Assumption 12.2 the intervals [−U|G(jω)|, U|G(jω)|] and [0, U|G(jω)|] by [−U|Ḡ(jω)|, U|Ḡ(jω)|] and [0, U|Ḡ(jω)|], respectively;
b. in the case where f̄ is not even, replace in Assumption 12.1 (Part b) the intervals [−U|G(jω)|, U|G(jω)|] and [σ₀U|G(jω)|, σ₁U|G(jω)|] by [−U|Ḡ(jω)|, U|Ḡ(jω)|] and [σ₀U|Ḡ(jω)|, σ₁U|Ḡ(jω)|], respectively.

Now, it will be shown that, under Assumptions 12.1-12.2, the gains |Ḡ(jω)| (ω = ω₁, …, ω_m) can be exactly estimated using the following predictive estimator:
|Ĝ(jω)| = argmin_{0 ≤ α ≤ 1/U} J_{U,ω}(α) ((U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}),   (12.45)

J_{U,ω}(α) ≝ ∫₀ᵀ |w(t) − f̄(αU cos(ωt − ϕ(ω)))| dt (T = 2π/ω).   (12.46)
Proposition 12.5. Consider the optimisation problem defined by (12.45) and (12.46) in the presence of the constraint (12.43), where f̄ satisfies Assumptions 12.1-12.2 (as made precise in Remark 12.4, Part 2). Then, for all (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}, J_{U,ω}(α) has a unique global minimum at α = |Ḡ(jω)|, so that |Ĝ(jω)| = |Ḡ(jω)|.

Proof. It is readily seen from (12.46) that, for all (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}, J_{U,ω}(|Ḡ(jω)|) = 0, which means that |Ḡ(jω)| is a global minimum. Let us show that such a minimum is unique. To this end, let α denote any real such that J_{U,ω}(α) = 0. It follows from (12.43) and (12.46) that, for almost all² t ∈ [0, T]:

f̄(|Ḡ(jω)|U cos(ωt − ϕ(ω))) = f̄(αU cos(ωt − ϕ(ω))).   (12.47)

On the other hand, it is obvious that, for all (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)} and all t ∈ [0, T] such that σ₀ < cos(ωt − ϕ(ω)) < σ₁, one has:
σ₀U|G(jω)| < |G(jω)|U cos(ωt − ϕ(ω)) < σ₁U|G(jω)|.   (12.48)

Similarly, for all t ∈ [0, T] such that σ₀ < cos(ωt − ϕ(ω)) < σ₁, one has:

σ₀U|Ḡ(jω)| < |Ḡ(jω)|U cos(ωt − ϕ(ω)) < σ₁U|Ḡ(jω)|.   (12.49)
Now, let us distinguish between the two cases referred to in Assumptions 12.1-12.2.

Case 1: f̄ is even. Using Assumption 12.2 and Remark 12.4 (Part 2a), it follows from (12.47) that |Ḡ(jω)|U cos(ωt − ϕ(ω)) = αU cos(ωt − ϕ(ω)), for all t such that (12.49) holds. This readily gives α = |Ḡ(jω)|.

² This means that (12.47) holds for all t ∈ [0, T] except on a subset of measure zero (in the Lebesgue sense).
Case 2: f̄ is not even. Using Assumption 12.1 (Part b) and Remark 12.4 (Part 2b), it follows from (12.47) that |Ḡ(jω)|U cos(ωt − ϕ(ω)) = αU cos(ωt − ϕ(ω)), for all t such that (12.48) holds. This implies that α = |Ḡ(jω)|. Hence, the uniqueness of the global minimum of J_{U,ω}(.) is proved in all cases.

Remark 12.5.
1. A crucial feature that makes the optimisation problem (12.45)-(12.46) well posed is that the function f̄ is defined for all possible values of its argument, i.e. |Ḡ(jω)|U cos(ωt − ϕ(ω)) with t ∈ [0, T] and (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}. Indeed, (12.44) ensures that the above argument belongs to [−1, 1], which actually is the interval of definition of f̄ (by (12.41)). Accordingly, the minimum search in (12.45) is limited to the interval 0 ≤ α ≤ 1/U.
2. It is important to note that the above well-posedness of the optimisation problem (12.45)-(12.46) is a direct consequence of our choice in Subsection 12.4.2 to focus on the particular models (Ḡ(s), f̄(x)) and (Ḡ⁻(s), f̄⁻(x)), whose functions f̄(x) ≝ f(U̲|G(jω̲)|x) and f̄⁻(x) ≝ f(−U̲|G(jω̲)|x) are the most concentrated (least spread) on [−1, 1] (see (12.40)-(12.42)).
3. The proposed predictive estimator is largely inspired from [1]. But the spread requirement (applied when selecting the models (Ḡ, f̄) and (Ḡ⁻, f̄⁻)) has been introduced in [6]. Without such a spread requirement, there would be no guarantee that all arguments of interest, i.e. |Ḡ(jω)|U cos(ωt − ϕ(ω)), would belong to the definition interval [−1, 1] and, consequently, the minimum search interval in (12.45)-(12.46) would not necessarily include the global minimum, |Ḡ(jω)|, for all (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}.

Proposition 12.5 is now used to build up a consistent estimator of the gains |Ḡ(jω)| in the presence of a not necessarily null noise v(t). Then, equation (12.43) becomes:

y(t) = w(t) + v(t),  w(t) = f̄(|Ḡ(jω)|U cos(ωt − ϕ(ω))).   (12.50)

Given the estimates f̂_N and φ̂_N(ω) (ω = ω₁, …, ω_m) of the nonlinearity and phase, equations (12.45)-(12.46) suggest the following predictive-type gain estimator:
|Ĝ_N(jω)| = argmin_{0 ≤ α ≤ 1/U} J_{U,ω,N}(α), for any (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)},   (12.51)

J_{U,ω,N}(α) ≝ ∫₀ᵀ |ŵ(t, N) − f̂_N(αU cos(ωt − φ̂_N(ω)))| dt,   (12.52)

where ŵ(t, N) is defined by (12.34), i.e.

ŵ(t, N) ≝ (1/N) Σ_{i=1}^{N} y(t + iT).   (12.53)
Theorem 12.3. Consider the Wiener system described by (12.1)-(12.2) and suppose that the involved nonlinearity satisfies Assumptions 12.1-12.2. Let the system be excited by the same input signals as in procedure PE1-PE5, i.e. u(t) = U cos(ωt) with (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}. The system can also be modelled by (Ḡ, f̄) defined by (12.40)-(12.41), involving a nonlinearity f̄ that in turn satisfies Assumptions 12.1-12.2 (as made precise in Remark 12.4 (Part 2)). Then, the gain estimator (12.51)-(12.53) is consistent, i.e. |Ĝ_N(jω)| converges in probability to |Ḡ(jω)| as N → ∞.

Proof. In view of Proposition 12.5, it is sufficient to show that J_{U,ω,N}(α) converges in probability to J_{U,ω}(α) as N → ∞. To this end, note that by Proposition 12.4 (Part 1), ŵ(t, N) converges in probability to w(t), for all t ∈ [0, T] (as N → ∞). Furthermore, it was shown in Theorems 12.1 and 12.2 that one of the following statements holds w.p.1 as N → ∞:

i. φ̂_N(ω) converges to ϕ(ω) ≝ −∠Ḡ(jω) (for all ω ∈ {ω₁, …, ω_m}) and f̂_N converges to f̄;
ii. φ̂_N(ω) converges to ϕ(ω) + π (for all ω ∈ {ω₁, …, ω_m}) and f̂_N converges to f̄⁻.

In the light of these remarks, it is readily seen by comparing (12.37) and (12.52) that J_{U,ω,N}(α) actually converges in probability to J_{U,ω}(α) as N → ∞.

Remark 12.6. Since (12.52) has a unique minimum and the search interval is known, the minimum can be found graphically by plotting J_{U,ω,N}(α) against α ∈ [0, 1/U].
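Since the search is one-dimensional over a known interval, a simple grid search over α is enough in practice. A minimal sketch (helper names ours; f_hat and phi_hat stand for the estimates delivered by NLE1-NLE2 and PE1-PE5):

import numpy as np

def gain_estimate(w_hat, f_hat, phi_hat, U, omega, dt, n_grid=500):
    # Grid search minimising J_{U,omega,N}(alpha) of (12.52) over [0, 1/U], cf. (12.51)
    t = np.arange(len(w_hat)) * dt
    alphas = np.linspace(0.0, 1.0 / U, n_grid)
    J = [np.sum(np.abs(w_hat - f_hat(a * U * np.cos(omega * t - phi_hat)))) * dt
         for a in alphas]  # Riemann-sum approximation of the integral in (12.52)
    return alphas[int(np.argmin(J))]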
12.4.4 Simulation Results

12.4.4.1 Identification in the Presence of an Even Nonlinearity

The identification method described previously is now illustrated considering a Wiener system characterised by:

G(s) = 15(−1 + s) / ((3 + s)(2 + s)(4 + s)),  f(x) = |x|.
As G(s) is nonminimum phase, it involves a large phase variation (indeed, the phase goes from 0° to 360°). The nonlinearity f(.) is an even function, but it satisfies Assumption 12.1, i.e. f⁻¹(f(0)) = 0. According to the proposed identification method, the system is excited by 6 sinusoids of the form u(t) = U_k cos(ω_k t) (k = 1, …, 6), where the couples (U_k, ω_k) are given the values of Table 12.1. The system is disturbed by a zero-mean noise v(t) that is uniformly randomly distributed in the interval [−0.2, 0.2]. Figure 12.4 shows the (steady-state) output signal y(t) obtained with ω = ω₁ = 0.1π rad/s. The phase estimator PE1-PE5 is first applied to get phase estimates φ̂_N(ω_k). The way this estimator operates is presently illustrated for ω = ω₁ = 0.1π rad/s. First, the average output ŵ(t, N) is generated according to (12.34). Figure 12.4 shows the resulting ŵ(t, N) for N = 10 and N = 100.
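This simulated experiment is easy to reproduce. The sketch below (the discretisation choices, P = 200 samples per period and five discarded transient periods, are ours) generates y(t) for (U₁, ω₁) and forms ŵ(t, N) with scipy:

import numpy as np
from scipy.signal import lsim

# Example system: G(s) = 15(s - 1)/((s + 2)(s + 3)(s + 4)), f(x) = |x|
num, den = [15.0, -15.0], np.polymul([1, 2], np.polymul([1, 3], [1, 4]))
U, omega, N, P = 2.0, 0.1 * np.pi, 100, 200
T = 2 * np.pi / omega
t = np.arange((N + 5) * P) * (T / P)                  # 5 extra periods for the transient
u = U * np.cos(omega * t)
_, x, _ = lsim((num, den), u, t)                      # internal signal x(t)
y = np.abs(x) + np.random.uniform(-0.2, 0.2, t.size)  # y = f(x) + uniform noise on [-0.2, 0.2]
w_hat = y[5 * P:].reshape(N, P).mean(axis=0)          # averaged output, eq. (12.34)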
Fig. 12.4: True output y(t) (top) and averaged output ŵ(t, N) obtained with N = 10 (middle) and N = 100 (bottom)
Then, ŵ(t, N) is used to construct the parametrised closed locus Ĉ_{ψ,N}(U₁, ω₁) = {(cos(ω₁t − ψ), ŵ(t, N)); 0 ≤ t ≤ NT} (T = 2π/ω₁) for different values of ψ ∈ [0, π) and N = 100. The curves corresponding to four values of ψ are shown in Figure 12.5. It is seen that a static curve is obtained for ψ = 0.637 rad and ψ = 2.207 rad. However, the static locus corresponding to ψ = 2.207 rad does not pass through the known point (0, f(0)) = (0, 0). Therefore, one has φ̂_N(ω₁) = ψ* = 0.637 rad which, in fact, is nothing other than the value of ϕ(ω₁) = −∠G(jω₁). The phase estimates φ̂_N(ω_k) obtained for the different frequencies ω_k (k = 1, …, 6) are given in Table 12.1. Notice that they all correspond to ϕ(ω_k) = −∠G(jω_k).

The static curves Γ̂_N(U_k, ω_k) obtained for the different ω_k's are plotted in Figure 12.6. A rapid inspection of these curves shows that the least spread (most concentrated) is the one corresponding to ω̲ = ω₃ = π rad/s. Following Subsection 12.4.2, the focus will be made on the couple of models (Ḡ(s), f̄(x)) and (Ḡ⁻(s), f̄⁻(x)), defined by (12.40)-(12.41), which, presently, are characterised by:

Ḡ(s) = G(s)/(U̲|G(jω̲)|) = 3.896(−1 + s)/((3 + s)(2 + s)(4 + s)),  f̄(x) = abs(3.85x),

using the fact that U̲|G(jω̲)| = 6 × 0.6417 = 3.850 and (Ḡ⁻(s), f̄⁻(x)) = (−Ḡ(s), f̄(−x)). According to the estimation procedure NLE1-NLE2, the nonlinearity estimate f̂_N is defined by the particular static curve Γ̂_N(U̲, ω̲). It is easily checked that, for this example, f̂_N coincides with f̄. The compatibility between the phase
Fig. 12.5: Closed locus {(cos(ω₁t − ψ), ŵ(t, N)); 0 ≤ t ≤ NT} for different values of ψ
Fig. 12.6: Curves Γ̂_N(U_k, ω_k) obtained for ω_k (k = 1, …, 6)
estimates, on the one hand, and the nonlinearity estimate, on the other hand, is thus guaranteed, just as predicted by Theorem 12.2. Recall that, in practical situations, it does not really matter which model, (Ḡ(s), f̄(x)) or (Ḡ⁻(s), f̄⁻(x)), is actually being identified. Given the estimates φ̂_N(ω_k) and f̂_N, the estimator (12.51)-(12.53) is resorted to in order to get estimates |Ĝ(jω_k)| of the frequency gain moduli |Ḡ(jω_k)|. This is illustrated by Figure 12.7, which shows the cost function J_{U₁,ω₁,N}(α) plotted against α ∈ [0, 1/U₁]
Table 12.1: Simulation results in the presence of an even nonlinearity
k   U_k   ω_k (rad/s)   −∠G(jω_k) (rad)   φ̂_N(ω_k) (rad)   |Ḡ(jω_k)|   |Ĝ(jω_k)|
1   2     0.1π          0.643             0.637             0.178       0.177
2   2     0.5π          2.526             2.535             0.209       0.209
3   6     π             3.741             3.752             0.167       0.167
4   6     1.5π          4.402             4.411             0.113       0.1125
5   10    2π            4.804             4.812             0.077       0.077
6   10    3π            5.258             5.266             0.040       0.040
Fig. 12.7: Plot of J_{U,ω,N}(α) for ω₁ = 0.1π rad/s
for ω₁ = 0.1π rad/s, U₁ = 2. It is seen that the global minimum is achieved for α = |Ĝ(jω₁)| = 0.1775, which is very close to the true gain value. Table 12.1 gives the gain estimates thus obtained for the different frequency couples (U_k, ω_k) and shows that the quality of estimation is quite satisfactory. Hence, the good quality of the whole identification method is confirmed, despite the presence of noise (see Figure 12.4).

12.4.4.2 Identification in the Presence of a Non-even Nonlinearity
In this subsection, we only focus on phase estimation for a specific Wiener system involving a non-even nonlinearity satisfying Assumption 12.1. In addition, this nonlinearity is non-smooth as it involves a preload. Specifically, the considered Wiener system is characterised by:

G(s) = 10/(s + 1)⁴ and f(x) = sign(x)(1 + |x|).
The system output is disturbed by a zero-mean noise v(t) that is uniformly randomly distributed in [−0.5, 0.5]. First, the phase estimator PE1-PE5 is applied to get an estimate of the phase ϕ(ω) ≝ −∠G(jω) for ω = 0.3π rad/s. To this end, the system is excited by the signal u(t) = U cos(ωt) with U = 3. Figure 12.8 shows the closed locus Ĉ_{ψ,N}(U, ω) obtained with different values of ψ and N = 200. According to PE1-PE5, the phase estimate φ̂_N(ω) is the value of ψ that leads to a static curve. In this case, there is only one possible value in [0, π) for φ̂_N(ω), i.e. ψ* = 3.028 rad (Figure 12.8).
Fig. 12.8: The curves Ĉ_{ψ,N}(U, ω) obtained for several values of ψ
The phase estimates φ̂_N(ω_k) (k = 1, …, 6), obtained by applying the phase estimator PE1-PE5, are given in Table 12.2. The obtained values confirm the consistency of the proposed phase estimator. Figure 12.9 shows the set of static curves Γ̂_N(U_k, ω_k) (k = 1, …, 6). It is easily checked that each curve is a more or less spread version of the others, confirming Step PE5 of the phase estimator. According to NLE1-NLE2, the least spread curve is retained as the nonlinearity estimate.
Table 12.2: Simulation results in the presence of a non-even nonlinearity

k   U_k   ω_k (rad/s)   −∠G(jω_k) (rad)   φ̂_N(ω_k) (rad)
1   3     0.3π          3.023             3.028
2   3     0.5π          4.015             4.027
3   6     0.7π          4.576             4.585
4   6     0.9π          4.923             4.934
5   8     1.2π          5.246             5.257
6   8     1.5π          5.446             5.458
Fig. 12.9: The curves Γ̂_N(U_k, ω_k) = Ĉ_{φ̂_N(ω_k),N} obtained for the frequencies ω_k (k = 1, …, 6)
12.5 Further Expressions

12.5.1 Geometric Area

Let A(U, ω, ψ, N) denote the geometric area of the curve Ĉ_{ψ,N}(U, ω). It follows, using Proposition 12.2 and equation (12.14), that:

A(U, ω, ψ, N) = ∫₀ᵀ |ŵ(T − t + 2ψ/ω, N) − ŵ(t, N)| |dχ_ψ/dt| dt
             = (2π/T) ∫₀ᵀ |ŵ(T − t + 2ψ/ω, N) − ŵ(t, N)| |sin(ωt − ψ)| dt.   (12.54)
This provides us with an analytical tool for the static feature, i.e. Ĉ_{ψ,N} is static if A(U, ω, ψ, N) = 0. Now, using Proposition 12.4, it follows that if A(U, ω, ψ, N) = 0 (for a sufficiently large value of N), then one has either ψ = ϕ(ω) or ψ = ϕ(ω) + π. In the first case, the mapping cos(ωt − ψ) → ŵ(t, N) gives an estimate f̂_N of f̄; otherwise, the mapping corresponds to f̄⁻. The smaller A(U, ω, ψ, N) is, the better the estimation quality. Procedure PE1-PE5 can then be reformulated as a problem of optimising the nonlinear function A(ψ, N). As the problem is one-dimensional and the search domain is well defined, i.e. [0, π], the minimum can be determined graphically. To illustrate this analytical version of PE1-PE5, consider the simulation conditions of Subsection 12.4.4.1. Let us make the illustration with ω = 0.1π rad/s, U = 2 and N = 100. The corresponding function A(U, ω, ψ, N) is plotted versus ψ in Figure 12.10. It is seen that A(U, ω, ψ, N) actually has two possible global minima in the interval [0, π), namely ψ₁* = 0.645 rad and ψ₂* = 2.222 rad. Note that ψ₂* ≈ ψ₁* + π/2; furthermore, the admissible value ψ₁* = 0.645 rad is very close to the phase estimate obtained in Subsection 12.4.4.1.
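On sampled data, the area (12.54) amounts to pairing each instant t with its mirror instant T − t + 2ψ/ω. A minimal sketch (assuming P uniformly spaced samples of ŵ over one period; the interpolation choice is ours):

import numpy as np

def geometric_area(w_hat, omega, psi):
    # Discrete approximation of A(U, omega, psi, N), cf. (12.54)
    P = len(w_hat)
    T = 2 * np.pi / omega
    t = np.arange(P) * (T / P)
    t_mirror = (T - t + 2 * psi / omega) % T             # mirror instants folded into [0, T)
    w_mirror = np.interp(t_mirror, t, w_hat, period=T)   # w_hat at the mirror instants
    integrand = np.abs(w_mirror - w_hat) * np.abs(np.sin(omega * t - psi))
    return (2 * np.pi / T) * np.trapz(integrand, t)

# Phase search: evaluate geometric_area on a grid psi_grid = np.linspace(0, np.pi, 300)
# and retain the minimising psi.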
Fig. 12.10: Plot of A(U, ω , ψ , N) vs ψ , for 0 ≤ ψ ≤ π for the Wiener system of Subsection 12.4.4.1
Fig. 12.11: Plot of A(U, ω , ψ , N) vs ψ , for 0 ≤ ψ ≤ π for the Wiener system of Subsection 12.4.4.2
Now, let us consider the Wiener system of Subsection 12.4.4.2, which involves a non-even nonlinearity. We aim at estimating the phase for ω = 0.3π rad/s. The corresponding function A(U, ω, ψ, N) is represented (versus ψ) in Figure 12.11 for U = 3 and N = 200. It is seen that A(U, ω, ψ, N) actually has a global minimum in the interval [0, π). This is achieved at ψ* = 3.028 rad, which is very close to the phase estimate obtained in Subsection 12.4.4.2.
12.5.2 Signal Spread

Let us consider the family of (static) curves Γ̂_N(U, ω) ≝ Ĉ_{φ̂_N(ω),N}(U, ω), (U, ω) ∈ {(U₁, ω₁), …, (U_m, ω_m)}, constructed in PE1-PE5. Let g_k(x) (−1 ≤ x ≤ 1) denote the function induced by the static curve Γ̂_N(U_k, ω_k). It was noticed in Subsection 12.4.2 that each function g_k(x) is a more or less spread version of the function f or f⁻. According to the estimation procedure NLE1-NLE2, the estimate of the nonlinearity
is the particular function g_k(x) that presents the smallest degree of spread. Doing so, one recovers the largest part of the system nonlinearity (f or f⁻). In the simulation of Subsection 12.4.4.1, this was easily recognised using the plots of Γ̂_N(U_k, ω_k) (Figure 12.6). The spread degree can also be evaluated analytically using the following measure [3]:
S_k = (∫_{−1}^{+1} x² |g_k(x)|² dx) / (∫_{−1}^{+1} |g_k(x)|² dx).   (12.55)

The larger S_k is, the more spread the function g_k(x). According to the procedure NLE1-NLE2, the best estimate of the system nonlinearity is the function g_k(x) with the smallest value of S_k.
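A discrete version of (12.55) is immediate once each static curve has been resampled on a common grid; a brief sketch:

import numpy as np

def spread(g_vals, x):
    # Spread measure S_k of eq. (12.55) for a curve sampled as g_vals = g_k(x)
    return np.trapz(x**2 * np.abs(g_vals)**2, x) / np.trapz(np.abs(g_vals)**2, x)

# NLE1 selection: with curves g[0..m-1] sampled on x = np.linspace(-1, 1, 401),
# the retained estimate is g[k_best], k_best = min(range(m), key=lambda k: spread(g[k], x))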
12.6 Conclusion

In this chapter, a frequency-domain solution has been developed to deal with Wiener system identification. Unlike most available solutions, the system nonlinearity is presently not required to be (globally) invertible and smooth. The main component of the proposed identification method is the consistent phase estimator PE1-PE5 described in Subsection 12.4.1. The design of this estimator essentially relies on the analytic geometry investigation of Section 12.3. The main outcome of that investigation is that the Lissajous-like curves C_ψ(U, ω) (0 < ψ < π) are all non-static except for ψ = ϕ + kπ. Inversely, when C_ψ is static, it may correspond to f(.), to f⁻(.), or to (more or less) spread versions of these. The focus has been made on the couple of models (M(U̲, ω̲), M⁻(U̲, ω̲)) that involves the least spread nonlinearities. This choice has proved to be most judicious in regard to gain estimation. The fact that the nonlinearity estimation is coupled with the phase estimation guarantees the compatibility of the corresponding estimates. The above compatibility and spread facts are crucial features in the presented frequency identification method.
References

1. Bai, E.W.: Frequency domain identification of Wiener models. Automatica 39, 1521–1530 (2003)
2. Bruls, J., Chou, C.T., Heverkamp, B.R.J., Verhaegen, M.: Linear and nonlinear system identification using separable least squares. European Journal of Control 5, 116–128 (1999)
3. Carbon, M., Ghorbanzadeh, D.: Mathematical Elements for Signals. Dunod, Paris (1997). ISBN 2100034642
4. Chen, H.F.: Recursive identification for Wiener model with discontinuous piece-wise linear function. IEEE Transactions on Automatic Control 51, 390–400 (2006)
5. Gardiner, A.: Frequency domain identification of nonlinear systems. In: IFAC Symposium on System Identification and Estimation, Rotterdam, The Netherlands, pp. 831–834 (1993)
6. Giri, F., Rochdi, Y., Chaoui, F.Z.: An analytic geometry approach to Wiener system frequency identification. IEEE Transactions on Automatic Control 54(4), 683–696 (2009)
7. Greblicki, W.: Nonparametric identification of Wiener systems. IEEE Transactions on Automatic Control 51, 390–400 (1992)
8. Greblicki, W.: Nonparametric approach to Wiener system identification. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 44, 538–545 (1997)
9. Greblicki, W.: Recursive identification of Wiener system. Applied Mathematics and Computer Science 11, 977–991 (2001)
10. Greblicki, W.: Nonlinearity recovering in Wiener system driven with correlated signal. IEEE Transactions on Automatic Control 49, 1805–1810 (2004)
11. Hagenblad, A., Ljung, L., Wills, A.: Maximum likelihood identification of Wiener models. Automatica 44, 2697–2705 (2008)
12. Hu, X.L., Chen, H.F.: Strong consistence of recursive identification for Wiener systems. Automatica 41, 1905–1916 (2005)
13. Hunter, I.W., Korenberg, M.J.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics 55, 135–144 (1986)
14. Nordsjö, A.E., Zetterberg, L.H.: Identification of certain time-varying nonlinear Wiener and Hammerstein systems. IEEE Transactions on Signal Processing 49, 577–592 (2001)
15. Pajunen, G.A.: Adaptive control of Wiener type nonlinear systems. Automatica 28, 781–785 (1992)
16. Pawlak, M., Hasiewicz, Z., Wachel, P.: On nonparametric identification of Wiener systems. IEEE Transactions on Signal Processing 55(2), 482–492 (2007)
17. Vörös, J.: Parameter identification of Wiener systems with discontinuous nonlinearities. Systems and Control Letters 44, 363–372 (2001)
18. Weisstein, E.W.: Lissajous curve. In: MathWorld, a Wolfram Web Resource (2006), http://mathworld.wolfram.com/LissajousCurve.html
19. Westwick, D.T., Kearney, R.E.: A new algorithm for the identification of multiple-input Wiener systems. Biological Cybernetics 68, 75–85 (1992)
20. Wigren, T.: Recursive prediction error identification using the nonlinear Wiener model. Automatica 29, 1011–1025 (1993)
Chapter 13
Identification of Wiener–Hammerstein Systems Using the Best Linear Approximation

Lieve Lauwers and Johan Schoukens
13.1 Introduction

Nonlinear block-oriented models have successfully been used in many engineering applications to identify chemical, mechanical, and biological systems or processes. Due to their simple structure, these models are very attractive from a user's point of view. However, the block-oriented approach also has some disadvantages compared with black-box modelling approaches.
13.1.1 Block-oriented versus Black-box Models

As an illustration, a brief comparison is made here between the block-oriented approach and the state space approach (see Table 13.1). Block-structured models give the most physical insight to the user and usually require a lower number of parameters than the state space approach. On the other hand, the block-oriented approach is not as flexible as the state space approach. Block-oriented models are not compatible with multiple input, multiple output (MIMO) systems and they require prior knowledge about the structure of the device, which can be hard to obtain. Furthermore, for some block-structured models (e.g., the Wiener–Hammerstein model) it might be difficult to generate good initial values to start the (nonlinear) optimisation procedure.
Table 13.1: Comparison between the block-oriented approach and the state space approach

                           Block-oriented   State space
Physical interpretation    ⊕                −
Number of parameters       ⊕                −
Flexibility of the model   −                ⊕
Model initialisation       −                ⊕
13.1.2 Identification Issues

In the following, two specific identification problems of the block-oriented approach are discussed: the selection of a model structure on the one hand, and the search for starting values for the dynamic model parameters on the other hand.

13.1.2.1 How to Select a Model Structure?
A main disadvantage of the block-oriented approach is the difficulty of selecting a suitable model structure that can grasp both the linear dynamic and the nonlinear behaviour of the system under test. For instance, can a simple open loop structure describe the system, or is a nonlinear feedback loop necessary? Is a system with a single branch sufficient, or is a parallel structure with multiple nonlinear branches needed? When no prior knowledge is available, it might be hard for the user to know whether the selected block structure is appropriate for modelling the system. Therefore, a method is required that gives the user some guidance in her/his choice of nonlinear block structures. In Section 13.3, an alternative approach to discriminate between various subclasses of block-oriented model structures is presented. This block structure identification method is based on the behaviour of the Frequency Response Function (FRF) of the nonlinear system (i.e., the best linear approximation [11], [2]) as a function of the power spectrum of the input signal.

13.1.2.2 How to Initialise the Model?
The most difficult step in the identification of block-oriented models is the generation of good starting values for the different dynamic building blocks. For some block structures, like the Wiener–Hammerstein model, initial values are not always easy and straightforward to obtain. Unfortunately, in the literature, few papers give hints on how to obtain good starting values. In Section 13.4, a new method is proposed to initialise the different building blocks of a Wiener–Hammerstein model using the pole/zero information captured in the best linear approximation of the nonlinear system under test.

The rest of the chapter is organised as follows: the major aspects of block-oriented nonlinear systems, briefly introduced above, will be further discussed in full detail in the remaining parts of the text. Next, the best linear approximation
is defined and its important properties are highlighted. Then, the best linear approximation is used as a simple tool to get access to valuable structural system information. Finally, an initialisation procedure for Wiener–Hammerstein models is presented based upon the best linear approximation. Both methods are illustrated on measurement data.
13.2 The Best Linear Approximation

First, the concept of the best linear approximation of a system is briefly defined, since this will be used as a basic modelling tool in Section 13.3 and Section 13.4.
13.2.1 Definition

Consider a single input, single output (SISO) nonlinear system S with input u(t) and output y(t).

Definition 13.1. The Best Linear Approximation (BLA) of a nonlinear system S is defined as the model G, belonging to the set of linear models 𝒢, that minimises the mean square error between the true output and the modelled output for a particular class of inputs, i.e.,

G_BLA = argmin_{G ∈ 𝒢} E{|y(t) − G(q)u(t)|²},   (13.1)

where G(q) is the linear transfer function model with q the shift operator [2], [3], [11]. From (13.1), it follows that the best linear approximation can be obtained nonparametrically by performing classical FRF measurements [1]:

G_BLA(jω_k) = S_yu(jω_k) / S_uu(jω_k),   (13.2)
where S_yu(jω_k) is the cross-power spectrum between the output and the input, and S_uu(jω_k) is the auto-power spectrum of the input. The best linear approximation of a nonlinear system depends on the following input properties: the probability density function (pdf) or the higher order moments, and the power spectrum of the (stochastic) input signal [11], [2], [3]. Consequently, a Gaussian and a uniformly distributed excitation signal will not result in the same best linear approximation. For that reason, the class of considered signals needs to be defined in advance (see further). The dependency on the power spectrum includes the bandwidth and the amplitude level of the input signal. This property is the basis of the model structure selection method presented in Section 13.3. In general, a nonlinear system can always be represented by its best linear approximation G_BLA followed by a source representing the unmodelled nonlinear contributions y_s of the system [14], [3]. By construction, y_s is uncorrelated (but not
Fig. 13.1: (a) A nonlinear system with (b) its alternative representation and (c) the corresponding signals y, y_BLA and y_s
independent) of the input signal u(t). This alternative representation is illustrated in Figure 13.1. The nonlinear contributions y_s depend on the particular realisation of the input signal u(t), and they exhibit a stochastic behaviour from realisation to realisation if the input is a stochastic process. For that reason, this error term is called the nonlinear noise source. For random excitations, it is often very hard to distinguish it from the process noise. In practice, G_BLA can be estimated nonparametrically by averaging the measured FRFs for different input realisations.
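For periodic excitations, this averaging takes one line per realisation. A minimal sketch (our own helper; one steady-state period of input and output is assumed recorded per realisation):

import numpy as np

def bla_frf(U_records, Y_records, excited_bins):
    # Nonparametric BLA: average of the FRFs Y/U over realisations, cf. (13.2)
    # U_records, Y_records: arrays of shape (R, P), one period per realisation
    U = np.fft.rfft(U_records, axis=1)[:, excited_bins]
    Y = np.fft.rfft(Y_records, axis=1)[:, excited_bins]
    return (Y / U).mean(axis=0)  # G_BLA at the excited frequency bins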
13.2.2 Class of Excitations

Since the best linear approximation of a nonlinear system depends on the pdf (e.g., normal, uniform, or binary distribution) of the input, the class of excitation signals needs to be specified. Here, the class of the extended Gaussian excitations with a user-defined power spectrum will be considered. The main reasons for this choice are that signals belonging to this class are frequently used in practice and that the properties of the nonlinear distortions are well known in that case. Two members of this signal class are Gaussian noise and the random phase multisine, which is defined below.

Definition 13.2. A random phase multisine u(t) is a periodic signal, defined as a sum of harmonically related sine waves:

u(t) = (1/√N) Σ_{k=−N}^{N} U_k e^{j(2π k (f_max/N) t + φ_k)},   (13.3)
with phases φ_{−k} = −φ_k, amplitudes U_k = U_{−k} = U(k f_max/N), f_max the maximum frequency of the excitation signal, and N the number of frequency components. The phases φ_k are independent (over k) random variables that are uniformly distributed over [0, 2π) such that E{e^{jφ_k}} = 0 [11], [12]. Note that in theory a random phase multisine becomes Gaussian only asymptotically (N → ∞). The properties of Gaussian noise and random phase multisines are extensively studied in [12], where it is shown that Riemann equivalent excitation signals result in the same best linear approximation.
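A random phase multisine is conveniently generated through an inverse FFT. A minimal sketch following Definition 13.2 (the flat amplitude spectrum U_k = 1 and the rms normalisation are our own choices):

import numpy as np

def random_phase_multisine(P, excited_bins, rng=None):
    # One period (P samples) of a random phase multisine, cf. (13.3)
    rng = rng or np.random.default_rng()
    spectrum = np.zeros(P // 2 + 1, dtype=complex)
    phases = rng.uniform(0.0, 2 * np.pi, len(excited_bins))  # independent uniform phases
    spectrum[excited_bins] = np.exp(1j * phases)             # flat amplitudes on excited lines
    u = np.fft.irfft(spectrum, n=P)
    return u / u.std()                                       # normalise the rms value to 1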
13.3 Nonlinear Block Structure Selection Method

In this section, some user guidelines are provided for the selection of a suitable nonlinear block structure, because not every model structure allows the user to model the system under test equally well. From a series of easily executed experiments, the user can rapidly determine the expected complexity of the nonlinear modelling process.
13.3.1 Two-stage Nonparametric Approach

The nonlinear model structure selection method is based on the fact that G_BLA depends on the power spectrum of the input signal [11]. This property includes a dependency on both the root mean square (rms) value and the bandwidth or colouring of the excitation signal. The aim of this method is to give the user some guidance in the choice of nonlinear model structures, not to make a complete and strict classification of nonlinear model structures based on the behaviour of their best linear approximation. The approach consists in applying a Gaussian-like input signal and performing two types of experiments in which the power spectrum of the input is varied. In a first series of experiments, the amplitude or rms value of the input signal is varied while maintaining the shape of the power spectrum. In a second series of experiments, the shape of the power spectrum is varied while keeping the rms level constant (i.e., by normalising the power of the excitation signal before applying it to the system under test). Based on the resulting changes (a vertical shift, a shape change or no change) of the amplitude and phase characteristics of the best linear approximation, an analysis can then be made of the ability of some nonlinear block structures to identify the system under test, without performing any parameter estimation.
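In practice, the two experiment series simply repeat the BLA measurement of Subsection 13.2.1 over a small grid of settings. An outline only (measure is a hypothetical, user-supplied data-acquisition callback; bla_frf is the sketch given in Subsection 13.2.1):

def structure_test(measure, excited_bins, rms_levels, bandwidths):
    # Collect G_BLA over both experiment series of the selection method.
    # measure(rms, bandwidth) must return (U_records, Y_records) as used by bla_frf
    ref_rms, ref_bw = rms_levels[0], bandwidths[0]
    frf_vs_rms = {r: bla_frf(*measure(r, ref_bw), excited_bins) for r in rms_levels}
    frf_vs_bw = {b: bla_frf(*measure(ref_rms, b), excited_bins) for b in bandwidths}
    return frf_vs_rms, frf_vs_bw  # read off shifts/shape changes against Table 13.2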
13.3.2 Some Nonlinear Block Structures

The block-oriented models that can be distinguished by the proposed method are listed below:

1. the Wiener–Hammerstein (WH) system, consisting of two linear dynamic blocks with a static nonlinearity in between, as shown in Figure 13.2.
Fig. 13.2: Wiener–Hammerstein system (u → G₁ → p → f(.) → q → G₂ → y)
Special cases are the Wiener (W) and the Hammerstein (H) system in which, respectively, the last and the first dynamic block is set to one.

2. the WH-Nonlinear-Finite-Impulse-Response (WH-NFIR) system, defined as a WH system where the static nonlinearity is replaced by a dynamic nonlinearity with a finite memory:

q(t) = f(p(t), p(t − 1), …, p(t − M)),   (13.4)

with f a nonlinear function. For this system, two types of nonlinearities are distinguished, leading to

a. the WH-polyNFIR system, where the dynamic nonlinearity consists of a monomial:

q(t) = c ∏_{i=0}^{M} p^{α_i}(t − i),   (13.5)

with c a constant and α_i ∈ ℝ⁺;

b. the WH-NFIRsum system, where the dynamic nonlinearity can be written as a sum of nonlinear functions g_i with only one argument (see the sketch after this list):

q(t) = Σ_{i=0}^{M} g_i(p(t − i));   (13.6)

3. the Nonlinear Feedback (NLFB) system, with a dynamic block in the feedforward branch and a WH structure in the feedback loop, as illustrated in Figure 13.3.
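To fix ideas, a minimal sketch of the two NFIR nonlinearities (13.5)-(13.6) (the functions g_i, the constant c and the exponents α_i are user-supplied; nothing here is prescribed by the chapter):

import numpy as np

def nfir_sum(p, g_list):
    # WH-NFIRsum nonlinearity (13.6): q(t) = sum_i g_i(p(t - i)), M = len(g_list) - 1
    M = len(g_list) - 1
    q = np.zeros(len(p) - M)
    for i, g in enumerate(g_list):
        q += g(p[M - i:len(p) - i])
    return q

def poly_nfir(p, c, alphas):
    # WH-polyNFIR nonlinearity (13.5): q(t) = c * prod_i p(t - i)**alpha_i
    # (p > 0 is assumed whenever an exponent alpha_i is fractional)
    M = len(alphas) - 1
    q = c * np.ones(len(p) - M)
    for i, a in enumerate(alphas):
        q *= p[M - i:len(p) - i] ** a
    return q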
13.3.3 Theoretical Results

The behaviour of the best linear approximation G_BLA of the above-mentioned block structures was studied for the class of extended Gaussian excitation signals. The results are summarised in Table 13.2, where an arrow denotes a vertical shift, a triangle a frequency dependent change, and the equality sign no change. The full theoretical study, including the proofs of these results, can be found in [6].

Fig. 13.3: Nonlinear Feedback system (feedforward branch G₁; feedback branch consisting of the WH structure G₂₁, f(.), G₂₂)

Table 13.2: General behaviour of the best linear approximation of some nonlinear systems, for Gaussian excitation signals (arrow: a vertical shift; triangle: a frequency dependent change; =: no change). For each structure (WH, W; H; WH-polyNFIR; H-NFIRsum; W(H)-NFIRsum; NLFB), the table reports the behaviour of |G_BLA| and ∠G_BLA under a changing rms value and under a changing colouring.

According to the behaviour of their best linear approximation, model structures can be divided into different classes in which the structures show the same behaviour. If the measured behaviour of the system under test corresponds to one of these classes, all the structures in that class are good candidates to approximate the true system. Structures outside that class show a different behaviour and are thus less appropriate to model the system. From Table 13.2, five classes of nonlinear model structures with a different behaviour of the best linear approximation can be distinguished. Since the WH-NFIRsum structure and the NLFB structure belong to the same class, they cannot be distinguished from each other based on the characteristics of their best linear approximation, for the considered series of tests. As a result, a WH-NFIRsum structure can be used as an approximate model structure to identify a NLFB system, although it does not correspond to the true structure of the system.
13.3.4 Experimental Results

The structure identification method described above is now applied to the Silverbox [13], an electrical circuit simulating the behaviour of a nonlinear mass-spring-damper system. Here, only the results are discussed; a detailed description of the performed experiments can be found in [6].
Fig. 13.4: Behaviour of the best linear approximation (bold lines) and its 95% uncertainty bounds (thin lines) for (a) different rms values and (b) different bandwidths. Top figures: amplitude FRF; bottom figures: phase FRF
In Figure 13.4, the behaviour of the best linear approximation is given for different rms values and different bandwidths (bold lines), together with the 95% uncertainty bounds (thin lines). It can be seen that, when varying the rms value, both the amplitude and phase of the best linear approximation change; they shift to the right for larger rms values. These shape changes are statistically significant within the 95% uncertainty bounds. When varying the bandwidth of the excitation signal, a statistically significant shape change is observed in both the amplitude and phase of G_BLA. By combining the results of both experiments, it follows from Table 13.2 that there are two candidate model structures for the Silverbox: a W(H)-NFIRsum and a NLFB structure. The authors are unaware of the use of the first model structure for modelling this system. However, in [10], a NLFB structure was successfully utilised. Note that if the user is only interested in identifying this system in a frequency band up to 50 Hz, a W(H) structure is a good candidate model structure.
13.3.5 Concluding Remarks It should be emphasised that a candidate model structure depends on the rms and colour variations applied to the input signal for the structure identification procedure. Consequently, a candidate model structure will only be appropriate to identify the system under test for the range wherein the excitation was modified.
13
Identification of Wiener–Hammerstein Systems Using the BLA
217
Therefore, the user should always apply variations that cover the full amplitude range and frequency band of interest.
13.4 Initial Estimates for Wiener–Hammerstein Models In this section, a non-iterative method is proposed to initialise the linear dynamic blocks of a Wiener–Hammerstein model. The idea is to build these blocks from the pole/zero information captured in the best linear approximation of the system under test. The initial estimates can then be further optimised via an output error criterion to complete the identification process. The proposed identification procedure is finally applied to measurements from a physical system.
13.4.1 Set-up The system under test is the Wiener–Hammerstein structure depicted in Figure 13.2, where u and y are the measured input and output, respectively. The internal signals p and q are not measurable. G1 and G2 are the transfer functions of the linear dynamic blocks, and f (.) represents the static nonlinearity which can be written as a concatenation of two static nonlinear functions f1 and f2 such that f (x) = f2−1 ( f1 (x)) .
(13.7)
This property (which is true for any type of static nonlinearity since f1 or f2 can always be set equal to the identity function) will be used in the initialisation procedure explained below. The method is explained for discrete-time models, but can also be applied to continuous-time models.
13.4.2 Initialisation Procedure The initial estimates for the linear dynamic blocks of the WH model are generated from the system’s best linear approximation which is easily extracted from the input/output data. The idea is to split the WH model in two subsystems, and to write the linear dynamic blocks as a linear combination of basis functions containing the poles and zeros of GBLA . By selecting then a specific model structure, a problem linear-in-the-parameters is obtained which needs to be solved in order to find initial estimates for G1 and G2 . 13.4.2.1
Decomposing the WH Structure
The WH model is decomposed in two subsystems as illustrated in Figure 13.5, using the property in (13.7). The first subsystem with input u and output z1 has a nonlinear block f1 ; the second subsystem with input y and output z2 has a nonlinear block f2 . Note that the identification of the second subsystem can be considered as an errorsin-variables (EIV) problem when there is noise on y. Solving an EIV problem for
218
L. Lauwers and J. Schoukens
nonlinear systems is however very demanding. Here, the bias induced by the noisy data is neglected which is justified since the proposed approach serves only as a procedure to generate starting values. The aim is now to write Gˆ 1 and Gˆ −1 2 as a linear combination of basis functions Wi and Hi containing the poles and zeros of GBLA , respectively: Gˆ 1 = ∑ri=0 θˆ1L (i)Wi , . (13.8) s ˆL Gˆ −1 2 = ∑i=0 θ2 (i)Hi In the next paragraphs, the poles and zeros are extracted from the best linear approximation and the basis functions for the linear dynamic blocks are constructed. The final goal is then to find the proper coefficients θ L of these basis functions in order to compose initial estimates for G1 and G2 . 13.4.2.2
Extracting the Poles/Zeros from GBLA
For the class of extended Gaussian signals, it is well known that the best linear approximation of a Wiener–Hammerstein system is equal to GBLA ( jω ) = cG1 ( jω )G2 ( jω )
(13.9)
where c is a constant gain factor depending on the power spectrum (rms value and colouring) of the excitation and the odd nonlinear contributions [14], [11]. From equation (13.9), it follows that GBLA contains all the poles and zeros present in the linear dynamic blocks of the Wiener–Hammerstein structure, assuming that there is no pole-zero cancellation. In order to obtain these poles and zeros, first a parametric model Gˆ BLA (z, θ ) needs to be estimated for GBLA which was extracted nonparametrically from the input/output data. This can easily be done using, for instance, the Estimator for Linear Systems (ELiS; [11]): ∑ b bi z−i Gˆ BLA (z, θ ) = ni=0 a a j z− j ∑ j=0 n
(13.10)
but also the time domain prediction error framework can be used [8], [15].
Fig. 13.5: Decomposed WH structure consisting of two subsystems
13
Identification of Wiener–Hammerstein Systems Using the BLA
219
By decomposing the parametric model Gˆ BLA (z, θ ) into partial fractions, the poles ρi of GBLA will pop up in a natural way: max(nb −na ,0) αi + ∑ β j z− j −1 i=0 1 − ρi z j=0 na
Gˆ BLA (z, θ ) = ∑
(13.11)
with β j = 0 for nb < na . Note that for reasons of simplicity the partial fraction expansion of Gˆ BLA (z, θ ) is restricted to the case of simple poles. Similarly, in the decomposition of Gˆ −1 BLA (z, θ ) which is given by the inverse of (13.10) the simple zeros ηk of GBLA will appear. 13.4.2.3
Selecting the Basis Functions
From the partial fraction expansion of Gˆ BLA (z, θ ), the basis functions Wi in (13.8) for Gˆ 1 can be deduced:
2 1 −j , z |i = 0, ..., na , j = 0, ..., max(nb − na , 0) with nb > na . Wi = 1 − ρiz−1 (13.12) In order to obtain a real-valued estimate for G1 , the first order fractions with complex conjugate poles need to be merged together. As a result, the basis functions Wi for Gˆ 1 include first and second order fractions and pure delay terms. Note that these delay terms disappear for nb < na . A similar reasoning can be applied using the partial fraction expansion of ˆ −1 Gˆ −1 BLA (z, θ ) to obtain the basis functions Hi in (13.8) for G2 . Hereby, the role of nb and na needs to be interchanged in the above reasoning, and the poles ρi need to be replaced by the zeros ηk . 13.4.2.4
Solving a Problem Linear-in-the-parameters
The goal is now to find the proper coefficients θ L in (13.8) of the basis functions Wi and Hi by expressing that the outputs of both subsystems z1 and z2 in Figure 13.5 are equal. From Figure 13.5, it can easily be seen that the parameters θ L are combined in a nonlinear way with the parameters θ NL of the unknown static nonlinearities f1 and f2 . To circumvent the resulting problem that is nonlinear-in-the-parameters, an equivalent model structure with a different parametrisation will be considered (see Figure 13.6). This model structure will lead to a problem that is linear-in-theparameters (θ L , θ NL ) which can easily be solved. The price to be paid are the multiple input, single output (MISO) nonlinearities g1 and g2 which will give rise to extra regressors. To obtain expressions for z1 and z2 , the nonlinearities g1 and g2 in Figure 13.6 are approximated by a multivariable polynomial consisting of a linear and a nonlinear part: ˜ θ1L , θ1NL ) ≈ PL (u) ˜ θ1L + PNL (u) ˜ θ1NL g1 (u, (13.13) L NL L L NL g2 (y, ˜ θ2 , θ2 ) ≈ P (y) ˜ θ2 + P (y) ˜ θ2NL
220
L. Lauwers and J. Schoukens
with u˜ = [u˜1 , ..., u˜r ] and y˜ = [y˜1 , ..., y˜s ]. The row vector PNL (.) consists of all the distinct nonlinear monomials up to a certain degree (>1) which is chosen by the user. Note that PL (.) and PNL (.) are in fact the linear and the nonlinear regressor matrix, respectively. Using the following definitions: 3 L 4 θi i = 1, 2 (13.14) θi = θiNL and Rx˜ =
˜ PL (x)
PNL (x) ˜
x˜ = u, ˜ y˜
(13.15)
the continuity requirement z1 = z2 can be formulated as a Total Least Squares (TLS) problem [16]: Rθ = 0 (13.16) τ τ with R = Ru˜ − Ry˜ and θ = θ1 θ2τ . The unknown parameters θ can be found by performing a Singular Value Decomposition (SVD) [4] of the total regressor matrix R: R = USV τ . (13.17) When the matrix S has exactly one degeneration DS = 1 (i.e., one zero singular value), the parameter estimate θˆ is given by the last column of V corresponding to the only zero singular value. When multiple (DS > 1) or no degenerations at all (DS = 0) are present, the number of degenerations of the matrix S needs to be reduced or increased, respectively, to one. Degenerations of S stem from the linear dependencies between the columns of the regressor matrix R. In the ideal case DS = 1, the degeneration is due to the desired pˆ − qˆ relationship (see further, equation (13.18)). For DS = 0, this relation can not be formed by the present basis functions and, hence, extra delay terms z− j need to be included in both sets of basis functions. It can be proved that j = 0, ..., max(nb , na ) leads to successful results. For DS > 1, the pˆ − qˆ relation can be formed in multiple ways via different combinations of basis functions and, hence, delay terms need to be deleted one by one, starting from the highest order, until DS = 1. In practice, it is unfeasible to know when DS = 1 is reached and therefore a scan is performed over
Fig. 13.6: Model structure leading to problem linear-in-the-parameters: each subsystem consists of basis functions and a MISO static nonlinearity
13
Identification of Wiener–Hammerstein Systems Using the BLA
221
the different possibilities in order to select for the case DS > 1 the initial model with the lowest output root mean square error (rmse). Note that to increase the numerical stability, the matrix R can be normalised prior to the SVD such that each column has a rms value of one. Afterwards, the parameter estimate θˆ needs to be denormalised again. 13.4.2.5
Composing the Initial Estimates
The estimated parameters θˆ L which correspond to the components of the linear regressor matrix PL (.) determine the coefficients of the basis functions. Gˆ 1 and Gˆ −1 2 are then composed parametrically by making the linear combination of the basis functions in (13.8). Furthermore, the signal PL (.)θˆ L is within a scale factor an initial (nonparametric) estimate for the system’s intermediate signals: pˆ = PL (u) ˜ θˆ1L , (13.18) L ˜ θˆ2L . qˆ = P (y) To obtain a parametric estimate for the static nonlinearity f (.), a polynomial can for instance be fit through the pˆ − qˆ data.
13.4.3 Experimental Results The proposed initialisation method is now applied to measurements from an electronic nonlinear circuit with a Wiener–Hammerstein structure which was designed by Gerd Vandersteen [17]. A detailed description of the system under test and the performed experiments can be found in [5]. Note that the same data was used in a benchmark session organised at the IFAC Symposium on System Identification (SYSID) 2009 where the aim was to compare different modelling techniques using the same experimental data. 13.4.3.1
Best Linear Approximation
First, the best linear approximation Gˆ BLA ( jωk ) is calculated nonparametrically and plotted in Figure 13.7 (solid black line), together with its standard deviation σˆ BLA (dashed black line). To obtain a parametric model Gˆ BLA (z, θ ), different linear models of various orders are estimated. The best result in mean square sense is achieved by a 6th order linear model (nb = na = 6) which is also plotted in Figure 13.7 (solid grey line), together with the absolute difference between the measured FRF Gˆ BLA ( jωk ) and the parametric model Gˆ BLA (zk , θ ) with zk = e jωk Ts and Ts the sampling period (dashed grey line). 13.4.3.2
Initial Estimates
Next, the 6th order linear model is used as a starting point in the initialisation procedure. The static nonlinearity g2 is set equal to the identity function, such that the
222
L. Lauwers and J. Schoukens
system’s nonlinear behaviour is fully concentrated in g1 . The static nonlinearity g1 is approximated by a multivariable polynomial including the nonlinear degree 2 and 3. In practice, the number of degenerations of the matrix S is unfeasible to determine exactly since there is no explicit drop in the singular values. Therefore, the different possibilities (DS > 1, DS = 1, DS = 0) are dealt with (see also [5]). For each case, the linear dynamic blocks are initialised and a pˆ − qˆ data set is obtained. Next, a polynomial of various degrees is fit through this pˆ − qˆ data set to parametrise the static nonlinear block. All the resulting models are then used as a starting value in a nonlinear search routine. Hence, no model selection is performed on the initial nonlinear models. 13.4.3.3
Nonlinear Optimisation
All the obtained initial models are then optimised in the frequency domain via a Levenberg-Marquardt [7], [9] optimisation algorithm. The cost function which minimises the squared output error is given by V = ∑ |Y ( jωk , θ ) − Y ( jωk )|2
(13.19)
k
where Y ( jωk , θ ) and Y ( jωk ) are the modelled and measured output spectrum, respectively. In order to retain the best nonlinear model, a validation test is performed on a fresh data set. The best nonlinear model was obtained for the case DS > 1 in which one delay term was deleted from the original set of basis functions. For the static nonlinearity, a polynomial fit of degree 10 was used. This model has a validation rmse of 2.98 mV and contains 39 model parameters.
0
Amplitude [dB]
−20 −40 −60 −80 −100 0
0.05 0.1 0.15 Normalized Frequency
0.2
Fig. 13.7: Gˆ BLA ( j ωk ) (solid( black line), σˆ BLA (dashed( black line), 6th order linear model Gˆ BLA (z, θ ) (solid grey line), (Gˆ BLA ( j ωk ) − Gˆ BLA (zk , θ )( (dashed grey line)
Identification of Wiener–Hammerstein Systems Using the BLA
Amplitude [V]
Amplitude [V]
Amplitude [V]
13
223
0.5 0 −0.5 −1 0
0.5
1 Time [s]
1.5
0
0.5
1 Time [s]
1.5
0
0.5
1 Time [s]
1.5
0.5 0 −0.5 −1
0.5 0 −0.5 −1
Fig. 13.8: Validation results: measured output (black) and model error (grey). Top plot: best linear model; middle plot: initial nonlinear model; bottom plot: best nonlinear model
In Figure 13.8, the validation result for the 6th order linear model is shown in the top plot (using the benchmark test data). As an illustration, the validation result for the initial nonlinear model with a polynomial fit of degree 10 and DS > 1 is given in the middle plot (also using the benchmark test data). In black, the measured output is given and in grey, the simulation error. It can be seen that the simulation error shows no longer an explicit asymmetric behaviour which signifies that the nonlinear behaviour is already partly captured by the initial nonlinear model. The bottom plot shows the validation result for the best nonlinear WH model. In Figure 13.9, the spectra of the measured output signal (black) is shown, together with the linear simulation error (light grey) and the nonlinear simulation error (dark grey). In the pass-band of the DUT, the nonlinear model error is about 20dB lower than the linear model error.
13.4.4 Concluding Remarks The main advantage of the proposed initialisation procedure is that the required user interaction is low: only the order (nb and na ) of Gˆ BLA (z, θ ) and the degree of the multivariable polynomial need to be selected by the user. The model orders of the
224
L. Lauwers and J. Schoukens
Amplitude [dB]
0 −20 −40 −60 −80 −100
0
5 10 Frequency [kHz]
15
Fig. 13.9: DFT spectra of the measured output signal (black); linear model error (light grey); and nonlinear model error (dark grey)
linear dynamic blocks do not need to be chosen by the user in advance, since these blocks are composed parametrically once θˆ is available. Furthermore, it should be emphasised that the quality of the starting values depend strongly on the quality of the estimated GBLA .
13.5 Conclusions In this contribution, the usefulness of the best linear approximation of the system under test for the classification and identification of some block-oriented models was illustrated. First, a method was presented to guide the (inexperienced) user in the selection of an appropriate nonlinear model structure for the system under test. Using a two-stage nonparametric approach based on the behaviour of the best linear approximation, the user immediately obtains insight in the ability of some nonlinear block structures to model the system under test, without any parameter estimation. The method is easy to implement since the user only needs to carry out FRF measurements, together with a variance analysis. Secondly, a method to generate starting values for Wiener–Hammerstein models was presented. The linear dynamic blocks are written as a linear combination of basis functions containing the poles and the zeros of the best linear approximation of the system under test. The proposed initialisation algorithm is straightforward, non-iterative and requires little user interaction. The user only needs to perform classical FRF measurements and a number of mathematical operations. Acknowledgements. This work is sponsored by the Fund for Scientific Research (FWOVlaanderen), the Flemish Government (Methusalem Grant METH-1) and the Belgian Program on Interuniversity Poles of Attraction initiated by the Belgian State, Prime Minister’s Office, Science Policy programming (IAP-VI/4 - Dysco).
13
Identification of Wiener–Hammerstein Systems Using the BLA
225
References 1. Bendat, J.S., Piersol, A.G.: Engineering Applications of Correlations and Spectral Analysis. Wiley, New York (1980) 2. Enqvist, M.: Linear models of nonlinear systems. PhD thesis, Linkoping University, Linkoping, Sweden (2005) 3. Enqvist, M., Ljung, L.: Linear approximations of nonlinear FIR systems for separable input processes. Automatica 41(3), 459–473 (2005) 4. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. John Hopkins University Press, Baltimore (1996) 5. Lauwers, L., Pintelon, R., Schoukens, J.: Modelling of Wiener–Hammerstein Systems via the Best Linear Approximation. In: SYSID 2009, Proc. of 15th IFAC Symposium on System Identification, Saint-Malo, France, pp. 1098–1103 (2009) 6. Lauwers, L., Schoukens, J., Pintelon, R., Enqvist, M.: A nonlinear block structure identification procedure using frequency response function measurements. IEEE Trans. Instrum. Meas. 57(10), 2257–2264 (2008) 7. Levenberg, K.: A method for the solution of certain problems in least squares. Quart. Appl. Math. 2, 164–168 (1944) 8. Ljung, L.: System Identification: Theory for the User, 2nd edn. Prentice Hall, Upper Saddle River (1999) 9. Marquardt, D.: An algorithm for least-squares estimation of nonlinear parameters. SIAM Journal of Applied Mathematics 11, 431–441 (1963) 10. Paduart, J., Schoukens, J.: Fast identification of systems with nonlinear feedback. In: Proc. 6th IFAC Symp. NOLCOS, Stuttgart, Germany, pp. 525–529 (2004) 11. Pintelon, R., Schoukens, J.: System Identification: A Frequency Domain Approach. IEEE Press, Piscataway (2001) 12. Schoukens, J., Lataire, J., Pintelon, R., Vandersteen, G., Dobrowiecki, T.: Robustness issues of the best linear approximation of a nonlinear system. IEEE Transactions on Instrumentation and Measurement 58(5), 1737–1745 (2009) 13. Schoukens, J., Nemeth, J.G., Crama, P., Rolain, Y., Pintelon, R.: Fast approximate identification of nonlinear systems. Automatica 39(7), 1267–1274 (2003) 14. Schoukens, J., Pintelon, R., Dobrowiecki, T., Rolain, Y.: Identification of linear systems with nonlinear distortions. Automatica 41(3), 491–504 (2005a) 15. Soderstrom, T., Stoica, P.: System Identification. Prentice Hall, Englewood Cliffs (1989) 16. Van Huffel, S., Vandewalle, J.: The Total Least Squares Problem: Computational Aspects and Analysis. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (1991) 17. Vandersteen, G.: Identification of linear and nonlinear systems in an errors-in-variables least squares and total least squares framework. PhD thesis, Vrije Universiteit Brussel, Brussels, Belgium (1997)
Chapter 14
Subspace Identification of Hammerstein–Wiener Systems Operating in Closed-loop Jan-Willem van Wingerden and Michel Verhaegen
14.1 Introduction Subspace identification has attracted considerable attention in the past years. This is mainly due to the fact that they are based on numerically reliable computations (e.g. SVD and QR). For general non-linear systems, the extensions of this class of identification approaches is far from trivial. However, by exploiting structure in the nonlinearity (e.g. Hammerstein–Wiener, LPV) dedicated algorithms can be developed [1, 2, 3]. Hammerstein–Wiener systems are a particular class of nonlinear systems, which are linear time-invariant (LTI) models with a static nonlinearity at the input and output, respectively. Although, Hammerstein and Wiener system identification attracted considerable attention in the past few years (see [4, 5] and references therein), Hammerstein–Wiener system identification did not [5]. Still, the identification of Hammerstein–Wiener models is of interest since this model structure appears in a large number of applications [5]. The focus of this chapter is on subspace-based Hammerstein–Wiener system identification. In the area of Hammerstein–Wiener model identification only a restricted number of subspace methods exist, e.g. [2, 6]. In [6] they formulate the subspace identification problem as an intersection problem of the past and the future using Kernel Canonical Correlation Analysis (KCCA). From this intersection they compute the state sequence and in the second step they apply Least-Squares Support Vector Machines (LS-SVM) [7] to obtain the nonlinearities and the system matrices. The proposed method only applies for data generated in open loop, while Jan-Willem van Wingerden Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands e-mail:
[email protected] Michel Verhaegen Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands e-mail:
[email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 229–239. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com
230
J.-W. van Wingerden and M. Verhaegen
ek uk
yk f (.)
A,B,C,D,K
g(.)
Fig. 14.1: Schematic representation of an Hammerstein–Wiener systems.
from a practical point of view it is necessary to look at closed-loop system identification, which was motivated in [1, 2]. In [2] a novel algorithm is presented to identify MIMO Hammerstein–Wiener systems operating under open and closed-loop conditions. To do so, [2] formulated the linear time invariant optimised predictor based subspace identification algorithm as an intersection problem, better known as CCA. For Hammerstein–Wiener systems [2] utilised ideas from machine learning to estimate both the static nonlinearities and the Markov parameters. In the second step the state sequence was directly used to estimate the linear dynamics. The available identification algorithms lean on support-vector machines and this makes the notation of the derived algorithms complicated. Moreover, to solve the large intersection problems regularisation is required to solve the problem. In this book chapter we will follow the identification scheme in [2] but without using support-vector machines. We will stay in the primal domain and we will use simple basis functions to parametrise the nonlinearities. Since the algorithm stays close to the derivation of the LTI counterpart the notation will not be complicated. The outline of this book chapter is as follows; we start in Section 14.2 with the problem formulation, the idea of basis functions, and assumptions. In Section 14.3 we tailor the predictor-based subspace identification methods toward Hammerstein– Wiener systems. In Section 14.4 a simulation example is presented. We end this book chapter with our conclusions.
14.2 Problem Formulation In this section we present the problem formulation, the concept of basis functions, and the assumptions we make.
14.2.1 Problem Formulation For the derivation of the algorithm we consider the following minimal Hammerstein–Wiener system: xk+1 −1
=
Axk + B f (uk ) + Kek ,
g (yk ) = Cxk + ek ,
(14.1) (14.2)
14
Subspace Identification in Closed-loop
231
where xk ∈ Rn , uk ∈ Rr , yk ∈ R , are the state, input and output vectors. ek ∈ R denotes the zero-mean white innovation process12 . The matrices A ∈ Rn×n , B ∈ Rn×r , C ∈ R×n , K ∈ Rn× are the local system, input, output, and the noise matrices and finally f (.) : Rr → Rr and g(.)−1 : R → R are static smooth nonlinear one-toone functions. We can rewrite (14.1)-(14.2) in the predictor form as: xk+1 −1
g (yk )
˜ k + B f (uk ) + Kg−1(yk ), = Ax
(14.3)
= Cxk + ek ,
(14.4)
with A˜ = A−KC. It is well-known that an invertible linear transformation of the state does not change the input-output behaviour of the dynamic part of a Hammerstein– Wiener system. A similar linear transformation appears between the nonlinearities and the dynamic part. So, the static nonlinearities can be estimated up to a square invertible transformation. These transformations are given by: T −1 AT , T −1 BTu , T −1 KTy , Ty−1CT , Ty−1 g(.)−1 , and Tu−1 f (.) with T ∈ Rn×n , Ty ∈ R× , Tu ∈ Rr×r . The identification problem can now be formulated as: given the input sequence uk and the output sequence yk over a time k = {1, . . . , N}; find all, if they exist, the system matrices A, B, C, K and static nonlinearities, f (.) and g−1 (.), up to the mentioned similarity transformations.
14.2.2 Concept of Basis Functions The Hammerstein and Wiener nonlinearity will be parametrised using a linear combination of basis functions. These basis functions are defined as: g−1 (.) =
m
∑ αi qig−1 (.),
(14.5)
∑ βiqif (.),
(14.6)
i=1 m
f (.)
=
i=1
where m denotes the number of basis functions, qig−1 : R → R and qif : Rr → Rr the
ith basis function of the nonlinear functions g−1 and f , respectively. The parameters to be estimated are αi ∈ R× and βi ∈ Rr×r while the basis functions should be provided by the user. The parametrisation of the nonlinearities make it possible to define dummy inputs and outputs as follows: 1
2
Note that we use g−1 in the notation to emphasis the relation between the common formulation of Hammerstein–Wiener model structures. Further, for the identification algorithm we do not require the assumption that g−1 is invertible. The algorithm is derived without direct feed through. However, it is straightforward to extend the algorithm with a direct feed through.
232
J.-W. van Wingerden and M. Verhaegen
⎡
⎤ q1g−1 (yk ) ⎢q2 (y )⎥ 8 (1) 9 ⎢ g−1 k ⎥ ⎥ = yk yk = ⎢ .. (2:m) . ⎢ ⎥ yk ⎣ ⎦ . qm (y ) g−1 k
⎤ q1f (uk ) 2 ⎢ q f (uk ) ⎥ ⎥ ⎢ uk = ⎢ . ⎥ , ⎣ .. ⎦ qmf (uk ) ⎡
The estimated nonlinearity can now be expressed in the to be estimated coefficients multiplied with the dummy inputs and outputs. To do so we also define the stacked versions of α and β : α = α1 α2 · · · , αm , = α (1) α (2:m) , β = β1 β2 · · · , βm .
14.2.3 Assumptions and Notation Similar as in [8, 9] we define a past window denoted by p. This window is used to define the following stacked vector: ⎤ ⎡ zk ⎢ zk+1 ⎥ ⎥ ⎢ zkp = ⎢ (14.7) ⎥ .. ⎦ ⎣ . zk+p−1 with zk = [uTk , yTk ]T . This notation allows us to introduce the following stacked vector: Z
=
p [z1p , · · · , zN−p ].
X
=
[x p , · · · , xN ],
X X
= =
[x p+1 , · · · , xN ], [x p , · · · , xN−1 ],
In a similar way we define:
and E, E, U, and U. To solve a linear regression problem in the next section we also have to define the following two stacked vectors: Yf
(1)
=
[y p , · · · , yN ],
(2:m)
=
[y p
Yf
(1)
(2:m)
(1)
(2:m)
, · · · , yN
].
We assume that the state sequence X has full row rank and the extended observability matrix is given by:
14
Subspace Identification in Closed-loop
233
⎤ C ⎢ CA˜ ⎥ ⎥ ⎢ Γ p = ⎢ . ⎥. ⎣ .. ⎦ CA˜ p−1 ⎡
(14.8)
We also define the extended controllability matrix: K p = A˜ p−1 B, A˜ p−2B, · · · , B , with B = [Bβ1 , · · · , Bβm , K α1 , · · · , K αm ]. In the next section we revise the idea of predictor based subspace identification, in particular we will look at the PBSIDopt algorithm [9] and extend it to block nonlinear systems.
14.3 Hammerstein–Wiener Predictor-based Subspace Identification It is well known that the projector type of subspace algorithms (e.g. MOESP [10] and N4SID [11]) give biased estimates if the identification data set is generated under closed-loop conditions. The main reason for the bias is the constraint that for the projector type of algorithms the noise and the input should be uncorrelated. This assumption is clearly violated if there is a feedback loop present (as clearly explained in [12]). Predictor-based subspace identification (e.g. PBSID [9] and SSARX [8]) methods do not suffer from this drawback. In this section we introduce the PBSIDopt algorithm and use it for the identification of Hammerstein–Wiener systems.
14.3.1 Predictors The first objective of the predictor-based algorithms is to reconstruct the state sequence up to a similarity transformation. The state xk+p is given by: xk+p = A˜ p xk +
A˜ p−1B, A˜ p−2 B, · · · , B zkp . ) *+ , Kp
At this point we use the assumption that A˜ p = 0. With this assumption the state xk+p is given by: (14.9) xk+p = A˜ p−1 B, A˜ p−2 B, · · · , B zkp . ) *+ , Kp
The input-output behaviour of the Hammerstein–Wiener model is now given by: g−1 (yk+p ) = CK p zk + ek+p . p
234
J.-W. van Wingerden and M. Verhaegen
The nonlinearity is still present in this problem. With the definitions given in the previous section we can rewrite this equation as: (1)
α (1)Y f
(2:m)
= CK p Z − α (2:m)Y f
.
(14.10)
This problem can be looked-at as an intersection problem but in this book chapter we will consider it as a linear problem by just picking an α1 which now basically characterises the transformation Ty (since we can pre-multiply (14.10) with an arbitrary non-singular matrix). For presentation reasons we pick α (1) equal to unity. This allows us to formulate the following linear problem. min CK
p ,α (2:m)
(1)
(2:m) 2 ||F ,
||Y f − CK p Z + α (2:m)Y f
(14.11)
where || · · · ||F represents the Frobenius norm [13].
14.3.2 Extended Observability Times Controllability Matrix The product K p Z that represents by definition the state sequence, X, can not be estimated directly. In the predictor-based identification algorithms CK p is used to construct the extended observability matrix times the extended controllability matrix. The following upper block triangular matrix is used in the PBSIDopt algorithm: ⎡
CA˜ p−1 B CA˜ p−2 B ⎢ 0 CA˜ p−1 B ⎢ Γ pK p = ⎢ ⎣ 0
··· ··· .. .
CB ˜ CAB .. . CA˜ p−1 B
⎤ ⎥ ⎥ ⎥. ⎦
Observe that from the linear regression problems formulated in (14.11) we can construct this matrix. From the constructed matrix Γ p K p we can compute Γ p K p Z which equals by definition the extended observability matrix times the state sequence, Γ p X. By computing a Singular Value Decomposition (SVD) of this estimate we can estimate the state sequence and the order of the system. We will use the following SVD: 43 4 3 Σn 0 V p p W Γ K Z = U U⊥ , V⊥ 0 Σ where Σn is the diagonal matrix containing the n largest singular values and V is the corresponding row space. The matrix W represents a given weighting matrix. The state is now given by: † p K p Z, X = W −1 U Γ (14.12) where † represents the pseudo-inverse. It is well-known that this state contains a similarity transformation.
14
Subspace Identification in Closed-loop
235
14.3.3 Estimation of the Wiener Nonlinearity The estimation of the Wiener nonlinearity is a direct product of the minimisation given in (14.11) since we have put α (1) equal to the identity matrix the Wiener nonlinearity is now given by: m
g−1 (.) = q1g−1 (.) + ∑ αi qig−1 (.). i=2
We would like to stress that we can pre multiply this nonlinearity with an arbitrarily matrix Ty which is a transformation matrix which was already introduced in the problem statement. By setting α (1) equal to identify we fixed this transformation (but we don’t know it). If we solve (14.10) as an intersection problem a similar freedom will appear although then you give the additional freedom of picking a certain transformation to the algorithm.
14.3.4 Recovery of the System Matrices With the Wiener nonlinearity known and an estimate of the state sequence we can now find the LTI system matrices by simple linear regression. In the stacked version we have the following relation:
X Y˜
+ BU + K E,
= AX ˆ = CXˆ + E,
(14.13) (14.14)
i with Y˜ defined as the stacked version of ∑m i=1 αi qg−1 (yk+p ) and B is a low rank matrix defined as Bβ1 , . . . , Bβm . The solution is now given by:
⎡ ⎤†
X 6 7
⎣
B, K
= X U⎦ A,
E ˆ X † C = (Y˜ − E)
(14.15) (14.16)
Note that all the matrices are estimated up to this similarity transformation and the C matrix contains a transformation due to the Wiener nonlinearity.
14.3.5 Estimation of the Hammerstein Nonlinearity The Hammerstein nonlinearity can be computed in many ways. In this book chapter we find the Hammerstein nonlinearity by performing an SVD on the product BU which should have by definition rank r. The right-hand side of the equation can be estimated and since it is a low rank matrix we can obtain the Hammerstein nonlinearity by performing an SVD:
236
J.-W. van Wingerden and M. Verhaegen
BU =
U
U⊥
3
Σr 0
0 0
43
V V⊥
4 .
Using the column space of this matrix we now have an estimate of the Hammerstein nonlinearity and this one is given by: ⎤ q1f (uk ) ⎢ q2f (uk ) ⎥ ⎥ ⎢ Tu−1 f (uk ) = U † B ⎢ . ⎥ , ⎣ .. ⎦ qmf (uk ) ⎡
where † represents the pseudo inverse and Tu−1 the unknown transformation.
14.4 Example In this simulation example we use a fourth-order MIMO model with r = 2, and = 2. The collected data (uk , yk ) are used for the identification algorithm. The system matrices are: ⎡ ⎤ ⎡ ⎤ 0.37 0.37 0 0 0.6598 1.9698 ⎢ −0.37 0.37 ⎢ ⎥ 0 0 ⎥ ⎥ , B = ⎢ −0.5256 0.4845 ⎥ , A=⎢ ⎣ ⎣ −0.6968 0.1722 ⎦ 0 0 0.67 0.67 ⎦ 0 0 −0.67 0.67 0.1474 0.5646 ⎡ ⎤ 4.3171 −2.6436 3 4 ⎢ −0.4879 −0.3416 ⎥ 0.3749 0.0751 −0.5225 0.5830 ⎥ C= , K=⎢ ⎣ 0.6484 −0.9400 ⎦ , −0.8977 0.7543 0.1159 0.0982 −0.4660 0.1032 ⎡ " # ⎤ 8 9 (1) (1) sinc uk atan(y ) k " # ⎦ , g(yk ) = , f (uk ) = ⎣ (2) (2) yk sinc uk As reference signal we take a zero-mean Gaussian white noise signal with cov(uk ) = Ir and we add noise with a signal-to-noise ratio (SNR) of 40dB. For the identification experiment we used N = 1500 and p = f = 5. The collected data (uk and yk ) is used to identify a Hammerstein-Wiener model. The performance of the identified system is evaluated by looking at the value of the Variance-Accounted-For (VAF) on a data set different from the one used for identification. The VAF value for Hammerstein-Wiener systems is defined as: −1 (y )) var(g−1 (yk ) − g k −1 −1 VAF(g (yk ), g (yk )) = max 1 − , 0 ∗ 100%, var(g−1(yk ))
14
Subspace Identification in Closed-loop
237
Fig. 14.2: Presentation of f˜(uk ) based on 1000 Monte-Carlo simulations. The solid line represents the estimate with the highest VAF and the grey area covers the other 99 estimates.
Fig. 14.3: Presentation of g˜−1 (yk ) based on 100 Monte-Carlo simulations. The solid line represents the estimate with the highest VAF and the grey area covers the other 99 estimates.
238
J.-W. van Wingerden and M. Verhaegen
Fig. 14.4: Bode diagrams of the original transfer functions (dashed) and the identified transfer functions of the experiment with the highest mean correlation coefficients (solid). The transfer functions of the other 99 experiments are within the grey confidence region.
−1 (y ) denotes the signal obtained by simulating the identified Hammerwhere g k stein-Wiener model, g−1 (yk ) is the signal of the true Hammerstein-Wiener model, and var(·) denotes the variance of a quasi-stationary signal. To investigate the sensitivity of the identification algorithm with respect to noise, a Monte-Carlo simulation with 100 runs was carried out. For each of the 100 simulations a different realisation of the input and noise are used. In Figure 14.2 and 14.3 we show the nonlinearities by plotting f (uk ) and g−1 (yk ). Although, the estimates of the nonlinearities are obtained up to an unknown transformation. We can estimate in this simulation example this transformation since we know the ‘real’ system. These estimates are denoted by f˜(uk ) and g˜−1 (yk ), respectively. We clearly see that we can estimate the nonlinearities quite accurately for different realisations of the noise. In Figure 14.4 the LTI part of the Hammerstein-Wiener models is given by means of their Bode magnitude plot. As expected the closed-loop algorithm gives consistent results but a small bias arises due to the approximation made in the algorithm.
14
Subspace Identification in Closed-loop
239
14.5 Conclusions In this book chapter we faced the challenging problem to estimate MIMO Hammerstein-Wiener models from data. For this purpose a dedicated algorithm is developed based on recently proposed predictor-based subspace methods. Since the developed method stays close the his LTI counterpart it is intuitive and easy to use, this contradictory to methods which rely on support vector machines. The effectiveness of the approach was illustrated with a simple simulation example.
References 1. van Wingerden, J.W., Verhaegen, M.: Closed loop identification of MIMO Hammerstein models using LS-SVM. In: 15th IFAC Symposium on System Identification (2009) 2. van Wingerden, J.W., Verhaegen, M.: Closed-loop subspace identification of Hammerstein–Wiener models. In: Proceedings of the 48th IEEE Conference on Decision and Control, CDC (2009) 3. van Wingerden, J.W., Verhaegen, M.: Subspace identification of Bilinear and LPV systems for open and closed loop data. Automatica 45(2), 372–381 (2009) 4. Ding, F., Chen, T.: Identification of Hammerstein nonlinear ARMAX systems. Automatica 41(9), 1479–1489 (2005) 5. Bai, E.-W.: A blind approach to the Hammerstein-Wiener model identification. Automatica 38(6), 967–979 (2002) 6. Goethals, I., Pelckmans, K., Hoegaerts, L., Suykens, J.A.K., De Moor, B.: Subspace intersection identification of Hammerstein–Wiener systems. In: Proceedings of the 44th Conference on Decision and Control (CDC) and the European Control Conference (ECC), Seville, Spain (2005) 7. Suykens, J.A.K., van Gestel, T.V., De Brabanter, J., De Moor, B., Vandewalle, J.: Least squares support vector machines. World Scientific, Singapore (2002) 8. Jansson, M.: A new subspace identification method for open and closed loop data. In: Proceedings of the 16th IFAC World Congress, Czech Republic, Prague (2005) 9. Chiuso, A.: The role of vector auto regressive modeling in predictor based subspace identification. Automatica 43(6), 1034–1048 (2007) 10. Verhaegen, M., Dewilde, P.: Subspace model identification part 1: The output-error statespace model identification class of algorithms. International Journal of Control 56(5), 1187–1210 (1992) 11. Van Overschee, P., De Moor, B.: Subspace Identification for linear systems. Kluwer Academic Publishers, Dordrecht (1996) 12. Ljung, L., McKelvey, T.: Subspace identification from closed loop data. Signal Processing 52(2), 209–215 (1996) 13. Golub, G.H., van Loan, C.F.: Matrix computations. The John Hopkins University Press, Baltimore (1996)
Chapter 15
NARX Identification of Hammerstein Systems Using Least-Squares Support Vector Machines Ivan Goethals, Kristiaan Pelckmans, Tillmann Falck, Johan A.K. Suykens, and Bart De Moor
15.1 Introduction This chapter describes a method for the identification of a SISO and MIMO Hammerstein systems based on Least Squares Support Vector Machines (LS-SVMs). The aim of this chapter is to give a practical account of the works [14] and [15], adding to this material new insights published since. The identification method presented in this chapter gives estimates for the parameters governing the linear dynamic block represented as an ARX model, as well as for the unknown static nonlinear function. The method is essentially based on Bai’s overparametrisation technique, and combines this with a regularisation framework and a suitable model description which fits nicely within the LS-SVM framework with primal and dual model representations. This technique is found to cope effectively (i) with the illconditioning typically occurring in overparametrisation approaches, and (ii) with cases where no stringent assumptions can be made about the nature of the nonlinearity except for a certain degree of smoothness. Consider the task of modelling the nonlinear dynamic relation between an input signal (ut ∈ R)t and an output signal (yt ∈ R)t , both indexed over discrete time instants t ∈ Z. The main body of the paper will be concerned with SISO Hammerstein Ivan Goethals ING Life Belgium, Sint Michielswarande 70, B-1040 Etterbeek, Belgium e-mail:
[email protected] Kristiaan Pelckmans Uppsala University, Department of Information Technology, division Syscon, Box 337, SE-751 05, Uppsala, Sweden e-mail:
[email protected] Tillmann Falck, Johan A.K. Suykens, and Bart De Moor Katholieke Universiteit Leuven, ESAT-SCD-SISTA, Kasteelpark 10, B-3001 Leuven (Heverlee), Belgium e-mail: tillmann.falck,
[email protected],
[email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 241–258. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com
242
I. Goethals et al.
systems consisting of a (fixed but unknown) static nonlinearity f : R → R followed by a (fixed but unknown) ARX linear subsystem with a transfer function of orders m > 0 and n > 0 of the numerator and denominator respectively. The parameters of the ARX subsystem are denoted as ω = (a1 , . . . , an , b0 , . . . , bm )T ∈ Rn+m+1 , assuming the following model n
m
i=1
j=0
yt = ∑ ai yt−i + ∑ b j f (ut− j ) + et , ∀t ∈ Z.
(15.1)
The equation error et is assumed to be white and zero mean. An extension of the ideas developed in this chapter to systems exhibiting multiple inputs and outputs will be presented in Subsection 15.4.2. For a general survey of different existing techniques for identification of Hammerstein systems, we refer to the relevant chapters in this book. In brief, identification of systems of the form (15.1) is often performed by describing the non-linear function f using a finite set of parameters and identifying those parameters together with the linear parameters ω . Parametric approaches found in the literature express the static non-linearity as a sum of (orthogonal or non-orthogonal) basis functions [22, 24, 25], as a finite number of cubic spline functions [9] or as a set of piecewise linear functions [32, 38]. Another form of parametrisation lies in the use of neural networks [19]. Regardless of the parametrisation scheme that is chosen, the final cost function will involve cross products between parameters describing the static nonlinearity f , and the parameters θ describing the linear dynamical system. Employing a maximum likelihood criterion results in a so-called bi-convex optimisation problem where global convergence is not guaranteed [28]. Several approaches have been proposed to solve the bi-convex optimisation problems typically encountered in Hammerstein system-identification, such as iterative approaches [24] and stochastic approaches [2, 5, 6, 7, 17, 38]. Moreover, in order to find a good optimum for these techniques, a proper initialisation is crucial [7] in practical applications. In this chapter, we will focus on one particular approach, the so-called overparametrisation approach and demonstrate that the key ideas behind the overparametrisation approach can conveniently be combined with the method of Least Squares-Support Vector Machines regression to yield reliable ARX and even subspace identification algorithms for Hammerstein systems. The main practical benefits of this approach will include: (i) the increased (numerical) robustness due to the presence of a well-defined regularisation mechanism, (ii) the user is not required to restrict the form (’parametrisation’) of the nonlinearity a priori, but reliable results will be obtained whenever the nonlinearity can be assumed to be ‘smooth’ (as will be defined later) to some degree and (iii) the framework of LS-SVMs will make it possible to confine the overparametrised model class more than was the case in [1]. Those advantages will support the claim of practical efficiency as illustrated on a number of case-studies in [14] and [15]. Since the publication of the works [14] and [15] some progress has been made towards more general model structures, other loss functions or recursive identification schemes for such models (see Section 15.6).
15
NARX Identification of Hammerstein Systems Using LS-SVM
243
Fig. 15.1: General structure of a Hammerstein system, consisting of a static nonlinearity f and a linear subsystem with transfer function B(z)/A(z)
The outline of this chapter is as follows: in Section 15.2 the basic ideas behind overparametrisation are briefly reviewed. The use of LS-SVMs for static function estimation is described in Section 15.3. In Section 15.4 the ideas of Sections 15.2 and 15.3 are combined into the Hammerstein identification algorithm under study. Section 15.5 gives an illustrative example, and Section 15.6 will discuss extensions towards other block-structure models and towards a Hammerstein subspace identification algorithm. Section 15.7 gives concluding remarks.
15.2 Hammerstein Identification Using an Overparametrisation Approach This section focuses on classical overparametrisation techniques applied in Hammerstein identification such as presented in [1, 22, 24, 25, 35, 37]. The key idea behind overparametrisation is to transform the bi-convex optimisation problem into a convex one by replacing every crossproduct of unknowns by new independent parameters [1, 4]. In a second stage the obtained solution is projected onto the Hammerstein model class. A technical implementation of this idea is presented here below.
15.2.1 Implementation of Overparametrisation The idea of overparametrisation for Hammerstein systems is implemented here by substituting the product b j f (ut− j ) in (15.1) by separate non-linear functions g j (ut− j ) for all j = 1, . . . , m. This results in the following overparametrised model: n
m
i=1
j=0
yt = ∑ ai yt−i + ∑ g j (ut− j ) + et .
(15.2)
Note that this equation is linear in the parameters ai , i ∈ {1, . . . , n} and the nonlinear functions g j . When the {g j } j are appropriately parametrised, (15.2) can be solved for ai , i ∈ {1, . . . , n} and g j , j ∈ {0, . . . , m} using an ordinary least squares approach. In a second stage, we are interested in recovering the parameters b(m) = (b0 , . . . , bm ) ∈ Rm from the estimated functions {gˆ j : R → R} j . In order to do so, we concentrate on the function-values these functions take on the samples (ut )t , rather than on the functions themselves. This idea allows us to work further with tools from linear algebra, rather than working in a context of functional analysis. Hence, let the matrix G ∈ R(m+1)×(N−2m−1) be defined as
244
I. Goethals et al.
⎡
g0 (um ) ⎢ .. G=⎣ . gm (um )
⎤ g0 (uN−m ) ⎥ .. ⎦, .
...
(15.3)
. . . gm (uN−m )
and where Gˆ ∈ R(m+1)×(N−2m+1) is the same matrix formed by the functions {gˆ j } j estimated before. Now the key observation is that G = b(m) fNT . here fN ∈ RN−2m+1 is defined as fN = ( f (um ), . . . , f (uN−m ))T . Hence in the ideal case the matrix G is a rank-one matrix where the left- and right singular vector corresponding to the nonzero singular value, is proportional to fN and b(m) respectively. The practical way ˆ and to use the best rank-one decomposition to to proceed is now to replace G by G, ˆ give an estimate b(m) of b(m) . So far no specific parametrisation was assumed in the derivation above. If the m + 1 functions g j have a common parametrisation, one can perform this projection in the parameter space instead as follows. A common parametrisation involves writing the original static non-linearity f in (15.1) as a linear combination of n f general non-linear basis functions fk , each with a certain weight ck such nf that f (ut ) = ∑k=1 ck fk (ut ). The functions f1 , f2 , and fn f are thereby chosen beforehand. Note that this amounts to parametrising the functions g j in (15.2) as nf g j (ut ) = ∑k=1 θ j,k fk (ut ), with θ j,k = b j ck . Hence, The original model (15.1) is rewritten as n
yt
m
∑ aiyt−i + ∑
=
∑ b j ck fk (ut− j ) + et
(15.4)
∑ aiyt−i + ∑ ∑ θ j,k fk (ut− j ) + et ,
(15.5)
i=1
=
nf
n
j=0 k=1 m nf
i=1
j=0 k=1
which can be solved for θ j,k , j = 0, . . . , m, k = 1, . . . , n f using e.g. a least squares algorithm. Denoting the estimates for θ j,k by θˆ j,k , estimates for the b j and ck are thereafter recovered from the SVD of: ⎡ˆ ⎤ θ0,1 θˆ0,2 . . . θˆ0,n f ⎢ θˆ1,1 θˆ1,2 . . . θˆ1,n ⎥ f ⎥ ⎢ . (15.6) Θˆ = ⎢ . .. .. ⎥ ⎣ .. . . ⎦ θˆm,1 θˆm,2 . . . θˆm,n f ,
15.2.2 Potential Problems in Overparametrisation Estimating individual components in a sum of non-linearities is not without risks. Suppose for instance that m = 1, then Equation (15.2) can be rewritten as: n
yt
=
∑ ai yt−i + g0(ut ) + g1(ut−1 ) + et
i=1
(15.7)
15
NARX Identification of Hammerstein Systems Using LS-SVM
245
n
=
∑ ai yt−i + g0(ut ) + δ + g1(ut−1 ) − δ + et
(15.8)
∑ ai yt−i + g0(ut ) + g1(ut−1 ) + et ,
(15.9)
i=1 n
=
i=1
with δ an arbitrary constant and g0 (ut ) = g0 (ut ) + δ , g1 (ut−1 ) = g1 (ut−1 ) − δ . Similarly, note that for any set of variables εk , k = 1, . . . , n f with ∀u ∈ R, nf = ∑k=1 εk fk (u) = Constant and any set α j , j = 0, . . . , m such that ∑mj=0 α j = 0, θ j,k θ j,k + α j εk is also a solution to (15.5) [18]. Hence, given a sequence of input/output measurements, all non-linearities estimated on these measurements will only be determined up to a set of constants. This problem is often overlooked in existing overparametrisation techniques and may lead to conditioning problems and destroy the low-rank property of (15.6). In fact, many published overparametrisation approaches applied to more complex Hammerstein systems lead to results which are far from optimal if no measures are taken to overcome this problem [14]. Following the parametric notation one possible solution is to calculate: ⎤ ⎡ˆ ⎤⎡ θ0,1 θˆ0,2 . . . θˆ0,n f f1 (u1 ) . . . f1 (uN ) ⎢ θˆ1,1 θˆ1,2 . . . θˆ1,n ⎥ ⎢ f2 (u1 ) . . . f2 (uN ) ⎥ f ⎥⎢ ⎥ ⎢ (15.10) Gˆ = ⎢ . .. ⎥ , .. .. ⎥ ⎢ .. ⎣ ⎦ ⎣ .. ⎦ . . . . fn f (u1 ) . . . fn f (uN ) θˆm,1 θˆm,2 . . . θˆm,n f subtract the mean of every row in Gˆ and take the SVD of the remaining matrix, from which estimates for the b j can be extracted. Estimates for the ck can then be found in a second round by solving (15.4). This identifiability issue is dealt with properly in the LS-SVM approach as described in the next section.
15.3 Function Approximation Using Least Squares Support Vector Machines In this section, we review some elements of Least Squares Support Vector Machines (LS-SVMs) for static function approximation. The theory reviewed here will be extended to the estimation of Hammerstein systems in Section 15.4. This framework has strong connections to research on RBF- and regularisation networks, Gaussian processes, smoothing splines and dual ridge regression amongst others, see e.g. [30] for a thorough overview. N ⊂ Rd × R be a set of input/output training data (x , y ) with an Let {(xt , yt )}t=1 t t d input xt ∈ R and output yt ∈ R. Consider the regression model yt = f (xt )+ et where x1 , . . . , xN are deterministic points, f : Rd → R is an unknown real-valued function and e1 , . . . , eN are uncorrelated random errors with E [et ] = 0, E et2 = σe2 < ∞. In recent years, Support Vector Machines (SVMs) [33] have been used for the purpose of estimating the non-linear f . The following model is assumed:
246
I. Goethals et al.
f (x) = wT ϕ (x) + b, where ϕ (x) : Rd → RnH denotes a potentially infinite (nH = ∞) dimensional feature map, w ∈ RnH , b ∈ R. The regularised cost function of the Least Squares SVM (LS-SVM) [30] is given as min J (w, e)
=
1 T γ n w w + ∑ et2 , 2 2 t=1
subject to : yt
=
wT ϕ (xt ) + b + et , t = 1, . . . , N.
w,b,e
The relative importance between the smoothness of the solution and the data fitting is governed by the scalar γ ∈ R+ 0 , referred to as the regularisation constant. The optimisation performed corresponds to ridge regression [16] in feature space. In order to solve the constrained optimisation problem, a Lagrangian is constructed: N
L (w, b, e; α ) = J (w, e) − ∑ αt {wT ϕ (xt ) + b + et − yt }, t=1
with αt for t = 1, . . . , N the Lagrange multipliers. The conditions for optimality are given as: N ∂L = 0 → w = ∑ αt ϕ (xt ), ∂w t=1
(15.11)
∂L =0 → ∂b
(15.12)
N
∑ αt = 0,
t=1
∂L = 0 → αt = γ et , t = 1, . . . , N, ∂ et ∂L = 0 → yt = wT ϕ (xt ) + b + et , t = 1, . . . , N. ∂ αt
(15.13) (15.14)
Substituting (15.11)-(15.13) into (15.14) yields the following dual problem (i.e. the problem in the Lagrange multipliers): 3 4 43 4 3 0 b 0 1N T , (15.15) = α y 1N Ω + γ −1IN T T T where y = y1 . . . yN ∈ RN , 1N = 1 . . . 1 ∈ RN , α = α1 . . . αN ∈ RN , and the matrix Ω ∈ RN×N where Ωi j = K(xi , x j ) = ϕ (xi )T ϕ (x j ), ∀i, j = 1, . . . , N, with K the positive definite kernel function. Note that in order to solve the set of equations (15.15), the feature map ϕ does never have to be defined explicitly. Only its inner product, a positive definite Mercer kernel, is needed. This is called the kernel trick [27, 33]. For the choice of the kernel K(·, ·), see e.g. [27]. Typical examples are the use of a linear kernel K(xi , x j ) = xTi x j , a polynomial kernel K(xi , x j ) = (τ + xTi x j )d , τ ≥ 0 of degree d or an RBF kernel K(xi , x j ) = exp(−xi − x j 22 /σ 2 )
15
NARX Identification of Hammerstein Systems Using LS-SVM
247
where σ denotes the bandwidth of the kernel. The resulting LS-SVM model for function estimation can be evaluated at a new point x∗ as fˆ(x∗ ) =
N
∑ αt K(x∗ , xt ) + b,
t=1
where (b, α ) is the solution to (15.15). Note that in the above, no indication is given as to how to choose free parameters such as the regularisation constant γ and the bandwidth σ in an RBF kernel. These parameters, which are generally referred to as hyper-parameters will have to be obtained from data, e.g. by tuning on an independent validation dataset, or by using cross-validation [18]. Besides the function estimation case, the class of LS-SVMs also includes classification, kernel PCA (principal component analysis), kernel CCA, kernel PLS (partial least squares), recurrent networks and solutions to non-linear optimal control problems. For an overview on applications of the LS-SVM framework, the reader is referred to [30] and citations.
15.4 NARX Hammerstein Identification as a Componentwise LS-SVM The key advantage of the use of overparametrisation as introduced in Section 15.2 is the particularly attractive convexity property. An essential problem with the overparametrisation approach however is the increased variance of the estimates due to the increased number of unknowns in the first stage. In this section, we will demonstrate that the ideas behind the overparametrisation approach can conveniently be combined with the method of LS-SVM regression which (i) features an inherent regularisation framework to deal with the increased number of unknowns, and (ii) enables one to deal properly with the identifiability issue discussed in Subsection 2.2. For instructive purposes, we will again focus on systems in SISO form and deal with the extension to MIMO systems later.
15.4.1 SISO Systems In line with LS-SVM function approximation, we replace every function g j (ut− j ) in (15.2) by wTj ϕ (ut− j ) with ϕ : R → RnH a fixed, potentially infinite (nH = ∞) dimensional feature map and for every j = 0, . . . , m, w j ∈ RnH . Adding an additional constant d, the reason of which will become clear below, the overparametrised model in Equation (15.2) can be written as n
m
i=1
j=0
yt = ∑ ai yt−i + ∑ wTj ϕ (ut− j ) + d + et .
(15.16)
With r = max(m, n) + 1, the regularised cost function of LS-SVM is given as:
248
I. Goethals et al.
min J (w j , e) =
w j ,a,d,e
1 2
m
1
N
∑ wTj w j + γ 2 ∑ et2 ,
(15.17)
t=r
j=0
subject to m
n
j=0
i=1
∑ wTj ϕ (ut− j ) + ∑ ai yt−i + d + et − yt
=
0, t = r, . . . , N,
(15.18)
=
0, j = 0, . . . , m.
(15.19)
N
∑ wTj ϕ (uk )
k=1
The problem (15.17)–(15.19) is known as a component-wise LS-SVM regression problem. It was described in [26] and may be traced back to earlier research on additive models using smoothing splines, see e.g. [36] and the references therein. The term component-wise refers to the fact that the output is ultimately written as the sum of a set of linear and non-linear components. As will be seen shortly, the derivation of a solution to a component-wise LS-SVM problem follows the same approach as in the standard LS-SVM setting, with primal and dual model representations. Note the additional constraints (15.19), which centre the non-linear functions w_j^T φ(·), for all j = 0, ..., m, around their average over the training set. These constraints resolve the identifiability issue described in Subsection 15.2.2: they remove the uncertainty resulting from the fact that any set of constants can be added to the terms of the additive non-linear function (15.16), as long as the sum of the constants is zero. Observe that the constraints (15.19) correspond to removing the mean of every row in Ĝ in (15.10). Removing the mean will facilitate the extraction of the parameters b_j in (15.1) later. Furthermore, the constraints enable us to give a clear meaning to the bias parameter d, namely d = ∑_{j=0}^{m} b_j (1/N) ∑_{k=1}^{N} f(u_k). Hence, constraints of the form (15.19) can be included naturally in the objective function of the LS-SVM, avoiding the need for a separate normalisation step in the identification procedure.

Lemma 15.1. Let again r = max(m, n) + 1. Given the system (15.16), the LS-SVM estimates for the non-linear functions w_j^T φ: R → R, j = 0, ..., m, are given as:
\[ w_j^T \varphi(u_*) = \sum_{t=r}^{N} \alpha_t K(u_{t-j}, u_*) + \beta_j \sum_{t=1}^{N} K(u_t, u_*), \tag{15.20} \]
where the parameters α_t, t = r, ..., N, and β_j, j = 0, ..., m, as well as the linear model parameters a_i, i = 1, ..., n, and d are obtained from the following set of linear equations:
\[
\begin{bmatrix}
0 & 0 & \mathbf{1}^T & 0 \\
0 & 0 & Y_p & 0 \\
\mathbf{1} & Y_p^T & \mathcal{K} + \gamma^{-1} I & \mathcal{K}_0^T \\
0 & 0 & \mathcal{K}_0 & (\mathbf{1}_N^T \Omega\, \mathbf{1}_N) I_{m+1}
\end{bmatrix}
\begin{bmatrix} d \\ a \\ \alpha \\ \beta \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ Y_f \\ 0 \end{bmatrix},
\tag{15.21}
\]
with
\[ a = \begin{bmatrix} a_1 & \ldots & a_n \end{bmatrix}^T, \tag{15.22} \]
\[ \beta = \begin{bmatrix} \beta_0 & \ldots & \beta_m \end{bmatrix}^T, \tag{15.23} \]
\[ \mathcal{K}_0(p, q) = \sum_{t=1}^{N} \Omega_{t,\, r+p-q} = \sum_{t=1}^{N} K(u_t, u_{r+p-q}), \tag{15.24} \]
\[ \mathcal{K}(p, q) = \sum_{j=0}^{m} \Omega_{p+r-j-1,\, q+r-j-1} = \sum_{j=0}^{m} K(u_{p+r-j-1}, u_{q+r-j-1}), \tag{15.25} \]
\[ Y_p = \begin{bmatrix}
y_{r-1} & y_r & \ldots & y_{N-1} \\
y_{r-2} & y_{r-1} & \ldots & y_{N-2} \\
\vdots & \vdots & & \vdots \\
y_{r-n} & y_{r-n+1} & \ldots & y_{N-n}
\end{bmatrix}, \tag{15.26} \]
\[ Y_f = \begin{bmatrix} y_r & y_{r+1} & \ldots & y_N \end{bmatrix}^T, \tag{15.27} \]
and 1_N is a column vector of length N with all elements equal to 1.

Proof. The proof is found in [14].
Note that the matrix 𝒦, which appears on the left-hand side of (15.21) and plays a role similar to that of the kernel matrix Ω in (15.15), actually represents a sum of kernels in (15.21). This is a typical property of the solution of component-wise LS-SVM problems [26]. Note also that ill-conditioning only arises if the matrix Y_p has zero singular values: the regularisation term γ^{-1} I_N avoids ill-conditioning even if 𝒦 is singular, as would be the case when the signal (u_t)_t is constant (and hence not persistently exciting of any order).

Projecting onto the class of ARX Hammerstein models

The projection of the obtained model onto (15.1) goes as follows. Estimates for the autoregressive parameters a_i, i = 1, ..., n, are directly obtained from (15.21). Furthermore, for the training input sequence [u_1 ... u_N], we have:
\[
\begin{bmatrix} \hat f(u_1) \\ \vdots \\ \hat f(u_N) \end{bmatrix}
\begin{bmatrix} \hat b_0 \\ \vdots \\ \hat b_m \end{bmatrix}^T
=
\left(
\begin{bmatrix}
\alpha_N & \ldots & \alpha_r & & & 0 \\
 & \alpha_N & \ldots & \alpha_r & & \\
 & & \ddots & & \ddots & \\
0 & & & \alpha_N & \ldots & \alpha_r
\end{bmatrix}
\begin{bmatrix}
\Omega_{N,1} & \Omega_{N,2} & \ldots & \Omega_{N,N} \\
\Omega_{N-1,1} & \Omega_{N-1,2} & \ldots & \Omega_{N-1,N} \\
\vdots & \vdots & & \vdots \\
\Omega_{r-m,1} & \Omega_{r-m,2} & \ldots & \Omega_{r-m,N}
\end{bmatrix}
\right)^T
+
\left( \sum_{t=1}^{N} \begin{bmatrix} \Omega_{t,1} \\ \vdots \\ \Omega_{t,N} \end{bmatrix} \right)
\begin{bmatrix} \beta_0 \\ \vdots \\ \beta_m \end{bmatrix}^T,
\tag{15.28}
\]
with f̂(u) an estimate for
\[ \tilde f(u) = f(u) - \frac{1}{N} \sum_{t=1}^{N} f(u_t). \tag{15.29} \]
Hence, estimates for b_j and for the static non-linearity f evaluated in {u_t}_{t=1}^{N} can be obtained from a rank-1 approximation of the right-hand side of (15.28), for instance using a singular value decomposition. Again, this is the equivalent of the SVD step that is generally encountered in overparametrisation methods [1, 4]. Once all the elements b_j are known, ∑_{k=1}^{N} f(u_k) can be obtained as ∑_{k=1}^{N} f(u_k) = Nd / ∑_{j=0}^{m} b_j. In a second step, a parametric estimate of f can be obtained by applying classical LS-SVM function approximation to the couples {(u_t, f̂(u_t))}_{t=1}^{N}.

Algorithm 15.1. LS-SVM Hammerstein identification – a SISO algorithm
1. Choose a kernel K and a regularisation constant γ.
2. Calculate the componentwise kernel matrix 𝒦 as a sum of individual kernel matrices, as described in (15.25).
3. Solve (15.21) for d, a, α, β.
4. Apply (15.21) to a validation set or use cross-validation. Go to step 1 and change the kernel parameters and/or γ until optimal performance is obtained.
5. Take the SVD of the right-hand side of (15.28) to determine the linear parameters b_0, ..., b_m.
6. Obtain estimates {f̂(u_t)}_{t=1}^{N} from (15.28) and (15.29).
7. If a parametric estimate of f is needed, apply LS-SVM function estimation on {(u_t, f̂(u_t))}_{t=1}^{N}.
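To make the procedure concrete, the following is a minimal Python/NumPy sketch of steps 2–3 of Algorithm 15.1. The function names, the RBF kernel choice and the 0-based indexing conventions are our own assumptions; the chapter itself prescribes no implementation, and u, y are taken to be 1-D arrays of the recorded input and output.

```python
import numpy as np

def rbf(a, b, sigma):
    """RBF kernel K(a, b) = exp(-(a - b)^2 / sigma^2), evaluated on all pairs."""
    return np.exp(-np.subtract.outer(a, b) ** 2 / sigma ** 2)

def lssvm_hammerstein_siso(u, y, n, m, gamma, sigma):
    """Hedged sketch of steps 2-3 of Algorithm 15.1 (hypothetical helper)."""
    N = len(y)
    r = max(m, n) + 1
    T = N - r + 1                              # equations for t = r, ..., N
    Omega = rbf(u, u, sigma)                   # Omega[p, q] = K(u_{p+1}, u_{q+1})

    # Componentwise kernel matrix (15.25): a sum of (m+1) shifted kernel blocks.
    K = np.zeros((T, T))
    for j in range(m + 1):
        idx = np.arange(r - 1 - j, N - j)      # 0-based indices of u_{t-j}, t = r..N
        K += Omega[np.ix_(idx, idx)]

    # Centring block, cf. (15.24): row j sums K(u_k, u_{t-j}) over all k.
    K0 = np.vstack([Omega[:, np.arange(r - 1 - j, N - j)].sum(axis=0)
                    for j in range(m + 1)])

    # Past outputs Yp (15.26) and future outputs Yf (15.27).
    Yp = np.vstack([y[r - 1 - i:N - i] for i in range(1, n + 1)])
    Yf = y[r - 1:N]

    # Assemble and solve the linear system (15.21) for d, a, alpha, beta.
    one = np.ones((T, 1))
    Z = lambda p, q: np.zeros((p, q))
    A = np.block([
        [Z(1, 1),     Z(1, n),     one.T,                  Z(1, m + 1)],
        [Z(n, 1),     Z(n, n),     Yp,                     Z(n, m + 1)],
        [one,         Yp.T,        K + np.eye(T) / gamma,  K0.T],
        [Z(m + 1, 1), Z(m + 1, n), K0,                     Omega.sum() * np.eye(m + 1)],
    ])
    rhs = np.concatenate([np.zeros(1 + n), Yf, np.zeros(m + 1)])
    sol = np.linalg.solve(A, rhs)
    d, a = sol[0], sol[1:1 + n]
    alpha, beta = sol[1 + n:1 + n + T], sol[1 + n + T:]
    return d, a, alpha, beta
```

The SVD of step 5 then operates on the rank-1 matrix assembled from α and β according to (15.28).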
15.4.2 Identification of Hammerstein MIMO Systems

Conceptually, an extension of the method presented in the previous section towards the MIMO case is straightforward, but the calculations involved are quite extensive. Assuming a MIMO Hammerstein system of the form
\[ y_t = \sum_{i=1}^{n} A_i y_{t-i} + \sum_{j=0}^{m} B_j f(u_{t-j}) + e_t, \tag{15.30} \]
with y_t, e_t ∈ R^{n_y}, u_t ∈ R^{n_u}, A_i ∈ R^{n_y × n_y}, B_j ∈ R^{n_y × n_u}, t = 1, ..., N, i = 1, ..., n, j = 0, ..., m, and f: R^{n_u} → R^{n_u}: u ↦ f(u) = [f_1(u) ... f_{n_u}(u)]^T, we have for every row s in (15.30) that
\[ y_t(s) = \sum_{i=1}^{n} A_i(s,:)\, y_{t-i} + \sum_{j=0}^{m} B_j(s,:)\, f(u_{t-j}) + e_t(s). \tag{15.31} \]
Note that for every non-singular matrix V ∈ R^{n_u × n_u}, and for any j = 0, ..., m,
\[ B_j(s,:)\, f(u_{t-j}) = B_j(s,:)\, V V^{-1} f(u_{t-j}), \tag{15.32} \]
with B_j(s,:) denoting row s of the matrix B_j. Hence, any model of the form (15.30) can be replaced with an equivalent model by applying a linear transformation to the components of f and the columns of B_j. This will have to be taken into account when identifying models of the form (15.30) without any prior knowledge of the non-linearity involved. Substituting f(u) = [f_1(u) ... f_{n_u}(u)]^T in (15.31) leads to:
\[ y_t(s) = \sum_{i=1}^{n} A_i(s,:)\, y_{t-i} + \sum_{j=0}^{m} \sum_{k=1}^{n_u} B_j(s,k)\, f_k(u_{t-j}) + e_t(s). \tag{15.33} \]
By replacing ∑_{k=1}^{n_u} B_j(s,k) f_k(u_{t-j}) by w_{j,s}^T φ(u_{t-j}) + d_{s,j}, this reduces to
\[ y_t(s) = \sum_{i=1}^{n} A_i(s,:)\, y_{t-i} + \sum_{j=0}^{m} w_{j,s}^T \varphi(u_{t-j}) + d_s + e_t(s), \tag{15.34} \]
where
\[ d_s = \sum_{j=0}^{m} d_{s,j}. \tag{15.35} \]
The primal problem that is subsequently obtained is the following:
\[ \min_{w_{j,s},\, e} \; \mathcal{J}(w_{j,s}, e) = \sum_{j=0}^{m} \sum_{s=1}^{n_y} \frac{1}{2} w_{j,s}^T w_{j,s} + \sum_{s=1}^{n_y} \sum_{t=r}^{N} \frac{\gamma_s}{2}\, e_t(s)^2, \tag{15.36} \]
subject to (15.34) and ∑_{k=1}^{N} w_{j,s}^T φ(u_k) = 0, j = 0, ..., m, s = 1, ..., n_y.

Lemma 15.2. Given the primal problem (15.36), the LS-SVM estimates for the non-linear functions w_{j,s}^T φ: R^{n_u} → R, j = 0, ..., m, s = 1, ..., n_y, are given as:
\[ w_{j,s}^T \varphi(u_*) = \sum_{t=r}^{N} \alpha_{t,s} K(u_{t-j}, u_*) + \beta_{j,s} \sum_{t=1}^{N} K(u_t, u_*), \tag{15.37} \]
where the parameters α_{t,s}, t = r, ..., N, s = 1, ..., n_y, and β_{j,s}, j = 0, ..., m, s = 1, ..., n_y, as well as the linear model parameters A_i, i = 1, ..., n, and d_s, s = 1, ..., n_y, are obtained from the following set of linear equations:
\[ \begin{bmatrix} L_1 & & \\ & \ddots & \\ & & L_{n_y} \end{bmatrix} \begin{bmatrix} X_1 \\ \vdots \\ X_{n_y} \end{bmatrix} = \begin{bmatrix} R_1 \\ \vdots \\ R_{n_y} \end{bmatrix}, \tag{15.38} \]
where
\[ L_s = \begin{bmatrix}
0 & 0 & \mathbf{1}^T & 0 \\
0 & 0 & Y_p & 0 \\
\mathbf{1} & Y_p^T & \mathcal{K} + \gamma_s^{-1} I & \mathcal{S} \\
0 & 0 & \mathcal{S}^T & \mathcal{T}
\end{bmatrix}, \quad
X_s = \begin{bmatrix} d_s \\ A_s \\ \alpha_s \\ \beta_s \end{bmatrix}, \tag{15.39} \]
\[ R_s = \begin{bmatrix} 0 & 0 & Y_{f,s}^T & 0 \end{bmatrix}^T, \quad Y_{f,s} = \begin{bmatrix} y_r(s) & \ldots & y_N(s) \end{bmatrix}^T, \tag{15.40} \]
\[ A_s = \begin{bmatrix} A_1(s,:)^T \\ \vdots \\ A_n(s,:)^T \end{bmatrix}, \quad
\alpha_s = \begin{bmatrix} \alpha_{r,s} \\ \vdots \\ \alpha_{N,s} \end{bmatrix}, \quad
\mathcal{S}(p, q) = \sum_{t=1}^{N} \Omega_{t,\, r+p-q}, \tag{15.41} \]
\[ \beta_s = \begin{bmatrix} \beta_{0,s} & \ldots & \beta_{m,s} \end{bmatrix}^T, \quad
\Omega_{p,q} = \varphi(u_p)^T \varphi(u_q), \quad
\mathcal{K}(p, q) = \sum_{j=0}^{m} \Omega_{p+r-j-1,\, q+r-j-1}, \tag{15.42} \]
\[ \mathcal{T} = (\mathbf{1}_N^T \Omega\, \mathbf{1}_N) \cdot I_{m+1}. \tag{15.43} \]

Proof. The proof is found in [14].
Note that the matrices L_s, s = 1, ..., n_y, in (15.38) are almost identical; they differ only in the regularisation constants γ_s. In many practical cases, and if there is no reason to assume that a certain output is more important than another, it is recommended to set γ_1 = γ_2 = ... = γ_{n_y}. This speeds up the estimation algorithm, since L_1 = L_2 = ... = L_{n_y} needs to be calculated only once, but most importantly, it reduces the number of hyper-parameters to be tuned.

Projection onto the class of ARX Hammerstein models

The projection of the obtained model onto (15.33) is similar to the SISO case. Estimates for the autoregressive matrices A_i, i = 1, ..., n, are directly obtained from (15.38). For the training input sequence [u_1 ... u_N] and every k = 1, ..., n_u, we have:
\[
\begin{bmatrix}
B_0(1,:) \\ \vdots \\ B_m(1,:) \\ \vdots \\ B_0(n_y,:) \\ \vdots \\ B_m(n_y,:)
\end{bmatrix}
\begin{bmatrix}
\hat f^T(u_1) \\ \vdots \\ \hat f^T(u_N)
\end{bmatrix}^T
=
\begin{bmatrix}
\beta_{0,1} \\ \vdots \\ \beta_{m,1} \\ \vdots \\ \beta_{0,n_y} \\ \vdots \\ \beta_{m,n_y}
\end{bmatrix}
\left( \sum_{t=1}^{N} \begin{bmatrix} \Omega_{t,1} \\ \vdots \\ \Omega_{t,N} \end{bmatrix} \right)^T
+
\begin{bmatrix} A_1 \\ \vdots \\ A_{n_y} \end{bmatrix}
\begin{bmatrix}
\Omega_{N,1} & \Omega_{N,2} & \ldots & \Omega_{N,N} \\
\Omega_{N-1,1} & \Omega_{N-1,2} & \ldots & \Omega_{N-1,N} \\
\vdots & \vdots & & \vdots \\
\Omega_{r-m,1} & \Omega_{r-m,2} & \ldots & \Omega_{r-m,N}
\end{bmatrix},
\tag{15.44}
\]
with f̂(u) an estimate for
\[ \tilde f(u) = f(u) - g, \tag{15.45} \]
and g a constant vector such that:
\[ \sum_{j=0}^{m} B_j\, g = \begin{bmatrix} d_1 \\ \vdots \\ d_{n_y} \end{bmatrix}. \tag{15.46} \]
Estimates for f̃ and for the B_j, j = 0, ..., m, can be obtained through a rank-n_u approximation of the right-hand side of (15.44). If a singular value decomposition is used, the resulting columns of the left-hand side matrix of (15.44) containing the elements of B_j, j = 0, ..., m, can be made orthonormal, effectively fixing the choice of V in (15.32). From the estimates for f̃ in (15.45) and g in (15.46), finally, an estimate for the non-linear function f can be obtained. Note that if the row rank of ∑_{j=0}^{m} B_j is smaller than the column rank, multiple choices for g are possible. This is an inherent property of blind MIMO Hammerstein identification; the choice of a particular g is left to the user.

Algorithm 15.2. LS-SVM Hammerstein identification – a MIMO algorithm
1. Choose a kernel K and regularisation constants γ = γ_1 = ... = γ_{n_y}.
2. Calculate the componentwise kernel matrix 𝒦 as a sum of individual kernel matrices, as described in (15.42).
3. Solve (15.38) for α, β, d_1, ..., d_{n_y} and A_1, ..., A_n.
4. Apply (15.38) to a validation set or use cross-validation. Go to step 1 and change the kernel parameters and/or γ until optimal performance is obtained.
5. Take the SVD (rank-n_u approximation) of the right-hand side of (15.44) to determine the linear parameters B_0, ..., B_m.
6. Obtain estimates {f̂(u_t)}_{t=1}^{N} from (15.38), (15.45) and (15.46).
7. If a parametric estimate of f is needed, apply LS-SVM function estimation on {(u_t, f̂_k(u_t))}_{t=1}^{N}, k = 1, ..., n_u.
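As an illustration of step 5 of Algorithm 15.2, the following hedged sketch performs the rank-n_u split of (15.44), assuming its right-hand side has already been assembled into a matrix M of size (m+1)n_y × N; the helper name and the orthonormalisation convention are our own choices, not the authors'.

```python
import numpy as np

def split_rank_nu(M, n_u, m, n_y):
    """Rank-n_u factorisation of the right-hand side of (15.44) via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    B_stack = U[:, :n_u]                        # orthonormal columns fix V in (15.32)
    F = (np.diag(s[:n_u]) @ Vt[:n_u, :]).T      # row t approximates the centred f at u_t
    # Undo the row stacking [B_0(1,:); ...; B_m(1,:); ...; B_0(n_y,:); ...; B_m(n_y,:)].
    B = [B_stack[j::m + 1, :] for j in range(m + 1)]   # B[j] has shape (n_y, n_u)
    return B, F
```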
15.5 Example

This section illustrates the importance of the centring constraints and of a careful selection of the model parameters. Consider the following example: f(u) = sinc(u)u² is the true nonlinearity and the linear subsystem is of 6th order, described by A(z) = (z − 0.98e^{±i})(z − 0.98e^{±1.6i})(z − 0.97e^{±0.4i}) and B(z) = (z − 0.2)(z + 0.94)(z − 0.93e^{±0.7i})(z − 0.96e^{±1.3i}). The importance of the centring constraints is most visible for systems with "challenging numerators", in this example characterised by system zeros close to the unit circle. In contrast to the examples presented in [14] and [15], a minimal number of data points is used to illustrate the importance of proper tuning of the hyper-parameters
Fig. 15.2: Reconstruction of the true nonlinearity f(u) = sinc(u)u² (left panel, solid line) and of the 6th-order linear subsystem (right panel, solid line) by an LS-SVM model with centring constraints (dashed line, σ = 0.85, γ = 167) and without centring constraints (dotted line, σ = 1.17, γ = 13.7). Additionally, the reconstruction for a model with a badly chosen regularisation constant (σ = 0.85, γ = 0.167) is shown by the loosely dotted line. All models are estimated from 95 samples (circles) with additive white Gaussian output noise of variance 0.1². The model with centring constraints and a carefully selected regularisation constant estimates the nonlinearity as well as the linear subsystem better than the other models
and the use of centring constraints. The true system is simulated for 190 time steps with white Gaussian noise of unit variance as input. The output is subject to additive white Gaussian noise with variance 0.1². Half of the samples are used as a validation set for model selection, which is carried out using a grid search. The hyper-parameters that need to be selected are the regularisation constant γ and the bandwidth σ of the RBF kernel. For illustration purposes, we performed the algorithm outlined at the end of Section 15.4.1 twice: once with centring constraints and once without. Figure 15.2 shows the reconstruction of the true nonlinearity and of the linear subsystem as obtained for both models. Additionally, the reconstruction resulting from a badly tuned model with centring constraints is shown. It can be seen that the reconstruction using the properly tuned model with centring constraints is substantially better than the one obtained using a model without centring constraints or with badly chosen model parameters.
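The grid search used here for model selection can be sketched as follows. In this sketch, `fit` and `validation_mse` are hypothetical helpers (wrapping the estimation step of Algorithm 15.1 and the evaluation of the resulting predictor on the validation half), and the grid ranges are arbitrary choices, not values from the chapter.

```python
import numpy as np

def grid_search(u_est, y_est, u_val, y_val, n, m, fit, validation_mse):
    """Hedged sketch of hyper-parameter selection for (sigma, gamma)."""
    best_pair, best_mse = None, np.inf
    for sigma in np.logspace(-1, 1, 20):       # candidate RBF bandwidths
        for gamma in np.logspace(-2, 4, 20):   # candidate regularisation constants
            model = fit(u_est, y_est, n, m, gamma, sigma)
            mse = validation_mse(model, u_val, y_val)
            if mse < best_mse:
                best_pair, best_mse = (sigma, gamma), mse
    return best_pair
```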
15.6 Extensions

Various extensions to the concepts introduced in Section 15.4 exist. We provide a brief summary below and refer the reader to the included references for further details:
• Subspace identification: The ARX model class is popular due to (i) its simple structure and (ii) the fact that its parameters can be estimated conveniently as an ordinary least squares problem. Nevertheless, ARX models are not suited for the identification of linear dynamical systems under certain experimental conditions, such as the presence of heavily coloured noise on the outputs. As a result, an extension of the concepts presented in Section 15.4 to the broader and more robust class of subspace identification algorithms is desirable. Such an extension was presented in [15] and is based on the observation that the oblique projection, which features in one way or another in most linear subspace identification algorithms [31], can be written as a least squares optimisation problem, not unlike the one encountered in linear ARX identification. As in the ARX case, adding a static non-linearity f ultimately transforms this least squares optimisation problem into a bi-convex optimisation that can be solved using a componentwise LS-SVM. In [20] this has been further extended to closed-loop measurements.
• Recursive subspace identification: Over the last few years, various recursive versions of linear subspace identification algorithms have been presented [12, 21, 23]. In [3], it is shown that the Hammerstein subspace identification algorithm presented in [14] can also be transformed into recursive form, allowing for its use in on-line applications.
• Identification of block-structured models: The ideas behind the above-described approach can readily be extended towards the identification of more general block-structured models. The identification of Hammerstein–Wiener systems with an invertible output non-linearity was described in [13]. This result is based on the observation that, in so-called subspace intersection algorithms, a realisation of the internal states of a linear system is obtained as the intersection of spaces spanned by measured inputs and outputs. Numerically, the intersection can be calculated using a Canonical Correlation Analysis, which in turn can be extended towards its kernel equivalent using the so-called KCCA algorithm, see e.g. [34]. Identification of general Wiener–Hammerstein models was described in [10], using a slight extension of the overparametrisation technique.
• Large-scale problems: Extensions using fixed-size kernel methods for large datasets, described in [30], were used to extend the kernel-based approach towards a method able to deal with O(10^6) training data points, see e.g. [10] or [8].
15.7 Outlook

This chapter gave an account of an identification method for Hammerstein systems integrating kernel methods with Bai's overparametrisation technique. Illustrations of this technique on real data can be found in e.g. [14, 11] and [10, 8]. While the method
does not directly exploit any assumption on the inputs (such as whiteness), the influence of persistency of excitation is not well understood in such approaches (see e.g. [29] for the specific case where polynomial basis functions were used). However, regularisation is found to deal effectively with a lack of persistency; a thorough theoretical understanding of this observation is still missing. A second open question around this approach concerns the overall asymptotic performance (including bias, consistency and variance expressions). The main difficulty is that the overparametrisation technique in general lacks a global objective function; as a result, the necessary conditions for the method to work well are not fully established. From a more applied perspective, the extension to identification of Wiener structures is covered only in special cases and needs more work.

Acknowledgements. Tillmann Falck, Johan Suykens and Bart De Moor are supported by Research Council KUL: GOA AMBioRICS, GOA MaNet, CoE EF/05/006 Optimization in Engineering (OPTEC), IOF-SCORES4CHEM, several PhD/postdoc & fellow grants; Flemish Government: FWO: PhD/postdoc grants, projects G.0452.04 (new quantum algorithms), G.0499.04 (Statistics), G.0211.05 (Nonlinear), G.0226.06 (cooperative systems and optimization), G.0321.06 (Tensors), G.0302.07 (SVM/Kernel), G.0320.08 (convex MPC), G.0558.08 (Robust MHE), G.0557.08 (Glycemia2), G.0588.09 (Brain-machine); research communities (ICCoS, ANMMM, MLDM); G.0377.09 (Mechatronics MPC); IWT: PhD Grants, McKnow-E, Eureka-Flite+, SBO LeCoPro, SBO Climaqs, POM; Belgian Federal Science Policy Office: IUAP P6/04 (DYSCO, Dynamical systems, control and optimization, 2007-2011); EU: ERNSI; FP7-HD-MPC (INFSO-ICT-223854), COST intelliCIS, EMBOCOM; Contract Research: AMINAL; Other: Helmholtz: viCERP, ACCM, Bauknecht, Hoerbiger. Ivan Goethals is a senior actuary at ING Life Belgium. Johan Suykens is a professor and Bart De Moor is a full professor at the Katholieke Universiteit Leuven, Belgium.
References

1. Bai, E.W.: An optimal two-stage identification algorithm for Hammerstein–Wiener nonlinear systems. Automatica 34(3), 333–338 (1998)
2. Bai, E.W.: A blind approach to Hammerstein model identification. IEEE Transactions on Signal Processing 50(7), 1610–1619 (2002)
3. Bako, L., Mercère, G., Lecoeuche, S., Lovera, M.: Recursive subspace identification of Hammerstein models based on least squares support vector machines. IET Control Theory & Applications 3, 1209–1216 (2009)
4. Chang, F.H.I., Luus, R.: A noniterative method for identification using the Hammerstein model. IEEE Transactions on Automatic Control 16, 464–468 (1971)
5. Crama, P.: Identification of block-oriented nonlinear models. PhD thesis, Vrije Universiteit Brussel, Dept. ELEC (2004)
6. Crama, P., Schoukens, J.: Hammerstein–Wiener system estimator initialization. In: Proc. of the International Conference on Noise and Vibration Engineering (ISMA 2002), Leuven, pp. 1169–1176 (2002)
7. Crama, P., Schoukens, J.: Initial estimates of Wiener and Hammerstein systems using multisine excitation. IEEE Transactions on Instrumentation and Measurement 50(6), 1791–1795 (2001)
8. De Brabanter, K., Dreesen, P., Karsmakers, P., Pelckmans, K., De Brabanter, J., Suykens, J.A.K., De Moor, B.: Fixed-size LS-SVM applied to the Wiener–Hammerstein benchmark. In: Proceedings of the 15th IFAC Symposium on System Identification (SYSID 2009), Saint-Malo, France, pp. 826–831 (2009)
9. Dempsey, E.J., Westwick, D.T.: Identification of Hammerstein models with cubic spline nonlinearities. IEEE Transactions on Biomedical Engineering 51, 237–245 (2004)
10. Falck, T., Pelckmans, K., Suykens, J.A.K., De Moor, B.: Identification of Wiener–Hammerstein systems using LS-SVMs. In: Proceedings of the 15th IFAC Symposium on System Identification (SYSID 2009), Saint-Malo, France, pp. 820–825 (2009)
11. Goethals, I., Hoegaerts, L., Suykens, J.A.K., Verdult, V., De Moor, B.: Hammerstein–Wiener subspace identification using kernel Canonical Correlation Analysis. Technical Report 05-30, ESAT-SISTA, K.U.Leuven, Leuven, Belgium (2005), http://ftp.esat.kuleuven.ac.be/pub/SISTA/goethals/goethals_hammer_wiener.ps
12. Goethals, I., Mevel, L., Benveniste, A., De Moor, B.: Recursive output-only subspace identification for in-flight flutter monitoring. In: Proceedings of the 22nd International Modal Analysis Conference (IMAC-XXII), Dearborn, Michigan (2004)
13. Goethals, I., Pelckmans, K., Hoegaerts, L., Suykens, J.A.K., De Moor, B.: Subspace intersection identification of Hammerstein–Wiener systems. In: Proceedings of the 44th IEEE Conference on Decision and Control and the European Control Conference (CDC-ECC 2005), Seville, Spain, pp. 7108–7113 (2005)
14. Goethals, I., Pelckmans, K., Suykens, J.A.K., De Moor, B.: Identification of MIMO Hammerstein models using least squares support vector machines. Automatica 41(7), 1263–1272 (2005)
15. Goethals, I., Pelckmans, K., Suykens, J.A.K., De Moor, B.: Subspace identification of Hammerstein systems using least squares support vector machines. IEEE Transactions on Automatic Control, Special Issue on System Identification 50(10), 1509–1519 (2005)
16. Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins University Press, Baltimore (1989)
17. Greblicki, W., Pawlak, M.: Identification of discrete Hammerstein systems using kernel regression estimates. IEEE Transactions on Automatic Control 31, 74–77 (1986)
18. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Heidelberg (2001)
19. Janczak, A.: Neural network approach for identification of Hammerstein systems. International Journal of Control 76(17), 1749–1766 (2003)
20. Kulcsar, B., Van Wingerden, J.W., Dong, J., Verhaegen, M.: Closed-loop subspace predictive control for Hammerstein systems. In: Proceedings of the 48th IEEE Conference on Decision and Control held jointly with the 28th Chinese Control Conference (CDC 2009/CCC 2009), Shanghai, China, pp. 2604–2609 (2009)
21. Lovera, M., Gustafsson, T., Verhaegen, M.: Recursive subspace identification of linear and non-linear Wiener state-space models. Automatica 36, 1639–1650 (1998)
22. McKelvey, T., Hanner, C.: On identification of Hammerstein systems using excitation with a finite number of levels. In: Proceedings of the 13th International Symposium on System Identification (SYSID 2003), pp. 57–60 (2003)
23. Mercère, G., Lecoeuche, S., Lovera, M.: Recursive subspace identification based on instrumental variable unconstrained quadratic optimization. International Journal of Adaptive Control and Signal Processing, Special Issue on Subspace-based Identification in Adaptive Control and Signal Processing 18, 771–797 (2004)
24. Narendra, K.S., Gallman, P.G.: An iterative method for the identification of nonlinear systems using the Hammerstein model. IEEE Transactions on Automatic Control 11, 546–550 (1966)
25. Pawlak, M.: On the series expansion approach to the identification of Hammerstein systems. IEEE Transactions on Automatic Control 36, 736–767 (1991)
26. Pelckmans, K., Goethals, I., De Brabanter, J., Suykens, J.A.K., De Moor, B.: Componentwise least squares support vector machines. In: Wang, L. (ed.) Support Vector Machines: Theory and Applications, pp. 77–98. Springer, Heidelberg (2005)
27. Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2002)
28. Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., Glorennec, P., Hjalmarsson, H., Juditsky, A.: Nonlinear black-box modeling in system identification: a unified overview. Automatica 31(12), 1691–1724 (1995)
29. Stoica, P., Söderström, T.: Instrumental-variable methods for identification of Hammerstein systems. International Journal of Control 35(3), 459–476 (1982)
30. Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J.: Least Squares Support Vector Machines. World Scientific, Singapore (2002)
31. Van Overschee, P., De Moor, B.: Subspace Identification for Linear Systems: Theory, Implementation, Applications. Kluwer Academic Publishers, Dordrecht (1996)
32. Van Pelt, T.H., Bernstein, D.S.: Nonlinear system identification using Hammerstein and nonlinear feedback models with piecewise linear static maps – part I: theory. In: Proceedings of the American Control Conference (ACC 2000), pp. 225–229 (2000)
33. Vapnik, V.N.: Statistical Learning Theory. Wiley & Sons, Chichester (1998)
34. Verdult, V., Suykens, J.A.K., Boets, J., Goethals, I., De Moor, B.: Least squares support vector machines for kernel CCA in nonlinear state-space identification. In: Proceedings of the 16th International Symposium on Mathematical Theory of Networks and Systems (MTNS 2004), Leuven, Belgium (2004)
35. Verhaegen, M., Westwick, D.: Identifying MIMO Hammerstein systems in the context of subspace model identification methods. International Journal of Control 63, 331–349 (1996)
36. Wahba, G.: Spline Models for Observational Data. SIAM, Philadelphia (1990)
37. Westwick, D., Verhaegen, M.: Identifying MIMO Wiener systems using subspace model identification methods. Signal Processing 52(2), 235–258 (1996)
38. Wills, A.G., Ninness, B.: Estimation of generalised Hammerstein–Wiener systems. In: Proceedings of the 15th IFAC Symposium on System Identification (SYSID 2009), Saint-Malo, France, pp. 1104–1109 (2009)
Chapter 16
Identification of Linear Systems with Hard Input Nonlinearities of Known Structure

Er-Wei Bai
16.1 Problem Statement

Hard input nonlinearities are common in engineering practice. These nonlinearities severely limit the performance of control systems. Therefore, robust controls are often used [8] to cancel or reduce the effect of these harmful nonlinearities. Those control designs require values of the parameters that represent the hard nonlinearities. Clearly, system identification constitutes a crucial part of such control designs if the parameters are unknown. The difficulty of identification for a system with a hard input nonlinearity is that the unknown parameters of the nonlinearity and of the linear system are coupled. Moreover, the output of the hard nonlinear block may not be expressible as an analytic function of the input. Surprisingly, only scattered work has been reported in the literature on identification of systems with hard nonlinearities [4, 9], although robust control designs involving these hard nonlinearities have become a very active research area in recent years. This chapter is based on [1] with permission from Automatica/Elsevier; all the proofs can be found in [1]. This chapter studies identification of a stable SISO discrete-time linear system with a hard input nonlinearity, with y(k), u(k) and v(k) being the system output, input and noise, respectively. Note that the internal signal x(k) is not measurable. The linear system is assumed to be stable and is represented by a transfer function of known order
\[ H(z) = \frac{\beta_1 z^{n-1} + \beta_2 z^{n-2} + \ldots + \beta_n}{z^n - \alpha_1 z^{n-1} - \ldots - \alpha_n}, \tag{16.1} \]
parametrised by the parameter vector
\[ \theta^T = (\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n). \tag{16.2} \]
Er-Wei Bai
Dept. of Electrical and Computer Engineering, University of Iowa
e-mail: [email protected]
Fig. 16.1: Examples of input nonlinearities (Saturation, Preload, Relay, Dead-zone, Hysteresis-relay and Hysteresis)
The nonlinear block represents a static or non-static nonlinearity of the form
\[ x(k) = \mathcal{N}(u(k), \ldots, u(0), a) \tag{16.3} \]
for some nonlinear function 𝒩 parametrised by the parameter vector a ∈ R^l. Common examples of input nonlinearities are the Saturation, Preload, Relay, Dead-zone, Hysteresis-relay and Hysteresis nonlinearities shown in Figure 16.1:
\[
\begin{aligned}
x_{\text{saturation}}(k) &= \tfrac{1 + \operatorname{sgn}(a - |u(k)|)}{2}\, u(k) + \tfrac{1 + \operatorname{sgn}(|u(k)| - a)}{2}\, a \cdot \operatorname{sgn}(u(k)) \\
x_{\text{preload}}(k) &= u(k) + a \cdot \operatorname{sgn}(u(k)) \\
x_{\text{deadzone}}(k) &= u(k) - a \cdot \operatorname{sgn}(u(k)) - \tfrac{1 + \operatorname{sgn}(a - |u(k)|)}{2}\,\big( u(k) - a \cdot \operatorname{sgn}(u(k)) \big) \\
x_{\text{relay}}(k) &= \big[ \operatorname{sgn}(u(k) - a) + \operatorname{sgn}(u(k) + a) \big] / 2 \\
x_{\text{hys-relay}}(k) &=
\begin{cases}
1 & (u(k) > a) \text{ or } (|u(k)| \le a \text{ and } u(k) - u(k-1) < 0) \\
  & \quad \text{or } (|u(k)| \le a \text{ and } u(k) = u(k-1) \text{ and } x(k-1) = 1) \\
-1 & (u(k) < -a) \text{ or } (|u(k)| \le a \text{ and } u(k) - u(k-1) > 0) \\
  & \quad \text{or } (|u(k)| \le a \text{ and } u(k) = u(k-1) \text{ and } x(k-1) = -1)
\end{cases} \\
x_{\text{hysteresis}}(k) &=
\begin{cases}
u(k) - a & u(k) - u(k-1) > 0 \\
u(k) + a & u(k) - u(k-1) < 0 \\
x(k-1) & u(k) = u(k-1)
\end{cases}
\end{aligned}
\tag{16.4}
\]
where sgn is the standard sign function. Note that the gain of all nonlinearities in Figure 16.1 is assumed to be 1. This is to avoid the non-unique parametrisation problem due to the product of the nonlinear block and the linear system. If the gain is not 1, say α, it can be absorbed by the linear system:
\[ \frac{\alpha\beta_1 z^{n-1} + \alpha\beta_2 z^{n-2} + \ldots + \alpha\beta_n}{z^n - \alpha_1 z^{n-1} - \ldots - \alpha_n}. \]
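For reference, the static nonlinearities of (16.4) are straightforward to implement; the sketch below is our own code (the zero initialisation of the hysteresis state x(0) is our choice, a point the chapter leaves open) and may help when reproducing the examples.

```python
import numpy as np

def saturation(u, a):
    return np.clip(u, -a, a)                       # u inside [-a, a], a*sgn(u) outside

def preload(u, a):
    return u + a * np.sign(u)

def deadzone(u, a):
    return np.where(np.abs(u) > a, u - a * np.sign(u), 0.0)

def relay(u, a):
    return (np.sign(u - a) + np.sign(u + a)) / 2   # 1 above a, -1 below -a, 0 between

def hysteresis(u, a):
    """Non-static: the output depends on the direction of the input increment."""
    x = np.zeros_like(np.asarray(u, dtype=float))
    for k in range(1, len(u)):
        if u[k] > u[k - 1]:
            x[k] = u[k] - a
        elif u[k] < u[k - 1]:
            x[k] = u[k] + a
        else:
            x[k] = x[k - 1]
    return x
```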
Our identification approach is based on the Hammerstein model [2, 3, 7]. There exists a large body of work in the literature on Hammerstein model identification. Most results require that the nonlinearity is static and analytic, usually a polynomial [2, 3, 7], which is linear in the unknown parameter a. This is, however, not the case for hard nonlinearities. The hard nonlinearities may not be approximated by polynomials in stability analysis. Moreover, the expressions of these hard nonlinearities are not linear in the unknown a, and the determination of the segments itself depends on the unknown a. To overcome these difficulties, some algorithms were proposed [4, 9]. For instance, an identification algorithm for a two-segment piecewise-linear nonlinearity was proposed in [9]. This algorithm is based on alternating estimation of the parameters and of some auxiliary variables. Though simulations illustrate some good results, as pointed out there, the convergence of the estimates is not analysed and the algorithm can be divergent in some applications [7]. Moreover, the approaches of [4, 9] do not apply to non-static nonlinearities either. Two identification algorithms are proposed in this chapter. For nonlinearities parametrised by a single unknown constant a, as in Figure 16.1, a separable least squares approach is proposed. It is shown that the identification problem is equivalent to a one-dimensional minimisation problem. This method makes full use of the low dimensionality of the nonlinearity and is found to be very effective. For a general input nonlinearity, a correlation analysis approach is presented. The novelty of this approach lies in the repeated application of inputs.
16.2 Deterministic Approach

In this section, input nonlinearities parametrised by a lower-dimensional parameter vector are considered. In particular, a detailed analysis is given for input nonlinearities parametrised by a one-dimensional parameter a. Such nonlinearities are common in practice and examples are shown in Figure 16.1. The purpose is to develop an efficient method making full use of their low dimensionality.
16.2.1 Identification Algorithm

The linear system part can be written in the time domain as
\[ y(k) = \big( y(k-1), \ldots, y(k-n), x(k-1), \ldots, x(k-n) \big)\,\theta + v(k), \tag{16.5} \]
with the unknown θ as well as a, because x(k) = 𝒩(u(k), ..., u(0), a). Let
\[ \hat x(k) = \mathcal{N}(u(k), \ldots, u(0), \hat a) \tag{16.6} \]
denote the estimate of x(k) using â. Define
\[ e_{\hat\theta, \hat a}(k) = y(k) - \big( y(k-1), \ldots, y(k-n), \hat x(k-1), \ldots, \hat x(k-n) \big)\,\hat\theta, \quad k = 1, 2, \ldots, N, \tag{16.7} \]
and the objective function [5]
\[ J = \frac{1}{N} \sum_{k=1}^{N} e_{\hat\theta, \hat a}^2(k). \tag{16.8} \]
The estimates â and θ̂ are the ones that minimise J. With
\[ Y = \begin{pmatrix} y(1) \\ y(2) \\ \vdots \\ y(N) \end{pmatrix}, \quad
A(\hat a) = \begin{pmatrix}
y(0) & \ldots & y(1-n) & \hat x(0) & \ldots & \hat x(1-n) \\
y(1) & \ldots & y(2-n) & \hat x(1) & \ldots & \hat x(2-n) \\
\vdots & & \vdots & \vdots & & \vdots \\
y(N-1) & \ldots & y(N-n) & \hat x(N-1) & \ldots & \hat x(N-n)
\end{pmatrix}, \tag{16.9} \]
the objective function J can be rewritten as
\[ J = \frac{1}{N} \left\| Y - A(\hat a)\,\hat\theta \right\|^2. \tag{16.10} \]
For a given data set {y(k), u(k)}, this minimisation involves two variables, â and θ̂. J may be a non-smooth function of â, but is smooth in θ̂. Moreover,
\[ 0 = \frac{1}{2} \frac{\partial J}{\partial \hat\theta} = -A^T(\hat a)\, Y + A^T(\hat a) A(\hat a)\, \hat\theta. \tag{16.11} \]
Clearly, if A^T(â)A(â) is invertible, the necessary and sufficient condition for θ̂ to be optimal is that
\[ \hat\theta = \left[ A^T(\hat a) A(\hat a) \right]^{-1} A^T(\hat a)\, Y, \tag{16.12} \]
provided that â is optimal. Therefore, by substituting θ̂ in terms of â back into J, it follows that
\[ J(\hat a) = \frac{1}{N} \left\| \left( I - A(\hat a)\left[ A^T(\hat a) A(\hat a) \right]^{-1} A^T(\hat a) \right) Y \right\|^2. \tag{16.13} \]
By this substitution, for all six nonlinearities shown in Figure 16.1, θ̂ is eliminated and the dimension of the search space is reduced from (1 + 2n) to 1. This kind of elimination of variables is referred to in the optimisation literature as a separable nonlinear least squares problem [6]. Now, the original identification problem has been transformed into a one-dimensional minimisation problem (16.13) for all six nonlinearities in Figure 16.1. Once the optimal â is obtained, the optimal θ̂ follows from (16.12). It is important to remark that the minimisation of (16.13) is always one-dimensional for all six nonlinearities shown in Figure 16.1, independent of the linear part, which could be parametrised by a high-dimensional vector θ ∈ R^{2n}. The deterministic identification algorithm for systems with hard input nonlinearities parametrised by a single parameter a is now summarised.
Separable least squares identification algorithm for systems with hard input nonlinearities shown in Figure 16.1:
Step 1: Consider the system (16.5), collect the data set {u(k), y(k)} and define Y and A(â).
Step 2: Solve (16.13) for the optimal â.
Step 3: Calculate the optimal θ̂ as in (16.12).

To illustrate the effectiveness of the proposed approach, the algorithm is tested with all six nonlinearities shown in Figure 16.1 on the following example.

Example 16.1. Let the linear system be y(k) = α_1 y(k−1) + α_2 y(k−2) + β_1 x(k−1) + β_2 x(k−2) + v(k), where θ^T = [α_1, α_2, β_1, β_2] = [−0.8333, −0.1667, 1, 1] is unknown and v(k) is an i.i.d. random sequence in [−0.2, 0.2]. For the simulation, N = 100 and the input is uniformly distributed in [−4, 4]. Now, consider the above linear system with the Preload, Dead-zone, Saturation, Relay, Hysteresis-relay and Hysteresis nonlinearities, each with a = 1, separately. The true values of a and θ, and the estimates â and θ̂, are shown in Table 16.1, where the error is defined as ‖(a^T, θ^T) − (â^T, θ̂^T)‖₂.
Table 16.1: True values and the estimates

True values:       a = 1       θ^T = (−0.8333, −0.1667, 1, 1)

Nonlinearity        â        θ̂^T                                    Error
Preload             1        (−0.8336, −0.1676, 1.0022, 1.0046)     0.0052
Dead-zone           1        (−0.8315, −0.1735, 0.9957, 0.9949)     0.0097
Saturation          1        (−0.8214, −0.1626, 1.0146, 0.9779)     0.0293
Relay               1        (−0.8312, −0.1712, 1.0103, 1.0132)     0.0175
Hysteresis-relay    1.02     (−0.8259, −0.1634, 0.9962, 0.9863)     0.0167
Hysteresis          1        (−0.8357, −0.1636, 0.9973, 1.0004)     0.0048
Note that only 100 data points are used to accurately estimate the unknown θ and a. This is because the dimension of the problem is reduced to one.
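A minimal sketch of the separable least squares search is given below for the saturation case. Here u and y are assumed to be arrays of the recorded input and output data, the regression is started at k = n+1 to sidestep the unknown initial conditions, and the grid bounds are arbitrary choices of ours.

```python
import numpy as np

def J(a_hat, u, y, n):
    """Evaluate the projected cost (16.13) for a candidate a_hat (saturation case)."""
    x = np.clip(u, -a_hat, a_hat)
    N = len(y)
    A = np.column_stack([y[n - i:N - i] for i in range(1, n + 1)] +
                        [x[n - i:N - i] for i in range(1, n + 1)])
    Y = y[n:N]
    theta, *_ = np.linalg.lstsq(A, Y, rcond=None)
    res = Y - A @ theta
    return res @ res / len(Y), theta

# One-dimensional grid search over a_hat (Step 2), then theta_hat via (16.12).
a_grid = np.linspace(0.1, 4.0, 400)
a_hat = min(a_grid, key=lambda a: J(a, u, y, n=2)[0])
theta_hat = J(a_hat, u, y, n=2)[1]
```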
16.2.2 Consistency Analysis and Computational Issues

Note that the estimates are derived from the minimisation of J(â). There are two questions that need to be answered: (1) how to find the global minimum of the nonlinear optimisation problem J(â), and (2) if the global minimum can be obtained,
how the estimates perform in the noisy situation. The second question is discussed first. At each N, the estimates (θ̂, â) are derived from the prediction error. Therefore, with i.i.d. zero-mean Gaussian noise, these estimates are actually the Maximum Likelihood estimates [5]. Moreover, if the noise v(·) is i.i.d. and satisfies
\[ E v(k) = 0, \quad E v^2(k) < \infty, \quad E |v(k)|^{4+\rho} < \infty \tag{16.14} \]
for some ρ > 0, then the following strong consistency results hold:
\[ \lim_{N \to \infty} \begin{pmatrix} \hat\theta \\ \hat a \end{pmatrix} = \begin{pmatrix} \theta_0 \\ a_0 \end{pmatrix} \;\; \text{w.p.1}, \qquad \begin{pmatrix} \theta_0 \\ a_0 \end{pmatrix} = \arg\min_{\hat\theta, \hat a} \lim_{N \to \infty} E\, e^2_{\hat\theta, \hat a}(k). \tag{16.15} \]
In the above equations, E stands for the expectation operator. Clearly, the separable least squares estimates are well behaved in the noisy situation if the global minimum of J(â) can be found. How to find the global minimum of J(â) is, in general, a hard question that depends on the input nonlinearities. Recall, however, that J(â) is one-dimensional for all six input nonlinearities discussed in this chapter. Therefore, one may apply global search methods to find the minimum. An alternative approach is to use a graphical method. After collection of the data set {y(k), u(k)}, x̂(k) and consequently A(â) can be constructed using â, and therefore the complete picture of J(â) as a function of â can be plotted. This graphical picture provides accurate information on where the global minimum lies. Then, local search algorithms, for instance the simplex method, can be applied in that region to find the global minimum. In fact, the global minimum can also be obtained directly from the plot of J(â) versus â. Using the data generated in Example 16.1, the plots of J(â) versus â for all six nonlinearities in Figure 16.1 are shown in Figure 16.2, where in each subplot the vertical axis is J(â) and the horizontal axis is â. In all six figures, the neighbourhoods where the global minimum lies can easily be seen.

Remark 16.1. Compared with the existing deterministic identification algorithms for hard input nonlinearities, for instance [9], our method is very efficient for nonlinearities parametrised by one parameter. First, the global minimum can always be obtained, at least numerically, based on the plot of J(â) versus â. Secondly, the estimates have strong consistency properties and are well behaved in noisy situations. There is no consistency analysis for the estimates of [9] and it is not clear how they perform in a noisy situation. Moreover, the method of [9] relies on alternating estimation and can be divergent [7], though rarely. Finally, the proposed method applies to non-static nonlinearities. The disadvantage of the proposed method is that it does not extend to the case where the nonlinearity is parametrised by a high-dimensional vector, due to the nonlinear minimisation of J(â). We remark that this problem is not created by our formulation but is inherent in nonlinear system identification. Our approach makes full use of the low dimensionality of those nonlinearities parametrised by a one-dimensional parameter so that the global minimum is obtainable.
16.3 Correlation Analysis Method

As discussed before, the separable least squares method can easily be extended to the case where the nonlinearity is parametrised by a two-dimensional vector. However, it seems hard to extend the method to the case where the nonlinearity is parametrised by a higher-dimensional a. In this section, a general input nonlinearity parametrised by some a ∈ R^l is considered and an algorithm for identification of systems with static hard input nonlinearities is proposed using correlation analysis. Throughout this section, it is assumed that the input u(k) is at our disposal and is a zero-mean i.i.d. random sequence with finite variance. The noise v(k) is assumed to be independent of the input. Recalling that all signals are ergodic and that the input nonlinearity is assumed to be static, it follows that
\[ E v(k) u(k-j) = 0 \quad \text{and} \quad E x(k-i) u(k-j) = E x(k) u(k+i-j) = q\,\delta(i-j), \tag{16.16} \]
where q = E x(k)u(k). Based on the system model, the following equations hold for m ≥ 2n:
Fig. 16.2: J(â) versus â
\[
\begin{aligned}
E y(k) u(k-1) &= \beta_1\, E x(k-1) u(k-1) \\
E y(k) u(k-2) &= \alpha_1\, E y(k-1) u(k-2) + \beta_2\, E x(k-2) u(k-2) \\
&\;\;\vdots \\
E y(k) u(k-n) &= \alpha_1\, E y(k-1) u(k-n) + \ldots + \alpha_{n-1}\, E y(k-n+1) u(k-n) + \beta_n\, E x(k-n) u(k-n) \\
E y(k) u(k-n-1) &= \alpha_1\, E y(k-1) u(k-n-1) + \ldots + \alpha_n\, E y(k-n) u(k-n-1) \\
&\;\;\vdots \\
E y(k) u(k-2n) &= \alpha_1\, E y(k-1) u(k-2n) + \ldots + \alpha_n\, E y(k-n) u(k-2n) \\
&\;\;\vdots \\
E y(k) u(k-m) &= \alpha_1\, E y(k-1) u(k-m) + \ldots + \alpha_n\, E y(k-n) u(k-m)
\end{aligned}
\tag{16.17}
\]
Let w(i) = E y(k)u(k−i); then
\[
\begin{pmatrix} w(1) \\ w(2) \\ \vdots \\ w(n) \\ w(n+1) \\ \vdots \\ w(2n) \\ \vdots \\ w(m) \end{pmatrix}
=
\begin{pmatrix}
0 & 0 & \ldots & 0 & 1 & 0 & \ldots & 0 \\
w(1) & 0 & \ldots & 0 & 0 & 1 & \ldots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
w(n-1) & w(n-2) & \ldots & 0 & 0 & 0 & \ldots & 1 \\
w(n) & w(n-1) & \ldots & w(1) & 0 & 0 & \ldots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & & \vdots \\
w(2n-1) & w(2n-2) & \ldots & w(n) & 0 & 0 & \ldots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & & \vdots \\
w(m-1) & w(m-2) & \ldots & w(m-n) & 0 & 0 & \ldots & 0
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \\ q\beta_1 \\ \vdots \\ q\beta_n \end{pmatrix}.
\tag{16.18}
\]
The estimates of α_i and qβ_j can be obtained by solving the above equation. To further find q and a, notice that q = E x(k)u(k) depends on the distribution f of u(k) as well as on the unknown a, i.e. q = q(f, a). If identification is carried out (l+1) times with different distributions f_1, f_2, ..., f_{l+1}, the ratios
\[ c_i = \frac{q(f_1, a)\beta_j}{q(f_{i+1}, a)\beta_j} = \frac{q(f_1, a)}{q(f_{i+1}, a)}, \quad i = 1, 2, \ldots, l, \tag{16.19} \]
are numerically obtained, and this provides l equations for the unknown a ∈ R^l:
\[ c_i\, q(f_{i+1}, a) = q(f_1, a), \quad i = 1, 2, \ldots, l. \tag{16.20} \]
All the c_i's and q(f_i, a)'s are computable and thus a can be solved for. Several examples are provided below.

Non-symmetric Relay nonlinearity: Consider a non-symmetric Relay
\[ x(k) = \begin{cases} 1 & u(k) > a_2, \\ 0 & -a_1 < u(k) < a_2, \\ -1 & u(k) < -a_1, \end{cases} \tag{16.21} \]
where a = (a_1, a_2) is two-dimensional. Let f_1 and f_2 be uniform distributions on [−b_1, b_1] and [−b_2, b_2], respectively. Let the third distribution be
\[ f_3(u) = \begin{cases}
0 & u < -d_1 \text{ or } u > d_2, \\
\dfrac{d_2^2}{d_1 d_2^2 + d_2 d_1^2} & -d_1 \le u \le 0, \\
\dfrac{d_1^2}{d_1 d_2^2 + d_2 d_1^2} & 0 < u \le d_2.
\end{cases} \tag{16.22} \]
For d_1, d_2, b_1, b_2 > max(a_1, a_2), it follows that
\[ q(f_i, a) = \frac{2 b_i^2 - a_1^2 - a_2^2}{4 b_i}, \; i = 1, 2, \quad \text{and} \quad q(f_3, a) = \frac{d_1^2 d_2^2}{d_1 d_2^2 + d_2 d_1^2} - \frac{1}{2}\, \frac{d_2^2 a_1^2 + d_1^2 a_2^2}{d_1 d_2^2 + d_2 d_1^2}. \tag{16.23} \]
From the definitions of c_i = q(f_1, a)/q(f_{i+1}, a), it follows that
\[
\begin{pmatrix}
b_2 - c_1 b_1 & b_2 - c_1 b_1 \\
1 - \dfrac{2 b_1 c_2 d_2^2}{d_1 d_2^2 + d_2 d_1^2} & 1 - \dfrac{2 b_1 c_2 d_1^2}{d_1 d_2^2 + d_2 d_1^2}
\end{pmatrix}
\begin{pmatrix} a_1^2 \\ a_2^2 \end{pmatrix}
=
\begin{pmatrix}
2 b_1^2 b_2 - 2 c_1 b_1 b_2^2 \\
2 b_1^2 - \dfrac{4 b_1 c_2 d_1^2 d_2^2}{d_1 d_2^2 + d_2 d_1^2}
\end{pmatrix}.
\]
Hence, a = (a_1, a_2) is uniquely obtained by solving the above equation.

Non-symmetric Preload nonlinearity: In this case,
\[ x(k) = \begin{cases} u(k) + a_2 & u(k) > 0, \\ 0 & u(k) = 0, \\ u(k) - a_1 & u(k) < 0. \end{cases} \tag{16.24} \]
Let f_1 and f_2 be two uniform distributions on [−b_1, b_1] and [−b_2, b_2], respectively, and
\[ f_3(u) = \begin{cases} 0.5\,\delta(u + b_3) & u = -b_3, \\ \dfrac{1}{2 b_3} & 0 \le u \le b_3, \\ 0 & \text{otherwise}, \end{cases} \tag{16.25} \]
where δ(t) is the δ function. It is easily calculated that
\[ q(f_i, a) = \frac{b_i^2}{3} + \frac{b_i (a_1 + a_2)}{4}, \; i = 1, 2, \quad \text{and} \quad q(f_3, a) = \frac{2 b_3^2}{3} + \frac{a_1 b_3}{2} + \frac{a_2 b_3}{4}. \tag{16.26} \]
Therefore, a = (a_1, a_2) can be uniquely calculated from
\[
\begin{pmatrix}
1 & 1 \\
\dfrac{b_1}{4} - \dfrac{b_3 c_2}{2} & \dfrac{b_1}{4} - \dfrac{b_3 c_2}{4}
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
=
\begin{pmatrix}
\dfrac{4 (b_1^2 - b_2^2 c_1)}{3 (b_2 c_1 - b_1)} \\
\dfrac{2 b_3^2 c_2}{3} - \dfrac{b_1^2}{3}
\end{pmatrix}.
\tag{16.27}
\]
Saturation nonlinearity in Figure 16.1: Let f_1 and f_2 be the two uniform distributions on [−b_i, b_i], i = 1, 2, with b_1, b_2 > a. It is routine to calculate
\[ q(f_i, a) = \frac{a}{b_i} \left( \frac{b_i^2}{2} - \frac{a^2}{6} \right), \quad i = 1, 2, \tag{16.28} \]
and this implies, from the definition of c_1,
\[ a = \sqrt{\left| \frac{3 (c_1 b_2 - b_1)\, b_1 b_2}{c_1 b_1 - b_2} \right|}. \tag{16.29} \]
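In code, the closed-form recovery of a from an estimated ratio c_1 is a one-liner; a hedged sketch (the function name is ours):

```python
import numpy as np

def a_saturation(c1, b1, b2):
    """Closed form (16.29): recover the saturation level a from c1 = q(f1,a)/q(f2,a)."""
    return np.sqrt(abs(3 * (c1 * b2 - b1) * b1 * b2 / (c1 * b1 - b2)))
```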
Dead-zone nonlinearity in Figure 16.1: Let f_1 and f_2 be two uniform distributions on [−b_1, b_1] and [−b_2, b_2], respectively, with b_2 > b_1 > a. It is easily calculated that
\[ q(f_i, a) = \frac{1}{6 b_i} a^3 - \frac{a b_i}{2} + \frac{1}{3} b_i^2, \; i = 1, 2, \quad \text{and} \quad
c_1 = \frac{\frac{1}{6 b_1} a^3 - \frac{a b_1}{2} + \frac{1}{3} b_1^2}{\frac{1}{6 b_2} a^3 - \frac{a b_2}{2} + \frac{1}{3} b_2^2}. \tag{16.30} \]
This implies
\[ (c_1 b_1 - b_2)\, a^3 + 3 b_1 b_2 (b_1 - c_1 b_2)\, a + 2 b_1 b_2 (c_1 b_2^2 - b_1^2) = 0. \tag{16.31} \]
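Numerically, (16.31) can be handled with a polynomial root finder, keeping the root in (0, b_1) singled out by the root-location argument discussed next; a hedged sketch (the helper name and tolerance are our own choices):

```python
import numpy as np

def a_deadzone(c1, b1, b2):
    """Solve the cubic (16.31) for a and keep the unique root in (0, b1)."""
    coeffs = [c1 * b1 - b2,
              0.0,
              3 * b1 * b2 * (b1 - c1 * b2),
              2 * b1 * b2 * (c1 * b2 ** 2 - b1 ** 2)]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    return real[(real > 0) & (real < b1)][0]
```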
It can be shown that this equation always has three real roots: one lies in the interval (−∞, 0), the second in (0, b_1) and the last in (b_1, ∞), provided b_2 > b_1 > a > 0. Since 0 < a < b_1 and b_1 is known, a can be uniquely determined.

The identification algorithm using correlation analysis for static input nonlinearities is now summarised.

Identification algorithm using correlation analysis:
Step 1: Apply an input u(k) with distribution f_1 and define
\[ w(i) = \frac{1}{N} \sum_{k=1}^{N} y(k)\, u(k-i), \quad i = 1, 2, \ldots, m, \tag{16.32} \]
for some large N and m ≥ 2n.
Step 2: Construct equation (16.18). Solve the equation and denote the solution by α̂_i and by the estimates of q(f_1, a)β_j.
Step 3: Repeat Steps 1 and 2, applying inputs with the different distributions f_i, i = 2, ..., l+1, to obtain the estimates of q(f_i, a)β_j.
Step 4: Calculate q(f_i, a) and find
\[ c_i = \frac{q(f_1, a)\beta_j}{q(f_i, a)\beta_j} = \frac{q(f_1, a)}{q(f_i, a)}, \quad i = 2, \ldots, l+1. \tag{16.33} \]
Denote the solution by â. Compute q(f_1, â) using â.
Step 5: The estimates are â, α̂_i and
\[ \hat\beta_j = \frac{\widehat{q(f_1, a)\beta_j}}{q(f_1, \hat a)}, \quad j = 1, 2, \ldots, n. \tag{16.34} \]
Note that w(i) = (1/N) ∑_{k=1}^{N} y(k)u(k−i) → E y(k)u(k−i) as N → ∞ and, therefore, the estimates derived by the correlation method converge to the true values. It is also noted that, in calculating c_i = q(f_1, a)β_j / q(f_{i+1}, a)β_j and the corresponding â, any j can be used. It may be beneficial to use the average
\[ \hat c_i = \frac{1}{n} \sum_{j=1}^{n} \frac{q(f_1, a)\beta_j}{q(f_{i+1}, a)\beta_j}. \]
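Steps 1–2 of the algorithm lend themselves to a compact implementation. The sketch below is our own helper (not from the chapter): it estimates the correlations w(i) of (16.32) and solves (16.18) in the least squares sense for α and the products qβ_j.

```python
import numpy as np

def correlation_step(u, y, n, m):
    """Estimate w(i) as in (16.32) and solve (16.18) for alpha and q*beta."""
    N = len(y)
    w = np.array([np.mean(y[i:] * u[:N - i]) for i in range(1, m + 1)])
    M = np.zeros((m, 2 * n))
    for row in range(m):                    # equation for w(row+1)
        for col in range(n):                # alpha part uses w(row-col); w(0) is absent
            lag = row - col
            if lag >= 1:
                M[row, col] = w[lag - 1]
        if row < n:                         # q*beta part: identity block in first n rows
            M[row, n + row] = 1.0
    sol, *_ = np.linalg.lstsq(M, w, rcond=None)
    return sol[:n], sol[n:]                 # alpha estimates, q*beta_j estimates
```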
Now, consider a numerical simulation example.

Example 16.2. Consider the same linear system as in Example 16.1 with the Saturation nonlinearity of a = 1. For the simulation, N = 2000 and v(k) is uniformly distributed in [−0.1, 0.1]. Apply two inputs uniformly distributed in [−2, 2] and [−3, 3], respectively. The identification algorithm produces the following estimates:
\[ [\hat a, \hat\alpha_1, \hat\alpha_2, \hat\beta_1, \hat\beta_2] = [1.005, -0.8390, -0.1555, 1.09, 1.08], \tag{16.35} \]
with the estimation error
\[ \left\| [\hat a, \hat\alpha_1, \hat\alpha_2, \hat\beta_1, \hat\beta_2] - [a, \alpha_1, \alpha_2, \beta_1, \beta_2] \right\| = 0.12. \tag{16.36} \]
Remark 16.2. Compared to the deterministic identification algorithm presented in the previous section, the correlation method needs to estimate the correlation between inputs and outputs and therefore requires a long data record. In Example 16.2, the estimation results using 2000 data points are not as good as those obtained by the separable least squares approach using only 100 data points. However, the correlation method applies to nonlinearities parametrised by some a ∈ R^l, while the separable least squares method is limited to nonlinearities parametrised by a one- or two-dimensional a.

Remark 16.3. The choices of the f_i's are arbitrary, and the formulae derived above are just examples. One may pick other input distributions for identification; of course, the formulae would be different for different distributions. It is also interesting to note that if the nonlinearity is even, then any even distribution f_i gives rise to a zero q(f_i, a). In this case, non-even distributions such as f_3 in the non-symmetric Preload example can be used. A similar discussion applies to odd nonlinearities.
16.4 Concluding Remarks

Two identification algorithms are proposed for systems with hard input nonlinearities. The first one exploits the fact that a is one-dimensional and thus transforms a higher-dimensional nonlinear identification problem into a one-dimensional minimisation problem. The method is particularly effective for the many input nonlinearities that are parametrised by a single parameter a. The approach also applies to
nonlinearities with memory. The second algorithm relies on repeated identifications with different random input sequences and is convergent.
References

1. Bai, E.W.: Identification of linear systems with hard input nonlinearities. Automatica 38, 967–979 (2002)
2. Bai, E.W.: An optimal two-stage identification algorithm for Hammerstein–Wiener nonlinear systems. Automatica 34, 333–338 (1998)
3. Billings, S.A., Fakhouri, S.Y.: Identification of a class of nonlinear systems using correlation analysis. Proc. of IEE 125, 691–697 (1978)
4. Gu, X., Bao, Y., Lang, Z.: A parameter identification method for a class of discrete time nonlinear systems. In: Proc. 12th IMACS World Congress, Paris, vol. 4, pp. 627–629 (1988)
5. Ljung, L.: System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs (1987)
6. Ruhe, A., Wedin, P.: Algorithms for separable nonlinear least squares problems. SIAM Review 22, 318–337 (1980)
7. Stoica, P.: On the convergence of an iterative algorithm used for Hammerstein system identification. IEEE Trans. on Automatic Control 26, 967–969 (1981)
8. Tao, G., Canudas de Wit, C.A.: Special Issue on Adaptive Systems with Non-smooth Nonlinearities. Int. J. Adapt. Contr. Signal Process. 11(1) (1997)
9. Vörös, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33, 1141–1146 (1997)
Chapter 17
Blind Maximum-likelihood Identification of Wiener and Hammerstein Nonlinear Block Structures

Laurent Vanbeylen and Rik Pintelon
17.1 Introduction: Blind Nonlinear Modelling

Despite their structural simplicity, Wiener and Hammerstein nonlinear model structures have been effective in many application areas where linear modelling has failed, e.g. the chemical process industry [5, 13], microwave and radio frequency (RF) technology [4, 7, 19], seismology [21], biology [8], and physiology and psychophysics [14]. They can also be used in model predictive control [28, 29]. The Wiener and the Hammerstein model – consisting of the cascade connection of a linear time-invariant dynamic system and a static nonlinearity, in this or in reverse order (see Figure 17.1) – both fit inside the family of block-oriented nonlinear structures, which have been extensively studied over the past few decades, see e.g. [3], [20]. Most of these identification approaches assume the availability of both input and output measurements of the system. However, in several real-world applications, such as operational modal analysis [15], one often does not have access to the system
Fig. 17.1: Block diagram of (a) a Wiener and (b) a Hammerstein system. The building blocks are a linear dynamic block H_0 and a static nonlinearity f_0

Laurent Vanbeylen
Vrije Universiteit Brussel, Dept. ELEC, Pleinlaan 2, B-1050 Brussels, Belgium
e-mail: [email protected]

Rik Pintelon
Vrije Universiteit Brussel, Dept. ELEC, Pleinlaan 2, B-1050 Brussels, Belgium
e-mail: [email protected]
Table 17.1: Comparison of available blind identification methods for nonlinear systems

Volterra-Hammerstein [9]: cumulant-based
• Input: Gaussian and white
• Robust to low-order moving-average output noise
• Consistent
• Parametric

Hammerstein–Wiener [2]
• Input: zero-order-hold (piecewise constant)
• Oversampled output
• Invertible output nonlinearity admitting a polynomial representation
• Not robust to output noise
• Parametric

Wiener–Hammerstein [18]: polyspectral slices
• Input: circular symmetric Gaussian
• Polynomial nonlinearity
• First linear system has minimum phase
• Robust to circular symmetric output noise
• Nonlinearity is not identified
• Nonparametric

Wiener [21]: mutual information
• Input: non-Gaussian iid
• Invertible nonlinearity and filter
• Not robust to output noise
• Quasi-nonparametric
input: in this case, blind identification of the nonlinear system becomes the only option. Without measurements of the input, and with little prior knowledge and few hypotheses about the system and the input, our goal is to identify the parameters of the nonlinear block-structured model. Blind identification (closely related to blind inversion and equalisation) of linear systems has a long history, with many theoretical results and applications, especially in telecommunications. Comprehensive overviews can be found in [1] and [23]. However, the present knowledge on blind identification of nonlinear systems is rather limited; Table 17.1 gives an overview and comparison of the most recent available blind identification methods for nonlinear systems. In the present chapter, Maximum-Likelihood Estimators (MLEs) are proposed, assuming a white unobserved Gaussian input signal, a minimum-phase linear part, an invertible nonlinearity and errorless output observations. The major advantages of the maximum-likelihood approach are the consistency (convergence to the true value as the number of data tends to infinity), the asymptotic normality and the asymptotic efficiency (asymptotically the smallest variance) of the estimates. This work has in part been published in [25, 27], in which additional technical details may be found.
17.1.1 Nonlinear Sensor Calibration

Sensor and measurement technologies are typical application domains of the identification algorithms to be presented, especially sensor calibration: a certain physical quantity (acceleration, pressure, temperature, ...) is measured through a sensor which has a dynamic, nonlinear behaviour, and hence one gets a distorted measured quantity. The goal is to reconstruct the original physical quantity. For a Wiener system, this is done by a Hammerstein system consisting of the inverse nonlinearity followed by the inverse of the original linear part. For a Hammerstein system, the equalisation is done similarly by a Wiener system. In a nutshell, the equalisation of the system is achieved through the identification of the inverse system.
17.1.2 Outline

After introducing the Wiener and Hammerstein models, setting up the problem and the parametrisation (Section 17.2), we introduce the Gaussian maximum-likelihood cost functions (Section 17.3). Next, we discuss the generation of high-quality initial estimates via a two-step procedure (Section 17.4), the numerical optimisation (Section 17.5), the calculation of uncertainty bounds via the Cramér-Rao Lower Bound (Section 17.6), and the impact of output noise (Section 17.7). As the number of data tends to infinity (i.e. asymptotically), the estimator has very strong statistical properties (Section 17.3.3). The simulation experiments (Section 17.8) illustrate these properties (Section 17.8.2) and show the bias effect of measurement noise (Section 17.8.3). Finally, in the last section (Section 17.9), the method is applied to a laboratory experiment. In the simulation and laboratory experiment sections, we restrict ourselves to the Wiener case.
17.2 Introduction of Models and Related Assumptions

In this section the Wiener and Hammerstein models are introduced together with the necessary assumptions. Next, the following topics are handled successively: the parametrisation, the stochastic framework and the identifiability issue.
17.2.1 Class of Discrete-time Wiener and Hammerstein Systems Considered

Wiener and Hammerstein systems are defined as cascades of a linear time-invariant dynamic system (LTI) H_0 and a static (memoryless) nonlinearity f_0 (Figure 17.1). Let the input and output be denoted as e(t) and y(t). In general, the LTI system can be characterised by its transfer function H_0(z). The static nonlinearity can be characterised by its input-output mapping function: for the Wiener system y = f_0(u), and for the Hammerstein system u = f_0(e).
Assumption 17.1 (the class of discrete-time Wiener and Hammerstein systems considered).
1. The function f_0(·) is a monotonically increasing, bijective function; its derivative exists and is non-zero over R.
2. H_0(z) is a causal, stable and inversely stable (i.e. minimum-phase without delay) monic transfer function.

Discussion: In a blind framework, Assumptions 17.1.1 and 17.1.2 allow the input signal e to be recovered from the observed output y (up to transient errors).
17.2.2 Parametrisation

The parameter vectors characterising the linear and the nonlinear parts are respectively denoted by θ_L and θ_NL. The LTI is parametrised in the numerator and denominator coefficients of its transfer function H(z, θ_L):
\[ H(z, \theta_L) = \frac{C(z, \theta_L)}{D(z, \theta_L)} = \frac{1 + \sum_{r=1}^{n_c} c_r z^{-r}}{1 + \sum_{r=1}^{n_d} d_r z^{-r}}, \tag{17.1} \]
with θ_L^T = [c_1, c_2, ..., c_{n_c}, d_1, d_2, ..., d_{n_d}]. On the other hand, the nonlinear function is parametrised inversely (i.e. from output to input): in the Wiener case u = g(y, θ_NL), and in the Hammerstein case e = g(u, θ_NL).
Assumption 17.2 (nonlinear parametrisation).
1. θ_NL gives a unique representation of the nonlinearity, viz. g(x, θ_NL) = g(x, θ̃_NL) ∀x ⇔ θ_NL = θ̃_NL.
2. g is twice continuously differentiable with respect to the first and second function arguments.

As usual, a weighted-sum-of-basis-functions representation is chosen:
\[ g(x, \theta_{NL}) = \left[ f_1(x), \ldots, f_{\dim(\theta_{NL})}(x) \right] \cdot \theta_{NL}. \tag{17.2} \]
This inverse parametrisation allows for an easy inversion (equalisation) of the nonlinear model and an easy evaluation of the MLE cost function (see further). Finally, we define a global parameter vector by stacking the parameters of both subblocks onto each other:
\[ \theta^T = \left[ \theta_L^T, \theta_{NL}^T \right]. \tag{17.3} \]
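As a concrete instance of (17.2), one may take polynomial basis functions f_i(x) = x^i; this choice, and the sketch below, is purely illustrative, since the chapter leaves the basis open and Assumption 17.1 additionally requires the resulting g to be monotonically increasing.

```python
import numpy as np

def g(x, theta_nl):
    """Inverse nonlinearity (17.2) as a weighted sum of polynomial basis functions."""
    basis = np.column_stack([x ** i for i in range(1, len(theta_nl) + 1)])
    return basis @ theta_nl
```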
17.2.3 Stochastic Framework

In order to be able to set up the likelihood, one needs to specify the noise probability distribution (in this case, that of the input). Here, the output is assumed not to be affected
by measurement noise. Gaussian output noise can be included, but makes the MLE much harder to construct and implement [26].

Fig. 17.2: Illustration of the identifiability issue (in the Wiener case); a nonzero scaling factor a can be displaced from the input to the linear or nonlinear blocks. Herein f is the forward nonlinear function. The several depicted setups are equivalent since only the output is observed

Assumption 17.3 (stochastic framework).
1. The unknown, unobserved input e(t) is zero-mean, white Gaussian noise with unknown (non-zero) variance λ_0.
2. The output y(t) is known exactly (observed without errors).
17.2.4 Identifiability In this paragraph, we introduce a set of identifiability conditions, because thus far the Wiener and Hammerstein systems to be identified were overparametrised. There is a scaling ambiguity, due to the possibility of displacing a scaling factor from the linear part to the nonlinearity, and vice versa. Figure 17.2 illustrates several setups of a Wiener system that are not distinguishable from output measurements only (assume a ∈ R, a ≠ 0). A similar issue pops up for a Hammerstein system.

Assumption 17.4. (identifiability conditions)
1. H(z, θL) has no common pole-zero pairs, and is causal, stable and inversely stable.
2. H(z, θL) = C(z, θL)/D(z, θL) is monic (this means that the constant terms of the C and D polynomials are 1).
3. g'(x1, θNL) = 1 for some x1 ∈ R.

Under Assumption 17.4, the parameter vectors θL and θNL can be estimated jointly in a unique way, i.e., for the statistical properties of the output signal related to a given true model, there exists no different model in the considered model class possessing exactly the same statistical properties [25, 27].
17.3 The Gaussian Maximum-likelihood Estimator (MLE) As indicated in the introduction, we focus on the identification of the system parameters using the maximum-likelihood principle. To do so, we first need to set up
the likelihood function analytically, using the assumptions just introduced (the model structure, the Gaussianity of the input, the invertibility). The maximum-likelihood estimates are then defined as the maximising argument of the likelihood function. In practice, the measured data are plugged into the cost function resulting from the analysis, and the result is minimised over the parameters.
17.3.1 The Negative Log-likelihood (NLL) Function

Theorem 17.1. Under Assumptions 17.1-17.3, the conditional Gaussian negative log-likelihood (NLL) function of the observations

\[ y_N^T = y^T = [y(0),\ \dots,\ y(N-1)] \tag{17.4} \]

given the model parameters θ and the input variance λ, is given by

\[
L(y\,|\,\theta,\lambda)=
\begin{cases}
\dfrac{N}{2}\log(2\pi\lambda)+\displaystyle\sum_{t=0}^{N-1}\dfrac{\bigl(H^{-1}(q,\theta_L)\,g(y(t),\theta_{NL})\bigr)^2}{2\lambda}-\sum_{t=0}^{N-1}\log\bigl|g'(y(t),\theta_{NL})\bigr| & \text{(Wiener)},\\[3mm]
\dfrac{N}{2}\log(2\pi\lambda)+\displaystyle\sum_{t=0}^{N-1}\dfrac{\bigl(g(H^{-1}(q,\theta_L)\,y(t),\theta_{NL})\bigr)^2}{2\lambda}-\sum_{t=0}^{N-1}\log\bigl|g'(H^{-1}(q,\theta_L)\,y(t),\theta_{NL})\bigr| & \text{(Hammerstein)},
\end{cases}
\tag{17.5}
\]

with q the forward shift operator (q x(t) = x(t+1)), with (·)' denoting the first-order partial derivative w.r.t. the first argument of the function, i.e. g'(x, y) = ∂g/∂x (x, y), and with 'log' the Naperian logarithm (i.e. in base e). Here, conditional means "given the initial conditions of the LTI part". Asymptotically, however, the conditional MLE equals the true MLE.

Proof. The proof is based on the classical expressions of the log-likelihood for ARMA models and on the transformation formula of probability density functions through nonlinear mappings. More details can be found in [25, 27].
17.3.2 The Simplified MLE Cost Function

Expressing that λ minimises the NLL (17.5) (set ∂L(y|θ,λ)/∂λ = 0) yields the expression

\[
\hat{\lambda}=
\begin{cases}
\dfrac{1}{N}\displaystyle\sum_{t=0}^{N-1}\bigl(H^{-1}(q,\theta_L)\,g(y(t),\theta_{NL})\bigr)^2 & \text{(Wiener)},\\[3mm]
\dfrac{1}{N}\displaystyle\sum_{t=0}^{N-1}\bigl(g(H^{-1}(q,\theta_L)\,y(t),\theta_{NL})\bigr)^2 & \text{(Hammerstein)}.
\end{cases}
\tag{17.6}
\]

Finally, after elimination of λ, taking the exponential function and scaling the NLL with a constant, an equivalent cost function is obtained that is to be minimised over θ. It is the sum-of-squares expression
\[
V(y,\theta)=
\begin{cases}
g_N^2(y,\theta)\displaystyle\sum_{t=0}^{N-1}\bigl(H^{-1}(q,\theta_L)\,g(y(t),\theta_{NL})\bigr)^2 & \text{(Wiener)},\\[2mm]
g_N^2(y,\theta)\displaystyle\sum_{t=0}^{N-1}\bigl(g(H^{-1}(q,\theta_L)\,y(t),\theta_{NL})\bigr)^2 & \text{(Hammerstein)},
\end{cases}
\tag{17.7}
\]

with

\[
g_N(y,\theta)=
\begin{cases}
\exp\Bigl(-\tfrac{1}{N}\displaystyle\sum_{t=0}^{N-1}\log\bigl|g'(y(t),\theta_{NL})\bigr|\Bigr) & \text{(Wiener)},\\[2mm]
\exp\Bigl(-\tfrac{1}{N}\displaystyle\sum_{t=0}^{N-1}\log\bigl|g'(H^{-1}(q,\theta_L)\,y(t),\theta_{NL})\bigr|\Bigr) & \text{(Hammerstein)},
\end{cases}
\tag{17.8}
\]

playing the role of a scalar (i.e., not dependent on the summation index t) correction factor to the classical prediction error cost function; it disappears when no nonlinearity is present (g'(•) = 1 ⇒ gN(y,θ) = 1). Using Parseval's theorem, the cost function can be written in the frequency domain; in the Wiener case, the frequency-domain implementation of the operation H^{-1}(q,θL) is easier, which simplifies the cost function evaluation [27]. Since the cost function has been written as a sum of squares, its minimiser (subject to the identifiability constraints; see Assumption 17.4)

\[ \hat{\theta} = \arg\min_{\theta} V(y_N,\theta) \tag{17.9} \]
can be calculated in a numerically stable way via classical Gauss-Newton based iterative schemes [6]. The most likely input variance λ̂ corresponding to this parameter estimate θ̂ can be found by evaluating (17.6) at θ̂.
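To make the structure of (17.7)-(17.8) concrete, the following MATLAB sketch evaluates the Wiener-case cost function in the time domain. It is a minimal sketch, not the authors' implementation: the variable names, the odd-power polynomial basis for g, and the zero initial conditions of the inverse filter (which introduce transient errors) are illustrative assumptions.

function V = wiener_cost(y, num, den, thNL)
% Wiener-case MLE cost (17.7)-(17.8), time-domain sketch.
% num, den: coefficients of C(z) and D(z) (both monic, num(1) = den(1) = 1).
% thNL: weights of an assumed odd-power basis g(y) = sum_k thNL(k)*y^(2k-1).
y  = y(:);
N  = numel(y);
g  = zeros(N,1);  dg = zeros(N,1);
for k = 1:numel(thNL)                       % evaluate g(y) and its derivative g'(y)
    pw = 2*k - 1;                           % odd powers 1, 3, 5, ...
    g  = g  + thNL(k)*y.^pw;
    dg = dg + thNL(k)*pw*y.^(pw-1);
end
u  = filter(den, num, g);                   % H^{-1}(q,thetaL) g(y(t)) = D/C filtering
gN = exp(-mean(log(abs(dg))));              % scalar correction factor (17.8)
V  = gN^2 * sum(u.^2);                      % sum-of-squares cost (17.7)
end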
17.3.3 Asymptotic Properties To study the asymptotic properties (i.e., as the number of data samples N → ∞) of the MLE, some standard technical assumptions are needed concerning the true model, the considered model set and the excitation.

Assumption 17.5. (consistency)
1. The true model lies within the proposed model set, which satisfies Assumptions 17.1-17.4.
2. The scaled NLL (without parameter ambiguities) L(yN|θ, λ=1)/N has continuous second-order derivatives w.r.t. θ in a compact (i.e., closed and bounded, see Section 2.2 of [10]) set Θ for any N, infinity included. The compact set Θ is constructed such that its unique global minimiser is an interior point of Θ (i.e., not on the boundary, see Section 2.2 of [10]).
3. There exists an N0 such that for any N ≥ N0, infinity included, the Hessian of the expected value of the scaled NLL (without parameter ambiguities), E[L(yN|θ, λ=1)/N], subject to the constraints given in Assumption 17.4, is regular (i.e., invertible) at its unique global minimiser in Θ.
Assumption 17.6. (nonlinear rates)
1. f0(u) = O(|u|^{r1}) with r1 ∈ R+_0 as |u| → ∞.
2. f0'(u) = O(|u|^{r2}) with r2 ∈ R+_0 as |u| → ∞.
3. g(y, θNL) = O(|y|^{r3}) with r3 ∈ R+_0 as |y| → ∞.
4. g'(y, θNL), ∂g/∂θNL(y, θNL,0) and ∂g'/∂θNL(y, θNL,0) satisfy a similar power-law rate in y. Herein, θNL,0 contains the exact parameter values corresponding to θNL.
Theorem 17.2. Under Assumptions 17.1-17.6 the MLE θˆ is consistent, has convergence rate O(N −1/2 ) in probability, is asymptotically efficient, and asymptotically normally distributed. Proof. See [27].
17.3.4 Loss of Consistency in the Case of a Non-Gaussian Input Since the MLE is constructed under the assumption of a white Gaussian input e(t), the consistency of the estimate θ̂ can be lost if e(t) has a non-Gaussian distribution. This effect is due to the nonlinearity; in the linear case the estimate remains consistent [11, 16]. A counterexample illustrating this loss of consistency is given in [27].
17.3.5 Non-white Gaussian Inputs In the Wiener case, if the input whiteness assumption is violated, the identified linear dynamics will include both the noise colouring and the LTI system dynamics, while the nonlinear part will remain unaffected. Without prior knowledge, it is not possible to separate the two. In the Hammerstein case, both the linear and the nonlinear part are expected to be biased.
17.4 Generation of Initial Estimates Since the MLE is defined as the minimiser of a cost function that is nonquadratic in the parameters, the Gauss-Newton minimisation may be affected by local minima. Hence, there is a need for good initial estimates from which to start the iterative procedure. The following properties:
• a linear system operating on a Gaussian input yields a Gaussian output;
• the samples of a white Gaussian signal are independent and identically distributed (i.i.d.);
• a static system operating on an i.i.d. input yields an i.i.d. output;
lead to the observation that, for both the Wiener and the Hammerstein system driven by white Gaussian noise, the input to the LTI part is i.i.d. and the input to the static nonlinearity is Gaussian (see Figure 17.1). Hence, the initial estimates of both the linear and the nonlinear block may be generated by a combination of the following two subproblems:
1. The estimation of the Gaussian input uSNL(t) of a monotonically increasing static nonlinearity from its output signal ySNL(t) (and the nonparametric and parametric estimation of the nonlinear function).
2. The estimation of the white input uLTI(t) of an LTI system from its output signal yLTI(t) (and the parametric estimation of the transfer function).
For a Wiener system, applying 1. to the (measured) output signal yields the intermediate signal (and the static nonlinearity f), from which 2. allows the LTI system H to be estimated. For a Hammerstein system, applying 2. to the (measured) output signal yields the intermediate signal (and the transfer function of the LTI system H), from which 1. allows the static nonlinearity f to be estimated.
17.4.1 Subproblem 1: Monotonically Increasing Static Nonlinearity Driven by Gaussian Noise

The nonlinear function g is monotonically increasing; as a consequence, it preserves order relationships. For every pair (ū, ȳ) satisfying ȳ = g(ū), we have that

\[ F_{U_{SNL}}(\bar{u}) = P(U_{SNL}\le\bar{u}) = P(Y_{SNL}\le\bar{y}) = F_{Y_{SNL}}(\bar{y}) \tag{17.10} \]

with USNL and YSNL the random processes corresponding to uSNL(t) and ySNL(t) respectively, FUSNL and FYSNL the cumulative distribution functions of USNL and YSNL respectively, and P(A) the probability of event A. The following explains how (17.10), equating the corresponding quantiles, allows uSNL(t) to be estimated. First, note that the distribution of USNL is known (within an unknown variance scaling factor). Second, we define the empirical distribution function of YSNL as

\[ \hat{F}_{Y_{SNL}}(y_{SNL}(t)) = \frac{1}{N}\sum_{r=0}^{N-1} I_{\{y_{SNL}(r)\le y_{SNL}(t)\}} - \frac{1}{2N} \tag{17.11} \]

with 1/(2N) the finite-sample correction, and IA the indicator function associated with the statement A, taking the value 1 if A is true and 0 otherwise. Finally, the values of the input signal are reconstructed (within a scaling factor) as

\[ \hat{u}_{SNL}(t) = \Phi^{-1}\bigl(\hat{F}_{Y_{SNL}}(y_{SNL}(t))\bigr) \tag{17.12} \]

with Φ^{-1} the standard normal quantile function. Plotting ûSNL(t) and ySNL(t) against each other gives a nonparametric estimate of the nonlinearity. The variance of each reconstructed input sample can be estimated; these variances are then used as weights in a weighted least squares procedure, yielding the parameter estimate θ̂NL^(0) [27].
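The quantile-matching reconstruction (17.11)-(17.12) is only a few lines of MATLAB. The sketch below is a minimal illustration (ties in the data are ignored, and only core MATLAB functions are used; the standard normal quantile function is expressed via erfinv):

% Minimal sketch of (17.11)-(17.12): reconstruct the Gaussian input of a
% monotone static nonlinearity from its output ySNL, up to a scaling factor.
ySNL = ySNL(:);
N    = numel(ySNL);
[~, rnk] = sort(ySNL);                  % ranks define the empirical CDF
Fhat = zeros(N,1);
Fhat(rnk) = ((1:N)' - 0.5) / N;         % rank/N - 1/(2N), cf. (17.11)
uhat = sqrt(2) * erfinv(2*Fhat - 1);    % standard normal quantile function (17.12)
plot(ySNL, uhat, '.')                   % nonparametric estimate of u = g(y)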
17.4.2 Subproblem 2: LTI Driven by White Input Noise The second subproblem can be solved by applying classical ARMA algorithms to the output yLTI(t) of the LTI system, either in the time domain [11] or the frequency domain
[16]. This also coincides with the special case of a blind linear MLE of θL (cost function (17.7) with g(x) = x). Let this estimate be denoted as θ̂L^(0). After inverse filtering of yLTI(t) with H^{-1}(q, θ̂L^(0)), which is most easily done in the frequency domain, one finds an estimate of the LTI input uLTI(t) (within a scaling factor and transient errors). The order of H should be selected so as to whiten the power spectrum of the reconstructed input signal.
17.5 Minimisation of the Cost Function Once initial parameter estimates are available, the maximum-likelihood sum-of-squares cost function can be minimised via a Gauss-Newton based iterative algorithm (the Levenberg-Marquardt algorithm [12] was used in our implementation). Such algorithms require the Jacobian matrix, defined as the derivative of the residual vector (whose entries are the terms that are summed quadratically in the cost function) w.r.t. the parameters. Analytic expressions of the Jacobian matrix are available in [25, 27].
Practical Implementation of the Constraint In practice, the constraint on the nonlinearity was not implemented as stated in Assumption 17.4.3, but by setting the two-norm of the nonlinear parameter vector in the linear parametrisation to unity:

\[ \|\theta_{NL}\|_2 = 1. \tag{17.13} \]
The overparametrisation was taken care of by leaving all nonlinear coefficients free, taking the pseudoinverse of the Jacobian while solving the normal equations for the parameter updates, and finally rescaling the updated nonlinear coefficients to unit norm. This is justified as follows: the chosen identifiability constraint has no influence on model invariants in identification problems where the cost function is subject to exactly the same parameter ambiguities as the model itself [17].
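The following hedged MATLAB sketch illustrates one damped Gauss-Newton (Levenberg-Marquardt) step with the constraint (17.13) handled exactly as described above: a pseudoinverse update of all coefficients followed by rescaling. The residual vector r, the Jacobian J, the index set iNL of the nonlinear coefficients and the damping value lam are assumed available and are illustrative names, not the authors' code.

% One damped least-squares update with unit-norm rescaling of theta_NL.
lam   = 1e-3;                                            % LM damping (tuned in practice)
nP    = size(J, 2);
step  = -pinv([J; sqrt(lam)*eye(nP)]) * [r; zeros(nP,1)]; % pseudoinverse handles the rank deficiency
theta = theta + step;                                    % update all parameters freely
theta(iNL) = theta(iNL) / norm(theta(iNL));              % rescale to ||theta_NL||_2 = 1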
17.6 The Cramér-Rao Lower Bound Since a parameter estimate without uncertainty information is meaningless, knowledge of the asymptotic covariance matrix of the estimated model parameters is quite valuable. From Section 17.3.3 it follows that, under certain technical conditions, the MLE is asymptotically consistent, efficient and normally distributed. Hence, the asymptotic covariance matrix tends to the Cramér-Rao lower bound (CRB), which is the inverse of the Fisher information matrix (FIM). Analytic expressions and practical calculation methods are available for the FIM in both the Wiener and the Hammerstein case [24, 25, 27]. This makes it possible to calculate uncertainty
bounds with a given confidence level on the estimates and on any model-related quantity (poles, zeros, transfer function, static nonlinear function, etc.).
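As a rough illustration of how such bounds can be obtained in practice, the sketch below approximates the FIM by the Gauss-Newton (J^T J) approximation at the minimiser. This is an assumption-laden sketch, not the analytic FIM of [24, 25, 27]: J (Jacobian of the residuals at θ̂) and lamhat (the estimated input variance) are assumed available, and the pseudoinverse accounts for the constrained parametrisation.

% Approximate parameter uncertainties from the Jacobian at the minimiser.
FIM  = (J.' * J) / lamhat;   % Gauss-Newton approximation of the Fisher information
CRB  = pinv(FIM);            % Cramer-Rao lower bound (pinv: constrained parameters)
stdp = sqrt(diag(CRB));      % standard deviations of the parameter estimates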
17.7 Impact of Output Noise As mentioned before, the assumption of errorless output observations is restrictive but eases the analysis. As a consequence of the presence of output measurement noise in the data (i.e., a violation of Assumption 17.3.2), a bias pops up in the estimates. For additive noise, asymptotically, this bias is, in a first-order approximation, proportional to the variance of the additional unmodelled noise source [25, 27]. Therefore, there exists an output signal-to-noise ratio (SNR) beyond which the bias is insignificant compared to the estimation variability. Hence, the impact of additive output noise is small in high-SNR situations. This effect is also shown in the simulations section.
17.8 Simulation Results Due to space limitations, we restrict ourselves to the presentation of simulation and measurement results for the Wiener case. For Hammerstein systems, the results are very similar [25].
17.8.1 Setup: Presentation of the Example In this section, the results of the method applied to a simulation example are discussed. The setup consists of a 4th-order LTI system with transfer function

\[ H_0(z) = \frac{1 + 0.00z^{-1} - 0.49z^{-2} + 0.01z^{-3} + 0.065z^{-4}}{1 + 0.3676z^{-1} + 0.88746z^{-2} + 0.52406z^{-3} + 0.55497z^{-4}} \tag{17.14} \]

cascaded with a 5th-order polynomial inverse nonlinearity

\[ g_0(y) = \bigl[1\ \ y\ \ y^3\ \ y^5\bigr]\,\theta_{NL,0} \tag{17.15} \]

with the true parameter values θNL,0 chosen as [2 1 4 2]^T. Since θNL,2, θNL,3 and θNL,4 are all strictly positive, the invertibility Assumption 17.1.1 is met (the derivative has the same sign everywhere on the real axis). Since, after the estimation, θ̂NL only approximates θNL within an unknown scaling factor, the variance λ0 of e(t) in Figure 17.1 is chosen to be 1, such that the resulting estimates θ̂NL/√λ̂ can be compared with the true parameters θNL. The simulations consisted of R = 250 runs of N = 8192 samples each, with independent, zero-mean, Gaussian noise excitation e(t). No disturbing output measurement noise is added.
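For concreteness, the following MATLAB sketch shows how one run of this example could be generated; it is a hedged illustration, and computing the forward nonlinearity f0 = g0^{-1} by per-sample root finding is an assumed implementation choice, not necessarily the authors' one.

% Generate one run of the simulation example (17.14)-(17.15).
N   = 8192;
num = [1 0.00 -0.49 0.01 0.065];                 % numerator of H0(z)
den = [1 0.3676 0.88746 0.52406 0.55497];        % denominator of H0(z)
e   = randn(N, 1);                               % white Gaussian input, lambda0 = 1
u   = filter(num, den, e);                       % intermediate signal u = H0(q) e
g0  = @(y) 2 + y + 4*y.^3 + 2*y.^5;              % inverse nonlinearity (17.15)
y   = zeros(N, 1);
for t = 1:N                                      % forward nonlinearity: solve g0(y) = u
    y(t) = fzero(@(yy) g0(yy) - u(t), 0);        % monotone g0 gives a unique root
end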
17.8.2 Graphical Presentation of the Results The transfer function H0 and the nonlinear function u = g0(y) are depicted in Figure 17.3 and Figure 17.4, respectively. Moreover, on each figure, the properties of the mean estimate over the R runs are shown: (i) its sample standard deviation (computed as 1/√R times the sample standard deviation over the R runs), (ii) the absolute value of its difference with the true values, and (iii) the 95% simultaneous confidence level of that difference. The latter is computed by applying the bootstrap method given in [22], using 5000 multivariate normally distributed parameter vectors with mean θ0 and, as covariance matrix, the estimated Cramér-Rao lower bound. Then, the simultaneous confidence bounds of the absolute value of the difference between the true values and a random realisation at 200 points (frequencies or y-values) are scaled by 1/√R to obtain the results shown in Figure 17.3 and Figure 17.4. From these figures, one can see that the difference between the true values and the mean estimate is (a) orders of magnitude smaller than the true values, and (b) smaller than its 95% simultaneous confidence level.
Fig. 17.3 Transfer function of the linear part of the Wiener system, and properties of the mean estimate: (sample) standard deviation, magnitude of the complex difference between the true and the mean estimate, and its 95% simultaneous confidence level
Fig. 17.4 Mapping function u = g(y) of the nonlinear part of the Wiener system, and properties of the mean estimate: (sample) standard deviation, absolute value of the difference between the true and the mean estimate, and its 95% simultaneous confidence level
Fig. 17.5 Normalised histograms (grey bars) of the estimated parameters of the linear system (top: numerator coefficients, bottom: denominator coefficients), their true value (circle below), normalised histogram of initial estimates (white bars, coinciding with histogram of optimised values), probability densities according to mean CRB (bold)
Fig. 17.6 Normalised histograms (grey bars) of the estimated parameters of the nonlinear system, their true value (circle below), normalised histogram of initial estimates (white bars), probability densities according to mean CRB (bold)
Therefore, no systematic error can be detected, which confirms the consistency of the MLE. Moreover, from Figure 17.4 it is clear that the curves (i) and (iii) have a minimum at about the zero crossing of the nonlinearity, which is related to the fact that u(t) is a zero-mean signal, so that most of the data are located around zero. In Figures 17.5 and 17.6, the normalised histograms of the parameters are depicted, together with their true values, the normalised histograms of their initial estimates, and their probability densities estimated from the mean CRB. Discussion: consistent with the MLE properties outlined earlier, the results clearly show (i) the excellent quality of the initial estimates, especially for the linear system (no significant improvements are achieved by the joint optimisation in the given example); (ii) the asymptotic normality of the estimates; (iii) the consistency (the estimates are nicely gathered around the true parameter values); (iv) the asymptotic efficiency of the MLE (the asymptotic covariance matrix equals the CRB). To cross-validate the results, the power spectrum of the intermediate signal, calculated as √λ H, is compared with the spectrum of g(y). Figure 17.7 represents both spectra, together with the standard deviations, computed from the diagonal elements of the matrix cov(H̃) ≈ (∂H̃/∂θ) · cov(θ) · (∂H̃/∂θ)^H with H̃ = √λ H.
Fig. 17.7 Cross-validation by comparing the spectrum of the intermediate signal via the linear block (√λ H) and via the nonlinear block (spectrum of g(y)), for a randomly selected run, together with the true values and the standard deviation. Notice that the model explains the observations very well
17.8.3 Monte Carlo Analysis Showing the Impact of Output Noise The same system as presented in Section 17.8.1 was used (also 8192 points per run) to study the impact of additive output noise on the simulation error of the model, for several output SNR conditions. At each SNR, the system was identified 50 times, and these 50 parameter vectors were averaged. Finally, for a fixed input signal, the root mean squared value (rms) of the difference between the output of the estimated model and the true system's output was calculated, relative to the rms value of the latter. Figure 17.8 shows that the influence of the output noise on the simulation error is small (of the order of a few percent) in high-SNR cases. It is seen that the deviation between the full line and the horizontal dash-dotted line (obtained in a noiseless situation) is insignificant when the SNR exceeds approximately 45 dB. The reason is that the variability of the estimator dominates over the systematic errors (bias) induced by the output noise. Of course, one should still be careful, since this numerical value is valid for this particular simulation example. Nevertheless, the main conclusion "the higher the SNR, the smaller the error" remains valid.
Fig. 17.8 Relative simulation error as a function of the output SNR [dB]; the dash-dotted line indicates the value for SNR = ∞
17.9 Laboratory Experiment To show the potential usefulness of the approach in measurement science, a laboratory experiment has been elaborated, and the measurement results are reported here. The setup, intended to emulate a nonlinear sensor, consisted of a Tektronix AM502 differential amplifier (cutoff frequency set to 3 kHz), connected to a static squaring electrical circuit based on Analog Devices' internally trimmed integrated circuit multiplier AD532KH. Although this nonlinearity is globally non-invertible, the adjustable output DC offset of the differential amplifier was set so as to obtain a positive voltage waveform before the squaring circuit. Consequently, the measurements are located on one branch of the squaring characteristic, on which the nonlinearity is invertible. To avoid electrical loading between the subsystems, wideband 50 Ω buffers were inserted. The devices were excited by means of an Agilent HP E1445A arbitrary waveform generator, and two synchronised Agilent HP E1430A acquisition channels were used. Figure 17.9 depicts the whole measurement setup. The excitation signal was originally generated in MATLAB™ using the "randn" function, producing white Gaussian noise. After the acquisition (32768 data points at a sampling frequency of 78125 Hz), the algorithm was given the output data (CH2) only. The basis functions for the inverse nonlinearity were carefully selected to approximate its true behaviour as well as possible (inverting the quadratic function y = ((u−a)/b)², y > 0, yields u = a + b√y; hence f1(y) = 1 and f2(y) = √y), and the orders of the linear part were chosen so as to produce spectrally white residuals (orders nc/nd = 3/3 were retained). The identification results are shown in Figures 17.10 to 17.12. The reconstructed waveforms cannot be distinguished from Gaussian ones; this is a first validation.
Fig. 17.9 Measurement setup: the arbitrary waveform generator (AWG HP E1445A) drives, via 50 Ω buffers, the Tektronix AM502 differential amplifier followed by the squaring circuit; acquisition channel ACQ CH1 (HP E1430A) measures the intermediate signal and ACQ CH2 (HP E1430A) the output
Finally, the obtained estimates were validated a second time by comparison with the results of a multisine measurement. The input variance as well as all the other settings were kept constant. A special odd multisine (an almost normally distributed periodic signal with only odd harmonics excited) was applied, and one period of ACQ CH1 and CH2 was measured (observe that this is not a blind experiment). The frequency response function (FRF) of the linear part was then calculated between the applied waveform (AWG) and CH1. Afterwards, a pointwise plot of CH1 versus CH2 was used for comparison with the estimate of the static nonlinear function. This validation can be found in Figures 17.13 and 17.14.
Fig. 17.10 Properties of the reconstructed input signal e. Top: histogram, with the black line representing the Gaussian probability density function with variance λ̂ as computed from the model. Bottom: power spectrum (computed from the reconstructed data), with the white line representing the power spectrum λ̂ as computed from the model

Fig. 17.11 Properties of the reconstructed intermediate signal u. Top: histogram, with the black line representing the Gaussian probability density function with variance computed from the model. Bottom: power spectrum (computed from the reconstructed data), with the white line representing the power spectrum λ̂|Ĥ|² as computed from the model
Fig. 17.12 Estimated nonlinearity u = g(y): comparison of the nonparametric initial estimate, the parametric initial estimate, and the optimised parametric estimate
Fig. 17.13 Validation of the linear part H. Top: notice the excellent agreement of both FRFs. Bottom: the grey line indicates the magnitude of the complex difference between the FRFs; the dotted line represents the measurement variance of the multisine-measured FRF

Fig. 17.14 Validation of the nonlinear part u = g(y): the estimated model falls inside the uncertainty band of the multisine measurement
17.10 Conclusion In this chapter, maximum-likelihood estimators for the blind identification of Wiener and Hammerstein systems have been presented. The main assumptions made are the following: the (unobserved) input signal is white Gaussian noise; the linear part is minimum-phase; the nonlinearity is invertible; and the output signal is measured exactly. The MLE cost function can be reduced to a sum-of-squares form: the classical prediction error cost function, multiplied by a correction factor due to the nonlinearity. Iterative Gauss-Newton based methods can be used to calculate the estimates, and a high-quality initialisation can be performed via a two-step procedure. The bias due to output noise is, to first order, proportional to the noise variance and may hence be neglected compared with the estimation variability beyond a certain SNR value. Simulation examples have been included to support the theoretical results. Finally, the method has been successfully applied to a laboratory experiment emulating a nonlinear sensor.
Acknowledgements. This work is sponsored by the Fund for Scientific Research (FWO-Vlaanderen), the Flemish Government (Methusalem Grant METH-1) and the Belgian Program on Interuniversity Poles of Attraction initiated by the Belgian State, Prime Minister's Office, Science Policy programming (IAP-VI/4 - Dysco).
References
1. Abed-Meraim, K., Qiu, W., Hua, Y.: Blind system identification. Proc. IEEE 85(8), 1310–1322 (1997)
2. Bai, E.: A blind approach to the Hammerstein–Wiener model identification. Automatica 38(6), 967–979 (2002)
3. Billings, S., Fakhouri, S.: Identification of systems containing linear dynamic and static nonlinear elements. Automatica 18(1), 15–26 (1982)
4. Clark, C., Chrisikos, G., Muha, M., Moulthrop, A., Silva, C.: Time-domain envelope measurement technique with application to wideband power amplifier modeling. IEEE Trans. Microw. Theory Tech. 46(12), 2531–2540 (1998)
5. Eskinat, E., Johnson, S., Luyben, W.: Use of Hammerstein models in identification of nonlinear systems. AIChE Journal 37(2), 255 (1991)
6. Fletcher, R.: Practical methods of optimization, 2nd edn. Wiley-Interscience, New York (1991)
7. Greblicki, W.: Nonlinearity estimation in Hammerstein systems based on ordered observations. IEEE Trans. Signal Process. 44(5), 1224–1233 (1996)
8. Hunter, I., Korenberg, M.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics 55(2), 135–144 (1986)
9. Kalouptsidis, N., Koukoulas, P.: Blind identification of Volterra-Hammerstein systems. IEEE Trans. Signal Process. 53(8), 2777–2787 (2005)
10. Kaplan, W.: Advanced calculus, 4th edn. Addison-Wesley, Reading (1993)
11. Ljung, L.: System identification: theory for the user, 2nd edn. Prentice Hall, Upper Saddle River (1999)
12. Marquardt, D.: An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11(2), 431–441 (1963)
13. Norquay, S., Palazoglu, A., Romagnoli, J.: Application of Wiener model predictive control (WMPC) to a pH neutralization experiment. IEEE Trans. Control Syst. Technol. 7(4), 437–445 (1999)
14. Nykamp, D., Ringach, D.: Full identification of a linear-nonlinear system via cross-correlation analysis. Journal of Vision 2(1), 1–11 (2002)
15. Peeters, B., De Roeck, G.: Stochastic system identification for operational modal analysis: a review. Journal of Dynamic Systems, Measurement, and Control 123(4), 659–667 (2001)
16. Pintelon, R., Schoukens, J.: Box–Jenkins identification revisited – Part I: Theory. Automatica 42(1), 63–75 (2006)
17. Pintelon, R., Schoukens, J., Vandersteen, G., Rolain, Y.: Identification of invariants of (over)parameterized models: finite sample results. IEEE Trans. Autom. Control 44(5), 1073–1077 (1999)
18. Prakriya, S., Hatzinakos, D.: Blind identification of LTI-ZMNL-LTI nonlinear channel models. IEEE Trans. Signal Process. 43(12), 3007–3013 (1995)
19. Prakriya, S., Hatzinakos, D.: Blind identification of linear subsystems of LTI-ZMNL-LTI models with cyclostationary inputs. IEEE Trans. Signal Process. 45(8), 2023–2036 (1997)
20. Schoukens, J., Nemeth, J., Crama, P., Rolain, Y., Pintelon, R.: Fast approximate identification of nonlinear systems. Automatica 39(7), 1267–1274 (2003)
21. Taleb, A., Solé, J., Jutten, C.: Quasi-nonparametric blind inversion of Wiener systems. IEEE Trans. Signal Process. 49(5), 917–924 (2001)
22. Tjärnström, F.: Computing uncertainty regions with simultaneous confidence degree using bootstrap. In: Proc. 12th IFAC Symp. System Identification, Santa Barbara, CA, pp. 522–527 (2000)
23. Tong, L., Perreau, S.: Multichannel blind identification: from subspace to maximum likelihood methods. Proc. IEEE 86(10), 1951–1968 (1998)
24. Vanbeylen, L.: Calculation of the Fisher information matrix for blind ML Hammerstein identification. Tech. rep., Vrije Universiteit Brussel (2007), http://wwwir.vub.ac.be/elec/Papers%20on%20webPapers/slashLaurentVBeylen/FisherHamm.pdf
25. Vanbeylen, L., Pintelon, R., Schoukens, J.: Blind maximum likelihood identification of Hammerstein systems. Automatica 44(12), 3139–3146 (2008)
26. Vanbeylen, L., Pintelon, R., de Groen, P.: Blind maximum likelihood identification of Wiener systems with measurement noise. In: Proc. 15th IFAC Symp. System Identification, Saint-Malo, France, pp. 1686–1691 (2009a)
27. Vanbeylen, L., Pintelon, R., Schoukens, J.: Blind maximum-likelihood identification of Wiener systems. IEEE Trans. Signal Process. 57(8), 3017–3029 (2009b)
28. Wang, W., Henriksen, R.: Generalized predictive control of nonlinear systems of the Hammerstein form. Modeling, Identification and Control 15(4), 253–262 (1994)
29. Zhu, Y.: Identification of Hammerstein models for control using ASYM. International Journal of Control 73(18), 1692–1702 (2000)
Chapter 18
A Blind Approach to Identification of Hammerstein Systems Jiandong Wang, Akira Sano, Tongwen Chen, and Biao Huang
Jiandong Wang: Department of Industrial Engineering and Management, College of Engineering, Peking University, Beijing, China 100871, e-mail: [email protected]
Akira Sano: Department of System Design Engineering, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan 223-8522, e-mail: [email protected]
Tongwen Chen: Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada T6G 2V4, e-mail: [email protected]
Biao Huang: Department of Chemical and Materials Engineering, University of Alberta, Edmonton, Alberta, Canada T6G 2G6, e-mail: [email protected]

18.1 Introduction Hammerstein systems form a class of block-oriented nonlinear models, in which a static nonlinearity precedes a linear dynamic system. A large body of work exists on the identification of Hammerstein systems; the available methods can be classified into the ten methods in Section 3.9 of [7] or the four groups in Chapter 1 of [8]. This chapter focuses on the blind approach that was formulated in [15] and further studied in [2, 17]. This type of approach assumes the input to be piece-wise constant for a certain number of consecutive samples and addresses the main difficulty in the identification of Hammerstein systems: the inner signal between the nonlinearity and the linear system is unmeasurable. Such approaches have two important merits: (i) identification of Hammerstein systems is possible even without an explicit parametrisation of the nonlinearity, and (ii) the nonlinearity does not have to be static, but could be the
one with finite memories, such as the hysteresis-backlash and relay-backlash relays studied in [4, 6]. These merits are valuable when the nonlinearity has many possible structures or is hard to represent by parametric models. In particular, one industrial application is to capture the nonlinearities of faulty control valves in feedback control systems. The nonlinearity of control valves has a variety of possible structures, e.g., deadband, saturation, and backlash [5]. The contribution of this chapter is to propose a new blind approach to the identification of Hammerstein systems without an explicit parametrisation of the nonlinearity. The new approach has two major differences with the existing blind approaches in [15, 2, 17]: (i) only the denominator of the linear system is identified, while the existing blind approaches estimate the denominator first and then use the estimated denominator to yield the numerator; (ii) the unmeasurable inner signal is estimated by a subspace direct equalisation method. [15, 17] estimated the inner signal by least-squares methods, even though the forms and complexities of their methods are different; [2] did so by passing output measurements through an inverse of the identified linear system, without considering noise effects. The two major differences lead to a significant improvement: as noted by [2, 17] (see Table 4 therein), errors in estimating the denominator are propagated to the estimated numerator. Skipping the numerator estimation removes this error propagation completely. The rest of the chapter is organised as follows. Section 18.2 describes the problem to be solved and the necessary assumptions. Sections 18.3 and 18.4 are devoted to estimating the denominator of the linear system and the unmeasurable inner signal, respectively. Section 18.5 lists the detailed steps of the proposed approach and compares the performance of three different approaches in terms of inner signal estimation via numerical examples. Section 18.6 illustrates and validates the proposed approach by an experimental example of modelling magneto-rheological dampers. Some concluding remarks are given in Section 18.7.
18.2 Problem Description Consider a discrete-time Hammerstein system with sampling period h, depicted in Figure 18.1. The Hammerstein system is composed of a linear time-invariant (LTI) causal dynamic system G(q) and a nonlinearity f(·). The output y(t) is measured with an additive coloured noise v(t). The Hammerstein system possibly works in closed-loop operation, so that the measured input u(t) may be correlated with v(t−d) for d ≥ 1. The inner signal x(t) is unmeasurable; this is the main difficulty in
Fig. 18.1: A discrete-time Hammerstein system with sampling period h
identification of Hammerstein systems. Here q^{-1} is the backward shift operator: q^{-1}x(t) = x(t−1). The objective is to identify G(q) and f(·) from the measurements of u(t) and y(t). We make the following assumptions throughout the chapter:

A1. The input u(t) is piece-wise constant for p consecutive samples, where p is a positive integer.
A2. The input u(t) is persistently excited (PE) with multiple-level amplitudes, so that the inner signal x(t) is PE and the shape of the nonlinearity f(·) can be revealed from a plot of x(t) and u(t).
A3. The linear system and noise dynamics can be described by an AutoRegressive with eXternal input (ARX) model,

\[ y(t) = \frac{B(q)}{A(q)}\,x(t-\tau) + \frac{1}{A(q)}\,e(t), \tag{18.1} \]

where A(q) = 1 + a1 q^{-1} + a2 q^{-2} + ... + a_{na} q^{-na} and B(q) = b1 q^{-1} + b2 q^{-2} + ... + b_{nb} q^{-nb}. The noise source e(t) is white with zero mean and variance σ².
A4. If the time delay is decomposed as τ = kp + τ0 for k ∈ Z+ and τ0 ∈ [0, p), then k is known a priori. Here Z+ stands for the set of nonnegative integers.
A5. The upper bound n_b^0 of the numerator order nb is known a priori, and p is no less than n_b^0 + 1, i.e., p ≥ n_b^0 + 1.
A6. B(q) does not have a zero at 1, i.e., ∑_{j=1}^{nb} b_j ≠ 0.
A7. The nonlinearity f(·) preserves its input's piece-wise constant property.
Assumption A1 is satisfied in several scenarios. A common one arises from the user's design, by introducing a piece-wise constant u(t) (see the experimental example in Section 18.6). Another scenario occurs in sampled-data systems with fast-sampled outputs (see Figure 18.3 in Section 18.5), where the output is sampled p times faster than the input updating period ph; owing to the zero-order hold (ZOH), a fast-rate input with sampling period h is then available by interpolation and has the piece-wise constant property. Assumption A2 is a standard identifiability condition for Hammerstein systems; see, e.g., Section III-E in [2] for a detailed discussion. For Hammerstein systems, "identifiability" is understood up to a gain ambiguity between f(·) and G(q); the ambiguity can be removed by adding some constraints (see the last paragraph of Section 18.4). Assumption A4 essentially arises from a characteristic of blind identification: the information of the output only (without x(t) as the input of G(q)) cannot distinguish between the time delays τ1 = k1 p + τ0 and τ2 = k2 p + τ0 for k1 ≠ k2 (Section 3.4 in [17]). We assume τ ∈ [0, p) in the sequel without loss of generality, since the known portion of τ can always be removed by shifting the output data. Assumption A5
is inherent in all blind approaches; in fact, Theorem 2.1 in [3] says that Assumption A5 is a sufficient and necessary condition for G(q) to be blindly identifiable from the information of y(t) only. Assumption A6 makes the estimation of the inner signal unique, and is a mild assumption satisfied by many systems. Assumption A7 says that if the input u(t) of the nonlinearity is piece-wise constant, then the output x(t) of the nonlinearity inherits the same piece-wise constant property from u(t); e.g., under Assumption A1,

\[ x(t) - x(t-1) = 0, \quad \text{for } kp+1 \le t \le kp+p-1,\ \forall k \in \mathbb{Z}^+. \tag{18.2} \]

Equation (18.2) holds for static nonlinearities as well as for certain nonlinearities with finite memories, such as the hysteresis-backlash and relay-backlash relays studied in [4, 6]. Hence, Hammerstein systems in this context are not limited to the traditional class where only a static nonlinearity is involved.¹

¹ In general, for a nonlinearity f(·) with finite memories, x(t) may be piece-wise constant for p̄ consecutive samples, where p̄ ≤ p; e.g., the nonlinearity x(t) = f(u(t), u(t−1)) generically implies p̄ = p − 1. However, as long as p̄ ≥ n_b^0 + 1, the proposed approach is still valid.
18.3 Estimation of na, τ and A(q) This section presents the details of estimating the denominator order na, the time delay τ and the parameters ai in A(q). Subtracting two consecutive samples of the output in (18.1), y(t) and y(t−1), yields

\[ \bigl(y(t)-y(t-1)\bigr) + \sum_{i=1}^{n_a} a_i\,\bigl(y(t-i)-y(t-i-1)\bigr) = \sum_{j=1}^{n_b} b_j\,\bigl(x(t-\tau-j)-x(t-\tau-j-1)\bigr) + e(t) - e(t-1). \tag{18.3} \]

Let t = kp + l for k ∈ Z+ and l ∈ [0, p). Define Δy^(l)(k) := y(kp+l) − y(kp+l−1), and Δx^(l)(k) and Δe^(l)(k) analogously. Equation (18.3) becomes

\[ \Delta_y^{(l)}(k) + \sum_{i=1}^{n_a} a_i\,\Delta_y^{(l-i)}(k) = \sum_{j=1}^{n_b} b_j\,\Delta_x^{(l-\tau-j)}(k) + \Delta_e^{(l)}(k). \tag{18.4} \]

The property in (18.2) and Assumption A2 imply that

\[ \Delta_x^{(l)}(k) = 0 \ \text{ for } l = 1, 2, \dots, p-1, \qquad \Delta_x^{(l)}(k) \neq 0 \ \text{ for } l = 0. \tag{18.5} \]
Assumptions A4 and A5, namely τ ∈ [0, p) and p ≥ n_b^0 + 1 ≥ nb + 1, imply that if l ∈ [nb+1+τ, p+τ], then the set {l−τ−nb, l−τ−nb+1, ..., l−τ−1} is a subset of {1, 2, ..., p−1}. Using the property in (18.5), (18.4) reduces to²

\[ \Delta_y^{(l)}(k) + \sum_{i=1}^{n_a} a_i\,\Delta_y^{(l-i)}(k) = \Delta_e^{(l)}(k), \quad \forall l \in [n_b+1+\tau,\ p+\tau] \ \text{and}\ \forall k \in \mathbb{Z}^+. \tag{18.6} \]

² Rationale of Assumption A4: if the time delay τ had a portion that is an integer multiple of p, that portion could not be seen from (18.5) and (18.6).
By introducing the notation

\[ \theta_a = [a_1,\ a_2,\ \dots,\ a_{n_a}]^T, \qquad \phi_y^{(l)}(k) = \bigl[-\Delta_y^{(l-1)}(k),\ -\Delta_y^{(l-2)}(k),\ \dots,\ -\Delta_y^{(l-n_a)}(k)\bigr]^T, \]

(18.6) is rewritten as

\[ \Delta_y^{(l)}(k) = \bigl(\phi_y^{(l)}(k)\bigr)^T \theta_a + \Delta_e^{(l)}(k), \quad \forall l \in [n_b+1+\tau,\ p+\tau] \ \text{and}\ \forall k \in \mathbb{Z}^+. \tag{18.7} \]

Equation (18.7) is a static linear regression that is linear in the parameters ai. However, if the ordinary least-squares method (LSM) is applied to (18.7) with the collected output {y(t)}_{t=1}^N, i.e.,

\[ \bar{\theta}_a^{(l)} = \Bigl[\frac{1}{K}\sum_{k=0}^{K-1}\phi_y^{(l)}(k)\bigl(\phi_y^{(l)}(k)\bigr)^T\Bigr]^{-1} \frac{1}{K}\sum_{k=0}^{K-1}\phi_y^{(l)}(k)\,\Delta_y^{(l)}(k), \]

the resulting estimate θ̄a^(l) would be biased. Here K is the largest integer less than or equal to N/p. The bias arises from the correlation of the noise term Δe^(l)(k) = e(kp+l) − e(kp+l−1) with the regressor Δy^(l−1)(k) = y(kp+l−1) − y(kp+l−2). The difference between θ̄a^(l) and θa is

\[ \bar{\theta}_a^{(l)} - \theta_a = \Bigl[\frac{1}{K}\sum_{k=0}^{K-1}\phi_y^{(l)}(k)\bigl(\phi_y^{(l)}(k)\bigr)^T\Bigr]^{-1} \frac{1}{K}\sum_{k=0}^{K-1}\phi_y^{(l)}(k)\,\Delta_e^{(l)}(k). \]

As K → ∞,

\[ \lim_{K\to\infty}\frac{1}{K}\sum_{k=0}^{K-1}\phi_y^{(l)}(k)\,\Delta_e^{(l)}(k) = E\bigl\{\phi_y^{(l)}(k)\,\Delta_e^{(l)}(k)\bigr\} = \sigma^2\,[1,\ 0,\ \cdots,\ 0]^T. \]

Here E{·} is the expectation operator. Recall that σ² is the variance of the noise source e(t) (see Assumption A3). Thus, θa can be estimated without bias by explicitly compensating for the noise effect, i.e.,
\[ \hat{\theta}_a^{(l)} = \Bigl[\frac{1}{K}\sum_{k=0}^{K-1}\phi_y^{(l)}(k)\bigl(\phi_y^{(l)}(k)\bigr)^T\Bigr]^{-1} \Bigl(\frac{1}{K}\sum_{k=0}^{K-1}\phi_y^{(l)}(k)\,\Delta_y^{(l)}(k) - \bigl[\hat{\sigma}^2,\ 0,\ \cdots,\ 0\bigr]^T\Bigr) =: \bigl(\Phi_y^{(l)}\bigr)^{-1}\eta_y^{(l)}, \quad \forall l \in [n_b+1+\tau,\ p+\tau]. \tag{18.8} \]

A consistent estimate of σ² was developed in Theorem 1 in [9] and Lemma 3 in [15]: σ̂² is the smaller of the two roots (Ω1, Ω2) of a quadratic equation in Ω, i.e., σ̂² = min(Ω1, Ω2). The quadratic equation in Ω is

\[ 0.5\,\Phi_{11}\,\Omega^2 - \Omega + 0.5\,\bar{\sigma}^2 = 0, \]

where

\[ \Phi_{11} = \Bigl[\bigl(\Phi_y^{(l)}\bigr)^{-1}\Bigr]_{11}, \qquad \bar{\sigma}^2 = \frac{1}{K}\sum_{k=0}^{K-1}\Bigl(\Delta_y^{(l)}(k) - \bigl(\phi_y^{(l)}(k)\bigr)^T\bar{\theta}_a^{(l)}\Bigr)^2. \]
Here {·}₁₁ stands for the element in the first row and first column of the operand matrix.

Theorem 18.1. Under Assumptions A2-A5, the estimated parameter θ̂a^(l) in (18.8) is consistent, i.e., θ̂a^(l) → θa as K → ∞.

Proof. The consistency of θ̂a^(l) can be proved analogously to the counterpart proof of Theorem 1 in [15].

As θ̂a^(l) in (18.8) is based on the linear regression in (18.7), na and τ can be determined by standard model structure determination methods, e.g., those in Section 11.5 of [13] and Section 16.4 of [11]. Here, the so-called Akaike information criterion (AIC) is adopted,

\[ V^{(l)}(n_a) = \Bigl(1 + \frac{2 n_a}{K}\Bigr)\,\frac{1}{K}\sum_{k=0}^{K-1}\Bigl(\Delta_y^{(l)}(k) - \bigl(\phi_y^{(l)}(k)\bigr)^T\hat{\theta}_a^{(l)}\Bigr)^2. \tag{18.9} \]

As nb is unknown, we may choose l = p + τ. Then, na and τ would be the integers associated with the minimum value of V^(p+τ̂)(n̂a) among the different combinations of n̂a and τ̂; see also Section 3.4 in [17] for the rationale of estimating τ from the information of y(t) only. This is illustrated later via the experimental example in Section 18.6.
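A hedged MATLAB sketch of the bias-compensated LSM (18.8) and the AIC value (18.9), for one phase index l, is given below. The variable names are illustrative, the sample indices are assumed to stay inside the record, and the roots of the quadratic are real in theory; this is a sketch under those assumptions, not the authors' implementation.

% Bias-compensated LSM (18.8) and AIC (18.9) for one phase index l.
% Inputs assumed available: y (output record), p (fast-sampling ratio),
% na (trial order), l (phase index in [nb+1+tau, p+tau]).
K    = floor(numel(y)/p);
dy   = @(l0,k) y(k*p + l0) - y(k*p + l0 - 1);        % Delta_y^{(l0)}(k)
ks   = 1:K-2;                                        % k range kept inside the record
phi  = zeros(na, numel(ks));
for i = 1:na
    phi(i,:) = -arrayfun(@(k) dy(l-i, k), ks);       % regressor phi_y^{(l)}(k)
end
Dy   = arrayfun(@(k) dy(l, k), ks).';
Phi  = (phi*phi.') / numel(ks);
etaR = (phi*Dy)    / numel(ks);
thb  = Phi \ etaR;                                   % biased LS estimate
s2b  = mean((Dy - phi.'*thb).^2);                    % sigma-bar^2
Pinv = inv(Phi);
Om   = roots([0.5*Pinv(1,1), -1, 0.5*s2b]);          % 0.5*Phi11*Om^2 - Om + 0.5*s2b = 0
s2   = min(Om);                                      % consistent noise variance estimate
tha  = Phi \ (etaR - [s2; zeros(na-1,1)]);           % bias-compensated estimate (18.8)
AIC  = (1 + 2*na/numel(ks)) * mean((Dy - phi.'*tha).^2);  % cf. (18.9)

Sweeping this computation over candidate pairs (n̂a, τ̂) and picking the minimiser of the AIC value implements the order and delay selection described above.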
Remark 18.1. We can also use the instrumental variable method (IVM) to estimate the parameters in A(q). Based on (18.7), the parameters in A(q) are estimated by the (extended) IVM,

\[ \hat{\theta}_a^{(l)} = \Bigl[\frac{1}{K}\sum_{k=0}^{K-1}\zeta(k)\bigl(\phi_y^{(l)}(k)\bigr)^T\Bigr]^{\dagger} \frac{1}{K}\sum_{k=0}^{K-1}\zeta(k)\,\Delta_y^{(l)}(k), \quad \forall l \in [n_b+1+\tau,\ p+\tau], \tag{18.10} \]

where

\[ \zeta(k) = \bigl[-\Delta_y^{(l-2)}(k),\ -\Delta_y^{(l-3)}(k),\ \cdots,\ -\Delta_y^{(l-m)}(k)\bigr]^T, \quad m \ge n_a + 1. \tag{18.11} \]

The superscript (†) represents the Moore-Penrose pseudoinverse. Because the elements of ζ(k) in (18.11) are independent of Δe^(l)(k) in (18.7), the consistency of θ̂a^(l) in (18.10) follows directly from the well-known theory of the IVM, e.g., Chapter 8 in [13]. Along with its simplicity, the IVM suffers from the drawback of being sub-optimal, i.e., the variances of the estimated parameters are usually larger than those of the estimates from the bias-compensated LSM; however, the IVM can play a role in cross-checking or validation, i.e., we would have more confidence in the estimates from the bias-compensated LSM if they match the counterparts from the IVM.

Remark 18.2. As θ̂a^(l) in (18.8) is valid for l ∈ [nb+1+τ, p+τ], we may adopt an aggregated estimation of θa to exploit more data points, in order to reduce noise effects,

\[ \hat{\theta}_a = \begin{bmatrix} \Phi_y^{(n_b+1+\tau)}\\ \Phi_y^{(n_b+2+\tau)}\\ \vdots\\ \Phi_y^{(p+\tau)} \end{bmatrix}^{\dagger} \begin{bmatrix} \eta_y^{(n_b+1+\tau)}\\ \eta_y^{(n_b+2+\tau)}\\ \vdots\\ \eta_y^{(p+\tau)} \end{bmatrix}. \tag{18.12} \]

On the basis of θ̂a^(l) in (18.10), an aggregated estimation of θa from the IVM can be defined analogously. As nb is unknown a priori, it may be replaced by the upper bound n_b^0 or by n̂a (because G(q) is usually proper). Equation (18.12) is actually used in the experimental example in Section 18.6 later.
18.4 Estimation of x(t) This section studies how to estimate the inner signal x(t) after A(q) and τ have been identified. If y(t+τ) is filtered by A(q), i.e., w(t) := A(q)y(t+τ), then a finite impulse response (FIR) system is obtained from (18.1),

\[ A(q)\,y(t+\tau) =: w(t) = \sum_{j=1}^{n_b} b_j\,x(t-j) + e(t+\tau). \tag{18.13} \]
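In MATLAB, this filtering step is a one-liner; in the sketch below, the variable names ahat (the estimated coefficient vector [1, â1, ..., â_na]) and tauhat (the estimated delay) are illustrative assumptions, and the filter transient is ignored.

% Form w(t) = A(q) y(t + tau) from the estimates (up to transient errors).
w = filter(ahat, 1, y(tauhat+1:end));   % FIR filtering by A(q, thetahat)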
Owing to the property in (18.2), it is straightforward but tedious to see that (18.13) is equivalent to a slow-rate single-input multiple-output (SIMO) FIR system with sampling period ph,

\[ W(n) = \sum_{i=0}^{1} H(i)\,X(n-i) + E(n), \tag{18.14} \]

where

\[
H(0) = \begin{bmatrix} h_1(0)\\ h_2(0)\\ \vdots\\ h_{n_b-1}(0)\\ h_{n_b}(0)\\ \vdots\\ h_p(0) \end{bmatrix}
= \begin{bmatrix} b_1\\ b_1+b_2\\ \vdots\\ \sum_{j=1}^{n_b-1} b_j\\ \sum_{j=1}^{n_b} b_j\\ \vdots\\ \sum_{j=1}^{n_b} b_j \end{bmatrix},
\qquad
H(1) = \begin{bmatrix} h_1(1)\\ h_2(1)\\ \vdots\\ h_{n_b-1}(1)\\ h_{n_b}(1)\\ \vdots\\ h_p(1) \end{bmatrix}
= \begin{bmatrix} \sum_{j=2}^{n_b} b_j\\ \sum_{j=3}^{n_b} b_j\\ \vdots\\ b_{n_b}\\ 0\\ \vdots\\ 0 \end{bmatrix},
\]

\[
W(n) = \begin{bmatrix} w_1(n)\\ w_2(n)\\ \vdots\\ w_p(n) \end{bmatrix}
:= \begin{bmatrix} w(pn+1)\\ w(pn+2)\\ \vdots\\ w(pn+p) \end{bmatrix},
\qquad
E(n) = \begin{bmatrix} e_1(n)\\ e_2(n)\\ \vdots\\ e_p(n) \end{bmatrix}
:= \begin{bmatrix} e(pn+1)\\ e(pn+2)\\ \vdots\\ e(pn+p) \end{bmatrix},
\qquad
X(n) := x(pn).
\]

The SIMO FIR system in (18.14) has two characteristics: (i) under Assumption A3, E(n) is white noise, i.e., e1(n), e2(n), ..., ep(n) are independent of each other and have the same variance; (ii) X(n) is usually coloured, with unknown statistics. The first characteristic meets the basic assumption of blind equalisation in the field of communication, while the second rules out many direct equalisation methods that assume X(n) either to be white or to have known statistics, e.g., the linear prediction blind equalisation method [1]. Next, a subspace direct equalisation (SDE) method ([10, 16]) is borrowed, with some minor modifications for the special system in (18.14), to estimate the inner signal X(n).

Assume E(n) to be absent for the time being, and construct a blocked Hankel data matrix of size Lp × (K − L) from {w(t)}_{t=1}^N,

\[
\mathbf{W}(L) = \begin{bmatrix}
W(2) & W(3) & \cdots & W(K-L+1)\\
W(3) & W(4) & \cdots & W(K-L+2)\\
\vdots & \vdots & \ddots & \vdots\\
W(L+1) & W(L+2) & \cdots & W(K)
\end{bmatrix}.
\]
Here L is a positive integer called the equaliser length, and K was defined in Section 18.3. The selection of L involves a tradeoff between the higher computational
cost and the increased estimation accuracy of a larger L. Based on (18.14), W(L) has the factorisation

\[
\mathbf{W}(L) = \begin{bmatrix}
H(1) & H(0) & 0 & \cdots & 0\\
0 & H(1) & H(0) & \ddots & \vdots\\
\vdots & \ddots & \ddots & \ddots & 0\\
0 & \cdots & 0 & H(1) & H(0)
\end{bmatrix}
\cdot
\begin{bmatrix}
X(1) & X(2) & \cdots & X(K-L)\\
X(2) & X(3) & \cdots & X(K-L+1)\\
\vdots & \vdots & \ddots & \vdots\\
X(L+1) & X(L+2) & \cdots & X(K)
\end{bmatrix}
=: \mathcal{H}\cdot\mathbf{X}(L+1).
\tag{18.15}
\]

The Lp × (L+1) matrix H has no fewer rows than columns, because L ≥ 1 and p is greater than 1 under Assumption A1. Then, H has full column rank if and only if all the p channels of the SIMO FIR system in (18.14) do not share any common zero except at infinity (Theorem 1 in [12]), which is satisfied by Assumption A6, as shown in the proof of Theorem 18.2 later. As H has full column rank, the null space of W(L) in (18.15) is the same as the null space of X(L+1), i.e., null(W(L)) = null(X(L+1)) =: W_n. Let the singular value decomposition (SVD) of W(L) be W(L) = V_L Λ V_R^T. Because x(t) is persistently excited under Assumption A2, the row space of X(L+1) has dimension L+1. The first L+1 vectors in V_R constitute the row space; the remaining vectors form the null space, i.e., in MATLAB™ notation, W_n = V_R(:, (L+2):(K−L)). As W_n is the null space of the Hankel matrix X(L+1), we have

\[ \bigl[X(k)\ \ X(k+1)\ \cdots\ X(k+K-L-1)\bigr]\,\mathbf{W}_n = 0, \quad \forall k \in [1,\ L+1]. \tag{18.16} \]

These L+1 groups of equations in (18.16) imply

\[
\begin{bmatrix}
\mathbf{W}_n^T & & & \\
& \mathbf{W}_n^T & & \\
& & \ddots & \\
& & & \mathbf{W}_n^T
\end{bmatrix}
\begin{bmatrix} X(1)\\ X(2)\\ \vdots\\ X(K) \end{bmatrix} = 0,
\]

where the k-th block row of the left matrix contains W_n^T shifted k−1 columns to the right, or, in concise form,

\[ \mathbf{W}_N \cdot \mathbf{X} = 0. \tag{18.17} \]
The size of W_N is (L+1)(K−2L−1) × K. The following theorem proves the uniqueness of the estimated inner signal.

Theorem 18.2. Under Assumptions A2 and A6, and with noise variance σ² = 0, X is the unique nontrivial solution of (18.17), up to a scalar ambiguity.

Proof. It follows from Theorem 1 in [10] by noticing two facts. (i) The persistent excitation in Assumption A2 is equivalent to the concept of sufficient modes in the field of communication. (ii) Assumption A6 guarantees that all the p channels of the SIMO FIR system in (18.14) do not share any common zero except at infinity, which can be seen as follows. In this context, the i-th channel is H_i(q) = h_i(0) + h_i(1)q^{-1}. Hence, making all the p channels free of any common zero except at infinity is achieved by the condition that the matrix

\[
\begin{bmatrix}
h_1(0) & h_1(1)\\
h_2(0) & h_2(1)\\
\vdots & \vdots\\
h_{n_b-1}(0) & h_{n_b-1}(1)\\
h_{n_b}(0) & h_{n_b}(1)\\
\vdots & \vdots\\
h_p(0) & h_p(1)
\end{bmatrix}
=
\begin{bmatrix}
b_1 & \sum_{j=2}^{n_b} b_j\\
b_1+b_2 & \sum_{j=3}^{n_b} b_j\\
\vdots & \vdots\\
\sum_{j=1}^{n_b-1} b_j & b_{n_b}\\
\sum_{j=1}^{n_b} b_j & 0\\
\vdots & \vdots\\
\sum_{j=1}^{n_b} b_j & 0
\end{bmatrix}
\]

has a trivial null space. This condition is satisfied by the mild Assumption A6, i.e., ∑_{j=1}^{nb} b_j ≠ 0.

Let us now consider the noise effect. Because the noise E(n) in (18.14) is white, it follows from Section D in [10] that the least-squares solution of (18.17),

\[ \min_{\mathbf{X}} \|\mathbf{W}_N \cdot \mathbf{X}\|_2, \quad \text{s.t.}\ \|\mathbf{X}\|_2 = 1, \]

yields unbiased and consistent estimates; thus, if the SVD of W_N is W_N = X_L Λ_N X_R^T, then X̂ is the last column of X_R, i.e.,

\[ \hat{\mathbf{X}} = X_R(:,\ K). \tag{18.18} \]
18
A Blind Approach to Identification of Hammerstein Systems
303
18.5 Numerical Examples This section first lists the detailed steps of the proposed blind approach. Second, it theoretically analyses the difference among the proposed approach and the existing blind approaches. Finally, two numerical examples are provided to validate the analysis and demonstrate the proposed approach as a promising approach of capturing control valve stiction, respectively. The proposed blind approach consists of the following steps: 1. The order na and time delay τ are determined as the integers associated with the minimum value of V (p+τˆ) (nˆ a ) in (18.9) among different combinations of nˆ a and τˆ . Both the bias-compensated LSM in (18.8) and the IVM in (18.10) can be used for the cross-checking purpose, i.e., if the two methods are in consistency, we would have more confidence on the estimates. 2. The parameters ai ’s in A(q) are estimated by the bias-compensated LSM in (18.8) with l = p + τˆ or their aggregated counterparts like (18.12) for all valid l’s. The estimates from the IVM in (18.10) can be used for the cross-checking purpose. ˆ 3. With A(q) and τˆ , the measured output y(t) is shifted forward by τˆ and filtered ˆ ˆ through A(q), i.e., w(t) = A(q)y (t + τˆ ). 4. The inner signal X(n) is obtained by the SDE method in (18.18), and its fast-rate ˆ counterpart x(t) ˆ is obtained from X(n) by piece-wise constant interpolation. 5. The shape of nonlinearity f (·) can be visualised from a graph of u(t) and x(t) ˆ ˆ (or U(n) and X(n)) to determine a suitable function of f (·). Then, parameters of f (·) can be estimated via least-squares nonlinear curve fitting, e.g., by the ‘lsqcurvefit’ function in MATLABTM Optimization Toolbox. 6. Identification of G(q) from x(t) ˆ and y(t) is standard, e.g., by the ‘arx’ function in MATLABTM System Identification Toolbox. These steps are illustrated in the experimental example in Section 18.6 later. Once the inner signal is estimated, identification of Hammerstein systems reduces to two standard problems in the steps 5 and 6. Thus, all the blind approaches are mainly different at the road to reach the inner signal. In particular, the steps 1-4 of the proposed approach consist of the estimation of A(q) and x(t), while the existing blind approaches [15, 1, 17] are composed of the estimation of A(q), B(q) and x(t). Let us look at the common steps in details. • In terms of the estimation of A(q), all the approaches under comparison are on the same basis of (18.7), even though [2] did not consider the effect of the noise (l) Δe (k). Thus, if the differences at presentations and minor technical details are ignored, all the approaches can be regarded having the same estimation of A(q). • In terms of the estimation of x(t), [2] estimated x(t) by passing output measurements through an inverse of the identified linear system without considering noise effects; [15, 17] avoided the inverse of the identified linear system by exploring the idea of least-squares methods, even though the forms and complexities of their methods were very different; the proposed approach exploits a SDE method that does not require the full information of linear system, i.e., the numerator B(q) is not required.
304
J. Wang et al.
Therefore, the major difference of all the approaches under comparison lies at the method estimating x(t). Figure 18.2 depicts a flow chart of reaching x(t) ˆ via ˆ INV three methods, namely, x(t) ˆ LSM via the least-squares method in [15, 17], x(t) via the linear system inverse method in [2], and x(t) ˆ SDE via the SDE method in the ˆ proposed approach. As the existing blind approaches estimate B(q) based on A(q), ˆ ˆ the error between A(q) and A(q) is unavoidably propagated into B(q). This was first pointed out by [2] and numerically demonstrated by [17] (see Table 4 therein). On the contrary, the proposed approach skips the estimation of B(q) and removes such an error propagation, which may lead to significant improvement, as shown in the next example.
Fig. 18.2: A flow chart of reaching x(t) ˆ by three methods
Example 1: In Figure 18.3, the continuous-time process Gc (s) works in a feedback loop with a pure gain controller, i.e., Gc (s) =
0.4095s + 1.0921 −0.36s e , C(z) = 0.1. s2 + 0.32s + 0.02
The process noise v is generated by passing zero-mean white noise with variance σ 2 through a noise dynamics, Nc (s) =
1 1 · , s2 + 0.32s + 0.02 TN s + 1
(18.19)
where TN is the time constant of a noise dynamics extra to 1/ s2 + 0.32s + 0.02 . If TN reduces to zero, the example in [17] is recovered. The nonlinearity fc is a backlash with deadband 0.1. The updating period of the ZOH is T = 0.6s. The upper bound of the numerator order nb is known a priori as n0b = 4, while the fastsampling ratio is chosen as p = n0b + 1 = 5. Thus, the fast-rate process G(q) at the sampling period h = T /p = 0.12s is
G(q) = 0.05597q⁻⁴(1 − 0.7244q⁻¹) / ((1 − 0.9898q⁻¹)(1 − 0.9722q⁻¹)).
The reference signal R(n) is a random binary sequence with frequency band [0, 0.5] and values ±1.
Fig. 18.3: A sampled-data closed-loop Hammerstein system with output fast-sampling
In this example, we compare the performance of the proposed approach with that of the existing blind approaches. All the approaches differ mainly in the method for estimating the inner signal x(t), whose accuracy completely determines the quality of the estimated nonlinearity and linear dynamics. Hence, we take advantage of simulation, where the true inner signal x(t) is available, and compare the approaches in terms of the closeness between x(t) and x̂(t). A detailed illustration of the identification steps is provided in the experimental example in Section 18.6 later. The error between the true inner signal x(t) and its estimate x̂(t) is measured numerically by a fitness³ in %,

F = 100 ( 1 − ‖x̂_s(t) − x(t)‖₂ / ‖x(t) − E{x(t)}‖₂ ).   (18.20)

Here x̂_s(t) is the estimated inner signal after scaling. The computation of F in (18.20) is only possible in simulation, where the true inner signal x(t) is available and the aforementioned gain ambiguity can be removed by properly scaling x̂(t). Table 18.1 lists the averaged fitnesses and their standard deviations over 100 Monte Carlo simulations for different noise levels, where each simulation duration is 300s and T_N in (18.19) is fixed to T_N = 0.2. The signal-to-noise ratio (SNR) is defined as SNR = ‖y₀(t)‖₂/‖v(t)‖₂, where y₀(t) is the noise-free component of y(t) and ‖·‖₂ denotes the Euclidean norm. F_LSM, F_INV and F_SDE are respectively the fitnesses of the blind approach in [15, 17], that in [2], and the proposed approach. The subscripts emphasise the different methods of reaching x̂(t) and stand for the least-squares method, the linear system inverse method, and the SDE method, respectively. In particular, the least-squares method described in Section 4 of [17] is exploited here.
³ The fitness index is often adopted in the literature of system identification; see the 'compare' command in the MATLAB™ System Identification Toolbox.
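For concreteness, the fitness (18.20) can be evaluated in a few lines of NumPy; the least-squares scaling used below to remove the gain ambiguity is our assumption of one reasonable reading of "properly scaling".

```python
import numpy as np

def fitness(x_hat, x):
    """Fitness in % per (18.20), after removing the gain ambiguity by
    least-squares scaling of x_hat onto x (the scaling rule is an
    assumption; the text only says the estimate is 'properly scaled')."""
    g = np.dot(x_hat, x) / np.dot(x_hat, x_hat)   # scalar gain
    x_hat_s = g * x_hat
    num = np.linalg.norm(x_hat_s - x)
    den = np.linalg.norm(x - np.mean(x))
    return 100.0 * (1.0 - num / den)
```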
Table 18.1: Averaged inner signal fitnesses in % and their standard deviations for different noise levels (T_N = 0.2)

σ² × 10⁻²   SNR        F_LSM               F_INV               F_SDE
0           ∞          100                 100                 100
0.01        14.3352    96.7173 ± 0.5446    95.3568 ± 0.4589    96.7043 ± 0.4313
0.05        6.4481     91.9363 ± 1.1802    89.0002 ± 0.9876    92.1948 ± 1.1397
0.1         4.5978     87.4968 ± 1.6979    83.4794 ± 1.4465    88.7822 ± 1.4323
0.5         2.2034     62.5485 ± 4.3326    54.7308 ± 4.0818    71.7478 ± 3.9712
1           1.6853     41.8482 ± 6.16888   32.1188 ± 6.0221    58.6067 ± 5.2791
Since G(q) is minimum phase, the blind approach in [2] estimates x(t) by passing y(t) through a direct inverse of Ĝ(q). In the SDE method, the equaliser length is chosen as L = 3. In Table 18.1, F_LSM and F_SDE are consistently better than F_INV; this is expected since both the least-squares method and the SDE method have a mechanism for reducing noise effects, whereas the linear system inverse method lacks such a mechanism. For smaller SNRs, F_SDE is much better than F_LSM and F_INV, because the proposed approach skips the numerator estimation and removes the error propagation present in the existing approaches. The error in numerator estimation becomes severe as the SNR gets smaller, so that the error propagation deteriorates F_LSM and F_INV significantly. The next example demonstrates that the proposed blind approach is effective in capturing the nonlinearity of control valve stiction in feedback control loops.

Example 2: The configuration is the same as in Example 1, except that the reference signal R(n) is a unit step signal, and the nonlinearity f_c is replaced by a control valve stiction model having two parameters, the deadband plus stickband S = 4% and the stickband J = 2%, proposed by [5] (see Section 5 and Figure 10 therein). Due to the control valve stiction, constant oscillations appear in the closed-loop response. Here T_N in (18.19) is chosen to be zero, and the noise variance is σ² = 0.001. Analogously to Example 1, the proposed blind approach yields the estimated inner signal X̂(n). In Figure 18.4, the estimated nonlinearity revealed by X̂(n) and the measured input U(n) captures the actual nonlinearity of the control valve stiction very well.
18.6 Experimental Example

The proposed approach is applied to the modelling of magneto-rheological (MR) dampers. First, we briefly introduce the Hammerstein model of MR dampers and the experimental setup. Next, the steps of the proposed approach are illustrated, and the results of the three approaches under comparison are presented.
Fig. 18.4: The nonlinearity of control valve stiction (red dot) revealed by U(n) and X(n), and the estimated nonlinearity (black circle) by U(n) and X̂(n) in Example 2
18.6.1 Hammerstein Model of MR Dampers

MR dampers are semi-active control devices used to reduce vibrations of various dynamic structures. Recently, [14] proposed a so-called nonparametric model that has demonstrated several merits. If the currents/voltages of MR dampers are constant, the nonparametric model becomes the Hammerstein system depicted in Figure 18.1. Here the input u(t) and output y(t) stand for the velocity and damping force, respectively. [14] suggested a first-order model for the linear system,

G(q) = b₁q⁻¹ / (1 + a₁q⁻¹),   (18.21)
and three candidate functions for the nonlinearity,

f(u) = c₁ tanh(c₀u) + c₂,   (18.22)
f(u) = c₁ sgn(u)[1 − exp(−c₀|u|)] + c₂,
f(u) = [(c₀ + c₁|u − c₃|)^(c₂(u−c₃)) − (c₀ + c₁|u − c₃|)^(−c₂(u−c₃))] / [c₀^(c₂(u−c₃)) + c₀^(−c₂(u−c₃))].
Our objective is to design an identification experiment and estimate G(q) and f (·) from the measured damping force y(t) and velocity u(t) by the proposed approach.
18.6.2 Experiment Setup

A diagram of the experimental setup is depicted in Figure 18.5. In particular, Assumption A1 in Section 18.2 requires the velocity to be piece-wise constant for p
Fig. 18.5: A diagram of the experimental setup
Fig. 18.6: Experimental data for estimation
consecutive samples. We let the desired displacement in Figure 18.5 take uniformly-distributed random values within the range [−1.5, 1.5] cm, with a constant increment every 0.2s. As a result, the velocity is approximately piece-wise constant over every 40 samples (the sampling period h is 0.005s), and Assumption A1 is approximately satisfied. The integer p could be as large as 40; however, a larger p implies that fewer data points are exploited and that the inner signal estimation needs higher computational costs. The linear system G(q) in (18.21) is expected to be a first-order model, i.e., n_a = 1 and n_b = 1. Hence, p = 5 (a factor of 40) seems a well-balanced choice to safely satisfy Assumption A5 and to meet the above considerations on data length and computational cost. Two groups of experimental data are collected, each with 3700 samples. The data for parameter estimation are presented in Figure 18.6, while another group of data for model validation is shown in Figure 18.7 later.
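A minimal sketch of this input design, using the numbers quoted above (h = 0.005s, a new increment every 0.2s, range ±1.5 cm); the variable names and the random seed are ours, not part of the experiment.

```python
import numpy as np

h = 0.005                    # fast sampling period [s]
T_inc = 0.2                  # a new displacement increment every 0.2 s
n_hold = int(T_inc / h)      # = 40 samples per constant-velocity segment
n_seg = 92                   # ~3700 samples in total, as in the experiment

# Random displacement targets in [-1.5, 1.5] cm connected by straight
# ramps: the displacement has a constant increment over each 0.2 s
# segment, so the velocity is piece-wise constant over 40 samples.
rng = np.random.default_rng(0)
targets = rng.uniform(-1.5, 1.5, n_seg + 1)
t_knots = np.arange(n_seg + 1) * T_inc
t = np.arange(n_seg * n_hold) * h
displacement = np.interp(t, t_knots, targets)
velocity = np.gradient(displacement, h)    # the Hammerstein input u(t)
```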
18.6.3 Experiment Result

The identification steps in Section 18.5 proceed as follows. First, the order n_a and time delay τ are determined from Table 18.2, which lists V_LSM^(p+τ̂)(n̂_a) and V_IVM^(p+τ̂)(n̂_a) in (18.9) under different combinations of n̂_a and τ̂, where the subscript "LSM" or "IVM" denotes whether θ̂_a^l is obtained from the bias-compensated LSM in (18.8) or from the IVM in (18.10). In Table 18.2, V^(p+τ̂)(n̂_a) achieves its smallest value at n̂_a = 1 and τ̂ = 4. In addition, V_LSM^(p+4)(1) and V_IVM^(p+4)(1) are very close to each other, and n̂_a = 1 matches the suggestion in (18.21) from [14], so that we feel confident in the choice of n̂_a = 1 and τ̂ = 4. Note that the column of τ̂ = 5 or τ̂ = 6 is almost the same as that of τ̂ = 0 or τ̂ = 1, as the information in y(t) cannot distinguish time delays τ₁ = k₁p + τ₀ and τ₂ = k₂p + τ₀ for k₁ ≠ k₂.

Table 18.2: The AICs V^(p+τ̂)(n̂_a) under different combinations of n̂_a and τ̂
                     τ̂=0     τ̂=1      τ̂=2      τ̂=3     τ̂=4     τ̂=5     τ̂=6
V_LSM^(p+τ̂)(1)     5.2108  6.7095   9.0337   4.0337  3.6583  5.2248  6.6870
V_IVM^(p+τ̂)(1)     4.8502  5.5501   7.3434   3.9133  3.6534  4.8586  5.5357
V_LSM^(p+τ̂)(2)     7.9207  9.7719   14.3760  4.3373  4.5929  7.9277  9.7031
V_IVM^(p+τ̂)(2)     6.4566  20.8924  8.2486   4.1431  9.5251  6.5262  23.9398
V_LSM^(p+τ̂)(3)     7.8365  9.7935   14.0967  4.3497  4.6274  7.8420  9.7232
V_IVM^(p+τ̂)(3)     6.1831  5.3816   7.9343   4.0280  6.1151  6.1671  5.4699
V_LSM^(p+τ̂)(4)     7.7892  9.8107   14.0369  4.3731  4.5822  7.7967  9.7383
V_IVM^(p+τ̂)(4)     7.2211  6.0019   10.1777  4.2482  5.5271  7.9090  5.9377
Second, the denominator parameter is obtained from (18.12) with n̂_a = n̂_b = 1, τ̂ = 4 and l ∈ [n̂_b + τ̂ + 1, p + τ̂]:

Â(q) = 1 − 0.9283q⁻¹.   (18.23)
Third, the slow-rate inner signal X̂(n) is estimated by the SDE method in (18.18), with the equaliser length chosen as L = 3. Fourth, plotting U(n) against X̂(n) reveals the shape of the nonlinearity and shows that f(·) in (18.22) is sufficient to capture it,

f̂(U) = 0.04 tanh(0.6293U) + 0.0027.   (18.24)
Interpolating X̂(n) by the property in (18.2) gives x̂(t), the estimate of the inner signal x(t) with sampling period 0.005s. The linear system G(q) is then readily identified from x̂(t) and y(t),

Ĝ(q) = 124.7q⁻¹ / (1 − 0.9295q⁻¹).   (18.25)

The denominator of Ĝ(q) in (18.25) is consistent with Â(q) in (18.23).
Fig. 18.7: The measured (solid), simulated (dots) and predicted (dashes) damping forces using the validation data: Fs = 65.2040% and Fp = 79.2043%
Since the actual inner signal x(t) is unavailable in the experiment, comparing the measured and simulated/predicted damping forces seems the only way to evaluate the quality of the Hammerstein model consisting of f̂(·) in (18.24) and Ĝ(q) in (18.25). The simulated and (one-step ahead) predicted damping forces, ŷ_s(t) and ŷ_p(t), are respectively

ŷ_s(t) = (B̂(q)/Â(q)) f̂(u(t)),
ŷ_p(t) = B̂(q) f̂(u(t)) + (1 − Â(q)) y(t).

Analogously to (18.20), the error between y(t) and ŷ_s(t) is measured numerically by a fitness F_s, and that between y(t) and ŷ_p(t) by F_p. For the estimation data, F_s = 68.8753% and F_p = 81.8321%, while for the model validation data, F_s = 65.2040% and F_p = 79.2043%. See Table 18.3 for a full list of fitnesses between the measured and k-step ahead predicted outputs, namely, F_SDE(k) for k = 1, 2, 3. One half of the corresponding y(t), ŷ_s(t) and ŷ_p(t) is presented in Figure 18.7. In Figure 18.7, the estimated Hammerstein model performs very well in terms of dynamics tracking for both ŷ_s(t) and ŷ_p(t), but the simulated output ŷ_s(t) presents certain gain errors at some peaks. These errors may arise from the approximation involved in achieving the piece-wise constant velocity, or from a structural limitation of the Hammerstein model in capturing certain dynamics of MR dampers. The blind approach in [15, 17] and that in [2] are also applied to the same estimation and validation data. Table 18.3 lists the fitnesses of the models obtained by the three approaches. Here F_LSM(∞), F_INV(∞) and F_SDE(∞) are the fitnesses between the measured and simulated outputs for the three approaches, respectively. The two fitted responses can be reproduced as in the sketch below.
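A minimal sketch, assuming the estimated model is available in the polynomial form of (18.23)-(18.25); u and y stand for the measured velocity and damping force records, replaced here by random placeholders so the sketch runs on its own.

```python
import numpy as np
from scipy.signal import lfilter

def f_hat(u):
    # Estimated nonlinearity (18.24).
    return 0.04 * np.tanh(0.6293 * u) + 0.0027

B_hat = [0.0, 124.7]       # B_hat(q) = 124.7 q^-1, from (18.25)
A_hat = [1.0, -0.9295]     # A_hat(q) = 1 - 0.9295 q^-1, from (18.25)

rng = np.random.default_rng(0)
u, y = rng.standard_normal(1000), rng.standard_normal(1000)

# Simulated output y_s = B(q)/A(q) f(u) and one-step-ahead prediction
# y_p = B(q) f(u) + (1 - A(q)) y, as in the two equations above.
y_s = lfilter(B_hat, A_hat, f_hat(u))
y_p = lfilter(B_hat, [1.0], f_hat(u)) + lfilter([0.0, -A_hat[1]], [1.0], y)
```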
Table 18.3: The simulated output fitnesses of the three approaches, F_LSM(∞), F_INV(∞) and F_SDE(∞), and the k-step ahead predicted output fitnesses F_SDE(k) of the SDE approach, for both estimation and validation data

       F_LSM(∞)   F_INV(∞)   F_SDE(∞)   F_SDE(3)   F_SDE(2)   F_SDE(1)
Est.   68.8840%   68.9275%   68.8753%   72.2431%   75.5122%   81.8321%
Val.   64.3285%   65.0097%   65.2040%   66.4125%   70.4913%   79.2043%
The subscripts have the same meanings as in Table 18.1. As n̂_a = n̂_b = 1, the two existing blind approaches have no need to estimate the numerator parameters (b₁ = 1), so that the aforementioned error propagation does not exist. Hence, the three approaches are expected to have similar performance, which is confirmed by the nearly identical values of F_LSM, F_INV and F_SDE in Table 18.3.
18.7 Conclusion

A new approach has been proposed for the identification of Hammerstein systems without an explicit parametrisation of the nonlinearity. It exploits the input's piece-wise constant property to identify the denominator of the linear system, and the SDE method to estimate the unmeasurable inner signal. Unlike those in [15, 2, 17], the proposed approach does not need information on the numerator, and removes the error propagation completely. The improvement can be significant, as demonstrated by Example 1 in Section 18.5. The proposed approach is illustrated and validated by an experimental example of MR damper modelling in Section 18.6. The identified Hammerstein model of the MR damper performs fairly well for simulation/prediction purposes, but contains some noticeable modelling errors.
References

1. Abed-Meraim, K., Moulines, E., Loubaton, P.: Prediction error method for second-order blind identification: Algorithms and statistical performance. IEEE Trans. Signal Processing 45(3), 694–705 (1997)
2. Bai, E.W., Fu, M.: A blind approach to Hammerstein model identification. IEEE Trans. Signal Processing 50(7), 1610–1619 (2002)
3. Bai, E.W., Li, Q., Dasgupta, S.: Blind identifiability of IIR systems. Automatica 38(1), 181–184 (2002)
4. Cerone, V., Regruto, D.: Bounding the parameters of linear systems with input backlash. IEEE Trans. Automatic Control 52(3), 531–536 (2007)
5. Choudhury, M.A.A.S., Thornhill, N.F., Shah, S.L.: Modeling valve stiction. Control Engineering Prac. 13, 641–658 (2005)
6. Giri, F., Rochdi, Y., Chaoui, F.Z., Brouri, A.: Identification of Hammerstein systems in presence of hysteresis-backlash and hysteresis-relay nonlinearities. Automatica 44(3), 767–775 (2008)
7. Haber, R., Keviczky, L.: Nonlinear System Identification: Input-Output Modeling Approach. Kluwer Academic Publishers, Dordrecht (1999)
8. Janczak, A.: Identification of Nonlinear Systems Using Neural Networks and Polynomial Models: A Block-Oriented Approach. Springer, New York (2005)
9. Kagiwada, H., Sun, L., Sano, A., Liu, W.: Blind identification of IIR model based on output over-sampling. IEICE Trans. Fundamentals E81-A(11), 2350–2360 (1998)
10. Liu, H., Xu, G.: Closed-form blind symbol estimation in digital communications. IEEE Trans. Signal Processing 43(11), 2714–2723 (1995)
11. Ljung, L.: System Identification: Theory for the User, 2nd edn. Prentice Hall, Englewood Cliffs (1999)
12. Moulines, E., Duhamel, P., Cardoso, J.F., Mayrargue, S.: Subspace methods for the blind identification of multichannel FIR filters. IEEE Trans. Signal Processing 43(2), 516–525 (1995)
13. Söderström, T., Stoica, P.: System Identification. Prentice Hall, London (1989)
14. Song, X., Ahmadian, M., Southward, S.C.: Modeling magnetorheological dampers with application of nonparametric approach. J. Intell. Mater. Syst. Struct. 16(5), 421–432 (2005)
15. Sun, L., Liu, W., Sano, A.: Identification of a dynamical system with input nonlinearity. IEE Proc.-Control Theory Appl. 146(1), 41–51 (1999)
16. van der Veen, A.J., Talwar, S., Paulraj, A.: A subspace approach to blind space-time signal processing for wireless communication systems. IEEE Trans. Signal Processing 45(1), 173–190 (1997)
17. Wang, J., Sano, A., Shook, D., Chen, T., Huang, B.: A blind approach to closed-loop identification of Hammerstein systems. Int. J. Control 80(2), 302–313 (2007)
Chapter 19
A Blind Approach to the Hammerstein-Wiener Model Identification

Er-Wei Bai
19.1 Introduction

The work of this chapter is a continuation of [3]. Unlike [3], where a very special structure is assumed, the blind approach in this chapter allows a very general structure for the nonlinearities. In particular, the input nonlinearity structure can be arbitrary and is not assumed to be known. By using the blind identification approach, all the unknown internal variables can be recovered solely from the output measurements. Once all internal variables are recovered, the linear part and the nonlinear parts, including their structure, can be identified. Our scheme applies to either white or non-white inputs. The blind techniques adopted in this chapter use our previous results on blind channel equalisation of IIR systems [4] and are also based on blind techniques developed for Hammerstein models [5, 20]. Although the algorithm proposed in this chapter is for the Hammerstein-Wiener model, it applies directly to the identification of either Wiener models or Hammerstein models with trivial modifications. The chapter is based on [1] with permission from Automatica/Elsevier.
19.2 Problem Statement and Preliminaries

Consider the sampled Hammerstein-Wiener model shown in Figure 19.1, which consists of a Zero-Order-Hold, an input nonlinearity, a scalar linear stable continuous time system and an output nonlinearity. It is assumed that:

Assumption 19.1.
• The unknown continuous time system P(s) possesses a state space representation
η̇(t) = Aη(t) + bu(t),  x(t) = cη(t),  u, x ∈ R, A ∈ R^(n×n).   (19.1)
For a given sampling interval T , its discrete time state space equation and the corresponding transfer function are given, respectively, by
η[(k + 1)T] = e^(AT) η[kT] + (∫₀ᵀ e^(At) dt · b) u[kT],  x[kT] = cη[kT],   (19.2)

where Φ = e^(AT) and Γ = ∫₀ᵀ e^(At) dt · b, and

G(z) = c(zI − Φ)⁻¹Γ = β(z)/α(z) = (β₁z⁻¹ + β₂z⁻² + ... + βₙz⁻ⁿ)/(1 − α₁z⁻¹ − α₂z⁻² − ... − αₙz⁻ⁿ)   (19.3)
form some αi ’s and β j ’s. • The input nonlinearity is static u = g(w). Its structure is not assumed to be known. • The output nonlinearity is also static y = f (x). However, it is assumed that f (·) is one-to-one so that the inverse x = f −1 (y) exists and admits a polynomial i representation x = ∑m i=1 ri y . In identification, we will first estimate the inverse coefficients ri ’s and then, find the q i best forward function y = ∑i=1 ai xi of x = ∑m i=1 ri y in the least squares sense using the observed data. Finally, we represent the output nonlinearity y = f (x) by y = ∑qi=1 ai xi . We remark that if y = f (x) is one-to-one and continuous, its inverse x = f −1 (y) exists and is also continuous. With a bounded input and a stable system, x and y are always bounded and this implies that the inverse x = f −1 (y) can be i approximated to any accuracy by a polynomial x = ∑m i=1 ri y . In particular, as m −1 i the order m goes to infinite, x = f (y) = ∑i=1 ri y . This shows that the inverse representation is theoretically justified. Practically, of course, a high order m usually means a high sensitivity to noise and model uncertainty in the identification setting. Therefore, there is a limitation to the inverse approach. A discussion on this topic will be provided later. For a given sampling interval T , the goal of the Hammerstein-Wiener model identification is to estimate the transfer function G(z) in terms of its parameters αi ’s and q i i β j ’s, the output nonlinearity x = ∑m i=1 ri y and its inverse y = ∑i=1 ai x , and the input nonlinearity u = g(w) solely based on the measurements of w and y. No internal variables x and u are assumed available. Moreover, the structure of the input nonlinearity u = g(w) is unknown. Our idea of identification is the blind approach, i.e., to sample the output at a higher rate. Given the sampling interval T , let the output sampling interval be
Fig. 19.1: The sampled Hammerstein-Wiener model
h = T/l, l ≥ 1, for some positive integer l, referred to as the over-sampling ratio. The following lemma [4] will be useful.

Lemma 19.1. Consider the continuous system (19.1) and its sampled system at the sampling interval h = T/l for some l ≥ 1. Then, the sampled system is minimal for any l ≥ 1 if and only if the sampled system at the sampling interval T is minimal.

Minimality is important in identification. Without the minimality assumption, the transfer function G(z) has pole-zero cancellations, which makes the parametrisation non-unique; in other words, identifiability is lost. To this end, we make the following assumption.

Assumption 19.2. The sampled system (19.2) is assumed to be minimal at the sampling interval T.

Before closing this section, note that the parametrisation of the Hammerstein-Wiener model is actually not unique. To obtain a unique parametrisation, two blocks need to be normalised. Since the structure of the input nonlinearity is not assumed to be known, we normalise the linear block and the output nonlinearity.

Assumption 19.3. It is assumed that β₁ = 1 and r₁ = 1.

With this normalisation assumption and a persistently exciting (PE) input, we will show that g, G and f can be uniquely identified. While there are other ways to normalise the system, Assumption 19.3 is the simplest one; the purpose is to avoid unnecessary complications so that our ideas can be presented clearly.
19.3 Identification of the Hammerstein-Wiener Model

If the structure of the input nonlinearity u = g(w) is unknown, identification of the Hammerstein-Wiener model is no longer a pure parameter estimation problem; it also involves structural estimation. It is clear, however, that if u were available, we would be able to estimate the input nonlinearity structure. At the least, the complete picture of u = g(w) could be graphed using the pairs (w, u), and this graphical picture provides accurate information on the unknown input nonlinearity u = g(w). Therefore, the key is to estimate G(z) and x = Σ rᵢyⁱ, and then to recover u solely from the output measurements. We accomplish this goal in several steps: estimating the output nonlinearity, finding the linear part and then recovering u.
19.3.1 Output Nonlinearity Estimation

Given the input sampling interval T, let the output sampling interval be h = T/(n + 1), where n is the order of G(z). We remark that h = T/(n + 1) is not necessary but does make the analysis and notation simple. In fact, h = T/(n̄ + 1) for any n̄ ≥ n will work; see the remarks in the Discussion section for details. Now, consider the sampled
system at the sampling interval h = T/(n + 1). It is clear that the transfer function of the sampled system at the sampling interval h is also an nth order strictly proper rational function

G̃(z) = β̃(z)/α̃(z) = (β̃₁z⁻¹ + β̃₂z⁻² + ... + β̃ₙz⁻ⁿ)/(1 − α̃₁z⁻¹ − α̃₂z⁻² − ... − α̃ₙz⁻ⁿ)   (19.4)
for some unknown α̃ᵢ's and β̃ⱼ's. Its time domain equation is accordingly given by

x[kh] = Σᵢ₌₁ⁿ α̃ᵢ x[kh − ih] + Σᵢ₌₁ⁿ β̃ᵢ u[kh − ih].

Substituting x[kh] = Σⱼ₌₁ᵐ rⱼyʲ[kh] into this equation, it follows that
Σⱼ₌₁ᵐ rⱼ yʲ[kh] = Σᵢ₌₁ⁿ α̃ᵢ Σⱼ₌₁ᵐ rⱼ yʲ[kh − ih] + Σᵢ₌₁ⁿ β̃ᵢ u[kh − ih] + v₁[kh],  k = 1, ..., N,   (19.5)
where v₁[kh] denotes any discrepancy not counted by the model, e.g., the contribution of noise, model uncertainty and approximation errors. By observing that r₁ = 1 from the normalisation assumption, the above equation can be re-written as

y[kh] = φ₁[kh] θ₁ + Σᵢ₌₁ⁿ β̃ᵢ u[kh − ih] + v₁[kh],

with
φ₁[kh] = (−y²[kh], ..., −yᵐ[kh], y[(k − 1)h], ..., y[(k − n)h], ..., yᵐ[(k − 1)h], ..., yᵐ[(k − n)h]),
θ₁ = (r₂, ..., rₘ, α̃₁, ..., α̃ₙ, α̃₁r₂, ..., α̃ₙr₂, ..., α̃₁rₘ, ..., α̃ₙrₘ)ᵀ.
This is the basic equation for the estimation of the output nonlinearity x = Σ rᵢyⁱ in terms of its coefficients rᵢ. Now, consider two consecutive equations at k = l(n + 1) and k = l(n + 1) − 1,

y[l(n + 1)h] = φ₁[l(n + 1)h]θ₁ + Σᵢ₌₁ⁿ β̃ᵢ u[l(n + 1)h − ih] + v₁[l(n + 1)h],
y[l(n + 1)h − h] = φ₁[l(n + 1)h − h]θ₁ + Σᵢ₌₁ⁿ β̃ᵢ u[l(n + 1)h − ih − h] + v₁[l(n + 1)h − h].   (19.6)

Since the input sampling interval is fixed at T = (n + 1)h, where h is the output sampling interval, we have w[(l − 1)T] = w[(l − 1)T + h] = ... = w[(l − 1)T + nh], l = 1, 2, ..., and this implies u[(l − 1)T] = u[(l − 1)T + h] = ... = u[(l − 1)T + nh], l = 1, 2, .... Therefore, we have
u[(l − 1)T] = u[l(n + 1)h − h − nh] = ... = u[l(n + 1)h − h] = u[(l − 1)T + nh],

and it follows that

Δy[l] = Δφ₁[l]θ₁ + Δv[l]   (19.7)

with

Δy[l] = y[lT] − y[lT − h],  Δv[l] = v₁[lT] − v₁[lT − h],  Δφ₁[l] = φ₁[lT] − φ₁[lT − h].

In equation (19.7), Δy[l] and Δφ₁[l] consist of output measurements y[kh] only and thus are available. Moreover, this equation is linear in the unknown parameter vector θ₁, which can be estimated by many standard methods, e.g., the least squares method or the (normalised) LMS algorithm shown below
θ̂₁[l] = θ̂₁[l − 1] + (Δφ₁ᵀ[l] / (1 + Δφ₁[l]Δφ₁ᵀ[l])) (Δy[l] − Δφ₁[l]θ̂₁[l − 1]).
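A minimal sketch of this normalised LMS recursion, run on synthetic placeholder regressors; nothing here is specific to the Hammerstein-Wiener structure beyond the linear regression form of (19.7), and theta_true exists only to generate toy data.

```python
import numpy as np

def nlms_step(theta, dphi, dy):
    """One normalised-LMS update for (19.7): dy ~ dphi @ theta."""
    err = dy - dphi @ theta
    return theta + dphi * err / (1.0 + dphi @ dphi)

rng = np.random.default_rng(0)
theta_true = rng.standard_normal(8)   # toy parameter vector
theta_hat = np.zeros(8)
for _ in range(2000):
    dphi = rng.standard_normal(8)     # stands in for delta-phi_1[l]
    dy = dphi @ theta_true            # stands in for delta-y[l]
    theta_hat = nlms_step(theta_hat, dphi, dy)
```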
Note that the estimates r̂₂, ..., r̂ₘ, α̂̃₁, ..., α̂̃ₙ of r₂, ..., rₘ, α̃₁, ..., α̃ₙ are the first (m − 1 + n) entries of θ̂₁. Also note that r₁ is normalised to 1. Therefore, once θ̂₁ is obtained, we have the estimate of the inverse output nonlinearity

f̂⁻¹(y) = Σᵢ₌₁ᵐ r̂ᵢ yⁱ   (19.8)
˜ with rˆ1 = 1, as well as the estimate of the denominator α˜ (z) of G(z). q The forward output nonlinearity y[kh] = ∑i=1 aˆi xi [kh] can be constructed by minimising q
aˆ = arg min ∑{y[kh] − ∑ ai xˆi [kh]}2 k
i=1
i where y[kh]’s are observed outputs and x[kh] ˆ = ∑m i=1 rˆi y [kh]’s are generated from ˆ (19.8). We comment that direct readings of rˆi and α˜ k from θˆ1 may not be a good policy in a noisy situation because it ignores a large number of identified parameters α˜ k ri without taking into account of their contribution. A more robust way should consider their contribution. This can be done as follows. Let θˆ1 be represented by
θ̂₁ᵀ = (r̄₂, ..., r̄ₘ, ᾱ̃₁, ..., ᾱ̃ₙ, α̃₁r₂, ..., α̃ₙr₂, ..., α̃₁rₘ, ..., α̃ₙrₘ).

With r̂₁ = 1, the estimates r̂ᵢ and α̂̃ₖ are defined as

(r̂ᵢ, α̂̃ₖ) = arg min { Σₖ (α̂̃ₖ − ᾱ̃ₖ)² + Σᵢ,ₖ (α̂̃ₖr̂ᵢ − α̃ₖrᵢ)² }
          = arg min ‖ Θ₁ − (α̂̃₁, ..., α̂̃ₙ)ᵀ (r̂₁, r̂₂, ..., r̂ₘ) ‖_F,   (19.9)

where

Θ₁ = [ ᾱ̃₁  α̃₁r₂  ···  α̃₁rₘ ]
     [  ⋮     ⋮    ⋱    ⋮   ]
     [ ᾱ̃ₙ  α̃ₙr₂  ···  α̃ₙrₘ ]
where ‖·‖_F stands for the matrix Frobenius norm. This problem was solved in [3]. Let

Θ₁ = Σᵢ₌₁^min(n,m) σᵢ ξᵢ ηᵢᵀ   (19.10)

be the singular value decomposition (SVD) of Θ₁, where the σᵢ are the singular values and ξᵢ and ηᵢ are n- and m-dimensional orthonormal vectors, respectively. Then, a solution with r̂₁ = 1 is given by

(α̂̃₁, ..., α̂̃ₙ)ᵀ = σ₁ s_η ξ₁,   (r̂₁, r̂₂, ..., r̂ₘ)ᵀ = (1, r̂₂, ..., r̂ₘ)ᵀ = (1/s_η) η₁,

where s_η is the first entry of η₁.
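This solution can be computed directly with a numerical SVD; the sketch below is a minimal implementation of (19.9)-(19.10), assuming the n × m matrix Θ₁ has been assembled from θ̂₁ as described.

```python
import numpy as np

def rank1_extract(Theta1):
    """Closest rank-1 factorisation of Theta1 in the Frobenius norm,
    normalised so that r_hat[0] = 1, per (19.9)-(19.10)."""
    U, S, Vt = np.linalg.svd(Theta1)
    xi1, eta1, sigma1 = U[:, 0], Vt[0, :], S[0]
    s_eta = eta1[0]                    # first entry of eta_1
    alpha_hat = sigma1 * s_eta * xi1   # estimates of (alpha~_1, ..., alpha~_n)
    r_hat = eta1 / s_eta               # estimates of (1, r_2, ..., r_m)
    return alpha_hat, r_hat
```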
19.3.2 Linear Transfer Function Estimation

In this section, we propose a blind method to estimate G(z) without requiring u[kT]. Recall that the denominator 1 − α̂̃₁z⁻¹ − ... − α̂̃ₙz⁻ⁿ of G̃(z) was obtained as a result of the estimation of θ̂₁ in the previous section. Note that G̃(z) is the transfer function of the sampled system at the sampling interval h = T/(n + 1). Write

1 − α̂̃₁z⁻¹ − ... − α̂̃ₙz⁻ⁿ = (1 − s̃₁z⁻¹)···(1 − s̃ₙz⁻¹),

where the s̃ᵢ denote the poles of G̃(z). The sampled system is assumed to be minimal at the sampling interval T and is minimal at any sampling interval h = T/l, l ≥ 1, from Lemma 19.1. Clearly, s is a pole of the continuous time system if and only if e^(sT) = e^(sh(n+1)) is a pole of G(z), if and only if e^(sh) is a pole of G̃(z). In other words, if the s̃ᵢ are the poles of G̃(z), then the s̃ᵢⁿ⁺¹ are the poles of G(z). This implies that an estimate of α(z), the denominator of G(z), is given by

α̂(z) = (1 − s̃₁ⁿ⁺¹z⁻¹)···(1 − s̃ₙⁿ⁺¹z⁻¹) = 1 − α̂₁z⁻¹ − ... − α̂ₙz⁻ⁿ.   (19.11)
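Numerically, (19.11) amounts to factoring the fast-rate denominator, raising each root to the l-th power and re-expanding; a minimal sketch follows, where the function name and the generic ratio l are ours (in this chapter l = n + 1).

```python
import numpy as np

def denominator_at_T(alpha_tilde, l):
    """Map the denominator of G~(z) (interval h = T/l) to that of G(z)
    (interval T) by raising each pole to the l-th power, per (19.11).
    alpha_tilde holds (a1, ..., an) from 1 - a1 z^-1 - ... - an z^-n."""
    poly_h = np.concatenate(([1.0], -np.asarray(alpha_tilde, float)))
    poles_h = np.roots(poly_h)            # the poles s~_i of G~(z)
    poly_T = np.poly(poles_h ** l)        # monic poly with poles s~_i^l
    return np.real_if_close(-poly_T[1:])  # (alpha_hat_1, ..., alpha_hat_n)
```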
Hence, an estimate of α(z) is already contained in θ̂₁, and what remains to be estimated is only the numerator β(z) of G(z). To this end, consider two sequences

{x[kT]} ⟺ X(z) = Σₖ₌₀^∞ x[kT]z⁻ᵏ = G(z)U(z),
{x[kT + T/2]} ⟺ X̄(z) = Σₖ₌₀^∞ x[kT + T/2]z⁻ᵏ = Ḡ(z)U(z),   (19.12)
where U(z) = Σₖ₌₀^∞ u[kT]z⁻ᵏ is the Z-transform of the sequence u[kT] at the sampling interval T, and G(z) and Ḡ(z) represent the transfer functions from U(z) to X(z) and X̄(z), respectively. The transfer function G(z) was derived in (19.3) and is strictly proper. Ḡ(z) needs special attention. From the continuous time state space equation (19.1), we have
η[(k + 1)T + T/2] = e^(AT) η[kT + T/2] + (∫_(T/2)^T e^(At)b dt) u[kT] + (∫₀^(T/2) e^(At)b dt) u[(k + 1)T],
x[kT + T/2] = cη[kT + T/2],

where Φ = e^(AT), Γ₁ = ∫_(T/2)^T e^(At)b dt and Γ₂ = ∫₀^(T/2) e^(At)b dt.
Thus, the transfer function Ḡ(z) is given by

Ḡ(z) = c(zI − Φ)⁻¹(Γ₁ + Γ₂z) = β̄(z)/ᾱ(z) = (β̄₀ + β̄₁z⁻¹ + ... + β̄ₙz⁻ⁿ)/(1 − ᾱ₁z⁻¹ − ᾱ₂z⁻² − ... − ᾱₙz⁻ⁿ).

It is interesting to note that G(z) and Ḡ(z) share the same denominator but have different numerators, and that unlike G(z), Ḡ(z) is proper but not strictly proper. Now, consider again the two sequences {x[kT]}, {x[kT + T/2]} and their Z-transforms X(z) = G(z)U(z), X̄(z) = Ḡ(z)U(z). Clearly, Ḡ(z)X(z) − G(z)X̄(z) = 0, i.e., β̄(z)X(z) − β(z)X̄(z) = 0, and this results in

(β̄₀ + β̄₁z⁻¹ + ... + β̄ₙz⁻ⁿ)X(z) − (β₁z⁻¹ + ... + βₙz⁻ⁿ)X̄(z) = 0.

Its time domain equation is
β₁ x[kT + T/2] = (−x[kT − T + T/2], ..., −x[kT − (n − 1)T + T/2], x[kT + T], ..., x[kT − (n − 1)T]) · (β₂, ..., βₙ, β̄₀, ..., β̄ₙ)ᵀ = φ̄₂[k] θ₂,

where φ̄₂[k] denotes the row vector of delayed samples and θ₂ = (β₂, ..., βₙ, β̄₀, ..., β̄ₙ)ᵀ.
By noting that β₁ = 1, it follows that x[kT + T/2] = φ̄₂[k]θ₂. In this equation, φ̄₂ is a function of x[kT] and x[kT + T/2], which are not available. However, their estimates x̂[kT] and x̂[kT + T/2] are readily available by using the estimated output nonlinearity x̂ = Σᵢ₌₁ᵐ r̂ᵢyⁱ and the observed outputs y[kT] and y[kT + T/2]. Letting x̂ and φ₂ denote the estimates of x and φ̄₂ obtained by using x̂ instead of x, we have

x̂[kT + T/2] = φ₂[k]θ₂ + v₂[k],   (19.13)
where v₂[k] indicates the contribution due to the error between x and x̂. This equation is again linear in the unknown parameter vector θ₂, and many standard estimation algorithms apply, e.g., the (normalised) LMS algorithm
θ̂₂[k] = θ̂₂[k − 1] + (φ₂ᵀ[k] / (1 + φ₂[k]φ₂ᵀ[k])) (x̂[kT + T/2] − φ₂[k]θ̂₂[k − 1]).
θ̂₂ consists of the estimates (β̂₁, ..., β̂ₙ, β̄̂₀, ..., β̄̂ₙ) with β̂₁ = 1. Therefore, combining equation (19.11), we obtain the estimates Ĝ(z) and Ḡ̂(z) of G(z) and Ḡ(z), respectively,

Ĝ(z) = (β̂₁z⁻¹ + β̂₂z⁻² + ... + β̂ₙz⁻ⁿ)/(1 − α̂₁z⁻¹ − α̂₂z⁻² − ... − α̂ₙz⁻ⁿ),
Ḡ̂(z) = (β̄̂₀ + β̄̂₁z⁻¹ + ... + β̄̂ₙz⁻ⁿ)/(1 − ᾱ̂₁z⁻¹ − ᾱ̂₂z⁻² − ... − ᾱ̂ₙz⁻ⁿ),

where α̂ᵢ = ᾱ̂ᵢ, i = 1, ..., n.
19.3.3 Input Nonlinearity Estimation

Since the structure of the input nonlinearity is unknown, estimation of the input nonlinearity relies completely on the graph information determined by the pairs (w[kT], u[kT]). The input w[kT] is known, but u[kT] is not. Therefore, estimation of u[kT] becomes the key to determining the input nonlinearity. Recall that the input sampling interval is T, and thus u[kT] = u[kT + T/2]. Also recall X(z) = G(z)U(z) and X̄(z) = Ḡ(z)U(z). If either G(z) or Ḡ(z) is minimum phase, U(z), and consequently u[kT], can be recovered easily:

U(z) = G⁻¹(z)X(z) or U(z) = Ḡ⁻¹(z)X̄(z).

In the time domain, these equations read

u[kT] = (1/β̄₀)(−β̄₁u[kT − T] − ... − β̄ₙu[kT − (n − 1)T] + x[kT + T/2] − ᾱ₁x[kT + T/2 − T] − ... − ᾱₙx[kT + T/2 − nT]).   (19.14)

In these equations, both the estimates β̂(z) and β̄̂(z) have been obtained, and so have the estimates x̂[kT] and x̂[kT + T/2]. Thus, û[kT] can be calculated. In the case that both G(z) and Ḡ(z) are non-minimum phase, recovery of u[kT] becomes problematic. To overcome this difficulty, assume that G(z) and Ḡ(z) do not share any common zeros. Then, from the Bezout identity, there exist two stable transfer functions F(z) and F̄(z) such that
(19.15)
This implies that ¯ + F(z)G(z)]U(z) ¯ + F(z)X ¯ ¯ F(z)X(z) (z) = [F(z)G(z) = U(z)
(19.16)
and recovery of U(z) or equivalently u[kT ] can be easily implemented by using x[kT ˆ ] and x[kT ˆ + T /2]. Once the estimate u[kT ˆ ] of u[kT ] is obtained, the input nonlinearity u[kT ] = g(w[kT ]) can be graphed using the pairs (w[kT ], u[kT ˆ ]) that provides complete information about the unknown input nonlinearity.
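In filter form, the recursion (19.14) is simply U(z) = ᾱ(z)X̄(z)/β̄(z); a minimal sketch, assuming the minimum-phase branch applies and the estimates are plugged in for the true quantities.

```python
import numpy as np
from scipy.signal import lfilter

def recover_u(x_half_hat, beta_bar, alpha_bar):
    """Recover u[kT] from x_hat[kT + T/2] by inverse filtering
    U(z) = X_bar(z)/G_bar(z), i.e. the filter form of (19.14), valid
    when G_bar(z) is minimum phase (the first branch of Step 4 below).
    beta_bar = (b0, b1, ..., bn) is the numerator of G_bar(z);
    alpha_bar = (a1, ..., an) from 1 - a1 z^-1 - ... - an z^-n."""
    num = np.concatenate(([1.0], -np.asarray(alpha_bar, float)))
    return lfilter(num, np.asarray(beta_bar, float), x_half_hat)
```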
19.3.4 Algorithm and Simulations

We are now in a position to summarise the identification algorithm for the Hammerstein-Wiener model with unknown structure of the input nonlinearity.

Identification algorithm:
Step 1: Consider the sampled Hammerstein-Wiener model in Figure 19.1. For a given sampling interval T, collect the input and output measurements w[kT], y[kh], y[kT] and y[kT + T/2], where h = T/(n + 1).
Step 2: Construct Δy[l] and Δφ₁[l] as in (19.7) and estimate θ₁ using, e.g., the LMS algorithm. From θ̂₁, determine the inverse output nonlinearity x = Σᵢ₌₁ᵐ r̂ᵢyⁱ. The estimate y = Σᵢ₌₁^q âᵢxⁱ of the forward output nonlinearity y = f(x) is the best inverse of x = Σᵢ₌₁ᵐ r̂ᵢyⁱ. Denote by x̂[kT] = Σᵢ₌₁ᵐ r̂ᵢyⁱ[kT] and x̂[kT + T/2] = Σᵢ₌₁ᵐ r̂ᵢyⁱ[kT + T/2] the estimates of x[kT] and x[kT + T/2], respectively.
Step 3: From θ̂₁, calculate the estimate α̂(z) of α(z) using (19.11). Construct φ₂[k] as in (19.13) and the estimate θ̂₂. From θ̂₂, determine the estimates Ĝ(z), Ḡ̂(z) of G(z), Ḡ(z), respectively.
Step 4: If either Ĝ(z) or Ḡ̂(z) is minimum phase, calculate û[kT] using (19.14). If both Ĝ(z) and Ḡ̂(z) are non-minimum phase, calculate F(z) and F̄(z) as in (19.15) and calculate the estimate û[kT] using (19.16).
Step 5: Graph the input nonlinearity u = g(w) using the pairs (w[kT], û[kT]). Estimate the nonlinearity based on the information provided by the graph. If necessary, parametrise the input nonlinearity using some base functions.

To illustrate the idea, we provide two numerical simulations.

Example 19.1. The linear part in this example is a first order system
η̇ = −0.8η + 2.4266u,  x = η.

With T = 1, the discrete transfer function is given by

G(z) = 1/(1 − 0.6703z⁻¹).
Notice that non-integer numbers are used in the model so that the normalisation assumptions β₁ = 1 and r₁ = 1 are met. The input nonlinearity is a deadzone with threshold 0.4, shown in Figure 19.2 as a solid line. No a priori knowledge of the structure of the input nonlinearity is assumed in the simulation. The output nonlinearity is y = f(x) = 0.18x + 1.37(e^(0.6x) − 1), shown in Figure 19.3 as a solid line. Although non-polynomial, the output nonlinearity can be approximated well by the second order polynomial y = 1.0287x + 0.2555x².
Table 19.1: Simulation results of Example 19.1

                                   linear part (numerator)   linear part (denominator)   output nonlinearity
true values / best approximation   1                         [1, −0.6703]                [1.0287, 0.2555]
estimates                          1                         [1, −0.6713]                [1.0291, 0.2544]
Fig. 19.2: Graph of the actual (solid) and estimated input deadzone (circle), Example 19.1
Of course, both the true output nonlinearity and its best polynomial approximation are unknown in the simulation. Instead, the inverse output nonlinearity x = f⁻¹(y) is modelled by a 4th order polynomial x = Σᵢ₌₁⁴ rᵢyⁱ. Then, the forward output nonlinearity y = Σᵢ₌₁² aᵢxⁱ is calculated by finding the best inverse of x = Σᵢ₌₁⁴ rᵢyⁱ. For the simulation, the input w[kT], k = 1, ..., 500, is a uniform i.i.d. random variable in [−1, 1], and a uniform white noise distributed in [−0.05, 0.05] is also added. Table 19.1 shows the simulation results. Figure 19.2 shows the true input deadzone nonlinearity as a solid line and the graph using the pairs (w[kT], û[kT]) as circles. Figure 19.3 shows the true output nonlinearity as a solid line and its estimate y = 1.0291x + 0.2544x², given by the proposed algorithm, as circles. A satisfactory result is obtained.
Example 19.2. Consider a second order linear continuous system

η̇ = [0 1; −0.4 −0.3] η + [0; 1] u,  x = [0.6556, 0.6556] η.

Both the input and output nonlinearities are polynomials,

u = 1.5w − 1.2w² + 0.8w³ + w⁴,  y = 0.9962x + 0.0996x².

Again, non-integer numbers are used to normalise β₁ = 1 and r₁ = 1 for simulation purposes. The structure of the input nonlinearity is not assumed to be known. In the simulation, we model the inverse output nonlinearity x = f⁻¹(y) by a 4th order polynomial x = Σᵢ₌₁⁴ rᵢyⁱ. For the simulation, the sampling interval is T = 1.2 and the input w[kT], k = 1, ..., 500, is a uniform i.i.d. random variable in [−0.5, 0.5]. A uniform white noise with magnitude 0.01 is also added. With T = 1.2, the true but unknown G(z) is

G(z) = (1 − 0.2415z⁻¹)/(1 − 1.2353z⁻¹ + 0.7028z⁻²).
Fig. 19.3: Output nonlinearity (solid) and its estimate (circle)
Table 19.2: Simulation results of Example 19.2

              linear part (numerator)   linear part (denominator)   output nonlinearity
true values   [1, −0.2415]              [1, −1.2353, 0.7028]        [0.0996, 0.9962]
estimates     [1, −0.2445]              [1, −1.2367, 0.6977]        [0.097, 0.9986]
Fig. 19.4: Graph of the input nonlinearity, Example 19.2
Table 19.2 shows the simulation results using the proposed identification algorithm, and Figure 19.4 shows the true input nonlinearity as a solid line and the graph using the pairs (w[kT], û[kT]) as circles.
19.3.5 Discussions

To avoid unnecessary complications so that the idea can be clearly conveyed, our attention was focused on presenting the basic algorithm. The algorithm can in fact be improved in several ways, and we provide some discussion in this section.

1. Parametrisation of the input nonlinearity u = g(w).
In the previous discussion, the structure of the input nonlinearity is assumed to be unknown, and thus estimation relies on the graph given by the pairs (w[kT], û[kT]). Once the picture of u = g(w) is obtained, the structure of u = g(w)
can be determined. The next step is to parametrise this nonlinearity by using appropriate base functions, e.g., u = g(w) = Σ gᵢ(w, bᵢ) for some known nonlinear functions gᵢ and unknown coefficients bᵢ. The choice of the gᵢ of course depends on the structure shown in the graph. For instance, in Example 19.1, the graph clearly shows a deadzone input nonlinearity with some unknown threshold b. We can model this nonlinearity by the deadzone function

u = g(w) = w − b·sgn(w) − ([1 + sgn(b − |w|)]/2)(w − b·sgn(w)),

where sgn is the standard sign function and b is the unknown threshold, which can be determined by

b̂ = arg min_b Σₖ { u[kT] − [w[kT] − b·sgn(w[kT]) − ([1 + sgn(b − |w[kT]|)]/2)(w[kT] − b·sgn(w[kT]))] }².

Using the data generated in Example 19.1, the optimal estimate b̂ = 0.3999 is obtained, while the actual value is b = 0.4. In some cases, the input nonlinearity is assumed to be a polynomial, u = g(w) = Σᵢ₌₁ᵖ bᵢwⁱ. Then, the coefficients bᵢ can be determined by minimising

b̂ᵢ = arg min Σₖ (û[kT] − Σᵢ bᵢwⁱ[kT])².

A sketch of both fits is given below.
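A minimal sketch of both fits; the bounded scalar search and the placeholder data are assumptions for illustration, and the deadzone is written in its equivalent piecewise form.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def deadzone(w, b):
    # Equivalent piecewise form of the deadzone formula above.
    return np.where(np.abs(w) > b, w - b * np.sign(w), 0.0)

# Placeholders for the input w[kT] and the recovered u_hat[kT].
rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, 500)
u_hat = deadzone(w, 0.4) + 0.01 * rng.standard_normal(500)

# Threshold fit: scalar least squares over b, mirroring the arg-min above.
res = minimize_scalar(lambda b: np.sum((u_hat - deadzone(w, b)) ** 2),
                      bounds=(0.0, 1.0), method='bounded')
b_hat = res.x

# Polynomial alternative: coefficients b_i by linear least squares.
p = 4
Phi = np.column_stack([w ** i for i in range(1, p + 1)])
b_poly, *_ = np.linalg.lstsq(Phi, u_hat, rcond=None)
```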
This approach identifies the linear part and the output nonlinearity first. Then, using the estimated linear part, the estimated output nonlinearity and the observed output, we recover u[kT] and identify the input nonlinearity. Another approach is to identify the input and output nonlinearities and the linear part together by over-parametrising the whole system. To this end, recall the system equations. By substituting the input and output nonlinearities into the system equation, we have

Σᵢ₌₁ᵐ rᵢyⁱ[kT] = Σᵢ₌₁ⁿ αᵢ Σⱼ₌₁ᵐ rⱼyʲ[kT − iT] + Σᵢ₌₁ⁿ βᵢ Σⱼ₌₁ᵖ bⱼwʲ[kT − iT] + v[kT],   (19.17)
where v[kT] is the contribution due to noise, model uncertainty and approximation errors. Since r₁ = 1, it follows that

y[kT] = (−y²[kT], ..., −yᵐ[kT], y[kT − T], ..., yᵐ[kT − T], ..., y[kT − nT], ..., yᵐ[kT − nT], w[kT − T], ..., wᵖ[kT − T], ..., w[kT − nT], ..., wᵖ[kT − nT]) · θ₃,

θ₃ = (r₂, ..., rₘ, α₁, α₁r₂, ..., α₁rₘ, α₂, α₂r₂, ..., α₂rₘ, ..., β₁b₁, ..., β₁bₚ, ..., βₙb₁, ..., βₙbₚ)ᵀ.
This equation involves only input-output measurements and is linear in θ₃. Therefore, any standard estimation algorithm applies. It is clear that the estimates r̂ᵢ and α̂ᵢ can be read directly from θ̂₃. However, the estimates of the products βᵢbⱼ, i = 1, ..., n, j = 1, ..., p, need to be projected onto
β̂ = (β̂₁, ..., β̂ₙ)ᵀ and b̂ = (b̂₁, ..., b̂ₚ)ᵀ with β̂₁ = 1, such that the sum over i, j of (β̂ᵢb̂ⱼ − (βᵢbⱼ)ˆ)² is minimised, where (βᵢbⱼ)ˆ denotes the product estimate read from θ̂₃. This is equivalent to

(β̂, b̂) = arg min ‖ Θ − β̂ b̂ᵀ ‖_F  with β̂₁ = 1,   (19.18)

where

Θ = [ β₁b₁  ···  β₁bₚ ]
    [  ⋮    ⋱    ⋮  ]
    [ βₙb₁  ···  βₙbₚ ]
where ‖·‖_F stands for the matrix Frobenius norm. This is exactly the same minimisation problem as in (19.9), and the solution is given by

β̂ = (1/s_ξ) ξ₁,  b̂ = σ₁ s_ξ η₁,

where s_ξ is the first entry of ξ₁ and Σᵢ₌₁^min(n,p) σᵢξᵢηᵢᵀ is the singular value decomposition (SVD) of Θ. The over-parametrisation method is fairly straightforward. A disadvantage is that the dimension of the identification problem is usually very high. To this end, an iterative method may be useful. The idea is reminiscent of the iterative methods for the Hammerstein model [14] and for the Wiener model [12]: the parameter set is divided into a linear part and a nonlinear part; the linear part is estimated while the nonlinear part is fixed, and then the roles are switched to estimate the nonlinear part while the linear part is fixed. Define the cost function

J = Σₖ { Σᵢ₌₁ᵐ r̂ᵢyⁱ[kT] − [ Σᵢ₌₁ⁿ α̂ᵢ Σⱼ₌₁ᵐ r̂ⱼyʲ[kT − iT] + Σᵢ₌₁ⁿ β̂ᵢ Σⱼ₌₁ᵖ b̂ⱼwʲ[kT − iT] ] }²   (19.19)
and the iterative method can be summarised as follows:
a. Consider the system and collect data.
b. Set initial values r̂(0) and b̂(0), and let i = 1.
c. For fixed r̂(i − 1) and b̂(i − 1), find
   (α̂(i), β̂(i)) = arg min_(α,β) J(α, β, r̂(i − 1), b̂(i − 1)).
   Normalise the estimates such that β̂₁(i) = 1.
d. For fixed α̂(i) and β̂(i), determine
   (r̂(i), b̂(i)) = arg min_(r,b) J(r, b, α̂(i), β̂(i)).
   Normalise the estimates such that r̂₁(i) = 1.
e. Let Jᵢ = J(α̂(i), β̂(i), r̂(i), b̂(i)). If ΔJᵢ = Jᵢ − Jᵢ₋₁ is smaller than some prescribed threshold, go to the next step. Otherwise, set i = i + 1 and go to step c.
f. From r̂, calculate â. The final estimates are â, b̂, α̂ and β̂.

Notice that the cost function is bilinear, which implies that the minimisation at step c or step d is a simple linear least squares problem and can be solved efficiently. Although there is no guarantee of global convergence, this iterative method is expected to work just like its counterparts for Hammerstein [14] or Wiener [12] models, where it has been demonstrated that the iterative method is usually very effective and converges quickly. Moreover, divergence is rare [19].

2. Output nonlinearity order estimation.
In the proposed algorithm, the order m of the inverse output nonlinearity x = Σᵢ₌₁ᵐ rᵢyⁱ is assumed to be known. In practice, m is unknown and needs to be estimated from the data. A number of standard order estimation methods for linear systems, e.g., the rank test and the output error test, find application here; interested readers can find details in [13, 18]. We focus on the output error test method in this chapter. Suppose that the output nonlinearity and the linear transfer function have been estimated in terms of their coefficients

r̂(m) = (r̂₁, ..., r̂ₘ), â(m) = (â₁, ..., â_q), β̂(m) = (β̂₁, ..., β̂ₙ), α̂(m) = (α̂₁, ..., α̂ₙ),

respectively, for a fixed order m. Here m in the brackets indicates the dependence of the estimates on the order m. Further, suppose that the input nonlinearity u = g(w) has been parametrised, say by a polynomial u = Σᵢ₌₁ᵖ b̂ᵢwⁱ. Define the estimated output
ū[kT] = Σᵢ₌₁ᵖ b̂ᵢwⁱ[kT],
x̄[kT] = Σⱼ₌₁ⁿ α̂ⱼ(m) x̄[kT − jT] + Σⱼ₌₁ⁿ β̂ⱼ(m) ū[kT − jT],
ȳₘ[kT] = Σᵢ₌₁^q âᵢ(m) x̄ⁱ[kT],

and the output error

e(m) = (1/N) Σₖ₌₁ᴺ (ȳₘ[kT] − y[kT])²,
where y[kT] is the observed output. Now, consider the case where the order m increases. Note that x = Σᵢ₌₁ᵐ rᵢyⁱ approximates x = f⁻¹(y). With more free parameters in the model, a better fit is expected, i.e., a smaller e(m) is expected as m increases. The important question is whether or not the improvement is significant. In a noise-free case with a PE input, if the order m is high enough to adequately describe the inverse function x = f⁻¹(y), any increment in m produces only a small reduction in e(m). In a noisy situation, the output error e(m) may even increase with m, once m is already higher than the order that adequately describes the inverse function, due to noise, computational error or the level of PE. Therefore, if Δe(m) = e(m) − e(m − 1) is small, the order m should be chosen; otherwise, a higher order is preferred. A schematic sketch of this test is given below, followed by a worked example.
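A schematic sketch of the order test; identify_and_simulate stands for the full identification pipeline of Section 19.3 and is a hypothetical placeholder, as is the stopping tolerance.

```python
import numpy as np

def output_error(y, y_bar):
    """e(m) = (1/N) * sum over k of (y_bar_m[kT] - y[kT])^2."""
    return np.mean((y_bar - y) ** 2)

def select_order(w, y, identify_and_simulate, m_max=8, tol=1e-3):
    """Pick the inverse-polynomial order m by the output error test;
    identify_and_simulate(w, y, m) must return the estimated output
    y_bar_m for a candidate order m (placeholder for the algorithm)."""
    errors = []
    for m in range(1, m_max + 1):
        y_bar = identify_and_simulate(w, y, m)
        errors.append(output_error(y, y_bar))
        # Stop once a further increment in m buys only a small reduction.
        if m > 1 and errors[-2] - errors[-1] < tol:
            return m, errors
    return m_max, errors
```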
Example 19.3. Let the continuous time system be the first order system ẋ = −0.88x + 2.4266u. The unknown input and output nonlinearities are, respectively,

u = w + w² + 2w³,  y = 1.059x + 0.4236x².

With the same simulation conditions as in Example 19.2, Figure 19.5 shows the output error e(m) versus the order m. Clearly, m = 5 or 6 should be chosen. For m = 5, the true values and the estimates are given in Table 19.3. Note that the numerator is normalised to β(z) = 1 because of the first order transfer function. In this example, the unknown input nonlinearity is estimated and then parametrised by a polynomial.

Table 19.3: Simulation results when m = 5, Example 19.3
              linear part (denominator)   output nonlinearity   input nonlinearity
true values   [1, −0.6703]                [1.059, 0.4236]       [1, 1, 2]
estimates     [1, −0.6724]                [1.0410, 0.4146]      [1.0073, 1.0254, 1.9916]
3. Inverse parametrisation of the output nonlinearity.
The actual output nonlinearity y = f(x) is unknown and we use the inverse parametrisation x = Σ rᵢyⁱ. This inverse approach has been used in the literature [12, 15] to model Wiener systems. Here, we adopt it to model Hammerstein-Wiener systems with unknown input nonlinearity structure. Because of the blind method, the inverse approach makes identification of the Hammerstein-Wiener model feasible even with an unknown input nonlinearity structure and non-white inputs. Clearly, the success of the proposed algorithm hinges on the accuracy of the approximation x = Σ rᵢyⁱ to the true x = f⁻¹(y). Theoretically, as long as y = f(x) is one-to-one and continuous, x = Σ rᵢyⁱ approximates x = f⁻¹(y) to any accuracy as the order increases. Practically, however, a high order m introduces errors due to noise and model uncertainty and slows down the convergence rate. There is a balance between the errors introduced by the approximation x = Σ rᵢyⁱ and the errors due to noise and model uncertainty. Therefore, the inverse approach is not a universal approach and probably would not work if the order of the approximation x = Σ rᵢyⁱ needs to be very high. Whether or not the inverse approach is appropriate for a particular (unknown) system is in fact reflected in the output error e(m) discussed before: if e(m) is uniformly large, the inverse approach does not work well; on the other hand, a small e(m) for some m indicates the success of the inverse approach together with the blind method.

4. Persistent excitation conditions.
To have a robust identification algorithm in the presence of noise and model uncertainty, the regressors Δφ₁ and φ₂ need to be persistently exciting (PE). The
conditions under which φ₂ is PE are derived in [4]; basically, φ₂ is PE if the spectral lines of u[kT] are not concentrated on fewer than 2n points, a richness condition. This richness condition can also be translated into conditions on the input w[kT]. Suppose that w[kT] is an i.i.d. zero mean random variable and u[kT] = g(w[kT]) assumes at least two distinct values with nonzero probability. Then u[kT] is also i.i.d. and has infinitely many spectral lines, which implies that φ₂ is PE. The second scenario is a polynomial input nonlinearity u = Σ bᵢwⁱ and a sinusoidal input w[kT] = Σ cᵢ sin(Ωᵢk); this is certainly the case if the input is periodic, by the Fourier series representation. Then u[kT] has more than 2n spectral lines if w[kT] has 2n spectral lines, except in pathological cases where either coefficients are zero or frequencies coincide modulo 2π.

5. Choice of the over-sampling ratio l.
In the algorithm, the over-sampling ratio l = (n + 1) is assumed, where n is the order of G(z). This seems to imply that the order of G(z) has to be known a priori. In fact, l does not have to be (n + 1), and any l > (n + 1) also suffices [4]. Note that the key of the blind approach is to cancel the unknown signals u from equation (19.6) to form equation (19.7). This is possible for any l ≥ (n + 1), because if h = T/(n̄ + 1), or T = h(n̄ + 1), for any n̄ ≥ n, we have

u[(l − 1)T] = u[(l − 1)T + h] = ... = u[(l − 1)T + n̄h],  l = 1, 2, ...
Fig. 19.5: Output error versus order, Example 19.3
Hence, equation (19.7) follows. Of course, the details of the algorithm, including the equations, are not exactly the same when l > (n + 1) instead of l = (n + 1); however, the idea remains the same and all modifications are minor. In theory, as long as l ≥ (n + 1), blind identification is possible. In practice, however, there is a limit on how large l can be. For a very large l, or equivalently a very small h, two consecutive samples will likely have similar values, which makes the blind identification numerically ill-conditioned. A good choice of h is

1/(2f_y) ≤ h ≤ T/(n + 1),

where f_y is the bandwidth of the output y. Clearly, from the sampling theorem, h = 1/(2f_y) implies that y(t) can be completely determined from y[kh]; further increasing l, or equivalently reducing h, will not provide any additional information and will only make the algorithm ill-conditioned. The over-sampling approach is to fix the input sampling interval at T and over-sample the output at h so that l = T/h = (n̄ + 1) ≥ (n + 1). Another avenue to make l = (n̄ + 1) ≥ (n + 1) is the under-sampling approach: by letting the output sampling interval be h = T and keeping the input constant between k(n̄ + 1)T and (k + 1)(n̄ + 1)T for each k, we have l = (n̄ + 1). This under-sampling approach avoids the numerical instability problem at the price that (1) the utilisation of time is less efficient, prolonging the identification process, and (2) the system may be excited only in low frequency ranges. It is conceivable that an "optimal" way to achieve l ≥ (n + 1) could, in some cases, combine both the over-sampling and under-sampling approaches.

6. Relation to step response identification.
In a way, the blind technique presented in this chapter may be considered as repeatedly applying piece-wise constant inputs. Conceptually, a number of step responses could be used to give information first on the output nonlinearity and the linear part, and then on the input nonlinearity. However, the blind technique works fundamentally differently from the traditional step response identification method [17]. The traditional step response method relies heavily on the steady-state value y(t), t → ∞, of the step response and suffers from large noise at the end of the transient. This is especially true in the setting of parametric identification, and it is therefore suggested to apply the step response identification method several times to average out the effect of noise [17]. Clearly, with only the output measurements y[kT], blind identification is not possible if both the input and output sampling intervals are fixed at T, because two different sampled systems combined with properly chosen inputs could provide identical outputs y[kT]. With additional intermediate values y[kh] (h ≤ T/(n + 1)) between kT and (k + 1)T, however, the choice of the system becomes unique. This is the basic observation that explains why and how the blind technique works. Now, with the output observations over each T, an equation (19.7) relating the unknown parameters is derived. Obviously from (19.7),
the blind technique does not rely heavily on any particular value of the output observations but depends on each y[kh] equally.

7. Identifiability.
With PE inputs and Assumptions 19.1, 19.2 and 19.3, the identifiability of the Hammerstein-Wiener model shown in Figure 19.1 can easily be established. Identifiability here means that the representation of the system is unique in the absence of noise and model uncertainties. This can be seen easily: with PE regressors, the solutions of (19.7) and (19.13) are unique; moreover, the true but unknown system parameters are solutions in the absence of noise and model uncertainties. This establishes identifiability.
19.4 Concluding Remarks

In this chapter, we proposed a blind identification approach for sampled Hammerstein-Wiener models with an unknown input nonlinearity structure. The idea is to recover the internal signals u[kT] and x[kT] solely from the output measurements. This is essential because the input nonlinearity has an unknown structure. The purpose of the chapter is to present the main idea and to illustrate the effectiveness of the proposed approach. Some important topics were not discussed in the chapter, e.g., how exactly the noise influences the estimates in blind identification. This is an interesting and difficult question, and it becomes harder still to separate the effects of the noise from the effects of under-parametrisation in inverting the output nonlinearity. We expect that the findings will be quite different from the existing results on (non-blind) system identification. Another important issue is the application of the proposed method to real world problems.
References

1. Bai, E.W.: A blind approach to Hammerstein-Wiener model identification. Automatica 38, 967–979 (2002)
2. Bai, E.W.: Identification of systems with hard input nonlinearities. In: Moheimani, R. (ed.) Perspectives in Control. Springer, Heidelberg (2001)
3. Bai, E.W.: An optimal two-stage identification algorithm for Hammerstein-Wiener nonlinear systems. Automatica 34(3), 333–338 (1998)
4. Bai, E.W., Fu, M.: Blind system identification and channel equalization of IIR systems without statistical information. IEEE Trans. on Signal Processing 47(7), 1910–1921 (1999)
5. Bai, E.W., Fu, M.: Hammerstein model identification: a blind approach. Tech Report, Dept of Elec. and Comp., Univ. of Iowa (2001)
6. Billings, S.A., Fakhouri, S.Y.: Identification of a class of nonlinear systems using correlation analysis. Proc. of IEE 125(7), 691–697 (1978)
7. Boutayeb, M., Rafaralahy, H., Darouach, M.: A robust and recursive identification method for Hammerstein model. In: IFAC World Congress 1996, San Francisco, pp. 447–452 (1996)
8. Chang, F., Luus, R.: A non-iterative method for identification using Hammerstein model. IEEE Trans. on Auto. Contr. 16, 464–468 (1971)
9. Greblicki, W.: Nonparametric identification of Wiener system. IEEE Trans. on Information Theory 38, 1487–1493 (1992)
10. Hsia, T.: A multi-stage least squares method for identifying Hammerstein model nonlinear systems. In: Proc. of CDC, Clearwater, Florida, pp. 934–938 (1976)
11. Haber, R., Unbehauen, H.: Structure identification of nonlinear dynamic systems - a survey of input/output approaches. Automatica 26, 651–677 (1990)
12. Kalafatis, A.D., Wang, L., Cluett, W.R.: Identification of Wiener-type nonlinear systems in a noisy environment. Int. J. Contr. 66, 923–941 (1997)
13. Ljung, L.: System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs (1987)
14. Narendra, K.S., Gallman, P.G.: An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Trans. on Auto. Contr. 11, 546–550 (1966)
15. Pajunen, G.: Adaptive control of Wiener type nonlinear systems. Automatica 28, 781–785 (1992)
16. Pawlak, M.: On the series expansion approach to the identification of Hammerstein systems. IEEE Trans. on Auto. Contr. 36, 763–767 (1991)
17. Rake, H.: Step response and frequency response methods. Automatica 16, 519–526 (1980)
18. Söderström, T., Stoica, P.: System Identification. Prentice-Hall, Englewood Cliffs (1989)
19. Stoica, P.: On the convergence of an iterative algorithm used for Hammerstein system identification. IEEE Trans. on Auto. Contr. 26, 967–969 (1981)
20. Sun, L., Liu, W., Sano, A.: Identification of dynamical system with input nonlinearity. IEE Proc. Control Theory Appl. 146(1), 41–51 (1998)
21. Wigren, T.: Convergence analysis of recursive identification algorithms based on the nonlinear Wiener model. IEEE Trans. on Auto. Contr. 39, 2191–2206 (1994)
Chapter 20
Decoupling the Linear and Nonlinear Parts in Hammerstein Model Identification Er-Wei Bai
20.1 Problem Statement Consider a discrete time Hammerstein model [2, 8, 9, 10] with y(k), u(k) and v(k) being system output, input and noise respectively. The internal signal x(k) is not measurable. The linear system is assumed to be exponentially stable represented by a nonparametric transfer function ∞
G(z) = ∑ θi z−i
(20.1)
i=1
for some |θi | ≤ M λ i , 0 < M < ∞ and 0 ≤ λ < 1. The nonlinear block represents a static nonlinearity ∞
x = f (u, η ) = ∑ ηi pi (u) i=0
{pi (u)}∞ 0,
e.g., the power series or some orthonormal sysfor some base functions tems. These base functions can be unknown in identification. The purpose of identification is to find a pair of estimates Gˆ and fˆ based on the input-output date set {u(k), y(k)}N1 . In the non-parametric case, the transfer function estimate is represented by a FIR system n
ˆ θˆ n ) = ∑ θˆn,i z−i , θˆ n = (θˆn,1 , . . . , θˆn,n ) . G(z,
(20.2)
i=1
To estimate the coefficients θˆn,i , define the output prediction ˆ θˆ n ) f (u(k), ηˆ ) , y(k) ˆ = G(z, where ηˆ is the estimate of η . Now, consider the quadratic error criterion for each n Er-Wei Bai Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA 52242 USA e-mail:
[email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 335–345. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com
336
E.-W. Bai
1 N 2 JN (θˆ n , ηˆ ) = ∑ (y(k) − y(k)) ˆ N k=1
(20.3)
and the associated estimates (θˆ n , ηˆ ) = arg min JN (θˆ n , ηˆ ) . The quadratic error (20.3) is one of the most common cost functions in the identification literature and is certainly the case for Hammerstein model identifications. Many well known Hammerstein model identification methods [2, 3, 5, 8] belong to this class and the differences lie only in the detail techniques for solving (20.3). It is well known [6], provided that the noise is white, zero mean and independent of the input, that with probability one
θˆ n → θ∗ , ηˆ → η∗ , as N → ∞ for some θ∗ and η∗ satisfying (θ∗ , η∗ ) = arg min lim EJN (θˆ n , ηˆ ) N→∞
where E denotes the expectation operator over the probability space upon which the random variable is defined. Thus, the parameter and the transfer function estimates are convergent. This is similar to the linear case. The convergence rate is however, a different story. Recall that in a linear setting, i.e., in the absence of the nonlinearity ˆ jω , θˆ n ) is asymptotically given by [6, 7] f , the variance of the estimate G(e ˆ jω , θˆ n )} ≈ Var{G(e
n Φv (ω ) N Φu (ω )
(20.4)
where Φv (ω ) and Φu (ω ) are the spectral densities of the noise v(·) and the input u(·) respectively. Naturally, in the presence of the nonlinearity, one would expect ˆ jω , θˆ n )} ≈ n Φv (ω ) , where Φx (ω ) is the spectral density of the internal sigVar{G(e N Φx (ω ) nal x = f (u, η ) which is the input to the linear system. This is however not the case. As shown in [9], the asymptotic variance for the Hammerstein model identification is given by ˆ jω , θˆ n )} ≈ Var{G(e n Φv (ω ) 1 d d ˆ jω , θˆ n )|2 + ( log Φx (ω )) Pη ( log Φx (ω ))|G(e N Φx (ω ) 4N d η dη
(20.5)
where Pη /N is the variance matrix of the estimate ηˆ . The first term in (20.5) is as expected and the second term is from the coupling between the linear part and the nonlinear part. This coupling is one of the major difficulties in Hammerstein model identifications that not only makes identification of the linear part nonlinear,
20
Decoupling the Linear and Nonlinear Parts
337
but also adds an unavoidable degradation to the accuracy of the estimates. Therefore, it is very desirable, if possible, to reduce the effect of the coupling. This seems to be a non-trivial problem. Note that the asymptotic variance (20.5) is based on the quadratic criterion (20.3) and is independent of the technique which solves the minimisation. In other words, the asymptotic variance is independent of the identification algorithm. On the other hand, the asymptotic variance does rely on the spectral density function Φx (ω ) that in turn depends on the structure of the nonlinearity. If the structure of the unknown nonlinearity were available, one could adjust the input spectral density so that the second term is reduced. The problem is clearly that the structure of the nonlinearity is not assumed to be known and any method which tries to reduce the coupling effect should be independent of the structure of the unknown nonlinearity. In this chapter, we propose a two step identification algorithm. In the first step, the linear part is identified using the Pseudo-Random Binary Sequences (PRBS) input. With the help of the PRBS input, we show that identification of the linear part is decoupled from the nonlinear part and the resultant asymptotic variance is give by (20.4) exactly the same as if the unknown nonlinearity is absent. In other words, the decoupling is achieved and the effect of the coupling is eliminated. Moreover, this decoupling is independent of the nonlinearity which could be discontinuous and unknown.
20.2 Nonlinearity with the PRBS Inputs The PRBS input is a signal that shifts between two levels ±c for some c = 0. For simplicity, we assume in this chapter that c = 1. Because the PRBS input is easy to generate and has desired properties similar to a white noise, it is widely and successfully used in linear system identifications [6, 11]. It is also well known, however, that the PRBS signal is inappropriate for nonlinear system identifications in general [2, 9] because it assumes only two values that may not sufficiently excite the unknown nonlinearity. Despite of this well known fact, we show in this chapter that use of the PRBS input is actually very beneficial for identification of Hammerstein models. In particular, because of this very binary nature, any static nonlinearity can be exactly characterised by a linear (affine) function under the PRBS input. Therefore, the effect of nonlinearity is eliminated. To be precise, let x = f (u) be any static nonlinear function, e.g., as illustrated in Figure 20.1. Since, u(k) = ±1 for any k, x(k) = f (±1) for all k. Thus, x = f (u) can be completely characterised by a linear (affine) function under the PRBS input x = f (u) = η0 + η1 · u . The coefficients η0 and η1 can be determined by solving f (1) η0 + η1 · 1 η0 1 1 = = f (−1) η0 − η1 · 1 1 −1 η1
(20.1)
(20.2)
338
E.-W. Bai
⇒
1 1 η0 = η1 2 1
1 f (1) + f (−1) 1 f (1) = . −1 f (−1) 2 f (1) − f (−1)
Though η0 and η1 depend on the unknown f (·), they are well defined and unique. In this chapter, we assume f (1) = f (−1) that implies η1 = 0. Clearly, if f (1) = f (−1), x(k) becomes a constant and identification may not be possible. At this point, observe that i the gains of f (u) and G(z) are actually not unique. Any pair (α f (u), G(z)/α ), α = 0, would produce identical input and output measurements. Therefore, to be uniquely identifiable, one of the gains have to be fixed. There are several ways to normalise the gains, e.g., either the gain of f (u) or G(z) can be fixed to be unit. In this chapter, we assume Assumption 20.1. The coefficient η1 in (20.1) is normalised to be one, i.e., η1 = 1. With this normalisation, the Hammerstein model can be re-written as y(k) = G(z)x(k) + v(k) = G(z)(η0 + η1 u(k)) + v(k) ∞
∞
= ∑ θi (η0 + u(k − i)) + v(k) = θ0 + ∑ θi u(k − i) + v(k)
with
i=1 ∞ θ0 = η0 ∑i=1 θi .
i=1
Fig. 20.1: The functions x = f (u) and x = η0 + η1 · u
(20.3)
20
Decoupling the Linear and Nonlinear Parts
339
Except the bias term θ0 which can be easily estimated and corrected, the equation (20.3) that relates the unknown G(z) to the input output data is linear (affine). Thus, identification of the linear part of the Hammerstein model is virtually linear under the PRBS input. It is important to point out that we obtain this linear (affine) equation with no knowledge of the nonlinearity.
20.3 Linear Part Identification Since the equation (20.3) is linear, any linear identification method can be applied. The key is that identification of the linear part is decoupled from the nonlinear part under the PRBS input.
20.3.1 Non-parametric Identification For each n, define two vectors Wn (ω ) = (e− jω , e−2 jω , . . . , e−n jω ) .
φn (k) = (u(k − 1), u(k − 2), . . ., u(k − n)) . We now re-write the output prediction y(k) ˆ and the quadratic error using equation (20.3): n
y(k) ˆ = θˆn,0 + ∑ θˆn,i u(k − i) , i=1
n 1 N 1 N 2 (θˆn,0 , θˆ n ) = arg min ∑ (y(k) − y(k)) ˆ = ∑ [y(k) − (θˆn,0 + ∑ θˆn,i u(k − i))]2 , N k=1 N k=1 i=1
and the associated transfer function estimate n
ˆ jω , θˆ n ) = ∑ θˆn,i e−i jω = Wn (ω )θˆ n . G(e i=1
Clearly, the parameter estimates θˆn,0 and θˆ n based on the quadratic error (20.3) are the well known least squares solutions ⎛ˆ ⎞ θn,0 ⎜θˆ ⎟ N ˆ 1 N θn,0 1 1 ⎜ n,1 ⎟ −1 1 = { φ (k))} [ (1, = ⎟ ⎜ . ∑ φn (k) ∑ φn (k) y(k)] . n θˆ n ⎝ .. ⎠ N k=1 N k=1 θˆn,n ˆ we model the PRBS input as an i.i.d. To analyse the consistency of the estimate G, process with binomial density distribution 0.5 u(k) = 1 , prob{u(k)} = 0.5 u(k) = −1 .
340
E.-W. Bai
It is easily verified that 1 N 1 N 2 u(k) = 0, Eu2 (k) = lim ∑ ∑ u (k) = 1 , N→∞ N N→∞ N k=1 k=1
Eu(k) = lim
1 N ∑ u(k)u(k − τ ) = δ (τ ), Φu (ω ) = 1 , N→∞ N k=1
R(τ ) = Eu(k)u(k − τ ) = lim
1 N ∑ u(k)v(t) = 0 , N→∞ N k=1
Eu(k)v(t) = lim
provided that the noise is independent of the input. Intuitively, from the above equations and the assumption that |θi | ≤ M λ i , we expect as N → ∞, 1 N 1 ∑ φn (k) (1, φn (k)) = N k=1 ⎛
1 u(k − 1) ... 2 N ⎜u(k − 1) (k − 1) . .. u 1 ⎜ ⎜ .. .. ∑ . .. N k=1 ⎝ . . u(k − n) u(k − n)u(k − 1) . . .
⎞ ⎛ u(k − n) 1 0 ⎜0 1 u(k − 1)u(k − n)⎟ ⎟ ⎜ ⎟ → ⎜ .. .. .. ⎠ ⎝. . . 2 0 0 u (k − n)
... ... .. . ...
⎞ 0 0⎟ ⎟ .. ⎟ .⎠ 1
∞ 1 N 1 N φn (k)y(k) = ∑ φn (k)(θ0 + ∑ θi u(k − i) + v(k)) ∑ N k=1 N k=1 i=1
⎛ ⎞ θ1 ⎟ ⎜ N ∞ 1 ⎜θ2 ⎟ → ∑ φn (k) ∑ θn u(k − i) → ⎜ . ⎟ . N k=1 ⎝ .. ⎠ i=1 θn
ˆ jω , θˆ n ) → G(e jω ) as Roughly speaking, this implies θˆ n → (θ1 , . . . , θn ) and G(e n, N → ∞. In fact, by mimicking the proofs and technical assumptions on the order n and the noise v(·) as in the linear case [7], the above observation can be made precise. ˆ θˆ n ) derived Theorem 20.1. Consider the Hammerstein model, the estimate G(z, from the quadratic criterion under the PRBS input. Suppose that the noise v(·) is the output of some unknown exponentially stable linear system driven by an i.i.d. sequence with zero mean and independent of the input. Further, let the order n = n(N) satisfy n2 (N)/N → 0 and n(N) → ∞, as N → ∞,
∞
∑[n(N 2 )/N]2 < ∞, 1
∞
∑[n3 (N)/N]q < ∞ . 1
20
Decoupling the Linear and Nonlinear Parts
341
for some q > 0. Then, with probability one as N → ∞, ; N ˆ jω ˆ n Φv (ω ) j ω n j ω ˆ |G(e [G(e , θ ) − G(e jω )] ≈ normal (0, ) , θˆ ) − G(e )| → 0, n Φu (ω ) and this implies ˆ jω , θˆ n )} ≈ Var{G(e
n Φv (ω ) . N Φu (ω )
This is exactly the same result one would have in a linear identification setting. In other words, the second term in (20.5) which is the effect of coupling in Hammerstein model identifications is completely eliminated by using the PRBS input. To show the improved performance by decoupling, we consider a numerical example. The nonlinearity is a pre-load shown in Figure 20.1, x = f (u, η ) = η1 · u + η0 · sign(u) with η1 = η0 = 0.5. The linear part is a second order stable system G(z) =
z − 0.75 . z2 − 1.5z + 0.56
In the simulation, N = 3000 and the noise is assumed to be i.i.d. uniformly in [−1, 1]. Figure 20.2 shows the estimation errors (in *) 1 2π
ˆ jω , θˆ n )|2 d ω |G(e jω ) − G(e
as a function of n by using the proposed method. To compare the results with existing methods of non-PRBS inputs, we also show the estimation errors by the iterative method [8] √ (in√o). In the simulation of the iterative method, an uniform random input in [− 3, 3] is applied to keep the same input energy and η1 is normalised to 0.5. As expected, the proposed method outperforms the iterative method because of decoupling. Though only the comparison with the iterative method is provided, the comparison is representative. For example, because of iterative nature and that the pre-load can be written as a linear function of u(·) and sign(u(·)), the performance of the half-substitution approach of [12] is similar. More importantly, as discussed before, the asymptotical performance (20.5) is independent of the method as long as the quadratic criterion is considered.
20.3.2 Parametric Identification In the above discussion, G(z) is assumed to be non-parametric. The idea can be easily extended to parametric identifications. Suppose
342
E.-W. Bai
G(z) =
β1 zm−1 + β2zm−2 + ... + βm , zm + α1 zm−1 + ... + αm
(20.1)
for some m, βi ’s and αi ’s. Re-define
θ = (α1 , . . . , αm , β1 , . . . , βm , θ0 ) , φ (k) = (−y(k − 1), . . ., −y(k − m), u(k − 1), . . ., u(k − m), 1) . The output y(k) can be written in time domain as m
m
i=1
i=1
y(k) = − ∑ αi y(k − i) + ∑ βi u(k − i) + θ0 + v0 (k) = θ φ (k) + v0 (k)
(20.2)
for some v0 (·) and θ0 = η0 ∑m i=1 βi . This equation is again linear (affine) as if the unknown nonlinearity is absent and thus, any linear method may apply. Since there is a huge volume of work in the literature on linear system identification, we only discuss three cases here relevant to the bias term θ0 . The quadratic error criterion: If the quadratic error of (20.3) is considered, the estimates θ = (αˆ 1 , . . . , βˆm , θˆ0 ) is again the least squares solution θ = (Q(N) Q(N))−1 Q(N)Y (N) → θ as N → ∞
Fig. 20.2: Estimation errors
20
Decoupling the Linear and Nonlinear Parts
343
provided that the noise v0 (·) is i.i.d. with zero mean and independent of the input, where ⎞ ⎛ ⎞ ⎛ φ (1) y(1) ⎟ ⎜ ⎟ ⎜ Y (N) = ⎝ ... ⎠ , Q(N) = ⎝ ... ⎠ .
φ (N)
y(N)
Consequently, αˆ i → αi and βˆ j → β j asymptotically, and ˆ = G(z)
βˆ1 zm−1 + . . . + βˆm → G(z) . zm + αˆ 1 zm−1 + . . . + αˆ m
Non-zero mean i.i.d. noises: If the noise v0 (·) is i.i.d. with non-zero mean, this non-zero mean and the bias term θ0 can be easily taken care of by the filter (1 − z−1). By defining y f (k) = (1 − z−1)y(k), u f (k) = (1 − z−1)u(k), v f (k) = (1 − z−1)v0 (k) and applying the filter to both sides of (20.2), we obtain m
m
i=1
i=1
y f (k) = − ∑ αi y f (k − i) + ∑ βi u f (k − i) + v f (k) . Clearly, the bias term θ0 is eliminated by the filtering and the new noise sequence v f (k) is i.i.d. with zero mean. Then, the least squares method can be applied again. Non-i.i.d. noises: If the noise v0 (·) is not i.i.d. but a stationary process with rational spectral density, then the instrumental variable method can be used. It is a standard result that the estimate derived by the instrumental variable method converges asymptotically provided that the instrumental variable is properly chosen [11].
20.4 Nonlinear Part Identification Once the linear part is obtained, we can identify the nonlinear part. At this point, disadvantages of PRBS inputs become apparent because the PRBS assumes only two values that do not excite the nonlinearity sufficiently. In order to excite the nonlinearity, the input has to be rich enough. In other words, to identify the nonlinear part, a new input output data set has to be generated. The primary difficulty in identifying the unknown nonlinearity f (·) is that no a priori structural knowledge on f (·) is assumed and thus estimation of f (·) is no longer a parameter estimation problem. We can deal with this problem in at least two ways. The first approach is to write f (u) = ∑∞ i=0 ηi pi (u) for some base functions {pi }∞ and then, to estimate the coefficients η ’s. This approach has been discussed in i 0 details in [10] along its convergence and consistency. The advantage of this approach is that only ηi ’s are estimated and no additional steps are needed. The disadvantage is that usually a large number of ηi ’s is required to have a reasonable representation
344
E.-W. Bai
of the unknown f (·). We focus the second approach in this chapter which is based on the following observation: though f (·) is unknown, it is static and if u(k) and x(k) are available, the picture of f (·) can be graphed and this graphical picture provides structural information on the unknown f (·). The input u(k) is available and the key to determine the structure of f (·) is to recover x(k). We divide the discussion in two cases. ˆ The estimated linear part G(z) is minimum phase: ˆ In this case, the internal signal x(k) can be estimated by inverting G(z), ˆ −1 y(k) . x(k) ˆ = G(z) In the case of parametric identification, we may write x(k) ˆ in time domain as x(k) ˆ =
1 ˆ my(k − m + 1)] . [−βˆ2 x(k − 1) − ... − βˆ mx(k − m + 1) + y(k + 1) + ... + α ˆ β1
Note all y(k)’s are available and the causality is not an issue. ˆ The estimated linear part G(z) is non-minimum phase: ˆ In this case, inversion becomes problematic. However, as long as G(z) does not have zeros on the unit circle, the following result can be easily derived, e.g., by a similar proof as in [4]. ˆ is stable and does not have zeros on the unit circle. Lemma 20.1. Assume that G(z) Then, for any arbitrarily small ε > 0, there always exists a stable transfer function ˆ H(z) and a positive integer l such that z1l − H(z)G(z) 2 ≤ ε. Now, define the estimate x(k) ˆ by x(k ˆ − l) = H(z)y(k). From the lemma, the error ε can be made arbitrarily small. Once x(k) ˆ is recovered, the structural information can be identified from the graph determined by the pairs (u(k), x(k)). ˆ If necessary, we can then parametrise the nonlinearity by using appropriate base functions xˆ = f (u) = ∑ pi (u, ηi ) for some known base functions pi ’s. The coefficients ηi ’s can estimated from
ηˆ i = arg min ηˆ i
1 N ˆ − ∑ pi (u(k), ηˆ i ))2 . ∑ (x(k) N k=1
A weighted criterion assigning smaller weights of earlier estimates of the internal signal might provide better results. Such earlier estimates are indeed influenced by the transient behaviour. Convergence analysis of this type of estimates has been carried out in the literature and we refer interested readers to [3] for details.
20
Decoupling the Linear and Nonlinear Parts
345
20.5 Concluding Remarks In Hammerstein model identifications, a major difficulty is the coupling between the linear and the nonlinear parts. Under the PRBS input, however, we show that they can be decoupled and thus, identification of the linear part is essentially a linear problem. This greatly reduces the complexity and improves the efficiency of the identification algorithm. The chapter is based on [1] with permission from Automatica/Elsevier.
References 1. Bai, E.W.: Decoupling of linear and nonlinear parts in Hammerstein model identification. Automatica 40, 1651–1659 (2004) 2. Bai, E.W.: Identification of linear systems with hard input nonlinearities of known structure. Automatica 38, 853–860 (2002) 3. Bai, E.W.: A blind approach to the Hammerstein–Wiener model identification. Automatica 38, 967–979 (2002) 4. Bai, E.W., Dasgupta, S.: A minimum k-step delay controller for robust tracking of nonminimum phase systems. Systems & Control Letters 28, 197–203 (1996) 5. Boutayeb, M., Rafaralahy, H., Darouach, M.: A robust and recursive identification method for Hammerstein model. In: IFAC World Congress 1996, San Francisco, pp. 447–452 (1996) 6. Ljung, L.: System Identification: Theory for the User, 2nd edn. Prentice-Hall, Upper Saddle River (1999) 7. Ljung, L., Yuan, Z.-D.: Asymptotic properties of black box identification of transfer functions. IEEE Trans. on Auto Contr. 30, 514–530 (1985) 8. Narendra, K.S., Gallman, P.G.: An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Trans. on Auto. Contr. 11, 546–550 (1966) 9. Ninness, B., Gibson, S.: Quantifying the accuracy of Hammerstein model estimation. Automatica 38, 2037–2051 (2002) 10. Pawlak, M.: On the series expansion approach to the identification of Hammerstein systems. IEEE Trans. on Auto Contr. 36, 763–767 (1991) 11. Soderstrom, T., Stoica, P.: System Identification. Prentice-Hall, New York (1989) 12. Voros, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33, 1141–1146 (1997)
Chapter 21
Hammerstein System Identification in Presence of Hard Memory Nonlinearities Youssef Rochdi, Vincent Van Assche, Fatima-Zahra Chaoui, and Fouad Giri
21.1 Introduction Most works on Hammerstein system identification have focused on the case of memoryless nonlinearities characterised by a static relation u = F(v, θ ) where θ is a vector of unknown parameters (Figure 21.1). For a long time, the function F(v, θ ) has been assumed to be of known structure (e.g. polynomial), continuous in v and linear in θ (see e.g. Chapters 3 and 5 and reference list therein). Hard nonlinearities of known type have been considered in [1, 4, 6, 8]. Then, F(v, θ ) may be nonlinear in θ and discontinuous in v. The case of memoryless nonlinearities F(v) with no prior knowledge has been dealt with in [5]. Hammerstein system identification in presence of memory hard nonlinearities is a more challenging problem. The case of backlash-relay and backlash (Figures 21.2a and 21.3a) has been coped with in [3] using a separable nonlinear least squares method. However, the proposed solution has only been applied to symmetric nonlinearities with a single unknown parameter (specifically, h1 = h2 = a and M1 = M2 = 1 where a denotes the unknown parameter). A thorough description of this approach is given in Chapter 16. In [6] a two-stage procedure is developed for estimating parameter bounds for Hammerstein systems involving not-necessarily symmetric backlash element. The quality of the estimated bounds depends on the output noise amplitude: the smaller the noise Youssef Rochdi FST, University Caddi Ayad Marrakech Vincent Van Assche GREYC, University of Caen, Caen, France Fatima-Zahra Chaoui ENSET, University of Rabat, Morocco Fouad Giri GREYC, University of Caen, Caen, France e-mail:
[email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 347–365. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com
348
Y. Rochdi et al.
amplitude, the tighter the parameter bounds (Fig 21.2a). A detailed presentation of this approach is made in Chapter 22. In the present chapter, a quite different solution is described for identifying Hammerstein systems containing not-necessarily symmetric nonlinearities of the backlash, backlash-relay, switch and switch-relay types (Figures 21.2a to 21.3b). The linear subsystem parameters are first estimated using a least squares estimator. The estimation process relies on a key design feature: an appropriate input signal is designed to ensure persistent excitation and to make possible the measurement of the internal signal u(t). The linear subsystem identification is then decoupled from the nonlinear element. The parameters of the nonlinear element are then estimated using estimators of the least-squares type, based on adequate system parametrisation. All involved estimators are shown to have quite interesting consistency properties. These consistency results have first been established in [7]. The chapter is organised as follows: the identification problem is stated in Section 21.2; Section 21.3 is devoted to the linear subsystem identification while the nonlinear element identification is carried out in Section 21.4.
ξ(t) v(t)
u(t) F(.)
Memory nonlinearity
y(t) B(q-1)
1/A(q-1)
Linear subsystem Fig. 21.1: Hammerstein System
21.2 Identification Problem Formulation We are interested in systems that can be described by the Hammerstein model (Figure 21.1): (21.1) A q−1 y(t) = B q−1 u(t) + ξ (t) and u(t) = F(v(t)) with A q−1 = 1 + a1q−1 + . . . + an q−n , B q−1 = b1 q−1 + . . . + bn q−n
(21.2)
where the internal signal u(t) is not measurable and the equation error ξ (t) is a bounded zero-mean stationary and ergodic sequence of stochastically independent variables. The linear subsystem is supposed to be (asymptotically) stable, controllable and of known order n. Controllability is required for persistent excitation purpose ([5]). The function F(.) is allowed to be a backlash, backlash-relay, switch
21
Hammerstein System Identification
349
or switch-relay. These are analytically defined in Table 21.1 and graphically illustrated by Figures 21.2a to 21.3b. The backlash-relay and switch-relay elements are characterised by the (unknown) parameters (M1 , M2 , h1 , h2 ); they will be denoted R(M1 ,M2 ,h1 ,h2 ) . The switch and backlash elements are characterised by (S, h1 , h2 ); they will be denoted Sw(S,h1 ,h2 ) and Ba(S,h1 ,h2 ) respectively. The meaning of (h1 , h2 ) is different for the different elements. For backlash- and switch-relay, h1 is the smallest number such that, for all t and whatever the value of v(t − 1): v(t) ≥ h1 ⇒ F(v(t)) = M1 , h2 is the largest number such that, for all t and whatever the value of v(t − 1): v(t) ≤ h2 ⇒ F(v(t)) = M2 . For backlash operators, [h2 , h1 ] is the widest interval such that one may have F(v) = 0, for all v ∈ [h2 , h1 ]. For switch elements, F(v) = 0 ⇒ v = h2 or v = h1 .
F(v)
M1 S
-hm
h2
hm v
h1
M2
Fig. 21.2a Backlash operator
Fig. 21.2b Switch operator
F(v)
F(v)
M
M1
v
-hm h2
h1
M2
Fig. 21.3a Backlash-relay
hm
-hm h2
h1
M2
Fig. 21.3b Switch-relay
hm
v
350
Y. Rochdi et al. Table 21.1: Analytic definitions of the considered nonlinearities
Backlash ⎧ ⎨ S (v(t) − h2 ) u(t) = S (v(t) − h1 ) ⎩ u(t − 1) u(t−1) where hL = S + h2
Switch ⎧ if v(t) ≤ hL ⎨ S (v(t) − h1 ) if v(t) ≥ hR u(t − 1) if hL < v(t) < hR u(t) = ⎩ S (v(t) − h2 ) u(t−1) and hR = S + h1
Backlash-relay ⎧ M1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ u(t) =
M2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩
if v(t) > h1 or h2 ≤ v(t) ≤ h1 and u(t − 1) = M1 if v(t) < h2 or h2 ≤ v(t) ≤ h1 and u(t − 1) = M2
Switch-relay ⎧ M1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ u(t − 1) u(t) = ⎪ ⎪ M ⎪ 2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎩
if v(t) > v(t − 1) if v(t) = v(t − 1) if v(t) < v(t − 1)
if v(t) > h1 or h2 ≤ v(t) ≤ h1 and v(t) > v(t − 1) if v(t) = v(t − 1) if v(t) < h2 or h2 ≤ v(t) ≤ h1 and v(t) < v(t − 1)
Assumption 21.1. The following assumptions are supposed to be satisfied by the identified system: 1. B(1) = 0 i.e. the static gain of the linear subsystem is nonzero. 2. There is a known real hm such that: hm > max (|h1 |, |h2 |). Our purpose is to design an identification scheme that provides consistent estimates of all unknown parameters i.e. those of the linear subsystem B q−1 /A q−1 and those of the nonlinear element F(.). Remark 2.1. i) Part 2 of Assumption 21.1 is not too restrictive because hm may be arbitrarily large. A similar assumption is needed in Section 15.7 where symmetric nonlinearities (of backlash type) were considered. There, a graphical search method has been proposed to find the value of a. Then, the search should be initialised in an interval including the unknown parameter a to prevent local convergence. ii) It can be easily checked that the backlash-relay element (resp. switch-relay) can be seen as a cascade of a simple backlash (resp. switch) element in series with a static-relay.
21.3 Linear Subsystem Identification The linear subsystem identification is dealt with in three steps. First, an adequate system rescaling is introduced, in Subsection 21.3.1, to reduce the number of uncertain parameters. The obtained system representation is further transformed in Subsection 21.3.2 to cope with the unavailability of the internal signal u(t). The transformed representation involves linearly the linear subsystem parameters and, therefore, is used to estimate them. Finally, a persistently exciting input is proposed in Subsection 21.3.3 and shown to ensure estimate consistency.
21
Hammerstein System Identification
351
21.3.1 Model Reforming The considered identification problem solution up to a multiplicative has a unique real constant. Actually, if the triplet A q−1 , B q−1 , F(.) is a solution then so is −1 def A q , μ B q−1 , Fμ (.) for any μ = 0 with Fμ (v) = F(v) μ . −1 ∗ −1 ∗ Let A q ,B q , F (.) denotes the particular model corresponding to def
μ ∗ = M2 − M1 , i.e. def def B∗ = μ ∗ b1 q−1 + . . . + bn q−n = b∗1 q−1 + b∗2 q−2 + . . . + b∗n q−n , F∗ =
def
F(.) . μ∗
(21.3) (21.4)
It is readily seen from Figures 21.2a to 21.3b that: F = R(M1 ,M2 ,h1 ,h2 ) ⇒ F ∗ = R(m1 ,m2 ,h1 ,h2 ) ∗
(21.5) ∗
F = SwS,h1 ,h2 or F = Ba(S,h1 ,h2 ) ⇒ F = Sw(s,h1 ,h2 ) or F = Ba(s,h1 ,h2 ) with: m1 =
M1 M2 S , m2 = , s= . M2 − M1 M2 − M1 M2 − M1
(21.6)
(21.7)
The following relations are also of interest: m2 − m1 = 1, S =
M1 −M2 m1 −m2 = , s= = . hm − h1 hm + h2 hm − h1 hm + h2
(21.8)
Using (21.3)-(21.4), the system (21.1) can be rewritten as follows: def A q−1 y(t) = B∗ q−1 u∗ (t) + ξ (t), u∗ (t) = F ∗ (v(t)) .
(21.9)
Note that for Hammerstein systems with backlash-relay or switch-relay, the element R(m1 ,m2 ,h1 ,h2 ) involves three uncertain parameters (due to (21.8)) rather than four in the initial element R(M1 ,M2 ,h1 ,h2 ) . However, the new internal signal u∗ (t) in (21.9) is still unavailable. Therefore, the system representation (21.9) needs further transformations.
21.3.2 Model Centring and Linear Subsystem Parameter Estimation Let {y1 (t)} denote the response of (21.9) when the following input signal is applied: −hm if t = 0 , def v(t) = v1 (t) = . (21.10) hm if t ≥ 1 .
352
Y. Rochdi et al.
Then, using the definitions of Table 21.1 (or simply Figures 21.2a to 21.3b), it follows that, for all t ≥ 1: def A q−1 y1 (t) = B∗ q−1 u∗1 (t) + ξ1 (t), u∗1 (t) = F ∗ (hm ) = m1 ,
(21.11)
where ξ1 (t) denotes the realisation of ξ (t) during the present experiment. Time averaging the first equality in (21.11), over the interval 1 ≤ t ≤ L, yields1 : A(1)y¯1 (L) = B∗ (1)m1 + ξ¯1 (L) .
(21.12)
The ergodicity of {ξ1 (t)} implies that ξ¯1 (L) → E(ξ (t)) = 0 as L → 0 (w.p. 1). Also, let y¯1 denotes the limit of y(L) ¯ when L → ∞. It follows from (21.12) that such a limit exists and satisfies: (21.13) A(1)y¯1 = B∗ (1)m1 . Practically, y¯1 can be computed from a sufficiently large sample {y(t);t = 1 . . . L}. Now, subtracting (21.13) from the first equality in (21.9) gives: A q−1 (y(t) − y¯1) = B∗ q−1 (u∗ (t) − m1) + ξ (t) . (21.14) For convenience, let us introduce the following centred signals: y(t) ˜ = y(t) − y¯1 , u(t) ˜ = u∗ (t) − m1 . def
def
(21.15)
Using (21.14) and (21.15), it follows that the identified system (21.9) can be given the compact form: ˜ = B∗ q−1 u(t) ˜ + ξ (t) . A q−1 y(t) (21.16) On the other hand, using (21.15), (21.4) and the definitions of F(.) (Table 21.1 or Figures 21.2a to 21.3b), it follows that, for all t: 0 if v(t) = hm , (21.17) u(t) ˜ = 1 if v(t) = −hm . That is, the internal sequence {u(t)} ˜ in (21.16) turns out to be measurable as long as the input sequence {v(t)} takes its values in the set {−hm , hm }. Therefore, the coefficients of A q−1 and B∗ q−1 can be estimated based upon the equation error (21.16). To this end, the latter is given the following regressive form: y(t) ˜ = φ˜ (t)T θ ∗ + ξ (t) ,
(21.18)
with:
φ˜ (t) = [−y(t ˜ − 1) · · · − y(t ˜ − n) u(t ˜ − 1) · · · u(t ˜ − n)]T , 1
(21.19)
Throughout the chapter, x(N) ¯ denotes the arithmetic mean of {x(t)} i.e. x(N) ¯ = x(i)/N. If {x(t)} is an ergodic stochastic process, then x(N) ¯ → E(x(t)) (w.p. 1) as ∑N i=1 N → ∞ where E(x(t)) denotes the ensemble mean.
21
Hammerstein System Identification
353
θ ∗ = [a1 . . . an b∗1 · · · b∗n ]T .
(21.20)
The unknown parameter vector can then be estimated using the standard least squares estimator: 8
1 N ˜ ˜ T θˆ (N) = ∑ φ (i)φ (i) N i=1
9−1 8
1 N ˜ ˜ T ∑ φ (i)y(i) N i=1
9 .
(21.21)
˜ be It is understood that {v(t)} takes its values only in the set {−hm , hm } so that u(t) measurable.
21.3.3 A Class of Exciting Input Signal The input signal {v(t)} should meet the two requirements: (i) it must take its values in the set {−hm , hm } so that {u(t)} ˜ is measurable; (ii) the resulting regression vectors φ˜ (t) should satisfy the persistent excitation (PE) property (Proposition 21.1). Bearing these in mind, we propose the following periodic signal, with period T = 4n, where k is any integer, tk = kT , tk ≤ kT < tk + 1: −hm for t = tk + 2n , def v(t) = v2 (t) = (21.22) hm otherwise. Then, in view (21.17), the internal signal {u(t)} ˜ turns out to be the following: 1 for t = tk + 2n , def u(t) ˜ = u˜2 (t) = (21.23) 0 otherwise.
21.3.4 Consistency of Linear Subsystem Parameter Estimates Let z˜(t) denotes the undisturbed output defined as follows: A q−1 z˜(t) = B∗ q−1 u˜2 (t) .
(21.24)
Introduce the undisturbed state vector ˜ def = [−˜z(t − 1) · · · − z˜(t − n) u(t ˜ − 1) · · · u(t ˜ − n)]T . Z(t) Proposition 21.1. Let the system (21.1) be excited by the signal (21.22) so that it can be represented by the equation error (21.16) or its regression form (21.18). Then, with I denoting the identity matrix, one has: ˜ is PE i.e. there exists a real λ > 0 such that, for all k: 1. Z(t) 4n−1
∑
i=0
Z˜ (tk + i) Z˜ (tk + i)T ≥ λ I .
354
Y. Rochdi et al.
2. φ˜ (t) is PE in the mean i.e. there exists a real β > 0, such that: 1 N ˜ ˜ T ∑ φ (i)φ (i) > β I (w.p.1). N→∞ N i=1 lim
3. The estimator (21.21) is consistent i.e. θˆ (N) → θ ∗ as N → ∞ (w.p. 1). Proof. The PE property of Part 1 can be obtained applying a Technical Lemma in [5] to the system (21.24), using the fact that the input u˜2 (t) has the form required in [5], due to (21.23). Part 2 follows from Part 1, based on the relation ˜ + 1/A q−1 [ξ (t − 1) . . . ξ (t − n) 0 . . . 0]T , φ˜ (t) = Z(t) ˜ using the ergodicity of ξ (t) and its independence with Z(t). Part 3 is in turn a consequence of Part 2.
21.3.5 Simulation Consider two Hammerstein systems with a linear subsystem characterised by: A q−1 = 1 − 1.3q−1; B q−1 = q−1 − 0.5q−1 . (21.25)
ξ (t) is a zero-mean i.i.d random sequence in [−0.5 0.5]. The nonlinear elements are a backlash and switch-relay, respectively defined as follows: for Ba(S,h1 ,h2 ) : h1 = 1, h2 = −0.5, S = 1 ,
(21.26)
for R(M1 ,M2 ,h1 ,h2 ) : h1 = 2, h2 = −1, M1 = 1, M2 = −2 .
(21.27)
Notice that these nonlinearities are asymmetric. The real hm in Assumption 2 is chosen as follows: 2 for F = Ba(S,h1 ,h2 ) , (21.28) hm = 3 for F = R(M1 ,M2 ,h1 ,h2 ) . For Ba(S,h1 ,h2 ) , it follows from (21.26) and (21.28) that M1 = F (hm ) = 1; M2 = F(−hm ) = 1.5. Then, (21.3) gives: B∗ q−1 = −2.5q−1 + 1.25q−2 for F = Ba(S,h1 ,h2 ) , (21.29) B∗ q−1 = −3q−1 + 0.5q−2 for F = R(M1 ,M2 ,h1 ,h2 ) . (21.30) Also, one gets from (21.4)-(21.7) that for F = Ba(S,h1 ,h2 ) : 2 2 F ∗ = Ba(s,h1 ,h2 ) with h1 = 1, h2 = −0.5, s = − , m1 = − , 5 5
(21.31)
21
Hammerstein System Identification
355
and for F = R(m1 ,m2 ,h1 ,h2 ) : F ∗ = R(m1 ,m2 ,h1 ,h2 ) with m1 = −
−1 , h1 = 2, h2 = −1 , 3
(21.32)
where m2 = 1 + m1 . The resulting parameter vector θ ∗ is given in Table 21.2 (line 3). First, the system is excited by the step signal v1 (t) defined by (21.10). Then, time averaging the first equality in (21.11), over the interval 1 ≤ t ≤ L = 1000, yields y¯1 . Then, the system is excited by the periodic signal defined by (21.22). The obtained data are used in algorithm (21.21) (with N = 1000). Doing so, one gets the estimates of Table 21.2 (line 4) which are clearly quite close to their true values. Table 21.2: The estimate obtained by (21.22)
θ∗ θˆ (N)
Case where F = Ba(S,h1 ,h2 )
Case where F = R(M1 ,M2 ,h1 ,h2 )
[−1.3 0.42 − 2.5 1.25] [−1.297 0.419 − 2.493 1.245]
[−1.3 0.42 − 3 1.5]T [−1.297 0.418 − 2.993 1.493]T
21.4 Nonlinear Element Estimation ∗ Let θˆ (N) bethe estimate −1 −1of θ obtained by (21.21) (using a sufficiently large N). ˆ ˆ and BN q be the estimates of A q−1 and B∗ q−1 induced by Let AN q θˆ (N). These estimates will now be used to determine the parameters of F ∗ (.) ∈ R(m1 ,m2 ,h1 ,h2 ) , Ba(s,h1 ,h2 ) , Sw(s,h1 ,h2 ) .
21.4.1 Estimation of m1 Equation (21.13) suggests for m1 the following estimate: mˆ 1 (L, N) =
Aˆ N (1) y¯1 (L) , for a sufficiently large L. Bˆ N (1)
(21.33)
Using Proposition 21.1 (Part 3), one gets comparing (21.13) and (21.33) the following result: Proposition 21.2. Consider the system described as well by (21.1) or (21.9), where F ∗ ∈ R(m1 ,m2 ,h1 ,h2 ) , Ba(s,h1 ,h2 ) , Sw(s,h1 ,h2 ) . Let it be excited by the signal (21.10) so that it can be described by (21.13). Then, the estimator (21.33) is consistent i.e. mˆ 1 (L, N) converges (w.p. 1) to m1 , as L, N → ∞ .
356
Y. Rochdi et al.
21.4.2 Estimation of (h1 , h2 ) 21.4.2.1
Input Signal Design
The (21.9) and (21.16) do share the same linear part, i.e. parametrisations system A q−1 , B∗ q−1 . But, they involve different nonlinearities, namely F ∗ (v) and ˜ F(v) respectively. On the other hand, using Figures 21.2a to 21.3b, it follows from the relation u∗ (t) = F ∗ (v(t)) that: ˜ u(t) ˜ = F(v(t))
(21.34)
with def F˜ = R(0,1,h1 ,h2 ) when F ∗ = R(m1 ,m2 ,h1 ,h2 ) ,
(21.35)
F˜ = Ba(s,h1 ,h2 ) when F ∗ = Ba(s,h1 ,h2 ) , and
(21.36)
F˜ = Sw(s,h1 ,h2 ) when F ∗ = Sw(s,h1 ,h2 ) .
(21.37)
def def
The system parametrisation (21.16), which proved to be useful for the identification of the linear subsystem, turns out to be also appropriate for the identification of the nonlinearity. Indeed, in the light of (21.35)-(21.36), it is seen that (21.16) involves less uncertain parameters than (21.9). In order to estimate the uncertain ˜ parameters of F(.), the system is now excited by a periodic input, denoted v3 (α ,t), defined by (Figure 21.4): v3 (α ,t) = hm − t Δ1 for t = 0, 1, . . . , α T and
(21.38)
v3 (α ,t) = −hm + (t − α T )Δ2 for t = α T + 1, . . ., T ,
(21.39)
def
def
2hm m where T > 0, 0 < α < 1, Δ1 = 2h α T , and Δ 2 = T −α T . The period T and the ratio α are arbitrary, but α T should be an integer. In presence of F = R(M1 ,M2 ,h1 ,h2 ) (switch- and backlash-relay types), we will focus on the two integers t1 , t2 ∈ (0 T ) where v3 (α ,t) is as close as possible to h2 and h1 , respectively (Fig 21.4). Let us introduce the following notations:
t1 = β T and t2 = γ T def
def
where (β , γ ) satisfy 0 < β < α < γ < 1. It follows from (21.38)-(21.39) that: hm − β T Δ1 = h2 − ε (α , T ) with 0 ≤ ε1 (α , T ) < Δ1 , −hm + (γ − α )T Δ2 = h1 + ε2 (α , T ) with 0 ≤ ε2 (α , T ) < Δ2 .
(21.40)
Similarly, in presence of F = Ba(S,h1 ,h2 ) , the focus will be made on t1 , t2 ∈ (0 T ) where v(t) = v3 (α ,t) is as close as possible to the particular values (hm − (h1 − h2)) and (−hm + (h1 − h2)) (see Fig 21.4). Let us introduce the following notations: t1 = μ T and t2 = η T def
def
21
Hammerstein System Identification
357
where μ and η verify 0 < μ < α < η < 1. It follows from (21.38)-(21.39) that: hm − μ T Δ1 = hm − (h1 − h2) − ε1 (α , T )
(21.41)
−hm + (η − α )T Δ2 = −hm + (h1 − h2) + ε2 (α , T ) with 0 ≤ ε1 (α , T ) < Δ1 and 0 ≤ ε2 (α , T ) < Δ2 . Equations (21.40)-(21.41) show that, for a given α , the residuals εi (α , T ), εi (α , T ) (i = 1, 2) vanish as T → ∞. On the other hand, it follows from (21.15) and Figures 21.2a to 21.3b that, for t > T , u(t) ˜ is in turn periodic, with period T . The transient behaviour, during the first period [0 T ), may be ignored letting T be the origin of time. Then, u(t) ˜ turns out to be defined in a period [0 T ) as follows (Figures 21.5 to 21.7): ⎧ 1 if β T ≤ t ≤ γ T def ⎪ ⎪ u(t) ˜ = u˜R (α ,t) = ⎪ ⎪ 0 otherwise ⎪ ⎪ ⎪ ∗ ⎪ for ⎪ ⎧ F = R(m1 ,m2 ,h1 ,h2 ) , ⎪ ⎪ ⎪ 0 if 0 < t ≤ μ T ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ t− μ T ⎨ ⎪ ⎪ if μ T < t ≤ α T def ⎪ α T −μ T ⎨ u(t) ˜ = u˜Ba (α ,t) = 1 if α T < t < η T ⎪ ⎪ ⎪ ⎩ − t−η T + 1 if η T < t ≤ T ⎪ ⎪ ⎪ T −η T ⎪ ⎪ ∗ = Ba ⎪ for F ⎪ (s,h1 ,h2 ) , ⎪ ⎪ ⎪ 1−s(h 1 −h2 ) ⎪ ⎪ def t + s (h1 − h2) if 0 ≤ t < α T ⎪ αT u(t) ˜ = u˜Sw (α ,t) = ⎪ ⎪ ⎪ 2shm Tt−−ααTT − 2shm if α T ≤ t < T ⎪ ⎪ ⎩ for F ∗ = Sw(s,h1 ,h2 ) . Let u¯˜R (α ), u¯˜Ba (α ), and u¯˜Sw (α ) be the averages of u˜R (α ,t), u˜Ba (α ,t), and u˜Sw (α ,t) over [0 T ). One gets from (21.42): (21.42) u˜¯R (α ) = (γ − β ) + ε (α , T ) for F ∗ = R(m1 ,m2 ,h1 ,h2 ) , ∗ u˜¯Ba (α ) = 0.5(α − μ ) + (η − α ) + 0.5(1 − η ) + ε (α , T) for F = Ba(s,h1 ,h2 ) , (21.43) ¯u˜Sw (α ) = 0.5α + 0.5α s (h1 − h2) + (1 − α )shm for F ∗ = Sw(s,h ,h ) . (21.44) 1
2
In (21.42)-(21.44), and in subsequent equations, ε (α , T ) denotes a generic error resulting from ε1 (α , T ), ε1 (α , T ), ε2 (α , T ), ε2 (α , T ), such that for any fixed 0 < α < 1: lim ε (α , T ) = 0 . (21.45) T →∞
Using (21.40)-(21.41) and the expressions of Δ1 and Δ2 , it follows from (21.42)(21.43) that: h1 1 (1 − α ) + ε (α , T ) for F ∗ = R(m1 ,m2 ,h1 ,h2 ) , u¯˜R (α ) = + 2 2hm 1 h1 − h2 (1 − 2α ) + ε (α , T ) for F ∗ = Ba(s,h1 ,h2 ) . u˜¯Ba (α ) = + 2 4hm
(21.46) (21.47)
358
Y. Rochdi et al.
Fig. 21.4: Shape of the signal v(t) = v3 (α ,t), defined by (21.38)-(21.39), over one period T
Furthermore, due to (21.8), (21.47) can be rewritten as follows: 1 (2hm + 1/s) u¯˜Ba (α ) = + (1 − 2α ) + ε (α , T ) . 2 4hm
(21.48)
In the sequel, estimates of u¯˜R (α ), u¯˜Ba (α ), u¯˜Sw (α ) will be needed. To determine these, let us perform averaging of both sides of (21.16) over [0 MT ), when v(t) = v3 (α ,t). Doing so, one gets: ¯˜ ¯˜ A(1)y(MT ) = B∗ (1)u(MT ) + ξ¯ (MT ) .
(21.49)
¯˜ In fact, u(MT ) coincides with the average value of u(t) ˜ on the period [0 T ) because, in the present case, u(t) ˜ is periodic with period T . Specifically, one has: ⎧ ⎨ u¯˜R for F ∗ = R(m1 ,m2 ,h1 ,h2 ) , ¯˜ u¯˜Ba for F ∗ = Ba(s,h1 ,h2 ) , u(MT )= (21.50) ⎩ ¯ u˜Sw for F ∗ = Sw(s,h1 ,h2 ) . Using (21.50) and the ergodicity of {ξ (t)}, it follows from (21.49) that: ¯˜ lim [B∗ (1)u¯˜R (α ) − A(1)y(MT )] = 0 for F ∗ = R(m1 ,m2 ,h1 ,h2 ) (w.p. 1) ,
(21.51)
¯˜ lim [B∗ (1)u¯˜Ba (α ) − A(1)y(MT )] = 0 for F ∗ = Ba(s,h1 ,h2 ) (w.p. 1) ,
(21.52)
¯˜ lim [B∗ (1)u¯˜Sw (α ) − A(1)y(MT )] = 0 for F ∗ = Sw(s,h1 ,h2 ) (w.p. 1) .
(21.53)
M→∞ M→∞ M→∞
21
Hammerstein System Identification
359
Fig. 21.5 Shapes of the sequences obtained over a period T for backlash-relay and switch-relay
21.4.2.2
Estimation of (h1 , h2 ) for Backlash- and Switch-Relay
To determine (h1 , h2 ), one needs two equations like (21.46). We then proceed with two experiments involving two inputs v3 (t, α1 ) and v3 (t, α2 ) with the same period T but different ratios α1 and α2 . The resulting internal signals are denoted u˜R1 and u˜R2 . In view of (21.46), these have the average values: h1 1 h2 (1 − α1) + α1 + ε (α1 , T ) , u¯˜R1 (α1 ) = + 2 2hm 2hm h1 1 h2 (1 − α2) + α2 + ε (α2 , T ) . u¯˜R2 (α2 ) = + 2 2hm 2hm
(21.54) (21.55)
Solving (21.54)-(21.55) for (h1 , h2 ) yields: 3
h1 h2
4
3 = 2hm
(1 − α1) α1 (1 − α2) α2
4−1 3
u˜¯R1 (α1 ) − 0.5 − ε (α1, T ) u¯˜R2 (α1 ) − 0.5 − ε (α2, T )
4 .
(21.56)
In order to use (21.56), one needs u¯˜R1 (α1 ), u¯˜R2 (α2 ). Using (21.50) and (21.51), one gets the estimates: Aˆ N (1) ¯ Aˆ N (1) ¯ uˆ¯˜R1 (M, N) = y˜31 (MT ) , u¯ˆ˜R2 (M, N) = y˜32 (MT ) , ˆ BN (1) Bˆ N (1)
(21.57)
360
Y. Rochdi et al.
Fig. 21.6 Shapes of the sequences obtained over a period T for backlash operator
where y¯˜31 (t), y¯˜32 (t) are the responses of (21.15)-(21.16) to the inputs v3 (t, α1 ) and v3 (t, α2 ), respectively. This, together with (21.56), suggests the following estimation algorithm for (h1 , h2 ): 9 3 4 3 4−1 8 ˆ hˆ 1 (M, N) u¯˜R1 (M, N) − 0.5 (1 − α1) α1 = 2hm . (21.58) (1 − α2) α2 hˆ 2 (M, N) uˆ¯˜R2 (M, N) − 0.5 Proposition 21.3. Consider the Hammerstein system described as well by (21.1) and (21.16), (21.34) with F˜ = R(0,1,h1 ,h2 ) . Let it be excited by two inputs v3 (α ,t) (with different values of α ) and let the Algorithm (21.57)-(21.58) be used to get the estimates hˆ 1 (M, N), hˆ 2 (M, N). Then, one has: ( ( ( ( lim sup (ˆh1 (M, N) − h1 ( ≤ e(T ) , lim sup (ˆh2 (M, N) − h2 ( ≤ e(T ) (w.p.1) M→∞ N→∞
M→∞ N→∞
with e(T ) an asymptotically vanishing generic function. That is, the larger T the better the estimates. Proof. Subtracting (21.56) from (21.58) gives:
21
Hammerstein System Identification
361
Fig. 21.7 Shapes of the sequences obtained over a period T for a switch operator
9 4−1 8 ˆ u¯˜R1 (M, N) − u¯˜R1 (α1 ) + ε (α1 , T ) . uˆ¯˜R2 (M, N) − u¯˜R2 (α1 ) + ε (α2 , T ) (21.59) On the other hand, applying (21.51) to the couples
3 4 3 hˆ 1 (M, N) − h1 (1 − α1) = 2h m (1 − α2) hˆ 2 (M, N) − h2
α1 α2
(u¯˜R1 (α1 ), y¯˜1 (MT )) and (u¯˜R2 (α2 ), y¯˜2 (MT )) yields: lim [B∗ (1)u¯˜R1 (α1 ) − A(1)y¯˜1(MT )]
M→∞
= lim [B∗ (1)u¯˜R2 (α2 ) − A(1)y¯˜2 (MT )] = 0 (w.p.1) . (21.60) M→∞
Using Proposition 21.1 (Part 3), it follows from (21.34) that: lim Bˆ N (1)u¯˜R1 (α1 ) − Aˆ N (1)y¯˜1 (MT )
M→∞ N→∞
362
Y. Rochdi et al.
= lim Bˆ N (1)u¯˜R2 (α2 ) − Aˆ N (1)y¯˜2 (MT ) = 0 (w.p.1) . (21.61) M→∞ N→∞
But, from (21.57) one has: Bˆ N (1)u¯ˆ˜R1 (M, N) − Aˆ N (1)y¯˜1 (MT ) = Bˆ N (1)u¯ˆ˜R2 (M, N) − Aˆ N (1)y¯˜2 (MT ) = 0 . (21.62) Comparing (21.61) and (21.62), it follows (using Part 3 of Proposition 21.1) that: " " # # lim uˆ¯˜R1 (M, N) − u¯˜R1 (α1 ) = lim uˆ¯˜R2 (M, N) − u¯˜R2 (α2 ) = 0 . (21.63) M→∞ N→∞
M→∞ N→∞
Proposition 21.3 follows from (21.45), (21.59), (21.63), using the fact that ε (α , T ) vanishes as T → ∞. 21.4.2.3
Estimation of (s, h1 , h2 ) for Backlash Operators
We now focus on the case of F ∗ = Ba(s,h1 ,h2 ) . Equations (21.48), (21.49) and (21.50) suggest for the parameter s the estimate: 3 4−1 1 1 s(M, ˆ N) = 4hm uˆ¯˜Ba (M, N) − − 2hm 2 1 − 2α ˆ def AN (1) ¯˜ uˆ¯˜Ba = y(MT ) Bˆ N (1)
(21.64) (21.65)
where y(t) ˜ is the response of (21.15)-(21.16) to the input v(t) = v3 (t, α ) and 0 < α < 1 is arbitrary but α = 0.5. Consequently, only one experiment is needed to estimate the parameter s. On the other hand, it is easily seen from (21.8) and Figure 21.4 1 that: h1 = hm − ms1 and h2 = −hm − 1+m s . This suggests for (h1 , h2 ) the following estimates: mˆ 1 (L, N) mˆ 1 (L, N) and hˆ 2 (L, M, N) = −hm − . hˆ 1 (L, M, N) = hm − s(M, ˆ N) s(M, ˆ N)
(21.66)
Proposition 21.4. Consider the Hammerstein system with backlash nonlinearity described by (21.1) or by (21.16), (21.38) with F˜ = Ba(s,h1 ,h2 ) . Let it be excited by v3 (α ,t) (with α = 0.5) and use Algorithm (21.64)-(21.66) to get the estimates s(M, ˆ N), hˆ 1 (L, M, N), hˆ 2 (L, M, N). Then, one has w.p.1: lim sup |s(M, ˆ N) − s| ≤ e(T ) ,
M→∞ N→∞
( ( lim sup (ˆh1 (L, M, N) − h1 ( ≤ e(T ) ,
L→∞ M→∞ N→∞
21
Hammerstein System Identification
363
( ( lim sup (ˆh2 (L, M, N) − h2 ( ≤ e(T ) ,
L→∞ M→∞ N→∞
with e(T ) a generic error that vanishes as T → ∞. That is, the larger is T the better the estimates. Proof. From (21.64) and (21.48) one gets, respectively: 1 −1 ¯ s = 4hm (u˜Ba − 0.5 − ε (α , T)) − 2hm , 1 − 2α 1 1 (s(M, ˆ N))−1 = 4hm uˆ¯˜Ba (M, N) − − 2hm . 2 1 − 2α
(21.67) (21.68)
Subtracting each side of (21.68) from the corresponding side of (21.67), yields: " # 1 s−1 − (s(M, ˆ N))−1 = 4hm u¯˜Ba (α ) − uˆ¯˜Ba (M, N) − ε (α , T ) . (21.69) 1 − 2α ˜ = On the other hand, applying (21.52) to the case where v(t) = v3 (α ,t) and u(t) u˜B (α ,t), one gets: ¯˜ lim [B∗ (1)u¯˜Ba (α ) − A(1)y(MT )] = 0 (w.p. 1) ,
M→∞
(21.70)
where y(t) ˜ denotes the response of (21.15)-(21.16) to the input v(t) = v3 (α ,t). Using Proposition 21.1 (Part 3), it follows from (21.70) that: ¯˜ lim Bˆ N (1)u¯˜Ba (α ) − Aˆ N (1)y(MT ) = 0 (w.p. 1) . (21.71) M,N→∞
¯˜ ) = 0. This, together with But, from (21.65) one has, Bˆ N (1)uˆ¯˜Ba (M, N) − Aˆ N (1)y(MT (21.71), implies: 6 7 lim u¯˜Ba (α ) − uˆ¯˜Ba (M, N) = 0 , (21.72) M→∞ N→∞
which together with (21.69) and (21.45) establishes Proposition 21.4 for s (using the fact that ε (α , T ) vanishes as T → ∞). One can similarly establishes the convergence results concerning h1 , h2 . 21.4.2.4
Estimation of (s, h1 , h2 ) for Switch Elements
We now focus on the case of F ∗ = Sw(s,h1 ,h2 ) . An estimation algorithm can be designed following the approach of Subsection 21.4.2.2. This consists in performing two experiments involving two input signals v3 (α1 ,t) and v3 (α2 ,t), defined by (21.38)-(21.39), with the same period T (but α1 = α2 ). The resulting internal signals are denoted u˜Sw1 (t), u˜Sw2 (t) and, in view of (21.44), have the average values:
364
Y. Rochdi et al.
α1 h1 − h2 + α1 s + (1 − α1)shm , u¯˜Sw1 (α1 ) = 2 2 h1 − h2 α2 + α2 s + (1 − α2)shm . u¯˜Sw1 (α2 ) = 2 2
(21.73) (21.74)
These constitute the main ingredients to get estimates for s, h1 , h2 , just as we did in Subsection 21.4.2.2.
21.4.3 Simulation Consider the Hammerstein system with switch-relay of Subsection 21.3.5. Applying algorithm (21.33) (with N = 1000) yields mˆ 1 = 0.336 (compare with the true value m1 = −1/3). Algorithm (21.57)-(21.58) is then applied with the parameters T = 120, L = 1000, M = 1000, α1 = 0.5 and α2 = 0.7. The estimates obtained for (h1 , h2 ) are: hˆ 1 = 2.03, hˆ 2 = −1.006. Compare these with the corresponding true values, namely h1 = 2, h2 = −1. Consider the Hammerstein system with backlash nonlinearity defined in Subsection 21.3.5. Applying algorithm (21.33) (with N = 1000) yields mˆ 1 = −0.404 (compare with mˆ 1 = −2/5). Then, algorithm (21.64)-(21.66) is applied with the parameters T = 120, L = 1000, M = 100, α = 0.7. The estimates thus obtained for (s, h1 , h2 ) are: sˆ = −0.385, hˆ 1 = 0.953, hˆ 2 = −0.455. Compare these with the true values, namely s = −2/5, h1 = 1, h2 = −0.5. The above simulation results have been confirmed by running the proposed identification approach several times with different realisations of the noise ξ (t).
21.5 Conclusion Hammerstein system identification has been addressed in presence of backlash, switch, backlash-relay and switch-relay nonlinearities. The estimation of the linear subsystem and the determination of the nonlinear element are performed separately. One key step in the identification scheme design is the construction of the system parametrisation (21.16) that presents three crucial properties: (i) the unknown linear parameters come in linearly; (ii) the involved internal signal (u(t)) ˜ is known as ˜ involves less unlong as v(t) ∈ {hm , +hm }; (iii) the corresponding nonlinearity F(.) certain parameters than the initial element F(.) in (21.1). The second main feature of the identification scheme is the design of persistently exciting input signals that made it possible to achieve formal consistency results for the involved estimators.
References 1. Bai, E.W.: Identification of linear systems with hard input nonlinearities of known structure. Automatica 38, 853–860 (2002) 2. Cerone, V., Regruto, D.: Bounding the parameters of linear systems with input backlash. IEEE Transactions on Automatic Control 52, 531–536 (2007)
21
Hammerstein System Identification
365
3. Chaoui, F.Z., Giri, F., Rochdi, Y., Haloua, M., Naitali: A System identification based on Hammerstein model. International Journal of Control 78(6), 430–442 (2005) 4. Giri, F., Chaoui, F.Z., Rochdi, Y.: Parameter identification of a class of Hammerstein plants. Automatica 37, 749–756 (2001) 5. Giri, F., Chaoui, F.Z., Rochdi, Y.: Interval excitation through impulse sequences. A technical lemma. Automatica 38, 457–465 (2002) 6. Giri, F., Chaoui, F.Z., Rochdi, Y.: Recursive identification of systems with hard input nonlinearities of known structure. In: IEEE American Control Conference, Boston, Massachusetts, USA, pp. 4764–4769 (2004) 7. Giri, F., Rochdi, Y., Chaoui, F.Z.: Identification of Hammerstein systems in presence of Hysteresis-Backlash and Hysteresis-Relay nonlinearities. Automatica 44, 767–775 (2008) 8. V¨or¨os, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33, 1141–1146 (1997)
Chapter 22
Bounded Error Identification of Hammerstein Systems with Backlash Vito Cerone, Dario Piga, and Diego Regruto
22.1 Introduction Actuators and sensors commonly used in control systems may exhibit a variety of nonlinear behaviours that may be responsible for undesirable phenomena such as delays and oscillations, which may severely limit both the static and the dynamic performance of the system under control (see, e.g., [22]). In particular, one of the most relevant nonlinearities affecting the performance of industrial machines is the backlash (see Figure 22.1), which commonly occurs in mechanical, hydraulic and magnetic components like bearings, gears and impact dampers (see, e.g., [17]). This nonlinearity, which can be classified as dynamic (i.e., with memory) and hard (i.e. non-differentiable), may arise from unavoidable manufacturing tolerances or sometimes may be deliberately incorporated into the system in order to describe lubrication and thermal expansion effects [3]. The interested reader is referred to [22] for real-life examples of systems with either input or output backlash nonlinearities. In order to cope with the limitations caused by the presence of backlash, either robust or adaptive control techniques can be successfully employed (see, e.g., [7], and [21] respectively), which, on the other hand, require the characterisation of the nonlinear dynamic block. Few contributions can be found in literature on the idenVito Cerone Dipartimento di Automatica e Informatica, Politecnico di Torino, corso Duca degli Abruzzi 24, 10129 Torino, Italy e-mail:
[email protected] Dario Piga Dipartimento di Automatica e Informatica, Politecnico di Torino, corso Duca degli Abruzzi 24, 10129 Torino, Italy e-mail:
[email protected] Diego Regruto Dipartimento di Automatica e Informatica, Politecnico di Torino, corso Duca degli Abruzzi 24, 10129 Torino, Italy e-mail:
[email protected] F. Giri & E.-W. Bai (Eds.): Block-oriented Nonlinear System Identification, LNCIS 404, pp. 367–382. c Springer-Verlag Berlin Heidelberg 2010 springerlink.com
368
V. Cerone, D. Piga, and D. Regruto
tification of systems with input backlash. An input-holding scheme is exploited in [20] to compute the system parameters estimates by solving least squares problems, while a separable least squares approach is discussed in [1]. In [8] a consistent estimator, based on careful selection of the system parametrisation and the input signal, is presented, whereas an iterative algorithm relying on a newly proposed model for the backlash nonlinearity is discussed in [23]. Although the most common assumption in system identification is that measurement errors are statistically described, a worthwhile alternative is the bounded-errors or set-membership characterisation, where uncertainties are assumed to belong to a given set. In this context all parameters consistent with measurements, error bounds and the assumed model structure are feasible solutions of the identification problem. The interested reader can refer to survey papers [15, 24] and book [14] for a thorough presentation of the main theoretical basis. In this chapter, the procedure for the identification of linear systems with input backlash presented in [6] is reviewed and improved. More specifically, the problem of bounding the parameters of a stable, single-input single-output (SISO) discrete time linear system with unknown input backlash (see Figure 22.2) in the presence of bounded output measurement error is considered, under the common assumption that the inner signal x(t) is not supposed to be measurable. The chapter is organised as follows. Section 22.2 is devoted to the problem formulation. In Section 22.3, parameters of the nonlinear block are tightly bounded using input-output data from the steady-state response of the system to a collection of square wave inputs. Then, in Section 22.4, through a dynamic experiment, for all ut belonging to a suitable pseudo random binary signal (PRBS) sequence {ut }, we compute tight bounds on the inner signal, which are used to bound the parameters of the linear part together with noisy output measurements. Recently proposed relaxation techniques based on linear matrix inequalities (LMIs) are exploited in the identification of the linear block parameters, providing a significant improvement over the algorithm proposed in [6]. A simulated example is reported in Section 22.5.
22.2 Problem Formulation Let us consider the Hammerstein system depicted in Figure 22.2, where the nonlinear block that transforms the input signal ut into the unmeasurable inner variable xt is a backlash described by (see, e.g., [22]) ⎧ ⎪ ⎨ml (ut + cl ) for ut ≤ zl , xt = mr (ut − cr ) for ut ≥ zr , (22.1) ⎪ ⎩ xt−1 for zl < ut < zr , where ml > 0, mr > 0, cl > 0, cr > 0 are constant parameters characterising the backlash and . xt−1 . xt−1 zl = − cl , zr = + cr , (22.2) ml mr
22
Bounded Error Identification of Hammerstein Systems with Backlash xt
6
mr
ml
−cl
369
cr
ut
Fig. 22.1: Backlash characteristic
are the u-axis values of the intersections between the two lines with slopes ml and mr and the horizontal inner segment containing xt−1 . The backlash characteristic is depicted in Figure 22.1. The block that maps xt into the noise-free output wt is a discrete-time linear dynamic SISO system defined by wt = G (q−1 )xt =
B(q−1 ) xt , A (q−1 )
(22.3)
where A (q−1 ) = 1 + a1 q−1 + . . . + ana q−na and B(q−1 ) = b0 + b1 q−1 + . . . + bnb q−nb are polynomials in the backward shift operator q−1 , (q−1wt = wt−1 ). Furthermore, the following common assumptions are made: A1 A2 A3
the linear system is asymptotically stable (see, e.g., [10, 11, 12, 19, 20]); the steady-state gain of the linear block is not equal to zero (see, e.g., [11, 12, 20]); a rough upper bound of the settling time is available (see, e.g., [9]).
As ordinarily assumed in block-oriented nonlinear system identification, the inner signal x_t is assumed not to be measurable. Therefore, identification of the Hammerstein system described by (22.1)-(22.3) relies only on input-output data. Here, we assume that the input signal u_t is exactly known, while the measurements y_t of the output w_t are corrupted by bounded additive noise according to

y_t = w_t + η_t,   (22.4)

where

|η_t| ≤ Δη_t.   (22.5)
Let γ ∈ R^4 and θ ∈ R^p be the unknown parameter vectors to be estimated, defined as

γ^T = [γ_1 γ_2 γ_3 γ_4] = [m_l c_l m_r c_r],   (22.6)

θ^T = [a_1 ... a_{na} b_0 b_1 ... b_{nb}],   (22.7)

where na + nb + 1 = p. It is worth noting that the parametrisation of the structure depicted in Figure 22.2 is not unique. In fact, given the pair of subsystems G̃(q^{−1}), Ñ(u_t, γ̃), any Hammerstein system (see Figure 22.2) with G(q^{−1}) = α^{−1} G̃(q^{−1}) and N(u_t, γ) = Ñ(u_t, α γ̃) provides the same input-output behaviour for any nonzero and finite constant α ∈ R. In order to obtain a unique parametrisation, in this work we assume that the steady-state gain g of the linear block G(q^{−1}) is one, that is:

g = (∑_{j=0}^{nb} b_j) / (1 + ∑_{i=1}^{na} a_i) = 1.   (22.8)
In the next sections we describe a two-stage procedure for deriving lower and upper bounds on the parameters γ and θ, consistent with the assumed model structure, the given measurements and the uncertainty bounds.

Fig. 22.2: Hammerstein system with backlash (input u_t, backlash N(·), inner signal x_t, linear block B(q^{−1})/A(q^{−1}), noise-free output w_t, additive noise η_t, measured output y_t)
22.3 Assessment of Tight Bounds on the Nonlinear Static Block Parameters

In this section we describe the first step of the proposed identification procedure, where steady-state operating conditions are exploited to bound the parameters of the backlash. We apply to the system a set of square wave inputs with M different amplitudes and collect 2M steady-state values of the noisy output. More precisely, for each value of the input square wave amplitude, one steady-state output sample is collected on the positive half-wave of the input and one steady-state output measurement is collected on the negative half-wave. Because the backlash deadzone is unknown, the input amplitudes must be chosen large enough to guarantee that the output shows a nonzero response. By combining Eqs. (22.1), (22.3), (22.4) and (22.8) under assumptions A1, A2 and A3 stated in Section 22.2, we obtain the following input-output description of the system in Figure 22.2 in steady-state operating conditions:
w̄_i = m_r(ū_i − c_r)  for ū_i ≥ w̄_{i−1}/m_r + c_r,   ȳ_i = w̄_i + η̄_i,   i = 1, ..., M;   (22.9)

w̄_j = m_l(ū_j + c_l)  for ū_j ≤ w̄_{j−1}/m_l − c_l,   ȳ_j = w̄_j + η̄_j,   j = 1, ..., M;   (22.10)
where the triplets {ū_i, ȳ_i, η̄_i} and {ū_j, ȳ_j, η̄_j} are collections of steady-state values of the known input signal, output observation and measurement error, taken during the positive and the negative half-waves respectively. As can be noted, Eqs. (22.9) and (22.10) depend only on the backlash parameters; thus the identification of γ can be carried out leaving aside the dynamics of the linear block. A block diagram description of Equation (22.10) is depicted in Figure 22.3; an analogous schematic representation also holds for Equation (22.9). Because (22.9) depends only on the right-side backlash parameters m_r and c_r, while (22.10) involves only m_l and c_l, the overall feasible parameter region of the backlash can be written as the Cartesian product of two sets, that is

D_γ = D_γ^r × D_γ^l,   (22.11)

where

D_γ^r = {(m_r, c_r) ∈ R^2_+ : ȳ_i = m_r(ū_i − c_r) + η̄_i, |η̄_i| ≤ Δη̄_i; i = 1, ..., M},   (22.12)

D_γ^l = {(m_l, c_l) ∈ R^2_+ : ȳ_j = m_l(ū_j + c_l) + η̄_j, |η̄_j| ≤ Δη̄_j; j = 1, ..., M},   (22.13)

and {Δη̄_i} and {Δη̄_j} are the sequences of bounds on the measurement uncertainty.
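Checking whether a candidate pair (m_l, c_l) belongs to D_γ^l amounts to testing the M inequalities implicit in (22.13). A minimal sketch, with hypothetical variable names:

import numpy as np

def in_Dgamma_l(ml, cl, u_bar, y_bar, dy_bar):
    """True iff (ml, cl) is consistent with every steady-state measurement
    on the left branch, i.e. belongs to D_gamma^l of (22.13)."""
    u_bar, y_bar, dy_bar = map(np.asarray, (u_bar, y_bar, dy_bar))
    return ml > 0 and cl > 0 and np.all(np.abs(y_bar - ml * (u_bar + cl)) <= dy_bar)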
Fig. 22.3: Hammerstein system with backlash in steady-state operating conditions (input ū_j, nonlinearity N(·), steady-state output w̄_j, noise η̄_j, measurement ȳ_j)
Remark 22.1. D_γ^r and D_γ^l are 2-dimensional disjoint sets lying on the (m_r, c_r)-plane and the (m_l, c_l)-plane respectively, which means that they can be handled separately. It is worth noting that D_γ^r and D_γ^l are 2-dimensional sets enjoying the same topological features and the same mathematical properties. Therefore, the results derived in the rest of the chapter for one of the two sets, say D_γ^l, also hold for the other set (D_γ^r).

Remark 22.2. Note that D_γ^r and D_γ^l are bounded sets as long as at least two measurements with different input amplitudes are collected.
An exact description of the feasible parameter set D_γ^l in terms of edges and vertices is presented below, together with an orthotopic outer-bounding set providing tight parameter uncertainty intervals. Introductory definitions and preliminary results are first given.

22.3.1 Definitions and Preliminary Results

Definition 22.1. Let h_l^+(ū_j) and h_l^−(ū_j) be the constraint boundaries defining the FPS D_γ^l corresponding to the j-th set of data:

h_l^+(ū_j) = {(m_l, c_l) ∈ R^2_+ : ȳ_j + Δη̄_j = m_l(ū_j + c_l)},   (22.14)

h_l^−(ū_j) = {(m_l, c_l) ∈ R^2_+ : ȳ_j − Δη̄_j = m_l(ū_j + c_l)}.   (22.15)
Definition 22.2. The boundary of D_γ^l is denoted ∂D_γ^l.

Definition 22.3. The constraint boundaries h_l^+(ū_j) and h_l^−(ū_j) are said to be active if their intersection with ∂D_γ^l is not the empty set:

h_l^+(ū_j) ∩ ∂D_γ^l ≠ ∅ ⟺ h_l^+(ū_j) is active,   (22.16)

h_l^−(ū_j) ∩ ∂D_γ^l ≠ ∅ ⟺ h_l^−(ū_j) is active.   (22.17)
Remark 22.3. The constraint boundaries h_l^+(ū_j) and h_l^−(ū_j) may either intersect ∂D_γ^l or be external to D_γ^l.
Definition 22.4 (Edges of D_γ^l).

h̃_l^+(ū_j) = h_l^+(ū_j) ∩ D_γ^l = {(m_l, c_l) ∈ D_γ^l : ȳ_j + Δη̄_j = m_l(ū_j + c_l)},   (22.18)

h̃_l^−(ū_j) = h_l^−(ū_j) ∩ D_γ^l = {(m_l, c_l) ∈ D_γ^l : ȳ_j − Δη̄_j = m_l(ū_j + c_l)}.   (22.19)
Definition 22.5 (Constraint intersections). The set of all pairs (m_l, c_l) ∈ R^2_+ where intersections among the constraints occur is

I_γ^l = {(m_l, c_l) ∈ R^2_+ : {h_l^+(ū_ρ), h_l^−(ū_ρ)} ∩ {h_l^+(ū_σ), h_l^−(ū_σ)} ≠ ∅; ρ, σ = 1, ..., M; ρ ≠ σ}.   (22.20)

Definition 22.6 (Vertices of D_γ^l). The set of all vertices of D_γ^l is defined as the set of all intersection pairs belonging to the feasible parameter set D_γ^l:

V(D_γ^l) = I_γ^l ∩ D_γ^l.   (22.21)
22.3.2 Exact Description of D_γ^l

An exact description of D_γ^l can be given in terms of edges, each one being described, from a practical point of view, as a subset of an active constraint lying between two vertices. An effective procedure for deriving the active constraints, vertices and edges of D_γ^l is reported in the Appendix.
22.3.3 Tight Orthotope Description of D_γ^l

Unfortunately, the exact description of D_γ^l provided by the edges may not be easy to handle. A somewhat more practical, although approximate, description can be obtained by computing the following tight orthotope outer-bounding set P_γ^l containing D_γ^l:

P_γ^l = {γ ∈ R^2_+ : γ_j = γ_j^c + δγ_j, |δγ_j| ≤ Δγ_j, j = 1, 2},   (22.22)

where

γ_j^c = (γ_j^min + γ_j^max)/2,   Δγ_j = (γ_j^max − γ_j^min)/2,   (22.23)

γ_j^min = min_{γ ∈ D_γ^l} γ_j,   γ_j^max = max_{γ ∈ D_γ^l} γ_j.   (22.24)
Because the constraints defining D_γ^l are nonconvex in m_l and c_l, standard nonlinear optimisation tools (gradient method, Newton method, etc.) cannot be used to solve problems (22.24), since they can be trapped in local minima that may be arbitrarily far from the global one. Thus, parameter uncertainty intervals obtained using these tools are not guaranteed to contain the true unknown parameters, which is a key requirement of any bounded-error identification method. Globally optimal solutions to problems (22.24) can be computed thanks to the result reported below.

Proposition 22.1. The globally optimal solutions to problems (22.24) occur on the vertices of D_γ^l.

Proof. First, (i) we notice that each level curve of the functionals in (22.24) — lines parallel to the m_l-axis and the c_l-axis respectively — intersects the constraint boundaries (22.14) and (22.15) only once. Next, (ii) the objective functions in (22.24) are monotone on D_γ^l, which implies that the optimal solution lies on the boundary of D_γ^l. Thanks to (i), the optimal value cannot lie on an edge strictly between two vertices: if it did, the level curve through the optimum would have to intersect that edge twice, contradicting (i). Hence the globally optimal solutions of problems (22.24) can only occur on the vertices of D_γ^l.
Remark 22.4. Given the set of vertices V(D_γ^l) computed via Algorithm 22.1 reported in the Appendix, the evaluation of (22.24) is an easy task, because it only requires (a) the computation of the objective functions on a set of at most 4M points and (b) the maximum (minimum) over a set of real-valued elements.
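A sketch of this evaluation, assuming the vertex set has already been computed (e.g., by Algorithm 22.1) and is available as an array of (m_l, c_l) pairs:

import numpy as np

def orthotope_bounds(vertices):
    """Tight orthotope (22.22)-(22.24) from the vertices of D_gamma^l,
    given as an (n, 2) array of (ml, cl) pairs; by Proposition 22.1 the
    coordinate-wise extrema are attained at vertices."""
    v = np.asarray(vertices, dtype=float)
    g_min, g_max = v.min(axis=0), v.max(axis=0)
    return (g_min + g_max) / 2.0, (g_max - g_min) / 2.0  # centres, half-widths

The same function applies unchanged to the (m_r, c_r) vertices of D_γ^r.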
22.4 Bounding the Parameters of the Linear Dynamic Model

In the second stage of the presented procedure, bounds on the parameters of the linear dynamic part are computed using a PRBS input {u_t} taking values ±u̲, with u̲ > 0. Thanks to its properties, this kind of input sequence has been successfully used to identify linear dynamic systems (see, e.g., [13, 18]), while, in general, it is inappropriate for the identification of nonlinear systems (see, e.g., [1, 16]). However, as shown in [2], a PRBS input can be effectively employed to decouple the linear and the nonlinear parts in the identification of Hammerstein models with a static nonlinearity. In this chapter we show that the use of a PRBS sequence is also profitable for the identification of a linear system with input backlash. The key idea underlying the choice of the input sequence {u_t} is based on the following result.

Result 1. Consider a PRBS input {u_t} whose levels are ±u̲. If u̲ > c_r and −u̲ < −c_l, then the output sequence {x_t} of the backlash described by (22.1) is still a PRBS, with levels x̄ = m_r(u̲ − c_r) and x̲ = m_l(−u̲ + c_l).

Proof. The proof of Result 1 follows from the backlash mathematical model (22.1), setting u_t = ±u̲ with u̲ > c_r and −u̲ < −c_l.

From Result 1, it can be noted that the choice of suitable PRBS input levels ±u̲ depends on the unknown parameters c_r and c_l, which are bounded in the first stage of the presented procedure. Therefore, in order to satisfy the hypotheses of Result 1, values of u̲ such that u̲ > c_r^max and −u̲ < −c_l^max are chosen. Given the exact descriptions of D_γ^r and D_γ^l, tight bounds on the amplitudes x̄ and x̲ of the unmeasurable inner signal x_t can be defined as

x̄^min = min_{(m_r, c_r) ∈ D_γ^r} m_r(u̲ − c_r),   x̄^max = max_{(m_r, c_r) ∈ D_γ^r} m_r(u̲ − c_r),   for u̲ ≥ c_r^max,   (22.25)

x̲^min = min_{(m_l, c_l) ∈ D_γ^l} m_l(−u̲ + c_l),   x̲^max = max_{(m_l, c_l) ∈ D_γ^l} m_l(−u̲ + c_l),   for −u̲ ≤ −c_l^max.   (22.26)
Computation of the bounds in (22.25) and (22.26) requires, at least in principle, the solution of two nonconvex optimisation problems with two variables and 4M nonlinear inequality constraints. Thanks to Proposition 22.2 reported below, the globally optimal solution is guaranteed to be achieved.
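Result 1 itself can be checked numerically. The sketch below drives the backlash of the Section 22.5 example with a two-level signal standing in for a PRBS (the levels, seed and signal length are illustrative assumptions) and verifies that the inner signal takes exactly two values:

import numpy as np

rng = np.random.default_rng(0)
ml, cl, mr, cr = 0.25, 0.0628, 0.26, 0.0489    # backlash of the Section 22.5 example
u_amp = 0.10                                    # exceeds both c_l and c_r, as Result 1 requires
u = u_amp * rng.choice([-1.0, 1.0], size=1000)  # binary signal standing in for a PRBS

x, x_prev = np.empty(1000), 0.0
for t, ut in enumerate(u):
    if ut <= x_prev / ml - cl:                  # left branch of (22.1)
        x_prev = ml * (ut + cl)
    elif ut >= x_prev / mr + cr:                # right branch of (22.1)
        x_prev = mr * (ut - cr)
    x[t] = x_prev                               # deadzone: hold previous value

print(np.unique(np.round(x, 12)))               # two levels: ml*(-u_amp+cl), mr*(u_amp-cr)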
Definition 22.7. Let us define the x-level curve of the objective function of problem (22.25) as

g_r(u̲, x) = {(m_r, c_r) ∈ R^2_+ : x = m_r(u̲ − c_r)},   (22.27)

and the x-level curve of the objective function of problem (22.26) as

g_l(u̲, x) = {(m_l, c_l) ∈ R^2_+ : x = m_l(−u̲ + c_l)}.   (22.28)
Proposition 22.2. The globally optimal solutions to problems (22.25) and (22.26) occur on the vertices of D_γ^r and D_γ^l, respectively.

Proof. First, (i) we notice that each x-level curve g_l(u̲, x) intersects each constraint boundary in (22.14) and (22.15) only once. Next, (ii) the objective function x = m_l(−u̲ + c_l) is monotone on D_γ^l, which implies that the optimal solution lies on the boundary of D_γ^l. Thanks to (i), the optimal value cannot lie on an edge strictly between two vertices: if it did, the level curve through the optimum would have to intersect that edge twice, contradicting (i). Hence the globally optimal solutions to problems (22.26) can only occur on the vertices of D_γ^l. Similar considerations apply to the right side of the backlash.

By defining the central estimate x̄_t^c of x̄_t and the uncertainty bound Δx̄_t as

x̄_t^c = (x̄^min + x̄^max)/2,   Δx̄_t = (x̄^max − x̄^min)/2,   (22.29)

as well as the central estimate x̲_t^c and the uncertainty bound Δx̲_t of x̲_t as

x̲_t^c = (x̲^min + x̲^max)/2,   Δx̲_t = (x̲^max − x̲^min)/2,   (22.30)

the following relation can be established between the unknown inner signal x_t and the corresponding central value x_t^c:

x_t^c = x_t + δx_t,   |δx_t| ≤ Δx_t,   (22.31)

where

x_t^c = x̄_t^c,   Δx_t = Δx̄_t   if u_t = u̲,   (22.32)

x_t^c = x̲_t^c,   Δx_t = Δx̲_t   if u_t = −u̲.   (22.33)
Given the uncertain inner sequence {x_t^c} and the noise-corrupted output sequence {y_t}, the problem of evaluating parameter bounds for the linear system can be formulated in the framework of bounded errors-in-variables (EIV), as shown in Figure 22.4, i.e. the identification of linear dynamic models where both the input and the output are affected by bounded uncertainties. The exact description of the feasible parameter region D_θ for the linear system, i.e. the set of all linear model parameters θ consistent with the assumed model structure, the input and output signals x_t and y_t and the error bounds Δx_t and Δη_t, is
D_θ = {θ ∈ R^p : A(q^{−1})(y_t − η_t) = B(q^{−1})(x_t^c − δx_t); g = 1; |η_t| ≤ Δη_t; |δx_t| ≤ Δx_t; t = 1, ..., N},   (22.34)
where N is the length of the data sequence and g = 1 accounts for condition (22.8) on the steady-state gain. The parameter uncertainty intervals are defined as

PUI_j = [θ̲_j, θ̄_j],   (22.35)

where

θ̲_j = min_{θ ∈ D_θ} θ_j,   (22.36)

θ̄_j = max_{θ ∈ D_θ} θ_j,   (22.37)
and they can be computed by finding the globally optimal solutions to the constrained optimisation problems (22.36) and (22.37). Because D_θ is a nonconvex set defined by nonlinear inequalities in the variables θ, η_t and δx_t, local numerical optimisation tools cannot be employed to solve problems (22.36) and (22.37), because they can be trapped in local minima/maxima, which may prevent the computed uncertainty intervals from containing the true parameter θ_j. One possible way to overcome this problem is to relax (22.36) and (22.37) to convex problems, obtaining a lower (upper) bound on θ̲_j (θ̄_j). In [6], the technique presented in [4], which provides a polytopic outer approximation of the FPS D_θ, is used to derive relaxed parameter uncertainty intervals through the solution of linear programming problems. In this section, we exploit the algorithm for the computation of the PUIs (22.35) presented in [5], which is based on the approximation of the original optimisation problems (22.36) and (22.37) by a hierarchy of convex LMI relaxations. Relaxed parameter uncertainty intervals obtained through the application of such a technique are guaranteed to be less conservative than those computed in [6] and to contain the true unknown parameter θ_j. Moreover, the computed relaxed bounds are guaranteed to converge monotonically to the tight ones defined in (22.36) and (22.37) as the number of successive LMI relaxations, i.e. the relaxation order δ, increases (see [5] for details).
Fig. 22.4: Errors-in-variables basic setup for the linear dynamic system (linear block B(q^{−1})/A(q^{−1}) driven by the inner signal x_t; the available input x_t^c = x_t + δx_t and the measured output y_t = w_t + η_t are both affected by bounded uncertainty)
22.5 A Simulated Example

In this section we illustrate the presented parameter bounding procedure through a numerical example. The simulated system is characterised by a linear block with A(q^{−1}) = 1 − 0.76q^{−1} + 0.82q^{−2}, B(q^{−1}) = 2.15q^{−1} − 1.09q^{−2} and a nonsymmetric backlash with m_l = 0.25, m_r = 0.26, c_l = 0.0628, c_r = 0.0489. Thus, the true parameter vectors are γ = [m_l c_l m_r c_r]^T = [0.25 0.0628 0.26 0.0489]^T and θ = [a_1 a_2 b_1 b_2]^T = [−0.76 0.82 2.15 −1.09]^T. It must be pointed out that the backlash parameters are realistically chosen. In fact, we consider the parameters of a real-world precision gearbox, which features a gear ratio equal to 0.25 and a deadzone as large as 0.0524 rad (≈ 3°), and simulate a possible fictitious nonsymmetric backlash with gear ratios m_l = 0.25, m_r = 0.26 and deadzone c_l = 0.0628 (≈ 3.6°), c_r = 0.0489 (≈ 2.8°). Bounded absolute output errors are considered when simulating the collection of both the steady-state data {ū_s, ȳ_s} and the transient sequence {u_t, y_t}. The uncertainties η̄_s and η_t are random sequences belonging to the uniform distributions U[−Δη̄_s, +Δη̄_s] and U[−Δη_t, +Δη_t], respectively.

Table 22.1: Nonlinear block parameters evaluation: central estimates (γ_j^c), parameter bounds (γ_j^min, γ_j^max) and parameter uncertainty bounds Δγ_j
Δη     SNR (dB)  γ_j   True value  γ_j^min  γ_j^c   γ_j^max  Δγ_j
0.005  54        m_l   0.2500      0.2500   0.2500  0.2500   0.0000
                 c_l   0.0628      0.0624   0.0628  0.0633   0.0005
                 m_r   0.2600      0.2600   0.2600  0.2600   0.0000
                 c_r   0.0489      0.0484   0.0489  0.0493   0.0004
0.02   42        m_l   0.2500      0.2496   0.2499  0.2503   0.0003
                 c_l   0.0628      0.0512   0.0602  0.0693   0.0090
                 m_r   0.2600      0.2596   0.2599  0.2603   0.0003
                 c_r   0.0489      0.0377   0.0464  0.0550   0.0087
0.05   34        m_l   0.2500      0.2493   0.2501  0.2509   0.0008
                 c_l   0.0628      0.0576   0.0649  0.0722   0.0073
                 m_r   0.2600      0.2593   0.2601  0.2609   0.0008
                 c_r   0.0489      0.0438   0.0509  0.0579   0.0070
0.15   25        m_l   0.2500      0.2488   0.2495  0.2503   0.0007
                 c_l   0.0628      0.0504   0.0661  0.0818   0.0157
                 m_r   0.2600      0.2588   0.2595  0.2603   0.0007
                 c_r   0.0489      0.0369   0.0520  0.0671   0.0151
0.2    22        m_l   0.2500      0.2493   0.2503  0.2512   0.0009
                 c_l   0.0628      0.0444   0.0626  0.0809   0.0182
                 m_r   0.2600      0.2593   0.2603  0.2612   0.0009
                 c_r   0.0489      0.0311   0.0487  0.0662   0.0175
0.3    18        m_l   0.2500      0.2490   0.2504  0.2518   0.0014
                 c_l   0.0628      0.0352   0.0625  0.0898   0.0273
                 m_r   0.2600      0.2590   0.2604  0.2618   0.0014
                 c_r   0.0489      0.0223   0.0486  0.0749   0.0263
Table 22.2: Linear block parameters evaluation: central estimates (θ_j^c), parameter bounds (θ_j^min, θ_j^max) and parameter uncertainty bounds Δθ_j for N = 100
Δη     SNR (dB)  θ_j  True value  θ_j^min  θ_j^c    θ_j^max  Δθ_j
0.005  50        a_1  -0.7600     -0.7605  -0.7600  -0.7596  0.0004
                 a_2   0.8200      0.8196   0.8201   0.8205  0.0004
                 b_1   2.1500      2.1470   2.1503   2.1535  0.0033
                 b_2  -1.0900     -1.0943  -1.0906  -1.0860  0.0037
0.02   37        a_1  -0.7600     -0.7666  -0.7594  -0.7522  0.0072
                 a_2   0.8200      0.8152   0.8205   0.8259  0.0054
                 b_1   2.1500      2.1187   2.1506   2.1825  0.0319
                 b_2  -1.0900     -1.1283  -1.0931  -1.0580  0.0352
0.05   29        a_1  -0.7600     -0.7664  -0.7601  -0.7539  0.0063
                 a_2   0.8200      0.8142   0.8191   0.8240  0.0049
                 b_1   2.1500      2.1201   2.1581   2.1960  0.0380
                 b_2  -1.0900     -1.1461  -1.1005  -1.0550  0.0455
0.15   19        a_1  -0.7600     -0.7703  -0.7600  -0.7497  0.0103
                 a_2   0.8200      0.8050   0.8206   0.8353  0.0147
                 b_1   2.1500      2.0725   2.1582   2.2439  0.0857
                 b_2  -1.0900     -1.1953  -1.0985  -1.0017  0.0968
0.2    17        a_1  -0.7600     -0.7814  -0.7610  -0.7405  0.0204
                 a_2   0.8200      0.8015   0.8224   0.8433  0.0209
                 b_1   2.1500      2.0324   2.1680   2.3036  0.1356
                 b_2  -1.0900     -1.2712  -1.1176  -0.9639  0.1537
0.3    14        a_1  -0.7600     -0.7933  -0.7606  -0.7278  0.0328
                 a_2   0.8200      0.7914   0.8240   0.8567  0.0327
                 b_1   2.1500      1.9652   2.1679   2.3707  0.2027
                 b_2  -1.0900     -1.3676  -1.1346  -0.9016  0.2330
Bounds on the steady-state and transient output measurement errors are supposed to have the same value, i.e., Δη̄_s = Δη_t = Δη. The numerical example is performed for six different values of Δη. From the simulated steady-state data {w̄_s, η̄_s} and the transient sequence {w_t, η_t}, the signal-to-noise ratios S̄NR (steady state) and SNR (transient) are evaluated, respectively, as

S̄NR = 10 log( ∑_{s=1}^{M} w̄_s² / ∑_{s=1}^{M} η̄_s² ),   (22.38)

SNR = 10 log( ∑_{t=1}^{N} w_t² / ∑_{t=1}^{N} η_t² ).   (22.39)
For a given Δη, the lengths of the steady-state and transient data records are M = 50 and N ∈ {100, 300}, respectively.
Table 22.3: Linear block parameters evaluation: central estimates (θ_j^c), parameter bounds (θ_j^min, θ_j^max) and parameter uncertainty bounds Δθ_j for N = 300

Δη     SNR (dB)  θ_j  True value  θ_j^min  θ_j^c    θ_j^max  Δθ_j
0.005  50        a_1  -0.7600     -0.7603  -0.7600  -0.7598  0.0002
                 a_2   0.8200      0.8196   0.8200   0.8204  0.0004
                 b_1   2.1500      2.1464   2.1494   2.1525  0.0030
                 b_2  -1.0900     -1.0924  -1.0892  -1.0861  0.0031
0.02   38        a_1  -0.7600     -0.7643  -0.7598  -0.7552  0.0045
                 a_2   0.8200      0.8158   0.8202   0.8247  0.0044
                 b_1   2.1500      2.1135   2.1435   2.1734  0.0300
                 b_2  -1.0900     -1.1145  -1.0832  -1.0519  0.0313
0.05   30        a_1  -0.7600     -0.7641  -0.7597  -0.7552  0.0045
                 a_2   0.8200      0.8153   0.8200   0.8246  0.0047
                 b_1   2.1500      2.1266   2.1559   2.1853  0.0294
                 b_2  -1.0900     -1.1273  -1.0964  -1.0655  0.0309
0.15   20        a_1  -0.7600     -0.7708  -0.7600  -0.7493  0.0107
                 a_2   0.8200      0.8110   0.8213   0.8315  0.0102
                 b_1   2.1500      2.0897   2.1599   2.2300  0.0701
                 b_2  -1.0900     -1.1782  -1.0993  -1.0203  0.0790
0.2    18        a_1  -0.7600     -0.7721  -0.7605  -0.7488  0.0116
                 a_2   0.8200      0.8034   0.8216   0.8398  0.0182
                 b_1   2.1500      1.9965   2.1302   2.2638  0.1336
                 b_2  -1.0900     -1.2069  -1.0651  -0.9233  0.1418
0.3    14        a_1  -0.7600     -0.7794  -0.7605  -0.7416  0.0189
                 a_2   0.8200      0.7942   0.8226   0.8510  0.0284
                 b_1   2.1500      1.9154   2.1234   2.3314  0.2080
                 b_2  -1.0900     -1.2735  -1.0540  -0.8345  0.2195
Parameter bounds of the linear system are computed through the LMI-based procedure proposed in Section 22.4, for an LMI relaxation order δ equal to 2. Results on the backlash parameter evaluation are reported in Table 22.1, while Table 22.2 and Table 22.3 show results on the linear block parameter estimation for transient-data sequence lengths N equal to 100 and 300, respectively. From Tables 22.1, 22.2 and 22.3 it can be noted that the true parameters γ_j and θ_j belong to the computed intervals [γ_j^min, γ_j^max] and [θ_j^min, θ_j^max] respectively, for j = 1, ..., 4. It must be pointed out that, although in principle the estimation algorithm is not guaranteed to provide tight bounds on the parameters of the linear block for a finite value of the relaxation order δ, in practice satisfactory bounds on the parameters θ are obtained even for low signal-to-noise ratios (SNR < 20 dB) and for a small number of experimental measurements (N = 100).
22.6 Conclusion

A two-stage procedure for bounding the parameters of a single-input single-output Hammerstein system, where the nonlinear block is a backlash and the output measurements are corrupted by bounded noise, has been presented in this chapter. The proposed approach is based on the selection of input signals that decouple the nonlinear and the linear block parameters. In the first stage, a set of square wave input signals with different amplitudes is applied and the corresponding steady-state output samples are collected, from which the characterisation of the backlash feasible parameter set is derived, thanks to the fact that in steady-state operating conditions the input-output mapping does not depend on the linear block. On the basis of the derived backlash feasible parameter set, parameter uncertainty intervals are evaluated by computing the globally optimal solutions to nonconvex optimisation problems. In the second stage, a method for computing bounds on the unmeasurable inner signal is presented when a pseudo random binary signal is applied to the system. The obtained inner signal bounds, together with the noisy output measurements, are used to estimate the linear block parameters through the solution of a suitable errors-in-variables identification scheme. Recent results on relaxation techniques based on linear matrix inequalities are profitably used to compute the linear block parameter uncertainty intervals, which are guaranteed to converge monotonically to the tight ones as the relaxation order increases. The effectiveness of the procedure discussed in this chapter is shown through a simulated example, where satisfactory parameter bounds are obtained even for a small number of experimental data and in the presence of low signal-to-noise ratios.
Appendix

In this appendix a procedure for the computation of the vertices and active constraints defining the feasible parameter set D_γ^l is presented. The following additional symbols and quantities are introduced: H_L is a list of active constraint boundaries, that is, each element H_L(k) of the list is an active constraint boundary; the expression X ← {z} means that the element z is included in the set or list X; D_γ^l(s) is the set of backlash parameters that are consistent with the first s measurements, the error bounds and the assumed backlash model structure. A formal description of D_γ^l(s) is:

D_γ^l(s) = {(m_l, c_l) ∈ R^2_+ : ȳ_j = m_l(ū_j + c_l) + η̄_j, |η̄_j| ≤ Δη̄_j; j = 1, ..., s}.   (22.40)

The proposed procedure, Algorithm 22.1 below, works in four stages. First, the active constraint boundaries and the vertices of the set D_γ^l are characterised exploiting Definitions 22.1, 22.3, 22.5 and 22.6. Then, for each new measurement ū_s, the intersections between the constraint boundaries h^+(ū_s) and h^−(ū_s) and the active constraint boundaries contained in the list H_L are computed; these intersections are temporarily included in the set V(D_γ^l), and the constraint boundaries h^+(ū_s) and h^−(ū_s) are included in the list H_L. Further, the vertices of D_γ^l(s) are obtained by rejecting the constraint-boundary intersections that do not satisfy all the constraints generated by the first s measurements, which implicitly define D_γ^l(s). Finally, H_L is updated, retaining only the constraint boundaries whose intersection with some other constraint boundary is a vertex of D_γ^l(s).
Algorithm 22.1: Computation of vertices and active constraints of D_γ^l

1.  begin
2.    V(D_γ^l) ← {h^+(ū_1) ∩ h^+(ū_2)}
3.    V(D_γ^l) ← {h^+(ū_1) ∩ h^−(ū_2)}
4.    V(D_γ^l) ← {h^−(ū_1) ∩ h^+(ū_2)}
5.    V(D_γ^l) ← {h^−(ū_1) ∩ h^−(ū_2)}
6.    H_L ← {h^+(ū_1), h^+(ū_2), h^−(ū_1), h^−(ū_2)}
7.    for s = 3 : 1 : M
8.      L = length(H_L)
9.      q = 0
10.     for z = 1 : 1 : L
11.       V(D_γ^l) ← {h^+(ū_s) ∩ H_L(z)}
12.       if h^+(ū_s) ∉ H_L then
13.         H_L ← {h^+(ū_s)}
14.       end if
15.       V(D_γ^l) ← {h^−(ū_s) ∩ H_L(z)}
16.       if h^−(ū_s) ∉ H_L then
17.         H_L ← {h^−(ū_s)}
18.       end if
19.     end for
20.     V(D_γ^l) = V(D_γ^l) ∩ D_γ^l(s)
21.     for k = 1 : 1 : length(H_L)
22.       if ∃ j ≠ k : {H_L(k) ∩ H_L(j)} ∈ V(D_γ^l) then
23.         H_aux(q) = H_L(k)
24.         q = q + 1
25.       end if
26.     end for
27.     H_L = H_aux
28.   end for
29.   return H_L
30.   return V(D_γ^l)
31. end
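Since each pair of boundaries m_l(ū_ρ + c_l) = c and m_l(ū_σ + c_l) = c′ intersects at the closed-form point m_l = (c − c′)/(ū_ρ − ū_σ), c_l = c/m_l − ū_ρ, a brute-force O(M²) counterpart of Algorithm 22.1 can be sketched as follows (names and tolerances are illustrative assumptions):

import numpy as np
from itertools import combinations

def vertices_Dgamma_l(u_bar, y_bar, dy_bar, tol=1e-9):
    """Intersect every pair of constraint boundaries
    y_bar_j +/- dy_bar_j = ml*(u_bar_j + cl) in closed form and keep the
    intersections that satisfy all constraints of (22.40)."""
    u_bar, y_bar, dy_bar = map(np.asarray, (u_bar, y_bar, dy_bar))
    curves = [(y + s * d, u) for y, d, u in zip(y_bar, dy_bar, u_bar)
              for s in (1.0, -1.0)]
    verts = []
    for (c1, u1), (c2, u2) in combinations(curves, 2):
        if abs(u1 - u2) < tol:
            continue                      # same input amplitude: no intersection
        ml = (c1 - c2) / (u1 - u2)
        if ml <= 0.0:
            continue
        cl = c1 / ml - u1
        if cl <= 0.0:
            continue                      # vertices must lie in R^2_+
        if np.all(np.abs(y_bar - ml * (u_bar + cl)) <= dy_bar + tol):
            verts.append((ml, cl))
    return np.array(verts)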
References

1. Bai, E.W.: Identification of linear systems with hard input nonlinearities of known structure. Automatica 38, 853–860 (2002)
2. Bai, E.W.: Decoupling the linear and nonlinear parts in Hammerstein model identification. Automatica 40(4), 671–676 (2004)
3. Bapat, C.N., Popplewell, N., McLachlan, K.: Stable periodic motions of an impact-pair. Journal of Sound and Vibration 87, 19–40 (1983)
4. Cerone, V.: Parameter bounds for ARMAX models from records with bounded errors in variables. Int. J. Control 57(1), 225–235 (1993)
5. Cerone, V., Piga, D., Regruto, D.: Set-membership EIV identification through LMI relaxation techniques. In: Proc. of the American Control Conference, Baltimore, Maryland, USA (2010)
6. Cerone, V., Regruto, D.: Bounding the parameters of linear systems with input backlash. IEEE Trans. Automatic Control 52(3), 531–536 (2007)
7. Corradini, M.L., Orlando, G., Parlangeli, G.: A VSC approach for the robust stabilization of nonlinear plants with uncertain nonsmooth actuator nonlinearities — a unified framework. IEEE Trans. Automatic Control 49(5), 807–813 (2004)
8. Giri, F., Rochdi, Y., Chaoui, F.Z., Brouri, A.: Identification of Hammerstein systems in presence of hysteresis-backlash and hysteresis-relay nonlinearities. Automatica 44(3), 767–775 (2008)
9. Kalafatis, A.D., Wang, L., Cluett, W.R.: Identification of Wiener-type nonlinear systems in a noisy environment. Int. J. Control 66(6), 923–941 (1997)
10. Krzyżak, A.: Identification of nonlinear block-oriented systems by the recursive kernel estimate. Int. J. Franklin Inst. 330(3), 605–627 (1993)
11. Lang, Z.Q.: Controller design oriented model identification method for Hammerstein system. Automatica 29(3), 767–771 (1993)
12. Lang, Z.Q.: A nonparametric polynomial identification algorithm for the Hammerstein system. IEEE Trans. Automatic Control 42(10), 1435–1441 (1997)
13. Ljung, L.: System Identification: Theory for the User. Prentice Hall, Upper Saddle River (1999)
14. Milanese, M., Norton, J., Piet-Lahanier, H., Walter, E. (eds.): Bounding Approaches to System Identification. Plenum Press, New York (1996)
15. Milanese, M., Vicino, A.: Optimal estimation theory for dynamic systems with set membership uncertainty: an overview. Automatica 27(6), 997–1009 (1991)
16. Ninness, B., Gibson, S.: Quantifying the accuracy of Hammerstein model estimation. Automatica 38, 2037–2051 (2002)
17. Nordin, M., Gutman, P.O.: Controlling mechanical systems with backlash — a survey. Automatica 38, 1633–1649 (2002)
18. Söderström, T., Stoica, P.: System Identification. Prentice-Hall, Upper Saddle River (1989)
19. Stoica, P., Söderström, T.: Instrumental-variable methods for identification of Hammerstein systems. Int. J. Control 35(3), 459–476 (1982)
20. Sun, L., Liu, W., Sano, A.: Identification of a dynamical system with input nonlinearity. IEE Proc. Part D 146(1), 41–51 (1999)
21. Tao, G., Canudas de Wit, C.A.: Special issue on adaptive systems with non-smooth nonlinearities. Int. J. of Adapt. Control & Sign. Proces. 11(1) (1997)
22. Tao, G., Kokotovic, P.V.: Adaptive Control of Systems with Actuator and Sensor Nonlinearities. Wiley, New York (1996)
23. Vörös, J.: Modeling and identification of systems with backlash. Automatica 46(2), 369–374 (2010)
24. Walter, E., Piet-Lahanier, H.: Estimation of parameter bounds from bounded-error data: a survey. Mathematics and Computers in Simulation 32, 449–468 (1990)
Chapter 23
Block Structured Modelling in the Study of the Stretch Reflex

David T. Westwick
Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, 2500 University Drive NW, Calgary, AB, Canada
23.1 Introduction

Nonlinear system identification has a long history in several disciplines related to biomedical engineering. Much of this can be credited to the seminal textbook written by Marmarelis and Marmarelis [25], which popularised the use of the cross-correlation method for estimating the Wiener kernels of a system driven by a white Gaussian noise input [24]. This, together with the rapidly evolving capabilities of the digital computers of the day, gave a large number of researchers the tools to investigate a wide variety of nonlinear dynamical systems, particularly in the area of sensory physiology. More recent developments have been summarised in a series of multiple-author research volumes [26, 27, 28] and two recent textbooks [29, 43]. The balance of this chapter will be organised as follows. Section 23.2 presents background information on the relationship between the Volterra/Wiener series and several commonly used block structured models. Initial applications of block structured modelling in the study of sensory systems are reviewed in Section 23.3. This will be followed by a discussion of the identification of systems containing high-degree nonlinearities in Section 23.4. Section 23.5 discusses a model developed specifically for the investigation of joint dynamics. Finally, Section 23.6 will summarise the chapter and suggest some open problems.
23.2 Preliminaries

A wide variety of time-invariant nonlinear systems can be represented by a Volterra series. The system is described by a series of Volterra kernels of degrees ranging from 0 to some maximum value L. For a discrete-time system, the output of its ℓth degree Volterra kernel can be written:
y_ℓ(t) = ∑_{k_1=0}^{T} ∑_{k_2=0}^{T} ... ∑_{k_ℓ=0}^{T} h_ℓ(k_1, k_2, ..., k_ℓ) u(t − k_1) u(t − k_2) ... u(t − k_ℓ),   (23.1)
where u(t) and y_ℓ(t) are the input and output, respectively, T is the memory length of the system, and h_ℓ(k_1, k_2, ..., k_ℓ) is the ℓth degree Volterra kernel. The output of the system is then written as the sum of the outputs of the individual kernels. Thus:

y(t) = ∑_{ℓ=0}^{L} y_ℓ(t).   (23.2)
Boyd and Chua [4] proved that any time-invariant system could be approximated, to within an arbitrary precision, by a Volterra series model, provided the system had what they defined as a fading memory. While this result established the theoretical validity of the Volterra series model, it says nothing about its practical applicability. It can be shown [25] that the number of model parameters, that is, the number of independent kernel values, in a Volterra series model is given by

N_par = (T + L + 1)! / ((T + 1)! L!).   (23.3)
It is clear from (23.1) that the Volterra series model is linear in its parameters, and it can therefore be identified using an ordinary least squares regression. However, the number of parameters given in (23.3) makes it clear that this approach will only be practical for systems with relatively mild nonlinearities (low L), short memories (low T), or both. A more attractive alternative involves working with block structured models: interconnections of dynamic linear elements and memoryless nonlinearities. Any number of representations can be used for the linear and nonlinear elements. However, to emphasise the relationship to the Volterra series, we will use finite impulse responses (FIR) to represent the dynamic linear elements, and polynomials to represent the nonlinearities. Consider a Wiener system, a dynamic linear element followed by a memoryless nonlinearity,

y(t) = ∑_{ℓ=0}^{L} c(ℓ) ( ∑_{k=0}^{T} h(k) u(t − k) )^ℓ,   (23.4)
where h(k) are the elements of the FIR filter, and c(ℓ) are the polynomial coefficients of the nonlinearity. Comparing (23.4) with (23.1), it is evident that the Volterra kernels of a Wiener system are given by:

h_ℓ(k_1, k_2, ..., k_ℓ) = c(ℓ) h(k_1) h(k_2) ... h(k_ℓ).   (23.5)
This leads to the observation that the second-order Volterra kernel will be a rank-1 matrix. Furthermore, if one sums a Volterra kernel over all but one of its indices, the result will be proportional to the impulse response of the linear element.
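Both observations are easy to verify numerically; a short sketch with an arbitrarily chosen filter and second-degree coefficient (not taken from the chapter):

import numpy as np

h = np.array([1.0, 0.6, 0.25, 0.05])     # an arbitrary FIR impulse response
c2 = 0.8                                  # second-degree polynomial coefficient c(2)

H2 = c2 * np.outer(h, h)                  # h_2(k1, k2) = c(2) h(k1) h(k2), cf. (23.5)
print(np.linalg.matrix_rank(H2))          # 1: the second-order kernel is rank one
print(H2.sum(axis=1) / (c2 * h.sum()))    # summing over one index recovers h (up to scale)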
Reversing the order of the linear and nonlinear elements produces a Hammerstein system:

y(t) = ∑_{k=0}^{T} h(k) ∑_{ℓ=0}^{L} c(ℓ) u^ℓ(t − k),   (23.6)

which has Volterra kernels given by

h_ℓ(k_1, k_2, ..., k_ℓ) = c(ℓ) h(k_1) δ(k_2 − k_1) δ(k_3 − k_1) ... δ(k_ℓ − k_1),   (23.7)
where δ(i − j) is interpreted as a Kronecker delta. Thus, the Volterra kernels of a Hammerstein system will only be non-zero on their main diagonals, which will all be proportional to the impulse response of the linear element. The Wiener–Hammerstein model often turns up in physiological applications, where it is more commonly referred to either as an LNL model or a sandwich model [22]. The output of a Wiener–Hammerstein model can be written as:

y(t) = ∑_{j=0}^{T−1} g(j) ∑_{ℓ=0}^{L} c(ℓ) ( ∑_{k=0}^{T−1} h(k) u(t − j − k) )^ℓ,   (23.8)

where h(k) and g(j) are the impulse responses of the first and second linear elements, respectively. By comparing (23.8) with (23.1), it is possible to compute the Volterra kernels of the Wiener–Hammerstein model:

h_ℓ(k_1, ..., k_ℓ) = c(ℓ) ∑_{j=0}^{T−1} g(j) h(k_1 − j) h(k_2 − j) ... h(k_ℓ − j).   (23.9)
23.3 Initial Applications

Block structured models have found widespread application in the study of physiological systems because they can often be used to model highly nonlinear systems using relatively few parameters. Furthermore, it is sometimes possible to map the individual elements to different parts of the system under study. The nervous system encodes physical signals in the firing rates of neurons. This leads to saturation, at both the upper and lower extremes of the input, and perhaps some sort of compression of the signal's dynamic range. Since this process is generally much faster than the rest of the dynamics, it appears to be memoryless when working at time scales that are relevant to the rest of the system. For example, the auditory system is often modelled using a Wiener–Hammerstein cascade [9], two linear filters separated by a static nonlinearity, where the nonlinearity can be associated with the process of encoding the response of the hair cells onto an auditory nerve fibre. Similarly, the stretch reflex, the relationship between the angular velocity of a joint and the electrical activity evoked in its muscles, can be modelled as a Hammerstein cascade [19]. The nonlinearity is believed to model the muscle spindles, which are organs that respond to a stretching of the muscles. In both cases, using a block structured
model provides insight regarding the processes that are happening on either side of the nonlinearities. Some of the first physiological applications of block structured modelling [37, 38, 39] involved analysing the first- and second-order Wiener kernels from the responses of various cells in the bullfrog retina to both light inputs and electrical stimulation. In models of Type-N amacrine cells, Sakuranaga noted that the second-order kernels could be generated by a Wiener cascade consisting of the system's first-order Wiener kernel followed by a square-law device. This was not true for the Type-C amacrine cells. Later, Naka et al. [33] showed that these second-order kernels could be generated (approximately) by convolving the first-order Wiener kernel of the electrical response between two peripheral cells with the second-order kernel of a Type-N amacrine cell, thus suggesting that the response of the Type-C cells could be explained by a Wiener–Hammerstein model. While the generation of the block structured models helped explain the nature of the transformations occurring in the retina, the accuracy of the models was limited by that of the initial Wiener kernel estimates. Korenberg [21] developed a highly efficient method, the fast orthogonal algorithm (FOA), for estimating the zero-, first- and second-order terms in a truncated Volterra series model. This development prompted a series of studies of various sensory systems, including the visual system [10, 11] and the stretch receptor in the cockroach hindleg [12, 13]. In these studies, the elements of a Wiener–Hammerstein model were extracted from estimates of the first- and second-order Volterra kernels obtained using the FOA. The first significantly non-zero row in the estimated second-order Volterra kernel was used as an estimate of the initial linear element. This was then deconvolved from the estimated first-order kernel to yield an estimate of the second linear element in the cascade. The nonlinearity could then be recovered via least squares regression. Several iterative, correlation based techniques have been proposed for the identification of Hammerstein [17], Wiener [17, 35, 23] and Wiener–Hammerstein [22] models. These techniques all require Gaussian inputs, as they use Bussgang's famous theorem [5] to obtain initial estimates of the linear elements from measurements of the input/output cross-correlation. The methods iterate between fitting the nonlinear elements (via some sort of least squares computation) and the linear elements (using correlation based calculations). Although these procedures generally produce acceptable results, none of them has been proved to converge, much less to any sort of optimal solution. Kearney and Hunter [18] used system identification techniques to fit a nonlinear model between the angular velocity of the ankle joint and the electrical activity that it evoked over the Gastrocnemius and Soleus (GS) muscles. In this study, they compared linear impulse response models with Hammerstein models obtained by fitting linear models between half-rectified velocity and the resulting myoelectric activity. The choice of a half-wave rectifier nonlinearity was motivated by knowledge of the underlying physiology. In particular, there was experimental evidence to suggest that the muscle spindle was expected to respond to stretching of the muscle, and to be velocity sensitive [7]. In a later study, Kearney and Hunter [19] estimated the first and
second-order Wiener kernels between the ankle angular velocity and GS-EMG, and found that the second-order kernel was dominated by its diagonal values, suggesting a Hammerstein structure. They then applied the iterative algorithm suggested by [17] to estimate the linear and nonlinear elements in a Hammerstein cascade. The predictive power of the resulting model was found to be similar to that of the ad-hoc Hammerstein model identified in the earlier study [18].
23.4 Hard Nonlinearities

Traditionally, the nonlinearities in block structured models have been represented by polynomials [34, 6]. This underscores the connection to the kernels in the Volterra series. However, polynomial estimation can be problematic, as the resulting linear regressions can become very poorly conditioned. As a result, Hunter and Korenberg [17] suggested using orthogonal polynomials instead. In particular, they recommended using Chebyshev polynomials, since these tend to produce well conditioned estimation problems for a wide variety of input distributions [2]. Figure 23.1 shows the first 6 elements of three polynomial bases: a power series, Hermite polynomials, and Chebyshev polynomials. Suppose that we wish to fit a polynomial to input-output data from a static nonlinearity. Consider 2 inputs, each 1000 points long: one with a standard normal distribution, and one that is uniformly distributed. Table 23.1 shows the condition numbers of the regressions that would fit polynomials of degrees 0 through 10 using these two inputs and each of the three polynomial bases shown in Figure 23.1. Notice that the regressions for the Chebyshev polynomials remain well conditioned for all polynomial degrees tested, whereas the other two bases quickly become severely ill-conditioned. Furthermore, note that the Chebyshev basis remains better conditioned with a uniform input than with a normally distributed input.

Table 23.1: Condition numbers for the example polynomial regression obtained using power series, Hermite, and Chebyshev basis functions

Poly.   Normal distribution                   Uniform distribution
Deg.    Power     Hermite   Chebyshev         Power     Hermite   Chebyshev
0       1.00      1.00      1.00              1.00      1.00      1.00
1       1.05      1.05      2.85              1.00      1.00      1.74
2       2.07      1.14      4.93              2.91      1.02      2.02
3       3.37      1.33      7.63              5.89      3.36      2.52
4       6.02      1.94      9.11              10.3      9.51      2.67
5       13.6      3.87      10.5              29.3      32.1      3.05
6       27.7      8.70      11.0              55.5      147.      3.20
7       62.4      23.2      11.4              149.      539.      3.48
8       140.      68.2      11.5              305.      2,789.    3.57
9       335.      208.      11.6              804.      11,944.   3.77
10      773.      730.      11.6              1,707.    63,793.   3.81
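The conditioning experiment behind Table 23.1 can be reproduced in outline as follows; since the condition numbers depend on the particular input realisation and on scaling conventions, the printed values should not be expected to match the table digit for digit:

import numpy as np
from numpy.polynomial import chebyshev, hermite_e

rng = np.random.default_rng(1)
u = rng.standard_normal(1000)       # swap in rng.uniform(-1, 1, 1000) for the uniform case
deg = 9

V_pow = np.vander(u, deg + 1, increasing=True)       # power-series regressor
V_herm = hermite_e.hermevander(u, deg)               # probabilists' Hermite regressor
u_sc = 2 * (u - u.min()) / (u.max() - u.min()) - 1   # map input to [-1, 1], cf. Fig. 23.1
V_cheb = chebyshev.chebvander(u_sc, deg)             # Chebyshev regressor

for V in (V_pow, V_herm, V_cheb):
    print(np.linalg.cond(V))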
Fig. 23.1: Comparison of the power series, Hermite and Chebyshev polynomial basis functions. The power series (left) is shown over an arbitrarily chosen domain. The Hermite polynomials (centre) are shown over the range [-3 3], since they are orthogonal for standard normal inputs. The Chebyshev polynomials are shown over the domain [-1 1], since the input is mapped to this range as part of the fitting procedure
Figure 23.2 shows polynomials of degrees 2, 3, 5 and 7 fitted to a hard nonlinearity, the composition of an arc-tangent with a half-wave rectifier, using the inputs described above. This nonlinearity includes a linear segment, a sharp corner, and a gentle saturation. Results obtained with a normally distributed input are shown on the left. Since the data are more heavily distributed close to the origin, the resulting approximations are relatively poor near the extremities. The plots on the right were obtained using a uniformly distributed input, which causes the fitting error to be more uniformly distributed across the input domain.

Fig. 23.2: Polynomial approximations to a hard nonlinearity obtained with a normally distributed (left) and a uniformly distributed input (right)

Fitting hard nonlinearities will require high-degree polynomials, which may in turn lead to ill-conditioned estimation problems. To a certain degree, conditioning problems can be reduced if a Chebyshev polynomial basis is used. A separable least squares (SLS) based algorithm has been developed that identifies Hammerstein models consisting of a polynomial nonlinearity followed by a FIR filter [42]. The Hammerstein model was parametrised by the vector

θ = [θ_n^T  θ_ℓ^T]^T,   (23.10)

where θ_ℓ is a vector that includes all the parameters that appear linearly in the output, and θ_n contains all the remaining parameters. Clearly, the polynomial Hammerstein model (23.6) is bilinear in the polynomial coefficients and impulse response weights. Thus, the output can be regarded as being linear in either the filter weights or the polynomial coefficients, provided the other parameters are held fixed. Since the number of filter weights will generally be much larger than the number of polynomial coefficients, the filter weights were treated as the "linear" parameters, whereas the polynomial coefficients were placed in θ_n, the vector of nonlinear parameters. Given any set of nonlinear parameters θ_n, the optimal linear parameters corresponding to those particular nonlinear parameters may be obtained by solving a linear regression. Thus, the linear parameters, and hence the model output and the resulting mean squared prediction error, can be viewed as functions of the nonlinear parameters, and one need only perform an optimisation with respect to the relatively small number of nonlinear parameters. These separable nonlinear least squares problems were studied in [14]. Ruhe and Wedin [36] proposed several algorithms for solving them.
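The separable structure can be sketched in a few lines. The following is a schematic illustration of the idea, not the algorithm of [42] itself: the FIR weights are eliminated by an inner linear regression, and a generic nonlinear least squares routine iterates only over the polynomial coefficients. The gain ambiguity between the two blocks is left unresolved here.

import numpy as np
from scipy.linalg import lstsq
from scipy.optimize import least_squares

def hammerstein_sls(u, y, poly_deg, n_taps):
    """Schematic separable least squares fit of a polynomial-FIR
    Hammerstein model (assumes poly_deg >= 1)."""
    def residual(theta_n):
        v = np.polyval(theta_n, u)                    # static nonlinearity output
        # regressor whose k-th column is v delayed by k samples
        Phi = np.column_stack([np.concatenate((np.zeros(k), v[:len(v) - k]))
                               for k in range(n_taps)])
        h = lstsq(Phi, y)[0]                          # optimal "linear" parameters
        return y - Phi @ h
    theta0 = np.zeros(poly_deg + 1)
    theta0[-2] = 1.0                                  # start from the identity nonlinearity
    return least_squares(residual, theta0).x

A call such as hammerstein_sls(u, y, poly_deg=7, n_taps=32) would mirror the model orders used in the study described below.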
In the system identification literature, interest in these algorithms appears to have been initiated in [40], which applied them to neural network training. The SLS based algorithm [42] has been used to identify a Hammerstein model of the stretch reflex EMG. Details of the experiment can be found in [15, 19]. Briefly, a human subject lay supine, with his left foot inserted into a custom fitted fibre-glass cast. The cast was attached to a rotary electro-hydraulic actuator, which aligned the axis of rotation of the ankle with that of the actuator. The actuator was configured as a position servo, which was used to apply the experimental perturbation to the joint. The measurements consisted of the ankle angle, measured by a potentiometer attached to the actuator, the torque, measured by a load cell placed between the cast and the actuator, and the electrical activity (EMG) measured over the Tibialis Anterior and over the Gastrocnemius and Soleus muscles. Data were sampled at 1000 Hz, and then decimated by a factor of 5. Figure 23.3 shows 10 seconds of data from this experiment. Models were fit between the angular velocity of the ankle joint and the electrical activity recorded over the GS muscles. Results were compared with those obtained using the iterative technique proposed in [17]. Figure 23.4 shows a typical result from this study. Both models comprised a 7th-degree Chebyshev polynomial followed by a 32-tap FIR filter.
Fig. 23.3: 10 seconds of experimental joint dynamics data. The top panel shows the measured ankle angle, while the torque is shown in the middle. The lower panel shows both EMGs. The activity of the Tibialis Anterior is shown in grey, while the activity of the Gastrocnemius/Soleus is shown in black, with the sign reversed
Fig. 23.4: Comparison of stretch reflex EMG models identified using the Hunter-Korenberg iterative method (dashed), and a separable least squares algorithm (solid). The separable least squares model provided substantially better predictions than did the model identified using the Hunter-Korenberg algorithm (5.13% vs 8.83% normalised mean squared error in separate validation data)
However, the model identified using SLS produced considerably better output predictions than did the iterative model, resulting in a 42% reduction in the normalised mean squared error (NMSE) when evaluated using separate validation data. This is not surprising, since the SLS approach explicitly minimises the mean squared error, and was initialised using the result of the iterative approach. Even with their robust numerical properties, Chebyshev polynomials are not well suited to modelling "hard" nonlinearities such as rectification, saturation, thresholding and deadzones, since all of the basis functions have support over the whole real line, and eventually tend to ±∞. One approach to dealing with these hard nonlinearities is to adopt specialised models. For example, Bai [1] developed single-parameter models of a number of hard nonlinearities. Since the linear element in the Hammerstein cascade could be identified using linear regression, it was possible to express the mean squared prediction error as a function of the single parameter that represented the nonlinearity. Thus, the optimal nonlinearity could be fitted by numerically solving a one-dimensional optimisation problem. While this is a promising approach, the resulting nonlinearity models were all piecewise linear. While the hard nonlinearities often encountered in physiological systems are sometimes described in terms of these piecewise linear transformations, the nonlinearities themselves are somewhat more complicated, and more likely contain smooth curves rather than linear segments.
Use of Spline Functions

One of the benefits of using a separable least squares approach to identify the elements of a Hammerstein model is that there is no reason to use a nonlinearity model that is linear in its parameters, since the nonlinearity parameters will be fitted using an iterative optimisation. Thus, in [8] the polynomial nonlinearity used in [42] was replaced with a cubic spline [3], a piecewise cubic function defined by a series of knot points. Between each pair of successive knots, the spline is defined by a cubic polynomial, chosen such that the spline is continuous and twice differentiable. There are a variety of different spline representations, with slight variations in their properties. In [8], the spline was represented by the coordinates of its knot points. Using this representation, the output is nonlinear with respect to all of the parameters. The benefit derived by using this representation is that each of the parameters only influences the shape of the nonlinearity in a local region, determined by the positions of the adjacent knot points. This, in turn, leads to a relatively well conditioned estimation problem. Cubic splines are well suited to the approximation of hard nonlinearities. For example, Figure 23.5 shows cubic spline approximations to the same hard nonlinearity used to generate Figure 23.2. A cubic spline with 5 knots has 10 free parameters, the same number as a degree 9 polynomial. However, the fits shown in Figure 23.5 for 5-knot splines are almost perfect. This suggests that cubic splines could be well suited to representing hard nonlinearities in block-structured models.
Fig. 23.5: Cubic spline approximations to a hard nonlinearity obtained with a normally distributed (left) and a uniformly distributed input (right)
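The approximation power of cubic splines is easy to illustrate with a fixed-knot least squares spline, a simpler variant of the knot-optimised representation used in [8]; the knot placement below is an arbitrary assumption:

import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-3.0, 3.0, 1000))
y = np.arctan(np.maximum(x, 0.0))          # arc-tangent of a half-wave rectifier

knots = np.linspace(-2.0, 2.0, 5)          # five interior knots, arbitrarily placed
spline = LSQUnivariateSpline(x, y, knots, k=3)
print(np.mean((spline(x) - y) ** 2))       # small mean squared fitting error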
The algorithm proposed in [8] was used to fit a Hammerstein model to the data shown in Figure 23.4. Results are summarised in Figure 23.6, which compares identified Hammerstein models with Chebyshev polynomial and cubic spline nonlinearities. The two models appear to be virtually identical; however, in cross-validation testing, the spline-based model generated simulation errors that were 22% smaller than those generated by the polynomial based model.
Fig. 23.6: Comparison of Hammerstein models of the stretch reflex EMG using either a Chebyshev polynomial (dashed) or a cubic spline (solid) to represent the static nonlinearity. Although the two models appear quite similar, the spline based model produced more accurate predictions than the polynomial based model (4.01% vs 5.13% NMSE in separate validation data)
Although splines (cubic or otherwise) possess powerful approximation abilities, their outputs are highly nonlinear functions of their knot positions. This leads to non-convex optimisation problems, whose solutions can be sensitive to the choice of initial estimate. Although it is possible to obtain excellent results using block structured models that incorporate spline nonlinearities, the resulting algorithms can be difficult to use, due to the need for an appropriate initialisation. Until this difficulty can be addressed, it will limit the usefulness of spline-based block-structured models.
23.5 The Parallel Cascade Stiffness Model

While the electrical activity recorded in the EMG provides useful information regarding the timing of muscle contractions, it is just a byproduct of the muscle activation system. On the other hand, it is the force (or torque) generated by the muscles that is of functional significance.

Fig. 23.7: Block structured model of joint stiffness. The upper pathway, a second-order linear mass-spring-damper system I s² + B s + K mapping the joint angle θ(t) to the intrinsic torque T_I(t), represents the intrinsic stiffness, due to the properties of active muscle, tendons, ligaments etc., in the absence of reflexive activity. The lower pathway represents the effects of reflexes on the joint stiffness. It is a Wiener–Hammerstein system comprising a differentiator d/dt, a memoryless nonlinearity n_RS(·) that likely includes half-wave rectification, and a second linear element h_RS(τ), which is taken to represent the activation dynamics of the muscle, but which also includes the propagation delay in the entire reflex arc. The total torque T_q(t) is the sum of the intrinsic torque T_I(t) and the reflex torque T_R(t)
Kearney et al. [20] proposed the model shown in Figure 23.7, which can be fit between the angular position of a joint and the resulting torque generated around that joint, together with an iterative, correlation-based identification scheme for this model. This was followed by several studies where this model was used to study the effects of spinal lesions, strokes and degenerative neuromuscular diseases on the significance of the reflex contributions to overall joint stiffness [30, 31, 32]. Zhang and Rymer [46] used a similar model to study the effects of reflexes in the mechanics of the human elbow joint. While this model structure has the potential to provide very detailed information regarding the physiology of the system, its identification presents several challenges:

1. The intrinsic pathway is an improper system, as it has two zeros and no poles. Discretisation of this system results in an acausal system [44], which can be modelled using a two-sided impulse response [18]. This approach has been widely used in the identification of linearised models of joint stiffness [16].
2. Some sort of constraint must be used to separate the velocity term in the intrinsic response from that due to the first-degree term in the polynomial representation of the nonlinearity. In the physiological system, the reflex loop includes a propagation delay, which can be used to separate these two contributions.
23.5.1 Iterative, Correlation-based Approach

The correlation-based algorithm for identifying the elements of the parallel cascade stiffness model, proposed in [20], used the approximately 40 ms propagation delay in the reflex loop to separate the contributions of the intrinsic and reflex pathways, and hence restore identifiability. They first fit a two-sided impulse response between the (input) position and (output) torque, but limited the memory length to ±40 ms. A Hammerstein cascade, whose linear element included a 40 ms delay, was then fitted between the joint's angular velocity and the residuals remaining after the output of the intrinsic pathway had been subtracted from the measured torque. This iteration then continued, but with the restrictions on the memory length of the intrinsic pathway removed.

Figure 23.8 shows 10 seconds (out of a total of 65) of experimental ankle position/torque data. The experiment that produced these data was essentially the same as that described above. However, the input perturbation had been redesigned to have a slightly higher bandwidth and a more uniform velocity distribution, both of which facilitate the identification of the nonlinear model. Nevertheless, the bandwidth of the input was restricted enough not to significantly suppress the stretch reflex. The first 50 seconds were used for identification, so that the last 15 seconds could be used for model validation.
Fig. 23.8: Data from an experimental study of ankle joint mechanics. The data consisted of approximately 65 seconds of ankle position (top) and torque (bottom) measurements, sampled at 100Hz
398
D.T. Westwick
Fig. 23.9: Parallel Cascade Stiffness models identified using the iterative correlation based technique introduced by [20]. Note that the impulse responses in the reflex pathway (lower right) were smoothed using a 3-point, zero-phase smoother, for presentation purposes
used for model validation. Figure 23.9 shows the elements of two models identified from these data using variants of the iterative correlation based approach proposed in [20]. In one case (dashed lines), the static nonlinearity was assumed to be halfwave rectifier. In the other case (solid lines), the nonlinearity was represented by a 7th degree Chebyshev polynomial. Cross-validation resulted in simulation errors of 24.74% and 22.30% NMSE for the rectifier and polynomial based models, respectively. While it is much simpler to fit the rectifier based model, including the nonlinearity in the fitting procedure improved the simulation accuracy of the model, both in the identification and validation data. Furthermore the impulse responses were also visibly less noisy in the polynomial based model than was the case in the rectifier based model.
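To make the iteration concrete, the following MATLAB sketch implements a simplified version of the scheme under stated assumptions: pos, vel and trq are column vectors of sampled position, velocity and torque; the reflex nonlinearity is fixed to a half-wave rectifier (the rectifier variant above), so that each pass reduces to ordinary least squares; and the first-pass restriction on the intrinsic memory length is omitted for brevity. The function and variable names (iterative_cascade, lagmat, etc.) are illustrative, not those of [20].

```matlab
% Minimal sketch of the iterative, correlation/LS-based scheme
% (rectifier variant). M1 is the two-sided intrinsic lag limit;
% D and M2 are the reflex delay and memory length, all in samples.
function [hIS, hRS] = iterative_cascade(pos, vel, trq, M1, D, M2, nIter)
  w   = max(vel, 0);                  % half-wave rectified velocity
  Xi  = lagmat(pos, -M1:M1);          % two-sided intrinsic regressors
  Xr  = lagmat(w,    D:M2);           % delayed reflex regressors
  hRS = zeros(size(Xr, 2), 1);
  for it = 1:nIter
    hIS = Xi \ (trq - Xr*hRS);        % refit intrinsic IRF on reflex residual
    hRS = Xr \ (trq - Xi*hIS);        % refit reflex IRF on intrinsic residual
  end
end

function X = lagmat(x, lags)
  % Columns of X are x shifted by each lag (negative lag = anticipation)
  N = length(x);  X = zeros(N, numel(lags));
  for k = 1:numel(lags)
    d = lags(k);
    if d >= 0, X(d+1:N, k) = x(1:N-d);
    else,      X(1:N+d, k) = x(1-d:N);
    end
  end
end
```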
23.5.2 Separable Least Squares Optimisation

Let the memoryless nonlinearity, n_RS(·) in Figure 23.7, be parametrised by the vector θ_n. Then the output of the parallel cascade stiffness model can be written:

T(t) = \sum_{\tau=-M_1}^{M_1} h_{IS}(\tau)\,\theta(t-\tau) + \sum_{\tau=D_R}^{M_2} h_{RS}(\tau)\, n_{RS}\big(\dot{\theta}(t-\tau),\theta_n\big)    (23.11)
where M_1 is the memory (and anticipation) length of the two-sided intrinsic stiffness model, D_R and M_2 are the delay and memory length of the reflex model, and the remaining variables are defined in Figure 23.7. Regardless of how the nonlinearity is parametrised, the model output is linear in the tap weights of the two impulse responses. Thus, this model structure is an ideal candidate for a separable least squares approach.
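Because the impulse-response weights enter linearly, they can be eliminated inside the cost function by a linear least squares fit, leaving a low-dimensional nonlinear search over the nonlinearity parameters only; this is the variable projection idea underlying separable least squares [14, 36]. A minimal MATLAB sketch, assuming a polynomial nonlinearity with coefficient vector c and reusing the hypothetical lagmat helper from the previous listing:

```matlab
% Variable-projection cost for the parallel cascade model (sketch).
% For a given polynomial coefficient vector c, both impulse responses
% are eliminated by linear LS, so only c is searched nonlinearly.
function e = vp_cost(c, pos, vel, trq, M1, D, M2)
  w = polyval(c, vel);                          % nonlinearity output
  X = [lagmat(pos, -M1:M1), lagmat(w, D:M2)];   % stacked regressors
  h = X \ trq;                                  % optimal tap weights for c
  e = sum((trq - X*h).^2);                      % projected cost J(c)
end

% The outer search then runs over c alone, e.g.:
%   c_hat = fminsearch(@(c) vp_cost(c, pos, vel, trq, M1, D, M2), c0);
```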
Fig. 23.10: Parallel Cascade Stiffness models identified using the separable least squares based approach introduced in [41], using both polynomial (dashed) and cubic spline (solid) models for the static nonlinearity
Fig. 23.11: Torque predictions produced by the cubic-spline based parallel cascade stiffness model
A separable least squares algorithm for a parallel cascade stiffness model with a polynomial nonlinearity was developed in [41]. Clearly, this approach could be extended to a variety of nonlinearity representations. Figure 23.10 shows the elements of two models that were fit to the data in Figure 23.8 using SLS. The first (dashed lines) represented the nonlinearity as a 7th-degree Chebyshev polynomial; the second (solid lines) used a 5-knot cubic spline instead. The separable least squares models produced slightly more accurate predictions than the models identified using the iterative correlation-based algorithms: the polynomial- and spline-based models resulted in 20.03% and 19.28% NMSE, respectively, on the validation segment. In addition to these modest improvements in prediction accuracy, the linear impulse responses in the models identified using SLS techniques were visibly less noisy than those identified using the correlation approach. However, it should be noted that the SLS method required significantly more computation time than either of the iterative schemes. Furthermore, the nonlinearities in Figure 23.10 appear to contain a constant gain when compared with those in Figure 23.9. This may be due to differences in the velocity term assigned to the intrinsic pathway; additional constraints may be required to remove this ambiguity.
23.6 Conclusions

Block structured models can provide functional insight into certain physiological systems. They are particularly well suited to the study of the effects of reflexes on joint mechanics. There are several challenges in applying these methods, including the need to model high-degree nonlinearities, and the need to use appropriately chosen constraints to maintain identifiability.

Acknowledgements. The author would like to thank Dr. Robert E. Kearney from the Department of Biomedical Engineering at McGill University for supplying the experimental data.
References

1. Bai, E.: Identification of linear systems with hard input nonlinearities of known structure. Automatica 38, 853–860 (2002)
2. Beckmann, P.: Orthogonal Polynomials for Engineers and Physicists. The Golem Press, Boulder (1973)
3. de Boor, C.: A practical guide to splines. Applied Mathematical Sciences, vol. 27. Springer, New York (1978)
4. Boyd, S., Chua, L.: Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Trans. Circuits Syst. CAS 32(11), 1150–1161 (1985)
5. Bussgang, J.: Crosscorrelation functions of amplitude-distorted Gaussian signals. Tech. Rep. 216, MIT Electrical Research Lab (1952)
6. Chang, F., Luus, R.: A noniterative method for identification using Hammerstein model. IEEE Trans. Autom. Control 16, 464–468 (1971)
7. Chen, W., Poppele, R.: Static fusimotor effect on the sensitivity of mammalian muscle spindles. Brain Res. 57, 244–247 (1973)
8. Dempsey, E., Westwick, D.: Identification of Hammerstein models with cubic spline nonlinearities. IEEE Trans. Biomed. Eng. 51, 237–245 (2004)
9. Eggermont, J.: Wiener and Volterra analyses applied to the auditory system. Hearing Res. 66, 177–201 (1993)
10. Emerson, R., Korenberg, M., Citron, M.: Identification of intensive nonlinearities in cascade models of visual cortex and its relation to cell classification. In: [27], pp. 97–111 (1989)
11. Emerson, R., Korenberg, M., Citron, M.: Identification of complex-cell intensive nonlinearities in a cascade model of cat visual cortex. Biol. Cybern. 66, 291–300 (1992)
12. French, A., Korenberg, M.: A nonlinear cascade model for action potential encoding in an insect sensory neuron. Biophys. J. 55, 655–661 (1989)
13. French, A., Korenberg, M.: Dissection of a nonlinear cascade model for sensory encoding. Ann. Biomed. Eng. 19, 473–484 (1991)
14. Golub, G., Pereyra, V.: The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate. SIAM J. Numer. Anal. 10(2), 413–432 (1973)
15. Hunter, I., Kearney, R.: Two-sided linear filter identification. Med. Biol. Eng. Comput. 21, 203–209 (1983)
16. Hunter, I., Kearney, R.: Quasi-linear, time-varying, and nonlinear approaches to the identification of muscle and joint mechanics. In: [26], pp. 128–147 (1987)
17. Hunter, I., Korenberg, M.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biol. Cybern. 55, 135–144 (1986)
18. Kearney, R., Hunter, I.: System identification of human triceps surae stretch reflex dynamics. Exp. Brain Res. 51, 117–127 (1983)
19. Kearney, R., Hunter, I.: Nonlinear identification of stretch reflex dynamics. Ann. Biomed. Eng. 16, 79–94 (1988)
20. Kearney, R., Stein, R., Parameswaran, L.: Identification of intrinsic and reflex contributions to human ankle stiffness dynamics. IEEE Trans. Biomed. Eng. 44(6), 493–504 (1997)
21. Korenberg, M.: Identifying nonlinear difference equation and functional expansion representations: The fast orthogonal algorithm. Ann. Biomed. Eng. 16, 123–142 (1988)
22. Korenberg, M., Hunter, I.: The identification of nonlinear biological systems: LNL cascade models. Biol. Cybern. 55, 125–134 (1986)
23. Korenberg, M., Hunter, I.: Two methods for identifying Wiener cascades having noninvertible static nonlinearities. Ann. Biomed. Eng. 27(6), 793–804 (1999)
24. Lee, Y., Schetzen, M.: Measurement of the Wiener kernels of a non-linear system by cross-correlation. Int. J. Control 2, 237–254 (1965)
25. Marmarelis, P., Marmarelis, V.: Analysis of Physiological Systems. Plenum Press, New York (1978)
26. Marmarelis, V. (ed.): Advanced Methods of Physiological System Modelling, vol. 1. Biomedical Simulations Resource, USC-LA, Los Angeles (1987)
27. Marmarelis, V. (ed.): Advanced Methods of Physiological System Modeling, vol. 2. Plenum Press, New York (1989)
28. Marmarelis, V. (ed.): Advanced Methods of Physiological System Modeling, vol. 3. Plenum Press, New York (1994)
29. Marmarelis, V.: Nonlinear Dynamic Modeling of Physiological Systems. IEEE Press, Piscataway (2004)
30. Mirbagheri, M., Barbeau, H., Kearney, R.: Intrinsic and reflex contributions to human ankle stiffness: variation with activation level and position. Exp. Brain Res. 135, 423–436 (2000)
31. Mirbagheri, M., Barbeau, H., Ladouceur, M., Kearney, R.: Intrinsic and reflex stiffness in normal and spastic, spinal cord injured subjects. Exp. Brain Res. 141, 446–459 (2001)
32. Mirbagheri, M., Alibiglou, L., Thajchayapong, M., Rymer, W.: Muscle and reflex changes with varying joint angle in hemiparetic stroke. J. NeuroEng. Rehabil. 5(1), 6 (2008)
33. Naka, K.I., Sakai, H., Naohiro, I.: Generation and transformation of second-order nonlinearity in catfish retina. Ann. Biomed. Eng. 16, 53–64 (1988)
34. Narendra, K., Gallman, P.: An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Trans. Autom. Control AC 11, 546–550 (1966)
35. Paulin, M.: A method for constructing data-based models of spiking neurons using a dynamic linear-static nonlinear cascade. Biol. Cybern. 69, 67–76 (1993)
36. Ruhe, A., Wedin, P.: Algorithms for separable nonlinear least squares problems. SIAM Rev. 22(3), 318–337 (1980)
37. Sakuranaga, M., Ando, Y.I., Naka, K.I.: Signal transmission in the catfish retina. I. Transmission in the outer retina. J. Neurophysiol. 53, 373–389 (1985)
38. Sakuranaga, M., Ando, Y.I., Naka, K.I.: Signal transmission in the catfish retina. II. Transmission to type N cell. J. Neurophysiol. 53, 390–410 (1985)
39. Sakuranaga, M., Ando, Y.I., Naka, K.I.: Signal transmission in the catfish retina. III. Transmission to type C cell. J. Neurophysiol. 53, 411–428 (1985)
40. Sjöberg, J., Viberg, M.: Separable non-linear least squares minimization – possible improvements for neural net fitting. In: IEEE Workshop on Neural Networks for Signal Processing, vol. 7, pp. 345–354 (1997)
41. Westwick, D., Kearney, R.: Separable least squares identification of a parallel cascade model of human ankle stiffness. In: Proc. IEEE EMBS Conf., Istanbul, Turkey, vol. 23, pp. 1282–1285 (2001)
42. Westwick, D., Kearney, R.: Separable least squares identification of nonlinear Hammerstein models: Application to stretch reflex dynamics. Ann. Biomed. Eng. 29(8), 707–718 (2001)
43. Westwick, D., Kearney, R.: Identification of Nonlinear Physiological Systems. IEEE Press Series in Biomedical Engineering. IEEE Press/Wiley, Piscataway (2003)
44. Westwick, D., Perreault, E.: Identification of apparently a-causal stiffness models. In: Proc. IEEE EMBS Conf., vol. 27, pp. 5611–5614 (2005)
45. Wiener, N.: Nonlinear Problems in Random Theory. Technology Press Research Monographs. Wiley, New York (1958)
46. Zhang, L., Rymer, W.: Simultaneous and nonlinear identification of mechanical and reflex properties of human elbow joint muscles. IEEE Trans. Biomed. Eng. 44(12), 1192–1209 (1997)
Chapter 24
Application of Block-oriented System Identification to Modelling Paralysed Muscle Under Electrical Stimulation

Zhijun Cai, Er-Wei Bai, and Richard K. Shield
24.1 Introduction

Most biomedical and biological systems have nonlinear dynamics [25, 27], which complicates modelling and identification. Such systems are usually represented as a series of blocks, each standing for a linear or nonlinear subsystem. Among block-oriented models, the Wiener, Hammerstein and Wiener–Hammerstein models are the most popular. The Wiener model consists of a linear dynamic block followed by a static nonlinearity, the Hammerstein model is composed in the reverse order, and the Wiener–Hammerstein model places a static nonlinearity between two linear dynamic blocks (see Figure 24.1). All three types of model have been extensively researched, and many identification methods have been proposed (e.g. [2, 3, 4, 6, 11, 23, 31, 39, 40]). In this chapter, we use a Wiener–Hammerstein system to model paralysed muscle under electrical stimulation, and propose an effective identification method specifically for the proposed model.

Spinal cord injury (SCI) may cause the loss of volitional muscle activity and, in consequence, trigger a range of deleterious adaptations. Muscle cross-sectional area declines by as much as 45% in the first six weeks after injury, with further atrophy occurring for at least six months [10]. Muscle atrophy impairs weight distribution over bony prominences, predisposing individuals with SCI to pressure ulcers, a potentially life-threatening secondary complication [32]. The diminution of muscular loading through the skeleton precipitates severe osteoporosis in paralysed limbs. The lifetime fracture risk for individuals with SCI is twice the risk experienced by the non-SCI population [38].

Er-Wei Bai and Zhijun Cai: Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA 52242 USA
Richard K. Shield: Graduate Program in Physical Therapy and Rehabilitation Science, The University of Iowa, Iowa City, IA 52242 USA
Fig. 24.1: (a) Wiener model, (b) Hammerstein model and (c) Wiener–Hammerstein model. L and NL in the figure represent the linear dynamics and the nonlinear static system, respectively
Rehabilitation interventions to prevent post-SCI muscle atrophy and its sequelae are an urgent need. Functional electrical stimulation (FES) after SCI is an effective method to induce muscle hypertrophy [20, 30], fibre-type and metabolic enzyme adaptations [1, 12], and improvements in torque output and fatigue resistance [33, 34, 35]. New evidence suggests that an appropriate longitudinal dose of muscular load can be an effective anti-osteoporosis countermeasure [21, 37, 33]. FES also has potential utility for the restoration of function in tasks such as standing, reaching and ambulating. The numerous applications of electrical stimulation after SCI have created a demand for control systems that adjust stimulus parameters in real time to accommodate muscle output changes (potentiation, fatigue) or inter-individual differences in force production. To facilitate the refinement of control system algorithms, mathematical models of paralysed muscle under electrical stimulation are continuously being developed. To adapt stimulus parameters to real-time muscle output changes most successfully, an accurate and easy-to-implement model is essential.

Over the last decades, researchers have developed a number of muscle models aimed at predicting muscle force output [7, 14, 15, 17, 18, 16, 19]. The Hill-Huxley-type model [15] is the most advanced and accurate model put forward to date [8, 22]. Compared to others, the Hill-Huxley-type model represents muscle dynamics well; however, its complexity undermines its usefulness for real-time implementation. Identification of a Hill-Huxley-type model is non-trivial because the model is time-varying, high dimensional and nonlinear. Local versus global minima are always a difficult issue, and a user must patiently tune the identification algorithm parameters (including the initial estimates) in order to obtain a good result. FES fatigue models based on the Hill-Huxley-type model for individual stimulation trains have also been developed in [14, 15, 17, 18]. Due to the structural complexity and convergence issues, the fatigue model has to be identified off-line, which is not suitable for applications that require real-time adaptation.

Our goal has been to develop fatigue and non-fatigue models that can be used in real time to predict the force output for a large class of stimulation patterns. The proposed model should be at least comparable to the Hill-Huxley-type model, but at a much reduced complexity. We propose to use a Wiener–Hammerstein system, which resembles the Hill-Huxley-type structure but has the added advantage of greater simplicity. This approach was previously suggested by Hunt and his colleagues [26] for non-fatigue force but was deemed inadequate for muscle modelling. By examining the experimental data sets and the system, we noted two problems.
First, a linear block prior to the nonlinear block was missing; and secondly, the static nonlinearity seemed suboptimal. The Wiener–Hammerstein fatigue model is developed based on a non-fatigue model proposed in our previous work [5]. Under the reasonable assumption that fatigue is a slow process, the fatigue effects are described by slowly varying model parameters. We test the proposed model on actual soleus force data under 15 Hz stimulation from 14 subjects with SCI. The advantages of the proposed model over previous ones are theoretically justified and numerically verified.
24.2 Problem Statement

A typical electrical stimulation and the corresponding muscle force responses are shown in Figure 24.2. The electrical stimulation pattern is composed of a number of trains (124 trains in Figure 24.2), denoted by the pulses, and the corresponding force output is represented by the thick curves. We observe that the peak force of each train decreases as the number of trains increases, even though the input stimulation pattern remains the same, a phenomenon referred to as fatigue. We seek a mathematical model that describes the force output under the given stimulation pattern for each individual train and also captures the fatigue phenomenon.
The Hill-Huxley-type Fatigue Model

Researchers have developed many mathematical muscle models. Some are based on the physiology of the muscle [9, 24, 28, 29], and some are not [7, 26]. However, there are few models that describe just the stimulated muscle force output under electrical stimulation.

Fig. 24.2: Force responses (thick curve) to the stimulation pulses (thin pulses) for subject 1. The muscle was stimulated by a sequence of trains with a duration of 2 seconds each, for a total of 124 trains. Each train is composed of 10 pulses at 15 Hz followed by a resting period of 1337 ms. The muscle fatigue effect is clearly shown by the reduced peak force responses of the 62nd and 124th trains
Among these, the Hill-Huxley-type fatigue model [8, 15, 22] is the most accurate model so far in the literature. It describes the force output during non-fatigue and fatigue conditions and is given by the set of equations (24.1)-(24.5):

\frac{dC_N}{dt} = \frac{1}{\tau_c} \sum_{i=1}^{n} R_i \exp\!\left(-\frac{t - t_i}{\tau_c}\right) - \frac{C_N}{\tau_c},    (24.1)

where R_i = 1 + (R_0 - 1)\exp\!\left(-\frac{t_i - t_{i-1}}{\tau_c}\right),

\frac{dy}{dt} = A\,\frac{C_N}{1 + C_N} - \frac{y}{\tau_1 + \tau_2 \frac{C_N}{1 + C_N}},    (24.2)

\frac{dA}{dt} = -\frac{A - A_{rest}}{\tau_{fat}} + \alpha_A\, y,    (24.3)

\frac{dR_0}{dt} = -\frac{R_0 - R_{0,rest}}{\tau_{fat}} + \alpha_{R_0}\, y,    (24.4)

\frac{d\tau_c}{dt} = -\frac{\tau_c - \tau_{c,rest}}{\tau_{fat}} + \alpha_{\tau_c}\, y.    (24.5)
Equations (24.1) and (24.2) describe the stimulated muscle behaviour for each individual stimulation train, and Equations (24.3)-(24.5) govern the fatigue effect by varying the parameters A, R_0 and τ_c. In Equations (24.1) and (24.2), t_i is the time of the ith stimulation input, C_N is the (internal) state variable, A and R_0 are gains, y(t) is the force output, and τ_1, τ_2 and τ_c are time constants. Note that the input amplitude is not used directly; only the input timing sequence is. The effect of the input amplitude is absorbed into the parameters R_i and τ_c. In Equations (24.3)-(24.5), α_A, α_{R_0} and α_{τ_c} are decaying parameters and τ_{fat} is the recovery rate; these four parameters govern the fatigue model. A_{rest}, R_{0,rest} and τ_{c,rest} are the parameter values at the initial (non-fatigue) stage, which have to be identified from the non-fatigue model (24.1) and (24.2). Note that the output force y enters Equations (24.3)-(24.5) as a feedback. To identify the Hill-Huxley-type model, an off-line multi-step method is used [15]. First, τ_c is fixed at 0.02 s to obtain the other two time constants τ_1 and τ_2, which are then fixed at the obtained values in the next step; the parameters A, R_0 and τ_c are then identified for each individual stimulation train. The last step is to derive the fatigue parameters α_A, α_{R_0}, α_{τ_c} and τ_{fat} using the actual force output. This procedure is time-consuming and has to be done off-line, owing to the complexity of the model and an algorithm that suffers from local minimum problems.
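For illustration, the non-fatigue part of the model, Equations (24.1) and (24.2), can be simulated directly. The following MATLAB sketch uses forward-Euler integration over one 2-second train; the parameter values are placeholders, not identified values, and the first pulse is taken to have R_1 = 1, which is an assumption of this sketch.

```matlab
% Forward-Euler simulation of the non-fatigue Hill-Huxley-type model,
% Equations (24.1)-(24.2). Parameter values below are illustrative only.
A = 2; R0 = 2; tc = 0.02; t1 = 0.05; t2 = 0.1;   % gain and time constants [s]
ti = (0:9)/15;                                   % 10 pulses at 15 Hz
dt = 2e-4;  t = 0:dt:2;                          % integration grid, one train
Ri = [1, 1 + (R0-1)*exp(-diff(ti)/tc)];          % pulse scaling factors
CN = 0; y = 0; Y = zeros(size(t));
for k = 1:numel(t)
  s = 0;                                         % summation term of (24.1)
  for i = 1:numel(ti)
    if t(k) >= ti(i)
      s = s + Ri(i)*exp(-(t(k) - ti(i))/tc);
    end
  end
  dCN = s/tc - CN/tc;                            % Eq. (24.1)
  g   = CN/(1 + CN);
  dy  = A*g - y/(t1 + t2*g);                     % Eq. (24.2)
  CN  = CN + dt*dCN;  y = y + dt*dy;
  Y(k) = y;
end
plot(t, Y)    % predicted force for a single stimulation train
```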
24.3 The Wiener–Hammerstein Fatigue Model

The proposed Wiener–Hammerstein (block-oriented) model for a single stimulation train is shown in Figure 24.3 (a), with the nonlinearity w = f(v_0) given by

f(v_0) = \frac{A v_0}{1 + B v_0},

where B and A are unknown parameters that are adjusted for each individual subject.
Fig. 24.3: (a): Wiener–Hammerstein muscle model. (b): The middle nonlinear block can be decomposed into three parts. (c): The simplified Wiener–Hammerstein muscle model
The system is in the discrete-time domain for easy use on digital computers. The input is the electrical stimulus (in volts) at time kT, where T (0.2 ms in this paper) is the sampling interval, and the output y(k) = y(kT) is the muscle force at time kT. The internal signals v(k) = v(kT) and w(k) = w(kT) are unavailable. The linear blocks before and after the nonlinearity are first-order dynamic systems. The Wiener–Hammerstein model resembles the structure of the Hill-Huxley-type model but at a much reduced complexity. By decomposing the middle block into three blocks (Figure 24.3 (b)) and combining the gains B and A/B with a_0 and b_0, respectively, we obtain the system in Figure 24.3 (c), where a_2 = a_0 B and b_2 = b_0 A/B. This process greatly simplifies the identification problem, reducing the number of unknown parameters from six to four. Also, no additional sequence of t_i's is needed, which is not the case for the Hill-Huxley-type model. It is important to note that the system in Figure 24.3 (c) is identical to the system in Figure 24.3 (a) from an input-output point of view, though the complexity is greatly reduced. The identification algorithm and convergence analysis of the Wiener–Hammerstein model for a single train (non-fatigue) are described in [5]. Here we concentrate on the Wiener–Hammerstein fatigue model, which is built upon the Wiener–Hammerstein model for a single stimulation train. From Figure 24.3, the fatigue model is described by the following set of equations (24.6)-(24.9):

y_p(kT) = \frac{b_2(p)}{z - b_1(p)}\, f\!\left(\frac{1}{z - a_1(p)}\, u_p(kT)\right),    (24.6)

a_1(p+1) = \alpha_{a_1}(p)\, a_1(p) + \beta_{a_1}(p)\, a_1(p-1),    (24.7)

b_1(p+1) = \alpha_{b_1}(p)\, b_1(p) + \beta_{b_1}(p)\, b_1(p-1),    (24.8)

b_2(p+1) = \alpha_{b_2}(p)\, b_2(p) + \beta_{b_2}(p)\, b_2(p-1).    (24.9)
Fig. 24.4: The cost function J(â_1(p), h(â_1)) vs. â_1. The right plot is a zoom of the left plot, with â_1 ∈ [0.9, 1]
In Equation (24.6), b_1(p), b_2(p) and a_1(p) are the parameters of the pth (p > 2) train in the stimulation pattern, f(x) = x/(1+x) is the saturation function, and y_p(kT) and u_p(kT) are the force output and stimulus input for the pth stimulation train, respectively. Note that in the fatigue model we set a_2(p) = 1 for all stimulation trains, because the fitting performance is not sensitive to a_2 and this reduces the number of parameters to three for each individual train. For stability reasons, we set 0 ≤ b_1(p) ≤ 1 and 0 ≤ a_1(p) ≤ 1. The time-varying nature of the parameters b_1(p), b_2(p) and a_1(p) reflects the muscle fatigue effects (Equations (24.7)-(24.9)). It is reasonable to assume that the parameters of each individual train vary slowly, so that they can be predicted by extrapolation of the two previous corresponding parameters. In Equations (24.7)-(24.9), \alpha_{a_1}, \alpha_{b_1}, \alpha_{b_2}, \beta_{a_1}, \beta_{b_1} and \beta_{b_2} are coefficients obtained through least squares:

\begin{bmatrix} \alpha_{a_1}(p) \\ \beta_{a_1}(p) \end{bmatrix} = (A_1^T A_1)^{-1} A_1^T \begin{bmatrix} a_1(3) \\ \vdots \\ a_1(p) \end{bmatrix},    (24.10)

\begin{bmatrix} \alpha_{b_1}(p) \\ \beta_{b_1}(p) \end{bmatrix} = (B_1^T B_1)^{-1} B_1^T \begin{bmatrix} b_1(3) \\ \vdots \\ b_1(p) \end{bmatrix},    (24.11)

\begin{bmatrix} \alpha_{b_2}(p) \\ \beta_{b_2}(p) \end{bmatrix} = (B_2^T B_2)^{-1} B_2^T \begin{bmatrix} b_2(3) \\ \vdots \\ b_2(p) \end{bmatrix},    (24.12)

where

A_1 = \begin{bmatrix} a_1(2) & a_1(1) \\ a_1(3) & a_1(2) \\ \vdots & \vdots \\ a_1(p-1) & a_1(p-2) \end{bmatrix}, \quad B_1 = \begin{bmatrix} b_1(2) & b_1(1) \\ b_1(3) & b_1(2) \\ \vdots & \vdots \\ b_1(p-1) & b_1(p-2) \end{bmatrix}, \quad B_2 = \begin{bmatrix} b_2(2) & b_2(1) \\ b_2(3) & b_2(2) \\ \vdots & \vdots \\ b_2(p-1) & b_2(p-2) \end{bmatrix}.
Note that only the data up to the (p−1)th train are needed to calculate \alpha_{a_1}(p), \beta_{a_1}(p), \alpha_{b_1}(p), \beta_{b_1}(p), \alpha_{b_2}(p) and \beta_{b_2}(p). This causal property is important because it makes identification in real time feasible.
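Each of (24.10)-(24.12) is an ordinary two-parameter regression; in MATLAB it reduces to a single backslash solve. A minimal sketch for the a_1 sequence (identical code serves b_1 and b_2), assuming a1hist is a hypothetical vector holding the values identified for trains 1, ..., p, with p ≥ 4 so the fit is well posed:

```matlab
% Predict a1(p+1) from the identified history via (24.7) and (24.10).
function a1next = extrapolate_a1(a1hist)
  a  = a1hist(:);  p = numel(a);       % column vector, p >= 4
  A1 = [a(2:p-1), a(1:p-2)];           % rows [a1(k), a1(k-1)], k = 2..p-1
  ab = A1 \ a(3:p);                    % least squares [alpha; beta]
  a1next = ab(1)*a(p) + ab(2)*a(p-1);  % extrapolation, Eq. (24.7)
end
```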
24.4 Identification of the Wiener–Hammerstein System

24.4.1 Identification of the Wiener–Hammerstein Non-fatigue Model (Single Train Stimulation Model)

Before presenting the identification algorithm for the Wiener–Hammerstein fatigue model, we need to show how to identify the Wiener–Hammerstein non-fatigue model from the given input and output data for the pth single stimulation train. Let θ(p) = [a_1(p), b_1(p), b_2(p)] and θ̂(p) = [â_1(p), b̂_1(p), b̂_2(p)] denote the unknown system parameters and their estimates, respectively. Let ŷ_p(kT) be the predicted output calculated using the estimates:

\hat{y}_p(kT) = \frac{\hat{b}_2(p)}{z - \hat{b}_1(p)}\, f\!\left(\frac{1}{z - \hat{a}_1(p)}\, u_p(kT)\right).    (24.13)

We want to find the best parameter set θ* = [a_1^*(p), b_1^*(p), b_2^*(p)] that minimises the sum of squared errors between the actual output y_p(kT) and ŷ_p(kT):

\theta^* = \arg\min_{\hat{\theta}} \sum_k \big(y_p(kT) - \hat{y}_p(kT)\big)^2.    (24.14)
Obviously, (24.14) is a nonlinear optimisation problem because of the involvement of the nonlinear function f, and thus, in general, local minima are always a tough issue. However, we show that this is not a problem for (24.14).

Suppose a value of â_1(p) is given. Then the internal signal

\hat{w}_p(kT) = f\!\left(\frac{1}{z - \hat{a}_1(p)}\, u_p(kT)\right)

can be calculated. Based on this internal signal and the model

\hat{y}_p((k+1)T) = \hat{b}_1(p)\,\hat{y}_p(kT) + \hat{b}_2(p)\,\hat{w}_p(kT, \hat{a}_1(p)),

the optimal values of b̂_1(p) and b̂_2(p) for the given value of â_1(p) are the solution of

[\hat{b}_1(p), \hat{b}_2(p)]^* = \arg\min_{[\hat{b}_1(p), \hat{b}_2(p)]} \sum_k \big(y_p(kT) - \hat{y}_p(kT)\big)^2 = \arg\min_{[\hat{b}_1(p), \hat{b}_2(p)]} \sum_k \left( y_p(kT) - \sum_{i=1}^{k-1} \hat{b}_1^{\,k-1-i}(p)\, \hat{b}_2(p)\, \hat{w}_p(iT, \hat{a}_1(p)) \right)^{\!2}.    (24.15)

Taking the derivative of the above cost function with respect to b̂_2(p) and setting it to zero yields

\hat{b}_2(p) = \frac{\sum_k y_p(kT) \sum_{i=1}^{k-1} \hat{b}_1^{\,k-1-i}(p)\, \hat{w}_p(iT, \hat{a}_1(p))}{\sum_k \left( \sum_{i=1}^{k-1} \hat{b}_1^{\,k-1-i}(p)\, \hat{w}_p(iT, \hat{a}_1(p)) \right)^{\!2}},    (24.16)

which is well defined provided \sum_k \sum_{i=1}^{k-1} \hat{b}_1^{\,k-1-i}(p)\, \hat{w}_p(iT, \hat{a}_1(p)) \neq 0. Substituting (24.16) for b̂_2(p), the optimisation (24.15) becomes one-dimensional in b̂_1(p). By visualising the cost function versus b̂_1 ∈ [0, 1], as in Figure 24.5, it is easily seen that there is only one (global) minimum in that range. Then any nonlinear optimisation algorithm can be used to find the global minimum, and the optimal b̂_2(p) is finally obtained from b̂_1(p) as in (24.16). This process guarantees a unique optimal solution b̂_1(p) and b̂_2(p) for a given â_1(p), written as

[\hat{b}_1(p), \hat{b}_2(p)]^T = h(\hat{a}_1(p)),    (24.17)

and the minimisation problem (24.14) in three parameters becomes a minimisation problem in one parameter:

\min\; J\big(\hat{a}_1(p), h(\hat{a}_1(p))\big).    (24.18)
Although the minimisation problem has been simplified from (24.14) to (24.18), (24.18) is still nonlinear. But it is one-dimensional, and the cost function J versus â_1 can now easily be plotted and visualised, as in Figure 24.4. There, it is clearly seen that there is one (global) minimum for â_1 ∈ [0, 1], located in the range â_1 ∈ [0.98, 1]. In fact, for both one-dimensional minimisation problems, over b̂_1 in (24.15) and over â_1 in (24.18), we have found that, for all subjects tested, there is only one (local and global) minimum, so any nonlinear optimisation algorithm can be used to find it. We now summarise the identification algorithm for the single train response (non-fatigue model).

Single train (non-fatigue) model identification algorithm. Given the data set u_p(k_pT) and y_p(k_pT) for the pth stimulation train, with k_p = 1, 2, ..., N_p:

1. For each â_1(p), use any nonlinear optimisation algorithm to find the optimal b̂_1(p) and b̂_2(p) of (24.15).
2. Apply any nonlinear optimisation algorithm to find the optimal â_1 ∈ [0, 1], and compute the corresponding b̂_1(p) and b̂_2(p).

In simulations, we use the modified MATLAB program "fminsearchbnd" to solve the nonlinear optimisation problem in both Step 1 and Step 2.
Fig. 24.5: The cost function vs. b̂_1, with â_1 = 0.9
The MATLAB program "fminsearchbnd" is based on the Nelder–Mead simplex approach and is able to handle simple upper- and lower-bound constraints.
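A minimal MATLAB sketch of the resulting nested one-dimensional search, using the built-in bounded minimiser fminbnd in place of fminsearchbnd; u and y are assumed to be column vectors of stimulus and force samples for one train, and the inner step uses the closed-form b̂_2 of (24.16) in its filtered form:

```matlab
% Single-train (non-fatigue) identification by two nested 1-D searches.
function [a1, b1, b2, J] = fit_single_train(u, y)
  a1 = fminbnd(@(a) inner_min(a, u, y), 0, 1);   % outer search, Eq. (24.18)
  [J, b1, b2] = inner_min(a1, u, y);
end

function [J, b1, b2] = inner_min(a1, u, y)
  v  = filter([0 1], [1 -a1], u);                % 1/(z - a1) block
  w  = v ./ (1 + v);                             % saturation f(x) = x/(1+x)
  b1 = fminbnd(@(b) proj_cost(b, w, y), 0, 1);   % 1-D search over b1
  [J, b2] = proj_cost(b1, w, y);
end

function [J, b2] = proj_cost(b1, w, y)
  g  = filter([0 1], [1 -b1], w);                % b2-free output direction
  b2 = (g' * y) / (g' * g);                      % closed form, Eq. (24.16)
  J  = sum((y - b2 * g).^2);                     % squared prediction error
end
```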
24.4.2 Identification of the Wiener–Hammerstein Fatigue Model

Now we are ready to present the identification algorithm for the fatigue model.

Online fatigue model identification algorithm. Given the data set u_p(k_pT) and y_p(k_pT), where p = 1, 2, ..., N is the stimulation train number and k_p ∈ {1, 2, ..., N_p}, with N_p being the total number of samples for the pth stimulation train and N the total number of stimulation trains:

1. Identify the single train (non-fatigue) models for the first k_s stimulation trains, which are assumed to be non-fatigued. Let p = k_s.
2. Format the parameters â_1(p), b̂_1(p) and b̂_2(p) into A_1, B_1 and B_2 as in Equations (24.10)-(24.12), and apply Equations (24.10)-(24.12) to obtain the coefficients \alpha_{a_1}, \alpha_{b_1}, \alpha_{b_2}, \beta_{a_1}, \beta_{b_1} and \beta_{b_2}.
3. Substituting these coefficients into (24.7)-(24.9) gives the predicted values of a_1(p+1), b_1(p+1) and b_2(p+1) for the next stimulation train.
4. The force response ŷ_{p+1}(k_{p+1}T) of the next stimulation train is predicted by substituting a_1(p+1), b_1(p+1) and b_2(p+1) into Equation (24.6).
5. Collect the actual input and force output for the (p+1)th stimulation train.
6. If p = N − 1, stop. Otherwise let p = p + 1 and go back to Step 2.

Key points:
• Because of the simple structure of the proposed model and the global optimum properties of the least squares problems (24.10)-(24.12), (24.16) and (24.18), this identification procedure is suitable for online identification; that is, one can identify the model
for each stimulation train and use those parameters to predict the force response for the next stimulation train.
• In Equations (24.10)-(24.12), the identified parameters of the current and previous trains are used to obtain the model parameters of the next stimulation train. The number of previous trains used can be increased, which makes the prediction more accurate but increases the complexity.
• The first k_s trains are used to establish the initial fatigue model, and then the parameters a_1(p+1), b_1(p+1) and b_2(p+1) are updated in real time. The exact value of k_s can be chosen based on some criterion, for example, as the train at which the fatigue phenomenon is first observed.
24.5 Collection of SCI Patient Data

Fourteen subjects with chronic SCI (> 2 years) provided written informed consent, as approved by the University of Iowa Human Subjects Institutional Review Board. A detailed description of the stimulation and force-transducing systems has been reported previously [33, 34, 35] (Figure 24.6). In brief, the subject sat in a wheelchair with the knee and ankle positioned at ninety degrees. The foot rested upon a rigid metal plate, and the ankle was secured with a soft cuff and turnbuckle connectors. Padded straps over the knee and forefoot ensured isometric conditions. The tibial nerve was supramaximally stimulated in the popliteal fossa using a nerve probe and a custom computer-controlled constant-current stimulator. Stimulation was controlled by digital pulses from a data-acquisition board (Metrabyte DAS 16F, Keithley Instruments Inc., Cleveland, OH) housed in a microcomputer under custom software control.
Fig. 24.6: Schematic representation of the limb fixation and force measurement system
The stimulator was programmed to deliver a 10-pulse train (15 Hz; train duration: 667 ms) every 2 seconds, for a total of 124 trains. In this paper, we consider only stimulation at 15 Hz. This is because muscular overload (about 60% of maximal torque) can be generated via 15 Hz supramaximal stimulation [35], and eliciting muscle contractions with a 1-on : 2-off work-rest cycle (Burke-like protocol) at a 15 Hz frequency induces significant low-frequency fatigue without compromising neuromuscular transmission [13, 36]. The ensuing soleus isometric plantar flexion torques were measured via a load cell (Genisco AWU-250) positioned under the first metatarsal head. Torque was amplified 500 times (FPM 67, Therapeutics Unlimited) and input to a 12-bit resolution analog-to-digital converter at a sampling rate of 5000 samples per second. The digitised signals were analysed with Datapac 2K2 software (RUN Technologies, Mission Viejo, CA).
24.6 Results

We applied the Wiener–Hammerstein model to 14 subjects. To show the advantages of the proposed model, we compared its performance against the Hill-Huxley-type model. We tried to obtain the Hill-Huxley-type fatigue model using the algorithm presented in [15], which unfortunately did not perform well. Therefore, we identified the Hill-Huxley model train by train using Equations (24.1) and (24.2) (with the optimal τ_1, τ_2 identified in advance), and denoted the predicted force of each train by

\frac{d\hat{C}_N}{dt} = \frac{1}{\hat{\tau}_c(p)} \sum_{i=1}^{n} \hat{R}_i \exp\!\left(-\frac{t - t_i}{\hat{\tau}_c(p)}\right) - \frac{\hat{C}_N}{\hat{\tau}_c(p)},    (24.19)

where \hat{R}_i = 1 + (\hat{R}_0 - 1)\exp\!\left(-\frac{t_i - t_{i-1}}{\hat{\tau}_c(p)}\right), and

\frac{d\hat{y}_p}{dt} = \hat{A}(p)\,\frac{\hat{C}_N}{1 + \hat{C}_N} - \frac{\hat{y}_p}{\tau_1 + \tau_2 \frac{\hat{C}_N}{1 + \hat{C}_N}}.    (24.20)
The actual and predicted peak forces for the Hill-Huxley-type model are defined as

F_{pk}(p) = \max_{t \in [t_p,\, t_{p+1}]} y_p(t), \qquad \hat{F}_{pk}(p) = \max_{t \in [t_p,\, t_{p+1}]} \hat{y}_p(t),    (24.21)

where t_p is the starting time of the pth train. The performance of the proposed Wiener–Hammerstein model is compared against the performance of (24.19)-(24.21). It is important to note that the performance of (24.19)-(24.21) is better than that of the actual Hill-Huxley fatigue model (24.3)-(24.5); hence, the comparison appears reasonable. Similarly, the actual and predicted peak forces for the proposed model are defined as

F_{pk}(p) = \max_{k_p \in \{1, 2, \ldots, N_p\}} y_p(k_pT), \qquad \hat{F}_{pk}(p) = \max_{k_p \in \{1, 2, \ldots, N_p\}} \hat{y}_p(k_pT).    (24.22)
There are 124 trains in total, and each train is composed of 10 pulses at 15 Hz followed by a resting period of 1337 ms (1/3 on : 2/3 rest). Figures 24.7 and 24.8 show the fits of the proposed model for the first and last three trains, respectively. Both figures demonstrate that the model fits the actual force response very well. The predicted parameters a_1(p+1), b_1(p+1) and b_2(p+1) (crosses) of the fatigue model, based on (24.7)-(24.9), are shown in Figure 24.9, along with the individually identified ones (dots) based on the data within individual trains. This illustrates that the proposed model is capable of capturing the properties of individual trains as well as the fatigue phenomenon. Figure 24.10 shows the predicted (stars) and actual (dots) peak forces for subject 1; they match well in both shape and magnitude. For each subject, the goodness-of-fit (gof) is calculated for each train, and the average over trains is used for comparison; see Equation (24.23):

gof_p = 1 - \sqrt{\frac{\sum_{k_p=1}^{N_p} \big(y_p(k_pT) - \hat{y}_p(k_pT)\big)^2}{\sum_{k_p=1}^{N_p} \big(y_p(k_pT) - \bar{\hat{y}}_p\big)^2}},    (24.23)

gof_{ave} = \frac{1}{N} \sum_{p=1}^{N} gof_p, \qquad N = 124.
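In MATLAB, the per-train measure (24.23) is a one-line computation; a minimal sketch, assuming y and yhat are column vectors holding the measured and predicted force of one train:

```matlab
% Goodness-of-fit of one train, Eq. (24.23). Averaging gof over all
% N = 124 trains gives gof_ave.
gof = 1 - sqrt(sum((y - yhat).^2) / sum((y - mean(yhat)).^2));
```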
The average goodness-of-fit over all 124 stimulation trains for each of the 14 subjects is given in Table 24.1. The proposed model substantially outperforms the Hill-Huxley-type model in gof_ave (0.9102 vs. 0.7805). Keep in mind that the figures for the proposed model are predictions, while those for the Hill-Huxley-type model reflect only the off-line identification result.
Fig. 24.7: The force response of the first three stimulation trains for actual output (solid) and the predicted output (dashed) for subject 1
Fig. 24.8: The force response of the last three stimulation trains for actual output (solid) and the predicted output (dashed) for subject 1. The total number of trains is 124
Fig. 24.9: Predicted (x) and identified (dot) model parameters a_1, b_1 and b_2 for subject 1
As for the prediction performance on the peak force, we use the correlation coefficient, as used in [15], between the actual peak force and predicted peak force, which is defined as

R = \frac{\sum_{p=1}^{N} \big(F_{pk}(p) - \bar{F}_{pk}\big)\big(\hat{F}_{pk}(p) - \bar{\hat{F}}_{pk}\big)}{\sqrt{\sum_{p=1}^{N} \big(F_{pk}(p) - \bar{F}_{pk}\big)^2 \, \sum_{p=1}^{N} \big(\hat{F}_{pk}(p) - \bar{\hat{F}}_{pk}\big)^2}},    (24.24)
where \bar{F}_{pk} and \bar{\hat{F}}_{pk} are the averages of the actual and predicted peak forces over all stimulation trains, respectively. This measure tells us how well the predicted peak force correlates with the actual peak force of each individual train. A problem with the use of correlation coefficients is that they cannot capture discrepancies between the actual and predicted peak forces due to a DC shift or scaling; that is, the correlation is a measure of association, not agreement. For instance, the correlation coefficient could be close to one even though the actual and predicted peak forces differ substantially. Consequently, a very high correlation coefficient does not always imply a small error in predicting the actual force.
Fig. 24.10: Actual (dot) and predicted (star) peak force of each stimulation train for subject 1

Table 24.1: Average goodness-of-fit (gof_ave) for each stimulation train, and the correlation coefficient (R) and goodness-of-fit (gof_pk) between the actual and predicted peak force, for 14 subjects, using the proposed fatigue model and the Hill-Huxley-type model, respectively

           Wiener–Hammerstein model        Hill-Huxley-type model
Subject    gof_ave   R        gof_pk       gof_ave   R        gof_pk
1          0.9363    0.9993   0.9532       0.8077    0.9978   0.7721
2          0.8938    0.9993   0.9331       0.7692    0.9989   0.8443
3          0.8889    0.9967   0.8375       0.8437    0.9971   0.5190
4          0.8958    0.9976   0.8924       0.7826    0.9978   0.7481
5          0.9286    0.9997   0.9643       0.8410    0.9993   0.8428
6          0.9212    0.9983   0.9212       0.8154    0.9994   0.8833
7          0.9187    0.9990   0.9337       0.8673    0.9982   0.9039
8          0.9223    0.9995   0.8910       0.7810    0.9952   0.5571
9          0.9217    0.9992   0.9170       0.7311    0.9969   0.7216
10         0.9265    0.9994   0.9374       0.7734    0.9988   0.7558
11         0.9110    0.9982   0.8557       0.8020    0.9924   0.2200
12         0.8687    0.9975   0.7262       0.7889    0.9971   0.8284
13         0.9025    0.9993   0.8886       0.7249    0.9936   0.4065
14         0.9072    0.9996   0.9351       0.5992    0.9821   0.4535
Ave        0.9102    0.9988   0.8990       0.7805    0.9961   0.6755
A better criterion is the goodness-of-fit, widely used in the identification literature. Since we are more concerned with the peak force of each stimulation train during FES, we want to know how well the predicted peak force fits the actual peak force of each individual train, as measured by

gof_{pk} = 1 - \sqrt{\frac{\sum_{p=1}^{N} \big(F_{pk}(p) - \hat{F}_{pk}(p)\big)^2}{\sum_{p=1}^{N} \big(F_{pk}(p) - \bar{\hat{F}}_{pk}\big)^2}}.    (24.25)

Table 24.1 also shows the peak-force correlation coefficient and the goodness-of-fit between the actual and predicted peak forces for the proposed Wiener–Hammerstein fatigue model and the Hill-Huxley-type model. The proposed fatigue model has a correlation coefficient similar to that of the Hill-Huxley-type individual model (0.9988 vs. 0.9961 in R). In terms of the gof, the proposed model outperforms the Hill-Huxley model substantially (0.8990 vs. 0.6755 in gof_pk).
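Both peak-force measures are straightforward to compute; a short MATLAB sketch, assuming Fpk and Fpkh are column vectors of the actual and predicted per-train peak forces:

```matlab
% Peak-force agreement measures, Eqs. (24.24) and (24.25).
C      = corrcoef(Fpk, Fpkh);       % 2x2 correlation matrix
R      = C(1, 2);                   % Eq. (24.24)
gof_pk = 1 - sqrt(sum((Fpk - Fpkh).^2) / ...
                  sum((Fpk - mean(Fpkh)).^2));   % Eq. (24.25)
```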
24.7 Discussion and Conclusions

We have developed a Wiener–Hammerstein model for paralysed muscle under electrical stimulation and presented an effective identification method. The proposed model is much less complicated than the most accurate model in the literature, yet, by taking advantage of the block-oriented structure, it still captures the characteristics of the stimulated muscle, and the fitting and prediction performances are not diminished. The proposed model predicts well not only the peak force trend (high gof_pk) but also the force output profile (high gof_ave) of each individual stimulation train. The proposed model is appropriate for online implementation because of its single global-optimum property, which allows the model to be identified quickly and accurately. Patients' muscles fatigue easily because of the synchronous recruitment mechanism of FES during stimulation training, so accurate and fast online prediction is critical, especially over relatively long sessions. Further, this model has strong potential to be incorporated into a feedback-controlled FES system. In this paper, the fatigue model was tested only on 15 Hz stimulation; other stimulation frequencies and hybrid stimulation frequencies will be addressed in subsequent experiments. Our future goals are to analyse other stimulation protocols and to implement the algorithm in a real-time feedback FES system.
References

1. Andersen, J.L., Mohr, T., et al.: Myosin heavy chain isoform transformation in single fibres from m. vastus lateralis in spinal cord injured individuals: effects of long-term functional electrical stimulation (FES). Pflugers Archiv - European Journal of Physiology 431, 513–518 (1996)
2. Bai, E.W.: An optimal two-stage identification algorithm for Hammerstein–Wiener nonlinear systems. Automatica 34, 333–338 (1998)
3. Bai, E.W.: Frequency domain identification of Hammerstein models. IEEE Trans. Autom. Control 48, 530–542 (2003)
4. Bai, E.W.: Frequency domain identification of Wiener models. Automatica 39, 1521–1530 (2003)
5. Bai, E.W., Cai, Z.J.: Identification of a modified Wiener–Hammerstein system and its application in electrically stimulated paralyzed skeletal muscle modeling. Automatica 45, 736–743 (2009)
6. Bai, E.W., Fu, M.: A blind approach to Hammerstein model identification. IEEE Trans. Signal Processing 50, 1610–1619 (2002)
7. Bobet, J., Stein, R.B.: A simple model of force generation by skeletal muscle during dynamic isometric contractions. IEEE Trans. Biomed. Eng. 13, 1010–1016 (1998)
8. Bobet, J., Stein, R.B.: A comparison of models of force production during stimulated isometric ankle dorsiflexion in humans. IEEE Trans. Neural Syst. Rehabil. Eng. 13, 444–451 (2005)
9. Brown, I.E., Scott, S.H., Loeb, G.E.: Mechanics of feline soleus: II. Design and validation of a mathematical model. J. Muscle Res. Cell. Motil. 17, 219–232 (1996)
10. Castro, M.J., Apple, D.F., et al.: Influence of complete spinal cord injury on skeletal muscle cross-sectional area within the first 6 months of injury. European Journal of Applied Physiology & Occupational Physiology 80, 373–378 (1999)
11. Crama, P., Schoukens, J.: Initial estimates of Wiener and Hammerstein systems using multisine excitation. IEEE Transactions on Instrumentation and Measurement 50, 1971–1975 (2001)
12. Crameri, R.M., Weston, A., et al.: Effects of electrical stimulation-induced leg training on skeletal muscle adaptability in spinal cord injury. Scandinavian Journal of Medicine & Science in Sports 80, 316–322 (2002)
13. Deluca, C.: Myoelectrical manifestations of localized muscular fatigue in humans. CRC Crit. Rev. Biomed. Eng. 11, 251–279 (1984)
14. Ding, J., Binder-Macleod, S.A., et al.: Two-step, predictive, isometric force model tested on data from human and rat muscles. J. Appl. Physiol. 85(6), 2176–2189 (1998)
15. Ding, J., Wexler, A.S., et al.: A predictive model of fatigue in human skeletal muscles. J. Appl. Physiol. 89, 1322–1332 (2000)
16. Ding, J., Wexler, A.S., et al.: A mathematical model that predicts the force-frequency relationship of human skeletal muscle. Muscle Nerve 26, 477–485 (2002)
17. Ding, J., Wexler, A.S., et al.: A predictive fatigue model–I: Predicting the effect of stimulation frequency and pattern on fatigue. IEEE Trans. Neural Syst. Rehabil. Eng. 10(1), 48–58 (2002)
18. Ding, J., Wexler, A.S., et al.: A predictive fatigue model–II: Predicting the effect of resting times on fatigue. IEEE Trans. Neural Syst. Rehabil. Eng. 10(1), 59–67 (2002)
19. Ding, J., Wexler, A.S., et al.: Mathematical models for fatigue minimization during functional electrical stimulation. Electromyogr. Kinesiol. 13, 575–588 (2003)
20. Dudley, G.A., Castro, M.J., et al.: A simple means of increasing muscle size after spinal cord injury: a pilot study. European Journal of Applied Physiology & Occupational Physiology 80, 394–396 (1999)
21. Dudley-Javoroski, S., Shields, R.K.: Case report: Dose estimation and surveillance of mechanical load interventions for bone loss after spinal cord injury. Physical Therapy (2008) (in press)
22. Frey Law, L.A., Shields, R.K.: Mathematical models of human paralyzed muscle after long-term training. J. Biomech. (2007)
23. Greblicki, W.: Continuous time Hammerstein system identification. IEEE Trans. Automat. Contr. 45, 1232–1236 (2000)
24. Gribble, P.L., Ostry, D.J.: Origins of the power law relation between movement velocity and curvature: Modeling the effects of muscle mechanics and limb dynamics. J. Neurophysiol. 76, 2853–2860 (1996)
25. Haber, R., Unbehauen, H.: Structure identification of nonlinear dynamic systems: A survey on input/output approaches. Automatica 26, 651–677 (1990)
26. Hunt, K.J., Munih, M., et al.: Investigation of the Hammerstein hypothesis in the modeling of electrically stimulated muscle. IEEE Trans. Biomed. Eng. 45, 998–1009 (1998)
27. Hunter, I.W., Korenberg, M.J.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biol. Cybern. 55, 135–144 (1986)
28. Krylow, A.M., Rymer, W.Z.: Role of intrinsic muscle properties in producing smooth movements. IEEE Trans. Biomed. Eng. 44, 165–176 (1997)
29. Loeb, G.E., Brown, I.E., Cheng, E.J.: A hierarchical foundation for models of sensorimotor control. Exp. Brain Res. 126, 1–18 (1999)
30. Mahoney, E.T., Bickel, C.S., et al.: Changes in skeletal muscle size and glucose tolerance with electrically stimulated resistance training in subjects with chronic spinal cord injury. Arch. Phys. Med. Rehabil. 86, 1502–1504 (2005)
31. Narendra, K.S., Gallman, P.G.: An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Transactions on Automatic Control 11, 546–550 (1966)
32. Shields, R.K., Dudley-Javoroski, S.: Musculoskeletal deterioration and hemicorporectomy after spinal cord injury. Physical Therapy 83, 263–275 (2003)
33. Shields, R.K., Dudley-Javoroski, S.: Musculoskeletal plasticity after acute spinal cord injury: Effects of long-term neuromuscular electrical stimulation training. Journal of Neurophysiology 95, 2380–2390 (2006)
34. Shields, R.K., Dudley-Javoroski, S.: Musculoskeletal adaptation in chronic spinal cord injury: effects of long-term soleus electrical stimulation training. Journal of Neurorehabilitation and Neural Repair 21, 169–179 (2006)
35. Shields, R.K., Dudley-Javoroski, S., et al.: Post-fatigue potentiation of paralyzed soleus muscle: Evidence for adaptation with long-term electrical stimulation training. Journal of Applied Physiology 101, 556–565 (2006)
36. Shields, R.K., Frey Law, L.A.: Effects of electrically induced fatigue on the twitch and tetanus of paralyzed soleus muscle in humans. J. Appl. Physiol. 82(5), 1499–1507 (1997)
37. Shields, R.K., Schlechte, J., et al.: Bone mineral density after spinal cord injury: a reliable method for knee measurement. Arch. Phys. Med. Rehabil. 86, 1969–1973 (2005)
38. Vestergaard, P., Krogh, K., et al.: Fracture rates and risk factors for fractures in patients with spinal cord injury. Spinal Cord 36, 790–796 (1998)
39. Vörös, J.: Parameter identification of discontinuous Hammerstein systems. Automatica 33, 1141–1146 (1997)
40. Westwick, D., Verhaegen, M.: Identifying MIMO Wiener systems using subspace model identification methods. Signal Processing 52, 235–258 (1998)
Index
ARMAX system, 69 AutoRegressive with eXternal input (ARX), 69, 241, 295 Backlash, 304, 348, 367 characteristic, 369 parameters identification, 371 Basis function, 217 Best linear approximation, 209 Bi-convex optimisation, 243 Biased estimates, 91 Binary, 337 Blind identification, 91, 97, 108, 273, 293, 303, 314, 329 asymptotic properties, 279, 285 cost function, 278 Cramér-Rao lower bound, 282 impact of output noise, 283, 286 initial estimates, 280 laboratory experiment, 287 negative log-likelihood function, 278 simulation results, 283 uncertainty bounds, 282 Block linear, 368 nonlinear, 368 Block-oriented models, 89, 273, 369 Bounded errors, 368 errors-in-variables, 375 noise, 369 uncertainties, 375 Bounds computation, 374
parameters, 370, 374 parameters evaluation, 375 Bussgang’s theorem, 152 CCA, 229 Central estimate, 375 Chebyshev Polynomials, Hermite Polynomials, 389 Closed-loop system identification, 230 Coloured noise, 95, 105 Combined method, 120 Component-wise LS-SVM, 247 Compound operator, 35 Consistent estimates, 91 Control valve stiction, 303, 306 Convergence a.s., 71 Convex LMI relaxation, 376 Convex optimisation, 245 Cross-correlation analysis, 115 Curse of dimensionality, 130 Deadzone, 370 Decomposition, 36 Decouple, 339, 348 Equalisation, 275 Error bounds, 375 output measurement, 368 Error propagation, 304 Errors-in-variables, 375 Expectation maximisation algorithm, 99
422 Fast sampling outputs, 295 Feasible parameter set, 372 Feature map, 245, 247 Feature space, 245, 247 Frequency identification, 165, 181 Frequency Response Function (FRF), 211 G-functionals, 18 Gradient-based search, 98 Half-substitution, 37, 43, 45 Hammerstein, 368 Hammerstein model, 273, 348, 387 Hammerstein system, 35, 39, 241, 293, 294 Hammerstein–Wiener system, 42, 255 Hard input nonlinearities, 262, 347 Hyper parameters, 245, 247 Hysteresis, 294 Identification, 241 Ill-conditioning, 244 Instrumental variable method (IVM), 299 Invariance property, 152 Inverse regression approach, 114 Iterative algorithm, method, 47, 55, 58 Kernel, 245, 247 estimate, estimators, 114, 129 function, 245 Kernel function, 247 Key term separation principle, 37 Laboratory experiment blind identification of Wiener systems, 287 Lagrange multipliers, 245, 247 Least squares, 112, 245, 297 Least Squares Support Vector Machines (LS-SVM), 229, 245, 247 Likelihood function, 94 Linear matrix inequalities, 368 Linear programming, 376 Lissajous curves, 185 LPV, 229 Magneto-rheological dampers, 306 Maximum likelihood, 89, 94 Maximum likelihood algorithm, 97, 274 asymptotic properties, 279, 285 MIMO, 250
Recursive estimation, 69, 72 Regression, 245 Regularisation, 245, 247 Ridge regression, 245
Sampled data, 165 sampled data, 165 Saturation, 56 Semiparametric Inference, 127, 128 Sensor calibration, 275 Separability, 152 Separable least squares, 263–265 Sequential Importance Resampling (SIR), 102 Set-membership, 368 Signal spread, 192, 205 Singular value decomposition (SVD), 30 SISO, 247 Spline, 394 Static function approximation, 245 Stochastic Hammerstein system, 69
Stretch reflex, 389 Subspace direct equalisation (SDE), 300 Subspace identification, 255 Switch operator, 348
Two-stage, 28, 30 Uncertainty bounds, 370 parameter interval, 368 Uniform consistency, 134 Unobserved input, 273 Volterra series, 14, 385, 388 WH-NFIRsum, 214 WH-polyNFIR, 214 Wiener model, 35, 41, 89, 91, 111, 181, 273, 386 Wiener theory, 17 Wiener–Hammerstein model, 209, 387