Modeling and Control of Complex Systems
AUTOMATION AND CONTROL ENGINEERING A Series of Reference Books and Textbooks Editor FRANK L. LEWIS, PH.D. Professor Automation and Robotics Research Institute The University of Texas at Arlington
Co-Editor SHUZHI SAM GE, PH.D. The National University of Singapore
1. Nonlinear Control of Electric Machinery, Darren M. Dawson, Jun Hu, and Timothy C. Burg
2. Computational Intelligence in Control Engineering, Robert E. King
3. Quantitative Feedback Theory: Fundamentals and Applications, Constantine H. Houpis and Steven J. Rasmussen
4. Self-Learning Control of Finite Markov Chains, A. S. Poznyak, K. Najim, and E. Gómez-Ramírez
5. Robust Control and Filtering for Time-Delay Systems, Magdi S. Mahmoud
6. Classical Feedback Control: With MATLAB®, Boris J. Lurie and Paul J. Enright
7. Optimal Control of Singularly Perturbed Linear Systems and Applications: High-Accuracy Techniques, Zoran Gajić and Myo-Taeg Lim
8. Engineering System Dynamics: A Unified Graph-Centered Approach, Forbes T. Brown
9. Advanced Process Identification and Control, Enso Ikonen and Kaddour Najim
10. Modern Control Engineering, P. N. Paraskevopoulos
11. Sliding Mode Control in Engineering, edited by Wilfrid Perruquetti and Jean-Pierre Barbot
12. Actuator Saturation Control, edited by Vikram Kapila and Karolos M. Grigoriadis
13. Nonlinear Control Systems, Zoran Vukić, Ljubomir Kuljača, Dali Đonlagić, and Sejid Tešnjak
14. Linear Control System Analysis & Design: Fifth Edition, John D’Azzo, Constantine H. Houpis, and Stuart Sheldon
15. Robot Manipulator Control: Theory & Practice, Second Edition, Frank L. Lewis, Darren M. Dawson, and Chaouki Abdallah
16. Robust Control System Design: Advanced State Space Techniques, Second Edition, Chia-Chi Tsui
17. Differentially Flat Systems, Hebertt Sira-Ramirez and Sunil Kumar Agrawal
18. Chaos in Automatic Control, edited by Wilfrid Perruquetti and Jean-Pierre Barbot
19. Fuzzy Controller Design: Theory and Applications, Zdenko Kovacic and Stjepan Bogdan
20. Quantitative Feedback Theory: Fundamentals and Applications, Second Edition, Constantine H. Houpis, Steven J. Rasmussen, and Mario Garcia-Sanz
21. Neural Network Control of Nonlinear Discrete-Time Systems, Jagannathan Sarangapani
22. Autonomous Mobile Robots: Sensing, Control, Decision Making and Applications, edited by Shuzhi Sam Ge and Frank L. Lewis
23. Hard Disk Drive: Mechatronics and Control, Abdullah Al Mamun, GuoXiao Guo, and Chao Bi
24. Stochastic Hybrid Systems, edited by Christos G. Cassandras and John Lygeros
25. Wireless Ad Hoc and Sensor Networks: Protocols, Performance, and Control, Jagannathan Sarangapani
26. Modeling and Control of Complex Systems, edited by Petros A. Ioannou and Andreas Pitsillides
Modeling and Control of Complex Systems
Edited by
Petros A. Ioannou
University of Southern California Los Angeles, California, U.S.A.
Andreas Pitsillides
University of Cyprus Nicosia, Cyprus
Boca Raton London New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.
CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2008 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Printed in the United States of America on acid-free paper 10 9 8 7 6 5 4 3 2 1 International Standard Book Number-13: 978-0-8493-7985-7 (Hardcover) This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www. copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
P1: Binaya Dash November 16, 2007
18:11
7985
7985˙C000
Contents
1 Introduction to Modeling and Control of Complex Systems
Petros Ioannou and Andreas Pitsillides

2 Control of Complex Systems Using Neural Networks
Kumpati S. Narendra, Matthias J. Feiler, and Zhiling Tian

3 Modeling and Control Problems in Building Structures and Bridges
Sami F. Masri and Anastasios G. Chassiakos

4 Model-Free Adaptive Dynamic Programming Algorithms for H-Infinity Control of Complex Linear Systems
Asma Al-Tamimi, Murad Abu-Khalaf, and Frank L. Lewis

5 Optimization and Distributed Control for Fair Data Gathering in Wireless Sensor Networks
Avinash Sridharan and Bhaskar Krishnamachari

6 Optimization Problems in the Deployment of Sensor Networks
Christos G. Cassandras and Wei Li

7 Congestion Control in Computer Networks
Marios Lestas, Andreas Pitsillides, and Petros Ioannou

8 Persistent Autonomous Formations and Cohesive Motion Control
Barış Fidan, Brian D. O. Anderson, Changbin Yu, and Julien M. Hendrickx

9 Modeling and Control of Unmanned Aerial Vehicles: Current Status and Future Directions
George Vachtsevanos, Panos Antsaklis, and Kimon P. Valavanis

10 A Framework for Large-Scale Multi-Robot Teams
Andrew Drenner and Nikolaos Papanikolopoulos
11 Modeling and Control in Cancer Genomics
Aniruddha Datta, Ashish Choudhary, Michael L. Bittner, and Edward R. Dougherty

12 Modeling and Estimation Problems in the Visuomotor Pathway
Bijoy K. Ghosh, Wenxue Wang, and Zachary V. Freudenburg

13 Modeling, Simulation, and Control of Transportation Systems
Petros Ioannou, Yun Wang, and Hwan Chang

14 Backstepping Controllers for Stabilization of Turbulent Flow PDEs
Miroslav Krstic, Jennie Cochran, and Rafael Vazquez

15 An Approach to Home Automation by Means of MAS Theory
Giuseppe Conte and David Scaradozzi

16 Multi-Robot Social Group-Based Search Algorithms
Bud Fox, Wee Tiong Ong, Heow Pueh Lee, and Albert Y. Zomaya

Index
Preface
Broadly speaking, a complex system consists of a large number of interacting components, which may include molecules, cells, bacteria, electronic chips, computers, routers, automobiles, even people or business firms. Interactions among the elements of such systems are often nonlinear and lead to rich dynamics, with patterns and fluctuations on many scales of space and time. Such systems are often hard to understand, model, and control using traditional approaches. Recent developments in electronics, computational speed, and sensor and communication technologies, together with advances in areas such as microelectromechanical systems (MEMS), nanotechnology, and quantum electronics, open the way for new approaches in dealing with systems far more complex than one could imagine a few years ago.

System theory can play a significant role in understanding, modeling, and controlling such complex systems. There is a general understanding that complex system theory, together with technological advances in materials, electronics, and sensors, will help solve new nontraditional problems in addition to the traditional ones, push the performance envelope further, and open the way for new products and more efficient operations.

As complex system and feedback control concepts penetrate different disciplines, new notation is generated and new techniques are developed, leading to many publications, with results and products scattered in different journals, books, conference proceedings, and so on. Given the multidisciplinary nature of complex systems, the scattering of information across different areas creates a chaotic situation for the reader who is interested in understanding the complexity and possible solutions as they apply to different areas and applications.
The purpose of this book is to bring together a number of research experts working in different areas or disciplines to present some of their latest approaches and future research directions in the area of modeling and control of complex systems in a language that can be understood easily by system theorists. By bringing together different experts with different views and application areas the book provides a better picture of the issues involved in dealing with the modeling and control of complex systems in completely different areas. What works in one area may fail in another and an acceptable approach in one area may produce revolutionary results in another. The book contains sixteen chapters covering an indicative spectrum of the different areas and disciplines that can be classed as complex systems. These include neural networks for modeling and control, modeling and control of civil structures, transportation systems, sensor networks, genomics, computer
networks, unmanned air vehicles, robots, biomedical systems, fluid flow systems, home automation systems, and so on. The focus is not only on the theoretical treatment of the topic but also on the application and future directions. Readers from different disciplines with interest in modeling and control of complex systems will benefit from the book as they will learn how complexity is dealt with in different disciplines by researchers of different backgrounds using different approaches. This feature of the book is very educational and will help researchers learn about methodologies in other areas that may be applicable to their area. In addition, it will enable people to shift to other research areas within complex systems where their approach and methodology will lead to new solutions.

The book is intended for people who are interested in the theory and application of a system approach to handle complex systems in a very wide range of areas. Possible solutions to the modeling and control of complex systems may include, in addition to theory and simulation tools, the use of advanced sensor and communication technologies for implementation. This mix of theory, simulation, and technology becomes a strong educational vehicle for enlarging knowledge beyond the bounds of specific topics in which most researchers are often trapped. It encourages a multidisciplinary approach to deal with complexity, which has the potential of leading to new breakthroughs and advances.

We wish to thank all the authors for their valuable time and efforts in putting together this book, for their hard work, and for sharing their experiences so readily. We also thank the reviewers for their valuable comments in enhancing the contents of this book. Last, but not least, we would like to thank Frank Lewis, the series editor, B. J. Clark, Helen Redshaw, Nora Konopka, Catherine Giacari, Jessica Vakili, and the rest of the staff at CRC for their understanding, patience, and unwavering support in materializing this book. We hope this book will be a useful reference and a source of inspiration for all the readers in this important and growing field of research, and will contribute to the effective modeling and design of complex systems, which form the pillar of today’s society.

Petros Ioannou
Andreas Pitsillides
The Editors
Dr. Petros Ioannou is a professor in the Department of Electrical Engineering-Systems, University of Southern California, and the director of the Center of Advanced Transportation Technologies. He also holds a courtesy appointment with the Department of Aerospace and Mechanical Engineering. His research interests are in the areas of adaptive control, neural networks, nonlinear systems, vehicle dynamics and control, intelligent transportation systems, and marine transportation. Dr. Ioannou is a fellow of IEEE, a fellow of the International Federation of Automatic Control (IFAC), and the author or coauthor of 8 books and over 200 research papers in the areas of controls, vehicle automation, neural networks, nonlinear dynamical systems, and intelligent transportation systems.

Andreas Pitsillides (IEEE M’89, SM’2005) received a B.Sc. (Honors) degree from the University of Manchester Institute of Science and Technology (UMIST) and a Ph.D. from Swinburne University of Technology, Melbourne, Australia, in 1980 and 1993, respectively. He is an associate professor, Department of Computer Science, University of Cyprus, and heads the Networks Research Laboratory (NetRL). Andreas has also been a founding member, chairman, and scientific director of the Cyprus Academic and Research Network (CYNET) since its establishment in 2000. Prior to that he worked in industry for six years (Siemens 1980–1983, Asea-Brown Boveri, 1983–1986), and from 1987 to 1994 was with the Swinburne University of Technology (lecturer, senior lecturer 1990–1994, and foundation associate director of the Swinburne Laboratory for Telecommunications Research, 1992–1994). In 1992, he spent a six-month period as an academic visitor at the Telstra (Australia) Telecom Research Labs (TRL).
Andreas’s research interests include fixed and wireless networks (ad hoc and sensor networks, TCP/IP, WLANs, UMTS third generation mobile networks and beyond), flow and congestion control, resource allocation and radio resource management, and Internet technologies and their application in mobile e-services, for example, in tele-healthcare and security issues. He has a particular interest in adapting tools from various fields of applied mathematics, such as nonlinear control theory and computational intelligence, to solve problems in computer networks. Andreas has published over 170 research papers and book chapters, presented invited lectures at major research organizations, and has given short courses at international conferences and short courses to industry.
His work has been funded by the European Commission IST program, the Cyprus National Research Promotion Foundation (RPF), the Cambridge Microsoft Research Labs, the University of Cyprus, the Swinburne University of Technology, and the Australian government research grants board, with total funding exceeding 9 million Euro. Current research projects include: IST FP 6 M-POWER, IST FP 6 C-MOBILE, IST FP 6 MOTIVE, RPF VIDEO, UCY ADAVIDEO, IST e-TEN FP 6 HEALTHSERVICE24, IST e-TEN LINKCARE, IST FP6 GEANT. Andreas serves or has served on the executive committees of major conferences, such as INFOCOM, WiOpt, ISYC, MCCS, and ICT. He is a member of the International Federation of Automatic Control (IFAC) Technical Committee TC 1.5 on Networked Systems and TC 7.3 on Transportation Systems, and of the International Federation of Information Processing (IFIP) working group WG 6.3: Performance of Communications Systems. Andreas is also a member of the editorial board of Computer Networks (COMNET) Journal.
Contributors
Murad Abu-Khalaf
Control & Estimation Group, The MathWorks, Inc., Natick, Massachusetts

Asma Al-Tamimi
Mechatronics Engineering Department, Hashemite University, Zarqa, Jordan

Brian D. O. Anderson
Research School of Information Sciences and Engineering, Australian National University, Canberra, Australia, and National ICT Australia, Canberra, Australia

Panos Antsaklis
Department of Electrical Engineering, University of Notre Dame, Notre Dame, Indiana

Michael L. Bittner
Translational Genomics Research Institute, Phoenix, Arizona

Christos G. Cassandras
Department of Manufacturing Engineering, Center for Information and Systems Engineering, Boston University, Brookline, Massachusetts

Hwan Chang
Department of Electrical Engineering-Systems, Center for Advanced Transportation Technologies, University of Southern California, Los Angeles, California

Anastasios G. Chassiakos
Department of Electrical Engineering, California State University, Long Beach, California

Ashish Choudhary
Department of Electrical Engineering, Texas A&M University, College Station, Texas

Jennie Cochran
Department of Mechanical and Aerospace Engineering, University of California, San Diego, California

Giuseppe Conte
Dipartimento di Ingegneria Informatica, Gestionale e dell’Automazione, Università Politecnica delle Marche, Ancona, Italy

Aniruddha Datta
Department of Electrical Engineering, Texas A&M University, College Station, Texas
Edward R. Dougherty
Department of Electrical Engineering, Texas A&M University, College Station, Texas, and Translational Genomics Research Institute, Phoenix, Arizona

Andrew Drenner
Department of Computer Science and Engineering, Center for Distributed Robotics, University of Minnesota, Minneapolis, Minnesota

Matthias J. Feiler
Systems Design, ETH Zürich, Zürich, Switzerland

Barış Fidan
Research School of Information Sciences and Engineering, Australian National University, Canberra, Australia, and National ICT Australia, Canberra, Australia

Bud Fox
Institute of High Performance Computing, Singapore

Zachary V. Freudenburg
Department of Computer Science and Engineering, Washington University, Saint Louis, Missouri

Bijoy K. Ghosh
Department of Mathematics and Statistics, Texas Tech University, Lubbock, Texas

Julien M. Hendrickx
Department of Mathematical Engineering, Université Catholique de Louvain, Louvain-la-Neuve, Belgium

Petros Ioannou
Department of Electrical Engineering-Systems, Center for Advanced Transportation Technologies, University of Southern California, Los Angeles, California

Bhaskar Krishnamachari
Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California

Miroslav Krstic
Department of Mechanical and Aerospace Engineering, University of California, San Diego, California

Heow Pueh Lee
Institute of High Performance Computing, Singapore, and Department of Mechanical Engineering, National University of Singapore, Singapore

Marios Lestas
Department of Computer Science, Networks Research Lab (NetRL), University of Cyprus, Nicosia, Cyprus
Frank L. Lewis
Automation & Robotics Research Institute, The University of Texas at Arlington, Fort Worth, Texas

Wei Li
The MathWorks, Inc., Natick, Massachusetts

Sami F. Masri
Civil and Environmental Engineering, University of Southern California, Los Angeles, California

Kumpati S. Narendra
Department of Electrical Engineering, Center for Systems Science, Yale University, New Haven, Connecticut

Wee Tiong Ong
Department of Mechanical Engineering, National University of Singapore, Singapore

Nikolaos Papanikolopoulos
Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota

Andreas Pitsillides
Department of Computer Science, Networks Research Lab (NetRL), University of Cyprus, Nicosia, Cyprus

David Scaradozzi
Dipartimento di Ingegneria Informatica, Gestionale e dell’Automazione, Università Politecnica delle Marche, Ancona, Italy

Avinash Sridharan
Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California

Zhiling Tian
Department of Electrical Engineering, Center for Systems Science, Yale University, New Haven, Connecticut

George Vachtsevanos
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia

Kimon P. Valavanis
Department of Computer Science and Engineering, University of South Florida, Tampa, Florida

Rafael Vazquez
Department of Aerospace Engineering, Escuela Superior de Ingenieros, University of Seville, Seville, Spain

Wenxue Wang
Department of Mathematics and Statistics, Texas Tech University, Lubbock, Texas
Yun Wang
Department of Electrical Engineering-Systems, Center for Advanced Transportation Technologies, University of Southern California, Los Angeles, California

Changbin Yu
Research School of Information Sciences and Engineering, Australian National University, Canberra, Australia, and National ICT Australia, Canberra, Australia

Albert Y. Zomaya
CISCO Systems Chair Professor of Internetworking, School of Information Technologies, The University of Sydney, Sydney, Australia
1 Introduction to Modeling and Control of Complex Systems
Petros Ioannou and Andreas Pitsillides
CONTENTS
1.1 Chapter 2: Control of Complex Systems Using Neural Networks
1.2 Chapter 3: Modeling and Control Problems in Building Structures and Bridges
1.3 Chapter 4: Model-Free Adaptive Dynamic Programming Algorithms for H-Infinity Control of Complex Linear Systems
1.4 Chapter 5: Optimization and Distributed Control for Fair Data Gathering in Wireless Sensor Networks
1.5 Chapter 6: Optimization Problems in the Deployment of Sensor Networks
1.6 Chapter 7: Congestion Control in Computer Networks
1.7 Chapter 8: Persistent Autonomous Formations and Cohesive Motion Control
1.8 Chapter 9: Modeling and Control of Unmanned Aerial Vehicles: Current Status and Future Directions
1.9 Chapter 10: A Framework for Large-Scale Autonomous Multi-Robot Teams
1.10 Chapter 11: Modeling and Control in Cancer Genomics
1.11 Chapter 12: Modeling and Estimation Problems in the Visuomotor Pathway
1.12 Chapter 13: Modeling, Simulation, and Control of Transportation Systems
1.13 Chapter 14: Backstepping Controllers for Stabilization of Turbulent Flow PDEs
1.14 Chapter 15: An Approach to Home Automation by means of MAS Theory
1.15 Chapter 16: Multi-Robot Social Group-Based Search Algorithms
The modeling of complex dynamic systems has always been a challenging research topic because mathematical models cannot accurately describe nature. A real system is often nonlinear, with infinite dimensions, noise, external disturbances, and characteristics that can vary with time. It is impossible to describe these dynamic characteristics with mathematical equations and achieve a high level of accuracy in the sense that for the same inputs the outputs of the model match those of the real system over the whole frequency spectrum. What is possible, however, and useful for all practical purposes, is to achieve model/system output matching over the frequency range of interest, which is often the low-frequency range. In this case modeling is even more challenging, as decisions have to be made as to which phenomena and dynamics are neglected and which ones are modeled. Modeling is therefore not only a mathematical exercise but also requires a good understanding of the system and its functionality. Models can be developed using physical laws as well as experiments and processing of data. Once a model is developed it has to be validated using real data over the frequency spectrum of interest. Complex models may be developed in an effort to understand the system, for diagnostic purposes, or to be used for control design. It is often the case that the model of the system is so complex that it is difficult, if not impossible, to use existing control tools to design a control scheme that meets the performance requirements. The options in this case are either to develop simplified models that accurately describe the dynamic characteristics of the system over the frequency range of interest and for which control design tools are available, or to develop new control tools applicable to the complex system under consideration.
The first option is characteristic of the traditional approaches to control design, which date from a time when electronics and computational tools were not as advanced as they are today. For example, a high-order model could lead to a high-order controller that cannot be implemented due to lack of adequate computer memory and computational speed. In such a case, the model would be simplified by reducing its order so that a simplified control design can be developed and implemented using available computational tools. The dramatic development of computers and microelectronics, with the simultaneous reduction in implementation costs, opened the way to designing complex controllers based on far more complex system models. In addition, the performance envelope is no longer restricted by the computational constraints of the past, and new control problems and areas have emerged, bringing new challenges and the need for new, nontraditional control techniques. The traditional modeling of a practical system as a linear, time-invariant system in the state-space form

\[ \dot{x} = Ax + Bu, \qquad y = Cx + Du \]

or the input–output transfer function form

\[ y = G(s)u \]
served the needs of a wide class of control problems and continues to do so. The reason is that many electromechanical systems are designed to behave as linear, time-invariant systems over the frequency range of interest. As the class of modeling and control problems expands to include nonclassical systems or the need for expanding the performance envelope in an effort to reduce cost or squeeze more out of a system arises, the above classical model formulation is no longer adequate or applicable. Modeling complexity and controlling complex systems became an emerging area of interest for research, teaching, and applications. Complex systems include systems in which many subsystems and constituent elements interact with each other as well as with their environment in the presence of uncertainties and unpredictable phenomena. But from this mass of interactions patterns emerge so that the overall system performs in a satisfactory manner, and evolves over time. These subsystems include continuous time as well as discrete time parts in a noisy environment. Systems that fall into this category include biological systems, transportation and computer networks, robotic systems, unmanned air vehicles, sensor networks, embedded systems, manufacturing systems, and so on. Most of the control problems that arise in these applications are nontraditional as complexity cannot be reduced as much as in traditional mechanical or electrical systems. Therefore, there is a strong need for new modeling techniques and control designs to cope with the plethora of inputs and outputs involved in such complex systems, as well as to meet the challenges of new systems and performance requirements by taking advantage of the dramatic advances in sensors, information technologies, computers, and computational speed. 
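As a concrete illustration of the classical linear, time-invariant formulation discussed above, the following minimal sketch simulates a state-space model and evaluates its transfer function G(s) = C(sI - A)^{-1}B + D. The mass-spring-damper plant, its parameter values, the forward-Euler integration, and the function names are all assumptions made here for illustration; they do not come from the book.

```python
import numpy as np

def simulate_lti(A, B, C, D, u, x0, dt, steps):
    """Forward-Euler simulation of x' = Ax + Bu, y = Cx + Du for a constant input u."""
    x = np.array(x0, dtype=float)
    outputs = np.empty((steps, C.shape[0]))
    for k in range(steps):
        outputs[k] = C @ x + D @ u          # output equation
        x = x + dt * (A @ x + B @ u)        # Euler step of the state equation
    return outputs

def transfer_function(A, B, C, D, s):
    """Evaluate G(s) = C (sI - A)^{-1} B + D at a single frequency s."""
    n = A.shape[0]
    return C @ np.linalg.solve(s * np.eye(n) - A, B) + D

# Hypothetical mass-spring-damper: m*q'' + c*q' + k*q = u, with state x = [q, q'].
m, c, k = 1.0, 2.0, 4.0
A = np.array([[0.0, 1.0], [-k / m, -c / m]])
B = np.array([[0.0], [1.0 / m]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

y = simulate_lti(A, B, C, D, u=np.array([1.0]), x0=[0.0, 0.0], dt=1e-3, steps=20000)
dc_gain = transfer_function(A, B, C, D, s=0.0)[0, 0]
print("final step-response value:", y[-1, 0], "G(0):", dc_gain)
```

For a unit step input, the simulated output settles at the DC gain G(0) = 1/k, which shows that the two model forms describe the same input-output behavior.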
The availability of these technologies opens the way to apply system theory, which involves modeling and control, to many nontraditional electromechanical systems and networks of systems more complex than in the past. The purpose of this book is to bring together a number of experts dealing with the modeling and control of complex systems to present up-to-date progress as well as future directions. The diversity of the topic areas has the common theme of control and modeling as it is viewed and treated by experts with different backgrounds and approaches. Below we present a brief summary of the different areas covered in the chapters to follow.
1.1 Chapter 2: Control of Complex Systems Using Neural Networks
Neural networks have been motivated from biological systems as a way of dealing with complex nonlinear systems. The goal of using neural networks or artificial neural networks to build systems with the ability to learn and control in a way very similar to biological systems has not yet been achieved the way it was promised. Instead neural networks have been used as nonlinear function approximators either off-line or online using adaptive control techniques to update the weights of the network. A plethora of results and applications
of neural network techniques have appeared in many areas of engineering dealing with modeling, function or mapping identification, as well as control of systems that are too complex to be handled with traditional techniques. Although the results as presented in the literature are impressive, theoretical justifications are scarce, especially with respect to parameter convergence, identifiability, and control. In this chapter the authors discuss issues related to neurocontrol that have arisen during the past fifteen years. They provide a background on some mathematical preliminaries, results from linear and nonlinear control, neural networks used to practically realize the control laws, as well as the methods used to update their parameters. The evolution of the field during this period and the principal ideas motivating them are also discussed. Because neural network-based control naturally leads to nonlinear control, and to nonlinear adaptive control when system characteristics are unknown, many of the current research problems are related to these areas. Results in nonlinear control theory, concepts and structures suggested by classical (linear) adaptive control, and the approximating capabilities of neural networks are judiciously combined to deal with the nonlinear adaptive control problems that arise in complex systems. Appropriate assumptions that have to be made at every stage both to have well-posed problems and to make them mathematically tractable are discussed extensively and critical comments concerning methods currently in vogue in the control literature are provided. The authors briefly address global and stabilizability questions, as well as optimization and optimal control over a finite time using neural networks. 
Finally, the current status of industrial applications is described with details related to the choice of the neural networks, the structures of identifiers and controllers, and off-line and online training of neural networks for successful practical controllers.
1.2 Chapter 3: Modeling and Control Problems in Building Structures and Bridges
Future large building structures and bridges could employ a large number of sensors for diagnostic purposes as well as for active control in case of earthquakes and other external forces. Understanding and identifying the dynamics of these systems and finding ways to control them is an active area of research. In this chapter the authors address the modeling of realistic structural dynamic systems for purposes of simulation, active control, or structural health-monitoring applications. They provide a state-of-the-art approach, incorporating parametric as well as nonparametric system identification methods, for developing parsimonious nonlinear models of arbitrary structural systems. The models developed can be used in a variety of applications, spanning the range from micro-electromechanical systems (MEMS) devices,
to aerospace structures, to dispersed civil infrastructure systems. A wide variety of case studies is provided to illustrate the use of the modeling tools for online or off-line identification of nonlinearities, using experimental measurements as well as simulation results, to represent many challenging types of stationary as well as nonstationary nonlinearities.
1.3 Chapter 4: Model-Free Adaptive Dynamic Programming Algorithms for H-Infinity Control of Complex Linear Systems
In this chapter the authors address the design of optimal H∞ controllers for discrete-time systems by solving linear quadratic zero-sum games that appear in the optimal control problem of complex linear systems. The method used to obtain the optimal controller is the approximate dynamic programming (ADP) technique. In this chapter two methods are presented to obtain the optimal controller, and both yield online algorithms. The authors present a technique for online implementation, as well as convergence proofs of ADP methods for H∞ discrete-time control. The first algorithm, heuristic dynamic programming (HDP), is used to find the optimal controller forward in time. In this algorithm the system model is assumed to be known. The second algorithm, referred to as action-dependent heuristic dynamic programming (ADHDP) or Q-learning, is an improved version of the first algorithm in the sense that the knowledge of the system model is not needed. This leads to a model-free optimal controller design, which is in fact an adaptive control design that converges to the optimal H∞ solution. To the best of the authors’ knowledge, Q-learning provides the first direct adaptive control technique that converges to an H∞ controller.
1.4 Chapter 5: Optimization and Distributed Control for Fair Data Gathering in Wireless Sensor Networks
Sensor networks are a rather recent area of research that involves complex modeling, control, communication, and network problems. Despite the numerous research efforts in the field, there is still a large gap between theory and practice, especially when it comes to the design of higher layer network protocols. The prevailing methodology for protocol design in this context is a bottom-up intuitive engineering approach, not a top-down process guided by solid mathematical understanding. In this chapter the authors present an illustrative case study showing how a distributed convex optimization framework can be used to design a rate control protocol for fair data gathering in wireless sensor networks. A distributed dual-based gradient search algorithm is proposed and illustrated. The authors believe that this kind of systematic modeling
and optimization framework represents the future of protocol design in complex wireless networks such as sensor networks.
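As a rough illustration of what a dual-based gradient search for fair rate control can look like, the sketch below solves a small rate allocation problem in Python. It is not the protocol of Chapter 5: the logarithmic (proportionally fair) utility, the function name `dual_gradient_rates`, the step size, and the single shared bottleneck in the example are all assumptions made here for illustration. Each source reacts only to the total price along its own route, and each link adjusts its price using only its own load, which is what makes the scheme distributed.

```python
def dual_gradient_rates(routes, capacity, step=0.05, iters=5000, x_max=10.0):
    """Dual gradient search for: maximize sum_i log(x_i) s.t. link loads <= capacities.

    routes[i]   : list of link indices used by source i (hypothetical topology)
    capacity[l] : capacity of link l
    Returns (rates, prices) after a fixed number of iterations.
    """
    prices = [1.0] * len(capacity)   # one dual variable (price) per link
    rates = [0.0] * len(routes)
    for _ in range(iters):
        # Source step: each source maximizes log(x) - x * (price of its route).
        for i, route in enumerate(routes):
            path_price = sum(prices[l] for l in route)
            rates[i] = x_max if path_price <= 1.0 / x_max else 1.0 / path_price
        # Link step: projected gradient update of each price from the local load.
        for l in range(len(capacity)):
            load = sum(rates[i] for i, route in enumerate(routes) if l in route)
            prices[l] = max(0.0, prices[l] + step * (load - capacity[l]))
    return rates, prices


# Two sources sharing one bottleneck of capacity 1 (made-up numbers).
rates, prices = dual_gradient_rates(routes=[[0], [0]], capacity=[1.0])
```

In this toy case the two sources end up with equal shares of the bottleneck, which is the fair allocation one would expect by symmetry.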
1.5 Chapter 6: Optimization Problems in the Deployment of Sensor Networks
Chapter 6 addresses optimization problems in the deployment of sensor networks. The performance of a sensor network is sensitive to the location of its nodes in the mission space. This leads to the basic problem of deploying sensors in order to meet overall system objectives. Taking into consideration the distributed communication and computation structure of sensor networks, cooperative control comes into play so as to meet specific mission objectives. This chapter describes system deployment problems for sensor networks viewed as complex dynamic systems. Initially, a deployment setting where data sources are known is taken into consideration. The main aim is to determine the locations of a given number of relay nodes and the corresponding link flows in order to minimize the total communication cost. Next, a deployment setting where data sources are unknown is taken into account. In this case, the sensing field is modeled by a density function representing the probability that specific events take place, and mobile nodes with limited sensing range are introduced. A distributed deployment algorithm is applied at each mobile node so that it maximizes the joint detection probabilities of random events. Under dynamically changing sensing fields, the adaptive relocation behavior naturally follows from the optimal coverage formulation. Finally, communication cost is incorporated into the coverage control problem, which trades off sensing coverage and communication cost.
1.6 Chapter 7: Congestion Control in Computer Networks
Congestion control of computer networks is another problem that deviates from the classical control problems of electromechanical systems. The lack of measurements and adequate local control actions makes both the modeling and control of traffic very challenging. In this chapter the authors provide a survey of recent theoretical and practical developments in the design of Internet congestion control protocols based on the resource allocation view. A theoretical framework is presented that has been used extensively in the last few years to design congestion control protocols with verifiable properties. In this framework the congestion control problem is viewed as a resource allocation problem, which is transformed through a suitable representation into a nonlinear programming problem. The relevant cost functions serve as Lyapunov functions for the derived algorithms, thus demonstrating how local dynamics are coupled to achieve a global objective. Many of the algorithms
derived have been shown to have globally stable equilibrium points in the presence of delays. However, for max-min congestion controllers the problem of asymptotic stability in the presence of delays remains open. Consequently, the performance of these algorithms in networks of arbitrary topology has been demonstrated only through simulations and practical implementation, an approach that has so far failed to produce protocols that satisfy all the design requirements. In this chapter the authors present a number of global stability results that guide the proposal of a new adaptive congestion control protocol, which is shown through simulations to outperform previous proposals and work effectively in a number of representative scenarios.
1.7 Chapter 8: Persistent Autonomous Formations and Cohesive Motion Control
The modeling and control of formations of agents, such as flying objects or robots, so that they follow certain trajectories as a single body with the ability to split and reconfigure is another area of recent research activity, which again deviates from the traditional control problem formulations. In this chapter the authors present autonomous multiagent formations in the framework of graph rigidity and persistence. They give useful characteristics of rigid and persistent graphs and their implications for the control of persistent formations. They also present some operational criteria to check the persistence of a given formation. Based on these characteristics and criteria they analyze certain persistence acquisition and maintenance tasks. They also analyze cohesive motion of persistent autonomous formations and present a set of distributed control schemes to cohesively move a given persistent formation with specified initial position and orientation to an arbitrary desired final position and orientation.
1.8 Chapter 9: Modeling and Control of Unmanned Aerial Vehicles: Current Status and Future Directions
Chapter 9 reviews unmanned aerial vehicle (UAV) technologies, including the system architecture, formation control, and target tracking. Both current developments and future directions are addressed to improve the autonomy and reliability of UAVs. The assembly of multiple and heterogeneous vehicles is viewed as a “system of systems” where individual UAVs function as sensors or agents. Thus, for the coordinated/collaborative control of UAV swarms, new modeling, networking, communications, and computing technologies must be developed and validated if such complex unmanned systems are to perform effectively and efficiently, in conjunction with manned
systems, in a variety of application domains. The authors propose possible solutions to these challenges.
1.9 Chapter 10: A Framework for Large-Scale Autonomous Multi-Robot Teams
The control of individual robots involves the dynamic characteristics of the electromechanical parts, with position and tracking accuracy requirements that depend on the application. The control of multiple robots, however, to achieve a much wider class of tasks via coordination and interaction with each other goes beyond the classical control techniques for individual robots. Chapter 10 discusses robotic teams comprised of heterogeneous members, which have many unique capabilities, making them suitable for operation in scenarios that may be hazardous or harmful to human response. The effectiveness of these teams requires that the team members take advantage of the strengths of one another to overcome individual limitations. Some of these strengths and deficiencies come in the form of locomotion, sensing, processing, communication, and available power. Often, larger robots can be used to transport, deploy, and recover smaller deployable robots. There has been some work in the area of marsupial systems, but in general marsupial systems represent teams of two or three robots. The basic design of the majority of marsupial systems does not have the scalability to handle larger-scale teams, which offer increased performance and redundancy in complex scenarios. The work presented in this chapter deals with the modeling of a much larger-scale robotic team that utilizes principles of marsupial systems. Specifically, the power consumption of a large-scale robotic team is modeled and used to optimize the location of mobile resupply stations. Each deployable member of the distributed robotic team has a specific model comprised of a series of behaviors dictating the actions of the robot. The transitions between these behaviors are used to guide the actions of both the deployable robots and the mobile docking stations that resupply them. Results from preliminary simulation are presented.
1.10 Chapter 11: Modeling and Control in Cancer Genomics
Systems biology and genomics is another important emerging area with many challenging modeling and control problems whose solution will have a tremendous impact in the field. In Chapter 11, the authors present an overview of the research accomplished thus far in the interdisciplinary field of cancer genomics and point out some of the research challenges that remain. The study of genomics is important because cellular control and its failure in disease result from multivariate activity among cohorts of genes. Very recent research indicates
that engineering approaches for prediction, signal processing, and control are quite well suited for studying this kind of multivariate interaction. The authors model genetic regulatory networks using probabilistic Boolean networks (PBNs) whose state transition probabilities depend on an external (control) variable, and consider the issue of choosing the sequence of control actions to minimize a given performance index over a finite number of steps. They illustrate these ideas for the real-life example of a melanoma cell line, with a view to cancer therapy.
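The finite-horizon control problem described above can be sketched with backward dynamic programming, assuming that the controlled network is summarized by a Markov chain whose transition probabilities depend on the control. The Python code below is only an illustration under that assumption: the function name `finite_horizon_control`, the two-state example, and all numerical values are hypothetical and are not taken from Chapter 11.

```python
def finite_horizon_control(P, step_cost, terminal_cost, horizon):
    """Backward dynamic programming for a controlled Markov chain.

    P[u][s][t]       : probability of moving from state s to state t under control u
    step_cost[s][u]  : cost of applying control u in state s
    terminal_cost[s] : cost of ending in state s
    Returns (values, policy): values[k][s] is the minimal expected cost-to-go from
    state s at step k, and policy[k][s] is a control achieving it.
    """
    n_states = len(terminal_cost)
    n_controls = len(P)
    values = [None] * (horizon + 1)
    policy = [None] * horizon
    values[horizon] = list(terminal_cost)
    for k in range(horizon - 1, -1, -1):          # sweep backward in time
        values[k] = [0.0] * n_states
        policy[k] = [0] * n_states
        for s in range(n_states):
            best_u, best_cost = None, float("inf")
            for u in range(n_controls):
                expected = sum(P[u][s][t] * values[k + 1][t] for t in range(n_states))
                cost = step_cost[s][u] + expected
                if cost < best_cost:
                    best_u, best_cost = u, cost
            values[k][s] = best_cost
            policy[k][s] = best_u
    return values, policy


# Toy example: state 0 is desirable, state 1 is undesirable (made-up numbers).
P = [
    [[1.0, 0.0], [0.0, 1.0]],   # control 0: no intervention, states persist
    [[1.0, 0.0], [0.5, 0.5]],   # control 1: intervention moves state 1 to state 0 half the time
]
step_cost = [[0.0, 1.0], [0.0, 1.0]]   # intervening costs 1 in either state
terminal_cost = [0.0, 5.0]             # ending in the undesirable state is penalized
values, policy = finite_horizon_control(P, step_cost, terminal_cost, horizon=2)
```

In this toy setting the computed policy intervenes only in the undesirable state, since intervention is costly and the desirable state persists on its own.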
1.11 Chapter 12: Modeling and Estimation Problems in the Visuomotor Pathway
In Chapter 12 the authors describe modeling and estimation problems that arise in the animal visuomotor pathway. The pathway is particularly adept at tracking targets that are moving in space, acquiring and internally representing images of the target, and finally actuating a suitable motor action, such as capturing the target. The authors describe how a population of neurons models the dynamic activity of a suitable region of the turtle visual cortex, responding to a class of visual inputs, and show how the model cortex is able to discriminate the location of the target in the visual space. The discrimination is carried out using two separate algorithms. The first method utilizes statistical detection wherein the activity waves generated by the visual cortex are encoded using principal components analysis. The representation is carried out first in the spatial domain and subsequently in the temporal domain over a sequence of sliding windows. Using the model cortex, they show that the representations of the activity waves, viewed as “beta strands,” are sufficiently different from each other to allow for alternative locations of point targets in the visual space. Discrimination is carried out assuming that the noise is additive and Gaussian. In the second method, the beta strands are discriminated using a nonlinear dynamic system with multiple regions of attraction. Each beta strand corresponds to a suitable initialization of the dynamic system and the states of attraction correspond to various target locations. The chapter concludes with a discussion of the motor control problem and how the cortical waves play a leading role in actuating movements that would track a moving target with some level of evasive maneuvers.
1.12 Chapter 13: Modeling, Simulation, and Control of Transportation Systems
Transportation networks are classical examples of complex dynamic systems with many challenging problems related to modeling their behavior and controlling their dynamics. The use of advanced technologies for data
collection and control makes the development of validated models and control feasible in dealing with such complex systems on the local and network levels. This chapter presents an overview of traffic flow modeling at the microscopic and macroscopic levels, a review of current traffic simulation software and several methods for managing and controlling the various transportation system modes. Traffic flow and congestion phenomena are so complex that modeling techniques and computers are used to generate simulation models to describe the dynamic behavior of traffic networks. Ramp metering and speed limit control techniques for current and future transportation systems are also discussed.
1.13 Chapter 14: Backstepping Controllers for Stabilization of Turbulent Flow PDEs
This chapter presents a backstepping approach to the control of the benchmark three-dimensional channel flow. This complex system is modeled by the Navier–Stokes equations. The model is linearized about a prescribed equilibrium. The resulting linear model is described by a set of partial differential equations (PDEs), where the actuation is provided by the velocity components at one wall. After a two-dimensional Fourier transform and an invertible change of variables, a continuum of uncoupled one-dimensional PDEs is derived to model the flow. Each one-dimensional model consists of a spatially noncausal subsystem, which is transformed into a causal subsystem via feedback. The backstepping approach is then used to develop the feedback controllers to decouple and stabilize the flow. Advantages of this method include the following: no spatial or temporal approximations are needed, and the “gains” can be precomputed because they are explicit functions of the Reynolds number and wave numbers (thus there is no need to solve high-dimensional Riccati equations).
1.14 Chapter 15: An Approach to Home Automation by means of MAS Theory
In Chapter 15 the authors analyze and study home automation systems using a multi-agent system framework. The problem of conceiving and developing efficient systems for home automation presents several difficult aspects, due to a number of factors such as distributed control structures, hybrid time-driven/event-driven behaviors, interoperability between components of different brands, and requirements of safe and efficient interaction with human users, which, all together, generate complexity. The appliances and devices in modern houses can be viewed as components that are essentially autonomous, possess a certain degree of intelligence, share resources and
some common goals, and communicate among themselves. The formalism derived from MAS theory can, in principle, respond to these needs, providing a powerful conceptual framework and a number of appropriate methodological tools for coping with complexity, which arises mainly from the interaction between different components.
1.15 Chapter 16: Multi-Robot Social Group-Based Search Algorithms
In Chapter 16 the authors use various ideas from traditional search and rescue (SAR) theory and merge them with a more heuristic, social group-based search mechanism, to determine how effectively groups of robots can detect and track a moving target. They develop a multi-robot social group-based search algorithm, and simulate a group of robots detecting and tracking a target moving in a linear, nonlinear, or random-walk manner. Three algorithms are pursued: a robot search algorithm; a standard search algorithm using a multiradial search function and dispersion behavior; and a Voronoi search algorithm using a Voronoi decomposition of the search space prior to the commencement of multiradial search. The robots are divided into two social groups: a faster moving group and a more energy-conserving group. The work is designed to lay the foundations of future studies in planar and three-dimensional submarine detection and tracking, by ships and aircraft, in both cooperative and noncooperative search scenarios. The cooperative searches involve both parties trying to locate each other as in a SAR situation, and the noncooperative searches are typical in warfare environments where both parties search for each other but attempt to avoid detection.
2 Control of Complex Systems Using Neural Networks
Kumpati S. Narendra, Matthias J. Feiler, and Zhiling Tian
CONTENTS
2.1 Introduction
    2.1.1 Historical Background
    2.1.2 Artificial Neural Networks (ANNs)
    2.1.3 ANNs for Control
        2.1.3.1 Linear Control and Linear Adaptive Control
        2.1.3.2 Control of Complex Systems
        2.1.3.3 Nonlinear Adaptive Control: Stability and Design
        2.1.3.4 Assumptions
    2.1.4 Objectives of Chapter
    2.1.5 Organization of Chapter
2.2 Mathematical Preliminaries
    2.2.1 Linear Time-Invariant Systems
        2.2.1.1 Controllability, Observability, and Stability
        2.2.1.2 ARMA Model
        2.2.1.3 Minimum Phase Systems
    2.2.2 Nonlinear Systems
        2.2.2.1 Controllability, Observability, and Stability
    2.2.3 Adaptive Systems: Theoretical Considerations
    2.2.4 Problem Statement
        2.2.4.1 Plant
        2.2.4.2 An Area for Future Research
2.3 Neural Networks, Adaptive Laws, and Stability
    2.3.1 Neural Networks
        2.3.1.1 Feedforward Networks
        2.3.1.2 Recurrent Networks
        2.3.1.3 System Approximation
    2.3.2 Stable Adaptive Laws: Error Models
        2.3.2.1 Error Models for Nonlinear Systems
        2.3.2.2 Gradient-Based Methods and Stability
    2.3.3 Adjustment of Parameters: Feedforward and Recurrent Networks
        2.3.3.1 Back Propagation through Time
        2.3.3.2 Dynamic Back Propagation
        2.3.3.3 Interconnection of LTI Systems and Neural Networks
        2.3.3.4 Real-Time Recurrent Learning
2.4 Identification and Control Methods
    2.4.1 Identification and Control Based on Linearization
        2.4.1.1 System Representation
        2.4.1.2 Higher-Order Functions
        2.4.1.3 System Theoretic Properties
    2.4.2 Practical Design of Identifiers and Controllers (Linearization)
        2.4.2.1 Modeled Disturbances and Multiple Models for Rapidly Varying Parameters
        2.4.2.2 Control of Nonlinear Multivariable Systems
        2.4.2.3 Interconnected Systems
    2.4.3 Related Current Research
    2.4.4 Theoretical and Practical Stability Issues
        2.4.4.1 Linear Adaptive Control
        2.4.4.2 Nonlinear Adaptive Control Using Linearization Methods
        2.4.4.3 Nonlinear Adaptive Control
2.5 Global Control Design
    2.5.1 Dynamics on Manifolds
    2.5.2 Global Controllability and Stabilization
    2.5.3 Global Observability
2.6 Optimization and Optimal Control Using Neural Networks
    2.6.1 Neural Networks for Optimal Control
        2.6.1.1 Function Approximation Using Neural Networks
        2.6.1.2 Parameter Optimization
        2.6.1.3 Computational Advantage
        2.6.1.4 Other Formulations
    2.6.2 Dynamic Programming in Continuous and Discrete Time
        2.6.2.1 Continuous Time (No Uncertainty)
        2.6.2.2 Discrete Time (No Uncertainty)
        2.6.2.3 Discrete Time (System Unknown)
2.7 Applications of Neural Networks to Control Problems
    2.7.1 Application 1: Controller in a Hard Disk Drive
        2.7.1.1 Model
        2.7.1.2 Radial Basis Function Network
        2.7.1.3 Objective
        2.7.1.4 Adaptive Laws and Control Law
    2.7.2 Application 2: Lean Combustion in Spark Ignition Engines
        2.7.2.1 Objective
        2.7.2.2 Method
    2.7.3 Application 3: MIMO Furnace Control
    2.7.4 Application 4: Fed-Batch Fermentation Processes
    2.7.5 Application 5: Automotive Control Systems
    2.7.6 Application 6: Biped Walking Robot
    2.7.7 Application 7: Real-Time Predictive Control in the Manufacturing Industry
    2.7.8 Application 8: Multiple-Models: Switching and Tuning
    2.7.9 Application 9: Biological Control Structures for Engineering Systems
    2.7.10 Application 10: Preconscious Attention Control
2.8 Comments and Conclusions
Acknowledgments
References
2.1 Introduction
2.1.1 Historical Background
The term artificial neural network (ANN) has come to mean any computer architecture that has massively parallel interconnections of simple processing elements. As an area of research it is of great interest due to its potential for providing insights into the kind of highly parallel computation performed by physiological nervous systems. Research in the area of artificial neural networks has had a long and interesting history, marked by periods of great activity followed by years of fading interest and revival due to new engineering insights [1]–[8], technological developments, and advances in biology. The latest period of explosive growth in pure and applied research in both real and artificial neural networks started in the 1980s, when investigators from across the scientific spectrum were attracted to the field by the prospect of drawing ideas and perspectives from many different disciplines. Many of them also believed that an integration of the knowledge acquired in the different areas was possible. Among these were control theorists like the first author, who were inspired by the ability of biological systems to retrieve contextually important information from memory, and process such information to interact efficiently with uncertain environments. They came to the field with expectations of building controllers based on artificial neural networks with similar information processing capabilities. At the same time they were also convinced that the design of such controllers should be rooted in the theoretical research in
progress in different areas of mathematical control theory, such as adaptive, learning, stochastic, nonlinear, hierarchical, and decentralized control.
2.1.2 Artificial Neural Networks (ANNs)
From the point of view of systems theory, an artificial neural network (ANN; henceforth referred to as a neural network) can be regarded as a finitely parameterized, efficiently computable, and practically implementable family of transformations. The fact that such networks are universal approximators, involve parallel distributed processing, can be implemented in hardware, are capable of adaptation online, and are easily applied to multivariable systems made them attractive as components and subsystems in various applications. In the early 1980s, extensive computer simulation studies were carried out to demonstrate that such networks could approximate very well nearly all functions encountered in practical applications. As stated in their seminal paper, these claims led Hornik, Stinchcombe, and White [11] to raise the question of whether these were merely flukes, or whether the observed successes were reflective of some deep and fundamental approximating capabilities. During the late 1980s, as a result of the work of numerous authors [9]–[11], it was shown conclusively that neural networks are universal approximators in a very precise and satisfactory sense. Following this, the study of neural networks left its empirical origins and became a mathematical discipline. Since approximation theory is at the core of many systems-related disciplines, the new results led to the wide application of neural networks in such areas as pattern recognition, identification, and optimization, and provided mathematical justification for them.
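To make the phrase "finitely parameterized family of transformations" concrete, the following Python sketch evaluates a scalar network with a single hidden layer. The tanh activation, the parameter layout, and the name `network` are illustrative assumptions made here, not the chapter's notation; each choice of the parameter values selects one input-output map from the family.

```python
import math


def network(params, x):
    """Single-hidden-layer network: y = c0 + sum_j c[j] * tanh(w[j] * x + b[j])."""
    w, b, c, c0 = params   # hidden weights, hidden biases, output weights, output bias
    return c0 + sum(cj * math.tanh(wj * x + bj) for wj, bj, cj in zip(w, b, c))


# Two hidden units: the seven numbers below pick one map from the family (made-up values).
params = ([1.0, -2.0], [0.0, 0.5], [0.5, 0.25], 0.1)
y = network(params, 0.0)
```

Changing any entry of `params` yields a different map, and adding hidden units enlarges the family, which is the sense in which approximation results for such networks are stated.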
2.1.3 ANNs for Control
Even as the above ground-breaking developments in static optimization were taking place, it was suggested in 1990 [12] that feedforward neural networks could also be used as components in feedback systems, because the approximation capabilities of such networks could be used in the design of identifiers and controllers for unknown or partially known dynamic systems. This, in turn, gave rise to a frenzy of activity in the neural network control community, and numerous heuristic methods were proposed in the following years for the control of nonlinear processes. As in the case of function approximation, vast amounts of empirical evidence began to accumulate, demonstrating that neural networks could outperform traditional linear controllers in many applications. As in the past it once again became evident that more formal methods, grounded in mathematical systems theory, would have to be developed to quantitatively assess the capabilities as well as limitations of neurocontrol.
2.1.3.1 Linear Control and Linear Adaptive Control
The objective of control is to influence the behavior of dynamic systems. This includes maintaining the outputs of the system at constant values (regulation), or forcing them to follow prescribed time functions (tracking). The control
problem is to use all available data at every instant and determine the control inputs to the system. Achieving fast and accurate control, while assuring stability and robustness in the presence of perturbations, is the aim of all control design.

The best developed part of control theory deals with linear systems. The extensive results in linear control theory, and subsequently in linear adaptive control theory, have strongly influenced the evolution of neural network-based control, and we recapitulate briefly in this section some of the principal concepts and results. Some further mathematical details concerning these are included in the following section.

Starting with the state description of linear systems, system theoretic properties such as controllability, observability, stability, stabilizability, and detectability were investigated in the 1960s and 1970s and the results were used to stabilize and control such systems using state feedback. Later, through the use of observers, the methods were extended to control both single-input single-output (SISO) and multiple-input multiple-output (MIMO) systems in which not all the state variables are accessible. Many of the concepts and methods developed for the control of a single dynamic system were also extended to larger classes of systems where two or more subsystems are interconnected to achieve different objectives. In fact, current research in control theory includes many problems related to the decentralized control of linear systems using incomplete information about their interconnections.

Classical adaptive control deals with the control of linear, time-invariant dynamic systems, when some of the parameters are unknown. Hence, all the theoretical advances in linear control theory were directly relevant to its development. In fact, the evolution of the field of adaptive control closely paralleled that of linear control theory, because the same control problems were attempted in the modified context.
However, the principal difficulties encountered were significantly different, as adaptive systems are invariably nonlinear. During the period 1970 to 1980 the emphasis was on generating adaptive laws that would assure the stability of the overall system, and the asymptotic convergence of the performance of the adaptive system to that predicted by linear theory.

2.1.3.2 Control of Complex Systems

Our objective, as indicated by the title of the chapter, is to control complex systems using neural networks. The focus of the chapter is on theoretical developments, primarily on the methods for generating appropriate control inputs. Although the authors have carried out extensive simulation studies during the past fifteen years to test many of these methods, no simulation results are included here.

In spite of numerous efforts on the part of researchers in the past, there is currently no universally accepted definition of a “complex system.” Like many other terms in control theory, it is multifaceted, and its definition cannot be compressed into a simple statement. At the same time, most researchers would agree on many of the characteristics that would make a system complex. Among these, the presence of nonlinear dynamics in the plant (or process)
to be controlled would be included as one of the more significant ones. The identification and control of an isolated nonlinear plant should therefore fall within the ambit of our investigations. Other characteristics of complex systems would include uncertainties or time variations in system behavior, and operation of the system far from equilibrium. Even if a stable equilibrium exists, the system may be prevented from approaching it by external disturbances or input signals in general.

Complex systems are typically composed of many interconnected subsystems which mutually influence the evolution of their state variables. In some cases, the effect of the coupling dominates the dynamics of the system. Such problems have been studied extensively in the context of linear systems by means of matrix theory, and notions such as diagonal dominance have been coined to quantify the strength of the interconnections. The principal difficulty is, again, that the couplings may be nonlinear.

An additional source of complexity is the high dimensionality of the state and parameter spaces. In a large set of interconnected systems, the role of each individual system may be small, but together they constitute a powerful whole, capable of realizing higher-order functionality at the network level. This is sometimes referred to as emergent behavior and is one of the manifestations of complexity, as it cannot be explained by simply aggregating the behaviors of the constituent systems. This means that the system cannot be modeled using “representative” variables of reduced dimensionality. The neural network itself is a prime example of such an interconnected system.

2.1.3.3 Nonlinear Adaptive Control: Stability and Design

It is a truism in control practice that efficient controllers for specific classes of dynamic systems can be designed only when their stability properties are theoretically well understood. This explains why controllers can be designed with confidence at the present time for both linear, time-invariant (LTI) plants with known parameters and those with unknown but constant parameters (i.e., linear adaptive systems). The advantages of using neural networks as components in dynamic systems were stated earlier in this section. Although the qualitative statements made in that context are for the most part valid and make neural networks attractive in static situations such as pattern recognition and optimization, their use in dynamic contexts involving control raises a host of problems that need to be resolved before they can be used with confidence.

In the following sections, neural networks are used primarily as controllers in dynamic systems to cope with either known or unknown nonlinearities, making the overall system under consideration both nonlinear and adaptive. In spite of advances in stability theory for over two centuries, our knowledge of the stability of general nonlinear systems is quite limited. This makes the stability of nonlinear adaptive systems containing neural networks a truly formidable problem. Hence, while discussing the use of neural networks for identification and control, it is incumbent upon the authors to state precisely the class of plants considered, the prior information
available to the designer concerning the system, the external perturbations that may be present, the domain of interest in the state space, the manner in which new information concerning the unknown plant is acquired (i.e., online or off-line) and utilized, and the conditions under which the results are valid.

2.1.3.4 Assumptions

Because the primary difficulty in most of the problems mentioned earlier arises due to the presence of nonlinearities in the representation of the system, it is not surprising that a wide spectrum of methods has been proposed in the literature by many authors making different assumptions. These assumptions determine the mathematical tractability of the problems, but at the same time also determine the extent to which the procedures developed will prove practically feasible. As is well known to experienced researchers, and succinctly stated by Feldkamp et al. [13], apparently difficult problems can be made almost trivial by unreasonably optimistic assumptions.

2.1.4 Objectives of Chapter

The first objective of the chapter is to discuss in detail the methods that are currently available for the control of a nonlinear plant with unknown characteristics, using neural networks. In particular the chapter will examine the efforts made by different investigators to extend principles of linear control and linear adaptive control to such problems. It will examine the assumptions they have made, the corresponding approaches they have proposed, and the theoretical justification they provide for stability and robustness. In this context we also include our own approach to the same adaptive control problems, address the same issues mentioned earlier, and conclude with a statement concerning our position regarding the current status of the field of neurocontrol.

When the identifier and controller for a nonlinear plant are neural networks, we have the beginnings of interconnected neural networks.
When many such are interconnected as described earlier, we have a network of neural networks. As we believe that this is the direction in which the field is bound to evolve in the future, we include a typical problem for future investigation. At the present time, there is considerable research activity in the use of neural networks in optimization and optimal control problems in the presence of uncertainty and we believe that it would be a great omission on our part if we failed to comment on it. We therefore devote a section to this important topic, merely to clarify the principal concepts involved. Finally, our objective is to present and comment on some successful applications of the theory in practical control problems, as well as briefly touch upon some not so conventional applications that are currently under investigation which, if successful, will provide greater motivation for the use of neural networks in control.

2.1.5 Organization of Chapter

In this chapter we attempt to discuss many of the issues related to neurocontrol that have arisen during the past fifteen years. Section 2.2 is devoted to mathematical preliminaries and includes results from linear and nonlinear control, as well as concepts for adaptive control that are useful for later discussions. The section concludes with a statement of the problems discussed in the chapter. Section 2.3 introduces feedforward and recurrent networks used to practically realize the control laws, and the methods used to update their parameters. Because neural network-based control naturally leads to nonlinear control, and to nonlinear adaptive control when system characteristics are unknown, many of the current research problems are related to these areas. Results in nonlinear control theory, concepts and structures suggested by classical (linear) adaptive control, and the approximating capabilities of neural networks have to be judiciously combined to deal with the nonlinear adaptive control problems that arise in complex systems. Appropriate assumptions have to be made at every stage, both to have well-posed problems and to make them mathematically tractable. These are contained in Section 2.4, which concludes with some critical comments concerning methods currently in vogue in the neurocontrol literature. In Section 2.5, global stabilizability questions are discussed, because the authors believe that such concepts are essential for our understanding of the nonlinear domain and will be encountered increasingly in neurocontrol in the future. Section 2.6 is devoted to optimization and optimal control over a finite time using neural networks. Finally, the current status of applications is discussed in Section 2.7.

2.2 Mathematical Preliminaries

Well-known results from linear and nonlinear control that are used throughout the chapter are presented in a condensed form in this section for easy reference. The section concludes with the statement of the identification and control problems that are investigated in the following sections.

2.2.1 Linear Time-Invariant Systems

A general multiple-input multiple-output (MIMO), linear, time-invariant, continuous-time system $\Sigma_c$ (discrete-time system $\Sigma_d$) is described by the vector differential (difference) equation:

$$
\Sigma_c:\ \begin{aligned} \dot{x}(t) &= Ax(t) + Bu(t) \\ y(t) &= Cx(t) \end{aligned}
\qquad
\Sigma_d:\ \begin{aligned} x(k+1) &= Ax(k) + Bu(k) \\ y(k) &= Cx(k) \end{aligned}
\tag{2.1}
$$

where $u(t) \in \mathbb{R}^r$, $y(t) \in \mathbb{R}^m$, and $x(t) \in \mathbb{R}^n$. $A$, $B$, and $C$ are constant matrices with $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times r}$, and $C \in \mathbb{R}^{m \times n}$ respectively, and $u(t)$, $y(t)$, and $x(t)$
are, respectively, the input, output, and the state of the system at time $t$. In the discussions that follow, we will deal with single-input single-output (SISO) systems (where $r = m = 1$) for clarity, and extend the results to the MIMO case. The SISO system is then described by the equation:

$$
\Sigma_c:\ \begin{aligned} \dot{x}(t) &= Ax(t) + bu(t) \\ y(t) &= cx(t) \end{aligned}
\qquad
\Sigma_d:\ \begin{aligned} x(k+1) &= Ax(k) + bu(k) \\ y(k) &= cx(k) \end{aligned}
\tag{2.2}
$$

where $b$ and $c^T$ are constant vectors in $\mathbb{R}^n$.

2.2.1.1 Controllability, Observability, and Stability

Controllability, observability, and stability are system theoretic properties that play important roles in systems-related problems. The following definitions and basic results can be found in standard textbooks on linear systems [14]. A system is said to be controllable if any initial state can be transferred to any final state by the application of a suitable control input. The SISO system $\Sigma_c$ ($\Sigma_d$) described in Equation (2.2) is controllable if the matrix

$$
W_c = [b, Ab, A^2 b, \ldots, A^{n-1} b]
\tag{2.3}
$$

is nonsingular. The MIMO system $\Sigma_c$ ($\Sigma_d$) in Equation (2.1) is controllable if the $(n \times nr)$ matrix $[B, AB, \ldots, A^{n-1}B]$ is of rank $n$. The dual concept of controllability is observability. A system is said to be observable if the initial state (and hence all subsequent states) of the system can be determined by observing the system output $y(\cdot)$ over a finite interval of time. For a SISO system (2.2), the condition for observability is that the matrix

$$
W_o = [c^T, A^T c^T, \ldots, A^{(n-1)T} c^T]
\tag{2.4}
$$

be nonsingular. For MIMO systems, the condition is that the $(n \times mn)$ matrix $[C^T, A^T C^T, \ldots, A^{(n-1)T} C^T]$ be of rank $n$. The third system theoretic property that is crucial to all control systems is stability, which depends on the matrix $A$. $\Sigma_c$ is stable if the eigenvalues of $A$ lie in the open left half plane ($\Sigma_d$ is stable if the eigenvalues of $A$ lie in the interior of the unit circle).

Controllability and stability: For LTI systems (2.2) it is known that if the pair $(A, b)$ is controllable, the system can be stabilized by state feedback, that is, $u = k^T x$.

Estimation and control: When $\Sigma_c$ ($\Sigma_d$) is represented by the triple $(c, A, b)$ which is controllable and observable, an important result derived in the 1970s assures the existence of a control input that can stabilize the system. The state $x$ of the system is estimated as $\hat{x}$ and used to determine the stabilizing input $u = k^T \hat{x}$.

2.2.1.2 ARMA Model

The proper representation of a discrete-time LTI system $\Sigma_d$ in terms of only inputs and outputs is an important consideration in the mathematical
tractability of many control problems. From Equation (2.2), the following input-output relation can be obtained:

$$
y(k+n) = cA^n x(k) + \sum_{i=0}^{n-1} cA^{n-1-i} b\, u(k+i)
\tag{2.5}
$$

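As a minimal numerical sketch of relation (2.5), the following Python fragment iterates the discrete-time SISO state equation (2.2) for n steps and compares the resulting output with the right-hand side of (2.5). The matrices, the initial state, and the input values are illustrative assumptions and are not taken from the text.

```python
# Minimal sketch: check relation (2.5) for a hypothetical 2nd-order SISO system.
# The values of A, b, c, x0 and the inputs below are illustrative assumptions.

def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(p, q):
    """Inner product of two vectors."""
    return sum(pi * qi for pi, qi in zip(p, q))

def mat_pow_vec(M, power, v):
    """Return (M ** power) applied to v, by repeated multiplication."""
    for _ in range(power):
        v = mat_vec(M, v)
    return v

A = [[0.0, 1.0], [-0.5, 1.0]]   # assumed system matrix
b = [0.0, 1.0]                  # assumed input vector
c = [1.0, 0.0]                  # assumed output (row) vector
n = len(A)

def simulate_output(x0, inputs):
    """Iterate x(k+1) = A x(k) + b u(k) and return y = c x after len(inputs) steps."""
    x = list(x0)
    for u in inputs:
        Ax = mat_vec(A, x)
        x = [Ax[i] + b[i] * u for i in range(n)]
    return dot(c, x)

def output_from_relation(x0, inputs):
    """Right-hand side of (2.5): c A^n x(k) + sum_i c A^(n-1-i) b u(k+i)."""
    free = dot(c, mat_pow_vec(A, n, x0))
    forced = sum(dot(c, mat_pow_vec(A, n - 1 - i, b)) * inputs[i] for i in range(n))
    return free + forced

x0 = [1.0, -2.0]
u_seq = [0.3, -0.7]             # u(k), u(k+1)
y_sim = simulate_output(x0, u_seq)
y_rel = output_from_relation(x0, u_seq)
print(y_sim, y_rel)
```

The two printed values agree up to rounding, which is all the sketch is meant to show: the output n steps ahead is the sum of a free response from x(k) and a weighted sum of the intervening inputs.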
From the above equation the following ARMA (autoregressive moving average) representation can be derived for SISO systems:

$$
y(k+1) = \sum_{i=0}^{n-1} \alpha_i\, y(k-i) + \sum_{j=0}^{n-1} \beta_j\, u(k-j)
\tag{2.6}
$$

where $\alpha_i$ and $\beta_j$ are constants ($i, j = 0, 1, \ldots, n-1$). The same also applies to MIMO systems where

$$
y(k+1) = \sum_{i=0}^{n-1} A_i\, y(k-i) + \sum_{j=0}^{n-1} B_j\, u(k-j)
\tag{2.7}
$$

where $A_i$ and $B_j$ are constant ($m \times m$ and $m \times r$) matrices. If, in the SISO system (2.6), the input $u(k)$ at time $k$ affects the output at time $(k + d)$ but not earlier, the system is said to have a relative degree $d$. For LTI systems this is merely the delay through the system. Because $cb, cAb, \ldots, cA^{d-2}b$ are zero, but $cA^{d-1}b \neq 0$, it can be shown that the system has a representation:

$$
y(k+d) = \sum_{i=0}^{n-1} \alpha_i\, y(k-i) + \sum_{j=0}^{n-1} \beta_j\, u(k-j)
\tag{2.8}
$$

where $\alpha_i$ and $\beta_j$ are constants. For MIMO systems with $m$ inputs and $m$ outputs ($r = m$), each output $y_i(\cdot)$ has a relative degree $d_{ij}$ to the $j$th input $u_j$. The relative degree $d_i$ is then defined as $d_i = \min_j \{d_{ij}\}$, and represents the smallest time in which some input can affect the $i$th output. Hence, each of the $m$ outputs has a clearly assigned relative degree denoted by the elements of the vector $d = [d_1, d_2, \ldots, d_m]^T$. Using the same procedure as in the SISO case, we obtain the following input-output relation for the MIMO system:

$$
Y(k+d) = \begin{bmatrix} y_1(k+d_1) \\ y_2(k+d_2) \\ \vdots \\ y_m(k+d_m) \end{bmatrix}
= \sum_{i=0}^{n-1} A_i\, y(k-i) + \sum_{j=0}^{n-1} B_j\, u(k-j)
\tag{2.9}
$$

where $A_i$ and $B_j$ are matrices of appropriate dimensions.

2.2.1.3 Minimum Phase Systems

A question that arises in control theory is whether or not internal signals in the system can become unbounded while the observed outputs remain
bounded. In terms of Equation (2.8), the question can be posed as follows: Is it possible for $\lim_{k \to \infty} y(k)$ to be zero while the input $u(k)$ grows in an unbounded fashion? Obviously such a situation is possible if

$$
\beta_0 u(k) + \beta_1 u(k-1) + \cdots + \beta_n u(k-n) = 0
\tag{2.10}
$$

has unbounded solutions. It can be shown that this is equivalent to the equation:

$$
\beta_0 z^n + \beta_1 z^{n-1} + \cdots + \beta_n = 0
\tag{2.11}
$$

having at least one root outside the unit circle. Equivalently, a necessary and sufficient condition for the question to have a negative answer is that all the roots of Equation (2.11) (representing the zeros of the transfer function of the SISO system) lie inside the unit circle. We refer to such a system as a “minimum phase system.”

2.2.2 Nonlinear Systems

Finite-dimensional, continuous-time, and discrete-time nonlinear systems can be described by the state equations of the form:

$$
\Sigma_c:\ \begin{aligned} \dot{x}(t) &= F(x(t), u(t)) \\ y(t) &= H(x(t)) \end{aligned}
\qquad
\Sigma_d:\ \begin{aligned} x(k+1) &= F[x(k), u(k)] \\ y(k) &= H[x(k)] \end{aligned}
\tag{2.12}
$$

Work in the area of nonlinear control has been in progress for many decades, and numerous attempts have been made to obtain results that parallel those in linear theory (refer to Section 2.5). We include here well-established results concerning such systems which are related to their linearizations (refer to Section 2.4).

2.2.2.1 Controllability, Observability, and Stability

The definitions of controllability, observability, and stability in the nonlinear case are identical to those in the linear case. However, obtaining general conditions to assure these properties in a domain $D$ in the state space is substantially more complex.

2.2.2.1.1 Controllability

If the state $x(0)$ of the discrete-time system $\Sigma_d$ in Equation (2.12) is to be transferred to the state $x(n)$ by the application of a suitable input $u$, the following equation has to be satisfied:

$$
x(n) = F[\cdots F[F[x(0), u(0)], u(1)] \cdots, u(n-1)] = \Phi[x(0), U_n(0)]
\tag{2.13}
$$

where $U_n(0) = \{u(0), u(1), \ldots, u(n-1)\}$ is an input sequence of length $n$. The problem of controllability at time $k = 0$ is evidently one of determining the existence of $U_n(0)$ that will satisfy Equation (2.13) for any specified $x(0)$ and $x(n)$.
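
The n-fold composition on the right-hand side of Equation (2.13) can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the function names are hypothetical, and the plant is taken to be linear with n = 2 so that the required input sequence can be found in closed form through the controllability matrix of Equation (2.3). For a genuinely nonlinear F the same composition applies, but the resulting equations are nonlinear in the inputs.

```python
from fractions import Fraction

def n_step_map(F, x0, inputs):
    """Compose F along an input sequence: F[...F[F[x0, u(0)], u(1)]..., u(n-1)]."""
    x = x0
    for u in inputs:
        x = F(x, u)
    return x

# Hypothetical linear example x(k+1) = A x(k) + b u(k) with n = 2 (assumed values).
A = [[Fraction(0), Fraction(1)], [Fraction(-1, 2), Fraction(1)]]
b = [Fraction(0), Fraction(1)]

def F_linear(x, u):
    """One step of the assumed linear plant."""
    return [A[i][0] * x[0] + A[i][1] * x[1] + b[i] * u for i in range(2)]

def solve_inputs(x0, x_target):
    """Find u(0), u(1) such that x(2) = x_target.

    In the linear case x(2) = A^2 x0 + (A b) u(0) + b u(1), so the inputs solve a
    2x2 linear system whose matrix [A b, b] has the columns of W_c in (2.3).
    """
    free = n_step_map(F_linear, x0, [0, 0])         # A^2 x0 (zero-input response)
    Ab = F_linear(b, 0)                             # A b
    r = [x_target[i] - free[i] for i in range(2)]   # part the inputs must supply
    det = Ab[0] * b[1] - Ab[1] * b[0]               # nonzero iff (A, b) is controllable
    if det == 0:
        raise ValueError("pair (A, b) is not controllable")
    u0 = (r[0] * b[1] - r[1] * b[0]) / det          # Cramer's rule
    u1 = (Ab[0] * r[1] - Ab[1] * r[0]) / det
    return [u0, u1]

x0 = [Fraction(1), Fraction(-2)]
x_target = [Fraction(0), Fraction(0)]
U = solve_inputs(x0, x_target)
x_reached = n_step_map(F_linear, x0, U)
print(U, x_reached)
```

Exact rational arithmetic is used so that the reached state can be compared with the target without rounding error.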

2.2.2.1.2 Observability

Similarly, observability can be defined by considering the equations:

$$
\begin{aligned}
y(k) &= H[x(k)] = \Psi_1[x(k)] \\
y(k+1) &= H[x(k+1)] = H[F[x(k), u(k)]] = \Psi_2[x(k), u(k)] \\
&\ \ \vdots \\
y(k+n-1) &= H[x(k+n-1)] = \Psi_n[x(k), u(k), u(k+1), \ldots, u(k+n-2)]
\end{aligned}
\tag{2.14}
$$

Given the output sequence $Y_n(k) = \{y(k), y(k+1), \ldots, y(k+n-1)\}$ and the input sequence $U_n(k) = \{u(k), u(k+1), \ldots, u(k+n-1)\}$, observability implies that the state $x(k)$ and hence $x(k+1), \ldots, x(k+n)$ can be determined.

2.2.2.1.3 Inverse Function Theorem and the Implicit Function Theorem

Both controllability and observability consequently involve the solutions of nonlinear algebraic equations. Two fundamental theorems of analysis that are useful in this context are the inverse function theorem and the implicit function theorem.

Inverse function theorem: Let $U$ be an open set in $\mathbb{R}^n$ and let $f : U \to \mathbb{R}^n$ be a $C^k$ function with $k \geq 1$. If $\bar{x} \in U$ is a point such that the matrix $Df(\bar{x})$ is invertible, then there exists an open neighborhood $V$ of $\bar{x}$ in $U$ such that $f : V \to f[V]$ is invertible with a $C^k$ inverse.

Implicit function theorem: Let $U$ be an open set in $\mathbb{R}^m \times \mathbb{R}^n$ and let $f : U \to \mathbb{R}^n$ be a $C^k$ function with $k \geq 1$. Let $(\bar{x}, \bar{y}) \in U$ where $\bar{x} \in \mathbb{R}^m$ and $\bar{y} \in \mathbb{R}^n$ with $f(\bar{x}, \bar{y}) = c$. If the $(n \times n)$ matrix $D_y f(\bar{x}, \bar{y})$ of partial derivatives is invertible, then there are open sets $V_m \subset \mathbb{R}^m$ and $V_n \subset \mathbb{R}^n$ with $(\bar{x}, \bar{y}) \in V_m \times V_n \subset U$ and a unique $C^k$ function $\phi : V_m \to V_n$ such that $f(x, \phi(x)) = c$ for all $x \in V_m$. Moreover, $f(x, y) = c$ if $(x, y) \in V_m \times V_n$ and $y = \phi(x)$.

By the inverse function theorem, if $\bar{x}$ is the solution of the vector equation $f(x) = c$, then the equation can also be solved in the neighborhood of $\bar{x}$ if the Jacobian matrix $Df(x)|_{x=\bar{x}}$ is nonsingular. The implicit function theorem extends this result to equations that are functions of $x$ and $y$, and the solution $y$ is desired as a unique function of $x$.

The following important theorem derived using the implicit function theorem is the starting point of all the local results derived for nonlinear control, and is stated without proof.

THEOREM Let the linearized equations of (2.12) around the origin be

$$
\begin{aligned}
z(k+1) &= Az(k) + bu(k) \\
w(k) &= cz(k)
\end{aligned}
\tag{2.15}
$$

where $A = \left.\frac{\partial F}{\partial x}\right|_{x=0,u=0}$, $b = \left.\frac{\partial F}{\partial u}\right|_{x=0,u=0}$, and $c = \left.\frac{\partial H}{\partial x}\right|_{x=0,u=0}$. If the linearized system (2.15) is controllable (observable), the nonlinear system (2.12) is controllable (observable) in some neighborhood of the origin.

2.2.2.1.4 Stability

If $A$ in Equation (2.15) is a stable matrix, from Lyapunov’s works it is well known that the nonlinear system $\Sigma_d$ is asymptotically stable in a neighborhood of the origin. The controllability and observability of the linearized system are merely sufficient conditions for the corresponding properties to hold for (2.12) and are not necessary. Yet the theorem is important, as according to it the nonlinear system is well behaved in a neighborhood of the origin, if the linearized system is well behaved. The relevance of these comments will be made clear in Sections 2.4 and 2.5.

2.2.3 Adaptive Systems: Theoretical Considerations

All the problems discussed in this chapter can be considered as infinite-time or finite-time problems in nonlinear adaptive control. The term “adaptive control” refers to the control of partially known systems. Linear adaptive control deals with the identification and control of LTI systems with unknown parameters [15]. The nonlinear adaptive control problems of interest in this chapter are those in which the nonlinear functions $F(\cdot)$ and $H(\cdot)$ in the description of the controlled plant (2.12) are unknown or partially known. Obviously, this represents a very large class of systems for which general analytic methods are hard to develop. Many subclasses may have to be defined and suitable assumptions may have to be made to render them analytically tractable. In spite of the complexity of nonlinear adaptive systems, many of the questions that they give rise to are closely related to those in the linear case.
Although the latter seem simple (in hindsight), it is worth stressing that linear adaptive control gave rise to a multitude of difficult questions for a period of forty years and that some of them have not been completely answered thus far. Because the statement of the problems in the linear case, the assumptions made and the reasons for the difficulties encountered are all directly relevant for the issues discussed in this chapter, we provide a brief introduction to them in this section. The plant to be controlled is described by the linear state Equations (2.1) or (2.2) depending upon whether the system is MIMO or SISO. If only the inputs and outputs are accessible, the ARMA representations (2.5) and (2.6) are used. In all cases, the parameters of the plant are assumed to be unknown. Identification and control: Identification involves the estimation of the unknown parameters of the plant using either historical input-output data or online input-output measurements. Indirect adaptive control involves the adjustment of controller parameters based on the estimates of the plant parameters.
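
The identification step can be illustrated by a short Python sketch. It estimates the two parameters of a first-order model of the ARMA form (2.6) from recorded input-output data by ordinary least squares. This is a standard estimator used here purely for illustration, not a method prescribed by the chapter; the plant parameters, the test input, and all function names are assumptions.

```python
import math

# Hypothetical "unknown" plant y(k+1) = alpha*y(k) + beta*u(k); values are assumptions.
ALPHA_TRUE, BETA_TRUE = 0.5, 1.0

def collect_data(num_steps):
    """Drive the plant with a bounded test input and record (y(k), u(k), y(k+1))."""
    y, records = 0.0, []
    for k in range(num_steps):
        u = math.sin(0.7 * k) + 0.5 * math.cos(1.3 * k)
        y_next = ALPHA_TRUE * y + BETA_TRUE * u
        records.append((y, u, y_next))
        y = y_next
    return records

def least_squares_estimate(records):
    """Solve the 2x2 normal equations for [alpha, beta] in y(k+1) = alpha*y(k) + beta*u(k)."""
    syy = sum(y * y for y, u, t in records)
    syu = sum(y * u for y, u, t in records)
    suu = sum(u * u for y, u, t in records)
    syt = sum(y * t for y, u, t in records)
    sut = sum(u * t for y, u, t in records)
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("input is not sufficiently exciting")
    alpha_hat = (syt * suu - syu * sut) / det
    beta_hat = (syy * sut - syu * syt) / det
    return alpha_hat, beta_hat

alpha_hat, beta_hat = least_squares_estimate(collect_data(50))
print(alpha_hat, beta_hat)
```

In an indirect adaptive scheme, estimates of this kind would then be used to adjust the controller parameters. With noise-free data and a sufficiently rich input, as assumed here, the estimates coincide with the true parameters up to rounding.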

Comment 1 Parameters that are adjusted online become state variables. This also makes (linear) adaptive systems nonlinear. Hence, all system characteristics have to be discussed in a higher-dimensional state space. For example, the stability analysis of a linear plant of dimension n has to be discussed in a 3n-dimensional space (2n corresponding to the unknown parameters of the plant).

The reference model: Controller parameters can be adjusted either to minimize a specified performance criterion, or to track the output of a reference model (model reference adaptive control). In the latter case, the reference model has to be chosen properly so that the problem is well posed. Choosing a linear reference model to possess desired characteristics is relatively straightforward. Choosing a nonlinear reference model is considerably more difficult and requires a detailed knowledge of the plant. Therefore, it is not surprising that in most applications linear models are chosen.

Direct and indirect adaptive control: In direct adaptive control the control input is determined directly from a knowledge of the output (control) error. To assure boundedness of all the signals, strong assumptions such as positive realness of the plant transfer function have to be made. If indirect control is used, it must first be demonstrated that a controller structure exists that can result in the convergence of the output error to zero.

Comment 2 Existence questions in linear adaptive systems lead to linear algebraic equations (e.g., the Diophantine equation). In nonlinear systems, the corresponding equations would be nonlinear.

Algebraic and analytic parts: All conventional adaptive methods invariably consist of two stages. Demonstrating the existence of a controller, mentioned earlier, constitutes the first stage, which is algebraic.
Determining adaptive laws for adjusting the controller parameters so that the overall system is stable and the output error tends to zero constitutes the second stage and is the analytical part. Both stages are relevant for all the problems treated in this chapter. Comment 3 The resolution of the above problem in linear adaptive control in the late 1970s took several years. This was in spite of the advantage that could be taken of many of the subsystems being linear. For example, the error models relating parametric errors to output errors are linear (refer to Section 2.3.2). Since this advantage is lost in nonlinear adaptive control, the problem is significantly more complex. Nonlinear plants with a triangular structure: A more general class of nonlinear systems, whose adaptive control has been investigated rigorously in the literature, are those that are in canonical form with constant unknown parameters. Two types of plants that have been analyzed are defined below [16, 17].

DEFINITION A system is said to be in parametric pure-feedback (PPF) form if

$$
\begin{aligned}
\dot{z}_i &= z_{i+1} + \theta^T \gamma_i(z_1, \ldots, z_{i+1}), \quad i = 1, 2, \ldots, n-1 \\
\dot{z}_n &= \gamma_0(z) + \theta^T \gamma_n(z) + [\beta_0(z) + \theta^T \beta(z)]\, u
\end{aligned}
\tag{2.16}
$$

DEFINITION A system is said to be in parametric strict-feedback (PSF) form if

$$
\begin{aligned}
\dot{z}_i &= z_{i+1} + \theta^T \gamma_i(z_1, \ldots, z_i), \quad i = 1, 2, \ldots, n-1 \\
\dot{z}_n &= \gamma_0(z) + \theta^T \gamma_n(z) + \beta_0(z)\, u
\end{aligned}
\tag{2.17}
$$

where $z = [z_1, z_2, \ldots, z_n]$ and $\theta \in \mathbb{R}^p$ is a vector of unknown parameters. Stabilizing adaptive controllers have been developed for systems in both PPF and PSF forms where the result is local in nature in the former and global in the latter.

Comment 4 The proof of stability given in References [16, 17] for the above problems is strongly dependent on the fact that the nonlinear functions $\gamma_0(\cdot)$, $\beta_0(\cdot)$, and $\beta(\cdot)$ are known and smooth (so their derivatives can also be used in the control laws) and the only unknown in the system is the constant vector $\theta$. Naturally, the proofs are no longer valid if any of the above assumptions does not hold (see comments in Section 2.4.4).

2.2.4 Problem Statement

As stated in the introduction, the historical developments in the field of neurocontrol have traversed the same paths as those of linear adaptive control, even as the latter have closely paralleled those of linear control theory. In this section we consequently confine our attention to the same sequence of problems that were resolved in the two preceding fields in the past four decades. These are concerned with the identification and control of nonlinear dynamical systems.

2.2.4.1 Plant

We assume that the plant (or process) to be controlled is described by the discrete-time state Equations (2.12):

$$
\Sigma:\ \begin{aligned}
x(k+1) &= F[x(k), u(k)], & F(0, 0) &= 0 \\
y(k) &= H[x(k)], & H(0) &= 0
\end{aligned}
\tag{2.18}
$$

and the functions F and H are smooth. If F and H are known, the problem belongs to the domain of nonlinear control. If F and H are unknown or partially known, the problem is one of nonlinear adaptive control. Naturally, as in the case of linear adaptive control, we will be interested first in the questions that arise in the control problem
when F and H are known, and later on how the methods proposed can be modified for the adaptive case.

A number of factors influence both the problems posed and the methods used for resolving them. Among these the most important are the assumptions about the functions $F(\cdot)$ and $H(\cdot)$, the stability of the system $\Sigma$, and the accessibility of its state variables. If control has to be carried out using only the inputs and outputs of $\Sigma$, a different representation of the system will be needed. This would naturally call for a modification of the methods used for identification and control.

Three problems are presented below. In the first two problems the plant is assumed to be stable. In the third problem, the plant is assumed to be unstable, and identification and control proceed concurrently to make the overall system stable (this corresponds to the major stability problem of adaptive control resolved in 1980). In all cases, it is assumed that external inputs are bounded with known bounds, and that the region in the state space in which the trajectories of $\Sigma$ should lie is also specified.

PROBLEM 1 (Identification) The discrete-time plant $\Sigma$ is described by Equations (2.12) where $F(\cdot)$ and $H(\cdot)$ are unknown. The input $u(\cdot)$ satisfies the condition $\|u(t)\| \leq c_u$ and $\Sigma$ is BIBO (bounded-input, bounded-output) stable, so that $\|x(t)\| \leq c_x$ and $\|y(t)\| < c_y$, where $c_u$, $c_x$, and $c_y$ are known constants. The state $x(k)$ of $\Sigma$ is accessible at every time instant $k$.

1. Determine a suitable representation for a model $\hat{\Sigma}$ of the plant whose output $\hat{x}(\cdot)$ satisfies the condition $\lim_{k \to \infty} \|x(k) - \hat{x}(k)\| < \epsilon_1$.
2. If $y(k)$ but not $x(k)$ is accessible, determine an input-output model $\hat{\Sigma}_{I/O}$ of the system such that the output $\hat{y}(k)$ of the model satisfies $\lim_{k \to \infty} \|y(k) - \hat{y}(k)\| \leq \epsilon_2$ for the set of input-output pairs provided, where $\epsilon_1$ and $\epsilon_2$ are prescribed constants.

PROBLEM 2 (Control of a stable plant) Assuming that $\Sigma$ is stable and that models $\hat{\Sigma}$ and $\hat{\Sigma}_{I/O}$ satisfying the conditions of Problem 1 have been determined, the following control problems may be stated:

1. Determine a feedback control law $u(k) = \gamma(x(k))$, where $\gamma(\cdot)$ is a smooth function, such that every initial condition $x_0$ in a neighborhood of the origin is transferred to the equilibrium state in a finite number of steps.
2. Assuming that only the input and output of $\Sigma$ are accessible, determine a control law such that $x(k)$ tends to the equilibrium state in a finite time.
3. (Set point regulation) In problems (1) and (2) determine a control law such that the output $y(\cdot)$ of $\Sigma$ is regulated around a constant value.
4. (Tracking) Given a stable reference model $\Sigma_m$ defined by

$$
\Sigma_m:\ \begin{aligned}
x_m(k+1) &= F_m(x_m(k), r(k)) \\
y_m(k) &= H_m(x_m(k))
\end{aligned}
\tag{2.19}
$$

where $x_m(k) \in R^n$ and $y_m(k) \in R^m$, and $F_m$ and $H_m$ are known, determine a feedback control law such that
$$\lim_{k\to\infty} |y(k) - y_m(k)| < \epsilon$$ (2.20)
where $\epsilon$ is a specified constant.
PROBLEM 3 (Control of an unstable plant) In this case identification and control of the plant (which is assumed to be unstable) have to be carried out simultaneously. All four cases stated in Problem 2 can also be considered in this case.
Comment 5 As in classical adaptive control we will be interested in both the algebraic and the analytical parts of the solution. These are discussed in Section 2.4.1.
All the problems stated above can be addressed either from a strictly theoretical point of view or as those that arise in the design of identifiers and controllers in real applications. In this chapter, we are interested in both classes of problems. In the latter case, the prior information that is available concerning the plant as well as mathematical tractability will dictate to a large extent the models used for both identification and control. Some of the questions that arise in this context are listed below:
1. Structures of identifiers and controllers and the use of feedforward networks and recurrent networks to realize them
2. The algorithms used to adjust the parameters of the neural networks
3. The questions of stability that arise in the various cases
2.2.4.2 An Area for Future Research In control theory as well as in adaptive control, after problems involving isolated systems had been addressed, interest invariably shifted to problems in which multiple dynamic systems are involved. Decentralized control, distributed control, and hierarchical control come under this category. More recently, there has been a great deal of research activity in multiagent systems in which many dynamic systems interact. Also interest in distributed architectures has increased, since researchers in control theory and computer science believe that they would enhance our ability to solve complex problems. The above comments indicate that interaction of dynamical systems can arise due to a variety of factors ranging from practical physical considerations to desire for increased efficiency. When dealing with interacting or interconnected systems, neural networks play a critical role similar to that in the problems described earlier. A generic problem of interconnected nonlinear systems may be stated as follows.
The overall system $\Sigma$ consists of a set of subsystems $\Sigma_i$ (i = 1, 2, . . . , N):
$$\begin{aligned} \Sigma_1 &: \; x_1(k+1) = f_1\big(x_1(k),\, h_1[\bar{x}_1(k)],\, u_1(k)\big) \\ \Sigma_2 &: \; x_2(k+1) = f_2\big(x_2(k),\, h_2[\bar{x}_2(k)],\, u_2(k)\big) \\ &\;\;\vdots \\ \Sigma_N &: \; x_N(k+1) = f_N\big(x_N(k),\, h_N[\bar{x}_N(k)],\, u_N(k)\big) \end{aligned}$$ (2.21)
where $n = \sum_{i=1}^{N} n_i$ is the dimension of the state space of $\Sigma$, $x_i \in R^{n_i}$ is the state of subsystem $\Sigma_i$, and $\bar{x}_i \in R^{n-n_i}$ denotes the states of the remaining N − 1 subsystems. Each subsystem $\Sigma_i$ is affected by the other subsystems through an unknown smooth function $h_i(\cdot)$. Depending upon the nature of the problem, the different subsystems may compete or cooperate with each other to realize their overall objectives. How the various systems identify their dynamics in the presence of uncertainty, how they acquire their information, and whether communication is permitted between them constitute different aspects of the problems that arise. For mathematical tractability, much of the research in progress on problems of the type described above is restricted to linear systems. In Section 2.4, one such problem dealing with decentralized adaptive control is discussed. However, because most real systems are in fact nonlinear, it is only reasonable to expect increased interest in the future in nonlinearly interconnected systems.
2.3 Neural Networks, Adaptive Laws, and Stability
In the following sections neural networks are used as identifiers and controllers in dynamic systems. The type of networks to be used, the adaptive laws for adjusting the parameters of the network based on available data, and the stability and robustness issues that have to be addressed are all important considerations in their design. In this section we comment briefly on each of the above aspects. 2.3.1 Neural Networks Although numerous network architectures have been proposed in the literature, we will be concerned mainly with two broad classes of networks in this chapter: (1) feedforward networks and (2) recurrent networks. The former are static maps whereas the latter are dynamic maps. Even though both of them have been studied extensively in the literature, for the sake of continuity, we provide brief introductions to both of them.
FIGURE 2.1 Neural networks: (a) a multilayer perceptron network (MPN); (b) a radial basis function (RBF) network.
2.3.1.1 Feedforward Networks
The most commonly used feedforward networks are the multilayer perceptron network (MPN) and the radial basis function network (RBFN). An N-layer MPN with input $u \in R^n$ and output $y \in R^n$ can be described by the equation:
$$y = \Gamma\big[W_N \Gamma[W_{N-1} \cdots \Gamma[W_2 \Gamma[W_1 u + b_1] + b_2] + \cdots + b_{N-1}] + b_N\big]$$ (2.22)
where $W_i$ is the weight matrix associated with the ith layer, the vectors $b_i$ (i = 1, 2, . . . , N) represent the threshold values for each node in the ith layer, and $\Gamma$ is a static nonlinear operator with an output vector $[\gamma(x_1), \gamma(x_2), \ldots, \gamma(x_n)]^T$ corresponding to an input $[x_1, x_2, \ldots, x_n]^T$, where $\gamma: R \to [-1, 1]$ is a smooth function. A three-layer network is shown in Figure 2.1a. It is seen that each layer of the network consists of multiplications by constants (elements of the weight matrix), summation, and the use of a single nonlinear map $\gamma$.
Radial basis function networks, which are an alternative to MPN, represent the output y as a weighted sum of basis (or activation) functions $R_i: R^n \to R$, where i = 1, . . . , N. If $y \in R$, the RBFN is described by $y = W^T R(u) + W_0$, where $W = [W_1, W_2, \ldots, W_N]^T$ is a weight vector multiplying the N basis functions having $u = [u_1, u_2, \ldots, u_n]^T$ as the input and $W_0$ is an offset weight. Quite often Gaussian functions are used as radial basis functions so that
$$R_i(u) = \exp\left(-\sum_{j=1}^{n} \frac{(u_j - c_{ij})^2}{\sigma_{ij}^2}\right)$$
where $c_i = [c_{i1}, c_{i2}, \ldots, c_{in}]$ is the center of the ith receptive field, and $\sigma_{ij}$ is referred to as the width of the Gaussian function. An RBFN is shown in Figure 2.1b. Since the function $R(u)$ is predetermined, the output is a linear function of the elements of W.
For the purposes of this chapter, both MPN and RBFN enjoy two important characteristics. The first is their ability to approximate nonlinear maps. The second is the fact that for such networks, different methods of adjusting their parameters have been developed and are generally known. These methods will be discussed in Section 2.3.3.
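As an illustration of the two static maps just described, the following minimal Python sketch evaluates Equation (2.22) and the Gaussian RBFN output. It assumes $\gamma = \tanh$ (one possible smooth map into [−1, 1]); the function names and the list-of-rows weight layout are hypothetical choices made here for illustration, not notation from the chapter.

```python
import math

def gamma_vec(x):
    """Static operator Gamma: apply gamma = tanh (an assumed choice) elementwise."""
    return [math.tanh(v) for v in x]

def affine(W, u, b):
    """Compute W u + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, u)) + bi for row, bi in zip(W, b)]

def mpn_forward(layers, u):
    """Evaluate Equation (2.22): every layer (W_i, b_i) is followed by Gamma."""
    out = list(u)
    for W, b in layers:
        out = gamma_vec(affine(W, out, b))
    return out

def gaussian_basis(u, center, widths):
    """R_i(u) = exp(-sum_j (u_j - c_ij)^2 / sigma_ij^2)."""
    return math.exp(-sum((uj - cj) ** 2 / s ** 2
                         for uj, cj, s in zip(u, center, widths)))

def rbfn_forward(weights, w0, centers, widths, u):
    """Scalar RBFN output y = W^T R(u) + W_0 (linear in the weights)."""
    basis = [gaussian_basis(u, c, s) for c, s in zip(centers, widths)]
    return sum(w * r for w, r in zip(weights, basis)) + w0
```

Because the centers and widths are fixed, doubling the weight vector of `rbfn_forward` (with zero offset) doubles the output, which is the linearity in W noted above; no such property holds for the inner-layer weights of `mpn_forward`.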
2.3.1.2 Recurrent Networks In contrast to the feedforward networks considered thus far, which are static maps from one finite-dimensional vector space to another, recurrent networks are dynamic maps that map input time signals into output time signals. This is accomplished by providing them with memory and introducing time delays and feedback. It is well known that any LTI discrete-time system can be realized using only the operations of multiplication by a constant, summation, and a unit delay. A static feedforward network, on the other hand, includes multiplication by a constant and summation and a single appropriate nonlinear function (e.g., the sigmoid). Recurrent networks, which are nonlinear dynamic maps, can be generated using all four operations described above, that is, addition, multiplication by constant, delay, and a sigmoid nonlinearity. As in the case of static networks, interest in recurrent neural networks also grew from successes in practical applications. Through considerable experience, people in industry became convinced of the usefulness of such networks for the modeling and control of dynamic systems. As shown later, recurrent networks provide a natural way of modeling nonlinear dynamic systems. It was also found that recurrent networks used as controllers are significantly more robust than feedforward controllers to changes in plant characteristics. It was argued by Williams [18] in 1990 that recurrent neural networks can be designed to have significantly new capabilities. Since then, it has been shown that recurrent networks can serve as sequence recognition systems, generators of sequential patterns, and nonlinear filters. They can also be used to transform one input sequence into another, all of which thus far have not been exploited in control theory. Delays can be introduced anywhere in a feedforward network to make it dynamic. 
The number of such delays can vary from unity (if only the output signal is fed back to the input) to $N^2$, where N represents the sum of the input, hidden, and output nodes (and delays exist between every node and all the other nodes in the system). For practical as well as theoretical reasons, compromises have to be made on the total number of delays used. Many different structures have been suggested in the neural network literature by both computer scientists and engineers. We present only two structures, which will be needed for addressing the problems stated earlier. Both of them use the universal approximating property of multilayer neural networks. Consider first the general state Equation (2.18) representing an nth-order nonlinear system. $F: R^n \times R^r \to R^n$ can be approximated using a multilayer neural network with (n + r) inputs and n outputs, and delays as shown in Figure 2.2. Similarly, using a separate multilayer network, the function $H: R^n \to R^m$ can be approximated. The representation of the dynamic system given by Equation (2.18) is shown in Figure 2.2. $\hat{F}$ and $\hat{H}$ represent the approximations of F and H, respectively.
FIGURE 2.2 State vector model.
If the state variables are not accessible and an input-output model $\Sigma_{I/O}$ of the system (with relative degree d) is needed, it has been shown [19, 20] that
a SISO system can be described by the equation:
$$y(k+d) = f[y(k), y(k-1), \ldots, y(k-n+1), u(k), u(k-1), \ldots, u(k-n+1)]$$ (2.23)
in a neighborhood of the equilibrium state. Similarly, for a multivariable system with r inputs ($u(k) \in R^r$) and m outputs ($y(k) \in R^m$), and relative degree $d_i$ for the ith output, it has been shown that a representation of the form:
$$\begin{aligned} y_1(k+d_1) &= f_1[y(k), y(k-1), \ldots, y(k-v+1), u(k), u(k-1), \ldots, u(k-v+1)] \\ &\;\;\vdots \\ y_m(k+d_m) &= f_m[y(k), y(k-1), \ldots, y(k-v+1), u(k), u(k-1), \ldots, u(k-v+1)] \end{aligned}$$ (2.24)
exists in a neighborhood of the equilibrium state. These are referred to as NARMA (nonlinear ARMA) models. In Figure 2.3, the realization of a SISO system is shown using tapped delay lines. The multivariable system (2.24) can also be realized in a similar fashion. $\hat{f}$ represents an approximation of f in Equation (2.23). The recurrent network models shown in Figures 2.2 and 2.3 can be used either as identifiers or controllers in the problems stated earlier.
FIGURE 2.3 Input-output model.
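The tapped-delay-line realization of Equation (2.23) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `make_tapped_delay_predictor` is a hypothetical helper name, and `f_hat` is any callable standing in for the trained multilayer network $\hat{f}$ (a simple linear stand-in is used in the usage lines).

```python
from collections import deque

def make_tapped_delay_predictor(f_hat, n):
    """Return step(y_k, u_k) that predicts y(k+d) as in Equation (2.23).

    The two delay lines hold y(k), ..., y(k-n+1) and u(k), ..., u(k-n+1);
    the prediction horizon d is whatever f_hat was trained for.
    """
    y_taps = deque([0.0] * n, maxlen=n)
    u_taps = deque([0.0] * n, maxlen=n)

    def step(y_k, u_k):
        # Push the newest samples; the oldest ones drop off the far end.
        y_taps.appendleft(y_k)
        u_taps.appendleft(u_k)
        return f_hat(list(y_taps), list(u_taps))

    return step

# Hypothetical stand-in for a trained network f_hat (n = 2).
def f_hat(y, u):
    return 0.5 * y[0] - 0.1 * y[1] + u[0] + 0.3 * u[1]

predict = make_tapped_delay_predictor(f_hat, n=2)
```

Feeding measured plant outputs into `step` gives a predictor; feeding the model's own past predictions back in place of `y_k` would turn the same structure into a recurrent model.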
2.3.1.3 System Approximation
The two principal methods for approximating a system described by a recursive equation can be illustrated by considering the estimation of the parameters of a linear system, described by Equation (2.8):
$$y(k+d) = \sum_{i=0}^{n-1} \alpha_i\, y(k-i) + \sum_{j=0}^{n-1} \beta_j\, u(k-j)$$ (2.25)
where $\alpha_i$ and $\beta_j$ are unknown and need to be estimated. A series-parallel identification model has the form:
$$\hat{y}(k+d) = \sum_{i=0}^{n-1} \hat{\alpha}_i(k)\, y(k-i) + \sum_{j=0}^{n-1} \hat{\beta}_j(k)\, u(k-j)$$ (2.26)
where $\hat{\alpha}_i(k)$ and $\hat{\beta}_j(k)$ are the parameter estimates at time k. The output error equation has the simple form:
$$\tilde{y}(k+d) = \sum_{i=0}^{n-1} \tilde{\alpha}_i(k)\, y(k-i) + \sum_{j=0}^{n-1} \tilde{\beta}_j(k)\, u(k-j)$$ (2.27)
where $\tilde{y}$, $\tilde{\alpha}_i$, and $\tilde{\beta}_j$ represent the output and parameter errors at time k. Since this has the standard form of error model 1 (described in Section 2.3.2), stable adaptive laws for adjusting $\hat{\alpha}_i(k)$ and $\hat{\beta}_j(k)$ can be determined by inspection. If, however, a recurrent (or parallel) identification model is used, the equation describing the model is no longer simple, and is a difference equation, as shown below:
$$\hat{y}(k+d) = \sum_{i=0}^{n-1} \hat{\alpha}_i(k)\, \hat{y}(k-i) + \sum_{j=0}^{n-1} \hat{\beta}_j(k)\, u(k-j)$$ (2.28)
since the estimate $\hat{y}(k+d)$ at time k + d depends upon past estimates $\hat{y}(k), \hat{y}(k-1), \ldots, \hat{y}(k-n+1)$. The determination of stable adaptive laws for adjusting $\hat{\alpha}_i(k)$ and $\hat{\beta}_j(k)$ is substantially more complex in this case. In fact, such adaptive laws are not available and only approximate methods are currently known.
Comment 6 The following points are worth emphasizing. The series-parallel model is not truly a model but merely a predictor. In contrast to this, the recurrent model is a true model of the system with all the advantages of such a model (e.g., control strategies can be tried out on the model rather than the plant). If an efficient predictor is adequate for control purposes (as has been demonstrated in linear adaptive control) the simplicity of the series-parallel model, and the assured stability of the overall control system, based on the adjustment of the control parameters using plant parameter estimates, may outweigh the theoretical advantages of the recurrent model in some applications.
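To make the series-parallel case concrete, the sketch below identifies a second-order instance of Equation (2.25) with d = 1. The update used is a normalized gradient law, $\hat{\theta}(k+1) = \hat{\theta}(k) - e\,\phi/(1 + \phi^T\phi)$, which is one standard choice assumed here for illustration rather than the chapter's specific law; the plant coefficients, the random input, and the function name are likewise hypothetical.

```python
import random

def identify_series_parallel(alpha, beta, steps=3000, seed=0):
    """Series-parallel identification of Eq. (2.25) with d = 1.

    Returns the final estimates and the history of the parameter error norm.
    """
    rng = random.Random(seed)
    n = len(alpha)
    theta = list(alpha) + list(beta)     # true parameters (unknown to the identifier)
    theta_hat = [0.0] * (2 * n)          # estimates [alpha_hat, beta_hat]
    y_hist = [0.0] * n                   # y(k), ..., y(k-n+1)
    u_hist = [0.0] * n                   # u(k), ..., u(k-n+1)
    errors = []
    for _ in range(steps):
        u_hist = [rng.uniform(-1.0, 1.0)] + u_hist[:-1]
        phi = y_hist + u_hist            # regressor uses PLANT outputs, Eq. (2.26)
        y_next = sum(t * p for t, p in zip(theta, phi))      # plant output y(k+1)
        y_pred = sum(t * p for t, p in zip(theta_hat, phi))  # predictor output
        e = y_pred - y_next              # output error, linear in the parameter error
        norm = 1.0 + sum(p * p for p in phi)
        theta_hat = [t - e * p / norm for t, p in zip(theta_hat, phi)]
        errors.append(sum((a - b) ** 2 for a, b in zip(theta_hat, theta)) ** 0.5)
        y_hist = [y_next] + y_hist[:-1]
    return theta_hat, errors

theta_hat, errors = identify_series_parallel([0.5, -0.3], [1.0, 0.5])
```

Because the regressor contains measured plant outputs, the error is linear in the parameter error and the parameter error norm cannot increase from one step to the next. Replacing `y_hist` with past predictions would give the parallel model of Equation (2.28), for which this simple argument no longer applies.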
FIGURE 2.4 Error models.
2.3.2 Stable Adaptive Laws: Error Models
The laws for adjusting the parameters derived in classical adaptive control are based on simple linear models, known as error models. These relate the parameter errors $\phi(t)$ to the output error $e(t) \in R$ ($R^m$ for MIMO) between the actual output of the plant and a desired output (generally the output of a reference model). The objective is to determine $\dot{\phi}(t)$ using all the information available at time t to make the system stable, so that e(t) tends to zero. The study of error models is rendered attractive by the fact that by analyzing these models, which are independent of specific applications, it is possible to obtain insights into the behavior of a large class of adaptive systems. The error models [15] are shown in Figure 2.4. The equations describing the models, the adaptive laws proposed, and the Lyapunov functions which assure stability in each case are given below.
Error Model 1
$\phi(t), u(t) \in R^n$ and $\phi^T(t)u(t) = e_1(t)$.
Adaptive law: $\dot{\phi}(t) = -e_1(t)u(t)$ if the input u(t) is bounded, and $\dot{\phi}(t) = -\dfrac{e_1(t)u(t)}{1 + u^T(t)u(t)}$ when it is not known a priori that u(t) is bounded.
Lyapunov function: $V(\phi) = \frac{1}{2}\phi^T(t)\phi(t)$ and $\dot{V}(t) = -e_1^2(t)$ or $-\dfrac{e_1^2(t)}{1 + u^T(t)u(t)} \le 0$.
Error Model 2
$$\dot{e}(t) = Ae(t) + b\phi^T(t)u(t)$$ (2.29)
where the matrix A and vector b are known, A is stable, and (A, b) is controllable.
Adaptive law:
$$\dot{\phi}(t) = -e^T(t)Pb\,u(t) \quad \text{or} \quad \dot{\phi}(t) = -\frac{e^T(t)Pb\,u(t)}{1 + u^T(t)u(t)}, \qquad A^TP + PA = -Q < 0$$ (2.30)
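A scalar instance of error model 2 can be simulated to see the adaptive law (2.30) at work. The sketch below assumes A = −a, a forward-Euler integration step, and a two-frequency bounded input; these choices and the function name are illustrative assumptions, and the quantity tracked is $V = e^TPe + \phi^T\phi$, whose derivative along the continuous-time trajectories is $-e^TQe$.

```python
import math

def simulate_error_model_2(a=1.0, b=1.0, q=2.0, phi0=(1.0, -0.5),
                           dt=1e-3, t_end=40.0):
    """Euler simulation of e' = -a e + b phi^T u with phi' = -e P b u (scalar e)."""
    p = q / (2.0 * a)                    # solves A^T P + P A = -Q for A = -a
    e, phi = 0.0, list(phi0)
    v_start = p * e * e + sum(x * x for x in phi)
    for k in range(int(t_end / dt)):
        t = k * dt
        u = [math.sin(t), math.cos(2.0 * t)]          # bounded input (assumed)
        de = -a * e + b * sum(x * y for x, y in zip(phi, u))
        dphi = [-e * p * b * ui for ui in u]          # unnormalized law of Eq. (2.30)
        e += dt * de
        phi = [x + dt * d for x, d in zip(phi, dphi)]
    v_end = p * e * e + sum(x * x for x in phi)
    return e, phi, v_start, v_end

e_final, phi_final, v_start, v_end = simulate_error_model_2()
```

With this sufficiently rich input, V ends well below its initial value and both the output error and the parameter error decay; with a constant input the output error would still vanish, but the parameter error need not.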
FIGURE 2.5 Error models for nonlinear systems.
Error Model 3
In error model 2, e(t) is not accessible, but $ce(t) = e_1(t)$ is accessible and $c[sI - A]^{-1}b$ is strictly positive real.
Adaptive law: $\dot{\phi}(t) = -e_1(t)u(t)$ or $\dot{\phi}(t) = -\dfrac{e_1(t)u(t)}{1 + u^T(t)u(t)}$
$$V(e, \phi) = e^TPe + \phi^T\phi \quad \text{and} \quad \dot{V}(t) = -e^TQe$$ (2.31)
By the Kalman–Yakubovich lemma [21], a matrix P exists which simultaneously satisfies the equations $A^TP + PA = -Q$ and $Pb = c^T$. It is worth emphasizing that when the input u(·) is not known a priori to be bounded, the adaptive laws have to be suitably normalized.
2.3.2.1 Error Models for Nonlinear Systems
In some simple nonlinear adaptive control problems as well as more complex problems in which appropriate simplifying assumptions are made, it may be possible to obtain an error model of the form shown in Figure 2.5. This is the same as error model 3 in which u = N(x), where x is the state of the unknown plant and N(·) is a continuous nonlinear function. If the same adaptive law as in error model 3, $\dot{\phi}(t) = -e_1(t)N(x(t))$, is used, then by the same arguments as before, it follows that e and φ are bounded. This ensures that x and consequently N(x) are bounded, and hence the output $e_1$ tends to zero.
Comment 7 Approximations of the type described in this model have been made widely in the neurocontrol literature without adequate justification. We shall comment on these at the end of Section 2.4.
2.3.2.2 Gradient-Based Methods and Stability
During the early days of adaptive control in the 1960s, the adjustment of the parameters of an adaptive system was made to improve performance. Extremum seeking methods and sensitivity methods that are gradient based were among the most popular methods used, and in both cases the parameters were adjusted online. Once the adjustment was completed, the stability of the overall system was analyzed, and conditions were established for local
stability. Thus, optimization of performance preceded stability analysis in such cases. In 1966 Parks [22], in a paper of great historical significance in adaptive control, conclusively demonstrated using a specific example that gradient methods for adjusting the parameters in an adaptive system can result in instability. At the same time he also showed that the system could be made globally stable using a design procedure based on Lyapunov’s method. This clear demonstration that gradient-based methods could become unstable tolled the death knell of such systems, and witnessed a shift in the next decade to design based on stability methods described earlier. In the following forty years adaptive control aimed at first stabilizing the overall system using stable adaptive laws, and later adjusting the fixed controller parameters of the system to improve performance within a stability framework. When neural networks are used in a system for identification and control, the overall system is nonlinear and it is very hard to derive globally stable adaptive laws for adjusting parameters. However, as shown in Section 2.4, numerous authors have continued to formulate problems in such a fashion that stable adaptive laws can still be determined. In view of the difficulties encountered in generating stable adaptive laws, most of the methods currently used for adjusting the parameters of a neural network are related to back propagation, and are gradient based as in classical adaptive control of the 1960s. Once again, it becomes necessary to examine the reasons for discarding gradient methods in the past and explore ways of reconciling them with stability and robustness of the overall system. In the example described by Parks in his classic paper [22], and later verified in numerous applications, it is the speed of adaptation that causes instability. 
If the frequency of adjustment of the parameters is such that the output error has a phase shift greater than $\pi/2$, adaptation may proceed exactly in a direction opposite to what is desired. Also, we note that in all the error models discussed earlier $\dot{\phi} \to 0$. Motivated by such considerations, researchers have been reexamining gradient-based methods which are both theoretically and practically attractive. In the following sections we shall assume that gradient methods result in stability if the operating point is stable, and the adjustments are slow compared to the dynamics of the system.
2.3.3 Adjustment of Parameters: Feedforward and Recurrent Networks
The simplicity of the adaptive laws for adjusting the parameters of the series-parallel model described earlier resulted from the fact that the error equation (relating parameter errors to output error) was linear. When the neural networks shown in Figure 2.1 are used to approximate a nonlinear function, the parameters of the networks are adjusted to minimize some specified norm $\|y - y_d\|$ where y is the output of the neural network and $y_d$ the output of the given nonlinear function when both have the same input u. For radial basis function networks, if $w_i$ is adjusted, $\partial e/\partial w_i$ is merely the output of the ith radial basis function. For multilayer networks the elements $\theta_i$ of
the parameter vector θ (the elements of the matrices $W_i$ and vectors $b_i$ in Figure 2.1a) are in general nonlinearly related to the output error. While the output error depends linearly on the weights $(W_3, b_3)$ in the output layer [if the activation functions γ(·) are omitted at the output], it depends nonlinearly on $(W_2, b_2)$ and $(W_1, b_1)$. Hence, one is forced to resort to gradient methods for adjusting these parameters. If w is a typical weight, the derivative of $e^2$ with respect to w is $\partial e^2/\partial w = 2e\,\partial e/\partial w$, so that $\partial e/\partial w$ has to be computed. Hence, a convenient method of computing $\partial e/\partial \theta_i$ is needed. "Back propagation" is such a method and is currently well established. In Reference [12], an architecture was proposed by which the partial derivatives of e with respect to parameters in an arbitrary number of layers can be realized. This is shown in Figure 2.6. Since our interest is in the control of complex systems using neural networks, and the latter quite often involves the cascading of several networks, the architecture in Figure 2.6 is found to be very useful in practical applications. Back propagation has been perhaps the most diversely used adaptive architecture in practical applications. In problems that do not involve feedback (such as function approximation and pattern recognition) it is simple to apply and does not pose any stability questions. If it is used in control problems, as described later, the adjustment of the parameters must be slow compared to the dynamics of the controlled system.
FIGURE 2.6 Architecture for back propagation.
In contrast to the above, the recurrent network, as stated earlier, is a dynamic map and the inputs and outputs are not vectors in a finite-dimensional
vector space but time sequences. Mathematically they are substantially more complex than feedforward networks, but it is their very complexity that makes them attractive because they permit a wide variety of dynamic behaviors to be programmed by the proper choice of weights. In fact, it has been shown that a recurrent network with enough units can approximate any dynamic system [23]. Even though the mathematical tools for dealing with such networks are not sufficiently well developed, it is not surprising that they are being widely studied at the present time and there is every indication that their use in the control of complex systems is bound to increase in the future. The problem of adjusting the parameters of a recurrent network for approximating a desired behavior has been addressed using supervised learning based on an output error signal, reinforcement learning based on a scalar reward signal, and unsupervised learning based on the statistical features of the input signal. Our interest here is in supervised learning algorithms. A network with a fixed structure (such as the ones described earlier) and a set of parameters, inputs, and desired outputs is specified. An error function is defined, and it is the gradient of this function with respect to the parameters that is desired. Numerous authors in engineering and other fields (including computer science and computational neuroscience) have proposed different algorithms for adjusting the parameters of recurrent networks, and we refer the reader to papers by Pearlmutter [24] and Rumelhart et al. [8, 25] on this subject. In the engineering literature, the problem was considered independently by Werbos [26], Narendra and Parthasarathy [27] and Williams and Zipser [28] in the early 1990s. 
Werbos refers to it as “back propagation through time,” Narendra and Parthasarathy as “dynamic back propagation,” and Williams and Zipser as “real-time recurrent learning.” In spite of the different origins and terminologies, the basic ideas are essentially the same and involve the computation of partial derivatives of time functions with respect to parameters in dynamic systems. A brief description of the principal ideas contained in each of the approaches is given below, and the reader is referred to the source papers for further details.
2.3.3.1 Back Propagation through Time
The basic idea of this approach is that corresponding to every recurrent network it is possible to construct a feedforward network with identical behavior over a finite number of steps. The finiteness of the time interval permits the neural network to be unfolded into a multilayer network, so that standard back propagation methods can be used for updating the parameters. Back propagation through time was first described in 1974 by Werbos and was rediscovered independently by Rumelhart et al. (1986). The principal idea of the method is best described by considering two steps of a recursive equation:
$$x(k+1) = N(x(k), u(k), \theta)$$ (2.32)
where θ is an adjustable parameter vector. The system, over the instants 0, 1, 2, can be represented as shown in Figure 2.7. If u(0) and u(1) are known and x(0) is specified, the states x(1) and x(2) can be computed. If the desired states $x_d(1)$ and $x_d(2)$ are specified, static back propagation over two steps can be carried out to determine the gradient of an error function with respect to θ, and based on that, the parameter θ can be adjusted to decrease the error function over the interval. The above method can be readily extended to a finite number of steps. Since the input, output, and state information that has to be stored grows linearly with time, the interval over which optimization is carried out must be chosen judiciously from practical considerations. Having chosen the interval, the states are computed for the specified values of the inputs and the gradients computed using back propagation.
FIGURE 2.7 Back propagation through time.
Comment 8 The architecture for back propagation shown in Figure 2.6, which applies to an arbitrary number of layers, is a convenient aid for computing the necessary gradient while using this method.
2.3.3.2 Dynamic Back Propagation
The origin of dynamic back propagation may be traced back to a paper written in 1965 by McBride and Narendra [29] on "Optimization of time-varying systems." Determining gradients of error functions with respect to adjustable parameters and adjusting the latter along the negative gradients was well known in the 1960s. This was extended to time-varying systems in the above paper and it was shown that it led to a whole gamut of optimization procedures ranging from self-optimizing (quasi) stationary systems to optimal programming and optimal control. A procedure for determining the gradient of a performance index with respect to the parameters of a nonlinear dynamical system was proposed by Narendra and Parthasarathy in 1991 [27].
This work was naturally strongly influenced by numerous papers written in the 1960s by Narendra and McBride on gradient-based methods for the optimization of linear systems. The main idea is best illustrated by a simple example of a continuous-time system (similar results can be readily obtained for discrete-time systems, as well as multivariable systems).
Let a second-order system be described by the equation:
$$\ddot{y} + F(\alpha, \dot{y}) + y = u, \qquad y(0) = y_1, \quad \dot{y}(0) = y_2$$ (2.33)
The objective is to determine the value of α which minimizes
$$J(\alpha) = \frac{1}{T}\int_0^T [y(\tau, \alpha) - y_d(\tau)]^2\, d\tau = \frac{1}{T}\int_0^T e^2(\tau, \alpha)\, d\tau$$ (2.34)
where e(t, α) is the error between y(t, α) and a specified desired time function $y_d(t)$. A gradient-based method for adjusting α can be implemented if $\partial e(t, \alpha)/\partial \alpha$ ($= \partial y(t, \alpha)/\partial \alpha$) can be computed over the interval [0, T]. Differentiating Equation (2.33) with respect to α and denoting $\partial y(t, \alpha)/\partial \alpha = z$, we have (using $\dfrac{dF(\alpha, \dot{y})}{d\alpha} = \dfrac{\partial F}{\partial \alpha} + \dfrac{\partial F}{\partial \dot{y}}\dfrac{\partial \dot{y}}{\partial \alpha}$)
$$\ddot{z} + \frac{\partial F}{\partial \dot{y}}\,\dot{z} + z = -\frac{\partial F}{\partial \alpha}$$ (2.35)
If a time-varying sensitivity model described by Equation (2.35) can be constructed, z (the desired gradient) can be generated as its output. When the parameters of a neural network need to be adjusted, the desired partial derivatives $\partial F/\partial \dot{y}$ and $\partial F/\partial \alpha$ are known signals, so that the desired gradient z and the change in the parameter α can be computed at every instant of time.
Comment 9 The adjustment of α is assumed to be slow compared to the dynamics of the system and the methods were referred to as quasi-stationary methods in the 1960s.
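The sensitivity model (2.35) can be checked numerically. The sketch below integrates the plant (2.33) and the sensitivity model side by side with forward Euler and accumulates $dJ/d\alpha = \frac{2}{T}\int_0^T e\,z\,d\tau$. Everything specific here is an assumption made for illustration: the nonlinearity $F(\alpha, \dot{y}) = \alpha\dot{y} + \dot{y}^3$, the input, the desired output, the step size, and the zero initial conditions of z (which hold because the initial conditions of the plant do not depend on α).

```python
import math

def F(alpha, v):
    return alpha * v + v ** 3            # hypothetical F(alpha, y_dot)

def dF_dv(alpha, v):
    return alpha + 3.0 * v ** 2          # partial F / partial y_dot

def dF_dalpha(alpha, v):
    return v                             # partial F / partial alpha

def cost_and_gradient(alpha, T=10.0, dt=1e-3):
    """Return J(alpha) of Eq. (2.34) and dJ/dalpha from the sensitivity model."""
    y, v = 1.0, 0.0                      # y(0), y_dot(0)
    z, w = 0.0, 0.0                      # z = dy/dalpha and its time derivative
    J = dJ = 0.0
    for k in range(int(T / dt)):
        t = k * dt
        u = math.sin(t)                  # input (assumed)
        e = y - math.cos(0.5 * t)        # error against a desired output (assumed)
        J += e * e * dt / T
        dJ += 2.0 * e * z * dt / T
        # Euler step of the plant (2.33) and the sensitivity model (2.35).
        y_new = y + dt * v
        v_new = v + dt * (u - F(alpha, v) - y)
        z_new = z + dt * w
        w_new = w + dt * (-dF_dalpha(alpha, v) - dF_dv(alpha, v) * w - z)
        y, v, z, w = y_new, v_new, z_new, w_new
    return J, dJ

J0, grad = cost_and_gradient(0.8)
```

A central finite difference of `cost_and_gradient(alpha)[0]` around the same α should agree closely with `grad`, since the Euler recursion for z is the exact derivative of the Euler recursion for y. In an adaptive setting α would be adjusted slowly along the negative of this gradient, in line with Comment 9.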
2.3.3.3 Interconnection of LTI Systems and Neural Networks The method described above for determining the partial derivatives of the error functions with respect to the adjustable parameters in a recurrent network was used widely in the 1960s for optimizing LTI systems. Since the methods used are identical in the two cases, Narendra and Parthasarathy [27] suggested the use of dynamic back propagation for use in complex systems in which LTI systems and static multilayer networks are interconnected in arbitrary configurations. In all cases the method calls for the construction of a dynamic sensitivity model whose outputs are the desired partial derivatives. 2.3.3.4 Real-Time Recurrent Learning In 1989 Williams and Zipser [28] suggested a method very closely related to dynamic back propagation described earlier for discrete-time recurrent networks.
Let a recurrent network consist of r inputs $u_i(\cdot)$ (i = 1, 2, . . . , r) and n state variables $x_i$ (i = 1, 2, . . . , n) which are related by the state equations:
$$x_i(n+1) = \gamma\left(\sum_{j=1}^{n} w_{1,ij}\, x_j(n) + \sum_{j=1}^{r} w_{2,ij}\, u_j(n)\right)$$ (2.36)
or equivalently by the equation:
$$x_i(n+1) = \gamma\left(\sum_{j=1}^{n+r} w_{ij}\, z_j(n)\right)$$ (2.37)
where
$$z_j = \begin{cases} x_j, & 1 \le j \le n \\ u_{j-n}, & j > n \end{cases}$$
and γ is a squashing function. The objective is to determine the weights $w_{ij}$ so that the state $(x_1(n), x_2(n), \ldots, x_n(n))$ follows a desired trajectory $(x_{1d}(n), x_{2d}(n), \ldots, x_{nd}(n))$. To adjust the weights, the partial derivatives of $x_i(n)$ with respect to the weights have to be determined. If $w_{kl}$ is a typical weight, the effect of a change in it on the network dynamics can be determined by taking partial derivatives of both sides of Equation (2.37). This yields:
$$\frac{\partial x_i(n+1)}{\partial w_{kl}} = \gamma'(y_i(n))\left[\sum_{j=1}^{n} w_{ij}\,\frac{\partial x_j(n)}{\partial w_{kl}} + \delta_{ik}\, z_l(n)\right] \quad (i = 1, 2, \ldots, n)$$ (2.38)
where $y_i(n) = \sum_{j=1}^{n+r} w_{ij} z_j(n)$ and $\delta_{ik}$ is Kronecker's delta ($\delta_{ik} = 1$ if i = k and 0 otherwise). The term $\delta_{ik} z_l(n)$ represents the explicit effect of the weight $w_{kl}$ on the state $x_i$, and the first term in the brackets the implicit effect on the state due to network dynamics. Equation (2.38) is a linear time-varying difference equation (which corresponds to the sensitivity network described in dynamic back propagation) which can be used to generate the required partial derivatives in real time.
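The sensitivity recursion (2.38) translates almost line by line into code. The following minimal sketch propagates the state of Equation (2.37) together with all partial derivatives $\partial x_i/\partial w_{kl}$; it assumes γ = tanh (so that $\gamma'(y_i) = 1 - x_i^2(n+1)$), and the function name and nested-list storage are hypothetical choices for illustration.

```python
import math

def rtrl_rollout(w, inputs, x0):
    """Run Eq. (2.37) forward and propagate the sensitivities of Eq. (2.38).

    w is an n x (n + r) weight matrix (list of rows); inputs is a sequence of
    length-r input vectors. Returns the final state x and p, where
    p[i][k][l] = d x_i / d w_kl at the final time.
    """
    n, m = len(x0), len(w[0])
    x = list(x0)
    p = [[[0.0] * m for _ in range(n)] for _ in range(n)]   # zero at the start
    for u in inputs:
        z = x + list(u)                                     # z = [x; u]
        s = [sum(w[i][j] * z[j] for j in range(m)) for i in range(n)]
        x_new = [math.tanh(si) for si in s]
        p_new = [[[0.0] * m for _ in range(n)] for _ in range(n)]
        for i in range(n):
            dgamma = 1.0 - x_new[i] ** 2                    # gamma'(y_i(n)) for tanh
            for k in range(n):
                for l in range(m):
                    # Implicit effect through the states, explicit effect through w_kl.
                    implicit = sum(w[i][j] * p[j][k][l] for j in range(n))
                    explicit = z[l] if i == k else 0.0
                    p_new[i][k][l] = dgamma * (implicit + explicit)
        x, p = x_new, p_new
    return x, p

w = [[0.5, -0.3, 0.8], [0.2, 0.4, -0.6]]
inputs = [[math.sin(0.3 * k)] for k in range(20)]
x_final, p_final = rtrl_rollout(w, inputs, [0.1, -0.2])
```

The sensitivities are available at every step without storing the past trajectory, which is the sense in which the derivatives are generated in real time; the price is the $n \times n \times (n + r)$ array that has to be updated at each step.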
2.4 Identification and Control Methods
In this section, we consider several different methods which have been proposed for addressing identification and control problems of the form posed in Section 2.2. The first method is based on linearization and represents the only
one that we endorse without reservation. We discuss the approach in detail and comment on its practical implementation as well as its limitations. These comments are based on extensive simulation studies carried out in numerous doctoral theses at Yale and the Technical University of Munich [30]–[35] under the direction of the first author, but are not included here due to space limitations. In addition to the above, alternate methods proposed by other researchers are included in this section. These make different assumptions concerning the representation of the plant. We comment on and raise both theoretical and practical issues related to these methods. Even though only some methods are presented here, they subsume to a large extent most of the approaches that have appeared in the neurocontrol literature. The latter are included in the list of references contained at the end of the chapter. 2.4.1 Identification and Control Based on Linearization A vast body of literature currently exists on linear systems and linear control systems. The introductory chapters of most textbooks on these subjects emphasize the fact that almost all physical systems are nonlinear and that linear systems are obtained by linearizing the equations describing the system around an equilibrium state (LTI system) or around a nominal time-varying trajectory (linear time-varying system, LTV) so that they are valid in some neighborhood of the equilibrium state or trajectory. The fact that linear systems analysis and synthesis have proved extremely useful in a large number of practical applications attests to the fact that the neighborhoods in which the linearizations are good approximations of the dynamic systems are not always small. In this section we attempt, using similar methods, to obtain a more accurate representation of the dynamic system in a neighborhood of the equilibrium state, by including nonlinear terms that are small compared to the linear terms. 
Such representations enjoy many of the theoretical advantages of linear systems, while at the same time permitting improvements in performance using neural networks to compensate for nonlinearity.

2.4.1.1 System Representation
The developments in this section essentially follow those reported in Reference [36] by Chen and Narendra. These are used to derive the main results developed at Yale during the period 1990 to 2004. A nonlinear system $\Sigma$ is described by Equation (2.12). The local results for $\Sigma$ are based on the properties of its linearized system $\Sigma_L$ described by:

$$\Sigma_L:\quad x(k+1) = Ax(k) + Bu(k), \qquad y(k) = Cx(k) \tag{2.39}$$

where $A = \left.\frac{\partial F}{\partial x}\right|_{(x=0,\,u=0)}$, $B = \left.\frac{\partial F}{\partial u}\right|_{(x=0,\,u=0)}$, and $C = \left.\frac{\partial H}{\partial x}\right|_{x=0}$ are, respectively, the Jacobian matrices of $F$ and $H$ with respect to $x$ and $u$. We now rewrite the
actual system Equations (2.12) as:

$$\Sigma:\quad x(k+1) = Ax(k) + Bu(k) + f(x(k), u(k)), \qquad y(k) = Cx(k) + h(x(k)) \tag{2.40}$$

where $f$ and $h$ are called "higher-order functions." A block diagram representation of system (2.40) is seen in Figure 2.8. The representation shown in Figure 2.8 highlights the role played by the linear system in the synthesis of controllers, and suggests (as shown in this section) how methods proposed for linear systems can be modified for identifying and controlling the nonlinear plant, in some neighborhood of the equilibrium state. Using the approach proposed, linear identifiers and controllers are first designed before an attempt is made to include nonlinear terms. Since this is in general agreement with procedures followed by the neurocontrol community, the approach has both pedagogical and practical interest.

[FIGURE 2.8 Block diagram of system (2.40): input u, blocks b, unit delay, A, and c, higher-order functions f(·) and h(·), output y.]

2.4.1.2 Higher-Order Functions
We shall address all the problems stated in Section 2.2 in the context of the nonlinear system $\Sigma$. In all cases we use the inverse function theorem and the implicit function theorem as the principal tools and indicate how the assumptions concerning the higher-order function permit the application of the theorems. Because the problems invariably lead to the addition, multiplication, and composition of "higher-order functions," as functions of their argument, we formally define them as follows.

DEFINITION A continuously differentiable function $G(x) : \mathbb{R}^n \to \mathbb{R}^m$ is called a "higher-order function" if $G(0) = 0$ and $\left.\frac{\partial G}{\partial x}\right|_{x=0} = 0$. We denote this class by $\mathcal{H}$.

Thus, any smooth function that vanishes at the origin can be expressed as the sum of a linear function and a higher-order function. The following properties of functions in $\mathcal{H}$ are found to be useful and can be verified in a straightforward fashion:
1. If $A$ is a constant matrix and $F(\cdot) \in \mathcal{H}$, then $AF(\cdot) \in \mathcal{H}$.
2. If $F_1, F_2 \in \mathcal{H}$, then $F_1 F_2 \in \mathcal{H}$ and $F_1 + F_2 \in \mathcal{H}$.
3. If $F_1 \in \mathcal{H}$ and $F_2$ is continuously differentiable with $F_2(0) = 0$, then the composition $F_1 \circ F_2(\cdot) \in \mathcal{H}$.
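The decomposition in Equation (2.40) can be checked numerically. The sketch below is only an illustration: the plant map `F` is a hypothetical example, and the Jacobians are approximated by central differences rather than derived analytically. It extracts A and B at the origin and forms the remainder f = F - Ax - Bu, which should vanish at the origin together with its Jacobian if it belongs to the class of higher-order functions.

```python
import numpy as np

def F(x, u):
    # hypothetical plant map with F(0, 0) = 0, used for illustration only
    return np.array([0.5 * x[0] + x[1] + x[0] * x[1],
                     -0.3 * x[1] + u[0] + np.sin(x[0]) ** 2])

def jacobian(fun, v0, eps=1e-6):
    """Central-difference Jacobian of fun at the point v0."""
    v0 = np.asarray(v0, dtype=float)
    cols = []
    for j in range(v0.size):
        d = np.zeros_like(v0)
        d[j] = eps
        cols.append((fun(v0 + d) - fun(v0 - d)) / (2 * eps))
    return np.stack(cols, axis=1)

n, m = 2, 1
A = jacobian(lambda x: F(x, np.zeros(m)), np.zeros(n))   # dF/dx at the origin
B = jacobian(lambda u: F(np.zeros(n), u), np.zeros(m))   # dF/du at the origin

def f(x, u):
    """Higher-order part f = F - Ax - Bu of Equation (2.40)."""
    return F(x, u) - A @ x - B @ u

# Jacobian of f with respect to (x, u) at the origin; it should be (numerically) zero
Jf = jacobian(lambda v: f(v[:n], v[n:]), np.zeros(n + m))
```

For this example the linear part is A = [[0.5, 1], [0, -0.3]] and B = [[0], [1]], and the products and squares left over in `f` are exactly the terms a neural network would be asked to approximate.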
If in some neighborhood $U_1 \subset \mathbb{R}^n$ of the origin the equation

$$Ax + f(x) = y, \qquad A \in \mathbb{R}^{n \times n}, \quad x \in \mathbb{R}^n \tag{2.41}$$

is defined, where $A$ is nonsingular, then by the inverse function theorem there exists a neighborhood $U \subset U_1$ containing the origin such that $V = AU + f(U)$ is open, and for all $x \in U$

$$x = A^{-1}y + g(y), \qquad g(\cdot) \in \mathcal{H}. \tag{2.42}$$
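A small numerical sketch of Equation (2.42) follows. It assumes a hypothetical matrix A and higher-order function f, and it computes the local inverse by the fixed-point iteration x ← A⁻¹(y − f(x)), which is one simple way to realize the inverse near the origin; it is not the construction used in Reference [36].

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])          # nonsingular linear part (hypothetical)

def f(x):
    # hypothetical higher-order function: f(0) = 0 and zero Jacobian at 0
    return np.array([x[0] * x[1], x[0] ** 2])

def local_inverse(y, iters=50):
    """Solve Ax + f(x) = y near the origin by iterating x <- A^{-1}(y - f(x))."""
    x = np.linalg.solve(A, y)       # start from the linear estimate A^{-1} y
    for _ in range(iters):
        x = np.linalg.solve(A, y - f(x))
    return x

def g(y):
    """Correction term of Equation (2.42): g(y) = x - A^{-1} y."""
    return local_inverse(y) - np.linalg.solve(A, y)

x_true = np.array([0.05, -0.03])
y = A @ x_true + f(x_true)
x_rec = local_inverse(y)            # recovers x_true when y is close to the origin
```

Scaling y by 0.1 shrinks `g(y)` by roughly a factor of 100 in this example, which is the behavior expected of a function whose value and Jacobian both vanish at the origin.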
Similarly, it can be shown that if $U_1 \subset \mathbb{R}^{n+k}$ is an open set containing the origin, an element of $U_1$ is denoted by $(x, y)$ with $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^k$, and $F(x, y) = Ax + By + f(x, y)$, where $A$ is nonsingular and $f(\cdot) \in \mathcal{H}$ is a function from $U_1$ to $\mathbb{R}^n$, then by the implicit function theorem there exists an open set $U \subset \mathbb{R}^k$ containing the origin such that:

$$x = -A^{-1}By + g(y), \qquad y \in U, \quad g(\cdot) \in \mathcal{H} \tag{2.43}$$
satisfies the equation $F(x, y) = 0$. From the above it is seen that when the functions involved in the equations are either linear or belong to $\mathcal{H}$, inverses can be obtained in some neighborhood of the origin. It is this fact that is exploited throughout this section. The above two results can be shown in block diagram form as in Reference [36] (Figure 2.9 and Figure 2.10). It is the existence of functions $g(\cdot)$ in the neighborhoods around the origin that can be used in the inverse operation that provides an analytical basis for all the results that have been derived in Reference [36]. We refer the reader to that paper for further details.

[FIGURE 2.9 Block diagram of the map y = Ax + f(x) and of its local inverse x = A⁻¹y + g(y).]

[FIGURE 2.10 Block diagram of F(x, y) = Ax + By + f(x, y) and of the implicit solution for x in terms of y, with blocks A⁻¹B and g(·).]

2.4.1.3 System Theoretic Properties
In Section 2.2 it was shown that if the linearized system $\Sigma_L$ of $\Sigma$ is controllable, observable, and stable, then $\Sigma$ is also controllable, observable, and stable, that is, there exist neighborhoods of the origin where these properties hold. The following discussions indicate how the additional nonlinear terms are determined in each case.

Local controllability: If the system $\Sigma_L$ is controllable, then the system $\Sigma$ is locally controllable, that is, there exists a neighborhood $\Omega_c$ of the origin such that for any states $x_1, x_2 \in \Omega_c$ there is a finite sequence of inputs that transfers $x_1$ to $x_2$. For the linearized system we have the equation:

$$x(n) = A^n x(0) + W_c U_{0,n} \tag{2.44}$$

where $W_c$ is the controllability matrix and $U_{0,n} = [u(0), u(1), \ldots, u(n-1)]^T$. For the nonlinear system $\Sigma$, using the implicit function theorem it is shown in Reference [36] that:

$$U_{0,n} = W_c^{-1}\,[x(n) - A^n x(0)] + g(x(0), x(n)), \qquad g(\cdot) \in \mathcal{H} \tag{2.45}$$
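The linear part of Equation (2.45) is easy to illustrate. The sketch below assumes a single-input system and orders the columns of the controllability matrix as [A^{n-1}b, ..., Ab, b] so that it matches the stacking U = [u(0), ..., u(n-1)]^T in Equation (2.44); that ordering and the function names are assumptions made for this example.

```python
import numpy as np

def controllability_matrix(A, b):
    """W_c = [A^{n-1} b, ..., A b, b], matching U = [u(0), ..., u(n-1)]^T."""
    n = A.shape[0]
    cols = [np.linalg.matrix_power(A, n - 1 - k) @ b for k in range(n)]
    return np.column_stack(cols)

def steering_inputs(A, b, x0, x_target):
    """Linear part of (2.45): U = W_c^{-1} [x(n) - A^n x(0)]."""
    n = A.shape[0]
    Wc = controllability_matrix(A, b)
    return np.linalg.solve(Wc, x_target - np.linalg.matrix_power(A, n) @ x0)

A = np.array([[0.0, 1.0],
              [-0.5, 1.2]])
b = np.array([0.0, 1.0])
x0 = np.array([1.0, -1.0])
x_target = np.array([0.3, 0.7])

U = steering_inputs(A, b, x0, x_target)

# apply the n inputs to the linear system and check where the state ends up
x = x0.copy()
for u in U:
    x = A @ x + b * u
```

For the nonlinear system the same sequence would miss the target by a higher-order amount, and the term g(x(0), x(n)) in Equation (2.45) is the correction that removes this error.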
Local observability: Similarly, we also have the result that if $\Sigma_L$ is observable, $\Sigma$ is locally observable. If the input-output sequences are defined as:

$$Y_{(0,n)} = [y(0), y(1), \ldots, y(n-1)]^T, \qquad U_{(0,n-1)} = [u(0), u(1), \ldots, u(n-2)]^T \tag{2.46}$$

$\Sigma_L$ yields $x(0) = W_o^{-1}[Y_{(0,n)} - P\,U_{(0,n-1)}]$, while the application of the inverse function theorem yields:

$$x(0) = W_o^{-1}[Y_{(0,n)} - P\,U_{(0,n-1)}] - \eta[Y_{(0,n)}, U_{(0,n-1)}] \tag{2.47}$$

where $\eta \in \mathcal{H}$, and $P$ is a known matrix. From the above it follows that there exists a neighborhood $\Omega_o$ of the origin in which the state $x(0)$ can be reconstructed by observing the finite sequences $Y_{(0,n)}$ and $U_{(0,n-1)}$.

Stability and stabilizability: It is well known that if the linear system described by:

$$x(k+1) = Ax(k) \tag{2.48}$$

is asymptotically stable, then the nonlinear system:

$$x(k+1) = Ax(k) + \eta(x(k)), \qquad \eta(\cdot) \in \mathcal{H} \tag{2.49}$$

is (locally) asymptotically stable. This can be shown by demonstrating that a function $V(x) = x^T P x$ which is a Lyapunov function for Equation (2.48) is also a Lyapunov function for Equation (2.49) in a neighborhood of the origin. It is also well known that if the system $x(k+1) = Ax(k) + bu(k)$ is controllable, then it can be stabilized by a feedback control law of the form $u(k) = \Gamma x(k)$, where $\Gamma$ is a constant row vector. When the nonlinear system is
considered, we have the equation $x(k+1) = Ax(k) + bu(k) + f[x(k), u(k)]$ with $u(k) = \Gamma x(k)$, and it follows that this is also asymptotically stable in some neighborhood $\Omega_s$ of the origin. It has also been shown that there exists a nonlinear feedback controller $u(k) = \Gamma x(k) + \gamma(x(k))$ which stabilizes the system in a finite number ($\le n$) of steps (i.e., transfers any state $x_0 \in \Omega_s$ to the origin in at most $n$ steps). In summary, the results presented thus far merely make precise, in the three specific contexts considered, that design based on linearization works locally for nonlinear systems. In each case the existence of a nonlinear function in $\mathcal{H}$ assures controllability, observability, or stability. Neural networks can consequently be used to practically realize these nonlinear functions.

Set-point regulation and tracking: The same procedures used thus far can also be used to demonstrate that well-defined solutions exist for the set-point regulation and tracking problems stated in Section 2.3. We merely provide the main ideas, and refer the reader to the source paper [37], as well as the principal references [19, 32] contained in it, for further details.

THEOREM 1 (Set-Point Regulation) The output $y$ of a nonlinear system $\Sigma$ (whose state vector is accessible) can be regulated around a constant value $r$ in a neighborhood of the origin if this can be achieved for the linearized system $\Sigma_L$.

For $\Sigma_L$, an input $u = \Gamma x + v$, where $v$ is a constant, can be used to regulate the output around a constant value if the transfer function $c[zI - A]^{-1}b$ does not have a zero at $z = 1$. For the nonlinear system $\Sigma$, the same input yields a constant state $x^*$ asymptotically, where

$$x^* = [A + b\Gamma]\,x^* + bv + f[x^*, \Gamma x^* + v] \tag{2.50}$$

Because $r = cx^* + h(x^*)$, using the implicit function theorem it can be shown that $v$ can be expressed explicitly in terms of $r$. This consists of the sum of the input used in the linear case (i.e., $r / (c[I - \bar{A}]^{-1}b)$, $\bar{A} = A + b\Gamma$) together with $\gamma(r)$, where $\gamma \in \mathcal{H}$.

Tracking an arbitrary signal $y^*(k)$: To pose the problem of tracking an arbitrary signal $y^*(k)$ precisely, we need concepts such as relative degree, normal form, and zero dynamics of the nonlinear system, which are beyond the scope of this chapter. We merely state the results when the state of $\Sigma$ is accessible, and when it is not, and in the latter case use the NARMA representation for $\Sigma$, and the corresponding ARMA representation for its linearization $\Sigma_L$.

THEOREM 2 (Tracking, State Vector Accessible) If the nonlinear system $\Sigma$ has a well-defined relative degree $d$, and the zero dynamics of the linearized system $\Sigma_L$ is asymptotically stable, then there exist a neighborhood $\Omega$ of the origin and a control law of the form:

$$u(k) = (cA^{d-1}b)^{-1}\,[y^*(k+d) - Px(k)] + g[x(k), y^*(k+d)] \tag{2.51}$$

where $g(\cdot) \in \mathcal{H}$, such that the output $y(k)$ of $\Sigma$ follows asymptotically the desired output $y^*(k)$ provided $x(k), x^*(k) \in \Omega$.
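To make the linear part of Equation (2.51) concrete, the sketch below applies it to a hypothetical second-order linear plant with relative degree 2. The row vector P is not defined in the text above; the code assumes P = cA^d, which is what the d-step-ahead output equation of the linearized system gives, and this should be read as an assumption of the example rather than as the definition used in Reference [37].

```python
import numpy as np

def relative_degree(A, b, c, tol=1e-12):
    """Smallest d >= 1 for which c A^{d-1} b is nonzero."""
    n = A.shape[0]
    for d in range(1, n + 1):
        if abs(c @ np.linalg.matrix_power(A, d - 1) @ b) > tol:
            return d
    raise ValueError("no relative degree d <= n")

def tracking_input(A, b, c, x, y_star_future):
    """Linear part of (2.51), with P = c A^d taken as an assumption."""
    d = relative_degree(A, b, c)
    gain = c @ np.linalg.matrix_power(A, d - 1) @ b
    P = c @ np.linalg.matrix_power(A, d)
    return (y_star_future - P @ x) / gain

A = np.array([[0.0, 1.0],
              [-0.5, 1.2]])
b = np.array([0.0, 1.0])
c = np.array([1.0, 0.0])
d = relative_degree(A, b, c)

def y_star(k):
    # desired output (hypothetical reference signal)
    return np.sin(0.3 * k)

x = np.array([0.4, -0.2])
ys = []
for k in range(20):
    ys.append(c @ x)
    x = A @ x + b * tracking_input(A, b, c, x, y_star(k + d))
```

After the first d steps the simulated output coincides with the reference. For the nonlinear plant the residual mismatch is what the term g[x(k), y*(k + d)] would have to cancel.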
THEOREM 3 (Tracking, Inputs and Outputs Accessible) Let $\Sigma$ have a well-defined relative degree and an input-output representation of the form:

$$\begin{aligned} y(k+d) = {} & \alpha_0 y(k) + \cdots + \alpha_{n-1} y(k-n+1) + \beta_0 u(k) + \cdots + \beta_{n-1} u(k-n+1) \\ & + \omega\big(y(k), y(k-1), \ldots, y(k-n+1), u(k), \ldots, u(k-n+1)\big) \end{aligned} \tag{2.52}$$

where $\beta_0 \neq 0$ and $\omega(\cdot) \in \mathcal{H}$. If the zero dynamics of $\Sigma$ is asymptotically stable, then there exists a control law of the form

$$\begin{aligned} u(k) = \frac{1}{\beta_0}\big[\, & y^*(k+d) - \alpha_0 y(k) - \alpha_1 y(k-1) - \cdots - \alpha_{n-1} y(k-n+1) \\ & - \beta_1 u(k-1) - \cdots - \beta_{n-1} u(k-n+1) \\ & + g\big(y(k), y(k-1), \ldots, y(k-n+1), y^*(k+d), u(k-1), \ldots, u(k-n+1)\big)\big] \end{aligned} \tag{2.53}$$

such that $y(k)$ will follow any reference signal $y^*(k)$ with a sufficiently small amplitude, while all the signals in the system remain in a neighborhood of the origin.

Comment 10 Before neural networks are used in any system, as stated in the previous sections, the existence of the appropriate input-output maps must be established. Theorems 1 to 3 establish the existence of such maps.

2.4.2 Practical Design of Identifiers and Controllers (Linearization)
Having established the existence of the appropriate functions for the identification and control of nonlinear plants, we proceed to consider the application of these results to Problems 1 to 3 stated in Section 2.2, and the practical realization of neural networks. We shall use multilayer networks and radial basis function networks, as well as recurrent networks for both identification and control, depending upon the available prior information and the objectives.

PROBLEM 4 (Identification) A stable nonlinear plant $\Sigma$ is operating in a domain $D \subset X$, containing the origin. The objective is to determine a model $\hat{\Sigma}$ of the plant using neural networks.

Since the input set is compact, the state $x$ and the output $y$ of $\Sigma$ also belong to compact sets. Identification can be carried out online, or off-line using data collected as the system is in operation. Two networks $N_F$ and $N_H$ are used to identify the functions $F$ and $H$, respectively. Because the state $x(k)$ is accessible, the network $N_H$ can be trained by standard methods from a knowledge of $y(k)$ and the estimate $\hat{y}(k)$ $(= N_H(x(k)))$. $N_F$ can be identified using a series-parallel method, as shown in Figure 2.11(a). Because the plant is known to be stable, and $x(k)$ is bounded, no stability questions arise in this case. If a parallel model is used to identify the system, as shown in Figure 2.11(b), instability is possible. Hence, training of $N_F$ should be carried out
off-line using the weights obtained by the series-parallel method as initial values. Once the model is structurally stable, parameter adjustments can be carried out online, provided the adjustment is slow compared to the dynamics of $\Sigma$.

[FIGURE 2.11 Identification. (a) Series-parallel: the network N_F is driven by u(k) and the plant state x(k), and its output x̂(k+1) is compared with x(k+1) to form e(k+1). (b) Parallel: N_F is driven by u(k) and its own delayed output x̂(k).]

Comment 11 It may also be preferable to identify $F(x, u)$ as $Ax + Bu + f(x, u)$ to distinguish between the contributions of the linear and nonlinear parts.

PROBLEM 5 (Control) Whether a multilayer network or a recurrent network is used to control the unknown plant, the stability question is invariably present. It is well known in industry that great caution is necessary while introducing nonlinear control, and that any increase in the nonlinear component of the control input has to be gradual. Hence, as a first step, the region in the state space where $x(t)$ lies has to be limited, and a linear controller has to be designed before the nonlinear component is added to it. Adjustment of network parameters has to be carried out using dynamic back propagation.

PROBLEM 6 (Plant unstable) At present we have no methods available for controlling an unknown, unstable, nonlinear plant with arbitrary initial conditions. However, if the system is operating in the domain where linearization is valid, standard adaptive techniques can be used to stabilize the system. Increasing the domain of stability is a slow process and involves small changes in the nonlinear components of the input. The direction in which the parameter vector of the neural network is to be adjusted must be determined using one of the methods (of dynamic back propagation) described in Section 2.3.

If only the inputs and outputs of the plant are accessible in Problems 2 and 3, the NARMA model and the corresponding controller given by Equations (2.52) and (2.53) have to be used. Figure 2.12 shows the structure of the indirect controller proposed in Reference [12].
[FIGURE 2.12 Indirect control. A reference model driven by r(t) produces the desired output y_m; the identifier network N_i, fed through tapped delay lines (TDL), produces ŷ_p and the identification error e_i(t); the controller network N_c, also fed through TDLs, generates the input u to the nonlinear plant with output y_p; the control error is e_c(t).]
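The difference between the series-parallel and parallel configurations of Figure 2.11 can be sketched in a few lines. In the example below the plant is a hypothetical stable scalar system, and the network N_F is replaced by a small linear-in-the-parameters model fitted by least squares; this substitution is made only to keep the example short and is not the training procedure discussed in the text. In the series-parallel fit the regressor uses the measured plant state, whereas in the parallel run the model is driven by its own past output.

```python
import numpy as np

def plant(x, u):
    # hypothetical stable scalar plant F(x, u), for illustration only
    return 0.6 * x + 0.5 * u + 0.2 * np.tanh(x) ** 2

def features(x, u):
    # small fixed basis standing in for the network N_F (an assumption)
    return np.array([x, u, np.tanh(x) ** 2])

def series_parallel_fit(x, u):
    """Least-squares fit of theta with the plant state in the regressor."""
    Phi = np.array([features(xk, uk) for xk, uk in zip(x[:-1], u)])
    theta, *_ = np.linalg.lstsq(Phi, x[1:], rcond=None)
    return theta

def parallel_simulate(theta, x0, u):
    """Run the model on its own past output (parallel configuration)."""
    xh = [x0]
    for uk in u:
        xh.append(theta @ features(xh[-1], uk))
    return np.array(xh)

rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, 200)
x = np.zeros(201)
for k in range(200):
    x[k + 1] = plant(x[k], u[k])

theta = series_parallel_fit(x, u)          # identification with bounded signals
x_hat = parallel_simulate(theta, x[0], u)  # check the model in free run
```

Because the series-parallel regressor contains only bounded plant signals, the fit itself cannot diverge. The free-run check is where a poorly identified model would reveal instability, which is why the text recommends starting parallel training from the series-parallel weights.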
Comment 12 The preceding sections indicate that the method of linearization is conservative but rigorous, assures stability using only the linearized equations, and improves performance using neural networks and a nonlinear component of the input that is small compared to the linear component. If the trajectories of the plant lie in a large domain where the above assumptions are not satisfied [i.e., f (x, u) in Equation (2.40) is comparable to the linear components] we do not have an adequate theory at the present time. This underscores the importance of Section 2.5, understanding of which is essential to formulate precisely adaptive control problems in regions where nonlinear effects are predominant. 2.4.2.1 Modeled Disturbances and Multiple Models for Rapidly Varying Parameters In the three problems stated in Section 2.2 and discussed above there were no external disturbances. In addition, it was also assumed that the systems to be controlled are autonomous (or the plant parameters are constant) so that the plant can track a desired signal exactly as t → ∞. However, external disturbances are invariably present, the plant generally contains dynamics not included in the identification model, and the characteristics of the plant may change with time. So the control has to be evaluated in the presence of external and internal perturbations. When small external perturbations and slow parameter variations affect the dynamics of the system, numerous methods such as the use of a dead zone, σ -modification and | |-modification have been suggested in the literature to assure robustness. These methods have also been suitably modified for use in neurocontrol. The reader is referred to the comprehensive volume on robust control by Ioannou and Sun [38] and the monograph [39] by Tsakalis and Ioannou for details concerning such problems. From a theoretical standpoint, these methods can be rigorously
justified if control based on linearization is used. Due to space limitations we do not consider them here. Instead, we discuss two cases: (1) where the disturbances are large and can be modeled as the outputs of unforced difference equations, and (2) where the parameters vary rapidly and multiple models are used.

2.4.2.1.1 External Disturbances [40]
A SISO system is described by the equations:

$$\Sigma:\quad x(k+1) = F[x(k), u(k), v(k)], \qquad y(k) = h[x(k)] \tag{2.54}$$

where $x(k)$ and $u(k)$ are the same as before and $v(k)$ is a disturbance that is the output of an unforced stable disturbance model $\Sigma_v$:

$$\Sigma_v:\quad x_v(k+1) = g[x_v(k)], \qquad v(k) = d[x_v(k)] \tag{2.55}$$

where $x_v(k) \in \mathbb{R}^n$. The state $x(k)$ of $\Sigma$ as well as the disturbance are not accessible, and it is desired to track a desired output $y^*(k)$ exactly (as $k \to \infty$) even in the presence of the disturbance. In classical adaptive control theory, when both the plant and the disturbance model are linear, it is known that exact tracking can be achieved by increasing the dimension of the controller. Mukhopadhyay and Narendra have used this concept for disturbance rejection in nonlinear dynamic systems. They have discussed the approach in detail in Reference [40]. The following NARMA identification model is proposed by the authors:

$$\hat{y}(k+1) = N[y(k), y(k-1), \ldots, y(k-(n+m)+1), u(k), u(k-1), \ldots, u(k-(n+m)+1)] \tag{2.56}$$

to identify the given system as an $(n+m)$th-order system and control it as in Problem 3. It can be shown that this results in $y(k)$ [and hence $\hat{y}(k)$] tracking $y^*(k)$ exactly asymptotically in time. A large number of simulation studies have been successfully carried out on very different classes of nonlinear systems, and a few of these are reported in Reference [40].

2.4.2.1.2 Multiple Models
The authors of this chapter are currently actively working in this area and, without going into much detail, we state the problem qualitatively in its simplest form and merely comment on the principal concepts involved. A plant $\Sigma$ can operate in any one of $N$ "environments" at any instant. The NARMA models $\Sigma_1, \Sigma_2, \ldots, \Sigma_N$ approximate the behavior of $\Sigma$ in the different environments. In the simplest case the $N$ models are assumed to be known so that $N$ controllers (each corresponding to one of the models) can be designed. The plant switches slowly and randomly between the
$N$ environments. The objective is to detect the change in the plant based on the available output data and use the appropriate controller corresponding to the existing model. As shown in Figure 2.13, at every instant, $N$ errors $e_i(k) = \hat{y}_i(k) - y(k)$, where $\hat{y}_i(k)$ is the output of the $i$th model, are computed, the model that corresponds to the smallest error according to an error criterion is chosen at that instant, and the controller corresponding to it is used. If the plant comes to rest after a finite number of switchings, convergence of the model to the plant has been demonstrated in the linear case [41]. It has also been shown that the same is true for nonlinear systems, provided all of them satisfy the linearization conditions stated earlier. Simulation studies have been very successful and the transient performance of the system is substantially improved.

[FIGURE 2.13 Multiple models. Models Σ_1, …, Σ_N run in parallel with the plant and produce outputs ŷ_1, …, ŷ_N and errors e_1, …, e_N; controllers C_1, …, C_N produce inputs u_1, …, u_N, one of which is applied as the plant input u; the control error e_c is formed from the plant output y and the desired output y*.]

When the plant characteristics vary, they may not correspond exactly to one of the predetermined models assumed in the previous case. In such situations tuning of both the model $\Sigma_i$ and the corresponding controller $C_i$ may be needed online. This has been referred to as the "switching and tuning" approach [42] and is widely used in many different fields (refer to Section 2.7). The results obtained for deterministic systems have also been extended to stochastic systems [43]. In the cases discussed thus far, all the systems $\Sigma_i$ share the same equilibrium state, and all the trajectories lie in a neighborhood of the origin. In substantially more interesting problems, the models have different equilibrium states and operate in neighborhoods corresponding to those equilibrium states. However, transitions between equilibrium states may involve regions
where the nonlinear terms are dominant. Switching to assure that the system transitions from one neighborhood to another raises questions that require concepts of nonlinear control that are contained in Section 2.5.
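The switching scheme of Figure 2.13 can be sketched for the simplest case in which the plant coincides with one of N known models. Everything specific in the code below is an assumption made for illustration: the first-order linear models, the exponentially weighted squared error used as the error criterion, the forgetting factor, and the fixed switching schedule used in place of random switching.

```python
import numpy as np

# hypothetical first-order models y(k+1) = a_i y(k) + b_i u(k), one per environment
MODELS = [(0.5, 1.0), (0.9, 0.5), (-0.3, 2.0)]

def select_model(y_prev, u_prev, y_now, J, forgetting=0.8):
    """Update the performance indices J_i in place and return the best model index."""
    for i, (a, b) in enumerate(MODELS):
        e = (a * y_prev + b * u_prev) - y_now    # e_i(k) = yhat_i(k) - y(k)
        J[i] = forgetting * J[i] + e ** 2        # weighted error criterion (assumed)
    return int(np.argmin(J))

def control(i, y, y_star_next):
    """Controller C_i: input that makes the output of model i equal y* at the next step."""
    a, b = MODELS[i]
    return (y_star_next - a * y) / b

def simulate(env_schedule, y_star):
    y, u, J = 0.0, 0.0, np.zeros(len(MODELS))
    chosen, ys = [], []
    for k, env in enumerate(env_schedule):
        a, b = MODELS[env]                       # plant equals the model of its environment
        y_new = a * y + b * u
        i = select_model(y, u, y_new, J)
        chosen.append(i)
        ys.append(y_new)
        y = y_new
        u = control(i, y, y_star[k + 1])
    return chosen, np.array(ys)

schedule = [0] * 30 + [1] * 30                   # one change of environment
chosen, ys = simulate(schedule, np.ones(61))
```

In this run the selected index follows the environment within a couple of steps of the change, and the output returns to the reference once the matching controller is in use. When no model matches the plant exactly, the selected model and controller would also have to be tuned online, as in the switching and tuning approach mentioned above.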
2.4.2.2 Control of Nonlinear Multivariable Systems [44]
Most practical systems have multiple inputs and multiple outputs, and the applicability of neural networks as practical adaptive controllers will be judged by their performance in such contexts. The representation, identification, and control of nonlinear multivariable systems are rendered very complex, as described in Section 2.2, due to the couplings as well as the delays that exist between the inputs and outputs. We comment briefly here on two questions that are relevant to the objectives of this chapter. They are (1) the problem of tracking in multivariable control and (2) decoupling of multivariable systems. For details concerning these problems the reader is referred to Reference [44].

2.4.2.2.1 Tracking
Given a controllable and observable nonlinear dynamic system with $m$ inputs $u_i(\cdot)$ and $m$ outputs $y_i(\cdot)$, $i = 1, 2, \ldots, m$, with well-defined relative degrees $d_i$, it can be shown that it can be represented in a neighborhood of the origin by the equations:

$$y_i(k + d_i) = \Phi_i[x(k), u(k)], \qquad i = 1, 2, \ldots, m \tag{2.57}$$

Because $x(k)$ (by the assumptions made) can be expressed in terms of the outputs and inputs $y(k), y(k-1), \ldots, y(k-n+1)$ and $u(k), \ldots, u(k-n+1)$, the system has a NARMA representation of the form (2.23). It is these equations that are used to determine controllers for tracking a desired output $y^*(k)$. Based on the existence of control inputs for LTI systems, it can be shown that the desired input $u(k)$ can be generated as the output of a multivariable nonlinear system:

$$u(k) = \Phi_c[y(k), \ldots, y(k-n+1), r(k), u(k-1), \ldots, u(k-n+1)] \tag{2.58}$$

As stated earlier in this section, $\Phi_c(\cdot)$ can be approximated using a neural network $N_c$ [or as the sum of a linear function of $u(k-i)$ and $y(k-i)$ and a nonlinear component that belongs to $\mathcal{H}$]. Stability is guaranteed by the linear part while exact tracking is achieved using the nonlinear component.

2.4.2.2.2 Decoupling
An important practical question that arises in multivariable control is decoupling. Qualitatively, in its simplest form it implies that each output is controlled by one input. In mathematical terms the output $y_i$ is invariant under inputs $u_j$, $j \neq i$, and is not invariant with respect to $u_i$. The desirability of decoupling in some practical applications is obvious. For linear multivariable systems described by the equation $x(k+1) = Ax(k) + Bu(k)$, $y(k) = Cx(k)$,
it is well known that the system can be decoupled if a matrix $E \in \mathbb{R}^{m \times m}$ is nonsingular [45] and

$$u(k) = -E^{-1}\begin{bmatrix} c_1 A^{d_1} \\ c_2 A^{d_2} \\ \vdots \\ c_m A^{d_m} \end{bmatrix} x(k) + E^{-1} r(k) = Gx(k) + Fr(k) \tag{2.59}$$

($E$ is the matrix whose $i$th row is $c_i A^{d_i - 1} B$). If the linearization of the given nonlinear system can be decoupled, it can be shown using the same arguments as before that in a neighborhood of the origin, exact decoupling is possible using a controller of the form:

$$u(k) = Gx(k) + Fr(k) + g(x(k), r(k)), \qquad g \in \mathcal{H} \tag{2.60}$$
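The linear decoupling law of Equation (2.59) can be computed directly from (A, B, C). The sketch below is an illustration under stated assumptions: the relative degrees d_i are found as the smallest integers for which c_i A^{d_i - 1} B is nonzero, the state-feedback gain is taken as G = -E^{-1}[c_i A^{d_i}] so that y_i(k + d_i) = r_i(k), and the example matrices are hypothetical.

```python
import numpy as np

def decoupling_law(A, B, C, tol=1e-12):
    """Return (G, F, d) with u = G x + F r so that y_i(k + d_i) = r_i(k)."""
    m, n = C.shape
    E_rows, S_rows, d = [], [], []
    for ci in C:
        for di in range(1, n + 1):
            row = ci @ np.linalg.matrix_power(A, di - 1) @ B
            if np.linalg.norm(row) > tol:
                break
        else:
            raise ValueError("output has no relative degree <= n")
        d.append(di)
        E_rows.append(row)                                  # c_i A^{d_i - 1} B
        S_rows.append(ci @ np.linalg.matrix_power(A, di))   # c_i A^{d_i}
    E = np.vstack(E_rows)
    F = np.linalg.inv(E)            # requires E to be nonsingular
    G = -F @ np.vstack(S_rows)
    return G, F, d

A = np.array([[0.0, 1.0, 0.0],
              [0.2, 0.1, 0.3],
              [0.1, 0.0, 0.5]])
B = np.array([[0.0, 0.0],
              [1.0, 0.5],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

G, F, d = decoupling_law(A, B, C)
```

In closed loop each output of this example reproduces its own reference after d_i steps and is unaffected by the other reference. For the nonlinear plant the remaining cross-coupling is of higher order, and it is this residual that g(x(k), r(k)) in Equation (2.60) is meant to remove.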
While approximate decoupling can be achieved using linear state feedback, exact decoupling can be achieved by approximating $g(\cdot)$ using neural networks. Similar arguments can also be given for the case when decoupling has to be achieved using only inputs and outputs.

2.4.2.3 Interconnected Systems
All the problems considered thus far in this section are related to the identification and control of isolated nonlinear systems. As stated earlier, interest in the control field is shifting to problems that arise when two or more systems are interconnected to interact in some sense. In adaptive contexts, to make the problems analytically tractable, the different dynamic systems are assumed to be linear. Since the solution of the linear problem is the first approximation for nonlinear control problems that are addressed using linearization methods, we present here an important result that was obtained recently and that may have interesting implications for decentralized nonlinear control of the type stated in Section 2.2.

A system $\Sigma$ shown in Figure 2.14 consists of $N$ subsystems $\Sigma_1, \Sigma_2, \ldots, \Sigma_N$. Each $\Sigma_i$ is linear and time invariant, and its objective is to choose a control input $u_i(\cdot)$ such that its state $x_i(t)$ tracks a desired state $x_{mi}(t)$. Each system $\Sigma_j$ affects the input to the system $\Sigma_i$ by a signal $a_{ij} x_j$, where $\Sigma_i$ has no knowledge of either $a_{ij}$ or $x_j(t)$. The question that is raised is whether all the subsystems $\Sigma_i$ can follow their reference inputs without having knowledge of the state vectors $x_j(t)$ of the other systems. The above problem was answered affirmatively in References [46, 47]. If it can be assumed that the desired outputs $x_{mi}(t)$ of the $N$ subsystems are common knowledge, each subsystem attempts to cancel the perturbing signals from the other subsystems (e.g., $a_{ij} x_j$) by using $\hat{a}_{ij} x_{mj}$ in place of $\hat{a}_{ij} x_j$, and adapting $\hat{a}_{ij}(t)$. It was shown that the overall system would be asymptotically stable and that all the errors would tend to zero.
[FIGURE 2.14 Interconnected system. Subsystems (A_i, B_i, C_i), i = 1, …, N, with inputs u_i(t), states x_i, outputs y_i, desired outputs y*_i, and errors e_i; coupling signals such as a_12 x_2 and a_1N x_N enter the input of subsystem 1.]
Decentralized nonlinear control with nonlinear interactive terms between subsystems, as stated in Section 2.2, is obviously the next class of problems of interest and can be attempted using neural networks, provided that, in the region of interest in the state space, the dynamical systems satisfy the conditions discussed earlier. For more general nonlinear interconnections of the type stated in Section 2.2, where the trajectories lie in larger domains in the state space, more advanced methods will be needed (refer to Section 2.5).
2.4.3 Related Current Research
The literature on neural network-based control is truly vast, and there is also intense activity in the area at the present time. Dealing with all the ongoing work in any detail is beyond the scope of this chapter. Because our main objective is to examine and present the basic principles used in simple terms, we present in this subsection some representative samples of methods, introduced by well-known researchers.

One general method for representing nonlinear systems for control purposes was introduced by Suykens and coworkers [48]. The authors claim that the NL$_q$ system form that they introduce [alternating sequences of nonlinear elements (N) and linear gains (L) having $q$ layers] represents a large class of dynamic systems that can be used as identifiers and controllers. The plant and controller models (represented by $M_i$ and $C_i$) have the general form:

$$x(k+1) = \Gamma_1\big[A_1 \Gamma_2[A_2 \cdots \Gamma_q[A_q x(k) + B_q u(k)] \cdots + B_2 u(k)] + B_1 u(k)\big] \tag{2.61}$$

$$y(k) = \Gamma_1\big[c_1 \Gamma_2[c_2 \cdots \Gamma_q[c_q x(k) + D_q u(k)] \cdots + D_2 u(k)] + D_1 u(k)\big] \tag{2.62}$$
where x(k) ∈ Rn is the state, y(k) ∈ Rm is the output, and u(k) ∈ Rr is the input of the recurrent network where the complexity increases with q . The same structure is used for both controllers and identifiers so that stability questions of the subsystems as well as the overall system are the same. The procedure used to control the plant is based on indirect adaptive control. Taking advantage of the structure of the models and using Lure theory it is shown that sufficient conditions can be derived for asymptotic stability. These are expressed in terms of the matrices [ Ai (i = 1, 2 . . . , q )]. The model is asymptotically stable if there exist diagonal matrices Di such that: −1 ||Di Ai Di+1 || ≤ 1, ∀ i
(2.63)
Condition (2.63) assures the existence of a Lyapunov function of the form $V(x) = \|D_1 x\|$. To approximate the plant dynamics the parameters of the matrices are adjusted using dynamic back propagation. To assure stability (i.e., to satisfy the inequalities (2.63)) it is shown that the adjustment of $A_i$ can be realized by solving a nonlinear constrained optimization problem, which results in a modified back-propagation scheme.

A general structure was also introduced by Poznyak and coworkers [49] in which identification, estimation, sliding mode control, and tracking are treated. The emphasis of the book is on recurrent networks and we briefly outline below the approach proposed for the identification problem. It is assumed that a general nonlinear system can be represented by the difference equation:

$$x(k+1) = A x(k) + W_1 \sigma(x(k)) + W_2 \phi(x(k)) \gamma(u(k)) \tag{2.64}$$
where $W_1$ and $W_2$ are weight matrices and σ(·), φ(·), and γ(·) are known nonlinear functions. Both σ and φ contain additional parameter vectors $V_1$ and $V_2$ corresponding to hidden layers. However, σ, φ, and γ are assumed to be bounded functions. Hence $W_1$ and $W_2$ determine a family of maps to which any given nonlinear system belongs. The objective is consequently to estimate $W_1$ and $W_2$ and then to determine the control input u(·). Identifiers using the approach in Reference [49] are represented by recurrent networks described by

$$\hat{x}(k+1) = \hat{W}_1(k)\,\sigma(\hat{x}(k)) + \hat{W}_2(k)\,\phi(\hat{x}(k))\,\gamma(u(k)) \tag{2.65}$$

and the adaptive laws for adjusting $\hat{W}_1(k)$ and $\hat{W}_2(k)$ are derived. For series-parallel models [where the arguments of σ and φ are x(k) rather than $\hat{x}(k)$] standard adaptive laws can be derived directly from those found in Reference [15]. Similar laws are also derived for recurrent networks. In classical adaptive control, deriving adaptive laws for such models was known to be a very difficult problem and was never resolved for the deterministic case. However, by making several assumptions concerning the boundedness of various signals in the system, the authors demonstrate that

$$V(e, \tilde{W}_1, \tilde{W}_2, \tilde{V}_1, \tilde{V}_2) = e^T P e + \tfrac{1}{2}\,\mathrm{Tr}\,[\tilde{W}_1^T \tilde{W}_1 + \tilde{W}_2^T \tilde{W}_2 + \tilde{V}_1^T \tilde{V}_1 + \tilde{V}_2^T \tilde{V}_2]$$

is a Lyapunov function, which assures the convergence of the state error e to zero.
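For the series-parallel case, where the regressors are evaluated at the measured state x(k), the model is linear in the unknown weights and a standard normalized-gradient (projection) update can be used. The Python sketch below is only an illustration under stated assumptions: A = 0 in (2.64), σ = tanh, φ a logistic sigmoid, γ = tanh with a scalar input, and a generic normalized-gradient law rather than the specific adaptive laws of References [15] or [49]. The names `regressor`, `identify`, and `W_TRUE` are hypothetical.

```python
import numpy as np

def sigma(x):
    return np.tanh(x)                      # assumed bounded nonlinearity

def phi(x):
    return 1.0 / (1.0 + np.exp(-x))        # assumed bounded nonlinearity

def gamma(u):
    return np.tanh(u)                      # assumed bounded input map (scalar u)

def regressor(x, u):
    """Stacked regressor [sigma(x); phi(x) * gamma(u)], so the plant is W @ regressor."""
    return np.concatenate([sigma(x), phi(x) * gamma(u)])

def identify(W_true, steps=2000, mu=1.0, seed=0):
    """Series-parallel identification of W = [W1 W2] with a normalized-gradient law."""
    rng = np.random.default_rng(seed)
    n = W_true.shape[0]
    W_hat = np.zeros_like(W_true)
    x = np.zeros(n)
    err_start = np.linalg.norm(W_true - W_hat)
    for _ in range(steps):
        u = rng.uniform(-2.0, 2.0)         # exciting input (assumed)
        r = regressor(x, u)
        x_next = W_true @ r                # plant: (2.64) with A = 0
        e = x_next - W_hat @ r             # one-step prediction error
        W_hat += mu * np.outer(e, r) / (1.0 + r @ r)
        x = x_next
    return W_hat, err_start, np.linalg.norm(W_true - W_hat)

W_TRUE = np.array([[0.5, -0.3, 0.8, 0.1],
                   [0.2, 0.4, -0.6, 0.7]])
W_hat, err_start, err_end = identify(W_TRUE)
```

With a step size 0 < mu < 2 the parameter error norm cannot increase in the noise-free case. The recurrent (parallel) model of (2.65), where the regressors depend on $\hat{x}(k)$, does not enjoy this simple property, which is the difficulty the text refers to.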
ASSUMPTIONS

As has been stated several times in the preceding sections, assumptions concerning the plant to be controlled are invariably made to have analytically tractable problems. In fact, in the next section, prior information that is assumed in linear adaptive control is indicated. Because nonlinear adaptive control is very complex, it was only natural that different assumptions were made, starting around the early 1990s. Gradually, these became accepted in the field and researchers began to apply them to more complex systems. At the present time they have become an integral part of the thinking in the field. In Section 2.1, it was stated that assumptions can make complex problems almost trivial. The authors believe that the neurocontrol community should reexamine the conditions that are currently assumed as almost self-evident. In Section 2.4.4 the authors provide their own view on this subject. In the rest of this section the evolution of these assumptions to their present form is traced briefly by examining a sequence of typical papers that have appeared in the literature.

Chen and Liu [50] consider the problem of tracking in a nonlinear system (1994). The system is described by the equation:

$$\dot{x} = f_0(x) + g_0(x)u, \qquad y = h(x) \tag{2.66}$$
where $x(t) \in \mathbb{R}^n$ can be measured and $f_0$, $g_0$, and h are smooth. The input and output are related by the equation $\dot{y} = f_1(x) + g_1(x)u$, where $f_1(x) = h_x^T f_0(x)$ and $g_1(x) = h_x^T g_0(x)$. Neural networks approximate $f_1$ and $g_1$ as $\hat{f}_1$ and $\hat{g}_1$, and the latter are used to determine the control input u.

Comment 13 The system is stable and gradient methods are used to approximate unknown functions.

Polycarpou [51] deals with a second-order system (1996):

$$\dot{x}_1 = f(x_1) + \phi(x_1) + x_2, \qquad \dot{x}_2 = u \tag{2.67}$$
It is assumed that f(·) is known while φ(·) is unknown and that the state variables are accessible. The objective is to regulate the system around the equilibrium state $x_1 = x_2 = 0$. To make the problem tractable it is assumed that $\phi(x_1) = \theta^T \zeta(x_1)$, where $\zeta(x_1)$ is a vector of known basis functions. With this assumption the problem becomes a nonlinear (second-order) version of the problems described in Section 2.2.3. Using similar methods, an adaptive law for obtaining an estimate $\hat{\theta}$ of θ, and a control law u are derived and are shown below. If $z_1 = x_1$, $z_2 = x_2 - \alpha(x_1, \hat{\theta})$, and $\alpha(x_1, \hat{\theta}) = -x_1 - \hat{\theta}^T \zeta(x_1)$, then

$$u = -z_1 - z_2 + \frac{\partial \alpha}{\partial x_1}\,(x_2 + f + \hat{\theta}^T \zeta) + \frac{\partial \alpha}{\partial \hat{\theta}}\,\dot{\hat{\theta}}, \qquad \dot{\hat{\theta}} = \zeta \left( z_1 - z_2 \frac{\partial \alpha}{\partial x_1} \right) - \sigma(\hat{\theta} - \theta_0). \tag{2.68}$$
In spite of the assumption and the low order of the system, the resulting control is found to be quite complex, which is typical of backstepping. The control used is seen to depend upon the partial derivatives of α, which is estimated online.

Comment 14 As a mathematical problem the above is precisely stated. The assumption that φ can be approximated using basis functions, in our opinion, is justifiable in this case, because it is a function of only one variable. We discuss this further in the next section.

Rovithakis [52] describes a system (1999) by the equation:

$$\dot{x} = f(x) + g(x)u + \omega(x, u), \qquad x(t) \in \mathbb{R}^n \tag{2.69}$$
where ω(x, u) is an external disturbance that is bounded and unknown. The objective is to regulate the system close to the equilibrium state. The following two assumptions are made:

1. In the absence of the disturbance a control α(x) stabilizes the system, and a Lyapunov function for the nonlinear system is V(x).
2. ω(x, u(x)) lies in the range of the basis functions S(x), i.e., $\omega(x, u(x)) = W^T S(x)$ for any control law.

Based on these assumptions it is shown that a control input $u(x) = \alpha(x) + u_c(x)$ can be determined to stabilize the system. In 2004, an expanded version of this paper was presented in Reference [53].

Comment 15 It is assumed that the nominal system (without the disturbance) is stable and that an explicit Lyapunov function V(x) is known for it. Also, while $\phi(x_1)$ in Equation (2.67) was a function of a single variable, an arbitrary function ω(x, u(x)) is approximated here using basis functions.

Out of a very large collection of papers published in the literature [54]–[60], which are based on backstepping procedures and utilize basis functions, we consider a representative sample of four papers here.

1. In [54] Kwan and Lewis (2000) consider the tracking problem in a system described by:

$$\dot{x}_i = F_i(x_1, x_2, \ldots, x_i) + G_i(x_1, x_2, \ldots, x_i)\,x_{i+1}, \quad i = 1, 2, \ldots, n-1$$
$$\dot{x}_n = F_n(x_1, x_2, \ldots, x_n) + G_n(x_1, x_2, \ldots, x_n)\,u \tag{2.70}$$

where the state variables are accessible, the $G_i$ are known and sign definite, and the $F_i$ are unknown. The objective is to determine a control input such that $x_1(t)$ tracks a desired output asymptotically.

2. Ge and Wang (2002) consider a similar problem in Reference [58] in which the system is described by the equations:

$$\dot{x}_i = f_i(x_1, x_2, \ldots, x_i) + g_i(x_1, x_2, \ldots, x_i)\,x_{i+1}, \quad i = 1, 2, \ldots, n-1$$
$$\dot{x}_n = f_n(x_1, x_2, \ldots, x_n) + g_n(x_1, x_2, \ldots, x_n)\,u \tag{2.71}$$
As in (1), it is assumed that the $g_i$ are bounded away from 0. It is further assumed that $|\dot{g}_i| \le g_{id}$, i = 1, 2, ..., n, and that a complex unknown function of $z_1, z_2, \ldots, z_n$ related to the unknown functions $f_i$ and $g_i$ lies in the span of a set of known basis functions.

3. Li, Qiang, Zhuang, and Kaynak (2004) [59] consider the same system as in (2) and attempt the same problem with the same assumptions on $g_i$ and $\dot{g}_i$ but use two different sets of basis functions.

4. Wang and Huang (2005) [60] consider the system described by:

$$\dot{x}_i = f_i(x_1, x_2, \ldots, x_i) + x_{i+1}, \quad i = 1, 2, \ldots, n-1$$
$$\dot{x}_n = f_n(x_1, x_2, \ldots, x_n) + u. \tag{2.72}$$
Tracking of a desired signal is achieved by computing a control input assuming that each element $f_i(x_1, \ldots, x_i)$ in Equation (2.72) can be expressed as $\theta_i^T \xi_i(x_1, x_2, \ldots, x_i)$ where $\xi_i$ are basis vectors. The above typical papers clearly indicate the thrust of the research in the community and the emphasis on basis functions.

2.4.4 Theoretical and Practical Stability Issues

In Section 2.4.3 we discussed some typical methods that are representative of a large number of others that have also been proposed and share common features with them. Most of them claim that their approaches result in global asymptotic stability of the overall system and robustness under perturbations with suitable modifications of the adaptive laws. As all of them, in one way or another, attempt to emulate classical adaptive control, we shall start with the latter to provide a benchmark for examining and comparing the different methods. Because the approach based on linearization described in Sections 2.4.1 and 2.4.2 is closest to classical adaptive control, we shall briefly comment on that. Following this, we shall raise several questions and provide brief comments regarding each one of them to help us to critically evaluate the contributions made by the different authors.

2.4.4.1 Linear Adaptive Control

The theoretical study of linear adaptive control starts with the assumptions made concerning the plant to be controlled. The plant is assumed to be linear and time invariant (LTI) of order n, with 2n unknown parameters. An upper bound on n is known. All the zeros of the plant transfer function are assumed to lie in the open left half of the complex plane (minimum phase). If the plant is to be identified (i.e., parameters are to be estimated) it is normal to assume that it is stable with bounded inputs and bounded outputs. If the plant is to be controlled, it is generally assumed to be unstable and whatever adaptive scheme is proposed is expected to stabilize it.
Because the controller parameters that are adjusted become state variables, the overall system is
nonlinear with an extended state space. It is in this extended state space that all properties of the overall system are studied. By a suitable choice of a Lyapunov function V and corresponding adaptive laws, it is first shown that the system is stable and that all signals and parameters are bounded. Asymptotic stability never follows directly, as the time derivative $\dot{V}$ of V is always negative semidefinite and not negative definite. It is next shown that the control error tends to zero. Additional conditions on the reference input (persistent excitation) are needed to show that the parameter errors tend to zero (asymptotic stability). Theoretically, this is what is meant by the proof of stability in adaptive control. Another important consequence of the linearity of the plant is that all the results are global.

2.4.4.2 Nonlinear Adaptive Control Using Linearization Methods

As shown in Section 2.4.1, in the method based on linearization, we are operating in a domain where the linear terms dominate the nonlinear terms. Hence all the results of linear adaptive control carry over, but are valid only in this domain. If only linear adaptive control is used, the errors do not tend to zero due to the presence of the nonlinear terms. It is at this stage that neural networks are needed to compensate for the nonlinear terms and make the control error tend to zero. Also, because the overall nonlinear effect (due to plant and controller) is small compared to the linear terms, adaptation of both linear and nonlinear terms can be fast. However, to assure that the approximation is sufficiently accurate, the neural networks are invariably adjusted on a slower time scale. This is the same procedure adopted for multivariable control, control using multiple models, and interconnected systems.
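The Lyapunov argument of Section 2.4.4.1 can be illustrated on the simplest possible case, a scalar unstable plant $\dot{x} = a x + u$ with unknown a, the control $u = -\hat{k} x$, and the adaptive law $\dot{\hat{k}} = \gamma x^2$. For any $k^* > a$, the function $V = x^2/2 + (\hat{k} - k^*)^2/(2\gamma)$ satisfies $\dot{V} = (a - k^*)\,x^2$, which is negative semidefinite in the extended state $(x, \hat{k})$. The Python sketch below is an assumed textbook-style example, not a scheme taken from this chapter; the numerical values and the Euler integration are choices made for illustration.

```python
import math

def simulate(a=1.0, gain=1.0, x0=1.0, k0=0.0, dt=1e-3, t_end=20.0):
    """Euler simulation of x' = a*x + u, u = -k_hat*x, k_hat' = gain*x**2."""
    x, k_hat = x0, k0
    for _ in range(int(t_end / dt)):
        u = -k_hat * x
        # Both updates use the state from the previous step.
        x, k_hat = x + dt * (a * x + u), k_hat + dt * gain * x * x
    return x, k_hat

x_final, k_final = simulate()
```

The state converges to zero while the gain estimate stops at a constant that merely exceeds a (about 2.41 for these values). With no persistent excitation, nothing forces the parameter to converge to a particular "true" value, which is the distinction between stability and asymptotic stability drawn above.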
2.4.4.3 Nonlinear Adaptive Control

The following simple adaptive control problem is a convenient vehicle for discussing many of the questions related to purely nonlinear adaptive control.

THE PROBLEM: A system is described by the differential equation:

$$\dot{x}_1 = x_2, \quad \dot{x}_2 = x_3, \quad \ldots, \quad \dot{x}_n = f(x_1, x_2, \ldots, x_n) + u \tag{2.73}$$
f (·) : Rn → R is smooth, the state variables xi are accessible, and the input u is to be chosen so that the output x1 tracks a desired output y1d which is the
output of a stable nth-order differential equation:

$$y_d^{(n)} + \sum_{i=0}^{n-1} \alpha_{i+1}\, y_d^{(i)} = r \tag{2.74}$$

where r is a known bounded reference input.

f(·) known: In the deterministic case where f is known, the choice of u(·) is simple. If

$$u = -\sum_{i=1}^{n} a_i x_i - f(x) + r \tag{2.75}$$

the error $e = (y - y_d)$ satisfies the same stable homogeneous differential equation and tends to zero asymptotically.

f(·) unknown: This is an adaptive control problem, and strictly speaking one for which a solution does not exist unless some assumptions are made concerning f(·). The prior information concerning f(·) determines the nature and complexity of the adaptive control problem. Consequently, it is with these assumptions that we are concerned here.

• If it is assumed that $f(x) = \alpha^T \eta(x)$ where η(x) is a known vector function of x and $\alpha \in \mathbb{R}^n$ an unknown parameter vector, the problem is trivial from a theoretical standpoint. The theoretical solution is obtained by using the control input

$$u = -\sum_{i=1}^{n} a_i(t)\, x_i - \hat{\alpha}^T \eta(x) + r \tag{2.76}$$

and adjusting $\hat{\alpha}$ using an adaptive law derived from the error model in Figure 2.4. This solution was known in the adaptive control field thirty years ago.

• Suggesting that f(x), $x \in \mathbb{R}^n$, can be approximated by $\alpha^T \eta(x)$ by choosing $\eta_i(x)$ as basis functions and N sufficiently large is impractical even for small values of n (as stated in Polycarpou [51], n = 1 may be an exception). Instability may be caused by the residual error between f(x) and $\hat{\alpha}^T \eta(x)$.

• If f(·) is a part of a nonlinear dynamic system Σ to be controlled, information concerning the function f(·) in the strict adaptive control problem can be obtained only from the inputs and outputs of Σ.

• To measure the inputs and the outputs of Σ to estimate f(·), it must be assumed that the system is stable. Obviously, this assumption is not valid since stabilizing the system is one of the main objectives of adaptive control.
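The "theoretically trivial" case in the first bullet can be sketched for n = 1. With $\dot{x} = \alpha^T \eta(x) + u$, the reference model $\dot{y}_d + a y_d = r$, and the control (2.76) with a constant gain a, the tracking error obeys $\dot{e} = -a e - (\hat{\alpha} - \alpha)^T \eta(x)$. The law $\dot{\hat{\alpha}} = \gamma e\, \eta(x)$ then gives $\dot{V} = -a e^2$ for $V = e^2/2 + \|\hat{\alpha} - \alpha\|^2/(2\gamma)$. The Python sketch below rests on assumptions: the regressor $\eta(x) = (\sin x,\ x)$, all numerical values, and this particular gradient-type law are chosen for illustration and are not the error model of Figure 2.4.

```python
import math

def eta(x):
    """Hypothetical known regressor vector eta(x)."""
    return (math.sin(x), x)

def simulate(alpha=(2.0, 1.0), a=2.0, gain=5.0, dt=1e-3, t_end=40.0):
    """Adaptive tracking for x' = alpha.eta(x) + u with unknown alpha (n = 1)."""
    x, yd = 1.0, 0.0
    alpha_hat = [0.0, 0.0]

    def lyapunov(e, est):
        return 0.5 * e * e + sum((h - t) ** 2 for h, t in zip(est, alpha)) / (2.0 * gain)

    v_start = lyapunov(x - yd, alpha_hat)
    for k in range(int(t_end / dt)):
        r = math.sin(k * dt)                       # bounded reference input
        h = eta(x)
        f = sum(p * q for p, q in zip(alpha, h))   # true (unknown) nonlinearity
        u = -a * x - sum(p * q for p, q in zip(alpha_hat, h)) + r   # cf. (2.76)
        e = x - yd
        alpha_hat = [p + dt * gain * e * q for p, q in zip(alpha_hat, h)]
        x += dt * (f + u)
        yd += dt * (-a * yd + r)
    return x - yd, v_start, lyapunov(x - yd, alpha_hat)

e_final, v_start, v_end = simulate()
```

The sketch works only because f is exactly in the span of the known η(x). The remaining bullets concern what happens when that assumption is merely an approximation.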
If f(·) is unknown, the problem has not been solved rigorously thus far. However, a large number of papers dealing with very complex nonlinear adaptive control problems with multiple unknown nonlinearities and high-dimensional state space have appeared in the literature. In all cases the nonlinearities are approximated as $\alpha^T \eta(x)$ where η(x) is known. Below we abstract from the above sample problem a set of general questions that we feel have to be answered before a nonlinear adaptive control problem is addressed. 1. Is the representation of the plant sufficiently general? It is clear that assumptions have to be made to render the problem tractable. One extreme assumption would be to assume that the plant is linear! But this will limit the class of plants to which the methods can be applied. In the twentieth century, starting with the work of Volterra, numerous representations for nonlinear systems have been proposed. If truncated models are used to identify a nonlinear system, the magnitudes of the residuals are evident in such cases. With the models proposed in References [48] and [49] this is not the case. For example, the effect of making the model more complex is not clear. 2. Are basis functions a generally acceptable way to approximate a nonlinear function? Although it is true that any continuous function f(x), $x \in \mathbb{R}^n$, can be approximated as $f(x) \approx \alpha^T \eta(x)$, $\alpha \in \mathbb{R}^N$, using a sufficiently large number N of basis functions $\eta_i(x)$, it is not clear how the approximation error scales with the dimension n. The authors have considerable experience with approximation methods and have carried out extensive numerical identification for many years using different approaches, including neural networks. These have shown that N must be very large even for simple functions. The number of basis functions increases dramatically with the dimension of the space over which the function is defined.
Hence, from both theoretical and practical standpoints f (x) = α T η(x) is not a satisfactory parameterization of the approximator (though it may be convenient to derive adaptive laws). The many theoretical proofs given in the literature are consequently not acceptable in their present forms. 3. Is the plant stable or unstable? As stated earlier, following adaptive control, we will assume that the plant is stable only if identification is of interest, and unstable if the principal objective is stability. In our opinion, at present, only the method based on linearization can be used to stabilize an unknown unstable system. 4. If the plant is unstable, is it a continually evolving process (such as an aircraft or a chemical process) or can it be stopped or interrupted and reinitiated (like a broom balancer or a biped robot [as in Section 2.7])? These two represent very different classes of problems. Online adaptive control refers only to the first class. Repetitive
learning, as in the second case, is not online adaptive control but is important both practically and theoretically, as seen from the next question. 5. If the plant is stable, is the region S in the state space in which the trajectories lie known? We shall assume that this is indeed the case (though it is not a simple assumption in higher dimensions). If this is all the prior information available, a neural network can be trained (slowly off-line or online) to identify the system in S. However, the behavior of the system outside S being unknown, any input that drives the system outside S can result in instability. The reference trajectory should therefore lie inside S and during the adaptation process, the plant trajectories should also lie in S. This accounts for the great care taken in industry while trying new control methods. If a process can be interrupted, regions outside S can be explored and a feedback controller can be designed through repetitive learning. In the case of online adaptive control, regions outside S can be explored incrementally using continuity arguments and approximating properties of neural networks. To the authors’ knowledge such investigations have not been carried out thus far. 6. Are the controllers to be designed off-line or online? As seen from earlier comments, stability questions arise only in the latter. Stability questions also arise in computer simulations but very little cost is attached to them, and they can be reinitiated. 7. Are gradient methods or stability-based methods used in the adjustment of the controller? The essential difference between the two is in time scales. The latter operate in real time (i.e., the dynamics of adaptation is as fast as the dynamics of the plant). No gradient method in real time has been demonstrated to be stable. Therefore, such a method operating in a slow time scale cannot stabilize an unstable plant. 
However, almost all of them can be shown to work satisfactorily if the plant is stable and the adjustments are sufficiently slow. This accounts for neural networks performing very well in practical applications. Once again this demonstrates that successful applications do not necessarily imply sound theory. In the authors’ opinion, there have been very few real theoretical results in nonlinear adaptive control using neural networks. The solutions for the most part are not mathematically precise. Obviously, better theoretical formulations of problems are needed. Much greater emphasis has to be placed in the future on the prior information assumed about f (x), and the choice of the basis functions dictated by it. This has not impeded the use of neural networks in practice to improve the performance of systems that have already been stabilized using linear controllers. The applications described in Section 2.7 attest to this.
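Question 2 above concerns how the number N of basis functions grows with the dimension n. One simple way to see the effect is to assume localized basis functions (for example, Gaussian radial basis functions) placed on a tensor-product grid with m centers per coordinate, which gives $N = m^n$. This grid construction is an assumption made here for illustration; other basis families scale differently, and the function name below is hypothetical.

```python
def grid_basis_count(centers_per_axis, dim):
    """Number of basis functions on a tensor-product grid: m**n."""
    return centers_per_axis ** dim

# Ten centers per coordinate, for a few state dimensions.
counts = {n: grid_basis_count(10, n) for n in (1, 2, 4, 8)}
for n, count in counts.items():
    print(f"n = {n}: N = {count}")
```

Under this assumption the parameter vector has ten entries for n = 1 but a hundred million for n = 8, each of which would have to be adjusted by the adaptive law.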
2.5 Global Control Design
A basic problem in control is the stabilization of an equilibrium state. In Section 2.4, nonlinear stabilizers were developed which are valid in the neighborhood of such a point. An obvious question of both practical and theoretical importance is whether or not the region of validity can be extended to larger domains of the state space. The study of the global properties of nonlinear systems and their control is much more complicated than the local approaches employed so far. Nevertheless, the development of adequate mathematical tools has always been guided by linear intuition and aimed at finding analogies of the concepts developed in linear systems theory. As Brockett put it, even as the state space of a system becomes a differentiable manifold, the characterization of observability, controllability, and realization “is not more difficult than linear systems of the usual type in Rn ”[62] p. 1. The mathematical machinery used in the study of global system theory consists of differential geometric methods, the theory of foliations, and the theory of topological groups. Our objective in this section is to point the reader to the excellent and insightful body of literature that exists on the subject as well as to convey the intuition behind the principal ideas involved. This will permit the neural network community to formulate well-posed problems in the design of nonlinear controllers using neural networks as well as to chart future directions for research. We begin by characterizing the natural state space of a nonlinear dynamic system. The first question that arises naturally is the essential difference between linear and nonlinear systems, from a geometric viewpoint. The state space Rn of a linear system is “flat” in the sense that it expands to infinity along the direction given by the vectors of a basis of Rn . 
The space of nonlinear systems, on the other hand, is curved and is defined as the manifold M, where a point p ∈ M if there exists an open neighborhood U of p and a homeomorphic map $\varphi : U \to \varphi(U) \subset \mathbb{R}^n$, called the (local) coordinate chart of M. In other words, the manifold "looks" locally like $\mathbb{R}^n$. This simple fact has an important consequence: many coordinate systems may be needed to describe the global evolution of a nonlinear dynamic system. Whereas the manifold is an abstract geometric object, the coordinate system is the physical handle on that object through which we have to interact with the system when we control it. It is important to keep this in mind when designing global nonlinear controllers.

2.5.1 Dynamics on Manifolds

An excellent textbook on the subject is Boothby [63]. One important idea is the following: The flow of a system is a $C^1$-map $\phi : \mathbb{R} \times U \to M$ sending an initial value p ∈ U defined on some open neighborhood U ⊂ M to a value
φ(t, p) ∈ M (at time $t \in [t_0, t_1]$). It defines a vector field $X_p$ as follows:

$$X_p = \left. \frac{d}{dt}\, \phi(t, p) \right|_{t = t_0} \tag{2.77}$$

$X_p$ is the tangent vector to the curve t → φ(t, p) ∈ M at $t = t_0$. Given a coordinate chart (U, ϕ), we obtain the usual differential equation:

$$f(x) = \dot{x} \tag{2.78}$$
Notice that f(x) is merely the local representative of the vector field $X_p$ defined in Equation (2.77) on the smooth manifold M. Furthermore ϕ(p) = x for all p ∈ U. Once the solution leaves the neighborhood U in which the representation is valid, we have to find a new set of local coordinates. Denote by $C^\infty(x)$ the set of smooth functions defined on a neighborhood of $x \in \mathbb{R}^n$. Any vector field defines a linear operator assigning to any $h \in C^\infty(x)$:

$$(L_f h)(x) = \sum_{i=1}^{n} f_i(x) \left. \frac{\partial h}{\partial x_i} \right|_x \tag{2.79}$$

which is called the directional (Lie) derivative of h along f. Geometrically, the vector field assigns to each p ∈ M a tangent vector given as an element of a linear space of mappings called tangent space $T_p M$ to the manifold M at p. The mappings are denoted by:

$$X_p : C^\infty(p) \to \mathbb{R} \tag{2.80}$$
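To make the Lie derivative (2.79) concrete, here is a small worked instance on $\mathbb{R}^2$. The function h and the two vector fields are chosen purely for illustration and do not come from the text.

```latex
% Worked instance of (2.79) with h(x) = x_1^2 + x_2^2 on R^2.
\begin{align*}
  f(x) = (x_2,\, -x_1)^T : \quad
    (L_f h)(x) &= x_2\,(2x_1) + (-x_1)\,(2x_2) = 0, \\
  f(x) = (-x_1,\, -x_2)^T : \quad
    (L_f h)(x) &= (-x_1)\,(2x_1) + (-x_2)\,(2x_2) = -2\,h(x).
\end{align*}
```

In the first case h is constant along the flow (the trajectories are circles); in the second, h decays exponentially along the flow, which is the familiar Lyapunov-type computation written in this notation.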
Given a smooth map F : M → N it is clear that for any point p ∈ M we have F(p) ∈ N. But what happens to the tangent vectors attached to p? We define a map (called the tangent map of F at p) $F_* : T_p M \to T_{F(p)} N$ as follows:

$$F_* X_p(h) = X_p(h \circ F) \tag{2.81}$$
where h is (again) a smooth scalar valued function, here defined on the target manifold, $h : N \to \mathbb{R}$. $h \circ F$ denotes the composition of h and F (at p). Hence the argument of $X_p$ in Equation (2.81) is simply the function h(·) evaluated at the point F(p). It is easily checked that $F_* X_p$ is indeed an element of $T_{F(p)} N$, that is, the tangent space to N at F(p) (see Figure 2.15). The tangent map $\varphi_* : T_p M \to T_{\varphi(p)} \mathbb{R}^n$ is used to define local representatives of the tangent vectors $X_p$ at p ∈ M. The tangent bundle defined as:

$$TM = \bigcup_{p \in M} T_p M \tag{2.82}$$

is 2n-dimensional with natural coordinates $(\varphi(p), \varphi_*(X_p)) =: (x, f(x))$, that is, it is composed of the point p on M and the corresponding tangent vector $X_p \in T_p M$ attached at that point.
FIGURE 2.15 Tangent maps.
We are now ready to define a (smooth) vector field as the mapping:

$$X : M \to TM \tag{2.83}$$

assigning to every p ∈ M a tangent vector $X_p \in T_p M$ (in a smooth way). In view of the above a nonlinear control system of the general type can be defined in local coordinates as follows:

$$\Sigma : \quad \dot{x} = f(x, u) \tag{2.84}$$
where $\varphi(p) = x \in \mathbb{R}^n$ and $\varphi_*(X_p(u)) = f(x, u) \in T_x \mathbb{R}^n$ are both defined in a neighborhood U of M, a smooth connected manifold. Notice that the tangent space $T_x \mathbb{R}^n$ to the Euclidean space at any point x is actually equivalent to $\mathbb{R}^n$. The vector field is parameterized by the controls $u \in V \subset \mathbb{R}^r$. We assume that the solution for Equation (2.84) exists up to infinite time for any fixed u.

2.5.2 Global Controllability and Stabilization

Keeping in mind that the representation of a dynamic system on curved spaces requires many local coordinate systems, we set $M = \mathbb{R}^n$ in this section. We are interested in designing a globally stabilizing feedback controller for a general nonlinear system of the form (2.84). Among the many ideas developed for nonlinear feedback control (see, e.g., Reference [64] and references therein) we select one that closely builds upon the results obtained in Chapter 4; in fact, it will allow us to extend the results in the very direct meaning of the word (see Reference [65]). Given the control system (2.84) where $x \in \mathbb{R}^n$, the solution for Equation (2.84) exists up to infinite time for any fixed $u \in V \subset \mathbb{R}^r$. The problem is to find a feedback control u ∈ V such that $[x^*, u^* = u(x^*)]$ is an asymptotically stable fixed point of the closed-loop system f(x, u(x)). We restrict ourselves to the case of semi-global stabilization, that is, the region of attraction of $x^*$ is a compact subset $K \subset \mathbb{R}^n$. A first question is whether a smooth feedback can be found that stabilizes the system. This is fundamental if neural networks
are to be employed to approximate and implement the control law. Let

$$f^{-1}(0) = \{ (x, u) \in \mathbb{R}^n \times \mathbb{R}^r \mid f(x, u) = 0 \} \tag{2.85}$$
denote the equilibrium set of the control system. It turns out that for some point $[x^*, u(x^*)] \in f^{-1}(0)$ to be smoothly stabilizable, $f^{-1}(0)$ must be an unbounded set. As an example [65], the system

$$\dot{x}_1 = x_1^2 + x_2^2 - 1, \qquad \dot{x}_2 = u \tag{2.86}$$
is not smoothly stabilizable in the large, as its equilibrium set defined by $x_1^2 + x_2^2 = 1$, u = 0 is bounded. Moreover, a general smooth system defined on a compact set $K \subset \mathbb{R}^n$ is never globally smoothly stabilizable because its equilibrium set is evidently bounded. Another necessary condition for $C^\infty$ stabilizability obtained by Brockett [66] is that $f(x, u) : \mathbb{R}^n \times \mathbb{R}^r \to \mathbb{R}^n$ maps every neighborhood of $(x^*, u^*)$ onto a neighborhood of zero. As an example, the system:

$$\dot{x}_1 = u_1, \qquad \dot{x}_2 = u_2, \qquad \dot{x}_3 = x_2 u_1 - x_1 u_2 \tag{2.87}$$
does not have a continuous stabilizing feedback, because no point of the form $[0\ 0\ \varepsilon]^T$ with ε ≠ 0 is in the image of f(x, u): $\dot{x}_1 = \dot{x}_2 = 0$ forces $u_1 = u_2 = 0$, and then $\dot{x}_3 = 0$. These conditions motivated the introduction of discontinuous feedback laws. To this end, we are interested in a special kind of controllability (discussed in Section 2.2): a point $x_0 \in K$ is piecewise constantly steered into a point $x^* \in \mathbb{R}^n$ if:

$$\exists\, T \in \mathbb{R},\ T > 0, \ \text{and a piecewise constant } u(t) : [0, T] \to V \subset \mathbb{R}^r \ \text{such that}\ \phi_u(T, x_0) = x^* \tag{2.88}$$
where $\phi_u(t, x_0)$ is the flow of f(x, u) on $\mathbb{R}^n$ with initial value $x_0$. $V \subset \mathbb{R}^r$ is a finite set. If Equation (2.88) holds for every point $x_0 \in K$ then $x^*$ is said to be piecewise constantly accessible from the set K. Accessibility in general is the property that the above holds for arbitrary $u : [0, T] \to V \subset \mathbb{R}^r$. Equivalently, controllability means that every point $x_0 \in K$ can be steered into $x \in \mathbb{R}^n$. It is clear that the controls $u_i \in V = \{u_1, \ldots, u_N\}$ generate different vector fields $f_i = f(x, u_i)$ where i = 1, ..., N. Every vector field when applied to the system will cause the state variable x to evolve in the direction tangent to it. A fundamental property of discontinuous controls is that they may generate additional directions (i.e., other than $f_i$) in which the system may evolve.
Example (adapted from Reference [64])
Consider a kinematic model of a car (front axis) with position [x1 , x2 ] ∈ R2 and the angle of rotation x3 ∈ S1 , that is, the state space of the car is given
by $M = \mathbb{R}^2 \times S^1$. The two applicable events are "drive" and "rotate", which correspond to the two vector fields:

$$f_1 = \begin{pmatrix} \sin x_3 \\ \cos x_3 \\ 0 \end{pmatrix} \ \text{"drive"}, \qquad f_2 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \ \text{"rotate"} \tag{2.89}$$

As the experienced driver knows, in order to park the car, one has to use the switching sequence given by "roll – rotate – roll back – rotate back", that is, the resulting flow is obtained as the composition of the flows induced by the fields $f_1$, $f_2$, $-f_1$, and $-f_2$. It can be shown that the system moves infinitesimally in the direction orthogonal to the drive direction. The new direction corresponds to the Lie bracket of the vector fields $f_1$ and $f_2$. The Lie bracket of two vector fields is another vector field that measures the noncommutativeness of the flows induced by both vector fields. The Lie bracket is instrumental in understanding nonlinear control systems, as it implies that the points attainable from some point $x_0$ by the vector fields $f_i$ lie not only in the directions given by linear combinations of $f_i$ but also in the direction of the (iterated) Lie brackets of $f_i$. In local coordinates, the Lie bracket is written

$$[f_1, f_2]_x = \left. \frac{\partial f_2}{\partial x} \right|_x f_1(x) - \left. \frac{\partial f_1}{\partial x} \right|_x f_2(x). \tag{2.90}$$
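The bracket (2.90) for the two car fields, and the "roll, rotate, roll back, rotate back" maneuver, can be checked numerically. The Python sketch below is an illustration under assumptions: Jacobians are approximated by central differences, and the test point and step size are arbitrary. Because $x_3$ is constant along $f_1$ and $f_2$ is constant, both flows are straight lines in the coordinates and are computed exactly.

```python
import numpy as np

def f1(x):
    """'drive' vector field of (2.89)."""
    return np.array([np.sin(x[2]), np.cos(x[2]), 0.0])

def f2(x):
    """'rotate' vector field of (2.89)."""
    return np.array([0.0, 0.0, 1.0])

def jacobian(f, x, h=1e-6):
    """Central-difference approximation of the Jacobian of f at x."""
    n = len(x)
    J = np.zeros((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = h
        J[:, j] = (f(x + dx) - f(x - dx)) / (2.0 * h)
    return J

def lie_bracket(f, g, x):
    """[f, g](x) = Dg(x) f(x) - Df(x) g(x), as in (2.90)."""
    return jacobian(g, x) @ f(x) - jacobian(f, x) @ g(x)

def drive(x, t):
    return x + t * f1(x)      # exact flow of f1 for time t

def rotate(x, t):
    return x + t * f2(x)      # exact flow of f2 for time t

x0 = np.array([0.3, -0.5, 0.7])
eps = 1e-3
bracket = lie_bracket(f1, f2, x0)

# roll, rotate, roll back, rotate back, each for a time eps
x_end = rotate(drive(rotate(drive(x0, eps), eps), -eps), -eps)
displacement = (x_end - x0) / eps**2
```

The net displacement of the four-step maneuver, divided by the square of the step, approaches the bracket as the step shrinks, and the bracket is orthogonal to the drive direction $f_1$.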
In the above example we obtain $[f_1, f_2] = [-\cos x_3 \ \ \sin x_3 \ \ 0]^T$. For the interested reader some of the more formal mathematical statements regarding Lie brackets are included at the end of the section (see also References [67, 68]). At present, we state the main result of the section, which is due to Reference [65].

THEOREM Let w = w(x) be a smooth feedback that locally stabilizes Σ at $(x^*, w^*) \in f^{-1}(0)$. Let K be a compact set and w ∈ Int V where V is the set of admissible controls. Then w = w(x) has a piecewise smoothly stabilizing extension $u = u(x) : \mathbb{R}^n \to V$ over K if and only if $N_{x^*}$ is piecewise constantly accessible from K, where $N_{x^*}$ is an open neighborhood of $x^*$ such that its closure $\bar{N}_{x^*}$ is an invariant set of the closed-loop system $\dot{x} = f(x, w(x))$ and w = w(x) smoothly stabilizes Σ in $(x^*, u^*)$ over $N_{x^*}$.

Let us highlight the conditions of the theorem. It is required that

1. The system must be locally stabilized at the point $(x^*, u^*) \in f^{-1}(0)$.
2. The point $x^*$ must be piecewise constantly accessible from a compact set K.

In order to verify (1) and realize the local stabilizing controller the methods described in Chapter 4 are used. We know that if the linearized system

$$\Sigma_L = \left( \left. \frac{\partial f}{\partial x} \right|_{x^*, u^*},\ \left. \frac{\partial f}{\partial u} \right|_{x^*, u^*} \right) := (A, B)$$

is stabilizable, that is,

$$\mathrm{rank}\,(sI - A,\ B) = n \quad \text{whenever} \quad \mathrm{Re}\, s \ge 0 \tag{2.91}$$

then the original system is locally $C^\infty$ stabilizable at $(x^*, u^*)$.
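The rank condition (2.91) needs to be checked only at the eigenvalues of A with nonnegative real part, since sI − A already has full rank elsewhere. The Python sketch below is a minimal illustration; the function name and tolerance are assumptions. The first pair (A, B) is what one obtains by linearizing $\dot{x}_1 = -x_1 + x_2(1 + x_1^2)$, $\dot{x}_2 = u$ at the origin, the system of the example (2.92) treated later in this section. The second pair is a hypothetical counterexample with an uncontrollable unstable mode.

```python
import numpy as np

def is_stabilizable(A, B, tol=1e-9):
    """Rank test (2.91), evaluated at the eigenvalues of A with Re(s) >= 0."""
    n = A.shape[0]
    for s in np.linalg.eigvals(A):
        if s.real >= -tol:
            M = np.hstack([s * np.eye(n) - A, B])
            if np.linalg.matrix_rank(M, tol=tol) < n:
                return False
    return True

# Linearization at the origin of the example system (2.92).
A_ok = np.array([[-1.0, 1.0], [0.0, 0.0]])
B_ok = np.array([[0.0], [1.0]])

# Hypothetical pair whose unstable mode (s = 1) is not reachable from the input.
A_bad = np.array([[1.0, 0.0], [0.0, -1.0]])
B_bad = np.array([[0.0], [1.0]])

result_ok = is_stabilizable(A_ok, B_ok)
result_bad = is_stabilizable(A_bad, B_bad)
```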
Condition (2) refers to our above discussion and Equation (2.88). The theorem states that the semiglobally stabilizing control law will be given by a set of smooth controls $u_i$, $i \in \{1, \ldots, N\}$, and a switching sequence $\sigma(x) : \mathbb{R}^n \to \{1, \ldots, N\}$ that decides which control law is to be used depending on the state of the system. Notice that one of the controllers $u_i$ corresponds to the local stabilizer at $x^*$ while the others serve to enlarge the region of attraction of $x^*$ in the closed-loop system. Based on this existence theorem, $N + 1$ neural networks can be used to implement $u_i$, $i = 1, \ldots, N$, and the switching function. The first $N$ networks play the role of function approximators while the $(N + 1)$th network is used as a classifier of the state space of the system.
Example Find a global controller that stabilizes the origin of the system:

$$\dot{x}_1 = -x_1 + x_2 (1 + x_1^2), \qquad \dot{x}_2 = u \tag{2.92}$$
We verify condition (1) and find that the controllability matrix of the linearized system (at zero) has full rank. We define a linear feedback: u = −x2
(2.93)
to stabilize the nonlinear system (2.92) in a neighborhood of the origin. It is clear that the control law cannot be extended to an arbitrary compact domain $K \subset \mathbb{R}^2$, because the quadratic term $x_1^2$ will eventually dominate the stable part “$-x_1$” on the right-hand side of Equation (2.92). Assuming global accessibility of the origin (the question is addressed later) we define the piecewise smooth feedback:

$$u = -\operatorname{sign}(x_1)(1 + x_1^2) - \operatorname{sign}(x_2) \tag{2.94}$$

which steers the system state to the region where the linear stabilizing control law is valid. This can be verified using the strict Lyapunov function:

$$V = |x_1| + 0.5\, x_2^2 \tag{2.95}$$

The time derivative along the trajectories of the closed-loop system is $\dot{V} = -|x_1| - |x_2| < 0$. Figure 2.16 displays the local stability region and a sample trajectory which starts from outside this region and is steered to the origin using a piecewise smooth control. This is possible since the system is in fact semiglobally controllable. Controllability of a nonlinear system depends upon the way the family of vector fields $F_V = \{ f(x, u) \mid u \in V \subset \mathbb{R}^r \}$ generates a Lie algebra of differentiations of $C^\infty(\mathbb{R}^n)$. Consider the control system:

$$\dot{x} = \underbrace{f(x)}_{\text{drift vector field}} + \underbrace{g_1(x) u_1 + \ldots + g_r(x) u_r}_{\text{control vector fields}} \tag{2.96}$$
FIGURE 2.16 Piecewise smooth stabilization in the large. The controller consists of a piecewise smooth feedback of the form (2.94) (outside the local stability region) and (2.93) (in a neighborhood of the origin).
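The two claims made for this example, full rank of the controllability matrix of the linearization and the value of $\dot{V}$ under the feedback (2.94), can be checked numerically. The sketch below is ours and rests on one assumption: the drift term of (2.92) is read as $-x_1 + x_2(1 + x_1^2)$, which is consistent with the “quadratic term” mentioned in the text. The function names and the sampling range are illustrative only.

```python
import random

def phi(x1):
    """Factor 1 + x1^2 appearing in (2.92) and (2.94) (assumed reading)."""
    return 1.0 + x1 * x1

def sign(v):
    """Sign function returning -1, 0, or 1."""
    return (v > 0) - (v < 0)

def u_outer(x1, x2):
    """Piecewise smooth feedback of Equation (2.94)."""
    return -sign(x1) * phi(x1) - sign(x2)

def rhs(x1, x2, u):
    """Right-hand side of system (2.92)."""
    return (-x1 + x2 * phi(x1), u)

def v_dot(x1, x2):
    """dV/dt for V = |x1| + 0.5*x2^2 along the closed loop (valid for x1 != 0)."""
    dx1, dx2 = rhs(x1, x2, u_outer(x1, x2))
    return sign(x1) * dx1 + x2 * dx2

# Linearization of (2.92) at the origin: A = df/dx, B = df/du.
A = [[-1.0, 1.0], [0.0, 0.0]]
B = [0.0, 1.0]
AB = [A[0][0] * B[0] + A[0][1] * B[1], A[1][0] * B[0] + A[1][1] * B[1]]
ctrb_det = B[0] * AB[1] - B[1] * AB[0]    # determinant of [B, AB]

random.seed(0)
samples = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(1000)]
# Largest deviation of dV/dt from -|x1| - |x2| over the samples.
max_gap = max(abs(v_dot(a, b) + abs(a) + abs(b)) for a, b in samples)
```

A nonzero `ctrb_det` confirms condition (1) for this example, and a `max_gap` at the level of rounding error confirms that the nonlinear terms cancel exactly in $\dot{V}$.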
Given an initial point $x(0)$, we wish to determine the set of points that can be reached from $x(0)$ in finite time by a suitable choice of the input functions $u_1, \ldots, u_r$. The set of vector fields of Equation (2.92) that can be obtained by applying different controls $u$ spans a linear space $\Delta_x$ at any point $x \in \mathbb{R}^n$. $\Delta_x$ is called a distribution of vector fields at $x$ and is a subspace of the tangent space $T_x\mathbb{R}^n$ of $\mathbb{R}^n$ at $x$. The set of points reachable from $x_0 \in K$ lies on the integral manifold of $\Delta_x$. Frobenius’ theorem states that $\Delta$ is integrable if and only if it is of constant rank and involutive, that is, closed under Lie brackets: $f_1, f_2 \in \Delta \Rightarrow [f_1, f_2] \in \Delta$ at every point $x \in \mathbb{R}^n$. Thus, the system is controllable if it is possible to construct an integrable distribution of dimension $n$. This is achieved by successively adding to $\Delta$ “new directions” obtained by forming higher-order Lie brackets of the vector fields in $F_V$. In the above example we have

$$f(x) = \begin{pmatrix} -x_1 + x_2 (1 + x_1^2) \\ 0 \end{pmatrix} \quad \text{and} \quad g(x) = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \tag{2.97}$$

We form a distribution of vector fields,

$$\Delta_x = \operatorname{span}\{ g, [f, g] \} = \operatorname{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 + x_1^2 \\ 0 \end{pmatrix} \right\} \tag{2.98}$$
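As a check on Equation (2.98), the bracket can be worked out directly from the definition (2.90). The short derivation below is ours and assumes that the drift term in (2.92) reads $-x_1 + x_2(1 + x_1^2)$:

```latex
\begin{aligned}
[f, g] &= \frac{\partial g}{\partial x} f - \frac{\partial f}{\partial x} g
        = 0 - \begin{pmatrix} -1 + 2 x_1 x_2 & 1 + x_1^2 \\ 0 & 0 \end{pmatrix}
              \begin{pmatrix} 0 \\ 1 \end{pmatrix}
        = -\begin{pmatrix} 1 + x_1^2 \\ 0 \end{pmatrix}, \\
\det \begin{pmatrix} g & [f, g] \end{pmatrix}
       &= \det \begin{pmatrix} 0 & -(1 + x_1^2) \\ 1 & 0 \end{pmatrix}
        = 1 + x_1^2 > 0 .
\end{aligned}
```

The sign of the bracket does not affect the span, so the two vectors are linearly independent for every $x$ and the distribution has dimension 2 everywhere.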
We include only those Lie brackets that are not already contained in lower-order Lie brackets of vector fields in $F_V$. Given any point $x \in \mathbb{R}^n$, $\mathrm{Lie}_x F_V$ is the linear space spanned by the tangent vectors of $\mathrm{Lie}\, F_V$ at that point.
The dimension of that space is called the rank of the Lie algebra $\mathrm{Lie}\, F_V$ at the point $x \in \mathbb{R}^n$. The system defined in Equation (2.84) is globally controllable provided that:

$$\operatorname{rank} \mathrm{Lie}_x F_V = \dim \mathbb{R}^n \quad \forall\, x \in \mathbb{R}^n \tag{2.99}$$
The condition is evidently fulfilled in our example; see Equation (2.98). The construction of the distribution enables us to identify the set of reachable states without specifying the control input $u_1, \ldots, u_r$. The set depends exclusively on geometrical properties of the system (2.96).

2.5.3 Global Observability

In the above, it was assumed that the state $x$ of the system is accessible. Stabilization of a nonlinear system is more difficult if only a function of the state is available, in the way introduced earlier:

$$y = h(x) \tag{2.100}$$
where $h : \mathbb{R}^n \to \mathbb{R}$ is a $C^\infty$ function representing some measuring device. As is well known from control theory and has also been stressed in Section 2.2, the critical property in this case is observability. Unlike the linear case it is impossible to pass directly from controllability conditions to observability because in the nonlinear domain there is no clear notion of duality. The question is whether, given two initial states $x_1$ and $x_2$, one can distinguish these initial states by observing the values of the “output” function $h(x)$ for any input sequence of length $l$. This is referred to as strong observability. A weaker form is generic observability, which requires that almost any input sequence of length greater than or equal to $l$ will uniquely determine the state. In Aeyels [69] it is demonstrated that almost all smooth output functions pair with an almost arbitrarily chosen smooth vector field to form a globally observable system provided that $l = 2n + 1$ samples are taken. His proof makes use of a “generic transversality” theorem of differential topology to characterize observable flows. An extension of this result to “universal observability” has been given in Reference [70] in which the output function is only continuous. In a paper by the first author [19], conditions under which strong observability holds have been investigated and it is shown how this can be used to construct global input-output models of the nonlinear system.

Comment 16 In this section we found that the extension of familiar concepts such as controllability, observability, and stabilization to the nonlinear domain requires new mathematical tools and insights. Over the last thirty years a rich body of literature has been created at the intersection of differential geometry, topology, and control theory which addresses the questions of global nonlinear control. However, these results have not yet entered the engineering literature.
Constructive methods for actually realizing global controllers based on geometric control theory are only sparsely available and often
involve a high level of mathematical formalism which is hard to grasp intuitively. In practice, our ideas for successful and ingenious design come from the “feel” we have about the effect of the chosen design on the system behavior. We have not reached this stage yet in the nonlinear domain but the material presented in this section is meant to be the first step in that direction. The fact that switching may be involved in order to overcome the topological obstruction to global stabilization must become common knowledge in much the same way as state space properties are in linear systems. The multiple-model approach described in Section 2.4 provides the architecture for orchestrating the action of the neural networks involved in global nonlinear control.
2.6 Optimization and Optimal Control Using Neural Networks
As stated in Section 2.1.4, there is currently a great deal of interest in the use of neural networks in optimization and optimal control problems. These are problems in which optimization is carried out over a finite time. Even though the authors are not actively involved in research in this area at present, a few years ago the first author had an active program in the area of optimal control using neural networks, and consequently has some familiarity with such problems. Hence, for the sake of completeness, we wish to discuss this topic briefly and clarify the concepts involved. In Section 2.6.1, the system to be optimized is assumed to be completely known. Optimal control is the principal tool, and is used to determine optimal control inputs as functions of time. The information collected is used to design neural networks to act as feedback controllers. In Section 2.6.2 the principal mathematical vehicle is dynamic programming. More importantly, the system to be controlled is either unknown or partially known. Hence it addresses problems of optimization under uncertainty and bears the same relation to Section 2.6.1 that adaptive control problems discussed in Section 2.4 bear to feedback control theory. Finally, although the problems treated in this section involve optimization over a finite time interval, unlike the adaptive control problems treated earlier, much of the motivation for using different approximation schemes, as well as the analytical difficulties encountered, are very similar.

2.6.1 Neural Networks for Optimal Control

A question that arises in all decision making in both biological and engineering systems concerns the extent to which decisions should be based on memory and online computation. For example, in some cases retrieval of stored answers may be preferable; in other cases solutions may have to be computed online, based on data obtained at that instant.
In this section, we describe methods proposed in the last decade that utilize the above concepts for solving optimal control problems using neural networks.
Theoretical methods such as Pontryagin’s maximum principle and Bellman’s dynamic programming exist for determining optimal controls for nonlinear dynamic systems with state and control constraints. However, solutions for specific problems can rarely be determined online in practical problems. In optimal control theory the solutions are obtained as functions of time [i.e., $u(t)$], which makes them nonrobust in practical applications, where feedback controllers [i.e., controllers of the form $u(x)$] are required. During the period 1994 to 2000 very promising methods were proposed by Narendra and Brown [71] for circumventing these difficulties. The authors suggested that open-loop solutions of optimal control problems computed off-line could be used to train neural networks as online feedback controllers. Substantial progress was made, and the authors succeeded in proposing and realizing solutions to relatively complex optimal control problems. However, even as efforts to improve the scope of the approach were proving successful, the research was terminated, for a variety of reasons. Because the concepts may prove attractive to future researchers in the field, and also as an introduction to Section 2.6.2, they are discussed briefly here.

2.6.1.1 Function Approximation Using Neural Networks

The gradual evolution of the “solve and store” approach from function approximation to optimal feedback control is best illustrated by considering several simple examples. The approach itself was motivated by a problem in a transportation system involving $N$ electrical vehicles whose positions and velocities determine voltages which are critical variables of interest. A simplified version is given in Problem 7.

PROBLEM 7 Consider the network shown in Figure 2.17a representing two electrical vehicles operating on a track. $R_1$, $R_2$, and $R_3$ are fixed resistances, while three other resistances depend on variables $x_1$ and $x_2$.
The voltages $V_1$ and $V_2$ across $R_1$ and $R_2$ are nonlinear functions of $x_1$ and $x_2$ and are the values of interest. The objective is to estimate $V_1$ and $V_2$ for given values of $x_1$ and $x_2$. This is a standard application of a neural network as a function approximator. The problem was solved 1000 times for randomly chosen values of $x_1$ and $x_2$ in a compact set and a two-input two-output neural network shown in Figure 2.17b was trained. The network outputs were compared with true values for test inputs, and the errors had standard deviations of 0.0917 and 0.0786, respectively.

FIGURE 2.17 Estimation of voltages in a time-varying electrical system. (a) Circuit with source $V_s$, fixed resistances $R_1$, $R_2$, $R_3$, and variable resistances $Rx_1$, $R(x_2 - x_1)$, $R(1 - x_2)$. (b) Neural network with inputs $x_1$, $x_2$ and outputs $V_1(x_1, x_2)$, $V_2(x_1, x_2)$.

PROBLEM 8 All the resistances in Problem 7 were linear, making the computations of currents and voltages simple for any choice of $x_1$ and $x_2$. In Problem 8, one of the resistances was made nonlinear, making the computation of $V_i(x_1, x_2)$ ($i = 1, 2$) substantially more complex. For every choice of $x_1$ and $x_2$ the problem had to be solved iteratively and the results used to train the neural network. This problem introduces the principal difficulty encountered in many of the problems that follow, where available data have to be processed at many levels to obtain the relevant inputs and outputs necessary to train neural networks.
2.6.1.2 Parameter Optimization

The next stage in the evolution of the method was concerned with static optimization in which parameter values have to be determined that optimize a performance criterion in the presence of equality or inequality constraints. As shown below, a number of optimization problems have to be solved to obtain the information to train a neural network.

PROBLEM 9 A function $f(x, \alpha)$ has to be minimized subject to an equality constraint $g(x, \alpha) = 0$, where $x = [x_1, x_2]^T \in \mathbb{R}^2$, and $\alpha$ is a parameter. $\alpha$ can assume different values in the interval $[0, 1]$ and a neural network has to be trained to obtain the corresponding optimal values of $x_1(\alpha)$ and $x_2(\alpha)$.

$$f(x, \alpha) = (\alpha + 0.5)\, x_1^2 + \frac{x_2^2}{(1 + \alpha)^2} - \frac{1 + \alpha}{2}\, x_1 x_2$$

$$g(x, \alpha) = (1 - \alpha)\left[ 10 + 0.1 (x_1 + 10\alpha - 5)^3 + \alpha(-3 x_1 - 10) - x_2 \right] = 0 \tag{2.101}$$
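The “solve and store” step for Problem 9 can be sketched as follows. This is an illustration under stated assumptions, not the procedure used by the authors: the objective is our reading of the partly garbled Equation (2.101), namely $f = (\alpha + 0.5)x_1^2 + x_2^2/(1+\alpha)^2 - \tfrac{1}{2}(1+\alpha)x_1 x_2$; the constraint is solved explicitly for $x_2$ (possible for $\alpha \neq 1$); and a plain grid search over $x_1 \in [-10, 10]$ stands in for a proper constrained optimizer. The names `solve` and `table` are hypothetical.

```python
def f(x1, x2, alpha):
    """Objective of Problem 9 (assumed reading of Equation (2.101))."""
    return ((alpha + 0.5) * x1 ** 2
            + x2 ** 2 / (1 + alpha) ** 2
            - 0.5 * (1 + alpha) * x1 * x2)

def x2_on_constraint(x1, alpha):
    """Solve g(x, alpha) = 0 for x2; valid for alpha != 1."""
    return 10 + 0.1 * (x1 + 10 * alpha - 5) ** 3 + alpha * (-3 * x1 - 10)

def solve(alpha, lo=-10.0, hi=10.0, n=20001):
    """Grid search along the constraint curve; returns (x1, x2, f)."""
    best = None
    for i in range(n):
        x1 = lo + (hi - lo) * i / (n - 1)
        x2 = x2_on_constraint(x1, alpha)
        val = f(x1, x2, alpha)
        if best is None or val < best[2]:
            best = (x1, x2, val)
    return best

# "Solve and store": optimal points indexed by alpha, which would serve
# as training pairs (alpha -> x1, x2) for the network.
# alpha = 1 is left out because the factor (1 - alpha) makes g vanish identically.
table = {a / 10: solve(a / 10) for a in range(0, 10)}
```

Because the search keeps the global minimizer along the curve, the stored solution can jump from one local minimum to another as $\alpha$ varies, which is one way a discontinuous dependence on $\alpha$ can arise.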
Plots of $f(x, \alpha) = c$ (a constant) and $g(x, \alpha) = 0$ are shown for two typical values of $\alpha$ in Figure 2.18, that is, $\alpha = 0.2$ and $\alpha = 0.8$, from which the optimal values $x_1(0.2)$, $x_2(0.2)$ and $x_1(0.8)$, $x_2(0.8)$ can be computed. The constrained parameter optimization problem was solved 100 times for 100 values of $\alpha$ to train the network to obtain optimal solutions for values of $\alpha$ not used before. It is seen from Figure 2.19 that the optimal values of $x_1$ and $x_2$ are discontinuous functions of $\alpha$.

PROBLEM 10 (Dynamic Optimization) Problem 9 sets the stage for dynamic optimization in which neural networks can be used effectively as feedback controllers in optimal control problems. The general statement of such problems is first given and two examples are included to illustrate the different specific forms it can take. A system is described by the differential equation:

$$\dot{x} = f[x(t), u(t)], \qquad x(t_0) = x_0 \tag{2.102}$$
FIGURE 2.18 Contour plots of $f(x, \alpha) = c$, where $c$ is a constant, and constraint curves $g(x, \alpha) = 0$ for two values of $\alpha$: (a) $\alpha = 0.2$; (b) $\alpha = 0.8$. Axes: $x_1$ (horizontal) and $x_2$ (vertical), each from −10 to 10.
where $f(\cdot)$ satisfies conditions to assure the existence and uniqueness of solutions in an interval $[0, T]$, $x(t) \in \mathbb{R}^n$, and $u(t) \in \mathbb{R}^r$. The input $u(\cdot)$ is amplitude constrained and must lie in a unit cube $C \subset \mathbb{R}^r$ [i.e., $|u_i(t)| \le 1$, $i = 1, 2, \ldots, r$]. The initial state $x_0 \in S_0$ and the objective is to determine a control input that transfers $x_0$ to $x_T$ and
FIGURE 2.19 Optimal values of $x_1$ and $x_2$ as functions of $\alpha$. Precomputed solutions coincide with neural network approximations.
minimizes a performance criterion:

$$J[u] = \int_0^T L[x(t), u(t)]\, dt \tag{2.103}$$
The optimal input and optimal trajectory are denoted by $u^*(t)$ and $x^*(t)$, respectively. The above problem reduces to the solution of $2n$ differential equations of the form:

$$\begin{aligned} \dot{x}(t) &= H_\lambda[x(t), \lambda(t), u(t)], & x(0) &= x_0 \\ \dot{\lambda}(t) &= -H_x[x(t), \lambda(t), u(t)], & x(T) &= x_T \end{aligned} \tag{2.104}$$

and the optimal input $u^*(t)$ is determined from the optimality condition:

$$\inf_{u(t) \in C} H[x^*, \lambda, u] = H[x^*, \lambda, u^*] \tag{2.105}$$
This necessary condition confines the optimal solution to a small set of candidates. In the problems that we shall consider, it will be unique. Once $u^*(t)$ is known as a function of $x^*(t)$ and $\lambda(t)$, Equations (2.104) correspond to a two-point boundary value problem (TPBVP) that can be solved off-line through successive approximations. This yields $x^*(t)$, $\lambda(t)$ and the corresponding $u^*(t)$ as functions of time.

Comment 17 Our interest is in training neural networks as feedback controllers for the above problem. Following the procedure we have adopted thus far, the above problem must be solved for numerous values of $x_0 \in S_0$ to obtain the necessary information.
By Bellman’s principle of optimality, given the optimal trajectory from $x_0$ to $x_T$, if $x^*(t_1)$ is on this trajectory, the optimal control from $t_1$ to $T$ is merely $u^*(t + t_1)$ over the remaining $T - t_1$ units of time. If a family of optimal controls can be generated off-line for different values of the state, they can be stored and used to train a neural network. Such a neural network will have for its inputs $x(t)$, the state of the system, $x_T$, the final state, and $T_r$, the time to go (i.e., $T - t$). The following two problems are considered in Reference [71], and only brief descriptions of the problems and the corresponding solutions are presented here.

PROBLEM 11 (Minimum Time) A second-order system is described by the differential equations $\dot{x}_1 = x_2$, $\dot{x}_2 = u$. The scalar input $u(\cdot)$ satisfies the amplitude constraint $|u(t)| \le 1$, and the trajectories should lie outside a circular region in the state space. A specified initial state $x_0$ must be transferred to a final state $x_f$ in minimal time. It can be shown that the solution to the above problem (1) does not exist, or (2) is bang-bang, or (3) is piecewise continuous. The optimal trajectory in the state space and the corresponding input for a typical pair of states $x_0$ and $x_f$ (generated by a neural network) is shown in Figure 2.20.

PROBLEM 12 (Minimum Energy) Another problem that is generally included in standard textbooks on control systems deals with minimum energy control, for which a closed form solution exists.
FIGURE 2.20 (Plot labels: state-space axes $x_1$ and $x_2$; constrained region; initial state $x_0$; final state $x_f$; input levels +1 and −1.)

FIGURE 2.21 (Block diagram labels: $u(t)$; plant $\dot{x} = Ax + bu$; $x(t)$; $x_0$; neural network; $x_T$; $T_r = T - t$.)
Our interest is in determining a feedback controller for such a problem using a neural network and comparing the computational effort involved in the two cases. A system is described by the linear state equation:

$$\dot{x} = Ax(t) + bu(t) \tag{2.106}$$

where $x(t) \in \mathbb{R}^n$, $u(t) \in \mathbb{R}$, $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$, and $(A, b)$ is controllable. The objective is to transfer an initial state $x_0$ at time $t = 0$ to a final state $x(T) = x_T$ where $T$ is specified, while minimizing the energy $E = \frac{1}{2} \int_0^T u^2(\tau)\, d\tau$. The open-loop optimal control $u^*(t)$ is given by

$$u^*(t) = -b^T W(T_r)^{-1} \left[ x(t) - e^{-A T_r} x_f \right] \tag{2.107}$$

where $T_r$ is the remaining time, that is, $T - t$, and

$$W(T_r) = \int_0^{T_r} e^{-A\tau}\, b\, b^T e^{-A^T \tau}\, d\tau. \tag{2.108}$$
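To make Equations (2.107) and (2.108) concrete, the sketch below applies them to a double integrator, $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, $b = (0, 1)^T$. The plant, horizon, and boundary states are our own choice; for this plant $e^{-A\tau}$ and $W(T_r)$ have simple closed forms, which are hard-coded. The control is recomputed from the current state at every step and held constant over the step, so the result is a sampled approximation of the continuous-time law.

```python
def gramian(tr):
    """W(Tr) of Equation (2.108) for the double integrator (closed form)."""
    return [[tr ** 3 / 3.0, -tr ** 2 / 2.0],
            [-tr ** 2 / 2.0, tr]]

def solve2(M, v):
    """Solve the 2x2 linear system M z = v by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - v[0] * M[1][0]) / det]

def control(x, x_final, tr):
    """u = -b^T W(Tr)^{-1} [x - exp(-A Tr) x_final], Equation (2.107)."""
    # exp(-A Tr) = [[1, -Tr], [0, 1]] for A = [[0, 1], [0, 0]]
    shifted = [x_final[0] - tr * x_final[1], x_final[1]]
    z = solve2(gramian(tr), [x[0] - shifted[0], x[1] - shifted[1]])
    return -z[1]                      # b = [0, 1]^T selects the second entry

def simulate(x0, x_final, T=1.0, steps=2000):
    """Apply the law with a zero-order hold; return final state and energy."""
    dt = T / steps
    x = list(x0)
    energy = 0.0
    for k in range(steps):
        u = control(x, x_final, (steps - k) * dt)    # time to go Tr = T - t
        # exact update of the double integrator under a constant input
        x = [x[0] + x[1] * dt + 0.5 * u * dt * dt, x[1] + u * dt]
        energy += 0.5 * u * u * dt
    return x, energy

x_end, energy = simulate([1.0, 0.0], [0.0, 0.0])
```

For this transfer the continuous-time optimum is $u^*(t) = -6 + 12t$ with energy 6, so `energy` should be close to 6 and `x_end` close to the origin. Evaluating `control` requires inverting $W(T_r)$ at every call; this is the computation a trained network would replace.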
Assuming that the optimal control input and the corresponding response $x^*(t)$ have been generated for a number of triples ($x_0$, $x_T$, and $T$), a neural network having the structure shown in Figure 2.21 can be trained as a feedback controller.

2.6.1.3 Computational Advantage

A comparison of the computational times required to evaluate the Gramian matrix $W$, using standard techniques and a neural network, reveals that the latter has a significant advantage (0.27 ms vs. 600 ms).

Comment 18 The method proposed here, when applied to a slowly varying terminal state, also performed remarkably accurately.

2.6.1.4 Other Formulations

Thus far, the basic philosophy has been to solve the optimal control problem off-line a number of times to obtain optimal solutions, and use the information at different levels to train a neural network to act as a feedback controller. Many other formulations were suggested in the late 1990s to reduce the computational effort. Two of them are shown in Figure 2.22. The idea is
FIGURE 2.22 (Block diagram labels: (a) $\lambda_0 = N_2(x_0, x_f, t_f)$; $\dot{x} = f(x, u)$, $\dot{\lambda} = \Lambda(\lambda, x, u)$; $u = G(x, \lambda)$. (b) $\lambda(t) = N_3(x, x_f, T_f)$; $u = G(x, \lambda)$; $\dot{x} = f(x, u)$.)
to convert the TPBVP into an initial value problem, which is substantially simpler to solve. In particular, if given $x_0$, $x_{t_f}$ and $t_f$ a neural network can map them on to $\lambda_0$, the initial value of $\lambda(t)$, the $2n$ Equations (2.104) can be integrated forward in time, yielding $x(t)$ and $\lambda(t)$ simultaneously. This is shown in Figure 2.22a. A more robust method is shown in Figure 2.22b where a neural network yields $\lambda(t)$ corresponding to the triple ($x_t$, $x_{t_f}$ and $T_{t_f}$ ($= t_f - t$)).

2.6.2 Dynamic Programming in Continuous and Discrete Time

Although optimization using dynamic programming is well known, we review briefly in this section the problem to be addressed and the principal concepts involved to aid us in the discussions that follow involving approximate methods that have been proposed in the literature. We first state the problem in continuous time and later switch to discrete-time systems for the practical realization of the solutions.

2.6.2.1 Continuous Time (No Uncertainty)

A performance function $J[x, u, t]$ is defined as:

$$J[x(t), u(t), t] = \phi(x(t_f)) + \int_{t_0}^{t_f} L[x(\tau), u(\tau), \tau]\, d\tau \tag{2.109}$$
and the objective is to determine a control law that minimizes the performance function. The state x(t) ∈ Rn of the system is governed by the
differential equation: x˙ (t) = f [x(t), u(t), t]
(2.110)
and $u(t) \in \mathbb{R}^r$ denotes the control input. The initial and final times $t_0$ and $t_f$ as well as the initial and final states $x(t_0) = x_0$ and $x(T) = x_T$ are assumed to be specified (in which case $\phi$ in Equation (2.109) can be omitted) or the cost of the final state can be explicitly included in $\phi$. $L : \mathbb{R}^n \times \mathbb{R}^r \times \mathbb{R} \to \mathbb{R}$ is the instantaneous cost as in standard optimal control theory. If $J^*$ represents the optimum value of $J$ we have

$$J^*[x(t), t] = \min_{u(\tau),\ t_0 \le \tau \le t_f} \left\{ \phi(x(t_f)) + \int_{t_0}^{t_f} L(x(\tau), u(\tau), \tau)\, d\tau \right\} \tag{2.111}$$

If the integral in Equation (2.111) is taken over the interval $[t, t_f]$ and expressed as the sum of two integrals, $\int_t^{t + \Delta t} L\, d\tau + \int_{t + \Delta t}^{t_f} L\, d\tau$, then by the principle of optimality the second integral must be optimal independent of the value of the first integral. Extending this argument to the case $\Delta t \to 0$, we obtain the Hamilton–Jacobi–Bellman equation

$$\frac{\partial J^*[x(t), t]}{\partial t} = -\min_{u(t)} \left\{ L(x(t), u(t), t) + \left( \frac{\partial}{\partial x} J^*[x(t), t] \right)^T f(x(t), u(t), t) \right\} \tag{2.112}$$

Partial differential equations of the form given in Equation (2.112) are in general extremely hard to solve and exact solutions have been derived mainly for linear systems (2.110) with quadratic performance criteria. This is why researchers became interested in neural networks, since they are universal approximators, and they permit the problem to be stated as a parameter optimization problem.

Comment 19 Before we discuss the use of neural networks to carry out the computation, it is important to bear in mind that, in general, the above problem admits only a nonsmooth viscosity solution. Therefore assumptions about the performance measure $J$ and the system (2.110) have to be carefully examined before attempting the use of neural networks.

2.6.2.2 Discrete Time (No Uncertainty)

In discrete time the procedure is substantially more transparent. The analogous problem can now be stated as follows:

$$J_T = J_{[t_0, T]}[x(k), u(k), k] = \phi(x(T)) + \sum_{k = k_0}^{T} L(x(k), u(k), k) \tag{2.113}$$
subject to x(k + 1) = f [x(k), u(k), k] x(0) = x0
(2.114)
As $J_{[T, T]}[x(k), u(k), k] = \phi(x(T))$, we start by considering the transfer from $k = T - 1$ to $k = T$:

$$J^*_{[T-1, T]} = J_{[T, T]}[f(x, u, T - 1)] + L(x, u, T - 1) \tag{2.115}$$

where $(x, u)$ implies $[x(T - 1), u(T - 1)]$. This is a one-step optimization problem whose solution yields the optimal $u(T - 1)$ for any initial state $x(T - 1)$. The expression for $J_{[T-2, T]}$ is similar to Equation (2.115), except that the optimal cost from $T - 1$ to $T$ has to be used in place of $J_{[T, T]}$. We therefore have

$$J^*_{[T-2, T]} = \min_{u(T-2)} \left[ L(x, u, T - 2) + J^*_{T-1}(x, T - 1) \right] \tag{2.116}$$

Proceeding backwards in time, we have the general equation

$$J^*_{[T-k, T]}[x, T - k] = \min_{u(T-k)} \left\{ L(x, u, T - k) + J^*_{[T-k+1, T]}[f(x, u, T - k)] \right\} \tag{2.117}$$
where $k = 1, 2, \ldots, T$ is the stage number. Because the procedure involves computations backwards in time (as generally in optimal control), the computations have to be carried out off-line. At stage $T - k$, one obtains the optimal control law $u^*(T - k)$, by the minimization of the function in Equation (2.113), as a function $g(x(T - k), T - k)$ of the state and time. This in turn is used to compute the optimal performance $J^*_{[T-k, T]}[x, T - k]$, which in the next step is used to derive $u^*(T - k - 1)$. As stated earlier, except for the LQ (linear-quadratic) problem which has been studied extensively, an explicit expression of $g : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}^m$ is very hard to obtain. In such a case the state space is discretized and an exhaustive search over the space of admissible solutions is performed to find the optimum. The number of computations grows exponentially with the dimension of the state space, and this is generally referred to as the curse of dimensionality.

2.6.2.3 Discrete Time (System Unknown)

In the problems of interest to us in this chapter, accurate knowledge of complex nonlinear systems is not available. When $f(\cdot)$ in Equation (2.117) is unknown or partially unknown, the methods described thus far cannot be applied and incremental minimization methods of the type treated in previous sections become necessary. These are referred to collectively as approximate dynamic programming. In this section we confine our attention to discrete-time methods, in which most of the current work is being carried out. In these methods, which have been strongly influenced by reinforcement learning, one proceeds forward in time to estimate future rewards based on state and action transitions. Incremental dynamic programming is combined with a parametric structure (using neural networks) to reduce the computational complexity of estimating cost.
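For reference, the backward recursion (2.117) with a known model, a discretized state space, and an exhaustive search over controls can be sketched as follows. The example is ours: a scalar plant $x(k+1) = a\,x(k) + b\,u(k)$ with stage cost $q x^2 + r u^2$ and terminal cost $q x^2$ is chosen because the exact answer is available from the scalar Riccati recursion, so the tabulated cost can be compared against it. Grid sizes and function names are illustrative, and linear interpolation is one of several ways to evaluate the cost-to-go between grid points.

```python
def riccati_cost(a, b, q, r, T):
    """Exact coefficient P_0 (J* = P_0 x^2) for the scalar LQ problem."""
    P = q                                  # terminal cost phi(x) = q x^2
    for _ in range(T):
        P = q + a * a * P * r / (r + b * b * P)
    return P

def interp(grid, values, x):
    """Linear interpolation of tabulated values on a uniform grid."""
    h = grid[1] - grid[0]
    i = min(int((x - grid[0]) / h), len(grid) - 2)
    w = (x - grid[i]) / h
    return (1 - w) * values[i] + w * values[i + 1]

def backward_dp(a, b, q, r, T, xs, us):
    """Tabulate the optimal cost on the grid xs by the recursion (2.117)."""
    J = [q * x * x for x in xs]            # J_[T,T] = phi
    policy = []
    for _ in range(T):                     # stages k = 1, ..., T
        J_new, u_new = [], []
        for x in xs:
            best_cost, best_u = float("inf"), None
            for u in us:                   # exhaustive search over controls
                x_next = a * x + b * u
                if x_next < xs[0] or x_next > xs[-1]:
                    continue               # skip controls that leave the grid
                cost = q * x * x + r * u * u + interp(xs, J, x_next)
                if cost < best_cost:
                    best_cost, best_u = cost, u
            J_new.append(best_cost)
            u_new.append(best_u)
        J = J_new
        policy.insert(0, u_new)            # policy[0] is the law at time 0
    return J, policy

xs = [-2 + 0.05 * i for i in range(81)]    # state grid on [-2, 2]
us = [-2 + 0.02 * i for i in range(201)]   # control grid on [-2, 2]
J0, policy = backward_dp(1.0, 1.0, 1.0, 1.0, 5, xs, us)
P0 = riccati_cost(1.0, 1.0, 1.0, 1.0, 5)
```

With one state variable the table has 81 entries; with $n$ state variables and the same resolution it would have $81^n$, which is the curse of dimensionality referred to above.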
2.6.2.3.1 Adaptive Critics

In view of the complexity of the problems and the computations they entail, it is not surprising that a variety of methods have been proposed, which include heuristic dynamic programming (HDP), dual heuristic programming (DHP), globalized dual heuristic programming (GDHP), and the so-called action-dependent programming (AP) variants of HDP and DHP. All of them involve the use of neural networks to approximate the value function, the decision law, and the system dynamics, so that the problem is reduced to one of parameter optimization. As in previous sections we shall try to address the basic ideas of the different methods. Following this, we shall briefly outline the differences between the various methods proposed and proceed to comment on convergence and stability questions they give rise to [72]. Unlike classical dynamic programming, we proceed forward in time but nevertheless determine the optimal control law by using the same recurrence Equation (2.117) where $J^*_{[T-k+1, T]}$ is replaced by an estimate of the optimal cost-to-go $\hat{J}^*_{[T-k+1, T]}$. So, at $k = T$ we solve:

$$J^*_{[0, T]} = \min_{u(0)} \left\{ L(x, u, 0) + \hat{J}^*_{[1, T]}[\hat{f}(x, u, 0)] \right\} \tag{2.118}$$

where $\hat{f}$ is the estimate of $f$ at time $T - k = 0$. At step 1 the procedure is repeated, that is,

$$J^*_{[1, T]} = \min_{u(1)} \left\{ L(x, u, 1) + \hat{J}^*_{[2, T]}[\hat{f}(x, u, 1)] \right\} \tag{2.119}$$

Again, $\hat{J}^*_{[2, T]}$ is used instead of $J^*_{[2, T]}$. The estimate of the optimal cost $J^*$ as a function of the state has been updated using the cost that was actually caused by the control $u(0)$ in the previous instant. Repeating this process at every stage $k = T, \ldots, 1$, the estimate of the optimal policy $\hat{u}^*(x)$, the estimate of the plant dynamics $\hat{f}(x, u, k)$ and the estimate of the optimal cost-to-go $\hat{J}^*_{[T-k, T]}$ are evaluated. The evaluation of all three functions at any stage $k$ based on $x(k)$ is carried out iteratively over $l$ cycles (so that $k$ denotes the time instant, while $l$ denotes the number of iterations at that time instant). It is claimed that this procedure will result in the convergence of $\hat{u}^*(x, k)$ to $u^*(k)$ and $\hat{J}^*$ to $J^*$.

Although the optimization procedure described above contains the essential ideas of the four methods mentioned earlier, they differ in the exact nature in which the various steps are executed (e.g., the nature of the functionals or their derivatives that they attempt to realize) as shown below.

Heuristic dynamic programming (HDP) is essentially the procedure outlined above, and represents conceptually the simplest form of design. It uses two neural networks to approximate the value function and the decision law.

In dual heuristic programming (DHP) neural networks are used to approximate the derivatives of the value function with respect to the state variables (used in the computation of the control law). Empirically, the resulting updating laws have been shown to converge more rapidly than HDP, although at the
cost of more complexity, as a vector function is approximated rather than a scalar. This also results in the relationship between the updatings of the value function and the control law becoming more complicated.

The globalized dual heuristic programming attempts to combine the fast convergence of DHP with the easier implementation of HDP.

Action-dependent (AD) methods modify the above three methods by using a value function $V(x, \alpha)$ in place of $V(x)$, where $\alpha$ is the control action. Once again empirically this has been shown to result in an improved convergence rate. The reader is referred to Reference [73] for details concerning all four methods.

Comment 20 (Convergence) The control law is determined as the one that minimizes the instantaneous cost $L(x, \hat{u}(x), T - k)$ in the value function $J^*_{[T-k, T]}$. At every stage, only an estimate $\hat{J}^*_{[T-k, T]}$ is available and hence the quality of the resulting control depends on how close the estimate is to the actual optimal trajectory. Hence, $\hat{u}(x)$ is suboptimal in general. But then, if it is applied in the next step of the recursive procedure, it does not really minimize the instantaneous cost $L(x, \hat{u}(x), T - k)$. Hence, $L$ does not contribute to the cost in the same way that the optimal cost $L^*$ would, and this, in turn, may distort the estimate of the value function. A word must be added regarding the logic behind the forward programming scheme. The improvement of $\hat{J}^*_{[T-k, T]}$ is of no use for determining the next control input. As expected, the procedure approximates the cost only in hindsight, that is, after the control has been applied. However, in the procedures proposed many iterations are carried out for the same time instant.

Comment 21 (Stability) Because $\hat{u}(x)$ applied to the system is suboptimal we have to assume that it does not destabilize the system.
Comment 22 (Stability) If $J^*$ is seen as the output of a system and $\hat{J}^*$ is its estimate, then clearly, by testing the system and comparing the output to the estimate, one gains information that can be used to adjust the estimate at the next instant of time. This is similar to the viewpoint adopted in adaptive control where the adjustment is performed on a suitably parameterized system model. Accepting this analogy temporarily, we recall that a series of questions had to be answered in order to prove stability of the adaptive process. These questions are concerned with the way in which the adjustment of parameters and the application of the control are to be interwoven so as to keep all the signals in the system bounded. A similar issue arises in the present case, since the system is controlled at the same time that an estimate of the cost to go is generated.
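The alternation between a decision-law update and a value-function update described for HDP can be made concrete with a deliberately small sketch. Everything specific in it is an assumption made for illustration: the plant is a known scalar linear system x(k+1) = a x(k) + b u(k), the stage cost is q x² + r u², and the two neural networks of the text are replaced by one-parameter approximators, a critic J_hat(x) = p x² and an actor u_hat(x) = −g x. The text, by contrast, deals with unknown nonlinear plants, so this shows only the structure of the recursion, not the methods of Reference [73].

```python
def hdp_scalar(a=1.2, b=1.0, q=1.0, r=1.0, sweeps=200):
    """Alternate actor and critic updates for a scalar linear-quadratic problem.

    All numerical values are hypothetical. After j sweeps, p * x**2 is the
    optimal cost-to-go over a horizon of j stages starting from state x.
    """
    p = 0.0  # critic parameter: J_hat(x) = p * x**2
    g = 0.0  # actor parameter:  u_hat(x) = -g * x
    for _ in range(sweeps):
        # Actor update: minimise q*x^2 + r*u^2 + p*(a*x + b*u)^2 over u.
        g = a * b * p / (r + b * b * p)
        # Critic update: cost incurred when the current actor is applied
        # for one stage and the old critic is used for the remaining stages.
        p = q + r * g * g + p * (a - b * g) ** 2
    return p, g
```

With the default values the open-loop plant is unstable (a = 1.2), yet the recursion settles at a fixed point of the scalar Riccati equation and the resulting closed-loop coefficient a − b g has magnitude less than one. When the critic is only an estimate obtained from data, as in the text, neither property is guaranteed, which is the concern raised in Comments 20 to 22.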
2.6.2.3.2 Conclusion

1. At the present time, we do not believe that the methods described in Section 2.6.2 can be used to stabilize an unknown nonlinear dynamical system online while optimizing a performance function over a finite interval. However, as in Section 2.4, stability may be arrived at provided that the uncertainty regarding the plant is sufficiently small and initial choices are sufficiently close to the optimal values. It should also be mentioned that a large body of empirical evidence exists which demonstrates the success of the method in practical applications.
2. To the authors’ knowledge the stability of even a linear adaptive system optimized over a finite interval has not been resolved thus far. As it is likely that conditions for the latter can be derived, we believe that it should be undertaken first so that we have a better appreciation of the difficulties encountered in the nonlinear case.
3. In problems discussed in Section 2.6.1 it was assumed that the dynamics of the plant were known. In light of the discussions in Section 2.6.2, it may be worth attempting the same problem (i.e., with plant uncertainty) using optimal control methods described in Section 2.6.1. By providing an alternative viewpoint it would complement much of the work that is currently in progress using dynamic programming methods.
4. Finally, the authors also believe that the multiple-model-based approach [41] (which is now used extensively in many different fields) may have much to offer to the problems discussed in this section. Although it is bound to be computationally intensive, the use of multiple trajectories at each stage would increase the probability of convergence. The reader is referred to References [41]–[43] where a switching scheme is proposed to achieve both stability and accuracy.
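To indicate what a switching scheme based on multiple models can look like, the following sketch runs several fixed candidate models of a scalar plant in parallel and, at every instant, hands control to the model with the smallest performance index. The plant x(k+1) = a x(k) + u(k), the candidate values, the set point 0.5, and the index (current squared identification error plus an exponentially discounted sum of past squared errors) are all assumptions chosen for illustration. The schemes actually proposed should be taken from References [41]–[43].

```python
def switch_models(a_true=0.9, candidates=(-0.5, 0.2, 0.85, 1.4),
                  steps=50, alpha=1.0, beta=1.0, forget=0.9):
    """Certainty-equivalence control with switching among fixed models.

    Hypothetical plant: x(k+1) = a_true * x(k) + u(k), with a_true unknown
    to the controller. Returns the index of the active model and the state.
    """
    x = 1.0
    past = [0.0] * len(candidates)   # discounted sums of past squared errors
    active = 0                       # start from an arbitrary model
    for _ in range(steps):
        # Control that would give x(k+1) = 0.5 if the active model were exact.
        u = -candidates[active] * x + 0.5
        x_next = a_true * x + u
        index = []
        for i, a_i in enumerate(candidates):
            e = x_next - (a_i * x + u)            # identification error of model i
            index.append(alpha * e * e + beta * past[i])
            past[i] = forget * (past[i] + e * e)  # discount for the next instant
        # Switch to the model with the smallest performance index.
        active = min(range(len(index)), key=index.__getitem__)
        x = x_next
    return active, x
```

With the default values, `switch_models()` settles on the candidate 0.85, the one closest to the true parameter, after the first step. Because all models are fixed, a residual error remains (the state converges to 0.5/0.95 rather than 0.5). Removing it is the role of the tuning that accompanies switching in the approach referred to above.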
2.7 Applications of Neural Networks to Control Problems
A computer search carried out by the first author about ten years ago revealed that 9,955 articles with the title “neural networks” were published in the engineering literature over a five-year period, of which over 8,000 dealt with problems related to function approximation. Of the remaining 1,500, approximately 350 were related to applications, which were concerned with theory, experiments in the laboratory, and primarily simulation studies. Only 14 of the roughly 10,000 articles dealt with real applications. The authors have once again searched the engineering literature and have concluded that matters are not very different today. For a comprehensive and systematic classification of neural network-based control applications they refer the reader to the paper by Agarwal [74]. They are also aware that a number of exciting applications of neural networks have existed in the industrial world during the entire period, and that many of them do not appear in
journal articles for proprietary reasons. In this section we present a few carefully chosen applications. These have been collected from books and technical articles, and from friends in industry through private communication. These fall into three distinct categories. The first consists of applications that are in one way or another related to the issues raised in Sections 2.3 and 2.4, and indicate the extent to which theory plays a role in design. Sufficient details are provided about these problems. In the second category are included those applications that emphasize the practical considerations that determine the choices that have to be made. The third category consists of novel applications where the emphasis is not on mathematical techniques but on the ingenuity and creativity of some of our colleagues.

2.7.1 Application 1: Controller in a Hard Disk Drive [75]

This concerns a high-performance servo controller for data acquisition in a hard disk drive using an electromechanical voice-coil motor (VCM) actuator. Such high-performance controllers are needed due to the rapid increase in data storage density.

2.7.1.1 Model

If $q$ is the position of the actuator tip and $\dot{q}$ is its velocity, the model of such a system is described by the equation:

$$M\ddot{q} + F(q, \dot{q}) = u \tag{2.120}$$

where $M$ is the system inertia and is unknown, and the smooth function $F(\cdot)$ is also unknown but bounded with a known bound $K_F$, that is, $|F| \le K_F$. Further, $(q, \dot{q}) \in S$ where $S$ is a compact set and is known.

2.7.1.2 Radial Basis Function Network

Since the domain over which $F$ needs to be approximated is compact and $F$ is bounded, the approximation can be achieved using a radial basis function network such that

$$F(q, \dot{q}) = \theta^T R(q, \dot{q}) + E_F \tag{2.121}$$

where $R$ is a vector of radial basis functions, $\theta$ is an unknown constant vector, and $|E_F| \le K_E$ is an error term. Since the state is bounded, the principal concern is accuracy.

2.7.1.3 Objective

If $\hat{M}$ and $\hat{\theta}$ are estimates of $M$ and $\theta$, the objective is to determine adaptive laws for updating the estimates and at every instant use the estimates to determine the control input to transfer the initial state to the desired final state.
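The approximation in Equation (2.121) can be illustrated with a small numerical sketch. The nonlinearity `f_true`, the 5 × 5 grid of Gaussian centers on S = [−1, 1] × [−1, 1], the common width, and the normalized LMS rule used to adjust θ are all assumptions made here for illustration; none of them is taken from Reference [75].

```python
import math

# Hypothetical design choices: 25 Gaussian units on a regular grid over S.
CENTERS = [(-1.0 + 0.5 * i, -1.0 + 0.5 * j) for i in range(5) for j in range(5)]
WIDTH = 0.5

def rbf(q, v):
    """Vector R(q, q_dot) of Gaussian radial basis functions."""
    return [math.exp(-((q - a) ** 2 + (v - b) ** 2) / (2.0 * WIDTH ** 2))
            for a, b in CENTERS]

def f_true(q, v):
    """Hypothetical smooth, bounded nonlinearity standing in for F."""
    return 0.5 * v + 0.3 * math.sin(3.0 * q)

def mse(theta, samples):
    """Mean squared approximation error of theta^T R over the samples."""
    errs = [f_true(q, v) - sum(t * r for t, r in zip(theta, rbf(q, v)))
            for q, v in samples]
    return sum(e * e for e in errs) / len(errs)

def fit(epochs=50, mu=0.5):
    """Adjust theta with a normalized LMS rule on a grid of samples in S."""
    grid = [-1.0 + 0.2 * i for i in range(11)]
    samples = [(q, v) for q in grid for v in grid]
    theta = [0.0] * len(CENTERS)
    for _ in range(epochs):
        for q, v in samples:
            R = rbf(q, v)
            err = f_true(q, v) - sum(t * r for t, r in zip(theta, R))
            norm = sum(r * r for r in R) + 1e-9
            theta = [t + mu * err * r / norm for t, r in zip(theta, R)]
    return theta, samples
```

Comparing `mse(theta, samples)` after `fit()` with the value for θ = 0 shows how much of F the network captures. Whatever error remains plays the role of $E_F$, and a bound on it over S is what the constant $K_E$ is meant to represent.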
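2.7.1.4 Adaptive Laws and Control Law

The following adaptive laws for determining the parameter estimates and control law for generating the input $u$ were used:

$$
\begin{aligned}
\dot{\hat{M}} &= -\gamma\, \ddot{q}_r\, r \\
\dot{\hat{\theta}} &= -\phi(q, \dot{q})\, r \\
u &= \hat{M} \ddot{q}_r + \hat{\theta}^T \phi(q, \dot{q}) - \Big( K_d \dot{r} + K r + K_i \int_0^t r \, d\tau \Big) - K_E \, \mathrm{sgn}(r)
\end{aligned}
\tag{2.122}
$$

where $\epsilon = q - q_d$, $\dot{q}_r = \dot{q}_d - \lambda \epsilon$, and $r = \dot{q} - \dot{q}_r = \dot{\epsilon} + \lambda \epsilon$, with $\lambda > 0$, are error signals.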
Comment 23 This is a well-defined problem for which neural networks can be designed, as the compact region in which the trajectories lie is known a priori, and all radial basis functions were carefully chosen to cover the operational range of the controller.
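The way the adaptive laws and the control law of Equation (2.122) interact can be examined in simulation. The sketch below is only that, and rests on several assumptions: the tracking error q − q_d is written `eps`; the plant nonlinearity is taken to be exactly θᵀφ with a hypothetical θ, so that the approximation error is zero and the term K_E sgn(r) is omitted; K_d is set to zero so that the derivative of r is not needed; φ is a set of nine Gaussian basis functions; and the reference trajectory, gains, initial estimates, and Euler step are arbitrary illustrative values.

```python
import math

# Hypothetical plant parameters (unknown to the controller).
M_TRUE = 1.0
CENTERS = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
THETA_TRUE = [0.3, -0.2, 0.1, 0.4, 0.0, -0.4, -0.1, 0.2, -0.3]
# Hypothetical design constants.
GAMMA, LAM, K, K_I = 2.0, 5.0, 20.0, 1.0

def phi(q, q_dot):
    """Gaussian basis functions centred on a 3 x 3 grid."""
    return [math.exp(-((q - a) ** 2 + (q_dot - b) ** 2)) for a, b in CENTERS]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def simulate(t_end=10.0, dt=1e-3):
    """Euler simulation of M*q_ddot + F = u; returns the final tracking error."""
    q, q_dot = 0.5, 0.0                        # initial tracking error is 0.5
    m_hat, theta_hat = 0.5, [0.0] * len(CENTERS)
    z, t = 0.0, 0.0                            # z is the running integral of r
    for _ in range(int(t_end / dt)):
        ref, ref_dot, ref_ddot = math.sin(t), math.cos(t), -math.sin(t)
        eps, eps_dot = q - ref, q_dot - ref_dot
        r = eps_dot + LAM * eps
        qr_ddot = ref_ddot - LAM * eps_dot
        p = phi(q, q_dot)
        # Control law of (2.122) with K_d = 0 and K_E = 0 (see assumptions).
        u = m_hat * qr_ddot + dot(theta_hat, p) - (K * r + K_I * z)
        q_ddot = (u - dot(THETA_TRUE, p)) / M_TRUE
        # Adaptive laws of (2.122), integrated with one Euler step.
        m_hat += dt * (-GAMMA * qr_ddot * r)
        theta_hat = [th - dt * pi * r for th, pi in zip(theta_hat, p)]
        z += dt * r
        q += dt * q_dot
        q_dot += dt * q_ddot
        t += dt
    return q - math.sin(t)
```

Calling `simulate()` returns the tracking error at the end of the run, which can be compared with the initial error of 0.5. With the simplifications above, the signs of the two adaptive laws are exactly those that cancel the parameter-error terms in the derivative of the quadratic function ½Mr² + ½(M̂ − M)²/γ + ½‖θ̂ − θ‖² + ½K_i z², leaving −Kr².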
2.7.2 Application 2: Lean Combustion in Spark Ignition Engines [76]

This problem, discussed by He and Jagannathan, is a very interesting application of neural networks in control. At the same time it also raises theoretical questions related to those discussed in Section 2.4. It deals with spark ignition (SI) engines at extreme lean conditions. The control of engines at lean operating conditions is desirable to reduce emissions and to improve fuel efficiency. However, the engine exhibits strong cyclic variations in heat release, which is undesirable from the point of view of stability. The objective of the design is consequently to reduce cyclic variations in heat release at lean engine operation. The system to be controlled is described by the equations:

$$
\begin{aligned}
x_1(k+1) &= f_1(x_1(k), x_2(k)) + g_1(x_1(k), x_2(k))\, x_2(k) + d_1(k) \\
x_2(k+1) &= f_2(x_1(k), x_2(k)) + g_2(x_1(k), x_2(k))\, u(k) + d_2(k)
\end{aligned}
\tag{2.123}
$$

where $x_2(k)$ is the mass of fuel before the $k$th burn, $x_1(k)$ is the mass of air before the $k$th burn, and $u$, the control variable, is the change of mass of fuel per cycle. $f_1$, $f_2$, $g_1$, and $g_2$ are smooth unknown functions of their arguments and $g_i(k)$ are known to lie in the intervals $[0, g_{mi}]$, $i = 1, 2$. $d_1(\cdot)$ and $d_2(\cdot)$ are external disturbances.

2.7.2.1 Objective

The objective is to maintain $x_1(k)$ at a constant value (Problem 2) and reduce the variations in the ratio $x_2(k)/x_1(k)$ over a cycle, by using $u$ as the control variable.
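In Equation (2.123) the state x2 enters the first equation in the same way that u enters the second, which is what allows x2 to be treated as an intermediate (virtual) input. The sketch below illustrates this two-stage structure under strong simplifying assumptions that are not part of the application: the functions f1, g1, f2, and g2 are known and given hypothetical forms (f1 depends only on x1, g1 and g2 are constant and nonzero), and the disturbances are zero. No neural networks or adaptation are involved.

```python
# Hypothetical, known system functions (in the application they are unknown).
def f1(x1, x2): return 0.5 * x1
def g1(x1, x2): return 1.0
def f2(x1, x2): return 0.2 * x2
def g2(x1, x2): return 1.0

def step(x1, x2, u):
    """One step of the system (2.123) with d1 = d2 = 0."""
    return f1(x1, x2) + g1(x1, x2) * x2, f2(x1, x2) + g2(x1, x2) * u

def control(x1, x2, x1_target):
    """Two-stage choice of u(k) that treats x2 as a virtual input."""
    # x1(k+1) is already determined at time k because d1 = 0.
    x1_next = f1(x1, x2) + g1(x1, x2) * x2
    # Stage 1: the value x2(k+1) should take so that x1(k+2) = x1_target.
    # (f1 and g1 ignore their second argument in this sketch.)
    x2_desired = (x1_target - f1(x1_next, 0.0)) / g1(x1_next, 0.0)
    # Stage 2: the input u(k) that produces x2(k+1) = x2_desired.
    return (x2_desired - f2(x1, x2)) / g2(x1, x2)

def run(x1=2.0, x2=-1.0, x1_target=1.0, steps=3):
    for _ in range(steps):
        u = control(x1, x2, x1_target)
        x1, x2 = step(x1, x2, u)
    return x1, x2
```

In this idealized setting x1 reaches the target after two steps and stays there. When the functions are unknown, may vanish (the intervals $[0, g_{mi}]$ include zero), and depend on both states, none of these steps can be carried out exactly, which is where the approximations used in the application enter.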
2.7.2.2 Method

The authors use a nonlinear version of backstepping (which we have discussed in Section 2.4) and use $x_2(k)$ as the virtual control input and $u(k)$ as the control input. The former requires the adjustment of the weight vector $w_1$ of a neural network, and the latter that of the weight vector $w_2$ of a second network. The adaptive laws are

$$
\begin{aligned}
w_1(k+1) &= w_1(k) - \alpha_1\, \phi(z_1(k)) \left[ w_1^T(k)\, \phi(z_1(k)) + k_1 e_1(k) \right] \\
w_2(k+1) &= w_2(k) - \frac{\alpha_2}{k_2}\, \sigma(x_2(k)) \left[ z_1^T(k)\, \sigma(z_2(k)) + k_2 e_2(k) \right]
\end{aligned}
\tag{2.124}
$$

where $z_1(k) = [x_1(k) \;\; x_2(k) \;\; x_{1d}]^T$ and $z_2(k) = [x_1(k) \;\; x_2(k) \;\; w_1(k)]^T$. The authors claim that the objectives set forth earlier are achieved and that the performance is highly satisfactory.

Comment 24 Although this is a very ingenious application, we do not agree with the theoretical arguments used to justify the structure of the controllers. Significantly more information about the manner in which the basis functions were chosen, and the conditions under which the neural networks were trained, need to be known before questions of theoretical correctness can be argued.

2.7.3 Application 3: MIMO Furnace Control [77]

An application of neural networks for temperature control was developed by Omron Inc. in Japan over ten years ago. It is included here because it exemplifies the set-point regulation problem of a nonlinear system posed in Section 2.3 and discussed in Section 2.4. It deals with an MIMO temperature control system. The range of the temperatures and the accuracy of the set-point control affect the final products in an industrial process. The objective is to regulate temperature in three channels by bringing up the set point during startup as quickly as possible while avoiding overshoots. The system is open-loop stable so that neural networks can be trained to obtain both forward and inverse models of the three channels of interest. The identification models, in turn, are used to train the controllers. A comparison of the proposed scheme with a self-tuning controller and a proportional–integral–derivative (PID) controller demonstrated that neurocontrollers are considerably more robust than the other two, and can also cope with changes in the dynamics of the plant.
2.7.4 Application 4: Fed-Batch Fermentation Processes [78]

Biofermentation processes, in which microorganisms grown on a suitable substrate synthesize a desired substance, are used widely to produce a large number of useful products. The application of neural network-based controllers discussed by Bošković and Narendra in 1995 for the control of fed-batch fermentation processes [78] is an appropriate one for examination in the present
context. It reveals clearly that the efficacy of a control strategy in a practical application depends upon a number of factors, which include the prior information needed to implement the controller, the difficulty encountered in choosing design parameters, stability and robustness issues, the effect of measurement noise, and the computational aspects involved in the implementation of the control algorithm. The paper deals with the control of a distinctly nonlinear system whose dynamics are not known precisely, whose initial conditions and parameters can vary, and whose inputs have saturation constraints. The system is open-loop stable and all five of its state variables are accessible. The two inputs of the system $u_1(\cdot)$ and $u_2(\cdot)$ are to be determined to maximize the production of microorganisms [i.e., $x_1(k)$] in the interval $[0, T]$, while assuring that one of the state variables $x_4(k)$ (ethanol) is small. On the basis of the results presented in the paper it became clear that the method to be used for controlling a fermentation process would depend upon several factors, including the extent to which parameter values and initial conditions may vary, the error the designer would be willing to tolerate, and prior knowledge concerning nonlinearities. It was concluded that linear adaptive controllers and neurocontrollers were the only two viable alternatives. Even among these, if accuracy and robustness are critical issues, neural network-based control is distinctly preferable.

2.7.5 Application 5: Automotive Control Systems

The research team at Ford Research Laboratory, under the leadership of Feldkamp, has been investigating for about fifteen years the efficacy of neural network techniques for addressing different problems that arise in automotive systems.
The experiments that they have carried out under carefully controlled conditions, the meticulous manner in which they have examined various issues, as well as their candid comments concerning the outcomes, have had a great impact on the confidence of the neural network community in general, and the authors in particular, regarding the practical applicability of neural networks. The reader is referred to References [13], [79]–[82], which are papers presented by Feldkamp and coworkers at the Yale Workshops during the period 1994 to 2005. Idle speed control implies holding the engine speed at or near an externally supplied target speed in the presence of disturbances. The latter include load from the air conditioning system, load from electrical accessories such as windows, and load from power steering. Some of these loads may be large enough to stall a poorly controlled engine. The controls to the system are throttle position and spark advance. As these have different dynamics and control authority, an interesting aspect of the control problem is in coordinating the control actions effectively. In Reference [83] an attempt was made to develop a recurrent neural network idle speed controller for a four-cylinder engine. An identification
stage preceded the design of a controller that was trained online. Training was carried out with no disturbance, a single disturbance, and combinations of disturbances, during which engine speed measurements were compared to the target speed, and an extended Kalman filter rather than simple gradient updates was used. The latter involved truncated backpropagation through time. A variation of the above procedure was to identify the system using a recurrent network and use it off-line to train a controller. Over the 1990s, the same approach was used to obtain nonlinear models for active suspension and antilock braking. In all cases the best estimates of noise, parameter uncertainty, measurement error, and actuator delays were used. Of current interest to the group is a setting in which a nominal model is given and the objective is to develop a controller with an adjustable tradeoff between performance and robustness. Feldkamp has been arguing strongly for a long time for the use of recurrent networks as controllers. Training such networks off-line eliminates stability issues provided an initial set of values can be found which will eventually yield a satisfactory reduction in the chosen cost function. Based on this, the first author is starting a program at Yale to study the theoretical properties of recurrent neural networks.

2.7.6 Application 6: Biped Walking Robot [84]

An exciting, ambitious, and very complex application, whose development over many years was followed by the first author, was the biped walking robot designed, implemented, and tested by Kun and Miller at the University of New Hampshire. It encompassed almost all the difficulties enumerated in Section 2.4, including unstable nonlinear dynamics, time delays in the control loops, nonlinear kinematics that are difficult to model accurately, and noisy sensors. In spite of these difficulties walking control strategies were tested and extended in studies over generations of bipeds.
From the point of view of this paper it brings to focus some of the theoretical questions concerning stability raised in Section 2.4. Dynamic walking promises higher speeds, improved walking structures, and greater efficiency. However, this also implies exploring new regions in the state space to learn better strategies, which automatically brings in its wake stability questions. Three neural networks controlled front-to-back and side-to-side balance, and good foot contact. The first network controlled instantaneous front/back position of the hips relative to the feet, the second network predicted the correct amplitude and velocity of side-to-side lean during each step. The third network was used to learn kinematically consistent postures. All three networks have to operate in new regions to learn the proper strategies, and in all cases instability is a distinct possibility. Frequent human support was needed to keep the biped from falling during the learning process when it was learning to perform satisfactorily in unfamiliar regions.
2.7.7 Application 7: Real-Time Predictive Control in the Manufacturing Industry

The following is a recent application from industry, but for proprietary reasons details concerning the process controlled are not provided. In a manufacturing plant, the sensors used to measure the relevant outputs do not report actual readings frequently enough for effective feedback control. Further, the plant has unmodeled dynamics which consist of several parts that are very hard to model from first principles. Hence, the system presented the right opportunity to develop a dynamic model using the neural network methods described in Section 2.4. The neural network model was used as a virtual sensor, and based on the estimate of the state provided by the model, an intelligent set-point generator was developed to determine set points for two control variables. A 32% improvement in performance of the overall system was achieved.

2.7.8 Application 8: Multiple Models: Switching and Tuning

The multiple-model switching and tuning method proposed in Reference [41] is currently widely used in very different areas. Brief descriptions of two applications are given below:

1. Anomaly detection in finance. The stock markets in the United States employ the industry’s most sophisticated real-time surveillance systems to ensure investor protection and a fair and competitive trading environment. They have designed and developed unusual methods for detecting universal real-time market activity. The “switch and tune” approach has been used to develop piecewise models of various regions in the underlying operating space. New regions are explored and flagged by the use of anomaly detection methods using neural networks, while well-traversed spaces are learned using a local classifier which classifies activity as an anomaly or not.
2. Reconfigurable control of space structural platforms.
In a broad spectrum of aerospace applications, achieving acceptable performance over an extended operating range may be difficult due to a variety of factors such as high dimensionality, multiple inputs and outputs, complex performance criterion, and operational constraints. The multiple-model-based “switching and tuning” philosophy described in Section 2.4 is ideally suited for such problems. One example of such a system is described in Reference [85]. The system considered is a flexible structure with nonlinear and time-varying dynamics. An adaptive radial basis function network is used to identify most of the spatiotemporal interaction among the structure members. A fault diagnosis system provides neurocontrollers with various failure scenarios, and an associative memory compensates for catastrophic changes of structural parameters by providing
a continuous solution space of acceptable control configurations. As stated in Section 2.4, the latter are designed a priori.

2.7.9 Application 9: Biological Control Structures for Engineering Systems

In the introduction it was stated that control theorists were attracted to the new field of neurocontrol in the 1980s, inspired by the ability of biological systems to interact with uncertain complex environments. Thus far, most of the systems that we have discussed are engineering systems. The following interesting application by Doyle et al. [86] is a notable exception. In it, the authors attempt reverse engineering by using biological structures for applications in control systems. The baroreceptor vagal reflex is responsible for short-term blood pressure control. The authors make a very good case that it provides an excellent biological paradigm for the development of control strategies for multiple-input single-output (MISO) processes. The components of the system have well-defined control analogs. The central nervous system is the “controller.” The transducers in the major blood vessels (baroreceptors) are the “sensors,” and the sympathetic and vagal postganglionic motor neurons in the heart and vessels are “actuators.” Demand (eating, exercise), external inputs (cold weather), emotional state (joy, anger), and anticipated action (postural adjustment) correspond to “time-varying environments.” Maintaining blood pressure around a set point dictated by cardiovascular demands is the objective. The control system performs a variety of tasks which includes integration of multiple inputs, noise filtering, compensation for nonlinear features of cardiovascular function, and generation of a robust control input. The primary objective of the authors is to understand the above system and then to mimic the functions for process control application.
The MISO control architecture employed in the baroreceptor reflex consists of two parallel controllers in the central nervous system: the sympathetic and the parasympathetic systems. Whereas the response of the latter is relatively fast (2–4.5 sec), the response of the former is slow (10–80 sec). However, the faster control is “expensive,” whereas the slower control is acceptable. The brain coordinates the use of the two controllers to provide effective blood pressure control while minimizing the long-term cost of the control actions. This is one of many functions discussed by the authors which is used for reverse engineering. We consider this a truly noteworthy application.

2.7.10 Application 10: Preconscious Attention Control

An application that is distinctly different from those discussed thus far, which combines engineering and biology, and which was brought to the attention of the first author, concerns preconscious attention control. Automobile collision avoidance is an example of a problem where improved recognition results in reduced reaction time, but where visual distraction decreases overall performance. In this application, the authors [87] seek to aid an operator in recognizing an object requiring attention by first presenting an object image
to the preconscious region of visual awareness (the region in which neural activity registers without the user’s awareness). Preconscious exposure results in subsequent processing of identical images to require less neural activity in identification (known as visual priming). Visual priming results from the plasticity of the brain and is known to facilitate recognition, as recognition becomes a function of memory as well as awareness. Because each individual is assumed to have unique visual sensitivities that evolve with all prior experience, a neural network is used to derive an operator-specific sensitivity to images presented outside of awareness. From an engineering point of view, the interesting feature of the indirect control procedure is that the output of the reference model is a function of the state of the plant. The effect of visual priming is found by comparing an operator’s reaction time with that of a non-primed operator. The observed reduction in reaction time is directly proportional to the reduction in neural activity. For further details concerning this application the reader is referred to Reference [87].
2.8 Comments and Conclusions
The chapter examines the current status of neurocontrol and the methods available for infinite-time and finite-time adaptive control of nonlinear systems. The theoretical basis for their design, the determination of the existence of appropriate maps, and the realization of identifiers and controllers when such maps exist are discussed. The emphasis of the chapter is on the simple ideas that underlie the different methods. Spread throughout the chapter are comments to relate the procedures proposed to well-established concepts in adaptive control. Section 2.2 presents many of the basic results from control theory and adaptive control theory for easy reference, and concludes with a statement of problems for investigation in the following sections. As neural networks are the principal components used to cope with nonlinearities, their structures and the adaptive laws used to train them are discussed in Section 2.3. Section 2.4 deals with the methods that have been proposed to identify and control nonlinear dynamic systems. The authors believe that the method based on linearization is one that is general and can be rigorously justified. It is valid in a neighborhood of the equilibrium state where the linear terms dominate the nonlinear components. In such cases, using the inverse function theorem and the implicit function theorem all the problems investigated in the past in adaptive control theory can be revisited after including nonlinear terms in the description of the plant. Stability can be assured using linear methods and fast adaptation is possible. Slow adjustment of the parameters of the neural networks can be used, without affecting stability, to achieve greater accuracy. Extensive simulation results have shown that need not be small.
Although the emphasis in Section 2.4 is on linearization-based methods, several others proposed by different authors are also discussed. The assumptions made by them concerning the dynamics of the plant are critically examined. In particular, the authors believe that the assumptions regarding the existence of basis functions need considerably more justification before they can be accepted as being theoretically rigorous. They also believe that a vast number of contributions made in recent years extending the well-known backstepping procedure to general nonlinear systems cannot be theoretically justified. An introduction to nonlinear control concepts is included in Section 2.5. These become relevant when the nonlinear terms dominate the linear terms in regions of the state space far from equilibrium, and will be needed when attempts are made in the future to control dynamic systems in those regions. Although these concepts have not become part of the mainstream thinking in the neural network community, the authors believe that the latter will take to them enthusiastically in the future, when their power and scope become evident as experience with distinctly nonlinear phenomena increases. Finite time optimization and optimal control is the topic of Section 2.6. Optimal control theory and dynamic programming are the principal mathematical tools used here. In Section 2.6.1, the dynamics of the plant are assumed to be known and optimal control theory is used to design feedback controllers. In Section 2.6.2, the dynamics of the plant are assumed to be unknown, and approximate dynamic programming methods are proposed. The same questions that arise in nonlinear adaptive control also arise in this case, and the authors are of the opinion that the arguments used can be rigorously justified only when the initial trajectories are in the neighborhoods of the optimal trajectories. The chapter concludes with a section on applications. 
Some of the applications are based on the analytical principles developed in the chapter, whereas others have only a tenuous relation to them. However, most of them are both novel and creative and, like many other similar applications in the history of control, call for theoretical explanations and thereby catalyze research. Nonlinear control using neural networks is still in its infancy, but there is little doubt that great opportunities abound both in theory and in practice.
Acknowledgments

The first author would like to thank Kannan Parthasarathy, Asriel Levine, Snehasis Mukhopadhyay, Sai-Ming Li, João Cabrera, Lingji Chen, and Osvaldo Driollet, who were his former graduate students, and Jovan Bošković, his former postdoctoral fellow, all of whom collaborated with him on different aspects of the research reported here. In particular, he would like to thank Snehasis, João, and Lingji for many insightful discussions in recent years. He would also like to acknowledge the help received from Lee Feldkamp, Santosh Ananthram, and Alex Parlos concerning recent applications, and
the generous support from the National Science Foundation (through Paul Werbos) over a period of fifteen years when much of the work on linearization was done. Finally, the authors would like to thank the editors Andreas Pitsillides and Petros Ioannou for their invitation to contribute to this book and for their patience and support.
References

1. W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” Bulletin of Mathematical Biophysics, vol. 5, pp. 115–133, 1943.
2. N. Wiener, Cybernetics. Cambridge, MA: MIT Press, 1948.
3. D. O. Hebb, Organization of Behavior: A Neuropsychological Theory. New York: Wiley, 1949.
4. F. Rosenblatt, The Perceptron: A Perceiving and Recognizing Automaton, Technical Report 85-460-1. Buffalo, NY: Cornell Aeronautical Laboratory, 1957.
5. B. Widrow, “Generalization and information storage in networks of adaline ‘neurons’,” Self-Organizing Systems, M. Yovitz, G. Jacobi, and G. Goldstein, Editors. Washington, D.C.: Spartan Books, pp. 435–461, 1962.
6. M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press, 1969.
7. P. J. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. thesis, Cambridge, MA: Harvard University, 1974.
8. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D. E. Rumelhart and J. L. McClelland, Editors, vol. 1, pp. 318–362. Cambridge, MA: MIT Press, 1986.
9. G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals, and Systems, vol. 2, pp. 303–314, 1989.
10. K. Funahashi, “On the approximate realization of continuous mappings by neural networks,” Neural Networks, vol. 2, pp. 183–192, 1989.
11. K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, pp. 359–366, 1989.
12. K. S. Narendra and K. Parthasarathy, “Identification and control of dynamical systems using neural networks,” IEEE Transactions on Neural Networks, vol. 1, pp. 4–27, 1990.
13. L. A. Feldkamp, G. V. Puskorius, L. I. Davis, and F. Yuan, “Enabling concepts for applications of neurocontrol,” in Proceedings of the Eighth Yale Workshop (New Haven, CT), pp. 168–173, 1994.
14. W. J. Rugh, Linear System Theory. Englewood Cliffs, NJ: Prentice-Hall, 1995.
15. K. S. Narendra and A. M. Annaswamy, Stable Adaptive Systems. Englewood Cliffs, NJ: Prentice-Hall, 1989 (Dover, 2005).
16. M. Krstic, I. Kanellakopoulos, and P. Kokotovic, Nonlinear and Adaptive Control Design. New York: John Wiley & Sons, 1995.
17. D. Seto, A. M. Annaswamy, and J. Baillieul, “Adaptive control of nonlinear systems with triangular structure,” IEEE Transactions on Automatic Control, vol. 39, pp. 1411–1428, 1994.
P1: Binaya Dash November 16, 2007
16:58
7985
7985˙C002
Control of Complex Systems Using Neural Networks
95
18. R. J. Williams, “Adaptive state representation and estimation using recurrent connectionist networks,” Neural Networks for Control, W. T. Miller, R. S. Sutton, and P. J. Werbos, Editors. Cambridge: MIT Press/Bradford Books, 1990. 19. A. U. Levin and K. S. Narendra, “Control of nonlinear dynamical systems using neural networks: Observability, identification, and control,” IEEE Transactions on Neural Networks, vol. 7, pp. 30–42, 1996. 20. I. J. Leontaritis and S. A. Billings, “Input-output parametric models for nonlinear systems. I: Deterministic non-linear systems,” International Journal of Control, vol. 41, pp. 303–328, 1985. 21. K. S. Narendra and J. H. Taylor, Frequency Domain Criteria for Absolute Stability. New York: Academic Press, 1972. 22. P. C. Parks, “Lyapunov redesign of model reference adaptive control systems,” IEEE Transactions on Automatic Control, vol. 11, pp. 362–367, 1966. 23. K. Funahashi and Y. Nakamura, “Approximation of dynamical systems by continuous time recurrent neural networks,” Neural Networks, vol. 6, pp. 801–806, 1993. 24. B. A. Pearlmutter, “Gradient calculations for dynamic recurrent neural networks: A survey,” IEEE Transactions on Neural Networks, vol. 6, pp. 1212–1228, 1993. 25. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by backpropagating errors,” Nature, vol. 6, pp. 533–536, 1986. 26. P. J. Werbos, “Backpropagation through time: What it does and how to do it,” Proceedings of the IEEE, vol. 78, pp. 1550–1560, 1990. 27. K. S. Narendra and K. Parthasarathy, “Gradient methods for the optimization of dynamical systems containing neural networks,” IEEE Transactions on Neural Networks, vol. 2, pp. 252–262, 1991. 28. R. J. Williams and D. Zipser, “A learning algorithm for continually running fully recurrent neural networks,” Neural Computation, vol. 1, pp. 270–280, 1989. 29. L. E. McBride and K. S. 
Narendra, “Optimization of time-varying systems,” IEEE Transactions of the Professional Group on Automatic Control, 1965. 30. A. U. Levin, Neural Networks in Dynamical Systems. Doctoral Dissertation: New Haven, CT: Yale University, 1992. 31. S. Mukhopadhyay, Synthesis of Nonlinear Control Systems Using Neural Networks. Doctoral Dissertation: New Haven, CT: Yale University, 1994. 32. J. B. D. Cabrera and K. S. Narendra, “Issues in the application of neural networks for tracking based on inverse control,” IEEE Transactions on Automatic Control, vol. 44(11), pp. 2007–2027, 1999. 33. S.-M. Li, Pattern Recognition and Control. Doctoral Dissertation: New Haven, CT: Yale University, 1999. 34. L. Chen, Nonlinear Adaptive Control of Discrete-time Systems Using Neural Networks and Multiple Models. Doctoral Dissertation: New Haven, CT: Yale University, 2001. 35. M. J. Feiler, Adaptive Control in the Presence of Disturbances. Doctoral Dissertation: Techinical University of Munich, 2004. 36. L. Chen and K. S. Narendra, “Identification and control of a nonlinear discretetime system based on its linearization: A unified framework,” IEEE Transactions on Neural Networks, vol. 15(3), pp. 663–673, 2004. 37. L. Chen and K. S. Narendra, “Nonlinear adaptive control using neural networks and multiple models,” Automatica, vol. 37, pp. 1245–1255, 2001. 38. P. Ioannou and J. Sun, Robust Adaptive Control. Upper Saddle River, NJ: PrenticeHall, 1996.
P1: Binaya Dash November 16, 2007
96
16:58
7985
7985˙C002
Modeling and Control of Complex Systems
39. K. S. Tsakalis and P. A. Ioannou, Linear Time Varying Systems: Control and Adaptation. Upper Saddle River, NJ: Prentice-Hall, 1993. 40. S. Mukhopadhyay and K. S. Narendra, “Disturbance rejection in nonlinear systems using neural networks,” IEEE Transactions on Neural Networks, vol. 4, pp. 63–72, 1993. 41. K. S. Narendra and J. Balakrishnan, “Adaptive control using multiple models,” IEEE Transactions on Automatic Control, vol. 42, pp. 171–187, 1997. 42. K. S. Narendra, O. A. Driollet, M. Feiler, and K. George, “Adaptive control using multiple models, switching and tuning,” International Journal of Adaptive Control and Signal Processing, vol. 17, pp. 87–102, 2003. 43. K. S. Narendra and O. A. Driollet, “Stochastic adaptive control using multiple models for improved performance in the presence of random disturbances,” International Journal of Adaptive Control and Signal Processing, vol. 15, pp. 287– 317, 2001. 44. K. S. Narendra and S. Mukhopadhyay, “Adaptive control of nonlinear multivariable systems using neural networks,” Neural Networks, vol. 41, pp. 737–752, 1994. 45. P. L. Falb and W. A. Wolovich, “Decoupling in the design and synthesis of multivariable control systems,” IEEE Transactions on Automatic Control, vol. 12, pp. 651–659, 1967. 46. K. S. Narendra and N. O. Oleng, “Exact output tracking in decentralized adaptive control systems,” IEEE Transactions on Automatic Control, vol. 47, pp. 390–395, 2002. 47. B. M. Mirkin, “A new decentralized model reference adaptive control scheme for large scale systems,” in Proceedings of the 4th IFAC Int. Symp. Adaptive Systems Control Signal Processing (Grenoble, France), 1992. 48. J. A. Suykens, J. P. Vandewalle, and B. L. DeMoor, Artificial Neural Networks for Modeling and Control of Non-Linear Systems. New York: Springer-Verlag, 1995. 49. A. S. Poznyak, E. N. Sanchez, and W. Yu, Differential Neural Networks for Robust Nonlinear Control: Indentification, State Estimation and Trajectory Tracking. 
Singapore: World Scientific, 2001. 50. F.-C. Chen and C.-C. Liu, “Adaptively controlling nonlinear continuous-time systems using multilayer neural networks,” IEEE Transactions on Automatic Control, vol. 39, pp. 1306–1310, 1994. 51. M. M. Polycarpou, “Stable adaptive neural control scheme for nonlinear systems,” IEEE Transactions on Automatic Control, vol. 41, pp. 447–451, 1996. 52. G. A. Rovithakis, “Robustifying nonlinear systems using high order neural network controllers,” IEEE Transactions on Automatic Control, vol. 44, pp. 104–107, 1999. 53. G. A. Rovithakis, “Robust redesign of a neural network controller in the presence of unmodeled dynamics,” IEEE Transactions on Neural Networks, vol. 15, pp. 1482– 1490, 2004. 54. C. Kwan and F. L. Lewis, “Robust backstepping control of nonlinear systems using neural networks,” IEEE Transactions of the Systems, Man, and Cybernetics Society, A, vol. 30, pp. 753–765, 2000. 55. S. S. Ge, T. H. Lee, and C. J. Harris, Adaptive Neural Network Control of Robotic Manipulators. London: World Scientific, 1998. 56. S. S. Ge, C. C. Hang, T. H. Lee, and T. Zhang, Stable Adaptive Neural Network Control. Norwell, MA: Kluwer Academic, 2001.
P1: Binaya Dash November 16, 2007
16:58
7985
7985˙C002
Control of Complex Systems Using Neural Networks
97
57. F. L. Lewis, J. Campos, and R. Selmic, Neuro-Fuzzy Control of Industrial Systems with Actuator Nonlinearities. Philadelphia: Society of Industrial and Applied Mathematics Press, 2002. 58. S. S. Ge and C. Wang, “Adaptive NN control of uncertain nonlinear purefeedback systems,” Automatica, vol. 38, pp. 671–682, 2002. 59. Y. Li, S. Qiang, X. Zhuang, and O. Kaynak, “Robust and adaptive backstepping control for nonlinear systems using rbf neural networks,” IEEE Transactions on Neural Networks, vol. 15, pp. 693–701, 2004. 60. D. Wang and J. Huang, “Neural network-based adaptive dynamic surface control for a class of uncertain nonlinear systems in strict-feedback form,” IEEE Transactions on Neural Networks, vol. 16, pp. 195–202, 2005. 61. K. S. Narendra, Y.-H. Lin, and L. Valavani, “Stable adaptive controller design. II: Proof of stability,” IEEE Transactions on Automatic Control, vol. 25, pp. 440–448, 1980. 62. R. W. Brockett, “System theory on group manifolds and coset spaces,” SIAM Journal on Control, vol. 10, pp. 265–284, 1972. 63. W. M. Boothby, An Introduction to Differentiable Manifolds and Riemannian Geometry. New York: Academic Press, 1975. 64. E. D. Sontag, “Stability and stabilization disturbances: Discontinuities and the effect of disturbances,” Nonlinear Analysis, Differential Equation, and Control, F. H. Clarke and R. J. Stern, Editors. Dordrecht, NL: Kluwer Academic Publishers, 1999. 65. S. Nikitin, Global Controllability and Stabilization of Nonlinear Systems. London: World Scientific, 1994. 66. R. W. Brockett, “Asymptotic stability and feedback stabilization,” Differential Geometric Control Theory, R. W. Brockett, R. S. Millman and H. J. Sussmann, Editors. Boston Birkhauser, pp. 181–191, 1983. 67. A. Isidori, Nonlinear Control Systems. London: Springer, 1995. 68. H. J. Sussmann, “Orbits of families of vector fields and integrability of distribution,” AMS Transactions, vol. 180, pp. 171–188, 1973. 69. D. 
Aeyels, “Generic observability in differentiable systems,” SIAM Journal on Control and Optimization, vol. 19, pp. 596–603, 1981. 70. M. Nerurkar, “Observability and topological dynamics,” Journal of Dynamics and Differential Equations, vol. 3, pp. 273–287, 1991. 71. K. S. Narendra and S. J. Brown, Neural Networks in Optimal Control: Part I, Technical Report 9703. New Haven, CT: Yale University, Center for Systems Science, 1997. 72. P. J. Werbos, “Approximate dynamic programming for real-time control and neural modeling,” Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White and D. A. Sofge, Editors. New York: Van Nostrand Reinhold, 1992. 73. S. Ferrari and R. F. Stengel, “Model-based adaptive critic designs,” Learning and Approximate Dynamic Programming: Scaling Up to the Real World, J. Si, A. Barto, W. Powell, and D. Wunsch, Editors. New York: IEEE Press and John Wiley & Sons, pp. 65–92, 2004. 74. M. Agarwal, “A systematic classification of neural-network based control,” IEEE Control Systems Magazine, vol. 17, pp. 75–93, 1997. 75. G. Herrmann, S. S. Ge, and G. Guo, “Practical implementation of a neural network controller in a hard disk drive,” IEEE Transactions on Control Systems Technology, vol. 13, pp. 146–154, 2005.
P1: Binaya Dash November 16, 2007
98
16:58
7985
7985˙C002
Modeling and Control of Complex Systems
76. P. He and S. Jagannathan, “Neuro-controller for reducing cyclic variation in lean combustion spark ignition,” Automatica, vol. 41, pp. 1133–1142, 2005. 77. M. Khalid, S. Omatu, and R. Yusof, “MIMO furnace control with neural networks,” IEEE Transactions on Control Systems Technology, vol. 1, pp. 238–245, 1993. 78. J. D. Boskovi´ ˇ c and K. S. Narendra, “Comparison of linear, nonlinear and neuralnetwork-based adaptive controllers for a class of fed-batch fermentation processes,” Automatica, vol. 31, pp. 814–840, 1995. 79. L. A. Feldkamp, G. Puskorius, K. A. Marko, J. V. James, T. M. Feldkamp, and G. Jesion, “Unravelling dynamics with recurrent networks: Application to engine diagnostics,” in Proceedings of the Ninth Yale Workshop (New Haven, CT), pp. 59–64, 1996. 80. D. Prokhorov and L. A. Feldkamp, “Bayesian regularization in extended Kalman filter training of neural networks,” in Proceedings of the Tenth Yale Workshop (New Haven, CT), pp. 77–84, 1998. 81. L. A. Feldkamp, D. Prokhorov, and T. M. Feldkamp, “Conditioned adaptive behavior from a fixed neural network,” in Proceedings of the Eleventh Yale Workshop (New Haven, CT), pp. 78–85, 2001. 82. L. A. Feldkamp and D. Prokhorov, “Recurrent neural networks for state estimation,” in Proceedings of the Twelfth Yale Workshop (New Haven, CT), pp. 17–22, 2003. 83. L. A. Feldkamp and G. V. Puskorius, “A signal processing framework based on dynamic neural networks with application to problems in adaptation, filtering, and classification,” Proceedings of the IEEE, vol. 12, pp. 651–659, 1998. 84. A. Kun and T. Miller, “Adaptive dynamic balance of a biped robot using neural networks,” in Proceedings of the 1996 IEEE International Conference on Robotics and Automation (Minneapolis, MN), 1996. 85. G. G. Yen, “Reconfigurable neural control in precision space structural platforms,” Neural Networks for Control, O. Omidvar and D.L. Elliott, Editors. New York: Academic Press, pp. 289–316, 1997. 86. F. J. Doyle, M. A. Henson, B. 
A. Ogunnaike, J. S. Schwaber, and I. Rybak, “Neuronal modeling of the baroreceptor reflex with applications in process modeling and control,” Neural Networks for Control, O. Omidvar and D. L. Elliott, Editors. New York: Academic Press, pp. 89–130, 1997. 87. A. Subramanian and D. Gerrity, US Patents 6967594, 6650251 and US Patent Application 60840623.
3 Modeling and Control Problems in Building Structures and Bridges
Sami F. Masri and Anastasios G. Chassiakos
CONTENTS
3.1 Introduction
    3.1.1 Background
    3.1.2 Overview of Structural Control of Civil Infrastructure Systems
    3.1.3 Identification of Structural Systems
    3.1.4 Classification of Identification Techniques for Structural Systems
    3.1.5 Uncertainty in Identification of Structural Systems
    3.1.6 Scope
3.2 Hybrid Approach for the Identification of Nonlinear Structural Systems
    3.2.1 Formulation of Hybrid Parametric/Nonparametric Approach
    3.2.2 Identification of Parametric Linear Part
    3.2.3 Identification of Parametric Nonlinear Part
        3.2.3.1 Identification of Hysteretic Systems
        3.2.3.2 Problem Formulation
        3.2.3.3 Online Identification Algorithm
    3.2.4 Identification of the Nonparametric Nonlinear Part
        3.2.4.1 Nonlinear Nonparametric Terms
        3.2.4.2 Orthogonal Series Expansion
        3.2.4.3 Nonlinear Forces Representation by Chebyshev Series
        3.2.4.4 Neural Network Approach
3.3 Examples and Case Studies
    3.3.1 Modeling of the Vincent Thomas Bridge Using Earthquake Response Measurements
    3.3.2 Online Identification of Hysteretic Joints from Experimental Measurements
        3.3.2.1 Identification Results for Full-Scale Structural Steel Subassembly
        3.3.2.2 Identification Results for a Structural Reinforced Concrete Subassembly
    3.3.3 Models of Nonlinear Viscous Dampers from Experimental Measurements
    3.3.4 Further Examples of Parametric Identification of MDOF Systems
    3.3.5 Nonparametric Identification through Volterra–Wiener Neural Networks
3.4 Conclusions
References
3.1 Introduction
3.1.1 Background

Recent developments in the broad field of structural control of civil infrastructure systems have led to increased emphasis on procedures to obtain models, of various types and formats, for representing complex nonlinear systems encountered in the modeling, monitoring, and control of civil structures. The needs that are fueling these developments include (1) the increasing emphasis on high-fidelity simulation models that can be relied on to reduce the need for physical tests; (2) the widespread use of structural health monitoring (SHM) approaches based on the use of vibration response measures; and (3) the need to have robust models of civil structures whose nonlinear motions under strong nonstationary loads are to be actively controlled. This chapter provides a synopsis of recent developments in the modeling and control of complex civil infrastructure components.

3.1.2 Overview of Structural Control of Civil Infrastructure Systems

The general field of structural control deals with monitoring and controlling the motion of civil infrastructure systems under the action of dynamic environments. The nature of control theory and practice is largely determined by the particular application under consideration, and the subject of structural control has distinctive features that govern the direction of research. For example, it focuses on the performance of relatively massive structures; it involves excitations whose precise properties are not known beforehand; it may require dissipation of large amounts of kinetic energy; it may involve the application of relatively large counterforces; and it is concerned with only relatively low accuracy of control. It has the potential for important benefits to the economy and to public safety.

The excitations of main concern are earthquake, wind, and man-made forces. There are a variety of different approaches to controlling, but not necessarily eliminating, unwanted motions of structures: passive methods such as base-isolation or tuned-mass dampers; active methods in which counterforces are applied to reduce the motions; combined active and passive methods of control; methods of varying structural stiffness; controlled damping; and so on. The objective is to utilize the most effective combination of these methods to provide integrated control of structural vibrations.

The ultimate goal of research on structural control is the practical application to real structures, with an acceptable cost. When considering practical applications, related subjects must also be examined, for example: motions and strains must be monitored, sensors must be developed of various types, relevant material properties must be studied, damage detection and health monitoring methods must be developed, problems in system identification must be overcome, and so on.

Two different goals of structural control must be given consideration: the utilization of control in the design of new structures and the utilization of control to improve the seismic or wind resistance of existing structures. The problems posed by existing structures differ from the problems of designing new structures because of various constraints imposed by the fact that the building already exists. Further details concerning these issues are available in the work of Housner et al. (1997).
3.1.3 Identification of Structural Systems

Structural identification provides a means of utilizing data from laboratory and field testing to improve dynamic modeling capabilities for structural systems. By systematically utilizing dynamic test data from a structure, rather than relying on theory alone, models can be derived that provide more accurate response predictions for dynamic loads on the structure produced by wind, earthquakes, or man-created forces.

Identification of structural systems plays a very important role in both SHM and structural control applications. In the case of structural health monitoring, identification techniques are used to determine any changes in the building’s or bridge’s characteristics, especially after a major event, such as an earthquake. In this context, system identification provides an additional tool for defect identification or damage assessment of civil structures. In the case of structural control, identification techniques are used to determine accurate low-order dynamic models of the structure, to be used in the design of vibration control systems.
Some early publications on the subject are available in the work of Masri and Caughey (1979), Beck and Jennings (1980), Masri et al. (1987a, 1987b), and the references therein. A recent and updated overview of the field can be found in Kerschen et al. (2006).

3.1.4 Classification of Identification Techniques for Structural Systems

Identification techniques for structural systems can be classified into two broad categories, parametric and nonparametric, depending on the level of prior knowledge of the system structure required for their implementation.

Parametric methods assume that the mathematical form of the model is known from theoretical considerations, apart from some unknown parameters. Identification then consists primarily of estimating values for these parameters from experimental data, although this step should be followed by an assessment of the assumed mathematical form for the model in light of how well it fits the data. If a poor match is observed, it may be necessary to modify this form by improving on the assumptions or approximations that were used and then to repeat the parameter estimation so that a model that gives a better fit to the data can be derived. Parametric methods can be used, for example, to estimate equivalent viscous damping values for the modes of vibration of a structure, which are difficult to derive from theory. They may also be used to assess, or to improve, a linear finite-element model used to predict the dynamic behavior of a structure. Parametric methods also allow feedback to the design process by assessing the accuracy of assumptions used to derive theoretical models that are needed during design to predict what the response of a proposed structure would be under dynamic loadings, such as those arising from wind or earthquakes.

Nonparametric methods refer to techniques that require little or no theoretical knowledge of the structure. Instead, they take a “black-box” approach and fit the input-output relation of the structure by some general form. For example, representations involving orthogonal functions or functionals can be used. Parameter estimation is still required to find coefficients in the function expansions, but because these coefficients are not structural parameters directly related to physical properties, these methods are commonly called “nonparametric.”

In many practical dynamic problems, the mathematical form of the model is not clear, so an increasing amount of attention has been devoted to nonparametric methods. For example, traditional seismic design of a structure requires that the structure behave in a ductile inelastic fashion in severe earthquakes, so the modeling of this behavior is important. Development of inelastic constitutive models for large-scale structures is an area of much research in earthquake engineering, but despite these valuable efforts, there is no well-accepted mathematical form that can be used to model such behavior with confidence. Even when a reliable constitutive model is available at the element or member level, the sheer number of components in a structure and the complexity of their interactions may make it difficult to build up a mathematical structure for a model based on theory.

Traditional nonparametric identification methods do have their own problems, however. These include restrictions on the type of input signals that can be used and restrictions on the nature of the dynamic systems to be identified; for example, some methods are inapplicable to nonlinearities involving infinite memory, which can occur if the structural behavior includes plastic hysteresis. Furthermore, nonparametric techniques may require a prohibitive amount of computational effort, coupled with very demanding storage requirements. The avoidance of a theoretical structural model may be an advantage in deriving a model for use in vibration control or response predictions for an existing structure, which can be dynamically tested. This advantage, however, limits the usefulness of nonparametric methods in improving the analytical modeling capabilities required in dynamic design.
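The distinction between the two categories can be illustrated with a small sketch. It assumes a hypothetical hardening spring whose restoring force is $kx + c_3x^3$, with made-up values for $k$ and $c_3$, noise-free synthetic data, and helper names (`solve_linear`, `least_squares`, `chebyshev_row`) that are introduced here only for illustration. The parametric fit estimates $k$ and $c_3$ directly, whereas the nonparametric fit estimates the coefficients of a generic Chebyshev expansion that carries no physical interpretation.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(rows, targets):
    """Least-squares coefficients via the normal equations (adequate for tiny problems)."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve_linear(ata, atb)


def chebyshev_row(x, order=3):
    """Chebyshev polynomials T_0(x)..T_order(x) from the three-term recurrence."""
    t = [1.0, x]
    for _ in range(2, order + 1):
        t.append(2.0 * x * t[-1] - t[-2])
    return t[: order + 1]


# Synthetic "measured" restoring force (hypothetical parameter values).
K_TRUE, C3_TRUE = 2.0, 0.5
xs = [-1.0 + 2.0 * i / 40 for i in range(41)]   # displacements normalized to [-1, 1]
force = [K_TRUE * x + C3_TRUE * x ** 3 for x in xs]

# Parametric: the form k*x + c3*x**3 is assumed known; only k and c3 are estimated.
parametric = least_squares([[x, x ** 3] for x in xs], force)

# Nonparametric: a general expansion sum_j a_j T_j(x), with no physical form assumed.
nonparametric = least_squares([chebyshev_row(x) for x in xs], force)
```

Both fits reproduce the same curve here, because $x^3 = (3T_1 + T_3)/4$, but only the parametric coefficients can be read directly as stiffness terms.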
3.1.5 Uncertainty in the Identification of Structural Systems

In the usual identification approach, a general mathematical form is chosen to specify a class of parametric or nonparametric models describing the input-output relation of a specific structure, but there are free parameters that must be assigned values to choose a model from the class that “best” describes the behavior of the structure. This gives rise to two types of uncertainty. The first type, “parameter uncertainty,” arises simply because the “best” values for the free parameters are not known a priori. This includes the possibility that there may be multiple solutions for the “best” values. The other type of uncertainty, “modeling error,” arises because the accuracy of each model in the class, and in particular that of the “best” models, is not known a priori.

One of the main aspects of system identification procedures is the criterion for choosing the “best” models. Related work on the subject can be found in Beck (1990), which shows that a Bayesian probabilistic formulation not only leads to a natural choice for this criterion, but also provides an integrated framework to handle nonuniqueness, modeling error, and measurement noise.
3.1.6 Scope

In the sequel, we present an overview of a systematic, hybrid approach for the identification of nonlinear systems in which a parametric model is used to describe the linear part as well as known nonlinear features of the structural model, and a nonparametric approach is used to describe the model-unknown nonlinear part. A review of methods for identification of the parametric part is given, which is followed by an examination of several nonparametric approaches for modeling the nonparametric nonlinear part, including orthogonal series expansions and neural networks. Illustrative case studies and further examples are presented to demonstrate the utility of the proposed identification approach.
3.2 Hybrid Approach for the Identification of Nonlinear Structural Systems
3.2.1 Formulation of the Hybrid Parametric/Nonparametric Approach

This section presents a general formulation of the dynamics of a broad class of typical nonlinear multidegree-of-freedom (MDOF) structural systems, which leads to the hybrid parametric/nonparametric identification and modeling of such systems. Consider the equation of motion of a discrete MDOF system subjected to directly applied excitation forces:

$$M\ddot{x}(t) + f_R(x(t), \dot{x}(t), t) = f(t) \tag{3.1}$$

where $M$ is a constant matrix characterizing the inertia properties of the system, $x$ is the displacement vector, $f_R$ is a vector-valued function representing the system’s restoring forces, and $f$ is the vector of excitation forces. The vector of restoring forces $f_R$ may depend on its arguments in a linear or nonlinear fashion. Typically, vector $f_R$ is represented as a combination of a linear part $f_L(t)$ and an additive nonlinear part $f_N(t)$,

$$f_R = f_L + f_N \tag{3.2}$$
In the case of structural systems, external influences can act on the structure directly, as represented by the vector $f(t)$ in Equation (3.1), or they may enter the structure indirectly through the motion of the structure’s supports. When separate support motions and directly applied external forces are both included, Equation (3.1) is modified as follows:

$$M_{11}^e \ddot{x}_1(t) + C_{11}^e \dot{x}_1(t) + K_{11}^e x_1(t) + M_{10}^e \ddot{x}_0(t) + C_{10}^e \dot{x}_0(t) + K_{10}^e x_0(t) + f_N(t) = f(t), \tag{3.3}$$
where:

$f(t)$ = an $n_1$ column vector of directly applied forces,
$x(t) = (x_1(t), x_0(t))^T$ = system displacement vector of dimension $(n_1 + n_0)$,
$x_1(t)$ = internal degree-of-freedom (DOF) displacement vector of dimension $n_1$,
$x_0(t)$ = prescribed support (boundary) displacement vector of dimension $n_0$,
$M_{11}^e, C_{11}^e, K_{11}^e$ = constant matrices that characterize the inertia, linearized damping, and linearized stiffness forces, respectively, associated with the unconstrained DOF of the system, each of dimension $n_1 \times n_1$,
$M_{10}^e, C_{10}^e, K_{10}^e$ = constant matrices that characterize the inertia, linearized damping, and linearized stiffness forces, respectively, associated with the support motions, each of dimension $n_1 \times n_0$,
$f_N(t)$ = an $n_1$ column vector of nonlinear nonconservative forces involving $x_1(t)$ as well as $x_0(t)$.
It is noted that the linear component $f_L$ of the restoring forces vector in Equation (3.2) is now represented by the following terms:

$$f_L = C_{11}^e \dot{x}_1 + K_{11}^e x_1 + M_{10}^e \ddot{x}_0 + C_{10}^e \dot{x}_0 + K_{10}^e x_0 \tag{3.4}$$
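To make the dimensions in Equation (3.4) concrete, the following sketch evaluates $f_L$ for a hypothetical two-story shear building on a single moving base, so that $n_1 = 2$ and $n_0 = 1$. The story stiffness and damping values, the lumped-mass assumption that makes $M_{10}^e$ zero in absolute coordinates, and the function names are all assumptions made for this illustration and are not taken from the chapter.

```python
def matvec(a, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def linear_force(C11, K11, M10, C10, K10, x1, v1, x0, v0, a0):
    """Evaluate f_L of Equation (3.4) from internal and support motions."""
    terms = [matvec(C11, v1), matvec(K11, x1),
             matvec(M10, a0), matvec(C10, v0), matvec(K10, x0)]
    return [sum(parts) for parts in zip(*terms)]


# Hypothetical story stiffnesses and damping coefficients (arbitrary units).
k1, k2 = 3.0, 2.0
c1, c2 = 0.3, 0.2

K11 = [[k1 + k2, -k2], [-k2, k2]]   # n1 x n1
C11 = [[c1 + c2, -c2], [-c2, c2]]   # n1 x n1
K10 = [[-k1], [0.0]]                # n1 x n0: the first story links floor 1 to the base
C10 = [[-c1], [0.0]]                # n1 x n0
M10 = [[0.0], [0.0]]                # n1 x n0: zero under the lumped-mass assumption

# A rigid translation of base and floors strains no story, so f_L vanishes.
rigid = linear_force(C11, K11, M10, C10, K10,
                     x1=[1.0, 1.0], v1=[0.0, 0.0], x0=[1.0], v0=[0.0], a0=[0.0])

# Displacing only floor 1 by 0.01 loads both stories.
floor1 = linear_force(C11, K11, M10, C10, K10,
                      x1=[0.01, 0.0], v1=[0.0, 0.0], x0=[0.0], v0=[0.0], a0=[0.0])
```

The rigid-translation case is a convenient sanity check on any assembled pair $K_{11}^e$, $K_{10}^e$ whose stiffness forces depend only on relative displacements.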
First, the linear parametric part $f_L$ of the class of models defined above is identified by using a time-domain method for the system matrices appearing in Equation (3.3), as will be described later. Next, the nonlinear forces $f_N$ acting on the system are identified. Based on the nature of the problem under consideration, it is often reasonable to postulate a parametric form of a simplified nonlinear model that represents the physical phenomena being analyzed. Examples of such nonlinearities appearing frequently in structural modeling are polynomial spring nonlinearities (such as the Duffing oscillator, whose dynamics for a single DOF system are given by $m\ddot{x} + c_1\dot{x} + c_2 x + c_3 x^3 = f$), or systems with additional polynomial cross terms (such as the Duffing–Van der Pol oscillator, given for a single DOF system as $m\ddot{x} + c_1\dot{x} + c_2 x + c_3 x^3 + c_4 x^2\dot{x} = f$). Systems with hysteretic properties can fit this category in certain applications, as well. Because in this case the form of the nonlinearity has already been chosen, the identification problem is reduced to determining the optimum values of the model parameters.

Furthermore, the nonlinear part $f_N$ of the restoring forces can be modeled as consisting of two additive components: a parametric component, $f_p$, as described above, whose functional form is assumed known, and a nonparametric component, $f_{np}$, which does not have a known functional form:

$$f_N = f_p + f_{np} \tag{3.5}$$
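As a minimal sketch of the parametric nonlinear forms mentioned above, the code below integrates the single-DOF Duffing–Van der Pol equation with a classical fourth-order Runge-Kutta scheme. The parameter values, the step size, and the function names are hypothetical choices made for illustration; only the parametric component $f_p = c_3x^3 + c_4x^2\dot{x}$ is modeled, and no nonparametric component $f_{np}$ is included.

```python
def restoring_force(x, v, c1, c2, c3, c4):
    """f_R = f_L + f_N with f_L = c1*v + c2*x and f_N = c3*x**3 + c4*x**2*v."""
    return c1 * v + c2 * x + c3 * x ** 3 + c4 * x ** 2 * v


def simulate(m, c1, c2, c3, c4, excitation, x0, v0, dt, steps):
    """Integrate m*x'' + f_R(x, x') = f(t) with fourth-order Runge-Kutta steps."""
    def accel(t, x, v):
        return (excitation(t) - restoring_force(x, v, c1, c2, c3, c4)) / m

    t, x, v = 0.0, x0, v0
    history = [(t, x, v)]
    for _ in range(steps):
        k1x, k1v = v, accel(t, x, v)
        k2x = v + 0.5 * dt * k1v
        k2v = accel(t + 0.5 * dt, x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x = v + 0.5 * dt * k2v
        k3v = accel(t + 0.5 * dt, x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x = v + dt * k3v
        k4v = accel(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t += dt
        history.append((t, x, v))
    return history


def energy(m, c2, c3, x, v):
    """Kinetic plus potential energy of the undamped Duffing oscillator."""
    return 0.5 * m * v * v + 0.5 * c2 * x * x + 0.25 * c3 * x ** 4


def no_load(t):
    """Zero excitation, i.e. free vibration."""
    return 0.0


# Free vibration from x(0) = 1: undamped Duffing (c1 = c4 = 0) and a damped variant.
free = simulate(1.0, 0.0, 1.0, 0.5, 0.0, no_load, x0=1.0, v0=0.0, dt=0.001, steps=5000)
damped = simulate(1.0, 0.1, 1.0, 0.5, 0.0, no_load, x0=1.0, v0=0.0, dt=0.001, steps=5000)
```

In the undamped case the total energy should stay essentially constant, which gives a simple check on the integrator before such a model is used to generate synthetic identification data.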
These two components will be identified by parametric and nonparametric identification methods, respectively.

3.2.2 Identification of the Parametric Linear Part

Consider the equivalent linear form of the system in Equation (3.3), rewritten as:

$$M_{11}^e \ddot{x}_1(t) + C_{11}^e \dot{x}_1(t) + K_{11}^e x_1(t) + M_{10}^e \ddot{x}_0(t) + C_{10}^e \dot{x}_0(t) + K_{10}^e x_0(t) = f(t) + \delta(t). \tag{3.6}$$
The term $\delta(t)$ contains the nonlinear restoring forces $f_N$, plus additional modeling errors and measurement noise. In the equation-error approach, the linearized system matrices are estimated by minimizing a norm of the error $\delta(t)$ over a specified time interval to give the “best” linear model. Let the response vector $r(t)$ of dimension $3(n_1 + n_0)$ be defined as:

$$r(t) = \left(\ddot{x}_1^T(t), \dot{x}_1^T(t), x_1^T(t), \ddot{x}_0^T(t), \dot{x}_0^T(t), x_0^T(t)\right)^T. \tag{3.7}$$
For clarity of presentation, let the six matrices appearing in Equation (3.6) be denoted by ${}^1A, {}^2A, \ldots, {}^6A$, respectively. Also, let $\langle {}^jA_i \rangle$ denote the $i$th row of a generic matrix ${}^jA$, and introduce the parameter vector $\alpha_i$:

$$\alpha_i = \left(\langle {}^1A_i \rangle, \langle {}^2A_i \rangle, \langle {}^3A_i \rangle, \langle {}^4A_i \rangle, \langle {}^5A_i \rangle, \langle {}^6A_i \rangle\right)^T. \tag{3.8}$$

Suppose that the excitation and the response of the system governed by Equation (3.6) are measured at times $t_1, t_2, \ldots, t_N$. Then at every $t_k$,

$${}^1A\,\ddot{x}_1(t_k) + {}^2A\,\dot{x}_1(t_k) + {}^3A\,x_1(t_k) + {}^4A\,\ddot{x}_0(t_k) + {}^5A\,\dot{x}_0(t_k) + {}^6A\,x_0(t_k) = f(t_k) + \delta(t_k); \quad k = 1, 2, \ldots, N. \tag{3.9}$$
Introducing matrix $R$,

$$R = \begin{pmatrix} r^T(t_1) \\ r^T(t_2) \\ \vdots \\ r^T(t_N) \end{pmatrix} \tag{3.10}$$
and using the notation above, the grouping of the measurements can be expressed concisely as:

$$\hat{R}\alpha = \hat{b} + \hat{\delta} \tag{3.11}$$
where $\hat{R}$ is a block diagonal matrix whose diagonal elements are equal to $R$, $\alpha = (\alpha_1^T, \alpha_2^T, \cdots, \alpha_{n_1}^T)^T$, and $\hat{b}$ and $\hat{\delta}$ are the corresponding vectors of excitation measurements and equation errors.

Keeping in mind that $\hat{R}$ is of dimensions $m \times n$, where $m = N n_1$ and $n = 3n_1(n_1 + n_0)$, taking a sufficient number of measurements will result in $m > n$. Under these conditions, least-squares procedures can be used to solve for all the system parameters that constitute the entries in $\alpha$.

Consider the general case where the measurements associated with certain degrees of freedom are more reliable than others, or measurements accumulated over certain time periods are to be emphasized differently from the others. For this case, in the weighted least-squares equation-error method we minimize the cost function given by:

$$J(\alpha) = \hat{\delta}^T W \hat{\delta} \tag{3.12}$$
where $W$ is the inverse of the covariance matrix of $\hat{\delta}$. $W$ is usually chosen subjectively and is often taken as a diagonal matrix. By substituting Equation (3.11) into Equation (3.12) and performing the minimization of $J(\alpha)$, we find that the optimal parameters are given by:

$$\hat{\alpha} = (\hat{R}^T W \hat{R})^{-1}\hat{R}^T W \hat{b}. \tag{3.13}$$
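As a minimal numerical sketch of Equation (3.13), assuming a diagonal weighting matrix $W$ and synthetic data for a single-DOF system, the weighted problem can be solved by scaling the rows of $\hat{R}$ and $\hat{b}$ by $\sqrt{W}$ and calling an ordinary least-squares solver. The function name `weighted_least_squares`, the argument `w_diag`, and all numerical values are illustrative assumptions and are not taken from the original study.

```python
import numpy as np

def weighted_least_squares(R_hat, b_hat, w_diag):
    """Minimize (R a - b)^T W (R a - b) for a diagonal W = diag(w_diag)."""
    sqrt_w = np.sqrt(w_diag)
    # Scaling each row by sqrt(w) turns the weighted problem into an ordinary one
    A = R_hat * sqrt_w[:, None]
    y = b_hat * sqrt_w
    alpha_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    return alpha_hat

# Synthetic single-DOF data: rows of R play the role of r^T(t_k) = (xdd, xd, x)
rng = np.random.default_rng(0)
N = 200
true_alpha = np.array([2.0, 0.5, 30.0])      # assumed (mass, damping, stiffness)
R = rng.standard_normal((N, 3))
b = R @ true_alpha + 0.01 * rng.standard_normal(N)

# Assumed weighting: the second half of the record is trusted four times more
w = np.where(np.arange(N) < N // 2, 1.0, 4.0)
alpha_hat = weighted_least_squares(R, b, w)
```

Working with the scaled rows avoids forming $\hat{R}^T W \hat{R}$ explicitly, which is usually the numerically safer route when the regressors are nearly collinear.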
Solving Equation (3.3) for the nonlinear force vector $f_N(t)$, and using the definition of $f_L$ from Equation (3.4), results in:

$$f_N(t) = f(t) - \left[M^e_{11}\ddot{x}_1(t) + f_L(t)\right]. \tag{3.14}$$

Because all the terms appearing on the right-hand side of Equation (3.14) are available from measurements or have been previously identified, the time history of $f_N$ can be determined. Note from Equation (3.14) that $f_N(t)$ can be interpreted as the residual force vector corresponding to the difference between the excitation vector $f(t)$ and the equivalent linear force vector composed of the inertia, damping, and stiffness terms.

3.2.3 Identification of the Parametric Nonlinear Part

As has been described already, the residual nonlinear forces $f_N(t)$ can be assumed to consist of a parametric component $f_p$, whose functional form is known, and of a nonparametric component $f_{np}$ with unknown functional form. The identification of these components is the subject of the current and the next sections. Some parametric forms of typical structural nonlinearities were described before (for example, the Duffing oscillator). The parameters of these models are estimated using standard least-squares techniques. A different class of nonlinearities of particular interest in structural identification is hysteretic nonlinearities. These differ from the simple polynomial nonlinearities because they exhibit hereditary behavior. In the remaining part of this section we will focus on modeling and identifying
hysteretic behavior in the context of parametric nonlinear identification. We also note that in the next section we will present a treatment of hysteretic nonlinearities in a nonparametric nonlinear identification context, using dynamic neural networks.

3.2.3.1 Identification of Hysteretic Systems

Problems involving the identification of structural systems exhibiting inelastic restoring forces with hereditary characteristics are widely encountered in the applied mechanics field. Representative examples involve buildings under strong earthquake excitations or aerospace structures incorporating joints. Due to the hysteretic nature of the restoring forces in such situations, the nonlinear forces cannot be expressed in the form of an algebraic function involving the instantaneous values of the state variables of the system. Consequently, much effort has been devoted by numerous investigators to developing models of hysteretic restoring forces and techniques to identify such systems. Some early contributions in this area have been made by Caughey (1960), Bouc (1967), Masri and Caughey (1979), Baber and Wen (1982), and Wen (1989). One of the challenges in actively controlling the nonlinear dynamic response of structural systems undergoing hysteretic deformations is the need for rapid identification of the nonlinear restoring force so that the information can be utilized by online control algorithms. Consequently, the availability of a method for the online identification of hysteretic restoring forces is crucial for the practical implementation of structural control concepts.

3.2.3.2 Problem Formulation

In this section we present the modeling and formulation of the hysteretic identification problem for a single DOF system. The method can easily be expanded to MDOF systems. The symbol $Q$ is used here to represent hysteretic restoring forces, in order to distinguish them from the more general restoring forces $f_R$ of the previous sections.
Details of the formulation and applications of this approach can be found in Chassiakos et al. (1998). The motion of the single DOF system to be identified is governed by:

$$m\ddot{x}(t) + Q(x(t), \dot{x}(t)) = u(t) \tag{3.15}$$
where $x(t)$ is the system displacement, $Q(x(t), \dot{x}(t))$ is the restoring force, and $u(t)$ is the system’s external excitation. The mass $m$ of the system is assumed to be known or already estimated, and measurements of $u(t)$ and $\ddot{x}(t)$ are assumed to be available at times $t_k$, $k = 1, \ldots$. The values of $\dot{x}(t)$ and $x(t)$ are available either by direct measurements at times $t_k$, $k = 1, \ldots$, or by integration of the signal $\ddot{x}(t)$. If the restoring force $Q(x, \dot{x})$ has hysteretic characteristics, a model for such a force can be given by the following nonlinear differential equation, the Bouc–Wen model (Wen, 1989):

$$Q(x, \dot{x}) = z \quad \text{with} \quad \dot{z} = (1/\eta)\left[A\dot{x} - \nu\left(\beta|\dot{x}||z|^{n-1}z - \gamma\dot{x}|z|^{n}\right)\right] \tag{3.16}$$
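Equation (3.16) can be integrated numerically for a prescribed motion to see the hysteresis loop it traces. The following is a minimal sketch using forward-Euler integration; the function names, the sinusoidal displacement history, the time step, and the parameter values are all illustrative assumptions chosen so that $|z|$ stays bounded by one.

```python
import numpy as np

def bouc_wen_rate(z, xdot, eta, A, nu, beta, gamma, n):
    """Right-hand side of Equation (3.16) for scalar z and xdot."""
    return (A * xdot - nu * (beta * abs(xdot) * abs(z) ** (n - 1) * z
                             - gamma * xdot * abs(z) ** n)) / eta

def simulate_bouc_wen(xdot, dt, **params):
    """Forward-Euler integration of z(t) for a sampled velocity history."""
    z = np.zeros(len(xdot))
    for k in range(1, len(xdot)):
        z[k] = z[k - 1] + dt * bouc_wen_rate(z[k - 1], xdot[k - 1], **params)
    return z

dt = 0.001
t = np.arange(0.0, 10.0, dt)
omega = 2.0 * np.pi * 0.5                 # assumed 0.5 Hz cyclic displacement
x = np.sin(omega * t)
xdot = omega * np.cos(omega * t)

# Assumed parameter set; with the sign convention of (3.16) a negative gamma
# gives loading stiffness (1 - z) and linear unloading.
z = simulate_bouc_wen(xdot, dt, eta=1.0, A=1.0, nu=1.0,
                      beta=0.5, gamma=-0.5, n=1)
```

Plotting `z` against `x` shows the loop; changing the parameter values in the call to `simulate_bouc_wen` changes its width and its hardening or softening shape.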
Different combinations of the parameters $\eta$, $A$, $\nu$, $\beta$, $\gamma$, and $n$ will produce smooth hysteretic loops of various hardening or softening characteristics, with different amplitudes and shapes. Let $u(k) = u(t_k)$, $x(k) = x(t_k)$, $\dot{x}(k) = \dot{x}(t_k)$, $\ddot{x}(k) = \ddot{x}(t_k)$, and $z(k) = z(t_k)$. The system equation of motion (3.15) is rewritten as:

$$Q(k) = z(k) = u(k) - m\ddot{x}(k) \tag{3.17}$$
hence the values of $z$ at time $t_k$ are available and the identification problem can be stated as: given the mass $m$ and using the online measurements of $x$, $\dot{x}$, $\ddot{x}$, and $u$, make online estimates of the unknown parameters of the hysteretic model expressed by Equation (3.16).

3.2.3.3 Online Identification Algorithm

The hysteretic model obeys the nonlinear differential Equation (3.16). The model is parameterized linearly with respect to the coefficients $(1/\eta)A$, $(1/\eta)\nu\beta$, and $(1/\eta)\nu\gamma$, but nonlinearly with respect to the power $n$. It is, however, desirable to use a linearly parameterized estimator for the online estimation of hysteretic behavior, hence the following modification of the model expressed by Equation (3.16) will be used:

$$\dot{z} = (1/\eta)\left[A\dot{x} - \sum_{n=1}^{N} a_n \nu\left(\beta|\dot{x}||z|^{n-1}z - \gamma\dot{x}|z|^{n}\right)\right] \tag{3.18}$$
where the value of coefficient $a_n$ determines the contribution of power $n$ to the hysteresis, and $N$ is a large enough integer. For example, if the value of power $n$ in model (3.16) is $n = 3$, then the coefficients $a_i$ in Equation (3.18) will be $a_1 = 0$, $a_2 = 0$, $a_3 = 1$. Because measurements are usually taken at discrete time intervals $\Delta t$, a discrete-time version of the model defined by Equation (3.18) will be used:

$$z(k) = z(k-1) + \Delta t\,(1/\eta)A\dot{x}(k-1) + \Delta t\sum_{n=1}^{N}\left(-a_n(1/\eta)\nu\beta|\dot{x}(k-1)||z(k-1)|^{n-1}z(k-1) + a_n(1/\eta)\nu\gamma\,\dot{x}(k-1)|z(k-1)|^{n}\right) \tag{3.19}$$
This discrete-time model gives rise to the following discrete-time linearly parameterized estimator:

$$\hat{Q}(k) = z(k-1) + \theta_0(k)\dot{x}(k-1) + \sum_{n=1}^{N}\left(\theta_{2n-1}(k)|\dot{x}(k-1)||z(k-1)|^{n-1}z(k-1) + \theta_{2n}(k)\dot{x}(k-1)|z(k-1)|^{n}\right) \tag{3.20}$$
where the coefficients $\theta_i(k)$, $i = 0, \ldots, 2N$, are estimates at time $t_k$ of the corresponding coefficients from Equation (3.19), that is, $\theta_0(k)$ is an estimate of $\Delta t\,(1/\eta)A$, $\theta_{2n-1}(k)$ is an estimate of $-(\Delta t)a_n(1/\eta)\nu\beta$, and $\theta_{2n}(k)$ is an estimate of $(\Delta t)a_n(1/\eta)\nu\gamma$. Let $\theta(k) = [\theta_0(k), \theta_1(k), \theta_2(k), \ldots, \theta_{2N}(k)]^T$ be the vector containing the parameter estimates at time $t_k$ and $\theta^* = [\theta_0^*, \theta_1^*, \theta_2^*, \ldots, \theta_{2N}^*]^T$ the vector containing the true values of the parameters. Also let:

$$\xi(k-1) = \left[\dot{x}(k-1),\ |\dot{x}(k-1)||z(k-1)|^{0}z(k-1),\ \dot{x}(k-1)|z(k-1)|^{1},\ |\dot{x}(k-1)||z(k-1)|^{1}z(k-1),\ \dot{x}(k-1)|z(k-1)|^{2},\ \ldots,\ |\dot{x}(k-1)||z(k-1)|^{N-1}z(k-1),\ \dot{x}(k-1)|z(k-1)|^{N}\right]^T \tag{3.21}$$
be a vector containing the system measurements at time $t_k$. Estimator (3.20) is then expressed as:

$$\hat{Q}(k) = z(k-1) + \xi^T(k-1)\theta(k) \tag{3.22}$$
and the estimation error will be:

$$e(k) = Q(k) - \hat{Q}(k) = \xi^T(k)\theta^* - \xi^T(k)\theta(k) = \xi^T(k)\phi(k) \tag{3.23}$$
where $\phi(k) = \theta^* - \theta(k)$ is the $([2N+1] \times 1)$ vector of parameter errors between the actual and estimated values $\theta_i$. Based on Equation (3.22), and using standard techniques found in the adaptive estimation and adaptive control literature (Ioannou and Datta, 1991), the following adaptation law (3.24)–(3.25) is designed for updating the estimates $\theta(k)$ online:

$$\theta(k) = \begin{cases} \mu(k), & \text{if } \|\mu(k)\| \le M_\theta \\ \left(M_\theta/\|\mu(k)\|\right)\mu(k), & \text{if } \|\mu(k)\| > M_\theta \end{cases} \tag{3.24}$$

with

$$\mu(k) = \theta(k-1) - \frac{\gamma_0\, e(k)\,\xi(k-1)}{\beta_0 + \|\xi(k-1)\|^2} \tag{3.25}$$
where $\gamma_0 > 0$ is the learning rate of the algorithm and $\beta_0 > 0$ is a design constant. The norms $\|\xi(k)\|$ and $\|\mu(k)\|$ are the usual Euclidean vector norms. The number $M_\theta$ is an upper bound on the norm $\|\theta^*\|$. Such an upper bound can be easily found if some information about the order of magnitude of the elements of $\theta^*$ is available a priori. The adaptive law expressed by Equations (3.24) and (3.25) guarantees that all the signals will remain bounded and that the error $e(k) \to 0$ as $k \to \infty$, if the model of Equation (3.16) is a good representation of the unknown system.
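The estimator and adaptation law can be sketched in a few lines. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the data are generated synthetically from the discrete model (3.19) with $n = 1$, and the sign convention is an assumption (the error is taken as measured minus predicted using the previous estimate, and the correction is added, which is the usual normalized-gradient form). The regressor holds the $2N + 1$ terms that multiply $\theta_0, \ldots, \theta_{2N}$ in Equation (3.20).

```python
import numpy as np

def regressor(xdot, z, N):
    """The 2N+1 regressor terms multiplying theta_0..theta_2N in Eq. (3.20)."""
    terms = [xdot]
    for n in range(1, N + 1):
        terms.append(abs(xdot) * abs(z) ** (n - 1) * z)
        terms.append(xdot * abs(z) ** n)
    return np.array(terms)

def adapt_step(theta_prev, xi, z_prev, z_meas, gamma0, beta0, M_theta):
    """One normalized-gradient update followed by the projection of Eq. (3.24)."""
    # Assumed convention: a priori error = measured minus predicted
    e = z_meas - (z_prev + xi @ theta_prev)
    mu = theta_prev + gamma0 * e * xi / (beta0 + xi @ xi)
    norm_mu = np.linalg.norm(mu)
    if norm_mu > M_theta:
        mu = (M_theta / norm_mu) * mu      # scale back onto the ball
    return mu, e

# Synthetic "true" system: Eq. (3.19) with n = 1 and assumed coefficients
dt, N = 0.01, 2
theta_true = np.array([dt * 1.0, -dt * 0.5, -dt * 0.5, 0.0, 0.0])
t = np.arange(0.0, 50.0, dt)
xdot = np.cos(t) + 0.5 * np.cos(2.7 * t)   # assumed two-tone excitation

theta = np.zeros(2 * N + 1)
z_prev = 0.0
for k in range(1, len(t)):
    xi = regressor(xdot[k - 1], z_prev, N)
    z_meas = z_prev + xi @ theta_true      # stands in for u(k) - m * xdd(k)
    theta, e = adapt_step(theta, xi, z_prev, z_meas,
                          gamma0=1.0, beta0=0.01, M_theta=1.0)
    z_prev = z_meas
```

With noise-free data and $0 < \gamma_0 < 2$, each update of this form cannot increase the parameter error norm, and the projection preserves that property as long as $M_\theta$ is no smaller than $\|\theta^*\|$.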
3.2.4 Identification of the Nonparametric Nonlinear Part

The nonparametric nonlinear forces $f_{np}$ do not have a known functional representation. Under the assumption of additive parametric and nonparametric nonlinearities, the nonparametric part is given from Equation (3.5) as:

$$f_{np} = f_N - f_p$$

Because the time history of $f_N$ is known from Equation (3.14) and $f_p$ has been identified by one of the methods of the previous section, the residual restoring forces $f_{np}$ will be identified by a nonparametric method.

3.2.4.1 Nonlinear Nonparametric Terms

Let $h_i(t)$ represent the $i$th component of the nonlinear residual force vector $f_{np}$. In general, vector $h$ depends simultaneously on all the components of the system acceleration, velocity, and displacement vectors associated with the $n_1$ internal DOF as well as the $n_0$ support components:

$$h(t) = h(x, \dot{x}, \ddot{x}). \tag{3.26}$$
The central idea of the present method (Masri and Caughey, 1979) is that, in the case of nonlinear dynamic systems commonly encountered in the structural mechanics field, a judicious assumption is that each component of $h$ can be expressed in terms of a series of the form:

$$h_i(x, \dot{x}, \ddot{x}) \approx \sum_{j=1}^{J_{\max_i}} \hat{h}_i^{(j)}\left(v_{1i}^{(j)}, v_{2i}^{(j)}\right) \tag{3.27}$$
where the $v_1$s and $v_2$s are suitable generalized coordinates which, in turn, are linear combinations of the physical displacements, velocities, and accelerations. The approximation indicated in Equation (3.27) is that each component $h_i$ of the nonlinear force vector $h$ can be adequately estimated by a collection of terms $\hat{h}_i^{(j)}$, each one of which involves a pair of generalized coordinates. The particular choice of combinations and permutations of the generalized coordinates and the number of terms $J_{\max_i}$ needed for a given $h_i$ depend on the nature and extent of the nonlinearity of the system and its effects on the specific DOF $i$. Because $h_i(t)$ is chosen as the $i$th component of $f_{np}(t)$, the procedure expressed by Equation (3.27) will directly estimate the corresponding component of the unknown nonlinear force. For certain structural configurations (e.g., localized nonlinearities) and relatively low-order systems, the choice of suitable generalized coordinates for the series in Equation (3.27) is a relatively straightforward task. However, in many practical cases involving distributed nonlinearities coupled with a relatively high-order system, an improved rate of convergence of the series in Equation (3.27) can be achieved by performing the least-squares fit of the nonlinear forces in the “modal” domain as outlined
below. Experience with typical structural nonlinearities has shown that the main reason for this improvement is the fact that relatively few “modes” dominate the transformed nonlinear forces. Using the identification results for the linear part, the eigenvalue problem associated with $M_{11}^{-1}K_{11}$ is solved, that is, $M_{11}^{-1}K_{11}\Phi = \Phi\Lambda$, where $\Lambda$ is a diagonal matrix containing the squares of the natural frequencies on the diagonal, and $\Phi$ is the eigenvector matrix or modal matrix, resulting in the corresponding vector of generalized coordinates $u$:

$$u(t) = \Phi^{-1}x(t) \tag{3.28}$$
For simplicity of notation, from here on, we use the symbol $h$ to denote the vector of transformed nonlinear residual forces ($\Phi^T f_{np}$) instead of $f_{np}$:

$$h(u, \dot{u}, \ddot{u}) = \Phi^T f_{np}(t) \tag{3.29}$$
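The modal transformation of Equations (3.28) and (3.29) can be illustrated with a small numerical sketch. The two-DOF mass and stiffness matrices, the sample displacement vector, and the sample residual force are assumed values chosen only to make the example concrete; `Phi` and `lam` stand for the modal matrix and the diagonal of the eigenvalue matrix.

```python
import numpy as np

# Assumed identified matrices for a two-DOF system
M = np.diag([2.0, 1.0])
K = np.array([[300.0, -100.0],
              [-100.0, 100.0]])

# Eigenproblem (M^{-1} K) Phi = Phi Lambda
lam, Phi = np.linalg.eig(np.linalg.solve(M, K))
order = np.argsort(lam)
lam, Phi = lam[order], Phi[:, order]

omega = np.sqrt(lam)            # natural frequencies in rad/s

x = np.array([0.01, -0.02])     # physical displacements at one instant
u = np.linalg.solve(Phi, x)     # generalized coordinates, u = Phi^{-1} x

f_np = np.array([5.0, -3.0])    # sample residual force vector
h = Phi.T @ f_np                # transformed residual forces, Phi^T f_np
```

Solving the linear system with `np.linalg.solve` gives the same result as multiplying by the explicit inverse of `Phi`, without forming that inverse.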
3.2.4.2 Orthogonal Series Expansion

The individual terms appearing in the series expansion of Equation (3.27) may be evaluated by using the least-squares approach to determine the optimum fit for the time history of each $h_i$. Thus, $\hat{h}_i^{(1)}$ may be expressed as a double series involving a suitable choice of generalized coordinates:

$$\hat{h}_i^{(1)}\left(v_{1i}^{(1)}, v_{2i}^{(1)}\right) \equiv \sum_{k}\sum_{\ell} {}^{(1)}C_{k\ell}^{(i)}\, T_k\left(v_{1i}^{(1)}\right) T_\ell\left(v_{2i}^{(1)}\right) \tag{3.30}$$
where the $C_{k\ell}$s are a set of undetermined constants and $T_k(\cdot)$ and $T_\ell(\cdot)$ are suitable basis functions, such as orthogonal polynomials. As an example, if $T_k(\cdot)$ are the Chebyshev polynomials, then for $k = 4$, the 4th Chebyshev polynomial will be $T_4(v_{1i}^{(1)}) = 8(v_{1i}^{(1)})^4 - 8(v_{1i}^{(1)})^2 + 1$.

Let $h_i^{(2)}$, the deviation (residual) error between $h_i$ and its first estimate $\hat{h}_i^{(1)}$, be given by:

$$h_i^{(2)}(x, \dot{x}, \ddot{x}) = h_i(x, \dot{x}, \ddot{x}) - \hat{h}_i^{(1)}\left(v_{1i}^{(1)}, v_{2i}^{(1)}\right). \tag{3.31}$$
Equation (3.30) accounts for the contribution to the nonlinear force $h_i$ of generalized coordinates $v_{1i}^{(1)}$ and $v_{2i}^{(1)}$ appearing in the form $(v_{1i}^{(1)})^k (v_{2i}^{(1)})^\ell$. Consequently, the residual error as defined by Equation (3.31) can be further reduced by fitting $h_i^{(2)}$ by a similar double series involving variables $v_{1i}^{(2)}$ and $v_{2i}^{(2)}$:

$$h_i^{(2)}(x, \dot{x}, \ddot{x}) \approx \hat{h}_i^{(2)}\left(v_{1i}^{(2)}, v_{2i}^{(2)}\right) \tag{3.32}$$

where:

$$\hat{h}_i^{(2)}\left(v_{1i}^{(2)}, v_{2i}^{(2)}\right) \equiv \sum_{k}\sum_{\ell} {}^{(2)}C_{k\ell}^{(i)}\, T_k\left(v_{1i}^{(2)}\right) T_\ell\left(v_{2i}^{(2)}\right). \tag{3.33}$$
This procedure is extended to account for all DOFs that have significant interaction with DOF $i$. The terms $\hat{h}_i^{(j)}(v_{1i}^{(j)}, v_{2i}^{(j)})$ in Equation (3.27) are estimates of $h_i^{(j)}(x, \dot{x}, \ddot{x})$. In this iterative process, the $j$th residual error is given by:

$$h_i^{(j)}(x, \dot{x}, \ddot{x}) = h_i^{(j-1)}(x, \dot{x}, \ddot{x}) - \hat{h}_i^{(j-1)}\left(v_{1i}^{(j-1)}, v_{2i}^{(j-1)}\right); \quad j = 2, 3, \ldots, J_{\max_i} \tag{3.34}$$

where

$$h_i^{(1)}(x, \dot{x}, \ddot{x}) \equiv h_i(x, \dot{x}, \ddot{x}) \tag{3.35}$$

and

$$\hat{h}_i^{(j)}\left(v_{1i}^{(j)}, v_{2i}^{(j)}\right) \equiv \sum_{k}\sum_{\ell} {}^{(j)}C_{k\ell}^{(i)}\, T_k\left(v_{1i}^{(j)}\right) T_\ell\left(v_{2i}^{(j)}\right). \tag{3.36}$$
Note that, in general, the range of the summation indices $k$ and $\ell$ appearing in Equation (3.36) may vary with the series index $j$ and DOF index $i$. Similarly, $J_{\max_i}$, the total number of series terms needed to achieve a given level of accuracy in fitting the nonlinear force time history, depends on the DOF index $i$.

3.2.4.3 Nonlinear Forces Representation by Chebyshev Series

Using orthogonal polynomials $T_k(\cdot)$, estimate each $h_i(x, \dot{x}, \ddot{x})$ by a series of approximating functions $\hat{h}_i^{(j)}$ of the form indicated in Equation (3.36). The numerical value of the $C_{k\ell}$ coefficients can be determined by invoking the applicable orthogonality conditions for the chosen polynomials. Although there is a wide choice of suitable basis functions for least-squares application, the orthogonal nature of the Chebyshev polynomials and their “equal-ripple” characteristics make them convenient to use in the present work. The $n$th Chebyshev polynomial is defined by the identity $T_n(\cos\theta) = \cos(n\theta)$, or equivalently by:

$$T_n(\xi) = \cos(n\cos^{-1}\xi), \quad -1 < \xi < 1 \tag{3.37}$$

and satisfies the weighted orthogonality property

$$\int_{-1}^{1} w(\xi)\,T_n(\xi)\,T_m(\xi)\,d\xi = \begin{cases} (\pi/2)\,\delta_{nm}, & n \neq 0,\ m \neq 0 \\ \pi, & n = m = 0 \end{cases} \tag{3.38}$$
where $w(\xi) = (1 - \xi^2)^{-1/2}$ is the weighting function and $\delta_{nm}$ is the Kronecker delta. Note that in the special case in which no cross-product terms are involved in any of the series terms, function $h$ can be expressed as the sum of two one-dimensional orthogonal polynomial series instead of a single two-dimensional series of the type under discussion. Further details regarding this approach and a demonstration of the utility of this procedure are available in the works of Masri et al. (1987a, 1987b, 2006).
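A single term of the double series in Equation (3.36) can be fitted with NumPy's Chebyshev utilities. The sketch below is an assumption-laden illustration: it determines the coefficients by a direct least-squares solve on scattered samples rather than by the orthogonality integrals, it uses displacement and velocity as the pair of generalized coordinates, and the Duffing-like test force and the name `fit_chebyshev_surface` are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def fit_chebyshev_surface(v1, v2, h, deg):
    """Least-squares fit h ~ sum_k sum_l C[k, l] T_k(v1n) T_l(v2n)."""
    # Map each coordinate to [-1, 1], where the Chebyshev polynomials live
    v1n = 2.0 * (v1 - v1.min()) / (v1.max() - v1.min()) - 1.0
    v2n = 2.0 * (v2 - v2.min()) / (v2.max() - v2.min()) - 1.0
    V = cheb.chebvander2d(v1n, v2n, [deg, deg])   # pseudo-Vandermonde matrix
    coef, *_ = np.linalg.lstsq(V, h, rcond=None)
    C = coef.reshape(deg + 1, deg + 1)            # C[k, l] multiplies T_k T_l
    return C, v1n, v2n

# Synthetic residual force: cubic stiffness plus linear damping (assumed)
rng = np.random.default_rng(1)
x = rng.uniform(-1.5, 1.5, 500)
xd = rng.uniform(-4.0, 4.0, 500)
h = 2.0 * x + 5.0 * x ** 3 + 0.3 * xd

C, xn, xdn = fit_chebyshev_surface(x, xd, h, deg=3)
h_fit = cheb.chebval2d(xn, xdn, C)
residual = h - h_fit      # would be passed on to the next series term
```

Because this test force has no cross-product terms, only the first row and first column of `C` are significantly nonzero, which corresponds to the special case of two one-dimensional series mentioned above.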
3.2.4.4 Neural Network Approach

A different approach for identifying the nonparametric nonlinear part of the restoring forces is the use of artificial neural networks. The standard multilayer feedforward neural networks have been shown to perform very well in identifying a broad class of structural nonlinearities, without assuming any knowledge of the functional form of the nonlinearity. Training algorithms such as standard back propagation can be used to train the network, which learns to approximate the unknown nonlinearity within the range of the data provided in the training set. The multilayer feedforward networks will work well if the unknown nonlinearities are static, that is, the output of the nonlinearity depends only on the instantaneous value of its input. Detailed discussions on the approximating properties of this type of network for structural systems are provided in Chassiakos and Masri (1991), and Masri et al. (1992, 1993). When the nonlinearity is of a dynamic nature, such as a hysteretic element, which possesses memory characteristics, the simple multilayer feedforward networks cannot capture the input-output nonlinear dynamics. A more general form of Equation (3.16) for the restoring force is given by:

$$Q(x, \dot{x}) = z \quad \text{with} \quad \dot{z} = G(z, x, \dot{x}, f) \tag{3.39}$$
where $G$ is a continuous nonlinear function, capable of capturing nonlinear hysteretic effects, and $x$ and $f$ are the displacement and excitation forces, respectively. The main challenge in designing adaptive algorithms for estimating the accelerations/displacements of system (3.39) is the differential Equation (3.16) for $\dot{z}$, which depends nonlinearly and dynamically on $x$, $\dot{x}$, and $f$. A network architecture that has been shown to approximate nonlinear dynamic systems is the class of Volterra–Wiener neural networks (VWNN) (Kosmatopoulos et al., 2001). The VWNN consists of a linear multi-input multi-output (MIMO) stable dynamic system connected in cascade with a linear-in-the-weights neural network, as shown in Figure 3.1. The dynamics of the linear MIMO system are given as follows:

$$\begin{aligned} \chi_1 &= \zeta \\ \chi_2 &= H_1(s)\chi_1 \\ \chi_3 &= H_2(s)\chi_2 \\ &\ \ \vdots \\ \chi_{p+1} &= H_p(s)\chi_p \\ \xi &= [\chi_1^\tau, \ldots, \chi_{p+1}^\tau]^\tau \end{aligned} \tag{3.40}$$

where $\zeta$ is the input vector to the VWNN, $\xi$ is the output of the linear MIMO system, and $H_i(s)$ are stable transfer function matrices available for design (here, $s$ denotes the Laplace operator). The vector $\zeta$ contains all the signals that are available for measurement (for example, the vector $\zeta$ could contain the acceleration signals $\ddot{x}$ as well as the input excitation forces $f$, which are assumed to be available for measurement). Note that from Equation (3.40) we know that the linear MIMO system used in the VWNN dynamics consists of a cascade of linear stable filters $H_i(s)$, where the output of each filter is fed as input to the next filter.

FIGURE 3.1 Block diagram of the Volterra–Wiener neural network: the input $\zeta$ passes through the linear MIMO filter $H(s)$ to give $\xi$, which feeds the neural network $W^T\phi(\cdot)$ to produce the output $y$.

The linear-in-the-weights neural network is described as follows:

$$y = W^\tau\phi(\xi) \tag{3.41}$$
where $y$ is the output of the neural network, $W$ denotes the matrix of the synaptic weights of the neural network, $\xi$ is the output of the linear filter (3.40), and $\phi$ is a vector of the nonlinear activation functions of the neural network. It is noted that in the VWNN (3.40) and (3.41), the “learning” capabilities are due to the synaptic weights $W$, which are adjusted based on certain input-output measurements. All other parameters, that is, the linear filters $H_i(s)$ and the nonlinear activation functions $\phi(\xi)$, are fixed and not adjusted during training. Although different parameter estimation algorithms exist that can be used for the adjustment of $W$, we will use a normalized gradient adaptive law with projection (Ioannou and Datta, 1991). Such an adaptive law keeps the parameter estimates bounded regardless of the boundedness properties of the signals $x$, $\ddot{x}$, and $f$. The adaptive law is summarized as follows:

Estimation Model

$$\hat{y} = W^\tau\phi(\xi) \tag{3.42}$$
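Here $\hat{y}$ denotes the estimate of the vector $\ddot{x}(t+1)$ at the next time sample in the case where accelerations are estimated, or the estimate of the displacement vector $x(t+1)$ at the next time sample in the case where displacements are estimated.

Adaptive Laws

$$\dot{W} = \begin{cases} \gamma\epsilon\phi^\tau(\xi), & \text{if } |W| < M, \text{ or if } |W| = M \text{ and } (\gamma\epsilon\phi^\tau)^\tau W \le 0 \\ \left(I - \dfrac{WW^\tau}{W^\tau W}\right)\gamma\epsilon\phi^\tau, & \text{otherwise} \end{cases} \tag{3.43}$$

Normalized Estimation Error

$$\epsilon = (y - \hat{y})/\eta^2, \qquad \eta^2 = 1 + \phi^\tau\phi \tag{3.44}$$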
The scalars $\gamma$ and $M$ are positive design constants. Parameter $\gamma$ is the adaptive gain; $M$ is a large positive constant bounding $W$ such that $|W| < M$. The vector $y$ corresponds to either $\ddot{x}(t+1)$ or $x(t+1)$ depending on whether we estimate accelerations or displacements. In the case where displacements are estimated, we assume that during training the adaptive algorithm is provided with the actual node displacements. The adaptive law (3.42)–(3.44) guarantees the following properties:

1. The parameter matrix $W(t)$ remains bounded for all $t$, provided that $|W(0)| < M$.
2. The normalized estimation error $\epsilon$ converges to a residual set whose radius can be made arbitrarily small by increasing the dimensions of $\xi$ and the regressor vector $\phi$ accordingly.

The role of the VWNN filter $H(s)$ can be understood by considering the discrete-time analog of the estimation process. Because the structure is a dynamic system, the future value of any of its states depends not only on the current value of the states and inputs but also on their past values. In other words, the structure dynamics possess “memory,” because they depend on past values of the states and inputs. Therefore, a discrete-time estimation scheme should use not only the current values $\ddot{x}(t)$ and $f(t)$ as inputs but also their past values $\ddot{x}(t-1), f(t-1), \ldots, \ddot{x}(t-p), f(t-p)$, where $p$ denotes the memory of the estimator. The value of $p$ should be large enough to ensure that the memory of the estimator is larger than the memory of the actual system. The continuous-time analog of the memory in the VWNN estimator is the cascade of stable linear filters, where the output of each filter is fed to the next one. The output of the first filter can be thought of as the analog of $\ddot{x}(t-1), f(t-1)$ in discrete time; the output of the second filter is the analog of $\ddot{x}(t-2), f(t-2)$, and so on. There are two design issues for the filters $H_1(s), \ldots, H_p(s)$. The first is the choice of each filter.
Although there are many different approaches that can be used, the simplest one is to choose $H_i(s)$ to be low-pass first-order filters of the form $H_i(s) = 1/(s + \alpha)$, where $\alpha$ is a positive design parameter. The cut-off frequency of these filters (alternatively, the choice of $\alpha$) must be such that the filter “passes” all the signal energy, that is, the cut-off frequency must be large enough so that there is no loss of information during filtering of the input signal. The second issue is the number $p$ of filters used (i.e., the memory of the estimator). A good “rule of thumb” is to choose $p$ such that, if we obtain a state representation of Equation (3.41), the dimension of this representation is greater than or equal to the dimension of a state space representation of the system. The number $p$ obtained in this manner is then increased or decreased by trial and error until a good estimator is obtained. Note that, for practical reasons, we want to keep $p$ as small as possible. The role of the regressor vector $\phi$ is to capture the nonlinear characteristics of the structure dynamics. There are no general design methodologies for
choosing the regressor terms for complicated high-dimensional systems such as MDOF structures. A good “rule of thumb” is to let the regressor vector $\phi$ be formed as the output of a high-order neural network (HONN). Details on the formulation, theoretical analysis, and approximation properties of the HONN can be found in Kosmatopoulos et al. (1995). The output of a HONN is a combination of nonlinear transformations of its input signals: first the input signal $\xi$ is passed through a sigmoidal function; the resulting signals form the first entries of $\phi$ (which are called first-order terms). The reason we use a sigmoidal function is to make sure that all regressor signals are bounded and normalized. The sigmoidal function should not be very steep, so that it does not saturate for all possible values of the input variables. Then, we augment $\phi$ by adding all the possible products between two entries of $\phi$ (this is similar to a multidimensional second-order Taylor expansion). The new terms added are referred to as second-order terms. If we need more approximation power, we further augment $\phi$ by including third-order terms, fourth-order terms, and so on.
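The three ingredients described above (the filter cascade, the HONN regressor, and the weight adaptation) can be combined in a short discrete-time sketch. Everything here is a simplified assumption rather than the published algorithm: the first-order filters $1/(s + \alpha)$ are advanced with forward Euler steps, `tanh` is used as the sigmoidal function, the weight update is written as the outer product of $\phi$ and the normalized error so that shapes agree with $\hat{y} = W^\tau\phi$, the projection of Equation (3.43) is replaced by a radial rescaling onto the ball of radius $M$, and the scalar test signal and target are synthetic.

```python
import numpy as np

def filter_cascade_step(chi, zeta, alpha, dt):
    """Euler step for p cascaded filters 1/(s + alpha); chi has shape (p, d)."""
    # Filter 1 is driven by zeta, filter i+1 by the output of filter i
    inputs = np.vstack([zeta[None, :], chi[:-1]])
    return chi + dt * (-alpha * chi + inputs)

def honn_regressor(xi, slope=1.0):
    """First-order sigmoidal terms plus all pairwise products of them."""
    s = np.tanh(slope * xi)
    iu = np.triu_indices(len(s))
    return np.concatenate([s, np.outer(s, s)[iu]])

def adapt_weights(W, phi, y, gamma, M, dt):
    """Euler step of a normalized gradient law, rescaled to keep |W| <= M."""
    eps = (y - W.T @ phi) / (1.0 + phi @ phi)   # normalized estimation error
    W_new = W + dt * gamma * np.outer(phi, eps)
    norm = np.linalg.norm(W_new)
    if norm > M:
        W_new = W_new * (M / norm)              # simplified projection
    return W_new, eps

p, d, dt, alpha = 2, 1, 0.01, 5.0               # assumed design values
n_xi = (p + 1) * d
n_phi = n_xi + n_xi * (n_xi + 1) // 2
chi = np.zeros((p, d))
W = np.zeros((n_phi, 1))

for k in range(2000):
    zeta = np.array([np.sin(2.0 * k * dt)])     # synthetic measured signal
    xi = np.concatenate([zeta, chi.ravel()])    # xi = [zeta, chi_2, ..., chi_{p+1}]
    phi = honn_regressor(xi)
    y = np.array([np.tanh(zeta[0]) ** 2])       # synthetic target output
    W, eps = adapt_weights(W, phi, y, gamma=50.0, M=100.0, dt=dt)
    chi = filter_cascade_step(chi, zeta, alpha, dt)
```

The synthetic target was chosen to coincide with one of the second-order regressor terms, so an exact weight vector exists and the sketch can be checked against it; with measured structural data no such exact representation should be expected.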
3.3 Examples and Case Studies
In this section we apply the identification methodologies to several case studies and we present experimental results from the following systems: (1) the Vincent Thomas Bridge; (2) steel and concrete nonlinear joints; (3) nonlinear viscous dampers. We also present representative simulation results from various nonlinear systems, illustrating the applicability of the developed techniques.

3.3.1 Modeling of the Vincent Thomas Bridge Using Earthquake Response Measurements

The Vincent Thomas Bridge in the Los Angeles metropolitan area is a critical artery for commercial traffic flow in and out of the Los Angeles harbor, and is at risk in the seismically active southern California region, particularly because it straddles the Palos Verdes fault zone. A combination of linear and nonlinear system identification techniques was used in the work of Smyth et al. (2003) to obtain a complete reduced-order, multi-input multi-output (MIMO) dynamic model of the bridge based on the dynamic response of the structure to the 1987 Whittier and 1994 Northridge earthquakes. The bridge has been instrumented with 26 accelerometers (16 on the bridge structure, and 10 at various locations on its base). Starting with the available acceleration measurements, the methodology of Section 3.2.2 is applied to the data set to develop a reduced-order, equivalent linear, MDOF model. The linear system identification method is combined with a nonparametric identification technique, as presented in Section 3.2.4, to generate a reduced-order nonlinear mathematical model suitable for use in subsequent studies to predict, with good fidelity, the total response of the bridge under arbitrary dynamic environments.
Results of this study yield measurements of the equivalent linear modal properties (frequencies, mode shapes, and nonproportional damping) as well as quantitative measures of the extent and nature of nonlinear interaction forces arising from strong ground shaking. It is shown that, for the particular subset of observations used in the identification procedure, the apparent nonlinearities in the system restoring forces are quite significant, and they contribute substantially to the improved fidelity of the model. The study also shows the potential of the presented identification techniques to detect slight changes in the structure’s influence coefficients, which may be indicators of damage and degradation in the structure being monitored. Figure 3.2 shows a set of representative time-domain plots of the nonlinear residual fitting. The representative results are from a single accelerometer station (station 7, located on a side span of the bridge and measuring lateral acceleration). The top plot shows the measured acceleration history. The second plot shows the linear, time-invariant model estimate. The third plot shows the residual (i.e., the difference of the previous two signals). In the fourth plot the nonparametrically modeled residual is given, and finally at the bottom the remaining total error is shown. For ease of comparison, identical scales are used for all plots.

3.3.2 Online Identification of Hysteretic Joints from Experimental Measurements

In this section we present experimental results, using the techniques of Section 3.2.3 for identifying the parametric nonlinear part of two systems: (1) a full-scale structural steel subassembly and (2) a structural reinforced concrete subassembly.

3.3.2.1 Identification Results for Full-Scale Structural Steel Subassembly

The experiments were conducted by means of a full-scale structural steel subassembly, made of ASTM A36 steel and consisting of a W16X40 wide flange beam framing into an 11-inch-square box column. Because the behavior of the column wall has an important effect on the overall behavior of the connection, an axial load was applied to the column to simulate the dead and live loads in an actual building column. Hydraulic actuators were used to impose the vertical loads as well as the induced moment at the connection. The applied tip loads and beam displacements were monitored by force and displacement sensors. The experimental measurements were processed to extract the value of the applied moment and the corresponding joint rotation, which were subsequently used to develop the hysteretic characteristics of the connection. Following the development of Section 3.2.3, the model of Equation (3.16) was assumed to represent the nonlinear hysteretic behavior of the system. A value of $N = 3$ was chosen for the number of terms in the sum of Equation (3.18). Figure 3.3(a) shows the phase-plane plots (restoring force versus displacement) of the measured (solid) and identified (dashed) restoring forces.
P1: Binaya Dash November 16, 2007
15:3
7985
7985˙C003
Modeling and Control Problems in Building Structures and Bridges
[Figure 3.2: five stacked time-history panels, top to bottom: measured acceleration; equivalent linear estimate; nonlinear residual; nonparametric model estimate of the nonlinear residual; error in fitting of the nonlinear residual. Vertical axes: cm/sec², −200 to 200. Horizontal axis: time, 0 to 80.]
FIGURE 3.2 Sample nonlinear identification result from the hybrid identification of a multi-input multioutput nonlinear model of Vincent Thomas Bridge under earthquake excitation.
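The sequence of signals plotted in Figure 3.2 (measured response, linear estimate, residual, model of the residual, remaining error) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic "measured" signal, the single-coefficient least-squares fits, and the cubic basis standing in for the nonparametric model of Section 3.2.4. It is not the bridge model or the authors' procedure.

```python
import math

def fit_single_coefficient(basis, target):
    """Least-squares coefficient c minimizing sum((target - c * basis)^2)."""
    num = sum(b * t for b, t in zip(basis, target))
    den = sum(b * b for b in basis)
    return num / den

def rms(signal):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

# Hypothetical "measured" response: a linear term plus a weaker cubic term.
x = [1.5 * math.sin(0.05 * n) for n in range(400)]
measured = [2.0 * xi + 0.5 * xi ** 3 for xi in x]

# Step 1: equivalent linear estimate.
k_lin = fit_single_coefficient(x, measured)
linear_estimate = [k_lin * xi for xi in x]

# Step 2: nonlinear residual = measured signal minus linear estimate.
residual = [m - le for m, le in zip(measured, linear_estimate)]

# Step 3: model the residual on a cubic basis (stand-in for the nonparametric model).
cubic = [xi ** 3 for xi in x]
c_res = fit_single_coefficient(cubic, residual)
residual_estimate = [c_res * ci for ci in cubic]

# Step 4: remaining error after the residual has been modeled.
error = [rs - re for rs, re in zip(residual, residual_estimate)]

print(rms(measured), rms(residual), rms(error))
```

Each least-squares stage can only reduce the RMS level of what is left, which is why plotting all five signals on identical scales, as in Figure 3.2, makes the contribution of each stage directly comparable.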
The agreement is seen to be extremely good. Figure 3.3(b) shows the convergence of the (2N + 1) parameter clusters of vector θ. One can see that the system is degrading, as is evident from the evolution of the parameter θ0 . The term θ0 , which basically represents the stiffness of the system, can be seen to be steadily decreasing in Figure 3.3(b). This decrease in stiffness accounts for the slight clockwise rotation of the hysteretic loops in Figure 3.3(a).
[Figure 3.3(a): phase-plane plot of restoring force r(x, ẋ) versus displacement x, both axes from −1 to 1. Dashed line: ẑ; solid line: z.]
FIGURE 3.3 Adaptive identification of structural steel subassembly undergoing cyclic testing. (a) Phase-plane plot of restoring force prediction (dashed) compared to measured force (solid); (b) evolution of the parameter estimates.
3.3.2.2 Identification Results for a Structural Reinforced Concrete Subassembly

The concrete specimen was a one-third scale model of a reinforced concrete, multistory frame joint prototype. Details of the test article and a photograph of the fabricated specimen and test apparatus are available in the work of Masri et al. (1994). The concrete specimen was tested by means of a servohydraulic device which imposed a prescribed dynamic motion at the specimen boundary. Again, following the development of Section 3.2.3, the model of Equation (3.16) was assumed to represent the nonlinear behavior of the system. Figure 3.4(a) shows the phase plots of the measured concrete restoring force (solid curve) and its estimate (dashed curve). It is seen that the system, in addition to its hysteretic characteristics, exhibits dead-space nonlinearities
[Figure 3.3(b): evolution of the parameter estimates θ0 to θ6; horizontal axis from 0 to 1000, vertical axis from −0.03 to 0.07.]
FIGURE 3.3 (Continued).
as well. Figure 3.4(b) shows the evolution of the estimated parameters. It is seen that the identified model approximates very accurately the characteristics of the structure, even though the restoring force incorporates features associated with highly nonlinear behavior exhibited by hysteretic as well as dead-space-type nonlinearities.

3.3.3 Models of Nonlinear Viscous Dampers from Experimental Measurements

Nonlinear viscous dampers are increasingly incorporated into retrofit and new design strategies of large civil structures to dissipate energy from strong dynamic loads, such as those resulting from wind and seismic activity. An experimental data set from a full-scale nonlinear viscous damper under dynamic load was used to develop different types of parametric and nonparametric models as presented in Section 3.2. Assuming a parametric model commonly used in the design of nonlinear viscous dampers (called the simple design model [SDM] here), an adaptive least-squares identification approach was used to identify the model’s
[Figure 3.4 panels: (a) restoring force versus displacement, panel title "Least-Squares with Forgetting Factor ID of Reinforced Concrete Data", force from −0.8 to 0.8, displacement from −0.4 to 0.4; (b) θ parameters (θ0 to θ6) versus samples, 0 to 300.]
FIGURE 3.4 Adaptive identification of structural reinforced concrete subassembly undergoing cyclic testing. (a) Phase-plane plot of restoring force prediction vs. exact measured force; (b) evolution of the estimated parameters.
parameters. The results of the parametric modeling are shown in Figure 3.5 (a, d). The nonparametric restoring force method presented in Sections 3.2.4.1 to 3.2.4.3 was used to obtain the results shown in Figure 3.5 (b, e), whereas Figure 3.5 (c, f) shows the results using nonparametric neural networks as discussed in Section 3.2.4.4. The phase plots show approximately a one-cycle period of the damper response (solid line for measured, and dashed for identified forces). In the figure, the first row of plots shows the relationship between displacement and force, and the second row shows the relationship between velocity and force for each investigated identification method. Details regarding this study are available in the work of Yun et al. (2007).

3.3.4 Further Examples of Parametric Identification of MDOF Systems

In this section we present simulation results from a three-DOF structure. The system is modeled as a three-story building, consisting of three masses connected in a chain-like topology (i.e., the structure support is connected to mass 1; mass 1 is connected to mass 2; and mass 2 is connected to mass 3). No other connections are present in the model. The three elements connecting the three masses are assumed to have unknown restoring force characteristics. Because the restoring forces are unknown, the more general model of Equation (3.18) was used. The model was developed to identify hysteretic elements, but it can also identify, as special cases, linear elements and polynomial nonlinearities. Results of the application of the online parametric identification method of Section 3.2.3 are shown in Figure 3.6. The left-hand panels in Figure 3.6 correspond to the three phase diagrams in which each element restoring force is plotted versus the corresponding interstory displacement. The right-hand panels show the evolution of the θ parameters corresponding to each of the three elements.
The model correctly identified all three elements: the top (third) element was identified as a polynomial-type nonlinearity (damped Duffing oscillator); the middle element was identified as a linear spring-damper connection; and the bottom (first story) connection was identified as a hysteretic element. In each of the three elements, the system parameters reach their correct asymptotic values within a few seconds of tracking time. Figure 3.7 shows an important illustration of the application of the parametric identification approach in the structural health monitoring field. Synthetic data were generated to simulate a situation in which a nonlinear SDOF system had its stiffness suddenly reduced from a value of 5 to 3, thus simulating abrupt damage to the system. It is clear from the phase-domain plot on the left-hand side of Figure 3.7, as well as from the time-history plots of the evolution of the system parameters, that the online monitoring approach can accurately detect the incipient damage state and track its changing magnitude.
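The damage-detection scenario described above (a stiffness drop from 5 to 3 at t = 5 sec) can be mimicked with a small sketch of recursive least squares with a forgetting factor. The sketch rests on several assumptions that are not taken from the chapter: a linear-in-parameters restoring force r = θ0·x + θ1·v instead of the hysteretic model of Equation (3.18), a hypothetical two-tone displacement history, noise-free data, and the values chosen for the forgetting factor and the initial covariance.

```python
import math

def rls_step(theta, P, phi, y, lam):
    """One recursive least-squares update with forgetting factor lam (two parameters)."""
    P_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
             P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = lam + phi[0] * P_phi[0] + phi[1] * P_phi[1]
    gain = [P_phi[0] / denom, P_phi[1] / denom]
    error = y - (theta[0] * phi[0] + theta[1] * phi[1])
    theta = [theta[0] + gain[0] * error, theta[1] + gain[1] * error]
    # P stays symmetric, so phi^T P equals (P phi)^T.
    P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(2)] for i in range(2)]
    return theta, P

dt, lam = 0.01, 0.98                      # assumed sample time and forgetting factor
theta = [0.0, 0.0]                        # [stiffness, damping] estimates
P = [[1000.0, 0.0], [0.0, 1000.0]]        # large initial covariance (assumption)
history = []
for n in range(2000):                     # 20 sec of synthetic data
    t = n * dt
    x = math.sin(2.0 * t) + 0.5 * math.sin(5.3 * t)            # displacement
    v = 2.0 * math.cos(2.0 * t) + 2.65 * math.cos(5.3 * t)     # velocity
    stiffness = 5.0 if t < 5.0 else 3.0                        # abrupt "damage"
    force = stiffness * x + 0.4 * v                            # measured restoring force
    theta, P = rls_step(theta, P, [x, v], force, lam)
    history.append(theta[0])

stiffness_before = history[499]   # last estimate before the change
stiffness_after = history[-1]     # estimate at the end of the record
print(stiffness_before, stiffness_after, theta[1])
```

The forgetting factor discounts old samples geometrically, which is what lets the stiffness estimate leave its pre-damage value and settle on the new one; a factor of 1 would average over the whole record and respond far more slowly.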
[Figure 3.5: six phase plots of force (kN), −800 to 800. Top row: force versus displacement (mm), −200 to 200, for (a) SDM, (b) RFM, (c) ANN. Bottom row: force versus velocity (mm/sec), −500 to 500, for (d) SDM, (e) RFM, (f) ANN.]
FIGURE 3.5 Sample identification results of nonlinear viscous damper models for a representative experimental data set. (a, d) Identification using the parametric simple design model (SDM); (b, e) identification using the nonparametric restoring force method (RFM); (c, f) identification using nonparametric artificial neural networks (ANN).
[Figure 3.6, left-hand panels: restoring force versus interstory displacement for Element #3 (Duffing nonlinearity), Element #2 (linear), and Element #1 (Bouc-Wen nonlinearity).]
FIGURE 3.6 Parametric identification of three elements in a nonlinear three-degree-of-freedom system.
3.3.5 Nonparametric Identification through Volterra–Wiener Neural Networks

In this section the Volterra–Wiener neural network of Section 3.2.4.4 is used to identify the restoring forces of a three-DOF system, representing a three-story building, similar to the one of Section 3.3.4. Using a wideband random signal as the base excitation, the system was simulated for 40 sec. The neural estimator was also running for the entire 40-sec duration. The network weights were allowed to adapt during this period. Figure 3.8 presents plots of the restoring forces (solid curves) and their estimates (dashed curves) produced by the neural estimator. The neural network weights are initially set to small
[Figure 3.6, right-hand panels: θ parameters versus time (sec), 0 to 20, for the Duffing element (θ0, θ1, θ2), the linear element (θ0, θ1), and the Bouc-Wen element (θ0, θ4).]
FIGURE 3.6 (Continued).
random values. The adaptation is on from time t = 0, and it is seen from the figures that it takes about 15 sec (a few response cycles) for the network weights to adapt and to estimate the restoring forces exactly. Next, the weights of the neural network are fixed to the values already obtained. Now a different base excitation is used, in order to validate the approximation capabilities of the VWNN. Figure 3.9 presents the restoring forces (solid curves) and their estimates (dashed curves) produced by the fixed VWNN. It is seen that the agreement is excellent, although the network has never been trained on the responses to the specific base excitation.
[Figure 3.7 panels: restoring force versus interstory displacement for Element #1 (estimated force and actual restoring force); θ parameters for Element #1 (θ0, θ4) versus time, 0 to 20, panel title "Identification of Element #1, where Stiffness Shifts from 5 to 3 at t = 5 sec".]
FIGURE 3.7 Detection of change in nonlinear element when at time t = 5 sec, the stiffness changes abruptly from 5 to 3.
[Figure 3.8: time histories of restoring forces and their estimates for the 3rd, 2nd, and 1st stories; horizontal axis: time (secs), 0 to 45.]
FIGURE 3.8 Nonparametric identification using VWNN of three elements in a nonlinear three-degree-of-freedom system.
3.4 Conclusions
The great variety of challenging situations encountered in the modeling of realistic structural dynamic systems for purposes of simulation, active control, or structural health monitoring requires a toolkit of methods and approaches for developing robust mathematical models, of varying levels of sophistication and format, that capture the underlying complex physical phenomena embedded within the structural systems of interest. This chapter provides a state-of-the-art approach, incorporating parametric as well as nonparametric system identification methods, for developing parsimonious nonlinear (i.e., not-necessarily-linear) models of arbitrary structural systems. The models can be used in a variety of applications, spanning the range from micro-electro-mechanical systems (MEMS) devices, to aerospace structures, to dispersed civil infrastructure systems. A wide variety of case studies is provided to illustrate the use of the modeling tools for online or off-line identification situations, using experimental
[Figure 3.9: time histories of restoring forces and their estimates for the 3rd, 2nd, and 1st stories; horizontal axis: time (secs), 0 to 45.]
FIGURE 3.9 Time history of restoring forces and their estimates after training using VWNN, but with different base excitation.
measurements and simulation results, to represent many challenging types of stationary as well as nonstationary nonlinearities such as polynomial-type, hysteresis, limited-slip, and so on.
References

Baber, T. T. and Wen, Y. K. (1982). “Stochastic Response of Multistory Yielding Frames,” Earthquake Eng. Struct. Dyn. 10, 403–416.
Beck, J. L. and Jennings, P. C. (1980). “Structural Identification Using Linear Models and Earthquake Records,” Int. J. Earthquake Eng. Struct. Dyn. 8, 145–160.
Beck, J. L. (1990). “Statistical System Identification of Structures,” Proc. 5th Int. Conf. Structural Safety and Reliability, ASCE, New York.
Bouc, R. (1967). “Forced Vibration of Mechanical Systems with Hysteresis, Abstract,” Proc. 4th Conf. Nonlinear Oscillation, Prague, Czechoslovakia.
Caughey, T. K. (1960). “Random Excitation of a System with Bilinear Hysteresis,” J. Appl. Mech. Trans. ASME 27, 649–652.
Chassiakos, A. G. and Masri, S. F. (1991). “Identification of the Internal Forces of Structural Systems Using Feedforward Multilayer Networks,” Computing Systems Eng. 2(1), 100–110.
Chassiakos, A., Masri, S., Smyth, A., and Caughey, T. (1998). “On-Line Identification of Hysteretic Systems,” Trans. ASME J. Appl. Mech. 65(1), 194–203.
Housner, G. W., Bergman, L. A., Caughey, T. K., Chassiakos, A. G., Claus, R. O., Masri, S. F., Skelton, R. E., Soong, T. T., Spencer, B. F., and Yao, J. T. P. (1997). “Structural Control: Past, Present and Future,” ASCE J. Eng. Mech. 123(9), 897–971.
Ioannou, P. A. and Datta, A. (1991). “Robust Adaptive Control: A Unified Approach,” Proc. IEEE 79, 1736–1768.
Kerschen, G., Worden, K., Vakakis, A., and Golinval, J-C. (2006). “Past, Present and Future of Nonlinear System Identification in Structural Dynamics,” Mech. Systems Signal Proc. 20(3), 505–592.
Kosmatopoulos, E. B., Polycarpou, M. M., Christodoulou, M. A., and Ioannou, P. A. (1995). “High-Order Neural Network Structures for Identification of Dynamical Systems,” IEEE Trans. Neural Networks 6(2), 422–431.
Kosmatopoulos, E. B., Smyth, A. W., Masri, S. F., and Chassiakos, A. G. (2001). “Robust Adaptive Neural Estimation of Restoring Forces in Nonlinear Structures,” Trans. ASME J. Appl. Mech. 68(6), 880–893.
Masri, S. F. and Caughey, T. K. (1979). “A Nonparametric Identification Technique for Nonlinear Dynamic Problems,” Trans. ASME J. Appl. Mech. 46(2), 433–447.
Masri, S. F., Miller, R. K., Saud, A. F., and Caughey, T. K. (1987a). “Identification of Nonlinear Vibrating Structures. I: Formulation,” Trans. ASME J. Appl. Mech. 109, 918–922.
Masri, S. F., Miller, R. K., Saud, A. F., and Caughey, T. K. (1987b). “Identification of Nonlinear Vibrating Structures. II: Applications,” Trans. ASME J. Appl. Mech. 109, 923–929.
Masri, S. F., Chassiakos, A. G., and Caughey, T. K. (1992). “Structure-Unknown Nonlinear Dynamic Systems: Identification through Neural Networks,” Smart Materials & Structures 1(1), 45–56.
Masri, S. F., Chassiakos, A. G., and Caughey, T. K. (1993). “Identification of Nonlinear Dynamic Systems Using Neural Networks,” Trans. ASME J. Appl. Mech. 60, 123–133.
Masri, S. F., Agbabian, M. S., Abdel-Ghaffar, A. M., Highazy, M., Claus, R. O., and de Vries, M. J. (1994). “An Experimental Study of Embedded Fiber-Optic Strain Gauges in Concrete Structures,” J. Eng. Mech. Div. Am. Soc. Civ. Eng. 120(8), 1696–1717.
Masri, S. F., Tasbihgoo, F., Caffrey, J. P., Smyth, A. W., and Chassiakos, A. G. (2006). “Data-Based Model-Free Representation of Complex Hysteretic MDOF Systems,” J. Struct. Control Health Monitoring 13(1), 365–387.
Smyth, A. W., Pei, J.-S., and Masri, S. F. (2003). “System Identification of the Vincent Thomas Suspension Bridge Using Earthquakes Records,” Earthquake Eng. Struct. Dyn. 33, 339–367.
Wen, Y. K. (1989). “Methods of Random Vibration for Inelastic Structures,” Appl. Mech. Rev. 42(2), 39–52.
Yun, H.-B., Tasbihgoo, F., Masri, S. F., Caffrey, J. P., Wolfe, R. W., Makris, N., and Black, C. (2007). “Comparison of Modeling Approaches for Full-Scale Nonlinear Viscous Dampers,” J. Vibration Control (in press).
P1: Binaya Dash November 16, 2007
18:17
7985
7985˙C004
4 Model-Free Adaptive Dynamic Programming Algorithms for H-Infinity Control of Complex Linear Systems
Asma Al-Tamimi, Murad Abu-Khalaf, and Frank L. Lewis
CONTENTS
4.1 Introduction
4.2 Discrete-Time State Feedback Control for Zero-Sum Games
4.3 Heuristic Dynamic Programming (HDP)
4.3.1 Derivation of HDP for Zero-Sum Games
4.3.2 Online Implementation of HDP Algorithm
4.3.3 Convergence of Zero-Sum Game HDP
4.4 Action-Dependent Heuristic Dynamic Programming (ADHDP): Q-Learning
4.4.1 Derivation of Model-Free Online Tuning Based on the Q-Learning Algorithm (ADHDP)
4.4.2 Online Implementation of Q-Learning Algorithm
4.4.3 Convergence of Zero-Sum Game Q-Learning
4.5 Online ADP H∞ Autopilot Controller Design for an F-16 Aircraft
4.5.1 HDP-Based H∞ Autopilot Controller Design
4.5.2 Q-Learning-Based H∞ Autopilot Controller Design
4.6 Conclusion
References
4.1 Introduction
In this chapter the design of optimal controllers for the discrete-time linear quadratic zero-sum games that appear in the H∞ optimal control problem is addressed. The method used to obtain the optimal controller is the approximate dynamic programming (ADP) technique. In this chapter two methods
are presented to obtain the optimal controller, and both yield online algorithms. We present a technique for online implementation as well as, for the first time, convergence proofs of ADP methods for H∞ discrete-time control. The first algorithm, heuristic dynamic programming (HDP), is a method to find the optimal controller forward in time; in this algorithm the system model is needed. The second algorithm, action-dependent heuristic dynamic programming (ADHDP) or Q-learning, improves on the first in that the system model is not needed. This leads to a model-free optimal controller design, which is in fact an adaptive control design that converges to the optimal H∞ solution. To our knowledge, Q-learning provides the first direct adaptive control technique that converges to an H∞ controller. These are ADP algorithms that create agents that learn to coexist. ADP was proposed by Werbos [14], Barto et al. [6], Widrow et al. [24], Howard [15], Watkins [12], Bertsekas and Tsitsiklis [21], Prokhorov and Wunsch [20], and others to solve optimal control problems forward in time. In these works, the optimal control law (an action network) and the value function (a critic network) are modeled as parametric structures, that is, neural networks. This is combined with incremental optimization, such as reinforcement learning, to tune and improve both networks forward in time, so that the method can be implemented in actual control systems. This overcomes the computational complexity associated with dynamic programming, which is an off-line technique that requires a backward-in-time solution procedure [11]. Werbos [16] classified ADP approaches into four main schemes: heuristic dynamic programming (HDP), dual heuristic dynamic programming (DHP), action-dependent heuristic dynamic programming (ADHDP), also known as Q-learning, and action-dependent dual heuristic dynamic programming (ADDHP). Bradtke et al.
[1], Hagen and Krose [3], and Landelius [9] applied ADP techniques to the discrete-time linear quadratic optimal control problem. The connection with algebraic Riccati equations was emphasized in Reference [9]. The current status of ADP is given in Reference [2]. The organization of this chapter is as follows. In Section 4.2, zero-sum games for a discrete-time linear system with quadratic infinite horizon cost are revisited. In Section 4.3, an HDP algorithm is proposed to solve the zero-sum game forward in time. Section 4.4 extends the results to the ADHDP case. Finally, an H∞ autopilot design example for an F-16 aircraft is given to show the practical effectiveness of the two ADP techniques.
4.2 Discrete-Time State Feedback Control for Zero-Sum Games
In this section, the solution of the zero-sum game of a linear discrete-time system with quadratic cost is derived under a state feedback information structure. The policies for each of the two players, control and disturbance, are derived with the associated Riccati equation. Specific forms for both the
Riccati equation and the control and disturbance policies are derived that are required for applications in ADP. Consider the following discrete-time linear system:

$$x_{k+1} = A x_k + B u_k + E w_k, \qquad y_k = x_k, \tag{4.1}$$

where $x \in \mathbb{R}^n$, $y \in \mathbb{R}^p$, $u_k \in \mathbb{R}^{m_1}$ is the control input, and $w_k \in \mathbb{R}^{m_2}$ is the disturbance input. Consider the infinite-horizon value function:

$$V(x_k) = \sum_{i=k}^{\infty} \left( x_i^T R x_i + u_i^T u_i - \gamma^2 w_i^T w_i \right) \tag{4.2}$$
for a prescribed fixed value of $\gamma$. In the H-infinity control problem, $\gamma$ is the desired $L_2$ gain for disturbance attenuation. It is desired to find the optimal control $u_k^*$ for the worst-case disturbance $w_k^*$, in which the infinite-horizon cost is to be minimized by player 1, $u_k$, and maximized by player 2, $w_k$. Here the class of strictly feedback stabilizing policies is considered. For any stabilizing sequence of policies $u_k$ and $w_k$, one can write the infinite-horizon cost-to-go as:

$$\begin{aligned}
V(x_k) &= \sum_{i=k}^{\infty} \left( x_i^T R x_i + u_i^T u_i - \gamma^2 w_i^T w_i \right) \\
&= x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + \sum_{i=k+1}^{\infty} \left( x_i^T R x_i + u_i^T u_i - \gamma^2 w_i^T w_i \right) \\
&= x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + V(x_{k+1}) \\
&= r(x_k, u_k, w_k) + V(x_{k+1}).
\end{aligned} \tag{4.3}$$

It is known that for any stabilizing policies, the cost function is a finite quadratic function of the states and can be written as $V(x_k) = x_k^T M x_k$ for some symmetric positive semidefinite matrix $M$. Therefore, Equation (4.3) can be written as:

$$x_k^T M x_k = x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + x_{k+1}^T M x_{k+1}. \tag{4.4}$$

Using the dynamic programming principle, the optimization problem in Equations (4.1) and (4.2) can be written as:

$$\begin{aligned}
V^*(x_k) &= \min_{u} \max_{w} \left( r(x_k, u_k, w_k) + V^*(x_{k+1}) \right) \\
&= \max_{w} \min_{u} \left( r(x_k, u_k, w_k) + V^*(x_{k+1}) \right).
\end{aligned} \tag{4.5}$$
If we assume that there exists a solution to the game algebraic Riccati equation (GARE) as in Equation (4.9) that is strictly feedback stabilizing, then it is known that the policies are in saddle-point equilibrium, that is, minimax is equal to maximin, in the restricted class of feedback stabilizing policies under
which $x_k \to 0$ as $k \to \infty$ for all $x_0 \in \mathbb{R}^n$. Assuming that the game has a value and is solvable, in order to have a unique feedback saddle-point in the class of strictly feedback stabilizing policies, the inequalities in Equations (4.6) and (4.7) should be satisfied:

$$I - \gamma^{-2} E^T P E > 0 \tag{4.6}$$

$$I + B^T P B > 0 \tag{4.7}$$

where $P \ge 0$ such that:

$$V^*(x_k) = x_k^T P x_k \tag{4.8}$$

and satisfies the GARE [5, 17], which is given as:

$$P = A^T P A + R - \begin{bmatrix} A^T P B & A^T P E \end{bmatrix} \begin{bmatrix} I + B^T P B & B^T P E \\ E^T P B & E^T P E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix}. \tag{4.9}$$

Substituting Equation (4.8) in Equation (4.5) and applying the Bellman optimality principle, one has:

$$\begin{aligned}
V^*(x_k) &= \min_{u} \max_{w} \left( r(x_k, u_k, w_k) + V^*(x_{k+1}) \right) \\
&= \min_{u} \max_{w} \left( x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + x_{k+1}^T P x_{k+1} \right).
\end{aligned} \tag{4.10}$$

This can be rewritten as:

$$x_k^T P x_k = \min_{u} \max_{w} \left( x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + (A x_k + B u_k + E w_k)^T P (A x_k + B u_k + E w_k) \right). \tag{4.11}$$
The optimal controller can be derived from Equation (4.11) by satisfying the first-order necessary condition, and is given as:

$$u_k^* = \left( I + B^T P B - B^T P E (E^T P E - \gamma^2 I)^{-1} E^T P B \right)^{-1} \left( B^T P E (E^T P E - \gamma^2 I)^{-1} E^T P A - B^T P A \right) x_k \tag{4.12}$$

so the optimal control is a state feedback with gain:

$$L = \left( I + B^T P B - B^T P E (E^T P E - \gamma^2 I)^{-1} E^T P B \right)^{-1} \left( B^T P E (E^T P E - \gamma^2 I)^{-1} E^T P A - B^T P A \right) \tag{4.13}$$

and the worst-case disturbance is:

$$w_k^* = \left( E^T P E - \gamma^2 I - E^T P B (I + B^T P B)^{-1} B^T P E \right)^{-1} \left( E^T P B (I + B^T P B)^{-1} B^T P A - E^T P A \right) x_k \tag{4.14}$$

so the optimal disturbance is a state feedback with gain:

$$K = \left( E^T P E - \gamma^2 I - E^T P B (I + B^T P B)^{-1} B^T P E \right)^{-1} \left( E^T P B (I + B^T P B)^{-1} B^T P A - E^T P A \right). \tag{4.15}$$
Note that the inverse matrices in Equations (4.13) and (4.14) exist due to Equations (4.6) and (4.7).

LEMMA 1
If $(I - \gamma^{-2} E^T P E)$ is invertible, then $(I - \gamma^{-2} E E^T P)$ is also invertible.

PROOF
Because $(I - \gamma^{-2} E^T P E)$ is invertible, the following expression is valid:

$$I + \gamma^{-2} E (I - \gamma^{-2} E^T P E)^{-1} E^T P.$$

Applying the matrix inversion lemma, it can be shown that:

$$I + \gamma^{-2} E (I - \gamma^{-2} E^T P E)^{-1} E^T P = (I - \gamma^{-2} E E^T P)^{-1}.$$

Hence, $I - \gamma^{-2} E E^T P$ is invertible and $I - \gamma^{-2} E E^T P > 0$.

LEMMA 2
The optimal policies for control $L$ and disturbance $K$, in Equations (4.13) and (4.15), respectively, are equivalent to the ones that appear in References [7] and [8]:

$$L = -B^T P \left( I + B B^T P - \gamma^{-2} E E^T P \right)^{-1} A$$

$$K = \gamma^{-2} E^T P \left( I + B B^T P - \gamma^{-2} E E^T P \right)^{-1} A.$$

PROOF
Apply the matrix inversion lemma.
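For a scalar system the equivalence stated in Lemma 2 can be checked by direct evaluation. The sketch below is illustrative only: the plant numbers are hypothetical, `gains_direct` is a scalar transcription of Equations (4.13) and (4.15), and `gains_compact` is a scalar transcription of the compact forms with the factor written as γ⁻² inside the inverse.

```python
def gains_direct(p, a, b, e, gamma):
    """Scalar versions of Equations (4.13) and (4.15)."""
    s = e * p * e - gamma ** 2          # E^T P E - gamma^2 I
    t = 1.0 + b * p * b                 # I + B^T P B
    l = (b * p * e / s * e * p * a - b * p * a) / (t - b * p * e / s * e * p * b)
    k = (e * p * b / t * b * p * a - e * p * a) / (s - e * p * b / t * b * p * e)
    return l, k

def gains_compact(p, a, b, e, gamma):
    """Scalar versions of the compact forms of Lemma 2."""
    d = 1.0 + b * b * p - e * e * p / gamma ** 2    # I + B B^T P - gamma^-2 E E^T P
    return -b * p * a / d, e * p * a / (gamma ** 2 * d)

# Hypothetical scalar plant; p is any value for which the inverses exist.
a, b, e, gamma = 0.9, 1.0, 0.5, 2.0
max_gap = 0.0
for p in (0.5, 1.0, 2.0, 5.0):
    l1, k1 = gains_direct(p, a, b, e, gamma)
    l2, k2 = gains_compact(p, a, b, e, gamma)
    max_gap = max(max_gap, abs(l1 - l2), abs(k1 - k2))
print(max_gap)
```

The comparison does not require `p` to solve the Riccati equation, because the lemma is an algebraic identity in P.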
Next it is shown that, under state feedback information structure, the value function of the game $V^*(x_k) = x_k^T P x_k$ satisfies a certain Riccati equation. The form of the Riccati equation derived in this chapter is similar to the one appearing in Reference [5], which was derived under full information structure. Moreover, it will be shown that the Riccati equation derived in this chapter is equivalent to the work in References [7, 8] derived under the same state feedback information structure. Note that Equation (4.11) can be rewritten as follows:

$$x_k^T P x_k = x_k^T R x_k + u_k^{*T} u_k^* - \gamma^2 w_k^{*T} w_k^* + (A x_k + B u_k^* + E w_k^*)^T P (A x_k + B u_k^* + E w_k^*). \tag{4.16}$$

This is equivalent to:

$$\begin{aligned}
P &= R + L^T L - \gamma^2 K^T K + (A + BL + EK)^T P (A + BL + EK) \\
&= R + L^T L - \gamma^2 K^T K + A_{cl}^T P A_{cl}
\end{aligned} \tag{4.17}$$

where $A_{cl} = A + BL + EK$. Equation (4.17) is the closed-loop Riccati equation.
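LEMMA 3
Substituting the policies, Equations (4.13) and (4.15), in Equation (4.17), one can obtain the Riccati equation that appears in Reference [5], given by:

$$P = A^T P A + R - \begin{bmatrix} A^T P B & A^T P E \end{bmatrix} \begin{bmatrix} I + B^T P B & B^T P E \\ E^T P B & E^T P E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix}. \tag{4.18}$$

PROOF
The control policy and the disturbance policy can be written as follows:

$$L = D_{11}^{-1} \left( A_{12} A_{22}^{-1} E^T P A - B^T P A \right) \tag{4.19}$$

$$K = D_{22}^{-1} \left( A_{21} A_{11}^{-1} B^T P A - E^T P A \right) \tag{4.20}$$

where

$$\begin{aligned}
D_{11}^{-1} &= \left( I + B^T P B - B^T P E (E^T P E - \gamma^2 I)^{-1} E^T P B \right)^{-1} \\
A_{12} &= B^T P E \\
A_{21} &= E^T P B \\
A_{11} &= I + B^T P B \\
A_{22} &= E^T P E - \gamma^2 I \\
D_{22}^{-1} &= \left( E^T P E - \gamma^2 I - E^T P B (I + B^T P B)^{-1} B^T P E \right)^{-1}.
\end{aligned}$$

From Equations (4.6) and (4.7), one concludes that $D_{11}$ and $D_{22}$ are invertible. Equations (4.19) and (4.20) can be written as follows:

$$\begin{bmatrix} L \\ K \end{bmatrix} = - \begin{bmatrix} D_{11}^{-1} & -D_{11}^{-1} A_{12} A_{22}^{-1} \\ -D_{22}^{-1} A_{21} A_{11}^{-1} & D_{22}^{-1} \end{bmatrix} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix}. \tag{4.21}$$

It is known that:

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} D_{11}^{-1} & -D_{11}^{-1} A_{12} A_{22}^{-1} \\ -D_{22}^{-1} A_{21} A_{11}^{-1} & D_{22}^{-1} \end{bmatrix}.$$

Therefore, one can rewrite Equation (4.21) as follows:

$$\begin{bmatrix} L \\ K \end{bmatrix} = - \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix} = - \begin{bmatrix} I + B^T P B & B^T P E \\ E^T P B & E^T P E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix}. \tag{4.22}$$

Equation (4.17) can be written as follows:

$$\begin{aligned}
P &= (A + BL + EK)^T P (A + BL + EK) + L^T L - \gamma^2 K^T K + R \\
&= A^T P A + A^T P B L + A^T P E K + L^T B^T P A + K^T E^T P A \\
&\quad + \begin{bmatrix} L^T & K^T \end{bmatrix} \begin{bmatrix} B^T P B & B^T P E \\ E^T P B & E^T P E \end{bmatrix} \begin{bmatrix} L \\ K \end{bmatrix} + \begin{bmatrix} L^T & K^T \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & -\gamma^2 I \end{bmatrix} \begin{bmatrix} L \\ K \end{bmatrix} + R.
\end{aligned} \tag{4.23}$$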
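Substituting Equation (4.22) in Equation (4.23), one has:

$$\begin{aligned}
P &= A^T P A + A^T P B L + A^T P E K + L^T B^T P A + K^T E^T P A \\
&\quad - \begin{bmatrix} L^T & K^T \end{bmatrix} \begin{bmatrix} I + B^T P B & B^T P E \\ E^T P B & E^T P E - \gamma^2 I \end{bmatrix} \begin{bmatrix} I + B^T P B & B^T P E \\ E^T P B & E^T P E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix} + R \\
&= A^T P A + A^T P B L + A^T P E K + R.
\end{aligned} \tag{4.24}$$

Equation (4.24) can be written as:

$$P = A^T P A + \begin{bmatrix} A^T P B & A^T P E \end{bmatrix} \begin{bmatrix} L \\ K \end{bmatrix} + R. \tag{4.25}$$

Substituting Equation (4.22) in Equation (4.25), one has the desired Riccati equation:

$$P = A^T P A + R - \begin{bmatrix} A^T P B & A^T P E \end{bmatrix} \begin{bmatrix} I + B^T P B & B^T P E \\ E^T P B & E^T P E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P A \\ E^T P A \end{bmatrix}. \tag{4.26}$$

It can be seen that Equation (4.26) is the same as Equation (4.18). It is shown in Reference [5] that Equation (4.18) is equivalent to the game algebraic Riccati equation (GARE) that appears in References [7, 8], which is given as:

$$P = R + A^T P \left( I + (B B^T - \gamma^{-2} E E^T) P \right)^{-1} A.$$

In the next sections the solution for the optimal controller will be found using the two ADP algorithms, HDP and ADHDP.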
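The chain of equalities in the proof of Lemma 3 can be verified numerically in the scalar case. The following sketch is an illustration under assumptions: the plant numbers are hypothetical, the compact GARE is taken with the γ⁻² factor so that in scalar form it reduces to a quadratic equation in `p`, and the gains are obtained from the scalar version of Equation (4.22).

```python
import math

# Hypothetical scalar plant data (illustration only).
a, b, e, r, gamma = 0.9, 1.0, 0.5, 1.0, 2.0

# Scalar compact GARE: p = r + a^2 p / (1 + c p), with c = b^2 - e^2 / gamma^2,
# i.e. c p^2 + (1 - r c - a^2) p - r = 0. The positive root is taken (needs c > 0).
c = b * b - e * e / gamma ** 2
q = 1.0 - r * c - a * a
p = (-q + math.sqrt(q * q + 4.0 * c * r)) / (2.0 * c)

# Gains from the scalar version of Equation (4.22): [l, k]^T = -G^{-1} h.
g11, g12 = 1.0 + b * p * b, b * p * e
g21, g22 = e * p * b, e * p * e - gamma ** 2
h1, h2 = b * p * a, e * p * a
det = g11 * g22 - g12 * g21
l = -(g22 * h1 - g12 * h2) / det
k = -(-g21 * h1 + g11 * h2) / det

# Right-hand side of Equation (4.25), which equals that of (4.26).
block_rhs = a * p * a + (a * p * b) * l + (a * p * e) * k + r

# Right-hand side of the closed-loop Riccati equation (4.17).
a_cl = a + b * l + e * k
closed_loop_rhs = r + l * l - gamma ** 2 * k * k + a_cl * p * a_cl

saddle_margin = 1.0 - e * p * e / gamma ** 2    # scalar form of condition (4.6)
print(p, block_rhs, closed_loop_rhs, saddle_margin)
```

All three values of `p` coincide to rounding error for these numbers, the closed-loop pole `a_cl` lies inside the unit circle, and the saddle-point condition (4.6) holds.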
4.3 Heuristic Dynamic Programming (HDP)
In this section, the HDP algorithm is developed to solve the discrete-time linear system zero-sum game described in Section 4.2. In the HDP, a parametric structure is used to approximate the cost-to-go function of the current control policy. Then the certainty equivalence principle is used to improve the policy of the action network.

4.3.1 Derivation of HDP for Zero-Sum Games

Consider the system (4.1), and the value function (4.2). Starting with an initial quadratic cost-to-go $V_0(x) \ge 0$ that is not necessarily optimal, one finds $V_1(x)$ by solving Equation (4.27) with $i = 0$ according to:

$$V_{i+1}(x_k) = \min_{u_k} \max_{w_k} \left( x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + V_i(x_{k+1}) \right). \tag{4.27}$$

Equation (4.27) is a recurrence relation that is used to solve for the optimal cost-to-go, the game value function, forward in time. Note that since $V_i(x)$
is not initially optimal, policies are found using Vi (x) in Equation (4.27) by using the certainty equivalence principle. These greedy policies are denoted as ui (xk ) and wi (xk ). Therefore Vi+1 (x) is given by: Vi+1 (xk ) = xkT Rxk + uiT (xk )ui (xk ) − γ 2 wiT (xk )wi (xk ) + Vi (xk+1 ).
(4.28)
It can be shown that the parameters of the action networks, $L_i$ and $K_i$ of $u_i(x_k)$ and $w_i(x_k)$, are found as:

$$L_i = \left( I + B^T P_i B - B^T P_i E (E^T P_i E - \gamma^2 I)^{-1} E^T P_i B \right)^{-1} \left( B^T P_i E (E^T P_i E - \gamma^2 I)^{-1} E^T P_i A - B^T P_i A \right), \tag{4.29}$$

$$K_i = \left( E^T P_i E - \gamma^2 I - E^T P_i B (I + B^T P_i B)^{-1} B^T P_i E \right)^{-1} \left( E^T P_i B (I + B^T P_i B)^{-1} B^T P_i A - E^T P_i A \right). \tag{4.30}$$
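As a minimal sketch, Equations (4.29) and (4.30) can be coded directly and cross-checked against the coupled stationarity conditions solved as a single linear system. The function names and the random test data are assumptions made for illustration; only the formulas come from the text.

```python
import numpy as np

def hdp_gains(P, A, B, E, gamma):
    """Greedy gains L_i, K_i of Equations (4.29)-(4.30) for a symmetric P."""
    m1, m2 = B.shape[1], E.shape[1]
    Sb = np.eye(m1) + B.T @ P @ B
    Se = E.T @ P @ E - gamma**2 * np.eye(m2)
    BPE = B.T @ P @ E                           # E^T P B is its transpose
    L = np.linalg.solve(Sb - BPE @ np.linalg.solve(Se, BPE.T),
                        BPE @ np.linalg.solve(Se, E.T @ P @ A) - B.T @ P @ A)
    K = np.linalg.solve(Se - BPE.T @ np.linalg.solve(Sb, BPE),
                        BPE.T @ np.linalg.solve(Sb, B.T @ P @ A) - E.T @ P @ A)
    return L, K

def joint_gains(P, A, B, E, gamma):
    """Same gains from the two stationarity conditions solved together."""
    m1, m2 = B.shape[1], E.shape[1]
    G = np.hstack([B, E])
    D = np.block([[np.eye(m1), np.zeros((m1, m2))],
                  [np.zeros((m2, m1)), -gamma**2 * np.eye(m2)]])
    LK = -np.linalg.solve(D + G.T @ P @ G, G.T @ P @ A)
    return LK[:m1], LK[m1:]

# Hypothetical test data (not from the chapter).
rng = np.random.default_rng(1)
n, gamma = 3, 2.0
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
E = 0.1 * rng.standard_normal((n, 1))
M = rng.standard_normal((n, n))
P = M @ M.T

L, K = hdp_gains(P, A, B, E, gamma)
L_joint, K_joint = joint_gains(P, A, B, E, gamma)
```

Solving the joint system avoids the nested inverses and is a convenient sanity check when transcribing (4.29) and (4.30).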
Once $V_{i+1}(x)$ is found, one then repeats the same process for $i = 0, 1, 2, \ldots$. In this section, it is shown that $V_i(x_k) \to V^*(x_k)$ as $i \to \infty$, where $V^*(x_k)$ is the optimal value function for the game based on the solution to the GARE (4.18).

Because $V_i(x)$ is quadratic in the state as given in Equation (4.8), and the two action networks are linear in the state, as shown in Equations (4.12) and (4.14), a natural choice of these parametric structures is given as:

$$\hat V(x, p_i) = p_i^T \bar x, \tag{4.31}$$

$$\hat u(x, L_i) = L_i^T x, \tag{4.32}$$

$$\hat w(x, K_i) = K_i^T x, \tag{4.33}$$

where $\bar x = (x_1^2, \ldots, x_1 x_n, x_2^2, x_2 x_3, \ldots, x_{n-1} x_n, x_n^2)$ is the Kronecker product quadratic polynomial basis vector, and $p = v(P)$, where $v(\cdot)$ is a vector function that acts on $n \times n$ matrices and outputs an $n(n+1)/2 \times 1$ column vector. The output vector of $v(\cdot)$ is constructed by stacking the columns of the square matrix into a one-column vector with the off-diagonal elements summed as $P_{ij} + P_{ji}$. The parameter structures (4.31), (4.32), and (4.33) give an exact closed-form representation of the functions in Equations (4.27) and (4.28). Note that to update the action networks, it is necessary to know the plant model matrices $A$, $B$, and $E$.

Substituting Equations (4.29) and (4.30) in the right-hand side of Equation (4.28), one has:

$$d(x_k, p_i) = x_k^T R x_k + (L_i x_k)^T (L_i x_k) - \gamma^2 (K_i x_k)^T (K_i x_k) + p_i^T \bar x_{k+1}, \tag{4.34}$$

which can be thought of as the desired target function to which one needs to fit $\hat V(x, p_{i+1})$ in a least-squares sense to find $p_{i+1}$ such that:

$$p_{i+1}^T \bar x_k = d(x_k, p_i).$$
The parameter vector $p_{i+1}$ is found by minimizing the error between the target value function (4.34) and (4.31) in a least-squares sense over a compact set $\Omega$:

$$p_{i+1} = \arg\min_{p_{i+1}} \left\{ \int_\Omega \left| p_{i+1}^T \bar x - d(x, p_i) \right|^2 dx \right\}. \tag{4.35}$$
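The parameterization $\hat V(x, p) = p^T \bar x$ used in the fit above can be illustrated with a short sketch. The helper names `quad_basis`, `vec_sym`, and `unvec_sym` are hypothetical, and the element ordering (upper triangle, row by row) is an assumption chosen so that it matches the ordering of $\bar x$ given in the text.

```python
import numpy as np

def quad_basis(x):
    """Quadratic basis (x1^2, x1 x2, ..., x1 xn, x2^2, ..., xn^2)."""
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

def vec_sym(P):
    """v(P): diagonal entries kept, off-diagonal pairs summed as P_ij + P_ji."""
    n = P.shape[0]
    return np.array([P[i, i] if i == j else P[i, j] + P[j, i]
                     for i in range(n) for j in range(i, n)])

def unvec_sym(p, n):
    """Rebuild the symmetric matrix from p = v(P)."""
    P = np.zeros((n, n))
    idx = 0
    for i in range(n):
        for j in range(i, n):
            P[i, j] = P[j, i] = p[idx] if i == j else p[idx] / 2.0
            idx += 1
    return P

# Hypothetical example: a symmetric 3 x 3 matrix and a state vector.
P = np.array([[2.0, 0.5, -1.0],
              [0.5, 3.0, 0.2],
              [-1.0, 0.2, 1.0]])
x = np.array([1.0, -2.0, 0.5])

p = vec_sym(P)
value_from_basis = p @ quad_basis(x)      # p^T x_bar
value_quadratic = x @ P @ x               # x^T P x
```

With this convention the vector $p$ has $n(n+1)/2$ entries, and $p^T \bar x$ reproduces the quadratic form $x^T P x$ exactly.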
4.3.2 Online Implementation of HDP Algorithm

The least-squares problem in Equation (4.35) can be solved in real time by collecting enough data points generated from $d(x_k, p_i)$ in Equation (4.34). This requires one to have knowledge of the state information $x_k$, $x_{k+1}$ and the reward function $r(x_k, u_k, w_k)$ as the dynamics evolve in time. This can be determined by simulation, or, in real-time applications, by observing the states online. To satisfy the excitation condition of the least-squares problem, the number of collected points $N$ must be at least $N \ge n(n+1)/2$, where $n$ is the number of states. Therefore, after several time steps that are enough to guarantee the excitation condition, one has the following least-squares problem:

$$p_{i+1} = (X X^T)^{-1} X Y, \tag{4.36}$$

where

$$X = \begin{bmatrix} \bar x|_{x_{k-N-1}} & \bar x|_{x_{k-N-2}} & \cdots & \bar x|_{x_{k-1}} \end{bmatrix},$$

$$Y = \begin{bmatrix} d(x_{k-N-1}, p_i) & d(x_{k-N-2}, p_i) & \cdots & d(x_{k-1}, p_i) \end{bmatrix}^T.$$
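One batch update of Equation (4.36) can be sketched as follows. This is an illustration under assumptions, not the chapter's implementation: the states are sampled at random rather than collected along a trajectory, the gains `L_i` and `K_i` are placeholders instead of the greedy gains (4.29) and (4.30), and all names are hypothetical. The fitted matrix is compared with the closed-loop expression that the least-squares fit should reproduce.

```python
import numpy as np

def quad_basis(x):
    """Quadratic basis (x1^2, x1 x2, ..., x1 xn, x2^2, ..., xn^2)."""
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

def unvec_sym(p, n):
    """Rebuild the symmetric matrix P from p = v(P) (off-diagonals were summed)."""
    P = np.zeros((n, n))
    idx = 0
    for i in range(n):
        for j in range(i, n):
            P[i, j] = P[j, i] = p[idx] if i == j else p[idx] / 2.0
            idx += 1
    return P

def hdp_least_squares_step(P, L, K, A, B, E, R, gamma, N, rng):
    """Fit p_{i+1} from N samples of the target d(x_k, p_i), Equations (4.34) and (4.36)."""
    n = A.shape[0]
    X = np.empty((n * (n + 1) // 2, N))
    Y = np.empty(N)
    for j in range(N):
        x = rng.standard_normal(n)              # assumption: randomly sampled states
        u, w = L @ x, K @ x                     # current policies
        x_next = A @ x + B @ u + E @ w          # observed next state
        X[:, j] = quad_basis(x)
        Y[j] = x @ R @ x + u @ u - gamma**2 * (w @ w) + x_next @ P @ x_next
    p_next = np.linalg.solve(X @ X.T, X @ Y)    # p_{i+1} = (X X^T)^{-1} X Y
    return unvec_sym(p_next, n)

# Hypothetical test data (not from the chapter).
rng = np.random.default_rng(3)
n, gamma = 3, 1.0
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
E = 0.1 * rng.standard_normal((n, 1))
R = np.eye(n)
P_i = np.eye(n)
L_i = 0.1 * rng.standard_normal((1, n))         # placeholder gains
K_i = 0.1 * rng.standard_normal((1, n))

P_next = hdp_least_squares_step(P_i, L_i, K_i, A, B, E, R, gamma, N=30, rng=rng)
A_cl = A + B @ L_i + E @ K_i
P_closed_form = R + L_i.T @ L_i - gamma**2 * K_i.T @ K_i + A_cl.T @ P_i @ A_cl
```

Because the target is exactly quadratic in the state, any set of at least $n(n+1)/2$ sufficiently rich samples gives the same fitted matrix.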
The least-squares problem (4.36) can be solved recursively by requiring a persistency of excitation condition:

$$\varepsilon_0 I \le \frac{1}{\alpha} \sum_{t=1}^{\alpha} \bar x_{k-t} \bar x_{k-t}^T \le \varepsilon_1 I$$

for all $k > \alpha_0$, $\alpha > \alpha_0$, with $\alpha$, $\varepsilon_0$, and $\varepsilon_1$ positive integers and $\varepsilon_0 \le \varepsilon_1$. The recursive least-squares (RLS) algorithm is given as:

$$e_i(t) = d(x_k, p_i) - \bar x_k^T p_{i+1}(t-1),$$

$$p_{i+1}(t) = p_{i+1}(t-1) + \frac{\Gamma_i(t-1) \bar x_k e_i(t)}{1 + \bar x_k^T \Gamma_i(t-1) \bar x_k},$$

$$\Gamma_i(t) = \Gamma_i(t-1) - \frac{\Gamma_i(t-1) \bar x_k \bar x_k^T \Gamma_i(t-1)}{1 + \bar x_k^T \Gamma_i(t-1) \bar x_k},$$

where $i$ is the policy update index, $t$ is the index of the recursions of the recursive least-squares, and $k$ is the discrete time. $\Gamma$ is the covariance matrix
[Figure 4.1: flowchart. Start of the zero-sum HDP. Initialization: $p_0 = v(P_0)$ with $P_0 \ge 0$, $i = 0$. Policy iteration: compute $L_i$ and $K_i$ from Equations (4.29) and (4.30). Solving the least-squares: form $X$ and $Y$ and compute $p_{i+1} = (X X^T)^{-1} X Y$. If $\|p_{i+1} - p_i\|_F$ is below $\varepsilon$, finish; otherwise set $i \to i + 1$ and repeat.]
FIGURE 4.1 Zero-sum games HDP flowchart.
of the recursion and $e(t)$ is the estimation error of the recursive least-squares. Note that $\Gamma_i(0)$ is a large number and $\Gamma_{i+1}(0) = \Gamma_i$. The online HDP algorithm developed in this chapter is summarized in the flowchart shown in Figure 4.1.

4.3.3 Convergence of Zero-Sum Game HDP

We now prove the convergence of the proposed zero-sum game HDP algorithm.
LEMMA 4 Iterating on Equations (4.29), (4.30), and (4.36) under the excitation condition is equivalent to the following iteration on the GARE in Equation (4.18):

$$P_{i+1} = A^T P_i A + R - \begin{bmatrix} A^T P_i B & A^T P_i E \end{bmatrix} \begin{bmatrix} I + B^T P_i B & B^T P_i E \\ E^T P_i B & E^T P_i E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P_i A \\ E^T P_i A \end{bmatrix}. \tag{4.37}$$

PROOF The least-squares problem is defined in Equation (4.35), which is:

$$p_{i+1} = \arg\min_{p_{i+1}} \left\{ \int_\Omega \left| p_{i+1}^T \bar x - d(x, p_i) \right|^2 dx \right\}. \tag{4.38}$$

The first-order necessary condition requires that:

$$\int_\Omega \left( 2 \bar x \bar x^T p_{i+1} - 2 \bar x \, d^T(x, p_i) \right) dx = 0. \tag{4.39}$$

Because the excitation condition is assumed, substituting Equation (4.34) in Equation (4.39), one has:

$$\begin{aligned} p_{i+1} &= \left( \int_\Omega \bar x_k \bar x_k^T dx \right)^{-1} \left( \int_\Omega \bar x_k \bar x_k^T dx \right) \times v\!\left( R + L_i^T L_i - \gamma^2 K_i^T K_i + (A + B L_i + E K_i)^T P_i (A + B L_i + E K_i) \right) \\ &= v\!\left( R + L_i^T L_i - \gamma^2 K_i^T K_i + (A + B L_i + E K_i)^T P_i (A + B L_i + E K_i) \right), \end{aligned}$$

where $v$ is the vectorized function in the Kronecker product. Since the matrix $P_{i+1}$ that is reconstructed from $p_{i+1}$ is symmetric, iteration on $p_i$ is equivalent to the following iteration:

$$P_{i+1} = R + L_i^T L_i - \gamma^2 K_i^T K_i + (A + B L_i + E K_i)^T P_i (A + B L_i + E K_i). \tag{4.40}$$

Using the same steps as in Lemma 3, Equation (4.37) follows from Equation (4.40).

THEOREM 1 Assume that the game has a value and is solvable. If the sequence of least-squares problems in Equation (4.35) is solvable, that is, the corresponding excitation conditions hold, then the HDP algorithm converges to the value of the game that solves the Riccati Equation (4.18) when starting with $P_0 \ge 0$.

PROOF This follows from Lemma 4 and from Reference [5], where it is shown that iterating on Equation (4.37) with $P_0 \ge 0$ converges to $P$ that solves Equation (4.18).
We have just proved convergence of the HDP algorithm. Note that an easy way to initialize the algorithm in Figure 4.1 is by selecting P0 = 0. In the next section the second algorithm, ADHDP, will be addressed.
4.4 Action-Dependent Heuristic Dynamic Programming (ADHDP): Q-Learning
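The ADHDP algorithm is based on the Q-function, so the relationship between the optimal value function and the Q-function is discussed first. The optimal policies are derived from the Q-function which, as will be shown, leads to a model-free algorithm. In this section, the concept of Q-functions is extended to zero-sum games that are continuous in the state and action space as in Equation (4.5). The optimal action-dependent value function $Q^*$ of the zero-sum game is then defined to be:

$$Q^*(x_k, u_k, w_k) = r(x_k, u_k, w_k) + V^*(x_{k+1}) = \begin{bmatrix} x_k^T & u_k^T & w_k^T \end{bmatrix} H \begin{bmatrix} x_k^T & u_k^T & w_k^T \end{bmatrix}^T, \tag{4.41}$$

where $H$ is the matrix associated with $P$ that solves the GARE, and is derived as:

$$\begin{aligned} \begin{bmatrix} x_k^T & u_k^T & w_k^T \end{bmatrix} H \begin{bmatrix} x_k^T & u_k^T & w_k^T \end{bmatrix}^T &= r(x_k, u_k, w_k) + V^*(x_{k+1}) \\ &= x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + x_{k+1}^T P x_{k+1} \\ &= x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + (A x_k + B u_k + E w_k)^T P (A x_k + B u_k + E w_k) \\ &= \begin{bmatrix} x_k^T & u_k^T & w_k^T \end{bmatrix} \begin{bmatrix} R & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & -\gamma^2 I \end{bmatrix} \begin{bmatrix} x_k \\ u_k \\ w_k \end{bmatrix} + \begin{bmatrix} x_k^T & u_k^T & w_k^T \end{bmatrix} \begin{bmatrix} A^T \\ B^T \\ E^T \end{bmatrix} P \begin{bmatrix} A & B & E \end{bmatrix} \begin{bmatrix} x_k \\ u_k \\ w_k \end{bmatrix}, \end{aligned} \tag{4.42}$$

so $H$ can be written as:

$$\begin{bmatrix} H_{xx} & H_{xu} & H_{xw} \\ H_{ux} & H_{uu} & H_{uw} \\ H_{wx} & H_{wu} & H_{ww} \end{bmatrix} = \begin{bmatrix} A^T P A + R & A^T P B & A^T P E \\ B^T P A & B^T P B + I & B^T P E \\ E^T P A & E^T P B & E^T P E - \gamma^2 I \end{bmatrix}. \tag{4.43}$$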
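Equation (4.43) can be assembled in a few lines. The sketch below uses hypothetical names and random test data; it builds $H$ from a given $P$ and checks that the quadratic form in $z = [x^T\ u^T\ w^T]^T$ equals the one-step reward plus the value at the next state, as in Equation (4.42).

```python
import numpy as np

def h_from_p(P, A, B, E, R, gamma):
    """H of Equation (4.43): G + [A B E]^T P [A B E], with G = diag(R, I, -gamma^2 I)."""
    n, m1, m2 = A.shape[0], B.shape[1], E.shape[1]
    G = np.zeros((n + m1 + m2, n + m1 + m2))
    G[:n, :n] = R
    G[n:n + m1, n:n + m1] = np.eye(m1)
    G[n + m1:, n + m1:] = -gamma**2 * np.eye(m2)
    ABE = np.hstack([A, B, E])
    return G + ABE.T @ P @ ABE

# Hypothetical test data (not from the chapter).
rng = np.random.default_rng(5)
n, gamma = 3, 1.5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
E = rng.standard_normal((n, 1))
R = np.eye(n)
M = rng.standard_normal((n, n))
P = M @ M.T

H = h_from_p(P, A, B, E, R, gamma)

x, u, w = rng.standard_normal(n), rng.standard_normal(1), rng.standard_normal(1)
z = np.concatenate([x, u, w])
x_next = A @ x + B @ u + E @ w
q_from_h = z @ H @ z
q_direct = x @ R @ x + u @ u - gamma**2 * (w @ w) + x_next @ P @ x_next
```

The check holds for arbitrary inputs $u_k$ and $w_k$, which is the sense in which the Q-function is action dependent.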
The optimal action-dependent game value function Q∗ (xk , uk , wk ) is equal to the game value function V ∗ (xk ) when the policies uk , wk are optimal.
Then one has:

$$V^*(x_k) = \min_{u_k} \max_{w_k} Q^*(x_k, u_k, w_k) = \min_{u_k} \max_{w_k} \begin{bmatrix} x_k^T & u_k^T & w_k^T \end{bmatrix} H \begin{bmatrix} x_k^T & u_k^T & w_k^T \end{bmatrix}^T = Q^*(x_k, u_k^*, w_k^*). \tag{4.44}$$

Therefore the relation between $P$ and $H$ can be obtained by equating Equations (4.44) and (4.8):

$$P = \begin{bmatrix} I & L^T & K^T \end{bmatrix} H \begin{bmatrix} I & L^T & K^T \end{bmatrix}^T. \tag{4.45}$$

Substituting Equation (4.45) in Equation (4.42):

$$H = \begin{bmatrix} R & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & -\gamma^2 I \end{bmatrix} + \begin{bmatrix} A & B & E \\ LA & LB & LE \\ KA & KB & KE \end{bmatrix}^T H \begin{bmatrix} A & B & E \\ LA & LB & LE \\ KA & KB & KE \end{bmatrix} \tag{4.46}$$

which can be related to:

$$Q^*(x_k, u_k, w_k) = r(x_k, u_k, w_k) + Q^*(x_{k+1}, u_{k+1}^*, w_{k+1}^*). \tag{4.47}$$

Equations (4.46) and (4.47) are the action-dependent versions of Equations (4.9) and (4.10) in terms of $H$. Similarly, using Equation (4.43), the gains of the optimal strategies can be written in terms of $H$ as:

$$L = \left( H_{uu} - H_{uw} H_{ww}^{-1} H_{wu} \right)^{-1} \left( H_{uw} H_{ww}^{-1} H_{wx} - H_{ux} \right), \tag{4.48}$$

$$K = \left( H_{ww} - H_{wu} H_{uu}^{-1} H_{uw} \right)^{-1} \left( H_{wu} H_{uu}^{-1} H_{ux} - H_{wx} \right). \tag{4.49}$$

Equations (4.48) and (4.49) depend only on the $H$ matrix, and they are the main equations needed in the algorithm to be proposed to find the control and disturbance gains. Note that if $H$ is known, then the system model is not needed to compute the controller gains. In the next section, we show how to develop an algorithm to learn the Q-function (i.e., the $H$ matrix) of a given zero-sum game. This model-free Q-learning algorithm allows for solving the GARE online without requiring knowledge of the plant model.

4.4.1 Derivation of Model-Free Online Tuning Based on the Q-Learning Algorithm (ADHDP)

In this section, we use the Q-function of Section 4.4 to develop a Q-learning algorithm that solves for the discrete-time zero-sum game $H$ matrix without requiring the system dynamic matrices. In the Q-learning approach, a parametric structure is used to approximate the Q-function of the current control policy. Then the certainty equivalence principle is used to improve the policy of the action network. In Q-learning, one starts with an initial Q-function $Q_0(x, u, w) \ge 0$ that is not necessarily optimal, and then finds $Q_1(x, u, w)$ by solving
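Equation (4.50) with $i = 0$ as:

$$\begin{aligned} Q_{i+1}(x_k, u_k, w_k) &= x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + \min_{u_{k+1}} \max_{w_{k+1}} Q_i(x_{k+1}, u_{k+1}, w_{k+1}) \\ &= x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + V_i(x_{k+1}) \\ &= x_k^T R x_k + u_k^T u_k - \gamma^2 w_k^T w_k + V_i(A x_k + B u_k + E w_k) \end{aligned} \tag{4.50}$$

and by applying the following incremental optimization on the Q-function as:

$$\min_{u_k} \max_{w_k} Q_{i+1}(x_k, u_k, w_k) = \min_{u_k} \max_{w_k} \begin{bmatrix} x_k^T & u_k^T & w_k^T \end{bmatrix} H_{i+1} \begin{bmatrix} x_k^T & u_k^T & w_k^T \end{bmatrix}^T.$$

Note that in Equation (4.50), the Q-function is given for any policy $u$ and $w$. According to Equations (4.48) and (4.49) the corresponding state feedback policy updates are given by

$$\begin{aligned} L_i &= \left( H_{uu}^i - H_{uw}^i (H_{ww}^i)^{-1} H_{wu}^i \right)^{-1} \left( H_{uw}^i (H_{ww}^i)^{-1} H_{wx}^i - H_{ux}^i \right), \\ K_i &= \left( H_{ww}^i - H_{wu}^i (H_{uu}^i)^{-1} H_{uw}^i \right)^{-1} \left( H_{wu}^i (H_{uu}^i)^{-1} H_{ux}^i - H_{wx}^i \right) \end{aligned} \tag{4.51}$$

with

$$u_i(x_k) = L_i x_k, \qquad w_i(x_k) = K_i x_k. \tag{4.52}$$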
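The policy update (4.51) uses only blocks of $H$, which the following sketch makes explicit. The function name `gains_from_h`, the block layout arguments, and the test matrix are assumptions for illustration; the test matrix is simply a symmetric matrix with a positive $H_{uu}$ block and a negative $H_{ww}$ block, not one produced by the game.

```python
import numpy as np

def gains_from_h(H, n, m1, m2):
    """Gains L, K of Equation (4.51) computed from the blocks of H only."""
    Hux, Huu, Huw = H[n:n + m1, :n], H[n:n + m1, n:n + m1], H[n:n + m1, n + m1:]
    Hwx, Hwu, Hww = H[n + m1:, :n], H[n + m1:, n:n + m1], H[n + m1:, n + m1:]
    L = np.linalg.solve(Huu - Huw @ np.linalg.solve(Hww, Hwu),
                        Huw @ np.linalg.solve(Hww, Hwx) - Hux)
    K = np.linalg.solve(Hww - Hwu @ np.linalg.solve(Huu, Huw),
                        Hwu @ np.linalg.solve(Huu, Hux) - Hwx)
    return L, K

# Hypothetical symmetric H with H_uu > 0 and H_ww < 0 (not from the chapter).
rng = np.random.default_rng(4)
n, m1, m2 = 3, 1, 1
q = n + m1 + m2
S = 0.1 * rng.standard_normal((q, q))
H = (S + S.T) / 2 + np.diag([1.0] * n + [2.0] * m1 + [-2.0] * m2)

L, K = gains_from_h(H, n, m1, m2)

# Stationarity of z^T H z in (u, w) at u = L x, w = K x, for every x:
# [Huu Huw; Hwu Hww] [L; K] + [Hux; Hwx] = 0
residual = H[n:, n:] @ np.vstack([L, K]) + H[n:, :n]
```

No plant matrix appears in the function, which is the point made in the text: once $H$ is available, the gains follow without $A$, $B$, or $E$.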
Note that since $Q_i(x, u, w)$ is not initially optimal, the improved policies $u_i(x_k)$ and $w_i(x_k)$ use the certainty equivalence principle. To update the action networks, the plant model matrices $A$, $B$, and $E$ are not needed, and only the $H$ matrix is required. To develop solutions to Equation (4.50) forward in time that do not need the system matrices, one can substitute Equation (4.52) in Equation (4.50) to obtain the following recurrence relation on $i$:

$$\begin{aligned} Q_{i+1}(x_k, u_i(x_k), w_i(x_k)) ={}& x_k^T R x_k + u_i^T(x_k) u_i(x_k) - \gamma^2 w_i^T(x_k) w_i(x_k) \\ &+ \begin{bmatrix} x_{k+1}^T & u_i^T(x_{k+1}) & w_i^T(x_{k+1}) \end{bmatrix} H_i \begin{bmatrix} x_{k+1}^T & u_i^T(x_{k+1}) & w_i^T(x_{k+1}) \end{bmatrix}^T \end{aligned} \tag{4.53}$$

that is used to solve for the optimal Q-function forward in time. The idea is to solve for $Q_{i+1}$, then once it is determined, to repeat the same process for $i = 0, 1, 2, \ldots$. In this chapter, it is shown that $Q_{i+1}(x_k, u_i(x_k), w_i(x_k)) \to Q^*(x_k, u_k, w_k)$ as $i \to \infty$, which means $H_i \to H$, $L_i \to L$, and $K_i \to K$.

A parametric structure is used to approximate the actual $Q_i(x, u, w)$. Similarly, parametric structures are used to obtain approximate closed-form representations of the two action networks $\hat u(x, L)$ and $\hat w(x, K)$. Because in this chapter linear quadratic zero-sum games are considered, the Q-function is quadratic in the state and the policies. Moreover, the two action networks are
linear in the state. Therefore, a natural choice of these parameter structures is given as:

$$\hat u_i(x) = L_i x, \tag{4.54}$$

$$\hat w_i(x) = K_i x, \tag{4.55}$$

$$\hat Q(\bar z, h_i) = z^T H_i z = h_i^T \bar z, \tag{4.56}$$

where $z = [x^T\ u^T\ w^T]^T$, $z \in \mathbb{R}^{n + m_1 + m_2 = q}$, $\bar z = (z_1^2, \ldots, z_1 z_q, z_2^2, z_2 z_3, \ldots, z_{q-1} z_q, z_q^2)$ is the Kronecker product quadratic polynomial basis vector, and $h = v(H)$ with $v(\cdot)$ a vector function that acts on $q \times q$ matrices and gives a $q(q+1)/2 \times 1$ column vector. The output of $v(\cdot)$ is constructed by stacking the columns of the square matrix into a one-column vector with the off-diagonal elements summed as $H_{ij} + H_{ji}$. In the linear case, the parametric structures in Equations (4.54), (4.55), and (4.56) give an exact closed-form representation of the functions in Equation (4.53). Note that Equations (4.54) and (4.55) are updated using Equation (4.51).

To solve for $Q_{i+1}$ in Equation (4.53), the right-hand side of Equation (4.53) is written as:

$$d(z_k(x_k), H_i) = x_k^T R x_k + \hat u_i(x_k)^T \hat u_i(x_k) - \gamma^2 \hat w_i(x_k)^T \hat w_i(x_k) + Q_i(x_{k+1}, \hat u_i(x_{k+1}), \hat w_i(x_{k+1})), \tag{4.57}$$

which can be thought of as the desired target function to which one needs to fit $\hat Q(z, h_{i+1})$ in a least-squares sense to find $h_{i+1}$ such that:

$$h_{i+1}^T \bar z(x_k) = d(\bar z(x_k), h_i). \tag{4.58}$$
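The parameter vector $h_{i+1}$ is found by minimizing the error between the target value function (4.57) and (4.56) in a least-squares sense over a compact set $\Omega$:

$$h_{i+1} = \arg\min_{h_{i+1}} \left\{ \int_\Omega \left| h_{i+1}^T \bar z(x_k) - d(\bar z(x_k), h_i) \right|^2 dx_k \right\}. \tag{4.59}$$

Solving the least-squares problem one obtains:

$$h_{i+1} = \left( \int_\Omega \bar z(x_k) \bar z(x_k)^T dx \right)^{-1} \int_\Omega \bar z(x_k) \, d(\bar z(x_k), h_i) \, dx, \tag{4.60}$$

where $z(x_k)$ is:

$$z(x_k) = \begin{bmatrix} x_k^T & (\hat u_i(x_k))^T & (\hat w_i(x_k))^T \end{bmatrix}^T = \begin{bmatrix} x_k^T & (L_i x_k)^T & (K_i x_k)^T \end{bmatrix}^T = \left( x_k^T \begin{bmatrix} I & L_i^T & K_i^T \end{bmatrix} \right)^T. \tag{4.61}$$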
Note, however, that $\hat u_i$ and $\hat w_i$ are linearly dependent on $x_k$, see Equations (4.54) and (4.55), and therefore

$$\int_\Omega \bar z(x_k) \bar z(x_k)^T dx_k$$

is never invertible, which means that the least-squares problem in Equations (4.59) and (4.60) will never be solvable. To overcome this problem, exploration noise is added to both inputs in Equation (4.52) to obtain:

$$\hat u_{ei}(x_k) = L_i x_k + n_{1k}, \qquad \hat w_{ei}(x_k) = K_i x_k + n_{2k}, \tag{4.62}$$

where $n_{1k}(0, \sigma_1)$ and $n_{2k}(0, \sigma_2)$ are zero-mean exploration noise with variances $\sigma_1^2$ and $\sigma_2^2$ respectively; therefore $z(x_k)$ in Equation (4.61) becomes:

$$z(x_k) = \begin{bmatrix} x_k \\ \hat u_{ei}(x_k) \\ \hat w_{ei}(x_k) \end{bmatrix} = \begin{bmatrix} x_k \\ L_i x_k + n_{1k} \\ K_i x_k + n_{2k} \end{bmatrix} = \begin{bmatrix} x_k \\ L_i x_k \\ K_i x_k \end{bmatrix} + \begin{bmatrix} 0 \\ n_{1k} \\ n_{2k} \end{bmatrix}.$$

Evaluating Equation (4.58) at enough points $p1, p2, p3, \ldots \in \Omega$, one has:

$$h_{i+1} = (Z Z^T)^{-1} Z Y \tag{4.63}$$

with

$$Z = \begin{bmatrix} \bar z(p1) & \bar z(p2) & \cdots & \bar z(pN) \end{bmatrix},$$

$$Y = \begin{bmatrix} d(\bar z(p1), h_i) & d(\bar z(p2), h_i) & \cdots & d(\bar z(pN), h_i) \end{bmatrix}^T.$$
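The loss of rank without exploration noise, and its recovery with noise, can be seen numerically. The sketch below is illustrative only: the helper names, the random gains, the Gaussian noise, and the rank threshold are assumptions. It counts the numerically nonzero singular values of the regressor matrix $Z$, which has the same rank as $Z Z^T$.

```python
import numpy as np

def quad_basis(z):
    """Quadratic basis (z1^2, z1 z2, ..., z1 zq, z2^2, ..., zq^2)."""
    q = len(z)
    return np.array([z[i] * z[j] for i in range(q) for j in range(i, q)])

def regressor_rank(L, K, n, N, noise_std, rng):
    """Numerical rank of Z = [z_bar(p1) ... z_bar(pN)] for the inputs of Equation (4.62)."""
    cols = []
    for _ in range(N):
        x = rng.standard_normal(n)
        u = L @ x + noise_std * rng.standard_normal(L.shape[0])
        w = K @ x + noise_std * rng.standard_normal(K.shape[0])
        cols.append(quad_basis(np.concatenate([x, u, w])))
    Z = np.array(cols).T
    s = np.linalg.svd(Z, compute_uv=False)
    return int(np.sum(s > 1e-9 * s[0]))          # assumed relative threshold

# Hypothetical gains for n = 3 states and one control and one disturbance input.
rng = np.random.default_rng(6)
n = 3
L = rng.standard_normal((1, n))
K = rng.standard_normal((1, n))

rank_without_noise = regressor_rank(L, K, n, N=60, noise_std=0.0, rng=rng)
rank_with_noise = regressor_rank(L, K, n, N=60, noise_std=1.0, rng=rng)
print(rank_without_noise, rank_with_noise)
```

With $q = 5$ the basis has $q(q+1)/2 = 15$ entries. Without noise every entry of $\bar z$ is a quadratic form in $x_k$ alone, so the rank cannot exceed $n(n+1)/2 = 6$, whereas independent noise on both inputs makes all 15 directions observable.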
Here the target in Equation (4.57) becomes:

$$d(z_k(x_k), H_i) = x_k^T R x_k + \hat u_{ei}(x_k)^T \hat u_{ei}(x_k) - \gamma^2 \hat w_{ei}(x_k)^T \hat w_{ei}(x_k) + Q_i(x_{k+1}, \hat u_i(x_{k+1}), \hat w_i(x_{k+1})) \tag{4.64}$$

with $\hat u_i$ and $\hat w_i$ used for $Q_i$ instead of $\hat u_{ei}$ and $\hat w_{ei}$, where the invertibility of the matrix in Equation (4.63) is therefore guaranteed by the excitation condition.

4.4.2 Online Implementation of Q-Learning Algorithm

The least-squares problem in Equation (4.63) can be solved in real time by collecting enough data points generated from $d(z_k, h_i)$ in Equation (4.64). This requires one to have knowledge of the state information $x_k$, $x_{k+1}$ as the dynamics evolve in time, and also of the reward function $r(z_k) = x_k^T R x_k + \hat u_{ei}(x_k)^T \hat u_{ei}(x_k) - \gamma^2 \hat w_{ei}(x_k)^T \hat w_{ei}(x_k)$ and $Q_i$. This can be determined by simulation, or, in real-time applications, by observing the states online. To satisfy the excitation condition of the least-squares problem, the number of collected points $N$ must be at least $N \ge q(q+1)/2$, where $q = n + m_1 + m_2$ is the number of states including both policies, control and
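disturbance. In online implementation of the least-squares problem, the $Y$ and $Z$ matrices are obtained in real time as:

$$\begin{aligned} Z &= \begin{bmatrix} \bar z(x_{k-N-1}) & \bar z(x_{k-N-2}) & \cdots & \bar z(x_{k-1}) \end{bmatrix}, \\ Y &= \begin{bmatrix} d(\bar z(x_{k-N-1}), h_i) & d(\bar z(x_{k-N-2}), h_i) & \cdots & d(\bar z(x_{k-1}), h_i) \end{bmatrix}^T. \end{aligned} \tag{4.65}$$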
One can also solve Equation (4.65) recursively using the well-known recursive least-squares technique. In that case, the excitation condition is replaced by the persistency of excitation condition,

$$\varepsilon_0 I \le \frac{1}{\alpha} \sum_{t=1}^{\alpha} \bar z_{k-t} \bar z_{k-t}^T \le \varepsilon_1 I, \quad \text{for all } k > \alpha_0,\ \alpha > \alpha_0,$$

with $\alpha_0$, $\varepsilon_0$, and $\varepsilon_1$ positive integers and $\varepsilon_0 \le \varepsilon_1$. The online Q-learning algorithm developed in this chapter is summarized in the flowchart shown in Figure 4.2.

[Figure 4.2: flowchart. Start of the zero-sum Q-learning. Initialization: $h_0 = v(H_0) = 0$, $P_0 = 0$, $i = 0$, $L_0 = 0$, $K_0 = 0$. Solving the least-squares: form $Z$ and $Y$ as in Equation (4.65), compute $h_{i+1} = (Z Z^T)^{-1} Z Y$ and $H_{i+1} = f(h_{i+1})$. Policy iteration: compute $L_{i+1}$ and $K_{i+1}$ from the blocks of $H_{i+1}$ as in Equation (4.51). If $\|h_{i+1} - h_i\|_F$ is below $\varepsilon$, finish; otherwise set $i \to i + 1$ and repeat.]
FIGURE 4.2 Zero-sum games Q-learning (ADHDP) flowchart.
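The recursive least-squares alternative has the same form as the recursion of Section 4.3.2, with $\bar z_k$ as the regressor and $h_{i+1}$ as the parameter vector. A minimal sketch of one update step follows; the function name, the synthetic noise-free data, and the initial covariance of $10^6 I$ (standing in for "a large number") are assumptions made for illustration.

```python
import numpy as np

def rls_update(theta, Gamma, phi, target):
    """One recursive least-squares step for the linear model target = phi^T theta."""
    err = target - phi @ theta                           # estimation error e(t)
    denom = 1.0 + phi @ Gamma @ phi
    theta = theta + (Gamma @ phi) * err / denom          # parameter update
    Gamma = Gamma - np.outer(Gamma @ phi, phi @ Gamma) / denom   # covariance update
    return theta, Gamma

# Hypothetical data: recover a fixed parameter vector from noise-free samples.
rng = np.random.default_rng(2)
dim = 6
theta_true = rng.standard_normal(dim)
theta = np.zeros(dim)
Gamma = 1e6 * np.eye(dim)                                # large initial covariance
for _ in range(200):
    phi = rng.standard_normal(dim)                       # regressor (plays the role of z_bar_k)
    theta, Gamma = rls_update(theta, Gamma, phi, phi @ theta_true)
```

In the online algorithm the regressors come from the measured trajectory, so the estimate is only as good as the excitation in the data; the random regressors here satisfy that condition by construction.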
This algorithm for zero-sum games follows by iterating between Equation (4.51) and Equation (4.65). In the remainder of this section, it will be shown that this policy iteration technique will cause $Q_i$ to converge to the optimal $Q^*$.

4.4.3 Convergence of Zero-Sum Game Q-Learning

We now prove that the proposed Q-learning algorithm for zero-sum games converges to the optimal policies. Some preliminary lemmas are needed.

LEMMA 5 Iterating on Equations (4.51) and (4.65) is equivalent to:

$$H_{i+1} = G + \begin{bmatrix} A & B & E \\ L_i A & L_i B & L_i E \\ K_i A & K_i B & K_i E \end{bmatrix}^T H_i \begin{bmatrix} A & B & E \\ L_i A & L_i B & L_i E \\ K_i A & K_i B & K_i E \end{bmatrix}, \tag{4.66}$$

where $G$ is

$$G = \begin{bmatrix} R & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & -\gamma^2 I \end{bmatrix}.$$

PROOF Because Equation (4.64) is equivalent to:

$$d(\bar z_k(x_k), h_i) = \bar z_k^T \times v\!\left( G + \begin{bmatrix} A & B & E \\ L_i A & L_i B & L_i E \\ K_i A & K_i B & K_i E \end{bmatrix}^T H_i \begin{bmatrix} A & B & E \\ L_i A & L_i B & L_i E \\ K_i A & K_i B & K_i E \end{bmatrix} \right),$$

then using the Kronecker products, the least-squares Equation (4.65) becomes:

$$h_{i+1} = \underbrace{(Z Z^T)^{-1} (Z Z^T)}_{I} \times v\!\left( G + \begin{bmatrix} A & B & E \\ L_i A & L_i B & L_i E \\ K_i A & K_i B & K_i E \end{bmatrix}^T H_i \begin{bmatrix} A & B & E \\ L_i A & L_i B & L_i E \\ K_i A & K_i B & K_i E \end{bmatrix} \right),$$

where $v$ is the vectorized function in Kronecker products. Because the matrix $H_{i+1}$ reconstructed from $h_{i+1}$ is symmetric, iterating on $h_i$ is equivalent to:

$$H_{i+1} = G + \begin{bmatrix} A & B & E \\ L_i A & L_i B & L_i E \\ K_i A & K_i B & K_i E \end{bmatrix}^T H_i \begin{bmatrix} A & B & E \\ L_i A & L_i B & L_i E \\ K_i A & K_i B & K_i E \end{bmatrix}.$$

LEMMA 6 The matrices $H_{i+1}$, $L_{i+1}$, and $K_{i+1}$ can be written as:

$$H_{i+1} = \begin{bmatrix} A^T P_i A + R & A^T P_i B & A^T P_i E \\ B^T P_i A & B^T P_i B + I & B^T P_i E \\ E^T P_i A & E^T P_i B & E^T P_i E - \gamma^2 I \end{bmatrix}, \tag{4.67}$$
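$$L_{i+1} = \left( I + B^T P_i B - B^T P_i E (E^T P_i E - \gamma^2 I)^{-1} E^T P_i B \right)^{-1} \left( B^T P_i E (E^T P_i E - \gamma^2 I)^{-1} E^T P_i A - B^T P_i A \right), \tag{4.68}$$

$$K_{i+1} = \left( E^T P_i E - \gamma^2 I - E^T P_i B (I + B^T P_i B)^{-1} B^T P_i E \right)^{-1} \left( E^T P_i B (I + B^T P_i B)^{-1} B^T P_i A - E^T P_i A \right), \tag{4.69}$$

where $P_i$ is given as

$$P_i = \begin{bmatrix} I & L_i^T & K_i^T \end{bmatrix} H_i \begin{bmatrix} I & L_i^T & K_i^T \end{bmatrix}^T. \tag{4.70}$$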
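PROOF Equation (4.66) in Lemma 5 can be written as:

$$\begin{aligned} H_{i+1} &= G + \begin{bmatrix} A & B & E \\ L_i A & L_i B & L_i E \\ K_i A & K_i B & K_i E \end{bmatrix}^T H_i \begin{bmatrix} A & B & E \\ L_i A & L_i B & L_i E \\ K_i A & K_i B & K_i E \end{bmatrix} \\ &= G + \begin{bmatrix} A & B & E \end{bmatrix}^T \begin{bmatrix} I & L_i^T & K_i^T \end{bmatrix} H_i \begin{bmatrix} I & L_i^T & K_i^T \end{bmatrix}^T \begin{bmatrix} A & B & E \end{bmatrix}. \end{aligned}$$

Because $P_i$ is described as in Equation (4.70), it follows that:

$$H_{i+1} = \begin{bmatrix} A^T P_i A + R & A^T P_i B & A^T P_i E \\ B^T P_i A & B^T P_i B + I & B^T P_i E \\ E^T P_i A & E^T P_i B & E^T P_i E - \gamma^2 I \end{bmatrix}.$$

Using Equations (4.51) and (4.67), one obtains Equations (4.68) and (4.69).

LEMMA 7 Iterating on $H_i$ is similar to iterating on $P_i$ as:

$$P_{i+1} = A^T P_i A + R - \begin{bmatrix} A^T P_i B & A^T P_i E \end{bmatrix} \begin{bmatrix} I + B^T P_i B & B^T P_i E \\ E^T P_i B & E^T P_i E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P_i A \\ E^T P_i A \end{bmatrix} \tag{4.71}$$

with $P_i$ defined as in Equation (4.70).

PROOF From Equation (4.70) in Lemma 6, one has:

$$P_{i+1} = \begin{bmatrix} I & L_{i+1}^T & K_{i+1}^T \end{bmatrix} H_{i+1} \begin{bmatrix} I & L_{i+1}^T & K_{i+1}^T \end{bmatrix}^T,$$

and using Equation (4.67) in Lemma 6, one obtains:

$$\begin{aligned} P_{i+1} &= \begin{bmatrix} I & L_{i+1}^T & K_{i+1}^T \end{bmatrix} \begin{bmatrix} A^T P_i A + R & A^T P_i B & A^T P_i E \\ B^T P_i A & B^T P_i B + I & B^T P_i E \\ E^T P_i A & E^T P_i B & E^T P_i E - \gamma^2 I \end{bmatrix} \begin{bmatrix} I \\ L_{i+1} \\ K_{i+1} \end{bmatrix} \\ &= R + L_{i+1}^T L_{i+1} - \gamma^2 K_{i+1}^T K_{i+1} + (A^T + L_{i+1}^T B^T + K_{i+1}^T E^T) P_i (A + B L_{i+1} + E K_{i+1}). \end{aligned} \tag{4.72}$$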
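Substituting Equations (4.68) and (4.69) in Equation (4.72), one has:

$$P_{i+1} = A^T P_i A + R - \begin{bmatrix} A^T P_i B & A^T P_i E \end{bmatrix} \begin{bmatrix} I + B^T P_i B & B^T P_i E \\ E^T P_i B & E^T P_i E - \gamma^2 I \end{bmatrix}^{-1} \begin{bmatrix} B^T P_i A \\ E^T P_i A \end{bmatrix}.$$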
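Lemmas 5 and 6 can be checked numerically with a single data-driven update. The sketch below is an illustration under assumptions: states are sampled at random, the plant matrices are random and are used only to simulate the next state, the current $H_i$, $L_i$, $K_i$ are arbitrary, and all names are hypothetical. Note that the regressor matrix is stored here with one sample per row, which is the transpose of $Z$ in the text.

```python
import numpy as np

def quad_basis(z):
    """Quadratic basis (z1^2, z1 z2, ..., z1 zq, z2^2, ..., zq^2)."""
    q = len(z)
    return np.array([z[i] * z[j] for i in range(q) for j in range(i, q)])

def unvec_sym(h, q):
    """Rebuild the symmetric matrix H from h = v(H) (off-diagonals were summed)."""
    H = np.zeros((q, q))
    idx = 0
    for i in range(q):
        for j in range(i, q):
            H[i, j] = H[j, i] = h[idx] if i == j else h[idx] / 2.0
            idx += 1
    return H

def q_learning_step(H, L, K, A, B, E, R, gamma, N, rng, noise_std=1.0):
    """One least-squares update of h from data, using inputs (4.62) and target (4.64)."""
    n, m1, m2 = A.shape[0], B.shape[1], E.shape[1]
    q = n + m1 + m2
    Zt = np.empty((N, q * (q + 1) // 2))             # rows are samples (transpose of Z)
    Y = np.empty(N)
    for j in range(N):
        x = rng.standard_normal(n)
        u = L @ x + noise_std * rng.standard_normal(m1)   # exploration noise on control
        w = K @ x + noise_std * rng.standard_normal(m2)   # exploration noise on disturbance
        x_next = A @ x + B @ u + E @ w               # plant used only as a simulator
        z_next = np.concatenate([x_next, L @ x_next, K @ x_next])
        Zt[j] = quad_basis(np.concatenate([x, u, w]))
        Y[j] = x @ R @ x + u @ u - gamma**2 * (w @ w) + z_next @ H @ z_next
    h_next = np.linalg.lstsq(Zt, Y, rcond=None)[0]
    return unvec_sym(h_next, q)

# Hypothetical test data (not from the chapter).
rng = np.random.default_rng(7)
n, m1, m2, gamma = 3, 1, 1, 1.0
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m1))
E = 0.1 * rng.standard_normal((n, m2))
R = np.eye(n)
S = rng.standard_normal((n + m1 + m2, n + m1 + m2))
H_i = (S + S.T) / 2
L_i = 0.1 * rng.standard_normal((m1, n))
K_i = 0.1 * rng.standard_normal((m2, n))

H_next = q_learning_step(H_i, L_i, K_i, A, B, E, R, gamma, N=100, rng=rng)

T = np.vstack([np.eye(n), L_i, K_i])                 # [I; L_i; K_i]
ABE = np.hstack([A, B, E])
G = np.zeros_like(H_i)
G[:n, :n] = R
G[n:n + m1, n:n + m1] = np.eye(m1)
G[n + m1:, n + m1:] = -gamma**2 * np.eye(m2)
H_lemma5 = G + (T @ ABE).T @ H_i @ (T @ ABE)         # Equation (4.66)
P_i = T.T @ H_i @ T                                  # Equation (4.70)
H_lemma6 = G + ABE.T @ P_i @ ABE                     # Equation (4.67)
```

The fitted `H_next` should match both closed-form expressions, even though the fitting routine itself never reads $A$, $B$, or $E$ beyond generating the next state.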
The next result is our main theorem and shows convergence of the Q-learning algorithm.

THEOREM 2 Assume that the linear quadratic zero-sum game is solvable and has a value under the state feedback information structure. Then, iterating on Equation (4.66) in Lemma 5, with $H_0 = 0$, $L_0 = 0$, and $K_0 = 0$, converges with $H_i \to H$, where $H$ corresponds to $Q^*(x_k, u_k, w_k)$ as in Equations (4.41) and (4.43) with corresponding $P$ solving the GARE (4.9).

PROOF In Reference [5] it is shown that iterating on the GARE (4.71) with $P_0 = 0$ converges to $P$ that solves Equation (4.9). Since Lemma 7 shows that iterating on the $H_i$ matrix is equivalent to iterating on $P_i$, then as $i \to \infty$

$$H_i \to \begin{bmatrix} A^T P A + R & A^T P B & A^T P E \\ B^T P A & B^T P B + I & B^T P E \\ E^T P A & E^T P B & E^T P E - \gamma^2 I \end{bmatrix}.$$

Hence from Equation (4.46), and because from Equation (4.70) $H_0 = 0$, $L_0 = 0$, and $K_0 = 0$ implies that $P_0 = 0$, one concludes that $Q_i \to Q^*$.

We have just proved convergence of the Q-learning algorithm assuming the least-squares problem (4.65) is solved completely; that is, the excitation condition is satisfied. Note that this implies that Q-learning can be interpreted as solving the GARE of the zero-sum game without requiring the plant model.
4.5 Online ADP H∞ Autopilot Controller Design for an F-16 Aircraft
In this design application, the zero-sum game that corresponds to the H∞ control problem is solved for an F-16 aircraft autopilot design. The F-16 short period dynamics states are $x = [\alpha\ q\ \delta_e]^T$, where $\alpha$ is the angle of attack, $q$ is the pitch rate, and $\delta_e$ is the elevator deflection angle. The discrete-time plant model of this aircraft dynamics is a discretized version of the continuous-time one given in Reference [27]. We used standard zero-order-hold discretization techniques explained and easily implemented in MATLAB [30] to obtain the sampled data plant:

$$A = \begin{bmatrix} 0.906488 & 0.0816012 & -0.0005 \\ 0.0741349 & 0.90121 & -0.000708383 \\ 0 & 0 & 0.132655 \end{bmatrix}, \quad B = \begin{bmatrix} -0.00150808 \\ -0.0096 \\ 0.867345 \end{bmatrix}, \quad E = \begin{bmatrix} 0.00951892 \\ 0.00038373 \\ 0 \end{bmatrix} \tag{4.73}$$

with sampling time $T = 0.1$. In this H∞ design problem, the $L_2$ gain for disturbance attenuation is $\gamma = 1$. The solution to Equation (4.9) is

$$P = \begin{bmatrix} 15.5109 & 12.4074 & -0.0089 \\ 12.4074 & 15.5994 & -0.0078 \\ -0.0089 & -0.0078 & 1.0101 \end{bmatrix}. \tag{4.74}$$

The corresponding policies have the gains:

$$L = \begin{bmatrix} 0.0733 & 0.0872 & -0.0661 \end{bmatrix}, \qquad K = \begin{bmatrix} 0.1476 & 0.1244 & 0 \end{bmatrix}.$$

Note that $P \ge 0$, which, from Reference [8], implies:

$$\frac{\sum_{k=0}^{\infty} \left( x_k^T Q x_k + u_k^{*T} u_k^* \right)}{\sum_{k=0}^{\infty} w_k^T w_k} \le \gamma^2 \tag{4.75}$$

for all finite energy disturbances, that is, all disturbances for which $\sum_{k=0}^{\infty} w_k^T w_k$ is bounded. Hence $u^*(x_k)$ has the well-known robustness and disturbance rejection capabilities of H∞ control.

4.5.1 HDP-Based H∞ Autopilot Controller Design

In this part, the HDP algorithm developed in Section 4.3 of this chapter is applied to solve for the H∞ autopilot controller forward in time. In this HDP design, the states of the aircraft are initialized to be $x(0) = [4\ 2\ 5]$, where any values can be selected. The parameters of the critic network and the action network are initialized to zero. Following this initialization step, the aircraft dynamics are run forward in time and tuning of the parameter structures is performed using recursive least-squares by observing the states and rewards online. In Figures 4.3 and 4.4, the states and the inputs to the aircraft are shown with respect to time. In order to maintain the excitation condition, one can use several standard schemes, including covariance resetting, state resetting, or injection of a small probing noise signal. In this example, we use state
[Figure 4.3: plot of the states x1, x2, x3 versus time (k).]
FIGURE 4.3 State trajectories with reinitialization.
[Figure 4.4: plot of the control input and the disturbance input versus time (k).]
FIGURE 4.4 The control and disturbance inputs.
[Figure 4.5: plot of P11, P12, P13, P22, P23, P33 versus time (k), titled "The Convergence of P".]
FIGURE 4.5 (See color insert following p. 272.) Convergence of the critic network parameters.
resetting and the states are reinitialized to $x(0) = [4\ 2\ 5]$ periodically to prevent them from converging to zero. State reinitialization has appeared recently in Reference [29] to solve the Hamilton-Jacobi-Bellman (HJB) equation associated with continuous-time optimal control problems. In Figures 4.5, 4.6, and 4.7, the convergence of the parameters of the critic network and the action network is shown. As expected, the parameters of the critic network converge to $P$ in Equation (4.74) that solves the GARE.

4.5.2 Q-Learning-Based H∞ Autopilot Controller Design

In this part, the Q-learning algorithm developed in Section 4.4 of this chapter is applied to solve for the H∞ autopilot controller forward in time. The recursive least-squares algorithm is used to tune the parameters of the critic network online. The parameters of the action networks are updated according to Equation (4.51). The states of the aircraft are initialized to be $x_0 = [4\ 2\ 5]$, where any values can be selected. The parameters of the critic network and the action network are initialized to zero. Following this initialization step, the aircraft dynamics are run forward in time and tuning of the parameter structures is performed by observing the states online. In Figures 4.8 and 4.9, the states and the inputs to the aircraft are shown with respect to time. In this example, we inject probing noise to the control and disturbance inputs. Hence, the persistency of excitation condition
[Figure 4.6: plot of K11, K12, K13 versus time (k), titled "The Convergence of the Disturbance Policy".]
FIGURE 4.6 (See color insert following p. 272.) Convergence of the disturbance action network parameters.
[Figure 4.7: plot of L11, L12, L13 versus time (k), titled "The Convergence of the Control Policy".]
FIGURE 4.7 (See color insert following p. 272.) Convergence of the control action network parameters.
[Figure 4.8: three panels, "The State x1", "The State x2", and "The State x3", versus the time step.]
FIGURE 4.8 (See color insert following p. 272.) State trajectories.
[Figure 4.9: two panels, "The Disturbance Input" (w) and "The Control Input" (u), versus the time step.]
FIGURE 4.9 The control and disturbance inputs.
[Figure 4.10: plot of P11, P12, P13, P22, P23, P33 versus time (k), titled "The Convergence of P".]
FIGURE 4.10 (See color insert following p. 272.) Online model-free convergence of Pi to P that solves the GARE.
required for the convergence of the recursive least-squares tuning, that is, to avoid the parameter drift problem, will hold. In Figures 4.10, 4.11, and 4.12, the convergence of the critic and action networks is shown. Using Equation (4.70), it can be shown that the critic network parameters Hi converge to the corresponding game value P that solves Equation (4.9).
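As an offline cross-check of this example, the Riccati iteration (4.71) can be run from $P_0 = 0$ on the plant matrices of Equation (4.73). The sketch below makes one assumption that should be stated loudly: the chapter does not give the state weighting for this example, so $R = I$ is assumed here. The iteration limit and stopping tolerance are also arbitrary choices, and the code uses the model, so it is not the online model-free procedure.

```python
import numpy as np

# Plant matrices from Equation (4.73).
A = np.array([[0.906488, 0.0816012, -0.0005],
              [0.0741349, 0.90121, -0.000708383],
              [0.0, 0.0, 0.132655]])
B = np.array([[-0.00150808], [-0.0096], [0.867345]])
E = np.array([[0.00951892], [0.00038373], [0.0]])
R = np.eye(3)        # ASSUMPTION: state weighting not stated for this example
gamma = 1.0

def riccati_map(P):
    """One step of the iteration (4.71)."""
    G = np.hstack([B, E])
    D = np.diag([1.0, -gamma**2])                 # diag(I, -gamma^2 I) for scalar inputs
    S = D + G.T @ P @ G
    return A.T @ P @ A + R - A.T @ P @ G @ np.linalg.solve(S, G.T @ P @ A)

P = np.zeros((3, 3))
for _ in range(20000):
    P_next = riccati_map(P)
    converged = np.max(np.abs(P_next - P)) < 1e-11
    P = P_next
    if converged:
        break

residual = np.max(np.abs(riccati_map(P) - P))
print(np.round(P, 4))
print("fixed-point residual:", residual)
```

Under the stated assumption, the printed matrix can be compared with Equation (4.74) and with the values reached in Figures 4.5 and 4.10.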
4.6 Conclusion
This chapter presented application of the online ADP techniques to solve the linear quadratic discrete-time zero-sum game appearing in H∞ optimal control forward in time. Two ADP schemes have been applied to the zerosum game case, namely HDP and ADHDP. It was shown that the convergence to the optimal solution in the HDP algorithm is faster than in ADHDP. The convergence of these two schemes has been shown to be related to a stable iteration on the GARE. The HDP algorithm requires the knowledge of the plant model to tune the action networks, whereas in the ADHDP case, the plant model is not required to tune the action or the critic networks. The ADHDP algorithm results in a direct adaptive method that finds the H∞ feedback controller.
[Figure 4.11: plot of K11, K12, K13 versus the policy update number, titled "The Convergence of the Disturbance Policy".]
FIGURE 4.11 (See color insert following p. 272.) Convergence of the disturbance action network parameters.
[Figure 4.12: plot of L11, L12, L13 versus the policy update number, titled "The Convergence of the Control Policy".]
FIGURE 4.12 (See color insert following p. 272.) Convergence of the control action network parameters.
The results presented herein are directly applicable in practice because they provide means to solve the H∞ control problem, which is highly effective in feedback control systems design. An aircraft design example makes the point. It is interesting to see that when designing the H∞ controller in forward time, one needs to provide an input signal that acts as a disturbance that is tuned to be the worst-case disturbance in forward time. Once the H∞ controller is found, one can use the parameters of the control action network as the final parameters of the controller, without having to deliberately insert any disturbance signal to the system. Note that if γ → ∞ or the disturbance gain matrix E = 0, a special case of this approach can be the solution of the discrete-time linear quadratic regulator (LQR) in optimal control.
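The LQR special case mentioned above can be illustrated numerically: with $E = 0$ the game Riccati map coincides with the standard discrete-time LQR Riccati map with unit control weighting. The function names and the random test data below are assumptions for illustration only.

```python
import numpy as np

def game_riccati_map(P, A, B, E, R, gamma):
    """One step of the zero-sum game Riccati iteration, as in Equation (4.71)."""
    m1, m2 = B.shape[1], E.shape[1]
    G = np.hstack([B, E])
    D = np.block([[np.eye(m1), np.zeros((m1, m2))],
                  [np.zeros((m2, m1)), -gamma**2 * np.eye(m2)]])
    S = D + G.T @ P @ G
    return A.T @ P @ A + R - A.T @ P @ G @ np.linalg.solve(S, G.T @ P @ A)

def lqr_riccati_map(P, A, B, R):
    """One step of the discrete-time LQR Riccati iteration with unit control weighting."""
    Sb = np.eye(B.shape[1]) + B.T @ P @ B
    return A.T @ P @ A + R - A.T @ P @ B @ np.linalg.solve(Sb, B.T @ P @ A)

# Hypothetical test data (not from the chapter).
rng = np.random.default_rng(8)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
R = np.eye(n)
M = rng.standard_normal((n, n))
P = M @ M.T

# Case E = 0: the disturbance channel drops out entirely.
P_game_no_dist = game_riccati_map(P, A, B, np.zeros((n, 1)), R, gamma=1.0)
P_lqr = lqr_riccati_map(P, A, B, R)

# Case gamma large: the game map approaches the LQR map.
E = 0.1 * rng.standard_normal((n, 1))
for gamma in (10.0, 100.0, 1000.0):
    gap = np.max(np.abs(game_riccati_map(P, A, B, E, R, gamma) - P_lqr))
    print(f"gamma = {gamma:7.1f}  largest difference from LQR map = {gap:.2e}")
```

The printed differences shrink as $\gamma$ grows, which reflects the statement that the LQR solution is recovered in the limit.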
References 1. S. J. Bradtke, B. E. Ydestie, and A. G. Barto, “Adaptive linear quadratic control using policy iteration,” Proceedings of the American Control Conference, pp. 3475– 3476. Baltimore, MD, June, 1994. 2. J. Si, A. Barto, W. Powell, and D. Wunsch, Handbook of Learning and Approximate Dynamic Programming, John Wiley & Sons, Hoboken, New Jersey, 2004. 3. S. Hagen and B. Krose, “Linear quadratic regulation using reinforcement learning,” Proceedings on the 8th Belgian-Dutch Conference on Mechanical Learning, pp. 39–46, Wageningen, Netherlands, October 1998. 4. D. Kleinman, “Stabilizing a discrete, constant, linear system with application to iterative methods for solving the Riccati equation,” IEEE Transactions on Automatic Control, pp. 252–254, 1974. 5. A. A. Stoorvogle and A. J. T. M. Weeren, “The discrete-time Riccati equation related to the H∞ control problem,” IEEE Transactions on Automatic Control, vol. 39, no. 3, pp. 686–691, 1994. 6. A. G. Barto, R. S. Sutton, and C. W. Anderson, “Neuronlike elements that can solve difficult learning control problems,” IEEE Transactions on Systems Man, and Cybernetics, vol. SMC-13, pp. 835–846, 1983. 7. T. Ba¸sar and P. Bernhard, H∞ Optimal Control and Related Minimax Design Problems, Birkh¨auser, Boston, 1995. 8. T. Ba¸sar and G. J. Olsder, Dynamic Noncooperative Game Theory, SIAN, Philadelphia, 1999. 9. T. Landelius, Reinforcement Learning and Distributed Local Model Synthesis, PhD Dissertation, Linkoping University, Sweden, 1997. 10. J. W. Brewer, “Kronecker products and matrix calculus in system theory,” IEEE Trans. Circuit System, vol. CAS-25, No. 9, 1978. 11. F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley, New York, 1995. 12. C. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, Cambridge, England, 1989. 13. P. J. Werbos, “Neural networks for control and system identification,” Heuristics, Vol. 3, No. 1, pp. 18–27, 1990.
14. P. J. Werbos, “A menu of designs for reinforcement learning over time,” Neural Networks for Control, pp. 67–95, ed. W. T. Miller, R. S. Sutton, and P. J. Werbos, MIT Press, Cambridge, MA, 1991.
15. R. Howard, Dynamic Programming and Markov Processes, MIT Press, Cambridge, MA, 1960.
16. P. J. Werbos, “Approximate dynamic programming for real-time control and neural modeling,” Handbook of Intelligent Control, ed. D. A. White and D. A. Sofge, Van Nostrand Reinhold, New York, 1992.
17. W. Lin and C. I. Byrnes, “H∞ control of discrete-time nonlinear system,” IEEE Transactions on Automatic Control, vol. 41, no. 4, pp. 494–510, 1996.
18. M. L. Littman, “Value-function reinforcement learning in Markov games,” Journal of Cognitive Systems Research, vol. 2, pp. 55–66, 2002.
19. J. Hu and M. P. Wellman, “Multiagent reinforcement learning: Theoretical framework and an algorithm,” International Conference on Machine Learning, pp. 242–250, 1998.
20. D. Prokhorov and D. Wunsch, “Adaptive critic designs,” IEEE Transactions on Neural Networks, vol. 8, no. 5, pp. 997–1007, 1997.
21. D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.
22. K. S. Narendra and F. L. Lewis, “Special Issue on neural network feedback control,” Automatica, vol. 37, no. 8, 2001.
23. M. Abu-Khalaf, F. L. Lewis, and J. Huang, “Hamilton-Jacobi-Isaacs formulation for constrained input nonlinear systems,” in 43rd IEEE Conference on Decision and Control, vol. 5, pp. 5034–5040, 2004.
24. B. Widrow, N. Gupta, and S. Maitra, “Punish/reward: Learning with a critic in adaptive threshold systems,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-3, pp. 455–465, 1973.
25. W. H. Kwon and S. Han, Receding Horizon Control, Springer-Verlag, London, 2005.
26. B. Stevens and F. L. Lewis, Aircraft Control and Simulation, 2nd edition, John Wiley, New York, 2003.
27. F. L. Lewis, Optimal Estimation, John Wiley, New York, 1986.
28. F. L. Lewis, Applied Optimal Control and Estimation, Prentice-Hall, Englewood Cliffs, NJ, 1992.
29. J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks, “Adaptive dynamic programming,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 32, no. 2, pp. 140–153, 2002.
30. MATLAB™ 7th edition, The MathWorks Inc., 2005.
5 Optimization and Distributed Control for Fair Data Gathering in Wireless Sensor Networks
Avinash Sridharan and Bhaskar Krishnamachari
CONTENTS
5.1 Introduction
5.2 Fair Data Gathering Problem
5.2.1 Modeling Wireless Receiver Bandwidth Consumption
5.2.2 Formulating the Constrained Optimization Problem
5.3 A Primal-Based Distributed Heuristic
5.3.1 Performance of the Primal-Based Algorithm
5.4 Dual-Based Approach
5.4.1 Distributed Algorithm
5.4.2 Performance Evaluation
5.5 Conclusions
References
5.1 Introduction
Wireless sensor networks are complex systems consisting of large numbers of small devices, each capable of a combination of radio communication, computation, sensing, and actuation [1]. They are able to provide a fine granularity of spatiotemporal monitoring at large scale. Applications envisioned for these next-generation networks range from military applications and ecological studies to civil structure monitoring and industrial process control. It is believed that pervasive deployments of such autonomous embedded networked sensing systems will constitute an important milestone in the information revolution [2]. The crucial engineering challenge in the development of protocols for these novel networks is that they are characterized by severe resource constraints. Precisely because the deployments need to be large in scale,
considerations including economics and ease of deployment restrict the capability of each individual node. In particular, individual sensor nodes are often characterized by significant limits on energy, bandwidth, storage, and computational ability. This has two important interrelated implications for design: (1) control algorithms are needed to allocate and manage the available resources efficiently while enabling the deployed network to perform its intended monitoring application, and (2) these resource-allocation algorithms must be efficient in their own operation (there is no point designing an inefficient algorithm to efficiently allocate resources!). Researchers have been hard at work gaining a better understanding of how to model wireless sensor networks and how to develop scalable protocols for them. Various practical algorithms have been proposed for a range of important tasks in wireless sensor networks, including deployment, node localization, time synchronization, medium access, routing, data-centric storage, and querying. Despite the large and growing research literature on these topics (see, for instance, the survey in Reference [1]), there is still very much a large gap between theory and practice, particularly when it comes to the design of higher-layer network protocols. The prevailing methodology for protocol design in this context is a bottom-up intuitive engineering approach, not a top-down process guided by solid mathematical understanding. One area of research where this gap is starting to narrow is the development of a network utility maximization/flow optimization approach for various problems pertaining to data gathering. 
Generally, in these problems, a high-level network utility optimization goal (such as maximizing the lifetime of the network, the amount of data collected, or some fairness measure) is defined in terms of the data flow from sensor sources, taking into account resource constraints (such as energy or bandwidth constraints) on intermediate nodes and links. At a minimum, this approach helps in identifying optimal flows from a centralized perspective, providing a benchmark for more scalable distributed approaches. But beyond this, it is also sometimes possible to develop distributed algorithms directly based on the optimization formulation. The design of control protocols for data networks based on distributed solutions to convex optimization problems was first advocated in the famous work by Low and Lapsley [3], in the context of traditional wired networks. A thorough tutorial on distributed convex optimization for cross-layer protocol design in both wired and wireless networks is presented by Chiang et al. [4]. This approach is often characterized by the use of a distributed gradient search based on the Lagrange dual of the original problem formulation. Such distributed convex optimization-based algorithms for multihop wireless networks are presented in some recent works by Chiang [5], and Wang and Kar [6]. In the specific context of wireless sensor networks, there is a closely related work by Ye and Ordonez [7], where a distributed dual-based gradient search algorithm is proposed for the problem of maximizing data extraction under energy constraints.
In this chapter, we illustrate this approach by developing a distributed algorithm for fair data gathering in wireless sensor networks. We had previously treated this problem from the point of view of developing a time-division multiple access mechanism for fair data gathering [8]; that work is the basis of the primal-based algorithm we shall present in this chapter. Recently, Rangwala et al. [9] have developed, implemented, and experimentally studied a practical distributed rate control protocol called IFRC (interference-aware rate control protocol) for this very problem; however, their design approach is not based on a formal distributed optimization framework. This chapter is organized as follows. In the next section, we describe and formalize the problem as a linear program, discussing along the way our model of wireless receiver bandwidth, which is crucial to the formulation. In Section 5.3 we present a primal distributed heuristic that serves as a baseline for comparison with the dual-based distributed gradient search algorithm that we develop and evaluate in Section 5.4. Finally, we present brief concluding comments and discuss our future work in Section 5.5.
5.2 Fair Data Gathering Problem
The problem we wish to address is as follows: a set of nodes in the wireless sensor network are all trying to send data to a single sink through the shortest path tree. Every receiver in the network is bandwidth constrained, and hence every node in the network must be allocated a rate (for sourcing and relaying data) that does not exceed the available bandwidth at any receiver. The goal is to maximize the utilization of the network capacity while maintaining a fair bandwidth allocation to all source nodes. We formulate the objective function as a linear combination of minimum rate and sum rate.

5.2.1 Modeling Wireless Receiver Bandwidth Consumption

As a first step we need to develop a model that can capture the consumption of a receiver’s bandwidth by various flows traversing this receiver on their path to the sink. In wireless networks there are two factors that contribute to the consumption of bandwidth at a receiver. The first is the bandwidth consumed by flows belonging to the children of the receiver. The second is the bandwidth consumed by flows generated by neighbors that are not destined to this receiver; this is the interference perceived at the receiver.

We denote the set of all communication links in the network by the set E, and the set of all nodes by the set V. The set E is a union of two sets e and n: e ⊂ E is the set of links that connect a child to a parent, and n ⊂ E is the set of links that connect two neighbors (that do not have a parent–child relationship). Every receiver in the network has a finite receiver bandwidth capacity given by the set B. Thus, the network graph G can be represented by a three-tuple (E, V, B). The routing tree rooted at the sink is denoted by T ⊂ G. Note that only the edges in e are part of T.

One important aspect of this model is that we explicitly separate the links that connect two nodes with a parent–child relationship (set e) from the links that connect nodes without a parent–child relationship (set n). The relevance of this separation is as follows: the flows from the source to the sink are carried over the set of links e. Hence the rate allocated to each $e_{ij} \in e$ is equal to the bandwidth consumed at a receiver (i) by the flows of the child (j) connected to the receiver through that specific link $e_{ij}$. Also, a receiver is able to hear flows that are being sent by its neighbors to their parents over the set of links n. These flows are not destined for the receiver but, due to the broadcast nature of the wireless domain, will interfere with the receiver’s reception of the flows that it needs to hear from its children. Thus, the rates allocated to the set of links n represent the bandwidth wasted at the receiver due to interference. Figure 5.1 illustrates the model with an example.

FIGURE 5.1 Six-node topology.

The separation of the links into the two sets e and n thus gives us a simple model to quantify the consumption of bandwidth at a receiver. The bandwidth constraint at the receiver i would then be

$$\sum_{j \in C_i,\; e_{ij} \in e} r_j \;+ \sum_{k \in N_i,\; n_{ki} \in n} r_k \;+\; r_{src}^{(i)} \;\le\; B^{(i)}$$

where $C_i$ is the set of all immediate children of i, $N_i$ is the set of all neighbors of i, and $r_{src}^{(i)}$ is the source rate allocated to node i (if node i is a source). To make the description of our model more explicit, let us calculate the bandwidth constraint of node 2 in the six-node topology presented in Figure 5.1. In Figure 5.1 the links belonging to the set e are marked with a solid line and the noise links belonging to the set n are marked with a dashed line. Thus, for node 2 the bandwidth constraint would be given by:

$$r_3 + r_4 + r_5 + r_{src}^{(2)} \le B^{(2)}$$

where $r_3 = r_{src}^{(3)} + r_{src}^{(6)}$, $r_4 = r_{src}^{(4)}$, and $r_5 = r_{src}^{(5)}$.
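The receiver constraint above can be checked numerically with a short sketch. The code below is illustrative only: the function names and the six-node layout are assumptions (the layout is one arrangement consistent with the node-2 constraint, not read off Figure 5.1), and every source is given a unit rate.

```python
def output_rate(node, children, src_rate):
    """Rate a node transmits: its own source rate plus all traffic it relays."""
    relayed = sum(output_rate(c, children, src_rate) for c in children.get(node, []))
    return src_rate.get(node, 0.0) + relayed


def receiver_load(node, children, noise_neighbors, src_rate):
    """Left-hand side of the bandwidth constraint at receiver `node`."""
    from_children = sum(output_rate(j, children, src_rate)
                        for j in children.get(node, []))
    interference = sum(output_rate(k, children, src_rate)
                       for k in noise_neighbors.get(node, []))
    return from_children + interference + src_rate.get(node, 0.0)


# Hypothetical six-node layout (an assumption, not taken from Figure 5.1):
# node 1 is the sink; 3 is a child of 2; 6 is a child of 3; nodes 4 and 5
# are children of the sink whose transmissions node 2 overhears.
children = {1: [2, 4, 5], 2: [3], 3: [6]}
noise_neighbors = {2: [4, 5], 4: [2], 5: [2]}
src_rate = {n: 1.0 for n in (2, 3, 4, 5, 6)}

# r3 + r4 + r5 + r_src(2), with r3 = r_src(3) + r_src(6)
load_at_2 = receiver_load(2, children, noise_neighbors, src_rate)
print(load_at_2)
```

Comparing `load_at_2` against the capacity of node 2 then tells whether the allocation is feasible at that receiver.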
5.2.2 Formulating the Constrained Optimization Problem

We now formulate our constrained optimization problem. Our model of the receiver bandwidth consumption from Section 5.2.1 gives us the relationship between the rate allocated to the sources and the receiver bandwidth. Before stating the constrained optimization problem, let us define some additional notation:

• B: an n × 1 vector representing the bandwidth available at each node i ∈ V
• Rsrc: an n × 1 vector representing the rate allocated to each source i ∈ V
• N: an n × n matrix that denotes the presence of a noise edge $n_{ij} \in n$ between two nodes i, j ∈ V
• C: an n × n matrix with entries $c_{ij} = 1$ if j has i in its path, and $c_{ij} = 0$ if j does not have i in its path
• Y: a scalar that acts as a slack variable for our objective function.

We define the optimization problem as follows:

$$
\begin{aligned}
\mathbf{P1}:\quad \max\ \ & Y + \sum_{i \in T} r_{src}^{(i)} \\
\text{subject to}\ \ & \mathbf{1}^T \times R_{in} + N \times R_{noise} \prec B \\
& R_{in} = C \times R_{src} \\
& R_{noise} = C \times R_{src} + R_{src} \\
& r_{src}^{(i)} \ge Y \\
& R_{src} \prec B.
\end{aligned}
$$

The constraints of our optimization problem come directly from our bandwidth consumption model presented in Section 5.2.1 and our application and network-level parameters represented by the matrices B, Rsrc, N, and C. The new variables that have been introduced are the matrices Rin and Rnoise. Rin represents the total input rate incident on a receiver from all its child nodes. Rnoise represents the total rate that is incident on each receiver from its neighbors, i.e., this is the total bandwidth at the receiver that is lost due to interference.
FIGURE 5.2 Nine-node topology.
Note that the noise bandwidth $r_{noise}^{(i)} \in R_{noise}$ represents the total outgoing bandwidth of node i ∈ V. The objective function is given here as the sum of the minimum rate and the sum rate; it can be extended easily to an arbitrarily weighted linear combination (to consider different trade-offs between fairness and total throughput). The optimization problem is a linear program. Hence, we can obtain a solution to the rate allocation problem by using an appropriate numerical linear program solver. We use the topology presented in Figure 5.2 as an example and give the solutions obtained by running the linear program through a linear program solver in Table 5.1. The receiver bandwidth capacity is presented in Table 5.2. We consider two cases: the heterogeneous case, where at least one node has a receiver bandwidth capacity lower than all the other nodes in the network, and the homogeneous case, where all receivers have the same bandwidth capacity.

TABLE 5.1
Optimal Source Rate Allocation for a Nine-Node Topology, Homogeneous and Heterogeneous
Node    Homogeneous    Heterogeneous
2       2.5            1.66
3       2.5            1.66
4       2.5            4.0
5       2.5            1.66
6       2.5            1.66
7       2.5            1.66
8       2.5            4.0
9       2.5            4.0

TABLE 5.2
Receiver Bandwidth Capacity for Nine-Node Topology, Homogeneous and Heterogeneous
Node    Homogeneous    Heterogeneous
1       20             20
2       20             8
3       20             20
4       20             20

The solutions obtained by solving the optimization problem give us the following insights:

• For the homogeneous case the node with the bottleneck bandwidth would always be the root. The root has the maximum number of flows incident on it and hence the maximum fair bandwidth allocation it can perform is

$$\frac{B^{(root)}}{\sum_{\forall j,\; i = root} c_{ij}}.$$

• For the heterogeneous case the node with the bottleneck bandwidth would be the node that has the least bandwidth per flow allocation capacity. The bandwidth per flow allocation can then be obtained by dividing the capacity of the receiver by the sum of the number of children incident on the receiver and the sum of all the children of its neighbors:

$$\frac{B^{(i)}}{\sum_{\forall j,\; i \ne j} c_{ij} \;+ \sum_{\forall j \in N(i)} \sum_{\forall k,\; j \ne k} c_{jk}}.$$

Thus, for the heterogeneous case, every source would have to be given a rate at least equal to that offered by the bottlenecked node. The remaining capacity of the network can then be distributed among nodes that can still have a higher rate (those whose flows do not traverse the bottlenecked node) to maximize the capacity utilization.

Solving the optimization problem in a centralized manner for a wireless network presents some problems. It cannot cater for a dynamic environment where the receiver bandwidth capacity might be changing, or the topology itself might be changing due to node failures. Moreover, the constraints in the optimization problem are related to the number of flows active in the network, which could also change dynamically. In these cases information would have to be repeatedly sent from the nodes in the network to the root and the result recalculated and propagated to the children. Such a centralized solution is therefore not scalable, motivating us to explore distributed approaches.
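The bottleneck reasoning above can be illustrated in matrix form. The sketch below is an assumption-laden illustration rather than the authors' solver: it evaluates the per-receiver load as C·Rsrc + N·(C·Rsrc + Rsrc) + Rsrc (own traffic, relayed traffic, and overheard traffic), takes C to exclude the diagonal, and uses a hypothetical four-node topology. Because the load is linear in the rates, the largest rate that can be given equally to every source is the smallest ratio of capacity to unit-rate load.

```python
def matvec(M, v):
    """Plain-Python matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def receiver_loads(C, N, r_src):
    """Bandwidth consumed at every receiver for the source rates r_src."""
    r_in = matvec(C, r_src)                       # traffic relayed from descendants
    r_out = [a + b for a, b in zip(r_in, r_src)]  # R_noise = C R_src + R_src
    overheard = matvec(N, r_out)                  # interference from neighbors
    return [a + b + c for a, b, c in zip(r_in, overheard, r_src)]


def equal_rate_bottleneck(C, N, B, is_source):
    """Largest common rate that keeps every receiver within its capacity."""
    unit = [1.0 if s else 0.0 for s in is_source]
    loads = receiver_loads(C, N, unit)
    return min(b / load for b, load in zip(B, loads) if load > 0)


# Hypothetical topology (an assumption): node 0 is the sink, nodes 1 and 2
# are its children, node 3 is a child of node 1, and nodes 1 and 2 overhear
# each other. C[i][j] = 1 when node i lies on the path from j to the sink.
C = [[0, 1, 1, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
N = [[0, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
B = [12.0, 6.0, 12.0, 12.0]           # node 1 has the smallest capacity
is_source = [False, True, True, True]

unit_loads = receiver_loads(C, N, [0.0, 1.0, 1.0, 1.0])
fair_rate = equal_rate_bottleneck(C, N, B, is_source)
print(unit_loads, fair_rate)
```

In this example node 1 is the bottleneck; sources whose flows do not load node 1 could, as described above, later be raised beyond the common rate.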
5.3 A Primal-Based Distributed Heuristic
Consider our optimization problem. If we start increasing the source rate of source i ∈ V, we will also start increasing the terms $r_{in}^{(j)}$, as well as the terms $r_{noise}^{(i)}$, for all j that lie in the path of i to the root. This in turn implies an increase in the bandwidth consumption for all these nodes. Hence, if we iteratively increase the source bandwidths, we can continue our increments until we have consumed the bandwidth at every bottleneck receiver. We could formulate the above logic as a distributed solution by making the parents, children, and neighbors exchange messages, informing one another of their current available bandwidth. Whenever a child receives a message from a parent, or a neighbor receives a message from another neighbor, about the unavailability of bandwidth, the child or neighbor should stop incrementing its rate and also inform all its children about the unavailability of receiver bandwidth.

This heuristic distributed algorithm would be iterative in nature. At each iteration the root node would send messages to the child nodes (which would then propagate the message through the tree) to increment their rate by an amount ε. At the end of each iteration, nodes would send their total output rate to their parents. Each parent in turn would compare the total bandwidth consumed by itself, its children, and its neighbors with the available bandwidth. The output rate of each node (which is treated as “noise” for any neighboring node not logically linked to this node on the tree) would be calculated as follows:

$$r_{output}^{(i)} = r_{noise}^{(i)} = r_{in}^{(i)} + r_{src}^{(i)}.$$

At the end of each iteration every node would compare the total bandwidth consumed with the total receiver bandwidth as follows:

$$B_{pending}^{(i)} = B^{(i)} - r_{in}^{(i)} - r_{src}^{(i)} - \sum_{j \in N(i)} r_{noise}^{(j)}$$

where $\sum_{j \in N(i)} r_{noise}^{(j)}$ is the total noise bandwidth incident on the node from its neighbors. If $B_{pending}^{(i)} \le 0$, node i would send a “CONSTRAIN” message to all its children as well as its neighbors. Thus, the termination condition for the algorithm would be when all the nodes in the network have been constrained. The pseudo code for the algorithm has been presented in Figure 5.3.
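The heuristic described above can be sketched as a centralized simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: message passing is replaced by a global loop, the function names and the six-node topology are hypothetical, and a saturated receiver constrains itself, its subtree, and its neighbors together with their subtrees.

```python
def descendants(node, children):
    """All nodes in the subtree below `node` (excluding `node` itself)."""
    found, stack = [], list(children.get(node, []))
    while stack:
        n = stack.pop()
        found.append(n)
        stack.extend(children.get(n, []))
    return found


def primal_heuristic(children, noise, capacity, sources, eps=0.01, max_iter=100000):
    """Raise every unconstrained source by eps until all sources are constrained."""
    rate = {s: 0.0 for s in sources}
    constrained = {s: False for s in sources}

    def out_rate(n):  # r_output = r_in + r_src
        return rate.get(n, 0.0) + sum(rate.get(d, 0.0) for d in descendants(n, children))

    for _ in range(max_iter):
        if all(constrained.values()):
            break
        for s in sources:
            if not constrained[s]:
                rate[s] += eps
        for i, cap in capacity.items():
            r_in = sum(out_rate(c) for c in children.get(i, []))
            r_noise = sum(out_rate(k) for k in noise.get(i, []))
            pending = cap - r_in - rate.get(i, 0.0) - r_noise
            if pending <= 0:  # receiver i is saturated: send "CONSTRAIN"
                affected = [i] + descendants(i, children)
                for k in noise.get(i, []):
                    affected += [k] + descendants(k, children)
                for n in affected:
                    if n in constrained:
                        constrained[n] = True
    return rate


# Hypothetical topology (an assumption): sink 1 with children 2 and 3;
# node 2 has children 4 and 5; node 3 has child 6; nodes 2 and 3 overhear
# each other. Every receiver has capacity 10.
children = {1: [2, 3], 2: [4, 5], 3: [6]}
noise = {2: [3], 3: [2]}
capacity = {n: 10.0 for n in range(1, 7)}
rates = primal_heuristic(children, noise, capacity, sources=[2, 3, 4, 5, 6])
print(rates)
```

With five unit flows loading each of nodes 1, 2, and 3, the equal-rate optimum for this layout is 10/5 = 2, and the simulated rates stop within one increment ε of that value.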
Step 1: Parent initiates message to child to increment bandwidth.
Step 2: Termination condition:
    If (constrained_i = TRUE, ∀i) end;
Step 3: Rate update:
    For ∀i
        If (constrained_i = FALSE)
            r_src^(i) = r_src^(i) + ε;
Step 4: Checking bandwidth constraints:
    If (r_in^(i) + r_noise^(i) ≥ B^(i))
        constrained_i = TRUE;
        For ∀j, j ∈ C_i
            constrained_j = TRUE;
        For ∀j, j ∈ N_i
            constrained_j = TRUE;
            For ∀k, k ∈ C_j
                constrained_k = TRUE;
    Else
        inform parent and all neighbors of the o/p bandwidth
Step 5: goto Step 1

FIGURE 5.3 Primal-based heuristic algorithm.

5.3.1 Performance of the Primal-Based Algorithm

Because the distributed algorithm is a heuristic, we need to see how close its performance is to the optimum. We use the nine-node topology presented in Figure 5.2 to test the performance of our heuristic. As is evident in Figure 5.4 and Figure 5.5 for the nine-node topology, the algorithm is able to achieve the optimum solution. The accuracy of the solution is solely dependent on the choice of ε. The smaller the value of ε, the closer the rate allocation achieved is to the optimum. At the same time the value of ε controls the rate of convergence of the algorithm. Thus, the tuning of the parameter ε presents a trade-off between the speed of convergence and accuracy. Apart from the choice of ε, another drawback of the algorithm is that the source rates can only be increased and not decreased. Thus, if the topology changes or the number of flows in the network changes we would have to restart the algorithm to make the source nodes converge to the optimum, possibly an expensive proposition. Hence, this heuristic is not flexible enough to handle such dynamics. This motivates us to explore another approach.

FIGURE 5.4 Bandwidth allocation by the heuristic when all receivers have equal bandwidth capacity. (Plot of source bandwidth allocated versus number of iterations, ε = 0.01; series: Src 2.)

FIGURE 5.5 Bandwidth allocation on a nine-node topology with the heuristic when receivers have different bandwidth capacities. (Plot of source bandwidth allocated versus number of iterations, ε = 0.01; series: Src 2, Src 4.)
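As a rough check on this trade-off (our own arithmetic, assuming every source starts from a rate of zero and gains ε per iteration while unconstrained), a source whose final rate is $r^*$ needs about $r^*/\varepsilon$ iterations and stops within ε of its bottleneck value. With ε = 0.01 and the optimal rates of Table 5.1:

$$\frac{r^*}{\varepsilon} = \frac{2.5}{0.01} = 250 \ \text{(homogeneous case)}, \qquad \frac{r^*}{\varepsilon} = \frac{4.0}{0.01} = 400 \ \text{(largest heterogeneous rate)}.$$

These counts are consistent with the iteration ranges shown on the axes of Figure 5.4 and Figure 5.5. Halving ε would halve the worst-case error but double the number of iterations.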
5.4 Dual-Based Approach
In Section 5.3 we presented a heuristic for solving our constrained optimization problem. In this section, we present an alternative, more rigorously derived, dual-based approach that gives an economic interpretation of the problem in terms of shadow prices. The constrained optimization problem P1 presented in Section 5.2.2 is our primal problem. In our approach we will derive its Lagrange dual and then work with the dual to come up with a distributed algorithm using the shadow price interpretation of the Lagrange multipliers in the dual. To aid our solution we can simplify our primal further. We can perform the simplification by rewriting the primal in terms of Rsrc and Y as follows:

$$
\begin{aligned}
\mathbf{P2}:\quad \max\ \ & Y + \sum_{i \in T} r_{src}^{(i)} \\
\text{subject to}\ \ & \sum_{j \in C(i)} r_{src}^{(j)} + \sum_{j \in N(i)} \sum_{k \in C(j)} r_{src}^{(k)} + \sum_{j \in N(i)} r_{src}^{(j)} + r_{src}^{(i)} \le B^{(i)} \quad \forall i \in T \\
& r_{src}^{(i)} \ge Y \quad \forall i \in T \\
& R_{src} \prec B.
\end{aligned}
$$
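In this section, we consider the Lagrange dual function of the primal P2 (see Reference [10] for a treatment of Lagrange duality theory). Because P2 is a linear program, from the duality theorem [10] it can be seen that the duality gap of the primal and the dual would be zero. We consider the maximum bandwidth constraint as our domain and relax the constraints in the primal to obtain the Lagrange dual. The Lagrange relaxation is as follows:

$$L(Y, R_{src}, \lambda, \upsilon) = Y + \sum_{i \in T} r_{src}^{(i)} - \lambda^T \left( C \times R_{src} + N \times R_{src} + N \times C \times R_{src} + R_{src} - B \right) - \upsilon^T \left( -R_{src} + Y \right).$$

Hence, the Lagrange dual function is

$$
\begin{aligned}
D(\lambda, \upsilon) &= \max_{R_{src} \prec B} \Bigg( Y + \sum_{i \in T} r_{src}^{(i)} - \sum_{i \in T} \lambda_i \Big( \sum_{j \in C(i)} r_{src}^{(j)} + \sum_{j \in N(i)} \sum_{k \in C(j)} r_{src}^{(k)} + \sum_{j \in N(i)} r_{src}^{(j)} + r_{src}^{(i)} - B^{(i)} \Big) - \sum_{i \in T} \upsilon_i \left( -r_{src}^{(i)} + Y \right) \Bigg) \\
&= \max_{R_{src} \prec B} \Bigg( Y \Big( 1 - \sum_{i \in T} \upsilon_i \Big) + \sum_{i \in T} r_{src}^{(i)} \Big( 1 + \upsilon_i - \Big( \sum_{j:\, i \in C(j)} \lambda_j + \sum_{j:\, i \in C(j)} \sum_{k \in N(j)} \lambda_k + \sum_{j:\, i \in N(j)} \lambda_j + \lambda_i \Big) \Big) + \sum_{i \in T} \lambda_i B^{(i)} \Bigg).
\end{aligned}
$$

Let

$$\zeta^{(i)}(R_{src}^{*}) = \sum_{j \in C(i)} r_{src}^{(j)*} + \sum_{j \in N(i)} \sum_{k \in C(j)} r_{src}^{(k)*} + \sum_{j \in N(i)} r_{src}^{(j)*} + r_{src}^{(i)*}.$$

From the Lagrange dual function it can be seen that the subgradient with respect to $\lambda_i$ is

$$\frac{\partial L}{\partial \lambda_i} = -\left( \zeta^{(i)}(R_{src}^{*}) - B^{(i)} \right)$$

and the subgradient with respect to $\upsilon_i$ is

$$\frac{\partial L}{\partial \upsilon_i} = r_{src}^{(i)*} - Y^{*}.$$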
The objective here is to maximize the Lagrange dual function. If we trace the graph in the direction of the negative gradient we are bound to hit the optimum, because the dual is a linear equation. We will use this to develop our distributed algorithm.

5.4.1 Distributed Algorithm

In this section, we describe our distributed algorithm that is based on the maximization of the Lagrange dual function. In the Lagrange dual function, the coefficient of $r_{src}^{(i)}$, denoted by $\mu_i$, is given by:

$$\mu_i = 1 + \upsilon_i - \Big( \sum_{j:\, i \in C_j} \lambda_j + \sum_{k:\, j \in N_k,\, i \in C_j} \lambda_k + \sum_{j:\, i \in N_j} \lambda_j + \lambda_i \Big).$$

Also, because the Lagrange dual function is a linear combination of ($r_{src}^{(i)}$, $\mu_i$, and $\lambda_i$), as long as we keep incrementing the $r_{src}^{(i)}$ whose coefficients are positive, and decrementing the $r_{src}^{(i)}$ whose coefficients are negative, we will tend toward the maximum. Let us assume we increment all source rates (this increment could be the same as that used in our heuristic algorithm). At what point does the coefficient of a particular node i become negative, which would imply that the source rate of that node should be decremented? The answer lies in our observations of the subgradients of the shadow prices. As mentioned earlier, if we trace the graph represented by the Lagrange dual function, for a fixed value of $r_{src}^{(i)}$, we will maximize the Lagrange dual function by moving in the direction of the negative gradient. In the subgradient technique the update of the shadow prices itself would be performed as follows:

$$\lambda_i(t+1) = \left[ \lambda_i(t) + \alpha_t \left( \zeta^{i}(R_{src}^{*}) - B^{i} \right) \right]^{+}$$

$$\upsilon_i(t+1) = \left[ \upsilon_i(t) + \alpha_t \left( r_{src}^{(i)*} - Y^{*} \right) \right]^{+}$$

where $R_{src}^{*}$ and $Y^{*}$ are the values that maximize the Lagrange function for the shadow prices at the t-th iteration. Given this update mechanism, it can be seen that if $[\zeta^{i}(R_{src}^{*}) - B^{i}]$ is consistently positive, the value of $\lambda_i$ will keep increasing. Also to be noted is the fact that $\lambda_i$ is part of the negative term in the coefficient of every $r_{src}^{(j)}$ with $j \in C_i$, $j \in N_i$, or $j \in C_k,\ k \in N_i$; that is, $\lambda_i$ affects the coefficients of the source rates of all nodes j for which either i is an intermediate node on the path from j to the sink, or i is a neighbor of node j, or i is a neighbor of a node k that is on the path of j to the sink.
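The interplay between the shadow price $\lambda_i$ and the coefficient $\mu_i$ can be illustrated with a small sketch. It is a hedged illustration only: the function names, the four-node layout, the prices, and the step size are assumptions, and `through[j]` is taken to be the set of nodes whose path to the sink passes through j. Only the $\lambda$ update is shown.

```python
def coefficient(i, lam, ups, through, neighbors):
    """mu_i = 1 + ups_i minus the prices of every receiver that flow i loads."""
    on_path = [j for j in lam if i in through.get(j, ())]            # i in C_j
    price = sum(lam[j] for j in on_path)
    price += sum(lam[k] for j in on_path for k in neighbors.get(j, ()))
    price += sum(lam[j] for j in lam if i in neighbors.get(j, ()))   # i in N_j
    price += lam[i]
    return 1.0 + ups[i] - price


def update_price(lam_i, load_i, cap_i, step):
    """Projected subgradient step: the price rises while the receiver is overloaded."""
    return max(0.0, lam_i + step * (load_i - cap_i))


# Hypothetical layout (an assumption): sink 1; nodes 2 and 4 are children
# of 1; node 3 is a child of 2; nodes 2 and 4 overhear each other.
through = {1: {2, 3, 4}, 2: {3}}
neighbors = {2: {4}, 4: {2}}
lam = {1: 0.2, 2: 0.5, 3: 0.0, 4: 0.1}
ups = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}

mu_before = coefficient(3, lam, ups, through, neighbors)  # positive: keep increasing
lam[2] = update_price(lam[2], load_i=12.0, cap_i=10.0, step=0.3)  # node 2 overloaded
mu_after = coefficient(3, lam, ups, through, neighbors)   # negative: back off
print(mu_before, mu_after)
```

Once the overloaded receiver's price has grown enough, the coefficient of every source that loads it changes sign, which is the signal for those sources to reduce their rates.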
Thus, if we allocate the source rates in such a manner that we exceed the bandwidth constraints at a particular receiver i, it makes the gradient of $\lambda_i$ positive, which increases the value of $\lambda_i$. Eventually the value of $\lambda_i$ will be large enough to make the coefficients of all $r_{src}^{(j)}$ that contain $\lambda_i$ negative. This in turn would result in a negative reinforcement to the affected nodes, forcing them to reduce their rates. Similar to the primal heuristic, here too we see that the sources whose flows are consuming bandwidth at the constrained receiver are the ones that receive a signal. In this case, however, our analysis of the update mechanism for the subgradient gives us a clear estimate of the decrement required to be applied to the source rate of a constrained receiver. Hence, unlike in the case of the primal-based heuristic, we need not be extremely cautious about incrementing the source rates to avoid exceeding the receiver bandwidth at a constrained node. We now present an algorithm designed on the above principles in Figure 5.6. The algorithm proceeds as follows. In step 1 we perform a breadth-first search of the tree and at every node initialize the available bandwidth. The available bandwidth is either the receiver bandwidth at that node divided by the number of children at that node or its parent’s available bandwidth, whichever is smaller. In step 2 we initialize the source rate of all sources in the network to the available bandwidth at that source. In step 3 we calculate the pending bandwidth at each node in the network. The pending bandwidth is the difference of the receiver bandwidth capacity and the total bandwidth consumed at the receiver by all flows originating either at the children or from neighboring nodes. In step 4 for every node in the network we look at the pending bandwidth of nodes lying in the path from the specific node in question to the sink. We also consider the pending bandwidth of neighbors of these intermediate nodes.
The pending bandwidth at the specific node in question is then the minimum of all these bandwidths. If the pending available bandwidth is negative, this implies that a constraint has been violated at one of the nodes on the path or the nodes neighboring the path from the source to the sink. Hence the source rate of the node is set to the available bandwidth of this constrained node, and a flag is set implying that the node is constrained. In case the pending available bandwidth is positive, we go ahead and increment the rate of the source node by the pending bandwidth. In step 5 we check the constrained flag for all nodes in the network and if all nodes have been constrained the program terminates, else we repeat the algorithm from step 3. 5.4.2 Performance Evaluation The simulation results of the algorithm are shown in Figure 5.7 and Figure 5.8. As can be seen, the algorithm allows the source bandwidth of nodes to converge to the optimal value within 10 iterations. The fast convergence to the optimal value compared to the heuristic presented in Section 5.3
P1: Binaya Dash November 16, 2007
15:30
7985
7985˙C005
174
Modeling and Control of Complex Systems
Step 1: Perform a breadth first search of the tree, and set the available bandwidth at each node i as follows: If Bavailablei < Bavailablej i C j Bavailablei
=
Bi Ci
Else Bavailablei = Bavailablej Step 2: Initialization For i (i) = Bavailable rsrc i
constrainedi = FALSE Step 3: Pending Bandwidth Calculation For i
Bpending i = ζ i(Bsrc ) – Bi Step 4: Updating Source bandwidth: For i
Pending _bandwidth = min(Bpending j), i constrained _node = arg min(Bpending j ), i
Cj i Cj i
Nj j Nj j
N k, i N k, i
Ck Ck
If ( pending _ bandwidth < 0) (i) = Bavailable rsrc constrained_ node
constrainedi = TRUE else constrainedi = FALSE (i) = r (i) + pending _ bandwidth rsrc src
Step 5: Checking termination condition: If (constrainedi = TRUE) i end else goto Step 3
FIGURE 5.6 Dual-based algorithm.
is a result of the explicit knowledge that nodes get, of how much decrement needs to be applied to their source bandwidth, when one of the receivers becomes bandwidth constrained. Moreover in the improved bandwidth allocation algorithm, there are no iterative increments of source bandwidths (thus removing the effect of ε from the rate of convergence). Apart from the rate of convergence another advantage that the improved algorithm presents
[Plot for Figure 5.7: bandwidth allocated to source nodes Src 2, Src 3, and Src 4 versus number of iterations (0 to 10).]
FIGURE 5.7 Rate allocation by the improved distributed algorithm for nine-node topology with node 2 having the constrained bandwidth.
[Plot for Figure 5.8: bandwidth allocated per node versus number of iterations (0 to 10).]
FIGURE 5.8 Bandwidth allocation by the distributed algorithm for nine-node topology with all nodes having the same receiver bandwidth.
as compared to the heuristic is its ability to adapt to the changes in network topology. By changes to the network topology we imply a deletion or addition of a new source or an intermediate node into the network. In the improved algorithm, we have the ability to decrease the source bandwidth when a bandwidth constraint is violated. At the same time we have the ability to increment source bandwidths when there is available capacity pending in the network. Thus, if a new node were added, adding capacity to the network, in step 4 of the algorithm the nodes that were constrained would become unconstrained and increment their bandwidth to the excess capacity available.
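The rate update of step 4 in Figure 5.6, as described in the text, can be sketched as follows. This is only an illustration under stated assumptions: the function name `update_source_rate` and the dictionary-based inputs are hypothetical, and the set of nodes passed in `pending` is assumed to be the nodes on the source's path to the sink together with their neighbors.

```python
def update_source_rate(rate, pending, available):
    """One Step-4 update for a single source (illustrative sketch).

    rate:      current source rate
    pending:   dict mapping node -> pending bandwidth (Step 3), assumed to
               cover the nodes on the path to the sink and their neighbors
    available: dict mapping node -> available bandwidth (Step 1)
    Returns (new_rate, constrained_flag).
    """
    # The most constrained node is the one with the smallest pending bandwidth.
    constrained_node = min(pending, key=pending.get)
    pending_bandwidth = pending[constrained_node]
    if pending_bandwidth < 0:
        # A constraint is violated: fall back to that node's available bandwidth.
        return available[constrained_node], True
    # Spare capacity remains: increase the rate by the pending bandwidth.
    return rate + pending_bandwidth, False
```

Because the same function both decreases and increases the rate, a source that was constrained can become unconstrained again when capacity is added, which is the adaptation property discussed above.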
5.5 Conclusions
We have presented an illustrative case study showing how a distributed convex optimization framework can be used to design a rate control protocol for fair data gathering in wireless sensor networks. We believe that this kind of systematic modeling and optimization framework represents the future of protocol design in complex wireless networks such as sensor networks. It is therefore of particular interest to see how such theoretically guided approaches will perform in a real-world example. One direction we are currently pursuing is implementing the dual-based algorithm on a real wireless test-bed, to provide a direct comparison with the IFRC protocol proposed by Rangwala et al. [9].
References
1. B. Krishnamachari, Networking Wireless Sensors, Cambridge University Press, Cambridge, 2005.
2. National Research Council Staff, Embedded Everywhere: A Research Agenda for Networked Systems of Embedded Computers, National Academy Press, Washington, DC, 2001.
3. S. H. Low and D. E. Lapsley, “Optimization flow control. I: Basic algorithm and convergence,” IEEE/ACM Transactions on Networking, vol. 7, pp. 861–875, 1999.
4. M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, “Layering as optimization decomposition: A mathematical theory of network architectures,” Proceedings of the IEEE, in press 2007.
5. M. Chiang, “Balancing transport and physical layers in wireless multihop networks: Jointly optimal congestion control and power control,” IEEE J. Sel. Areas Comm., vol. 23, no. 1, pp. 104–116, 2005.
6. X. Wang and K. Kar, “Cross-layer rate control for end-to-end proportional fairness in wireless networks with random access,” Proceedings of ACM MobiHoc, May 2005.
7. W. Ye and F. Ordonez, “A sub-gradient algorithm for maximal data extraction in energy-limited wireless sensor networks,” Proceedings of the International Conference on Wireless Networks, Communications and Mobile Computing, June 2005.
8. A. Sridharan and B. Krishnamachari, “Max-min fair collision-free scheduling for wireless sensor networks,” Proceedings of the IEEE International Conference on Performance, Computing, and Communications, April 2004.
9. S. Rangwala, R. Gummadi, R. Govindan, and K. Psounis, “Interference-aware fair rate control in wireless sensor networks,” Proceedings of ACM SIGCOMM Symposium on Network Architectures and Protocols, September 2006.
10. S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, 2004.
6 Optimization Problems in the Deployment of Sensor Networks
Christos G. Cassandras and Wei Li
CONTENTS
6.1 Introduction
6.2 Sensor Network Structure
6.3 Deployment of Networks with Fixed Data Sources and no Mobility
6.3.1 Problem Decomposition
6.3.2 Incremental Iterative Approach
6.3.2.1 Determining the Bottleneck Node
6.3.2.2 Enumerating Topology Adjustment Options
6.3.2.3 Obtaining a New Deployment
6.4 Deployment of Networks with Unknown Data Sources and Mobile Nodes
6.4.1 Mission Space and Sensor Model
6.4.2 Optimal Coverage Problem Formulation and Distributed Control
6.4.3 Optimal Coverage Problem with Communication Costs
6.5 Research Issues in Sensor Networks
Acknowledgments
References
6.1 Introduction
A sensor network consists of a collection of (possibly mobile) sensing devices that can coordinate their actions through wireless communication and aim at performing tasks such as exploration, surveillance, or monitoring and tracking “target points” over a specific region, often referred to as the “mission space.” Collected data are then further processed and often support
higher-level decision-making processes. Nodes in such networks are generally inhomogeneous; they have limited on-board resources (e.g., power and computational capacity), and they may be subject to communication constraints. It should be pointed out that sensor networks differ from conventional communication networks in a number of critical ways. First, they allow us to interact with the physical world, not just computers, databases, or human-generated data. By inserting decision-making and control functionality into such networks, one can envision closing the loop on remote processes that would otherwise be inaccessible. Thus, sensor networks are expected to realize a long-anticipated convergence of communication, computing, and control [1], [2]. Second, at least some nodes in such a network are “active”; for example, they execute sensing processes or they are mobile. They are therefore characterized by dynamics, making a sensor network as a whole a challenging dynamic system. In addition, nodes are typically small and inexpensive, operating with limited resources, often in adverse stochastic environments. This implies that optimization in designing and operating sensor networks is a real need and not a mere luxury. Moreover, the limited computational capabilities of nodes often make distributed control or optimization methods indispensable. Finally, when it comes to measuring the performance of sensor networks, the metrics can be quite different from those used in standard communication networks, giving rise to new types of problems. For example, because of limited energy, we recognize that nodes have finite lives and we often seek control mechanisms that maximize an appropriately defined “network lifetime.” Part of such mechanisms may involve switching nodes on and off so as to conserve their energy or finding means to periodically replenish their energy supply.
When the nodes are mobile, mechanisms are also needed to determine desired trajectories for the nodes over the mission space and cooperative control comes into play so as to meet specific mission objectives. The performance of a sensor network is sensitive to the location of its nodes in the mission space. This leads to the basic problem of deploying sensors in order to meet the overall system’s objectives. In particular, sensors must be deployed so as to maximize the information extracted from the sensing region while maintaining acceptable levels of communication and energy consumption. The static version of this problem involves positioning sensors without any further mobility; optimal locations for them can be determined by an off-line scheme prior to the deployment, which is akin to the widely studied facility location optimization problem. The dynamic version allows the coordinated cooperative movement of sensors, which may adapt to changing conditions in the sensing region, typically deploying them into geographical areas with the highest information density. This is also referred to as the coverage control or active sensing problem. In this chapter, we describe deployment problems for sensor networks viewed as complex dynamic systems. We first consider a deployment setting where data sources are known. We formulate a minimum-power wireless sensor network deployment problem whose objective is to determine the locations of a given number of relay nodes and the corresponding link flows
in order to minimize the total communication power consumption. This is shown to be a nonlinear optimization problem with a nonconvex cost function, in addition to being combinatorially complex. We describe a solution approach (first presented in Reference [3]) based on an incremental algorithm deploying nodes one at a time into the network. Next, we consider a setting where data sources are unknown. In this case, the sensing field is modeled using a density function representing the probability that specific events take place (e.g., the temperature of a noxious gas exceeds a certain level or an object emits detectable data). We assume that a mobile sensor has a limited range that is defined by a probabilistic model. We then describe a distributed deployment algorithm (first proposed in Reference [4]) applied at each mobile node so that it maximizes the joint detection probabilities of random events. If the sensing field (or our perception of the sensing field) changes over time, the adaptive relocation behavior naturally follows from the optimal coverage formulation. Finally, we incorporate communication cost into the coverage control problem, viewing the sensor network as a multisource, single-base-station data collection network. Communication cost is modeled as the power consumption needed to deliver collected data from sensor nodes (data sources) to the base station using wireless multihop links. Thus, the coverage problem we formulate trades off sensing coverage and communication cost. The rest of the chapter is organized as follows. Section 6.2 describes the basic structure of sensor networks. In Section 6.3, we describe a deployment approach for sensor networks with fixed data sources and no mobility. In Section 6.4 we present a solution approach to the coverage control problem. Finally, in Section 6.5 we outline some fundamental research questions related to sensor networks and the convergence of communication, computing, and control.
6.2 Sensor Network Structure
In its most basic form, the main objective of a sensor network is to collect field data from an observation region (the “mission space”), denoted by R, and route it to a base station, denoted by B (also referred to as “data collection point” or “sink”). At any time instant, there may exist multiple data sources in R (also referred to as “target points” or simply “targets”). Nodes in a sensor network collaborate to ensure that every source is sensed and that the data gathered are successfully relayed to the base station. During cooperation, a sensor node may fall into one of the following states:
1. Sensing: a sensing node monitors the source using an integrated sensor, digitizes the information, processes it, and stores the data in its onboard buffer. These data will eventually be sent back to the base station.
2. Relaying: a relaying node receives data from other nodes and forwards them toward their destination.
3. Sleeping: for a sleeping node, most of the device is either shut down or works in low-power mode. A sleeping node does not participate in either sensing or relaying. However, it “wakes up” from time to time and listens to the communication channel in order to answer requests from other nodes. Upon receiving a request, a state transition to “sensing” or “relaying” may occur.
4. Dead: a dead node is no longer available to the sensor network. It has either used up its energy or has suffered vital damage. Once a node is dead, it cannot reenter any other state.
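The four node states listed above can be captured in a small state type. The following is a minimal sketch; the names `NodeState` and `transition` are hypothetical, and the rule that any live node may move to any other state is an assumption (the text only states that a sleeping node may switch to sensing or relaying and that a dead node never leaves the dead state).

```python
from enum import Enum


class NodeState(Enum):
    SENSING = "sensing"
    RELAYING = "relaying"
    SLEEPING = "sleeping"
    DEAD = "dead"


def transition(current, target):
    """Return the new state, rejecting any move out of DEAD.

    Assumption: every transition between live states, and from a live
    state to DEAD, is allowed. Only the absorbing DEAD state is enforced.
    """
    if current is NodeState.DEAD:
        raise ValueError("a dead node cannot reenter any other state")
    return target
```

For example, `transition(NodeState.SLEEPING, NodeState.RELAYING)` models a sleeping node that wakes up to answer a relay request.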
Instead of a flat structure, some sensor networks assume a more hierarchical one. In this case, besides sensors and a base station, there are also nodes acting as clusterheads. These nodes generally have more powerful data processing and routing capabilities, at the expense of size and cost. Each clusterhead is in charge of a cluster of sensor nodes, which is obtained by making a spatial or logical division of the network. By aggregating the data sent from sensor nodes, a clusterhead refines the observation of the cluster’s region. Then, it may produce some post-processed data and route them to the base station. The links connecting clusterheads and the base station may have a larger data rate in order to support high-speed data transmission. The first and most basic problem we face is that of deployment, that is, positioning the nodes so as to meet the goal of successfully transferring data from the sources to the base station and, ultimately, to optimize some network performance metric. Once this is accomplished, there are numerous operational control issues at different layers (physical, data-link, network, transport, and so on); for a comprehensive overview, see References [5], [6], and [2]. The most important problems where dynamic control-oriented methods may be used are
1. Routing, that is, determining the destination node of data packets transmitted from some node i on their way to the base station.
2. Scheduling, that is, determining the precise timing mechanism for transmitting packets of possibly different types.
3. Power control, that is, making decisions aimed at conserving a node’s energy in a way that benefits the network as a whole.
6.3 Deployment of Networks with Fixed Data Sources and no Mobility
The deployment of sensor nodes may be either deterministic or random. The latter situation arises in applications such as reconnaissance and exploration where sensors are randomly dropped into the mission space and their exact location cannot be precisely controlled. In this case, research focuses on the relationship between sensor density and network performance. Deterministic deployment takes place when the characteristics of the mission space are
known in advance (e.g., in building monitoring). Fundamental questions in this case include:
1. How many sensor nodes are needed to meet the overall system objectives?
2. For a given network with a certain number of sensor nodes, how do we precisely deploy these nodes in order to optimize network performance?
3. When data sources change or some part of the network malfunctions, how do we adjust the network topology and sensor deployment?
These questions can be resolved by an off-line scheme that is akin to the widely studied facility location optimization problem. One of the commonly applied approaches is to discretize the mission space and place sensor nodes along grid points. The resulting optimal deployment problem can be formulated as a linear program. As all grid points and interconnected links must be considered, this results in significant combinatorial complexity. Alternatively, one can formulate a nonlinear optimization problem and seek to exploit the structure of a sensor network in order to develop decomposition approaches to solve it. In what follows, we describe such an approach, introduced in Reference [3]. Adopting the source/base station structure of a sensor network discussed earlier, we consider $M$ data sources residing at points $s_m \in \mathbb{R}^2$ ($m = 1, \ldots, M$) and a single base station $B$, with location $x_0 \in \mathbb{R}^2$. Each data source has a fixed position and a given data rate denoted by $r_m$ ($m = 1, \ldots, M$). To collect data at each data source, a sensor must be deployed at its location. In addition, because a data source may be far from the base station and the distance may exceed the range of radio communication, we also need to deploy a certain number of sensor nodes that work as relays. Suppose there are $N$ active sensor nodes and each has location $x_k \in \mathbb{R}^2$ ($k = 1, \ldots, N$). Let $W = (V, E, c, e)$ be a flow network with an underlying directed graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of links. A capacity vector $c = [c_1, \ldots, c_{|E|}]$ and a cost vector $e = [e_1, \ldots, e_{|E|}]$ are defined over the links, with $c_j, e_j \in \mathbb{R}^+$ for every link $j \in E$. Each link $j$ starts at node $s(j)$ and ends at node $t(j)$, and $e_j$ denotes some cost metric per unit of data, which generally depends on the node locations. Over this flow network, we can formulate an optimization problem that minimizes the total cost by controlling on each link $j$ the locations of sensor nodes $x_{s(j)}$ and $x_{t(j)}$ and the data rate $f_{jm}$ from each source $m = 1, \ldots, M$:

$$\min_{x_i, \mathbf{f}_m} \; \sum_{m=1}^{M} \sum_{j \in E} e_j(x_{s(j)}, x_{t(j)}) \, f_{jm} \tag{6.1}$$

$$\text{s.t.} \quad \sum_{j \in E} a_{ij} f_{jm} = -r_m d_{im} \quad \forall i, m \tag{6.2}$$

$$\sum_{m=1}^{M} f_{jm} \le c_j \quad \forall j \in E \tag{6.3}$$

$$f_{jm} \ge 0 \quad \forall j, m \tag{6.4}$$

$$x_m = s_m \quad \forall m. \tag{6.5}$$
In Equation (6.1), the decision variables are $\mathbf{f}_m$ and $x_i$, where a component $f_{jm}$ of the flow vector $\mathbf{f}_m$ denotes the data rate on link $j$ ($j \in E$) that originates from source $m$ ($m = 1, \ldots, M$). In the flow balance Equation (6.2), $A = \{a_{ij}\}$ is the node-link incidence matrix of graph $G$ such that for all $i = 1, \ldots, N$ and $j = 1, \ldots, |E|$ [7]:

$$a_{ij} = \begin{cases} +1 & \text{if arc } j \text{ leaves node } i \\ -1 & \text{if arc } j \text{ enters node } i \\ 0 & \text{otherwise,} \end{cases}$$

and $d_m = [d_{im}]$ is the flow balance vector for data source $m$ such that

$$d_{im} = \begin{cases} -1 & i = 0 \\ +1 & i = m \\ 0 & \text{otherwise.} \end{cases}$$

The remaining three equations represent the link capacity constraints, flow non-negativity, and the fixed locations of the $M$ sources. Although this formulation is general, we shall consider a particular problem in which our objective is to determine a minimum power deployment. In this case, the link cost $e_j(x_{s(j)}, x_{t(j)})$ denotes the transmission energy consumed per unit of data. The function $e_j(x_{s(j)}, x_{t(j)})$ can be specified based on a model whose key parameters are the energy needed to sense a bit ($E_{se}$), receive a bit ($E_{rx}$), and transmit a bit over a distance $d$ ($E_{tx}$). A $1/d^n$ ($n \ge 1$) path loss is commonly assumed [8], in which case we can write:

$$E_{tx} = \alpha_{11} + \alpha_2 d^n, \qquad E_{rx} = \alpha_{12}, \qquad E_{se} = \alpha_3 \tag{6.6}$$
where $\alpha_{11}$ is the energy/bit consumed by the transmitter electronics, $\alpha_2$ accounts for energy dissipated in the transmit op-amp, $\alpha_{12}$ is the energy/bit consumed by the receiver electronics, and $\alpha_3$ is the energy cost of sensing a bit. Hence, the energy consumed by a node acting as a relay that receives a bit and then transmits it a distance $d$ onward is

$$e(d) = \alpha_{11} + \alpha_2 d^n + \alpha_{12} \equiv \alpha_1 + \alpha_2 d^n. \tag{6.7}$$
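The energy model of Equations (6.6) and (6.7) translates directly into two small functions. This is an illustrative sketch; the function names are hypothetical and any numerical parameter values used with them are placeholders, not values from the text.

```python
def transmit_energy(d, alpha11, alpha2, n):
    """Energy to transmit one bit over distance d: E_tx in Equation (6.6)."""
    return alpha11 + alpha2 * d ** n


def relay_energy(d, alpha11, alpha12, alpha2, n):
    """Energy for a relay to receive one bit and forward it over distance d.

    Implements e(d) = alpha1 + alpha2 * d**n with alpha1 = alpha11 + alpha12,
    as in Equation (6.7).
    """
    alpha1 = alpha11 + alpha12
    return alpha1 + alpha2 * d ** n
```

The relay cost differs from the transmit cost only by the constant receive term, so it inherits the same dependence on the hop length `d`.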
Therefore, in Equation (6.1) we have:

$$e_j(x_{s(j)}, x_{t(j)}) = e(\|x_{s(j)} - x_{t(j)}\|) = \alpha_1 + \alpha_2 \|x_{s(j)} - x_{t(j)}\|^n \tag{6.8}$$

for each link $j \in E$ that starts at node $s(j)$ and ends at node $t(j)$. A property of $e_j(\cdot)$ as formulated in Equation (6.8) is that it is a convex function of both $x_{s(j)}$ and $x_{t(j)}$. The solution of the deployment problem is in fact robust with respect to the specific form of $e_j(\cdot)$, as long as the convexity property is preserved. In Reference [8], a minimum-power topology was proposed based on the assumption that there is no constraint on the number of intermediate sensor
nodes. Under this assumption, the most energy-efficient path between a data source and the sink is a straight line with multiple hops, and the minimum-power topology is constructed by building such a path for each data source in the network. This topology consumes the least power since each data flow $r_m$ takes the shortest path toward the sink and by optimizing the number of intermediate nodes, the power consumption on this shortest path is also minimized. The theoretical optimal number of hops, $K_{opt}$, including the node that acts as a sensor at a data source $s$, is given by $K_{opt} = D / d_{char}$, where $D = \|s - b\|$ is the distance between $s$ and $b$ and $d_{char}$ is the “characteristic distance” given by:

$$d_{char} = \sqrt[n]{\frac{\alpha_1}{\alpha_2 (n - 1)}}$$

where $\alpha_i$, $n$ are defined by the node energy model above. The corresponding lower bound for power consumption between some source $s_m$ and the base station is ($D_m = \|s_m - b\|$):

$$P_m = \left( \alpha_1 \frac{n}{n - 1} \frac{D_m}{d_{char}} - \alpha_{12} \right) r_m + \alpha_3 r_m. \tag{6.9}$$

However, in constructing this minimum-power topology, a large number of relay nodes are needed, because each data flow is independent and shares no relays with the rest. When the number of nodes is limited, a natural idea is to minimize the power consumption by (1) making two or more data flows share some relays, or (2) deploying fewer relays on some route. This brings us back to the minimum-power sensor deployment problem (6.1), which couples two traditional optimization problems: if flow vectors $\mathbf{f}_m$ are given and Equation (6.1) is optimized only over the locations of sensors $x_i$, Equation (6.1) can be viewed as a facility location problem; on the other hand, if sensor locations $x_i$ are given and $\mathbf{f}_m$ are the only decision variables, it can be reduced to a minimum-cost flow problem. The nonlinearity of the cost function as well as the coupling of these two problems make Equation (6.1) difficult to solve.
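The characteristic distance and the lower bound (6.9) can be evaluated numerically. The sketch below reads the characteristic distance as the n-th root of α1/(α2(n − 1)) and the bound as (α1 · n/(n − 1) · D_m/d_char − α12) r_m + α3 r_m; the function names and the parameter values in the usage note are hypothetical, and n > 1 is assumed so that the expressions are defined.

```python
def characteristic_distance(alpha1, alpha2, n):
    """d_char = (alpha1 / (alpha2 * (n - 1))) ** (1 / n); requires n > 1."""
    if n <= 1:
        raise ValueError("the characteristic distance requires n > 1")
    return (alpha1 / (alpha2 * (n - 1))) ** (1.0 / n)


def power_lower_bound(D_m, r_m, alpha1, alpha12, alpha2, alpha3, n):
    """Lower bound (6.9) on the power needed to carry rate r_m over distance D_m."""
    d_char = characteristic_distance(alpha1, alpha2, n)
    relaying = (alpha1 * n / (n - 1) * D_m / d_char - alpha12) * r_m
    sensing = alpha3 * r_m
    return relaying + sensing
```

As a sanity check with placeholder values α1 = 2, α12 = 0.5, α2 = 0.5, α3 = 0.1, n = 2, the characteristic distance is 2. For D_m = 10 and r_m = 1 this corresponds to five hops of length 2, each costing α1 + α2 · 2² = 4, less the receive energy α12 that the source node does not spend, plus the sensing energy α3.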
For example, we have found that using standard Lagrangian relaxation methods does not substantially reduce complexity because this coupling is tight. As an alternative, the solution approach proposed in Reference [3] uses a decomposition method exploiting two facts: 1. the convexity of the link costs e j (xs( j) , xt( j) ), and 2. the fact that in a sensor network data traffic always flows from the sources towards the base station, which allows us to reduce the feasible space of fm by only considering flow vectors that form a tree structure over the network. In addition, we also relax the capacity constraint (6.3); current sensor networks indeed operate with light traffic and the actual data flow over a link is unlikely to reach the link’s capacity. When this capacity is not reached, it is also easy to see that no links other than those in a tree structure are ever used
[7] (if any such link is used, the distance to the sink is increased; hence, the power consumption increases as well).
6.3.1 Problem Decomposition
The proposed decomposition method is motivated by the special structure of the problem. Because the cost $e_j(\cdot)$ of link $j$ is a convex function of the location of its end points $x_{s(j)}$ and $x_{t(j)}$ and the total cost in Equation (6.1) is a weighted sum of all link costs, this implies that for a given set of flow vectors $\mathbf{f}_m$, the cost will also be a convex function of the locations of sensors $x_i$. This convexity permits the design of a fast algorithm to find the optimal sensor locations $x_i^*$ and the corresponding minimal cost $g(\mathbf{f}_1, \ldots, \mathbf{f}_M)$ for a given set of flow vectors. More formally,

$$g(\mathbf{f}_1, \ldots, \mathbf{f}_M) = \min_{x_i} \sum_{m=1}^{M} \sum_{j \in E} f_{jm} \, e_j(x_{s(j)}, x_{t(j)}) \quad \text{s.t.} \quad x_m = s_m, \; m = 0, 1, \ldots, M. \tag{6.10}$$
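The quantity minimized in (6.10) is straightforward to evaluate for fixed node locations and flows. The following sketch assumes the link cost of Equation (6.8); the function name `total_cost` and the data layout (a dictionary of planar coordinates, a list of `(start, end)` links, and `flows[m][j]` for the rate of source `m` on link `j`) are illustrative choices, not taken from the text.

```python
import math


def total_cost(locations, links, flows, alpha1, alpha2, n):
    """Sum over sources m and links j of f_jm * e_j, with e_j from (6.8).

    locations: dict mapping node -> (x, y)
    links:     list of (s, t) node pairs, one per link j
    flows:     flows[m][j] is the data rate of source m on link j
    """
    cost = 0.0
    for j, (s, t) in enumerate(links):
        d = math.dist(locations[s], locations[t])
        link_flow = sum(f[j] for f in flows)  # aggregate over all sources
        cost += link_flow * (alpha1 + alpha2 * d ** n)
    return cost
```

Because each link cost is convex in the end-point locations and the flows enter only as non-negative weights, this function is convex in `locations` for fixed `flows`, which is the property the decomposition relies on.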
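With $g(\mathbf{f}_1, \ldots, \mathbf{f}_M)$ as above, and keeping in mind the network tree structure and the elimination of Equation (6.3), the main problem (6.1) becomes:

$$\min_{\mathbf{f}_m} \; g(\mathbf{f}_1, \ldots, \mathbf{f}_M) \tag{6.11}$$

$$\text{s.t.} \quad \sum_{j \in E} a_{ij} f_{jm} = -r_m d_{im} \quad \forall i, m \tag{6.12}$$

$$\sum_{j \in E} b_{ij} f_{jm} \le r_m \quad \forall i, m \tag{6.13}$$

$$f_{jm} \in [0, r_m] \quad \forall j, m \tag{6.14}$$

where

$$b_{ij} = \begin{cases} 1 & \text{if arc } j \text{ leaves node } i \\ 0 & \text{otherwise.} \end{cases}$$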
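The matrices $a_{ij}$ and $b_{ij}$ used in the constraints can be built from a list of links. This is a minimal sketch with a hypothetical function name and data layout (nodes numbered from 0, with node 0 as the base station, and each link given as a `(start, end)` pair).

```python
def incidence_matrices(num_nodes, links):
    """Build A = {a_ij} and B = {b_ij} as nested lists.

    a_ij = +1 if link j leaves node i, -1 if it enters node i, 0 otherwise.
    b_ij = 1 if link j leaves node i, 0 otherwise.
    """
    A = [[0] * len(links) for _ in range(num_nodes)]
    B = [[0] * len(links) for _ in range(num_nodes)]
    for j, (s, t) in enumerate(links):
        A[s][j] = 1   # link j leaves its start node
        A[t][j] = -1  # link j enters its end node
        B[s][j] = 1   # B keeps only the outgoing entries
    return A, B
```

For a three-node chain with links `[(1, 0), (2, 1)]`, row 1 of `A` is `[1, -1]`: node 1 sends on the first link and receives on the second.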
In this formulation, Equation (6.12) still captures all flow balance equations, whereas constraints (6.13) and (6.14) build a unique path between each data source and the base station, therefore guaranteeing the tree structure of the network. Subproblems (6.10) and (6.11) suggest an iterative approach for solving the original problem. Starting with a feasible set of flow vectors $\mathbf{f}_1, \ldots, \mathbf{f}_M$, the first step is to solve Equation (6.10), which provides information used to update the flow vectors. An efficient gradient-based method for doing this (referred to as the “inner force method”) is detailed in Reference [3]. It views the network as a dynamic system with “inner forces” applied to each node; in particular, a force applied to node $i$ by link $j$ is defined as:

$$F_{ij} = - \sum_{m=1}^{M} f_{jm} \, \frac{\partial e_j(x_{s(j)}, x_{t(j)})}{\partial x_i}. \tag{6.15}$$
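For the link cost of Equation (6.8), the gradient in (6.15) has a closed form, and a node can be moved along the net force until the forces balance. The sketch below is a plain fixed-step gradient descent under that assumption, not necessarily the inner force method of Reference [3]; the function names, the step size, and the iteration count are hypothetical.

```python
import math


def link_force(x_i, x_other, link_flow, alpha2, n):
    """Force (6.15) on the node at x_i from one link whose other end is x_other.

    For e = alpha1 + alpha2 * dist**n, the gradient with respect to x_i is
    alpha2 * n * dist**(n - 2) * (x_i - x_other); the force is its negative,
    weighted by the total flow on the link.
    """
    dx = x_i[0] - x_other[0]
    dy = x_i[1] - x_other[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)  # coincident end points: treated as no force
    scale = -link_flow * alpha2 * n * dist ** (n - 2)
    return (scale * dx, scale * dy)


def relax_node(x, neighbors, alpha2, n, step=0.1, iters=200):
    """Move one free node along the net force with a fixed step size.

    neighbors: list of (position, link_flow) for the links incident to the node.
    """
    for _ in range(iters):
        fx = fy = 0.0
        for pos, flow in neighbors:
            f = link_force(x, pos, flow, alpha2, n)
            fx += f[0]
            fy += f[1]
        x = (x[0] + step * fx, x[1] + step * fy)
    return x
```

With a single relay between a source at (4, 0) and a base station at (0, 0), unit flows, α2 = 1 and n = 2, the relay settles at the midpoint (2, 0), where the two forces cancel.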
Each such force causes node $i$ to move in the steepest-descent direction, which leads it to an equilibrium point (unique, due to convexity) where all forces applied on $i$ are balanced out. The second step is to solve subproblem (6.11), that is, find the optimal routing from all data sources to the sink in the tree structure resulting from the first step. Although this idea is straightforward, there is still a difficulty that prohibits its implementation. The difficulty is that $g(\mathbf{f}_1, \ldots, \mathbf{f}_M)$ is a nonconvex (and nonconcave) function of the flow vectors $\mathbf{f}_1, \ldots, \mathbf{f}_M$, which generally implies the existence of multiple local minima. Thus, we follow a different approach, based on the idea of 1. incrementing the number of nodes one at a time, 2. determining the optimal location of the new node and the corresponding flow vectors, and 3. repeating this process until the number of available nodes $N$ is reached or the cost is sufficiently close to the known lower bound (6.9).
6.3.2 Incremental Iterative Approach
In an incremental deployment, the initial step is to begin with $M$ nodes, each located at one of the $M$ sources, and construct the corresponding tree structure with the base station as its root. The associated flow vectors $\mathbf{f}_1, \ldots, \mathbf{f}_M$ are immediately given by Equation (6.12) with $a_{ij}$, $b_{ij}$ determined by this simple initial tree structure. The next step is to add a node and determine its optimal location while preserving the network’s tree structure. Unfortunately, as the number of nodes increases, the number of possible tree structures increases exponentially and constructing an efficient algorithm to find the optimal topology is a crucial issue. The approach proposed in Reference [3] is based on a local topology adjustment, so the size of the problem is limited; the price to pay is that global optimality can no longer be guaranteed. However, since, as discussed above, we know that the optimal deployment with an unlimited number of nodes consists of multihop straight-line paths between every data source and the base station, we have at our disposal the lower bound (6.9) that our solution can be compared to. As numerical results illustrate (see Reference [3]), this lower bound is rapidly approached by the proposed algorithm and with a number of nodes significantly smaller than the associated number $K_{opt}$ given earlier. The addition of a node and determination of its optimal location is a three-step process. First, we determine which part of the network needs a new relay the most. Then, all possible topology adjustments around this area are obtained. Finally, the power improvement of each case is checked and the one that provides the greatest improvement becomes the new configuration. Figure 6.1 graphically summarizes the overall process.
[Flowchart for Figure 6.1: initialize with K = M nodes and solve the min-cost flow problem for (f1, …, fM) and (x1, …, xM); until K = N, (1) locate the bottleneck node, (2) add a node and enumerate the candidate tree structures and corresponding flows (f1, …, fM)t, t = 1, …, TK; solve Subproblem 1 for each candidate to obtain (x1, …, xK+1)t and gt(f1, …, fM); solve Subproblem 2 by selecting t* = arg min over t = 1, …, TK of gt(f1, …, fM), giving (f1, …, fM)t* and (x1, …, xK+1)t*; set K = K + 1; terminate with K = N nodes and the solution (x1, …, xN) and (f1, …, fM).]
FIGURE 6.1 Incremental iterative node deployment process.
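The loop in Figure 6.1 can be outlined as a small driver that delegates the problem-specific work to callables. This is only a structural sketch: the names `incremental_deployment`, `enumerate_candidates`, and `evaluate` are hypothetical, a "topology" is treated as an opaque object, and `evaluate` is assumed to solve subproblem (6.10) for a candidate and return its cost.

```python
def incremental_deployment(topology, num_nodes, max_nodes,
                           enumerate_candidates, evaluate,
                           lower_bound=None, rel_tol=0.01):
    """Add one node at a time, keeping the cheapest candidate topology.

    enumerate_candidates(topology) -> list of candidate topologies with one
        extra node (assumed to be built around the bottleneck node).
    evaluate(topology) -> cost of the candidate after optimizing locations.
    Stops when max_nodes is reached, when no candidates remain, or when the
    cost is within rel_tol of lower_bound (if one is supplied).
    """
    cost = evaluate(topology)
    while num_nodes < max_nodes:
        if lower_bound is not None and cost <= lower_bound * (1 + rel_tol):
            break
        candidates = enumerate_candidates(topology)
        if not candidates:
            break
        costs = [evaluate(c) for c in candidates]
        best = min(range(len(candidates)), key=costs.__getitem__)
        topology, cost = candidates[best], costs[best]
        num_nodes += 1
    return topology, cost, num_nodes
```

As a toy usage, let a topology be the number of equal hops `h` on a line of length 8, with cost `h * (1 + (8 / h) ** 2)` and a single candidate `h + 1` per step; starting from one hop with four nodes available, the driver ends at four hops.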
6.3.2.1 Determining Bottleneck Node A bottleneck node is defined as a node around which a new relay and corresponding new topology would bring the most improvement to the power conservation of the whole network. The bottleneck node is determined by checking the inner forces applied to nodes: as mentioned earlier, the inner forces on a link contain the gradient information of the power consumption on this link. The larger an inner force applied by a link on the node, the greater the power savings by shortening this link. Before adding a new node, all nodes in the network have reached their equilibrium points. Thus, if a
node is balanced under several inner forces that have relatively larger magnitude, it follows that by shortening one of its links, the power consumption on this link will improve greatly, but the cost involved on other links will overwhelm this improvement. Intuitively, we can visualize the area around this sensor node as being more sparse, and there is a higher need for a new relay in this region. With these observations in mind, we define the sparseness around node $i$ as:

$$SP_i = \sum_{j \in V(i)} \|F_{ij}\|$$

with $F_{ij}$ given in Equation (6.15), and the bottleneck node $k$ is defined to be the node that has the greatest sparseness. That is,

$$k = \arg\max_{i = 0, \ldots, N} SP_i.$$
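Given the inner forces on each node, the sparseness measure and the bottleneck node follow from a sum of norms and an arg max. The sketch below assumes planar force vectors stored per node in a dictionary; the function names are hypothetical.

```python
import math


def sparseness(forces):
    """SP_i: sum of the magnitudes of the inner forces applied to one node."""
    return sum(math.hypot(fx, fy) for fx, fy in forces)


def bottleneck_node(forces_by_node):
    """Return the node whose sparseness is largest.

    forces_by_node: dict mapping node -> list of (fx, fy) force vectors.
    """
    return max(forces_by_node, key=lambda i: sparseness(forces_by_node[i]))
```

Note that a node at equilibrium has a zero net force but can still have a large sparseness, since the magnitudes are summed rather than the vectors.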
Obviously, there is no guarantee that the optimal location of the new node is indeed in the vicinity of the bottleneck node as defined above, so the solution implied by this approach is generally suboptimal.
6.3.2.2 Enumerating Topology Adjustment Options
The bottleneck node indicates the area that needs a new relay the most. Once it is determined, the precise placement of the new relay must be determined. Because we are working on a tree structure, the insertion of a new relay also means adding a new link. Thus, we need to consider topologies generated when an additional relay and link are present in the target area. As shown in Reference [3], the number of all possible new topologies is $3 \cdot 2^{m-1} - 2$, where $m$ is the number of children of the bottleneck node.
6.3.2.3 Obtaining a New Deployment
The outcome of step 2 when the current number of nodes is $L < N$ is a number of possible new network tree structures, say $T_L$, each with associated flow vectors $(\mathbf{f}_1, \ldots, \mathbf{f}_M)_t$, $t = 1, \ldots, T_L$. For each such structure $t$, subproblem (6.10) is solved (as described earlier), giving the corresponding optimal node locations $x_{i,t}$, $i = 1, \ldots, L + 1$ and cost $g_t(\mathbf{f}_1, \ldots, \mathbf{f}_M)$. Next, the solution of problem (6.11) reduces to comparing all such costs and determining $t^* = \arg\min_{t = 1, \ldots, T_L} g_t(\mathbf{f}_1, \ldots, \mathbf{f}_M)$, the corresponding node locations $x_{i,t^*}$, $i = 1, \ldots, L + 1$, and the flows $(\mathbf{f}_1, \ldots, \mathbf{f}_M)_{t^*}$. In closing, we should point out that the incremental deployment approach above is based on a centralized scheme. It assumes the existence of a controller with powerful computational capabilities, perfect information of the whole network, and unlimited control over all sensor nodes. In the case of mobile nodes but still known data sources, an open problem is the development of distributed algorithms for sensor node deployment through which an individual sensor node can autonomously decide how to move based on its own local knowledge of the overall system.
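The candidate count 3 · 2^(m−1) − 2 and the final selection of the cheapest candidate can be sketched as follows. The function names are hypothetical, the restriction to m ≥ 1 is an assumption (the formula is not positive for a node without children), and candidates are indexed from 0 here rather than from 1 as in the text.

```python
def num_candidate_topologies(m):
    """Number of new topologies around a bottleneck node with m children."""
    if m < 1:
        raise ValueError("assumes the bottleneck node has at least one child")
    return 3 * 2 ** (m - 1) - 2


def select_best(costs):
    """Return (t_star, cost) for the candidate with the smallest cost g_t.

    costs: list of g_t values, one per candidate topology (0-based index).
    """
    t_star = min(range(len(costs)), key=costs.__getitem__)
    return t_star, costs[t_star]
```

The count grows exponentially in the number of children but depends only on the neighborhood of the bottleneck node, which is what keeps each incremental step small.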
6.4 Deployment of Networks with Unknown Data Sources and Mobile Nodes
When nodes are mobile and data source targets are either unknown or are mobile as well, the problem of deploying sensors in order to meet the overall system objectives is referred to as the coverage control or active sensing problem [9],[10],[11]. In particular, sensors must be deployed so as to maximize the information extracted from the mission space while maintaining acceptable levels of communication and energy consumption. The static version of this problem involves positioning sensors without further mobility; optimal locations can be determined by an off-line scheme that is akin to the widely studied facility location optimization problem. The dynamic version allows the coordinated movement of sensors, which may adapt to changing conditions in the mission space, typically deploying them into geographic areas with the highest information density. Because of the similarity of coverage control with facility location optimization, the problem is often viewed in that framework. In Reference [10], the authors develop a decentralized coverage control algorithm based on Voronoi partitions and the Lloyd algorithm. In Reference [9] a coverage control scheme is proposed that aims at the maximization of target exposure in some surveillance applications, and in Reference [12] a heuristic algorithm based on “virtual forces” is applied to enhance the coverage of a sensor network. Much of the active sensing literature [11] also concentrates on the problem of tracking specific targets using mobile sensors, and the Kalman filter is used extensively to process observations and generate estimates. Some of the methods that have been proposed for coverage control assume uniform sensing quality and an unlimited sensing range. Partition-based deployment methods, on the other hand, tend to overlook the fact that the overall sensing performance may be improved by sharing the observations made by multiple sensors. 
There are also efforts that rely on a centralized controller to solve the coverage control problem. A centralized approach does not suit the distributed communication and computation structure of sensor networks. In addition, the combinatorial complexity of the problem constrains the application of such schemes to limited-size sensor networks. Finally, another issue that appears to be neglected is the cost of relocating sensors. The movement of sensors not only impacts sensing performance, but it also influences other quality-of-service aspects in a sensor network, especially those related to wireless communication: because of the limited on-board power and computational capacity, a sensor network is required not only to sense but also to collect and transmit data. For this reason, both sensing quality and communication performance need to be jointly considered when controlling the deployment of sensors. This motivates a distributed coverage control approach for cooperative sensing [4]. The mission space will now be modeled using a density function representing the frequency with which specific events take place (e.g., data are
generated at a certain point). At the two extremes, this allows us to model a mission space with no information on target locations (using a uniform density function) or one with known locations (using a probability mass function). We assume that a mobile sensor node has a limited range that is defined by a probabilistic model. A deployment algorithm is applied at each mobile node such that it maximizes the joint detection probabilities of random events. We assume that the event density function is fixed and given; however, in the case that the mission space (or our perception of the mission space) changes over time, the adaptive relocation behavior naturally follows from the optimal coverage formulation.

6.4.1 Mission Space and Sensor Model
We model the mission space as a polyhedron $\Omega \subset \mathbb{R}^2$, over which there exists an event density function $R(x)$, $x \in \Omega$, that captures the frequency or density of a specific random event taking place (in Hz/m$^2$). $R(x)$ satisfies $R(x) \ge 0$ for all $x \in \Omega$ and $\int_{\Omega} R(x)\,dx < \infty$. Depending on the application, $R(x)$ may be the frequency that a certain type of data source appears at x, or it could be the probability that a variable sensed (e.g., temperature) at x exceeds a specific threshold. In the mission space $\Omega$, there are N mobile nodes located at $s = (s_1, \ldots, s_N)$, $s_i \in \mathbb{R}^2$, $i = 1, \ldots, N$. When an event occurs at point x, it emits a signal and this signal is observed by a sensor node at location $s_i$. The received signal strength generally decays with $\|x - s_i\|$, the distance between the source and the sensor. Similar to the model in Reference [13], we represent this degradation by a monotonically decreasing differentiable function $p_i(x)$, which expresses the probability that sensor i detects the event occurring at x. A simple example is
\[ p_i(x) = p_{0i}\, e^{-\lambda_i \|x - s_i\|} \]
where the detection probability declines exponentially with distance, and $p_{0i}$, $\lambda_i$ are determined by physical characteristics of the sensor.

6.4.2 Optimal Coverage Problem Formulation and Distributed Control
When deploying mobile sensor nodes into the mission space, we want to maximize the probability that events are detected. This motivates the formulation of an optimal coverage problem. Assuming that sensors make observations independently, when an event takes place at x and it is observed by sensor nodes, the joint probability that this event is detected can be expressed by:
\[ P(x, s) = 1 - \prod_{i=1}^{N} \left[ 1 - p_i(x) \right]. \tag{6.16} \]
The optimal coverage problem can be formulated as a maximization of the expected event detection frequency by the sensor nodes over the mission space $\Omega$:
\[ \max_{s} \int_{\Omega} R(x)\, P(x, s)\, dx. \tag{6.17} \]
In this optimization problem, the controllable variables are the locations of mobile sensors in the vector s. The problem may be solved by applying a nonlinear optimizer with an algorithm that can evaluate integrals numerically. In this case, a centralized controller with substantial computational capacity is required. In a mobile sensor network, the base station is a likely candidate for such a controller. However, this solution is only suitable for networks of limited size. Otherwise, both the complexity of the optimization problem and the communication overhead will make this centralized scheme infeasible. Thus, instead of using a centralized scheme, we will develop a distributed control method to solve the optimal coverage problem. We denote the objective function in Equation (6.17) by:
\[ F(s) = \int_{\Omega} R(x)\, P(x, s)\, dx. \tag{6.18} \]
When taking partial derivatives with respect to $s_i$, $i = 1, \ldots, N$, we have:
\[ \frac{\partial F}{\partial s_i} = \int_{\Omega} R(x)\, \frac{\partial P(x, s)}{\partial s_i}\, dx. \tag{6.19} \]
If this partial derivative can be evaluated locally by each mobile node i, then a gradient method can be applied that directs nodes towards locations that maximize $F(s)$. In view of Equation (6.16), the partial derivative (6.19) can be rewritten as:
\[ \frac{\partial F}{\partial s_i} = \int_{\Omega} R(x) \prod_{k=1,\, k \neq i}^{N} \left[ 1 - p_k(x) \right] \frac{d p_i(x)}{d d_i(x)}\, \frac{s_i - x}{d_i(x)}\, dx \tag{6.20} \]
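As a rough illustration of what a centralized evaluation of the objective involves, the following Python sketch approximates the coverage integral with a midpoint rule over a rectangular mission space. It assumes the exponential sensor model mentioned earlier with identical parameters for every sensor; the function names, the rectangular region, and the grid step are illustrative choices, not part of the original formulation.

```python
import math

def detect_prob(x, s_i, p0=1.0, lam=1.0):
    """Exponential sensor model: p_i(x) = p0 * exp(-lam * ||x - s_i||)."""
    return p0 * math.exp(-lam * math.dist(x, s_i))

def joint_detect_prob(x, sensors):
    """Joint detection probability (6.16): 1 - prod_i [1 - p_i(x)]."""
    miss = 1.0
    for s_i in sensors:
        miss *= 1.0 - detect_prob(x, s_i)
    return 1.0 - miss

def coverage_objective(sensors, density, x_range, y_range, step=0.5):
    """Midpoint-rule approximation of the integral of R(x) * P(x, s)."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    nx = int(round((x_hi - x_lo) / step))
    ny = int(round((y_hi - y_lo) / step))
    total = 0.0
    for a in range(nx):
        for b in range(ny):
            # Evaluate the integrand at the center of each grid cell.
            x = (x_lo + (a + 0.5) * step, y_lo + (b + 0.5) * step)
            total += density(x) * joint_detect_prob(x, sensors)
    return total * step * step

# Example: uniform density on a 10 x 10 region, one sensor versus two.
uniform = lambda x: 1.0
one = coverage_objective([(5.0, 5.0)], uniform, (0.0, 10.0), (0.0, 10.0))
two = coverage_objective([(5.0, 5.0), (2.0, 2.0)], uniform, (0.0, 10.0), (0.0, 10.0))
```

The cost of each evaluation grows with both the number of grid cells and the number of sensors, which is one way to see why a centralized optimizer built on such evaluations scales poorly.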
where $d_i(x) \equiv \|x - s_i\|$. It is hard for a mobile sensor node to directly compute Equation (6.20), since it requires global information such as the value of $R(x)$ over the whole mission space and the exact locations of all other nodes. In addition, the evaluation of integrals remains a significant task for a sensor node to carry out. To address these difficulties, we first truncate the sensor model and constrain its sensing capability by applying a sensing radius. This approximation is based on the physical observation that when $d_i(x)$ is large, $p_i(x) = 0$ for most sensing devices. Let:
\[ p_i(x) = 0, \quad \frac{d p_i(x)}{d d_i(x)} = 0 \quad \text{for all } x \text{ s.t. } d_i(x) \ge D \tag{6.21} \]
where D denotes the sensing radius. Thus, Equation (6.21) defines node i's region of coverage, which is represented by $\Omega_i = \{x : d_i(x) \le D\}$. Since $p_i(x) = 0$, $d p_i(x)/d d_i(x) = 0$ for all $x \notin \Omega_i$, we can use $\Omega_i$ to replace $\Omega$ in Equation (6.20). Another byproduct of using Equation (6.21) is the emergence of the concept of neighbors. In Equation (6.20), for a point $x \in \Omega_i$ and a node $k \neq i$, a necessary condition for the detection probability $p_k(x)$ to be greater than 0 is $d_k(x) \le D$. As shown in Figure 6.2, when the distance between nodes i and k is greater than 2D, every point x in $\Omega_i$ satisfies
FIGURE 6.2 Defining neighbor sets.
$d_k(x) > D$, thus $p_k(x) = 0$ and $[1 - p_k(x)] = 1$ for all $x \in \Omega_i$. If we define a set $B_i = \{k : \|s_i - s_k\| < 2D,\ k = 1, \ldots, N,\ k \neq i\}$, then any sensor node $k \notin B_i$ ($k \neq i$) will not contribute to the integral in Equation (6.20). After applying Equation (6.21) and using $B_i$, Equation (6.20) reduces to:
\[ \frac{\partial F}{\partial s_i} = \int_{\Omega_i} R(x) \prod_{k \in B_i} \left[ 1 - p_k(x) \right] \frac{d p_i(x)}{d d_i(x)}\, \frac{s_i - x}{d_i(x)}\, dx. \tag{6.22} \]
The final step in making Equation (6.22) computable is to discretize the integral evaluation. A $(2V + 1) \times (2V + 1)$ grid is applied over the coverage region $\Omega_i$ with $V = D/\Delta$, where $\Delta \ll D$ is the resolution of the grid. On the grid of each node i, a Cartesian coordinate system is defined, with its origin located at $s_i$, its axes parallel to the grid's setting, and the unit length being $\Delta$. In this local coordinate system, let (u, v) denote the location of a point x. Then, the transformation that maps (u, v) onto the global coordinate system is $x = s_i + \Delta\, [u \ \ v]^T$. Upon switching to this local coordinate system, the terms in Equation (6.22) become:
\[ R(x) = \tilde{R}_i(u, v), \quad p_i(x) = \tilde{p}_i(u, v), \quad \frac{d p_i(x)}{d d_i(x)} = \tilde{p}'_i(u, v) \]
where $\tilde{R}_i(u, v)$ indicates node i's local perception (map) of the event density of the mission space. In a typical dynamic deployment application, all sensor nodes start with the same copy of an estimated event density function at the beginning of the deployment. As nodes are deployed and data are collected, an individual node may update its local map through merging new observations into its perception, and by exchanging information with nearby neighbors.
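We also can rewrite the product term in Equation (6.22) as:
\[ \prod_{k \in B_i} \left[ 1 - p_k(x) \right] = \prod_{k \in B_i} \left[ 1 - \tilde{p}_k\!\left( u - \frac{s_{k1} - s_{i1}}{\Delta},\ v - \frac{s_{k2} - s_{i2}}{\Delta} \right) \right] \equiv \tilde{B}_i(u, v) \]
where $\left( u - (s_{k1} - s_{i1})/\Delta,\ v - (s_{k2} - s_{i2})/\Delta \right)$ are the coordinates of x in the kth node's local coordinate system. By applying the grid and the coordinate transformation, Equation (6.22) can be rewritten as:
\[
\begin{aligned}
\frac{\partial F}{\partial s_{i1}} &\approx -\Delta^2 \sum_{u=-V}^{V} \sum_{v=-V}^{V} \frac{\tilde{R}_i(u, v)\, \tilde{B}_i(u, v)\, \tilde{p}'_i(u, v)\, u}{\sqrt{u^2 + v^2}} \\
\frac{\partial F}{\partial s_{i2}} &\approx -\Delta^2 \sum_{u=-V}^{V} \sum_{v=-V}^{V} \frac{\tilde{R}_i(u, v)\, \tilde{B}_i(u, v)\, \tilde{p}'_i(u, v)\, v}{\sqrt{u^2 + v^2}}.
\end{aligned}
\tag{6.23}
\]
The leading minus sign follows from substituting $s_i - x = -\Delta\, [u \ \ v]^T$ and $d_i(x) = \Delta \sqrt{u^2 + v^2}$ into Equation (6.22).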
These derivatives can be computed easily by mobile sensor nodes using only the local information available. An advantage of switching to the local coordinates in Equation (6.23) is that the values of $\tilde{p}_i(u, v)$ and $\tilde{p}'_i(u, v)$ are uniquely determined by (u, v) and the sensor model. This motivates the storage of $\tilde{p}_i(u, v)$ and $\tilde{p}'_i(u, v)$ as two matrices in the on-board memory of a sensor node. Through acquiring key sensor model parameters from neighbors (e.g., the parameters $p_{0i}$ and $\lambda_i$ if $p_i(x) = p_{0i}\, e^{-\lambda_i \|x - s_i\|}$) and properly rescaling $\tilde{p}_i(u, v)$ and $\tilde{p}'_i(u, v)$, node i can also easily evaluate $\tilde{B}_i(u, v)$ using stored matrices. By doing so, the computation effort in repeatedly evaluating Equation (6.23) is drastically reduced. The gradient information above provides a direction for a mobile node's movement. The precise way in which this information is used depends on the choice of motion scheme. The most common approach in applying a gradient method is to determine the next waypoint on the ith mobile node's motion trajectory through:
\[ s_i^{k+1} = s_i^{k} + \alpha_k \frac{\partial F}{\partial s_i^{k}} \tag{6.24} \]
where k is an iteration index, and the step size $\alpha_k$ is selected according to standard rules (e.g., see Reference [14]) in order to guarantee the convergence of motion trajectories. The computational complexity in evaluating the gradient shown in Equation (6.23) depends on the scale of the grid and the size of neighbor set $B_i$. In the worst case, node i has N − 1 neighbors and the number of operations needed to compute $\partial F / \partial s_i$ is $O(NV^2)$. The best case occurs when there is no neighbor for node i, and the corresponding complexity is $O(V^2)$. In both cases, the complexity is quadratic in V.
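The local gradient evaluation and the waypoint update can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a truncated exponential sensor model shared by all nodes, evaluates the model directly instead of using stored lookup matrices, uses a fixed step size, and skips the grid point u = v = 0, where the direction term is undefined. The sign of each term comes from $s_i - x = -\Delta\,[u \ \ v]^T$. All function names are hypothetical.

```python
import math

def p_model(d, p0=1.0, lam=1.0, D=5.0):
    """Truncated exponential model: p0 * exp(-lam * d) for d < D, else 0."""
    return p0 * math.exp(-lam * d) if d < D else 0.0

def dp_model(d, p0=1.0, lam=1.0, D=5.0):
    """Derivative dp/dd of the truncated model (zero beyond the radius D)."""
    return -lam * p0 * math.exp(-lam * d) if d < D else 0.0

def gradient(i, s, density, D=5.0, delta=0.5):
    """Grid approximation of dF/ds_i over node i's coverage region."""
    V = int(round(D / delta))
    si = s[i]
    # Neighbor set B_i: all other nodes closer than 2D to node i.
    B = [k for k in range(len(s)) if k != i and math.dist(s[k], si) < 2 * D]
    g1 = g2 = 0.0
    for u in range(-V, V + 1):
        for v in range(-V, V + 1):
            if u == 0 and v == 0:
                continue  # direction (u, v) / sqrt(u^2 + v^2) is undefined here
            x = (si[0] + delta * u, si[1] + delta * v)
            norm = math.hypot(u, v)
            w = density(x) * dp_model(delta * norm, D=D)
            for k in B:
                w *= 1.0 - p_model(math.dist(x, s[k]), D=D)
            # Minus sign: (s_i - x) / d_i(x) = -(u, v) / sqrt(u^2 + v^2).
            g1 -= w * u / norm
            g2 -= w * v / norm
    return (delta ** 2 * g1, delta ** 2 * g2)

def step(s, density, alpha=0.1, **kw):
    """One synchronous gradient-ascent update of all node positions."""
    grads = [gradient(i, s, density, **kw) for i in range(len(s))]
    return [(p[0] + alpha * g[0], p[1] + alpha * g[1]) for p, g in zip(s, grads)]
```

Each call to `gradient` visits $(2V+1)^2$ grid points and, at each point, loops over the neighbor set, which matches the $O(NV^2)$ worst case and $O(V^2)$ best case stated above.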
6.4.3 Optimal Coverage Problem with Communication Costs
Besides sensing and collecting data, the coverage control mission includes the task of forwarding data to the base station. Assuming a flat network structure (i.e., no clusterheads as discussed in Section 6.2), the cost of communication comes mainly from the power consumption for wireless transmissions. We shall use once again the link energy model (6.6) to (6.7) of Section 6.3. The base station location is represented as $s_0 \in \mathbb{R}^2$ and the data rate originating from the ith sensor node is denoted by $r_i(s_i)$, $i = 1, \ldots, N$. Note that $r_i$ is defined as a function of $s_i$ because the amount of data forwarded at i is determined by the number of events detected, and the latter depends on the node's location. Here we assume that $r_i(s_i)$ is proportional to the frequency with which events are detected, that is,
\[ r_i(s_i) = \alpha_3 \int_{\Omega_i} R(x)\, p_i(x)\, dx \tag{6.25} \]
where $\alpha_3$ (bits/detection) is the amount of data forwarded when the sensor node detects an event. Let $c_i(s)$ be the total power consumed by the network in order to deliver a bit of data from node i to the base station. Then, the optimal coverage problem can be revised by combining sensing coverage and communication cost as follows:
\[ \max_{s} \left\{ w_1 \int_{\Omega} R(x)\, P(x, s)\, dx - w_2 \sum_{i=1}^{N} r_i(s_i)\, c_i(s) \right\} \tag{6.26} \]
where $w_1$, $w_2$ are weighting factors. One can think of $w_1$ as the reward for detecting an event and $w_2$ as the price of consuming a unit of energy. Let us denote the communication cost by:
\[ G(s) = \sum_{i=1}^{N} r_i(s_i)\, c_i(s) \tag{6.27} \]
so that, recalling Equation (6.18), the overall objective function is written as:
\[ J(s) = w_1 F(s) - w_2 G(s) \]
In order to derive partial derivatives $\partial J / \partial s_i$ as done earlier, we shall focus on the evaluation of $\partial G / \partial s_i$, which can be expressed as:
\[ \frac{\partial G}{\partial s_i} = c_i(s)\, \frac{d r_i(s_i)}{d s_i} + \sum_{k=1}^{N} r_k(s_k)\, \frac{\partial c_k(s)}{\partial s_i}. \tag{6.28} \]
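Equation (6.28) is the product rule applied to (6.27). The intermediate step, written out below, uses only the fact stated above that each data rate $r_k$ depends on $s_k$ alone, whereas each delivery cost $c_k$ may depend on the whole location vector s.

```latex
\[
\frac{\partial G}{\partial s_i}
  = \sum_{k=1}^{N} \left[ \frac{\partial r_k(s_k)}{\partial s_i}\, c_k(s)
      + r_k(s_k)\, \frac{\partial c_k(s)}{\partial s_i} \right]
  = c_i(s)\, \frac{d r_i(s_i)}{d s_i}
      + \sum_{k=1}^{N} r_k(s_k)\, \frac{\partial c_k(s)}{\partial s_i},
\qquad \text{since } \frac{\partial r_k(s_k)}{\partial s_i} = 0 \text{ for } k \neq i .
\]
```

The second sum keeps all N terms because moving node i can change the cost of any route that passes through it.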
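In this expression, both $r_i$ and $\partial r_i / \partial s_i$ can be obtained by applying the same method as the one described earlier. That is, recalling that $x = s_i + \Delta\, [u \ \ v]^T$,
\[
\begin{aligned}
r_i &\approx \alpha_3 \Delta^2 \sum_{u=-V}^{V} \sum_{v=-V}^{V} \tilde{R}_i(u, v)\, \tilde{p}_i(u, v) \\
\frac{d r_i}{d s_{i1}} &\approx -\alpha_3 \Delta^2 \sum_{u=-V}^{V} \sum_{v=-V}^{V} \frac{\tilde{R}_i(u, v)\, \tilde{p}'_i(u, v)\, u}{\sqrt{u^2 + v^2}} \\
\frac{d r_i}{d s_{i2}} &\approx -\alpha_3 \Delta^2 \sum_{u=-V}^{V} \sum_{v=-V}^{V} \frac{\tilde{R}_i(u, v)\, \tilde{p}'_i(u, v)\, v}{\sqrt{u^2 + v^2}}.
\end{aligned}
\tag{6.29}
\]
As in the gradient of the coverage term, the minus sign in the two derivatives comes from $s_i - x = -\Delta\, [u \ \ v]^T$.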
The only term remaining to derive in $\partial G / \partial s_i$ is $c_i(s)$ and its gradient. The cost of delivering a bit of data from i to the base station, $c_i(s)$, is determined by the way in which data forwarding paths are constructed, that is, the precise routing protocol used. For a typical shortest-path-based routing scheme, these quantities are obtained in Reference [4] and each sensor uses gradient information to direct motion control as in Equation (6.24) with $\partial J / \partial s_i^k$ replacing $\partial F / \partial s_i^k$. With properly selected step sizes, mobile sensors will finally converge to a maximum point of $J(s)$. To illustrate the use of this distributed deployment algorithm, we consider an example with a team of six mobiles waiting to be deployed into a 40 × 40 (meter) mission space. The event density function $R(x)$ is given by
\[ R(x) = R_0 - \beta\, \|x - x_0\| \tag{6.30} \]
where $R_0 = 3.0$, $\beta = 0.1$, $x_0 = [0, 20]$. In this case, the event density of a point x ($x \in \Omega$) declines linearly with the distance between x and the center point $x_0$ of the mission space. At time t = 0, mobile sensors reside at $s_0 = [0, 0]$. Each mobile node is equipped with a sensor whose detection probability is modeled by $p_i(x) = p_{0i}\, e^{-\lambda_i \|x - s_i\|}$ where $p_{0i} = 1.0$, $\lambda_i = 1.0$ for all $i = 1, \ldots, N$. The sensing radius is D = 5.0, as illustrated by the circles in Figure 6.3. A mobile sensor also has a wireless transceiver whose power consumption is modeled by Equation (6.7) with $\alpha_1 = 0.01$ nJ/bit, $\alpha_2 = 0.001$ nJ/bit/m$^4$, and n = 4. Upon a sensor detecting an event, it collects 32 bits of data and forwards them back to the base station (so that $\alpha_3 = 32$ in Equation (6.25)). We consider two distinct cases. In the first case, no communication cost is considered, which corresponds to $w_1 > 0$, $w_2 = 0$ in the optimal coverage formulation (6.26). In the second case, both sensing coverage and communication cost are included ($w_1, w_2 > 0$). Figure 6.3 presents several snapshots taken during the deployment process of the first case. Starting with Figure 6.3a, six sensors establish a formation and move toward the center of the mission space. During its movement, the formation keeps evolving, so that sensors expand the overall area of sensing and at the same time jointly cover the points with high event density. In addition, sensors also maintain wireless
FIGURE 6.3 Sensor deployment without communication cost consideration. Panels (a) to (d).
communication with the base station. This is shown in Figure 6.3 as links between sensor nodes and the base station. The team of sensors finally converges to a stationary formation as shown in Figure 6.3d. It can be seen in this symmetric formation that all six sensors are jointly sensing the area with the highest event density. We incorporate communication cost into the optimal coverage formulation by setting w2 = 0.0008 and w1 = 1 − w2 in Equation (6.26). The corresponding deployment simulation results are shown in Figure 6.4. Comparing with the first case, a critical difference can be observed in the formation of mobile sensors: sensors not only move towards the area with high event density, but they also maintain an economical multihop path to the base station. The team of sensors reaches a stationary deployment as illustrated in Figure 6.4d. In contrast to the final formation of the first case (Figure 6.3d), only four sensors gather around the center of the mission space. The other two sensors are aligned as relays to support the communication with the base station. Figures 6.5 and 6.6 demonstrate the sensing coverage and communication cost associated with the two cases previously shown. Figure 6.5 depicts the change in sensing coverage (measured by the expected frequency of event detection) when sensors move towards the optimal deployment. A direct observation is that in both cases, sensing coverage increases monotonically
FIGURE 6.4 Sensor deployment with communication cost consideration. Panels (a) to (d).
FIGURE 6.5 Comparison of sensing coverage. Frequency of detection (Hz) versus t; curves: no comm. cost, including comm. cost.
FIGURE 6.6 Comparison of communication costs. Power (nW, axis scaled by 10^5) versus t; curves: no comm. cost, including comm. cost.
with the evolution of formations. If no communication cost is considered during sensor deployment, sensing coverage reaches a maximum at 91.47 Hz. However, in the case that communication cost is considered, when sensors reach optimal deployment, only 84.74 events can be detected per second, which corresponds to a 7.36% coverage loss. This coverage loss is natural, because the optimal coverage formulation (6.26) actually trades off sensing coverage for a lower communication cost. This trade-off can be further examined by looking at Figure 6.6. If communication cost is considered, the final power consumption is $8.01 \times 10^3$ nW. Compared to the communication cost of the first case ($1.877 \times 10^5$ nW), there is a 95.73% power saving. One issue that we have not addressed explicitly in the development of this distributed cooperative coverage control approach is that of the global optimality of the gradient-based algorithm involved. This remains a topic of ongoing research. In particular, one expects that global optimality is intricately connected to properties of the event density function and the sensor model adopted.
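The two percentages quoted in the comparison above can be reproduced from the reported coverage and power figures. The short Python check below uses only those four reported numbers; the variable names are illustrative.

```python
# Reported final values for the two cases.
coverage_no_cost = 91.47      # Hz, communication cost ignored
coverage_with_cost = 84.74    # Hz, communication cost included
power_no_cost = 1.877e5       # nW, communication cost ignored
power_with_cost = 8.01e3      # nW, communication cost included

# Relative coverage loss and relative power saving, in percent.
coverage_loss = 100.0 * (coverage_no_cost - coverage_with_cost) / coverage_no_cost
power_saving = 100.0 * (1.0 - power_with_cost / power_no_cost)

print(f"coverage loss: {coverage_loss:.2f}%")
print(f"power saving:  {power_saving:.2f}%")
```

Both values are measured relative to the case without communication cost, which is the baseline implied by the text.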
6.5 Research Issues in Sensor Networks
We mentioned in Section 6.2 some of the major design and control problems related to sensor networks, which include the use of cooperative control techniques as they pertain to the case of mobile nodes. There are numerous open
research issues in the areas of routing, scheduling, power control, and the execution of various cooperative missions. In what follows, we briefly mention some of them. First, we mentioned in Section 6.2 that a potentially better structure for sensor networks is a hierarchical one, making use of clusterheads acting as intermediate processing nodes between data sources and a base station. The presence of clusterheads implies different approaches for some of the problems we have discussed; for example, deployment may be quite different if a clusterhead can “aggregate” data from neighboring nodes and avoid the need for these nodes to use up energy for direct communication with the base station. Second, we also mentioned that one form of power control is to switch the state of sensor nodes between “sleeping” and “sensing” or “relaying.” Formulating such a switching control problem and devising solution methods dependent on the information available to each node is an interesting direction for research. Third, the problem of location detection (i.e., identifying the precise location of a sensor network node) when nodes are mobile is one that deserves in-depth study. In the context of cooperative control, one can define different types of sensor network missions, typically formulated through optimization problems. Questions of local versus global optimality and the need for mechanisms consistent with the distributed nature of sensor networks are issues that remain largely unexplored to date. Although many of the open questions above are technically challenging in their own right, there are also some more fundamental issues of much broader long-term impact where progress has been minimal. These issues are closely related to the convergence of communication, computing, and control, which brings together three disciplines that often use different modeling paradigms and different ways of thinking. Naturally, bridging the gaps between them is a real challenge.
One of these issues concerns the combination of asynchronous and synchronous modes of operation in a common system setting. Although the gathering of data is inherently asynchronous (due to multiple sensor nodes operating in different temporal and spatial scales), the processes of data fusion and control are traditionally based on a synchronized time structure. This is one manifestation of the difference between time-driven and event-driven behavior, and designing a system environment where both can coexist remains an open problem; see also Reference [15]. The traditional setting of differential equation models and time-driven digital sampling provides a comfortable infrastructure for communication and control methodologies, but that is being challenged by computational models that rely on event-driven processes and by the simple intuitive observation that time-driven sampling is inherently wasteful. The limited resources of sensor network nodes emphasize the need to switch to a more efficient event-driven sampling approach, where data are collected only when “something interesting” happens. To do so, however, requires new sampling mechanisms and possibly new data collection hardware as well. A second research issue of particular importance to control theory is the obvious shift from sensor-poor to data-rich control systems. Traditional feedback control systems have been designed under the premise that sensors are
few and expensive and much of the “intelligence” in such systems is concentrated on compensating for limited state information. The sudden wealth of sensor data (subject, of course, to bandwidth and delay limitations) shifts the need for “intelligence” towards processing potentially huge amounts of data and combining model-based methodologies with increasingly data-driven ones. To date, there appears to be a significant gap between schools of thought advocating one versus the other approach. One would expect that a combination can enable us to exploit advantages of both.
Acknowledgments
This work is supported in part by the National Science Foundation under Grant DMI-0330171, by the Air Force Office of Scientific Research under Grants FA9550-04-1-0133 and FA9550-04-1-0208, by the Army Research Office under Grant DAAD19-01-0610, and by the Department of Energy under Grant DE-FG52-06NA27490.
References
1. G. Baliga, S. Graham, and P. R. Kumar, “Middleware and Abstractions in the Convergence of Control with Communication and Computation,” in Proc. of 44th IEEE Conference on Decision and Control, pp. 4245–4250, 2005.
2. C. G. Cassandras and W. Li, “Sensor Networks and Cooperative Control,” European Journal of Control, vol. 11, no. 4-5, pp. 436–463, 2005.
3. W. Li and C. G. Cassandras, “A Minimum-Power Wireless Sensor Network Self-Deployment Scheme,” in Proceedings of IEEE Wireless Communications and Networking Conference, 2005.
4. W. Li and C. G. Cassandras, “Distributed Cooperative Coverage Control of Sensor Networks,” in Proceedings of 44th IEEE Conference on Decision and Control, pp. 2542–2547, 2005.
5. I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “A Survey on Sensor Networks,” IEEE Communications Magazine, vol. 40, no. 8, pp. 102–114, 2002.
6. C. Chong, “Sensor Networks: Evolution, Opportunities, and Challenges,” Proceedings of the IEEE, vol. 91, pp. 1247–1256, 2003.
7. C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Dover Publications San Raphael, CA, 1998.
8. M. Bhardwaj, A. Chandrakasan, and T. Garnett, “Upper Bounds on the Lifetime of Sensor Networks,” in Proceedings of IEEE International Conf. on Communications, pp. 785–790, 2001.
9. S. Meguerdichian, F. Koushanfar, M. Potkonjak, and M. B. Srivastava, “Coverage Problems in Wireless Ad-Hoc Sensor Networks,” in Proceedings of IEEE INFOCOM, pp. 1380–1387, 2001.
10. J. Cortes, S. Martinez, T. Karatas, and F. Bullo, “Coverage Control for Mobile Sensing Networks,” IEEE Transactions on Robotics and Automation, vol. 20, no. 2, 2004.
11. L. Mihaylova, T. Lefebvre, H. Bruyninckx, and K. Gadeyne, “Active Sensing for Robotics. A Survey,” in Proc. of the 5th International Conference on Numerical Methods and Applications (Borovets, Bulgaria), pp. 316–324, 2002.
12. Y. Zou and K. Chakrabarty, “Sensor Deployment and Target Localization Based on Virtual Forces,” in Proceedings of IEEE INFOCOM, pp. 1293–1303, 2003.
13. T. Clouqueur, V. Phipatanasuphorn, P. Ramanathan, and K. Saluja, “Sensor Deployment Strategy for Target Detection,” in Proceedings of 1st ACM International Workshop on Wireless Sensor Networks and Applications (Atlanta, GA), pp. 42–48, 2002.
14. D. P. Bertsekas, Nonlinear Programming. Athena Scientific, Belmont, MA, 1995.
15. C. G. Cassandras, M. I. Clune, and P. J. Mosterman, “Hybrid System Simulation with SimEvents,” in Proceedings of 2nd IFAC Conference on Analysis and Design of Hybrid System, pp. 136–141, 2006.
7 Congestion Control in Computer Networks
Marios Lestas, Andreas Pitsillides, and Petros Ioannou
CONTENTS
7.1 Introduction
7.2 Problem Formulation
7.3 Previous Work
7.3.1 Dual Algorithms
7.3.2 A Primal Algorithms
7.3.3 Max-Min Congestion Controller Algorithms
7.4 Model of the Single Bottleneck Link Case
7.5 Adaptive Congestion Control Protocol
7.5.1 Protocol
7.5.1.1 Packet Header
7.5.1.2 ACP Sender
7.5.1.3 ACP Receiver
7.5.1.4 ACP Router
7.5.2 Performance Evaluation
7.5.2.1 Scalability
7.5.2.2 Performance in the Presence of Short Flows
7.5.2.3 Fairness
7.5.2.4 Dynamics of ACP
7.5.2.5 A Multilink Example
7.5.2.6 Comparison with XCP
7.5.2.7 Comparison with RCP
7.6 Conclusions
References
7.1 Introduction
In the last twenty years, the Internet has experienced tremendous growth, which has transformed it from a small-scale research network to the largest and most complex artificially deployed system. The Internet possesses similar
structural properties to the ones characterizing many other complex systems pervading science: a plethora of often heterogeneous subsystems (sources and routers) performing complex functions, interconnected by heterogeneous links (wired, wireless, satellite links) often incorporating complex dynamics themselves. There are several factors contributing to the immense complexity of the system: the large scale and size as a result of its exponential growth; the fragmented nature of the underlying infrastructure; the hierarchical organization; the extreme heterogeneity as a result of the diverse network technologies and communication services that are accommodated; the distributed management of the available resources; the complex structures that arise in the implementation of the various functionalities of the layered protocols, and so on [1]. Many of the complex network functions that drive the current Internet have been developed using engineering intuition, heuristics, and ad hoc nonlinear techniques, with the objective of making the system resilient to failures and robust to changing environments. The problem with this approach is that very little is known about why these methods work and very little explanation can be given when they fail. Given the lack of a coherent and unified theory of complex systems, these methods do not have analytically proven performance properties and can thus prove to be ineffective as the system evolves over time. When such vulnerabilities do show up, designers usually resort to even more complex network functions to solve the problem, thus contributing to a spiral of increasing complexity [1]. These observations highlight the necessity to develop a new theoretical framework to help explain the complex and unpredictable behaviors of the Internet and offer alternative network protocols that are provably effective and robust. 
Such a framework can serve as a starting point to develop a unified theory for complex systems, useful in explaining how the interaction between the individual components of such systems allows the emergence of a global behavior that would not be anticipated from the behavior of components in isolation. Dramatic progress is being made in developing such a theoretical framework to investigate and solve the problem of Internet congestion control. Congestion control is a representative example of how ad hoc solutions, although successful at the beginning, can later be found to be ineffective as the evolution of the system reveals their deficiencies. Congestion control mechanisms were introduced in the transmission control protocol (TCP) to fix the defects that led, in October 1986, to the first of a series of “congestion collapses.” The original algorithm proposed by Van Jacobson [2], with its later enhancements ([3]–[7]), led to the current implementation of TCP, which has served the Internet remarkably well as it has evolved from a small-scale network to the largest artificially deployed system. Despite its profound success, there are currently strong indications that TCP will perform poorly in future high-speed networks. Simulations and real measurements indicate that as the bandwidth delay products increase within the network, the slow additive increase and the drastic multiplicative decrease policy of TCP cause the system to spend a significant amount of time probing for the available bandwidth, thus leading to underutilization
of the available resources [8]. It has also been shown analytically that as the bandwidth delay products increase, TCP becomes oscillatory and prone to instability [9]. Moreover, TCP is grossly unfair towards connections with high round-trip delays [10]. Finally, it has been shown that in networks incorporating wireless and satellite links, long delays and noncongestion-related losses also cause the TCP protocol to underutilize the network [11]. These observations have triggered intense research activity on Internet congestion control, which has led to TCP enhancements and new congestion control protocols. Despite the fact that heuristic methods continue to provide solutions with improved properties [12],[13], there has been increasing emphasis on designs based on mathematical abstractions of networks of arbitrary topology. Such abstractions based on fluid flow models help develop solutions that can be shown analytically to be effective, scalable, robust, and stable, prior to implementation. The theoretical framework used in most of the recent studies originates in the work of Hayden [14]: however, it has gained increasing popularity due to the pioneering work of Kelly et al. [15], where it was utilized to develop scalable price-based Internet congestion control schemes. In the fore-mentioned framework, the congestion control problem is viewed as a resource allocation problem where the objective is to allocate the available resources (link bandwidths) to the competing users without the input data rates at the links exceeding the link capacity. Through an appropriate representation, this problem is transformed into a convex programming problem. A utility function is associated to each flow and the objective is to maximize the aggregate utility function subject to the capacity constraints. Congestion control algorithms can then be viewed as distributed iterative algorithms that compute optimal or suboptimal solutions of this problem. 
It turns out that solutions with the required distributed structure can be obtained by interpreting the dual variables of the relevant Lagrangian function as the prices or congestion signals generated at each link [16]. The challenge is then to establish global asymptotic stability of these schemes in the presence of heterogeneous delays. Several congestion control algorithms have been proposed using the described methodology [17], [18], many of which have been accompanied by local or global asymptotic stability results [19]–[22]. Some of these algorithms have been used as a baseline to develop practical packet-level Internet congestion control protocols which have exhibited superior performance [23], [18], [24]. However, the proposed algorithms share a common problem. Connections traveling more hops have a higher probability of being assigned a smaller sending rate value and are thus beaten down by short hop connections [14], [16]. This is known as the beat-down problem [25]. A class of congestion control algorithms that are known to solve this problem are algorithms that achieve max-min fairness at equilibrium. Max-min fairness is considered by many to be the ultimate fairness criterion as it originates from the intuitive notion of allowing each session to get as much network use as any other session; increasing the allocation of a session beyond the max-min equilibrium results in “forcing” other sessions to reduce their rates below their fair share. Max-min congestion control schemes are associated with a special class of utility
functions and are usually characterized by nonlinear feedback communication mechanisms. These nonlinearities make the analytical evaluation of the proposed congestion control schemes difficult in networks of arbitrary topology incorporating delays, and so designers usually resort to simple network topologies comprising a single bottleneck link to analytically validate their designs. In such simple networks several feedback-based control techniques have been used for design: linear control theory [26], nonlinear control theory [27], and even fuzzy logic-based control [28]. However, the lack of analytically verifiable solutions in networks of arbitrary topology has led to packet-level protocols, which fail to satisfy all the design objectives. So, despite intense research activity on Internet congestion control, the problem of designing fair (in the max-min sense) and effective congestion control protocols supported by global stability results in the presence of delays still remains open. This chapter provides a survey of recent theoretical and practical developments in modeling and design of Internet congestion control protocols. In Section 7.2 we present a theoretical framework that has been used extensively in the last few years to analyze networks of arbitrary topology and in Section 7.3 we review advances that have emerged in the context of this framework. In Section 7.4 we present a simpler mathematical model used in the analysis of recently proposed max-min congestion control schemes and then in Section 7.5 we present adaptive congestion control protocol (ACP), a new max-min congestion control protocol that has been shown through analysis and simulations to outperform previous proposals and work effectively in a number of scenarios. Finally, in Section 7.6 we offer our conclusions and future research directions.
7.2 Problem Formulation
Central to the development of Internet congestion protocols with verifiable properties are mathematical abstractions of networks of arbitrary topology. In this section, we present a fluid flow model that has been used extensively in the last few years. We use this model to formulate the congestion control problem mathematically. We consider a store and forward, packet-switched network that accommodates elastic applications. Data traffic within the network is viewed as a fluid flow. The network consists of a finite set of links $R = \{l_1, l_2, \ldots, l_L\}$, where $l_j$ denotes link $j$. Let $J$ denote the index set of the links. The network is utilized by a finite set of users $U = \{s_1, s_2, \ldots, s_N\}$, where $s_i$ denotes user $i$. Let $I$ denote the index set of the users. Each user injects data packets into the network. Associated with each user $s_i$ is its sending rate $x_i$, which is chosen based on a control law of the form:

$$\dot{w}_i = g_i(w_i, q_i), \qquad w_i(0) = w_{i0} \tag{7.1}$$

$$x_i = h_i(w_i, q_i) \tag{7.2}$$
where $w_i$ denotes a state maintained at user $s_i$ ($w_i$ may also be a vector of states), $q_i$ denotes a feedback signal received from the network which represents the presence of congestion in the route of user $s_i$, and $g_i(\cdot)$, $h_i(\cdot)$ are functions that are chosen by the congestion control strategy. We use the vector $x = [x_1, x_2, \ldots, x_N]^T$ to denote all the sending rates of the sources $s_1, s_2, \ldots, s_N$. Similarly, we use the vectors $w = [w_1, w_2, \ldots, w_N]^T$ and $q = [q_1, q_2, \ldots, q_N]^T$ to denote all the states and the feedback signals at the sources. We lump the functions $g_i(\cdot)$, $h_i(\cdot)$ to form the vector-valued functions:

$$G(w, q) = [g_1(w_1, q_1), g_2(w_2, q_2), \ldots, g_N(w_N, q_N)]^T \tag{7.3}$$

$$H(w, q) = [h_1(w_1, q_1), h_2(w_2, q_2), \ldots, h_N(w_N, q_N)]^T \tag{7.4}$$

We can then write:

$$\dot{w} = G(w, q), \qquad w(0) = w_0 \tag{7.5}$$

$$x = H(w, q) \tag{7.6}$$
To each user we also associate a utility function $U_i(x_i)$ of the sending rate $x_i$, which describes how “happy” a user is with a particular sending rate assignment. These utility functions are chosen to be strictly increasing and concave. Let $C_j$ denote the output capacity of link $j$ and let $y_j$ denote the flow rate of data into link $j$. We use the vector $y = [y_1, y_2, \ldots, y_L]^T$ to denote the vector of input flow rates at links $l_1$ to $l_L$. Similarly we define the vector $C = [C_1, C_2, \ldots, C_L]^T$. Let $A \in \mathbb{R}^{L \times N}$ denote the matrix that represents the route of each user. The entry in the $j$th row and $i$th column of $A$ is denoted by $a_{ji}$. In this representation, $A$ consists of elements equal to 0 or 1. If user $i$ utilizes link $j$ then $a_{ji}$ is equal to 1; otherwise it is equal to 0. Ignoring the queuing dynamics, we can establish that:

$$y = Ax \tag{7.7}$$
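As a small illustration of Equation (7.7), the following Python sketch builds a routing matrix for a hypothetical network with two links and three users and computes the link input rates. The topology, the rate values, and the helper name `link_rates` are assumptions made for this example only.

```python
def link_rates(A, x):
    """Return y = A x, where A[j][i] is 1 if user i uses link j and 0 otherwise."""
    return [sum(a_ji * x_i for a_ji, x_i in zip(row, x)) for row in A]

# Hypothetical topology: user 0 crosses both links,
# user 1 uses only link 0, and user 2 uses only link 1.
A = [
    [1, 1, 0],
    [1, 0, 1],
]
x = [1.0, 2.0, 3.0]  # assumed sending rates

y = link_rates(A, x)
print(y)
```

Each row of `A` corresponds to a link and each column to a user, which matches the convention $A \in \mathbb{R}^{L \times N}$ used in the text.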
At each link $j$ we associate a signal processor that generates a signal $p_j$, which denotes the congestion status at the link. The congestion signal $p_j$ is generated according to a control law of the form:

$$\dot{z}_j = d(z_j, y_j), \qquad z_j(0) = z_{j0} \tag{7.8}$$

$$p_j = v(z_j, y_j), \qquad \forall j \in J \tag{7.9}$$
where $z_j$ denotes the state of the controller at link $j$ ($z_j$ may also be a vector of states). The functions $d(\cdot)$ and $v(\cdot)$ are to be generated by the congestion control strategy. We use the vector $p = [p_1, p_2, \ldots, p_L]^T$ to denote the vector of the congestion signals at links $l_1$ to $l_L$, we use the vector $z = [z_1, z_2, \ldots, z_L]^T$ to denote the vector of the controller states at links $l_1$ to $l_L$, and we lump the functions $d(\cdot)$ and $v(\cdot)$ to form the vector-valued functions:

$$D(z, y) = [d(z_1, y_1), \ldots, d(z_L, y_L)]^T$$

$$V(z, y) = [v(z_1, y_1), \ldots, v(z_L, y_L)]^T$$

The time evolution of the congestion signals can then be described by the following control law:

$$\dot{z} = D(z, y), \qquad z(0) = z_0 \tag{7.10}$$

$$p = V(z, y) \tag{7.11}$$
The congestion signals generated at the links are communicated back to the sources, resulting in the generation of a feedback signal $q_i$ at each source $s_i$. The relationship between the feedback signals $q$, received at the sources, and the congestion signals $p$, generated at the links, is represented by a vector-valued function $F(\cdot)$ such that:

$$q = F(p) \tag{7.12}$$
The operator $F(\cdot)$ is to be determined by the congestion control strategy. It must be noted that the operator $F(\cdot)$ has specific structure, due to the assumed mechanism with which the feedback signals are generated. Data packets are active participants in this mechanism. A packet, as it traverses from its source $s_i$ to its destination, calculates the feedback signal $q_i$ by processing the congestion signals it encounters in its path. This feedback signal is communicated back to the source using an acknowledgement mechanism. So each feedback signal $q_i$ can only be a function of the congestion signals $p_j$ of the links $l_j$ which lie in the path of source $s_i$. The equations indicating how the variables defined above are coupled together are summarized below:

$$\text{Plant}: \quad y = Ax \tag{7.13}$$

$$\text{Controller}: \quad \dot{z} = D(z, y), \qquad z(0) = z_0 \tag{7.14}$$

$$p = V(z, y) \tag{7.15}$$

$$q = F(p) \tag{7.16}$$

$$\dot{w} = G(w, q), \qquad w(0) = w_0 \tag{7.17}$$

$$x = H(w, q) \tag{7.18}$$
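where $z \in \mathbb{R}^L$, $w \in \mathbb{R}^N$ are the state vectors of the system, $x, q \in \mathbb{R}^N$ and $y, p \in \mathbb{R}^L$ are system signal vectors, $D: \mathbb{R}^L \times \mathbb{R}^L \to \mathbb{R}^L$, $G: \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}^N$ are vector fields, $V: \mathbb{R}^L \times \mathbb{R}^L \to \mathbb{R}^L$, $H: \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}^N$, $F: \mathbb{R}^L \to \mathbb{R}^N$ are static, possibly nonlinear mappings, and $A \in \mathbb{R}^{L \times N}$ is a matrix. Figure 7.1 demonstrates how Equations (7.13) to (7.18) are interconnected in a feedback system.

FIGURE 7.1 Feedback system. [Block diagram: the Source Behavior block ($\dot{w} = G(w,q)$, $x = H(w,q)$) sends $x$ to the Routing Matrix $A$, which sends $y$ to the Congestion Signal Update block ($\dot{z} = D(z,y)$, $p = V(z,y)$); its output $p$ passes through the Feedback Communication block $F(\cdot)$ to produce $q$, which is fed back to the Source Behavior block.]

The control objective is then to design the operators $D(\cdot)$, $V(\cdot)$, $F(\cdot)$, $G(\cdot)$, and $H(\cdot)$ such that:

$$\lim_{t \to \infty} x(t) = x^* \tag{7.19}$$

where $x^*$ solves the following optimization problem:

$$\text{P1}: \quad \max \sum_{i \in I} U_i(x_i) \tag{7.20}$$

subject to

$$Ax \le C, \qquad x \ge 0 \tag{7.21}$$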
In other words, the objective is to maximize the aggregate utility function subject to capacity and feasibility constraints. The polyhedral constraint set described by the inequalities (7.21) is bounded. This, together with the continuity of the cost function, guarantees the feasibility of the optimization problem.
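To get a feel for problem P1, the following Python sketch evaluates it on a hypothetical network with two unit-capacity links and three users with logarithmic utilities $U_i(x_i) = \log x_i$. The topology, the capacities, the utility choice, and the grid search are all assumptions made for illustration; the search is not one of the distributed algorithms discussed in this chapter.

```python
import math

# Hypothetical example: two unit-capacity links and three users with U_i(x) = log(x).
# User 0 crosses both links; users 1 and 2 each use a single link.
A = [[1, 1, 0],
     [1, 0, 1]]
C = [1.0, 1.0]

def is_feasible(x, tol=1e-12):
    """Check the constraints Ax <= C and x >= 0 of problem P1."""
    loads = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return all(xi >= 0 for xi in x) and all(y <= c + tol for y, c in zip(loads, C))

def aggregate_utility(x):
    """Objective of P1 for logarithmic utilities."""
    return sum(math.log(xi) for xi in x)

# With increasing utilities, both links are fully used at the optimum, so
# x1 = x2 = 1 - x0 and a one-dimensional search over x0 is enough here.
best_x, best_value = None, -math.inf
steps = 3000
for k in range(1, steps):
    x0 = k / steps
    x = [x0, 1.0 - x0, 1.0 - x0]
    if is_feasible(x):
        value = aggregate_utility(x)
        if value > best_value:
            best_x, best_value = x, value

print(best_x)
```

In this example the user that crosses two links ends up with about one third of a link, while each single-link user gets about two thirds, which illustrates how the choice of utility function shapes the rate allocation across paths of different lengths.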
7.3 Previous Work
Many algorithms have been proposed to solve the above optimization problem or relaxations of the problem. These algorithms have been used as a baseline to develop packet-level protocols whose performance has been demonstrated through simulations or practical implementation. The proposed algorithms can be divided into two classes: primal algorithms and dual algorithms. In primal algorithms, network users update their sending rates using dynamic laws, while the links generate congestion signals using static laws. In dual algorithms, on the other hand, the links update their congestion signals dynamically while the users utilize static laws to determine their sending rates. Congestion control algorithms which utilize dynamic laws at both the
link and the user ends are widely known as primal-dual algorithms. In this section, we review representative primal and dual algorithms. We demonstrate how a class of dual algorithms emerges as distributed solutions of the dual of the problem P1, while a class of primal algorithms solves relaxations of the original optimization problem. These algorithms, however, suffer from the beat-down problem [25]. Congestion control algorithms that achieve max-min fairness solve this problem. We point out how max-min fairness relates to a special class of utility functions in the optimization problem P1 and we present a dual max-min congestion control algorithm that has been shown analytically to converge to the desired equilibrium point from any feasible initial condition.

7.3.1 Dual Algorithms

The main objective of dual algorithms is to solve the dual problem of P1. The dual problem is formulated using the Lagrangian function, which involves augmentation of the primal cost function with the constraints, weighted by auxiliary (or dual) variables. It turns out that these dual variables can be interpreted as the congestion signals generated at the links. In this subsection we formulate the dual problem and we present a dual algorithm that is shown to converge to the unique equilibrium point that is primal-dual optimal. The Lagrangian of the system problem P1 is given by:

$$L(x, p) = \sum_{i \in I} U_i(x_i) - p^T (Ax - C) \tag{7.22}$$
By defining $q = A^T p$ and noting that $p^T A x = x^T A^T p = x^T q$, Equation (7.22) can be rewritten as:

$$L(x, p) = \sum_{i \in I} \bigl( U_i(x_i) - x_i q_i \bigr) + \sum_{j \in J} p_j C_j \tag{7.23}$$
The Lagrangian is used to define the dual function as follows:

$$D(p) = \max_{x \ge 0} L(x, p) \tag{7.24}$$

and the dual problem is then:

$$\text{D}: \quad \min_{p \ge 0} D(p) \tag{7.25}$$
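The dual function $D(p)$ can be expressed in closed form by noting that the elements of the vector $x$ are decoupled from each other in the maximization that defines $D(p)$, and so the values that maximize the Lagrangian can be expressed as follows, where $(U_i')^{-1}$ denotes the inverse of the derivative $U_i'$ of the utility function:

$$\hat{x}_i(p) = (U_i')^{-1}(q_i) \tag{7.26}$$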
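To make Equation (7.26) concrete, the following worked step spells out the first-order condition behind it and evaluates it for the weighted logarithmic utility $U_i(x_i) = w_i \log x_i$. The step assumes that the maximizer is interior ($\hat{x}_i > 0$) and that $U_i$ is differentiable with an invertible derivative.

```latex
\begin{align*}
\frac{\partial}{\partial x_i}\bigl(U_i(x_i) - x_i q_i\bigr) = U_i'(x_i) - q_i = 0
  \quad &\Longrightarrow \quad \hat{x}_i(p) = (U_i')^{-1}(q_i), \\
U_i(x_i) = w_i \log x_i, \quad U_i'(x_i) = \frac{w_i}{x_i}
  \quad &\Longrightarrow \quad \hat{x}_i(p) = \frac{w_i}{q_i}.
\end{align*}
```

For this utility, a user's rate is inversely proportional to the total price accumulated along its path.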
Substituting Equation (7.26) into Equation (7.23) yields:

$$D(p) = \sum_{i \in I} \Bigl[ U_i\bigl((U_i')^{-1}(q_i)\bigr) - (U_i')^{-1}(q_i)\, q_i \Bigr] + \sum_{j \in J} p_j C_j \tag{7.27}$$
We now make the following assumption:

ASSUMPTION 1: The matrix $A$ is full row rank.

The above assumption guarantees that the dual function is strictly convex. The convexity of $D(p)$ follows from the following observation. Let $f_i(z) = U_i\bigl((U_i')^{-1}(z)\bigr) - (U_i')^{-1}(z)\, z$. Then it is not hard to verify that:

$$\frac{d f_i(z)}{dz} = -(U_i')^{-1}(z) \tag{7.28}$$
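The verification of Equation (7.28) is a direct application of the chain rule. In the sketch below, $g_i(z) = (U_i')^{-1}(z)$ is a shorthand introduced only for this step, so that $U_i'(g_i(z)) = z$; differentiability of $g_i$ is assumed.

```latex
\begin{align*}
f_i(z) &= U_i\bigl(g_i(z)\bigr) - g_i(z)\, z, \\
\frac{d f_i(z)}{dz}
  &= U_i'\bigl(g_i(z)\bigr)\, g_i'(z) - g_i'(z)\, z - g_i(z) \\
  &= z\, g_i'(z) - z\, g_i'(z) - g_i(z) \\
  &= -g_i(z) = -(U_i')^{-1}(z).
\end{align*}
```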
Because the utility function $U_i(\cdot)$ is assumed to be strictly increasing and concave, it follows that $-(U_i')^{-1}(z)$ is strictly negative and increasing, from which we can conclude that $f_i(z)$ is strictly convex. The strict convexity of $f_i(z)$ $\forall i$, and Assumption 1, then guarantee that the dual function $D(p)$ is strictly convex. The feasibility of the problem P1, the concavity of the cost function, and the polyhedral constraint set guarantee the existence of at least one Lagrange multiplier ([29], p. 437) and thus of at least one solution $p^*$ of the dual problem ([30], p. 317). The strict convexity of the dual function $D(p)$ then guarantees the uniqueness of $p^*$. At the unique optimal solution $p^*$ of the dual problem, the necessary and sufficient optimality condition ([29], p. 176) yields:

$$\nabla D(p^*)^T p^* = (C - Ax^*)^T p^* = 0, \qquad (C - Ax^*) \ge 0 \tag{7.29}$$
where $x_i^* = (U_i')^{-1}(q_i^*)$ and $q^* = A^T p^*$. Since $p^* \ge 0$, it follows that $x^* \ge 0$. The latter and the inequality $(Ax^* - C) \le 0$ guarantee that $x^*$ is feasible. It is also true that:

$$\sum_{i \in I} U_i(x_i^*) = L(x^*, p^*) = \max_{x \ge 0} L(x, p^*) \tag{7.30}$$
from which it follows ([30], p. 316) that the pair $(x^*, p^*)$ is primal-dual optimal. The preceding analysis that characterizes the desired equilibrium properties of the system provides insights on how to update the primal and dual variables so that they converge to the desired equilibrium values. We consider the following dual algorithm:

$$\dot{p}_j = \begin{cases} y_j - C_j & \text{if } p_j > 0 \\ y_j - C_j & \text{if } p_j = 0,\ y_j - C_j > 0 \\ 0 & \text{if } p_j = 0,\ y_j - C_j \le 0 \end{cases} \qquad p_j(0) = p_{j0} \tag{7.31}$$

$$x_i = (U_i')^{-1}(q_i) \tag{7.32}$$
where

$$y_j = \sum_{i \in I_j} x_i, \qquad q_i = \sum_{j \in J_i} p_j \tag{7.33}$$
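The dual algorithm (7.31) to (7.33) can be explored numerically. The Python sketch below integrates it with a forward-Euler scheme on a hypothetical network with two unit-capacity links and three users with $U_i(x_i) = \log x_i$, for which the source law (7.32) reduces to $x_i = 1/q_i$. The topology, the step size, the initial prices, and the rate cap `X_MAX` (used only to avoid division by zero when a price sum is zero) are assumptions of this sketch and do not come from the text.

```python
# Hypothetical network: two unit-capacity links, three users with U_i(x) = log(x),
# so that the source law (7.32) becomes x_i = 1 / q_i.
A = [[1, 1, 0],   # A[j][i] = 1 if user i uses link j
     [1, 0, 1]]
C = [1.0, 1.0]
X_MAX = 10.0      # assumed cap on the rate when q_i = 0 (not part of the source)

def source_rates(p):
    """Static source law: q_i is the sum of prices on the path, x_i = 1 / q_i."""
    n_users = len(A[0])
    q = [sum(A[j][i] * p[j] for j in range(len(A))) for i in range(n_users)]
    return [min(X_MAX, 1.0 / qi) if qi > 0 else X_MAX for qi in q]

def simulate_dual(p0, dt=0.01, steps=5000):
    """Forward-Euler integration of the projected price dynamics (7.31)."""
    p = list(p0)
    for _ in range(steps):
        x = source_rates(p)
        y = [sum(A[j][i] * x[i] for i in range(len(x))) for j in range(len(A))]
        for j in range(len(A)):
            excess = y[j] - C[j]
            # Projection: a zero price is not pushed negative.
            dp = excess if (p[j] > 0 or excess > 0) else 0.0
            p[j] = max(0.0, p[j] + dt * dp)
    return p, source_rates(p)

p_final, x_final = simulate_dual([1.0, 1.0])
print(p_final, x_final)
```

In this example the prices settle near 1.5 on both links and the rates near (1/3, 2/3, 2/3), so that both links are fully utilized at equilibrium.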
In Equation (7.33), $I_j$ is the index set of the users utilizing link $j$ and $J_i$ is the index set of the links that lie in the path of user $i$. We summarize the stability properties of the system in the following theorem.

THEOREM 1 Suppose Assumption 1 holds. Then starting from any initial condition $p_j(0) \ge 0$, $j \in J$, the system of differential Equations (7.31) to (7.32) has a unique equilibrium point $p^*$ which is globally asymptotically stable. In addition, the corresponding vector $x^*$ is the unique solution of the problem P1.

PROOF We first show that the following function is a Lyapunov function for the system of differential Equations (7.31) to (7.32):

$$V(p) = \sum_{i \in I} \int_{q_i^*}^{q_i} -(U_i')^{-1}(\sigma)\, d\sigma + \sum_{j \in J} (p_j - p_j^*)\, C_j \tag{7.34}$$
Since $p(0) \ge 0$, Equation (7.31) guarantees that $p_j(t) \ge 0$ $\forall t \ge 0$. Because $V(p) = D(p) - D(p^*)$ and $D(p^*) = \min_{p \ge 0} D(p)$, $V(p) \ge 0$ for all $p \ge 0$. In addition, it follows that $V(p^*) = 0$. The strict convexity of the dual function guarantees that $p^*$ is the only value in $p \ge 0$ for which the latter is true. The time derivative of $V(p)$ is given by:

$$\dot{V} = \nabla V(p)^T \dot{p} = \sum_{j \in J} (C_j - y_j)\, \dot{p}_j = \sum_{j \in J \setminus J(t)} -(y_j - C_j)^2 \le 0 \tag{7.35}$$

where $J(t) = \{ j : p_j(t) = 0,\ y_j(t) - C_j \le 0 \}$. When $\dot{V} = 0$, for each $j$, either $y_j = C_j$ and $p_j > 0$, or $p_j = 0$ and $y_j \le C_j$. So, for all $p \ge 0$ such that $\dot{V} = 0$, $(C - Ax)^T p = 0$ and $Ax - C \le 0$. The latter two conditions are the necessary optimality conditions for the minimization of $V(p)$ subject to $p \ge 0$, which are satisfied only when $p = p^*$. This suggests that $\dot{V}(p) < 0$ for all $p \ge 0$ except when $p = p^*$, in which case $\dot{V}(p) = 0$. It follows that $p^*$ is globally asymptotically stable.

The algorithm presented is similar to the algorithm proposed in Reference [16] and the stability proof is along the lines of the proof in Reference [31]. The work is closely related to the work of Kelly and coworkers in References [32] and [15]. Kelly decomposes the problem P1 into user and network subproblems. The network subproblem is a special case of P1 where the utility functions are weighted logarithmic functions. A class of dual algorithms is proposed in Reference [15] which solve relaxations of the problem. The aforementioned algorithms and the related stability results were based on fluid flow models that ignore feedback delays. Significant efforts have been made
to develop modifications that ensure stability in the presence of feedback delays. A notable attempt is reported in Reference [18] where the following algorithm is proposed:

$$\dot{p}_j = \left[ \frac{y_j - C_j}{C_j} \right]^+_{p_j} \tag{7.36}$$

$$x_i = \bar{x}_i\, e^{-\frac{\alpha_i q_i}{M_i \tau_i}} \tag{7.37}$$

where $M_i$ is an upper bound on the number of bottleneck links that source $i$ sees in its path, $\tau_i$ is the round-trip time of source $i$, $\alpha_i$ is a positive source gain, $\bar{x}_i$ is a source constant, and $[f(x)]^+_x$ is defined as:

$$[f(x)]^+_x = \begin{cases} f(x) & \text{if } x > 0 \\ \max(f(x), 0) & \text{if } x = 0 \end{cases} \tag{7.38}$$

For the above algorithm, conditions for local stability in the presence of delays have been established in Reference [18] using frequency response methods, and conditions for global asymptotic stability have been established in Reference [22] using Lyapunov-Krasovskii functionals.

7.3.2 Primal Algorithms

The above analysis demonstrates that dual algorithms can solve the original optimization problem exactly. The problem with dual algorithms is that smart decisions are taken within the network, thus violating the end-to-end principle that has shaped the Internet. We are thus motivated to study primal algorithms where smart decisions are taken by the end systems. However, as demonstrated in several studies, primal algorithms can only solve relaxations of the original optimization problem. In this subsection we formulate an alternative optimization problem, which approximates the problem P1, and we present a primal algorithm that converges to the unique solution of this problem. We consider functions $f_j(\sigma)$, $j \in J$, which are non-negative, continuous, increasing, and not identically zero, and we consider the following optimization problem:

$$\text{P2}: \quad \max_{x \ge 0} V(x) = \sum_{i \in I} U_i(x_i) - \sum_{j \in J} \int_0^{y_j} f_j(\sigma)\, d\sigma \tag{7.39}$$
The functions $f_j(\sigma)$, $j \in J$, are chosen such that $V(x)$ is coercive, that is, $V(x) \to -\infty$ when $\|x\| \to \infty$. The latter condition, together with the continuity of $V(x)$ and the closure of the constraint set, guarantees the feasibility of the optimization problem. The reasoning behind the consideration of problem P2 is that by suitable choice of the functions $f_j(\sigma)$, $j \in J$, one can approximate the original constrained optimization problem P1 with the problem P2, which involves a penalty for violating the constraints. Because the functions $f_j(\sigma)$, $j \in J$,
are increasing, it follows that the cost function $V(x)$ is strictly concave. This guarantees the uniqueness of a rate vector $x^*$ which solves problem P2. At the optimal vector $x^*$, the necessary and sufficient optimality condition yields:

$$\nabla V(x^*) \le 0, \qquad \nabla V(x^*)^T x^* = 0 \tag{7.40}$$

This gives the following set of equations:

$$U_i'(x_i^*) - \sum_{j \in J_i} f_j(y_j^*) \le 0 \tag{7.41}$$

$$\Bigl( U_i'(x_i^*) - \sum_{j \in J_i} f_j(y_j^*) \Bigr)\, x_i^* = 0 \tag{7.42}$$
We are looking for congestion control laws with the structure described in the previous section which solve the optimization problem P2. We consider the following control law:

$$\dot{x}_i = \begin{cases} U_i'(x_i) - q_i & \text{if } x_i > 0 \\ U_i'(x_i) - q_i & \text{if } x_i = 0,\ U_i'(x_i) - q_i > 0 \\ 0 & \text{if } x_i = 0,\ U_i'(x_i) - q_i \le 0 \end{cases} \qquad x_i(0) = x_{i0} \tag{7.43}$$

where

$$q_i = \sum_{j \in J_i} f_j(y_j), \qquad y_j = \sum_{i \in I_j} x_i \tag{7.44}$$
We summarize the convergence properties of the above algorithm in the following theorem.

THEOREM Starting from any initial condition $x_i(0) \ge 0$, $i \in I$, the system of differential Equations (7.43) to (7.44) has a unique equilibrium point $x^*$ that is globally asymptotically stable. The vector $x^*$ is the unique solution of the problem P2.

The proof of the above theorem is similar to the proof of Theorem 1 and is omitted. The cost function in Equation (7.39) serves as a Lyapunov function for the system of differential equations. It must be noted that several modified versions of the control law (7.43) to (7.44) appear in the literature. For the weighted logarithmic utility function $U_i(x_i) = w_i \log x_i$, which appears in Kelly’s network subproblem, the following source algorithm has been proposed:

$$\dot{x}_i = k_i (w_i - x_i q_i) \tag{7.45}$$
A simple condition on the gains ki was derived in Reference [19] which guarantees local stability in the presence of delays whenever all round-trip times are equal. This same condition is conjectured to be true in the case of heterogeneous delays. This conjecture is shown to be true in Reference [33].
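The source law (7.45) can be tried out numerically in the delay-free case. The Python sketch below uses a forward-Euler scheme on a hypothetical network with two unit-capacity links and three users. The topology, the weights and gains, the step size, and in particular the link penalty $f_j(y_j) = (y_j / C_j)^B$ with $B = 4$ are assumptions chosen for this illustration.

```python
# Hypothetical setup: two unit-capacity links, three users, weights w_i = 1,
# gains k_i = 1, and an assumed link penalty f_j(y) = (y / C_j) ** B.
A = [[1, 1, 0],   # A[j][i] = 1 if user i uses link j
     [1, 0, 1]]
C = [1.0, 1.0]
W = [1.0, 1.0, 1.0]
K = [1.0, 1.0, 1.0]
B = 4.0

def feedback(x):
    """Return q_i = sum of f_j(y_j) over the links on the path of user i."""
    y = [sum(A[j][i] * x[i] for i in range(len(x))) for j in range(len(A))]
    f = [(y[j] / C[j]) ** B for j in range(len(A))]
    return [sum(A[j][i] * f[j] for j in range(len(A))) for i in range(len(x))]

def simulate_primal(x0, dt=0.01, steps=5000):
    """Forward-Euler integration of the source law (7.45), ignoring delays."""
    x = list(x0)
    for _ in range(steps):
        q = feedback(x)
        x = [x[i] + dt * K[i] * (W[i] - x[i] * q[i]) for i in range(len(x))]
    return x

x_eq = simulate_primal([0.1, 0.1, 0.1])
q_eq = feedback(x_eq)
link_load = x_eq[0] + x_eq[1]   # input rate at link 0
print(x_eq, link_load)
```

At the equilibrium reached here, $w_i = x_i q_i$ holds for every user, and the load on each link slightly exceeds its capacity. This is consistent with the penalty formulation: with a finite penalty the capacity constraint is softened rather than enforced exactly.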
Local stability certificates in the presence of heterogeneous delays were also derived in Reference [20] for the following algorithm:

$$\dot{x}_i = k_i x_i \left( 1 - \frac{q_i}{U_i'(x_i)} \right) \tag{7.46}$$

$$f_j(y_j) = \left( \frac{y_j}{C_j} \right)^B \tag{7.47}$$
where $B > 0$. A special case of the above algorithm which can be used to describe TCP-like congestion algorithms ([20], [21]) is the following:

$$\dot{x}_i = k_i x_i \left( \frac{\alpha}{x_i^n} - b\, x_i^m q_i \right) \tag{7.48}$$

$$f_j(y_j) = \left( \frac{y_j}{C_j} \right)^B \tag{7.49}$$
where $\alpha$, $b$ are positive real numbers and $m$, $n$ are real numbers that satisfy $m + n > 0$. For the above algorithm, global asymptotic stability in the presence of heterogeneous delays was established in Reference [21] using Razumikhin’s theorem.

7.3.3 Max-Min Congestion Controller Algorithms

The algorithms reviewed in the previous section share a common method with which they calculate the feedback signals received by the network users as a function of the congestion signals generated within the network. At each user $s_i$ the feedback signal $q_i$ is calculated by adding the congestion signals $p_j$ encountered in the path of the user:

$$q_i = \sum_{j \in J_i} p_j \tag{7.50}$$
Congestion control schemes that adopt this approach are known to suffer from the beat-down problem [25]. Connections traveling more hops have a higher probability of being assigned a smaller sending rate value and are thus beaten down by short hop connections [14], [16]. An equilibrium rate assignment that solves the fore-mentioned problem is the one that achieves max-min fairness. Max-min fairness was motivated by the intuitive notion of allowing each session to get as much network use as any other session, with any further increase in its rate resulting in the reduction of other sessions. This has led to the idea of maximizing the network use allocated to the users with the minimum allocation, thus giving rise to the term max-min flow control. Max-min fairness has been defined in a number of ways. With respect to the optimization problem P1, max-min fairness can be defined using a special class of utility functions. The vector x ∗ is said to be max-min fair if it solves
the following optimization problem:

$$\text{P3}: \quad \max \lim_{\alpha \to \infty} \sum_{i \in I} -(-\log x_i)^{\alpha} \qquad \text{subject to } Ax \le C, \quad x \ge 0 \tag{7.51}$$
This suggests that by appropriate choice of the utility functions, one can use the methods described in the previous section to design congestion control algorithms which converge to sending rate vectors approximating the max-min fair allocation. However, such an approach results in high gains at the sources and leads to undesirable transient properties. In addition, only approximations to the max-min allocation can be achieved. So, over the years network engineers have used different approaches to design max-min congestion control schemes based on alternative definitions of max-min fairness. The most popular design approach has been the following: each link generates a signal which denotes the sending rate it desires from all the users traversing the link. Then a packet, as it traverses from source to destination, accumulates in its header the minimum of the desired sending rates it encounters in its path. This information is communicated back to the sources, which update their transmission rates accordingly. So, at each user $s_i$ the feedback signal $q_i$ is equal to the minimum of the congestion signals $p_j$ encountered in the path of the user:

$$q_i = \min_{j \in J_i} p_j \tag{7.52}$$
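The two feedback rules can be compared side by side. The Python sketch below computes, for a hypothetical routing matrix and an assumed vector of link signals, the per-user feedback under the summation rule (7.50) and under the minimum rule (7.52). The helper name `path_feedback` and the numerical values are illustrative assumptions, and every user is assumed to traverse at least one link.

```python
def path_feedback(A, p):
    """Return (q_sum, q_min): per-user feedback under Equations (7.50) and (7.52)."""
    n_links, n_users = len(A), len(A[0])
    q_sum, q_min = [], []
    for i in range(n_users):
        # Congestion signals of the links on the path of user i.
        on_path = [p[j] for j in range(n_links) if A[j][i] == 1]
        q_sum.append(sum(on_path))
        q_min.append(min(on_path))
    return q_sum, q_min

# Hypothetical topology: user 0 crosses both links, users 1 and 2 use one link each.
A = [[1, 1, 0],
     [1, 0, 1]]
p = [0.5, 1.5]   # assumed link signals

q_sum, q_min = path_feedback(A, p)
print(q_sum, q_min)
```

Under the summation rule the feedback of the two-hop user grows with the number of links it crosses, whereas under the minimum rule it depends only on the most restrictive link on its path.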
Note the difference from Equation (7.50): the summation operator is replaced with the minimum operator. This nonlinearity in the feedback mechanism makes the analysis of max-min congestion control schemes in networks of arbitrary topology difficult. Local stability results have been established in a number of studies taking advantage of the decoupled set of differential equations which describe the system in a neighborhood about the equilibrium point [34]–[36]. However, global stability results are rather difficult to obtain due to the system being hybrid [37]. Below, we present a max-min congestion controller which has been shown analytically to converge to the desired equilibrium point in the absence of delays. Other algorithms that appear in the literature and are accompanied by similar stability results ([14], [38]) are modified versions of the algorithm presented here. The establishment of global asymptotic stability of these algorithms in the presence of delays still remains an open problem. The congestion control algorithm is the following, where the minimum in Equation (7.53) is understood to be taken over the links that lie in the path of user $i$ (those with $a_{ji} = 1$), consistent with Equation (7.52):

$$x_i = \min_{j} a_{ji}\, p_j, \qquad \forall i \in I \tag{7.53}$$

$$\dot{p}_j = \mathrm{Pr}[C_j - y_j], \qquad p_j(0) = p_{j0}, \qquad \forall j \in J \tag{7.54}$$

$$y_j = \sum_{i \in I_j} x_i, \qquad \forall j \in J \tag{7.55}$$
where $p_{j0} \ge 0$ and the projection operator $\mathrm{Pr}[\cdot]$ is defined as follows:

$$\dot{p}_j = \begin{cases} C_j - y_j & \text{if } p_j < K \\ C_j - y_j & \text{if } p_j \ge K,\ C_j - y_j < 0 \\ 0 & \text{otherwise} \end{cases} \tag{7.56}$$

The initial states $\{p_j(0), j \in J\}$ of the controllers are chosen to be non-negative to ensure that the sending rates remain non-negative at all times. The projection operator in Equation (7.56) guarantees that the controller states are bounded from above. In order to ensure that this upper bound does not affect the convergence properties of the feedback system, we choose the parameter $K$ to be larger than the maximum capacity in the network. We summarize the properties of the above algorithm in the following theorem.

THEOREM The congestion control algorithm (7.53) to (7.55) guarantees that the controller states $\{p_j, j \in J\}$ are bounded, $x_i(t) \ge 0$, $\forall t \ge 0$, $\forall i \in I$, and

$$\lim_{t \to \infty} x(t) = x^* \tag{7.57}$$
for any feasible initial condition {xi (0) ≥ 0, i I }, where x ∗ is the max-min vector of sending rates which solves problem P3. The proof of this theorem can be found in Reference [39]. The proof is long and technical and demonstrates the degree of complexity involved in the establishment of global asymptotic stability results for max-min congestion controllers.
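The behavior stated in the theorem can be observed on a small example. The Python sketch below integrates the controller (7.53) to (7.56) with a forward-Euler scheme on a hypothetical two-link network in the absence of delays. The topology, the capacities, the bound `K_BOUND`, the step size, and the zero initial states are assumptions made for this illustration only.

```python
# Hypothetical network: link 0 has capacity 1, link 1 has capacity 2.
# User 0 crosses both links; users 1 and 2 each use a single link.
A = [[1, 1, 0],   # A[j][i] = 1 if user i uses link j
     [1, 0, 1]]
C = [1.0, 2.0]
K_BOUND = 10.0    # assumed upper bound K, larger than the maximum capacity

def rates(p):
    """Source law (7.53): each user sends at the smallest p_j on its path."""
    return [min(p[j] for j in range(len(A)) if A[j][i] == 1)
            for i in range(len(A[0]))]

def simulate_max_min(p0, dt=0.01, steps=3000):
    """Forward-Euler integration of the link law (7.54) with projection (7.56)."""
    p = list(p0)
    for _ in range(steps):
        x = rates(p)
        y = [sum(A[j][i] * x[i] for i in range(len(x))) for j in range(len(A))]
        for j in range(len(A)):
            slack = C[j] - y[j]
            # Projection: the state may not grow once it has reached K_BOUND.
            dp = slack if (p[j] < K_BOUND or slack < 0) else 0.0
            p[j] += dt * dp
    return p, rates(p)

p_final, x_final = simulate_max_min([0.0, 0.0])
print(x_final)
```

In this example the rates approach (0.5, 0.5, 1.5): the two users sharing the unit-capacity link split it equally, and the remaining user takes the capacity left over on the other link, which is the max-min fair allocation for this topology.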
7.4 Model of the Single Bottleneck Link Case
In the previous section we reviewed representative congestion control algorithms. We focused on algorithms whose equilibrium points can be shown to be globally asymptotically stable using fluid flow models of arbitrary networks that ignore delays. As pointed out in the survey, some of these algorithms have also been shown to be globally stable in the presence of delays, thus leading to congestion control protocols with verifiable properties. However, for max-min congestion control schemes, the establishment of global asymptotic stability in the presence of delays still remains an open and challenging research problem. This problem has forced designers to develop max-min congestion control schemes based on models comprising a single bottleneck link. These models, simpler in terms of the assumed topology but more complex in terms of the considered dynamics, reveal insights on how to develop algorithms that remain stable in the presence of delays and queueing dynamics. In this section, we present a widely used mathematical model of a single bottleneck link network, which we use to demonstrate how the link
[Figure: Source 1 to Source N feed a bottleneck link with input rate y, queue q, and capacity C, which serves Destination 1 to Destination N; a signal processor at the link computes p.]
FIGURE 7.2 Single bottleneck link network.
side algorithm presented in Equation (7.54) needs to be modified to account for delays and queuing dynamics. We consider the single bottleneck link network shown in Figure 7.2. It consists of N users, which share a common bottleneck link through high-bandwidth access links. At the bottleneck link we assume that there exists a buffer, which accommodates the incoming packets. The rate of data entering the buffer is denoted by y, the queue size is denoted by q, and the output capacity is denoted by C. At the bottleneck link, we implement a signal processor, which calculates the desired sending rate p. This information is communicated back to the network users which respond by setting their sending rate equal to the received information. We assume that all network users have the same round-trip propagation delay τ and so the sending rate of all users is equal to the same delayed value of the desired sending rate. The input data rate at the link is thus given by the following equation:

$$y(t) = N p(t - \tau) \tag{7.58}$$

We model the queue as a simple integrator as follows:

$$\dot{q} = y - C, \qquad q(0) = q_0 \tag{7.59}$$

In addition, the signal processor calculates the desired sending rate p according to the following differential equation:

$$\dot{p} = \frac{1}{N}\left[\frac{k_i}{\tau}(C - y) - \frac{k_q}{\tau^2}\, q\right], \qquad p(0) = p_0 \tag{7.60}$$
where ki and kq are design parameters. We define the variable x(t) = y(t) − C. We substitute the latter in Equations (7.58) to (7.60) to obtain the following set of differential equations:

$$\dot{x} = -\frac{k_i}{\tau}\, x(t - \tau) - \frac{k_q}{\tau^2}\, q(t - \tau), \qquad x(0) = x_0 \tag{7.61}$$

$$\dot{q} = x, \qquad q(0) = q_0 \tag{7.62}$$
The above equations have been used to describe a number of recently proposed congestion control protocols: the explicit congestion control protocol (XCP) presented in Reference [40], the rate control protocol (RCP) presented in Reference [26], and the ACP protocol described in the next section. The following theorem gives conditions on ki and kq which guarantee that the system is stable.

THEOREM If the parameters ki and kq satisfy:

$$0 < k_i < \frac{\pi}{4\sqrt{2}}, \qquad k_q = k_i^2 \sqrt{2} \tag{7.63}$$
then the system (7.61) to (7.62) is stable independent of delay, capacity, and number of sources. The proof of this theorem can be found in Reference [40]. More relaxed stability bounds using nonlinear analysis tools can be found in References [26] and [41]. It is interesting to compare the link-side algorithm described by Equation (7.60) with the link-side algorithm described by Equation (7.54). The latter algorithm, which guarantees stability when delays and queuing dynamics are ignored, simply integrates the excess capacity. The former algorithm continues to integrate the excess capacity, but incorporates additional terms that guarantee that the system is stable in the presence of delays and queuing dynamics. A queue size term is introduced to ensure almost zero queue sizes at equilibrium. In addition, in order to maintain stability in the presence of delays, the control parameters are normalized with the time delay and also with the number of flows N utilizing the bottleneck link. It must be noted that the latter is an unknown time-varying parameter that needs to be estimated. Equation (7.60) has been used as a baseline to develop a number of congestion control protocols which, however, differ significantly in the packet-level implementation of the equation, leading to protocols with different performance characteristics. In the next section we present ACP, an adaptive congestion control protocol, which is also based on Equation (7.60) and is shown through analysis and simulations to satisfy all the design requirements of congestion control protocols, outperforming previous proposals.
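The delay-independence statement of the theorem can be explored numerically. The following Python sketch integrates Equations (7.61) and (7.62) with a forward Euler scheme and a constant initial history on [−τ, 0]. The function name, the step size, the horizon, the initial values, and the choice ki = 0.2 are illustrative assumptions that do not come from the text; only the relation between kq and ki is taken from Equation (7.63).

```python
import math


def simulate(tau, k_i, x0=10.0, q0=5.0, steps_per_delay=100, horizon_in_delays=300):
    """Forward Euler integration of (7.61)-(7.62) for one round-trip delay tau.

    All numerical settings here are assumptions made for illustration.
    """
    k_q = k_i ** 2 * math.sqrt(2.0)      # relation required by Equation (7.63)
    n = steps_per_delay                  # number of Euler steps per delay tau
    dt = tau / n
    xs = [x0] * (n + 1)                  # assumed constant history on [-tau, 0]
    qs = [q0] * (n + 1)
    for _ in range(horizon_in_delays * n):
        x_now, q_now = xs[-1], qs[-1]
        x_del, q_del = xs[-1 - n], qs[-1 - n]    # values at time t - tau
        dx = -(k_i / tau) * x_del - (k_q / tau ** 2) * q_del   # Equation (7.61)
        xs.append(x_now + dt * dx)
        qs.append(q_now + dt * x_now)                          # Equation (7.62)
    return xs, qs


if __name__ == "__main__":
    for tau in (0.05, 0.5):
        xs, qs = simulate(tau, 0.2)
        print(tau, xs[-1], qs[-1])
```

Under these assumptions both runs end with x and q very close to zero even though the delays differ by a factor of ten. This is consistent with the theorem but is only a numerical illustration, not a proof.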
7.5 Adaptive Congestion Control Protocol
Due to the problems encountered by TCP in networks with high-bandwidth delay products, several max-min Internet congestion control protocols have been proposed recently. Attempts to develop algorithms that do not require maintenance of per flow states within the network include the queue length-based approach in Reference [42], the XCP protocol presented in Reference [40], and the RCP protocol presented in Reference [26]. All these approaches have distinct disadvantages. The scheme proposed in Reference [42] generates feedback signals using queue length information only. However, it is well known that such an approach offers limited control space and thus leads to significant oscillations and degradation in performance in networks with high-bandwidth delay products. The RCP protocol has been designed with the objective of minimizing the completion time of the network flows. In order to achieve the latter, it applies a rather aggressive policy when increasing or decreasing the sending rate of the network users. However, such an aggressive policy can cause underutilization of the network for large periods of time. XCP constitutes the most promising approach as it achieves high network utilization; smooth and fast responses; scalability with respect to changing bandwidths, delays, and the number of users utilizing the network; small queue sizes; and almost no packet drops. However, it has been shown in Reference [43] that the scheme fails to achieve fairness in scenarios with multiple congested links. The deficiencies of the aforementioned protocols indicate that the problem of high-speed Internet congestion control still remains open. In this section we present an adaptive congestion control protocol (ACP), a new congestion control protocol with learning capability, which outperforms previous proposals and is shown through simulations to work effectively in a number of scenarios.
ACP can be characterized as a dual protocol where intelligent decisions are taken within the network. The main control architecture is in the same spirit as the one used by the available bit rate (ABR) service in asynchronous transfer mode (ATM) networks. Each link calculates at regular time intervals a value that represents the sending rate it desires from all users traversing the link. A packet, as it traverses from source to destination, accumulates in a designated field in the packet header the minimum of the desired sending rates it encounters in its path. This information is communicated to the user that has generated the packet through an acknowledgment mechanism. The user-side algorithm then gradually modifies its congestion window in order to match its sending rate with the value received from the network. The user-side algorithm also incorporates a delayed increase policy in the presence of congestion to avoid excessive queue sizes and reduce packet drops. The design of the link-side algorithm which calculates the desired sending rate is based on the algorithm described by Equation (7.60). The algorithm integrates the excess capacity and introduces a queue size term to ensure almost zero queue sizes at equilibrium. In order to maintain stability in the presence
of delays the algorithm requires the number of users utilizing each link. This is an unknown time-varying parameter that needs to be estimated. Algorithms that have been proposed to estimate this parameter are based on pointwise division in time [44]–[46]. This approach, however, is known to lack robustness and lead to erroneous estimates. In ACP online parameter identification techniques are used to derive an estimation algorithm which is shown through analysis and simulations to work effectively. In the rest of this section we describe in detail the proposed congestion control scheme and we evaluate its performance using simulations. Extensive simulations indicate that the proposed protocol satisfies all the design objectives. The scheme guides the network to a stable equilibrium which is characterized by high network utilization, max-min fairness, small queue sizes, and almost no packet drops. It is scalable with respect to changing delays, bandwidths, and number of users utilizing the network. It also exhibits nice dynamic properties such as smooth responses and fast convergence. In our simulations we use realistic traffic patterns which include both bulk data transfers and short-lived flows. More details about ACP and additional simulation results involving random packet drops can be found in Reference [41].

7.5.1 Protocol

7.5.1.1 Packet Header

In a way similar to XCP, the ACP packet carries a congestion header which consists of three fields as shown in Figure 7.3. The H_rtt field carries the current round-trip time estimate of the source that has generated the packet. The field is set by the user and is never modified in transit. It is read by each router and is used to calculate the control period. The H_feedback field carries the sending rate which the network requests from the user that has generated the packet. This field is initiated with the user’s desired rate and is then updated by each link the packet encounters in its path.
At each link, the value in the field is compared with the desired sending rate value and the smallest value is stored in the H_feedback field. In this way, a packet as it traverses from source to destination accumulates the minimum sending rate it encounters in its path. The H_congestion bit is a single bit which is initialized by the user with a zero value and is set by a link if the input data rate at that link is more than 95% of the link capacity. In this way, the link informs its users that it is
H_rtt (sender’s rtt estimate)
H_feedback (desired sending rate)
H_congestion (congestion bit)
FIGURE 7.3 ACP congestion header.
on the verge of becoming congested so that they can apply a delayed increase policy and avoid excessive instantaneous queue sizes and packet losses.

7.5.1.2 ACP Sender

As in TCP, ACP maintains a congestion window cwnd which represents the number of outstanding packets and an estimate of the current round-trip time rtt. In addition to these variables ACP calculates the minimum of the round-trip time estimates which have been recorded, mrtt. This is a good measure of the propagation delay of the source destination path and is used to transform the rate information reaching the sender to window information. The initial congestion window value is set to 1 and is never allowed to become less than this value because this would cause the source to stop sending data. On packet departure, the H_feedback field in the packet header is initialized with the desired sending rate of the application and the H_rtt field stores the current estimate of the round-trip time. If the source does not have a valid estimate of the round-trip time the H_rtt field is set to zero. The congestion window is updated every time the sender receives an acknowledgment. When a new acknowledgment is received, the value in the H_feedback field, which represents the sending rate requested by the network in bytes per second, is read and is used to calculate the desired congestion window as follows:

$$\text{desired\_window} = \frac{\text{H\_feedback} \times \text{mrtt}}{\text{size}} \tag{7.64}$$
where size is the packet size in bytes. We multiply with the mrtt to transform the rate information into window information and we divide by the packet size to change the units from bytes to packets. The desired window is the new congestion window requested by the network. We do not immediately set the cwnd equal to the desired congestion window because this abrupt change may lead to bursty traffic. Instead we choose to gradually make this change by means of a first-order filter. The smoothing gain of this filter depends on the state of the H_congestion bit in the acknowledgment received. If this is equal to 1, which indicates congestion in the source destination path, we apply a less aggressive increase policy. The congestion window is updated according to the following equation:

$$\text{cwnd} = \text{cwnd} + \frac{0.1}{\text{cwnd}}\,(\text{desired\_window} - \text{cwnd}) \tag{7.65}$$

if desired_window > cwnd and H_congestion = 1, and

$$\text{cwnd} = \Pr\left[\text{cwnd} + \frac{1}{\text{cwnd}}\,(\text{desired\_window} - \text{cwnd})\right] \tag{7.66}$$

otherwise. The projection operator Pr[.] is defined below and guarantees that the congestion window does not become less than 1.

$$\Pr[x] = \begin{cases} x & \text{if } x > 1 \\ 1 & \text{otherwise} \end{cases} \tag{7.67}$$
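The sender update of Equations (7.64) to (7.67) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and argument order are assumptions, and the per-acknowledgment bookkeeping (rtt and mrtt tracking) is left out.

```python
def desired_window(h_feedback, mrtt, size):
    """Equation (7.64): rate in bytes/sec times minimum RTT in sec, in packets."""
    return h_feedback * mrtt / size


def update_cwnd(cwnd, h_feedback, mrtt, size, h_congestion):
    """One congestion window update, applied when an acknowledgment arrives."""
    target = desired_window(h_feedback, mrtt, size)
    if target > cwnd and h_congestion == 1:
        # Equation (7.65): delayed increase with the smaller gain 0.1.
        return cwnd + 0.1 / cwnd * (target - cwnd)
    # Equation (7.66), followed by the projection of Equation (7.67).
    candidate = cwnd + 1.0 / cwnd * (target - cwnd)
    return candidate if candidate > 1.0 else 1.0


if __name__ == "__main__":
    # Assumed example: 1,000,000 bytes/sec requested, 80 msec mrtt, 1000-byte packets.
    print(update_cwnd(10.0, 1_000_000, 0.08, 1000, h_congestion=1))
    print(update_cwnd(10.0, 1_000_000, 0.08, 1000, h_congestion=0))
```

With the assumed numbers the desired window is 80 packets, so a window of 10 packets grows by about 0.7 packets when the congestion bit is set and by about 7 packets when it is not.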
7.5.1.3 ACP Receiver

When it receives a packet the ACP receiver generates an acknowledgment in which it copies the congestion header of the packet.

7.5.1.4 ACP Router

At each output queue of the router, the objective is to match the input data rate y to the link capacity C and at the same time maintain small queue sizes. To achieve this objective the router maintains for each link a value that represents the sending rate it desires from all users traversing the link. The desired sending rate is denoted by p and is updated every control period. The router implements a per link control timer. The desired sending rate and other statistics are updated every time the timer expires. The control period is set equal to the average round-trip time d. The average round-trip time is initialized with a value of 0.05 and is updated every control period. On packet arrival the router reads the H_rtt field in the packet header and updates the variables that are used to calculate the average round-trip time. The router calculates at each output queue the input data rate y. For each link, the router maintains a variable that denotes the number of received bytes. This variable is incremented with the packet size in bytes, every time the queue associated with the link receives a packet. When the control timer expires, the link calculates the input data rate by dividing the received number of bytes with the control period. It then resets the received number of bytes. The router also maintains at each output queue the persistent queue size q in bytes. The q is computed by taking the minimum queue seen by the arriving packets during the last propagation delay. The propagation delay is unknown at the router and is thus estimated by subtracting the local queueing delay from the average round-trip time. The local queueing delay is calculated by dividing the instantaneous queue size with the link capacity.
The above variables are used to calculate the desired rate p every control period using the following iterative algorithm:

$$p(k+1) = \Pr\left[p(k) + \frac{1}{\hat{N}(k)}\left[k_i\,(0.99\,C - y(k)) - k_q\,\frac{q(k)}{d(k)}\right]\right], \qquad p(0) = 0 \tag{7.68}$$

where ki and kq are design parameters, $\hat{N}$ represents an estimate of the number of users utilizing the link, and the projection operator is defined as follows:

$$\Pr[x] = \begin{cases} 0 & \text{if } x < 0 \\ C & \text{if } x > C \\ x & \text{otherwise} \end{cases} \tag{7.69}$$

The projection operator guarantees that the desired sending rate is nonnegative and smaller than the link capacity. Values outside this range are not feasible. The design parameters ki and kq are chosen to be 0.1587 and 0.3175, respectively. In Reference [41] we show using phase plane analysis that this choice of the design parameters guarantees that the ACP protocol is stable for
all delays. The link algorithm (7.68) is based on Equation (7.60). The basic idea is to integrate the excess capacity and to add a queue size term to guarantee that at equilibrium the queue size converges to zero. Previous work has shown that in a continuous time representation of the algorithm, in order to maintain stability, the excess capacity term must be divided with the time delay and the queue size term must be divided with the square of the propagation delay. However, when transforming the continuous time representation to the discrete time representation of Equation (7.68), we multiply both terms with the time delay and so we end up dividing only the queue term with the delay to maintain stability. Note also that we slightly underutilize the link at equilibrium by setting the virtual capacity equal to 99% of the true link capacity. We do this to reserve bandwidth resources which can be used to accommodate statistical fluctuations of the bursty network traffic. This prevents excessive instantaneous queue sizes. Previous experience in the design of link algorithms for congestion control has shown that to maintain stability we need to normalize the control parameters with the number of users utilizing the network. A novel part of this work is that we use online parameter identification techniques to derive an algorithm that estimates the unknown parameter online. The derivation is based on a fluid flow model of the network and is presented in Reference [41] together with the properties of the algorithm. Here we present a discrete-time implementation of the algorithm:

$$\hat{N}(k+1) = \Pr\left[\hat{N}(k) + \frac{\gamma\,[y(k) - \hat{N}(k)\,p(k)]\,p(k)}{1 + p^2(k)}\right], \qquad \hat{N}(0) = 10 \tag{7.70}$$

where the projection operator Pr[.] is defined as follows:

$$\Pr[x] = \begin{cases} x & \text{if } x > 1 \\ 1 & \text{otherwise} \end{cases} \tag{7.71}$$
The projection operator guarantees that the number of flows traversing the link is never allowed to be less than 1. Values less than one are obviously not feasible. γ is a design parameter that affects the convergence properties of the algorithm. We choose γ to be equal to 0.1. Note that the initial value of the estimated number of flows $\hat{N}$ is equal to 10. We choose this value to ensure a relatively conservative policy when initially updating the desired sending rate. The desired sending rate calculated at each link is used to update the H_feedback field in the packet header. On packet departure, the router compares the desired sending rate with the value stored in the H_feedback field and updates the field with the minimum value. In this way, a packet as it traverses from source to destination accumulates the minimum of the desired sending rates it encounters in its path. The last function performed by the router at each link is to notify the users traversing the link of the presence of congestion so that they can apply
a delayed increase policy. On packet departure the link checks whether the input data rate is larger than 0.95 of the link capacity. In this case it deduces that the link is congested and sets the H_congestion bit in the packet header.

7.5.2 Performance Evaluation

Our objective has been to develop a window-based protocol that does not require maintenance of per flow states within the network and satisfies all the design objectives of congestion control protocols. In this section, we demonstrate through simulations that ACP satisfies these objectives to a very good extent. We also conduct a comparative study and demonstrate how ACP fixes the performance problems encountered by XCP and RCP. We conduct our simulations on the ns-2 simulator. In our simulations we mainly consider bulk data transfers but we also evaluate the performance of the protocol in the presence of short web-like flows. More simulation results incorporating random packet losses and additional performance metrics can be found in Reference [41].

7.5.2.1 Scalability

It is important for congestion control protocols to be able to maintain their properties as network characteristics change. We thus investigate the scalability of ACP with respect to changing link bandwidths, propagation delays, and number of users utilizing the network. We conduct our study by considering the single bottleneck link network shown in Figure 7.4. In the basic setup, 50 users share the bottleneck link through access links. The bandwidth of all links in the network is set equal to 155 Mb/sec and their propagation delay is set equal to 20 msec. As mentioned above, the purpose of this study is to investigate the scalability of ACP with respect to changing bandwidths, delays, and number of users utilizing
[Figure: Source 1 to Source 50 connect to Sink 1 to Sink 50 through a shared bottleneck link; links are labeled 155 Mb/sec, 20 msec.]
FIGURE 7.4 Single bottleneck link topology used to investigate the scalability of ACP with respect to changing link capacities, delays, and number of users.
the network. When investigating the scalability of the protocol with respect to a particular parameter, we fix the other parameters to the values of the basic setup and we evaluate the performance of the protocol as we change the parameter under investigation. We consider bandwidths in the range 10 Mbits/sec to 1 Gbit/sec, delays in the range 10 msec to 1 sec, and number of users in the range 1 to 1000. The performance metrics that we use in this study are the average utilization of the bottleneck link and the queue size of the buffer at the bottleneck link. We consider two measures for the queue size: the average queue size and the equilibrium queue size. The average queue size is calculated over the entire duration of the simulation and thus contains information about the transient behavior of the system. The equilibrium queue size is calculated by averaging the queue length values recorded after the system has converged to its equilibrium state. We do not report packet drops, as in all simulations we do not observe any. In addition, we do not show fairness plots, as in all simulations the network users are assigned the same sending rate at equilibrium, which implies that max-min fairness is achieved in all cases. The dynamics of the protocol and its ability to perform well in more complex network topologies are investigated in separate studies in later sections. In our simulations, we consider persistent file transfer protocol (FTP) sources. The packet size is equal to 1000 bytes and the buffer size of all links is set equal to the bandwidth delay product. The simulation time is not constant. It varies depending on the round-trip propagation delay. We simulate for a sufficiently long time to ensure that the system has reached an equilibrium state. It is highly unlikely that in an actual network the network users will enter the network simultaneously. So, in all scenarios, the users enter the network with an average rate of one user per round-trip time.
7.5.2.1.1 Effect of Capacity

We first evaluate the performance of the ACP protocol as we change the link bandwidths. We fix the number of users to 50, we fix the propagation delays to 20 msec, and we consider link bandwidths in the range 10 Mbits/sec to 1 Gbit/sec. Plots of the bottleneck utilization and the average queue size versus the link capacity are shown in Figure 7.5. We observe that ACP scales well with increasing bandwidths. The protocol achieves high network utilization (≈ 98%) at all bandwidths. Moreover, the queue size always converges to an equilibrium value that is close to zero. The average queue size remains very small but we do observe an increasing pattern. The reason for this becomes apparent when we investigate the transient properties of the protocol. In the transient period, during which the users gradually enter the network, the queue size at the bottleneck link experiences an instantaneous overshoot, before settling down to a value that is close to zero. As the bandwidth increases the maximum value of this overshoot increases, thus causing the average queue size to increase as well. However, in all cases the queue size at equilibrium is small as required.
[Figure: (a) Utilization vs Capacity: bottleneck utilization against bottleneck capacity (Mb/sec). (b) Average Queue Size vs Capacity: average and equilibrium queue size (packets) against bottleneck capacity (Mbits/sec).]
FIGURE 7.5 ACP achieves high network utilization and experiences no drops as the capacity increases. The average queue size increases with increasing capacity due to larger instantaneous queue sizes in the transient period. However, at all capacities, the queue size at equilibrium is close to zero.
7.5.2.1.2 Effect of Delays

We then investigate the performance of ACP as we change the propagation delay of the links. Any change in the link propagation delay causes a corresponding change in the round-trip propagation delay of all source destination paths. We fix the link bandwidths to 155 Mbits/sec, we fix the number of users to 50, and we consider round-trip propagation delays in the range
[Figure: (a) Utilization vs Delay: bottleneck utilization against round-trip propagation delay (msec). (b) Average Queue Size vs Delay: average and equilibrium queue size (packets) against round-trip propagation delay (msec).]
FIGURE 7.6 ACP achieves high network utilization and experiences no drops as the round-trip propagation delay increases. The average queue size increases with increasing propagation delay due to larger instantaneous queue sizes in the transient period. However, at all delays, the queue size at equilibrium is close to zero.
10 msec to 1 sec. Plots of the bottleneck utilization and the average queue size versus the round-trip propagation delays are shown in Figure 7.6. The results are similar to the results obtained when investigating the effect of changing capacities. Figure 7.6a demonstrates that the protocol achieves high network utilization at all delays. The equilibrium queue size remains very small; however, the average queue size increases. This trend, as in the case of capacities, is due to the increasing instantaneous queue size in the transient period. As the propagation delays increase, the maximum of the overshoot
observed in the transient period increases, thus causing an increase in the average queue size. Although the average queue size increases, the queue size at equilibrium is close to zero as required.

7.5.2.1.3 Effect of the Number of Users
We finally investigate the performance of ACP as we increase the number of users utilizing the single bottleneck link network in Figure 7.4. We consider different numbers of users in the range 1 to 1000. Plots of the bottleneck utilization and the average queue size versus the number of users are shown in Figure 7.7. We observe that up to approximately 800 users the protocol satisfies the control objectives as it achieves high network utilization and small queue sizes. However, unlike the previous two cases, the equilibrium queue size is not close to zero. It behaves similarly to the average queue size. The reason for this is that as the number of users increases the queue size experiences oscillations. These oscillations dominate the overshoots observed during the transient period and so the equilibrium queue size calculated is very close to the average queue size. The oscillatory behavior at equilibrium is caused by the fact that the congestion window can only take integer values. When the fair congestion window is not an integer (which is the common case), the desired sending rate at the link is forced to oscillate about the equilibrium value, thus causing oscillations of the input data rate and the queue size. As the number of users increases, these oscillations grow in amplitude and at some point they cause a significant degradation in performance. We observe in Figure 7.7 that when the network is utilized by more than 800 users, the utilization drops to about 90% and the average queue size increases. The reason is that at such a high number of users, the fair congestion window is close to 1. Because the congestion window can only take integer values, it oscillates between 1 and 2. These oscillations of the congestion window cause both the utilization and the queue size to oscillate. This behavior causes a decrease in the observed average utilization and an increase in the observed average and equilibrium queue size.
7.5.2.2 Performance in the Presence of Short Flows

In our performance analysis so far we have only considered persistent FTP flows that generate bulk data transfers. Internet traffic, however, consists of both short and long flows. The set of flows is dominated by relatively few elephants (long flows) and a very large number of mice (short flows). Elephants, although smaller in number, account for the biggest percentage of the network traffic. Short flows account for a smaller percentage which, however, cannot be ignored. In this section, we evaluate the performance of ACP in the presence of short web-like flows. We consider the single bottleneck link network shown in Figure 7.4. The bandwidth of each link is set equal to 155 Mbits/sec and the round-trip propagation delay is equal to 80 msec. Fifty persistent FTP flows share the single
[Figure: (a) Utilization vs Number of Users: bottleneck utilization against number of users. (b) Average Queue Size vs Number of Users: queue size (packets) against number of users.]
FIGURE 7.7 ACP achieves high network utilization and experiences no packet drops as the number of users increases. At a high number of users, the utilization drops slightly and the average queue size increases. The reason is that the fair congestion window is small (close to 1). Because the congestion window can only take integer values both the utilization and queue size oscillate, thus causing a slight degradation in performance.
bottleneck link with short web-like flows. Short flows arrive according to a Poisson process. We conduct a number of tests where we change the mean of this arrival process to emulate different traffic loads. The transfer size is derived from a Pareto distribution with an average of 30 packets. The shape of this distribution is set to 1.35. In Figure 7.8 we show plots of the utilization and the average queue size at the bottleneck link versus the mean arrival rate of the short flows. We observe
[Figure: (a) Utilization vs Mice Arrival Rate: bottleneck utilization against mice arrival rate (mice/sec). (b) Average Queue Size vs Mice Arrival Rate: average and equilibrium queue size (packets) against mice arrival rate (mice/sec).]
FIGURE 7.8 ACP achieves high network utilization and maintains small queue sizes as the arrival rate of short web-like flows increases. Note that 500 users per second corresponds to a link load of 75%. In simulations, the transfer size of short flows is derived from a Pareto distribution with an average of 30 packets and a shape factor equal to 1.35.
that as we increase the arrival rate the utilization drops slightly, whereas both the average queue size and the equilibrium queue size increase. The important thing is that the queue size remains small and no packet drops are observed. It must be noted that 500 users per second corresponds to a link load of 75%. Experiments have shown that short flows account for about 20% of the traffic. In this regime, the utilization recorded at the bottleneck link is 96%, which is satisfactory.
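The short-flow workload described above (Poisson arrivals, Pareto transfer sizes with mean 30 packets and shape 1.35) can be generated with the Python standard library. The sketch below is an illustration only and is not the ns-2 configuration used in the study: the function names, the seed, and the example arrival rate are assumptions, and rounding of sizes to whole packets is omitted. The Pareto scale is derived from the standard relation mean = shape × scale / (shape − 1).

```python
import random


def pareto_scale(mean, shape):
    """Scale (minimum value) of a Pareto law with the given mean; needs shape > 1."""
    return mean * (shape - 1.0) / shape


def short_flow_workload(arrival_rate, duration, mean_size=30.0, shape=1.35, seed=0):
    """Return (arrival time in sec, transfer size in packets) pairs."""
    rng = random.Random(seed)            # fixed seed, chosen only for repeatability
    scale = pareto_scale(mean_size, shape)
    flows = []
    t = rng.expovariate(arrival_rate)    # exponential gaps give Poisson arrivals
    while t < duration:
        flows.append((t, scale * rng.paretovariate(shape)))
        t += rng.expovariate(arrival_rate)
    return flows


if __name__ == "__main__":
    flows = short_flow_workload(arrival_rate=100.0, duration=1.0)
    print(len(flows), flows[:3])
```

Because a shape of 1.35 gives a heavy tail with infinite variance, the sample mean of the generated sizes converges slowly to 30 packets, so short runs can deviate noticeably from the nominal load.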
[Figure: three links, each labeled 155 Mb/sec, 20 msec, carrying User 1, User 2, Users 3, 4, and Users 5 to 7.]
FIGURE 7.9 A three-link network used to investigate the ability of ACP to achieve max-min fairness. The first two users utilize the network throughout the simulation, users 3 and 4 start sending data at 20 sec, and users 5 to 7 start sending data at 40 sec.
7.5.2.3 Fairness
Our objective has been to develop a congestion control protocol that at equilibrium achieves max-min fairness. In this section we investigate the effectiveness of ACP in achieving max-min fairness in a scenario where the max-min fair sending rates change dynamically due to changes in the network load. We consider the three-link network shown in Figure 7.9. The bandwidth of each link is set equal to 155 Mbits/sec and the propagation delay of each link is set equal to 20 msec. Seven users utilize the network at different time intervals. At the beginning only users 1 and 2 utilize the network. The path of the first user traverses all three links, whereas the path of the second user traverses the first link only. During the time that only these two users are active, the first link is the bottleneck link of the network and the fair sending rate for the two users is 77.5 Mbits/sec. At 20 sec users 3 and 4 enter the network. Both users traverse the second link, which becomes the bottleneck link for users 1, 3, and 4. User 2 is still bottlenecked at the first link because this is the only link that it utilizes. Note that at 20 sec, user 2 increases its window to take up the slack created by user 1 sharing the bandwidth of link 2 with the other two users. At 40 sec users 5 to 7 start sending data through the third link, which now becomes the bottleneck link for users 1, 5, 6, and 7. User 2 is bottlenecked at the first link, whereas users 3 and 4 are still bottlenecked at the second link.
In Figure 7.10 we show the time responses of the congestion window of a representative number of users. These responses are compared with the theoretical max-min allocation values at each time. The actual responses are denoted by solid lines, whereas the theoretical values are denoted by dotted lines. We observe that at equilibrium the actual values match the theoretical values exactly, which implies that max-min fairness is achieved at all times.
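The theoretical max-min sending rates for the three phases of this scenario can be reproduced with the standard progressive-filling procedure. The sketch below is an offline, centralized calculation and not the ACP protocol itself; the function name `max_min_allocation` and the dictionary-based encoding of the topology are assumptions made for illustration.

```python
def max_min_allocation(capacity, routes):
    """Max-min fair rates by progressive filling.

    capacity: dict mapping link id -> capacity (Mbits/sec)
    routes:   dict mapping user id -> set of link ids on the user's path
    """
    remaining = dict(capacity)
    active = {user: set(path) for user, path in routes.items()}
    rate = {}
    while active:
        # Equal share each link could still offer to its unfrozen users.
        share = {}
        for link, cap in remaining.items():
            n = sum(1 for path in active.values() if link in path)
            if n:
                share[link] = cap / n
        bottleneck = min(share, key=share.get)
        s = share[bottleneck]
        # Freeze every user crossing the tightest link at that share.
        for user in [u for u, path in active.items() if bottleneck in path]:
            rate[user] = s
            for link in active[user]:
                remaining[link] -= s
            del active[user]
    return rate

capacity = {1: 155.0, 2: 155.0, 3: 155.0}
routes = {1: {1, 2, 3}, 2: {1}}
phase1 = max_min_allocation(capacity, routes)   # 0 to 20 sec
routes.update({3: {2}, 4: {2}})
phase2 = max_min_allocation(capacity, routes)   # 20 to 40 sec
routes.update({5: {3}, 6: {3}, 7: {3}})
phase3 = max_min_allocation(capacity, routes)   # after 40 sec
```

In the first phase both users receive 77.5 Mbits/sec. In the second phase users 1, 3, and 4 receive about 51.67 Mbits/sec each and user 2 rises to about 103.33 Mbits/sec, which is the slack referred to above. In the third phase users 1, 5, 6, and 7 receive 38.75 Mbits/sec, users 3 and 4 receive 58.125 Mbits/sec, and user 2 receives 116.25 Mbits/sec.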
One thing to notice is that during the first 20 sec, the congestion windows of users 1 and 2 are different, despite the fact that their theoretical max-min sending rates in this period are the same. There is no inconsistency between the two observations. The two users experience different round-trip propagation delays as they travel a different number of hops. Although their sending rates are identical, the different round-trip times generate different congestion
FIGURE 7.10 Time response of the congestion window (Cwnd versus Time in sec) of a representative number of users (users 1, 2, 3, and 5) compared with the theoretical max-min values. The theoretical values are denoted by dotted lines.
windows. This demonstrates the ability of ACP to achieve fairness in the presence of flows with different round-trip times and numbers of hops. Also note that the response of user 4 equals the response of user 3, and the responses of users 6 and 7 equal the response of user 5; they are thus not shown.
Another interesting observation is the overshoot in the response of user 3. This is a result of the second link becoming a bottleneck link only when users 3 and 4 enter the network. During the time that only users 1 and 2 utilize the network, the two users are bottlenecked at the first link, and so the input data rate in the second link is consistently less than the capacity. This causes the algorithm that updates the desired sending rate at the link to consistently increase the desired sending rate. Basically, the link asks for more data, the users do not comply because they are bottlenecked elsewhere, and the link reacts by asking for even more data. The desired sending rate, however, does not increase indefinitely. A projection operator in the link algorithm causes the desired sending rate at the second link to converge to the link capacity. When users 3 and 4 enter the network the second link becomes their bottleneck link. Their sending rate thus becomes equal to the desired sending rate computed at the link. Because the desired sending rate is originally equal to the link capacity, the congestion windows of the two users experience an overshoot before settling down to their equilibrium value. This can be observed in Figure 7.10. Despite this overshoot the system does not experience any packet drops. The above setting can be used to emulate the case where network users cannot comply with the network's request because they do not have enough data to send. The above shows the ability of ACP to also cope with this case.
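The saturating effect of the projection operator can be illustrated with a generic projected integrator. This is only a sketch of the mechanism described above and not ACP's actual link algorithm: the update law, the gain, the number of steps, and the initial value are all assumptions chosen for illustration, and the names `project` and `desired_rate_trajectory` are hypothetical.

```python
def project(value, lower, upper):
    """Clamp value to the interval [lower, upper]."""
    return max(lower, min(upper, value))

def desired_rate_trajectory(capacity, input_rate, gain, steps, p0):
    """Integrate the spare capacity and project the result onto [0, capacity].

    While the input rate stays below capacity (users bottlenecked elsewhere),
    the desired rate keeps growing until the projection pins it at capacity.
    """
    p = p0
    history = [p]
    for _ in range(steps):
        p = project(p + gain * (capacity - input_rate), 0.0, capacity)
        history.append(p)
    return history

# Link 2 during the first 20 sec: only user 1 (77.5 Mbits/sec) crosses it.
history = desired_rate_trajectory(
    capacity=155.0, input_rate=77.5, gain=0.05, steps=200, p0=77.5
)
```

Under these assumptions the desired rate climbs steadily and then stays at 155 Mbits/sec, the link capacity. A user that newly becomes bottlenecked at this link would therefore first be offered the full capacity, which is the source of the overshoot discussed above.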
7.5.2.4 Dynamics of ACP
To fully characterize the performance of the proposed protocol, apart from the properties of the system at equilibrium, we need to investigate its transient properties. The protocol must generate smooth responses that are well damped and converge fast to the desired equilibrium state. To evaluate the transient behavior of ACP, we consider the single bottleneck link network shown in Figure 7.4 and we generate a dynamic environment where users enter and leave the network at different times. In such an environment, we investigate the dynamics of the user sending rates, we examine the queuing dynamics at the bottleneck link, and we also evaluate the performance of the estimator that is used to track the number of users utilizing the network.
To conduct our study we consider the following scenario. Thirty users originally utilize the single bottleneck link network shown in Figure 7.4. At 30 sec, 20 of these users stop sending data simultaneously, so the number of users utilizing the network is reduced to 10. At 45 sec, however, 40 additional users enter the network, thus causing the number of users to increase to 50. In Figure 7.11 we present the time responses of the congestion window of a representative number of users. User 1 utilizes the network throughout the simulation, user 30 stops sending data at 30 sec, and user 40 enters the network at 45 sec. The transient behavior of the other users is very similar to that shown in Figure 7.11. We observe that the protocol achieves smooth responses which converge fast to the desired equilibrium with no oscillations. However, in some cases, they experience overshoots. When user 1 starts sending data it converges fast to its max-min fair allocation. Because the users gradually enter the network, the max-min allocation gradually decreases. This is why
FIGURE 7.11 Time response of the congestion window (Cwnd versus Time in sec) of three users (cwnd 1, cwnd 30, cwnd 40). User 1 utilizes the network throughout the simulation, user 30 stops sending data at 30 sec, and user 40 enters the network at 45 sec. We observe smooth and fast responses with no oscillations.
the congestion window of user 1 experiences a large overshoot before settling down to its equilibrium value. Note, however, that once the desired sending rate calculated at the bottleneck link has settled down to an equilibrium value, a new user, such as user 30, converges fast to the max-min allocation value with no overshoots. When the 20 users suddenly stop sending data at 30 seconds the flow of data through the bottleneck link drops, thus causing an instantaneous underutilization of the link. The link identifies this drop in the input data rate and reacts by increasing its desired sending rate. This causes user 1 to increase its congestion window. The time response in Figure 7.11 indicates fast convergence to the new equilibrium value with no oscillations. However, the response does experience a small overshoot before settling down to its equilibrium value. This slight overshoot is caused by the feedback delays and the pure integral action of the congestion controller. It can be avoided by introducing proportional action. However, such a modification would increase the complexity of the algorithm without significantly improving the performance and is thus avoided. When 40 new users enter the network at 45 sec, the max-min fair sending rate decreases. The controller at the bottleneck link iteratively calculates this rate and communicates this information to the end users. This causes user 1 to decrease its congestion window and user 40 which has just entered the network to gradually increase its congestion window to the equilibrium value. We observe from Figure 7.11 that user 1 converges fast to the new equilibrium value with no undershoots or oscillations. We also observe that the time response of the congestion window of user 40 experiences a small overshoot before settling down to its equilibrium value. 
This is due to the fact that the user sets its sending rate equal to the desired sending rate calculated at the bottleneck link while the latter is still decreasing. The next thing we investigate is the transient behavior of the utilization and the queue size at the bottleneck link. In Figure 7.12 we show the time responses of the utilization and the queue size at the bottleneck link. We observe that the link utilization converges fast to a value which is close to 1. When the 20 users leave the network, the flow of data suddenly decreases, thus causing an instantaneous decrease in the utilization. However, the system reacts quickly by increasing the sending rate of the remaining users, thus achieving almost full utilization in a very short period of time. The time response of the queue size indicates that the latter converges to a value that is close to 0. This is what is required by the congestion control protocol in order to avoid excessive queueing delays in the long run. However, in the transient periods during which new users enter or leave the network, the queue size experiences an instantaneous increase. It might seem strange that we observe increasing queue sizes when users leave the network. This is caused by the fact that the remaining users, while they increase their sending rate to take up the slack created, experience overshoots. It must be noted that the maximum queue size recorded in the transient period increases as the bandwidth delay product increases. This is why in our study of the scalability properties of ACP, the average queue size increases as we increase the
FIGURE 7.12 (a) Utilization vs Time; (b) Queue Size vs Time. Time response of the instantaneous utilization and the queue size at the bottleneck link. Utilization converges fast to a value that is close to 1. There is an instantaneous drop when the 20 users leave the network but the protocol manages to recover quickly. The queue size experiences instantaneous increases when new users enter the network but at equilibrium the queue size is almost zero.
bandwidths and the delays. However, careful choice of the control parameters at the links and the delayed increase policy that we apply at the sources ensure that these overshoots do not exceed the buffer size and thus do not lead to packet drops. A distinct feature of the proposed congestion control strategy is the implementation at each link of an estimation algorithm that estimates the number of flows utilizing the link. These estimates are required to maintain stability in the presence of delays. Here, we evaluate the performance of the proposed
FIGURE 7.13 Time response of the estimated number of users utilizing the bottleneck link (Estimated Number of Flows versus Time in sec). We observe almost perfect tracking at equilibrium and fast responses with no overshoots.
estimation algorithm. In the scenario that we have described in the previous subsection, the number of users utilizing the single bottleneck link network changes from 30 to 10 at 30 sec and it becomes 50 at 45 sec. So, we evaluate the performance of the proposed estimation algorithm by investigating how well the estimator tracks these changes. In Figure 7.13 we show the time response of the output of the estimator. We observe that the estimator generates smooth responses with no overshoots or oscillations. In addition, the estimator tracks the changes in the number of users and produces correct estimates at equilibrium.
7.5.2.5 A Multilink Example
Until now we have evaluated the performance of ACP in simple network topologies that include one, two, or three links. Our objective in this section is to investigate how ACP performs in a more complex network topology. We consider the parking lot topology shown in Figure 7.14.
FIGURE 7.14 A parking lot network topology. Labels recovered from the figure: routers numbered 1 to 9 connected in series; links 1 to 3 at 155 Mb/sec, 15 msec; link 4 at 80 Mb/sec, 15 msec; groups of 20 users.
The network consists of eight links connected in series. All links have a bandwidth of 155 Mbits/sec except link 4, which has a bandwidth of 80 Mbits/sec. The propagation delay of all links is set equal to 15 msec. Twenty users utilize the network by traversing all eight links. Moreover, each link in the network is utilized by an additional 20 users which have single hop paths as shown in Figure 7.14. In this way, all links in the network are bottleneck links and link 4 is the single bottleneck link for the 20 users that traverse the whole network. We evaluate the performance of ACP by examining the utilization and the average queue size observed at each link. We do not report packet drops, as we do not observe any. In Figure 7.15, we show on separate
FIGURE 7.15 (a) Utilization at Each Link; (b) Queue at Each Link. ACP achieves high utilization at all links and experiences no packet drops. In addition it manages to maintain small queue sizes.
graphs the utilization achieved at each link and the average and equilibrium queue sizes recorded at the link. Because all links in the network are bottleneck links for some flows, we do expect them to be fully utilized. Indeed, we observe that ACP achieves almost full utilization at all links. In addition, both the equilibrium queue size and the average queue size remain small. At link 4 we observe a smaller average queue size. This is due to its smaller bandwidth-delay product and is consistent with our observations in previous sections.
7.5.2.6 Comparison with XCP
Our objective has been to develop a congestion control protocol that does not require maintenance of per-flow states within the network and satisfies all the design objectives. An explicit congestion control protocol (XCP), which has been recently developed in Reference [40], satisfies most of the design objectives but fails to achieve max-min fairness in the case of multiple congested links. It has been shown through analysis and simulations that when the majority of flows at a particular link are bottlenecked elsewhere, the remaining flows do not make efficient use of the residual bandwidth [43]. In this section, we consider a topology where the above problem is evident and we demonstrate that ACP fixes this problem and achieves max-min fairness. We consider the two-link network shown in Figure 7.16. Link 1 has a bandwidth of 155 Mbits/sec whereas link 2 has a bandwidth of 80 Mbits/sec. Eighty users access the network through 155-Mbits/sec access links. The access links of the first 60 users have a propagation delay of 15 msec, the access links of the next 10 users have a propagation delay of 100 msec, and the propagation delay of the access links of the last 10 users is set to 2 msec. We have chosen a variety of
FIGURE 7.16 A two-link network used to investigate the ability of ACP to achieve max-min fairness at equilibrium. We consider a simulation scenario that involves users with heterogeneous round-trip times. Labels recovered from the figure: sources 1 to 60 on 155 Mb/sec, 15 msec access links; sources 61 to 70 on 155 Mb/sec, 100 msec access links; sources 71 to 80 on 155 Mb/sec, 2 msec access links; link 1 (Router 1 to Router 2) at 155 Mb/sec, 15 msec; link 2 (Router 2 to Router 3) at 80 Mb/sec, 15 msec; sinks 1 to 10 and sinks 11 to 80.
TABLE 7.1
Theoretical Max-Min Fair Values, Compared with the Equilibrium Values Achieved by ACP and XCP
Users    Round-Trip Time (msec)    Max-Min Congestion Window    ACP Congestion Window    XCP Congestion Window
1–10     60                        56                           56                       40
11–60    90                        13                           13                       13
61–70    260                       37                           37                       37
71–80    62                        9                            9                        9
propagation delays to investigate the ability of ACP to achieve fairness in the presence of flows with multiple round-trip times. The first 10 users of the network have connection sinks at the first router and the rest of the users have connection sinks at the second router. This has been done to ensure that both links are bottleneck links for some flows. The first 10 users are bottlenecked at link 1, whereas the remaining users are bottlenecked at link 2. We simulate the above scenario using both XCP and ACP users. In Table 7.1 we compare the theoretical max-min congestion window values with the equilibrium values achieved by ACP and XCP. We observe that ACP matches the theoretical values exactly, whereas XCP does not. XCP fails to assign max-min sending rates to the first 10 users, which utilize link 1 only. This is consistent with the findings in Reference [43]. The other users traversing link 1 are bottlenecked at link 2 and so the 10 users that are bottlenecked at link 1 do not make efficient use of the available bandwidth. This inefficiency causes underutilization of link 1. This is demonstrated in Figure 7.17 where
FIGURE 7.17 Time response of the utilization at the first link achieved by ACP and XCP (Utilization versus Time in sec). We observe that ACP achieves higher utilization.
we plot the time response of the utilization achieved at link 1 by the ACP and the XCP users. Clearly, XCP causes underutilization of the link, whereas ACP achieves almost full utilization of the link at equilibrium. This example demonstrates that ACP outperforms XCP in both utilization and fairness. Another thing to note in Table 7.1 is the ability of ACP to achieve max-min fairness despite the presence of flows with a variety of round-trip times.
7.5.2.7 Comparison with RCP
RCP and ACP were developed independently based on similar design principles. However, the design objectives of the two protocols are different. The main objective of RCP is to minimize the duration of the network flows, whereas the main objective of ACP is to optimize network-centric performance metrics such as fairness, utilization, queue sizes, and packet drops. Although RCP and ACP were motivated by the same design ideas, they implement different algorithms at both the sources and the links.
At each source RCP applies a rather aggressive increase policy where it immediately adopts the desired sending rate received from the network as the current sending rate of the source. This is done to ensure that flows with small file sizes finish their sessions quickly. However, such an aggressive increase policy, especially for new users, must be accompanied by an aggressive decrease policy in the case of congestion, to avoid packet losses. As we will see later, such an aggressive decrease policy can cause RCP to underutilize the network for a significant time period. ACP, on the other hand, applies a more conservative policy both when increasing and when decreasing the source sending rate. This conservative policy ensures no packet losses and high network utilization. However, it does take several round-trip times for each source to converge to its max-min fair sending rate, and this can lengthen the duration of flows with small file sizes.
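As a cross-check on the XCP comparison, the max-min column of Table 7.1 can be reproduced from the topology of Figure 7.16 using the relation window = rate × round-trip time. The sketch below is an illustrative calculation only. The packet size of 1000 bytes is an assumption, chosen because it reproduces the tabulated windows; the round-trip times are taken directly from Table 7.1; and the name `window_packets` is hypothetical.

```python
PACKET_BITS = 8 * 1000  # assumed packet size of 1000 bytes

def window_packets(rate_bps, rtt_sec, packet_bits=PACKET_BITS):
    """Congestion window (packets) that sustains rate_bps over rtt_sec."""
    return rate_bps * rtt_sec / packet_bits

link1_bps, link2_bps = 155e6, 80e6

# Users 11-80 (70 flows) cross both links and are bottlenecked at link 2.
rate_link2_users = link2_bps / 70
# Users 1-10 share whatever those 70 flows leave unused on link 1.
rate_link1_users = (link1_bps - 70 * rate_link2_users) / 10

rtt_msec = {"1-10": 60, "11-60": 90, "61-70": 260, "71-80": 62}
max_min_window = {}
for group, rtt in rtt_msec.items():
    rate = rate_link1_users if group == "1-10" else rate_link2_users
    max_min_window[group] = round(window_packets(rate, rtt / 1000.0))

# Link 1 utilization implied by the XCP window of 40 packets for users 1-10.
xcp_rate_link1_users = 40 * PACKET_BITS / 0.060
xcp_link1_utilization = (70 * rate_link2_users + 10 * xcp_rate_link1_users) / link1_bps
```

Under these assumptions the rounded windows are 56, 13, 37, and 9 packets, matching the max-min and ACP columns. The XCP window of 40 packets for users 1 to 10 corresponds to a link 1 utilization of roughly 86%, which is in line with the underutilization reported for XCP.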
RCP and ACP also have fundamental differences in the implementation of the algorithm that updates the desired sending rate. RCP implements a nonlinear congestion controller, whereas ACP implements a certainty equivalent controller. The properties of the RCP controller have been established by linearizing the nonlinear equations in a small neighborhood about the stable equilibrium point. However, the linear model is a poor approximation of the nonlinear model in some regions of the state space. These model inaccuracies can cause the RCP algorithm to deviate significantly from the predicted behavior and perform poorly in some scenarios. Specifically, when the desired sending rate experiences a large undershoot, the controller is very slow in recovering, thus causing underutilization of the network for large time intervals. ACP, on the other hand, implements a certainty equivalent controller at each link. The controller is designed assuming that the number of users utilizing the link is known. In practice the latter is an unknown time-varying parameter. We utilize online parameter identification techniques to estimate this parameter. We then replace the parameter, which was assumed known in the design, with its estimate in the control algorithm to yield the certainty equivalent controller.
RCP performs poorly when the network experiences sudden changes in the traffic load. Such sudden changes can cause RCP to underutilize the network for significant time periods. In this section we demonstrate this behavior of RCP and we show that ACP continues to perform well in the scenarios where RCP performs poorly. We consider the single bottleneck link network of Figure 7.4. The bandwidth of each link is set to 155 Mbits/sec and the round-trip propagation delay is set equal to 80 msec. The network is initially utilized by only one user. At 15 sec a second user enters the network. This represents a 100% increase in the traffic load at the bottleneck link. We simulate both ACP and RCP networks. In Figure 7.18 we show the time responses of the congestion window of users 1 and 2 for ACP and RCP. We observe that ACP generates smooth responses which gradually converge to their equilibrium values. The sending rate of user 1 converges to the bandwidth of the bottleneck link and then
FIGURE 7.18 (a) ACP; (b) RCP. Time responses of the congestion window (Cwnd in packets versus Time in seconds) of the network users (cwnd 1, cwnd 2) for ACP and RCP. Observe that the second RCP user converges very slowly to its equilibrium value.
FIGURE 7.19 Time response of the utilization achieved at the bottleneck link by ACP and RCP (Bottleneck Utilization versus Time in seconds). We observe that RCP underutilizes the network for a significant amount of time when the second user enters the network, whereas ACP achieves almost full utilization in that period.
gradually decreases to half of this value when user 2 enters the network. It takes several round-trip times for the congestion windows to converge to their equilibrium values. RCP, on the other hand, adopts a more aggressive response policy. Note how quickly user 1 originally converges to its equilibrium value. However, when user 2 enters the network its sending rate is set equal to the sending rate of user 1. This causes excessive queue sizes at the bottleneck link. The aggressive decrease policy that RCP adopts then causes the desired sending rate calculated at the link to decrease to the minimum value allowed by the control algorithm, which is one packet. When this happens, the desired sending rate does not recover quickly. It remains close to one for approximately 5 sec and converges to the equilibrium value in 15 sec. This slow response is a result of the nonlinear control algorithm RCP utilizes to calculate the desired sending rate. The nonlinearity causes slow responses when the desired rate experiences large undershoots. This problem is exacerbated as we increase the link bandwidths. This behavior of RCP can cause underutilization of the network for significant time periods. In Figure 7.19 we show the time responses of the utilization of the bottleneck link achieved by ACP and RCP. We observe that RCP underutilizes the network for a significant amount of time when the second user enters the network, whereas ACP achieves almost full utilization in that period.
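The equilibrium windows in this scenario follow from the bandwidth-delay product of the bottleneck link. The short calculation below is a sketch under two assumptions that the text does not state: a packet size of 1000 bytes and a negligible queueing delay at equilibrium. The name `bdp_packets` is hypothetical.

```python
def bdp_packets(bandwidth_bps, rtt_sec, packet_bytes=1000):
    """Bandwidth-delay product expressed in packets (packet size assumed)."""
    return bandwidth_bps * rtt_sec / (8 * packet_bytes)

one_user = bdp_packets(155e6, 0.080)        # single user fills the link
two_users = bdp_packets(155e6 / 2, 0.080)   # link shared equally by two users
undershoot_factor = two_users / 1.0         # relative to a one-packet window
```

Under these assumptions a single user needs a window of about 1550 packets, which is consistent with the 0 to 1600 packet axis of Figure 7.18, and each of two users needs about 775 packets. A desired rate that has collapsed to one packet is therefore roughly a factor of 775 below its equilibrium value, which gives a sense of the size of the undershoot from which RCP must recover.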
7.6 Conclusions
This chapter provides a survey of recent theoretical and practical developments in the design of Internet congestion control protocols for networks of
arbitrary topology. We present a theoretical framework that has been used extensively in the last few years to design congestion control protocols with verifiable properties. In this framework the congestion control problem is viewed as a resource allocation problem, which is transformed through a suitable representation into a nonlinear programming problem. The relevant cost functions serve as Lyapunov functions for the derived algorithms, thus demonstrating how local dynamics are coupled to achieve a global objective. Many of the derived algorithms have been shown to have globally stable equilibrium points in the presence of delays. However, for max-min congestion controllers the problem of asymptotic stability in the presence of delays still remains open. So, the performance of these algorithms in networks of arbitrary topology has been demonstrated through simulations and practical implementation. This approach has failed to produce protocols that satisfy all the design requirements. In this chapter we present a new adaptive congestion control protocol, which is shown through simulations to outperform previous proposals and work effectively in a number of scenarios. Even though we have not yet established its global convergence analytically, simulation results are encouraging. Future work will consider the analytical establishment of global stability for arbitrary topologies in the presence of delays for this class of problems.
References
1. W. Willinger and J. Doyle. Robustness and the Internet: Design and evolution, 2002. http://netlab.caltech.edu/internet/.
2. V. Jacobson. Congestion avoidance and control. In Symposium Proceedings on Communications Architectures and Protocols, pp. 314–329. ACM Press, New York, August 1988.
3. V. Jacobson, R. Braden, and D. Borman. TCP extensions for high performance. RFC 1323, May 1992.
4. W. Stevens. TCP slow start, congestion avoidance, fast retransmit, and fast recovery algorithms. RFC 2001, January 1997.
5. M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow. TCP selective acknowledgement options. RFC 2018, October 1996.
6. M. Allman, V. Paxson, and W. Stevens. TCP congestion control. RFC 2581, April 1999.
7. S. Floyd and T. Henderson. The NewReno modification to TCP's fast recovery algorithm. RFC 2582, April 1999.
8. Y. Li, D. Leith, and R. N. Shorten. Experimental evaluation of TCP protocols for high-speed networks. Technical Report HI, Hamilton Institute, Maynooth, Ireland, June 2005.
9. S. H. Low, F. Paganini, J. Wang, S. Adlakha, and J. C. Doyle. Dynamics of TCP/RED and a scalable control. In Proc. IEEE INFOCOM, volume 1, pages 23–27, June 2002.
10. T. V. Lakshman and U. Madhow. The performance of TCP/IP for networks with high bandwidth-delay products and random loss. IEEE/ACM Transactions on Networking, 5(3):336–350, June 1997.
11. R. Caceres and L. Iftode. Improving the performance of reliable transport protocols in mobile computing environments. IEEE Journal on Selected Areas in Communications, 13(5):850–857, June 1995.
12. S. Floyd. High speed TCP for large congestion windows. RFC 3649, December 2003.
13. T. Kelly. Improving performance in highspeed wide area networks. Computer Communication Review, 32(2):83–91, April 2003.
14. H. Hayden. Voice flow control in integrated packet networks. Technical Report LIDS-TH-1152, MIT Laboratory for Information and Decision Systems, Cambridge, MA, 1981.
15. F. Kelly, A. Maulloo, and D. Tan. Rate control in communication networks: Shadow prices, proportional fairness and stability. Journal of the Operational Research Society, 49:237–252, 1998.
16. S. H. Low and D. E. Lapsley. Optimization flow control. I: Basic algorithm and convergence. IEEE/ACM Transactions on Networking, 7(6):861–874, December 1999.
17. S. Liu, T. Basar, and R. Srikant. Controlling the Internet: A survey and some new results. In Proc. IEEE Conf. Decision and Control, volume 3, pages 3048–3057, December 2003.
18. F. Paganini, Z. Wang, J. Doyle, and S. H. Low. Congestion control for high performance, stability and fairness in general networks. IEEE/ACM Transactions on Networking, 13(1):43–56, February 2005.
19. R. Johari and D. Tan. End-to-end congestion control for the Internet: Delays and stability. IEEE/ACM Transactions on Networking, 9(6):818–832, December 2001.
20. G. Vinnicombe. On the stability of end-to-end congestion control for the Internet. Technical Report CUED/F-INFENG/TR.398, Cambridge University Engineering Department, Cambridge, UK, December 2000.
21. L. Ying, G. E. Dullerud, and R. Srikant. Global stability of Internet congestion controllers with heterogeneous delays. In Proceedings of the American Control Conference, volume 4, pages 2948–2953, June 2004.
22. A. Papachristodoulou. Global asymptotic stability of a TCP/AQM protocol for arbitrary networks with delay. In Proc. IEEE Conference on Decision and Control, volume 1, pages 1029–1034, December 2004.
23. S. H. Low, S. Hegde, D. X. Wei, and C. Jin. FAST TCP: Motivation, architecture, algorithms, performance. IEEE/ACM Transactions on Networking, December 2006.
24. S. Kunniyur and R. Srikant. Analysis and design of an adaptive virtual queue algorithm for active queue management. IEEE/ACM Transactions on Networking, 12(2):286–299, April 2004.
25. F. Bonomi and K. W. Fendick. The rate-based flow control framework for the available bit rate ATM service. IEEE Network, 9(2):25–39, March/April 1995.
26. N. Dukkipati, M. Kobayashi, R. Zhang-Shen, and N. McKeown. Processor sharing flows in the Internet. In Proceedings of the Thirteenth International Workshop on Quality of Service 2005, June 2005.
27. A. Pitsillides, P. Ioannou, M. Lestas, and L. Rossides. Adaptive nonlinear congestion controller for a differentiated-services framework. IEEE/ACM Transactions on Networking, 13(1):94–107, February 2005.
28. C. Chrysostomou, A. Pitsillides, L. Rossides, M. Polycarpou, and A. Sekercioglu. Congestion control in differentiated services networks using fuzzy-red. IFAC Control Engineering Practice (CEP) Journal, 11(10):1153–1173, September 2003.
29. D. P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Athena Scientific, Nashua, NH, 1982.
30. D. P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York, 1982.
31. F. Paganini. On the stability of optimization-based flow control. In Proceedings of the American Control Conference, volume 6, pages 4689–4694, June 2001.
32. F. Kelly. Charging and rate control for elastic traffic. European Transactions on Telecommunications, 8:33–37, January 1997.
33. L. Massoulie. Stability of distributed congestion control with heterogeneous feedback delays. IEEE Transactions on Automatic Control, 47(6):895–902, June 2002.
34. L. Benmohamed and S. M. Meerkov. Feedback control of congestion in packet switching networks: The case of multiple congested nodes. International Journal of Communication Systems, 10(5):227–246, Sept-Oct 1997.
35. M. Lestas, P. Ioannou, and A. Pitsillides. A congestion control algorithm for max-min resource allocation and bounded queue sizes. Proceedings of the IEEE American Control Conference, volume 2, pages 1683–1688, June-July 2004.
36. B. Wydrowski and M. Zukerman. Maxnet: A congestion control architecture. IEEE Communications Letters, 6(11):512–514, November 2002.
37. M. Lestas, P. Ioannou, and A. Pitsillides. On a hybrid model for max-min congestion controllers. In Proc. IEEE Conference on Decision and Control, volume 1, pages 543–548, December 2004.
38. P. A. Santosh and K. Anurag. Stochastic approximation approach for max-min fair adaptive rate control of abr sessions with mcrs. In Proceedings of the IEEE INFOCOM, volume 3, pages 1358–1365, April 1998.
39. M. Lestas, P. Ioannou, and A. Pitsillides. Global asymptotic stability of a max-min congestion control scheme. In Proceedings of the Workshop on Modeling and Control of Complex Systems, June 2005.
40. D. Katabi, M. Handley, and C. Rohrs. Internet congestion control for high bandwidth-delay products. In Proc. ACM SIGCOMM, August 2002.
41. M. Lestas, P. Ioannou, A. Pitsillides, and G. Hadjipollas. Adaptive Congestion Protocol: A new congestion control protocol with learning capability. Computer Networks, June 2006. (Submitted for publication.)
42. L. Kalampoukas, A. Varma, and K. K. Ramakrishnan. Explicit window adaptation: A method to enhance TCP performance. In Proceedings of the IEEE INFOCOM, pages 242–251, 1998.
43. S. H. Low, L. L. H. Andrew, and B. P. Wydrowski. Understanding XCP: Equilibrium and fairness. In Proceedings of the IEEE INFOCOM, volume 2, pages 1025–1036, March 2005.
44. C. Fulton, S. Li, and C. S. Lim. An ABR feedback control scheme with tracking. In Proceedings of the IEEE INFOCOM’97, volume 2, pages 805–814, April 1997.
45. M. K. Wong and F. Bonomi. A novel explicit rate congestion control algorithm. In Proceedings of the IEEE GLOBECOM’98, volume 4, pages 2432–2439, November 1998.
46. Y. Zhang, D. Leonard, and D. Loguinov. Jetmax: Scalable max-min congestion control for high-speed heterogeneous networks. In Proceedings of the IEEE INFOCOM, April 2006.
8 Persistent Autonomous Formations and Cohesive Motion Control
Barış Fidan, Brian D. O. Anderson, Changbin Yu, and Julien M. Hendrickx
CONTENTS
8.1 Introduction
8.2 Rigid and Persistent Formations
8.2.1 Rigid Formations
8.2.2 Constraint-Consistent and Persistent Formations
8.3 Acquisition and Maintenance of Persistence
8.3.1 Acquiring Persistence
8.3.2 Persistent Formation Reconfiguration Operations
8.3.3 Maintaining Persistence During Formation Reconfigurations
8.4 Cohesive Motion of Persistent Formations
8.4.1 Problem Definition
8.4.2 Acyclically Led and Cyclically Led Formations
8.5 Decentralized Control of Cohesive Motion
8.5.1 Control Design
8.5.1.1 Control Law for Zero-DOF Agents
8.5.1.2 Control Law for One-DOF Agents
8.5.1.3 Control Law for Two-DOF Agents
8.5.2 Stability and Convergence
8.5.2.1 Acyclically Led Minimally Persistent Formations
8.5.2.2 Cyclically Led Minimally Persistent Formations
8.5.3 More Complex Agent Models
8.6 Discussions and Future Directions
Acknowledgment
References
8.1 Introduction
Recently, the topic of distributed motion control of autonomous multiagent systems has gained significant attention, in parallel with the interest in real-life applications of such systems involving teams of unmanned aerial and ground vehicles, combat and surveillance robots, underwater vehicles, and so on [1–4,6,12,20,22–24,26]. This topic presents numerous aspects to be explored, corresponding to different control tasks of interest, control approaches to be followed, assumed agent dynamics and interagent information structures, and so on. In this chapter, using a recently developed theoretical framework of graph rigidity and persistence, we analyze a general class of autonomous multiagent systems moving in formation, namely persistent formations, where the formation shape is maintained during any continuous motion via a set of constraints on each agent to keep its distances from a prespecified group of other neighboring agents constant. As the title indicates, the chapter focuses on two complementary issues about autonomous formations: persistence (which will be explained further below) in Sections 8.2 and 8.3, and cohesive, that is, shape-preserving, motion in Sections 8.4 and 8.5.

Before listing the contents and contributions of the chapter and linking these two topics, we give an intuitive introduction to the fundamental terms to be used throughout the chapter. The formal definitions of these terms (where needed) will be given later in the chapter. We use the term formation for a collection of agents moving in real two- or three-dimensional space to fulfill certain mission requirements. Leaving the agent dynamics issues to future studies in the field and focusing on the motion of the entire formation rather than individual agent behaviors,$^{1}$ we assume a point-agent system model [14,31].
$^{1}$ It is worth noting here that agent dynamics and dynamic interactions are major issues in real-world multivehicle formation control; some further discussions on these issues can be found in Reference [25] and the references therein.

We represent each multiagent formation $F$ by a graph $G_F = (V_F, E_F)$ with a vertex set $V_F$ and an edge set $E_F$, where each vertex $i \in V_F$ corresponds to an agent $A_i$ in $F$ and each edge $(i, j) \in E_F$ corresponds to an information link between a pair $(A_i, A_j)$ of agents. $G_F$ is also called the underlying graph of the formation $F$. Here, $G_F$ for a particular $F$ can be directed or undirected depending on the properties of the information links of $F$, as will be discussed below. A formation $F$ with an underlying graph $G_F = (V_F, E_F)$ is called rigid if by explicitly maintaining distances between all the pairs of agents which are connected by an information link, that is, whose representative vertices are connected by an edge in $E_F$, the distances between all other pairs of agents in $F$ are consequentially held fixed as well, and hence $F$ can move as a cohesive whole. Typically the agent pairs in $F$ whose interdistances are explicitly maintained are the ones having information (i.e., sensing and communication)
links in between, corresponding to the edges in the underlying graph $G_F$. Hence, in (a geometric representation of) the underlying graph $G_F$, explicit maintenance of the distance between an agent pair $(A_i, A_j)$ with an information link between the two corresponds to keeping the length of the edge $(i, j) \in E_F$ constant. There are two types of control structures that can be used to maintain the required distance between pairs of agents in a formation: symmetric control and asymmetric control. In the symmetric case, to keep the distance between, for example, agent $A_i$ and agent $A_j$ at a desired value $d_{ij}$, there is a joint effort of both agent $A_i$ and agent $A_j$ to simultaneously and actively maintain their relative positions. The associated undirected underlying graph will have an undirected edge $(i, j)$ between vertices $i$ and $j$. If enough agent pairs explicitly maintain distances, all remaining interagent distances will be consequently maintained and the formation will be rigid. In the asymmetric case, which is the assumed control structure in this chapter, only one of the agents in each pair, for example, agent $A_i$, actively maintains its distance to agent $A_j$ at the desired value $d_{ij}$. This means that only agent $A_i$ has to receive the position information broadcast by agent $A_j$, or sense the position of agent $A_j$, and it can make decisions on its own. Therefore, in the asymmetric case, both the overall control complexity and the communication complexity, in terms of messages sent or information sensed for the formation, are expected to be reduced by half. This is modeled in the associated (directed) underlying graph $G_F = (V_F, E_F)$ by a directed edge $\overrightarrow{(i, j)} \in E_F$ from vertex $i$ to vertex $j$. In this case, we also say that $A_i$ has the constraint of staying at a distance $d_{ij}$ from $A_j$, or that $A_i$ follows $A_j$, or that $A_i$ is a follower of $A_j$.
For a formation $F$ with asymmetric control structure, if each agent in $F$ is able to satisfy all the constraints on it provided that all other agents within $F$ are trying to satisfy their constraints (i.e., satisfy as many of their constraints as possible), then $F$ is called constraint consistent (examples of both a constraint-consistent formation and a formation lacking constraint consistence will be presented subsequently). A formation that is both rigid and constraint consistent is called persistent [31]. In a persistent formation, provided that all the agents are trying to satisfy the distance constraints on them, they can in fact satisfy these constraints and, consequently, the global structure of the formation is preserved, that is, when the formation moves, it necessarily moves as a cohesive whole.$^{2}$ For a given persistent formation $F$, if removal of any single edge (in the underlying graph) makes $F$ nonpersistent, then $F$ is further called minimally persistent, that is, a minimally persistent formation provably preserves its persistence with a minimal number of edges.

$^{2}$ There exists an exceptional small class of formations in $\mathbb{R}^3$ for which the intuitive explanation here and the formal definition of persistence given in Section 8.2 do not match. This special class is further discussed in Section 8.2.2.

Persistence appears to be the crucial property of an information/control architecture of a formation that ensures that the formation can move
cohesively. Minimal persistence defines those situations where loss of a link means cohesiveness of the motion is no longer assured; from an operational point of view, nonminimal persistence may be desirable to secure redundancy [8]. In Section 8.2, we review the general characteristics of rigid and persistent formations using a recently established framework of rigid and persistent graphs. We present some operational criteria to check the persistence of a given formation. Based on these characteristics and criteria, in Section 8.3, we focus on the acquisition and maintenance of the persistence of certain types of autonomous formations. We particularly consider systematic construction of provably persistent two-dimensional formations by assigning directions to their information links. We briefly review some common operations on persistent formations, including addition of new agents to the formation, closing ranks when an agent is lost, merging two or more formations, splitting a formation into smaller formations, and we provide strategies for maintaining persistence during these operations. Finally, in Sections 8.4 and 8.5, we focus on cohesive motion control of persistent autonomous formations. We present a set of distributed control schemes to move a given two-dimensional persistent formation with specified initial position and orientation to arbitrary desired final position and orientation without deforming the shape of the formation during the motion. The control design procedure is presented assuming a velocity integrator agent model that is widely considered in the literature [1,12,26]; nevertheless, generalization of these designs for other kinematic models is discussed briefly as well. The chapter concludes with some mention of relevant future research directions.
8.2 Rigid and Persistent Formations
In this section, we give formal definitions of the rigidity and persistence notions and present a brief review of the fundamental characteristics of rigid and persistent formations to the extent needed for the analysis in the following sections. For details the reader may refer to References [6,14,28,29,31]. We focus on formations in $\mathbb{R}^2$ and $\mathbb{R}^3$ (two-dimensional and three-dimensional Euclidean spaces, respectively) considering real-world multivehicle formation applications, although most of the definitions and results can be generalized for arbitrary dimensional space $\mathbb{R}^n$ ($n \in \{2, 3, \ldots\}$) [31]. Consider a formation $F$ with asymmetric control structure. The directed underlying graph $G_F = (V_F, E_F)$ of $F$ has been defined in Section 8.1. The undirected graph $G_F^u = (V_F, E_F^u)$ with the same vertex set $V_F$ and the undirected edge set $E_F^u$ having the same edges in $E_F$ but with the directions neglected, that is, satisfying
$$(i, j) \in E_F^u \iff \left( \overrightarrow{(i, j)} \in E_F \ \text{or} \ \overrightarrow{(j, i)} \in E_F \right)$$
is called the underlying undirected graph of the formation $F$ (or of $G_F$). Next, we focus on formal definition and characterization of rigidity and persistence of formations with asymmetric control structure, using their undirected and directed underlying graphs.

8.2.1 Rigid Formations

We formally call a formation $F$ in $\mathbb{R}^n$ ($n \in \{2, 3\}$) with asymmetric control structure rigid (and its directed underlying graph $G_F$ generically $n$-rigid) if its undirected underlying graph $G_F^u$ is generically $n$-rigid, where generic $n$-rigidity of an undirected graph ($n \in \{2, 3\}$) is defined in the sequel. In $\mathbb{R}^n$ ($n \in \{2, 3\}$), a representation of an undirected graph $G = (V, E)$ with vertex set $V$ and edge set $E$ is a function $p : V \to \mathbb{R}^n$. We say that $p(i) \in \mathbb{R}^n$ is the position of the vertex $i$, and define the distance between two representations $p_1$ and $p_2$ of the same graph by$^{3}$
$$\delta(p_1, p_2) = \max_{i \in V} |p_1(i) - p_2(i)|$$

$^{3}$ In this chapter, we use $|\cdot|$ to denote two different operators, one for vectors and the other for sets. For a vector $\xi \in \mathbb{R}^n$ ($n \in \{2, 3\}$), $|\xi|$ denotes the Euclidean norm of $\xi$. Hence $|p_1(i) - p_2(i)|$, in the definition above, denotes the Euclidean distance between $p_1(i)$ and $p_2(i)$. For a set $S$, $|S|$ denotes the number of elements of $S$.

A distance set $\bar{d}$ for $G$ is a set of distances $d_{ij} > 0$, defined for all edges $(i, j) \in E$. Given a graph $G = (V, E)$ and a corresponding distance set $\bar{d}$, the pair $(G, \bar{d})$ can be considered as a weighted graph [weighted version of the graph $G = (V, E)$], where the weight of each edge $(i, j) \in E$ is $d_{ij} \in \bar{d}$. A distance set is realizable if there exists a representation $p$ of the graph for which $|p(i) - p(j)| = d_{ij}$ for all $(i, j) \in E$. Such a representation is then called a realization. Note that each representation $p$ of a graph induces a realizable distance set [defined by $d_{ij} = |p(i) - p(j)|$ for all $(i, j) \in E$], of which it is a realization. A representation $p$ is rigid if there exists $\epsilon > 0$ such that for all realizations $p'$ of the distance set induced by $p$ and satisfying $\delta(p, p') < \epsilon$, there holds $|p'(i) - p'(j)| = |p(i) - p(j)|$ for all $i, j \in V$ (we say in this case that $p$ and $p'$ are congruent). An undirected graph is said to be generically $n$-rigid or simply $n$-rigid ($n \in \{2, 3\}$) if almost all its representations in $\mathbb{R}^n$ are rigid. Some discussions on the need for using the qualifiers “generic” and “almost all” can be found in References [14,27]. One reason for using these terms is to avoid the problems arising from having three or more collinear vertices in $\mathbb{R}^2$ or four or more coplanar vertices in $\mathbb{R}^3$.

Another notion used in rigidity analysis is minimal rigidity. A graph $G$ is called minimally $n$-rigid ($n \in \{2, 3\}$) if $G$ is $n$-rigid and if there exists no $n$-rigid subgraph of $G$ with the same set of vertices as $G$ and a smaller number of edges than $G$. Provably equivalently, a graph is minimally $n$-rigid if it is $n$-rigid and if no single edge can be removed without losing $n$-rigidity. Fundamental characteristics of rigid and minimally rigid graphs and some of their applications
in autonomous formation control can be found in References [6,27–29]. Following is a selection of these characteristics.

THEOREM 1
For any $n$-rigid graph $G = (V, E)$ ($n \in \{2, 3\}$) with at least $n$ vertices, there exists a subset $E' \subseteq E$ of edges such that the graph $G' = (V, E')$ is minimally $n$-rigid and satisfies the following: (1) $|E'| = n|V| - n(n + 1)/2$; (2) any subgraph $G'' = (V'', E'')$ of $G'$ with at least $n$ vertices satisfies $|E''| \le n|V''| - n(n + 1)/2$.

LEMMA 1
Let $G = (V, E)$ be a minimally $n$-rigid graph ($n \in \{2, 3\}$) and $G' = (V', E')$ be a subgraph of $G$. If $|E'| = n|V'| - n(n + 1)/2$, then $G'$ is minimally $n$-rigid.

LEMMA 2
For $n \in \{2, 3\}$, a graph obtained by adding one vertex to a graph $G = (V, E)$ and $n$ edges connecting this vertex to other vertices of $G$ is (minimally) $n$-rigid if and only if $G$ is (minimally) $n$-rigid.
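The edge count in Theorem 1 can be turned into a small numerical check. The sketch below is not taken from the chapter: it relies on the standard rank criterion from rigidity theory (a graph with at least $n$ vertices is generically $n$-rigid exactly when its rigidity matrix, evaluated at a generic representation, has rank $n|V| - n(n+1)/2$), and it assumes that random integer positions are generic, which holds with overwhelming probability. The function names `is_rigid` and `is_minimally_rigid` are hypothetical.

```python
from fractions import Fraction
import random


def _rank(rows):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def is_rigid(num_vertices, edges, n, seed=0):
    """Generic n-rigidity test for vertices 0..num_vertices-1 (needs >= n vertices).

    Assumption: random integer positions behave like a generic representation.
    """
    rng = random.Random(seed)
    pos = [[Fraction(rng.randint(1, 10**6)) for _ in range(n)]
           for _ in range(num_vertices)]
    rows = []
    for i, j in edges:
        # One row of the rigidity matrix per edge: p(i)-p(j) in i's columns,
        # p(j)-p(i) in j's columns, zeros elsewhere.
        row = [Fraction(0)] * (n * num_vertices)
        for k in range(n):
            diff = pos[i][k] - pos[j][k]
            row[n * i + k] = diff
            row[n * j + k] = -diff
        rows.append(row)
    return _rank(rows) == n * num_vertices - n * (n + 1) // 2


def is_minimally_rigid(num_vertices, edges, n):
    """Rigid with exactly n|V| - n(n+1)/2 edges, the count in Theorem 1."""
    expected = n * num_vertices - n * (n + 1) // 2
    return len(edges) == expected and is_rigid(num_vertices, edges, n)
```

For instance, a four-cycle fails the test in the plane, while the same cycle with one diagonal passes and is minimally rigid; attaching a fifth vertex by two edges keeps it minimally two-rigid, in line with Lemma 2.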
8.2.2 Constraint-Consistent and Persistent Formations

Similar to the definition of rigid formations, we call a formation $F$ in $\mathbb{R}^n$ ($n \in \{2, 3\}$) with asymmetric control structure persistent (constraint consistent) if its directed underlying graph $G_F$ is $n$-persistent (respectively $n$-constraint consistent), where $n$-persistence and $n$-constraint consistence of a directed graph are defined as follows. Consider a directed graph $G = (V, E)$, a representation $p : V \to \mathbb{R}^n$ ($n \in \{2, 3\}$) of $G$, and a set of desired distances $d_{ij} > 0$, $\forall \overrightarrow{(i, j)} \in E$. Note here that the representation, vertex positions, and distance between two representations corresponding to a directed graph are defined exactly the same as the ones corresponding to undirected graphs (see Section 8.2.1). We say that the edge $\overrightarrow{(i, j)} \in E$ is active if $|p(i) - p(j)| = d_{ij}$. We also say that the position of the vertex $i \in V$ is fitting for the distance set $\bar{d}$ if it is not possible to increase the set of active edges leaving $i$ by modifying the position of $i$ while keeping the positions of the other vertices unchanged. More formally, given a representation $p$, the position of vertex $i$ is fitting if there is no $p^* \in \mathbb{R}^n$ for which
$$\{\overrightarrow{(i, j)} \in E : |p(i) - p(j)| = d_{ij}\} \subset \{\overrightarrow{(i, j)} \in E : |p^* - p(j)| = d_{ij}\}$$
A representation $p$ of a graph is called fitting for a certain distance set $\bar{d}$ if all the vertices are at fitting positions for $\bar{d}$. Note that any realization is a fitting representation for its distance set. The representation $p$ is called persistent if there exists $\epsilon > 0$ such that every representation $p'$ fitting for the distance set induced by $p$ and satisfying $\delta(p, p') < \epsilon$ is congruent to $p$. A graph is then generically $n$-persistent ($n \in \{2, 3\}$) if almost all its representations in $\mathbb{R}^n$ are persistent. Similarly, a representation $p$ is called constraint consistent if there exists $\epsilon > 0$ such that any representation $p'$ fitting for the distance set $\bar{d}$ induced by $p$ and satisfying $\delta(p, p') < \epsilon$ is a realization of $\bar{d}$. Again, we say that a graph is generically $n$-constraint consistent ($n \in \{2, 3\}$) if almost all its representations in $\mathbb{R}^n$ are constraint consistent. The relation among persistence, rigidity,$^{4}$ and constraint consistence of a directed graph is given in the following theorem and demonstrated using a two-dimensional example in Figure 8.1.

[Figure 8.1: three directed graphs on vertices 1–5, panels (a), (b), and (c).]
FIGURE 8.1 Application of Theorem 2 in $\mathbb{R}^2$. Assume that the distance set $\bar{d}$ is given by $d_{12} = d_{13} = d_{23} = d_{25} = d_{34} = d_{45} = 1$ and (for [b] and [c]) $d_{24} = \sqrt{2}$. (a) The representation is constraint consistent but not rigid. (Assuming vertices [agents] 1, 2, and 3 are stationary, 4 and 5 can continuously move to new locations without violating $d_{25}$, $d_{34}$, $d_{45}$.) Hence it is not persistent. (b) The representation is rigid but not constraint consistent. (Again assuming that 1, 2, and 3 are stationary, 5 can continuously move to new positions without violating the distance constraint [$d_{25}$] on it, for which vertex 4 is unable to meet all three distance constraints [$d_{24}$, $d_{34}$, $d_{45}$] on it at the same time.) Hence it is not persistent. (c) The representation is both rigid and constraint consistent, hence it is persistent.

THEOREM 2 [31]
A representation in $\mathbb{R}^n$ ($n \in \{2, 3\}$) is persistent if and only if it is rigid and constraint consistent. A graph is generically $n$-persistent ($n \in \{2, 3\}$) if and only if it is generically $n$-rigid and generically $n$-constraint consistent.

In order to check persistence of a directed graph $G$, one may use the following criterion, where $d^-(i)$ and $d^+(i)$ designate, respectively, the in- and out-degree of the vertex $i$ in the graph $G$, that is, the number of edges in $G$ heading to and originating from $i$, respectively.

PROPOSITION 1 [31]
An $n$-persistent graph ($n \in \{2, 3\}$) remains $n$-persistent after deletion of any edge $\overrightarrow{(i, j)}$ for which $d^+(i) \ge n + 1$. Similarly, an $n$-constraint-consistent graph ($n \in \{2, 3\}$) remains $n$-constraint consistent after deletion of any edge $\overrightarrow{(i, j)}$ for which $d^+(i) \ge n + 1$.
$^{4}$ Rigidity for a directed graph is defined in the same way as for an undirected graph; one simply takes no account of any assigned direction.
Another notion found useful in characterizing a persistent formation $F$ (or its underlying graph $G_F$) is the number of degrees of freedom (DOF count) of each agent (vertex) in $\mathbb{R}^n$ ($n \in \{2, 3\}$), which is defined as the maximal dimension, over all $n$-dimensional representations of $G_F$, of the set of possible fitting positions for this agent (vertex). In $\mathbb{R}^n$ ($n \in \{2, 3\}$), the vertices with out-degree 0 have $n$ DOFs, the vertices with out-degree 1 have $n - 1$ DOFs, the ones with out-degree 2 have $n - 2$ DOFs, and all the other vertices have zero DOF. In an underlying graph of an $n$-dimensional formation ($n \in \{2, 3\}$), a vertex (agent) with $n$ DOFs in $\mathbb{R}^n$ is also called a leader. The following corollary of Proposition 1 provides a natural bound on the total number of degrees of freedom in an $n$-persistent graph in $\mathbb{R}^n$ ($n \in \{2, 3\}$), which we also call the total DOF count of that graph in $\mathbb{R}^n$.

COROLLARY 1
The total DOF count of an $n$-persistent graph in $\mathbb{R}^n$ ($n \in \{2, 3\}$) can at most be $n(n + 1)/2$.

In Corollary 1, note that $n$ of the $n(n + 1)/2$ DOFs correspond to translations and the remaining $n(n - 1)/2$ correspond to rotations of a formation represented by the $n$-persistent graph, considering the whole formation as a single body. Next, we present two other essential results on the characterization of persistent graphs (and hence persistent formations), proofs of which can be found in Reference [31].

THEOREM 3
A directed graph is $n$-persistent ($n \in \{2, 3\}$) if and only if all those subgraphs are $n$-rigid which are obtained by successively removing outgoing edges from vertices with out-degree larger than $n$ until all such vertices have an out-degree equal to $n$.

PROPOSITION 2 [31]
Consider a directed graph $G$ and another directed graph $G'$ that is obtained by adding one vertex to $G$ and at least $n$ edges leaving this vertex and incident on different vertices of $G$ ($n \in \{2, 3\}$). Then, $G'$ is $n$-persistent if and only if $G$ is $n$-persistent.
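The DOF rule and the criterion of Theorem 3 lend themselves to a brute-force check on small graphs. The following is only a sketch under stated assumptions: generic rigidity is tested with the standard rigidity-matrix rank criterion (not stated in the chapter), random integer positions are assumed to be generic, and the names `dof_counts`, `is_rigid`, and `is_persistent` are hypothetical. The enumeration grows combinatorially with the number of vertices whose out-degree exceeds $n$, so it is meant for illustration only.

```python
from fractions import Fraction
from itertools import combinations, product
import random


def _rank(rows):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def is_rigid(vertices, edges, n, seed=0):
    """Generic n-rigidity, ignoring edge directions (rank criterion, assumed)."""
    rng = random.Random(seed)
    index = {v: k for k, v in enumerate(vertices)}
    pos = [[Fraction(rng.randint(1, 10**6)) for _ in range(n)] for _ in vertices]
    rows = []
    for pair in {frozenset(e) for e in edges}:
        i, j = tuple(pair)
        row = [Fraction(0)] * (n * len(vertices))
        for k in range(n):
            diff = pos[index[i]][k] - pos[index[j]][k]
            row[n * index[i] + k] = diff
            row[n * index[j] + k] = -diff
        rows.append(row)
    return _rank(rows) == n * len(vertices) - n * (n + 1) // 2


def dof_counts(vertices, edges, n):
    """DOF of each vertex: n minus its out-degree, floored at zero."""
    out = {v: 0 for v in vertices}
    for tail, _head in edges:
        out[tail] += 1
    return {v: max(0, n - out[v]) for v in vertices}


def is_persistent(vertices, edges, n):
    """Theorem 3: every subgraph keeping exactly n out-edges at each vertex
    whose out-degree exceeds n must be n-rigid."""
    options = []
    for v in vertices:
        leaving = [e for e in edges if e[0] == v]
        if len(leaving) > n:
            options.append(list(combinations(leaving, n)))
        else:
            options.append([tuple(leaving)])
    for choice in product(*options):
        kept = [e for group in choice for e in group]
        if not is_rigid(vertices, kept, n):
            return False
    return True
```

As a usage example, the directed triangle with edges (2, 1), (3, 1), (3, 2) gives DOF counts 2, 1, 0 in the plane (total 3, matching the bound of Corollary 1) and passes the test, whereas the graph with edges (2, 1), (3, 1), (4, 1), (4, 2), (4, 3) is rigid but fails it, because dropping the edge from 4 to 1 leaves a four-cycle.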
It has been stated in Section 8.1 that there exists a particular small class of formations in $\mathbb{R}^3$ for which the intuitive definition of persistence given in Section 8.1 and the formal definition, here in Section 8.2, do not match. The problem in this exceptional class arises when it is not possible for all the agents in a certain subset of the agent set of the formation to simultaneously satisfy all their constraints, despite the ability of any single agent to move to a position that satisfies the constraints on it once all the other agents are fixed [31]. Persistent formations free of this problem are called structurally persistent. For a formal definition and characteristics of structural persistence, as well as the details of the above problem, the reader may refer to Reference [31]. The distinction between structural persistence and persistence does not arise in two dimensions. In $\mathbb{R}^3$, it turns out that a formation is structurally persistent
if and only if it is persistent and does not have two leaders each with three DOFs. For simplification, we assume all the practical persistent formations considered in the chapter to be structurally persistent as well.
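The three-dimensional condition above reduces to counting leaders, that is, vertices with out-degree 0. A minimal sketch follows, assuming that persistence itself has already been established by some other means and is passed in as a flag; the function name and its signature are hypothetical.

```python
def is_structurally_persistent_3d(vertices, directed_edges, persistent):
    """Check the leader condition for a formation in three dimensions.

    `persistent` is assumed to come from a separate persistence test; this
    function only adds the requirement that fewer than two vertices are
    leaders (out-degree 0, hence three DOFs).
    """
    out_degree = {v: 0 for v in vertices}
    for tail, _head in directed_edges:
        out_degree[tail] += 1
    leaders = [v for v in vertices if out_degree[v] == 0]
    return persistent and len(leaders) < 2
```

Keeping the persistence test outside the function is a deliberate simplification: the leader count is cheap to evaluate, while persistence requires a rigidity analysis of the underlying graph.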
8.3 Acquisition and Maintenance of Persistence
8.3.1 Acquiring Persistence

The importance of persistence for cohesive and coordinated motion of autonomous formations with asymmetric control structure has been indicated in the previous sections. In this subsection, we focus on acquisition of persistence for such formations, which can be interpreted in various ways: (1) systematic construction of a persistent formation from a given team of autonomous agents with certain desired information architecture characteristics; (2) converting a given nonpersistent or nonrigid formation to a persistent one via swapping some of the directions of the information links and adding some extra links if needed; (3) assigning directions to the links of formations with given rigid undirected underlying graphs (i.e., to the edges of the undirected underlying graphs) to obtain persistent formations; and so on.

Interpretation (1) is partially analyzed in References [14–16,31], where certain systematic construction procedures have been developed similar to their well-established counterparts for growing undirected rigid formations (or graphs), namely Henneberg construction sequences [6,27]. In References [15,16], a systematic procedure is developed for constructing (minimally) two-persistent graphs, where at each step of the procedure, one of the following three operations is applied: vertex addition, edge splitting, and edge reversal. Each of these operations (as defined below) preserves minimal two-persistence when applied to a minimally two-persistent graph and two-persistence when applied to a two-persistent graph. Hence, if the procedure starts with a two-persistent graph $G_0$, the graph $G_i$ obtained at each step $i = 1, 2, \ldots$ is two-persistent; and if $G_0$ is further a minimally two-persistent graph, each $G_i$ is minimally two-persistent.

Next, we briefly explain the three operations. At step $i$ ($i \in \{1, 2, \ldots\}$), application of a vertex addition to graph $G_{i-1} = (V_{i-1}, E_{i-1})$ means addition to $G_{i-1}$ of a vertex $j$ with in-degree 0 and out-degree 2 and two distinct edges $\overrightarrow{(j, k)}$, $\overrightarrow{(j, l)}$ outgoing from $j$, where $k, l \in V_{i-1}$. The resultant graph is $G_i = (V_i, E_i)$, where $V_i = V_{i-1} \cup \{j\}$ and
$$E_i = E_{i-1} \cup \{\overrightarrow{(j, k)}, \overrightarrow{(j, l)}\}$$
Application of edge splitting means removing a directed edge $\overrightarrow{(j, k)} \in E_{i-1}$ and adding a new vertex $l$ and the edges $\overrightarrow{(j, l)}$, $\overrightarrow{(l, k)}$, $\overrightarrow{(l, m)}$, where $m \in V_{i-1}$, that is, the resultant graph is $G_i = (V_i, E_i)$, where $V_i = V_{i-1} \cup \{l\}$ and
$$E_i = E_{i-1} \cup \{\overrightarrow{(j, l)}, \overrightarrow{(l, k)}, \overrightarrow{(l, m)}\} \setminus \{\overrightarrow{(j, k)}\}$$
Finally, application of edge reversal on $G_{i-1}$ is replacing a directed edge $\overrightarrow{(j, k)} \in E_{i-1}$, where $k \in V_{i-1}$ has at least 1 DOF (in $\mathbb{R}^2$) so that it can take on the additional outgoing edge, with $\overrightarrow{(k, j)}$ to obtain $G_i = (V_i, E_i)$ with $V_i = V_{i-1}$ and
$$E_i = E_{i-1} \cup \{\overrightarrow{(k, j)}\} \setminus \{\overrightarrow{(j, k)}\}$$

Any minimally two-persistent graph $G = (V, E)$ with $V = \{1, 2, \ldots, N\}$, where $|V| = N \ge 3$, can be obtained starting with a seed graph $G_0 = (V_0, E_0)$ with
$$V_0 = \{1, 2, 3\}, \quad E_0 = \{\overrightarrow{(2, 1)}, \overrightarrow{(3, 1)}, \overrightarrow{(3, 2)}\}$$
and applying the procedure described above in the following particular form [15,16]: First, using a sequence of $N - 3$ operations, each of which is either vertex addition or edge splitting, a minimally two-persistent graph $G_{N-3}$ is built having the same underlying undirected graph as $G$. Then from $G_{N-3}$, a graph $G' = (V, E')$ is obtained having the same undirected underlying graph and the same DOF distribution (among vertices in $V$ in $\mathbb{R}^2$) as $G$, by redistributing the DOFs among the vertices by applying a sequence of edge reversals. It is further shown in References [15,16] that the only possible differences between $G'$ and $G$ are directions (orientations) of certain cycles. $G$ is obtained by reversing the directions of these cycles via a sequence of edge reversals for each cycle, that is, reversing the direction of each of the edges in these cycles in a sequential manner, as the final part of the construction procedure. The feasibility of each of the three parts of the above procedure for building $G$ from $G_0$ is proven in References [15,16].

As a special case, if $G$ is an acyclic (cycle free), minimally two-persistent graph, then $G$ (with possibly a different permutation of vertex indices) can be grown from $G_0$ by applying a sequence of $N - 3$ vertex additions [14].
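The three operations can be sketched as pure functions on a vertex set and a set of directed edges, each edge stored as a `(tail, head)` pair. This is an illustrative sketch for the two-dimensional case, not the procedure of References [15,16]: the function names are hypothetical, the requirement that $m$ differ from $j$ and $k$ in edge splitting is an added assumption (it avoids duplicate or opposite edges between the same pair), and edge reversal is guarded by requiring that the vertex receiving the new outgoing edge currently has out-degree below 2, that is, at least 1 DOF.

```python
def out_degree(edges, v):
    """Number of edges leaving vertex v."""
    return sum(1 for tail, _head in edges if tail == v)


def vertex_addition(vertices, edges, j, k, l):
    """Add new vertex j with two outgoing edges to distinct existing k, l."""
    if j in vertices or k not in vertices or l not in vertices or k == l:
        raise ValueError("invalid vertex addition")
    return vertices | {j}, edges | {(j, k), (j, l)}


def edge_splitting(vertices, edges, j, k, l, m):
    """Replace edge (j, k) by (j, l), (l, k), (l, m) through new vertex l."""
    if (j, k) not in edges or l in vertices or m not in vertices or m in (j, k):
        raise ValueError("invalid edge splitting")
    return vertices | {l}, (edges - {(j, k)}) | {(j, l), (l, k), (l, m)}


def edge_reversal(vertices, edges, j, k):
    """Replace edge (j, k) by (k, j); k must still have a spare DOF in 2D."""
    if (j, k) not in edges or out_degree(edges, k) >= 2:
        raise ValueError("invalid edge reversal")
    return vertices, (edges - {(j, k)}) | {(k, j)}
```

Starting from the seed graph with vertices {1, 2, 3} and edges {(2, 1), (3, 1), (3, 2)}, a vertex addition followed by an edge splitting yields 5 vertices and 7 edges, which agrees with the count $2|V| - 3$ expected of a minimally two-persistent graph; the functions return new sets and leave their inputs unchanged.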
A generalized version of the vertex addition operation above (for both two-persistence and three-persistence) is discussed in References [14,31], where a vertex with out-degree at least $n$ and in-degree 0 is added to an $n$-persistent graph ($n \in \{2, 3\}$). The results in References [14,31] imply that the generalized vertex addition operation preserves $n$-persistence ($n \in \{2, 3\}$) and any acyclic (cycle free) $n$-persistent graph $G = (V, E)$ with $V = \{1, 2, \ldots, N\}$, where $|V| = N \ge n + 1$, can be grown from an acyclic $n$-persistent seed graph with three vertices, performing a sequence of $N - 3$ generalized vertex additions.

One possible approach to the interpretation (2) of persistence acquisition is to perform the acquisition task in two steps, where in the first step the undirected underlying graph is made rigid via addition of a necessary number of links with arbitrary directions, and in the second step directions of selected links are swapped to satisfy constraint consistence and hence persistence of the directed underlying graph. This interpretation has not been fully analyzed in the literature yet, but partial discussions on or relevant to making a nonrigid graph rigid via adding edges and making a nonconstraint-consistent directed graph constraint consistent via edge reversals can be found in References [6,15,16,29]. Particularly, a discussion on making a nonrigid graph rigid via adding edges is presented in Reference [6], where the task is named a (minimal) cover problem.

A general solution to the problem defined in interpretation (3), that is, one applicable to nonminimally rigid as well as minimally rigid graphs, is not available in the literature yet, which is not unreasonable given that the notion of persistence was defined only recently and the relation between this directed graph notion and the undirected graph notion of rigidity is nontrivial.
Nevertheless, systematic solutions for classes of nonminimally rigid undirected graphs, namely complete graphs, bilateration and trilateration graphs, wheel graphs, C^2 graphs, C^3 graphs, and bipartite graphs of type K_{m,n}, are provided in Reference [8]. Formal definitions of these graph classes can be found in References [7,10], and their rigidity can be verified easily using these definitions and the rigidity criteria available in the literature, for example, in References [27–29]. Below we present the persistence acquisition procedures for some of these classes (depicted in Figure 8.2), followed by some discussion of their practical implications. The complete list of results, as well as the proofs, can be found in Reference [8].

PROPOSITION 3
Given an integer k ≥ 3, consider the k-complete (undirected) graph K_k with the vertex set V = {1, 2, . . . , k}, where every vertex pair i, j ∈ V is directly connected by an edge. Let $\vec{K}_k$ be the directed graph obtained by assigning directions to the edges of K_k such that for any vertex pair i, j satisfying 1 ≤ i < j ≤ k, the direction of edge (i, j) is from j to i. Then, $\vec{K}_k$ is n-persistent for n ∈ {2, 3}.

PROPOSITION 4
Given a trilateration graph T, that is, a graph with an ordering of vertices 1, 2, . . . , k such that 1, 2, and 3 form a complete graph, and vertex j is joined to at least three
FIGURE 8.2 Persistence acquisition (via link direction assignment) of two-dimensional formations with (a) a complete underlying graph (K5); (b) a trilateration underlying graph; (c), (d) a wheel underlying graph (W6) with two different representations.
vertices 1, 2, . . . , j − 1 for j = 4, 5, . . . , k, let $\vec{T}$ be the directed graph obtained by assigning directions to the edges of T such that the direction of each edge (i, j) for i < j is from j to i. Then, $\vec{T}$ is n-persistent for n ∈ {2, 3}.

PROPOSITION 5
Given an integer k ≥ 3, consider the wheel graph W_k that is composed of k rim vertices, labeled vertices 1, 2, . . . , k, the rim cycle of edges C_k = {(1, 2), (2, 3), . . . , (k − 1, k), (k, 1)} passing through these vertices, one hub vertex (labeled vertex 0), and the edges (0, i) for i = 1, 2, . . . , k connecting the hub vertex to each of the rim vertices. Let $\vec{W}_k$ be the directed graph obtained by assigning directions to the edges of W_k such that the direction of each rim edge (i, i + 1) is from i to i + 1, the direction of (1, k) is from k to 1, and the direction of any edge (0, i) is from i to 0. Then, $\vec{W}_k$ is two-persistent.

Note that each of the rigid graph classes considered above corresponds to a formation architecture that can be used in guidance and control of autonomous multiagent formations. Complete graphs model the information architecture of formations where the sensing (communication) radius of each agent potentially allows it to actively maintain its distance from any other agent in the entire formation. The trilateration results given in Proposition 4 can be used in acquisition of cycle-free formations with leader-follower structure
[26] and asymmetric control architecture. One might use wheel graphs to model two-dimensional formations in which there is a central agent, the commander, which can be sensed by all other agents. Note here that since the sensing/communication capabilities of the agents and the distance constraints on them may differ from each other, formations with wheel underlying graphs may have various geometries, such as the ones depicted in Figure 8.2c and d. As demonstrated in Figure 8.2d, the commander (corresponding to the hub vertex) does not need to be in the geometric center of the formation. Although the procedures in Reference [8], including the results above, have been developed for a limited number of formation classes, the methodology used to develop these procedures can be used to generate similar procedures for persistence acquisition of other formation classes as well.

8.3.2 Persistent Formation Reconfiguration Operations
In many autonomous multiagent formation applications, some of which are mentioned in Section 8.1, one needs to analyze certain scenarios that have a significant likelihood in practice, in order to guarantee robustness in the presence of such scenarios. In Reference [6], three key categories of operations on rigid formations have been analyzed: merging, splitting, and closing ranks. The focus of the analysis in Reference [6] for each operation is preservation of rigidity during the operation. The trivial extensions of the above operations to persistent formations, and the relevant persistence maintenance problems during these operations, can be defined as follows.
In merging, the task is to establish a set L_n of new directed [information] links between (agents of) two persistent formations F_1, F_2 such that the merged formation F_1 ∪ F_2 ∪ L_n (i.e., the formation whose agent set is the union of the agent sets of F_1 and F_2 and whose directed [information] link set is L_1 ∪ L_2 ∪ L_n, where L_i denotes the directed [information] link set of F_i for i ∈ {1, 2}) is persistent. In terms of the underlying graphs, the merging task is equivalent to the following: Given the underlying graphs G_{F_1} = (V_{F_1}, E_{F_1}), G_{F_2} = (V_{F_2}, E_{F_2}) of two persistent formations F_1 and F_2, find a directed edge set E_n such that the directed graph (V_{F_1} ∪ V_{F_2}, E_{F_1} ∪ E_{F_2} ∪ E_n) is persistent.

In splitting, which can be thought of as the reverse of merging, the case is considered where a persistent formation F with directed underlying graph G_F = (V_F, E_F) is split into two formations F_1, F_2 with directed underlying graphs G_{F_1} = (V_{F_1}, E_{F_1}), G_{F_2} = (V_{F_2}, E_{F_2}), respectively (where V_F = V_{F_1} ∪ V_{F_2}), due to loss of some information links in F (or some edges in E_F). The task is to establish new links within each of F_1, F_2 (add new directed edges to E_{F_1}, E_{F_2}) such that both F_1 and F_2 become persistent.

Closing ranks can be thought of as a special (pseudo-) splitting operation. The case of interest is the loss of an agent (and the links associated with this agent) from a persistent formation, and the closing ranks task is to establish new directed links between certain pairs among the remaining agents such that the
new formation (formed after the agent loss and establishment of the new links) is persistent as well. Note here that splitting can also be thought of as a generalized closing ranks operation (defined for the loss of a set of agents instead of a single agent): the scenario of the above splitting problem for the post-split formation F_1, for example, can be equivalently reformulated as F_1 being what is left when F, having initially F_2 as its subformation, loses the agents in the subformation F_2. This observation has been found useful (at least for the undirected underlying graphs and rigidity considerations) in treating splitting problems using certain results derived for the closing ranks problem, for example, in Reference [6].

The persistence maintenance problems corresponding to the three operations above can be thought of as special cases of the problem of making a nonpersistent (underlying) graph persistent by adding some new edges (and swapping some of the edge directions), which can in turn be thought of as the extension of the (minimal) cover problem discussed in Reference [6] to directed graphs or formations. A complete generalization to directed graphs of the solution to the minimal cover problem has not yet been achieved. Moreover, the three operations and the corresponding persistence maintenance tasks can be further generalized to consider merging involving more than two persistent formations, splitting into more than two persistent formations, closing ranks during loss of two or more agents, and so on.

Furthermore, various other scenarios can be generated as combinations of specific forms of a number of the above operations. One such scenario is where a multiagent (vehicle) formation loses some of its agents and new agents are required to be added to the formation without violating the existing control structure [6].
Another similar scenario is where the leader of a formation has to be substituted due to evolving mission requirements [8,30]. If the formation is persistent in the beginning, the task of changing the leader without damaging the control structure can be abstracted as changing the directions of certain edges in an underlying directed graph in a way that maintains persistence.

We conclude this subsection with a real-life example where frequent formation changes are expected: terrain surveillance using a formation of aerial vehicles with surveying sensors mounted on them [3,8]. In this application, abstracting each vehicle together with its sensing/communication equipment as a sensor agent, an extra sensor agent may be needed to improve the overall coverage in order to adapt to varying conditions during the surveillance mission. In such a case it is essential to coordinate the behavior of each such additional sensor agent well with that of the already existing agents, which can be done by maintaining persistence of the formation during variations.

8.3.3 Maintaining Persistence During Formation Reconfigurations
As mentioned in Section 8.3.2, the main issue in the reconfiguration operations on persistent formations in terms of the information architecture is maintenance of persistence. Full analysis of the basic or extended versions of the merging, splitting, and closing ranks operations for persistent formations
and the relevant persistence maintenance problems is not yet available in the literature. However, some partial analysis results and relevant discussions can be found in some recent studies, for example, in References [8,17,30,31].

In Reference [17], the problem of merging persistent formations in ℝ² and ℝ³ is partially analyzed in a so-called metaformation framework. The main results of this analysis are summarized in the following theorems. The first theorem is about a particular way of merging the directed underlying graphs of persistent formations via addition of a set E_M of additional edges having end-vertices belonging to different underlying graphs; the second is about necessary and sufficient conditions for an edge-optimal persistent merging, that is, for having E_M such that no single edge of E_M can be removed without losing persistence of the merged formation.

THEOREM 4 [17]
A collection of n-persistent graphs G_1, . . . , G_N (n ∈ {2, 3}, N ∈ {2, 3, . . .}) can be merged into a (structurally) n-persistent graph if and only if this merging can be done by adding edges leaving vertices with one or more local DOFs in ℝ^n, that is, DOFs (in ℝ^n) in the corresponding G_i (i ∈ {1, 2, . . . , N}), such that the original local DOF count of each vertex is greater than or equal to the number of added edges leaving this vertex. In that case, the merged graph G is n-persistent if and only if it is n-rigid. G is then structurally three-persistent if and only if it has at most one vertex with three DOFs in ℝ³.

THEOREM 5 [17]
Consider a set $\mathcal{G} = \mathcal{G}_3 \cup \mathcal{G}_2 \cup \mathcal{G}_1$ of disjoint directed graphs, where $\mathcal{G}_3$ is composed of n-persistent graphs (n ∈ {2, 3}) having at least three vertices, $\mathcal{G}_2$ is composed of directed graphs with two vertices and an edge, and $\mathcal{G}_1$ is composed of single-vertex graphs. Then, $G = \left(\bigcup_{G_i \in \mathcal{G}} G_i\right) \cup E_M$, where E_M is a set of additional edges having end-vertices belonging to different graphs in $\mathcal{G}$, is an edge-optimal persistent merging in ℝ^n if and only if the following conditions all hold:
1. $|E_M| = \frac{n(n+1)}{2}\,(|\mathcal{G}_3| - 1) + (2n - 1)\,|\mathcal{G}_2| + n\,|\mathcal{G}_1|$.
2. For all nonempty $E'_M \subseteq E_M$, there holds $|E'_M| \le \frac{n(n+1)}{2}\,(|I(E'_M)| - 1) + (2n - 1)\,|J(E'_M)| + n\,|K(E'_M)|$, where $I(E'_M)$ is the set of graphs in which at least three vertices or two unconnected ones are incident to edges of $E'_M$, $J(E'_M)$ is the set of those in which one connected pair of vertices is incident to edges of $E'_M$, and $K(E'_M)$ is the set of those in which only one vertex is incident to edge(s) of $E'_M$.
3. All edges of E_M leave vertices with local DOFs in ℝ^n.
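As a small illustration of condition 1 of Theorem 5, the following Python sketch evaluates the required number of merging edges for given counts of graphs in each class. The function name and its arguments are hypothetical and introduced only for this example; the sketch evaluates the stated formula and does not check conditions 2 and 3.

```python
def edge_optimal_merge_size(n, num_g3, num_g2, num_g1):
    """Evaluate the edge count in condition 1 of Theorem 5.

    n       -- dimension of the space (2 or 3)
    num_g3  -- number of n-persistent graphs with at least three vertices
    num_g2  -- number of two-vertex graphs with one edge
    num_g1  -- number of single-vertex graphs
    (All names are illustrative assumptions, not notation from Reference [17].)
    """
    if n not in (2, 3):
        raise ValueError("the theorem is stated for n in {2, 3}")
    rigid_body_dofs = n * (n + 1) // 2  # 3 in the plane, 6 in space
    return (rigid_body_dofs * (num_g3 - 1)
            + (2 * n - 1) * num_g2
            + n * num_g1)


# Merging two planar persistent formations needs 3 new edges,
# and merging two spatial ones needs 6.
print(edge_optimal_merge_size(2, 2, 0, 0))
print(edge_optimal_merge_size(3, 2, 0, 0))
```

The coefficient n(n + 1)/2 equals the number of DOFs of a free rigid body in ℝ^n, which is why one graph of the first class contributes no edges of its own.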
In References [8,31], persistence maintenance of a three-dimensional formation is analyzed during addition of a new agent to the formation using a DOF allocation state framework. This framework is based on the elaboration of the fact that in a three-persistent graph, there are at most six DOFs (in ℝ³) to be allocated among the vertices [8,31]. The six DOF allocation states in the framework are defined as the following six sets of DOF counts (in ℝ³) of
vertices ordered in a nonincreasing manner, which represent the six different ways (considering the agents as indistinguishable) of allocating the six DOFs, where in the state S1, for example, one vertex (the leader) has three DOFs, one has two DOFs, another has one DOF, and all the others are 0-DOF:

S1 = {3, 2, 1, 0, 0, . . .}, S2 = {2, 2, 2, 0, 0, . . .}, S3 = {3, 1, 1, 1, 0, 0, . . .},
S4 = {2, 2, 1, 1, 0, 0, . . .}, S5 = {2, 1, 1, 1, 1, 0, 0, . . .}, S6 = {1, 1, 1, 1, 1, 1, 0, 0, . . .}

Further discussion on the DOF allocation states can be found in Reference [8]. Employing an analysis based on the DOF allocation framework, a set of directed vertex addition operations requiring a minimal number of new edges, namely directed trilateration operations, has been developed in Reference [31] for maintaining persistence. A directed trilateration, DT(m), where m ∈ {0, 1, 2, 3}, is defined as a transformation of a three-persistent graph G = (V, E), where |V| ≥ 3, to another three-persistent graph G′ = (V′, E′), where

\[ V' = V \cup \{i\}, \qquad E' = E \cup \{\overrightarrow{(i,k)} : \forall k \in V_1\} \cup \{\overrightarrow{(j,i)} : \forall j \in V_2\} \]

for some V_1, V_2 ⊆ V satisfying V_1 ∩ V_2 = ∅, |V_1| = 3 − m, |V_2| = m, and DOF(j) ≥ 1, ∀j ∈ V_2,⁵ provided that the vertices of V_1 ∪ V_2 are all distinct and are not collinear [31]. Note here that Theorem 3 indicates that the graph obtained after applying a directed trilateration is three-persistent, that is, the directed trilateration defined above preserves the three-persistence of the graphs.

Furthermore, an undirected graph formed by applying a sequence of trilateration operations starting with an initial undirected triangle, often called a trilateration graph, is guaranteed to be generically three-rigid [28,29]. Similarly, a directed graph formed by applying a sequence of directed trilateration operations starting with any initial directed triangle with three vertices and three directed edges, one for each vertex pair, is guaranteed to be generically three-persistent [8,31].
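The combinatorial part of the directed trilateration DT(m) defined above can be sketched in Python as follows. This is a minimal illustration, not the procedure of Reference [31]: the dictionary representation of the graph, the function names, and the formula DOF(v) = max(0, 3 − out-degree of v) are assumptions made for the example, and the geometric requirement that the vertices of V_1 ∪ V_2 are not collinear is not checked.

```python
def dof(out_edges, v, n=3):
    """DOF count of vertex v, assumed here to be max(0, n - out-degree)."""
    return max(0, n - len(out_edges[v]))


def directed_trilateration(out_edges, new_vertex, v1, v2):
    """Return a copy of the graph grown by DT(m) with m = |v2|.

    out_edges  -- dict mapping each vertex to the set of vertices it follows
    new_vertex -- label i of the added vertex
    v1         -- vertices k that receive an edge (i, k) from the new vertex
    v2         -- vertices j that get a new outgoing edge (j, i)
    """
    v1, v2 = set(v1), set(v2)
    m = len(v2)
    if len(out_edges) < 3 or new_vertex in out_edges:
        raise ValueError("need |V| >= 3 and a fresh vertex label")
    if not (v1 | v2) <= set(out_edges) or v1 & v2 or len(v1) != 3 - m:
        raise ValueError("V1, V2 must be disjoint subsets of V with |V1| = 3 - m")
    if any(dof(out_edges, j) < 1 for j in v2):
        raise ValueError("every vertex of V2 needs at least one DOF")
    grown = {v: set(nbrs) for v, nbrs in out_edges.items()}
    grown[new_vertex] = set(v1)      # edges (i, k) for k in V1
    for j in v2:
        grown[j].add(new_vertex)     # edges (j, i) for j in V2
    return grown


# Directed triangle: 2 follows 1, and 3 follows 1 and 2 (DOF counts 3, 2, 1).
triangle = {1: set(), 2: {1}, 3: {1, 2}}
g = directed_trilateration(triangle, 4, {1, 2, 3}, set())   # DT(0)
g = directed_trilateration(g, 5, {2, 3}, {1})               # DT(1)
print(sorted((dof(g, v) for v in g), reverse=True))
```

Under the assumed DOF formula, each DT(m) gives the new vertex m DOFs and removes one DOF from each vertex of V_2, so the total of six DOFs is unchanged.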
Further properties and interpretations of the four directed trilateration operations DT(0), DT(1), DT(2), and DT(3) are given in Reference [8]. REMARK 1 In the implementation of the persistence acquisition and maintenance strategies presented in this chapter, a common requirement would be developing decentralized controllers for individual agents, instead of a centralized control scheme. The main concerns leading to this requirement are complexity and computational cost, sensitivity to loss of certain agents (e.g., a central commander), communication delays between the commander agent and the other agents, impracticality of processing local information by a central control unit, and so on, in a possible central control scheme.
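Returning to the six DOF allocation states S1 to S6 listed earlier in this subsection, one way to see why there are exactly six is to enumerate them. The sketch below, with a hypothetical function name, lists the nonincreasing ways of splitting six DOFs into per-vertex counts of at most three, keeping only those with at most one three-DOF vertex (the condition appearing in Theorem 4). That this enumeration reproduces the list is an observation made for illustration, not a derivation taken from Reference [8].

```python
def dof_allocation_states(total=6, max_dof=3, max_full=1):
    """Enumerate nonincreasing DOF allocations (zero entries omitted).

    total    -- number of DOFs to distribute among the vertices
    max_dof  -- largest DOF count a single vertex may have
    max_full -- how many vertices may have exactly max_dof DOFs
    """
    def partitions(remaining, largest):
        # Yield the partitions of `remaining` with parts <= `largest`.
        if remaining == 0:
            yield []
            return
        for part in range(min(remaining, largest), 0, -1):
            for rest in partitions(remaining - part, part):
                yield [part] + rest

    return [p for p in partitions(total, max_dof)
            if p.count(max_dof) <= max_full]


for state in dof_allocation_states():
    print(state)
```

The only partition discarded by the filter is [3, 3], which would require two vertices with three DOFs each.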
5 If there is no such V_2, then the corresponding DT(m) cannot be performed for the graph G.
8.4 Cohesive Motion of Persistent Formations
8.4.1 Problem Definition
In the previous sections we have focused on characteristics of persistent autonomous formations and discrete procedures to acquire and maintain persistence, without considering any dynamic control task required for the formation. In this section and the next, we focus on decentralized motion control of two-dimensional persistent formations where each of the agents makes decisions based only on its own observations and state. The particular problem we deal with in these two sections, in its basic form, is how to move a given persistent formation with specified initial position and orientation to a new desired position and orientation cohesively, that is, without violating the persistence of the formation during the motion. A more specific definition of the problem is given as follows.

PROBLEM 1
Consider a persistent two-dimensional formation F with m ≥ 3 agents A_1, . . . , A_m whose initial position and orientation in ℝ² (the xy-plane) are specified with a set d̄ of desired inter-agent distances d_ij between neighbor agent pairs (A_i, A_j), that is, the initial position p_i0 of each agent A_i (i ∈ {1, . . . , m}) is known, where the initial positions p_i0 are consistent with d̄. The control task is to move F to a given desired (final) position and orientation defined by a set of final positions p_if of the individual agents A_i for i = 1, . . . , m, where p_if are consistent with d̄, cohesively, that is, without deforming the shape or violating the distance constraints of F during motion, using a decentralized strategy.

We perform our control design and analysis in the continuous-time domain with the following assumptions, relaxation or generalization of which will be discussed later:

A1: Each agent A_i has velocity integrator kinematics

\[ \dot{p}_i(t) = v_i(t) \tag{8.1} \]

where p_i(t) = [x_i(t), y_i(t)], v_i(t) = [v_xi(t), v_yi(t)] ∈ ℝ² denote the position and velocity of A_i at time t, respectively.
A2: The individual controller of each agent A_i (i ∈ {1, . . . , m}) can adjust the velocity v_i(t) directly, that is, v_i(t) is the control signal of agent A_i. The controller of A_i (i ∈ {1, . . . , m}) is assumed to guarantee that v_i(t) is continuous and |v_i(t)| ≤ v̄, ∀t ≥ 0 for some constant maximum speed limit v̄ > 0.
A3: Each agent A_i knows its final desired position p_if and can sense its own position p_i(t) and velocity v_i(t) as well as the position p_j(t) of any agent A_j it follows at any time t ≥ 0.
A4: The distance-sensing range for a neighbor agent pair (A_i, A_j) is sufficiently larger than the desired distance d_ij to be maintained.
Note here that Problem 1, together with Assumptions A1 to A4, is formulated in a simple form in order to simplify the initial analysis of cohesive motion of persistent formations to be presented in this chapter. In a more realistic or more practical scenario one would need to consider more complex agent dynamics; noise, disturbance, and time delay effects in sensing, control, and communication; imperfect position and distance sensors providing measurements with uncertainties; obstacles in the area of interest that the formation has to avoid; and so on. Moreover, there would be some optimality criteria for the control task in terms of the overall process duration, physical and computational energy consumption, and so on. Again, these issues are neglected for the convenience of building a clear initial design and analysis framework that can be elaborated later for a particular, more practical problem according to particular specifications.

On the other hand, in order to make our design and analysis framework usable in more complex practical scenarios, we need to consider the requirements in possible extensions of Problem 1 and pay attention to simplicity and robustness, even if these are not needed for solving Problem 1 alone. For example, a straightforward attempt to solve Problem 1 would be to predetermine a suitable time trajectory for the formation, starting at the given initial setting and ending at the final desired setting, and hence a time trajectory p_i(t) for each agent A_i, and then to generate v_i for each A_i that would result in the predetermined trajectory p_i(t). Such an approach would not be easily extendible to more complex scenarios; in fact, it might not even be possible.
In our approach, we choose the control laws below so that meeting the distance constraints has a higher priority than reaching the final desired position. This priority can be stated as a guideline as follows:
G1: A 0-DOF agent has to use all of its control potential for satisfying its distance constraints, and a 1-DOF agent can undergo its DOF movement only when its distance constraint is satisfied within a certain error bound.
8.4.2 Acyclically Led and Cyclically Led Formations
For a complete analysis of Problem 1, we need to take into account the following categorization of two-dimensional persistent formations in terms of distribution of the DOFs. The sum of the DOFs of individual agents in a persistent two-dimensional formation is at most three, and for a minimally persistent formation, exactly three, which is the same as the DOF of a free rigid (nonvertex) object in ℝ² (two for translation and one for rotation) [14,31]. Based on the distribution of these three DOFs, minimally persistent formations can be divided into two categories: acyclically led minimally persistent formations (or formations with the leader-follower structure) where one agent
has two DOFs, another has one DOF, and the rest have zero DOFs, and cyclically led minimally persistent formations (or formations with the three-coleader structure) where three agents have one DOF and the rest have zero DOF. It can be easily shown that any two-dimensional minimally persistent formation has to be either acyclically led or cyclically led. In an acyclically led formation, the two-DOF agent is called the leader and the one-DOF agent is called the first follower. In a cyclically led formation, the one-DOF agents are called the coleaders. In both structures the zero-DOF agents are called the (ordinary) followers. The names cyclically led and acyclically led come from the following facts, which can be easily shown using the definition of DOF and Lemma 1 of Reference [15]: There is no cycle in an acyclically led formation passing through the leader; and there always exists a cycle passing through all of the three coleaders in a cyclically led formation. In a cyclically led formation, because of lying on a cycle, the motions of the three coleaders are cyclically dependent on each other and hence the motion control for the formation requires a more implicit strategy than one for an acyclically led formation. Some stability properties of a subclass of acyclically led formations are investigated in Reference [26]; however, such an investigation is not available yet for cyclically led formations. In Section 8.5, we design controllers to solve Problem 1 for both the cyclically led and acyclically led categories of minimally persistent formations. Note here that there exist acyclically led (minimally) persistent formations where the first follower does not directly follow the leader but another (ordinary follower) agent (see Figure 8.3a), and there exist cyclically led (minimally) persistent formations where the coleaders do not directly follow each other but some other agents (see Figure 8.3b). 
For simplicity, in Section 8.5, we assume that the first follower directly follows the leader in an acyclically led formation and the three coleaders follow each other in a cyclically led formation.
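The categorization by DOF distribution described in this subsection can be illustrated with a short Python sketch. The function name and the input format (a dictionary giving, for each agent, the number of agents it follows) are hypothetical, and the DOF count of an agent is assumed here to be max(0, 2 − out-degree). The sketch only inspects the DOF distribution; it does not verify that the formation is actually minimally persistent.

```python
def classify_minimally_persistent_2d(out_degrees):
    """Classify a 2-D formation from the number of agents each agent follows.

    out_degrees -- dict mapping agent name to its out-degree
    Returns "acyclically led" for DOF counts {2, 1, 0, ...} and
    "cyclically led" for DOF counts {1, 1, 1, 0, ...}.
    """
    # Assumed DOF formula in the plane: an agent with out-degree d keeps
    # max(0, 2 - d) degrees of freedom.
    dofs = sorted((max(0, 2 - d) for d in out_degrees.values()), reverse=True)
    nonzero = [d for d in dofs if d > 0]
    if nonzero == [2, 1]:
        return "acyclically led"      # leader and first follower
    if nonzero == [1, 1, 1]:
        return "cyclically led"       # three coleaders
    raise ValueError("DOF counts do not sum to three in an admissible way")


print(classify_minimally_persistent_2d({"A1": 0, "A2": 1, "A3": 2, "A4": 2}))
print(classify_minimally_persistent_2d({"A1": 1, "A2": 1, "A3": 1, "A4": 2}))
```

The two accepted patterns are the only ways of writing three as a sum of per-agent DOF counts that do not exceed two, which matches the statement that every two-dimensional minimally persistent formation falls into one of the two categories.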
FIGURE 8.3 Atypical minimally persistent formations in ℝ²: (a) An acyclically led formation where the first follower (A4) does not follow the leader (A1). (b) A cyclically led formation where the coleaders (A1, A3, A5) do not follow each other.
8.5 Decentralized Control of Cohesive Motion
8.5.1 Control Design
Based on the definition of Problem 1, Assumptions A1 to A4, and Guideline G1, it can be judged that the structure of the individual controller of each agent in the minimally persistent formation of interest should be specific to its DOF. In other words, three types of controllers are needed, one for each of the zero-DOF, one-DOF, and two-DOF agent sets, regardless of whether the formation is acyclically led or cyclically led, although the motion behaviors and the stability and convergence analysis for these two categories are expected to be different.

In our control design to solve Problem 1 for minimally persistent formations, we use basic vector analysis and borrow ideas from the virtual vector field concept, details of which can be found in Reference [5] and the references therein.⁶ The main idea in the virtual vector field approach is to obtain the overall velocity vector (for each agent) as the superposition of the vectors defining each of the separate motion tasks (of this agent). In our case the two separate motion tasks of an agent are (1) to maintain a distance constraint with each of the agents it follows and (2) to move towards a final destination.

For optimality considerations and to cope with constant velocity requirements in certain unmanned air vehicle (UAV) and other flight formation applications, we assert the following two guidelines in our control design in addition to Assumptions A1 to A4 and Guideline G1:
G2: Any agent shall move at the constant maximum speed v̄ > 0 at any time instant t ≥ 0 unless it is impossible to do so at that particular instant t due to, for example, initial and final zero-velocity constraints.
G3: Any zero-DOF or one-DOF agent shall move through a path of shortest length in order to satisfy its distance constraints.

Based on Assumptions A1 to A4 and Guidelines G1 to G3, we design a control scheme for zero-DOF, one-DOF, and two-DOF agents separately.
Note here that the guidelines are labeled as guidelines (separate from the assumptions) because it is recognized that there may be clashes between them. The way these clashes are resolved is embedded in the control laws presented below.

6 However, the virtual vector field approaches described in these works are different from our approach: in these works, the interagent distance constraints are not considered and hence do not constitute a vector field.

8.5.1.1 Control Law for Zero-DOF Agents
Consider a zero-DOF agent A_i and the two agents A_j and A_k it follows. Note here that, in a two-dimensional minimally persistent formation, any zero-DOF agent necessarily follows exactly two other agents by definition of minimal persistence and Proposition 1. Due to the distance constraints of keeping |p_i(t) − p_j(t)|, |p_i(t) − p_k(t)| at the desired values of d_ij, d_ik, respectively, at each time t ≥ 0, the desired position p_id(t) of A_i is the point whose distances to p_j(t) and p_k(t) are d_ij, d_ik, respectively, and which satisfies continuity of p_id(t). Assuming |p_i(t) − p_id(t)| is sufficiently small, p_id(t) can be explicitly determined as:

\[ p_{id}(t) = \bar{p}_{jk}(t, p_i(t)) \tag{8.2} \]

where p̄_jk(t, p_0) for any p_0 ∈ ℝ² denotes the intersection of the circles C(p_j(t), d_ij) and C(p_k(t), d_ik) that is closer to p_0, and in the notation C(·, ·) the first argument denotes the center and the second denotes the radius. Based on this observation, we propose the following control law for the zero-DOF agent A_i:

\[
\begin{aligned}
v_i(t) &= \bar{v}\,\beta_i(t)\,\delta_{id}(t)/|\delta_{id}(t)| \\
\delta_{id}(t) &= p_{id}(t) - p_i(t) = \bar{p}_{jk}(t, p_i(t)) - p_i(t) \\
\beta_i(t) &=
\begin{cases}
0, & |\delta_{id}(t)| < \varepsilon_k \\
\dfrac{|\delta_{id}(t)| - \varepsilon_k}{\varepsilon_k}, & \varepsilon_k \le |\delta_{id}(t)| < 2\varepsilon_k \\
1, & |\delta_{id}(t)| \ge 2\varepsilon_k
\end{cases}
\end{aligned}
\tag{8.3}
\]
where v̄ > 0 is the constant maximum speed of the agents and ε_k > 0 is a small design constant. In Equation (8.3), the switching term β_i(t) is introduced to avoid chattering due to small but acceptable errors in the desired interagent distances.

8.5.1.2 Control Law for One-DOF Agents
Let agent A_i have one DOF and A_j be the agent it follows. First, observe that once A_i satisfies its distance constraint with A_j, it is free to move on the circle with center p_j and radius d_ij, provided that it does not need to use the whole of its velocity capacity to satisfy |p_i − p_j| = d_ij. Based on this observation and Guidelines G1 to G3, we design the following control scheme for A_i:

\[
\begin{aligned}
v_i(t) &= \beta_i(t)\,v_{i1}(t) + \sqrt{1 - \beta_i^2(t)}\;v_{i2}(t) \\
\delta_{ji}(t) &= (\delta_{jix}(t), \delta_{jiy}(t)) = p_j(t) - p_i(t) \\
\bar{\delta}_{ji}(t) &= \delta_{ji}(t) - d_{ij}\,\delta_{ji}(t)/|\delta_{ji}(t)| \\
\beta_i(t) &=
\begin{cases}
0, & |\bar{\delta}_{ji}(t)| < \varepsilon_k \\
\dfrac{|\bar{\delta}_{ji}(t)| - \varepsilon_k}{\varepsilon_k}, & \varepsilon_k \le |\bar{\delta}_{ji}(t)| < 2\varepsilon_k \\
1, & |\bar{\delta}_{ji}(t)| \ge 2\varepsilon_k
\end{cases}
\end{aligned}
\tag{8.4}
\]

where

\[ v_{i1}(t) = \bar{v}\,\bar{\delta}_{ji}(t)/|\bar{\delta}_{ji}(t)| \tag{8.5} \]

\[
\begin{aligned}
v_{i2}(t) &= \bar{v}\,\bar{\beta}_i(t)\,\operatorname{sgn}\!\left(\langle \delta_{if}(t), \bar{\delta}^{\perp}_{ji}(t) \rangle\right)\bar{\delta}^{\perp}_{ji}(t) \\
\delta_{if}(t) &= p_{if} - p_i(t) \\
\bar{\delta}^{\perp}_{ji}(t) &= (-\bar{\delta}_{jiy}(t), \bar{\delta}_{jix}(t))/|\bar{\delta}_{ji}(t)| \\
\bar{\beta}_i(t) &=
\begin{cases}
0, & |\delta_{if}(t)| < \varepsilon_f \\
\dfrac{|\delta_{if}(t)| - \varepsilon_f}{\varepsilon_f}, & \varepsilon_f \le |\delta_{if}(t)| < 2\varepsilon_f \\
1, & |\delta_{if}(t)| \ge 2\varepsilon_f
\end{cases}
\end{aligned}
\tag{8.6}
\]
ε_k, ε_f > 0 are small design constants and ⟨·, ·⟩ denotes the dot product operation. In Equation (8.4), via the switching term β_i(t), the controller switches between the translational action v_i1 (given in Equation [8.5]) to satisfy |p_i − p_j| ≅ d_ij and the rotational action v_i2 (given in Equation [8.6]) to move the agent A_i towards p_if, which can take place only when |p_i − p_j| is sufficiently close to d_ij. In Equation (8.6), δ̄⊥_ji(t) is the unit vector perpendicular to the distance vector δ_ji(t) = p_j(t) − p_i(t) with clockwise orientation with respect to the circle C(p_j(t), d_ij), and the term sgn(⟨δ_if(t), δ̄⊥_ji(t)⟩) determines the orientation of motion that would move A_i towards p_if. The switching term β̄_i(t) is for avoiding chattering due to small but acceptable errors in the final position of A_i.

8.5.1.3 Control Law for Two-DOF Agents
If a given agent A_i has two DOFs (which can only happen if A_i is the leader of an acyclically led formation in our case), as it does not have any constraint to satisfy, it can use its full velocity capacity only to move towards its desired final position p_if. Hence the velocity input at each time t can be simply designed as a vector with magnitude v̄ in the direction of p_if − p_i(t):

\[
\begin{aligned}
v_i(t) &= \bar{v}\,\bar{\beta}_i(t)\,\delta_{if}(t)/|\delta_{if}(t)| \\
\delta_{if}(t) &= p_{if} - p_i(t) \\
\bar{\beta}_i(t) &=
\begin{cases}
0, & |\delta_{if}(t)| < \varepsilon_f \\
\dfrac{|\delta_{if}(t)| - \varepsilon_f}{\varepsilon_f}, & \varepsilon_f \le |\delta_{if}(t)| < 2\varepsilon_f \\
1, & |\delta_{if}(t)| \ge 2\varepsilon_f
\end{cases}
\end{aligned}
\tag{8.7}
\]
The switching term β¯ i (t) again prevents chattering due to small but acceptable errors in the final position of Ai .
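The three control laws of this section can be sketched in Python as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: positions are plain (x, y) tuples, all function names are hypothetical, and the one-DOF law uses the square-root weighting between the two actions as an assumed reading of Equation (8.4). In addition, the perpendicular unit vector is computed from δ_ji instead of δ̄_ji, which avoids a division by zero when the distance constraint is met exactly and does not change the result, because the sign factor makes the velocity independent of the orientation chosen for the perpendicular.

```python
import math


def ramp(err, eps):
    """Switching term: 0 below eps, linear on [eps, 2*eps), 1 from 2*eps on."""
    if err < eps:
        return 0.0
    if err < 2.0 * eps:
        return (err - eps) / eps
    return 1.0


def circle_intersection_near(pj, rj, pk, rk, p0):
    """Intersection of C(pj, rj) and C(pk, rk) that is closer to p0 (Eq. 8.2)."""
    dx, dy = pk[0] - pj[0], pk[1] - pj[1]
    d = math.hypot(dx, dy)
    if d == 0.0 or d > rj + rk or d < abs(rj - rk):
        raise ValueError("circles do not intersect")
    a = (rj * rj - rk * rk + d * d) / (2.0 * d)   # distance from pj to the chord
    h = math.sqrt(max(0.0, rj * rj - a * a))      # half-length of the chord
    mx, my = pj[0] + a * dx / d, pj[1] + a * dy / d
    candidates = [(mx - h * dy / d, my + h * dx / d),
                  (mx + h * dy / d, my - h * dx / d)]
    return min(candidates, key=lambda q: math.hypot(q[0] - p0[0], q[1] - p0[1]))


def seek(p, target, vmax, eps):
    """Velocity of magnitude vmax * ramp(...) pointing from p to target."""
    ex, ey = target[0] - p[0], target[1] - p[1]
    err = math.hypot(ex, ey)
    beta = ramp(err, eps)
    if beta == 0.0:
        return (0.0, 0.0)
    return (vmax * beta * ex / err, vmax * beta * ey / err)


def two_dof_velocity(pi, pif, vmax, eps_f):
    """Leader law, Eq. (8.7): head straight for the final position."""
    return seek(pi, pif, vmax, eps_f)


def zero_dof_velocity(pi, pj, pk, dij, dik, vmax, eps_k):
    """Follower law, Eq. (8.3): head for the nearer circle intersection."""
    pid = circle_intersection_near(pj, dij, pk, dik, pi)
    return seek(pi, pid, vmax, eps_k)


def one_dof_velocity(pi, pj, pif, dij, vmax, eps_k, eps_f):
    """One-DOF law, Eqs. (8.4)-(8.6); assumes pi and pj do not coincide."""
    dx, dy = pj[0] - pi[0], pj[1] - pi[1]
    dist = math.hypot(dx, dy)
    ux, uy = dx / dist, dy / dist                 # unit vector towards A_j
    gap = dist - dij                              # signed length of delta-bar
    beta = ramp(abs(gap), eps_k)
    s = 1.0 if gap > 0 else -1.0
    v1 = (vmax * s * ux, vmax * s * uy)           # translational action (8.5)
    px, py = -uy, ux                              # unit perpendicular
    fx, fy = pif[0] - pi[0], pif[1] - pi[1]
    beta_f = ramp(math.hypot(fx, fy), eps_f)
    dot = fx * px + fy * py
    sign = (dot > 0) - (dot < 0)
    v2 = (vmax * beta_f * sign * px, vmax * beta_f * sign * py)   # (8.6)
    w = math.sqrt(1.0 - beta * beta)
    return (beta * v1[0] + w * v2[0], beta * v1[1] + w * v2[1])
```

Because the translational and rotational actions are orthogonal, the squared weights β² and 1 − β² keep the speed of the one-DOF agent at or below v̄ in this sketch.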
FIGURE 8.4 An acyclically led minimally persistent formation to be controlled: A1 is the leader, A2 is the first follower.
8.5.2 Stability and Convergence
In this section we informally discuss the stability and convergence properties associated with the application of the control laws designed in Section 8.5.1 to each of the classes of acyclically led and cyclically led persistent formations separately.

8.5.2.1 Acyclically Led Minimally Persistent Formations
Consider Problem 1 for an acyclically led minimally persistent formation F with m ≥ 3 agents A_1, . . . , A_m, where, without loss of generality, A_1 is the leader, A_2 is the first follower, and the other agents are ordinary followers (such a formation is depicted in Figure 8.4). Note here that, by the assumption at the end of Section 8.4, A_2 follows A_1. In this case, based on the proposed control scheme in Section 8.5.1, A_1 uses the control law (8.7); A_2 uses the control law (8.4) to (8.6); and each of A_3, . . . , A_m uses the control law (8.3). The following is an informal sketch of a possible analysis of the stability and convergence properties of F during its motion; the formal complete analysis had not been completed at the time this chapter was submitted. Consider the dynamic behavior of each agent separately. The leader agent A_1 uses the
control law (8.7). Hence, defining the Lyapunov function V1 (t) = 12 δ1Tf (t)δ1 f (t), from Equation (8.7) it follows that: ˙ 1 (t) = −δ1Tf (t)v1 (t) V = −¯vβ¯ 1 (t)δ1Tf (t)δ1 f (t)/|δ1 f (t)| = −¯vβ¯ 1 (t)|δ1 f (t)| ⎧ if δ1 f (t) < ε f ⎪ ⎨ 0, δ (t) −ε = −¯v|δ1 f (t)| | 1 f ε | f ≤ −¯v δ1 f (t) − ε f , if ε f ≤ δ1 f (t) < 2ε f (8.8) f ⎪ ⎩ −¯v|δ1 f (t)| ≤ −2¯vε f , if δ1 f (t) ≥ 2ε f
Therefore, we have $|\delta_{1f}(t)| \le |\delta_{1f}(0)|$ for all $t \ge 0$ and $\lim_{t\to\infty} |\delta_{1f}(t)| \le \varepsilon_f$; that is, $p_1(t)$ is always bounded and asymptotically converges to the ball $B(p_{1f}, \varepsilon_f)$, the ball with center $p_{1f}$ and radius $\varepsilon_f$. It can be further deduced from Equation (8.8) that $p_1(t)$ enters the ball $B(p_{1f}, 2\varepsilon_f)$ in finite time and remains there. A similar but relatively longer analysis can be done for the dynamic behavior of A2 by defining the Lyapunov functions

$$
V_{21}(t) = \frac{1}{2}\,\bar{\delta}_{12}^T(t)\,\bar{\delta}_{12}(t), \qquad
V_{2f}(t) = \frac{1}{2}\,\delta_{2f}^T(t)\,\delta_{2f}(t)
$$

and examining $\dot{V}_{21}(t)$ and $\dot{V}_{2f}(t)$ together with $\dot{V}_1(t)$ and the control law Equations (8.4) to (8.6). This analysis is expected to establish the conditions under which $p_2(t)$ remains bounded and converges to finite time-varying balls around $p_1(t)$ for $t \ge 0$ with certain radii, as well as to a fixed ball around $p_{2f}$ with a certain radius. Similarly, for agents Ai where i ∈ {3, 4, . . . , m}, analyzing $\dot{V}_{id}(t)$ for

$$
V_{id}(t) = \frac{1}{2}\,\delta_{id}^T(t)\,\delta_{id}(t)
$$

it appears possible that boundedness and convergence properties of each $p_i(t)$ can be established. Combining this result with the above ones for agents A1 and A2, and via some geometric analysis on the definition of $p_{id}$ in Equation (8.2), the conditions that guarantee convergence of all the agents to their final desired positions, as well as satisfaction of the distance constraints within certain error tolerance bounds, can be deduced. Note here that the discussions above, as well as the applicability of the control laws for agents Ai (i ∈ {3, 4, . . . , m}), are valid if and only if $p_{id}(t)$ in Equation (8.2) is well defined, that is, the circles $C(p_j(t), d_{ij})$ and $C(p_k(t), d_{ik})$ intersect for all $t \ge 0$. Via a geometric analysis of the accumulation of position errors, it is observed that Equation (8.2) can be guaranteed to be well defined by selecting the constant $\varepsilon_k$ sufficiently small.

Some simulation results and discussions on testing of the control structure described above on acyclically led persistent formations can be found in Reference [25]. The results shown in Reference [25] for a four-agent example formation indicate that the control goals are successfully achieved, with each agent satisfying its distance constraints at all times during motion with a small error (less than 2% of the distance to be kept).
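The well-definedness condition above, that the circles C(p_j(t), d_ij) and C(p_k(t), d_ik) intersect, can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names (`circles_intersect`, `circle_intersections`) and the tolerance parameter `tol` are hypothetical, and the rule by which Equation (8.2) selects one of the two intersection points is not reproduced here.

```python
import math


def circles_intersect(p_j, d_ij, p_k, d_ik, tol=0.0):
    """True if circles C(p_j, d_ij) and C(p_k, d_ik) share at least one point.

    Two circles with distinct centers intersect exactly when the center
    separation lies between |d_ij - d_ik| and d_ij + d_ik.
    """
    sep = math.dist(p_j, p_k)
    return abs(d_ij - d_ik) - tol <= sep <= d_ij + d_ik + tol


def circle_intersections(p_j, d_ij, p_k, d_ik):
    """Return the list of intersection points (empty, one, or two points)."""
    sep = math.dist(p_j, p_k)
    # Concentric circles have either no common point or infinitely many;
    # both cases are reported as "no isolated intersection point" here.
    if sep == 0.0 or not circles_intersect(p_j, d_ij, p_k, d_ik):
        return []
    # a: signed distance from p_j to the foot of the common chord on the center line
    a = (d_ij ** 2 - d_ik ** 2 + sep ** 2) / (2.0 * sep)
    h = math.sqrt(max(d_ij ** 2 - a ** 2, 0.0))   # half-length of the chord
    ex, ey = (p_k[0] - p_j[0]) / sep, (p_k[1] - p_j[1]) / sep
    fx, fy = p_j[0] + a * ex, p_j[1] + a * ey
    if h == 0.0:
        return [(fx, fy)]                          # tangent circles
    return [(fx - h * ey, fy + h * ex), (fx + h * ey, fy - h * ex)]
```

In a simulation, a follower could call `circles_intersect` at every step with the current positions of the two agents it follows; a `False` result would signal that the accumulated position errors have grown too large for the target point to exist.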
8.5.2.2 Cyclically Led Minimally Persistent Formations Consider a cyclically led minimally persistent formation F with m ≥ 3 agents A1 , . . . , Am , where A1 , A2 , and A3 are the coleaders, and the other agents are ordinary followers (such a formation is depicted in Figure 8.5). By the assumption at the end of Section 8.4 which involves some loss of generality, assume also that A2 follows A1 , A3 follows A2 , and A1 follows A3 . In this case, based on the proposed control scheme in Section 8.5.1, each of A1 , A2 , and A3 uses the control law (8.4) to (8.6) and each of A4 , . . . , Am uses the control law (8.3).
FIGURE 8.5 A cyclically led minimally persistent formation to be controlled: A1, A2, and A3 are the coleaders.
For the remaining agents it appears possible that boundedness and convergence properties of each $p_i(t)$ can be established in a manner similar to that suggested for acyclically led formations. Combining all these results, the conditions guaranteeing cohesive motion, as well as stability and convergence of the entire formation to the desired destination, can be deduced. Again, the formal analysis corresponding to the above sketch has not been completed yet. Nevertheless, the corresponding simulation results and discussions of Reference [25] on testing of the control structure described in Section 8.5.1 demonstrate that the global stability and convergence properties of cyclically led persistent formations are comparable to those of the acyclically led formations. One distinction observed in the results shown in Reference [25] is that the agents follow longer and more indirect paths than in the acyclically led case. This is mainly because the whole formation is guided by a coleader triple whose members make constrained motions as described by Equations (8.4) to (8.6).

8.5.3 More Complex Agent Models

In this chapter, we have designed control schemes for, and analyzed the problem of, cohesive motion of persistent formations based on the velocity integrator model (8.1) for the agent kinematics. The actual kinematic or dynamic model of agents in a real-life formation, however, would in general be more complex than a velocity integrator. Therefore, a more practical design and analysis procedure for the cohesive motion problem would require a more complex agent model than Equation (8.1). The form of such a model for a particular application will, of course, depend on the specifications of the agents used in that application. Some of the widely used continuous-time agent models in the formation control literature (discrete-time counterparts of these models exist and are used in the literature as well), corresponding to the practical experimental agents (e.g., robots or vehicles) of interest, are the double integrator or point mass dynamics model, where the acceleration term $\dot{v}_i = a_i$ is added to the model (8.1) and it is assumed that the control input is the vectorial acceleration $a_i$ [21,22]; the fully actuated uncertain dynamics model [11,13]:

$$
M_i(p_i)\,\ddot{p}_i + \eta_i(p_i, \dot{p}_i) = u_i
$$

where $M_i$ represents the mass or inertia matrix, $\eta_i$ represents the centripetal, Coriolis, and gravitational effects and other additive disturbances, and $u_i$ is the control signal that can be applied in the form of a vectorial force; and the nonholonomic unicycle dynamics model [19]:

$$
\begin{aligned}
(\dot{x}_i, \dot{y}_i) &= (v_i \cos\theta_i,\; v_i \sin\theta_i) \\
\dot{\theta}_i &= \omega_i \\
\dot{v}_i &= \frac{1}{m_i}\,u_{i1} \\
\dot{\omega}_i &= \frac{1}{\tau_i}\,u_{i2}
\end{aligned}
\tag{8.9}
$$
where $p_i(t) = (x_i(t), y_i(t))$ denotes the position of Ai as before, $\theta_i(t)$ denotes the orientation or steering angle of the agent with respect to a certain fixed axis, $v_i$ is the translational speed, $\omega_i$ is the angular velocity, and the control input signals $u_{i1}$ and $u_{i2}$ are, respectively, the force and torque inputs. A simplified form of Equation (8.9) is the nonholonomic unicycle kinematics model [9]:

$$
(\dot{x}_i, \dot{y}_i) = (v_i \cos\theta_i,\; v_i \sin\theta_i), \qquad \dot{\theta}_i = \omega_i
\tag{8.10}
$$

where it is assumed that the translational speed $v_i$ and the angular velocity $\omega_i$ can be applied as control inputs directly. Some preliminary results, based mainly on simulation studies, on the solution of Problem 1 for the nonholonomic unicycle kinematic agent model (8.10) are presented in Reference [25]. The control schemes used in this work employ a so-called "separation-separation control" idea [9], which was originally developed for having one mobile robot follow another at a specified distance or relative position, and are not direct extensions of the designs presented in Section 8.5.1. Nevertheless, in the simulations in Reference [25] using the new control scheme, it is observed that the control goal in Problem 1 is achieved, although the performance is poor compared to that for the velocity integrator model in terms of both the path length and the distance constraints. The simulation results in Reference [25] demonstrate that cohesive motion control with agent kinematics or dynamics models that are more complex than the velocity integrator model is feasible. Design of an enhanced control scheme to obtain better performance for the unicycle kinematic agent model (8.10), as well as similar designs for the double integrator and fully actuated uncertain dynamics models, is currently being investigated by the authors.
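To make the difference between the unicycle models (8.9) and (8.10) concrete, the following minimal sketch advances each model by one time step. It assumes a forward-Euler discretization with step `dt`; the function names and the tuple layout of the state are hypothetical choices for illustration and are not taken from the chapter or from Reference [25].

```python
import math


def unicycle_kinematics_step(state, v, omega, dt):
    """One forward-Euler step of Eq. (8.10).

    state = (x, y, theta); the speed v and angular velocity omega are applied
    directly as control inputs.
    """
    x, y, theta = state
    return (x + dt * v * math.cos(theta),
            y + dt * v * math.sin(theta),
            theta + dt * omega)


def unicycle_dynamics_step(state, u1, u2, m, tau, dt):
    """One forward-Euler step of Eq. (8.9).

    state = (x, y, theta, v, omega); the inputs u1 (force) and u2 (torque)
    act on v and omega through the parameters m and tau of Eq. (8.9).
    """
    x, y, theta, v, omega = state
    return (x + dt * v * math.cos(theta),
            y + dt * v * math.sin(theta),
            theta + dt * omega,
            v + dt * u1 / m,
            omega + dt * u2 / tau)
```

In the kinematic model the controller sets `v` and `omega` instantaneously, whereas in the dynamic model they become states that respond to force and torque with a lag. This extra layer is one reason why a control law designed for the velocity integrator model does not carry over directly.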
8.6 Discussions and Future Directions
In this chapter we have analyzed persistent autonomous multiagent formations based on a recently developed theoretical framework of graph rigidity and persistence. We have reviewed the general characteristics of rigid and persistent graphs and their implications for the control of persistent formations, and presented some operational criteria to check the persistence of a given formation. Using these characteristics and criteria, we have analyzed certain persistence acquisition and maintenance tasks. We have then analyzed cohesive motion of persistent autonomous formations and presented a set of distributed control schemes to cohesively move a given persistent formation with specified initial position and orientation to an arbitrary desired final position and orientation. There still exist open problems, and designs to be completed, related to each of the topics discussed: characteristics, persistence acquisition, persistence maintenance, and cohesive motion of persistent formations. Relevant to the studies presented in Sections 8.2 and 8.3, the authors are currently working on developing new metrics to characterize health and robustness of formations; recovering persistence in the event of an agent loss; guaranteeing persistence after merging of two or more persistent formations to accomplish the same mission; as well as testing theoretical results that can be applied to the control of formations of aerial vehicles. Besides these, the general forms of splitting and closing ranks problems for persistent formations remain open, as does the general solution to the persistence acquisition problems defined in Section 8.3.1.
Related to the cohesive motion problem, the authors are currently working on analyzing and enhancing the control laws and strategies presented for optimality in terms of, for example, the total displacement of all agents; design of similar control schemes for the more complex agent models discussed in Section 8.5.3; and solution of the cohesive motion problem in the presence of obstacles in the region of interest. Different approaches to these ongoing studies, as well as consideration of various real-life effects such as distance measurement noise, lack of global position sensing for some agents, and so on, may constitute different future research directions.
Acknowledgment The work of B. Fidan, B. Anderson, and C. Yu is supported by National ICT Australia, which is funded by the Australian government’s Department of Communications, Information Technology and the Arts and the Australian Research Council through the Backing Australia’s Ability Initiative. J. Hendrickx holds an FNRS fellowship (Belgian Fund for Scientific Research).
His work is supported by the Belgian Programme on Interuniversity Attraction Poles initiated by the Belgian Federal Science Policy Office, and The Concerted Research Action (ARC) “Large Graphs and Networks” of the French Community of Belgium. The scientific responsibility rests with its authors.
References
1. Baillieul, J. and Suri, A., Information patterns and hedging Brockett's theorem in controlling vehicle formations, in Proceedings of the 42nd IEEE Conference on Decision and Control, Vol. 1, 2003, pp. 556–563.
2. Belta, C. and Kumar, V., Abstraction and control for groups of robots, IEEE Transactions on Robotics and Automation, Vol. 20, June 2004, pp. 865–875.
3. Bowyer, R.S. and Bogner, R.E., Agent behaviour and capability correlation factors as determinants in fusion processing, in Proc. Fusion 2003: Special Session on Fusion by Dist. Cooperative Agents, Cairns, Australia, 2003.
4. Das, A., Fierro, R., Kumar, V., and Ostrowski, J.P., A vision-based formation control framework, IEEE Transactions on Robotics and Automation, Vol. 18, October 2002, pp. 813–825.
5. Drake, S., Brown, K., Fazackerley, J., and Finn, A., Autonomous control of multiple UAVs for the passive location of radars, in Proc. 2nd Int. Conf. on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), December 2005, pp. 403–409.
6. Eren, T., Anderson, B.D.O., Morse, A.S., Whiteley, W., and Belhumeur, P.N., Operations on rigid formations of autonomous agents, Communications in Information and Systems, Vol. 3, No. 4, 2004, pp. 223–258.
7. Eren, T., Goldenberg, D.K., Whiteley, W., Yang, Y.R., Morse, A.S., Anderson, B.D.O., and Belhumeur, P.N., Rigidity, computation, and randomization in network localization, in Proc. INFOCOM, Conference of the IEEE Computer and Communications Societies, Vol. 4, March 2004, pp. 2673–2684.
8. Fidan, B., Yu, C., and Anderson, B.D.O., Acquiring and maintaining persistence of autonomous multi-vehicle formations, IET Control Theory & Applications, Vol. 1, No. 2, March 2007, pp. 452–460.
9. Fierro, R., Song, P., Das, A., and Kumar, V., Cooperative control of robot formations, in Cooperative Control and Optimization, R. Murphey and P. Pardalos (eds.), Kluwer Academic, Dordrecht, the Netherlands, 2002, pp. 73–94.
10. Foulds, L.R., Graph Theory Applications, Springer-Verlag, New York, 1992.
11. Gazi, V., Swarm aggregations using artificial potentials and sliding mode control, IEEE Transactions on Robotics and Automation, Vol. 21, No. 6, December 2005, pp. 1208–1214.
12. Gazi, V. and Passino, K.M., Stability analysis of swarms, IEEE Transactions on Automatic Control, Vol. 48, No. 4, April 2003, pp. 692–697.
13. Guldner, J. and Utkin, V.I., Sliding mode control for gradient tracking and robot navigation using artificial potential fields, IEEE Transactions on Robotics and Automation, Vol. 11, No. 2, April 1995, pp. 247–254.
14. Hendrickx, J.M., Anderson, B.D.O., Delvenne, J.-C., and Blondel, V.D., Directed graphs for the analysis of rigidity and persistence in autonomous agent systems, International Journal of Robust and Nonlinear Control, Vol. 17, July 2007, pp. 860–881.
15. Hendrickx, J.M., Fidan, B., Yu, C., Anderson, B.D.O., and Blondel, V.D., Elementary operations for the reorganization of minimally persistent formations, in Proceedings of the 17th International Symposium on Mathematical Theory of Networks and Systems (MTNS2006), July 2006, pp. 859–873.
16. Hendrickx, J.M., Fidan, B., Yu, C., Anderson, B.D.O., and Blondel, V.D., Formation reorganization by primitive operations on directed graphs, to appear in IEEE Transactions on Automatic Control.
17. Hendrickx, J.M., Yu, C., Fidan, B., and Anderson, B.D.O., Rigidity and persistence of meta-formations, in Proceedings of the 45th IEEE Conference on Decision and Control, December 2006, pp. 5980–5985.
18. Laman, G., On graphs and rigidity of plane skeletal structures, Journal of Engineering Mathematics, Vol. 4, 1970, pp. 331–340.
19. Lawton, J.R.T., Beard, R.W., and Young, B.J., A decentralized approach to formation maneuvers, IEEE Transactions on Robotics and Automation, Vol. 19, No. 6, December 2003, pp. 933–941.
20. Lin, Z., Francis, B.A., and Maggiore, M., Necessary and sufficient graphical conditions for formation control of unicycles, IEEE Transactions on Automatic Control, Vol. 50, January 2005, pp. 121–127.
21. Liu, Y. and Passino, K.M., Stable social foraging swarms in a noisy environment, IEEE Transactions on Automatic Control, Vol. 49, January 2004, pp. 30–44.
22. Olfati-Saber, R., Flocking for multi-agent dynamic systems: Algorithms and theory, IEEE Transactions on Automatic Control, Vol. 51, March 2006, pp. 401–420.
23. Olfati-Saber, R. and Murray, R.M., Graph rigidity and distributed formation stabilization of multi-vehicle systems, in Proceedings of the IEEE Conference on Decision and Control, Vol. 3, December 2002, pp. 2965–2971.
24. Ren, W. and Beard, R.W., A decentralized scheme for spacecraft formation flying via the virtual structure approach, AIAA Journal of Guidance, Control and Dynamics, Vol. 27, No. 1, 2004, pp. 73–82.
25. Sandeep, S., Fidan, B., and Yu, C., Decentralized cohesive motion control of multi-agent formations, in Proceedings of the 14th Mediterranean Conference on Control and Automation, June 2006.
26. Tanner, H.G., Pappas, G.J., and Kumar, V., Leader-to-formation stability, IEEE Transactions on Robotics and Automation, Vol. 20, No. 3, 2004, pp. 443–455.
27. Tay, T. and Whiteley, W., Generating isostatic frameworks, Structural Topology, No. 11, 1985, pp. 21–69.
28. Whiteley, W., Rigidity and scene analysis, in Handbook of Discrete and Computational Geometry, J. Goodman and J. O'Rourke (eds.), CRC Press, Boca Raton, FL, 1997, pp. 893–916.
29. Whiteley, W., Some matroids from discrete applied geometry, in Matroid Theory, J.E. Bonin, J.G. Oxley, and B. Servatius (eds.), Vol. 197 of Contemporary Mathematics, American Mathematical Society, Providence, RI, 1996, pp. 171–311.
30. Yu, C., Fidan, B., and Anderson, B.D.O., Persistence acquisition and maintenance for autonomous formations, in Proceedings of the 2nd International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), December 2005, pp. 379–384.
31. Yu, C., Hendrickx, J.M., Fidan, B., Anderson, B.D.O., and Blondel, V.D., Three and higher dimensional autonomous formations: Rigidity, persistence, and structural persistence, Automatica, Vol. 43, No. 3, March 2007, pp. 387–402.
P1: Shashi November 16, 2007
16:36
7985
7985˙C009
9 Modeling and Control of Unmanned Aerial Vehicles: Current Status and Future Directions
George Vachtsevanos, Panos Antsaklis, and Kimon Valavanis
CONTENTS
9.1 Introduction
9.2 System Architecture
9.3 Formation Control
9.4 Two-Vehicle Example
9.5 Simulation Results
9.6 Target Tracking
    9.6.1 Particle Filtering in a Bayesian Framework
9.7 New Directions and Technological Challenges
    9.7.1 Technological Challenges
    9.7.2 Enabling Technologies
9.8 Conclusion
References
9.1 Introduction
Recent military and civil actions worldwide have highlighted the potential utility of unmanned aerial vehicles (UAVs). Both fixed-wing and rotary aircraft have contributed significantly to the success of several military and surveillance/rescue operations. Future combat operations will continue to place unmanned aircraft in challenging conditions, such as the urban warfare environment. However, the poor reliability, reduced autonomy, and operator workload requirements of current unmanned vehicles present a roadblock to their success. It is anticipated that future operations will require multiple UAVs performing in a cooperative mode, sharing resources and complementing other air or ground assets. Surveillance and reconnaissance tasks that rely on UAVs require sophisticated modeling, planning, and control technologies. This chapter reviews the current status of UAV technologies, with emphasis on recent developments aimed at improved autonomy and reliability of UAVs, and discusses future directions and technological challenges that must be addressed in the immediate future. We view the assembly of multiple and heterogeneous vehicles as a "system of systems" where individual UAVs are functioning as sensors or agents. Thus, networking, computing, and communications issues must be considered as the UAVs are tasked to perform surveillance and reconnaissance missions in an urban environment. The same scenario arises in similar civil applications, such as forest fire detection, rescue operations, pipeline monitoring, and so on. We will briefly survey software-enabled control technologies (fault detection and control reconfiguration, adaptive mode transitioning, and envelope protection) that are intended to endow the UAV with improved autonomy and reliability even when operating under extreme conditions. A software (middleware) platform enables real-time reconfiguration, plug-and-play, and other quality of service functions. Multiple UAVs, flying in a swarm, constitute a network of distributed (in the spatiotemporal sense) sensors that must be coordinated to complete a complex mission. Current R&D activities that concern issues of modeling, planning, and control are discussed. Here, optimum terrain coverage, target tracking, and adversarial reasoning strategies require new technologies to deal with issues of system complexity, uncertainty management, and computational efficiency.
We will pose the major technical challenges arising in the "system of systems" approach and state the need for new modeling, networking, communications, and computing technologies that must be developed and validated if such complex unmanned systems as UAVs are to perform effectively and efficiently, in conjunction with manned systems, in a variety of application domains. We will conclude by proposing possible solutions to these challenges.

Future urban warfare, as well as search and rescue, border patrol, homeland security, and other applications, will utilize an unprecedented level of automation in which human-operated, autonomous, and semiautonomous air and ground platforms will be linked through a coordinated control system [1]. Networked UAVs bring a new dimension to future combat systems that must include adaptable operational procedures, planning, and deconfliction of assets, coupled with the technology to realize such concepts. The technical challenges the control designer faces for autonomous collaborative operations stem from real-time sensing, computing, and communications requirements, environmental and operational uncertainty, hostile threats, and the emerging need for improved UAV and UAV team autonomy and reliability. Figure 9.1 shows the autonomous control level trend according to the Department of Defense UAV Roadmap [2]. The same roadmap details the need for new technologies that will address single-vehicle and multivehicle autonomy issues. The challenges increase significantly as we move up the hierarchy of the chart shown in Figures 9.2a and b from single-vehicle to multivehicle coordinated control. Moderate success has been reported thus far in meeting the lower-echelon challenges. Achieving the ultimate goal of full autonomy for a swarm of vehicles executing a complex surveillance and reconnaissance mission still remains a major challenge.

FIGURE 9.1 Autonomous control level trend. [Chart of autonomous control level versus year, 1955 to 2025. Levels: 1, remotely guided; 2, real-time health/diagnosis; 3, adapt to failures and flight conditions; 4, onboard route replan; 5, group coordination; 6, group tactical replan; 7, group tactical goals; 8, distributed control; 9, group strategic goals; 10, fully autonomous swarms. Systems plotted: Pioneer, Predator, Global Hawk, UCAV-N, AF UCAV, UCAR.]

To meet these challenges, innovative coordinated planning and control technologies, such as distributed artificial intelligence (DAI), computational intelligence, and soft computing, as well as game theory and dynamic optimization, have been investigated intensively in recent years. However, in this area, more work has been focused on solving particular problems, such as formation control and autonomous search, while less attention has been paid to the system architecture, especially from an implementation and integration point of view. Other significant concerns relate to inter-UAV communications, links to command and control, contingency management, and so on. We will briefly review in this chapter a few of the challenges referred to above and suggest possible approaches to these problems. The intent is to motivate, through application examples, the modeling, control, and communication concerns and to highlight the new directions that are needed to arrive at satisfactory solutions. We will emphasize the synergy of tools and methodologies stemming from various domains, as well as the resurfacing of classical mathematical notions that may now be called upon to solve difficult spatiotemporal dynamic situations. Recent advances in computing and communications promise to accommodate the on-line real-time implementation of mathematical algorithms that were considered intractable some years back.
Autonomous Control Level (ACL) Chart
Note: As ACL increases, capability includes, or replaces, items from lower levels.
Columns: Level; Level Descriptor; Observe (Perception/Situational Awareness); Orient (Analysis/Coordination); Decide (Decision Making); Act (Capability).

Level 10: Fully Autonomous
  Observe: Cognizant of all within battlespace.
  Orient: Coordinates as necessary.
  Decide: Capable of total independence.
  Act: Requires little guidance to do job.

Level 9: Battlespace Swarm Cognizance
  Observe: Battlespace inference: intent of self and others (allies and foes). Complex, intense environment: on-board tracking.
  Orient: Strategic group goals assigned. Enemy strategy inferred.
  Decide: Distributed tactical group planning. Individual determination of tactical goal. Individual task planning/execution. Choose tactical targets.
  Act: Group accomplishment of strategic goal with no supervisory assistance.

Level 8: Battlespace Cognizance
  Observe: Proximity inference: intent of self and others (allies and foes). Reduced dependence upon off-board data.
  Orient: Strategic group goals assigned. Enemy tactics inferred. ATR.
  Decide: Coordinated tactical group planning. Individual task planning/execution. Choose targets of opportunity.
  Act: Group accomplishment of strategic goal with minimal supervisory assistance.

Level 7: Battlespace Knowledge
  Observe: Short track awareness: history and predictive battlespace data in limited range, timeframe, and numbers.
  Orient: Tactical group goals assigned. Enemy trajectory estimated.
  Decide: Individual task planning/execution to meet goals.
  Act: Group accomplishment of tactical goal with minimal supervisory assistance.

Level 6: Real-Time Multi-Vehicle Cooperation
  Observe: Ranged awareness: onboard sensing for long range, supplemented by off-board data.
  Orient: Tactical group goals assigned. Enemy location sensed/estimated.
  Decide: Coordinated trajectory planning and execution to meet goals: group optimization.
  Act: Group accomplishment of tactical goal with minimal supervisory assistance.

Level 5: Real-Time Multi-Vehicle Cooperation
  Observe: Sensed awareness: local sensor to detect others, fused with off-board data.
  Orient: Tactical group plan assigned. RT health diagnosis; ability to compensate for most failures and flight conditions; ability to predict onset of failures (e.g., Prognostic Health Mgmt). Group diagnosis and resource management.
  Decide: On-board trajectory replanning: optimize for current and predictive conditions. Collision avoidance.
  Act: Group accomplishment of tactical plan as externally assigned. Air collision avoidance. Possible close air space separation for AAR, formation in nonthreat conditions.

FIGURE 9.2 The autonomous control level chart.
9.2 System Architecture
While networked and autonomous UAVs can be centrally controlled, this requires that each UAV communicate all the data from its sensors to a central location and receive all the control signals back. Network failures and communication delays are among the main concerns in the design of cooperative control systems. On the other hand, DAI systems provide an environment in which agents autonomously coordinate, cooperate, negotiate, make decisions, and take actions to meet the objectives of a particular application or mission. The autonomous nature of agents allows for efficient communication and processing among distributed resources.
Level 4: Fault/Event Adaptive Vehicle
  Observe: Deliberate awareness: allies communicate data.
  Orient: Tactical plan assigned. Assigned rules of engagement. RT health diagnosis: ability to compensate for most failures and flight conditions; inner loop changes reflected in outer loop performance.
  Decide: On-board trajectory replanning: event driven. Self resource management. Deconfliction.
  Act: Self-accomplishment of tactical plan as externally assigned. Medium vehicle airspace separation.

Level 3: Robust Response to Real-Time Faults/Events
  Observe: Health/status history and models.
  Orient: Tactical plan assigned. RT health diagnostics. Ability to compensate for most control failures and flight conditions.
  Decide: Evaluate status vs. required mission capabilities. Abort/RTB if insufficient.
  Act: Self-accomplishment of tactical plan as externally assigned.

Level 2: Changeable Mission
  Observe: Health/status sensors.
  Orient: RT health diagnosis. Off-board replan.
  Decide: Execute preprogrammed or uploaded plans in response to mission and health conditions.
  Act: Self-accomplishment of tactical plan as externally assigned.

Level 1: Execute Preplanned Mission
  Observe: Preloaded mission data. Flight control and navigation sensing.
  Orient: Pre/post flight BIT. Report status.
  Decide: Preprogrammed mission and abort plans.
  Act: Wide airspace separation requirements.

Level 0: Remotely Piloted Vehicle
  Observe: Flight control sensing. Nose camera.
  Orient: Telemetered data. Remote pilot commands.
  Decide: N/A.
  Act: Control by remote pilot.

FIGURE 9.2 (Continued).
For the purpose of coordinated control of multiple UAVs, each individual UAV in the team is considered as an agent or sensor with particular capabilities engaged in executing a portion of the mission. The primary task of a typical team of UAVs is to execute faithfully and reliably a critical mission while satisfying local survivability conditions. In order to define the application domain, we adopt an assumed mission scenario of a group of UAVs executing reconnaissance and surveillance (RS) missions in an urban warfare environment, as depicted in Figure 9.3. A "system of systems" approach suggests a hierarchical architecture for the coordinated control of multiple UAVs. The hierarchical architecture, shown in Figure 9.4, features an upper level with global situation awareness and team mission planning; a middle level with local knowledge, formation control, and obstacle avoidance; and a lower level that interfaces with onboard baseline controllers, sensors, and communication and weapon systems. Each level consists of several interacting agents with dedicated functions. The formation control problem is viewed as a pursuit game of n pursuers and n evaders. Stability of the formation of vehicles is guaranteed if the vehicles can reach their destinations within a specified time, assuming that the destination points are avoiding the vehicles in an optimal fashion. The vehicle model is simplified to a point mass with an acceleration limit. Collision avoidance is achieved by designing the value function so that it ensures that the vehicles move away from one another when they come too close together. Simulation results are provided to verify the performance of the proposed algorithms.

FIGURE 9.3 A team of five UAVs executing reconnaissance and surveillance missions in an urban warfare environment. [Illustration labels: Urban Warfare; GTMax (two); GTMav; OAV; fixed-wing UAV; manned vehicle; ground sensors; moving target; soldiers; sniper; commander; operator.]

The highest level of the control hierarchy features functions of global situation awareness and teamwork. The mission planning agent is able to generate and refine mission plans for the team, generate or select flight routes, and create operational orders. It is also responsible for keeping track of the team's plan and goals, and of team members' status. The overall mission is usually planned by the command and control center based on the capabilities of each individual UAV agent, and is further decomposed into tasks and subtasks, which are finally allocated to the UAV assets (individually or in coordination with other vehicles). This can usually be cast as a constrained optimization problem and tackled with various approaches, such as integer programming, graph theory, and so on. Market-based methods [3–5] and especially auction theory [6,7] can be applied as a solution to autonomous mission replanning. Planning the UAVs' flight route is also an integral part of mission planning. A modified A* search algorithm, which
FIGURE 9.4 A generic hierarchical multiagent system architecture. (Figure labels: Command; Manned; Level 3 Global Knowledge: Team Mission Planning, Global Situation, Global Performance, Knowledge Fusion, QoS Assessment; Level 2 Local Knowledge: Formation Control, Moving Obstacle, Local Situation, FDI/Reconfigurable, Local Mission; Level 1 Behavioral Knowledge: Vehicle, Weapon System, Communication, Sensing Agent, ...)
attempts to minimize a suitable cost function consisting of the weighted sum of distance, hazard, and maneuverability measures, [8–10] can be utilized to facilitate the design of the route planner. In the case of a leader-follower scenario, an optimal route is generated for the leader, while the followers fly in close formation in the proximity of the leader. The global situation awareness agent, interacting with the knowledge fusion agent, evaluates the world conditions based on data gathered from each UAV (and ground sensors if available) and reasons about the enemy’s likely actions. Adversarial reasoning and deception reasoning are two important tasks executed here. The global performance measurement agent measures the performance of the team and suggests team reconfiguration or mission replanning, whenever necessary. Quality of service (QoS) is assessed to make the best effort to accomplish the mission and meet the predefined quality criteria. Real-world implementation of this level is not limited to the agents depicted in the figure. For example, in heterogeneous agent societies, knowledge of coordination protocols and languages may also reside [11,12].
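As a rough illustration of the route-planning idea above, the following sketch runs A* on a small grid whose step cost is a weighted sum of distance, hazard, and a turn penalty that stands in for maneuverability. The grid representation, the weights, and the function name `plan_route` are assumptions made for this example; they are not taken from References [8–10].

```python
import heapq
import itertools
import math


def plan_route(hazard, start, goal, w_dist=1.0, w_hazard=5.0, w_turn=0.5):
    """A* on a 4-connected grid; hazard[r][c] >= 0 is a per-cell hazard score.

    Step cost = w_dist + w_hazard * hazard(next cell) + w_turn * (heading change).
    The search state is (cell, heading) so that the turn penalty is well defined.
    """
    rows, cols = len(hazard), len(hazard[0])
    moves = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    tie = itertools.count()  # tie-breaker so the heap never compares states

    def heuristic(cell):
        # Straight-line distance times w_dist never overestimates the true cost.
        return w_dist * math.hypot(goal[0] - cell[0], goal[1] - cell[1])

    start_state = (start, None)
    best = {start_state: 0.0}
    parent = {start_state: None}
    frontier = [(heuristic(start), next(tie), 0.0, start_state)]
    while frontier:
        _, _, cost, state = heapq.heappop(frontier)
        cell, heading = state
        if cell == goal:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1], cost
        if cost > best[state]:
            continue  # stale heap entry
        for move in moves:
            nxt = (cell[0] + move[0], cell[1] + move[1])
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            turn = 0.0 if heading in (None, move) else 1.0
            new_cost = cost + w_dist + w_hazard * hazard[nxt[0]][nxt[1]] + w_turn * turn
            nxt_state = (nxt, move)
            if new_cost < best.get(nxt_state, math.inf):
                best[nxt_state] = new_cost
                parent[nxt_state] = state
                heapq.heappush(
                    frontier, (new_cost + heuristic(nxt), next(tie), new_cost, nxt_state)
                )
    return None, math.inf


# Hypothetical 3x3 map with one hazardous cell in the middle.
grid = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
route, route_cost = plan_route(grid, (1, 0), (1, 2))
```

Raising `w_hazard` pushes routes away from hazardous cells at the price of longer paths, which is the trade-off a weighted-sum cost is meant to expose.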
9.3 Formation Control
The problem of finding a control algorithm that will ensure that multiple autonomous vehicles can maintain a formation while traversing a desired path and avoid intervehicle collisions will be referred to as the formation control problem. The formation control problem has recently received considerable attention due in part to its wide range of applications in aerospace and robotics. A classic example involving the implementation of the virtual potential function approach is presented in Reference [13]. The authors performed simulations on a two-dimensional system, which proved to be well behaved. However, as they mention in their conclusion, the drawback of the virtual potential function approach is the possibility of being “trapped” in local minima. Hence, if local minima exist, one cannot guarantee that the system is stable. In Reference [14] the individual trajectories of autonomous vehicles moving in formation were generated by solving the optimal control problem at each time step. This is computationally demanding and hence not possible to perform in real time with current hardware. This chapter views the formation control problem from a two-player differential game perspective, which not only provides a framework for determining acceptable initial vehicle deployment conditions but also gives insight into the formation maneuvers that can be performed while maintaining the formation. The formation control problem can be regarded as a pursuit game, except that it is, in general, much more complex in terms of the combined dynamic equations, as the system consists of n pursuers and n evaders instead of only one of each. However, if the group of vehicles is viewed as the pursuer and the group of desired points in the formation as the evader, the problem is essentially reduced to a standard, albeit much more complex, pursuit game.
Differential game theory was initially used to determine optimal military strategies in continuous time conflicts governed by some given dynamics and constraints [15]. One such application is the so-called pursuit game, in which a pursuer has to collide with an evading target. Naturally, in order to solve such a problem it is advantageous to know the dynamics and the positional information of both the evader and the pursuer, that is, the pursuit game will be viewed as a perfect information game. Stability of the formation of vehicles is guaranteed if the vehicles can reach their destination within some specified time, assuming that the destination points are avoiding the vehicles in an optimal fashion. It seems counterintuitive that the destination points should be avoiding the vehicles optimally; however, if the vehicles can reach the points under such conditions then they will always be able to reach their destination. As a consequence of our stability criterion, it is necessary not only to determine the control strategies of the vehicles but also the optimal avoidance
strategies of the desired points. Let us label the final control vector of the vehicles by $\bar{\phi}$ and the final control vector of the desired points by $\bar{\psi}$. Then, the main equation which has to be satisfied is

\[
\min_{\phi} \max_{\psi} \Big[ \sum_{j} V_j \cdot f_j(x, \phi, \psi) + G(x, \phi, \psi) \Big] = 0
\]

which has to be true for both $\bar{\phi}$ and $\bar{\psi}$. The $f_j(x, \phi, \psi)$ term is the jth dynamic equation governing the system, and the $V_j$ is the corresponding value of the game. $G(x, \phi, \psi)$ is a predetermined function which, when integrated, provides the payoff of the game. Notice that the only quantity that is not specified in the equation is the $V_j$ term. From the main equation it is possible to determine the retrograde path equations (RPEs), which will have to be solved to determine the actual paths traversed by the vehicles in the formation. However, initial conditions of the RPEs will have to be considered in order to integrate the RPEs. These initial condition requirements provide us with the ability to introduce tolerance boundaries, within which we say that the formation has settled. Such boundaries naturally add complexity to the problem; however, they also provide a framework for positional measurement errors.

The above formulation suggests a way for approaching the solution to the differential game. However, how does one ensure that intervehicle collisions are avoided? To ensure this, it is necessary to consider the payoff function determined by the integral of $G(x, \phi, \psi)$. As an example, if we simply seek that the vehicles must reach their goal within a certain time $\tau$, then $G(x, \phi, \psi) = 1$. This can be verified by evaluating $\int_0^{\tau} G(x, \phi, \psi)\,dt = \tau$. Hence, we have restricted our solutions to the initial vehicle deployment, which will ensure that the vehicles will reach the desired points in $\tau$ time. However, if $G(x, \phi, \psi)$ is changed to penalize proximity of vehicles to one another, only initial conditions that ensure collision-free trajectories will be valid. However, $G(x, \phi, \psi)$ does not provide the means to perform the actual collision avoidance, but merely limits the solution space. So, in order to incorporate collision avoidance into the controller, one can either change the value function or add terms to the system of dynamic equations.
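To make the role of the payoff integrand more concrete, the sketch below evaluates the integral of G along a sampled trajectory. With no penalty active the result is simply the elapsed time, as in the minimum-time case above. The quadratic proximity penalty, the parameters `d_safe` and `weight`, and the function names are assumptions chosen for illustration; the chapter does not specify the form of the penalty.

```python
import math


def running_cost(positions, d_safe=1.0, weight=10.0):
    """Hypothetical integrand G: 1 (time-to-go) plus a quadratic penalty
    for every pair of vehicles closer than d_safe."""
    cost = 1.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            gap = math.dist(positions[i], positions[j])
            if gap < d_safe:
                cost += weight * (d_safe - gap) ** 2
    return cost


def payoff(trajectory, dt, **kwargs):
    """Trapezoidal approximation of the integral of G over a sampled trajectory.

    trajectory is a list of samples; each sample is a list of vehicle positions.
    """
    values = [running_cost(sample, **kwargs) for sample in trajectory]
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


# Two vehicles, 11 samples spaced 0.1 apart (total time 1.0).
far_apart = [[(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)] for _ in range(11)]
too_close = [[(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)] for _ in range(11)]
payoff_far = payoff(far_apart, dt=0.1)
payoff_close = payoff(too_close, dt=0.1)
```

In this sketch a trajectory that keeps the vehicles apart costs exactly its duration, while one that violates the separation distance costs more, so a bound on the payoff excludes such trajectories.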
9.4 Two-Vehicle Example
In order to illustrate some of the advantages and disadvantages with the differential game approach to formation control, consider the following system of simple point “helicopters,” that is, points that can move in three dimensions
governed by the following dynamic equations:

\[
\begin{aligned}
\dot{x}_i &= v_{xi}, & \dot{v}_{xi} &= F_i \cos(\phi_{2i-1}) \sin(\phi_{2i}) - k_i \cdot v_{xi} \\
\dot{y}_i &= v_{yi}, & \dot{v}_{yi} &= F_i \sin(\phi_{2i-1}) \sin(\phi_{2i}) - k_i \cdot v_{yi} \\
\dot{z}_i &= v_{zi}, & \dot{v}_{zi} &= F_i \cos(\phi_{2i}) - k_i \cdot v_{zi}
\end{aligned}
\]

where $i = 1, 2$. The two desired “points” are described by one set of dynamic equations. This simply implies that there is a constant distance separating the two desired points, and that the formation can only perform translations and not rotations in the three-dimensional space. Hence the dynamic equations become:

\[
\begin{aligned}
\dot{x}_d &= v_{xd}, & \dot{v}_{xd} &= F_d \cos(\psi_1) \sin(\psi_2) - k_d \cdot v_{xd} \\
\dot{y}_d &= v_{yd}, & \dot{v}_{yd} &= F_d \sin(\psi_1) \sin(\psi_2) - k_d \cdot v_{yd} \\
\dot{z}_d &= v_{zd}, & \dot{v}_{zd} &= F_d \cos(\psi_2) - k_d \cdot v_{zd}
\end{aligned}
\]

In the above dynamic systems, the $k_i$ and $k_d$ factors are simply linear drag terms to ensure that the velocities are bounded, and the $F_d$ and $F_i$ terms are the magnitudes of the applied forces. Figure 9.5 shows the coordinate system and the associated angles.

FIGURE 9.5 Definition of angles. (Axes: X, Y, Z; angle labels: $\phi_{2i-1}, \psi_1$ and $\phi_{2i}, \psi_2$.)
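The point-“helicopter” model above is straightforward to integrate numerically. The sketch below advances one vehicle with forward-Euler steps under a constant force and constant control angles, and shows that the linear drag bounds the speed at roughly F/k. The step size, the parameter values, and the function names are assumptions for illustration only.

```python
import math


def step(state, force, azimuth, polar, drag, dt):
    """One forward-Euler step of the point-mass model.

    state = (x, y, z, vx, vy, vz); azimuth plays the role of phi_{2i-1}
    and polar the role of phi_{2i} in the dynamic equations.
    """
    x, y, z, vx, vy, vz = state
    ax = force * math.cos(azimuth) * math.sin(polar) - drag * vx
    ay = force * math.sin(azimuth) * math.sin(polar) - drag * vy
    az = force * math.cos(polar) - drag * vz
    return (x + dt * vx, y + dt * vy, z + dt * vz,
            vx + dt * ax, vy + dt * ay, vz + dt * az)


def simulate(state, force, azimuth, polar, drag, dt, steps):
    """Apply constant controls for a fixed number of Euler steps."""
    for _ in range(steps):
        state = step(state, force, azimuth, polar, drag, dt)
    return state


# Constant push along +X (azimuth 0, polar angle pi/2) from rest.
final_state = simulate((0.0,) * 6, force=2.0, azimuth=0.0, polar=math.pi / 2,
                       drag=1.0, dt=0.01, steps=3000)
terminal_speed = final_state[3]  # approaches force / drag
```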
Substituting the dynamical equations into the main Equation (9.1), we obtain the following expressions:

\[
\min_{\phi} \big[ F_1 \cdot ( V_{v_{x1}} \cos(\phi_1) \sin(\phi_2) + V_{v_{y1}} \sin(\phi_1) \sin(\phi_2) + V_{v_{z1}} \cos(\phi_2) ) + F_2 \cdot ( V_{v_{x2}} \cos(\phi_3) \sin(\phi_4) + V_{v_{y2}} \sin(\phi_3) \sin(\phi_4) + V_{v_{z2}} \cos(\phi_4) ) \big]
\]

and

\[
\max_{\psi} \big[ F_d \cdot ( V_{v_{xd}} \cos(\psi_1) \sin(\psi_2) + V_{v_{yd}} \sin(\psi_1) \sin(\psi_2) + V_{v_{zd}} \cos(\psi_2) ) \big] \tag{9.1}
\]
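To obtain the control law that results from the max-min solution of Equation (9.1), the following lemma is used:

LEMMA 1
Let $a, b \in \mathbb{R}$ and $\rho = \sqrt{a^2 + b^2}$. Then

\[
\max_{\theta} \, ( a \cdot \cos(\theta) + b \cdot \sin(\theta) )
\]

is obtained where

\[
\cos(\theta) = \frac{a}{\rho} \quad \text{and} \quad \sin(\theta) = \frac{b}{\rho},
\]

and the max is $\rho$.

By combining Lemma 1 with the solution of Equation (9.1), the following control strategy for vehicle 1 is found:

\[
\begin{aligned}
\cos(\bar{\phi}_1) &= -\frac{V_{v_{x1}}}{\rho_1}, & \sin(\bar{\phi}_1) &= -\frac{V_{v_{y1}}}{\rho_1} \\
\cos(\bar{\phi}_2) &= -\frac{V_{v_{z1}}}{\rho_2}, & \sin(\bar{\phi}_2) &= -\frac{\rho_1}{\rho_2}
\end{aligned}
\]

where

\[
\rho_1 = \sqrt{V_{v_{x1}}^2 + V_{v_{y1}}^2} \quad \text{and} \quad \rho_2 = \sqrt{V_{v_{x1}}^2 + V_{v_{y1}}^2 + V_{v_{z1}}^2}
\]

Similar results are obtained for vehicle 2. For the optimal avoidance strategy of the desired points, we obtain the following:

\[
\begin{aligned}
\cos(\bar{\psi}_1) &= +\frac{V_{v_{xd}}}{\rho_{d1}}, & \sin(\bar{\psi}_1) &= +\frac{V_{v_{yd}}}{\rho_{d1}} \\
\cos(\bar{\psi}_2) &= +\frac{V_{v_{zd}}}{\rho_{d2}}, & \sin(\bar{\psi}_2) &= +\frac{\rho_{d1}}{\rho_{d2}}
\end{aligned}
\]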
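Lemma 1 and the avoidance strategy of the desired points can be checked numerically. The sketch below computes the maximizing angles for a given triple of value-function components and compares the resulting objective with its predicted maximum. It covers only the maximizing (desired-point) case, and it assumes that the two radii are defined for the desired points in the same way as $\rho_1$ and $\rho_2$ are for vehicle 1. The function names and test values are hypothetical.

```python
import math
import random


def maximizing_angles(v_x, v_y, v_z):
    """Angles (psi1, psi2) maximizing
    v_x*cos(psi1)*sin(psi2) + v_y*sin(psi1)*sin(psi2) + v_z*cos(psi2)."""
    rho1 = math.hypot(v_x, v_y)
    psi1 = math.atan2(v_y, v_x)   # cos(psi1) = v_x/rho1, sin(psi1) = v_y/rho1
    psi2 = math.atan2(rho1, v_z)  # cos(psi2) = v_z/rho2, sin(psi2) = rho1/rho2
    return psi1, psi2


def objective(v, psi1, psi2):
    """Bracketed term of the max expression, without the force magnitude."""
    v_x, v_y, v_z = v
    return (v_x * math.cos(psi1) * math.sin(psi2)
            + v_y * math.sin(psi1) * math.sin(psi2)
            + v_z * math.cos(psi2))


v = (0.3, -1.2, 0.8)
psi1, psi2 = maximizing_angles(*v)
best = objective(v, psi1, psi2)
predicted = math.sqrt(sum(c * c for c in v))  # two applications of Lemma 1

# Random angles should never beat the closed-form maximizer.
rng = random.Random(1)
samples = [objective(v, rng.uniform(-math.pi, math.pi), rng.uniform(-math.pi, math.pi))
           for _ in range(2000)]
```

Applying the lemma first to the azimuth and then to the polar angle is what produces the nested radii: the inner maximization yields the planar radius, which becomes one of the two coefficients in the outer maximization.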
From this, we see that the retrograde equations have the following form:

\[
\begin{aligned}
\mathring{v}_{x1} &= -F_1 \cdot \frac{V_{v_{x1}}}{\rho_2} + k_1 \cdot v_{x1} \\
\mathring{x}_1 &= -v_{x1} \\
\mathring{V}_{x_1} &= 0 \\
\mathring{V}_{v_{x1}} &= V_{x_1} - k_1 \cdot V_{v_{x1}}
\end{aligned}
\]

For this example, the final value will be zero, and occurs when the difference between the desired position and the actual position is zero. Naturally, to obtain a more general solution, a solution manifold should be used; however, in order to display the utility of this approach, the previously mentioned final conditions will suffice. The closed-form expression of the value function is then of the form:

\[
V_{v_{x1}} = (x_1 - x_d) \cdot \frac{1 - e^{-k_1 t}}{k_1}
\]
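As a brief check on the form of this closed-form expression, the last retrograde equation can be integrated directly. The sketch below assumes that $V_{x_1}$ equals a constant $c$ along the retrograde path (which the third retrograde equation implies) and that $V_{v_{x1}}$ vanishes at retrograde time $t = 0$; the symbol $c$ and this initial condition are notation introduced here for illustration.

```latex
\mathring{V}_{v_{x1}} = c - k_1 V_{v_{x1}}, \qquad V_{v_{x1}}(0) = 0
\quad \Longrightarrow \quad
V_{v_{x1}}(t) = c \, \frac{1 - e^{-k_1 t}}{k_1}
```

This has the same form as the expression stated in the text, with $c = x_1 - x_d$.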
It should be noted that the above analysis could be performed on a reduced set of differential equations, where each equation would express the differences in distance and velocity, and hence reduce the number of differential equations by a factor of 2. However, for the sake of clarity, the analysis is performed on the actual position and velocity differential equations. Furthermore, it should also be noted that this solution closely resembles the isotropic rocket pursuit game described in Reference [15]. This is due to the fact that the dynamic equations are decoupled, and hence working within a three-dimensional framework will not change the problem considerably.
9.5 Simulation Results
From the closed-form expression of the control presented in the previous section, it is clear that the optimal strategies are in fact bang-bang controllers. Because the forces in the system are not dependent on the proximity of the vehicles to the desired points, there will always exist some positional error. It is, however, possible to resolve this problem simply by switching controllers at some error threshold, or introducing terms that minimize the force terms F1 and F2 as the vehicles approach the desired points. Figures 9.6 and 9.7 show the tracking capabilities of the derived controller. The two vehicles are attempting to follow two parameterized circular trajectories with a radius of three. In Figure 9.6 the vehicles can move quickly enough to actually reach the desired trajectories, whereas in Figure 9.7 the velocities of the vehicles are not sufficient to reach the desired trajectories. In the latter case, the vehicles simply move in a smaller circle, which ensures that the error remains constant.
FIGURE 9.6 Two-vehicle simulation with sufficient vehicle velocities. (3D position plot; axes X, Y, Z; legend: Vehicle 1, Vehicle 2, Desired Trajectory 1, Desired Trajectory 2.)
FIGURE 9.7 Two-vehicle simulation with insufficient vehicle velocities. (3D position plot; axes X, Y, Z; legend: Vehicle 1, Vehicle 2, Desired Trajectory 1, Desired Trajectory 2.)
9.6 Target Tracking
The term target tracking is often used to refer to the task of finding or estimating the motion parameters (mainly the location and direction) of a moving target in a time sequence of measurements. This task is achievable as long as the target is within the sensor’s field of view (FOV). If the target moves out of the FOV, the tracking task fails until the target reenters the sensor’s FOV. To address this problem, the sensor is mounted on a moving platform such as a UAV. We call the new setup (the sensor plus the UAV) an agent. Thus, we can start a second task, other than the target tracking task, to (reactively or proactively) move the sensor to guarantee that the target stays in view. That second task is what we call the agent placement task. The work presented in this chapter is of the active sensing-based target tracking variety, in which both tasks discussed above are integrated.
There exist a number of efforts to formally describe the dynamic agent placement problem for target tracking. The choice is made to use a formulation of the variety of Weighted Cooperative Multi-robot Observation of Multiple Moving Targets (W-CMOMMT) [16,17] because it captures the multiple-observer–multiple-target scenario with target prioritization. W-CMOMMT can be shown to be a nondeterministic polynomial-time (NP)-hard problem [18]. The agent (sensor) placement problem is formulated by defining a global utility function to be optimized given a graph representing the region of interest, a team of agents, and a set of targets. A coarse motion model is developed first where target transitions follow a stochastic model described by an Mth-order Markov chain. Agents use the model to predict the target locations at future time instants as probability distributions. The algorithm attempts to maximize the coverage by searching for a set of observation points at each time step. A real-time dynamic programming tool is called upon to solve the maximization problem. Details of the approach can be found in Reference [18].

Particle filters have recently been successful in tracking mobile targets in video [19,21]. The video tracking problem consists of determining the position of a target within a particular video frame based on information from all past frames. Information such as size, color, and motion characteristics of the target is known a priori. In the particle filter framework, this information is used to initialize the filter in the first few frames of video. Thereafter, using a model similar to that in Reference [20], the state of each particle is updated as the video progresses from one frame to the next. At each step, color and motion data are collected for each particle to determine which particles have a high probability of correctly tracking the target. On the next iteration, particles are drawn according to this probability. Thus, successful particles “survive” and are used in subsequent frames, while the other particles “die.”
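The predict, weight, and resample cycle described above can be sketched for a one-dimensional state as follows. This is a minimal bootstrap particle filter, not the video tracker used in the chapter: the random-walk motion model, the Gaussian measurement likelihood, the noise levels, and the function name `particle_filter_step` are all assumptions made for illustration.

```python
import math
import random


def particle_filter_step(particles, measurement, rng, process_std=0.5, meas_std=1.0):
    """One iteration of a bootstrap particle filter for a 1-D random-walk state."""
    # Prediction: propagate each particle through the assumed motion model.
    predicted = [x + rng.gauss(0.0, process_std) for x in particles]

    # Update: weight each particle by the assumed Gaussian measurement likelihood.
    weights = [math.exp(-0.5 * ((measurement - x) / meas_std) ** 2) for x in predicted]
    total = sum(weights)
    if total == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Point estimate: weighted mean of the particle set.
    estimate = sum(w * x for w, x in zip(weights, predicted))

    # Resampling: likely particles "survive", unlikely ones "die".
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
estimate = None
for _ in range(30):
    # A stationary target at 3.0, observed without noise for this demonstration.
    particles, estimate = particle_filter_step(particles, 3.0, rng)
```

Because the particles are resampled at every iteration, the previous weights do not need to be carried forward, and each new weight depends only on the current likelihood.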
9.6.1 Particle Filtering in a Bayesian Framework

The objective of Bayesian state estimation is to estimate the posterior probability density function (pdf) of a state, $x_k$, based on all previous measurements, $z_{1:k}$. This pdf, $p(x_k \mid z_{1:k})$, can be determined in two steps, prediction and update. In the prediction step, the state update model $p(x_k \mid x_{k-1})$ is used to determine the prior pdf $p(x_k \mid z_{1:k-1})$. If a first-order Markov model is assumed, then the prior is given as:

\[
p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1}) \, p(x_{k-1} \mid z_{1:k-1}) \, dx_{k-1}
\]

After the measurement $z_k$ is made, the prior is updated using Bayes’ rule:

\[
p(x_k \mid z_{1:k}) = \frac{p(z_k \mid x_k) \, p(x_k \mid z_{1:k-1})}{p(z_k \mid z_{1:k-1})}
\]

In most cases, the above equations cannot be determined analytically. The Kalman filter is a well-known exception. However, when a Kalman filter is used, the system must be linear, with Gaussian distributions. The particle filter is one way to approximate the above equations. A particle filter iteratively approximates the posterior pdf as a set:

\[
S_k = \big\{ \big( x_k^{(i)}, w_k^{(i)} \big) \;\big|\; i = 1, \ldots, n \big\}
\]

where $x_k^{(i)}$ represents a point in the state space, and $w_k^{(i)}$ is the importance weight associated with this point. The $w_k^{(i)}$ are nonnegative, and sum to unity. At each iteration, the particles are updated using the system dynamics and sampling from:

\[
p\big( x_k^{(i)} \mid x_{k-1}^{(i)} \big)
\]

Measurements are then taken at each particle and the weights are updated using:

\[
w_k^{(i)} \propto w_{k-1}^{(i)} \, p\big( z_k \mid x_k^{(i)} \big)
\]

If the particles are resampled at each iteration, then the previous weights may be neglected and the above equation becomes:

\[
w_k^{(i)} \propto p\big( z_k \mid x_k^{(i)} \big)
\]

After the weights are determined to within a scale factor, they are normalized such that their sum is equal to unity. It has been shown [21] that the posterior pdf estimated using particle filtering converges to the actual pdf as the number of particles increases.

A particle filter was used to track a soldier as he maneuvered in an urban environment 2. Frames were grabbed from a movie at a rate of 30 Hz. The movie camera was held by a human operator. Therefore, there are a number of vibrations in the video, and the zoom is adjusted during the video.
FIGURE 9.8 Typical output frames. Frames are approximately 1.7 seconds apart from each other.
A neural network construct is employed to adapt the particle filter so that a minimum number of particles is called upon and these particles are focused on the target even when the latter emerges from an occlusion. A few frames of the output are shown in Figure 9.8. The box represents a weighted average of the ten best particles. The set of “lights” in the upper left corner of each frame is used to indicate the output of the neural network. If the lowest “light” is “illuminated,” the neural network has output the lowest confidence level. If the second lowest is “illuminated,” the neural network has output the second lowest confidence level. If the middle two are “illuminated,” the neural network has output the second highest confidence level. If the top three are “illuminated,” the neural network has output the highest confidence level.
9.7 New Directions and Technological Challenges
9.7.1 Technological Challenges
From single system to “system of systems.”
• Modeling: Spatiotemporal modeling paradigms are needed for real-time planning and control of networked systems.
• Control: Hierarchical/intelligent control of multiple networked systems (agents, sensors); new reasoning paradigms for tracking, pursuit-evasion, surveillance/reconnaissance, coordinated control, planning and scheduling, obstacle avoidance, and so on.
• Networking and communications: Inter- and intrasystems reliable and secure communication protocols; need for command and control and supervisory functions; bandwidth and other quality of service requirements.
• Computing: On-platform computational requirements; hardware and software architectures; open systems architectures.
• Sensors and sensing strategies: Hardware/software requirements; performance and effectiveness metrics; networked sensors.
• Performance metrics/verification and validation: Defining metrics for design and performance assessment; formal methods for verification and validation.
9.7.2 Enabling Technologies
• Modeling: New modeling techniques are required to capture the coupling between individual system/sensor dynamics, communications, and so on, with system of systems behaviors. Hybrid system approaches will play a key role. Means to represent and manage uncertainty. Software models for improved QoS. Spatiotemporal models of distributed agents (sensors) are required to integrate system and motion dependencies, contingency planning, and so on.
• Control: Intelligent and hierarchical/distributed control concepts must be developed and expanded to address “system of systems” configurations. Game-theoretic notions and optimization algorithms running in almost real time to assist in cooperative control and adversarial reasoning. Control of networks of dynamic agents.
• Networking and communications: Communication protocols and standards.
• Computing: Embedded processing requirements; new and reliable, fault-tolerant computing platforms; software reliability issues.
• Sensors and sensing strategies: Innovative concepts and technologies in wireless communications; improved and reliable/cost-effective sensor suites; “smart” sensors and sensing strategies; data processing, data mining, sensor fusion, and so on.
• Performance metrics/verification and validation: Need new system of systems performance and effectiveness metrics to assist in the design, verification/validation, and assessment of networked systems.
9.8 Conclusion
Federated systems consisting of multiple unmanned aerial vehicles performing complex missions present new challenges to the control community. UAVs must possess attributes of autonomy in order to function effectively in a “system of systems” configuration. Coordinated/collaborative control of UAV swarms demands novel technologies that integrate modeling, control, and communications/computing concerns into a single architecture. Typical application domains include reconnaissance and surveillance missions in an urban environment, target tracking and evasive maneuvers, search and rescue operations, homeland security, and so on. Major technological challenges remain to be addressed for such UAV swarms, or similar federated system of systems configurations, to perform efficiently and reliably. Excessive operator load, autonomy issues, and reliability concerns have thus far limited their widespread utility. The systems and controls community is called upon to play a major role in the introduction of breakthrough technologies in this exciting area.
References
1. Vachtsevanos, G., Tang, L. and Reimann, J., “An Intelligent Approach to Coordinated Control of Multiple Unmanned Aerial Vehicles,” Proceedings of the American Helicopter Society 60th Annual Forum, Baltimore, MD, June 7–10, 2004.
2. Office of the Secretary of Defense (Acquisition, Technology, & Logistics), Air Warfare. “OSD UAV Roadmap 2002-2027.” December 2002.
3. Voos, H., “Market-based Algorithms for Optimal Decentralized Control of Complex Dynamic Systems,” Proc. of the 38th IEEE Conference on Decision and Control, Vol. 40, pp. 3295–3296, Phoenix, AZ, 1999.
4. Clearwater, S. H. E., Market-Based Control: A Paradigm for Distributed Resource Allocation, Singapore: World Scientific, 1996.
5. Walsh, W. and Wellman, M., “A Market Protocol for Decentralized Task Allocation,” Proc. of the 3rd International Conference on Multiagent Systems, 1998.
6. Engelbrecht, W. R., Shubik, M. and Stark, R. M., Auctions, Bidding, and Contracting: Uses and Theory, New York: New York University Press, 1983.
7. Bertsekas, D., Auction Algorithms for Network Flow Problems: A Tutorial Introduction, Computational Optimization and Applications, Vol. 1, pp. 7–66. Netherlands: Springer, 1992.
8. Vachtsevanos, G., Kim, W., Al-Hasan, S., Rufus, F., Simon, M., Schrage, D., and Prasad, J. V. R., “Mission Planning and Flight Control: Meeting the Challenge with Intelligent Techniques,” Journal of Advanced Computational Intelligence, Vol. 1(1), pp. 62–70, October 1997.
9. Al-Hasan, S. and Vachtsevanos, G., “Intelligent Route Planning for Fast Autonomous Vehicles Operating in a Large Natural Terrain,” Journal of Robotics and Autonomous Systems, Vol. 40, pp. 1–24, 2002.
10. Al-Hasan, S. and Vachtsevanos, G., “A Neural Fuzzy Controller for Moving Obstacle Avoidance,” Third International NAISO Symposium on Engineering of Intelligent Systems, Malaga, Spain, September 24–27, 2002.
11. Sousa, J. B. and Pereira, F., “A Framework for Networked Motion Control,” Proceedings of the 42nd IEEE Conference on Decision and Control, pp. 1526–1531, Hawaii, December 2003.
12. Howard, M., Hoff, B. and Lee, C., “Hierarchical Command and Control for Multi-agent Teamwork,” Proceedings of 5th Intl. Conf. on Practical Application of Intelligent Agents and Multi-Agent Technology (PAAM2000), pp. 1–13, Manchester, U.K., April 10, 2000.
13. Baras, J. S., Tan, X. and Hovareshti, P., “Decentralized Control of Autonomous Vehicles,” Proceedings of the 42nd IEEE Conference on Decision and Control, pp. 1532–1537, Hawaii, December 2003.
14. Dunbar, W. B. and Murray, R. M., “Model Predictive Control of Coordinated Multi-Vehicle Formation,” Proceedings of the 41st IEEE Conference on Decision and Control, pp. 4631–4636, Las Vegas, December 2002.
15. Isaacs, R., Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. New York: John Wiley & Sons, 1965.
16. Werger, B. and Mataric, M., “Broadcast of Local Eligibility: Behavior-Based Control for Strongly Cooperative Robot Teams,” Proceedings of the Fourth International Conference on Autonomous Agents, pp. 21–22, ACM Press: Barcelona 2000.
17. Werger, B. and Mataric, M. J., “From Insect to Internet: Situated Control for Networked Robot Teams,” Annals of Mathematics and Artificial Intelligence, 31(1-4), pp. 173–197, 2001.
18. Hegazy, T. and Vachtsevanos, G., “Dynamic Agent Deployment for Tracking Moving Targets,” Proceedings of the Twelfth Mediterranean Conference on Control and Automation, Kusadasi, Aydin, Turkey, 2004.
19. Perez, P., Hue, C. and Vermaak, J., “Color-Based Probabilistic Tracking,” Proceedings of the European Conference on Computer Vision, pp. 134–149, 2002.
20. Perez, P., Vermaak, J. and Blake, A., “Data Fusion for Visual Tracking with Particles,” Proceedings of the IEEE, Vol. 92, pp. 495–513, 2004.
21. Arulampalam, A. and Maskell, S., “A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking,” IEEE Transactions on Signal Processing, Vol. 50, pp. 174–188, 2002.
P1: Binaya Dash October 24, 2007
17:56
7985
7985˙C010
10 A Framework for Large-Scale Multi-Robot Teams
Andrew Drenner and Nikolaos Papanikolopoulos
CONTENTS
10.1 Introduction
  10.1.1 Problem Description
  10.1.2 Organization
10.2 Related Works
  10.2.1 Resource Distribution Problem
    10.2.1.1 Homogeneous Teams
    10.2.1.2 Heterogeneous Teams
    10.2.1.3 How this Approach Differs
  10.2.2 Autonomous Docking and Deployment
    10.2.2.1 Autonomous Recharge
    10.2.2.2 Deployment Methods
    10.2.2.3 Deployment Models
    10.2.2.4 Docking Methods
  10.2.3 Cooperative Robotics
10.3 Challenges and Issues
  10.3.1 Docking
    10.3.1.1 General Theory of Docking
    10.3.1.2 Traditional Algorithms for Docking
    10.3.1.3 Traditional Assumptions and Limitations of Docking
  10.3.2 Hardware
10.4 Optimization of Behaviors
  10.4.1 Cooperative Maneuvering for Energy Conservation
10.5 Simulation
  10.5.1 Results
10.6 Future Work
  10.6.1 Simulation Extensions
  10.6.2 Hardware Innovations
  10.6.3 Applications
10.7 Conclusions
Acknowledgments
References

Robotic teams comprised of heterogeneous members have many unique capabilities that make them suitable for operation in scenarios that may be hazardous or harmful to humans. The effectiveness of these teams requires that the team members take advantage of the strengths of one another to overcome individual limitations. Some of these strengths and deficiencies come in the form of locomotion, sensing, processing, communication, and available power. Many times larger robots can be used to transport, deploy, and recover smaller deployable robots. There has been some work in the area of marsupial systems, but in general marsupial systems represent teams of two or three robots. The basic design of the majority of marsupial systems does not have the scalability to handle larger-scale teams, which offer increased performance and redundancy in complex scenarios. The work presented here deals with the modeling of a much larger-scale robotic team that utilizes principles of marsupial systems. Specifically, the power consumption of large-scale robotic teams is modeled and used to optimize the location of mobile resupply stations. Each deployable member of the distributed robotic team has a specific model comprised of a series of behaviors dictating the actions of the robot. The transitions between these behaviors are used to guide the actions of both the deployable robots and the mobile docking stations that resupply them. Results from preliminary simulation are presented.
10.1 Introduction
In recent years, the number of scenarios where an unmanned response would reduce the risk to responders has increased greatly. Whether the scenario results from a natural disaster, such as a hurricane or earthquake, a terror attack, or an unintentional leak of chemical materials, those responding may benefit greatly from the use of robots. Robotic teams offer the potential for increased accuracy, the ability to enter spatially restricted areas, the ability to transport a wide range of sensors, the ability to operate for extended periods without fatigue, and the ability to do all of this remotely, either semiautonomously or autonomously. However, the coordination and control of these robotic teams can be a challenging task. In some cases, more operators than robots are required, which can create a logistical nightmare in terms of operating a functional team. Other challenges exist for the utilization of robotic teams. A primary consideration for the team is cost. Large, multipurpose robots that are highly redundant are
often prohibitively expensive to use in environments where they may be lost or by institutions with limited funding. Availability is a secondary concern. Even if a robot is deemed affordable, it must be available to respond to a situation in a timely fashion. For the most effective response this means that robotic teams should be staged around an area, similar to the dispersal of police, fire, and ambulance services. As a result of cost and availability, the individual capabilities of the robotic team members are often restricted or specialized. This requires selection of robots for the team that balance size, power (longevity), sensing, processing, and communication capabilities.
10.1.1 Problem Description
The task then becomes: How can a large team of robots be controlled in a manner that maximizes the resource availability for mission-specific criteria? Some example criteria are:
• Team longevity: The maximum operational lifetime of the robotic team is an important factor in a number of scenarios. When attempting to monitor a hazardous area, a primary concern is how long the team is capable of monitoring the area before it runs out of power. This can be addressed through a number of means, including energy conservation, finding energy in the environment, resupply from other robots, or some combination thereof.
• Area coverage: There are a number of tasks, such as search and rescue, reconnaissance, surveillance, and inspection, which require a robotic team to thoroughly search an area. The search area may contain regions that a single robot cannot traverse or enter because of size limitations. In order to address this, heterogeneous teams comprised of robots with varying locomotion capabilities are desirable. In addition, these robots must have the sensing and computational abilities to recognize where they have already been and whether they have found the object of their search.
• Speed of deployment: Response time is critical to successful mitigation of many types of hazards. Finding means to optimize the deployment time is required in many scenarios where robots may be monitoring environmental contamination (such as tracking the release of a harmful chemical agent) or searching for survivors in a disaster area.
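As a rough illustration of the team longevity criterion, the following sketch estimates how long each robot can operate from its remaining energy and average power draw, and takes the team lifetime to be that of the first robot to run out of power. The function names, the constant-power assumption, and the numbers are illustrative assumptions only; this is not the power model developed later in the chapter.

```python
def robot_lifetime_hours(energy_wh, avg_power_w):
    """Hours a robot can run at a constant average power draw (assumed model)."""
    if avg_power_w <= 0:
        raise ValueError("average power draw must be positive")
    return energy_wh / avg_power_w


def team_lifetime_hours(robots):
    """Team lifetime when every robot is needed: limited by the weakest member.

    `robots` is a list of (remaining_energy_wh, avg_power_w) pairs.
    """
    return min(robot_lifetime_hours(e, p) for e, p in robots)


# Hypothetical team: (remaining energy in Wh, average draw in W)
team = [(20.0, 4.0), (12.0, 4.0), (30.0, 5.0)]
print(team_lifetime_hours(team))  # 3.0, set by the second robot
```

Under this simplified view, resupplying the robot with the shortest individual lifetime is what extends the lifetime of the team as a whole.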
10.1.2 Organization
This chapter is organized as follows. Relevant literature, specifically issues in resource distribution, robotic team formation, cooperative robotics, and power utilization, will be discussed in Section 10.2. Section 10.3 will more formally outline the problem discussed in Section 10.1.1 as well as deal with
specific challenges and issues involved. Section 10.4 will present the formulations for the optimization of one of the behaviors in greater detail, followed by a discussion of simulation in Section 10.5. Future work, including a short discussion of hardware presently in development for this work, will be discussed in Section 10.6, followed by concluding remarks in Section 10.7.
10.2 Related Works
10.2.1 Resource Distribution Problem
The distribution of resources is an underlying problem in two key ways when responding to emergency scenarios. The first is the distribution of resources relative to the incident that is being responded to. For example, when preparing for natural disasters, it is necessary to stockpile food, water, and medicine in geographically convenient areas to expedite the initial relief until outside relief can arrive. Military leaders position units around the world to enable resources to be deployed to areas of need as they arise in order to respond quickly and restore order. This can be thought of as the “strategic” response planning or “Where is the best place to store my resources for distribution?” The second key area deals with getting the necessary resources to the place they are needed. This includes many aspects of information gathering and path planning in order to minimize the response time. This can be thought of as the “tactical” question of “How do I best maneuver my resources in the field to achieve a goal?” The first problem is outside of the scope of this work. It is assumed that the robot teams exist and are in useful positions. This work focuses on identifying means by which robotic teams can disperse supplies among their members. Resource distribution on the scene has classically been done with a fairly static team of robots. There are generally two types of robotic teams: homogeneous teams [71, 10, 51] and heterogeneous teams [7, 21]. Often these teams are called upon to identify features in an environment and either manipulate them (such as carrying targets to a goal [39, 63]), create representations of the area [16, 41], or achieve a degree of coverage for surveillance or reconnaissance applications [22].
10.2.1.1 Homogeneous Teams
One interesting case of the resource distribution problem is that of search and retrieval. In the search and retrieval scenario, robots (resources) must be distributed to find objects of interest at unknown locations, and then return with those objects of interest. The MinDART team [54, 55] is an example of a homogeneous team that is used for this type of operation. One interesting subclass of homogeneous teams is those that are comprised of reconfigurable homogeneous modules. Systems such as PolyBot [71, 70], CONRO [10, 9, 62], and reconfigurable molecules [51, 38] are each
able to reconfigure themselves by rearranging identical modules. In each of these cases, different configurations result in different types of locomotion. References [38, 31] report that it is possible to build self-reconfiguring robots in three dimensions in polynomial time. This lattice structure can make transitions among the modules, which allows it to traverse any continuous surface of the structure. In CONRO, a variety of locomotion methods can be achieved by combining the homogeneous modules [62]. CONRO can configure itself as a long chain, moving as a snake or crawling as a caterpillar. The system can be reconfigured into a rolling wheel configuration as well as a variety of multilegged walkers. In order to control the system, a hormone-inspired distributed control scheme has been developed [58]. The dynamic nature of the reconfiguration of individual modules requires a dynamic communication network topology. The system must be fault tolerant as nodes fail or as configurations change; thus it cannot rely on the ability to identify a specific node in the configuration. To accomplish this, topology information is used to send a virtual hormone which dictates the motion of the individual modules. PolyBot [71, 70, 72] is a modular system similar to CONRO. Individual modules can be autonomously reconfigured to support a variety of locomotion gaits. However, control of the system is done with static identification of each individual module. Using this static identification, predefined gait tables are used to control the system. Reconfiguration planning is generally done off-line, although online reconfiguration is possible to demonstrate the versatility of the system in dynamic environments. Simulations have been developed where the PolyBot is configured such that it can carry an object while rolling.
10.2.1.2 Heterogeneous Teams
Often robotic teams are built on limited budgets and with limited supplies. In these cases, many of the individual members may not have the high-fidelity sensors desired. As a result, robots with extra capabilities are distributed among robots of lesser capabilities. This is particularly useful in applications such as simultaneous localization and mapping (SLAM). Sharing the sensor data and computational resources of a robotic team is an important aspect of completing a mission. Drenner et al. [13] report on scouts equipped with infrared (IR) emitters that are able to illuminate the way for other scouts that carry different sensor payloads. The Millibots [7] combine a variety of sensors on a team of small robots in order to accomplish more than a single robot could. However, it is possible to share other resources as well. Marsupial systems [39] enable the sharing of locomotion (and at times power) of larger systems with smaller ones. This enables small robots to be autonomously transported over large terrain quickly and without exhausting precious battery power. Another type of heterogeneous team was characterized by Hirose as the “super mechano-colony” or SMC [19]. This work has been extended [33, 68, 20] to
utilize multiple robots to manipulate a mothership. The work proposed here makes use of a related system described in detail in Reference [8]. The difference is that this system proposes the manipulation of a large-scale marsupial device that creates a unique multitiered hierarchical system designed to be scalable across multiple marsupial docking stations.
10.2.1.3 How this Approach Differs
Although there have been many approaches to solving the classical distribution of resources problem, this approach differs in two main ways. First, traditional resource distribution is built around a static team configuration. At the onset of the response, the robotic team is known to consist of a fixed number of systems; this number may fall due to failure, but rarely increases. An exception is the case in which a marsupial system is able to release a second robot to perform a task. Often, the system being transported simply replaces the transporting system, so the number of “active” robots within the team does not change. In this approach, the team size is reconfigured dynamically, as the larger systems have the capability to expand the number of “active” systems dramatically. Second, this approach is built around a different control structure. Ideally, all robotic systems would have the capability to perform fully autonomously. Sliding scales of autonomy are often beneficial in resource allocation tasks, as the end user may wish to simply assign a set of resources to perform a certain task and take control of individual assets when they have reached an objective. Many robotic teams have a simple hierarchy for the control structures that underlie the coordination of team members [72]. This can be depicted by thinking of the robots as nodes of a tree. The typical hierarchical structure has many benefits, such as enabling easier communication routing protocols and task allocation structures. Alternatively, a robotic team could be controlled through a fully connected graph where each robot can talk directly to each other robot. Although this makes direct communication possible, it also requires significant overhead to achieve. Fully distributed control is also possible, as reported in Reference [58], where no single robot has any authority over any others as they simply relay information to adjoining robots. The multihierarchical control can be thought of as lying between these approaches.
10.2.2 Autonomous Docking and Deployment
The act of having one agent connect to another to form a larger, more complex, or more useful device has been around for centuries. From the interlocking of railroad cars to the docking of spacecraft in orbit, systems have had to connect to share capabilities. In the last decade, autonomous docking has been explored by many researchers. There are many reasons for and approaches to docking and deployment. According to Reference [39], autonomous docking can be achieved more quickly and more accurately than through simple teleoperation. Autonomous docking allows for autonomous recharge, which allows for
an increased operational lifetime. Further, there are tremendous advantages in terms of locomotion capabilities by transporting smaller systems on more capable ones. This in turn allows for increased sensor coverage, which can expedite the mission being undertaken by a robotic team.
10.2.2.1 Autonomous Recharge
Perhaps one of the most necessary behaviors of any robotic system that is expected to have sufficient longevity to carry out missions is the ability either to generate its own power (through the use of on-board reactors, as in the SlugBot [29], or solar conversion, as found on Mars rovers such as the Sojourner [35]) or to autonomously reach a recharging station and receive power that way. Walter [66] was probably the first to address the issues of autonomous recharge. A pair of robots would use light sensors to find their way to a charging station. Most modern approaches are derivatives of this method. Hada and Yuta [17] describe a robotic system that leaves a recharging station, moves to a fixed point, and moves back. This process was repeated continuously for a week to verify the ability to recharge. Over the duration of this first experiment, the robot traveled 3.391 km. A second experiment was conducted in which the robot patrolled the hallway outside of the lab. The robot attempts docking by calculating the time difference between the detection of a reflective tape near the docking station by each of two photovoltaic sensors. The robot then aligns itself with the tape and moves to the docking station. This method resulted in 1080 successful dockings and chargings. The work in Reference [61] implements a recharging station and docking mechanism for a Pioneer 2 DX robot. The charging connection is mounted on the rear of the Pioneer 2 DX; thus the robot must drive in backwards without the benefit of any forward-facing sensors. The process begins with a color-tracking system that identifies the location of the charging station. As the robot servos towards the docking station, a laser beacon is used to determine the angle of the robot relative to the wall. Once the proper orientation is achieved, the robot rotates using odometry alone. The robot will then blindly back into place for recharging. Failures in docking can occur if the robot does not properly align itself. These are detected with an IR system as well as by monitoring for a voltage spike when electrically connected. As designed, the system was 97% successful in autonomous docking and recharge. In the work reported in References [17] and [61], the docking station (or marker) was always within the robot’s initial field of view. Traversing larger environments where the docking stations are not immediately obvious to the robots may require additional searching or mapping of the environment prior to docking. Alternatively, in pre-engineered environments, the placements of docking and charging stations may be known a priori. It is worth noting that the capability to reach a home base and recharge has become commercially common in systems such as the Roomba autonomous vacuum cleaner from iRobot [49].
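The time-difference alignment attributed to Reference [17] above can be illustrated with a small geometric sketch. It assumes, purely for illustration, that the two sensors are mounted side by side a known distance apart, that the robot crosses a straight tape at constant speed, and that the sign convention is the one stated in the docstring. Under those assumptions the distance traveled between the two detections equals the sensor spacing times the tangent of the heading error. The function and parameter names are hypothetical and are not taken from Reference [17].

```python
import math


def heading_error_rad(dt_s, speed_m_s, sensor_spacing_m):
    """Angle between the robot's heading and the normal to the tape.

    dt_s: detection time of one sensor minus that of the other, in seconds
          (the sign tells which side crossed the tape first).
    speed_m_s: forward speed, assumed constant while crossing the tape.
    sensor_spacing_m: lateral distance between the two sensors.
    """
    if sensor_spacing_m <= 0:
        raise ValueError("sensor spacing must be positive")
    # Distance traveled between detections = spacing * tan(error angle).
    return math.atan2(speed_m_s * dt_s, sensor_spacing_m)


# Hypothetical numbers: 0.2 m/s, sensors 0.1 m apart, 0.5 s between detections.
error = heading_error_rad(0.5, 0.2, 0.1)
print(round(math.degrees(error), 1))  # 45.0
```

A robot using such an estimate would rotate by the negative of the returned angle to square itself with the tape before the final approach.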
10.2.2.2 Deployment Methods
In order to successfully deploy a team of robots or other sensors, there are many factors to consider, such as the power available to each of the smaller systems, the communication capability of the team, the size of the area that must be covered, and the nature of the mission at hand. These factors can be combined in different ways for different deployment strategies. For instance, the scout team [21] can be deployed ballistically by a larger robot, the ranger, to investigate a target area. This system has no means of asset recovery; thus the robots that are launched must either make their way back to a “safe” zone or be considered disposable. Unfortunately, deployment through this means provides little information about where the robot ultimately ends up, which limits the usefulness of the method for pinpointing features in the environment. The method could be improved by running some form of SLAM on the individual robots and attempting to merge their maps; however, there would be great uncertainty in their starting points. Scouts have also been deployed via their jumping mechanism when used in conjunction with a Pioneer 2 robot [27]. Here, the starting points of the scouts are known relative to the Pioneer, allowing for a better estimation of environmental landmarks relative to the larger Pioneer. In addition, deployment via this mechanism allows for recovery of the robots. In Reference [42] a marsupial approach enables a modified Power Wheels jeep to transport a smaller Inuktun microbot. This system served as a testbed for a more robust system comprised of an RWI ATRV-Jr that carries the pre-Packbot Urban. These systems have been used to show that marsupial teams, when working cooperatively, can reach locations that independent systems cannot, and reach them faster [43]. Marsupial robotics need not be restricted to ground vehicles. In Reference [11] Corke et al. describe a group of sensors that are carried and released from an unmanned helicopter. This method allows for rapid deployment of a sensor network, as the helicopter is not nearly as subject to the terrain obstacles that other robots may face.
10.2.2.3 Deployment Models
Regardless of how the robots are physically deployed, the nature of the mission may require that they deploy themselves in a particular fashion. If the robotic team is to perform surveillance on a known area for an extended duration, the task becomes that of the “art gallery problem” [46]. Here, a computational geometry model is used to identify the best location for each of the robots, and each robot simply navigates to the required position. When the environment is not known a priori, there are examples where robots are deployed to explore the environment in order to maximize coverage [23, 22]. There are different types of coverage that can be accomplished when deploying robots into different situations. In Reference [15], these coverage methods are classified into “blanket coverage,” “barrier coverage,” and
“sweep coverage.” There have been many methods of deploying robots to achieve these different forms of coverage. In Reference [4], dispersion occurs through random walk with artificial potential fields to prevent robots from coming too close together. Dispersion based on animal navigation [37] moves individual robots away from the highest local density of other robots that are visible to onboard sensors. In Reference [47], an overhead camera is used to direct a “virtual pheromone” to cause the robots to disperse. In References [45, 67], the robots or sensors are deployed in a static lattice formation for monitoring changes in environmental hazards such as oil slicks, air plumes, or radiation levels. Here, the goal is to maximize observation time by optimizing the power consumed in communication. This is accomplished by limiting inter-robot communication and changing the “density” of the sensors through recursive dyadic partitioning, which defines a minimum set of sensors or robots necessary to map the boundary to a desired resolution. Deployment to map boundaries can also be accomplished by using deformable contour models [36]. This approach is similar to using deformable contour models (or snakes) for image segmentation; however, in this case the control points are the mobile robots capable of sensing the environmental disturbance. For resources that are dynamically allocated in response to environmental hazards, such as identifying the source of a release of a biological or chemical agent into the atmosphere, there are many models by which robots can be deployed. In Reference [26], robots simply follow wind direction to locate the source. Gradient following is done with different degrees of success in References [53, 52, 25, 34, 28, 40]. Each of these deployment models can be enhanced through the utilization of a system such as the one proposed in Section 10.6.2. For instance, in the “deployed” state, the docking station itself can act as an additional landmark for the coordination described in References [22, 23]. Teams of the mobile docking stations can deploy sensors into a lattice such as the ones utilized in References [45, 67]. When mapping the boundaries of pollutants, the mobile docking station can deploy additional units to act as control points, or retrieve those units if they are not necessary, reducing the complexity of a control scheme such as the deformable contour models used in Reference [36]. Finally, when attempting to identify the source of a plume, the mobile docking station could deploy additional robots or sensors to increase the aperture of sensing when it is caught in a local minimum or shallow gradient.
10.2.2.4 Docking Methods
There are many approaches to docking. Each has benefits and drawbacks in terms of costs, sensors, and processing and communication requirements. In Reference [5] a reactive, potential fields-based system is developed which allows for the docking of one robot with another. In Reference [65] a mobile platform is docked in industrial environments using low-cost IR sensors. The IR sensors are ideal for robots that may not have the computational resources
for processing vision onboard. This system uses two passive retroreflectors to localize the robot relative to a known docking location. These sensor readings are used to generate a trajectory that maneuvers the system to a docking station. There are many vision-based approaches. In Reference [48], docking is achieved using an alignment and braking process. This work is extended in Reference [57], which introduces the concepts of ego-docking, where the vision system is mounted onboard, and eco-docking, where the camera is mounted on an external docking platform. In both instances, optical flow is used to control the motion of the robot to maneuver it perpendicularly to a specified docking location. In Reference [56] this work is used with a stereo camera system that allows for centering and wall-following behaviors. In each of these behaviors, the motion is controlled without precalibrating the vision systems. Rather, the behavior is determined by what portion of the visual field is utilized and which control law is adopted. In Reference [50], a combination of sensing methodologies is used to maneuver a robotic platform in an industrial environment that is relatively known, with few obstacles. The initial guidance is based on odometry and landmarks fixed within the environment. However, as errors tend to accumulate from odometry, a combination of IR and ultrasonic sensors with a relatively known environment is used to correct these errors. The final docking procedure makes use of artificial landmarks and a CCD camera. The work in Reference [5] is extended in Reference [39]. Here, the approach relies on a three-stage behavior for docking. In the first stage, the docking robot is only aware of the general location of the docking station and moves directly towards the docking station. Once the docking robot can detect the docking station with onboard sensing, a second motion control scheme is used based on the original potential field method. When the docking robot is within suitable range, a final dead-reckoning approach is utilized. In order to complete the final stage of docking, a landmark is identified using color cameras based on the spherical coordinate transform [24]. This method works fairly reliably, and many experiments were conducted to verify that the autonomous docking was more reliable than teleoperation. However, the feature that is used to complete the docking must be identified during the deployment phase. Thus, the docking station must remain stationary or maintain a topological reference to the landmark that remains invariant to any lighting changes that may occur during the deployment. Further, in responding to scenarios where the environment does not remain static, whether due to an aftershock of an earthquake or an adversarial agent manipulating the environment, it may be impossible to maintain a fix on any feature for docking. Thus, a more reliable method that is independent of the environment must be found. The previous docking methods [5, 50, 48, 57, 39] are all methods of docking in marsupial or industrial environments where the goal of docking involves the robot arriving in a specified location. In polymorphic systems such as CONRO [10, 9], PolyBot [71, 70], or the lattice structures of References [51, 38], the purpose of the docking is not so much to transport one robot to a specific
location, but to actively combine capabilities to accomplish something that otherwise would not be possible. In the case of CONRO [59], each module has three male connection sides and one female connection side. At each connection side, there is a set of IR transmitters and receivers. These IR transmitters and receivers are dual purpose. They are used to measure distance and align the modules as well as serve as a communication link between attached modules. As with many other docking systems, there is a three-staged approach to the docking procedure. First, the two modules must move close enough to sense the guidance signal from the other robot. Second, they must orient themselves using the guidance sensors, until they are properly aligned and can be connected. The work in Reference [59] is extended to design for self-assembly in space [60]. Here, the goal is to develop systems that can operate in three dimensions and have the ability to tether themselves to one another. The control process for such a system would be the same hormone-inspired system that is used to control CONRO [58]. PolyBot [71] is similar to CONRO in its approach to connecting modules, but does so with a series of genderless connection plates with four pins and four mating chamfered holes. The design of the system allows for a faster docking and ultimately faster reconfiguration. In Reference [72], the process of docking is divided into two phases, the approach and the physical latching. Alignment is accomplished using IR sensors. Due to the inaccuracies in angle estimation, errors can propagate across the length of the chain. As a result, the process of connecting modules varies depending upon the distance between the modules. At long range, the errors are potentially too great for an accurate coupling. At medium ranges, the IR sensors become saturated as the modules become close. 
At this point, the pins should be correctly aligned and the mechanical design of the latching mechanism facilitates the final stage of docking. In Reference [6], docking is achieved between two mobile robots by using a forklift on one robot and a receptacle set on another. Partial alignment of the system is accomplished through a vision system. The robots are then skid-steered such that when they are partially aligned, the wheel slippage will force the connection together. How docking is achieved depends heavily on the means by which two robots dock. When one robot docks entirely inside another, such as in Reference [39], there is a different set of needs than when two robots join together to form one larger robot. In the case of Reference [44], special connectors have been designed to allow for fast and convenient docking, both electrically and mechanically. These devices are designed with the shear forces in mind that such a junction may encounter when covering difficult terrain. The appropriate choices of connection method, connector style, and docking schema are important to the development of a reconfigurable system that is capable of transitioning from a homogeneous system to a heterogeneous system. The time and energy required to accomplish these tasks will ultimately affect mission performance in terms of battery life, speed of deployment and recovery of assets, and complexity of controlling a system.
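The three-stage docking behavior described earlier in this section for Reference [39] (move toward the general location, switch to a potential-field approach once the station is sensed, then finish by dead reckoning when within range) can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the stage names, the forward-only transitions, and the 0.5 m switching range are hypothetical choices made for illustration, not details taken from Reference [39].

```python
from enum import Enum


class DockingStage(Enum):
    GO_TO_GENERAL_LOCATION = 1    # station not yet sensed; head for its rough position
    POTENTIAL_FIELD_APPROACH = 2  # station sensed; potential-field motion control
    DEAD_RECKONING = 3            # within range; final approach without sensing


def next_stage(stage, station_detected, distance_m, final_range_m=0.5):
    """Advance the docking stage from the latest observation.

    Transitions only move forward (an assumption), so losing sight of the
    station during the final approach does not restart the procedure.
    distance_m may be None when the station has not been ranged.
    """
    if stage is DockingStage.GO_TO_GENERAL_LOCATION and station_detected:
        stage = DockingStage.POTENTIAL_FIELD_APPROACH
    if (stage is DockingStage.POTENTIAL_FIELD_APPROACH
            and distance_m is not None
            and distance_m <= final_range_m):
        stage = DockingStage.DEAD_RECKONING
    return stage


# Hypothetical observation sequence: (station detected?, measured distance in m)
stage = DockingStage.GO_TO_GENERAL_LOCATION
for detected, dist in [(False, None), (True, 3.0), (True, 0.4), (False, None)]:
    stage = next_stage(stage, detected, dist)
    print(stage.name)
```

Keeping the stage as explicit state, rather than recomputing it from each observation, is one way to make the final blind approach robust to the sensing dropouts mentioned above.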
10.2.3 Cooperative Robotics
To this point, this review has mainly focused on the ability of robots to cooperate in terms of locomotion [71, 10], exploration and coverage [22, 18], localization relative to one another [23, 14], or exploration and simultaneous localization and mapping (SLAM) [41, 14, 16]. However, the movement of the mobile docking station as described in Section 10.6.2 can also be characterized in the sense of cooperative manipulation. In Reference [69], a manipulator is mounted on a mobile platform. This simplifies manipulation by allowing the manipulator to be brought into a desired position by planning the motion of the mobile platform first, then performing the preferred manipulation, rather than developing a more complex manipulation scheme. The control algorithm in this work is built around two issues. The first is that manipulators and mobile platforms generally have different dynamics. This is an important aspect to understand when integrating the two aspects of the task that these robots will be used for. Second, most mobile platforms are subject to nonholonomic movement, while manipulators have many more degrees of freedom. This control algorithm attempts to address these issues while moving the robot to bring the manipulator into a preferred configuration. In Reference [1], a team of four robots is utilized to cooperatively move a large object. Two of the robots in this task are designated observers and two of them are involved in the cooperative manipulation. This system relies upon a human–robot interface that is built upon a hierarchical control system [2]. The interface allows the operator to intervene and prevent the systems from reaching an error state. The control strategies for coordinating the team of robots are built on a set of primitives for each robot’s sensory and locomotion capability. 
Behaviors and tasks can be assigned and executed sequentially, concurrently, or conditionally to ensure that the robotic team is able to manipulate the large object to the desired position. Two methods of controlling the manipulation of a large object are discussed. The first approach treats the whole system, including the object being transported, as a large kinematic chain. The second approach distributes the control to the two robots manipulating the object. There are savings in terms of communications and computational complexity in the second approach; however, in a system with reduced complexity (such as the one proposed in Section 10.6.2) the first approach may be more applicable. Also, the work in Reference [1] requires that two of the four team members be dedicated to observing the environment and the coordination of the two robots actually manipulating the environment. This is not necessarily a particularly efficient utilization of available resources. In Reference [63] a framework for controlling multiple mobile manipulators is presented. The control algorithm for performing this task is decentralized and scalable to a larger number of robots. This is advantageous in that centralized control would be too complex when dealing with large numbers of robots and dynamic reallocation of robot roles is necessitated by the environment. The combined locomotion is broken down into trajectory planning, robot control, and formation control tasks. This system has been used
successfully to transport flexible boards with a team of three mobile robots. The system is capable of transitioning formation to allow for changes in support and navigation of changing environments.
10.3
Challenges and Issues
The underlying problem in utilizing heterogeneous robotic teams is that of resource availability. The challenge becomes: “Where and how are resources best deployed for a specific mission?” One potential approach is to marsupially transport and deploy smaller robots to the scene to complete the mission objectives. The larger “motherships” or “docking stations” provide a number of services to the deployed systems. In some cases they provide power through tethers [42], while in others they provide enhanced processing and communication capabilities [27, 21]. However, these approaches must improve in scalability in order to support significantly larger robotic teams. Increased scalability and team size have many associated challenges. The robot that is deploying other systems must have the communication, processing, and power capabilities to facilitate the organization of the entire team. In order to ensure maximum longevity of the team, the robot must be capable of deploying, coordinating, recovering, and recharging the deployable systems. The framework to support this must also allow the deployable robots to be interchangeable, so that other “motherships” or “docking stations” can be used in conjunction with one another. The problem of deploying robotic systems has been studied in detail, whereas work in the area of robot recovery has been more limited. 10.3.1 Docking 10.3.1.1 General Theory of Docking Docking of two or more systems (agents, robots, machines, and so forth) can generally be thought of as a process in which individual systems combine as a means to benefit one another. The benefit is often in terms of energy conservation, time conservation, or resource availability. With this viewpoint, there are a number of systems that may not be traditionally thought of as docking, but actually serve as useful models for this work: •
Trains — Although the environment and perception of trains as a docking system may be restrictive, they are an interesting example of the benefits of docking. The ability of railroad cars to “dock” with one another is an example of a system where two or more unique systems share capability. For example, a locomotive provides the power to move the whole train. Some cars of a train can carry coal or other fuel for powering the train. Other cars carry passengers, who benefit from the speed of the train and collectively
save energy over each individual passenger trying to traverse the distance alone. •
In-flight refueling — A more commonly accepted example of docking is that of in-flight refueling. The ability to refuel a plane while in flight enables extended mission durations or relocation of resources over longer distances. Here, the docking process is much more complicated as the alignment must occur in three dimensions. However, this is limited as the number of agents being simultaneously serviced is reduced in the case of a single refueling plane.
•
Aircraft carriers — In terms of scalability, the aircraft carrier at sea is one of the best models for this work. Carriers can deploy and recover fighters individually, but can service a number of them simultaneously. The carrier must coordinate its movements with other ships as well as the aircraft it services. A carrier must transport, deploy, recover, and resupply not only the set of aircraft intended for its use, but as the need arises, the carrier must be able to service planes and helicopters from other carriers as well.
•
Robotic docking — Robotic docking can be classified into three models similar to the models discussed above. The basic case is that of simply docking with a known position for power purposes (similar in nature to in-flight refueling). Work has been done in this area since the 1960s when systems attempted to align themselves for recharge using a combination of photovoltaic sensors and light beacons [66]. In Reference [17], photovoltaics are used in conjunction with reflective tape to align robots to a fixed docking station. A recharging station and docking mechanism for a Pioneer 2 DX robot has been implemented using a color-tracking system in a fixed environment [61]. Other cases of extremely calibrated environments include the use of low-cost infrared sensors [65]. Vision-based docking is achieved in Reference [48] and extended in Reference [57] where optical flow is used to control the maneuvering of the robot to a desired docking location. Enhancements through the use of stereo camera systems allow for improved centering and wall following for better alignment with the docking location [56]. In these cases, the work is done using highly calibrated systems. Such calibration may be possible as in fixed industrial environments where all path planning with respect to landmarks can be done a priori [50]; however, in general the sensing needs to be adaptive. Such approaches have met with limited success in commercial applications such as the Roomba [49] and the Sony AIBO [3] which have the ability to recognize a docking station and recharge as necessary. The term polymorphic systems can be used to describe the second type of robot docking. This is similar in many ways to the train example, where modules of differing capabilities are interlinked to form a
more useful structure. However, in this case, the robotic modules may interconnect in a nonlinear fashion, allowing for robots that can traverse a variety of terrains. Examples of these types of systems are the PolyBot [71, 70, 72], the CONRO [10, 9, 59], and the lattice structures of References [51, 38]. Finally, there are the marsupial systems. Marsupial systems are in many ways like the carrier example. Teams of robots such as the Scout and Pioneer team [27], the Georgia Tech Yellow Jackets [12], or the marsupial teams of Reference [42] use a larger robot to transport, coordinate, and control one or more other robots. One of the main advantages of this approach is that marsupial teams when working cooperatively can reach locations that independent systems cannot, and reach them faster [43].
10.3.1.2 Traditional Algorithms for Docking To understand the contributions of this work, a brief understanding of traditional algorithms for docking is necessary. This section will provide a brief look at the traditional algorithms for docking and the various assumptions made in specific implementations. The limitations of these assumptions form one of the general problems that this work attempts to solve. Traditionally, robotic docking starts with a set of control laws, generally formed a priori. These control laws may be a preprogrammed sequence of events in highly calibrated environments [50], slightly adaptive algorithms that attempt to identify landmarks in otherwise unknown (but structured) environments, or systems that follow artificial potential fields towards a recognized goal [5]. The robot then works through three general phases of motion which cycle between identification of the docking station, coarse approach, and fine approach, until the robot can dock. This is illustrated in Figure 10.1. The individual implementations for each phase are often the result of tradeoffs in available sensing and computational capabilities of the teams. However, regardless of how the stages are accomplished, the majority of algorithms for docking utilize the three stages. •
Docking station identification — Identification of the location of the docking station is the first step in the docking process and is accomplished in several ways. In highly calibrated environments [50] this is known a priori. Other methods attempt to utilize knowledge gained from the point of deployment to match the environment from other perspectives when returning [39]. Some approaches use GPS locations, beacons [66, 17], or visual landmarks [24, 48, 57, 61].
•
Coarse approach phase: Once the docking station location is known, the second phase involves traversing a relatively long distance toward the docking area. This phase generally relies on vision-based sensing [24, 48, 57, 61] or IR communication [71, 65]. Here, the initial control laws are used to guide the robot until it reaches a position that is relatively close to the goal.
FIGURE 10.1 Traditional docking algorithm flow diagram. Boxes, in order: predefine control laws; identify docking station; coarse approach phase; fine approach phase; dock.
•
Fine approach phase — Once the robot is close to the docking point, final alignment must occur. As the robot may be attempting mechanical or electrical interconnection, more precision is required. Also, as the robots may be in close proximity to one another, longer-range sensors may become saturated [72] and relationships between landmarks at deployment may become skewed from the close vantage point [39].
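The three phases above, together with the fallback to an earlier phase when a phase fails, can be sketched as a small state machine. This is an illustrative sketch only: the `Phase` names, the `run_docking` helper, the one-phase fallback policy, and the `max_steps` cutoff are all assumptions and are not taken from any of the cited systems.

```python
from enum import Enum, auto
from typing import Callable, Dict


class Phase(Enum):
    IDENTIFY = auto()  # locate the docking station
    COARSE = auto()    # long-range approach
    FINE = auto()      # final alignment
    DOCKED = auto()


# Phase order taken from the description above.
ORDER = [Phase.IDENTIFY, Phase.COARSE, Phase.FINE, Phase.DOCKED]


def run_docking(attempt: Dict[Phase, Callable[[], bool]], max_steps: int = 100) -> bool:
    """Run the phases in order; each callable returns True on success.

    On failure the robot falls back one phase (hypothetical policy).
    Returns True once docked, False if max_steps is exhausted first.
    """
    idx = 0
    for _ in range(max_steps):
        if ORDER[idx] is Phase.DOCKED:
            return True
        if attempt[ORDER[idx]]():
            idx += 1               # phase succeeded, advance
        else:
            idx = max(idx - 1, 0)  # phase failed, revert to the previous phase
    return False
```

Each entry of `attempt` would wrap whatever sensing and control a given platform uses for that phase; the sketch only captures the sequencing.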
If a failure occurs in one phase, the robot may have to revert to a previous phase or restart the process entirely. Failures often occur as the result of improper alignment leading the robot away from the final goal position. Successful docking results in either physical interconnection or the stowing of one robot onboard another. 10.3.1.3 Traditional Assumptions and Limitations of Docking The approaches surveyed to this point rely on a number of assumptions, including: •
Static environments — Many approaches assume that the environment will not change and that preplanned movements and known landmarks will be effective for docking. This works in specially built environments such as factories, where the landmarks are provided and not likely to change. However, the desired application of these systems is often in highly dynamic environments such as those found in urban search and rescue. In this case, there is no guarantee that the fiducial calculated at deployment, as is the case in Reference [39], will remain the same by the time the mission is completed. In these instances, it is likely that over time the lighting conditions will change, causing differences in visual interpretations
of an area. It is also possible that the physical environment changes as the result of falling rubble or aftershocks of an earthquake. •
Fixed deployment and recovery positions — Many of the implementations assume that whether or not the environment changes, the docking station will remain in a static location. This assumption allows easy identification of the docking area, provides upfront knowledge of where to return, and allows the use of physical mechanisms such as tethers. However, there are three main disadvantages to this approach. First, this reduces the speed of exploration as all robots must be deployed from the same area, which in turn may reduce the operational lifetime as the deployable systems may expend more energy to reach their intended destinations. Second, in dynamic environments, the route traversed by the deployable system may become impassable, negating the benefits of the assumption. Finally, in highly dynamic environments, leaving the docking station in a fixed position removes any aspects of self-preservation. Given the cost and complexity of the docking station, it is important that it be able to maneuver if the position it is in becomes unstable.
•
Infinite power and perfect communication — Many approaches do not consider power expenditure or the costs of communication, especially in the case of tethering. Many environments will cause tethers to fail as the result of snagging. Other times tethers can be restrictive in terms of energy expended to pull the tether or simply the finite tether length. Additionally, one should not assume that wireless communication will work perfectly. Thus, a docking station that is mobile and an algorithm for deploying robots that may act as communication-repeating nodes is necessary.
•
Team size — Generally speaking, robotic teams that have been physically implemented have consisted of teams of fewer than ten robots (with the exception of a few projects such as the Centibots project [30]). Marsupial teams have typically been one mother and one daughter [43], although more recently there have been up to five robots involved as in the scouts and pioneers [27] or the AIBOs and ATRV-Jr [12]. Larger teams should expedite the exploration and area covered. For cost and complexity of control reasons, the creation of large teams has been somewhat infeasible. However, an approach that distributes control across a team of docking stations should provide the scalability to enable large teams of inexpensive robots.
10.3.2 Hardware One of the major challenges to an endeavor of this nature is the design and availability of a hardware platform that is physically capable of deploying, coordinating, recovering, and recharging a sufficiently large number of smaller robots. The design considerations for such a platform are discussed in greater detail in Reference [8].
10.4
Optimization of Behaviors
10.4.1 Cooperative Maneuvering for Energy Conservation In order to conserve power, it would be ideal to deploy robots into several areas by having the docking station first drop them off and then move between the deployed robots as necessary to recharge them. However, given that the rates of power expenditure may vary, the docking station may be constantly moving, which may expend power unnecessarily, or the docking station may be situated centrally, causing the deployed systems to expend more power to return. As a result, a means of intelligently maneuvering both the docking station and deployable systems must be devised which conserves as much energy as possible. This energy conservation can be formulated as identifying a minimum spanning tree of the estimated locations of all deployed systems such that the distances between the docking station and the robots are minimized. The docking station can then simply traverse this tree. An improvement would be to minimize the average distance to the estimated locations of the robots. This would be more beneficial to the deployed systems, which do not carry as much energy. However, both of these cases can result in an undesirable situation where robots do not have enough power to make it back to the docking station and cannot be recovered. Thus, a third approach is necessary: finding a position that not only minimizes the distance (and effectively the energy required to return), but also penalizes solutions that place the docking station where a robot’s remaining energy at time $t$ is less than the energy it requires to return. This leads to the following objective function. Each robot $R_i$ has several parameters, including position $[R_{X_i}(t), R_{Y_i}(t)]$, velocity $V_{R_i}$, and an estimated battery life $E_{R_i}$. Each docking station $D_i$ is similar in that it has a position $[D_{X_i}(t), D_{Y_i}(t)]$ and a velocity $V_{D_i}$, but it has significantly more battery life than the deployable systems.
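Before turning to the objective function, the first formulation above (a minimum spanning tree over the estimated robot locations) can be sketched with Prim's algorithm. This is a minimal sketch under stated assumptions: straight-line distance stands in for traversable path length, the docking station is treated as node 0 of the tree, and the function name `minimum_spanning_tree` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def minimum_spanning_tree(points: List[Point]) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns edges as (parent_index, child_index, length); index 0 is the root,
    assumed here to be the docking station.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge length into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Tuple[int, int, float]] = []
    for _ in range(n):
        # Pick the cheapest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax edges from the newly added node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges
```

The tree says nothing about remaining battery life, which is the limitation that motivates the penalized cost function introduced next.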
At each time $t$, a cost function $C$ is used to calculate the cost of the robot returning to a potential location of the docking station at time $t + \delta t$. This cost function is given by Equation (10.1). The distance function dist is simply the length of the shortest known traversable path to the docking station.

$$C(R_i(t), D_i(t + \delta t)) = x\,e^{x - f} \tag{10.1}$$

where

$$x = \frac{d_i / V_{R_i}}{E_{R_i}}, \qquad d_i = \operatorname{dist}(R_i(t), D_i(t + \delta t)), \qquad f = 1.$$
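Equation (10.1) can be evaluated directly. The sketch below reads $x$ as the estimated return time $d_i / V_{R_i}$ divided by the estimated battery life $E_{R_i}$, so that $x > f$ corresponds to a robot that cannot make it back. That reading of the equation, the function name `return_cost`, and its parameter names are assumptions made for illustration.

```python
import math


def return_cost(distance: float, velocity: float, battery_life: float, f: float = 1.0) -> float:
    """Cost x * exp(x - f) of returning to a candidate dock location.

    x is taken as (distance / velocity) / battery_life, i.e. the fraction of
    the remaining battery life needed to get back (assumed interpretation).
    """
    x = (distance / velocity) / battery_life
    return x * math.exp(x - f)
```

Under this reading the cost is zero for a robot already at the dock, equals 1 when the return trip uses exactly the remaining battery life, and grows quickly beyond that point, which is the penalty behavior described above.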
FIGURE 10.2 The set of possible locations at time $t + \delta t$, assuming a constant velocity $V_D$.
The set of possible positions for the docking station $D_i$ at time $t + \delta t$ (depicted in Figure 10.2) can be represented as a circle centered at the position of the dock $D_i$ at time $t$ with radius $r_{D_i}$, which is equal to the docking station’s velocity times the time increment, $V_{D_i}\,\delta t$. This gives the new position as:

$$D_X(t + \delta t) = D_X(t) + r_D \cos(\theta_D) \tag{10.2}$$

$$D_Y(t + \delta t) = D_Y(t) + r_D \sin(\theta_D). \tag{10.3}$$

The task then becomes finding the value of $\theta_D$ that minimizes:

$$\sum_{i=1}^{n} C(R_i, [D_X(t + \delta t), D_Y(t + \delta t)]). \tag{10.4}$$
Figure 10.3 depicts several candidate placements of the docking station. It is important to consider that the best placement for the docking station may be to remain stationary or move only slightly. This requires the introduction of a velocity scalar $\alpha$ over the range $[0, 1]$, and for the optimization to be conducted over both $\alpha$ and $\theta_D$, resulting in:

$$\sum_{i=1}^{n} C(R_i, [D_X(t) + \alpha V_D\,\delta t \cos(\theta_D),\; D_Y(t) + \alpha V_D\,\delta t \sin(\theta_D)]). \tag{10.5}$$
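One simple way to carry out the minimization in Equation (10.5) is a grid search over $\alpha$ and $\theta_D$. The sketch below is not the authors' optimizer: the grid resolution, the tuple layout for a robot, the use of straight-line distance in place of dist, and the reading of $x$ in Equation (10.1) as return time over battery life are all assumptions.

```python
import math
from typing import List, Tuple

# (x, y, velocity, battery_life): a hypothetical record layout for a deployed robot
Robot = Tuple[float, float, float, float]
Point = Tuple[float, float]


def cost(robot: Robot, dock: Point, f: float = 1.0) -> float:
    rx, ry, v, e = robot
    d = math.hypot(rx - dock[0], ry - dock[1])  # straight-line stand-in for dist()
    x = (d / v) / e                             # assumed reading of x in Eq. (10.1)
    return x * math.exp(x - f)


def best_move(dock: Point, v_dock: float, dt: float, robots: List[Robot],
              n_alpha: int = 10, n_theta: int = 72) -> Tuple[float, float, float, Point]:
    """Grid search for Eq. (10.5); returns (total_cost, alpha, theta, new_position)."""
    # alpha = 0 means the dock stays where it is.
    best = (sum(cost(r, dock) for r in robots), 0.0, 0.0, dock)
    for a in range(1, n_alpha + 1):
        alpha = a / n_alpha
        for k in range(n_theta):
            theta = 2.0 * math.pi * k / n_theta
            cand = (dock[0] + alpha * v_dock * dt * math.cos(theta),
                    dock[1] + alpha * v_dock * dt * math.sin(theta))
            total = sum(cost(r, cand) for r in robots)
            if total < best[0]:
                best = (total, alpha, theta, cand)
    return best
```

With a single robot at (10, 0) and the dock at the origin, the search moves the dock a full step toward the robot. With two identical robots at (10, 0) and (-10, 0), it returns $\alpha = 0$, so the dock stays put between them.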
There are still cases where this approach may be suboptimal or fail. In these cases, additional constraints may be needed for both algorithmic and practical purposes. For example, in order to reduce the complexity, the deployed robots may be clustered or partitioned for collection by a single docking station, rather than allowing multiple docking stations to attempt pickup of the same systems. Additionally, the exploration of the deployable systems may be reduced to a function based upon available sensing and communication ranges, which would reduce the spatial area that the docking stations must cover.
FIGURE 10.3 Computing the cost at different potential locations of the docking station at time $t + \delta t$. Panels: (a) initial configuration; (b) possible new location; (c) another possible location.
10.5
Simulation
A simulation has been developed to test the validity of the method proposed in Section 10.4.1. It involves 50 robots deployed at random in an open field. A single docking station capable of recharging 10 robots at a time attempts to optimally position itself among the deployable robots using Equation (10.5) at each time step. Figure 10.4 depicts a sample deployment. The simulation requires that the deployable robots maintain a series of internal states based upon available power and estimated proximity to the docking station as shown in Figures 10.5 and 10.6. The docking station, depicted as an asterisk, continually repositions itself according to Equation (10.5). While the deployable robots have sufficient power, they “explore” their surroundings by randomly moving at each time step (depicted by a grey dot).
FIGURE 10.4 Simulated initial deployment of mobile docking station and robots. Both axes span -200 to 200.
Exploration continues until the robot’s power is sufficiently low and the robot begins to “seek home” (depicted as a dark dot). If the robot lacks sufficient power to return to the docking station, it enters an “abandoned” state (depicted as a grey x) where it attempts to minimize power consumption until the docking station is close enough for it to “seek home” or until it runs out of power completely and is “dead” (depicted as a black x). As long as the spatial spread of the robots is small, simulation shows that the cost function in Equation (10.5) performs well. However, as the robots disperse outwards, a problem emerges in which groups of robots become “abandoned” in diametrically opposite positions. Given that all of the robots are of equal priority, the docking station remains stuck in the center and all of the robots die, as shown in Figure 10.5. This suboptimal performance of the cost function required that a prioritization mechanism be added, in the form of clustering using the ISODATA clustering algorithm [64]. Robots are clustered based on their estimated spatial positions and internal states. A priority cluster is then chosen and the docking station optimizes its location in order to recover the members of that cluster. Robots are removed from the cluster as they successfully dock or die. When the cluster is empty, a new cluster is chosen. Figure 10.6 illustrates several time steps of the simulation utilizing the clustering. For visualization purposes, the surfaces of the objective function for each of the steps in Figure 10.6 are shown in Figure 10.7.
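The internal robot states described above can be sketched as a transition function. The state names follow the text; the specific rule used to trigger “seek home” (a `reserve_factor` multiple of the energy needed to return), the `at_dock` flag, and the function signature are assumptions made for illustration, since the text does not give the thresholds used in the simulation.

```python
from enum import Enum


class RobotState(Enum):
    EXPLORING = "exploring"
    SEEK_HOME = "seek home"
    ABANDONED = "abandoned"
    DEAD = "dead"
    DOCKED = "docked"


def next_state(state: RobotState, power: float, energy_to_return: float,
               at_dock: bool, reserve_factor: float = 1.5) -> RobotState:
    """Return the robot's next state from its power and the cost of returning.

    reserve_factor is a hypothetical margin: the robot heads home once its
    power drops to reserve_factor times the energy needed to reach the dock.
    """
    if state is RobotState.DEAD or power <= 0:
        return RobotState.DEAD
    if at_dock and state in (RobotState.SEEK_HOME, RobotState.DOCKED):
        return RobotState.DOCKED
    if power < energy_to_return:
        return RobotState.ABANDONED   # cannot make it back; wait for the dock
    if power <= reserve_factor * energy_to_return:
        return RobotState.SEEK_HOME
    return RobotState.EXPLORING
```

Because `energy_to_return` shrinks as the docking station approaches, an abandoned robot in this sketch switches back to “seek home” once the dock is close enough, which mirrors the behavior described in the text.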
FIGURE 10.5 A sample situation in which the “global” solution results in extremely suboptimal performance. Both axes span -200 to 200.
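The prioritization step can be illustrated with a much simpler clustering method than the one used in the chapter. The sketch below substitutes plain k-means (Lloyd's algorithm) for ISODATA, clusters on position only rather than on position and internal state, and picks the priority cluster as the one with the lowest mean battery level. The function names and the priority rule are hypothetical; the text does not state how the priority cluster is chosen.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means (a simpler stand-in for ISODATA); one label per point."""
    centers = list(points[:k])  # deterministic seeding: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels


def priority_cluster(labels: List[int], battery: List[float]) -> int:
    """Pick the cluster with the lowest mean battery (hypothetical priority rule)."""
    groups: Dict[int, List[float]] = {}
    for lab, b in zip(labels, battery):
        groups.setdefault(lab, []).append(b)
    return min(groups, key=lambda lab: sum(groups[lab]) / len(groups[lab]))
```

Restricting the objective in Equation (10.5) to the members of the chosen cluster breaks the symmetry between opposing groups, so the docking station commits to one group instead of stalling between them.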
FIGURE 10.6 A sample run of the simulation. Panels: (a) initial configuration; (b) time = 400; (c) time = 600; (d) time = 1000; (e) time = 1100; (f) time = 1200. Both axes span -200 to 200.
FIGURE 10.7 Objective surfaces for each time step shown in Figure 10.6. Panels: (a) initial configuration; (b) time = 400; (c) time = 600; (d) time = 1000; (e) time = 1100; (f) time = 1200.
TABLE 10.1
Results of Simulation Runs (mean time in state)

Initial Dock   Initial Robot
Position       Distribution   Runs   Docked   Exploring   Seek Home   Abandoned   Dead
Centered       Unimodal       3      194.7    1072.5      270.9       973.7       2488.1
Right          Unimodal       3      128.0    843.7       201.5       866.4       2960.3
Lower          Unimodal       3      145.8    918.9       210.9       880.3       2844.2
Lower right    Unimodal       3      117.9    805.3       173.9       929.2       2973.8
Centered       Bimodal        3      431.2    2002.6      572.7       149.3       1844.1
Right          Bimodal        3      425.0    1957.6      560.5       199.5       1857.4
Lower          Bimodal        3      488.8    2220.8      584.8       203.7       1502.0
Lower right    Bimodal        3      439.5    2046.5      545.1       232.7       1736.1
Average                       24     296.4    1483.5      390.1       554.3       2275.7
10.5.1 Results A series of 24 simulations with 50 robots was run with the docking station starting in the center of, to the right of, to the lower right of, and below the distribution of robots. Half of the runs were conducted with the robots initially in a unimodal random distribution, and the other half with the robots in a bimodal random distribution. Table 10.1 shows the results of these simulations. Figure 10.8 shows box plots of the time spent in each state for the unimodal runs, and Figure 10.9 shows the corresponding box plots for the bimodal runs.
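As a quick consistency check on Table 10.1, the column means can be recomputed from the eight rows and compared with the reported Average row. The values below are copied from the table; the helper name `mean_time_in_state` and the data layout are assumptions. Every row covers the same number of runs, so an unweighted mean of the row means is appropriate.

```python
STATES = ("Docked", "Exploring", "Seek Home", "Abandoned", "Dead")

# Mean time in each state, copied from Table 10.1 (3 runs per row).
TABLE = {
    ("Centered", "Unimodal"): (194.7, 1072.5, 270.9, 973.7, 2488.1),
    ("Right", "Unimodal"): (128.0, 843.7, 201.5, 866.4, 2960.3),
    ("Lower", "Unimodal"): (145.8, 918.9, 210.9, 880.3, 2844.2),
    ("Lower right", "Unimodal"): (117.9, 805.3, 173.9, 929.2, 2973.8),
    ("Centered", "Bimodal"): (431.2, 2002.6, 572.7, 149.3, 1844.1),
    ("Right", "Bimodal"): (425.0, 1957.6, 560.5, 199.5, 1857.4),
    ("Lower", "Bimodal"): (488.8, 2220.8, 584.8, 203.7, 1502.0),
    ("Lower right", "Bimodal"): (439.5, 2046.5, 545.1, 232.7, 1736.1),
}
REPORTED_AVERAGE = (296.4, 1483.5, 390.1, 554.3, 2275.7)


def mean_time_in_state(distribution=None):
    """Average each state column over the selected rows (all rows if None)."""
    rows = [v for (_, d), v in TABLE.items() if distribution in (None, d)]
    return tuple(sum(col) / len(rows) for col in zip(*rows))


if __name__ == "__main__":
    for name, value in zip(STATES, mean_time_in_state()):
        print(f"{name}: {value:.1f}")
```

The recomputed overall means agree with the reported Average row to within 0.1. Splitting by distribution, the unimodal runs spend more mean time in the dead state than the bimodal runs (about 2816.6 versus 1734.9).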
FIGURE 10.8 Time spent in each state across all unimodal runs. Box plots of time (vertical axis) against robot state (horizontal axis: Docked, Exploring, Seeking Home, Abandoned, Dead). Panels: (a) center unimodal; (b) right unimodal; (c) lower right unimodal; (d) lower unimodal.
FIGURE 10.9 Time spent in each state across all bimodal runs. Box plots of time (vertical axis) against robot state (horizontal axis: Docked, Exploring, Seeking Home, Abandoned, Dead). Panels: (a) center bimodal; (b) right bimodal; (c) lower right bimodal; (d) lower bimodal.
FIGURE 10.10 Power consumption of the docking station for recharging and movement. Plot title: Docking Station Power Utilization Over Time. Series: power used to maneuver the docking station; power used to recharge robots. Horizontal axis: time (0 to 5000). Vertical axis: power used per time step (×10^4).
FIGURE 10.11 Mean power available to the distributed robots. Plot title: Mean Power Across Robots Over Time. Horizontal axis: time (0 to 5000). Vertical axis: mean power (0 to 500).
The power consumption of the docking station for movement and recharging from a single run is shown in Figure 10.10. The mean power available to the robots for a single run is shown in Figure 10.11.
10.6
Future Work
10.6.1 Simulation Extensions In order to increase effectiveness as a model for a potential multi-robot system, there are several extensions necessary for consideration. Among these extensions are additional cost functions for different mission priorities to develop a means of weighting priorities in order to address more complex goals. In terms of the simulation, there are several areas of extension that should be considered: •
Docking process and procedure — The current “instantaneous docking” method will not hold in a real-world system. The simulation requires an extension that causes more time to be expended while the robot is physically docking. However, this time expenditure will be based on the actual physical means of docking and thus will be evaluated when the hardware design is finished. Additionally, there will be some work in allowing for simultaneous docking and
deployment of multiple robots as the present designs call for the inclusion of multiple docking bays. •
Multiple docking stations — In order to support larger teams of deployable robots, multiple docking stations will be advantageous. The development of extensions to the algorithm discussed in Section 10.4.1 is necessary to ensure the correct distribution of multiple docking stations.
•
Communication issues — The present simulation does not take into account transmission delays associated with distance and repeating communication. Thus, a more accurate communication model will be necessary to ensure that the docking station’s estimation of robot position and state are accurate in a real-world scenario. If the docking station is unable to communicate with deployed systems, the docking station’s movement may be driven by restoring communication rather than minimizing power consumption. Additionally, communication on miniature hardware platforms will have to evolve. Presently, very few small robots have the capability of forming self-organized repeating networks. As the cost and availability of single-chip solutions become more feasible, robots integrating these capabilities should appear.
•
Robot deployment — The initial deployment of robots is random, which is a feasible simulation of deployment via airdrop or similar mechanism. However, more effort should be placed on intentional initial deployment by the docking station as well as a better method for redeployment. This may involve swapping tasks for a given robot and it may require the docking station to pick up robots from one area to redirect the search in another. This process is outside of the scope of the “energy minimization” and “resource recovery” aspects of this work, but it is important nonetheless.
•
Robot motion — The random motion when “exploring” by the deployed robots is not particularly useful in terms of actual multi-robot systems. Increasing the accuracy of this behavior will better model the power consumption of the system and thus make the recovery model more accurate.
•
Environments — Presently the simulation is running in an “open field;” the addition of dynamic events in a cluttered environment will be the true test of the simulation.
10.6.2 Hardware Innovations The work discussed in this chapter is the first step in a much larger process that will culminate in the development of a unique multi-robot system. The simulated results in Section 10.5 revolve around the development of the hardware platform capable of actually performing the tasks of transportation, deployment, recovery, and recharge. This new robotic platform will actually
FIGURE 10.12 Concept of the modular mobile docking station.
be built using existing platforms and result in a system similar to Hirose’s “super mechano-colony” [19]. Here, the robots propelling the system will be MegaScouts [32], a ruggedized platform with the strength and versatility to maneuver large payloads. However, unlike the “super mechano-colony,” the MegaScouts will cooperatively manipulate a mobile marsupial docking station capable of deploying a dozen smaller robots. A preliminary concept drawing of the system is shown in Figure 10.12. The reuse of the MegaScouts as the locomotive capability of this system allows the development to focus only on the deployment, recovery, and recharge aspects, which should shorten development time and lower development costs. If necessary, the docking station portion will function as a stationary system and potentially allow deployment of the MegaScouts for other purposes. This allows greater flexibility of usage. The design of the docking station component is dependent on a number of parameters. It must be capable of performing the computation necessary to coordinate the deployed robots. This may involve proxy processing of sensing, which could result in a large amount of resources dedicated to computation. It must also be able to communicate with the deployed systems as well as back to a base station independent of the robot team. Additionally, there must be sufficient energy reserves to operate continuously as well as to resupply other systems. These capabilities must fit into a package that leaves enough volume for transporting robots and is still maneuverable by the cooperating MegaScouts. 10.6.3 Applications There are a number of applications that can make use of a scalable multi-robot system as described in this work. •
Distributed detection and decontamination — Determining the location of hazardous materials requires that robotic team members collect samples of the environment and bring them back for further analysis. The method for recovery described here can optimize the
P1: Binaya Dash October 24, 2007
17:56
7985
7985˙C010
332
Modeling and Control of Complex Systems time that samples are brought back and can be used to retask robots to sample, monitor, or decontaminate areas that have been designated as contaminated.
• Pervasive coverage: The resource sharing and optimization of resource distribution afforded by this model will allow more work in the area of pervasive coverage. A semipermanent sensor network can be created in which the docking station assists in deploying sensors and optimizing their locations to monitor an area for a given task. For example, the greater computational power of the docking station can be used to plan where sensors are needed, using feedback from the deployed robots. As the environment changes, the docking station can reconfigure the team and recharge the robots that are monitoring the area.
• Improved dispersion and recovery models: The proposed hardware system and the software simulation provide a unique testbed for dispersion and recovery algorithms. The ability to deploy multiple robots from a single docking station that can be moved to multiple locations allows for new dispersion methods. Similarly, recovery options increase when the docking station is mobile. The recovery method presented here is just one of many ways in which this system could be used to recover robots.

10.7 Conclusions
This work has discussed the background and limitations of existing algorithms and methods for multi-robot systems. The approach presented attempts to provide a method for multi-robot systems to scale to large numbers in practical and computationally feasible ways. This method is built upon the design of a marsupial system that attempts to maximize system longevity by relocating the docking station in order to minimize the energy expended for recovery. Preliminary simulations are discussed and initial findings are shown. Future work is necessary to extend these simulations to multiple coordinated mobile docking stations and to more complex environments. Initial design considerations for the development of a physical system capable of performing the tasks discussed are also presented.
Acknowledgments
This material is based on work supported under a National Science Foundation Graduate Research Fellowship. This work has also been supported through Grants IIS-0219863, CNS-0224363, CNS-0324864, and CNS-0420836. The authors also wish to acknowledge Casey Carlson for the concept drawings of the mobile docking station.
References
1. J. Adams, R. Bajcsy, J. Kosecka, V. Kumar, R. Mandelbaum, M. Mintz, R. Paul, C. Wang, Y. Yamamoto, and X. Yun. Cooperative material handling by human and robotic agents: Module development and system synthesis. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 1, pages 200–205, Pittsburgh, PA, Aug. 1995.
2. J. A. Adams and R. Paul. Human management of a hierarchical control system for multiple mobile agents. In Proceedings of the IEEE Conference on Decision and Control, pages 3524–3529, Dec. 1994.
3. Sony global - AIBO global link. http://www.sony.net/Products/aibo/index.html.
4. R. Arkin and K. Ali. Integration of reactive and telerobotic control in multi-agent robotic systems. In Proceedings of the Third International Conference on Simulation of Adaptive Behavior, pages 473–478, Brighton, England, Aug. 1994.
5. R. C. Arkin and R. R. Murphy. Autonomous navigation in a manufacturing environment. IEEE Transactions on Robotics and Automation, 6:445–454, Aug. 1990.
6. C. Bererton and P. Khosla. Toward a team of robots with repair capabilities: A visual docking system. Seventh International Symposium on Experimental Robotics, pages 333–342, 2000.
7. C. Bererton, L. Navarro-Serment, R. Grabowski, C. J. J. Paredis, and P. K. Khosla. Millibots: Small distributed robots for surveillance and mapping. In Government Microcircuit Applications Conference, Anaheim, CA, Mar. 2000.
8. C. Carlson, A. Drenner, I. Burt, and N. Papanikolopoulos. Modular mobile docking station design. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006.
9. A. Castaño, R. Chokkalingam, and P. Will. Autonomous and self-sufficient CONRO modules for reconfigurable robots. In Proceedings of the 5th International Symposium on Distributed Autonomous Robotic Systems, pages 155–164, Oct. 2000.
10. A. Castaño, W.-M. Shen, and P. Will. CONRO: Towards deployable robots with inter-robot metamorphic capabilities. Autonomous Robots, 8(3):309–324, 2000.
11. P. I. Corke, S. E. Hrabar, R. Peterson, D. Rus, S. Saripalli, and G. S. Sukhatme. Autonomous deployment and repair of a sensor network using an unmanned aerial vehicle. In IEEE International Conference on Robotics and Automation, pages 3602–3609, Apr. 2004.
12. F. Dellaert, T. Balch, M. Kaess, R. Ravichandran, F. Alegre, M. Berhault, R. McGuire, E. Merrill, L. Moshkina, and D. Walker. The Georgia Tech yellow jackets: A marsupial team for urban search and rescue. In AAAI Mobile Robot Competition Workshop, pages 44–49, Edmonton, Alberta, 2002.
13. A. Drenner, I. Burt, T. Dahlin, B. Kratochvil, C. McMillen, B. Nelson, N. Papanikolopoulos, P. E. Rybski, K. Stubbs, D. Waletzko, and K. B. Yesin. Mobility enhancements to the scout robot platform. In IEEE International Conference on Robotics and Automation, volume 1, pages 1069–1074, Washington, DC, May 2002.
14. D. Fox, W. Burgard, H. Kruppa, and S. Thrun. A probabilistic approach to collaborative multi-robot localization. Autonomous Robots, Special Issue on Heterogeneous Multi-Robot Systems, 8(3):325–344, 2000.
15. D. Gage. Command control for many-robot systems. AUVS-92, the Nineteenth Annual AUVS Technical Symposium. Reprinted in Unmanned Systems Magazine, 10(4):28–34, June 1992.
16. R. Grabowski, L. E. Navarro-Serment, C. J. J. Paredis, and P. Khosla. Heterogeneous teams of modular robots for mapping and exploration. Autonomous Robots, 8(3):293–308, 2000.
17. Y. Hada and S. Yuta. A first experiment of long term activity of autonomous mobile robot: Result of repetitive base-docking over a week. In Proceedings of the ISER 2000 Seventh International Symposium on Experimental Robotics, pages 235–244, Dec. 2000.
18. A. Hayes, A. Martinoli, and R. Goodman. Swarm robotic odor localization. In Proceedings of the 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 2, pages 1073–1078, Maui, Hawaii, Oct. 2001.
19. S. Hirose. Super mechano-system: New perspective for versatile robotic system. In Lecture Notes in Control and Information Sciences, Experimental Robotics VII, pages 249–258, Springer-Verlag, Berlin, 2000.
20. S. Hirose, R. Damoto, and A. Kawakami. Study of super-mechano-colony (concept and basic experimental setup). In Proceedings of the 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 3, pages 1664–1669, Oct. 2000.
21. D. F. Hougen, J. C. Bonney, J. R. Budenske, M. Dvorak, M. Gini, D. G. Krantz, F. Malver, B. Nelson, N. Papanikolopoulos, P. E. Rybski, S. A. Stoeter, R. Voyles, and K. B. Yesin. Reconfigurable robots for distributed robotics. In Government Microcircuit Applications Conference, pages 72–75, Anaheim, CA, Mar. 2000.
22. A. Howard, M. Matarić, and G. Sukhatme. An incremental deployment algorithm for mobile robot teams. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 3, pages 2849–2854, EPFL, Switzerland, Mar. 2002.
23. A. Howard, M. Matarić, and G. Sukhatme. Localization for mobile robot teams using maximum likelihood estimation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 1, pages 434–439, EPFL, Switzerland, Mar. 2002.
24. J. Hyams, M. Powell, and R. Murphy. Position estimation and cooperative navigation of micro-rovers using color segmentation. Autonomous Robots, 9:7–16, 2000.
25. H. Ishida, Y. Kagawa, T. Nakamoto, and T. Moriizumi. Odor-source localization in the clean room by an autonomous mobile sensing system. Sensors and Actuators B, 33:115–121, 1996.
26. H. Ishida, K. Suetsugu, T. Nakamoto, and T. Moriizumi. Study of autonomous mobile sensing system for localization of odor source using gas sensors and anemometric sensors. Sensors and Actuators A, 45:153–157, 1994.
27. E. Kadioglu and N. Papanikolopoulos. A method for transporting a team of miniature robots. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 3, pages 2297–2302, Las Vegas, NV, Oct. 2003.
28. S. Kazadi, R. Goodman, D. Tsikata, D. Green, and H. Lin. An autonomous water vapor plume tracking robot using passive resistive polymer sensors. Autonomous Robots, 9:175–188, 2000.
29. I. Kelly, O. Holland, and C. Melhuish. Slugbot: A robotic predator in the natural world. In Proceedings of the Fifth International Symposium on Artificial Life and Robotics for Human Welfare and Artificial Liferobotics, pages 470–475, Oita, Japan, Jan. 2000.
30. K. Konolige, C. Ortiz, R. Vincent, A. Agno, M. Eriksen, B. Limketkai, M. Lewis, L. Briesemeister, E. Ruspini, D. Fox, J. Ko, B. Stewart, and L. Guibas. Centibots: Large scale robot teams. In A. C. Schultz, L. E. Parker, and F. E. Schneider, editors, Multi-Robot Systems: From Swarms to Intelligent Automata, volume 2, pages 193–204. Kluwer Academic Publishers, Dordrecht, NL, 2003.
31. K. Kotay, D. Rus, and M. Vona. Using modular self-reconfiguring robots for locomotion. In Proceedings of the 7th International Symposium on Experimental Robotics, pages 259–269, Honolulu, HI, Dec. 2000.
32. B. E. Kratochvil, I. T. Burt, A. Drenner, D. Goerke, B. Jackson, C. McMillen, C. Olson, N. Papanikolopoulos, A. Pfeifer, S. A. Stoeter, K. Stubbs, and D. Waletzko. Heterogeneous implementation of an adaptive robotic sensing team. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 3, pages 4264–4269, Taipei, Taiwan, Sept. 2003.
33. H. Kurihara and Y. Matsuo. On analysis and control of collective behavior of a super-mechano colony in an object retrieval mission considering energy recharging and congestion. In Proceedings of the 41st SICE Annual Conference, volume 1, pages 141–145, Aug. 2002.
34. Y. Kuwana, I. Shimoyama, and H. Miura. Steering control of a mobile robot using insect antennae. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 530–535, 1995.
35. Mars microrover power subsystem. http://mars.jpl.nasa.gov/MPF/roverpwr/power.html.
36. D. Marthaler and A. L. Bertozzi. Tracking environmental level sets with autonomous vehicles. In S. Butenko, R. Murphey, and P. Pardalos, editors, Recent Developments in Cooperative Control and Optimization. Kluwer Academic Publishers, Dordrecht, 2003.
37. M. J. Matarić. Designing and understanding adaptive group behavior. Adaptive Behavior, 4(1):51–80, 1995.
38. C. McGray and D. Rus. Self-reconfigurable molecule robots as 3D metamorphic robots. In Proceedings of the 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 2, pages 837–842, Victoria, BC, Canada, Oct. 1998.
39. B. Minten, R. Murphy, J. Hyams, and M. Micire. Low-order-complexity vision-based docking. IEEE Transactions on Robotics and Automation, 17(6):922–930, Dec. 2001.
40. T. Moriizumi and H. Ishida. Robotic systems to track chemical plumes. In Conference on Optoelectronic and Microelectronic Materials and Devices, pages 537–540, Dec. 2002.
41. A. I. Mourikis and S. I. Roumeliotis. Analysis of positioning uncertainty in reconfigurable networks of heterogeneous mobile robots. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 572–579, New Orleans, LA, Apr. 2004.
42. R. R. Murphy. Marsupial and shape-shifting robots for urban search and rescue. IEEE Intelligent Systems, 15(2):14–19, Mar. 2000.
43. R. R. Murphy, M. Ausmus, M. Bugajska, T. Ellis, T. Johnson, N. Kelley, J. Kiefer, and L. Pollock. Marsupial-like mobile robot societies. In Proceedings of the Third Annual Conference on Autonomous Agents, pages 364–365, ACM Press, Seattle, WA, 1999.
44. M. Nilsson. Connectors for self-reconfiguring robots. IEEE Transactions on Mechatronics, 7(4):473–474, Dec. 2002.
45. R. Nowak, U. Mitra, and R. Willett. Estimating inhomogeneous fields using wireless sensor networks. IEEE Journal on Selected Areas in Communications, 22(6):999–1006, 2004.
46. J. O'Rourke. Art Gallery Theorems and Algorithms. Oxford University Press, New York, 1987.
47. J. L. Pearce, P. E. Rybski, S. A. Stoeter, and N. Papanikolopoulos. Dispersion behaviors for a team of multiple miniature robots. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation, pages 1158–1163, Taipei, Taiwan, Sept. 2003.
48. P. Questa, E. Grossmann, and G. Sandini. Camera self orientation and docking maneuver using normal flow. In Proceedings of SPIE, volume 2488, pages 274–283, Orlando, FL, Apr. 1995.
49. Roomba self-charging home base. http://www.irobot.com/.
50. H. Roth and K. Schilling. Navigation and docking maneuvers of mobile robots in industrial environments. In Proceedings of IECON, pages 2458–2462, Aachen, Germany, Aug. 1998.
51. D. Rus. Self-reconfiguring robots. IEEE Intelligent Systems, 13(4):2–4, July 1998.
52. R. Russell, D. Thiel, R. Deveza, and A. Mackay-Sim. A robotic system to locate hazardous chemical leaks. In IEEE International Conference on Robotics and Automation, pages 556–561, 1995.
53. R. Russell, D. Thiel, and A. Mackay-Sim. Sensing odour trails for mobile robot navigation. In IEEE International Conference on Robotics and Automation, volume 3, pages 2672–2677, May 1994.
54. P. Rybski, A. Larson, A. Schoolcraft, S. Osentoski, and M. Gini. Evaluation of control strategies for multi-robot search and retrieval. In Proceedings of the International Conference on Intelligent Autonomous Systems, pages 281–288, Marina del Rey, CA, Mar. 2002.
55. P. E. Rybski, A. Larson, H. Veeraraghavan, M. LaPoint, and M. Gini. Communication strategies in multi-robot search and retrieval: Experiences with minDART. In DARS 2004, pages 301–310, Toulouse, France, June 2004.
56. J. Santos-Victor and G. Sandini. Visual based obstacle detection: A purposive approach using the normal flow. In Proceedings of the International Conference on Intelligent Autonomous Systems, Karlsruhe, Germany, 1995.
57. J. Santos-Victor and G. Sandini. Visual behaviors for docking. Computer Vision and Image Understanding, 67(3):223–238, 1997.
58. W.-M. Shen, B. Salemi, and P. Will. Hormone-inspired adaptive communication and distributed control for CONRO self-reconfigurable robots. IEEE Transactions on Robotics and Automation, 18(5):700–712, Oct. 2002.
59. W.-M. Shen and P. Will. Docking in self-reconfigurable robots. In Proceedings of the 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 2, pages 1049–1054, Maui, Hawaii, Oct. 2001.
60. W.-M. Shen, P. Will, and B. Khoshnevis. Self-assembly in space via self-reconfigurable robots. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation, volume 2, pages 2516–2521, Taipei, Taiwan, Sept. 2003.
61. M. C. Silverman, D. Nies, B. Jung, and G. S. Sukhatme. Staying alive: A docking station for autonomous robot recharging. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation, pages 1050–1055, Washington, DC, May 2002.
62. K. Stoy, W.-M. Shen, and P. Will. Implementing configuration dependent gaits in a self-reconfigurable robot. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation, volume 3, pages 3828–3833, Taipei, Taiwan, Sept. 2003.
63. T. Sugar, J. P. Desai, V. Kumar, and J. Ostrowski. Coordination of multiple mobile manipulators. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 3, pages 3022–3027, 2001.
64. C. W. Therrien. Decision Estimation and Classification. Wiley, New York, 1989.
65. P. M. Vaz, R. Ferreira, V. Grossmann, and M. I. Ribeiro. Docking of a mobile platform based on infrared sensors. In IEEE International Symposium on Industrial Electronics, volume 2, pages 735–740, Guimarães, Portugal, July 1997.
66. W. G. Walter. The Living Brain. W.W. Norton, New York, 1963.
67. R. Willett, A. Martin, and R. Nowak. Backcasting: Adaptive sampling for sensor networks. In Proceedings of Information Processing in Sensor Networks, pages 124–133, 2004.
68. M. Yamakita, Y. Taniguchi, and Y. Shukuya. Analysis of formation control of cooperative transportation of mother ship by SMC. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation, volume 1, pages 951–956, Sept. 2003.
69. Y. Yamamoto and X. Yun. Coordinating locomotion and manipulation of a mobile manipulator. In Proceedings of the 31st Conference on Decision and Control, pages 2643–2648, Tucson, AZ, Dec. 1992.
70. M. Yim, D. G. Duff, and K. Roufas. Modular reconfigurable robots, an approach to urban search and rescue. In 1st Intl. Workshop on Human-Friendly Welfare Robotics Systems, pages 69–76, Taejon, Korea, 2000.
71. M. Yim, D. G. Duff, and K. D. Roufas. Polybot: A modular reconfigurable robot. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 1, pages 514–520, Apr. 2000.
72. M. Yim, Y. Zhang, K. Roufas, D. Duff, and C. Eldershaw. Connecting and disconnecting for chain self-reconfiguration with polybot. IEEE/ASME Transactions on Mechatronics, 7(4):442–451, Dec. 2002.
P1: Binaya Dash/Sanjay Das November 16, 2007
15:52
7985
7985˙C011
11 Modeling and Control in Cancer Genomics
Aniruddha Datta, Ashish Choudhary, Michael L. Bittner, and Edward R. Dougherty
CONTENTS
11.1 Introduction
11.2 Genetic Regulatory Networks and Dynamical Systems
11.3 Intervention
11.4 Dynamic Programming
11.5 Mathematical Details
11.5.1 Introduction
11.5.2 Review of Probabilistic Boolean Networks
11.5.3 Control in Probabilistic Boolean Networks: Problem Formulation
11.5.4 Solution Using Dynamic Programming
11.6 Examples
11.6.1 Simple Illustrative Example
11.6.2 Real-World Example Based on Gene Expression Data
11.7 Concluding Remarks
11.8 Future Directions
Acknowledgments
References
Genomics concerns the study of large sets of genes with the goal of understanding collective function, rather than that of individual genes. Such a study is important because cellular control and its failure in disease result from multivariate activity among cohorts of genes. Very recent research indicates that engineering approaches for prediction, signal processing, and control are quite well suited for studying this kind of multivariate interaction. In this chapter, we will present an overview of the research that has been accomplished thus far in this interdisciplinary field and point out some of the open research challenges that remain.
Among the recent paradigms that have been proposed for modeling genetic regulatory networks are the so-called probabilistic Boolean networks (PBNs). Such rule-based networks provide a convenient tool for studying interactions between different genes while allowing for uncertainty in the knowledge of these relationships. This chapter will first introduce PBNs as a modeling tool and then consider the issue of control in probabilistic Boolean networks. First, we will consider the following control problem: given a PBN whose state transition probabilities depend on an external (control) variable, choose the sequence of control actions to minimize a given performance index over a finite number of steps. This is a standard finite horizon optimal control problem for Markov chains and can be solved using the classical technique of dynamic programming. The choice of the finite horizon performance index is motivated by cancer treatment applications where one would ideally like to intervene only over a finite time horizon, then suspend treatment and observe the effects over some additional time before deciding if further intervention is necessary. A real-world example, utilizing a melanoma cell line, is included to illustrate the key ideas. It is our belief that techniques of this type, which are well proven and time tested in the engineering literature, will one day find application in actual cancer therapy. Having established the connection between optimal control theory and a problem in cancer therapy, we will highlight several challenges that will have to be overcome before such methods can be used in actual clinical practice. We will also report on ongoing work and progress made in overcoming some of these challenges. The first few sections of the chapter are kept nontechnical in an effort to make the results accessible to a wide audience, including biologists and medical practitioners. Such readers can skip directly to Section 11.6.2 from Section 11.4.
11.1 Introduction
Cancer is caused by a breakdown in the cell cycle control system. This usually manifests as uncontrolled cell proliferation or reduced apoptosis, both of which can lead to tumorigenesis and cancer development. Proliferation genes or oncogenes, which turn on cell division, and tumor suppressor genes, which serve as brakes on cell division, play an important role in the initiation, progression, and final tumor development in the disease. The turning ON of oncogenes or the turning OFF of tumor suppressor genes does not usually occur in isolation. Indeed, these events are triggered by many other genes acting together in a collective fashion. Thus, it makes sense to study the holistic behavior of a set of genes in order to gain an understanding of the gene regulatory mechanisms that endow a cell with remarkable adaptivity under normal, disease-free conditions. Such an understanding can also be expected to provide useful pointers toward the regulatory mechanisms that fail in disease and perhaps also suggest appropriate intervention strategies for treatment.
The advent of DNA microarray technology has heralded a new era in the study of intergene relationships [1–5]. By using DNA microarrays, it is now possible to simultaneously monitor the expression status of thousands of genes. By viewing the expression status of the genes across different conditions, it may be possible to establish relationships between the genes that show variations in expression status at least a minimum number of times across the different conditions. For instance, if two genes behave such that both of them turn ON and OFF simultaneously, it is reasonable to infer that they may be coregulated. On the other hand, if one gene turns ON when another turns OFF and vice versa, the expression statuses of the two genes are inversely related and one may be an inhibitor of the other. In general, the expression status of one particular gene will not depend on just one other gene but on a multitude of genes. To establish such multivariate relationships between genes, it makes sense to quantify how our estimate for the expression status of a particular gene, called a target gene, can be improved given knowledge of the expression status of some other genes, called predictor genes. This can be mathematically formalized via the notion of the coefficient of determination (COD). A rigorous treatment of the COD and its use in genomic signal processing, cancer classification, and so forth can be found in References [6–8]. For our purposes here, it is sufficient to note that the COD measures the degree to which the best estimate for the transcriptional activity¹ of a target gene can be improved using the knowledge of the transcriptional activity of some predictor genes, relative to the best estimate in the absence of any knowledge of the transcriptional activity of the predictors.
Although the COD does not tell us anything about whether the transcriptional activity of the target gene is regulated by its predictors, or vice versa, it does indicate the existence of intergene relationships. As mathematically defined, the COD is a number between zero and one, with a higher value indicating a tighter relationship. Given a particular target gene of interest, it is possible that several sets of predictors may provide us with an equally good estimate of its transcriptional activity. This goodness can be measured in terms of the COD. Furthermore, for a particular target gene, it is possible to rank several sets of predictors in terms of their CODs. Such a ranking would provide us with a quantitative measure of the relative ability of each of these predictor sets to improve the estimate of the transcriptional activity of that particular target gene.
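As a rough illustration of this idea, the COD can be written as (e0 − e_opt)/e0, where e0 is the error of the best estimate of the target made without any predictors and e_opt is the error of the best estimate made with them. The sketch below follows that verbal definition only; the binary ON/OFF data, the resubstitution error estimate, and all function names are assumptions made for this example, and the rigorous treatment remains the one in References [6–8].

```python
from collections import Counter, defaultdict

def best_constant_error(target):
    """Error rate of the best predictor-free guess for a binary target."""
    counts = Counter(target)
    return 1.0 - max(counts.values()) / len(target)

def best_predictor_error(predictors, target):
    """Resubstitution error of the best function of the predictor values.

    For each observed predictor pattern, the best guess is the majority
    target value seen together with that pattern.
    """
    by_pattern = defaultdict(Counter)
    for pattern, t in zip(predictors, target):
        by_pattern[tuple(pattern)][t] += 1
    errors = sum(sum(c.values()) - max(c.values()) for c in by_pattern.values())
    return errors / len(target)

def coefficient_of_determination(predictors, target):
    """COD = (e0 - e_opt) / e0, a number between 0 and 1."""
    e0 = best_constant_error(target)
    if e0 == 0.0:
        return 0.0  # constant target: predictors cannot improve the estimate
    e_opt = best_predictor_error(predictors, target)
    return (e0 - e_opt) / e0

# Hypothetical ON/OFF (1/0) data: the target is the AND of two predictor genes.
predictors = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (0, 1), (1, 1), (0, 0)]
target = [a & b for a, b in predictors]

cod_both = coefficient_of_determination(predictors, target)
cod_first_only = coefficient_of_determination([(a,) for a, _ in predictors], target)
print(cod_both, cod_first_only)
```

In this toy data set the pair of predictors determines the target completely, whereas the first predictor alone leaves some residual error, so the two predictor sets would be ranked accordingly.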
¹ The process of synthesizing mRNA from DNA is called transcription.

11.2 Genetic Regulatory Networks and Dynamical Systems
Given a set of genes of interest, one would like to study their behavior in a collective fashion. This can be facilitated by observing the transcriptional activity profile, or gene activity profile, of this set of genes across different conditions and using that knowledge to infer the existence of relationships between the different genes via the coefficient of determination. As already discussed, given any target gene, in general there will be several sets of predictor genes, each with different determinative power as indicated by the COD. Thus, while attempting to infer intergene relationships, it makes sense not to put all our faith in one particular predictor set; instead, for a particular target gene, a better approach would be to consider a number of predictor sets with high CODs, while discarding the predictors with low CODs. Thereafter, each retained predictor set could be inferred to be indicative of the transcriptional activity of the target gene with a chance (probability) proportional to its COD. Having inferred the intergene relationships as above, it is now possible to use this information to model the evolution of the gene activity profile over time. The only assumption that is required is that the transcriptional activity of a given target gene at a particular time point is determined by the transcriptional activity profile of its predictors at the previous time point. Because each target gene is associated with several predictors, it is not possible to say with complete certainty what the transcriptional activity status of that gene will be at the next time point. Instead, one can compute the chances that at the next time step the target gene will be transcriptionally active, based on the information about the gene activity profile at the previous time step. The time evolution of the gene activity profile now defines a dynamic system. The fact that the gene activity profile at a particular time point depends only on the gene activity profile at the immediately preceding time point makes the dynamic system a Markovian one.
Systems of this type have been extensively studied in the dynamic systems literature, and many powerful techniques are available for analyzing their behavior [9]. The ideas articulated in this section have been mathematically formalized in References [10,11] by introducing the so-called probabilistic Boolean networks (PBNs). The Markovian property of such networks has been established and results from Markovian dynamic system theory have been used successfully to study different aspects of their evolutionary behavior [12,13]. Here it is appropriate to mention that the PBNs are a generalization of the Boolean networks introduced earlier in the biological modeling literature by Kauffman [14–16].
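The construction described in this section can be sketched in a few lines. In the toy network below, each gene has a small set of candidate predictor functions, one of which is drawn at every step with a fixed selection probability (proportional to its COD in the discussion above), and the resulting one-step transition probabilities between gene activity profiles are tabulated. The two-gene network, its Boolean rules, the selection probabilities, and the assumption that each gene selects its predictor independently are all invented for illustration and are not taken from References [10,11].

```python
from itertools import product

# Hypothetical two-gene network. For each gene: a list of
# (selection probability, Boolean function of the current profile x).
NETWORK = [
    [(0.7, lambda x: x[0] and x[1]), (0.3, lambda x: x[1])],  # gene 0
    [(1.0, lambda x: not x[0])],                               # gene 1
]

# All gene activity profiles, e.g. (0, 1) means gene 0 OFF and gene 1 ON.
STATES = list(product((0, 1), repeat=len(NETWORK)))

def transition_matrix(network):
    """Return P[s][t], the probability of moving from profile s to profile t."""
    P = {s: {t: 0.0 for t in STATES} for s in STATES}
    for s in STATES:
        # Enumerate every joint choice of one predictor function per gene.
        for choice in product(*network):
            prob = 1.0
            nxt = []
            for p, f in choice:
                prob *= p                      # independent selection (assumed)
                nxt.append(int(bool(f(s))))    # next value of this gene
            P[s][tuple(nxt)] += prob
    return P

P = transition_matrix(NETWORK)
for s in STATES:
    print(s, P[s])
```

Because the next profile depends only on the current one, the table P is the transition matrix of a Markov chain on the four gene activity profiles.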
11.3 Intervention
The PBNs mentioned in the last section are “descriptive” in nature in the sense that they can be used to describe the evolution of the gene activity profile, starting from any initial profile. For treatment or intervention purposes, we are interested in working with “prescriptive” PBNs where the chances of transitioning from one gene activity profile to another depend on certain auxiliary variables, whose values can be chosen to make the gene activity profile evolve in some desirable fashion.
The use of such auxiliary variables makes sense from a biological perspective. For instance, in the case of diseases like cancer, auxiliary treatment inputs such as radiation, chemotherapy, and so forth may be employed to move the gene activity profile away from one that is associated with uncontrolled cell proliferation or markedly reduced apoptosis. The auxiliary variables could also include genes that serve as external master-regulators for all the genes in the network. The values of the individual auxiliary variables, which we also refer to as control inputs, can be changed from one time step to the other in an effort to make the network behave in a desirable fashion. The evolution of the gene activity profile of the PBN with control depends not only on the initial profile but also on the values of the control inputs at different time steps. Furthermore, intuitively it appears that it may be possible to make the gene activity profiles of the network evolve in a desirable fashion by appropriately choosing the control input at each time step. We first provide a nontechnical discussion of the underlying principles. The PBN with control is a special case of what is referred to in the engineering control literature as a controlled Markov chain [17]. Dynamical systems of this type occur in many real-life applications, the most notable example being the control of queues. Given such a controlled Markov chain, the objective is to come up with a sequence of control inputs, usually referred to as a control strategy, such that an appropriate cost function is minimized over the entire class of allowable control strategies. To arrive at a meaningful solution, the cost function must capture the costs and the benefits of using any control. The actual design of a “good” cost function is application dependent and is likely to require considerable expert knowledge. 
We next outline a procedure that we believe would enable us to arrive at a reasonable cost function for determining the course of therapeutic intervention using PBNs. In the case of diseases like cancer, treatment is typically applied over a finite time horizon. For instance, in the case of radiation treatment, the patient may be treated with radiation over a fixed interval of time following which the treatment is suspended for some time as the effects are evaluated. After that, the treatment may be applied again, but the important point to note is that the treatment window at each stage is usually finite. Thus, we will be interested in a finite horizon problem where the control is applied only over a finite number of steps. Suppose that the number of steps over which the control input is to be applied has been determined a priori to be M and we are interested in controlling the behavior of the PBN over the interval k = 0 through k = M − 1. Suppose at time step k, the gene activity profile of the PBN is given by z(k) and the corresponding control input is v(k). Then we can define a cost Ck (z(k), v(k)) as being the cost of applying the control input v(k) when the gene activity profile is z(k). For a given trajectory, the cost of control over the entire treatment horizon is simply the summation of these one-step costs. Recall that starting from a given initial gene activity profile, the evolution of the gene activity profile may follow several different trajectories. Thus, it makes sense to consider the cost of control averaged over all the possible trajectories for evolution.
The averaged cost of control does give us one component of the finite horizon cost. We now proceed to introduce the second component. The net result of the control actions $v(0), v(1), \ldots, v(M-1)$ is that the gene activity profile of the PBN will evolve in time and finally end up in some gene activity profile $z(M)$. Depending on the particular PBN and the control inputs used at each step, it is possible that some of the gene activity profiles may never occur. However, because the control strategy itself has not yet been determined, it would be difficult, if not impossible, to identify and exclude such profiles from further consideration. Accordingly, we assume that all the possible terminal gene activity profiles are reachable and assign a penalty or terminal cost $C_M(z(M))$ associated with each one of them. We next consider penalty assignment. First, consider the PBN with all controls set to zero, that is, all the therapeutic interventions have been deactivated. Then divide the possible gene activity profiles into different categories depending on how desirable or undesirable they are and assign higher terminal costs to the undesirable gene activity profiles. For instance, a gene activity profile associated with rapid cell proliferation leading to cancer should be associated with a high terminal penalty, whereas a gene activity profile associated with normal behavior should be assigned a low terminal penalty. For the purposes of this chapter, we will assume that the assignment of terminal penalties has been carried out and we have at our disposal a terminal penalty $C_M(z(M))$ which is a function of the terminal gene activity profile. Now, starting from a given initial gene activity profile, there is a certain chance of ending up with a particular gene activity profile in $M$ steps. Furthermore, this particular terminal gene activity profile could be attained following different trajectories, each with its own chances of being followed.
Thus, it makes sense to average out the terminal penalty to arrive at the second component of our cost function. The finite horizon cost to be minimized is given by the sum of the averaged cost of control and the averaged terminal penalty. Assuming that the control input v(k) is a function of the current gene activity profile z(k), we now use a mathematical technique called dynamic programming to arrive at an optimal sequence of control inputs that minimizes the finite horizon cost. In the next section, we provide a heuristic discussion of this important procedure.
11.4 Dynamic Programming
Dynamic programming, pioneered by R. Bellman in the 1950s [18], has been applied extensively in engineering applications. The control of queues in a computer server, the optimal scheduling of elevators in a building, or the optimal routing of telephone calls through a communication network are but a few examples of real-world applications where dynamic programming has played an important role.
FIGURE 11.1 Optimal fare selection problem. (Directed graph on the vertices A through J; each arrow is labeled with the fare between its end points.)
To get a more concrete feel for what dynamic programming can do, consider the minimum fare selection problem in Figure 11.1. Here the number alongside an arrow indicates the fare involved in travelling from the vertex at the tail end of the arrow to the one located at the arrow head. For instance, the fare from point A to point C is 5 units. Clearly, there are many different paths that can be used to travel from point A to point J. The problem of interest, however, is to determine the optimal path in traveling from point A to point J, that is, the path for which the fare required is the minimum. This certainly represents a familiar real-world scenario. It can be verified that, in this example, the optimal path is given by A-C-F-H-J. Now suppose that the cost of travel between the different points is uncertain in the sense that at different times, it is represented by different sets of numbers. In this case, it would make sense to minimize the average fares. Roughly speaking, the technique of dynamic programming enables us to systematically determine the path that minimizes the average fare without having to go through unnecessary trial and error. From the discussion presented here, it is intuitively clear that the dynamic programming technique can be used to solve the optimal intervention problem posed in the last section. The technical developments are presented next. To make the presentation self-contained, some of the ideas discussed intuitively so far will be revisited, although at a much higher level of mathematical rigor.
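The idea behind the fare problem can be illustrated with a short sketch. The network below is hypothetical: its vertices and fares are invented for illustration and are not those of Figure 11.1. The function name `cheapest_path` is likewise an assumption. The recursion uses the fact that the cheapest route from any vertex is one hop plus the cheapest route from wherever that hop lands.

```python
from functools import lru_cache

# Hypothetical fares (NOT the values of Figure 11.1): FARES[u][v] is the
# fare for travelling directly from u to v in a small acyclic network.
FARES = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5, "E": 1},
    "C": {"D": 3, "E": 6},
    "D": {"F": 2},
    "E": {"F": 4},
    "F": {},
}


def cheapest_path(fares, start, goal):
    """Return (minimum total fare, path) from start to goal."""

    @lru_cache(maxsize=None)
    def best_from(node):
        # The best route from `node` is one hop plus the best route
        # from the vertex that hop reaches.
        if node == goal:
            return 0, (goal,)
        options = []
        for nxt, fare in fares[node].items():
            tail_cost, tail_path = best_from(nxt)
            options.append((fare + tail_cost, (node,) + tail_path))
        # A dead end gets infinite cost so that it is never selected.
        return min(options) if options else (float("inf"), (node,))

    return best_from(start)


cost, path = cheapest_path(FARES, "A", "F")
print(cost, "-".join(path))
```

Each vertex is solved only once, so the work grows with the number of arrows rather than with the number of distinct paths.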
11.5 Mathematical Details
11.5.1 Introduction
Probabilistic Boolean networks (PBNs) have been proposed recently as a paradigm for studying gene regulatory networks [10]. These networks, which allow the incorporation of uncertainty into the intergene relationships, are
essentially probabilistic generalizations of the standard Boolean networks introduced by Kauffman [14–16]. Given a PBN, the transition from one state to the next takes place in accordance with certain transition probabilities. Indeed, as shown in Reference [10], and as will be briefly reviewed in the next subsection, the states of a PBN form a homogeneous Markov chain with finite state space. Thus the PBNs form a subclass of the general class of Markovian genetic regulatory networks. The PBNs considered thus far in the literature can be described by Markov chains with fixed transition probabilities. Consequently, for such a network, given an initial state, the subsequent states evolve according to a priori determined probabilities. This setup provides a model for dynamically tracking the gene activity profile while allowing for uncertainty in the relationship between the different genes. However, it does not provide any effective knobs that could be used to externally guide the time evolution of the PBN, hopefully toward more desirable states. Intervention has been considered in the context of PBNs from other perspectives. By exploiting concepts from Markov chain theory, it has been shown how at a given state, one could toggle the expression status of a particular gene from ON to OFF or vice versa to facilitate transition to some other desirable state or set of states [12]. Specifically, using the concept of the mean first passage time, it has been shown how the particular gene, whose transcription status is to be momentarily altered to initiate the state transition, can be chosen to “minimize” in a probabilistic sense the time required to achieve the desired state transitions. These results come under the category of “transient” intervention, which essentially amounts to letting the original network evolve after reinitializing the state to a different value. 
A second approach has aimed at changing the steady-state (long-run) behavior of the network by minimally altering its rule-based structure [13]. This too constitutes transient intervention, but is more permanent in that it involves structural intervention. In this section, we consider PBNs where the transition probabilities between the various states can be altered by the choice of some auxiliary variables. These variables, which we will refer to as control inputs, can then be chosen to increase the likelihood that the network will transition from an undesirable state to a desirable one. Such a situation is likely to arise in the treatment of diseases such as cancer where the auxiliary variables could represent the current status of therapeutic interventions such as radiation, chemotherapy, and so forth. To be consistent with the binary nature of the state space associated with PBNs, these auxiliary control inputs will be allowed to be in one of two states: an ON state, indicating that a particular intervention is being actively applied at that point in time, and an OFF state, indicating that the application of that particular intervention has ceased. The control objective here would be to “optimally” apply one or more treatments so that an appropriate cost function is minimized over a finite number of steps, which we will refer to as the treatment horizon. The choice of the cost function, as well as the length of the treatment window, are two important aspects where the expert knowledge from biologists/clinicians could play a crucial role.
Once the cost function and the treatment window have been selected, the control problem is essentially reduced to that of controlling a Markov chain over a finite horizon. Control problems of this type have been studied extensively in the controls literature for over four decades. Among the different solution methods available, the most popular one is the technique of dynamic programming, pioneered by Bellman in the 1950s [17,18]. In this section, we will formulate the optimal control problem for a PBN and arrive at a solution based on the dynamic programming approach. This section is organized as follows. In Subsection 11.5.2, we provide a brief review of PBNs as introduced in Reference [10]. In Subsection 11.5.3, we formulate the control problem for PBNs. The solution to this problem using the dynamic programming technique is presented in Subsection 11.5.4.
11.5.2 Review of Probabilistic Boolean Networks
In this subsection, we provide a brief review of PBNs. We will only focus on those aspects that are critical to the development in this section. For a detailed and complete exposition, the reader is referred to References [10–12]. A probabilistic Boolean network is a formalism that has been developed for modeling the behavior of gene regulatory networks. In such a network, each gene can take on one of two binary values, zero or one. A zero value for a gene corresponds to the case when that particular gene is not expressed and a one value indicates that the corresponding gene has been turned ON. The functional dependency of a given gene value on all the genes in the network is given in terms of a single Boolean function or a family of Boolean functions. The case of a single Boolean function for each gene arises when the functional relationships between the different genes in the network are exactly known. Such a situation is not very likely to occur in practice.
Nevertheless, networks of this type, referred to as standard Boolean networks [16], have been studied extensively in the literature. To account for uncertainty in our knowledge of the functional dependencies between the different genes, one could postulate that the expression level of a particular gene in the network is described by a family of Boolean functions with finite cardinality. Furthermore, each member of this family is assumed to describe the functional relationship with a certain probability. This leads to a PBN, as introduced in Reference [10]. Our discussion so far has only concentrated on the static relationships between the different genes in the network. To introduce dynamics, we assume that in each time step, the value of each gene is updated using the Boolean functions evaluated at the previous time step. For PBNs, the expression level of each gene will be updated in accordance with the probabilities corresponding to the different Boolean functions associated with that particular gene. To concretize matters, let us assume that we are attempting to model the relationship between n genes. Suppose that the activity level of gene i at time step k is denoted by xi (k). Thus, xi (k) = 0 would indicate that at the kth time step, the ith gene is not expressed, whereas xi (k) = 1 would indicate
that the corresponding gene is expressed. The overall expression levels of all the genes in the network at time step $k$ are given by the row vector $x(k) = [x_1(k), x_2(k), \ldots, x_n(k)]$. This vector is sometimes referred to as the gene activity profile (GAP) of the network at time $k$. Now suppose that for each gene $i$, there are $l(i)$ possible Boolean functions:
$$f_1^{(i)}, f_2^{(i)}, f_3^{(i)}, \ldots, f_{l(i)}^{(i)}$$
that can be used to describe the dependency of $x_i$ on $x_1, x_2, \ldots, x_n$. Furthermore, suppose that $f_j^{(i)}$ is selected with a probability $c_j^{(i)}$ so that:
$$\sum_{j=1}^{l(i)} c_j^{(i)} = 1.$$
Then the expression level of the $i$th gene transitions according to the equation:
$$x_i(k+1) = f_j^{(i)}(x(k)) \quad \text{with probability } c_j^{(i)}. \tag{11.1}$$
Let us consider the evolution of the entire state vector $x(k)$. Corresponding to a PBN with $n$ genes, there are at most $N = \prod_{i=1}^{n} l(i)$ distinct Boolean networks, each of which could capture the intergene functional relationships with a certain probability. Let $P_1, P_2, \ldots, P_N$ be the probabilities associated with the selection of each of these networks. Suppose the $k$th network is obtained by selecting the functional relationship $f_{i_k}^{(i)}$ for gene $i$, $i = 1, 2, \ldots, n$, $1 \le i_k \le l(i)$. Then, if the choice of the functional relationship for each gene is assumed to be independent of that for other genes, we have:
$$P_k = \prod_{i=1}^{n} c_{i_k}^{(i)}. \tag{11.2}$$
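As a small numerical illustration of Equation (11.2), the sketch below enumerates the constituent networks of a three-gene PBN under the independence assumption. The selection probabilities are borrowed from the last row of Table 11.1 in Section 11.6.1; the function name `network_probabilities` and the use of 0-based function indices are assumptions made for this sketch.

```python
from itertools import product
from math import prod

# Selection probabilities c_j^(i) for each gene, taken from the last row of
# Table 11.1: gene 1 has two candidate functions, gene 2 one, gene 3 two.
SELECTION_PROBS = [[0.6, 0.4], [1.0], [0.5, 0.5]]


def network_probabilities(selection_probs):
    """Map each tuple of 0-based function indices to its probability P_k."""
    index_ranges = [range(len(c)) for c in selection_probs]
    return {
        choice: prod(c[j] for c, j in zip(selection_probs, choice))
        for choice in product(*index_ranges)
    }


networks = network_probabilities(SELECTION_PROBS)
for choice, p in networks.items():
    print(choice, p)
```

Here $N = 2 \cdot 1 \cdot 2 = 4$ networks are produced, and their probabilities sum to one because each gene's selection probabilities do.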
As discussed in Reference [10], even when there are dependencies between the choice of the functional relationships for different genes, one can calculate the $P_i$s by using conditional probabilities instead of the unconditional ones $c_j^{(i)}$. The evolution of the states of the PBN can be described by a finite Markov chain model. To do so, we first focus on standard Boolean networks. Then the state vector $x(k)$ at any time step $k$ is essentially an $n$-digit binary number whose decimal equivalent is given by:
$$y(k) = \sum_{j=1}^{n} 2^{n-j} x_j(k). \tag{11.3}$$
As $x(k)$ ranges from $000 \cdots 0$ to $111 \cdots 1$, $y(k)$ takes on all values from 0 to $2^n - 1$. Now to be completely consistent with the development in Reference [10], define:
$$z(k) = 1 + y(k). \tag{11.4}$$
Then as $x(k)$ ranges from $00 \cdots 0$ to $11 \cdots 1$, $z(k)$ will take on all values from 1 to $2^n$. Clearly, the map from $x(k)$ to $z(k)$ is one-to-one, onto and hence
invertible. Thus, instead of the binary representation $x(k)$ for the state vector, one could equivalently work with the decimal representation $z(k)$. Furthermore, each $z(k)$ could be uniquely represented by a basis vector $w(k) \in \mathbb{R}^{2^n}$ where $w(k) = e_{z(k)}$, for example, if $z(k) = 1$, then $w(k) = [1, 0, \cdots]$. Then, as discussed in Reference [10], the evolution of the vector $w(k)$ proceeds according to the following difference equation:
$$w(k+1) = w(k) A \tag{11.5}$$
where $A$ is a $2^n \times 2^n$ matrix having only one nonzero entry in each row.² Equation (11.5) is reminiscent of the state transition equation in Markov chain theory. The only difference here is that for a given initial state, the transition is completely deterministic. However, Equation (11.5) can also be interpreted easily within a stochastic framework. For instance, the vector $w(k)$ does represent the probability distribution over the entire state space at time step $k$. Indeed, because of the deterministic nature of the evolution, at each time step $k$, the entire probability mass is concentrated on only one out of the $2^n$ possible states, thereby accounting for the $2^n$-dimensional vectors $w(k)$ with only one nonzero entry of one corresponding to the location where the probability mass is concentrated. The matrix $A$ also qualifies as a bona fide stochastic matrix with the sole nonzero entry in each row being equal to one. Thus, given an initial state, the transition to the next state is deterministic and takes place with probability one. The stochastic interpretation of Equation (11.5) given above allows us to readily extend it to accommodate state transitions in PBNs. Toward this end, let $a$ and $b$ be any two basis vectors in $\mathbb{R}^{2^n}$. Then, using the total probability theorem, it follows that the transition probability $\Pr\{w(k+1) = a \mid w(k) = b\}$ is given by:
$$\Pr\{w(k+1) = a \mid w(k) = b\} = \sum_{s=1}^{N} \Pr\{w(k+1) = a \mid w(k) = b, \text{Network } s \text{ is selected}\} \cdot P_s = \sum_{s \in S} P_s \tag{11.6}$$
where $S = \{s : \Pr\{w(k+1) = a \mid w(k) = b, \text{Network } s \text{ is selected}\} = 1\}$.
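The construction in Equations (11.3) through (11.6) can be sketched in a few lines. The truth-table columns and selection probabilities below are those of Table 11.1 in Section 11.6.1; the function name `transition_matrix`, the 0-based indexing (row or column index equals $z - 1$), and the convention that the row is the current state and the column the next state are choices made for this sketch, assuming independent function selection as in Equation (11.2).

```python
from itertools import product

# Truth-table columns from Table 11.1, indexed by y = 4*x1 + 2*x2 + x3.
FUNCS = [
    [[0, 1, 1, 1, 0, 1, 1, 1], [0, 1, 1, 0, 0, 1, 1, 1]],  # gene 1
    [[0, 1, 1, 0, 1, 1, 0, 1]],                            # gene 2
    [[0, 0, 0, 1, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 1]],  # gene 3
]
PROBS = [[0.6, 0.4], [1.0], [0.5, 0.5]]


def transition_matrix(funcs, probs):
    """Accumulate P_s over the networks that realize each transition."""
    n = len(funcs)
    size = 2 ** n
    A = [[0.0] * size for _ in range(size)]
    for choice in product(*(range(len(f)) for f in funcs)):
        # Probability P_s of this constituent network, Equation (11.2).
        p_s = 1.0
        for gene, j in enumerate(choice):
            p_s *= probs[gene][j]
        for y in range(size):
            # Next gene activity profile under network s, then its
            # decimal value as in Equation (11.3).
            next_bits = [funcs[gene][j][y] for gene, j in enumerate(choice)]
            y_next = sum(bit << (n - 1 - g) for g, bit in enumerate(next_bits))
            A[y][y_next] += p_s
    return A


A = transition_matrix(FUNCS, PROBS)
for row in A:
    print(row)
```

Each row of the result sums to one, which is what makes the matrix usable as the transition probability matrix of a Markov chain.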
² Row $a$ has a 1 in column $b$ if, given $w(k) = a$, $w(k+1) = b$.
By letting the vectors $a$ and $b$ range over all possible basis vectors in $\mathbb{R}^{2^n}$, we can determine the $2^n \times 2^n$ entries of the transition probability matrix $A$. Now let $w(k)$ denote the probability distribution vector at time $k$, that is, $w_i(k) = \Pr\{z(k) = i\}$. It is straightforward to show that $w(k)$ evolves according to the equation:
$$w(k+1) = w(k) A \tag{11.7}$$
where the entries of the $A$ matrix have been determined using Equation (11.6). This completes our discussion of PBNs. For a more rigorous derivation of Equation (11.7), the reader is referred to Reference [10].
11.5.3 Control in Probabilistic Boolean Networks: Problem Formulation
Probabilistic Boolean networks can be used for studying the dynamic behavior of gene regulatory networks. However, once a probability distribution vector has been specified for the initial state, the subsequent probability distribution vectors evolve according to Equation (11.7) and there is no mechanism for “controlling” this evolution. Thus, the PBNs discussed thus far in this section are “descriptive” in nature in the sense that they can be used to describe the evolution of the probability distribution vector, starting from any initial distribution. For treatment or intervention purposes, we are interested in working with “prescriptive” PBNs where the transition probabilities of the associated Markov chain depend on certain auxiliary variables, whose values can be chosen to make the probability distribution vector evolve in some desirable fashion. The use of such auxiliary variables makes sense from a biological perspective. For instance, in the case of diseases like cancer, auxiliary treatment inputs such as radiation, chemotherapy, and so forth may be employed to move the state probability distribution vector away from one that is associated with uncontrolled cell proliferation or markedly reduced apoptosis. The auxiliary variables could also include genes that serve as external master-regulators for all the genes in the network. To be consistent with the binary nature of the expression status of individual genes in the PBN, we will assume that the auxiliary variables (control inputs) can take on only the binary values zero or one. The values of the individual control inputs can be changed from one time step to the other in an effort to make the network behave in a desirable fashion.
Suppose that a PBN with $n$ genes has $m$ control inputs $u_1, u_2, \ldots, u_m$. Then at any given time step $k$, the row vector $u(k) = [u_1(k), u_2(k), \ldots, u_m(k)]$ describes the complete status of all the control inputs. Clearly, $u(k)$ can take on all binary values from $[0, 0, \ldots, 0]$ to $[1, 1, \ldots, 1]$. As in the case of the state vector, one can equivalently represent the control input status using the decimal number:
$$v(k) = 1 + \sum_{i=1}^{m} 2^{m-i} u_i(k). \tag{11.8}$$
Clearly, as $u(k)$ takes on binary values from $[0, 0, \ldots, 0]$ to $[1, 1, \ldots, 1]$, the variable $v(k)$ ranges from 1 to $2^m$. We can equivalently use $v(k)$ as an indicator of the complete control input status of the PBN at time step $k$.
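The encoding in Equation (11.8) and its inverse can be written as a pair of short helper functions. The names `control_index` and `control_vector` are hypothetical and serve only this sketch.

```python
def control_index(u):
    """Equation (11.8): v = 1 + sum_i 2^(m-i) u_i for binary u_1, ..., u_m."""
    m = len(u)
    return 1 + sum(bit << (m - 1 - i) for i, bit in enumerate(u))


def control_vector(v, m):
    """Inverse map: recover [u_1, ..., u_m] from v in {1, ..., 2^m}."""
    return [((v - 1) >> (m - 1 - i)) & 1 for i in range(m)]


print(control_index([1, 0]))   # u = [1, 0] with m = 2
print(control_vector(3, 2))
```

The same two functions apply to the state encoding of Equations (11.3) and (11.4), since $z(k)$ is built from $x(k)$ in exactly the same way.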
We now proceed to derive the counterpart of Equation (11.7) for a PBN subject to auxiliary controls. Let $v^*$ be any integer between 1 and $2^m$ and suppose that $v(k) = v^*$. Then, it is clear that the procedure outlined in the last subsection can be used to compute the corresponding $A$ matrix, which will now depend on $v^*$ and can be denoted by $A(v^*)$. Furthermore, the evolution of the probability distribution vector at time $k$ will take place according to the following equation:
$$w(k+1) = w(k) A(v^*). \tag{11.9}$$
Because the choice of $v^*$ is arbitrary, the one-step evolution of the probability distribution vector in the case of a PBN with control inputs takes place according to the equation:
$$w(k+1) = w(k) A(v(k)). \tag{11.10}$$
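To make Equation (11.10) concrete, the sketch below propagates a distribution vector through a chosen control sequence. The two matrices are those of Equations (11.22) and (11.23) in the illustrative example of Section 11.6.1, with $P_1 = P_2 = 0.5$; the function names `step` and `propagate` are assumptions of this sketch.

```python
# Controlled transition matrices from Equations (11.22) and (11.23) of the
# illustrative example in Section 11.6.1, with P1 = P2 = 0.5.
A = {
    1: [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0.5, 0.5, 0, 0]],
    2: [[0, 0, 1, 0], [0, 0, 0.5, 0.5], [0.5, 0.5, 0, 0], [0, 0, 0, 1]],
}


def step(w, matrix):
    """One application of w(k+1) = w(k) A: a row vector times a matrix."""
    size = len(w)
    return [sum(w[i] * matrix[i][j] for i in range(size)) for j in range(size)]


def propagate(w0, controls):
    """Apply Equation (11.10) for the control sequence v(0), v(1), ..."""
    w = list(w0)
    for v in controls:
        w = step(w, A[v])
    return w


# Start in state z = 4 with certainty, apply v(0) = 1 and then v(1) = 2.
print(propagate([0, 0, 0, 1], [1, 2]))
```

Changing the control sequence changes which matrix is applied at each step, and hence where the probability mass ends up.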
Note that the transition probability matrix here is a function of all the control inputs u1 (k), u2 (k), · · ·, um (k). Consequently, the evolution of the probability distribution vector of the PBN with control now depends not only on the initial distribution vector but also on the values of the control inputs at different time steps. Furthermore, intuitively it appears that it may be possible to make the states of the network evolve in a desirable fashion by appropriately choosing the control input at each time step. We next proceed to formalize these ideas. Equation (11.10) is referred to in the control literature as a controlled Markov chain [17]. Markov chains of this type occur in many real-life applications, the most notable example being the control of queues. Given such a controlled Markov chain, the objective is to come up with a sequence of control inputs, usually referred to as a control strategy, such that an appropriate cost function is minimized over the entire class of allowable control strategies. To arrive at a meaningful solution, the cost function must capture the costs and the benefits of using any control. The actual design of a “good” cost function is application dependent and is likely to require considerable expert knowledge. We next outline a procedure that we believe would enable us to arrive at a reasonable cost function for determining the course of therapeutic intervention using PBNs. In the case of diseases like cancer, treatment is typically applied over a finite time horizon. For instance, in the case of radiation treatment, the patient may be treated with radiation over a fixed interval of time following which the treatment is suspended for some time as the effects are evaluated. After that, the treatment may be applied again but the important point to note is that the treatment window at each stage is usually finite. 
Thus, we will be interested in a finite horizon problem where the control is applied only over a finite number of steps. Suppose that the number of steps over which the control input is to be applied has been determined a priori to be $M$ and we are interested in controlling the behavior of the PBN over the interval $k = 0, 1, 2, \ldots, M-1$. Suppose at time step $k$, the state³ of the PBN is given by $z(k)$ and the corresponding control input is $v(k)$. Then we can define a cost $C_k(z(k), v(k))$ as being the cost of applying the control input $v(k)$ when the state is $z(k)$. With this definition, the expected cost of control over the entire treatment horizon becomes:
$$E\left[\sum_{k=0}^{M-1} C_k(z(k), v(k)) \,\Big|\, z(0)\right]. \tag{11.11}$$
Note that even if the network starts from a given (deterministic) initial state $z(0)$, the subsequent states will be random because of the stochastic nature of the evolution in Equation (11.10). Consequently, the cost in Equation (11.11) had to be defined using an expectation. Equation (11.11) does give us one component of the finite horizon cost, namely the cost of control. We now proceed to introduce the second component. The net result of the control actions $v(0), v(1), \ldots, v(M-1)$ is that the state of the PBN will transition according to Equation (11.10) and will end up in some state $z(M)$. Because of the probabilistic nature of the evolution, the terminal state $z(M)$ is a random variable that could possibly take on any of the values $1, 2, \ldots, 2^n$. Depending on the particular PBN and the control inputs used at each step, it is possible that some of these states may never be reached because of noncommunicating states in the resulting Markov chains, and so forth. However, because the control strategy itself has not yet been determined, it would be difficult, if not impossible, to identify and exclude such states from further consideration. Instead, we assume that all the $2^n$ terminal states are reachable and assign a penalty or terminal cost $C_M(z(M))$ associated with each one of them. Indeed, in the case of PBNs with perturbation, all states communicate and the Markov chain is ergodic [12]. We next consider penalty assignment. First, consider the PBN with all controls set to zero, that is, $v(k) \equiv 1$ for all $k$. Then divide the states into different categories depending on how desirable or undesirable they are and assign higher terminal costs to the undesirable states. For instance, a state associated with rapid cell proliferation leading to cancer should be associated with a high terminal penalty, whereas a state associated with normal behavior should be assigned a low terminal penalty.
For the purposes of this chapter, we will assume that the assignment of terminal penalties has been carried out and we have at our disposal a terminal penalty $C_M(z(M))$ which is a function of the terminal state. Thus, we have arrived at the second component of our cost function. Once again, note that the quantity $C_M(z(M))$ is a random variable and so we must take its expectation while defining the cost function to be minimized. In view of Equation (11.11), the finite horizon cost to be minimized is given by:
$$E\left[\sum_{k=0}^{M-1} C_k(z(k), v(k)) + C_M(z(M)) \,\Big|\, z(0)\right]. \tag{11.12}$$
³ In the rest of this chapter, we will be referring to $z(k)$ as the state of the probabilistic Boolean network because, as discussed in Section 11.5.2, $z(k)$ is equivalent to the actual state $x(k)$.
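For a fixed control sequence, the expectation in Equation (11.12) can be evaluated exactly by carrying the distribution vector forward with Equation (11.10) and weighting the costs by it. The sketch below does this with data borrowed from the illustrative example of Section 11.6.1: the matrices of Equations (11.22) and (11.23) with $P_1 = P_2 = 0.5$, the terminal penalties of Equation (11.24), and the stage cost of Equation (11.25). The function names are assumptions of this sketch.

```python
# Data from Section 11.6.1: A(1), A(2) with P1 = P2 = 0.5, and the terminal
# penalties C_5(1), ..., C_5(4) of Equation (11.24).
A = {
    1: [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0.5, 0.5, 0, 0]],
    2: [[0, 0, 1, 0], [0, 0, 0.5, 0.5], [0.5, 0.5, 0, 0], [0, 0, 0, 1]],
}
TERMINAL = [0, 1, 2, 3]


def stage_cost(z, v):
    # Equation (11.25) with m = 1: the cost is u_1 = v - 1, independent of z.
    return v - 1


def expected_cost(z0, controls):
    """Evaluate Equation (11.12) for the control sequence v(0), ..., v(M-1)."""
    size = len(TERMINAL)
    w = [1.0 if i == z0 - 1 else 0.0 for i in range(size)]
    total = 0.0
    for v in controls:
        # Expected one-step cost under the current distribution w(k).
        total += sum(w[i] * stage_cost(i + 1, v) for i in range(size))
        # Advance the distribution: w(k+1) = w(k) A(v(k)).
        w = [sum(w[i] * A[v][i][j] for i in range(size)) for j in range(size)]
    # Add the expected terminal penalty.
    return total + sum(w[i] * TERMINAL[i] for i in range(size))


print(expected_cost(3, [1, 1, 1, 1, 1]))   # never intervene
print(expected_cost(3, [1, 1, 1, 1, 2]))   # intervene at the last step only
```

Comparing the two printed values shows how a single well-timed intervention can lower the total expected cost despite its own price.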
To proceed further, let us assume that at time $k$, the control input $v(k)$ is a function of the current state $z(k)$, that is,
$$v(k) = \mu_k(z(k)) \tag{11.13}$$
where $\mu_k : \{1, 2, \ldots, 2^n\} \to \{1, 2, \ldots, 2^m\}$. The optimal control problem can now be stated as follows: Given an initial state $z(0)$, find a control law $\pi = \{\mu_0, \mu_1, \ldots, \mu_{M-1}\}$ that minimizes the cost functional:
$$J_\pi(z(0)) = E\left[\sum_{k=0}^{M-1} C_k(z(k), \mu_k(z(k))) + C_M(z(M))\right] \tag{11.14}$$
subject to the constraint:
$$\Pr\{z(k+1) = j \mid z(k) = i\} = a_{ij}(v(k)) \tag{11.15}$$
where $a_{ij}(v(k))$ is the $i$th row, $j$th column entry of the matrix $A(v(k))$.
11.5.4 Solution Using Dynamic Programming
Optimal control problems of the type described by Equations (11.14) and (11.15) can be solved by using the technique of dynamic programming. This technique, pioneered by Bellman in the 1950s, is based on the so-called principle of optimality. This principle is a simple but powerful concept and can be explained as follows. Suppose that we have an optimization problem where we are interested in optimizing a performance index over a finite number of steps, say $M$. At each step, a decision is made and the objective is to come up with a strategy or sequence of $M$ decisions which is optimal in the sense that the cumulative performance index over all the $M$ steps is optimized. In general, such an optimal strategy may not exist. However, when such an optimal strategy does exist, the principle of optimality asserts the following: if one searches for an optimal strategy over a subset of the original number of steps, then this new optimal strategy will be given by the overall optimal strategy, restricted to the steps being considered. Although intuitively obvious, the principle of optimality can have far-reaching consequences. For instance, it can be used to obtain the following proposition proven in Reference [17] (Chapter 1, page 23).
PROPOSITION 1 Let $J^*(z(0))$ be the optimal value of the cost functional (11.14). Then:
$$J^*(z(0)) = J_0(z(0))$$
where the function $J_0$ is given by the last step of the following dynamic programming algorithm which proceeds backward in time from time step $M-1$ to time step 0:
$$J_M(z(M)) = C_M(z(M)) \tag{11.16}$$
$$J_k(z(k)) = \min_{v(k) \in \{1, 2, \ldots, 2^m\}} E\{C_k(z(k), v(k)) + J_{k+1}(z(k+1))\}, \quad k = 0, 1, 2, \ldots, M-1. \tag{11.17}$$
Furthermore, if $v^*(k) = \mu_k^*(z(k))$ minimizes the right-hand side of Equation (11.17) for each $z(k)$ and $k$, the control law $\pi^* = \{\mu_0^*, \mu_1^*, \ldots, \mu_{M-1}^*\}$ is optimal. Note that the expectation on the right-hand side of Equation (11.17) is conditioned on $z(k)$ and $v(k)$. Hence, in view of Equation (11.15), it follows that:
$$E[J_{k+1}(z(k+1)) \mid z(k), v(k)] = \sum_{j=1}^{2^n} a_{z(k),j}(v(k)) \, J_{k+1}(j).$$
Thus, the dynamic programming solution to Equations (11.14) and (11.15) is given by:
$$J_M(z(M)) = C_M(z(M)) \tag{11.18}$$
$$J_k(z(k)) = \min_{v(k) \in \{1, 2, \ldots, 2^m\}} \left[ C_k(z(k), v(k)) + \sum_{j=1}^{2^n} a_{z(k),j}(v(k)) \, J_{k+1}(j) \right], \quad k = 0, 1, 2, \ldots, M-1. \tag{11.19}$$
11.6 Examples
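As a first illustration, the backward recursion of Equations (11.18) and (11.19) can be sketched as a short program. The routine `solve` and its argument layout are assumptions made for this sketch, not an implementation from the chapter. The input data are those of the simple illustrative example of Section 11.6.1: the matrices of Equations (11.22) and (11.23) with $P_1 = P_2 = 0.5$, the terminal penalties of Equation (11.24), the stage cost of Equation (11.25), and $M = 5$. Ties between controls, if any, are broken in favor of the smaller $v$, which is an arbitrary choice.

```python
def solve(A, stage_cost, terminal, M):
    """Backward recursion (11.18)-(11.19); returns cost-to-go and policy.

    J[k][z-1] is J_k(z); policy[k][z-1] is the minimizing v at step k.
    """
    size = len(terminal)
    J = [None] * (M + 1)
    policy = [None] * M
    J[M] = list(terminal)                      # Equation (11.18)
    for k in range(M - 1, -1, -1):             # backward in time
        J[k], policy[k] = [], []
        for z in range(1, size + 1):
            # Bracketed term of Equation (11.19) for every admissible v.
            candidates = [
                (stage_cost(k, z, v)
                 + sum(A[v][z - 1][j] * J[k + 1][j] for j in range(size)), v)
                for v in sorted(A)
            ]
            best_value, best_v = min(candidates)
            J[k].append(best_value)
            policy[k].append(best_v)
    return J, policy


# Data of Section 11.6.1 (P1 = P2 = 0.5, M = 5, cost u_1 = v - 1).
A = {
    1: [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0.5, 0.5, 0, 0]],
    2: [[0, 0, 1, 0], [0, 0, 0.5, 0.5], [0.5, 0.5, 0, 0], [0, 0, 0, 1]],
}
J, policy = solve(A, lambda k, z, v: v - 1, [0, 1, 2, 3], 5)
print("J_0:", J[0])
for k, mu in enumerate(policy):
    print("mu_%d:" % k, mu)
```

With these inputs the recursion selects the intervention ($v = 2$) only at the final step and only in state $z = 3$; at every other step and state the control stays off.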
In this section, we present two examples to show optimal control design using the dynamic programming approach. The first example is a simple contrived one for illustrative purposes only, whereas the second one is a realistic example based on actual gene expression data.
11.6.1 Simple Illustrative Example
In this subsection, we present an example of a PBN with control and work through the details to show how Equations (11.18) and (11.19) can be used in arriving at an optimal control strategy. The example we consider is adapted from Example 1 in Reference [10]. That example involves a PBN with three genes, $x_1$, $x_2$, and $x_3$. There are two functions $f_1^{(1)}, f_2^{(1)}$ associated with $x_1$, one function $f_1^{(2)}$ associated with $x_2$, and two functions $f_1^{(3)}, f_2^{(3)}$ associated with $x_3$. These functions are given by the truth table shown in Table 11.1.
TABLE 11.1 Truth Table for Example 1 in Reference [10]
| $x_1 x_2 x_3$ | $f_1^{(1)}$ | $f_2^{(1)}$ | $f_1^{(2)}$ | $f_1^{(3)}$ | $f_2^{(3)}$ |
|---|---|---|---|---|---|
| 000 | 0 | 0 | 0 | 0 | 0 |
| 001 | 1 | 1 | 1 | 0 | 0 |
| 010 | 1 | 1 | 1 | 0 | 0 |
| 011 | 1 | 0 | 0 | 1 | 0 |
| 100 | 0 | 0 | 1 | 0 | 0 |
| 101 | 1 | 1 | 1 | 1 | 0 |
| 110 | 1 | 1 | 0 | 1 | 0 |
| 111 | 1 | 1 | 1 | 1 | 1 |
| $c_j^{(i)}$ | 0.6 | 0.4 | 1 | 0.5 | 0.5 |
The truth table corresponds to an uncontrolled PBN. To introduce control, let us assume that $x_1$ is now going to be a control input whose value can be switched externally between 0 and 1 and the states of the new PBN are $x_2$ and $x_3$. To be consistent with the notation introduced in Section 11.5.3, the variables $x_1$, $x_2$, and $x_3$ will be renamed; the variable $x_1$ now becomes $u_1$, whereas the variables $x_2$ and $x_3$ become $x_1$ and $x_2$, respectively. With this change, we have the truth table shown in Table 11.2 which also contains the values of the variables $v$ and $z$ corresponding to $u_1$ and $[x_1 \; x_2]$, respectively.
TABLE 11.2 Truth Table for the Example of this Section
| $u_1$ | $v$ | $x_1$ | $x_2$ | $z$ | $f_1^{(1)}$ | $f_1^{(2)}$ | $f_2^{(2)}$ |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 | 2 | 1 | 0 | 0 |
| 0 | 1 | 1 | 0 | 3 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 | 4 | 0 | 1 | 0 |
| 1 | 2 | 0 | 0 | 1 | 1 | 0 | 0 |
| 1 | 2 | 0 | 1 | 2 | 1 | 1 | 0 |
| 1 | 2 | 1 | 0 | 3 | 0 | 1 | 0 |
| 1 | 2 | 1 | 1 | 4 | 1 | 1 | 1 |
| $c_j^{(i)}$ | | | | | 1 | 0.5 | 0.5 |
The values of $c_j^{(i)}$ in the table dictate that there are two possible networks, the first corresponding to the choice of functions $(f_1^{(1)}, f_1^{(2)})$ and the second corresponding to the choice of functions $(f_1^{(1)}, f_2^{(2)})$. The probabilities $P_1$ and $P_2$ associated with each of these networks are given by $P_1 = P_2 = 0.5$. We next proceed to compute the matrices $A(1)$ and $A(2)$ corresponding to the two possible values for $v$. From Table 11.2, it is clear that when $v = 1$, the following transitions are associated with the network $N_1$ and occur with probability $P_1$:
$$z = 1 \to z = 1, \quad z = 2 \to z = 3, \quad z = 3 \to z = 3, \quad z = 4 \to z = 2. \tag{11.20}$$
The corresponding transitions associated with network $N_2$, which occur with probability $P_2$, are given by:

$$z = 1 \to z = 1, \quad z = 2 \to z = 3, \quad z = 3 \to z = 3, \quad z = 4 \to z = 1. \tag{11.21}$$
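In view of Equations (11.20) and (11.21), the matrix $A(1)$ is given by:

$$A(1) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ P_2 & P_1 & 0 & 0 \end{bmatrix}. \tag{11.22}$$

The last row follows because, from state $z = 4$, network $N_1$ (probability $P_1$) leads to $z = 2$ whereas network $N_2$ (probability $P_2$) leads to $z = 1$.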
Similarly, we can arrive at the following $A(2)$ matrix:

$$A(2) = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & P_2 & P_1 \\ P_2 & P_1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{11.23}$$
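As a minimal sketch of how the matrices above follow from Table 11.2, the construction can be written in a few lines of Python. The function name `transition_matrices`, the 0-based indexing of the states (index 0 stands for $z = 1$), and the list-of-lists representation are assumptions made for illustration only; they are not part of the original development.

```python
# Columns of Table 11.2, with rows ordered by (u1, x1, x2).
F1_1 = [0, 1, 1, 0, 1, 1, 0, 1]   # predictor for x1
F1_2 = [0, 0, 0, 1, 0, 1, 1, 1]   # first predictor for x2 (network N1)
F2_2 = [0, 0, 0, 0, 0, 0, 0, 1]   # second predictor for x2 (network N2)


def transition_matrices(p1=0.5, p2=0.5):
    """Return {v: A(v)} for v in {1, 2}.

    Rows and columns are 0-based state indices, so index 0 is z = 1.
    p1 and p2 are the selection probabilities of networks N1 and N2.
    """
    mats = {}
    for u1 in (0, 1):
        A = [[0.0] * 4 for _ in range(4)]
        for z in range(4):                      # z encodes (x1, x2) as 2*x1 + x2
            row = 4 * u1 + z                    # matching row of Table 11.2
            for predictor, prob in ((F1_2, p1), (F2_2, p2)):
                z_next = 2 * F1_1[row] + predictor[row]
                A[z][z_next] += prob            # accumulate network probability
        mats[u1 + 1] = A                        # v = u1 + 1
    return mats


if __name__ == "__main__":
    for v, A in transition_matrices().items():
        print("A(%d):" % v)
        for r in A:
            print("  ", r)
```

Each row of the resulting matrices sums to one, as required of a transition probability matrix, and the fourth row of $A(1)$ carries $P_2$ in the first column and $P_1$ in the second.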
In this example, $n = 2$, so that the variable $z$ can take on any one of the four values 1, 2, 3, or 4. Also, because $m = 1$, the control variable $v$ can take on either of the two values 1 or 2. Suppose that the control action is to be carried out over five steps so that $M = 5$. Moreover, assume that the terminal penalties are given by:

$$C_5(1) = 0, \quad C_5(2) = 1, \quad C_5(3) = 2, \quad C_5(4) = 3. \tag{11.24}$$
Note that the above choices of $M$ and the values of the terminal penalties are completely arbitrary; in a real-world example, this information would be obtained from biologists. The current choice of terminal penalties indicates that the most desirable terminal state is 1 while the least desirable terminal state is 4. To set up the optimization problem (11.14), (11.15), we need to define the function $C_k(z(k), v(k))$. For the sake of simplicity, let us define:

$$C_k(z(k), v(k)) = \sum_{i=1}^{m} u_i(k) = u_1(k) \tag{11.25}$$
where $v(k)$ and $u_i(k)$, $i = 1, 2, \cdots, m$ are related by Equation (11.8). Clearly, the cost $C_k(z(k), v(k))$ captures the cost of applying the input $u_1(k)$ at the $k$th step. The optimization problem (11.14), (11.15) can now be posed using the quantities defined in Equations (11.22), (11.23), (11.24), and (11.25). The dynamic programming algorithm resulting from Equations (11.18) and (11.19) becomes:

$$J_5(z(5)) = C_5(z(5)) \tag{11.26}$$

$$J_k(z(k)) = \min_{v(k) \in \{1, 2\}} \left[ u_1(k) + \sum_{j=1}^{4} a_{z(k), j}(v(k)) \, J_{k+1}(j) \right], \quad k = 0, 1, 2, 3, 4. \tag{11.27}$$
We proceed backwards step by step from $k = 4$ to obtain a solution to Equations (11.26) and (11.27). The net result is that the optimal control strategy for this finite horizon problem is given by:

$$\mu_0^*(z(0)) = \mu_1^*(z(1)) = \mu_2^*(z(2)) = \mu_3^*(z(3)) = 1 \quad \text{for all } z(0), z(1), z(2), z(3) \tag{11.28}$$

$$\mu_4^*(z(4)) = \begin{cases} 2 & \text{if } z(4) = 3 \\ 1 & \text{otherwise.} \end{cases} \tag{11.29}$$
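The backward recursion of Equations (11.26) and (11.27) is short enough to sketch directly. The following Python fragment is an illustrative implementation under stated assumptions: the names `example_matrices` and `solve_finite_horizon` are hypothetical, states are indexed from 0 (so index 2 is $z = 3$), and ties between controls are broken in favor of the smaller $v$, a choice the text does not specify.

```python
def example_matrices(p1):
    """A(1) and A(2) of Equations (11.22)-(11.23), with P2 = 1 - p1."""
    p2 = 1.0 - p1
    return {
        1: [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [p2, p1, 0, 0]],
        2: [[0, 0, 1, 0], [0, 0, p2, p1], [p2, p1, 0, 0], [0, 0, 0, 1]],
    }


def solve_finite_horizon(A, terminal, horizon, control_cost):
    """Backward dynamic programming recursion (11.26)-(11.27).

    A            : dict mapping control v to a row-stochastic matrix
    terminal     : list of terminal penalties C_M(z), 0-based states
    horizon      : number of control steps M
    control_cost : dict mapping control v to its per-step cost
    Returns (J, policy), where J[k][z] is the optimal cost-to-go and
    policy[k][z] is a minimizing control at step k in state z.
    """
    n = len(terminal)
    J = [None] * (horizon + 1)
    policy = [None] * horizon
    J[horizon] = list(terminal)
    for k in range(horizon - 1, -1, -1):
        J[k], policy[k] = [], []
        for z in range(n):
            best_v, best_cost = None, float("inf")
            for v in sorted(A):
                expected = sum(A[v][z][j] * J[k + 1][j] for j in range(n))
                cost = control_cost[v] + expected
                if cost < best_cost:        # strict: ties keep the smaller v
                    best_v, best_cost = v, cost
            J[k].append(best_cost)
            policy[k].append(best_v)
    return J, policy


TERMINAL = [0, 1, 2, 3]        # C_5(z) from Equation (11.24)
CONTROL_COST = {1: 0, 2: 1}    # u_1 = v - 1, per Equation (11.25)

if __name__ == "__main__":
    J, policy = solve_finite_horizon(example_matrices(0.5), TERMINAL, 5, CONTROL_COST)
    for k, mu in enumerate(policy):
        print("k =", k, "controls for z = 1..4:", mu)
    print("J_0:", J[0])
```

With $P_1 = P_2 = 0.5$, this sketch returns control 1 in every state for $k = 0, \ldots, 3$ and control 2 only in state $z = 3$ at $k = 4$, in agreement with Equations (11.28) and (11.29).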
Thus, the control input is applied only in the last time step, provided the state $z$ of the system at that time step is equal to 3; otherwise, the optimal control strategy is to not apply any control at all. Let us now consider a few different initial states $z(0)$ and see whether the optimal control strategy determined above makes sense.

Case 1. $z(0) = 1$: According to Equations (11.28), (11.29), and (11.22), the optimal control strategy in this case is no control. Note from Equation (11.24) that the evolution of the probabilistic Boolean network is starting from the most desirable terminal state. Furthermore, from Equation (11.22) it is clear that in the absence of any control, the state of the network remains at this position. Hence, the control strategy arrived at is, indeed, optimal and the value of the optimal cost is 0.

Case 2. $z(0) = 4$: In this case, from Equation (11.24), it is clear that the evolution of the probabilistic Boolean network is starting from the most undesirable terminal state. Moreover, from Equation (11.23) note that if the control input were kept turned ON over the entire control horizon, then the state would remain in this most undesirable position during the entire control duration. Such a control strategy cannot be optimal because not only does the network end up in the most undesirable terminal state, but the maximum possible control cost is also incurred over the entire time horizon. To get a more concrete feel for the optimal control strategy, let us focus on the cases where the PBN degenerates into a standard (deterministic) Boolean network. There are two cases to consider:

(1) $P_2 = 1$, $P_1 = 0$: In this case, from Equation (11.22) we have:

$$A(1) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}. \tag{11.30}$$
Clearly, if no control is employed, then, starting from $z(0) = 4$, the network will reach the state $z(1) = 1$ in one step and stay there forever after. Thus, this no-control strategy is, indeed, optimal and the optimal cost is 0, which agrees with the value determined from Equations (11.26) and (11.27) with $P_1 = 0$.
(2) $P_2 = 0$, $P_1 = 1$: In this case, from Equations (11.22) and (11.23) we have:

$$A(1) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \quad A(2) = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{11.31}$$
Note from Equation (11.28) that the optimal control strategy is no control over the first four time steps. From Equation (11.31) it follows that with $z(0) = 4$, we will have $z(1) = 2$, $z(2) = 3$, $z(3) = 3$, and $z(4) = 3$. Then at the last time step, the control input is turned ON and from Equation (11.31), the resulting state is $z(5) = 2$. The optimal cost is given by 2 (the sum of the terminal cost and the cost of control) and this value agrees with that determined from Equations (11.26) and (11.27) with $P_1 = 1$.

11.6.2 Real-World Example Based on Gene Expression Data

In this subsection, we apply the methodology of this chapter to derive an optimal intervention strategy for a particular gene regulatory network. The network chosen as an example of how control might be applied is one developed from data collected in a study of metastatic melanoma [19]. In this expression profiling study, the abundance of messenger RNA for the gene WNT5A was found to be a highly discriminating difference between cells with properties typically associated with high metastatic competence versus those with low metastatic competence. These findings were validated and expanded in a second study [20]. In this study, experimentally increasing the levels of the Wnt5a protein secreted by a melanoma cell line via genetic engineering methods directly altered the metastatic competence of that cell as measured by the standard in vitro assays for metastasis. A further finding of interest in the aforementioned study was that an intervention that blocked the Wnt5a protein from activating its receptor, by the use of an antibody that binds Wnt5a protein, could substantially reduce the ability of Wnt5a to induce a metastatic phenotype.
This of course suggests a study of control based on interventions that alter the contribution of the WNT5A gene’s action to biological regulation, because the available data suggest that disruption of WNT5A’s influence could reduce the chance of a melanoma metastasizing, a desirable outcome. The methods for choosing the genes involved in a small local network that includes the activity of the WNT5A gene and the rules of interaction have been described in Reference [21]. As discussed in that paper, the WNT5A network was obtained by studying the predictive relationship among 587 genes. The expression status of each gene was quantized to one of three possible levels: −1 (downregulated), 0 (unchanged), and 1 (upregulated). Thus in this case, the gene activity profile at any time step is not a binary number but a ternary one. However, the PBN formulation and the associated control strategy can be developed exactly as described in Sections 11.5.2, 11.5.3, and 11.5.4, with the only difference being that now, for an $n$-gene network, we will have $3^n$ states instead of the $2^n$ states encountered earlier. In this context, it is appropriate to point out that to apply the control algorithm of this chapter, it is not necessary to actually construct a PBN; all that is required are the transition probabilities between the different states under the different controls. A network with 587 genes will have $3^{587}$ states, which is an intractably large number to use either for modeling or for control. Consequently, the number of genes was narrowed down to the ten most significant ones and the resulting multivariate relationship, using the best three-gene predictor for each gene, is shown in Figure 11.2. These relationships were developed using the COD (coefficient of determination) technique [6–8] applied to the gene expression patterns across 31 different conditions and prior biological knowledge. A detailed description of this is available in Reference [21]. The control objective for this ten-gene network is to externally downregulate the WNT5A gene. The reason is that it is biologically known that WNT5A ceasing to be downregulated is strongly predictive of the onset of metastasis. Controlling the ten-gene network using dynamic programming would require us to design a control algorithm for a system with $3^{10}$ (= 59,049) states. Although there is nothing conceptually difficult about doing this, it is beyond the computational limits of our current software, which we are in the process of improving.

FIGURE 11.2 Multivariate relationship among the genes of the ten-gene WNT5A network [21]. (Nodes shown: RET-1, MMP-3, HADHB, WNT5A, S100P, Pirin, MART-1, Synuclein, STC2, PHO-C.)
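To make the ternary state count concrete, the following Python sketch maps a gene activity profile with levels −1, 0, and 1 to a single state index and back. The base-3 ordering (first gene as the most significant digit, level −1 as digit 0) and the function names are assumptions chosen for illustration; the chapter does not specify how the states are enumerated.

```python
LEVELS = (-1, 0, 1)   # downregulated, unchanged, upregulated


def num_states(n_genes):
    """Number of gene activity profiles for an n-gene ternary network."""
    return 3 ** n_genes


def profile_to_state(profile):
    """Map a tuple of ternary levels to an index in 0 .. 3**n - 1.

    Assumed ordering: the first gene is the most significant base-3 digit.
    """
    index = 0
    for level in profile:
        index = 3 * index + LEVELS.index(level)
    return index


def state_to_profile(index, n_genes):
    """Inverse of profile_to_state for an n-gene network."""
    digits = []
    for _ in range(n_genes):
        index, digit = divmod(index, 3)
        digits.append(LEVELS[digit])
    return tuple(reversed(digits))


if __name__ == "__main__":
    print(num_states(7), num_states(10))
    profile = (-1, 0, 0, 0, -1, 1, 1)
    z = profile_to_state(profile)
    print(z, state_to_profile(z, 7))
```

Under this encoding a seven-gene network has 2187 states and a ten-gene network has 59,049, whereas the integer `num_states(587)` has several hundred digits, which illustrates why the full 587-gene network cannot be handled directly.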
FIGURE 11.3 Multivariate relationship among the genes of the seven-gene WNT5A network. (Nodes shown: WNT5A, STC2, Pirin, S100P, HADHB, MART1, RET1.)
Accordingly, we further narrowed down the number of genes in the network to seven by using COD analysis on the 31 samples. The resulting genes along with their multivariate relationship are shown in Figure 11.3. For each gene in this network, we determined its two best two-gene predictors and their corresponding CODs. Using the procedure discussed in Reference [10], the COD information for each of the predictors was then used to determine the $3^7 \times 3^7$ matrix of transition probabilities for the Markov chain corresponding to the dynamic evolution of the gene activity profile of the seven-gene network. The optimal control problem can now be completely specified by choosing (1) the treatment/intervention window, (2) the terminal penalty, and (3) the types of controls and the costs associated with them. For the treatment window, we arbitrarily chose a window of length 5, that is, control inputs would be applied only at time steps 0, 1, 2, 3, and 4. The terminal penalty at time step 5 was chosen as follows. Because our objective is to ensure that WNT5A is downregulated, we assigned a penalty of zero to all states for which WNT5A equals −1, a penalty of 3 to all states for which WNT5A equals 0, and a penalty of 6 to all states for which WNT5A equals 1. Here the choice of the numbers 3 and 6 is arbitrary, but they do reflect our attempt to capture the intuitive notion that states where WNT5A equals 1 are less desirable than those where WNT5A equals 0. Two types of possible controls were used, and next we discuss the two cases separately.

Case 1 (WNT5A Controlled Directly): In this case, the control action at any given time step is to force WNT5A equal to −1, if necessary, and let the network evolve from there. Biologically such a control could be implemented by using a WNT5A inhibitory protein.
Here, the control variable is binary, with 0 indicating that the expression status of WNT5A has not been forcibly altered, while 1 indicates that such a forcible alteration has taken place.
Of course, whether at a given time step such intervention takes place or not is decided by the solution to the resulting dynamic programming algorithm and the actual state of the network immediately prior to the intervention. With this kind of intervention strategy, it seems reasonable to incur a control cost at a given time step if and only if the expression status of WNT5A has to be forcibly changed at that time step. Once again, we arbitrarily assigned a cost of 1 to each such forcible change and solved for the optimal control using dynamic programming. The net result was a set of optimal control inputs for each of the 2187 (= $3^7$) states at each of the five time points. Using these control inputs, we studied the evolution of the state probability distribution vectors with and without control. For every possible initial state, our simulations indicated that at every time step from 1 to 5, the probability of WNT5A being equal to −1 was higher with control than without control. Furthermore, with control, WNT5A always reached −1 at the final time point ($k = 5$). Thus, we conclude that the optimal control strategy of Sections 11.5.3 and 11.5.4 was, indeed, successful in achieving the desired control objective. In this context, it is significant to point out that if the network starts from the initial state STC2 = −1, HADHB = 0, MART-1 = 0, RET-1 = 0, S100P = −1, pirin = 1, WNT5A = 1 and if no control is used, then it quickly transitions to a bad absorbing state (an absorbing state with WNT5A = 1). With optimal control, however, this does not happen.

Case 2 (WNT5A Controlled through Pirin): In this case, the control objective is the same as in Case 1, namely to keep WNT5A downregulated. The only difference is that this time, we use another gene, pirin, to achieve this control. The treatment window and the terminal penalties are kept exactly the same as before.
The control action consists of either forcing pirin to −1 (corresponding to a control input of 1) or letting it remain wherever it is (corresponding to a control input of 0). As before, at any step, a control cost of 1 is incurred if and only if pirin has to be forcibly reset to −1 at that time step. Having chosen these design parameters, we implemented the dynamic programming algorithm with pirin as the control. Using the resulting optimal controls, we studied the evolution of the state probability distribution vectors with and without control. For every possible initial state, our simulations indicated that, at the final state, the probability of WNT5A being equal to −1 was higher with control than that without control. In this case, there was, however, no definite ordering of probabilities between the controlled and uncontrolled cases at the intermediate time points. Moreover, the probability of WNT5A being equal to −1 at the final time point was not, in general, equal to 1. This is not surprising given that, in this case, we are trying to control the expression status of WNT5A using another gene and the control horizon of length 5 simply may not be adequate for achieving the desired objective with such a high probability. Nevertheless, even in this case, if the network starts from the state corresponding to STC2 = −1, HADHB = 0, MART-1 = 0, RET-1 = 0, S100P = −1, pirin = 1, WNT5A = 1 and evolves
under optimal control, then the probability of WNT5A = −1 at the final time point equals 0.673521. This is quite good in view of the fact that the same probability would have been equal to zero in the absence of any control action.
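The comparison of state probability distributions with and without control can be sketched as follows. Because the $3^7 \times 3^7$ transition matrix of the WNT5A network is not reproduced in this chapter, the sketch below reuses the four-state illustrative example of Section 11.6.1 with $P_1 = P_2 = 0.5$; the function names `propagate` and `mass_on` are hypothetical, and states are indexed from 0.

```python
# Transition matrices of the Section 11.6.1 example with P1 = P2 = 0.5.
A = {
    1: [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0.5, 0.5, 0, 0]],
    2: [[0, 0, 1, 0], [0, 0, 0.5, 0.5], [0.5, 0.5, 0, 0], [0, 0, 0, 1]],
}

# policy[k][z] is the control applied at step k when the state index is z.
OPTIMAL = [[1, 1, 1, 1]] * 4 + [[1, 1, 2, 1]]   # Equations (11.28)-(11.29)
NO_CONTROL = [[1, 1, 1, 1]] * 5


def propagate(p0, matrices, policy):
    """Return the list of state distributions p(0), p(1), ..., p(M)."""
    dists = [list(p0)]
    for mu in policy:
        current = dists[-1]
        n = len(current)
        nxt = [0.0] * n
        for z in range(n):
            row = matrices[mu[z]][z]        # row chosen by the control in z
            for j in range(n):
                nxt[j] += current[z] * row[j]
        dists.append(nxt)
    return dists


def mass_on(dist, states):
    """Probability that the state lies in the given set of indices."""
    return sum(dist[z] for z in states)


if __name__ == "__main__":
    start = [0, 0, 0, 1]                    # z(0) = 4
    with_control = propagate(start, A, OPTIMAL)[-1]
    without_control = propagate(start, A, NO_CONTROL)[-1]
    print("with control:   ", with_control)
    print("without control:", without_control)
    print("P(z(5) = 1):", mass_on(with_control, [0]), mass_on(without_control, [0]))
```

In the WNT5A study the analogous quantity would be the probability mass on the set of states with WNT5A = −1, obtained by passing the indices of those states to `mass_on`.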
11.7 Concluding Remarks
In this chapter, we have introduced probabilistic Boolean networks with one or more control inputs. In contrast to the PBNs introduced in Reference [10], the evolution of the state of the networks considered here depends on the status of these control inputs. In the case of diseases like cancer, these control inputs can potentially be used to model the effects of treatments such as radiation, chemotherapy, and so forth on the holistic behavior of the genes. Furthermore, the control inputs can themselves be chosen so that the genes evolve in a more “desirable fashion.” Thus, the PBNs with control can be used as a modeling tool to facilitate effective strategies for therapeutic intervention. In Reference [10], it was shown how the state evolution of a PBN can be modeled as a standard Markov chain. In this work, we have shown how control can be introduced into a PBN leading to a controlled Markov chain. Furthermore, we also showed how the control inputs can be optimally chosen using the dynamic programming technique.
11.8 Future Directions
Motivated by biological considerations, the initial result on intervention presented here has been subsequently extended in several directions. First, in Reference [22], we have modified the optimal intervention algorithm to accommodate the case where the entire state vector (gene activity profile) is not available for measurement. Next, in Reference [23], we have extended our intervention results to the so-called context-sensitive PBNs which we believe are a closer approximation at modeling biological reality. In that same reference, we also considered intervention in the presence of random perturbations, where any gene in a PBN could randomly switch values with a small probability. Several open issues, however, remain and these will have to be successfully tackled before the methods suggested in this chapter find application in actual clinical practice. We next discuss some of the issues that we are aware of at the current time.

• Methodical assignment of terminal penalties: The formulation of the optimal control problem assumes that there is a terminal penalty associated with each state of the PBN. However, the assignment of these terminal penalties for cancer therapy is by no means a straightforward task. The reason is that although the intervention will be carried out only over a finite horizon, one would like to continue to enjoy the benefits in the steady state. For such purposes, the kind of terminal penalty used for the melanoma cell line study of Section 11.6.2 is simply inadequate, because it fails to capture the steady-state behavior once the intervention has ceased. To remedy the situation, we propose to assign terminal penalties based on equivalence classes. The results of preliminary simulation studies in this regard [24] appear to be encouraging.

• Choice of control inputs: In the case of the melanoma cell line study presented in Section 11.6.2, one of the genes in the PBN, namely pirin, was used as a control input. The question is, how do we decide which gene to use? Of course, one consideration is to use genes for which inhibitors or enhancers are readily available. However, even if such a gene is chosen, how can we be certain that it is capable of controlling some other gene(s)? Although the answer is not clear at this stage, we do believe that the traditional control theoretic concept of controllability [25] may yield some useful insights. Another possibility is to use the concept of gene influence introduced in Reference [10], an approach that we have explored preliminarily in Reference [23].

• Intervening to alter the steady-state behavior: Given a Boolean network, one can partition the state-space into a number of attractors along with their basins of attraction. The attractors characterize the long-run behavior of the Boolean network and have been conjectured by Kauffman to be indicative of the cell type and phenotypic behavior of the cell. Consequently, a reasonable objective of therapeutic intervention could be to alter the attractor landscape in the associated Boolean network. Such an idea can be generalized to PBNs, and a brute force approach aimed at such intervention has been proposed in Reference [13]. We intend to develop more systematic approaches for affecting the steady-state behavior, and some initial results in this connection have been reported in Reference [26].

• PBN design from steady-state data: Yet another aspect that merits further investigation is motivated by the fact that the currently available gene expression data comes from the steady-state phenotypic behavior and really does not capture any temporal history. Consequently, the process of inferring PBNs from the data will have to be modified, in the sense that it will have to be guided more by steady-state and limited connectivity considerations. Major research efforts in these directions are currently under way [27,28]. This last aspect further underscores the fact that the category of intervention cannot be researched in isolation. Issues that arise upstream will definitely impact intervention and vice versa.
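The partition of a Boolean network's state space into attractors and basins of attraction, mentioned in the third item above, can be illustrated with a short Python sketch. It is a simple traversal for a deterministic network given as a successor map, not the intervention method of Reference [13]; the function name `attractors_and_basins` is hypothetical, and the example map is network $N_1$ of Section 11.6.1 with no control applied, per Equation (11.20).

```python
def attractors_and_basins(next_state):
    """Partition the states of a deterministic network.

    next_state : dict mapping each state to its unique successor.
    Returns a dict mapping each attractor (a frozenset of states forming a
    cycle, possibly of length one) to the set of states in its basin.
    """
    attractor_of = {}
    for start in next_state:
        path, position = [], {}
        s = start
        # Follow the trajectory until it revisits itself or hits a known state.
        while s not in position and s not in attractor_of:
            position[s] = len(path)
            path.append(s)
            s = next_state[s]
        if s in attractor_of:
            cycle = attractor_of[s]                  # joined a known basin
        else:
            cycle = frozenset(path[position[s]:])    # newly found cycle
        for visited in path:
            attractor_of[visited] = cycle
    basins = {}
    for state, cycle in attractor_of.items():
        basins.setdefault(cycle, set()).add(state)
    return basins


if __name__ == "__main__":
    # Network N1 under v = 1, Equation (11.20): 1->1, 2->3, 3->3, 4->2.
    n1 = {1: 1, 2: 3, 3: 3, 4: 2}
    for cycle, basin in attractors_and_basins(n1).items():
        print("attractor", sorted(cycle), "basin", sorted(basin))
```

For this small network the two attractors are the fixed points $z = 1$ and $z = 3$, with basins $\{1\}$ and $\{2, 3, 4\}$, respectively.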
The optimal control results presented in this chapter assume known transition probabilities and pertain to a finite horizon problem of known length.
Their extension to the situation where the transition probabilities and the horizon length are unknown is a topic for further investigation. Finally, the results presented in this chapter correspond to the following stages in standard control design: modeling, controller design, and verification of the performance of the designed controller via computer simulations. The designed controllers will have to be successfully implemented in practical studies, at least with cancer cell lines, before the benefits of using engineering approaches in translational medicine become transparent to the biological and medical communities. A considerable amount of effort needs to be focused on this endeavour.
Acknowledgments

This work was supported in part by the National Cancer Institute under Grant CA90301, by the Translational Genomics Research Institute, and by the National Science Foundation under Grants ECS-0355227 and CCF-0514644.
References

1. DeRisi, J., Penland, L., Brown, P. O., Bittner, M. L., Meltzer, P. S., Ray, M., Chen, Y., Su, Y. A., & Trent, J. M. (1996). Use of a cDNA Microarray to Analyze Gene Expression Patterns in Human Cancer. Nature Genetics, 14, 457–460.
2. DeRisi, J. L., Iyer, V. R., & Brown, P. O. (1997). Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale. Science, 278, 680–686.
3. Schena, M., Shalon, D., Davis, R. W., & Brown, P. O. (1995). Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray. Science, 270, 467–470.
4. Schena, M., Shalon, D., Heller, R., Chai, A., Brown, P. O., & Davis, R. W. (1996). Parallel Human Genome Analysis: Microarray-Based Expression Monitoring of 1000 Genes. Proceedings of the National Academy of Sciences USA, 93, 10614–10619.
5. Wodicka, L., Dong, H., Mittmann, M., Ho, M. H., & Lockhart, D. J. (1997). Genome-wide Expression Monitoring in Saccharomyces cerevisiae. Nature Biotechnology, 15, 1359–1367.
6. Dougherty, E. R., Kim, S., & Chen, Y. (2000). Coefficient of Determination in Nonlinear Signal Processing. Signal Processing, 80(10), 2219–2235.
7. Kim, S., Dougherty, E. R., Bittner, M. L., Chen, Y., Sivakumar, K., Meltzer, P., & Trent, J. M. (2000). A General Framework for the Analysis of Multivariate Gene Interaction via Expression Arrays. Biomedical Optics, 4(4), 411–424.
8. Kim, S., Dougherty, E. R., Chen, Y., Sivakumar, K., Meltzer, P., Trent, J. M., & Bittner, M. (2000). Multivariate Measurement of Gene-Expression Relationships. Genomics, 67, 201–209.
9. Kemeny, J. G. & Snell, J. L. (1976). Finite Markov Chains. Springer-Verlag, Berlin.
10. Shmulevich, I., Dougherty, E. R., Kim, S., & Zhang, W. (2002). Probabilistic Boolean Networks: A Rule-Based Uncertainty Model for Gene Regulatory Networks. Bioinformatics, 18, 261–274.
11. Shmulevich, I., Dougherty, E. R., & Zhang, W. (2002). From Boolean to Probabilistic Boolean Networks as Models of Genetic Regulatory Networks. Proceedings of the IEEE, 90(11), 1778–1792.
12. Shmulevich, I., Dougherty, E. R., & Zhang, W. (2002). Gene Perturbation and Intervention in Probabilistic Boolean Networks. Bioinformatics, 18, 1319–1331.
13. Shmulevich, I., Dougherty, E. R., & Zhang, W. (2002). Control of Stationary Behavior in Probabilistic Boolean Networks by Means of Structural Intervention. Biological Systems, 10(4), 431–446.
14. Kauffman, S. A. (1969). Metabolic Stability and Epigenesis in Randomly Constructed Genetic Nets. Theoretical Biology, 22, 437–467.
15. Kauffman, S. A. & Levin, S. (1987). Towards a General Theory of Adaptive Walks on Rugged Landscapes. Theoretical Biology, 128, 11–45.
16. Kauffman, S. A. (1993). The Origins of Order: Self-Organization and Selection in Evolution, Oxford University Press, New York.
17. Bertsekas, D. P. (2000). Dynamic Programming and Optimal Control, Athena Scientific, Nashua, NH.
18. Bellman, R. (1957). Dynamic Programming, Princeton University Press, Princeton, NJ.
19. Bittner, M., Meltzer, P., Chen, Y., Jiang, Y., Seftor, E., Hendrix, M., Radmacher, M., Simon, R., Yakhini, Z., Ben-Dor, A., Sampas, N., Dougherty, E., Wang, E., Marincola, F., Gooden, C., Lueders, J., Glatfelter, A., Pollock, P., Carpten, J., Gillanders, E., Leja, D., Dietrich, K., Beaudry, C., Berens, M., Alberts, D., & Sondak, V. (2000). Molecular Classification of Cutaneous Malignant Melanoma by Gene Expression Profiling. Nature, 406(6795), 536–540.
20. Weeraratna, A. T., Jiang, Y., Hostetter, G., Rosenblatt, K., Duray, P., Bittner, M., & Trent, J. M. (2002). Wnt5a Signalling Directly Affects Cell Motility and Invasion of Metastatic Melanoma. Cancer Cell, 1, 279–288.
21. Kim, S., Li, H., Dougherty, E. R., Cao, N., Chen, Y., Bittner, M., & Suh, E. (2002). Can Markov Chain Models Mimic Biological Regulation? Journal of Biological Systems, 10(4), 337–357.
22. Datta, A., Choudhary, A., Bittner, M. L., & Dougherty, E. R. (2004). External Control in Markovian Genetic Regulatory Networks: The Imperfect Information Case. Bioinformatics, 20(6), 924–930.
23. Pal, R., Datta, A., Bittner, M. L., & Dougherty, E. R. (2005). Intervention in Context-Sensitive Probabilistic Boolean Networks. Bioinformatics, 21, 1211–1218.
24. Choudhary, A., Datta, A., Bittner, M. L., & Dougherty, E. R. (2005). Assignment of Terminal Penalties in Controlling Genetic Regulatory Networks. Proceedings of the American Control Conference, 417–422.
25. Kalman, R. E. (1962). Canonical Structure of Linear Dynamical Systems. Proc. Natl. Acad. Sci., 596–600.
26. Pal, R., Datta, A., & Dougherty, E. R. (2006). Optimal Infinite Horizon Control for Probabilistic Boolean Networks. IEEE Transactions on Signal Processing, 54, 2375–2387.
27. Pal, R., Ivanov, I., Datta, A., & Dougherty, E. R. (2005). Generating Boolean Networks with a Prescribed Attractor Structure. Bioinformatics, 21, 4021–4025.
28. Zhou, X., Wang, X., Pal, R., Ivanov, I., & Dougherty, E. R. (2004). A Bayesian Connectivity-Based Approach to Constructing Probabilistic Gene Regulatory Networks. Bioinformatics, 20, 2918–2927.
12 Modeling and Estimation Problems in the Visuomotor Pathway
Bijoy K. Ghosh, Wenxue Wang, and Zachary V. Freudenburg
CONTENTS
12.1 Introduction
12.2 Multineuronal Model of the Visual Cortex
12.3 Generation of Activity Waves in the Visual Cortex
12.4 Simulation with a Sequence of Stationary Inputs
12.5 Encoding Cortical Waves with β-Strands Using Double KL Decomposition
12.6 Statistical Detection of Position
12.6.1 Series Expansion of Sample Functions of Random Processes
12.6.2 Hypothesis Testing
12.6.3 Decoding with Additive Gaussian White Noise Model
12.7 Detection Using Nonlinear Dynamics
12.7.1 Phase Locking with a Network of Kuramoto Models
12.7.2 Memory with Two Elements
12.8 The Role of Tectal Waves in Motor Control: A Future Goal
12.9 Conclusion
Acknowledgments
References

In this chapter we describe how a population of neurons models the dynamic activity of a suitable region of the visual cortex, responding to a class of visual inputs. Specifically, a large-scale neuronal model has been described which generates a propagating wave of activity that has been independently recorded in experiments using multiple electrodes and voltage-sensitive dyes. We show how the model cortex is able to discriminate location of a target in the visual space. The discrimination is carried out using two separate algorithms.
The first method utilizes statistical detection wherein the activity waves generated by the visual cortex are encoded using principal components analysis. The representation is carried out, first in the spatial domain and subsequently in the temporal domain, over a sequence of sliding windows. Using the model cortex, we show that the representations of the activity waves, each viewed as a “beta strand,” are sufficiently different from one another for alternative locations of point targets in the visual space. Discrimination is carried out assuming that the noise is additive and Gaussian. In the second method, the beta strands (β-strands) are discriminated using a nonlinear dynamic system with multiple regions of attraction. Each beta strand corresponds to a suitable initialization of the dynamic system and the states of attraction correspond to various target locations. The chapter concludes with a discussion of the motor control problem and how the cortical waves play a leading role in actuating movements that would track a moving target with some level of evasive maneuvers.
12.1 Introduction
In this chapter our goal is to describe modeling and estimation problems that arise in the animal visuomotor pathway. The pathway is particularly adept at tracking targets that are moving in space, acquiring and internally representing images of the target, and finally actuating a suitable motor action, such as capturing the target. In Figure 12.1 we show the tracking maneuver of a freshwater turtle as it strives to capture a moving fish. Turtles anticipate the future position of a moving target by solving a motion prediction problem, a task that is believed to be initiated in the visual cortex. Visual inputs to the retina are routed through the geniculate before they reach the cortex (see Figure 12.2). The role of the visual pathway prior to the cortex is essentially to filter the visual signal, whereas the role of the cortex is somewhat more involved, as we describe presently. Mammals have a cerebral cortex that embodies several topographically organized representations of visual space. Extracellular recordings show that neurons in a restricted region of visual cortex are activated when a visual stimulus is presented to a restricted region of the visual space, the classical receptive field of the neuron [7]. Neurons at adjacent points in the cortex are activated by stimuli presented at adjacent regions of the visual space. Consequently, there is a continuous but deformed map of the coordinates of visual space to the coordinates of the cortex. Extracellular recordings from the visual cortex of freshwater turtles produce a different result [16]. Neurons at each cortical locus are activated by visual stimuli presented at every point in the binocular visual space, although the latency and shape of the response waveforms vary as the stimulus is presented at different loci in the visual space. This suggests that there may not be a simple map of the coordinates of the visual space to the coordinates of the visual cortex in turtles.
Position in the visual space is perhaps represented in a form other than a retinotopic
[Figure 12.1 labels: movie frames 01, 24, 60, 82, 100, 117, 130, 133, 135, and 138; axes X and Y; prey; reference point (RP). Digitization points for kinematic analysis: (1) head angle (α), (2) prey angle (β), (3) distance from RP to snout, (4) distance from snout to prey.]
FIGURE 12.1 (See color insert following p. 272.) Kinematic analysis of turtle prey capture. Selected movie frames at the top of the figure show a turtle orienting to a moving fish (arrow) in frames 01 to 82, moving toward it (100 to 130), extending and turning its neck (133 to 135) and capturing the fish (138). The bottom image shows the digitization points of the kinematic analysis.
map. Experiments conducted by Senseman and Robbins [27], [29], [30] have supported this viewpoint. They used voltage-sensitive dye methods to show that presentation of a visual stimulus to the retina of an in vitro preparation of the turtle eye and brain produces a wave of depolarization that propagates anisotropically across the cortex (see Figure 12.3 for a visualization of the wave propagation in a model cortex). These waves have been demonstrated using both multielectrode arrays and voltage-sensitive dyes [23], [24], [28]. Both methods detect the activity of populations of pyramidal cells [28]. The waves have been analyzed using a variant of the principal components method, known as Karhunen–Loeve decomposition. Individual waves could be represented as a weighted sum of as few as three eigenvectors which are functions of the coordinates of the cortex. Interestingly, presentation of different
[Figure 12.2 labels: visual input, eye, retina (visual streak along the horizontal axis; points a, b, c, d), LGN, cortex (lateral, medial, subpial, stellate, and horizontal cells); L, C, R.]
FIGURE 12.2 (See color insert following p. 272.) The visual pathway in the turtle visual system from eyes to visual cortex.
[Figure 12.3 panels: 1, 90, 160, 220, 400, 500, 580, 760, and 880 ms; color scale from 15 to −60.]
FIGURE 12.3 (See color insert following p. 272.) A traveling wave of cortical activity from the model cortex without Hebbian and anti-Hebbian adaptation.
visual stimuli, such as spots of light at different points in the visual space, produce waves that have different representations in the three-dimensional eigenspace. This raises the possibility that visual information is coded in the spatiotemporal dynamics of the cortical waves. Subsequent research work has provided abundant evidence that the traveling electrical waves are observed not only in turtle visual cortex [24], but also across olfactory, visual, and visuomotor areas of the cortex in a variety of species [10]. Propagating waves with comparable properties can be produced in a large-scale model of turtle visual cortex that contains geniculate and cortical neurons [9], [20], [21], [32]. The large-scale model described in this chapter contains geniculate neurons in the dorsal lateral geniculate complex of the thalamus and the five major populations of neurons in the visual cortex (see Figure 12.4 for a model cortex). Turtle visual cortex has three layers: an outer layer 1, an intermediate layer 2, and an inner layer 3, and is divided into lateral and medial parts. Pyramidal cells (including lateral and medial pyramidal cells) have somata situated in layer 2 and are the source of efferent projections from the cortex. The cortex also contains at least three populations of inhibitory interneurons, the subpial (situated in the outer half of layer 1), the stellate (situated in the inner half of layer 1), and the horizontal cells (situated in layer 3). Interactions among these five types of cells involve two types of effects: excitatory and inhibitory. Both geniculate and pyramidal cells are excitatory. Geniculate neurons project excitatory contacts onto pyramidal cells, subpial cells, and stellate cells. Pyramidal cells give rise to
[Figure 12.4 labels: both axes in μm, ticked at 0, 800, and 1600; cell types lateral, medial, subpial, stellate, and horizontal; orientation M, C, R, L; input positions Left, Center, and Right.]
FIGURE 12.4 (See color insert following p. 272.) Distribution of cells in each of the three layers of the turtle cortex projected on a plane. The lateral geniculate (LGN) cells are distributed linearly (shown at the right side of the bottom edge of the cortex) and the solid line shows how they interact with cells in the cortex.
[Figure 12.5 labels: (a) Input, LGN, Pyramidal, Stellate, Subpial, and Horizontal populations; connection types: excitation, inhibition, Hebbian, and anti-Hebbian. (b) Feed-forward and feedback pathways drawn with boxes GA, SP, ST, PYR, and H.]
FIGURE 12.5 Cortical circuit of freshwater turtles. Interconnection between neurons in various layers of the visual cortex is shown. Each box symbolizes a population of cells. The geniculate afferents (GA) provide excitatory input to cells in both pathways. The pyramidal cells (PYR) are excitatory. The distinction between medial and lateral pyramidal cells is not made in this diagram. The subpial (SP), stellate (ST), and horizontal (H) cells are inhibitory. The axons of pyramidal cells leave the visual cortex to other cortical areas and to the brainstem in both pathways.
excitatory inputs to the inhibitory interneurons as well as to neighboring pyramidal cells. Subpial, stellate, and horizontal cells are inhibitory and provide inhibitory inputs to pyramidal cells. Subpial and stellate cells also make recurrent connections to neighboring cells. Figure 12.5a shows the interconnections among the cortical neurons in the large-scale cortex model. The five types of cells can be thought of as forming two anatomically defined pathways within the cortex (Figure 12.5b). A feed-forward pathway (Figure 12.5b, left part) involves the geniculate inputs that make excitatory contacts on subpial,
stellate, and lateral pyramidal cells. The subpial and stellate cells make inhibitory contacts on the lateral pyramidal cells. Lateral pyramidal cells give rise to excitatory recurrent contacts on the other lateral and medial pyramidal cells. A feedback pathway (Figure 12.5b, right part) involves the recurrent collaterals of both lateral and medial pyramidal cells, which make excitatory contacts on subpial, stellate, and horizontal cells. Each of these populations of inhibitory interneurons makes inhibitory contacts on pyramidal cells. In addition, there are inhibitory contacts between individual subpial cells as well as between individual stellate cells. Both the lateral and medial pyramidal cells give rise to efferent connections to the thalamus, striatum, and brainstem. The retino-cortical pathway has been sketched in Figure 12.2. The retinal ganglion cells are densely distributed around a horizontal axis called the visual streak. Thus, turtle vision has a greater acuity along the horizontal axis (along the surface of water for a freely swimming turtle) than along the vertical axis (above and below the water surface). The retinal inputs are redistributed “retinotopically” on the lateral geniculate nucleus (LGN). The LGN receives feed-forward inputs from the retina and feedback inputs from the cortex. The precise functional role of the LGN is not entirely known and has not been detailed here. Visual input from the LGN to the cortex is not retinotopic. In fact, inputs from the LGN are spatially distributed across the cortex and the axons are shown as lines in Figures 12.2, 12.4, and 12.6. These lines cross over, giving rise to an intense activity at the rostro-lateral part of the cortex, sparking the generation of a wave.
[Figure 12.6 labels: orientation M, C, R, L; scale bar 0.5 mm.]
FIGURE 12.6 Linear arrangement of geniculate neurons. The somata are shown as boxes and the corresponding axons are shown as lines. Only 13 out of 201 LGN neurons are shown for clarity.
[Figure 12.7 labels: (a) target positions (x1, t1), (x2, t2), (x3, t3); (b) visual input I(x, y, t), retina, LGN, cortex, striatum, SN, PT, tectum, RF, movement.]
FIGURE 12.7 (See color insert following p. 272.) Prey capture and motion extrapolation. (a) To capture a moving fish, a turtle must extrapolate its future position. (b) Probable neural substrate for motion extrapolation in turtles. (Abbreviations: LGN, lateral geniculate nucleus; PT, pretectum; RF, reticular formation; SN, substantia nigra).
The visuo-cortical complex is part of an overall visuomotor pathway, sketched in Figure 12.7. Visual input converges onto the optic tectum via two separate routes. A direct input from the retina is fused with an input from the cortex at the tectum. The intermediate stages of the cortical input, namely the striatum, the substantia nigra, and the pretectum, are not relevant for this discussion. The tectum is responsible for predicting future locations of moving targets by processing cortical waves and fusing more recent visual inputs from the retina. The animal is able to make a prediction based on long-term visual data and correct the prediction based on a “somewhat recent” target location. The exact mechanism of sensor fusion at the tectum is a subject of future research.
12.2
Multineuronal Model of the Visual Cortex
In this section, we describe the large-scale model of the turtle visual cortex. Modeling, in general, is an evolutionary process and involves numerous parameters, some of which are obtained by physiological measurements and some of which are simply tuned in the modeling process. For a comprehensive description of the computational model we refer the reader to Reference [21]. Briefly, the dimensions of the somata and dendrites of individual types of neurons are based on measurements from Golgi impregnations of turtle visual cortex [4], [5]. Biophysical parameters for each cell type are measured with in vivo intracellular recording methods [14], [15]. The physiology of each type of synapse included in the model is known from in vitro
intracellular recording experiments [13]. The kinetics of individual types of voltage-gated channels have not been characterized with voltage clamp methods in turtle visual cortex. The parameters needed to implement Hodgkin–Huxley-like kinetic schemes are therefore obtained from work on mammalian cortex and constrained by comparing the firing patterns of model cells to real cells following intracellular current injections. The geometry of the geniculocortical and intracortical interconnections is known in detail [6], [17]. Moreover, there is some information on the basic shape and dimensions of the axonal arbors of subpial, stellate, and horizontal cells from Golgi preparations. These data are used to estimate spheres of influence between subpial, stellate, and horizontal cells and their postsynaptic targets. As noted in the introduction, the visual cortex of freshwater turtles contains three layers. Our model assumes the three layers are projected onto a single plane (see Figure 12.4). Each neuron is represented by a multiple compartmental model with 3 to 29 compartments based on its morphology (see Figure 12.8). Each compartment is modeled by a standard membrane equation and implemented in GENESIS [3]. The membrane equation is written
[Figure 12.8 panels: lateral, medial, subpial, stellate, and horizontal cells.]
FIGURE 12.8 (See color insert following p. 272.) Compartmental structures of cortical neuron models in the large-scale model of turtle visual cortex.
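using a set of ordinary differential equations described as follows:

\[
\frac{dV_i(t)}{dt} = -\frac{1}{C_i}\left[ \frac{V_i(t) - E_r}{R_i} + \sum_{j} \frac{V_i(t) - V_j(t)}{R_{ij}} + \sum_{\mathrm{ion}} g_k \,\bigl(V_i(t) - E_k\bigr) + \sum_{\mathrm{syn}} g_k \,\bigl(V_i(t) - E_k\bigr) + I_{\mathrm{stim}}(t) \right] \tag{12.1}
\]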
where $V_i(t)$ is the time-dependent membrane potential of the ith compartment relative to the resting membrane potential, $C_i$ is the total membrane capacitance of the ith compartment, $R_i$ is the total membrane resistance of the ith compartment, and $R_{ij}$ is the coupling resistance between the ith and jth compartments. Total resistances and capacitances are calculated from the geometry of the compartments and the biophysical parameters $R_m$, $C_m$, and $R_a$ using standard relationships [3]. The first summation in Equation (12.1) is over all the compartments linked to the ith compartment. The second summation is over all the species of ionic conductances present on the ith compartment. The third summation is over all the species of synaptic conductances present on the ith compartment. $I_{\mathrm{stim}}(t)$ is a time-varying current injected into the ith compartment. The somata are modeled as spherical compartments and the dendrites are modeled as cylindrical compartments. The axons are not modeled as compartments but as delay lines. For a detailed description of compartmental models see References [21] and [33]. In Figure 12.8 we show the compartmental structures of the five types of cortical neurons in the model cortex. Maps of the spatial distribution of neurons in each of the three layers of the cortex are constructed from coronal sections through the visual cortex of a turtle. The maps are divided into an 8 × 56 array of rectangular areas, each measuring 28 × 190 μm. Experimental data are not available for each of the 8 × 56 rectangular boxes and are interpolated at locations where measurements are not available. An algorithm is developed in MATLAB to construct an array of neurons in each layer that preserves the ratios of cells between layers in the real cortex. The cells are distributed among the 8 × 56 blocks according to the actual density information. Within each block, the cell coordinates are chosen randomly from a uniform distribution, independently for every block.
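The block-wise placement procedure just described can be sketched as follows. This is a minimal Python illustration, not the MATLAB algorithm used for the model: the function name `place_cells` and the density map `counts` are hypothetical, and the grid orientation (8 blocks of 190 μm along one axis, 56 blocks of 28 μm along the other, which gives a sheet of roughly 1.5 mm on a side) is an assumption.

```python
import random

BLOCKS_X, BLOCKS_Y = 8, 56             # 8 x 56 array of blocks (from the text)
BLOCK_W_UM, BLOCK_H_UM = 190.0, 28.0   # assumed orientation of the 28 x 190 um blocks


def place_cells(counts, rng):
    """Return (x, y) coordinates in micrometers for every cell.

    counts[i][j] is the (hypothetical) number of cells assigned to block
    (i, j) by the density map; positions inside a block are drawn uniformly
    and independently of every other block.
    """
    cells = []
    for i in range(BLOCKS_X):
        for j in range(BLOCKS_Y):
            x0, y0 = i * BLOCK_W_UM, j * BLOCK_H_UM
            for _ in range(counts[i][j]):
                cells.append((x0 + rng.uniform(0.0, BLOCK_W_UM),
                              y0 + rng.uniform(0.0, BLOCK_H_UM)))
    return cells


# Illustrative density map: two cells per block in the first column, one elsewhere.
counts = [[2 if i == 0 else 1 for _ in range(BLOCKS_Y)] for i in range(BLOCKS_X)]
cells = place_cells(counts, random.Random(0))
```

Because only the per-block counts are fixed, calling `place_cells` with a different random seed yields a different cortex that has the same relative cell densities.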
This algorithm is convenient to use because it can generate as many different models as needed, while retaining the information about the relative densities of cells in the visual cortex of a real turtle. The model in our study has 368 lateral pyramidal cells, 311 medial pyramidal cells, 44 subpial cells, 45 stellate cells, and 20 horizontal cells (see Figure 12.4). Biophysical data are not available for neurons in the dorsal lateral geniculate complex of turtles, so geniculate neurons are modeled as single isopotential compartments with a spike-generating mechanism. Geniculate axons are modeled as delay lines that extend across the cortex from lateral to medial. The number of geniculate neurons in the model is L = 201. The
LGN neurons are arranged linearly along the lateral edge of the cortex with axons extending to the cortex (see Figure 12.6). The axons of the most rostral (right) and most caudal (left) LGN neurons in the array extend to the caudal and rostral poles of the cortex, respectively. The other afferents¹ are evenly spaced between these two axons. Geniculate afferents enter the cortex at its lateral edge, cross over each other, and then run in relatively straight lines from lateral to medial cortex. The rostrocaudal axis of the geniculate is consequently mapped along the caudorostral axis of the cortex. The geometry of the geniculate afferents and their spatial distribution are based on anatomical data from Reference [17]. The number of synaptic sites (varicosities) assigned to each geniculate afferent is calculated by multiplying the length of the axon by the average number of varicosities per 100 μm of axon length. The spatial positions of the individual varicosities (a total of approximately 11,300 varicosities is used) are assigned to axons using the distribution of varicosities along the lengths of real axons [17]. The distribution is strongly skewed to the left, indicating a greater number of varicosities in the lateral than in the medial part of the visual cortex. For cortico-cortical connections, we have constructed spheres of influence: a cortical neuron is connected to any other cell in the cortex within its sphere of influence. The synaptic strengths are highest at the center of the sphere of influence and decrease linearly with distance. Propagation times between neurons are calculated using the distance between a pair of neurons and conduction velocities. The conduction velocity for geniculate afferents in turtle visual cortex has been measured at 0.18 m/s [5].
Cortico-cortical connections are given conduction velocities of 0.05 m/s, consistent with measurements of propagating waves in the turtle visual cortex [27] and with the conduction velocities for axons of inhibitory interneurons in rat cortex [26].
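The connectivity rules of this section reduce to a few lines of arithmetic, sketched below in Python. The function names, the sphere radius, and the peak weight are hypothetical; the text states only that the strength decreases linearly with distance, so letting it reach zero exactly at the edge of the sphere of influence is an assumption of this sketch.

```python
import math

GENICULATE_VELOCITY_M_S = 0.18   # geniculate afferents (from the text)
CORTICAL_VELOCITY_M_S = 0.05     # cortico-cortical connections (from the text)


def propagation_delay_ms(pre_xy_um, post_xy_um, velocity_m_s):
    """Delay in ms along a straight path between two points given in um."""
    distance_um = math.dist(pre_xy_um, post_xy_um)
    # 1 m/s equals 1000 um/ms.
    return distance_um / (velocity_m_s * 1000.0)


def synaptic_weight(distance_um, radius_um, peak_weight):
    """Weight falling linearly from peak_weight at the center to 0 at radius_um."""
    if distance_um >= radius_um:
        return 0.0   # outside the sphere of influence: no connection
    return peak_weight * (1.0 - distance_um / radius_um)


def varicosity_count(axon_length_um, varicosities_per_100um):
    """Synaptic sites on one afferent: axon length times density per 100 um."""
    return round(axon_length_um * varicosities_per_100um / 100.0)
```

For example, two cortical cells 100 μm apart are separated by a delay of 2 ms at 0.05 m/s, whereas a geniculate afferent covers the same distance in about 0.56 ms at 0.18 m/s.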
12.3
Generation of Activity Waves in the Visual Cortex
We have already seen that a group of neurons in the turtle visual cortex has the ability to sustain a traveling wave. Typically this wave results from an interaction between a feed-forward and a feedback circuit (see Figure 12.5), the details of which have been explained in Reference [32]. Roughly speaking, the feed-forward circuit controls the origination and propagation speed of the traveling wave and the feedback circuit controls the propagation duration. Waves are typically generated in the pyramidal cells as a result of an external input current that results in an increase in membrane potential. Pyramidal cells locally excite each other, resulting in a region of neural activity that tends to
¹ An afferent nerve carries impulses toward the central nervous system. The opposite of afferent is efferent.
propagate in all directions. Left unabated, these pyramidal cells would excite the entire cortex. Fortunately, the feed-forward circuit incorporates inhibitory actions from the stellate and subpial cells. Although the precise roles of the two inhibitory cells are different and somewhat unclear, they control the timing of wave generation. The feedback circuit also inhibits the wave, through three different cell types: subpial, stellate, and horizontal. The feedback inhibition reduces and eventually kills the neuronal activity at the spot where the activity is greatest. The combined effect of the two circuits gives the appearance of a traveling wave. Eventually these waves are killed by a strong GABA-initiated inhibition (GABA being a type of synaptic input) that originates after a long delay. Using the large-scale model of the visual cortex that consists of excitatory and inhibitory cells described above, we have observed that the neural population remains hyperpolarized (i.e., maintains a very low membrane potential) long after the initial wave has been killed. The cortex remains unresponsive to future visual inputs, an undesirable property. One way to remedy this problem is to detect this period of hyperpolarization and increase the synaptic interaction between the excitatory pyramidal cells. This would amplify the tiny input into the pyramidal cells, forcing these cell populations out of hyperpolarization. This is achieved using Hebbian and anti-Hebbian adaptation. In Hebbian adaptation, the synaptic strength between two cells increases in proportion to the product of the pre- and postsynaptic activities. Likewise, in anti-Hebbian adaptation, the synaptic strength between two cells decreases in proportion to the product of the pre- and postsynaptic activities. In our model, the excitatory interconnection between pyramidal cells is chosen to be anti-Hebbian.
This produces increasingly larger synaptic weights between pyramidal cells once the waves have been abated. The inhibitory interactions between the stellate/subpial/horizontal and the pyramidal cells are chosen to be Hebbian. These produce increasingly stronger inhibition to active pyramidal cells (see Figure 12.5). In Figure 12.9 we show anti-Hebbian action on the pyramidal cells. Rows 1a and 2a show wave activity as a function of time. After about 700 ms, the first round of waves has been inhibited and the pyramidal cells are hyperpolarized. The weights between the cells are very large, as indicated by the red lines in rows 1b and 2b of Figure 12.9. A subsequent input causes a second round of waves (not shown in the figure). In summary, we outline in this section how cortical cells have the ability to generate and maintain a wave of activity. An important result, outlined in this chapter, is that the waves encode target information from the visual scene. We show, using simulation of the model cortex, how Hebbian and anti-Hebbian adaptation has been used in generating a series of cortical waves. In later sections we show how these waves encode the location of targets in the visual space. We do not claim to have established a biological role of the Hebbian/anti-Hebbian adaptation in the wave-generation process observed in actual turtles.
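The two adaptation rules defined in this section can be written down directly. The Python sketch below covers only the proportional terms stated in the text; the function names and the learning rate `rate` are hypothetical, and whatever mechanism lets the pyramidal weights grow again once activity has died down is not specified here and is therefore not modeled.

```python
def hebbian_update(weight, pre, post, rate):
    """Hebbian rule: the weight increases in proportion to pre * post activity."""
    return weight + rate * pre * post


def anti_hebbian_update(weight, pre, post, rate):
    """Anti-Hebbian rule: the weight decreases in proportion to pre * post activity."""
    return weight - rate * pre * post


def run_adaptation(weight, activity, rate, update):
    """Apply one update per (pre, post) activity sample and return the final weight."""
    for pre, post in activity:
        weight = update(weight, pre, post, rate)
    return weight
```

Both rules leave a weight unchanged whenever either cell is silent, because the product of the pre- and postsynaptic activities is then zero.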
[Figure 12.9 panels: rows (1a), (1b), (2a), and (2b), with frames at 100, 200, 300, 400, 500, 600, 700, and 800 ms; activity frames have axes ticked from 10 to 60, and weight panels have axes ticked from 0 to 2 and from 0 to 1.6.]
FIGURE 12.9 (See color insert following p. 272.) Pyramidal to pyramidal anti-Hebbian synaptic response to changes in the pyramidal activity. (1a): Frames of pyramidal cell activity due to pulse input to the LGN at 0 ms lasting for 150 ms. (1b): Frames of weight responses corresponding to the activities in (1a). (2a): Frames of pyramidal cell activity due to pulse input to the LGN at 400 ms following the first pulse lasting for 150 ms. (2b): Frames of synaptic weight responses corresponding to activities in (2a).
12.4
Simulation with a Sequence of Stationary Inputs
The stationary stimulus has been simulated by presenting a 150-ms square current pulse to a set of adjacent geniculate neurons (see Figure 12.10). For the purpose of our simulation, three equidistant positions of the stimuli have been chosen across the LGN. The stimuli are labeled “Left,” “Center,” and “Right” (see Figure 12.4); each input goes into 20 LGN neurons, namely neurons 1–20, 91–110, and 181–200, respectively, from left to right along the LGN array. To study the encoding property of the large-scale cortex model, noise has been
[Figure 12.10 axes: current (nA), with a tick at 0.3; LGN position (Left, Center, Right); time (ms), with a tick at 150.]
FIGURE 12.10 Simulation of flash inputs.
introduced into the model by injecting randomly generated, Gaussian-distributed currents into the somata of cortical neurons. Using waves generated by inputs at different locations in the visual space, we are able to study the encoding property of the model cortex (see Reference [9]). Without the Hebbian and anti-Hebbian adaptation, the model cortex produces propagating waves of activity that last for about 600 to 800 ms (see Figure 12.3), with the stationary inputs described above. After the wave has propagated, the cortical neurons remain hyperpolarized and unresponsive to any future inputs. In order to study the ability of the model cortex to encode a sequence of consecutive events, the model is expected to generate a sequence of activity waves corresponding to a sequence of visual inputs. In this section we claim that by introducing Hebbian/anti-Hebbian adaptation we are able to pull the model cortex out of hyperpolarization after the first wave has propagated. With the implementation of Hebbian/anti-Hebbian adaptation in the model cortex, one obtains a model that responds to the activities of the pyramidal cells by altering intercellular synaptic interactions. Among the many consequences of adaptation, an important one is that the duration of wave propagation is shortened from about 800 ms (see Figure 12.3) to less than 400 ms (see Figure 12.11). After the end of the first round of waves (around 450 ms in Figure 12.11), the synaptic interactions between pyramidal cells are stronger in the model cortex with adaptation than in the model without adaptation. This results in a strong amplification of tiny inputs into the pyramidal cells, which compensates for the hyperpolarization of the membrane potential. An input initiated at around 500 ms results in a second wave (see Figure 12.11).
The model cortex with adaptation that we describe in this section samples the visual world every 500 ms by producing a wave of cortical activity that lasts for a little less than 400 ms. Each of the 500-ms time intervals would
[Figure 12.11 panels: frames from 100 to 450 ms and from 600 to 950 ms, in 50-ms steps.]
FIGURE 12.11 (See color insert following p. 272.) Two cortical waves generated by the model cortex using Hebbian and anti-Hebbian adaptation with two consecutive inputs. The second input is initialized at 500 ms.
be called a period. In the simulations that we have carried out, a target is shown only for the first 150 ms of each period and removed subsequently. A target can be located at three different locations: Left (L), Center (C), or Right (R). In any given period, one would like to detect the “target location” from the associated cortical wave observed during the same period. Furthermore, one would like the detection results in any given period to be independent of prior locations of the target. In order to ensure that one period of cortical activity does not “spill over” to the next period, we consider a pair of consecutive periods. In each period a target is located at either L, C, or R. This gives rise to a total of nine pairs of target locations for the two consecutive periods, given by LL, LC, LR, CL, CC, CR, RL, RC, and RR. Each combination of two locations can be simulated as an input by presenting two 150-ms square current pulses that start at 0 ms and
500 ms, to the corresponding sets of adjacent geniculate neurons, respectively. The overall simulation time is set to 1000 ms. Each of the nine inputs causes the model cortex (with adaptation) to produce a pair of waves of activity in each of the two consecutive periods. Considering a noisy model of the cortex, the simulation is repeated 20 times for each of the nine input pairs, giving rise to a set of 180 pairs of activity waves. The simulation results consisting of membrane potentials of individual neurons are recorded and saved in a data file. Even though the data for all neurons are available, we are primarily interested in the pyramidal neurons, and the responses of pyramidal cells are denoted by I (t, n), 0 ≤ t ≤ T, 1 ≤ n ≤ N where t is time and n is the index of the pyramidal neuron. The responses of pyramidal neurons are visualized as movies by spatially resampling the data from a nonuniform grid of neuron coordinates to an artificially constructed l × l uniform grid. The program uses triangle-based linear interpolation, although other methods are also available (triangle-based cubic interpolation, nearest neighbor interpolation, and so forth) [20]. The interpolated data, for visualization, are denoted by I (x, y, t) where the pair (x, y) denotes the pixels. The value of membrane potential at each pixel is color coded and the spikes are not removed in the process. Selected snapshots from movies corresponding to stationary stimuli (assuming a model cortex without adaptation) are shown in Figure 12.3. A comparison between model waves [20] and experimental waves recorded by Senseman [28] shows that the two waves have similar features. They originate from the same point in the cortex (rostrolateral edge) and they propagate in both rostrocaudal and mediolateral directions. The spatiotemporal response I (x, y, t) of the model cortex to different target locations can be viewed as a collection of movie frames (snapshots). 
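The stimulation protocol described at the start of this paragraph (two 150-ms pulses at 0 ms and 500 ms, nine location pairs, 20 noisy repetitions each) can be sketched as follows. This is an illustrative Python fragment only: the function `stimulus_current` and the constant names are hypothetical, and the pulse amplitude is left as a parameter.

```python
from itertools import product

TARGETS = {"L": range(1, 21),      # LGN neurons 1-20
           "C": range(91, 111),    # LGN neurons 91-110
           "R": range(181, 201)}   # LGN neurons 181-200
PULSE_MS, PERIOD_MS = 150, 500     # pulse length and period (from the text)
TRIALS_PER_PAIR = 20               # noisy repetitions per input pair (from the text)


def stimulus_current(pair, amplitude_na, t_ms, neuron):
    """Current (nA) into 1-based LGN `neuron` at time t_ms for a pair such as "LC"."""
    period = int(t_ms // PERIOD_MS)
    if not 0 <= period < len(pair):
        return 0.0                 # outside the simulated periods
    in_pulse = (t_ms - period * PERIOD_MS) < PULSE_MS
    if in_pulse and neuron in TARGETS[pair[period]]:
        return amplitude_na
    return 0.0


pairs = ["".join(p) for p in product("LCR", repeat=2)]   # LL, LC, ..., RR
total_runs = len(pairs) * TRIALS_PER_PAIR
```

With nine pairs and 20 repetitions, `total_runs` reproduces the 180 pairs of activity waves used in the rest of the chapter.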
Given that every frame has l × l pixels, and every movie has m frames, it is clear that the dimension of I (x, y, t) could be very high (l × l × m). In order to compare two movies, in the next section we proceed to describe a principal components-based technique. This method has also been used earlier by Senseman and Robbins for the analysis of data recorded from the cortex of a freshwater turtle [29], [30]. Principal components analysis has been introduced independently by many authors at different times. The method is widely used in various disciplines, such as image and signal processing, data compression, fluid dynamics, partial differential equations, weather prediction, and so forth [11]. In image processing, the method is used for removing redundancy (decorrelating pixels) from images [25]. The transformation itself is linear and represents a rotation of a coordinate system, so that neighboring pixels in the new coordinate system are less correlated. Moreover, the rotation proposed by the method is optimal in that it leads to a complete removal of the correlation from neighboring pixels, which is equivalent to diagonalizing the image correlation matrix. Consequently, the image can be approximated in a low-dimensional subspace, using only selected basis vectors, also called principal eigenvectors. In the theory of partial differential equations, the method is useful for finding a separable approximation to the solution of a partial differential
equation, which is optimal in the sense that it maximizes the kinetic energy cost functional [8]. Depending on the context, the method goes by the names Karhunen–Loeve (KL) decomposition, proper orthogonal decomposition, Hotelling decomposition, and singular value decomposition. We shall refer to it as the KL decomposition, which has already been applied to the analysis of cortical waves (see References [9], [21], [29], [30]). The next section describes some of the main ideas using double KL decomposition.
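The statement that the KL transform is a rotation that removes correlation can be checked on a two-dimensional toy example. The Python sketch below is an illustration only, not part of the chapter's method: in two dimensions the diagonalizing rotation angle has a closed form, whereas the general case requires the eigenvectors of the covariance matrix. The sample data and all function names are hypothetical.

```python
import math


def covariance_2d(points):
    """Return (cxx, cxy, cyy) for a list of (x, y) samples."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    return cxx, cxy, cyy


def principal_angle(points):
    """Rotation angle (radians) that diagonalizes the 2 x 2 covariance matrix."""
    cxx, cxy, cyy = covariance_2d(points)
    return 0.5 * math.atan2(2.0 * cxy, cxx - cyy)


def rotate(points, angle):
    """Express the samples in coordinate axes rotated by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * y, -s * x + c * y) for x, y in points]


# Hypothetical, strongly correlated samples.
samples = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
decorrelated = rotate(samples, principal_angle(samples))
```

After the rotation the cross-covariance of `decorrelated` vanishes up to rounding, and almost all of the variance lies along the first new axis, which is why a single basis vector would approximate these samples well.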
12.5
Encoding Cortical Waves with β-Strands Using Double KL Decomposition
In this section, we describe how the set of 180 pairs of activity waves (described in Section 12.4) is encoded, using double KL decomposition, as β-strands. For each of the nine input pairs, the model cortex with adaptation is repeatedly simulated by adding independent and identically distributed Gaussian noise to each of its neurons. As a result of the additive noise injected into the cortical neurons, repeated presentation of the same stimulus does not produce the same response in general. We discuss how to utilize a two-step KL decomposition to analyze the cortical responses of the model cortex to various stimuli with injected noise using the sliding detection window (SDW) technique. As shown in Figure 12.12, the time axis is covered by equal-length, overlapping encoding windows and double KL decomposition is applied to the segment of the spike rate signal that is covered by each window. Both the starting and ending times of the windows change while the length of the window remains constant. Another encoding window technique considered in Reference [9] is the expanding detection window (EDW), for which the starting time remains unchanged at 0 ms. In this section we only describe the SDW technique, in which each segment of the cortical wave is mapped to a point in a suitably defined B-space. Plotting images of successive windows produces a
FIGURE 12.12 Encoding window. The time axis is covered by equal-length, overlapping, sliding encoding windows. Both the starting and ending times of the windows slide over the time axis while the length of the window remains constant. a is the amount of time that the window slides and w is the width of each encoding window.
FIGURE 12.13 Responses of model pyramidal cells. The traces in the left column show voltage traces from three model pyramidal cells. The traces in the right column show the smoothed spike rate of the same three cells. (Axes: membrane voltage in V against time in ms, from 0 to 1200 ms.)
sequence of points in the B-space, called the β-strand (see Figure 12.15 below). This is a vector-valued function of time and is an alternative way to encode the original movie as a strand. The encoding process is now described in detail. Prior to the double KL decomposition process, the spike trains from the pyramidal cells are smoothed by a low-pass filter into a spike rate function. Figure 12.13 shows some examples of spike trains of pyramidal cells and their smoothed spike rates. For a particular input stimulus, we continue to use I (t, n), 0 ≤ t ≤ T, 1 ≤ n ≤ N to denote the smoothed spike rate of the cell as a spatiotemporal response signal, where t is time and n is the index of the pyramidal neuron. I (t, n) can be viewed as a matrix. The tth row represents the spike rate of each neuron at time t in response to a particular stimulus. The nth column corresponds to the pyramidal neuron n. Let the length of each
time window be $w$ and let $I(t, n)$, $t_1 + 1 \le t \le t_1 + w$, $t_1 = 0, a, 2a, \ldots$ be the response signals for different time windows. Here $a$ is the amount of time that the encoding windows slide (see Figure 12.12). Let $M$ denote the total number of cortical response movies in response to stimuli in the left, center, and right visual fields. For the $k$th movie, $k = 1, 2, \ldots, M$, the spatiotemporal signal in this time window can be viewed as a collection of vectors $\{I^k(t_1 + 1), I^k(t_1 + 2), \ldots, I^k(t_1 + w)\}$, where $I^k(t_1 + i) \in \mathbb{R}^{1 \times N}$, $i = 1, 2, \ldots, w$. The dimensionality of the cortical response is reduced by two KL transforms into A-space and B-space, respectively. We first describe the KL transform into A-space. The covariance matrix $C_1 \in \mathbb{R}^{N \times N}$ for a family of $M$ movies is calculated as:
$$C_1 = \frac{1}{Mw} \sum_{k=1}^{M} \sum_{i=1}^{w} \left(I^k(t_1 + i)\right)^T \left(I^k(t_1 + i)\right) \qquad (12.2)$$
where $[I^k(t_1 + i)]^T$ is the transpose of $I^k(t_1 + i)$. The matrix $C_1$ is symmetric and positive semidefinite, so its eigenvalues are all real and nonnegative and the corresponding eigenvectors form an orthonormal basis in $\mathbb{R}^N$. The eigenvectors corresponding to the largest $p$ eigenvalues of $C_1$ are called the principal eigenvectors, or modes, and the $p$th-order successive reconstruction of the spatiotemporal signal $I^k(t) \in \mathbb{R}^{1 \times N}$ is given by:
$$\hat{I}^k(t_1 + i) = \sum_{j=1}^{p} \alpha_j^k(t_1 + i)\, \phi_j^T, \qquad i = 1, 2, \ldots, w \qquad (12.3)$$
where $\phi_j \in \mathbb{R}^{N \times 1}$ is the $j$th principal mode, the time coefficients $\alpha_j^k(t_1 + i)$ are given by $\alpha_j^k(t_1 + i) = \langle I^k(t_1 + i), \phi_j^T \rangle$, and $\langle \cdot, \cdot \rangle$ stands for the standard inner product. The coefficients $\alpha_j^k(t_1 + i)$ of the KL decomposition are uncorrelated in terms of $j$, and we call $\alpha_j^k(t)$, $t_1 + 1 \le t \le t_1 + w$, $1 \le j \le p$, the $p$th-order A-space representation of the movie segment within the corresponding time window for the $k$th movie. Figure 12.14 shows the first three principal modes and the corresponding time coefficients in a certain time window. The vector function
$$\left[\alpha_1^k(t), \alpha_2^k(t), \ldots, \alpha_p^k(t)\right], \qquad t_1 + 1 \le t \le t_1 + w \qquad (12.4)$$
can be viewed as a sample function of a vector random process. Statistical analysis of a random process can be facilitated if the process is further parameterized using a second KL decomposition. Let
$$\gamma_j^k = \begin{bmatrix} \alpha_j^k(t_1 + 1) \\ \alpha_j^k(t_1 + 2) \\ \vdots \\ \alpha_j^k(t_1 + w) \end{bmatrix}, \qquad (\xi^k)^T = \begin{bmatrix} \gamma_1^k \\ \gamma_2^k \\ \vdots \\ \gamma_p^k \end{bmatrix} \qquad (12.5)$$
FIGURE 12.14 (See color insert following p. 272.) The left-hand column shows the three principal spatial modes (i = 1, 2, 3). The right-hand column shows the corresponding time coefficients.
where $j = 1, 2, \ldots, p$. Calculating the covariance matrix as in Equation (12.2), we have:
$$C_2 = \frac{1}{M} \sum_{k=1}^{M} (\xi^k)^T (\xi^k). \qquad (12.6)$$
The $q$th-order successive approximation of the $k$th vector $\xi^k$ is given by:
$$\hat{\xi}^k = \sum_{j=1}^{q} \beta_j^k\, \psi_j^T \qquad (12.7)$$
where $\psi_j$, $j = 1, 2, \ldots, q$, are the eigenvectors corresponding to the largest $q$ eigenvalues of the matrix $C_2$. The coefficients $\beta_j^k$ are found by orthogonal projection of $\xi^k$ onto the $j$th eigenvector: $\beta_j^k = \langle \xi^k, \psi_j^T \rangle$. The β vector is referred to as the B-space representation of the cortical movie restricted to a given time window. It turns out that only a few β components capture most of the information contained in the original movie and the rest of the β components are close to zero. Repeating the above data-processing procedure for all the sliding encoding windows of a movie produces a β-strand as a vector-valued function of time. We refer to this β-strand as the B-space representation of this movie. By discarding those components that are close to zero, we obtain a low-dimensional representation of the original movie segment. If, for each sliding encoding window, the first $q$ components of each β vector are used, we say that the vector consisting of these $q$ components is the $q$th-order B-space representation of the movie. The statistical mean of the β-strands of the left-, center-, and right-stimuli movies can be easily obtained. In our analysis of this section, we used $w = 10$, $M = 180$, $p = 679$, $q = 3$, and the values
FIGURE 12.15 (See color insert following p. 272.) Typical β-strands with double KL decomposition. The left panel (First Waves) shows the mean β-strands for the 60 presentations of stimuli at the left, center, and right clusters of geniculate neurons in the first time period. The right panel (Second Waves) shows the mean β-strands in the second time period. The colors blue, red, and green represent the actual positions of stimuli at the left, center, and right clusters of geniculate neurons.
of t1 were chosen to be 0, 2, 4, and so on. Figure 12.15 shows the mean β-strands for 60 presentations of stimuli at the left, center, and right clusters of geniculate neurons in the first time period and the second time period, respectively.
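The encoding steps of this section can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the movies are assumed to be stored in a hypothetical array `movies` of shape (M, T, N), time indices are 0-based (so a window covers rows t1 to t1 + w - 1), and the function names are invented for this example. As in Equations (12.2) and (12.6), no mean is subtracted before forming the covariance matrices.

```python
import numpy as np

def top_eigvecs(C, n):
    """Columns are the n eigenvectors of symmetric C with the largest eigenvalues."""
    vals, vecs = np.linalg.eigh(C)            # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n]
    return vecs[:, order]

def double_kl(movies, t1, w, p, q):
    """Encode one window of every movie as a q-dimensional beta vector.

    movies: hypothetical array of shape (M, T, N), one smoothed spike-rate movie per row block.
    Returns an array of shape (M, q).
    """
    M, _, N = movies.shape
    seg = movies[:, t1:t1 + w, :]                      # (M, w, N) window segments
    X = seg.reshape(M * w, N)                          # stacked row vectors I^k(t1 + i)
    C1 = X.T @ X / (M * w)                             # Eq. (12.2)
    Phi = top_eigvecs(C1, p)                           # (N, p) principal modes
    alpha = seg @ Phi                                  # (M, w, p) A-space time coefficients
    xi = alpha.transpose(0, 2, 1).reshape(M, p * w)    # gamma_1, ..., gamma_p concatenated
    C2 = xi.T @ xi / M                                 # Eq. (12.6)
    Psi = top_eigvecs(C2, q)                           # (p*w, q)
    return xi @ Psi                                    # beta coefficients, Eq. (12.7)

def beta_strands(movies, w, a, p, q):
    """Slide the window by a samples and stack the beta vectors: shape (M, n_windows, q)."""
    T = movies.shape[1]
    starts = range(0, T - w + 1, a)
    return np.stack([double_kl(movies, t1, w, p, q) for t1 in starts], axis=1)
```

One practical caveat: eigenvectors are defined only up to sign, so a sketch like this may need a sign convention across consecutive windows before the resulting strands are compared or averaged.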
12.6
Statistical Detection of Position
In this section, the problem of detection is posed as a hypothesis testing problem. Assume that the three positions of the target correspond to three different hypotheses, that is, let $H_1$, $H_2$, and $H_3$ denote the hypotheses that the stimulus is from the left, center, and right, respectively. Let us write:
$$r(t) = s_i(t) + n(t), \qquad i = 1, 2, 3 \qquad (12.8)$$
where $s_i(t)$ is the signal component of the β-strand under hypothesis $H_i$ and $n(t)$ represents a zero-mean, vector-valued Gaussian noise process contained in the β-strand.
12.6.1 Series Expansion of Sample Functions of Random Processes
The β-strand, $r(t)$, can be regarded as a sample function of a vector stochastic process. It is well known that a deterministic waveform with finite energy can be represented in terms of a series expansion. This idea can be extended to include sample functions of a random process as well. In our case, we propose to obtain a series expansion of the β-strand within a chosen detection window. This process involves finding a complete orthonormal set $\{\phi_i(t), i \in N\}$
(N denotes the set of positive integers) and expanding $r(t)$ as:
$$r(t) = \operatorname*{l.i.m.}_{L \to \infty} \sum_{i=1}^{L} r_i\, \phi_i(t), \qquad T_1 \le t \le T_2 \qquad (12.9)$$
where $\phi_i(t)$ are vectors of the same dimension as $r(t)$. Let $r^k(t)$ and $\phi_i^k(t)$ denote the $k$th components of the vectors $r(t)$ and $\phi_i(t)$, respectively. In Equation (12.9), l.i.m. denotes "limit in the mean," which is defined as:
$$\lim_{L \to \infty} E\left[\sum_{k=1}^{q} \left(r^k(t) - \sum_{i=1}^{L} r_i\, \phi_i^k(t)\right)^2\right] = 0, \qquad T_1 \le t \le T_2 \qquad (12.10)$$
where $E$ is the expectation operator. The coefficients $r_i$, to be defined later in Equation (12.13), are required to be uncorrelated with each other. That is to say, if $E[r_i] = m_i$, then we would like to have:
$$E[(r_i - m_i)(r_j - m_j)] = \lambda_i \delta_{ij}. \qquad (12.11)$$
The value of $r_i^2$ has a simple physical interpretation. It corresponds to the energy along the coordinate function, $\phi_i(t)$, in a particular sample function. It is shown in Reference [31] that if $m_i = 0$, then $\lambda_i$ is the expected value of the energy along $\phi_i(t)$. Clearly, $\lambda_i \ge 0$ for all $i$. The complete orthonormal set $\phi_i(t)$ is the solution of the integral equation:
$$\lambda_i\, \phi_i^k(t) = \sum_{j=1}^{q} \int_{T_1}^{T_2} K_{kj}(t, u)\, \phi_i^j(u)\, du \qquad (12.12)$$
where $k = 1, 2, \ldots, q$, $T_1 \le t \le T_2$, and $K(t, u)$ is the covariance matrix of the noise process $n(t)$, that is, $K_{ij}(t, u) = E[n_i(t)\, n_j(u)]$. Here, $t$ and $u$ denote time and $i$ and $j$ denote indices of the components of the vector noise process. In Equation (12.12), $\lambda_i$ is called the eigenvalue of the noise process and $\phi_i(t)$ is called the corresponding eigenfunction. Once the coordinate functions $\{\phi_i(t), i \in N\}$ are obtained, one can project the sample function $r(t)$, $T_1 \le t \le T_2$, onto $\phi_i(t)$ and obtain the coefficient $r_i$ as:
$$r_i = \sum_{k=1}^{q} \int_{T_1}^{T_2} r^k(t)\, \phi_i^k(t)\, dt. \qquad (12.13)$$
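For intuition, the integral equation (12.12) and the projection (12.13) can be approximated on a time grid. The sketch below is only an illustration under simplifying assumptions that are not taken from the chapter: a scalar process (q = 1), a uniform grid with step `dt`, and rectangle-rule integration. The function names are hypothetical.

```python
import numpy as np

def kl_basis(K, dt):
    """Approximate eigenvalues and eigenfunctions of a sampled covariance kernel.

    K: (n, n) symmetric matrix with K[a, b] = K(t_a, t_b) on a uniform grid.
    Discretizing the integral in Eq. (12.12) gives the matrix problem (K * dt) phi = lambda * phi.
    Returns eigenvalues in descending order and eigenfunctions as columns,
    orthonormal under the discrete inner product sum(f * g) * dt.
    """
    vals, vecs = np.linalg.eigh(K * dt)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order] / np.sqrt(dt)

def project(r, phi, dt):
    """Coefficients r_i of Eq. (12.13) for a scalar sample function r on the grid."""
    return phi.T @ r * dt
```

A sampled exponential kernel, for example `np.exp(-np.abs(t[:, None] - t[None, :]))`, is a convenient test input for such a routine.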
Let us recall from Section 12.5 that $q$ is the number of β components we choose for the B-space representation of the cortical movies. The $\nu$th-order representation of $r(t)$ can then be written as a vector $R = [r_1, r_2, \ldots, r_\nu]$.
12.6.2 Hypothesis Testing
The proposed detection algorithm is based on computing conditional probability densities and choosing a decision criterion (see Reference [31] for details). Commonly used decision criteria include the Bayes and
Neyman–Pearson criteria. In this chapter, we use the former for two reasons. The first is that the hypotheses are governed by probability assignments, which we denote as $P_j$, $j = 1, 2, 3$, that is, hypothesis $H_1$ occurs with probability $P_1$, hypothesis $H_2$ with probability $P_2$, and hypothesis $H_3$ with probability $P_3$. The second reason is that a certain cost is incurred each time an experiment is conducted. We propose to design our decision rule so that on the average the cost is as small as possible. It is demonstrated [31] that for a decision using the Bayes criterion, the optimum detection consists of computing the logarithm likelihood ratio and comparing it to a threshold. If we assign the cost of correct detection to be zero and that of wrong detection to be 1, the likelihood ratios can be computed (see Section 2.3 of Reference [31]) as follows:
$$\Lambda_1(R) = \frac{p_{r|H_2}(R|H_2)}{p_{r|H_1}(R|H_1)} \qquad (12.14)$$
$$\Lambda_2(R) = \frac{p_{r|H_3}(R|H_3)}{p_{r|H_1}(R|H_1)}. \qquad (12.15)$$
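The decision regions in the decision space are determined by comparing the logarithm likelihood ratio to the following thresholds:
$$\ln \Lambda_1(R) \; \underset{H_1 \text{ or } H_3}{\overset{H_2 \text{ or } H_3}{\gtrless}} \; \ln \frac{P_1}{P_2} \qquad (12.16)$$
$$\ln \Lambda_2(R) \; \underset{H_1 \text{ or } H_2}{\overset{H_3 \text{ or } H_2}{\gtrless}} \; \ln \frac{P_1}{P_3} \qquad (12.17)$$
$$\ln \Lambda_2(R) \; \underset{H_2 \text{ or } H_1}{\overset{H_3 \text{ or } H_1}{\gtrless}} \; \ln \Lambda_1(R) + \ln \frac{P_2}{P_3}. \qquad (12.18)$$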
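The three threshold tests of Equations (12.16) through (12.18) can be combined into a single decision function. The following is a small sketch with a hypothetical function name; it assumes the zero/one cost assignment stated above and resolves exact ties in favor of the lower-numbered hypothesis, a convention the text does not specify.

```python
import math

def decide(ln_l1, ln_l2, priors=(1 / 3, 1 / 3, 1 / 3)):
    """Return 1, 2, or 3: the hypothesis selected for a pair (ln Lambda_1, ln Lambda_2)."""
    p1, p2, p3 = priors
    h2_over_h1 = ln_l1 > math.log(p1 / p2)            # Eq. (12.16): H2 preferred to H1
    h3_over_h1 = ln_l2 > math.log(p1 / p3)            # Eq. (12.17): H3 preferred to H1
    h3_over_h2 = ln_l2 > ln_l1 + math.log(p2 / p3)    # Eq. (12.18): H3 preferred to H2
    if not h2_over_h1 and not h3_over_h1:
        return 1
    if h2_over_h1 and not h3_over_h2:
        return 2
    return 3
```

With equal priors all three thresholds reduce to zero offsets, so `decide(-1.0, -1.0)` selects hypothesis 1, while a point above the diagonal in the first quadrant, such as `decide(1.0, 2.0)`, selects hypothesis 3.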
The associated decision space has been sketched in Figure 12.16. For a particular strand $r(t)$, we say that the hypothesis $H_1$ is true, that is, the stimulus is from the left part of the visual space, if the logarithm likelihood ratio pair falls in region $H_1$. Likewise, the same can be said for $H_2$ and $H_3$. In Figure 12.16, if $P_1 = P_2 = P_3 = 1/3$, the dividing line between regions $H_1$ and $H_2$ becomes the negative vertical axis, the dividing line between regions $H_2$ and $H_3$ becomes the diagonal line which is 45 degrees counterclockwise from the positive horizontal axis, and the dividing line between regions $H_3$ and $H_1$ becomes the negative horizontal axis. In the following discussion, we assume that $P_1 = P_2 = P_3 = 1/3$. The vector noise process $n(t)$ in Equation (12.8) can be either white or colored, and we address only the case for which $n(t)$ is white.
12.6.3 Decoding with Additive Gaussian White Noise Model
If the vector noise process is Gaussian and white, that is, $E[n(t)\, n^T(u)] = N_0 I \delta(t - u)$, where $N_0 \in \mathbb{R}$, $I$ is the identity matrix, and $\delta(\cdot)$ is the Dirac
FIGURE 12.16 Decision space divided into three regions, $H_1$, $H_2$, and $H_3$, in terms of the logarithm likelihood ratio Equations (12.14) and (12.15). The horizontal axis is $\ln \Lambda_1(R)$, with threshold $\ln (P_1/P_2)$; the vertical axis is $\ln \Lambda_2(R)$, with threshold $\ln (P_1/P_3)$. $H_1$ is the hypothesis that the visual input is from left; $H_2$ is the hypothesis that the visual input is from center; $H_3$ is the hypothesis that the visual input is from right. For any given β-strand $r(t)$, the region that the pair $\ln \Lambda_1(R)$ and $\ln \Lambda_2(R)$ falls into in the decision space determines which hypothesis is true.
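function, the eigenfunctions of the noise process turn out to be the orthonormalization of $\{s_i(t), i = 1, 2, 3\}$. So, instead of solving the integral Equation (12.12), we apply the Gram–Schmidt orthogonalization procedure on $\{s_i(t), i = 1, 2, 3\}$ to get $\{\phi_i(t), i = 1, 2, 3\}$ as:
$$\phi_1(t) = s_1(t)/\mathrm{norm}(s_1(t)), \qquad \phi_2(t) = \psi_2(t)/\mathrm{norm}(\psi_2(t)), \qquad \phi_3(t) = \psi_3(t)/\mathrm{norm}(\psi_3(t))$$
where
$$\psi_2(t) = s_2(t) - c_1 \phi_1(t), \qquad \psi_3(t) = s_3(t) - c_2 \phi_1(t) - c_3 \phi_2(t)$$
$$c_1 = \mathrm{IP}(s_2(t), \phi_1(t))$$
$$c_2 = \frac{\begin{vmatrix} \mathrm{IP}(s_3(t), \phi_1(t)) & \mathrm{IP}(\phi_2(t), \phi_1(t)) \\ \mathrm{IP}(s_3(t), \phi_2(t)) & \mathrm{IP}(\phi_2(t), \phi_2(t)) \end{vmatrix}}{\begin{vmatrix} \mathrm{IP}(\phi_1(t), \phi_1(t)) & \mathrm{IP}(\phi_2(t), \phi_1(t)) \\ \mathrm{IP}(\phi_1(t), \phi_2(t)) & \mathrm{IP}(\phi_2(t), \phi_2(t)) \end{vmatrix}}$$
$$c_3 = \frac{\begin{vmatrix} \mathrm{IP}(\phi_1(t), \phi_1(t)) & \mathrm{IP}(s_3(t), \phi_1(t)) \\ \mathrm{IP}(\phi_1(t), \phi_2(t)) & \mathrm{IP}(s_3(t), \phi_2(t)) \end{vmatrix}}{\begin{vmatrix} \mathrm{IP}(\phi_1(t), \phi_1(t)) & \mathrm{IP}(\phi_2(t), \phi_1(t)) \\ \mathrm{IP}(\phi_1(t), \phi_2(t)) & \mathrm{IP}(\phi_2(t), \phi_2(t)) \end{vmatrix}}$$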
and $\mathrm{IP}(\cdot, \cdot)$ and $\mathrm{norm}(\cdot)$ are defined, respectively, as:
$$\mathrm{IP}(a(t), b(t)) = \sum_{k=1}^{q} \int_{T_1}^{T_2} a^k(t)\, b^k(t)\, dt$$
$$\mathrm{norm}(a(t)) = \sqrt{\mathrm{IP}(a(t), a(t))}.$$
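The orthonormalization above can be sketched on sampled signals as follows. The discretization (uniform step `dt`, rectangle-rule integration) and the function names are assumptions of this example, and the signals are assumed to be linearly independent. Because $\phi_1$ and $\phi_2$ are already orthonormal when $\psi_3$ is formed, the determinant expressions for $c_2$ and $c_3$ reduce to $\mathrm{IP}(s_3, \phi_1)$ and $\mathrm{IP}(s_3, \phi_2)$, which is what the loop computes.

```python
import numpy as np

def ip(a, b, dt):
    """Discrete version of IP(a, b): sum over the q components and integrate over time."""
    return float(np.sum(a * b) * dt)

def gram_schmidt(signals, dt):
    """Orthonormalize a list of sampled vector signals, each of shape (n_samples, q)."""
    basis = []
    for s in signals:
        psi = np.array(s, dtype=float)
        for phi in basis:
            psi -= ip(s, phi, dt) * phi           # subtract the component along each earlier phi
        basis.append(psi / np.sqrt(ip(psi, psi, dt)))
    return basis
```

Projecting a sampled β-strand `r` onto the result, `ip(r, phi_i, dt)`, then gives the coefficients $r_1$, $r_2$, $r_3$ of Equation (12.13).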
The remaining $\phi_i(t)$ consist of an arbitrary orthonormal set whose members are orthogonal to $\phi_1(t)$, $\phi_2(t)$, and $\phi_3(t)$ and are chosen so that the entire set is complete. We then project $r(t)$ onto this set of orthonormal coordinate functions to generate coefficients $r_i$ as in Equation (12.13). All of the $r_i$ except $r_1$, $r_2$, and $r_3$ do not depend on which hypothesis is true and are statistically independent of $r_1$, $r_2$, and $r_3$. The mean values of $r_1$, $r_2$, and $r_3$ depend on which hypothesis is true: $E[r_i | H_j] = m_{ij}$, $i, j = 1, 2, 3$. Note also that the coefficients $r_1$, $r_2$, $r_3$ are uncorrelated with each other. Based on the Gaussian assumption, the logarithm likelihood ratios (12.14) and (12.15) can be calculated as:
$$\ln \Lambda_1(R) = \frac{1}{N_0} \sum_{i=1}^{3} \left( r_i m_{i2} - \frac{1}{2} m_{i2}^2 - r_i m_{i1} + \frac{1}{2} m_{i1}^2 \right)$$
$$\ln \Lambda_2(R) = \frac{1}{N_0} \sum_{i=1}^{3} \left( r_i m_{i3} - \frac{1}{2} m_{i3}^2 - r_i m_{i1} + \frac{1}{2} m_{i1}^2 \right).$$
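As a numerical illustration of the two expressions above, the sketch below evaluates them for a coefficient vector and a table of conditional means. The array layout (`m[i, j]` holding $m_{i,j+1}$ with 0-based indices) and the function name are choices made for this example only.

```python
import numpy as np

def log_likelihood_ratios(r, m, n0):
    """Return (ln Lambda_1, ln Lambda_2) under the white Gaussian noise model.

    r:  length-3 vector of coefficients r_1, r_2, r_3.
    m:  3x3 array with m[i, j] = E[r_(i+1) | H_(j+1)].
    n0: noise level N_0.
    """
    r = np.asarray(r, dtype=float)
    m = np.asarray(m, dtype=float)

    def term(j):
        # sum_i (r_i * m_ij - 0.5 * m_ij^2) for hypothesis column j
        return r @ m[:, j] - 0.5 * (m[:, j] @ m[:, j])

    return (term(1) - term(0)) / n0, (term(2) - term(0)) / n0
```

Each ratio equals the difference of squared distances from $R$ to the two mean vectors, divided by $2N_0$, so a point lying closer to the $H_2$ mean than to the $H_1$ mean yields a positive $\ln \Lambda_1(R)$.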
In this study, the length of the sliding detection window has been set to 99 ms. The waves generated in each of the time periods have been used to detect the location of the target at that time period. In Figure 12.17, we show the decision spaces for a set of five different time windows chosen from two consecutive time periods. The column on the left corresponds to the first period and the column on the right corresponds to the second period. The noise n(t) is assumed to be additive, white, Gaussian, and independent over time. For each of the decision spaces in Figure 12.17, each point on the decision space represents a given cortical wave (restricted to the corresponding time window) in response to an unknown stimulus. The actual position of the stimuli at the left, center, and right clusters of geniculate neurons are encoded by the blue, red, and green colors, respectively. Ideally, any point corresponding to a left, center, or right stimulus should fall in the region of H1 , H2 , or H3 , respectively, on the decision space. Any point that does not fall in its corresponding region in the decision space produces a detection error. In Figure 12.18, we have plotted the detection error probability as a function of the location of the “time window.” We observe that the detection error increases slightly when the detection window slides to the latter part of any period. This indicates that the target locations are “less detectable” toward the latter part of the period in comparison to the earlier part, an observation that has already been made by Du et al. [9]. We also note that the detection error probabilities are slightly higher for the second time period in comparison to the first, indicating the “spillover effect” from the first time period. This phenomenon has not been studied in detail in this chapter and will be described in the future.
FIGURE 12.17 (See color insert following p. 272.) Decision spaces for the detection of three hypotheses. The coordinates are log likelihood ratios computed for five different time windows (65–163 ms, 145–243 ms, 225–323 ms, 305–403 ms, and 385–483 ms). The left column (First Waves) shows the decision spaces using the waves in the first time period and the right column (Second Waves) shows the decision spaces using the waves in the second time period. The actual positions of stimuli at the left, center, and right clusters of geniculate neurons are encoded by the blue, red, and green colors, respectively.
FIGURE 12.18 Detection error probability using the statistical method with points on β-strands within a sliding time window of 99 ms. The left panel (First Waves) shows the detection error probability using the first waves and the right panel (Second Waves) shows the detection error probability using the second waves.
12.7
Detection Using Nonlinear Dynamics
The purpose of this section is to introduce yet another computing paradigm, emerging from a network of oscillators, for the purpose of decoding from cortical waves. Elements of the oscillator network interact with each other via phases rather than amplitudes; memorized patterns correspond to synchronized states. Each unit of the oscillator network oscillates with the same frequency and a prescribed phase relationship. For pattern recognition with a network of oscillators, phase differences, instead of phases, play a crucial role. The mechanism of recognition is related to phase locking. To illustrate the main idea, we would like to review a model proposed by Kuramoto [12].
12.7.1 Phase Locking with a Network of Kuramoto Models
Consider a dynamic system of the form:
$$\dot{\phi}_i = \omega + \sum_{j=1}^{N} s_{ij} \sin(\phi_j - \phi_i + \psi_{ij}) \qquad (12.19)$$
where $\phi_i$, $i = 1, \ldots, N$ (assume $N = 2$ for illustration) are phase variables taking values in the interval $[-\pi, \pi)$. The parameters $s_{ij}$ and $\psi_{ij}$ are assumed to satisfy $s_{ij} = s_{ji}$, $\psi_{ij} = -\psi_{ji}$. The index $i$ refers to the $i$th unit and these units are coupled. In order to understand the dynamics of Equation (12.19), we define a new variable $\phi = \phi_1 - \phi_2$ and rewrite Equation (12.19)
as follows:
$$\dot{\phi} = -2 s_{12} \sin(\phi - \psi_{12}). \qquad (12.20)$$
The stationary points of Equation (12.20) are given by $\phi - \psi_{12} = k\pi$, out of which the stable points (assuming $s_{12} > 0$) are given precisely by:
$$\phi - \psi_{12} = 2k\pi, \qquad k = 0, \pm 1, \pm 2, \ldots. \qquad (12.21)$$
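The phase-locking behavior can be checked numerically. The sketch below integrates Equation (12.19) with a forward-Euler scheme; the step size, the number of steps, and the function names are illustrative choices rather than values from the chapter, and the coupling matrices are passed as hypothetical NumPy arrays `s` and `psi`.

```python
import numpy as np

def simulate_kuramoto(phi0, omega, s, psi, dt=0.01, steps=5000):
    """Forward-Euler integration of Eq. (12.19); s and psi are (N, N) arrays."""
    phi = np.array(phi0, dtype=float)
    for _ in range(steps):
        diff = phi[None, :] - phi[:, None] + psi     # diff[i, j] = phi_j - phi_i + psi_ij
        phi = phi + dt * (omega + np.sum(s * np.sin(diff), axis=1))
    return phi

def wrap(x):
    """Wrap an angle to the interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi
```

For two units with $s_{12} = s_{21} > 0$ and $\psi_{21} = -\psi_{12}$, the wrapped difference `wrap(phi[0] - phi[1] - psi12)` should approach zero, in line with Equation (12.21), while both phases keep advancing at the common frequency.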
For $\phi_1$, $\phi_2$ in the interval $[-\pi, \pi)$, $\phi = \psi_{12}$ and $\phi = \psi_{12} + 2\pi$ are the two stable points if $\psi_{12} < 0$, and $\phi = \psi_{12}$ and $\phi = \psi_{12} - 2\pi$ are the two stable points if $\psi_{12} > 0$. Up to mod $2\pi$, the two stable points of $\phi$ are actually the same, indicating that Equation (12.20) converges globally to a unique equilibrium point.
12.7.2 Memory with Two Elements
Let us discuss the problem of detecting $n$ patterns with a Kuramoto model using two units (i.e., $N = 2$). In order to use Equation (12.20) for the purpose of memorizing $n$ patterns, we would require that it has (at least) $n$ equilibria. This can be achieved by rescaling the phase variables as:
$$\bar{\phi}_1 = \frac{1}{n} \phi_1, \qquad \bar{\phi}_2 = \frac{1}{n} \phi_2.$$
Rewriting Equation (12.19) with respect to the new variables, we obtain:
$$\dot{\bar{\phi}}_1 = \frac{1}{n} \omega + \frac{1}{n} s_{12} \sin(n \bar{\phi}_2 - n \bar{\phi}_1 + \psi_{12})$$
$$\dot{\bar{\phi}}_2 = \frac{1}{n} \omega + \frac{1}{n} s_{21} \sin(n \bar{\phi}_1 - n \bar{\phi}_2 + \psi_{21}).$$
By defining $\bar{\phi} = \bar{\phi}_1 - \bar{\phi}_2$, we obtain analogously the following equation:
$$\dot{\bar{\phi}} = -\frac{2}{n} s_{12} \sin(n \bar{\phi} - \psi_{12}). \qquad (12.22)$$
Up to mod $2\pi$, the $n$ stable stationary points of Equation (12.22) are given by $\bar{\phi}_k^e = \frac{\psi_{12}}{n} + \frac{2(k-1)\pi}{n}$ if $\psi_{12} < 0$. Additionally, it can be verified that if:
$$\bar{\phi}_k^e - \frac{\pi}{n} < \bar{\phi}(0) < \bar{\phi}_k^e + \frac{\pi}{n} \qquad (12.23)$$
then $\bar{\phi}(t)$ converges to the $k$th stable stationary point $\bar{\phi}_k^e$. The phase difference variable $\bar{\phi}(t)$ can be plotted as a unit complex number $e^{i \bar{\phi}(t)}$. In Figure 12.19 such a plot is shown when the rescaling parameter $n$ is 3. This gives rise to three stable stationary points at $\bar{\phi}_k^e = \frac{\psi_{12}}{3} + \frac{2(k-1)\pi}{3}$, $k = 1, 2$, and 3. The main idea behind pattern recognition is to utilize the convergence properties of Equation (12.22) to distinguish among $n$ complex patterns. Let us
FIGURE 12.19 Phase variable $\bar{\phi}(t)$ is plotted as a unit complex number $e^{i \bar{\phi}(t)}$ with the rescaling parameter 3, showing three stable equilibria (marked at $\psi_{12}/3$, $\psi_{12}/3 + 2\pi/3$, and $\psi_{12}/3 + 4\pi/3$), which result in three regions of convergence for the dynamical system (12.22) under initial conditions constrained by Equation (12.23).
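define the following $n$ vectors in $\mathbb{C}^2$ as:
$$p_1 = \begin{bmatrix} \pi_1 \\ \pi_2 \end{bmatrix} \qquad \text{and} \qquad p_k = \begin{bmatrix} e^{+i \frac{(k-1)\pi}{n}} \pi_1 \\ e^{-i \frac{(k-1)\pi}{n}} \pi_2 \end{bmatrix} \qquad (12.24)$$
for $k = 2, 3, \ldots, n$, where $\pi_1$ and $\pi_2$ are any two complex numbers such that $|\pi_1| = |\pi_2| = 1$ and $\arg(\pi_1 \bar{\pi}_2) = \frac{\psi_{12}}{n}$.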
The complex vectors $p_k$, $k = 1, 2, \ldots, n$, are $n$ memorized complex patterns associated with the $n$ stable equilibria $\bar{\phi}_k^e$, $k = 1, 2, \ldots, n$. Let us define a mapping $\xi : \mathbb{C}^2 \to \mathbb{R}$ as follows:
$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \mapsto \arg(w_1 \bar{w}_2). \qquad (12.25)$$
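To make the construction concrete, the sketch below builds the patterns of Equation (12.24), applies the map of Equation (12.25), and integrates Equation (12.22) from a given initial phase difference. It rests on assumptions not found in the chapter: the particular choice $\pi_2 = 1$ and $\pi_1 = e^{i \psi_{12}/n}$ (any pair satisfying the stated conditions would do), a forward-Euler integrator with an arbitrary step size, and hypothetical function names.

```python
import numpy as np

def patterns(n, psi12):
    """Rows are p_1, ..., p_n of Eq. (12.24), using the illustrative choice pi_2 = 1."""
    pi1, pi2 = np.exp(1j * psi12 / n), 1.0 + 0.0j
    k = np.arange(n)                                   # holds k - 1 = 0, ..., n - 1
    rot = np.exp(1j * k * np.pi / n)
    return np.stack([rot * pi1, np.conj(rot) * pi2], axis=1)

def xi(p):
    """Eq. (12.25): the phase of w1 * conj(w2), returned in (-pi, pi]."""
    return np.angle(p[..., 0] * np.conj(p[..., 1]))

def settle(phibar0, n, s12, psi12, dt=0.01, steps=5000):
    """Forward-Euler integration of Eq. (12.22) starting from phibar0."""
    x = float(phibar0)
    for _ in range(steps):
        x -= dt * (2.0 / n) * s12 * np.sin(n * x - psi12)
    return x
```

Applying `xi` to the rows of `patterns(n, psi12)` should reproduce the equilibria $\psi_{12}/n + 2(k-1)\pi/n$ modulo $2\pi$, and starting `settle` within $\pi/n$ of one of them, as required by Equation (12.23), should return to that equilibrium when $s_{12} > 0$.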
It would follow that $\xi(p_k) = \bar{\phi}_k^e = \frac{\psi_{12}}{n} + \frac{2(k-1)\pi}{n}$. Thus, the $n$ patterns $p_k$, $k = 1, 2, \ldots, n$, are mapped to the $n$ stable equilibria of Equation (12.22) under the map $\xi$. Patterns that are close to any $p_k$ would be attracted towards the
corresponding $k$th equilibrium. This principle can therefore be used to distinguish between alternative target locations, which we would now like to explore. Target locations are not typically given as a vector of complex numbers. Hence, we are not interested in a set of complex patterns. Rather, we would like to memorize patterns of real vectors. Assume that we have $n$ vectors $v_k$, $k = 1, 2, \ldots, n$, in $\mathbb{R}^Q$ which we would like to memorize. We consider a map:
$$T : \mathbb{R}^Q \to \mathbb{C}^2 \qquad (12.26)$$
such that $v_k \mapsto p_k$, $k = 1, 2, \ldots, n$, where the $p_k$ are defined as in Equation (12.24). The memorized patterns are associated with phase difference equilibria via the map $\xi \circ T$, where $\xi \circ T(v_k) = \bar{\phi}_k^e$. The dynamics of Equation (12.22) can thus be used to memorize $n$ patterns of real vectors. To detect a pattern $v$ in $\mathbb{R}^Q$, the phase variables $\phi_i$ of the two oscillatory units can be initialized with $\xi \circ T(v)$, and $\bar{\phi}(t)$ converges to one of the equilibria. As an illustration, a Kuramoto model with two units has been chosen to detect the position of visual inputs from the cortical waves generated in the first time period and the second time period, respectively. Three equilibria are achieved by rescaling with $n = 3$. The locations of the target are detected with the β-strands within a sliding time window of width 99 ms. The average of the points on a β-strand within the sliding time window, in either the first or second time period, is a vector in $\mathbb{R}^Q$ that is mapped to the complex vector space $\mathbb{C}^2$ and used to initialize the phases of the two units in the Kuramoto model. The map $T$ can be either linear or nonlinear. In the case of a linear transformation, the map $T$ between the real vector space $\mathbb{R}^Q$ and the complex vector space $\mathbb{C}^2$ is obtained by minimizing the following error criterion:
$$\sum_{k=1}^{3} \sum_{j=1}^{60} \left\| p_k - T q_{kj} \right\| \qquad (12.27)$$
where $k$ is the position index and $j$ indexes the 60 movies for each position. The vectors $p_k$ are defined in Equation (12.24) and the vectors $q_{kj}$ are the averages of the β-strands within a time window. In this example, we have $Q = 3$, indicating that only the first three principal components are used for the detection problem. It follows that the rescaled phase difference $\bar{\phi} = \phi/3$ converges to one of the three equilibria that are associated with the three positions of the targets, L, C, and R. Figure 12.20 shows plots of the phase difference variable $\bar{\phi}(t)$ in terms of sin and cos functions over time for the detection results from 180 cortical responses using the two-unit Kuramoto model and a linear map applied to the average points on β-strands within the time window associated with the waves in the
FIGURE 12.20 (See color insert following p. 272.) Convergence of phase variables in the two-unit Kuramoto model in detection with linear maps from β-space to complex vector space using the first waves (left column) and the second waves (right column) for five different time windows (65–163 ms, 145–243 ms, 225–323 ms, 305–403 ms, and 385–483 ms). Each panel plots sin(φ/3) and cos(φ/3) against time. The actual positions of stimuli at the left, center, and right clusters of geniculate neurons are encoded by the blue, red, and green colors, respectively.
first time period (left column), and the second period (right column), respectively. The figure shows the detection results within five different sliding time windows. Each curve in Figure 12.20 represents a given cortical wave in response to an unknown stimulus. The actual positions of the target at the left, center, and right clusters of geniculate neurons are encoded by blue, red,
and green colors, respectively. Ideally, any curve corresponding to a left, center, or right stimulus should converge to the associated point of equilibrium. Any curve that does not approach its corresponding equilibrium produces a detection error. Performing the detection over a continuum of detection windows and summing the total detection error for each detection window yields the relationship between the probability of detection error and the detection window, as shown in Figure 12.21. The detection error probabilities are considerably larger than those in Figure 12.18, indicating that the algorithm using nonlinear dynamic methods requires further improvement. Because the maps from β-strands to complex vector space are not limited to being linear, one would like to ask whether there is a nonlinear map for which the detection results can be improved. In this chapter we give an example of such a map that can improve the detection results using the Kuramoto model. We consider the nonlinear map $L$ from the space $\mathbb{R}^Q$ of β-strands to the points on the decision space $S$, obtained in Section 12.6. Subsequently, we can map the points on the decision space $S$ onto the complex vector space $\mathbb{C}^2$ by a linear transformation $D$. The maps are defined as follows:
$$L : \mathbb{R}^Q \to S, \qquad D : S \to \mathbb{C}^2. \qquad (12.28)$$
FIGURE 12.21 Detection error probability using the Kuramoto model with points on β-strands within a sliding time window of 99 ms. The left panel (First Waves) is the detection error probability using the first waves and the right panel (Second Waves) is the detection error probability using the second waves.
The map D is obtained by constructing an optimal linear function that maps each of the three clusters on the decision space S to its corresponding pattern on the complex vector space C2 . The details are similar to optimizing a cost function of the form (12.27). By concatenating the two maps L and D we obtain a nonlinear map from the space of β-strands onto the space C2 of complex patterns. One can now use the Kuramoto model as described earlier. The detection results, shown in Figures 12.22 and 12.23, are considerably improved in comparison to Figures 12.20 and 12.21.
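A linear map of the kind used for $T$ and $D$ can be fitted by ordinary least squares. The sketch below is a hedged illustration only: it assumes the error criterion of the form (12.27) is taken as a sum of squared norms, which the text does not state explicitly, and it replaces the simulation of Equation (12.22) with a shortcut that picks the equilibrium nearest to the initial phase difference on the unit circle (the outcome predicted by the convergence condition (12.23)). Function and variable names are hypothetical.

```python
import numpy as np

def fit_linear_map(q_vecs, targets):
    """Least-squares fit of a complex matrix T of shape (2, Q) such that T q is close to p.

    q_vecs:  (n_samples, Q) real vectors (for example, window averages of beta-strands).
    targets: (n_samples, 2) complex pattern assigned to each sample.
    """
    A = np.asarray(q_vecs, dtype=complex)
    B = np.asarray(targets, dtype=complex)
    sol, *_ = np.linalg.lstsq(A, B, rcond=None)       # solves A @ sol ~ B in the least-squares sense
    return sol.T

def classify(T, q_vec, equilibria):
    """Index of the equilibrium closest (on the unit circle) to arg(w1 * conj(w2)), w = T q."""
    w = T @ np.asarray(q_vec, dtype=complex)
    phase = np.angle(w[0] * np.conj(w[1]))
    dist = np.abs(np.exp(1j * phase) - np.exp(1j * np.asarray(equilibria)))
    return int(np.argmin(dist))
```

The same two functions apply unchanged when the inputs are points of the decision space $S$ rather than raw β-strand averages, which corresponds to fitting $D$ after the nonlinear map $L$.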
12.8
The Role of Tectal Waves in Motor Control: A Future Goal
We have remarked earlier that an important role of the cortex is in predicting the future locations of a moving target. The turtle tries to track a moving fish by anticipating its future location. We now describe in some detail how the tracking movement is executed. In Figure 12.7 we show a turtle that is trying to catch a fish moving past it from left to right. The turtle first notices the fish at point x1 at time t1 . It watches the fish until it reaches point x2 at time t2 , and then moves toward the fish to grasp it with its mouth. However, the fish keeps moving and reaches point x3 by the time t3 , when the turtle completes its head movement. Thus, the turtle will miss the fish if it bases its head movement on the position of the fish when the movement began. To be successful, the turtle must extrapolate the present position of the fish and plan its head movement to reach point x3 at time t3 . However, the fish has a stake in the event and will try to evade capture by making escape movements that appear unpredictable to the turtle. An important question in this context is, “How does the turtle accomplish this complex motion extrapolation task?”
FIGURE 12.22 (See color insert following p. 272.) Convergence of phase variables in the two-unit Kuramoto model in detection with maps from points in decision space to complex vector space using the first waves (left column) and the second waves (right column) for five different time windows (65–163 ms, 145–243 ms, 225–323 ms, 305–403 ms, and 385–483 ms). Each panel plots sin(φ/3) and cos(φ/3) against time. The actual positions of stimuli at the left, center, and right clusters of geniculate neurons are encoded by the blue, red, and green colors, respectively.
[Figure 12.22, continued panels: sin(φ/3) against cos(φ/3) and time for the time windows 305–403 ms and 385–483 ms.]
FIGURE 12.22 (See color insert following p. 272.) (Continued).
[Figure 12.23, panels: First Waves (left) and Second Waves (right); vertical axis from −0.05 to 0.2, horizontal axis from 0 to 500.]
FIGURE 12.23 Detection error probability using the Kuramoto model with maps from points in decision space to complex vector space. The left figure is the detection error probability using the first waves and the right figure is the detection error probability using the second waves.
The neural system in Figure 12.1 is the substrate for the prey capture behavior. A light intensity function, I (x, y, t), contains the image of the moving fish that is the input to the system. The image is transformed by the retinal circuitry and fed in parallel to the lateral geniculate nucleus and optic tectum. The lateral geniculate transmits information to the cortex, which sends information to the tectum via the striatum, pretectum, and substantia nigra. The tectum thus receives direct information from the retina with a relatively short time delay and indirect information from the cortex with a longer delay. The tectum contains a topographic map of visual space and can generate a head movement directed towards point x2 at time t2 . The movement is realized by projections from the tectum to the brainstem reticular formation, which drives the motoneurons that innervate the neck muscles. An interesting feature of the dynamics of this system is that moving stimuli produce waves of activity in the retina, visual cortex, and tectum. Berry et al. [2] used single electrodes to record the responses of retinal ganglion cells in salamanders and rabbits to moving bars. The neural image of the bar on the retina is a wave of activity that leads the advancing edge of the bar. This phenomenon is due to a contrast gain mechanism in the retinal circuitry and is a potential mechanism underlying motion extrapolation. Wilke et al. [34] used an array of 100 extracellular electrodes to record the responses of ganglion cells in turtles to moving and stationary bars. A moving bar produced a rapid and intense increase in the firing of ganglion cells that was not seen following the presentation of a stationary bar. Several studies ([1], [18], [19]) suggest that moving stimuli produce a wave, or “hill,” of activity in the superior colliculus (the mammalian homolog of the optic tectum). More recently, Port et al. 
[22] recorded simultaneously from pairs of electrodes implanted in the superior colliculus of macaques while the monkeys made visual saccades. Their data suggest that relatively large saccades — typically coordinated with head movements — are accompanied by a wave of activity that proceeds from caudal to rostral across the superior colliculus. They hypothesize that this wave is involved in determining the duration of eye and head movements. Finally, studies using both multielectrode arrays and voltage-sensitive dyes show that visual stimuli produce waves of activity in the visual cortex of freshwater turtles [28]. Information about the position and speed of visual stimuli is encoded in the spatiotemporal dynamics of the wave [9], [20]. Both retinal and tectal waves have been considered as candidate mechanisms for motion extrapolation [22], [34], so it is natural to inquire if they play a role in the turtle’s attempt to catch the fish. Our future work would test the hypothesis that the waves in the visual cortex and optic tectum contain information that can be used to extrapolate the future position of a moving stimulus from its past trajectory. Specifically, we hypothesize that the cortical wave contains information that can be used to extrapolate the future position of a stimulus that has been moving along a smooth trajectory, whereas the tectal wave contains information that can be used to predict the future position of a stimulus that undergoes an abrupt change in its trajectory.
12.9 Conclusion
This chapter has described the retino-cortical circuit of a turtle. The animal scans the visual world by splitting it into a sequence of time windows called “periods.” In each period, the cortex produces a wave of activity. We have emphasized that Hebbian and anti-Hebbian adaptation plays a crucial role in maintaining a sequence of cortical activity, and we have described how these activity waves can be used to decode the locations of unknown targets in space. Two decoding algorithms have been discussed. The first uses a statistical detection method to locate a target in the visual space; it applies hypothesis testing after projecting the observed data onto a decision space. The second is built on a nonlinear dynamic model that converges to an equilibrium point determined by where the model is initialized. Each equilibrium point is calibrated to one of the alternative target locations, and a function is constructed that maps the raw data onto a vector space of complex numbers that serves as the initial condition for the model. In this chapter we use the Kuramoto model and construct two different functions that generate the required initial conditions. We show that obtaining detection results comparable to those of the statistical method requires a nonlinear function to generate the initial conditions, and we construct one such function. The role of the retino-cortical pathway is discussed in the context of the overall visuomotor control problem, and we remark that the tectum plays an important part in generating the associated motor commands. Such control problems are the subject of future research.
Acknowledgments This work is partially supported by Grants EIA-0218186 and ECS-0323693 from the National Science Foundation.
References
1. R. W. Anderson, E. L. Keller, N. J. Gandhi, and D. Sanjoy. Two-dimensional saccade related population activity in superior colliculus in monkey. J. Neurophysiol., 79:2082–2096, 1998.
2. M. J. Berry II, I. H. Brivanlou, T. A. Jordan, and M. Meister. Anticipation of moving stimuli by the retina. Nature, 398:334–338, 1999.
3. J. M. Bower and D. Beeman. The Book of Genesis. TELOS, Santa Clara, 1998.
4. J. B. Colombe, J. Sylvester, J. Block, and P. S. Ulinski. Subpial and stellate cells: Two populations of interneurons in turtle visual cortex. J. Comp. Neurol., 471:333–351, 2004.
5. J. B. Colombe and P. S. Ulinski. Temporal dispersion windows in cortical neurons. J. Comput. Neurosci., 7:71–87, 1999.
6. C. E. Cosans and P. S. Ulinski. Spatial organization of axons in turtle visual cortex: Intralamellar and interlamellar projections. J. Comp. Neurol., 296:548–558, 1990.
7. F. Delcomyn. Foundations of Neurobiology. W. H. Freeman & Co., New York, 1998.
8. M. Dellnitz, M. Golubitsky, and M. Nicol. Symmetry of attractors and the Karhunen–Loeve decomposition. In L. Sirovich, editor, Trends and Perspectives in Applied Mathematics, pages 73–108. Springer Verlag, New York, 1994.
9. X. Du, B. K. Ghosh, and P. S. Ulinski. Encoding and decoding target locations with waves in the turtle visual cortex. IEEE Trans. Biomedical Engineering, 52:566–577, 2005.
10. G. B. Ermentrout and D. Kleinfeld. Traveling electrical waves in cortex: Insights from phase dynamics and speculation on computational role. Neuron, 29(3334):33–44, January 2001.
11. P. Holmes, J. L. Lumley, and G. Berkooz. Turbulence, Coherent Structure, Dynamical Systems and Symmetry. Cambridge University Press, Cambridge, 1996.
12. Y. Kuramoto. Chemical Oscillations, Waves, and Turbulence. Springer-Verlag, New York, 1984.
13. L. J. Larson-Prior, P. S. Ulinski, and N. T. Slater. Excitatory amino acid receptor-mediated transmission in geniculocortical and intracortical pathways within visual cortex. J. Neurophysiol., 66:293–306, 1991.
14. J. G. Mancilla, M. Fowler, and P. S. Ulinski. Responses of regular spiking and fast spiking cells in turtle visual cortex to light flashes. Vis. Neurosci., 15:979–993, 1998.
15. J. G. Mancilla and P. S. Ulinski. Role of GABAA-mediated inhibition in controlling the responses of regular spiking cells in turtle visual cortex. Vis. Neurosci., 18:9–24, 2001.
16. P. Z. Mazurskaya. Organization of receptive fields in the forebrain of Emys orbicularis. Neurosci. Behav. Physiol., 7:311–318, 1974.
17. K. A. Mulligan and P. S. Ulinski. Organization of geniculocortical projections in turtles: Isoazimuth lamellae in the visual cortex. J. Comp. Neurol., 296:531–547, 1990.
18. D. P. Munoz, D. Guitton, and D. Pelisson. Control of orienting gaze shifts by the tectoreticulospinal system in the head-free cat. III. Spatiotemporal characteristics of phasic motor discharges. J. Neurophysiol., 66:1642–1666, 1991.
19. D. P. Munoz and R. H. Wurtz. Saccade-related activity in monkey superior colliculus. II. Spread of activity during saccades. J. Neurophysiol., 73:2334–2348, 1995.
20. Z. Nenadic, B. K. Ghosh, and P. S. Ulinski. Modeling and estimation problems in the turtle visual cortex. IEEE Trans. Biomedical Engineering, 49:753–762, 2002.
21. Z. Nenadic, B. K. Ghosh, and P. S. Ulinski. Propagating waves in visual cortex: A large scale model of turtle visual cortex. J. Comput. Neurosci., 14:161–184, 2003.
22. N. L. Port, M. A. Sommer, and R. H. Wurtz. Multielectrode evidence for spreading activity across the superior colliculus movement map. J. Neurophysiol., 84:344–357, 2000.
23. J. C. Prechtl, T. H. Bullock, and D. Kleinfeld. Direct evidence for local oscillatory current sources and intracortical phase gradients in turtle visual cortex. Proc. Natl. Acad. Sci. USA, 97(2):877–882, 2000.
24. J. C. Prechtl, L. B. Cohen, P. P. Mitra, B. Pesaran, and D. Kleinfeld. Visual stimuli induce waves of electrical activity in turtle cortex. Proc. Natl. Acad. Sci. USA, 94(4):7621–7626, 1997.
25. K. R. Rao and P. C. Yip. The Transform and Data Compression Handbook. CRC Press, Boca Raton, FL, 2001.
26. P. A. Salin and D. A. Prince. Electrophysiological mapping of GABAA receptor mediated inhibition in adult rat somatosensory cortex. J. Neurophysiol., 75:1589–1600, 1996.
27. D. M. Senseman. Correspondence between visually evoked voltage sensitive dye signals and activity recorded in cortical pyramidal cells with intracellular microelectrodes. Vis. Neurosci., 13:963–977, 1996.
28. D. M. Senseman. Spatiotemporal structure of depolarization spread in cortical pyramidal cell populations evoked by diffuse retinal light flashes. Vis. Neurosci., 16:65–79, 1999.
29. D. M. Senseman and K. A. Robbins. Visualizing differences in movies of cortical activity. IEEE Trans. Visual. Comput. Graphics, 4:217–224, 1998.
30. D. M. Senseman and K. A. Robbins. Modal behavior of cortical neural networks during visual processing. J. Neurosci., 19:1–7, 1999.
31. H. L. Van Trees. Detection, Estimation and Modulation Theory. John Wiley & Sons, New York, 1968.
32. W. Wang, C. Campaigne, B. K. Ghosh, and P. S. Ulinski. Two cortical circuits control propagating waves in visual cortex. J. Comput. Neurosci., 19:263–289, 2005.
33. W. Wang, S. Luo, B. K. Ghosh, and P. S. Ulinski. Generation of receptive fields of subpial cells in turtle visual cortex. J. Integrative Neurosci., 5:561–593, 2006.
34. S. D. Wilke, A. Thiel, C. W. Eurich, M. Greschner, M. Bongard, J. Ammermuller, and H. Schwegler. Population coding of motion patterns in the early visual system. J. Comp. Physiol. A, 187:549–558, 2001.
P1: Binaya Dash/Sanjay Das November 16, 2007
16:6
7985
7985˙C013
13 Modeling, Simulation, and Control of Transportation Systems
Petros Ioannou, Yun Wang, and Hwan Chang
CONTENTS
13.1 Introduction
13.2 Modeling of Traffic Flow
13.2.1 Macroscopic Traffic Flow Models
13.2.2 Microscopic Traffic Flow Models
13.2.2.1 Linear Car-Following Model
13.2.2.2 Generalized Linear Car-Following Model
13.2.2.3 Asymmetric Model
13.2.2.4 Nonlinear Car-Following Model
13.2.2.5 Helly’s Model
13.2.2.6 Deterministic Optimal Control Model
13.2.2.7 Stochastic Optimal Control Model
13.2.2.8 Gipps Model
13.2.2.9 Psychophysical Spacing Model or Action Point (AP) Model
13.2.3 Mesoscopic Models
13.3 Highway Traffic Flow Control
13.3.1 System Description and Notation
13.3.2 Microscopic Simulation
13.3.3 Control Strategies
13.3.4 Evaluation of the HTFC System
13.4 Conclusions
References
13.1 Introduction
Freeways were built originally to provide almost unlimited mobility to road users for a number of years to come. No one predicted the dramatic increase in car ownership that has led to the current situation where congestion during rush hours often converts a smooth traffic flow to a virtual parking lot. The negative effects of congestion go beyond the obvious one, the travel time that drivers experience, to include environmental and health effects, travel cost, safety, quality of life, and so forth. The need for additional capacity in order to maintain the mobility and freedom drivers used to enjoy in the past can no longer be met in most metropolitan areas by following the traditional approach of building additional highways. The lack of space, high cost of land, environmental constraints, and the time it takes to build a new highway, as well as the possible disruption to the traffic system in already congested areas, make the building of new highways in many metropolitan areas a very challenging proposition. The only way to add additional capacity is to make the current system more efficient through the use of technologies and intelligence. As characterized on page 271 of Reference [1], “the traffic situation on today’s freeways resembles very much the one in urban road networks prior to the introduction of traffic lights: blocked links, chaotic intersections, reduced safety.” In another paper [2] it is pointed out that most of the congestion is due to mismanagement of traffic rather than to demand exceeding capacity. The current highway system operates as an almost open-loop dynamic system, which is susceptible to the effect of disturbances in ways that lead to frequent congestion phenomena. The successful implementation of intelligent transportation systems will require a good understanding of the dynamics of traffic and the effect of associated phenomena and disturbances. 
In addition, an understanding of human interaction within the transportation system is also crucial. Transportation systems and traffic phenomena constitute highly complex dynamic problems for which simplified mathematical models are not adequate. There is a need for more advanced methods and models in order to understand the complexity of traffic flow characteristics and find ways to manage traffic better using advanced technologies and feedback control techniques. The high complexity and dynamicity of traffic systems cannot always be captured accurately by mathematical models. For this reason computer simulation models are developed and tuned to describe the traffic flow characteristics on a given traffic network. Highway traffic flow is a complex dynamic system that exhibits phenomena that are not easy to capture using mathematics. When one looks at a particular vehicle, a small element of this large dynamic system, that element itself has complex dynamics, which include those of the vehicle as well as the dynamics of the driver. The driver’s response exhibits a certain level of randomness, as different drivers may respond differently under the same driving conditions. Different vehicles may have different dynamics. Furthermore, each vehicle interacts with others, leading to an interconnected dynamic system.
Zooming out, however, one can view the traffic flow as a flow of fluid where instead of overflows we have reductions in speed and flow rate. In such a macroscopic view the responses of individual vehicles are averaged, and quantities such as average speed, flow, and density are the main states of the system. Efforts to model traffic flow, both at the vehicle level (microscopic models) and at the flow level (macroscopic models), date back at least to the 1950s and continue today. With advances in computers and computational power, simulation tools have been developed to capture what mathematical models to date cannot adequately describe. This chapter presents an overview of traffic flow modeling at the microscopic and macroscopic levels, an example of a computer simulation model validated by field data, and the design of a highway traffic flow control system evaluated using the simulation model.
13.2 Modeling of Traffic Flow
Traffic flow models can be divided into two major classes, macroscopic and microscopic. Macroscopic models are concerned with the overall behavior of traffic flow as described by the values of average speed, density, and flow rate and do not capture individual vehicle responses, local traffic disturbances, and so forth. They are simpler to develop and analyze but less accurate in their description of traffic flow characteristics during transient traffic disturbances. Microscopic models, on the other hand, deal with individual vehicle/driver behavior as the vehicle operates in traffic. They are more accurate in the sense of capturing individual vehicle behavior but computationally demanding when the traffic flow involves many vehicles. For this reason software tools and packages have been developed to model traffic flow using the microscopic behavior of vehicles. These packages include CORSIM, developed by the Federal Highway Administration (FHWA) [3], and commercial ones such as VISSIM [4], AIMSUN [5], PARAMICS [6], and others. With advances in computers and computational speed these software tools allow the simulation of a large traffic network using microscopic models. In the following subsections we present some of the most popular macroscopic and microscopic models proposed in the literature over the years.

13.2.1 Macroscopic Traffic Flow Models

On the macroscopic level, traffic flow is often treated as a fluid flow, typically characterized by flow rate, $q$ (number of vehicles/unit time), density (concentration), $k$ (number of vehicles/unit length), and speed, $v$ (distance traveled/unit time). Generalized average speed is defined as generalized flow/generalized density [7]. It is more common to define flow as the rate at which vehicles pass a point in space, and density as the number of vehicles per unit length over a segment of road at an instant of time. To better understand these definitions, consider Figure 13.1.

[Figure: an observer beside the road and the road segment of length Δx upstream of the observer.]
FIGURE 13.1 Upstream vehicles passing the observer.

All the vehicles that are within a short distance $\Delta x$ upstream of the observer pass the observer in a very small time $\Delta t$. Because $\Delta t$ is small, speed and density can be considered constant. Therefore, $\Delta t$ is approximately equal to $\Delta x / v$. The total number of vehicles that passed the observer is approximately $k \Delta x$. Therefore the flow rate $q$ is approximately equal to $k \Delta x / \Delta t$, which is equal to $kv$. Thus, we have:

$$q = kv \tag{13.1}$$
at location $x$ and time $t$. If traffic is stationary, that is, at steady state, it is reasonable to assume that there is a relationship between flow and speed that depends on the properties of the road, the weather, the traffic composition (ratio of passenger cars to trucks), and so forth [7]. Figure 13.2 shows the speed–density relationship, flow–density relationship, and speed–flow relationship from some field data. Using similar field data and empirical studies, several static models have been proposed in the literature in an effort to capture these relationships in the form of equations. We describe some of these models below.

[Figure: three panels. Speed–Density Curve: v (km/h) against k (veh/km). Flow–Density Curve: q (veh/h/lane) against k (veh/km). Speed–Flow Curve: v (km/h) against q (veh/h/lane).]
FIGURE 13.2 Speed–density, flow–density, and speed–flow curves from some field data.

Greenshield’s model [8] assumes that the traffic flow speed $v = V(k)$ is a linear function of density, described as:

$$V(k) = v_{\mathrm{free}} \left(1 - \frac{k}{k_{\mathrm{jam}}}\right) \tag{13.2}$$

where $v_{\mathrm{free}}$ is the free flow speed and $k_{\mathrm{jam}}$ is the jam density. It is clear that as the density reaches the jam density the speed goes to zero. This model has been shown to approximate real traffic in the case of fairly light traffic conditions. Using Equations (13.1) and (13.2) we obtain the relationship between flow and density, given by the equation:

$$q = v_{\mathrm{free}} \left(k - \frac{k^2}{k_{\mathrm{jam}}}\right) \tag{13.3}$$

which shows a quadratic relationship between flow and density with a maximum flow $q_0$ reached at the critical density $k_c$. The maximum flow $q_0$ can be viewed as the capacity of the traffic network based on its geometry, road conditions, and so forth. The corresponding fundamental diagram based on Equation (13.3) is shown in Figure 13.3. The speed is equal to the slope of the straight line connecting the origin with any point on the curve. At low densities the flow increases linearly with density. In this case the speed is equal to the slope of the curve, which is $v_{\mathrm{free}}$, the free flow speed. As the density increases further, the flow also increases until it reaches the maximum value $q_0$ at the critical density $k_c$. After this point further increases in density lead to reduction of flow rate and congestion takes place until the jam density is reached, where the speed and flow rate become zero. In the region to the left of the critical density, the traffic is considered stable, and to the right of the critical density, the traffic is considered congested and unstable. Comparing Figure 13.3 with the corresponding curve in Figure 13.2 it is clear that the Greenshield model is a good approximation, at least qualitatively, of the traffic flow at low densities.
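The critical density and capacity of Greenshield’s model follow directly from Equation (13.3): setting $dq/dk = v_{\mathrm{free}}(1 - 2k/k_{\mathrm{jam}}) = 0$ gives $k_c = k_{\mathrm{jam}}/2$ and $q_0 = v_{\mathrm{free}} k_{\mathrm{jam}}/4$. The following Python sketch illustrates this. The function names and the numerical values of the free flow speed and jam density are assumptions chosen for illustration, not values taken from the field data.

```python
def greenshield_speed(k, v_free, k_jam):
    """Equation (13.2): speed falls linearly from v_free at k = 0 to 0 at k = k_jam."""
    return v_free * (1.0 - k / k_jam)


def greenshield_flow(k, v_free, k_jam):
    """Equation (13.3): q = k * V(k), a parabola in the density k."""
    return v_free * (k - k ** 2 / k_jam)


# Illustrative (assumed) parameter values, not calibrated to any data set.
V_FREE = 100.0  # free flow speed, km/h
K_JAM = 120.0   # jam density, veh/km/lane

# dq/dk = v_free * (1 - 2k/k_jam) vanishes at k_c = k_jam / 2.
k_c = K_JAM / 2.0
q_0 = greenshield_flow(k_c, V_FREE, K_JAM)  # capacity, equal to v_free * k_jam / 4

for k in (10.0, k_c, 100.0, K_JAM):
    regime = "stable" if k < k_c else "at capacity or congested"
    print(k, greenshield_speed(k, V_FREE, K_JAM),
          greenshield_flow(k, V_FREE, K_JAM), regime)
```

The loop evaluates one density in the stable region, the critical density itself, one congested density, and the jam density, where both speed and flow vanish.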
[Figure: flow q against density k, with the maximum flow q0 at the critical density kc, the jam density kjam, the stable region to the left of kc and the unstable region to the right, and lines of slope Vf and V drawn from the origin.]
FIGURE 13.3 Fundamental diagram of Greenshield’s model.
The Greenberg model [9] offers a good approximation of traffic flow characteristics during congested traffic conditions and is given by the equation:

$$V(k) = v_{\mathrm{free}} \ln\left(\frac{k_{\mathrm{jam}}}{k}\right) \tag{13.4}$$

Similarly, the relationship of flow rate with density is given by:

$$q = v_{\mathrm{free}}\, k \ln\left(\frac{k_{\mathrm{jam}}}{k}\right) \tag{13.5}$$

which describes a similar shape (Figure 13.4) as that shown in Figure 13.3. Comparing the shape of this curve with field data it is clear that the Greenberg model is a good approximation, at least qualitatively, of the flow at high densities.

[Figure: flow q against density k, with the jam density kjam marked.]
FIGURE 13.4 Fundamental diagram of Greenberg’s model.

Underwood’s model [10] also gives a good approximation of the free-flow traffic, and is described as:

$$V(k) = v_{\mathrm{free}} \exp\left(-\frac{k}{k_c}\right) \tag{13.6}$$

where $k_c$ is the critical density, that is, the density at which the roadway segment is operating at its capacity. The corresponding flow–density relationship is

$$q = v_{\mathrm{free}}\, k \exp\left(-\frac{k}{k_c}\right) \tag{13.7}$$

The diffusion model [11, 12] modifies Greenshield’s model to make the speed drop gradually (instead of instantaneously) as density increases and is described by the equation:

$$V(k) = v_{\mathrm{free}} \left(1 - \frac{k}{k_{\mathrm{jam}}}\right) - \frac{D}{k} \frac{\partial k}{\partial x} \tag{13.8}$$

where $D$ is a diffusion coefficient given by $D = \varsigma v_r^2$, $\varsigma$ is a constant referred to as the relaxation parameter, and $v_r$ is a random speed. A general model that describes some of the previous models as special cases is given in Reference [13] as:

$$V(k) = v_{\mathrm{free}} \left[1 - \left(\frac{k}{k_{\mathrm{jam}}}\right)^{l}\right]^{m} \tag{13.9}$$

where $m \geq l > 0$ are real valued parameters. For example, Equations (13.2) and (13.6) can be obtained from Equation (13.9) by appropriate choices of $m$ and $l$. Greenshield’s model is obtained by setting $m = l = 1$. Underwood’s model is obtained by setting $k_{\mathrm{jam}} = k_c m$, $m \to \infty$, $l = 1$.

The above models are based on observations of traffic flow at steady state. They are not dynamic models as there is no dependency on time. The first dynamic macroscopic traffic flow model was proposed by Lighthill and Whitham [14] and Richards [15], and is referred to as the LWR model. It is the simplest first-order hydrodynamic model that provides a coarse description of one-dimensional traffic flow dynamics. The model is based on the assumption that the fundamental relationship between flow and density, that is,

$$q = Q(k, x, t) \tag{13.10}$$

is also true when traffic is not stationary. If the road is homogeneous, Equation (13.10) becomes:

$$q(x, t) = Q(k(x, t)) \tag{13.11}$$
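Any of the stationary speed–density relationships above can supply the function $Q$ in Equation (13.11) through $Q(k) = k V(k)$. The following Python sketch collects Equations (13.2), (13.4), (13.6), and the general form (13.9), taken here as $V(k) = v_{\mathrm{free}}[1 - (k/k_{\mathrm{jam}})^l]^m$, and checks numerically the two special cases stated in the text. Function names and parameter values are assumptions made for illustration.

```python
import math


def v_greenshield(k, v_free, k_jam):
    """Equation (13.2)."""
    return v_free * (1.0 - k / k_jam)


def v_greenberg(k, v_free, k_jam):
    """Equation (13.4); only defined for k > 0 and unbounded as k -> 0."""
    return v_free * math.log(k_jam / k)


def v_underwood(k, v_free, k_c):
    """Equation (13.6)."""
    return v_free * math.exp(-k / k_c)


def v_general(k, v_free, k_jam, l, m):
    """Equation (13.9) as read here; valid for 0 <= k <= k_jam."""
    return v_free * (1.0 - (k / k_jam) ** l) ** m


def flow(speed_fn, k, *params):
    """Stationary flow Q(k) = k * V(k), from q = k v in Equation (13.1)."""
    return k * speed_fn(k, *params)


# Assumed illustrative values.
v_free, k_jam, k_c, k = 100.0, 120.0, 40.0, 30.0

# Special case 1: m = l = 1 reproduces Greenshield's model.
print(v_general(k, v_free, k_jam, 1.0, 1.0), v_greenshield(k, v_free, k_jam))

# Special case 2: l = 1, k_jam = k_c * m, m large approaches Underwood's model.
m = 1.0e6
print(v_general(k, v_free, k_c * m, 1.0, m), v_underwood(k, v_free, k_c))

# The corresponding flow-density values, e.g. Equation (13.5) for Greenberg.
print(flow(v_greenberg, k, v_free, k_jam))
```

The second comparison uses a large but finite $m$, so the two printed speeds agree only approximately; the gap shrinks as $m$ grows.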
For a long, crowded, one-way road without traffic lights, exits, and entrances (Figure 13.5), the total number of vehicles within the space interval $[x_1, x_2]$ at time $t$ is:

$$N(t) = \int_{x_1}^{x_2} k(x, t)\, dx \tag{13.12}$$

[Figure: road segment between x1 and x2 with density k(x,t), inflow q(x1,t), outflow q(x2,t), and the traffic direction marked.]
FIGURE 13.5 Long one-way road without exits, entrances, and traffic lights.

Using the conservation law for vehicles within $[x_1, x_2]$, the change of $N(t)$ can only come from the change of the flow at the boundaries:

$$\frac{\partial N(t)}{\partial t} = q(x_1, t) - q(x_2, t) \tag{13.13}$$

Substituting Equation (13.12) into Equation (13.13) we obtain:

$$\frac{\partial}{\partial t} \int_{x_1}^{x_2} k(x, t)\, dx = q(x_1, t) - q(x_2, t) \tag{13.14}$$

Equations (13.11) and (13.14) completely define the LWR model and can be expressed as a single equation by using:

$$\frac{\partial}{\partial t} \int_{x_1}^{x_2} k(x, t)\, dx + q(x_2, t) - q(x_1, t) = \frac{\partial}{\partial t} \int_{x_1}^{x_2} k(x, t)\, dx + \int_{x_1}^{x_2} \frac{\partial}{\partial x} Q(k(x, t))\, dx = 0$$

that is,

$$\int_{x_1}^{x_2} \left[\frac{\partial}{\partial t} k(x, t) + \frac{\partial}{\partial x} Q(k(x, t))\right] dx = 0$$

Because the interval $[x_1, x_2]$ is arbitrary, the integrand must vanish, which gives

$$k_t + Q_k k_x = 0 \tag{13.15}$$

where

$$k_t = \frac{\partial k}{\partial t}, \qquad Q_k = \frac{\partial Q}{\partial k}, \qquad k_x = \frac{\partial k}{\partial x}$$
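Equation (13.15) can be integrated numerically once a flow–density function $Q$ is chosen. The following Python sketch is one possible way to do so, under assumptions that are not part of the text: it takes Greenshield’s relation (13.3) for $Q$, divides the road into cells of equal length, and uses a Godunov-type (cell transmission) update in which the flow across each cell boundary is the smaller of what the upstream cell can send and what the downstream cell can receive. All names and numerical values are illustrative.

```python
import numpy as np

# Assumed illustrative parameters for Greenshield's relation (13.3).
V_FREE = 100.0       # free flow speed, km/h
K_JAM = 120.0        # jam density, veh/km
K_C = K_JAM / 2.0    # critical density of the parabola (13.3)


def Q(k):
    """Stationary flow-density relation, Equation (13.3)."""
    return V_FREE * k * (1.0 - k / K_JAM)


def lwr_step(k, dx, dt, k_in, k_out):
    """Advance the cell densities k by one time step dt.

    k_in and k_out are (hypothetical) boundary densities just upstream of
    the first cell and just downstream of the last cell.
    """
    k_ext = np.concatenate(([k_in], k, [k_out]))
    demand = Q(np.minimum(k_ext, K_C))  # largest flow a cell can send
    supply = Q(np.maximum(k_ext, K_C))  # largest flow a cell can receive
    flux = np.minimum(demand[:-1], supply[1:])  # flow across each boundary
    # Discrete conservation law: inflow minus outflow changes the density.
    return k + dt / dx * (flux[:-1] - flux[1:])


dx = 0.5     # cell length, km
dt = 0.004   # time step, h; V_FREE * dt / dx = 0.8 <= 1 keeps the scheme stable

# Free-flowing traffic (30 veh/km) running into a queue (100 veh/km).
k0 = np.where(np.arange(40) < 20, 30.0, 100.0)
k = k0.copy()
for _ in range(100):
    # k_in = 0 sends no vehicles in; k_out = K_JAM accepts none, so the
    # road is closed and the total number of vehicles must stay constant.
    k = lwr_step(k, dx, dt, k_in=0.0, k_out=K_JAM)

print(k0.sum() * dx, k.sum() * dx)
```

With the road closed at both ends, the two printed vehicle counts should agree up to rounding, which is the discrete counterpart of Equation (13.13).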
Given appropriate initial/boundary conditions, Equation (13.15) defines the evolution of traffic density along a specified roadway. Therefore, the LWR model together with the speed–density relationship (such as Equation [13.2]) or flow–density relationship (such as Equation [13.11]), describe the evolution of traffic states (flow, density, and speed) along a roadway section.

Payne [16] modified the LWR model by adding a second differential equation, that is, the dynamic speed equation, derived from microscopic car-following models (it can also be derived from statistical mechanical models) [16]. Whitham [17] presented a similar model independently. The model is usually called the PW model and includes in addition to Equation (13.15) the dynamic speed equation:

$$\frac{\partial v}{\partial t} = -v \frac{\partial v}{\partial x} + \frac{1}{\tau} \left[V(k) - v + \frac{1}{2k} \frac{dV(k)}{dk} \frac{\partial k}{\partial x}\right] \tag{13.16}$$

where $\tau$ is the driver’s reaction time and $V(k)$ is the stationary speed–density relationship implied by the particular car-following model. Payne also presented a discretized version of the PW model in Reference [16]. During the past thirty years the PW model motivated numerous publications dealing with extensions, variations, and applications of the PW model. One of the popular extensions is the discrete model proposed by Papageorgiou et al. [13], derived as follows. Consider the freeway segment shown in Figure 13.6, subdivided into $N$ sections. In this discrete space configuration, the following variables are used:

$T_0$: Time step size (in h)
$L_i$: Length of section $i$
$m_i$: Number of lanes of section $i$
$k_i(n)$: Traffic density (in veh/km/lane) of section $i$ at time $nT_0$
$v_i(n)$: Mean speed (in km/h) of section $i$ at time $nT_0$
$q_i(n)$: Traffic flow (in veh/h/lane) out of section $i$ at time $nT_0$
$r_i(n)$: On-ramp inflow of section $i$ at time $nT_0$
$s_i(n)$: Off-ramp outflow of section $i$ at time $nT_0$

[Figure: freeway divided into Section 1, ..., Section i, ..., Section N, each with states ki, vi, qi and ramp flows ri, si.]
FIGURE 13.6 Discretized space–time model.

Papageorgiou et al. [13] first modified the PW model in a continuous space–time framework. The conservation of flow Equation (13.15) is modified to incorporate on-ramp and off-ramp flows:

$$\left(\frac{\partial k}{\partial t} + \frac{\partial q}{\partial x}\right) dx = r - s \tag{13.17}$$
where $r$ is the on-ramp inflow and $s$ is the off-ramp outflow. Its discrete time approximation is derived as:

$$\frac{\partial k_i}{\partial t} \approx \frac{k_i(n+1) - k_i(n)}{T_0}, \qquad \frac{\partial q_i}{\partial x} \approx \frac{q_i(n) - q_{i-1}(n)}{L_i} \tag{13.18}$$

Then, the difference equation of density is obtained as:

$$k_i(n+1) = k_i(n) + \frac{T_0}{L_i} \left[q_{i-1}(n) - q_i(n) + r_i(n) - s_i(n)\right] \tag{13.19}$$
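Equation (13.19) is a vehicle balance for each section: the density rises when more vehicles enter (from upstream or from the on-ramp) than leave (downstream or through the off-ramp). The following Python sketch applies one update step to a short freeway. The function name, the argument `q_in` (standing for the flow entering the first section, which the text would write as $q_0(n)$), and all numerical values are assumptions made for illustration.

```python
import math


def density_update(k, q, r, s, L, T0, q_in):
    """One step of Equation (13.19) for every section.

    k[i]: density of section i at step n
    q[i]: flow out of section i at step n
    r[i], s[i]: on-ramp inflow and off-ramp outflow of section i
    L[i]: length of section i (km); T0: time step (h)
    q_in: flow entering the first section (hypothetical boundary input)
    """
    k_next = []
    for i in range(len(k)):
        q_upstream = q_in if i == 0 else q[i - 1]
        k_next.append(k[i] + T0 / L[i] * (q_upstream - q[i] + r[i] - s[i]))
    return k_next


# Assumed illustrative data for three sections of 0.5 km and a 10 s step.
T0 = 10.0 / 3600.0
L = [0.5, 0.5, 0.5]
k = [20.0, 25.0, 30.0]
q = [1800.0, 1900.0, 1700.0]
r = [0.0, 200.0, 0.0]
s = [0.0, 0.0, 100.0]
q_in = 1700.0

k_next = density_update(k, q, r, s, L, T0, q_in)

# Vehicle balance over the whole stretch: the change in stored vehicles
# equals the net inflow through the boundaries and ramps during T0.
stored_change = sum(Li * (kn - ko) for Li, kn, ko in zip(L, k_next, k))
net_inflow = T0 * (q_in - q[-1] + sum(r) - sum(s))
print(k_next, stored_change, net_inflow)
```

The interior flows cancel when the section balances are summed, so only the boundary and ramp flows remain in the total, which is what the last two printed numbers compare.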
The dynamic speed Equation (13.16) is discretized as:

$$\frac{v_i(n+1) - v_i(n)}{T_0} = -v_i(n) \frac{v_i(n) - v_{i-1}(n)}{L_i} + \frac{1}{\tau} \left[V(k_i(n)) - v_i(n)\right] - \frac{1}{\tau} \frac{\nu}{k_i(n)} \frac{k_{i+1}(n) - k_i(n)}{L_i} \tag{13.20}$$

where

$$\nu = -\frac{dV(k)}{2\, dk}$$

is regarded as a constant parameter. By rearranging the terms and adding the effect of the on ramps, the speed difference equation is obtained as:

$$v_i(n+1) = v_i(n) - \frac{T_0}{L_i} v_i(n) \left[v_i(n) - v_{i-1}(n)\right] + \frac{T_0}{\tau} \left[V(k_i(n)) - v_i(n)\right] - \frac{\nu T_0}{\tau L_i} \frac{k_{i+1}(n) - k_i(n)}{k_i(n) + \xi} - \frac{\delta T_0}{L_i} \frac{r_i(n) v_i(n)}{k_i(n) + \xi} \tag{13.21}$$

where $\nu$, $\delta$ are constant parameters, and $\xi$ is introduced in order to keep the last term within reasonable bounds when the density becomes small. The complete discrete time–space model improved in References [18] and [19] is summarized as:

$$k_i(n+1) = k_i(n) + \frac{T_0}{L_i m_i} \left[q_{i-1}(n) - q_i(n) + r_i(n) - s_i(n)\right] \tag{13.22}$$

$$v_i(n+1) = v_i(n) - \frac{T_0}{L_i} v_i(n) \left[v_i(n) - v_{i-1}(n)\right] + \frac{T_0}{\tau} \left[V(k_i(n)) - v_i(n)\right] - \frac{\nu T_0}{\tau L_i} \frac{k_{i+1}(n) - k_i(n)}{k_i(n) + \xi} - \frac{\delta T_0}{L_i} \frac{r_i(n) v_i(n)}{k_i(n) + \xi} + \omega_i(n) \tag{13.23}$$

$$V(k) = v_f \exp\left[-\frac{1}{a} \left(\frac{k}{k_c}\right)^{a}\right] \tag{13.24}$$

$$q_i(n) = k_i(n) \cdot v_i(n) \cdot m_i + \zeta_i(n) \tag{13.25}$$
where, τ, ν, δ, ξ , a are constant parameters, which have the same values for all sections and need to be calibrated for different roadway segments. Equation (13.22) is the conservation equation and it is deterministic, whereas the
dynamic speed Equation (13.23) and the transport Equation (13.25) include the noise terms $\omega_i$ and $\zeta_i$, respectively. The physical meanings of some of the terms in the dynamic speed Equation (13.23) were given in References [20] and [21]. $[V(k_i(n)) - v_i(n)]$ is the relaxation term, which regards the speed provided by the stationary speed–density relationship under the current density $k_i(n)$ as the desired value; $v_i(n)[v_i(n) - v_{i-1}(n)]$ is the convection term, which represents the influence of the upstream speed; $\frac{k_{i+1}(n) - k_i(n)}{k_i(n) + \xi}$ is the anticipation term, which describes how drivers respond to the downstream density. The stationary speed Equation (13.24) is a special case of the general model in Equation (13.9). The discrete time model (13.22) to (13.25) describes the evolution of the speed, density, and flow with time at different segments of the highway lanes. The various constants that appear in the model need to be selected so that the model closely describes traffic flow characteristics by matching real traffic data. Efforts to validate the model using real traffic data are described in Reference [13].

13.2.2 Microscopic Traffic Flow Models
Microscopic traffic flow models deal with the individual vehicle behavior. Therefore, the modeling of a traffic network involving many vehicles is far more elaborate and complex than in the case of macroscopic models. The two major classes of microscopic models are the car-following models and the cellular automata. The cellular automata treat the road as a string of cells, which are either empty or occupied. An example is the stochastic traffic cellular automata model presented in Reference [22]. The cellular automata are not as popular as the car-following models, which have attracted most of the attention in research as well as in simulation and analysis. In car-following models, we assume that vehicles are in the same lane; therefore, neither passing nor moving backward is allowed.
Each driver reacts in some fashion to a stimulus from the vehicle ahead in the following way [23]:

$$\text{response}(t + \tau) = \text{sensitivity} \times \text{stimulus}(t) \tag{13.26}$$
where $\tau$ is the reaction time, which includes the reaction time of the driver as well as that of the vehicle actuation system. Lane-changing models dealing with passing and merging also exist [24], but because lane changing and passing are not as frequent a phenomenon as vehicle following, lane-changing models have received less attention. Below we present some of the most popular car-following models.

13.2.2.1 Linear Car-Following Model
Pipes [25] was one of the first researchers to propose the linear car-following model:

$$\dot{v}_f(t + \tau) = \lambda\left[v_l(t) - v_f(t)\right] \tag{13.27}$$
where $v_f$ is the speed of the following vehicle and $v_l$ is the speed of the leading vehicle in the same lane. In this model, the following vehicle reacts to the
speed difference between the lead and following vehicles by generating an acceleration/deceleration command proportional to the speed difference after some delay $\tau$. The proportionality or sensitivity variable $\lambda$ is assumed to be constant, that is, the driver's reaction depends solely on the relative speed and not on the spacing between the lead and following vehicles, which is not realistic.

13.2.2.2 Generalized Linear Car-Following Model
Lee [26] made a generalization of the linear car-following model expressed as:

$$\dot{v}_f(t) = \int_0^t M(t - z)\left[v_l(z) - v_f(z)\right]dz \tag{13.28}$$

where $M(t)$ is a memory function, one choice of which is $M(t) = \alpha e^{-\beta t}$, where $\alpha$ and $\beta$ are constant parameters. This model assumes that the acceleration/deceleration at time $t$ depends on the time history of the relative speed. The approximated sensitivity coefficient and reaction time in the Pipes model (13.27) can be derived from the memory function as:

$$\bar{\lambda} = \int_0^\infty M(t)\,dt, \qquad \bar{\tau} = \frac{1}{\bar{\lambda}}\int_0^\infty t\,M(t)\,dt$$

The terms $\bar{\lambda}$ and $\bar{\tau}$ are roughly equivalent to $\lambda$ and $\tau$ in Equation (13.27).

13.2.2.3 Asymmetric Model
The linear car-following model assumes that drivers react to acceleration and deceleration in the same way, whereas in reality drivers' reactions to deceleration are generally stronger than to acceleration for safety reasons. This fact motivated the asymmetric model:

$$\dot{v}_f(t + \tau) = \begin{cases} \lambda_+\left[v_l(t) - v_f(t)\right], & v_l(t) - v_f(t) \ge 0 \\ \lambda_-\left[v_l(t) - v_f(t)\right], & v_l(t) - v_f(t) < 0 \end{cases} \tag{13.29}$$

Instead of a single sensitivity coefficient $\lambda$ in Equation (13.27), there are two sensitivity coefficients $\lambda_+$ and $\lambda_-$ at positive and negative relative speed, respectively, in Equation (13.29).

13.2.2.4 Nonlinear Car-Following Model
Gazis et al. [27] tried to improve Pipes' model by assuming that the sensitivity constant in Equation (13.27) is a function of the intervehicle spacing, leading to the generalized nonlinear model:

$$\dot{v}_f(t + \tau) = \lambda\frac{v_f^{m}(t + \tau)}{\left[x_l(t) - x_f(t)\right]^{l}}\left[v_l(t) - v_f(t)\right] \tag{13.30}$$

where $x_l$, $x_f$ are the absolute positions of the lead and following vehicles, respectively, and $m$, $l$, and $\lambda$ are design constants. This model is also called the Gazis–Herman–Rothery (GHR) model.
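The models (13.27) and (13.30) can be explored numerically. The following is a minimal sketch, assuming forward-Euler integration with a fixed step `dt` and a reaction delay rounded to a whole number of steps; the function name `simulate_following`, its arguments, and the default initial gap are hypothetical and chosen only for illustration. With `m = l = 0` the update reduces to the linear model (13.27).

```python
def simulate_following(lead_speed, dt, tau, lam, m=0.0, l=0.0, gap0=30.0, v0=None):
    """Euler simulation of the GHR model (13.30) for one following vehicle.

    lead_speed: lead vehicle speeds sampled every dt seconds (m/s).
    Returns the follower speeds and the intervehicle gaps x_l - x_f.
    The gap is assumed to stay positive; no collision handling is included.
    """
    delay = int(round(tau / dt))          # reaction time expressed in steps
    v_f = [lead_speed[0] if v0 is None else v0]
    gap = [gap0]
    for n in range(len(lead_speed) - 1):
        j = n - delay                     # index of the stimulus observed tau seconds ago
        if j < 0:
            accel = 0.0                   # no stimulus has been perceived yet
        else:
            sensitivity = lam * v_f[n] ** m / gap[j] ** l
            accel = sensitivity * (lead_speed[j] - v_f[j])
        # Speeds are clipped at zero because moving backward is not allowed.
        v_f.append(max(v_f[n] + accel * dt, 0.0))
        gap.append(gap[n] + (lead_speed[n] - v_f[n]) * dt)
    return v_f, gap
```

For example, calling the function with a lead speed that drops from 20 m/s to 15 m/s and with `lam = 0.5`, `tau = 1.0` lets the follower settle at the new lead speed, which illustrates that the linear model regulates relative speed but places no constraint on the final spacing.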
The model suggests that at large intervehicle spacing the vehicle does not accelerate as much and at small intervehicle spacing it accelerates much more if the relative speed is also high. This behavior may be unrealistic, as drivers may be willing to use high acceleration when the intervehicle spacing is large irrespective of the relative speed and may not like to accelerate as much at small intervehicle spacing even if there is a positive relative speed. The following model takes into account some of the psychological responses of drivers.

13.2.2.5 Helly's Model
When the relative speed is zero, Equations (13.27), (13.29), and (13.30) give zero acceleration no matter how small the spacing is, which is not realistic. Helly [28] proposed a model that takes both the relative speed and the spacing as stimulus:

$$\dot{v}_f(t + \tau) = \lambda_v\left[v_l(t) - v_f(t)\right] + \lambda_x\left[x_l(t) - x_f(t) - D(t)\right] \tag{13.31}$$
where $\lambda_v$ is the speed sensitivity coefficient, $\lambda_x$ is the spacing sensitivity coefficient, and $D(t)$ is the desired intervehicle spacing. In this case the driver response depends both on the relative speed and on the relative spacing the driver likes to maintain.

13.2.2.6 Deterministic Optimal Control Model
Tyler [29] modeled the car-following behavior as an optimal control problem where the speed of the following vehicle is regarded as the state of the dynamic system and the control $u(t)$ is generated by solving an optimization problem. The cost of the optimization problem is chosen as:

$$J = \frac{1}{2}\int_0^\infty \left\{\left[x_l(t) - x_f(t) - \sigma v_f(t)\right]^2 q_1 + \left[v_l(t) - v_f(t)\right]^2 q_2 + r u^2(t)\right\}dt \tag{13.32}$$

where $\sigma v_f(t)$ is some desired spacing which depends linearly on the speed of the following vehicle, and $q_1$, $q_2$, $r$ are the weights of the three different square terms. If we assume that the dynamics of the lead and the following vehicles are the same, the optimal control can be shown to be:

$$u(t) = C_v\left[v_l(t) - v_f(t)\right] + C_x\left[x_l(t) - x_f(t) - C_c v_f(t)\right] \tag{13.33}$$

The acceleration of the vehicle is equal to

$$\dot{v}_f(t) = u(t - \tau) - \rho v_f(t) - \beta v_f^2(t) \tag{13.34}$$

where $C_v$, $C_x$, and $C_c$ are constant gains; $\rho$ is a coefficient related to mechanical drag (about $10^{-5}$ to $10^{-4}$ sec$^{-1}$); $\beta$ is a coefficient that depends on the aerodynamic drag (about $10^{-3}$ to $10^{-2}$ m$^{-1}$); and $\tau$ is the reaction time as before, which was introduced by Burnham et al. [30] into the control structure. The parameters and controller gains were estimated using real traffic data in Reference [30]. This model clearly indicates that the driver/vehicle
response depends on the relative speed and spacing, as well as on the vehicle speed.

13.2.2.7 Stochastic Optimal Control Model
By introducing noise, a stochastic optimal control car-following model can be derived similar to Equations (13.33) and (13.34). In this case, the observed position $\hat{x}(t)$ and speed $\hat{v}(t)$ are corrupted by noise. The optimal control law and model are described by:

$$u(t) = C_v\left[\hat{v}_l(t) - \hat{v}_f(t)\right] + C_x\left[\hat{x}_l(t) - \hat{x}_f(t) - C_c \hat{v}_f(t)\right] \tag{13.35}$$

$$\dot{v}_f(t) = u(t - \tau) - \rho v_f(t) - \beta v_f^2(t) + \omega(t) \tag{13.36}$$

$$\hat{v}_f(t) = v_f(t) + \eta(t) \tag{13.37}$$

where $\omega(t)$ and $\eta(t)$ are white noise.

13.2.2.8 Gipps Model
The Gipps model [31] is based on the assumption that each driver sets limits to his or her desired braking and acceleration rates and selects a safe speed to ensure that there will be no collision even when the vehicle ahead comes to a sudden stop. This model consists of two parts, the acceleration and deceleration parts, which can be expressed in the same equation as:

$$v_f(t + \tau) = \min\left\{ v_f(t) + 2.5\,a_{f,m}\,\tau\left(1 - \frac{v_f(t)}{v_{f,des}}\right)\left(0.025 + \frac{v_f(t)}{v_{f,des}}\right)^{1/2},\; b_{f,m}\tau + \left[b_{f,m}^2\tau^2 - b_{f,m}\left(2x_l(t) - 2L_l - 2x_f(t) - v_f(t)\tau - \frac{v_l^2(t)}{\hat{b}}\right)\right]^{1/2} \right\} \tag{13.38}$$

where $a_{f,m}$ is the maximum acceleration the driver of the following vehicle is willing to undertake; $b_{f,m}$ ($< 0$) is the most severe deceleration the driver of the following vehicle is willing to undertake; $\hat{b}$ is the estimate of $b_{l,m}$, which is the most severe deceleration the driver of the lead vehicle is willing to undertake; $v_{f,des}$ is the speed at which the following vehicle wishes to travel; and $L_l$ is the physical length of the leading vehicle plus a margin.

13.2.2.9 Psychophysical Spacing Model or Action Point (AP) Model
The above models assume that drivers react to changes in relative speed even at large spacing. However, drivers are subjected to certain constraints on the stimuli to which they respond: at large spacing, drivers are not influenced by relative speed; at small spacing, there are combinations of spacing and relative speed for which there is no response because the relative motion is too small; and the smaller the spacing, the more perceptible the speed difference is. Wiedemann [32, 33] proposed a psychophysical car-following model that takes into account these considerations. This model is used in the microscopic traffic simulator VISSIM. Perception thresholds or action points are the basic
characteristics of these models; therefore, they are often called action point (AP) models. The driver of the following vehicle continues to do what he or she is doing until he or she hits a threshold. This threshold depends on the driver's perception and physical ability as well as the spacing and relative speed; therefore, the model is stochastic with specified distributions of thresholds. More details about the model can be found in References [32–34].

The connection between microscopic and macroscopic models can be established by using the microscopic models to define speed–density relationships on the macroscopic level during stationary or steady-state traffic conditions. Gazis et al. [35] showed that Greenberg's model (Equation [13.4]) can be derived from the nonlinear car-following model (13.30) with $m = 0$, $l = 1$, and $\tau = 0$ as follows:

$$\dot{v}_f(t + \tau) = \lambda\frac{\left[v_l(t) - v_f(t)\right]}{\left[x_l(t) - x_f(t)\right]} \tag{13.39}$$

Let spacing $s = x_l(t) - x_f(t)$. Then Equation (13.39) becomes:

$$\frac{dv}{dt} = \frac{\lambda}{s}\frac{ds}{dt}$$

Integrating both sides, we obtain:

$$v = \lambda \ln s + c_0 = \lambda \ln\frac{1}{k} + c_0 \tag{13.40}$$

When $k = k_{jam}$, $v = 0$; therefore

$$c_0 = -\lambda \ln\frac{1}{k_{jam}}$$

which leads to

$$v = \lambda \ln\frac{k_{jam}}{k} \tag{13.41}$$

When $\lambda = v_{free}$, Equation (13.41) is the same as Equation (13.4), which is Greenberg's model.
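The integration step behind Equations (13.40) and (13.41) can be checked numerically. The sketch below assumes that the spacing is the reciprocal of the density, as in the derivation above; the function names and the sample values used with them are hypothetical.

```python
import math


def greenberg_speed(k, lam, k_jam):
    """Stationary speed from Equation (13.41): v = lam * ln(k_jam / k)."""
    return lam * math.log(k_jam / k)


def integrate_speed(s_start, s_end, lam, steps=100_000):
    """Integrate dv = (lam / s) ds from s_start to s_end with the midpoint rule."""
    ds = (s_end - s_start) / steps
    return sum(lam / (s_start + (i + 0.5) * ds) * ds for i in range(steps))
```

Starting the integration at the jam spacing `1 / k_jam`, where the speed is zero, and ending it at the spacing `1 / k` should reproduce `greenberg_speed(k, lam, k_jam)` up to the discretization error of the midpoint rule.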
13.2.3 Mesoscopic Models
As described in the previous section, the macroscopic models capture the average characteristics of traffic flow determined by variables such as average speed, flow rate, and density. Individual vehicle responses and local traffic disturbances get averaged and cannot be seen in a macroscopic model. Macroscopic models are simpler to analyze and simulate. On the other hand, microscopic models are more elaborate as they include individual vehicle responses, and their complexity increases rapidly as the number of vehicles in the network increases. Another class of models, more complex than the macroscopic ones but not as complex as the microscopic models, is referred to as mesoscopic models. These models can be built by using the microscopic models and interpolation to generate the states of the macroscopic model. In Reference [36], such a mesoscopic traffic flow model is proposed for automated vehicles.
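To make the macroscopic side of this comparison concrete, one update step of the discrete model (13.22) to (13.25) can be sketched as follows. This is an illustrative sketch only: the noise terms are set to zero, the boundary handling (`q_in`, `v_in`, `k_out`) is an assumption that the source does not specify, and the dictionary `p` of parameters is a hypothetical container whose values would have to be calibrated.

```python
import math


def stationary_speed(k, v_f, k_c, a):
    """Equation (13.24): V(k) = v_f * exp(-(1/a) * (k / k_c)**a)."""
    return v_f * math.exp(-(1.0 / a) * (k / k_c) ** a)


def step(k, v, r, s, q_in, v_in, k_out, p):
    """Advance densities k[i] and speeds v[i] of all sections by one step T0.

    r, s: on-ramp inflows and off-ramp outflows per section.
    q_in, v_in: flow and speed entering the first section (assumed boundary values).
    k_out: density downstream of the last section (assumed boundary value).
    p: parameters T0, tau, nu, delta, xi, v_f, k_c, a and per-section lists L, m.
    """
    n_sec = len(k)
    q = [k[i] * v[i] * p["m"][i] for i in range(n_sec)]        # (13.25), zeta = 0
    k_new, v_new = [], []
    for i in range(n_sec):
        q_up = q[i - 1] if i > 0 else q_in
        v_up = v[i - 1] if i > 0 else v_in
        k_dn = k[i + 1] if i < n_sec - 1 else k_out
        length, lanes = p["L"][i], p["m"][i]
        # Conservation equation (13.22)
        k_new.append(k[i] + p["T0"] / (length * lanes) * (q_up - q[i] + r[i] - s[i]))
        # Dynamic speed equation (13.23) with omega = 0
        convection = p["T0"] / length * v[i] * (v[i] - v_up)
        relaxation = p["T0"] / p["tau"] * (
            stationary_speed(k[i], p["v_f"], p["k_c"], p["a"]) - v[i])
        anticipation = p["nu"] * p["T0"] / (p["tau"] * length) * (k_dn - k[i]) / (k[i] + p["xi"])
        ramp = p["delta"] * p["T0"] / length * r[i] * v[i] / (k[i] + p["xi"])
        v_new.append(v[i] - convection + relaxation - anticipation - ramp)
    return k_new, v_new
```

A uniform state with every speed equal to the stationary speed and matching boundary values is a fixed point of this update, which gives a simple consistency check before any calibration is attempted.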
13.3 Highway Traffic Flow Control
In recent years considerable research effort has gone into improving highway traffic flow. The traffic flow control strategies that have been developed and implemented include ramp metering, speed control, route guidance, and combinations of these. According to an overview [1], modern ramp metering strategies can be classified into two categories: (1) reactive strategies, such as ALINEA [37–39], aiming at maintaining the freeway traffic conditions close to prespecified set values by use of real-time measurements, and (2) nonlinear optimal ramp metering strategies, such as fuzzy logic, artificial neural networks, and other optimal ramp control strategies [40–46]. In addition to ramp metering, variable speed limits can be issued by the infrastructure to vehicles in an effort to control traffic flow characteristics on highways. It has been shown that the use of variable speed limits can improve traffic flow performance [47, 48] by preventing traffic flow breakdown [49] in the presence of traffic disturbances. The coordination of variable speed limits and ramp metering is shown to increase the range over which ramp metering is effective [50]. Nonlinear optimization and model predictive control (MPC) techniques have been used for generating desired speed limit commands [50, 51]. During the last decade, considerable research effort has been devoted to automating vehicles in order to improve the safety and efficiency of vehicle following [52]. Although dedicated highways with fully automated vehicles are a distant objective [53], semiautomated vehicles, such as vehicles with adaptive cruise control (ACC), also referred to as intelligent cruise control (ICC), have already been introduced on current freeways designed for manually driven vehicles in Japan and Europe and, more recently, in the United States [54].
These trends offer an opportunity to have the infrastructure communicate directly with ACC vehicles by providing commands, recommendations, and warnings for the purpose of improving traffic flow characteristics and safety. This motivates the design of ACC systems as an integral part of a larger control system that involves the roadway. In this section we present a highway traffic flow control (HTFC) system which integrates roadway to vehicle (R2V) communication capabilities and ACC systems to design a traffic flow control system.

13.3.1 System Description and Notation
The structure of the HTFC system is shown in Figure 13.7. The highway traffic management center (HTMC) collects information about the status of the traffic and calculates the appropriate commands for the ramps and desired speed limits along the highway lanes. The speed limits are communicated to the individual vehicles via short-range vehicle to roadway communications or by billboards (less advanced system). If the vehicles are equipped with
adaptive cruise control (ACC) systems, these systems are modified to accept and respond to speed limit commands from the roadway. The non-ACC vehicles would have to rely on the human drivers to obey the desired speed limits. Since almost all vehicles are following the vehicles immediately ahead of them, the speed limit commands will be indirectly obeyed by all if at least one driver in each lane obeys the roadway speed limit commands.

FIGURE 13.7 The integrated roadway/adaptive cruise control system. (Elements shown: remote HTMC, WAN, beacon/roadside unit, vehicle with ACC.)

The HTFC system can also be viewed as a feedback control system, as shown in Figure 13.8. The HTMC system includes the data acquisition and processing block, whose responsibility is to process all traffic measurements obtained at a sampling period $T_2$ and provide to the roadway controller those measurements that are relevant to control at a sampling period $T_0$. The roadway controller uses these measurements to come up with the control commands, which include ramp metering commands and desired speed limits for the various sections of the traffic network. These commands are provided at a sampling period $T_1$. Consider a freeway stretch that is subdivided into N sections. Each section is about 500 m long as shown in Figure 13.6. Aggregated traffic state variables are collected from traffic surveillance systems or estimated every $T_0$ seconds, and the controller generates commands every $T_1$ seconds, where $T_1 = N_c T_0$ and $N_c$ is a positive design integer. Once a control command is generated at time $nT_1$, it will remain constant during this control interval, that is, from $nT_1$ to $(n + 1)T_1$. The feedback roadway control system is shown in Figure 13.8. The freeway stretch and its surveillance system are simulated
using the microscopic simulator VISSIM.

FIGURE 13.8 The HTFC as feedback control system. (Blocks shown: the HTMC system, containing data acquisition and processing and the roadway controller; roadside vehicle communication (DRSC); highway traffic system. Signals shown: control inputs, that is, speed limit and ramp metering rate; output traffic data; sampling periods $T_0$, $T_1$, $T_2$.)

In addition to the symbols and notation defined in Section 13.2.1, the following symbols and notation are used:

$T_0$  Surveillance system time step size (in this project, $T_0$ = 15 sec).
$T_1$  Controller time step size ($T_1 = N_c T_0$, $N_c$ is a positive integer).
$V_i(nT_1)$  Speed limit command of the $i$th section during time interval $[nT_1, (n + 1)T_1]$, ($nT_1 = nN_c T_0$).
$R_j(nT_1)$  Ramp flow command of the $j$th on ramp during time interval $[nT_1, (n + 1)T_1]$, ($nT_1 = nN_c T_0$).
$V_{min}$, $V_{max}$  The lower and upper bounds of speed limits. The upper limit is the default speed limit of the freeway stretch.
$R_{min}$, $R_{max}$  The lower and upper limits of ramp flow rate.
$I_V$  The set of the section indices in which speed limits are controlled.
$J_R$  The set of the section indices in which ramp meters are controlled.
13.3.2 Microscopic Simulation
Because actual experiments involving new traffic control algorithms are not feasible most of the time due to cost and possible adverse effects on traffic, extensive simulation studies need to be performed to evaluate the performance and robustness of the proposed control strategies and the effect of the proposed commercial developments or road schemes. Macroscopic models capture the evolution of traffic on a coarse level and therefore need less computing power and calibration effort. However, they are sometimes
not sufficient to capture the desired level of detail of the studied transportation system. Along with the advances in computing power, microscopic simulations have increased their area of application. Software packages such as PARAMICS [6], AIMSUN [5], CORSIM [3], and VISSIM [4] have been used and studied among traffic engineers and researchers. Different software packages rely on different traffic flow models, which are the keys to the accuracy of a traffic simulation system. Therefore, the calibration of the model parameters plays an important role in simulating the desired transportation system accurately. In the design and evaluation of the HTFC system, a freeway segment model is created in VISSIM and validated by field data. The Berkeley Highway Laboratory (BHL) is a test site covering 4.3 km of Interstate-80 immediately east of the San Francisco-Oakland Bay Bridge. The facility provides traffic data collected by 16 directional dual inductive loop detector stations [55]. The unidirectional freeway stretch constructed in VISSIM includes only the BHL northbound part, which has five lanes, including one HOV (high occupancy vehicle) lane. The existence of the HOV lane was not considered. The freeway curvature was not considered either, because the degree of curvature in the area is not high enough to affect the traffic. As shown in Figure 13.9 (upper part), the freeway stretch includes two on ramps (Ashby Avenue and University Avenue) and one off ramp (University Avenue). The triangular marks represent the data collection stations of BHL. Dual-loop detectors in these seven detector stations collect speed, occupancy, and flow measurements, which were then aggregated to 30-sec summary data files and could be downloaded from the BHL Web site.
The basic idea of calibrating the model is to use the flow measurements from station 7 as input flows to VISSIM and compare flow and speed measurements of simulation runs with different parameters to those field measurements in order to find an acceptable set of parameters. Specifically, data from four different days of June 2004 were selected. These four days showed similar congestion patterns, that is, duration of the congestion, peak flow rate, and congested speed. Because our simulation period is 12 hours (from 10:00 am to 10:00 pm), we increased the sampling period to 5 min.
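The change of sampling period from 30 sec to 5 min can be sketched as follows. The source does not state the aggregation rule, so this sketch assumes that the 5-min flow is the mean of the 30-sec flow rates and that the 5-min speed is the flow-weighted mean of the 30-sec speeds; the function name `aggregate` and the record layout are hypothetical.

```python
def aggregate(records, group=10):
    """Aggregate consecutive (flow, speed) records into blocks of `group` records.

    With group = 10, 30-sec records become 5-min records. An incomplete
    trailing block is dropped. The speed of a block without any flow is
    reported as None because no vehicle was observed.
    """
    out = []
    for start in range(0, len(records) - group + 1, group):
        block = records[start:start + group]
        total_flow = sum(q for q, _ in block)
        mean_flow = total_flow / group
        if total_flow > 0:
            mean_speed = sum(q * v for q, v in block) / total_flow
        else:
            mean_speed = None
        out.append((mean_flow, mean_speed))
    return out
```

Weighting the speed by flow keeps intervals with few vehicles from dominating the average; a plain arithmetic mean would be an equally plausible reading of the text.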
FIGURE 13.9 The BHL and its extension. (Sections 1 to 14 are labeled in the figure.)
The traffic flow model in VISSIM is a discrete, stochastic, time step-based, microscopic model. The model contains a psychophysical car-following model for longitudinal vehicle movement based on the Wiedemann 1999 car-following model. Several parameters involved in this model are quite sensitive and need to be calibrated. These parameters can be expressed in the following equation:

$$s = CC0 + CC1 \cdot v + V_l + CC2/2 \tag{13.42}$$
where CC0 is the standstill distance, CC1 is the headway time, $V_l$ is the vehicle length, CC2 is the following variation, $v$ is the vehicle speed, and $s$ is the spacing. Because the spacing $s$ is approximately the inverse of the density $k$, and because we can estimate the relationship between $k$ and $v$ in steady state, our first guess of these parameters came from the estimation of the following two parameters, $h$ and $d$, in Equation (13.43):

$$s = h \cdot v + d = 1.4934 \cdot v + 9.2099 \tag{13.43}$$
where $h$ and $d$ are estimated by least squares estimation using our field data. Flow and speed measurements in the free-flow region of the 4 days' data were pooled together. Spacing estimates were obtained by using $s = v/q$. As shown in Figure 13.10, the blue points are the (spacing, speed) points from field data, and the red points lie on the straight line fitted by least squares estimation. Therefore, the slope of the line is approximately the time headway $h$ and the intercept of the line is approximately the parameter $d$. Comparing Equations (13.42) and (13.43), we get nominal values for CC1, which is around 1.5, and for CC0 + $V_l$ + CC2/2, which is around 9. Taking the common length of a car (including the standstill distance) to be around 6 m, CC2/2 is around 3, that is, CC2 is around 6. Based on the nominal values of these parameters and a series of simulation runs, CC1 is chosen to be 1.5 and CC2 is chosen to be 6.5.

FIGURE 13.10 Car-following model parameter estimation using field data. (Axes: speed in m/sec versus spacing in m/veh; legend: field data points, estimated points.)

Figure 13.11 shows the validation results in terms of field flow, simulated flow, field speed, and simulated speed [56]. The simulated flow and speed profiles follow the field measurements closely, so the simulation model is considered suitable for studying traffic phenomena and the effects of new control strategies on traffic flow characteristics.

FIGURE 13.11 Validation results: (a) field flow, (b) simulated flow, (c) field speed, (d) simulated speed. (Axes: time in hours and station ID versus flow in veh/h/lane or speed in km/h.)

13.3.3 Control Strategies
Several ramp metering strategies, such as ALINEA, FLOW, METALINE, demand–capacity strategy, and occupancy strategy, are investigated in References [1, 57]. It has been shown that these strategies are easy to implement and capable of reducing traffic congestion. As our ramp metering strategy is a modification of ALINEA, we present a brief review of ALINEA. ALINEA is a simple, flexible, robust, and efficient local ramp metering strategy. It can be applied without any theoretical preinvestigation or calibration to a broad range
of freeway ramps where congestion problems exist. Different studies have demonstrated that ALINEA is not inferior to sophisticated coordinated approaches under recurrent traffic congestion [39]. ALINEA can be expressed as

$$R(nT_1) = R((n - 1)T_1) + K_r\left[o_d - o(nT_1)\right] \tag{13.44}$$
where $R(nT_1)$ is the ramp meter command at time $t = nT_1$, $K_r$ is a control parameter, $o(nT_1)$ is the measured downstream occupancy at time $nT_1$, and $o_d$ is the desired value for the downstream occupancy, which is typically chosen close to the critical occupancy $o_c$ [39]. The control strategy described by Equation (13.44) is a simple integral controller, where the integral action rejects constant disturbances and tracks constant reference points in an effort to force the downstream occupancy to stay close to the desired occupancy when the traffic volume is high. In the freeway layout shown in Figure 13.6, if section $j$ ($j \in J_R$) contains one on-ramp located near the middle of the section, then a similar ramp metering strategy can be implemented as in Equation (13.44) with the occupancy $o$ replaced by the traffic density $k_j$, and the desired occupancy $o_d$ replaced by the desired density $k_d$. Then, $k_d$ can be chosen to be close to the critical density $k_c$ in the fundamental diagram. In the HTFC system a generalized ALINEA ramp metering strategy is used, described as follows:

$$R_j(nT_1) = \begin{cases} R_{max}, & \text{if } \bar{R}_j(nT_1) > R_{max} \\ R_{min}, & \text{if } \bar{R}_j(nT_1) < R_{min} \\ \bar{R}_j(nT_1), & \text{otherwise} \end{cases} \tag{13.45}$$

where

$$\bar{R}_j(nT_1) = R_j((n - 1)T_1) + K_r\sum_{m=1}^{N_c}\left[k_d - k_j((n - 1)N_c T_0 + mT_0)\right] \tag{13.46}$$

$R_j(nT_1)$ is the ramp command for the ramp on section $j$ at time $t = nT_1$; $K_r$ is a positive controller parameter; $k_d$ is the desired density; $j \in J_R$, where $J_R$ is the set of section indices in which ramp meters are controlled; $T_1 = N_c T_0$, where $N_c$ is a positive integer. The ramp metering control strategy is combined with the speed limit control strategy developed next to form the overall roadway controller of the HTFC system.

The current highway traffic is operating as an almost open-loop dynamic system. Ramp metering provides some feedback by controlling the volume of vehicles entering the highway through the ramps, but there is no control of the vehicles coming into the highway network with different speeds from different branches of the highway. A small traffic disturbance due to a short-duration accident or vehicle breakdown creates a shock wave that takes a long time to be dissipated, because vehicles away from the accident can be at high speeds, whereas vehicles close to the accident are at almost zero speed. This possible high speed differential along the highway lanes is also associated with a high differential in the traffic density. It results in low-speed,
high-density waves that propagate upstream and persist for much longer than it takes to clear the accident or the vehicle breakdown. One way to close the loop around this physically unstable dynamic system is to calculate the desired speeds vehicles need to follow at each section of the highway for a given traffic flow situation and communicate them to the vehicles using variable message signs along the freeway [49] or via short-range roadway to vehicle communication. The deployment of roadway control systems involving variable speed limits is feasible with current communication technologies. Various speed control strategies have been proposed in the literature [48, 50, 51, 58] based on some second-order traffic flow models [such as Equations (13.22)–(13.25)]. These control strategies usually are computationally intense, and their robustness is questionable because the design models involve many unknown parameters that have to be estimated or calibrated a priori. A simple speed control strategy based on information from the fundamental flow–density relationship is used in the HTFC system, which is described as follows.

Denote by $C_i$ ($i \in I_V$) the controller generating the desired speed limit $\bar{V}_i$ for section $i$. The following switching rules are used to determine whether or not $C_i$ should be active:

S1. If $k_{i+1}(nT_1) \ge (1 + \epsilon_+)k_c$, where $\epsilon_+$ is a positive design parameter and $k_c$ is the critical density, then $C_i$ is active.
S2. If $k_{i+1}(nT_1) \le (1 - \epsilon_-)k_c$, where $\epsilon_-$ is a positive design parameter, then $C_i$ is inactive.
S3. Otherwise, $C_i$ maintains the same status as in the previous control time interval.

The above rules prevent frequent switches of the controller between the active mode and the inactive mode. The speed limit command of each section $i$ satisfies a lower and an upper bound:

$$V_{min} \le V_i(nT_1) \le V_{max} \tag{13.47}$$
where Vmin , Vmax are positive constants. When Ci is inactive, the desired speed limit is the default speed limit of the ith freeway section. When Ci is active, section i is regarded as a virtual on ramp of section i + 1 and the same generalized ALINEA ramp metering strategy is applied to regulate the flow rate Qi from section i to section i + 1, that is, ⎧ ¯ i (nT1 ) > Qmax if Q ⎪ ⎨ Qmax , ¯ i (nT1 ) < Qmin if Q (13.48) Qi (nT1 ) = Qmin , ⎪ ⎩ ¯ Qi (nT1 ), otherwise where ¯ i (nT1 ) = Qi ((n − 1) T1 ) + K v Q
Nc m=1
[kd − ki+1 ((n − 1) Nc T0 + mT0 )]
(13.49)
P1: Binaya Dash/Sanjay Das November 16, 2007
16:6
7985
7985˙C013
430
Modeling and Control of Complex Systems Flow (veh/h)
A
Qmax
Qmin
B Vmin Vc
kd
kc
Density (veh/km)
FIGURE 13.12 Fundamental flow–density diagram.
and $K_v$ is a controller parameter; $k_d$ is the desired density; $i \in I_V$, where $I_V$ is the set of section indices in which speed limits are controlled; and $T_1 = N_c T_0$, where $N_c$ is a positive integer. The above equations regulate the flow at a particular section of the highway. Our control variable, however, is speed. Therefore, in order to regulate traffic speed instead of the traffic flow rate as done in ramp metering, we use the flow rate to speed relationship described by the fundamental flow–density diagram shown in Figure 13.12. We set $Q_{\max}$ as the flow corresponding to the critical density, which is the capacity. We denote by $v_c$ the speed corresponding to the critical density. It is reasonable to assume that $V_{\min} \le v_c \le V_{\max}$. Therefore, we can set $Q_{\min}$ as the flow corresponding to $V_{\min}$. A mapping from $[Q_{\min}, Q_{\max}]$ to $[V_{\min}, v_c]$ can then be found, denoted $f(Q)$. The flow–density relationship of every section can be estimated either off-line or online [59], and the mapping $f(Q)$ is defined based on the estimated relationship. Specifically, if the flow–density relationship is assumed to be

$$q = v_{\text{free}} \cdot k \cdot \exp\left[ -\frac{1}{\alpha} \left( \frac{k}{k_c} \right)^{\alpha} \right] \tag{13.50}$$

then the free-flow speed $v_{\text{free}}$, the critical density $k_c$, and the exponent $\alpha$ can be estimated online or off-line using real traffic data and used to find the mapping $f(Q)$, as shown in Figure 13.13. Therefore, when $C_i$ is active, the desired speed limit is

$$\bar{V}_i(nT_1) = f(Q_i(nT_1)) \tag{13.51}$$
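The flow regulator of Equations (13.48) and (13.49) and the mapping of Equations (13.50) and (13.51) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names and the numerical parameter values in the usage note are hypothetical. The sketch also assumes that $f(Q)$ is evaluated on the congested branch of Equation (13.50), that is, for densities $k \ge k_c$, since that is the branch on which speeds lie between $V_{\min}$ and $v_c$; it finds the density by bisection.

```python
import math

def flow_command(q_prev, densities_next, k_d, K_v, q_min, q_max):
    """Eqs. (13.48)-(13.49): integral-type flow update followed by saturation.

    q_prev         -- previous command Q_i((n-1) T1)
    densities_next -- the N_c density samples of section i+1 taken during
                      the last T1 interval (one sample every T0)
    """
    q_bar = q_prev + K_v * sum(k_d - k for k in densities_next)
    return min(max(q_bar, q_min), q_max)

def flow_from_density(k, v_free, k_c, alpha):
    """Eq. (13.50): flow as a function of density."""
    return v_free * k * math.exp(-(1.0 / alpha) * (k / k_c) ** alpha)

def speed_from_flow(q, v_free, k_c, alpha, tol=1e-9):
    """f(Q): speed on the congested branch (k >= k_c) that carries flow q.

    Assumption of this sketch: the flow is decreasing in k for k > k_c,
    so the matching density can be found by bisection.
    """
    q_cap = flow_from_density(k_c, v_free, k_c, alpha)  # capacity Q_max
    if not 0.0 < q <= q_cap:
        raise ValueError("flow must lie in (0, capacity]")
    lo, hi = k_c, 2.0 * k_c
    while flow_from_density(hi, v_free, k_c, alpha) > q:
        hi *= 2.0  # widen the bracket until the flow at hi is below q
    while hi - lo > tol * k_c:
        mid = 0.5 * (lo + hi)
        if flow_from_density(mid, v_free, k_c, alpha) > q:
            lo = mid
        else:
            hi = mid
    return q / (0.5 * (lo + hi))  # speed = flow / density
```

With illustrative values such as `v_free = 100.0`, `k_c = 30.0`, and `alpha = 2.0`, calling `speed_from_flow` at the capacity returns the critical speed $v_c = v_{\text{free}} e^{-1/\alpha}$, and smaller flow commands return smaller speed limits, which matches the strictly increasing shape of $f(Q)$.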
However, $\bar{V}_i$ generated by Equation (13.51) may lead to unsafe changes of speed limits. For practical purposes, we use the following speed limit $V_i$,
[Figure 13.13: speed v (km/hour) versus flow q (veh/hour/lane), with Vmin, vc, Qmin, and Qmax marked.]
FIGURE 13.13 f(Q), strictly increasing mapping from [Qmin, Qmax] to [Vmin, vc].
which is smoother:

$$V_i(nT_1) = \begin{cases} V_i((n-1)T_1) - c_v, & \text{if } \bar{V}_i(nT_1) \le V_i((n-1)T_1) - c_v \\ V_{i+1}(nT_1) + c_v, & \text{if } \bar{V}_i(nT_1) \ge V_{i+1}(nT_1) + c_v \\ \bar{V}_i(nT_1), & \text{otherwise} \end{cases} \tag{13.52}$$
where $c_v$ is a design constant. If $C_i$ is inactive at time $(n-1)T_1$ and becomes active at time $nT_1$, the speed limit is given as:

$$V_i(nT_1) = \begin{cases} V_{i+1}(nT_1) + c_v, & \text{if } \bar{V}_i(nT_1) \ge V_{i+1}(nT_1) + c_v \\ \bar{V}_i(nT_1) = f(k_{i+1}(nT_1)\, v_{i+1}(nT_1)), & \text{otherwise} \end{cases} \tag{13.53}$$

The roadway controller of the overall HTFC system consists of the ramp metering strategy given by Equation (13.45) and the speed control strategy given by Equations (13.52) and (13.53) and rules S1 to S3.

13.3.4 Evaluation of the HTFC System

The validated microscopic simulation model in the upper part of Figure 13.9 was extended north to include a total of 14 sections, about 7.6 km long, as shown in the lower part of Figure 13.9. Different congestion scenarios were created in VISSIM to evaluate the proposed HTFC system (Table 13.1). In order to quantify the effectiveness of the proposed HTFC system, we use two quantities: total time spent (TTS) in the network and the standard deviation of
TABLE 13.1
Simulation Inputs for Different Congestion Scenarios

Scenario No. | Scenario Category | Inflow (veh/h/lane) | Disturbance/Incident
1 | Peak hour traffic (high mainline demand) | 2100 | None
2 | Peak hour traffic (high mainline demand) | 2200 | None
3 | Peak hour traffic (very high mainline demand) | 2300 | None
4 | Accident traffic | 1800 | Speed drops to 10 km/h during the time interval 600–900 sec on sections 10 and 11
5 | Accident traffic | 1800 | Speed drops to 4 km/h during the time interval 600–900 sec on sections 10 and 11
density (StdK). Environmental effects and safety effects are related to the StdK because the smoother the density of the segment, the fewer the acceleration or deceleration events that take place. Therefore, a smaller density deviation is an indicator of possibly lower emission rates and a lower possibility of accidents. The TTS is defined as:

$$\text{TTS} = T_0 \sum_{n=1}^{N_{\text{sim}}} \sum_{i=4}^{N} \left[ m_i L_i k_i(nT_0) \right] \tag{13.54}$$
where $N_{\text{sim}} = 3600/T_0 = 240$ and $N = 14$ is the total number of sections. We consider sections 4 to 14 for calculating TTS because the first three sections of the segment are not controlled via variable speed limits. Moreover, because the inflow to section 1 is at a constant level, if the speed limits are reduced at section 4, the simulation model needs some space to accommodate the extra vehicles that cannot enter section 4 and all its downstream sections. Because all the simulation runs are 1 hour long and the length of each section is constant, the TTS is actually a weighted measure of the average density of the segment (freeway sections 4–14). The StdK, which is defined below, is a smoothness measure of traffic:

$$\text{StdK} = \text{std}[(k_{i,n})], \qquad k_{i,n}: \text{density of section } i \text{ at time } nT_0 \tag{13.55}$$
where $4 \le i \le 14$, and $(k_{i,n})$ is the density map of the segment for the whole hour. As $T_0 = 15$ sec and the simulation time is 1 hour, $1 \le n \le 240$. When all the vehicles are manually driven, the speed limit commands are communicated to the drivers via billboards or, in the case of a roadway-to-vehicle communication system, via a display or audio inside the vehicle. We assume that drivers will follow the speed limit commands. This is not
TABLE 13.2
Simulation Evaluation Results

Scenario No. | Case | ACC 0%: TTS (veh·h) | ACC 0%: StdK (veh/km) | ACC 10%: TTS (veh·h) | ACC 10%: StdK (veh/km) | ACC 40%: TTS (veh·h) | ACC 40%: StdK (veh/km) | ACC 100%: TTS (veh·h) | ACC 100%: StdK (veh/km)
1 | HTFC off | 517 | 9.44 | 489.5 | 8.44 | 491.6 | 7.94 | 463.3 | 5.53
1 | HTFC on | 448.5 | 6.86 | 460 | 6.85 | 476.2 | 6.86 | 456.2 | 5.1
1 | Down by | 13% | 27% | 6% | 19% | 3% | 14% | 2% | 8%
2 | HTFC off | 555.1 | 10.23 | 541.3 | 9.92 | 498.8 | 8.31 | 549.6 | 8.81
2 | HTFC on | 455.7 | 7.41 | 470.1 | 7.2 | 471.6 | 6.78 | 527.2 | 8.05
2 | Down by | 18% | 28% | 13% | 27% | 5% | 18% | 4% | 9%
3 | HTFC off | 548.8 | 10.17 | 533.3 | 9.78 | 501.4 | 8.08 | 537.2 | 8.08
3 | HTFC on | 454.6 | 7.14 | 468.5 | 7.24 | 481.7 | 7.01 | 496.4 | 6.78
3 | Down by | 17% | 30% | 12% | 26% | 4% | 13% | 8% | 16%
4 | HTFC off | 692.4 | 21.94 | 694.8 | 21.08 | 670.7 | 20.21 | 654.8 | 21.52
4 | HTFC on | 624.3 | 19.46 | 622.6 | 19.27 | 621.8 | 19.76 | 642.8 | 19.97
4 | Down by | 10% | 11% | 10% | 9% | 7% | 2% | 2% | 7%
5 | HTFC off | 969.8 | 35.98 | 1017.7 | 36.75 | 938.9 | 35.31 | 1091.8 | 41.12
5 | HTFC on | 793.7 | 33.21 | 881.5 | 34.48 | 871.3 | 34.51 | 944.4 | 36.31
5 | Down by | 18% | 8% | 13% | 6% | 7% | 2% | 14% | 12%
a strong assumption, as only a single driver in each lane needs to respond favorably to the speed limit command to affect the rest.

For the peak hour traffic scenario (scenario 1), section 11 begins to become congested due to the high inflow rate, which is close to the estimated capacity. This four-lane section, which has a freeway split, becomes a bottleneck due to the immediate on-ramp in the next section. When the onset of congestion is detected at section 11, the roadway controller immediately reduces the speed limits of the upstream sections in order to prevent queuing of vehicles at section 11. As a result, traffic is free flowing shortly afterward at section 11 and downstream. TTS is reduced to 449 veh·h, a 13% decrease from the case without the HTFC system (517 veh·h). The density deviation, which indicates the smoothness of traffic, is also reduced by 27% (Table 13.2). This reduction was achieved by the quick response to the onset of congestion and the smoothing effect of reducing speed limits.

We also estimated critical densities, capacities, and other traffic flow characteristics for scenarios with mixed manual and ACC vehicles. Figure 13.14 shows that the critical density and capacity increase with the ACC penetration, whereas the shape of the fundamental diagram remains the same in the free-flow region, which agrees with intuition. We also tested the HTFC system for different ACC penetration levels: for all the scenarios in Table 13.1, simulation runs were conducted with 0%, 10%, 40%, and 100% of the vehicles ACC-equipped. Results from over 100 simulation runs showed that the HTFC system relieves congestion and reduces the TTS. Furthermore, as the penetration of ACC vehicles increases, congestion dissipates faster (Table 13.2).
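The two evaluation measures of Equations (13.54) and (13.55) are simple to compute from a logged density map. The following Python sketch is illustrative only: the function names are hypothetical, $m_i$ and $L_i$ are assumed here to be the number of lanes and the length (in km) of section $i$, and the population standard deviation is assumed for "std", which the text does not specify.

```python
from statistics import pstdev

def total_time_spent(k, lanes, lengths_km, T0_s):
    """Eq. (13.54): TTS in veh*h.

    k[n][i] is the density (veh/km/lane) of the i-th evaluated section at
    sampling step n; lanes and lengths_km hold m_i and L_i (assumed meaning).
    """
    T0_h = T0_s / 3600.0  # sampling period converted to hours
    return T0_h * sum(
        m * L * k_ni
        for row in k
        for m, L, k_ni in zip(lanes, lengths_km, row)
    )

def density_std(k):
    """Eq. (13.55): standard deviation over the whole density map."""
    return pstdev([k_ni for row in k for k_ni in row])

def percent_reduction(value_off, value_on):
    """Relative reduction, as in the 'Down by' rows of Table 13.2."""
    return 100.0 * (value_off - value_on) / value_off
```

As a consistency check against Table 13.2, `percent_reduction(517, 448.5)` is about 13.2 and `percent_reduction(9.44, 6.86)` is about 27.3, which agree with the 13% and 27% reported for scenario 1 without ACC vehicles.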
[Figure 13.14: flow (veh/h/lane) versus density (veh/km/lane) for 0%, 40%, and 100% ACC penetration.]
FIGURE 13.14 Flow–density diagram in mixed ACC scenarios.
13.4 Conclusions
Transportation systems are complex dynamic systems whose modeling and control generate many challenging problems that have kept research going since the time the first automobile operated on a public road. This chapter presented an outline of the various models used to describe traffic flow characteristics on both the macroscopic and microscopic levels. In addition, it presented an example of developing a validated simulation model using commercial software for a particular highway network using real traffic data. Recently developed commercial software tools allow the development of simulation models for traffic networks, which, owing to the availability of fast computers with large memory, can be very effective in representing traffic flow dynamics and understanding their behavior. These simulation models can also be used to evaluate new traffic flow management and control techniques in a way that was not possible several years ago. The chapter also presented a feedback control strategy to control highway traffic by generating desired speed limits along the highway lanes and ramp metering commands. This control strategy takes advantage of available technologies such as roadway-to-vehicle communication as well as adaptive cruise control systems on board some vehicles. The validated simulation model is used to demonstrate that the proposed control strategy reduces travel time by better managing congestion and reduces high-density deviations, which implies smoother traffic flow that has a positive impact on the environment and on safety.
References
1. Papageorgiou, M. and Kotsialos, A., “Freeway ramp metering: An overview,” IEEE Trans. Intelligent Transportation Systems, 3(4), 271–281, 2002.
2. Chen, C., Jia, Z. and Varaiya, P., “Causes and cures of highway congestion,” IEEE Control Systems Magazine, 21(4), 26–33, December 2001.
3. FHWA, CORSIM 5.1, User Guide and Reference Manual, February 2003.
4. PTV AG, VISSIM Version 4.10, User Manual, March 2005.
5. TSS, AIMSUN-NG Version 5.0, User Manual, November 2005.
6. Quadstone Limited, PARAMICS, Version 5.1, Modeler User Guide, 2005.
7. Daganzo, C. F., Fundamentals of Transportation and Traffic Operations. Pergamon Press, New York, 1997.
8. Greenshields, B. D., “A study in highway capacity,” Highway Research Board, 14, 458, 1935.
9. Greenberg, H., “An analysis of traffic flow,” Operations Research, 7, 78–85, 1959.
10. Underwood, R. T., “Traffic flow and bunching,” Journal of Australian Road Research, 1, 8–25, 1963.
11. Burns, J. A. and Kang, S., “A control problem for Burger’s equation with bounded input/output,” Nonlinear Dynamics, 2, 235–262, 1991.
12. Musha, T. and Higuchi, H., “Traffic current fluctuations and the Burger’s,” Japanese Journal of Applied Physics, 17, 811–816, 1978.
13. Papageorgiou, M., Blosseville, J. M. and Haj-Salem, H., “Macroscopic modeling of traffic flow on the Boulevard Périphérique in Paris,” Transportation Research B, 23, 29–47, 1989.
14. Lighthill, M. J. and Whitham, G. B., “On kinematic waves. I. Flood movement in long rivers; II. A theory of traffic flow on long crowded roads,” Proceedings of the Royal Society A, 229, 281–345.
15. Richards, P. I., “Shockwaves on the highway,” Operations Research, 42–45, 1956.
16. Payne, H. J., “Models of freeway traffic control,” Simulation Council Proc. 1, 51, 1971.
17. Whitham, G. B., Linear and Nonlinear Waves. John Wiley, New York, 1974.
18. Papageorgiou, M., “Modeling and real-time control of traffic flow on the southern part of boulevard Peripherique in Paris. II. Coordinated on-ramp metering,” Transportation Research, 24A(5), 361–370, 1990.
19. Wang, Y., Papageorgiou, M. and Messmer, A., “A real-time freeway network traffic surveillance tool,” IEEE Transactions on Control Systems Technology, 14(1), 18–32, January 2006.
20. Papageorgiou, M., Application of Automatic Control Concepts to Traffic Flow Modeling and Control. Springer-Verlag, Berlin, 1983.
21. Karaaslan, U., Varaiya, P. and Walrand, J., “Two proposals to improve freeway traffic flow,” PATH Research Reports, Paper UCB-ITS-PRR-90-6, January 1, 1990.
22. Nagel, K., “Partial hopping models and traffic flow theory,” Physical Review E, 53, 4655–4672, 1996.
23. Chien, C.C., “Advanced vehicle control and traffic management systems for intelligent vehicle highway system,” Ph.D. dissertation, University of Southern California, 1994.
24. Ahmed, K., Ben-Akiva, M., Koutsopoulos, H. and Mishalani, R., “Models of freeway lane changing and gap acceptance behavior,” in Proceedings of the 13th International Symposium on the Theory of Traffic Flow and Transportation, 1996.
25. Pipes, L. A., “An operational analysis of the traffic dynamics,” Journal of Applied Physics, 24, 271–281, 1953.
26. Lee, G., “A generalization of linear car-following theory,” Operations Research, 14, 595–606, 1966.
27. Gazis, D. C., Herman, R. and Rothery, R. W., “Nonlinear follow-the-leader models of traffic flow,” Operations Research, 9, 545–567, 1961.
28. Helly, W., “Simulation of bottlenecks in single-lane traffic flow,” R. C. Herman (ed.), Theory of Traffic Flow, Proceedings of the Symposium on the Theory of Traffic Flow, Elsevier, Amsterdam, 1961.
29. Tyler, J. S., “The characteristics of model following systems as synthesized by optimal control,” IEEE Transactions on Automatic Control, 9, 485–498, 1964.
30. Burnham, G. O., Seo, J. and Bekey, G. A., “Identification of human driver models in car following,” IEEE Transactions on Automatic Control, 6, 911–916, 1974.
31. Gipps, P. G., “A behavioral car-following model for computer simulation,” Transportation Research, 15B, 105–111, 1981.
32. Wiedemann, R., Simulations des Straßenverkehrsflusses, Schriftenreihe des Instituts für Verkehrswesen der Universität Karlsruhe, Heft 8, 1974.
33. Wiedemann, R., “Modeling of RTI-elements on multi-lane roads,” in Advanced Telematics in Road Transport, edited by the Commission of the European Community, XIII, Brussels, 1991.
34. Panwai, S. and Dia, H., “Comparative evaluation of microscopic car-following behavior,” IEEE Transactions on Intelligent Transportation Systems, 6(3), 314–325, 2005.
35. Gazis, D. C., Herman, R. and Potts, R. B., “Car-following theory of steady-state traffic flow,” Operations Research, 7, 499–505, 1959.
36. Li, K. and Ioannou, P., “Modeling of traffic flow of automated vehicles,” IEEE Transactions on Intelligent Transportation Systems, 5(2), June 2004.
37. Papageorgiou, M., Hadj-Salem, H. and Blosseville, J.-M., “ALINEA: A local feedback control law for on-ramp metering,” Transportation Research Record, No. 1320, 58–64, 1991.
38. Smaragdis, E., Papageorgiou, M. and Kosmatopoulos, E., “A flow-maximizing adaptive local ramp metering strategy,” Transportation Research Part B, 38, 251–270, 2004.
39. Papageorgiou, M., Haj-Salem, H. and Middleham, F., “ALINEA local ramp metering: Summary of field results,” Transportation Research Record, No. 1603, 90–98, 1997.
40. Hegyi, A., Schutter, B. D. and Hellendoorn, H., “Model predictive control for optimal coordination of ramp metering and variable speed limits,” Transportation Research Part C, 13, 185–209, 2005.
41. Kotsialos, A., Papageorgiou, M., Mangeas, M. and Haj-Salem, H., “Coordinated and integrated control of motorway networks via nonlinear optimal control,” Transportation Research Part C, 10, 65–84, 2002.
42. Chang, T. and Li, Z., “Optimization of mainline traffic via an adaptive coordinated ramp-metering control model with dynamic OD estimation,” Transportation Research Part C, 10, 99–120, 2002.
43. Zhang, H. M., Ritchie, S. G. and Jayakrishnan, R., “Coordinated traffic-responsive ramp control via nonlinear state feedback,” Transportation Research Part C, 9, 337–352, 2001.
44. Zhang, H. M., Ritchie, S. G. and Recker, W., “Some general results on the optimal ramp metering control problem,” Transportation Research Part C, 4, 51–69, 1996.
45. Stephanedes, Y. and Chang, K.-K., “Optimal control of freeway corridors,” ASCE J. Transportation Eng., 119, 504–514, 1993.
46. Papageorgiou, M. and Mayr, R., “Optimal decomposition methods applied to motorway traffic control,” International Journal of Control, 35, 269–280, 1982.
47. Mammar, S., Messmer, A., Jensen, P., Papageorgiou, M., Haj-Salem, H. and Jensen, L., “Automatic control of variable message signs in Aalborg,” Transportation Research Part C, 4(3), 131–150, 1996.
48. Chien, C. C., Zhang, Y. and Ioannou, P.A., “Traffic density control for automated highway systems,” Automatica, 33(7), 1273–1285, 1997.
49. Smulders, S. A., “Control of freeway traffic by variable speed signs,” Transportation Research Part B, 24(2), 111–132, 1990.
50. Hegyi, A., De Schutter, B., Hellendoorn, H. and Van Den Boom, T., “Optimal coordination of ramp metering and variable speed control: An MPC approach,” Proceedings of the 2002 American Control Conference, Anchorage, Alaska, May 2002, pp. 3600–3605.
51. Alessandri, A., Di Febbraro, A., Ferrara, A. and Punta, E., “Nonlinear optimization for freeway control using variable-speed signaling,” IEEE Transactions Vehicular Technology, 48(6), 2042–2052, 1999.
52. Ioannou, P. A., Automated Highway Systems, Plenum Press, New York, 1997.
53. Jones, W. D., “Building safer cars,” IEEE Spectrum, 39(1), 82–85, 2002.
54. Richard, B., “Japan’s demo 2000 wows attendees,” ITS World, January/February 2001, pp. 18–19.
55. Coifman, B., Lyddy, D. and Skabardonis, A., “The Berkeley Highway Laboratory: Building on the I-880 Field Experiment,” Proceedings of the IEEE ITS Council Annual Meeting, pp. 5–10, 2000.
56. Wang, Y., Chang, H. and Ioannou, P., Integrated Roadway/Adaptive Cruise Control System: Safety, Performance, Environmental and Near Term Deployment Considerations, submitted to California Partners for Advanced Transit and Highways (PATH), Research Report.
57. Hasan, M., Jha, M. and Ben-Akiva, M., “Evaluation of ramp control algorithms using microscopic traffic simulation,” Transportation Research Part C, 10, 229–256, 2002.
58. Zhang, J., Boitor, A. and Ioannou, P. A., “Design and evaluation of a roadway controller for freeway traffic,” Proceedings of the 8th International IEEE Conference on Intelligent Transportation Systems, Vienna, Austria, 2005, pp. 543–548.
59. Wang, Y. and Ioannou, P., “Real-time parallel parameter estimators for a second-order macroscopic traffic flow model,” Proceedings of the 9th IEEE Intelligent Transportation Systems Conference, Toronto, Canada, 2006, pp. 1466–1470.
P1: Binaya Dash November 16, 2007
16:10
7985
7985˙C014
14 Backstepping Controllers for Stabilization of Turbulent Flow PDEs
Miroslav Krstic, Jennie Cochran, and Rafael Vazquez
CONTENTS
14.1 Introduction
14.2 The Model
14.3 Controllers
14.4 Stability Proof
14.4.1 Controlled Wave Numbers
14.4.2 Uncontrolled Wave Numbers
14.4.3 Physical Domain
14.5 Case k_x = 0
14.6 Discussion
Acknowledgments
References
14.1 Introduction
The next complex system under discussion is incompressible turbulent flow modeled by the Navier–Stokes equations. We consider a benchmark system: a three-dimensional (3D) channel flow, at Reynolds numbers in both the laminar and the turbulent regimes. A channel flow consists of a channel that is infinite in the x and z (streamwise and spanwise) directions and bounded in the y (wall-normal) direction. The flow is driven by a pressure gradient in the streamwise direction. The Reynolds number, which represents the ratio between inertial forces and viscous forces, is the parameter of this system that governs stability. At low Reynolds numbers the flow is stable, or laminar, and at high Reynolds numbers the system is unstable, or turbulent. We present an approach referred to as “backstepping” for stabilization of this 3D channel flow at arbitrary Reynolds numbers. The stabilization of
2D channel flows is very similar [1]. In this chapter, we develop a controller for the Navier–Stokes equations linearized around the parabolic equilibrium profile. This controller employs actuation of all three components of velocity at one of the walls. Though the controller presented here employs full state feedback, an observer developed by similar methods allows us to implement the controller by measuring only the pressure and the skin friction at the same wall where actuation is applied [2].

The linearized channel flow is modeled by a set of partial differential equations (PDEs) that are coupled for all Reynolds numbers and are also unstable when the Reynolds number is large. Even at Reynolds numbers where the linearized equations are stable, the coupling between the PDEs is unfavorable and results in large transient energy growth that could possibly lead to transition to turbulence [3–5]. We derive a controller that decouples these PDEs (commonly referred to as the Orr–Sommerfeld and Squire equations) and stabilizes them at any Reynolds number, including higher Reynolds numbers where they may be unstable without control.

The backstepping approach that achieves this employs two tools: (1) a change of variable that converts the Orr–Sommerfeld and Squire PDEs into heat equations with the coupling effects shifted into the boundary conditions, and (2) boundary control that cancels the coupling effects. The change of variable is done with a Volterra operator (a “spatially causal” or lower triangular change of variable that starts from the uncontrolled wall and is marched forward in space toward the controlled wall). This type of approach has been effective in control of finite-dimensional nonlinear control systems employed in robotics and flight control, where it is known under the names of feedback linearization [6], dynamic inversion, and integrator backstepping [7]. The extension to infinite-dimensional systems was developed recently and results in explicit formulas for the gain functions [8].

The advantage of the backstepping approach over optimal control approaches, when applied to channel flow, is that it is not necessary to solve high-dimensional Riccati equations, and the backstepping gains are explicit functions of the Reynolds number and the wave numbers, which can be precomputed. Whereas previous optimal control designs required actuation of only the wall-normal component of velocity, our approach also employs the streamwise and spanwise velocity components at the boundary for actuation. The goal achieved with the extra actuation is complete decoupling of the Orr–Sommerfeld and Squire systems at any wave number desired.

A bonus obtained with the backstepping approach is that, once the linear Navier–Stokes equations are converted into the heat equations using an invertible change of the velocity variables and boundary feedback, they can be solved in closed form. Because the inverse of the variable change is available explicitly, this means that the controlled linearized Navier–Stokes equations are solvable explicitly. The explicit solvability of the linearized Navier–Stokes equations is the first such result in the literature and is obtained as a result of the use of control, with a particular control design approach.
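The phrase “spatially causal or lower triangular” can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: it discretizes a generic Volterra transformation $w(y) = u(y) - \int_0^y K(y, \eta) u(\eta)\, d\eta$ with a left-endpoint rule, and the kernel `K` used in the usage note is an arbitrary placeholder, not one of the gain kernels derived in this chapter (the chapter itself works with the continuum model, without discretization). The point is that the value of $w$ at a given $y$ depends only on $u$ at points closer to the uncontrolled wall, so the transformation can be inverted by marching forward from $y = 0$.

```python
import math

def volterra_forward(u, K, h):
    """w_i = u_i - h * sum_{j<i} K(y_i, y_j) u_j on the grid y_i = i*h.

    The matrix of this map is lower triangular with ones on the diagonal.
    """
    n = len(u)
    return [u[i] - h * sum(K(i * h, j * h) * u[j] for j in range(i))
            for i in range(n)]

def volterra_inverse(w, K, h):
    """Recover u from w by forward substitution, marching from y = 0."""
    u = []
    for i, w_i in enumerate(w):
        u.append(w_i + h * sum(K(i * h, j * h) * u[j] for j in range(i)))
    return u
```

For example, with the placeholder kernel `K = lambda y, eta: 3.0 * (y - eta) + 1.0` on a grid of 41 points, applying `volterra_forward` and then `volterra_inverse` returns the original samples up to rounding, and changing the last sample of `u` leaves all earlier entries of `w` unchanged.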
Besides linearization and the standard Fourier transform in the x and z directions, we do not employ any approximations or simplifications of the Navier–Stokes model; in particular, we use no spatial or temporal discretizations. The result is valid for the continuum model, although it is clear that the implementation of the continuum controller would employ numerical integration in the feedback law operator.

We start, in Section 14.2, by reviewing the geometry and system modeled by the Navier–Stokes equations. We then transform the equations to a form with underlying Orr–Sommerfeld and Squire equations. These transformations include a Fourier transform in the x and z directions and a change of variables. This leaves us with a continuum of uncoupled 1D PDEs parameterized by wave numbers. We note that only a subset of wave numbers needs to be controlled at any Reynolds number. With this in mind we split the transformed system into two subsystems, one controlled and one not controlled. In Section 14.3 we derive one controller to put the controlled subsystem into a strict feedback structure. Once the system is in a strict feedback structure, we can make use of the backstepping method. We then derive the other two controllers to stabilize and decouple the Orr–Sommerfeld and Squire equations using a backstepping transformation. In Section 14.4 we prove the stability of the whole system. We first consider the system in wavespace, and then use Parseval’s theorem to prove stability in the physical domain. We continue with Section 14.5, where we study the PDEs derived in Section 14.3 that define the backstepping gain kernels. We focus on the specific case of an averaged streamwise velocity. This is equivalent to considering any spanwise wave number and setting the streamwise wave number to zero. This scenario is often considered the “ultimate problem” in control of channel flow turbulence because it is where the transient growth is the largest [3–5]. For this important case, we derive explicit solutions to the kernels used in the streamwise and spanwise controllers. We finish the chapter in Section 14.6 with a discussion of the results.
14.2 The Model
In this section we review the model and corresponding equations and perform several standard transformations, which result in the underlying Orr–Sommerfeld and Squire equations. The geometry we consider is a 3D channel flow. As seen in Figure 14.1 it is infinite in the x and z directions and bounded by walls at y = 0 and y = 1. The governing equations for the dimensionless velocity field of the incompressible channel flow we consider are the Navier–Stokes equations:

$$U_t = \frac{1}{Re}(U_{xx} + U_{zz} + U_{yy}) - UU_x - VU_y - WU_z - P_x \tag{14.1}$$

$$W_t = \frac{1}{Re}(W_{xx} + W_{zz} + W_{yy}) - UW_x - VW_y - WW_z - P_z \tag{14.2}$$
[Figure 14.1: channel between the walls y = 0 and y = 1, with the axes x, y, z and the equilibrium profile U(y).]
FIGURE 14.1 Three-dimensional channel flow Poiseuille parabolic equilibrium.
$$V_t = \frac{1}{Re}(V_{xx} + V_{zz} + V_{yy}) - UV_x - VV_y - WV_z - P_y \tag{14.3}$$

$$U_x + W_z + V_y = 0 \tag{14.4}$$

$$U|_{y=0} = W|_{y=0} = V|_{y=0} = 0 \tag{14.5}$$

$$U|_{y=1} = W|_{y=1} = V|_{y=1} = 0, \tag{14.6}$$
where $U = U(t, x, z, y)$ is the streamwise velocity, $W = W(t, x, z, y)$ is the spanwise velocity, $V = V(t, x, z, y)$ is the wall-normal velocity, $P = P(t, x, z, y)$ is the pressure, $Re$ is the Reynolds number, and a subscript indicates a partial derivative with respect to that argument (i.e., $U_t = \partial U / \partial t$ and $V_{xx} = \partial^2 V / \partial x^2$). Note that the nondynamic constraint (14.4) arises because we are dealing with an incompressible flow. The equilibrium solution to Equations (14.1)–(14.6) that we stabilize is the parabolic Poiseuille profile:

$$U^e = 4y(1 - y) \tag{14.7}$$

$$W^e = V^e = 0 \tag{14.8}$$

$$P^e = P_0 - \frac{8}{Re}x, \tag{14.9}$$

which is unstable for high Reynolds numbers. The new equations, after defining the fluctuation variables:

$$u = U - U^e, \qquad p = P - P^e \tag{14.10}$$

and linearizing around the equilibrium profile, are

$$u_t = \frac{1}{Re}(u_{xx} + u_{zz} + u_{yy}) - U^e u_x - U^e_y V - p_x \tag{14.11}$$

$$W_t = \frac{1}{Re}(W_{xx} + W_{zz} + W_{yy}) - U^e W_x - p_z \tag{14.12}$$

$$V_t = \frac{1}{Re}(V_{xx} + V_{zz} + V_{yy}) - U^e V_x - p_y \tag{14.13}$$

$$u_x + V_y + W_z = 0 \tag{14.14}$$

$$u|_{y=0} = W|_{y=0} = V|_{y=0} = 0 \tag{14.15}$$

$$u|_{y=1} = U_c, \qquad W|_{y=1} = W_c, \qquad V|_{y=1} = V_c \tag{14.16}$$
where $u = u(t, x, y, z)$ and $p = p(t, x, y, z)$. Note that the linearized system is still unstable at high Reynolds numbers and the incompressibility constraint (14.14) still holds. $U_c = U_c(t, x, z)$, $W_c = W_c(t, x, z)$, and $V_c = V_c(t, x, z)$ are potential controllers positioned at the upper wall. In the uncontrolled case, each of these is equal to 0.

The first transformation we perform is a 2D Fourier transform in the x and z directions. This results in a continuum of 1D systems, each parameterized by $k_x$ and $k_z$, the wave numbers in the x and z directions, respectively. Each 1D system is uncoupled from the others, though the subsystems within the 1D system remain coupled. Though previously $u$, $V$, $W$, and $p$ indicated physical space functions, from this point onward in the chapter (unless explicitly stated), $u = u(t, y)$, $W = W(t, y)$, $V = V(t, y)$, and $p = p(t, y)$ will indicate the transformed (in Fourier space) functions. Note that $u$, $V$, $W$, and $p$ are now functions of only $t$ and $y$ (in that order), and that $k_x$ and $k_z$ are parameters of the functions. In addition, from this point $U_c = U_c(t)$, $W_c = W_c(t)$, and $V_c = V_c(t)$ also represent transformed functions and are now only functions of $t$. After defining $\alpha^2 = 4\pi^2 (k_x^2 + k_z^2)$, the transformed equations are

$$u_t = \frac{1}{Re}(-\alpha^2 u + u_{yy}) - 2\pi i k_x U^e u - U^e_y V - 2\pi i k_x p \tag{14.17}$$

$$W_t = \frac{1}{Re}(-\alpha^2 W + W_{yy}) - 2\pi i k_x U^e W - 2\pi i k_z p \tag{14.18}$$

$$V_t = \frac{1}{Re}(-\alpha^2 V + V_{yy}) - 2\pi i k_x U^e V - p_y \tag{14.19}$$

$$2\pi i k_x u + V_y + 2\pi i k_z W = 0 \tag{14.20}$$

$$u|_{y=0} = W|_{y=0} = V|_{y=0} = 0 \tag{14.21}$$

$$u|_{y=1} = U_c, \qquad W|_{y=1} = W_c, \qquad V|_{y=1} = V_c. \tag{14.22}$$
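The step from Equations (14.11)–(14.16) to Equations (14.17)–(14.22) can be summarized by the substitution rules below. This is a worked sketch, assuming the Fourier convention in which a derivative in $x$ or $z$ becomes multiplication by $2\pi i k_x$ or $2\pi i k_z$, which is the convention implied by the factors appearing in Equations (14.17)–(14.20).

```latex
\begin{aligned}
\partial_x &\;\to\; 2\pi i k_x, \qquad \partial_z \;\to\; 2\pi i k_z, \\
u_{xx} + u_{zz} &\;\to\; (2\pi i k_x)^2 u + (2\pi i k_z)^2 u
  = -4\pi^2 (k_x^2 + k_z^2)\, u = -\alpha^2 u, \\
U^e u_x &\;\to\; 2\pi i k_x U^e u, \qquad
p_x \;\to\; 2\pi i k_x p, \qquad
p_z \;\to\; 2\pi i k_z p, \\
u_x + V_y + W_z = 0 &\;\to\; 2\pi i k_x u + V_y + 2\pi i k_z W = 0 .
\end{aligned}
```

Because $U^e$ depends only on $y$, it passes through the transform unchanged, which is why each pair $(k_x, k_z)$ yields its own 1D system in $y$.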
To control the whole velocity field $(u, W, V)$, we divide the wave numbers into two continuous sets. The first set contains the wave numbers in the area outside of the square with side length $2m$ and inside the square with side length $2M$, and is controlled by $U_c$, $W_c$, and $V_c$, which are to be designed. The other set, containing all other wave numbers, is left uncontrolled. We separate these sets mathematically using the following function:

$$\chi(k_x, k_z) = \begin{cases} 1, & \{|k_x| \ge m \text{ or } |k_z| \ge m\} \text{ and } \{|k_x| \le M \text{ and } |k_z| \le M\} \\ 0, & \text{else,} \end{cases} \tag{14.23}$$
where

$$m = \frac{1}{64\pi Re}, \qquad M = \frac{1}{\pi}\sqrt{\frac{Re}{2}} \tag{14.24}$$

for the analysis in this chapter. For implementation, $m$ and $M$ are chosen using the numerical results by Schmid and Henningson [5]. Our next transformation is a standard transformation in fluid dynamics, with a twist. Instead of looking at the $(u, W, V, p)$ subsystems, we shall work with the $(V_y, \omega)$ subsystems, where $\omega$ is the vorticity in the normal direction. [The standard transformation considers the $(V, \omega)$ subsystems, which lead to the Orr–Sommerfeld and Squire equations.] We denote $Y = V_y$, and our $(V_y, \omega)$ subsystems are defined as follows:

$$Y = -2\pi i (k_x u + k_z W) = V_y \tag{14.25}$$

$$\omega = -2\pi i (k_z u - k_x W) \tag{14.26}$$

$$Y|_{y=0} = \omega|_{y=0} = 0 \tag{14.27}$$

$$Y|_{y=1} = -2\pi i (k_x U_c + k_z W_c) = Y_c \tag{14.28}$$

$$\omega|_{y=1} = -2\pi i (k_z U_c - k_x W_c) = \omega_c \tag{14.29}$$
where $Y = Y(t, y)$, $\omega = \omega(t, y)$, $Y_c = Y_c(t)$, and $\omega_c = \omega_c(t)$ are each in Fourier space and parameterized by $k_x$ and $k_z$. By stabilizing $(Y, \omega)$ we stabilize the entire Navier–Stokes system, as the backward transformation is

$$u = \frac{-1}{2\pi i} \frac{k_x Y + k_z \omega}{k_x^2 + k_z^2} \tag{14.30}$$

$$W = \frac{-1}{2\pi i} \frac{k_z Y - k_x \omega}{k_x^2 + k_z^2} \tag{14.31}$$

$$V(y) = \int_0^y Y(\eta)\, d\eta. \tag{14.32}$$

Before our final manipulations, we state the dynamics for $(Y, \omega)$:

$$Y_t = \frac{1}{Re}(-\alpha^2 Y + Y_{yy}) - 2\pi i k_x U^e Y + 2\pi i k_x U^e_y V - \alpha^2 p, \tag{14.33}$$

$$\omega_t = \frac{1}{Re}(-\alpha^2 \omega + \omega_{yy}) - 2\pi i k_x U^e \omega + 2\pi i k_z U^e_y V. \tag{14.34}$$
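The wave-number selector of Equation (14.23) and the pointwise change of variables of Equations (14.25), (14.26), (14.30), and (14.31) can be checked with a short Python sketch. The function names are hypothetical, and `m` and `M` are passed in as plain parameters here instead of being computed from the Reynolds number. The backward map requires that $k_x$ and $k_z$ are not both zero.

```python
import math

def chi(kx, kz, m, M):
    """Eq. (14.23): 1 for controlled wave numbers, 0 otherwise."""
    outside_small_square = abs(kx) >= m or abs(kz) >= m
    inside_large_square = abs(kx) <= M and abs(kz) <= M
    return 1 if (outside_small_square and inside_large_square) else 0

def to_Y_omega(u, W, kx, kz):
    """Eqs. (14.25)-(14.26): (u, W) -> (Y, omega) at one point y."""
    Y = -2j * math.pi * (kx * u + kz * W)
    omega = -2j * math.pi * (kz * u - kx * W)
    return Y, omega

def from_Y_omega(Y, omega, kx, kz):
    """Eqs. (14.30)-(14.31): (Y, omega) -> (u, W); needs kx, kz not both 0."""
    scale = -1.0 / (2j * math.pi * (kx**2 + kz**2))
    u = scale * (kx * Y + kz * omega)
    W = scale * (kz * Y - kx * omega)
    return u, W
```

Applying `to_Y_omega` followed by `from_Y_omega` to any complex pair `(u, W)` returns the original pair up to rounding, which is the algebraic content of the statement that stabilizing $(Y, \omega)$ stabilizes $(u, W)$; the wall-normal component then follows from Equation (14.32).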
Note that to obtain the Orr–Sommerfeld equation we take the Laplace transform of the derivative of Equation (14.33) minus $\alpha^2$ times the integral of Equation (14.33):

$$\text{O-S} = \mathcal{L}\left\{ \frac{\partial}{\partial y} Y_t - \alpha^2 \int Y_t \, dy \right\}.$$

We make use of the following equation for $p$,

$$-\alpha^2 p + p_{yy} = -4\pi i k_x U^e_y V, \tag{14.35}$$

to take the integral of Equation (14.33). We obtain the Squire equation by taking the Laplace transform of Equation (14.34). The final Orr–Sommerfeld
and Squire equations are
1 e (−c + 2πik x U e )( ∂ yy − α 2 ) − 2πik x Uyy − ( ∂ yy − α 2 ) 2 V = 0 Re 1 (−c + 2πik x U e ) − ( ∂ yy − α 2 ) ω = 2πik zUye V. Re
(14.36) (14.37)
(Note that ω is analogous to the η variable that appears in many references to the Squire equation: ω = −η.) The variable c is a Laplace transform variable and represents a derivative with respect to time. Therefore, to decouple and stabilize the Orr–Sommerfeld and Squire equations, we need to decouple and stabilize the (Y, ω) subsystems.
14.3 Controllers
In the previous section we derived our first pass at the Orr–Sommerfeld and Squire equations. In this section, we continue to modify the equations and find controllers that stabilize and decouple the two subsystems. In order to do this, we shall first manipulate the dynamic equations for $Y$ and $\omega$, Equations (14.33) and (14.34), to arrive at a homogeneous equation for $Y$ and an equation for $\omega$ forced only by $Y$. The first step is to find a solution to the elliptic partial differential equation that governs pressure:

$$-\alpha^2p+p_{yy}=-4\pi ik_xU^e_yV\tag{14.38}$$
$$p_y|_{y=0}=\frac{V_{yy}(t,0)}{Re}\tag{14.39}$$
$$p_y|_{y=1}=\frac{V_{yy}(t,1)-\alpha^2V_c}{Re}-(V_c)_t.\tag{14.40}$$

The solution to Equations (14.38)–(14.40) is

$$\begin{aligned}
p=\frac{1}{\alpha}\Bigg(&-4\pi ik_x\int_0^yV(t,\eta)U^e_\eta(\eta)\sinh(\alpha(y-\eta))\,d\eta\\
&+4\pi ik_x\frac{\cosh(\alpha y)}{\sinh(\alpha)}\int_0^1V(t,\eta)U^e_\eta(\eta)\cosh(\alpha(1-\eta))\,d\eta\\
&-\frac{\cosh(\alpha(1-y))}{\sinh(\alpha)}\frac{V_{yy}(t,0)}{Re}
+\frac{\cosh(\alpha y)}{\sinh(\alpha)}\left[\frac{V_{yy}(t,1)-\alpha^2V_c}{Re}-(V_c)_t\right]\Bigg).
\end{aligned}\tag{14.41}$$
Substituting Equation (14.41) into Equation (14.33) we arrive at the following equation for $Y_t$ in terms of $Y$ and $V$:

$$\begin{aligned}
Y_t={}&\frac{1}{Re}(-\alpha^2Y+Y_{yy})-2\pi ik_xU^eY+2\pi ik_xU^e_yV\\
&+\alpha\Bigg\{4\pi ik_x\int_0^yV(t,\eta)U^e_\eta(\eta)\sinh(\alpha(y-\eta))\,d\eta\\
&\qquad-4\pi ik_x\frac{\cosh(\alpha y)}{\sinh(\alpha)}\int_0^1V(t,\eta)U^e_\eta(\eta)\cosh(\alpha(1-\eta))\,d\eta\\
&\qquad+\frac{\cosh(\alpha(1-y))}{\sinh(\alpha)}\frac{V_{yy}(t,0)}{Re}
-\frac{\cosh(\alpha y)}{\sinh(\alpha)}\left[\frac{V_{yy}(t,1)-\alpha^2V_c}{Re}-(V_c)_t\right]\Bigg\}.
\end{aligned}\tag{14.42}$$

Before continuing with our manipulations, we note that Equation (14.42) is not spatially causal; there is an integral over the whole space (i.e., from 0 to 1). To use the backstepping method as it has been developed so far, we must work with systems in strict feedback form. To put the equation into this form, we set $V_c$ as follows:

$$(V_c)_t=\frac{1}{Re}\big(V_{yy}(t,1)-V_{yy}(t,0)-\alpha^2V_c\big)+4\pi ik_x\int_0^1V(t,\eta)U^e_\eta(\eta)\cosh(\alpha(1-\eta))\,d\eta.\tag{14.43}$$

The resulting equation:

$$\begin{aligned}
Y_t={}&\frac{1}{Re}(-\alpha^2Y+Y_{yy})-2\pi ik_xU^eY+2\pi ik_xU^e_yV\\
&+\alpha\,4\pi ik_x\int_0^yV(t,\eta)U^e_\eta(\eta)\sinh(\alpha(y-\eta))\,d\eta
+\alpha\,\frac{\cosh(\alpha(1-y))-\cosh(\alpha y)}{\sinh(\alpha)}\frac{V_{yy}(t,0)}{Re}
\end{aligned}\tag{14.44}$$
is in strict feedback form. (Note that the existing integrals are spatially causal, from 0 to $y$.) Now, using the transformation equations for $Y$, Equations (14.25) and (14.32), and changing the order of integration in the integral in the $Y_t$ equation, we arrive at our final dynamic equations for $Y$ and $\omega$. Making use of Equation (14.7), we denote the following for notational convenience:

$$\varepsilon=\frac{1}{Re}\tag{14.45}$$
$$\phi(y)=8\pi ik_xy(y-1)\tag{14.46}$$
$$f(y,\eta)=8\pi ik_x(2y-1)-32\pi i\frac{k_x}{\alpha}\sinh(\alpha(y-\eta))-16\pi ik_x(2\eta-1)\cosh(\alpha(y-\eta))\tag{14.47}$$
$$g(y)=\varepsilon\alpha\,\frac{\cosh(\alpha(1-y))-\cosh(\alpha y)}{\sinh(\alpha)}\tag{14.48}$$
$$h(y)=8\pi ik_z(1-2y)\tag{14.49}$$

and we are left with:

$$Y_t=\varepsilon(-\alpha^2Y+Y_{yy})+\phi(y)Y+g(y)Y_y(t,0)+\int_0^yf(y,\eta)Y(t,\eta)\,d\eta\tag{14.50}$$
$$\omega_t=\varepsilon(-\alpha^2\omega+\omega_{yy})+\phi(y)\omega+h(y)\int_0^yY(\eta)\,d\eta\tag{14.51}$$
$$Y|_{y=0}=\omega|_{y=0}=0\tag{14.52}$$
$$Y|_{y=1}=Y_c,\qquad\omega|_{y=1}=\omega_c.\tag{14.53}$$
To stabilize and decouple these subsystems at any Reynolds number, we define two backstepping transformations:

$$\Psi=Y-\int_0^yK(y,\eta)Y(t,\eta)\,d\eta\tag{14.54}$$
$$\Omega=\omega-\int_0^y\Gamma(y,\eta)Y(t,\eta)\,d\eta,\tag{14.55}$$

each of which contains a Volterra operator. Note that $\Psi = \Psi(t, y)$, $\Omega = \Omega(t, y)$, $K = K(y, \eta)$, and $\Gamma = \Gamma(y, \eta)$ are each parameterized by $k_x$ and $k_z$. As these backstepping operators are invertible, we define the inverse transformations as follows:

$$Y=\Psi+\int_0^yL(y,\eta)\Psi(t,\eta)\,d\eta\tag{14.56}$$
$$\omega=\Omega+\int_0^y\Theta(y,\eta)\Psi(t,\eta)\,d\eta,\tag{14.57}$$

where $L = L(y, \eta)$ and $\Theta = \Theta(y, \eta)$ are, as before, parameterized by $k_x$ and $k_z$. We use these transformations to convert the system (14.50) to (14.53) into the following decoupled, stable system:

$$\Psi_t=\varepsilon(-\alpha^2\Psi+\Psi_{yy})+\phi(y)\Psi\tag{14.58}$$
$$\Omega_t=\varepsilon(-\alpha^2\Omega+\Omega_{yy})+\phi(y)\Omega\tag{14.59}$$
$$\Psi|_{y=0}=\Psi|_{y=1}=0\tag{14.60}$$
$$\Omega|_{y=0}=\Omega|_{y=1}=0.\tag{14.61}$$
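The stability of the target system can be checked with a short energy estimate. The sketch below assumes $\varepsilon = 1/Re > 0$, writes $\Psi$ for the target variable governed by (14.58) with the Dirichlet conditions (14.60), and uses only integration by parts; the same computation applies to the second target variable.

```latex
\begin{aligned}
\frac{d}{dt}\int_0^1|\Psi|^2\,dy
&= 2\,\mathrm{Re}\int_0^1\bar\Psi\,\Psi_t\,dy
 = 2\,\mathrm{Re}\int_0^1\bar\Psi\big[\varepsilon(-\alpha^2\Psi+\Psi_{yy})+\phi(y)\Psi\big]\,dy\\
&= -2\varepsilon\alpha^2\int_0^1|\Psi|^2\,dy
   -2\varepsilon\int_0^1|\Psi_y|^2\,dy
   +2\,\mathrm{Re}\int_0^1\phi(y)|\Psi|^2\,dy\\
&\le -2\varepsilon\alpha^2\int_0^1|\Psi|^2\,dy .
\end{aligned}
```

The boundary term from the integration by parts vanishes because $\Psi$ is zero at $y=0$ and $y=1$, and the last integral has zero real part because $\phi(y)|\Psi|^2$ is purely imaginary for real $k_x$.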
Note that $\phi(y)$ is a purely imaginary function and does not affect the stability of the above system. Also note that, for certain $\phi(y)$, this system is solvable explicitly, and therefore the closed-loop system in physical space is solvable explicitly. (The explicit solution in 2D can be found in Vazquez and Krstic.$^{1}$) Once the kernels $K$ and $\Gamma$ are found, the controller that stabilizes and decouples the Orr–Sommerfeld and Squire equations is obtained by evaluating the forward transformations (14.54), (14.55) at $y = 1$:

$$Y_c=\int_0^1K(1,\eta)Y(t,\eta)\,d\eta\tag{14.62}$$
$$\omega_c=\int_0^1\Gamma(1,\eta)Y(t,\eta)\,d\eta.\tag{14.63}$$
Note that, though this decoupling, stabilizing controller is valid for any wave number pair, when controlling the entire set of wave number pairs we would separate the pairs as mentioned above, using the function $\chi$ defined in Equation (14.23). To find the kernels $K$ and $\Gamma$, we use the forward backstepping transformations, as well as the equations that govern the behavior of $(Y, \omega)$ and $(\Psi, \Omega)$. To find $K$, first differentiate Equation (14.54) with respect to $t$, and substitute Equation (14.50) for $Y_t$:

$$\begin{aligned}
\Psi_t={}&\varepsilon(-\alpha^2Y+Y_{yy})+\phi(y)Y+g(y)Y_y(t,0)+\int_0^yf(y,\eta)Y(t,\eta)\,d\eta\\
&-\int_0^yK(y,\eta)\Big\{\varepsilon\big(-\alpha^2Y(t,\eta)+Y_{\eta\eta}(t,\eta)\big)+\phi(\eta)Y(t,\eta)+g(\eta)Y_\eta(t,0)+\int_0^\eta f(\eta,\sigma)Y(t,\sigma)\,d\sigma\Big\}d\eta\\
={}&\varepsilon(-\alpha^2Y+Y_{yy})+\phi(y)Y+g(y)Y_y(t,0)+\int_0^yf(y,\eta)Y(t,\eta)\,d\eta\\
&-\int_0^yK(y,\eta)\big\{-\varepsilon\alpha^2Y(t,\eta)+\phi(\eta)Y(t,\eta)+g(\eta)Y_\eta(t,0)\big\}d\eta\\
&-\int_0^yY(t,\eta)\int_\eta^yK(y,\sigma)f(\sigma,\eta)\,d\sigma\,d\eta\\
&-\varepsilon\Big\{Y_y(t,y)K(y,y)-Y_y(t,0)K(y,0)-Y(t,y)K_\eta(y,y)+Y(t,0)K_\eta(y,0)+\int_0^yK_{\eta\eta}(y,\eta)Y(t,\eta)\,d\eta\Big\}.
\end{aligned}\tag{14.64}$$

We then find $\Psi_t$ from Equation (14.58) in terms of $Y$, by substituting Equation (14.54) and its appropriate derivatives ($\Psi_{yy}$):

$$\begin{aligned}
\Psi_t={}&\varepsilon\Big\{Y_{yy}(t,y)-\frac{dK(y,y)}{dy}Y(t,y)-K(y,y)Y_y(t,y)-K_y(y,y)Y(t,y)-\int_0^yK_{yy}(y,\eta)Y(t,\eta)\,d\eta\Big\}\\
&+\big(-\varepsilon\alpha^2+\phi(y)\big)\Big(Y(t,y)-\int_0^yK(y,\eta)Y(t,\eta)\,d\eta\Big).
\end{aligned}\tag{14.65}$$
Finally, we set Equation (14.64) equal to Equation (14.65) and make use of the appropriate boundary conditions, which results in a PDE that defines $K$:

$$\varepsilon K_{yy}=\varepsilon K_{\eta\eta}-f(y,\eta)+(\phi(\eta)-\phi(y))K+\int_\eta^yK(y,\xi)f(\xi,\eta)\,d\xi\tag{14.66}$$
$$\varepsilon K(y,0)=\int_0^yK(y,\eta)g(\eta)\,d\eta-g(y)\tag{14.67}$$
$$\varepsilon K(y,y)=-g(0).\tag{14.68}$$
We follow similar steps to find the PDE that $\Gamma$ must satisfy. Differentiate Equation (14.55) with respect to $t$ and substitute Equation (14.50) for $Y_t$ and Equation (14.51) for $\omega_t$. Then, write Equation (14.59) in terms of $\omega$ and $Y$ by substituting in Equation (14.55) and its appropriate derivatives ($\Omega_{yy}$). After equating these two expressions, the resulting PDE for $\Gamma$ is

$$\varepsilon\Gamma_{yy}=\varepsilon\Gamma_{\eta\eta}-h(y)+(\phi(\eta)-\phi(y))\Gamma+\int_\eta^y\Gamma(y,\sigma)f(\sigma,\eta)\,d\sigma\tag{14.69}$$
$$\varepsilon\Gamma(y,0)=\int_0^y\Gamma(y,\eta)g(\eta)\,d\eta\tag{14.70}$$
$$\Gamma(y,y)=0.\tag{14.71}$$
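Note that the equations for $L$ and $\Theta$ can be derived following the same method:

$$\varepsilon L_{yy}=\varepsilon L_{\eta\eta}-f(y,\eta)+(\phi(\eta)-\phi(y))L-\int_\eta^yf(y,\xi)L(\xi,\eta)\,d\xi\tag{14.72}$$
$$\varepsilon L(y,0)=-g(y)\tag{14.73}$$
$$\varepsilon L(y,y)=-g(0)\tag{14.74}$$
$$\Theta(y,\eta)=\Gamma(y,\eta)+\int_\eta^y\Gamma(y,\sigma)L(\sigma,\eta)\,d\sigma.\tag{14.75}$$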
It can be proved that $K$, $\Gamma$, $L$, and $\Theta$ have smooth solutions.$^{8}$ Each can be solved (off-line), either numerically or symbolically (using an equivalent integral equation formulation that can be solved via a successive approximation series).$^{8}$ In certain cases, such as when $k_x = 0$, these kernels can be found analytically. We cover this in Section 14.5. Using the transformations $Y$ to $\Psi$ and $\omega$ to $\Omega$, we state the equations for the controllers $U_c$ and $W_c$ in wave space:

$$U_c=\frac{2\pi i}{\alpha^2}\big(k_xY(t,1)+k_z\omega(t,1)\big)=2\pi i\int_0^1\frac{k_xK(1,\eta)+k_z\Gamma(1,\eta)}{\alpha^2}\,Y(t,\eta)\,d\eta\tag{14.76}$$
$$W_c=\frac{2\pi i}{\alpha^2}\big(k_zY(t,1)-k_x\omega(t,1)\big)=2\pi i\int_0^1\frac{k_zK(1,\eta)-k_x\Gamma(1,\eta)}{\alpha^2}\,Y(t,\eta)\,d\eta.\tag{14.77}$$
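Once the kernels have been tabulated at $y = 1$, the wave-space controllers reduce to one-dimensional quadratures. The following minimal sketch illustrates this for a single wave number pair. It is not the authors' code: the function names are hypothetical, the trapezoid rule is an arbitrary choice, and $\alpha^2 = 4\pi^2(k_x^2 + k_z^2)$ is taken from the equivalence stated in Section 14.4.2.

```python
import math

def trapezoid(values, grid):
    """Composite trapezoid rule for samples values[j] at points grid[j]."""
    total = 0.0
    for i in range(1, len(grid)):
        total += 0.5 * (grid[i] - grid[i - 1]) * (values[i] + values[i - 1])
    return total

def wall_controls(K1, G1, Y, eta, kx, kz):
    """Evaluate (14.62)-(14.63) and (14.76)-(14.77) for one wave number pair.

    K1[j] and G1[j] are samples of K(1, eta_j) and Gamma(1, eta_j);
    Y[j] is a sample of Y(t, eta_j). Assumes kx**2 + kz**2 > 0.
    """
    alpha2 = 4.0 * math.pi**2 * (kx**2 + kz**2)  # assumed definition of alpha^2
    Yc = trapezoid([k * v for k, v in zip(K1, Y)], eta)
    omega_c = trapezoid([g * v for g, v in zip(G1, Y)], eta)
    Uc = 2j * math.pi * (kx * Yc + kz * omega_c) / alpha2
    Wc = 2j * math.pi * (kz * Yc - kx * omega_c) / alpha2
    return Yc, omega_c, Uc, Wc
```

A useful consistency check is that the returned $U_c$ and $W_c$ reproduce $Y_c$ and $\omega_c$ through the boundary relations (14.28) and (14.29).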
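Given the previous derivations, we can state the controllers in physical space. Note that in the following three equations and in the following theorem statement, $u$, $W$, and $V$ are in physical space and are functions of $(t, x, z, y)$. In addition, note that we integrate along the parameters $k_x$ and $k_z$ in $K$ and $\Gamma$. The controllers are

$$\begin{aligned}
V_c={}&\int_0^t\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{2\pi i}{Re}\,\chi(k_x,k_z)\Big\{k_x\big(u_y(\tau,\tilde x,\tilde z,0)-u_y(\tau,\tilde x,\tilde z,1)\big)\\
&\quad+k_z\big(W_y(\tau,\tilde x,\tilde z,0)-W_y(\tau,\tilde x,\tilde z,1)\big)\Big\}\,e^{-\frac{\alpha^2}{Re}(t-\tau)}\,e^{2\pi i(k_x(x-\tilde x)+k_z(z-\tilde z))}\,dk_x\,dk_z\,d\tilde x\,d\tilde z\,d\tau\\
&-\int_0^t\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}V(\tau,\tilde x,\tilde z,\eta)(2\eta-1)\,\chi(k_x,k_z)\\
&\quad\times\{16\pi k_xi\cosh(\alpha(1-\eta))\}\,e^{-\frac{\alpha^2}{Re}(t-\tau)}\,e^{2\pi i(k_x(x-\tilde x)+k_z(z-\tilde z))}\,dk_x\,dk_z\,d\tilde x\,d\tilde z\,d\eta\,d\tau
\end{aligned}\tag{14.78}$$

$$\begin{aligned}
U_c={}&\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{\chi(k_x,k_z)}{k_x^2+k_z^2}\big(k_xK(1,\eta;k_x,k_z)+k_z\Gamma(1,\eta;k_x,k_z)\big)\\
&\times\big(k_xu(t,\tilde x,\tilde z,\eta)+k_zW(t,\tilde x,\tilde z,\eta)\big)\,e^{2\pi i(k_x(x-\tilde x)+k_z(z-\tilde z))}\,dk_x\,dk_z\,d\tilde x\,d\tilde z\,d\eta
\end{aligned}\tag{14.79}$$

$$\begin{aligned}
W_c={}&\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{\chi(k_x,k_z)}{k_x^2+k_z^2}\big(k_zK(1,\eta;k_x,k_z)-k_x\Gamma(1,\eta;k_x,k_z)\big)\\
&\times\big(k_xu(t,\tilde x,\tilde z,\eta)+k_zW(t,\tilde x,\tilde z,\eta)\big)\,e^{2\pi i(k_x(x-\tilde x)+k_z(z-\tilde z))}\,dk_x\,dk_z\,d\tilde x\,d\tilde z\,d\eta,
\end{aligned}\tag{14.80}$$

where $K$ and $\Gamma$ are defined by the systems (14.66) to (14.68) and (14.69) to (14.71), respectively. These controllers guarantee the following result.

THEOREM 1 The equilibrium (in physical space) $u(t, x, z, y) \equiv V(t, x, z, y) \equiv W(t, x, z, y) \equiv 0$ of the linearized Navier–Stokes system (14.11) to (14.13), (14.78), (14.79), (14.80) is exponentially stable in the $L^2$ sense:

$$\begin{aligned}
&\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big(|V|^2(t,x,z,y)+|u|^2(t,x,z,y)+|W|^2(t,x,z,y)\big)\,dx\,dz\,dy\\
&\qquad\le Ce^{-\frac{1}{2Re}t}\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big(|V|^2(0,x,z,y)+|u|^2(0,x,z,y)+|W|^2(0,x,z,y)\big)\,dx\,dz\,dy
\end{aligned}\tag{14.81}$$

where

$$C=\max_{m<|k_x|,|k_z|<M}\Big\{2\big(1+\alpha^2+\|\Gamma\|_\infty^2\big)\big(1+\|L\|_\infty\big)^2\big(1+\|K\|_\infty+\|\Gamma\|_\infty\big)^2\Big\}\tag{14.82}$$

and $u$, $V$, and $W$ are in physical space.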
14.4 Stability Proof
This section proves Theorem 1. First, we prove the stability of the set of controlled wave numbers in Fourier space. We then prove the stability of the set of uncontrolled wave numbers, also in Fourier space. Finally, we use these results to prove stability in physical space. Throughout Sections 14.4.1 and 14.4.2 the functions $u$, $V$, and $W$ (as well as $Y$, $\omega$, $\Psi$, and $\Omega$) are in Fourier space, are functions of $(t, y)$, and are parameterized by $k_x$ and $k_z$. The kernels $K$, $\Gamma$, $L$, and $\Theta$ are also in Fourier space and are parameterized by $k_x$ and $k_z$. However, they are functions of $(y, \eta)$.

14.4.1 Controlled Wave Numbers

To prove that the controlled wave number system is stable around the equilibrium $u(t, y) \equiv V \equiv W \equiv 0$, we start with $V$, $u$, and $W$ in Fourier space and transform them into $\Psi$ and $\Omega$. Note that as the kernels in the backstepping transformation exist, are unique, and have unique inverse kernels, the backstepping transformations are well posed. We use an exponential bound on these transformed variables and bounds on the norms of the kernels to show an exponential bound on the original variables. To begin, we use Equations (14.30) and (14.31) to transform $u$, $W$ into $Y$, $\omega$:

$$\begin{aligned}
&\int_0^1\big(|V|^2(t,y)+|u|^2(t,y)+|W|^2(t,y)\big)\,dy\\
&\quad=\int_0^1\left(|V|^2(t,y)+\left|2\pi i\,\frac{k_xY(t,y)+k_z\omega(t,y)}{\alpha^2}\right|^2+\left|2\pi i\,\frac{k_zY(t,y)-k_x\omega(t,y)}{\alpha^2}\right|^2\right)dy\\
&\quad=\int_0^1\left(|V|^2(t,y)+\frac{1}{\alpha^2}\big(|Y|^2(t,y)+|\omega|^2(t,y)\big)\right)dy.
\end{aligned}\tag{14.83}$$

We employ the inverse backstepping transformations (14.56), (14.57), and

$$V=\int_0^y\left(1+\int_\eta^yL(\sigma,\eta)\,d\sigma\right)\Psi(t,\eta)\,d\eta\tag{14.84}$$

[derived from Equations (14.56) and (14.32)] to transform $V$, $Y$, $\omega$ into $\Psi$, $\Omega$:

$$\begin{aligned}
&\int_0^1\big(|V|^2(t,y)+|u|^2(t,y)+|W|^2(t,y)\big)\,dy\\
&\quad=\int_0^1\Bigg(\left|\int_0^y\left(1+\int_\eta^yL(\sigma,\eta)\,d\sigma\right)\Psi(t,\eta)\,d\eta\right|^2+\frac{1}{\alpha^2}\left|\Psi(t,y)+\int_0^yL(y,\eta)\Psi(t,\eta)\,d\eta\right|^2\\
&\qquad\qquad+\frac{1}{\alpha^2}\left|\Omega(t,y)+\int_0^y\Theta(y,\eta)\Psi(\eta)\,d\eta\right|^2\Bigg)dy\\
&\quad\le\frac{1}{\alpha^2}\int_0^1\Big((1+\alpha^2)(1+\|L\|_\infty)^2|\Psi|^2(t,y)+|\Omega|^2(t,y)+\|\Theta\|_\infty^2|\Psi|^2(t,y)\Big)dy\\
&\quad\le\frac{1}{\alpha^2}\int_0^1\big(1+\alpha^2+\|\Gamma\|_\infty^2\big)(1+\|L\|_\infty)^2\big(|\Psi|^2(t,y)+|\Omega|^2(t,y)\big)\,dy
\end{aligned}\tag{14.85}$$

where we used Equation (14.75) in the last bound. The following $L^2$ estimates

$$\int_0^1|\Psi|^2(t,y)\,dy\le e^{-\frac{1}{Re}t}\int_0^1|\Psi|^2(0,y)\,dy\tag{14.86}$$
$$\int_0^1|\Omega|^2(t,y)\,dy\le e^{-\frac{1}{Re}t}\int_0^1|\Omega|^2(0,y)\,dy\tag{14.87}$$

are derived from Equations (14.58), (14.60) and Equations (14.59), (14.61). Note that the $\phi(y)\Psi(t, y)$ and $\phi(y)\Omega(t, y)$ terms do not affect the $L^2$ estimates as $\phi(y)$ is purely imaginary. Using the previous bounds, Equations (14.86) and (14.87), and the forward transformations, Equations (14.54) and (14.55), we continue:

$$\begin{aligned}
&\int_0^1\big(|V|^2(t,y)+|u|^2(t,y)+|W|^2(t,y)\big)\,dy\\
&\quad\le\frac{1}{\alpha^2}e^{-\frac{1}{Re}t}\int_0^1\big(1+\alpha^2+\|\Gamma\|_\infty^2\big)(1+\|L\|_\infty)^2\big(|\Psi|^2(0,y)+|\Omega|^2(0,y)\big)\,dy\\
&\quad\le\frac{1}{\alpha^2}e^{-\frac{1}{Re}t}\int_0^1\big(1+\alpha^2+\|\Gamma\|_\infty^2\big)(1+\|L\|_\infty)^2\Bigg(\left|Y(0,y)-\int_0^yK(y,\eta)Y(0,\eta)\,d\eta\right|^2\\
&\qquad\qquad+\left|\omega(0,y)-\int_0^y\Gamma(y,\eta)Y(0,\eta)\,d\eta\right|^2\Bigg)dy\\
&\quad\le\frac{1}{\alpha^2}Ce^{-\frac{1}{Re}t}\int_0^1\big(|Y|^2(0,y)+|\omega|^2(0,y)\big)\,dy
\end{aligned}\tag{14.88}$$

where $C$ is defined in Equation (14.82) above. Equations (14.25) and (14.26) transform $Y$, $\omega$ back into $u$, $W$:

$$\begin{aligned}
&\int_0^1\big(|V|^2(t,y)+|u|^2(t,y)+|W|^2(t,y)\big)\,dy\\
&\quad\le\frac{4\pi^2}{\alpha^2}Ce^{-\frac{1}{Re}t}\int_0^1\big(|k_xu(0,y)+k_zW(0,y)|^2+|k_zu(0,y)-k_xW(0,y)|^2\big)\,dy\\
&\quad\le Ce^{-\frac{1}{Re}t}\int_0^1\big(|V|^2(0,y)+|u|^2(0,y)+|W|^2(0,y)\big)\,dy.
\end{aligned}\tag{14.89}$$

This shows an exponential stability bound for the system containing controlled wave numbers,

$$\begin{aligned}
&\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\chi(k_x,k_z)\big(|V|^2(t,y)+|u|^2(t,y)+|W|^2(t,y)\big)\,dk_x\,dk_z\,dy\\
&\quad\le Ce^{-\frac{1}{Re}t}\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\chi(k_x,k_z)\big(|V|^2(0,y)+|u|^2(0,y)+|W|^2(0,y)\big)\,dk_x\,dk_z\,dy.
\end{aligned}\tag{14.90}$$
Note that we integrate along the parameters $k_x$ and $k_z$.

14.4.2 Uncontrolled Wave Numbers

To prove the stability of the uncontrolled system, we define a new Lyapunov functional:

$$\Lambda_{ucw}(t)=\frac{1}{2}\int_0^1\big(|u(t,y)|^2+|V(t,y)|^2+|W(t,y)|^2\big)\,dy\tag{14.91}$$

$$\begin{aligned}
\dot\Lambda_{ucw}={}&-2\varepsilon\alpha^2\Lambda_{ucw}-\varepsilon\int_0^1\big(|u_y(t,y)|^2+|V_y(t,y)|^2+|W_y(t,y)|^2\big)\,dy\\
&+\int_0^14(2y-1)\,\frac{V(t,y)\bar u(t,y)+\bar V(t,y)u(t,y)}{2}\,dy.
\end{aligned}\tag{14.92}$$

By using the Poincaré inequality:

$$-\varepsilon\int_0^1\big(|u_y(t,y)|^2+|V_y(t,y)|^2+|W_y(t,y)|^2\big)\,dy\le-\varepsilon\Lambda_{ucw}(t)\tag{14.93}$$

we find:

$$\dot\Lambda_{ucw}\le-2\varepsilon\alpha^2\Lambda_{ucw}-\varepsilon\Lambda_{ucw}+\int_0^12\big(V(t,y)\bar u(t,y)+\bar V(t,y)u(t,y)\big)\,dy.\tag{14.94}$$

By noting that:

$$\int_0^12|V(t,y)||u(t,y)|\,dy\le\int_0^1\big(|V(t,y)|^2+|u(t,y)|^2\big)\,dy\le\int_0^1\big(|V(t,y)|^2+|u(t,y)|^2+|W(t,y)|^2\big)\,dy\tag{14.95}$$

we see that

$$\dot\Lambda_{ucw}\le-2\varepsilon\alpha^2\Lambda_{ucw}-\varepsilon\Lambda_{ucw}+4\Lambda_{ucw}\tag{14.96}$$

and if $\alpha^2 \ge 2/\varepsilon$ (which is equivalent to $k_x^2 + k_z^2 \ge \frac{Re}{2\pi^2} = M^2$), then:

$$\dot\Lambda_{ucw}\le-\varepsilon\Lambda_{ucw}.\tag{14.97}$$

To bound small wave numbers, we substitute the continuity equation into Equation (14.94):

$$\begin{aligned}
\dot\Lambda_{ucw}\le{}&-2\varepsilon\alpha^2\Lambda_{ucw}-\varepsilon\Lambda_{ucw}\\
&+2\int_0^1\Bigg(-2\pi i\int_0^y\big(k_xu(t,\eta)+k_zW(t,\eta)\big)\,d\eta\;\bar u(t,y)
+2\pi i\int_0^y\big(k_x\bar u(t,\eta)+k_z\bar W(t,\eta)\big)\,d\eta\;u(t,y)\Bigg)dy\\
\le{}&-2\varepsilon\alpha^2\Lambda_{ucw}-\varepsilon\Lambda_{ucw}+8\pi|k_x|\int_0^1|u(t,y)|^2\,dy
+4\pi|k_z|\int_0^1\big(u(t,y)\bar W(t,y)+\bar u(t,y)W(t,y)\big)\,dy\\
\le{}&-2\varepsilon\alpha^2\Lambda_{ucw}-\varepsilon\Lambda_{ucw}+16\pi|k_x|\Lambda_{ucw}+8\pi|k_z|\Lambda_{ucw}.
\end{aligned}\tag{14.98}$$

Therefore, if $|k_x| + |k_z| \le \varepsilon/32\pi = \frac{1}{32\pi Re} = 2m$, we have:

$$\dot\Lambda_{ucw}\le-\frac{\varepsilon}{2}\Lambda_{ucw}.\tag{14.99}$$
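The two thresholds derived above are exactly what the cutoff function $\chi$ of Equation (14.23) encodes. The following minimal sketch checks that every wave number pair left uncontrolled by $\chi$ satisfies one of the two sufficient decay conditions. The function names and the sample Reynolds number are illustrative assumptions, not part of the original design.

```python
import math

def wavenumber_bounds(Re):
    """Return (m, M) from Equation (14.24)."""
    m = 1.0 / (64.0 * math.pi * Re)
    M = math.sqrt(Re / 2.0) / math.pi
    return m, M

def chi(kx, kz, Re):
    """Indicator of the controlled wave number set, Equation (14.23)."""
    m, M = wavenumber_bounds(Re)
    outside_small_square = abs(kx) >= m or abs(kz) >= m
    inside_large_square = abs(kx) <= M and abs(kz) <= M
    return 1 if (outside_small_square and inside_large_square) else 0

def decays_without_control(kx, kz, Re):
    """True if one of the sufficient conditions behind (14.97) or (14.99) holds."""
    m, M = wavenumber_bounds(Re)
    large_wave_numbers = kx**2 + kz**2 >= M**2      # alpha^2 >= 2 / epsilon
    small_wave_numbers = abs(kx) + abs(kz) <= 2 * m  # |kx| + |kz| <= epsilon / (32 pi)
    return large_wave_numbers or small_wave_numbers
```

For example, with `Re = 1000.0` the pair `(1.0, 0.0)` is controlled (`chi` returns 1), while `(0.0, 0.0)` and `(10.0, 0.0)` are left uncontrolled and both pass `decays_without_control`.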
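We obtain an exponential stability bound for the uncontrolled system:

$$\begin{aligned}
&\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(1-\chi(k_x,k_z))\big(|V|^2(t,y)+|u|^2(t,y)+|W|^2(t,y)\big)\,dk_x\,dk_z\,dy\\
&\quad\le e^{-\frac{1}{2Re}t}\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(1-\chi(k_x,k_z))\big(|V|^2(0,y)+|u|^2(0,y)+|W|^2(0,y)\big)\,dk_x\,dk_z\,dy
\end{aligned}\tag{14.100}$$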
where again we are integrating along the parameters $k_x$ and $k_z$.

14.4.3 Physical Domain

The proof that the whole system in physical space is stabilized by these controllers uses Parseval's identity (the $L^2$ norm in the wave number domain, Fourier space, is the same as the $L^2$ norm in the physical domain). It also makes use of the function $\chi$ in two ways. First, as the uncontrolled wave number pairs are stable, bounds on their growth can be found. Second, as the $L^\infty$ norms of the kernels increase as the wave numbers increase, it is necessary to restrict the number of wave number pairs that are controlled. Therefore, in summary, adding together Equation (14.90) with Equation (14.100) and applying Parseval's identity to both sides of the inequality gives Equation (14.81). To see this, start with the left-hand side of Equation (14.81). After taking the Fourier transform of $u$, $W$, and $V$ and applying Parseval's identity we have Equation (14.101). To obtain Equation (14.102), we split the integral using the function $\chi(k_x, k_z)$. We then use the bounds in Equations (14.90) and (14.100) to arrive at Equation (14.103). Line (14.104) is the result of combining the two integrals back together. Finally, Equation (14.105) is obtained by performing an inverse Fourier transform and again applying Parseval's identity, leading us to the right-hand side of Equation (14.81).

$$\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big(|V|^2(t,x,z,y)+|u|^2(t,x,z,y)+|W|^2(t,x,z,y)\big)\,dx\,dz\,dy$$
$$=\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big(|V|^2(t,y)+|u|^2(t,y)+|W|^2(t,y)\big)\,dk_x\,dk_z\,dy\tag{14.101}$$
$$\begin{aligned}
={}&\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\chi(k_x,k_z)\big(|V|^2(t,y)+|u|^2(t,y)+|W|^2(t,y)\big)\,dk_x\,dk_z\,dy\\
&+\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(1-\chi(k_x,k_z))\big(|V|^2(t,y)+|u|^2(t,y)+|W|^2(t,y)\big)\,dk_x\,dk_z\,dy
\end{aligned}\tag{14.102}$$
$$\begin{aligned}
\le{}&Ce^{-\frac{1}{Re}t}\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\chi(k_x,k_z)\big(|V|^2(0,y)+|u|^2(0,y)+|W|^2(0,y)\big)\,dk_x\,dk_z\,dy\\
&+e^{-\frac{1}{2Re}t}\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(1-\chi(k_x,k_z))\big(|V|^2(0,y)+|u|^2(0,y)+|W|^2(0,y)\big)\,dk_x\,dk_z\,dy
\end{aligned}\tag{14.103}$$
$$\le Ce^{-\frac{1}{2Re}t}\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big(|V|^2(0,y)+|u|^2(0,y)+|W|^2(0,y)\big)\,dk_x\,dk_z\,dy\tag{14.104}$$
$$\le Ce^{-\frac{1}{2Re}t}\int_0^1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big(|V|^2(0,x,z,y)+|u|^2(0,x,z,y)+|W|^2(0,x,z,y)\big)\,dx\,dz\,dy\tag{14.105}$$
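The step from physical space to wave number space rests on Parseval's identity. As a hedged illustration, the sketch below checks the discrete analogue of that identity on a short sample vector with a direct discrete Fourier transform. The continuous identity used in the proof is the limiting statement; the helper names `dft` and `energy` are hypothetical.

```python
import cmath
import math

def dft(samples):
    """Unnormalized discrete Fourier transform, computed directly in O(n^2)."""
    n = len(samples)
    return [
        sum(samples[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

def energy(samples):
    """Sum of squared magnitudes, the discrete counterpart of the L2 norm squared."""
    return sum(abs(s) ** 2 for s in samples)
```

With this normalization the discrete identity reads `energy(x) == energy(dft(x)) / len(x)` up to rounding, so an energy bound proved mode by mode in Fourier space carries over to physical space.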
14.5 Case $k_x = 0$
We examine the special case of $k_x = 0$ in this section. It is often considered the "ultimate problem" in control of channel flow turbulence, as it is the case where the transient growth is the largest.$^{3-5}$ Setting $k_x = 0$ allows us to explicitly solve for $K$ and $\Gamma$, which in turn gives explicit formulas for $U_c$ and $W_c$. We derive these solutions and then discuss their properties. Note that we again work in Fourier space alone, not physical space. In the case of $k_x = 0$, the variables $Y$ and $\omega$ reduce to:

$$Y=-2\pi ik_zW,\qquad\omega=-2\pi ik_zu.\tag{14.106}$$

Denoting

$$\kappa=\alpha|_{k_x=0}=2\pi k_z,\tag{14.107}$$

the plant becomes:

$$u_t=\varepsilon\left(-\kappa^2u+u_{yy}+\bar h(y)\int_0^yW(t,\eta)\,d\eta\right)\tag{14.108}$$
$$W_t=\varepsilon\big(-\kappa^2W+W_{yy}+\bar g(y)W_y(t,0)\big)\tag{14.109}$$
$$V_t=\varepsilon(-\kappa^2V+V_{yy})-p_y\tag{14.110}$$
$$u|_{y=0}=W|_{y=0}=V|_{y=0}=0\tag{14.111}$$
$$u|_{y=1}=U_c,\qquad W|_{y=1}=W_c,\qquad V|_{y=1}=V_c\tag{14.112}$$

where

$$\bar g(y)=\kappa\left(\tanh\left(\frac{\kappa}{2}\right)\cosh(\kappa y)-\sinh(\kappa y)\right)\tag{14.113}$$
$$\bar h(y)=-\frac{4\kappa}{\varepsilon}\,i(2y-1).\tag{14.114}$$

Likewise, the controllers reduce to:

$$U_c=\int_0^1\Gamma(1,\eta)W(t,\eta)\,d\eta\tag{14.115}$$
$$W_c=\int_0^1K(1,\eta)W(t,\eta)\,d\eta\tag{14.116}$$
$$\dot V_c(t)=\varepsilon\kappa\big[-\kappa V_c(t)+i\big(W_y(t,0)-W_y(t,1)\big)\big].\tag{14.117}$$

The transformations (14.54) and (14.55) simplify to:

$$\hat u=u-\int_0^y\Gamma(y,\eta)W(t,\eta)\,d\eta,\tag{14.118}$$
$$\hat W=W-\int_0^yK(y,\eta)W(t,\eta)\,d\eta,\tag{14.119}$$

where $\hat u$ and $\hat W$ are the target variables for $k_x = 0$ that behave according to:

$$\hat u_t=\varepsilon(-\kappa^2\hat u+\hat u_{yy})\tag{14.120}$$
$$\hat W_t=\varepsilon(-\kappa^2\hat W+\hat W_{yy})\tag{14.121}$$
$$\hat u|_{y=0}=\hat u|_{y=1}=\hat W|_{y=0}=\hat W|_{y=1}=0.\tag{14.122}$$

We first examine the gain kernel PDE (14.66) to (14.68) for $K$ when $k_x = 0$. After setting $k_x$ to zero, we obtain:

$$K_{yy}=K_{\eta\eta}\tag{14.123}$$
$$K(y,0)=\int_0^yK(y,\eta)\bar g(\eta)\,d\eta-\bar g(y)\tag{14.124}$$
$$K(y,y)=-\bar g(0).\tag{14.125}$$
PROPOSITION 14.5.1 The solution $K(y, \eta)$ to Equations (14.123) to (14.125) is

$$K(y,\eta)=\kappa\left[\frac{2}{\sinh(\kappa)}\,e^{\kappa\tanh(\frac{\kappa}{2})(y-\eta)}-\coth\left(\frac{\kappa}{2}\right)\right].\tag{14.126}$$

PROOF This explicit solution is found by postulating $K(y, \eta) = F(y - \eta)$, which yields a Volterra equation:

$$F(s)=\int_0^sF(\sigma)\bar g(s-\sigma)\,d\sigma-\bar g(s).\tag{14.127}$$

The equation for $F(s)$ can be reduced to a second-order ordinary differential equation (ODE) by using the fact that $\bar g'' = \kappa^2\bar g$:

$$F''-\bar g(0)F'=0,\tag{14.128}$$
$$F(0)=-\bar g(0)\tag{14.129}$$
$$F'(0)=-\bar g^2(0)-\bar g'(0).\tag{14.130}$$

We postulate that $F = A_1e^{\bar g(0)s} + A_2$ and see the following relations must hold:

$$A_1+A_2=F(0)\tag{14.131}$$
$$\bar g(0)A_1=F'(0).\tag{14.132}$$
Solving these equations yields Equation (14.126) above. It is not hard to see that the exponent $\kappa\tanh(\frac{\kappa}{2})$ is bounded in absolute value by $2\pi|k_z|$. Also, $K$ and its derivative with respect to $k_z$ vanish when $k_z$ goes to zero. Thus, the gain $K(1, \eta)$, when $k_x = 0$, is independent of $\varepsilon$, grows quadratically in $k_z$ for small $k_z$, and linearly for large $k_z$. Next we turn our attention to the gain kernel PDE (14.69) to (14.71) for $\Gamma$ with $k_x = 0$:

$$\Gamma_{yy}=\Gamma_{\eta\eta}-\bar h(y)\tag{14.133}$$
$$\Gamma(y,0)=\int_0^y\Gamma(y,\eta)\bar g(\eta)\,d\eta\tag{14.134}$$
$$\Gamma(y,y)=0.\tag{14.135}$$

PROPOSITION 14.5.2 The solution $\Gamma(y, \eta)$ to Equations (14.133) to (14.135) is

$$\begin{aligned}
\Gamma={}&\frac{\kappa i}{\varepsilon}(y-\eta)\eta(3y-\eta-2)\\
&+\frac{\kappa i}{\varepsilon}\Bigg[2\,\frac{4\bar g(0)^2-2\bar g(0)^3+\bar g(0)\alpha^2-\alpha^2}{\bar g(0)^3\alpha^2}
+2\,\frac{\bar g(0)-1}{\bar g(0)^2}(y-\eta)-\frac{1+\bar g(0)}{\bar g(0)}(y-\eta)^2+(y-\eta)^3\\
&\qquad-8\,\frac{1}{\alpha^2}(y-\eta)\cosh(\kappa(y-\eta))
+4\,\frac{\sinh(\alpha)+\alpha}{\alpha^3}\cosh(\kappa(y-\eta))
+4\,\frac{\cosh(\alpha)+3}{\alpha^3}\sinh(\kappa(y-\eta))\\
&\qquad-2\,\frac{5\bar g(0)^2-\bar g(0)^3+\bar g(0)\alpha^2-\alpha^2}{\bar g(0)^3(\alpha^2-\bar g(0)^2)}\,e^{\bar g(0)(y-\eta)}\Bigg].
\end{aligned}\tag{14.136}$$

PROOF
We start with a change of variables:

$$\xi=y+\eta\tag{14.137}$$
$$\zeta=y-\eta\tag{14.138}$$
$$\Gamma(y,\eta)=\Gamma\left(\frac{\xi+\zeta}{2},\frac{\xi-\zeta}{2}\right)=\Xi(\xi,\zeta).\tag{14.139}$$

This turns the PDE (14.133) to (14.135) for $\Gamma$ into the following PDE for $\Xi$:

$$\Xi_{\xi\zeta}=-\frac{1}{4}\bar h\left(\frac{\xi+\zeta}{2}\right)\tag{14.140}$$
$$\Xi(\xi,0)=0\tag{14.141}$$
$$\Xi(\xi,\xi)=\int_0^\xi\Xi(\xi+\tau,\xi-\tau)\bar g(\tau)\,d\tau.\tag{14.142}$$

Integrating Equation (14.140) first with respect to $\zeta$ from 0 to $\zeta$ and then with respect to $\xi$ from $\zeta$ to $\xi$ we obtain:

$$\Xi=\int_\zeta^\xi\int_0^\zeta-\frac{1}{4}\bar h\left(\frac{s+\tau}{2}\right)d\tau\,ds+\Xi(\xi,0)-\Xi(\zeta,0)+\Xi(\zeta,\zeta)\tag{14.143}$$
$$=\frac{\kappa i}{2\varepsilon}(\xi-\zeta)\zeta(2\zeta+\xi-2)+\int_0^\zeta\Xi(\zeta+\tau,\zeta-\tau)\bar g(\tau)\,d\tau.\tag{14.144}$$

Postulating:

$$\Xi(\xi,\zeta)=\frac{\kappa i}{2\varepsilon}(\xi-\zeta)\zeta(2\zeta+\xi-2)+\Xi^1(\zeta)\tag{14.145}$$

we are left with an equation for $\Xi^1$ that depends only on $\zeta$:

$$\Xi^1(\zeta)=\Upsilon(\zeta)+\int_0^\zeta\Xi^1(\tau)\bar g(\zeta-\tau)\,d\tau\tag{14.146}$$

where $\Upsilon(\zeta) = \frac{\kappa i}{\varepsilon}\int_0^\zeta\tau(\zeta-\tau)(3\zeta-\tau-2)\bar g(\tau)\,d\tau$. We turn Equation (14.146) into an ODE by again recalling that $\bar g'' = \kappa^2\bar g$:

$$\Xi^{1\prime\prime}-\bar g(0)\,\Xi^{1\prime}=\Upsilon''-\kappa^2\Upsilon\tag{14.147}$$
$$\Xi^1(0)=\Upsilon(0)=0\tag{14.148}$$
$$\Xi^{1\prime}(0)=\Xi^1(0)\bar g(0)+\Upsilon'(0)=0.\tag{14.149}$$

Note that the homogeneous part of Equation (14.147) has the same coefficient as Equation (14.128). Therefore, the solution to Equation (14.147) will contain a term of the form $e^{\bar g(0)(y-\eta)}$. Due to the inhomogeneous forcing term, $\Upsilon'' - \kappa^2\Upsilon$, there are other terms, which are found by solving for $\Xi^1$. Computing $\Upsilon$ and $\Upsilon'' - \kappa^2\Upsilon$ shows that $\Xi^1$ must be of the form:

$$\Xi^1(z)=A_0+A_1z+A_2z^2+A_3z^3+B_0z\cosh(\kappa z)+B_1z\sinh(\kappa z)+C_0\cosh(\kappa z)+C_1\sinh(\kappa z)+D_0e^{\bar g(0)z}.\tag{14.150}$$

By substituting Equation (14.150) into Equation (14.145) and then Equation (14.145) into Equation (14.139), the solution (14.136) is obtained.

As Equation (14.136) shows, $\Gamma$ is linearly dependent on $1/\varepsilon$, the Reynolds number. When $k_z = 0$, $\Gamma$ goes to zero. Therefore, $\Gamma$ grows linearly in $k_z$ for small $k_z$, and exponentially when $k_z$ is large. Finally, we point out that the "peak-to-peak" gain of the dynamic controller in Equation (14.117) from the skin friction sensor $W_y(t, 0) - W_y(t, 1)$ to the actuated variable $V_c(t)$ when $k_x = 0$ is

$$\frac{\|V_c(\cdot)\|_\infty}{\|W_y(\cdot,0)-W_y(\cdot,1)\|_\infty}\le\frac{1}{2\pi k_z},\tag{14.151}$$

which means that it is independent of the Reynolds number $1/\varepsilon$ and that this controller is nearly inactive for large $k_z$, whereas its effort is significant for small $k_z$.

THEOREM 2 The closed-loop system (14.108) to (14.112), (14.115) to (14.117), (14.126), (14.136) is exponentially stable for any finite $k_z$.

PROOF As in Section 14.4; in this case, however, the dependence of $C$ on $k_z$ comes only from the norms of $K$, $L$, $\Gamma$, and $\Theta$.
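The closed-form kernel of Proposition 14.5.1 lends itself to a numerical sanity check, and the same check illustrates the successive approximation approach mentioned in Section 14.3. The following sketch is an assumption-laden illustration rather than the authors' procedure: it uses $\bar g(y) = \kappa(\tanh(\kappa/2)\cosh(\kappa y) - \sinh(\kappa y))$ and $K(y,\eta) = \kappa[\frac{2}{\sinh\kappa}e^{\kappa\tanh(\kappa/2)(y-\eta)} - \coth(\kappa/2)]$, iterates the Volterra equation (14.127) with a trapezoid rule on a uniform grid, and all function names, grid sizes, and iteration counts are hypothetical choices.

```python
import math

def g_bar(y, kappa):
    """g-bar(y) for kx = 0, as in (14.113)."""
    return kappa * (math.tanh(kappa / 2) * math.cosh(kappa * y) - math.sinh(kappa * y))

def K_closed(y, eta, kappa):
    """Closed-form gain kernel of Proposition 14.5.1, Equation (14.126)."""
    growth = math.exp(kappa * math.tanh(kappa / 2) * (y - eta))
    return kappa * (2.0 / math.sinh(kappa) * growth - 1.0 / math.tanh(kappa / 2))

def picard_F(kappa, n=100, iterations=25):
    """Successive approximation for F(s) in the Volterra equation (14.127) on [0, 1].

    Starts from F_0(s) = -g_bar(s) and repeatedly applies
    F_{k+1}(s) = integral_0^s F_k(sigma) g_bar(s - sigma) d sigma - g_bar(s),
    with the integral approximated by the trapezoid rule.
    """
    h = 1.0 / n
    s = [i * h for i in range(n + 1)]
    F = [-g_bar(si, kappa) for si in s]
    for _ in range(iterations):
        updated = []
        for i in range(n + 1):
            integral = 0.0
            for j in range(1, i + 1):
                left = F[j - 1] * g_bar(s[i] - s[j - 1], kappa)
                right = F[j] * g_bar(s[i] - s[j], kappa)
                integral += 0.5 * h * (left + right)
            updated.append(integral - g_bar(s[i], kappa))
        F = updated
    return s, F
```

Since $K(y,\eta) = F(y-\eta)$, the iterate returned by `picard_F` should agree with `K_closed(s, 0.0, kappa)` up to the quadrature error, and `K_closed(y, y, kappa)` should equal `-g_bar(0.0, kappa)`, which is the boundary condition (14.125).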
14.6 Discussion
We have shown the derivation and stability proof for controllers that (1) stabilize and decouple the Orr–Sommerfeld and Squire equations at any Reynolds number and (2) stabilize the 3D Navier–Stokes equations linearized around a Poiseuille profile equilibrium. This derivation employs actuation along one wall in each of the streamwise, spanwise, and normal directions. One controller converts the system into a strict feedback form, while the other two controllers are designed using the backstepping method (cascade Volterra operator transformations) to stabilize and decouple the subsystems. These controllers essentially work to make the normal velocity (Orr–Sommerfeld) and normal vorticity (Squire) subsystems behave as two uncoupled stable heat equations. To use these controllers in practice or in simulations, instead of solving high-dimensional Riccati equations, we must find solutions to the kernels $K$ and $\Gamma$. As mentioned above, this can be done off-line, either numerically, symbolically, or (in certain cases, such as when $k_x = 0$) analytically. We studied the special case of $k_x = 0$, for which we derived explicit controller gain kernels. The system (14.108), (14.109) displays the cascade connection commonly regarded as the cause for nonorthogonality that leads to transient growth.$^{3-5}$ With our transformations (14.118), (14.119) and boundary feedback (14.115) to (14.117) we cut the coupling and reduce the system to two heat equations (14.120) to (14.122). Examining the explicit gain kernels demonstrates that controllers in this case depend, at most, linearly on the Reynolds number. However, the gain kernels have an exponential dependence on $k_z$ for large $k_z$.
Acknowledgments

This work was funded by a National Defense Science and Engineering Graduate Fellowship and National Science Foundation Grant number CMS0329662.
References

1. Vazquez, R. and Krstic, M., A closed-form feedback controller for stabilization of linearized Navier–Stokes equations: The 2D Poiseuille flow, 44th IEEE Conf. Decision and Control, 5959, 2005.
2. Vazquez, R., Schuster, E., and Krstic, M., A closed-form observer for the 3D inductionless MHD and Navier–Stokes channel flow, IEEE Conf. on Decision and Control, 2006.
3. Bewley, T. R., Flow control: New challenges for a new Renaissance, Progress in Aerospace Sciences, 37, 21, 2001.
4. Jovanovic, M. and Bamieh, B., Componentwise energy amplification in channel flows, Journal of Fluid Mechanics, 534, 145, 2005.
5. Schmid, P. J. and Henningson, D. S., Stability and Transition in Shear Flows, Springer-Verlag, Berlin, 2001.
6. Isidori, A., Nonlinear Control Systems, Springer-Verlag, Berlin, 1995.
7. Krstic, M., Kanellakopoulos, I., and Kokotovic, P., Nonlinear and Adaptive Control Design, John Wiley & Sons, New York, 1995.
8. Smyshlyaev, A. and Krstic, M., Closed-form boundary state feedback for a class of 1-D partial integro-differential equations, IEEE Trans. Automatic Control, 49, 2185, 2004.
15 An Approach to Home Automation by Means of MAS Theory
Giuseppe Conte and David Scaradozzi
CONTENTS
15.1 Introduction
15.2 Home Automation Systems
15.2.1 HAS Description
15.2.2 Paradigmatic Example
15.2.3 System Behavior and Control
15.3 HAS Theory
15.3.1 Domotic Agent Definition
15.3.2 Home Automation System Definition
15.4 HAS Analysis and Simulation
15.4.1 The Simulator
15.4.2 Power Leveling
15.4.3 Water Leveling
15.4.4 Simulation Results
15.5 Conclusion
References
15.1 Introduction
The aim of this work is to show that multi-agent system (MAS) theory can be used to advantage in the analysis and study of home automation systems. The problem of conceiving and developing efficient systems for home automation presents several difficult aspects, due to a number of factors that, all together, generate complexity. In dealing with home automation, in particular, one has to consider distributed control structures, hybrid time-driven/event-driven behaviors, interoperability between components of different brands, and requirements of safe and efficient interaction with
human users, to mention only some of the characteristics that make this area interesting and challenging. In modern houses, the appliances and devices that may be included in a home automation system are endowed with individual control systems that, in a more or less sophisticated way, manage their behavior, possibly using information and data they acquire externally in some way and exchange among them. Today, prototypal home automation systems are conceived for regulating the concurrent use of limited resources, like electric energy or hot water, and for facilitating operation, monitoring, and survey of (groups of) appliances. From this point of view, a home automation system can be viewed roughly as a partially distributed control system, whose components are essentially autonomous, possess a certain degree of intelligence, share resources and some common goals, and communicate among them. In general, however, it appears difficult to determine the key features of a home automation system, as well as to define general criteria for evaluating its performances and, what is perhaps more important, to develop a satisfactory, systematic design methodology. The basic idea originally proposed in References [5, 6, and 19] and illustrated here is that a formalism derived from the MAS theory can, in principle, respond to these needs, providing a powerful conceptual framework and a number of appropriate methodological tools for coping with complexity. The paradigm of MAS is used widely in several areas of computer science, automation, and robotics (see References [10, 11, 15, 16] and the references therein). In a MAS, several autonomous agents interact to accomplish specific tasks and possibly they compete for resources in an environment that may be modified by their actions. In general, a coordination and collaboration strategy is necessary in order to solve conflicts and to assure overall satisfactory performances (see, e.g., References [17, 18]). 
These characteristics make the MAS paradigm particularly well suited for dealing with complex systems in which complexity arises mainly by the interaction between different components. From this point of view, the general description of a home automation system, provided some basic features are suitably specified, fits well with the MAS paradigm. In fact, appliances can be viewed as autonomous agents, whose individual tasks consist in completing their operating cycles, sharing resources, such as electricity, water and gas, that are (all or in part) limited and possibly not sufficient to satisfy all the agents at the time of request. Resources must therefore be allocated according to given priorities, taking into account that delays in performing specific tasks affect, in a way depending on the single delayed task, the degree of user satisfaction. Allocation cannot follow static rules, but must be decided according to the evolving, dynamic situation of the environment and, in addition to maximizing the possibility for every agent to see its demand satisfied in the shortest possible time, it must facilitate resource saving. In spite of the above consideration, the tools and formalism of MAS theory do not seem to have been employed to a full extent in the study of home automation systems and related problems. Actually, except in some cases
(see for instance Reference [8]), the design and development of future home automation systems seem to have progressed, until now, mainly without the help of a unifying conceptual framework of some kind. The general structure of a home automation system we consider was previously discussed in References [5, 6, 19], referring, for the sake of exemplification, to a series of industrial experiences (see References [1–4, 9]). In those papers, a conceptual model for a home automation system was proposed, using as a base the MAS formalism. Here, we expand further the approach of References [5, 6] and provide an example of application to the analysis of possible control strategies for energy management in a simulated environment. In the system structure we consider here, the appliances use the power line also for communicating purposes. This particular choice is motivated by the fact that, using suitable, simple devices for interfacing, this solution is feasible and allows the building of a prototypal system, for testing and experiments, in an easy and economical way (see References [2–4] and Section 15.3.2 for more details). The use of different technologies (and, possibly, of different kinds of interfacing devices) for allowing communication (like WiFi, Bluetooth, ZigBee, and dedicated networks) does not change the general aspects of the situation studied. The system or single parts of it interact with human users through the interfaces of the appliances or through other specific interfaces. In addition, the system may be endowed with gateways to allow communication with remote locations. An important element of the system is represented by a power meter, which is able to measure the electric load imposed at each time on the energy source and to inform the elements of the home automation system about the quantity of energy that is available (see Reference [7]). 
In standard home installations, the meter is coupled with a power limiter that, according to some specific procedure, may disconnect the energy source in case of excessive load. This action, which causes a blackout in the house, must be avoided by proper functioning of the home automation system (see References [1, 5, 9]). In particular, among the tasks of the home automation system, those of major concern at the present stage of development include:
• regulating energy consumption by avoiding overload and consequent blackouts;
• regulating the use of hot water produced by gas boilers;
• monitoring the behavior of different appliances and, possibly, detecting and signaling malfunctions or failures;
• facilitating interaction with human users.
Concerning the first two points, regulation includes the action of allocating the resource in case the demand exceeds the availability. Practically, in case of conflict, the system has to implement a set of rules that define priorities and distribute the available resources accordingly. The development of efficient policies for assigning priorities, which take into account the peculiarities of the tasks performed by the single appliances and the preferences of the human user, is probably the key area where the formalism we develop could prove useful
in analyzing and then improving the system’s performance. As mentioned also in References [6, 19], similar remarks apply to the third point, for which, in addition, the kind and quantity of data the system is able to process and the way in which it can extract structured information from them are of particular importance. This is also relevant for the fourth point, which requires the development of simple system architectures and ease of communication between agents and human users (see Reference [14]).
The content of this chapter is structured as follows. First, we give a brief description of a home automation system, adopting an informal point of view. Then we summarize the new, formal definitions of domotic object and of domotic agent and, using them, we formalize the definition of home automation system. Finally, we describe the construction of a simulator environment for home automation systems, based on the previous formalism, and we show its use in analyzing and validating several overall control strategies.
15.2
Home Automation Systems
In order to illustrate the general framework where we place our investigation and to motivate our approach, we give in this section a general, informal description of a home automation system (HAS). After describing the main components and characteristics of the system, we will describe a paradigmatic example and then we will point out a class of problems.
15.2.1 HAS Description
A home automation system consists, basically, of a number of appliances, which may exchange data through a communication network of some kind. The various components of the system, such as the white goods, the audio/video subsystems for home entertainment and for security, the HVAC (heating, ventilation, and air-conditioning), and the illumination subsystem, require a specific and variable amount of energy during their normal working cycle and, in the presence of constraints that limit the available energy, each appliance needs to cooperate with the others in order to maximize efficiency. This structure can roughly be viewed as a partially distributed control system, whose components are essentially autonomous, possess a certain degree of intelligence, and share some common goals. We refer to the whole system with the acronym HAS (home automation system).
Considering the presence of several components (the appliances) that act as autonomous agents, of various available resources (electricity, cold water, and gas), and of a number of common goals (energy saving, user satisfaction, security and safety, and so on), we get a global picture that fits well with the MAS point of view. MAS theory is used widely in several areas of computer science, automation, and robotics (see References [10, 11, 15, 16] and the references therein) and, on the basis of our previous remark, it will be used here for formalizing the notion of HAS.
Before entering into the details of a formal definition, however, it is useful to remark on some points. The structure of a general HAS, in the rough terms in which we have, for the moment, described it, deserves to be studied at least at two levels: a global level, which concerns the overall behavior and performance of the system, and a local level, which concerns the way in which single components and devices are integrated into the system and work. In other words, in a HAS we have a collection of components, which can be generically called agents, and an overall architecture, which defines the environment in which the agents interact and the modalities of interaction. From this point of view, a HAS qualifies as a MAS in the sense of Sycara et al. [16], in which, in particular:
1. there is no centralized, global controller or supervisor;
2. data and knowledge are decentralized;
3. computing power is distributed and asynchronous;
4. each agent has incomplete information or capabilities and, as a consequence, limited knowledge and awareness of the overall situation.
In order to allow the system to operate, the architecture of the system and the way in which information flows in it must guarantee that:
1. each agent may declare itself;
2. each agent may detect and possibly recognize other agents;
3. each agent may interact with other agents according to given rules (system rules).
Besides these basic and somewhat abstract features of the global system, in a real HAS the agents must have some individual qualities that facilitate their integration into a larger system, without reducing their ability to work as stand-alone devices and to satisfy the user. In summary, the principal and most important qualities can be informally described as follows:
• Autonomy: the capability to perform their own tasks, when required, without support from the HAS or, alternatively, the capability to negotiate resources in interacting with other agents
• Collaborative behavior: the capability to cooperate with other agents in order to share resources and to reach common goals
• Adaptability: the capability to modify their behavior according to the outcome of interaction processes
• Reactivity: the capability to react, to some degree, to the system’s actions
Although the above qualities are very few with respect to those considered for a generic agent in a MAS (see, e.g., References [10, 15]), they are enough for assuring, in principle, the possibility to work by sharing common resources, while applying suitable strategies for optimizing individual performances (see also Reference [14]).
From the point of view we have adopted, we can consider the human user himself as an agent of the HAS. In this way, his needs, in terms of resources, will conflict with those of other agents in the same structured environment and the solution of conflicts will follow the same general rules. Privileges and priorities granted to the human user, then, do not represent external disturbances which may interfere with the system’s behavior and degrade its performances, but are integrated in the laws that govern its operations. 15.2.2 Paradigmatic Example In order to develop our analysis, we refer to a general scheme of a concrete domestic environment like the one outlined in Figure 15.1. This approach will simplify our exposition without losing generality. The agents, in the underlying HAS, are represented by a human user (HU) and by the various devices: washing machine (WM), dishwasher (DW), gas boiler (GB), house heating system, and power meter/power limiter (PM/PL). Other agents can be added, but the ones we have mentioned are those that are more commonly found in real situations and they are enough to illustrate the typical problems one encounters in dealing with general HASs. In that environment, we have three resources, cold water, gas, and electricity, that are the basic resources available in the domestic environment.
[Figure 15.1: schematic of the domestic environment. Labeled elements: water input (H2O) with cold water (CW) and hot water (HW) lines, gas input (GAS), power line input with power meter (PM) and power limiter (PL), gas boiler (GB), domestic water heating circuit (HC), user, washing machine (WM), dishwasher (DW), power line communication nodes (N), and an optional residential gateway to the Internet.]
FIGURE 15.1 Domestic environment and home automation system.
Hot water, which is viewed as an additional resource, may be produced internally, by transforming other resources, like gas and cold water or electricity and cold water. The available amounts of cold water and gas at a given time are assumed to be free from limitations, while the available amount of hot water and that of electricity at a given time are subject to limitations. In the system of Figure 15.1, the GB supplies hot water both for sanitary use and for heating purposes through the hot water circuit (HW). The house heating circuit (HC) and the HU can obtain hot water from the HW. The WM and DW can both use hot water produced by the boiler, competing with each other, with the HC, and with the HU in the exploitation of this limited resource, or they can produce hot water for private use employing electricity and cold water. In addition, the agents that use electricity compete for this limited resource. Obviously, competition generates conflicts, and then it is useful to have an allocation strategy for the available limited resources that can practically be implemented by the various agents. The basic information for implementing a resource allocation strategy concerning electricity comes from the PM, which measures the actual, global electric load. The coupled PL keeps the load below a fixed threshold by shutting down the connection with the source of electricity when the threshold is exceeded. For allocating hot water, the situation is, in general, more problematic, because a direct measure of demand and availability is usually not obtainable. A possible strategy, based on indirect information, will be described and discussed in Section 15.4.3. As described in Reference [6], one can assume that some devices, for instance the DW and the WM, are able, to different degrees, to exchange information with other agents through a communication network. The communication network may be physically realized, for instance, using the same power line that carries the electricity. 
Alternatively, one may assume that wireless communication occurs or that a dedicated network is in place. The first possibility is represented in Figure 15.1, where the communication through the power line is assumed to be realized by means of suitable devices, here called nodes. Essentially, nodes are introduced in our picture in order to decouple the basic functional characteristics of the appliance from its communication capability. This reflects a possible orientation of the market, where communication capabilities may be offered to the buyer as additional features, in such a way to keep the price of appliances in basic configuration as low as possible. In addition, nodes may be conceived in such a way as to facilitate interfacing with appliances of different brands and, possibly, they can be endowed with the capability to read some of the internal variables of the appliance they are connected to and to transmit to it commands or information they get through the network (for instance, the information coming from other nodes and, specifically, those coming from the PM), according to the ability of the appliance to establish a dialogue with the node. Nodes may also be capable of processing the data they collect according to a set of software instructions, to incorporate part of the intelligence of the systems. In this case, individually or together with the appliance they are connected
to, they are instrumental in implementing the control strategies that optimize the global system performance. It has to be pointed out, however, that the physical architecture that employs nodes is used here only as an example of a possible and currently feasible implementation (see References [2–4] for a description of devices, like the Wr@p Enabled Smart Adapter (WESA) or Smart Cube, that can act as a node and for a discussion of their functionality). Other architectures are possible, where the functionalities of the node are included in the single appliances. Information may also flow into and out of the house on a communication line connected to a residential gateway and, in this way, data can be exchanged, for example, for remote assistance or control. However, we will not consider this aspect in the present discussion.
15.2.3 System Behavior and Control
Referring to the system we have described above, the problems in regulating its behavior come from the following:
• Limited resources, such as electricity and hot water, must be distributed according to specific priorities (for instance, the washing machine should not be allowed to get hot water, if this may cause a sudden, unpleasant reduction of the water temperature while the human user is taking a shower).
• Operation of different appliances must be organized and scheduled in such a way as to keep the global electric load within the limits established by the supplier, to avoid possible blackout.
• The use of electricity and gas must take into account economic priorities.
• The performance of individual appliances must be optimized according to specific criteria of the user’s satisfaction, under the constraints imposed by the above requirements.
At the same time, it is assumed that the essential information that the system itself has on its status and that may be used for regulating its behavior is represented by the knowledge of the actual electric load. Additional information about the appliance status may possibly be generated by the associated node, either using direct measurements or interpreting the time evolution of the electric consumption in terms of a known model of the appliance. This possibility, in particular, will be considered in Section 15.4.3 in developing a strategy for the allocation of hot water based on the knowledge of the status of the gas boiler. General control strategies for the above systems consist of a set of rules that establish priorities in gaining access to limited resources on the basis of the available information. These rules are assumed to have been synthesized in such a way as to maximize a functional that, more or less abstractly, describes the user’s satisfaction. The way in which a control strategy can be
implemented practically depends on the action each agent can actuate in order to regulate its behavior. In case the functionalities of each agent can be divided in two groups, those of the associated node and those of the appliance or device, the analysis is simplified. Basically, a node can forbid the use of electricity, blocking, in this way, the appliance. This is actually the only possibility if the appliance and the node are not able to interact at a higher level. Alternatively, the node and the appliance, if both have the ability to exchange information, can cooperate in choosing a strategy that allows the appliance to reduce its actual demand for electricity by modifying its behavior. It is clear from the picture we have given that both the system and the task of controlling it are very complex and difficult to handle. The main difficulties arise from the highly diversified nature of the system’s components; from the hybrid nature, time-driven and event-driven, of its evolution; from the presence of components, like the human user, whose behavior is stochastic; from the fact that realistic and satisfactory indices of performances are very difficult to define. The approach we propose to deal with this complexity consists in developing a formal theory that applies to our situation and that allows us to define a set of procedures and to construct a number of formal tools for tackling the modeling problem and the control problem. As anticipated, this is done by relying on a well-established, preexisting, formal theory, namely the theory of multiagent systems, or MAS, that supports our construction and assures its coherency, applicability, and power.
15.3
HAS Theory
In order to construct a formal definition of HAS, we need first to characterize, following the paradigm of the MAS theory, its components, namely the appliances and the other entities we have already generically termed agents. This has been done first in Reference [6], and here we recall the definition given there.
15.3.1 Domotic Agent Definition
Following the MAS theory point of view, an agent, namely the basic element of a larger system, can be defined by characterizing its capacities. In general, we agree that the basic capacities or abilities that define an agent are those considered in the following definition.
DEFINITION 1 An agent is a virtual or physical entity that possesses, to different degrees, the following capacities:
1. it is able to perform specific actions in a given environment
2. it is able to perceive elements of the environment
3. it is able to construct (partial) models of the environment
4. it is able to use personal resources and environmental resources
5. it is able to orient its actions toward specific goals
6. it is able to communicate directly with other agents in the environment
7. it is able to offer services to other agents
8. it is able to govern its action according to its possibilities and limitations, its goals, knowledge of the environment, and the resources available in the environment
Now, we can give the notion of domotic object and of domotic agent (domotic comes from the Latin word domus, which means home), namely the elementary component of a HAS, by specializing Definition 1 to our case.
DEFINITION 2 [6] A domotic object is a generic agent in the sense of Definition 1 that has at least the general capacities 1, 4, 5, and 8 and, concerning capacity 6, it is able to communicate to other agents in the environment at least its requirements about environmental resources.
DEFINITION 3 [6] A domotic agent is a domotic object that, in addition, has at least the general capacity 2. A domotic agent is called cognitive if it also has capacity 3.
The definitions we have introduced are quite abstract and need to be explained and made more concrete by the following remarks.
REMARK 1 Capacity 1 includes the possibility to work in a stand-alone configuration in an environment that supplies resources, as well as the possibility to work as a component of a team of objects that exchange information, resources, and services. Actually, this agrees with the characteristics of the devices that populate the domestic environment.
REMARK 2 When capacity 2 is present, it is implicitly required that capacity 6 allows a communication richer than the basic one considered in Definition 2. Perception concerns in particular the signals coming from other agents, which describe their requirements and display the availability of resources.
REMARK 3 The distinctive quality of cognitive agents, which consists in the ability to represent the environment by means of a model, can be used for understanding the results of a given action and, therefore, for planning future behavior. Agents that are not cognitive are only capable of a reactive behavior in response to the information, viewed as a stimulus, coming from the environment.
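The hierarchy set up by Definitions 1 to 3 can be illustrated with a minimal Python sketch. The enumeration names and the `classify` function are hypothetical, introduced only for illustration, and the sketch makes one simplifying assumption: capacity 6 is treated as a yes/no property, whereas Definition 2 and Remark 2 distinguish a basic form of communication from a richer one.

```python
from enum import IntEnum


class Capacity(IntEnum):
    """Capacities of Definition 1 (member names are illustrative)."""
    ACT = 1
    PERCEIVE = 2
    MODEL = 3
    USE_RESOURCES = 4
    PURSUE_GOALS = 5
    COMMUNICATE = 6
    OFFER_SERVICES = 7
    SELF_GOVERN = 8


# Definition 2: capacities 1, 4, 5, 8, plus (basic) capacity 6.
DOMOTIC_OBJECT_MINIMUM = {
    Capacity.ACT,
    Capacity.USE_RESOURCES,
    Capacity.PURSUE_GOALS,
    Capacity.SELF_GOVERN,
    Capacity.COMMUNICATE,
}


def classify(capacities):
    """Return the most specific class that a set of capacities qualifies for."""
    caps = {Capacity(c) for c in capacities}  # accepts ints or Capacity members
    if not DOMOTIC_OBJECT_MINIMUM <= caps:
        return "generic agent"
    if Capacity.PERCEIVE not in caps:         # Definition 3 requires capacity 2
        return "domotic object"
    if Capacity.MODEL not in caps:            # cognitive agents also have capacity 3
        return "domotic agent"
    return "cognitive domotic agent"
```

For instance, under these assumptions `classify({1, 4, 5, 6, 8})` gives a domotic object, and adding capacities 2 and 3 gives a cognitive domotic agent.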
REMARK 4 The union of capacities 3, 6, and 8 gives to the agents that possess them the ability to cooperate in order to improve the global performance of the system by regulating individual behaviors. In such a way, in fact, priorities and other rules concerning the concurrent use of resources can be implemented and conflicts can be solved.
The notion of domotic agent we have defined embraces the qualities we have listed informally in Section 15.2.1.
15.3.2 Home Automation System Definition
Having defined the principal elements, namely the domotic objects and agents that will form the overall system, we can give the following definition.
DEFINITION 4 [6] A home automation system consists of the following elements:
1. a set GR of global resources;
2. a set DO of domotic objects;
3. a set DA of domotic agents, subset of DO;
4. one information network IN, that connects domotic objects and agents;
5. a set R of rules that govern the individual behavior and the concurrent operation of domotic objects and concern: (a) use and transformation of external resources; (b) communication; (c) perception and understanding;
6. a set L of operators, called laws, that describe the time evolution of the global system according to the individual behavior of objects and agents.
This characterization of the notion of HAS agrees with the general point of view of MAS theory. Although it represents only an abstract and conceptual instrument, the above formal definition gives us the possibility to analyze a concrete example and to understand its structure. The time evolution of a HAS formally described on the basis of Definition 4 is completely determined by L and it depends, in particular, on the rules that form R. Then, it is possible to study the effects of a different choice of rules that form R on the global evolution and behavior of the system and, in particular, to evaluate its performance in terms of functionals that represent user satisfaction. Simulation and design procedures can be developed on this basis as exemplified in the next section. In addition, critical parameters of the system can be more easily recognized by analyzing its structure in a formal framework like the one we have constructed.
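As an illustration of how Definition 4 might be carried into software, the following Python sketch collects the six elements in one record. It is only one possible encoding, not the one used in the references: the class name, the representation of IN as the set of connected members, of R as named textual rules, and of each law in L as a function from a state dictionary to a new state dictionary are all assumptions made here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

State = Dict[str, float]  # hypothetical global state, e.g. {"t": 0, "load_w": 0}


@dataclass
class HomeAutomationSystem:
    GR: Set[str]                                        # global resources
    DO: Set[str]                                        # domotic objects
    DA: Set[str]                                        # domotic agents
    IN: Set[str]                                        # members attached to the network
    R: Dict[str, str] = field(default_factory=dict)     # rule name -> description
    L: List[Callable[[State], State]] = field(default_factory=list)  # laws

    def problems(self) -> List[str]:
        """Check the one structural constraint stated in Definition 4."""
        found = []
        if not self.DA <= self.DO:
            extra = ", ".join(sorted(self.DA - self.DO))
            found.append("DA is not a subset of DO: " + extra)
        return found

    def step(self, state: State) -> State:
        """Apply every law once, in order, to obtain the next global state."""
        for law in self.L:
            state = law(state)
        return state
```

Keeping R and L as data makes it easy to swap in a different set of rules and compare the resulting evolutions, which is the kind of study the text describes.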
15.4
HAS Analysis and Simulation
Simulation is, at this stage of development of home automation systems, a basic methodology for investigating important features and behaviors and for designing control strategies. Experiments that involve real systems of nontrivial complexity, even in an emulation environment, are, in fact, costly and time consuming. Regarding simulation, the theoretical framework we have constructed is instrumental, first of all, in analyzing the characteristics of real systems and in displaying their structure in terms of Definition 4. On this basis, it is then possible to define in a systematic way suitable, accurate models that may represent the system at issue and simulate its behavior. To show this, here we briefly present and discuss the main lines of the development of a virtual environment that reproduces a common domestic environment, like that depicted in Figure 15.1 (see also References [13, 14]). 15.4.1 The Simulator The simulator we are speaking of has been conceived in such a way that real devices can be integrated into it, by means of suitable interfaces, to substitute, totally or partially, virtual, simulated devices. This gives us the possibility to increase realism and significance of the simulation and guarantees the accuracy of the results. The attainment of this characteristic of versatility has been facilitated by the choice of working in a software environment, namely the NI LabView and LabWindows/CVI environment, which allows rapid prototyping and easy interfacing, by means of suitable hardware, between the virtual world and the real one. Moreover, the simulator blocks communicate by TCP/IP protocol. This allows us to model and simulate a quite general situation, according to the scenario for home automation developed by the CECED European Commission Discussion Group (see Reference [9]). Following Definition 4 and referring to Figure 15.1, we have that the set GR of global resources is described by GR = {electricity, cold water, hot water, gas}. 
Some of the global resources, namely electricity and hot water, are characterized by limited availability. The set DO of domotic objects is described by DO = {power meter/power limiter PM/PL, washing machine WM, dishwasher DW, gas boiler GB, house heating circuit HC, human user HU}. To these we can add a generic electric device GD, not represented in Figure 15.1, that accounts for other different users of electricity. Except for HC, HU, and GD, the above elements, together with the node that links each one of them to the communication network, can be viewed as domotic agents, and they form the set DA in the system. In the virtual environment, each one of the devices we have mentioned, except the human user, is modeled by a dynamic system that specifies part of the rules forming the set R. The actions of the human user are implemented in the simulator by means of external commands. The WM and the DW are agents that use electricity, cold water, and hot water and they can produce hot water, from electricity and cold water, for
internal use. The GB uses cold water and mainly gas to produce hot water, but it also employs electricity in the process (to circulate water and to eliminate combustion gases). Both the HC and the HU employ hot water. The WM and the DW are modeled (by specifying suitable rules) in such a way that they find it more economic to use the hot water produced by the boiler than to produce it from electricity, and the GB can respond satisfactorily to the request of only one agent at a time. The information network, which in the real situation is represented by the power line, is realized by the TCP/IP protocol in the simulation environment. This gives the additional possibility to implement the simulator on different networked PCs, splitting the computational burden, and to integrate real appliances having networking capabilities. The flow of information is simulated by employing global external variables, which are shared by all (the programs that represent) the agents. In addition, external variables are also used to model the amount of each available resource the system is employing. The PM/PL agent, as it happens in the real situation, is not consuming any resource, but it produces information about the actual electric load that is forwarded to the other agents through the IN. In addition, the PL acts interrupting the supply of electricity and causing system blackout if the load exceeds a threshold and, as a result, the operation of every agent that uses electricity stops. The DW and the WM are assumed to be cognitive agents. The PM/PL, as we have described it, is not a cognitive agent and neither is the GB. However, we can assume that the GB is coupled to a node that, measuring the electric load it generates and referring it to a known model, is able to detect the internal status of the boiler and can communicate this information through the IN. 
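The sets GR, DO, and DA of the simulated environment, together with the resource usage just described, can be written down as plain Python data. This is a sketch for orientation only: the dictionary `USES` and the helper `competitors` are hypothetical names, and the usage table records only what the text states (for example, the human user is listed as a user of hot water only).

```python
# Sets taken from the description of the simulated environment.
GR = {"electricity", "cold water", "hot water", "gas"}
LIMITED = {"electricity", "hot water"}           # resources with limited availability
DO = {"PM/PL", "WM", "DW", "GB", "HC", "HU", "GD"}
DA = DO - {"HC", "HU", "GD"}                     # domotic agents
COGNITIVE = {"WM", "DW"}                         # agents assumed to be cognitive

# Resources each element draws on, as stated in the text.
USES = {
    "PM/PL": set(),                              # consumes nothing, produces information
    "WM": {"electricity", "cold water", "hot water"},
    "DW": {"electricity", "cold water", "hot water"},
    "GB": {"cold water", "gas", "electricity"},
    "HC": {"hot water"},
    "HU": {"hot water"},
    "GD": {"electricity"},
}


def competitors(resource):
    """Names of the elements that draw on the given resource, sorted."""
    return sorted(name for name, used in USES.items() if resource in used)
```

Listing the competitors for each limited resource shows where conflicts can arise: `competitors("hot water")` and `competitors("electricity")` give the two groups that the leveling strategies have to arbitrate.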
This specific information may be used by cognitive agents in modeling the environment to understand whether or not hot water is available for their actual requirements. In conclusion, the information that is exchanged between the agents describes the availability of electricity and, indirectly, through the status of the GB, the availability of hot water. Agents may be given access to all or part of the available information and, according to this, as well as to their instructions, ability, and assigned task, they regulate their behavior following specific rules. In this way they implement in a decentralized way control strategies that aim at regulating the global system behavior. All these characteristics are expressed by means of suitable elements of the sets R of rules and L of laws. It has to be remarked that the structure given to the virtual environment reproduces that found in a standard home environment, for example, in Italian and European houses. As described in Section 15.2.3, the use of limited resources in the system is governed in a distributed way. The rules and laws of the sets R and L, when implemented, are expected to govern the behavior of the single agents as a whole and to regulate the competition for hot water and electricity. In order to solve conflicts due to scarcity of resources, two solutions are in general conceivable. One consists in giving the system the capability to predict and
avoid conflicts by implementing a centralized policy, whereas the other consists in providing individual agents with the capability to negotiate in order to solve conflicts when they arise. Coherently with the idea of adopting a distributed control system point of view, we choose the second solution by implementing two different strategies for managing, respectively, electricity and hot water. According to the terminology introduced in References [1, 12], we call these strategies, respectively, power leveling and water leveling. Both are decisional strategies, based on the assignment of priorities that are characterized in terms of time-out intervals and are automatically activated in case of conflict. In the present implementation, priorities are time invariant, but, as a result of use of the simulator, strategies based on varying (with time or as a consequence of specific events) priorities may be tested and evaluated. 15.4.2 Power Leveling Power leveling is the electric power management policy that agents follow in order to access electric resources. This policy is based, first, on the capacity of agents and objects to make their needs known to others (this capacity, number 6 in Section 15.3.1, Definition 1, is quite natural for electric devices and is actuated simply by imposing a load to the power line). When a device activates, new energy is requested and the PM/PL detects an increase of the load. If the device is not able to communicate directly (as in the case of a generic appliance), the PM/PL transmits the information about the change over the IN to inform all other agents. In our theoretical framework the PM/PL plays therefore a key role by producing information that agents may receive, understand, and use, according to their capacities (specifically, and in order of importance, numbers 2, 3, 5, and 8 in Section 15.3.1, Definition 1). 
If the appliance that receives the information is a cognitive agent, its capacity to construct a (partial) model of the environment allows it to govern its action in the resource market, compromising between its own energy needs and priorities and those of the other agents in the house. This means that the considered agent will suspend its task if the sum of all energy needs is higher than a fixed threshold (typically, in Italy, the threshold is 3 kW) and its priority is lower than that of all other cognitive agents. In our system, for each cognitive agent, priority is described by means of two parameters, which represent the length of two time-out intervals: the overload time (t_o) and the suspension time (t_s). These assign, respectively, the time the agent can wait before stopping operation and entering a standby status when energy consumption exceeds the fixed threshold, and the time the agent must wait in the standby status before trying to start operation again. The level of priority assigned to each agent is determined by the choice of these parameters: a low overload time and a high suspension time, for example, mean low priority. Using simulation, the consequences of different choices of the parameter values can be tested and their effect on user satisfaction can be evaluated. Table 15.1 shows the default values chosen in a series of tests.

TABLE 15.1
Overload and Suspension Times Chosen for the Power Leveling Strategy

Appliance            Overload Time (s)    Suspension Time (s)
Generic appliance    ∞                    /
Gas boiler           ∞                    /
Dishwasher           60                   600
Washing machine      30                   900

From the table, one sees that the dishwasher has a higher priority than the washing machine, whereas the gas boiler cannot go on standby for
safety reasons and the generic appliance cannot go on standby because it does not have a cognitive capacity. With a power leveling strategy of this kind, when the demand for electricity exceeds availability, the PM detects an overload and sends this information to the agents. Each cognitive agent then starts a counter and, if its overload time elapses before the load returns under the threshold, it enters a standby status, reducing its demand for electricity. The load reduction caused by this action is detected by the PM, which informs the agents of the new situation. In case the load goes under the threshold, cognitive agents that are not yet in standby stop their counters. After their suspension time has elapsed, the standby agents start their operation again and the cycle restarts. In case the actions of the single appliances fail to reduce the load, the PL acts by shutting down electricity. The action of the PL is regulated by two parameters that define, respectively, the limit time (t_l) that the PL can wait in an overload situation before taking action and a secondary threshold that the load cannot surpass without causing immediate shutdown. Unlike the overload time and the suspension time of the single appliances, these two parameters of the PL cannot, for obvious reasons, be chosen by the users.

15.4.3 Water Leveling

Water leveling is the hot water management policy that agents follow in order to access the hot water resource produced by the gas boiler (see Reference [12]). Because the gas boiler has limited capabilities, this resource degrades in quality (temperature and pressure) when too many agents try to use it at the same time. Therefore, it is important that the information about the hot water market status is available in the HAS and that cognitive agents have the ability to use it.
As in the case of power leveling, the water leveling policy is based, first, on the capacity of agents and objects to make their needs known to others, but the situation is more difficult to handle, because in general hot water consumption is not directly measured and objectives are different. A demand for hot water causes a variation of pressure in the hot water circuit that is detected by the gas boiler, but standard boilers are not endowed with the capacity to transmit this information directly to other agents. However, the reaction of the boiler causes an increase in its electric consumption, due to the fact that electric fans
for the elimination of exhausted gases turn on. If this variation is detected and correctly interpreted, it may serve for generating the information that some agent is currently using the hot water resource. We can assume that the node associated with the boiler can detect and measure the variation and can interpret it on the basis of a simple model of the behavior of the boiler (a model of this kind, called the electric signature of the device, essentially allows one to associate the internal states of the device to the levels of electric consumption). If the node can, in addition, communicate this information, cognitive agents (using capacities 2 and 3 in Section 15.3.1, Definition 1) can infer the availability of the resource and regulate their behavior accordingly. In the system of Figure 15.1, it has been assumed that the HU, WM, DW, and HC have access to hot water. In this respect, the HU and the HC are not cognitive agents, because they do not get the information dispatched by the gas boiler node. However, when the HU opens the hot water tap, he can usually wait some time before his demand is satisfied, although it must be satisfied in a reasonable time, and the GB itself can decide to let the HC wait, in case it is currently occupied to produce hot water for the hot water circuit. Priorities have then to be explicitly fixed only for the WM and the DW, in such a way that the HU gets implicitly, after a possible waiting time, a higher priority and the HC gets implicitly a lower priority. 
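The electric signature mentioned above can be illustrated with a minimal sketch. It assumes that the boiler draws about 63 W in standby and about 27 W more when the exhaust fan runs (figures taken from the simulation results reported later in the chapter); the function name, the state labels, and the tolerance are hypothetical and are not part of the authors' simulator.

```python
# Minimal sketch of an "electric signature": a lookup from measured electric
# consumption to the internal state of the device.
# Assumed levels: 63 W in standby, plus about 27 W when the exhaust fan runs.
STANDBY_W = 63.0
FAN_DELTA_W = 27.0

# Each entry associates an internal state with its expected consumption (W).
SIGNATURE = {
    "idle": STANDBY_W,
    "producing_hot_water": STANDBY_W + FAN_DELTA_W,
}


def boiler_state(measured_w, tolerance_w=10.0):
    """Return the state whose expected load matches the measurement.

    The tolerance (hypothetical value) absorbs measurement noise; a reading
    that matches no entry of the signature is reported as "unknown".
    """
    for state, expected_w in SIGNATURE.items():
        if abs(measured_w - expected_w) <= tolerance_w:
            return state
    return "unknown"


# The node would broadcast this state so that cognitive agents can infer
# whether the hot water resource is currently in use.
print(boiler_state(63.0))
print(boiler_state(90.0))
```

A lookup table is enough here because the text describes the signature as an association between internal states and consumption levels; a real node would probably also need to filter transients.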
In practice, the WM and the DW, as well as the HC, need to acquire only limited and fixed quantities of hot water, so that their use of this resource is limited in time to, for example, 30 s. The desired priorities can therefore be obtained by imposing that (1) the WM and the DW do not acquire hot water while the HU is using this resource; (2) in case their demand cannot be satisfied immediately, the WM and the DW wait for a fixed amount of time before trying again and they make a fixed number of attempts before starting to produce hot water internally. In this setting, priorities are described by two parameters that represent, respectively, the wait time (t_w) between two consecutive attempts to get hot water and the number of possible attempts (wait cycles) before renouncing. In this way, there are practically three levels of priority: the highest one, given to the HU, the middle one, given to the WM and the DW, and the lowest one, given to the HC. Also in this case, using simulation, the consequences of different choices of the parameter values can be tested and their effect on user satisfaction can be evaluated. Table 15.2 shows the default values chosen in a series of tests.

TABLE 15.2
Wait Cycles and Wait Times Chosen for the Water Leveling Strategy

Appliance          Cycles    Wait Time (s)
Dishwasher         3         300
Washing machine    3         300

With a water leveling strategy of this kind, when the first agent demands hot water from the hot water circuit, it receives it, possibly at the expense of the house heating circuit. Due to the short time
and to the limited amount of hot water that are necessary, in general, to satisfy the demand, this causes only a small disturbance that the temperature regulation system of the house can easily handle. The node detects the activity of the gas boiler and informs the other agents that the GB is busy satisfying someone's demand. Cognitive agents that need hot water then start a waiting cycle. In case the HU demands hot water while the first agent is still getting it, the quality of the resource degrades, but this condition may only persist for 30 s. After this time has elapsed, the first agent stops its use and the HU, if still demanding, or the HC, if waiting, can be satisfied. Clearly, as long as the HU is using the resource, no other agent can interfere. If this situation holds for a long time, the WM and the DW will produce internally the hot water they need. By acting on the wait time and on the number of wait cycles, one can force the WM and the DW to prefer the hot water produced, at lower cost, by the boiler over hot water produced internally by means of electricity. Clearly, that may reduce the time required for completing a washing cycle.

15.4.4 Simulation Results

We now summarize the results of a series of tests performed in the simulation environment described in the previous section, in order to show how the simulator can be used in evaluating the performance of given policies. The period of simulation corresponds to two days of real operations, during which different conditions are considered. Simulation time has been reduced by setting the ratio between real time and simulation time equal to 5, so that, for instance, the heating cycle of the washing machine has a simulated time of 60 s, instead of the 300 s needed in reality. Table 15.3 describes the activities which involve agents during the two days in one of the situations considered.
In order to illustrate some of the assumptions we have made, it is useful to make the following points: (1) the gas boiler has been assumed to be in the summer configuration, so that the agent representing the house heating circuit is not active; (2) many instances of the generic electric device have been used, employing different models, to represent various appliances, such as a vacuum cleaner, a refrigerator, and an oven.

TABLE 15.3
Agents' Activities and Status

Activity                                    Day 1                            Day 2
Refrigerator                                ON; door opened up to 5 times    ON; door opened up to 8 times
Gas boiler                                  ON                               ON
WM activations                              1                                1
DW activations                              1                                2
Hot water uses (different time periods)     4                                6
Vacuum cleaner uses (about 1 h)             1                                2
Oven uses                                   1                                2
FIGURE 15.2 Power trace of PM/PL in simulation (power in W versus time in h.mm.ss).
Figure 15.2 shows the time evolution of the electric load, as measured by the PM, with the system in power leveling mode. Time 0 in the simulation is assumed to correspond to 7:35 a.m. The gas boiler and the refrigerator originate the small load (63 W + 50 W) detected before 7:47 a.m., when the vacuum cleaner (1200 W) is activated for about 1 hour. At about 8:00 a.m. the door of the refrigerator is opened repeatedly, causing the action of the compressor and an additional load (250 W). At 8:31 a.m. the washing machine is asked to start a washing cycle, but this causes an overload (3313 W in the presence of a threshold of 3000 W) and, after the chosen overload time (t_o = 60 s) has elapsed, operation is delayed and blackout is avoided. The washing machine enters a standby status and tries to activate repeatedly every 450 s, corresponding to the chosen suspension time t_s. It finally succeeds only at 8:55 a.m., after the vacuum cleaner has been turned off. The programmed washing cycle ends at about 9:40 a.m., without other interesting occurrences. In the observed situation, by modifying the parameters that characterize the priority of the WM, in particular by reducing t_s, one can reduce the dead time between the turning off of the vacuum cleaner and the activation of the washing machine, making the system in general more prompt in response to the user commands. On the other hand, this produces more occurrences of the unwanted overload situation, stressing the power system. By defining a functional that penalizes the occurrences of overload situations and rewards the dead time reduction (or better, the reduction of the ratio between dead time and task duration), one can compare the performance obtained by different choices of the parameters, in relation to the typical or average behavior of the system user. Figure 15.3 shows the time evolution of the electric load in the same system, with different parameters, at a later time.
In the considered situation, two cognitive agents, the WM and the DW, having different priorities, enter into conflict.
FIGURE 15.3 Power trace of PM/PL in simulation (power in W versus time in h.mm.ss, with DW and WM annotations).
The chosen parameters in this simulation are the following:
• WM: t_o = 30 s; t_s = 120 s
• DW: t_o = 10 s; t_s = 180 s
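The conflict between these two agents can be reproduced with a small discrete-time sketch of the power leveling rule (overload counter, standby, suspension timer). This is an illustration under stated assumptions, not the authors' simulator: the class and function names are hypothetical, each appliance is modeled as a constant load (2000 W for the WM and 1850 W for the DW, the heating loads quoted in the chapter), the base load is the 113 W of boiler and refrigerator, the time step is 1 s, and the PL shutdown logic is omitted.

```python
from dataclasses import dataclass

THRESHOLD_W = 3000  # overload threshold quoted in the text


@dataclass
class Agent:
    name: str
    power_w: float       # load while running (assumed constant)
    t_o: int             # overload time (s)
    t_s: int             # suspension time (s)
    running: bool = True
    overload_s: int = 0  # consecutive seconds of overload seen while running
    standby_s: int = 0   # seconds spent in the current standby period


def simulate(agents, base_load_w, duration_s):
    """Step the power leveling rule once per second and log state changes."""
    events = []
    for t in range(1, duration_s + 1):
        # The PM measures the total load and broadcasts the overload status.
        load = base_load_w + sum(a.power_w for a in agents if a.running)
        overloaded = load > THRESHOLD_W
        for a in agents:
            if a.running:
                # The counter runs only while the overload persists.
                a.overload_s = a.overload_s + 1 if overloaded else 0
                if a.overload_s >= a.t_o:
                    a.running, a.overload_s, a.standby_s = False, 0, 0
                    events.append((t, a.name, "standby"))
            else:
                a.standby_s += 1
                if a.standby_s >= a.t_s:
                    a.running = True
                    events.append((t, a.name, "resume"))
    return events


wm = Agent("WM", power_w=2000, t_o=30, t_s=120)
dw = Agent("DW", power_w=1850, t_o=10, t_s=180)
for event in simulate([wm, dw], base_load_w=113, duration_s=200):
    print(event)
```

In this run the DW, which has the shorter overload time, enters standby after 10 s, so the load drops below the threshold before the overload time of the WM elapses and the WM keeps running. When the DW resumes after its suspension time it causes a new overload and yields again, because the WM load is held constant in this simplified model.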
With this choice, the DW has lower priority than the WM and, in case of overload, the DW yields first. In addition, the WM tries more often to access the resource in case of scarce availability. Time 0 in the simulation is assumed to correspond to 10:28 a.m. The gas boiler and the refrigerator originate the small load (63 W + 50 W) detected before 10:41 a.m., when the vacuum cleaner (1200 W) is activated for about 40 min. At 11:03 a.m., the human user is assumed to use hot water, causing the electric consumption of the boiler to increase (by about 27 W). At about 11:10 a.m., the DW is activated and it starts the prewashing cycle. When, a few minutes later, the DW tries to use more electricity to heat the water, an overload occurs, and it occurs again when, a few minutes later, the WM is asked to activate. Therefore, both agents enter a standby status. From 11:10 a.m. until about 11:22 a.m., the DW and the WM try unsuccessfully to start their cycles, until the vacuum cleaner is turned off and the WM can activate. It can be remarked that the vacuum cleaner has been deactivated during an overload condition, before the WM overload time had elapsed and, therefore, the operation of the WM has not been stopped. A few minutes later, the DW also tries to activate, but this causes a conflict with the WM and, again, an overload occurs. In this case, each one of the involved agents can decide to renounce and to delay the completion of its task; however, the DW, having a lower overload time, renounces first. The next time the DW tries to activate, at about 11:26 a.m., no conflict arises, because the WM has already reduced its consumption. Figure 15.4 shows the time evolution of the electric load, as measured by the PM, with the system in power leveling and water leveling modes during another test and Table 15.4 describes the activities that involve agents
FIGURE 15.4 Power trace of PM/PL in simulation (power in W versus time in h.mm.ss; markers 1 to 12 on the trace are described in Table 15.4).
during the considered period. The priorities in the use of hot water have been determined by the following choice of parameters:
• WM: t_w = 300 s; wait cycles = 5
• DW: t_w = 500 s; wait cycles = 3
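The effect of these two parameters can be sketched as a simple retry loop. The sketch rests on one reading of the rule, stated here as an assumption: an agent makes a first attempt, then waits t_w and retries for at most the given number of wait cycles, and heats water internally if the boiler is still busy after the last cycle. The function name and the `boiler_busy` callback are hypothetical.

```python
def acquire_hot_water(boiler_busy, t_w, wait_cycles, start_s=0):
    """Return (source, time_s) for one hot water request.

    boiler_busy: function of time (s) telling whether the resource is in use,
                 for example by the human user (hypothetical interface).
    t_w:         wait time between two consecutive attempts (s).
    wait_cycles: number of wait cycles allowed before renouncing.
    """
    t = start_s
    for _ in range(wait_cycles):
        if not boiler_busy(t):
            return "boiler", t
        t += t_w  # wait one cycle before the next attempt
    # Last attempt after the final wait cycle.
    if not boiler_busy(t):
        return "boiler", t
    return "internal", t  # give up and heat the water electrically


# Hypothetical scenario: the human user keeps the tap open for the first 850 s.
def tap_open(t):
    return t < 850


print(acquire_hot_water(tap_open, t_w=300, wait_cycles=5))  # WM parameters
print(acquire_hot_water(tap_open, t_w=500, wait_cycles=3))  # DW parameters
```

Under this reading, both parameter choices above allow the same maximum waiting time before falling back to internal heating (5 × 300 s = 3 × 500 s = 1500 s), but the WM polls the resource more often and so tends to obtain it sooner once it becomes free.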
TABLE 15.4
Agents' Activities and Status

Marker    Time (hh.mm.ss)    Power (W)    Actions/Status
1         11.20.03           113          Gas boiler is ON.
2         11.23.37           1313         Vacuum cleaner activates.
3         11.26.13           1340         Hot water tap opens (boiler fan activates).
4         11.46.51           1530         DW starts prewashing cycle.
5         11.55.12           1340         DW ends prewashing cycle.
6         12.01.00           1313         Hot water tap closes (boiler fan deactivates).
7         12.02.07           1530         WM starts heating cycle.
8         12.03.31           1720         DW starts washing cycle.
9         12.25.07           1833         WM ends heating cycle/starts washing cycle.
10        12.55.33           1643         DW ends washing cycle.
11        13.31.07           1313         WM ends washing cycle.
12        13.34.56           113          Vacuum cleaner deactivates.

The important point, here, is that the use of hot water by the WM and the DW reduces the global consumption of electricity, but, as there is competition for the hot water resources, in particular with the human user, the completion of tasks is delayed. Starting the simulation, we can see, as in the previous situations, the load due to the boiler in standby condition (1), to the vacuum cleaner (2) and again to the boiler (3), when hot water is requested by the human user. At about 11:47 a.m., the DW and the WM are asked to activate. While the DW starts its prewashing cycle, which does not require the use of hot water, with
a corresponding increase of the load (4), the WM enters a standby status, waiting until the hot water resource becomes available. When the DW should start the washing cycle, it also finds that hot water is not available and enters a standby status, reducing the load (5). In the following period, both the WM and the DW try to get the desired resource. At 12:01 p.m., the human user stops using hot water, the boiler consumption decreases (6), and, shortly after, the WM can obtain the resource and start its cycle (7). Clearly, because less electricity is needed for bringing the water temperature to the required value, the consumption of the WM is lower than in the previous simulation (190 W instead of 2000 W). At 12:03 p.m., the DW also gets hot water and can start the washing cycle (8), with an additional load of 190 W instead of 1850 W. Then, the load evolves according to the phases of the washing cycle of the WM (9) and (10), of the DW (11), and the operation of the vacuum cleaner (12). Comparing this behavior of the system with one in which the appliances are not given the possibility to use the hot water produced by the boiler, one finds that, because of the employed power leveling and water leveling policies, one saves globally 20% of the energy and avoids overloads.
15.5 Conclusion
Conceptual instruments for developing a formal, useful theory of HAS have been derived from the MAS theory. The resulting approach has been used for classifying the components and analyzing the structure of real systems. This has facilitated the development and construction of a rich simulation environment, where overall control strategies for power leveling and water leveling have been tested. The various choices we have utilized in this work are motivated, in particular, by the present level of technology in the industry of white goods and of other appliances that populate the domestic environment. Among the points of the theory that deserve to be studied and possibly revised in the future, a central one concerns the definition of domotic agent. This concept needs to be made, at the same time, formally more precise, in order to gain efficacy in deriving the notion of system and in classifying systems according to their components and associated features, and more versatile, in order to apply to the various devices one may reasonably want to include in a home automation system, now and in the future. Another very important point is the one that concerns the characterization of the overall structure of the system. In our approach the structure is described by a set of rules and a set of laws, but, although very flexible, this description is relatively poor and it does not offer tools for classification or a direct way for developing design procedure. The problem of a conceptual definition of the overall system structure is complicated also by the fact that the main producers of appliances and devices for home management have not yet been able to propose a unifying
point of view about the architecture of possible, real home automation systems. In particular, several basic questions, such as those concerning the levels of decentralization, of autonomy, and of interoperability that will be attainable at reasonable costs and that will be accepted by the consumer, are still waiting for illuminating answers. Related to this, there is also the choice of the communication system and the characteristics of the communication network. The use of the power line we have assumed in our example has the advantage of avoiding wiring the domestic environment, but its practical implementation requires suitable interfacing devices. Wireless systems are undergoing a fast evolution and will probably represent the choice of the future, but standards for this home application are not yet completely established. Further advances in the directions we have outlined above will make the basic conceptual tools described in this work clearer and more feasible and, in particular, will give the basis for developing tools and systematic design procedures. In turn, this should facilitate the definition of control and regulation strategies that can be practically implemented and that improve the global system performance. Here, we have only touched on this point, without commenting on the details of the construction of a suitable functional that measures the level of performance of the system. The simulation procedure we have described, however, can be considered conceptually as a practical tool for evaluating the outcomes of cooperation and coordination strategies and the effects of the choice of defining parameters.
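Since the chapter leaves the construction of the performance functional open, the following is only a hypothetical sketch of what such a functional could look like, based on the two ingredients named in Section 15.4.4: a penalty on the number of overload occurrences and a penalty on the ratio between dead time and task duration. The function name, the linear form, and the weights are assumptions, and the numbers in the example are made up for illustration rather than taken from the reported tests.

```python
def performance_cost(n_overloads, dead_time_s, task_duration_s,
                     w_overload=1.0, w_dead=1.0):
    """Hypothetical cost: lower is better.

    n_overloads:      number of overload occurrences observed in a run.
    dead_time_s:      time between resource availability and task start (s).
    task_duration_s:  duration of the task itself (s).
    w_overload, w_dead: weights expressing how the two terms are traded off.
    """
    if task_duration_s <= 0:
        raise ValueError("task_duration_s must be positive")
    return w_overload * n_overloads + w_dead * dead_time_s / task_duration_s


# Made-up outcomes of two simulated runs with different suspension times:
# a short t_s gives less dead time but more overload occurrences.
short_ts = performance_cost(n_overloads=6, dead_time_s=60, task_duration_s=2700)
long_ts = performance_cost(n_overloads=2, dead_time_s=480, task_duration_s=2700)
print("short t_s:", short_ts)
print("long t_s:", long_ts)
```

Evaluating a cost of this kind over many simulated runs, with weights chosen to reflect the typical behavior of the user, is one way the simulator could be used to compare parameter choices.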
References
1. Aisa V., Meloni F., “Regolazione degli assorbimenti di potenza delle utenze domestiche,” Automazione e Strumentazione, no. 7, 1999.
2. Aisa V., “Tecnologie ICT per l’innovazione dell’elettrodomestico: il caso Merloni,” Automazione e Strumentazione, no. 10, 2002.
3. Aisa V., “KONNEX/LONTALK Compliant Method for Connecting White Goods to a Home Network at a Very Low Cost,” Proceedings of the 3rd International Conference on Energy Efficiency in Domestic Appliances and Lighting EEDAL03, Turin, Italy, 2003.
4. Aisa V., Falcioni P., Pracchi P., “Connecting white goods to a home network at a very low cost,” in Proceedings of the Congress on International Appliance Manufacturing, Milano, Italy, 2004.
5. Conte G., Scaradozzi D., “Viewing Home Automation Systems as Multiple Agents Systems,” in Proceedings of the Workshop on Multi-Agent System for Industrial and Service Robotics Applications, Padova, Italy, 2003.
6. Conte G., Scaradozzi D., “Applying MAS Theory to Complex Home Automation Systems,” in Proceedings of the Workshop on Modeling and Control of Complex Systems, Ayia Napa, Cyprus, 2005.
7. Cotti M., Casa G., “Il Telegestore: una nuova infrastruttura di comunicazione per i Servizi a Valore Aggiunto,” in Atti 45° Convegno Nazionale ANIPLA, Ancona, Italy, 2001.
8. Dutta-Roy A., “Bringing home the internet,” IEEE Spectrum, March 1999.
9. Falcioni P., Minni F., Scaradozzi D., “Innovative Techniques to Maximize Energy Savings in a Domestic Environment,” in Proceedings of the 9th World Multiconference on Systemics, Cybernetics and Informatics (WMSCI 2005), Orlando, Florida, July 2005.
10. Ferber J., Multi-Agent Systems. An Introduction to Distributed Artificial Intelligence, Addison-Wesley, London, 1999.
11. Flores-Mendez R.A., Towards a Standardization of Multi-Agent System Frameworks, Crossroads archive, 5, ACM Press, New York, NY, 1999.
12. Scaradozzi D., Conte G., Aisa V., “Insertion of Boilers in Home Automation Systems,” Proceedings of the 3rd International Conference on Energy Efficiency in Domestic Appliances and Lighting EEDAL03, Turin, Italy, 2003.
13. Scaradozzi D., “Strumenti di Simulazione e Analisi per Sistemi di Automazione Domestica,” Proceedings of II Conferenza su Tecnologia ed Economia della Domotica: uso razionale dell’energia nelle abitazioni domotizzate, Pavia, Italy, 2004.
14. Scaradozzi D., “Methodologies and Techniques for Analysis and Design of Home Automation Systems,” Ph.D. dissertation, Università Politecnica delle Marche, Ancona, Italy, 2005.
15. Sycara K., “MultiAgent Systems,” AI Magazine, 19 (2), 1998.
16. http://www.aaai.org/AITopics/html/multi.html
17. Edwin G., Cox M. T., “Resource Coordination in Single Agent and Multiagent Systems,” in Proceedings of the 13th International Conference on Tools with Artificial Intelligence, Dallas, Texas, 2001.
18. Vishwanathan V., McCalley J., Honavar V., “A Multiagent System Infrastructure and Negotiation Framework for Electric Power Systems,” Power Tech Proceedings, 2001.
19. Conte G., Scaradozzi D., Perdon A., Cesaretti M., Morganti G., “A Simulation Environment for the Analysis of Home Automation Systems,” in Proceedings of the 15th Mediterranean Conference on Control and Automation, Athens, Greece, 2007.
16 Multi-Robot Social Group-Based Search Algorithms
Bud Fox, Wee Tiong Ong, Heow Pueh Lee, and Albert Y. Zomaya

CONTENTS
16.1 Introduction
16.2 Social Group-Based Search Algorithm
    16.2.1 Coordination of Alpha and Beta Robots
    16.2.2 Movement of Alpha Robots
    16.2.3 Movement of Beta Robots
    16.2.4 Multiradial Search
    16.2.5 Multiradial Search with Voronoi Domain Decomposition
    16.2.6 Robot Dispersion
16.3 Code Structure
16.4 Algorithm Comparison
    16.4.1 Standard Search
    16.4.2 Voronoi Search
    16.4.3 Goldsmith Search
    16.4.4 Target Motion
    16.4.5 Parameters for Search Methods
16.5 Results
    16.5.1 Standard Search
        16.5.1.1 Linear Target Motion without Prediction
        16.5.1.2 Linear Target Motion with Prediction
        16.5.1.3 Nonlinear Target Motion without Prediction
        16.5.1.4 Nonlinear Target Motion with Prediction
        16.5.1.5 Target Random Walk
    16.5.2 Voronoi Search
        16.5.2.1 Linear Target Motion without Prediction
        16.5.2.2 Linear Target Motion with Prediction
        16.5.2.3 Nonlinear Target Motion without Prediction
        16.5.2.4 Nonlinear Target Motion with Prediction
        16.5.2.5 Target Random Walk
    16.5.3 Goldsmith Search
        16.5.3.1 Linear Target Motion
        16.5.3.2 Nonlinear Target Motion
        16.5.3.3 Target Random Walk
    16.5.4 Algorithm Execution Time
    16.5.5 Radial Search Execution Time
16.6 Conclusions
References

A multi-robot social group-based search algorithm is developed using Matrix Laboratory [1] to simulate a group of robots detecting and tracking a target moving in a linear, nonlinear, and random walk manner. Three algorithms are pursued: (1) a robot search algorithm proposed by Goldsmith and Robinett [2], (2) a standard search algorithm using a multiradial search function and dispersion behavior, and (3) a Voronoi search algorithm using a Voronoi decomposition of search space prior to the commencement of multiradial search. The robots are divided into two social groups: a faster moving alpha (α) group, and a more energy-conserving beta (β) group.
16.1 Introduction
The first search theory was created during World War II by Koopman [3] (see also Reference [4]) to help the U.S. Navy locate enemy ships and submarines. Since then there has been considerable progress in this field. Of late, much work on cooperative robot search techniques has been performed; for example, Jennings et al. [5] discussed the use of a team of cooperative search robots, Goldsmith and Robinett [2] introduced alpha (α) and beta (β) robot search, Burgard et al. [6] studied collaborative multi-robot exploration, Singh and Thayer [7] researched the behavior of a team of search and rescue (SAR) robots based on immunology studies, Pack and Mullins [8] pursued a universal search algorithm that suits a group of robots, Guo et al. [9] explored a multi-robot security application, and Ablavsky and Snorrason [10] studied optimized search methods. There were only limited results on optimal search for a moving target before 1977 [11]. However, more research on a moving target has been done since then. Some examples of this research are the work of Kan [12] concerning an optimal search for a moving target, the explorations of Discenza and Stone [13] on an optimal survivor search, the discussion by Stone [14] of the theory of optimal search, and the optimal search plan introduced by Iida [15] concerning the minimum expected risk for locating a target. A recent review of search theory, involving SAR decision support, is presented by Frost and Stone [16]. The aforementioned research has had an indirect influence on the development of this work.
The study here aims to use various ideas from the traditional SAR theory and to merge them with a more heuristic social group-based oriented search mechanism, to determine the effectiveness of the detection and tracking ability of a moving target by groups of robots. The initial idea for the development of this research comes from the work of Goldsmith and Robinett [2]. A heuristic multi-robot social group-based search algorithm, involving two groups of robots, an α and a β group, is developed to detect a stationary or moving target. The aggressive α robots are more active and hence more energy consuming, whereas the β robots are less so. The α and β robots are assumed to operate in ideal conditions where there are no obstacles, ambient noise, or communication errors. In addition, the robots in the search space are able to locate a stationary target within a particular detection radius and consequently the robots converge towards the target. The extensions made here to the social group-based search method involve the robots being programmed to perform a multiradial search in order to locate the target when the target is not initially within the radius of detection of the robots. In addition, the robots are designed to disperse from the center of mass (CM) of the robot group, or disperse in a manner based on a repulsive inverse square (electrostatic) force law once they become too congested or close to each other. A Voronoi domain decomposition method is also introduced to partition the robots into regions of search space to more efficiently perform a multiradial search. The work here is designed to lay the foundations of future studies in planar and three-dimensional submarine detection and tracking, by ships and aircraft, in both cooperative and noncooperative search scenarios. 
The cooperative searches involve both parties trying to locate each other, as in a SAR situation, and the noncooperative searches are typical in warfare environments where both parties search for each other but attempt to avoid detection. Section 16.2 introduces the idea of a social group-based search algorithm. Section 16.3 discusses the general structure of the code, which is written in MATLAB™ (Matrix Laboratory) [1]. In Section 16.4, the original algorithm and two new variations of it are explained. Section 16.5 presents experimental results concerning the three search algorithms, three types of target motion, and different robot behaviors with and without target motion prediction. Section 16.6 provides a summary of the findings of this research work, and discusses some possible extensions that can be made upon the current foundations.
16.2 Social Group-Based Search Algorithm
The social group-based search algorithm is a search method that involves two social groups of robots, α and β robots, coordinating their search to locate a moving target within a predefined search space. The following sections concern coordination and movement of the two social groups, and ideas are
introduced here to supplement the original algorithm, that is, multiradial search, Voronoi domain decomposition, inverse square law dispersion, and dispersion from the robot-group CM.

16.2.1 Coordination of Alpha and Beta Robots
The original social group-based search algorithm introduced in Goldsmith and Robinett [2] involves two groups of robots in search of a moving target within a defined search area. A robot belongs to the α group when the target lies within its detection radius; the remaining robots form the β group. The motion of the α robots in the original algorithm is toward a randomly chosen α robot. In this work, the closest α robot to the target is known as the best α robot, and the remaining α robots are defined to move towards this best α robot, as shown in Figure 16.1. The β robots move towards the CM of the α group (CMα), as shown in Figure 16.2. The position vector $\mathbf{r}_c$ of the CM of a group of point masses $m_i$, $i = 1, \ldots, n$, with position vectors $\mathbf{r}_i = (x_i, y_i)^T$ (for planar search) is [17]

$$\mathbf{r}_c = \frac{\sum_{i=1}^{n} m_i \mathbf{r}_i}{\sum_{i=1}^{n} m_i} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{r}_i \qquad (16.1)$$

where the robot masses $m_i$ are taken to be constant and equal in this study, so that they cancel to give the second equality.
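As a minimal sketch of Equation (16.1), the function below computes the CM of a set of planar positions in plain Python. The function name `center_of_mass` and the optional `masses` argument are illustrative assumptions, not part of the authors' MATLAB code; with equal masses the result reduces to the arithmetic mean of the positions.

```python
def center_of_mass(positions, masses=None):
    """Return the centre of mass (x, y) of planar points, Equation (16.1).

    positions: list of (x, y) tuples.
    masses:    optional list of masses; equal masses are assumed when omitted,
               in which case the CM is simply the mean position.
    """
    if not positions:
        raise ValueError("at least one position is required")
    if masses is None:
        masses = [1.0] * len(positions)
    total = sum(masses)
    x = sum(m * p[0] for m, p in zip(masses, positions)) / total
    y = sum(m * p[1] for m, p in zip(masses, positions)) / total
    return (x, y)
```

For example, four robots on the corners of a square of side 2 have their CM at the center of the square, (1.0, 1.0).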
FIGURE 16.1 The motion of α robots towards the closest α robot to the target. (Figure labels: target, best α robot, CMα, α robots, and β robots.)
FIGURE 16.2 The motion of β robots towards the CM of the α group, CMα. (Figure labels: target, best α robot, CMα, α robots, and β robots.)
16.2.2 Movement of Alpha Robots
The α group is the more aggressive or faster moving of the two social groups, and the motion of these robots is towards the best α robot, while the best α robot itself remains stationary. The new position vector $\mathbf{p}_{\alpha_i}(t+1)$ of α robot i is

$$\mathbf{p}_{\alpha_i}(t+1) = \mathbf{p}_{\alpha_i}(t) + w_\alpha \left( \mathbf{p}_c(t) - \mathbf{p}_{\alpha_i}(t) \right) \qquad (16.2)$$

where $\mathbf{p}_c(t)$ is the position vector of the α robot closest to the detectable target, and $w_\alpha$ is the weighting or scalar multiple of the direction vector towards $\mathbf{p}_c(t)$, which defines how fast the α robots move. The same weight $w_\alpha$ is used for all α robots. If $w_\alpha \le 1$, the robots approach the best α robot but do not progress beyond it. If $w_\alpha > 1$, the α robots move beyond the best α robot. Figure 16.1 shows the motion of the α robots towards the closest α robot to the target; the β robots remain stationary at this stage.

16.2.3 Movement of Beta Robots
The β group is less aggressive in searching the space than the α group and moves more slowly, and its motion is directed towards the location $\mathbf{r}_{c\alpha}(t)$ of CMα, as shown in Figure 16.2. However, not all β robots will move towards CMα; only those within a certain critical distance $d_{c\beta}$ of CMα will change position. This is to minimize overcrowding, high density, or premature convergence of the β social group towards the target.
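The two rules stated so far, the α step of Equation (16.2) and the critical-distance gate on β motion, can be sketched as follows. This is an illustrative Python sketch rather than the authors' implementation: the function names are hypothetical, and treating "within the critical distance" as a less-than-or-equal test is an assumption.

```python
import math

def best_alpha(alphas, target):
    """Return the position of the alpha robot closest to the target."""
    return min(alphas, key=lambda p: math.dist(p, target))

def step_alphas(alphas, target, w_alpha):
    """Apply Equation (16.2) to every alpha robot.

    The best alpha robot coincides with p_c, so its update is zero and it
    stays where it is, as described in the text.
    """
    pc = best_alpha(alphas, target)
    return [(p[0] + w_alpha * (pc[0] - p[0]),
             p[1] + w_alpha * (pc[1] - p[1])) for p in alphas]

def betas_allowed_to_move(betas, cm_alpha, d_crit):
    """Indices of beta robots within the critical distance of CM_alpha.

    Assumption: the boundary case (distance equal to d_crit) counts as inside.
    """
    return [i for i, p in enumerate(betas) if math.dist(p, cm_alpha) <= d_crit]
```

With `w_alpha = 0.5` a robot covers half the gap to the best α robot in one step; with `w_alpha = 1.5` it lands beyond the best α robot, which is the overshoot described above for $w_\alpha > 1$.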
In a similar manner to that in Reference [2], the new position $\mathbf{p}_{\beta_i}(t+1)$ of the ith β robot, $\beta_i$, is

$$\mathbf{p}_{\beta_i}(t+1) = \mathbf{p}_{\beta_i}(t) + w_{\beta_i} \left( \mathbf{r}_{c\alpha}(t) - \mathbf{p}_{\beta_i}(t) \right) \qquad (16.3)$$

where $\mathbf{p}_{\beta_i}(t)$ is the current β robot position, $\mathbf{r}_{c\alpha}(t)$ is the position vector of CMα, and $w_{\beta_i}$ is a weight for the ith β robot defined as:

$$w_{\beta_i} = \frac{s_{\beta_i}}{\sum_{j=1}^{n_\beta} s_{\beta_j}} \qquad (16.4)$$

where $s_{\beta_i}$ is the social status value or closeness measure of $\beta_i$ to the target, for the group of size $n_\beta$, which is determined by ranking the robots in terms of proximity to the target. The weight $w_{\beta_i}$ influences the speed of motion of robot $\beta_i$. The motion of the β group is modified by the authors here to also preempt the next target position without excessive convergence. See Figure 16.3, where the direction vector used may be the vector difference $\mathbf{r}_{c\alpha}(t) - \mathbf{r}_{c\alpha}(t-1)$ of the CMα at consecutive time steps, t − 1 and t. Hence, the new $\beta_i$ position is defined as:

$$\mathbf{p}_{\beta_i}(t+1) = \mathbf{p}_{\beta_i}(t) + w_{\beta_i} \left( \mathbf{r}_{c\alpha}(t) - \mathbf{r}_{c\alpha}(t-1) \right) \qquad (16.5)$$

for t > 1.
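Equations (16.3) to (16.5) can be illustrated with the short Python sketch below. The text says only that the status values are obtained by ranking the β robots by proximity to the target; the specific mapping used here (closest robot receives status $n_\beta$, farthest receives 1) is an assumption made for illustration, as are all function names.

```python
import math

def beta_weights(betas, target):
    """Rank-based weights of Equation (16.4).

    Assumed status rule: the closest beta robot gets status n_beta and the
    farthest gets 1, so closer robots receive larger weights.
    """
    n = len(betas)
    order = sorted(range(n), key=lambda i: math.dist(betas[i], target))
    status = [0] * n
    for rank, i in enumerate(order):
        status[i] = n - rank
    total = sum(status)
    return [s / total for s in status]

def step_beta_towards_cm(p, w, cm_now):
    """Equation (16.3): move towards the current CM of the alpha group."""
    return (p[0] + w * (cm_now[0] - p[0]), p[1] + w * (cm_now[1] - p[1]))

def step_beta_predictive(p, w, cm_now, cm_prev):
    """Equation (16.5): move along the displacement of CM_alpha between
    consecutive time steps, which extrapolates the target's motion."""
    return (p[0] + w * (cm_now[0] - cm_prev[0]),
            p[1] + w * (cm_now[1] - cm_prev[1]))
```

Because the weights are normalized by the sum of the status values, they always add up to 1 across the β group, so a larger group spreads the same total "pull" over more robots.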
FIGURE 16.3 The motion of β robots in the direction of the predicted location of CMα at the next time step t + 1. (Figure labels: predicted CM point, α robot, best α robot, CMα, target, and β robot.)
16.2.4 Multiradial Search
In this work, the robots are programmed to perform a concurrent radial search when they fail to detect the target. The search proceeds in concentric circles separated by a radial distance $dr$, with arc steps $dl = r\,d\theta$ through an angular displacement $d\theta$. A robot progresses outwards until it is within a disc of radius $r_T$ of the target position $\mathbf{T} = [T_x, T_y]^T$. All robots from both social groups start this concurrent radial search from their respective locations at the time at which detection of the target is lost.

16.2.5 Multiradial Search with Voronoi Domain Decomposition
The standard radial search, commencing at the time at which detection of the target is lost, results in overlapping radial search paths of the β robots and is not coordinated to cover the search area efficiently. Various search methods presented in Ablavsky and Snorrason [10], including the raster, box-spiral, and zamboni coverage patterns, may be used to search for a moving target. A Voronoi domain decomposition [1,18] is performed here to assign robots to subregions of the search space so that robot energy is not consumed excessively by overlapping search areas. The decomposition is made with the robots in their current positions, and then the CM of each Voronoi cell is determined using the Voronoi cell boundary points. The robots then move from the current positions in their cells to their respective cell CMs, as shown in Figure 16.4. The multiradial search then progresses for each robot in its respective cell.
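The concentric-circle path of Section 16.2.4 can be sketched for a single robot as below. This is a simplified illustration, not the authors' code: the target is held fixed during the search, a maximum radius `r_max` is added so that the loop terminates, and the function name and return convention are assumptions.

```python
import math

def radial_search(start, target, dr, dtheta, r_detect, r_max):
    """Walk concentric circles of radius dr, 2*dr, ... around `start`.

    Each ring is traversed in angular steps of dtheta (arc length r*dtheta).
    Returns (steps_taken, position) as soon as the robot is within r_detect
    of the target, or None if r_max is exceeded without detection.
    """
    if math.dist(start, target) <= r_detect:
        return 0, start
    steps_per_ring = max(1, int(round(2 * math.pi / dtheta)))
    steps = 0
    ring = 1
    while ring * dr <= r_max:
        r = ring * dr
        for k in range(steps_per_ring):
            theta = k * dtheta
            pos = (start[0] + r * math.cos(theta),
                   start[1] + r * math.sin(theta))
            steps += 1
            if math.dist(pos, target) <= r_detect:
                return steps, pos
        ring += 1
    return None
```

In the multiradial search, every robot would run such a path concurrently from its own position, which is why uncoordinated starting points can lead to the overlapping paths noted in Section 16.2.5.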
FIGURE 16.4 A Voronoi domain decomposition concerning all robots, and placement of the robots at their respective cell CMs prior to radial search. (Figure labels: target, center of Voronoi cell, and robot.)
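The repositioning step of Section 16.2.5 can be approximated without a computational-geometry library. The sketch below is a swapped-in technique, not the authors' method: instead of constructing exact Voronoi cells and using their boundary points, it samples the search rectangle on a regular grid, assigns each sample to its nearest robot (which is the defining property of a Voronoi cell), and averages the samples of each cell. The function name, the rectangular domain, and the `samples` resolution are all assumptions.

```python
def voronoi_cell_centroids(robots, xmin, xmax, ymin, ymax, samples=50):
    """Approximate the centroid of each robot's Voronoi cell on a grid.

    Every grid sample belongs to the cell of its nearest robot; the mean of
    the samples in a cell approximates that cell's centre of mass.
    """
    sums = [[0.0, 0.0, 0] for _ in robots]
    for i in range(samples):
        x = xmin + (i + 0.5) * (xmax - xmin) / samples
        for j in range(samples):
            y = ymin + (j + 0.5) * (ymax - ymin) / samples
            k = min(range(len(robots)),
                    key=lambda r: (robots[r][0] - x) ** 2 + (robots[r][1] - y) ** 2)
            sums[k][0] += x
            sums[k][1] += y
            sums[k][2] += 1
    # A robot whose cell received no samples keeps its current position.
    return [(sx / c, sy / c) if c else robots[k]
            for k, (sx, sy, c) in enumerate(sums)]
```

For two robots at (0.1, 0.5) and (0.9, 0.5) in the unit square, the cells are the left and right halves, so the robots would be moved to approximately (0.25, 0.5) and (0.75, 0.5) before starting their radial searches.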
16.2.6 Robot Dispersion
The target-following or target-tracking behavior may result in the robots becoming too congested. Hence, a dispersion technique based on the inverse square electrostatic force law is used to mutually repel the robots. The force of repulsion is

$$\mathbf{F}_{ij} = \frac{1}{4\pi\varepsilon_0} \frac{q_i q_j}{r_{ij}^2} \hat{\mathbf{r}}_{ij}, \qquad (16.6)$$

where $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$ is the position vector between the origins of the point charges (robots) $q_i$ and $q_j$, $r_{ij} = \|\mathbf{r}_{ij}\|$ is its magnitude, $\hat{\mathbf{r}}_{ij} = \mathbf{r}_{ij}/r_{ij}$ is the normalized direction vector, and $\varepsilon_0$ is the permittivity constant [19]. The constant $1/4\pi\varepsilon_0$ can be replaced here by an appropriate constant for the purpose of performing a heuristic position update based on inverse square force-induced displacement. In this sense, no integration is performed to obtain positions from velocities and accelerations; rather, a heuristic position-update mechanism mutually repositions all robots. A vector sum of all forces acting on all n bodies due to the n(n − 1)/2 interactions is made, and the positions are updated as follows:

$$\mathbf{p}_i(t+1) = \mathbf{p}_i(t) + \mathbf{Q}_i(t), \qquad (16.7)$$

where $\mathbf{p}_i(t+1)$ is the new heuristically updated position of robot i, $\mathbf{p}_i(t)$ is the current position of the ith robot, and $\mathbf{Q}_i(t)$ is the force-induced displacement on body $i \in [1, n]$ due to the other bodies at time t. This inverse square law dispersion is applied to all robots when the average displacement of the robots from the target satisfies $\bar{d} < \bar{d}_{crit}$, for $\bar{d}_{crit}$ a critically small displacement, or when the robot group density ρ is too large. It is applied only to a selection of robots when the distance between those concerned is smaller than some critical distance, that is:

$$\|\mathbf{r}_{ij}\| < r_{crit}. \qquad (16.8)$$
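A minimal Python sketch of the heuristic update in Equations (16.6) to (16.8) follows. It assumes unit "charges" on every robot, folds $1/4\pi\varepsilon_0$ and the charges into a single constant `k`, and skips exactly coincident robots to avoid division by zero; these choices and the function names are illustrative assumptions. Passing `r_crit` restricts the repulsion to pairs satisfying Equation (16.8); omitting it gives the total dispersion over all pairs.

```python
import math

def electrostatic_displacements(positions, k, r_crit=None):
    """Sum inverse-square repulsions over the n(n-1)/2 robot pairs.

    Returns one displacement Q_i per robot. If r_crit is given, only pairs
    closer than r_crit interact (critical dispersion, Equation (16.8)).
    """
    n = len(positions)
    disp = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            r = math.hypot(dx, dy)
            if r == 0.0 or (r_crit is not None and r >= r_crit):
                continue
            f = k / r ** 2                 # magnitude, Equation (16.6)
            ux, uy = dx / r, dy / r        # unit vector from j towards i
            disp[i][0] += f * ux
            disp[i][1] += f * uy
            disp[j][0] -= f * ux           # equal and opposite on robot j
            disp[j][1] -= f * uy
    return [tuple(d) for d in disp]

def disperse(positions, k, r_crit=None):
    """Equation (16.7): p_i(t+1) = p_i(t) + Q_i(t)."""
    q = electrostatic_displacements(positions, k, r_crit)
    return [(p[0] + d[0], p[1] + d[1]) for p, d in zip(positions, q)]
```

Two robots 2 units apart with `k = 1` are each pushed 0.25 units directly away from the other; with `r_crit = 1` the same pair is left untouched because it does not satisfy Equation (16.8).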
An alternative method was also experimented with and involved a dispersion of the entire (α + β) group in the direction away from the CM of the whole group; that is, the new robot positions are

$$\mathbf{p}_i(t+1) = \mathbf{p}_i(t) + k_d \left( \mathbf{p}_i(t) - \mathbf{r}_c(t) \right), \quad i \in [1, n] \qquad (16.9)$$

where $\mathbf{r}_c(t)$ is the position vector of the CM of the entire (α + β) group and $k_d$ is the dispersion constant specifying the magnitude of dispersion.
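Equation (16.9) amounts to scaling the whole group outwards about its CM, as the following Python sketch shows. Equal robot masses are assumed, as in Equation (16.1), and the function name is hypothetical.

```python
def disperse_from_cm(positions, k_d):
    """Equation (16.9): push every robot away from the group's CM.

    Each robot's offset from the CM grows by the factor (1 + k_d), so the
    CM itself is unchanged while the group spreads out.
    """
    n = len(positions)
    cx = sum(p[0] for p in positions) / n
    cy = sum(p[1] for p in positions) / n
    return [(p[0] + k_d * (p[0] - cx), p[1] + k_d * (p[1] - cy))
            for p in positions]
```

Unlike the inverse square rule, this update moves distant robots farther than near ones, since the displacement is proportional to the distance from the CM.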
16.3 Code Structure
The general structure of the MATLAB code is shown in Figure 16.5. The initial conditions may involve placing the robots on a grid or using uniformly random positions. The main time loop iterates through a predetermined number of time steps t_total, where at each step t: (1) the target is moved, (2) the α and β groups are identified, (3) an active radial search to determine the α group may be required and may use Voronoi domain decomposition, (4) the motion of the groups is defined with the option of target preemption, (5) robot dispersion takes place if the robot density is too high for the target tracking case, and finally, (6) summary statistics are obtained to determine the convergence of the search method. The average displacement over time is then plotted to determine the convergence properties of a range of robot populations. The execution times for the multiradial search and the whole simulation are also detailed for comparison purposes (see Section 16.5, Results).

    initial_conditions()
    for t = 1:t_total
        target_move()                    % prescribed target motion

        % -- Alpha and beta groups
        build_alpha_beta_groups()
        if (n_alpha == 0)
            multi_radial_search()
            voronoin_CofM()              % voronoi domain decomposition CofM()
        endif
        CofM(alpha)
        get_closest_alpha()

        % -- Movement of alpha and beta groups
        movement_alpha()                 % towards best alpha robot
        movement_beta()                  % towards CMα
        movement_beta_target_pred()      % in direction of projected CMα

        % -- Dispersion if congested
        average_displacement()
        particle_density()
        CofM(group)
        dispersion_wrt_CofM()            % dispersion away from group CM
        dispersion_crit_elec()           % critical electrostatic dispersion
        dispersion_elec()                % total electrostatic dispersion

        % -- plot
        plot(x,y,t)                      % plot robot positions and other info
    end
    statistics()
    average_displacement()

FIGURE 16.5 Pseudocode showing the general structure of the MATLAB code simulating the α-β social group search method using: (1) target motion, (2) group definition, (3) radial search and Voronoi domain decomposition, (4) group motion, (5) robot dispersion, and (6) summary statistics.
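The convergence measure used throughout, the average displacement of the robots from the target, and the congestion test $\bar{d} < \bar{d}_{crit}$ of Section 16.2.6 can be sketched as below. This is an illustrative Python version; the function names mirror the pseudocode and the `min_ave_displ` parameter, but the bodies are assumptions about what those routines compute.

```python
import math

def average_displacement(robots, target):
    """Mean Euclidean distance of all robots from the target."""
    return sum(math.dist(p, target) for p in robots) / len(robots)

def is_congested(robots, target, min_ave_displ):
    """True when the group has closed in on the target more tightly than
    the critical average displacement, which would trigger a dispersion."""
    return average_displacement(robots, target) < min_ave_displ
```

Recording `average_displacement` once per time step yields the curves analyzed in Section 16.5: a falling value indicates convergence on the target, and a sudden rise marks a dispersion event.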
16.4 Algorithm Comparison
Three different social group-based search algorithms are studied: (1) standard search, (2) Voronoi search, and (3) Goldsmith search. The first two methods introduced by the authors here are based on but are different from the Goldsmith search, which is a method that incorporates the ideas from the work of Goldsmith and Robinett [2] together with multiradial search and electrostatic dispersion introduced in this work.
16.4.1 Standard Search
The standard search algorithm involves: (1) target motion, (2) multiradial search, (3) α and β group motion, and (4) inverse square law-based dispersion. A multiradial search commences at the location where the robots lose track of the target, but without a Voronoi decomposition. The target following continues until the robots become too dense and are required to be dispersed "electrostatically," either across the whole group of robots [dispersion_elec()] or critically, only for those robots for which Equation (16.8) holds [dispersion_crit_elec()].

16.4.2 Voronoi Search
The Voronoi search method introduced here is similar to the standard search case, differing only in that a Voronoi domain decomposition of the search space, as defined by the position of the robots, is performed prior to the multiradial search. Each robot is moved from its position, about which the Voronoi decomposition was made, to its respective Voronoi cell CM. The multiradial search then commences for each robot from its respective Voronoi CM position.

16.4.3 Goldsmith Search
The Goldsmith search method is based on Goldsmith and Robinett [2] and differs from the two previous methods in the following features: (1) the motion of the α robots is towards a randomly picked α robot, as opposed to the best α robot in the other algorithms; (2) the motion of these α robots is in the direction of and beyond the randomly picked α robot, that is, in Equation (16.2) wα > 1, whereas in the previous algorithms wα ≤ 1; and (3) the robots disperse from each other with respect to the CM of the entire α + β group. For a better comparison with the other two search methods, however, this method does make use of the multiradial search and the specific/critical electrostatic dispersion, which are introduced here and are not present in Goldsmith and Robinett [2].
16.4.4 Target Motion
The motion of the target is programmed to follow a linear (y = x), nonlinear (y = sin(x)), or random walk behavior, where:

$$\mathbf{T}_i(t+1) = \mathbf{T}_i(t) + \delta\mathbf{T}_i(t), \qquad (16.10)$$
and $\delta\mathbf{T}_i(t) = (\delta T_{x_i}(t), \delta T_{y_i}(t))^T$ is the finite random displacement of the target at the current time point t. The social group-based search behavior is then studied for the three target tracking motions; the convergence properties of the search for a varying number of robots are plotted in Section 16.5.

16.4.5 Parameters for the Search Methods
The parameters shown in Table 16.1 for the robot_search.m code, represented by the pseudocode of Figure 16.5, are experimented with to study the group convergence behavior as measured by the average displacement of the robots from the target. The actual values used were determined experimentally.

TABLE 16.1
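Parameter Settings for the Three Social Group-Based Search Algorithms

| Parameter | Standard Search | Voronoi Search | Goldsmith Search |
|---|---|---|---|
| 1. General and Detection | | | |
| 1.1. n | 9 ∼ 100 | 9 ∼ 100 | 9 ∼ 100 |
| 1.2. ic_square | 1 | 1 | 1 |
| 1.3. t_total | 100 | 100 | 100 |
| 1.4. target_rad | 0.1 | 0.1 | 0.1 |
| 2. Motion and Convergence | | | |
| 2.1. T_MOVE | Linear, nonlinear, and random walk | Linear, nonlinear, and random walk | Linear, nonlinear, and random walk |
| 2.2. T_PRED (prediction) | Linear (with and without), nonlinear (with and without), random walk (without) | Linear (with and without), nonlinear (with and without), random walk (without) | Linear, nonlinear, random walk (all without) |
| 2.3. dT | 0.3 | 0.3 | 0.3 |
| 2.4. alpha_wt | 0.2 | 0.2 | 1.8 |
| 2.5. k_beta | 10 | 10 | 10 |
| 2.6. crit_dist_b | 0.8 | 0.8 | 0.8 |
| 3. Dispersion | | | |
| 3.1. rad_dense | 0.12 | 0.12 | 0.12 |
| 3.2. min_ave_displ | 0.4 | 0.4 | 0.4 |
| 3.3. max_density | 5 | 5 | 5 |
| 3.4. min_sep | 0.001 | 0.001 | 0.001 |
| 3.5. k_disp | 0.5 | 0.5 | 2.0 |
| 3.6. percentage_of_domain | 0.2 (dispersion_crit_elec and dispersion_elec) | 0.2 (dispersion_crit_elec and dispersion_elec) | 0.2 (dispersion_crit_elec) |
| 3.7. percentage_of_range | 0.2 (dispersion_crit_elec and dispersion_elec) | 0.2 (dispersion_crit_elec and dispersion_elec) | 0.2 (dispersion_crit_elec) |
| 3.8. bndy_delta | 0.2 | 0.2 | 0.2 |
| 3.9. n_bndy_steps | 7 | 7 | 7 |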
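The three target motions of Section 16.4.4, selected by the T_MOVE flag and scaled by dT, can be sketched in Python as follows. How dT enters the linear and nonlinear cases is not stated in the text; the sketch assumes that x advances by dT per step and that y is then set from y = x or y = sin(x), while for the random walk dT is the step length in a uniformly random direction, as in Equation (16.10). The function name and the string mode labels are hypothetical.

```python
import math
import random

def move_target(T, mode, dT, rng=random):
    """Advance the target position T = (x, y) by one time step.

    mode: "linear" (y = x), "nonlinear" (y = sin x) or "random_walk".
    Assumption: for the first two modes x grows by dT each step; for the
    random walk dT is the length of a step in a random direction.
    """
    x, y = T
    if mode == "linear":
        x += dT
        return (x, x)
    if mode == "nonlinear":
        x += dT
        return (x, math.sin(x))
    if mode == "random_walk":
        phi = rng.uniform(0.0, 2.0 * math.pi)
        return (x + dT * math.cos(phi), y + dT * math.sin(phi))
    raise ValueError("unknown target motion: " + str(mode))
```

Passing a seeded `random.Random` instance as `rng` makes a random-walk run repeatable, which is convenient when comparing search algorithms on the same target trajectory.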
The general algorithm parameters are as follows: (1) the robot population size is n, (2) ic_square denotes a logical initial condition setting that places the robots on a square grid, (3) the total number of algorithm time steps is t_total, and (4) the radius of detection of the target by the search robots is target_rad. The motion and convergence parameters are as follows: (1) T_MOVE is a flag that sets the type of target motion to linear, nonlinear, or a random walk involving random direction changes but finite step lengths; (2) T_PRED is a target prediction flag that invokes the β robot position update of Equation (16.5) rather than that of Equation (16.3); (3) dT is the finite step length of the target motion; (4) alpha_wt is the α robot position-update weight wα; (5) k_beta, or kβ, is a premultiplying scaling factor of wβ that varies the amount of β robot motion and is constant for all β robots; and (6) crit_dist_b is the critical distance of the β robots from CMα, beyond which β robots remain stationary so as to maintain a broad coverage of the search space.
The dispersion parameters used for redistributing the robots if they become too congested are as follows: (1) the radial distance within which robots are considered to be too dense is rad_dense, (2) the minimum average displacement for which robots are considered too close is min_ave_displ, (3) the maximum percentage of robots of the total group for which robots are considered too dense is max_density, (4) the minimum separation of robots in the dispersion_crit_elec() function is min_sep and is used to disperse only a pair of robots, (5) the dispersion constant that determines the magnitude of dispersion for all robot dispersion methods is k_disp, (6) the percentage of the domain of search space limiting the dispersion of the robots is percentage_of_domain, (7) the percentage of the range of search space limiting the dispersion of the robots is percentage_of_range, (8) the distance from the outermost robots to the boundary points used to decompose the search space into Voronoi cells is bndy_delta, and (9) the number of boundary steps, n_bndy_steps, is the interval size between boundary points used to construct the Voronoi decomposition of the search space. Both the standard and Voronoi search methods introduced here share the same values for all parameters, as shown in Table 16.1; the parameters of the Goldsmith search that differ from the two previous search methods are T_PRED = 0 (no target prediction), alpha_wt (wα = 1.8), and k_disp (kd = 2.0). This table may be useful for researchers intending to reproduce and extend the work done here.
16.5 Results
The convergence behavior, as measured by the average displacement of the robots from the target, of the α-β social group-based search in detecting and following a moving target is studied for a number of robots $n = i^2$, $i \in \{3, 4, \ldots, 10\}$ (initially placed on a grid), and a total number of iterations t_total = 100 time steps. The execution times of the multiradial search method and of the code as a whole are also studied.
16.5.1 Standard Search
The experiments conducted here involve the three prescribed target motions introduced in Section 16.4.4: (1) linear, (2) nonlinear, and (3) random walk, where the first two are run with and without target motion prediction, and the third is run without target motion prediction.

16.5.1.1 Linear Target Motion without Prediction
Figure 16.6 shows the average displacement versus time (100 time steps) for different numbers of robots n ∈ {9, 16, 25, 36, 49, 64, 81, 100} for the standard search algorithm involving linear target motion without target motion prediction. The common behavior, except for n = 16, is an initial convergence towards the target, then successive dispersions after a certain period of target following. The α and β groups are formed once the target has been detected, and the average displacement then decreases due to aggressive convergence of the α group. Successive dispersions take place, resulting in an almost constant average displacement for each case n, where only a small number of α robots remain in close proximity following the target.
FIGURE 16.6 Average displacement (m) vs. time (s) for n = 9, 16, 25, 36, 49, 64, 81, and 100 robots. Standard search: linear target motion without prediction.
FIGURE 16.7 Average displacement (m) vs. time (s) for n = 9, 16, 25, 36, 49, 64, 81, and 100 robots. Standard search: linear target motion with prediction.
16.5.1.2 Linear Target Motion with Prediction
The average displacement of the robots from the target for the standard search, with linear target motion and target motion prediction, is shown in Figure 16.7. As the number of robots increases, the average displacement decreases until the robot group is too congested; at this point, a total inverse square law dispersion takes place, mutually dispersing all robots. This is the cause of the relatively large increase in average displacement between approximately 29 and 37 time steps. Thereafter, critical electrostatic dispersions take place incrementally, increasing the average displacement, where only those robots satisfying Equation (16.8) are dispersed. The higher the value of n, the better the convergence of the average displacement. This may be due to the broader coverage of the search space and the increased ability to converge to the extrapolated or predicted position of the target under linear motion.

16.5.1.3 Nonlinear Target Motion without Prediction
The average displacement versus time graph for nonlinear target motion without target motion prediction for the standard search is shown in Figure 16.8. The variation in average displacement values of the robots from the target is greater than for linear target motion without prediction, but there is still convergence after numerous total electrostatic dispersions. This greater variation in average displacement can be attributed to the sinusoidal and less predictable motion of the target. It appears that the average displacement for n = 64 is still decreasing at the end of the simulation time.
FIGURE 16.8 Average displacement (m) vs. time (s) for n = 9, 16, 25, 36, 49, 64, 81, and 100 robots. Standard search: nonlinear target motion without prediction.
16.5.1.4 Nonlinear Target Motion with Prediction
The average robot displacement for the standard search involving nonlinear target motion with target motion prediction is shown in Figure 16.9. As the target moves sinusoidally through the search space, the average displacement decreases for n ≥ 36, at the point when the target changes its direction in the y-coordinate. At this point, the density is too high and a total electrostatic dispersion takes place, after which there is a tendency to converge. For n < 36, there appears to be divergence, either because there are insufficient robots to cover the search space or because excessive grouping of a small number of robots leads to dispersion.

FIGURE 16.9 Average displacement (m) vs. time (s) for n = 9, 16, 25, 36, 49, 64, 81, and 100 robots. Standard search: nonlinear target motion with prediction.

16.5.1.5 Target Random Walk
The average robot displacement versus time for standard search with a random target walk is shown in Figure 16.10. For all n except n = 16, there is initial convergence and dispersion until final convergence; this is due to the localized motion of the target. For n = 16, there appears to be a strong electrostatic dispersion, indicating that the robots became too close to each other during the search.

FIGURE 16.10 Average displacement (m) vs. time (s) for n = 9, 16, 25, 36, 49, 64, 81, and 100 robots. Standard search: target random walk.

16.5.2 Voronoi Search
The results of the Voronoi search method are in general similar to those of the standard search method, except for the additional dispersion when moving the robots to their respective cell CMs using Voronoi domain decomposition.

16.5.2.1 Linear Target Motion without Prediction
The average robot displacement versus time for the Voronoi search method (figure not shown) concerning linear target motion without target motion prediction reveals multiple electrostatic dispersions and the natural tendency of dispersion due to the Voronoi domain decomposition moving the robots to
the CMs of the Voronoi cells, which increases the average displacement. However, for n > 25, the average displacement appears not to continue growing without bound.

16.5.2.2 Linear Target Motion with Prediction
The simulation for the Voronoi search with linear target motion and target motion prediction shows initial convergence followed by total electrostatic dispersion, after which there are multiple critical pair-wise electrostatic dispersions for the robots satisfying Equation (16.8) and a pseudo-natural dispersion brought about by the Voronoi decomposition. The average displacement versus time graphs (figure not shown) are similar to those of the standard search case, and there is a tendency for convergence for all cases except n = 81 and n = 100.

16.5.2.3 Nonlinear Target Motion without Prediction
The average displacement versus time graphs for the Voronoi search with nonlinear target motion without target motion prediction are similar to those of the standard search case, except for the natural dispersion as a result of Voronoi domain decomposition and movement of the robots to the respective CMs of the Voronoi cells; convergence for this case is inconclusive.

16.5.2.4 Nonlinear Target Motion with Prediction
The average displacement versus time graph for the Voronoi search method with nonlinear target motion and target motion prediction (figure not shown) is similar to that for the standard search as shown in Figure 16.9. There is a tendency for all the robots to converge except for n = 9, where there may not be enough robots to suitably cover the search space.

16.5.2.5 Target Random Walk
The average displacement versus time graph for Voronoi search with a target random walk is shown in Figure 16.11. It can be seen that for n = 9 robots, there is good coverage of the search space in proximity to the target without the need for electrostatic dispersion or Voronoi-based multiradial search. For all other cases, there appears to be continual dispersion throughout the simulation. In comparison to the standard search with target random walk of Figure 16.10, it appears that the Voronoi decomposition performs additional dispersion, because it moves the robots from their positions in the Voronoi cells to the Voronoi cell CMs, which leads to a slight divergence.

16.5.3 Goldsmith Search
The average displacement of the Goldsmith search method is studied for linear and nonlinear target motion and for a random target walk.
FIGURE 16.11 Average displacement (m) vs. time (s) for n = 9, 16, 25, 36, 49, 64, 81, and 100 robots. Voronoi search: target random walk.
16.5.3.1 Linear Target Motion
The average displacement versus time graph for a Goldsmith search with linear target motion is shown in Figure 16.12. Multiradial search and electrostatic dispersion are added to Goldsmith and Robinett's [2] work for ease of comparison with the standard and Voronoi search methods introduced by the authors here. There appears to be a gradual divergence of all robots throughout the simulation, but towards the end there is slight convergence for all robots except for n = 9.

FIGURE 16.12 Average displacement (m) vs. time (s) for n = 9, 16, 25, 36, 49, 64, 81, and 100 robots. Goldsmith search: linear target motion.
FIGURE 16.13 Average displacement (m) vs. time (s) for n = 9, 16, 25, 36, 49, 64, 81, and 100 robots. Goldsmith search: nonlinear target motion.
16.5.3.2 Nonlinear Target Motion
The average displacement versus time graph for a Goldsmith search with nonlinear target motion is shown in Figure 16.13. There appear to be numerous instances of critical electrostatic repulsion and dispersion from the CM, which result in initial divergence but a trend toward an average displacement between 0.8 and 1.2 m.

16.5.3.3 Target Random Walk
The average displacement versus time graph for a Goldsmith search with a target random walk is shown in Figure 16.14. There appear to be multiple critical electrostatic dispersions and dispersions from the CM, and a general trend towards divergence. The random motion of the target also causes a larger variance of average displacement for different robot populations n, when compared to the linear or nonlinear target motion cases.

FIGURE 16.14 Average displacement (m) vs. time (s) for n = 9, 16, 25, 36, 49, 64, 81, and 100 robots. Goldsmith search: target random walk.

16.5.4 Algorithm Execution Time
The execution time is measured using the MATLAB [1] cputime() command, which gives the total time taken for t_total = 100 algorithm time steps. The execution time versus the number of robots for the standard search method for all types of motion is shown in Figure 16.15. As the number of robots increases, the execution time increases due to the increased computation in the various functions. The polynomial fitted to the execution time data for the standard search with linear target motion without motion prediction is quadratic in the number of robots n, that is, t(n) = 0.08n² − 0.36n + 0.53. The algorithm is of computational complexity O(n²) because there are n(n − 1)/2 inverse square law force interactions; however, dispersion may not take place at each time step. The execution time figures (not shown) for the Voronoi and Goldsmith algorithms are similar to those of Figure 16.15.

16.5.5 Radial Search Execution Time
The results (not shown) indicate that as the number of robots is increased, the execution time for multiradial search decreases for both the standard and
[Figure 16.15 plot: execution time (s), axis from 0 to 4, vs. number of robots n = 9, 16, 25, 36, 49, 64, 81, 100, for the standard search with five target motion cases: linear with prediction, linear without prediction, nonlinear with prediction, nonlinear without prediction, and target random walk.]
FIGURE 16.15 Execution time vs. number of robots — standard search.
TABLE 16.2
Average Execution Times (in seconds) Over All Population Sizes for the Standard and Voronoi Search Algorithms Using Multiradial Search

Target Motion                                 Standard Search   Voronoi Search
Linear target motion without prediction       169.87            186.50
Linear target motion with prediction           70.00             60.37
Nonlinear target motion without prediction    199.25            232.25
Nonlinear target motion with prediction       124.00             80.25
Target random walk                            297.75            252.37
Voronoi algorithms. Table 16.2 shows the average execution times over all population sizes for the two search algorithms using multiradial search. For both algorithms, with linear and nonlinear target motion, the execution time is smaller with target motion prediction than without. This is due to the preemptive motion of the robots toward the area of the target's location at the next time step, which brings the robots closer to the target and reduces the time spent performing multiradial search. The results of Table 16.2 also indicate that, with target motion prediction, the average execution time for multiradial search is lower for the Voronoi algorithm than for the standard search algorithm; however, this is reversed when no target prediction is used. This may be attributed to the Voronoi method performing additional dispersion by moving the robots to their cell CMs. The execution time for multiradial search with a target random walk is lower for the Voronoi search method, owing to the localized nature of the walk with small finite displacements and the more even distribution of robots produced by the Voronoi domain decomposition.
In these simulations, the search does not end after target detection; instead, a target-following mechanism is employed. Robot convergence takes place initially, followed by an intentional dispersion that spreads the robots to prevent congestion, which results in a general convergence to an average displacement value.
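The comparison drawn from Table 16.2 can be checked with a few lines of arithmetic. The following Python sketch is not part of the original study; it copies the table values and computes the relative reduction in multiradial search time when target motion prediction is enabled. The names `TABLE_16_2` and `reduction_percent` are hypothetical and are introduced here only for illustration.

```python
# Average multiradial-search execution times (s) copied from Table 16.2.
# Each tuple is (without prediction, with prediction).
# The dictionary layout and all names are illustrative assumptions.
TABLE_16_2 = {
    "linear": {"standard": (169.87, 70.00), "voronoi": (186.50, 60.37)},
    "nonlinear": {"standard": (199.25, 124.00), "voronoi": (232.25, 80.25)},
}


def reduction_percent(t_without, t_with):
    """Relative reduction (%) in execution time when prediction is used."""
    return 100.0 * (t_without - t_with) / t_without


for motion, by_algorithm in TABLE_16_2.items():
    for algorithm, (t_without, t_with) in by_algorithm.items():
        saving = reduction_percent(t_without, t_with)
        print(f"{motion:9s} {algorithm:8s} {saving:5.1f}% less time with prediction")
```

Under this reading of the table, the reduction is larger for the Voronoi search than for the standard search in both motion cases, which is consistent with the reversal described in the text when no prediction is used.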
16.6 Conclusions
A heuristic multi-robot social group-based search algorithm for the search, detection, and tracking of a stationary or moving target was studied. Three search algorithms were evaluated in this work: the original Goldsmith search, and the authors’ standard and Voronoi search methods. The results of interest for all three algorithms concern (1) the convergence of the robots to the target and their ability to track the target without becoming too congested in the search space, for the three target motion types, and (2) the execution times for the three different algorithms as a whole, and
the time taken for the multiradial search function both with and without the Voronoi domain decomposition. The Goldsmith search algorithm used a multiradial search mechanism to detect the target, and target tracking was performed by robots moving aggressively both toward and beyond the target. The variance in the average displacement $\bar{d}$ of the robots from the target was smallest for the nonlinear target motion case and greatest for the target random walk. The authors’ standard search method showed convergence for linear and nonlinear target motion with prediction, but general divergence for both target motions without target motion prediction. The variance of the average displacement was smallest for the target random walk, due to the more localized position of the target in the search space, and the algorithm showed good convergence for all population sizes. The authors’ Voronoi search algorithm made use of Voronoi domain decomposition and was hence susceptible to a greater number of robot dispersions, because the robots are repositioned at their respective Voronoi cell CMs; the average displacement plots are hence more jagged. There appeared to be convergence for some populations for all target motions with and without prediction, and in general there was more variability in the average displacement than for the standard search. The execution times of the three algorithms were similar, as the same mechanism is in place for each and they have similar computational complexity; all algorithms perform multiradial search of complexity $O(n)$ and inverse square law dispersion of complexity $O(n^2)$ within each iteration, but in general the execution time depends on the robots finding the target using a multiradial search. The average time taken to perform the multiradial search in the standard and Voronoi search algorithms is lower with target motion prediction than without. This is because the robots are closer to the target, and hence less time is spent performing the radial search. The Voronoi domain decomposition also distributes the robots more evenly over the search space.
Future work will address the following areas: (1) cooperative and noncooperative behavior and predator–prey relationships [20] involving multiple targets and social groups, (2) planar and three-dimensional searches to contribute to a ship-aircraft-submarine simulation system with multiple objectives, (3) obstacle avoidance [21,22], and (4) optimal search resource allocation to perform suitable search space coverage.
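The quadratic cost of the inverse square law dispersion step follows from visiting every unordered pair of robots once. The Python sketch below illustrates that pair count under stated assumptions; it is not the authors' MATLAB code. The function name `dispersion_displacements`, the force constant `k`, and the choice to skip coincident robots are all hypothetical.

```python
import itertools
import math


def dispersion_displacements(positions, k=1.0, eps=1e-9):
    """Accumulate inverse-square repulsions over all unordered robot pairs.

    positions: list of (x, y) robot coordinates.
    k: hypothetical force constant (an assumption, not taken from the chapter).
    Returns (displacements, pair_count), where displacements[i] is the
    summed repulsive push on robot i and pair_count is n(n-1)/2.
    """
    n = len(positions)
    displacements = [[0.0, 0.0] for _ in range(n)]
    pair_count = 0
    for i, j in itertools.combinations(range(n), 2):
        pair_count += 1
        dx = positions[i][0] - positions[j][0]
        dy = positions[i][1] - positions[j][1]
        r = math.hypot(dx, dy)
        if r < eps:
            continue  # coincident robots: direction undefined, skipped here
        force = k / r**2            # inverse square law magnitude
        ux, uy = dx / r, dy / r     # unit vector pointing from robot j to robot i
        displacements[i][0] += force * ux
        displacements[i][1] += force * uy
        displacements[j][0] -= force * ux   # equal and opposite push on robot j
        displacements[j][1] -= force * uy
    return displacements, pair_count
```

For the population sizes used in the simulations, the loop visits 36 pairs for n = 9 and 4950 pairs for n = 100, which is the $n(n-1)/2$ growth behind the $O(n^2)$ term.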
References
1. Matrix Laboratory (MATLAB) version 7.0, The MathWorks, Inc.
2. Goldsmith, S.Y. and Robinett, R., Collective search by mobile robots using alpha-beta coordination, in Collective Robotics, Drogoul, A., Tambe, M., and Fukuda, T., Eds., Springer-Verlag, Berlin, 1998, 136–146.
3. Koopman, B.O., OEG Report No. 56, The Summary Reports Group of the Columbia University Division of War Research, “Search and Screening” (available from the Center for Naval Analyses), 1946.
4. Koopman, B.O., Search and Screening: General Principles with Historical Applications, Pergamon Press, New York, 1980.
5. Jennings, J.S., Whelan, G., and Evans, W.F., Cooperative search and rescue with a team of mobile robots, in Proceedings of ICAR ’97, International Conference Advanced Robotics, Monterey, 1997, 193.
6. Burgard, W. et al., Collaborative multi robot exploration, in Proc. of ICRA ’00, IEEE International Conference on Robotics and Automation, San Francisco, 2000, 476.
7. Singh, S.P.N. and Thayer, S.M., Kilorobot search and rescue using an immunologically inspired approach, Distributed Autonomous Robotic Systems, 5, 424, 2002.
8. Pack, D.J. and Mullins, B.E., Toward finding an universal search algorithm for swarm robots, in Proceedings International Conference on Intelligent Robots and Systems, IEEE/RSJ, Las Vegas, 2003, 1945.
9. Guo, Y., Parker, L.E., and Madhavan, R., Towards collaborative robots for infrastructure security applications, in Proceedings of CTS ’04, International Symposium on Collaborative Technologies and Systems, San Diego, 2004, 235.
10. Ablavsky, V. and Snorrason, M., Optimal search for a moving target: A geometric approach, in Proceedings of AIAA Guidance, Navigation and Control Conference and Exhibit, AIAA, Denver, 2000, AIAA-2000-4060.
11. Benkoski, S.J., Monticino, M.G., and Weisinger, J.R., A survey of the search theory literature, Naval Research Logistics, 38, 469, 1991.
12. Kan, Y.C., Optimal search of a moving target, Operations Research, 25, 864, 1977.
13. Discenza, J.H. and Stone, L.D., Optimal survivor search with multiple states, Operations Research, 29, 309, 1981.
14. Stone, L.D., Theory of Optimal Search, 2nd ed., Operations Research Society of America, Arlington, VA, 1989.
15. Iida, K., Optimal search plan minimizing the expected risk of the search for a target with conditionally deterministic motion, Naval Research Logistics, 36, 597, 1989.
16. Frost, J.R. and Stone, L.D., Review of Search Theory: Advances and Applications to Search and Rescue Decision Support, Report No. CG-D-15-01, U.S. Coast Guard Research and Development Center, Groton, CT, 2001.
17. Meriam, J.L. and Kraige, L.G., Engineering Mechanics: Statics, 5th ed., John Wiley & Sons, New York, 2002.
18. Wolfram Research Inc.: http://mathworld.wolfram.com/VoronoiDiagram.html
19. Kibble, T.W.B. and Berkshire, F.H., Classical Mechanics, 5th ed., Imperial College Press, London, 2004, 78.
20. Pekalski, A., A short guide to predator-prey lattice models, Computing in Science and Engineering, 6(1), 62, 2004.
21. Gill, M.A.C. and Zomaya, A.Y., Obstacle Avoidance in Multi-Robot Systems: Experiments in Parallel Genetic Algorithms, World Scientific Series in Robotics and Intelligent Systems, Vol. 20, World Scientific Publishing, London, 1998.
22. Piepmeier, J.A. et al., Uncalibrated target tracking with obstacle avoidance, in Proceedings of the International Conference on Robotics and Automation, 2, IEEE, San Francisco, 2000, 1670.
P1: Naresh Chandra November 16, 2007
18:8
7985
7985˙C017
Index A ACC, see Adaptive cruise control ACP, see Adaptive congestion control protocol Action-dependent heuristic dynamic programming (ADHDP), 5, 132, 142 Action-dependent programming, 82, 83 Action networks, Q-learning and, 144 Action point (AP) models, 421 Active sensing problem, 180, 190 Adaptive congestion control protocol (ACP), 206, 220 Adaptive control classical, 17 linear, 59 nonlinear, 60 online, 62 problem, 61, 62 Adaptive controllers, stabilizing, 27 Adaptive cruise control (ACC), 422 Adaptive dynamic programming, see H-infinity control, model-free adaptive dynamic programming algorithms for Adaptive law, error models and, 35 Adaptive methods, stages of, 26 Adaptive system(s) instability in, 37 theoretical considerations, 25 Additive Gaussian white noise model, 389–393 ADHDP, see Action-dependent heuristic dynamic programming ADP, see Approximate dynamic programming Aerospace structures, inelastic restoring forces of, 108 AIBO, Sony, 310 Aircraft, initialized states of, 153 Aircraft carriers, robotic docking in, 310 ANN, see Artificial neural network AP models, see Action point models Approximate dynamic programming (ADP), 5, 81, 131 ARMA, see Autoregressive moving average
Artificial neural network (ANN), 15, 16, see also Neural networks adaptive algorithms, 114 goal of using, 3 identification in nonlinear viscous dampers using, 124 research, 15 Automotive control systems, 88–89 Autonomous multiagents formations, 7 Autopilot controller design F-16 aircraft, 150 HDP-based, 151 parameter drift problem, 156 Q-learning-based, 153 Autoregressive moving average (ARMA), 22
B Back propagation, architecture for, 38, 40 Back propagation through time, 39, 40 Backstepping, nonlinear version of, 87 Base station, 181 Basis functions, 62, 63 Beat-down problem, 205 Bellman optimality principle, 134 Bellman’s dynamic programming, 73 Beta strands, 9 Biological control structures, 91 Boolean networks, cancer genomics and, 347 Bottleneck link network, 217, 218, 231 Bouc–Wen model, 108 Bounded reference input, 61 Bridge(s) earthquake excitation of, 119 influence coefficients, 118 MIMO model, 117 Building structures and bridges, 4–5, 99–130, see also Structural systems background, 100 classification of identification techniques, 102–103 examples and case studies, 117–128 modeling of Vincent Thomas Bridge using earthquake response measurements, 117–118
models of nonlinear viscous dampers from experimental measurements, 121–123 nonparametric identification through Volterra–Wiener neural networks, 125–128 online identification of hysteretic joints from experimental measurements, 118–121 parametric identification of MDOF systems, 123–125 hybrid approach for identification of nonlinear structural systems, 104–117 hybrid parametric/nonparametric approach, 104–106 nonparametric nonlinear part, 111–117 parametric linear part, 106–107 parametric nonlinear part, 107–110 identification of structural systems, 101–102 overview of structural control of civil infrastructure systems, 100–101 scope, 103–104 uncertainty in identification, 103
C Cancer genomics, 8–9, 339–365 coefficient of determination, 341, 359 context-sensitive PBNs, 362 control inputs, 343, 346 controlled Markov chain, 343, 351 dynamic programming, 344–345 examples, 354–362 real-world example based on gene expression data, 358–362 simple illustrative example, 354–358 finite horizon performance index, 340 future directions, 362–364 gene activity profile, 348 gene expression data, 358 genetic regulatory networks and dynamical systems, 341–342 intervention, 342–344 mathematical details, 345–354 control in probabilistic Boolean networks, 350–353 review of probabilistic Boolean networks, 347–350 solution using dynamic programming, 353–354 mean first passage time, 346 oncogenes, 340 optimal fare selection problem, 345
prediction signal processing and control, 8–9 predictor genes, 341 prescriptive PBNs, 342 principle of optimality, 353 probabilistic Boolean networks, 340, 342 target gene, 341 transition probability matrix, 351 treatment horizon, 346 tumor suppressor genes, 340 Chebyshev polynomial, definition of, 113 Civil infrastructure systems, see Building structures and bridges COD, see Coefficient of determination Coefficient of determination (COD), 341, 359 Cohesive motion control, see Persistent autonomous formations and cohesive motion control Communication cost, sensor network, 181, 195, 197 inter-robot, 305 radio, sensor networks and, 183 Complex systems, modeling and control of, 1–11 building structures and bridges, 4–5 cancer genomics, 8–9 congestion control in computer networks, 6–7 fair data gathering in wireless sensor networks, 5–6 H-infinity control of complex linear systems, 5 home automation by means of MAS theory, 10–11 large-scale autonomous multi-robot teams, 8 multi-robot social group-based search algorithms, 11 neural networks, 3–4 optimization problems in deployment of sensor networks, 6 persistent autonomous formations and cohesive motion control, 7 stabilization of turbulent flow PDEs, 10 transportation systems, 9–10 unmanned aerial vehicles, 7–8 visuomotor pathway, 9 Computer networks, congestion control in, 6–7, 203–246 ACP dynamics, 234 receiver, 223 router, 223 sender, 222 utilization, 238
adaptive congestion control protocol, 220–243 comparison with RCP, 241–243 comparison with XCP, 239–241 dynamics of ACP, 234–237 fairness, 232–233 multilink example, 237–239 packet header, 221–222 performance in presence of short flows, 229–231 scalability, 225–229 bandwidth delay products, 204 congestion control algorithm, 217 protocol, 239 strategy, 236 congestion window oscillation, 229 distributed iterative algorithms, 205 dual problem, 210 dynamic laws, 209 equilibrium values, 211 explicit congestion control protocol, 219, 239 feedback mechanism, 216 signal, 207, 208 file transfer protocol, 226 fluid flow models, 212, 217 Lyapunov function, 214 max-min congestion controller, 216 model of single bottleneck link case, 217–219 multilink example, 237 networks of arbitrary topology, 244 packet header, 221 previous work, 209–217 dual algorithms, 210–213 max-min congestion controller algorithms, 215–217 primal algorithms, 213–215 problem formulation, 206–209 projection operator, 223, 224 propagation delay, 218, 227 queue size time response, 235 rate control protocol, 219, 241 scalability, 225 sending rate, 207, 224 static laws, 209 transmission control protocol, 204 user response, 233 vulnerabilities, 204 Concrete, see Reinforced concrete subassembly CONRO, 301, 306 Constraint consistent formation, 249, 253
Continuous-time agent models, 271 Control decentralized, 29 design, global, 64 distributed, 29 hierarchical, 29 indirect, 50 law adaptive laws and, 86 determination of, 83 optimal, neural networks for, 72 separation-separation, 272 systems shift from sensor-poor to data-rich, 200 theoretic properties of, 21 Controllability, global, 66 Controlled Markov chain, 343 Controller(s) adjustment of, 63 backstepping, 445 building of based on artificial neural networks, 15 feedback, neural networks as, 76 global, 69 high-performance, 85 linear, 44 multiple models of, 52 off-line vs. online design of, 63 parameters, 26 PID, 87 piecewise smooth stabilization, 70 practical design of, 48–55 control of nonlinear multivariable systems, 53–54 interconnected systems, 54–55 modeled disturbances and multiple models for rapidly varying parameters, 50–53 Control problems, applications of neural networks to, 84–92 automotive control systems, 88–89 biological control structures for engineering systems, 91 biped walking robot, 89 controller in hard disk drive, 85–86 fed-batch fermentation processes, 87–88 lean combustion in spark ignition engines, 86–87 MIMO furnace control, 87 multiple models, 90–91 preconscious attention control, 91–92 real-time predictive control in manufacturing industry, 90 Cortical movie, B-space representation of, 386
Coverage control problem, 180, 190 Curse of dimensionality, 81
D DAI, see Distributed artificial intelligence Data collection point, 181 Dead zone, 50 Decentralized control, 29 Degrees of freedom (DOF) count, 254 DHP, see Dual heuristic programming Differential game theory, military strategy and, 284 Diophantine equation, 26 Directed trilateration (DT), 262 Directional derivative, 65 Discrete-time model, 109 Discrete-time plant, 28 Distance set, 251 Distributed artificial intelligence (DAI), 279 Distributed control, 29 DOF count, see Degrees of freedom count DOF system, hysteretic identification problem for, 108 DT, see Directed trilateration Dual algorithms, 210 Dual heuristic programming (DHP), 82 Duffing–Van der Pol oscillator, 105 Dynamic back propagation, 39, 40 Dynamic programming, 72, 73 approximate, 81 continuous time, 79–80 discrete time, 80–81, 81–84
E Earthquake engineering, modeling behavior and, 102 response measurements, 117 Edge splitting, 256 EDW, see Expanding detection window Emergent behavior, 18 Error models adaptive laws and, 35 for nonlinear systems, 36 output errors and, 26 Expanding detection window (EDW), 383 Explicit congestion control protocol (XCP), 219, 239 Exploration noise, 146 Extremum seeking methods, gradient-based, 36
F F-16 aircraft, autopilot controller design for, 150 Fair data gathering problem, 163 Fed-batch fermentation processes, 87–88 Federal Highway Administration (FHWA), 409 Feedback control law, 28, 29, 46 controllers, neural networks as, 76 laws, discontinuous, 67 linear state, 54 signals, computer network, 207, 208 smooth, 66, 68 stabilizing policies, 133 state information structure, 135 optimal disturbance as, 134 Feedforward networks, 31 adjustment of parameters in, 37 static nonlinearities of, 114 FHWA, see Federal Highway Administration Field of view (FOV), 290 File transfer protocol (FTP), 226 First-order terms, 117 First responders, 298 Flying objects, modeling and control of, 7 Formation control problem, 284 underlying graph of, 248 underlying undirected graph of, 251 Forward programming scheme, logic behind, 83 FOV, see Field of view Frobenius’ theorem, neural networks and, 70 FTP, see File transfer protocol Function approximation, neural networks and, 73
G Game algebraic Riccati equation (GARE), 133, 137, 156 GAP, see Gene activity profile GARE, see Game algebraic Riccati equation Gaussian function, width of, 31 Gazis–Herman–Rothery (GHR) model, 418 GDHP, see Globalized dual heuristic programming Gene activity profile (GAP), 348 Genomics, see Cancer genomics GHR model, see Gazis–Herman–Rothery model
Globalized dual heuristic programming (GDHP), 82 Global observability, 71 Global system theory, mathematical machinery used in, 64 Grammian matrix, 78
H Hamilton–Jacobi–Bellman equation, 80 Hard disk drive, controller in, 85–86 HDP, see Heuristic dynamic programming Heuristic dynamic programming (HDP), 5, 82, 132, 137 action-dependent heuristic, 132, 142 algorithm, online implementation of, 139 -based autopilot controller design, 151 derivation of for zero-sum games, 137 flowchart, zero-sum games, 140 Hierarchical control, 29 Higher-order functions, 44 High-order neural network (HONN), 117 Highway traffic flow control (HTFC) system, 422 H-infinity control, model-free adaptive dynamic programming algorithms for, 5, 131–159 action-dependent heuristic dynamic programming, 142–150 convergence of zero-sum game Q-learning, 148–150 derivation of model-free online tuning based on Q-learning algorithm, 143–146 online implementation of Q-learning algorithm, 146–148 discrete-time state feedback control for zero-sum games, 132–137 heuristic dynamic programming, 137–142 convergence of zero-sum game HDP, 140–142 derivation of HDP for zero-sum games, 137–139 online implementation of HDP algorithm, 139–140 online ADP H∞ autopilot controller design for F-16 aircraft, 150–156 HDP-based H∞ autopilot controller design, 151–153 Q-learning-based H∞ autopilot controller design, 153–156 Home automation, approach to by means of MAS theory, 10–11, 461–483 agents, 465 communication through power line, 466, 467
device electric signature, 476 domotic agent, 469, 470 energy consumption, 463 HAS analysis and simulation, 472–481 power leveling, 474–475 simulation results, 477–481 simulator, 472–474 water leveling, 475–477 HAS theory, 469–471 domotic agent definition, 469–471 home automation system definition, 471 home automation systems, 464–469 HAS description, 464–466 paradigmatic example, 466–468 system behavior and control, 468–469 hot water circuit, 467 house heating circuit, 467 nodes, 467, 477 paradigm of MAS theory, 462 qualities of HAS agents, 465 software environment, 472 TCP/IP protocol, 473 HONN, see High-order neural network HTFC, see Highway traffic flow control Hydraulic actuators, 118 Hysteretic joints, online identification of, 118
I ICC, see Intelligent cruise control Identifiers, practical design of, 48–55 control of nonlinear multivariable systems, 53–54 interconnected systems, 54–55 modeled disturbances and multiple models for rapidly varying parameters, 50–53 IFRC, see Interference-aware rate control protocol Implicit function theorem, 24, 47 Inequality constraint, optimization problem, 74 Infinite-horizon value function, 133 In-flight refueling, robotic docking in, 310 Input–output model, 32, 33 transfer function, 2 Instability, speed of adaptation and, 37 Intelligent cruise control (ICC), 422 Interconnected systems, 54–55 Interference-aware rate control protocol (IFRC), 163
Internet congestion control, 205 congestion protocols, 206 structural properties of, 203–204 Inuktun microbot, 304 Inverse function theorem, 24, 46 ISODATA clustering algorithm, 317
K Karhunen–Loeve (KL) decomposition, 369, 383 KL decomposition, see Karhunen–Loeve decomposition Kronecker product, 138, 141, 145 Kuramoto model, 393, 398
L Lateral geniculate nucleus (LGN), 373, 377 LGN, see Lateral geniculate nucleus Lie algebra controllability of nonlinear systems and, 69 rank of, 71 Linear adaptive control, 59 Linearization, identification and control based on, 43–48 higher-order functions, 44–45 system representation, 43–44 system theoretic properties, 45–48 Linear operator, vector field defining, 65 Linear quadratic regulator (LQR), 158 Linear systems mathematical tractability of, 30 state description of, 17 Linear and time invariant (LTI) plant, 59 Linear time-invariant systems, 20–23 ARMA model, 21–22 controllability, observability, and stability, 21 minimum phase systems, 22–23 Linear time-varying difference equation, 42 Linear time-varying system (LTV), 43 LQR, see Linear quadratic regulator LTI, see Linear and time invariant plant LTV, see Linear time-varying system Lure theory, 56 Lyapunov function, 35, 46, 60
M Manifolds, dynamics on, 64 Marsupial systems, 8
MAS theory, see Multi-agent system theory Mathematical systems theory, ANN and, 16 MDOF structural systems, see Multidegree-of-freedom structural systems MDOF systems hysteretic identification problem for, 108 parametric identification of, 123–125 MegaScouts, 331 MEMS devices, see Micro-electromechanical systems devices Micro-electromechanical systems (MEMS) devices, 4–5, 128 Military operations, see Unmanned aerial vehicles MIMO systems, see Multiple-input multiple-output systems Minimally persistent formation, 249 Minimum phase system, 23 MISO control architecture, baroreceptor reflex, 91 Mission space, 179, 191 Model(s) action point, 421 additive Gaussian white noise, 389–393 agent, continuous-time, 271 ARMA, 21–22 Bouc–Wen, 108 complex, purposes of, 2 development of using physical laws, 2 discrete-time, 109 error adaptive laws and, 35 output errors and, 26 Gazis–Herman–Rothery, 418 input–output, 32, 33 Kuramoto, 393, 398 multiple neural networks and, 52 rapidly varying parameters and, 50 switching and tuning, 90 NARMA, 33, 51 nonholonomic unicycle dynamics, 272 nonholonomic unicycle kinematics, 272 nonlinear, parametric form of, 105 nonlinear viscous dampers, 121–123 predictive control (MPC), 422 reference, controller parameters and, 26 robot deployment, 304 sensitivity, time-varying, 41 series-parallel identification, 34 simple design, 121 state vector, 33 traffic networks, 10 unforced stable disturbance, 51
Index validation, 2 velocity integrator, 250, 271 Motion control, cohesive, see Persistent autonomous formations and cohesive motion control MPC, see Model predictive control MPN, see Multilayer preceptron network Multi-agent system (MAS) theory, 461, see also Home automation, approach to by means of MAS theory Multidegree-of-freedom (MDOF) structural systems, 104 Multilayer preceptron network (MPN), 31 Multiple-input multiple-output (MIMO) systems, 17, 20, 114 controllable, 21 equation describing, 20 furnace control, 87 input–output relation for, 22 model, Vincent Thomas Bridge, 117 plant control and, 25 Volterra–Wiener neural network, 114 Multi-robot social group-based search algorithms, 11, 485–507 algorithm comparison, 494–496 Goldsmith search, 494 parameters for search methods, 495–496 standard search, 494 target motion, 495 Voronoi search, 494 algorithm execution time, 503–504 code structure, 492–493 dispersion parameters, 496 Goldsmith search results, 501–503 linear target motion, 502 nonlinear target motion, 503 target random walk, 503 inverse square electrostatic force law, 492 MATLAB® code, 492, 493 MATLAB execution time, 503 radial search execution time, 504–505 robot group center of mass, 487 security application, 486 social group-based search algorithm, 487–492 coordination of alpha and beta robots, 488 movement of alpha robots, 489 movement of beta robots, 489–490 multiradial search with Voronoi domain decomposition, 491 robot dispersion, 492 standard search results, 497–500 linear target motion with prediction, 498
linear target motion without prediction, 497 nonlinear target motion with prediction, 499–500 nonlinear target motion without prediction, 498 target random walk, 500 Voronoi domain decomposition method, 487, 491, 500 Voronoi search results, 500–501 linear target motion with prediction, 501 linear target motion without prediction, 500–501 nonlinear target motion with prediction, 501 nonlinear target motion without prediction, 501 target random walk, 501 Multi-robot teams, framework for large-scale, 8, 297–337 area coverage, 299 autonomous docking and deployment, 302–307 autonomous recharge, 303 deployment methods, 304 deployment models, 304–305 docking methods, 305–307 autonomous recharging, 303 battery life, 314 coarse approach phase, 311 combined locomotion, 308 CONRO, 301, 306 cooperative robotics, 308–309 distribution of resources, 302 docking, 309–313 theory, 309–311 traditional algorithms for docking, 311–312 traditional assumptions and limitations of docking, 312–313 docking station identification, 311 power consumption, 328 energy minimization, 330 fine approach phase, 312 first responders, 298 fixed deployment and recovery positions, 313 future work, 329–332 applications, 331–332 hardware innovations, 330–331 simulation extensions, 329–330 hardware, 313 heterogeneous teams, 301
homogeneous teams, 300 infinite power and perfect communication, 313 instantaneous docking, 329 inter-robot communication, 305 Inuktun microbot, 304 ISODATA clustering algorithm, 317 MegaScouts, 331 optimization of behaviors, 314–315 organization, 299–300 Pioneer2 DX robot, 303 PolyBot, 301 polymorphic systems, 310 problem description, 299 resource distribution problem, 300–302 heterogeneous teams, 301–302 homogeneous teams, 300–301 how this approach differs, 302 resource recovery, 330 seeking home, 317 simulation, 316–329 speed of deployment, 299 static environments, 312 tactical question, 300 team longevity, 299 team size, 313
N NARMA models, see Nonlinear ARMA models Network(s), see also specific types action, Q-learning and, 144 bottleneck link, 217, 218, 231 data traffic, 206 feedforward, 31 adjustment of parameters in, 37 static nonlinearities of, 114 lifetime, 180 radial basis function, 31, 85 recurrent, 32 adjustment of parameters in, 37 laws derived for, 56 stability question of, 49 static, interest in recurrent neural networks in, 32 Neural network(s), 3–4, 13–98 adaptive law of, 115 ANNs for control, 16–19 assumptions, 19 control of complex systems, 17–18 linear control and linear adaptive control, 16–17 nonlinear adaptive control, 18–19
applications of to control problems, 84–92 automotive control systems, 88–89 biological control structures for engineering systems, 91 biped walking robot, 89 controller in hard disk drive, 85–86 fed-batch fermentation processes, 87–88 lean combustion in spark ignition engines, 86–87 MIMO furnace control, 87 multiple-models, 90–91 preconscious attention control, 91–92 real-time predictive control in manufacturing industry, 90 approach to identification of nonlinear structural systems, 114–117 approximating capabilities of, 20 artificial adaptive algorithms, 114 identification in nonlinear viscous dampers using, 124 comments, 92–93 computational advantage, 78 construct, UAV, 292 dynamic programming in continuous and discrete time, 79–84 continuous time (no uncertainty), 79–80 discrete time (no uncertainty), 80–81 discrete time (system unknown), 81–84 feedback controller determination using, 78 feedforward and recurrent networks, 37–42 back propagation through time, 39–40 dynamic back propagation, 40–41 interconnection of LTI systems and neural networks, 41 real-time recurrent learning, 41–42 filter design issues, 116 function approximation using, 73 global control design, 64–72 dynamics on manifolds, 64–66 global controllability and stabilization, 66–71 global observability, 71–72 goal of using, 3 high-order, 117 historical background, 15–16 identification and control methods, 42–63 identification and control based on linearization, 43–48 practical design of identifiers and controllers, 48–55
related current research, 55–59 theoretical and practical stability issues, 59–63 industrial applications of, 4 interconnected, 19 linear-in-the-weights, 115 Lure theory, 56 mathematical preliminaries, 20–30 adaptive systems, 25–27 linear time-invariant systems, 20–23 nonlinear systems, 23–25 problem statement, 27–30 models of nonlinear viscous dampers using, 123 multilayer, 32 multiple models, 52 optimization and optimal control, 72–84 computational advantage, 78 function approximation, 73–74 other formulations, 78–79 parameter optimization, 74–78 properties guaranteed by adaptive law, 116 recurrent networks, 32–33 regressor vector, 116 smooth feedback in, 66 stable adaptive laws, 35–37 error models for nonlinear systems, 36 gradient-based methods and stability, 36–37 system approximation, 34 theoretical results in nonlinear adaptive control using, 63 Volterra–Wiener, 114, 125–127 approximation capabilities of, 126 estimator, 116 filter, 116 learning capabilities in, 115 neural network weights, 125, 126 restoring forces of three-DOF system, 125, 128 Neurocontrol historical developments in, 27 issues related to, 4, 20 Nonholonomic unicycle dynamics model, 272 Nonholonomic unicycle kinematics model, 272 Nonlinear adaptive control, 18–19, 60 Nonlinear ARMA (NARMA) models, 33, 47, 51 Nonlinear dynamic system, natural state space of, 64 Nonlinear forces, residual, 107 Nonlinear model, parametric form of, 105
Nonlinear multivariable systems, control of, 53–54 decoupling, 53–54 tracking, 53 Nonlinear restoring forces, 106 Nonlinear system(s) controllability of, 69 equation describing, 43 error models for, 36 tracking problem in, 57, 58 Nonlinear viscous dampers, 121–123 identification using artificial neural networks, 124 nonparametric neural networks, 123 parametric model, 121 Nonparametric component, 105–106
O Observability, neural networks and, 71 One-DOF agents, control law for, 267 Online adaptive control, 62 Online control algorithms, hysteretic system, 108 Online identification algorithm, structural system, 109 Open-loop optimal control, 78 Optimal control neural networks for, 72 open-loop, 78 theory, 93 Optimization dynamic, 74 problem deployment of sensor networks, 6 one-step, 81
P Parameter uncertainty, 103 Parametric component, 105 Parametric pure-feedback (PPF) system, 27 Parametric strict-feedback (PSF) system, 27 Partial differential equations (PDEs), 10, 440, see also PDEs, turbulent flow, backstepping controller for stabilization of solving of, 80 visuomotor pathway, 382 PBNs, see Probabilistic Boolean networks PDEs, see Partial differential equations PDEs, turbulent flow, backstepping controller for stabilization of, 10, 439–460 case k_x = 0, 455–459 controllers, 445–450
discussion, 459–460 model, 441–445 Navier–Stokes equations, 441 Orr–Sommerfeld and Squire equations, 440 Poiseuille parabolic equilibrium, 442 Riccati equations, 440 stability proof, 451–455 controlled wave numbers, 451–453 physical domain, 454–455 uncontrolled wave numbers, 453–454 Volterra operator, 440 Performance function, definition of, 79 index, gradient of, procedure for determining, 40 Persistent autonomous formations and cohesive motion control, 7, 247–275 acquisition and maintenance of persistence, 255–262 acquiring persistence, 255–259 maintaining persistence during formation reconfigurations, 260–262 persistent formation reconfiguration operations, 259–260 closing ranks, 259 cohesive motion of persistent formations, 263–265 acyclically led and cyclically led formations, 264–265 problem definition, 263–264 continuous-time agent models, 271 decentralized control of cohesive motion, 266–272 control design, 266–268 more complex agent models, 271–272 stability and convergence, 269–271 discussions and future directions, 273 edge splitting, 256 first follower, 265 leader, 265 metaformation framework, 261 minimal rigidity, 251 nonholonomic unicycle dynamics model, 272 rigid and persistent formations, 250–255 constraint-consistent and persistent formations, 252–255 rigid formations, 251–252 separation-separation control, 272 velocity integrator model, 250, 271 PID controller, see Proportional-integral-derivative controller Pioneer2 DX robot, 303
Plant control, MIMO system and, 25 discrete-time, 28 dynamics, approximation of, 56 simulation studies, 52 unstable, 28 evolving process of, 62 problem of, 49 Policy update index, 139 PolyBot, 301 Pontryagin’s maximum principle, 73 PPF system, see Parametric pure-feedback system Preconscious attention control, 91–92 Primal-based algorithm, 163, 168 Primal-dual algorithms, 210 Probabilistic Boolean networks (PBNs), 9, 340, 342 Proportional-integral-derivative (PID) controller, 87 PSF system, see Parametric strict-feedback system Pursuit game, 284
Q Q-learning, 5, 132 algorithm flowchart of, 147 model-free, 143 online implementation of, 146 -based autopilot controller design, 153 convergence of zero-sum game, 148 QoS, see Quality of service Quality of service (QoS), 283 Quasi-stationary methods, dynamic back propagation and, 41
R Radial basis function network (RBFN), 31, 85 Radio communication, see Wireless sensor networks, fair data gathering in Rate control protocol (RCP), 219, 241 RBFN, see Radial basis function network RCP, see Rate control protocol Real-time recurrent learning, 39, 41 Recurrent network(s), 32 adjustment of parameters in, 37, 39 laws derived for, 56 stability question of, 49 Recursive least-squares algorithm (RLS), 139 Reference model, controller parameters and, 26
Reinforced concrete subassembly, 120–121 cyclic testing of, 122 evolution of estimated parameters, 121, 122 nonlinear behavior of system, 120 Reinforcement learning, 81 Retrograde path equations (RPEs), 285 Riccati equations, 10, 132, 440 Rigid formation, 248, 251 RLS, see Recursive least-squares algorithm Robot(s), see also Multi-robot social group-based search algorithms; Multi-robot teams, framework for large-scale biped walking, 89 modeling and control of, 7 Roomba, 310 RPEs, see Retrograde path equations
S SAR robots, see Search and rescue robots SAR theory, see Search and rescue theory SDM, see Simple design model SDW, see Sliding detection window Search algorithm, distributed dual-based gradient, 6 Search and rescue (SAR) robots, 486 Search and rescue (SAR) theory, 11 Second-order system, description of, 77 Second-order terms, 117 Sensing radius, 192 Sensor networks, optimization problems in deployment of, 6, 179–202 bottleneck node, 188, 189 clusterheads, 200 communication cost, 181, 195, 197 coverage control problem, 180, 190 data collection point, 181 data delivery cost, 196 data rate, 183 dead node, 182 deployment with fixed data sources and no mobility, 182–189 incremental iterative approach, 187–189 problem decomposition, 186–187 deployment with unknown data sources and mobile nodes, 190–199 mission space and sensor model, 191 optimal coverage problem with communication costs, 195–199 optimal coverage problem formulation and distributed control, 191–194
gradient information, 194 inner force method, 186 link costs, 185 minimum-power topology, 184 mission space, 179, 191 neighbor sets, 193 nodes in, 180 performance of, 180 quality of service, 190 radio communication, 183 research issues, 199–201 sensing radius, 192 sensor network structure, 181–182 sleeping node, 182 switching control problem, 200 system intelligence, 201 virtual forces, 190 wireless transceiver power consumption, 196 Separation-separation control, 272 Series-parallel identification model, 34 Set-point regulation, 47 SHM, see Structural health monitoring SI engines, see Spark ignition engines Signal, tracking of arbitrary, 47 Simple design model (SDM), 121 Single-input single-output (SISO) systems, 17, 21 condition for observability, 21 equation describing, 21, 33, 51 representation, 22 tapped delay lines of, 33 transfer function of, 23 SISO systems, see Single-input single-output systems Sliding detection window (SDW), 383 Spark ignition (SI) engines, 86 Stability, proof of, 27 State vector model, 33 Steel, see Structural steel subassembly Strong observability, 71 Structural control civil infrastructure system, 100 goals of, 101 ultimate goal of research on, 101 Structural health monitoring (SHM), 100, 101 Structural identification, 101 Structural model, theoretical, 103 Structural steel subassembly, 118–119 cyclic testing of, 120 experimental measurements, 118 system stiffness, 119 use of hydraulic actuators on, 118
Structural systems, see also Building structures and bridges adaptation law, 110 algorithm learning rate, 110 criterion of system identification procedures, 103 development of mathematical models for, 128 identification, 102–103 hybrid approach, 104–117 hysteretic systems, 108 nonparametric methods, 102–103, 106 parametric methods, 102, 106 problem formulation, 108 types of uncertainty in, 103 multidegree-of-freedom, 104 nonlinear forces representation by Chebyshev series, 113 nonlinear force vector, 111 nonparametric methods, black-box approach to, 102 nonparametric nonlinear forces, 111 online identification algorithm, 109 orthogonal series expansion, 112 Structure(s) aerospace, inelastic restoring forces of, 108 traditional seismic design of, 102 unwanted motions of, control of, 101 Switching and tuning, 52, 90 System(s), see also specific types approximation, methods of, 34 general smooth, 67 interconnected, 54–55 theoretic properties, 21, 45 System of systems, vehicle assembly viewed as, 7
T Tangent bundle, definition of, 65 maps, 66 space, 65 Target capture, 9 gene, 341 points, tracking of, 179 tracking UAV technology and, 7 visuomotor pathway and, 9 Taylor expansion, multidimensional second-order, 117 TCP, see Transmission control protocol TPBVP, see Two-point boundary value problem
Tracking problem, state variables of, 58 Traffic networks, models of, 10 Transmission control protocol (TCP), 204, 215, 473 Transportation systems, 9–10, 407–437 action point models, 421 adaptive cruise control, 422 data collection and control in, 9–10 diffusion model, 413 discretized space–time model, 415 Gazis–Herman–Rothery model, 418 Greenberg model, 412 highway traffic flow control, 422–433 control strategies, 427–431 evaluation of HTFC system, 431–433 microscopic simulation, 424–427 system description and notation, 422–424 intelligent cruise control, 422 LWR model, 413 macroscopic traffic flow models, 409–417 mesoscopic traffic flow models, 421 microscopic traffic flow models, 417–421 asymmetric model, 418 deterministic optimal control model, 419–420 generalized linear car-following model, 418 Gipps model, 420 Helly’s model, 419 linear car-following model, 417–418 nonlinear car-following model, 418–419 psychophysical spacing model or action point model, 420–421 stochastic optimal control model, 420 model predictive control, 422 PW model, 415 relaxation parameter, 413 Trilateration graph, 262 Two-DOF agents, control law for, 267 Two-point boundary value problem (TPBVP), 76, 79
U UAVs, see Unmanned aerial vehicles Universal observability, 71 Unmanned aerial vehicles (UAVs), 7–8, 266, 277–295 autonomous control level chart, 280–281 avoidance strategy, 287 collision avoidance, 282 definitions of angles, 286 differential game theory, 284 distributed agent models, 293
distributed artificial intelligence, 279 formation control, 284–285 hierarchical multiagent system architecture, 283 Homeland security operations, 278 neural network construct, 292 new directions and technological challenges, 292–293 enabling technologies, 293 technological challenges, 292–293 particle filters, 290, 291 pursuit game, 284 retrograde path equations, 285 sensor field of view, 290 simulation, 289 simulation results, 288–290 stability, 284 system architecture, 280–283 system of systems control, 293 target tracking, 290–292 terrain coverage, 278 two-vehicle example, 285–288 video tracking problem, 290 Urban warfare, automated, 278, 282
V VCM actuator, see Voice-coil motor actuator Vector field(s) distribution of, 70 linear operator and, 65 Velocity integrator model, 250, 271 Virtual sensor, neural network model used as, 90 Virtual vector field, 266 Visual priming, 92 Visuomotor pathway, 9, 367–405 anti-Hebbian adaptation, 378 detection using nonlinear dynamics, 393–399 memory with two elements, 394–399 phase locking with network of Kuramoto models, 393–394 eigenvalue of noise process, 388 encoding cortical waves with β-strands using double KL decomposition, 383–387 expanding detection window, 383 generation of activity waves in visual cortex, 377–378 GENESIS, 375 Hebbian adaptation, 378 Karhunen–Loeve decomposition, 369, 383 Kuramoto model, 393, 398 lateral geniculate nucleus, 373, 377
multineuronal model of visual cortex, 374–377 partial differential equations, 382 principal eigenvectors, 382, 385 retino-cortical pathway, 370, 373 role of tectal waves in motor control, 399–402 simulation with sequence of stationary inputs, 379–383 sliding detection window, 383 statistical detection of position, 387–393 decoding with additive Gaussian white noise model, 389–393 hypothesis testing, 388–389 series expansion of sample functions of random processes, 387–388 visual streak, 373 Voice-coil motor (VCM) actuator, 85 Volterra–Wiener neural networks (VWNN), 114, 125–127 approximation capabilities of, 126 block diagram, 115 estimator, 116 filter, 116 learning capabilities in, 115 neural network weights, 125, 126 restoring forces of three-DOF system, 125, 129 Voronoi search algorithm, 11 VWNN, see Volterra–Wiener neural networks
W Walking, dynamic, 89 Weighted least-squares equation-error method, 107 Wireless sensor networks, fair data gathering in, 5–6, 161–177 bandwidth allocation, 169, 170, 175 dual-based algorithm, 174 dual-based approach, 170–176 distributed algorithm, 172 performance evaluation, 173 fair data gathering problem, 163–167 formulation of constrained optimization problem, 165–167 modeling wireless receiver bandwidth consumption, 163–165 Lagrange dual function, 171, 172 modeling, 162 nine-node topology, 166, 170 parent–child relationship, 164 primal-based distributed heuristic, 168–170
receiver bandwidth capacity, 167 resource-allocation algorithms, 162 source rate allocation, 173
X XCP, see Explicit congestion control protocol
Z Zero-DOF agents, control law for, 266 Zero-sum game(s) derivation of HDP for, 137 discrete-time-state feedback control for, 132 Q-learning, 148
P1: Binaya Dash November 17, 2007
15:50
7985
7985˙Color˙Pages
COLOR FIGURE 4.5 Convergence of the critic network parameters.
COLOR FIGURE 4.6 Convergence of the disturbance action network parameters.
COLOR FIGURE 4.7 Convergence of the control action network parameters.
COLOR FIGURE 4.8 State trajectories.
COLOR FIGURE 4.10 Online model-free convergence of Pi to P that solves the GARE.
COLOR FIGURE 4.11 Convergence of the disturbance action network parameters.
COLOR FIGURE 4.12 Convergence of the control action network parameters.
COLOR FIGURE 12.1 Kinematic analysis of turtle prey capture. Selected movie frames at the top of the figure show a turtle orienting to a moving fish (arrow) in frames 01 to 82, moving toward it (100 to 130), extending and turning its neck (133 to 135) and capturing the fish (138). The bottom image shows the digitization points of the kinematic analysis.
COLOR FIGURE 12.2 The visual pathway in the turtle visual system from eyes to visual cortex.
COLOR FIGURE 12.3 A traveling wave of cortical activity from the model cortex without Hebbian and anti-Hebbian adaptation.
COLOR FIGURE 12.4 Distribution of cells in each of the three layers of the turtle cortex projected on a plane. The lateral geniculate (LGN) cells are distributed linearly (shown at the right side of the bottom edge of the cortex) and the solid line shows how they interact with cells in the cortex.
COLOR FIGURE 12.7 Prey capture and motion extrapolation. (a) To capture a moving fish, a turtle must extrapolate its future position. (b) Probable neural substrate for motion extrapolation in turtles. (Abbreviations: LGN, lateral geniculate nucleus; PT, pretectum; RF, reticular formation; SN, substantia nigra).
COLOR FIGURE 12.8 Compartmental structures of cortical neuron models in the large-scale model of turtle visual cortex.
COLOR FIGURE 12.9 Pyramidal to pyramidal anti-Hebbian synaptic response to changes in the pyramidal activity. (1a): Frames of pyramidal cell activity due to pulse input to the LGN at 0 ms lasting for 150 ms. (1b): Frames of weight responses corresponding to the activities in (1a). (2a): Frames of pyramidal cell activity due to pulse input to the LGN at 400 ms following the first pulse lasting for 150 ms. (2b): Frames of synaptic weight responses corresponding to activities in (2a).
COLOR FIGURE 12.11 Two cortical waves generated by the model cortex using Hebbian and anti-Hebbian adaptation with two consecutive inputs. The second input is initialized at 500 ms.
COLOR FIGURE 12.14 The left-hand column shows the three principal spatial modes. The right-hand column shows the corresponding time coefficients.
COLOR FIGURE 12.15 The typical β-strands with double KL decomposition. In the left figure are the mean β-strands for the 60 presentations for stimuli presented at the left, center and right clusters of geniculate neurons in the first time period. In the right figure are the mean β-strands in the second time period. The colors, blue, red and green, represent the actual positions of stimuli at left, center and right clusters of geniculate neurons.
COLOR FIGURE 12.17 Decision spaces for the detection of three hypotheses. The coordinates are log likelihood ratios computed for five different time windows. On the left column are the decision spaces using the waves in the first time period and on the right column are the decision spaces using the waves in the second time period. The actual positions of stimuli at left, center and right clusters of geniculate neurons are encoded by the blue, red and green colors respectively.
COLOR FIGURE 12.20 Convergence of phase variables in the two-unit Kuramoto model in detection with linear maps from β-space to complex vector space using the first waves and the second waves for five different time windows. The actual positions of stimuli at left, center and right clusters of geniculate neurons are encoded by the blue, red and green colors respectively.
COLOR FIGURE 12.22 Convergence of phase variables in the two-unit Kuramoto model in detection with maps from points in decision space to complex vector space using the first waves and the second waves. The actual positions of stimuli at left, center and right clusters of geniculate neurons are encoded by the blue, red and green colors respectively.