Optimization and Dynamical Systems
Uwe Helmke1 John B. Moore2
2nd Edition
March 1996
1. Department of Mathematics, U...
27 downloads
1315 Views
2MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
Optimization and Dynamical Systems
Uwe Helmke1 John B. Moore2
2nd Edition
March 1996
1. Department of Mathematics, University of W¨ urzburg, D-97074 W¨ urzburg, Germany. 2. Department of Systems Engineering and Cooperative Research Centre for Robust and Adaptive Systems, Research School of Information Sciences and Engineering, Australian National University, Canberra, ACT 0200, Australia.
Preface This work is aimed at mathematics and engineering graduate students and researchers in the areas of optimization, dynamical systems, control systems, signal processing, and linear algebra. The motivation for the results developed here arises from advanced engineering applications and the emergence of highly parallel computing machines for tackling such applications. The problems solved are those of linear algebra and linear systems theory, and include such topics as diagonalizing a symmetric matrix, singular value decomposition, balanced realizations, linear programming, sensitivity minimization, and eigenvalue assignment by feedback control. The tools are those, not only of linear algebra and systems theory, but also of differential geometry. The problems are solved via dynamical systems implementation, either in continuous time or discrete time , which is ideally suited to distributed parallel processing. The problems tackled are indirectly or directly concerned with dynamical systems themselves, so there is feedback in that dynamical systems are used to understand and optimize dynamical systems. One key to the new research results has been the recent discovery of rather deep existence and uniqueness results for the solution of certain matrix least squares optimization problems in geometric invariant theory. These problems, as well as many other optimization problems arising in linear algebra and systems theory, do not always admit solutions which can be found by algebraic methods. Even for such problems that do admit solutions via algebraic methods, as for example the classical task of singular value decomposition, there is merit in viewing the task as a certain matrix optimization problem, so as to shift the focus from algebraic methods to geometric methods. It is in this context that gradient flows on manifolds appear as a natural approach to achieve construction methods that complement the existence and uniqueness results of geometric invari-
iv
Preface
ant theory. There has been an attempt to bridge the disciplines of engineering and mathematics in such a way that the work is lucid for engineers and yet suitably rigorous for mathematicians. There is also an attempt to teach, to provide insight, and to make connections, rather than to present the results as a fait accompli. The research for this work has been carried out by the authors while visiting each other’s institutions. Some of the work has been written in conjunction with the writing of research papers in collaboration with PhD students Jane Perkins and Robert Mahony, and post doctoral fellow Weiyong Yan. Indeed, the papers benefited from the book and vice versa, and consequently many of the paragraphs are common to both. Uwe Helmke has a background in global analysis with a strong interest in systems theory, and John Moore has a background in control systems and signal processing.
Acknowledgements This work was partially supported by grants from Boeing Commercial Airplane Company, the Cooperative Research Centre for Robust and Adaptive Systems, and the German-Israeli Foundation. The LATEXsetting of this manuscript has been expertly achieved by Ms Keiko Numata. Assistance came also from Mrs Janine Carney, Ms Lisa Crisafulli, Mrs Heli Jackson, Mrs Rosi Bonn, and Ms Marita Rendina. James Ashton, Iain Collings, and Jeremy Matson lent LATEXprogramming support. The simulations have been carried out by our PhD students Jane Perkins and Robert Mahony, and post-doctoral fellow Weiyong Yan. Diagrams have been prepared by Anton Madievski. Bob Mahony was also a great help in proof reading and polishing the manuscript. Finally, it is a pleasure to thank Anthony Bloch, Roger Brockett and Leonid Faybusovich for helpful discussions. Their ideas and insights had a crucial influence on our way of thinking.
Foreword By Roger W. Brockett Differential equations have provided the language for expressing many of the important ideas of automatic control. Questions involving optimization, dynamic compensation and stability are effectively expressed in these terms. However, as technological advances in VLSI reduce the cost of computing, we are seeing a greater emphasis on control problems that involve real-time computing. In some cases this means using a digital implementation of a conventional linear controller, but in other cases the additional flexibility that digital implementation allows is being used to create systems that are of a completely different character. These developments have resulted in a need for effective ways to think about systems which use both ordinary dynamical compensation and logical rules to generate feedback signals. Until recently the dynamic response of systems that incorporate if-then rules, branching, etc. has not been studied in a very effective way. From this latter point of view it is useful to know that there exist families of ordinary differential equations whose flows are such as to generate a sorting of the numerical values of the various components of the initial conditions, solve a linear programming problem, etc. A few years ago, I observed that a natural formulation of a steepest descent algorithm for solving a least-squares matching problem leads to a differential equation for carrying out such operations. During the course of some conversation’s with Anthony Bloch it emerged that there is a simplified version of the matching equations, obtained by recasting them as flows on the Lie algebra and then restricting them to a subspace, and that this simplified version can be used to sort lists as well. Bloch observed that the restricted equations are identical to the Toda lattice equations in the form introduced by Hermann
vi
Foreword
Flaschka nearly twenty years ago. In fact, one can see in J¨ urgen Moser’s early paper on the solution of the Toda Lattice, a discussion of sorting couched in the language of scattering theory and presumably one could have developed the subject from this, rather different, starting point. The fact that a sorting flow can be viewed as a gradient flow and that each simple Lie algebra defines a slightly different version of this gradient flow was the subject of a systematic analysis in a paper that involved collaboration with Bloch and Tudor Ratiu. The differential equations for matching referred to above are actually formulated on a compact matrix Lie group and then rewritten in terms of matrices that evolve in the Lie algebra associated with the group. It is only with respect to a particular metric on a smooth submanifold of this Lie algebra (actually the manifold of all matrices having a given set of eigenvalues) that the final equations appear to be in gradient form. However, as equations on this manifold, the descent equations can be set up so as to find the eigenvalues of the initial condition matrix. This then makes contact with the subject of numerical linear algebra and a whole line of interesting work going back at least to Rutishauser in the mid 1950s and continuing to the present day. In this book Uwe Helmke and John Moore emphasize problems, such as computation of eigenvalues, computation of singular values, construction of balanced realizations, etc. involving more structure than just sorting or solving linear programming problems. This focus gives them a natural vehicle to introduce and interpret the mathematical aspects of the subject. A recent Harvard thesis by Steven Smith contains a detailed discussion of the numerical performance of some algorithms evolving from this point of view. The circle of ideas discussed in this book have been developed in some other directions as well. Leonid Faybusovich has taken a double bracket equation as the starting point in a general approach to interior point methods for linear programming. Wing Wong and I have applied this type of thinking to the assignment problem, attempting to find compact ways to formulate a gradient flow leading to the solution. Wong has also examined similar methods for nonconvex problems and Jeffrey Kosowsky has compared these methods with other flow methods inspired by statistical physics. In a joint paper with Bloch we have investigated certain partial differential equation models for sorting continuous functions, i.e. generating the monotone equi-measurable rearrangements of functions, and Saveliev has examined a family of partial differential equations of the double bracket type, giving them a cosmological interpretation. It has been interesting to see how rapidly the literature in this area has grown. The present book comes at a good time, both because it provides a well reasoned introduction to the basic ideas for those who are curious
Foreword
vii
and because it provides a self-contained account of the work on balanced realizations worked out over the last few years by the authors. Although I would not feel comfortable attempting to identify the most promising direction for future work, all indications are that this will continue to be a fruitful area for research.
Contents Preface
iii
Foreword 1 Matrix Eigenvalue Methods 1.1 Introduction . . . . . . . . . . . . . . . . 1.2 Power Method for Diagonalization . . . 1.3 The Rayleigh Quotient Gradient Flow . 1.4 The QR Algorithm . . . . . . . . . . . . 1.5 Singular Value Decomposition (SVD) . . 1.6 Standard Least Squares Gradient Flows
v
. . . . . .
1 1 5 14 30 35 38
2 Double Bracket Isospectral Flows 2.1 Double Bracket Flows for Diagonalization . . . . . . . . . . 2.2 Toda Flows and the Riccati Equation . . . . . . . . . . . . 2.3 Recursive Lie-Bracket Based Diagonalization . . . . . . . .
43 43 58 68
3 Singular Value Decomposition 3.1 SVD via Double Bracket Flows . . . . . . . . . . . . . . . . 3.2 A Gradient Flow Approach to SVD . . . . . . . . . . . . . .
81 81 84
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
4 Linear Programming 101 4.1 The Rˆ ole of Double Bracket Flows . . . . . . . . . . . . . . 101 4.2 Interior Point Flows on a Polytope . . . . . . . . . . . . . . 111 4.3 Recursive Linear Programming/Sorting . . . . . . . . . . . 117 5 Approximation and Control
125
x
Contents
5.1 5.2 5.3
Approximations by Lower Rank Matrices . . . . . . . . . . 125 The Polar Decomposition . . . . . . . . . . . . . . . . . . . 143 Output Feedback Control . . . . . . . . . . . . . . . . . . . 146
6 Balanced Matrix Factorizations 6.1 Introduction . . . . . . . . . . . . . . . . . 6.2 Kempf-Ness Theorem . . . . . . . . . . . 6.3 Global Analysis of Cost Functions . . . . 6.4 Flows for Balancing Transformations . . . 6.5 Flows on the Factors X and Y . . . . . . 6.6 Recursive Balancing Matrix Factorizations
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
163 163 165 168 172 186 193
7 Invariant Theory and System Balancing 7.1 Introduction . . . . . . . . . . . . . . . . 7.2 Plurisubharmonic Functions . . . . . . . 7.3 The Azad-Loeb Theorem . . . . . . . . 7.4 Application to Balancing . . . . . . . . . 7.5 Euclidean Norm Balancing . . . . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
201 201 204 207 209 220
8 Balancing via Gradient Flows 8.1 Introduction . . . . . . . . . . . . . . . 8.2 Flows on Positive Definite Matrices . . 8.3 Flows for Balancing Transformations . 8.4 Balancing via Isodynamical Flows . . 8.5 Euclidean Norm Optimal Realizations
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
229 229 231 246 249 258
9 Sensitivity Optimization 9.1 A Sensitivity Minimizing Gradient Flow . . 9.2 Related L2 -Sensitivity Minimization Flows . 9.3 Recursive L2 -Sensitivity Balancing . . . . . 9.4 L2 -Sensitivity Model Reduction . . . . . . . 9.5 Sensitivity Minimization with Constraints .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
269 269 282 291 295 298
A Linear Algebra A.1 Matrices and Vectors . . . . . . . . . . . . . . . . . . . A.2 Addition and Multiplication of Matrices . . . . . . . . A.3 Determinant and Rank of a Matrix . . . . . . . . . . . A.4 Range Space, Kernel and Inverses . . . . . . . . . . . . A.5 Powers, Polynomials, Exponentials and Logarithms . . A.6 Eigenvalues, Eigenvectors and Trace . . . . . . . . . . A.7 Similar Matrices . . . . . . . . . . . . . . . . . . . . . A.8 Positive Definite Matrices and Matrix Decompositions
. . . . . . . .
. . . . . . . .
. . . . . . . .
311 311 312 312 313 314 314 315 316
. . . . .
Contents
A.9 Norms of Vectors and Matrices A.10 Kronecker Product and Vec . . A.11 Differentiation and Integration A.12 Lemma of Lyapunov . . . . . . A.13 Vector Spaces and Subspaces . A.14 Basis and Dimension . . . . . . A.15 Mappings and Linear Mappings A.16 Inner Products . . . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
xi
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
317 318 318 319 319 320 320 321
B Dynamical Systems B.1 Linear Dynamical Systems . . . . . . . . . . B.2 Linear Dynamical System Matrix Equations B.3 Controllability and Stabilizability . . . . . . B.4 Observability and Detectability . . . . . . . B.5 Minimality . . . . . . . . . . . . . . . . . . B.6 Markov Parameters and Hankel Matrix . . B.7 Balanced Realizations . . . . . . . . . . . . B.8 Vector Fields and Flows . . . . . . . . . . . B.9 Stability Concepts . . . . . . . . . . . . . . B.10 Lyapunov Stability . . . . . . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
323 323 325 325 326 327 327 328 329 330 332
C Global Analysis C.1 Point Set Topology . . . . . . . . . . . . . . . . C.2 Advanced Calculus . . . . . . . . . . . . . . . . C.3 Smooth Manifolds . . . . . . . . . . . . . . . . C.4 Spheres, Projective Spaces and Grassmannians C.5 Tangent Spaces and Tangent Maps . . . . . . . C.6 Submanifolds . . . . . . . . . . . . . . . . . . . C.7 Groups, Lie Groups and Lie Algebras . . . . . C.8 Homogeneous Spaces . . . . . . . . . . . . . . . C.9 Tangent Bundle . . . . . . . . . . . . . . . . . . C.10 Riemannian Metrics and Gradient Flows . . . . C.11 Stable Manifolds . . . . . . . . . . . . . . . . . C.12 Convergence of Gradient Flows . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
335 335 338 341 343 346 349 351 355 357 360 362 365
References
367
Author Index
389
Subject Index
395
CHAPTER
1
Matrix Eigenvalue Methods 1.1 Introduction Optimization techniques are ubiquitous in mathematics, science, engineering, and economics. Least Squares methods date back to Gauss who developed them to calculate planet orbits from astronomical measurements. So we might ask: “What is new and of current interest in Least Squares Optimization?” Our curiosity to investigate this question along the lines of this work was first aroused by the conjunction of two “events”. The first “event” was the realization by one of us that recent results from geometric invariant theory by Kempf and Ness had important contributions to make in systems theory, and in particular to questions concerning the existence of unique optimum least squares balancing solutions. The second “event” was the realization by the other author that constructive procedures for such optimum solutions could be achieved by dynamical systems. These dynamical systems are in fact ordinary matrix differential equations (ODE’s), being also gradient flows on manifolds which are best formulated and studied using the language of differential geometry. Indeed, the beginnings of what might be possible had been enthusiastically expounded by Roger Brockett (Brockett, 1988) He showed, quite surprisingly to us at the time, that certain problems in linear algebra could be solved by calculus. Of course, his results had their origins in earlier work dating back to that of Fischer (1905), Courant (1922) and von Neumann (1937), as well as that of Rutishauser in the 1950s (1954; 1958). Also there were parallel efforts in numerical analysis by Chu (1988). We also mention
2
Chapter 1. Matrix Eigenvalue Methods
the influence of the ideas of Hermann (1979) on the development of applications of differential geometry in systems theory and linear algebra. Brockett showed that the tasks of diagonalizing a matrix, linear programming, and sorting, could all be solved by dynamical systems, and in particular by finding the limiting solution of certain well behaved ordinary matrix differential equations. Moreover, these construction procedures were actually mildly disguised solutions to matrix least squares minimization problems. Of course, we are not used to solving problems in linear algebra by calculus, nor does it seem that matrix differential equations are attractive for replacing linear algebra computer packages. So why proceed along such lines? Here we must look at the cutting edge of current applications which result in matrix formulations involving quite high dimensional matrices, and the emergent computer technologies with distributed and parallel processing such as in the connection machine, the hypercube, array processors, systolic arrays, and artificial neural networks. For such “neural” network architectures, the solutions of high order nonlinear matrix differential (or difference) equations is not a formidable task, but rather a natural one. We should not exclude the possibility that new technologies, such as charge coupled devices, will allow N digital additions to be performed simultaneously rather than in N operations. This could bring about a new era of numerical methods, perhaps permitting the dynamical systems approach to optimization explored here to be very competitive. The subject of this book is currently in an intensive state of development, with inputs coming from very different directions. Starting from the seminal work of Khachian and Karmarkar, there has been a lot of progress in developing interior point algorithms for linear programming and nonlinear programming, due to Bayer, Lagarias and Faybusovich, to mention a few. In numerical analysis there is the work of Kostant, Symes, Deift, Chu, Tomei and others on the Toda flow and its connection to completely integrable Hamiltonian systems. This subject also has deep connections with torus actions and symplectic geometry. Starting from the work of Brockett, there is now an emerging theory of completely integrable gradient flows on manifolds which is developed by Bloch, Brockett, Flaschka and Ratiu. We also mention the work of Bloch on least squares estimation with relation to completely integrable Hamiltonian systems. In our own work we have tried to develop the applications of gradient flows on manifolds to systems theory, signal processing and control theory. In the future we expect more applications to optimal control theory. We also mention the obvious connections to artificial neural networks and nonlinear approximation theory. In all these research directions, the development is far from being complete and a definite picture has not yet appeared. It has not been our intention, nor have we been able to cover thoroughly
1.1. Introduction
3
all these recent developments. Instead, we have tried to draw the emerging interconnections between these different lines of research and to raise the reader’s interests in these fascinating developments. Our window on these developments, to which we invite the reader to share, is of course our own research of recent years. We see then that a dynamical systems approach to optimization is rather timely. Where better to start than with least squares optimization? The first step for the approach we take is to formulate a cost function which when minimized over the constraint set would give the desired result. The next step is to formulate a Riemannian metric on the tangent space of the constraint set, viewed as a manifold, such that a gradient flow on the manifold can be readily implemented, and such that the flow converges to the desired algebraic solution. We do not offer a systematic approach for achieving any “best” selections of the metric, but rather demonstrate the approach by examples. In the first chapters of the monograph, these examples will be associated with fundamental and classical tasks in linear algebra and in linear system theory, therefore representing more subtle, rather than dramatic, advances. In the later chapters new problems, not previously addressed by any complete theory, are tackled. Of course, the introduction of methods from the theory of dynamical systems to optimization is well established, as in modern analysis of the classical steepest descent gradient techniques and the Newton method. More recently, feedback control techniques are being applied to select the step size in numerical integration algorithms. There are interesting applications of optimization theory to dynamical systems in the now well established field of optimal control and estimation theory. This book seeks to catalize further interactions between optimization and dynamical systems. We are familiar with the notion that Riccati equations are often the dynamical systems behind many least squares optimization tasks, and engineers are now comfortable with implementing Riccati equations for estimation and control. Dynamical systems for other important matrix least squares optimization tasks in linear algebra, systems theory, sensitivity optimization, and inverse eigenvalue problems are studied here. At first encounter these may appear quite formidable and provoke caution. On closer inspection, we find that these dynamical systems are actually Riccati-like in behaviour and are often induced from linear flows. Also, it is very comforting that they are exponentially convergent, and converge to the set of global optimal solutions to the various optimization tasks. It is our prediction that engineers will become familiar with such equations in the decades to come. To us the dynamical systems arising in the various optimization tasks studied in this book have their own intrinsic interest and appeal. Although
4
Chapter 1. Matrix Eigenvalue Methods
we have not tried to create the whole zoo of such ordinary differential equations (ODE’s), we believe that an introductory study is useful and worthy of such optimizing flows. At this stage it is too much to expect that each flow be competitive with the traditional numerical methods where these apply. However, we do expect applications in circumstances not amenable to standard numerical methods, such as in adaptive, stochastic, and time-varying environments, and to optimization tasks as explored in the later parts of this book. Perhaps the empirical tricks of modern numerical methods, such as the use of the shift strategies in the QR algorithm, can accelerate our gradient flows in discrete time so that they will be more competitive than they are now. We already have preliminary evidence for this in our own research not fully documented here. In this monograph, we first study the classical methods of diagonalizing a symmetric matrix, giving a dynamical systems interpretation to the methods. The power method, the Rayleigh quotient method, the QR algorithm, and standard least squares algorithms which may include constraints on the variables, are studied in turn. This work leads to the gradient flow equations on symmetric matrices proposed by Brockett involving Lie brackets. We term these equations double bracket flows, although perhaps the term Lie-Brockett could have some appeal. These are developed for the primal task of diagonalizing a real symmetric matrix. Next, the related exercise of singular value decomposition (SVD) of possibly complex nonsymmetric matrices is explored by formulating the task so as to apply the double bracket equation. Also a first principles derivation is presented. Various alternative and related flows are investigated, such as those on the factors of the decompositions. A mild generalization of SVD is a matrix factorization where the factors are balanced. Such factorizations are studied by similar gradient flow techniques, leading into a later topic of balanced realizations. Double bracket flows are then applied to linear programming. Recursive discretetime versions of these flows are proposed and rapproachement with earlier linear algebra techniques is made. Moving on from purely linear algebra applications for gradient flows, we tackle balanced realizations and diagonal balancing as topics in linear system theory. This part of the work can be viewed as a generalization of the results for singular value decompositions. The remaining parts of the monograph concerns certain optimization tasks arising in signal processing and control. In particular, the emphasis is on quadratic index minimization. For signal processing the parameter sensitivity costs relevant for finite-word-length implementations are minimized. Also constrained parameter sensitivity minimization is covered so as to cope with scaling constraints in a digital filter design. For feedback control, the quadratic indices are those usually used to achieve trade-offs
1.2. Power Method for Diagonalization
5
between control energy and regulation or tracking performance. Quadratic indices are also used for eigenvalue assignment. For all these various tasks, the geometry of the constraint manifolds is important and gradient flows on these manifolds are developed. Digressions are included in the early chapters to cover such topics as Projective Spaces, Riemannian Metrics, Gradient Flows, and Lie groups. Appendices are included to cover the relevant basic definitions and results in linear algebra, dynamical systems theory, and global analysis including aspects of differential geometry.
1.2 Power Method for Diagonalization In this chapter we review some of the standard tools in numerical linear algebra for solving matrix eigenvalue problems. Our interest in such methods is not to give a concise and complete analysis of the algorithms, but rather to demonstrate that some of these algorithms arise as discretizations of certain continuous-time dynamical systems. This work then leads into the more recent matrix eigenvalue methods of Chapter 2, termed double bracket flows, which also have application to linear programming and topics of later chapters. There are excellent textbooks available where the following standard methods are analyzed in detail; one choice is Golub and Van Loan (1989).
The Power Method The power method is a particularly simple iterative procedure to determine a dominant eigenvector of a linear operator. Its beauty lies in its simplicity rather than its computational efficiency. Appendix A gives background material in matrix results and in linear algebra. Let A : Cn → Cn be a diagonalizable linear map with eigenvalues λ1 , . . . , λn and eigenvectors v1 , . . . , vn . For simplicity let us assume that A is nonsingular and λ1 , . . . , λn satisfy |λ1 | > |λ2 | ≥ · · · ≥ |λn |. We then say that λ1 is a dominant eigenvalue and v1 a dominant eigenvector . Let x =
n
1/2 2
|xi |
(2.1)
i=1
denote the standard Euclidean norm of Cn . For any initial vector x0 of Cn with x0 = 1, we consider the infinite normalized Krylov-sequence (xk ) of
6
Chapter 1. Matrix Eigenvalue Methods
unit vectors of Cn defined by the discrete-time dynamical system xk =
Ak x0 Axk−1 = , Axk−1 Ak x0
k ∈ N.
(2.2)
Appendix B gives some background results for dynamical systems. Since the growth rate of the component of Ak x0 corresponding to the eigenvector v1 dominates the growth rates of the other components we would expect that (xk ) converges to the dominant eigenvector of A: lim xk = λ
k→∞
v1 v1
for some λ ∈ C with |λ| = 1.
(2.3)
Of course, if x0 is an eigenvector of A so is xk for all k ∈ N. Therefore we would expect (2.3) to hold only for generic initial conditions, that is for almost all x0 ∈ Cn . This is indeed quite true; see Golub and Van Loan (1989), Parlett and Poole (1973). Example 2.1 Let
1 0 A= 0 2
1 1 . with x0 = √ 2 1
1 1 xk = √ , 2k 2k 1+2 which converges to 01 for k → ∞. Then
k ≥ 1,
Example 2.2 Let 1 A= 0 Then
0 , −2
1 1 . x0 = √ 2 1
1 1 xk = √ k , 1 + 22k (−2)
and the sequence (xk | k ∈ N) has 0 0 , 1 −1 as a limit set, see Figure 2.1.
k ∈ N,
1.2. Power Method for Diagonalization
x1
7
xk
x0 0
1
FIGURE 2.1. Power estimates of
0 1
In both of the above examples, the
power method defines a discretedynamical system on the unit circle (a, b) ∈ R2 | a2 + b2 = 1 ak ak+1 1 = 2 (2.4) bk+1 ak + 4b2k 2εbk with initial conditions a20 + b20 = 1, and ε =±1. In the first case, ε = 1, all a0 1 solutions (except for those where = ± b0 0 ) of (2.4) converge to either 0 0 or to − depending on whether b > 0 or b0 < 0. In the second case, 0 1 1 ε = −1, no solution with b0 = 0 converges; oscillate 0 in fact the solutions 0 infinitely often between the vicinity of and that of , approaching 1 −1
0 0 closer and closer. 1 , −1 The dynamics of the power iterations (2.2) become particularly transparent if eigenspaces instead of eigenvectors are considered. This leads to the interpretation of the power method as a discrete-time dynamical system on the complex projective space; see Ammar and Martin (1986) and Parlett and Poole (1973). We start by recalling some elementary terminology and facts about projective spaces, see also Appendix C on differential geometry. Digression: Projective Spaces and Grassmann Manifolds The n-dimensional complex projective space CPn is defined as the set of all one-dimensional complex linear subspaces of Cn+1 . Likewise, the n-dimensional real projective space RPn is the set of all one-dimensional real linear subspaces of Rn+1 . Thus, using stereographic projection, RP1 can be depicted as the unit circle in R2 . In a similar way, CP1 coincides with the
8
Chapter 1. Matrix Eigenvalue Methods
R .
0
1
IRIP1
CIP1
FIGURE 2.2. Circle RP1 and Riemann Sphere CP1 familiar Riemann sphere C ∪ {∞} consisting of the complex plane and the point at infinity, as illustrated in Figure 2.2. Since any one-dimensional complex subspace of Cn+1 is generated by a unit vector in Cn+1 , and since any two unit row vectors z = (z0 , . . . , zn ), w = (w0 , . . . , wn ) of Cn+1 generate the same complex line if and only if (w0 , . . . , wn ) = (λz0 , . . . , λzn ) for some λ ∈ C, |λ| = 1, one can identify CPn with the set of equivalence classes [z0 : · · · : zn ] = {(λz0 , . . . , λzn ) | λ ∈ C, |λ| = 1} for unit vectors (z0 , . . . , zn ) of Cn+1 . Here z0 , . . . , zn are called the homogeneous coordinates for the complex line [z0 : · · · : zn ]. Similarly, we denote by [z0 : · · · : zn ] the complex line which is, generated by an arbitrary nonzero vector (z0 , . . . , zn ) ∈ Cn+1 . Now , let H1 (n + 1) denote the set of all one-dimensional Hermitian projection operators on Cn+1 . Thus H ∈ H1 (n + 1) if and only if H = H ∗,
H 2 = H,
rank H = 1.
(2.5)
By the spectral theorem every H ∈ H1 (n + 1) is of the form H = x · x∗ for a unit column vector x = (x0 , . . . , xn ) ∈ Cn+1 . The map f : H1 (n + 1) →CPn H →Image of H = [x0 : · · · : xn ]
(2.6)
is a bijection and we can therefore identify the set of rank one Hermitian projection operators on Cn+1 with the complex projective space CPn . This is what we refer to as the isospectral picture of the projective space. Note
1.2. Power Method for Diagonalization
9
that this parametrization of the projective space is not given as a collection of local coordinate charts but rather as a global algebraic representation. For our purposes such global descriptions are of more interest than the local coordinate chart descriptions. Similarly, the complex Grassmann manifold GrassC (k, n + k) is defined as the set of all k-dimensional complex linear subspaces of Cn+k . If Hk (n + k) denotes the set of all Hermitian projection operators H of Cn+k with rank k H = H ∗,
H 2 = H,
rank H = k,
then again, by the spectral theorem, every H ∈ Hk (n + k) is of the form H = X · X ∗ for a complex (n + k) × k-matrix X satisfying X ∗ X = Ik . The map f : Hk (n + k) → GrassC (k, n + k)
(2.7)
f (H) = image (H) = column space of X
(2.8)
defined by
is a bijection. The spaces GrassC (k, n + k) and CPn = GrassC (1, n + 1) are compact complex manifolds of (complex) dimension kn and n, respectively. Again, we refer to this as the isospectral picture of Grassmann manifolds. In the same way the real Grassmann manifolds GrassR (1, n + 1) = RPn and GrassR (k, n + k) are defined; i.e. GrassR (k, n + k) is the set of all kdimensional real linear subspaces of Rn+k of dimension k.
Power Method as a Dynamical System We can now describe the power method as a dynamical system on the complex projective space CPn−1 . Given a complex linear operator A : Cn → Cn with det A = 0, it induces a map on the complex projective space CPn−1 denoted also by A: A : CPn−1 →CPn−1 , →A · ,
(2.9)
which maps every one-dimensional complex vector space ⊂ Cn to the image A · of under A. The main convergence result on the power method can now be stated as follows: Theorem 2.3 Let A be diagonalizable with eigenvalues λ1 , . . . , λn satisfying |λ1 | > |λ2 | ≥ · · · ≥ |λn |. For almost all complex lines 0 ∈ CPn−1
10
Chapter 1. Matrix Eigenvalue Methods
the sequence Ak · 0 | k ∈ N of power estimates converges to the dominant eigenspace of A: lim Ak · 0 = dominant eigenspace of A.
k→∞
Proof 2.4 Without loss of generality we can assume that A = diag (λ1 , . . . , λn ). Let 0 ∈ CPn−1 be any complex line of Cn , with homogeneous coordinates of the form 0 = [1 : x2 : · · · : xn ]. Then Ak · 0 = λk1 : λk2 x2 : · · · : λkn xn k k λ2 λn = 1: x2 : · · · : xn , λ1 λ1 which converges to the dominant eigenspace ∞ = [1 : 0 : · · · : 0], since λi < 1 for i = 2, . . . , n. λ1 An important insight into the power method is that it is closely related to the Riccati equation. Let F ∈ Cn×n be partitioned by F11 F12 F = , F21 F22 where F11 , F12 are 1×1 and 1×(n − 1) matrices and F21 , F22 are (n − 1)×1 and (n − 1) × (n − 1). The matrix Riccati differential equation on C(n−1)×1 is then for K ∈ C(n−1)×1 : K˙ = F21 + F22 K − KF11 − KF12 K,
K (0) ∈ C(n−1)×1 . (2.10)
Given any solution K (t) = (K1 (t) , . . . , Kn−1 (t)) of (2.10), it defines a corresponding curve (t) = [1 : K1 (t) : · · · : Kn−1 (t)] ∈ CPn−1 in the complex projective space CPn−1 . Let etF denote the matrix exponential of tF and let etF : CPn−1 →CPn−1 →etF ·
(2.11)
denote the associated flow on CPn−1 . Then it is easy to show and in fact well known that if (t) = etF · 0
(2.12)
1.2. Power Method for Diagonalization
11
and (t) = [0 (t) : 1 (t) : · · · : n−1 (t)], then
K (t) = (1 (t) /0 (t) , . . . , n−1 (t) /0 (t))
is a solution of the Riccati equation (2.10), as long as 0 (t) = 0, and conversely. There is therefore a one-to-one correspondence between solutions of the Riccati equation and curves t → etF · in the complex projective space. The above correspondence between matrix Riccati differential equations and systems of linear differential equations is of course well known and is the basis for explicit solution methods for the Riccati equation. In the scalar case the idea is particularly transparent. Example 2.5 Let x (t) be a solution of the scalar Riccati equation a = 0.
x˙ = ax2 + bx + c, Set y (t) = e
−a
t t0
x(τ )dτ
so that we have y˙ = −axy. Then a simple computation shows that y (t) satisfies the second-order equation y¨ = by˙ − acy. Conversely if y (t) satisfies y¨ = αy˙ +βy then x (t) = the Riccati equation x˙ = −x2 + αx + β.
y(t) ˙ y(t) ,
y (t) = 0, satisfies
With this in mind, we can now easily relate the power method to the Riccati equation. Let F ∈ Cn×n be a matrix logarithm of a nonsingular matrix A eF = A,
log A = F.
(2.13)
Then k ∈ N, Ak = ekF and hence Ak · 0 | k ∈ N =
kF for any integer e · 0 | k ∈ N . Thus the power method for A is identical with constant time sampling of the flow t → (t) = etF · 0 on CPn−1 at times t = 0, 1, 2, . . . . Thus, Corollary 2.6 The power method for nonsingular matrices A corresponds to an integer time sampling of the Riccati equation with F = log A. The above results and remarks have been for an arbitrary invertible complex n × n matrix A. In that case a logarithm F of A exists and we
12
Chapter 1. Matrix Eigenvalue Methods
can relate the power method to the Riccati equation. If A is an invertible Hermitian matrix, a Hermitian logarithm does not exist in general, and matters become a bit more complex. In fact, by the identity det eF = etr F ,
(2.14)
F being Hermitian implies that det eF > 0. We have the following lemma. Lemma 2.7 Let A ∈ Cn×n be an invertible Hermitian matrix. Then there exists a pair of commuting Hermitian matrices S, F ∈ Cn×n with S 2 = I and A = SeF . Also, A and S have the same signatures. Proof 2.8 Without loss of generality we can assume A is a real diagonal matrix
diag (λ1 , . .. , λn ). Define F := diag (log |λ1 | , . . . , log |λn |) and S := diag |λλ11 | , . . . , |λλnn | . Then S 2 = I and A = SeF . Applying the lemma to the Hermitian matrix A yields
F k for k even ekF k k kF =S e = A = Se SekF for k odd.
(2.15)
We see that, for k even, the k-th power estimate Ak · l0 is identical with the solution of the Riccati equation for the Hermitian logarithm F = log A at time k. A straightforward extension of the above approach is in studying power iterations to determine the dominant k-dimensional eigenspace of A ∈ Cn×n . The natural geometric setting is that as a dynamical system on the Grassmann manifold GrassC (k, n). This parallels our previous development. Thus let A ∈ Cn×n denote an invertible matrix, it induces a map on the Grassmannian GrassC (k, n) denoted also by A: A : GrassC (k, n) → GrassC (k, n) V →A · V which maps any k-dimensional complex subspace V ⊂ Cn onto its image A · V = A (V ) under A. We have the following convergence result. Theorem 2.9 Assume |λ1 | ≥ · · · ≥ |λk | > |λk+1 | ≥ · · · ≥ |λn |. For almost all V ∈ GrassC (k, n) the sequence (Aν · V | ν ∈ N) of power estimates converges to the dominant k-dimensional eigenspace of A: lim Aν · V = the k-dimensional dominant eigenspace of A.
ν→∞
Proof 2.10 Without loss of generality we can assume that A = diag (Λ1 , Λ2 ) with Λ1 = diag (λ1 , . . . , λk ), Λ2 = diag (λk+1 , . . . , λn ). Given
1.2. Power Method for Diagonalization
13
a full rank matrix X ∈ Cn×k , let [X] ⊂ Cn denote the k-dimensional subspace of Cn which is spanned by the columns of X. Thus [X] ∈ 1 ∈ Grass GrassC (k, n). Let V = X C (k, n) satisfy the genericity conX2 k×k dition det (X1 ) = 0, with X1 ∈ C , X2 ∈ C(n−k)×k . Thus Λν1 · X1 ν A ·V = Λν2 · X2 Ik = . Λν2 X2 X1−1 Λ−ν 1 Estimating the 2-norm of Λν2 X2 X1−1 Λ−ν 1 gives ν Λ2 X2 X −1 Λ−ν ≤ Λν2 Λ−ν X2 X −1 1 1 1 1 ν |λk+1 | X2 X −1 ≤ 1 |λk | ν = |λk |−ν for diagonal matrices. Thus since Λν2 = |λk+1 | , and Λ−ν 1 −1 −ν ν ν Λ 2 X1 Λ1 converges to the zero matrix and hence A · V converges to 2IX k , that is to the k-dimensional dominant eigenspace of A. 0 Proceeding in the same way as before, we can relate the power iterations for the k-dimensional dominant eigenspace of A to the Riccati equation with F = log A. Briefly, let F ∈ Cn×n be partitioned as F11 F12 F = F21 F22 where F11 , F12 are k×k and k×(n − k) matrices and F21 , F22 are (n − k)×k and (n − k) × (n − k). The matrix Riccati equation on C(n−k)×k is then given by (2.10) with K (0) and K (t) complex (n − k) × k matrices. Any (n − k) × k matrix solution K (t) of (2.10) defines a corresponding curve Ik ∈ GrassC (k, n + k) V (t) = K (t) in the Grassmannian and V (t) = etF · V (0) holds for all t. Conversely, if V (t) = etF · V (0) with V (t) = X2 (t) X1−1 (t)
, X2 ∈ C then K (t) = X1 ∈ C matrix Riccati equation, as long as det X1 (t) = 0. k×k
(n−k)×k
X1 (t) X2 (t)
,
is a solution of the
14
Chapter 1. Matrix Eigenvalue Methods
Again the power method for finding the k-dimensional dominant eigenspace of A corresponds to integer time sampling of the (n − k) × k matrix Riccati equation defined for F = log A. Problem 2.11 A differential equation x˙ = f (x) is said to have finite escape time if there exists a solution x (t) for some initial condition x (t0 ) such that x (t) is not defined for all t ∈ R. Prove (a) The scalar Riccati equation x˙ = x2 has finite escape time. (b) x˙ = ax2 + bx + c, a = 0, has finite escape time.
Main Points of Section The power method for determining a dominant eigenvector of a linear nonsingular operator A : Cn → Cn has a dynamical system interpretation. It is in fact a discrete-time dynamical system on a complex projective space CPn−1 , where the n-dimensional complex projective space CPn is the set of all one dimensional complex linear subspaces of Cn+1 . The power method is closely related to a quadratic continuous-time matrix Riccati equation associated with a matrix F satisfying eF = A. It is in fact a constant period sampling of this equation. Grassmann manifolds are an extension of the projective space concept and provide the natural geometric setting for studying power iterations to determine the dominant k-dimensional eigenspace of A.
1.3 The Rayleigh Quotient Gradient Flow Let A ∈ Rn×n be real symmetric with eigenvalues λ1 ≥ · · · ≥ λn and corresponding eigenvectors v1 , . . . , vn . The Rayleigh quotient of A is the smooth function rA : Rn − {0} → R defined by rA (x) =
x Ax x2
.
(3.1)
Let S n−1 = {x ∈ Rn | x = 1} denote the unit sphere in Rn . Since rA (tx) = rA (x) for all positive real numbers t > 0, it suffices to consider the Rayleigh quotient on the sphere where x = 1. The following theorem is a special case of the celebrated Courant-Fischer minimax theorem characterizing the eigenvalues of a real symmetric n × n matrix, see Golub and Van Loan (1989).
1.3. The Rayleigh Quotient Gradient Flow
15
Theorem 3.1 Let λ1 and λn denote the largest and smallest eigenvalue of a real symmetric n × n-matrix A respectively. Then λ1 = max rA (x) ,
(3.2)
λn = min rA (x) .
(3.3)
x=1 x=1
More generally, the critical points and critical values of rA are the eigenvectors and eigenvalues for A. Proof 3.2 Let rA : S n−1 → R denote the restriction of the Rayleigh quotient on the (n − 1)-sphere. For any unit vector x ∈ S n−1 the Fr´echetderivative of rA is the linear functional DrA |x : Tx S n−1 → R defined on the tangent space Tx S n−1 of S n−1 at x as follows. The tangent space is Tx S n−1 = {ξ ∈ Rn | x ξ = 0} ,
(3.4)
DrA |x (ξ) = 2 Ax, ξ = 2x Aξ.
(3.5)
and
Hence x is a critical point of rA if and only if DrA |x (ξ) = 0, or equivalently, x ξ = 0 =⇒ x Aξ = 0, or equivalently, Ax = λx for some λ ∈ R. Left multiplication by x implies λ = Ax, x . Thus the critical points and critical values of rA are the eigenvectors and eigenvalues of A respectively. The result follows. This proof uses concepts from calculus, see Appendix C. The approach taken here is used to derive many subsequent results in the book. The arguments used are closely related to those involving explicit use of Lagrange multipliers, but here the use of such is implicit. From the variational characterization of eigenvalues of A as the critical values of the Rayleigh quotient, it seems natural to apply gradient flow techniques in order to search for the dominant eigenvector. We now include a digression on these techniques, see also Appendix C. Digression: Riemannian Metrics and Gradient Flows Let M be a smooth manifold and let T M andT ∗ M denote its tangent and cotangent bundle, respectively. Thus T M = x∈M Tx M is the set theoretic disjoint
16
Chapter 1. Matrix Eigenvalue Methods
union of all tangent spaces Tx M of M while T ∗ M = x∈M Tx∗ M denotes the disjoint union of all cotangent spaces Tx∗ M = Hom (Tx M, R) (i.e. the dual vector spaces of Tx M ) of M . A Riemannian metric on M then is a family of nondegenerate inner products , x , defined on each tangent space Tx M , such that , x depends smoothly on x ∈ M . Once a Riemannian metric is specified, M is called a Riemannian manifold . Thus a Riemannian metric on Rn is just a smooth map Q : Rn → Rn×n such that for each x ∈ Rn , Q (x) is a real symmetric positive definite n × n matrix. In particular every nondegenerate inner product on Rn defines a Riemannian metric on Rn (but not conversely) and also induces by restriction a Riemannian metric on every submanifold M of Rn . We refer to this as the induced Riemannian metric on M . Let Φ : M → R be a smooth function defined on a manifold M and let DΦ : M → T ∗ M denote the differential, i.e. the section of the cotangent bundle T ∗ M defined by DΦ (x) : Tx M → R,
ξ → DΦ (x) · ξ,
where DΦ (x) is the derivative of Φ at x. We also often use the notation DΦ|x (ξ) = DΦ (x) · ξ to denote the derivative of Φ at x. To be able to define the gradient vector field of Φ we have to specify a Riemannian metric , on M . The gradient grad Φ of Φ relative to this choice of a Riemannian metric on M is then uniquely characterized by the following two properties. (a) Tangency Condition grad Φ (x) ∈ Tx M
for all x ∈ M.
(b) Compatibility Condition DΦ (x) · ξ = grad Φ (x) , ξ
for all ξ ∈ Tx M.
There exists a uniquely determined vector field grad Φ : M → T M on M such that (a) and (b) hold. grad Φ is called the gradient vector field of Φ. Note that the Riemannian metric enters into Condition (b) and therefore the gradient vector field grad Φ will depend on the choice of the Riemannian metric; changing the metric will also change the gradient. If M = Rn is endowed with its standard Riemannian metric defined by ξ, η = ξ η
for ξ, η ∈ Rn
then the associated gradient vector field is just the column vector ∂Φ ∂Φ (x) , . . . , (x) . ∇Φ (x) = ∂x1 ∂xn
1.3. The Rayleigh Quotient Gradient Flow
17
If Q : Rn → Rn×n denotes a smooth map with Q (x) = Q (x) > 0 for all x then the gradient vector field of Φ : Rn → R relative to the Riemannian metric , on Rn defined by ξ, ηx := ξ Q (x) η,
ξ, η ∈ Tx (Rn ) = Rn ,
is grad Φ (x) = Q (x)−1 ∇Φ (x) . This clearly shows the dependence of grad Φ on the metric Q (x). The following remark is useful in order to compute gradient vectors. Let Φ : M → R be a smooth function on a Riemannian manifold and let V ⊂ M be a submanifold. If x ∈ V then grad ( Φ| V ) (x) is the image of grad Φ (x) under the orthogonal projection Tx M → Tx V . Thus let M be a submanifold of Rn which is endowed with the induced Riemannian metric from Rn (Rn carries the Riemannian metric given by the standard Euclidean inner product), and let Φ : Rn → R be a smooth function. Then the gradient vector grad Φ|M (x) of the restriction Φ : M → R is the image of ∇Φ (x) ∈ Rn under the orthogonal projection Rn → Tx M onto Tx M .
Returning to the task of achieving gradient flows for the Rayleigh quotient, in order to define the gradient of rA : S n−1 → R, we first have to specify a Riemannian metric on S n−1 , i.e. an inner product structure on each tangent space Tx S n−1 of S n−1 as in the digression above; see also Appendix C. An obviously natural choice for such a Riemannian metric on S n−1 is that of taking the standard Euclidean inner product on Tx S n−1 , i.e. the Riemannian metric on S n−1 induced from the imbedding S n−1 ⊂ Rn . That is we define for each tangent vectors ξ, η ∈ Tx S n−1 ξ, η := 2ξ η.
(3.6)
where the constant factor 2 is inserted for convenience. The gradient ∇rA of the Rayleigh quotient is then the uniquely determined vector field on S n−1 which satisfies the two conditions (a)
∇rA (x) ∈ Tx S n−1
(b)
DrA |x (ξ) = ∇rA (x) , ξ
= 2∇rA (x) ξ
for all x ∈ S n−1 ,
for all ξ ∈ Tx S n−1 .
(3.7)
(3.8)
18
Chapter 1. Matrix Eigenvalue Methods
Since DrA |x (ξ) = 2x Aξ we obtain
[∇rA (x) − Ax] ξ = 0
∀ξ ∈ Tx S n−1 .
(3.9)
By (3.4) this is equivalent to ∇rA (x) = Ax + λx with λ = −x Ax, so that x ∇rA (x) = 0 to satisfy (3.7). Thus the gradient flow for the Rayleigh quotient on the unit sphere S n−1 is x˙ = (A − rA (x) In ) x.
(3.10)
It is easy to see directly that flow (3.10) leaves the sphere S n−1 invariant: d (x x) =x˙ x + x x˙ dt =2x (A − rA (x) I) x =2 [rA (x) − rA (x)] x
2
=0. We now move on to study the convergence properties of the Rayleigh quotient flow. First, recall that the exponential rate of convergence ρ > 0 of a solution x (t) of a differential equation to an equilibrium point x¯ refers to the maximal possible α occurring in the following lemma (see Appendix B). Lemma 3.3 Consider the differential equation x˙ = f (x) with equilibrium point x¯. Suppose that the linearization Df |x¯ of f has only eigenvalues with real part less than −α, α > 0. Then there exists a neighborhood Nα of x ¯ and a constant C > 0 such that ¯ x (t) − x ¯ ≤ Ce−α(t−t0 ) x (t0 ) − x for all x (t0 ) ∈ Nα , t ≥ t0 . In the numerical analysis literature, often convergence is measured on a logarithmic scale so that exponential convergence here is referred to as linear convergence. Theorem 3.4 Let A ∈ Rn×n be symmetric with eigenvalues λ1 ≥ · · · ≥ λn . The gradient flow of the Rayleigh quotient on S n−1 with respect to the (induced) standard Riemannian metric is given by (3.10). The solutions x (t) of (3.10) exist for all t ∈ R and converge to an eigenvector of A. Suppose λ1 > λ2 . For almost all initial conditions x0 , x0 = 1, x (t) converges to a dominant eigenvector v1 or −v1 of A with an exponential rate ρ = λ1 − λ2 of convergence.
1.3. The Rayleigh Quotient Gradient Flow
19
Proof 3.5 We have already shown that (3.10) is the gradient flow of rA : S n−1 → R. By compactness of S n−1 the solutions of any gradient flow of rA : S n−1 → R exist for all t ∈ R. It is a special case of a more general result in Duistermaat, Kolk and Varadarajan (1983) that the Rayleigh quotient rA is a Morse-Bott function on S n−1 . Such functions are defined in the subsequent digression on Convergence of Gradient Flows. Moreover, using results in the same digression, the solutions of (3.10) converge to the critical points of rA , i.e. ,by Theorem 3.1, to the eigenvectors of A. Let vi ∈ S n−1 be an eigenvector of A with associated eigenvalue λi . The linearization of (3.10) at vi is then readily seen to be ξ˙ = (A − λi I) ξ,
vi ξ = 0.
(3.11)
Hence the eigenvalues of the linearization (3.11) are (λ1 − λi ) , . . . , (λi−1 − λi ) , (λi+1 − λi ) , . . . , (λn − λi ) . Thus the eigenvalues of (3.11) are all negative if and only if i = 1, and hence only the dominant eigenvectors ±v1 of A can be attractors for (3.10). Since the union of eigenspaces of A for the eigenvalues λ2 , . . . , λn defines a nowhere dense subset of S n−1 , every solution of (3.10) starting in the complement of that set will converge to either v1 or to −v1 . This completes the proof. Remark 3.6 It should be noted that what is usually referred to as the Rayleigh quotient method is somewhat different to what is done here; see Golub and Van Loan (1989). The Rayleigh quotient method uses the recursive system of iterations −1
(A − rA (xk ) I) xk xk+1 = −1 (A − rA (xk ) I) xk to determine the dominant eigenvector at A and is thus not simply a discretization of the continuous-time Rayleigh quotient gradient flow (3.10). The dynamics of the Rayleigh quotient iteration can be quite complicated. See Batterson and Smilie (1989) for analytical results as well as Beattie and Fox (1989). For related results concerning generalised eigenvalue problems, see Auchmuty (1991). We now digress to study gradient flows; see also Appendices B and C. Digression: Convergence of Gradient Flows Let M be a Riemannian manifold and let Φ : M → R be a smooth function. It is an immediate
20
Chapter 1. Matrix Eigenvalue Methods
consequence of the definition of the gradient vector field grad Φ on M that the equilibria of the differential equation x˙ (t) = − grad Φ (x (t))
(3.12)
are precisely the critical points of Φ : M → R. For any solution x (t) of (3.12) d Φ (x (t)) = grad Φ (x (t)) , x˙ (t) dt = − grad Φ (x (t)) 2 ≤ 0
(3.13)
and therefore Φ (x (t)) is a monotonically decreasing function of t. The following standard result is often used in this book. Proposition 3.7 Let Φ : M → R be a smooth function on a Riemannian manifold with compact sublevel sets, i.e. for all c ∈ R the sublevel set {x ∈ M | Φ (x) ≤ c} is a compact subset of M . Then every solution x (t) ∈ M of the gradient flow (3.12) on M exists for all t ≥ 0. Furthermore, x (t) converges to a connected component of the set of critical points of Φ as t → +∞. Note that the condition of the proposition is automatically satisfied if M is compact. Moreover, in suitable local coordinates of M , the linearization of the gradient flow (3.12) around each equilibrium point is given by the Hessian HΦ of Φ. Thus by symmetry, HΦ has only real eigenvalues. The linearization is not necessarily given by the Hessian if grad Φ is expressed using an arbitrary system of local coordinate charts of M . However, the numbers of positive and negative eigenvalues of the Hessian HΦ and of the linearization of grad Φ at an equilibriuim point are always the same. In particular, an equilibrium point x0 ∈ M of the gradient flow (3.12) is a locally stable attractor if the Hessian of Φ at x0 is positive definite. It follows from Proposition 3.7 that the solutions of a gradient flow have a particularly simple convergence behaviour. There are no periodic solutions, strange attractors or any chaotic behaviours. Every solution converges to a connected component of the set of equilibria points. Thus, if Φ : M → R has only finitely many critical points, then the solutions of (3.12) will converge to a single equilibrium point rather than to a set of equilibria. We state this observation as Proposition 3.8. Recall that the ω-limit set Lω (x) of a point x ∈ M for a vector field X on M is the set of points of the form limn→∞ φtn (x), where (φt ) is the flow of X and tn → +∞. Similarly, the α-limit set Lα (x) is defined by letting tn → −∞ instead of +∞. Proposition 3.8 Let Φ : M → R be a smooth function on a Riemannian manifold M with compact sublevel sets. Then
1.3. The Rayleigh Quotient Gradient Flow
21
(a) The ω-limit set Lω (x), x ∈ M , of the gradient vector field grad Φ is a nonempty, compact and connected subset of the set of critical points of Φ : M → R. (b) Suppose Φ : M → R has isolated critical points. Then Lω (x), x ∈ M , consists of a single critical point. Therefore every solution of the gradient flow (3.12) converges for t → +∞ to a critical point of Φ. In particular, the convergence of a gradient flow to a set of equilibria rather than to single equilibrium points occurs only in nongeneric situations. We now focus on the generic situation. The conditions in Proposition 3.9 below are satisfied in all cases studied in this book. Let M be a smooth manifold and let Φ : M → R be a smooth function. Let C (Φ) ⊂ M denote the set of all critical points of Φ. We say Φ is a Morse-Bott function provided the following three conditions (a), (b) and (c) are satisfied. (a) Φ : M → R has compact sublevel sets. (b) C (Φ) = kj=1 Nj with Nj disjoint, closed and connected submanifolds of M such that Φ is constant on Nj , j = 1, . . . , k. (c) ker (HΦ (x)) = Tx Nj , ∀x ∈ Nj , j = 1, . . . , k. Actually, the original definition of a Morse-Bott function also includes a global topological condition on the negative eigenspace bundle defined by the Hessian, but this condition is not relevant to us. Here ker (HΦ (x)) denotes the kernel of the Hessian of Φ at x, that is the set of all tangent vectors ξ ∈ Tx M where the Hessian of Φ is degenerate. Of course, provided (b) holds, the tangent space Tx (Nj ) is always contained in ker (HΦ (x)) for all x ∈ Nj . Thus Condition (c) asserts that the Hessian of Φ is full rank in the directions normal to Nj at x. Proposition 3.9 Let Φ : M → R be a Morse-Bott function on a Riemannian manifold M . Then the ω-limit set Lω (x), x ∈ M , for the gradient flow (3.12) is a single critical point of Φ. Every solution of the gradient flow (3.12) converges as t → +∞ to an equilibrium point. Proof 3.10 Since a detailed proof would take us a bit too far away from our objectives here, we only give a sketch of the proof. The experienced reader should have no difficulties in filling in the missing details. By Proposition 3.8, the ω-limit set Lω (x) of any element x ∈ M is contained in a connected component of C (Φ). Thus Lω (x) ⊂ Nj for some 1 ≤ j ≤ k. Let a ∈ Lω (x) and, without loss of generality, Φ (Nj ) = 0. Using the hypotheses on Φ and a straightforward generalization of the Morse lemma (see, e.g., Hirsch (1976) or Milnor (1963)), there exists an open neighborhood Ua of a in M and a diffeomorphism f : Ua → Rn , n = dim M , such that
22
Chapter 1. Matrix Eigenvalue Methods x
3
-
U
a
+
Ua
x
2
+
Ua -
Ua
FIGURE 3.1. Flow around a saddle point (a) f (Ua ∩ Nj ) = Rn0 × {0}
(b) Φ ◦ f −1 (x1 , x2 , x3 ) = 12 x2 2 − x3 2 , x1 ∈ Rn0 , x2 ∈ Rn− , n+ x3 ∈ R , n0 + n− + n+ = n. With this new system of coordinates on Ua , the gradient flow (3.12) of Φ on Ua becomes equivalent to the linear gradient flow x˙ 1 = 0, −1
x˙ 2 = −x2 ,
x˙ 3 = x3
(3.14)
n
on R , depicted in Figure 3.1. Note that the equilibrium point is of Φ ◦ f
−1 + , x space. Let U := f (x1 , x2 , x3 ) | x2 ≥ x3
a saddle point in x 2 3 a
and Ua− = f −1 (x1 , x2 , x3 ) | x2 ≤ x3 . Using the convergence properties of (3.14) it follows that every solution of (3.12) starting in Ua+ −
− f −1 (x1 , x2 , x3 ) | x3 = 0 will enter the region U a . On the other hand, ev−1 ery solution starting in f (x1 , x2 , x3 ) | x3 = 0 will converge to a point on Ua− , all sof −1 (x, 0, 0) ∈ Nj , x ∈ Rn0 fixed.
As Φ is strictly negative lutions starting in Ua− ∪ Ua+ − f −1 (x1 , x2 , x3 ) | x3 = 0 will leave this set eventually and converge to some Ni = Nj . By repeating the analysis for Ni and noting that all other solutions in Ua converge to a single equilibrium point in Nj , the proof is completed.
Returning to the Rayleigh quotient flow, (3.10) gives the gradient flow restricted to the sphere S n−1 . Actually (3.10) extends to a flow on Rn −{0}. Is this extended flow again a gradient flow with respect to some Riemannian metric on Rn − {0}? The answer is yes, as can be seen as follows. A straightforward computation shows that the directional (Fr´echet) derivative of rA : Rn − {0} → R is given by DrA |x (ξ) =
2 x
2
(Ax − rA (x) x) ξ.
1.3. The Rayleigh Quotient Gradient Flow
23
Consider the Riemannian metric defined on each tangent space Tx (Rn − {0}) of Rn − {0} by ξ, η :=
2 x
2ξ
η.
for ξ, η ∈ Tx (Rn − {0}). The gradient of rA : Rn − {0} → R with respect to this Riemannian metric is then characterized by (a) grad rA (x) ∈ Tx (Rn − {0}) (b) DrA |x (ξ) = grad rA (x) , ξ for all ξ ∈ Tx (Rn − {0}). Note that (a) imposes no constraint on grad rA (x) and (b) is easily seen to be equivalent to grad rA (x) = Ax − rA (x) x Thus (3.10) also gives the gradient of rA : Rn − {0} → R. Similarly, a stability analysis of (3.10) on Rn − {0} can be carried out completely. Actually, the modified flow, termed here the Oja flow x˙ = (A − x AxIn ) x
(3.15)
is defined on Rn and can also be used as a means to determine the dominant eigenvector of A. This property is important in neural network theories for pattern classification, see Oja (1982). The flow seems to have certain additional attractive properties, such as being structurally stable for generic matrices A. Example 3.11 Again consider A = 10 02 . The phase portrait of the gradient flow (3.10) is depicted in Figure 3.2. Note that the flow preserves x (t) x (t) = x0 x0 . Example 3.12 The phase portrait of the Oja flow (3.15) on R2 is depicted in Figure 3.3. Note the apparent structural stability property of the flow. The Rayleigh quotient gradient flow (3.10) on the (n − 1) sphere S n−1 has the following interpretation. Consider the linear differential equation on Rn x˙ = Ax for an arbitrary real n × n matrix A. Then LA (x) = Ax − (x Ax) x
24
Chapter 1. Matrix Eigenvalue Methods
x2 x–space
x1
x1 axis unstable x2 axis stable
FIGURE 3.2. Phase portrait of Rayleigh quotient gradient flow
x2
0
FIGURE 3.3. Phase portrait of the Oja flow
x1
1.3. The Rayleigh Quotient Gradient Flow
25
Ax
ÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉ ÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉ ÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉ ÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉÉ ÉÉÉÉÉÉÉÉÉÉÉÉÉÉ ÉÉÉÉÉÉÉÉÉÉÉÉÉÉ ÉÉÉÉÉÉÉÉÉÉÉÉÉÉ
FIGURE 3.4. Orthogonal projection of Ax
is equal to the orthogonal projection of Ax, x ∈ S n−1 , along x onto the tangent space Tx S n−1 of S n−1 at x. See Figure 3.4. Thus (3.10) can be thought of as the dynamics for the angular part of the differential equation x˙ = Ax. The analysis of this angular vector field (3.10) has proved to be important in the qualitative theory of linear stochastic differential equations. An important property of dynamical systems is that of structural stability or robustness. In heuristic terms, structural stability refers to the property that the qualitative behaviour of a dynamical system is not changed by small perturbations in its parameters. Structural stability properties of the Rayleigh quotient gradient flow (3.10) have been investigated by de la Rocque Palis (1978). It is shown there that for A ∈ Rn×n symmetric, the flow (3.10) on S n−1 is structurally stable if and only if the eigenvalues of A are all distinct. Now let us consider the task of finding, for any 1 ≤ k ≤ n, the kdimensional subspace of Rn formed by the eigenspaces of the eigenvalues λ1 , . . . , λk . We refer to this as the dominant k-dimensional eigenspace of A. Let
St (k, n) = X ∈ Rn×k | X X = Ik denote the Stiefel manifold of real orthogonal n × k matrices. The generalized Rayleigh quotient is the smooth function RA : St (k, n) → R,
26
Chapter 1. Matrix Eigenvalue Methods
defined by RA (X) = tr (X AX) .
(3.16)
Note that this function can be extended in several ways on larger subsets of Rn×k . A na¨ıve way of extending RA to a function on Rn×k −{0} would be as RA (X) = tr(X AX)/ X). A more sensible choice which is studied
tr(X −1 . later is as RA (X) = tr X AX (X X) The following lemma summarizes the basic geometric properties of the Stiefel manifold, see also Mahony, Helmke and Moore (1996). Lemma 3.13 St (k, n) is a smooth, compact manifold of dimension kn − k (k + 1) /2. The tangent space at X ∈ St (k, n) is
TX St (k, n) = ξ ∈ Rn×k | ξ X + X ξ = 0 .
(3.17)
Proof 3.14 Consider the smooth map F : Rn×k → Rk(k+1)/2 , F (X) = X X − Ik , where we identified Rk(k+1)/2 with the vector space of k × k real symmetric matrices. The derivative of F at X, X X = Ik , is the linear map defined by DF |X (ξ) = ξ X + X ξ.
(3.18)
Suppose there exists η ∈ Rk×k real symmetric with tr (η (ξ X + X ξ)) = 2 tr (ηX ξ) = 0 for all ξ ∈ Rn×k . Then ηX = 0 and hence η = 0. Thus the derivative (3.18) is a surjective linear map for all X ∈ St (k, n) and the kernel is given by (3.17). The result now follows from the fiber theorem; see Appendix C. The standard Euclidean inner product on the matrix space Rn×k induces an inner product on each tangent space TX St (k, n) by ξ, η := 2 tr (ξ η) . This defines a Riemannian metric on the Stiefel manifold St (k, n), which is called the induced Riemannian metric. ⊥
Lemma 3.15 The normal space TX St (k, n) , that is the set of n × k matrices η which are perpendicular to TX St (k, n), is
⊥ TX St (k, n) = η ∈ Rn×k | tr (ξ η) = 0 for all ξ ∈ TX St (k, n)
(3.19) = XΛ ∈ Rn×k | Λ = Λ ∈ Rk×k .
1.3. The Rayleigh Quotient Gradient Flow
27
Proof 3.16 For any symmetric k × k matrix Λ we have
2 tr (XΛ) ξ = tr (Λ (X ξ + ξ X)) = 0 for all ξ ∈ TX St (k, n) and therefore
XΛ | Λ = Λ ∈ Rk×k ⊂ TX St (k, n)⊥ .
⊥ Both TX St (k, n) and XΛ ∈ Rn×k | Λ = Λ ∈ Rk×k are vector spaces of the same dimension and thus must be equal. The proofs of the following results are similar to those of Theorem 3.4, although technically more complicated, and are omitted. Theorem 3.17 Let A ∈ Rn×n be symmetric and let X = (v1 , . . . , vn ) ∈ Rn×n be orthogonal with X −1 AX = diag (λ1 , . . . , λn ), λ1 ≥ · · · ≥
λn . The generalized Rayleigh quotient RA : St (k, n) → R has at least nk = n!/ (k! (n − k)!) critical points XI = (vi1 , . . . , vik ) for 1 ≤ i1 < · · · < ik ≤ n. All other critical points are of the form ΘXI Ψ, where Θ, Ψ are arbitrary n × n and k × k orthogonal matrices with Θ AΘ = A. The critical value of RA at XI is λi1 + · · · + λik . The minimum and maximum value of RA are max RA (X) =λ1 + · · · + λk ,
(3.20)
min RA (X) =λn−k+1 + · · · + λn .
(3.21)
X X=Ik X X=Ik
Theorem 3.18 The gradient flow of the generalized Rayleigh quotient with respect to the induced Riemannian metric on St (k, n) is X˙ = (I − XX ) AX.
(3.22)
The solutions of (3.22) exist for all t ∈ R and converge to some generalized k-dimensional eigenbasis ΘXI Ψ of A. Suppose λk > λk+1 . The generalized Rayleigh quotient RA : St (k, n) → R is a Morse-Bott function. For almost all initial conditions X0 ∈ St (k, n), X (t) converges to a generalized kdimensional dominant eigenbasis and X (t) approaches the manifold of all k-dimensional dominant eigenbasis exponentially fast. The equalities (3.20), (3.21) are due to Fan (1949). As a simple consequence of Theorem 3.17 we obtain the following eigenvalue inequalities between the eigenvalues of symmetric matrices A, B and A + B. Corollary 3.19 Let A, B ∈ Rn×n be symmetric matrices and let λ1 (X) ≥ · · · ≥ λn (X) denote the ordered eigenvalues of a symmetric n × n matrix
28
Chapter 1. Matrix Eigenvalue Methods
X. Then for 1 ≤ k ≤ n: k i=1
λi (A + B) ≤
k
λi (A) +
i=1
k
λi (B) .
i=1
Proof 3.20 We have RA+B (X) = RA (X) + RB (X) for any X ∈ St (k, n) and thus max RA+B (X) ≤ max RA (X) + max RB (X)
X X=Ik
X X=Ik
X X=Ik
Thus the result follows from (3.20) It has been shown above that the Rayleigh quotient gradient flow (3.10) extends to a gradient flow on Rn − {0}. This also happens to be true for the gradient flow (3.22) for the generalized Rayleigh quotient. Consider the extended Rayleigh quotient. −1 RA (X) = tr X AX (X X) defined on the noncompact Stiefel manifold
ST (k, n) = X ∈ Rn×k | rk X = k of all full rank n × k matrices. The gradient flow of RA (X) on ST (k, n) with respect to the Riemannian metric on ST (k, n) defined by −1 ξ, η := 2 tr ξ (X X) η on TX ST (k, n) is easily seen to be −1 X˙ = In − X (X X) X AX.
(3.23)
For X X = Ik this coincides with (3.22). Note that every solution X (t) of d (X (t) X (t)) = 0 and hence X (t) X (t) = X (0) X (0) (3.23) satisfies dt for all t. Instead of considering (3.23) one might also consider (3.22) with arbitrary initial conditions X (0) ∈ ST (k, n). This leads to a more structurally stable flow on ST (k, n) than (3.22) for which a complete phase portrait analysis is obtained in Yan, Helmke and Moore (1993). Remark 3.21 The flows (3.10) and (3.22) are identical with those appearing in Oja’s work on principal component analysis for neural network applications, see Oja (1982; 1990). Oja shows that a modification of a Hebbian learning rule for pattern recognition problems leads naturally to a class of nonlinear stochastic differential equations. The Rayleigh quotient gradient flows (3.10), (3.22) are obtained using averaging techniques.
1.3. The Rayleigh Quotient Gradient Flow
29
Remark 3.22 There is a close connection between the Rayleigh quotient gradient flows and the Riccati equation. In fact let X (t) be a solution of the Rayleigh flow (3.22). Partition X (t) as X1 (t) , X (t) = X2 (t) where X1 (t) is k × k, X2 (t) is (n − k) × k and det X1 (t) = 0. Set K (t) = X2 (t) X1 (t)−1 . Then K˙ = X˙ 2 X1−1 − X2 X1−1 X˙ 1 X1−1 . A straightforward but somewhat lengthy computation (see subsequent problem) shows that X˙ 2 X1−1 − X2 X1−1 X˙ 1 X1−1 = A21 + A22 K − KA11 − KA12 K,
where A=
A11
A12
A21
A22
.
Problem 3.23 Show that every solution x (t) ∈ S n−1 of the Oja flow (3.15) has the form eAt x0 x (t) = At e x0 Problem 3.24 Verify that for every solution X (t) ∈ St (k, n) of (3.22), H (t) = X (t) X (t) satisfies the double bracket equation (Here, let [A, B] = AB − BA denote the Lie bracket) H˙ (t) = [H (t) , [H (t) , A]] = H 2 (t) A + AH 2 (t) − 2H (t) AH (t) . Problem 3.25 Verify the computations in Remark 3.22.
Main Points of Section The Rayleigh quotient of a symmetric matrix A ∈ Rn×n is a smooth function rA : Rn − {0} → R. Its gradient flow on the unit sphere S n−1 in Rn , with respect to the standard (induced) , converges to an eigenvector of A. Moreover, the convergence is exponential. Generalized Rayleigh quotients are defined on Stiefel manifolds and their gradient flows converge to the
30
Chapter 1. Matrix Eigenvalue Methods
k-dimensional dominant eigenspace, being a locally stable attractor for the flow. The notions of tangent spaces, normal spaces, as well as a precise understanding of the geometry of the constraint set are important for the theory of such gradient flows. There are important connections with Riccati equations and neural network learning schemes.
1.4 The QR Algorithm In the previous sections we deal with the questions of how to determine a dominant eigenspace for a symmetric real n × n matrix. Given a real symmetric n × n matrix A, the QR algorithm produces an infinite sequence of real symmetric matrices A1 , A2 , . . . such that for each i the matrix Ai has the same eigenvalues as A. Also, Ai converges to a diagonal matrix as i approaches infinity. The QR algorithm is described in more precise terms as follows. Let A0 = A ∈ Rn×n be symmetric with the QR-decomposition A0 = Θ0 R0 , where Θ0 ∈ R
n×n
is orthogonal and ∗ R0 = 0
... .. .
(4.1) ∗ .. . ∗
(4.2)
is upper triangular with nonnegative entries on the diagonal. Such a decomposition can be obtained by applying the Gram-Schmidt orthogonalization procedure to the columns of A0 . Define A1 := R0 Θ0 = Θ0 A0 Θ0 .
(4.3)
Repeating the QR factorization for A1 , A1 = Θ1 R1 with Θ1 orthogonal and R1 as in (4.2). Define
A2 = R1 Θ1 = Θ1 A1 Θ1 = (Θ0 Θ1 ) A0 Θ0 Θ1 .
(4.4)
Continuing with the procedure a sequence (Ai | i ∈ N) of real symmetric matrices is obtained with QR decomposition Ai =Θi Ri , Ai+1 =Ri Θi =
(4.5) Θi Ai Θi ,
i ∈ N.
(4.6)
1.4. The QR Algorithm
31
Thus for all i ∈ N Ai = (Θ0 . . . Θi ) A0 (Θ0 . . . Θi ) ,
(4.7)
and Ai has the same eigenvalues as A0 . Now there is a somewhat surprising result; under suitable genericity assumptions on A0 the sequence (Ai ) always converges to a diagonal matrix, whose diagonal entries coincide with the eigenvalues of A0 . Moreover, the product Θ0 . . . Θi of orthogonal matrices approximates, for i large enough, the eigenvector decomposition of A0 . This result is proved in the literature on the QR-algorithm, see Golub and Van Loan (1989), Chapter 7, and references. The QR algorithm for diagonalizing symmetric matrices as described above, is not a particularly efficient method. More effective versions of the QR algorithm are available which make use of so-called variable shift strategies; see e.g. Golub and Van Loan (1989). Continuous-time analogues of the QR algorithm have appeared in recent years. One such analogue of the QR algorithm for tridiagonal matrices is the Toda flow, which is an integrable Hamiltonian system. Examples of recent studies are Symes (1982), Deift, Nanda and Tomei (1983), Nanda (1985), Chu (1984a), Watkins (1984), Watkins and Elsner (1989). As pointed out by Watkins and Elsner, early versions of such continuous-time methods for calculating eigenvalues of matrices were already described in the early work of Rutishauser (1954; 1958). Here we describe only one such flow which interpolates the QR algorithm. A differential equation A˙ (t) = f (t, A (t))
(4.8)
defined on the vector space of real symmetric matrices A ∈ Rn×n is called isospectral if every solution A (t) of (4.8) is of the form
A (t) = Θ (t) A (0) Θ (t)
(4.9)
with orthogonal matrices Θ (t) ∈ Rn×n , Θ (0) = In . The following simple characterization is well-known, see the above mentioned literature. Lemma 4.1 Let I ⊂ R be an interval and let B (t) ∈ Rn×n , t ∈ I, be a continuous, time-varying family of skew-symmetric matrices. Then A˙ (t) = A (t) B (t) − B (t) A (t)
(4.10)
is isospectral. Conversely, every isospectral differential equation on the vector space of real symmetric n × n matrices is of the form (4.10) with B (t) skew-symmetric.
32
Chapter 1. Matrix Eigenvalue Methods
Proof 4.2 Let Θ (t) denote the unique solution of Θ (0) Θ (0) = In .
˙ (t) = Θ (t) B (t) , Θ
Since B (t) is skew-symmetric we have d ˙ (t) ˙ (t) Θ (t) + Θ (t) Θ Θ (t) Θ (t) =Θ dt =B (t) Θ (t) Θ (t) + Θ (t) Θ (t) B (t)
(4.11)
(4.12)
=Θ (t) Θ (t) B (t) − B (t) Θ (t) Θ (t) , and Θ (0) Θ (0) = In . By the uniqueness of the solutions of (4.12) we (t) = have Θ (t) Θ (t) = In , t ∈ I, and thus Θ (t) is orthogonal. Let A Θ (t) A (0) Θ (t). Then A (0) = A (0) and d ˙ (t) A (0) Θ (t) + Θ (t) A (0) Θ ˙ (t) A (t) =Θ dt (4.13) (t) B (t) − B (t) A (t) . =A (t) = A (t), t ∈ I, Again, by the uniqueness of the solutions of (4.13), A and the result follows. In the sequel we will frequently make use of the Lie bracket notation [A, B] = AB − BA
(4.14)
for A, B ∈ Rn×n . Given an arbitrary n × n matrix A ∈ Rn×n there is a unique decomposition A = A− + A+ where A− is skew-symmetric and A+ (aij ) ∈ Rn×n symmetric then 0 −a21 a 0 21 A− = . .. . . .
is upper triangular. Thus for A =
... .. . .. .
an1 . . . a11 2a21 0 a22 A+ = . .. . . .
... .. . .. .
0
0
...
(4.15)
an,n−1
−an1 .. . , −an,n−1 0
2an1 .. . . 2an,n−1 ann
(4.16)
1.4. The QR Algorithm
33
For any positive definite real symmetric n × n matrix A there exists a unique real symmetric logarithm of A B = log (A) with eB = A. The restriction to positive definite matrices avoids any possible ambiguities in defining the logarithm. With this notation we then have the following result, see Chu (1984a) for a more complete theory. Theorem 4.3 The differential equation A˙ = A, (log A)− = A (log A)− − (log A)− A,
A (0) = A0 (4.17)
is isospectral and the solution A (t) exists for all A0 = A0 > 0 and t ∈ R. If (Ai | i ∈ N) is the sequence produced by the QR-algorithm for A0 = A (0), then Ai = A (i) for all i ∈ N.
(4.18)
Proof 4.4 The proof follows that of Chu (1984a). Let et·log A(0) = Θ (t) R (t)
(4.19)
be the QR-decomposition of et·log A(0) . By differentiating both sides with respect to t, we obtain ˙ (t) R (t) + Θ (t) R˙ (t) = log A (0) et·log A(0) Θ = (log A (0)) · Θ (t) R (t) . Therefore
˙ −1 Θ (t) Θ (t) + R˙ (t) R (t) = log Θ (t) A (0) Θ (t)
˙ (t) is skew-symmetric and R˙ (t) R (t)−1 is upper triangular, Since Θ (t) Θ this shows
˙ (t) = Θ (t) log Θ (t) A (0) Θ (t) Θ . (4.20) −
Consider A (t) := Θ (t) A (0) Θ (t). By (4.20) ˙ A (0) Θ + Θ A (0) Θ ˙ A˙ =Θ = − (log A)− Θ A (0) Θ + Θ A (0) Θ (log A)− . = A, (log A)− .
34
Chapter 1. Matrix Eigenvalue Methods
Thus A (t) is a solution of (4.17). Conversely, since A (0) is arbitrary and by the uniqueness of solutions of (4.17), any solution A (t) of (4.17) is of the form A (t) = Θ (t) A (0) Θ (t) with Θ (t) defined by (4.19). Thus
et·log A(t) =et·log(Θ(t) A(0)Θ(t)) = Θ (t) et·log A(0) Θ (t) =R (t) Θ (t) . Therefore for t = 1, and using (4.19) A (1) =elog A(1) = R (1) Θ (1)
=Θ (1) Θ (1) R (1) Θ (1)
=Θ (1) A (0) Θ (1) . Proceeding by induction this shows that for all k ∈ N
A (k) = Θ (k) . . . Θ (1) A (0) Θ (1) . . . Θ (k) is the k-th iterate produced by the QR algorithm. This completes the proof. As has been mentioned above, the QR algorithm is not a numerically efficient method for diagonalizing a symmetric matrix. Neither would we expect that integration of the continuous time flow (4.17), by e.g. a RungeKutta method, leads to a numerically efficient algorithm for diagonalization. Numerically efficient versions of the QR algorithms are based on variable shift strategies. These are not described here and we refer to e.g. Golub and Van Loan (1989) for a description. Example 4.5 For the case A0 = 11 12 , the isospectral flow (4.17) is plotted in Figure 4.1. The time instants t = 1, 2, . . . are indicated.
Main Points of Section The QR algorithm produces a sequences of real symmetric n × n matrices Ai , each with the same eigenvalues, and which converges to a diagonal matrix A∞ consisting of the eigenvalues of A. Isospectral flows are continuous-time analogs algorithm flows. One such is the Lie of the QR bracket equation A˙ = A, (log A)− where X− denotes the skew-symmetric component of a matrix X. The QR algorithm is the integer time sampling of this isospectral flow. Efficient versions of the QR algorithm require shift strategies not studied here.
1.5. Singular Value Decomposition (SVD) QR Algorithm (1.4.5)
Elements of A
Elements of A
Isospectral Flow (1.4.16)
35
Time
Discrete Time
FIGURE 4.1. Isospectral flow and QR algorithm
1.5 Singular Value Decomposition (SVD) The singular value decomposition is an invaluable tool in numerical linear algebra, statistics, signal processing and control theory. It is a cornerstone of many reliable numerical algorithms, such as those for solving linear equations, least squares estimation problems, controller optimization, and model reduction. A variant of the QR algorithm allows the singular value decomposition of an arbitrary matrix A ∈ Rm×n with m ≥ n. The singular value decomposition (SVD) of A is diag (σ1 , . . . , σn ) A = V ΣU or V AU = Σ, Σ= . (5.1) 0(m−n)×n Here U , V are orthogonal matrices with U U = U U = In , V V = V V = Im , and the σi ∈ R are the nonnegative singular values of A, usually ordered so that σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0. The singular values σi (A) are the nonnegative square roots of the eigenvalues of A A. If A is Hermitian, then the singular values of A are identical to the absolute values of the eigenvalues of A. However, this is not true in general. The columns of V and U are called the left and the right singular vectors of A, respectively. Of course from (5.1) we have that
U (A A) U =Σ2 = diag σ12 , . . . , σn2 , 2
0 Σ (5.2) = diag σ12 , . . . , σn2 , 0, . . . , 0 . V (AA ) V = ! 0 0 m−n
36
Chapter 1. Matrix Eigenvalue Methods
Clearly, since A A, AA are symmetric the application of the QR algorithm to A A or AA would achieve the decompositions (5.1), (5.2). However, an unattractive aspect is that matrix squares are introduced with associated squaring of condition numbers and attendant loss of numerical accuracy. An alternative reformulation of the SVD task which is suitable for application of the QR algorithm is as follows: Define 0 A = , A (5.3) =A A 0 then for orthogonal V , Σ V = V A 0 0
0 −Σ 0
0 0 0
(5.4)
= diag (σ1 , . . . , σn , −σ1 , . . . , −σn , 0, . . . , 0 ) . ! m−n
Simple manipulations show that 1 V1 V1 V = √ 2 U −U " # V = V1 V2 ! ! n
V2 , 0
(5.5)
m−n
with orthogonal matrices U , V satisfying (5.1). Actually, the most common way of calculating the SVD of A ∈ Rm×n is to apply an implicit QR algorithm based on the work of Golub and Kahan (1965). In this text, we seek implementations via dynamical systems, and in particular seek gradient flows, or more precisely self-equivalent flows which preserve singular values. A key result on such flows is as follows. Lemma 5.1 Let C (t) ∈ Rn×n and D (t) ∈ Rm×m be a time-varying continuous family of skew-symmetric matrices on a time interval. Then the flow on Rm×n , H˙ (t) = H (t) C (t) + D (t) H (t)
(5.6)
is self equivalent on this interval. Conversely, every self-equivalent differential equation on Rm×n is of the form (5.6) with C (t), D (t) skew-symmetric.
1.5. Singular Value Decomposition (SVD)
37
Proof 5.2 Let (U (t) , V (t)) denote the unique solutions of U˙ (t) =U (t) C (t) , V˙ (t) = − V (t) D (t) ,
U (0) =In , V (0) =Im .
(5.7)
Since C, D are skew-symmetric, then U , V are orthogonal matrices. Let A (t) = V (t) H (0) U (t). Then A (0) = H (0) and A˙ (t) = V (t) H (0) U˙ (t) + V˙ (t) H (0) U (t) = A (t) C (t) + D (t) A (t) . Thus A (t) = H (t) by uniqueness of solutions. Remark 5.3 In the discrete-time case, any self equivalent flow takes the form Hk+1 = Sk eCk Hk eDk Tk ,
(5.8)
where Ck , Dk are skew-symmetric with Sk , Tk appropriate sign matrices of the form diag (±1, 1, . . . , 1) and consequently eCk , eDk are orthogonal matrices. The proof is straightforward. Remark 5.4 In Chu (1986) and Watkins and Elsner (1989), self-equivalent flows are developed which are interpolations of an explicit QR based SVD method. In particular, for square and nonsingular H0 such a flow is H˙ (t) = H (t) , (log (H (t) H (t)))− ,
H (0) = H0 .
(5.9)
Here, the notation A− denotes the skew-symmetric part of A, see (4.15) - (4.16). This self-equivalent flow then corresponds to the isospectral QR flow of Chu (1986) and Watkins and Elsner (1989). Also so called shifted versions of the algorithm for SVD are considered by Watkins and Elsner (1989).
Main Points of Section The singular value decomposition of A ∈ Rm×n can be achieved using the = 0 A . The associated flows QR algorithm on the symmetric matrix A A 0 on rectangular matrices A preserve the singular values and are called selfequivalent flows. General self-equivalent flows on Rm×n are characterized by pairs of possibly time-variant skew-symmetric matrices. They can be considered as isospectral flows on symmetric (m + n) × (m + n) matrices.
38
Chapter 1. Matrix Eigenvalue Methods
1.6 Standard Least Squares Gradient Flows So far in this chapter, we have explored dynamical systems evolving on rather simple manifolds such as spheres, projective spaces or Grassmannians for eigenvalue methods. The dynamical systems have not always been gradient flows nor were they always related to minimizing the familiar least squares measures. In the next chapters we explore more systematically eigenvalue methods based on matrix least squares. As a prelude to this, let us briefly develop standard vector least squares gradient flows. In the first instance there will be no side constraints and the flows will be on Rn . The standard (vector) least squares minimization task is to minimize over x ∈ Rn and for A ∈ Rm×n , b ∈ Rm , the 2-norm index Φ (x) = 12 Ax − b22
(6.1)
= 12 (Ax − b) (Ax − b)
The directional derivative DΦ|x (ξ) is given in terms of the gradient
∂Φ ∂Φ ∇Φ (x) = ∂x , . . . , ∂x as 1 n
DΦ|x (ξ) = (Ax − b) Aξ =: ∇Φ (x) ξ
(6.2)
Consequently, the gradient flow for least squares minimizing (6.1) in Rn is x˙ = −∇Φ (x) ,
∇Φ (x) = A (Ax − b)
(6.3)
Here Rn is considered in the trivial way as a smooth manifold for the flow, and x (0) ∈ Rn . The equilibria satisfy ∇Φ (x) = A (Ax − b) = 0
(6.4)
If A is injective, then A A > 0 and there is a unique equilibrium x∞ = −1 (A A) A b which is exponentially stable. The exponential convergence rate is given by ρ = λmin (A A), that is by the square of the smallest singular value of A.
Constrained Least Squares Of more interest to the theme of the book are constrained least squares gradient flows. Let us impose for example an additional affine equality constraint, for a full row rank matrix L ∈ Rp×n , c ∈ Rp Lx = c
(6.5)
1.6. Standard Least Squares Gradient Flows
39
This affine constraint defines a smooth constraint manifold M in Rn with tangent spaces given by Tx M = {ξ | Lξ = 0} ,
x ∈ M.
(6.6)
The gradient flow on M requires an orthogonal projection of the directional derivative into the tangent space, so we need that the gradient ∇Φ (x) satisfy both (6.2) and L∇Φ (x) = 0. Note that P := I − L# L := I −
−1 L (LL ) L = I − L L# is the linear projection operator from Rn onto Tx M . Thus
(6.7) x˙ = −∇Φ (x) , ∇Φ (x) = I − L# L A (Ax − b) is the gradient flow on M . Of course, the initial condition x (0) satisfies Lx (0) = c. Here # denotes the pseudo-inverse. Again, there is exponential convergence to the equilibrium points satisfying ∇Φ (x) = 0. There is a unique equilibrium when A is full rank. The exponential convergence rate is given by ρ = λmin I − L# L A A . As an example of a nonlinear constraint , let us consider least squares estimation on a sphere. Thus, consider the constraint x2 = 1, defining the (n − 1)-dimensional sphere S n−1 . Here the underlying geometry of the problem is the same as that developed for the Rayleigh quotient gradient flow of Section 1.3. Thus the gradient vector ∇Φ (x) = A (Ax − b) − (x A (Ax − b)) x
(6.8)
satisfies both (6.2) on the tangent space Tx S n−1 = {ξ ∈ Rn | x ξ = 0} and is itself in the tangent space for all x ∈ S n−1 , since x ∇Φ (x) = 0 for all x2 = 1. The gradient flow for least squares estimation on the sphere S n−1 is x˙ = −∇Φ (x) , that is x˙ = (x A (Ax − b)) x − A (Ax − b)
(6.9)
and the equilibria are characterized by (x A (Ax − b)) x = A (Ax − b) . In general, there is no simple analytical formula for the equilibria x. However, for m = n, it can be shown that there are precisely two equilibria. There is a unique minimum (and a unique maximum), being equivalent to a minimization of b ∈ Rn to the surface of the ellip of the distance n −1 n soid y ∈ R | A y = 1 in R , as depicted in Figure 6.1. If A is an orthogonal matrix, this ellipsoid is the sphere S n−1 , and the equilibria are
40
Chapter 1. Matrix Eigenvalue Methods
Generic A
Orthogonal A
FIGURE 6.1. Least squares minimization
−1 x∗ = ± A−1 b A−1 b. A linearization of the gradient flow equation at the equilibrium point with a positive sign yields an exponentially stable linear differential equation and thus ensures that this equilibrium is exponentially stable. The maximum is unstable, so that for all initial conditions different from the maximum point the gradient flow converges to the desired optimum.
Main Points of Section Certain constrained least squares optimization problems can be solved via gradient flows on the constraint manifolds. The cases discussed are the smooth manifolds Rn , an affine space in Rn , and the sphere S n−1 . The estimation problem on the sphere does not always admit a closed-form solution, thus numerical techniques are required to solve this problem. The somewhat straightforward optimization procedures of this section and chapter form a prototype for some of the more sophisticated matrix least squares optimization problems in later chapters.
1.6. Standard Least Squares Gradient Flows
41
Notes for Chapter 1 For a general reference on smooth dynamical systems we refer to Palis and de Melo (1982). Structural stability properties of two-dimensional vector fields were investigated by Andronov and Pontryagin (1937), Andronov, Vitt and Khaikin (1987) and Peixoto (1962). A general investigation of structural stability properties of gradient vector fields on manifolds is due to Smale (1961). Smale shows that a gradient flow x˙ = grad Φ (x) on a smooth compact manifold is structurally stable if the Hessian of Φ at each critical point is nondegenerate, and the stable and unstable manifolds of the equilibria points intersect each other transversally. Morse has considered smooth functions Φ : M → R such that the Hessian at each critical point is nondegenerate. Such functions are called Morse functions and they form a dense subset of the class of all smooth functions on M . See Milnor (1963) and Bott (1982) for nice expositions on Morse theory. Other recent references are Henniart (1983) and Witten (1982). A generalization of Morse theory is due to Bott (1954) who considers smooth functions with nondegenerate manifolds of critical points. An extension to Morse theory on infinite dimensional spaces is due to Palais (1962). Thom (1949) has shown that the stable and unstable manifolds of an equilibrium point of a gradient flow x˙ = grad Φ (x), for a smooth Morse function Φ : M → R, are diffeomorphic to Euclidean spaces Rk , i.e. they are cells. The decomposition of M by the stable manifolds of a gradient flow for a Morse function Φ : M → R is a cell decomposition. Refined topological properties of such cell decompositions were investigated by Smale (1960). Convergence and stability properties of gradient flows may depend on the choice of the Riemannian metric. If a ∈ M is a nondegenerate critical point of Φ : M → R, then the local stability properties of the gradient flow x˙ = grad Φ (x) around a do not change with the Riemannian metric. However, if a is a degenerate critical point, then the qualitative picture of the local phase portrait of the gradient around a may well change with the Riemannian metric. See Shafer (1980) for results in this direction. For an example of a gradient flow where the solutions converge to a set of equilibrium points rather than to single equilibria, see Palis and de Melo (1982). A recent book on differential geometric methods written from an applications oriented point of view is Doolin and Martin (1990), with background material on linear quadratic optimal control problems, the Riccati equation and flows on Grassmann manifolds. Background material on the matrix Riccati equation can be found in Reid (1972) and Anderson and Moore (1990). An analysis of the Riccati equation from a Lie theoretic perspective is given by Hermann (1979), Hermann and Martin (1982). A complete phase portrait analysis of the matrix Riccati equation has been
42
Chapter 1. Matrix Eigenvalue Methods
obtained by Shayman (1986). Structural stability properties of the matrix Riccati equation were studied by Schneider (1973), Bucy (1975). Finite escape time phenomena of the matrix Riccati differential equation are well studied in the literature, see Martin (1981) for a recent reference. Numerical matrix eigenvalue methods are analyzed from a Lie group theoretic perspective by Ammar and Martin (1986). There also the connection of the matrix Riccati equation with numerical methods for finding invariant subspaces of a matrix by the power method is made. An early reference for this is Parlett and Poole (1973). For a statement and proof of the Courant-Fischer minmax principle see Golub and Van Loan (1989). A generalization of Theorem 3.17 is Wielandt’s minmax principle on partial sums of eigenvalues. For a proof as well as applications to eigenvalue inequalities for sums of Hermitian matrices see Bhatia (1987). The eigenvalue inequalities appearing in Corollary 3.19 are the simplest of a whole series of eigenvalue inequalities for sums of Hermitian matrices. They include those of Cauchy-Weyl, Lidskii, and Thompson and Freede. For this we refer to Bhatia (1987) and Marshall and Olkin (1979). For an early investigation of isospectral matrix flows as a tool for solving matrix eigenvalue problems we refer to Rutishauser (1954). Apparently his pioneering work has long been overlooked until its recent rediscovery by Watkins and Elsner (1989). The first more recent papers on dynamical systems applications to numerical analysis are Kostant (1979), Symes (1980a; 1982), Deift et al. (1983), Chu (1984a; 1986), Watkins (1984) and Nanda (1985).
CHAPTER
2
Double Bracket Isospectral Flows 2.1 Double Bracket Flows for Diagonalization Brockett (1988) has introduced a new class of isospectral flows on the set of real symmetric matrices which have remarkable properties. He considers the ordinary differential equation H˙ (t) = [H (t) , [H (t) , N ]] ,
H (0) = H0
(1.1)
where [A, B] = AB − BA denotes the Lie bracket for square matrices and N is an arbitrary real symmetric matrix. We term this the double bracket equation. Brockett proves that (1.1) defines an isospectral flow which, under suitable assumptions on N , diagonalizes any symmetric matrix H (t) for t → ∞. Also, he shows that the flow (1.1) can be used to solve various combinatorial optimization tasks such as linear programming problems and the sorting of lists of real numbers. Further applications to the travelling salesman problem and to digital quantization of continuous-time signals have been described, see Brockett (1989a; 1989b) and Brockett and Wong (1991). Note also the parallel efforts by Chu (1984b; 1984a), Driessel (1986), Chu and Driessel (1990) with applications to structured inverse eigenvalue problems and matrix least squares estimation. The motivation for studying the double bracket equation (1.1) comes from the fact that it provides a solution to the following matrix least squares approximation problem. Let Q = diag (λ1 , . . . , λn ) with λ1 ≥ λ2 ≥ · · · ≥ λn be a given real
44
Chapter 2. Double Bracket Isospectral Flows
diagonal matrix and let
M (Q) = Θ QΘ ∈ Rn×n | ΘΘ = In
(1.2)
denote the set of all real symmetric matrices H = Θ QΘ orthogonally equivalent to Q. Thus M (Q) is the set of all symmetric matrices with eigenvalues λ1 , . . . , λn . Given an arbitrary symmetric matrix N ∈ Rn×n , we consider the task of minimizing the matrix least squares index (distance) N − H2 = N 2 + H2 − 2 tr (N H)
(1.3)
of N to any H ∈ M (Q) for the Frobenius norm A2 = tr (AA ). This is a 2 least estimation problem with spectral constraints. Since H = $n squares 2 i=1 λi is constant for H ∈ M (Q), the minimization of (1.3) is equivalent to the maximization of the trace functional φ (H) = tr (N H) defined on M (Q). Heuristically, if N is chosen to be diagonal we would expect that the matrices H∗ ∈ M (Q) which minimize (1.3) are of the same form, i.e. H∗ = πQπ for a suitable n × n permutation matrix. Since M (Q) is a smooth manifold (Proposition 1.1) it seems natural to apply steepest descent techniques in order to determine the minima H∗ of the distance function (1.3) on M (Q). This program will be carried out in this section. The intuitive meaning of equation (1.1) will be explained by showing that it is actually the gradient flow of the distance function (1.3) on M (Q). Since M (Q) is a homogeneous space for the Lie group O (n) of orthogonal matrices, we first digress to the topic of Lie groups and homogeneous spaces before starting our analysis of the flow (1.1). Digression: Lie Groups and Homogeneous Spaces A Lie group G is a group, which is also a smooth manifold, such that G × G → G,
(x, y) → xy −1
is a smooth map. Examples of Lie groups are
(a) The general linear group GL (n, R) = T ∈ Rn×n | det (T ) = 0 .
(b) The special linear group SL (n, R) = T ∈ Rn×n | det (T ) = 1 .
(c) The orthogonal group O (n) = T ∈ Rn×n | T T = In .
(d) The unitary group U (n) = T ∈ Cn×n | T T ∗ = In . The groups O (n) and U (n) are compact Lie groups, i.e. Lie groups which are compact spaces. Also, GL (n, R) and O (n) have two connected components while SL (n, R) and U (n) are connected.
2.1. Double Bracket Flows for Diagonalization
45
The tangent space G := Te G of a Lie group G at the identity element e ∈ G carries in a natural way the structure of a Lie algebra. The Lie algebras of the above examples of Lie groups are
(a) gl (n, R) = X ∈ Rn×n
(b) sl (n, R) = X ∈ Rn×n | tr (X) = 0
(c) skew (n, R) = X ∈ Rn×n | X = −X
(d) skew (n, C) = X ∈ Cn×n | X ∗ = −X In all of these cases the product structure on the Lie algebra is given by the Lie bracket [X, Y ] = XY − Y X of matrices X,Y. A Lie group action of a Lie group G on a smooth manifold M is a smooth map σ : G × M → M, (g, x) → g · x satisfying for all g, h ∈ G, and x ∈ M g · (h · x) = (gh) · x,
e · x = x.
The group action σ : G × M → M is called transitive if there exists x ∈ M such that every y ∈ M satisfies y = g · x for some g ∈ G. A space M is called homogeneous if there exists a transitive G-action on M . The orbit of x ∈ M is defined by O (x) = {g · x | g ∈ G} . Thus the homogeneous spaces are the orbits of a group action. If G is a compact Lie group and σ : G × M → M a Lie group action, then the orbits O (x) of σ are smooth, compact submanifolds of M . Any Lie group action σ : G × M → M induces an equivalence relation ∼ on M defined by x ∼ y ⇐⇒ There exists
g∈G
with
y =g·x
for x, y ∈ M . Thus the equivalence classes of ∼ are the orbits of σ : G×M → M . The orbit space of σ : G×M → M , denoted by M/G = M/ ∼, is defined as the set of all equivalence classes of M , i.e. M/G := {O (x) | x ∈ M } . Here M/G carries a natural topology, the quotient topology, which is defined as the finest topology on M/G such that the quotient map π : M → M/G,
π (x) := O (x) ,
46
Chapter 2. Double Bracket Isospectral Flows
is continuous. Thus M/G is Hausdorff only if the orbits O (x), x ∈ M , are closed subsets of M . Given a Lie group action σ : G × M → M and a point X ∈ M , the stabilizer subgroup of x is defined as Stab (x) := {g ∈ G | g · x = x} . Stab (x) is a closed subgroup of G and is also a Lie group. For any subgroup H ⊂ G the orbit space of the H-action α : H × G → G, (h, g) → gh−1 , is the set of coset classes G/H = {g · H | g ∈ G} which is a homogeneous space. If H is a closed subgroup of G, then G/H is a smooth manifold. In particular, G/ Stab (x), x ∈ M , is a smooth manifold for any Lie group action σ : G × M → M . Consider the smooth map σx : G → M,
g → g · x,
for a given x ∈ M . Then the image σx (G) of σx coincides with the G-orbit O (x) and σx induces a smooth bijection σ ¯x : G/ Stab (x) → O (x) . This map is a diffeomorphism if G is a compact Lie group. For arbitrary non-compact Lie groups G, the map σ ¯x need not be a homeomorphism. Note that the topologies of G/ Stab (x) and O (x) are defined in a different way : G/ Stab (x) is endowed with the quotient topology while O (x) carries the relative subspace topology induced from M . If G is compact, then these topologies are homeomorphic via σ ¯x .
Returning to the study of the double bracket equation (1.1) as a gradient flow on the manifold M (Q) of (1.2), let us first give a derivation of some elementary facts concerning the geometry of the set of symmetric matrices with fixed eigenvalues. Let 0 λ1 In1 .. (1.4) Q= . 0 λr Inr be a real diagonal n × n matrix with eigenvalues λ1 > · · · > λr occurring with multiplicities n1 , . . . , nr , so that n1 + · · · + nr = n.
2.1. Double Bracket Flows for Diagonalization
47
Proposition 1.1 M (Q) is a smooth, connected, compact manifold of dimension r 1 2 2 dim M (Q) = ni . n − 2 i=1 Proof 1.2 The proof uses some elementary facts and the terminology from differential geometry which are summarized in the above digression on Lie groups and homogeneous spaces. Let O (n) denote the compact Lie group of all orthogonal matrices Θ ∈ Rn×n . We consider the smooth Lie group action σ : O (n) × Rn×n → Rn×n defined by σ (Θ, H) = ΘHΘ . Thus M (Q) is an orbit of the group action σ and is therefore a compact manifold. The stabilizer subgroup Stab (Q) ⊂ O (n) of Q ∈ Rn×n is defined as Stab (Q) = {Θ ∈ O (n) | ΘQΘ = Q}. Since ΘQ = QΘ if and only if Θ = diag (Θ1 , . . . , Θr ) with Θi ∈ O (ni ), we see that Stab (Q) = O (n1 ) × · · · × O (nr ) ⊂ O (n) .
(1.5)
Therefore M (Q) ∼ = O (n) / Stab (Q) is diffeomorphic to the homogeneous space M (Q) ∼ =O (n) /O (n1 ) × · · · × O (nr ) ,
(1.6)
and dim M (Q) = dim O (n) −
r
dim O (ni )
i=1 r
n (n − 1) 1 − ni (ni − 1) 2 2 i=1 r 1 2 2 ni . = n − 2 i=1 =
To see the connectivity of M (Q) note that the subgroup SO (n) = {Θ ∈ O (n) | det Θ = 1} of O (n) is connected. Now M (Q) is the image of SO (n) under the continuous map f : SO (n) → M (Q), f (Θ) = ΘQΘ , and therefore also connected. This completes the proof. We need the following description of the tangent spaces of M (Q). Lemma 1.3 The tangent space of M (Q) at H ∈ M (Q) is
TH M (Q) = [H, Ω] | Ω = −Ω ∈ Rn×n .
(1.7)
48
Chapter 2. Double Bracket Isospectral Flows
Proof 1.4 Consider the smooth map σH : O (n) → M (Q) defined by σH (Θ) = ΘHΘ . Note that σH is a submersion and therefore it induces a surjective map on tangent spaces. The tangent space of O (n) at the n × n identity matrix I is TI O (n) = {Ω ∈ Rn×n | Ω = −Ω} (Appendix C) and the derivative of σH at I is the surjective linear map DσH |I : TI O (n) →TH M (Q) , Ω →ΩH − HΩ.
(1.8)
This proves the result. We now state and prove our main result on the double bracket equation (1.1). The result in this form is due to Bloch, Brockett and Ratiu (1990). Theorem 1.5 Let N ∈ Rn×n be symmetric, and Q satisfy (1.4). (a) The differential equation(1.1), H˙ = [H, [H, N ]] ,
H (0) = H (0) = H0 ,
defines an isospectral flow on the set of all symmetric matrices H ∈ Rn×n . (b) There exists a Riemannian metric on M (Q) such that (1.1) is the gradient flow H˙ = grad fN (H) of the function fN : M (Q) → R, 2 fN (H) = − 21 N − H . (c) The solutions of (1.1) exist for all t ∈ R and converge to a connected component of the set of equilibria points. The set of equilibria points H∞ of (1.1) is characterized by [H∞ , N ] = 0, i.e. N H∞ = H∞ N . (d) Let N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn . Then every solution H (t) of (1.1) converges
for t → ±∞ to a diagonal matrix π diag (λ1 , . . . , λn ) π = diag λπ(1) , . . . , λπ(n) , where λ1 , . . . , λn are the eigenvalues of H0 and π is a permutation matrix. (e) Let N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn and define the function fN : M (Q) → R, fN (H) = − 12 N − H2 . The Hessian of fN : M (Q) → R at each critical point is nonsingular. For almost all initial conditions H0 ∈ M (Q) the solution H (t) of (1.1) converges to Q as t → ∞ with an exponential bound on the rate of convergence. For the case of distinct eigenvalues λi = λj , then the linearization at a critical point H∞ is
i > j, (1.9) ξ˙ij = − λπ(i) − λπ(j) (µi − µj ) ξij ,
where H∞ = diag λπ(1) , . . . , λπ(n) , for π a permutation matrix.
2.1. Double Bracket Flows for Diagonalization
49
Proof 1.6 To prove (a) note that [H, N ] = HN − N H is skew-symmetric if H, N are symmetric. Thus the result follows immediately from Lemma 1.4.1. It follows that every solution H (t), t belonging to an interval in R, of (1.1) satisfies H (t) ∈ M (H0 ). By Proposition 1.1, M (H0 ) is compact and therefore H (t) exists for all t ∈ R. Consider the time derivative of the t function fN (H (t)) = − 21 N − H (t)2 = − 21 N 2 −
1 2
H0 2 + tr (N H (t))
and note that d fN (H (t)) = tr N H˙ (t) = tr (N [H (t) , [H (t) , N ]]) . dt Since tr (A [B, C]) = tr ([A, B] C), and [A, B] = − [B, A], then with A, B symmetric matrices, we have [A, B] = [B, A] and thus
tr (N [H, [H, N ]]) = tr ([N, H] [H, N ]) = tr [H, N ] [H, N ] . Therefore d 2 fN (H (t)) = [H (t) , N ] . dt
(1.10)
Thus fN (H (t)) increases monotonically. Since fN : M (H0 ) → R is a continuous function on the compact space M (H0 ), then fN is bounded from above and below. It follows that fN (H (t)) converges to a finite value and its time derivative must go to zero. Therefore every solution of (1.1) converges to a connected component of the set of equilibria points as t → ∞. By (1.10), the set of equilibria of (1.1) is characterized by [N, H∞ ] = 0. This proves (c). Now suppose that N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn . By (c), the equilibria of (1.1) are the symmetric matrices H∞ = (hij ) which commute with N and thus µi hij = µj hij
for i, j = 1, . . . , n.
(1.11)
Now (1.11) is equivalent to (µi − µj ) hij = 0 for i, j = 1, . . . , n and hence, by assumption on N , to hij = 0 for i = j. Thus H∞ is a diagonal matrix with the same eigenvalues as H0 and it follows that H∞ = π diag (λ1 , . . . , λn ) π . Therefore (1.1) has only a finite number of equilibrium points. We have already shown that every solution of (1.1) converges to a connected component of the set of equilibria and thus to a single equilibrium H∞ . This completes the proof of (d). In order to prove (b) we need some preparation. For any H ∈ M (Q) let σH : O (n) → M (Q) be the smooth map defined by σH (Θ) = Θ HΘ. The
50
Chapter 2. Double Bracket Isospectral Flows
derivative at In is the surjective linear map DσH |I : TI O (n) →TH M (Q) , Ω →HΩ − ΩH. Let skew (n) denote the set of all real n × n skew-symmetric matrices. The kernel of DσH |I is then K = ker DσH |I = {Ω ∈ skew (n) | HΩ = ΩH} .
(1.12)
Let us endow the vector space skew (n) with the standard inner product defined by (Ω1 , Ω2 ) → tr (Ω1 Ω2 ). The orthogonal complement of K in skew (n) then is K ⊥ = {Z ∈ skew (n) | tr (Z Ω) = 0 Since
∀Ω ∈ K} .
(1.13)
tr [N, H] Ω = − tr (N [H, Ω]) = 0
for all Ω ∈ K, then [N, H] ∈ K ⊥ for all symmetric matrices N . For any Ω ∈ skew (n) we have the unique decomposition Ω = Ω H + ΩH
(1.14)
with ΩH ∈ K and ΩH ∈ K ⊥ . The gradient of a function on a smooth manifold M is only defined with respect to a fixed Riemannian metric on M (see Sections C.9 and C.10). To define the gradient of the distance function fN : M (Q) → R we therefore have to specify which Riemannian metric on M (Q) we consider. Now to define a Riemannian metric on M (Q) we have to define an inner product on each tangent space TH M (Q) of M (Q). By Lemma 1.3, DσH |I : skew (n) → TH M (Q) is surjective with kernel K and hence induces an isomorphism of K ⊥ ⊂ skew (n) onto TH M (Q). Thus to define an inner product on TH M (Q) it is equivalent to define an inner product on K ⊥ . We proceed as follows: Define for tangent vectors [H, Ω1 ] , [H, Ω2 ] ∈ TH M (Q) , (1.15) [H, Ω1 ] , [H, Ω2 ] := tr ΩH ΩH 1 2 where ΩH i , i = 1, 2, are defined by (1.14). This defines an inner product on TH M (Q) and in fact a Riemannian metric on M (Q). We refer to this as the normal Riemannian metric on M (Q). The gradient of fN : M (Q) → R with respect to this Riemannian metric is the unique vector field grad fN on M (Q) which satisfies the condition
2.1. Double Bracket Flows for Diagonalization
grad fN (H) ∈ TH M (Q) for all H ∈ M (Q)
(a) (b)
51
DfN |H ([H, Ω]) = grad fN (H) , [H, Ω] for all [H, Ω] ∈ TH M (Q) . (1.16)
By Lemma 1.3, condition (1.16) implies that for all H ∈ M (Q) grad fN (H) = [H, X]
(1.17)
for some skew-symmetric matrix X (which possibly depends on H). By computing the derivative of fN at H we find
DfN |H ([H, Ω]) = tr (N [H, Ω]) = tr [H, N ] Ω . Thus (1.16) implies
tr [H, N ] Ω = grad fN (H) , [H, Ω] = [H, X] , [H, Ω] = tr X H ΩH
(1.18)
for all Ω ∈ skew (n). H and therefore N ] ∈ K ⊥ we have [H, N ] = [H, N ]
Since [H, H tr [H, N ] Ω = tr [H, N ] Ω . By (1.17) therefore X H = [H, N ] .
(1.19)
This shows grad fN (H) = [H, X] = H, X H = [H, [H, N ]] . This completes the proof of (b). For a proof that the Hessian of fN : M (Q) → R is nonsingular at each critical point, without assumption on the multiplicities of the eigenvalues of Q, we refer to Duistermaat et al. (1983). Here we only treat the generic case where n1 = · · · = nn = 1, i.e. Q = diag (λ1 , . . . , λn ) with λ1 > · · · > λn . The linearization of the flow (1.1) on M (Q) around any equilibrium point H∞ ∈ M (Q) where [H∞ , N ] = 0 is given by ξ˙ = H∞ (ξN − N ξ) − (ξN − N ξ) H∞ ,
(1.20)
where is in the tangent space of M (Q) at H∞ =
ξ ∈ TH∞ M (Q) diag λπ(1) , . . . , λπ(n) , for π a permutation matrix. By Lemma 1.3, ξ =
52
Chapter 2. Double Bracket Isospectral Flows
[H∞ , Ω] holds for a skew-symmetric matrix Ω. Equivalently,
in terms of the matrix entries, we have ξ = (ξij ), Ω = (Ωij ) and ξij = λπ(i) − λπ(j) Ωij . Consequently, (1.20) is equivalent to the decoupled set of differential equations.
ξ˙ij = − λπ(i) − λπ(j) (µi − µj ) ξij for i, j = 1, . . . , n. Since ξij = ξji the tangent space TH∞ M (Q) is parametrized by the coordinates ξ ij for i > j. Therefore the eigenvalues of the linearization (1.20) are − λπ(i) − λπ(j) (µi − µj ) for i > j which are all nonzero. Thus the Hessian is nonsingular. Similarly, π = In is the unique
permutation matrix for which λπ(i) − λπ(j) (µi − µj ) > 0 for all i > j. Thus H∞ = Q is the unique critical point of fN : M (Q) → R for which the Hessian is negative definite. The union of the unstable manifolds (see Section C.11) of the equilibria points H∞ = diag λπ(1) , . . . , λπ(n) for π = In forms a closed subset of M (Q) of co-dimension at least one. (Actually, the co-dimension turns out to be exactly one). Thus the domain of attraction for the unique stable equilibrium H∞ = Q is an open and dense subset of M (Q). This completes the proof of (e), and the theorem. Remark 1.7 In the generic case, where the ordered eigenvalues λi , µi of 2 2 H0 , N are distinct, then the property N − H0 ≥ N − H∞ leads $n 2 2 to the Wielandt-Hoffman inequality N − H0 ≥ i=1 (µi − λi ) . Since the eigenvalues of a matrix depend continuously on the coefficients, this last inequality holds in general, thus establishing the Wielandt-Hoffman inequality for all symmetric matrices N , H. Remark 1.8 In Part (c) of Theorem 1.5 it is stated that every solution of (1.1) converges to a connected component of the set of equilibria points. Actually Duistermaat et al. (1983) has shown that fN : M (Q) → R is always a Morse-Bott function. Thus, using Proposition 1.3.7, it follows that any solution of (1.1) is converging to a single equilibrium point rather than to a set of equilibrium points. I 0 Remark 1.9 If Q = 0k 0 is the standard rank k projection operator, then the distance function fN : M (Q) → R is a Morse-Bott function on the Grassmann manifold GrassR (k, n). This function then induces a Morse-Bott function on the Stiefel manifold St (k, n), which coincides with the Rayleigh quotient RN : St (k, n) → R. Therefore the Rayleigh quotient is seen as a Morse-Bott function on the Stiefel manifold. Remark 1.10 The rˆ ole of the matrix N for the double bracket equation is that of a parameter which guides the equation to a desirable final state. In a diagonalization exercise one would choose N to be a diagonal matrix
2.1. Double Bracket Flows for Diagonalization
53
while the matrix to be diagonalized would enter as the initial condition of the double bracket flow. Then in the generic situation, the ordering of the diagonal entries of N will force the eigenvalues of H0 to appear in the same order. Remark 1.11 It has been shown that the gradient flow of the least squares distance function fN : M (Q) → R with respect to the normal Riemannian metric is the double bracket flow. This is no longer true if other Riemannian metrices are considered. For example, the gradient of fN with respect to the induced Riemannian metric is given by a very complicated formula in N and H which makes it hard to analyze. This is the main reason why the somewhat more complicated normal Riemannian metric is chosen.
Flows on Orthogonal Matrices The double bracket flow (1.1) provides a method to compute the eigenvalues of a symmetric matrix H0 . What about the eigenvectors of H0 ? Following Brockett, we show that suitable gradient flows evolving on the group of orthogonal matrices converge to the various eigenvector basis of H0 . Let O (n) denote the Lie group of n × n real orthogonal matrices and let N and H0 be n × n real symmetric matrices. We consider the smooth potential function φ : O (n) → R,
φ (Θ) = tr (N Θ H0 Θ)
(1.21)
defined on O (n). Note that N − Θ H0 Θ = N 2 − 2φ (Θ) + H0 2 , 2
so that maximizing φ (Θ) over O (n) is equivalent to minimizing the least 2 squares distances N − Θ H0 Θ of N to Θ H0 Θ ∈ M (H0 ). We need the following description of the tangent spaces of O (n). Lemma 1.12 The tangent space of O (n) at Θ ∈ O (n) is
TΘ O (n) = ΘΩ | Ω = −Ω ∈ Rn×n .
(1.22)
Proof 1.13 Consider the smooth diffeomorphism λΘ : O (n) → O (n) defined by left multiplication with Θ, ı.e. λΘ (ψ) = Θψ for ψ ∈ O (n). The tangent space of O (n) at the n × n identity matrix I is TI O (n) = {Ω ∈ Rn×n | Ω = −Ω}; i.e. is equal to the Lie algebra of skewsymmetric matrices. The derivative of λΘ at I is the linear isomorphism between tangent spaces D (λΘ )|I : TI O (n) → TΘ O (n) ,
Ω → ΘΩ.
(1.23)
54
Chapter 2. Double Bracket Isospectral Flows
The result follows. The standard Euclidean inner product A, B = tr (A B) on Rn×n induces an inner product on each tangent space TΘ O (n), defined by
(1.24) ΘΩ1 , ΘΩ2 = tr (ΘΩ1 ) (ΘΩ2 ) = tr (Ω1 Ω2 ) . This defines a Riemannian matrix on O (n) to which we refer to as the induced Riemannian metric on O (n). As an aside for readers with a background in Lie group theory, this metric coincides with the Killing form on the Lie algebra of O (n), up to a constant scaling factor. Theorem 1.14 Let N, H0 ∈ Rn×n be symmetric. Then: (a) The differential equation ˙ (t) = H0 Θ (t) N − Θ (t) N Θ (t) H0 Θ (t) , Θ
Θ (0) ∈ O (n) (1.25)
induces a flow on the Lie group O (n) of orthogonal matrices. Also ˙ = ∇φ (Θ) of the trace function (1.21) (1.25) is the gradient flow Θ on O (n) for the induced Riemannian metric on O (n). (b) The solutions of (1.25) exist for all t ∈ R and converge to a connected component of the set of equilibria points Θ∞ ∈ O (n). The set of equilibria points Θ∞ of (1.25) is characterized by [N, Θ∞ H0 Θ∞ ] = 0. (c) Let N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn and suppose H0 has distinct eigenvalues λ1 > · · · > λn . Then every solution Θ (t) of (1.25) converges
for t → ∞ to an orthogonal matrix Θ∞ satisfying H0 = Θ∞ diag λπ(1) , . . . , λπ(n) Θ∞ for a permutation matrix π. The columns of Θ∞ are eigenvectors of H0 . (d) Let N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn and suppose H0 has distinct eigenvalues λ1 > · · · > λn . The Hessian of φ : O (n) → R at each critical point is nonsingular. For almost all initial conditions Θ0 ∈ O (n) the solution Θ (t) of (1.25) converges exponentially fast to an eigenbasis Θ∞ of H0 , such that H0 = Θ∞ diag (λ1 , . . . , λn ) Θ∞ holds. Proof 1.15 The derivative of φ : O (n) → R, φ (Θ) = tr (N Θ H0 Θ), at Θ ∈ O (n) is the linear map on the tangent space TΘ O (n) defined by Dφ|Θ (ΘΩ) = tr (N Θ H0 ΘΩ − N ΩΘ H0 Θ) = tr ([N, Θ H0 Θ] Ω)
(1.26)
2.1. Double Bracket Flows for Diagonalization
55
for Ω = −Ω. Let ∇φ (Θ) denote the gradient of φ at Θ ∈ O (n), defined with respect to the induced Riemannian metric on O (n). Thus ∇φ (Θ) is characterized by (a)
∇φ (Θ) ∈ TΘ O (n) ,
(b)
Dφ|φ (ΘΩ) = ∇φ (Θ) , ΘΩ = tr ∇φ (Θ) ΘΩ ,
(1.27)
for all skew-symmetric matrices Ω ∈ Rn×n . By (1.26) and (1.27) then (1.28) tr (Θ ∇φ (Θ)) Ω = tr ([N, Θ H0 Θ] Ω) for all Ω = −Ω and thus, by skew symmetry of Θ ∇φ (Θ) and [N, Θ H0 Θ], we have Θ ∇φ (Θ) = − [N, Θ H0 Θ] .
(1.29)
Therefore (1.25) is the gradient flow of φ : O (n) → R, proving (a). By compactness of O (n) and by the convergence properties of gradient flows, the solutions of (1.25) exist for all t ∈ R and converge to a connected component of the set of equilibria points Θ∞ ∈ O (n), corresponding to a fixed value of the trace function φ : O (n) → R. Also, Θ∞ ∈ O (n) is an equilibrium print of (1.25) if and only if Θ∞ [N, Θ∞ H0 Θ∞ ] = 0, i.e. as Θ∞ is invertible, if and only if [N, Θ∞ H0 Θ∞ ] = 0. This proves (b). To prove (c) we need a lemma. Lemma 1.16 Let N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn . Let ψ $r∈ O (n) with H0 = ψ diag (λ1 In1 , . . . , λr Inr ) ψ , λ1 > · · · > λn and i=1 ni = n. Then the set of equilibria points of (1.25) is characterized as Θ∞ = ψDπ
(1.30)
where D = diag (D1 , . . . , Dr ) ∈ O (n1 ) × · · · × O (nr ), Di ∈ O (ni ), i = 1, . . . , r, and π is an n × n permutation matrix. Proof 1.17 By (b), Θ∞ is an equilibrium point if and only if [N, Θ∞ H0 Θ∞ ] = 0. As N has distinct diagonal entries this condition is equivalent to Θ∞ H0 Θ∞ being a diagonal matrix Λ. This is clearly true for matrices of the form (1.30). Now the diagonal elements of Λ are the eigenvalues of H0 and therefore Λ = π diag (λ1 In1 , . . . , λr Inr ) π for an n × n permutation matrix π. Thus diag (λ1 In1 , . . . , λr Inr ) =πΘ∞ H0 Θ∞ π
(1.31)
=πΘ∞ ψ diag (λ1 In1 , . . . , λr Inr ) ψ Θ∞ π .
56
Chapter 2. Double Bracket Isospectral Flows
Thus ψ Θ∞ π commutes with diag (λ1 In1 , . . . , λr Inr ). But any orthogonal matrix which commutes with diag (λ1 In1 , . . . , λr Inr ), λ1 > · · · > λr , is of the form D = diag (D1 , . . . , Dr ) with Di ∈ O (ni ), i = 1, . . . , r, orthogonal. The result follows. It follows from this lemma and the genericity conditions on N and H0 in (c) that the equilibria of (1.25) are of the form Θ∞ = ψDπ, where D = diag (±1, . . . , ±1) is an arbitrary sign matrix and π is a permutation matrix. In particular there are exactly 2n n! equilibrium points of (1.25). As this number is finite, part (b) implies that every solution Θ (t) converges
to an element Θ∞ ∈ O (n) with Θ∞ H0 Θ∞ = π diag (λ1 , . . . , λn ) π = diag λπ(1) , . . . , λπ(n) . In particular, the column vectors of Θ∞ are eigenvectors of H0 . This completes the proof of (c). To prove (d) we linearize the flow (1.25) around each equilibrium point. The linearization of (1.25) at Θ∞ = ψDπ, where D = diag (d1 , . . . , dn ), di ∈ {±1}, is ξ˙ = −Θ∞ [N, ξ H0 Θ∞ + Θ∞ H0 ξ]
(1.32)
for ξ = Θ∞ Ω ∈ TΘ∞ O (n). Thus (1.32) is equivalent to ˙ = − [N, [Θ H0 Θ∞ , Ω]] Ω ∞
(1.33)
on the linear space skew (n) of skew-symmetric n × n matrices Ω. As
Θ∞ H0 Θ∞ = diag λπ(1) , . . . , λπ(n) , this is equivalent, in terms of the matrix entries of Ω = (Ωij ), to the decoupled set of linear differential equations
Ω˙ ij = − λπ(i) − λπ(j) (µi − µj ) Ωij , i > j. (1.34) From this it follows immediately that the eigenvalues of the linearization at Θ∞ are nonzero. Furthermore, they are all negative if and only if
λπ(1) , . . . , λπ(n) and (µ1 , . . . , µn ) are similarly ordered, that is if and only if π = In . Arguing as in the proof of Theorem 1.5, the union of the unstable manifolds of the critical points Θ∞ = ψDπ with π = In is a closed subset of O (n) of co-dimension at least one. Its complement thus forms an open and dense subset of O (n). It is the union of the domains of attractions for the 2n locally stable attractors Θ∞ = ψD, with D = diag (±1, . . . , ±1) arbitrary. This completes the proof of (d). Remark 1.18 In Part (b) of the above theorem it has been stated that every solution of (1.24) converges to a connected component of the set of
2.1. Double Bracket Flows for Diagonalization
57
equilibria points. Again, it can be shown that φ : O (n) → R, φ (Θ) = tr (N Θ H0 Θ), is a Morse-Bott function. Thus, using Proposition 1.3.9, any solution of (1.24) is converging to an equilibrium point rather than a set of equilibria. It is easily verified that Θ (t) given from (1.25) implies that H (t) = Θ (t) H0 Θ (t)
(1.35)
is a solution of the double bracket equation (1.1). In this sense, the double bracket equation is seen as a projected gradient flow from O (n). A final observation is that in maximizing φ (Θ) = tr (N Θ HΘ) over O (n) in the case of generic matrices N , H0 as in Theorem 1.14 Part (d), then of the 2n n! possible equilibria Θ∞ = ψDπ, the 2n maxima of φ$: O (n) → R with n π = In maximize the sum of products of eigenvalues i=1 λπ(i) µi . This $n ties in with a classical result that to maximize i=1 λπ(i) µi there must be a “similar” ordering; see Hardy, Littlewood and Polya (1952). Problem 1.19 For a nonsingular positive definite symmetric matrix A, show that the solutions of the equation H˙ = [H, A [H, N ] A] converges to the set of H∞ satisfying [H∞ , N ] = 0. Explore the convergence properties in the case where A is diagonal. Problem 1.20 Let A, B ∈ Rn×n be symmetric. For any integer i ∈ N i i define A (B) recursively by adA (B) := AB − BA, adA (B) =
ad i−1 i adA adA (B) , i ≥ 2. Prove that adA (B) = 0 for some i ≥ 1 implies adA (B) = 0. Deduce that for i ≥ 1 arbitrary H˙ = H, adiH (N ) has the same equilibria as (1.1). Problem 1.21 Let diag (H) = diag (h11 , . . . , hnn ) denote the diagonal matrix with diagonal entries identical to those of H. Consider the problem of minimizing the distance function g : M (Q) → R defined by g (H) = 2 H − diag (H) . Prove that the gradient flow of g with respect to the normal Riemannian metric is H˙ = [H, [H, diag (H)]] . Show that the equilibrium points satisfy [H∞ , diag (H∞ )] = 0. Problem 1.22 Let N ∈ Rn×n be symmetric. Prove that the gradient flow of the function H → tr N H 2 on M (Q) with respect to the normal Riemannian metric is H˙ = H, H 2 , N
58
Chapter 2. Double Bracket Isospectral Flows
Investigate the convergence properties of the flow. Generalize to tr (N H m ) for m ∈ N arbitrary!
Main Points of Section An eigenvalue/eigenvector decomposition of a real symmetric matrix can be achieved by minimizing a matrix least squares distance function via a gradient flow on the Lie group of orthogonal matrices with an appropriate Riemannian metric. The distance function is a smooth Morse-Bott function. The isospectral double bracket flow on homogeneous spaces of symmetric matrices converges to a diagonal matrix consisting of the eigenvalues of the initial condition. A specific choice of Riemannian metric allows a particularly simple form of the gradient. The convergence rate is exponential, with stability properties governed by a linearization of the equations around the critical points.
2.2 Toda Flows and the Riccati Equation The Toda Flow An important issue in numerical analysis is to exploit special structures of matrices to develop faster and more reliable algorithms. Thus, for example, in eigenvalue computations an initial matrix is often first transformed into Hessenberg form and the subsequent operations are performed on the set of Hessenberg matrices. In this section we study this issue for the double bracket equation. For appropriate choices of the parameter matrix N , the double bracket flow (1.1) restricts to a flow on certain subclasses of symmetric matrices. We treat only the case of symmetric Hessenberg matrices. These are termed Jacobi matrices and are banded, being tri-diagonal, symmetric matrices H = (hij ) with hij = 0 for |i − j| ≥ 2. Lemma 2.1 Let N = diag (1, 2, . . . , n) and let H be a Jacobi matrix. Then [H, [H, N ]] is a Jacobi matrix, and (1.1) restricts to a flow on the set of isospectral Jacobi matrices. Proof 2.2 The (i, j)-th entry of B = [H, [H, N ]] is bij =
n k=1
(2k − i − j) hik hkj .
2.2. Toda Flows and the Riccati Equation
59
Hence for |i − j| ≥ 3 and k = 1, . . . , n, then hik = 0 or hkj = 0 and therefore bij = 0. Suppose j = i + 2. Then for k = i + 1 bij = (2 (i + 1) − i − j) hi,i+1 hi+1,i+2 = 0. Similarly for i = j + 2. This completes the proof. Actually for N = diag (1, 2, . . . , n) and H a Jacobi matrix HN − N H = Hu − H , where
Hu =
and
h11
h12
0 .. .
h22 .. .
..
.
..
.
0
...
0
0
... .. . .. .
h11 h 21 H = 0
h22 .. .
0
(2.1)
hn−1,n hnn ,
hn,n−1
0 .. . 0 hnn
denote the upper triangular and lower triangular parts of H respectively. (This fails if H is not Jacobi; why?) Thus, for the above special choice of N , the double bracket flow induces the isospectral flow on Jacobi matrices H˙ = [H, Hu − H ] .
(2.2)
The differential equation (2.2) is called the Toda flow. This connection between the double bracket flow (1.1) and the Toda flow has been first observed by Bloch (1990b). For a thorough study of the Toda lattice equation we refer to Kostant (1979). Flaschka (1974) and Moser (1975) have given the following Hamiltonian mechanics interpretation of (2.2). Consider the system of n idealized mass points x1 < · · · < xn on the real axis. Let us suppose that the potential energy of this system of points x1 , . . . , xn is given by V (x1 , . . . , xn ) =
n k=1
exk −xk+1
60
Chapter 2. Double Bracket Isospectral Flows
r x0 = −∞
r x1
r xi
r xi+1
r xn
r xn+1 = +∞
FIGURE 2.1. Mass points on the real axis
xn+1 = +∞, while the kinetic energy is given as usual by 12 the total energy of the system is given by the Hamiltonian
$n k=1
1 2 xk −xk+1 yk + e , H (x, y) = 2 n
n
k=1
k=1
x˙ 2k . Thus
(2.3)
with yk = x˙ k the momentum of the k-th mass point. Thus the associated Hamiltonian system is described by ∂H = yk , ∂yk ∂H y˙ k = − = exk−1 −xk − exk −xk+1 , ∂xk
x˙ k =
(2.4)
for k = 1, . . . , n. In order to see the connection between (2.2) and (2.4) we introduce the new set of coordinates (this trick is due to Flaschka). ak = 12 e(xk −xk+1 )/2 , Then with
b1
a 1 H= 0
bk = 12 yk , a1 b2 .. .
k = 1, . . . , n. 0
..
.
..
.
an−1
(2.5)
, an−1 bn
the Jacobi matrix defined by ak , bk , it is easy to verify that (2.4) holds if and only if H satisfies the Toda lattice equation (2.2). Let Jac (λ1 , . . . , λn ) denote the set of n × n Jacobi matrices with eigenvalues λ1 ≥ · · · ≥ λn . The geometric structure of Jac (λ1 , . . . , λn ) is rather complicated and not completely understood. However Tomei (1984) has shown that for distinct eigenvalues λ1 > · · · > λn the set Jac (λ1 , . . . , λn ) is a smooth, (n − 1)-dimensional compact manifold. Moreover, Tomei (1984) has determined the Euler characteristic of Jac (λ1 , . . . , λn ). He shows that for n = 3 the isospectral set Jac (λ1 , λ2 , λ3 ) for λ1 > λ2 > λ3 is a compact Riemann surface of genus two. With these remarks in mind, the following result is an immediate consequence of Theorem 1.5.
2.2. Toda Flows and the Riccati Equation
61
Corollary 2.3 (a) The Toda flow (2.2) is an isospectral flow on the set of real symmetric Jacobi matrices. (b) The solution H (t) of (2.2) exists for all t ∈ R and converges to a diagonal matrix as t → ±∞. (c) Let N = diag (1, 2, . . . , n) and λ1 > · · · > λn . The Toda flow on the isospectral manifold Jac (λ1 , . . . , λn ) is the gradient flow for the least squares distance function fN : Jac (λ1 , . . . , λn ) → R, fN (H) = 2 − 12 N − H . Remark 2.4 In Part (c) of the above Corollary, the underlying Riemannian metric on Jac (λ1 , . . . , λn ) is just the restriction of the Riemannian metric , appearing in Theorem 1.5 from M (Q) to Jac (λ1 , . . . , λn ).
Connection with the Riccati equation There is also an interesting connection of the double bracket equation (1.1) with the Riccati equation. For any real symmetric n × n matrix Q = diag (λ1 In1 , . . . , λr Inr )
(2.6)
with $r distinct eigenvalues λ1 > · · · > λr and multiplicities n1 , . . . , nr , with i=1 ni = n, let M (Q) = {Θ QΘ | Θ Θ = In }
(2.7)
denote the isospectral manifold of all symmetric n × n matrices H which are orthogonally similar to Q. The geometry of M (Q) is well understood, in fact, M (Q) is known to be diffeomorphic to a flag manifold. By the isospectral property of the double bracket flow, it induces by restriction a flow on M (Q) and therefore on a flag manifold. It seems difficult to analyze this induced flow in terms of the intrinsic geometry of the flag manifold; see however Duistermaat et al. (1983). We will therefore restrict ourselves to the simplest nontrivial case r = 2. But first let us digress on the topic of flag manifolds. Digression: Flag Manifolds A flag in Rn is an increasing sequence of subspaces V1 ⊂ · · · ⊂ Vr ⊂ Rn of Rn , 1 ≤ r ≤ n. Thus for r = 1 a flag is just a linear subspace of Rn . Flags V1 ⊂ · · · ⊂ Vn ⊂ Rn with dim Vi = i, i = 1, . . . , n, are called complete, and flags with r < n are called partial.
62
Chapter 2. Double Bracket Isospectral Flows
Given any sequence (n1 , . . . , nr ) of nonnegative integers with n1 + · · · + nr ≤ n, the flag manifold Flag (n1 , . . . , nr ) is defined as the set of all flags (V1 , . . . , Vr ) of vector spaces with V1 ⊂ · · · ⊂ Vr ⊂ Rn and dim Vi = n1 +· · ·+ ni , i = 1, . . . , r. Flag (n1 , . . . , nr ) is a smooth, connected, compact manifold. For r = 1, Flag (n1 ) = GrassR (n1 , n) is just the Grassmann manifold of n1 -dimensional linear subspaces of Rn . In particular, for n = 2 and n1 = 1, 1 projective line and is thus homeomorphic to the Flag (1) = RP
is the real 1 circle S = (x, y) ∈ R2 | x2 + y 2 = 1 . Let Q = diag (λ1 In1 , . . . , λr Inr ) with λ1 > · · · > λr and n1 + · · · + nr = n and let M (Q) be defined by (1.2). Using (1.6) it can be shown that M (Q) is diffeomorphic to the flag manifold Flag (n1 , . . . , nr ). This isospectral picture of the flag manifold will be particularly useful to us. In particular for r = 2 we have a diffeomorphism of M (Q) with the real Grassmann manifold GrassR (n1 , n) of n1 -dimensional linear subspaces of Rn . This is in harmony with the fact, mentioned earlier in Chapter 1, that for Q = diag (Ik , 0) the real Grassmann manifold GrassR (k, n) is diffeomorphic to the set M (Q) of rank k symmetric projection operators of Rn . Explicitly, to any orthogonal matrix n × n matrix Θ1 . Θ = .. Θr with Θi ∈ Rni ×n , i = 1, . . . , r, and n1 + · · · + nr = n, we associate the flag of vector spaces VΘ := (V1 (Θ) , . . . , Vr (Θ)) ∈ Flag (n1 , . . . , nr ) . Here Vi (Θ), i = 1, . . . , r, is defined as the (n1 + · · · + ni )-dimensional vector space in Rn which is generated by the row vectors of the sub-matrix [ Θ1 ... Θi ] This defines a map f : M (Q) → Flag (n1 , . . . , nr ) Θ QΘ →VΘ . ˆ QΘ ˆ then Θ ˆ = ψΘ for an orthogonal matrix ψ Note that, if Θ QΘ = Θ which satisfies ψ Qψ = Q. But this implies that ψ = diag (ψ1 , . . . , ψr ) is block diagonal and therefore VΘ ˆ = VΘ , so that f is well defined. ˆ Conversely, VΘ ˆ = VΘ implies Θ = ψΘ for an orthogonal matrix ψ = diag (ψ1 , . . . , ψr ). Thus the map f is well-defined and injective. It is easy to see that f is in fact a smooth bijection. The inverse of f is f −1 : Flag (n1 , . . . , nr ) →M (Q) V = (V1 , . . . , Vr ) →ΘV QΘV
2.2. Toda Flows and the Riccati Equation where
63
Θ1 . ΘV = .. ∈ O (n) Θr
⊥ ∩ Vi of and Θi is any orthogonal basis of the orthogonal complement Vi−1 −1 is smooth and thus Vi−1 in Vi ; i = 1, . . . , r. It is then easy to check that f f : M (Q) → Flag (n1 , . . . , nr ) is a diffeomorphism.
With the above background material on flag manifolds, let us proceed with the connection of the double bracket equation to the Riccati equation. Let GrassR (k, n) denote the Grassmann manifold of k-dimensional linear subspaces of Rn . Thus GrassR (k, n) is a compact manifold of dimension k (n − k). For k = 1, GrassR (1, n) = RPn−1 is the (n − 1)-dimensional projective space of lines in Rn (see digression on projective spaces of Section 1.2). For Ik 0 , Q= 0 0 M (Q) coincides with the set of all rank k symmetric projection operators H of Rn : H = H,
H 2 = H,
rank H = k,
(2.8)
and we have already shown (see digression) that M (Q) is diffeomorphic to the Grassmann manifold GrassR (k, n). The following result is a generalization of this observation. Lemma 2.5 For Q = diag (λ1 Ik , λ2 In−k ), λ1 > λ2 , the isospectral manifold M (Q) is diffeomorphic to the Grassmann manifold GrassR (k, n). Proof 2.6 To any orthogonal n × n matrix Θ1 Θ= Θ2 ∈ R(n−k)×n , , Θ1 ∈ Rk×n , Θ2 we associate the k dimensional vector-space VΘ1 ⊂ Rn , which is generated by the k orthogonal row vectors of Θ1 . This defines a map f : M (Q) → GrassR (k, n) , Θ QΘ →VΘ1 .
(2.9)
64
Chapter 2. Double Bracket Isospectral Flows
QΘ then Θ = ψΘ for an orthogonal matrix ψ Note that, if Θ QΘ = Θ which satisfies ψ Qψ = Q. But this implies that ψ = diag (ψ1 , ψ2 ) and therefore VΘ 1 = VΘ1 and f is well defined. Conversely, VΘ 1 = VΘ1 implies = ψΘ for an orthogonal matrix ψ = diag (ψ1 , ψ2 ). Thus (2.9) is injective. Θ It is easy to see that f is a bijection and a diffeomorphism. In fact, as M (Q) is the set of rank k symmetric projection operators, f (H) ∈ GrassR (k, n) is the image of H ∈ M (Q). Conversely let X ∈ Rn×k be such that the columns of X generate a k-dimensional linear subspace V ⊂ Rn . Then −1
PX := X (X X)
X
is the Hermitian projection operator onto V and f −1 : GrassR (k, n) →M (Q) −1
column span (X) →X (X X)
X
is the inverse of f . It is obviously smooth and a diffeomorphism. Every n×n matrix N ∈ Rn×n induces a flow on the Grassmann manifold ΦN : R × GrassR (k, n) → GrassR (k, n) defined by ΦN (t, V ) = etN · V,
(2.10)
where etN · V denotes the image of the k-dimensional subspace V ⊂ Rn under the invertible linear transformation etN : Rn → Rn . We refer to (2.10) as the flow on GrassR (k, n) which is linearly induced by N . We have already seen in Section 1.2, that linearly induced flows on the Grassmann manifold GrassR (k, n) correspond to the matrix Riccati equation K˙ = A21 + A22 K − KA11 − KA12 K
(2.11)
on R(n−k)×k , where A is partitioned as A11 A12 A= . A21 A22 Theorem 2.7 Let N ∈ Rn×n be symmetric and Q = diag (λ1 Ik , λ2 In−k ), λ1 > λ2 , for 1 ≤ k ≤ n − 1. The double bracket flow (1.1) is equivalent via the map f , defined by (2.9), to the flow on the Grassmann manifold GrassR (k, n) linearly induced by (λ1 − λ2 ) N . It thus induces the Riccati equation (2.11) with generating matrix A = (λ1 − λ2 ) N .
2.2. Toda Flows and the Riccati Equation
65
Proof 2.8 Let H0 = Θ0 QΘ0 ∈ M (Q) and let H (t) ∈ M (Q) be a solution of (1.1) with H (0) = H0 . We have to show that for all t ∈ R f (H (t)) = et(λ1 −λ2 )N V0 ,
(2.12)
where f is defined by (2.9) and V0 = VΘ0 . By Theorem 1.14, H (t) = Θ (t) QΘ (t) where Θ (t) satisfies the gradient flow on O (n)
Hence for X = Θ
˙ = Θ (Θ QΘN − N Θ QΘ) . Θ
(2.13)
X˙ = N XQ − XQX N X.
(2.14)
Let X (t), t ∈ R, be any orthogonal matrix solution of (2.14). (Note that orthogonality of X (t) holds automatically in case of orthogonal initial conditions.) Let S (t) ∈ Rn×n , S (0) = In , be the unique matrix solution of the linear time-varying system S˙ = X (t) N X (t) ((λ1 − λ2 ) S − QS) + QX (t) N X (t) S.
(2.15)
Let Sij (t) denote the (i, j)-block entry of S (t). Suppose S21 (t0 ) = 0 for some t0 ∈ R. A straightforward computation using (2.15) shows that then also S˙ 21 (t0 ) = 0. Therefore (2.15) restricts to a time-varying flow on the subset of block upper triangular matrices. In particular, the solution S (t) with S (0) = In is block upper triangular for all t ∈ R S11 (t) S12 (t) . S (t) = 0 S22 (t) Lemma 2.9 For any solution X (t) of (2.14) let S11 S12 S= 0 S22 be the solution of (2.15) with S (0) = In . Then X (t) · S (t) = et(λ1 −λ2 )N · X (0) .
(2.16)
Proof 2.10 Let Y (t) = X (t) · S (t). Then ˙ + X S˙ Y˙ =XS =N XQS − XQX N XS + N X ((λ1 − λ2 ) S − QS) + XQX N XS = (λ1 − λ2 ) N Y. Let Z (t) = et(λ1 −λ2 )N · X (0). Now Y and Z both satisfy the linear differential equation ξ˙ = (λ1 − λ2 ) N ξ with identical initial condition Y (0) = Z (0) = X (0). Thus Y (t) = Z (t) for all t ∈ R and the lemma is proved.
66
Chapter 2. Double Bracket Isospectral Flows
We now have the proof of Theorem 2.7 in our hands. In fact, let V (t) = f (H (t)) ∈ GrassR (k, n) denote the vector space which is generated by the first k column vectors of X (t). By the above lemma V (t) = et(λ1 −λ2 )N V (0) which completes the proof. One can use Theorem 2.7 to prove results on the dynamic Riccati equation arising in linear optimal control, see Anderson and Moore (1990). Let N=
A
−BB
−C C
−A
(2.17)
be the Hamiltonian matrix associated with a linear system x˙ = Ax + Bu,
y = Cx
(2.18)
where A ∈ Rn×n , B ∈ Rn×m and C ∈ Rp×n . If m = p and (A, B, C) = (A , C , B ) then N is symmetric and we can apply Theorem 2.7. Corollary 2.11 Let (A, B, C) = (A , C , B ) be a symmetric controllable and observable realization. The Riccati equation K˙ = −KA − A K + KBB K − C C
(2.19)
extends to a gradient flow on GrassR (n, 2n) given by the double bracket equation (1.1) under (2.19). Every solution in GrassR (n, 2n) converges to an equilibrium point. Suppose A −BB N= −C C −A has distinct eigenvalues. Then (2.19) has ( 2n n ) equilibrium points in the Grassmannian GrassR (n, 2n), exactly one of which is asymptotically stable. Proof 2.12 By Theorem 2.7 the double bracket flow on M (Q) for Q = Ik 0 0 0 is equivalent to the flow on the Grassmannian GrassR (n, 2n) which is linearly induced by N . If N has distinct eigenvalues, then the double bracket flow on M (Q) has 2n equilibrium points with exactly one ben ing asymptotically stable. Thus the result follows immediately from Theorem 1.5.
2.2. Toda Flows and the Riccati Equation
67
−1
Remark 2.13 A transfer function G (s) = C (sI − A) B has a symmet ric realization (A, B, C) = (A , C , B ) if and only if G (s) = G (s) and the Cauchy index of G (s) is equal to the McMillan degree; Youla and Tissi (1966), Brockett and Skoog (1971). Such transfer functions arise frequently in circuit theory. Remark 2.14 Of course the second part of the theorem is a well-known fact from linear optimal control theory. The proof here, based on the properties of the double bracket flow, however, is new and offers a rather different approach to the stability properties of the Riccati equation. Remark 2.15 A possible point of confusion arising here might be that in some cases the solutions of the Riccati equation (2.19) might not exist for all t ∈ R (finite escape time) while the solutions to the double bracket equation always exist for all t ∈ R. One should keep in mind that Theorem 2.7 only says that the double bracket flow on M (Q) is equivalent to a linearly induced flow on GrassR (n, 2n). Now the Riccati equation (2.19) is the vector field corresponding to the linear induced flow on an open coordinate chart of GrassR (n, 2n) and thus, up to the change of variables described by the diffeomorphism f : M (Q) → GrassR (n, 2n), coincides on that open coordinate chart with the double bracket equation. Thus the double bracket equation on M (Q) might be seen as an extension or completion of the Riccati vector field (2.19). I 0 Remark 2.16 For Q = 0k 0 any H ∈ M (Q) satisfies H 2 = H. Thus the double bracket equation on M (Q) becomes the special Riccati equation on Rn×n H˙ = HN + N H − 2HN H which is shown in Theorem 2.7 to be equivalent to the general matrix Riccati equation (for N symmetric) on R(n−k)×k . Finally we consider the case k = 1, i.e. the associated flow on the projective space RPn−1 . Thus let Q = diag (1, 0, . . . , 0).$ Any H ∈ M (Q) has n a representation H = x · x where x = (x1 , . . . , xn ), j=1 x2j = 1, and x is uniquely determined up to multiplication by ±1. The double bracket flow on M (Q) is equivalent to the gradient flow of the standard Morse function Φ (x) =
n 1 1 tr (N xx ) = nij xi xj . 2 2 i,j=1
on RPn−1 see Milnor (1963). Moreover, the Lie bracket flow (1.25) on O (n) induces, for Q as above, the Rayleigh quotient flow on the (n − 1) sphere S n−1 of Section 1.3.
68
Chapter 2. Double Bracket Isospectral Flows
Problem 2.17 Prove that every solution of H˙ = AH + HA − 2HAH,
H (0) = H0 ,
(2.20)
has the form
−1 tA H (t) = etA H0 In − H0 + e2tA H0 e ,
t ∈ R.
Problem 2.18 Show that the spectrum (i.e. the set of eigenvalues) σ (H (t)) of any solution is given by
σ (H (t)) = σ H0 e−2tA (In − H0 ) + H0 . Problem 2.19 Show that for any solution H (t) of (2.20) also G (t) = In − H (−t) solves (2.20). Problem 2.20 Derive similar formulas for time-varying matrices A (t).
Main Points of Section The double bracket equation H˙ = [H, [H, N ]] with H (0) a Jacobi matrix and N = diag (1, 2, . . . , n) preserves the Jacobi property in H (t) for all t ≥ 0. In this case the double bracket equation is the Toda flow H˙ = [H, Hu − H ] which has interpretations in Hamiltonian mechanics. For the case of symmetric matrices N and Q = diag (λ1 Ik , λ2 In−k ) with λ1 > λ2 , the double bracket flow is equivalent to the flow on the Grassmann manifold GrassR (k, n) linearly induced by (λ1 − λ2 ) N . This in turn is equivalent to a Riccati equation (2.11) with generating matrix A = (λ1 − λ2 ) N . This result gives an interpretation of certain Riccati equations of linear optimal control as gradient flows. For the special case Q = diag (Ik , 0, . . . , 0), then H 2 = H and the double bracket equation becomes the special Riccati equation H˙ = HN + N H − 2HN H. When k = 1, then H = xx with x a column vector and the double bracket equation is induced by x˙ = (N − x N xIn ) x, which is the Rayleigh quotient gradient flow of tr (N xx ) on the sphere S n−1 .
2.3 Recursive Lie-Bracket Based Diagonalization Now that the double bracket flow with its rich properties has been studied, it makes sense to ask whether or not there are corresponding recursive versions. Indeed there are. For some ideas and results in this direction we refer to Chu (1992b), Brockett (1993), and Moore, Mahony and Helmke
2.3. Recursive Lie-Bracket Based Diagonalization
69
(1994). In this section we study the following recursion, termed the Liebracket recursion, Hk+1 = e−α[Hk ,N ] Hk eα[Hk ,N ] ,
H0 = H0 ,
k∈N
(3.1)
for arbitrary symmetric matrices H0 ∈ Rn×n , and some suitably small scalar α, termed a step size scaling factor. A key property of the recursion (3.1) is that it is isospectral. This follows since eα[Hk ,N ] is orthogonal, as indeed is any eA where A is skew symmetric. To motivate this recursion (3.1), observe that Hk+1 is also the solution at time t = α to a linear matrix differential equation initialized by Hk as follows ¯ dH ¯ [Hk , N ] , = H, dt ¯ (α) . Hk+1 =H
¯ (0) = Hk , H
¯ (t) appears to be a solution close to For small t, and thus α small, H that of the corresponding double bracket flow H (t) of H˙ = [H, [H, N ]], H (0) = Hk . This suggests, that for step-size scaling factors not too large, or not decaying to zero too rapidly, that the recursion (3.1) should inherit the exponential convergence rate to desired equilibria of the continuoustime double bracket equation. Indeed, in some applications this piece-wise constant, linear, and isospectral differential equation may be more attractive to implement than a nonlinear matrix differential equation. Our approach in this section is to optimize in some sense the α selections in (3.1) according to the potential of the continuous-time gradient flow. The subsequent bounding arguments are similar to those developed by Brockett (1993), see also Moore et al. (1994). In particular, for the potential function 2
2
(3.2)
∆fN (Hk , α) := fN (Hk+1 ) − fN (Hk ) = tr (N (Hk+1 − Hk ))
(3.3)
fN (Hk ) = − 12 N − Hk = tr (N Hk ) −
1 2
2
N −
1 2
H0
we seek to maximize at each iteration its increase
Lemma 3.1 The constant step-size selection α=
1 4 H0 · N
satisfies ∆fN (Hk , α) > 0 if [Hk , N ] = 0.
(3.4)
70
Chapter 2. Double Bracket Isospectral Flows
Proof 3.2 Let Hk+1 (τ ) = e−τ [Hk ,N ] Hk eτ [Hk ,N ] be the k + 1th iteration of (3.1) for an arbitrary step-size scaling factor τ ∈ R. It is easy to verify that d Hk+1 (τ ) = [Hk+1 (τ ) , [Hk , N ]] dτ d2 Hk+1 (τ ) = [[Hk+1 (τ ) , [Hk , N ]] , [Hk , N ]] . dτ 2 Applying Taylor’s theorem, then since Hk+1 (0) = Hk Hk+1 (τ ) =Hk + τ [Hk , [Hk , N ]] + τ 2 R2 (τ ) , where
%
1
[[Hk+1 (yτ ) , [Hk , N ]] , [Hk , N ]] (1 − y) dy.
R2 (τ ) =
(3.5)
0
Substituting into (3.3) gives, using matrix norm inequalities outlined in Sections A.6 and A.9,
∆fN (Hk , τ ) = tr N τ [Hk , [Hk , N ]] + τ 2 R2 (τ ) 2
=τ [Hk , N ] + τ 2 tr (N R2 (τ )) 2
≥τ [Hk , N ] − τ 2 |tr (N R2 (τ ))| ≥τ [Hk , N ]2 − τ 2 N · R2 (τ ) 2
≥τ [Hk , N ] − τ 2 N % 1 [[Hk+1 (yτ ) , [Hk , N ]] , [Hk , N ]] (1 − y) dy · 0 2
2
≥τ [Hk , N ] − 2τ 2 N · H0 · [Hk , N ] L =:∆fN (Hk , τ ) .
(3.6)
L Thus ∆fN (Hk , τ ) is a lower bound for ∆fN (Hk , τ ) and has the property that for sufficiently small τ > 0, it is strictly positive, see Figure 3.1. Due L (Hk , τ ) in τ , it is immediately clear that if to the explicit form of ∆fN [Hk , N ] = 0, then α = 1/ (4 H0 N ) is the unique maximum of (3.6). L Hence, fN (Hk , α) ≥ fN (Hk , α) > 0 for [Hk , N ] = 0.
This lemma leads to the main result of the section. Theorem 3.3 Let H0 = H0 be a real symmetric n × n matrix with eigenvalues λ1 ≥ · · · ≥ λn and let N = diag (µ1 , . . . , µn ), µ1 > · · · > µn . The Lie-bracket recursion (3.1), restated as Hk+1 = e−α[Hk ,N ] Hk eα[Hk ,N ] ,
α = 1/ (4 H0 · N )
with initial condition H0 , has the following properties:
(3.7)
2.3. Recursive Lie-Bracket Based Diagonalization
∆ f (H , τ ) > N
k
71
∆ f L (H , τ ) N
k
∆ f L (H , τ ) N
k
α
τ ∆ f L (H , τ ) N
k
L FIGURE 3.1. The lower bound on ∆fN (Hk , τ )
(a) The recursion is isospectral. (b) If (Hk ) is a solution of the Lie-bracket algorithm, then fN (Hk ) of (3.2) is a strictly monotonically increasing function of k ∈ N, as long as [Hk , N ] = 0. (c) Fixed points of the recursive equation are characterised by matrices H∞ ∈ M (H0 ) such that [H∞ , N ] = 0, being exactly the equilibrium points of the double bracket equation (1.1). (d) Let (Hk ), k = 1, 2, . . . , be a solution to the recursive Lie-bracket algorithm, then Hk converges to a matrix H∞ ∈ M (H0 ), [H∞ , N ] = 0, an equilibrium point of the recursion. (e) All equilibrium points of the recursive Lie-bracket algorithm are unstable, except for Q = diag (λ1 , . . . , λn ) which is locally exponentially stable. Proof 3.4 To prove Part (a), note that the Lie-bracket [H, N ] = − [H, N ] is skew-symmetric. As the exponential of a skew-symmetric matrix is orthogonal, (3.1) is just conjugation by an orthogonal matrix, and hence is an isospectral transformation. Part (b) is a direct consequence of Lemma 3.1. For Part (c) note that if [Hk , N ] = 0, then by direct substitution into (3.1) we see Hk+l = Hk for l ≥ 1, and thus Hk is a fixed point. Conversely if [Hk , N ] = 0, then from (3.6), ∆fN (Hk , α) = 0, and thus Hk+1 = Hk . In particular, fixed points of (3.1) are equilibrium points of (1.1), and
72
Chapter 2. Double Bracket Isospectral Flows
furthermore, from Theorem 1.5, these are the only equilibrium points of (1.1). To show Part (d), consider the sequence Hk generated by the recursive Lie-bracket algorithm for a fixed initial condition H0 . Observe that Part (b) implies that fN (Hk ) is strictly monotonically increasing for all k where [Hk , N ] = 0. Also, since fN is a continuous function on the compact set M (H0 ), then fN is bounded from above, and fN (Hk ) will converge to some f∞ ≤ 0 as k → ∞. As fN (Hk ) → f∞ then ∆fN (Hk , αc ) → 0. For some small positive number ε, define an open set D ⊂ M (H0 ) consisting of all points of M (H0 ), within an -neighborhood of some equilibrium point of (3.1). The set M (H0 ) − Dε is a closed, compact subset of M (H0 ), on which the matrix function H → [H, N ] does not vanish. As a consequence, the difference function (3.3) is continuous and strictly positive on M (H0 ) − Dε , and thus, is under bounded by some strictly positive number δ1 > 0. Moreover, as ∆fN (Hk , α) → 0, there exists a K = K (δ1 ) such that for all k > K then 0 ≤ ∆fN (Hk , α) < δ1 . This ensures that Hk ∈ Dε for all k > K. In other words, (Hk ) is converging to some subset of possible equilibrium points. From Theorem 1.5 and the imposed genericity assumption on N , it is known that the double bracket equation (1.1) has only a finite number of equilibrium points. Thus Hk converges to a finite subset in M (H0 ). Moreover, [Hk , N ] → 0 as k → ∞. Therefore Hk+1 − Hk → 0 as k → ∞. This shows that (Hk ) must converge to an equilibrium point, thus completing the proof of Part (d). To establish exponential convergence, note that since α is constant, the map H → e−α[H,N ] Heα[H,N ] is a smooth recursion on all M (H0 ), and we may consider the linearization of this map at an equilibrium point π Qπ. The linearization of this recursion, expressed in terms of Ξk ∈ Tπ Qπ M (H0 ), is Ξk+1 = Ξk − α [(Ξk N − N Ξk ) π Qπ − π Qπ (Ξk N − N Ξk )] . Thus for the elements of Ξk = (ξij )k we have
(ξij )k+1 = 1 − α λπ(i) − λπ(j) (µi − µj ) (ξij )k ,
(3.8)
for i, j = 1, . . . , n. (3.9)
The tangent space Tπ Qπ M (H0 ) at π Qπ consists of those matrices Ξ = [π Qπ, Ω] where Ω ∈ skew (n), the class of skew symmetric matrices. Thus, the matrices Ξ are linearly parametrized by their components ξij , where i < j, and λπ(i) = λπ(j) . As this is a linearly independent parametrisation, the eigenvalues of the linearization (3.8) can be read directly from
2.3. Recursive Lie-Bracket Based Diagonalization
73
the linearization (3.9), and are 1 − α λπ(i) − λπ(j) (µi − µj ), for i < j and λπ(i) = λπ(j) . From classical stability theory for discrete-time linear dynamical systems, (3.8) is asymptotically stable if and only if all eigenvalues have absolute value less than 1. Equivalently, (3.8) is asymptotically stable if and only if
0 < α λπ(i) − λπ(j) (µi − µj ) < 2 for all i < j with λπ(i) = λπ(j) . This condition is only satisfied when π = I and consequently π Qπ = Q. Thus the only possible stable equilibrium point for the recursion is H∞ = Q. Certainly (λi − λj ) (µi − µj ) < 4 N 2 H0 2 . Also since N 2 H0 2 < 2 N H0 we have α < 1 2N 2 H0 2 . Therefore α (λi − λj ) (µi − µj ) < 2 for all i < j which establishes exponential stability of (3.8). This completes the proof. Remark 3.5 In the nongeneric case where N has multiple eigenvalues, the proof techniques for Parts (d) and (e) do not apply. All the results except convergence to a single equilibrium point remain in force. Remark 3.6 It is difficult to characterise the set of exceptional initial conditions, for which the algorithm converges to some unstable equilibrium point H∞ = Q. However, in the continuous-time case it is known that the unstable basins of attraction of such points are of zero measure in M (H0 ), see Section 2.1. Remark 3.7 By using a more sophisticated bounding argument, a variable step size selection can be determined as 2 [Hk , N ] 1 log +1 (3.10) αk = 2 [Hk , N ] H0 · [N, [Hk , N ]] Rigorous convergence results are given for this selection in Moore et al. (1994). The convergence rate is faster with this selection.
Recursive Flows on Orthogonal Matrices The associated recursions on the orthogonal matrices corresponding to the gradient flows (1.25) are
Θk+1 = Θk eαk [Θk H0 Θk ,N ] ,
α = 1/ (4 H0 · N )
(3.11)
where Θk is defined on O (n) and α is a general step-size scaling factor. Thus Hk = Θk H0 Θk is the solution of the Lie-bracket recursion (3.1). Precise results on (3.11) are now stated and proved for generic H0 and constant
74
Chapter 2. Double Bracket Isospectral Flows
step-size selection, although corresponding results are established in Moore et al. (1994) for the variable step-size scaling factor (3.10). Theorem 3.8 Let H0 = H0 be a real symmetric n × n matrix with distinct eigenvalues λ1 > · · · > λn . Let N ∈ Rn×n be diag (µ1 , . . . , µn ) with µ1 > · · · > µn . Then the recursion (3.11) referred to as the associated orthogonal Lie-bracket algorithm, has the following properties: (a) A solution Θk , k = 1, 2, . . . , to the associated orthogonal Lie-bracket algorithm remains orthogonal. (b) Let fN,H0 : O (n) → R, fN,H0 (Θ) = − 21 Θ H0 Θ − N be a function on O (n). Let Θk , k = 1, 2, . . . , be a solution to the associated orthogonal Lie-bracket algorithm. Then fN,H0 (Θk ) is a strictly monotonically increasing function of k ∈ N, as long as [Θk H0 Θk , N ] = 0. 2
(c) Fixed points of the recursive equation are characterised by matrices Θ ∈ O (n) such that [Θ H0 Θ, N ] = 0. There are exactly 2n n! such fixed points. (d) Let Θk , k = 1, 2, . . . , be a solution to the associated orthogonal Liebracket algorithm, then Θk converges to an orthogonal matrix Θ∞ , satisfying [Θ∞ H0 Θ∞ , N ] = 0. (e) All fixed points of the associated orthogonal Lie-bracket algorithm are strictly unstable, except those 2n points Θ∗ ∈ O (n) such that Θ∗ H0 Θ∗ = Q, where Q = diag (λ1 , . . . , λn ). Such points Θ∗ are locally exponentially asymptotically stable and H0 = Θ∗ QΘ∗ is the eigenspace decomposition of H0 . Proof 3.9 Part (a) follows directly from the orthogonal nature of eα[Θk H0 Θk ,N ] . Let g : O (n) → M (H0 ) be the matrix valued function g (Θ) = Θ H0 Θ. Observe that g maps solutions (Θk | k ∈ N) of (3.11) to solutions (Hk | k ∈ N) of (3.1). 2 Consider the potential fN,H (Θk ) = − 21 Θk H0 Θk − N , and the po2 tential fN = − 21 Hk − N . Since g (Θk ) = Hk for all k = 1, 2, . . . , then fN,H0 (Θk ) = fN (g (Θk )) for k = 1, 2, . . . . Thus fN (Hk ) = fN (g (Θk )) = fN,H0 (Θk ) is strictly monotonically increasing for [Hk , N ] = [g (Θk ) , N ] = 0, and Part (b) follows.
2.3. Recursive Lie-Bracket Based Diagonalization
75
If Θk is a fixed point of the associated orthogonal Lie-bracket algorithm with initial condition Θ0 , then g (Θk ) is a fixed point of the Lie-bracket algorithm. Thus, from Theorem 3.3, [g (Θk ) , N ] = [Θk H0 Θk , N ] = 0. Moreover, if [Θk H0 Θk , N ] = 0 for some given k ∈ N, then by inspection Θk+l = Θk for l = 1, 2, . . . , and Θk is a fixed point of the associated orthogonal Lie-bracket algorithm. A simple counting argument shows that there are precisely 2n n! such points and Part (c) is established. To prove (d) note that since g (Θk ) is a solution to the Lie-bracket algorithm, it converges to a limit point H∞ ∈ M (H0 ), [H∞ , N ] = 0, by Theorem 3.3. Thus Θk must converge to the pre-image set of H∞ via the map g. The genericity condition on H0 ensures that the set generated by the pre-image of H∞ is a finite disjoint set. Since [g (Θk ) , N ] → 0 as k → ∞, then Θk+1 − Θk → 0 as k → ∞. From this convergence of Θk follows. To prove Part (e), observe that, due to the genericity condition on H0 , the dimension of O (n) is the same as the dimension of M (H0 ). Thus g is locally a diffeomorphism on O (n), and taking a restriction of g to such a region, the local stability structure of the equilibria are preserved under the map g −1 . Thus, all fixed points of the associated orthogonal Lie-bracket algorithm are locally unstable except those that map via g to the unique locally asymptotically stable equilibrium of the Lie-bracket recursion.
Simulations A simulation has been included to demonstrate the recursive schemes developed. The simulation deals with a real symmetric 7 × 7 initial condition, H0 , generated by an arbitrary orthogonal similarity transformation of matrix Q = diag (1, 2, 3, 4, 5, 6, 7). The matrix N was chosen to be diag (1, 2, 3, 4, 5, 6, 7) so that the minimum value of fN occurs at Q such that fN (Q) = 0. Figure 3.2 plots the diagonal entries of Hk at each iteration and demonstrates the asymptotic convergence of the algorithm. The exponential behaviour of the curves appears at around iteration 30, suggesting that this is when the solution Hk enters the locally exponentially attractive domain of the equilibrium point Q. Figure 3.3 shows the evolution of the potential fN (Hk ) = − 21 Hk − N 2 , demonstrating its monotonic increasing properties and also displaying exponential convergence after iteration 30.
Computational Considerations It is worth noting that an advantage of the recursive Lie-bracket scheme over more traditional linear algebraic schemes for the same tasks, is the
Chapter 2. Double Bracket Isospectral Flows
Potential
Diagonal elements of estimate
76
Iterations
Iterations
FIGURE 3.2. The recursive Liebracket scheme
FIGURE 3.3. The potential function fN (Hk )
presence of a step-size α and the arbitrary target matrix N . The focus in this section is only on the step size selection. The challenge remains to devise more optimal (possibly time-varying) target matrix selection schemes for improved performance. This suggests an application of optimal control theory. It is also possible to consider alternatives of the recursive Lie-bracket scheme which have improved computational properties. For example, consider a (1, 1) Pad´e approximation to the matrix exponential eα[Hk ,N ] ≈
2I − α [Hk , N ] . 2I + α [Hk , N ]
Such an approach has the advantage that, as [Hk , N ] is skew symmetric, then the Pad´e approximation will be orthogonal, and will preserve the isospectral nature of the Lie-bracket algorithm. Similarly, an (n, n) Pad´e approximation of the exponential for any n will also be orthogonal. Actually Newton methods involving second order derivatives can be devised to give local quadratic convergence. These can be switched in to the Lie-bracket recursions here as appropriate, making sure that at each iteration the potential function increases. The resulting schemes are then much more competitive with commercial diagonalization packages than the purely linear methods of this section. Of course there is the possibility that quadratic convergence can also be achieved using shift techniques, but we do not explore this fascinating territory here. Another approach is to take just an Euler iteration, Hk+1 = Hk + α [Hk , [Hk , N ]] , as a recursive algorithm on Rn×n . A scheme such as this is similar in form to approximating the curves generated by the recursive Lie-bracket scheme by straight lines. The approximation will not retain the isospectral nature
2.3. Recursive Lie-Bracket Based Diagonalization
77
of the Lie-bracket recursion, but this fact may be overlooked in some applications, because it is computationally inexpensive. We cannot recommend this scheme, or higher order versions, except in the neighborhood of an equilibrium.
Main Points of Section In this section we have proposed a numerical scheme for the calculation of double bracket gradient flows on manifolds of similar matrices. Step-size selections for such schemes has been discussed and results have been obtained on the nature of equilibrium points and on their stability properties. As a consequence, the schemes proposed in this section could be used as a computational tool with known bounds on the total time required to make a calculation. Due to the computational requirements of calculating matrix exponentials these schemes may not be useful as a direct numerical tool in traditional computational environments, however, they provide insight into discretising matrix flows such as is generated by the double bracket equation.
78
Chapter 2. Double Bracket Isospectral Flows
Notes for Chapter 2 As we have seen in this chapter, the isospectral set M (Q) of symmetric matrices with fixed eigenvalues is a homogeneous space and the least squares distance function fN : M (Q) → R is a smooth Morse-Bott function. Quite a lot is known about the critical points of such functions and there is a rich mathematical literature on Morse theory developed for functions defined on homogeneous spaces. If Q is a rank k projection operator, then M (Q) is a Grassmannian. In this case the trace function H → tr (N H) is the classical example of a Morse function on Grassmann manifolds. See Milnor (1963) for the case where k = 1 and Wu (1965), Hangan (1968) for a slightly different construction of a Morse function for arbitrary k. For a complete characterization of the critical points of the trace functional on classical groups, the Stiefel manifold and Grassmann manifolds, with applications to the topology of these spaces, we refer to Frankel (1965) and Shayman (1982). For a complete analysis of the critical points and their Morse indices of the trace function on more general classes of homogeneous spaces we refer to Hermann (1962; 1963; 1964), Takeuchi (1965) and Duistermaat et al. (1983). For results on gradient flows of certain least squares functions defined on infinite dimensional homogeneous spaces arising in physics we refer to the important work of Atiyah and Bott (1982), Pressley (1982) and Pressley and Segal (1986). The work of Byrnes and Willems (1986) contains an interesting application of moment map techniques from symplectic geometry to total least squares estimation. The geometry of varieties of isospectral Jacobi matrices has been studied by Tomei (1984) and Davis (1987). The set Jac (λ1 , . . . , λn ) of Jacobi matrices with distinct eigenvalues λ1 > · · · > λn is shown to be a smooth compact manifold. Furthermore, an explicit construction of the universal covering space is given. The report of Driessel (1987b) contains another elementary proof of the smoothness of Jac (λ1 , . . . , λn ). Also the tangent space of Jac (λ1 , . . . , λn ) at a Jacobi matrix L is shown to be the vector space of all Jacobi matrices ζ of the form ζ = LΩ − ΩL, where Ω is skew symmetric. Closely related to the above is the work by de Mari and Shayman (1988) de Mari, Procesi and Shayman (1992) on Hessenberg varieties; i.e. varieties of invariant flags of a given Hessenberg matrix. Furthermore there are interesting connections with torus varieties; for this we refer to Gelfand and Serganova (1987). The double bracket equation (1.1) and its properties were first studied by Brockett (1988); see also Chu and Driessel (1990). The simple proof of the Wielandt-Hoffman inequality via the double bracket equation is due to Chu and Driessel (1990). A systematic analysis of the double bracket
2.3. Recursive Lie-Bracket Based Diagonalization
79
flow (1.2) on adjoint orbits of compact Lie groups appears in Bloch, Brockett and Ratiu (1990; 1992). For an application to subspace learning see Brockett (1991a). In Brockett (1989b) it is shown that the double bracket equation can simulate any finite automaton. Least squares matching problems arising in computer vision and pattern analysis are tackled via double bracket-like equations in Brockett (1989a). An interesting connection exists between the double bracket flow (1.1) and a fundamental equation arising in micromagnetics. The Landau-Lifshitz equation on the two-sphere is a nonlinear diffusion equation which, in the absence of diffusion terms, becomes equivalent to the double bracket equation. Stochastic versions of the double bracket flow are studied by Colonius and Kliemann (1990). A thorough study of the Toda lattice equation with interesting links to representation theory has been made by Kostant (1979). For the Hamiltonian mechanics interpretation of the QR-algorithm and the Toda flow see Flaschka (1974; 1975), Moser (1975), and Bloch (1990b). For connections of the Toda flow with scaling actions on spaces of rational functions in system theory see Byrnes (1978), Brockett and Krishnaprasad (1980) and Krishnaprasad (1979). An interesting interpretation of the Toda flow from a system theoretic viewpoint is given in Brockett and Faybusovich (1991); see also Faybusovich (1989). Numerical analysis aspects of the Toda flow have been treated by Symes (1980b; 1982), Chu (1984b; 1984a), Deift et al. (1983), and Shub and Vasquez (1987). Expository papers are Watkins (1984), Chu (1984a). The continuous double bracket flow H˙ = [H, [H, diag H]] is related to the discrete Jacobi method for diagonalization. For a phase portrait analysis of this flow see Driessel (1987a). See also Wilf (1981) and Golub and Van Loan (1989), Section 8.4 “Jacobi methods”, for a discussion on the Jacobi method. For the connection of the double bracket flow to Toda flows and flows on Grassmannians much of the initial work was done by Bloch (1990b; 1990a) and then by Bloch, Brockett and Ratiu (1990) and Bloch, Flaschka and Ratiu (1990). The connection to the Riccati flow was made explicit in Helmke (1991) and independently observed by Faybusovich. The paper of Faybusovich (1992b) contains a complete description of the phase portrait of the Toda flow and the corresponding QR algorithm, including a discussion of structural stability properties. In Faybusovich (1989) the relationship between QR-like flows and Toda-like flows is described. Monotonicity properties of the Toda flow are discussed in Lagarias (1991). A VLSI type implementation of the Toda flow by a nonlinear lossless electrical network is given by Paul, H¨ uper and Nossek (1992). Infinite dimensional versions of the Toda flow with applications to sorting of function values are in Brockett and Bloch (1989), Deift, Li and Tomei (1985), Bloch, Brockett, Kodama and Ratiu (1989). See also the closely
80
Chapter 2. Double Bracket Isospectral Flows
related work by Bloch (1985a; 1987) and Bloch and Byrnes (1986). Numerical integration schemes of ordinary differential equations on manifolds are presented by Crouch and Grossman (1991), Crouch, Grossman and Yan (1992a; 1992b) and Ge-Zhong and Marsden (1988). Discrete-time versions of some classical integrable systems are analyzed by Moser and Veselov (1991). For complexity properties of discrete integrable systems see Arnold (1990) and Veselov (1992). The recursive Lie-bracket diagonalization algorithm (3.1) is analyzed in detail in Moore et al. (1994). Related results appear in Brockett (1993) and Chu (1992a). Step-size selections for discretizing the double bracket flow also appear in the recent PhD thesis of Smith (1993).
CHAPTER
3
Singular Value Decomposition 3.1 SVD via Double Bracket Flows Many numerical methods used in application areas such as signal processing, estimation, and control are based on the singular value decomposition (SVD) of matrices. The SVD is widely used in least squares estimation, systems approximations, and numerical linear algebra. In this chapter, the double bracket flow of the previous chapter is applied to the singular value decomposition (SVD) task. In the following section, a first principles derivation of these self-equivalent flows is given to clarify the geometry of the flows. This work is also included as a means of re-enforcing for the reader the technical approach developed in the previous chapter. The work is based on Helmke and Moore (1992). Note also the parallel efforts of Chu and Driessel (1990) and Smith (1991).
Singular Values via Double Bracket Flows Recall, that the singular values of a matrix H ∈ Rm×n with m ≥ n are defined as the nonnegative square roots σi of the eigenvalues of H H, i.e. 1/2 σi (H) = λi (H H) for i = 1, . . . , n. Thus, for calculating the singular values σi of H ∈ Rm×n , m ≥ n, let us first consider, perhaps na¨ıvely, the diagonalization of H H or HH by the double bracket isospectral flows evolving on the vector space of real symmetric n × n and m × m matrices.
82
Chapter 3. Singular Value Decomposition
d (H H) = [H H, [H H, N1 ]] , dt d (HH ) = [HH , [HH , N2 ]] . dt The equilibrium points of these equations are the solutions of [(H H)∞ , N1 ] = 0,
[(HH )∞ , N2 ] = 0.
(1.1)
(1.2)
Thus if N1 = diag (µ1 , . . . , µn ) ∈ Rn×n ,
(1.3)
N2 = diag (µ1 , . . . , µn , 0, . . . , 0) ∈ Rm×m ,
for µ1 > . . . µn > 0, then from Theorem 2.1.5 these equilibrium points are characterised by
(H H)∞ =π1 diag σ12 , . . . , σn2 π1 ,
(1.4) (HH )∞ =π2 diag σ12 , . . . , σn2 , 0, . . . , 0 π2 . Here σi2 are the squares of the singular values of H (0) and π1 , π2 are permutation matrices. Thus we can use equations (1.1) as a means to compute, asymptotically, the singular values of H0 = H (0). A disadvantage of using the flow (1.1) for the singular value decomposition is the need to “square” a matrix, which as noted in Section 1.5, may lead to numerical problems. This suggests that we be more careful in the application of the double bracket equations. We now present another direct approach to the SVD task based on the double-bracket isospectral flow of the previous chapter. This approach avoids the undesirable squarings of H appearing in (1.1). First recall, from Section 1.5, that for any matrix H ∈ Rm×n with m ≥ n, there is an asso ∈ R(m+n)×(m+n) as ciated symmetric matrix H 0 H H := . (1.5) H 0 are given by ±σi , i = 1, . . . , n, The crucial fact is that the eigenvalues of H and possibly zero, where σi are the singular values of H. Theorem 1.1 With the definitions 0 H (t) (t) = = 0 H , N H (t) 0 N
N , 0
0 = 0 H H0
H0 , 0 (1.6)
3.1. SVD via Double Bracket Flows
83
(t) satisfies the double bracket isospectral flow where N ∈ Rm×n , then H (m+n)×(m+n) on R " ## (t) " dH (t) , H (t) , N , (0) = H 0, = H H (1.7) dt if and only if H (t) satisfies the self-equivalent flow on Rm×n H˙ =H (H N − N H) − (HN − N H ) H, =H {H, N } − {H , N } H.
H (0) = H0
(1.8)
using a mild generalization of the Lie-bracket notation
{X, Y } = X Y − Y X = {Y, X} = − {Y, X} .
(1.9)
The solutions H (t) of (1.8) exist for all t ∈ R. Moreover, the solutions of (1.8) converge, in either time direction, to the limiting solutions H∞ satisfying H∞ N − N H∞ = 0,
H∞ N − N H∞ = 0.
(1.10)
Proof 1.2 Follows by substitution of (1.6) into (1.7), and by applying Theorem 2.1.5 for double bracket flows. is of course not diagonal and, more imporRemark 1.3 The matrix N for m = n. Thus certain of the statant, 0 is a multiple eigenvalue of N bility results of Theorem 2.1.5 do not apply immediately to the flow (1.8). However, similar proof techniques as used in deriving the stability results in Theorem 2.1.5 do apply. This is explained in the next section. Remark 1.4 The gradient flow associated with the maximization of
0 Θ on the manifold of orthogonal (m + n) × (m + n) matrices Θ H tr N is given by Theorem 2.1.14 as Θ, # " d (t) H (t) , N (t) Θ 0Θ (1.11) Θ (t) = Θ dt = [V If Θ 0
0 U
], (1.11) is easily seen to be equivalent to U˙ =U {V H0 U, N } , ' & V˙ =V (V H0 U ) , N .
(1.12)
This gradient flow on O (m) × O (n) allows the determination of the left and right eigenvectors of H0 . Its convergence properties are studied in the next section.
84
Chapter 3. Singular Value Decomposition
Problem 1.5 Verify that the flow H˙ = H [H H, N1 ] − [HH , N2 ] H for H (t) implies (1.1) for H (t) H (t) and H (t) H (t). Problem 1.6 Flesh out the details of the proof of Theorem 1.1 and verify the claims in Remark 1.4.
Main Points of Section The singular value decomposition of a matrix H0 ∈ Rm×n can be obtained by applying the double bracket flow to the symmetric (m + n) × (m + n) matrix (1.6). The resulting flow can be simplified as a self equivalent flow (1.8) for SVD.
3.2 A Gradient Flow Approach to SVD In this section, we parallel the analysis of the double bracket isospectral flow on symmetric matrices with a corresponding theory for self-equivalent flows on rectangular matrices H ∈ Rm×n , for m ≥ n. A matrix H0 ∈ Rm×n has a singular value decomposition (SVD) H0 = V ΣU
(2.1)
where U , V are real n × n and m × m orthogonal matrices satisfying V V = Im , U U = In and r diag (σ1 In1 , . . . , σr In r ) Σ1 Σ= ni = n, = , 0(m−n)×n 0 i=1 (2.2) with σ1 > · · · > σr ≥ 0 being the singular values of H0 . We will approach the SVD task by showing that it is equivalent to a certain norm minimization problem. Consider the class S (Σ) of all real m × n matrices H having singular values (σ1 , . . . σr ) occurring with multiplicities (n1 , . . . , nr ) as
(2.3) S (Σ) = V ΣU ∈ Rm×n | U ∈ O (n) , V ∈ O (m) , with O (n) the group of all real orthogonal n × n matrices. Thus S (Σ) is the compact set of all m × n matrices H which are orthogonally equivalent to Σ via (2.1). Given an arbitrary matrix N ∈ Rm×n , we consider the task of minimizing the least squares distance H − N = H + N − 2 tr (N H) 2
2
2
(2.4)
3.2. A Gradient Flow Approach to SVD
85
$r 2 2 of N to any H ∈ S (Σ). Since the Frobenius norm H = i=1 ni σi is constant for H ∈ S (Σ), the minimization of (2.4) is equivalent to the maximization of the inner product function φ (H) = 2 tr (N H), defined on S (Σ). Heuristically, if N is chosen to be of the form N=
N1
0(m−n)×n
,
N1 = diag (µ1 , . . . , µn ) ∈ Rn×n ,
(2.5)
we would expect the minimizing matrices H∗ ∈ S (Σ) of (2.4) to be of the same form. i.e. H∗ = [ π0 0I ] Σπ S for a suitable n × n permutation matrix π and a sign matrix S = diag (±1, . . . , ±1). In fact, if N1 = diag (µ1 , . . . , µn ) with µ1 > · · · > µn > 0 then we would expect the minimizing H∗ to be equal to Σ. Since S (Σ) turns out to be a smooth manifold (Proposition 2.1) it seems natural to apply steepest descent techniques in order to determine the minima H∗ of the distance function (2.4) on S (Σ). To achieve the SVD we consider the induced function on the product space O (n) × O (m) of orthogonal matrices φ : O (n) × O (m) → R, φ (U, V ) = 2 tr (N V H0 U ) ,
(2.6)
defined for fixed arbitrary real matrices N ∈ Rn×m , H0 ∈ S (Σ). This leads to a coupled gradient flow U˙ (t) =∇U φ (U (t) , V (t)) , V˙ (t) =∇V φ (U (t) , V (t)) ,
U (0) =U0 ∈ O (n) , V (0) =V0 ∈ O (m) ,
(2.7)
on O (n) × O (m). These turn out to be the flows (1.12) as shown subsequently. Associated with the gradient flow (2.7) is a flow on the set of real m × n matrices H, derived from H (t) = V (t) H0 U (t). This turns out to be a self-equivalent flow (i.e. a singular value preserving flow) (2.4) on S (Σ) which, under a suitable genericity assumption on N , converges exponentially fast, for almost every initial condition H (0) = H0 ∈ S (Σ), to the global minimum H∗ of (2.4). We now proceed with a formalization of the above norm minimization approach to SVD. First, we present a phase portrait analysis of the selfequivalent flow (1.8) on S (Σ). The equilibrium points of (1.8) are determined and, under suitable genericity assumptions on N and Σ, their stability properties are determined. A similar analysis for the gradient flow (2.7) is given.
86
Chapter 3. Singular Value Decomposition
Self-equivalent Flows on Real Matrices We start with a derivation of some elementary facts concerning varieties of real matrices with prescribed singular values. Proposition 2.1 S (Σ) is a smooth, compact, manifold of dimension $r n (m − 1) − i=1 ni (n2i −1) if σr > 0, $ dim S (Σ) = n (m − 1) − ri=1 ni (ni −1) 2 if σr = 0.
− nr nr −1 + (m − n) 2
Proof 2.2 S (Σ) is an orbit of the compact Lie group O (m)×O (n), acting on Rm×n via the equivalence action: η : (O (m) × O (n)) × Rm×n → Rm×n ,
η ((V, U ) , H) = V HU.
See digression on Lie groups and homogeneous spaces. Thus S (Σ) is a homogeneous space of O (m) × O (n) and therefore a compact manifold. The stabilizer group Stab (Σ) = {(V, U ) ∈ O (m) × O (n) | V ΣU = Σ} of Σ is the set of all pairs of block-diagonal orthogonal matrices (V, U ) U = diag (U11 , . . . , Urr ) , V = diag (V11 , . . . , Vr+1,r+1 ) , i = 1, . . . , r, Vr+1,r+1 ∈ O (m − n) Uii = Vii ∈ O (ni ) ,
(2.8)
if σr > 0, and if σr = 0 then U = diag (U11 , . . . , Urr ) , Uii = Vii ∈ O (ni ) , i = 1, . . . , r − 1,
V = diag (V11 , . . . , Vrr ) ,
Urr ∈ O (nr ) ,
Vrr ∈ O (m − n + nr ) .
Hence there is an isomorphism of homogeneous spaces S (Σ) (O (m) × O (n)) / Stab (Σ) and dim S (Σ) = dim (O (m) × O (n)) − dim Stab (Σ) , =
m (m − 1) n (n − 1) + − dim Stab (Σ) . 2 2
Moreover, $ dim Stab (Σ) = The result follows.
ni (ni −1) 2 ni (ni −1) i=1 2 r
$i=1 r
+ +
(m−n)(m−n−1) 2 (m−n+nr )(m−n+nr −1) 2
if σr > 0, if σr = 0.
∼ =
3.2. A Gradient Flow Approach to SVD
" Remark 2.3 Let Q =
In 0(m−n)×n
87
# . Then S (Q) is equal to the (compact)
Stiefel manifold St (n, m) consisting of all X ∈ Rm×n with X X = In . In particular, for m = n, S (Q) is equal to the orthogonal group O (n). Thus the manifolds S (Σ) are a generalization of the compact Stiefel manifolds St (n, m), appearing in Section 1.3.
Let
Σ0 =
µ1 In1
0 ..
0
. µr Inr
0(m−n)×n be given with µ1 , . . . , µr arbitrary real numbers. Thus Σ0 ∈ S (Σ) if and only if {|µ1 | , . . . , |µr |} = {σ1 , . . . , σr }. We need the following description of the tangent space of S (Σ) at Σ0 . Lemma 2.4 Let Σ be defined by (2.2) and Σ0 as above, with σr > 0. Then the tangent space TΣ0 S (Σ) of S (Σ) at Σ0 consists of all block partitioned real m × n matrices ξ11 ... ξ1r . .. .. . ξ= . . . ξr+1,1 . . . ξr+1,r ξij ∈Rni ×nj , ξr+1,j ∈R
(m−n)×nj
i, j =1, . . . , r ,
j =1, . . . , r
= −ξii , i = 1, . . . , r. with ξii
Proof 2.5 Let skew (n) denote the Lie algebra of O (n), i.e. skew (n) consists of all skew-symmetric n × n matrices and is identical with tangent space of O (n) at the identity matrix In , see Appendix C. The tangent space TΣ0 S (Σ) is the image of the R-linear map L : skew (m) × skew (n) →Rm×n , (X, Y ) → − XΣ0 + Σ0 Y,
(2.9)
i.e. of the Fr´echet derivative of η : O (m) × O (n) → R, η (U, V ) = V Σ0 U , at (Im , In ). It follows that the image of L is contained in the R-vector space Ξ consisting of all block partitioned matrices ξ as in the lemma. Thus it suffices
88
Chapter 3. Singular Value Decomposition
to show that both spaces have the same dimension. By Propostion 2.1 dim image (L) = dim S (Σ) r ni (ni + 1) . =nm − 2 i=1 From (2.9) also dim Ξ = nm −
$r
i=1
ni (ni +1) 2
and the result follows.
Recall from Section 1.5, that a differential equation (or a flow) H˙ (t) = f (t, H (t))
(2.10)
defined on the vector space of real m × n matrices H ∈ R self-equivalent if every solution H (t) of (2.10) is of the form H (t) = V (t) H (0) U (t)
m×n
is called (2.11)
with orthogonal matrices U (t) ∈ O (n), V (t) ∈ O (m), U (0) = In , V (0) = Im . Recall that self-equivalent flows have a simple characterization given by Lemma 1.5.1. The following theorem gives explicit examples of self-equivalent flows on Rm×n . Theorem 2.6 Let N ∈ Rm×n be arbitrary and m ≥ n (a) The differential equation (1.8) repeated as H˙ = H (H N − N H) − (HN − N H ) H,
H (0) = H0 (2.12)
defines a self-equivalent flow on Rm×n . (b) The solutions H (t) of (2.12) exist for all t ∈ R and H (t) converges to an equilibrium point H∞ of (2.12) as t → +∞. The set of equilibria points H∞ of (2.12) is characterized by N = N H∞ , H∞
N H∞ = H∞ N .
(2.13)
(c) Let N be defined as in (2.5), with µ1 , . . . , µn real and µi + µj = 0 of all i, j = 1, . . . , n. Then every solution H (t) of (2.12) converges for t → ±∞ to an extended diagonal matrix H∞ of the form 0 λ1 .. . H∞ = (2.14) 0 λn 0(m−n)×n
3.2. A Gradient Flow Approach to SVD
89
with λ1 , . . . , λn real numbers. (d) If m = n and N = N is symmetric ( or N = −N is skewsymmetric), (2.12) restricts to the isospectral flow H˙ = [H, [H, N ]]
(2.15)
evolving on the subset of all symmetric (respectively skew-symmetric) n × n matrices H ∈ Rn×n Proof 2.7 For every (locally defined) solution H (t) of (2.12), the matrices C (t) = H (t) N (t) − N (t) H (t) and D (t) = N (t) H (t) − H (t) N (t) are skew-symmetric. Thus (a) follows immediately from Lemma 1.5.1. For any H (0) = H0 ∈ Rm×n there exist orthogonal matrices U ∈ O (n), V ∈ O (m) with V H0 U = Σ as in (2.1) where Σ ∈ Rm×n is the extended diagonal matrix given by (2.2) and σ1 > · · · > σr ≥ 0 are the singular values of H0 with corresponding multiplicities n1 , . . . , nr . By (a), the solution H (t) of (2.12) evolves in the compact set S (Σ) and therefore exists for all t ∈ R. Recall that {H, N } = {N, H} = − {H, N }, then the time derivative of the Lyapunov-type function tr (N H (t)) is d tr (N H (t)) dt = tr N H˙ (t) + H˙ (t) N
2
= tr (N H {H, N } − N {H , N } H − {H, N } H N + H {H , N } N ) = tr {H, N } {H, N } + {H , N } {H , N } (2.16) 2 2 = {H, N } + {H , N } . Thus tr (N H (t)) increases monotonically. Since H → tr (N H) is a continuous function defined on the compact space S (Σ), then tr (N H) is bounded from below and above. Therefore (2.16) must go to zero as t → +∞ (and indeed for t → −∞). It follows that every solution H (t) of (2.12) converges to an equilibrium point and the set of equilibria of (2.12) is char , N } = 0, or equivalently acterized by {H∞ , N } = {H∞ (2.13). This proves (b). To prove (c) recall that (2.5) partitions N as N01 with N1 = N1 . Now 1 partition H∞ as H∞ = H H2 . This is an equilibrium point of (2.12) if and only if H1 N1 = N1 H1 ,
N1 H1 = H1 N1 ,
N1 H2 = 0. (2.17)
90
Chapter 3. Singular Value Decomposition
Since N1 is nonsingular, (2.17) implies H2 = 0. Also since N1 = N1 , then N1 (H1 − H1 ) = (H1 − H1 ) N1 and therefore with H1 = (hij | i, j = 1, . . . , n) (µi + µj ) (hij − hji ) = 0 for all i, j = 1, . . . , n. By assumption µi + µj = 0 and therefore H1 = H1 is symmetric. Thus under (2.17) N1 H1 = H1 N1 , which implies that H1 is real diagonal. To prove (d) we note that for symmetric (or skew-symmetric) matrices N , H, equation (2.12) is equivalent to the double bracket equation (2.15). The isospectral property of (2.15) now follows readily from Lemma 1.5.1, applied for D = −C. This completes the proof of Theorem 2.6.
Remark 2.8 Let N = U0 N1 , 0n×(m−n) V0 be the singular value decomposition of N with real diagonal N1 . Using the change of variables H → V0 HU assume without loss of generality that N is
0 we can always equal to N1 , 0n×(m−n) with N1 real diagonal. From now on we will always assume that Σ and N are given from (2.2) and (2.5). An important property of the self-equivalent flow (2.12) is that it is a gradient flow. To see this, let & ' ∈M Σ | H ∈ S (Σ) S (Σ) := H
denote the subset of (m + n) × (m + n) symmetric matrices H ∈ M Σ is the isospectral manifold defined which are of the form (1.6). Here M Σ
∈ R(m+n)×(m+n) | Θ ∈ O (m + n) . The map by ΘΣΘ , i (H) = H i : S (Σ) → M Σ defines a diffeomorphism of S (Σ) onto its image S (Σ) = i (S (Σ)). By Lemma 2.4 the double bracket flow (1.7) has S (Σ) as an invariant submanifold.
with the normal Riemannian metric defined in ChapEndow M Σ ter 2. Thus flow of the least squares distance function
(1.7) is the gradient → R, fN (H) = − 1 N − H2 , with respect to the normal RiefN : M Σ 2
mannian metric. By restriction, the normal Riemannian metric on M Σ induces a Riemannian metric on the submanifold S (Σ); see Appendix C. Therefore, the diffeomorphism i : S (Σ) → S (Σ), a Riemannian met
using refer to this as the normal ric on M Σ induces a Riemannian on S (Σ). We −H 2 = 2 N − H2 , Lemma 2.4 Riemannian metric on S (Σ). Since N 2 implies that the gradient of FN : S (Σ) → R, FN (H) = − N − H , with
3.2. A Gradient Flow Approach to SVD
91
respect to this normal Riemannian metric on S (Σ) is given by (1.8). We have thus proved Corollary 2.9 The differential equation (2.12) is the gradient flow of the 2 distance function FN : S (Σ) → R, FN (H) = − N − H , with respect to the normal Riemannian metric on S (Σ). Since self-equivalent flows do not change the singular values or their multiplicities, (2.12) restricts to a flow on the compact manifold S (Σ). The following result is an immediate consequence of Theorem 2.6(a)–(c). Corollary 2.10 The differential equation (2.12) defines a flow on S (Σ). Every equilibrium point H∞ on S (Σ) is of the form πΣ1 π S H∞ = (2.18) 0 where π is an n × n permutation matrix and S = diag (s1 , . . . , sn ), si ∈ {−1, 1}, i = 1, . . . , n, is a sign matrix. We now analyze the local stability properties of the flow (2.12) on S (Σ) around each equilibrium point. The linearization of the flow (2.12) on S (Σ) around any equilibrium point H∞ ∈ S (Σ) is given by ξ˙ = H∞ (ξ N − N ξ) − (ξN − N ξ ) H∞
(2.19)
where ξ ∈ TH∞ S (Σ) is the tangent space of S (Σ). By Lemma 2.4 ξ11 ... ξ1r . ξ1 . . .. .. ξ = .. = ξ 2 ξr+1,1 . . . ξr+1,r with ξ1 = (ξij ) ∈ Rn×n , ξii = −ξii , and ξ2 = [ξr+1,1, . . . , ξr+1,r ] ∈ (m−n)×n . R Using (2.18), (2.19) is equivalent to the decoupled system of equations
ξ˙1 =πΣ1 π S (ξ1 N1 − N1 ξ1 ) − (ξ1 N1 − N1 ξ1 ) πΣ1 π S ξ˙2 = − ξ2 N1 πΣ1 π S
(2.20)
In order to simplify the subsequent analysis we now assume that the singular values of Σ = Σ01 are distinct, i.e. Σ1 = diag (σ1 , . . . , σn ) ,
σ1 > · · · > σn > 0.
(2.21)
92
Chapter 3. Singular Value Decomposition
" Furthermore, we assume that N =
N1 O(m−n)×n
N1 = diag (µ1 , . . . , µn ) ,
# with
µ1 > · · · > µn > 0.
Then πΣ1 π S = diag (λ1 , . . . , λn ) with λi = si σπ(i) ,
i = 1, . . . , n,
(2.22)
and ξii = −ξii are zero for i = 1, . . . , n. Then (2.20) is equivalent to the system for i, j = 1, . . . , n, ξ˙ij = − (λi µi + λj µj ) ξij + (λi µj + λj µi ) ξji , ξ˙n+1,j = − µj λj ξn+1,j .
(2.23)
This system of equations is equivalent to the system of equations ξij − (λi µi + λj µj ) (λi µj + λj µi ) d ξij = ,i < j dt ξji (λi µj + λj µi ) − (λi µi + λj µj ) ξji ξ˙n+1,j = − µj λj ξn+1,j .
(2.24)
The eigenvalues of λi µj + λj µi − (λi µi + λj µj ) λi µj + λj µi − (λi µi + λj µi ) are easily seen to be {− (λi + λj ) (µi + µj ) , − (λi − λj ) (µi − µj )} for i < j. Let H∞ be an equilibrium point of the flow (2.12) on the manifold S (Σ). Let n+ and n− denote the number of positive real eigenvalues and negative real eigenvalues respectively of the linearization (2.19). The Morse index ind H∞ of (2.20) at H∞ is defined by ind (H∞ ) = n− − n+ .
(2.25)
By (2.21), (2.22) the eigenvalues of the linearization (2.23) are all nonzero real numbers and therefore n+ + n− = dim S (Σ) = n (m − 1) 1 n− = (n (m − 1) + ind (H∞ )) 2
(2.26)
This sets the stage for the key stability results for the self-equivalent flows (2.12).
3.2. A Gradient Flow Approach to SVD
93
Corollary 2.11 Let σ1 > · · · > σn > 0 and let (λ1 , . . . , λn ) be defined by (2.22). Then (2.12) has exactly 2n .n! equilibrium points on S (Σ). The linearization of the flow (2.12) on S (Σ) at H∞ = πΣ10π S is given by (2.23) and has only nonzero real eigenvalues. The Morse index at H∞ is ind (H∞ ) = (m − n) +
n
si +
i=1
sign si σπ(i) − sj σπ(j)
i<j
sign si σπ(i) + sj σπ(j) .
(2.27)
i<j
In particular, ind (H∞ ) = ± dim S (Σ) if and only if s1 = · · · = sn = ±1 and π = In . Also, H∞ = Σ is the uniquely determined asymptotically stable equilibrium point on S (Σ). Proof 2.12 It remains to prove (2.27). By (2.23) and as µi > 0 ind (H∞ ) = (m − n) +
n
sign µi λi
i=1
(sign (µi − µj ) (λi − λj ) + sign (µi + µj ) (λi + λj ))
i<j
= (m − n) +
n
sign λi +
i=1
sign (λi − λj )
i<j
sign (λi + λj )
i<j
with λi = si σπ(i) . The result follows. Remark 2.13 From (2.23) and Corollary 2.11 it follows that, for almost all initial conditions on S (Σ), the solutions of (2.12) on S (Σ) approach the attractor Σ exponentially fast. The rate of exponential convergence to Σ is equal to , (2.28) ρ = min µn σn , min (µi − µj ) (σi − σj ) . i<j
A Gradient Flow on Orthogonal Matrices
Let N = N1 , 0n×(m−n) with N1 = diag (µ1 , . . . , µn ) real diagonal and µ1 > · · · > µn > 0. Let H0 be a given real m × n matrix with singular value decomposition H0 = V0 ΣU0
(2.29)
94
Chapter 3. Singular Value Decomposition
where U0 ∈ O (n), V0 ∈ O (m) and Σ is given by (2.2). For simplicity we always assume that σr > 0. With these choices of N , H0 we consider the smooth function on the product space O (n) × O (m) of orthogonal groups φ : O (n) × O (m) → R,
φ (U, V ) = 2 tr (N V H0 U ) .
(2.30)
We always endow O / its standard
Riemannian
met. (n) × O (m) with Ω Ω = tr Ω + tr Ω for , Ω ) , Ω , Ω ric , defined by (Ω 1 2 1 2 1 1 2 2
1, Ω 2 ∈ T(U,V ) (O (n) × O (m)). Thus , is the Riemannian (Ω1 , Ω2 ) , Ω metric induced by the imbedding O (n) × O (m) ⊂ Rn×n × Rm×m , where Rn×n × Rm×m , is equipped with its standard symmetric inner product. The Riemannian metric , on O (n) × O (m) in the case where m = n coincides with the Killing form, up to a constant scaling factor. Lemma 2.14 The gradient flow of φ : O (n) × O (m) → R (with respect to the standard Riemannian metric) is U˙ =U {V H0 U, N } , V˙ =V {U H0 V, N } ,
U (0) =U0 ∈ O (n) , V (0) =V0 ∈ O (m) .
(2.31)
The solutions (U (t) , V (t)) of (2.31) exist for all t ∈ R and converge to an equilibrium point (U∞ , V∞ ) ∈ O (n) × O (m) of (2.31) for t → ±∞. Proof 2.15 We proceed as in Section 2.1. The tangent space of O (n) × O (m) at (U, V ) is given by T(U,V ) (O (n) × O (m))
n×n × Rm×m | Ω1 ∈ skew (n) , Ω2 ∈ skew (m) . = (U Ω1 , V Ω2 ) ∈ R The Riemannian matrix on O (n) × O (m) is given by the inter product , on the tangent space T(U,V ) (O (n) × O (m)) defined for (A1 , A2 ) , (B1 , B2 ) ∈ T(U,V ) (O (n) × O (m)) by (A1 , A2 ) , (B1 , B2 ) = tr (A1 B1 ) + tr (A2 B2 ) for (U, V ) ∈ O (n) × O (m). The derivative of φ : O (n) × O (m) → R, φ (U, V ) = 2 tr (N V H0 U ), at an element (U, V ) is the linear map on the tangent space T(U,V ) (O (n) × O (m)) defined by Dφ|(U,V ) (U Ω1 , V Ω2 ) =2 tr (N V H0 U Ω1 − N Ω2 V H0 U ) = tr [(N V H0 U − U H0 V N ) Ω1 + (N U H0 V − V H0 U N ) Ω2 ] = tr {V H0 Y, N } Ω1 + {U H0 V, N } Ω2
(2.32)
3.2. A Gradient Flow Approach to SVD
95
for skew-symmetric matrices Ω1 ∈ skew (n), Ω2 ∈ skew (m). Let ∇U φ ∇φ = ∇V φ denote the gradient vector field of φ, defined with respect to the above Riemannian metric on O (n) × O (m). Thus ∇φ (U, V ) is characterised by (a) (b)
∇φ (U, V ) ∈ T(U,V ) (O (n) × O (m)) Dφ|(U,V ) (U Ω1 , V Ω2 ) = ∇φ (U, V ) , (U Ω1 , V Ω2 )
= tr ∇U φ (U, V ) U Ω1 + tr ∇V φ (U, V ) V Ω2
for all (Ω1 , Ω2 ) ∈ skew (n) × skew (m). Combining (2.32) and (a), (b) we obtain ∇U φ =U {V H0 U, N } ∇V φ =V {U H0 V, N } . Thus (2.32) is the gradient flow U˙ = ∇U φ (U, V ), V˙ = ∇V φ (U, V ) of φ. By compactness of O (n) × O (m) the solutions of (2.31) exist for all t ∈ R. Moreover, by the properties of gradient flows as summarized in the digression on convergence of gradient flows, Section 1.3, the ω-limit set of any solution (U (t) , V (t)) of (2.31) is contained in a connected component of the intersection of the set of equilibria (U∞ , V∞ ) of (2.31) with some level set of φ : O (n) × O (m) → R. It can be shown that φ : O (n) × O (m) → R is a Morse-Bott function. Thus, by Proposition 1.3.9, any solution (U (t) , V (t)) of (2.32) converges to an equilibrium point. The following result describes the equilibrium points of (2.31). Lemma 2.16 Let σr > 0 and (U0 , V0 ) ∈ O (n) × O (m) as in (2.29). A pair (U, V ) ∈ O (n) × O (m) is an equilibrium point of (2.31) if and only if U =U0 diag (U1 , . . . , Ur ) π S, V =V0 diag (U1 , . . . , Ur , Ur+1 )
π
0
0
Im−n
,
(2.33)
where π ∈ Rn×n is a permutation matrix, S = diag (s1 , . . . , sn ), si ∈ {−1, 1}, is a sign-matrix and (U1 , . . . , Ur , Ur+1 ) ∈ O (n1 ) × · · · × O (nr ) × O (n − m) are orthogonal matrices.
96
Chapter 3. Singular Value Decomposition
Proof 2.17 From (2.31) it follows that the equilibria (U, V ) are characterised by {V H0 U, N } = 0, {U H0 V, N } = 0. Thus, by Corollary 2.10, (U, V ) is an equilibrium point of (2.31) if and only if πΣ1 π S . (2.34) V H0 U = 0 Thus the result follows from the form of the stabilizer group given in (2.8). Remark 2.18 In fact the gradient flow (1.12) is related to the self-equivalent flow (1.8). Let (U (t) , V (t)) be a solution of (1.12). Then H (t) = V (t) H0 U (t) ∈ S (Σ) is a solution of the self-equivalent flow (1.8) since H˙ =V H0 U˙ + V˙ H0 U
' & =V H0 U {V H0 U, N } − (V H0 U ) , N V H0 U =H {H, N } − {H , N } H.
Note that by (2.33) the equilibria (U∞ , V∞ ) of the gradient flow satisfy π 0 Σπ S V∞ H0 U∞ = 0 Im−n πΣ1 π S = . 0 Thus, up to permutations and possible sign factors, the equilibria of (1.12) just yield the singular value decomposition of H0 . In order to relate the local stability properties of the self-equivalent flow (1.8) on S (Σ) to those of the gradient flow (2.31) we consider the smooth function f : O (n) × O (m) → S (Σ) defined by f (U, V ) = V H0 U.
(2.35)
, V if and only if there exists orthogonal matrices By (2.8), f (U, V ) = f U (U1 , . . . , Ur , Ur+1 ) ∈ O (n1 ) × · · · × O (nr ) × O (m − n) with =U diag (U1 , . . . , Ur ) U0 · U, U 0 V =V0 diag (U1 , . . . , Ur , Ur+1 ) V0 · V.
(2.36)
3.2. A Gradient Flow Approach to SVD
97
and (U0 , V0 ) ∈ O (n) × O (m) as in (2.29). Therefore the fibres of (2.35) are all diffeomorphic to O (n1 ) × · · · × O (nr ) × O (m − n). For π an n × n permutation matrix and S = diag (±1, . . . , ±1) an arbitrary sign-matrix let C (π, S) denote the submanifold of O (n) × O (m) which consists of all pairs (U, V ) U =U0 diag (U1 , . . . , Ur ) π S, π V =V0 diag (U1 , . . . , Ur+1 ) 0
0 Im−n
,
with (U1 , . . . , Ur , Ur+1 ) ∈ O (n1 )×· · ·×O (nr )×O (m − n) arbitrary. Thus, by Lemma 2.16, the union 0 C= C (π, S) (2.37) (π,S)
of all 2n · n! sets C (π, S) is equal to the set of equilibria of (2.31). That is, the behaviour of the self-equivalent flow (1.8) around an equilibrium point πΣ10π S now is equivalent to the behaviour of the solutions of the gradient flow (2.31) as they approach the invariant submanifold C (π, S). This is made more precise in the following theorem. For any equilibrium point (U∞ , V∞ ) ∈ C let W s (U∞ , V∞ ) ⊂ O (n) × O (m) denote the stable manifold. See Section C.11. Thus W s (U∞ , V∞ ) is the set of all initial conditions (U0 , V0 ) such that the corresponding solution (U (t) , V (t)) of (1.12) converges to (U∞ , V∞ ) as t → +∞. Theorem 2.19 Suppose the singular values σ1 , . . . , σn of H0 are pairwise distinct with σ1 > · · · > σn > 0. Then W s (U∞ , V∞ ) is an immersed submanifold in O (n) × O (m). Convergence in W s (U∞ , V∞ ) to (U∞ , V∞ ) is exponentially fast. Outside of a union of invariant submanifolds of codimension ≥ 1, every solution of the gradient flow (1.12) converges to the submanifold C (In , In ). Proof 2.20 By the stable manifold theorem, Section C.11, W s (U∞ , V∞ ) is an immersed submanifold. The stable manifold W s (U∞ , V∞ ) of (1.12) at (U∞ , V∞ ) is mapped diffeomorphically by f : O (n) × O (m) → S (Σ) H0 U∞ . The second claim onto the stable manifold of (1.8) at H∞ = V∞ follows since convergence on stable manifolds is always exponential. For any equilibrium point (U∞ , V∞ ) ∈ C (In , In ) H0 U ∞ = Σ H∞ = V∞
is the uniquely determined exponentially stable equilibrium point of (1.8)
98
Chapter 3. Singular Value Decomposition
and its stable manifold W s (H∞ ) is the complement of a union Γ of submanifolds of S (Σ) of codimension ≥ 1. Thus W s (U∞ , V∞ ) = O (n) × O (m) − f (Γ) is dense in O (n) × O (m) with codim f −1 (Γ) ≥ 1. The result follows. Problem 2.21 Verify that φ : O (n) × O (m) → R, φ (U, V ) = tr (N V H0 U ), is a Morse-Bott function. Here N1 µ1 > · · · > µn > 0. , N1 = diag (µ1 , . . . , µn ) , N= O(m−n)×n Problem 2.22 Let A, B ∈ Rn×n with singular values σ1 ≥ · · · ≥ σn and τ1 ≥ · · · ≥ τn respectively. Show that the maximum value of the trace function tr (AU BV ) on O (n) × O (n) is max
U,V ∈O(n)
tr (AU BV ) =
n
σi τi .
i=1
Problem 2.23 Same notation as above. Show that max
U,V ∈SO(n)
tr (AU BV ) =
n−1
σi τi + (sign (det (AB))) σn τn
i=1
Main Points of Section The results on finding singular value decompositions via gradient flows, depending on the viewpoint, are seen to be both a generalization and specialization of the results of Brockett on the diagonalization of real symmetric matrices. The work ties in nicely with continuous time interpolations of the classical discrete-time QR algorithm by means of self-equivalent flows.
3.2. A Gradient Flow Approach to SVD
99
Notes for Chapter 3 There is an enormous literature on the singular value decomposition (SVD) and its applications within numerical linear algebra, statistics, signal processing and control theory. Standard applications of the SVD include those for solving linear equations and least squares problems; see Lawson and Hanson (1974). A strength of the singular value decomposition lies in the fact that it provides a powerful tool for solving ill-conditioned matrix problems; see Hansen (1990) and the references therein. A lot of the current research on the SVD in numerical analysis has been pioneered and is based on the work of Golub and his school. Discussions on the SVD can be found in almost any modern textbook on numerical analysis. The book of Golub and Van Loan (1989) is an excellent source of information on the SVD. A thorough discussion on the SVD and its history can also be found in Klema and Laub (1980). A basic numerical method for computing the SVD of a matrix is the algorithm by Golub and Reinsch (1970). For an SVD updating method applicable to time-varying matrices we refer to Moonen, van Dooren and Vandewalle (1990). A neural network algorithm for computing the SVD has been proposed by Sirat (1991). In the statistical literature, neural network theory and signal processing, the subject of matrix diagonalization and SVD is often referred to as principal component analysis or the Karhunen-Loeve expansion/transform method; see Hotelling (1933; 1935), Dempster (1969), Oja (1990) and Fernando and Nicholson (1980). Applications of the singular value decomposition within control theory and digital signal processing have been pioneered by Mullis and Roberts (1976) and Moore (1981). Since then the SVD has become an invaluable tool for system approximation and model reduction theory; see e.g. Glover (1984). Applications of the SVD for rational matrix valued functions to feedback control can be found in Hung and MacFarlane (1982). Early results on the singular value decomposition, including a characterization of the critical points of the trace function (2.6) as well as a computation of the associated Hessian, are due to von Neumann (1937). The idea of relating the singular values of a matrix A to the eigenvalues of 0 A is a standard trick in linear algebra which is the symmetric matrix A 0 probably due to Wielandt. Inequalities for singular values and eigenvalues of a possibly nonsymmetric, square matrix include those of Weyl, Polya, Horn and Fan and can be found in Gohberg and Krein (1969) and Bhatia (1987). Generalizations of the Frobenius norm are unitarily invariant matrix norms. These satisfy U AV = A for all orthogonal (or unitary) matrices
100
Chapter 3. Singular Value Decomposition
U , V and arbitrary matrices A. A characterization of unitarily invariant norms has been obtained by von Neumann (1937). The appropriate generalization of the SVD for square invertible matrices to arbitrary Lie groups is the Cartan decomposition. Basic textbooks on Lie groups and Lie algebras are Br¨ ocker and tom Dieck (1985) and Humphreys (1972). The Killing form is a symmetric bilinear form on a Lie algebra. It is nondegenerate if the Lie algebra is semisimple. An interesting application of Theorem 2.6 is to the Wielandt-Hoffman inequality for singular values of matrices. For this we refer to Chu and Driessel (1990); see also Bhatia (1987). For an extension of the results of Chapter 3 to complex matrices see Helmke and Moore (1992). Parallel work is that of Chu and Driessel (1990) and Smith (1991). The analysis in Chu and Driessel (1990) is not entirely complete and they treat only the generic case of simple, distinct singular values. The crucial observation that (2.12) is the gradient flow for the least squares cost function (2.4) on S (Σ) is due to Smith (1991). His proof, however, is slightly different from the one presented here. For an analysis of a recursive version of the self-equivalent flow (2.12) we refer to Moore et al. (1994). It is shown that the iterative scheme Hk+1 = e−αk {Hk ,N } Hk eαk {Hk ,N }
, k ∈ N0 ,
with initial condition H0 and step-size selection αk := 1/ (8 H0 · N ) defines a self-equivalent recursion on S (H0 ) such that every solution (Hk | k ∈ N0 ) converges to the same set of equilibria points as for the continuous flow (2.12). Moreover, under suitable genericity assumption, the convergence is exponentially fast, see also Brockett (1993). For extensions of the singular value decomposition to several matrices and restricted versions we refer to Zha (1989a; 1989b) and de Moor and Golub (1989).
CHAPTER
4
Linear Programming 4.1 The Rˆole of Double Bracket Flows In Chapter 2, the double bracket equation H˙ (t) = [H (t) , [H (t) , N ]] ,
H (0) = H0
(1.1)
for a real diagonal matrix N , and its recursive version, is presented as a scheme for diagonalizing a real symmetric matrix. Thus with a generic initial condition H (0) = H0 where H0 is real symmetric, H (t) converges to a diagonal matrix H∞ , with its diagonal elements ordered according to the ordering in the prespecified diagonal matrix N . If a generic non-diagonal initial matrix H0 = Θ Σ0 Θ is chosen, where Σ0 is diagonal with a different ordering property than N , then the flow thus allows a re-ordering, or sorting. Now in a linear programming exercise, one can think of the vertices of the associated compact convex constraint set as needing an ordering to find the one vertex which has the greatest cost. The cost of each vertex can be entered in a diagonal matrix N . Then the double bracket equation, or its recursive form, can be implemented to achieve, from an initial H0 = H0 with one nonzero eigenvalue, a diagonal H∞ with one nonzero diagonal element. This nonzero element then “points” to the corresponding diagonal element of N , having the maximum value, and so the maximum cost vertex is identified. The optimal solution in linear programming via a double bracket flow is achieved from the interior of the constraint set, and is therefore similar in spirit to the celebrated Khachian’s and Karmarkar’s algorithms. Actually, there is an essential difference between Karmarkar’s and Brockett’s algorithms due to the fact that the optimizing trajectory in Brockett’s
102
Chapter 4. Linear Programming
approach is constructed using a gradient flow evolving on a higher dimensional smooth manifold. An important property of this gradient flow and its recursive version is that it converges exponentially fast to the optimal solution. However, we caution that the computational effort in evaluating the vertex costs entered into N , is itself a computational cost of order m, where m is the number of vertices. Starting from the seminal work of Khachian and Karmarkar, the development of interior point algorithms for linear programming is currently making fast progress. The purpose of this chapter is not to cover the main recent achievements in this field, but rather to show the connection between interior points flows for linear programming and matrix least squares estimation. Let us mention that there are also exciting connections with artificial neural network theory; as can be seen from the work of, e.g., Pyne (1956), Chua and Lin (1984), Tank and Hopfield (1985). A connection of linear programming with completely integrable Hamiltonian systems is made in the important work of Bayer and Lagarias (1989), Bloch (1990b) and Faybusovich (1991a; 1991b; 1991c). In this chapter we first formulate the double bracket algorithm for linear programming and then show that it converges exponentially fast to the optimal solution. Explicit bounds for the rate of convergence are given as well as for the time needed for the trajectory produced to enter an ε-neighborhood of the optimal solution. In the special case where the convex constraint set is the standard simplex, Brockett’s equation, or rather its simplification studied here, is shown to induce an interior point algorithm for sorting. The algorithm in this case is formally very similar to Karmarkar’s interior point flow and this observation suggests the possibility of common generalizations of these algorithms. Very recently, Faybusovich (1991a; 1991b; 1991c) have proposed a new class of interior point flows and for linear programming which, in the case of a standard simplex, coincide with the flow studied here. These interior point flows are studied in Section 4.3.
The Linear Programming Problem Let C (v1 , . . . , vm ) ⊂ Rn denote the convex hull of m vectors v1 , . . . , vm ∈ Rn . Given the compact convex set C (v1 , . . . , vm ) ⊂ Rn and a nonzero row vector c = (c1 , . . . , cn ) ∈ Rn the linear programming problem then asks to find a vector x ∈ C (v1 , . . . , vm ) which maximizes c · x. Of course, the optimal solution is a vertex point vi∗ of the constraint set C (v1 , . . . , vm ). Since C (v1 , . . . , vm ) is not a smooth manifold in any reasonable sense it is not possible to apply in the usual way steepest descent gradient methods in order to find the optimum. One possible way to circumvent such techni-
4.1. The Rˆ ole of Double Bracket Flows
103
cal difficulties would be to replace the constraint set C (v1 , . . . , vm ) by some suitable compact manifold M so that the optimization takes place on M instead of C (v1 , . . . , vm ). A mathematically convenient way here would be to construct a suitable resolution space for the singularities of C (v1 , . . . , vm ). This is the approach taken by Brockett. $m Let m−1 = {(η1 , . . . , ηm ) ∈ Rm | ηi ≥ 0, i=1 ηi = 1} denote the standard (m − 1)-dimensional simplex in Rm . Let T = (v1 , . . . , vm )
(1.2)
be the real n × m matrix whose column vectors are the vertices v1 , . . . , vm . Thus T : Rm → Rn maps the simplex m−1 ⊂ Rm linearly onto the set C (v1 , . . . , vm ). We can thus use T in order to replace the constraint set C (v1 , . . . , vm ) by the standard simplex m−1 . Suppose that we are to solve the linear programming problem consisting of maximizing c x over the compact convex set of all x ∈ C (v1 , . . . , vm ). Brockett’s recipe to solve the problem is this (cf. Brockett (1988), Theorem 6). Theorem 1.1 Let N be the real m × m matrix defined by N = diag (c v1 , . . . , c vm ) ,
(1.3)
and assume that the genericity condition c vi = c vj for i = j holds. Let Q = diag (1, 0, . . . , 0) ∈ Rm×m . Then for almost all orthogonal matrices Θ ∈ O (m) the solution H (t) of the differential equation H˙ = [H, [H, N ]] =H 2 N + N H 2 − 2HN H,
H (0) = Θ QΘ
converges as t → ∞ to a diagonal matrix H∞ = diag (0, . . . , 0, 1, 0, . . . , 0), with the entry 1 being at position i∗ so that x = vi∗ is the optimal vertex of the linear programming problem. In fact, this is an immediate consequence of the remark following Theorem 2.1.5. Thus the optimal solution of a linear programming problem can be obtained by applying the linear transformation T to a vector obtained from the diagonal entries of the stable limiting solution of (1.4). Brockett’s method, while theoretically appealing, has however a number of shortcomings. First, it works with a huge overparametrization of the problem. The differential equation (1.4) evolves on the 12 m (m + 1)-dimensional vector space of real symmetric m × m matrices H, while the linear programming
104
Chapter 4. Linear Programming
problem is set up in the n-dimensional space C (v1 , . . . , vm ). Of course, usually n will be much smaller than 12 m (m + 1). Second, convergence to H∞ is guaranteed only for a generic choice of orthogonal matrices. No explicit description of this generic set of initial data is given. Finally the method requires the knowledge of the values of the cost functional at all vertex points in order to define the matrix N . Clearly the last point is the most critical one and therefore Theorem 1.1 should be regarded only as a theoretical approach to the linear programming problem. To
overcome the first two we proceed as follows. $ difficulties 2 ξ = 1 denote the set of unit Let S m−1 = (ξ1 , . . . , ξm ) ∈ Rm | m i=1 i vectors of Rm . We consider the polynomial map f : S m−1 → m−1 defined by
2 . f (ξ1 , . . . , ξm ) = ξ12 , . . . , ξm
(1.4)
By composing f with the map T : m−1 → C (v1 , . . . , vm ) we obtain a real algebraic map πT = T ◦ f : S m−1 →C (v1 , . . . , vm ) , ξ12 . . (ξ1 , . . . , ξm ) →T . . 2 ξm
(1.5)
The linear programming task is to maximize the restriction of the linear functional λ : C (v1 , . . . , vm ) →R x →c x
(1.6)
over C (v1 , . . . , vm ). The idea now is to consider instead of the maximization of (1.6) the maximization of the induced smooth function λ ◦ πT : S m−1 → R on the sphere S m−1 . Of course, the function λ ◦ πT : S m−1 → R has the same form as that of a Rayleigh quotient, so that we can apply our previous theory developed in Section 1.4. Let S m−1 be endowed with the Riemannian metric defined by ξ, η = 2ξ η,
ξ, η ∈ Tx S m−1 .
(1.7)
4.1. The Rˆ ole of Double Bracket Flows
105
Theorem 1.2 Let N be defined by (1.3) and assume the genericity condition c (vi − vj ) = 0
for all i = j.
(1.8)
(a) The gradient vector-field of λ ◦ πT on S m−1 is ξ˙ = (N − ξ N ξI) ξ,
(1.9)
2
|ξ| = 1. Also, (1.9) has exactly 2m equilibrium points given by the standard basis vectors ±e1 , . . . , ±em of Rm . (b) The eigenvalues of the linearization of (1.9) at ±ei are c (v1 − vi ) , . . . , c (vi−1 − vi ) , c (vi+1 − vi ) , . . . , c (vm − vi ) , (1.10) and there is a unique index 1 ≤ i∗ ≤ m such that ±ei∗ is asymptotically stable. (c) Let X ∼ = S m−2 be the smooth codimension-one submanifold of S m−1 defined by X = {(ξ1 , . . . , ξm ) | ξi∗ = 0} .
(1.11)
With the exception of initial points contained in X, every solution ξ (t) of (1.9) converges exponentially fast to the stable attractor [±ei∗ ]. Moreover, πT (ξ (t)) converges exponentially fast to the optimal solution πT (±ei∗ ) = vi∗ of the linear programming problem, with a bound on the rate of convergence |πT (ξ (t)) − vi∗ | ≤ const e−µ(t−t0 ) , where µ = min |c (vj − vi∗ )| . j=i∗
(1.12)
Proof 1.3 For convenience of the reader we repeat the argument for the proof, taken from Section 1.4. For any diagonal m × m matrix N = diag (n1 , . . . , nm ) consider the Rayleigh quotient ϕ : Rm − {0} → R defined by $m ni x2i ϕ (x1 , . . . , xm ) = $i=1 (1.13) m 2 . i=1 xi
106
Chapter 4. Linear Programming
ÉÉÉÉÉÉÉÉÉÉÉÉ ÉÉÉÉÉÉÉÉÉÉÉÉ ÉÉÉÉÉÉÉÉÉÉÉÉ ÉÉÉÉÉÉÉÉÉÉÉÉ ÉÉÉÉÉÉÉÉÉÉÉÉ ÉÉÉÉÉÉÉÉÉÉÉÉ vi *
FIGURE 1.1. The convex set C (v1 , . . . , vm )
A straightforward computation shows that the gradient of ϕ at a unit vector x ∈ S m−1 is ∇ϕ (x) = (N − x N x) x. Furthermore, if ni = nj for i = j, then the critical points of the induced map ϕ : S m−1 → R are, up to a sign ±, the standard basis vectors, i.e. ±e1 , . . . , ±em . This proves (a). The Hessian of ϕ : S m−1 → R at ±ei is readily computed as Hϕ (±ei ) = diag (n1 − ni , . . . , ni−1 − ni , ni+1 − ni , . . . , mm − ni ) . Let i∗ be the unique index such that ni∗ = max1≤j≤m nj . Thus Hϕ (±ei ) < 0 if and only if i = i∗ which proves (b). Let X ∼ = S m−2 be the closed m−1 m−1 defined by (1.11). Then S − X is equal to the submanifold of S stable manifold of ±ei∗ and X is equal to the union of the stable manifolds of the other equilibrium points ±ei , i = i∗ . The result follows. Remark 1.4 From (1.11) πT (X) = C (v1 , . . . , vi∗ −1 , vi∗ +1 , . . . , vm ) .
(1.14)
Thus if n = 2 and C (v1 , . . . , vm ) ⊂ R is the convex set, illustrated by Figure 1.1 with optimal vertex point vi∗ then the shaded region describes the image πT (X) of the set of exceptional initial conditions. The gradient flow (1.9) on the (m − 1)-space S m−1 is identical with the Rayleigh quotient gradient flow; see Section 1.2. 2
Remark 1.5 Moreover, the flow (1.4) with Q = diag (1, 0, . . . , 0) is equivalent to the double bracket gradient flow (1.9) on the isospectral set M (Q) = RPm−1 . In fact, with H = ξξ and ξ (t) a solution of (1.9) we have H 2 = H and ˙ + ξ ξ˙ = (N − ξ N ξI) ξξ + ξξ (N − ξ N ξI) H˙ =ξξ (1.15) =N H + HN − 2HN H = [H, [H, N ]] .
4.1. The Rˆ ole of Double Bracket Flows
107
Also, (1.9) has an interesting interpretation in neural network theory; see Oja (1982). Remark 1.6 For any (skew-) symmetric matrix Ω ∈ Rm×m ξ˙ = (N + Ω − ξ (N + Ω) ξIm ) ξ
(1.16)
defines a flow on S m−1 . If Ω = −Ω is skew-symmetric, the functional ϕ (ξ) = ξ (N + Ω) ξ = ξ N ξ is not changed by Ω and therefore has the same critical points. Thus, while (1.16) is not the gradient flow of ϕ (if Ω = −Ω ), it can still be of interest for the linear programming problem. If Ω = Ω is symmetric, (1.16) is the gradient flow of ψ (ξ) = ξ (N + Ω) ξ on S m−1 .
Interior Point Flows on the Simplex Brockett’s equation (1.4) and its simplified version (1.9) both evolve on a high-dimensional manifold M so that the projection of the trajectories into the polytope leads to a curve which approaches the optimum from the interior. In the special case where the polytope is the (m − 1)-simplex m−1 ⊂ Rm , (1.9) actually leads to a flow evolving in the simplex such that the optimum is approached from all trajectories starting in the interior of the constraint set. Such algorithms are called interior point algorithms, an example being the celebrated Karmarkar algorithm (1984), or the algorithm proposed by Khachian (n.d.). Here we like to take such issues a bit further. In the special case where the constraint set is the standard simplex m−1 ⊂ Rm , then equation (1.9) on S m−1 becomes ξ˙i =
m 2 ci − cj ξj ξi ,
i = 1, . . . , m,
(1.17)
j=1
$m
2 i=1 ξi
= 1. Thus with the substitution xi = ξi2 , i = 1, . . . , m, we obtain m x˙ i = 2 ci − cj xj xi ,
i = 1, . . . , m,
(1.18)
j=1
$m $m xi ≥ 0, i=1 xi = 1. Since i=1 x˙ i = 0, (1.18) is a flow on the simplex m−1 . The set X ⊂ S m−1 of exceptional initial conditions is mapped by the quadratic substitution xi = ξi2 , i = 1, . . . , m, onto the boundary ∂m−1 of the simplex. Thus Theorem 1.2 implies
108
Chapter 4. Linear Programming
FIGURE 1.2. Phase portrait of (1.18)
Corollary 1.7 Equation (1.18) defines a flow on m−1 . Every solution x (t) with initial condition x (0) in the interior of m−1 converges to the optimal solution ei∗ of the linear programming problem: Maximize c x over x ∈ m−1 . The exponential rate of convergence is given by |x (t) − ei∗ | ≤ const e−2µt ,
µ = min (ci∗ − cj ) . j=i∗
Remark 1.8 Equation (1.18) is a Volterra-Lotka type of equation and thus belongs to a well studied class of equations in population dynamics; cf. Schuster, Sigmund and Wolff (1978), Zeeman (1980). ˚ m−1 of the simplex is endowed with the Remark 1.9 If the interior Riemannian metric defined by ξ, η =
m ξi ηi i=1
xi
,
˚ m−1 , x = (x1 , . . . , xm ) ∈
then (1.18) is actually (up to an irrelevant factor by 2) the gradient flow of ˚ m−1 , see Figure 1.2. the linear functional x → c x on Remark 1.10 Karmarkar (1990) has analyzed a class of interior point flows which are the continuous-time versions of the discrete-time algorithm described in Karmarkar (1984). In the case of the standard simplex, Karmarkar’s equation turns out to be m 2 cj xj xi , i = 1, . . . , m, (1.19) x˙ i = ci xi − j=1
$m
xi ≥ 0, i=1 x i = 1. This flow is actually the gradient flow for the $ $quadratic m m cost function j=1 cj x2j rather than for the linear cost function j=1 cj xj .
4.1. The Rˆ ole of Double Bracket Flows
109
Thus Karmakar’s equation solves a quadratic optimization problem on the simplex. A more general class of equations would be 7 x˙ i = ci fi (xi ) −
m
8 cj fj (xj ) xj xi ,
i = 1, . . . , m,
(1.20)
j=1
with fj : [0, 1] → R monotonically increasing C 1 functions, see Section 4.2. Incidentally, the Karmarkar flow (1.19) is just a special case of the equations studied by Zeeman (1980). The following result estimates the time a trajectory of (1.18) needs in order to reach an ε-neighborhood of the optimal vertex. Proposition 1.11 Let 0 < ε < 1 and µ = minj=i∗ (ci∗ − cj ). Then for any initial condition x (0) in the interior of m−1 the solution x (t) of (1.18) is contained in an ε-neighborhood of the optimum vertex ei∗ if t ≥ tε where log (min1≤i≤m xi (0)) ε2 /2 . (1.21) tε = 2µ Proof 1.12 We first introduce a lemma. Lemma 1.13 Every solution x (t) of (1.18) is of the form e2tN x (0) , e2tN x (0) . / $m where N = diag (c1 , . . . , cm ) and e2tN x (0) = j=1 e2tcj xj (0). x (t) =
(1.22)
Proof 1.14 In fact, by differentiating the right hand side of (1.22) one sees that both sides satisfy the same conditions (1.18) with identical initial conditions. Thus (1.22) holds. Using (1.22) one has . 2tN / e x (0) , ei∗ +1 x (t) − ei∗ = x (t) − 2 e2tN x (0) e2tci∗ xi (0) . ≤2 − 2 2tN ∗ e x (0) 2
2
(1.23)
Now m j=1
e2t(cj −ci∗ ) xj (0) ≤ e−2µt + xi∗ (0) ,
(1.24)
110
Chapter 4. Linear Programming
and thus
−1 2 −1 x (t) − ei∗ ≤ 2 − 2 1 + e−2µt xi∗ (0) .
Therefore x (t) − ei∗ ≤ ε if −1 ε2 −1 1 + e−2µt xi∗ (0) >1− , 2 i.e., if
log ε2 xi∗ (0) 2−ε2 . t ≥ 2µ
From this the result easily follows. 1 Note that for the initial condition x (0) = m (1, . . . , 1) the estimate (1.21) becomes ε2 log 2m , (1.25) tε ≥ 2µ
and (1.25) gives an effective lower bound for (1.21) valid for all x (0) ∈ ˚ m−1 . We can use Proposition 1.11 to obtain an explicit estimate for the time needed in either Brockett’s flow (1.4) or for (1.9) that the projected interior point trajectory πT (x (t)) enters an ε-neighborhood of the optimal solution. Thus let N , T be defined by (1.2), (1.3) with (1.8) understood and let vi∗ denote the optimal solution for the linear programming problem of maximizing c x over the convex set C (v1 , . . . , vm ). Theorem 1.15 Let 0 < ε < 1, µ = minj=i∗ (c vi∗ − c vj ), and let X be defined by (1.11). Then for all initial conditions ξ (0) ∈ S m−1 − X the projected trajectory πT (ξ (t)) ∈ C (v1 , . . . , vm ) of the solution ξ (t) of (1.9) is in an ε-neighborhood of the optimal vertex vi∗ for all t ≥ tε with (ε/T )2 log 2m tε = . (1.26) 2µ Proof 1.16 By (1.5) πT (ξ (t)) = T x (t) where x (t) = ξ1 (t)2, . . . , ξm (t)2 satisfies m x˙ i = 2 c vi − c vj xj xi , i = 1, . . . , m. j=1
4.2. Interior Point Flows on a Polytope
2 log (ε/T )
Proposition 1.11 implies x (t) − ei∗ < for t ≥ hence πT (ξ (t)) − vi∗ ≤ T · x (t) − ei∗ < ε. ε T
2m 2µ
111
and
Main Points of the Section The “gradient method” for linear programming consists in the following program: (a) To find a smooth compact manifold M and a smooth map π : M → Rn , which maps M onto the convex constraint set C. (b) To solve the gradient flow of the smooth function λ ◦ π : M → R and determine its stable equilibria points. In the cases discussed here we have M = S m−1 and π : S m−1 → Rn is the Rayleigh quotient formed as the composition of the linear map T : m−1 → C (v1 , . . . , vm ) with the smooth map from S m−1 → m−1 defined by (1.4). Likewise, interior point flows for linear programming evolve in the interior of the constraint set.
4.2 Interior Point Flows on a Polytope In the previous sections interior point flows for optimizing a linear functional on a simplex were considered. Here, following the pioneering work of Faybusovich (1991a; 1992a), we extend our previous results by considering interior point gradient flows for a cost function defined on the interior of an arbitrary polytope. Polytopes, or compact convex subsets of Rn , can be parametrized in various ways. A standard way of describing such polytopes C is as follows.
Mixed Equality-Inequality Constraints For any x ∈ Rn we write x ≥ 0 if all coordinates of x are non-negative. Given A ∈ Rm×n with rk A = m < n and b ∈ Rm let C = {x ∈ Rn | x ≥ 0, Ax = b} be the convex constraint set of our optimization problem. The interior of C is the smooth manifold defined by C˚ = {x ∈ Rn | x > 0, Ax = b}
112
Chapter 4. Linear Programming
In the sequel we assume that C˚ is a nonempty subset of Rn and C is ˚ The optimization task is then to compact. Thus C is the closure of C. optimize (i.e. minimize or maximize) the cost function φ : C → R over C. Here we assume that φ : C → R is the restriction of a smooth function φ : Rn → R. Let ∂φ ∂φ ∇φ (x) = (x) , . . . , (x) (2.1) ∂x1 ∂xn denote the usual gradient vector of φ in Rn . For x ∈ Rn let D (x) = diag (x1 , . . . , xn ) ∈ Rn×n .
(2.2)
For any x ∈ C˚ let Tx C˚ denote the tangent space of C˚ at x. Thus Tx C˚ coincides with the tangent space of the affine subspace of {x ∈ Rn | Ax = b} at x:
Tx C˚ = {ξ ∈ Rn | Aξ = 0} , (2.3) ˚ the diagonal that is, with the kernel of A; cf. Section 1.6. For any x ∈ C, matrix D (x) is positive definite and thus
(2.4) ξ, η ∈ Tx C˚ ,
defines a positive definite inner product on Tx C˚ and, in fact, a Riemannian metric on the interior C˚ of the constraint set. The gradient grad φ of φ : C˚ → R with respect to the Riemannian metric , , at x ∈ C˚ is characterized by the properties
(a) grad φ (x) ∈ Tx C˚ ξ, η := ξ D (x)
−1
η,
(b) grad φ (x) , ξ = ∇φ (x) ξ
for all ξ ∈ Tx C˚ . Here (a) is equivalent to A · grad φ (x) = 0 while (b) is equivalent to −1 D (x) grad φ (x) − ∇φ (x) ξ = 0
∀ξ ∈ ker A
⇐⇒D (x)−1 grad φ (x) − ∇φ (x) ∈ (ker A)⊥ = Im (A ) −1
⇐⇒D (x)
grad φ (x) − ∇φ (x) = A λ
for a uniquely determined λ ∈ Rm .
(2.5)
4.2. Interior Point Flows on a Polytope
113
Thus grad φ (x) = D (x) ∇φ (x) + D (x) A λ.
(2.6)
Multiplying both sides of this equation by A and noting that A·grad φ (x) = 0 and AD (x) A > 0 for x ∈ C˚ we obtain −1
λ = − (AD (x) A ) Thus
AD (x) ∇φ (x) .
" # −1 grad φ (x) = In − D (x) A (AD (x) A ) A D (x) ∇φ (x)
(2.7)
˚ We have thus proved is the gradient of φ at an interior point x ∈ C. Theorem 2.1 The gradient flow x˙ = grad φ (x) of φ : C˚ → R with respect to the Riemannian metric (2.4) on the interior of the constraint set is −1 x˙ = In − D (x) A (AD (x) A ) A D (x) ∇φ (x) .
(2.8)
˚ Let us consider a few special We refer to (2.8) as the Faybusovich flow on C. cases of this result. If A = (1, . . . , 1) ∈ R1×n and b = 1, the constraint set C is just the standard (n − 1)-dimensional simplex n−1 . The gradient flow (2.8) then simplifies to x˙ = (D (∇φ (x)) − x ∇φ (x) In ) x, that is, to x˙ i =
n ∂φ ∂φ − xj xi , ∂xi j=1 ∂xj
i = 1, . . . , n.
(2.9)
If φ (x) = c x then (2.9) is equivalent to the interior point flow (1.18) on n−1 (up to the constant factor 2). If φ (x) =
n
fj (xj )
j=1
then (2.8) is equivalent to x˙ i =
n fi (xi ) − xj fj (xj ) xi , j=1
i = 1, . . . , n,
114
Chapter 4. Linear Programming
i.e. to (1.20). In particular, Karmarkar’s flow (1.19) on the simplex is thus seen as the gradient flow of the quadratic cost function 1 cj x2j 2 j=1 n
φ (x) =
on n−1 . As another example, consider the least squares cost function on the simplex φ : n−1 → R defined by φ (x) =
1 2
2
F x − g
Then ∇φ (x) = F (F x − g) and the gradient flow (2.8) is equivalent to x˙ i = ((F F x)i − (F g)i − x F F x + x F g) xi
i = 1, . . . , n (2.10)
In particular, for F = In , (2.9) becomes 2 x˙ i = xi − gi − x + x g xi ,
i = 1, . . . , n
(2.11)
˚ n−1 converges to the best If g ∈ n−1 , then the gradient flow (2.11) in approximation of g ∈ Rn by a vector on the boundary ∂n−1 of n−1 ; see Figure 2.1. Finally, for A arbitrary and φ (x) = c x the gradient flow (2.8) on C˚ becomes " # −1 x˙ = D (c) − D (x) A (AD (x) A ) AD (c) x
(2.12)
which is identical with the gradient flow for linear programming proposed and studied extensively by Faybusovich (1991a; 1991b; 1991c). Independently, this flow was also studied by Herzel, Recchioni and Zirilli (1991). Faybusovich (1991a; 1992a) have also presented a complete phase portrait analysis of (2.12). He shows that the solutions x (t) ∈ C˚ converge exponentially fast to the optimal vertex of the linear programming problem.
Inequality Constraints A different description of the constraint set is as C = {x ∈ Rn | Ax ≥ b}
(2.13)
4.2. Interior Point Flows on a Polytope
ÉÉÉÉÉ ÉÉÉÉÉ ÉÉÉÉÉ ÉÉÉÉÉ ÉÉÉÉÉ ÉÉÉÉÉ
115
Db
FIGURE 2.1. Best approximation of b on a simplex
where A ∈ Rm×n , b ∈ Rm . Here we assume that x → Ax is injective, that is rk A = n ≤ m. Consider the injective map F : Rn → Rm ,
F (x) = Ax − b
Then C = {x ∈ Rn | F (x) ≥ 0} and C˚ = {x ∈ Rn | F (x) > 0}
In this case the tangent space Tx C˚ of C˚ at x is unconstrained and thus consists of all vectors ξ ∈ Rn . A Riemannian metric on C˚ is defined by
ξ, η := DF |x (ξ) diag (F (x))
−1
=ξ A diag (Ax − b)
−1
DF |x (η)
(2.14)
Aη
for x ∈ C˚ and ξ, η ∈ Tx C˚ = Rn . Here DF |x : Tx C˚ → TF (x) Rm is the derivative or tangent map of F at x and D (F (x)) is defined by (2.2). The reader may verify that (2.14) does indeed define a positive definite inner product on Rn , using the injectivity of A. It is now rather straightforward to compute the gradient flow x˙ = grad φ (x) on C˚ with respect to this ˚ Here φ : Rn → R is a given smooth function. Riemannian metric on C. Theorem 2.2 The gradient flow x˙ = grad φ (x) of a smooth function φ : C˚ → R with respect to the Riemannian metric (2.14) on C˚ is −1 −1 x˙ = A diag (Ax − b) A ∇φ (x) where ∇φ (x) =
∂φ ∂φ ∂x1 , . . . , ∂xn
(2.15)
116
Chapter 4. Linear Programming
Proof 2.3 The gradient is defined by the characterization
(a) grad φ (x) ∈ Tx C˚
(b) grad φ (x) , ξ = ∇φ (x) ξ for all ξ ∈ Tx C˚ .
Since Tx C˚ = Rn we only have to consider Property (b) which is equivalent to −1 grad φ (x) A diag (Ax − b) Aξ = ∇φ (x) ξ for all ξ ∈ Rn . Thus −1 grad φ (x) = A diag (Ax − b)−1 A ∇φ (x) . Let us consider some special cases. If φ (x) = c x, as in the linear programming case, then (2.15) is equivalent to −1 −1 x˙ = A diag (Ax − b) A c.
(2.16)
If we assume further that A is invertible, i.e. that we have exactly n inequality constraints, then (2.16) becomes the linear differential equation x˙ = A−1 diag (Ax − b) (A )
−1
c.
(2.17)
Let us consider$next the case of a n-dimensional simplex domain C = n {x ∈ Rn | x ≥ 0, i=1 xi ≤ 1}, now regarded as a subset of Rn . Here m = n + 1 and 0 . .. In , b= A= −e 0 −1 with e = (1, . . . , 1) ∈ Rn . Thus −1
diag (Ax − b)
= diag
−1 x−1 1 , . . . , xn ,
−1 n 1− xi . i=1
and
−1
A diag (Ax − b)
A = diag (x)
−1
−1 n + 1− xi ee . 1
4.3. Recursive Linear Programming/Sorting
117
Using the matrix inversion lemma, see Appendix A, gives −1 −1 A diag (Ax − b) A −1 = D (x)−1 + (1 − Σxi )−1 ee −1
=D (x) − D (x) e (e D (x) e + (1 − Σxi )) = (I − D (x) ee ) D (x) .
e D (x)
Thus the gradient flow (2.17) in this case is x˙ = (I − D (x) ee ) D (x) c = (D (c) − xc ) x = (D (c) − c xIn ) x. in harmony with Corollary 1.7. Problem 2.4 Let A ∈ Rm×n , rk A = m < n, B ∈ Rm×n and consider the convex subset of positive semidefinite matrices
C = P ∈ Rn×n | P = P ≥ 0, AP = B with nonempty interior
C˚ = P ∈ Rn×n | P = P > 0, AP = B . Show that
ξ, η := tr P −1 ξP −1 η ,
ξ, η ∈ TP C˚ ,
˚ defines a Riemannian metric on C. Problem 2.5 Let W1 , W2 ∈ Rn×n be symmetric matrices.
Prove that the gradient flow of the cost function φ : C˚ → R, φ (P ) = tr W1 P + W2 P −1 with respect to the above Riemannian metric on C˚ is −1 P˙ = In − P A (AP A ) A (P W1 P − W2 ) .
4.3 Recursive Linear Programming/Sorting Our aim in this section is to obtain a recursive version of Brockett’s linear programming/sorting scheme (1.4). In particular, we base our algorithms on the recursive Lie-bracket algorithm of Section 2.3, as originally presented in Moore et al. (1994).
118
Chapter 4. Linear Programming
Theorem 1.2 gives an analysis of the Rayleigh quotient gradient flow on the sphere S m−1 , ξ˙ = (N − ξ N ξI) ξ,
ξ (0) = ξ0
(3.1)
and shows how this leads to a solution to the linear programming/sorting problem. As pointed out in the remarks following Theorem 1.2, any solution ξ (t) of (3.1) induces a solution to the isospectral double bracket flow H˙ = [H, [H, N ]] ,
H (0) = H0 = ξ0 ξ0
via the quadratic substitution H = ξξ . Similarly, if (ξk ) is a solution to the recursion
ξk+1 = e−α[ξk ξk ,N ] ξk ,
α=
1 , 4 N
ξ0 ∈ S m−1 ,
(3.2)
evolving on S m−1 , then (Hk ) = (ξk ξk ) is a solution to the recursive Liebracket scheme Hk+1 = e−α[Hk ,N ] Hk eα[Hk ,N ] ,
α=
1 , 4 N
ξ0 ∈ S m−1 . (3.3)
We now study (3.2) as a recursive solution to the linear programming/ sorting problem. Theorem 3.1 (Linear Programming/Sorting Algorithm) Consider the maximization of c x over the polytope C (v1 , . . . , vm ), where x, c, vi ∈ Rn . Assume the genericity condition c vi = c vj holds for all i = j. Let N = diag (c v1 , . . . , c vm ). Then: (a) The solutions (ξk ) of the recursion (3.2) satisfy i ξk ∈ S m−1 for all k ∈ N. ii There are exactly 2m fixed points corresponding to ±e1 , . . . , ±em , the standard basis vectors of Rm . iii All fixed points are unstable, except for ±ei∗ with c vi∗ = max (c vi ), which is exponentially stable. i=1,...,m
iv The recursive algorithm (3.2) acts to monotonically increase the cost rN (ξk ) = ξk N ξk .
4.3. Recursive Linear Programming/Sorting
(b) Let p0 =
m
ηi vi ∈ C,
ηi ≥ 0,
119
η1 + · · · + ηm = 1
i=1
√ √ η1 , . . . , ηm . Now let ξk = be arbitrary and define ζ0 = (ξk,1 , . . . , ξk,m ) be given by (3.2). Then the sequence of points pk =
m
ξk,i vi
(3.4)
i=1
converges exponentially fast to the unique optimal vertex of the polytope C for generic initial conditions ξ0 . Proof 3.2 The proof of Part (a) is an immediate consequence of Theorem 2.3.8, once it is observed that ξk is the first column vector of an orthogonal matrix solution Θk ∈ O (n) to the double bracket recursion
Θk+1 = e−α[Θk QΘk ,N ] Θk . Here Q = diag (1, 0, . . . , 0) and ξ0 = Θ0 e1 . Part (b) follows immediately from Part (a). $m 1 Remark 3.3 By choosing p0 = m i=1 vi ∈ C as the central point of the polytope and setting ξ0 = √1m , . . . , √1m , it is guaranteed that the sequence of interior points pk defined by (3.4) converges to the optimal vertex. Remark 3.4 Of course, it is not our advise to use the above algorithm as a practical method for solving linear programming problems. In fact, the same comments as for the continuous time double bracket flow (1.4) apply here. Remark 3.5 It should be noted that a computationally simple form of (3.2) exists which does not require the calculation of the matrix exponential (see Mahony et al. (1996)). sin (αyk ) sin (αyk ) N ξk ξk+1 = cos (αyk ) − ξk N ξk ξk + yk yk (3.5) 2 1/2
yk = xk N 2 xk − (xk N xk )
Chapter 4. Linear Programming
y axis
120
2
1
3 4 x axis
FIGURE 3.1. The recursion occurring inside the polytope C
Remark 3.6 Another alternative to the exponential term eA = e−αk (ξk ξk N −N ξk ξk ) is a (1, 1) Pad´e approximation 2I+A 2I−A which preserves the orthogonal nature of the recursion ξk and reduces the computational cost of each step of the algorithm. Remark 3.7 Although the linear programming/sorting algorithm evolves in the interior of the polytope to search for the optimal vertex, it is not strictly a dynamical system defined on the polytope C, but rather a dynamical system with output variables on C. It is a projection of the recursive Rayleigh quotient algorithm from the sphere S m−1 to the polytope C. It remains as a challenge to find isospectral-like recursive interior point algorithms similar to Karmarkar’s algorithm evolving on C. Actually, one interesting case where the desired algorithm is available as a recursive interior point algorithm, is the special case where the polytope is the standard simplex m−1 , see Problem 3.10. Remark 3.8 An attraction of this algorithm, is that it can deal with timevarying or noisy data. Consider a process in which the vertices of a linear polytope are given by a sequence of vectors {vi (k)} where k = 1, 2, . . . , and similarly the cost vector c = c (k) is also a function of k. Then the only difference is the change in target matrix at each step as N (k) = diag (c (k) v1 (k) , . . . , c (k) vm (k)) . Now since such a scheme is based on a gradient flow algorithm for which the convergence rate is exponential, it can be expected that for slowly timevarying data, the tracking response of the algorithm should be reasonable, and there should be robustness to noise.
121
Error
4.3. Recursive Linear Programming/Sorting
Iterations
FIGURE 3.2. Response to a time-varying cost function
Simulations Two simulations were run to give an indication of the nature of the recursive interior-point linear programming algorithm. To illustrate the phase diagram of the recursion in the polytope C, a linear programming problem simulation with 4 vertices embedded in R2 is shown in Figure 3.1. This figure underlines the interior point nature of the algorithm. The second simulation in Figure 3.2 is for 7 vectors embedded in R7 . This indicates how the algorithm behaves for time-varying data. Here the cost vector c ∈ Rn has been chosen as a time varying sequence c = c (k) where c(k+1)−c(k) ∼ 0.1. For each step the optimum direction ei∗ is calculated c(k) using a standard sorting algorithm and then the norm of the difference between the estimate ξ (k) − ei∗ 2 is plotted. Note that since the cost vector c (k) is changing the optimal vertex may change abruptly while the algorithm is running. In this simulation such a jump occurs at iteration 10. It is interesting to note that for the iterations near to the jump the change in ξ (k) is small indicating that the algorithm slows down when the optimal solution is in doubt. Problem 3.9 Verify (3.5). Problem 3.10 Verify that the Rayleigh quotient algorithm (3.5) on the sphere S m−1 induces the following interior point algorithm on the simplex m−1 . Let c = (c1 , . . . , cm ) . For any solution ξk = (ξk,1 , . . . , ξk,m ) ∈ S m−1
2 2 ∈ m−1 . Then (xk ) of (3.5) set xk = (xk,1 , . . . , xk,m ) := ξk,1 , . . . , ξk,m satisfies the recursion
xk+1
2 sin (αyk ) sin (αyk ) = cos (αyk ) − c xk N xk I+ yk yk
122
Chapter 4. Linear Programming
with α =
1 4N
and yk =
9 $m
2 i=1 ci xk,i
$ 2 −( m i=1 ci xk,i ) .
Problem 3.11 Show that every solution (xk ) which starts in the interior of the simplex converges exponentially to the optimal vertex ei∗ with ci∗ = max ci .
i=1,...,m
Main Points of Section The linear programming/sorting algorithm presented in this section is based on the recursive Lie-bracket scheme in the case where H0 is a rank one projection operator. There are computationally simpler forms of the algorithm which do not require the calculation of the matrix exponential. An attraction of the exponentially convergent algorithms, such as is described here, is that they can be modified to be tracking algorithms which can the track time-varying data or filter noisy data. In such a situation the algorithms presented here provide simple computational schemes which will track the optimal solution, and yet be robust.
4.3. Recursive Linear Programming/Sorting
123
Notes for Chapter 4 As textbooks on optimization theory and nonlinear programming we mention Luenberger (1969; 1973). Basic monographies on linear programming, the simplex method and combinatorial optimization are Dantzig (1963), Gr¨ otschel, Lov´asz and Schrijver (1988). The book Werner (1992) also contains a discussion of Karmarkar’s method. Sorting algorithms are discussed in Knuth (1973). For a collection of articles on interior point methods we refer to Lagarias and Todd (1990) as well as to the Special Issue of Linear Algebra and its Applications (1991), Volume 151. Convergence properties of interior point flows are investigated by Meggido and Shub (1989). Classical papers on interior point methods are those of Khachian (n.d.) and Karmarkar (1984). The paper of Karmarkar (1990) contains an interesting interpretation of the (discrete-time) Karmarkar algorithm as the discretization of a continuous time interior point flow. Variable step-size selections are made with respect to the curvature of the interior point flow trajectories. For related work we refer to Sonnevend and Stoer (1990) and Sonnevend, Stoer and Zhao (1990; 1991). The idea of using isospectral matrix flows to solve combinatorial optimization tasks such as sorting and linear programming is due to Brockett (1991b); see also Bloch (1990b) and Helmke (1993b) for additional information. For further applications to combinatorial assignment problems and least squares matching problems see Brockett and Wong (1991), Brockett (1989a). Starting from the work of Brockett (1991b), a systematic approach to interior point flows for linear programming has been developed in the important pioneering work of Faybusovich. Connections of linear programming with completely integrable Hamiltonian systems are made in Bayer and Lagarias (1989), Bloch (1990b) and Faybusovich (1991a; 1991b; 1992a). For a complete phase portrait analysis of the interior point gradient flow (2.12) we refer to Faybusovich (1991a; 1992a). Quadratic convergence to the optimum via a discretization of (2.12) is established in Herzel et al. (1991). An important result from linear algebra which is behind Brockett’s approach to linear programming is the Schur-Horn theorem. The Schur-Horn theorem states that the set of vectors formed by the diagonal entries of Hermitian matrices with
eigenvalues λ1, . . . , λn coincides with the convex polytope with vertices λπ(1) , . . . , λπ(n) , where π varies over all n! permutations of 1, . . . , n; Schur (1923), Horn (1953). A Lie group theoretic generalization of this result is due to Kostant (1973). For deep connections of such convexity results with symplectic geometry we refer to Atiyah (1982; 1983); see also Byrnes and Willems (1986), Bloch and Byrnes (1986). There is an interesting connection of interior point flows with population
124
Chapter 4. Linear Programming
dynamics. In population dynamics interior point flows on the standard simplex naturally arise. The book of Akin (1979) contains a systematic analysis of interior point flows on the standard simplex of interest in population dynamics. Akin refers to the Riemannian metric (2.4) on the interior of the standard simplex as the Shahshahani metric. See also the closely related work of Smale (1976), Schuster et al. (1978) and Zeeman (1980) on Volterra-Lotka equations for predator-prey models in population dynamics. For background material on the Volterra-Lotka equation see Hirsch and Smale (1974). Highly interconnected networks of coupled nonlinear artificial neurons can have remarkable computational abilities. For exciting connections of optimization with artificial neural networks we refer to Pyne (1956), Chua and Lin (1984), Hopfield and Tank (1985), Tank and Hopfield (1985). Hopfield (1982; 1984) have shown that certain well behaved differential equations on a multidimensional cube may serve as a model for associative memories. In the pioneering work Hopfield and Tank (1985) show that such dynamical systems are capable of solving complex optimization problems such as the Travelling Salesman Problem. See also Peterson and Soederberg (1989) for related work. For an analysis of the Hopfield model we refer to Aiyer, Niranjan and Fallside (1990). For further work we refer to Yuille (1990) and the references therein. An open problem for future research is to find possible connections between the interior point flows, as described in this chapter, and Hopfield type neural networks for linear programming.
CHAPTER
5
Approximation and Control 5.1 Approximations by Lower Rank Matrices In this chapter, we analyse three further matrix least squares estimation problems. The first is concerned with the task of approximating matrices by lower rank ones. This has immediate relevance to total least squares estimation and representations by linear neural networks. The section also serves as a prototype for linear system approximation. In the second, short section we introduce dynamical systems achieving the polar decomposition of a matrix. This has some contact with Brockett’s (1989a) investigation on matching problems of interest in computer vision and image detection. Finally the last section deals with inverse eigenvalue problems arising in control theory. This section also demonstrates the potential of the dynamical systems approach to optimize feedback controllers. In this section, we study the problem of approximating finite-dimensional linear operators by lower rank linear operators. A classical result from matrix analysis, the Eckart-Young-Mirsky Theorem, see Corollary 1.17, states that the best approximation of a given M × N matrix A by matrices of smaller rank is given by a truncated singular value decomposition of A. Since the set M (n, M × N ) of real M × N matrices X of fixed rank n is a smooth manifold, see Proposition 1.14, we thus have an optimization problem for the smooth Frobenius norm distance function fA : M (n, M × N ) → R,
2
fA (X) = A − X .
This problem is equivalent to the total linear least squares problem, see
126
Chapter 5. Approximation and Control
Golub and Van Loan (1980). The fact that we have here assumed that the rank of X is precisely n instead of being less than or equal to n is no restriction in generality, as we will later show that any best approximant of A of rank ≤ n will automatically satisfy rank X = n. X The critical points and, in particular, the global minima of the distance function fA (X) = A − X2 on manifolds of fixed rank symmetric and rectangular matrices are investigated. Also, gradient flows related to the minimization of the constrained distance function fA (X) are studied. Similar results are presented for symmetric matrices. For technical reasons it is convenient to study the symmetric matrix case first and then to deduce the corresponding results for rectangular matrices. This work is based on Helmke and Shayman (1995) and Helmke, Prechtel and Shayman (1993).
Approximations by Symmetric Matrices Let S (N ) denote the set of all N × N real symmetric matrices. For integers 1 ≤ n ≤ N let
(1.1) S (n, N ) = X ∈ RN ×N | X = X, rank X = n denote the set of real symmetric N × N matrices of rank n. Given a fixed real symmetric N × N matrix A we consider the distance function fA : S (n, N ) → R,
2
X → A − X
(1.2)
where X2 = tr (XX ) is the Frobenius norm. We are interested in finding the critical points and local and global minima of fA , i.e. the best rank n symmetric approximants of A. The following result summarizes some basic geometric properties of the set S (n, N ). Proposition 1.1 (a) S (n, N ) is a smooth manifold of dimension 12 n (2N − n + 1) and has n + 1 connected components S (p, q; N ) = {X ∈ S (n, N ) | sig X = p − q}
(1.3)
where p, q ≥ 0, p + q = n and sig denotes signature. The tangent space of S (n, N ) at an element X is
(1.4) TX S (n, N ) = ∆X + X∆ | ∆ ∈ RN ×N (b) The topological closure of S (p, q; N ) in RN ×N is 0 S (p, q; N ) = S (p , q ; N ) p ≤p, q ≤q
(1.5)
5.1. Approximations by Lower Rank Matrices
127
and the topological closure of S (n, N ) in RN ×N is S (n, N ) =
n 0
S (i, N )
(1.6)
i=0
Proof 1.2 The function which associates to every X ∈ S (n, N ) its signature is locally constant. Thus S (p, q; N ) is open in S (n, N ) with S (n, N ) = p+q=n, p,q≥0 S (p, q; N ). It follows that S (n, N ) has at least n + 1 connected components. Any element X ∈ S (p, q; N ) is of the form X = γQγ where γ ∈ GL (N, R) is invertible and Q = diag (Ip , −Iq , 0) ∈ S (p, q; N ). Thus S (p, q; N ) is an orbit of the congruence group action (γ, X) → γXγ of GL (N, R) on S (n, N ). It follows that S (p, q; N ) with p, q ≥ 0, p + q = n, is a smooth manifold and therefore also S (n, N ) is. Let GL+ (N, R) denote the set of invertible N ×N matrices γ with det γ > 0. Then GL+ (N, R) is a connected subset of GL (N, R) and S (p, q; N ) = {γQγ | γ ∈ GL+ (N, R)}. Consider the smooth surjective map η : GL (N, R) → S (p, q; N ) ,
η (γ) = γQγ .
(1.7)
Then S (p, q; N ) is the image of the connected set GL+ (N, R) under the continuous map γ and therefore S (p, q; N ) is connected. This completes the proof that the sets S (p, q; N ) are precisely the n + 1 connected components of S (n, N ). Furthermore, the derivative of η at γ ∈ GL (N, R) is the linear map Dη (γ) : Tγ GL (N, R) → TX S (p, q; N ), X = γQγ , defined by Dη (γ) (∆) = ∆X + X∆ and maps Tγ GL (N, R) ∼ = RN ×N surjectively onto TX S (p, q; N ). Since X and p, q ≥ 0, p + q = n, are arbitrary this proves (1.4). Let ∆11 ∆12 ∆13 ∆ = ∆21 ∆22 ∆23 ∆31
∆32
∆33
be partitioned according to the partition (p, q, N − n) of N . Then ∆ ∈ ker Dη (I) if and only if ∆13 = 0, ∆23 = 0 and the n × n submatrix ∆11 ∆12 ∆21
∆22
is skew-symmetric. A simple dimension count thus yields dim ker Dη (I) = 1 2 n (n − 1) + N (N − n) and therefore dim S (p, q; N ) = N 2 − dim ker Dη (I) = 12 n (2N − n + 1) .
128
Chapter 5. Approximation and Control
This completes the proof of (a). The proof of (b) is left as an exercise to the reader. Theorem 1.3 (a) Let A ∈ Rn×n be symmetric and let N+ = dim Eig+ (A) ,
N− = dim Eig− (A)
(1.8)
be the numbers of positive and negative eigenvalues of A, respectively. The critical points X of the distance function fA : S (n, N ) → R are characterized by AX = XA = X 2 . (b) If A has N distinct eigenvalues λ1 > · · · > λN , and A = Θ diag (λ1 , . . . , λN ) Θ for Θ ∈ O (N ), then the restriction of the distance function fA : S (p, q; N ) → R has exactly N+ N− N+ !N− ! · = p! (N+ − p)!q! (N− − q)! p q critical points. In particular, fA has critical points in S (p, q; N ) if and only if p ≤ N+ and q ≤ N− . The critical points X ∈ S (p, q; N ) of fA with p ≤ N+ , q ≤ N− , are characterized by X = Θ diag (x1 , . . . , xN ) Θ
(1.9)
with xi = 0
or
xi = λi ,
i = 1, . . . , N
(1.10)
and exactly p of the xi are positive and q are negative. Proof 1.4 Without loss of generality, we may assume that A = diag (λ1 , . . . , λN ) with λ1 ≥ · · · ≥ λN . A straightforward computation shows that the derivative of fA : S (n, N ) → R at X is the linear map DfA (X) : TX S (n, N ) → R defined by DfA (X) (∆X + X∆ ) = − 2 tr ((A − X) (∆X + X∆ )) = − 4 tr (X (A − X) ∆)
(1.11)
for all ∆ ∈ RN ×N . Therefore X ∈ S (n, N ) is a critical point of fA if and only if X 2 = XA. By symmetry of X then XA = AX. This proves (a). Now assume that λ1 > · · · > λN . From X 2 = AX = XA, and since A has distinct eigenvalues, X must be diagonal. Thus the critical points of fA
5.1. Approximations by Lower Rank Matrices
129
FIGURE 1.1. Real symmetric matrices of rank ≤ 1
are the diagonal matrices X = diag (x1 , . . . , xN ) with (A − X) X = 0 and therefore xi = 0 or xi = λi for i = 1, . . . , N . Also, X ∈ S (p, q; N ) if and only if exactly p of the xi are positive and q are negative. Consequently, using the standard symbol for the binomial coefficient, there are N+ N− · p q critical points of fA in S (p, q; N ), characterized by (1.9). This completes the proof. Example 1.5 Let N = 2, n = 1 and A = ab cb . The varietyof rank ≤ 1 real symmetric 2 × 2 matrices is a cone X = [ xy yz ] | xz = y 2 depicted in Figure 1.1. The function fA : S (1, 2) → R has generically 2 local minimum, if sig (A) = 0, and 1 local = global minimum, if A > 0 or A < 0. No other critical points exist. Theorem 1.3 has the following immediate consequence. Corollary 1.6 Let A = Θ diag (λ1 , . . . , λN ) Θ with Θ ∈ O (N ) a real orˆ ∈ S (p, q; N ) thogonal N × N matrix and λ1 ≥ · · · ≥ λN . A minimum X for fA : S (p, q; N ) → R exists if and only if p ≤ N+ and q ≤ N− . One ˆ ∈ S (p, q; N ) is given by such minimizing X ˆ = Θ diag (λ1 , . . . , λp , 0, . . . , 0, λN −q+1 , . . . , λN ) Θ X
(1.12)
$N −q 2 ˆ and the minimum value of fA : S (p, q; N ) → R is i=p+1 λi . X ∈ S (p, q; N ) given by (1.12) is the unique minimum of fA : S (p, q; N ) → R if λp > λp+1 and λN −q > λN −q+1 . The next result shows that a best symmetric approximant of A of rank ≤ n with n < rank A necessarily has rank n. Thus, for n < rank A, the global minimum of the function fA : S (n, N ) → R is always an element of
130
Chapter 5. Approximation and Control
S (n, N ). Recall that the singular values of A ∈ RN ×N are the nonnegative square roots of the eigenvalues of AA . Proposition 1.7 Let A ∈ RN ×N be symmetric with singular values σ1 ≥ · · · ≥ σN and let n ∈ N be any integer with n < rank A. There exists ˆ ∈ S (n, N ) which minimizes fA : S (n, N ) → R. Moreover, if X ˆ ∈ RN ×N X is any symmetric matrix which satisfies ˆ ≤n rank X A − X ˆ = inf {A − X | X ∈ S (i, N ) , 0 ≤ i ≤ n}
(1.13)
ˆ = n. In particular, the minimum value then necessarily rank X µn (A) = min {fA (X) | X ∈ S (n, N )}
(1.14)
coincides with the minimum value min {fA (X) | X ∈ S (i, N ) , 0 ≤ i ≤ n}. One has 2
A = µ0 (A) > µ1 (A) > · · · > µr (A) = 0,
r = rank A, (1.15)
and for 0 ≤ n ≤ r µn (A) =
N
σi2 .
(1.16)
i=n+1
Proof 1.8 By Proposition 1.1 S (n, N ) =
n 0
S (i, N )
i=0
is a closed subset of RN ×N and therefore fA : S (n, N ) → R is proper. Since any proper continuous function f : X → Y has a closed image f (X) ˆ ∈ S (n, N ) for fA : S (n, N ) → R exists: it follows that a minimizing X & ' ˆ 2 = min fA (X) | X ∈ S (n, N ) A − X ˆ < n. Then for ε ∈ R and b ∈ RN , b = 1, arbitrary we Suppose rk (X) ˆ have rk X + εbb ≤ n and thus
A − X ˆ − εbb 2 =A − X ˆ 2 − 2ε tr A − X ˆ bb + ε2 ˆ 2 . ≥A − X
5.1. Approximations by Lower Rank Matrices
131
Thus for all ε ∈ R and b ∈ RN , b = 1,
ˆ bb . ε2 ≥ 2ε tr A − X
ˆ bb = 0 for all b ∈ RN . Hence A = X, ˆ contrary This shows tr A − X ˆ to assumption. Therefore X ∈ S (n, N ) and µ0 (A) > · · · > µr (A) = 0 for r = rk (A). Thus (1.16) follows immediately from Corollary 1.6. We now show that the best symmetric approximant of a symmetric matrix A ∈ RN ×N in the Frobenius norm is in general uniquely determined. Theorem 1.9 Let A = Θ diag (λ1 , . . . , λN ) Θ with Θ ∈ O (N ) real orthogonal N × N matrix and λ21 ≥ · · · ≥ λ2n > λ2n+1 ≥ · · · ≥ λ2N . Then ˆ = Θ diag (λ1 , . . . , λn , 0, . . . , 0) Θ ∈ S (n, N ) X
(1.17)
is the uniquely determined best symmetric approximant of A of rank ≤ n. Proof 1.10 We have N A − X ˆ 2 = λ2i = µn (A) i=n+1
ˆ ∈ S (n, N ) is a global minimum of fA : S (n, N ) → R. by (1.16), and thus X By Proposition 1.7 every symmetric best approximant of A of rank ≤ n has rank n. Let X0 ∈ S (n, N ) denote any minimum of fA : S (n, N ) → R. Then X0 is a critical point of fA and therefore, by Theorem 1.3, is of the form X0 = Θ diag (x1 , . . . , xn ) Θ with xi = λi for indices i satisfying 1 ≤ i1 < · · · < in ≤ N and$ xi = 0 2 otherwise. For I = {i1 , . . . , in } the minimal distance A − X0 = i∈I λ2i $N coincides with µn (A) = i=n+1 λ2i if and only if I = {1, . . . , n}, ı.e. if and only if X0 = Θ diag (λ1 , . . . , λn , 0, . . . , 0) Θ . An important problem in linear algebra is that of finding the best positive semidefinite symmetric approximant of a given symmetric N × N matrix A. By Corollary 1.6 we have Corollary 1.11 Let A = Θ diag (λ1 , . . . , λN ) Θ with Θ ∈ O (N ) real orthogonal and λ1 ≥ · · · ≥ λn > λn+1 ≥ · · · ≥ λN , λn > 0. Then ˆ = Θ diag (λ1 , . . . , λn , 0, . . . , 0) Θ ∈ S (n, N ) X
(1.18)
is the unique positive semidefinite symmetric best approximant of A of rank ≤ n.
132
Chapter 5. Approximation and Control
In particular
ˆ = Θ diag λ1 , . . . , λN+ , 0, . . . , 0 Θ X
(1.19)
is the uniquely determined best approximant of A in the class of positive semidefinite symmetric matrices. This implies the following result due to Higham (1988). Corollary 1.12 Let A ∈ RN ×N and let B = 12 (A + A ) be the symmetric part of A. Let B = U H be the polar decomposition (U U = IN , H = H ≥ ˆ = 1 (B + H) is the unique positive semidefinite approximant 0). Then X 2 of A in the Frobenius norm. 2
2
Proof 1.13 For any symmetric matrix X we have A − X = B − X +
2 2 1 A . Let B = Θ diag (λ1 , . . . , λN ) Θ and let B = U H be − tr A 2 the polar decomposition of B. Then diag (λ1 , . . . , λN ) = (Θ U Θ) · (Θ HΘ) is a polar decomposition of diag (λ1 , . . . , λN ) and thus H = Θ diag (|λ1 | , . . . , |λN |) Θ and
U = Θ diag IN+ , −IN −N+ Θ ,
where N+ is the number of positive eigenvalues of 12 (A + A ). Note that
1/2 H = B2 is uniquely determined. Therefore, by Corollary 1.11 1 2
(B + H) = 12 (U + I) H = Θ diag IN+ , 0 Θ H
=Θ diag λ1 , . . . , λN+ , 0, . . . , 0 Θ
(1.20)
is the uniquely determined best approximant of B in the class of positive semidefinite symmetric matrices. The result follows.
Approximations by Rectangular Matrices Here we address the related classical issue of approximating a real rectangular matrix by matrices of lower rank. For integers 1 ≤ n ≤ min (M, N ) let
M (n, M × N ) = X ∈ RM×N | rank X = n
(1.21)
5.1. Approximations by Lower Rank Matrices
133
denote the set of real M × N matrices of rank n. Given A ∈ RM×N the approximation task is to find the minimum and, more generally, the critical points of the distance function FA : M (n, M × N ) → R,
2
FA (X) = A − X
(1.22)
for the Frobenius norm X = tr (XX ) on RM×N . 2
Proposition 1.14 M (n, M × N ) is a smooth and connected manifold of dimension n (M + N − n), if max (M, N ) > 1. The tangent space of M (n, M × N ) at an element X is
TX M (n, M × N ) = ∆1 X + X∆2 | ∆1 ∈ RM×M , ∆2 ∈ RN ×N . (1.23) Proof 1.15 Let Q ∈ M (n, M × N ) be defined by Q = I0n 00 Since every X ∈ M (n, M × N ) is congruent to Q by the congruence action ((γ1 , γ2 ) , X) → γ1 Xγ2−1 , γ1 ∈ GL (M, R), γ2 ∈ GL (N, R), the set M (n, M × N ) is an orbit of this smooth real algebraic Lie group action of GL (M ) × GL (N ) on RM×N and therefore a smooth manifold; see Appendix C. Here γ1 , γ2 may be chosen to have positive determinant. Thus M (n, M × N ) is the image of the connected subset GL+ (M ) × GL+ (N ) of the continuous (and in fact smooth) map π : GL (M ) × GL (N ) → RM×N , π (γ1 , γ2 ) = γ1 Qγ2−1 , and hence is also connected. The derivative of π at (γ1 , γ2 ) is the linear map on the tangent space T(γ1 ,γ2 ) (GL (M ) × GL (N )) ∼ = RM×M × RN ×N defined by
Dπ (γ1 , γ2 ) ((∆1 , ∆2 )) = ∆1 γ1 Qγ2−1 − γ1 Qγ2−1 ∆2 and TX M (n, M × N ) is the image of this map. Finally, the dimension result follows from a simple parameter counting. In fact, from the Schur complement formula, see Horn and Johnson (1985), any M × N matrix of rank n −1 X12 X11 0 I X11 X11 X12 = X= −1 X21 I 0 X22 − X21 X11 X21 X22 X12 with X11 ∈ Rn×n invertible, X21 ∈ R(M−n)×n , X12 ∈ Rn×(N −n) sat−1 X12 and thus depends on n2 + n (M + N − 2n) = isfies X22 = X21 X11 n (M + N − n) independent parameters. This completes the proof. The general approximation problem for rectangular matrices can be reduced to the approximation problem for symmetric matrices, using the
134
Chapter 5. Approximation and Control
same symmetrisation trick as in Chapter 3. To this end, define for A, X ∈ RM×N 0 A 0 X ˆ ˆ A= , X= . (1.24) A 0 X 0 ˆ are (M + N ) × (M + N ) symmetric matrices. If A ∈ RM×N Thus Aˆ and X has singular values σ1 , . . . , σk , k = min (M, N ), then the eigenvalues of Aˆ are ±σ1 , . . . , ±σk , and possibly 0. By (1.24) we have a smooth injective imbedding M (n, M × N ) → S (2n, M + N ) ,
ˆ X → X
(1.25)
2 ˆ 2 . with 2 A − X = Aˆ − X ˆ is a critical point (or a It is easy to check that, for X ∈ M (n, M × N ), X minimum) for fAˆ : S (2n, M + N ) → R if and only if X is a critical point (or a minimum) for FA : M (n, M × N ) → R. Thus the results for the symmetric case all carry over to results on the function FA : M (n, M × N ) → R, and the next result follows. Recall, see Chapter 3, that an M × N matrix A has a singular value decomposition σ1 A = Θ1 ΣΘ2 , and Σ = 0
..
.
...
σk 0
0 .. . ∈ RM×N 0 0
where Θ1 , Θ2 are orthogonal matrices. Let Σn , n ≤ k, be obtained from Σ by setting σn+1 , . . . , σk equal to zero. Theorem 1.16 Let A = Θ1 ΣΘ2 be the singular value decomposition of A ∈ RM×N with singular values σ1 ≥ · · · ≥ σk > 0, 1 ≤ k ≤ min (M, N ). (a) The critical points of FA : M (n, M × N ) → R, FA (X) = A − X2 , are characterized by (A − X) X = 0, X (A − X) = 0. (b) FA : M (n, M × N ) → R has a finite number of critical points if and only if M = N and A has M distinct singular values. (c) If σn > σn+1 , n ≤ k, there exists a unique global minimum Xmin of FA : M (n, M × N ) → R which is given by Xmin = Θ1 Σn Θ2 .
(1.26)
5.1. Approximations by Lower Rank Matrices
135
As an immediate Corollary we obtain the following classical theorem of Eckart and Young (1936). Corollary 1.17 Let A = Θ1 ΣΘ2 be the singular value decomposition of A ∈ RM×N , Θ1 Θ1 = IM , Θ2 Θ2 = IN , Σ = diag(σ10,...,σk ) 00 ∈ RM×N with σ1 ≥ · · · ≥ σk > 0. Let n ≤ k and σn > σn+1 . Then with diag (σ1 , . . . , σn , 0, . . . , 0) 0 Σn = ∈ RM×N 0 0 Xmin = Θ1 Σn Θ2 is the unique M × N matrix of rank n which minimizes A − X2 over the set of matrices of rank less than or equal to n.
Gradient Flows In this subsection we develop a gradient flow approach to find the critical points of the distance functions fA : S (n, N ) → R and FA : M (n, M × N ) → R. We first consider the symmetric matrix case. By Proposition 1.1 the tangent space of S (n, N ) at an element X is the vector space
TX S (n, N ) = ∆X + X∆ | ∆ ∈ RN ×N . For A, B ∈ RN ×N we define {A, B} = AB + B A
(1.27)
which, by the way, coincides with the product defined on RN ×N , when RN ×N is considered as a Jordan algebra. Thus the tangent space TX S (n, N ) is the image of the linear map πX : RN ×N → RN ×N ,
∆ → {∆, X}
(1.28)
ker πX = ∆ ∈ RN ×N | ∆X + X∆ = 0
(1.29)
while the kernel of πX is
Taking the orthogonal complement
⊥ (ker πX ) = Z ∈ RN ×N | tr (Z ∆) = 0 ∀∆ ∈ ker πX , with respect to the standard inner product on RN ×N A, B = tr (A B) ,
(1.30)
136
Chapter 5. Approximation and Control
yields the isomorphism of vector spaces ⊥ (ker πX ) ∼ = RN ×N / ker πX ∼ = TX S (n, N ) .
(1.31)
We have the orthogonal decomposition of RN ×N ⊥
RN ×N = ker πX ⊕ (ker πX )
and hence every element ∆ ∈ RN ×N has a unique decomposition ∆ = ∆X + ∆X
(1.32)
where ∆X ∈ ker πX and ∆X ∈ (ker πX )⊥ . Given any pair of tangent vectors {∆1 , X}, {∆2 , X} of TX S (n, M ) we define X (1.33) ∆2 . {∆1 , X} , {∆2 , X} := 4 tr ∆X 1 It is easy to show that , defines a nondegenerate symmetric bilinear form on TX S (n, N ) for each X ∈ S (n, N ). In fact, , defines a Riemannian metric of S (n, N ). We refer to , as the normal Riemannian metric on S (n, N ). Theorem 1.18 Let A ∈ RN ×N be symmetric. (a) The gradient flow of fA : S (n, N ) → R with respect to the normal Riemannian metric , is X˙ = − grad fA (X) = {(A − X) X, X}
(1.34)
= (A − X) X + X (A − X) 2
2
(b) For any X (0) ∈ S (n, N ) the solution X (t) ∈ S (n, N ) of (1.34) exists for all t ≥ 0. (c) Every solution X (t) ∈ S (n, N ) of (1.34) converges to an equilibrium point X∞ characterized by X (X − A) = 0. Also, X∞ has rank less than or equal to n. Proof 1.19 The gradient of fA with respect to the normal metric is the uniquely determined vector field on S (n, N ) characterized by DfA (X) ({∆, X}) = grad fA (X) , {∆, X} grad fA (X) = {Ω, X} ∈ TX S (n, N )
(1.35)
5.1. Approximations by Lower Rank Matrices
137
⊥
for all ∆ ∈ RN ×N and some (unique) Ω ∈ (ker πX ) . A straightforward computation shows that the derivative of fA : S (n, N ) → R at X is the linear map defined on TX S (n, N ) by DfA (X) ({∆, X}) =2 tr (X {∆, X} − A {∆, X})
=4 tr ((X − A) X) ∆ . Thus (1.35) is equivalent to
4 tr ((X − A) X) ∆ = grad fA (X) , {∆, X} = {Ω, X} , {∆, X} =4 tr ΩX ∆X
=4 tr Ω ∆X ⊥
since Ω ∈ (ker πX )
(1.36)
(1.37)
implies Ω = ΩX . For all ∆ ∈ ker πX
tr (X (X − A) ∆) =
1 2
tr ((X − A) (∆X + X∆ )) = 0 ⊥
and therefore (X − A) X ∈ (ker πX ) . Thus
4 tr ((X − A) X) ∆ =4 tr ((X − A) X) ∆X + ∆X
=4 tr ((X − A) X) ∆X and (1.37) is equivalent to Ω = (X − A) X Thus grad fA (X) = {(X − A) X, X}
(1.38)
which proves (a). For (b) note that for any X ∈ S (n, N ) we have {X, X (X − A)} ∈ TX S (n, N ) and thus (1.34) is a vector field on S (n, N ). Thus for any initial condition X (0) ∈ S (n, N ) the solution X (t) of (1.34) satisfies X (t) ∈ S (n, N ) for all t for which X (t) is defined. It suffices therefore to show the existence of solutions of (1.34) for all t ≥ 0. To this end consider any solution X (t) of (1.34). By d fA (X (t)) =2 tr (X − A) X˙ dt =2 tr ((X − A) {(A − X) X, X}) (1.39) 2 = − 4 tr (A − X) X 2 = − 4 (A − X) X2
138
Chapter 5. Approximation and Control
(since tr (A {B, C}) = 2 tr (BCA) for A = A ). Thus fA (X (t)) decreases monotonically and the equilibria points of (1.34) are characterized by (X − A) X = 0. Also, A − X (t) ≤ A − X (0) and X (t) stays in the compact set & ' X ∈ S (n, N ) A − X ≤ A − X (0) . By the closed orbit lemma, see Appendix C, the closure S (n, N ) of each orbit S (n, N ) is a union of orbits S (i, N ) for 0 ≤ i ≤ n. Since the vector field is tangent to all S (i, N ), 0 ≤ i ≤ n, the boundary of S (n, N ) is invariant under the flow. Thus X (t) exists for all t ≥ 0 and the result follows. Remark 1.20 An important consequence of Theorem 1.18 is that the differential equation on the vector space of symmetric N × N matrices X˙ = X 2 (A − X) + (A − X) X 2
(1.40)
is rank preserving, ı.e. rank X (t) = rank X (0) for all t ≥ 0, and therefore also signature preserving. Also, X (t) always converges in the spaces of symmetric matrices to some symmetric matrix X (∞) as t → ∞ and hence rk X (∞) ≤ rk X (0). Here X (∞) is a critical point of fA : S (n, N ) → R, n ≤ rk X (0). Remark 1.21 The gradient flow (1.34) of fA : S (n, N ) → R can also be obtained as follows. Consider the task of finding the gradient for the function gA : GL (N ) → R defined by gA (γ) = A − γ Qγ (1.41)
−1 −1 where Q ∈ S (n, N ). Let ξ, η = 4 tr γ ξ γ η for ξ, η ∈ Tγ GL (N ). It is now easy to show that 2
γ˙ = γγ Qγ (γ Qγ − A)
(1.42)
is the gradient flow of gA : GL (N ) → R with respect to the Riemannian metric , on GL (N ). In fact, the gradient grad (gA ) with respect to , is characterized by grad gA (γ) =γ · Ω DgA (γ) (ξ) = grad gA (γ) , ξ
=4 tr Ω γ −1 ξ
(1.43)
5.1. Approximations by Lower Rank Matrices
139
for all ξ ∈ Tγ GL (N ). The derivative of gA on Tγ GL (N ) is (X = γ Qγ) DgA (γ) (ξ) = −4 tr ((A − X) γ Qξ) and hence Ω γ −1 = (X − A) γ Q.
This proves (1.42). It is easily verified that X (t) = γ (t) Qγ (t) is a solution of (1.34), for any solution γ (t) of (1.42). We now turn to the task of determining the gradient flow of FA : M (n, M × N ) → R. A differential equation X˙ = F (X) evolving on the matrix space RM×N is said to be rank preserving if the rank rk X (t) of every solution X (t) is constant as a function of t. The following characterization is similar to that of isospectral flows. Lemma 1.22 Let I ⊂ R be an interval and let A (t) ∈ RM×M , B (t) ∈ RN ×N , t ∈ I, be a continuous time-varying family of matrices. Then X (0) ∈ RM×N
X˙ (t) = A (t) X (t) + X (t) B (t) ,
(1.44)
is rank preserving. Conversely, every rank preserving differential equation on RM×N is of the form (1.44) for matrices A (t), B (t). Proof 1.23 For any fixed X ∈ RM×N with rank X = n, and n ≤ min (M, N ) arbitrary, A (t) X + XB (t) ∈ TX M (n, M × N ). Thus (1.44) defines a time varying vector field on each subset M (n, M × N ) ⊂ RM×N . Thus for any initial condition X0 ∈ M (n, M × N ) the solution X (t) of (1.44) satisfies X (t) ∈ M (n, M × N ) for all t ∈ I. Therefore (1.44) is rank preserving. Conversely, suppose X˙ = F (X) is rank preserving. Then it defines a vector field on M (n, M × N ) for any 1 ≤ n ≤ min (M, N ). By Proposition 1.14 therefore F (X) = ∆1 (X) · X + X · ∆2 (X), X ∈ M (n, M × N ), for M × M and N × N matrices ∆1 and ∆2 . Setting A (t) = ∆1 (X (t)), B (t) = ∆2 (X (t)) completes the proof. To obtain the gradient flow of the distance function FA : M (n, M × N ) → R in the general approximation problem we proceed as above. Let i : M (n, M × N ) → S (2n, M + N ) denote the imbedding defined by ˆ= i (X) = X
0
X
X
0
.
140
Chapter 5. Approximation and Control
The gradient flow of fAˆ : S (2n, M + N ) → R is
Z˙ = − Z 2 Z − Aˆ + Z − Aˆ Z 2 , Z ∈ S (2n, M + N ) ˆ= For Z = X
0 X X 0
the right hand side is simplified as
0 − (XX (X − A) + (X − A) X X)
XX (X − A) + (X − A) X X 0
Thus the gradient flow (1.34) on S (2n, M + N ) of fAˆ leaves the submanifold i (M (n, M × N )) ⊂ S (2n, M + N ) invariant. The normal Riemannian metric of S (2n, M + N ) induces by restriction a Riemannian metric on i (M (n, M × N )) and hence on M (n, M × N ). We refer to this as the normal Riemannian metric of M (n, M × N ). The above computation together with Theorem 1.18 then shows the following theorem. Theorem 1.24 Let A ∈ RM×N . 2
(a) The gradient flow of FA : M (n, M × N ) → R, FA (X) = A − X , with respect to the normal Riemannian metric on M (n, M × N ) is X˙ = − grad FA (X) = XX (A − X) + (A − X) X X
(1.45)
(b) For any X (0) ∈ M (n, M × N ) the solution X (t) of (1.45) exists for all t ≥ 0 and rank X (t) = n for all t ≥ 0. (c) Every solution X (t) of (1.45) converges to an equilibrium point satis − A ) = 0, X∞ (X∞ − A) = 0 and X∞ has rank ≤ n. fying X∞ (X∞
A Riccati Flow The Riccati differential equation X˙ = (A − X) X + X (A − X) appears to be the simplest possible candidate for a rank preserving flow on S (N ) which has the same set of equilibria as the gradient flow (1.34). Moreover, the restriction of the right hand side of (1.34) on the subclass of projection operators, characterized by X 2 = X, coincides with the above Riccati equation. This motivates us to consider the above Riccati equation in more detail. As we will see, the situation is particularly transparent for positive definite matrices X.
5.1. Approximations by Lower Rank Matrices
141
Theorem 1.25 Let A ∈ RN ×N be symmetric. (a) The Riccati equation X˙ = (A − X) X + X (A − X) ,
(1.46)
for X (0) ∈ S (n, N ), defines a rank preserving flow on S (n, N ). (b) Assume A is invertible. Then the solutions X (t) of (1.46) are given by
−1 tA X (t) = etA X0 IN + A−1 e2At − IN X0 e (1.47) (c) For any positive semidefinite initial condition X (0) = X (0) ≥ 0, the solution X (t) of (1.46) exists for all t ≥ 0 and is positive semidefinite. (d) Every positive semidefinite solution X (t) ∈ S + (n, N ) = S (n, 0; N ) of (1.46) converges to a connected component of the set of equilibrium points, characterized by (A − X∞ ) X∞ = 0. Also X∞ is positive semidefinite and has rank ≤ n. If A has distinct eigenvalues then every positive semidefinite solution X (t) converges to an equilibrium point. Proof 1.26 The proof of (a) runs similarly to that of Lemma 1.22; see Problem. To prove (b) it suffices to show that X (t) defined by (1.47) satisfies the Riccati equation. By differentiation of (1.47) we obtain X˙ (t) = AX (t) + X (t) A − 2X (t)2 , which shows the claim. For (c) note that (a) implies that X (t) ∈ S + (n, N ) for all t ∈ [0, tmax [. Thus it suffices to show that X (t) exists for all t ≥ 0; ı.e. tmax = ∞. This follows from a simple Lyapunov argument. First, we note that the set S + (n, N ) of positive semidefinite matrices X of rank ≤ n is a closed subset of S (N ). Consider the distance function fA : S (N ) → R+ defined by 2 fA (X) = A − X . Thus fA is a proper function of S (N ) and hence also on S + (n, N ). For every positive semidefinite solution X (t), t ∈ [0, tmax [, 1/2 let X (t) denote the unique positive semidefinite symmetric square root. A simple computation shows " # d fA (X (t)) = − 4 tr (A − X (t)) X˙ (t) dt 2 = − 4 (A − X (t)) X (t)1/2 ≤ 0.
142
Chapter 5. Approximation and Control
A
FIGURE 1.2. Riccati flow on positive semidefinite matrices
Thus fA is a Lyapunov function for (1.46), restricted to the class of positive semidefinite matrices, and equilibrium points X∞ ∈ S + (n, N ) are characterized by (A − X∞ ) X∞ = 0. In particular, fA (X (t)) is a monotonically decreasing function of t and the solution X (t) stays in the compact subset ' & X ∈ S + (n, N ) | fA (X) ≤ fA (X (0)) . Thus X (t) is defined for all t ≥ 0. By the closed orbit lemma, Appendix C, the boundary of S + (n, N ) is a union of orbits of the congruence action on GL (N, R). Since the Riccati vectorfield is tangent to these orbits, the boundary is invariant under the flow. Thus X (t) ∈ S + (n, N ) for all t ≥ 0. By La Salle’s principle of invariance, the ω-limit set of X (t) is a connected component of the set of positive semidefinite equilibrium points. If A has distinct eigenvalues, then the set of positive semidefinite equilibrium points is finite. Thus the result follows. Remark 1.27 The above proof shows that the least squares distance function fA (X) = A − X2 is a Lyapunov function for the Riccati equation, evolving on the subset of positive semidefinite matrices X. In particular, the Riccati equation exhibits gradient-like behaviour if restricted to positive definite initial conditions. If X0 is an indefinite matrix then also X (t) is indefinite, and fA (X) is no longer a Lyapunov function for the Riccati equation. Figure 1.2 illustrates the phase portrait of the Riccati flow on S (2) for A = A positive semidefinite. Only a part of the complete phase portrait is shown here, concentrating on the cone of positive semidefinite matrices in S (2). There are two equilibrium points on S (1, 2). For the flow on S (2), both equilibria are saddle points, one having 1 positive and 2 negative eigenvalues while the other one has 2 positive and 1 negative eigenvalues. The induced flow on S (1, 2) has one equilibrium as a local attractor while the other one is a saddle point. Figure 1.3 illustrates the phase portrait in the case where A = A is indefinite.
5.2. The Polar Decomposition
143
FIGURE 1.3. Riccati flow on S (1, 2).
Problem 1.28 Let I ⊂ R be an interval and let A (t) ∈ RN ×N , t ∈ I, be a continuous family of matrices. Show that X˙ (t) = A (t) X (t) + X (t) A (t) , X (0) ∈ S (n)
is a rank (and hence signature) preserving flow on S (N ). Prove that, conversely, any rank preserving vector field on S (N ) is of this form. Problem 1.29 Let A, B ∈ RN ×N . What is the gradient flow of the total least squares function FA,B : M (n, N × N ) → R, FA,B (X) = A − BX2 , with respect to the normal Riemannian metric on M (n, N × N )? Characterize the equilibria!
Main Points of Section The approximation problem of a matrix by a lower rank one in the Frobenius norm is a further instance of a matrix least squares estimation problem. The general matrix case can be reduced to the approximation problem for symmetric matrices. Explicit formulas are given for the critical points and local minima in terms of the eigenspace decomposition. A certain Riemannian metric leads to a particularly simple expression of the gradient vector field. A remarkable property of the gradient vector field is that the solutions are rank preserving.
5.2 The Polar Decomposition Every real n × n matrix A admits a decomposition A=Θ·P
(2.1)
where P is positive semidefinite symmetric and Θ is an orthogonal matrix 1/2 is uniquely determined, satisfying ΘΘ = Θ Θ = In . While P = (A A)
144
Chapter 5. Approximation and Control
Θ is uniquely determined only if A is invertible. The decomposition (2.1) is called the polar decomposition of A. While effective algebraic algorithms achieving the polar decomposition are well known, we are interested in finding dynamical systems achieving the same purpose. Let O (n) and P (n) denote the set of real orthogonal and positive definite symmetric n×n matrices respectively. In the sequel we assume for simplicity that A is invertible. Given A ∈ Rn×n , det (A) = 0, we consider the smooth function 2
FA : O (n) × P (n) → R, FA (Θ, P ) = A − ΘP
(2.2)
where A = tr (AA ) is the Frobenius norm. Since F (Θ0 , P0 ) = 0 if and only if A = Θ0 P0 we see that the global minimum of F corresponds to the polar decomposition. This motivates the use of a gradient flow for FA : O (n) × P (n) → R to achieve the polar decomposition. Of course, other approximation problems suggest themselves as well. Thus, if Θ is restricted to be the identity matrix, we have the problem studied in the previous section of finding the best positive definite approximant of a given matrix A. Similarly if P is restricted to be the identity matrix, then the question amounts to finding the best orthogonal matrix approximant of a 2 2 given invertible matrix A. In this case A − Θ = A − 2 tr (A Θ) + n and we have a least square matching problem, studied by Shayman (1982) and Brockett (1989a). 2
Theorem 2.1 Let A ∈ Rn×n , det (A) = 0, and let O (n) and P (n) be endowed with the constant Riemannian metric arising from the Euclidean inner product of Rn×n . The (minus) gradient flow of FA : O (n)×P (n) → R is ˙ = ΘP A Θ − AP Θ P˙ = −2P + A Θ + Θ A.
(2.3)
For every initial condition (Θ (0) , P (0)) ∈ O (n) × P (n) the solution (Θ (t) , P (t)) of (2.3) exists for all t ≥ 0 with Θ (t) ∈ O (n) and P (t) symmetric (but not necessarily positive semidefinite). Every solution of (2.3) converges to an equilibrium point of (2.3) as t → +∞. The equilibrium points of (2.3) are (Θ∞ , P∞ ) with Θ∞ = Θ0 Ψ, Ψ ∈ O (n), P∞ =
1 2
(P0 Ψ + Ψ P0 )
Ψ (P∞ P0 ) Ψ = P0 P∞
(2.4)
and A = Θ0 P0 is the polar decomposition. For almost every initial condition Θ (0) ∈ O (n), P (0) ∈ P (n), then (Θ (t) , P (t)) converges to the polar decomposition (Θ0 , P0 ) of A.
5.2. The Polar Decomposition
145
Proof 2.2 The derivative of FA : O (n) × P (n) → R at (Θ, P ) is the linear map on the tangent space DF (Θ, P ) (ΘΩ, S) = − 2 tr (A ΘΩP ) − 2 tr (A ΘS) + 2 tr (P S) = − tr ((P A Θ − Θ AP ) Ω) − tr ((A Θ − P − P + Θ A) S) for arbitrary matrices Ω = −Ω, S = S. Thus the gradient is ∇FA = (∇Θ FA , ∇P FA ) with ∇Θ FA =Θ (Θ AP − P A Θ) = AP − ΘP A Θ ∇P FA =2P − A Θ − Θ A. This is also the gradient flow of the extended function FˆA : O (n)× S (n) → 2 R, FˆA (Θ, S) = A − ΘS , where S (n) is the set of all real symmetric matrices. Since FˆA : O (n) × S (n) → R is proper, the existence of solutions (2.3) for t ≥ 0 follows. Moreover, every solution converges to a critical point of FˆA characterized by (1.30). A difficulty with the above ODE approach to the polar decomposition is that equation (2.3) for the polar part P (t) does not in general evolve in the space of positive definite matrices. In fact, positive definiteness may be lost during the evolution of (2.3). For example, let n = 1, a = 1, and Θ0 = −1. Then (Θ∞ , P∞ ) = (−1, −1). If, however, A Θ∞ + Θ∞ A ≥ 0 then P (t) > 0 for all t ≥ 0.
Research Problems More work is to be done in order to achieve a reasonable ODE method for polar decomposition. If P (n) is endowed with the normal Riemannian metric instead of the constant Euclidean one used in the above theorem, then the gradient flow on O (n) × P (n) becomes ˙ =Θ (P A ) Θ − AP Θ
P˙ =4 1 (A Θ + Θ A) − P P 2 + P 2 1 (A Θ + Θ A) − P . 2
2
Show that this flow evolves on O (n)×P (n), ı.e. Θ (t) ∈ O (n), P (t) ∈ P (n) exists for all t ≥ 0. Furthermore, for all initial conditions Θ (0) ∈ O (n), P (0) ∈ P (n), (Θ (t) , P (t)) converges (exponentially?) to the unique polar decomposition Θ0 P0 = A of A (for t → ∞). Thus this seems to be the right gradient flow.
146
Chapter 5. Approximation and Control
A different, somewhat simpler, gradient-like flow on O (n) × P (n) which also achieves the polar decomposition is ˙ =Θ (P A ) Θ − AP Θ
P˙ = 12 (A Θ + Θ A) − P P + P 12 (A Θ + Θ A) − P . Note that the equation for P now is a Riccati equation. Analyse these flows.
Main Points of Section The polar decomposition of a matrix is investigated from a matrix least squares point of view. Gradient flows converging to the orthogonal and positive definite factors in a polar decomposition are introduced. These flows are coupled Riccati equations on the orthogonal matrices and positive definite matrices, respectively.
5.3 Output Feedback Control Feedback is a central notion in modern control engineering and systems theory. It describes the process of “feeding back” the output or state variables in a dynamical systems configuration through the input channels. Here we concentrate on output feedback control of linear dynamical systems. Consider finite-dimensional linear dynamical systems of the form x˙ (t) =Ax (t) + Bu (t) y (t) =Cx (t) .
(3.1)
Here u (t) ∈ Rm and y (t) ∈ Rp are the input and output respectively of the system while x (t) is the state vector and A ∈ Rn×n , B ∈ Rn×m and C ∈ Rp×n are real matrices. We will use the matrix triple (A, B, C) to denote the dynamical system (3.1). Using output feedback the input u (t) of the system is replaced by a new input u (t) = Ky (t) + v (t)
(3.2)
defined by a feedback gain matrix K ∈ Rm×p . Combining equations (3.1) and (3.2) yields the closed loop system x˙ (t) = (A + BKC) x (t) + Bv (t) y (t) =Cx (t)
(3.3)
5.3. Output Feedback Control
147
An important task in linear systems theory is that of pole placement or eigenvalue assignment via output feedback. Thus for a given system (A, B, C) and a self-conjugate set {s1 , . . . , sn } ⊂ C one is asked to find a feedback matrix K ∈ Rm×p such that A + BKC has eigenvalues s1 , . . . , sn . A simple dimension count shows that the condition mp ≥ n is necessary for the solvability of the problem; see also the notes for Chapter 5. Thus, in general, for mp < n, there is no choice of a feedback gain matrix K such that A+ BKC has prescribed eigenvalues. It is desirable in this case to find a feedback matrix that generates the best closed loop approximation to a specified eigenstructure. Thus, for a fixed matrix F ∈ Rn×n with eigenvalues s1 , . . . , sn the task is to find matrices T ∈ GL (n, R) and K ∈ Rm×p 2 that minimize the distance F − T (A + BKC) T −1 . One would hope to find an explicit formulae for the optimal feedback gain that achieves the best approximation, however, the question appears to be too difficult to tackle directly. Thus, algorithmic solutions become important. A natural generalization of eigenvalue or eigenstructure assignment of linear systems is that of optimal system assignment. Here the distance of a target system (F, G, H) to an output feedback orbit (see below) is minimized. In this section a gradient flow approach is developed to solve such output feedback optimization problems. More complicated problems, such as simultaneous eigenvalue assignment of several systems or eigenvalue assignment problems for systems with symmetries, can be treated in a similar manner. We regard this as an important aspect of the approach. Although no new output feedback pole placement theorems are proved here we believe that the new methodology introduced is capable of offering new insights into these difficult questions.
Gradient Flows on Output Feedback Orbits We begin with a brief description of the geometry of output feedback orbits. Two linear systems (A1 , B1 , C1 ) and (A2 , B2 , C2 ) are called output feedback equivalent if
(A2 , B2 , C2 ) = T (A1 + B1 KC1 ) T −1 , T B1 , C1 T −1
(3.4)
holds for T ∈ GL (n, R) and K ∈ Rm×p . Thus the system (A2 , B2 , C2 ) is obtained from (A1 , B1 , C1 ) using a linear change of basis T ∈ GL (n, R) in the state space Rn and a feedback transformation K ∈ Rm×p . Observe that the set GL (n, R) × Rm×p of feedback transformation is a Lie group under the operation (T1 , K1 ) ◦ (T2 , K2 ) = (T1 T2 , K1 + K2 ). This group is called the output feedback group.
Chapter 5. Approximation and Control
148
The Lie group GL (n, R) × Rm×p acts on the vector space of all triples
L (n, m, p) = (A, B, C) | (A, B, C) ∈ Rn×n × Rn×m × Rp×n (3.5) via the Lie group action
α : GL (n, R) × Rm×p × L (n, m, p) → L (n, m, p)
((T, K) , (A, B, C)) → T (A + BKC) T −1 , T B, CT −1
(3.6)
Thus, the orbits
T (A + BKC) T
−1
F (A, B, C) = , T B, CT −1 | (T, K) ∈ GL (n, R) × Rm×p (3.7)
are the set of systems which are output feedback equivalent to (A, B, C). It is readily verified that the transfer function G (s) = C (sI − A)−1 B associated with (A, B, C), Section B.1, changes under output feedback via the linear fractional transformation −1
G (s) → GK (s) = (Ip − G (s) K)
G (s) .
Let G (s) be an arbitrary strictly proper p × m rational transfer function of McMillan degree n, see Appendix B for results on linear system theory. It is a consequence of Kalman’s realization theorem that
FG = (A, B, C) ∈ L (n, m, p) | C (sIn − A) =F (A, B, C)
−1
−1
B = (Ip − G (s) K)
G (s) for some K ∈ Rm×p
coincides with the output feedback orbit F (A, B, C), for (A, B, C) controllable and observable. Lemma 3.1 Let (A, B, C) ∈ L (n, m, p). Then (a) The output feedback orbit F (A, B, C) is a smooth submanifold of all triples L (n, m, p). (b) The tangent space of F (A, B, C) at (A, B, C) is
T(A,B,C) F = ([X, A] + BLC, XB, −CX) | X ∈ Rn×n , L ∈ Rm×p (3.8)
5.3. Output Feedback Control
149
Proof 3.2 F (A, B, C) is an orbit of the real algebraic Lie group action (3.6) and thus, by Appendix C, is a smooth submanifold of L (n, m, p). To prove (b) we consider the smooth map Γ : GL (n, R) × Rm×p → L (n, m, p)
(T, K) → T (A + BKC) T −1 , T B, CT −1 . The derivative of Γ at the identity element (In , 0) of GL (n, R) × Rm×p is the surjective linear map
DΓ|(In ,0) : T(In ,0) GL (n, R) × Rm×p → T(A,B,C) F (A, B, C) (X, L) → ([X, A] + BLC, XB, −CX) for X ∈ TIn GL (n, R), L ∈ T0 Rm×p . The result follows. Let (A, B, C) , (F, G, H) ∈ L (n, m, p) and consider the potential Φ : F (A, B, C) → R
Φ T (A + BKC) T −1 , T B, CT −1 2 2 := T (A + BKC) T −1 − F + T B − G2 + CT −1 − H
(3.9)
In order to determine the gradient flow of this distance function we must specify a Riemannian metric on F (A, B, C). The construction of the Riemannian metric that we consider parallels a similar development in Section 5.1. By Lemma (3.1), the tangent space T(A,B,C) F (A, B, C) at an element (A, B, C) is the image of the linear map π : Rn×n × Rm×p → L (n, m, p) defined by π (X, L) = ([X, A] + BLC, XB, −CX) , which has kernel
ker π = (X, L) ∈ Rn×n × Rm×p | ([X, A] + BLC, XB, −CX) = (0, 0, 0) . Taking the orthogonal complement (ker π) Euclidean inner product on Rn×n × Rm×p
⊥
with respect to the standard
(Z1 , M1 ) , (Z2 , M2 ) := tr (Z1 Z2 ) + tr (M1 M2 ) yields the isomorphism of vector spaces ⊥
(ker π) ≈ T(A,B,C) F (A, B, C) .
150
Chapter 5. Approximation and Control
We have the orthogonal decomposition of Rn×n × Rm×p Rn×n × Rm×p = ker π ⊕ (ker π)⊥ and hence every element has a unique decomposition
(X, L) = (X⊥ , L⊥ ) + X ⊥ , L⊥
⊥ with (X⊥ , L⊥ ) ∈ ker π, X ⊥ , L⊥ ∈ (ker π) . Given any pair of tangent vectors ([Xi , A] + BLi C, Xi B, −CXi ), i = 1, 2, of T(A,B,C) F (A, B, C) we define ([X1 , A] + BL1 C, X1 B, −CX1 ) , ([X2 , A] + BL2 C, X2 B, −CX2 )
⊥ + 2 tr L⊥ := 2 tr X1⊥ X2⊥ 1 L2 It is easily shown the ·,· defines a nondegenerate symmetric bilinear form on T(A,B,C) F (A, B, C). In fact, ·,· defines a Riemannian metric on F (A, B, C) which is termed the normal Riemannian metric. Theorem 3.3 (System Assignment) Suppose (A, B, C) , (F, G, H) ∈ L (n, m, p) . ˙ B, ˙ C˙ (a) The gradient flow A, = − grad Φ (A, B, C) of Φ
:
F (A, B, C) → R given by (3.9), with respect to the normal Riemannian metric is A˙ = [A, [A − F, A ] + (B − G) B − C (C − H)] − BB (A − F ) C C B˙ = − ([A − F, A ] + (B − G) B − C (C − H)) B C˙ =C ([A − F, A ] + (B − G) B − C (C − H))
(3.10)
(b) Equilibrium points (A∞ , B∞ , C∞ ) of (3.10) are characterised by [A∞ − F, A∞ ] + (B∞ − G) B∞ − C∞ (C∞ − H) = 0 B∞
(A∞ −
F ) C∞
=0
(3.11) (3.12)
(c) For every initial condition (A (0) , B (0) , C (0)) ∈ F (A, B, C) the solution (A (t) , B (t) , C (t)) of (3.10) exists for all t ≥ 0 and remains in F (A, B, C).
5.3. Output Feedback Control
151
(d) Every solution (A (t) , B (t) , C (t)) of (3.10) converges to a connected component of the set of equilibrium points (A∞ , B∞ , C∞ ) ∈ F (A, B, C). Proof 3.4 The derivative of Φ : F (A, B, C) → R at (A, B, C) is the linear map on the tangent space T(A,B,C) F (A, B, C) defined by DΦ|(A,B,C) ([X, A] + BLC, XB, −CX) =2 tr (([X, A] + BLC) (A − F ) + XB (B − G ) − (C − H ) CX) =2 tr (X ([A, A − F ] + B (B − G ) − (C − H ) C)) + 2 tr (LC (A − F ) B) . The gradient of Φ with respect to the normal metric is the uniquely determined vector field on F (A, B, C) characterised by grad Φ (A, B, C) = ([Z, A] + BM C, ZB, −CZ), DΦ|(A,B,C) ([X, A] + BLC, XB, −CX) = grad Φ (A, B, C) , ([X, A] + BLC, XB, −CX)
=2 tr Z ⊥ X ⊥ + 2 tr M ⊥ L⊥ for all (X, L) ∈ Rn×n × Rm×p and some (Z, M ) ∈ Rn×n × Rm×p . Virtually the same argument as for the proof of Theorem 1.18 then shows that
Z ⊥ = ([A, A − F ] + B (B − G ) − (C − H ) C) M
⊥
= (C (A − F ) B) .
(3.13) (3.14)
Therefore (3.10) gives the (negative of the) gradient flow of Φ, which proves (a). Part (c) is an immediate consequence of (3.13) and (3.14). As Φ : L (n, m, p) → R is proper it restricts to a proper function Φ : F (A, B, C) → R on the topological closure of the output feedback orbit. Thus every solution (A (t) , B (t) , C (t)) of (3.10) stays in the compact set & (A1 , B1 , C1 ) ∈ F (A (0) , B (0) , C (0)) ' | Φ (A1 , B1 , C1 ) ≤ Φ (A (0) , B (0) , C (0)) and thus exists for all t ≥ 0. By the orbit closure lemma, see Section C.8, the boundary of F (A, B, C) is a union of output feedback orbits. Since the gradient flow (3.10) is tangent to the feedback orbits F (A, B, C), the solutions are contained in F (A (0) , B (0) , C (0)) for all t ≥ 0. This completes the proof of (b). Finally, (d) follows from general convergence results for gradient flows, see Section C.12.
152
Chapter 5. Approximation and Control
Remark 3.5 Although the solutions of the gradient flow (3.10) all converge to some equilibrium point (A∞ , B∞ , C∞ ) as t → +∞, it may well be that such an equilibrium point is contained the boundary of the output feedback orbit. This seems reminiscent to the occurrence of high gain output feedback in pole placement problems; see however, Problem 3.17. Remark 3.6 Certain symmetries of the realization (F, G, H) result in associated invariance properties of the gradient flow (3.10). For example, if (F, G, H) = (F , H , G ) is a symmetric realization, then by inspection the flow (3.10) on L (n, m, p) induces a flow on the invariant submanifold of all symmetric realizations (A, B, C) = (A , C , B ). In this case the induced flow on {(A, B) ∈ Rn×n × Rn×m | A = A } is A˙ = [A, [A, F ] + BG − GB ] − BB (A − F ) BB B˙ = − ([A, F ] + BG − GB ) B. In particular, for G = 0, we obtain the extension of the double bracket flow A˙ = [A, [A, F ]] − BB (A − F ) BB B˙ = − ([A, F ]) B. Observe that the presence of the feedback term BB (A − F ) BB in the equation for A (t) here destroys the isospectral nature of the double bracket equation.
Flows on the Output Feedback Group Of course there are also associated gradient flows achieving the optimal feedback gain K∞ and state space coordinate transformation T∞ . Let (A, B, C) ∈ L (n, m, p) be a given realization and let (F, G, H) ∈ L (n, m, p) be a “target system”. To find the optimal output feedback transformation of (A, B, C) which results in a best approximation of (F, G, H), we consider the smooth function (3.15) φ : GL (n, R) × Rm×p → R 2 2 2 −1 −1 φ (T, K) = T (A + BKC) T − F + T B − G + CT − H on the feedback group GL (n, R)×Rm×p . Any tangent vector of GL (n, R)× Rm×p at an element (T, K) is of the form (XT, L) for X ∈ Rn×n and L ∈ Rm×p . In the sequel we endow GL (n, R) × Rm×p with the normal Riemannian metric defined by (X1 T, L1 ) , (X2 T, L2 ) := 2 tr (X1 X2 ) + 2 tr (L1 L2 )
(3.16)
5.3. Output Feedback Control
153
for any pair of tangent vectors (Xi T, Li ) ∈ T(T,K) (GL (n, R) × Rm×p ), i = 1, 2. Theorem 3.7 Let (A, B, C) ∈ L (n, m, p) and consider the smooth function φ : GL (n, R) × Rm×p → R defined by (3.15) for a target system (F, G, H) ∈ L (n, m, p). (a) The gradient flow T˙ , K˙ = − grad φ (T, K) of φ with respect to the normal Riemannian metric (3.16) on GL (n, R) × Rm×p is " #
T˙ = − T (A + BKC) T −1− F, T (A + BKC) T −1
−1 − (T B − G) B T T + (T ) C CT −1 − H T (3.17)
−1 K˙ = −B T T (A + BKC) T −1 − F (T ) C (b) The equilibrium points (T∞ , K∞ ) ∈ Rn×n × Rn×m are characterised by "
# −1 −1 − F, T∞ (A + BK∞ C) T∞ T∞ (A + BK∞ C) T∞
−1 −1 = (T∞ ) C CT∞ − H − (T∞ B − G) B T∞ ,
−1 −1 CT∞ − F T∞ B = 0. T∞ (A + BK∞ C) T∞ (c) Let (T (t) , K (t)) be a solution of (3.17). Then (A (t) , B (t) , C (t)) := −1 −1 T (t) (A + BK (t) C) T (t) , T (t) B, CT (t) is a solution (3.10). Proof 3.8 The Fr´echet-derivative of φ : GL (n, R) × Rm×p → R is the linear map on the tangent space defined by DΦ|(T,K) (XT, L) =2 tr T (A + BKC) T −1 − F
× X, T (A + BKC) T −1 + T BLCT −1
+ 2 tr (T B − G) XT B − CT −1 − H CT −1 X "
# =2 tr X T (A + BKC) T −1 , T (A + BKC) T −1 − F
+ T B (T B − G) − CT −1 − H CT −1
+ 2 tr L CT −1 T (A + BKC) T −1 − F T B
154
Chapter 5. Approximation and Control
Therefore the gradient vector grad φ (T, K) is " #
T (A + BKC) T −1 − F, T (A + BKC) T −1 T
−1 −1 + (T B − G) B T T − (T ) C − H T CT grad φ (T, K) =
−1 −1 B T T (A + BKC) T − F (T ) C This proves (a). Part (b) follows immediately from (3.17). For (c) note that (3.17) is equivalent to T˙ = − A (t) − F, A (t) T − (B (t) − G) B (t) T + C (t) (C − H) T K˙ = B (t) (A (t) − F ) C (t) , where A = T (A + BKC) T −1 , B = T B and C = CT −1 . Thus, " # ˙ A˙ = T˙ T −1 , A + B KC = − [[A − F, A ] + (B − G) B − C (C − H) , A] − BB (A − F ) C C B˙ = T˙ T −1 B = − ([A − F, A ] + (B − G) B − C (C − H)) B C˙ = C T˙ T −1 = C ([A − F, A ] + (B − G) B − C (C − H)) , which completes the proof of (c). Remark 3.9 Note that the function φ : GL (n, R) × Rm×p → R is not necessarily proper. Therefore the existence of the solutions (T (t) , K (t)) of (3.17) for all t ≥ 0 is not guaranteed a-priori. In particular, finite escape time behaviour is not precluded. Remark 3.10 A complete phase portrait analysis of (3.17) would be desirable but is not available. Such an understanding requires deeper knowledge of the geometry of the output feedback problem. Important question here are: (a) Under which conditions are there finitely many equilibrium points of (3.17)? (b) Existence of global minima for φ : GL (n, R) × Rm×p → R? (c) Are there any local minima of the cost function φ : GL (n, R) × Rm×p → R besides global minima?
5.3. Output Feedback Control
155
Optimal Eigenvalue Assignment For optimal eigenvalue assignment problems one is interested in minimizing the cost function ϕ : GL (n, R) × Rm×p → R 2 ϕ (T, K) = T (A + BKC) T −1 − F
(3.18)
rather than (3.15). In this case the gradient flow T˙ , K˙ = − grad ϕ (T, K) on GL (n, R) × Rm×p is readily computed to be " # T (A + BKC) T −1 , T (A + BKC) T −1 − F T
−1 K˙ =B T F − T (A + BKC) T −1 (T ) C . T˙ =
(3.19)
The following result is an immediate consequence of the proof of Theorem 3.3. Corollary 3.11 (Eigenvalue Assignment) Let (A, B, C) ∈ L (n, m, p) and F ∈ Rn×n . ˙ B, ˙ C˙ (a) The gradient flow A, = − grad Ψ (A, B, C) of Ψ : 2
F (A, B, C) → R, Ψ (A, B, C) = A − F , with respect to the normal Riemannian metric is A˙ = [A, [A − F, A ]] − BB (A − F ) C C B˙ = − [A − F, A ] B
(3.20)
C˙ =C [A − F, A ] (b) Equilibrium points (A∞ , B∞ , C∞ ) of (3.20) are characterized by [A∞ − F, A∞ ] =0 (A∞ − F ) C∞ =0 B∞
(3.21) (3.22)
(c) For every initial condition (A (0) , B (0) , C (0)) ∈ F (A, B, C) the solution (A (t) , B (t) , C (t)) of (3.20) exists for all t ≥ 0 and remains in F (A, B, C). In the special case where (F, G, H) = (F , H , G ) and also (A, B, C) = (A , C , B ) are symmetric realizations the equation (3.19) restricts to a
156
Chapter 5. Approximation and Control
gradient flow on O (n) × S (m), where S (m) is the set of m × m symmetric matrices. In this case the flow (3.19) simplifies to ˙ = [F, Θ (A + BKB ) Θ ] Θ, Θ K˙ =B (Θ F Θ − (A + BKB )) B,
Θ0 ∈O (n) K0 ∈S (m)
The convergence properties of this flow is more easily analysed due to the compact nature of O (n), see Mahony and Helmke (1995). In this case we also have the result: Every solution (A (t) , B (t) , C (t)) of (3.20) converges to a connected component of the set of equilibrium points (A∞ , B∞ , C∞ ) ∈ F (A, B, C). To achieve such a result more generally, one can impose upper bounds on T and T −1 in the optimization. See Jiang and Moore (1996). Problem 3.12 A flow on L (n, m, p) A˙ =f (A, B, C) B˙ =g (A, B, C) C˙ =h (A, B, C) is called feedback preserving if for all t the solutions (A (t) , B (t) , C (t)) are feedback equivalent to (A (0) , B (0) , C (0)), i.e. F (A (t) , B (t) , C (t)) = F (A (0) , B (0) , C (0)) for all t and all (A (0) , B (0) , C (0)) ∈ L (n, m, p). Show that every feedback preserving flow on L (n, m, p) has the form A˙ (t) = [A (t) , L (t)] + BM (t) C B˙ (t) =L (t) B (t) C˙ (t) = − CL (t) for suitable matrix functions t → L (t), t → M (t). Conversely, show that every such flow on L (n, m, p) is feedback preserving. −1
−1
Problem 3.13 Let N (s) D (s) = C (sI − A) B be a coprime factor−1 p×m , ization of the transfer function C (sI − A) B, where N (s) ∈ R [s] m×m D (s) ∈ R [s] . Show that the coefficients of the polynomial entries appearing in N (s) are invariants for the flow (3.10). Use this to find n independent algebraic invariants for (3.10) when m = p = 1!
5.3. Output Feedback Control
157
Problem 3.14 For any integer N ∈ N let (A1 , B1 , C1) , . . . , (AN , BN , CN) ∈ L (n, m, p) be given. Prove that F ((A1 , B1 , C1 ) , . . . , (AN , BN , CN ))
:= T (A1 + B1 KC1 ) T −1 , T B1 , C1 T −1 , . . .
, T (AN + BN KCN ) T −1 , T BN , CN T −1 | T ∈ GL (n, R) , K ∈ Rm×p is a smooth submanifold of the N -fold product space L (n, m, p) × · · · × L (n, m, p). Problem 3.15 Prove that the tangent space of F ((A1 , B1 , C1 ) , . . . , (AN , BN , CN )) at (A1 , B1 , C1 ) , . . . , (AN , BN , CN ) is equal to {([X, A1 ] + B1 LC1 , XB1 , −CX1 ) ,
. . . , ([X, AN ] + BN LCN , XBN , −CXN ) | X ∈ Rn×n , L ∈ Rm×p .
Problem 3.16 Show that the gradient vector field of Ψ ((A1 , B1 , C1 ) , . . . , (AN , BN , CN )) :=
N
2
Ai − Fi
i=1
with respect to the normal Riemannian metric on the orbit F ((A1 , B1 , C1 ) , . . . , (AN , BN , CN )) is
A˙ i = Ai ,
N
N Aj − Fj , Aj − Bj Bj (Aj − Fj ) Cj Cj ,
j=1
B˙ i = −
N
j=1
Aj − Fj , Aj Bi ,
j=1
C˙ i = Ci
N
Aj − Fj , Aj ,
j=1
for i = 1, . . . , N . Characterize the equilibrium points. Problem 3.17 Show that every solution (T (t) , K (t)) of (3.17) satisfies the feedback gain bound 2
K (t) − K (0) ≤ 12 φ (T (0) , K (0)) .
158
Chapter 5. Approximation and Control
Main Points of Section The optimal eigenstructure assignment problem for linear systems (A, B, C) is that of finding a feedback equivalent system such that the resulting closed loop state matrix T (A + BKC) T −1 is as close as possible to a prescribed target matrix F . A related task in control theory is that of eigenvalue assignment or pole placement. A generalisation of the eigenstructure assignment problem is that of optimal system assignment. Here the distance of a target system to an ouput feedback orbit is minimized. Gradient flows achieving the solution to these problems are proposed . For symmetric systems realizations these flows generalise the double bracket flows. The approach easily extends to more complicated situations, i.e. for systems with symmetries and simultaneous eigenvalue assignment tasks.
5.3. Output Feedback Control
159
Notes for Chapter 5 For a thorough investigation of the total least squares problem we refer to Golub and Van Loan (1980). In de Moor and David (1992) it is argued that behind every least squares estimation problem there is an algebraic Riccati equation. This has been confirmed by de Moor and David (1992) for the total least squares problem. A characterization of the optimal solution for the total least squares problem in terms of a solution of an associated algebraic Riccati equation is obtained. Their work is closely related to that of Helmke (1991) and a deeper understanding of the connection between these papers would be desirable. The total least squares problem amounts to fitting a k-dimensional vector space to a data set in n real or complex dimensions. Thus the problem can be reformulated as that of minimizing the total least squares distance function, which is a trace function tr (AH) defined on a Grassmann manifold of Hermitian projection operators H. See Pearson (1901) for early results. Byrnes and Willems (1986) and Bloch (1985b) have used the symplectic structure on a complex Grassmann manifold to characterize the critical points of the total least squares distance function. Bloch (1985b; 1987) have analyzed the Hamiltonian flow associated with the function and described its statistical significance. In Bloch (1990a) the K¨ ahler structure of complex Grassmannians is used to derive explicit forms of the associated Hamiltonian and gradient flows. He shows that the gradient flow of the total least squares distance function coincides with Brockett’s double bracket equation. In Helmke et al. (1993) a phase portrait analysis of the gradient flow (1.45) and the Riccati flow on spaces of symmetric matrices is given. Local stability properties of the linearizations at the equilibria points are determined. A recursive numerical algorithm based on a variable step-size discretization is proposed. In Helmke and Shayman (1995) the critical point structure of matrix least squares distance functions defined on manifolds of fixed rank symmetric, skew-symmetric and rectangular matrices is obtained. Eckart and Young (1936) have solved the problem of approximating a matrix by one of lower rank, if the distance is measured by the Frobenius norm. They show that the best approximant is given by a truncated singular value decomposition. Mirsky (1960) has extended their result to arbitrary unitarily invariant norms, i.e. matrix norms which satisfy U XV = X for all unitary matrices U and V . A generalization of the Eckart-Young-Mirsky theorem has been obtained by Golub, Hoffman and Stewart (1987). In that paper also the connection between total least squares estimation and the minimization task for the least squares distance
160
Chapter 5. Approximation and Control 2
function FA : M (n, M × N ) → R, FA (X) = A − X , is explained. Matrix approximation problems for structured matrices such as Hankel or Toeplitz matrices are of interest in control theory and signal processing. Important papers are those of Adamyan, Arov and Krein (1971), who derive the analogous result to the Eckart-Young-Mirsky theorem for Hankel operators. The theory of Adamyan, Arov and Krein had a great impact on model reduction control theory, see Kung and Lin (1981). Halmos (1972) has considered the task of finding a best symmetric positive semidefinite approximant P of a linear operator A for the 2-norm. He derives a formula for the distance A − P 2 . A numerical bisection method for finding a nearest positive semidefinite matrix approximant is studied by Higham (1988). This paper also gives a simple formula for the best positive semidefinite approximant to a matrix A in terms of the polar decomposition of the symmetrization (A + A ) /2 of A. Problems with matrix definiteness constraints are described in Fletcher (1985), Parlett (1978). For further results on matrix nearness problems we refer to Demmel (1987), Higham (1986) and Van Loan (1985). Baldi and Hornik (1989) has provided a neural network interpretation of the total least squares problem. They prove that the least squares distance 2 function (1.22) FA : M (n, N × N ) → R, FA (X) = A − X , has generically a unique local and global minimum. Moreover, a characterization of the critical points is obtained. Thus, equivalent results to those appearing in the first section of the chapter are obtained. The polar decomposition can be defined for an arbitrary Lie group. It is then called the Iwasawa decomposition. For algorithms computing the polar decomposition we refer to Higham (1986). Textbooks on feedback control and linear systems theory are those of Kailath (1980), Sontag (1990b) and Wonham (1985). A celebrated result from linear systems theory is the pole-shifting theorem of Wonham (1967). It asserts that a state space system (A, B) ∈ Rn×n × Rn×m is controllable if and only if for every monic real polynomial p (s) of degree n there exists a state feedback gain K ∈ Rm×n such that the characteristic polynomial of A + BK is p (s). Thus controllability is a necessary and sufficient condition for eigenvalue assignment by state feedback. The corresponding, more general, problem of eigenvalue assignment by constant gain output feedback is considerably harder and a complete solution is not known to this date. The question has been considered by many authors with important contributions by Brockett and Byrnes (1981), Byrnes (1989), Ghosh (1988), Hermann and Martin (1977), Kimura (1975), Rosenthal (1992), Wang (1991) and Willems and Hesselink (1978). Important tools for the solution of the problem are those from algebraic geometry and topology. The paper of Byrnes (1989) is an excellent survey of the re-
5.3. Output Feedback Control
161
cent developments on the output feedback problem. Hermann and Martin (1977) has shown that the condition mp ≥ n is necessary for generic eigenvalue assignment by constant gain output feedback. A counter-example (m = p = 2, n = 4) of Willems and Hesselink (1978) shows that the condition mp ≥ n is in general not sufficient for generic eigenvalue assignment. Brockett and Byrnes (1981) has shown that generic eigenvalue assignment is possible if mp = n and an additional hypothesis on the degree of the pole-placement map is satisfied. Wang (1991) has shown that mp > n is a sufficient condition for generic eigenvalue assignment via output feedback. For results on simultaneous eigenvalue assignment of a finite number of linear systems by output feedback we refer to Ghosh (1988). The development of efficient numerical methods for eigenvalue assignment by output feedback is a challenge. Methods from matrix calculus have been applied by Godbout and Jordan (1980) to determine gradient matrices for output feedback. In Mahony and Helmke (1995) gradient flows for optimal eigenvalue assignment for symmetric state space systems, of interest in circuit theory, are studied.
CHAPTER
6
Balanced Matrix Factorizations 6.1 Introduction The singular value decomposition of a finite-dimensional linear operator is a special case of the following more general matrix factorization problem: Given a matrix H ∈ Rk× find matrices X ∈ Rk×n and Y ∈ Rn× such that H = XY.
(1.1)
A factorization (1.1) is called balanced if X X = Y Y
(1.2)
X X = Y Y = D
(1.3)
holds and diagonal balanced if
holds for a diagonal matrix D. If H = XY then H = XT −1T Y for all invertible n × n matrices T , so that factorizations (1.1) are never unique. A coordinate basis transformation T ∈ GL (n) is called balancing, and
diagonal balancing, if XT −1 , T Y is a balanced, and diagonal balanced factorization, respectively. Diagonal balanced factorizations (1.3) with n = rank (H) are equivalent to the singular value decomposition of H. In fact if (1.3) holds for H = XY then with U = D−1/2 Y,
V = XD−1/2 ,
(1.4)
164
Chapter 6. Balanced Matrix Factorizations
we obtain the singular value decomposition H = V DU , U U = In , V V = In . A parallel situation arises in system theory, where factorizations H = O · R of a Hankel matrix H by the observability and controllability matrices O and R are crucial for realization theory. Realizations (A, B, C) are easily constructed from O and R. Balanced and diagonal balanced realizations then correspond to balanced and diagonal balanced factorizations of the Hankel matrix. While diagonal balanced realizations are an important tool for system approximation and model reduction theory, they are not always best for certain minimum sensitivity applications arising, for example in digital filtering. There is therefore the need to study both types of factorizations as developed in this chapter. System theory issues are studied in Chapters 7–9, based on the analysis of balanced matrix factorizations as developed in this chapter. In order to come to grips with the factorization tasks (1.1)–(1.3) we have to study the underlying geometry of the situation. Let
n = rk (H) F (H) = (X, Y ) ∈ Rk×n × Rn× | H = XY , (1.5) denote the set of all minimal, that is full rank, factorizations (X, Y ) of H. Applying a result from linear algebra, Section A.8, we have
F (H) = XT −1, T Y ∈ Rk×n × Rn× | T ∈ GL (n) (1.6) for any initial factorization H = XY with rk (X) = rk (Y ) = n. We are thus led to study the sets
O (X, Y ) := XT −1, T Y ∈ Rk×n × Rn× | T ∈ GL (n) (1.7) for arbitrary n, k, l. The geometry of O (X, Y ) and their topological closures is of interest for invariant theory and the necessary tools are developed in Sections 6.2 and 6.3. It is shown there that O (X, Y ) and hence F (H) are smooth manifolds. Balanced and diagonal balanced factorizations are characterized as the critical points of the cost functions ΦN : O (X, Y ) → R
(1.8) −1 ΦN XT −1, T Y = tr N (T ) X XT −1 + tr (N T Y Y T ) in terms of a symmetric target matrix N , which is the identity matrix or a diagonal matrix, respectively. If N is the identity matrix, the function ΦI , denoted Φ, has the appealing form 2
2 Φ XT −1, T Y = XT −1 + T Y , (1.9)
6.2. Kempf-Ness Theorem
165
for the Frobenius norm. We are now able to apply gradient flow techniques in order to compute balanced and diagonal balanced factorizations. This is done in Sections 6.3– 6.5.
6.2 Kempf-Ness Theorem The purpose of this section is to recall an important recent result as well as the relevant terminology from invariant theory. The result is due to Kempf and Ness (1979) and plays a central rˆ ole in our approach to the balanced factorization problem. For general references on invariant theory we refer to Kraft (1984) and Dieudonn´e and Carrell (1971). Digression: Group Actions Let GL (n) denote the group of invertible real n×n matrices and let O (n) ⊂ GL (n) denote the subgroup consisting of all real orthogonal n × n matrices Θ characterized by ΘΘ = Θ Θ = In . A group action of GL (n) on a finite dimensional real vector space V is a map α : GL (n) × V → V,
(g, x) → g · x
(2.1)
satisfying for all g, h ∈ GL (n), x ∈ V g · (h · x) = (gh) · x,
e·x=x
where e = In denotes the n × n identity matrix. If the coordinates of g · x in (2.1) are polynomials in the coefficients of g, x and det (g)−1 then, (2.1) is called an algebraic group action, and α : GL (n) × V → V is called a linear algebraic group action if in addition the maps αg : V → V,
αg (x) = g · x
(2.2)
are linear, for all g ∈ GL (n). A typical example of a linear algebraic group action is the similarity action α : GL (n) × Rn×n →Rn×n , (S, A) →SAS −1
(2.3)
where the (i, j)-entry of SAS −1 is a polynomial in the coefficients of S, A and (det S)−1 .
166
Chapter 6. Balanced Matrix Factorizations
Given a group action (2.1) and an element x ∈ V , the stabilizer of x is the subgroup of GL (n) defined by Stab (x) = {g ∈ GL (n) | g · x = x}
(2.4)
The orbit of x is defined as the set O (x) of all y ∈ V which are GL (n)equivalent to x, that is O (x) = {g · x | g ∈ GL (n)}
(2.5)
We say that the orbit O (x) is closed if O (x) is a closed subset of V . In the above example, the orbit O (A) of an n × n matrix A is just the set of all matrices B which are similar to A. If GL (n) / Stab (x) denotes the quotient space of GL (n) by the stabilizer group Stab (x) then O (x) is homeomorphic to GL (n) / Stab (x). In fact, for any algebraic group action of GL (n), both O (x) and GL (n) / Stab (x) are smooth manifolds which are diffeomorphic to each other; see Appendix C.
A positive definite inner product , on the vector space V is called orthogonally invariant with respect to the group action (2.1) if Θ · x, Θ · y = x, y
(2.6)
holds for all x, y ∈ V and Θ ∈ O (n). This induces an O (n)-invariant 2 Hermitian norm on V defined by x = x, x . Choose any such O (n)-invariant norm on V . For any given x ∈ V we consider the distance or norm functions Φ : O (x) →R,
Φ (y) = y
2
(2.7)
and φx : GL (n) →R,
φx (g) = g · x2
(2.8)
Note that φx (g) is just the square of the distance of the transformed vector g · x to the origin 0 ∈ V . We can now state the Kempf-Ness result, where the real version presented here is actually due to Slodowy (1989). We will not prove the theorem here, because we will prove in Chapter 7 a generalization of the Kempf-Ness theorem which is due to Azad and Loeb (1990). Theorem 2.1 Let α : GL (n) × V → V be a linear algebraic action of GL (n) of a finite-dimensional real vector space V and let · be derived from an orthogonally invariant inner product on V . Then
6.2. Kempf-Ness Theorem
167
(a) The orbit O (x) is closed if and only if the norm function Φ : O (x) → R (respectively φx : G → R) has a global minimum (i.e. g0 · x = inf g∈G g · x for some g0 ∈ G). (b) Let O (x) be closed. Every critical point of Φ : O (x) → R is a global minimum and the set of global minima of Φ is a single O (n)-orbit. (c) Let O (x) be closed and let C (Φ) ⊂ O (x) denote the set of all critical points of Φ. Let x0 ∈ C (Φ) be a critical point of Φ : O (x) → R. Then the Hessian D2 Φx of Φ at x0 is positive semidefinite and D2 Φx 0 0 degenerates exactly on the tangent space Tx0 C (Φ). Any function F : O (x) → R which satisfies Condition (c) of the above theorem is called a Morse-Bott function. We say that Φ is a perfect MorseBott function. Example 2.2 (Similarity Action) Let α : GL (n) × Rn×n → Rn×n , 2 (S, A) → SAS −1 , be the similarity action on Rn×n and let A = tr (AA ) be the Frobenius norm. Then A = ΘAΘ for Θ ∈ O (n) is an orthogonal invariant Hermitian norm of Rn×n . It is easy to see and a well known fact from invariant theory that a similarity orbit
O (A) = SAS −1 | S ∈ GL (n) (2.9) is a closed subset of Rn×n if and only if A is diagonalizable over C. A matrix B = SAS −1 ∈ O (A) is a critical point for the norm function Φ : O (A) → R,
Φ (B) = tr (BB )
(2.10)
if and only if BB = B B, i.e. if and only if B is normal. Thus the KempfNess theorem implies that, for A diagonalizable over C, the normal matrices B ∈ O (A) are the global minima of (2.10) and that there are no other critical points.
Main Points of Section The important notions of group actions and orbits are introduced. An example of a group action is the similarity action on n × n matrices and the orbits here are the equivalence classes formed by similar matrices. Closed orbits are important since on those there always exists an element with minimal distance to the origin. The Kempf-Ness theorem characterizes the closed orbits of a GL (n) group action and shows that there are no critical points except for global minima of the norm distance function. In the case of the similarity action, the normal matrices are precisely those orbit elements which have minimal distance to the zero matrix.
168
Chapter 6. Balanced Matrix Factorizations
6.3 Global Analysis of Cost Functions We now apply the Kempf-Ness theorem to our cost functions Φ : O (X, Y ) → R and ΦN : O (X, Y ) → R introduced in Section 6.1. Let V = Rk×n × Rn× denote the vector space of all pairs of k × n and n × matrices X and Y . The group GL (n) of invertible n × n real matrices T acts on V by the linear algebraic group action
α : GL (n) × Rk×n × Rn× → Rk×n × Rn× (3.1)
(T, (X, Y )) → XT −1, T Y . The orbits O (X, Y ) =
XT −1 , T Y | T ∈ GL (n)
are thus smooth manifolds, Appendix C. We endow V with the Hermitian inner product defined by (X1 , Y1 ) , (X2 , Y2 ) = tr (X1 X2 + Y1 Y2 )
(3.2)
The induced Hermitian norm (3.3) (X, Y )2 = tr (X X + Y Y ) is orthogonally invariant, that is XΘ−1 , ΘY = (X, Y ) for all orthogonal n × n matrices Θ ∈ O (n). Lemma 3.1 Let (X, Y ) ∈ Rk×n × Rn× . Then (a) O (X, Y ) is a smooth submanifold of Rk×n × Rn× . (b) The tangent space of O = O (X, Y ) at (X, Y ) is
T(X,Y ) O = (−XΛ, ΛY ) | Λ ∈ Rn×n
(3.4)
Proof 3.2 O (X, Y ) is an orbit of the smooth algebraic Lie group action (3.1) and thus, by Appendix C, a smooth submanifold of Rk×n × Rn× . To prove (b) we consider the smooth map
(3.5) ϕ : GL (n) → O (X, Y ) , T → XT −1, T Y The derivative of ϕ at the identity matrix In is the surjective linear map Dϕ|In : TIn GL (n) → T(X,Y ) O defined by Dϕ|In (Λ) = (−XΛ, ΛY ) for Λ ∈ TIn GL (n) = Rn×n . The result follows.
6.3. Global Analysis of Cost Functions
169
The above characterization of the tangent spaces of course remains in force if the point (X, Y ) ∈ O is replaced by any other point (X1 , Y1 ) ∈ O. In the sequel we will often denote a general point (X1 , Y1 ) of O (X, Y ) simply by (X, Y ). The following lemma is useful in order to characterize the closed orbits O (X, Y ) for the group action (3.1). Let A¯ denote the topological closure of subset A of a topological space M . Lemma 3.3 (a) (X1 , Y1 ) ∈ O (X, Y ) if and only if X1 Y1 = XY , ker (Y1 ) = ker (Y ) and image (X1 ) = image (X). (b) (X1 , Y1 ) ∈ O (X, Y ) if and only if X1 Y1 = XY , ker (Y1 ) ⊃ ker (Y ) and image (X1 ) ⊂ image (X). (c) O (X, Y ) is closed if and only if ker (Y ) = ker (XY ) and image (X) = image (XY ). Proof 3.4 See Kraft (1984). Corollary 3.5 O(X, Y ) is closed if and only if rk (X) = rk (Y ) = rk (XY ). Proof 3.6 By Lemma 3.3 the orbit O (X, Y ) is closed if and only if dim ker (Y ) = dim ker (XY ) and dim image (X) = dim image (XY ). But rk (Y ) + dim ker (Y ) = and thus the condition is satisfied if and only if rk (X) = rk (Y ) = rk (XY ). Let Φ : O (X, Y ) → R be the smooth function on O (X, Y ) defined by 2
2 Φ XT −1 , T Y = XT −1 + T Y (3.6) −1 = tr (T ) X XT −1 + T Y Y T We have Theorem 3.7 (a) An element (X0 , Y0 ) ∈ O (X, Y ) is a critical point of Φ if and only if X0 X0 = Y0 Y0 . (b) There exists a minimum of Φ : O (X, Y ) → R in O (X, Y ) if and only if rk (X) = rk (Y ) = rk (XY ). (c) All critical points of Φ : O (X, Y ) → R are global minima. If (X1 , Y1 ), an orthogo(X2 , Y2 ) ∈ O (X, Y ) are global minima, then there exists
nal transformation Θ ∈ O (n) such that (X2 , Y2 ) = X1 Θ−1 , ΘY1 .
170
Chapter 6. Balanced Matrix Factorizations
(d) Let C (Φ) ⊂ O (X, Y ) denote the set of all critical points of Φ : O (X, Y ) → R. Then the Hessian D2 Φ(X0 ,Y0 ) is positive semidefi degenerates nite at each critical point (X0 , Y0 ) of Φ and D2 Φ (X0 ,Y0 )
exactly on the tangent space T(X0 ,Y0 ) C (Φ) of C (Φ). Proof 3.8 The derivative DΦ|(X,Y ) : T(X,Y ) O → R of Φ at any point (X, Y ) ∈ O is the linear map defined by DΦ|(X,Y ) (ξ, η) = 2 tr (X ξ + ηY ) . By Lemma 3.1, the tangent vectors of O at (X, Y ) are of the form (ξ, η) = (−XΛ, ΛY ) for an n × n matrix Λ. Thus DΦ|(X,Y ) (ξ, η) = −2 tr [(X X − Y Y ) Λ] , which vanishes on T(X,Y ) O if and only if X X = Y Y . This proves (a). Parts (b), (c) and (d) are an immediate consequence of the Kempf-Ness Theorem 2.1 together with Corollary 3.5. Let N = N ∈ Rn×n be an arbitrary symmetric positive definite matrix and let ΦN : O (X, Y ) → R be the weighted cost function defined by
−1 ΦN XT −1 , T Y = tr N (T ) X XT −1 + N T Y Y T (3.7) Note that ΦN is in general not an orthogonally invariant norm function. Thus the next theorem does not follow immediately from the Kempf-Ness theory. Theorem 3.9 Let N = N > 0. (a) (X0 , Y0 ) is a critical point of ΦN if and only if N X0 X0 = Y0 Y0 N . (b) There exists a minimum of ΦN : O (X, Y ) → R in O (X, Y ) if rk (X) = rk (Y ) = rk (XY ). (c) Let N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn > 0. Then (X0 , Y0 ) is a critical point of ΦN if and only if (X0 , Y0 ) is diagonal balanced. Proof 3.10 The derivative of ΦN DΦN |(X,Y ) : T(X,Y ) O → R with
: O (X, Y ) → R at (X, Y ) is
DΦN |(X,Y ) (ξ, η) = 2 tr (N X ξ + N ηY ) = −2 tr ((N X X − Y Y N ) Λ)
6.3. Global Analysis of Cost Functions
171
for (ξ, η) = (−XΛ, ΛY ). Hence DΦN |(X,Y ) = 0 on T(X,Y ) O if and only if N X X = Y Y N , which proves (a). By Corollary 3.5 the rank equality condition of (b) implies that O (X, Y ) is closed. Consequently, ΦN : O (X, Y ) → R+ is a proper function and therefore a minimum in O (X, Y ) exists. This proves (b). To prove (c) note that N X X = Y Y N is equivalent to N X XN −1 = Y Y . By symmetry of Y Y thus N 2 X X = X XN 2 and hence
2 µi − µ2j xi xj = 0 for all i, j = 1, . . . , n where X = (x1 , . . . , xn ). Since µ2i = µ2j for i = j then xi xj = 0 for i = j and therefore both X X and Y Y must be diagonal and equal. The result follows. Finally let us explore what happens if other orthogonal invariant norms are used. Thus let Ω1 ∈ Rk×k , Ω2 ∈ R× be symmetric positive definite and let Ψ : O (X, Y ) → R,
Ψ (X, Y ) = tr (X Ω1 X + Y Ω2 Y )
(3.8)
The function Ψ defines an orthogonally invariant norm on O (X, Y ). The critical points of Ψ : O (X, Y ) → R are easily characterized by X Ω1 X = Y Ω2 Y
(3.9)
and thus the Kempf-Ness Theorem (2.1) implies the following. Corollary 3.11 Let Ω1 ∈ Rk×k and Ω2 ∈ R× be positive definite symmetric matrices. Then: (a) The critical points of Ψ : O (X, Y ) → R defined by (3.8) are characterized by X Ω1 X = Y Ω2 Y . (b) There exists a minimum of Ψ in O (X, Y ) if and only if rk (X) = rk (Y ) = rk (XY ). (c) The critical points of Ψ : O (X, Y ) → R are the global minima. If the pairs (X1 , Y1 ) , (X2 , Y2 ) ∈ O (X, Y ) are global minima, then (X2 , Y2 ) = X1 Θ−1 , ΘY1 for an orthogonal matrix Θ ∈ O (n). A similar result, analogous to Theorem 3.9, holds for ΨN : O (X, Y ) → R, ΨN (X, Y ) = tr (N X Ω1 X) + tr (N Y Ω2 Y ). Problem 3.12 State and prove the analogous result to Theorem 3.9 for the weighted cost function ΨN : O (X, Y ) → R.
172
Chapter 6. Balanced Matrix Factorizations
Main Points of Section A global analysis of the cost functions for balancing matrix factorizations is developed. The orbits O (X, Y ) are shown to be smooth manifolds and their tangent spaces are computed. The closed orbits are characterized using the Kempf-Ness theorem. The global minima of the cost function are shown to be balanced. Similarly, diagonal balanced factorizations are characterized as the critical points of a weighted cost function. The Kempf-Ness theory does not apply to such weighted cost functions.
6.4 Flows for Balancing Transformations In this section gradient flows evolving on the Lie group GL (n) of invertible coordinate transformations T ∈ GL (n) are constructed which converge exponentially fast to the class of all balancing coordinate transformations for a given factorization H = XY . Thus let cost functions Φ : GL (n) → R and ΦN : GL (n) → R be defined by 2 2 Φ (T ) = XT −1 + T Y and
Given that
−1 ΦN (T ) = tr N (T ) X XT −1 + tr (N T Y Y T ) # " −1 Φ (T ) = tr (T ) X XT −1 + T Y Y T
= tr X XP −1 + Y Y P
(4.1)
(4.2)
(4.3)
for P = T T
(4.4)
it makes sense to consider first the task of minimizing Φ with respect to the positive definite symmetric matrix P = T T .
Gradient flows on positive definite matrices Let P (n) denote the set of positive definite symmetric n × n matrices P = P > 0. The function we are going to study is
φ : P (n) → R, φ (P ) = tr X XP −1 + Y Y P (4.5)
6.4. Flows for Balancing Transformations
173
Lemma 4.1 Let (X, Y ) ∈ Rk×n × Rn× with rk (X) = rk (Y ) = n. The function φ : P (n)
→ R defined by(4.5)has compact sublevel sets, that is P ∈ P (n) | tr X XP −1 + Y Y P ≤ a is a compact subset of P (n) for all a ∈ R.
Proof 4.2 Let Ma := P ∈ P (n) | tr X XP −1 + Y Y P ≤ a and Na := 2
2 T ∈ GL (n) | XT −1 + T Y ≤ a . Suppose that X and Y are full rank n. Then ϕ : GL (n) →O (X, Y )
T → XT −1 , T Y onto φ (Na ) = is
a homeomorphism which 2maps Na2 homeomorphically (X1 , Y1 ) ∈ O (X, Y ) | X1 + Y1 ≤ a . By Corollary 3.5, φ (Na ) is the intersection of the compact set & ' 2 2 (X1 , Y1 ) ∈ Rk×n × Rn× | X1 + Y1 ≤ a with the closed set O (X, Y ) and therefore is compact. Thus Na must be compact. Consider the continuous map F : Ma × O (n) → Na ,
F (P, Θ) = ΘP 1/2 .
By the polar decomposition of T ∈ GL (n), F is a bijection and its inverse is given by the continuous map F −1 : Na → Ma × O (n), F −1 (T ) =
−1/2 T T, T (T T ) . Thus F is a homeomorphism and the compactness of Na implies that of Ma . The result follows. Flanders (1975) seemed to be one of the first which has considered the problem of minimizing the function φ : P (n) → R defined by (4.5) over the set of positive definite symmetric matrices. His main result is as follows. The proof given here which is a simple consequence of Theorem 3.7 is however much simpler than that of Flanders. Corollary 4.3 There exists a minimizing positive definite symmetric matrix
P −1= P > 0 for the function φ : P (n) → R, φ (P ) = tr X XP + Y Y P , if and only if rk (X) = rk (Y ) = rk (XY ). Proof 4.4 map φ : P (n) → O (X, Y ) defined by
Consider the continuous φ (P ) = XP −1/2 , P 1/2 Y . There exists a minimum for φ : P (n) → R if and only if there exists a minimum for the function O (X, Y ) → R, 2 2 (X, Y ) → X + Y . The result now follows from Theorem 3.7(b).
174
Chapter 6. Balanced Matrix Factorizations
We now consider the minimization task for the cost function φ : P (n) → R from a dynamical systems viewpoint. Certainly P (n) is a smooth submanifold of Rn×n . For the subsequent results we endow P (n) with the induced Riemannian metric. Thus the inner product of two tangent vectors ζ, η ∈ TP P (n) is ζ, η = tr (ζ η). Theorem 4.5 (Linear index gradient flow) Let (X, Y ) ∈ Rk×n ×Rn× with rk (X) = rk (Y ) = n. (a) There exists a unique P∞ = P∞ > 0 which minimizes φ : P (n) → R, −1 φ (P ) = tr X XP + Y Y P , and P∞ is the only critical point of φ. This minimum is given by " # −1/2 1/2 1/2 1/2 −1/2 (Y Y ) (X X) (Y Y ) (Y Y ) P∞ = (Y Y ) (4.6) 1/2
and T∞ = P∞ is a balancing transformation for (X, Y ). (b) The gradient flow P˙ (t) = −∇φ (P (t)) on P (n) is given by P˙ = P −1 X XP −1 − Y Y ,
P (0) = P0
(4.7)
For every initial condition P0 = P0 > 0, P (t) ∈ P (n) exists for all t ≥ 0 and converges exponentially fast to P∞ as t → ∞, with a lower bound for the rate of exponential convergence given by 3
ρ≥2
σmin (Y ) σmax (X)
(4.8)
where σmin (A) and σmax (A) are the smallest and largest singular value of a linear operator A respectively. Proof 4.6 The existence of P∞ follows immediately from Lemma 4.1 and Corollary 4.3. Uniqueness of P∞ follows from Theorem 3.7(c). Similarly by Theorem 3.7(c) every critical point is a global minimum. Now simple manipulations using standard results from matrix calculus, reviewed in Appendix A, give ∇φ (P ) = Y Y − P −1 X XP −1
(4.9)
for the gradient of φ. Thus the critical point of Φ is characterized by ∇φ (P∞ ) = 0 ⇔X X = P∞ Y Y P∞
1/2 1/2 1/2 2 X X (Y Y ) = (Y Y ) P∞ (Y Y ) " # −1/2 1/2 1/2 1/2 −1/2 (Y Y ) X X (Y Y ) = (Y Y ) (Y Y )
⇔ (Y Y )
1/2
⇔P∞
6.4. Flows for Balancing Transformations
175
−1 −1 Also X X = P∞ Y Y P∞ ⇔ T∞ X XT∞ = T∞ Y Y T∞ for T∞ = P∞ . This proves (a) and (4.7). By Lemma 4.1 and standard properties of gradient flows, reviewed in Appendix C, the solution P (t) ∈ P (n) exists for all t ≥ 0 and converges to the unique equilibrium point P∞ . To prove the result about exponential stability, we first review some known results concerning Kronecker products and the vec operation, see Section A.10. Recall that the Kronecker product of two matrices A and B is defined by 1/2
a11 B a21 B A⊗B = .. . an1 B
a12 B a22 B .. . an2 B
... ... .. . ...
a1k B a2k B .. . . ank B
The eigenvalues of the Kronecker product of two matrices are given by the product of the eigenvalues of the two matrices, that is λij (A ⊗ B) = λi (A) λj (B) . Moreover, with vec (A) defined by " vec (A) = a11
...
an1
a12
...
an2
...
a1m
...
# anm ,
then vec (M N ) = (I ⊗ M ) vec (N ) = (N ⊗ I) vec (M ) vec (ABC) = (C ⊗ A) vec (B) . A straightforward computation shows that the linearization of the right hand side of (4.7) at P∞ is given by the linear operator −1 −1 −1 −1 −1 −1 ξ → −P∞ ξP∞ X XP∞ − P∞ X XP∞ ξP∞ . −1 −1 Since P∞ X XP∞ = Y Y it follows that
d −1 −1 ⊗ Y Y − Y Y ⊗ P∞ vec (P − P∞ ) = −P∞ vec (P − P∞ ) dt
(4.10)
is the linearization of (4.7) around P∞ . The smallest eigenvalue of JSV D = −1 −1 P∞ ⊗ Y Y + Y Y ⊗ P∞ can then be estimated as (see Appendix A)
−1 λmin (JSV D ) ≥ 2λmin P∞ λmin (Y Y ) > 0
176
Chapter 6. Balanced Matrix Factorizations
which proves exponential convergence of P (t) to P∞ . To obtain the bound on the rate of convergence let S be an orthogonal transformation that ¯ = XS −1 and diagonalizes P∞ , so that SP∞ S = diag (λ1 , . . . , λn ). Let X ¯ ¯ ¯ ¯ ¯ Y = SY . Then X X = SP∞ S Y Y SP∞ S and thus by ¯ i 2 = e X ¯ Xe ¯ = λ2 Y¯ ei 2 , Xe i i i
i = 1, . . . , n,
we obtain
Y¯ ei
−1 σmin Y¯ (Y ) σ
= min λmin P∞ = min ¯ i ≥ σmax X ¯ Xe i σmax (X) and therefore by λmin (Y Y ) = σmin (Y ) we obtain (4.8). 2
In the sequel we refer to (4.7) as the linear index gradient flow. Instead of minimizing the functional φ (P ) we might as well consider the minimization problem for the quadratic index function 2
2 (4.11) ψ (P ) = tr (Y Y P ) + X XP −1 over all positive definite symmetric matrices P = P > 0. Since, for P = T T , ψ (P ) is equal to 2 2 −1 tr (T Y Y T ) + (T ) X XT −1 , the minimization to the task of minimizing the quar
of (4.11) is equivalent tic function tr (Y Y )2 + (X X)2 over all full rank factorizations (X, Y ) of a given matrix H = XY . The cost function ψ has a greater penalty for being away from the minimum than the cost function φ, so can be expected to converge more rapidly. By the formula # " 2 2 2 tr (Y Y ) + (X X) = X X − Y Y + 2 tr (X XY Y ) = X X − Y Y + 2 H 2 2 we see that the minimization of tr (X X) + (Y Y ) over F (H) is equivalent to the minimization of the least squares distance X X − Y Y 2 over F (H). 2
2
Theorem 4.7 (Quadratic index gradient flow) Let (X, Y ) ∈ Rk×n × Rn× with rk (X) = rk (Y ) = n.
6.4. Flows for Balancing Transformations
177
(a) There exists a unique P∞ = P∞ > 0 which minimizes ψ : P (n) → R,
2 2 ψ (P ) = tr X XP −1 + (Y Y P ) , and
P∞ = (Y Y )
−1/2
" # 1/2 1/2 1/2 −1/2 (Y Y ) (X X) (Y Y ) (Y Y ) 1/2
is the only critical point of ψ. Also, T∞ = P∞ is a balancing coordinate transformation for (X, Y ). (b) The gradient flow P˙ (t) = −∇ψ (P (t)) is P˙ = 2P −1 X XP −1 X XP −1 − 2Y Y P Y Y ,
P (0) = P0 (4.12)
For all initial conditions P0 = P0 > 0, the solution P (t) of (4.12) exists in P (n) for all t ≥ 0 and converges exponentially fast to P∞ . A lower bound on the rate of exponential convergence is 4
ρ > 4σmin (Y )
(4.13)
Proof 4.8 Again the existence and uniqueness of P∞ and the critical points of ψ follows as in the proof of Theorem 4.5. Similarly, the expression (4.12) for the gradient flow follows easily from the standard rules of matrix calculus. The only new point we have to check is the bound on the rate of exponential convergence (4.13). The linearization of the right hand side of (4.12) at the equilibrium point P∞ is given by the linear map (ξ = ξ )
−1 −1 −1 −1 −1 −1 −1 −1 ξP∞ X XP∞ X XP∞ + P∞ X XP∞ ξP∞ X XP∞ ξ → −2 P∞ −1 −1 −1 −1 + P∞ X XP∞ X XP∞ ξP∞ + Y Y ξY Y That is, using P∞ Y Y P∞ = X X, by
−1 −1 ξY Y P∞ Y Y + 2Y Y ξY Y + Y Y P∞ Y Y ξP∞ ξ → −2 P∞ (4.14) Therefore the linearization of (4.12) at P∞ is
d −1 ⊗ Y Y P∞ Y Y vec (P − P∞ ) = −2 2Y Y ⊗ Y Y + P∞ dt (4.15) −1 vec (P − P∞ ) + Y Y P∞ Y Y ⊗ P∞
178
Chapter 6. Balanced Matrix Factorizations
Therefore (see Appendix A) the minimum eigenvalue of the self adjoint linear operator given by (4.14) is greater than or equal to
−1 2 λmin (Y Y P∞ Y Y ) 4λmin (Y Y ) + 4λmin P∞
−1 2 λmin (P∞ ) ≥4λmin (Y Y ) 1 + λmin P∞ >4λmin (Y Y ) >0.
2
The result follows. We refer to (4.12) as the quadratic index gradient flow. The above results show that both algorithms converge exponentially fast to P∞ , however their transient behaviour is rather different. In fact, simulation experiment with both gradient flows show that the quadratic index flow seems to behave better than the linear index flow. This is further supported by the following lemma. It compares the rates of exponential convergence of the algorithms and shows that the quadratic index flow is in general faster than the linear index flow. Lemma 4.9 Let ρ1 and ρ2 denote the rates of exponential convergence of (4.7) and (4.12) respectively. Then ρ1 < ρ2 if σmin (XY ) > 12 . Proof 4.10 By (4.10), (4.15),
−1 −1 , ⊗ Y Y + Y Y ⊗ P∞ ρ1 = λmin P∞ and
−1 −1 . ⊗ Y Y P∞ Y Y + Y Y P∞ Y Y ⊗ P∞ ρ2 = 2λmin 2Y Y ⊗ Y Y + P∞ We need the following lemma; see Appendix A. Lemma 4.11 Let A, B ∈ Rn×n be symmetric matrices and let λ1 (A) ≥ · · · ≥ λn (A) and λ1 (B) ≥ · · · ≥ λn (B) be the eigenvalues of A, B, ordered with respect to their magnitude. Then A − B positive definite implies λn (A) > λn (B). Thus, according to Lemma 4.11, it suffices to prove that −1 4Y Y ⊗ Y Y + 2P∞ ⊗ Y Y P∞ Y Y + −1 −1 −1 − P∞ ⊗ Y Y − Y Y ⊗ P∞ 2Y Y P∞ Y Y ⊗ P∞ −1 =4Y Y ⊗ Y Y + P∞ ⊗ (2Y Y P∞ Y Y − Y Y ) + −1 (2Y Y P∞ Y Y − Y Y ) ⊗ P∞ >0.
6.4. Flows for Balancing Transformations
179
But this is the case, whenever " # 1/2 1/2 1/2 1/2 1/2 (Y Y ) X X (Y Y ) 2Y Y P∞ Y Y =2 (Y Y ) (Y Y ) >YY # " 1/2 1/2 1/2 ⇔2 (Y Y ) X X (Y Y ) > In 1/2 > 14 ⇔λmin (Y Y )1/2 X X (Y Y ) ⇔λmin (X XY Y ) > 14
2 We have used the facts that λi A2 = λi (A) for any operator A = A and
1/2 2
1/2 2 = λi (A) as well as λi (AB) = λi (BA). = λi A therefore λi A The result follows because
2 λmin (X XY Y ) = λmin (XY ) (XY ) = σmin (XY ) .
Gradient flows on GL (n) Once the minimization of the cost functions φ, ψ on P (n) has been performed the class of all balancing transformations is {ΘT∞ | Θ ∈ O (n)} 1/2 where T∞ = P∞ is the unique symmetric, positive definite balancing transformation for (X, Y ). Of course, other balancing transformations than T∞ might be also of interest to compute in a similar way and therefore one would like to find suitable gradient flows evolving on arbitrary invertible n × n matrices. See Appendix C for definitions and results on dynamical systems. Thus for T ∈ GL (n) we consider −1 (4.16) Φ (T ) = tr T Y Y T + (T ) X XT −1 and the corresponding gradient flow T˙ = −∇Φ (T ) on GL (n). Here and in the sequel we always endow GL (n) with its standard Riemannian metric A, B = 2 tr (A B)
(4.17)
i.e. with the constant Euclidean inner product (4.17) defined on the tangent spaces of GL (n). Here the constant factor 2 is introduced for convenience. Theorem 4.12 Let rk (X) = rk (Y ) = n. (a) The gradient flow T˙ = −∇Φ (T ) of Φ : GL (n) → R is −1 −1 T˙ = (T ) X X (T T ) − T Y Y ,
T (0) = T0
(4.18)
180
Chapter 6. Balanced Matrix Factorizations
and for any initial condition T0 ∈ GL (n) the solution T (t) of (4.18) exists in GL (n) for all t ≥ 0. (b) For any initial condition T0 ∈ GL (n) the solution T (t) of (4.18) converges to a balancing transformation T∞ ∈ GL (n) and all balancing transformations of (X, Y ) can be obtained in this way, for suitable initial conditions T0 ∈ GL (n). Moreover, Φ : GL (n) → R is a Morse-Bott function. (c) Let T∞ be a balancing transformation and let W s (T∞ ) denote the set of all T0 ∈ GL (n) such that the solution T (t) of (4.18) with T (0) = T0 converges to T∞ as t → ∞. Then W s (T∞ ) is an immersed invariant submanifold of GL (n) of dimension n (n + 1) /2 and every solution T (t) ∈ W s (T∞ ) converges exponentially fast in W s (T∞ ) to T∞ . Proof 4.13 The derivative of Φ : GL (n) → R at T is the linear operator given by −1 DΦ|T (ξ) =2 tr Y Y T − T −1 (T ) X XT −1 ξ −1 −1 ξ =2 tr T Y Y − (T ) X X (T T ) and therefore ∇Φ (T ) = T Y Y − (T )
−1
−1
X X (T T )
.
To prove that the gradient flow (4.18) is complete, i.e. that the solutions T (t) exist for all t ≥ 0, it suffices to show that Φ : GL (n) → R has compact sublevel sets {T ∈ GL (n) | Φ (T ) ≤ a} for all a ∈ R. =
Since rk (X) rk (Y ) = n the map φ : GL (n) → O (X, Y ), φ (T ) = XT −1, T Y , is a homeomorphism and, using Theorem 3.7, & ' 2 2 (X1 , Y1 ) ∈ O (X, Y ) | X1 + Y1 ≤ a is compact for all a ∈ R. Thus {T ∈ GL (n) | Φ (T ) ≤ a} & ' 2 2 = φ−1 (X1 , Y1 ) ∈ O (X, Y ) | X1 + Y1 ≤ a is compact. This shows (a).
6.4. Flows for Balancing Transformations
181
To prove (b) we note that, by (a) and by the steepest descent property of gradient flows, any solution T (t) converges to the set of equilibrium points T∞ ∈ GL (n). The equilibria of (4.18) are characterized by −1
) (T∞
−1
X X (T∞ T∞ )
−1
= T∞ Y Y ⇔ (T∞ )
−1 X XT∞ = T∞ Y Y T∞
and hence T∞ is a balancing transformation. This proves (b), except that we haven’t yet proved convergence to equilibrium points rather than to the entire set of equilibria; see below. Let T∞ ) Y Y (T∞ T∞ ) = X X} E := {T∞ ∈ GL (n) | (T∞
(4.19)
denote the set of equilibria points of (4.18). To prove (c) we need a lemma. Lemma 4.14 The tangent space of E at T∞ ∈ E is
TT∞ E = S ∈ Rn×n | S T∞ + T∞ S=0
(4.20)
Proof 4.15 Let P∞ denote the unique symmetric positive definite solution of P Y Y P = X X. Thus E = {T | T T = P∞ } and therefore TT∞ E is the kernel of the derivative of T → T T − P∞ at T∞ , i.e. by {S ∈ Rn×n | S T∞ + T∞ S = 0}. Let
φ (P ) = tr Y Y P + X XP −1 ,
λ (T ) = T T.
Thus Φ (T ) = φ (λ (T )). By Theorem 4.5 Dφ|P∞ = 0,
D2 φP∞ > 0.
Let L denote the matrix representing the linear operator Dλ|T∞ (S) = T∞ S + S T∞ . Using the chain rule we obtain = L · D2 φ ·L D 2 Φ T∞
P∞
for all T∞ ∈ E. By D2 φP∞ > 0 thus D2 ΦT∞ ≥ 0 and D2 ΦT∞ degenerates exactly on the kernel of L; i.e. on the tangent space TT∞ E. Thus Φ is a Morse-Bott function, as defined in Appendix C. Thus Proposition C.12.3 implies that every solution T (t) of (4.18) converges to an equilibrium point. From the above we conclude that the set E of equilibria points is a normally hyperbolic subset for the gradient flow; see Section C.11. It follows (see Appendix C) that W s (T∞ ) the stable manifold of (4.18) at T∞ is an immersed submanifold of GL (n) of dimension dim GL (n) − dim E = n2 − n (n − 1) /2 = n (n + 1) /2, which is invariant under the gradient flow. Since convergence to an equilibrium is always exponentially fast on stable manifolds this completes the proof of (c).
182
Chapter 6. Balanced Matrix Factorizations
Now consider the following quadratic version of our objective function Φ (T ). For T ∈ GL (n), let 2 2 −1 −1 Ψ (T ) := tr (T Y Y T ) + (T ) X XT (4.21) The proof of the following theorem is completely analogous to the proof of Theorem 4.12 and is therefore omitted. Theorem 4.16 (a) The gradient flow T˙ = −∇Ψ (T ) of Ψ (T ) on GL (n) is # " −1 −1 −1 T˙ = 2 (T ) X X (T T ) X X (T T ) − T Y Y T T Y Y (4.22) and for all initial conditions T0 ∈ GL (n), the solution T (t) of exist in GL (n) for all t ≥ 0. (b) For all initial conditions T0 ∈ GL (n), every solution T (t) of (4.20) converges to a balancing transformation and all balancing transformations are obtained in this way, for suitable initial conditions T0 ∈ GL (n). Moreover, Ψ : GL (n) → R is a Morse-Bott function. (c) For any balancing transformation T∞ ∈ GL (n) let W s (T∞ ) ⊂ GL (n) denote the set of all T0 ∈ GL (n), such that the solution T (t) of (4.20) with initial condition T0 converges to T∞ as t → ∞. Then W s (T∞ ) is an immersed submanifold of GL (n) of dimension n (n + 1) /2 and is invariant under the flow of (4.20). Every solution T (t) ∈ W s (T∞ ) converges exponentially to T∞ .
Diagonal balancing transformations The previous results were concerned with the question of finding balancing transformations via gradient flows; here we address the similar issue of computing diagonal balancing transformations. We have already shown in Theorem 3.9 that diagonal balancing factorizations of a k × matrix H can be characterized as the critical points of the weighted cost function ΦN : O (X, Y ) → R,
ΦN (X, Y ) = tr (N X X + N Y Y )
for a fixed diagonal matrix N = diag (µ1 , . . . , µn ) with distinct eigenvalues µ1 > · · · > µn . Of course a similar result holds for diagonal balancing
6.4. Flows for Balancing Transformations
183
transformations. Thus let ΦN : GL (n) → R be defined by −1 ΦN (T ) = tr N T Y Y T + (T ) X XT −1 . The proof of the following result parallels that of Theorem 3.9 and is left as an exercise for the reader. Note that for X X = Y Y and T orthogonal, the function ΦN : O (n) → R, restricted to the orthogonal group O (n), coincides with the trace function studied in Chapter 2. Here we are interested in minimizing the function ΦN rather than maximizing it. We will therefore expect an order reversal relative to that of N occurring for the diagonal entries of the equilibria points. Lemma 4.17 Let rk (X) = rk (Y ) = n and N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn > 0. Then (a) T ∈ GL (n) is a critical point for ΦN : GL (n) → R if and only if T is a diagonal balancing transformation. (b) The global minimum Tmin ∈ GL (n) has the property Tmin Y Y Tmin = diag (d1 , . . . , dn ) with d1 ≤ d2 ≤ · · · ≤ dn .
Theorem 4.18 Let rk (X) = rk (Y ) = n and N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn > 0. (a) The gradient flow T˙ = −∇ΦN (T ) of ΦN : GL (n) → R with respect to the Riemannian metric (4.17) is −1 −1 T˙ = (T ) X XT −1N (T ) − N T Y Y ,
T (0) = T0 (4.23)
and for all initial conditions T0 ∈ GL (n) the solution T (t) of (4.23) exists in GL (n) for all t ≥ 0. (b) For any initial condition T0 ∈ GL (n) the solution T (t) of (4.23) converges to a diagonal balancing transformation T∞ ∈ GL (n). Moreover, all diagonal balancing transformations can be obtained in this way. (c) Suppose the singular values of H = XY are distinct. Then (4.23) has exactly 2n n! equilibrium points. These are characterized by −1 −1 ) X XT∞ = T∞ Y Y T∞ = D where D is a diagonal ma(T∞ n trix. There are exactly 2 stable equilibrium points of (4.23) which −1 −1 ) X XT∞ = T∞ Y Y T∞ = D, where are characterized by (T∞ D = diag (d1 , . . . , dn ) is diagonal with d1 < · · · < dn .
184
Chapter 6. Balanced Matrix Factorizations
(d) There exists an open and dense subset Ω ⊂ GL (n) such that for all T0 ∈ Ω the solution of (4.23) converges exponentially fast to a stable rate of exponential convergence is bounded equilibrium point
T∞ . The −1 min ((di − dj ) (µj − µi ) , 4di µi ). All other below by λmin (T∞ T∞ ) i<j equilibria are unstable. Proof 4.19 Using Lemma 4.17, the proof of (a) and (b) parallels that of Theorem 4.12 and is left as an exercise for the reader. To prove (c) consider the linearization of (4.23) around an equilibrium T∞ , given by −1 −1 −1 −1 D (T∞ ) − DξT∞ N (T∞ ) ξ˙ = −N ξT∞ −1
− (T∞ )
−1
ξ DN (T∞ )
−1
− DN (T∞ )
ξ (T∞ )
−1
,
−1 using (T∞ ) X XT∞ = T∞ Y Y T∞ = D. Using the change of variables −1 η = ξT∞ thus η˙ (T∞ T∞ ) = −N ηD − DηN − η DN − DN η
Thus, using Kronecker products and the vec notation, Appendix A, we obtain ⊗ In ) vec (η) ˙ (T∞ T∞ = − (D ⊗ N + N ⊗ D) vec (η) − (DN ⊗ In + In ⊗ DN ) vec (η ) Consider first the special case when T∞ T∞ = I, and η is denoted η ∗
vec (η˙ ∗ ) = − (D ⊗ N + N ⊗ D) vec (η ∗ ) − (DN ⊗ I + I ⊗ DN ) vec η ∗ . (4.24)
Then for i < j, ∗ η˙ ij ∗ η˙ ji
and for all i,
=−
di µj + µi dj
dj µj + di µi
di µi + dj µj
di µj + µi dj
∗ ηij
∗ ηji
∗ ∗ η˙ ii = −4di µi ηii .
By assumption, µi > 0, and di > 0 for all i. Thus (4.24) is exponentially stable if and only if (di − dj ) (µj − µi ) > 0 for all i, j, i < j, or equivalently, if and only if the diagonal entries of D are distinct and in reverse ordering to those of N . In this case, (4.24) is equivalent to (4.25) vec (η˙ ∗ ) = −F vec (η ∗ ) .
(4.25)
6.4. Flows for Balancing Transformations
185
with a symmetric positive definite matrix F = F > 0. Consequently, exponential convergence of (4.25) is established with a rate given by λmin (F ) as follows: di µj + µi dj dj µj + di µi λmin (F ) = min λmin , 4di µi i<j di µi + dj µj di µj + µi dj = min ((di − dj ) (µj − µi ) , (4di µi )) i<j
Since (T∞ T∞ ⊗ I) is positive definite, exponential stability of (4.25) implies exponential stability of ) ⊗ I) vec (η) ˙ = −F vec (η) . ((T∞ T∞
−1 The rate of exponential convergence is given by λmin (T∞ T∞ ) ⊗I F . Now since A = A > 0, B = B > 0 implies λmin (AB) ≥ λmin (A) λmin (B), a lower bound on the convergence rate is given from −1 −1 ) ⊗ I F ≥ λmin (T∞ T∞ ) ⊗ I λmin (F ) λmin (T∞ T∞ −1 min ((di − dj ) (µj − µi ) , 4di µi ) = λmin (T∞ T∞ ) i<j
as claimed. Since there are only a finite number of equilibria, the union of the stable manifolds of the unstable equilibria points is a closed subset of O (X, Y ) of codimension at least one. Thus its complement Ω in O (X, Y ) is open and dense and coincides with the union of the stable manifolds of the stable equilibria. This completes the proof.
Problem 4.20 Show that ζ, ν := tr P −1 ζP −1 η defines an inner product on the tangent spaces TP P (n) for P ∈ P (n). Problem 4.21 Prove that , defines a Riemannian metric on P (n). We refer to this as the intrinsic Riemannian metric on P (n). Problem 4.22 Prove that the gradient flow of (4.5) with respect to the intrinsic Riemannian metric is the Riccati equation P˙ = − grad φ (P ) = X X − P Y Y P. Problem 4.23 Prove the analogous result to Theorem 4.5 for this Riccati equation. Problem 4.24 Concrete proofs of Theorem 4.16, and Lemma 4.17.
186
Chapter 6. Balanced Matrix Factorizations
Main Points of Section Gradient flows can be developed for matrix factorizations which converge exponentially fast to balancing factorizations, including to diagonal balanced factorizations. Such flows side step the usual requirement to find the balancing transformations.
6.5 Flows on the Factors X and Y In the previous section gradient flows for balancing or diagonal balancing T∞ were analyzed. Here we coordinate transformations T∞ and P∞ = T∞ address the related issue of finding gradient flows for the cost functions Φ : O (X, Y ) → R and ΦN : O (X, Y ) → R on the manifold O (X, Y ) of factorizations of a given matrix H = XY . Thus for arbitrary integers k, and n let
O (X, Y ) = XT −1 , T Y ∈ Rk×n × Rn× | T ∈ GL (n) denote the GL (n)-orbit of (X, Y ). We endow the vector space Rk×n ×Rn× with its standard inner product (3.2) repeated here as (X1 , Y1 ) , (X2 , Y2 ) = tr (X1 X2 + Y1 Y2 )
(5.1)
By Lemma 3.1, O (X, Y ) is a submanifold of Rk×n × Rn× and thus the inner product (5.1) on Rk×n × Rn× induces an inner product on each tangent space T(X,Y ) O of O by (−XΛ1 , Λ1 Y ) , (−XΛ2, Λ2 Y ) = tr (Λ2 X XΛ1 + Λ2 Y Y Λ1 ) (5.2) and therefore a Riemannian metric on O (X, Y ); see Appendix C and (3.4). We refer to this Riemannian metric as the induced Riemannian metric on O (X, Y ). It turns out that the gradient flow of the above cost functions with respect to this Riemannian metric has a rather complicated form; see Problems. A second, different, Riemannian metric on O (X, Y ) is constructed as follows. Here we assume that rk (X) = rk (Y ) = n. Instead of defining the inner product of tangent vectors (−XΛ1, Λ1 Y ) , (−XΛ2 , Λ2 Y ) ∈ T(X,Y ) O as in (5.2) we set (−XΛ1 , Λ1 Y ) , (−XΛ2 , Λ2 Y ) := 2 tr (Λ1 Λ2 )
(5.3)
It is easily seen (using rk (X) = rk (Y ) = n) that this defines an inner product on each tangent space T(X,Y ) O and in fact a Riemannian metric
6.5. Flows on the Factors X and Y
187
on O = O (X, Y ). We refer to this as the normal Riemannian metric on O (X, Y ). This is in fact a particularly convenient Riemannian metric with which to work. In this, the associated gradient takes a particularly simple form. Theorem 5.1 Let rk (X) = rk (Y ) = n. Consider the cost function Φ : O (X, Y ) → R, Φ (X, Y ) = tr (X X + Y Y ). ˙ Y˙ = − grad Φ (X, Y ) for the normal Rieman(a) The gradient flow X, nian metric is: X˙ = − X (X X − Y Y ) , Y˙ = (X X − Y Y ) Y,
X (0) =X0 Y (0) =Y0
(5.4)
(b) For every initial condition (X0 , Y0 ) ∈ O (X, Y ) the solutions (X (t) , Y (t)) of (5.4) exist for all t ≥ 0 and (X (t) , Y (t)) ∈ O (X, Y ) for all t ≥ 0. (c) For any initial condition (X0 , Y0 ) ∈ O (X, Y ) the solutions (X (t) , Y (t)) of (5.4) converge to a balanced factorization (X∞ , Y∞ ) of H = XY . Moreover the convergence to the set of all balanced factorizations of H is exponentially fast. Proof 5.2 Let grad Φ = (grad Φ1 , grad Φ2 ) denote the gradient of Φ : O (X, Y ) → R with respect to the normal Riemannian metric. The derivative of Φ at (X, Y ) ∈ O is the linear map DΦ|(X,Y ) : T(X,Y ) O → R defined by DΦ|(X,Y ) (−XΛ, ΛY ) = 2 tr ((Y Y − X X) Λ)
(5.5)
By definition of the gradient of a function, see Appendix C, grad Φ (X, Y ) is characterized by grad Φ (X, Y ) ∈ T(X,Y ) O
(5.6)
DΦ|(X,Y ) (−XΛ, ΛY ) = (grad Φ1 , grad Φ2 ) , (−XΛ, ΛY )
(5.7)
and
for all Λ ∈ Rn×n . By Lemma 3.1, (5.6) is equivalent to grad Φ (X, Y ) = (−XΛ1, Λ1 Y )
(5.8)
188
Chapter 6. Balanced Matrix Factorizations
for an n × n matrix Λ1 . Thus, using (5.5), (5.7) is equivalent to 2 tr ((Y Y − X X) Λ) = (−XΛ1 , Λ1 Y ) , (−XΛ, ΛY ) =2 tr (Λ1 Λ) for all Λ ∈ Rn×n . Thus Y Y − X X = Λ1
(5.9)
Therefore grad Φ (X, Y ) = (−X (Y Y − X X) , (Y Y − X X) Y ). This proves (a). To prove (b) note that Φ (X (t) , Y (t)) decreases along every solution of (5.4). By Corollary 3.5 {(X, Y ) ∈ O | Φ (X, Y ) ≤ Φ (X0 , Y0 )} is a compact set. Therefore (X (t) , Y (t)) stays in that compact subset of O (X, Y ) and therefore exists for all t ≥ 0. This proves (b). Since (5.4) is a gradient flow of Φ : O (X, Y ) → R and since Φ : O (X, Y ) → R has compact sublevel sets the solutions (X (t) , Y (t)) all converge to the set of equilibria of (5.4). But X∞ = Y∞ Y∞ . (X∞ , Y∞ ) is an equilibrium point of (5.4) if and only if X∞ Thus the equilibria are in both cases the balanced factorizations of H = XY . By Theorem 3.7 the Hessian of Φ : O (X, Y ) → R is positive semidefinite at each critical point and degenerates exactly on the tangent spaces at the set of critical points. This implies that the linearization of (5.4) is exponentially stable in directions transverse to the tangent spaces of the set of equilibria. Thus Proposition C.12.3 implies that any solutions (X (t) , Y (t)) of (5.4) converges to an equilibrium point. Exponential convergence follows from the stable manifold theory, as summarized in Section C.11. A similar approach also works for the weighted cost functions ΦN : O (X, Y ) → R,
ΦN (X, Y ) = tr (N (X X + Y Y )) .
We have the following result. Theorem 5.3 Consider the weighted cost function ΦN : O (X, Y ) → R, ΦN (X, Y ) = tr (N (X X + Y Y )), with N = diag (µ1 , . . . , µn ), µ1 > · · · > µn > 0. Let rk (X) = rk (Y ) = n.
˙ Y˙ = − grad ΦN (X, Y ) for the normal Rie(a) The gradient flow X, mannian metric is X˙ = − X (X XN − N Y Y ) , Y˙ = (X XN − N Y Y ) Y,
X (0) =X0 Y (0) =Y0
(5.10)
6.5. Flows on the Factors X and Y
189
(b) For any initial condition (X0 , Y0 ) ∈ O (X, Y ) the solutions (X (t) , Y (t)) of (5.10) exist for all t ≥ 0 and (X (t) , Y (t)) ∈ O (X, Y ) for all t ≥ 0. (c) For any initial condition (X0 , Y0 ) ∈ O (X, Y ) the solutions (X (t) , Y (t)) of (5.10) converge to a diagonal balanced factorization (X∞ , Y∞ ) of H = XY . (d) Suppose that the singular values of H = X0 Y0 are distinct. There are exactly 2n n! equilibrium points of (5.10) The set of such asymp totically stable equilibria is characterized by X∞ X∞ = Y∞ Y∞ = D , where D = diag (d1 , . . . , dn ) satisfies d1 < · · · < dn . In particular there are exactly 2n asymptotically stable equilibria, all yielding the same value of ΦN . The diagonal entries di are the singular values of H = X0 Y0 arranged in increasing order. All other equilibria are unstable. (e) There exists an open and dense subset Ω ⊂ O (X, Y ) such that for all (X0 , Y0 ) ∈ Ω the solutions of (5.10) converge exponentially fast to a stable equilibrium point (X∞ , Y∞ ). Proof 5.4 The proof of (a) and (b) goes mutatis mutandis as for (a) and (b) in Theorem 5.1. Similarly for (c), except that we have yet to check that the equilibria of (5.10) are the diagonal balanced factorizations. The equilibria of (5.10) are just the critical points of the function ΦN : O (X, Y ) → R and the desired result follows from Theorem 3.9(c). To prove (d), first note that we cannot apply, as for Theorem 5.1, the Kempf-Ness theory. Therefore we give a direct argument based on linearizations at the equilibria. The equilibrium points of (5.10) are characterized X∞ = Y∞ Y∞ = D, where D = diag (d1 , . . . , dn ). By linearizing by X∞ (5.10) around an equilibrium point (X∞ , Y∞ ) we obtain ξN − N ηY∞ − N Y∞ η ) ξ˙ = −X∞ (ξ X∞ N + X∞ ξN − N ηY∞ − N Y∞ η ) Y∞ = (ξ X∞ N + X∞
(5.11)
where (ξ, η) = (X∞ Λ, −ΛY∞ ) ∈ T(X∞ ,Y∞ ) O (X, Y ) denotes a tangent vector. Since (X∞ , Y∞ ) are full rank matrices (5.11) is equivalent to the linear ODE on Rn×n Λ˙ = − (Λ X∞ X∞ N + X∞ X∞ ΛN + N ΛY∞ Y∞ + N Y∞ Y∞ Λ ) (5.12) = − (Λ DN + DΛN + N ΛD + N DΛ )
The remainder of the proof follows the pattern of the proof for Theorem 4.18.
190
Chapter 6. Balanced Matrix Factorizations
An alternative way to derive differential equations for balancing is by the following augmented systems of differential equations. They are of a somewhat simpler form, being quadratic matrix differential equations. Theorem 5.5 Let (X, Y ) ∈ Rk×n × Rn× , Λ ∈ Rn×n , N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn > 0 and κ > 0 a scaling factor. Assume rk (X) = rk (Y ) = n. (a) The solutions (X (t) , Y (t)) of X˙ = − XΛ, Y˙ =ΛY, Λ˙ = − κΛ + X X − Y Y ,
X (0) =X0 Y (0) =Y0
(5.13)
Λ (0) =Λ0
and of X˙ = − XΛN , Y˙ =ΛN Y,
X (0) =X0 Y (0) =Y0
Λ˙ N = − κΛN + X XN − N Y Y ,
(5.14)
ΛN (0) =Λ0
exist for all t ≥ 0 and satisfy X (t) Y (t) = X0 Y0 . (b) Every solution (X (t) , Y (t) , ΛN (t)) of (5.14) converges to an equilibrium point (X∞ , Y∞ , Λ∞ ), characterized by Λ∞ = 0 and (X∞ , Y∞ ) is a diagonal balanced factorization. (c) Every solution (X (t) , Y (t) , Λ (t)) of (5.13) converges to the set of equilibria (X∞ , Y∞ , 0). The equilibrium points of (5.13) are characterized by Λ∞ = 0 and (X∞ , Y∞ ) is a balanced factorization of (X0 , Y0 ). Proof 5.6 For any solution (X (t) , Y (t) , Λ (t)) of (5.13), we have d ˙ + X Y˙ = −XΛY + XΛY ≡ 0 X (t) Y (t) = XY dt and thus X (t) Y (t) = X (0) Y (0) for all t. Similarly for (5.14). For any symmetric n × n matrix N let ΩN : O (X0 , Y0 ) × Rn×n → R be defined by ΩN (X, Y, Λ) = ΦN (X, Y ) +
1 2
tr (ΛΛ )
(5.15)
6.5. Flows on the Factors X and Y
191
By Corollary 3.5 the function ΩN : O (X0 , Y0 ) × Rn×n → R has compact sublevel sets and ΩN (X, Y, Λ) ≥ 0 for all (X, Y, Λ). For any solution (X (t) , Y (t) , Λ (t)) of (5.14) we have d ΩN (X (t) , Y (t) , Λ (t)) dt ˙ = tr N X X˙ + N Y˙ Y + ΛΛ = tr (−N X XΛ + N ΛY Y + (−κΛ + X XN − N Y Y ) Λ ) = −κ tr (ΛΛ ) ≤ 0. ˙ N = 0 if and only Thus ΩN decreases along the solutions of (5.14) and Ω if Λ = 0. This shows that ΩN is a weak Lyapunov function for (5.14) and La Salle’s invariance principle (Appendix B) asserts that the solutions of (5.14) converge to a connected invariant subset of O (X0 , Y0 ) × {0}. But a subset of O (X0 , Y0 ) × {0} is invariant under the flow of (5.14) if and only if the elements of the subset satisfy X XN = N Y Y . Specializing now to the case where N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn > 0 we see that the set of equilibria points of (5.14) is finite. Thus the result follows from Proposition C.12.2. If N = In , then we conclude from Proposition C.12.2(b) that the solutions of (5.13) converge to a connected component of the set of equilibria. This completes the proof. As a generalization of the task of minimizing the cost function Φ : O (X, Y ) → R let us consider the minimization problem for the functions Φ(X0 ,Y0 ) : O (X, Y ) → R, 2
Φ(X0 ,Y0 ) = X − X0 + Y − Y0
2
(5.16)
for arbitrary (X0 , Y0 ) ∈ Rk×n × Rn× . Theorem 5.7 Suppose rk (X) = rk (Y ) = n. The gradient flow of Φ(X0 ,Y0 ) : O (X, Y ) → R with respect to the normal Riemannian metric is X˙ =X ((Y − Y0 ) Y − X (X − X0 )) , Y˙ = ((Y − Y0 ) Y − X (X − X0 )) Y,
X (0) =X0 Y (0) =Y0
(5.17)
The solutions (X (t) , Y (t)) exist for all t ≥ 0 and converge to a connected component of the set of equilibrium points, characterized by − Y0 ) = (X∞ − X0 ) X∞ Y∞ (Y∞
(5.18)
192
Chapter 6. Balanced Matrix Factorizations
Proof 5.8 The derivative of Φ(X0 ,Y0 ) at (X, Y ) is
DΦ|(X0 ,Y0 ) (−XΛ, ΛY ) =2 tr − (X − X0 ) XΛ + ΛY (Y − Y0 ) =2 tr ((Y − Y0 ) Y − X (X − X0 )) Λ The gradient of Φ(X0 ,Y0 ) at (X, Y ) is grad Φ(X0 ,Y0 ) (X, Y ) = (−XΛ1 , Λ1 Y ) with :: ;; DΦ|(X0 ,Y0 ) (−XΛ, ΛY ) = grad Φ|(X0 ,Y0 ) (X, Y ) , (−XΛ, ΛY ) =2 tr (Λ1 Λ) and hence
Λ1 = (Y − Y0 ) Y − X (X − X0 ) .
Therefore grad Φ(X0 ,Y0 ) (X, Y ) = (−X ((Y − Y0 ) Y − X (X − X0 )) , ((Y − Y0 ) Y − X (X − X0 )) Y )
which proves (5.17). By assumption, O (X, Y ) is closed and therefore function Φ(X0 ,Y0 ) : O (X, Y ) → R is proper. Thus the solutions of (5.17) for all t ≥ 0 and, using Proposition C.12.2, converge to a connected component of the set of equilibria. This completes the proof. For generic choices of (X0 , Y0 ) there are only a finite numbers of equilibria points of (5.17) but an explicit characterization seems hard to obtain. Also the number of local minima of Φ(X0 ,Y0 ) on O (X, Y ) is unknown and the dynamical classification of the critical points of Φ(X0 ,Y0 ) , i.e. a local phase portrait analysis of the gradient flow (5.17), is an unsolved problem. Problem 5.9 Let rk X = rk Y = n and let N = N ∈ Rn×n be an arbitrary symmetric matrix. Prove that the gradient flow X˙ = −∇X ΦN (X, Y ), Y˙ = −∇Y ΦN (X, Y ), of ΦN : O (X, Y ) → R, ΦN (X, Y ) = 1 2 tr (N (X X + Y Y )) with respect to the induced Riemannian metric (5.2) is X˙ = −XΛN (X, Y ) Y˙ = ΛN (X, Y ) Y %
where
∞
ΛN (X, Y ) =
e−sX
X
(X XN − N Y Y ) e−sY Y ds
0
is the uniquely determined solution of the Lyapunov equation X XΛN + ΛN Y Y = X XN − N Y Y .
6.6. Recursive Balancing Matrix Factorizations
193
Problem 5.10 For matrix triples (X, Y, Z) ∈ Rk×n × Rn×m × Rm× let
O (X, Y, Z) = XS −1 , SY T −1 , T Z | S ∈ GL (n) , T ∈ GL (m) . Prove that O = O (X, Y, Z) is a smooth manifold with tangent spaces
T(X,Y,Z) O = (−XΛ, ΛY − Y L, LZ) | Λ ∈ Rn×n , L ∈ Rm×m . Show that X˙ = − X (X X − Y Y ) Y˙ = (X X − Y Y ) Y − Y (Y Y − ZZ ) Z˙ = (Y Y − ZZ ) Z is a gradient flow
(5.19)
X˙ = − gradX φ (X, Y, Z) , Y˙ = − gradY φ (X, Y, Z) , Z˙ = − gradZ φ (X, Y, Z)
of φ : O (X, Y, Z) → R,
φ (X, Y, Z) = X2 + Y 2 + Z2 .
(Hint: Consider the natural generalization of the normal Riemannian metric (5.3) as a Riemannian metric on O (X, Y, Z)!)
Main Points of Section Gradient flows on positive definite matrices and balancing transformations, including diagonal balancing transformations, can be used to construct balancing matrix factorizations. Under reasonable (generic) conditions, the convergence is exponentially fast.
6.6 Recursive Balancing Matrix Factorizations In order to illustrate, rather than fully develop, discretization possibilities for the matrix factorization flows of this chapter, we first focus on a discretization of the balancing flow (5.4) on matrix factors X, Y , and then discretize the diagonal balancing flow (5.10).
194
Chapter 6. Balanced Matrix Factorizations
Balancing Flow Following the discretization strategy for the double bracket flow of Chapter 2 and certain subsequent flows, we are led to consider the recursion, for all k ∈ N.
Xk+1 =Xk e−αk (Xk Xk −Yk Yk )
Yk+1 =eαk (Xk Xk −Yk Yk ) Yk ,
(6.1)
with step-size selection αk =
1 2λmax (Xk Xk
+ Yk Yk )
(6.2)
initialized by X0 , Y0 such that X0 Y0 = H. Here Xk ∈ Rm×n , Yk ∈ Rn× for all k and H ∈ Rm× Theorem 6.1 (Balancing flow on factors) Consider the recursion (6.1), (6.2) initialized by X0 , Y0 such that X0 Y0 = H and rk X0 = rk Y0 = n. Then (a) for all k ∈ N Xk Yk = H.
(6.3)
(b) The fixed points (X∞ , Y∞ ) of the recursion are the balanced matrix factorizations, satisfying X∞ = Y∞ Y∞ , X∞
H = X∞ Y∞ .
(6.4)
(c) Every solution (Xk , Yk ), k ∈ N, of (6.1) converges to the class of balanced matrix factorizations (X∞ , Y∞ ) of H, satisfying (6.4). (d) The linearization of the flow at the equilibria, in transversal directions to the set of equilibria, is exponentially stable with eigenvalues satisfying, for all i, j 0≤1−
λi (X∞ X∞ ) + λj (X∞ X∞ ) < 1. 2λmax (X∞ X∞ )
(6.5)
Proof 6.2 Part (a) is obvious. For Part (b), first observe that for any specific sequence of positive real numbers αk , the fixed points (X∞ , Y∞ ) of the flow (6.1) satisfy (6.4). To proceed, let us introduce the notation X (α) = Xe−α∆ ,
Y (α) = eα∆ Y,
∆ = X X − Y Y ,
Φ (α) 2 := Φ (X (α) , Y (α)) = tr (X (α) X (α) + Y (α) Y (α)) ,
6.6. Recursive Balancing Matrix Factorizations
195
and study the effects of the step size selection α on the potential function Φ (α). Thus observe that ∆Φ (α) :=Φ (α) − Φ (0)
= tr X X e−2α∆ − I + Y Y e2α∆ − I ∞ j (2α) j j tr X X + (−1) Y Y (−∆) = j! j=1 ∞ 2j−1
(2α) tr (X X − Y Y ) ∆2j−1 = (2j − 1)! j=1 2j
(2α) 2j + tr (X X + Y Y ) ∆ (2j)! ∞ 2j (2α∆) 2j = I tr X X + Y Y − 2α (2j)! j=1 ∞ 2j (2α∆) 1 ≤ . tr X X + Y Y − I α (2j)! j=1 With the selection α∗ of (6.2), deleting subscript k, then ∆Φ (α∗ ) < 0. Consequently, under (6.1) (6.2), for all k Φ (Xk+1 , Yk+1 ) < Φ (Xk , Yk ) . Moreover, Φ (Xk+1 , Yk+1 ) = Φ (Xk , Yk ) if and only if ∞ j=1
1 2j tr (2αk (Xk Xk − Yk Yk )) = 0, (2j)!
or equivalently, since αk > 0 for all k, Xk Xk = Yk Yk . Thus the fixed points (X∞ , Y∞ ) of the flow (6.1) satisfy (6.4), as claimed, and result (b) of the theorem is established. Moreover, the ω-limit set of a solution (Xk , Yk ) is contained in the compact set O (X0 , Y0 )∩{(X, Y ) | Φ (X, Y ) ≤ Φ (X0 , Y0 )}, and thus is a nonempty compact subset of O (X0 , Y0 ). This, together with the property that Φ decreases along solutions, establishes (c). For Part (d), note that the linearization of the flow at (X∞ , Y∞ ) with X∞ X∞ = Y∞ Y∞ is Λk+1 = Λk − α∞ (Λk Y∞ Y∞ + Y∞ Y∞ Λk + X∞ X∞ Λk + Λk X∞ X∞ )
where Λ ∈ Rn×n parametrizes uniquely the tangent space T(X∞ ,Y∞ ) × O (X0 , Y0 ).
Chapter 6. Balanced Matrix Factorizations
196
The skew-symmetric part of Λk corresponds to the linearization in directions tangent to the set of equilibria. Similarly the symmetric part of Λk corresponds to the linearization in directions transverse to the set of equilibria. Actually, the skew-symmetric part of the Λk remains constant. Only the dynamics of the symmetric part of Λk is of interest here. Thus, the linearization with Λ = Λ is, X∞ + X∞ X∞ Λ k ) , Λk+1 =Λk − 2α∞ (Λk X∞ X∞ ) ⊗ I + I ⊗ (X∞ X∞ ))) (vec Λk ) . vec (Λk+1 ) = (I − 2α∞ ((X∞ X∞ = Y∞ Y∞ is square, the set of eigenvalues of Since X∞ ((X∞ X∞ ) ⊗ I + I ⊗ (X∞ X∞ )) is given by the set of eigenvalues X∞ ) + λj (X∞ X∞ )] for all i, j. The result (d) follows. [λi (X∞
Remark 6.3 Other alternative step-size selections are (a)
(b)
−1
α ˆ k = 12 (λmax (Xk Xk ) + λmax (Yk Yk )) −1 2 2 = 12 σmax (Xk ) + σmax (Yk ) . α ˜k =
1 2
2
2
Xk + Yk
Diagonal Balancing Flows The natural generalization of the balancing flow (6.1), (6.2) to diagonal balancing is a discrete-time version of (5.10). Thus consider
Xk+1 =Xk e−αk (Xk Xk N −N Yk Yk )
Yk+1 =eαk (Xk Xk N −N Yk Yk ) Yk ,
(6.6)
where αk =
1 (6.7) 2 Xk Xk N − N Yk Yk 2 (X X − Y Y ) N + N (X X − Y Y ) k k k k k k k k + 1 , × log 2 2 4 N Xk Xk N − N Yk Yk Xk + Yk
initialized by full rank matrices X0 , Y0 such that X0 Y0 = H. Again Xk ∈ Rm×n , Yk ∈ Rn× for all k and H ∈ Rm× . We shall consider N = diag (µ1 , . . . µn ) with µ1 > µ2 · · · > µn .
6.6. Recursive Balancing Matrix Factorizations
197
Theorem 6.4 Consider the recursion (6.6), (6.7) initialized by (X0 , Y0 ) such that X0 Y0 = H and rk X0 = rk Y0 = n. Assume that N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn . Then (a) for all k ∈ N Xk Yk = H.
(6.8)
(b) The fixed points X∞ , Y∞ of the recursion are the diagonal balanced matrix factorizations satisfying X∞ = Y∞ Y∞ = D, X∞
(6.9)
with D = diag (d1 , . . . , dn ). (c) Every solution (Xk , Yk ), k ∈ N, of (6.6), (6.7) converges to the class of diagonal balanced matrix factorizations (X∞ , Y∞ ) of H. The singular values of H are d1 , . . . , dn . (d) Suppose that the singular values of H are distinct. There are exactly 2n n! fixed points of (6.6). The set of asymptotically stable fixed points is characterized by (6.9) with d1 < · · · < dn . The linearization of the flow at the fixed points is exponentially stable. Proof 6.5 The proof follows that of Theorem 5.3, and Theorem 6.1. Only its new aspects are presented here. The main point is to derive the stepsize selection (6.7) from an upper bound on ∆ΦN (α). Consider the linear operator A F : Rn×n → Rn×n defined by AF (B) = F B + B F . Let m−1 (B) , A0F (B) = B, be defined recursively for m ∈ N. Am F (B) = AF AF In the sequel we are only interested in the case where B = B is symmetric, so that AF maps the set of symmetric matrices to itself. The following identity is easily verified by differentiating both sides
eαF BeαF = Then
∞ αm m A (B) . m! F m=0
∞ αm m αF αF AF (B) − αAF (B) − B = e Be m! m=2 ≤
∞ m |α| Am F (B) m! m=2
∞ m |α| m 2 F m B m! m=2 = e2|α|F − 2 |α| F − 1 B .
≤
198
Chapter 6. Balanced Matrix Factorizations
Setting B = X X and F = X XN − N Y Y , then simple manipulations give that ∆ΦN (α) = tr N e−αF X Xe−αF + αAF (X X) − X X + N eαF Y Y eαF − αAF (Y Y ) − Y Y − α tr (F (F + F )) . Applying the exponential bound above, then α 2 ∆ΦN (α) ≤ − F + F 2 2 2 + e2|α|F − 2 |α| F + 1 N X + Y =:∆ΦU N (α) . 2
d U U Since dα 2 ∆ΦN (α) > 0 for α ≥ 0, the upper bound ∆ΦN (α) is a strictly convex function of α ≥ 0. Thus there is a unique minimum α∗ ≥ 0 of d U ∆ΦU N (α) obtained by setting dα ΦN (α) = 0. This leads to 2 F + F 1 + 1 log α∗ = 2 2 2 F 4 F N X + Y
and justifies the variable step-size selection (6.7). A somewhat tedious argument shows that—under the hypothesis of the theorem—the linearization of (6.6) at an equilibrium point (X∞ , Y∞ ) satisfying (6.9) is exponentially stable. We omit these details. Remark 6.6 The discrete-time flows of this section on matrix factorization for balancing inherit the essential properties of the continuous-time flows, namely exponential convergence rates. Remark 6.7 Actually, the flows of this section can be viewed as hybrid flows in that a linear system continuous-time flow can be used to calculate the matrix exponential and at discrete-time instants a recursive update of the linear system parameters is calculated. Alternatively, matrix Pad´e approximations could perhaps be most in lieu of matrix exponentials.
Main Points of Section There are discretizations of the continuous-time flows on matrix factors for balancing, including diagonal balancing. These exhibit the same convergence properties as the continuous-time flows, including exponential convergence rates.
6.6. Recursive Balancing Matrix Factorizations
199
Notes for Chapter 6 Matrix factorizations (1.1) are of interest in many areas. We have already mentioned an application in system theory, where factorizations of Hankel matrices by the observability and controllability matrices are crucial for realization theory. As a reference to linear system theory we mention Kailath (1980). Factorizations (1.1) also arise in neural network theory; see e.g. Baldi and Hornik (1989). In this chapter techniques from invariant theory are used in a substantial way. For references on invariant theory and algebraic group actions we refer to Weyl (1946), Dieudonn´e and Carrell (1971), Kraft (1984) and Mumford, Fogarty and Kirwan (1994). It is a classical result from invariant theory and linear algebra that any two full rank factorizations of a n × m-matrix A = XY = X1 Y1 (X, Y ) , (X1 , Y1 ) ∈ Rn×r × Rr×m satisfy (X1 , Y1 ) = XT −1 , T Y for a unique invertible r × r-matrix T . In fact, the first fundamental theorem of invariant theory states that any polynomial in the coefficients of X and Y which is invariant under the GL (r, R) group action (X, Y ) → XT −1, T Y can be written as a polynomial in the coefficient of XY . See Weyl (1946) for a proof. The geometry of orbits of a group action plays a central rˆ ole in invariant theory. Of special significance are the closed orbits, as these can be separated by polynomial invariants. For a thorough study of the geometry of the orbits O (X, Y ) including a proof of Lemma 3.3 we refer to Kraft (1984). The standard notions from topology are used throughout this chapter. An algebraic group such as GL (n, C) however also has a different, coarser topology: the Zariski topology. It can be shown that an orbit of an algebraic group action GL (n, C) × CN → Cn is a Zariski-closed subset of CN if and only if it is a closed subset of CN in the usual sense of Euclidean topology. Similar results hold for real algebraic group actions; see Slodowy (1989). The Hilbert-Mumford criterion, Mumford et al. (1994) and Kraft (1984), yields an effective test for checking whether or not an orbit is closed. The Kempf-Ness theorem can be generalized to algebraic group actions of reductive groups. See Kempf and Ness (1979) for proofs in the complex case and Slodowy (1989) for a proof in the real case. Although this has not been made explicitly in the chapter, the theory of Kempf and Ness can be linked to the concept of the moment map arising in symplectic geometry and Hamiltonian mechanics. See Guillemin and Sternberg (1984). The book Arnold (1989) is an excellent source for classical mechanics. The problem of minimizing the trace function ϕ : P (n) → R defined by (4.5) over the set of positive definite symmetric matrices has been consid-
200
Chapter 6. Balanced Matrix Factorizations
ered by Flanders (1975) and Anderson and Olkin (1978). For related work we refer to Wimmer (1988). A proof of Corollary 4.3 is due to Flanders (1975). The simple proof of Corollary 4.3, which is based on the Kempf-Ness theorem is due to Helmke (1993a).
CHAPTER
7
Invariant Theory and System Balancing 7.1 Introduction In this chapter balanced realizations arising in linear system theory are discussed from a matrix least squares viewpoint. The optimization problems we consider are those of minimizing the distance 2 (A0 , B0 , C0 ) − (A, B, C) of an initial system (A0 , B0 , C0 ) from a manifold M of realizations (A, B, C). Usually M is the set of realizations of a given transfer function but other situations are also of interest. If B0 , C0 and B, C are all zero then we obtain the least squares matrix estimation problems studied in Chapters 1–4. Thus the present chapter extends the results studied in earlier chapters on numerical linear algebra to the system theory context. The results in this chapter have been obtained by Helmke (1993a). In the sequel, we will focus on the special situation where (A0 , B0 , C0 ) = (0, 0, 0), in which case it turns out that we can replace the norm functions by a more general class of norm functions, including p-norms. This is mainly for technical reasons. A more important reason is that we can apply the relevant results from invariant theory without the need for a generalization of the theory. Specifically, we consider continuous-time linear dynamical systems
x˙ (t) =Ax (t) + Bu (t) y (t) =Cx (t) ,
t ∈ R,
(1.1)
202
Chapter 7. Invariant Theory and System Balancing
or discrete-time linear dynamical systems of the form xk+1 =Axk + Buk ,
k ∈ N0 ,
yk =Cxk ,
(1.2)
where (A, B, C) ∈ Rn×n × Rn×m × Rp×n and x, u, y are vectors of suitable dimensions. The systems (1.1) or (1.2) are called asymptotically stable if the eigenvalues of A are in the open left half plane C− = {z ∈ C | Re (z) < 0} or in the unit disc D = {z ∈ C | |z| < 1}, respectively. Given any asymptotically stable system (A, B, C), the controllability Gramian Wc and observability Gramian Wo are defined in the discrete and continuous-time case, respectively, by Wc =
∞
Ak BB (A ) ,
k=0 % ∞
Wc =
k
tA
tA
e BB e
Wo =
∞
(A ) C CAk , k
k=0 % ∞
dt,
Wo =
0
(1.3) e
tA
tA
C Ce dt,
0
For unstable systems, finite Gramians are defined, respectively, by Wc(N )
=
N k=0 % T
Wc (T ) =
k
k
A BB (A ) ,
etA BB etA dt,
Wo(N )
=
N k=0 % T
Wo (T ) =
0
(A ) C CAk , k
(1.4)
etA C CetA dt,
0
for N ∈ N and T > 0 a real number. Thus (A, B, C) is controllable or observable if and only Wc > 0 or Wo > 0. In particular, the ‘sizes’ of Wc and Wo as expressed e.g. by the norms Wc , W0 or by the eigenvalues of Wc , Wo measure the controllability and observability properties of (A, B, C). Functions such as f (A, B, C) = tr (Wc ) + tr (Wo ) or, more generally fp (A, B, C) = tr (Wcp ) + tr (Wop ) ,
p ∈ N,
measure the controllability and observability properties of (A, B, C). In the sequel we concentrate on asymptotically stable systems and the associated infinite Gramians (1.3), although many results also carry over to finite Gramians (1.4). A realization (A, B, C) is called balanced if the controllability and observability Gramians are equal Wc = Wo
(1.5)
7.1. Introduction
203
and is called diagonal balanced if Wc = Wo = D For a stable system (A, B, C), a realization in which the Gramians are equal and diagonal as Wc = Wo = diag (σ1 , . . . , σn ) is termed a diagonally balanced realization. For a minimal, that is controllable and observable, realization (A, B, C), the singular values σi are all positive. For a non minimal realization of McMillan degree m < n, then σm+i = 0 for i > 0. Corresponding definitions and results apply for Gramians defined on finite intervals T . Also, when the controllability and observability Gramians are equal but not necessarily diagonal the realizations are termed balanced. Such realizations are unique only to within orthogonal basis transformations. Balanced truncation is where a system (A, B, C) with A ∈ Rn×n , B ∈ n×m , C ∈ Rp×n is approximated by an rth order system with r < n and R A ] of σr > σr+1 , the last (n − r) rows of (A, B) and last (n − r) columns of [ C a balanced realization are set to zero to form a reduced rth order realization (Ar , Br , Cr ) ∈ Rr×r × Rr×m × Rp×r . A theorem of Pernebo and Silverman (1982) states that if (A, B, C) is balanced and minimal, and σr > σr+1 , then the reduced r-th order realization (Ar , Br , Cr ) is also balanced and minimal. Diagonal balanced realizations of asymptotically stable transfer functions were introduced by Moore (1981) and have quickly found widespread use in model reduction theory and system approximations. In such diagonal balanced realizations, the controllability and observability properties are reflected in a symmetric way thus, as we have seen, allowing the possibility of model truncation. However, model reduction theory is not the only reason why one is interested in balanced realizations. For example, in digital control and signal processing an important issue is that of optimal finite-wordlength representations of linear systems. Balanced realizations or other related classes of realizations (but not necessarily diagonal balanced realizations!) play an important rˆ ole in these issues; see Mullis and Roberts (1976) and Hwang (1977). To see the connection with matrix least squares problems we consider the manifolds OC (A, B, C)
= T AT −1, T B, CT −1 ∈ Rn×n × Rn×m × Rp×n | T ∈ GL (n)
204
Chapter 7. Invariant Theory and System Balancing
for arbitrary n, m, p. If (A, B, C) is controllable and observable then OC (A, B, C) is the set of all realizations of a transfer function −1
G (z) = C (zI − A)
B ∈ Rp×m (z) ,
(1.6)
see Kalman’s realization theorem; Section B.5. −1 Given any p × m transfer function G (z) = C (zI − A) B we are interested in the minima, maxima and other critical points of smooth functions f : OC (A, B, C) → R defined on the set OC (A, B, C) of all realizations of G (z). Balanced and diagonal balanced realizations are shown to be characterized as the critical points of the respective cost functions f : OC (A, B, C) →R,
f (A, B, C) = tr (Wc ) + tr (Wo )
(1.7)
and fN : OC (A, B, C) →R,
fN (A, B, C) = tr (N Wc ) + tr (N Wo )
(1.8)
for a symmetric matrix N = N . Suitable tools for analyzing the critical point structure of such cost functions come from both invariant theory and several complex variable theory. It has been developed by Kempf and Ness (1979) and, more recently, by Azad and Loeb (1990). For technical reasons the results in this chapter have to be developed over the field of complex numbers. Subsequently, we are able to deduce the corresponding results over the field of real numbers, this being of main interest. Later chapters develop results building on the real domain theory in this chapter.
7.2 Plurisubharmonic Functions In this section we recall some basic facts and definitions from several complex variable theory concerning plurisubharmonic functions. One reason why plurisubharmonic functions are of interest is that they provide a coordinate free generalization of convex functions. They also play an important rˆ ole in several complex variable theory, where they are used as a tool in the solution of Levi’s problem concerning the characterization of holomorphy domains in Cn . For textbooks on several complex variable theory we refer to Krantz (1982) and Vladimirov (1966). Let D be an open, connected subset of C = R2 . An upper semicontinuous function f : D → R ∪ {−∞} is called subharmonic if the mean value inequality % 2π
1 f a + reiθ dθ f (a) ≤ (2.1) 2π 0
7.2. Plurisubharmonic Functions
205
holds for every a ∈ D and each open disc B (a, r) ⊂ B (a, r) ⊂ D; B (a, r) = {z ∈ C | |z − a| < r}. More generally, subharmonic functions can be defined for any open subset D of Rn . If f : D → R is twice continuously differentiable then f is subharmonic if and only if ∆f =
n ∂2f i=1
∂x2i
≥0
(2.2)
on D. For n = 1, condition (2.2) is just the usual condition for convexity of a function. Thus, for n = 1, the class of subharmonic functions on R coincides with the class of convex functions. Let X ⊂ Cn be an open and connected subset of Cn . An upper semicontinuous function f : X → R ∪ {−∞} is called plurisubharmonic (plush) if the restriction of f to any one-dimensional complex disc is subharmonic, i.e. if for all a, b ∈ Cn and z ∈ C with a + bz ∈ X the function z → f (a + bz)
(2.3)
is subharmonic. The class of plurisubharmonic functions constitutes a natural extension of the class of convex functions: Any convex function on X is plurisubharmonic. We list a number of further properties of plurisubharmonic functions:
Properties of Plurisubharmonic Functions p
(a) Let f : X → C be holomorphic. Then the functions log |f | and |f | , p > 0 real, are plurisubharmonic (plush). (b) Let f : X → R be twice continuously differentiable. The f is plush if and only if the Levi form of f 2 ∂ f (2.4) L (f ) := ∂zi ∂ z¯j is positive semidefinite on X. We say that f : X → R is strictly plush if the Levi form L (f ) is positive definite, i.e. L (f ) > 0, on X. (c) Let f, g : X → R be plush, a ≥ 0 real. Then the functions f + g and a · f are plush. (d) Let f : X → R be plush and let ϕ : R → R be a convex and monotonically increasing function. Then the composed map ϕ ◦ f : X → R is plush.
206
Chapter 7. Invariant Theory and System Balancing
(e) Let ϕ : X → Y be holomorphic and f : Y → R be plush. Then f ◦ ϕ : X → R is plush. By property this property the notion of plurisubharmonic functions can be extended to functions on any complex manifold. We then have the following important property: (f) Let M ⊂ X be a complex submanifold and f : X → R (strictly) plush. Then the restriction f |M : M → R of f to M is strictly plush. (g) Any norm on Cn is certainly a convex function of its arguments and therefore plush. More generally we have by Property (f): (h) Let · be any norm on Cn and let X be a complex submanifold of Cn . Then for every a ∈ Cn the distance function φa : X → R, φa (x) = x − a is plurisubharmonic. Let , be any (positive definite) Hermitian inner product on Cn . Thus for all z = (z1 , . . . , zn ) and w = (w1 , . . . , wn ) ∈ Cn we have z, w =
n
aij zi w ¯j
(2.5)
i,j=1
where A = (aij ) ∈ Cn×n a uniquely determined positive definite complex 1/2 is the induced norm on Cn Hermitian n × n-matrix. If z := z, z n then the Levi form of f : C → R , f (z) = z2 , is L (f ) = A > 0 and therefore f : Cn → R is strictly plurisubharmonic. More generally, if · is the induced norm of positive definite Hermitian inner product on Cn and X ⊂ Cn is a complex analytic submanifold, then the distance functions 2 φa : X → R, φa (x) = x − a , are strictly plurisubharmonic, a ∈ X. be the Frobenius Problem 2.1 For A ∈ Cn×n let AF := [tr (AA∗ )] 2 n×n norm. Show that f : C → R, f (A) = AF , is a strictly plurisubharmonic function. 1/2
Problem 2.2 Show that the condition number, in terms of Frobenius norms, K (A) = AF A−1 F defines a strictly plurisubharmonic function K 2 : GL (n, C) → R, K 2 (A) = 2 2 AF A−1 F , on the open subset of invertible complex n × n - matrices.
Main Points of Section Plurisubharmonic functions extend the class of subharmonic functions for a single complex variable to several complex variables. They are also a
7.3. The Azad-Loeb Theorem
207
natural coordinate-free extension of the class of convex functions and have better structural properties. Least squares distance functions from a point to a complex analytic submanifold are always strictly plurisubharmonic. Strictly plurisubharmonic functions are characterized by the Levi form being positive definite.
7.3 The Azad-Loeb Theorem We derive a generalization of the Kempf-Ness theorem, employed in the previous chapter, by Azad and Loeb (1990). The result plays a central rˆ ole in our approach to balanced realizations. Let GL (n, C) denote the group of all invertible complex n × n matrices and let U (n, C) ⊂ GL (n, C) denote the subgroup consisting of all complex unitary matrices Θ characterized by ΘΘ∗ = Θ∗ Θ = In . A holomorphic group action of GL (n, C) on a finite dimensional complex vector space V is a holomorphic map α : GL (n, C) × V → V,
(g, x) → g · x
(3.1)
such that for all x ∈ V and g, h ∈ GL (n, C) g · (h · x) = (gh) · x,
e·x = x
where e = In denotes the n × n identity matrix. Given an element x ∈ V , the subset of V GL (n, C) · x = {g · x | g ∈ GL (n, C)}
(3.2)
is called an orbit of α. Since GL (n, C) is a complex manifold, each orbit GL (n, C) · x is a complex manifold which is biholomorphic to the complex homogeneous space GL (n, C) /H, where H = {g ∈ GL (n, C) | g · x = x} is the stabilizer subgroup of x, Appendix C. A function ϕ : GL (n, C) · x → R on a GL (n, C) - orbit GL (n, C) · x is called unitarily invariant if for all g ∈ GL (n, C) and all unitary matrices Θ ∈ U (n, C) ϕ (Θg · x) = ϕ (g · x)
(3.3)
holds. We are interested in the critical points of unitarily invariant plurisubharmonic functions, defined on GL (n, C) - orbits of a holomorphic group action α. The following result (except for Part (b)) is a special case of a more general result due to Azad and Loeb (1990).
208
Chapter 7. Invariant Theory and System Balancing
Theorem 3.1 Let ϕ : GL (n, C) · x → R be a unitarily invariant plurisubharmonic function defined on a GL (n, C)-orbit GL (n, C) · x of a holomorphic group action α. Suppose that a global minimum of ϕ exists on GL (n, C) · x. Then (a) The local minima of ϕ : GL (n, C) · x → R coincide with the global minima. If ϕ is smooth then all its critical points are global minima. (b) The set of global minima is connected. (c) If ϕ is a smooth strictly plurisubharmonic function on GL (n, C) · x, then any critical point of ϕ is a point where ϕ assumes its global minimum. The set of global minima is a single U (n, C) − orbit. Proof 3.2 Let GL (n, C) = U (n, C) · P (n) denote the polar decomposition. Here P (n) denotes the set of positive definite Hermitian matrices and U (n, C) = eiΩ | Ω∗ = −Ω is the group of unitary matrices. Suppose that x0 ∈ GL (n, C) · x is a local minimum (a critical point, respectively) of ϕ. By Property (e) of plurisubharmonic functions, the scalar function
z∈C (3.4) φ (z) = ϕ eizΩ · x0 , is (pluri)subharmonic for all Ω∗ = −Ω. Since eizΩ is a unitary matrix for purely imaginary z, the invariance property of ϕ implies that φ (z) depends only on the real part of z, Re (z). Thus for t real, t → φ (t) is a convex function. Thus ϕ eitΩ · x0 ≥ ϕ (x0 ) for all t ∈ R and all Ω∗ = −Ω. By the unitary invariance of ϕ it follows that ϕ (g · x0 ) ≥ ϕ (x0 ) for all g ∈ GL (n, C). This proves (a). To prove (b) suppose that x0 , x1 ∈ GL (n, C) · x are global minima of ϕ. Thus for x1 = ueiΩ · x0 with Ω∗ = −Ω, u ∈ U (n, C), we have
ϕ ueitΩ · x0 = ϕ (x0 )
for t = 0, 1. By the above argument, t → ϕ ueitΩ · x0 is convex and therefore ϕ (t) = ϕ (0) for all 0 ≤ t ≤ 1. Since U (n, C) is connected there exists a continuous path [0, 1] → U (n, C), t → ut , with u0 = In , u1 = u. Thus t →ut eitΩ · x0 is a continuous path connecting x0 with x1 and ϕ ut eitΩ · x0 = ϕ (x0 ) for all 0 ≤ t ≤ 1. The result follows. For (c) note that everything has been proved except for the last statement. But this follows immediately from Lemma 2 in Azad and Loeb (1990). Since any norm function induced by an unitarily invariant positive definite Hermitian inner product on V is strictly plurisubharmonic on any
7.4. Application to Balancing
209
GL (n, C) − orbit of a holomorphic group action x, Part (b) of the KempfNess Theorem (6.2.1) (in the complex case) follows immediately from the Azad-Loeb result.
Main Points of Section The theorem of Azad and Loeb generalizes the scope of the Kempf-Ness theorem from unitarily invariant Hermitian norms to arbitrary unitarily invariant strictly plurisubharmonic functions. This extension is crucial for the subsequent application to balancing.
7.4 Application to Balancing We now show how the above result can be used to study balanced realizations. The work is based on Helmke (1993a). Consider the complex vector space of triples (A, B, C)
LC (n, m, p) := (A, B, C) ∈ Cn×n × Cn×m × Cp×n . (4.1) The complex Lie group GL (n, C) of complex invertible n × n matrices acts on LC (n, m, p) via the holomorphic group action σ : GL (n, C) × LC (n, m, p) → LC (n, m, p)
(S, (A, B, C)) → SAS −1 , SB, CS −1 .
(4.2)
The orbits of σ OC (A, B, C) =
SAS −1 , SB, CS −1 | S ∈ GL (n, C)
(4.3)
are complex homogenous spaces and thus complex submanifolds of the space LC (n, m, p). Of course, by a fundamental theorem in linear systems theory (Kalman’s realization theorem; Appendix B) the orbits (4.3) of controllable and observable triples (A, B, C) are in one-to-one correspondence with strictly proper complex rational transfer functions G (z) ∈ C (z)p×m via OC (A, B, C) ↔ G (z) = C (zI − A)
−1
B.
(4.4)
A function f : OC (A, B, C) → R is called unitarily invariant if for all unitary matrices S ∈ U (n, C), SS ∗ = In ,
(4.5) f SAS −1 , SB, CS −1 = f (A, B, C)
210
Chapter 7. Invariant Theory and System Balancing
holds. We are interested in the critical point structure of smooth, unitary invariant plurisubharmonic functions f : OC (A, B, C) → R on GL (n, C)orbits OC (A, B, C). A particular case of interest is where the function f : OC (A, B, C) → R is induced from a suitable norm on LC (n, m, p). Thus let , denote a positive definite Hermitian inner product on the C-vector space LC (n, m, p). The induced Hermitian norm of (A, B, C) is defined by 2
(A, B, C) = (A, B, C) , (A, B, C)
(4.6)
A Hermitian norm (4.6) is called unitarily invariant, if SAS −1 , SB, CS −1 = (A, B, C)
(4.7)
holds for all unitary transformations S, with SS ∗ = In , and (A, B, C) ∈ LC (n, m, p). Any Hermitian norm (4.6) defines a smooth strictly plurisubharmonic function φ : OC (A, B, C) →R 2
SAS −1 , SB, CS −1 → SAS −1 , SB, CS −1
(4.8)
on OC (A, B, C). In the sequel we fix a strictly proper transfer function G (z) ∈ Cp×m (z) of McMillan degree n and an initial controllable and observable realization (A, B, C) ∈ LC (n, m, p) of G (z): G (z) = C (zI − A)
−1
B.
(4.9)
Thus the GL (n, C)-orbit OC (A, B, C) parametrizes the set of all (minimal) realizations of G (z). 2 Our goal is to study the variation of the norm SAS −1 , SB, CS −1 as S varies in GL (n, C). More generally, we want to obtain answers to the questions: (a) Given a function f : OC (A, B, C) → R, does there exists a realization of G (z) which minimizes f ? (b) How can one characterize the set of realizations of a transfer function which minimize f : OC (A, B, C) → R? As we will see, the theorems of Kempf-Ness and Azad-Loeb give a rather general solution to these questions. Let f : OC (A, B, C) → R be a smooth function on OC (A, B, C) and let · denote a Hermitian norm defined on LC (n, m, p).
7.4. Application to Balancing
211
GL(n, C ) – orbit
FIGURE 4.1. Norm minimality
A realization
(F, G, H) = S0 AS0−1 , S0 B, CS0−1
(4.10)
−1
of a transfer function G (s) = C (sI − A) B is called norm balanced and function balanced respectively, if the function φ : GL (n, C) →R 2 S → SAS −1 , SB, CS −1
(4.11)
or the function F : GL (n, C) →R
S →f SAS −1 , SB, CS −1
(4.12)
respectively, has a critical point at S = S0 ; i.e. the Fr´echet derivative Dφ|S0 =0
(4.13)
DF |S0 =0
(4.14)
respectively,
vanishes. (F, G, H) is called norm minimal or function minimizing, if φ (S0 ), respectively, F (S0 ) is a global minimum for the function (4.11) or (4.12) on GL (n, C), see Figure 4.1 We need the following characterization of controllable and observable realizations. Using the terminology of geometric invariant theory, these are shown to points for the similarity action
be the GL (n, C)-stable (A, B, C) → SAS −1 , SB, CS −1 .
212
Chapter 7. Invariant Theory and System Balancing
Lemma 4.1 (A, B, C) ∈ LC (n, m, p) is controllable and observable if and only if the following conditions are satisfied: (a) The similarity orbit OC (A, B, C) :=
SAS −1 , SB, CS −1 | S ∈ GL (n, C)
is a closed subset of LC (n, m, p). (b) dimC OC (A, B, C) = n2 . Proof 4.2 The necessity of (b) is obvious, since GL (n, C) acts freely on controllable and observable triples via similarity. The necessity of (a) follows from realization theory, since OC (A, B, C) is a fibre of the continuous map H : LC (n, m, p) →
∞ <
Cp×m
i=1
(F, G, H) → HF i G | i ∈ N0
(4.15)
= p×m where ∞ is endowed with the product topology. i=1 C To prove the sufficiency, let us assume that, e.g., (A, B) is not controllable while Conditions (a) and (b) are satisfied. Without loss of generality A11 A12 B1 , C = [C1 , C2 ] A= , B= 0 A22 0 and (A11 , B1 ) controllable, Aii ∈ Cni ×ni for i = 1, 2. Consider the oneparameter group of transformations In1 0 St := ∈ GL (n, C) 0 t−1 In2
for t = 0. Then (At , Bt , Ct ) := St ASt−1 , St B, CSt−1 ∈ OC (A, B, C) with A11 tA22 B1 , Bt := , Ct := [C1 , tC2 ] At = 0 A22 0 Since OC (A, B, C) is closed A11 (A0 , B0 , C0 ) = 0
0 B1 , [C1 , 0] ∈ OC (A, B, C) , 0 A22
the stabilizer of which is St , t ∈ C∗ , there is a contradiction to Condition (b). Thus (A, B) must be controllable, and similarly (A, C) must be observable. This proves the lemma.
7.4. Application to Balancing
213
The above lemma shows that the orbits OC (A, B, C) of controllable and observable realizations (A, B, C) are closed subsets of LC (n, m, p). It is still possible for other realizations to generate closed orbits with complex dimension strictly less than n2 . The following result characterizes completely which orbits OC (A, B, C) are closed subsets of LC (n, m, p). Let R (A, B) = (B, AB, . . . , An B) C CA O (A, C) = . ..
(4.16)
(4.17)
CAn denote the controllability and observability matrices of (A, B, C) respectively and let H (A, B, C) =O (A, C) R (A, B) CB CAB . . . .. CAB . = . . . CAn B
...
CAn B .. . 2n CA B
(4.18)
denote the corresponding (n + 1) × (n + 1) block Hankel matrix. Lemma 4.3 A similarity orbit OC (A, B, C) is a closed subset of LC (n, m, p) if and only if there exists (F, G, H) ∈ OC (A, B, C) of the form F11 0 G1 , H = [H1 , 0] F = , G= 0 0 F22 such that (F11 , G1 , H1 ) is controllable and observable and F22 is diagonalizable. In particular, a necessary condition for OC (A, B, C) to be closed is rk R (A, B) = rk O (A, C) = rk H (A, B, C)
(4.19)
Proof 4.4 Suppose OC (A, B, C) is a closed subset of LC (n, m, p) and assume, for example, that (A, B) is not controllable. Then there exists (F, G, H) ∈ OC (A, B, C) with F11 F12 G1 , H = [H1 , H2 ] F = (4.20) , G= 0 F22 0
214
Chapter 7. Invariant Theory and System Balancing
and (F11 , G1 ) controllable. The same argument as for the proof of Lemma 4.1 shows that F12 = 0, H2 = 0. Moreover, arguing similarly for (A, C) we may assume that (F11 , H1 ) is observable. Suppose F22 is not diagonalizable. Then there exists a convergent sequence of matrices
(n) (n) (n) ∞ F22 | n ∈ N with F22 = limn→∞ F22 , such that F22 is similar to F22 ∞ for all n ∈ N but F22 is not. As OC (A, B, C) is closed, it follows that for F
(n)
=
F11
0
0
F22
(n)
one has F (n) , G, H ∈ OC (A, B, C) for all n ∈ N. Moreover,
(∞)
F , G, H = limn→∞ F (n) , G, H ∈ OC (A, B, C). But this implies that ∞ is similar to F22 , therefore leading to a contradiction. This proves the F22 necessity part of the theorem. Conversely, suppose that (A, B, C) is given as A=
A11 0
0 A22
,
B=
B1 0
,
C = [C1 , 0]
with (A11 , B1 , C1 ) controllable and observable, and A22 diagonalizable. Then suppose OC (A, B, C) is not a closed subset of LC (n, m, p). By the closed orbit lemma, Kraft (1984), there exists a closed orbit OC (F, G, H) ⊂ OC (A, B, C). Using the same arguments as above for the necessary part of the lemma we may assume without loss of generality that 0 G1 F11 , G= , H = [H1 , 0] F = 0 F22 0 with (F11 , G1 , H1 ) controllable and observable and F22 diagonalizable. As the entries of the Hankel matrix H (A, B, C) depend continuously on A, B, C, we obtain
=
H (F11 , G1 , H1 ) =H (F, G, H) H (A11 , B1 , C1 ) =H (A, B, C) Thus the Hankel matrices of (F11 , G1 , H1 ) and (A11 , B1 , C1 ) coincide and therefore, by Kalman’s realization theorem,
(F11 , G1 , H1 ) = S1 A11 S1−1 , S1 B1 , C1 S1−1
7.4. Application to Balancing
215
by an invertible matrix S1 . Since F is in the closure of the similarity orbit of A, their characteristic polynomials coincide: det (sI − F ) = det (sI − A). Thus det (sI − F ) det (sI − F11 ) det (sI − A) = det (sI − A11 ) = det (sI − A22 )
det (sI − F22 ) =
and F22 , A22 must have the same eigenvalues. Both matrices are diagonalizable and therefore F22 is similar to A22 . Therefore (F, G, H) ∈ OC (A, B, C) and OC (A, B, C) = OC (F, G, H) is closed. This contradiction completes the proof. The following theorems are our main existence and uniqueness results for function minimizing realizations. They are immediate consequences of Lemma 4.1 and the Azad-Loeb Theorem 3.1 and the Kempf-Ness Theorem 6.2.1, respectively. Recall that a continuous function f : X → R on a topological space X is called proper if the inverse image f −1 ([a, b]) of any compact interval [a, b] ⊂ R is a compact subset of X. For every proper function f : X → R the image f (X) is a closed subset of R. −1
Theorem 4.5 Let (A, B, C) ∈ LC (n, m, p) with G (z) = C (zI − A) B. Let f : OC (A, B, C) → R+ with R+ = [0, ∞) be a smooth unitarily invariant, strictly plurisubharmonic function on OC (A, B, C) which is proper. Then: (a) There exists a global minimum of f in OC (A, B, C). (b) A realization (F, G, H) ∈ OC (A, B, C) is a critical point of f if and only if it minimizes f . (c) If (A1 , B1 , C1 ) , (A2 , B2 , C2 ) ∈ OC (A, B, C) are minima of f , then there exists a unitary transformation S ∈ U (n, C) such that
(A2 , B2 , C2 ) = SA1 S −1 , SB1 , C1 S −1 . S is uniquely determined if (A, B, C) is controllable and observable. Proof 4.6 Part (a) follows because any proper function f : OC (A, B, C) → R+ possesses a minimum. Parts (b) and (c) are immediate consequences of the Azad-Loeb Theorem.
216
Chapter 7. Invariant Theory and System Balancing
A direct application of Theorem 4.5 to the question of norm balancing yields the following result. 2
Theorem 4.7 Let · be a norm on LC (n, m, p) which is induced from an invariant unitarily Hermitian inner product and let G (z) ∈ Cp×m (z) denote a strictly proper rational transfer function of McMillan degree n. (a) There exists a norm minimal realization (A, B, C) of G (z). (b) A controllable and observable realization (A, B, C) ∈ LC (n, m, p) of G (z) is norm minimal if and only if it is norm balanced. (c) If (A1 , B1 , C1 ) , (A2 , B2 , C2 ) ∈ LC (n, m, p) are controllable and observable norm minimal realizations of G (z), then there exists a uniquely determined unitary transformation S ∈ U (n, C) such that
(A2 , B2 , C2 ) = SA1 S −1 , SB1 , C1 S −1 . Proof 4.8 Let (A, B, C) be a controllable and observable realization of G (z). By Lemmma 4.1, the similarity orbit OC (A, B, C) is a closed complex submanifold of LC (n, m, p). Thus the norm function (A, B, C) → 2 (A, B, C) defines a proper strictly plurisubharmonic function on OC (A, B, C). The result now follows immediately from Theorem 4.5 Corollary 4.9 Let G (z) ∈ Rp×m (z) denote a strictly proper real rational 2 transfer function of McMillan degree n and let · be a unitarily invariant Hermitian norm on the complex vector space LC (n, m, p). Similarly, let f : OC (A, B, C) → R+ be a smooth unitarily invariant strictly plurisubharmonic function on the complex similarity orbit
which is proper and in¯ H ¯ = f (F, G, H) for variant under complex conjugation. That is, f F¯ , G, all (F, G, H) ∈ OC (A, B, C). Then: (a) There exists a real norm (function) minimal realization (A, B, C) of G (z). (b) A real controllable and observable realization (A, B, C) ∈ LC (n, m, p) of G (z) is norm (function) minimal if and only if it is norm (function) balanced. (c) If (A1 , B1 , C1 ) , (A2 , B2 , C2 ) ∈ LC (n, m, p) are real controllable and observable norm (function) minimal realizations of G (z), then there exists a uniquely determined real orthogonal transformation S ∈ into a real realization O (n, R) such that (A1 , B1 , C1 ) transforms (A2 , B2 , C2 ) = SA1 S −1 , SB1 , C1 S −1 .
7.4. Application to Balancing
217
Proof 4.10 The following observation is crucial for establishing the above real versions of Theorems 4.5 and 4.7. If (A1 , B1 , C1 ) and (A2 , B2 , C2 ) are real controllable and observable realizations of a transfer function then any co-ordinate transformation S with (A2 , B2 , C2 ) =
G (z), −1 SA1 S , SB1 , C1 S −1 is necessarily real, i.e. S ∈ GL (n, R). This follows immediately from Kalman’s realization theorem, see Section B.5. With this observation, then the proof follows easily from Theorem 4.5. Remark 4.11 The norm balanced realizations were also considered by Verriest (1988) where they are referred to as “optimally clustered”. Verriest points out the invariance of the norm under orthogonal transformations but does not show global minimality of the norm balanced realizations. An example is given which shows that not every similarity orbit OC (A, B, C) allows a norm balanced realization. In fact, by the Kempf-Ness Theorem 6.2.1(a), there exists a norm balanced realization in OC (A, B, C) if and only if OC (A, B, C) is a closed subset of LC (n, m, p). Therefore Lemma 4.3 characterizes all orbits OC (A, B, C) which contain a norm balanced realization. Remark 4.12 In the above theorems we use the trivial fact that LC (n, m, p) is a complex manifold, however, the vector space structure of LC (n, m, p) is not essential. Balanced realizations for the class of asymptotically stable linear systems were first introduced by Moore (1981) and are defined by the condition that the controllability and observability Gramians are equal and diagonal. We will now show that these balanced realizations can be treated as a special case of the theorems above. For simplicity we consider only the discretetime case and complex systems (A, B, C). A complex realization (A, B, C) is called N -balanced if and only if N
Ak BB ∗ (A∗ ) = k
k=0
N
(A∗ ) C ∗ CAk k
(4.21)
k=0
An asymptotically stable realization (A, B, C) (i.e. σ (A) < 1) is said to be balanced, or ∞-balanced, if and only if ∞ k=0
Ak BB ∗ (A∗ ) = k
∞
(A∗ ) C ∗ CAk k
(4.22)
k=0
We say that (A, B, C) is diagonally N-balanced, or diagonally balanced respectively, if the Hermitian matrix (4.21), or (4.22), respectively, is diagonal.
218
Chapter 7. Invariant Theory and System Balancing
Note that the above terminology differs slightly from the common terminology in the sense that for balanced realizations controllability, respectively observability Gramians, are not required to be diagonal. Of course, this can always be achieved by an orthogonal change of basis in the state space. In Gray and Verriest (1987) realizations (A, B, C) satisfying (4.21) or (4.22) are called essentially balanced. In order to prove the existence of N -balanced realizations we consider the minimization problem for the N-Gramian norm function fN (A, B, C) := tr
N k k Ak BB ∗ (A∗ ) + (A∗ ) C ∗ CAk k=0
(4.23)
for N ∈ N ∪ {∞}. Let (A, B, C) ∈ LC (n, m, p) be a controllable and observable realization with N ≥ n. Consider the smooth unitarily invariant function on the GL (n, C)-orbit fN : OC (A, B, C) → R+
SAS −1 , SB, CS −1 → fN SAS −1 , SB, CS −1 (4.24)
where fN SAS −1 , SB, CS −1 is defined by (4.23) for any N ∈ N ∪ {∞}. In order to apply Theorem 4.5 to the cost function fN (A, B, C), we need the following lemma. Lemma 4.13 Let (A, B, C) ∈ LC (n, m, p) be a controllable and observable realization with N ≥ n, N ∈ N ∪ {∞}. For N = ∞ assume, in addition, that A has all its eigenvalues in the open unit disc. Then the function fN : OC (A, B, C) → R defined by (4.24) is a proper, unitarily invariant strictly plurisubharmonic function. Proof 4.14 Obviously fN : OC (A, B, C) → R+ is a smooth, unitarily invariant function for all N ∈ N ∪ {∞}. For simplicity we restrict attention to the case where N is finite. The case N = ∞ then follows by a simple argument. Let RN (A, B) = B, . . . , AN B and ON (A, C) =
limiting N C , A C , . . . , (A ) C denote the controllability and observability matrices of length N + 1, respectively. Let
OC (ON , RN ) = ON (A, C) T −1 , T RN (A, B) | T ∈ GL (n, C) denote the complex orbit of (ON (A, C) , RN (A, B))—see Chapter 6. By controllability and observability of (A, B, C) and using N ≥ n, both OC (A, B, C) and OC (ON , RN ) are closed complex submanifolds. Moreover, for N ≥ n, the map ρN : OC (A, B, C) → OC (ON , RN ) defined by
7.4. Application to Balancing
219
SAS −1 , SB, CS −1 → ON (A, C) S −1 , SRN (A, B) is seen to be biholomorphic. The Frobenius norm defines a strictly plurisubharmonic function ϕ : OC (ON , RN ) → R, 2 2 ϕ ON (A, C) S −1 , SRN (A, B) = ON (A, C) S −1 + SRN (A, B) .
Thus fN = ϕ ◦ ρN : OC (A, B, C) → R with 2
2 fN SAS −1 , SB, CS −1 = ON (A, C) S −1 + SRN (A, B) is strictly plurisubharmonic, too. Similarly, as OC (ON , RN ) is closed, the Frobenius norm ϕ : OC (ON , RN ) → R+ is a proper function. Since ρN : OC (A, B, C) → OC (O, R) is a homeomorphism, then also FN = ϕ ◦ ρN is proper. This completes the proof of the lemma. Using this lemma we can now apply Theorem 4.5. To compute the critical points of fN : OC (A, B, C) → R+ , we consider the induced function on GL (n, C): FN : GL (n, C) → R+
S → fN SAS −1 , SB, CS −1
(4.25)
for any N ∈ N ∪ {∞}. A simple calculation of the gradient vector ∇FN at S = In shows that ∇FN (In ) = 2
N
Ak BB ∗ (A∗ ) − (A∗ ) C ∗ CAk
k=0
k
k
(4.26)
for any N ∈ N ∪ {∞}. We conclude Corollary 4.15 Given a complex rational strictly proper transfer function G (z) of McMillan degree n, then for all finite N ≥ n there exists a realization (A∗ , B∗ , C∗ ) of G (z) that is N -balanced. If (A1 , B1 , C1 ), (A2 , B2 , C2 , ) are N -balanced realizations of G (z) of order n, N ≥ n, then
(A2 , B2 , C2 ) = SA1 S −1 , SB1 , C1 S −1 for a uniquely determined unitary transformation S ∈ U (n, C). A realization (A1 , B1 , C1 ) is N-balanced if and only if it minimizes the N-Gramian norm taken over all realizations of G (z) of order n. For asymptotically stable linear systems we obtain the following amplifications of Moore’s fundamental existence and uniqueness theorem for balanced realizations, see Moore (1981).
220
Chapter 7. Invariant Theory and System Balancing
Theorem 4.16 Suppose there is given a complex, rational, strictly-proper transfer function G (z) of McMillan degree n and with all poles in the open unit disc. Then there exists a balanced realization (A1 , B1 , C1 ) of G (z) of order n. If (A1 , B1 , C1 ), (A2 , B2 , C2 ) are two balanced realizations of G (z) of order n, then
(A2 , B2 , C2 ) = SA1 S −1 , SB1 , C1 S −1 for a uniquely determined unitary transformation S ∈ U (n, C). An ndimensional realization (A1 , B1 , C1 ) of G (z) is balanced if and only if (A1 , B1 , C1 ) minimizes the Gramian norm (4.24) (for N = ∞), taken over all realizations of G (z) of order n. Problem 4.17 Deduce Corollary 4.15 from Theorem 3.1. Hint: Work with the factorization N =ON · RN ,
of the (N + 1) × (N + 1) block Hankel HN where RN = B, AB, . . . , AN B , ON := C , A C , . . . , (A ) C . Problem 4.18 Prove that the critical points of the cost function fp : p p O (A, B, C) → R, fp (A, B, C) = tr (Wc (A, B) + Wo (A, C) ), p ∈ N, are the balanced realizations (A0 , B0 , C0 ) satisfying Wc (A0 , B0 ) = Wo (A0 , C0 ).
Main Points of Section Function balanced realizations are the critical points of a function defined on the manifold of all realizations of a fixed transfer function. Function minimizing realizations are those realizations of given transfer function which minimize a given function. In the case of strictly plurisubharmonic functions, a general existence and uniqueness result on such function balanced realizations is obtained using the Azad-Loeb theorem. Standard balanced realizations are shown to be function minimizing, where the function is the sum of the eigenvalues of the controllability and observability Gramians. This function is unitarily invariant strictly plurisubharmonic.
7.5 Euclidean Norm Balancing The simplest candidate of a unitarily invariant Hermitian norm on the space LC (n, m, p) is the standard Euclidean norm, defined by (A, B, C)2 := tr (AA∗ ) + tr (BB ∗ ) + tr (C ∗ C)
(5.1)
¯ denotes Hermitian transpose. Applying Theorem 4.7, or where X ∗ = X more precisely its Corollary 4.9, to this norm yields the following result, which describes a new class of norm minimal realizations.
7.5. Euclidean Norm Balancing
221
Theorem 5.1 Let G (z) be a real, rational, strictly-proper transfer function of McMillan degree n. Then: (a) There exists a (real) controllable and observable realization (A, B, C) of G (z) with AA + BB = A A + C C
(5.2)
(b) If (A1 , B1 , C1 ), (A2 , B2 , C2 ) are (real) controllable and observable realizations of G (z) satisfying (5.2). Then there exists a unique orthogonal transformation S ∈ O (n, R) with
(A2 , B2 , C2 ) = SA1 S −1 , SB1 , C1 S −1 (c) An n-dimensional realization (A, B, C) of G (z) satisfies (5.2) if and only if it minimizes the Euclidean norm (5.1) taken over all possible n-dimensional realizations of G (z). Proof 5.2 For any controllable and observable realization (A, B, C) of G (z), consider the function φ : GL (n, R) → R defined by the Euclidean norm 2 φ (S) = SAS −1 , SB, CS −1 Thus
−1 −1 φ (S) = tr SAS −1 (S ) A S + tr (SBB S ) + tr (S ) C CS −1
The gradient vector of φ at S = In is ∇φ (In ) = 2 (AA − A A + BB − C C)
(5.3)
and thus (A, B, C) is norm balanced for the Euclidean norm (5.1) if and only if AA − A A + BB − C C = 0 which is equivalent to (5.2). The result now follows immediately from Theorem 4.7 and its Corollary 4.9. Similar results hold for symmetric or Hamiltonian transfer functions. Recall that a real rational m × m-transfer function G (z) is called a symmetric realization if for all z ∈ C
G (z) =G (z)
(5.4)
and a Hamiltonian realization, if for all z ∈ C
G (z) =G (−z)
(5.5)
222
Chapter 7. Invariant Theory and System Balancing
Every strictly proper symmetric transfer function G (z) of McMillan degree n has a minimal signature symmetric realization (A, B, C) satisfying
(AIpq ) = AIpq , C = Ipq B
(5.6)
where p + q = n and Ipq = diag (ε1 , . . . , εn ) with 1 i = 1, . . . , p εi = −1 i = p + 1, . . . , n Here p − q is the Cauchy-Maslov index of G (z); cf. Anderson and Bitmead (1977) and Byrnes and Duncan (1989). Similarly, every strictly proper Hamiltonian transfer function G (z) ∈ R (z)m×m of McMillan degree 2n has a minimal Hamiltonian realization (A, B, C) satisfying (AJ) = AJ, where
0 J= In
C = JB
−In 0
(5.7)
(5.8)
is the standard complex structure. Let O (p, q), respectively, Sp (n, R) denote the (real) stabilizer groups of Ipq respectively, J, i.e. T ∈ O (p, q) ⇐⇒T Ipq T = Ipq , T ∈ Sp (n, R) ⇐⇒T JT = J,
T ∈GL (n, R) T ∈GL (2n, R)
(5.9)
Analogues of Theorem 5.1 are now presented for the symmetric and Hamiltonian transfer functions, The proof of Theorem 5.3 requires a more subtle argument than merely an application of the Corollary 4.9. Theorem 5.3 m×m
(a) Every strictly proper symmetric transfer function G (z) ⊂ R (z) with McMillan degree n and Cauchy-Maslov index p−q has a controllable and observable signature symmetric realization (A, B, C) satisfying
(AIpq ) = AIpq , C = Ipq B AA + BB = A A + C C
(5.10) (5.11)
(b) If (A1 , B1 , C1 ), (A2 , B2 , C2 ) are two minimal realizations of G (z) satisfying (5.10), (5.11), then there exists a unique orthogonal transformation S = diag (S1 , S2 ) ∈ O (p) × O (q) ⊂ O (n) with
(A2 , B2 , C2 ) = SA1 S −1 , SB1 , C1 S −1
7.5. Euclidean Norm Balancing
223
Proof 5.4 We first prove (a). By Theorem 5.1 there exists a minimal realization (A, B, C) of the symmetric transfer function G (s) which satisfies (5.11) and (A, B, C) is unique up to an orthogonal change of basis S ∈ O (n, R). By the symmetry of G (s), with (A, B, C) also (A , C , B ) is a realization of G (s). Note that if (A, B, C) satisfies (5.11), also (A , C , B ) satisfies (5.11). Thus, by the above uniqueness Property (b) of Theorem 5.1 there exists a unique S ∈ O (n, R) with (A , C , B ) = (SAS , SB, CS )
(5.12)
By transposing (5.12) we also obtain (A, B, C) = (SA S , SC , B S )
(5.13)
or equivalently (A , C , B ) = (S AS, S B, CS)
(5.14)
Thus (by minimality of (A, B, C)) S = S is symmetric orthogonal. A straightforward argument, see Byrnes and Duncan (1989) for details, shows that the signature of S is equal to the Cauchy-Maslov index p−q of G (s) = −1 C (sI − A) B. Thus S = T Ipq T for T ∈ O (n, R) and the new realization (F, G, H) := (T AT , T B, CT ) satisfies (5.11). To prove Part (b), let
(A2 , B2 , C2 ) = SA1 S −1 , SB1 , C1 S −1 be two realizations of G (s) which satisfy (5.10), (5.11). Byrnes and Duncan (1989) has shown that (5.10) implies S ∈ O (p, q). By Theorem 5.1(b) also S ∈ O (n, R). Since O (p) × O (q) = O (p, q) ∩ O (n, R) the result follows. A similar result holds for Hamiltonian transfer functions. Theorem 5.5 m×m
(a) Every strictly proper Hamiltonian transfer function G (s) ∈ R (s) with McMillan degree 2n has a controllable and observable Hamiltonian realization (A, B, C) satisfying (AJ) = AJ,
C = JB
AA + BB = A A + C C
(5.15) (5.16)
224
Chapter 7. Invariant Theory and System Balancing
(b) If (A1 , B1 , C1 ), (A2 , B2 , C2 ) are two minimal realizations of G (s) satisfying (5.15), (5.16), then there exists a unique symplectic transformation S ∈ Sp (n, R) ∩ O (2n, R) with
(A2 , B2 , C2 ) = SA1 S −1 , SB1 , C1 S −1 Proof 5.6 This result can be deduced directly from Theorem 4.7, generalized to the complex similarity orbits for the reductive group Sp (n, C). Alternatively, a proof can be constructed along the lines of that for Theorem 5.3. Problem 5.7 Prove that every strictly proper, asymptotically stable symmetric transfer function G (z) ∈ R (z)m×m with McMillan degree n and Cauchy-Maslov index p − q has a controllable and observable signaturesymmetric balanced realization satisfying (AIpq ) = AIpq , C = Ipq B,
Wc (A, B) = Wo (A, C) = diagonal.
Problem 5.8 Prove as follows an extension of Theorem 5.1 to bilinear systems. Let W (n) = {I = (i1 , . . . , ik ) | ij ∈ {1, 2} , 1 ≤ k ≤ n}∪{∅} be ordered lexicographically. A bilinear system (A1 , A2 , B, C) ∈ Rn×n × Rn×n × Rn×m × Rp×n is called span controllable
if and only if R (A1 , A2 , B) = (AI B | I ∈ W (n)) ∈ Rn×N , N = m 2n+1 − 1 , has full rank n. Here Aφ := In . Similarly (A1 , A2 , B, C) is called span observable ifand only
if O (A1 , A2 , C) = (CAI | I ∈ W (n)) ∈ RM×n , M = p 2n+1 − 1 , has full rank n. (a) Prove that, for (A1 , A2 , B, C) span controllable and observable, the similarity orbit O = O (A1 , A2 , B, C)
= SA1 S −1 , SA2 S −1 , SB, CS −1 | S ∈ GL (n) 2
is a closed submanifold of R2n +n(m+p) with tangent space
T(A1 ,A2 ,B,C) O = ([X, A1 ] , [X, A2 ] , XB, −CX) | X ∈ Rn×n . (b) Show that the critical points (F1 , F2 , G, H) ∈ O (A1 , A2 , B, C) of Φ : O (A1 , A2 , B, C) → R, 2
2
2
2
Φ (F1 , F2 , G, H) = F1 + F2 + G + H , are characterized by F1 F1 + F2 F2 + GG = F1 F1 + F2 F2 + H H.
7.5. Euclidean Norm Balancing
225
(c) Prove that, for (A1 , A2 , B, C) span controllable and observable, the critical points of Φ : O (A1 , A2 , B, C) → R form a uniquely determined O (n)-orbit
SF1 S −1 , SF2 S −1 , SG, HS −1 | S ∈ O (n) .
Main Points of Section Function minimizing realization of a transfer function, where the distance function is the matrix Euclidean norm, are characterized. These realizations minimize the Euclidean norm distance to the zero realization (0, 0, 0). The existence and characterization of signature symmetric norm minimal realizations for symmetric or Hamiltonian transfer functions is shown.
226
Chapter 7. Invariant Theory and System Balancing
Notes for Chapter 7 For results on linear system theory we refer to Kailath (1980) and Sontag (1990a). Balanced realizations for asymptotically stable linear systems were introduced by Moore (1981) as a tool for model reduction and system approximation theory. For important results on balanced realizations see Moore (1981), Pernebo and Silverman (1982), Jonckheere and Silverman (1983), Verriest (1983) and Ober (1987). In signal processing and sensitivity analysis, balanced realizations were introduced in the pioneering work of Mullis and Roberts (1976); see also Hwang (1977). In these papers sensitivity measures such as weighted trace functions of controllability and observability Gramians are considered and the global minima are characterized as balanced, input balanced or sensitivity optimal realizations. For further results in this direction see Williamson (1986) and Williamson and Skelton (1989). A systematic optimization approach for defining and studying various classes of balanced realizations has been developed by Helmke (1993a). The further development of this variational approach to balancing has been the initial motivation of writing this book. We also mention the closely related work of Gray and Verriest (1987), Gray and Verriest (1989) and Verriest (1986; 1988) who apply differential geometric methods to characterize balanced realizations. For further applications of differential geometry to define balanced realizations for multi-mode systems and sensitivity optimal singular systems we refer to Verriest (1988) and Gray and Verriest (1989). Techniques from geometric invariant theory and complex analysis, such as the Kempf-Ness or the Azad-Loeb theorems, for the study of system balancing and sensitivity optimization tasks have been introduced by Helmke (1992; 1993a). Both the Kempf-Ness theorem and the theorem by Azad and Loeb are initially stated over the field of complex numbers. In fact, for applications of these tools to systems balancing it becomes crucial to work over the field of complex numbers, even if one is ultimately interested in real data. Therefore we first focus in this chapter on the complex case and then deduce the real case from the complex one. In the later chapters we mainly concentrate on the real case. For textbooks on several complex variable theory we refer to Krantz (1982) and Vladimirov (1966). Preliminary results on norm balanced realizations were obtained by Verriest (1988), where they are referred to as “optimally clustered”. The existence and uniqueness results Theorems 5.1–5.5 were obtained by Helmke (1993a). In contrast to balanced realizations, no simple algebraic methods such as the SVD are available to compute norm balanced realizations. To this date, the dynamical systems methods for finding norm balanced real-
7.5. Euclidean Norm Balancing
227
izations are the only available tools to compute such realizations. For an application of these ideas to solve an outstanding problem in sensitivity optimization of linear systems see Chapter 9.
CHAPTER
8
Balancing via Gradient Flows 8.1 Introduction In the previous chapter we investigate balanced realizations from a variational viewpoint, i.e. as the critical points of objective functions defined on the manifold of realizations of a given transfer function. This leads us naturally to the computation of balanced realizations using steepest descent methods. In this chapter, gradient flows for the balancing cost functions are constructed which evolve on the class of positive definite matrices and converge exponentially fast to the class of balanced realizations. Also gradient flows are considered which evolve on the Lie group of invertible coordinate transformations. Again there is exponential convergence to the class of balancing transformations. Of course, explicit algebraic methods are available to compute balanced realizations which are reliable and comparatively easy to implement on a digital computer, see Laub, Heath, Paige and Ward (1987) and Safonov and Chiang (1989). So why should one consider continuous-time gradient flows for balancing, an approach which seems much more complicated and involved? This question of course brings us back to our former goal of seeking to replace algebraic methods of computing by analytic or geometric ones, i.e. via the analysis of certain well designed differential equations whose limiting solutions solve the specific computational task. We believe that the flexibility one gains while working with various discretizations of continuous-time systems, opens the possibility of new algorithms to solve a given problem than a purely algebraic approach would allow. We see the possibility of adaptive
230
Chapter 8. Balancing via Gradient Flows
schemes, and perhaps even faster algorithms based on this approach. It is for this reason that we proceed with the analysis of this chapter. For any asymptotically stable minimal realization (A, B, C) the controllability Gramian Wc and observability Gramian Wo are defined in discrete time and continuous time, respectively, by Wc =
∞
Ak BB Ak ,
k=0 % ∞
Wc =
Wo =
eAt BB eA t dt,
∞
Ak C CAk
k=0 % ∞
Wo =
0
eA t C CeAt dt.
(1.1) (1.2)
0
For an unstable system the controllability and observability Gramians are defined by finite sums or integrals, rather than the above infinite sums or integrals. In the following, we assume asymptotic stability and deal with infinite sums, however, all results hold in the unstable case for finite sum Gramians such as in (7.1.4). Any linear change of coordinates in the state space Rn by an invertible T ∈ GL (n, R) changes the realization by (A, B, C) →
transformation T AT −1, T B, CT −1 and thus transforms the Gramians via Wc → T Wc T ,
−1
Wo → (T )
Wo T −1
(1.3)
We call a state space representation (A, B, C) of the transfer function G (s) balanced if Wc = Wo . This is more general than the usual definition of balanced realizations, Moore (1981), which requires that Wc = Wo = diagonal, which is one particular realization of our class of balanced realizations. In this latter case we refer to (A, B, C) as a diagonal balanced realization. In the sequel we consider a given, fixed asymptotically stable minimal −1 realization (A, B, C) of a transfer function G (s) = C (sI − A) B. To get a quantitative measure of how the Gramians change we consider the function −1 φ (T ) = tr T Wc T + (T ) Wo T −1 −1 (1.4) = tr Wc T T + Wo (T T )
= tr Wc P + Wo P −1 with P = T T
(1.5)
8.2. Flows on Positive Definite Matrices
231
Note that φ (T ) is the sum
of the eigenvalues of the controllability and observability Gramians of T AT −1, T B, CT −1 and thus is a crude numerical measure of the controllability and observability properties of
T AT −1, T B, CT −1 . Moreover, as we have seen in Chapter 7, the critical points of the potential φ (·) are balanced realizations.
Main Points of Section In seeking a balanced realization of a state space linear system representation, it makes sense to work with a quantitative measure of how the controllability and observability Gramians change with the co-ordinate basis transformation. We choose the sum of the eigenvalues of these two Gramians.
8.2 Flows on Positive Definite Matrices In this and the next section, we consider a variety of gradient flows for balancing. The intention is that the reader browse through these to grasp the range of possibilities of the gradient flow approach. Let P (n) denote the set of positive definite symmetric n × n matrices P = P > 0. The function we are going to study is φ : P (n) → R,
φ (P ) = tr Wc P + Wo P −1
(2.1)
Let X, Y ∈ Rn×n be invertible square root factors of the positive definite Gramians Wc , Wo so that X X = Wo ,
Y Y = Wc
(2.2)
The following results are then immediate consequences of the more general results from Section 6.4. Lemma 2.1 Let Wc , Wo be defined by (1.1) for an asymptotically stable controllable and observable B, C). Then the function
realization (A, sublevel sets, φ : P (n) → R, φ (P ) = tr Wc P + Wo P −1 has compact i.e. for all a ∈ R, P ∈ P (n) | tr Wc P + Wo P −1 ≤ a is a compact subset of P (n). There exists a minimizing positive definite symmetric matrix P = P > 0 for the function φ : P (n) → R defined by (2.1).
232
Chapter 8. Balancing via Gradient Flows
Theorem 2.2 (Linear index gradient flow) (a) There exists
a unique P∞ = P∞ > 0 which minimizes φ : P (n) → R, φ (P ) = tr Wc P + Wo P −1 , and P∞ is the only critical point of φ. This minimum is given by
1/2 Wc−1/2 P∞ = Wc−1/2 Wc1/2 Wo Wc1/2
(2.3)
1/2
T∞ = P∞ is a balancing transformation for (A, B, C). (b) The gradient flow P˙ (t) = −∇φ (P (t)) on P (n) is given by P˙ = P −1 Wo P −1 − Wc ,
P (0) = P0 .
(2.4)
For every initial condition P0 = P0 > 0, P (t) exists for all t ≥ 0 and converges exponentially fast to P∞ as t → ∞, with a lower bound for the rate of exponential convergence given by ρ≥2
λmin (Wc )3/2 1/2
(2.5)
λmax (Wo )
where λmin (A), λmax (A), denote the smallest and largest eigenvalue of A, respectively. In the sequel we refer to (2.4) as the linear index gradient flow. Instead of minimizing φ (P ) we can just as well consider the minimization problem for the quadratic index function ψ : P (n) → R,
2
ψ (P ) = tr (Wc P )2 + Wo P −1
(2.6)
over all positive symmetric matrices 0. Since, for P = T T ,
−1P = P−1 >
2 2 , the minimization probψ (P ) is equal to tr (T Wc T ) + (T ) Wo T
lem for (2.6) is equivalent to the task of minimizing tr Wc2 + Wo2 = 2 2 Wc + Wo over all realizations of a given transfer function G (s) = C (sI − A)−1 B where X denotes the Frobenius norm. Thus ψ (P ) is the sum of the
squared eigenvalues of the controllability and observability Gramians of T AT −1, T B, CT −1 . The quadratic task 2 index minimization −1 above can also be reformulated as minimizing T Wc T − (T ) Wo T −1 . A theorem corresponding to Theorem 6.4.7 now applies.
8.2. Flows on Positive Definite Matrices
233
Theorem 2.3 (Quadratic index gradient flow) (a) There exists a unique P∞ = P∞ > 0 which minimizes
ψ : P (n) → R, and
2
, ψ (P ) = tr (Wc P )2 + Wo P −1
1/2 P∞ = Wo−1/2 Wc1/2 Wo Wc1/2 Wc−1/2 1/2
is the only critical point of ψ. Also, T∞ = P∞ is a balancing coordinate transformation for (A, B, C). (b) The gradient flow P˙ (t) = −∇ψ (P (t)) on P (n) is P˙ = 2P −1 Wo P −1 Wo P −1 − 2Wc P Wc .
(2.7)
For all initial conditions Po = P0 > 0, the solution P (t) of (2.7) exists for all t ≥ 0 and converges exponentially fast to P∞ . A lower bound on the rate of exponential convergence is ρ > 4λmin (Wc )2
(2.8)
We refer to (2.7) as quadratic index gradient flow. Both algorithms converge exponentially fast to P∞ , although the rate of convergence is rather slow if the smallest singular value of Wc is near to zero. The convergence rate of the linear index flow, however, depends inversely on the maximal eigenvalue of the observability Gramian. In contrast, the convergence of the quadratic index flow is robust with respect to the observability properties of the system. In general, the quadratic index flow seems to behave better than the linear index flow. The following lemma, which is a special case of Lemma 6.4.9, compares the rates of exponential convergence of the algorithms and shows that the quadratic index flow is in general faster than the linear index flow. Lemma 2.4 Let ρ1 and ρ2 respectively denote the rates of exponential convergence of (2.5) and (2.8) respectively. Then ρ1 < ρ2 if the smallest singular value of the Hankel operator of (A, B, C) is > 12 , or equivalently, if λmin (Wo Wc ) > 14 .
234
Chapter 8. Balancing via Gradient Flows 10
8
6
Wo 4
2
0 0
2
4
6
8
10
Wc
FIGURE 2.1. The quadratic index flow is faster than the linear index flow in the shaded region
Simulations The following simulations show the exponential convergence of the diagonal elements of P towards the solution matrix P∞ and illustrate what might effect the convergence rate. In Figure 2.2a–c we have 7 4 4 3 5 2 0 3 4 4 2 2 2 7 −1 −1 Wo = and Wc = 0 −1 5 2 4 2 4 1 3 2 1 5 3 −1 2 6 so that λmin (Wo Wc ) ≈ 1.7142 > 14 . Figure 2.2a concerns the linear index flow while Figure 2.2b shows the evolution of the quadratic index flow, both using P (0) = P1 where 1 0 0 0 2 1 0 0 0 1 0 0 1 2 1 0 . P (0) = P2 = P (0) = P1 = , 0 0 1 0 0 1 2 1 0 0 0 1 0 0 1 2 Figure 2.2c shows the evolution of both algorithms with a starting value of P (0) = P2 . These three simulations demonstrate that the quadratic algorithm converges more rapidly than the linear algorithm when λmin (Wo Wc ) > 14 . However, the rapid convergence rate is achieved at the expense of approximately twice the computational complexity, and consequently the computing time on a serial machine may increase.
8.2. Flows on Positive Definite Matrices
(a)
Linear Index Flow lmin(WoWc)>1/4
Quadratic Index Flow lmin(WoWc)>1/4
(b) 1.2
Diagonal Elements of P
Diagonal Elements of P
1.2
1
1
0.8
0.6
0.8
0.6 0
1
2
0
Time
(c)
1
2
Time
(d)
Linear and Quadratic Flow lmin(WoWc)>1/4
Linear and Quadratic Flow lmin(WoWc)<1/4 2
Diagonal Elements of P
2
Diagonal Elements of P
235
Linear Flow Quadratic Flow 1.3
0.6
Linear Flow QuadraticFlow
1.4
0.8 0
1.5
Time
3
0
1.5
Time
FIGURE 2.2. The linear and quadratic flows
3
236
Chapter 8. Balancing via Gradient Flows
In Figure 2.2d 7 4 Wo = 4 3
4 4 4 2 2 4 2 1
3 2 1 3
5 4 and Wc = 0 3
4
0
3
−1 −1 −1 5 2 −1 2 6 7
so that λmin (Wo Wc ) ≈ 0.207 < 14 . Figure 2.2d compares the linear index flow behaviour with that of the quadratic index flow for P (0) = P1 . This simulation demonstrates that the linear algorithm does not necessarily converge more rapidly than the quadratic algorithm when λmin (Wo Wc ) < 14 , because the bounds on convergence rates are conservative.
Riccati Equations A different approach to determine a positive definite matrix P = P > 0 with Wo = P Wc P is via Riccati equations. Instead of solving the gradient flow (2.4) we consider the Riccati equation P˙ = Wo − P Wc P,
P (0) = P0 .
(2.9)
It follows from the general theory of Riccati equations that for every positive definite symmetric matrix P0 , (2.9) has a unique solution as a positive definite symmetric P (t) defined for all t ≥ 0, see for example Anderson and Moore (1990). Moreover, P (t) converges exponentially fast to P∞ with P∞ Wc P∞ = Wo . The Riccati equation (2.9) can also be interpreted as a gradient flow
for the cost function φ : P (n) → R, φ (P ) = tr Wc P + Wo P −1 by considering a different Riemannian matrix to that used above. Consider the Riemannian metric on P (n) defined by
ξ, η := tr P −1 ξP −1 η (2.10) for tangent vectors ξ, η ∈ TP (P (n)). This is easily seen to define a symmetric positive definite inner product on TP P (n). The Fr´echet derivative of φ : P (n) → R at P is the linear map Dφ|P : TP P (n) → R on the tangent space defined by
ξ ∈ TP (P (n)) Dφ|P (ξ) = tr Wc − P −1 Wo P −1 ξ , Thus the gradient grad φ (P ) of φ with respect to the Riemannian metric (2.10) on P (n) is characterized by Dφ|P (ξ) = grad φ (P ) , ξ
= tr P −1 grad φ (P ) P −1 ξ
8.2. Flows on Positive Definite Matrices
237
for all ξ ∈ TP P (n). Thus P −1 grad φ (P ) P −1 = Wc − P −1 Wo P −1 , or equivalently grad φ (P ) = P Wc P − Wo
(2.11)
Therefore (2.9) is seen as the gradient flow P˙ = − grad φ (P ) of φ on P (n) and thus has equivalent convergence properties to (2.4). In particular, the solution P (t), t ≥ 0 converges exponentially to P∞ . This Riccati equation approach is particularly to balanced realizations suitable if one is dealing with time-varying matrices Wc and Wo . As a further illustration of the above approach we consider the following optimization problem over a convex set of positive semidefinite matrices. Let A ∈ Rm×n , rk A = m < n, B ∈ Rm×n and consider the convex subset of positive semidefinite matrices
C = P ∈ Rn×n | P = P ≥ 0, AP = B with nonempty interior
C˚ = P ∈ Rn×n | P = P > 0, AP = B . A Riemannian metric on C˚ is
ξ, η = tr P −1 ξ P −1 η with
∀ξ, η ∈ TP C˚
TP C˚ = ξ ∈ Rn×n | Aξ = 0 .
Similarly, we consider for A ∈ Rm×n , rk A = m < n, and B ∈ Rm×m
D = P ∈ Rn×n | P = P ≥ 0, AP A = B
˚ = P ∈ Rn×n | P = P > 0, AP A = B D Here the Riemannian metric is the same as above and
˚ = ξ ∈ Rn×n | AξA = 0 . TP D We now consider the cost function φ : C˚ → R defined by
˚→ R respectively φ : D
φ (P ) = tr W1 P + W2 P −1
for symmetric matrices W1 , W2 ∈ Rn×n . The gradient of φ : C˚ → R is characterized by
Chapter 8. Balancing via Gradient Flows
238
grad φ (P ) ∈ TP C˚
(a)
grad φ (P ) , ξ = Dφ|P (ξ) ,
(b)
∀ξ ∈ TP C˚
Now (a) ⇐⇒A · grad φ (P ) = 0 and
W1 − P −1 W2 P −1 ξ = tr P −1 grad φ (P ) P −1 ξ ,
(b) ⇐⇒ tr
∀ξ ∈ TP C˚
⊥ ⇐⇒W1 − P −1 W2 P −1 − P −1 grad φ (P ) P −1 ∈ TP C˚
⇐⇒W1 − P −1 W2 P −1 − P −1 grad φ (P ) P −1 = A Λ From this it follows, as above∗ that −1 grad φ (P ) = In − P A (AP A ) A P W1 − P −1 W2 P −1 P ! P W1 P −W2
Thus the gradient flow on C˚ is P˙ = grad φ (P ) −1 P˙ = In − P A (AP A ) A (P W1 P − W2 ) ˚ the gradient grad φ (P ) of φ : D˙ → R, is characterized by For D A · grad φ (P ) A = 0
(a)
(b)
tr
P −1 grad φ (P ) P −1 ξ
= tr W1 − P −1 W2 P −1 ξ ,
∀ξ : AξA = 0.
Hence
⊥ (b) ⇐⇒ P −1 grad φ (P ) P −1 − W1 − P −1 W2 P −1 ∈ {ξ | AξA = 0} ∗ P ∇φ (P ) P
− P A ΛP = grad φ and Λ = (AP A )−1 AP ∇φ (P )
8.2. Flows on Positive Definite Matrices
239
Lemma 2.5 Let A ∈ Rm×n , AA > 0.
⊥ ξ ∈ Rn×n | AξA = 0 = A ΛA | Λ ∈ Rm×m Proof 2.6 As tr (A Λ Aξ) = tr (Λ AξA ) = 0
∀ξ with AξA = 0,
the right hand side set in the lemma is contained in the left hand side set. Both spaces have the same dimension. Thus the result follows. Hence with this lemma (b) ⇐⇒ P −1 grad φ (P ) P −1 − ∇φ (P ) = A ΛA and therefore
grad φ (P ) = P ∇φ (P ) P + P A ΛAP.
Thus
AP ∇φ (P ) P A = −AP A ΛAP A ,
i.e.
−1
Λ = − (AP A )
AP ∇φ (P ) P A (AP A )
−1
and therefore grad φ (P ) = P ∇φ (P ) P − P A (AP A )
−1
−1
AP ∇φ (P ) P A (AP A )
AP.
We conclude ˙ Theorem 2.7 The gradient flow P = grad φ (P ) of the cost function ˚ respectively, φ (P ) = tr W1 P + W2 P −1 on the constraint sets C˚ and D, is −1 P˙ = In − P A (AP A ) A (P W1 P − W2 )
(a)
˚ for P ∈ C. (b)
P˙ =P W1 P − W2 − P A (AP A )
−1
−1
A (P W1 P − W2 ) A (AP A )
AP
˚ for P ∈ D. In both cases the underlying Riemannian metric is defined as above.
240
Chapter 8. Balancing via Gradient Flows
Dual Flows We also note the following transformed versions of the linear and quadratic gradient flow. Let Q = P −1 , then (2.4) is equivalent to Q˙ = QWc Q − Q2 Wo Q2
(2.12)
and (2.7) is equivalent to Q˙ = 2QWc Q−1 Wc Q − 2Q2 Wo QWo Q2 .
(2.13)
We refer to (2.12) and (2.13) as the transformed linear and quadratic index gradient algorithms respectively. Clearly the analogue statements of Theorems 2.2 and 2.3 remain valid. We state for simplicity only the result for the transformed gradient algorithm (2.12). Theorem 2.8 The transformed gradient flow (2.12) converges exponen−1 tially from every initial condition Qo = Q0 > 0 to Q∞ = P∞ . The rate of exponential convergence is the same as for the Linear Index Gradient Flow. Proof 2.9 It remains to prove the last statement. This is true since P → P −1 = Q is a diffeomorphism of the set of positive definite symmetric matrices onto itself. Therefore the matrix −1 −1 ⊗ Wc − Wc ⊗ P∞ J = −P∞
of the linearization ξ˙ = J · ξ of (2.4) at P∞ is similar to the matrix Jˆ of the linearization −1 η˙ = −Q∞ ηQ−1 ∞ Wc Q∞ − Q∞ Wc Q∞ ηQ∞
of (2.12) via the invertible linear map ξ → η = −Q−1 ∞ ξQ∞ . It follows that J and Jˆ have the same eigenvalues which completes the proof. The dual versions of the linear and quadratic gradient flows are defined as follows. Consider the objective functions
φd (P ) = tr Wc P −1 + Wo P and ψ d (P ) = tr
2 2 Wc P −1 + (Wo P )
Thus φd (P ) = φ P −1 , ψ d (P ) = ψ P −1 . The gradient flows of φd and ψ d respectively are the dual linear flow P˙ = −Wo + P −1 Wc P −1
(2.14)
8.2. Flows on Positive Definite Matrices
241
and the dual quadratic flow P˙ = 2P −1 Wc P −1 Wc P −1 − 2Wo P Wo .
(2.15)
Thus the dual gradient flows are obtained from the gradient flows (2.4), (2.7) by interchanging the matrices Wc and Wo . −1 , with a lower bound In particular, (2.14) converges exponentially to P∞ on the rate of convergence given by ρd ≥ 2
λmin (Wo )
3/2
λmax (Wc )1/2
(2.16)
−1 , with a lower bound on the while (2.15) converges exponentially to P∞ rate of convergence given by
ρd ≥ 4λmin (Wo )2 .
(2.17)
Thus in those cases where Wc is ill-conditioned, i.e. the norm of Wc is small, we may prefer to use (2.16). In this way we can obtain a relatively −1 −1 and thus, by inverting P∞ , we obtain the fast algorithm for computing P∞ desired solution P∞ . By transforming the gradient flows (2.12), respectively, (2.13) with the transformation P → P −1 = Q we obtain the following results. Theorem 2.10 For every initial condition Po = Po > 0, the solution P (t) of the ordinary differential equation P˙ = P Wo P − P 2 Wc P 2
(2.18)
exists within P (n) for all t ≥ 0 and converges exponentially fast to P∞ , with a lower bound on the rate of convergence 3/2
ρ≥2
λmin (Wo )
1/2
(2.19)
λmax (Wc )
Theorem 2.11 For every initial condition Po = Po > 0, the solution P (t) of the ordinary differential equation P˙ = 2P Wo P −1 Wo P − 2P 2 Wc P Wc P 2
(2.20)
exists in P (n) for all t ≥ 0 and converges exponentially fast to P∞ . A lower bound on the rate of convergence is 2
ρ ≥ 4λmin (Wo )
(2.21)
242
Chapter 8. Balancing via Gradient Flows
Proof 2.12 Apply the transformation P → P −1 = Q to (2.14), respectively, (2.15) to achieve (2.19), respectively (2.21). The same bounds on the rate of convergence as (2.16), (2.17) carry over to (2.18), (2.20). Thus, in cases where Wc (respectively λmin (Wc )) is small, one may prefer to use (2.18) (respectively (2.20)) as a fast method to compute P∞ .
Simulations These observations are also illustrated by the following figures, which show the evolution of the diagonal elements of P towards the solution matrix P∞ . In Figure 2.3a–b, Wo and Wc are the same as used in generating Figure 2.2d simulations so that λmin (Wo Wc ) ≈ 0.207 < 14 . Figure 2.3a uses (2.14) while Figure 2.3b shows the evolution of (2.15), both using P (0) = P1 . These simulations demonstrate that the quadratic algorithm converges more rapidly than the linear algorithm in the case when λmin (Wo Wc ) < 14 . In Figure 2.3c–d Wo and Wc are those used in Figure 2.2a–e simulations with λmin (Wo Wc ) ≈ 1.7142 > 14 . Figure 2.3c uses (2.14) while Figure 2.3d shows the evolution of (2.15), both using P (0) = P1 . These simulations demonstrate that the linear algorithm does not necessarily converge more rapidly than the quadratic algorithm in this case when λmin (Wo Wc ) > 14 .
Newton-Raphson Gradient Flows The following modification of the gradient algorithms (2.4), (2.7) is a continuous-time version of the (discrete-time) Newton-Raphson method to minimize φ (P ), respectively ψ (P ). The Hessians of φ and ψ are given by: Hφ (P ) = − P −1 ⊗ P −1 Wo P −1 − P −1 Wo P −1 ⊗ P −1
(2.22)
and Hψ (P ) = − 2P −1 ⊗ P −1 Wo P −1 Wo P −1 − 2P −1 Wo P −1 ⊗ P −1 Wo P −1 − 2P −1 Wo P −1 Wo P −1 ⊗ P −1 − 2Wc ⊗ Wc
(2.23)
Thus Hφ (P )
−1
and Hψ (P )
−1
−1
= − (P ⊗ P ) [P ⊗ Wo + Wo ⊗ P ]
(P ⊗ P )
(2.24)
= − (P ⊗ P ) P ⊗ Wo P −1 Wo + Wo ⊗ Wo + Wo P −1 Wo ⊗ P −1
+ P Wc P ⊗ P Wc P ]
(P ⊗ P )
(2.25)
8.2. Flows on Positive Definite Matrices (a)
(b)
Dual Linear Flow lmin(WoWc)<1/4
Dual Quadratic Flow lmin(WoWc)<1/4 1.6
Diagonal Elements of P
Diagonal Elements of P
1.6
1.1
0.6
1.1
0.6 0
1.5
0
3
(c)
1.5
3
Time
Time
(d)
Dual Linear Flow lmin(WoWc)>1/4 1.3
Dual Quadratic Flow lmin(WoWc)>1/4 1.2
Diagonal Elements of P
Diagonal Elements of P
243
1
0.6
0.9
0.6 0
1.5
Time
3
0
1.5
Time
FIGURE 2.3. The linear and quadratic dual flows
3
244
Chapter 8. Balancing via Gradient Flows
We now consider the Newton-Raphson gradient flows −1 vec P˙ = −Hφ (P ) vec (∇φ (P ))
(2.26)
vec P˙ = −Hψ (P )−1 vec (∇ψ (P ))
(2.27)
respectively,
where φ and ψ are defined by (2.4), (2.7). The advantage of these flows is that the linearization of (2.26), (2.27) at the equilibrium P∞ has all eigenvalues equal to −1 and hence each component of P converges to P∞ at the same rate. Theorem 2.13 Let κ > 0 be given. The Newton-Raphson gradient flow (2.26) is d −1 vec (P ) =κ (P ⊗ P ) [P ⊗ Wo + Wo ⊗ P ] dt
× (P ⊗ P ) vec P −1 Wo P −1 − Wc
(2.28)
which converges exponentially fast from every initial condition P0 = P0 > 0 to P∞ . The eigenvalues of the linearization of (2.28) at P∞ are all equal to −κ.
Simulations Figure 2.4a–d shows the evolution of these Newton Raphson equations. It can be observed in Figure 2.4e–f that all of the elements of P , in one equation evolution, converge to P∞ at the same exponential rate. In Figure 2.4a,b,e, Wo = W1 , Wc = W2 with λmin (Wo Wc ) ≈ 0.207 < 14 . Figure 2.4a uses (2.27) while Figure 2.4b shows the evolution of (2.26) both using P (0) = P1 . These simulations demonstrate that the linear algorithm does not necessarily converge more rapidly than the quadratic algorithm when λmin (Wo Wc ) ≈ 1.7142 > 14 . Figure 2.4c uses (2.27) while Figure 2.4d shows the evolution of (2.26), both using P (0) = P1 . These simulations demonstrate that the quadratic algorithm converges more rapidly than the linear algorithm in this case when λmin (Wo Wc ) > 14 .
8.2. Flows on Positive Definite Matrices
Linear Newton Raphson Flow lmin(WoWc)<1/4
Quadratic Newton Raphson Flow lmin(WoWc)<1/4
(b) Diagonal Elements of P
Diagonal Elements of P
(a) 1.6
1.1
0.6
1.6
1.1
0.6 0
5
10
0
Linear Newton Raphson Flow lmin(WoWc)>1/4
Diagonal Elements of P
Diagonal Elements of P
0.9
0
5
1.2
0.9
0.6
10
0
Time
8
100 10–1
100
Linear Flow Quadratic Flow
10–1
10–2 10–3
10–4
10
Error in the Estimate of P lmin(WoWc)>1/4
(f) Diagonal Elements of |P–P |
Diagonal Elements of |P–P | 8
5
Time
Error in the Estimate of P lmin(WoWc)<1/4
(e)
10
Quadratic Newton Raphson Flow lmin(WoWc)>1/4
(d)
1.2
0.6
5
Time
Time (c)
245
0
10–2
Linear Flow Quadratic Flow 2
Time
4
10–3 0
2
Time
FIGURE 2.4. The linear and quadratic Newton Raphson flows
4
246
Chapter 8. Balancing via Gradient Flows
Main Points of Section With T denoting the co-ordinate basis transformation of a linear system state space realization, then gradient flows on the class of positive definite matrices P = T T are developed which lead to equality of the controllability and observability Gramians. A different choice of the Riemannian metric leads to the Riccati Equation. Constrained optimization tasks on positive definite matrices can also be solved along similar lines. A Newton-Raphson flow involving second derivatives of the cost function is developed, and flows on Q = P −1 are also studied. A variation involving a quadratic index is also introduced. Each flow converges exponentially fast to the balanced realization, but the rates depend on the eigenvalues of the Gramians, so that at various extremes of conditioning, one flow converges more rapidly than the others.
8.3 Flows for Balancing Transformations In the previous section we studied gradient flows which converged to P∞ = 2 , where T∞ is the unique symmetric positive definite balancing transT∞ formation for a given asymptotically stably system (A, B, C). Thus T∞ is obtained as the unique symmetric positive definite square root of P∞ . In this section we address the general problem of determining all balancing transformations T ∈ GL (n, R) for a given asymptotically stable system (A, B, C), using a suitable gradient flow on the set GL (n, R) of all invertible n × n matrices. Thus for T ∈ GL (n) we consider the cost function defined by −1 Φ (T ) = tr T Wc T + (T ) Wo T −1
(3.1)
and the associated gradient flow T˙ = −∇Φ (T ) on GL (n). Of course, in order to define the gradient of a function we have to specify a Riemannian metric. Here and in the sequel we always endow GL (n, R) with its standard Riemannian metric A, B = 2 tr (A B)
(3.2)
i.e. with the constant Euclidean inner product (3.2) defined on the tangent spaces of GL (n, R). For the geometric terminology used in Part (c) of the following theorem we refer to Appendix B; cf. also the digression on dynamical systems in Chapter 6.
8.3. Flows for Balancing Transformations
247
Theorem 3.1 (a) The gradient flow T˙ = −∇Φ (T ) of the cost function Φ on GL (n, R) is −1 −1 T˙ = (T ) Wo (T T ) − T Wc
(3.3)
and for any initial condition T0 ∈ GL (n, R) the solution T (t) of (3.3), T (0) = T0 , exists, and is invertible, for all t ≥ 0. (b) For any initial condition T0 ∈ GL (n, R), the solution T (t) of (3.3) converges to a balancing transformation T∞ ∈ GL (n, R) and all balancing transformations can be obtained in this way, for suitable initial conditions T0 ∈ GL (n, R). (c) Let T∞ be a balancing transformation and let In (T∞ ) denote the set of all T0 ∈ GL (n, R), such that the solution T (t) of (3.3) with T (0) = T0 converges to T∞ at t → ∞. Then the stable manifold W s (T∞ ) is an immersed invariant submanifold of GL (n, R) of dimension n (n + 1) /2 and every solution T (t) ∈ W s (T∞ ) converges exponentially fast in W s (T∞ ) to T∞ . Proof 3.2 Follows immediately from Theorem 6.4.12 with the substitution Wc = Y Y , Wo = X X. In the same way the gradient flow of the quadratic version of the objective function Φ (T ) is derived. For T ∈ GL (n, R), let 2 2 −1 Ψ (T ) = tr (T Wc T ) + (T ) Wo T −1 (3.4) The next theorem is an immediate consequence of Theorem 6.4.16, using the substitutions Wc = Y Y , Wo = X X. Theorem 3.3 (a) The gradient flow T˙ = −∇Ψ (T ) of the objective function Ψ on GL (n, R) is −1 −1 −1 T˙ = 2 (T ) Wo (T T ) Wo (T T ) − T Wc T T Wc
(3.5)
and for all initial conditions T0 ∈ GL (n, R), the solutions T (t) ∈ GL (n, R) of (3.5) exist for all t ≥ 0.
248
Chapter 8. Balancing via Gradient Flows
(b) For all initial conditions T0 ∈ GL (n, R), every solution T (t) of (3.5) converges to a balancing transformation and all balancing transformations are obtained in this way, for a suitable initial conditions T0 ∈ GL (n, R). (c) For any balancing transformation T∞ ∈ GL (n, R) let W s (T∞ ) ⊂ GL (n, R) denote the set of all T0 ∈ GL (n, R), such that the solution T (t) of (3.5) with initial condition T0 converges to T∞ as t → ∞. Then W s (T∞ ) is an immersed submanifold of GL (n, R) of dimension n (n + 1) /2 and is invariant under the flow of (3.5). Every solution T (t) ∈ W s (T∞ ) converges exponentially to T∞ .
Diagonal balancing transformations Here we address the related issue of computing diagonal balancing transformations for a given asymptotically stable minimal realization (A, B, C). Again the results are special cases of the more general results derived in Section 6.4. Let us consider a fixed diagonal matrix N = diag (µ1 , . . . , µn ) with distinct eigenvalues µ1 > · · · > µn . Using N , a weighted cost function for balancing is defined by −1 ΦN (T ) = tr N T Wc T + N (T ) Wo T −1 (3.6) We have the following immediate corollaries of Lemma 6.4.17 and Theorem 6.4.18. Lemma 3.4 Let N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn . Then (a) T ∈ GL (n, R) is a critical point of ΦN : GL (n, R) → R if and only if T is a diagonal balancing transformation, i.e. −1
T Wc T = (T )
Wo T −1 = diagonal
(b) A global minimum Tmin ∈ GL (n, R) of ΦN exists, if (A, B, C) is controllable and observable. Theorem 3.5 Let (A, B, C) be asymptotically stable, controllable and observable and let N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn . (a) The gradient flow T˙ = −∇ΦN (T ) of the weighted cost function ΦN : GL (n) → R is −1 −1 T˙ = (T ) Wo T −1 N (T ) − N T Wc ,
T (0) = T0
(3.7)
and for all initial conditions T0 ∈ GL (n) the solution of (3.7) exists and is invertible for all t ≥ 0.
8.4. Balancing via Isodynamical Flows
249
(b) For any initial condition T0 ∈ GL (n) the solution T (t) of (3.7) converges to a diagonal balancing coordinate transformation T∞ ∈ GL (n) and all diagonal balancing transformations can be obtained in this way, for a suitable initial condition T0 ∈ GL (n). (c) Suppose the Hankel singular values di , i = 1, . . . , n, of (A, B, C) are distinct. Then (3.7) has exactly 2n n! equilibrium points T∞ . These −1 −1 are characterized by (T∞ ) Wo T∞ = T∞ Wc T∞ = D where D is a diagonal matrix. There are exactly 2n stable equilibrium points of (3.7), where now D = diag (d1 , . . . , dn ) is diagonal with d1 < · · · < dn . There exists an open and dense subset Ω ⊂ GL (n) such that for all T0 ∈ Ω the solution of (3.7) converges exponentially fast to a stable equilibrium point T∞ . The rate of exponential convergence is bounded below by −1
λmin (T∞ T∞ )
. min ((di − dj ) (µj − µi ) , 4di µi ) . i<j
All other equilibria are unstable.
Main Points of Section Gradient flows on the co-ordinate basis transformations are studied for both balancing and diagonal balancing. Again, exponential convergence rates are achieved.
8.4 Balancing via Isodynamical Flows In this section we construct ordinary differential equations A˙ =f (A, B, C) B˙ =g (A, B, C) C˙ =h (A, B, C) evolving on the space of all realizations (A, B, C) of a given transfer function G (s), with the property that their solutions (A (t) , B (t) , C (t)) all converge
¯ B, ¯ C¯ of G (s). These equations for t → ∞ to balanced realizations A, generalize the class of isospectral flows, defined by B = 0, C = 0, and are thus of theoretical interest.
250
Chapter 8. Balancing via Gradient Flows
A differential equation A˙ (t) =f (t, A (t) , B (t) , C (t)) B˙ (t) =g (t, A (t) , B (t) , C (t))
(4.1)
C˙ (t) =h (t, A (t) , B (t) , C (t)) defined on the vector space of all triples (A, B, C) ∈ Rn×n × Rn×m × Rp×n is called isodynamical if every solution (A (t) , B (t) , C (t)) of (4.1) is of the form −1 −1 (A (t) , B (t) , C (t)) = S (t) A (0) S (t) , S (t) B (0) , C (0) S (t) (4.2) with S (t) ∈ GL (n), S (0) = In . Condition (4.2) implies that the transfer function Gt (s) = C (t) (sI − A (t))−1 B (t) = C (0) (sI − A (0))−1 B (0) (4.3) is independent of t. Conversely if (4.3) holds with (A (0) , B (0) , C (0)) controllable and observable, then (A (t) , B (t) , C (t)) is of the form (4.2). This is the content of Kalman’s realization theorem, see Appendix B. There is a simple characterization of isodynamical flows which extends the characterization of isospectral flows given in Chapter 1. Lemma 4.1 Let I ⊂ R be an interval and let Λ (t) ∈ Rn×n , t ∈ I, be a continuous time-varying family of matrices. Then A˙ (t) =Λ (t) A (t) − A (t) Λ (t) B˙ (t) =Λ (t) B (t)
(4.4)
C (t) = − C (t) Λ (t) is isodynamical. Conversely, every isodynamical differential equation (4.1) on Rn×n × Rn×m × Rp×n is of the form (4.4). Proof 4.2 Let T (t) denote the unique solution of the linear differential equation T˙ (t) = Λ (t) T (t) , T (0) = In , and let
ˆ (t) , Cˆ (t) = T (t) A (0) T (t)−1 , T (t) B (0) , C (0) T (t)−1 . Aˆ (t) , B
8.4. Balancing via Isodynamical Flows
251
ˆ (0) , Cˆ (0) = (A (0) , B (0) , C (0)) and Then Aˆ (0) , B d ˆ A (t) =T˙ (t) A (0) T (t)−1 − T (t) A (0) T (t)−1 T˙ (t) T (t)−1 dt =Λ (t) Aˆ (t) − Aˆ (t) Λ (t) d ˆ ˆ (t) B (t) =T˙ (t) B (0) = Λ (t) B dt d ˆ −1 −1 C (t) =C (0) − T (t) T˙ (t) T (t) = −Cˆ (t) Λ (t) . dt
ˆ (t) , Cˆ (t) of an isodynamical flow satisfies Thus every solution Aˆ (t) , B (4.4). On the other
hand, for every solution (A (t) , B (t) , C (t)) of (4.4), ˆ (t) , Cˆ (t) is of the form (4.2). By the uniqueness of the solutions Aˆ (t) , B
ˆ (t) , Cˆ (t) = (A (t) , B (t) , C (t)), t ∈ I, and the result of (4.4), Aˆ (t) , B follows. In the sequel let (A, B, C) denote a fixed asymptotically stable realization. At this point we do not necessarily assume that (A, B, C) is controllable or observable. Let
(4.5) O (A, B, C) = SAS −1 , SB, CS −1 | S ∈ GL (n, R) Since O (A, B, C) is an orbit for the similarity action on Rn×(n+m+p) , see Chapter 7, it is a smooth submanifold of Rn×(n+m+p) . Let Φ : O (A, B, C) → R denote the cost function defined by
Φ SAS −1 , SB, CS −1 = tr Wc SAS −1 , SB + Wo SAS −1 , CS −1 (4.6) where Wc (F, G) and Wo (F, H) denote the controllability and observability Gramians of (F, G, H) respectively. The following proposition summarizes some important properties of O (A, B, C) and Φ : O (A, B, C) → R. Proposition 4.3 Let (A, B, C) ∈ Rn×(n+m+p) . (a) O (A, B, C) is a smooth submanifold of Rn×(n+m+p) . The tangent space of O (A, B, C) at (F, G, H) ∈ O (A, B, C) is T(F,G,H) O (A, B, C) = ' & ([X, F ] , XG, −HX) ∈ Rn×(m+m+p) | X ∈ Rn×n
(4.7)
Let (A, B, C) be asymptotically stable, controllable and observable. Then
252
Chapter 8. Balancing via Gradient Flows
(b) O (A, B, C) is a closed subset of Rn×(n+m+p) . (c) The function Φ : O (A, B, C) → R defined by (4.6) is smooth and has compact sublevel sets. Proof 4.4 O (A, B, C) is an orbit of the GL (n, R) similarity action σ : GL (n) × Rn×(n+m+p) →Rn×(n+m+p)
(S, (A, B, C)) → SAS −1 , SB, CS −1 and thus a smooth submanifold of Rn×(n+m+p) ; see Appendix C. This proves (a). Now assume (A, B, C) is asymptotically stable, controllable and observable. Then (b) and (c) follow immediately from Lemmas 6.4.1 and 6.4.17. We now address the issue of finding gradient flows for the objective function Φ : O (A, B, C) → R, relative to some Riemannian metric on O (A, B, C). We endow the vector space Rn×(n+m+p) of all triples (A, B, C) with its standard Euclidean inner product , defined by (A1 , B1 , C1 ) , (A2 , B2 , C2 ) = tr (A1 A2 + B1 B2 + C1 C2 )
(4.8)
Since the orbit O (A, B, C) is a submanifold of Rn×(n+m+p) , the inner product , on Rn×(n+m+p) induces an inner product on each tangent space T(F,G,H) O (A, B, C) by ([X1 , F ] , X1 G, −HX1 ) , ([X2 , F ] , X2 G, −HX2 )
= tr [X1 , F ] · [X2 , F ] + X1 GG X2 + HX1 X2 H
(4.9)
and therefore defines a Riemannian metric on O (A, B, C); see Appendix C and (4.7). We refer to this Riemannian metric as the induced Riemannian metric on O (A, B, C) A second, and for the subsequent development, a more important Riemannian metric on O (A, B, C) is defined as follows. Here we assume that (A, B, C) is controllable and observable. Instead of defining the inner product of tangent vectors ([X1 , F ] , X1 G, −HX1 ) , ([X2 , F ] , X2 G, −HX2 ) ∈ T(F,G,H) O (A, B, C) as in (4.9) we set ([X1 , F ] , X1 G, −HX1 ) , ([X2 , F ] , X2 G, −HX2 ) := 2 tr (X1 X2 ) (4.10) The next lemma shows that (4.10) defines an inner product on the tangent space.
8.4. Balancing via Isodynamical Flows
253
Lemma 4.5 Let (A, B, C) be controllable or observable. Then ([X, A] , XB, −CX) = (0, 0, 0) implies X = 0.
Proof 4.6 If XB = 0 and AX = XA then X B, AB, . . . , An−1 B = 0. Thus by controllability X = 0. Similarly for observability of (A, B, C).
It is easily verified, using controllability and observability of (F, G, H), that (4.10) defines a Riemannian metric on O (A, B, C). We refer to this as the normal Riemannian metric on O (A, B, C). In order to determine the gradient flow of Φ : O (A, B, C) → R we need the following lemma. Lemma 4.7 Let Φ : O (A, B, C) → R be defined by Φ (F, G, H) = tr (Wc (F, G) + Wo (F, H)) for all (F, G, H) ∈ O (A, B, C). Then the derivative of Φ at (F, G, H) ∈ O (A, B, C) is the linear map on T(F,G,H) O (A, B, C) defined by DΦ(F,G,H) ([X, F ] , XG, −HX) = 2 tr ((Wc (F, G) − Wo (F, H)) X) (4.11) Proof 4.8 Consider the smooth map σ : GL (n) → O (A, B, C) defined by σ (S) = SF S −1 , SG, HS −1 for S ∈ GL (n). The composed map of Φ with σ is φ = Φ · σ with −1 φ (S) = tr SWc (F, G) S + (S ) Wo (F, H) S −1 By the chain rule we have Dφ|I (X) = DΦ|(F,G,H) ([X, F ] , XG, −HX) and Dφ|I (X) = 2 tr (Wc (F, G) − Wo (F, H) X) This proves the result. Theorem 4.9 Let (A, B, C) be asymptotically stable, controllable and observable. Consider the cost function Φ : O (A, B, C) → R, Φ (F, G, H) = tr (Wc (F, G) + Wo (F, H)). (a) The gradient flow A˙ = −∇A Φ (A, B, C), B˙ = −∇B Φ (A, B, C), C˙ = −∇C Φ (A, B, C) for the normal Riemannian metric on O (A, B, C) is A˙ (t) = − [A (t) , Wc (A (t) , B (t)) − Wo (A (t) , C (t))] B˙ (t) = − (Wc (A (t) , B (t)) − Wo (A (t) , C (t))) B (t) C˙ (t) =C (t) (Wc (A (t) , B (t)) − Wo (A (t) , C (t))) .
254
Chapter 8. Balancing via Gradient Flows
(b) For every initial condition (A (0) , B (0) , C (0)) ∈ O (A, B, C) the solution (A (t) , B (t) , C (t)) of (4.12) exist for all t ≥ 0 and (4.12) is an isodynamical flow on the open set of asymptotically stable controllable and observable systems (A, B, C). (c) For any initial condition (A (0) , B (0) , C (0)) ∈ O (A, B, C) the solution (A (t) , B (t) , C (t)) of (4.12) converges to a balanced realization −1 (A∞ , B∞ , C∞ ) of the transfer function C (sI − A) B. Moreover the −1 convergence to the set of all balanced realizations of C (sI − A) B is exponentially fast. Proof 4.10 The proof is similar to that of Theorem 6.5.1. Let grad Φ = (grad Φ1 , grad Φ2 , grad Φ3 ) denote the three components of the gradient of Φ with respect to the normal Riemannian metric. The derivative, i.e. tangent map, of Φ at (F, G, H) ∈ O (A, B, C) is the linear map DΦ|(F,G,H) : T(F,G,H) O (A, B, C) → R defined by DΦ|(F,G,H) ([X, F ] , XG, −HX) = 2 tr (X (Wc (F, G) − Wo (F, H))) , (4.12) see Lemma 5.1. By definition of the gradient of a function, see Appendix C, grad Φ (F, G, H) is characterized by the conditions grad Φ (F, G, H) ∈ T(F,G,H) O (A, B, C)
(4.13)
and DΦ|(F,G,H) ([X, F ] , XG, −HX) = (grad Φ1 , grad Φ2 , grad Φ3 ) , ([X, F ] , XG, −HX)
(4.14)
for all X ∈ Rn×n . By Proposition 4.3, (4.13) is equivalent to grad Φ (F, G, H) = ([X1 , F ] , X1 G, −HX1 )
(4.15)
for X1 ∈ Rn×n . Note that by Lemma 4.7, the matrix X1 is uniquely determined. Thus (4.14) is equivalent to 2 tr ((Wc (F, G) − Wo (F, H)) X) = ([X1 , F ] , X1 G, −HX1 ) , ([X, F ] , XG, −HX) =2 tr (X1 X) for all X ∈ Rn×n . Thus X1 = Wc (F, G) − Wo (F, H)
(4.16)
8.4. Balancing via Isodynamical Flows
255
and therefore
grad Φ (F, G, H) = [Wc (F, G) − Wo (F, H) , F ] ,
(Wc (F, G) − Wo (F, H)) G, −H (Wc (F, G) − Wo (F, H)) .
This proves (a). The rest of the argument goes as in the proof of Theorem 6.5.1. Explicitly, for (b) note that Φ (A (t) , B (t) , C (t)) decreases along every solution of (4.12). By Proposition 4.3, {(F, G, H) ∈ O (A, B, C) | Φ (F, G, H) ≤ Φ (Fo , Go , Ho )} is a compact set. Therefore (A (t) , B (t) , C (t)) stays in that compact subset (for (Fo , Go , Ho ) = (A (0) , B (0) , C (0))) and thus exists for all t ≥ 0. By Lemma 4.1 the flows are isodynamical. This proves (b). Since (4.12) is a gradient flow of Φ : O (A, B, C) → R and since Φ : O (A, B, C) → R has compact sublevel sets the solutions (A (t) , B (t) , C (t)) all converge to the equilibria points of (4.12), i.e. to the critical points of Φ : O (A, B, C) → R. But the critical points of Φ : O (A, B, C) → R are just the balanced realizations (F, G, H) ∈ O (A, B, C), being characterized by Wc (F, G) = Wo (F, H); see Lemma 4.7. Exponential convergence follows since the Hessian of Φ at a critical point is positive definite, cf. Lemma 6.4.17. A similar ODE approach also works for diagonal balanced realizations. Here we consider the weighted cost function ΦN : O (A, B, C) → R, ΦN (F, G, H) = tr (N (Wc (F, G) + Wo (F, H)))
(4.17)
for a real diagonal matrix N = diag (µ1 , . . . , µn ), µ1 > · · · > µn . We have the following result. Theorem 4.11 Let (A, B, C) be asymptotically stable, controllable and observable. Consider the weighted cost function ΦN : O (A, B, C) → R, ΦN (F, G, H) = tr (N (Wc (F, G) + Wo (F, H))) , with N = diag (µ1 , . . . , µn ), µ1 > · · · > µn . (a) The gradient flow A˙ = −∇A ΦN , B˙ = −∇B ΦN , C˙ = −∇C ΦN for the normal Riemannian metric is A˙ = − [A, Wc (A, B) N − N Wo (A, C)] B˙ = − (Wc (A, B) N − N Wo (A, C)) B C˙ =C (Wc (A, B) N − N Wo (A, C))
(4.18)
256
Chapter 8. Balancing via Gradient Flows
(b) For any initial condition (A (0) , B (0) , C (0)) ∈ O (A, B, C) the solution (A (t) , B (t) , C (t)) of (4.18) exists for all t ≥ 0 and the flow (4.18) is isodynamical. (c) For any initial condition (A (0) , B (0) , C (0)) ∈ O (A, B, C) the solution matrices (A (t) , B (t) , C (t)) of (4.18) converges to a diagonal −1 balanced realization (A∞ , B∞ , C∞ ) of C0 (sI − A0 ) B0 . Proof 4.12 The proof of (a) and (b) goes—mutatis mutandis—as for (a) and (b) in Theorem 4.9. Similarly for (c) once we have checked that the critical point of ΦN : O (A, B, C) → R are the diagonal balanced realizations. But this follows using the same arguments given in Theorem 6.3.9.
We emphasize that the above theorems give a direct method to compute balanced or diagonal balanced realizations, without computing any balancing coordinate transformations.
Discrete-time Balancing Flows Based on earlier observations in this chapter, it is no surprise that discretetime flows to achieve balanced realizations and diagonal balanced realizations follow directly from the balanced matrix factorization algorithms of Chapter 6 and inherit their convergence properties. Here we restrict attention to isodynamical balancing flows derived by taking the matrix factors X, Y to be the m-th controllability and observability matrices as follows:
Y =Rm (A, B) = B, AB, . . . , Am−1 B C CA . X =Om (A, C) = .. .
(4.19)
(4.20)
CAm−1 Now the observability and controllability Gramians are Wo(m) (A, C) = X X,
Wc(m) (A, B) = Y Y
(4.21)
From now on let us assume that m ≥ n + 1. The case m = ∞ is allowed if A is allowed to be discrete-time stable. We claim that the matrix factorization balancing flow (6.6.1) (6.6.2) specialized to this case gives by simple
8.4. Balancing via Isodynamical Flows
257
manipulations the discrete-time isodynamical flow (m)
(Ak ,Bk )−Wo(m) (Ak ,Ck ))
Ak eαk (Wc
(m)
(Ak ,Bk )−Wo(m) (Ak ,Ck ))
Bk
Ak+1 =e−αk (Wc Bk+1 =e−αk (Wc Ck+1 =Ck e
(m)
(Ak ,Bk )−Wo(m) (Ak ,Ck ))
αk (Wc(m) (Ak ,Bk )−Wo(m) (Ak ,Ck ))
(4.22) αk =
1 (m) (m) 2λmax Wc (Ak , Bk ) + Wo (Ak , Ck )
(4.23)
In fact from (6.6.1), (6.6.2) we obtain for Tk = e−αk (Yk Yk −Xk Xk ) ∈ GL (n) that Xk+1 =Xk Tk−1 , X0 =Om (A0 , C0 ) ,
k ∈N,
Yk+1 =Tk Yk , Y0 =Rm (A0 , B0 ) .
A simple induction argument on k then establishes the existence of (Ak , Bk , Ck ) with Xk = Om (Ak , Ck ) ,
Yk = Rm (Ak , Bk )
for all k ∈ N0 . The above recursion on (Xk , Yk ) is now equivalent to
Om (Ak+1 , Ck+1 ) =Om (Ak , Ck ) Tk−1 = Om Tk Ak Tk−1 , Ck Tk−1
Rm (Ak+1 , Bk+1 ) =Tk Rm (Ak , Bk ) = Rm Tk Ak Tk−1 , Tk Bk for k ∈ N. Thus, using m ≥ n + 1, we obtain Ak+1 = Tk Ak Tk−1 ,
Bk+1 = Tk Bk ,
Ck+1 = Ck Tk−1 ,
which is equivalent to (4.22). Likewise for the isodynamical diagonal balancing flows, we have (m)
Ak+1 =e−αk (Wc
(Ak ,Bk )N −N Wo(m) (Ak ,Ck )) (m)
× Ak eαk (Wc (m)
Bk+1 =e−αk (Wc
(Ak ,Bk )N −N Wo(m) (Ak ,Ck ))
(m)
Ck+1 =Ck eαk (Wc
(Ak ,Bk )N −N Wo(m) (Ak ,Ck ))
(4.24) Bk
(Ak ,Bk )N −N Wo(m) (Ak ,Ck ))
258
Chapter 8. Balancing via Gradient Flows
1 (4.25) αk = (m) (m) 2 Wo N − N Wc (m) (m) (m) (m) N + N Wo − Wc Wo − Wc + 1 × log (m) (m) (m) (m) 4 N · Wo N − N Wc tr Wc + Wo where the Ak , Bk , Ck dependance of αk has been omitted for simplicity. Also, as before N is taken for most applications as N = diag (µ1 , . . . , µn ), µ1 > · · · > µn . By applying Theorem 6.6.4 we conclude Theorem 4.13 Let (A0 , B0 , C0 ) ∈ L (n, m, p) be controllable and observable and let N = diag (µ1 , . . . , µn ) with µ1 > · · · > µn . Then: (a) The recursion (4.24) is a discrete-time isodynamical flow. (b) Every solution (Ak , Bk , Ck ) of (4.24) converges to the set of all diag−1 onal balanced realizations of the transfer function C0 (sI − A0 ) B0 . (c) Suppose that the singular values of the Hankel matrix Hm = Om (A0 , C0 ) Rm (A0 , B0 ) are distinct. There are exactly 2n n! fixed points of (4.24), corresponding to the diagonal balanced realizations of −1 C0 (sI − A0 ) B0 . The set of asymptotically stable fixed points correspond to those diagonal balanced realizations (A∞ , B∞ , C∞ ) with Wc (A∞ , B∞ ) = Wo (A∞ , C∞ ) = diag (σ1 , . . . , σn ), where σ1 < · · · < σn are the singular values of Hm .
Main Points of Section Ordinary differential equations on the linear system matrices A, B, C are achieved which converge exponentially to balancing, or diagonal balancing matrices. This avoids the need to work with co-ordinate basis transformations. Discrete-time isodynamical flows for balancing and diagonal balancing are in essence matrix factorization flows for balancing and inherit the same geometric and dynamical system properties.
8.5 Euclidean Norm Optimal Realizations We now consider another type of balancing that can be applied to systems regardless of their stability properties. This form of balancing was introduced in Verriest (1988) and Helmke (1993a) and interesting connections
8.5. Euclidean Norm Optimal Realizations
259
exist with least squares matching problems arising in computer graphics, Brockett (1989a). The techniques in this section are crucial to our analysis of L2 -sensitivity optimization problems as studied in the next chapter. The limiting solution of a gradient algorithm appears to be the only possible way of finding such least squares optimal realizations and direct algebraic algorithms for the solutions are unknown. Rather than minimizing the sum of the traces of the controllability and observability Gramians (which may not exist) the task here is to minimize the least squares or Euclidean norm of the realization, that is T AT −1, T B, CT −1 2
= tr T AT −1T −1 A T + T BB T + CT −1 T −1 C (5.1)
Observe that (5.1) measures the distance of T AT −1 , T B, CT −1 to the zero realization (0, 0, 0). Eising has considered the problem of finding the best approximations of a given realization (A, B, C) by uncontrollable or unobservable systems with respect to the Euclidean norm (5.1). Thus this norm is sometimes called the Eising distance. Using the substitution P = T T the function to be minimized, over the class of symmetric positive definite matrices P , is Γ : P (n) → R,
(5.2) Γ (P ) = tr AP −1 A P + BB P + P −1 C C . The gradient ∇Γ (P ) is then ∇Γ (P ) = AP −1 A − P −1 A P AP −1 + BB − P −1 C CP −1 (5.3) and the associated gradient flow P˙ = −∇Γ (P ) is P˙ = −AP −1 A − BB + P −1 (A P A + C C) P −1 .
(5.4)
Any equilibrium P∞ of such a gradient flow will be characterized by −1 −1 −1 AP∞ A + BB = P∞ (A P∞ A + C C) P∞ .
(5.5)
> 0 satisfying (5.5) is called Euclidean norm optimal. Any P∞ = P∞ 1/2 Let P∞ = P∞ > 0 satisfy (5.5) and T∞ = P∞ be the positive definite −1 −1 symmetric square root. Then (F, G, H) = T∞ AT∞ , T∞ B, CT∞ satisfies
F F + GG = F F + H H.
(5.6)
Any realization (F, G, H) ∈ O (A, B, C) which satisfies (5.6) is called Euclidean norm balanced. A realization (F, G, H) ∈ O (A, B, C) is called Euclidean diagonal norm balanced if F F + GG = F F + H H = diagonal.
(5.7)
260
Chapter 8. Balancing via Gradient Flows
The diagonal entries σ1 ≥ · · · ≥ σn > 0 of (5.7) are called the generalized (norm) singular values of (F, G, H) and F F + GG , F F + H H are called the Euclidean norm controllability Gramian and Euclidean norm observability Gramian, respectively. For the subsequent stability of the gradient flow we need the following technical lemma. Lemma 5.1 Let (A, B, C) be controllable or observable (but not necessarily asymptotically stable!). Then the linear operator In ⊗ AA + A A ⊗ In − A ⊗ A − A ⊗ A + In ⊗ BB + C C ⊗ In (5.8) has all its eigenvalues in C+ = {λ ∈ C | Re (λ) > 0}. Proof 5.2 Suppose there exists λ ∈ C and X ∈ Cn×n such that XAA + A AX − A XA − AXA + XBB + C CX = λ · X ¯ denote the Hermitian transpose. Then Let X ∗ = X tr(XAA X ∗ + AXX ∗ A − AXA X ∗ − A XAX ∗ + XBB X ∗ + CXX ∗ C ) = Re (λ) · X
2
A straightforward manipulation shows that the left hand side is equal to 2
2
2
AX − XA + XB + CX ≥ 0 and therefore Re (λ) ≥ 0. Suppose AX = XA, XB = 0 = CX. Then by Lemma 4.7 X = 0. Thus taking vec operations and assuming X = 0 implies the result. Lemma 5.3 Given a controllable and observable realization (A, B, C), the linearization of the flow (5.4) at any equilibrium point P∞ > 0 is exponentially stable. Proof 5.4 The linearization (5.4) about P∞ is given by vec (ξ) where
d dt
vec (ξ) = J ·
−1 −1 −1 −1 −1 ⊗ AP∞ − P∞ ⊗ P∞ (A P∞ A + C C) P∞ J =AP∞ −1 −1 −1 −1 −1 − P∞ (A P∞ A + C C) P∞ ⊗ P∞ + P∞ A ⊗ P∞ A 1/2
−1/2
Let F = P∞ AP∞
1/2
−1/2
, G = P∞ B, H = CP∞ . Then with 1/2 1/2 1/2 1/2 J P∞ J¯ = P∞ ⊗ P∞ ⊗ P∞
8.5. Euclidean Norm Optimal Realizations
261
J¯ = F ⊗ F − I ⊗ F F + F ⊗ F − F F ⊗ I − I ⊗ H H − H H ⊗ I (5.9) and using (5.6) gives J¯ = F ⊗ F + F ⊗ F − I ⊗ (F F + GG ) − (F F + H H) ⊗ I (5.10) By Lemma 5.1 the matrix J¯ has only negative eigenvalues. The symmetric matrices J¯ and J are congruent, i.e. J¯ = X JX for an invertible n × n matrix X. By the inertia theorem, see Appendix A, J and J¯ have the same rank and signatures and thus the numbers of positive respectively negative eigenvalues coincide. Therefore J has only negative eigenvalues which proves the result. The uniqueness of P∞ is a consequence of the Kempf-Ness theorem and follows from Theorem 7.5.1. Using Lemma 5.3 we can now give an alternative uniqueness proof which uses Morse theory. It is intuitively clear that on a mountain with more than one maximum or minimum there should also be a saddle point. This is the content of the celebrated Birkhoff minimax theorem and Morse theory offers a systematic generalization of such results. Thus, while the intuitive basis of the uniqueness of P∞ should be obvious from Theorem 7.4.7, we show how the result follows by the formal machinery of Morse theory. See Milnor (1963) for a thorough account on Morse theory. Proposition 5.5 Given any controllable and observable system (A, B, C), > 0 which minimizes Γ : P (n) → R there exists a unique P∞ = P∞ 1/2 and T∞ = P∞ is a least squares optimal coordinate transformation. Also, Γ : P (n) → R has compact sublevel sets. Proof 5.6 Consider the continuous map τ : P (n) → Rn×(n+m+p) defined by
τ (P ) = P 1/2 AP −1/2 , P 1/2 B, CP −1/2 .
By Lemma 7.4.1 the similarity orbit O (A, B, C) is a closed subset of Rn×(n+m+p) and therefore Ma := {(F, G, H) ∈ O (A, B, C) | tr (F F + GG + H H) ≤ a} is compact for all a ∈ R. The function τ : P (n) → O (A, B, C) is a homeomorphism and therefore τ −1 (Ma ) = {P ∈ P (n) | Γ (P ) ≤ a} is compact.
262
Chapter 8. Balancing via Gradient Flows
This proves that Γ : P (n) → R has compact sublevel sets. By Lemma 5.3, the Hessian of Γ at each critical point is positive definite. A smooth function f : P (n) → R is called a Morse function if it has compact sublevel sets and if the Hessian at each critical point of f is invertible. Morse functions have isolated critical points, let ci (f ) denote the number of critical points where the Hessian has exactly i negative eigenvalues, counted with multiplicities. Thus c0 (f ) is the number of local minima of f . The Morse inequalities bound the numbers ci (f ), i ∈ No for a Morse function f in terms of topological invariants of the space P (n), which are thus independent of f . These are the so-called Betti numbers of P (n). The i-th Betti number bi (P (n)) is defined as the rank of the i-th (singular) homology group Hi (P (n)) of P (n). Also, b0 (P (n)) is the number of connected components of P (n). Since P (n) is homeomorphic to the Euclidean space Rn(n+1)/2 we have 1 i = 0, bi (P (n)) = (5.11) 0 i ≥ 1. From the above, Γ : P (n) → R is a Morse function on P (n). The Morse inequalities for Γ are c0 (Γ) ≥b0 (P (n)) c0 (Γ) − c1 (Γ) ≤b0 (P (n)) − b1 (P (n)) c0 (Γ) − c1 (Γ) + c2 (Γ) ≥b0 (P (n)) − b1 (P (n)) + b2 (P (n)) .. . dim P(n)
i=0
dim P(n) i
(−1) ci (Γ) =
i
(−1) bi (P (n)) .
i=0
Since ci (Γ) = 0 for i ≥ 1 the Morse inequalities imply, using (5.11), c0 (Γ) ≥1 c0 (Γ) − c1 (Γ) =c0 (Γ) ≤ b0 (P (n)) = 1. Hence c0 (Γ) = 1, and ci (Γ) = 0 for i ≥ 1, i.e. Γ has a unique local = global minimum. This completes the proof. We summarize the above results in a theorem. Theorem 5.7 Let (A, B, C) be controllable and observable and let Γ : P (n) → R be the smooth function defined by
Γ (P ) = tr AP −1 A P + BB P + C CP −1 .
8.5. Euclidean Norm Optimal Realizations
263
The gradient flow P˙ (t) = −∇Γ (P (t)) on P (n) is given by P˙ = −AP −1 A − BB + P −1 (A P A + C C) P −1 .
(5.12)
For every initial condition P0 = P0 > 0, the solution P (t) ∈ P (n) of the gradient flow exists for all t ≥ 0 and P (t) converges to the uniquely determined positive definite matrix P∞ satisfying (5.5). As before we can find equations that evolve on the space of realizations rather than on the set of positive definite squared transformation matrices. Theorem 5.8 Let the matrices (A, B, C) be controllable and observable and let Γ : O (A, B, C) → R be the least squares cost function defined by Γ (F, G, H) = tr (F F + GG + H H) for (F, G, H) ∈ O (A, B, C). (a) The gradient flow A˙ = −∇A Γ, B˙ = −∇B Γ, C˙ = −∇C Γ for the normal Riemannian metric on O (A, B, C) given by (4.10) is A˙ = [A, [A, A ] + BB − C C] B˙ = − ([A, A ] + BB − C C) B
(5.13)
C˙ =C ([A, A ] + BB − C C) (b) For all initial conditions (A (0) , B (0) , C (0)) ∈ O (A, B, C), the solution (A (t) , B (t) , C (t)) of (5.13), exist for all t ≥ 0 and (5.13), is an isodynamical flow on the set of all controllable and observable triples (A, B, C). (c) For any initial condition, the solution (A (t) , B (t) , C (t)) of (5.13) converges to a Euclidean norm optimal realization, characterized by AA + BB = A A + C C
(5.14)
(d) Convergence to the class of Euclidean norm optimal realizations (5.14) is exponentially fast. Proof 5.9 Let grad Γ = (grad Γ1 , grad Γ2 , grad Γ3 ) denote the gradient of Γ : O (A, B, C) → R with respect to the normal Riemannian metric. Thus for each (F, G, H) ∈ O (A, B, C), then grad Γ (F, G, H) is characterized by the condition grad Γ (F, G, H) ∈ T(F,G,H) O (A, B, C)
(5.15)
264
Chapter 8. Balancing via Gradient Flows
and DΓ|(F,G,H) ([X, F ] , XG, −HX) = (grad Γ1 , grad Γ2 , grad Γ3 ) , ([X, F ] , XG, −HX)
(5.16)
for all X ∈ Rn×n . The derivative of Γ at (F, G, H) is the linear map defined on the tangent space T(F,G,H) O (A, B, C) by DΓ|(F,G,H) ([X, F ] , XG, −HX) = 2 tr [([F, F ] + GG − H H) X] (5.17) for all X ∈ Rn×n . By (4.14) grad Γ (F, G, H) = ([X1 , F ] , X1 G, −HX1 ) for a uniquely determined n × n matrix X1 . Thus with (5.17), then (5.16) is equivalent to X1 = [F, F ] + GG − H H
(5.18)
This proves (a). The proofs of (b) and (c) are similar to the proofs of Theorem 4.9. We only have to note that Γ : O (A, B, C) → R has compact sublevel sets (this follows immediately from the closedness of O (A, B, C) in Rn×(n+m+p) ) and that the Hessian of Γ on the normal bundle of the set of equilibria is positive definite. The above theorem is a natural generalization of Theorem 6.5.1. In fact, for A = 0, (5.13) is equivalent to the gradient flow (6.5.4). Similarly for B = 0, C = 0, (5.13) is the double bracket flow A˙ = [A, [A, A ]]. It is inconvenient to work with the gradient flow (5.13) as it involves solving a cubic matrix differential equation. A more convenient form leading to a quadratic differential equation is to augment the system with a suitable differential equation for a matrix parameter Λ. Problem 5.10 Show that the system of differential equations A˙ =AΛ − ΛA B˙ = − ΛB C˙ =CΛ Λ˙ = − Λ + [A, A ] + BB − C C
(5.19)
is Lyapunov stable, where Λ = Λ is symmetric. Moreover, every solution (A (t) , B (t) , C (t) , Λ (t)) of (5.19) exists for all t ≥ 0 and converges to (A∞ , B∞ , C∞ , 0) where (A∞ , B∞ , C∞ ) is Euclidean norm balanced.
8.5. Euclidean Norm Optimal Realizations
265
Main Points of Section A form of balancing involving a least squares (Euclidean) norm on the system matrices is achieved via gradient flow techniques. There does not appear to be an explicit algebraic solution to such balancing. At this stage, the range of application areas of the results of this section is not clear, but there are some applications in system identification under study, and there is a generalization which is developed in Chapter 9 to sensitivity minimizations.
266
Chapter 8. Balancing via Gradient Flows
Notes for Chapter 8 Numerical algorithms for computing balanced realizations are described in Laub et al. (1987), Safonov and Chiang (1989). These are based on standard numerical software. Least squares optimal realizations have been considered by Verriest (1988) and Helmke (1993a). Numerical linear algebra methods for computing such realizations are unknown and the gradient flow techniques developed in this chapter are the only available computational tools. An interesting feature of the dynamical systems for balancing as described in this chapter is their robustness with respect to losses of controllability or observability properties. For example, the linear and quadratic gradient flows on positive definite matrices have robust or high rates of exponential convergence, respectively, if the observability properties of the system are poor. Thus for certain applications such robustness properties may prove useful. Discrete-time versions of the gradient flows evolving on positive definite matrices are described in Yan, Moore and Helmke (1993). Such recursive algorithms for balancing are studied following similar ideas for solving Riccati equations as described by Hitz and Anderson (1972). Isodynamical flows are a natural generalization of isospectral matrix flows (where B = C = 0). Conversely, isodynamical flows may also be A B ]). viewed as special cases of isospectral flows (evolving on matrices [ C 0 Moreover, if A = 0, then the class of isodynamical flows is equivalent those evolving on orbits O (B, C), studied in Chapter 6. For some early ideas concerning isodynamical flows see Hermann (1979). Flows on spaces of linear systems are discussed in Krishnaprasad (1979) and Brockett and Faybusovich (1991). A number of open research problems concerning the material of Chapter 8 are listed below (a) Is there a cost function whose critical points are the Euclidean norm diagonal balanced realizations? N.B.: The trace function tr (N (AA + BB + C C)) does not seem to work! (b) Pernebo and Silverman (1982) has shown that truncated subsystems of diagonal balanced realizations of asymptotically stable systems are also diagonal balanced. Is there a “cost function” proof of this? What about other types of balancing? The truncation property does not hold for Euclidean norm balancing. (c) Develop a Singular Perturbation Approach to balanced realization model reduction, i.e. find a class of singular perturbation isodynami-
8.5. Euclidean Norm Optimal Realizations
267
cal flows which converge to reduced order systems. (d) Study measures for the degree of balancedness! What is the minimal distance of a realization to the class of balanced ones? There are two cases of interest: Distance to the entire set of all balanced realizations or to the subset of all balanced realizations of a given transfer function.
CHAPTER
9
Sensitivity Optimization 9.1 A Sensitivity Minimizing Gradient Flow In this chapter, the further development and application of matrix least squares optimization techniques is made to minimizing linear system parameter sensitivity measures. Even for scalar linear systems, the sensitivity of the system transfer function with respect to the system state space realization parameters is expressed in terms of norms of matrices. Thus a matrix least squares approach is a very natural choice for sensitivity minimization. See also Helmke and Moore (1993), Yan and Moore (1992) and Yan, Moore and Helmke (1993). For practical implementations of linear systems in signal processing and control, an important issue is that of sensitivity of the input/output behaviour with respect to the internal parameters. Such sensitivity is dependent on the co-ordinate basis of the state space realization of the parametrized system. In this section we tackle, via gradient flow techniques, the task of L2 -sensitivity minimization over the class of all co-ordinate basis. The existence of L2 -sensitivity optimal realizations is shown and it is proved that the class of all optimal realizations is obtained by orthogonal transformations from a single optimal one. Gradient flows are constructed whose solutions converge exponentially fast to the class of sensitivity optimal realizations. The results parallel the norm-balancing theory developed in Chapters 7 and 8. Since the sensitivity optimum co-ordinate basis selections are unique to within arbitrary orthogonal transformations, there remains the possibility of ‘super’ optimal designs over this optimum class to achieve scaling and sparseness constraints, with the view to good finiteword-length filter design. Alternatively, one could incorporate such con-
270
Chapter 9. Sensitivity Optimization
straints in a more complex ‘sensitivity’ measure, or as side constraints in the optimization, as developed in the last section. As far as we know, the matrix least squares L2 -sensitivity minimization problem studied here can not be solved by algebraic means. For the authors, this chapter is a clear and convincing demonstration of the power of the gradient flow techniques of this book. It sets the stage for future gradient flow studies to solve optimization problems in signal processing and control theory not readily amenable to algebraic solution. The classic problem of robust continuous-time filter design insensitive to component value uncertainty is these days often replaced by a corresponding problem in digital filter design. Finite-word-length constraints for the variables and coefficients challenge the designer to work with realizations with input/output properties relatively insensitive to coefficient values, as discussed in Williamson (1991) and Roberts and Mullis (1987). Input/output shift operator representations involving regression state vectors are notoriously sensitive to their coefficients, particularly as the system order increases. The so called delta operator representations are clearly superior for fast sampled continuous-time systems, as are the more general delay representation direct forms, as discussed in Williamson (1991) and Middleton and Goodwin (1990). The challenge is to select appropriate sensitivity measures for which optimal co-ordinate basis selections can be calculated, and which translate to practical robust designs. A natural sensitivity measure to use for robust filter design is an L2 -index as in Thiele (1986). Thus consider a discrete-time filter transfer function H (z) with the (minimal) state space representation H (z) = c (zI − A)
−1
b
(1.1)
where (A, b, c) ∈ Rn×n × Rn×1 × R1×n . The L2 -sensitivity measure is 2 2 2 ∂H + ∂H (z) + ∂H (z) (z) (1.2) S (A, b, c) = ∂b ∂c ∂A 2 2 2 where 2
X (z)2 =
1 2πi
>
∗ dz , tr X (z) X (z) z |z|=1
∗ and X (z) = X z −1 . The associated optimization task is to select a
co-ordinate basis transformation T such that S T AT −1, T b, cT −1 is minimized. Incidentally, adding a direct feedthrough constant d in (1.1) changes the index (1.2) by a constant and does not change the analysis. Also for p × m transfer function matrices H (z) = (Hij (z)) H (z) = C (zI − A)
−1
B,
with Hij (z) = ci (zI − A)
−1
bj (1.3)
9.1. A Sensitivity Minimizing Gradient Flow
271
where B = (b1 , b2 , . . . , bm ), C = c1 , c2 , . . . , cp , the corresponding sensitivity measure is p m ∂H 2 ∂H 2 ∂H 2 + + = S (A, B, C) = S (A, bj , ci ) ∂A ∂B ∂C 2 2 2 i=1 j=1
(1.4)
or more generally S T AT −1, T B, CT −1 . In tackling such sensitivity problems, the key observation of Thiele (1986) 2 is that although the term ∂H ∂A 2 appears difficult to work with technically,such difficulties can be circumvented by working instead with a term ∂H 2 , or rather an upper bound on this. There results a mixed L2 /L1 ∂A 1 sensitivity bound optimization which is mainly motivated because it allows explicit solution of the optimization problem. Moreover, the optimal realization turns out to be, conveniently, a balanced realization which is also 2 the optimal solution for the case when the term ∂H ∂A 2 is deleted from the sensitivity measure. More recently in Li, Anderson and Gevers (1992), the theory for frequency shaped designs within this L2 /L1 framework as first proposed by Thiele (1986) has been developed further. It is shown that, in general, the sensitivity term involving ∂H ∂A does then affect the optimal solution. Also the fact that the L2 /L1 bound optimal T is not unique is exploited. In fact the ‘optimal’ T is only unique to within arbitrary orthogonal transformations. This allows then further selections, such as the Schur form or Hessenberg forms, which achieve sparse matrices. One obvious advantage of working with sparse matrices is that zero elements can be implemented with effectively infinite precision. In this chapter, we achieve a complete theory for L2 sensitivity realization optimization both for discrete-time and continuous-time, linear, timeinvariant systems. Constrained sensitivity optimization is studied in the last section. Our aim in this section is to show that an optimal coordinate basis transformation T¯ exists and that every other optimal coordinate transformation differs from T¯ by an arbitrary orthogonal left factor. Thus P¯ = T¯ T¯ is uniquely determined. While explicit algebraic constructions for T¯ or P¯ are unknown and appear hard to obtain we propose to use steepest descent methods in order solution. Thus for example, with the
to find the optimal sensitivities S T AT −1, T b, cT −1 expressed as a function of P = T T , denoted S¯ (P ), the gradient flow on the set of positive definite symmetric matrices P is constructed as P˙ = −∇S¯ (P )
(1.5)
272
Chapter 9. Sensitivity Optimization
with P¯ = lim P (t)
(1.6)
t→∞
The calculation of the sensitivities S (P ) and the gradient ∇S (P ), in the first instance, requires unit circle integration of matrix transfer functions expressed in terms of P , A, b, c. For corresponding continuous-time results, the integrations would be along the imaginary axis in the complex plane. Such contour integrations can be circumvented by solving appropriate Lyapunov equations.
Optimal L2 -sensitivity Measure We consider the task of optimizing a total L2 -sensitivity function over the class of all minimal realizations (A, b, c) of single-input, single-output, discrete-time, asymptotically stable, rational transfer functions −1
H (z) = c (zI − A)
b.
(1.7)
Continuous-time, asymptotically stable, transfer functions can be treated in a similar way and present no additional difficulty. Here asymptotic stability for discrete-time systems means that A has its eigenvalues all in the open unit disc {z ∈ C | |z| < 1}. Likewise, continuous time systems (A, b, c) are called asymptotically stable if the eigenvalues of A all have real parts less than zero. In the sequel (A, b, c) ∈ Rn×n × Rn×1 × R1×n always denotes an asymptotically stable controllable and observable realization of H (z). Given any such initial minimal realization (A, b, c) of H (z), the similarity orbit
(1.8) RH := T AT −1, T b, cT −1 | T ∈ GL (n) of all minimal realizations of H (z) is a smooth closed submanifold of Rn×n × Rn×1 × R1×n , see Lemma 7.4.1. Note that we here depart from our previous notation O (A, b, c) for (1.8). To define a sensitivity measure on RH we first have to determine the partial derivatives of the transfer function H (z) with respect to the variables A, b and c. Of course, in the course of such computations we have to regard the complex variable z ∈ C as a fixed but arbitrary constant. Let us first introduce the definitions B (z) := (zI − A)
−1
b
−1
C (z) :=c (zI − A) A (z) :=B (z) · C (z)
(1.9)
9.1. A Sensitivity Minimizing Gradient Flow
273
By applying the standard rules of matrix calculus ,see Appendix A, we obtain well known formulas for the partial derivatives as follows: ∂H (z) = A (z) , ∂A
∂H (z) = C (z) , ∂b
∂H (z) = B (z) ∂c
(1.10)
∂H ∂H We see that the derivatives ∂H ∂A (z), ∂b (z), ∂c (z) are stable rational matrix valued functions of z and therefore their L2 -norms exist as the unit circle contour integrals: > ∂H 2
∗ dz = A (z)2 = 1 tr A (z) A (z) 2 ∂A 2πi z 2 2 > ∂H dz
1 2 ∗ (1.11) ∂b = C (z)2 = 2πi tr C (z) C (z) z 2 > ∂H 2 dz
= B (z)2 = 1 tr B (z) B (z)∗ 2 ∂c 2πi z 2
The total L2 -sensitivity function S : RH → R is defined by (1.2),which can be reformulated using the following identities for the controllability Gramian and observability Gramian obtained by expanding B (z) and C (z) in a Laurent series. > ∞ 1 ∗ dz k k A bb (A ) = Wc = B (z) B (z) 2πi z k=0 (1.12) > ∞ 1 dz ∗ k k Wo = (A ) c cA = C (z) C (z) 2πi z k=0
as S (A, b, c) =
1 2πi
>
∗ dz + tr Wc + tr Wo tr A (z) A (z) z
(1.13)
The first term involving contour integration can be expressed in explicit form as > ∂H 2 dz = 1 tr (AA∗ ) ∂A 2πi z 2 k l tr (A ) c b (A ) As bcAr = (1.14) k,l,r,s r+l=k+s
=
k,l,r,s r+l=k+s
cAr (A ) c · b (A ) As b k
l
Chapter 9. Sensitivity Optimization
274
Consider now
the effect on S (A, b, c) of a transformation to the state space (A, b, c) → T AT −1 , T b, cT −1 , T ∈ GL (n, R). It is easily seen that >
dz −1 tr T A (z) T −1 (T ) A (z)∗ T z >
dz 1 ∗ + tr T B (z) B (z) T 2πi z > dz 1 ∗ −1 + tr (T ) C (z) C (z) T −1 2πi z
Setting P = T T , we can reformulate S T AT −1, T b, cT −1 as a cost function on P (n), the set of positive definite matrices in Rn×n , as
1 S T AT −1 , T b, cT −1 = 2πi
S¯ (P ) (1.15) >
1 dz ∗ ∗ ∗ = tr A (z) P −1 A (z) P + B (z) B (z) P + C (z) C (z) P −1 2πi z Lemma 1.1 The smooth sensitivity function S¯ : P (n) → R defined by (1.15) has compact sublevel sets. Proof 1.2 Obviously S¯ (P ) ≥ 0 for all P ∈ P (n). For any P ∈ P (n) the inequality
S¯ (P ) ≥ tr (Wc P ) + tr Wo P −1
holds by (1.13). By Lemma 6.4.1, P ∈ P (n) | tr Wc P + Wo P −1 ≤ a is a compact subset of P (n). Thus
P ∈ P (n) | S¯ (P ) ≤ a ⊂ P ∈ P (n) | tr Wc P + Wo P −1 ≤ a
is a closed subset of a compact set and therefore also compact. Any continuous function f : P (n) → R with compact sublevel sets is proper and thus possesses a minimum, see Section C.1. Thus Lemma 1.1 immediately implies the following corollary. Corollary 1.3 The sensitivity function S¯ : P (n) → R (and S : RH → R) defined by (1.15) (respectively (1.13)) assumes its global minimum, i.e. there exists Pmin ∈ P (n) (and (Amin , bmin , cmin ) ∈ RH ) such that S¯ (Pmin ) = S (Amin , bmin , cmin ) =
inf
P ∈P(n)
inf
S¯ (P )
(F,g,h)∈RH
S (F, g, h)
9.1. A Sensitivity Minimizing Gradient Flow
275
Following the analysis developed in Chapters 6 and 8 for Euclidean norm balancing, we now derive the gradient flow P˙ = −∇S¯ (P ). Although the equations look forbidding at first, they are the only available reasons for solving the L2 -sensitivity optimal problem. Subsequently, we show how to work with Lyapunov equations rather than contour integrations. Using (1.15), we obtain for the total derivative of S¯ at P >
1 ∗ ∗ tr A (z) P −1 A (z) ξ − A (z) P −1 ξP −1 A (z) P DS¯P (ξ) = 2πi dz + B (z) B (z)∗ ξ − C (z)∗ C (z) P −1 ξP −1 z >
1 ∗ ∗ = tr A (z) P −1 A (z) − P −1 A (z) P A (z) P −1 2πi dz ∗ ∗ +B (z) B (z) − P −1 C (z) C (z) P −1 ξ z Therefore the gradient for S¯ : P (n) → R with respect to the induced Riemannian metric on P (n) is given by >
1 ∗ ∗ A (z) P −1 A (z) − P −1 A (z) P A (z) P −1 ∇S¯ (P ) = 2πi dz ∗ ∗ + B (z) B (z) − P −1 C (z) C (z) P −1 z The gradient flow P˙ = −∇S¯ (P ) of S¯ is thus seen to be 1 P˙ = 2πi
>
−1 ∗ ∗ P A (z) P A (z) P −1 − A (z) P −1 A (z) ∗
∗
− B (z) B (z) + P −1 C (z) C (z) P −1
dz z
(1.16)
The equilibrium points P∞ of this gradient flow are characterized by P˙ = 0, and consequently >
1 ∗ ∗ dz −1 A (z) + B (z) B (z) A (z) P∞ 2πi z > dz −1
∗ ∗ −1 1 P (1.17) = P∞ A (z) P∞ A (z) + C (z) C (z) 2πi z ∞ T∞ Any co-ordinate transformation T∞ ∈ GL (n) such that P∞ = T∞ 2 satisfies (1.17) is called an L -sensitivity optimal co-ordinate transformation. Moreover, any realization (Amin , bmin , cmin ) which minimizes the L2 sensitivity index S : RH → R is called L2 -sensitivity optimal. Having established the existence of the L2 -sensitivity optimal transformations, and
276
Chapter 9. Sensitivity Optimization
realizations, respectively, we now turn to develop a gradient flow approach to achieve the L2 -sensitivity optimization. The following results are straightforward modifications of analogous results developed in Chapters 7 and 8 for the case of Euclidean norm balanced realizations. The only significant modification is the introduction of contour integrations. We need the following technical lemma. Lemma 1.4 Let (A, b, c) be a controllable or observable asymptotically stable realization. Let (A (z) , B (z) , C (z)) be defined by (1.9) and (1.10). Then the linear operator >
1 ∗ ∗ I ⊗ A (z) A (z) + B (z) B (z) 2πi
+ A (z)∗ A (z) + C (z)∗ C (z) ⊗ I ∗ ∗ dz − A (z) ⊗ A (z) − A (z) ⊗ A (z) z has all its eigenvalues in C+ = {λ ∈ C | Re (λ) > 0}. Proof 1.5 Suppose there exists λ ∈ C and a non zero matrix X ∈ Cn×n such that (writing A instead of A (z), etc.) > 1 dz = λX (XAA∗ + A∗ AX − A∗ XA − AXA∗ + XBB ∗ + C ∗ CX) 2πi z ¯ denote the Hermitian transpose. Then Let X ∗ = X >
1 tr XAA∗ X ∗ + AXX ∗ A∗ − AXA∗ X ∗ 2πi dz 2 = Re (λ) X − A∗ XAX ∗ + XBB ∗X ∗ + CXX ∗C ∗ z A straightforward manipulation shows that left hand side is equal to > dz 1 A (z) X − XA (z)2 + XB (z)2 + C (z) X2 ≥0 2πi z Thus Re (λ) ≥ 0. The integral on the left is zero if and only if A (z) X = XA (z) ,
XB (z) = 0,
C (z) X = 0
for all |z| = 1. Now suppose that Re (λ) = 0. Then any X satisfying these equations satisfies XB (z) = X (zI − A)−1 b = 0
(1.18)
for all |z| = 1 and hence, by the identity theorem for analytic functions, for all z ∈ C. Thus controllability of (A, b) implies X = 0 and similarly for (A, c) observable. Thus Re (λ) = 0 and the proof is complete.
9.1. A Sensitivity Minimizing Gradient Flow
277
Lemma 1.6 Given any initial controllable and observable asymptotically stable realization (A, b, c), then the linearization of the gradient flow (1.16) at any equilibrium point P∞ > 0 is exponentially stable. Proof 1.7 The linearization of (1.16) at P∞ is given by > −1 −1 −1 −1 ∗ ∗ −1 ⊗ AP − P ⊗ P [A P A + C C] P AP 1 dz ∞ ∞ ∞ ∞ ∞ ∞ J¯ = −1 −1 −1 −1 ∗ −1 ∗ 2πi z −P∞ (A∗ P∞ A + C ∗ C) P∞ ⊗ P∞ + P∞ A ⊗ P∞ A 1/2
1/2
1/2
−1/2
Let F := P∞ AP∞ , G = P∞ B, H = CP∞ . Then with 1/2 1/2 1/2 1/2 J = P∞ J¯ P∞ ⊗ P∞ ⊗ P∞ we have 1 J = 2πi
>
F ⊗ F + F∗ ⊗ F∗ − I ⊗ (F ∗ F + H∗ H) − (F ∗ F + H∗ H) ⊗ I
dz z
Using (1.17) gives >
1 J= F ⊗ F + F∗ ⊗ F∗ 2πi dz − I ⊗ (F F ∗ + GG ∗ ) − (F ∗ F + H∗ H) ⊗ I z By Lemma 1.6, J has only negative eigenvalues. Therefore, by Sylvester’s inertia theorem, see Appendix A, J¯ has only negative eigenvalues. The result follows. We can now state and prove one of the main results of this section. Theorem 1.8 Consider any minimal asymptotically stable realization (A, b, c) of the transfer function H (z). Let (A (z) , B (z) , C (z)) be defined by (1.9) and (1.10). > 0 which minimizes S¯ (P ) and (a) There exists a unique P∞ = P∞ 1/2 2 T∞ = P∞ is an L -sensitivity optimal transformation. Also, P∞ is the uniquely determined critical point of the sensitivity function S¯ : P (n) → R, characterized by (1.17).
(b) The gradient flow P˙ (t) = −∇S¯ (P (t)) is given by (1.16) and for every initial condition P0 = P0 > 0, the solution P (t) exists for all t ≥ 0 and converges exponentially fast to P∞ .
278
Chapter 9. Sensitivity Optimization
(c) A realization (A, b, c) ∈ RH is a critical point for the total L2 sensitivity function S : RH → R if and only if (A, b, c) is a global minimum for S : RH → minima of S : RH → R
R. The set of global is a single O (n)-orbit SAS −1 , Sb, cS −1 | S S = SS = I . Proof 1.9 We give two proofs. The first proof runs along similar lines to the proof of Proposition 8.5.5. The existence of P∞ is shown in Corollary 1.3. By Lemma 1.1 the function S¯ : P (n) → R has compact sublevel sets and therefore the solutions of the gradient flow (1.16) exist for all t ≥ 0 and converges to a critical point. By Lemma 1.6 the linearization of the gradient flow at each critical point is exponentially stable, with all its eigenvalues on the negative real axis. As the Hessian of S¯ at each critical point is congruent to Jˆ, it is positive definite. Thus S¯ has only isolated minima as critical points and S¯ is a Morse function. The set P (n) of symmetric positive definite matrices is connected and homeomorphic to Euclidean space
n+1 . Thus by the R( 2 ) . From the above, all critical points have index n+1 2 Morse inequalities, as in the proof of Proposition 8.5.5, (see also Milnor (1963)).
c0 S¯ = b0 (P (n)) = 1 and therefore there is only one critical point, which is a local and indeed global minimum. This completes the proof of (a), (b). Part (c) follows since every L2 -sensitivity optimal realization is of the form 1/2 −1/2 −1 1/2 −1/2 −1 ST∞ AT∞ S , ST∞ b, CT∞ S for a unique orthogonal matrix S ∈ O (n). The result follows. For the second proof we apply the Azad-Loeb theorem. It is easy to see, using elementary properties for plurisubharmonic functions that the ? 2 1 A (z) dz function (A, b, c) → 2πi z defines a plurisubharmonic function on the complex similarity orbit
OC (A, b, c) = T AT −1, T b, cT −1 | T ∈ GL (n, C) . Similarly, as in Chapter 8, the sum of the controllability and observability Gramians defines a strictly plurisubharmonic function (A, b, c) → tr Wc + tr Wo on OC (A, b, c). As the sum of a plurisubharmonic and a strictly plush function is strictly plush, we see by formula (1.13) that the sensitivity function S (A, b, c) is a strictly plush function on OC (A, b, c). As in Chapter 7, the real case follows from the complex case. Thus Part (c) of Theorem 1.8 follows immediately from the Azad-Loeb theorem.
9.1. A Sensitivity Minimizing Gradient Flow
279
Remark 1.10 By Theorem 1.8(c) any two optimal realizations which minimize the L2 -sensitivity function (1.2) are related by an orthogonal similarity transformation. One may thus use this freedom to transform any optimal realization into a canonical form for orthogonal similarity transformations, such as e.g. into the Hessenberg form or the Schur form. In particular the above theorem implies that there exists a unique sensitivity optimal realization (AH , bH , cH ) which is in Hessenberg form
AH
∗
... ⊗ . . . = .. . 0
... ..
.
⊗
∗ .. . , .. . ∗
bH
⊗ 0 = .. , .
cH = [∗ . . . ∗]
0
where the entries denoted by ⊗ are positive. These condensed forms are useful since they reduce the complexity of the realization. From an examination of the L2 sensitivity optimal condition (1.17), it makes sense to introduce the two modified Gramian matrices, termed L2 sensitivity Gramians of H (z) as follows > dz
˜c 1 W A (z) A (z)∗ + B (z) B (z)∗ 2πi z >
1 dz ∗ ∗ ˜o W A (z) A (z) + C (z) C (z) 2πi z
(1.19)
These Gramians are clearly both a generalization of the standard Gramians Wc , Wo of (1.12) and of the Euclidean norm Gramians appearing in (8.5.6). Now the above theorem implies the corollary. Corollary 1.11 With the L2 -sensitivity Gramian definitions (1.19), the necessary and sufficient condition for a realization (A, b, c) to be L2 -sensitivity optimal is the L2 -sensitivity balancing property ˜c = W ˜ o. W
(1.20)
Any controllable and observable realization of H (z) with the above property is said to be L2 -sensitivity balanced. ˜ o , when W ˜o = W ˜ c , as the Let us also denote the singular values of W 2 Hankel singular values of an L -sensitivity optimal realization of H (z). Moreover, the following properties regarding the L2 -sensitivity Gramians apply:
280
Chapter 9. Sensitivity Optimization
(a) If the two L2 -sensitivity optimal realizations (A1 , b1 , c1 , d1 ) and (A2 , b2 , c2 , d2 ) are related by a similarity transformation T , i.e. A2 c2
b2 T = d2 0
0 I
A1 c1
b1 d1
T 0
0 I
−1
then P = T T = In and T is orthogonal. Moreover, in obvious notation ˜ c(1) T ˜ c(2) = T W W so that the L2 -sensitivity Hankel singular values are invariant of orthogonal changes of co-ordinates. (b) There exists an L2 -sensitivity optimal realization such that its L2 sensitivity Gramians are diagonal with diagonal elements in descending order. ˜ o are invariant under orthogonal state space ˜ cW (c) The eigenvalues of W transformations.
˜ c, W ˜ o are the L2 -sensitivity Gramians of (A, b, c), then so are (d) If W
˜ o, W ˜ c of (A , c , b ). W
Contour integrations by Lyapunov equations The next result is motivated by the desire to avoid contour integrations in calculating the L2 -sensitivity gradient flows, and L2 -sensitivity Gramians. Note the emergence of coupled Riccati-like equations with Lyapunov equations. Proposition 1.12 Given a minimal realization (A, b, c) of H (z). Let Q11 Q12 R11 R12 and Q = R= R21 R22 Q21 Q22 be the solutions to the following two Lyapunov equations, respectively, 0 A R11 R12 bb 0 A bc R11 R12 − =− 0 I 0 A R21 R22 c b A R21 R22 (1.21) Q11 Q12 A c b Q11 Q12 A 0 c c 0 − =− 0 I 0 A Q21 Q22 bc A Q21 Q22
9.1. A Sensitivity Minimizing Gradient Flow
281
˜ o of (A, b, c) equals ˜ c, W Then the L2 -sensitivity Gramian pair W (R11 , Q11 ). Moreover, the differential equation (1.16) can be written in the equivalent form P˙ = P −1 Q11 (P ) P −1 − R11 (P ) where A A bc R11 (P ) R12 (P ) 0 A R21 (P ) R22 (P ) c b
A 0
c b A
(1.22)
0 R11 (P ) R12 (P ) − A R21 (P ) R22 (P ) bb 0 =− 0 P −1
(1.23)
Q11 (P ) Q12 (P ) Q11 (P ) Q12 (P ) A 0 − Q21 (P ) Q22 (P ) bc A Q21 (P ) Q22 (P ) c c 0 =− (1.24) 0 P
Proof 1.13 It is routine to compute the augmented state equations and thereby the transfer function of the concatenation of two linear systems to yield −1 (A (z) , B (z)) = B (z) (C (z) , I) = Ca (zI − Aa ) Ba where
A bc Aa = , 0 A
0 Ba = I
b , 0
Ca = [I
0] .
Thus we have > ∗ dz ˜c = 1 W (A (z) , B (z)) (A (z) , B (z)) 2πi z , > 1 −1 ∗ dz −1 (zI − Aa ) Ba Ba (¯ z I − Aa ) =Ca Ca 2πi z =Ca RCa =R11 ˜ o = Q11 . As a consequence, (1.22)–(1.24) Similarly, it can be proved that W follow.
282
Chapter 9. Sensitivity Optimization
Remark 1.14 Given P > 0, R11 (P ) and Q11 (P ) can be computed either indirectly by solving the Lyapunov equations (2.6)–(2.7) or directly by using an iterative algorithm which needs n iterations and can give an exact value, where n is the order of H (z). Actually, a simpler recursion for L2 sensitivity minimization based on these equations is studied in Yan, Moore and Helmke (1993) and summarized in Section 9.3. Problem 1.15 Let H (z) be a scalar transfer function. Show, using the symmetry of H (z) = H (z) , that if (Amin , bmin, cmin ) ∈ RH is an L2 sensitivity optimal realization then also (Amin , bmin , cmin ) is. Problem 1.16 Show that there exists a unique symmetric matrix Θ ∈ O (n) such that (Amin , bmin, cmin ) = (ΘAmin Θ , Θbmin , cmin Θ ).
Main Points of Section The theory of norm-balancing via gradient flows completely solves the L2 sensitivity minimization for linear time-invariant systems. Since the optimal co-ordinate basis selections are optimal to within a class of orthogonal matrices, there is the possibility for ‘super’ optimal designs based on other considerations, such as sparseness of matrices and scaling/overflow for state variables. Furthermore, the results open up the possibility for tackling other sensitivity measures such as those with frequency shaping and including other constraints in the optimization such as scaling and sparseness constraints. Certain of these issues are addressed in later sections.
9.2 Related L2 -Sensitivity Minimization Flows In this section, we first explore isodynamical flows to find optimal L2 sensitivity realizations. Differential equations are constructed for the matrices A, b, c, so that these converge to the optimal ones. Next, the results of the previous section are generalized to cope with frequency shaped indices, so that emphasis can be placed in certain frequency bands to the sensitivity of H (z) to its realizations. Then, the matrix transfer function case is considered, and the case of sensitivity to the first N Markov parameters of H (z).
Isodynamical Flows Following the approach to find balanced realizations via differential equations as developed in Chapter 7, we construct certain ordinary differential
9.2. Related L2 -Sensitivity Minimization Flows
283
equations A˙ =f (A, b, c) b˙ =g (A, b, c) c˙ =h (A, b, c) which evolve on the space RH of all realizations of the transfer function H (z), with the property that the solutions (A (t) , b (t) , c (t)) all converge for t → ∞ to a sensitivity optimal realization of H (z). This approach avoids computing any sensitivity optimizing coordinate transformation matrices T . We have already considered in Chapter 8 the task of minimizing certain cost functions defined on RH via gradient flows. Here we consider the related task of minimizing the L2 -sensitivity index via gradient flows. Following the approach developed in Chapter 8 we endow RH with its normal Riemannian metric. Thus for tangent vectors ([X1 , A] , X1 b, −cX1 ), ([X2 , A] , X2 b, −cX2 ) ∈ T(A,b,c) RH we work with the inner product ([X1 , A] , X1 b, −cX1 ) , ([X2 , A] , X2 b, −cX2 ) := 2 tr (X1 X2 ) , (2.1) which defines the normal Riemannian metric, see (8.4.10). Consider the L2 -sensitivity function 1 S (A, b, c) = 2πi
>
S : RH → R
∗ dz tr A (z) A (z) + tr (Wc ) + tr (Wo ) , z
(2.2)
This has critical points which correspond to L2 -sensitivity optimal realizations. Theorem 2.1 Let (A, b, c) ∈ RH be a stable controllable and observable realization of the transfer function H (z) and let Λ (A, b, c) be defined by > dz
1 ∗ ∗ + (Wc − Wo ) Λ (A, b, c) = A (z) A (z) − A (z) A (z) 2πi z (2.3) 2 (a) Then S : RH → R on
the gradient flow of the L -sensitivity function ˙ ˙ RH A = − gradA S, b = − gradb S, c˙ = − gradc S with respect to the normal Riemannian metric is
A˙ =AΛ (A, b, c) − Λ (A, b, c) A b˙ = − Λ (A, b, c) b c˙ =cΛ (A, b, c)
(2.4)
284
Chapter 9. Sensitivity Optimization
Now for all initial conditions (A0 , b0 , c0 ) ∈ RH the solution (A (t) , b (t) , c (t)) of (2.4) exists for all t ≥ 0 and converges for t → ∞ ¯ ¯b, c¯ of H (z). to an L2 -sensitivity optimal realization A, (b) Convergence to the class of sensitivity optimal realizations is exponentially fast. (c) The transfer function of any solution (A (t) , b (t) , c (t)) is independent of t, i.e. the flow is isodynamical. Proof 2.2 By Proposition 8.4.3 and Lemma 8.4.5, the gradient vector field S := (gradA S, gradb S, gradc S) is of the form gradA S = ΛA − AΛ, gradb S = Λb, gradc S = −cΛ, for a uniquely determined n × n matrix Λ. Recall that a general element of the tangent space T(A,b,c) RH is ([X, A] , Xb, −cX) for X ∈ Rn×n , so that the gradient for the normal Riemannian metric satisfies DS|(A,b,c) ([X, A] , Xb, −cX) = ([Λ, A] , Λb, −cΛ) , ([X, A] , Xb, −cX) =2 tr (Λ X) at (A, b, c) ∈ RH .
Let S¯ : GL (n) → R be defined by S¯ (T ) = S T AT −1 , T b, cT −1 . A straightforward computation using the chain rule shows that the total derivative of the function S (A, b, c) is given by DS|(A,b,c) ([X, A] , Xb, −cX) = DS¯In (X) >
1 ∗ ∗ = tr X A (z) A (z) − A (z) A (z) πi dz ∗ ∗ + B (z) B (z) − C (z) C (z) z >
dz 1 ∗ ∗ A (z) A (z) − A (z) A (z) = tr X πi z + 2 tr (X (Wc − Wo )) for all X ∈ Rn×n . This shows that (2.4) is the gradient flow A˙ = − gradA S, b˙ = − gradb S, c˙ = − gradc S of S. The other statements in Theorem 2.1 follow easily from Theorem 8.4.9.
9.2. Related L2 -Sensitivity Minimization Flows
285
Remark 2.3 Similar results to the previous ones hold for parameter dependent sensitivity function such as ∂H 2 ∂H 2 ∂H 2 + + Sε (A, b, c) = ε ∂A 2 ∂b 2 ∂c 2 for ε ≥ 0. Note that for ε = 0 the function S0 (A, b, c) is equal to the sum of the traces of the controllability and observability Gramians. The optimization of the function S0 : RH → R is well understood, and in this case the class of sensitivity optimal realizations consists of the balanced realizations, as studied in Chapters 7 and 8. This opens perhaps the possibility of using continuation type of methods in order to compute L2 -sensitivity optimal realization for S : RH → R, where ε = 1.
Frequency Shaped Sensitivity Minimization In practical applications of sensitivity minimization for filters, it makes sense to introduce frequency domain weightings. It may for example be that insensitivity in the pass band is more important that in the stop band. It is perhaps worth mentioning that one case where frequency shaping is crucial −1 is to the case of cascade filters H = H1 H2 where Hi (z) = ci (zI − Ai ) bi . ∂H1 ∂H2 ∂H ∂H Then ∂A1 = ∂A1 H2 and ∂A2 = H1 ∂A2 and H2 , H1 can be viewed as the frequency shaping filters. It could be that each filter Hi is implemented with a different degree of precision. Other frequency shaping filter selections to minimize pole and zero sensitivities are discussed in Thiele (1986). Let us consider a generalization of the index S (A, b, c) of (1.2) as ∂H 2 ∂H 2 ∂H 2 + wb + wc Sw (A, b, c) = wA ∂A 2 ∂b 2 ∂c 2
(2.5)
where weighting wA etc denote the scalar transfer functions wA (z) etc. Thus ∂H (z) =wc (z) B (z) =: Bw (z) ∂c ∂H (z) =wb (z) C (z) =: Cw (z) wb (z) ∂b ∂H wA (z) (z) =wA (z) C (z) B (z) =: Aw (z) ∂A wc (z)
(2.6)
The important point to note is that the derivations in this more general frequency shaped case are now a simple generalization of earlier ones. Thus
286
Chapter 9. Sensitivity Optimization
the weighted index Sw expressed as a function of positive definite matrices P = T T is now given from a generalization of (1.15) as >
1 ∗ S¯w (P ) = tr Aw (z) P −1 Aw (z) P 2πi dz (2.7) ∗ ∗ . + Bw (z) Bw (z) P + Cw (z) Cw (z) P −1 z The gradient flow equations are corresponding generalizations of (1.16): >
−1 1 P Aw (z)∗ P Aw (z) P −1 − Aw (z) P −1 Aw (z)∗ P˙ = 2πi (2.8) dz ∗ ∗ . − Bw (z) Bw (z) + P −1 Cw (z) Cw (z) P −1 z With arbitrary initial conditions satisfying P0 = P0 > 0, the unique limit > 0, characterized by ing equilibrium point is P∞ = P∞ >
1 ∗ ∗ dz −1 Aw (z) + Bw (z) Bw (z) Aw (z) P∞ 2πi z > dz −1
1 ∗ ∗ −1 P (2.9) = P∞ Aw (z) P∞ Aw (z) + Cw (z) Cw (z) 2πi z ∞ To complete the full analysis, it is necessary to introduce the genericity conditions on wb , wc : No zeros of wb (z) , wc (z)
are poles of H (z) .
(2.10)
Under (2.10), and minimality of (A, b, c), the weighted Gramians are positive definite: > 1 ∗ dz >0 Wwc = Bw (z) Bw (z) 2πi z > (2.11) 1 dz ∗ >0 Wwo = Cw (z) Cw (z) 2πi z Moreover, in generalizing (1.18) and its dual, likewise, then −1
XBw (z) =X (zI − A)
bwb (z) = 0
Cw (z) X =wc (z) c (zI − A)−1 X = 0 for all z ∈ C implies X = 0 as required. The frequency shaped generalizations are summarized as a theorem. Theorem 2.4 Consider any controllable and observable asymptotically stable realization (A, b, c) of the transfer function H (z). Consider also the weighted index (2.5), (2.7) under definition (2.6).
9.2. Related L2 -Sensitivity Minimization Flows
287
(a) There exists a unique P∞ = P∞ > 0 which minimizes S¯w (P ), P ∈ 1/2 P (n), and T∞ = P∞ is an L2 -frequency shaped sensitivity optimal transformation. Also, P∞ is the uniquely determined critical point of S¯w : P (n) → R.
(b) The gradient flow P˙ (t) = −∇S¯w (P (t)) on P (n) is given by (2.8). Moreover, its solution P (t) of (2.8) exists for all t ≥ 0 and converges exponentially fast to P∞ satisfying (2.9).
Sensitivity of Markov Parameters It may be that a filter HN (z) has a finite Markov expansion HN (z) =
N
cAk bz −k−1 .
k=0
Then definitions (1.9) specialize as BN (z) =
N
Ak bz −k−1 ,
CN (z) =
k=0
N
cAk z −k−1 ,
k=0
and the associated Gramians are finite sums Wc(N ) =
N
Ak bb (A ) ,
k=0
k
Wo(N ) =
N
(A ) c cAk k
k=0
In this case the sensitivity function for HN is with AN (z) = BN (z) CN (z), ∂HN 2 ∂HN 2 ∂HN 2 + + SN (A, b, c) = ∂A ∂b ∂c 2 2 2 >
1 ∗ dz + tr Wc(N ) + tr Wo(N ) = tr AN (z) AN (z) 2πi z
r l = b (A ) Ak bcAs (A ) c + tr WcN + tr WoN 0≤k,l,r,s≤N r+l=k+s
This is just the L2 -sensitivity function for the first N + 1 Markov parameters. Of course, SN is defined for an arbitrary, even unstable, realization. If (A, b, c) were stable, then the functions SN : RH → R converge uniformly on compact subsets of RH to the L2 -sensitivity function (1.2) as N → ∞. Thus one may use the function SN in order to approximate the sensitivity function (1.2). If N is greater or equal to the McMillan degree of (A, b, c), then earlier results all remain$ in force if S is replaced by SN and likewise −1 N (zI − A) by the finite sum k=0 Ak z −k−1 .
288
Chapter 9. Sensitivity Optimization
The Matrix Transfer Function Case Once the multivariable sensitivity norm is formulated as the sum of scalar variable sensitivity norms as in (1.3), (1.4), the generalization of the scalar variable results to the multivariable case is straightforward. Thus let H (z) = (Hij (z)) = C (zI − A)−1 B be a p × m strictly proper transfer function with a controllable and observable realization (A, B, C) ∈ Rn×n × Rn×m × Rp×n , so that Hij (z) = ci (zI − A)−1 bj for i = 1, . . . , p, j = 1, . . . , m. The set of all controllable and observable realizations of H (z) then is the similarity orbit
RH = T AT −1 , T B, CT −1 | T ∈ GL (n) . By Lemma 7.4.1, RH is a closed submanifold of Rn×n × Rn×m × Rp×n . Here, again, RH is endowed with the normal Riemannian metric defined by (2.1). The multivariable sensitivity norm S (A, B, C) is then seen to be the sum of scalar variable sensitivities as p m S (A, B, C) = S (A, bj , ci ) . i=1 j=1
Similarly S : P (n) → R, S (P ) = S T AT −1, T B, CT −1 , is S¯ (P ) =
p m
S¯ij (P )
i=1 j=1
1 S¯ij (P ) = tr 2πi
>
∗ Aij (z) P −1 Aij (z) P
(2.12) dz ∗ ∗ + Bij (z) Bij (z) P + Cij (z) Cij (z) P −1 z
with, following (1.9), (1.10) ∂H ij (z) −1 Cij (z) = = ci (zI − A) ∂b ∂H ij (z) Bij (z) = = (zI − A)−1 bj ∂c ∂H ij (z) Aij (z) = = Bij (z) Cij (z) ∂A
(2.13)
The gradient flow (1.5) is generalized as P˙ = −∇S¯ (P ) = −
p m i=1 j=1
∇S¯ij (P ) .
(2.14)
9.2. Related L2 -Sensitivity Minimization Flows
289
Therefore 1 tr S¯ (P ) = 2πi
>
p m
∗
Aij (z) P −1 Aij (z) P
i=1 j=1
dz z
+ p tr (Wc P ) + m tr Wo P −1 . The equilibrium point P∞ of the gradient flow is characterized by ∇S¯ (P∞ ) =
p m
∇S¯ij (P∞ ) = 0
(2.15)
i=1 j=1
which, following (1.17), is now equivalent to 1 2πi
> p m
∗ ∗ dz −1 Aij (z) + Bij (z) Bij (z) Aij (z) P∞ z i=1 j=1 > p m
dz −1 −1 1 P Aij (z)∗ P∞ Aij (z) + Cij (z)∗ Cij (z) = P∞ 2πi z ∞ i=1 j=1 (2.16)
The introduction of the summation terms is clearly the only effect of generalization to the multivariable case, and changes none of the arguments in the proofs. A point at which care must be taken is in generalizing (1.18). Now XBij (z) = 0 for all i, j, and thus X (zI − A)−1 B = 0, requiring (A, B) controllable to imply X = 0. Likewise, the condition (A, c) observable generalizes as (A, C) observable. We remark that certain notational elegance can be achieved using vec and Kronecker product notation in the multivariable case, but this can also obscure the essential simplicity of the generalization as in the above discussion. We summarize the first main multivariable results as a theorem, whose proof follows Theorem 1.8 as indicated above. Theorem 2.5 Consider any controllable and observable asymptotically −1 stable realization (A, B, C) of the transfer function H (z) = C (zI − A) B, −1 H (z) = (Hij (z)) with Hij (z) = ci (zI − A) bj . Then: (a) There exists a unique P∞ = P∞ > 0 which the sensi$m $pminimizes tivity index (1.3), also written as S¯ (P ) = i=1 j=1 S¯ij (P ), and 1/2 T∞ = P∞ is an L2 -sensitivity optimal transformation. Also, P∞
290
Chapter 9. Sensitivity Optimization
is the uniquely determined critical point of the sensitivity function S¯ : P (n) → R. (b) The gradient flow P˙ = −∇S¯ (P ) of S¯ : P (n) → R is given by (2.14)– (2.13). The solution exists for arbitrary P0 = P0 > 0 and for all t ≥ 0, and converges exponentially fast to P∞ satisfying (2.15)–(2.16). (c) A realization (A, B, C) ∈ RH is a critical point for the total L2 sensitivity function S : RH → R if and only if (A, B, C) is a global minimum for S : RH →
R. The set of global minima of S : RH → R is a single O (n) orbit SAS −1 , SB, CS −1 | S S = SS = I . The second main multivariable result generalizes Theorem 2.1 as follows. Theorem 2.6 Let (A, B, C) ∈ RH be a stable controllable and observable realization of the transfer function H (z), and let Λ = Λ (A, B, C) be defined by > p m
dz 1 ∗ ∗ Aij (z) Aij (z) − Aij (z) Aij (z) Λ (A, B, C) = 2πi z i=1 j=1 + (pWc − mWo ) Thenthe gradient flow of the L2 -sensitivity function S : RH → R ˙ ˙ ˙ on RH A = −∇A S, B = −∇B S, C = −∇C S with respect to the induced Riemannian metric is A˙ =AΛ (A, B, C) − Λ (A, B, C) A B˙ = − Λ (A, B, C) B C˙ =CΛ (A, B, C) . For all initial conditions (A0 , B0 , C0 ) ∈ RH the solution (A (t) , B (t) , C (t)) exists for all t ≥ 0 and converges for t → ∞ exponentially fast to an L2 ¯ ¯ sensitivity optimal realization A, B, C¯ of H (z), with the transfer function of any solution (A (t) , B (t) , C (t)) independent of t. Proof 2.7 Follows that of Theorem 2.1 with the multivariable generalizations.
9.3. Recursive L2 -Sensitivity Balancing
291
Main Points of Section Differential equations on the system matrices A, b, c which converge exponentially fast to optimal L2 -sensitivity realizations are available. Also generalizations of the gradient flows to cope with frequency shaping and the matrix transfer function case are seen to be straightforward, as is the case of L2 -sensitivity minimization for just the first N Markov parameters of the transfer function.
9.3 Recursive L2 -Sensitivity Balancing To achieve L2 -sensitivity optimal realizations on a digital computer, it makes sense to seek for recursive algorithms which converge to the same solution set as that for the differential equations of the previous sections. In this section, we present such a recursive scheme based on our work in Yan, Moore and Helmke (1993). This work in turn can be viewed as a generalization of the work of Hitz and Anderson (1972) on Riccati equations. Speed improvements of two orders of magnitude of the recursive algorithm over Runge-Kutta implementations of the differential equations are typical. Full proofs of the results are omitted here since they differ in spirit from the other analysis in the book. The results are included because of their practical significance. For the L2 -sensitivity minimization problem, we have the following result. We concentrate on the task of finding the unique minimum P∞ ∈ P (n) of the L2 -sensitivity cost S¯ : P (n) → R. The set of all sensitivity optimal T∞ = P∞ . coordinate transformations T∞ is then given as T∞ Theorem 3.1 Given an initial stable, controllable and observable realiza−1 tion (A, b, c) of H (z) = c (zI − A) b. The solution of the difference equation " # ˜ o (Pk ) /α Pk+1 =Pk − 2 Pk + W #−1 " " # ˜ o (Pk ) /α + αW ˜ c (Pk )−1 ˜ o (Pk ) /α Pk + W × 2Pk + W ˜ o (Pk ) /α + 2W (3.1) converges exponentially to P∞ ∈ P (n) from any initial condition P0 ∈ P (n) and P∞ minimizes the L2 -sensitivity function S¯ : P (n) → R. Here
292
Chapter 9. Sensitivity Optimization
α is any positive constant. > ˜ c (P ) 1 W 2πi > 1 ˜ Wo (P ) 2πi
˜ c (P ) and W ˜ o (P ) are defined by W
∗ ∗ dz A (z) P −1 A (z) + B (z) B (z) z dz
∗ ∗ . A (z) P A (z) + C (z) C (z) z
(3.2) (3.3)
Proof 3.2 See Yan, Moore and Helmke (1993) for full details. Suffice it to say here, the proof technique works with an upper bound PkU for Pk which is monotonically decreasing, and a lower bound PkL for Pk which is monotonically increasing. In fact, PkU is a solution of (3.1) with initial condition P0U suitably “large” satisfying P0U > P0 > 0. Also, PkL is a solution of (3.1) with initial condition P0L suitably “small” satisfying 0 < P0U < P0 . Since it can be shown that lim PkU = PkL , then lim Pk = P∞ k→∞
k→∞
is claimed. ˜ c (P ), W ˜ o (P ) Remark 3.3 Of course, the calculation of the Gramians W on (3.2) (3.3) can be achieved by the contour integrations as indicated, or by solving Lyapunov equations. The latter approach leads then to the more convenient recursive implementation of the next theorem. ˜ o (P ) are replaced by constant matri˜ c (P ) and W Remark 3.4 When W ces, then the algorithm specializes to the Riccati equation studied by Hitz and Anderson (1972). This earlier result stimulated us to conjecture the above theorem. The proof of the theorem is not in any sense a straightforward consequence of the Riccati theory as developed in Hitz and Anderson (1972). Theorem 3.5 With the same hypotheses as in Theorem V (Q) be the solutions of the Lyapunov equations A 0 c c A c b + U (P ) U (P ) = 0 bc A 0 A A bc A 0 bb V (Q) = V (Q) + 0 A c b A 0
3.1, let U (P ) and 0 P
0 Q
(3.4)
(3.5)
Then for any α > 0 and any initial condition (P0 , U0 , V0 ) ∈ P (n)×P (2n)× P (2n), the solution (Pk , Uk , Vk ) of the system of difference equations
Pk+1 =Pk − 2 Pk + Uk11 /α " −1 #−1
(3.6) Pk + Uk11 /α × 2Pk + Uk11 /α + α Vk11 + 2Uk11 /α
9.3. Recursive L2 -Sensitivity Balancing
Uk+1
Vk+1
11 Uk+1
12 Uk+1
293
21 22 Uk+1 Uk+1 A c b Uk11 Uk12 A 0 c c 0 = + 0 A Uk21 Uk22 bc A 0 Pk 12 V 11 Vk+1 k+1 21 22 Vk+1 Vk+1 0 0 A bc Vk11 Vk12 A bb = + 0 Pk−1 0 A Vk21 Vk22 c b A
(3.7)
(3.8)
−1 converges to P∞ , U (P∞ ) , V P∞ ∈ P (n) × P (2n) × P (2n). the sensitivity function S¯ : P (n) → R and U∞ = Here P∞ minimizes
−1 2 ˜ c (P∞ ), W ˜ o (P∞ ) U (P∞ ), V∞ = V P∞ are the L -sensitivity Gramians W respectively. Proof 3.6 See Yan, Moore and Helmke (1993) for details. Remark 3.7 There does not appear to be any guarantee of exponential convergences of these difference equations, at least with the proof techniques as in Theorem 3.1. Example 3.8 To demonstrate the effectiveness of the proposed algorithms, consider a specific minimal state-space realization (A, b, c) with
0.5 0 1.0 A = 0 −0.25 0 , 0 0 0.1
0 b = 1 ,
c= 1
5
10
2
Recall that there exists a unique positive definite matrix P∞ such that the realization T AT −1 , T b, cT −1 is L2 -sensitivity optimal for any similarity transformation T with T∞ T∞ = P∞ . It turns out that P∞ is exactly given by
P∞
which indeed satisfies (1.17).
0.5 = 0 5.0 0 0.5 0 5.0 0.2
0
(3.9)
Chapter 9. Sensitivity Optimization
5
5
4
4
3
3 P(t)
Pk
294
2
2
1
1
0
0
-1 0
100
200
300
400
500
-1 0
0.5
k
1
1.5
2
t
FIGURE 3.1. The trajectory of Pk of (3.1) with α = 300
FIGURE 3.2. The trajectory of P (t) of (1.16)
We first take the algorithm (3.1) with α = 300 and implement it starting from the identity matrix. The resulting trajectory Pk during the first 500 iterations is shown in Figure 3.1 and is clearly seen to converge very rapidly to P∞ . In contrast, referring to Figure 3.2, using Runge-Kutta integration methods to solve the differential equation (1.16) with the same initial condition, even after 2400 iterations the solution does not appear to be close to P∞ and it is converging very slowly to P∞ . Adding a scalar factor to the differential equation (1.16) does not reduce significantly the computer time required for solving it on a digital computer. Next we examine the effect of α on the convergence rate of the algorithm (3.1). For this purpose, define the deviation between Pk and the true solution P∞ in (3.9) as dα (k) = Pk − P∞ 2 where ·2 denotes the spectral norm of a matrix. Implement (3.1) with α = 0.1, 10, 25, 100, 300, 2000, respectively, and depict the evolution of the associated deviation dα (k) for each α in Figure 3.3. Then one can see that α = 300 is the best choice. In addition, as long as α ≤ 300, the larger α, the faster the convergence of the algorithm. On the other hand, it should be observed that a larger α is not always better than a smaller α and that too small α can make the convergence extremely slow. Finally, let us turn to the algorithm (3.6)–(3.8) with α = 300, where all the initial matrices required for the implementation are set to identity matrices of appropriate dimension. Define d (k) = Pk − P∞ 2
9.4. L2 -Sensitivity Model Reduction 5
4
a+0.1
4
3
a+10
3
2
a+100
d(k)
d(k) α
5
2
a+25
1 0 0
295
1
a+2000
a+300 100
200
300
400
500
k
FIGURE 3.3. Effect of different α on the convergence rate of (3.1)
0 0
100
200
300
400
500
k
FIGURE 3.4. Convergence of algorithm (3.6)–(3.8) with α = 300
as the deviation between the first component Pk of the solution and P∞ . Figure 3.4 gives its evolution and manifestly exhibits the convergence of the algorithm. Indeed, the algorithm (3.6)–(3.8) is faster compared to (3.1) in terms of the execution time; but with the same number of iterations the former does not produce a solution as satisfactory as the latter.
Main Points of Section Recursive L2 -sensitivity balancing schemes are devised which achieve the same convergence points as the continuous-time gradient flows of the previous sections. The algorithms are considerably faster than those obtained by applying standard numerical software for integrating the gradient flows.
9.4 L2 -Sensitivity Model Reduction We consider an application of L2 -sensitivity minimization to model reduction, first developed in Yan and Moore (1992). Recall that a L2 -sensitivity optimal realization (A, b, c) of H (z) can be found so that Σ1 0 ˜ ˜ Wc = Wo = diag (σ1 , σ2 , . . . , σn ) = (4.1) 0 Σ2 where σ1 ≥ · · · ≥ σn1 > σn1 +1 ≥ · · · ≥ σn and Σi is ni × ni , i = 1, 2. Partition compatibly (A, b, c) as " # b1 A11 A12 , b= , c = c1 c2 (4.2) A= A21 A22 b2
296
Chapter 9. Sensitivity Optimization
Then it is not hard to prove that ∂H 2 ∂H 2 ∂H 2 ∂H 2 + ≥ + ∂b1 ∂A22 ∂c2 + tr (Σ1 − Σ2 ) ∂A11 2 2 2 2 ∂H 2 ∂H 2 ∂H 2 ∂H 2 ∂A11 + ∂c1 ≥ ∂A22 + ∂b2 + tr (Σ1 − Σ2 ) 2 2 2 2
(4.3) (4.4)
which suggests that the system is generally less sensitive with respect to (A22 , b2 , c2 ) than to (A11 , b1 , c1 ). In this way, the nth 1 order model (A11 , b1 , c1 ) may be used as an approximation to the full order model. However, it should be pointed out that in general the realization (A11 , b1 , c1 ) is no longer sensitivity optimal. Let us present a simple example to illustrate the procedure of performing model reduction based on L2 -sensitivity balanced truncation. Example 4.1 Consider the discrete-time state transfer function H (z) = c (zI − A)−1 b with " # 0 1 0 A= , b= , c= 1 5 (4.5) −0.25 −1 1 This is a standard Lyapunov balanced realization with discrete-time controllability and observability Gramians Wc = Wo = diag (10.0415, 0.7082)
(4.6)
The first order model resulting from direct truncation is H1 (z) =
5.2909 . z + 0.6861
The magnitude plot of the reduced order model H1 (z) is shown in Figure 4.1 with point symbol and compared to that of the full order model H (z). An L2 ˜ ˜b, c˜ of H (z) satisfying (4.1) is found sensitivity balanced realization A, to be −0.6564 0.1564 ˜ A= , −0.1564 −0.3436
˜b = −2.3558 , 0.7415
" # c˜ = −2.3558 −0.7415
with the associated L2 -sensitivity Gramians being ˜c = W ˜ o = diag (269.0124, 9.6175) W
9.4. L2 -Sensitivity Model Reduction 18
18
.. ... ... ... .... .. .. .. .. .. .. ... .. .. 14 .. .. .. .. .. .. ... .. .. .. 12 ... .. .. .. . ... . . ... ... . ... . 10 . .. .. . .. . .. .. . .. 8 .. .. .. ... . . ... ... ... . . . .... 6 .... .... . . . ..... ... ...... ..... . . . . ....... . .... .......... ......... . . .............. . . 4 . . . . . . . ... .................................... ......................................
...... .. ... .. ... ... ... ... .. .. .. ... .. .. .. .. ... .. ... ... 12 . . ... ... ... . . . ... ... .. .. .. 10 .. .. . .. . .. .. . .. 8 . ... ... ... . . ... ... . .... . ... .... 6 .... ..... . . . ...... .. ....... ...... . . . . . . ......... ..... ............. 4 ....................................................... .......................... ............
16
2 0
297
1
2
3
4
5
6
16 14
7
FIGURE 4.1. Frequency responses of H (z), H1 (z)
2 0
1
2
3
4
5
6
7
FIGURE 4.2. Frequency responses of H (z), H2 (z)
Thus by directly truncating this realization, there results a first order model H2 (z) =
5.5498 z + 0.6564
(4.7)
which is stable with a dotted magnitude plot as depicted in Figure 4.2. Comparing Figure 4.1 and Figure 4.2, one can see that H2 (z) and H1 (z) have their respective strengths in approximating H (z) whereas their difference is subtle. At this point, let us recall that a reduced order model resulting from truncation of a standard balanced stable realization is also stable. An issue naturally arises as to whether this property still holds in the case of L2 sensitivity balanced realizations. Unfortunately, the answer is negative. To see this, we consider the following counterexample. Example 4.2 H (z) = c (zI − A)−1 b where 0.5895 −0.0644 0.8062 A= , b= , 0.0644 0.9965 0.0000
" # c = 0.8062 0.0000
It is easily checked that the realization (A, b, c) is Lyapunov balanced with A2 = 0.9991 < 1. Solving the relevant differential or difference equation, we find the solution of the L2 −sensitivity minimization problem to be 1.0037 −0.0862 P∞ = −0.0862 1.0037 from which a L2 −sensitivity-optimizing similarity transformation is constructed as 0.0431 −1.0009 . T∞ = 1.0009 −0.0431
298
Chapter 9. Sensitivity Optimization
˜ ˜b, c˜ is As a result, the L2 -sensitivity optimal realization A, " # 1.0028 −0.0822 0.0347 ˜b = A˜ = , , c˜ = −0.0347 0.8070 0.0822 0.5832 0.8070 with the L2 -sensitivity Gramians ˜c = W ˜ o = diag (27.7640, 4.2585) W One observes that the spectral norm of A˜ is less than 1, but that the first order model resulting from the above described truncation procedure is unstable. It is also interesting to note that the total L2 -sensitivities of the original standard balanced realization and of the resulting sensitivity optimal one are 33.86 and 33.62, respectively, which shows little difference.
Main Points of Section Just as standard balancing leads to good model reduction techniques, so L2 -sensitivity balancing can achieve equally good model reduction. In the latter case, the reduced order model is, however, not L2 -sensitivity balanced, in general.
9.5 Sensitivity Minimization with Constraints In the previous section, parameter sensitivity minimization of a linear system realization is achieved using an L2 -sensitivity norm. However, to minimize roundoff noise in digital filters due to the finite word-length implementations, it may be necessary to ensure that each of the states fluctuates over the same range. For such a situation, it is common to introduce an L2 scaling constraint. One such constraint for controllable systems, studied in Roberts and Mullis (1987) is to restrict the controllability Gramian Wc , or rather T Wc T under a co-ordinate basis transformation T , so that its diagonal elements are identical. Here, without loss of generality we take these elements to be unity. The approach taken in this section to cope with such scaling constraints, is to introduce gradient flows evolving on the general linear group of invertible n × n co-ordinate basis transformation matrices, T , but subject to the constraint diag (T Wc T ) = In . Here Diag (X) denotes a diagonal matrix with the same diagonal elements as X, to distinguish from the notation diag (X) = (x11 , . . . , xnn ) . Thus we restrict attention to the constraint set M = {T ∈ GL (n) | diag (T Wc T ) = In }
(5.1)
9.5. Sensitivity Minimization with Constraints
299
The simplest sensitivity function proposed in Roberts and Mullis (1987) for optimization is −1 (5.2) φ : M → R, φ (T ) = tr (T ) Wo T −1 . A natural approach in order to obtain the minimum of φ is to solve the relevant gradient flow of φ. In order to do this we first have to explore the underlying geometry of the constraint set M . Lemma 5.1 M is a smooth manifold of dimension n (n − 1) which is diffeomorphic to the product space Rn(n−1)/2 × O (n). Here O (n) denotes the group of n × n orthogonal matrices. The tangent space TanT (M ) at T ∈ M is
TanT (M ) = ξ ∈ Rn×n | diag (ξWc T ) = 0
(5.3)
Proof 5.2 Let γ : GL (n) → Rn denote the smooth map defined by γ (T ) = diag (T Wc T ) The differential of γ at T0 ∈ GL (n) is the linear map Dγ|T0 : Rn×n → Rn defined by Dγ|T0 (ξ) = 2 diag (ξWc T0 ) Suppose there exists a row vector c ∈ Rn which annihilates the image of Dγ|T0 . Thus 0 = c diag (ξWc T0 ) = tr (D (c) ξWc T0 ) for all ξ ∈ Rn×n , where D (c) = diag (c1 , . . . , cn ). This is equivalent to Wc T0 D (c) = 0 ⇔ D (c) = 0 since Wc and T0 are invertible. Thus γ is a submersion and therefore for all x ∈ Rn γ −1 (x) = {T ∈ GL (n) | diag (T Wc T ) = x} is a smooth submanifold of codimension n. In order to prove that M is diffeomorphic to Rn(n−1)/2 × O (n) we observe first that M is diffeomorphic to {S ∈ GL (n) | diag (SS ) = In } via the map T → S := T Wc1/2 Applying the Gram-Schmidt orthogonalization procedure to S yields S = L·U
300
Chapter 9. Sensitivity Optimization
∗ . L = .. ∗
where
..
.
...
0 ∗
is lower triangular with positive elements on the diagonal and U ∈ O (n) is orthogonal. Hence L satisfies Diag (LL ) = In if and only if the row vectors of L have all norm one. The result follows. Lemma 5.3 The function φ : M → R defined by (5.2) has compact sublevel sets. Proof 5.4 M is a closed subset of GL (n). By Lemma 8.2.1, the set
P = P > 0 | tr (Wc P ) = n, tr Wo P −1 ≤ c is a compact subset of GL (n). Let P = T T . Hence, −1 T ∈ φ−1 ((−∞, c]) ⇔ tr (T ) Wo T −1 ≤ c and diag (T Wc T ) =In
and tr (Wc P ) =n ⇒ tr Wo P −1 ≤ c shows that φ−1 ((−∞, c]) is a closed subset of a compact set and therefore is itself compact. Since the image of every continuous map φ : M → R with compact sublevel sets is closed in R and since φ (T ) ≥ 0 for all T ∈ M , Lemma 5.3 immediately implies Corollary 5.5 A global minimum of φ : M → R exists. In order to define the gradient of φ we have to fix a Riemannian metric on M . In the sequel we will endow M with the Riemannian metric which is induced from the standard Euclidean structure of the ambient space Rn×n , i.e. ξ, η := tr (ξ η) for all ξ, η ∈ TanT (M ) ⊥
Let TanT (M ) denote the orthogonal complement of TanT (M ) with respect to the inner product in Rn×n . Thus η ∈ TanT (M )⊥ ⇔ tr (η ξ) = 0 ⊥
By Lemma 5.1., dim TanT (M ) diag (η1 , . . . , ηn ) let
for all ξ ∈ TanT (M )
(5.4)
= n. Now given any diagonal matrix
η := diag (η1 , . . . , ηn ) T Wc
(5.5)
9.5. Sensitivity Minimization with Constraints
301
Since tr (η ξ) = tr (Wc T diag (η1 , . . . , ηn ) ξ) = tr (diag (η1 , . . . , ηn ) Diag (ξWc T )) =0 for all ξ ∈ TanT (M ). Thus any η of the form (5.5) is orthogonal to the tangent space TanT (M ) and hence is contained in TanT (M )⊥ . Since the ⊥ vector space of such matrices η has the same dimension n as TanT (M ) we obtain. Lemma 5.6 TanT (M )⊥
= η ∈ Rn×n | η = diag (η1 , . . . , ηn ) T Wc for some η1 , . . . , ηn . We are now ready to determine the gradient of φ : M → R. The gradient ∇φ is the uniquely determined vector field on M which satisfies the condition (a)
∇φ (T ) ∈ TanT (M )
(b)
−1 Dφ|T (ξ) = − 2 tr (T ) Wo T −1 ξT −1 = ∇φ (T ) , ξ
= tr ∇φ (T ) ξ
(5.6)
for all ξ ∈ TanT (M ). Hence −1 tr ∇φ (T ) + 2T −1 (T ) Wo T −1 ξ = 0 for all ξ ∈ TanT (M ). Thus by Lemma 5.6 there exists a (uniquely determined) vector η = (η1 , . . . , ηn ) such that
∇φ (T ) + 2T −1 (T )
−1
Wo T −1 = (diag (η) T Wc )
or, equivalently, ∇φ (T ) = −2 (T )
−1
Wo T −1 (T )
−1
+ diag (η) T Wc
Using (5.6) we obtain
−1 −1 diag (η) Diag T Wc2 T = 2 Diag (T ) Wo (T T ) Wc T
(5.7)
302
Chapter 9. Sensitivity Optimization
and therefore with Wc being positive definite
−1 −1 −1 diag (η) = 2 Diag (T ) Wo (T T ) Wc T Diag T Wc2 T (5.8) We therefore obtain the following result arising from Lemma 5.6. Theorem 5.7 The gradient flow T˙ = −∇φ (T ) of the function φ : M → R defined by (5.2) is given by −1 −1 T˙ =2 (T ) Wo (T T )
−1 −1 −1 − 2 Diag (T ) Wo (T T ) Wc T Diag T Wc2 T T Wc
(5.9) For any initial condition T0 ∈ GL (n) the solution T (t) ∈ GL (n) of (5.9) exists for all t ≥ 0 and converges for t → +∞ to a connected component of the set of equilibria points T∞ ∈ GL (n). The following lemma characterizes the equilibria points of (5.9). We use the following terminology. Definition 5.8 A realization (A, B, C) is called essentially balanced if there exists a diagonal matrix ∆ = diag (δ1 , . . . , δn ) of real numbers such that for the following controllability and observability Gramians Wo = ∆Wc .
(5.10)
Note that for ∆ = In this means that the Gramians coincide while for ∆ = diag (δ1 , . . . , δn ) with δi = δj for i = j the (5.10) implies that Wc and Wo are both diagonal. Lemma 5.9 A coordinate transformation T ∈ GL (n) is an equilibrium point of the gradient flow (5.9) if and only if T is an essentially balancing transformation. Proof 5.10 By (5.9), T is an equilibrium point if and only if −1 −1 −1 (T ) Wo T −1 = Diag (T ) Wo T −1 (T ) Wc T −1
T Wc T × Diag T Wc2 T −1
Set WcT := T Wc T , WoT = (T )
(5.11)
Wo T −1 . Thus (5.11) is equivalent to −1
B = Diag (B) (Diag (A))
A
(5.12)
9.5. Sensitivity Minimization with Constraints
303
where −1
B = WoT (T )
Wc T ,
A = T Wc2 T
(5.13) −1
Now B = ∆A for a diagonal matrix ∆ implies (Diag B) (Diag A) and thus (5.12) is equivalent to WoT = ∆WcT
−1 for a diagonal matrix ∆ = Diag WoT Diag WcT
=∆
(5.14)
Remark 5.11 By (5.12) and since T ∈ M we have
∆ = Diag WoT
(5.15)
so that the equilibrium points T∞ are characterized by
WoT = Diag WoT WcT
(5.16) WcT
Remark 5.12 If T is an input balancing transformation, i.e. if = In , WoT diagonal, then T satisfies (5.16) and hence is an equilibrium point of the gradient flow.
Trace Constraint The above analysis indicates that the task of finding the critical points T ∈ M of the constrained sensitivity function (4.6) may be a difficult one. It appears a hard task to single out the sensitivity minimizing coordinate transformation T ∈ M . We now develop an approach which allows us to circumvent such difficulties. Here, let us restrict attention to the enlarged constraint set M ∗ = {T ∈ GL (n) | tr (T Wc T ) = n}, but expressed in terms of P = T T as N = {P ∈ P (n) | tr (Wc P ) = n} .
(5.17)
Here P (n) is the set of positive definite real symmetric n × n matrices. ∗ Of course, the diagonal constraint set M is a proper subset of M . More ∗ −1 over, the extended sensitivity function on M , φ (T ) = tr (T ) Wo T −1 , expressed in terms of P = T T is now
(5.18) Φ : N → R, Φ (P ) = tr Wo P −1 . Before launching into the global analysis of the constrained cost function (5.18), let us first clarify the relation between these optimization problems. For this, we need a lemma
304
Chapter 9. Sensitivity Optimization
Lemma 5.13 Let A ∈ Rn×n be symmetric with tr A = n. Then there exists an orthogonal coordinate transformation Θ ∈ O (n) such that Diag (ΘAΘ ) = In . Proof 5.14 The proof is by induction on n. For n = 1 the lemma is certainly$ true. Thus let n > 1. Let λ1 ≥ · · · ≥ λn denote the eigenvalues of n Λ. Thus i=1 λi = n, which implies λ1 ≥ 1 and λn ≤ 1. Using the CourantFischer minmax Theorem 1.3.1, the mean value theorem from real analysis then implies the existence of a unit vector x ∈ S n−1 with x Ax = 1. Extend x to an orthogonal basis Θ = (x1 , x2 , . . . , xn ) ∈ O (n) of Rn . Then 1 ΘAΘ = A1 with A1 ∈ R(n−1)×(n−1) symmetric and tr A1 = n − 1. Applying the induction hypothesis on A1 thus completes the proof. Effective numerical algorithms for finding orthogonal transformations Θ ∈ O (n) such as described in the above lemma are available. See Hwang (1977) for an algorithm based on Householder transformations. Thus we see that the tasks of minimizing the sensitivity function φ (T ) = −1 tr (T ) Wo T −1 over M and M ∗ respectively are completely equivalent. Certainly the minimization of Φ : N → R over N is also equivalent to minimizing φ : M ∗ → R over M ∗ , using any square root factor T of P = T T . Moreover the minimum value of φ (T ) where T ∈ M coincides with the minimum value of Φ (P ) where P ∈ N . Thus these minimization tasks on M and N are equivalent. Returning to the analysis of the sensitivity function Φ : N → R we begin with two preparatory results. The proofs are completely analogous to the corresponding ones from the first subsection and the details are than left as an easy exercise to the reader. Lemma 5.15 N = {P ∈ P (n) | tr (Wc P ) = n} is a smooth manifold of dimension d = 12 n (n + 1) − 1 which is diffeomorphic to Euclidean space Rd . The tangent space TP N at P ∈ N is
(5.19) TP N = ξ ∈ Rn×n | ξ = ξ, tr (Wc ξ) = 0 . Proof 5.16 Follows the steps of Lemma 5.1. Thus let Γ : P (n) → R denoted the smooth map defined by Γ (P ) = tr (Wc P ) . The derivative of Γ at P ∈ P (n) is the surjective map DΓ|P : S (n) → R defined on the vector space S (n) of symmetric matrices ξ by DΓ|P (ξ) =
9.5. Sensitivity Minimization with Constraints
305
tr (Wc ξ). Thus the first result follows from the fibre theorem from Appendix C. The remainder of the proof is virtually identical to the steps in the proof of Lemma 5.1. Lemma 5.17 The function Φ : N → R defined by (5.18) has compact sublevel sets. Moreover, a global minimum of Φ : N → R exists. Proof 5.18 This is virtually identical to that of Lemma 5.3 and its corollary. In order to determine the gradient of Φ we have to specify a Riemannian metric. The following choice of a Riemannian metric on N is particularly convenient. For tangent vectors ξ, η ∈ TP N define
(5.20) ξ, η := tr P −1 ξP −1 η . It is easily seen that this defines an inner product on TP N and does in fact define a Riemannian metric on N . We refer to this as the intrinsic Riemannian metric on N . Theorem 5.19 Let Φ : N → R be defined by (5.18). (a) The gradient flow P˙ = −∇Φ (P ) of the function Φ with respect to the intrinsic Riemannian metric on N is given by P˙ = Wo − λ (P ) P Wc P,
P (0) = P0
(5.21)
where λ (P ) =
tr (Wc Wo ) 2
tr (Wc P )
(5.22)
(b) There exists a unique equilibrium point P∞ ∈ N . P∞ satisfies the balancing condition Wo = λ∞ P∞ Wc P∞ n 2 $ with λ∞ = n1 σi and σ1 , . . . , σn are the singular values of the i=1
1/2
associated Hankel, i.e. σi = λi of Φ : N → R.
(Wc Wo ). P∞ is the global minimum
(c) For every initial condition P0 ∈ N the solution P (t) of (5.21) exists for all t ≥ 0 and converges to P∞ for t → +∞, with an exponential rate of convergence.
306
Chapter 9. Sensitivity Optimization
Proof 5.20 Let ∇Φ denote the gradient vector field. The derivative of Φ : N → R at an element P ∈ N is the linear map on the tangent space DΦ|P : TP N → R given by
DΦ|P (ξ) = − tr Wo P −1 ξP −1 = − Wo , ξ for all ξ ∈ TP N . It is easily verified that the orthogonal complement (TP N )⊥ of TP N in the vector space of symmetric matrices with respect to the inner product , is ⊥
(TP N ) = {λP Wc P | λ ∈ R} , i.e. it coincides with the scalar multiplies of P Wc P . Thus DΦ|P (ξ) = ∇Φ (P ) , ξ for all ξ ∈ TP N if and only if ∇Φ (P ) = −Wo + λP Wc P. The Lagrange multiplier λ (P ) is readily computed such as to satisfy (−Wo + λP Wc P ) ∈ TP N . Thus tr (Wc (−Wo + λP Wc P )) = 0 or equivalently, λ (P ) =
tr (Wc Wo ) . tr (Wc P Wc P )
This shows (a). To prove (b), consider an arbitrary equilibrium point P∞ of (5.21). Thus Wo = λ∞ P∞ Wc P∞ . for a real number λ∞ = λ (P∞ ) > 0. By (8.2.3) we have 1/2 −1/2 1/2 1/2 W λ1/2 P = W W W Wc−1/2 . ∞ o ∞ c c c Therefore, using the constraint tr (Wc P∞ ) = n we obtain 1/2 1/2 1/2 nλ1/2 = σ1 + · · · + σn ∞ = tr Wc Wo Wc and hence
λ∞ =
2 1 (σ1 + · · · + σn ) . n
9.5. Sensitivity Minimization with Constraints
307
Also P∞ is uniquely determined and hence must be the global minimum of Φ : N → R. This proves (b). A straightforward argument now shows that the linearization of the gradient flow (5.21) at P∞ is exponentially stable. Global existence of solutions follows from the fact that Φ : N → R has compact sublevel sets. This completes the proof of the theorem. As a consequence of the above theorem we note that the constrained minimization problem for the function Φ : N → R is equivalent to the unconstrained minimization problem for the function Ψ : P (n) → R defined by
(5.23) Ψ (P ) = tr Wo P −1 + λ∞ tr (Wc P )
2 where λ∞ = n1 (σ1 + · · · + σn ) . This is of course reminiscent of the usual Lagrange multiplier approach, where λ∞ would be referred to as a Lagrange multiplier. So far we have considered the task of minimizing the observ
optimization ability dependent measure tr Wo P −1 over the constraint set N . A similar analysis can be performed for the minimization task of the constrained L2 sensitivity cost function S¯ : N → R, defined for a matrix-valued transfer −1 function H (z) = C (zI − A) B. Here the constrained sensitivity index is defined as in Section 9.1. Thus in the single input, single output case >
1 ∗ dz ¯ S (P ) = tr A (z) P −1 A (z) P 2πi z
+ tr (Wc P ) + tr Wo P −1 , while for m input, p output systems (see (2.15)) > p m
∗ dz tr Aij (z) P −1 Aij (z) P z i=1 j=1
+ p tr (Wc P ) + m tr Wo P −1
1 S¯ (P ) = 2πi
(5.24)
Here of course the constraint condition on P is tr (Wc P ) = n, so that this term does not influence the sensitivity index. Following the notation of Section 9.3 define the L2 -sensitivity Gramians > p m
∗ ∗ dz Aij (z) P −1 Aij (z) + Bij (z) Bij (z) z i=1 j=1 > p m dz
∗ ∗ ˜ o (P ) := 1 W Aij (z) P Aij (z) + Cij (z) Cij (z) 2πi z i=1 j=1 ˜ c (P ) := 1 W 2πi
308
Chapter 9. Sensitivity Optimization
A straightforward computation, analogous to that in the proof of Theorem 5.19, shows that the gradient flow P˙ = −∇S¯ (P ) with respect to the intrinsic Riemannian metric on N is ˜ o (P ) − P W ˜ c (P ) P − λ (P ) P Wc P P˙ = W where
(5.25)
˜ o (P ) + P W ˜ c (P ) P tr Wc W λ (P ) =
tr (Wc P )2
(5.26)
$∞ k k Here Wc = k=0 A BB (A ) is the standard controllability Gramian ˜ ˜ while Wo (P ) and Wc (P ) denote the L2 -sensitivity Gramians. Arguments virtually identical to those for Theorem 2.5 show Theorem 5.21 Consider any controllable and observable discrete-time asymptotically stable realization (A, B, C) of the matrix transfer function −1 −1 H (z) = C (zI − A) B, H (z) = (Hij (z)) with Hij (z) = ci (zI − A) bj . Then (a) There exist a unique positive definite matrix Pmin ∈ N which minimizes the constrained L2 -sensitivity index S¯ : N → R defined by 1/2 (5.24) and Tmin = Pmin ∈ M ∗ is a constrained L2 sensitivity optimal coordinate transformation. Also, Pmin ∈ N is the uniquely determined critical point of the L2 -sensitivity index S¯ : N → R. (b) The gradient flow P˙ = −∇S¯ (P ) of S : N → R for the intrinsic Riemannian metric on N is given by (5.25), (5.26). Moreover, Pmin is determined from the gradient flow (5.25), (5.26) as the limiting solution as t → +∞, for arbitrary initial conditions Po ∈ N . The solution P (t) exists for all t ≥ 0, and converges exponentially fast to Pmin . Results similar to those developed in Section 9.3 hold for recursive Riccati-like algorithms for constrained L2 -sensitivity optimization, which by pass the necessity to evaluate the gradient flow (5.25) for constrained L2 -sensitivity minimization.
Main Points of Section In minimizing sensitivity functions of linear systems subject to scaling constraints, gradient flow techniques can be applied. Such flows achieve what are termed essentially balanced transformations.
9.5. Sensitivity Minimization with Constraints
309
Notes for Chapter 9 The books for Roberts and Mullis (1987), Middleton and Goodwin (1990) and Williamson (1991) give a background in digital control. An introductory textbook on digital signal processing is that of Oppenheim and Schafer (1989). A recent book on sensitivity optimization in digital control is Gevers and Li (1993). For important results on optimal finite-word-length representations of linear systems and sensitivity optimization in digital signal processing, see Mullis and Roberts (1976), Hwang (1977) and Thiele (1986). The general mixed L2 /L1 sensitivity optimization problem, with frequency weighted sensitivity cost functions, is solved in Li et al. (1992). Gradient flow techniques are used in Helmke and Moore (1993) to derive the existence and uniqueness properties of L2 -sensitivity optimal realizations. In Helmke (1992), corresponding results for Lp -sensitivity optimization are proved, based on the Azad-Loeb theorem. In recent work, Gevers and Li (1993) also consider the L2 -sensitivity problem. Their proof of Parts (a) and (c) of Theorem 1.8 is similar in spirit to the second proof, but is derived independently of the more general balancing theory developed in Chapter 7, which is based on the Azad-Loeb theorem. Gevers and Li (1993) also give an interesting physical interpretation of the L2 -sensitivity index, showing that it is a more natural measure of sensitivity than the previously studied mixed L2 /L1 indices. Moreover, a thorough study of L2 -sensitivity optimization with scaling constraints is given. For further results on L2 -sensitivity optimal realizations including a preliminary study of L2 -sensitivity optimal model reduction we refer to Yan and Moore (1992). Efficient numerical methods for computing L2 sensitivity optimal coordinate transformations are described in Yan, Moore and Helmke (1993). Also, sensitivity optimal controller design problems are treated by Madievski, Anderson and Gevers (1993). Much more work remains to be done in order to develop a satisfactory theory of sensitivity optimization for digital control and signal processing. For example, we mention the apparently unsolved task of sensitivity optimal model reduction, which has been only briefly mentioned in Section 9.4. Also sensitivity optimal controller design such as developed in Madievski et al. (1993) remains a challenge. Other topics of interest are sensitivity optimization with constraints and sensitivity optimal design for nonlinear systems. The sensitivity optimization problems arising in signal processing are only in very exceptional cases amenable to explicit solution methods based on standard numerical linear algebra, such as the singular value decomposition. It is thus our conviction that the dynamical systems approach, as developed in this book, will prove to be a natural and powerful tool for tackling such complicated questions.
APPENDIX
A
Linear Algebra This appendix summarizes the key results of matrix theory and linear algebra results used in this text. For more complete treatments, see Barnett (1971) and Bellman (1970).
A.1 Matrices and Vectors Let R and C denote the fields of real numbers and complex numbers, respectively. The set of integers is denoted N = {1, 2, . . . }. An n × m matrix is an array of n rows and m columns of elements xij for i = 1, . . . , n, j = 1, . . . , m as
x11
x12 X= .. . xn1
x1m
x12
...
x22 .. .
... .. .
x2m .. = (xij ) , .
xn2
...
xnm
x1
x2 x= .. = (xi ) . . xn
The matrix is termed square when n = m. A column n- vector x is an n × 1 matrix. The set of all n-vectors (row or column) with real arbitrary entries, denoted Rn , is called n-space. With complex entries, the set is denoted Cn and is called complex n-space. The term scalar denotes the elements of R, or C. The set of real or complex n × m matrices is denoted by Rn×m or Cn×m , respectively. The transpose of an n × m matrix X, denoted X , is the m × n matrix X = (xji ). When X = X , the square matrix is termed symmetric. When X = −X , then the matrix is skew symmetric. ¯ . Let us denote the complex conjugate transpose of a matrix X as X ∗ = X
312
Appendix A. Linear Algebra
Then matrices X with X = X ∗ are termed Hermitian and with X = −X ∗ are termed skew Hermitian. The direct sum, of two square matrices X, Y , 0 ˙ , is [ X denoted X +Y 0 Y ] where 0 is a zero matrix consisting of zero elements.
A.2 Addition and Multiplication of Matrices Consider matrices X, Y ∈ Rn×m or Cn×m , and scalars k, ∈ R or C. Then Z = kX + Y is defined by zij = kxij + yij . Thus X + Y = Y + X and addition is commutative. Also, Z =$XY is defined for X an n × p matrix and Y an p × m matrix by ξij = pk=1 xik ykj , and is an n × m matrix. Thus W = XY Z = (XY ) Z = X (Y Z) and multiplication is associative. Note that when XY = Y X, which is not always the case, we say that X, Y are commuting matrices. When X X = I and X is real then X is termed an orthogonal matrix and when X ∗ X = I with X complex it is termed a unitary matrix. Note real vectors x, y are orthogonal if x y = 0 and complex vectors are orthogonal if x∗ y = 0. A permutation matrix π has exactly one unity element in each row and column and zeros elsewhere. Of course π is orthogonal. An n × n square matrix X with only diagonal elements and all other elements zero is termed a diagonal matrix and is written X = diag (x11 , x22 . . . , xnn ). When xii = 1 for all i and xij = 0 for i = j, then X is termed an identity matrix, and is denoted In , or just I. Thus for an n × m matrix Y , then Y Im = Y = In Y . A sign matrix S is a diagonal matrix with diagonal elements +1 or −1, and is orthogonal. For X, Y square matrices the Lie bracket of X and Y is [X, Y ] = XY − Y X.
Of course, [X, Y ] = − [Y, X], and for symmetric X, Y then [X, Y ] = [Y, X] = − [X, Y ]. Also, if X is diagonal, with distinct diagonal elements then [X, Y ] = 0 implies that Y is diagonal. For X, Y ∈ Rn×m or Cn×m , the generalized Lie-bracket is {X, Y } = X Y − Y X. It follows that {X, Y } = − {X, Y }.
A.3 Determinant and Rank of a Matrix A recursive definition of the determinant of a square n × n matrix X, denoted det (X), is det (X) =
n j=1
i+j
(−1)
xij det (Xij )
A.4. Range Space, Kernel and Inverses
313
where det (Xij ) denotes the determinant of the submatrix of X constructed by detecting the i-th row and the j-th column deleted. The element (−1)i+j det (Xij ) is termed the cofactor of xij . The square matrix X is said to be a singular matrix if det (X) = 0, and a nonsingular matrix otherwise. It can be proved that for square matrices det (XY ) = det (X) det (Y ), det (In + XY ) = det (In + Y X). In the scalar case det (I + xy ) = 1 + y x. The rank of an n × m matrix X, denoted by rk (X) or rank (X), is the maximum positive integer r such that some r × r submatrix of X, obtained by deleting rows and columns is nonsingular. Equivalently, the rank is the maximum number of linearly independent rows and columns of X. If r is either m or n then X is full rank. It is readily seen that rank (XY ) ≤ min (rank X, rank Y ) .
A.4 Range Space, Kernel and Inverses For an n × m matrix X, the range space or the image space R (X ), also denoted by image (X), is the set of vectors Xy where y ranges over the set of all m vectors. Its dimension is equal to the rank of X. The kernel ker (X) of X is the set of vectors z for which Xz = 0. It can be seen that for real matrices R (X ) is orthogonal to ker (X), or equivalently, with y1 = X y for some y and if Xy2 = 0, then y1 y2 = 0. For a square nonsingular matrix X, there exists a unique inverse of element of X, denoted X −1 , such that X −1 X = XX −1 = I. The ij-th −1 −1 X −1 is given from det (X) × cofactor of xji . Thus X −1 = (X ) and (XY )−1 = Y −1 X −1 . More generally, a unique (Moore-Penrose) pseudoinverse of X, denoted X # , is defined by the characterizing properties X # Xy = y for all y ∈ R (X ) and X # y = 0 for all y ∈ ker (X ). Thus #
= X, if det (X) = 0 then X # = X −1 , if X = 0, X # = 0, X # X # XX # = X # , XX # X = X For a nonsingular n × n matrix X, a nonsingular p × p matrix A and an n × p matrix B, then provided inverses exist, the readily verified Matrix Inversion Lemma tells us that −1 −1
I + XBA−1 B X = X −1 + BA−1 B −1
=X − XB (B XB + A) and
BX
−1 −1
I + XBA−1 B XBA−1 = X −1 + BA−1 B BX −1 −1
=XB (B XB + A)
.
314
Appendix A. Linear Algebra
A.5 Powers, Polynomials, Exponentials and Logarithms The p-th power of a square matrix X, denoted X p , is the p times product q XX . . . X when p is integer. Thus (X p ) = X pq , X p X q $ = X p+q for integers p, q. Polynomials of a matrix X are defined by p (X) = pi=0 si X i , where si are scalars. Any two polynomials p (X), q (X) commute. Rational functions of a matrix X, such as p (X) q −1 (X), also pairwise commute. The exponential of a square $ matrix X, denoted eX , is given by the limit ∞ X of the convergent series e = i=0 i!1 X i . It commutes with X. For X any
−1 = e−X . If X is skew-symmetric square matrix, eX is invertible with eX X then e is orthogonal. If X and Y commute, then e(X+Y ) = eX eY = eY eX . For general matrices such simple expressions do not hold. One has always eX Y e−X =Y + [X, Y ] + +
1 3!
1 2!
[X, [X, Y ]]
[X, [X, [X, Y ]]] + . . . .
When Y is nonsingular and eX = Y , the logarithm is defined as log Y = X.
A.6 Eigenvalues, Eigenvectors and Trace For a square n × n matrix X, the characteristic polynomial of X is det (sI − X) and its real or complex zeros are the eigenvalues of X, denoted λi . The spectrum spec (X) of X is the set of its eigenvalues. The Cayley-Hamilton Theorem tells us that X satisfies its own characteristic equation det (sI − X) = 0. For eigenvalues λi , then Xvi = λi vi for some real or complex vector vi , termed an eigenvector. The real or complex vector space of such vectors is termed the eigenspace. If λi is not a repeated eigenvalue, then vi is unique to within a scalar factor. The eigenvectors are real or complex according to whether or not λi is real or complex. When X is diagonal then X = diag (λ1 , λ2 , . . . , λn ). Also, det (X) = Πni=1 λi so that det (X) = 0, if and only if at least one eigenvalue is zero. As det (sI − XY ) = det (sI − Y X), XY has the same eigenvalues as Y X. A symmetric, or Hermitian, matrix has only real eigenvalues, a skew symmetric, or skew-Hermitian, matrix has only imaginary eigenvalues, and an orthogonal, or unitary, matrix has unity magnitude $n $n eigenvalues. The trace of X, denoted tr (X), is the sum i=1 xii = i=1 λi . Notice that tr (X + Y ) = tr (X) + tr (Y$), and with XY square, then n $n 2 2 tr (XY ) = tr (Y X). Also, tr (X X) = x and tr (XY ) ≤ i=1 j=1 ij X tr(X) tr (X X) tr (Y Y ). A rather useful identity is det e = e .
A.7. Similar Matrices
315
If X is square and nonsingular, then log det (X) ≤ tr X − n and with equality if and only if X = In . The eigenvectors associated with distinct eigenvalues of a symmetric (or Hermitian) matrix are orthogonal. A normal matrix is one with an orthogonal set of eigenvectors. A matrix X is normal if and only if XX = X X. If X, Y are symmetric matrices, the minimum eigenvalue of X is denoted by λmin (X). Then λmin (X + Y ) ≥ λmin (X) + λmin (Y ) . Also, if X and Y are positive semidefinite (see Section A.8) then " #2 1/2 tr (X) tr (Y ) ≥ Σi λi (XY ) ,
λmin (XY X) ≥λmin X 2 λmin (Y ) , λmin (XY ) ≥λmin (X) λmin (Y ) . With λmax (X) as the maximum eigenvalue, then for positive semidefinite matrices, see Section A.8. 1/2
tr (X) + tr (Y ) ≥2Σi λi
(XY ) ,
λmax (X + Y ) ≤λmax (X) + λmax (Y ) ,
λmax (XY X) ≤λmax X 2 λmax (Y ) , λmax (XY ) ≤λmax (X) λmax (Y ) . If λ1 (X) ≥ · · · ≥ λn (X) are the ordered eigenvalues of a symmetric matrix X, then for 1 ≤ k ≤ n and symmetric matrices X, Y : k
λi (X + Y ) ≤
i=1 k
λn−i+1 (X + Y ) ≥
i=1
k i=1 k i=1
λi (X) +
k
λi (Y ) .
i=1
λn−i+1 (X) +
k
λn−i+1 (Y )
i=1
A.7 Similar Matrices Two n × n matrices X, Y are called similar if there exists a nonsingular T such that Y = T −1 XT . Thus X is similar to X. Also, if X is similar to Y , then Y is similar to X. Moreover, if X is similar to Y and if Y is similar to Z, then X is similar to Z. Indeed, similarity of matrices is an equivalence relation, see Section C.1. Similar matrices have the same eigenvalues.
316
Appendix A. Linear Algebra
If for a given X, there exists a similarity transformation T such that Λ = T −1 XT is diagonal, then X is termed diagonalizable and Λ = diag (λ1 , λ2 . . . λn ) where λi are the eigenvalues of X. The columns of T are then the eigenvectors of X. All matrices with distinct eigenvalues are diagonalizable, as are orthogonal, symmetric, skew symmetric, unitary, Hermitian, and skew Hermitian matrices. In fact, if X is symmetric, it can be diagonalized by a real orthogonal matrix and when unitary, Hermitian, or skew-Hermitian, it can be diagonalized by a unitary matrix. If X is Hermitian and T is any invertible transformation, then Sylvester’s inertia theorem asserts that T ∗ XT has the same number P of positive eigenvalues and the same number N of negative eigenvalues as X. The difference S = P − N is called the signature of X, denoted sig (X). For square real X, there exists a unitary matrix U such that U ∗ XU is upper triangular the diagonal elements being the eigenvalues of X. This is called the Schur form. It is not unique. Likewise, there exists a unitary matrix U , such that U ∗ XU is a Hessenberg matrix with the form x11 x21 0
x12 x22 .. .
... ... .. . xn,n−1
x1n x2n .. . . xnn
A Jacobi matrix is a (symmetric) Hessenberg matrix, so that in the Hessenberg form xij = 0 for |i − j| ≥ 2. The Hessenberg form of any symmetric matrix is a Jacobi matrix.
A.8 Positive Definite Matrices and Matrix Decompositions With X = X and real, then X is positive definite (positive definite or nonnegative definite) if and only if the scalar x Xx > 0 (x Xx ≥ 0) for all nonzero vectors x. The notation X > 0 (X ≥ 0) is used. In fact X > 0 (X ≥ 0) if and only if all eigenvalues are positive (nonnegative). If X = Y Y then X ≥ 0 and Y Y > 0 if and only if Y is an m × n matrix with m ≤ n and rk Y = m. If Y = Y , so that X = Y 2 then Y is unique and is termed the symmetric square root of X, denoted X 1/2 . If X ≥ 0, then X 1/2 exists. If Y is lower triangular with positive diagonal entries, and Y Y = X, then Y is termed a Cholesky factor of X. A successive row by row generation of
A.9. Norms of Vectors and Matrices
317
the nonzero entries of Y is termed a Cholesky decomposition. A subsequent step is to form Y ΛY = X where Λ is diagonal positive definite, and Y is lower triangular with 1’s on the diagonal. The above decomposition also applies to Hermitian matrices with the obvious generalizations. For X a real n × n matrix, then there exists a polar decomposition X = ΘP where P is positive semidefinite symmetric and Θ is orthogonal 1/2 is uniquely determined, satisfying Θ Θ = ΘΘ = In . While P = (X X) Θ is uniquely determined only if X is nonsingular. The singular values of possibly complex rectangular matrices X, denoted σi (X), are the positive square roots of the eigenvalues of X ∗ X. There exist unitary matrices U , V such that V XU =
0
σ1 ..
.
0
σn
0 .. .
... .. .
0 .. .
0
...
0
=: Σ.
If unitary U , V yield V XU , a diagonal matrix with nonnegative entries then the diagonal entries are the singular values of X. Also, X = V ΣU is termed a singular value decomposition (SVD) of X. Every real m × n matrix A of rank r has a factorization A = XY by real m × r and r × n matrices X and Y with rk X = rk Y = r. With X ∈ Rm×r and Y ∈ Rr×n , then the pair (X, Y ) belong to the product space Rm×r × Rr×n . If (X, Y ), (X1 , Y1 ) ∈ Rm×r × Rr×n are two full rank factorizations of A, i.e. A = XY = X1 Y1 , then there exists a unique invertible r × r matrix T with (X, Y ) = X1 T −1 , T Y1 . For X a real n × n matrix, the QR decomposition is X = ΘR where Θ is orthogonal and R is upper triangular (zero elements below the diagonal) with nonnegative entries on the diagonal. If X is invertible then Θ, R are uniquely determined.
A.9 Norms of Vectors and Matrices The norm of a vector x, written x, is any length measure satisfying x ≥ 0 for all x, with equality if and only if x = 0, sx = |s| x for any scalar s, and x + y ≤ x + y for all x, y. The Euclidean norm
318
Appendix A. Linear Algebra
$n 2 1/2 or the 2-norm is x = , and satisfies the Schwartz inequality i=1 xi |x y| ≤ x y, with equality if and only if y = sx for some scalar s. The operator norm of a matrix X with respect to a given vector norm, is defined as X = maxx=1 Xx. Corresponding to the Euclidean norm 1/2 is the 2-norm X2 = λmax (X X), being the largest singular value of X. The Frobenius norm is XF = tr1/2 (X X). The subscript F is deleted when it is clear that the Frobenius norm is intended. For all these norms Xx ≤ X x. Also, X + Y ≤ X + Y and XY ≤ X Y . Note that tr (XY ) ≤ XF Y F . The condition number of a nonsingular matrix X relative to a norm · is X X −1 .
A.10 Kronecker Product and Vec With X an n × m matrix, the Kronecker product X ⊗ Y is the matrix,
x11 Y . . .
... .. .
xn1 Y
...
x1n Y .. . . xnm Y
With X, Y square the set of eigenvalues of X ⊗ Y is given by λi (X) λj (Y ) for all i, j and the set of eigenvalues of (X ⊗ I + I ⊗ Y ) is the set {λi (X) + λj (Y )} for all i, j. Also (X ⊗ Y ) (V ⊗ W ) = XV ⊗ Y W , and (X ⊗ Y ) = X ⊗ Y . If X, Y are invertible, then so is X ⊗ Y and −1 (X ⊗ Y ) = X −1 ⊗ Y −1 . Vec (X) is the column vector obtained by stacking the second column of X under the first, and then the third, and so on. In fact vec (XY ) = (I ⊗ X) vec Y = (Y ⊗ I) vec (X) vec (ABC) = (C ⊗ A) vec (B) .
A.11 Differentiation and Integration Suppose X is a matrix valued function of the scalar variable t. Then X (t) is called differentiable if each entry xij (t) is differentiable. Also, dX = dt
dxij dt
,
dX dY d (XY ) = Y +X , dt dt dt
d tX e = XetX = etX X. dt
A.12. Lemma of Lyapunov
Also,
Xdt =
319
xij dt . Now with φ a scalar function of a matrix X, then
∂φ ∂φ = the matrix with ij-th entry . ∂X ∂xij If Φ is a matrix function of a matrix X, then ∂Φ ∂φij = a block matrix with ij-th block . ∂X ∂X The case when X, Φ are vectors is just a specialization
of
the above ∂ definitions. If X is square (n × n) and nonsingular ∂X tr W X −1 = −X −1 W X −1 . Also log det (X) ≤ tr X − n and with equality if and only d X −1 (t) = if X = In . Furthermore, if X is a function of time, then dt −1 dX −1 −1 −X dt X , which follows from differentiating XX = I. ∂ If P = P , ∂x (x P x) = 2P x.
A.12 Lemma of Lyapunov If A, B, C are known n×n, m×m and n×m matrices, then the linear equation AX + XB + C = 0 has a unique solution for an n × m matrix X if and only if λi (A)+λj (B) = 0 for any i and j. In fact [I ⊗ A + B ⊗ I] vec (X) = − vec (C) and the eigenvalues of [I ⊗ A + B ⊗ I] are precisely given by λi (A) + λj (B). If C > 0 and A = B , the Lemma of Lyapunov for AX + XB + C = 0 states that X = X > 0 if and only if all eigenvalues of B have negative real parts. The linear equation X − AXB = C, or equivalently, [In2 − B ⊗ A] × vec (X) = vec (C) has a unique solution if and only if λi (A) λj (B) = 1 for any i, j. If A = B and |λi (A)| < 1 for all i, then for X − AXB = C, the Lemma of Lyapunov states that X = X > 0 for all C = C > 0. Actually, the condition C > 0 in the lemma can be relaxed to requiring for any D such that DD = C that (A, D) be completely controllable, or (A, D) be completely detectable, see definitions Appendix B.
A.13 Vector Spaces and Subspaces Let us restrict to the real field R (or complex field C), and recall the spaces Rn (or Cn ). These are in fact special cases of vector spaces over R (or C) with the vector additions and scalar multiplications properties for its elements spelt out in Section A.2. They are denoted real (or complex) vector
320
Appendix A. Linear Algebra
spaces. Any space over an arbitrary field K which has the same properties is in fact a vector space V . For example, the set of all m × n matrices with entries in the field as R (or C), is a vector space. This space is denoted by Rm×n (or Cm×n ). A subspace W of V is a vector space which is a subset of the vector space V . The set of all linear combinations of vectors from a non empty subset S of V , denoted L (S), is a subspace (the smallest such) of V containing S. The space L (S) is termed the subspace spanned or generated by S. With the empty set denoted φ, then L (φ) = {0}. The rows (columns) of a matrix X viewed as row (column) vectors span what is termed the row (column) space of X denoted here by [X]r ([X]c ). Of course, R (X ) = [X]r and R (X) = [X]c .
A.14 Basis and Dimension A vector space V is n-dimensional (dim V = n), if there exists linearly independent vectors, the basis vectors, {e1 , e2 , . . . , en } which span V . A basis for a vector space is non-unique, yet every basis of V has the same number n of elements. A subspace W of V has the property dim W ≤ n, and if dim W = n, then W = V . The dimension of the row (column) space of a matrix X is the row (column) rank of X. The row and column ranks are equal and are in fact the rank of X. The co-ordinates of a vector x in V with respect to a basis are the (unique) tuple of coefficients of $ a linear combination of the basis vectors that generate x. Thus with x = i ai ei , then a1 , a2 , . . . , an are the co-ordinates.
A.15 Mappings and Linear Mappings For A, B arbitrary sets, suppose that for each a ∈ A there is assigned a single element f (a) of B. The collection f of such is called a function, or map and is denoted f : A → B. The domain of the mapping is A, the codomain is B. For subsets As , Bs , of A, B then f (As ) = {f (a) : a ∈ As } is the image of As , and f −1 (Bs ) = {a ∈ A : f (a) ∈ Bs } is the preimage or fiber of Bs . If Bs = {b} is a singleton set we also write f −1 (b) instead of f −1 ({b}). Also, f (A) is the image or range of f . The notation x → f (x) is used to denote the image f (x) of an arbitrary element x ∈ A. The composition of mappings f : A → B and g : B → C, denoted gof , is an associative operation. The identity map idA : A → A is the map defined by a → a for all a ∈ A.
A.16. Inner Products
321
A mapping f : A → B is one-to-one or injective if different elements of A have distinct images, i.e. if a1 = a2 ⇒ f (a1 ) = f (a2 ). The mapping is onto or surjective if every b ∈ B is the image of at least one a ∈ A. A bijective mapping is one-to-one and onto (surjective and injective). If f : A → B and g : B → A are maps with g ◦ f = idA , then f is injective and g is surjective. For vector spaces V , W over R or C (denoted K) a mapping F : V → W is a linear mapping if F (v + w) = F (v) + F (w) for any v, w ∈ V , and as F (kv) = kF (v) for any k ∈ K and any v ∈ V . Of course F (0) = 0. A linear mapping is called an isomorphism if it is bijective. The vector spaces V , W are isomorphic if there is an isomorphism of V onto W . A linear mapping F : V → U is called singular if it is not an isomorphism. For F : V → U , a linear mapping, the image of F is the set Im (F ) = {u ∈ U | F (v) = u for some v ∈ V } . The kernel of F , is ker (F ) = {u ∈ V | F (u) = 0}. In fact for finite dimensional spaces dim V = dim ker (F ) + dim Im (F ). Linear operators or transformations are linear mappings T : V → V , i.e. from V to itself. The dual vector space V ∗ of a K-vector space V is defined as the K-vector space of all K-linear maps λ : V → K. It is denoted by V ∗ = Hom (V, K).
A.16 Inner Products Let V be a real vector space. An inner product on V is a bilinear map β : V × V → R, also denoted by · , · , such that β (u, v) = u, v satisfies the conditions u, v = v, u , u, u >0,
u, sv + tw = s u, v + t u, w for u ∈ V − {0}
for all u, v, w ∈ V , s, t ∈ R. 1/2 for u ∈ V , An inner product defines a norm on V , by u := u, u which satisfies the usual axioms for a norm. An isometry on V is a linear map T : V → V such that T u, T v = u, v for all u, v ∈ V . All isometries are isomorphisms though the converse is not true. Every inner product on V = Rn is uniquely determined by a positive definite symmetric matrix Q ∈ Rn×n such that u, v = u Qv holds for all u, v ∈ Rn . If Q = In , u, v = u v is called the standard Euclidean inner product on Rn . The 1/2 is the Euclidean norm on Rn . A linear map induced norm x = x, x
322
Appendix A. Linear Algebra
A : V → V is called selfadjoint if Au, v = u, Av holds for all u, v ∈ V . A matrix A : Rn → Rn taken as a linear map is selfadjoint for the inner product u, v = u Qv if and only if the matrix QA = A Q is symmetric. Every selfadjoint linear map A : Rn → Rn has only real eigenvalues. Let V be a complex vector space. A Hermitian inner product on V is a map β : V × V → C, also denoted by · , · , such that β (u, v) = u, v satisfies u, v =v, u ,
u, sv + tw = s u, v + t u, w .
u, u >0,
for u ∈ V − {0}
holds for all u, v, w ∈ V , s, t ∈ C, where the superbar denotes the complex conjugate. 1/2 The induced norm on V is u = u, u , u ∈ V . One has the same notions of isometries and selfadjoint maps as in the real case. Every Hermitian inner product on V = Cn is uniquely defined by a positive definite Hermitian matrix Q = Q∗ ∈ Cn×n such that u, v = u¯ Qv holds for all u, v ∈ Cn . The standard Hermitian inner product on Cn is u, v = u∗ v for u, v ∈ Cn . A complex matrix A : Cn → Cn is selfadjoint for the Hermitian inner product u, v = u∗ Qv if and only if the matrix QA = A∗ Q is Hermitian. Every selfadjoint linear map A : Cn → Cn has only real eigenvalues.
APPENDIX
B
Dynamical Systems This appendix summarizes the key results of both linear and nonlinear dynamical systems theory required as a background in this text. For more complete treatments, see Irwin (1980), Isidori (1985), Kailath (1980) and Sontag (1990b).
B.1 Linear Dynamical Systems State equations Continuous-time, linear, finite-dimensional, dynamical systems initialized at time t0 are described by dx = Ax + Bu, x (t0 ) = x0 , y = Cx + Du dt where x ∈ Rn , u ∈ Rm , y ∈ Rp and A, B, C, D are matrices of appropriate dimension, possibly time varying. In the case when B = 0, then the solution of x˙ = A (t) x, with initial state x0 is x (t) = Φ (t, t0 ) x0 where Φ (t, t0 ) is the transition matrix which satisfies Φ˙ (t, t0 ) = A (t) Φ (t, t0 ), Φ (t0 , t0 ) = I and has the property Φ (t2 , t1 ) Φ (t1 , t0 ) = Φ (t2 , t0 ). For any B, then % t x (t) = Φ (t, t0 ) x0 + Φ (t, τ ) B (τ ) u (τ ) dτ. x˙ :=
t0
In the time-invariant case where A (t) = A, then Φ (t, t0 ) = e(t−t0 )A . In discrete-time, for k = k0 , k0 + 1, . . . with initial state xk0 ∈ Rn xk+1 = Axk + Buk ,
yk = Cxk + Duk .
324
Appendix B. Dynamical Systems
The discrete-time solution is, xk = Φk,k0 xk0 +
k
Φk,i Bi ui
i=ko
where Φk,k0 = Ak Ak−1 . . . Ak0 . In the time invariant case Φk,k0 = Ak−k0 . A continuous-time, time-invariant system (A, B, C) is called stable if the eigenvalues of A have all real part strictly less than 0. Likewise a discretetime, time-invariant system (A, B, C) is called stable, if the eigenvalues of A are all located in the open unit disc {z ∈ C | |z| < 1}. Equivalently, limt→+∞ etA = 0 and limk→∞ Ak = 0, respectively.
Transfer functions The unilateral Laplace transform of a function f (t) is the complex valued function % ∞ f (t) e−st dt. F (s) = 0+
A sufficient condition for F (s) to exist as a meromorphic function is that |f (t)| ≤ Ceat is exponentially bounded for some constants C, a > 0. In the time-invariant case, transfer functions for the continuous-time system are given in terms of the Laplace transform in the complex variable s as H (s) = C (sI − A)−1 B + D.
−1 Thus when x0 = 0, then Y (s) = C (sI − A) B + D U (s) gives the Laplace transform Y (s) of y (·), expressed as a linear function of the 1/2 Laplace transform U (s) of u (·). When s = iw (where i = (−1) ), then H (iw) is the frequency response of the systems at a frequency w. The transfer function s−1 is an integrator. The Z-transform of a sequence (hk | k ∈ N0 ) is the formal power series in z −1 H (z) =
∞
hk z −k .
k=0
In discrete-time, the Z-transform yields the transfer function H (z) = −1 C (zI − A) B + D in terms of the Z-transform variable z. For x0 = 0, −1 then Y (z) = C (zI − A) B +D U (z) expresses the relation between the Z-transforms of the sequences {uk } and {yk }. The transfer function z −1 is a unit delay. The frequency response of a periodically sampled continuoustime signal with sampling period T is H (z) with z = eiwT .
B.2. Linear Dynamical System Matrix Equations
325
B.2 Linear Dynamical System Matrix Equations Consider the matrix differential equation dX (t) = A (t) X (t) + X (t) B (t) + C (t) , X˙ := dt
X (t0 ) = X0 .
Its solution when A, B are constant is %
t
X (t) = e(t−t0 )A X0 e(t−t0 )B +
e(t−τ )A C (τ ) e(t−τ )B dτ. t0
When A, B, and C are constant, and the eigenvalues of A, B have negative real parts, then X (t) → X∞ as t → ∞ where % ∞ X∞ = etA CetB dt 0
which satisfies the linear equation AX∞ + X∞ B + C = 0. A special case is the Lyapunov equation AX + XA + C = 0, C = C ≥ 0.
B.3 Controllability and Stabilizability In the time-invariant, continuous-time, case, the pair (A, B) with A ∈ Rn×n and B ∈ Rn×m is termed completely controllable (or more simply controllable) if one of the following equivalent conditions holds: • There exists a control u taking x˙ = Ax + Bu from arbitrary state x0 to another arbitrary state x1 , in finite time.
• Rank B, AB, . . . , An−1 B = n • (λI − A, B) full rank for all (complex) λ T • Wc (T ) = 0 etA BB etA dt > 0 for all T > 0 • AX = XA and XB = 0 implies X = 0 • w etA B = 0 for all t implies w = 0 • w Ai B = 0 for all i implies w = 0 • There exists a K of appropriate dimension such that A + BK has arbitrary eigenvalues.
326
Appendix B. Dynamical Systems
• There exists no co-ordinate basis change such that A11 A12 B1 A= . , B= 0 A22 0 T The symmetric matrix Wc (T ) = 0 etA BB etA dt > 0 is the controllability Gramian, associated with x˙ = Ax + Bu. It can be found as the solution ˙ c (t) = AWc (t) + Wc (t) A + BB initialized by Wc (t0 ) = 0. at time T of W If A has only eigenvalues with negative real part, in short Re (λ (A)) < 0, then Wc (∞) = limt→∞ Wc (t) exists. In the time-varying case, only the first definition of controllability applies. It is equivalent to requiring that the Gramian % t+T Φ (t + T, τ ) B (τ ) B (τ ) Φ (t + T, τ ) dτ Wc (t, T ) = t
be positive definite for all t and some T . The concept of uniform complete controllability requires that Wc (t, T ) be uniformly bounded above and below from zero for all t and some T > 0. This condition ensures that a bounded energy control can take an arbitrary state vector x to zero in an interval [t, t + T ] for arbitrary t. A uniformly controllable system has the property that a bounded K (t) exists such that x˙ = (A + BK) x has an arbitrary degree of (exponential) stability. The discrete-time controllability conditions, Gramians, etc are analogous to the continuous-time definitions and results. In particular, the N controllability Gramian of a discrete-time system is defined by Wc(N ) :=
N
Ak BB (A )
k
k=0 (N )
for N ∈ N. The pair (A, B) is controllable if and only if Wc is positive definite for all N ≥ n − 1. If A has all its eigenvalues in the open unit disc {z ∈ C | |z| < 1}, in short |λ (A)| < 1, then for N = ∞ Wc :=
∞
Ak BB (A )
k
k=0
exists and is positive definite if and only if (A, B) is controllable.
B.4 Observability and Detectability The pair (A, C) has observability/detectability properties according to the controllability/stabilizability properties of the pair (A , C ); likewise for the
B.5. Minimality
327
time-varying and uniform observability cases. The observability Gramians are known as duals of the controllability Gramians, e.g. in the continuoustime case %
T
Wo (T ) =
etA C CetA dt,
%
∞
Wo =
o
etA C CetA dt
0
and in the discrete-time case,
Wo(N ) =
N
(A ) C CAk , k
k=0
Wo :=
∞
(A ) C CAk . k
k=0
B.5 Minimality The state space systems of Section B.1 denoted by the triple (A, B, C) are termed minimal realizations, in the time-invariant case, when (A, B) is completely controllable and (A, C) is completely observable. The McMil−1 lan degree of the transfer functions H (s) = C (sI − A) B or H (z) = −1 C (zI − A) B is the state space dimension of a minimal realization. Kalman’s realization theorem asserts that any p × m rational matrix function H (s) with H (∞) = 0 (that is h (s) is strictly proper) has a minimal −1 realization (A, B, C) such that G (s) = C (sI − A) B holds. Moreover, given two minimal realizations denoted (A1 , B1 , C1 ) and (A2 , B2 , C2 ). then there always exists a unique nonsingular transformation matrix T such that T A1 T −1 = A2 , T B1 = B2 , C1 T −1 = C2 . All minimal realizations of a transfer function have the same dimension.
B.6 Markov Parameters and Hankel Matrix With H (s) strictly proper (that is for H (∞) = 0), then H (s) has the Laurent expansion at ∞ H (s) =
M1 M2 M0 + 2 + 3 + ··· s s s
328
Appendix B. Dynamical Systems
where the p × m matrices Mi are termed Markov parameters. The Hankel matrices of H (s) of size N p × N m are then
HN
M0 M1 = .. . MN −1
M1 M2 .. . MN
... ... .. . ...
MN −1 MN .. , . M2N −1
N ∈ N.
If the triple (A, B, C) is a realization of H (s), then Mi = CAi B for all i. Moreover, if (A, B, C) is minimal and of dimension n, then rank HN = n for all N ≥ n. A Hankel matrix HN of the transfer function −1 H (s) = C (sI − A) B has the factorization HN = ON · RN where RN =
N −1 B, . . . , AN −1 B and ON = C , . . . , (A ) C are the controllability and observability matrices of length N . Thus for singular values, σi (HN ) =
(N ) (N ) 1/2 λi Wc Wo .
B.7 Balanced Realizations For a stable system (A, B, C), a realization in which the Gramians are equal and diagonal as Wc = Wo = diag (σ1 , . . . , σn ) is termed a diagonally balanced realization. For a minimal realization (A, B, C), the singular values σi are all positive. For a non minimal realization of McMillan degree m < n, then σm+i = 0 for i > 0. Corresponding definitions and results apply for Gramians defined on finite intervals T . Also, when the controllability and observability Gramians are equal but not necessarily diagonal the realizations are termed balanced. Such realizations are unique only to within orthogonal basis transformations. Balanced truncation is where a system (A, B, C) with A ∈ Rn×n , B ∈ n×m R , C ∈ Rp×n is approximated by an rth order system with r < n and σr > σr+1 , the last (n − r) rows of (A, B) and last (n − r) columns of A ] of a balanced realization are set to zero to form a reduced rth order [C realization (Ar , Br , Cr ) ∈ Rr×r × Rr×m × Rp×r . A theorem of Pernebo and Silverman states that if (A, B, C) is balanced and minimal, and σr > σr+1 , then the reduced r-th order realization (Ar , Br , Cr ) is also balanced and minimal.
B.8. Vector Fields and Flows
329
B.8 Vector Fields and Flows Let M be a smooth manifold and let T M denote its tangent bundle; see Appendix C. A smooth vector field on M is a smooth section X : M → T M of the tangent bundle. Thus X (a) ∈ Ta M is a tangent vector for each a ∈ M . An integral curve of X is a smooth map α : I → M , defined on an open interval I ⊂ R, such that α (t) is a solution of the differential equation on M . α˙ (t) = X (α (t)) ,
t ∈ I.
(8.1)
Here the left-hand side of (8.1) is the velocity vector α˙ (t) = Dα|t (1) of the curve α at time t. In local coordinates on M , (8.1) is just an ordinary differential equation defined on an open subset U ⊂ Rn , n = dim M . It follows from the fundamental existence and uniqueness result for solutions of ordinary differential equations, that integral curves of smooth vector fields on manifolds always exist and are unique. The following theorem summarizes the basic properties of integral curves of smooth vector fields. Theorem 8.1 Let X : M → T M be a smooth vector field on a manifold M. (a) For each a ∈ M there exists an open interval Ia ⊂ R with 0 ∈ Ia and a smooth map αa : Ia → M such that the following conditions hold i α˙ a (t) = X (αa (t)), αa (0) = a, t ∈ Ia . ii Ia is maximal with respect to (a)i.. That is, if β : J → M is another integral curve of X satisfying (a)i. then J ⊂ Ia . (b) The set D = {(t, a) ∈ R × M | t ∈ Ia } is an open subset of R × M and φ : D → M, φ (t, a) = αa (t) , is a smooth map. (c) φ : D → M satisfies i φ (0, a) = a for all a ∈ M . ii φ (s, φ (t, a)) = φ (s + t, a) whenever both sides are defined. iii The map φt : Dt → M, φt (a) = φ (t, a) on the open set Dt = {a ∈ M | (t, a) ∈ D} is a diffeomorphism of Dt onto its image. The inverse is given by (φt )
−1
= φ−t .
330
Appendix B. Dynamical Systems
The smooth map φ : D → M or the associated one-parameter family of diffeomorphisms φt : Dt → M is called the flow of X. Thus, if (φt ) is the flow of X, then t → φt (a) is the unique integral curve of (8.1) which passes through a ∈ M at t = 0. It is not necessarily true that a flow (φt ) is defined for all t ∈ R (finite escape time phenomena) or equivalently that the domain of a flow is D = R × M . We say that the vector field X : M → T M is complete if the flow (φt ) is defined for all t ∈ R. If M is compact then all smooth vector fields X : M → T M are complete. If a vector field X on M is complete then (φt : t ∈ R) is a one-parameter group of diffeomorphism M → M and satisfying φt+s = φt ◦ φs ,
φ0 = idM ,
−1
φ−t = (φt )
for all s, t ∈ R. In the case where the integral curves of a vector field are only considered for nonnegative time, we say that (φt | t ≥ 0) (if it exists) is a one-parameter semigroup of diffeomorphisms. An equilibrium point of a vector field X is a point a ∈ M such that X (a) = 0 holds. Equilibria points of a vector field correspond uniquely to fixed points of the flow, i.e. φt (a) = a for all t ∈ R. The linearization of a vector field X : M → T M at an equilibrium point is the linear map Ta X = X˙ (a) : Ta M → Ta M. Note that the tangent map Ta X : Ta M → T0 (T M ) maps Ta M into the tangent space T0 (T M ) of T M at X (a) = 0. But T0 (T M ) is canonically isomorphic to Ta M ⊕ Ta M , so that the above map is actually defined by composing Ta X with the projection onto the second summand. The associated linear differential equation on the tangent space Ta M ξ˙ = X˙ (a) · ξ,
ξ ∈ Ta M
(8.2)
is then referred to as the linearization of (8.1). If M = Rn this corresponds to the usual concept of a linearization of a system of differential equations on Rn .
B.9 Stability Concepts Let X : M → T M be a smooth vector field on a manifold and let φ : D → M be the flow of X. Let a ∈ M be an equilibrium point of X, so that X (a) = 0.
B.9. Stability Concepts
331
The equilibrium point a ∈ M is called stable if for any neighborhood U ⊂ M of a there exists a neighborhood V ⊂ M of a such that x ∈ V → φt (x) ∈ U
for all t ≥ 0.
Thus solutions which start in a neighborhood of a stay near to a for all time t ≥ 0. In particular, if a ∈ M is a stable equilibrium of X then all solutions of x˙ = X (x) which start in a sufficiently small neighborhood of a are required to exist for all t ≥ 0. The equilibrium point a ∈ M of X is asymptotically stable if it is stable and if a convergence condition holds: There exists a neighborhood U ⊂ M of a such that for all x ∈ U lim φt (x) = a.
t→+∞
Global asymptotic stability holds if a is a stable equilibrium point and if limt→+∞ φt (x) = a holds for all x ∈ M . Smooth vector fields generating such flows can exist only on manifolds which are homeomorphic to Rn . Let M be a Riemannian manifold and X : M → T M be a smooth vector field. An equilibrium point a ∈ M is called exponentially stable if it is stable and there exists a neighborhood U ⊂ M of a and constants C > 0, α > 0 such that dist (φt (x) , a) ≤ Ce−αt dist (x, a) for all x ∈ U , t ≥ 0. Here dist (x, y) denotes the geodesic distance of x and y on M . In such cases, the solutions converging to a are said to converge exponentially. The exponential rate of convergence ρ > 0 refers to the maximal possible α in the above definition. Lemma 9.1 Given a vector field X : M → T M on a Riemannian manifold M with equilibrium point a ∈ M . Suppose that the linearization X˙ (a) : Ta M → Ta M has only eigenvalues with real part less than −α, α > 0. Then a is (locally) exponentially stable with an exponential rate of convergence ρ ≥ α. Let X : M → T M be a smooth vector field on a manifold M and let (φt ) denote the flow of X. The ω-limit set Lω (x) of a point x of the vector field X is the set of points of the form limn→∞ φtn (x) with the tn → +∞. Similarly, the α-limit set Lα (x) is defined by letting tn → −∞ instead of converging to +∞. A subset A ⊂ M is called positively invariant or negatively invariant, respectively, if for each x ∈ A the associated integral curve satisfies φt (a) ∈ A for all t ≥ 0 or φt (a) ∈ A for t ≤ 0, respectively. Also, A is called invariant, if it is positively and negatively invariant, and
332
Appendix B. Dynamical Systems
A is called locally invariant, if for each a ∈ A there exists an open interval ]a, b[ ⊂ R with a < 0 < b such that for all t ∈ ]a, b[ the integral curve φt (a) of X satisfies φt (a) ∈ A.
B.10 Lyapunov Stability We summarize the basic results from Lyapunov theory for ordinary differential equations on Rn . Let x˙ = f (x) ,
x ∈ Rn
(10.1)
be a smooth vector field on Rn . We assume that f (0) = 0, so that x = 0 is an equilibrium point of (10.1). Let D ⊂ Rn be a compact neighborhood of 0 in Rn . A Lyapunov function of (10.1) on D is a smooth function V : D → R having the properties (a)
V (0) = 0,
V (x) > 0 for all x = 0 in
D.
(b) For any solution x (t) of (10.1) with x (0) ∈ D d V˙ (x (t)) = V (x (t)) ≤ 0. dt
(10.2)
Also, V : D → R is called a strict Lyapunov function if the strict inequality holds d V˙ (x (t)) = V (x (t)) < 0 dt
for x (0) ∈ D − {a} .
(10.3)
Theorem 10.1 (Stability) If there exists a Lyapunov function V : D → R defined on some compact neighborhood of 0 ∈ Rn , then x = 0 is a stable equilibrium point. Theorem 10.2 (Asymptotic Stability) If there exists a strict Lyapunov function V : D → R defined on some compact neighborhood of 0 ∈ Rn , then x = 0 is an asymptotically stable equilibrium point. Theorem 10.3 (Global Asymptotic Stability) If there exists a proper map V : Rn → R which is a strict Lyapunov function with D = Rn , then x = 0 is globally asymptotically stable.
B.10. Lyapunov Stability
333
Here properness of V : Rn → R is equivalent to V (x) → ∞ for x → ∞, see also Section C.1. Theorem 10.4 (Exponential Asymptotic Stability) If one has in Theorem 10.2 2
2
α1 x ≤V (x) ≤ α2 x , −α3 x2 ≤V˙ (x) ≤ −α4 x2 for some positive αi , i = 1, . . . , 4, then x = 0 is exponentially asymptotically stable. Now consider the case where X : M → T M is a smooth vector field on a manifold M . A smooth function V : M → R is called a (weak) Lyapunov function if V has compact sublevel sets i.e. {x ∈ M | V (x) ≤ c} is compact for all c ∈ R, and d V˙ (x (t)) = V (x (t)) ≤ 0 dt for any solution x (t) of x˙ (t) = X (x (t)). Recall that a subset Ω ⊂ M is called invariant if every solution x (t) of x˙ = X (x) with x (0) ∈ Ω satisfies x (t) ∈ Ω for all t ≥ 0. La Salle’s principle of invariance then asserts Theorem 10.5 (Principle of Invariance) Let X : M → T M be a smooth vector field on a Riemannian manifold and let V : M → R be a smooth weak Lyapunov function for X. Then every solution x (t) ∈ M of x˙ = X (x) exists for all t ≥ 0. Moreover the ω-limit set Lω (x) of any point x ∈ M is a compact, connected and invariant subset of {x ∈ M | grad V (x) , X (x) = 0}.
APPENDIX
C
Global Analysis In this appendix we summarize some of the basic facts from general topology, manifold theory and differential geometry. As further references we mention Munkres (1975) and Hirsch (1976). In preparing this appendix we have also profited from the appendices on differential geometry in Isidori (1985) as well as from unpublished lectures notes for a course on calculus on manifolds by Gamst (1975).
C.1 Point Set Topology A topological space is a set X together with a collection of subsets of X, called open sets, satisfying the axioms (a) The empty set φ and X are open sets. (b) The union of any family of open sets is an open set. (c) The intersection of finitely many open sets is an open set. A collection of subsets of X satisfying (a)–(c) is called a topology on X. A neighbourhood of a point a ∈ X is a subset which contains an open subset U of X with a ∈ U . A basis for a topology on X is a collection of open sets, called basic open sets, satisfying • X is the union of basic open sets. • The intersection of two basic open sets is a union of basic open sets. A subset A of a topological space X is called closed if the complement X − A is open. The collection of closed subsets satisfies axioms dual to
336
Appendix C. Global Analysis
(a)–(c). In particular, the intersection of any number of closed subsets is closed and the union of finitely many closed subsets is a closed subset. ˚ is the largest (possibly The interior of a subset A ⊂ X, denoted by A, empty) open subset of X which is contained in A. A is open if and only if ˚ The closure of A in X, denoted by A, ¯ is the smallest closed subset A = A. ¯ of X which contains A. A is closed if and only if A = A. A subset A ⊂ X is called dense in X if its closure coincides with X. A subset A ⊂ X is called generic if it contains a countable intersection of open and dense subsets of X. Generic subsets of manifolds are dense. If X is a topological space and A ⊂ X a subset, then A can be made into a topological space by defining the open subsets of A as the intersections A ∩ U by open subsets U of X. This topology on A is called the subspace topology. Let X and Y be topological spaces. The cartesian product X × Y can be endowed with a topology consisting of the subsets U × V where U and V are open subsets of X and Y respectively. This topology is called the product topology. A space X is called Hausdorff, if any two distinct points of X have disjoint neighborhoods. We say the points can be separated by open neighborhoods. A topological space X is called connected if there do not exist disjoint open subsets U , V with U ∪ V = X. Also, X is connected if and only if the empty set and X are the only open and closed subsets of X. Every topological space X is the union of disjoint connected subsets Xi , i ∈ I, such that every connected subset C of X intersects only one Xi . The subsets Xi , i ∈ I, are called the connected components of X. Let A be a subset of X. A collection C of subsets of X is said to be a covering of A if A is contained in the union of the elements of C. Also, A ⊂ X is called compact if every covering of A by open subsets of X contains a finite subcollection covering A. Every closed subset of a compact space is compact. Every compact subset of a Hausdorff space is closed. A topological space X is said to be locally compact if and only if every point a ∈ X has a compact neighborhood. The standard Euclidean n-space is Rn . A basis for a topology on the set of real numbers R is given by the collection of open intervals ]a, b[ = {x ∈ R | a < x < b}, for a < b arbitrary. We use the notations [a, b] := {x ∈ R | a ≤ x ≤ b} [a, b[ := {x ∈ R | a ≤ x < b} ]a, b] := {x ∈ R | a < x ≤ b} ]a, b[ := {x ∈ R | a < x < b} Each of these sets is connected. For a, b ∈ R, [a, b] is compact.
C.1. Point Set Topology
337
A basis for a topology on Rn is given by the collection of open n-balls , Br (a) = x ∈ Rn
n 2 (xi − ai ) < r i=1
for a ∈ Rn and r > 0 arbitrary. The same topology is defined by taking the product topology via Rn = R × · · · × R (n-fold product). A subset A ⊂ Rn is compact if and only if it is closed and bounded. A map f : X → Y between topological spaces is called continuous if the pre-image f −1 (Y ) = {p ∈ X | f (p) ∈ Y } of any open subset of Y is an open subset of X. Also, f is called open (closed) if the image of any open (closed) subset of X is open (closed) in Y . The map f : X → Y is called proper if the pre-image of every compact subset of Y is a compact subset of X. The image f (K) of a compact subset K ⊂ X under a continuous map f : X → Y is compact. Let X and Y be locally compact Hausdorff spaces. Then every proper continuous map f : X → Y is closed. The image f (A) of a connected subset A ⊂ X of a continuous map f : X → Y is connected. Let f : X → R be a continuous function on a compact space X. The Weierstrass theorem asserts that f possesses a minimum and a maximum. A generalization of this result is: Let f : X → R be a proper continuous function on a topological space X. If f is lower bounded, i.e. if there exists a ∈ R such that f (X) ⊂ [a, ∞[, then there exists a minimum xmin ∈ X for f with f (xmin ) = inf {f (x) | x ∈ X} A map f : X → Y is called a homeomorphism if f is bijective and f and f −1 : Y → X are continuous. An imbedding is a continuous, injective map f : X → Y such that f maps X homeomorphically onto f (X), where f (X) is endowed with the subspace topology of Y . If f : X → Y is an imbedding, then we also write f : X → Y . If X is compact and Y a Hausdorff space, then any bijective continuous map f : X → Y is a homeomorphism. If X is any topological space and Y a locally compact Hausdorff space, then any continuous, injective and proper map f : X → Y is an imbedding and f (X) is closed in Y . Let X and Y be topological spaces and let p : X → Y be a continuous surjective map. Then p is said to be a quotient map provided any subset U ⊂ Y is open if and only if p−1 (U ) is open in X. Let X be a topological space and Y be a set. Then, given a surjective map p : X → Y , there exists a unique topology on Y , called the quotient topology, such that p is a quotient map. It is the finest topology on Y which makes p a continuous map.
338
Appendix C. Global Analysis
Let ∼ be a relation on X. Then ∼ is called an equivalence relation, provided it satisfies the three conditions (a) x ∼ x for all x ∈ X (b) x ∼ y if and only if y ∼ x for x, y ∈ X. (c) For x, y, z ∈ X then x ∼ y and y ∼ z implies x ∼ z. Equivalence classes are defined by [x] := {y ∈ X | x ∼ y}. The quotient space is defined as the set X/ ∼:= {[x] | x ∈ X} of all equivalence classes of ∼. It is a topological space, endowed with the quotient topology for the map p : X → X/ ∼, p (x) = [x], x ∈ X. The graph Γ of an equivalence relation ∼ on X is the subset of X × X defined by Γ := {(x, y) ∈ X × X | x ∼ y}. The closed graph theorem states that the quotient space X/ ∼ is Hausdorff if and only if the graph Γ is a closed subset of X × X.
C.2 Advanced Calculus Let U be an open subset of Rn and f : U → R a function. f : U → R is called a smooth, or C ∞ , function if f is continuous and the mixed partial derivatives of any order exist and are continuous. A function f : U → R is called real analytic if it is C ∞ and for each x0 ∈ U the Taylor series expansion of f at x0 converges on a neighborhood of x0 to f (x). A smooth map f : U → Rm is an m-tuple (f1 , . . . , fm ) of smooth functions fi : U → R. Let x0 ∈ U and f : U → Rm be a smooth map. The Fr´echet derivative of f at x0 is defined as the linear map Df (x0 ) : Rn → Rm which has the following ε − δ characterization: For any ε > 0 there exists δ > 0 such that for x − x0 < δ f (x) − f (x0 ) − Df (x0 ) (x − x0 ) < ε x − x0 holds. The derivative Df (x0 ) is uniquely determined by this condition. We also use the notation Df |x0 (ξ) = Df (x0 ) · ξ to denote the Fr´echet derivative Df (x0 ) : Rn → Rm , ξ → Df (x0 ) (ξ). If Df (x0 ) is expressed with respect to the standard basis vectors of Rn and Rm then the associated m×n-matrix of partial derivatives is the Jacobi matrix
C.2. Advanced Calculus
∂f
1
(x0 ) . . . . .. .. Jf (x0 ) = . ∂fm ∂x1 (x0 ) . . . ∂x1
339
(x0 ) .. . . ∂fm ∂xn (x0 ) ∂f1 ∂xn
The second derivative of f : U → Rm at x0 ∈ U is a bilinear map D2 f (x0 ) : Rn × Rn → Rm . It has the ε − δ characterization: For all ε > 0 there exists δ > 0 such that for x − x0 < δ f (x) − f (x0 ) − Df (x0 ) (x − x0 ) − 1 D2 f (x0 ) (x − x0 , x − x0 ) 2! 2
< ε x − x0
holds. Similarly, for any integer i ∈ N, the i-th derivative Di f (x0 ) : =i n m is defined as a multilinear map on the i-th fold products j=1 R → R n n R × · · · × R . The Taylor series is the formal power series ∞ Di f (x0 )
i!
i=0
(x − x0 , . . . , x − x0 ) .
The matrix representation of the second derivative D2 f (x0 ) of a function f : U → R with respect to the standard basis vectors on Rn is the Hessian Hf (x0 ) =
∂2 f ∂x21
(x0 ) .. .
∂2 f ∂xn ∂x1
... .. .
(x0 ) . . .
(x0 ) .. . . ∂2f (x ) 0 ∂x2
∂2f ∂x1 ∂xn
n
It is symmetric if f is C ∞ . The chain rule for the derivatives of smooth maps f : Rm → Rn , g : n R → Rk asserts that D (g ◦ f ) (x0 ) = Dg (f (x0 )) ◦ Df (x0 ) and thus for the Jacobi matrices Jg◦f (x0 ) = Jg (f (x0 )) · Jf (x0 ) . A diffeomorphism between open subsets U and V of Rn is a smooth bijective map f : U → V such that f −1 : V → U is also smooth. A map f : U → V is called a local diffeomorphism if for any x0 ∈ U there exists
340
Appendix C. Global Analysis
open neighborhoods U (x0 ) ⊂ U and V (f (x0 )) ⊂ V of x0 and f (x0 ), respectively, such that f maps U (x0 ) diffeomorphically onto V (f (x0 )). Theorem 2.1 (Inverse Function Theorem) Let f : U → V be a smooth map between open subsets of Rn . Suppose the Jacobi matrix Jf (x0 ) is invertible at a point x0 ∈ U . Then f maps a neighborhood of x0 in U diffeomorphically onto a neighborhood of f (x0 ) in V . A critical point of a smooth map f : U → Rn is a point x0 ∈ U where rank Jf (x0 ) < min (m, n). Also, f (x0 ) is called a critical value. Regular values y ∈ Rn are those such that f −1 (y) contains no critical point. If f : U → R is a function, critical points x0 ∈ U are those where Df (x0 ) = 0. A local minimum (local maximum) of f is a point x0 ∈ U such that f (x) ≥ f (x0 ) (f (x) ≤ f (x0 )) for all points x ∈ U0 in a neighborhood U0 ⊂ U of x0 . Moreover, x0 ∈ U is called a global minimum (global maximum) if f (x) ≥ f (x0 ) (f (x) ≤ f (x0 )) holds for all x ∈ U . Local minima or maxima are critical points. All other critical points x0 ∈ U are called saddle points. A critical point x0 ∈ U of f : U → R is called a nondegenerate critical point if the Hessian Hf (x0 ) is invertible. A function f : U → R is called a Morse function if all its critical points x0 ∈ U are nondegenerate. Nondegenerate critical points are always isolated. Let x0 ∈ U be a critical point of the function f . If the Hessian Hf (x0 ) is positive definite, i.e. Hf (x0 ) > 0, then x0 is a local minimum. If Hf (x0 ) < 0, then x0 is a local maximum. The Morse lemma explains the behaviour of a function around a saddle point. Lemma 2.2 (Morse Lemma) Let f : U → R be a smooth function and let x0 ∈ U be a nondegenerate critical point of f . Let k be the number of positive eigenvalues of Hf (x0 ). Then there exists a local diffeomorphism φ of a neighborhood U (x0 ) of x0 onto a neighborhood V ⊂ Rn of 0 such that k n
f ◦ φ−1 (x1 , . . . , xn ) = x2j − x2j . j=1
j=k+1
In constrained optimization, conditions for a point to be a local minimum subject to constraints on the variables are often derived using Lagrange multipliers. Let f : Rm → R be a smooth function. Let g : Rm → Rn , g = (g1 , . . . , gn ), n < m, be a smooth map such that constraints are defined by g (x) = 0. Assume that 0 is a regular value for g, so that rk Dg (x) = n for all x ∈ Rm satisfying g (x) = 0.
C.3. Smooth Manifolds
341
Condition 2.3 (First-Order Necessary Condition) Let x0 ∈ Rm , g (x0 ) = 0, be a local minimum of the restriction of f to the constraint set {x ∈ Rm | g (x) = 0}. Assume that 0 is a regular value of g. Then there exists $n real numbers λ1 , . . . , λn such that x0 is a critical point of f (x) + j=1 λj gj (x), i.e. n D f+ λj gj (x0 ) = 0. j=1
The parameters λ1 , . . . , λn appearing in the above necessary condition are called the Lagrange multipliers. Condition 2.4 (Second-Order Sufficient Condition) Let f : Rn → R be a smooth function and let 0 be a regular value of g : Rn → Rm . Also let x0 ∈ Rm , g (x0 ) = 0, satisfy D (f + λj gj ) (x0 ) = 0 for real numbers λ1 , . . . , λn . Suppose that the Hessian Hf (x0 ) +
n
λj Hgj (x0 )
j=1
$ of the function f + λj gj at x0 is positive definite on the linear subspace {ξ ∈ Rn | Dg (x0 ) · ξ = 0} of Rn . Then x0 is a local minimum for the restriction of f on {x ∈ Rn | g (x) = 0}.
C.3 Smooth Manifolds Let M be a topological space. A chart of M is a triple (U, φ, n) consisting of an open subset U ⊂ M and a homeomorphism φ : U → Rn of U onto an open subset φ (U ) of Rn . The integer n is called the dimension of the chart. We use the notation dimx M = n to denote the dimension of M at any point x in U . φ = (φ1 , . . . , φn ) is said to be a local coordinate system on U . Two charts (U, φ, n) and (V, ψ, m) of M are called C ∞ compatible charts if either U ∩ V = ∅, the empty set, or if (a) φ (U ∩ V ) and ψ (U ∩ V ) are open subsets of Rn and Rm .
342
Appendix C. Global Analysis
f
U
ÉÉÉÉ ÉÉÉÉ ÉÉÉÉ y f *1
V y
M
ÉÉÉÉ ÉÉÉÉ ÉÉÉÉ ÉÉÉÉ
f y *1
FIGURE 3.1. Coordinate charts
(b) The transition functions ψ ◦ φ−1 : φ (U ∩ V ) → ψ (U ∩ V ) and
φ ◦ ψ −1 : ψ (U ∩ V ) → φ (U ∩ V )
are C ∞ maps, see Figure 3.1. It follows that, if U ∩ V = ∅ then the dimensions of the charts m and n are the same. A C ∞ atlas of M is a set A = {(Ui , φi , ni ) | i ∈ I} of C ∞ compatible charts such that M = ∪i∈I Ui . An atlas A is called maximal if every chart (U, φ, n) of M which is C ∞ compatible with every chart from A also belongs to A. Definition 3.1 A smooth or C ∞ manifold M is a topological Hausdorff space, having a countable basis, that is equipped with a maximal C ∞ atlas. If all coordinate charts of M have the same dimension n, then M is called an n-dimensional manifold. If M is connected, then all charts of M must have the same dimension and therefore the dimension of M coincides with the dimension of any coordinate chart. Let U ⊂ M be an open subset of a smooth manifold. Then U is a smooth manifold. The charts of M which are defined on open subsets of U form a C ∞ atlas of U . Also, U is called an open submanifold of M . Let M and N be smooth manifolds. Any two coordinate charts (U, φ, m) and (V, ψ, n) of M and N define a coordinate chart (U × V, φ × ψ, m + n)
C.4. Spheres, Projective Spaces and Grassmannians
343
of M × N . In this way, M × N becomes a smooth manifold. It is called the product manifold of M and N . If M and N have dimensions m and n, respectively, then dim (M × N ) = m + n. Let M and N be smooth manifolds. A continuous map f : M → N is called smooth if for any charts (U, φ, m) of M and (V, ψ, n) of N with f (U ) ⊂ V the map ψ ◦ f ◦ φ−1 : φ (U ) → Rn is C ∞ . The map ψf φ−1 is called the expression of f in local coordinates. A continuous map f : M → Rn is smooth if and only if each component function fi : M → R, f = (f1 , . . . , fn ), is smooth. The identity map idM : M → M , x → x, is smooth. If f : M → N and g : N → P are smooth maps, then also the composition g ◦ f : M → P . If f, g : M → Rn are smooth maps, so is every linear combination. If f, g : M → R are smooth, so is the sum f + g : M → R and the product f · g : M → R. A map f : M → N between smooth manifolds M and N is called a diffeomorphism if f : M → N is a smooth bijective map such that f −1 : N → M is also smooth. M and N are called diffeomorphic if there exists a diffeomorphism f : M → N . Let M be a smooth manifold and (U, φ, n) a coordinate chart of M . Then φ : U → Rn defines a diffeomorphism φ : U → φ (U ) of U onto φ (U ) ⊂ Rn . An example of a smooth homeomorphism which is not a diffeomorphism is f : R → R, f (x) = x3 . Here f −1 : R → R, f −1 (x) = x1/3 is not smooth at the origin. A smooth map f : M → N is said to be a local diffeomorphism at x ∈ M if there exists open neighborhoods U ⊂ M , V ⊂ N of x and f (x) respectively with f (U ) ⊂ V , such that the restriction f : U → V is a diffeomorphism.
C.4 Spheres, Projective Spaces and Grassmannians
The n-sphere in Rn+1 is defined by S n = x ∈ Rn+1 | x21 + · · · + x2n+1 = 1 . It is a smooth, compact manifold of dimension n. Coordinate charts of S n can be defined by stereographic projection from the north pole xN = (0, . . . , 0, 1) and south pole xS = (0, . . . , 0, −1). Let UN := S n − {xN } and define ψN : UN → Rn by x − xn+1 xN ψN (x1 , . . . , xn+1 ) = for x = (x1 , . . . , xn+1 ) ∈ UN . 1 − xn+1 −1 : Rn → UN is given by Then ψN 2 y − 1 xN + 2 (y, 0) −1 ψN , (y) = 2 y + 1
y ∈ Rn .
344
Appendix C. Global Analysis IR
xN x
f N(x)
n
IR
FIGURE 4.1. Stereographic projection
where y2 = y12 + · · ·+ yn2 . Similarly for US = S n − {xS }, the stereographic projection from the south pole is defined by ψS : US → Rn , ψS (x) =
x + xn+1 xS 1 + xn+1
and ψS−1 (y) =
for
2
1 − y
2 xS
1 + y
+
x = (x1 , . . . , xn+1 ) ∈ US 2 2
1 + y
(y, 0) ,
y ∈ Rn .
−1 , ψN ◦ ψS−1 : Rn − {0} → By inspection, the transition functions ψS ◦ ψN n ∞ R − {0} are seen to be C maps and thus A = {(UN , ψN , n) , (US , ψS , n)} is a C ∞ atlas for S n . The real projective n-space RPn is defined as the set of all lines through the origin in Rn+1 . Thus RPn can be identified with the set of equivalence classes [x] of nonzero vectors x ∈ Rn+1 − {0} for the equivalence relation on Rn+1 − {0} defined by
x ∼ y ⇐⇒ x = λy
for λ ∈ R − {0} .
Similarly, the complex projective n-space CPn is the defined as the set of complex one-dimensional subspaces of Cn+1 . Thus CPn can be identified with the set of equivalence classes [x] of nonzero complex vectors x ∈ Cn+1 − {0} for the equivalence relation x ∼ y ⇐⇒ x = λy
for
λ ∈ C − {0}
defined on Cn+1 −{0}. It is easily verified that the graph of this equivalence relation is closed and thus RPn and CPn are Hausdorff spaces. If x = (x0 , . . . , xn ) ∈ Rn+1 − {0}, the one-dimensional subspace of Rn+1 generated by x is denoted by [x] = [x0 : · · · : xn ] ∈ RPn .
C.4. Spheres, Projective Spaces and Grassmannians
345
The coordinates x0 , . . . , xn are called the homogeneous coordinates of [x]. They are uniquely defined up to a common scalar multiple by a nonzero real number. Coordinate charts for RPn are defined by Ui := {[x] ∈ RPn | xi = 0} and φi : Ui → Rn with φi ([x]) =
x0 xi−1 xi+1 xn ,..., , ,..., xi xi xi xi
∈ Rn
for i = 0, . . . , n. It is easily seen that these charts are C ∞ compatible and thus define a C ∞ atlas of RPn . Similarly for CPn . Since every one dimensional real linear subspace of Rn+1 is generated by a unit vector x ∈ S n one has a surjective continuous map π : S n → RPn ,
π (x) = [x] ,
with π (x) = π (y) ⇐⇒ y = ±x. One says that RPn is defined by identifying antipodal points on S n . We obtain that RPn is a smooth compact and connected manifold of dimension n. Similarly CPn is seen as a smooth compact and connected manifold of dimension 2n. Both RP1 and CP1 are homeomorphic to the circle S 1 and the Riemann sphere S 2 respectively. Also, RP2 is a non orientable surface which cannot be visualized as a smooth surface imbedded in R3 . For 1 ≤ k ≤ n the real Grassmann manifold GrassR (k, n) is defined as the set of k-dimensional real linear subspaces of Rn . Similarly the complex Grassmann manifold GrassC (k, n) is defined as the set of k-dimensional complex linear subspaces of Cn , where k means the complex dimension of the subspace. GrassR (k, n) is a smooth compact connected manifold of dimension k (n − k). Similarly GrassC (k, n) is a smooth compact connected manifold of dimension 2k (n − k). For k = 1, GrassR (1, n) and GrassC (1, n) coincide with the real and complex projective spaces RPn−1 and CPn−1 respectively. The points of the Grassmann manifold GrassR (k, n) can be identified with equivalence classes of an equivalence relation on rectangular matrices. Let
ST (k, n) = X ∈ Rn×k | rk X = k denote the noncompact Stiefel manifold, i.e. the set of full rank n× k matrices. Any such matrices are called k-frames Let [X] denote the equivalence class of the equivalence relation defined on ST (k, n) by X ∼ Y ⇐⇒ X = Y T
for
T ∈ GL (k, R) .
346
Appendix C. Global Analysis
Thus X, Y ∈ ST (k, n) are equivalent if and only if their respective columnvectors generate the same vector space. This defines a surjection π : ST (k, n) → GrassR (k, n) , π (X) = [X] , and GrassR (k, n) can be identified with the quotient space ST (k, n) / ∼. It is easily seen that the graph of the equivalence relation is a closed subset of ST (k, n) × ST (k, n). Thus GrassR (k, n), endowed with the quotient topology, is a Hausdorff space. Similarly GrassC (k, n) is seen to be Hausdorff. To define coordinate charts on GrassR (k, n), let I = {i1 < · · · < ik } ⊂ {1, . . . , n} denote a subset having k elements. Let I = {1, . . . , n} − I denote the complement of I. For X = (x1 , . . . , xn ) ∈ Rn×k , let XI =
xi1 , . . . , xik ∈ Rk×k denote the submatrix of X of those row vectors xi of X with i ∈ I. Similarly let XI be the (n − k) × k submatrix formed by those row vectors xi with i ∈ I. Let UI := {[X] ∈ GrassR (k, n) | det (XI ) = 0} with ψI : UI → Rk(n−k) defined by φI ([X]) = XI · (XI )−1 ∈ R(n−k)×k
It is easily seen that the system of nk coordinate charts {(UI , ψI , (n − k) k) | I ⊂ {1, . . . , n} and |I| = k} is a C ∞ atlas for GrassR (k, n). Similarly coordinate charts and a C ∞ atlas for GrassC (k, n) are defined. As the Grassmannian is the image of the continuous surjective map π n) → GrassR (k, n) of the compact Stiefel manifold St (k, n) =
: St (k, X ∈ Rn×k | X X = Ik it is compact. Two orthogonal k-frames X and Y ∈ St (k, n) are mapped to the same point in the Grassmann manifold if and only if they are orthogonally equivalent. That is, π (X) = π (Y ) if and only if X = Y T for T ∈ O (k) orthogonal. Similarly for GrassC (k, n).
C.5 Tangent Spaces and Tangent Maps Let M be a smooth manifold. There are various equivalent definitions of the tangent space of M at a point x ∈ M . The following definition is particularly appealing from a geometric viewpoint. A curve through x ∈ M is a smooth map α : I → M defined on an open interval I ⊂ R with 0 ∈ I, α (0) = x. Let (U, φ, n) be a chart of M with x ∈ U . Then φ ◦ α : I → φ (U ) ⊂ Rn is a smooth map and its derivative at 0 is called the velocity vector (φα) (0) of α at x with respect to the chart
C.5. Tangent Spaces and Tangent Maps
347
(U, φ, n). If (V, ψ, n) is another chart with x ∈ V , then the velocity vectors of α with respect to the two charts are related by
(ψα) (0) = D ψφ−1 φ(x) · (φα) (0) (5.1) Two curves α and β through x are said to be equivalent, denoted by α ∼x β, if (φα) (0) = (φβ) (0) holds for some (and hence any) coordinate chart (U, φ, n) around x. Using (5.1), this is an equivalence relation on the set of curves through x. The equivalence class of a curve α is denoted by [α]x . Definition 5.1 A tangent vector of M at x is an equivalence class ξ = [α]x of a curve α through x ∈ M . The tangent space Tx M of M at x is the set of all such tangent vectors. Working in a coordinate chart (U, φ, n) one has a bijection τxφ : Tx M → Rn ,
τxφ ([α]x ) = (φα) (0)
Using the vector space structure on Rn we can define a vector space structure on the tangent space Tx M such that τxφ becomes an isomorphism of vector spaces. The vector space structure on Tx M is easily verified to be independent of the choice of the coordinate chart (U, φ, n) around x. Thus if M is a manifold of dimension n, the tangent spaces Tx M are n-dimensional vector spaces. Let f : M → N be a smooth map between manifolds. If α : I → M is a curve through x ∈ M then f ◦ α : I → N is a curve through f (x) ∈ N . If two curves α, β : I → M through x are equivalent, so are the curves f ◦ α, f ◦ β : I → N through f (x). Thus we can consider the map [α]x → [f ◦ α]f (x) on the tangent spaces Tx M and Tf (x) N . Definition 5.2 Let f : M → N be a smooth map, x ∈ M . The tangent map Tx f : Tx M → Tf (x) N is the linear map on tangent spaces defined by Tx f ([α]x ) = [f ◦ α]f (x) for all tangent vectors [α]x ∈ Tx M . Expressed in local coordinates (U, φ, m) and (V, ψ, n) of M and N at x and f (x), the tangent map coincides with the usual derivative of f (expressed in local coordinates). Thus
−1
= D ψf φ−1 φ(x) . τfψ(x) ◦ Tx f ◦ τxφ One has a chain rule for the tangent map. Let idM : M → M be the identity map and let f : M → N and g : N → P be smooth. Then Tx (idM ) = idTx M ,
Tx (g ◦ f ) = Tf (x) g ◦ Tx f.
The inverse function theorem for open subsets of Rn implies
348
Appendix C. Global Analysis
Theorem 5.3 (Inverse Function Theorem) Let f : M → M be smooth and x ∈ M . Then Tx f : Tx M → Tf (x) M is an isomorphism if and only if f is a local diffeomorphism at x. Let M and N be smooth manifolds, and x ∈ M , y ∈ N . The tangent space of the product manifold M × N at a point (x, y) is then canonically isomorphic to T(x,y) (M × N ) = (Tx M ) × (Ty N ) . The tangent space of Rn (or of any open subset U ⊂ Rn ) at a point x is canonically isomorphic to Rn , i.e. Tx (Rn ) = Rn . If f : M → Rn is a smooth map then we use the above isomorphism of Tf (x) (Rn ) with Rn to identify the tangent map Tx f : Tx M → Tf (x) (Rn ) with the linear map Df |x : Tx M → Rn . Let f : M → R be a smooth function on M . A critical point of f is a point x ∈ M where the derivative Df |x : Tx M → R is the zero map, i.e. Df |x (ξ) = 0 for all tangent vectors ξ ∈ Tx M . A point x0 ∈ M is a local minimum (or local maximum) if there exists an open neighborhood U of x in M such that f (x) ≥ f (x0 ) (or f (x) ≤ f (x0 )) for all x ∈ U . Local minima maxima are critical points of f . The Hessian Hf (x0 ) of f : M → R at a critical point x0 ∈ M is the symmetric bilinear form Hf (x0 ) : Tx0 M × Tx0 M → R
Hf (x0 ) (ξ, ξ) = f (α) (0) for any tangent vector ξ = [α]x0 ∈ Tx0 M . Thus for tangent vectors ξ, η, ∈ T x0 M Hf (x0 ) (ξ, η) =
1 2
(Hf (x0 ) (ξ + η, ξ + η) − Hf (x0 ) (ξ, ξ) − Hf (x0 ) (η, η)) .
Note that the Hessian of f is only well defined at a critical point of f . A critical point x0 ∈ M where the Hessian Hf (x0 ) is positive definite on Tx M is a local minimum. Similarly, a critical point x0 with Hf (x0 ) (ξ, ξ) < 0 for all ξ ∈ Tx0 M , ξ = 0, is a local maximum. A critical point x0 ∈ M is called nondegenerate if the Hessian Hf (x0 ) is a nondegenerate bilinear form; i.e. if Hf (x0 ) (ξ, η) = 0,
for all η ∈ Tx0 M =⇒ ξ = 0
C.6. Submanifolds
349
Nondegenerate critical points are always isolated and there are at most countably many of them. A smooth function f : M → R with only nondegenerate critical points is called a Morse function. The Morse index indf (x0 ) of a critical point x0 ∈ M is defined as the signature of the Hessian: indf (x0 ) = sig Hf (x0 ) . Thus indf (x0 ) = dim Tx0 M if and only if Hf (x0 ) > 0, i.e. if and only if x0 is a local minimum of f . If x0 is a nondegenerate critical point then the index indf (x0 ) = dim M − 2n− , where n− is the dimension of the negative eigenspace of Hf (x0 ). Often the number n− of negative eigenvalues of the Hessian is known as the Morse index of f at x0 .
C.6 Submanifolds Let M be a smooth manifold of dimension m. A subset A ⊂ M is called a smooth submanifold of M if every point a ∈ A possesses chart
a coordinate (V, ψ, m) around a ∈ V such that ψ (A ∩ V ) = ψ (V ) ∩ Rk × {0} for some 0 ≤ k ≤ n. In particular, submanifolds A ⊂ M are also manifolds. If A ⊂ M is a submanifold, then at each point x ∈ A, the tangent space Tx A ⊂ Tx M is considered as a sub vector space of Tx M . Any open subset U ⊂ M of a manifold is a submanifold of the same dimension as M . There are subsets A of a manifold M , such that A is a manifold but not a submanifold of M . The main difference is topological: In order for a subset A ⊂ M to qualify as a submanifold, the topology on A must be the subspace topology. Let f : M → N be a smooth map of manifolds. (a) f is an immersion if the tangent map Tx f : Tx M → Tf (x) N is injective at each point x ∈ M . (b) f is a submersion if the tangent map Tx f : Tx M → Tf (x) N is surjective at each point x ∈ M . (c) f is a subimmersion if the tangent map Tx f : Tx M → Tf (x) N has constant rank on each connected component of M .
350
Appendix C. Global Analysis
(d) f is an imbedding if f is an injective immersion such that f induces a homeomorphism of M onto f (M ), where f (M ) is endowed with the subspace topology of N . Submersions are open mappings. If f : M → N is an imbedding, then f (M ) is a submanifold of N and f : M → f (M ) is a diffeomorphism. Conversely, if M is a submanifold of N than the inclusion map inc : M → N , inc (x) = x for x ∈ M , is an imbedding. The image f (M ) of an injective immersion f : M → N is called an immersed submanifold of N , although it is in general not a submanifold of N . A simple example of an immersed submanifold which is not a subman1 ifold are the so-called “irrational lines on a torus”. Here N = S 1 × S is 1 1 it 2πit the two-dimensional torus and f : R → S × S , f (t) = e , e , is an injective immersion with dense image f (R) in S 1 × S 1 . Thus f (R) cannot be a submanifold of S 1 × S 1 . A simple condition for an injective immersion to be an imbedding is stated as a proposition. Proposition 6.1 Let f : M → N be an injective immersion. If M is compact then f is an imbedding. An important way to produce submanifolds is as fibers of smooth maps. Theorem 6.2 (Fiber Theorem) Let f : M → N be a subimmersion. For a ∈ M let A = f −1 (f (a)) be the fiber of f through a. Then A is a smooth submanifold of M with Tx A = ker Tx f
for all
x ∈ A.
In particular, for x ∈ A dimx A = dimx M − rk Tx f Important special cases are $n (a) Let q (x) = x Ax = i,j=1 aij xi xj be a quadratic form on Rn with A = (aij ) a symmetric invertible n × n matrix. Then M = {x ∈ Rn | q (x) = 1} is a submanifold of Rn with tangent spaces Tx M = {ξ ∈ Rn | x Aξ = 0} ,
x ∈ M.
(b) Let f : Rn → R be smooth with ∂f ∂f (x) , . . . , (x) = 0 ∇f (x) = ∂x1 ∂xn
C.7. Groups, Lie Groups and Lie Algebras
351
for all x ∈ Rn satisfying f (x) = 0. Then M = {x ∈ Rn | f (x) = 0} is a smooth submanifold with tangent spaces
Tx M = ξ ∈ Rn | ∇f (x) ξ = 0 , x ∈ M. If M = ∅, then M has dimension n − 1. (c) Let f : Rn → Rk be smooth and M = {x ∈ Rn | f (x) = 0}. If the Jacobi matrix ∂fi Jf (x) = (x) ∈ Rk×n ∂xj has constant rank r ≤ min (n, k) for all x ∈ M then M is a smooth submanifold of Rn . The tangent space of M at x ∈ M is given by Tx M = {ξ ∈ Rn | Jf (x) ξ = 0} . If M = φ, it has dimension n − r. (d) Let f : M → N be a submersion. Then A = f −1 (f (a)) is a submanifold of M with dim A = dim M − dim N , for every a ∈ M .
C.7 Groups, Lie Groups and Lie Algebras A group is a set G together with an operation ◦ which assigns to every ordered pair (a, b) of elements of G a unique element a ◦ b such that the following axioms are satisfied (a) (a ◦ b) ◦ c = a ◦ (b ◦ c) (b) There exists some element e ∈ G with e ◦ a = a for all a ∈ G. (c) For any a ∈ G there exists a−1 ∈ G with a−1 ◦ a = e A subgroup H ⊂ G is a subset of G which is also a group with respect to the group operation ◦ of G. A Lie group is a group G, which is also a smooth manifold, such that the map G × G → G, (x, y) → xy −1 is smooth. A Lie subgroup H ⊂ G is a subgroup of G which is also a submanifold. By Ado’s theorem, closed subgroups of a Lie group are Lie subgroups. Compact Lie groups are Lie groups which are compact as topological spaces. A compact subgroup H ⊂ G is called maximal compact if
352
Appendix C. Global Analysis
there is no compact subgroup H with G H ⊂ G. If H is a compact Lie group, then the only maximal compact Lie subgroup is G itself. Similar definitions exist for complex Lie groups, or algebraic groups, etc. Thus a complex Lie group G is a group G which is also a complex manifold, such that the map G × G → G, (x, y) → xy −1 , is holomorphic The tangent space G = Te G of a Lie group G at the identity element e ∈ G has in a natural way the structure of a Lie algebra. A Lie algebra is a real vector space which is endowed with a product structure, [ , ] : V × V → V , called the Lie bracket, satisfying (a) [x, y] = − [y, x] for all x, y ∈ V . (b) For α, β ∈ R and x, y, z ∈ V [αx + βy, z] = α [x, z] + β [y, z] . (c) The Jacobi identity is satisfied [x, [y, z]] + [z, [x, y]] + [y, [z, x]] = 0 for all x, y, z ∈ V . Example 7.1 The general linear group
GL (n, K) = T ∈ Kn×n | det (T ) = 0 for K = R, or C is a Lie group of dimension n2 , or 2n2 , under matrix multiplication. A maximal compact subgroup of GL (n, K) is the orthogonal group
O (n) = O (n, R) = T ∈ Rn×n | T T = In for K = R and the unitary group
U (n) = U (n, C) = T ∈ Cn×n | T T ∗ = In for K = C. Here T ∗ denotes the Hermitian transpose of T . Also GL (n, R) and O (n, R) each have two connected components, GL+ (n, R) = {T ∈ GL (n, R) | det (T ) > 0} , GL− (n, R) = {T ∈ GL (n, R) | det (T ) < 0} , O+ (n, R) = {T ∈ O (n, R) | det (T ) = 1} , O− (n, R) = {T ∈ O (n, R) | det (T ) = −1} , while GL (n, C) and U (n, C) are connected.
C.7. Groups, Lie Groups and Lie Algebras
353
The Lie algebra of GL (n, K) is
gl (n, K) = X ∈ Kn×n endowed with the Lie bracket product structure [X, Y ] = XY − Y X for n × n matrices X, Y . The Lie algebra of O (n, R) is the vector space of skew-symmetric matrices
skew (n, R) = X ∈ Rn×n | X = −X and the Lie algebra of U (n, C) is the vector space of skew-Hermitian matrices
skew (n, C) = X ∈ Cn×n | X ∗ = −X . In both cases the Lie algebras are endowed with the Lie bracket product [X, Y ] = XY − Y X. Example 7.2 The special linear group
SL (n, K) = T ∈ Kn×n | det (T ) = 1 . A maximal compact subgroup of SL (n, K) is the special orthogonal group
SO (n) = SO (n, R) = T ∈ Rn×n | T T = In , det (T ) = 1 . for K = R and the special unitary group
SU (n) = SU (n, C) = T ∈ Cn×n | T T ∗ = In , det (T ) = 1 for K = C. The groups SO (n, R) and SU (n, C) are connected. Also, SO (2, R) is homeomorphic to the circle S 1 and SO (3, R) is homeomorphic to real projective 3-space RP3 , and SU (2, C) is homeomorphic to the 3-sphere S 3 . The Lie algebra of SL (n, K) is
sl (n, K) = X ∈ Kn×n | tr (X) = 0 , endowed with the matrix Lie bracket product structure [X, Y ] = XY − Y X. The Lie algebra of SO (n, R) coincides with the Lie algebra skew (n, R) of O (n, R). The Lie algebra of SU (n, C) is the subspace of the Lie algebra of U (n, C) defined by {X ∈ Cn×n | X ∗ = −X, tr (X) = 0}.
354
Appendix C. Global Analysis
Example 7.3 The real symplectic group Sp (n, R) = {T ∈ GL (2n, R) | T JT = J}
where
0 J= In
−In . 0
A maximal compact subgroup of Sp (n, R) is Sp (n) = Sp (n, R) ∩ O (2n, R) . The Lie algebra of Sp (n, R) is the vector space of 2n × 2n Hamiltonian matrices
Ham (n) = X ∈ R2n×2n | JX + X J = 0 . The Lie bracket operation is [X, Y ] = XY − Y X. Example 7.4 The Lie group O (p, q) = {T ∈ GL (n, R) | T Ipq T = Ipq } where 0 Ip Ipq = 0 −Iq and p, q ≥ 0, p + q = n. A maximal compact subgroup of O (p, q) is the direct product O (p) × O (q) of orthogonal groups O (p) and O (q). The Lie algebra of O (p, q) is the vector space of signature skew-symmetric matrices
o (p, q) = X ∈ Rn×n | (XIpq ) = −XIpq . The exponential map is a local diffeomorphism exp : G → G, which maps the Lie algebra of G into the connected component of e ∈ G. In the above cases of matrix Lie groups the exponential map is given by the classical matrix exponential ∞ 1 i X . exp (X) = eX = i! i=0 Thus if X is a skew-symmetric, skew-Hermitian, or Hamiltonian matrix, then eX is an orthogonal, unitary, or symplectic matrix. For A ∈ Rn×n let adA : Rn×n → Rn×n be defined by adA (X) = AX − i XA. The map ad0A is called the adjoint transformation. Let adA (B) = i−1 adA adA B , adA B = B for i ∈ N. Then the Baker-Campbell-Hausdorff formula holds etA Be−tA =B + t [A, B] + ∞ k t = adk B k! A k=0
t2 [A, [A, B]] + · · · 2!
C.8. Homogeneous Spaces
355
for t ∈ R and A, B ∈ Rn×n arbitrary.
C.8 Homogeneous Spaces A Lie group action of a Lie group G on a smooth manifold M is a smooth map δ : G × M → M, (g, x) → g · x satisfying for all g, h ∈ G and x ∈ M g · (h · x) = (gh) · x,
e · x = x.
A group action δ : G×M → M is called transitive if there exists an element x ∈ M such that every y ∈ M satisfies y = g · x for some g ∈ G. A space M is called homogeneous if there exists a transitive G-action on M . The orbit of x ∈ M is defined by O (x) = {g · x | g ∈ G} . Thus the homogeneous spaces are the orbits of a group action. Any Lie group action δ : G × M → M induces an equivalence relation ∼ on M defined by x ∼ y ⇐⇒
there exists g ∈ G
with y = g · x
for x, y ∈ M . Thus the equivalence classes of ∼ are the orbits of δ : G×M → M . The quotient space of this equivalence relation is called the orbit space and is denoted by M/G or by M/ ∼. Thus M/G = {O (x) | x ∈ M } . M/G is endowed with the quotient topology, i.e. with the finest topology on M/G such that the quotient map π : M → M/G,
π (x) = O (x)
is continuous. Also, M/G is Hausdorff if and only if the graph of the Gaction, defined by Γ = {(x, g · x) ∈ M × M | x ∈ M, g ∈ G} is a closed subset of M × M . Moreover, M/G is a smooth manifold if and only if Γ is a closed submanifold of M × M .
356
Appendix C. Global Analysis
Given a Lie group action δ : G × M → M and a point x ∈ M , the stabilizer subgroup of x is defined by Stab (x) = {g ∈ G | g · x = x} . Stab (x) is a closed Lie subgroup of G. For any Lie subgroup H ⊂ G the orbit space of the H-action α : H ×G → G, (h, g) → gh−1 , is the set of coset classes G/H = {g · H | g ∈ G} . G/H is an homogeneous space for the G-action α : G × G· /H → G/H, (f, g · H) → f g · H. If H is a closed Lie subgroup of G then G/H is a smooth manifold. In particular, G/ Stab (x), x ∈ M , is a smooth manifold for any Lie group action δ : G × M → M . A group action δ : G × M → M of a complex Lie group on a complex manifold M is called holomorphic, if δ : G×M → M is a holomorphic map. If δ : G × M → M is a holomorphic group action then the homogeneous spaces G/ Stab (x), x ∈ M , are complex manifolds. Let G be a complex algebraic group acting algebraically on a complex variety M . The closed orbit lemma states that every orbit is then a smooth locally closed subset of M whose boundary is a union of orbits of strictly lower dimension. This is wrong for arbitrary holomorphic Lie group actions. However, it is true for real algebraic and even semialgebraic group actions (see below). For a point x ∈ M consider the smooth map δx : G → M,
g → g · x.
Thus the image δx (G) = O (x) coincides with the G-orbit of x and δx induces a smooth injective immersion δ¯x : G/ Stab (x) → M of G/ Stab (x) onto O (x). Thus O (x) is always an immersed submanifold of M . It is an imbedded submanifold in any of the following two cases (a) G is compact. (b) G ⊂ GL (n, R) is a semialgebraic Lie subgroup and δ : G × RN → N R semialgebraic action.This means that the graph
is a smooth (x, g · x) ∈ RN × RN | g ∈ G, x ∈ RN is a semialgebraic subset of RN × RN . This condition is, for example, satisfied for real algebraic
C.9. Tangent Bundle
357
subgroups G ⊂ GL (n, R) and δ : G × RN → RN a real algebraic action. Examples of such actions are the similarity action (S, A) → SAS −1 of GL (n, R) on Rn×n and more generally, the similarity action
(S, (A, B, C)) → SAS −1 , SB, CS −1 of GL (n, R) on matrix triples (A, B, C) ∈ Rn×n × Rn×m × Rp×n . In these two cases (a) or (b) the map δ¯x is a diffeomorphism of G/ Stab (x) onto O (x). For a proof of the following proposition we refer to Gibson (1979). Proposition 8.1 Let G ⊂ GL (n, R) be a semialgebraic Lie subgroup and let δ : G × RN → RN be a smooth semialgebraic action. Then each orbit O (x), x ∈ RN , is a smooth submanifold of RN and δ¯x : G/ Stab (x) → O (x) is a diffeomorphism.
C.9 Tangent Bundle Let M be a smooth manifold. A vector bundle E on M is a family of real vector spaces Ex , x ∈ M , such that Ex varies continuously with x ∈ M . In more precise terms, a smooth vector bundle of rank k on M is a smooth manifold E together with a smooth surjective map π : E → M satisfying (a) For any x ∈ M the fiber Ex = π −1 (x) is a real vector space of dimension k. (b) Local Triviality: Every point x ∈ M possesses an open neighborhood U ⊂ M such that there exists a diffeomorphism θ : π −1 (U ) → U × Rk which maps Ey linearly isomorphically onto {y} × Rk for all y ∈ U .
358
Appendix C. Global Analysis
The smooth mapπ : E → M is called the bundle projection. As a set, a vector bundle E = x∈M Ex is a disjoint union of vector spaces Ex , x ∈ M . A section of a vector bundle E is a smooth map s : M → E such that π ◦ s = idM . Thus a section is just a smooth right inverse of π : E → M . A smooth vector bundle E of rank k on M is called trivial if there exists a diffeomorphism θ : E → M × Rk which maps Ex linearly isomorphically onto {x} × Rk for all x ∈ M . A rank k vector bundle E is trivial if and only if there exist k (smooth) sections s1 , . . . , sk : M → E such that s1 (x) , . . . , sk (x) ∈ Ex are linearly independent for all x ∈ M . The tangent bundle T M of a manifold M is defined as 0 Tx M. TM = x∈M
Thus T M is the disjoint union of all tangent spaces Tx M of M . It is a smooth vector bundle on M with bundle projection π : T M → M, π (ξ) = x
for ξ ∈ Tx M.
Also, T M is a smooth manifold of dimension dim T M = 2 dim M . Let f : M → N be a smooth map between manifolds. The tangent map of f is the map T f : T M → T N,
ξ → Tπ(ξ) f (ξ) .
−1 Thus T f : T M → T N maps a fiber Tx M = πM (x) into the fiber Tf (x) N = −1 πN (f (x)) via the tangent map Tx f : Tx M → Tf (x) N . If f : M → N is smooth, so is the tangent map T f : T M → T N . One has a chain rule for the tangent map
T (g ◦ f ) = T g ◦ T f,
T (idM ) = idT M
for arbitrary smooth maps f : M → N , g : N → P . A smooth vector field on M is a (smooth) section X : M → T M of the tangent bundle. The tangent bundle T S 1 of the circle is trivial, as is the tangent bundle of any Lie group. The tangent bundle T (Rn ) = Rn × Rn is trivial. The tangent bundle of the 2-sphere S 2 is not trivial. The cotangent bundle T ∗ M of M is defined as 0 Tx∗ M T ∗M = x∈M
C.9. Tangent Bundle
359
where Tx∗ M = Hom (Tx M, R) denotes the dual (cotangent) vector space of Tx M . Thus T ∗ M is the disjoint union of all cotangent spaces Tx∗ M of M . Also, T ∗ M is a smooth vector bundle on M with bundle projection π : T ∗ M → M,
π (λ) = x
for λ ∈ Tx∗ M
and T ∗ M is a smooth manifold of dimension dim T ∗ M = 2 dim M . While the tangent bundle provides the right setting for studying differential equations on manifolds, the cotangent bundle is the proper setting for the development of Hamiltonian mechanics. A smooth section α : M → T ∗ M of the cotangent bundle is called a 1form. Examples of 1-forms are the derivatives DΦ : M → T ∗ M of smooth functions Φ : M → R. However, not every 1-form α : M → T ∗ M is necessarily of this form (i.e. α need not be exact ). The bilinear bundle Bil (M ) of M is defined as the disjoint union 0 Bil (M ) = Bil (Tx M ) , x∈M
where Bil (Tx M ) denotes the real vector space consisting of all symmetric bilinear maps β : Tx M × Tx M → R. Also, Bil (M ) is a smooth vector bundle on M with bundle projection π : Bil (M ) → M,
π (β) = x
for β ∈ Bil (Tx M )
and Bil (M ) is a smooth manifold of dimension dim Bil(M ) = n+ 21 n(n + 1), where n = dim M . A Riemannian metric on M is a smooth section s : M → Bil (M ) of the bilinear bundle Bil (M ) such that s (x) is a positive definite inner product on Tx M for all x ∈ M . Riemannian metrics are usually denoted by , x where x indicates the dependence of the inner product of x ∈ M . A Riemannian manifold is a smooth manifold, endowed with a Riemannian metric. If M is a Riemannian manifold, then the tangent bundle T M and the cotangent bundle T ∗ M are isomorphic. Let M be a Riemannian manifold with Riemannian metric , and let A ⊂ M be a submanifold. The normal bundle of A in M is defined by 0 ⊥ T ⊥A = (Tx A) x∈A ⊥
where (Tx A) ⊂ Tx M denotes the orthogonal complement of Tx A in Tx M with respect to the positive definite inner product , x : Tx M × Tx M → R on Tx M . Thus ⊥
(Tx A) = {ξ ∈ Tx M | ξ, η x = 0 for all η ∈ Tx A}
360
Appendix C. Global Analysis
and T ⊥ A is a smooth vector bundle on A with bundle projection π : T ⊥ A → ⊥ A, π (ξ) = x for ξ ∈ (Tx A) . Also, T ⊥ A is a smooth manifold of dimension ⊥ dim T A = dim M .
C.10 Riemannian Metrics and Gradient Flows Let M be a smooth manifold and let T M and T ∗ M denote its tangent bundle and cotangent bundle, respectively, see Section C.9. A Riemannian metric on M is a family of nondegenerate inner products , x , defined on each tangent space Tx M , such that , x depends smoothly on x ∈ M . Thus a Riemannian metric is a smooth section in the bundle of bilinear forms defined on T M , such that the value at each point x ∈ M is a positive definite inner product , x on Tx M . Riemannian metrics exist on every smooth manifold. Once a Riemannian metric is specified, M is called a Riemannian manifold. A Riemannian metric on Rn is a smooth map Q : Rn → Rn×n such that for each x ∈ Rn , Q (x) is a real symmetric positive definite n × n matrix. Every positive definite inner product on Rn defines a (constant) Riemannian metric. If f : M → N is an immersion, any Riemannian metric on N pulls back to a Riemannian metric on M by defining ξ, η x := Tx f (ξ) , Tx f (η) f (x) for all ξ, η ∈ Tx M . In particular, if f : M → N is the inclusion map of a submanifold M of N , any Riemannian metric on N defines by restriction of T N to T M a Riemannian metric on M . We refer to this as the induced Riemannian metric on M . Let Φ : M → R be a smooth function defined on a manifold M and let DΦ : M → T ∗ M denote the differential, i.e. the section of the cotangent bundle T ∗ M defined by DΦ (x) : Tx M → R,
ξ → DΦ (x) · ξ,
where DΦ (x) is the derivative of Φ at x. We also use the notation DΦ|x (ξ) = DΦ (x) · ξ to denote the derivative of Φ at x. To define the gradient vector field of Φ we have to specify a Riemannian metric , on M . The gradient vector field grad Φ of Φ with respect to this choice of a Riemannian metric on M is then uniquely characterized by the following two properties.
C.10. Riemannian Metrics and Gradient Flows
361
(a) Tangency Condition grad Φ (x) ∈ Tx M
for all x ∈ M.
(b) Compatibility Condition DΦ (x) · ξ = grad Φ (x) , ξ
for all ξ ∈ Tx M.
There exists a uniquely determined smooth vector field grad Φ : M → T M on M such that (a) and (b) hold. It is called the gradient vector field of Φ. If M = Rn is endowed with its standard constant Riemannian metric defined by ξ, η = ξ η for ξ, η ∈ Rn , the associated gradient vector field is the column vector ∂Φ ∂Φ ∇Φ (x) = (x) , . . . , (x) ∂x1 ∂xn
If Q : Rn → Rn×n is a smooth map with Q (x) = Q (x) > 0 for all x ∈ Rn , the gradient vector field with respect to the general Riemannian metric on Rn defined by ξ, η x = ξ Q (x) η,
ξ, η ∈ Tx (Rn ) = Rn
is grad Φ (x) = Q (x)
−1
∇Φ (x) .
The linearization of the gradient flow x˙ (t) = grad Φ (x (t)) at an equilibrium point x0 ∈ Rn , ∇Φ (x0 ) = 0, is the linear differential equation ξ˙ = Aξ with A = Q (x0 )−1 HΦ (x0 ) where HΦ (x0 ) is the Hessian at x0 . Thus A −1/2 −1/2 HΦ (x0 ) Q (x0 ) and has only real eigenvalues. is similar to Q (x0 ) The numbers of positive and negative eigenvalues of A coincides with the numbers of positive and negative eigenvalues of the Hessian HΦ (x0 ). Let Φ : M → R be a smooth function on a Riemannian manifold and let V ⊂ M be a submanifold, endowed with the Riemannian metric induced from M . If x ∈ V , then the gradient grad (Φ | V ) (x) of the restriction
362
Appendix C. Global Analysis
Φ : V → R is the image of grad Φ (x) ∈ Tx M under the orthogonal projection map Tx M → Tx V . Let M = Rn be endowed with the standard, constant Riemannian metric and let V ⊂ Rn be a submanifold endowed with the induced Riemannian metric. Let Φ : Rn → R be a smooth function. The gradient grad (Φ | V ) (x) of the restriction Φ : V → R is thus the image of the orthogonal projection of ∇Φ (x) ∈ Rn onto Tx V under Rn → Tx V .
C.11 Stable Manifolds Let a ∈ M be an equilibrium point of a smooth vector field X : M → T M . (a) The stable manifold of a (or the inset of a) is W s (a) = {x ∈ M | Lω (x) = {a}} . (b) The unstable manifold of a (or the outset of a) is W u (a) = {x ∈ M | Lα (x) = {a}} . The stable and unstable manifolds of a are invariant subsets of M . In spite of their names, they are in general not submanifolds of M . Let a ∈ M be an equilibrium point of a smooth vector field X : M → T M . Let X˙ (a) : Ta M → Ta M be the linearization of the vector field at a. The equilibrium point a ∈ M is called hyperbolic if X˙ (a) has only nonzero eigenvalues. Let E + ⊂ Ta M and E − ⊂ Ta M denote the direct sums of the generalized eigenspaces of X˙ (a) corresponding to eigenvalues of X˙ (a) having positive or negative real part respectively. Let n+ = dim E + ,
n− = dim E − .
If a ∈ M is hyperbolic, then E + ⊕ E − = Ta M and n+ + n− = dim M . s A local stable manifold is a locally invariant smooth submanifold Wloc (a) s s (a) such that Ta Wloc (a) = E − . Similarly a local unstaof M with a ∈ Wloc u ble manifold is a locally invariant smooth submanifold Wloc (a) of M with u u (a) such that Ta Wloc (a) = E + . a ∈ Wloc Theorem 11.1 (Stable/Unstable Manifold Theorem) Let a ∈ M be a hyperbolic equilibrium point of a smooth vector field X : M → T M , with integers n+ and n− as defined above.
C.11. Stable Manifolds
363
s (a) There exists local stable and local unstable manifolds Wloc (a) and u s s u Wloc (a). They satisfy Wloc (a) ⊂ W (a) and Wloc (a) ⊂ W u (a). s (a) converges exponentially fast to a. Every solution starting in Wloc
(b) There are smooth injective immersions +
ϕ+ : Rn →M, ϕ+ (0) = a, −
ϕ− : Rn →M, ϕ+ (0) = a, − + and W u (a) = ϕ+ Rn . The derivasuch that W s (a) = ϕ− Rn tives + −
→ Ta M T0 ϕ+ : T0 Rn → Ta M, T0 ϕ− : T0 Rn + − + − ∼ map both T0 Rn ∼ = Rn and T0 Rn = Rn isomorphically onto the eigenspaces E + and E − . Thus, in the case of a hyperbolic equilibrium point a, the stable and unstable manifolds are immersed submanifolds, tangent to the generalized eigenspaces E − and E + of the linearization X˙ (a) : Ta M → Ta M . Let f : M → R be a smooth Morse function on a compact manifold M . Suppose M has a C ∞ Riemannian metric which is induced by Morse charts, defined by the Morse lemma, around the critical point. Consider the gradient flow of grad f for this metric. Then the stable and unstable manifolds W s (a) and W u (a) are connected C ∞ submanifolds of dimensions n− and n+ respectively, where n− and n+ are the numbers of negative and positive eigenvalues of the Hessian of f at a. Moreover W s (a) is diffeomorphic to − + Rn and W u (a) is diffeomorphic to Rn . The following concept becomes important in understanding the behaviour of a flow around a continuum of equilibrium points; see Hirsch, Pugh and Shub (1977). Let f : M → M be a diffeomorphism of a smooth Riemannian manifold M . A compact invariant submanifold V ⊂ M , i.e. f (V ) = V , is called normally hyperbolic if the tangent bundle of M restricted to V , denoted as TV M splits into three continuous subbundles TV M = N u ⊕ T V ⊕ N s , invariant by the tangent map T f of f , such that (a) T f expands N u more sharply than T V . (b) T f contracts N s more sharply than T V .
364
Appendix C. Global Analysis
Thus the normal behaviour (to V ) of T f is hyperbolic and dominate the tangent behaviour. A vector field X : M → T M is said to be normally hyperbolic with respect to a compact invariant submanifold V ⊂ M if V is normally hyperbolic for the time-one map φ1 : M → M of the flow of X. The stable manifold theorem for hyperbolic equilibrium points has the following extension to normally hyperbolic invariant submanifolds. Theorem 11.2 (Fundamental Theorem of Normally Hyperbolic Invariant Manifolds) Let X : M → T M be a complete smooth vector field on a Riemannian manifold and let V be a compact invariant submanifold, which is normally hyperbolic with respect to X. Through V pass stable and unstable C 1 manifolds, invariant under the flow and tangent at V to T V ⊕ N s , N u ⊕ T V . The stable manifold is invariantly fibered by C 1 submanifolds tangent at V to the subspaces N s . Similarly, for the unstable manifold and N u . Let a ∈ M be an equilibrium point of the smooth vector field X : M → T M . Let E 0 denote the direct sum of the generalised eigenspaces of the linearization X˙ (a) : Ta M → Ta M corresponding to eigenvalues of X˙ (a) with real part zero. A submanifold W c ⊂ M is said to be a center manifold if a ∈ W c , Ta W c = E o and W c is locally invariant under the flow of X. Theorem 11.3 (Center Manifold Theorem) Let a ∈ M be an equilibrium point of a smooth vector field X : M → T M . Then for any k ∈ N there exists a C k -center manifold for X at a. Note that the above theorem asserts only the existence of center manifolds which are C k submanifolds of M for any finite k ∈ N. Smooth, i.e. C ∞ center manifolds do not exist, in general. A center manifold W c (a) captures the main recurrence behaviour of a vector field near an equilibrium point. Theorem 11.4 (Reduction Principle) Let a ∈ M be an equilibrium point of a smooth vector field X : M → T M and let n◦ , n+ and n− denote the numbers of eigenvalues λ of the linearization X˙ (a) : Ta M → Ta M with Re (λ) = 0, Re (λ) > 0 and Re (λ) < 0, respectively (counted with multiplicities). Thus n◦ + n+ + n− = n the dimension of M . Then there exists a homeomorphism ϕ : U → Rn from a neighborhood U ⊂ M of a onto a neighborhood of 0 ∈ Rn such that ϕ maps integral curves of X to integral
C.12. Convergence of Gradient Flows
365
curves of x˙ 1 = X1 (x1 ) ,
◦
x1 ∈ Rn , +
y˙ = y,
y ∈ Rn ,
z˙ = −z,
z ∈ Rn .
−
Here the flow of x˙ 1 = X1 (x1 ) is equivalent to the flow of X on a center manifold W c (a).
C.12 Convergence of Gradient Flows Let M be a Riemannian manifold and let Φ : M → R be smooth function. Let grad Φ denote the gradient vector field with respect to the Riemannian metric on M . The critical points of Φ : M → R coincide with the equilibria of the gradient flow on M . x˙ (t) = − grad Φ (x (t)) .
(12.1)
For any solution x (t) of the gradient flow d Φ (x (t)) = grad Φ (x (t)) , x˙ (t) dt = − grad Φ (x (t))2 ≤ 0 and therefore Φ (x (t)) is a monotonically decreasing function of t. Proposition 12.1 Let Φ : M → R be a smooth function on a Riemannian manifold with compact sublevel sets, i.e. for all c ∈ R the sublevel set {x ∈ M | Φ (x) ≤ c} is a (possibly empty) compact subset of M . Then every solution x (t) ∈ M of the gradient flow (12.1) on M exists for all t ≥ 0. Furthermore, x (t) converges to a connected component of the set of critical points of Φ as t → +∞. Note that the condition of the proposition is automatically satisfied if M is compact. Solutions of a gradient flow (12.1) have a particularly simple convergence behaviour. There are no periodic solutions or strange attractors, and there is no chaotic behaviour. Every solution converges to a connected component of the set of equilibria points. This does not necessarily mean that every solution actually converges to an equilibrium point rather than to a whole subset of equilibria. Let C (Φ) ⊂ M denote the set of critical points of Φ : M → R. Recall that the ω-limit set Lω (x) of a point x ∈ M for a vector field X on M is
366
Appendix C. Global Analysis
the set of points of the form limn→∞ φtn (x), where (φt ) is the flow of X and tn → +∞. Proposition 12.2 Let Φ : M → R be a smooth function on a Riemannian manifold with compact sublevel sets. Then (a) The ω-limit set Lω (x), x ∈ M , of the gradient flow (12.1) is a nonempty, compact and connected subset of the set C (Φ) of critical points of Φ : M → R. Moreover, for any x ∈ M there exists c ∈ R such that Lω (x) is a nonempty compact and connected subset of C (Φ) ∩ {x ∈ M | Φ (x) = c}. (b) Suppose Φ : M → R has isolated critical points in any level set {x ∈ M | Φ (x) = c}, c ∈ R. Then Lω (x), x ∈ M , consists of a single critical point. Therefore every solution of the gradient flow (12.1) converges for t → +∞ to a critical point of Φ. In particular, the convergence of a gradient flow to a set of equilibria rather than to single equilibrium points occurs only in nongeneric situations. Let Φ : M → R be a smooth function on a manifold M and let C (Φ) ⊂ M denote the set of critical points of Φ. We say Φ is a Morse-Bott function, provided the following three conditions are satisfied. (a) Φ : M → R has compact sublevel sets. k (b) C (Φ) = j=1 Nj with Nj disjoint, closed and connected submanifolds of M such that Φ is constant on Nj , j = 1, . . . , k. (c) ker (HΦ (x)) = Tx Nj , for all x ∈ Nj , j = 1, . . . , k. The original definition of a Morse-Bott function also includes a global topological condition on the negative eigenspace bundle defined by the Hessian. This further condition is omitted here as it is not relevant to the subsequent result. Condition (b) implies that the tangent space Tx Nj is always contained in the kernel of the Hessian HΦ (x) at each x ∈ Nj . Condition (c) then asserts that the Hessian of Φ is full rank in the directions normal to Nj at x. Proposition 12.3 Let Φ : M → R be a Morse-Bott function on a Riemannian manifold M . Then the ω-limit set Lω (x), x ∈ M , for the gradient flow (12.1) is a single critical point of Φ. Every solution of (12.1) converges as t → +∞ to an equilibrium point.
References Adamyan, Arov and Krein, M. G. (1971). Analytic properties of Schmidt pairs for a Hankel operator and the generalized Schur-Takagi problem, Math. USSR Sbornik 15: 31–73. Aiyer, S. V. B., Niranjan, M. and Fallside, F. (1990). A theoretical investigation into the performance of the Hopfield model, IEEE Trans. Neural Networks 1: 204–215. Akin, E. (1979). The geometry of population genetics, number 31 in Lecture Notes in Biomathematics, Springer-Verlag, Berlin, London, New York. Ammar, G. S. and Martin, C. F. (1986). The geometry of matrix eigenvalue methods, Acta Applicandae Math. 5: 239–278. Anderson, B. D. O. and Bitmead, R. R. (1977). The matrix Cauchy index: Properties and applications, SIAM J. of Applied Math. 33: 655–672. Anderson, B. D. O. and Moore, J. B. (1990). Optimal Control: Linear Quadratic Methods, Prentice-Hall, Englewood Cliffs, N.J. Anderson, T. W. and Olkin, I. (1978). An extremal problem for positive definite matrices, Linear and Multilinear Algebra 6: 257–262. Andronov, A. A. and Pontryagin, L. (1937). Syst`emes grossiers, C.R. (Doklady) Acad. Sci. U.S.S.R. 14: 247–251. Andronov, A. A., Vitt, A. A. and Khaikin, S. E. (1987). Theory of Oscillators, Dover Publications, New York.
368
References
Arnold, V. I. (1989). Mathematical Methods of Classical Mechanics, number 60 in Graduate Texts in Mathematics, second edn, Springer-Verlag, Berlin, London, New York. Arnold, V. I. (1990). Analysis et cetera, in P. Rabinowich and E. Zehnder (eds), Dynamics of intersections, Academic Press, New York. Atiyah, M. F. (1982). Convexity and commuting Hamiltonians, Bull. London Math. Soc. 16: 1–15. Atiyah, M. F. (1983). Angular momentum, convex polyhedra and algebraic geometry, Proc. Edinburgh Math. Soc. 26: 121–138. Atiyah, M. F. and Bott, R. (1982). On the Yang-Mills equations over Riemann surfaces, Phil. Trans. Roy. Soc. London A. 308: 523–615. Auchmuty, G. (1991). Globally and rapidly convergent algorithms for symmetric eigen problems, SIAM J. Matrix Anal. and Appl. 12: 690–706. Azad, H. and Loeb, J. J. (1990). On a theorem of Kempf and Ness, Indiana Univ. Math. J. 39: 61–65. Baldi, P. and Hornik, K. (1989). Neural networks and principal component analysis: learning from examples without local minima, Neural Networks 2: 53–58. Barnett, S. (1971). Matrices in Control Theory, Van Nostrand Reinhold Company Inc., New York. Batterson, S. and Smilie, J. (1989). The dynamics of Rayleigh quotient iteration, SIAM Journal of Numerical Analysis 26: 624–636. Bayer, D. A. and Lagarias, J. C. (1989). The nonlinear geometry of linear programming I, II, Trans. Amer. Math. Soc. 314: 499–580. Beattie, C. and Fox, D. W. (1989). Localization criteria and containment for Rayleigh quotient iteration, SIAM J. Matrix Anal. and Appl. 10: 80–93. Bellman, R. E. (1970). Introduction to Matrices, second edn, McGraw-Hill, New York. Bhatia, R. (1987). Perturbation bounds for matrix eigenvalues, number 162 in Pitman Research Notes in Math., Longman Scientific and Technical, Harlow, Essex.
References
369
Bloch, A. M. (1985a). A completely integrable Hamiltonian system associated with line fitting in complex vector spaces, Bull. AMS 12: 250–254. Bloch, A. M. (1985b). Estimation, principal components and Hamiltonian systems, Systems and Control Letters 6: 1–15. Bloch, A. M. (1987). An infinite-dimensional Hamiltonian system on projective Hilbert space, Trans. Amer. Math. Soc. 302: 787–796. Bloch, A. M. (1990a). The K¨ahler structure of the total least squares problem, Brockett’s steepest descent equations, and constrained flows, in M. A. Kaashoek, J. H. van Schuppen and A. C. M. Ran (eds), Realization and Modelling in Systems Theory, Birkh¨ auser Verlag, Basel. Bloch, A. M. (1990b). Steepest descent, linear programming and Hamiltonian flows, in Lagarias and Todd (1990), pp. 77–88. Bloch, A. M., Brockett, R. W., Kodama, Y. and Ratiu, T. (1989). Spectral equations for the long wave limit of the Toda lattice equations, in J. Harnad and J. E. Marsden (eds), Proc. CRM Workshop on Hamiltonian systems, Transformation groups and Spectral Transform Methods, Universit´e de Montr´eal, Montr´eal, Canada. Bloch, A. M., Brockett, R. W. and Ratiu, T. (1990). A new formulation of the generalized Toda lattice equations and their fixed point analysis via the moment map, Bull. AMS 23: 447–456. Bloch, A. M., Brockett, R. W. and Ratiu, T. (1992). Completely integrable gradient flows, Comm. Math Phys. 23: 47–456. Bloch, A. M. and Byrnes, C. I. (1986). An infinite-dimensional variational problem arising in estimation theory, in M. Fliess and H. Hazewinkel (eds), Algebraic and Geometric Methods in Nonlinear Control Theory, D. Reidel, Dordrecht, Boston, Lancaster, Tokyo. Bloch, A. M., Flaschka, H. and Ratiu, T. (1990). A convexity theorem for isospectral sets of Jacobi matrices in a compact Lie algebra, Duke Math. J. 61: 41–65. Bott, R. (1954). Nondegenerate critical manifolds, Annals of Mathematics 60: 248–261. Bott, R. (1982). Lectures on Morse theory, old and new, Bull. AMS 7: 331– 358.
370
References
Br¨ ocker, T. and tom Dieck, T. (1985). Representations of Compact Lie Groups, Springer-Verlag, Berlin, London, New York. Brockett, R. W. (1988). Dynamical systems that sort lists and solve linear programming problems, Proc. IEEE Conf. Decision and Control, Austin, TX, pp. 779–803. See also (Brockett, 1991b). Brockett, R. W. (1989a). Least squares matching problems, Linear Algebra Appl. 122/123/124: 701–777. Brockett, R. W. (1989b). Smooth dynamical systems which realize arithmetical and logical operations, Three Decades of Mathematical System Theory, number 135 in Lecture Notes in Control and Information Sciences, Springer-Verlag, Berlin, London, New York, pp. 19–30. Brockett, R. W. (1991a). Dynamical systems that learn subspaces, in A. C. Antoulas (ed.), Mathematical System Theory—The Influence of Kalman, Springer-Verlag, Berlin, London, New York, pp. 579–592. Brockett, R. W. (1991b). Dynamical systems that sort lists, diagonalize matrices and solve linear programming problems, Linear Algebra Appl. 146: 79–91. Brockett, R. W. (1993). Differential geometry and the design of gradient algorithms, Proc. of Symp. Pure Math. 54: 69–91. Brockett, R. W. and Bloch, A. M. (1989). Sorting with the dispersionless limit of the Toda lattice, in J. Harnad and J. E. Marsden (eds), Hamiltonian Systems, Transformation Groups and Spectral Transform Methods, Universit´e de Montr´eal, Montr´eal, Canada, pp. 103–112. Brockett, R. W. and Byrnes, C. I. (1981). Multivariable Nyquist criteria, root loci and pole placement: A geometric viewpoint, IEEE Trans. Automatic Control AC-26: 271–284. Brockett, R. W. and Faybusovich, L. E. (1991). Toda flows, inverse spectral transform and realization theory, Systems and Control Letters 16: 779– 803. Brockett, R. W. and Krishnaprasad, P. S. (1980). A scaling theory for linear systems, IEEE Trans. Automatic Control AC-25: 197–207. Brockett, R. W. and Skoog, R. A. (1971). A new perturbation theory for the synthesis of nonlinear networks, SIAM-AMS Proc. Volume III: Mathematical Aspects of Electrical Network Analysis, Providence, R. I., pp. 17–33.
References
371
Brockett, R. W. and Wong, W. S. (1991). A gradient flow for the assignment problem, in G. Conte, A. M. Perdon and B. Wyman (eds), New Trends in System Theory, Progress in System and Control Theory, Birkh¨auser Verlag, Basel, pp. 170–177. Bucy, R. S. (1975). Structural stability for the Riccati equation, SIAM J. Control Optim. 13: 749–753. Byrnes, C. I. (1978). On certain families of rational functions arising in dynamics, Proc. IEEE pp. 1002–1006. Byrnes, C. I. (1989). Pole assignment by output feedback, number 135 in Lecture Notes in Control and Information Sciences, Springer-Verlag, Berlin, London, New York, pp. 31–78. Byrnes, C. I. and Duncan, T. W. (1989). On certain topological invariants arising in system theory, in P. J. Hilton and G. S. Young (eds), New Directions in Applied Mathematics, Springer-Verlag, Berlin, London, New York, pp. 29–72. Byrnes, C. I. and Willems, J. C. (1986). Least-squares estimation, linear programming and momentum: A geometric parametrization of local minima, IMA J. Math. Control and Inf. 3: 103–118. Chu, M. T. (1984a). The generalized Toda flow, the QR algorithm and the center manifold theory, SIAM J. Alg. Disc. Meth. 5: 187–201. Chu, M. T. (1984b). On the global convergence of the Toda lattice for real normal matrices and its application to the eigenvalue problems, SIAM J. Math. Anal. 15: 98–104. Chu, M. T. (1986). A differential equation approach to the singular value decomposition of bidiagonal matrices, Linear Algebra Appl. 80: 71–80. Chu, M. T. (1988). On the continuous realization of iterative processes, SIAM Review 30: 375–387. Chu, M. T. (1992a). Matrix differential equations: A continuous realization process for linear algebra problems, Nonlinear Anal., Theory, Methods and Applications 18: 1125–1146. Chu, M. T. (1992b). Numerical methods for inverse singular value problems, SIAM Journal of Numerical Analysis 29: 885–903. Chu, M. T. and Driessel, K. R. (1990). The projected gradient method for least squares matrix approximations with spectral constraints, SIAM Journal of Numerical Analysis 27: 1050–1060.
372
References
Chua, L. O. and Lin, G. N. (1984). Nonlinear programming without computation, IEEE Trans. Circuits and Systems 31: 182–188. Colonius, F. and Kliemann, W. (1990). Linear control semigroup acting on projective space, Technical Report 224, Universit¨ at Augsburg. Courant, R. (1922). Zur Theorie der kleinen Schwingungen, Zeits. f. angew. Math. Mech. 2: 278–285. Crouch, P. E. and Grossman, R. (1991). Numerical integration of ordinary differential equations on manifolds, J. of Nonlinear Science . Crouch, P. E., Grossman, R. and Yan, Y. (1992a). On the numerical integration of the dynamic attitude equations, Proc. IEEE Conf. Decision and Control, Tuscon, Arizona, pp. 1497–1501. Crouch, P. E., Grossman, R. and Yan, Y. (1992b). A third order RungeKutta algorithm on a manifold. Preprint. Dantzig, G. B. (1963). Linear Programming and Extensions, Princeton University Press, Princeton. Davis, M. W. (1987). Some aspherical manifolds, Duke Math. J. 5: 105– 139. de la Rocque Palis, G. (1978). Linearly induced vector fields and R2 actions on spheres, J. Diff. Geom. 13: 163–191. de Mari, F., Procesi, C. and Shayman, M. A. (1992). Hessenberg varieties, Trans. Amer. Math. Soc. 332: 529–534. de Mari, F. and Shayman, M. A. (1988). Generalized Eulerian numbers and the topology of the Hessenberg variety of a matrix, Acta Applicandae Math. 12: 213–235. de Moor, B. L. R. and David, J. (1992). Total linear least squares and the algebraic Riccati equation, Systems and Control Letters 5: 329–337. de Moor, B. L. R. and Golub, G. H. (1989). The restricted singular value decomposition: Properties and applications, Technical Report NA-8903, Department of Computer Science, Stanford University, Stanford. Deift, P., Li, L. C. and Tomei, C. (1985). Toda flows with infinitely many variables, J. Functional Analysis 64: 358–402.
References
373
Deift, P., Nanda, T. and Tomei, C. (1983). Ordinary differential equations for the symmetric eigenvalue problem, SIAM Journal of Numerical Analysis 20: 1–22. Demmel, J. W. (1987). On condition numbers and the distance to the nearest ill-posed problem, Num. Math 51: 251–289. Dempster, A. P. (1969). Elements of Continuous Multivariable Analysis, Addison-Wesley, Reading, MA, USA. Dieudonn´e, J. and Carrell, J. B. (1971). Invariant Theory, Old and New, Academic Press, New York. Doolin, B. F. and Martin, C. F. (1990). Introduction to Differential Geometry for Engineers, Marcel Dekker Inc., New York. Driessel, K. R. (1986). On isospectral gradient flow-solving matrix eigenproblems using differential equations, in J. R. Cannon and U. Hornung (eds), Inverse Problems, number 77 in ISNM, Birkh¨ auser Verlag, Basel, pp. 69–91. Driessel, K. R. (1987a). On finding the eigenvalues and eigenvectors of a matrix by means of an isospectral gradient flow, Technical Report 541, Department of Mathematical Sciences, Clemson University. Driessel, K. R. (1987b). On isospectral surfaces in the space of symmetric tridiagonal matrices, Technical Report 544, Department of Mathematical Sciences, Clemson University. Duistermaat, J. J., Kolk, J. A. C. and Varadarajan, V. S. (1983). Functions, flow and oscillatory integrals on flag manifolds and conjugacy classes in real semisimple Lie groups, Compositio Math. 49: 309–398. Eckart, G. and Young, G. S. (1936). The approximation of one matrix by another of lower rank, Psychometrika 1: 221–218. Fan, K. (1949). On a theorem of Weyl concerning eigenvalues of linear transformations 1, Proc. Nat. Acad. Sci. USA 35: 652–655. Faybusovich, L. E. (1989). QR-type factorizations, the Yang-Baxter equations, and an eigenvalue problem of control theory, Linear Algebra Appl. 122/123/124: 943–971. Faybusovich, L. E. (1991a). Dynamical systems which solve optimization problems with linear constraints, IMA J. Math. Control and Inf. 8: 135–149.
374
References
Faybusovich, L. E. (1991b). Hamiltonian structure of dynamical systems which solve linear programming problems, Physica D 53: 217–232. Faybusovich, L. E. (1991c). Interior point methods and entropy, Proc. IEEE Conf. Decision and Control, Brighton, UK, pp. 2094–2095. Faybusovich, L. E. (1992a). Dynamical systems that solve linear programming problems, Proc. IEEE Conf. Decision and Control, Tuscon, Arizona, pp. 1626–1631. Faybusovich, L. E. (1992b). Toda flows and isospectral manifolds, Proc. Amer. Math. Soc. 115: 837–847. Fernando, K. V. M. and Nicholson, H. (1980). Karhunen-Loeve expansion with reference to singular value decomposition and separation of variables, Proc. IEE, Part D 127: 204–206. ¨ Fischer, E. (1905). Uber quadratische Formen mit reellen Koeffizienten, Monatsh. Math. u. Physik 16: 234–249. Flanders, H. (1975). An extremal problem on the space of positive definite matrices, Linear and Multilinear Algebra 3: 33–39. Flaschka, H. (1974). The Toda lattice, I, Phys. Rev. B 9: 1924–1925. Flaschka, H. (1975). Discrete and periodic illustrations of some aspects of the inverse methods, in J. Moser (ed.), Dynamical system, theory and applications, number 38 in Lecture Notes in Physics, Springer-Verlag, Berlin, London, New York. Fletcher, R. (1985). Semi-definite matrix constraints in optimization, SIAM J. Control Optim. 23: 493–513. Frankel, T. (1965). Critical submanifolds of the classical groups and Stiefel manifolds, in S. S. Cairns (ed.), Differential and Combinatorial Topology, Princeton University Press, Princeton. Gamst, J. (1975). Calculus on manifolds, Technical report, Department of Mathematics, University of Bremen. Unpublished lecture notes. Ge-Zhong and Marsden, J. E. (1988). Lie-Poisson Hamilton-Jacobi theory and Lie-Poisson integration, Phys. Lett. A-133: 134–139. Gelfand, I. M. and Serganova, V. V. (1987). Combinatorial geometries and torus strata on homogeneous compact manifolds, Russian Math. Surveys 42: 133–168.
References
375
Gevers, M. R. and Li, G. (1993). Parametrizations in Control, Estimation and Filtering Problems: Accuracy Aspects, Communications and Control Engineering Series, Springer-Verlag, Berlin, London, New York. Ghosh, B. K. (1988). An approach to simultaneous system design. Part II: Nonswitching gain and dynamic feedback compensation by algebraic geometric methods, SIAM J. Control Optim. 26: 919–963. Gibson, C. G. (1979). Singular points of smooth mappings, Research Notes in Mathematics, Pitman, Boston. Glover, K. (1984). All optimal Hankel-norm approximations of linear multivariable systems and their L∞ -error bounds, International Journal of Control 39: 1115–1193. Godbout, L. F. and Jordan, D. (1980). Gradient matrices for output feedback systems, International Journal of Control 32: 411–433. Gohberg, I. C. and Krein, M. G. (1969). Introduction to the Theory of Linear Nonselfadjoint operators, Vol. 18 of Transl. Math. Monographs, American Mathematical Society, Providence, RI. Golub, G. H., Hoffman, A. and Stewart, G. W. (1987). A generalization of the Eckart-Young-Mirsky matrix approximation theorem, Linear Algebra Appl. 88/89: 317–327. Golub, G. H. and Kahan, W. (1965). Calculating the singular values and pseudo-inverse of a matrix, SIAM Journal of Numerical Analysis Series B2: 205–224. Golub, G. H. and Reinsch, C. (1970). Singular value decomposition and least squares solutions, Num. Math 14: 403–420. Golub, G. H. and Van Loan, C. F. (1980). An analysis of the total least squares problem, SIAM Journal of Numerical Analysis 17: 883–843. Golub, G. H. and Van Loan, C. F. (1989). Matrix Computations, second edn, Johns Hopkins Press, Baltimore. Gray, W. S. and Verriest, E. I. (1987). Optimality properties of balanced realizations: minimum sensitivity, Proc. IEEE Conf. Decision and Control, Los Angeles, pp. 124–128. Gray, W. S. and Verriest, E. I. (1989). On the sensitivity of generalized state-space systems, Proc. IEEE Conf. Decision and Control, Tampa, pp. 1337–1342.
376
References
Gr¨ otschel, M., Lov´ asz, L. and Schrijver, A. (1988). Geometric algorithms and combinatorial optimization, Springer-Verlag, Berlin, London, New York. Guillemin, V. and Sternberg, S. (1984). Symplectic Techniques in Physics, Cambridge University Press, Cambridge, U.K. Halmos, P. (1972). Positive approximants of operators, Indiana Univ. Math. J. 21: 951–960. Hangan, T. (1968). A Morse function on Grassmann manifolds, J. Diff. Geom. 2: 363–367. Hansen, P. C. (1990). Truncated singular value decomposition solutions to discrete ill-posed problems with ill-determined numerical rank, SIAM J. Sci. Stat. Comp. 11: 503–518. Hardy, G. H., Littlewood, J. E. and Polya, G. (1952). Inequalities. Helmke, U. (1991). Isospectral flows on symmetric matrices and the Riccati equation, Systems and Control Letters 16: 159–166. Helmke, U. (1992). A several complex variables approach to sensitivity analysis and structured singular values, J. Math. Systems, Estimation and Control 2: 1–13. Helmke, U. (1993a). Balanced realizations for linear systems: a variational approach, SIAM J. Control Optim. 31: 1–15. Helmke, U. (1993b). Isospectral flows and linear programming, J. Australian Math. Soc. Series B, 34: 495–510. Helmke, U. and Moore, J. B. (1992). Singular value decomposition via gradient and self equivalent flows, Linear Algebra Appl. 69: 223–248. Helmke, U. and Moore, J. B. (1993). L2 -sensitivity minimization of linear system representations via gradient flows, J. Math. Systems, Estimation and Control 5(1): 79–98. Helmke, U., Prechtel, M. and Shayman, M. A. (1993). Riccati-like flows and matrix approximations, Kybernetik 29: 563–582. Helmke, U. and Shayman, M. A. (1995). Critical points of matrix least squares distance functions, Linear Algebra Appl. 215: 1–19. Henniart, G. (1983). Les inegalities de Morse, Seminaire Bourbaki 617: 1– 17.
References
377
Hermann, R. (1962). Geometric aspects of potential theory in the bounded symmetric domains, Math. Ann. 148: 349–366. Hermann, R. (1963). Geometric aspects of potential theory in the symmetric bounded domains II, Math. Ann. 151: 143–149. Hermann, R. (1964). Geometric aspects of potential theory in symmetric spaces III, Math. Ann. 153: 384–394. Hermann, R. (1979). Cartanian geometry, nonlinear waves, and control theory, Part A, Vol. 20 of Interdisciplinary Mathematics, Math. Sci. Press, Brookline MA 02146. Hermann, R. and Martin, C. F. (1977). Applications of algebraic geometry to system theory, Part I, IEEE Trans. Automatic Control AC-22: 19– 25. Hermann, R. and Martin, C. F. (1982). Lie and Morse theory of periodic orbits of vector fields and matrix Riccati equations I: General Lietheoretic methods, Math. Systems Theory 15: 277–284. Herzel, S., Recchioni, M. C. and Zirilli, F. (1991). A quadratically convergent method for linear programming, Linear Algebra Appl. 151: 255– 290. Higham, N. J. (1986). Computing the polar decomposition—with applications, SIAM J. Sci. Stat. Comp. 7: 1160–1174. Higham, N. J. (1988). Computing a nearest symmetric positive semidefinite matrix, Linear Algebra Appl. 103: 103–118. Hirsch, M. W. (1976). Differential Topology, number 33 in Graduate Text in Mathematics, Springer-Verlag, Berlin, London, New York. Hirsch, M. W., Pugh, C. C. and Shub, M. (1977). Invariant Manifolds, number 583 in Lecture Notes in Mathematics, Springer-Verlag, Berlin, London, New York. Hirsch, M. W. and Smale, S. (1974). Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, New York. Hitz, K. L. and Anderson, B. D. O. (1972). Iterative method of computing the limiting solution of the matrix Riccati differential equation, Proc. IEEE 119: 1402–1406. Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities, Proc. Nat. Acad. Sci. USA 79: 2554.
378
References
Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons, Proc. Nat. Acad. Sci. USA 81: 3088–3092. Hopfield, J. J. and Tank, D. W. (1985). Neural computation of decisions in optimization problems, Biological Cybernetics 52: 1–25. Horn, R. A. (1953). Doubly stochastic matrices and the diagonal of a rotation matrix, Amer. J. Math. 76: 620–630. Horn, R. A. and Johnson, C. R. (1985). Matrix Analysis, Cambridge University Press, Cambridge, U.K. Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components, J. Educ. Psych. 24: 417–441, 498–520. Hotelling, H. (1935). Simplified calculation of principal components, Psychometrika 1: 27–35. Humphreys, J. E. (1972). Introduction to Lie Algebras and Representation Theory, Springer-Verlag, Berlin, London, New York. Hung, Y. S. and MacFarlane, A. G. J. (1982). Multivariable Feedback: A Quasi-Classical Approach, Vol. 40 of Lecture Notes in Control and Information Sciences, Springer-Verlag, Berlin, London, New York. Hwang, S. Y. (1977). Minimum uncorrelated unit noise in state space digital filtering, IEEE Trans. Acoust., Speech, and Signal Process. ASSP25: 273–281. Irwin, M. C. (1980). Smooth Dynamical Systems, Academic Press, New York. Isidori, A. (1985). Nonlinear Control Systems: An Introduction, SpringerVerlag, Berlin, London, New York. Jiang, D. and Moore, J. B. (1996). Least squares pole assignment by memory-less output feedback. Submitted. Jonckheere, E. and Silverman, L. M. (1983). A new set of invariants for linear systems: Applications to reduced order compensator design, IEEE Trans. Automatic Control 28: 953–964. Kailath, T. (1980). Linear Systems, Prentice-Hall, Englewood Cliffs, N.J. Karmarkar, N. (1984). A new polynomial time algorithm for linear programming, Combinatorica 4: 373–395.
References
379
Karmarkar, N. (1990). Riemannian geometry underlying interior point methods for linear programming, in Lagarias and Todd (1990), pp. 51– 76. Kempf, G. and Ness, L. (1979). The length of vectors in representation spaces, in K. Lonsted (ed.), Algebraic Geometry, number 732 in Lecture Notes in Mathematics, Springer-Verlag, Berlin, London, New York, pp. 233–244. Khachian, L. G. (1979). A polynomial algorithm in linear programming, Soviet Math. Dokl. 201: 191–194. Khachian, L. G. (n.d.). A polynomial algorithm in linear programming, Dokladi Akad. Nauk SSSR 244S: 1093–1096. Translated in (Khachian, 1979). Kimura, H. (1975). Pole assignment by gain output feedback, IEEE Trans. Automatic Control AC-20: 509–516. Klema, V. C. and Laub, A. J. (1980). The singular value decomposition: Its computation and some applications, IEEE Trans. Automatic Control AC-25: 164–176. Knuth, D. E. (1973). Sorting and Searching, Vol. 3 of The art of computer programming, Addison-Wesley, Reading, MA, USA. Kostant, B. (1973). On convexity, the Weyl group and the Iwasawa decomposition, Ann. Sci. Ecole Norm. Sup. 6: 413–455. Kostant, B. (1979). The solution to a generalized Toda lattice and representation theory, Adv. Math 34: 195–338. Kraft, H. (1984). Geometrische Methoden in der Invariantentheorie, number D1 in Aspects of Mathematics, Vieweg Verlag, Braunschweig. Krantz, S. G. (1982). Function Theory of Several Complex Variables, John Wiley & Sons, New York, London, Sydney. Krishnaprasad, P. S. (1979). Symplectic mechanics and rational functions, Richerche Automat. 10: 107–135. Kung, S. Y. and Lin, D. W. (1981). Optimal Hankel-norm model reductions: multivariable systems, IEEE Trans. Automatic Control AC26: 832–852.
380
References
Lagarias, J. C. (1991). Monotonicity properties of the Toda flow, the QR flow and subspace iteration, SIAM J. Matrix Anal. and Appl. 12: 449– 462. Lagarias, J. C. and Todd, M. J. (eds) (1990). Mathematical Development Arising from Linear Programming, Vol. 114 of AMS Contemp. Math., American Mathematical Society, Providence, RI. Laub, A. J., Heath, M. T., Paige, C. C. and Ward, R. C. (1987). Computation of system balancing transformations and other applicaitons of simultaneous diagonalization algorithms, IEEE Trans. Automatic Control AC-32: 115–121. Lawson, C. L. and Hanson, R. J. (1974). Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, N.J. Li, G., Anderson, B. D. O. and Gevers, M. R. (1992). Optimal FWL design of state-space digital systems with weighted sensitivity minimization and sparseness consideration, IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications 39: 365–377. Luenberger, D. G. (1969). Optimization by Vector Space Methods, John Wiley & Sons, New York, London, Sydney. Luenberger, D. G. (1973). Introduction to linear and nonlinear programming, Addison-Wesley, Reading, MA, USA. Madievski, A. G., Anderson, B. D. O. and Gevers, M. R. (1993). Optimum FWL design of sampled data controllers, Automatica 31: 367–379. Mahony, R. E. and Helmke, U. (1995). System assignment and pole placement for symmetric realizations, J. Math. Systems, Estimation and Control 5(2): 267–272. Mahony, R. E., Helmke, U. and Moore, J. B. (1996). Gradient algorithms for principal component analysis, J. Australian Math. Soc. Series B 5(5): 1–21. Marshall, A. W. and Olkin, I. (1979). Inequalities: Theory of Majorization and its Applications, Academic Press, New York. Martin, C. F. (1981). Finite escape time for Riccati differential equations, Systems and Control Letters 1: 127–131. Meggido, N. and Shub, M. (1989). Boundary behaviour of interior-point algorithms in linear programming, Methods of Operations Research 14: 97–146.
References
381
Middleton, R. H. and Goodwin, G. C. (1990). Digital Control and Estimation: A Unified Approach, Prentice-Hall, Englewood Cliffs, N.J. Milnor, J. (1963). Morse Theory, number 51 in Ann. of Math. Studies, Princeton University Press, Princeton. Mirsky, L. (1960). Symmetric gauge functions and unitarily invariant norms, Quarterly J. of Math. Oxford 11: 50–59. Moonen, M., van Dooren, P. and Vandewalle, J. (1990). SVD updating for tracking slowly time-varying systems. a parallel implementation, in M. A. Kaashoek, J. H. van Schuppen and A. C. M. Ran (eds), Signal Processing, Scattering and Operator Theory, and Numerical Methods, Birkh¨auser Verlag, Basel, pp. 487–494. Moore, B. C. (1981). Principal component analysis in linear systems: controllability, observability, and model reduction, IEEE Trans. Automatic Control AC-26: 17–32. Moore, J. B., Mahony, R. E. and Helmke, U. (1994). Numerical gradient algorithms for eigenvalue and singular value decomposition, SIAM J. Matrix Anal. and Appl. 2: 881–902. Moser, J. (1975). Finitely many mass points on the line under the influence of an exponential potential—an integrable system, in J. Moser (ed.), Dynamic Systems Theory and Applications, Springer-Verlag, Berlin, London, New York, pp. 467–497. Moser, J. and Veselov, A. P. (1991). Discrete versions of some classical integrable systems and factorization of matrix polynomials, Comm. Math Phys. 139: 217–243. Mullis, C. T. and Roberts, R. A. (1976). Synthesis of minimum roundoff noise fixed point digital filters, IEEE Trans. Circuits and Systems CAS-23: 551–562. Mumford, D., Fogarty, J. and Kirwan, F. (1994). Geometric Invariant Theory, number 34 in Ergebnisse der Mathematik und ihrer Grenzgebiete, third edn, Springer-Verlag, Berlin, London, New York. Munkres, J. R. (1975). Topology—A First Course, Prentice-Hall, Englewood Cliffs, N.J. Nanda, T. (1985). Differential equations and the QR algorithm, SIAM Journal of Numerical Analysis 22: 310–321.
382
References
Ober, R. (1987). Balanced realizations: Canonical form, parametrization, model reduction, International Journal of Control 46: 643–670. Oja, E. (1982). A simplified neuron model as a principal component analyzer, J. Math. Biology 15: 267–273. Oja, E. (1990). Neural networks, principal components and subspaces, Int. J. Neural Syst. 1(1): 61–68. Oppenheim, A. V. and Schafer, R. W. (1989). Discrete-time signal processing, Prentice-Hall, Englewood Cliffs, N.J. Palais, R. S. (1962). Morse theory on Hilbert manifolds, Topology 2: 299– 340. Palis, J. and de Melo, W. (1982). Geometric Theory of Dynamical Systems, Springer-Verlag, Berlin, London, New York. Parlett, B. N. (1978). 20: 443–456.
Progress in numerical analysis, SIAM Review
Parlett, B. N. and Poole, W. G. (1973). A geometric theory for the QR, LU and power iterations, SIAM Journal of Numerical Analysis 10: 389– 412. Paul, S., H¨ uper, K. and Nossek, J. A. (1992). A class of nonlinear loss¨ less dynamical systems, Archiv f¨ ur Elektronik und Ubertragungstechnik 46: 219–227. Pearson, K. (1901). On lines and planes of closest fit to points in space, Phil. Mag. pp. 559–572. Peixoto, M. M. (1962). Structural stability on two-dimensional manifolds, Topology 1: 101–120. Pernebo, L. and Silverman, L. M. (1982). Model reduction via balanced state space representations, IEEE Trans. Automatic Control 27: 282– 287. Peterson, C. and Soederberg, B. (1989). A new method for mapping optimization problems onto neural networks, Int. J. Neural Syst. 1: 3–22. Pressley, A. N. (1982). The energy flow on the loop space of a compact Lie group, J. London Math. Soc. 26: 557–566. Pressley, A. N. and Segal, G. (1986). Loop Groups, Oxford Mathematical Monographs, Oxford.
References
383
Pyne, I. B. (1956). Linear programming on an analogue computer,, Trans. AIEE 75: 139–143. Part I. Reid, W. T. (1972). Riccati Differential Equations, Academic Press, New York. Roberts, R. A. and Mullis, C. T. (1987). Digital Signal Processing, Addison-Wesley, Reading, MA, USA. Rosenthal, J. (1992). New results in pole assignment by real output feedback, SIAM J. Control Optim. 30: 203–211. Rutishauser, H. (1954). Ein Infinitesimales Analogon zum QuotientenDifferenzen-Algorithmus, Arch. Math. (Basel) 5: 132–137. Rutishauser, H. (1958). Solution of eigenvalue problems with the LR-transformation, Nat. Bureau of Standards Applied Math. Series 49: 47–81. Safonov, M. G. and Chiang, R. Y. (1989). A Schur method for balancedtruncation model reduction, IEEE Trans. Automatic Control pp. 729– 733. Schneider, C. R. (1973). Global aspects of the matrix Riccati equation, Math. Systems Theory 1: 281–286. ¨ Schur, I. (1923). Uber eine Klasse von Mittelbildungen mit Anwendungen auf die Determinanten Theorie, Sitzungsber. der Berliner Math. Gesellschaft. 22: 9–20. Schuster, P., Sigmund, K. and Wolff, R. (1978). Dynamical systems under constant organisation I: Topological analysis of a family of nonlinear differential equations, Bull. Math. Biophys. 40: 743–769. Shafer, D. (1980). Gradient vectorfields near degenerate singularities, in Z. Nitecki and C. Robinson (eds), Global Theory of Dynamical Systems, number 819 in Lecture Notes in Mathematics, Springer-Verlag, Berlin, London, New York, pp. 410–417. Shayman, M. A. (1982). Morse functions for the classical groups and Grassmann manifolds. Unpublished manuscript. Shayman, M. A. (1986). Phase portrait of the matrix Riccati equation, SIAM J. Control Optim. 24: 1–65.
384
References
Shub, M. and Vasquez, A. T. (1987). Some linearly induced Morse-Smale systems, the QR algorithm and the Toda lattice, in L. Keen (ed.), The Legacy of Sonya Kovaleskaya, Vol. 64 of AMS Contemp. Math., American Mathematical Society, Providence, RI, pp. 181–194. Sirat, J. A. (1991). A fast neural algorithm for principal component analysis and singular value decompositions, Int. J. Neural Syst. 2: 147–155. Slodowy, P. (1989). Zur Geometrie der Bahnen reell reduktiver Gruppen, in H. Kraft, P. Slodowy and T. Springer (eds), Algebraic Transformation Groups and Invariant Theory, Birkh¨ auser Verlag, Basel, pp. 133–144. Smale, S. (1960). Morse inequalities for a dynamical system, Bull. AMS 66: 43–49. Smale, S. (1961). On gradient dynamical systems, Annals of Mathematics 74: 199–206. Smale, S. (1976). On the differential equations of species in competition, J. Math. Biology 3: 5–7. Smith, S. T. (1991). Dynamical systems that perform the singular value decomposition, Systems and Control Letters 16: 319–328. Smith, S. T. (1993). Geometric optimization methods for adaptive filtering, PhD thesis, Harvard University. Sonnevend, G. and Stoer, J. (1990). Global ellipsoidal approximations and homotopy methods for solving convex programs, Appl. Math. and Optimization 21: 139–165. Sonnevend, G., Stoer, J. and Zhao, G. (1990). On the complexity of following the central path of linear programs by linear extrapolation, Methods of Operations Research 62: 19–31. Sonnevend, G., Stoer, J. and Zhao, G. (1991). On the complexity of following the central path of linear programs by linear extrapolation II, Mathematical Programming (Series B) . Sontag, E. D. (1990a). Mathematical Control Systems: Deterministic Finite Dimensional Systems, Texts in Applied Mathematics, Springer-Verlag, Berlin, London, New York. Sontag, E. D. (1990b). Mathematical Control Theory, Springer-Verlag, Berlin, London, New York.
References
385
Symes, W. W. (1980a). Hamiltonian group actions and integrable systems, Physica 1D: 339–374. Symes, W. W. (1980b). Systems of the Toda type, inverse spectral problems, and representation theory, Inventiones Mathematicae 59: 13–51. Symes, W. W. (1982). The QR algorithm and scattering for the finite nonperiodic Toda lattice, Physica 4D: 275–280. Takeuchi, M. (1965). Cell decompositions and Morse equalities on certain symmetric spaces, J. Fac. of Sci. Univ. of Tokyo 12: 81–192. Tank, D. W. and Hopfield, J. J. (1985). Simple ‘neural’ optimization networks: an A/D converter, signal decision circuit and a linear programming circuit, IEEE Trans. Circuits and Systems CAS-33: 533–541. Thiele, L. (1986). On the sensitivity of linear state-space systems, IEEE Trans. Circuits and Systems CAS-33: 502–510. Thom, R. (1949). Sur une partition en cellules associee anne fonction sur une varite, C.R. Acad. Sci. Paris Series A–B 228: 973–975. Tomei, C. (1984). The topology of isospectral manifolds of tri-diagonal matrices, Duke Math. J. 51: 981–996. Van Loan, C. F. (1985). How near is a stable matrix to an unstable matrix?, in B. N. Datta (ed.), Linear Algebra and its Rˆ ole in Systems Theory, Vol. 47 of AMS Contemp. Math., American Mathematical Society, Providence, RI, pp. 465–478. Verriest, E. I. (1983). On generalized balanced realizations, IEEE Trans. Automatic Control AC-28: 833–844. Verriest, E. I. (1986). Model reduction via balancing, and connections with other methods, in U. B. Desai (ed.), Modelling and Approximation of Stochastic Processes, Kluwer Academic Publishers Group, Dordrecht, The Netherlands, pp. 123–154. Verriest, E. I. (1988). Minimum sensitivity implementation for multimode systems, Proc. IEEE Conf. Decision and Control, Austin, TX, pp. 2165–2170. Veselov, A. P. (1992). Growth and integrability in the dynamics of mappings, Comm. Math Phys. 145: 181–193. Vladimirov, V. S. (1966). Methods of the Theory of Functions of Many Complex Variables, MIT Press, Cambridge, MA.
386
References
von Neumann, J. (1937). Some matrix-inequalities and metrization of matric-spaces, Tomsk Univ. Rev. 1: 286–300. See also (von Neumann, 1962). von Neumann, J. (1962). Some matrix-inequalities and metrization of matric-spaces, in A. H. Taub (ed.), John von Neumann Collected Works, Vol. IV, Pergamon, New York, pp. 205–218. Wang, X. (1991). On output feedback via Grassmannians, SIAM J. Control Optim. 29: 926–935. Watkins, D. S. (1984). Isospectral flows, SIAM Review 26: 379–391. Watkins, D. S. and Elsner, L. (1989). Self-equivalent flows associated with the singular value decomposition, SIAM J. Matrix Anal. and Appl. 10: 244–258. Werner, J. (1992). schweig.
Numerische Mathematik 2, Vieweg Verlag, Braun-
Weyl, H. (1946). The Classical Groups, Princeton University Press, Princeton. Wilf, H. S. (1981). An algorithm-inspired proof of the spectral theorem in E n , Amer. Math. Monthly 88: 49–50. Willems, J. C. and Hesselink, W. H. (1978). Generic properties of the pole placement problem, Proc. IFAC World Congress, Helsinki. Williamson, D. (1986). A property of internally balanced and low noise structures, IEEE Trans. Automatic Control AC-31: 633–634. Williamson, D. (1991). Finite Word Length Digital Control Design and Implementation, Prentice-Hall, Englewood Cliffs, N.J. Williamson, D. and Skelton, R. E. (1989). Optimal q-Markov COVER for finite wordlength implementation, Math. Systems Theory 22: 253–273. Wimmer, K. (1988). Extremal problems for H¨ older norms of matrices and realizations of linear systems, SIAM J. Matrix Anal. and Appl. 9: 314– 322. Witten, E. (1982). Supersymmetry and Morse theory, J. Diff. Geom. 17: 661–692. Wonham, W. M. (1967). On pole assignment in multi-input controllable linear systems, IEEE Trans. Automatic Control AC-12: 660–665.
References
387
Wonham, W. M. (1985). Linear Multivariable Control, third edn, SpringerVerlag, Berlin, London, New York. Wu, W. T. (1965). 14: 1721–1728.
On critical sections of convex bodies, Sci. Sinica
Yan, W. Y., Helmke, U. and Moore, J. B. (1993). Global analysis of Oja’s flow for neural networks, IEEE Trans. Neural Networks 5(5): 674–683. Yan, W. Y. and Moore, J. B. (1992). On L2 -sensitivity minimization of linear state-space systems, IEEE Trans. Circuits and Systems 39: 641– 648. Yan, W. Y., Moore, J. B. and Helmke, U. (1993). Recursive algorithms for solving a class of nonlinear matrix equations with applications to certain sensitivity optimization problems, SIAM J. Control Optim. 32(6): 1559–1576. Youla, D. C. and Tissi, P. (1966). N-port synthesis via reactance extraction—Part I, IEEE Intern. Convention Record pp. 183–205. Yuille, A. L. (1990). Generalized deformable models, statistical physics and matching problems, Neural Computation 2: 1–24. Zeeman, E. C. (1980). Population dynamics from game theory, in Z. Nitecki and C. Robinson (eds), Global Theory of Dynamical Systems, number 819 in Lecture Notes in Mathematics, Springer-Verlag, Berlin, London, New York, pp. 471–497. Zha, H. (1989a). A numerical algorithm for computing the restricted singular value decomposition of matrix triplets, number 89-1 in Scientific Report, Konrad-Zuse Zentrum f¨ ur Informationstechnik, Berlin. Zha, H. (1989b). Restricted singular value decomposition of matrix triplets, number 89-2 in Scientific Report, Konrad-Zuse Zentrum f¨ ur Informationstechnik, Berlin.
Author Index 123, 125, 144, 160, 161, 259, 266, 369–371 Bucy, R. S., 42, 371 Byrnes, C. I., 78–80, 123, 159– 161, 222, 223, 369–371
Adamyan, 160, 367 Aiyer, S. V. B., 124, 367 Akin, E., 124, 367 Ammar, G. S., 7, 42, 367 Anderson, B. D. O., 41, 66, 222, 236, 266, 271, 291, 292, 309, 367, 377, 380 Anderson, T. W., 200, 367 Andronov, A. A., 41, 367 Arnold, V. I., 80, 199, 368 Arov, 160, 367 Atiyah, M. F., 78, 123, 368 Auchmuty, G., 19, 368 Azad, H., 166, 204, 207, 208, 368
Carrell, J. B., 165, 199, 373 Chiang, R. Y., 229, 266, 383 Chu, M. T., 1, 2, 31, 33, 37, 42, 43, 68, 78–81, 100, 371 Chua, L. O., 102, 124, 372 Colonius, F., 79, 372 Courant, R., 1, 372 Crouch, P. E., 80, 372
Baldi, P., 160, 199, 368 Barnett, S., 311, 368 Batterson, S., 19, 368 Bayer, D. A., 2, 102, 123, 368 Beattie, C., 19, 368 Bellman, R. E., 311, 368 Bhatia, R., 42, 99, 100, 368 Bitmead, R. R., 222, 367 Bloch, A. M., 2, 48, 59, 79, 80, 102, 123, 159, 369, 370 Bott, R., 41, 78, 368, 369 Br¨ ocker, T., 100, 370 Brockett, R. W., 1, 2, 43, 48, 53, 67–69, 78–80, 100, 103,
Dantzig, G. B., 123, 372 David, J., 159, 372 Davis, M. W., 78, 372 Deift, P., 2, 31, 42, 79, 372, 373 Demmel, J. W., 160, 373 Dempster, A. P., 99, 373 de la Rocque Palis, G., 25, 372 de Mari, F., 78, 372 de Melo, W., 41, 382 de Moor, B. L. R., 100, 159, 372 Dieudonn´e, J., 165, 199, 373 Doolin, B. F., 41, 373 Driessel, K. R., 43, 78, 79, 81, 100, 371, 373
390
Author Index
Duistermaat, J. J., 19, 51, 52, 61, 78, 373 Duncan, T. W., 222, 223, 371 Eckart, G., 135, 159, 373 Elsner, L., 31, 37, 42, 386 Fallside, F., 124, 367 Fan, K., 27, 373 Faybusovich, L. E., 2, 79, 79, 102, 111, 114, 123, 266, 370, 373, 374 Fernando, K. V. M., 99, 374 Fischer, E., 1, 374 Flanders, H., 173, 200, 374 Flaschka, H., 2, 59, 60, 79, 369, 374 Fletcher, R., 160, 374 Fogarty, J., 199, 381 Fox, D. W., 19, 368 Frankel, T., 78, 374
Hangan, T., 78, 376 Hansen, P. C., 99, 376 Hanson, R. J., 99, 380 Hardy, G. H., 57, 376 Heath, M. T., 229, 266, 380 Helmke, U., 26, 28, 69, 73, 74, 79–81, 100, 117, 119, 123, 126, 156, 159, 161, 200, 201, 209, 226, 258, 266, 269, 282, 291–293, 309, 376, 380, 381, 387 Henniart, G., 41, 376 Hermann, R., 2, 41, 78, 160, 161, 266, 377 Herzel, S., 114, 123, 377 Hesselink, W. H., 160, 161, 386 Higham, N. J., 132, 160, 377 Hirsch, M. W., 21, 124, 335, 363, 377 Hitz, K. L., 266, 291, 292, 377 Hoffman, A., 159, 375 Hopfield, J. J., 102, 124, 377, 378, 385 Horn, R. A., 123, 133, 378 Hornik, K., 160, 199, 368 Hotelling, H., 99, 378 Humphreys, J. E., 100, 378 Hung, Y. S., 99, 378 H¨ uper, K., 79, 382 Hwang, S. Y., 203, 226, 304, 309, 378
Gamst, J., 335, 374 Gauss, 1 Ge-Zhong, 80, 374 Gelfand, I. M., 78, 374 Gevers, M. R., 271, 309, 375, 380 Ghosh, B. K., 160, 161, 375 Gibson, C. G., 357, 375 Glover, K., 99, 375 Godbout, L. F., 161, 375 Gohberg, I. C., 99, 375 Golub, G. H., 5, 6, 14, 19, 31, 34, 36, 42, 79, 99, 100, 126, 159, 372, 375 Goodwin, G. C., 270, 309, 381 Gray, W. S., 218, 226, 375 Grossman, R., 80, 372 Gr¨ otschel, M., 123, 376 Guillemin, V., 199, 376
Jiang, D., 156, 378 Johnson, C. R., 133, 378 Jonckheere, E., 226, 378 Jordan, D., 161, 375
Halmos, P., 160, 376
Kahan, W., 36, 375
Irwin, M. C., 323, 378 Isidori, A., 323, 335, 378
Author Index
Kailath, T., 160, 199, 226, 323, 378 Karmarkar, N., 2, 107, 108, 123, 378, 379 Kempf, G., 1, 165, 199, 204, 379 Khachian, L. G., 2, 107, 123, 379 Khaikin, S. E., 41, 367 Kimura, H., 160, 379 Kirwan, F., 199, 381 Klema, V. C., 99, 379 Kliemann, W., 79, 372 Knuth, D. E., 123, 379 Kodama, Y., 79, 369 Kolk, J. A. C., 19, 51, 52, 61, 78, 373 Kostant, B., 2, 42, 59, 79, 123, 379 Kraft, H., 165, 169, 199, 214, 379 Krantz, S. G., 204, 226, 379 Krein, M. G., 99, 160, 367, 375 Krishnaprasad, P. S., 79, 266, 370, 379 Kung, S. Y., 160, 379 Lagarias, J. C., 2, 79, 102, 123, 368, 380 Laub, A. J., 99, 229, 266, 379, 380 Lawson, C. L., 99, 380 Li, G., 271, 309, 375, 380 Li, L. C., 79, 372 Lin, D. W., 160, 379 Lin, G. N., 102, 124, 372 Littlewood, J. E., 57, 376 Loeb, J. J., 166, 204, 207, 208, 368 Lov´ asz, L., 123, 376 Luenberger, D. G., 123, 380 MacFarlane, A. G. J., 99, 378 Madievski, A. G., 309, 380
391
Mahony, R. E., 26, 69, 73, 74, 80, 100, 117, 119, 156, 161, 380, 381 Marsden, J. E., 80, 374 Marshall, A. W., 42, 380 Martin, C. F., 7, 41, 42, 160, 161, 367, 373, 377, 380 Meggido, N., 123, 380 Middleton, R. H., 270, 309, 381 Milnor, J., 21, 41, 67, 78, 261, 278, 381 Mirsky, L., 159, 381 Moonen, M., 99, 381 Moore, B. C., 99, 203, 217, 219, 226, 230, 381 Moore, J. B., 26, 28, 41, 66, 69, 73, 74, 80, 81, 100, 117, 119, 156, 236, 266, 269, 282, 291–293, 295, 309, 367, 376, 378, 380, 381, 387 Moser, J., 59, 79, 80, 381 Mullis, C. T., 99, 203, 226, 270, 298, 299, 309, 381, 383 Mumford, D., 199, 381 Munkres, J. R., 335, 381 Nanda, T., 31, 42, 79, 373, 381 Ness, L., 1, 165, 199, 204, 379 Nicholson, H., 99, 374 Niranjan, M., 124, 367 Nossek, J. A., 79, 382 Ober, R., 226, 382 Oja, E., 23, 28, 99, 107, 382 Olkin, I., 42, 200, 367, 380 Oppenheim, A. V., 309, 382 Paige, C. C., 229, 266, 380 Palais, R. S., 41, 382 Palis, J., 41, 382 Parlett, B. N., 6, 7, 42, 160, 382
392
Author Index
Paul, S., 79, 382 Pearson, K., 159, 382 Peixoto, M. M., 41, 382 Pernebo, L., 203, 226, 266, 382 Peterson, C., 124, 382 Polya, G., 57, 376 Pontryagin, L., 41, 367 Poole, W. G., 6, 7, 42, 382 Prechtel, M., 126, 159, 376 Pressley, A. N., 78, 382 Procesi, C., 78, 372 Pugh, C. C., 363, 377 Pyne, I. B., 102, 124, 383 Ratiu, T., 2, 48, 79, 369 Recchioni, M. C., 114, 123, 377 Reid, W. T., 41, 383 Reinsch, C., 99, 375 Roberts, R. A., 99, 203, 226, 270, 298, 299, 309, 381, 383 Rosenthal, J., 160, 383 Rutishauser, H., 1, 31, 42, 383 Safonov, M. G., 229, 266, 383 Schafer, R. W., 309, 382 Schneider, C. R., 42, 383 Schrijver, A., 123, 376 Schur, I., 123, 383 Schuster, P., 108, 124, 383 Segal, G., 78, 382 Serganova, V. V., 78, 374 Shafer, D., 41, 383 Shayman, M. A., 42, 78, 126, 144, 159, 372, 376, 383 Shub, M., 79, 123, 363, 377, 380, 384 Sigmund, K., 108, 124, 383 Silverman, L. M., 203, 226, 266, 378, 382 Sirat, J. A., 99, 384 Skelton, R. E., 226, 386 Skoog, R. A., 67, 370
Slodowy, P., 166, 199, 384 Smale, S., 41, 124, 377, 384 Smilie, J., 19, 368 Smith, S. T., 80, 81, 100, 384 Soederberg, B., 124, 382 Sonnevend, G., 123, 384 Sontag, E. D., 160, 226, 323, 384 Sternberg, S., 199, 376 Stewart, G. W., 159, 375 Stoer, J., 123, 384 Symes, W. W., 2, 31, 42, 79, 385 Takeuchi, M., 78, 385 Tank, D. W., 102, 124, 378, 385 Thiele, L., 270, 271, 285, 309, 385 Thom, R., 41, 385 Tissi, P., 67, 387 Todd, M. J., 123, 380 tom Dieck, T., 100, 370 Tomei, C., 2, 31, 42, 60, 78, 79, 372, 373, 385 van Dooren, P., 99, 381 Vandewalle, J., 99, 381 Van Loan, C. F., 5, 6, 14, 19, 31, 34, 42, 79, 99, 126, 159, 160, 375, 385 Varadarajan, V. S., 19, 51, 52, 61, 78, 373 Vasquez, A. T., 79, 384 Verriest, E. I., 217, 218, 226, 258, 266, 375, 385 Veselov, A. P., 80, 381, 385 Vitt, A. A., 41, 367 Vladimirov, V. S., 204, 226, 385 von Neumann, J., 1, 99, 100, 386 Wang, X., 160, 161, 386 Ward, R. C., 229, 266, 380 Watkins, D. S., 31, 37, 42, 79, 386
Author Index
Werner, J., 123, 386 Weyl, H., 199, 386 Wilf, H. S., 79, 386 Willems, J. C., 78, 123, 159–161, 371, 386 Williamson, D., 226, 270, 309, 386 Wimmer, K., 200, 386 Witten, E., 41, 386 Wolff, R., 108, 124, 383 Wong, W. S., 43, 123, 371 Wonham, W. M., 160, 386, 387 Wu, W. T., 78, 387 Yan, W. Y., 28, 266, 269, 282, 291–293, 295, 309, 387 Yan, Y., 80, 372 Youla, D. C., 67, 387 Young, G. S., 135, 159, 373 Yuille, A. L., 124, 387 Zeeman, E. C., 108, 109, 124, 387 Zha, H., 100, 387 Zhao, G., 123, 384 Zirilli, F., 114, 123, 377
393
Subject Index 1-form, 359 2-norm, 318 adjoint orbit, 79 transformation, 354 Ado’s theorem, 351 affine constraint, 39 algebraic group action, 165 approximation and control, 125 approximations by symmetric matrices, 126 artificial neural network, 2, 102 asymptotically stable, 331 Azad-Loeb theorem, 207 Baker-Campbell-Hausdorff formula, 354 balanced matrix factorization, 163 realization, 230, 328 truncation, 203, 328 balancing transformation, 172, 180 balancing via gradient flows, 229 basic open set, 335 basins of attraction, 73 basis, 335 Betti number, 262
bijective, 321 bilinear bundle, 359 system, 224 binomial coefficient, 129 Birkhoff minimax theorem, 261 bisection method, 160 C ∞ -atlas, 342 Cartan decomposition, 100 Cauchy index, 67 Cauchy-Maslov index, 222 Cayley-Hamilton theorem, 314 cell decomposition, 41 center manifold, 364 center manifold theorem, 364 chain rule, 339 characteristic polynomial, 314 chart, 341 Cholesky decomposition, 317 factor, 316 closed graph theorem, 338 loop system, 146 orbit, 199 orbit lemma, 142, 214, 356 subset, 335 closure, 336
396
Subject Index
combinatorial assignment problem, 123 compact, 336 Lie group, 44, 351 Stiefel manifold, 346 sublevel set, 20, 173 compatibility condition, 16, 361 completely integrable gradient flow, 2 Hamiltonian system, 123 complex Grassmann manifold, 9, 345 homogeneous space, 207 Lie group, 352 projective space, 7 complexity, 80 computational considerations, 75 computer vision, 79 condition number, 206, 318 congruence group action, 127 connected components, 127, 336 constant Riemannian metric, 144 constant step-size selection, 74 constrained least squares, 38 continuation method, 285 continuous map, 337 continuous-time, 323 contour integration, 273 controllability, 66, 325 controllability Gramian, 202, 326 convergence of gradient flows, 19, 365 convex constraint set, 111 convex function, 204 coordinate chart, 9 cotangent bundle, 16, 358 Courant-Fischer minimax theorem, 14 covering, 336 critical point, 20, 340 value, 15, 340
dense, 336 detectability, 326 determinant, 312 diagonal balanced factorization, 163 realization, 230 balancing flow, 196 transformation, 182, 248 matrix, 312 diagonalizable, 316 diagonalization, 43 diagonally balanced realization, 328 balanced realization, 203 N -balanced, 217 diffeomorphism, 46, 339, 343 differential geometry, 1 digital filter design, 4 discrete integrable system, 80 Jacobi method, 79 -time, 323 balancing flow, 256 dynamical system, 6 discretization, 194 dominant eigenspace, 10 eigenvalue, 5 eigenvector, 5 double bracket equation, 4, 43, 48, 81, 101 flow, 118, 152 dual flow, 240 dual vector space, 321 dynamic Riccati equation, 66 dynamical system, 323 Eckart-Young-Mirsky 125, 159
theorem,
Subject Index
eigenspace, 314 eigenstructure, 147 eigenvalue, 314 assignment, 5, 147 inequality, 42 eigenvector, 314 Eising distance, 259 equilibrium point, 20, 330 equivalence class, 338 relation, 45, 338 essentially balanced, 302 Euclidean diagonal norm balanced, 259 inner product, 17 norm, 5, 259, 317 balanced, 259 balanced realization, 276 balancing, 220 controllability Gramian, 260 observability Gramian, 260 optimal realization, 258 topology, 199 Euler characteristic, 60 Euler iteration, 76 exponential, 314 convergence, 39, 72, 188 map, 354 rate of convergence, 18, 331 exponentially stable, 331 Faybusovich flow, 113 feedback control, 4 gain, 146 preserving flow, 156 fiber, 320 fiber theorem, 26, 350 finite escape time, 14, 42, 67, 154 finite Gramian, 202
397
finite-word-length constraint, 270 first fundamental theorem of invariant theory, 199 first-order necessary condition, 341 fixed point, 71 flag, 61 flag manifold, 61 flows on orthogonal matrices, 53 flows on the factors, 186 Fr´echet derivative, 15, 338 frequency response, 324 frequency shaped sensitivity minimization, 285 Frobenius norm, 44, 318 function balanced realization, 211 function minimizing, 211 general linear group, 44 generalized Lie-bracket, 312 Rayleigh quotient, 25 generic, 336 genericity assumption, 72 geometric invariant theory, 211, 226 global analysis, 335 asymptotic stability, 331 maximum, 340 minimum, 129, 340 gradient flow, 15, 360 gradient vector field, 16, 20, 361 gradient-like behaviour, 142 gradient-like flow, 146 Gram-Schmidt orthogonalization, 30 graph, 355 Grassmann manifold, 7, 159 Grassmannian, 343 group, 351 group action, 165
398
Subject Index
Hamiltonian flow, 159 matrix, 66, 354 mechanics, 79 realization, 221 system, 2, 60 Hankel matrix, 160, 164, 199, 327 operator, 160 Hausdorff, 46, 336 Hebbian learning, 28 Hermitian, 312 inner product, 322 projection, 8 Hessenberg form, 58, 279 matrix, 58, 316 varieties, 78 Hessian, 20, 48, 51, 339, 348 high gain output feedback, 152 Hilbert-Mumford criterion, 199 holomorphic group action, 207, 356 homeomorphism, 46, 337 homogeneous coordinates, 8, 345 homogeneous space, 44, 47, 355 Householder transformation, 304 hyperbolic, 362 identity matrix, 312 image space, 313 imbedded submanifold, 356 imbedding, 337, 350 immersed submanifold, 350 immersion, 349 induced Riemannian metric, 16, 53, 54, 174, 186, 252, 275, 360 inequality constraint, 114 infinite dimensional space, 41 injective, 321 inner product, 50, 321
input, 146 input balancing transformation, 303 integrable Hamiltonian system, 31 integrable system, 80 integral curve, 329 interior, 336 interior point algorithm, 107 interior point flow, 107 intrinsic Riemannian metric, 185, 305 invariant submanifold, 97 theory, 165, 201 inverse eigenvalue problem, 43, 125 inverse function theorem, 340 isodynamical flow, 249, 282 isomorphism, 321 isospectral differential equation, 31 flow, 35, 43, 48, 81 manifold, 90 Iwasawa decomposition, 160 Jacobi identity, 352 matrix, 58, 316, 338 Jordan algebra, 135 K¨ ahler structure, 159 Kalman’s realization theorem, 148, 204, 327 Karhunen-Loeve expansion, 99 Karmarkar algorithm, 101, 107 Kempf-Ness theorem, 170, 261 kernel, 313 Khachian’s algorithms, 101 Killing form, 54, 94, 100 Kronecker product, 175, 318 Krylov-sequence, 5
Subject Index
L2 -sensitivity balanced, 279 balanced realization, 296 balancing, recursive, 291 cost function, constrained, 307 function, total, 273 Gramian, 279 measure, 270 model reduction, 295 optimal co-ordinate transformation, 275 La Salle’s principle of invariance, 142, 333 Lagrange multiplier, 15, 307, 340 Landau-Lifshitz equation, 79 Laplace transform, 324 least squares estimation on a sphere, 39 matching, 79, 123, 144 optimization, 1 Lemma of Lyapunov, 319 Levi form, 205 Levi’s problem, 204 Lie algebra, 45, 352 bracket, 32, 43, 45, 312, 352 bracket recursion, 69 group, 44, 351 group action, 45, 355 subgroup, 351 limit set, α-, 20, 331 ω-, 20, 331, 365 linear algebra, 311 algebraic group action, 165 cost function, 108 dynamical system, 146, 323 index gradient flow, 174, 232 induced flow, 67 neural network, 125
399
operator, 321 optimal control, 66 programming, 43, 101, 102 linearization, 18, 330 local coordinate system, 341 diffeomorphism, 339, 343 maximum, 340 minimum, 129, 340 stable manifold, 362 unstable manifold, 362 locally compact, 336 invariant, 332 stable attractor, 20 lossless electrical network, 79 Lyapunov balanced realization, 296 equation, 192, 280 function, 142, 332 stability, 332 Markov parameter, 327 matching problem, 259 matrix, 311 eigenvalue methods, 1 exponential, 10 inversion lemma, 117, 313 least squares approximation, 43 least squares index, 44 logarithm, 11 nearness problem, 160 Riccati equation, 64 maximal compact, 351 McMillan degree, 67, 148, 327 micromagnetics, 79 minimal realization, 327 mixed equality-inequality constraint, 111 mixed L2 /L1 -sensitivity, 271 model reduction, 99, 160, 203
400
Subject Index
moment map, 78, 199 Morse chart, 363 function, 41, 67, 262, 340 function on Grassmann manifolds, 78 index, 78, 92, 349 inequality, 262, 278 lemma, 21, 340 theory, 41, 78, 261 Morse-Bott function, 19, 21, 27, 52, 95, 366 multi-mode system, 226 multivariable sensitivity norm, 288 N -balanced, 217 n-dimensional manifold, 342 N -Gramian norm, 218 n-space, complex projective, 344 real projective, 344 negatively invariant, 331 neighbourhood, 335 neural network, 23, 28, 99, 199 Newton method, 3, 76 Newton-Raphson gradient flow, 244 noncompact Stiefel manifold, 28, 345 nondegenerate critical point, 340 nonlinear artificial neuron, 124 constraint, 39 programming, 123 norm balanced realization, 211 norm minimal, 211 normal bundle, 359 matrix, 167, 315 Riemannian metric, 50, 53, 90, 136, 140, 150, 187,
253, 283 space, 26 normally hyperbolic subset, 181 numerical analysis, 42 integration, 80 matrix eigenvalue methods, 42 observability, 326 observability Gramian, 202, 327 observable realization, 66 Oja flow, 23 one-parameter group, 330 open interval, 336 open set, 335 open submanifold, 342 optimal eigenvalue assignment, 155 optimal vertex, 109 optimally clustered, 217 orbit, 45, 166, 355 orbit closure lemma, 151 orbit space, 45, 355 orthogonal group, 44, 352 Lie-bracket algorithm, 74 matrix, 312 projection, 25 orthogonally invariant, 166 output, 146 output feedback control, 146 equivalent, 147 group, 147 optimization, 147 orbit, 148 Pad´e approximation, 76, 120, 198 parallel processing, 2 pattern analysis, 79 perfect Morse-Bott function, 167
Subject Index
permutation matrix, 44, 54, 312 Pernebo and Silverman theorem, 328 phase portrait analysis, 79 plurisubharmonic function, 204, 205 polar decomposition, 132, 143, 208, 317 pole and zero sensitivity, 285 pole placement, 147 polynomial invariant, 199 polytope, 111 population dynamics, 108, 124 positive definite approximant, 144 definite matrix, 172, 316 semidefinite matrix, 117, 142 positively invariant, 331 power method, 5, 9 principal component analysis, 28 product manifold, 343 product topology, 336 projected gradient flow, 57 projective space, 7, 343 proper function, 141 proper map, 337 pseudo-inverse, 39, 313 QR algorithm, 4, 30 -decomposition, 30, 317 -factorization, 30 quadratic convergence, 123 cost function, 108 index gradient flow, 176, 233 quantization, 43 quotient map, 337 quotient space, 338 quotient topology, 45, 337
401
rank, 313 rank preserving flow, 139, 141 rates of exponential convergence, 178 Rayleigh quotient, 14, 52, 104 Rayleigh quotient method, 19 real algebraic Lie group action, 149 algebraic subgroup, 357 analytic, 338 Grassmann manifold, 9, 345 projective space, 7 symplectic group, 354 recursive balancing matrix factorization, 193 flows on orthogonal matrices, 73 linear programming, 117 reduction principle, 364 reductive group, 199 regular value, 340 research problem, 145, 266 Riccati equation, 3, 10, 29, 58, 61, 140, 185, 236, 292 Riccati-like algorithm, 308 Riemann sphere, 8 Riemann surface, 60 Riemannian manifold, 16, 359 Riemannian metric, 112, 115, 359 roundoff noise, 298 row rank, 320 row space, 320 Runge-Kutta, 34, 291 saddle point, 22, 340 scaling constraints, 298 Schur complement formula, 133 Schur form, 316 Schur-Horn theorem, 123
402
Subject Index
second-order sufficient condition, 341 section, 358 self-equivalent differential equation, 36 self-equivalent flow, 36, 83, 88, 100 selfadjoint, 322 semialgebraic Lie subgroup, 356 sensitivity minimization with constraints, 298 of Markov parameters, 287 optimal controller design, 309 optimal singular system, 226 optimization, 269 Shahshahani metric, 124 shift strategies, 4 sign matrix, 56, 312 signal processing, 4 signature, 12, 316 signature symmetric balanced realization, 224 realization, 222 similar matrices, 315 similarity action, 165, 251 orbit, 272 transformation, 316 simplex, 103 simulation, 75, 121, 234, 242 simultaneous eigenvalue assignment, 147 singular value, 35, 317 value decomposition, 4, 35, 81, 84, 164, 317 vector, 35 skew Hermitian, 312 skew symmetric, 311 smooth, 338
smooth manifold, 341 smooth semialgebraic action, 356 sorting, 43, 101 sorting algorithm, 123 sorting of function values, 79 special linear group, 44, 353 orthogonal group, 353 unitary group, 353 spectral theorem, 8 sphere, 343 stability, 330 stability theory, 73 stabilizability, 325 stabilizer, 166 stabilizer subgroup, 46, 356 stable, 324, 331 manifold, 181, 362 manifold theorem, 362 point, 211 standard Euclidean inner product, 321 Hermitian inner product, 322 inner product, 135 Riemannian metric, 16, 94 simplex, 102 state vector, 146 steepest descent, 85 step-size selection, 69, 194 Stiefel manifold, 25, 52, 87 strict Lyapunov function, 332 strictly plush, 205 strictly proper, 327 structural stability, 25, 42, 79 subgroup, 351 subharmonic, 204 subimmersion, 349 submanifold, 349 submersion, 349 subspace, 320 subspace learning, 79
Subject Index
subspace topology, 336 surjective, 321 symmetric, 311 approximant, 131 projection operator, 62 realization, 67, 152, 221 symplectic geometry, 2, 78, 123, 199 system balancing, 201 systems with symmetry, 147 tangency condition, 16, 361 tangent bundle, 358 map, 347, 358 space, 26, 47, 347 vector, 347 Taylor series, 339 Taylor’s theorem, 70 Toda flow, 2, 31, 58 Toda lattice equation, 60 Toeplitz matrix, 160 topological closure, 169 topological space, 335 topology, 335 torus action, 2 torus varieties, 78 total least squares, 159 least squares estimation, 125 linear least squares, 125 trace, 314 trace constraint, 303 transfer function, 148, 324 transition function, 342 transitive, 45 transitive action, 355 transpose, 311 travelling salesman problem, 43, 124 trivial vector bundle, 358
403
truncated singular value decomposition, 125, 159 uniform complete controllability, 326 unit sphere, 14 unitarily invariant, 207 unitarily invariant matrix norm, 99 unitary group, 44, 352 unitary matrix, 312 universal covering space, 78 unstable manifold, 41, 52, 362 upper triangular, 32 variable step size selection, 73 vec, 318 vec operation, 175 vector bundle, 357 field, 329, 358 space, 320 VLSI, 79 Volterra-Lotka equation, 108, 124 weak Lyapunov function, 333 Weierstrass theorem, 337 Wielandt’s minmax principle, 42 Wielandt-Hoffman inequality, 52, 78, 100 Z-transform, 324 Zariski topology, 199