Generalized Inverses of Linear Transformations
Books in the Classics in Applied Mathematics series are monographs and textbooks declared out of print by their original publishers, though they are of continued importance and interest to the mathematical community. SIAM publishes this series to ensure that the information presented in these texts is not lost to today's students and researchers.
Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington

Editorial Board
John Boyd, University of Michigan
Leah Edelstein-Keshet, University of British Columbia
William G. Faris, University of Arizona
Nicholas J. Higham, University of Manchester
Peter Hoff, University of Washington
Mark Kot, University of Washington
Hilary Ockendon, University of Oxford
Peter Olver, University of Minnesota
Philip Protter, Cornell University
Gerhard Wanner, L'Université de Genève

Classics in Applied Mathematics
C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Translated by G. W. Stewart
Richard Bellman, Introduction to Matrix Analysis
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability
Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
*First time in print.

Classics in Applied Mathematics (continued)
Beresford N. Parlett, The Symmetric Eigenvalue Problem
Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow
Peter W. M. John, Statistical Design and Analysis of Experiments
Tamer Başar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition
Emanuel Parzen, Stochastic Processes
Petar Kokotović, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Témam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. I: Theory
M. Vidyasagar, Nonlinear Systems Analysis, Second Edition
Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice
Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations
Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods
Leah Edelstein-Keshet, Mathematical Models in Biology
Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations
J. L. Hodges, Jr. and E. L. Lehmann, Basic Concepts of Probability and Statistics, Second Edition
George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable: Theory and Technique
Friedrich Pukelsheim, Optimal Design of Experiments
Israel Gohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications
Lee A. Segel with G. H. Handelman, Mathematics Applied to Continuum Mechanics
Rajendra Bhatia, Perturbation Bounds for Matrix Eigenvalues
Barry C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics
Charles A. Desoer and M. Vidyasagar, Feedback Systems: Input-Output Properties
Stephen L. Campbell and Carl D. Meyer, Generalized Inverses of Linear Transformations
Generalized Inverses of Linear Transformations
Stephen L. Campbell
Carl D. Meyer North Carolina State University Raleigh, North Carolina
Society for Industrial and Applied Mathematics Philadelphia
Copyright © 2009 by the Society for Industrial and Applied Mathematics This SIAM edition is an unabridged republication of the work published by Dover Publications, Inc., 1991, which is a corrected republication of the work first published by Pitman Publishing Limited, London, 1979.
10 9 8 7 6 5 4 3 2 1
All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.
Library of Congress Cataloging-in-Publication Data
Campbell, S. L. (Stephen La Vern)
Generalized inverses of linear transformations / Stephen L. Campbell, Carl D. Meyer.
p. cm. -- (Classics in applied mathematics ; 56) Originally published: London: Pitman Pub., 1979. Includes bibliographical references and index. ISBN 978-0-898716-71-9 1. Matrix inversion. 2. Transformations (Mathematics) I. Meyer, C. D. (Carl Dean) II. Title. QA188.C36 2009 512.9'434--dc22 2008046428
SIAM is a registered trademark.
To 4ciI1L
Contents

Preface to the Classics Edition
Preface

0 Introduction and other preliminaries
1 Prerequisites and philosophy
2 Notation and basic geometry
3 Exercises

1 The Moore–Penrose or generalized inverse
1 Basic definitions
2 Basic properties of the generalized inverse
3 Computation of A†
4 Generalized inverse of a product
5 Exercises

2 Least squares solutions
1 What kind of answer is A†b?
2 Fitting a linear hypothesis
3 Estimating the unknown parameters
4 Goodness of fit
5 An application to curve fitting
6 Polynomial and more general fittings
7 Why A†?

3 Sums, partitioned matrices and the constrained generalized inverse
1 The generalized inverse of a sum
2 Modified matrices
3 Partitioned matrices
4 Block triangular matrices
5 The fundamental matrix of constrained minimization
6 Constrained least squares and constrained generalized inverses
7 Exercises

4 Partial isometries and EP matrices
1 Introduction
2 Partial isometries
3 EP matrices
4 Exercises

5 The generalized inverse in electrical engineering
1 Introduction
2 n-port networks and the impedance matrix
3 Parallel sums
4 Shorted matrices
5 Other uses of the generalized inverse
6 Exercises
7 References and further reading

6 (i, j, k)-Generalized inverses and linear estimation
1 Introduction
2 Definitions
3 (1)-inverses
4 Applications to the theory of linear estimation
5 Exercises

7 The Drazin inverse
1 Introduction
2 Definitions
3 Basic properties of the Drazin inverse
4 Spectral properties of the Drazin inverse
5 A^D as a polynomial in A
6 A^D as a limit
7 The Drazin inverse of a partitioned matrix
8 Other properties

8 Applications of the Drazin inverse to the theory of finite Markov chains
1 Introduction and terminology
2 Introduction of the Drazin inverse into the theory of finite Markov chains
3 Regular chains
4 Ergodic chains
5 Calculation of A# and w for an ergodic chain
6 Non-ergodic chains and absorbing chains
7 References and further reading

9 Applications of the Drazin inverse
1 Introduction
2 Applications of the Drazin inverse to linear systems of differential equations
3 Applications of the Drazin inverse to difference equations
4 The Leslie population growth model and backward population projection
5 Optimal control
6 Functions of a matrix
7 Weak Drazin inverses
8 Exercises

10 Continuity of the generalized inverse
1 Introduction
2 Matrix norms
3 Matrix norms and invertibility
4 Continuity of the Moore–Penrose generalized inverse
5 Matrix valued functions
6 Non-linear least squares problems: an example
7 Other inverses
8 Exercises
9 References and further reading

11 Linear programming
1 Introduction and basic theory
2 Pyle's reformulation
3 Exercises
4 References and further reading

12 Computational concerns
1 Introduction
2 Calculation of A†
3 Computation of the singular value decomposition
4 (1)-inverses
5 Computation of the Drazin inverse
6 Previous algorithms
7 Exercises

Bibliography
Index
Preface to the Classics Edition
The first edition of Generalized Inverses of Linear Transformations was written toward the end of a period of active research on generalized inverses. Generalized inverses of various kinds have become a standard and important mathematical concept in many areas. The core chapters of this book, consisting of Chapters 1–7, 10, and 12, provide a development of most of the key generalized inverses. This presentation is as up to date and readable as ever and can be profitably read by anyone interested in learning about generalized inverses and their application.
Two of the application chapters, however, turned out to be on the ground floor of the development of application areas that have gone on to become significant areas of applied mathematics.
Chapter 8 focuses on applications involving Markov chains. While the basic relation between the group inverse and the theory of Markov chains is still relevant, several advances have been made. Most notably, there has been a wealth of new results concerning the use of the group inverse to characterize the sensitivity of the stationary probabilities to perturbations in the underlying transition probabilities; representative results are found in [8, 9, 13, 16, 23, 28, 29, 36, 37, 38, 39, 40].¹ More generally, the group inverse has found applications involving expressions for differentiation of eigenvectors and eigenvalues [10, 11, 12, 14, 30, 31]. Since the original version of this book appeared, researchers in more theoretical areas have been applying the group inverse concept to the study of M-matrices, graph theory, and general nonnegative matrices [4, 5, 6, 7, 17, 18, 19, 20, 21, 22, 32, 33, 34]. Finally, the group inverse has recently proven to be fundamental in the analysis of Google's PageRank system. Some of these applications are described in detail in [26, 27].
¹Citations here correspond only to the references immediately following this preface.
Chapter 9 discusses the Drazin inverse and its application to differential equations of the form Ax′ + Bx = f. In Chapter 9 these equations are called singular systems of differential equations, and some applications to control problems are given. It turns out that many physical processes are most naturally modeled by such implicit differential equations. Since the publication of the first edition of this book there has been a major investigation of the applications, numerical solution, and theory behind such implicit differential equations. Today, rather than being called singular systems, they are more often called differential algebraic equations (DAEs) in applied mathematics and called either DAEs or descriptor systems in the sciences and engineering. Chapter 9 still provides a good introduction to the linear time invariant case, but now it should be viewed as the first step in understanding a much larger and very important area. Readers interested in reading further about DAEs are referred to the general developments [1, 3, 15, 24] and the more technical books [25, 35].
There has been, of course, some additional work on generalized inverses since the first edition was published. A large and more recent bibliography can be found in [2].
Stephen L. Campbell
Carl D. Meyer
September 7, 2008
References
[1] Ascher, U. M. and Petzold, L. R. Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations. SIAM, Philadelphia, 1998.
[2] Ben-Israel, A. and Greville, T. N. E. Generalized Inverses: Theory and Applications. 2nd ed. Springer-Verlag, New York, 2003.
[3] Brenan, K. E., Campbell, S. L., and Petzold, L. R. Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations. Classics in Appl. Math. 14, SIAM, Philadelphia, 1995.
[4] Catral, M., Neumann, M., and Xu, J. Proximity in group inverses of M-matrices and inverses of diagonally dominant M-matrices. Linear Algebra Appl. 409, 32–50, 2005.
[5] Catral, M., Neumann, M., and Xu, J. Matrix analysis of a Markov chain small-world model. Linear Algebra Appl. 409, 126–146, 2005.
[6] Chen, Y., Kirkland, S. J., and Neumann, M. Group generalized inverses of M-matrices associated with periodic and nonperiodic Jacobi matrices. Linear Multilinear Algebra 39, 325–340, 1995.
[7] Chen, Y., Kirkland, S. J., and Neumann, M. Nonnegative alternating circulants leading to M-matrix group inverses. Linear Algebra Appl. 233, 81–97, 1996.
[8] Cho, G. and Meyer, C. Markov chain sensitivity measured by mean first passage times. Linear Algebra Appl. 316, 21–28, 2000.
[9] Cho, G. and Meyer, C. Comparison of perturbation bounds for the stationary distribution of a Markov chain. Linear Algebra Appl. 335, 137–150, 2001.
[10] Deutsch, E. and Neumann, M. Derivatives of the Perron root at an essentially nonnegative matrix and the group inverse of an M-matrix. J. Math. Anal. Appl. 102, 1–29, 1984.
[11] Deutsch, E. and Neumann, M. On the first and second derivatives of the Perron vector. Linear Algebra Appl. 71, 57–76, 1985.
[12] Deutsch, E. and Neumann, M. On the derivative of the Perron vector whose infinity norm is fixed. Linear Multilinear Algebra 21, 75–85, 1987.
[13] Funderlic, R. E. and Meyer, C. Sensitivity of the stationary distribution vector for an ergodic Markov chain. Linear Algebra Appl. 76, 1–17, 1986.
[14] Golub, G. H. and Meyer, Jr., C. D. Using the QR factorization and group inversion to compute, differentiate, and estimate the sensitivity of stationary probabilities for Markov chains. SIAM J. Alg. Discrete Meth. 7, 273–281, 1986.
[15] Hairer, E. and Wanner, G. Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems. 2nd ed. Springer Ser. Comput. Math. 14, Springer-Verlag, Berlin, 1996.
[16] Ipsen, I. and Meyer, C. D. Uniform stability of Markov chains. SIAM J. Matrix Anal. Appl. 15, 1061–1074, 1994.
[17] Kirkland, S. J. and Neumann, M. Convexity and concavity of the Perron root and vector of Leslie matrices with applications to a population model. SIAM J. Matrix Anal. Appl. 15, 1092–1107, 1994.
[18] Kirkland, S. J. and Neumann, M. Group inverses of M-matrices associated with nonnegative matrices having few eigenvalues. Linear Algebra Appl. 220, 181–213, 1995.
[19] Kirkland, S. J. and Neumann, M. The M-matrix group generalized inverse problem for weighted trees. SIAM J. Matrix Anal. Appl. 19, 226–234, 1998.
[20] Kirkland, S. J. and Neumann, M. Cutpoint decoupling and first passage times for random walks on graphs. SIAM J. Matrix Anal. Appl. 20, 860–870, 1999.
[21] Kirkland, S. J., Neumann, M., and Shader, B. L. Distances in weighted trees and group inverse of Laplacian matrices. SIAM J. Matrix Anal. Appl. 18, 827–841, 1997.
[22] Kirkland, S. J., Neumann, M., and Shader, B. L. Bounds on the subdominant eigenvalue involving group inverses with applications to graphs. Czech. Math. J. 48, 1–20, 1998.
[23] Kirkland, S. J., Neumann, M., and Sze, N.-S. On optimal condition numbers for Markov chains. Numer. Math. 110, 521–537, 2008.
[24] Kumar, A. and Daoutidis, P. Control of Nonlinear Differential Algebraic Equation Systems with Applications to Chemical Processes. Chapman and Hall/CRC, Boca Raton, FL, 1999.
[25] Kunkel, P. and Mehrmann, V. Differential-Algebraic Equations: Analysis and Numerical Solution. EMS Textbooks in Math., European Mathematical Society, Zürich, 2006.
[26] Langville, A. and Meyer, C. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, Princeton, NJ, 2006.
[27] Langville, A. and Meyer, C. D. Updating Markov chains with an eye on Google's PageRank. SIAM J. Matrix Anal. Appl. 27, 968–987, 2006.
[28] Meyer, C. The character of a finite Markov chain. In Linear Algebra, Markov Chains, and Queueing Models. IMA Vol. Math. Appl. 48, Springer, New York, 1993, 47–58.
[29] Meyer, C. Sensitivity of the stationary distribution of a Markov chain. SIAM J. Matrix Anal. Appl. 15, 715–728, 1994.
[30] Meyer, C. Matrix Analysis and Applied Linear Algebra. 2nd ed. SIAM, Philadelphia, to appear.
[31] Meyer, C. and Stewart, G. W. Derivatives and perturbations of eigenvectors. SIAM J. Numer. Anal. 25, 679–691, 1988.
[32] Neumann, M. and Werner, H. J. Nonnegative group inverses. Linear Algebra Appl. 151, 85–96, 1991.
[33] Neumann, M. and Xu, J. A parallel algorithm for computing the group inverse via Perron complementation. Electron. J. Linear Algebra 13, 131–145, 2005.
[34] Neumann, M. and Xu, J. A note on Newton and Newton-like inequalities for M-matrices and for Drazin inverses of M-matrices. Electron. J. Linear Algebra 15, 314–328, 2006.
[35] Riaza, R. Differential-Algebraic Systems: Analytical Aspects and Circuit Applications. World Scientific, River Edge, NJ, 2008.
[36] Seneta, E. Sensitivity to perturbation of the stationary distribution: Some refinements. Linear Algebra Appl. 108, 121–126, 1988.
[37] Seneta, E. Perturbation of the stationary distribution measured by ergodicity coefficients. Adv. Appl. Probab. 20, 228–230, 1988.
[38] Seneta, E. Sensitivity analysis, ergodicity coefficients, and rank-one updates for finite Markov chains. In Numerical Solution of Markov Chains (W. J. Stewart, ed.). Marcel Dekker, New York, 1991, 121–129.
[39] Seneta, E. Explicit forms for ergodicity coefficients of stochastic matrices. Linear Algebra Appl. 191, 245–252, 1993.
[40] Seneta, E. Sensitivity of finite Markov chains under perturbation. Statist. Probab. Lett. 17, 163–168, 1993.
Preface
During the last two decades, the study of generalized inversion of linear transformations and related applications has grown to become an important topic of interest to researchers engaged in the study of linear mathematical problems as well as to practitioners concerned with applications of linear mathematics.
The purpose of this book is twofold. First, we try to present a unified treatment of the general theory of generalized inversion which includes topics ranging from the most traditional to the most contemporary. Secondly, we emphasize the utility of the concept of generalized inversion by presenting many diverse applications in which generalized inversion plays an integral role.
This book is designed to be useful to the researcher and the practitioner, as well as the student. Much of the material is written under the assumption that the reader is unfamiliar with the basic aspects of the theory and applications of generalized inverses. As such, the text is accessible to anyone possessing a knowledge of elementary linear algebra.
This text is not meant to be encyclopedic. We have not tried to touch on all aspects of generalized inversion, nor did we try to include every known application. Due to considerations of length, we have been forced to restrict the theory to finite dimensional spaces and neglect several important topics and interesting applications.
In the development of every area of mathematics there comes a time when there is a commonly accepted body of results, and referencing is limited primarily to more recent and less widely known results. We feel that the theory of generalized inverses has reached that point. Accordingly, we have departed from previous books and not referenced many of the more standard facts about generalized inversion.
To the many individuals who have made an original contribution to the theory of generalized inverses we are deeply indebted. We are especially indebted to Adi Ben-Israel, Thomas Greville, C. R. Rao, and S. K. Mitra whose texts undoubtedly have had an influence on the writing of this book.
In view of the complete (annotated) bibliographies available in other texts, we made no attempt at a complete list of references.
Special thanks are extended to Franklin A. Graybill and Richard J. Painter who introduced author Meyer to the subject of generalized inverses and who provided wisdom and guidance at a time when they were most needed.
S. L. Campbell
C. D. Meyer, Jr.
North Carolina State University at Raleigh
0
Introduction and other preliminaries
1.
Prerequisites and philosophy
The study of generalized inverses has flourished since its rebirth in the
early 1950s. Numerous papers have developed both its theory and its applications. The subject has advanced to the point where a unified treatment is possible. It would be desirable to have a book that treated the subject from the viewpoint of linear algebra, and not with regard to a particular application. We do not feel that the world needs another introduction to linear algebra. Accordingly, this book presupposes some familiarity with the basic facts and techniques of linear algebra as found in most introductory courses. It is our hope that this book would be suitable for self-study by either students or workers in other fields. Needed ideas that a person might well have forgotten or never learned, such as the singular value decomposition, will be stated formally. Unless their proof is illuminating or illustrates an important technique, it will be relegated to the exercises or a reference. There are three basic kinds of chapters in this book. Chapters 0, 1, 2, 3, 4, 6, 7, 10, and 12 discuss the theory of the generalized inverse and related notions. They are a basic introduction to the mathematical theory. Chapters 5, 8,9 and 11 discuss applications. These chapters are intended to illustrate the uses of generalized inverses, not necessarily to teach how to use them. Our goal has been to write a readable, introductory book which will whet the appetite of the reader to learn more. We have tried to bring the reader far enough so that he can proceed into the literature, and yet not bury him under a morass of technical lemmas and concise, abbreviated proofs. This book reflects our rather definite opinions on what an introductory book is and what it should include. In particular, we feel that the numerous applications are necessary for a full appreciation of the theory. Like most types of mathematics, the introduction of the various generalized inverses is not necessary. One could do mathematics without
ever defining a ring or a continuous function. However, the introduction
of generalized inverses, as with rings and continuous functions, enables us to more clearly see the underlying structure, to more easily manipulate it, and to more easily express new results. No attempt has been made to have the bibliography comprehensive. The bibliography of [64] is carefully annotated and contains some 1775 entries. In order to keep this book's size down we have omitted any discussion of the infinite dimensional case. The interested reader is referred to [64].
2.
Notation and basic geometry
Theorems, facts, propositions, lemmas, corollaries and examples are numbered consecutively within each section. A reference to Example 3.2.3 refers to the third example in Section 2 of Chapter 3. If the reader were already in Chapter 3, the reference would be just to Example 2.3. Within Section 2, the reference would be to Example 3. Exercise sections are scattered throughout the chapters. Exercises with an (*) beside them are intended for more advanced readers. In some cases they require knowledge from an outside area like complex function theory. At other times they involve fairly complicated proofs using basic ideas.
C (R) is the field of complex (real) numbers. C^{m×n} (R^{m×n}) is the vector space of m × n complex (real) matrices over C (R). C^n (R^n) is the vector space of n-tuples of complex (real) numbers over C (R). We will frequently not distinguish between C^n (R^n) and C^{n×1} (R^{n×1}). That is, n-tuples will be written as column vectors. Except where we specifically state otherwise, it is to be assumed that we are working over the complex field. If A ∈ C^{m×n}, then with respect to the standard bases of C^n and C^m, A induces a linear transformation à : C^n → C^m by Ãu = Au for every u ∈ C^n. Whenever we go from one of A or à to the other, it is to be understood that it is with respect to the standard basis. The capital letters A, B, C, X, Y, Z are reserved for matrices or their corresponding linear transformations. Subspaces are denoted by the capital letters M, N. Subspaces are always linear subspaces. The letters U, V, W are reserved for unitary matrices or partial isometries. I always denotes the identity matrix; if I ∈ C^{n×n} we sometimes write I_n. Vectors are denoted by b, u, v, y, etc., scalars by a, b, λ, k, etc.
R(A) denotes the range of A, that is, the linear span of the columns of A. The range of à is denoted R(Ã). Since A is derived from à by way of the standard basis, we have R(A) = R(Ã). The null space of A, N(A), is {x : Ax = 0}. A matrix A is hermitian if its conjugate transpose, A*, equals A. If A² = A, then A is called a projector of C^n onto R(A). Recall that rank(A) = tr(A) if A² = A. If A² = A and A = A*, then A is called an orthogonal projector. If A, B ∈ C^{n×n}, then [A, B] = AB − BA. The inner product between two vectors u, v ∈ C^n is denoted by (u, v). If 𝒮 is a subset of C^n, then 𝒮^⊥ = {u ∈ C^n : (u, v) = 0 for every v ∈ 𝒮}. The smallest subspace of C^n containing 𝒮 is denoted LS(𝒮). Notice that 𝒮^⊥⊥ = LS(𝒮).
Suppose now that M, N1, and N2 are subspaces of C^n. Then N1 + N2 = {u + v : u ∈ N1 and v ∈ N2}, while AN1 = {Au : u ∈ N1}. If M = N1 + N2 and N1 ∩ N2 = {0}, then M is called the direct sum of N1 and N2. In this case we write M = N1 ∔ N2. If M = N1 + N2 and N1 ⊥ N2, that is, (u, v) = 0 for every u ∈ N1 and v ∈ N2, then M is called the orthogonal sum of N1 and N2. This will be written M = N1 ⊕ N2. If two vectors are orthogonal, their sum will frequently be written with a ⊕ also. If C^n = N1 ∔ N2, then N1 and N2 are called complementary subspaces. Notice that C^n = N1 ⊕ N1^⊥. The dimension of a subspace M is denoted dim M. One of the most basic facts used in this book is the next proposition.
Proposition 0.2.1  Suppose that A ∈ C^{m×n}. Then R(A) = N(A*)^⊥.
Proof  We will show that R(A) ⊆ N(A*)^⊥ and dim R(A) = dim N(A*)^⊥. Suppose that u ∈ R(A). Then there is a v such that Av = u. If w ∈ N(A*), then (u, w) = (Av, w) = (v, A*w) = (v, 0) = 0. Thus u ∈ N(A*)^⊥ and R(A) ⊆ N(A*)^⊥. But dim R(A) = rank A = rank A* = m − dim N(A*) = dim N(A*)^⊥. Thus R(A) = N(A*)^⊥. ∎
A useful consequence of Proposition 1 is the 'star cancellation law'.
Proposition 0.2.2 (Star cancellation law)  Suppose that A ∈ C^{m×n} and B, C ∈ C^{n×p}. Then (i) A*AB = A*AC if and only if AB = AC. Also (ii) N(A*A) = N(A), and (iii) R(A*A) = R(A*).
Proof  (i) may be rewritten as A*A(B − C) = 0 if and only if A(B − C) = 0. Clearly (i) and (ii) are equivalent. To see that (ii) holds, notice that by Proposition 1, N(A*) = R(A)^⊥ and thus A*Ax = 0 if and only if Ax = 0. To see (iii), note that using (ii) we have that R(A*A) = N(A*A)^⊥ = N(A)^⊥ = R(A*). ∎
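For readers who like to experiment, both propositions are easy to check numerically. The following sketch is our own illustration and is not part of the original text; it assumes NumPy, and the random matrix and tolerance are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 5, 3
    A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

    # Orthonormal bases from the SVD: columns of U1 span R(A), columns of U2 span R(A)-perp.
    U, s, Vh = np.linalg.svd(A)
    r = np.sum(s > 1e-12)
    U1, U2 = U[:, :r], U[:, r:]

    # Proposition 0.2.1: A* annihilates exactly R(A)-perp, so R(A)-perp = N(A*).
    assert np.allclose(A.conj().T @ U2, 0)                      # U2 lies in N(A*)
    assert np.linalg.matrix_rank(A.conj().T @ U1) == r          # A* is injective on R(A)

    # Star cancellation law, part (ii): N(A*A) = N(A), so the ranks agree.
    assert np.linalg.matrix_rank(A.conj().T @ A) == np.linalg.matrix_rank(A)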
Propositions 1 and 2 are basic and will be used frequently in what follows without comment.
If M is a subspace of C^n, then we may define the orthogonal projector, P_M, of C^n onto M by P_M u = u if u ∈ M and P_M u = 0 if u ∈ M^⊥. Notice that P_{M^⊥} = I − P_M and that for any orthogonal projector P, we have P = P_{R(P)}.
It is frequently helpful to write a matrix A in block form, that is, express A as a matrix made up of submatrices A_{ij}, A = [A_{ij}]. If B is a second block matrix, then (AB)_{ij} = Σ_k A_{ik}B_{kj} and (A + B)_{ij} = A_{ij} + B_{ij}, provided the submatrices are the correct size to permit the indicated multiplications and additions.
Example 0.2.1  Let A ∈ C^{3×4} and B ∈ C^{4×2}. Partition A = [A11 A12 A13; A21 A22 A23] (with A11 = [1] and A12 = [2, 3] in the first block row) and B = [B1; B2; B3] (with B1 = [1 0] and B3 = [1, 0]), the remaining blocks being sized conformably. Then
AB = [A11 A12 A13; A21 A22 A23][B1; B2; B3] = [A11B1 + A12B2 + A13B3; A21B1 + A22B2 + A23B3].
may be viewed as matrices over a ring. We shall not do so. This notation is especially useful when dealing with large matrices or matrices which have submatrices of a special type. There is a close connection between block matrices, invariant subspaces, and projections which we now wish to develop. then M is called Definition 0.2.1 If M is a subspace of and an invariant subspace of A Ef and only (lAM = (Au : ueM} c M. If M is an invariant subspace for both A and then M is called a reducing subspace
of A.
Invariant and reducing subspaces have a good characterization in terms of projectors.
Proposition 0.2.3
Let PM and M isa subspace of be the orthogonal projector onto M. Then (i) M is an invariant subspace for A (land only if PMAPM = APM. (ii) M is a reducing subspace for A (land only (fAPM = PMA. The proof of Proposition 3 is left to the next set of exercises. Knowledge of invariant subspaces is useful for it enables the introduction of blocks of Suppose
zeros into a matrix. Invariant subspaces also have obvious geometric importance in studying the effects of on
Proposition 0.2.4 Suppose that A E CN and that M is an invariant subspace of A of dimension r. Then there exists a unitary matrix U such that
(i) A=U*IIA
Al
If M is a reducing subspace of A, then (ii) A = U*I
L°
A 11
ii
1 IU where
A22j
' A 22
Then CN = M M1. Let Proof Let M be a subspace of orthonormal basis for M and be an orthonormal basis for
be an Then
INTRODUCTION AND OTHER PRELIMINARIES
5
Order the vectors in P so that P2 is an orthonormal basis for are listed first. those in Let U be a unitary transformation that maps the standard basis for onto the ordered basis fi. Then P = P1
M
r
—
Lo
—
0]'
I
12
*
LA21
(1)
A,2
r = dim M. Suppose now that M is an invariant subspace for A. Thus PMAPM = APM by Proposition 3. This is equivalent to where
E C'
X
UPMUUAUUPMU = U*AUU*PMU. Substituting (1) into this gives IA11 L0
0]
[A11
0
0JLA,1 0
Thus A21 =0 and part (i) of Proposition 4 follows. Part (ii) follows by substituting (1) into U*PMUU*AU = U*AUU*PMU. • If AECnXn, then R(A) is always an invariant subspace for A. If A is hermitian, then every invariant subspace is reducing. In particular. R(A) is reducing. If A = A*, then there exists a unitary marix U and an
Proposition 0.2.5
irnertible hermitian matrix A1 such that A =
1
U.
Proposition 5 is. of course, a special case of the fact that every hennitian matrix is unitarily equivalent to a diagonal matrix. Viewed in this manner. it is clear that a similar result holds if hermitian is replaced by normal where a matrix is called normal if A*A = AA*. We assume that the reader is already familiar with the fact that normal and hermitian matrices are unitarily equivalent to diagonal matrices. Our purpose here is to review some of the 'geometry' of invariant subspaces and to gain a facility with the manipulation of block matrices. Reducing subspaces are better to work with than invariant subspaces, has no reducing = n> 1, does have invariant subspaces
but reducing subspaces need not always exist. A X
subspaces. Every matrix in since it has eigenvectors. And if is a set of eigenvectors corresponding to a particular eigenvalue of A, then LS(S1') is an invariant subspace for A. We shall see later that unitary and hermitian matrices are often easier to work with. Thus it is helpful if a matrix can be written as a product of such factors. If there is such a decomposition. It is called the polar form. The name comes from the similarity between it and the polar form of a complex number z = re1° where reR and 1e101 = I.
Theorem 0.2.1
If
then there exists a unitary matrix U and
hermitian matrices B,C such that A = UB = CU.
6
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
If A E cm
n, then one cannot hope to get quite as good an expression for A since A is not square. There are two ways that Theorem 1 can be extended. One is to replace U by what is called a partial isometry. This will be discussed in a later section. The other possibility is to replace B by a matrix like a hermitian matrix. By 'like a hermitian matrix' we are thinking of the block form given in Proposition 5. where m
Theorem 0.2.2
(Singular Value Decomposition) Suppose that A E Ctm 'C". Then there exist unitary matrices UeCM 'Cm and VeC" 'C", and an invertible
hermitian diagonal matrix D = Diag{a1, ... ,a,}, whose diagonal entries are the positive square roots of the eigenvalues of(A*A) repeated according
to multiplicity, such that A = u
0°]v.
The proofs of Theorems I and 2 are somewhat easier if done with the notation of generalized inverses. The proofs will be developed in the exercises following the section on partial isometrics. Two comments are in order. First, the matrix
] is not a square
matrix, although D is square. Secondly, the name 'Singular Value Decomposition' comes from the numbers fri, ... , a,) which are frequently referred to as the singular values of A. The notation of the functional calculus is convenient and will be used from time to time. If C = U Diag{A1, ... , for some unitary U and is a function defined on then we definef(C) = U Diag{f(21), ... This definition only makes sense if C is normal. If p(A) = aMA" + ... + a0, then p(C) as we have defined it here agrees with the standard definition of p(C) = aRC" + ... + a0!. For any A E C" 'Ca a(A) denotes the set of eigenvalues of A.
I
3.
Exercises
Prove that if P is hermitian, then P is a projector if and only if P3 = P2. 2. Prove that if M1 c M2 C" are invariant subspaces for A, then A is 1.
unitarily equivalent to a matrix of the form J 0
X X where X
LOOXJ
denotes a non-zero block. Generalize this to t invariant subspaces such that 3. If M2 c C" are reducing subspaces for A show that A is unitarily
Ix equivalent to a matrix in the form reducing subspaces such that M1
4. Prove Proposition 2.3.
0
01
0 X 0 1. Generalize this to t LOOXJ ...
M.
INTRODUCTION AND OTHER PRELIMINARIES X is an invariant subspace for A, then and M c 5. Prove that if A M1 is an invariant subspace for A*.
6. Suppose A = A*. Give necessary and sufficient conditions on A to guarantee that for every pair of reducing subspaces M1 , M2 of A that either M1 ±M2 or M1rM2
{O}.
7
1
The Moore—Penrose or generalized inverse
1.
Basic definitions
Equations of the form
Ax = b,AECMXA,XECA,b€Cm
(1)
X9, occur in many pure and applied problems. If A and is invertible, then the system of equations (1) is, in principle, easy to solve. The unique solution is x = A - 1b. If A is an arbitrary matrix in C"' ", then it becomes more difficult to solve (1). There may be none, one, or an infinite number of solutions depending on whether b€ R(A) and whether n-rank (A)> 0. One would like to be able to find a matrix (or matrices) C, such that solutions of (1) are of the form Cb. But if R(A), then (1) has no solution. This will eventually require us to modify our concept of what a solution of (1) is. However, as the applications will illustrate, this is not as unnatural as it sounds. But for now we retain the standard definition of solution. To motivate our first definition of the generalized inverse, consider the functional equation
(2)
wheref is a real-valued function with domain S". One procedure for solving (2) is to restrict the domain off to a smaller set is one to so that one. Then an inverse functionj' from R(J) to b°' is defined byj 1(y) = x
if xe9" andf(x) = y. Thus! '(y) is a solution of(2) for y€R(J). This is how the arcsec, arcsin, and other inverse functions are normally defined. The same procedure can be used in trying to solve equation (1). As usual, we let be the linear function from C" into C"' defined by = Ax for xeC". To make a one to one linear transformation it must be restricted to a subspace complementary to N(A). An obvious one is N(A)- = R(A*). This suggests the following definition of the generalized inverse.
Definition 1.1.1 (Functional definition of the generalized inverse)  If A ∈ C^{m×n}, define the linear transformation A† : C^m → C^n by A†x = (Ã restricted to R(A*))^{-1}x if x ∈ R(A), and A†x = 0 if x ∈ R(A)^⊥. The matrix of this transformation with respect to the standard bases is denoted A† and is called the generalized inverse of A.
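Definition 1 is constructive, and one can build A† directly from it: invert A on R(A) by sending each vector back to its unique preimage in R(A*), and send R(A)^⊥ to zero. The sketch below is our own illustration (assuming NumPy; the test matrix is arbitrary, and the least squares routine is used only because its minimum-norm solution is exactly the preimage lying in R(A*)); it compares the result with numpy.linalg.pinv.

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # 5 x 4, rank 3

    U, s, Vh = np.linalg.svd(A)
    r = np.sum(s > 1e-10)
    Ur = U[:, :r]                              # orthonormal basis of R(A)

    # For each basis vector of R(A), take its (unique) preimage in R(A*):
    # lstsq returns the minimum-norm solution, which lies in R(A*).
    W = np.linalg.lstsq(A, Ur, rcond=None)[0]  # n x r
    A_dagger = W @ Ur.conj().T                 # inverse on R(A), zero on R(A)-perp

    assert np.allclose(A_dagger, np.linalg.pinv(A))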
It is easy to check that AA†x = 0 if x ∈ R(A)^⊥ and AA†x = x if x ∈ R(A). Similarly, A†Ax = 0 if x ∈ N(A) and A†Ax = x if x ∈ R(A*) = R(A†). Thus AA† is the orthogonal projector of C^m onto R(A) while A†A is the orthogonal projector of C^n onto R(A*) = R(A†). This suggests a second definition of the generalized inverse, due to E. H. Moore.
Definition 1.1.2 (Moore definition of the generalized inverse)  If A ∈ C^{m×n}, then the generalized inverse of A is defined to be the unique matrix A† ∈ C^{n×m} such that
(a) AA† = P_{R(A)}, and
(b) A†A = P_{R(A*)}.
Moore's definition was given in 1935 and then more or less forgotten. This is possibly due to the fact that it was not expressed in the form of Definition 2 but rather in a more cumbersome (no pun intended) notation. An algebraic form of Moore's definition was given in 1955 by Penrose who was apparently unaware of Moore's work.
Definition 1.1.3 (Penrose definition of the generalized inverse)  If A ∈ C^{m×n}, then A† is the unique matrix in C^{n×m} such that
(i) AA†A = A,
(ii) A†AA† = A†,
(iii) (AA†)* = AA†,
(iv) (A†A)* = A†A.
The first important fact to be established is the equivalence of the definitions.
Theorem 1.1.1  The functional, Moore and Penrose definitions of the generalized inverse are equivalent.
Proof We have already noted that if At satisfies Definition 1. then it satisfies equations (a) and (b). If a matrix At satisfies (a) and (b) then it immediately satisfies (iii) and (iv). Furthermore (i) follows from (a) by observing that AAtA = PR(A,A = A. (ii) will follow from (b) in a similar manner. Since Definition I was constructive and the A' it constructs satisfies (a), (b) and (i)—(iv), the question of existence in Definitions 2 and 3 is already taken care of. There are then two things remaining to be proven. One is that a solution of equations (i)—(iv) is a solution of(a) and (b). The second is that a solution of(a) and (b) or (i)—(iv) is unique. Suppose then that A' is a matrix satisfying (i)—(iv). Multiplying (ii) on the left by A gives (AAt)2 = (AAt). This and (iii) show that AAt is an orthogonal projector. We must show that it has range equal to the range
of A. Using (i) and the fact that R(BC) c R(B) for matrices B and C, we get R(A) = R(AAtA) c R(AAt) R(A), so that R(A) = R(AAt). as desired. The proof that AtA = is similar and is Thus = left to the reader as an exercise. One way to show uniqueness is to show that if At satisfies (a) and (b), or (i)—(iv), then it satisfies Definition 1. Suppose then that At is a matrix satisfying (i)—(iv), (a), and (b). If then by (a), AAtx =0. Thus by (ii) A'x = AtAAtX = A'O = 0. II xeR(A), then there exist yeR(A*) such that Ay = x. But AtX = AtAy = y. The last equality follows by observing that taking the adjoint of both sides of 1x. Thus (1) gives PR(At)A = A* so that R(A) c R(At). But y = At satisfies Definition I. U As this proof illustrates, equations (i) and (ii) are, in effect, cancellation laws. While we cannot say that AB = AC implies B = C, we can say that if AtAB = AtAC then AB = AC. This type of cancellation will frequently appear in proofs and the exercises. For obvious reasons, the generalized inverse is often referred to as the Moore—Penrose inverse. Note also that if A E C" "and A is invertible, then A -' = At so that the generalized inverse lives up to its name.
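The four Penrose conditions and the two Moore conditions are easy to verify numerically for the pseudoinverse returned by a library routine. A small sketch of our own (assuming NumPy; the rank-deficient test matrix is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 6))   # 4 x 6, rank 3
    Ap = np.linalg.pinv(A)                                          # the generalized inverse A†

    # Penrose conditions (i)-(iv).
    assert np.allclose(A @ Ap @ A, A)
    assert np.allclose(Ap @ A @ Ap, Ap)
    assert np.allclose((A @ Ap).conj().T, A @ Ap)
    assert np.allclose((Ap @ A).conj().T, Ap @ A)

    # Moore conditions: AA† and A†A are the orthogonal projectors onto R(A) and R(A*),
    # here built independently from an SVD.
    U, s, Vh = np.linalg.svd(A)
    r = np.sum(s > 1e-10)
    P_RA = U[:, :r] @ U[:, :r].T           # projector onto R(A)
    P_RAstar = Vh[:r].T @ Vh[:r]           # projector onto R(A*)
    assert np.allclose(A @ Ap, P_RA)
    assert np.allclose(Ap @ A, P_RAstar)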
2.
Basic properties of the generalized inverse
Before proceeding to establish some of what is true about generalized inverses, the reader should be warned about certain things that are not true. While it is true that R(A*) = R(At), if At is the generalized inverse, condition (b) in Definition 2 cannot be replaced by AtA =
Example 1.2.1
o] = A' = XAX
?].SinceXA=AX= =
X satisfies AX =
and hence X #
and XA
At: Note that XA
=
But
and thus
X. If XA = PR( A.), AX = PR(A), and in addition XAX = X, then
X = At. The proof of this last statement is left to the exercises. In computations involving inverses one frequently uses (AB)' = B- 1A A and B are invertible. This fails to hold for generalized inverses even if AB = BA.
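A concrete numerical counterexample to the reverse order law is easy to produce; the matrices below are our own illustrative choices (not necessarily the ones used in the text's examples), and the sketch assumes NumPy.

    import numpy as np

    A = np.array([[1.0, 1.0], [0.0, 1.0]])    # invertible
    B = np.array([[1.0, 0.0], [1.0, 0.0]])    # rank one

    lhs = np.linalg.pinv(A @ B)
    rhs = np.linalg.pinv(B) @ np.linalg.pinv(A)
    assert not np.allclose(lhs, rhs)          # (AB)† differs from B†A† for this pair

    # Likewise (A†)² need not equal (A²)†: take an idempotent, non-hermitian matrix.
    C = np.array([[1.0, 1.0], [0.0, 0.0]])    # C² = C
    Cp = np.linalg.pinv(C)
    assert np.allclose(C @ C, C)
    assert not np.allclose(Cp @ Cp, np.linalg.pinv(C @ C))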
Fact 1.2.1 If as BtAt. Furthermore Example 1.2.2
then (AB)t is not necessarily the same is
not necessarily equal to (A2)t.
Ii
A2
= A while At2 =
Thus (At)2A2 =
which is not a projection.
(A2)t.
Thus (At)2 Ways of calculating At will be given shortly. The generalized inverses in Examples 1 and 2 can be found directly from Definition 1 without too much difficulty. Examples 2 illustrates another way in which the properties of the generalized inverse differ from those of the inverse. If A is invertible, then
2Eo(A) if and only
If A
in Example 2, then
—
(7(A) = {1,0} while a(At) =
If A is similar to a matrix C, then A and C have the same eigenvalues. the same Jordan form, and the same characteristic polynomial. None of these are preserved by taking of the generalized inverse.
1 Example 1.2.2
Ii B= 10
0 1
Let A =
1
1
0 —2
2
L—i
ii
—11 Then A =
BJB'
where
1J
1
10 1 0] IIandJ=IO 0 0J.ThecharacteristicpolynomialofA
Liooi
Loo2J
and J is
ro o = Ji 0
A2 and 2— 2.
LO
and the characteristic polynomial of divisors 22,(2 — 1/2).
is
—
1/2)
0
o
0 1/2
with elementary
11
An easy computation gives At = 1/12
6
2—11 0
L—' —2
6
1J
— (1 — (1 + and hence a diagonal Jordan form. Thus, if A and C are similar, then about the only thing that one can always say about At and C is that they have the same rank. A type of inverse that behaves better with respect to similarity is discussed in Chapter VII. Since the generalized inverse does not have all the properties of the inverse, it becomes important to know what properties it does have and which identities it does satisfy. There are, of course, an arbitrarily large number of true statements about generalized inverses. The next theorem lists some of the more basic properties.
But At has characteristic polynomial 1(2 —
Theorem 1.2.1 (P1)
Then
Suppose that
(At)t=A 0
(P2) (At)* = (A*)t (P3) IfAeC,(AA)t = A'At where
ifA=0. = AAAt = AtAA* (P4) (P5) (A*A)t = AtA*t
= — 412 #0 and
=0
12
At = (A*A)?A* = A*(AA*)? (P7) (UAV)t = v*A?u* where U, V are unitary matrices. (P6)
Proof We will discuss the properties in the order given. A look at Definition 2 and a moment's thought show that (P1) is true. We leave (P2) and (P3) to the exercises. (P4) follows by taking the adjoints of both A = (AAt)A and A = A(AtA). (P5), since it claims that something is a generalized inverse, can be checked by using one of the definitions. Definition 2 is the quickest. AtA*tA*A = At(A*tA*)A = A?(AAt)*A = AtAA'A = AtA = Similarly, A*AA?A*? = A*(AAt)A*t = = A*(AAt)*A*t = A*(A*tA*)A*t = (A*A*?)(A*A*?) = A*A*t = = A?A*? by Definition 2. (A*A)? = Thus = = (P6) follows from (P5). (P7) is left as an exercise, a
01t
IA
Proposition
1.2.1
•.
I
I
:
0
=I [0
Ló The proof of Proposition 1 is left as an exercise. As the proof of Theorem I illustrates, it is frequently helpful to know the ranges and null spaces of expressions involving At. For ease of reference we now list several of these basic geometric properties. (P8) and (P9) have been done already. The rest are left as an exercise. :AmJ
Theorem 1.2.2 If then (P8) R(A) = R(AAt) = R(AA*) (P9) R(At) = R(A*) = R(AtA) = (PlO) R(I — AA') = N(AAt) = = N(At) = R(A)(P11) R(I — AtA)= N(AtA) = N(A)= 3.
Computation of At
In learning any type of mathematics, the working out of examples is useful, if not essential. The calculation of At from A can be difficult, and will be discussed more fully later. For the present we will give two methods which will enable the reader to begin to calculate the generalized inverses of small matrices. The first method is worth knowing because using it should help give a feeling of what A' is. The method consists of 'constructing' At according to Definition 1.
Example 1.3.1
[1127 Let A = I 0
Ii
2
2 I . Then R(A*) is
0
1
spanned by
L'°' {
[i].
[o]. [o]}. A subset forming a basis of R(A*) is
THE MOORE—PENROSE OR GENERALIZED INVERSE
[?] =
=
roi
3
and At 4 =
I1
. I
We now
must calculate a basis for R(A)-- =
N(A*).
Lii
1
+ x4 [1/2] where
Solving the system A*x =0 we get that x = x3
—1
x3.x4eC. Then At
—11
=At
1/2
1/21
= O€C3. Combining
all
of this
01
0 3
3
—1
gives At 2
4
1/2
21 21 ri
At=I0
0 1
1
Li
1J
—1 1/2 =
1
0
0
1
0 0 0
01
oJ
[1
0
10
1
o
01
0 Olor
Lii
0
oJ
3
3
—1
—1
2
4
1/2
1,2
2
I
1
0
2
I
0
1
—l
The indicated inverse can always be taken since its columns form a basis for R(A) R(A)L and hence are a linearly independent set of four vectors in C4. Below is a formal statement of the method described in Example 1. Xfl Theorem 1.3. 1 Let AECTM have rank r. If {v1 , v2,... "r} is a basis for R(A*) and {w1 , w2, ... , is a basis for N(A*), then
Proof By using Definition I
I
I
A'[Av 1I••.I'Av,:w I
I
Furthermore,
I
I
I
I
I
I
1.1
it
is clear that
I
{Av1,Av2, ... ,Av,) must be a basis for R(A). Since R(A)1 =
N(A*), it follows that the matrix [Av1 ...
w1
...
non-singular. The desired result is now immediate. •
must be
13
14 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
The second method involves a formula which is sometimes useful. It
depends on the following fact:
Proposition
then there exists that A = BC and r = rank (A) = rank (B) = rank (C).
such
1.3.1
The proof of Proposition 1 is not difficult and is left as an exercise. It means that the next result can, in theory, always be used to calculate At. or (B*BYl if C or B have See Chapter 12 for a caution on taking small singular values. Theorem 1.3.2 If A = BC where r = rank (A) = rank (B) = rank (C), then At
Prool Notice that
= C(CC) '(BB) iB*.
and CC are rank r matrices in C' 'so that it X
makes sense to take their inverses. Let X = C*(CC*) *(B*B) IB*. We will show X satisfies Definition 3. This choice is made on the grounds that the more complicated an expression is, the more difficult it becomes geometrically to work with it, and Definition 3 is algebraic. Now AX = BCC*(CC*) i(B*B) 1B* = B(BB) 'B,so(AX) = AX. Also XA = C*(CC*) IB*BC = C*(CC*) 1C, so (XA)* = XA. Thus (iii) and (iv) hold. To check (1) and (ii) use XA = C*(CC*) 'C to get that A(XA) = BC(C*(CC*)... 'C) = BC = A. And (XA)X = C(CC) 1CC*(CC*)i x (B*B) 'B = C*(CC*) '(BB) IB* = X. Thus X = At by Definition 3.U
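The formula of Theorem 2 translates directly into code. A minimal sketch of our own (assuming NumPy; the random full-rank factors are arbitrary, and in practice one should heed the warning about small singular values, since CC* or B*B may be ill conditioned):

    import numpy as np

    def pinv_from_full_rank_factorization(B, C):
        # A = B C with rank(A) = rank(B) = rank(C) = r; returns C*(CC*)^{-1}(B*B)^{-1}B*.
        Bs, Cs = B.conj().T, C.conj().T
        return Cs @ np.linalg.inv(C @ Cs) @ np.linalg.inv(Bs @ B) @ Bs

    rng = np.random.default_rng(4)
    B = rng.standard_normal((5, 2))           # full column rank
    C = rng.standard_normal((2, 4))           # full row rank
    A = B @ C                                 # rank 2

    assert np.allclose(pinv_from_full_rank_factorization(B, C), np.linalg.pinv(A))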
Example 1.3.2
1.A=BC where
Let
BeC2'" and CeC1 x3 In fact, a little thought shows that A
Then B*B=[5],CC*=[6].ThusAt= Ill
1
2].
2
LzJ 1
is typical as the next result shows.
Theorem 1.3.3
If
and rank(A) = 1, then At = !A* where
The proof is left to the exercises. The method of computing At described in Example 1.3.1 and the method of Theorem 1.3.2 may both be executed by reducing A by elementary row operations.
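For a rank-one matrix the generalized inverse is a rescaled conjugate transpose, the scalar being the sum of the squared moduli of the entries of A (its squared Frobenius norm). A quick numerical check, our own sketch assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(5)
    u = rng.standard_normal((4, 1)) + 1j * rng.standard_normal((4, 1))
    v = rng.standard_normal((1, 3)) + 1j * rng.standard_normal((1, 3))
    A = u @ v                                  # a rank-one 4 x 3 matrix

    alpha = np.sum(np.abs(A) ** 2)             # = tr(A*A), the squared Frobenius norm
    assert np.allclose(A.conj().T / alpha, np.linalg.pinv(A))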
Definition 1.3.1 A matrix echelon form .fE is of the form
which has rank r is said to be in row
(1) (m — r) X
THE MOORE—PENROSE OR GENERALIZED INVERSE
where the elements c, of C (
15
= C,. satisfy the following conditions,
(1)
(ii) The first non-zero entry in each row of C is 1. (iii) If = 1 is the first non-zero entry of the ith row, then the jth column of C is the unit vector e1 whose only non-zero entry is in the ith position.
For example, the matrix
120 —2 3501 400 E= 0 0 0 0 0 000 0000 000 0000 00
1
1
3
(2)
is in row echelon form. Below we state some facts about the row echelon form, the proofs of which may be found in [65]. For A e CTM "such that rank (A) = r:
(El) A can always be row reduced to row echelon form by elementary row operations (i.e. there always exists a non-singular matrix P€C" such that PA = EA where EA is in row echelon form). (E2) For a given A, the row echelon form EA obtained by row reducing A is unique. (E3) If Eq is the row echelon form for A and the unit vectors in EA appear in columns i2,... , and i,, then the corresponding columns of A are a basis for R(A). This particular basis is called the set of distinguished columns of A. The remaining columns are called the undistinguished columns of A. (For example, if A is a matrix such that its row echelon form is given by (2) then the first, third, and sixth columns of A are the distinguished columns. (E4) If EA is the row echelon form (1) for A, then N(A) = = N(C). (ES) If (1) is the row echelon form for A, and if the matrix made up of the distinguished columns of A (in the same order as they are in A), then A = BC where C is obtained from the row echelon form. This is a full rank factorization such as was described in Proposition 1.
Very closely related to the row echelon form is the hermite echelon form. However, the hermite echelon form is defined only for square matrices.
Definition 1.3.2
tf its elements
A matrix
is said to be in hermite echelon form satisfies the following conditions.
(i) H is upper triangular (i.e. =0 when i (ii) is either 0 or 1. (iii) If h,1 =0, then h, = Ofor every k, 1 k (iv) If h1, = 1, then hkg = Ofor every k # I.
>j). n.
16
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
For example, the matrix
120 0000 3501 000 —2 4 0 0 00 H=000 0000 000 0000 000 0013 000 0000 1
is in hermite form. Below are some facts about the hermite form, the proofs of which may be found in [65). For can always be row reduced to a hermite form. If A is reduced to its row echelon form, then a permutation of the rows can always be performed to obtain a hermite form. (H2) For a given matrix A the hermite form HA obtained by row reducing A is unique. (H3) = HA (i.e. HA is a projection). (H4) N(A) = N(HA) = R(I — HA) and a basis for N(A) is the set of non-zero columns of I — HA. A
We can now present the methods of Theorems 1 and 2 as algorithms.
Algorithm 1.3.1
To obtain the generalized inverse of a square matrix
Ae CA
Row reduce A* to its hermite form HA,. (II) Select the distinguished columns of A*. Label these columns v, and place them as columns in a matrix L. '1' (III) Form the matrix AL. (IV) Form I — HA. and select the non-zero columns from this matrix. Label these columns w1 ,w2,... , (V) Place the columns of AL and the w1's as columns in a matrix (I)
M= rows of M -
and compute M '.(Actually only the first r l are
needed.)
(VI) Place the first r rows of M '(in the same order as they appear in M ')in a matrix called R. (VII) Compute At as At = LR. Although Algorithm 1 is stated for square matrices, it is easy to use it for non-square matrices. Add zero rows or zero columns to construct a square
matrix and use the fact that Algorithm 1.3.2 inverse for any
=
= [AtjO*].
To obtain the full rank factorization and the generalized
THE MOORE—PENROSE OR GENERALIZED INVERSE 17
(I) Reduce A to row echelon form EA.
(II) Select the distinguished columns of A and place them as the columns in a matrix B in the same order as they appear in A. (III) Select the non-zero rows from EA and place them as rows in a matrix C in the same order as they appear in EA. (IV) Compute (CC*yl and (B*B)'. (V) Compute A' as At = C*(CC*) l(B*BY We will use Algorithm 1 to find At where
Example 1.3.3
ri A
4 6
2
1
—10 0
1
1
0 0
1
40
12 Lo
(I) Using elementary row operations on A* we get that its hermite echelon form is
10
10
00
01
H
(H) The first, second and fourth columns of A* are distinguished. Thus
ft 2 0 12 4 0 L=11 0
0
6
1
(III) Then
AL=
22 34
34 56
5
6
41 61
461 1
(IV) The non-zero column oil — HA. is = [— 1, 1/2. 1,0]*. (V) Putting AL and w1 into a matrix M and then computing M' gives r22
34
M— 134
56 6
J5
4 6
—
40
—20
19
14
1
46 —4 —44
0
40
20
1
1/2
1
61
(VI) The first three rows of M ' give R as 40 i 1
—20 14
50 —26
[—46—4 -44
—90 18
342
50 —26
40
—
901
18!
oJ
18 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
(VII) Thus
4 —1 —27
1
2 20 0
—
Example 1.3.4
A=
8
—10 0
—2 —54 25
—45
0
45
We will now use Algorithm 2 to fmd At where
12141 24066 0 24066 1
2
3
3
(I) Using elementary row operations we reduce A to its row echelon form
Ii
2
0
3
31
10
0
1
1
—21
EA=10
00
0
01
L0000
oJ
(II) The first and third columns are distinguished. Thus
B=[? 1]. (III) The matrix C is made up of the non-zero rows of EA so that
20 Lo 0
(IV) Now CC*
1
3
3
1
—2
[23
B*B =[1?
—
Calculating
=
we get
(V) Substituting the results of steps (II), (III) and (IV) into the formula
for At
At = C*(CC*) l(B*B) 1B*
=1
27 54 207 288 —333
6
3
6
12
6
12
—40 —20 —40 —22 —11 —22 98
49
98
THE MOORE—PENROSE OR GENERALIZED INVERSE
19
Theorem 2 is a good illustration of one difficulty with learning from a
text in this area. Often the hard part is to come up with the right formula. To verify it is easy. This is not an uncommon phenomenon. In differential equations it is frequently easy to verify that a given function is a solution. The hard part is to show one exists and then to find it. In the study of generalized inverses, existence is usually taken care of early. There remains then the problem of finding the right formula. For these reasons, we urge the reader to try and derive his own theorems as we go. For example, can you come up with an alternative formula to that of Theorem 2? The resulting formula should, of course, only involve At on one side. Ideally, Then ask yourseff, can I do better by it would not even involve B' and imposing special conditions on B and C or A? Under what conditions does the formula simplify? The reader who approaches each problem, theorem and exercise in this manner will not only learn the material better, but will be a better mathematician for it. 4.
Generalized inverse of a product
As pointed out in Section 2, one of the major shortcomings of the
Moore—Penrose inverse is that the 'reverse order law' does not always hold, that is, (AB)' is not always BtAt. This immediately suggests two questions. What is (AB)'? When does (AB)t = BtAt? The question, 'What is (AB)'?' has a lot of useless, or non-answer, answers. For example, (AB)' = is a non-answer. It merely restates condition (ii) of the Penrose definition of (AB)t. The decision as to whether or not an answer is an answer is subjective and comes with experience. Even then, professional mathematicians may differ on how good an answer is depending on how they happen to view the problem and mathematics. The authors feel that a really good answer to the question, 'What is (AB)'?' does not, and probably will not exist. However, an answer should:
(A) have some sort of intuitive justification if possible; (B) suggest at least a partial answer to the other question, 'When does
= B'A'?' Theorem 4.1 is, to our knowledge, the best answer available. We shall now attack the problem of determining a formula for (AB)'. The first problem is to come up with a theorem to prove. One way to come up with a conjecture would be to perform algebraic manipulations on (AB)' using the Penrose conditions. Another, and the one we now follow, is to draw a picture and make an educated guess. If that guess does not work, then make another. Figure 1.1 is, in a sense, not very realistic. However, the authors find it a convenient way to visualize the actions of linear transformations. The vertical lines stand for CTM, and C. A sub-interval is a subspace. The rest of the interval is a (possibly orthogonal) complementary subspace. A
20 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS C
1'
— C
C,,
---—=A '45R (5) Fig. 1.1
shaded band represents the one to one mapping of one subspace onto Xfl another. It is assumed that A€C'" and BEC" ". In the figure: (a, c) =
R(B*), (a', c') = R(B), (b', d') = R(A*), and (b", d") = R(A). The total shaded band from C" to C" represents the action of B (or The part that is shaded more darkly represents PR(A.)B = AtAB. The total shaded area
from CN to C'" is A. The darker portion is APR(B) = ABBt. The one to one from C" to C'" may be viewed as the v-shaped portion of the mapping dark band from C" to C'". Thus to 'undo' AB, one would trace the dark band backwards. That is, (AB)t = (PRA.B)t(APRB)t. There are only two things wrong with this conjecture. For one,
AB = A(AtA)(BBt)B
(1)
and not AB = A(BBt)(A?A)B. Secondly, the drawing lies a bit in that the interval (b', c') is actually standing for two slightly skewed subspaces and not just one. However, the factorization (I) does not help and Fig. 1.1 does seem to portray what is happening. We are led then to try and prove the following theorem due to Cline and Greville.
Theorem 1.4.1
IfA€C'"
X
and BEC"
X
then (AB)t = (PR(A. )B)(APR(B)).
Proof Assuming that one has stumbled across this formula and is wondering if it is correct, the most reasonable thing to do is see if the formula for (AB)t satisfies any of the definitions of the generalized inverse. Let X = (PR( A.)B)'(APR(B))t = (AtAB)t(ABBt)t. We will proceed as follows. We will assume that X satisfies condition (i) of the Penrose definition. We will then try and manipulate condition (1) to get a formula that can be verified independently of condition (1). If our manipulations are reversible, we will have shown condition (i) holds. Suppose then that ABXAB = AB, or equivalently, AB(AtAB)?(ABBt)tAB = AB.
(2)
Multiply (2) on the left by At and on the right by Bt. Then (2) becomes A?AB(AtAB)t(ABBt)tABBt = AtABBt,
(3)
THE MOORE—PENROSE OR GENERALIZED INVERSE
or,
21
A.PR(B1• Equivalently,
BVA' ) =
(4)
= To see that (4) is true, we will show that if E1
=
R(BPBB' R(A),E2 =
(5)
Suppose first that u€R(B)-. Then E1u = E2u =0. Suppose then that ueR(B). Now to find E1u we will need to But BBtR(A*) is a subspace of R(B). Let u = u1 calculate R(A*) and u2e[BBtR(A*)]1 n R(B). Then where u1 then E1u = E2u for all
=
R(BPBB'
=
R(B)U1 =
(6)
AtAu1.
(7)
Equality (7) follows since u1 ER(B) and the projection of R(B) onto AtAR(B) is accomplished by A'A. Now E2u = PR(A.)PR(B)u = PR(A.)u = AtAii, SO = A'Au1, that is, if U2EN(A) = (4) will now follow provided that R(A*)J.. Suppose then that vER(A*). By definition, u2 BBty. Thus o = (BBtv,u2) = (V,BBtU2) = (v,u2). Hence u2eR(A*)I. as desired. (4) is now established. But (4) is equivalent to (3). Multiply (3) on the left by A and the right by B. This gives (2). Because of the particular nature of X it turns out to be easier to use ABXAB = AB to show that X satisfies the Moore definition Of(AB)t. Multiply on the right by (AB)t. Then we have But N(X) c N((ABBt)t) = R(ABBt)1 = R(AB). ABXPR(AB) = Thus XPR(As)
=
and hence
ABX=PR(AB).
(8)
)XAB = Now multiply ABXAB = AB on the left by (AB)t to get B•A• But R(X) c R((AtAB)t) = R((AtAB)*) = R(B*A*A*t) = R(B*A*) and hence BA X = X. Thus XAB=PR((AB).).
(9)
Equations (8) and (9) show that Theorem 1 holds by the Moore definition.
U Comment It is possible that some readers might have some misgivings about equality (7). An easy way to see that R(B)U1 = AtAu1 is as EAtAR(B). But follows. Suppose that u1 ER(B). Then A(u1 — AtAu1)= Au1 — Au1 = 0. Thus u1 — AtAu1eR(A*)1 c(AfAR(B))L. Hence u1 = AtAU1 (u1 — Thus PR(AAB)u3 = A'Au1. The formula for simplifies if either = I or = I.
Corollary 1.4.1 Suppose that AECTM "and BeC" P (i) If rank (A) = n, then (AB)' = Bt(APR(a,)t. )B)A. (ii) If rank (B) = n, then (AB)' =
22 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Part (i) (or ii) of Corollary 1 is, of course, valid if A (or B) is invertible.
and rank(A)= rank(B) = n. Then (AB)t = BtAt and At = A*(AA*)l while Bt = (B*B) 1B*. The formulas for At and Bt in Corollary 1 come from Theorem 3.2 and B In fact Corollary 1 can be and the factoring of A = derived from Theorem 3.2 also. The assumptions of Corollary 2 are much more stringent than are necessary to guarantee that (AB)t = BtAt.
Corollary 1.4.2
Example 1.4.1
Suppose that
LetA=
10101 10
0
1a001
ii andB=
10
andBt=
01 where a,b,ceC.
LOOd
L000J 10001 0 01
b
10
bt
cland(AB)t= Ibt 0 0l.Ontheotherhand ctOJ L000J Lo 0
10001
BtAt= Ibt 0 LO
sothat(AB)t=BtAt.NoticethatBA=
ctOJ
IOaOl 10 0 bI. L000J
By varying the values of a,b and cone can see that (AB)t = B?At
possible without
(i)AB=BA (a#b) (ii) rank(A)=rank(B) (iii)
(a=b=0,c=1)
rank(AB)=rank(BA) (a=b#0,c=0).
The list can be continued, but the point has been made. The question
remains, when is (AB)t = BtAt? Consider Example 1 again. What was it about that A and B that made (AB)t = BtAt? The only thing that A and B seem to have in common are some invariant subspaces. The subspaces R(A), R(A), N(A), and N(A) are all invariant for both A and B. A statement about invariant subspaces is also a statement about projectors. A possible method of attack has suggested itself. We will assume that = BtAt. From this we will try to derive statements about projectors. In Example 1, MA and B were simultaneously diagonalizabk so we should see if MA works in. Finally, we should check to see if our conditions are necessary as well as sufficient. Assume then that and
= BtAt.
(10)
Theorem 1 gives another formula for (AB)'. Substitute that into (10) to get (AtAB)t(ABBt)t = B?At. To change this into a projector equation,
THE MOORE—PENROSE OR GENERALIZED INVERSE
23
multiply on the left by (A'AB)and on the right by (ABBt), to give
PR(A'AB) P
—'P R(A') PR(B)1
11
By equation (4), (11) can be rewritten as
and
=
hence is a projector. But the product of two hermitian projectors is a projector if and only if the two hermitian projectors commute. Thus (recall that [X, Y] = XY — YX) = 0, or equivalently, [A'A,BB'] = 0, If(AB)t = B?At. (12)
Is (12) enough to guarantee (10)? We continue to see if we can get any additional conditions. Example 1 suggested that an AA term might be useful. If(10) holds, then ABBtAI is hermitian. But then A*(ABBtAt)A is hermitian. Thus A*(ABBt)A?A = AtABBtA*A. (13) Using (12) and the fact that or
= A5, (13) becomes
=
{A*A,BBtJ = 0.
(14)
Condition (14) is a stronger condition than (12) since it involves AA and not JUSt
AA )•
In Theorem 1 there was a certain symmetry in the formula. It seems unreasonable that in conditions for (10) that either A or B should be more important. We return to equation (10) to try and derive a formula like (14) but with the roles of A and B 'reversed'. Now BtAtAB is hermitian is hermitian. Proceedsince we are assuming (10) holds. Thus ing as before, we get that
[BB*,AtA]=O.
(15)
Continued manipulation fails to produce any conditions not implied by
(14) and (15). We are led then to attempt to prove the following theorem. Theorem 1.4.2
Suppose that A e C'" statements are equivalent:
Xfl
and BE C"
X
Then the following
(AB)' = BtAt. (ii) BB*AtA and A*ABBt are hermitian. (iii) R(A5) is an invariant subspace for BB5 and R(B) is an invariant (1)
subs pace of A5A.
= 0 and =0. AtABBSAS (v) = BB5A5 and BB'A5AB = A5AB.
(iv)
Proof Statement (iii) is equivalent to equations (14) and (15) so that (i) implies (ii). We will first show that (ii)—(v) are all equivalent. Since BBS and A5A are hermitian, all of their invariant subspaces are reducing. Thus (ii) and (iii) are equivalent. Now observe that if C is a matrix and M a
24 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
= (I — = then + PMCPtI. This + is an invariant subspace if and only if =0. Thus says that (iii) and (iv) are equivalent. Since (v) is written algebraically, we will show it is equivalent to (ii). Assume then that BB*AtA is hermitian. Then BB*AtA = AtABB*. Thus BB*(AtA)A* = A?ABB*A*, or subspace,
BB*A* = AtABB*A*.
(16)
Similarly if A*ABBt is hermitian. then BBtA*AB = A*AB
(17)
so that (ii) implies (v). Assume now that (16) and (17) hold. Multiply (17) on the right by Bt and (16) on the right by A*t. The new equations are precisely statement (iii) which is equivalent to (ii). Thus (ii)—(v) are all equivalent. Suppose now that (ii)—(v) hold. We want to prove (1). Observe that BtAt = Bt(BBt)(AtA)At = Bt(AtA)(BBt)At, while by Theorem I (AB)t = Theorem 2 will be proven if we can show that
(AtAB)t=Bt(AtA) and
(18)
(ABBt)t = BBtAt.
(19)
To see 11(18) holds, check the Penrose equations. Let X = Bt(AtA). Then AtABX = AtABB?AtA = AtAAtABBt = A?ABBt = A'ABB') = Thus Penrose conditions (i) and (iii) are satisfied. Now X(AtAB) = Bt(AtA)(AtA)B = Bt(AtA)B. Thus X(AtAB)X = Bt(AtA)BB?(AtA) = BtAtA = X and Penrose condition (ii) is satisfied. There remains then only to show that X(A'AB) is hermitian. But A?ABB* is hermitian by assumption (ii). Thus B?(A?ABB*)Bt* is hermitian and Bt(A?A)BB*B*t = Bt(AtA)B = X(A'AB) as desired. The proof of (19) is similar and left to
the exercises. S It is worth noticing that conditions (ii)—(v) of Theorem 1.4.2 make Fig. 1.1 correct. The interval (b', c') would stand for one subspace R(AtABB?) rather than two skewed subspaces, R(AtABBt) and R(BBtAtA). A reader might think that perhaps there is a weaker appearing set of conditions than (ii)—(v) that would imply (ABt) = BtAt. The next Example shows that even with relatively simple matrices the full statement of the conditions is needed. 10 0 01 ri 0 0] Example 1.4.2 Let A = 10 1 and B = 10 1 0 I. Then B = B* = Lo i oJ Lo 0 OJ 1
10
00 ] 100
Bt=B2andAt=IO 11 Lo
i]'I=Io 0
Li oJ J
[0 i
01
1I.N0wBB*AtAis
—ij
hermitian so that [BB*, AtA] = 0 and [BBs, AtA] =0. However,
10001
A*ABBt = 10 2 LO
I
which is not hermitian so that (AB)t # B'At. 0J
THE MOORE—PENROSE OR GENERALIZED INVERSE
25
An easy to verify condition that implies (AB)t = BtAt is the following.
Corollary 1.4.3 IJA*ABB* = BB*A*A, then (AB)t = B'A'. The proof of Corollary 2 is left to the exercises. Corollary 2 has an advantage over conditions (ii)—(iv) of Theorem 2 in that one does not have to calculate At. Bt, or any projectors to verify it. It has the disadvantage that it is only sufficient and not necessary. Notice that the A, B in Example I satisfy [AtA, BB*] =0 while those in Example 2 do not. There is another approach to the problem of determining when (AB)t = BtAt. It is to try and define a different kind of inverse of A, call it, A so B A . This approach will not be discussed. that (AB)
5. 1.
Exercises Each of the following is an alternative set of equations whose unique solution X is the generalized inverse of A. For each definition show that it is equivalent to one of the three given in the text.
(a) AX = PR(A),MX ) = N(A). (b) AX = PR(A),XA = PR(A.),XAX = X. (c) XAA* =
d
A*,XX*A* = x.
XAx—J"
if
10 ifxeN(A*).
(e) XA = PR(A),N(X) = N(A*). Comment: Many of these have appeared as definitions or theorems in the
literature. Notice the connection between Example 2.1, Exercise 1(b), and Exercise 3 below. 2. Derive a set of conditions equivalent to those given in Definition 1.2 or Definition 1.3. Show they are equivalent. Can you derive others? 3. Suppose that A€Cm Xfl Prove that a matrix X satisfies AX = XA = if and only if X satisfies (i), (iii), and (iv) of Definition 1.3. Such an X is called a (1,3,4)-inverse of A and will be discussed later. Observe that it cannot be unique if it is not At since trivially At is also a (1,3,4)-inverse.
4. Calculate At from Theorem 3.1 when A
and when
=
Hint: For the second matrix see Example 6. 5. Show that if rank (A) =
1,
then At = !A* where k = 1=1 j= 1
6.
If and rank (A) = n, notice that in Theorem 3.2, C may be chosen as an especially simple invertible matrix. Derive the formula
26 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
for At under the assumption that rank A = n. Do the same for the case
when rank (A) = 0 f 0 0 7. Let A = 1
I
0
L—1
1
0
21 I. Cakulate At from Definition 1.1. 1
—1 0]
8. Verify that (At)* = (A*)t. 9. Verify that (AA)t = AtAt, AEC, where ,V = 10. If AeCTM
Xiv
1
if A
0 and 0' =0.
and U, V are unitary matrices, verify that (UAV)t =
V*AtU*. 11. Derive an explicit formula for A' that involves no (t)'s by using the singular value decomposition.
*12. Verify that At =
*13. Verify the At =
I 2iri
f !(zI z
— A*A) IA*dz where C is a closed contour
containing the non-zero eigenvalues of A*A, but not containing the zero eigenvalue of A*A in or on it.
14. Prove Proposition 1.1. Exercises 15—21 are all drawn from the literature. They were originally done by Schwerdtfeger, Baskett and Katz, Greville, and Erdelyi. Some follow almost immediately from Theorem 4.1. Others require more work.
such that rank A = rank B and if the eigenvectors corresponding to non-zero elgenvalues of the two matrices ASA and BB* span the same space, then (AB)' = B'A'. and AAt = AtA,BBt = BtB,(AB)tAB= 16. Prove that if AEC AB(AB)', and rank A = rank B = rank(AB), then (AB)' = BtAt. Be 17. Assume that AECTM Show that the following statements are equivalent. 15. Prove that if AECrnXv and
X
(i) (AB)t = B'A'. (ii) A?ABB*A*ABBt = BB*A*A. (iii) A'AB = B(AB)'AB and BBtA* = A*AB(AB)t. (iv) (AtABBt)* = (AtABBt)t and the two matrices ABBtAt and UtAtAB are both hermitian.
18. Show that if [A, = 0, [At, (Bk] =0, [B, =0, and [Bt,PR(A,)] = 0, then (AB)' = 19. Prove that if A* = At and B* = Bt and if any of the conditions of Exercise 18 hold, then (AB)' = BtAt = B*A*. 20. Prove that if A* = A' and the third and fourth conditions of Exercise 18 hold, then (AB)' = BtAt = BtA*. 21. Prove that if B* = B' and the first and second conditions of Exercise 18 hold, then (AB)t = BtAt. 22. Prove that if [A*A, BB*] =0, then (AB)t = B'A'.
THE MOORE—PENROSE OR GENERALIZED INVERSE
27
Verify that the product of two hermitian matrices A and B is hermitian if and only if [A, B] =0. 24. Suppose that P. Q are hermitian projectors. Show that PQ is a projector if and only if [P, Q] =0. = 25. Assuming (ii)—(v) of Theorem 8, show that (APR(B))t = BtAt. PR(B)A without directly using the fact that (AS)' = 26. Write an expression for (PAQ)t when P and Q are non-singular. 27. Derive necessary and sufficient conditions on P and A, P non-singular, 23.
for (P 'AP)' to equal P 'A'P. X
28. Prove that if Ae Br and the entries of A are rational numbers, then the entries of At are rational.
2
Least squares solutions
1.
What kind of answer is Atb?
At this point the reader should have gained a certain facility in working with generalized inverses, and it is time to find out what kind of solutions they give. Before proceeding we need a simple geometric lemma. Recall \1/2 if w = [w1, ... , w,3*GCP, then l( w = E 1w112) = (w*w)U2 that denotes the Euclidean norm of w.
Lemma 2.1.1
Ifu,veC"and(u,v)=O,thenhlu+v112=11u112+11v112.
Proof Suppose that
and (u,v) =0. Then
llu+v112 =(u+v,u+v)=(u,u)+(v,u)+(u,v)+(v,v)= hull2 + 11,112. Now consider again the problem of finding solutions u to (1)
If(1) is inconsistent, one could still look for u that makes Au — as possible.
b
as small
Definition 2.1.1
Suppose that and b€Cm. Then a vector is called a least squares solution to Ax = b if II Au — b Av — b for all veC'. A vector u is called a minimal least squares solution to Ax = b tf u is a least squares solution to Ax = b and liii < w for all other least squares solutions w.
The name 'least squares' comes from the definition of the Euclidean
norm as the square root of a sum of squares. If beR(A), then the notions of solution and least squares solution obviously coincide. The next theorem speaks for itself. Theorem 2.1.1
Suppose that AeCtm'" and beCTM. Then Atb is the
minima! least squares solution to Ax = b.
LEAST SQUARES SOLUTIONS 29
Proof Notice that IIAx—b112 = II(Ax
(I -AA')b112
—
= IlAx —
+ 11(1 —AAt)b112.
Thus x will be a least squares solution if and only if x is a solution of the consistent system Ax = AAtb. But solutions of Ax = AAtb are of the form
AtA)h = — A'A)h. Since 11x112 = IIAtbII2 we see that there is exactly one minimal least squares solution x = Atb. As a special case of Theorem 2.1.1, we have the usual description of an orthogonal projection.
x=
—
Corollary 2.1.1
Suppose that M is a subs pace of
•
and
is the
orthogonal projector of onto M. If beCI*, then PMb is the unique closest vector in M to b with respect to the Euclidean norm.
In some applications, the minimality of the norm of a least squares solution is important, in others it is not. If the minimality is not important, then the next theorem can be very useful.
Theorem 2.1.2 Suppose that AECTM "and beCTM. Then the following statements are equivalent (1) u is a least squares solution of Ax = b,
(ii) u is a solution of Ax = AAtb, (iii) ii is a solution of A*Ax = A*b. (iv) u is of the form A'b + h where heN(A).
Proof We know from the proof of Theorem 1 that (i), (ii) and (iv) are equivalent. If (1) holds, then multiplying Au = b on the left by A* gives (iii). On the other hand, multiplying A*Au = A*b on the left by A*t gives Au = AA'b. Thus (iii) implies (ii). U Notice that the system of equations in statement (iii) of Theorem 2 does not involve At and is a consistent system of equations. They are called the normal equations and play an important role in certain areas of statistics.
It was pointed out during the introduction to this section that if X satisfies AXA = A, and be R(A), then Xb is a solution to (1). Thus, for consistent systems a weaker type of inverse than the Moore—Penrose would suffice. However, if then the condition AXA = A is not enough to guarantee that Xb is a least squares solution.
Fact There exist matrices X, A and vector b, R(A), such that AXA = A but Xb is not a least squares solution of Ax = b. Example 2.1.1
If X satisfies AXA = A, then X is of the
Let A
= form 1
Lx21
squares
1. Let b =
X22J
1 1. Then by Theorem 2 a vector u isa least L1J
solution to Ax = b if and only if Ax = b1 where b1
least squares solution, then IIAu — bil =
11b1 —
bli
=
1.
r1i
If u is a
= Loi [1 +2X1 2].
But A(Xb) =
30 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
= (1 +41 x12 12)1/2. If x22 0 then A(Xb) — b > I and Thus A(Xb) — Xb will not be a least squares solution Ax = b.
Example 2 also points out that one can get least squares solutions of the form Xb where X is not At. Exactly what conditions need to be put on X to guarantee Xb is a least squares solution to Ax = b will be discussed in Chapter 6.
2.
Fitting a linear hypothesis
Consider the law of gravitation which says that the force of attraction y
between two unit-mass points is inversely proportional to the square of the distance d between the points. If x = l/d2. then the mathematical formulation of the relationship between y and x isy =f(x) = /ix where fi is an unknown constant. Because the functionf is a linear function, we consider this to be a linear functional relationship between x and y. Many such relationships exist. The physical sciences abound with them. Suppose that an experiment is conducted in which a distance d0 between two unit masses is set and the force of attraction y0 between them is measured. A value for the constant fi is then obtained as fi = y0/x0 = However, if the experiment is conducted a second time, one should not be greatly surprised if a slightly different value of fi is obtained. Thus, for the purposes of estimating the value of fi, it is more realistic to say that for each fixed value of x, we expect the observed values yj of yto satisfy an equation of the form y1 = fix + e1 where ej is a measurement error which occurs more or less at random. Furthermore, if continued observations of y were made at a fixed value for x, it is natural to expect that the errors would average out to zero in the long run. Aside from measurement errors, there may be another reason why different observations of y might give rise to different values of/i. The force of attraction may vary with unknown quantities other than distance (e.g. the speed of the frame of reference with respect to the speed of light). That is, the true functional relationship may be y = fix + g(u1 , u2, ... ,UN) where the function g is unknown. Here again, it may not be unreasonable to expect that at each fixed value of x, the function g assumes values more or less at random and which average out to zero in the long run. This second type of error will be called functional error. Many times, especially in the physical sciences, the functional relationship between the quantities in question is beyond reproach so that measurement error is the major consideration. However, in areas such as economics, agriculture, and the social sciences the relationships which exist are much more subtle and one must deal with both types of error. The above remarks lead us to the following definition.
Definition 2.2.1 When we hypothesize that y is related linearly to x1 ,x2, ... ,xN, we are hypothesizing that for each set of values p1 = (x11, x12,... , x1)for x1, x2,... ,x,,, the observations y1for y atp1 can be expressed where(i)/i0,fi1,...,fiare asy,=fi0+fi1x11 +fi2x12 + ...
LEAST SQUARES SOLUTIONS
31
unknown constants (called parameters). (ii) e,1 is a value assumed by an unknown real valued function e, such that e, has the property that the values which it assumes will 'average out' to zero over all possible observat ions y at p..
That is, when we hypothesize that y is related linearly to x3 , x2, ...
,;,
are hypothesizing that for each point p = (x1, , the 'expected ... , value', E(,y.), of the observation y. at (that is, the 'average observation' at we
p1) satisfies the equation
=
+
+
+ ... +
and not that y1 does. This can be easily pictured in the case when only two
variables are involved. Suppose we hypothesize that y is related linearly to the single variable x. This means that we are assuming the existence of a linef(x) = II,,, + such that each point (x1, E(y.)) lies on this straight line. See Fig. 21.
In the case when there are n independent variables, we would be hypothesizing the existence of a surface in (which is the translate of a subspace) which passes through the points (p1. E(y.)). We shall refer to such a surface as a flat. In actual practice, the values E(y.) are virtually impossible to obtain exactly. Nevertheless, we will see in the next section that it is often possible to obtain good estimates for the unknown parameters, and therefore produce good estimates for the E(y.)'s while also producing a reasonable facsimile of the hypothesized line of flat. The statistically knowledgeable reader will by now have observed that we have avoided, as much as possible, introducing the statistical concepts which usually accompany this type of problem. Instead, we have introduced vague terms such as 'average out'. Admittedly, these terms being incorporated in a definition would (and should) make a good mathematician uncomfortable. However, our purpose in this section is to examine just the basic aspects of fitting a linear hypothesis without introducing statistical
The set of oil
E(.Vm)
possible
at x,,, set of all possible observations
4The set of oil possible observations at x1
A'
Fig. 2.1
32 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
concepts. For some applications, the methods of this section may be
sufficient. A rigorous treatment of linear estimation appears in Chapter 6. In the next two sections we will be concerned with the following two basic problems.
(I) Having hypothesized a linear relationship, y, = + + ... + for the unknown parameters P1. 1=0,1,... find estimates , n. + e,, (II) Having obtained estimates for the p1's, develop a criterion to help decide to what degree the functionf(x1 , x2, ... , = Po + + ... + 'models' the situation under question. 3.
Estimating the unknown parameters
We will be interested in two different types of hypotheses.
Definition 2.3.1
When the term
is present in the expression (1)
we shall refer to (1) as an intercept hypothesis. When does not appear, we will call (1) a no intercept hypothesis. Suppose that we have hypothesized that y is related linearly to by the no intercept hypothesis x1 , x2,...
,;
y1 = P1;1
+ P2;2 + ... +
(2)
+ e,1.
To estimate the parameters P1 , select (either at random or by ... , design) a set of values for the x's. Call them p1 = [x11,x12, ... ,x1J. Then observe a value for y at p1 and call this observation y1. Next select a second set of values for the x's and call them p2 = [x21 ,x22, ... ,x2j(they need not be distinct from the first set) and observe a corresponding value for y. Call it y2. Continue the process until m sets of values for the x's and m observations for y have been obtained. One usually tries to have rn> n. If the observations for the x's are placed as rows in a matrix
xli x21
xml
pm_i
which we will call the design matrix, and the observed values for y are
placed in a vector y = [y1 ,...
we may write our hypothesis (2) as
y=Xb+e, where b is the vector of unknown parameters b =
(3)
,... , p,,JT and e, is the
unknown e,,= [e,1,... ,e,,JT.
In the case of an intercept hypothesis (1) the design matrix X1 in the equation
y=X,b,+e,
(4)
LEAST SQUARES SOLUTIONS 33
takes on a slightly different appearance from the design matrix X which
arose in (3). For an intercept hypothesis. X1
is
of the form
ri I
I
[1 x2 1
I
i=I
:
Li
Xmi
and b3 is of the form
I
Xm2 b1
L'
maJipnx(n+1)
= [IJ0IbT]T,
=
b=
Consider a no intercept hypothesis and the associated matrix equation b is to use the (3). One of the most useful ways to obtain estimates information contained in X and y, and impose the demand that 6 be a vector such that 'X6 is as close to y as possible', or equivalently, 'e, is as close to 0 as possible. That is, we require 6 to be a least squares solution of Xb = y. Therefore, from Theorem 1.2, any vector of the form 6= Xty + h,heN(X), could serve as an estimate for b. If X is not of full column rank, to select a particular estimate one must impose further restrictions on 6. In passing, we remark that one may always impose the restriction that 11611 be minimal among all least squares estimates so that the desired estimate is 6= Xty. Depending on the application, this may or may not be the estimate to choose. defined by x6 = is For each least squares estimate 6 of b, the values', E(y), where an estimate for the vector of
E(y)= [E(y1),... Although it is a trivial consequence of Theorem 1.2, it is useful to observe the following. The vector = X6 is the same for all least sjuares solutions 6, of X6 = Moreover,for all least squares solutions 5, = x6 = = XXty and r = y — = = (I — XXt)y.
Theorem 2.3.1
A closely related situation is the following. Suppose (as is the case in
many applications) one wishes to estimate or predict a value for a particular linear combination (5)
of the
on the basis of previous observations. Here,
= [c1 ,... , ce].
That is, we want to predict a value, y(c*), for y at the point = [C1, c2, ... , cjon the basis of observations made at the points p1 , p2, ... p,,,. If = cab, we know that it may be possible to have we use = infinitely many estimates 6. Hence y(c*) could vary over an infinite set of values in which there may be a large variation. However, there when c*b is invariant among all least squares estimates 6, so that y(c*) has a unique value.
Theorem 2.3.2
Let
The linear form
least squares solutions of X6 = c*16
= C*X?Y.
invariant among all y !f and only iice R(X*); in which case, is
34 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Proof If S is a least squares solution of X6 = y, then S is of the form 6 = X'y + h where heN(X). Thus c"S = cXty + ch is invariant if and only if c*h =0 for all hEN(X). That is, ceR(X*) = N(X)L. • Note that in the important special case when X has full column rank, cb is trivially invariant for any Most of the discussion in this section concerned a no intercept hypothesis and the matrix equation (3). By replacing X by X1 and b by b,, similar remarks can be made about an intercept hypothesis via the matrix equation (4).
4.
Goodness of fit
Consider first the case of a no intercept hypothesis. Suppose that for
various sets of values p, of the x's, we have observed corresponding values y, for y and set up the equation
y=Xb+e,.
(1)
From (1) we obtain a set of least squares estimates parameters ,81
,
for the
...
As we have seen, one important application is to use the to estimate or predict a value for y for a given point = [c1 ,c2, ... ,cJ by means
of what we shall refer to as the estimating equation (2)
How good is the estimating equation? One way to measure its effectiveness is to use the set of observation points which gave rise to X and measure how close the vector y of observed values is to the vector of estimated values. That is how close does the flat defined by (2) come to passing From through the points (ps, y— Theorem 3.1 we know that is invariant among all least squares estimates S and that y — = r = 11(1— XXt)y II. One could be tempted to say that if r Ills small, then our estimating
equation provides a good fit for our data. But the term 'small' is relative. If we are dealing with a problem concerning distances between celestial objects, the value r = 10 ft might be considered small whereas the same value might be considered quite large if we are dealing with distances between electrons in an atomic particle. Therefore we need a measure of relative error rather than absolute error. Consider Fig. 2.2. This diagram Suggests another way to measure how close y is to The magnitude of the angle 9 between the two vectors is such a measure. In Ce', it is more convenient to measure Icos 0,, rather than I°I, by means of the equation cos 0i_j!.tll —
_IIxxtyII —
(Throughout, we assume y 0, otherwise there is no problem.) LikeWise, Isin 0, or stan might act as measures of relative error. Since y can be
LEAST SQUARES SOLUTIONS 35
R
Fig. 2.2
decomposed into two components, one in R(X) and the other in R(X)--, y = + r, one might say, in rough terms, that Icos 01 represents the 1. percentage ofy which lies in R(X). Let R = cos 0 so that 0 RI Notice that if IRI = 1, then all of y is in R(X),y = and r = 0. If R = 0, then y j.. R(X), =0, and r = y. Thus when I RI = 1, the flat defined by the actually passes 131x1 + j32x2 + ... + equationf(x1,x2, ... through each of the data points (p1. y1) so that we have an 'exact' or as possible and 'perfect' fit. When R =0, y is as far away from we say that we have no fit at all. In practice, it is common to use the term R2 = cos2 0 = I! 112/Il 112 rather than I RI to measure the goodness of fit. Note that R2 RI since R 1. Thus R2 is a more conservative measure. For example, if R2 = 0.96, one would probably consider the fit to be fairly good since this indicates that, in fact, about 98% of y lies in R(X). A familiar form for R2, when all numbers are real, is /.,. \2 1=1
R2—
1=1
1=1
where
denotes the ith entry of XXty = and y. is the ith entry of y.
This follows because II
112
[y*yJctyj = [y*fl = 1=1
Hence R2 =
Notice that R and R2
2 II 3' II
r=
is not. In statistical circles, R goes by the name of the product moment correlation and R2 is known as the between the observed y1's and the predicted are unit free measures whereas
coefficient of determination.
y
—
36 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Consider now the case of the intercept hypothesis
= Po + Pixji + ... +
+
As mentioned earlier, this gives rise to the matrix equation y = X,b + e,
where X, = [i!X1. If one wishes to measure the goodness of fit in this case, one may be
tempted to copy the no intercept case and use IIx,x,yII
(3
This would be a mistake. In using the matrix X,, more information would be used than the data provided because of the first column, j. The
expression (3) does not provide a measure of how well the flat
fitsthedatapoints Instead, the expression (3) measures how well the flat in C".2 fits the points = + + f(xO,xl,x2, ... (1, p.. y.). In order to decide on a measure of goodness of fit, we need the following fact. = The vector 6, = Theorem 2.4.1 Let X, ,fl,J' = [p0:ST]bT = [Ps.... ,$j, isa least squares solution of X,b, = y and only if = !j*(y — X6) (4) and 6 is a least squares solution of (5)
Here J =
is
a matrix of ones.
Proof Suppose first that
satisfies (4) and that Sis a least squares b"]' is a least squares solution of solution of(5). To show that = X,b, = y, we shall use Theorem 1.2 and show that 6, satisfies the normal = equations, Note first that
lw
'Lxi —I
Therefore, x1'x,S, = (6)
Xb) +
j*y
I X*Jy
—
x*JxG + x*x6
LEAST SQUARES SOLUTIONS
Since S is a least squares solution of (5), we
is a solution of
/
/ \
I
'
mj\ mj 1
1
/
\2
/ \
know from Theorem 2.1 that 6 I
'
mj\ mj 1
\*/ /
/ =II__J) 1
37
1
(7)
\
orthogonal projector onto N(J)). the equation (7) is equivalent to X*X6 — !X*JXS = X*Y
or
—
!X*Jy — !X*JX6 + X*X6 = X*Y.
(8)
= which proves that 6, is a = = y. Conversely, assume now that is a least
Therefore, (6) becomes x,x,61
least squares solution of squares solution of x16, = y. Then 61 satisfies the normal equations = That is, fl0 and S must satisfy m
—
I
—
Direct multiplication yields (9)
X*j$0 + X*X6 = X*y.
Equation (9) implies that value of X*Jy —
(10)
=
into (10) yields
X*JX6 + X*X6 =
—
X6), which is (4). Substituting this
—
X6) + X*X6 = X*y or equivalently,
which is equation (9). Hence, 6
satisfies (7) so that S is a least squares solution of (i
(i
—
—
j )y, and the theorem is proven.
Let XMand yMdenote the matrices XM =
(i !J)y and let i1 =!
and =
(i
—
—
!
and
y.. That is,
xS =
YM
=
the mean of
—
the jth column of X, is the mean of the values assumed by thejth independent variable Likewise, is just the mean of all of the observations y, for y.
38 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
and YM are
Therefore,
the matrices
I x21—X1 X22—X2
XMI
:
1Y2—YI
I'YMI
x,,2—i2
I
Theorem 1 says that in obtaining least squares solutions of X,51 = y, X15 = YM In we are really only obtaining least squares solutions effect, this says that fitting the intercept hypothesis + P,x, + y. = /1,,, + fl1x13 + fl2x12 + by the theory of least squares is equivalent to fitting the no intercept hypothesis
=
—
+ P2(x,2
—
—
x2) +
+
—
i.,) +
Thus, a measure of goodness of fit for an intercept hypothesis is 2
p2_ —
M
JM
— —
2
;
2 2
A familiar form for R2 when all numbers are real is
(11)
[91_irJ2) i—i
where y. is the ith entry of y and (11), we must first prove that
=
is the ith entry of = X,Xy. To prove (12)
—
To see that (12) is true, note that so that by Theorem 1,
is a least squares solution of X,,6 = y
[!j*y — L
a least squares solution of = y. Thus all least squares solutions of X161 = y are of the form 61=s+ h where heN(X1). Because Xy is a least squares solution of = y, there must be a vector b0e N(X1) such that is
= X1(x + h0) = X1s = !jj*(y —
= s + h0. Therefore,
= !Jy =
+
—
+
= !Jy + (i —
+ =
+
from which (12) follows. Now observe that (12)
LEAST SQUARES SOLUTIONS 39
implies that the ith entry (YM)j of
=
is
given by (13)
We can now obtain (11) as 2
R2=
YM
4
=
IIYMII2
IIYMIIIIY%jII
(14)
=
By using (13) along with the definition of YM we see that (14) reduces to (11). We summarize the preceding discussion in the following theorem.
Theorem 2.4.2
For the no intercept hypothesis
= fl1x11 +
+ ... +
+ e,. the number,
)2 XXty 112_Il
R2 —
112
-
11y1I2
1=1
1=1
is a measure of goodness offit. For the intercept hypothesis, = fl0 + measure of goodness offit is given by + + ... + fi
)2 R2
YM
where XM =
=
-
11211
=11
(i
—
=
IIYMII
YM =
(i
—
Here X1
and
is the ith entry of
andJ =jj*,j =[1,...,lJ*.
In each case 0 R2 1 and is free of units. When R2 = 1, the fit is exact and when R =0, there is no fit at all. 5.
An application to curve fitting
Carl Friedrich Gauss was a famous and extremely gifted scientist who lived
from 1777 to 1855. In January of 1801 an astronomer named G. Piazzi briefly observed and then lost a 'new planet' (actually this 'new planet' was the asteroid now known as Ceres). During the rest of 1801 astronomers and other scientists tried in vain to relocate this 'new planet' of Piazzi. The task of finding this 'new planet' on the basis of a few observations seemed hopeless. Astronomy was one of the many areas in which Gauss took an active interest. In September of 1801, Gauss decided to take up the challenge of
40 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
finding the lost planet. Gauss hypothesized an elliptical orbit rather than
the circular approximation which previously was the assumption of the astronomers of that time. Gauss then proceeded to develop the method of least squares. By December, the task was completed and Gauss informed the scientific community not only where to look, but also predicted the position of the lost planet at any time in the future. They looked and it was where Gauss had predicted it would be. This extraordinary feat of locating a tiny, distant heavenly body from apparently insufficient data astounded the scientific conununity. Furthermore, Gauss refused to reveal his methods. These events directly lead to Gauss' fame throughout the entire scientific community (and perhaps most of Europe) and helped to establish his reputation as a mathematical and scientific genius of the highest order. Because of Gauss' refusal to reveal his methods, there were those who even accused Gauss of sorcery. Gauss waited until 1809 when he published his Theoria Motus Corporum Coelestiwn In Sectionibus Conicis Dolem Ambientium to systematically develop the theory of least squares and his methods of orbit calculation. This was in keeping with Gauss' philosophy to publish nothing but well polished work of lasting significance. Gauss lived before linear algebra as such existed and he solved the problem of finding Ceres by techniques of calculus. However, it can be done fairly simply without calculus. For the sake of exposition, we will treat a somewhat simplified version of the problem Gauss faced. To begin with, assume that the planet travels an elliptical orbit centred about a known point and that m observations were made. Our version of Gauss' problem is this.
Problem A
Suppose that (x1 , y1), (x2 , y2),... , (x,,,, y,,) represent the m
coordinates in the plane where the planet was observed. Find the ellipse in standard position x2/a2 ÷ y2/b2 = 1, which comes as close to the data
points as possible. If there exists an ellipse which actually passes through the m data points, then there exist parameters = 1/a2, = 1/b2, which satisfy each of the m equations fl1(x1)2 + $2(y,)2 = 1 for I = 1,2, ... , m. However, due to measurement error, or functional error, or both, it is reasonable to expect =1 that no such ellipse exists. In order to find the ellipse fl1x2 + which is 'closest' to our m data points, let
=
1
for I = 1,2,... ,m.
Then, in matrix notation, (1) is written as I
e11 e21
X2
eJ
x2
;i LP2J
'
(1)
LEAST SQUARES SOLUTIONS 41
or e = Xb — j. There are many ways to minimize the
could require that
1e11
be minimal
or that max
For example, we I,1e21,...
be
i—I
minimal. However, Gauss himself gave an argument to support the claim
that the restriction that minimal
e 112 be
(2)
i—i
gives rise to the 'best closest ellipse', a term which we will not define here.
(See Chapter 6.) Intuitively, the restriction (2) is perhaps the most reasonable if for no other reason than that it agrees with the usual concept of euclidean length or distance. Thus Problem A can be reformulated as follows.
Problem B
Find a vector 6=
, fl2]T
that is a least squares solution
of Xb = j. From Theorem 2.4, we know that all possible least squares solutions are of the form 6 = Xtj + h where hEN(X). In our example, the rank of XeC"2 wilibe two unless for each i= l,2,...,m; in which case
the data points line on a straight line. Assuming non-colinearity, it follows that N(X) = {O} and there is a unique least squares solution 6= Xtj = (X*X) 1X"j. That the matrix X is of full column rank is characteristic of curve fitting by least squares techniques. Example 2.5.1 We will find the ellipse in standard position which comes as close as possible to the four data points (1, 1), (0,2), ( — 1,1), and (— 1,2).
ThenX=[? the least squares solution to X6 = j and e = I( X6 — if is approximately = I is the ellipse that fits 'best' (Fig. 2.3). A measure of goodness of fit is R2 = f 2/lIi 112 * 0.932 ( * means 'approximately equal to') which is a decent fit. Notice that there is nothing in the working of Problem B that forced Xtj to have positive coefficients. If instead of an ellipse, we had tried to fit a hyperbola in standard position to the data, we would have wound up with the same least squares problem which has only an ellipse as a least squares solutions. To actually get a least squares problem equivalent to Problem A it would have to look something like this: 0.5. Thus j71x2 +
II
Problem C f Au — b H
Find a vector u with positive coefficients such that for all v with positive coefficients.
Av — b
II
The idea of a constrained least squares problem will not be discussed
here. It is probably unreasonable to expect to know ahead of time the
42 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
0
OzOo$o point Fig. 2.3 The ellipse
x2
+1 yz =
orientation of a trajectory. Perhaps a more reasonable problem than
Problem A would be:
Problem 0 Given some data points find the conic section that provides the 'closest fit.' For convenience assume that the conic section does not go through the origin. Then it may be written as ax2 + by2 + cx + dy +fxy = I. If we use the same four data points of the previous example there are many least squares solutions all of which pass through the four data points. The minimal least squares solution is a hyperbola.
Polynomial and more general fittings
6.
In the previous section we were concerned with fitting a conic to a set of
data points. A variation which occurs quite frequently is that of trying to find the nth degree polynomial (1)
Usually one has which best fits m data points (x1 ,yj, (x2,y2), ... rn> n + 1; otherwise there is no problem because jim n + 1, then the interpolation polynomial of degree m — I n provides an exact fit.
We proceed as before by setting (2) J—o
Thus
Il I! es,,
XNS
XM*XJ
fbi
y11
II]
yJ '
LEAST SQUARES SOLUTIONS 43
or e = Xb — y. If the restriction that e be minimal is imposed, then a closest nth degree polynomial to our data points has as its coefficients Where the are the components of a least squares solution b = Xty h,heN(X) of Xb = y. Notice that if the xi's are all distinct then X has full column rank. (It is an example of what is known as a Vandermonde segment.) Hence N(X) = {O} and there is a unique least = (X*X) 1X*y. squares solution, 5= To measure goodness of fit, observe that (2) was basically an intercept hypothesis so that from Theorem 4.2, one would use the coefficient of 112/Il YM determination R2 = A slightly more general situation than polynomial fitting is the following. Suppose that you are given n functions g1(x) and n linear functions I. of Now suppose that you are given m data points k unknown parameters (x1, y,). The problem is to find values so that
... is as close to the data points as possible. Let
WJk; and define e. =
x=:
=
+ ... +
!,gjx1) — y.. Then the corresponding matrix
e= [e1,...,e,,JT,b= [$1
equation ise= g1(x1) g1(x2)
,
g2(x1) ... g2(x2) ...
Note that this problem is equivalent to finding values so that is as close to the data points as y = fl1L1(x) + fl2L2(x) + ... + possible where L1(x) is a linear combination of the functions g1(x),
To insure that e is minimal, the parameters must be the components of a least squares solution 6,, of XW6W = y. By Theorem 1.7 we have 5w
=
x1)W) (XPR(w))Y + Ii, he N(XW).
In many situations W is invertible. It is also frequently the case that X has full column rank. If W is invertible and N(X) = {O}, then N(XW) = (0) so that (2) gives a unique least squares solution
= W 'Xty = W i(X*x) IX*y = (X*XW) 'X1y. Example 2.6.1
We will find parameters
$2. and
(3) so
that the
function
f(x)= $1x + fi2x2 + fi3 (sin x) best fits the four data points (0,1),
(, 3). and (it 4). Then we will
44 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
find the
so that
f(x) = (fir + ft2)x + (P2 + = P1x + + x2) +
+ sin x + sinx)
best fits the same four data points. In each case we will also compute R2, the coefficient of determination. In the first case, we seek least squares solutions of = y where
000
1
1
2
X=4irir
3
4
In this case, X has full column rank so that the least squares solution is unique and is given by = Xty [9.96749, — 2.76747, — 5.82845]T. Also
0.9667
so that we have a fairly good fit.
11101
Inthesecondcase,W= 10
1
1
LOOLJ
solution is given by
fi
—
1
10
1
Lo
0
ii 1 9.967491
ii iJ
1—2.767471 L—
5.82845J
1 6.906511
=
I
3.060981.
[— 5.82845J
Since N(X) = {O} and W is invertible we have XW(XW)t = XWW XXt and 2
=
IIXW(XW)'y12
hO In closing this section we would like to note that while in curve fitting problems it is usually possible to get X of full column rank. There are times when this is not the case. In the study of Linear Statistical Models, in say Economics, one sometimes has several that tend to move together (are highly correlated). As a result some columns of X, if not linearly dependent, will be nearly so. In these cases Xis either not of full rank or is ill-conditioned (see Chapter 12) [71]. The reader interested in a more statistically rigorous treatment of least squares problems is referred to Section 6.4.
LEAST SQUARES SOLUTIONS 45
7.
WhyAt?
It may come as a surprise to the reader who is new to the ideas in this book that not everyone bestows upon the Moore—Penrose inverse the same central role that we have bestowed upon it so far. The reason for this disfavour has to do with computability. There is another type of inverse, which we denote for now by A' such that A'b is a least squares solution to Ax = b. (A' is any matrix such that AA' = The computation of A' or A'b frequently requires fewer arithmetic operations than the computation of At. Thus, if one is only interested in finding a least squares solution, then A'b is fine and there would appear to be no need for At. Since this is the case in certain areas, such as parts of statistics, they are usually happy with an A'b and are not too concerned with At. Because they are useful we will discuss the A' in the chapter on other types of inverses (Chapter 6). We feel, however, that the generalized inverse deserves the central role it has played so far. The first and primary reason is pedagogical. A' stands for a particular matrix while A' is not unique. Two different formulas for an A' of a given matrix A might lead to two different matrices. The authors believe very strongly that for the readers with only an introductory knowledge of linear algebra and limited 'mathematical maturity' it is much better to first learn thoroughly the theory of the Moore—Penrose generalized inverse. Then with a firm foundation they can easily learn about the other types of inverses, some of which are not unique and some of which need not always exist.
Secondly, a standard way to check an answer is to calculate it again by a different means. This may not work if one calculates an A' by two different techniques, for it is quite possible that the two different correct approaches will produce very different appearing answers. But no matter how one calculates A' for a given matrix A, the answer should be the same.
3
Sums, partitioned matrices and the constrained generalized inverse
1.
The generalized inverse of a sum
For non-singular matrices A, B, and A + B, the inverse of the sum is rarely the sum of the inverses. In fact, most would agree that a worthwhile
expression for (A +'is not known in the general case. This would tend to make one believe that there is not much that can be said, in general, about (A + B)t. Although this may be true, there are some special cases which may prove to be useful. In the first section we will state two results and prove a third. The next sections apply the ideas of the first to develop computational algorithms for At and prove some results on the generalized inverse of special kinds of partitioned matrices. Our first result is easily verified by checking the four Penrose conditions of Definition 1.1.3. Theorem 3.1.1 If A, B€Cm Xn and if AB* = 0 and B*A = 0, then (A + B)t = At + Bt. The hypothesis of Theorem 1 is equivalent to requiring that R(A*) R(B*) and R(A) j. R(B). Clearly this is very restrictive. If the hypothesis of Theorem I is relaxed to require only that R(A*) j.. R(B*), which is still a very restrictive condition, or if we limit our attention to special sums which have the form AA* + BB*, it is then possible to prove that the following rather complicated formulas hold.
j
Theorem 3.1.2 IfAeC BeCTM ", then (AA* + BB*)? = (I — Ct*B*)At [I — AtB(I — CtC)KB*At*JAt(I — BCt) + C'C' where C = (I — AAt)B, K = [I + (I — CtC)B*At*AtB(I — CtC)] '. If A, and AB* = 0, then (A + B)t = At +(I — AtB)[C? +(I — CtC) x KB*At*At(I — where C and K are defined above. Since Theorem 2 is stated only to give the reader an idea of what types of statements about sums are possible and will not be used in the sequel, its proof is omitted. The interested reader may find the proof in Cline's paper [30].
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 47
We will now develop a useful formulation for the generalized inverse of
a particular kind of sum. For any matrix BeC'" ",the matrix B can be denote the written as a sum of matrices of rank one. For example, let matrix in C'" "which contains a I in the (i.j)th position and 0's elsewhere. then If B = (1)
is the sum of rnn rank one matrices. It can be easily shown that if rank (B) = then B can be written as the sum of just r matrices of rank one. Furthermore, if FECrnx? is any matrix of rank one, then F can be written as the product of two vectors, F = cd*, where ceC'", deC". Thus BeC'" can always be written as (2)
Throughout this chapter e1 will denote a vector with a I in the ith place and zeros elsewhere. Thus if ... ,eJ C'", then {e1 ,... ,e,,J would be the standard basis for C'". If B has the decomposition given in (1), let = where e1eC'". Then the representation (2) assumes the form B= It should be clear that a representation such as (2) is not unique. Now if one had at one's disposal a formula by which one could g-invert (invert in the generalized sense) a sum of the form A + cd* where ceC'" and deC", then B could be written as in (2) and (A + B)? could be obtained by
recursively using this formula. In order for a formula for (A + cd)' to be useful, it is desirable that it be of the form (A + cd)' = At + G where G a matrix made up of sums and products of only the matrices A, At, c, d, and their conjugate transposes. The reason for this requirement will become clearer in Sections 2 and 3. Rather than present one long complicated expression for (A + cd*)f Exercise 3.7.18.), it is more convenient to consider the following six logical possibilities which are clearly exhaustive. (1)
and
and 1 + d*Afc arbitrary; and 1 + d*A?C =0;
(ii) ceR(A) and (iii) ceR(A) and d arbitrary and 1 + dA'c #0; (iv) and deR(A*) and 1 + d*Atc =0;
(v) c arbitrary and deR(A) and 1 + d*Afc #0; (vi) ceR(A) and deR(A*) and I +d*Afc=0. Throughout the following discussion, we will make frequent use of the fact that the generalized inverse of a non-zero vector x is given by = x*/IIx 112 where lix 112 =(x,x). Theorem 3.1.3 For CEC'" and dEC" let k = the column AtC, h = the row d*At, u = the column (I — AAt)c, v = the row d*(I — AtA), and = the scalar I + d*AtC. (Notice that ceR(A) and only if u =0 and
48 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
deR(A*)
and only if,
= 0.) Then the generalized inverse of(A + cd*) is
as follows. (1)
Ifu#Oand,#O,then(A+cd*)t=At_kut _vth+p,tut.
(ii)
If u = 0 and #0, then (A + cd*)? = A' +
and
(iv)
qt = _QI')2k*At
÷k).
where p1 =
—
= 11k0211v112+1P12.
Ifu#0,,=O,andfl=0,then(A+cd*)t=At_Athth_ku?.
(v) Ifv=OandP#0, then (A +cd*)t =At
whereP2=_(0")Ath*+k). 02 = 11h11211u112 + 1P12. (vi) Ifu=O,,=O,andp=0,then(A+cd*)t=At_kktAt_
Athth + (ktAtht)kh. Before proving Theorem 1, two preliminary facts are needed. We state them as lemmas.
IA
Lemma 3.1.1
L'
uli—i.
—PJ
Proof This follows immediately from the factorization
IA+cd* L
0*
ci —li—Lb
O1FA
'JL'
ullI kill
0
—PJL0" lJLd*
Lemma 3.1.2 If M and X are matrices such that XMMt = X and MtM = XM, then X = Mt.
Proof Mt = (MtM)Mt = XMMt = X. U We now proceed with the proof of Theorem 3. Throughout, we assume
c #0 and d #0.
Proof of (i). Let X1 denote the right-hand side of the equation in (i) and let M = A + cd*. The proof consists of showing that X1 satisfies the four Penrose conditions. Using Mt =0, dv' = 1, = — 1, and c — Ak = AAt it is easy to see that MX1 = + wit so that the thfrd Penrose condition holds. Using UtA =0, utc = 1, be = — 1, and d* — hA = v, one obtnint X1M = AtA = and hence the fourth condition holds. The first and second conditions follow easily. Proof of (ii). Let X2 denote the right-hand side of the equality (ii). By using = 0,d*vt = 1, and d*k = — 1, it is seen that (A + cd*)X2 = AA', Ak = C,
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 49
which is hermitian. From the facts that ktAtA = kt. hc = — I, and d* — hA = v, it follows that X2(A + cd*) = A'A — kk' + v'v, which is also
hermitian. The first and second Penrose conditions are now easily verified.
Proof of (iii) This case is the most difficult. Here u = 0 so that CER(A) and hence it follows that R(A + cdi c R(A). Since fi 0 it is clear from Lemma 1 that rank (A + cd*) = rank (A) so that R(A + cdi = R(A). Therefore
+
(A +
= AAt
(3)
because AAt is the unique orthogonal projector onto R(A). Let X3 denote the right-hand side of the equation in (iii). Because = it follows immediately from (3) that X3(A + cd*)(A + cd*)t = X3. Hence the first condition of Lemma 2 is satisfied. To show that the second condition of Lemma 2 is also satisfied, we first The matrix AtA — show that (A + cd*)?(A + cd*) = AtA — kk' + kkt + is hermitian and idempotent. The fact that it is hermitian is clear and the fact that it is idempotent follows by direct computation using AtAk = k, AtAp1 = — k, and kkt p1 = — k. Since the rank of an idempotent matrix is equal to its trace and since trace is a linear function, it follows = Tr(AtA) — = Tr(A'A — kkt + that rank(AtA — kkt + Tr(kkt) + Now, kkt and are idempotent matrices of rank = trace = 1 and AtA is an idempotent matrix whose rank is equal to rank (A), so that rank(AtA — kk' + rank(A + cd*). (4) 1,d*p1 = I —a1fl',and UsingthefactsAk=c,Ap1 = d*AtA = — v, one obtains (A + cd*)(AtA — kkt + p1 = A + cd* — 2, c(v + flk' + 'p11). Now, II so that = k 2a1 I P1 2=PhIklL2 and hence +fiIIkII = — v _flkt. 112
Ii
Thus, (A + cd*)(AtA — kkt + p1p'1) = A + cd*. Because A'A — kkt + is an orthogonal projector, it follows that R(A* + dc*) c R(AtA — kkt + By virtue of(4), we conclude that R(A* + dc*) = R(AtA — kkt + or p1p'1), and hence(A* + dc*)(A* ÷ dci' = AtA — kk' + equivalently, (A + + cdi = AtA — kkt + p1 To show that X3(A + cd*) = A'A + p1 — kkt, we compute X3(A + cd*). 1, Observe that k*AfA = k*, = — and + d* = — 11v112P1k+v. 1
Now, X3(A + cd*)
/
1
a, = AtA +
—
= AtA + !,*k* —
a,
a,
—
+
" k2 - v*)d* P
p,d* — a1
+ d*)
ft
+ p,d*
50 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=AtA+!v*k*__p,(v_ Write v as = — fill k fl + parentheses and use the fact that X3(A + cd*) = Since
+
AtA+
and substitute this in the expression in 2 p1 = 1fi12c' Ilk II to obtain p1k*.
+ p, P,
+
II
2k =
—
k —k —k = — kk'. we arrive at X3(A + cd*) = A'A + Thus (A + cd*)t(A + cd*) = X3(A + cd*) so that X3 = (A + Cd*)t by Lemma 2. Proof of (iv) and (v). (iv) follows from (ii) and (v) follows from (iii) by
taking conjugate transposes and using the fact that for any matrix = (Me)'. h'h and AtA — kkt is an Proof of (vi). Each of the matrices AAt — orthogonal projector. The fact that they are idempotent follows from AA'h' = lit, MA' = h,A'Ak = k and ktAtA = kt. It is clear that each is hermitian. Moreover, the rank of each is equal to its trace and hence each has rank equal to rank (A) — 1. Also, since u =0, v =0, and fi =0, it follows from Lemma 1.1 that rank(A + cd*) = rank(A) — 1. Hence, rank(A + cd*) = rank (AA'
—
hth)
= rank (AAt
—
k'k).
(5)
With the facts AAtC = c, hc = — I, and hA = d*, it is easy to see that (AAt — hth)(A + cd*) = (A + cd*), so that R(A + cd*) c R(AAt — h'h). Likewise, using d*A?A = d*, d*k = — 1, and Ak = c, one sees that (A + cd*)(AtA — kk') = A + cd*. Hence R(A* + dc*) R(A'A — kk'). By virtue of (5), it now follows that
(A + cd*)(A + cd*)t = AA'
—
h'h, and
(6)
(A + cd*)t(A + cd*) = A'A
—
kk'.
(7)
If X4 denotes the right-hand side of (vi), use (6) and the fact that bAA' = h to obtain X4(A + cd*)(A + cd*)t = X4 which is the first condition of Lemma 2. Use k'A'A = kt, hA = d*, and hc = — 1 to obtain X4(A + cd) =
AtA — kk'. Then by (7), we have that the second condition of Lemma 2 is satisfied. Hence X4 = (A + cd*)f
Corollary 3.1.1
•
When ceR(A), deR(A*), and
inverse of A + cd* is given by (A +
0, the generalized
= At —
At
—
Proof Setv=Oin(iii),u=Oin(v). U Corollary 1 is the analogue of the well known formula which states that
if both A and A + cd are non-singular, then (A + cdT' =
A'
—
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
2.
51
Modified matrices
At first glance. the results of Theorem 1.3 may appear too complicated
to be any practical value. However, a closer examination of Theorem 1.3 will reveal that it may be very useful and it is not difficult to apply to a large class of problems. Suppose that one is trying to model a particular situation with a mathematical expression which involves a matrix A C'" XsI and its generalized inverse A'. For a variety of reasons, it is frequently the case that one wishes to modify the model by changing one or more entries of A to produce a 'modified' matrix A, and then to compute At. The modified model involving A and A' may then be analysed and compared with the original model to determine what effects the modifications produce. A similar situation which is frequently encountered is that an error is discovered in a matrix A of data for which A' has been previously calculated. It then becomes necessary to correct or modify A to produce a matrix A and then to compute the generalized inverse A' of the modified matrix. In each of the above situations, it is highly desirable to use the already known information; A,A' and the modifications made. in the computation of A' rather than starting from scratch. Theorem 1.3 allows us to do this since any matrix modification can always be accomplished by the addition of one or more rank one matrices. To illustrate these ideas, consider the common situation in which one wishes to add a scalar to the (i,j)th entry of A€C'" to produce the modified matrix A. Write A as where Write A' as A' =
= [c1 ...
(1)
rr =
[r
That is, g•j denotes the (i,j)-entry of At, c1 is the ith column of At, and r1 is the ith row of At. The dotted lines which occur in the block matrix of At are included to help the reader distinguish the blocks and their arrangement. They will be especially useful in Section 3 where some blocks have rather complicated expressions. To use Theorem 1.3 on the modified matrix (1), order the computation as follows.
Algorithm 3.2.1 To g-invert the modified matrix A + (I) Compute k and h. This is easy since k = Ate1 and h =
(II) Compute u and v by u = e1 — Ac1 and v =
(Ill) Compute
this is also easy since fi = 1 +
so that
—
so that fi = 1 + (IV) Decide which of the six cases to use according as u, ,, and are zero or non-zero. (V) Depending on which case is to be used, carefully arrange the he1,
52 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
computation of the terms involved so as to minimize the number of
multiplications and divisions performed. To illustrate step (V) of Algorithm 1, consider the term kktAt, which has to be computed in cases (ii) and (vi) of Theorem 1.3. It could be computed in several ways. Let us examine two of them. To obtain kt (we k 112 = assume) k #0) we use k' = This requires 2n operations (an operation is either a multiplication or division). If we perform the calculations by next forming the product kk' and then the product (kkt)At, it would be necessary to do an additional mn2 + n2 operations, making a total of n2(m + 1) + 2n operations. However, if kktAt is computed by first obtaining kt and then forming the product ktAt, followed then by forming the product (k(ktAt)), the number of operations required is reduced to 2n(m + 1). This could amount to a significant saving in time and effort as compared to the former operational count. It is important to observe that the products AA' or AtA do not need to be explicitly computed in order to use Theorem 1.3. If one were naive enough to form the products AAt or AtA, a large amount of unnecessary effort would be expended.
Example 3.2.1 2
Suppose
0
A= 10
1
—1
Lo
0
1
that
3 ii 0I,andA'=— 3 12
—iJ
—3
01
5 7
41
4J
3-7-8J
has been previously computed. Assume that an error has been discovered in A in that the (3,3)-entry of A should have been zero instead of one. Then A is corrected by adding — 1 to the (3,3)-entry. Thus the modified matrix is A = A + e3( — To obtain A' we proceed as follows. (I) The terms k and h are first read from A' as
—4].
7
(II) The terms u and v are easily calculated. Ac3 = e3 so that u =0.
v=
—1 —1
—
ij.
(III) The term $ is also read from At as fi = 1 (IV) Since u =0, v used to obtain At.
0, and
—
g33
=
#0, case (iii) of Theorem 1.3 must be
(V) Computing the terms in case (iii) we get
k 112
= c3 112 =
and
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
53
I
Then by (in) of Theorem 1.3., A =
2
—2
2
2
— 5
2
0
3.
0 —6
Partitioned matrices where A, B, C and D are four matrices such that E is
Let E
=
also a matrix. Then A, B, C, D are called conformable. There are two ways to think of E. One is that E is made up of blocks A, B, C, D. In this case an E to the blocks are, in a sense, considered fixed. If one is trying to have a certain property, then one might experiment with a particular size or kind of blocks. This is especially the case in certain more advanced areas of mathematics such as Operator Theory where specific examples of linear operators on infinite dimensional vector spaces are often defined in terms of block matrices. One can also view E as partitioned into its blocks. In this viewpoint one starts with E, views it as a partitioned matrix and tries to compute things about E from the blocks it is partitioned into. In this viewpoint E is fixed and different arrangements of blocks may be considered. Partitioned matrix and block matrix are equivalent mathematical terms. However, in a given area of mathematics one of the two terms is more likely to be in vogue. We shall try to be in style. Of course, a partitioned or block matrix may have more or less than four blocks.
54 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
This section is concerned with how to compute the generalized inverse
of a matrix in terms of various partitions of it. As with Section 2, it is virtually impossible to come up with usable results in the general case. However, various special cases can be handled and, as in Section 2, they are not only theoretically interesting, but lead to useful algorithms. In fact. Theorem 1.3 will be the basis of much of this section, including the first case we consider. Let be partitioned by slicing off its last column, so that P = [B cJ where BeCtm and ceCtm. Our objective is to obtain a useful P may also be written as P = [Bj011 + 1] where expression for 01ECM, O2ECA.
and
Then P is in a form for which Theorem 1.3 applies. Let A = = [0 1]. Using the notation of Theorem 1.3 and the fact that At
one easily obtains h = d*At = 0 so that
=
= 1 + d*Afc = 1 and
v = d* #0. Also, u = (I — AAt)C = (I — BB')c. Thus, there are two cases to consider, according as to whether u #0 or u =0. Consider first the case when u #0 (i.e. Then case (1) in Theorem 1.3 is used to obtain pt In this case
AC=[fBtcl 0
IBtl IB'cutl
101
10
L
Next, consider the case when u =0 so that (iii) of Theorem 1.3 must be
used. Let k = Btc. Then = — k*Bt so that Bt —
=
= 1 + c*B'*Btc = 1 + k*k,
=
and —
kk*Bt 1 + k*k k*Bt
.
Thus we have the following theorem.
1 + kk Theorem 3.3.1 For ceCTM, and P = let k = Btc and u = (I — BBt)C = c — Bk. The generalized inverse of P is given by t
IBt_kyl
(ut
]whereY=l(l+k*k)_lk.Bt ifu=0.
[B]
Theorem 4 can be applied to matrices of the form p = r a row vector, by using the conjugate transpose. By using Theorem 1 together with Theorem 1.3, it is possible to consider the more general partitioned matrix M
=
where
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
dECN,
so that M can be written as
and xeC. Let P =
+
+
M
55
=
=
Theorem 1.3 can be applied to obtain Mt as M' = [Pt 01 + G. pt is
I+
known from Theorem 1. Clearly, x
[Pt oiI?1 = I L
0. Thus either case (i)or case (v) of Theorem 1.3 must be
.J
used, depending
on whether or not
Idi
IIA*
The details, which are somewhat involved, are left to the interested reader, or one may see Meyer's paper [551. We state the end result. Theorem 3.3.2
let
For AECtm N CEC
:],k=Atc,h=dsAt,u=(I_AAt)
c,
= I + 11k02,w2= 1 + The generalized inverse for M is as follows.
(i) If U
0 and V
lAt — kut — vth — 5,tut
0, then Mt =
I
u
L
At (ii)
10
lkk*At
—
'0
I
then M'
(iii)
where p1
— k.
=[ 2k*At
=
(iv) If v =0 and t5 =0, then M
iAt_
where p2
(vi)
=
0, then Mt
— k,
u
11,112 +
t
—i
I
rAt
= cv,
—
*
u
L
(v) If v = 0 and 5
—
,
0
—3-- IAtII*U*
=L +
1]'.
=5
—
h, and 42 = W2 U 112 + 1512.
Ifu=O,v=O, andS=O, then
M' = I
[
—kA
0
J
+ k*Ath*Iklrh, —1
w1w2[—lJ''
56 GENERAUZED INVERSES OF LINEAR TRANSFORMATIONS
Frequently one finds that one is dealing with a hermitian or a real
symmetric matrix. The following corollary might then be useful.
Corollary 3.3.1
For AECTMXM such that A = A* and for xeR, the
generalized inverse of the hermitian matrix H =
0, then Ht
(1) If u
rAt
k
tk*
as follows: (5
t' U:
t.
=L
(ii)
If u =0 and 6=0, then
H' =
1
1 and 2 may be used to recursively compute a generalized inverse, and were the basis of some early methods. However, the calculations are sensitive to ill conditioning (see Chapter 12). The next two algorithms, while worded in terms of calculating At should only be used for that purpose on small sized, reasonably well conditioned matrices. The real value of the next two algorithms, like Algorithm 2.1, is in updating. Only in this case instead of updating A itself, one is adding rows and/or columns to A. This corresponds, for example in least squares problems, to either making an additional observation or adding a new parameter to the model.
Algorithm 3.3.1 To g-invert any (I)
(II) For i 2, set B = (III) k = B_ 1c1, (IV) u1 = —
!c1],
1k1,
if ii,
0,
and k*Bt
(VI) B
if
0,
=
(VII) Then
Example 3.3.1
Suppose that we have the linear system Ax = b, and
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
57
have computed At so we know that
10 A— —
2
1
3
1
11
'r
At
1
0
'
1
1
—1
—1
3
But now we wish to add another independent variable and solve Ai = b where
10
—
A
=
1
21
0
3
1
1
1
1
—1
—t by computing A.
We will use Algorithm 1. In the notation of Algorithm I, A = B3, A = B2,
c*=[1 0
1
—2
=
3
—
1
7], so that 2
3
2 —2
0
1
1-2
1]
—7J• If the matrix in our model is always hermitian and we add both an independent variable and a row, the next Algorithm could be useful. L
Algorithm 3.3.2
5
3
To g-invert H =
such
that H = H*
(I) Set A1=h11.
(II) For i 2, set
=
where c. =
I,] (III) Let k = A1_ 1c1, (IV) ô. = — and (V) u1
0, then
=
V1
=
and
At (VII) If U1 = 0 and 61 #0, then
(VIII) If ii, =0 and
At
= [_L
= 0, then let r1 =
A;j;L1
58 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=
=( At =
so that
and z1 =
(r1_z1)*
L
(IX) Then
For a general hermitian matrix it is difficult to compare operational counts, however Algorithm 2 is usually more efficient than Algorithm 1 when applicable since it utilizes the symmetry. There is, of course, no clear cut point at which it is better to recompute At from scratch then use augmentation methods. If A were 11 x 4 and 2 rows were added, the authors would go with Algorithm 1. If A were 11 x 4 and 7 rows were added, we would recompute At directly from A. It is logical to wonder what the extensions of Theorems 1 and 2 are.
That is, what are [A!C]t and
when C and D no longer are just
columns and B is no longer a scalar? When A, B, C and D are general conformable matrices, the answer to 'what is [A! C]?' is difficult. A useful
answer to 'what is
rA Cit ?' is not yet known though formulas exist. LD B]
The previous discussion suggests that in some cases, at least when C has a 'small' number of columns, such as extensions could be useful. We will begin by examining matrices of the form [A! C] where A and C are general conformable matrices. One representation for [A!C]t is as follows.
Theorem 3.3.3 For AeCTM 'and CECTM [A C] can be written as
r
AA rA'clt I i — LT*(I + fl'*) I
I
VA —
I(At
—
where B = (I — AAt)C and T = AtC(I
the generalized inverse of
AtCRt
AtCBt) + Bt —
BtB).
Proof One verifies that the four Penrose conditions are satisfied. U A representation similar to that of Theorem 3 is possible for matrices partitioned in the form
by taking transposes.
The reader should be aware that there are many other known ways of representing the generalized inverse for matrices partitioned as [A! C] or
as []. The interested reader is urged to consult the following references to obtain several other useful representations. (See R. E. Cline [31], A. Ben-Israel [12], and P. V. Rao [72].)
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
As previously mentioned, no useful representation for
rA L
R
59
Cl' has, D]
up to this time, been given where A, C, R and D are general comformable matrices. However, if we place some restrictions on the blocks in question, we can obtain some useful results.
Lemma 3.3.1
If A, C, R and D are conformable matrices such that A is then D = RA 'C.
square and non-singular and rank (A) = rank
Furthermore, EJP = RA' and Q = A'C then
Proof
The factorization I
F
O1[A
I][R
yelds rank
I
I
—A'Cl fA
DJL0
i
IAC1 = rank rA;
0
]Lo 1
0
D—RA'C
= rank (A) +
rank (D — RA - 'C). Therefore, it can be concluded that rank (D — RA 'C) =0, or equivalently, D = RA 'C. The factorization (1) follows
directly. • Matrices of the type discussed in Lemma 1 have generalized inverses which possess a relatively simple form. Theorem 3.3.4
Let A, C, R and D be conformable matrices such that
A is square, non-singular, and rank (A) any matrices such that
[A C]t =
[
[R D] =
= rank
If P and Q are
Q],
then
([I + P*P]A[I + QQ*Jy 1[I P*]
and G = [I, Q]. Notice B= = that rank(B) = rank(G) = rank(A) = rank(M) = r = (number of columns
Proof Let
M
of B) = number of rows of G). Thus, we may apply Theorem 1.3.2 to obtain Mt = (BG)t = G*(GG*) '(BB) IB*. Since (B*B) 1B* = [A*(I + P*P)A] = A 1(1 + p*p) l[UP*] and G*(GG*rl =
+
desired result is obtained. •
It is always possible to perform a permutation of rows and columns to
60 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
any matrix so as to bring a full-rank non-singular block to the upper left
hand corner. Theorem 3.3.4 may then be used as illustrated below. Example 3.3.3 In order to use Theorem 4 to find Mt, the first step is to reduce M to row echelon form, EM. This not only will reveal the rank of M, but will also indicate a full set of linearly independent columns.
LetM=
2
1
2
1
1
3
1
2
4
2
2
6
0 0
1
2' —1 0
2
,sothatEM=
0 1
0 0 0
1/2 1/2 0
5/21 1/2 0
00000
24 015 Thus, rank(M)
=2 and the first and third columns of M form a full independent set. Let F be the 5 x 5 permutation matrix obtained by exchanging the second and third rows of so that [1
11213
MF =
—
[2
independent
=
X2]. The next step is to select two
01415
rows from the
matrix X1. This may be
accomplished in several
rows reduction to echelon form, or one might
ways. One could have obtained this information by noting which
were interchanged
during the
just look at X1 and select the appropriate rows, or one might reduce to echelon form. In our example, it is easy to see that the first and third rows of X1 are independent. Let E be the 4 x 4 permutation matrix obtained by exchanging the second and third rows of 14 so that
EMF
=
11
1
Ii
—1
12
1
2
1
3
2
0
2
01415
[2
IA
C
= I
permutation matrices are unitary, Theorem 1.2.1 allows us to write (EMF)t = F*M?E* so that = Now apply Theorem 4 to obtain (EMF)t. In our example, Since
so that
—88
66
—6
36
—12
30
15
—35
30
—20
9
1
18
10
18
15
36
30
33
330
1
—3 —6 —6 —12 15
66 30
9
18
33
—55
—88 —55 —35 —20 1
10
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
Theorem 3.3.5.
Let A. C, R and D be conformable matrices such that
lAd
A is square, non-singular, and rank (A) = rank
matrix M
61
R
=
and let
D
•
Let M denoe the
The matrix W is
l[A* R*].
non-singular and Mt =
Proof We first prove the matrix W is non-singular. Write W as W = A*AA* + R*RA* + A*CC* + R*DC*. From Lemma I we know = that D = RA 'C so that W = A*AA + R*RA* + A*CC* + R*RA (A*A + R*R)A - l(AA* + CCt). Because A is non-singular, the matrices (AtA + RtR) and (AAt + CCt), are both positive definite, and thus non-singular. Therefore, W must be non-singular. Furthermore, W' = + CC*yl + RtR) '.Using this, one can now verify
the four Penrose conditions are satisfied. U In both Theorems 4 and 5, it is necessary to invert only one matrix whose dimensions are equal to those of A. In Theorem 5, it is not necessary to obtain the matrices P and Q as in Theorem 4. However, where problems of ill-conditioning are encountered (see Chapter 12), Theorem 4 might be preferred over Theorem 5.
4.
Block triangular matrices
Definition 3.4.1 For conformable matrices T,, , T,2 , T21, and T22, matrices oftheform fT11
L°
0
122J
LT21
T22
are called upper block triangular and lower block triangular, respectively. It is important to note that neither T,, nor T22 are required to be square in the above definition. Throughout this section, we will discuss only upper block triangular matrices. For each statement we make about upper block triangular matrices, there is a corresponding statement possible for lower block triangular matrices.
Definition 3.4.2
For an upper block triangular matrix,
T is a properly partitioned upper block triangular matrix if T is upper block triangular of the form T = I L
A
"22J
where the
62 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
dimensions of G11 and G22 are the same as the dimensions of the transposes of T11 and T22, respectively. Any partition ofT which makes T into a
properly partitioned matrix is called a proper partition for T.
111101 lo 0 0 ii
LetT=
Example 3.4.1
Loooi]
1/30 1/3 0 1/30
0 0 0
0 1/2 1/2 There are several ways to partition T so that T will have an upper block triangular form. For example,
Iii 1101 and T2 Ii = 100101 L000: iJ Loolo 1 i
are two different partitions of T which both give rise to upper block triangular forms. Clearly, T, is a proper partition of T while T2 is not. In fact, T1 is the only proper partition of T. Example 3.4.2 If T is an upper block triangular matrix which is partitioned as (1) whether T11 and T22 are both non-singular, then T is properly partitioned because
T_l_1Tu
I
T1 22
1 I
Not all upper block triangular matrices can be properly partitioned.
Example 3.4.3
Let T =
111
2 L20
'I
11
2
12
4 0J . Since there are no zeros in
11
L001J
—11
8
25J The next theorem characterizes properly partitioned matrices. —
10
Theorem 3.4.1 Let T be an upper block triangular matrix partitioned as (1). T is properly partitioned (land only c 1) and Furthermore, when T is properly partitioned, Tt is given by R(Tr2)
rTt all —L
_TfT Tt I
I
i
(Note the resemblance between this expression and that of Example 2.)
Proof Suppose first that T is properly partitioned so that Tt is upper block triangular. It follows that iTt and TtT must also be upper block triangular. Since iT' and VT are hermitian, it must be the case that they
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 63
are of the form
fR
0]
FL
0
Lo
R2]
Lo
L2
By using the fact that 1'TtT = T, one obtains R1T11 =T11 and L2T22 =T22, R1T12 =T12 and T12L2 =T12
(3)
Also, R1 , R2, L1 and L2 must be orthogonal projectors because they are
hermitian and idempotent. Since R1 = T1 1X for some X and R1T11 = T11, Likewise, we can conclude R(R = i)' and hence T = From (3), L2 = VT22 for some Y and T22 = T22L2 implies L2 = we now have 2
T —T12a"dP
T*_T* 12 12'
4
c
(5)
and therefore R(T12) c R(T11) and
To prove the converse, one first notes that (5) implies (4) and then uses this to show that the four Penrose conditions of Definition 1.1.3 are satisfied by the matrix (2). A necessary condition for an upper block triangular to be properly partitioned is easily obtained from Theorem 1.
•
Let T be partitioned as in (1). If T is properly partitioned, rank(T11) + rank(T22).
Corollary 3.4.1 then rank(T)
Proof If T is properly partitioned, then Tt is given by (2) and T1
rTTtL'_ 1T12 = T12 so that iTt =L
rank(T)= rank(TTt) = rank(T1 rank(T22). U 5.
1
—
i
1
0 22
+ rank
Thus, 22
= rank(T1
+
The fundamental matrix of constrained minimization
Definition 3.5.1
Let V E CA
be any matrix in
xr•
x
be a positive semi-definite matrix and let C*
The block matrix B
Iv C*1 is called the =LC 0 ]
fundamental matrix of constrained minimization.
This matrix is so named because of its importance in the theory of constrained minimization, as is demonstrated in the next section. It also plays a fundamental role in the theory of linear estimation. (See Section 4 of Chapter 6.) Throughout this section, the letter B will always denote the matrix of Definition 1. Our purpose here is to obtain a form for Bt.
64 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Note that ifS is the permutation matrix S =
then
rc* vit
B=[0 vi cjS*sothatBt=SLo rc*
Thus, we may use Theorem 3.4.1 to obtain the following result.
Theorem 3.5.1
Bt =
jf and only tfR(V) c R(C*).
In the case when R(V) R(C*), it is possible to add an appropriate term to the expression in Theorem 1 to get Bt.
Theorem 3.5.2 For any positive semi-definite let E = I — and let Q = (EVE)t. Then Bt
Proof
XII
—
=
and any C*ECN
VCt].
(1)
+
Since V is positive semi-definite, there exists
XII
such
that
Now, E = E* = E2so that
V=
(2) Q = (E*A*AE)t = ([AE]*AE)t and hence R(Q) = R( [AE]*) R(EA*) c R(E). This together with the fact that Q = Q* implies
EQ=QandQE=Q,
(3)
so that CQ = 0 and Q*C =0. Let X denote the right-hand side 01(1). We shall show that X satisfies the four Penrose conditions. Using the above information, calculate BX as
+ EVQ 0
—
BX
:
EVCt
- EVQVCt
Use (3) to write EVQ = EVEQ = QtQ and EVQV = EVEQEV = EA*AE(AE;t(AE)*t(AE)*A = EA*AE(AE)t(AE) x (AE)tA = EA*AE(AE)f A = (AE)*(AE)(AE)tA = (AE)*A
=EA*A=EV.
(4)
Thus,
+ Q'Q
BX — L
0
0 1
:ccti' :
which is hermitian. Using (5), compute BXB
(5)
=
From (3) and (4) it is easy to get that EVEQV = EVQV = EV, and hence BXB = B. It follows by direct computation using (5) that XBX = X.
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 65
Finally, compute XB
I
QVE+CtC
01
:
From (3)
= QVE = QEVE = QQ'. In a manner similar to that used in obtaining (4), one can show that VQVE = yE, so that XB
=
which is hermitian. We have shown that X satisfies all four Penrose
conditions so that Bt = X. U 6.
Constrained least squares and constrained generalized inverses
In this section, we deal with two problems in constrained minimization. Let beC'", and f€C". Let 5" denote the set
5= {xIx=Ctf+ N(C)}. That is, 9' is the set of solutions (or least squares solutions) of Cx = 1, depending on whether or not fER(C). .9' will be the set of constraints. It
can be argued that the function Ax — b attains a minimum value on .9'. The two problems which we will consider are as follows. Problem
Find mm
1
Ax
—
b
and describe the points in 9' at which
the minimum is attained as a function of A, b, C, and
f.
Problem 2 Among the points in 9 for which the minimum of Problem 1 is attained, show there is a unique point of minimal norm and then describe it asa function of A,b,C, and f. The solutions of these two problems rest on the following fundamental theorem. This theorem also indicates why the term 'fundamental matrix' was used in Definition 3.5.1. Theorem 3.6.1 q(x) = Ax — b Ax0 —
2•
Let A, b, C, 1, and .5" be as described above, and let A vector x satisfies the conditions that XØE.9' and
bil Ax — bil for all XE9' !fand only !f there is a vector
such that z0
IA*A
Lc
[°] is a least squares solution of the system = :
C*][x]
fA*b
o]Ly]Lf
Proof Let B and v denote the block matrices IA*A C*1 IA*bl B Suppose first that is a least squares = =L g solution of Bz = v. From Theorem 2.1.2, we have that Bz0 = BBt,. From equations (2) and (5) of Section 5, we have I
66 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
BB
ICtC + (AE)t(AE) =[
o
where E = I
-'
—
C C,
so that Bz, = BBt, implies A*Ax0 = C*y. = CtCA*b + (AE)t(AE)A*b = CtCA*b + (AE)t(AE)(AE)*b = CtCA*b + (AE)*b = A*b
(1)
and
Cx0=CCtf. From (2) we know x E9'. Write x, = Ctf + h0 where h0eN(C). For every xe9' we have x = C1f+ so that
IIAC'f+ = AC'f + =
(2)
Ax —b112 —
ACtf — Ah, +
A; — b
(3)
112
For all h€N(C), we may use (1) to get (Ah,Ax0 — b) = (h,A*(Ax0
—
b))= +
(h, _C*y0)= —(Ch,y0)=O. Hence(3)becomesq(x)= q(x0) so that q(x) q(x) for all xeb°, as desired. Conversely, suppose x0e9' and q(x) q(x). If Ctm is decomposed as Ctm =
A(N(C)) + [A(N(C))]'-, then
A; — b = Ah + w, where hEN(C),WE[A(N(C))]-'-.
(4)
We can write q(x0) =
Ah + w II 2 = H
Ah 2 + w 2.
(5)
Now observe that (x0 — h)E$° because x0E$" and heN(C) implies C(x0 — I.) = CCtf. By hypothesis, we have q(x) q(x) for all xeb' so that q(x0) q(x0 — h) = II (Ax — b) — Ala 112 = (A1 + w) — Ah 112 = w (from (4)) = q(x0) — Ala 112 (from (5)). Thus Ah =0 and (Ax0 — b)E
[A(N(C))]' by (4). Hence for any geN(C), 0= (Ag, Ax0 — b) = (g, A*Ax0 A*b), and (A*Ax, — = R(C*). This means there exists a — A*b or vector( — such that C*( — y0) = A*Ax + C*y, = A*b = A*b — (AE)*b + (AE)*b
(6)
= A*b — (AE)*b + (AE)t(AE)(AE)*b = A*b — EA*b + (AE)IAEA*b
= [CtC + (AE)I(AE)]A*b Now (6) together with the fact that x0e9', gives
1
= RBt
rA*bl L
therefore
[c
is a least squares solution of —
• IJ
FA*bl
The solution to Problem 11$ obtained directly from Theorem 1.
j
and
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
Theorem 3.6.2
The set of vectors M c .9' at which mm
67
Ax — b is
atta.ned is given by
M = {(AE)t(b — AC'f) + Ctf+(I
=I—
(M will be called the set of constrained least squares solutions). Furthermore where E
Ax — b
mm
= 11(1 —
A(AE)t)(ACtf —
b)
(7)
Proof From Theorem I, we have M
=
Bt[A;b]+(I — BtB)['L].
and
arbitrary).
By Theorem 5.2, M =
QQt = [(AE)*(AE)]l(AE)*(AE) = = (AE)t(AE)sO that M becomes M {QA*(b — ACtf) + Ctf + (I — Note that, R((AE)') = R((AE)*) = R(EA*) R(E), so that Q
E(AE)t = (AE)t and (3) of Section 5 yields QA* = (EQE)A* = E(AE)t(AE)*t(AE)* = E(AE)t = (AE)t. Thus M becomes M = {(AE)t(b — ACtf) + Ctf+ For each mEM, we wish to write the (I expression Am — b Ii. In order to do this, observe (8) implies that A(AE)t(AE) = AE and = so that A(I — when
(8)
=0
for all {eN(C). Expression (7) now follows. I The solution to Problem 2 also follows quickly. Let M denote the set of constrained least squares solutions as given in Theorem 6.2. If u denotes the vector u = (AE)t(b — Act f) + Cti, then u is the unique constrained least squares x for all XE M such solution of minimal norm. That is, U EM and
Theorem 3.6.3
that x
u.
Proof The fact that UE M is a consequence of Theorem 2 by taking =0. To see u has minimal norm, suppose x€M and use Theorem 2 to = N(ct*) Since R((AE)t(AE)) = write x = u + (I — (AE,t(AE)g, R((AE)*) = R(EA*) R(E) = N(Ct*), it follows that ct*(AE)t AE =0 and Therefore it is now a simple matter to verify that u (I — — (AE)t(AE)g 112 112 u 112 with equality holding if and lix = u 112 + 11(1 only if(I — = 0, i.e if and only if u = x. U From Theorems 2 and 3, one sees that the matrix (AE)t is the basic quantity which allows the solution of the constrained least squares problem to be written in a fashion analogous to that of the solution of the unconstrained problem. Suppose one wished to define a 'constrained generalized inverse for
68 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
A with respect to C' so that it would have the same type of least squares
properties in the constrained sense as At has in the unconstrained sense. Suppose you also wanted it to reduce to At when no constraints are present (i.e. C = 0). The logical definition would be the matrix (AE)t. Definition 3.6.1 For Ae C and CE Xfl the constrained generalized inverse of A with respect to C, denoted by is defined to be — CtC))t. = (APN(c))t = (A(I (Notice that reduces to At when C = 0.) The definition of could also have been formulated algebraically, see Exercise 7.18. The solutions of Problem 1 and Problem 2 now take on a familiar form. The constrained least squares solution of Ax = b of minimal norm is. + (1—
xM =
(9)
The set of constrained least squares solutions is given by
M=
+ (I —
(10)
Furthermore, mm
lAx —bli =
(11)
b)lt.
11(1 —
The special case when the set of constraints defines a subspace instead of just a flat deserves mention as a corollary. Let V be a subspace of and P = The point X_E Vofminimal norm at which mm Ax — b H is attained is given by
Corollary 3.6.1
(12)
and the set of points M Vat which mm H Ax — b H is attained is "€1,
(13)
Furthermore, mm
Ax
—
b
= H (I —
AA' )b Il.
(14)
Proof C = and f=0, in (9),(10), and (11). Whether or not the constrained problem Ax = b, x e V is consistent also has an obvious answer. Corollary 3.6.2 If Vis a subspace of Ax = b, xe Vhas a solution and only
and P =
= b (i.e. problem is consistent, then the solution set is given by V) and the minimal norm solution is Xm =
then the problem
If the + (I —
Proof The problem is consistent if and only if the quantity in (14) is zero, that is, = b. That this is equivalent to saying
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 69
follows from (8). The rest of the proof follows from (13) and (12). U In the same fashion one can analyse the consistency of the problem Ax = b, xe{f+ Vis a subspace) or one can decide when two systems
possess a common solution. This topic will be discussed in Chapter 6 from a different point of view. 7. 1.
Exercises Use Theorem 3.1.3 to prove Theorem 3.3.2.
2. Prove that if rank(A) =
R(C)
R(A), and R(R) c R(A*),
then D = RAT. 3. Let Q = D — RAtC. If R(C) c R(A), R(R*) c R(A*), R(R) c R(Q), and R(C*) R(Q*), prove that — fA — fAt + AtCQtRAt —QtRAt LR Qt
DJL
4. Let P = A — CDtR. If R(R)
R(D), R(C*) c R(D*), R(R*)
R(P*), and
R(P), write an expression for
R(C)
in terms of
5. If M =
C, R and D.
..
rA
.
[c* Dj is a positive semi-definite hermitian matrix such
that R(C*At) R(D — C*AtC), write an expression for Mt. 6. Suppose A is non-singular in the matrix M of Exercise 5. Under this assumption, write an expression for Mt. 7. If T22 is non-singular in (1) of Section 4 prove that B= — T1 1)T12 + is non-singular and then prove that
8.
If T11 is non-singular in Exercise 7, write an expression for Tt.
9. Let T
= IA
LO*
ci where
Derive an
CECTM, and
expression for TT
10. Prove that the generalized inverse of an upper (lower) triangular matrix T of rank r is again upper (lower) triangular if and only if there
exists a permutation matrix P such that PTP
] where
=
T1 E C' 'is a non-singular upper (lower) triangular matrix. 11. For such that rank(A) = r, prove that AtA = AA' if and only if there exists a unitary matrix W such that W*AW
=
] where
70 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
T1 eC'
xr
a nonsingular triangular matrix. Use Exercise 10 and the fact that any square matrix is unitarily equivalent to an upper (lower) triangular matrix. is
and
12. For
write an expression for
rAlt LRi
13. Prove Theorem 3.4.1 for lower block triangular matrices. 14. Give an example to show that the condition rank(T) = rank(T1 + rank (T22) is not sufficient for (1) of Section 4 to be properly partitioned. 15. Complete the proof of Theorem 3.3.3. 16. Complete the proof of Theorem 3.3.5. 17. If V is a positive definite hermitian matrix and if C is conformable, let K = V + CCt and R = C*KtC. Show that
Cit IKt — KtCRtC*Kt
Iv
[Ct
:
oJ
[
:
KtcRt
RtC*Kt
18. The constrained generalized inverse of A with respect to C is the unique solution X of the five equations (1) AXA = A on N(C), (2) XAX = X, (3) (AX)t = AX, (4) PN(C)(XA)t = XA, on N(C) (5) CX =0. 19. Complete the proof of Theorem 3.1.1 20. Derive Theorem 3.3.1 from Theorem 3.3.3.
4
Partial isometries and EP matrices
1.
Introduction
There are certain special types of matrices which occur frequently and called unitary if A* = A - l, have useful properties. For example, A e hermitian if A = A*, and normal if A*A = AA*. This should suggest to the reader questions like: when is A* = At?, when is A = At?, and when is
AtA = AAt? The answering of such questions is useful in understanding the generalized in'.'erse and is probably worth doing for that reason alone. It turns out, however, that the matrices involved are useful. It should probably be pointed out that one very rarely has to use partial isometrics or the polar form. The ideas discussed in this short chapter tend to be geometrical in nature and if there is a geometrical way of doing something then there is probably an algebraic way (and conversely). It is the feeling of the authors, however, that to be able to view a problem from more than one viewpoint is advantageous. Accordingly, we have tried to develop both the geometric and algebraic theory as we proceed. Throughout this chapter denotes the Eudhdean norm on C". 2.
Partial isometries
Part of the difficulity with generalizing the polar form in Theorem 0.3.1 X form to AECM X m # n, was the need for a 'non-square unitary'.
We will now develop the appropriate generalization of a unitary matrix. Definition 4.2.1 Suppose that VeCtm x n m. Then V is called an isometry fl Vu = u fl for all UECN. The equation Vu = u may be rewritten (Vu, Vu) = (u, u) or (V*yu, u) = (u, u). Now if C1, C2 are hermitian matrices in C" then (C1u,u)= (C2u,u) for all UEC" if and only if C1 = C2. Thus we have: Proposition 4.2.1 is an isonietry if and only if V*V = A more general concept than isometry is that of a partial isometry.
72
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
a subs pace M. Then Definition 4.2.2 Let =M partial isometry (of M into Ctm) and only if (1)
is a
IIVUH=IIuIIforallu€Mand
(ii) Vu=Oifu€M'.
The subspace M is called the initial space of V and R(V) is called the final space.
A partial isometry V (or y) sends its initial space onto its final space without changing the lengths of vectors in its initial space or the angles between them. In other words, a partial isometry can be viewed as the identification of two subspaces. Orthogonal projections are a special type of partial isometry. Partial isometrics are easy to characterize. Theorem 4.2.1
Suppose that VeCtm
X
Then the following are equivalent.
(i) V is a partial isometry
(ii)
V*=Vt.
VV*—D a R( 's') s"" (iv) V=VV*V. (v) V* = (vi) (VV)2 = (V*V).
—
R( V) —
Initial space oF V
(vii) (VV*)2 =
Proof The equivalence of (1) and (iv)—(vii) is left to the exercises, while the equivalence of (ii) and (iii) is the Moore definition of yt• Suppose then that V is a partial isometry and M is its initial space. If ueM, then (Vu, Vu) = (V*Vu, u) = (u, u). But also R(V*V) = R(V*) = N(V)1 = M. Thus is hermitian. If ueM-'-, then V*Vu = 0 since )LYIM =IIM since Vu =0. Thus Similar arguments show that VV* = and = (iii) follows. To show that (iii) implies (I) the above argument can be done in reverse.
Corollary 4.2.1 If V is a partial isometry, then so is V*. For partial isometrics the Singular Value Decomposition mentioned in Chapter 0 takes a form that is worth noting. We are not going to prove the Singular Value Decomposition but our proof of this special case and of the general polar form should help the reader do so for himself.
Proposition 4.2.2
Xfl
Suppose that V cC"' is a partial isometry of rank r. Then there exist unitary matrices UeC"' X and We C" X "such that
:Jw. Proof Suppose that V cC'" XIt is a partial isometry. Let M = R(V*) be its initial space. Let { b1,... ,b,} be an orthonormal basis for M. Extend this to
PARTIAL ISOMETRIES AND EP MATRICES 73
an orthonormal basis
... ,bj of C". Since V is
= {b1 ,... ,b,,
isometric on M, {Vb1, ... ,Vbj is an orthonormal basis for R(V). Extend {Vb1,... ,Vb,} to an orthonormal basis = {Vb1, ... of CTM. Let W be the unitary transformation which changes a vector to its Let U be the unitary transformation coordinates with respect to basis which changes a 132-coordinate vector into a coordinate vector with respect to the standard basis of CTM. Then (1) follows. • We are now in a position to prove the general polar form.
Theorem 4.2.2
(General Polar Form). Suppose that AeCTM
Xn
Then
(1) There exists a hermitian BeC" such that N(B) = N(A) and a partial isometry such that R(V) = R(A), N(V) = N(B), and A = VB. X (ii) There exists a hermitian Ce C'" TM such that R(C) = R(A) and a partial isometry W such that N(W) = N(A), R(W) = R(C), and A = CW.
Proof The proof is motivated by the complex number idea it generalizes. We will prove (1) of Theorem If z = rern, then r = (zzl"2 and e" = z(zzT 2 and leave (ii), which is similar, to the exercises. Let B = (Recall the notation of page 6.) Then BeC" and N(A*A) N(B) = = N(A). Let V = AB'. We must show that V is the required partial isometry. Notice that Bt is hermitian, N(Bt) = N(B), and R(Bt) = R(B). Thus R(V) = R(ABt) = R(AB) = R(A(A*A)) = R(A) and N(V) = N(AA*A) = N(A) = N(B) as desired. Suppose then that ueN(V)1 = R(B). Then Vu 112 = (Vu, Vu) = (ABtU, AB'u) = (BtA*ABtU, u) = (BtBIBtU, u) = u Thus V is the required partial isometry. U The proof of the singular value decomposition theorem is left to the
exercises. Note that if D is square, then
ID O'lIi L o'
01
o'iLo
where
ID L0
01
0] can be factored as Ii 01. a partial
ID 01.is square and
k
[o
Of
isometry. A judicious use of this observation, Theorem 2, and the proof of Proposition 2 should lead to a proof of the singular value decomposition. While partial isometries are a generalization of unitary matrices there are some differences. For example, the columns or rows of a partial isometry need not be an orthonormal set unless a subset of the standard basis for C" or C'" is a basis for R(V*) or R(V).
Example 4.2.1
0
.ThenVisa
0 partial isometry but neither the columns nor the rows (or a subset thereof) form an orthonormal basis for R(V) or R(V). It should also be noted that, in general, the product of a pair of partial isometrics need not be a partial isometry. Also, unlike unitary operators,
74 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
square partial isometrics can have eigenvalues of modulus unequal to
one or zero. Example 4.2.2
Let
v=
Then V is a partial isometry and
a(V)= 10
Example 4.2.3
3.
Let V =
1
0
01 .
1
[000J
Then V is a partial isometry and
EP matrices
The identities A5A
= AM for normal matrices and A 'A = AA' for
invertible matrices are sometimes useful. This suggests that it might be
helpful to know when AtA = AAt.
Definition 4.3.1
and rank(A)= r. If AtA = AAt,
Suppose that
then A is called an EP,, or simply EP, matrix. The basic facts about EP matrices are set forth in the next theorem.
Theorem 4.3.1 (I)
Suppose that AeC"
X
Then the following are equivalent.
AisEP
(ii) R(A) = R(A5) = R(A) N(A) (iii) (iv) There exists a unitarr matrix U and an invertible r x r matrix A1, r = rank (A), such that (1)
: U5.
Proof (1). (ii) and (iii) are clearly equivalent. That (iv) implies (iii) is obvious. To see that (iii) implies (iv) let 13 be an orthonormal basis for C" consisting of first an orthonormal basis for R(A) and then an orthonormal basis for N(A). is then the coordinate transformation from standard
coordinates to fl-coordinates. • If A is EP and has the factorization given by (1), then since U, unitary
are
(2)
Since EP matrices have a nice form it is helpful if one can tell when a matrix is EP. This problem will be discussed again later. Several conditions implying EP are given in the exercises.
PARTIAL ISOMETRIES AND EP MATRICES
75
It was pointed out in Chapter 1 that, unlike the taking of an inverse, the taking of a generalized inverse does not have a nice 'spectral mapping property'. If A e invertible, then Aec(A) if and only
')
(3)
and Ax = Ax
if and only if A 'x =
(1\ x. )
(4)
While it is difficult to characterize matrices which satisfy condition (3), it is relatively easy to characterize those that satisfy condition (4). Notice that (4) implies (3).
Theorem 4.3.2
Suppose
that
Then A is EP if and only jf
(Ax = Ax if and only if Atx = Atx).
(5)
Proof Suppose that A is EP. By Theorem 1, A = is unitary and A11 exists. Then Ax = Ax if and only
]
U where U
]
Ux = A
=0, then u1 =0, and AtX =0. Thus (5) holds for A =0. If A
0,
then u2 =0 and u1 is an eigenvector for A1. Thus (5) follows from (2) and (4). Suppose now that (5) holds. Then N(A) = = Thus A is EP
by condition (iii) of Theorem 1. U Corollary 4.3.1 If A is EP, then Aec(A) if and only tfAtea(At). Corollary 1 does not of course, characterize when A is EP.
Example 4.3.1
Notice that a(A) = {O) and At = A*.
Let A
= Thus Aeo(A) if and only if A'ec(A). However, AtA
AAt_.[i 4. 1.
while
=
Exercises If V, W, and VW arc partial isometrics, show that (VW)t = WtVt using only Theorem 1.
76 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=0, If V. W are partial isometries and [W, VVt] =0 or [V. then WV is a partial isometry. a partial isometry and U, W are unitary, show that 3. If VeC" x UVW is a partial isometry. 4. Show that the following conditions are equivalent. 2.
(i) V is a partial isometry (ii) VV*V = V (iii) V*VV* = (iv) (V*V)2 = = VV* (v) 5. Prove part (ii) of Theorem 2. *6. Prove the Singular Value Decomposition Theorem (Theorem 0.2). 7. Prove that if A*A = AA*, then A is EP. 8. Prove that the following are equivalent.
(a) A is EP
(b) [AtA,A+At]=0 (c) [AAt, A + At] =0 (d) [AtA,A + A*] = 0 (e) [AAt,A + A*] = 0
(f) [A,AtA]=O (g) [A,AAt] = 0
9. Prove that if A is EP, then (At)2 = (A2)t. Find an example of a matrix A 0, such that (A2)t = (At)2 but A is not EP. *10. Prove that A is EP if and only if both (At)2 = (A2)t and R(A) = R(A2). *11. Prove that A is EP if and only if R(A2)= R(A) and [AtA,AAt] = 0. = (Al)t, then [AtA,AAt] = 0 but not conversely. Comment: Thus the result of Exercise 11 implies the result of Exercise 10. Exercise 11 has a fairly easy proof if the condition [AtA, AAt] =0 is translated into a decomposition of C". 12. Suppose that X = What can you say about X? Give an example X of a X such that X = a partial isometry. What conditions in addition to X = Xt are needed to make X a partial isometry? 13. Prove that V is an orthogonal projector if and only if V = Vt = 14. Prove that if A, B are EP (not necessarily of the same rank) and AB = BA, then (AB)t = BtAt.
5. The generalized inverse in electrical engineering
1.
Introduction
In almost any situation where a system of linear equations occurs there is the possibility of applications of the generalized inverse. This chapter will describe a place where the generalized inverse appears in electrical engineering. To make the exposition easily accessible to those with little knowledge of circuit theory, we have kept the examples and discussion at an elementary level. Technical terms will often be followed by intuitive definitions. No attempt has been made to describe all the uses of generalized inverses in electrical engineering, but rather, one particular use will be discussed in some detail. Additional uses will be mentioned in the closing paragraphs to this chapter. It should be understood that almost everything done here can be done for more complex circuits. Of course, curve fitting and least squares analysis as discussed in Chapter 2 is useful in electrical engineering. The applications of this chapter are of a different sort. The Drazin Inverse of Chapter 7 as shown in Chapter 9 can be used to study linear systems of differential equations with singular coefficients. Such equations sometimes occur in electrical circuits if, for example, there are dependent sources.
2.
n-port network and the impedance matrix
It is sometimes desirable, or necessary, to consider an electrical network in terms of how it appears from the outside. One should visualize a box (the network) from which lead several terminals (wires). The idea is to describe the network in terms of measurements made at the terminals. One thus characterizes the network by what it does, rather than what it physically is. This is the so-called 'black box' approach. This approach appears in many other fields such as nuclear engineering where the black
78 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
box might be a nuclear reactor and the terminals might represent
measurements of variables such as neutron flow, temperature, etc. We will restrict ourselves to the case when the terminals may be treated in pairs. Each pair is called a port. It is assumed that the amount of current going into one terminal of a port is the same as that coming out of the other terminal of the same port. This is a restriction on the types of devices that might be attached to the network at the port. It is not a restriction on the network. A network with n ports is called an n-port network.
Given an n-port network there are a variety of ways to characterize it depending on what one wants to do. In particular, there are different kinds of readings that can be taken at the ports. Those measurements thought of as independent variables are called inputs. Those thought of as dependent variables are called outputs. We will assume that our networks have the properties of homogeneity and superposition. Homogeneity says that if the inputs are multiplied by a factor, then the outputs are multiplied by that same factor. If the network has the property of superposition, then the output for the sum of several inputs is the sum of the outputs for each input. We will use current as our input and voltage as our output. Kirchhofl's laws are useful in trying to determine if a particular pair of terminals are acting like a port. We will also use them to analyse a particular circuit. A node is the place where two or more wires join together. A loop is any closed conducting path. KIRCH HOFF'S CURRENT LAW: The algebraic sum of all the instantaneous currents leaving a node is zero. KIRCH HOFF'S VOLTAGE LAW: The algebraic sum of all the voltage drops around any loop is zero. Kirchhoff's current law may also be applied to the currents entering and leaving a network if there are no current sources inside the network. Suppose that r denotes a certain amount of resistance to the current in a wire. We will assume that our wires have no resistance and that the resistance is located in certain devices called resistors. Provided that the resistance of the wires is 'small' compared with that of other devices in the circuit this is not a 'bad' approximation of a real circuit. Let v denote the voltage (pressure forcing current) across the resistor. The voltage across the resistor is also sometimes referred to as the 'voltage drop' across the resistor, or the 'change in potential'. Let i denote the current in the resistor. Then
v=ir. that r is constant but v and i vary with time. lithe one-sided Laplace transform is taken of both sides of (1), then v = ir where v and i are now functions of a frequency variable rather than of a time variable. When v and I are these transformed functions (for any circuit), then the ratio v/i is called the impedance of the circuit. Impedance is in the same units (ohms) as resistance but is a function of frequency. If the circuit Suppose
(1)
THE GENERALIZED INVERSE IN ELECTRICAL ENGINEERING
Poril
79
Port3
0-
JPod4 Fig. 5.1 A 4.pori network.
consists only of resistors, then the impedance is constant and equals the
resistance. Impedance is usually denoted by a z. In order to visualize what is happening it is helpful to be able to picture
a network. We will denote a current source by t ,a resistor by
a
terminal by and a node by —o-----. The reader should be aware that not all texts distinguish between terminals and nodes as we do. We reserve the word 'terminal' for the ports. Our current sources will be idea! current sources in that they are assumed to have zero resistance. Resistors are assumed to have constant resistance. Before proceeding let us briefly review the definition of an n-port network. Figure 5.1 is a 4-port network where port 1 is open, port 2 has a current source applied across it, port 3 is short-circuited, and port 4 has a resistor across it. Kirchhofls current law can be applied to show that ports 1, 2 and 3 actually are ports, that is, the current entering one terminal is the same as that leaving the other terminal. Port 1 is a port since it is open and there is no current at all. Now consider the network in Fig. 5.2. The network in Fig. 5.2 is not a 4-port network. As before, the pairs of terminals 5 and 6, 7 and 8, do form ports. But there is no way to guarantee,
(4
I
5
2
6
:3
Fig. 5.2 A network which is not an n-port.
80 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Port 2
Port3
Port
Fig. 5.3
without looking inside the box, that the current coming out of terminal 4 is the same as that flowing into any of terminals 1, 2 or 3. Thus terminal 4 cannot be teamed up with any other terminal to form a port. There are, of course, ways of working with terminals that cannot be considered as ports, but we will not discuss them here. It is time to introduce the matrices. Consider a 3-port network, Fig. 5.3, which may, in fact, be hooked up to other networks not shown. Let be the potential (voltage) across the jth port. Let be the current through one of the terminals of the jth port. Since the v,, are variables, it really does not matter which way the arrow for points. Given the values of the voltages v1, v2, v3 and are determined. But we have assumed our network was homogeneous and had the property of superposition. Thus the can be written in terms of the by a system of linear equations. = Z11i1 + Z12j2 + z13i3 V2 = Z21i1 + Z22i2 + Z23j3, = z31i1 + z32i2 + z33i3 V1
(2)
or in matrix notation,
= Zi, where v,ieC3,
Z is called the impedance matrix of the network since it has the same units as impedance and (3) looks like (1). In the system of equations (2), i,, and the are all functions of the frequency variable mentioned earlier. If there are devices other than just resistors, such as capacitors, in the network, then Z will vary with the frequency. The numbers have a fairly elementary physical meaning. Suppose that we take the 3-port of Fig. 5.3 and apply a current of strength i1 across the terminals forming port 1, leave ports 2 and 3 open, and measure the voltage across port 3. Now an ideal voltmeter has infinite resistance, that is, there is no current in it. (In reality a small amount of current goes through it.) Thus i3 =0. Since port 2 was left open, we have i2 =0. Then (2) says that v3 = z31i1 or z31 = v3/i1 when = = 0. The other Zkj have similar interpretations. We shall calculate the impedance matrix of the network Example 1 in Fig. 5.4. In practice Z would be calculated by actual physical
THE GENERALIZED INVERSE IN ELECTRICAL ENGINEERING
81
Iport3
_J
1_
Fig. 5.4 A particular 3-port network. The circled number give the resistance in ohms of the resistor.
measurements of currents and voltages. We shall calculate it by looking
'inside the box'. If a current i1
is
applied across port 1 we have the situation
in Fig. 5.5. The only current is around the indicated loop. There is a resistance of
1 ohm on this loop so that v1 = Thus = v1/i1 = I. Now there is no current in the rest of the network so there can be no changes in potential. This means that v2 =0 since there is no potential change across the terminals forming port 2. It also means that the potential v3 across port 3 is the same as the potential between nodes a and b in Fig. 5.5. That is, v3 = 1 also. Hence z21 = v2/i1 =0 and z31 = v3/i1 = 1. Continuing we get
Ii
0
Li
2
Z=IO 2
11
(4)
21.
3j
In order to calculate z33 recall that if two resistors are connected in series (Fig. 5.6), then the resistance of the two considered as one resistor is the sum of the resistance of each. Several comments about the matrix (4) are in order. First, the matrix (4) is hermitian. This happened because our network was reciprocal. A network is reciprocal if when input and output terminals are interchanged, the relationship between input and output is unchanged. That is, = Second, the matrix (4) had only constant terms since the network in Fig. 5.5
'I
L___ -J Fig. 5.5 Application of a current to port I.
82 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Fig. 5.6 Two resistors in series.
was resistive, that is, composed of only resistors. Finally, notice that (4) was not invertible. This, of course, was due to the fact that v3 = v1 + v2. One might argue that v3 could thus be eliminated. However, this dependence might not be known a priori. Also the three-port might be needed for joining with other networks. We shall also see later that theoretical considerations sometimes lead to singular matrices. 3.
Parallel sums
Suppose that R3 and R2 are two resistors with resistances r1 and r2. Then if R1 and R2 are in series (see Fig. 5.6) we have that the total resistance is
r1 + r2. The resistors may also be wired in parallel (Fig. 5.7). The total resistance of the circuit elements in Fig. 5.7 is r1r2/(r1 + r2) unless r1 = r2 =0 in which case it is zero. The number r1r2/(r1 + r2)
(1)
is called the parallel sum of r1 and r2. It is sometimes denoted r1 : r2. This section will discuss to what extent the impedance matrices of two n-ports, in series or in parallel, can be computed from formulas like those of simple resistors. It will be convenient to alter our notation of an n-port slightly by writing the 'input' terminals on the left and the 'output' terminals on the right. The numbers j, j' will label the two terminals forming the jth port. Thus the 3-port in Fig. 5.8a would now be written as in Fig. 5.8b. The notation of Fig. 5.8a is probably more intuitive while that of Fig. 5.8b is more convenient for what follows. The parallel and series connection of two n-ports is done on a port basis.
Fig. 5.7 Two resistors wired in parallel.
I
t
(0)
Fig. 5.8 Two ways of writing a 3-port network.
THE GENERALIZED INVERSE IN ELECTRICAL ENGINEERING
83
FIg. 5.9 Series connection of two 3-ports.
That is, in the series connection of two n-port networks, the ports labelled 1 are in series, the ports labelled 2 are in series, etc. (Fig. 5.9). Note, though, that the designation of a port as 1, 2, 3,... is arbitrary. Notice that the parallel or series connection of two n-ports forms what appears to be a new n-port (Fig. 5.10).
Proposition 5.3.1
Suppose that one has two n-ports N1 and N2 with impedance matrices Z1 and Z2. Then the impedance matrix of the series connection of N1 and N2 will be Z1 + Z2 provided that the two n-ports are
stillfunctioning as n-ports. Basically, the provision says that one cannot expect to use Z1 and Z2,
if in the series connection,
and N2 no longer act like they did when
Z1 and Z2 were computed.
It is not too difficult to see why Proposition 1 is true. Let N be the network formed by the series connection of two n-ports N1 and N2. Suppose in the series connect.c,n that N1 and N2 still function as n-ports. Apply a current of I amps across the ith port of N. Then a current of magnitude I goes into the 'first' terminal of port i of N1. Since N1 is an n-port, the same amount of current comes out of the second terminal of port i of N1 and into the first terminal of port i of N2. But N2 is also functioning as an n-port. Thus I amps flow out of terminal 2 of the ith port of N2. The resulting current is thus equivalent to having applied a current of! amps across the ith ports of and N2 separately. But the is the sum of the potentials potential across the jth port of N, denoted across the jth ports of N1 and N2 since the second terminal of port j of N1 and the first terminal of port j of N2 are at the same potential. But Nk, = k = 1,2 are functioning as n-ports so we have that = and where the superscript refers to the network. Thus = vs/I = (41) + = as desired. + 1
Fig. 5.10 Parallel connection of Iwo 3-ports.
84 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
L______ I FIg. 5.11 Two 2-ports connected in series.
Example 5.3.1 Consider the series connection of two 2-ports shown in Fig. 5.11. All resistors are assumed to be 1 ohm. The impedance matrix of while that of the second is
the first 2-port in Fig. 5.11 is Z1
Suppose now that a current of magnitude I is applied across
=
port 1 of the combined 2-port of Fig. 5.11. The resistance between nodes a and b is 1:2=2/3. The potential between a and b is thus 2/3 I. But what is important, 1/3 of the current goes through branch b and 2/3 through branch a. Thus in the series hookup of Fig. 5.11, there is I amperes of current going in terminal 1 of the first 2-port but only 21/3 coming out of terminal I of the first 2-port. Thus the first port of the first 2-port is no longer acting like a port. If the impedance matrix Z of the entire network
z1 + z2. in many cases,
of Fig. 5.11 is calculated, we get Z
=
however, the n-ports still act like n-ports when in series and one may add the impedance matrices. In practice, there is a simple procedure that can be used to check if the n-ports are still functioning as n-ports. We see then that when n-ports are in series, that the impedance matrix of the whole network can frequently be calculated by adding the individual impedance matrices. Likewise, when in parallel a formula similar to (1) can sometimes be used to calculate the impedance matrix. Suppose that Then define the parallel sum A :B of A and B by
A :B = A(A + B)tB. If a reciprocal network is composed solely of resistive elements, then the impedance matrix Z is not only hermitian but also positive semi-definite. That is, (Zx, x) 0 for all If Z is positive semi-definite, we sometimes write Z 0. If Z is positive semi-definite, then Z is hermitian. (This depends on the fact that and not just Ra.) if A — B 0 for A A is greater than or equal to B.
Proposition 5.3.2
Suppose that N1 and N2 are two reciprocal n-ports which are resistive networks with impedance matrices Z1 and Z2. Then the impedance matrix of the parallel connection of N1 and N2 is Z1 : Z2.
Proof In order to prove Proposition 2 we need to use three facts about the parallel sum of hermitian positive semi-definite matrices Z1, Z2. The first is that (Z1 : Z2) = (Z2 : Z1). The second is that R(Z1) + R(Z2) = R(Z1 + Z2), so that, in particular, R(Z1), R(Z2) ⊂ R(Z1 + Z2). The third is that (Z1 + Z2)† is hermitian, since N(Zi*) = N(Zi), i = 1, 2. The proof of these facts is left to the exercises. Let N1 and N2 be two n-ports connected in parallel to form an n-port N. Let Z1, Z2 and Z be the impedance matrices of N1, N2 and N respectively. Similarly, let i1, i2, i and v1, v2, v be the current and voltage vectors for N1, N2 and N. To prove Proposition 2 we must show that
v = Z1(Z1 + Z2)†Z2 i = (Z1 : Z2)i.   (2)
The proof of (2) will follow the derivation of the simple case when N1, N2 are two resistors and Z1, Z2 are positive real numbers. The current vector i may be decomposed as
i = i1 + i2.   (3)
But v = v1 = v2 since N1 and N2 are connected in parallel. Thus
v = Z1 i1, and v = Z2 i2.   (4)
We will now transform (3) into the form of (2). Multiply (3) by Z1 and Z2 to get the two equations Z1 i = Z1 i1 + Z1 i2 = v + Z1 i2, and Z2 i = Z2 i1 + Z2 i2 = v + Z2 i1. Now multiply both of these equations by (Z1 + Z2)†. This gives
(Z1 + Z2)†Z1 i = (Z1 + Z2)†v + (Z1 + Z2)†Z1 i2,   (5)
(Z1 + Z2)†Z2 i = (Z1 + Z2)†v + (Z1 + Z2)†Z2 i1.   (6)
Multiply (5) on the left by Z2 and (6) on the left by Z1. Equations (5) and (6) become
(Z2 : Z1)i = Z2(Z1 + Z2)†v + (Z2 : Z1)i2,  and  (Z1 : Z2)i = Z1(Z1 + Z2)†v + (Z1 : Z2)i1.   (7)
But (Z1 : Z2) = (Z2 : Z1), i = i1 + i2, and Z1 + Z2 is hermitian. Thus addition of the two equations in (7) gives us that
(Z1 : Z2)i = (Z1 + Z2)(Z1 + Z2)†v = P_R(Z1+Z2) v.   (8)
Now the impedance matrix gives v from i. Thus v must be in R(Z1) and R(Z2) by (4), so that P_R(Z1+Z2) v = v, and (8) becomes (Z1 : Z2)i = v as desired. ■
Example 5.3.2
Consider the parallel connection of two 3-port networks shown in Fig. 5.12.   (9)
Fig. 5.12 The parallel connection of N1 and N2.
The impedance matrices of N1 and N2 are

Z1 = [1 0 1]        Z2 = [1 0 1]
     [0 2 2]             [0 1 1]
     [1 2 3]             [1 1 2]

By Proposition 2, the impedance matrix of circuit (9) is Z1 : Z2 = Z1(Z1 + Z2)†Z2. Here

(Z1 + Z2)† = [2 0 2]†  =  (1/324) [ 84  -60  24]
             [0 3 3]              [-60   66   6]
             [2 3 5]              [ 24    6  30]

so that

Z1 : Z2 = [1/2   0   1/2]
          [ 0   2/3  2/3]
          [1/2  2/3  7/6].

The generalized inverse can be computed easily by several methods. The reader is encouraged to verify that the values obtained from Z1 : Z2 agree with those obtained by direct computation from (9).
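The computation above is easy to check numerically. A sketch (Python with NumPy assumed) forms Z1 : Z2 with the Moore–Penrose inverse and compares it with the matrix obtained in the example:

```python
import numpy as np

Z1 = np.array([[1., 0., 1.],
               [0., 2., 2.],
               [1., 2., 3.]])
Z2 = np.array([[1., 0., 1.],
               [0., 1., 1.],
               [1., 1., 2.]])

# Parallel sum Z1 : Z2 = Z1 (Z1 + Z2)^dagger Z2
par = Z1 @ np.linalg.pinv(Z1 + Z2) @ Z2

expected = np.array([[1/2, 0,   1/2],
                     [0,   2/3, 2/3],
                     [1/2, 2/3, 7/6]])
print(np.allclose(par, expected))   # True
```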
4.
Shorted matrices
The generalized inverse appears in situations other than just parallel
connections. Suppose that one is interested in a 3-port network N with impedance matrix Z. Now short out port 3 to produce a new 3-port network N', and denote its impedance matrix by Z'. Since v3 is always zero in N', we must have Z'31 = Z'32 = Z'33 = 0; that is, the bottom row of Z' is zero. If N is a
reciprocal network, then the third column of Z' must also consist of zeros. Z' would then have the form
Z' = [Z'11  Z'12  0]
     [Z'21  Z'22  0]   (1)
     [ 0     0    0].
The obvious question is: What is the relationship between the Z_kj and the Z'_kj? The answer, which at first glance is probably not obvious, is:
Proposition 5.4.1
Suppose that N is a resistive n-port network with impedance matrix Z. Partition Z as
Z = [Z11  Z12]
    [Z21  Z22],
where Z22 is s × s, 1 ≤ s ≤ n. Then
Z' = [Z11 − Z12 Z22† Z21   0]
     [        0            0]
is the impedance matrix of the network N' formed by shorting the last s ports of N, if N is reciprocal.
Proof Write i = [i0; i_s] and v = [v0; v_s], where i_s, v_s ∈ C^s. Then v = Zi may be written as
v0 = Z11 i0 + Z12 i_s,
v_s = Z21 i0 + Z22 i_s.   (2)
Suppose now that the last s ports of N are shorted. We must determine the matrix X such that v0 = X i0. Since the last s ports are shorted, v_s = 0. Thus the second equation of (2) becomes Z22 i_s = −Z21 i0. Hence
i_s = −Z22†Z21 i0 + [I − Z22†Z22]h = −Z22†Z21 i0 + h′, where h′ ∈ N(Z22).   (3)
If (Zi, i) = 0, then Zi = 0 since Z ≥ 0. Thus N(Z22) ⊂ N(Z12). (Consider i with i0 = 0.) Substituting equation (3) into the first equation of (2) now gives v0 = Z11 i0 + Z12 i_s = Z11 i0 + Z12(−Z22†Z21 i0 + h′) = (Z11 − Z12 Z22†Z21)i0, as desired. The zero blocks appear in the Z' matrix for the same reason that zeros appeared in the special case (1). ■
Z' is sometimes referred to as a shorted matrix. Properties of shorted matrices often correspond to physical properties of the circuit. We will mention one. Others are developed in the exercises along with a generalization of the definition of shorted matrix. Suppose that Z, Z' are as in Proposition 1. Then Z ≥ Z' ≥ 0. This corresponds to the physical fact that a short circuit can only lower the resistance of a network and not increase it. It is worth noting that in the formula for Z' in Proposition 1 a weaker
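As a numerical illustration (a sketch with NumPy assumed; the matrix and the partition size are chosen arbitrarily), the shorted matrix can be formed directly from the formula of Proposition 1, and the ordering Z ≥ Z' ≥ 0 checked through eigenvalues:

```python
import numpy as np

def shorted_matrix(Z, s):
    """Short the last s ports: return Z11 - Z12 Z22^dagger Z21,
    padded with zero blocks to the size of Z."""
    n = Z.shape[0]
    Z11, Z12 = Z[:n-s, :n-s], Z[:n-s, n-s:]
    Z21, Z22 = Z[n-s:, :n-s], Z[n-s:, n-s:]
    top = Z11 - Z12 @ np.linalg.pinv(Z22) @ Z21
    Zp = np.zeros_like(Z)
    Zp[:n-s, :n-s] = top
    return Zp

# A positive semi-definite (resistive, reciprocal) impedance matrix.
A = np.array([[1., 0., 1.], [0., 1., 1.]])
Z = A.T @ A                                    # Z >= 0 by construction
Zp = shorted_matrix(Z, 1)                      # short the last port
print(np.linalg.eigvalsh(Z - Zp) >= -1e-12)    # Z >= Z'
print(np.linalg.eigvalsh(Zp) >= -1e-12)        # Z' >= 0
```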
type of inverse than the generalized inverse would have sufficed. However,
as we will see later, it would then have been more difficult to show that Z ≥ Z'. In the parallel sum the Penrose conditions are needed and a weaker inverse would not have worked.
5.
Other uses of the generalized inverse
The applications of the generalized inverse in Sections 3 and 4 were chosen
partly because of their uniqueness. There are other uses of the generalized inverse which are more routine. For example, suppose that we have an n-port network N with impedance matrix Z. Then v = Zi. It might be desirable to be able to produce a particular output v0. In that case we would want to solve v0 = Zi. If v0 ∉ R(Z), then we must seek approximate solutions. This would be a least squares problem as discussed in Chapter 2. Z†v0 would correspond to that least squares solution which requires the least current input (in the sense that ||i|| is minimized). Of course, this approach would work for inputs and outputs other than just current inputs and voltage outputs. The only requirements are that with respect to the new variables the network has the properties of homogeneity and superposition; otherwise we cannot get a linear system of equations unless a first order approximation is to be taken. In practice, there should also be a way to compute the matrix of coefficients of the system of equations. Another use of the generalized inverse is in minimizing quadratic forms subject to linear constraints. Recall that a quadratic form is a function φ from C^n to C of the form φ(x) = (Ax, x) for a fixed A ∈ C^{n×n}. The instantaneous power dissipated by a circuit, the instantaneous value of the energy stored in the inductors (if any are in the circuit), and the instantaneous value of the energy stored in the capacitors, may all be written in the form (Ai, i) where i is a vector made up of the loop currents. A description of loop currents and how they can be used to get a system of equations describing a network may be found in Chapter III of Huelsman. His book provides a very good introduction to and amplification of the ideas presented in this chapter.
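For instance, the minimal-current least squares input i = Z†v0 can be computed directly; a small sketch (Python with NumPy assumed, with the singular impedance matrix and target output chosen only for illustration):

```python
import numpy as np

# A singular impedance matrix and a desired (possibly unattainable) output.
Z = np.array([[2., 0., 2.],
              [0., 3., 3.],
              [2., 3., 5.]])
v0 = np.array([1., 1., 0.])

i = np.linalg.pinv(Z) @ v0   # least squares solution of Zi = v0 with minimal ||i||
print(i, np.linalg.norm(Z @ i - v0))
```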
6.
Exercises
For Exercises (1)–(8) assume that A, B, C, D ≥ 0 are n × n hermitian matrices.
1. Show that A : B = B : A.
2. Show that A : B ≥ 0.
3. Prove that R(A : B) = R(A) ∩ R(B).
4. Prove that (A : B) : C = A : (B : C).
*5. Prove that Tr(A : B) ≤ (Tr A) : (Tr B).
*6. Prove that det(A : B) ≤ (det A) : (det B).
Fig. 5.13   Fig. 5.14
7. a. Prove that if A ≥ B, then A : C ≥ B : C.
b. Formulate the physical analogue of 7a in terms of impedances.
*8. Show that (A + B) : (C + D) ≥ A : C + B : D. (This corresponds to the assertion that Fig. 5.13 has more impedance than Fig. 5.14 since the latter has more paths.)
For Exercises (9)–(14) assume that E is a hermitian positive semi-definite operator on C^n. Pick a subspace M ⊂ C^n. Let B be an orthonormal basis consisting of first an orthonormal basis for M and then one for M⊥. With respect to B, E has the matrix
E = [A  B; B*  C], where E ≥ 0. Define E_M = [A − BC†B*  0; 0  0].
9. Prove that E ≥ E_M.
10. Show that if D is hermitian positive semi-definite, E ≥ D ≥ 0, and R(D) ⊂ M, then E_M ≥ D.
*11. Prove that E_M = lim_{n→∞} E : nP_M (a numerical illustration is sketched after this exercise list).
For the next three exercises F is another hermitian positive semi-definite matrix partitioned with respect to B just like E was.
12. Suppose that E ≥ F ≥ 0. Show that E_M ≥ F_M ≥ 0.
13. Prove that (E + F)_M ≥ E_M + F_M.
14. Determine when equality holds in Exercise 13. Let L, M be subspaces. Prove that P_{L∩M} = 2(P_L : P_M).
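Exercise 11 links the parallel sum to the shorted matrices of Section 4, and the limit can be watched numerically. A sketch (NumPy assumed, with M taken to be the span of the first basis vector and E an arbitrarily chosen positive semi-definite matrix):

```python
import numpy as np

A = np.array([[2., 1., 0.], [1., 2., 1.], [0., 1., 2.]])
E = A @ A.T                              # hermitian positive semi-definite
PM = np.diag([1., 0., 0.])               # orthogonal projector onto M = span{e1}

def parallel_sum(X, Y):
    return X @ np.linalg.pinv(X + Y) @ Y

# Shorted matrix E_M: keep the M-block of E and subtract B C^dagger B*.
B, C = E[:1, 1:], E[1:, 1:]
EM = np.zeros_like(E)
EM[:1, :1] = E[:1, :1] - B @ np.linalg.pinv(C) @ B.T

print(np.allclose(parallel_sum(E, 1e8 * PM), EM, atol=1e-6))   # True
```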
7.
References and further reading
A good introduction to the use of matrix theory in circuit theory is [44].
Our Section 2 is a very condensed version of the development there. In particular, Huelsman discusses how to handle more general networks than ours with the port notation by the use of 'grounds' or reference nodes. Matrices other than the impedance matrix are also discussed in detail. Many papers have been published on n-ports. The papers of Cederbaum, together with their bibliographies, will get the interested reader started. Two of his papers are listed in the bibliography at the end of this book [27], [28]. The parallel sum of matrices has been studied by Anderson and colleagues in [2], [3], [4], [5] and his thesis. Exercises (1)–(8) come from [3] while (9)–(14) are propositions and lemmas from [2]. The theory of shorted operators is extended to operators on Hilbert space in [5]. In [4] the operations of ordinary and parallel addition are treated as special
cases of a more general type of matrix addition. The minimization of quadratic forms is discussed in [10]. The authors of [42] use the generalized inverse to minimize quadratic forms and eliminate 'unwanted variables'. The reader interested in additional references is referred to the bibliographies of the above and in particular [4].
6
(i, j, k)-generalized inverses and linear estimation
1.
Introduction
We have seen in the earlier chapters that the generalized inverse A† of A ∈ C^{m×n}, although useful, has some shortcomings. For example: computation of A† can be difficult, A† is lacking in desirable spectral properties, and the generalized inverse of a product is not necessarily the product of the generalized inverses in reverse order. It seems reasonable that in order to define a generalized inverse which overcomes one or more of these deficiencies, one must expect to give up something. The importance one attaches to the various types of generalized inverses will depend on the particular applications which one has in mind. For some applications the properties which are lost will not be nearly as important as those properties which are gained. From a theoretical point of view, the definition and properties of the generalized inverse defined in Chapter 1 are probably more elegant than those of this chapter. However, the concepts of this chapter are considered by many to be more practical than those of the previous chapters.
2.
Definitions
Recall that L(C", C'") denotes the set of linear transformations from C"
into C'". For 4eL(C",C'"), was defined in Chapter 1 as follows. C" was denoted the decomposed into the direct sum of N(4) and a one to one mapping of restriction of 4 to N(4)', so that 4j was onto R(4). was then defined to be Atx -
if xeR(A) — tO
if
Instead of considering orthogonal complements of N(A) and R(A), one could consider any pair of complementary subspaces and obtain a linear transformation which could be considered as a generalized inverse for A.
Definition 6.2.1
(Functional Definition) Let A ∈ L(C^n, C^m) and let N and R be complementary subspaces of N(A) and R(A), that is, C^n = N(A) + N and C^m = R(A) + R. Let A1 = A|_N (i.e. A restricted to N). Note that A1 is a one to one mapping of N onto R(A), so that A1⁻¹ : R(A) → N exists. For x ∈ C^m, let x = r1 + r2 where r1 ∈ R(A) and r2 ∈ R. The function Q_{N,R} defined by
Q_{N,R}x = A1⁻¹r1
is either called the (N, R)-generalized inverse for A or a prescribed range/null space generalized inverse for A. For a given N and R, Q_{N,R} is a uniquely defined linear transformation from C^m into C^n. Therefore, for A ∈ C^{m×n}, A induces such a function and we can define G_{N,R} ∈ C^{n×m} to be the matrix of Q_{N,R} (with respect to the standard basis).
In the terminology of Definition 1, A† is the (R(A*), N(A*))-generalized inverse for A. In order to avoid confusion, we shall henceforth refer to A† as the Moore–Penrose inverse of A. In Chapter 1 three equivalent definitions for A† were given: a functional definition, a projective definition (Moore's definition), and an algebraic definition (Penrose's definition). We can construct analogous definitions for the (N, R)-generalized inverse. It will be assumed throughout this section that N, R are as in Definition 1. The projection operators which we will be dealing with in this chapter will not necessarily be orthogonal projectors. So as to avoid confusion, the following notation will be adopted. Notation. To denote the oblique projector whose range is M and whose null space is N we shall use the symbol P_{M,N}. The symbol P_M will denote, as before, the orthogonal projector whose range is M and whose null space is M⊥.
is M'. Starting with Definition 1 it is straightforward to arrive at the following
two alternative characterizations of GNR.
Definition 6.2.2 (Projective Definition) For A ∈ C^{m×n}, G_{N,R} is called the (N, R)-generalized inverse for A if R(G_{N,R}) = N, N(G_{N,R}) = R, AG_{N,R} = P_{R(A),R} and G_{N,R}A = P_{N,N(A)}.
Definition 6.2.3 (Algebraic Definition) For A ∈ C^{m×n}, G_{N,R} is the (N, R)-generalized inverse for A if R(G_{N,R}) = N, N(G_{N,R}) = R, AG_{N,R}A = A, and G_{N,R}AG_{N,R} = G_{N,R}.
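Before turning to the equivalence proof, a small numerical sketch may help. The construction below (Python with NumPy assumed; the helper name nr_inverse is ours, not the book's) builds G_{N,R} from bases of the chosen complements, using the fact that G is determined by G(AU) = U and GV = 0:

```python
import numpy as np

def nr_inverse(A, U, V):
    """(N, R)-generalized inverse of A, where the columns of U span a
    complement N of N(A) and the columns of V span a complement R of R(A)."""
    AU = A @ U
    M = np.hstack([AU, V])        # square and invertible by the complement assumptions
    rhs = np.hstack([U, np.zeros((U.shape[0], V.shape[1]))])
    return rhs @ np.linalg.inv(M)

# Example: a rank-1 matrix with arbitrarily chosen complements.
A = np.array([[1., 2.],
              [2., 4.]])
U = np.array([[1.], [0.]])        # N = span{(1,0)}, a complement of N(A) = span{(2,-1)}
V = np.array([[0.], [1.]])        # R = span{(0,1)}, a complement of R(A) = span{(1,2)}
G = nr_inverse(A, U, V)
print(np.allclose(A @ G @ A, A), np.allclose(G @ A @ G, G))   # True True
```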
Theorem 6.2.1
The functional, the projective, and the algebraic definitions of the (N, R)-generalized inverse are equivalent.
Proof We shall first prove that Definition 1 is equivalent to Definition 2 and then that Definition 2 is equivalent to Definition 3. If Q_{N,R} satisfies the conditions of Definition 1, then it is clear that R(Q_{N,R}) = N and N(Q_{N,R}) = R. But AQ_{N,R} is the identity function on R(A) and the zero function on R. Thus AG_{N,R} = P_{R(A),R}. Similarly G_{N,R}A = P_{N,N(A)}, and Definition 1 implies Definition 2. Conversely, if G_{N,R} satisfies the conditions of Definition 2, then N and R must be complementary subspaces for N(A) and R(A) respectively. It also follows that AQ_{N,R} must be the identity on R(A) and zero on R, while Q_{N,R}A is the identity on N and zero on N(A). Thus Q_{N,R} satisfies the conditions of Definition 1, and hence Definition 2 implies Definition 1. That Definition 2 implies Definition 3 is clear. To complete the proof, we need only to show that Definition 3 implies Definition 2. Assuming G satisfies the conditions of Definition 3, we obtain (AG)² = AG and (GA)² = GA so that AG and GA are projectors. Furthermore, R(AG) ⊂ R(A) = R(AGA) ⊂ R(AG) and R(GA) ⊂ R(G) = R(GAG) ⊂ R(GA). Thus R(AG) = R(A) and R(GA) = R(G) = N. Likewise, it is a simple matter to show that N(AG) = N(G) = R and N(GA) = N(A), so that AG = P_{R(A),R} and GA = P_{N,N(A)}, which are the conditions of Definition 2 and the proof is complete. ■
Corollary 6.2.1 For A ∈ C^{m×n}, the class of all prescribed range/null space generalized inverses for A is precisely the set
{G ∈ C^{n×m} | AGA = A and GAG = G},   (1)
i.e. those matrices which satisfy the first and second Penrose conditions. The definition of a prescribed range/null space inverse was formulated as an extension of the Moore–Penrose inverse with no particular applications in mind. Let us now be a bit more practical and look at a problem of fundamental importance. Consider a system of linear equations written as Ax = b. If A is square and non-singular, then one of the characteristics of A⁻¹ is that A⁻¹b is the solution. In order to generalize this property, one might ask, for A ∈ C^{m×n}, what are the characteristics of a matrix G ∈ C^{n×m} such that Gb is a solution of Ax = b for every b ∈ C^m for which Ax = b is consistent? That is, what are the characteristics of G if
AGb = b for every b ∈ R(A)?   (2)
In loose terms, we are asking what do the 'equation solving generalized inverses of A' look like? This is easy to answer. Since AGb = b for every b ∈ R(A), it is clear that AGA = A. Conversely, suppose that G satisfies AGA = A. For every b ∈ R(A) there exists an x_b such that Ax_b = b. Therefore, AGb = AGAx_b = Ax_b = b for every b ∈ R(A). Below is a formal statement of our observations.
Theorem 6.2.2
For A ∈ C^{m×n}, G has the property that Gb is a solution of Ax = b for every b ∈ C^m for which Ax = b is consistent if and only if
G ∈ {X ∈ C^{n×m} | AXA = A}.   (3)
Thus the 'equation solving generalized inverses for A' are precisely those which satisfy the first Penrose condition of Definition 1.1.3.
Let us now be more particular. Suppose we seek G such that, in addition to being an 'equation solving inverse' in the sense of (2), we also require that for each b ∈ R(A), ||Gb|| ≤ ||z|| for all z ≠ Gb and z ∈ {x | Ax = b}. That is, for each b ∈ R(A) we want Gb to be the solution of minimal norm. On the basis of Theorem 2.1.1 we may restate our objective as follows. For each b ∈ R(A) we require that Gb = A†b. Therefore, G must satisfy the equation
GA = A†A = P_{R(A*)},   (4)
which is equivalent to
AGA = A and (GA)* = GA.   (5)
The equivalence of (4) and (5) is left to the exercises. Suppose now that G is any matrix which satisfies (5). All of the above implications are reversible so that Gb must be the solution of minimal norm. Below is a formal statement of what we have just proven.
Theorem 6.2.3
For AeC'"
and beR(A), A(Gb) = band
forallz#Gbandze{xIAx=b) = A and (XA)* = XA). We define the term minimum nonn generalized inverse to be a matrix which satisfies the first and fourth Penrose conditions of Definition 1.1.3. Let us now turn our attention to inconsistent systems of equations. As in Chapter 2, the statement Ax = b is to be taken as an open statement and the set of vectors {z I Az is equal to b) (i.e. the 'solution set' for the open statement) may or may not be empty, depending on whether beR(A). When the solution set for Ax = b is empty, we say Ax = b is inconsistent. In dealing with inconsistent equations, a common practice is to seek a least squares solution as defined in Definition 2.1.1. As we saw in Theorem 2.1.2, Atb is always a least squares solution of Ax = b. However, Atb is a special least squares solution. It is the least squares solution of minimal norm. In some applications, one might settle for obtaining any least squares solution and not care about the one of minimal norm. For Ae C'" X let us try to determine the characteristics of a matrix G such that Gb is a least squares solution of Ax = b for all be Ctm. To begin with, we can infer from Corollary 2.1.1 that AGb — b is minimal if and only if AGb = Pft(A)b = AAtb. This being true for all beC'" yields AG = AAt. But AG = AAt is equivalent to AGA = A and (AG)* = AG. The proof of this equivalence is left to the exercises. We formally state the above observations in the following theorem.
Theorem 6.2.4 For for all be C'"
Gb is a least squares solution of Ax = b
and only if
= A and (AX) = AX).
We define the term least squares generalized inverse to be a matrix which satisfies the first and third Penrose conditions.
Looking at Corollary 1 and Theorems 2–4, one sees that each of the different types of G matrices discussed can be characterized as a solution to some subset of the Penrose conditions of Definition 1.1.3. To simplify our nomenclature we make the following standard definition.
Definition 6.2.4 For A ∈ C^{m×n}, a matrix G ∈ C^{n×m} is called an (i, j, k)-inverse for A if G satisfies the ith, jth, and kth Penrose conditions:
(1) AXA = A,  (2) XAX = X,  (3) (AX)* = AX,  (4) (XA)* = XA.
Furthermore, the set of all (i, j, k)-inverses for A will be denoted by A{i, j, k}. For example, G is a (1,3)-inverse for A if AGA = A and (AG)* = AG. We write G ∈ A{1,3}. Note that G may or may not satisfy either of the
other two Penrose conditions. This notation requires one to pay particular attention to how the equations are ordered, but experience has shown this convention to be efficient and useful.
Notation. For A ∈ C^{m×n}, A⁻ will be used to designate an arbitrary element of A{1}. The notation is a convenience that must be treated with some care. It is frequently used to make statements which hold for the entire class A{1}. For example, the statement rank(AA⁻) = rank(A) is understood to mean that 'rank(AG) = rank(A) for every G ∈ A{1}'. The phrase 'for every G ∈ A{1}' will always be implicit, unless otherwise stated, but generally will not appear. Because expressions involving the ( )⁻ notation are not always uniquely defined matrices, ambiguities can arise. For example, in investigating the possibility of a reverse order law for (1)-inverses, what should it mean to write (AB)⁻ = B⁻A⁻? It is better to avoid the notation in situations of this type. At times it will be necessary to formulate statements more explicitly than the ( )⁻ notation allows. Some authors have assigned special notations to different kinds of (i, j, k)-inverses. We will not do this. Since almost every application of (i, j, k)-inverses involves subsets of A{1} (the equation solvers) we will simply use the ( )⁻ notation together with a qualifying phrase. For example, we might say 'let A⁻ be a least squares inverse' when we wish to designate an arbitrary element of A{1,3}, or we can simply write 'let A⁻ ∈ A{1,3}'. The term 'generalized inverse' would be inappropriate if the related concepts did not coincide with the usual notion of matrix inverse in the special case that the matrix under consideration is non-singular. Note that if A is non-singular, then A{1} = {A⁻¹}. Note also that 0 ∈ A{2} even if A is non-singular. Table 6.1 summarizes the information concerning the important types of (i, j, k)-inverses.
Table 6.1

(1)-inverse. Terminology: an Equation Solving Inverse (sometimes called a g-inverse). Properties: G ∈ A{1} if and only if Gb is a solution of Ax = b for every b ∈ R(A).

(1,2)-inverse. Terminology: a prescribed range/null space inverse (the (N, R)-inverse); some authors have also called this a reflexive inverse. Properties: if G ∈ A{1,2}, then N(A) + R(G) = C^n and R(A) + N(G) = C^m. That is, each (1,2)-inverse defines complementary subspaces for N(A) as well as R(A). Conversely, each pair (N, R), where N and R are complements of N(A) and R(A) respectively, uniquely determines a (1,2)-inverse G_{N,R} with R(G_{N,R}) = N and N(G_{N,R}) = R.

(1,3)-inverse. Terminology: a Least Squares Inverse. Properties: G ∈ A{1,3} if and only if Gb is a least squares solution of Ax = b for every b ∈ C^m.

(1,4)-inverse. Terminology: a Minimum Norm Inverse. Properties: G ∈ A{1,4} if and only if Gb is the minimum norm solution of Ax = b for every b ∈ R(A).

(1,2,3,4)-inverse. Terminology: the Moore–Penrose Inverse, A†. Properties: A{1,2,3,4} contains exactly one element, A†, which is the (R(A*), N(A*))-inverse for A. A†b is the minimal norm least squares solution of Ax = b. If b ∈ R(A), then A†b is the solution of minimal norm.
There are, of course, several possible (i, j, k)-inverses which are not included in Table 6.1. However, they are of lesser importance and their properties can be inferred from those listed in the table. For example, a generalized inverse which will provide least squares solutions and whose range and null space are complements of N(A) and R(A), respectively, must clearly be a (1,2,3)-inverse.
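Since each class in Table 6.1 is cut out by a subset of the four Penrose conditions, a candidate inverse can be classified numerically. The following sketch (Python with NumPy assumed; the helper name penrose_conditions is ours) reports which conditions a given G satisfies:

```python
import numpy as np

def penrose_conditions(A, G, tol=1e-10):
    """Return the set of Penrose conditions (1)-(4) satisfied by G."""
    conds = set()
    if np.allclose(A @ G @ A, A, atol=tol): conds.add(1)
    if np.allclose(G @ A @ G, G, atol=tol): conds.add(2)
    if np.allclose((A @ G).conj().T, A @ G, atol=tol): conds.add(3)
    if np.allclose((G @ A).conj().T, G @ A, atol=tol): conds.add(4)
    return conds

A = np.array([[1., 0.], [0., 0.], [0., 0.]])
print(penrose_conditions(A, np.linalg.pinv(A)))        # {1, 2, 3, 4}: the Moore-Penrose inverse
print(penrose_conditions(A, np.array([[1., 0., 5.],
                                       [7., 0., 0.]])))  # {1}: an equation solving inverse only
```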
3.
(1)-Inverses
As already mentioned, the important types of (i,j, k)-inverses are usually
members of A{1}. Therefore, we will take some time to discuss (1)-inverses. All of the facts listed below are self-evident and are presented here for completeness.
Theorem 6.3.1
If A ∈ C^{m×n} and A⁻ ∈ A{1}, then
(i) rank(A⁻) ≥ rank(A);
(ii) rank(AA⁻) = rank(A⁻A) = rank(A);
(iii) (A⁻)* ∈ A*{1};
(iv) for non-singular P and Q, Q⁻¹A⁻P⁻¹ ∈ (PAQ){1};
(v) if A has full column rank, then A⁻A = I_n;
(vi) if A has full row rank, then AA⁻ = I_m;
(vii) if P has full column rank and Q has full row rank, then Q⁻A⁻P⁻ ∈ (PAQ){1};
(viii) [A⁻ 0; 0 0] is a (1)-inverse for [A 0; 0 0];
(ix) if A is hermitian, then there exists a hermitian (1)-inverse for A (for example, A†);
(x) if A is positive (negative) semi-definite, then there exists a positive (negative) semi-definite (1)-inverse for A (for example, A†).
Theorem 6.3.2
For A, X, B, C conformable matrices, the matrix equation AXB = C has a solution if and only if AA⁻CB⁻B = C, in which case the set of all solutions is given by
{X} = {A⁻CB⁻ + H − A⁻AHBB⁻ | H arbitrary}.   (1)
Proof If AXB = C is consistent, then there exists an X0 such that C = AX0B. Thus AA⁻CB⁻B = AA⁻AX0BB⁻B = AX0B = C. If C = AA⁻CB⁻B, then X = A⁻CB⁻ is a particular solution. To prove (1), note first that for every H, A⁻CB⁻ + H − A⁻AHBB⁻ is a solution of AXB = C. Given a particular solution X0, there exists an H0 such that X0 = A⁻CB⁻ + H0 − A⁻AH0BB⁻ (by the consistency condition); take H0 = X0. ■
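The consistency test and the particular solution of Theorem 2 are easy to check numerically; in the sketch below (NumPy assumed, random data for illustration) the Moore–Penrose inverses serve as the (1)-inverses A⁻ and B⁻:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))   # rank-deficient
B = rng.standard_normal((3, 2))
X_true = rng.standard_normal((3, 3))
C = A @ X_true @ B                               # guarantees consistency

Am, Bm = np.linalg.pinv(A), np.linalg.pinv(B)    # Moore-Penrose inverses are (1)-inverses
consistent = np.allclose(A @ Am @ C @ Bm @ B, C) # the consistency condition of Theorem 2
X = Am @ C @ Bm                                  # a particular solution when consistent
print(consistent, np.allclose(A @ X @ B, C))     # True True
```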
Corollary 6.3.1 For A ∈ C^{m×n} and b ∈ R(A), the set of all solutions of Ax = b can be written as
{x} = {A⁻b + (I − A⁻A)h | h ∈ C^n}.
Likewise, for X ∈ C^{m×n} and c ∈ R(X*), the solution set of l*X = c* is
{l*} = {c*X⁻ + h*(I − XX⁻) | h ∈ C^m}.
Corollary 6.3.2 For A ∈ C^{m×n}, the set of solutions to AX = 0 is given by
{X} = {(I − A⁻A)H | H ∈ C^{n×p}},
and the set of solutions to YA = 0 is given by
{Y} = {H(I − AA⁻) | H ∈ C^{q×m}}.
Theorem 6.3.3
If A ∈ C^{m×n}, then A{1} can be characterized by either of the following:
A{1} = {A† + H − A†AHAA† | H ∈ C^{n×m}},   (2)
A{1} = {A† + H(I − AA†) + (I − A†A)K | H, K ∈ C^{n×m}}.   (3)
Proof To prove (2), note that AXA = A is always consistent since AA†A = A. The result in (2) now follows from Theorem 2. To prove (3), note that A† + H(I − AA†) + (I − A†A)K ∈ A{1} for all H, K, and that for any particular element A⁻ ∈ A{1}, there exist an H0 and K0 such that A⁻ = A† + H0(I − AA†) + (I − A†A)K0, namely H0 = A⁻ − A† and K0 = A⁻AA†. ■
Since (1)-inverses provide solutions to consistent systems, it seems only natural to investigate the possibility of using (1)-inverses to provide a common solution, if one exists, to two systems. Let A ∈ C^{m×n}, B ∈ C^{p×n}, a ∈ C^m, b ∈ C^p, and consider the two systems
Ax=aandBx=b.
(4)
The problem is to find an
which satisfies both. This problem is clearly equivalent to solving the partitioned system
IA]
ía]
(5)
LB]X = Lb] If (1)-inverses could be obtained for the partitioned matrix M
= [B]' then
the system (4), could be completely analysed. The following result provides the solution to this problem.
Theorem 6.3.4
IA]
matrix M
For
a (1)-inverse for the partitioned
is given by
= Lii GM = [(I — (I — AA)(B(I — AA)fB)A
(I — AA)(B(I
—
AA))]. (6)
Further,nore,for CECTM Xe, a (1)-inverse for the partitioned matrix N = [A C] is given by
GN
1A(1 — C((I — AA)C)(I — AA)) ((I — AA)Cf(1 — AA) =L
7
If (f is taken to mean (1, 2)-inverse, then GM eM (1,2) and GN eN (1,2). The representation for the entire class M (1) as well as N (1) can be given in terms of GM and GN. We have chosen not to present these representations. Theorem 4 is proven by simply verifying the defining equations are satisfied. We can now say something about (4).
Theorem 6.3.5
aeR(A), beR(B), and let x0 and Xb Let be any two particular solutions for the systems
Ax=aandBx=b respectively. Let F = B(I
(8) —
AA). The following statements are equivalent.
The two systems in (8) possess a common solution.
B; — beR(F) = R(B/N(A)) —
XbEN(A) + N(B).
(9) (10)
(11)
Furthermore, when a common solution exists, a particular common solution is given by
x = (I — (I
—
AA)FB)x0 + (I — AA)Fb
(12)
and the set of all common solutions can be written as + (I — AA)(I
—
FF)hIhEC"}.
(13)
Proof The chain of implication to be proven is(1l) (11) Suppose (11) holds. Then X0 — = + nbEN(B) so that B; — b = B(x0 — xb) = BnQER(F)
If (10) holds, then the vector x
(10)
since
flQEN(A),
of (12) is a common solution
= A; = a, and
+ FF1.
= B; —
(14)
= b — FFb. — b) = B; — b, or B; — Now (10) yields that Therefore, (14) becomes Bx = b. (9) (11): If there exists a common solution then the two solution sets Thus there exist must intersect. That is, + N(A) } {xb + N(B) } and (11) vectors ;eN(A) and nbEN(B) such that x0 + n0 = Xb + follows. To obtain the set of all solutions, use the fact that they can be written as
+
=
{
=
—
+
+ (I —
G4])h}
where GM is given in (6). Now,
I
—
= I — (AA —(I— AA)FBAA) — (I — AA)FB =
(I — AA)(I — FB + FBAA) = (I — AA)(I — FB(I — AA)) = (I — AA)(I — FF), which gives (13). We shall now present some results on finer partitions than those discussed in Theorem 4. Representations for (1)- and (1,2)-inverses of
•
matrices partitioned as
M_1AIC [R ID where A, C, R, and D are any conformable matrices will be given. First, we need a technical lemma and some notation. Notation. For and TeT{1}, ET and FT will denote the
ITT.
Lemma 6.3.1
Let
such that N
=
and let X,
be (1)- or (1,2)-inversesfor X, Y and W, respectively, which satisfy
and
=0, and WX =0,
0,
=0. Then, depending on how ()
is interpreted, a (1)- or (I, 2)-inverse for N is given by
Q[ZYI—I].
(15)
where Q = FY(EWZFVIEW.
Proof The proof amounts to showing that the defining relations are satisfied. Let L1 and L2 denote the first and second term of the sum on the right-hand side of (15), so that
NL1N
Ix
1NLN 2
=LWTE
0
=
=Z— and hence, EWZYY + WWZ + EWZQZFY = Z so that NNN = N. If () Now, EWZQZFV =
and
is
interpreted as meaning a (1,2)-inverse, then a direct calculation shows that L1NL1 = L1, L1NL2 =0, and L2NL1 =0. By using the fact that Z is a (1)-inverse for Q, it is also easy to verify that L2NL2 = L2 and hence
NNN=N. •
In passing, we remark that there are three other forms of Lemma 1 which are possible by considering the following three sets of hypotheses:
ZY=0 and YZ=0. XW=OandWX=O.
YX=0,
=0, WZ =0,
(16)
(17)
=0.
=0 and
(18)
By performing a permutation of rows, or columns, or both, and then applying Theorem 2.1, a representation which resembles (15) can be obtained for each of the previous cases. With the aid of Lemma 1, it is now possible to develop a representation for a (1)- or (1,2)-inverse of a completely general partitioned matrix. Theorem 6.3.6
Let MeCTM "denote the matrix
IAC
MLRD. and let Z = D — RAC, Y = EAC, W = RFA and Q = FV(EWZFVIEW. or (1,2)-inverse for M, depending on how (f is interpreted, is given by
M
—
IA
—
ACYEA
—
+
[FAWZ +
—
FAWRA YEA
—
FAWZYEA
FAW :
+
—
I].
0
Proof For the moment, let (f denote a (1,2)-inverse and let P.S and N denote the matrices
rI Then, M = PNS
IA V II Ac1 1j.S=[0 jandN=Lw z
ol
and
a (1.2)-inverse for M
is
given by
M=S'NP1.
(20)
and W, the matrices G = YEA and H = FAW are (1,2)-inverses for V and W, respectively, such that GA =0 and AH =0. Since it is also true that each (1,2)-inverse A satisfies A Y =0 and WA =0, Lemma 1 is used to obtain a (1,2)-inverse for N as For every (1,2)-inverse
+
N=
I
—
J] (21)
Using (21) with (20) yields (19), the desired result for (1,2)-inverses. If (f is interpreted as meaning only a (1)-inverse, it is a matter of direct
computation to verify that the matrix (19) is still a (1)-inverse. U Observe that M may also be factored in three other ways as
II IAR
I
R_DCAJ[ i I1IR
o
RD
ERD
i
C—AR'D]LO
i
ICD L i
1ICA
O][ C
hID
1IDR
EDR
ojLcFD A—CDR][ i
I
o
Coupling each of these with the appropriate form of Lemma I which is obtained from (16), (17) or (18), one can use the same method as in the proof of Theorem 6 to derive three other representations for M which resemble that obtained in Theorem 6. Theorem 6 has several useful consequences. Corollary 6.3.3
If M =
where R(C) c R(A) and
RS(R) c RS(A),(RS() denotes the row space), then a particular (1)- or (1,2)-inverse for M is given by
M-
1A + AC(D — RAC)RA
—
AC(D — RACI
=L This gives the familiar form which occurs when the matrices M and A are non-singular and is taken as By using Lemma 3, we have the following.
If M
Corollary 6.3.4
where AECrXr and rank(M)
=
rank(A) = r, then a particular (1)- or (1,2)-inversefor M is given by 0 M_1A o Lo In many applications, the matrices involved are either positive or
negative semi-definite. In these cases, (1)-inverses of partitioned matrices are easy to find. Below we give the result for positive semi-definite matrices. The results for negative semi-definite matrices are left as exercises.
IfS =
Corollary 6.3.5
is positive semi-definite, then a
particular (1)-inverse for S is given by
- 1A + AC(D — C*AC)C*A
I
—
—
= Proof
Since S is positive semi-definite, S can be written as
i' 5*]'
S— —
—
—
D
c
Clearly, R(C) =
= R(A) and RS(C*) = = RS(A). The result now follows from Corollary 3. • Partitioned matrices of the form X
[x* 1
0
occurred in our treatment of the constrained least squares problem of Section 6 in Chapter 3. They also occur in statistical applications. As we shall see in the next section, (1)-inverses of B can also be used to present a unified treatment of the subject of linear models. Theorem 6 provides a (1)-inverse of B which will be used in the next section. Corollary 6.3.6 B
Iv lxi
= B
—101
Let
be positive semi-definite,
A particular (1)-inverse for B is given by
1111
.
—
VX
-
where Q =
(EXVFX.)EX. Theorem 6 can also be used to represent the rank of a large partitioned matrix in terms of ranks of matrices of lower order.
Theorem 6.3.7
Let M€CMXJ denote the matrix M
The rank
of M is given by rank(M) = rank(A) + rank(Y) + rank(W) + rank(U), where V = EAC. W = RFA. and U = EW(D — = EWZF.. Moreover, the expressions rank(Y), rank(W), and rank(U) do not depend upon which (1)-inverses are used.
Proof For every (1)-inverse M of M. the product MM is idempotent. Using Theorem 6 we compute MM as 0
1
I
so that (Tr = trace)
rank(M) = rank(MM) = Tr(MM) = Tr(AA) + Tr(YYEA) + Tr(WW) + (22) are Because E1Y = V and WW, and = U, and since AA, idempotent. (22) becomes rank(M) = rank(A) + rank(Y) = rank(W) + rank(U). To show that rank(Y), rank(W), and rank(U) are independent of the (1)-inverses used, let G1 and G2 be two (1)-inverses for A and let
E1=I—AG11E2=I—AG2,Y1 =E1Cand Y2=E2C. Because E1E2 = E1 and E2E1 = E2, it follows that V1 = E1Y2 andY2 = that rank(Y1) = rank(Y2). Similar remarks may be made about W. From the first part of the theorem, rank(U) = rank(M) — rank(A) — rank(Y) — rank(W), so that rank(U) is also constant with respect to the so
(1)-inverses used. U By performing row permutations, column permutations, or both, one can obtain three more forms of Theorem 7. Corollary 6.3.7
For the matrix M of Theorem 6.3.7,
rank(M) ≤ rank(A) + rank(C) + rank(R) + rank(Z),
where Z = D − RA⁻C, A⁻ ∈ A{1}.
Proof The inequality follows directly from Theorem 7 since
Y=EAC, W=RFA,andU=EWZFV. • Corollary 6.3.8
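The rank relations of Theorem 7 and Corollary 7 are easy to observe numerically. A sketch (NumPy assumed), using a small example in which R(C) ⊂ R(A) and the row space of R lies in that of A, so that the bound of Corollary 7 reduces to rank(M) = rank(A) + rank(Z):

```python
import numpy as np

A = np.array([[1., 2.], [2., 4.]])      # rank 1
C = A @ np.array([[1.], [3.]])          # R(C) lies in R(A)
R = np.array([[2., 4.]])                # row space of R lies in that of A
D = np.array([[7.]])
M = np.block([[A, C], [R, D]])

Z = D - R @ np.linalg.pinv(A) @ C       # pinv(A) serves as one choice of A⁻
lhs = np.linalg.matrix_rank(M)
rhs = sum(np.linalg.matrix_rank(T) for T in (A, C, R, Z))
print(lhs, "<=", rhs,
      "; here rank(M) = rank(A) + rank(Z) =",
      np.linalg.matrix_rank(A) + np.linalg.matrix_rank(Z))
```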
If rank(M) = rank(A), then
R(C) c R(A), RS(R) c RS(A), and D
= RAC
for every A. Conversely, (1 the three conditions of(23) hold for some then rank(M) = rank(A).
Proof Ifrank(M)=rank(A),thenY=0W=O,andU=0.Y=O R(A) and W =0 implies RS(R) RS(A). Since Y =0 and W =0, it follows that U = Z and U =0 implies D = RA C. Conversely, implies R(C)
the three conditions of (23) imply Y =0, W =0, and U =0. U Again, there are three additional forms of Corollaries 7 and 8 which can be proven. For example:
(23)
If R(C) c R(A) and RS(R) c RS(A), then rank(M) = = rank(A) + rank(D — RA C)for every A. Conversely, (A) and rank(D — RA C)for some A, then R(C) rank(A) +
Corollary 6.3.9
RS(R)
RS(A).
We now turn to the slightly more specialized problem of block triangular matrices. That is, either R =0 or C =0. We shall limit the discussion to upper block triangular matrices. For each statement concerning upper block triangular matrices, there is a corresponding statement about lower block triangular matrices easily proved using transposes. Let T =
AECMXn, CECrnXr, DECPXr
From Theorem 6 one gets that:
Theorem 6.3.8
A (1)-inverse for T is always given by
—CD1 where Q = FD(ECFD)EA. Furthermore, then (24) yields a (1,2)-inverse for T. From Theorem 7, we have that:
(24)
represents any (1,2)-inverse,
Theorem 6.3.9 rank(A) +
4.
For every choice of(1)-inverses in EA and FD, rank(T) = rank(D) + ranlc(EACFD).
Applications to the theory of linear estimation
Once one realizes that generalized inverses can be used to provide
expressions for solutions (or least squares solutions) to a linear system of algebraic equations, it is only natural to use this tool in connection with the statistical theory of linear estimation. Indeed, the popularity of generalized inverses during the last two decades was, in large part, due to the interest statisticians exhibited for the subject. Much of the early theory of generalized inverses was developed by statisticians with specific applications relating to linear estimation in mind. One advantage of introducing generalized inverses into the theory of linear estimation is that a unified theory can be presented which draws no distinction between full rank and rank deficient models or between models with singular variance matrices as opposed to those with non-singular variance matrices. We will confine our discussion to applications involving problems of linear estimation. A complete treatment of how generalized inverses are utilized in statistical applications would require another book almost the size of this one. In Chapter 2 the application of the Moore—Penrose inverse to least
problems was presented in a way that avoided the introduction of statistical terminology. However, in this section we will assume the reader is familiar with standard statistical terminology and some of the basic concepts pertaining to statistical models. We will analyse the linear model y = Xb + e where y is an (n x 1) vector of observable random variables, X is an (n x k) matrix of known constants, b is a (k x 1) unknown vector of parameters, and e is a (n x 1) vector of non-observable random variables with zero expectation and variance matrix E(ee*) = Var(y) = a2V. Here V is positive semi-definite and known, but is unknown. We will denote this model by (y, Xb, r2V). If rank (X) = k and V = I, then the celebrated Gauss—Markov theorem guarantees that the least squares solution 6= Xty = (X*X) 1Xy provides the minimum variance linear unbiased estimate of b. However, in the general case where X is possibly rank deficient or V is possibly singular, there may be no unbiased estimate of b. Then only certain linear functions of b are unbiasedly estimable. The problem is to obtain minimum variance linear unbiased estimates for estimable linear functions of b as well as an unbiased estimate for Throughout this section, all matrices will have real entries and (.)* denotes the transpose. When V is singular, there are some natural restrictions on y as well as b. In order to derive these restrictions, as well as other results, we will need to make frequent use of the following fact. squares
Lemma 6.4.1
For the model (y, Xb, σ²V) and for l1, l2 ∈ R^n and a1, a2 ∈ R, Cov(l1*y + a1, l2*y + a2) = σ²l1*Vl2 and Var(l*y + a) = σ²l*Vl.
Proof Cov(l1*y + a1, l2*y + a2) = E[(l1*(y − Xb))(l2*(y − Xb))] = E[l1*ee*l2] = σ²l1*Vl2. That Var(l*y + a) = σ²l*Vl follows by taking l1 = l2 = l. ■
We now investigate the restrictions which are naturally present when V is singular. Since V is positive semi-definite, there exists an orthogonal matrix S such that S*VS is the diagonal matrix, 0 22 ID2 01 2. 0jwhere21#Oand
Lo
0. r = rank(V). If T =
then it is easy to see that the model
(y, Xb, cr2V) is equivalent to the model
(I) Let S be partitioned as S =
where P is(n x r). Then (1)can be
written in the equivalent form 2
Q*y_.Q*xb+Q*e It follows that Var(Q*e) = 0 and E(Q*e) = 0 so that Q*y = Q*Xb with probability 1.
(3)
Equation (3) is just a set of linear restrictions on b. If the linear system in (3) is assumed to be consistent, then the model (y, Xb, o2V) is called a consistent model. Therefore, the model (2). and hence (y, Xb, o2V), is equivalent to a restricted model of the form (4)
where = D 1P*y, = D é = D 1P*e, R Q*X,and f= Rb = f, (7.21) has the obvious interpretation notation (i,,
The
b such that Rb = f, Var(é) = 021.
=
In the sequel, we will always assume all models we write are consistent.
Definition 6.4.1 A linear function c*b is said to be linearly unbiasedly estimable under (y, Xb, σ²V) if there exists a vector l ∈ R^n and a scalar a ∈ R such that E(l*y + a) = c*b for all b such that Q*Xb = Q*y, where Q* is as in (2). Whenever we use the term 'estimable' in the sequel, we will mean linearly unbiasedly estimable.
We are now in a position to characterize those vectors c such that c*b is unbiasedly estimable. Theorem 6.4.1
The function c*b is estimable under (y, Xb, o2V) if and
only if ceR(X*).
Proof
From Corollary 3.1, Q*Xb = Q*y if and only if
beS" = {(Q*XfQ*y + [I — (Q*xf(Q*X)]hlhepkk
and zeR such that E(l*y + x) = c*b, beb"
There exists
+ z = c*b, b€U' — l*X)b = z, be 9" —
=
l*X)[(Q*XIQ*y + (I — (Q*X)(Q*X))h] = 1*X)(Q*X)Q*y — l*X)[I — (Q*X)(Q*X)J = 0 —
Ia= = (
— I*X)(Q*X)Q*y + (c*
—
I*X)(Q*X)Q*]X
—
—
— l*X
•
There are several important consequences of Theorem 1.
Corollary 6.4.1 A linear function I*y + a is an unbiased estimate of c*b under (y, Xb, o2V) if and only !f there exists a vector d such that = y*d, with probability 1, and X*(I + d) = C.
In some of the literature on linear estimation, it is often stated that if l is a vector such that l*y is an unbiased estimate for c*b under (y, Xb, σ²V), then X*l = c. This is wrong. Corollary 4.1 shows that X*l = c is sufficient but not necessary for l*y to be an unbiased estimate for c*b under (y, Xb, σ²V).
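Estimability itself is easy to test numerically: by Theorem 1, c*b is estimable exactly when c lies in R(X*), the row space of X. A sketch (Python/NumPy assumed; the design matrix is a made-up rank-deficient one-way layout):

```python
import numpy as np

def is_estimable(X, c, tol=1e-10):
    """c*b is estimable under (y, Xb, sigma^2 V) iff c lies in R(X*)."""
    proj = X.T @ np.linalg.pinv(X.T) @ c   # orthogonal projection of c onto R(X*)
    return np.linalg.norm(proj - c) < tol

X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.]])
print(is_estimable(X, np.array([0., 1., -1.])))   # True: a group contrast
print(is_estimable(X, np.array([0., 1., 0.])))    # False: an individual effect
```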
We next need a result which will tell us what the entire set of linear unbiased estimates for c*b looks like. Suppose c*b is estimable under (y, Xb, o2V). The set of linear unbiased estimates for c*b is given by
Theorem 6.4.2
where
Proof Suppose first that
=
Clearly, E(cl') =
for some EU. Then cli = such that = c*b. Conversely, suppose is a linear
unbiased estimate for c*b. Then, by definition, there exists a vector I and a scalar a such that = I*y + a. Since i/i is unbiased, Corollary 4.1 guarantees that there is a vector d such that c' = (1+ d)*X and d*y = (with probability 1) so that
= I*y + a = I*y + d*y = (I + d)*y.
•
= c' and hence EU. with Therefore, has the form As an immediate consequence of Theorem 2, we have the following. Corollary 6.4.2 Suppose that c*b is estimable under (y, Xb, o2V). 1 is a vector such I*y is an unbiased estimate of c*b and only if there exists a = = vector { such that where Using Corollary 6.3.1, the set of unbiased estimates for c*b given in Theorem 2 can be written in terms of any element of X { 1}. Corollary 6.4.3
Suppose c*b is estimable under (y, Xb, o2V) and
XEX{1}. The set of linear unbiased estimates for c*b is given by
U=
{c'Xy +
We now address the problem of finding a form for the minimum variance linear unbiased estimate of an estimable function. From
Theorem 2 and Lemma 1, we know that when cb is estimable under (y, Xb, a2V), is a linear unbiased estimate for cb which has minimum
variance if and only if there is a vector minimum
such that
=
and the
minzVz where 9=
(6)
is attained at z = Since V is positive semi-definite, there exists a matrix A such that V = MA. Thus (6) is equivalent to minhiAzil where
(7)
This is a constrained least squares problem of the type studied in Section 3.6. Theorem 3.6.1 guarantees that the minimum (7) is attained at a vector z0 if and only if there is a vector A such that
Izi.isa least squares solution of Iv .
Let B =
:
XlIzl
101
(8)
Lc]
Using Corollary 3.6, it is easy to see that
[0]ER(B)
because for the B of Corollary 3.6,
BB[0]
=
(9)
=
by Theorem 6.4.1. Thus we can make an even stronger statement than (8) by replacing the phrase 'least squares solution' in (8) with the word 'solution' to give the following result. When cab is estimable under (y, Xb, c,r2V), is a linear unbiased estimate of cab which has minimum variance and only there are
Theorem 6.4.3 vectors
and A such that
= {y and
x][c]Io
fv
0]LafLc
LX*
A useful equivalent formulation of Theorem 3 is as follows.
Theorem 6.4.4 When cab is estimable under (y, Xb, a2V), is a linear unbiased estimate of c*b which has minimwn variance if and only if there is a vector such that each of the following conditions hold. (i)
=
(ii)X{=c (iii)
The linear unbiased estimate of cb which has minimum variance is unique in the following sense. Theorem 6.4.5
Suppose that cb is estimable under (y, Xb, a2'I). If
10
and are both linear unbiased estimates of c*b which hae'e minimum variance, then i/is = with probability I.
Proof From Theorem 4 we know that there exist vectors
and
such that (11)
il'
and
(12)
—{2)ER(X).
(13)
From (13) we know there is a vector h such that
—
we can use Lemma I together with (12) to obtain Var(il'1
—
"2) = a2({
—
Thus, there is a constant such that However, (11) and (12) imply that K =
({ —
=
—
=0. U
—
=
— —
= Xh so that
=0.
with probability I. — = =
In light of this result, we make the following definition.
Definition 6.4.2 When c*b is estimable under (y, Xb, a2V), is called the best linear unbiased estimate (or BLUE)for c*b when is the unique linear unbiased estimate for c*b with minimum variance.
If c', is the BLUE for cb, then i,1'
unique. There are, however, generally infinitely many vectors satisfying the conditions of Theorem 4 which can give rise to ci'. Although there may be a slight theoretical interest in representing all of the associated with the BLUE this is usually not the problem of prime concern. The important problem is to obtain some formula for i/i. If any one particular satisfying the conditions of Theorem 4 can be determined, then the problem of finding the BLUE of cb is considered to be solved. It is clear from Theorem 3 that knowledge of any (1)-inverse of the matrix
is
can provide a representation
for the BLUE of cb. Such a (1)-inverse can also provide other valuable information. Before pursuing this further, we need the following definition. Definition 6.4.3
For the linear model (y, Xb, o2V), let B denote the
matrix. (V is n x n and X is n x k).
An n x n matrix is said to be a B11-matrix it appears as the upper left hand block in some BeB{1}. Likewise, those n x k matrices which appear as an upper right hand block in some B are called B12-matrices; those k x n matrices which appear in the lower left hand corner of some B are called B21-matrices; and those k x k matrices which appear in the lower right hand corner of some B are called B22-matrices.
A somewhat amazing fact about B1çmatrices is that each class is
completely independent of every other class in the sense that if Q is any 1-matrix, U is any B1 2-matrix, L is any B21-matrix, and T is any is always a (1)-inverse
B22-matrix, then the composite matrix
for B. Furthermore, the .-matrices can be computed independently of each block which appears in a B can be one another. This means calculated as a separate entity without regard to any other block which might appear in the same B (or any other B). In order to establish these facts, we need some preliminary lemmas. Lemma 6.4.2
Let E = I — XX'. A matrix Q is a B1 1-matrix
and only
satisfies the four equations
E(V—VQV)=0,
(14)
VQX=O, X*QV=O,
(15)
X*QX=O.
(17)
Proof
(16)
Suppose first that Q is a B11 -matrix. Then there must exist
This implies,
matrices W12, W21, and W22 such that
by direct multiplication, that (1) VQV + XW21 V + YW + XW22X = V; (ii) VQX + XW21X = X; (iii) X*QV + X*W12X* = X*; and (iv) X*QX =0. Note that (iv) is equation (17). To establish (15), multiply (ii) on the left by X*Q* and use (iv) to obtain XQVQX =0. Since V is positive semi-definite, V = A and it is easy to see that (15) follows. Equation (16) follows in a similar manner. Equations (ii) and (iii) now degenerate to XW21X = X and X*W12X* = X*.
(18)
To establish (14), notice that for every vector I,, and BeB{1}, (9) guarantees that BB
_rol 101
Lx*hi = [x.h]
From this it follows that
VW12X* + XW22X* =0 so that (i) becomes
V—VQV=XW21V.
(19)
Equation (14) is obtained by multiplying (19) on the left by E. Conversely, if Q satisfies (14)—(17), then F and Q is
Q
(I_QV)X?*
a B11-matrix. •
B!
GENERALIZED INVERSES AND LINEAR ESTIMATION
111
Lemma 6.4.3
The term VQV is invariant for all B11 -matrices Q. Moreover. for every B11 -matrix Q, VQV = A*(AE)(AE)tA where
E=I_XXtandV=A*A. Proof
Suppose Q isa B11-matrix so that (14)—(17) hold. By direct multiplication, along with (15) and (16), and the fact that Xt = (X*X)tX* = X*(XX*)t, it can be verified that VEQEV = VQV. Let K = EVE and QKKt G= + KtKQ — Q. It now follows from (14) that QeK{1} and hence GE K { 1). Now observe that Q can be written as
+ E(I — KtK)Q + EGE
Q = Q(! —
(20)
since (15) and (16) imply KQE = KQ and EQK = QK. But from K = Kt, it follows that KKt = K'K and (I — KKt)EV =0= VE(I — KtK). Therefore, (20) together with KKtEV = EV and VEK'K = VE yields VQV =
VEGEV= Now use the fact that GEK{l} to obtain VQV = VEKtEV = VE(EA*AE)tEV = A*AE(AE)t(AE)*$EA*A = A*(AE)(AE)tA. U
Let D = A*[I — (AE)(AE)tJA where A and E are as in
Lemma 6.4.4
Lemma 3. (By virtue of Lemma 3, D = V — VQV where Q can be any B11 -matrix.) Each of the following statements hold.
U is a B12-matrix if and only ifUEXt{1} and VUXt = D.
(21)
and XLV = D.
L isB21-matrix ifand only
(22)
T is B22-matrix (land only if XTXt = — D.
(23)
Proof of (21). Suppose first that UeXt{l} where VUXt = D. To see that U is a B12-matrix, let Q = E(EVE)tE and verify that
M-1 uLxt(I_VQ)
U
E
Bi
I
by observing that Corollary 3.6 implies that Q is a B11-matrix so that (14)—(17) can be used. Conversely, suppose that U is a B12-matrix. This
means there exist matrices Q, L and T such that
1}.
From (18)we have that UEXt{1}. The fact that VUX* = V — VQV = D follows from (19). Thus (21) holds. The same type of argument is used to prove (22), (23) except in place one uses of ML
MT
IQ =
LII r
=
I
(I_QV)Xt*1
for (22) and
(I_QV)Xt*1
for (23).
U
By combining the results of Lemmas 2—4 one arrives at the following
important result concerning the independence of the various classes of
Theorem 6.4.6 If is any B1 1-matrix. G12 is any B1 2-matrix, G21 is any B21 -matrix and G22 is any B22-matrix. then the composite matrix I
r
is a (1 )-rnverse for
B. Furthermore,the matrices C, can be
L"21 computed independently of each other. The equations on which such calculations must be based are given in (14)—(17) and (21)—(23).
Although it is not necessary to know a B11 -matrix in order to compute - matrix can be useful B12-, B21 -, or B2 2-matrices, knowledge of any since the matrix D of Lemma 4 is then readily available. Once D is known, a set of B1 B21-, and B22-matrices can be easily computed. The importance of the different B11-matrices in linear estimation is given in the following fundamental theorem. If c*b is estimable under (y, Xb,
Theorem 6.4.7
i2V), then each of the
following is true. If G12 is any B12-matrix, then the BLUE of c*b is given by (24)
If C21
is any B21 -matrix, then the BLUE of c*b is also given by
(24')
Suppose and respectively. If
are both estimable with BLUE's
and
is any B22-matrix. then
=
(25)
If G22 is any B22-matrix and i/i is the BLUE of c*b, then
_a2c*G22c.
(26)
If G11 is any B11 -matrix, then an unbiased estimator for o2 is given by Iy*G 1y where y = Tr(G1 1V).
(27)
Proof of (24): II C12 is a B12 -matrix, then there exist matrices Q, L,
and T such that
[Q
for c*b is given by
Theorem 3 guarantees that the BLUE
= {*y where satisfies (10). Therefore, a solution for
for any B. Thus one solution for
[f] is
is G12c, and hence
= Proof of(24'): If G21 isa B21-matrix, then by Lemma 4,
is a
B1 2-matrix, (24') now follows from (24).
Proof of(25): If C22 is a B22-matrix, then there exist matrices Q, U and
IQU1 ,-. IeB{1}. From Theorem 1, we know that there
L such that I
L'
exists a vectors h1 and h2 such that X*h5 = c1 and X*1I2 = c2. Use (24)
together with Lemma 1 to obtain ,
= = = =
= =
=
—
—
o2htXG22X*h2
(from (21)) (from (23)) (from (21) since UEX*{1})
is immediate. is also given by — Proof of (26): This is obtained from (25) by taking Cs = c2. Proof of (27): If G11 is a B11-matrix, then Y*GY = (Xb — e)G1 1(Xb — e) = bXtG1 3Xb — 2b*X*G1 1e + e*G1 1e. Using (17), together with the fact that E(e) =0, yields The fact that Cov(sfr1.
E(y*G 1y) = E(e*G1 1e) = E[Tr(G1 1ee*)] = Tr[E(G1 = a2Tr(G1 1V).
•
0. It can be shown In (27) we made the assumption that Tr(G1 1V) that Tr(G1 1V) =0 if and only if R(V) c R(X), which is clearly a pathological situation. The details are left as exercises. Theorem 7 shows that once any element of B { 1) is known, the problem of inference from a general linear model is completely solved and the problem of inference is thus reduced to the calculation of specific Ba-matrices. Actually, knowledge of any B11 -matrix together with any element of X (1) will suffice in order to produce the quantities of Theorem 7 (i.e. a priori knowledge of a B12-, B21- or a B22-matrix is not necessary). Theorem 6.4.8
If c*b is estimable under (y. Xb. a2V) and J'Q is any 3-matrix and X is any element of X 1), then each of the following is true.
The BLUE of cb is ç(' = c*X(I — VQ)y. = o2c*X(V (ii) If i/i is the BLUE of c*b, then (1)
—
VQV)X*c.
= o2c*XDX*c.
(iii) If cl'1 and cl'2 are the BLUE's of Cb and
respectively,
(assuming each are estimable) then —
VQV)X*c2.
= (iv) An unbiased estimation for
is
ly*Qy where y = Tr(QV).
Proof If Q is a B1 1-matrix then (14)—(17) hold and it is not difficult to show that
I
Q
(I_QV)X*
1 B 11).
The desired result now follows from Theorem 7. U
If any Xis known, then a B11 -matrix is always available via the formula Q = (I — XX)*[(I — XX)V(I — XX)*J(I — XX). However, -matrix is known, then computing X is unnecessary. The next result shows that once a B11-matrix is known, then all one needs is any solution 1* of the system I*X = c. if a
Theorem 6.4.9
If c*b is estimable under (y, Xb, a2V) and J* is any = I*X and Q is any B11 -matrix, then each of the following is
solution of true.
(I) The BLUE
of c*b is = l*(I — VQ)y
(ii) If i/i is the BLUE ofc*b, then (iii)
= a21*(V — VQV)I = a21*Dl.
respectively, then and #2 are BLUE's and = and — VQV)l = o21tD12 where ITX = 2
— c*
—V
Proof We know from Theorem 1 that I*X = is always consistent so that = c*XXt. For a particular solution, there is always a particular member X€X{1} such that 1 = c*X, namely x = X' + — c*tc*Xt. The desired conclusions now follow from Theorem 8. • We conclude by considering the special, but important, case when V is
non-singular. It is a simple exercise to show that (X*V 'X) X'V1 eX{1}. It is then easy to use Lemma 2 to show that [V' — V IX(X*V 1X)
'X)X* a B11 -matrix. Therefore, D = V — VQV = and it is clear from Lemma 4 that (X*V is a B21-matrix (X*V and — - 1X) is a B22-matrix. These observations together with (7) give the following useful result. x
is
Corollary 6.4.4
If c*b is estimable under (y, Xb, σ²V) and V is non-singular, then each of the following hold.
(i) The BLUE of c*b is ψ = c*(X*V⁻¹X)⁻X*V⁻¹y.
(ii) Var(ψ) = σ²c*(X*V⁻¹X)⁻c and Cov(ψ1, ψ2) = σ²c1*(X*V⁻¹X)⁻c2, where ψ1 and ψ2 are the BLUE's for c1*b and c2*b.
(iii) An unbiased estimator for σ² is given by (1/γ)y*[V⁻¹ − V⁻¹X(X*V⁻¹X)⁻X*V⁻¹]y, where γ = n − rank(X).
Perhaps the most common situation encountered is when V = I, in
which case we have the following.
Corollary 6.4.5
If c*b is estimable under (y, Xb, σ²I), then each of the following hold.
(i) The BLUE of c*b is ψ = c*(X*X)⁻X*y.
(ii) Var(ψ) = σ²c*(X*X)⁻c and Cov(ψ1, ψ2) = σ²c1*(X*X)⁻c2, where ψ1 and ψ2 are the BLUE's for c1*b and c2*b.
(iii) An unbiased estimator for σ² is given by (1/γ)y*[I − X(X*X)⁻X*]y, where γ = n − rank(X).
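Corollary 5 translates directly into a few lines of code. The sketch below (NumPy assumed, with simulated data from a rank-deficient one-way layout) computes the BLUE of an estimable contrast and the unbiased estimate of σ²; any (1)-inverse of X*X could be used, and the Moore–Penrose inverse is a convenient choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-deficient one-way layout: y = Xb + e with Var(e) = sigma^2 I.
X = np.array([[1., 1., 0.]] * 5 + [[1., 0., 1.]] * 5)
b_true = np.array([1., 2., -1.])
sigma = 0.5
y = X @ b_true + sigma * rng.standard_normal(10)

c = np.array([0., 1., -1.])                 # an estimable contrast (c lies in R(X*))
XtX_inv = np.linalg.pinv(X.T @ X)           # one (1)-inverse of X*X
blue = c @ XtX_inv @ X.T @ y                # BLUE of c*b, Corollary 6.4.5(i)

resid = y - X @ XtX_inv @ X.T @ y
dof = len(y) - np.linalg.matrix_rank(X)
sigma2_hat = resid @ resid / dof            # unbiased estimate of sigma^2, part (iii)
print(blue, sigma2_hat)
```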
Notice that (X*X)⁻X* ∈ X{1,3}, so that (X*X)⁻X*y is just another way of representing any least squares solution of Xb = y. Also, c*(X*X)⁻ is just any solution, l*, of c* = l*(X*X). Thus Corollary 5 can also be stated in terms of solutions of c* = l*(X*X) or in terms of least squares solutions of Xb = y. Similar remarks can be made about the results in Corollary 4 because c*(X*V⁻¹X)⁻ represents any solution of c* = l*(X*V⁻¹X) and (X*V⁻¹X)⁻X*V⁻¹y represents any weighted least squares solution of Xb = y. (By a weighted least squares solution of Xb = y, we mean any vector z such that ||Xz − y||²_{V⁻¹} = (Xz − y)*V⁻¹(Xz − y) is minimized, or equivalently, any solution of the weighted normal equations X*V⁻¹Xz = X*V⁻¹y.) In conclusion, we note that not only are linear models with singular variance matrices representable as restricted linear models but that restricted linear models (y, Xb | Rb = f, σ²V) are just special cases of linear models where the variance matrix is singular. Indeed, one can always write
ỹ = [y; f],  X̃ = [X; R],  Ṽ = [V 0; 0 0],
and it is clear that the restricted model (y, Xb | Rb = f, σ²V) is equivalent to (ỹ, X̃b, σ²Ṽ), where Ṽ is singular.
5.
Exercises
Verify each of the following assertions.
1. (AØB)e(AØB){1} where A®B denotes the Kronecker product of A and B.
2. Let G = U(VAUIV and let rank(A) = r. Each of the following is true. (a) GeA{l) iffrank(VAU) = r. (b) GeA and R(G) = R(U) if rank(VAU) = rank(U). (c) and N(G) = N(V) if rank(VAU) = rank(V). (d) G is a (R(U), N(V))-inverse for A iffrank(U) = rank(V) = rank(VAU) = r. If rank(A*VA) = rank(A), then A(A*VAI(A*VA) = A and (A*VA)(A*VA)A* = A*. 4. Verify A(A*AIA* = AAt. 5. II R(C) c R(A) and RS(R) c RS(A), then RAC is invariant over A I 3.
AAAeA{1,2). 7. For reC' rAA =
6.
8. For Ge A { 1), the following statements are equivalent: (a) GE A { 1,2), (b) rank(A)= rank(G), (c) G = G,AG2 for some G, , G2eA{1J. 9. If has rank r, then there exists G1eA{ I) such that rank(G1) = r+ I, 1= 0, 1, 2, ... , min(m,n). 10. If and P is a non-singular matrix such that PA = H where H is the Hermite canonical form for A, then PEA { 1). 11. For every there exists a GeA{1) and FeB{1)
such FGe(AB){1).
116 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS 12.
Let KrnXn denotes the set of m x n matrices with integer entries. II
then there exists
such that AGEKMXM and
(Hint: Consider C = QStP where PAQ = SandS is the Smith canonical form with P and Q being non-singular.) where beR(A). Let C = QStP (as described and 13. Let above). Ax = b has an integer solution if and only if GbE XI which case the general integer solution is given by x = Gb + (I — GA)h. .
hEKNX I. 14.
and Q* be permutation matrices such that M =
Let
= rand let T = A 'C, S = RA
where rank(M) =
Then
01 (j)
(ii)
(iii) (iv) 15.
Q[...](I +T1'*) 'A '(I + S*S)[I!S*]P= Mt.
Let P and Q be non-singular matrices and let A be an r x r
non-singular matrix such that M = p'
LetG=Q
1•
P.Then
(i) GeM{l) iffZ=A'. iffZ = A1 and W = VAU. U=
(ii) (iii)
W= —
where P
= (iv) where Q
iffZ=A'.V =
—Qk?,U
= [Q, Q2]
and 16.
For A€CrnxhI. CECtm. deCk. let E = I —
AA. F = I — AA, and
fl=1 A (1)-inverse for A + cd* is given by one of the following: (i)
-
-
Acc*E*E
FF*dd*A
FF*dC*E*E
(A + cd*) = A — c*E*Ec — d*FF*d + P(c*E*Ec)(d*FF*d) when cØR(A),
GENERALIZED INVERSES AND LINEAR ESTIMATION
FF*ddSA
(ii) (A + cd*) = A — d*FFSd when (A
=0. CER(A),
+ cd*i = A — r 'A cdA when
or dER(A*).
-
(iv) (A + cd*) = A
117
0 and either c€R(A)
- AccEE when fi =0, — c*Es&
dER(A*)
(v) (A + cd*) = A when $ = 0, c€R(A),
17. At = A*(A*AA*)A*. 18. G€A{2} 1ff there exist a pair of orthogonal projections P and Q such that G = (PAQ)t.
19. GEA{1,2} 1ff there exist A, A€A{1} such that G = AAA. 20.
A{1,3)={At+(I_AA)HlHisarbitrary},A(1,4}= (At + K(I — AA)I K is arbitrary}.
21. Let A be n x n. If P is a non-singular matrix such that PA*A is in Hermite form, then PA*€A(1,3}. 22. Let M be a subspace and let P = and P1 = The constrained system Ax + y = b, xeM, yeM1 is consistent lIT beR(AP + P); in which case the solutions are x = P(AP + P)b and y = b — Ax. When
(AP + PJ1 exists, the matrix G = P(AP + P1) 'b is called the Boft—Duffin inverse.
23. (AP + P1)' exists
exists where the columns of K form a basis for M. The Bott—Duffin inverse is G = K(KAK) lK*. 24. When it exists, the Bott—Duffin inverse is the (M. M1)-inverse of PAP. 25. Let A, denote an incidence matrix of a directed graph consisting
ofmnodes (N,,N2,...,N,,,}andndirectedpaths{P1,P2,...,Pj between nodes. That is, a., = 1 if is a path directed away from N., —1 if P, is a path directed into N., and a1,=O ifP, is a path neither leads away from or into N.. Suppose the graph is connected (i.e. every pair of nodes is connected by some sequence of 1
paths.) If GeA{1,3}, then I — AG = —J where J is a matrix of l's. 26. If A is the incidence matrix of a connected di-graph, then rank(A) = m — 1, where m = number of nodes. 27. Let W be a positive definite matrix and let "v, be the norm associated with W (i.e. x = x*Wx). G is a matrix such that x = Gb is a weighted least squares solution (II Ax — b is minimal) for every b ill G satisfies AGA = A and (WAG)* = WAG (A weighted (1,3)inverse).
28. Let V be positive definite. Gy is a V '-least squares solution of Xb = y for all y if G is a B21-matrix. 29. Let V be positive definite and suppose Ax = b is consistent. G is a matrix such that x = Gb is the minimal V-norm solution of Ax = b for all beR(A) if G satisfies AGA = A and (VGA)* = VGA (A weighted (1,4)-inverse).
118 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS 30.
(Weighted Moore-Penrose inverse) AGA = A, GAG = G, (WAG) = WAG, and = VGA, ill for all b, Gb is the W-least squares solution of Ax = b which has minimal V-norm. Moreover, there exists a unique solution for G which can be expressed as G = V"2 x (W1I2AV - h12)t w 112
= v - 1A* WA(A WAY - 'A*WA)A*W.
31. Let V be positive semi-definite and let fl x denote the semi-norm (x*Vx)h12. A vector . is called a minimum V-semi-norm solution of the = c and { is minimal among all system X*z = c, ceR(X*) if solutions. The following statements are equivalent.
(i) Cc is a minimum V-semi-norm for every ceR(X*). (ii) G is a B12-matrix. (iii) GeX*{1} and XG*V = VGX*. (The same as in Exercise 29). 32. Let W be positive semi-definite. G is a matrix such that Gb is a W-least squares solution of Ax b, for all b, ifi A*WAG = A*W. This last equation is equivalent to the two conditions WAGA = WA and (WAG)* = WAG. (Notice that G is not necessarily in A { I), as was the case in Exercise 27.) 33. Let V and W be positive semi-definite. C is a matrix such that Gb is a minimal V-semi-norm W-least squares solution of Ax = b 1ff G satisfies the four conditions WAGA = WA, VGAG = VG, (WAG)* = WAG, and (VGA)* = VGA. (If V is positive definite, there exists a unique solution for G. If V is just semi-definite, G may not be unique.) 34. If V and W are positive semi-definite and X = A*WA, then every B12-matrix satisfies the four conditions of Exercise 33. 35. If Q is any B1 ,-matrix, then Tr(QV) = rank[V X] — rank[X]: Furthermore, Tr(QV)= 01ff R(V)c_ R(X) if 0 isa B,,-matrix.
36. (V + XX*IX[X*(V + XXIXI is always a B, 2-matnx. If R(X) c R(V), then VX(X*V X) is a B, 2-matnx.
37. The matrix
non-singular if rank(VA
= it and
rank(XRk)= k. 38. If M is any matrix such that R(V + XMX)= R([V!X])and if W = (V + XMX*), then L = (X*WX)X*W is a B21-matrix, L is a B,2-matrix, W(I — XL) is a B,,-matrix, and (XWXI — M is a B22-matrix. 39. The following statements are equivalent. (i) The invariant term D of Lemma 4.4 is the zero matrix. (ii) 0 is a B22-matrix. (iii) R(V) = R(VIN(x.)). (iv) rank(V) = Tr(VQ) where is any B,,-matrix. (v) R(V) R(X) =0. 40. (Use of 2-inverses in a generalized Newton's Method.) Let x0eCa let B(x0,r)be the open ball of radius r centred at x0. Let I be a function f: B(x0,r)-. C and let J(x)eC be defined for
xeB(x0,r)where X(x)eJ(x){2}. Suppose 5,; and yare constants such
GENERALIZED INVERSES AND LINEAR ESTIMATION
119
that the following hold:
IIf(u)—f(v)—J(v)(u — w)II Lilu—vU, for u, veB(x0,r) with u — veR(X(v)). (ii) (X(u) — X(v))f(v) u— for u, veB(x0, r). (iii) cIIX(u) I +yö< 1 for ueB(x0,r) (iv) X(x) 111(x) II <(1 + ö)r. (1)
I
converges to a point Then the sequence =X— peB(x0, r) which is a solution of X(p)f(x) =0. (If X(p) has full column rank, then p is a solution of f(x) = 0.) 41. For any choices of (1)-inverses for X,X*, and K = EXVFX. where = I — XX, = I — - X, the following statements are true. IfQ is a B11-matrix, there exist matrices Z1 ,Z2,G such that GeK{1} (I — KK)Z2 + and GEE. Conversely, Q = Z1(I — ÷ the matrix Q in(s) is a for every pair Z1 ,Z2, and every
B1 1-matrix.
7
The Drazin inverse
1.
Introduction
In the previous chapters, the Moore—Penrose inverse and the other (i,j, k)-inverses were discussed in some detail. A major characteristic of the (i,j, k)-inverses is the fact that they provide some type of solution, or least squares solution, for a system of linear algebraic equations. That is, they are 'equation solving' inverses. However, we also saw that there are some desirable properties that the (i,j, k)-inverses do not usually possess. For example, if A, BE then there is no class, C(i,j, k), of(i,j, k)-inverses for A and B such that A, BE C(i,j, k) implies any of the following: X
(i) AA=AA, (ii)
= (AP) for positive integers p.
(iii) Aeo(A)=.Veor(A ), (iv)
'A =
A is similar to B via the
similarity transformation P. then A is similar to B via P. Depending on the intended applications, it might be desirable to give up the algebraic equation solving properties the (1)-inverses possess in exchange for a generalized inverse which possesses some other 'inverselike' properties. The Group and Drazin generalized inverses of this chapter will be of such a compromising nature. In many ways, they more closely resemble the true non-singular inverse than do the (i,j, k)-inverses. They will possess all of the above mentioned properties. Although the Drazin inverse will not provide solutions of linear algebraic equations, it will provide solutions for systems of linear differential equations and linear difference equations as will be shown in Chapter 9. Up to this point the underlying field has always been taken to be the field of complex numbers. Although this was not always necessary, the complex numbers provided the most natural setting for the development of the Moore- Penrose inverse as well as most of the other (i,j, k)-invcrsc. To extend the concepts of the previous chapters to matrices over different
THE DRAZIN $NVERSE
121
fields is somewhat artificial. One soon finds that the kind of field needed in order to obtain analogous results must possess properties which mimic
those of the complex numbers. There is nothing special about the complex numbers when it comes to defining the Drazin inverse. However, many of the results in the latter part of this chapter depend on the taking of limits. Rather than get into a technical discussion of the type of topology needed on the field, we shall merely note that almost all our results extend to arbitrary fields. The Group inverse, as we shall see later, is just a special case of the Drazin inverse. However, because the Group inverse appears in some interesting applications, (see Chapter 8) we consider it as a separate entity.
Definitions
2.
The Drazin inverse will only be defined for square matrices. Just as was
the case when defining the (i,j, k)-inverses, there are at least two different approaches possible when formulating the definition. These are the functional or geometric definition and the algebraic definition. The algebraic definition was first given by M. P. Drazin in 1958 in the setting of an abstract ring. We will give both definitions and then show that they are equivalent. Before doing this, some preliminary geometrical facts are needed. Throughout this chapter, we adopt the convention that 0° = I.
Lemma 7.2.1
Let A be a linear transformation on C". There exists a non-negative integer k such that C" = R(4k) + N(4k).
Proof Let k
be
the smallest non-negative integer such that
...or
...
c equivalently, N(4°) N(4) c ... c N(4"2) = .... Suppose that Then there exists a zeC" such that 4kz = x. Thus, 42kz = AkX =0, so that zeN(42k) = N(4k). Thus x =0. Suppose, that rank(Ak) = r so that dim[N(Ak)J = n — r. If {v1 ... ,v,} is a basis for R(4k) and if {v,,.,... is a basis for N(4"), it is easy to show that {v1,... ,v,, is a basis for C". U The number k which was introduced in Lemma 1 will be very important. 1)
Definition 7.2.1
N(4k) = N(4kf 1) =
Let 4 be a linear transformation on C". The smallest
non-negative integer k such that C" = R(4k) + N(4k), or equivalently, the smallest non-negative integer k such that rank(4'9 = rank(4k4 I), is called the index of4 and is denoted by Ind(4).
Note that if A is invertible, Ind(4) =0. Also lnd(0) = 1. Several different characterizations of the index will be developed in the
sequel. 114 is a linear transformation on C" and Ind(4) = k, 4 restricted to R(4k)) is an invertible linear transformation on R(4k).
Lemma 7.2.2 then
122 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Proof We
• now formulate a definition of the Drazin inverse of a linear
transformation on
Definition 7.2.2
(Functional Definition.) Let 4 be a linear transformation such that Ind(4) = k. Let XECA and write x = u +v where uER(4k) and veN(4'9. Let The linear transformation defined by = ADx = is called the Drãzin inverse of 4. For let 4 be the linear transformation induced on C" by A. The Drazin inverse, AD, of A is defined to be the matrix 0f4D with respect to the standard basis. on
Theorem 7.2.1
(The Canonical Form Representation For A and AD.) is such that lnd(A) = k >0, then there exists a non-singular matrix P such that (1)
where C is non-singular and N is nilpotent of index k. Furthermore, if P, C and N are any matrices satisfying the above conditions, then
]P_l.
(2)
Proof Let 4 be the linear transformation induced on
,...
B=
be the basis for
by A. Let
constructed in the proof
of Lemma I so that {v1,...,vj isabasisforR(4k)and{v,+1,...,v0} isa basis for N(4"). Since R(4t9, N(4") are invariant subspaces for 4 and Ak(N(Ak)) = The form we have the block form for A lIP = [vi'... for AD follows from the definition of if P is as specified. However if P. C, N are such that (1) holds, and C is non-singular, and Nk =0, then the first r columns of P are a basis for R(4k) while the remaining columns are a basis for N(4"). Thus (2) for any P., C or N by Definition 2. U 4D, is as follows. The algebraic definition of
Definition 7.2.3 (Algebraic Definition.) If and 1JADECn
with Ind(A) = k
is such that
ADAAD = AD,
(3)
AAD = ADA, and
.
(4)
(5) then AD is called the Drazin inverse of A.
Theorem 7.2.2
For
to the algebraic definition
the functional definition of AD is equivalent
of AD.
THE DRAZIN INVERSE
Proof Write A as in (1). That AD satisfies (3). (4) and (5) Suppose then that X satisfies (3), (4) and (5).
Ix
Now
11
is
123
trivial.
xl 12
Iwhere X11 and Care the same size.
From (5) we have X11 =Ck,Ck4l X12 =0. Thus X11 = C-' and X12 =0. But also XAk+I =Ak by (4) and (5). Thus X21 =0. There remains to show that X22 =0. From (3) and (4) we have
= X22. Thus N&.... 'X22 =
Nk(X22)2
=0. But then Nk 2X22 = (X22) =0. Continuing in this manner gives X22 =0 as desired. U
N(X22)2
Notice that AD exists and is unique for all since the functional definition is constructive in nature. Some important facts that are evident either from the definitions or from the above proof are listed in the following corollary.
Corollary 7.2.1
If
and Ind(A) = k, then
(i) R(AD) = R(Ak), (ii) N(A°) = N(Ak),
(iii) AAD = A°A =
A4 ).N(
(iv) (I _AAD)=(I _ADA)= l'N(A1).R(A4). (v) for a non-negative integer p. A' and only (1 p k, and (vi) if A is non-singular, then AD = A '. The number k = Ind(A) was used in the algebraic definition. Actually, any non-negative integer p. p k, could have been used.
Theorem 7.2.3
Let
negative integer and Xe
be such that lnd(A) = k. If p is a nonsuch that XAX = X, AX = XA, and
= A', then p k and X =
A' implies that = R(A') so that p k. Write 'X = Ak. p = k + i. Then (AD)IAk+l = This reduces to Thus X satisfies the conditions of the algebraic definition of AD. U Something that should immediately strike one's attention when looking at Definition 3 is that AD is not always a (1)-inverse for A. This, of course, means that AD is not an 'equation solver'. That is, if b a consistent system of algebraic equations, then ADb may not be a solution. In fact, ADb is a solution of Ax b if and only if be R(Ak) X
where k = Ind(A).•
There are special cases when AD is a (1)-inverse for A E
Theorem 7.2.4
For
X
AADA = A if and only !flnd(A)
A'
1.
Proof If Ind(A)= 0, then AD = and AADA = A. Suppose that Ind(A) 1. Then relative to(l), (2) we have AADA = A if and only if 0 = N. But 0 = N Wand only if Ind(A)= 1. U
124 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
The special case when Ind(A)
rise to what is known as the Group inverse. Notice that in this case, (5) can be rewritten as AADA = A.
Definition 7.2.4
If
1 gives
is such that Ind(A)
then the Drazin inverse of A is called the Group inverse of A and is denoted by A'. When is characterized as the unique matrix satisfying the three it exists, 1,
equations AA'A=A,AAA' =A,andAA =AA. The following theorem makes it clear why is used.
the term
'group inverse'
Theorem 7.2.5
A linear transformation 4 on which has rank r belongs to a multiplicative group, G, of linear transformations on if and only if Ind(4) 1. Furthermore, if 4EG and if 49eG is the multiplicative inverse of 4 within G, then 49
=
an 4eG such 4 is in a multiplicative group G, then there exists 4D 49449 = 49. Then 49= = 4'4, and by Theorem 3 =4,
that
Also Ind(4) as in (1),
1. Conversely,
(Ix G=1P[0
01
0]P
suppose Ind(4)
1. Then with P defined
1 XeC'x ',r=rank(C)1,
is a multiplicative group containing 4. • As a special case of Theorem 1 (or Theorem 5) we have the following.
Corollary 7.2.2
For AeC
A' exists
and only if there exist
non-singular matrices P and C such that A =
The following is a simple example of a group of singular matrices.
Example 7.2.1
Consider the following subset of RNXN.
11... G=
1
...
ii...1
It is clear that G is a multiplicative group. The multiplicative identity in
Gis
then thegroupinverseofA is A
=_!1j
Another algebraic characterization of AD is illustrated in the heuristic
diagram of Fig. 7.1. is a semi-group and the G's (one for each idempotent), are the maximal subgroups of CN N Clearly, { Gj is a disjoint family but not a partition of CN* n• If Ind(A) 1, then, as pointed out for some i, and AD = A' is just the inverse of A with respect earlier, to the group G1. If Ind(A) = k> 1, then it is not difficult to show that k can X
THE DRAZIN INVERSE
125
= Some maximal
klnd (A) (A1 )*
A°
Fig. 7.1
as that number such that Ak E G,, for some r, but Ak-I ØG,, for all r. Thus, Ak has an inverse, X, within G,. The Drazin l(A&)*. 'X = inverse of A is simply AD = Suppose that the Jordan form of A is be characterized
J1
° o
0
0
... 0
1
0
0
0
oo...ió 0
o
0
01!
0
olO
o
0
0
...
0
(If a field other than C is used one still gets (1) but not possibly
If the Jordan blocks, J1, are arranged so that the diagonal elements of ,J1.,.2, J1,J2,... ,J are non-zero and the diagonal elements of arc zeros, then the matrices C and N in Theorem 1 may be taken to be
rJ1o o...ol
C=10 J20
...
andN= 10 [o
J20
...
0
o
Theorem 1 will be fundamental in the development of the theory of the Drazin inverse. However, Theorem 1 also has a practical side. One may use this theorem to compute the Drazin inverse.
Algorithm 7.2.1
Computation of A° where A E
Xii
and Ind(A) = k.
(I) Let p be an integer such that p k. (p can always be taken to
equal to n if no smaller value can be determined.) If =0, then AD =0. Thus assume 0. (II) Row reduce to its Hermite echelon form, HA,. (See Definition 1.3.2.) The sequence of reducing matrices need not be saved.
126 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
(III) By noting the position of the non-zero diagonal elements in Hi,, • select the distinguished columns from and call them (This is a basis for R(Ak).)
(IV) Form the matrix I —
and save its non-zero columns. Call (This is a basis for N(Ak).) (V) Construct the non-singular matrix P = ... I (VI) Compute P'
them v,
I
(VII) Form the product P 'AP. This matrix will be in the form
P 'AP
where C is non-singular and N is nilpotent.
=
(VIII) ComputeC'. (IX) Compute
by forming the product
Example 7.2.2
01
0
0]
Let
12
0]
0
1.
1
[—i
rc-'
—1 —l
We shall find AD by using this algorithm.
(I) Since we don't know what Ind(A) is let p = 3. Then
18001
01,
0
L 0 0 oJ and
(II)
11001 H=I0 0 01. L000J
(III) Thus,, =
81
1
—8
L
is a basis for R(Ak).
OJ
10001 (IV) Now I—HA= 10 0I,sothat L0o1J 1
101
V2=I1 LOJ
101
JolformabasisforN(Ak). L1J
18001 1
LOO1J (VI)
11001 P1=!18 8 01, 8Lo 0 8J
and
THE DRAZIN INVERSE
127
P'AP=
(VII)
(VIII) Since C = 2, C' =
We thus get
11100 00
11001 L000J
AD=PI0 0
(IX)
L000
The next characterization of AD may be useful if one tries to formulate a definition for the Drazin inverse of a linear transformation on an infinite dimensional vector space [17]. For A let C denote the class, X
C=
= XA and XAX = X}.
(Clearly, C is non-empty since OEC.) Define a partial ordering on C by X1 X2 if and only if X1AX2 = X1 = X2AX1.
Theorem 7.2.6
A° is the maximal element of C.
Proof Suppose XeC. Then X = for n = 1,2,... Thus, R(X) c for each n. In particular, R(X) c R(Ak) where k = Ind(A) so that A°AX = X. Furthermore, it is easy to see that c N(X) for N(X). It follows from this that XAA° = X. every n. In particular, Therefore, X AD for every XE C.
•
3.
Basic properties of the Drazin inverse
This section will present basic results about the Drazin inverse.
In Section 2, we saw that AD was not always a (1)-inverse for Though AADA is usually unequal to A, the product AADA = A2AD still plays an important role.
Definition 7.3.1
For
the product
= AADA = A2AD = ADA2
is called the core of A.
Intuitively, the 'core' of A should contain that which is basic to the structure of A. If is removed from A, then not much should remain. The next theorem shows in what sense this is true. Theorem 7.3.1
If
then A — CA =
is a nilpotent matrix of
index k = Ind(A).
Proof The theorem is trivial if Ind(A) =0. Thus assume Ind(A) 1
and notice that (NA)k = (A — AADA)k = (A(I — AAD))k = Ak(I — AAD) =
•
128 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Definition 7.3.2
A
For
CA = (I — AAD)A is CA + NA is called the
—
called the nilpotent part of A. The decomposition A = core-nilpotent decomposition of A.
In terms of the canonical form representation of Theorem 2.1, we have the following. Theorem 7.3.2
If AeC0 "is written as A =
where
P and C are non-singular and N is nilpotent of index k = Ind(A), then
The core-nilpotent decomposition of A is unique in the following sense. A Theorem 7.3.3 For A=X+Y where XY = YX =0. Ind(X) I, and Y is nilpotent of index k = Ind(A). Moreover, this unique decomposition is given by A = CA + NA.
Proof Let X, Y be as described in Theorem 3. If md (X) =0, then V =0 and A is invertible. Suppose then that Ind(X) = 1. Let P. C be invertible
'Then V =
matrices so that X =
since
XY = YX =0 and C is invertible. Thus Y2 is nilpotent with Ind(Y2) = Ind(A) since Y is. But A =X + Y = Y
so
that X =
= NA by Theorem 2. •
Corollary 7.3.1 = NA, , and
If A e C' xn and if p is a positive integer, then If p Ind(A), then + = CA, + NA, =
= CA., =
The next lemma summarizes some of the basic relationships between
A,CA,NA,and AD. Lemma 7.3.1
For
'
the following statements are true.
I
I
iflnd(A)=0
(ii) NACA=CANA=O. (iii) NAAD = ADNA = 0. (iv) CAAAD = AADCA = CA. (v)
(vi) A =
CA
(vii)
if and only if Ind(A) 1. = AD.
(viii) AD = (ix) (AD)* = (A*)D.
(In the case of a general field, (*) is taken to mean transpose)
THE DRAZIN INVERSE
129
There are cases when the Drazin inverse coincides with the Moore—
Penrose inverse.
Theorem 7.3.4
For
X
A° = At (land only if A is an EP matrix,
(See Chapter 7for a discussion of EP matrices.)
Proof If A is EP, then AAt = AtA. Since At is always a(l,2)-inverse for A, it follows that At = A' = AD. Conversely, if At = AD, then AAt = AAD = ADA = AtA so that A must be EP. • 4.
Spectral properties of the Drazin inverse
In what follows, o() will always denote the spectrum, that is the set of eigenvalues. For a non-singular matrix A, it is easily proven that AEO(A) if and only if A 'ea(A 1). Furthermore, x is an eigenvector for A corresponding to A if and only if x is an eigenvector for A corresponding
toA'.
Recall the definition of a generalized eigenvector.
Definition 7.4.1
If A€C"
and x is a non-zero vector such that there =0 and is a positive integer p and a scala! AEO(A)for which (A — (A — Al)"- 1x 0, then x is called a generalized eigenvector for A of grade p.
An eigenvector of grade one is, of course, just an eigenvector. For a non-singular matrix A, it is well known that x is a generalized eigenvector for A of grade p corresponding to Aeo'(A) if and only if x is a generalized eigenvector for A' of grade p corresponding to A'Ea(A 1)• The next theorem shows that the same situation holds for Drazin inverses of singular matrices.
Theorem 7.4.1
For
such that Ind(A)= k, Aeo(A) (land only if
x is a generalized eigenvector for A of grade p corresponding to Aeo(A), A #0, (land only (lx is a generalized eigenvector for AD of grade p corresponding to A' Furthermore, x is a generalized eigenvector for A corresponding to A =0 (land only (fXE N(Ak) = N(AD).
Proof If Ind(A) =0 we are done. Suppose that A =
x=P
.
L"2J
Then x is a generalized eigenvector for A of grade p for A
0
if and only if u2 =0 and u, is a generalized eigenvector of grade p for C. Since C is invertible and A =
case is obvious. • Corollary 7.4.2
L
0
0 JI"
we
are done. The A =0
Let be such that Ind(A) = k. If x is a generalized eigenvector for A corresponding to A #0, then XGR(Ak).
130 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
5.
A° as a polynomial in A
If A is a non-singular matrix, then it is easy to show that A1 can be expressed as a polynomial in A. This property does not carry over to the (i,j, k)-inverses. In particular, if A is square, then there may not exist a polynomial p(x) such that At = p(A). However, the Drazin inverse of A is always expressible as a polynomial in A.
Theorem 7.5.1
If
then there exists a polynomial p(x) such that
AD = p(A).
Proof
where
Use Theorem 2.1 and write A as A =
P
and C are non-singular and N is nilpotent of index k = Ind(A). Since c is non-singular, we know that there exists a polynomial q(x) such that '.Then C' = q(C). Let p(x) be the polynomial defined by p(x) = q(N)]k
[C_l
=AD.
The polynomial constructed in the proof of Theorem 1 is generally of
much higher degree then is actually necessary. The next theorem shows how one might actually construct a polynomial p(x) such that p(A) = AD. Unlike Theorem 1 it uses the fact that A is a matrix over C.
Theorem 7.5.2 [77] Let distinct eigenvalues of A and
of
Suppose that {A0,Z, ,22, ... ,t,} are the =0. Let denote the algebraic multiplicity
letm=n-m0=m1+m2+...+m1. Let p(x) be the polynomial of
... degreen— coefficients are the unique solutions of the following m x m system of linear equations. denotes the ith derivative with respect to x.)
= = —
2'"'
fori=1,2,...,t, ('"'-'pci
Then p(A) = AD.
Proof
Since
A is similar to a Jordan form. Write
THE DRAZIN INVERSE
131
where J and N are the block diagonal matrices,
A=
J = Diag[B1, ... ,Bj, N = Diag[F1,
,F9]. Each is an elementary Jordan block corresponding to a non-zero eigenvalue. That is, each B, is of the form A,
I
o
o
0
0
A,
1
0
0
o o
0 0
0,
A,
0 0
0
...
A,
(1)
1
O...O
A,
and s m1. Each F, is an elementary Jordan block corresponding to a zero eigenvalue. That is, each is of the form (1) with A, =0. Clearly, J is non-singular and Ne CTM0 "o is nilpotent of index k = Ind(A) in0. Therefore, AD
]T- 1• Now, p(A) =
=
0]T 1, because NN0 = 0 implies p(N) = 0. Since p(J) = =
Diag[p(B1),... ,p(Bh)], it suffices to show that using (I), it is not difficult to verify that p(A,)
p'(A,)
p"(21)
p(S_ 1'(A)
1!
2!
(s—i)!
B' for eachj. But
p'(A,)
0
I!
p(B1) = p"(A,)
2! p'(A,) 1!
0
0 1
—1
0
0
SxS
(1Y'l
I
A,
A
—1
.4..
= B;1. •
—1 "1
0 Thus, p(A) =
0
•
0...0
1
A,
132 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Theorem 2 can sometimes be useful in computing AD. This is particularly true if in0 is large with respect to n. The following is an example where
Theorem 2 can be used quite effectively.
Example 7.5.1
Let A
We shall use
.
=
[
=
=
—
Theorem 2 to compute The first (and, in general, the most difficult) step is to compute the eigenvalues for A. They are a(A) = {O, 0, 1, 1 }. Thus, in0 =2 and m1 =2 so that Theorem 2 implies that AD can be expressed as AD = A2(cx0I + oi1A) since p(x) = Now and + are the solutions of the system:
ç
1
—1 =p'(l)=2ix0+3a1 = — 3, and
Therefore,
AD=A2(41_3A)= For each A E
X
there are two polynomials of special importance.
These are the characteristic polynomial and the minimal polynomial. Let us examine each one. Consider first the minimal polynomial for A,
It is not difficult to show that A is non-singular if and only if; #0;
in which case, A' =
—
+ CLd....
+ ... + a2A + x11). Now,
assume A is singular so that ç =0. Let i be the smallest number such that = ..• This number, I, is sometimes called the index of the zero eigenvalue. The next theorem (valid in a general field) shows that the index of the zero eigenvalue of A is the same as the index of A.
Theorem 7.5.3
+ ... +cçx', with
IfA€C
#0, is the minimal polynomial for A, then I = Ind(A).
Proof Use Theorem 2.1 and write A as A =
0]P_1 where C is
non-singular and N is nilpotent of index k = md (A). Since ni(A) =0, we can conclude that 0= m(N) = Nd + + ... + ; + +1 + ;N1
+ ...
+cz1J)
THE DRAZIN INVERSE
is
133
invertible we have N' =0. Hence I k. Suppose that k < i. Then, ADAi=Ai_l.
(2)
Write m(x) = x1q(x) so that 0= rn(A) = A1q(A). Multiply both sides of this 'q(A). Thus, the polynomial r(x) = by AD and use (2) to obtain 0= such that r(A) =0 and deg[r(x)] <deg[rn(x)]. This is a x' 1q(x) is contradiction. Therefore, we can conclude that k = i.
•
Corollary 7.5.1
Let k = Ind(A), and m0 denote the algebraic multiplicity of the zero eigenvalue. It is always the case that m0 k.
Proof The minimal polynomial, from Theorem 5.3, is m(x) = xd_k + ,Xd ,x + ;). (ç 0), and m(x) must divide p(x). + + 2d-
•
1
X
When one uses Theorem 2 to compute the Drazin inverse of A E
it is necessary to compute each eigenvalue of A along with the multiplicities of each eigenvalue. Many times one can compute the coefficients of the characteristic polynomial for A easier than the eigenvalues. The following theorem shows how to obtain AD from the characteristic polynomial for A. and let k = Ind(A). Write the characteristic
Theorem 7.5.4 Let equation for A as 0= foq(x), 0). Let
+ fl,,
,x"'
+
,x +
=
if m0
+ ... +
+
—
+
(3)
ifm0=n
0,
Then, AD = A' [r(A)
+
'for each integer I
k.
=0, and the result is trivial. Proof If m0 = n, then A is nilpotent and to Thus assume m0 < n. Multiply both sides of 0= Amoq(A) by By obtain 0= ADq(A). From this, it easily follows that AD = raising both sides to the (I + 1)th power, we obtain (AD)s ii = AAD[r(Afl
Multiplication on both sides of this by A' yields AD = A'[r(A)]"'. • Since the index of a matrix can never exceed its size, nor the number rn of Corollary 1, we have the following.
Corollary 7.5.2
For
=
=
where
r(x) is the polynomial in Theorem 4.
For
the coefficients in the characteristic equation
for A can be computed recursively by the well known algorithm [43] (Tr denotes trace). (4)
134 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
where (5)
=
1A1' + ...
the next result showing how this algorithm may be used to obtain the matrix r(A) is immediate. Since
+
Theorem 7.5.5 AD =0.
Let
and let r(x) be as in (3).
If n > m0, then r(A) =
and
where
—
computed from (4) and (5). Thus AD
=
If n = rn,, then are
for each I Ind(A).
—
NoticethatifS,=0,thenO=S+j =S..2= ...
and obtain r(A), to P0 = 0. Thus, it is easy to use Theorem 5 = and AD as follows. To compute
Algorithm 7.5.1
for
(I) Set S0 = I and recursively compute
=
+
=
until some S1 =0, but S1_1 #0.
—
3
(II) Let u
=
that number such that and = fin-u-2 = =0. (Notice that n — u = m0, the algebraic multiplicity of the
be
zero eigenvalue.)
(III) Let I = n
as AD =
(IV) Compute
=
compute
— u and
—
Note that not all of the computed Si's must be saved. If fi,, - #0, then 2can be forgotten. However needs to be saved until next
non-zero fi appears. Also notice that this algorithm produces the value of the algebraic multiplicity of the zero elgenvalue for A. Example 7.5.2
Let —8 —10
6
—3']
8
—4)
i
—1
1
L—2
2
—2
[io
A—' 12 I
oJ' 2]
We shall use Algorithm I to compute AD. (I) Successive calculations give
S0=I,
r I I
12 1
L—2
—8 —13
6 —31 8
—1 —2
—41
01'
2—2—1J
THE DRAZIN INVERSE 12
—10
4
—4
4 —4
4
r—14 AS1
= L
—12
12
—10
S2=AS1+21=[4
4
2
—
=2,
P0 = —
= 0,
=
1
135
—4J 5
4 —4 4 —2 0
0 S3
=
AS3 =
0000
0000
and
S4=AS3= 0. Therefore t =4 in this example and the algebraic multiplicity of the zero eigenvalue is in0 = 2.
(II) Setu=2. (III) Set 1=2. (IV) Compute AD =
=
—
as follows. Since S2 = AS1 +
—
= = + Write this as we have that $2(AS1)]S1. Since AS2 and AS1 have already been computed, only two matrix multiplications are necessary. This is more efficient than forming directly. In general, one can always do something like the product this when this algorithm is used. Now $21,
—12
12
24 = —24 — 12 12
00
6
—12 —24
12
12
6
—
00
16
—12
8
—4
—8
8
—8
8
[(AS2)S1—2(AS1)]=
—
136 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
and
12
—8
—16
16
r—16
L
16
—16]
Therefore, 2 —
8
I
'—I—i L—2
6.
1
—1
1
2
—2
2
A° as a limit
It was previously shown how the Moore— Penrose inverse could be expressed by means of a limiting process. In this section, we will show how the Drazin inverse and the index of a square matrix can also be characterized in terms of a limiting process. Whenever we consider the limit, as z 0, of an expression involving (A + zI)1 we shall assume that —
If
Definition 7.6.1
and '1 CA and NA denote the core and
nhlpotent part of A, respectively, then for integers m — 1, we define (AD,
,fm=O and
ifm1 (0
,
ifm=—1 EfmO,
ifm=—I
(0,
ifm=O.
afm1
(.Nr,
(.
Theorem 7.6.1 z-'O
and let Ind(A) = k. For every integer 1 k,
Let Ae
+zI)'A'.
(1)
For every non-negative integer I, AD
Proof Ic
= urn (A"' + zI)
(2)
If k =0, then A is non-singular, and the result is evident. For
>0, use Theorem 3.1 and write
(A"' + zI)
r(cs+1
I —'C'
I
01
-:
=
+ zI) 'C' = C- ', (2) is proven. A' and (1) follows. •
Since C is non-singular, and Jim (C"'
If l k,
z-O
then C? = Since it is always true that k n, we
also have the following
THE DRAZIN INVERSE
Corollary 7.7.1
AD
For
The index of
= urn
137
+ zI) 'An.
also be characterized in terms of a limit. Before doing this, we need some preliminary results. The first is an obvious consequence of Theorem 3.1. can
Xli
Lemma 7.6.1 Let Ind(A") = I and only
a singular matrix. For a positive integer p, p Ind(A). Equivalently, the smallest positive integer Ifor which Ind(A1) = 1 is the index of A. be
Lemma 7.6.2 Let be a nilpotent matrix such that Ind(N) = k. For non-negative integers m and p. the limit (3)
z-0 exists
and only
+ p k. When the limit exists, its value is given by
lit4
=
urn zm(N + zI)
ifm>O ,fm=O
0
z—0
(4)
Proof If N = 0, then, from Lemma 2.1, we know that k =
1.
The limit
under consideration reduces to
ifp=O
ifpl.
0,
z—0
It is evident this limit will exist if and only if either p 1 or m 1, which is equivalent to m + p 1. Thus the result is established for k = 1. Assume k—i
N1
nowthatk1,i.e.N#O.Since(N+zIY1=
Z
1=0
+(_lr_2zNm+P_2 ÷
+(—
+
(—
z
+ (5)
.
If m + p k, then clearly the limit (3) exists. Conversely, if the limit (3) exists, then it can be seen from (5) that
Theorem 7.6.2
For
= 0 and hence m + p k. U
where Ind(A) = k and for non-negative
integers m and p. the limit (6) urn z"(A + zI) z-0 exists tf and only !fm + p k: in which case the value of the limit is given by
limf'(A + zI)'A" =
{( — Ir
(I_AAD)Am4Pi,
(7)
138 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Proof If k =0, the result is immediate. Assume k 1 and use Theorem 2.1 where P and C are non-singular and N is
to write A = nilpotent of index k. Then
zm(A + zI) 'A"=
(8)
Because C is non-singular, we always have
limz'"(C+zI)
1
,
z-.O
ifm>O ifm=O
(9)
Thus the limit (6) exists if and only if the limit lim z"(N + zI) exists, :-. 0 which, by Lemma 2, is equivalent to saying m + p k. The expression (7)
is obtained from (8) by using (9) and (4). • There are some important corollaries to the above theorem. The first characterizes Ind(A) in terms of a limit. Corollary 7.6.2
For AeC"
the following statements are equivalent
(i) Ind(A) = k. (ii) k is the smallest non-negative integer such that the limit lim (A + zI) tAk exists. z-0 (iii) k is the smallest non-negative integer such that the limit urn z*(A +
exists.
(iv) If Ind(A) = k, then lim (A + zI) iAk = (AAD)Akl z-0 (v) And when k> 0, lim zk(A + zI)' = (—
=
1)
1)
z-0
Corollary 7.6.3 For lim (A + zI) '(A' + z'I) = z-'O
Corollary 7.6.4
A''.
For
and for every integer l Ind(A) >0,
the following statements are equivalent.
(1) Ind(A)l. (ii) lim(A+zI)1A=AA'. z-0
=I—AA'.
(iii) Jim
z-0 The index can also be characterized in terms of the limit (1).
Theorem 7.6.3
For
X
the smallest non-negative integer, 1, such
that
lim (A'11 +zI)'A' exists is the index of A.
(10)
THE DRAZIN INVERSE
139
Proof If Ind(A) = 0, then the existence of(10) is obvious. So suppose Ind(A) = k 1. Using Theorem 2.1 we get
zI)
— —
0
L
1
The term (C'4 ' + zI) IC: has a limit for alt 1 0 since C is invertible.
which has a limit if and only if N' =0. That is, 1 Ind(A). U
The Drazin inverse of a partitioned matrix
7.
This section will investigate the Drazin inverse of matrices partitioned as
M
fl where A and C are always assumed to be square.
=
Unfortunately, at the present time there is no known representation for MD with A, B, C, D arbitrary. However, we can say something if either D =0 or B =0. In the following theorem, we assume D =0.
If M
Theorem 7.7.1
where A and Care square,
= k = Ind(A), and 1= Ind(C), then MD =
X = (AU)2[
(AD)IBCi](I
—
[AD
CD] where
CCD)
jzO
+ (I —
—
= (AD)2[
—
ADBCD
CCD)
IwO
rk-* 1 + (I — AAD)I Z AB(CDY l(CD)2 — ADBCD. Ls.O J (We define 00
(2)
I)
Proof Expand the term AX as follows. i—i
lCD
'BC' — 0
o
-
k—i
s—i
AX = k—I
2 —
0
140 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=
/
\ / ADBCCD
1-2
I ADB +
(AD)1. 2BC11' 1—I
0
++
+ /k—I
+1
iB(CDYi +
— (
k-i
+E
0
1k-i
= ADB +
1-2
)—
1—2
1—2
(AD)i+2BCI.I.l 0 —
—
ADBCCD — 0
k-i
ADA1+ iB(CD)i+l
+
—
AADBC.
Now expand the term XC as follows. (AD)I.I.2BCI.41
XC =
k—i
I—i
1—i
o
(AD)i+2BC1+2CD +
—
0
o k— I
—
—
ADBCDC
0
/1—2
'+' +(AD)l1BCI
=1
'BC') + (BcD +
+
1B(CDY11)
+ It
—
\
/1—2
1—f
\o
I
k—i
AIB(CDY+1)
—
(ADABCD
ADBCDC.
ADX AB cD]=[o CD][o ci'
AB ADX
Fromthisitfollowsthat[0
so that condition (3) of Definition 2.3 is satisfied. To show that condition (2)
holds, note that
FAD X irA B1IAD X] [0 CDJ[o cj[o CDf[0
I
ADAX + XCCD + ADBCD CD
:
Thus, it is only necessary to show that ADAX + XCCD + ADBCD = X.
However, this is immediate from (2). Thus condition (2) of Definition 2.3 is satisfied. Finally, we will show that
IA
[o cJ
[0
X 1
IA
cDf[o c] IA
.]
(3)
IA' S(p)1 =Lo c'
THE DRAZIN INVERSE
141
p—i
'BC'. Thus, since n + 2> k and n + 2>1,
whereS(p)= 1=0
IA
]
X
[0 C]
An42X+S(fl+2)CD
c°][o
[o
Therefore, it is only necessary to show that 2X + S(n + 2)CD S(n + 1). Observe first that since 1+ k < n + 1, it must be the case that AhI(AD)i
=
for I = 1, 2, ...
=
(AD)IBCI 1(1
(4)
,1 — 1.
Thus, 1 Li=o
ri-i
= L 1—0
—
CCD)
1BCD
—
(5)
J 1
An-IBCI](I — CCD) —
I—i
I—i
1=0
10
IBCD
= =
Now, S(n +
IBCICD
=
IBCICD 1=0
1—0 11+ i
+ 1 1
1
- IBCICD =
By writing
1BCD
.IBCICD
+
1—0
=A11BCD+
1=1 l—1
1=0
we
obtain 1_i S(n
+
2)CD
=
1BCD
lCD +
+ 1=0
(6) 1=1+ 1
It is now easily seen from (5) and (6) that 1—1
A"2X +S(n + 2)CD_ Z
+1—1+1 Z
i—I
n
1=0
1=1
=
=
A"'BC' = S(n +
1),
i—0
which is the desired result. U By taking transposes we also have the following.
142 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
If
Corollary 7.7.1
L
where
A andC are square
= with Ind(A) = k and Ind(C)
1, then LD is given by LD
AD] where
=
X is the matrix given in (2).
There are many cases when one deals with a matrix in which two blocks are zero matrices. Corollary 7.7.2
Let
lAB] Mi=[0
lAo]
1001
A
A is square and each
is square. Then,
rAD
rAD 0
—L0
lOB
]'
2—
til "IMD_ 3 — (AD)2B AD 0]'
and B(AD)2 AD
Each of these cases follows directly from Theorem 1 and Corollary 1.
The next result shows how lnd(M) is related to Ind(A) and Ind(C). Theorem 7.7.2
If M
with A, C square, then
= Max{Ind(A), Ind(C)} Ind(M) lnd(A) + Ind(C).
[
Proof By using (iii) of Corollary 6.2 we know that if Ind(M) =
m,
then
the limit
limf'(M+zI)'
(7)
z-0
exists. Since
z"(M + 1)'—
+ 0
f'(C+zI)'
J'
(8)
one can see that the existence of the limit (7) implies that the limits
lim z"(A + 21)_i and urn f'(C + zI)' exist. From Corollary 6.2 we can z—'O
conclude that Ind(A) m = Ind(M) and Ind(C) m = Ind(M), which
establishes the first inequality of the theorem. On the other hand, if Ind(A) = k and Ind(C) =1, then by Theorem 6.2 the limits lim zkft(A + zI) lim + zI)' and lim 2k '(A + zI)'B(C + ZI)_1 = z—O z—O z—'O lim [z"(A + zI) I] B[z(C + zI)'J each exist. z—O
THE DRAZIN IN VERSE 143
+ zI)
Thus urn
exists and lnd(M) k +
I.
U
In the case when either A or C is non-singular, the previous theorem reduces to the following.
Corollary 7.7.3
Let AECPXr,
C
non-singular, (Ind(C) = 0), then lnd(M) = Ind(A). Likewise, if A is nonsingular, then Ind(M) = Ind(C).
The case in which Ind(M) 1 is of particular interest and will find applications in the next chapter. The next theorem characterizes these matrices. Theorem 7.7.3
If
and M
then Ind(M)
1
=
ifand only if each of the following conditions is true: Ind(A) 1, Ind(C) 1,
(9)
and
(I—AA')B(I—CC')=O.
(10)
Furthermore, when M' exists, it is given by
c'
Lo
(11)
Proof Suppose first that Ind(M) 1. Then from Theorem 2, it follows that Ind(A) 1, Ind(C) I and (9) holds. Since Ind(M) 1, we know that MD = M' is a (1)-inverse for M. Also, Theorem 5.1 guarantees that M' is
a polynomial in M so that M' must be an upper block triangular (1)-inverse for M. Theorem 6.3.8 now implies that (10) must hold. Conversely, suppose that (9) and (10) hold. Then (9) implies that AD = A' and CD = C'. Since A' and C' are (1)-inverses for A and C, (10) along with Theorem 6.3.8 implies that there exists an upper block triangular (1)-inverse for M. Theorem 6.3.9 implies that rank(M) = rank(A) + rank(C). Similarly, rank(M2) = rank(A2) + rank(C2) = rank(A) + rank(C) = rank(M)
so that Ind(M) 1. The explicit form forM' given in (11) is a direct consequence of (2).
U
Corollary 7.7.4
If Ind(A) N(C) c N(B), then Ind(M)
1, 1
Ind(C) 1, and either R(B)
R(A) or
where M is as in Theorem 3.
Proof R(B) c R(A) implies that (I — AA' )B =0 and N(C) c N(B)
impliesthatB(I—CC')=O. •
It is possible to generalize Theorem 3 to block triangular matrices of a
general index.
144 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Theorem 7.7.4
Let
both be singular
non-singular, then Corollary 3 applies) and let M
A"'
positive integer p, let S(p) =
either is
For each
=[
with the convention that 00
= I.
i— 0
Then, Ind(M)
m if and only
each of the following conditions are true:
Ind(A)m, Ind(C)m,and
(12) (13)
(I — AAD)S(m)(I
—
CCD) =0.
(14)
Proof Notice that for positive integers p.
M"
(15)
= [s—— Assume first that Ind(M) m. Then Ind(MM) = 1. From Theorem 3 and the singularity of A, C we can conclude that Ind(AM) = 1, Ind(C'") = 1, and (I — Am(Am)D)S(m)(I
(16)
=0.
—
(17)
Then (12), (13) hold by (16). Clearly, (17) reduces to (14). Conversely, suppose (12)—(14) hold. Then (16) and (17) hold. Theorem 3 now implies
that Ind(Mm) = 1. Therefore, Ind(M) 1. • SeCNXS, such that rank(R) = n and Lemma 7.7.1 For rank(S) = n, it is true that rank(RTS) = rank(T).
Proof Note that RtR = and = Thus rank(RTS) rank(T) rank(RtRTSSt) rank(RTS). We now consider the Drazin inverse of a non-triangular partitioned matrix.
•
Theorem 7.7.5
Let
and M
If rànk(M) = rank(A)= r,
= then Ind(M) = Ind[A(I — QP)] +1= Ind[(I — QP)A] +1, where
P=CA' andQ=A'B.
Proof From Lemma 3.3.3, we have that D = CA 'B so that M=
Q]. Thus, for every positive integer i;
+
QJ
+ QP)AJ'1[I
Q]. (18)
THE DRAZIN INVERSE
Since
[i,]
A has full column rank and [I
Q]
145
has full row rank, we can
')= rank([A(I + QP)]"). Therefore, rank([A(I + QP)]")= rank([A(I + QP)] 1) if and only if ')= rank(M). Hence Ind([A(I + QP)]) + 1 = Ind(M). In a similar manner, one can show that Ind[(I + QP)A] + 1 = Ind(M). • conclude from Lemma 1 that
An immediate corollary is as follows.
Corollary 7.7.5 For the situation of Theorem 5, Ind(M) = 1 and only tf (I + QP) is non-singular. The results we are developing now are not only useful in computing the index but also in computing AD. Theorem 7.7.6 rank(A) = r, then
1]A[(sA)2riI!A_IB]
=
(19)
Proof Let R denote the matrix R
1][
=
A 'B],
and let m = Ind(M). By using (18), we obtain l[(44S)2]DA[I
'R
By Theorem 5 we know Ind(AS) = m —
1
A 'B].
so that
'[(4%S)2J° =
(ASr-'. Thus, it now follows that
'A[I
'R
A 1B] = MM.
The facts that MR = RM and RMR are easily verified and we have that R satisfies the algebraic definition for MD. The second equality of
(19)is similarly verified. U The case when Ind(M) = 1 is of particular interest.
Theorem 7.7.7
Let
If rank(M)=
and M
=
S'
exists where S = I + rank(A) = r, then Ind(M) = 1 if and only A 1BCA '.When Ind(M) =1, M' is given by
M'
146 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
+BC)'A(A2 + BC)'[A!BI. This result follows directly from Corollary 5 and Theorem 6. Before showing how some of these results concerning partitioned matrices can be used to obtain the Drazin inverse of a general matrix we need one more Lemma. Consider a matrix of the form M
Lemma 7.7.2 AeCPXF.
] where
=
Ind(A) Ind(M) Ind(A)+ 1.
(20)
Furthermore, suppose rank(M) = r. Then Ind(M) = non-singular.
1
and only
A is
Proof (20) follows from Theorem 2. To prove the second statement of the lemma, first note that md (A) =0 implies md (M) = md (0) = 1 by Corollary 3. Conversely, if rank (M) = r and md (M) = 1, then r = rank(M2) or equivalently, r = rank ([A2, AB]) rank (A) r. Thus rank (A) = r so
that A-' exists. • The next theorem can be used to compute the group inverse in case
Ind(M)=' 1. Theorem 7.7.8
Let
where R is a non-singular matrix
suchthatRM_[
]whereUeC". Then MD — R
LO
—
If Ind(M) =1, then M' =
0
R
R
Proof
Since Ind(M) = Ind(RMR '), we know from Lemma 2 that Ind(M) = 1 if and only if U' exists. The desired result now follows from
Corollary 2. • Example 7.7.1
1='Ii
12 Li
2
Let 1
4
2
0
0
We shall calculate M' using Theorem 7. Row reduce [M I] to
R]
THE DRAZIN INVERSE
147
where EM is in row echelon form. (R is not unique.) Then,
Ii 0 01 EM=I0
0
1
ii
0 0
1
LO 0 0]
[—2
[1
—fl, 0]
1
2
0
andRM=E%I.NowR1=12 4
1
[1
0
0
120 [KV sothatRMR'=
00:0 00
Clearly, U' exists. This imphes that Ind(M) = I and rig—' ' ii—2v1
[0
—
0
J
R—
—8
4
2
—10 —5
21
From Lemma 6.1, we know that lip Ind(M), then Ind(M") =
1. Thus, = the above method could be applied to to obtain For a general matrix M with p Ind(M), MD is then given by MD = = Another way Theorem 7 can be used to obtain M of index greater than 1 is described below. Suppose p Ind(M). Then
Ind((RMR ')')= Ind(RM"R ')= Ind(M")= 1. Thus if RM
then RMR'
S
=
T
0]and(RMR'Y'=
].
It follows from (20) that Ind(SP) 1. Therefore one can use Theorem 7 to (This is an advantage because
fmd MD
=R
=
r(SP).sP—
'L
is a smaller size matrix.) Then, 1
1
(SP)#SP.. 2T1
]R.
Finally, note that the singular value decomposition can be used in conjunction with Theorem 7. (See Chapter 12).
8.
Other properties
It is possible to express the Drazin inverse in terms of any (1)-inverse. Theorem 7.8.1
If
is such that Ind(A) = k, then for each integer
148 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
I k, and AD =
any (1)-inverse of l)tAI
X= AD
01
= P[0
rc-2:-i
AD
=
In particular,
',P and C non-singular and N nilpotent.
Proof Let A = Then
;
'.If X is a (1)-inverse, then it is easy to see
1
where X1, X2, X3 are arbitrary. That
L"i
= A'XA' is easily verified by multiplying the block matrices together. U In Theorem 1.5, the Moore—Penrose inverse was expressed by using the full rank factorization of a matrix. The Drazin inverse can also be obtained by using the full rank factorization.
Theorem 7.8.2 Suppose A e and perform a sequence offull rank factorizations: A = B,C1,C1B, = B2C2,C2B2 = B3C3,... so that isafull rank factorization of C,_ ,B1_ 1 ,for i = 2,3 Eventually, there will be a pair offactors, Bk and Ck, such that either (CkBk) 1 exists or CkBk =0. If k denotes the first integer for which this occurs, then
fk when (CkBk)' exist 'lk+l whenCkBk=O.
I nd(A'
—
When CkBk is non-singular, rank(Ak) = number of columns of Bk = number of rows of
and
R(Ak)= R(B, B2...
Bk), N(Ak)= N(CkCk...l ... C1).
Moreover, AD
— 5B1 B2 ... 0
C1
when
exists
when CkBk=O.
Proof IfC1B1isp xpand has rankq
=B1 B2.. Bk(CkBk)C*Ck_l is p x r and Ck is r x p, then
rank(BkCk) = r. Since CkB, is r x r and non-singular, it follows that so that Lemma 7.1 guarantees that rank(CkBk) = r = Since k is the smallest ')= rank(CkBk) = rank(BkCk) =
THE DRAZIN INVERSE
149
integer for which this holds, it must be the case that Ind(A) = k. The fact that rank(Ak) = number of columns of Bk = number of rows of Ck is clear.
By using the fact that the B's and C's are full rank factors, it is not difficult to see that R(Ak) = R(B1 B2 Bk) and N(Ak) = N(CkC&_I C1). If =0, then it is clear that A must be nilpotent of index k + 1. To prove the formula given for AD is valid, one simply verifies that the three conditions of the algebraic definition are satisfied. This is straightforward
and is left as an exercise. • There are several methods for performing full rank factorizat ions. One is the elimination scheme described in Algorithm 1.2. The others depend on orthogonalization techniques such as the modified Gram—Schmidt algorithm. Needless to say, the method chosen to perform the factorizations at each step can influence the final result.
Corollary 7.8.1
If
'B(CB)2C = 0, then AD = = BC is afull rank factorization. A
'where
Corollary 7.8.2 If is such that Ind(A) = A = B(CB) - 2C. rank factorization for A, then Theorem 7.8.3
If
and A = BC is afull
is such that rank(A) = 1, then
[Tr(A)12A when Tr(A)
Proof If rank(A) =
1,
0 and AD =0 when Tr(A) =
= A# =
0.
then A can be written as A = cd* where and Tr(cd*) = Tr(d*c) cd*cd* deCo. Now, Tr(A) = = Tr(A)A. = dc. Thus A2 = If Tr(A) 0, then R(A2) = R(A) so that Ind(A) 1. The fact that 1,
A# = [Tr(A)]2A can now be deduced from Corollary 9.2, or else one can
verify by direct computation the requirements of Definition 2.4. U In general, the reverse order law does not hold for the Drazin inverse. That is, (AB)D # BDAD. In the case of the Moore—Penrose inverse, we saw that very strong conditions had to be placed on A and B in order to guarantee that (AB)t = B'At. Even the commutativity of A and B is not strong enough to guarantee that (AB)t = BtAt. However, commutativity of A and B is enough to guarantee that (AB)D = BDAD.
Theorem 7.8.4
If A,
are such that AB = BA, then
= BDAD = ADBD,
(i)
(ii) ADB = BAD and ABD = BDA.
In general,
= A [(BA)2]DB (iii) even tfAB#BA. Proof
Assume first that AB = BA. It follows from Theorem 6.1 that
150 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
and (ii) now are is a polynomial in easily proven. Assume now that A and B do not necessarily commute. To prove (iii), let Y = A[(BA)2]DB. Clearly YABY = Y and ABY = A
YAB = A(BA)DB. Let k =
Ind(BA)}. Then (ABr2Y =
(ABr2A(BA)2"B = (ABr 1ABA(BA)2°B = (ABr 1A(BA)DB = 1• A(BA)k+ l(BA)DB = A(BA)kB = Therefore, by Theorem 2.2
Y=(AB)D. •
Corollary 7.8.3 Let A, be such that AB = BA. Then Ind(AB) max Ind(B) }. Given a solution to just one of the three defining conditions, Ak+ 'X = Ak, from it. one can construct
Theorem 7.8.5
Let be a matrix such that AD for somel Ind(A)= k. Then
Proof If
Let
and
'B = A', then
B=P[B 1
Thus
=
=
:1
= AD.
8
Applications of the Drazin inverse to the theory of finite Markov chains
1.
Introduction and terminology
Let {X, : teF
R} be an indexed set of random variables. If P is a
probability measure such that
whenever t1
152 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
An rn-state Markov chain is said to be ergodic lithe transition matrix of the chain is irreducible, or equivalently, its states form a single ergodic set. An ergodic chain is said to be regular if the transition matrix T has the 0. (For property that there exists a positive integer p such that
XeRhx,X >0 means each entry of X is positive.)
If an ergodic chain is not regular, then it is said to be cyclic. It can be shown that if an ergodic chain is cyclic, then each state can only be entered at periodic intervals. A state is said to be absorbing if once it is entered, it can never be left. A chain is said to be an absorbing chain if it has at least one absorbing state and from every state it is possible to reach an absorbing state (but not necessarily in one step). The theory of finite Markov chains provides one of the most beautiful and elegant applications of the theory of matrices. The classical theory of Markov chains did not include concepts relating to generalized inversion of matrices. In this chapter it will be demonstrated how the theory of generalized inverses can be used to unify the theory of finite Markov chains. It is the Drazin inverse rather than any of the (i,j, k)-inverses which must be used. Some types of (1)-inverses, including the Moore—Penrose inverse, can be 'forced' into the theory because of their equation solving abilities. However, they lead to cumbersome expressions which do little to enhance or unify the theory and provide no practical or computational advantage. Throughout this chapter it is assumed that the reader is familiar with the classical theory as it is presented in the text by Kemeny and Snell [46]. All matrices used in this chapter are assumed to have only real entries so that (.)* should be taken to mean transpose.
2.
Introduction of the Drazin inverse into the theory of finite Markov chains.
For an rn-state chain whose transition matrix is T, we will be primarily concerned with the matrix A = I — T. Virtually everything that one wants to know about a chain can be extracted from A and its Drazin inverse. One of the most important reasons for the usefulness of the Drazin inverse is the fact that Ind(A) = I for every transition matrix T so that the Drazin inverse is also the group inverse. This fact is obtainable from the classical theory of elementary divisors. However, we will present a different proof utilizing the theory of generalized inverses. After the theorem is proven, we will use the notation A in place of in order to emphasize the fact that we are dealing with the group inverse.
Theorem 8.2.1
If TE W" is any transition matrix (i.e. T is a stochastic matrix) and if A = I — T, then Ind(A) = 1 (i.e. exists).
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 153
Proof The proof is in two parts. Part I is for the case when T is irreducible. Part H is for the case when T is reducible. (I) If T is a stochastic matrix and j is a vector of l's, then Tj = j so that = 1,(see page 211), it follows that p(T)= 1. 1EO(T). Since p(T) lIT If T is irreducible, then the Perron — Frobenius Theorem implies that the eigenvalue 1 has algebraic multiplicity equal to one. Thus, Oec(A) with algebraic multiciplicity equal to one. Therefore, Ind(A) = 1, which is exists by Theorem 7.2.4. N equivalent to saying that Before proving Part II of this theorem, we need the following fact. Lemma 8.2.1 If B 0 is irreducible and M 0 Is a non-zero matrix that B + M = S is a transition matrix, then p(B) < 1.
such
Proof Suppose the proposition is false. Then p(B) 1. However, since 1. Thus, p(B) S is stochastic and M 0, it follows that B B 1. Therefore, it must be the case that 1 = p(B) = p(B*). The Perron—Frobenius Theorem implies that there exists a positive eigenvector. v >0, corresponding to the eigenvalue 1 for B. Thus, v = Bv = = j*S*v — j*M*v = (S* — M)v. By using the fact that Sj = j, we obtain j*i, — Therefore, j*M*v =0. However, this is impossible because
•
We are now in a position to give the second part of the proof of Theorem 1. (II) Assume now that the transition matrix is reducible. By a suitable permutation of the states, we can write
T
ix
(—
indicates equality after a suitable permutation
LO ZJ
hasbeenperformed)
where X and Z are square. If either X or Z is reducible, we can perform another permutation so that
ruvw TJ0 CD. Lo
0
E
If either U, C or E, is reducible, then another permutation is performed. Continuing in this manner, we eventually get B11
0
B22
o
O..:B,,,,
is irreducible. If one or more rows of blocks are all zero except where =0 for for the diagonal block (i.e. if there are subscripts I such that
154 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
each
k # I), then perform one last permutation and write T11
T12
O
T23
...
T1, T2,
T1,+2 T2,..2
T1,+1
Tir+i
...
...
T 0
0
o
Each T11 (1= 1,2,... ,n) is irreducible. From Part I of this proof, we know that (I — T11)' exists for every i. However, for i = 1,2, ... ,r, there is at least <1 #0. It follows from Lemma 2.1 that one index k # I such that exists for i= 1,2,...,r. Wecan now for 1= 1,2,...r. Therefore,(I
conclude that there exists a permutation matrix P such that A can be written as A— L
22J
I
where G11 is non-singular and
exists. It now follows from
Theorem 7.7.3 that A must exist and —
I
L
Thus,
I
22
the proof is complete. •
Notice that for every transition matrix T, it is always the case that
j€N(A) = N(A) so that A'j =0. Furthermore, it is always the case that =j. We will frequently use these observations together with the following well-known Lemma, which we state without proof. (I —
Lemma 8.2.2 (I) Every transition matrix T is similar to a matrix of the form where lØo(K).
(II) If T is the transition matrix of an ergodic chain, then k = 1, i.e. T is similar to a matrix of the form
(III) If T is the transition matrix of a regular chain, then lim K =0. U-. We are now in a position to relate the single expression I — AA to
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 155
various types of limiting processes which are frequently encountered in the
theory of finite Markov chains.
Theorem 8.2.2
Let 1 be the transition matrix of an rn-state chain and let A = I — T. Then
hm
for every transition matrix.
lim (xI + (1 — x)Tr
for every transition matrix T and 0< a < 1.
J—AA' = tim Jim
r r
for every regular chain. for every absorbing chain.
Proof For every transition matrix T, we know from Lemma 2 that there exists a non-singular matrix S such that (2)
and I
Therefore, I — K is non-singular and
=s-'[:_.!
and
J_AA#
(3)
Assume first that T is the transition matrix of a regular chain. Then from =0. It is now clear that Lemma 2 we know that k = 1 and Jim I,—
Next, consider I = oci + (1 — x)T, 0<
a<
1.
It is clear that 1
is
a transition
matrix whose eigenvalues are = a + (1 — A€o(T). convex combination of 1 and A, it follows each X different from 1 is inside the unit circle (i.e. I I < I if # 1) so that by considering (2) it is clear that each X is a
= p(aI + (1 — a)K) < 1 and
StI.Jim t' \
/
.
1i
=lim I
LO
0
xl + (1 —
Assume next that T is the transition matrix of an absorbing chain with
156 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
exactly r absorbing states. Then there exists a permutation matrix P such
that
T=
p
and r =
For an absorbing chain, it is well-known that I + Q + Q2 + ... = (I — so that
Q)'
V= Since A has the form
Theorem 7.7.3 yields
A' —
(I—Q)'J
L—(I—QYTR:
Therefore, (4)
Finally, assume that T is any transition matrix and is written in the form (2). Since I — K is non-singular, we may write
I+K+K2+...
(5)
By using (2) and (3) together with (5), it is a simple matter to verify that
=
n
all n, it follows that urn
(I—r)A' n
+ I — AA'. Since hr
= 1 for
(I—r)A' =0, and hence n
a In the case of ergodic chains, the matrix I — AA' has a very special structure.
Theorem 8.2.3
If T is the transition matrix of an rn-state ergodic chain,
then each row of I — AA' is the same vector is the fixed probability (row) vector of T.
Proof Thisfollowsfromthefactthatlim a-.
= [w1,w2, ... ,w,,,J, which
T'=I—AA'. S
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 157
Throughout the rest of this chapter we will let W denote the limiting matrix. I — AA 3.
Regular chains
Theorem 8.3.1
A'=lim
•For every transition matrix T,
n—i
n—i
k
—k
If T is the transition matrix of a regular chain, then the expression reduces
toA'= Proof Write T as in (I) of Lemma 2.2 so that n-I k
E"
S.
k—O
Since I — K is non-singular,
nffl_k&i _(I-.-K)_IKP.EKk1 n
Lnk_O
]•
From the first part of Theorem 2.2, it follows that Inn
I+K+... ÷Ki =0. 1*
Therefore,lim
n-i —k
and hence
urn I
(
)
The second equality in the first part of the theorem follows because I — W = AA Assume now that T is the transition matrix of a regular chain. Write T as in Part (II) of Lemma 2.2. From Part (III) of that Lemma,
= (I — K)
we know that urn K" =0 so that
1•
Therefore,
k=O
=
= A.
The second part of the proof of Theorem 2.3 provides an interpretation for each of the entries of A' for a regular chain.
158 GENERALiZED INVERSES OF LINEAR TRANSFORMATIONS
Theorem 8.3.2
Let I be the transition matrix of a regular chain and let denote the matrix whose (i,j)th entry is the expected number of times the in the first n stages (the initial plus n — 1 stages) when chain is in state Then A' = lim (Ne" — nW). the chain was initially in
Proof The result follows by combining the fact that 1=0
together with Lemma 3.1 since
—
nW) = tim
—
( Ii=0
w))= A'. U
A thus has the following meaning for a when regular chain. For large n, the expected number of times in state where w. is the differs from nw1 by approximately initially in state jth component of the fixed probability vector In loose terms, one + n(I — AA') for large n. Furthermore, for large n, could write since one can compare two starting states in terms of the elements of = — tim — The (i,j)th entry,
For an initial probability vector the jth component, of p*N(n) gives the expected number of times in state .Y in n stages. The following corollary provides a comparison of and for two different initial probability vectors. Corollary 8.3.1
Let T be the transition matrix of a regular chain, let be an initial probability vector, and let w* be the fixed probability vector for T. Then tim — nw) = and for two initial vectors and
=
Tr(A') =
tim
Furthermore,
—
= tim
—
—
paN(n)j),
for every
It
initial probability vector
Proof The first two limits are immediate. The third limit follows fiom the second and the fact that A'j =0 since — 1
=
— k
-.
—
p*A# e1
It
U 4.
Ergodic chains
In this section we will extend our attention to investigate ergodic chains in generaL It will be shown that the matrix A' is the fundamental quantity in the theory of ergodic chains. Virtually everything that one would want to know about an ergodic chain can be determined by computing A'. We will begin by investigating the mean first passage times (i.e. the
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 159
expected number of steps it takes to go from state
to
for the
first time.)
Let M denote the matrix whose (i,j)th entry is the expected number of steps before entering state for the first time after the initial state .9's. M is called the mean first passage matrix. For a square matrix X, the diagonal matrix obtained by setting all off diagonal entries of X equal to zero is denoted by Xd. J,, will denote the matrix of all l's. If the size of J,, is understood from the context, then the subscript m will be omitted.
If T is the transition matrix of an rn-state ergodic chain
Theorem 8.4.1
whose fixed probability vector is passage matrix is given by
= [w1 , w2, ... , w,,,], then the mean first (1)
where D is the diagonal matrix 0
0... 0 0...
0
=[(I—A.A)d]'.
0 WI,,
Proof It is known that the mean first passage matrix is the unique solution of the matrix equation
AX=J—TXd.
(2)
We simply verify that the right hand side of (1) satisfies the equation (2). Let R denote the right hand side of (1) and observe that = D. Now,
AR=A(I—A'
J—TD=J—TR. U Corollary 8.4.1
Consider an ergodic chain whose fixed probability rector is = [w1 , w2, ... , wJ. If the chain starts in state .9'., then the expected number of steps taken before returning to state S" for the first time is given by M11 = 1/(1
—
For an initial probability vector the kth component, (p*M)k, of p*M is the expected number of steps before entering state br the first time. Consider the case of a regular chain that has gone through n steps before it is observed. If n is sufficiently large, the initial probability vector may be taken to be p*(O) = In this case, p*(t) = When this t =0, 1,2 situation occurs, we say that the chain is observed in equilibrium. The next theorem relates the diagonal elements of A' to the expected number of steps taken before entering state when the initial distribution is p*(0) =
160 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
If T is the transition matrix of an ergodic chain and is the fixed probability vector of T, then
Theorem 8.4.2
(w*M)k= 1
A' W&
Use this Proof From Theorem 4.1, we have M = D — A'D + w*D + w*JAI D= together with the fact that wA' =0 to obtain w*M =
A'
= 1 + (j*AID)k = 1 + —a.
j(I + Ad'D). Therefore,
U
provides a comparison of the expected number of steps before entering state '9'k for two different This expression is given below in terms initial distributions and The kth component of the vector
—
of A'. Theorem 8.4.3 vectors
For an ergodic chain and for two initial probability
and
—
=
—
Proof This follows from Theorem 4.1 since
—
A#)D.
=
=
•
We now address ourselves to the problem of obtaining the variances
of the first passage times. In Theorem 4.1, we saw how the matrix produces the expected first passage times. The following theorem shows that also produces the variances of the first passage times. For an ergodic chain, let V denote the matrix whose (i,j)-entry is the variance of the number of steps required to reach state for the first time after the initial state 5'.. For Xe W" X m let X denote the matrix whose entries are = the squares of the entries of X, i.e. The matrix of variances of the first passage times is given by V = B — where M is the mean first
Theorem 8.4.4.
passage matrix and
B=
+ I) + 2(A'M — J[A'MId).
(3)
Proof If V = B — M5, then it is well known that B must be the unique solution of AB = J — + 2T(M — Me). From Theorem 4.1, we know that M — Md = M — D = — (A' — )D. Therefore B must be the unique solution of AB = J — TBd — 2T(A' — JA7)D. Let R denote the right hand side of (3) and observe that = Rd O,AM = J — TD,AA'M = 2DArD + D. Now, use the facts that AJ = — A'TD, and TJ = J to obtain AR =(J — + I) — 2A'TD = J— — 2T(A' — )D. Therefore, (3) must be true. S 5.
Calculation of A* and
for an ergodic chain
practical methods for calculating the group inverse of a general square matrix G (provided of course that C, exists) were given in Some
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS
161
Chapter 7. However, for the present situation we can take advantage of
the special structure of A = I — T and devise an efficient algorithm by
which to compute A'. Theorem 8.5.1
Let T be the transition matrix of an rn-state ergodic chain and let A = I — T. Every principal submatrix of A of size k x k, k = 1,2,3, . - , m — 1, is non-singular. Furthermore, the inverse of each principal submatrix of A is a non-negative matrix.
Proof Since T is the transition matrix of an ergodic chain, T is irreducible and p(T)= 1. Let A be any k x k submatrix of A, where 1 k m — 1, so that A = I —1 where T is a k x k principal submatrix of T. It is well known that p(T) < p(T) = 1. Thus, A must be non-singular. For every matrix B 0, it is known that p(B) < 1 ii and only if(l — B)' exists and 0 since T 0. • (I — B)1 0. Therefore, we can conclude that (A)' As a direct consequence of this theorem, we obtain the following result. Corollary 8.5.1 If T is the transition matrix of an rn-state ergodic chain and we write A as
l),cERm_
where UeR
U' O.
then U' exists and
and
Using this result, it is now possible to give a useful formula for obtaining
A'. Theorem 8.5.2 Ld* where U E
For an rn-state ergodic chain, write A as
I
-1)
notation: h*
and e
CE or ',d
1— h*j,and F= U1
—
ö>O,fl> 1,andA'
is given by
FjhF
-5
U-'-tp
Adopt the following
I
p
Then
Fj
/32
Proof To show S >0 and /3> 1, observe that d* 0. From Corollary 5.1, it follows that U1 0 and hence h* 0. However, h* is not the zero vector. Otherwise, if h =0 then d* =0, because U is non-singular, and this would imply that T is reducible, which is impossible. It now follows that 5>0 and /3> 1. To show the validity of(1), first note that (I —
162
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=I+jh*/fl.LetH=(I_jh*)IU*(t_jh*)isothat Uijh* jh*UI ôjh*
H=U'+
—--p--.
(2)
+ By a direct calculation, using (2). it is easy to see that Fjh*F U
U1 + h*F
= H. Likewise, use (2) to show that
—
Fj
=
= Hj, and
ö
= — h*Hj. Therefore, the matrix on the right
hand side of (1) can be written as
[H
—Hj
I
(3
_h*Hj
Lh*H
Since rank (A) = rank (U) = m — I.
A_lU
Uk h*Uk
Lh*u
From Theorem 7.7.6. A' is given by
where'k =
A'
A can be written as
—
(4)
h*Hk]
Lh*H
the row sums of A are all zero, it follows that Uj + c =0 and hencej = — U1c = — k so that the matrices in (3) and (4) are equal. • For an ergodic chain, if one desires to compute the fixed probability vector one does not need to know the entire matrix A'. As demonstrated in Theorem 2.3, knowledge of any single row of is sufficient. Theorem 5.2 provides an easily obtainable row, namely the last one, and thereby provides one with a relatively simple way of computing Because
Theorem 8.5.3
If T is the transition rnatrix of an rn-state ergodic chain and A = I — T is partitioned as
A_lU
:
[d*
cl_lu
where U E p(rn -1)
-
:
—Uj
_d*j
then the fixed probability vector of T is given by
1_h*j= 1 _d*Ulj. Proof From Theorem 2.3, w* = e From (1), rZ =
M
MPL
so that
1
w*=e*_r*A_ff_h*
—
i
11
S
•
rA, where r is the last row of
=
hi]. Therefore
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS
163
From a computational standpoint it is important to point out that when
Theorem 5.3 is used to compute w, it is not necessary to explicitly calculate U— just as it is not necessary to explicitly calculate C- in order to solve a non-singular system of linear equations Cx = b. Indeed, one may consider the vector h to be the solution of the non-singular system U*x = d and proceed with the solution of the system by conventional methods.
Corollary 8.5.2
For an erqodic chain.
I—AA'
=
Proof The first result follows since W = I
—
AAZ the second follows
from the first. • pointed out above, for an ergodic chain it is not necessary to explicitly compute A4, or even U- 1, in order to compute the fixed is readily probability vector But if one knows A4 or U', then available. However, it is just the reverse that is often encountered in applications. That is. by theoretical considerations or perhaps by previous experience or experimentation, one knows what the fixed probability vector, w, or the limiting matrix W, has to be. In order to obtain some of the information about the chain which was discussed in the previous section, such as the mean first passage matrix, the matrix A needs to be computed. It seems reasonable to try to use the already known information about or W in order to obtain A4 rather than starting from scratch and using only the knowledge of A and formula (1). The next theorem shows how this can be done. Before stating that result, we give a very simple example of a situation where the fixed probability vector is known beforehand. As
Example 8.5.1 Consider an rn-state ergodic chain which is 'symmetric'. That is, the one-step probabilities satisfy = for each i,j = 1,2, ... ,m. This implies that the transition matrix T is a doubly stochastic matrix. In particular, j*T = The vector > 0 is a fixed vector for T but it is not a probability vector. However this can easily be fixed by multiplying by Now,
= (!j*)T,
>0, and
=
1.
Since the fixed probability vector is unique, it must be the case that w*
=
Thus a symmetric ergodic chain is an example (which occurs
164 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
frequently) of a situation where
is known beforehand. when or W is Let us now return to the problem of finding already known. If W is known, then AA' is known since AA' = I — W. The matrix AA' can be used to obtain A' as follows.
Theorem 8.5.4 For an rn-state ergodic chain with transition matrix T, write A = I — T as
[d where UE W"'
I
1)
("
The matrix
is given by
A' Proof
(5)
If A is any (1)-inverse for A then AA'AAA' = AAAAA# =
A'AA' = A'. But X
=
is a (1)-inverse for A since
U Frequently, one may wish to check a computed inverse. If the matrix under question is non-singular, one can compare the products of the original matrix by the computed inverse with the identity matrix. However, if the matrix A has index 1 and a computed is checked by
comparing the product AA'A with A, AA' with A'A, and A'AA' with A', then one has probably done more work doing the 'check' than in computing the original quantity. Since the number of arithmetic operations necessary to form the indicated matrix products is relatively large, it is possible that factors such as roundoff error can render the 'check' almost useless and leave the investigator totally unsure about the quantity he has computed. The next theorem provides an alternate means by which one can 'check' a computed A' for an ergodic chain. For an ergodic chain, the fixed probability vector (or equivalently, the limiting matrix W) is either known from theoretical considerations, or else can usually be computed without much difficulty by simply proceeding as suggested in Theorem 5.3. Furthermore, iterative improvement techniques work well for producing very accurate numerical solutions for For a general chain, one can use the second part of Theorem 2.2 to obtain W. Once one has confidently obtained w (or W), then a computed A' can be checked by using the following result. For any chain with limiting matrix W, A' can be characterized as the unique solution X of the two equations WX =0 and
Theorem 8.5.5
AX=I—W. The proof is by direct substitution. lithe chain is ergodic, then WX =0 can be replaced by W*X = 0*.
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 165
This theorem provides a method by which one can have some confidence
in a computed value for A# since if X is a matrix such that WX is 'close' to 0 and AX is 'close' to I — W, then X must be 'close' to A More precisely, we have the following. Theorem 8.5.6 For any chain, if is a sequence of matrices such that —, 0 and AX, —, I — W., then X, —, A #
Proof
—
X, = AA*(A*
—
Xe)— WX, =
As with Theorem 5, if the chain is ergodic, then
—
X,)] — WX—'O
—' 0
can be replaced
by 6.
Non-ergodic chains and absorbing chains
In this section we will show that A and I — AA can be useful in extracting information from non-ergodic chains. We first consider the problem of classifying the states as being either ergodic or transient. For chains with large numbers of states the problem of classifying the states,
or equivalently, putting the transition matrix in a 'canonical' block triangular form is non-trivial. The following theorem shows how the projection I — AA can be used to classify the states. (As before A = I where T is the transition matrix.)
—
T
For a general chain, state 6". is a transient state (land only (I the ith column of I — AA is entirely zero. Equivalently, .9'. is an
Theorem 8.6.1
ergodic state (land only if the ith column contains at least one non-zero entry.
Proof Perform the necessary permutations so that T has the form (1) of Section 2. Then all transient states are listed before any ergodic states, with the partition as indicated in (1) of Section 2. As argued in Theorem 2.1, A has the form A
=
where G31 is non-singular, Ind(G22) =
1,
and P is a permutation matrix.
By using Theorem 7.7.3, 1 — AA * is seen to have the form
I—AA' Furthermore, every column of I entry since
1I—T,+1.,+1 G22 =
L0
0
—
G22Gr2 contains at least one non-zero
166 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
and each T,1 (i> r) is the transition matrix of an ergodic chain. Hence = I — (I — T11)(I — Tx)> 0 because it is the limiting matrix of the I—
chain associated with T11 and it is well known that the limiting matrix of
an ergodic chain is strictly positive. • I — AA' can provide a distinction between the transient states and the ergodic states and it can completely solve the problem of determining the ergodic sets. Theorem 8.6.2 same ergodic set,
and 9'k belong to the For a general chain, states the ith and kth rows of I — AA are equal.
Proof Write the transition matrix in the form (1) of Section 2 so that (1) holds. The desired result follows from 0
[
0
because each of the rows of I — A11A! are identical (i> r) since each
represents the limiting matrix of an ergodic chain. U For a general chain with more than one ergodic set, the elements of I— can be used to obtain the probabilities of eventual absorption into any one particular ergodic set for each possible starting state. I—
Theorem 8.6.3
For a general chain, let [9'J denote the equivalence class denote the set of (ergodic set) determined by the ergodic state Let indices of those states which belong to [5",]. IfS', is a transient state, then P (eventual absorption into [1/k] I initially in .9',) =
— AA
Proof Permute the states so that T has the form (1) of Section 2. Replace each T.,, i> r, by an identity matrix and call the resulting matrix 1'. From Theorem 2.1 we know that < 1 for i = 1,2,... ,r so that urn '1" exists A-.
and therefore must be given by urn f
= I — AA' where A = I — f. This
modified chain is clearly an absorbing chain and the probability of eventual absorption into state when initially in 5", is given by
(urn
/1k
= (I — AA')Ik. From this, it should be clear that in the original
chain the probability of eventual absorption into the set [9',] is simply
the sum over
of the absorption probabilities. That is,
P (eventual absorption into [5",] I initially in .9'.) =
(I —
AA '),,. (3)
'ES."
We must now show that the
can be eliminated from (3). In order to do
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS
167
this, write A and A as
Ill
:oi
G
A—
'
I
E
Theorem 2.1 guarantees that
I-
II
ndA—
12
I
G22. is non-singular and Theorem 7.7.3 yields
=
When T is in the form (1) of Section 2, the set of indicies 91k Will be = {h,h + 1, h + 2,... h + r}. Partition I — AA' and sequential. I— as follows:
columns h, h +
I
o...0 0...0
1,...h+t
WIN
...
Wpq
row i is
in here W,fl
=
(4)
0...0
0".0 0...0 and
V
W1qWqq •..
i.r+ i
1w,+ i.,+ +
I—AM =
+
+
columns h, h + 1, ... h + 1
W,,qWqq
1
row
... W,,,W,,,
v-.lis }
...
...
(5)
W+ir+i
...
0
...
0
...
0
0
where W
=I—
the gth row of
In here
Suppose the ith row of I — AA lies along If P denotes the probability given in (3), then it
168 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
is
clear that P is given by
P = gth row sum of W,,,.
(6)
It is also evident that
= gth row sum of
(I —
(7)
lEfk
Since Wq is the limiting matrix of Tqq and Tqq is the transition matrix of an ergodic it follows that the rows of Wqq are identical and sum to 1. Therefore, the gth row sum of W,,qWq = gth row sum of W,,,, and the
desired result now follows by virtue ol(6) and (7). U Theorem 8.6.4
If
is a transient state and is an ergodic state in an ergodic class [6"j which is regular, then the limiting value for the nth step = (I — AA')Ik. transition probability from 50• to 60k is given by lim SI-.
Proof It is not hard to see that urn absorption
= P (eventual
where
=
A-.
into [9'J I initially in 6".) and
the component of the fixed probability vector associated with [5°,] corresponding to the state 6",. Suppose [9'J corresponds to Tqq when the transition matrix T is written in the form (1) of Section 2, and suppose the ith row of I — AA' lies along the gth row of the block in (5). The kth column of I — AA' must therefore lie along one of the columns of W say thefth one, so that = we can use (6) to obtain (I — AA = (gth row sum of WN) x is
W
W,Pk
U
=
itself contain important information about a general chain with more than one transient state. The elements of
Theorem 8.6.5
If 6". and 9', are transient states, then (A')1, is the expected number of times in .9', when initially in Furthermore 5°. and 6", belong to the same transient set (A')1, > 0 and (A' >0
Proof Permute the states so that T has the form (1) of Section 2 so that T=
and A =
[!j9.-! T2E]
where p(Q) < I. Notice that TSk = Qil because 6". and 6", are both
\
/1,—i
transient states. By using the fact that (
/a-I
T'
\
is the Q' ) = ( ) Ii* \s"o 1k expected number of times in 6°, in n steps when initially in 6",, it is easy to see that the expected number of times in .9', when initially in is
\:=o
lim
(a_i)
•
1=0
Theorem 8.6.6
For a general chain, let
denote the set of indices
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 169
corresponding to the transient states. If the chain is initially in the transient states then is the expected number of times the chain is in a k€f transient state. (Expected number of times to reach an ergodic set.)
Proof If a pennutation is performed so that the transition matrix has the form (1) of Section 2, then
<1.
T=
If Q is r x r, then if = { 1,2,... , r} and the previous theorem implies that (A')11 =
(I
k.f
is
—
the expected number of times the chain is in a
transient state when initially in
Theorem 8.6.7
IfS'1 and b"k are transient states, then
(A' [2A' — I] — A
= Variance of the number of times in
when
initially in 6". and
E ([2A'
—
IJA'
—
= Variance of the number of times the chain is in a transient state when initially in 6".
where f is the set of indices corresponding to the transient states and
are as described in Theorems 4.1 and 4.4.
and
The proof is left as an exercise.
As direct corollaries of the above theorems, we obtain as special (but extremely useful) cases the following results about absorbing chains.
If T
is the transition matrix for an absorbing chain, then the following statements are true.
Corollary 8.6.1
is an absorbing state, then (I — AA' is the probability of being when initially in b". absorbed into are non-absorbing states, then (A' is the expected (ii) If b"1 and when initially in number of times in (iii) If if is the set of indices corresponding to the non-absorbing states, then is the expected number of steps until absorption when initially (1)
If
in the non-absorbing state 5".. — I] — are non-absorbing states, then (A (iv) ff11'1 and is the variance of the nwnber of times in b"k when initially in 6's. (v) If if is the set of indices corresponding to the non-absorbing states, — I]A' — As')11 is the variance of the numbers of steps until then
absorption when initially in
In order to analyse a chain by utilizing the classical theory, it is always
170 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
to first permute or relabel the states so that the transition matrix assumes the canonical form (I) of Section 2. However, by analysing the chain using A and I — AA , the problem of first permuting the states may be completely avoided since all results involving A # or I — AA' are independent of how the states are ordered or labelled. In fact, the results of this section help to perform a classification of states rather than requiring that a classification previously exist. necessary
7.
References and further reading
Almost any good text on probability theory treats the subject of finite
Markov chains. However, not all authors use the tool of matrix theory for their development. In [34] and [66], the reader can find a good development of the subject in terms of matrix theory. In [46], the probabilistic approach is combined with the matrix theory approach. This text can provide the reader with all the needed background necessary to read this chapter. Only the case where the state space is finite has been considered in this chapter. The industrious student might see what he can do with the subject when the state space is countably infinite, in which case one is dealing with infinite matrices. A good place to start is by reading [47]. See also [17].
9
Applications of the Drazin inverse
1.
Introduction
The previous two chapters have developed the basic theory of the Drazin
inverse and the applications of a special case, the group inverse, to Markov chains. This chapter will develop the application of the Drazin inverse to singular differential and difference equations. We shall also discuss where these singular equations occur. 2.
Applications of the Drazin inverse to linear systems of differential equations
In this section, we will be concerned with systems of first order linear differential equations of the form Ax(t) + Bx(t) = f(i), x(t0) = ceCa where and x(t) and (kt) are vector valued functions of the real variable t, and f(t) is continuous in some interval containing t0. If A is non-singular, then the classical theory applies and one has the following situation.
(I) The general solution of the homogeneous equation, Ax(t) + Bx(t) =0, is given by x(t) = A (H) The homogeneous initial value problem, Ai(t) + Bx(t) =0, x(t0) = c, has the unique solution x(t) = e " ISQ_lo)C
(III) The general solution of the inhomogeneous equation Ai(t) + ' iaiJ e'"f(s)ds, Bx(t)= ftt), is given by x(t) = +A
e'
aeR,qeC. (IV) The inhomogeneous initial value problem, Ax(t) + Bx(t) =
x(t) = c, has the unique solution x(t)= e' 'Bft.to)c + A x f(s)ds.
e SO
172 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
In this section, we will examine what happens in each of these problems when A is a singular matrix. When A is a singular matrix, things can happen that are impossible when A exists. For example, the homogeneous initial value problem may be inconsistent, that is, there may not exist a solution. If there is a solution, it need not be unique. The following is a simple example that illustrates this fact.
Example 9.2.1
Let A
and B
=
=
?]• Then the initial
value problem Ax(t) + Bx(t) =0, x(0) = [1,11* clearly has no solution. 10
1
IIA= 10
0
01
10
1
0 IandB=10
0
[000]
11
Olandweimposetheinitialcondition
L000J
x(0) = [1, 1, = c, then it is not difficult to see that the initial value problem Ax(t) + Bx(t) =0, x(0) = c, has infinitely many solutions. Notice that in each of the above examples, we even have that AB = BA.
The situations illustrated above motivate the following definitions.
Definition 9.2.1 For and t0ER, the vector is said to be a consistent initial vector associated with t0 for the equation Ax(t) + Bx(t) = 1(t) when the initial value problem Ai(t) + Bx(t) = 1(t), x(t,) = C, possesses at least one solution.
Definition 9.2.2 The equation Ax(t) + Bx(t) = 1(t) is said to be tractable at the point t0 the initial value problem Ax(t) + Bx(t) = f(t), x(t0) = c has a unique solution for each consistent initial vector, C, associated with t0.
If the homogeneous equation Ax(t) + Bx(t) =0 is tractable at some point So we may simply say the equation t0e R, then it is tractable at every t is tractable. Our goals are as follows. (1) Characterize tractable homogeneous equations. (ii) Provide, in closed form, the general solution of every tractable homogeneous equation. (iii) Characterize the set of consistent initial vectors for tractable homogeneous equations. (iv) Provide, in closed form, a particular solution for the inhomogeneous equation when the homogeneous equation is tractable. (v) Characterize the set of consistent initial vectors associated with a point t0 for an inhomogeneous equation when the homogeneous equation is tractable. (vi) Provide, in closed form, the unique solution of Ai(t) + Bx(t) = 1(t), x(t0) = c where c is a consistent initial vector associated with t0 and the differential equation is tractable. The key to accomplishing (i)—(vi) is the following two results.
APPLICATIONS OF THE DRAZIN INVERSE
Theorem 9.2.1
X
For A, BE
173
the homogeneous differential equation
Ax(t)+Bx(t)=O is tractable exists.
(1)
there exists a scalar AEC such that (2A + B)-
and only
Lemma 9.2.1 Let A, (AA + B)1 exists, and let
X
l
such that
,• Suppose there exists a
(2)
Then
=
Theorem 1 shows that assuming (1A + B) is invertible is a natural assumption. Lemma 1 means that we can assume for proof purposes that A and B commute. We shall prove the Lemma first.
If there exists AEC such that (A.A + B)-' exists, then
Proof of Lemma 1. Proof of Theorem
•
that + B) exists. Let AA and beas defined in (2). Clearly Ax(t) + =0 is. Taking a similarity Bx(t) =0 is tractable if and only if AAX(t) + Suppose first that there exists AEC such
1
we may write A
Ic 0]
ALO NJ'
B
Il—Ac 0
0
1_IB,
01
_Ix,(t)
I—,N]LO
(3)
= I. Since C is invertible, Cx,(t) + (I — AC)x2(t) =0 is since ).Aa + tractable. Thus it suffices to show Nx2(t) + (J — AN)x2(t) =0 is tractable.
(4)
Let k = Ind(N) and multiply (12) by Then (I — 'x2(t) = 0. Hence 1x2(t)= 0. Multiply (12) by Then 'x2(t) + Nk_2x2(t) (I — = 0. Continuing in this manner = 0 so that we get that x2(t) =0 and Ni2(t) + (I — AN)x2(t) =0 is trivially tractable. Suppose now that Ax(t) + Bx(t) = 0 is tractable. We need to show that there is a AeC such that (AA + B) is invertible. Suppose that this is not true. Then (AA + B) is singular for all AEC. This means that for each
IEC, there is a vector
+ B)vA =0 and
(AA
Let
,
,
such that 0.
be a finite linearly dependent set of such vectors. Let
C be such that not all the
are 0. Then z(t) =
= 0, where
is not identically zero and is
174 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
easily seen to be a solution of (1). However,
= 0.
z(O)
= Thus, there are two different solutions of(1), namely z(t) and 0, which satisfy the initial condition x(O) =0. Therefore, (1) is not tractable at t =0. which contradicts our hypothesis. Hence, (AA + B)-' exists for some
2€C. •
The next theorem will be used to show that most of our later development
is independent of the scalar 2 which is used in the expression (AA + B)-'. Theorem 9.2.2 Suppose that A, BECn*hI are such that there exists a AEC so that (IA + B)-' exists. Let AA = (IA + B)- 'A, BA = (IA + B)- 'B, For all for which + B)' and and 1A = (IA + B)- for (pA + B)-' exist, the following statements are true.
=
(5)
=
and
= =
and
(6)
=
(7)
=
(8)
(9)
Proof To prove (5), write = [(GA + B)- IAIDA = [(GA + B)- '(pA + B)(4uA + B)- IA]DA = +
=
by Theorem 7.9.4. + + B)]A + B) B) 'A + B) '(IA +
=
=
The proof of (6) is similar and is left as an exercise. To prove (7), write A, = [(xA + B)- '(pA+ B)J(pA + B)- 'A = + = + Since and commute, it follows that for each positive integer m,
R(A') = Thus (7) follows. To prove (8), use the same technique used to prove (5) to
=
obtain
'1=
+ B)-
+ B)]f =
+ B).-
+ B)
The proof of(9) is similar. In view of the preceding theorem, we can now drop the subscript A and appear. whenever the terms R(AA), Ind(AA), We shall do so. Let us return to the proof of Theorem 1. Recall that the original system x
+ B)
APPLICATIONS OF THE DRAZIN INVERSE
was
175
equivalent to the pair of equations
Cx1(t)+B1x1(t)=O, CB1=B1C
(10)
Nx1(t) + B2x2(t) = 0,
(11)
NB2 = B2N,
B2 invertible, and the only solution of(l1) was x2(t) 0. But (10) is consistent for any x1(t0) and the unique solution is x1(t) = exp( — C x (t — t0))x1(t0). Thus we have proved the first part of the next Theorem. Suppose Ai(t) + Bx(t) =0 is tractable. Then the general
Theorem 9.2.3 solution is given by
x(t)=e
qECN.
(12)
isa consistent initial vector for the homogeneous equation if
A vector
and only ifc€R(At) = R(ADA). is k-times continuously differentiable around t0. Then Suppose that the non-homogeneous equation Ai(t) + Bx(t) = f(t) always possesses solutions and a particular solution is given by
x(t) = e
+ (I —
J:0e
—
(13)
Moreover, the expression (13) is independent of A. The general solution is given by
x(t) = e
—
— to)AADq
+e
—
J
e
ADf(S)dS
(14)
—
+ (I —
Let * = (I — AAD)E( —
Then w is independent of A.
1=0
A vector is a consistent initial vector associated with t0eR for the inhomogeneous equation if and only {* + R(Ak) }. Furthermore, the inhomogeneous equation is tractable at t0 and the unique solution of the initial value problem with X(tr,) = c, c a consistent initial vector associated with t0, is given by (14) with q = c.
Proof (14) will follow from (12) and (13). We have already shown (12). To see (13) let
x,(t) = ADe_
e
x2(t) = (I — AAD) 1—0
where we have taken ç
=0 for notational convenience. We shall show that
176 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Ax,(t)+
= AADf(t) and
(15)
Ax2(t) +
= (I — AAD)f(t)
(16)
To verily (15), note that Ax,(t) =
+
—
Ax2(t) = A(I
AADf(t) =
—
+ ADe
—
=
A°*te
+ AA"f(t),asdesired. We now verify(16).
—
lf(I+
—
= (I —
1—0
—
1)'
x
1—0
"(t) = (I —
= (I —
—
x
i—i
= (I — = —
(— 1)' i—i
x2(t) + (1— AAD)i(:) where the fact that (I —
—
I_i
( — 1)'
lAi(ED)14 1f(t) =
+ (I —
+
—
A, AD commute has been used freely.
Thus, x,(t) + x2(t) is a particular solution as desired. The characterization of the consistent initial vectors for the inhomogeneous equation follow directly from (14). That the solutions are independent of A follows from
Theorem 2. U An important special case is when B is invertible. Then we may take
f=B'f.
A=OandA=B'A,
The Drazin inverse can sometimes be useful even when A is invertible. If f(s) is a constant vector 1, the general solution of x(:) + Bx(t) = f(t) is given by
r
1
essdS]f.
(17)
a
If B' exists, then (17) is easily evaluated since feuds = B
+ G,
is more GE However, if B is singular, then the evaluation of difficult. The next result shows how to do it using the Drazin inverse.
Theorem 9.2.4
If
and Ind(B) = k, then
... Ge
X N
Proof Use the series expansion for
+ G]=
+(I —
Corollary 9.2.1
If
x(t) = BDf+ t(I — BBD)f_
to obtain
then
t3(I
... +
APPLICATIONS OF THE DRAZIN INVERSE (
flk- hZk(I
BBD)Bk_l
177
I is a solution of x(t) + Bx(t) = I. (Note, this is a
polynomial in t.)
Corollary 9.2.2
For each t. let C(t) denote the set of consistent initial vectors associated with tfor Ax(t) + Bx(t) = 1(t) where (AA + BY' exists for some A and 1(t) is k-times continuously Then d(C(t), C(t0)) 0 as t —, ç, where d(C(t), C(t0)) = sup inf X—y IJ
'i€C(10)
The proof is left as an exercise. We also note in passing, the next theorem. If(AA + B)' exists for some A, then
Theorem 9.2.5 AAD = lim 4—x
Proof
AD
= urn
it
2—0
-
Since
I = AA2 + B2 we obtain
AD_ = AAD +
The first limit A
is independent of A. The second limit follows from
follows since
S
Example 9.2.2 Consider the homogeneous differential equation Ax(t) + Bx(t) =0 where 2 0 —21 1 0 A=J—1 0 21,B=I—27 —22 —17 14 2] 10 L 2 3 L 18
1
1
1
Note that A and B are both singular and do not commute. Since A + B turns out to be invertible we multiply on the left by (A + B)-'to get Ax(s) + =0 where —5 5
iI—3
A=(A+B)'A=—I
6
—41
6
5
—2 —2
The eigenvalues of A are 0, 1,3 so that AD may be computed by Theorem 7.5.2 to be 1
1—27 —41
AD=_I
54
77
27L—27
—34
—28
46 —14
The consistency condition for initial conditions is thus (I — AAD)x(0) =
f
— 18
L
14
18
9
—
14 7
4]
—
101 Ix,(0)1 10 x2(0) I = 0. 5J Lx3(0)J
7]
178 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
There is only one independent equation involved, 9x1 (0) + 7x2(O) + 5x3(O) =0.
Since
—
(18)
= {O, 0,2/3), it is not difficult to compute the matrix
exponential as 1
x(t) = e
=
fl8 0 L 0
1 — e2t3
2(1 — e2113)1 rx1(o)
16(1 — e2"3)
26 — 8e2"3
13(e2"3 —
1)
26e2"3
—
8
J
Lx3(0)
Equation (18) can be used to eliminate one of the x1(0).
For many applications it is desirable to be able to solve Ax + Bx = I when A, B are rectangular. We shall develop two important special cases. For each case, the general solution will be given. Derivation of the appropriate set of consistent initial conditions for the non-homogeneous equations is left to the reader. Another generalization of the results of this section may be found in Exercises 1—6. We will first consider the case when (AA + B) is one-to-one for some A. Theorem 9.2.6
Suppose (Alt + B) is one-to-one. Then all solutions
Ai + Bx =0 are of the form x=e
of
where q€R(ADA)
and
=0 for m = 0, 1,... ,n.
[I — (AA + B)(AA +
Here A = (Alt + B)tA,
(19)
= (AA + B)tB.
=0. Proof If x is a solution of Ax + Bx =0, then x is a solution of Ax + = M and AA + = I. Hence x = A°*IADAq by Theorem 3. But BADA]e_ =0 for all t. Substituting back in gives [— + =0 for all t, or equivalently, Thus [— + BAADJe_ [BA —
=0 for m =0,1,2.... But = A(AA + B)tB = A(AA + B)t(AA + B)— A(AA + B)tAA = A — AA(AA + B)tA
=A_(AA+B)(AA+B)tA+B(AA+B)tA
=[I_(AA+B)(AA+B)t)A÷BA. U Corollary 9.2.3 If AA + B is one-to-one, and N(XA* + B*) = N(A*)r N(B*), then all solutions of Ax + Bx =0 are of the form x = e A°bIADAq where q is an arbitrary vector.
I
= N(XA* + B*) = N(A*) n N(B). But R(A) N(A) Proof R(AA + so that R(A) R(AA + B)-. Thus (19) holds for all qeR(AA"). U
APPLICATIONS OF THE DRAZIN INVERSE
Example 9.2.3
179
[?]. Then (AA + B) is one-to-one = [b], B = and N(AA + B) = N(A) N(B) = (O} for all 2. However, N(XA + B*) # N(B*) for all A. Ai + Bx =0 has only x =0 as a solution. Multiplying by (AA + B)t = (IA 12 + 1)-i [2., 1] we get 2(1212 + l) l x + (1212 + l) 'x = 0 which has the non-zero solutions x = "q. Let A
Suppose (A.A + B) is one-to-one and Ax + Bx consistent. Then all solutions of Ax + Bx = fare of the form
Theorem 9.2.7
= f is
+ ADe
x=
+ (I — A =(AA + B)tA,
=(AA + BrB, k = Ind(A), and f= (AA + B)tf.
Proof !fxsolvesAx+ Bx=f,then AA + = I. Thus (20) follows from Theorem 3. • Theorem 7 is not as completely satisfying as our other results since we have not stated precisely for which f is Ax + Bx = f consistent when AA + B is one-to-one. While the genóral problem appears difficult, we do have the following. Theorem 9.2.8 Suppose AA + B is one-to-one and N(AA* + B*) = N(A*) n N(B*). Then Ax + Bx = f is consistent and only jf
(I_(AA+B)(AA+B)t)f=O. Proof
Suppose AA + B is one-to-one and N(XA* + B*) = N(A*) n N(B*).
+ B*) = Now (2A + B)(AA + B)' is the identity on R(AA + B) = + B)'A = A and Thus(AA + 2 (AA+B)(AA+B)'B=B. Henceforany x, ifwe setf=Al+ Bx,weget
(AA + B)(AA + B)'f = f. On the other hand, if(2A + B)(AA + B)'f = f, then = I is = I. Since Ax + Ax + Bx = is equivalent to Ax +
consistent, so is Ax + Bx = f. U The special cases when A or B is one-to-one are of some interest. B being one-to-one is the case of most interest for the applications of Section 5. Theorem 9.2.9
Suppose that A is one-to-one. Then Ax consistent if and only tff is of the form
f=
—
+ Bx = f is
AAt)Bg
where h is an arbitrary function and
g=
+
AtBt
*?BsAth(S)th,
(22)
180 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
q an arbitrary constant. Conversely, (22) is the general solution.
I has the form (21), then g given in
Proof Suppose A is one-to-one. Then Ai + Bx = I is equivalent to the pair of equations:
x+AtBx=Atf, and (I —
AAt)Bx = (I
—
(23)
AAt)I.
(24)
Now AAtI can be chosen arbitrarily, say AAth. Then (23) uniquely determines x giving (21). Substituting x into (24) gives (1 — AAt)(. A similar result is possible if B is one-to-one.
Theorem 9.2.10
Suppose that B is one-to-one. Then Ax consistent tf and only 1ff is of the form
•
+ Bx = I is
I = BB'h + (I — BB')Ag
(25)
where h is arbitrary and
g=
+
+ [I — (BtA)D(BtA)]
(26)
E( —
k = Ind(B'A), q arbitrary. Conversely, 1ff has the form (24), then g in (25) is the general solution.
Proof Suppose B is one-to-one. Then Ax + Bx = I is equivalent to B'Ax + x = Btf, and (I — BBt)AX
= (1
—
BBt)I.
(27) (28)
Again BBtf is arbitrary. From (27) x is determined uniquely in terms of B'I. Then (I — BBt)f must follow from (28). N We now turn to the case when AA + B is onto. Let A, B be m x n matrices. Let A be such that AA + B is onto. Define P=(AA + B)t(AA + B). Then Ax + Bx =fbecomes
AP*+BPx=f—A(I—P)x—B(I—P)x. Or, equivalently, A(AA + B)'[(AA + B)x] + B(AA + B)t[(AA + B)x]
=f—A(I—P)x—B(I—P)x.
(29)
But A[A(AA + B)t] + [B(AA + Bt)] = I. Thus (29) is, in terms of(AA + B)x, a differential equation of the type already solved and hence has a solution for any choice of (I — P)x.
APPLICATIONS OF THE DRAZIN INVERSE
181
Theorem 9.2.11 Suppose that )A + B is onto and I is n-times B)t. Let B(AA+ B)t. Let + B)t(AA B)Jh — B[1 — (AA + B)t(,A + B)]h where h is an arbitrary (n + 1)-times vector valued function. Then all solutions of + Bx = fare of the form
x = (AA + B)'{e
+ (I
—
A°BIAADq
+ ADe_ + [I — (AA + B)t
ADA)
x(AA+B)Jh. an arbitrary constant vector, k = md (A). The formulas in Theorem 11 simplify considerably if A or B are onto. For the applications of Section 5, the case when B is onto is the more important.
q
Theorem 9.2.12 are of the form
x = Bt{e_
Suppose that B is onto. Then all solutions of Ax + Bx = I + C°e
+ (I h an
k
—
+ [1 —
—
arbitrary function, q an arbitary vector, g =
B'B]h.
I — A[I — B'BJh, C
= AB'.
= !nd(C).
Theorem 12 comes immediately from Theorem 11 by setting A = 0 and noting that ñ = I.
Theorem 9.2.13 are of the form
x = At{e_
Suppose that A is onto. Then all solutions of Ax + Bx = I
+e
+ [I — A'A]h
where h is an arbitrary function and g = I — B[I
—
(30)
A'A]h.
Proof This one is easier to prove directly. Suppose A is onto and rewrite Ax + Bx = f as (Ax) + BAt(AX) = f — B[I — AtA]X. Taking [I — A'A]x arbitrary we can solve uniquely for Ax, A1Ax = x, to get (30). U 3.
Applications of the Drazin inverse to difference equations
The Drazin inverse also arises naturally in attempting to solve difference
equations with singular coefficient matrices. To illustrate why the Drazin
182 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
inverse works and other types of generalized inverses don't work in dealing
with this type of difference equation, consider the following difference equation: A is singular.
At first glance, one might be tempted to introduce a (1)-inverse for A. However if one stops and thinks a moment, one can see that one must have that . .. = xfl = =
= = ... = as well as = k = Ind(A). Thus, the problem could be stated as follows: given find a R(Ak). vector ; such that A; +1 = and + 1E By examining Definition 2.2, one can see that the above problem has = AD;. a solution, the solution is unique, and is given by Not unexpectedly the solution of the difference equation proceeds much as for the differential equation. Definition 9.3.1 For A, Be Ctm X the vector cc C"' is called a consistent initial vector for the difference equation = B; + if the initial value problem = B; + f,,, x0 = c, n = 1,2,... has a solution
for;.
Definition 9.3.2 The difference equation tractable if the initial value problem =
= B; +
B; +
is said to be
x0 = c, n = 1,2,... has
a unique solution for each consistent initial vector c.
Theorem 9.3.1 Axe,,.,
The homogeneous difference equation
=Bx
is tractable if and only if there exists a scalar
such that (A,A + B)'
exists.
Proof The proof follows the same lines as the proof of Theorem 2.1 except that xA1(t) = is replaced with U = The difference analogue of Theorem 2.3 is as follows.
Theorem 9.3.2
If the homogeneous equation (1)
is tractable, then the general solution is given by
JAADq
—
ifn=0
ctm
exists. Furthermore, ceC"' is a consistent initial vector for (1) if
APPLICATIONS OF THE DRAZIN INVERSE
183
and only where k = Ind(A). In this case the unique solution, = (ADñy1c, n = 0, 1,2,3 subject to x = c, is given by The inhomogene-
= B; +
ous equation
forn.1, =
+ AD
is also tractable. Its general solution is,
- -
£
—
(I
—
=(2A — B)'f,k = Ind(A), is independent of 1. Let * = — (I — AAD) x
=(2A
A =(A.A — B)
and qeC"'. The solution
—
The vector c is a consistent initial vector jf and only
c lies
1=0
in the flat {* + R(A") }.
Proof Since (1) is tractable, multiplying by (AA — B)' gives the = equivalent equation After a similarity we get, as in the proof of Theorem 2.1,
IC 0
= C"(t + = (I + and the solution Thus =0, of the homogeneous equation follows. (2) may be verified directly as in the proof of Theorem 2.3. U
It is interesting to note that the solution (2) for
depends not only
on then+lpast vectors
1•
Z(A- 'By'-'-
When A is non-singular,
=(A'Brq +
depends only on the past vectors
and
In many applications one has a difference equation holding for only a subset of the x,. The difference equation that is discussed in Section 5 is solved by the following theorem.
Theorem 9.3.3
Suppose that A, B are square matrices and there exists a scalar 1 such that IA + B is non-singular. Set A = (IA + B) 'A and Then all solutions of +Bx1=0,i=0,...,N—Iare — (1 — A"A)xN. given by = ( +(
Proof Suppose that there exists a I such that AA + B is non-singular. Taking a similarity we get as in (3)
A_IA, 0 L0
0 1 Mk_0x
1
MJ'
L0
with B, = I — IA,,B2 = I
—
B2]'
—
'
AM. Then the difference equation is equivalent
184 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
to the decoupled equations
Thus w =( — AJ 4.
1B1)1w0, and v.
= ( — B 1M)N_*vN. •
The Leslie population growth model and backward population projection
Suppose that a population is partitioned according to age groups. Given specific rates of fertility and mortality, along with an initial age distribution, the Leslie model provides the age distribution of the survivors and descendants of the initial population at successive, discrete points in time. It is a standard demographic practice to consider only one sex at a time. We will consider only the female portion of a population. Select a unit of time (e.g. 5 years, or 1 year, or 10 s, or 0.5 ps, etc.). Let At denote one unit of time. Select an integer m such that m(At) is the maximum age to be consider. Construct m disjoint age classes or age intervals; A1
= (0, At], A2 = (At,2At],...
= ((m — 1)At, mAt].
Let t0 denote an initial point in time and for some integer n let
t=n(&). Let us agree to say a female belongs to Ak at time t if she is living at time t, and her age lies in Ak at time t. To define the survival and birth rates, let
Pk(t) be the probability that a female in Ak at time t will be in Ak+l at time t + At (survival rates). Let bk(t) be the expected number of daughters produced in the time interval [t, t + At), which are alive at time t + At, by a female in Ak (birth rates). Furthermore, let nk(t) be the expected number of females in Ak at time t. Finally, let n(t) = [n1(t), ... , For convenience, adopt the notation n(t1) = n(i), pk(tI) = and bk(tI) = bk(i). Suppose we know the age distribution n(i) of our population at time t.. From this together with the survival rates and birth rates, we can obtain the expected age distribution of the population at time
n1(i+ 1) n2(i + 1) fl3(i + 1)
[b3(i) J p1(i)
=
+')
0
LÔ
b2(i) 0 p2(i) 0
0 ... 0 ...
0 0
0 ... p.,_
0 0
n1(i) n2(i) n3(i)
0]
nrn(i)
(1)
or n(i + 1) = T(i)n(i). The expression (1) is the Leslie model. Many times, the survival rates and birth rates are constant with respect to the time scale under consideration. Let us make this assumption and write bk(t)=bk, so
that (1) becomes
n(i+1)=Tn(i)
(2)
APPLICATIONS OF THE DRAZIN INVERSE
185
We shall refer to T as the Leslie matrix. Suppose now that we are given
an initial population distribution, n(O). It is easy to see that we can now project forward into time and produce the expected population at a future time, say t = by n(k) = Tn(k — 1) = T2n(k — 2) = ... = T"n(O). We wish to deal with the problem of projecting a population distribution backward in time in order to determine what kind of population distribution there had to exist in the past, in order to produce the present population distribution. Such a problem might arise, for example, in a situation where one has statistics giving the age distribution for population A at only the time t. and other statistics giving the age distribution for population B at a different time, say t. + If one wishes to make a comparison of the two populations at time i, then it is necessary to project population B backward in time. Since n(i) = T 1n(i ÷ 1) the problem of backward population projection is trivial in the case when the matrix T of(1) is non-singular. If T is singular, the problem is more interesting. The Leslie matrix is very often singular. As a simple example, consider the population of human American females. Let & = 5 years and m =20 so that the age classes are: A1 = (0, 5],A2 = (5,10],... ,A19 = (90, 95], A20 = (95,100]. Almost everyone would agree that at least b20 =0. Suppose the Leslie matrix is given by b1
b2
Op20... o
0
0
T=
0 0
0 0 0
0... 0
0 0
0...0 O...0
0 0 0
PIN-k....l
0
0
0...
0
b3...bM_k_l
p1O 0...
(3)
0 0
0 0
0...
0
0 ...
Pm-k
0
0
o
0
0 ...
0
0
_1T11 [T2,
0
1
j
0 Pm-k+i
0... 0 0... 0
0
0 0
0
0 I
T22
The complete statement that we can make concerning backward population projection is the following. Theorem 9.4.1 For the Leslie population growth model whose matrix is given by (3), and for an integer x 0, let j be an integer such that
Oj
k + x. The future distributions n(k + x) determine the past distributions n(k + x —j) as follows.
186 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
if 0 j x,
(T")'n(k + x)
+ x)1
<Jk
+x
(4)
where VGRJ_x is arbitrary, n1(k + X)E
is the vector of the first m — (j — x) components of n(k + x), and M is the leading principal submatrix obtained from T by deleting the last j — x rows and columns from T.
Proof To prove the top half of (4) note that since Oj x, we have n(k + x _j)ER(rk), n(k + X)ER(Tk) and n(k + x) = + x —j). Thus (TJ)Dfl(k n(k + x — j) = + x) = + x). To show the bottom half of (4), first write T as
T
[M
:
01
=
where
so that, by Corollary 7.7.1, TD
MD 10-I =
Note that n(k)
=
÷ x)
In1 (k) 1
(k + x) 1
=
=
To complete the proof, it suffices to show [j — x])
1(MDY_xni(k)
v arbitrary. ], =L To simplify our notation , let 1= [I — x]. n(k
—
(5)
Partition M as b2
p1O
b3...bM_k_) 0 0
0...
bm_k
0
0...
0
0
0 0
0... 0...
0
0
0 0
0
0...
0
0
O
p20...
0 0
o
0
01
o
0...
•
0 0
o
0
M-——— —
0 ...
0
Pm-k+1
0 ...
0
0
0
Notice that lnd(M) = k
p-i S(p)=
(6)
I
0 0
— I.
0
0... 0...
0 0
0 0 0-
For each positive integer, p, let
IM
01
O1P
rMP
Nj
Lsp Ni]
APPLICATIONS OF THE DRAZIN INVERSE
187
Given the distribution n(k), there had to exist some initial population, n(O), (not necessarily unique) and an intermediate population, n(k — which gave rise to n(k). Write n(O) and n(k — I) as n(O)
n(k
=
—0= Now Tkn(O) = n(k) so that n1(k)ER(Mk) =
where n2(O), n2(k —
R(M&1. But T'n(k
—
I)
Also note that = n(k — I) so that n1(k and Ind(M) = k — 1 we have = n(k). Since M'n1(k — 1) =
n1(k — 1) is uniquely determined by n1(k) as n1(k — 1) =
To finish the proof it suffices to show that any vector u of the form (5) gives rise to the distribution n(k) after 1 intervals of time have elapsed. = n1(k), we have Since M'n(k — 1) = M(MD)mfl1(k) = TU
r
n1(k)
=
Thus it is only necessary to show that If n(O) = then T&n(O)
5.
= n2(k).
is any initial distribution which gives rise to n(k),
= n(k) implies n (k) =
Mkn1(O) and n2(k) = S(k)n1(O). Hence = S(1)(MD)IM n1(O) = S(1)Mk = S(k)n1(O) = n2(O).
Optimal control
In Section 2 we showed how to find solutions for linear systems of differential equations with singular coefficients provided that the solutions were uniquely determined by consistent initial conditions. In this section we shall apply those results to an optimal control problem. The problem presented will provide an interesting example of the type of differential equation studied in Section 2. In general, an optimal control problem involves a process x, which is regulated by a control u. The problem is to choose a control u so as to cause x to have some type of desired behaviour and minimize a cost J[x, uJ. The cost may, of course, take many forms. It may be time, total energy, or something else. The desired behaviour of the process may range
Finally, the process may from going to zero to hitting a moving depend on the control in a variety of ways, often non-linear. We shall present a particular problem and handle it in some detail. Of course, similar problems may also be analyzed using these techniques. Let A, B be n x n and n x m matrices respectively. All matrices and scalars are allowed to be complex though, of course, in many applications they are real. The usual inner product for complex (or real) vectors is denoted (.,.). Let Q, H be positive semi-definite m x m and n x n matrices. Finally, let x, ii denote vector valued functions of the real variable z. x is n x I while u is m x I.
188 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
We consider the autonomous (time independent coefficients) control
process
x=Ax+Bs.
(1)
on the time interval [t0,
J[x, u] =
I
with quadratic cost functional
(Hx, x) + (Qu, u)dt.
(2)
to
The dot in (1) denotes differentiation with respect to t.
If one has a fixed pair of vectors x0, x1 such that there exist controls u so that the process xis at x0 at a time t0 and x1 at time t1, then one can ask for a control that minimizes the cost (2) subject to the restraint that x(t0) = x0x(t1) = x1. Using the theory of Lagrange multipliers one gets the system of equations •
A+A*2+Hx=0 x—Ax—Bu=O
(3)
Bt+Qu=O as necessary conditions for optimization in this sense [1). If Q is invertible, then u can be eliminated from the second equation and the resulting system formed by the first two equations solved directly. We shall be most interested then in the case when Q is not invertible, though our results will include the case when Q is invertible. The system (3) can be rewritten as
Il
0 01 IA]
10
1
IA*
OIIxI+I0
LO 0 OJLuJ
LB*
H
01111
101
—A —BIIxJ=I0I. 0
QJ[uJ
LOJ
Note that (4) has leading coefficient singular.
We assume throughout that controls are continuous. All statements concerning optimality are made with respect to the control problem of this section and this linear manifold of controls. Optima) control problems with singular matrices in the quadratic cost functional have received much attention. They occur naturally as a first order approximation to more general optimal control problems. [45] surveys the known results on one such problem with singular matrices in the cost. The approach given here has the advantage that it leads to explicit solutions for the problem studied, as well as a procedure for solution. These explicit, closed form solutions, also simplify the proof and development of the mathematical theory for the problem studied. We shall first show that if (3) has a solution satisfying the boundary conditions, then u must be an optimal control.
(4)
APPLICATIONS OF THE DRAZIN INVERSE
189
Theorem 9.5.1 Suppose that x, u, A is a solution of(3) and x(t0) = x0, x(t1) = x1. Then u is an optimal control.
Proof To show that J [I, a]
J [x, u] for all î, ii satisfying (1) and the boundary conditions it is clearly equivalent to show that çb(s)
has
= J[sx + (1 — s)l, su + (1
—
s)ü]
a minimum at s = 1 for all I, ó. Let J0 =
I) + (Qu, i)dt,
3 = J[1, ü], and J = J[x, uJ. Then a direct calculation gives 4)(s) = s2(J
—
2J0
+ 3) + s(2J0 — 2!) + .1. Since 4) is quadratic in s it has
a maximum or minimum at s = (III I
Jto
1
if and only if J0 = J, or equivalently,
Cl,
(Hx,i)+(Qu,ü)dt= I (Hx,x)+(Qu,u)dt.
(5)
Jto
0 for all s so that if(S) holds there must be a minimum. Clearly (5) is equivalent to However, 4)(s)
— x)dz
=
— iI)dt.
(6)
(Hx,l)—(Hx,x)dt.
(7)
But
(Hxx—x)dt=
Cli
Jlo
Jt0
Now(Hx,x)=(—A— A*A,x)= _(A,x)_(A*A,x). But (AA,x)=(A,Ax)= (A, x — Bu) = (A, 1) — (A, Bu) = (A, ) — (B*A, u) = (A, x) ÷ (Qu, u). Thus (Hx, x) = Hence
(A, x) — (A, I) —
Cli
Jto
(Hx,i — x)dt =
(Qu, u). Similarly, (Hx, I) = (A, 1) — (A, 1) — (Qu, ü). C,,
(2,x)+(2,x)+(Qu,u)—(2,l)—(2,l)—(Qu,ã)dt
,J1O
ii
=(A,x)
Ii
—(Al) 10
to
+
(Qu,u)—(Qu,ü)dt Jto
C,
= JI to (Qu,u—ã)dt. U Note that Theorem 1 says that solutions of (3) satisfying the boundary conditions provide optimal controls even if the differential equation (3) has non-unique solutions for consistent initial conditions. Of course, in that case the optimal controls may not be unique. As a useful by-product of the proof of Theorem 1 we have that Si
J[x,u]=—2(A,x) I0
190 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
To simplify the solving of (4) rewrite it as
Ai+Bz=0 where A
(9)
], B =
=
['
:2]. Here I is 2n x 2n, O],and B4=Q.
B,
(10)
Clearly (p + B1) 'exists except for a finite number of p. Define
Q,=B4—B3(p+B,I'B2.
(11)
We now need the following easily verified result whose proof we omit. pA + B is invertible almost always if and only if invertible almost always. is + B1 are invertible. Let Assume that p. A, B are such that pA + B, are defined by A = (pA + B)- 'A, B = (pA + B) 'B. Then
Proposition 9.5.1
0]' n_I 01
LMP
01II_pN I] L—PMM
0
Using Corollary 7.7.2 we get
01
ADA_I
0 L
To evaluate e
0]
-
"i' note that for integers r
I
01'
1
[NDZ
0] =
L
0
1,
]
0
0
Thus the power series expansion of the exponential gives
r
0
e
e
I
Using Theorem 2.3 we see that the general solution of (9) is
I e
0
—
0 0
From the original equation (4) we have that
= e1
where
= ).(t0),
(12)
APPLICATIONS OF THE DRAZIN INVERSE
191
and
(13)
Thus we have shown the following.
Theorem 9.5.2
If
terms of x, A by (13) While (13) gives
is invertible, then the optimal control u is given in an optimal control exists.
[L]
explicitly, (13) does not give u directly in terms of
x. We now turn to this problem. Let E(t) = e -
—
[Ei(o) E2(t)] where the EAt), i = =
1,2, 3,4
are all n x n matrices. Suppose that (3) has a solution. Let
=
= A0. Then
Note that this is possible if and only if 1 1 Now E(ti)L]or
Lx(t1]=
x1 = E3(r1)A0 + E4(t1)x0.
(14)
Once ).0,x0 are known, x,u follow from (12) and (13). On the other hand if (14) is viewed as defining x1, then from (12) x will go from x0 to x1. Thus we have established the following result. Theorem 9.5.3 Suppose that is invertible almost always. For a given x0, x1 there is an optimal control that takes x from x0 to x1 in the time interval [t0, t1:J tf and only the equation (14) has a solution A0 such
It is possible, under our assumptions, for x to be able to go from x0 to x1 but not have an optimal control existing if Ng
L"oJ
=
.
We shall give a simple example that illustrates this. It shall also serve to illustrate our method. satisfying (14), is inconsistent in
Example 9.5.1 matrices. The process is then simply x = u, and the cost is 1
x=[x1,x2J1,
u=[u1,u2]T.
192
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
The system (4) becomes 11 10
0
LO
0
I
0]IA1 10 1 01121 10] 0 —1 lix 1=101. 01 lxi +10 oJLuJ LI 0 QJLuJ LOJ
(15)
Since B is invertible, we may take p =0 in (pA + B) - L• Now
B1=I0
0
Li
0
QI
01-' fo
[0 i
—I
=11
I
—Q]
0
0
Lo —i
0
Multiplying (15) by B-' gives
Q 0112] 121
10 1
0
LO
—I
OIlxl+IxI=0. OJLuJ
(16)
LuJ
It is straightforward then, to get that the solutions to (15) are given by
110
121
0
Q
(17)
Q 011x01. 0 OJLu0J
LuJ \L0 —Q 0J JL—Q It is clear that for any x0,x1 there exists a control u sending x0 to x1. But for scalar c. Thus in
the x in (17) only takes on values of the form
order for an optimal control to exist, x0,
must be of the form
A look at the power series for the exponential in (17) shows that
121
lx
1
cosh(t—t0)Q+(I—Q)
1=1 —sinh(t—t0)Q
LuJ
L — cosh(t
—
t0)Q + Q
0] cosh(t—t0)Q+(I—Q) 0
—sinh(t--t0)Q — sinh(t
—
t0)Q
I
Q0 0
Q
L—Q °
OiLuoJ
— sinh(t — r0)Q cosh(t — t0)Q 01 cosh(t—t0)Q —sinh(t—t0)Q 01 =I cosh(t — t0)Q + Q — sinh(t — t0)Q 0J Lu0 L—
Ifx0
and x,
([o
Q
we see that t = t0 gives u= Q20. Since
0])we must have
then u0 =[— = [k], and
APPLICATIONS OF THE DRAZIN INVERSE
193
Letting t = t, gives
c, =
— sinh(t1
+ cosh(t1 — t0)c0.
—
we have
Solving (18) for
— t0){c,
=
(18)
—
cosh(t,
—
t0)c0}/sinh(t1
—
—
sinh(t —
0
L
as the optimal control. x can also be easily solved for if desired. We have arrived then at the following procedure for solving the original problem. Given x0, x1 determine whether it is possible to go from x0 to x1 with an optimal control by solving (if possible) (14) for such that a0
If
LX0J
is found, use the bottom half of (12) for x if x is
Use (12) and (3) to get the optimal control u. In working a given problem, it is sometimes simpler to solve (4) directly using the techniques used in deriving the formulas (12) and (13) as done in this example, rather than try to use the formulas directly. At this point, an obvious question is What does QM being invertible mean?' That is, 'What is the physical significance of assuming the The answer itself is easily comprehended. The proof, invertibility of however, requires some knowledge about Laplace transforms. The reader without an understanding of Laplace transforms and analytic functions is encouraged to read the statement of the theorems. From (10) and (11) we have needed.
p_A][_B] (19)
Proposition 9.5.2
Proof
If Q is invertible, then
is almost always invertible.
jim
matrices form an open set. U If Q is invertible, then it is obvious from (4) that u can be solved for in terms of x, A. Theorem 3 shows that this can happen even when Q is not invertible. We note without proof the following proposition.
Proposition 9.5.3
1fF, G are positive semi-definite r x r matrices, then F + G is invertible and only N(G) = {O}. Of course, is invertible almost always for real p ii and only if it is almost always invertible for complex p. Let p = is where s is real. Then (19) x H( — is
+ A)- 'B. From Proposition 3 we have that
is invertible almost
194 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
always if and only if (0) =
N(Q)rN(B*(
N(H112( — is + A)- 'B) = real s. Thus we have proven that:
+ A) I*H(_ is + A) 'B)= N(H( — is + A)- 'B) for almost all —
is
Theorem 9.5.4
is invertible for almost all and only if N(Q)n N(H( — is + A)- 'B) = (0) for almost every real s.
We need a technical result on analytic (1)-inverses before proceeding.
Theorem 9.5.5
Suppose that A(•) is an m x n matrix valued function such that A.Iz) is a fraction of polynomials for all i and). Suppose also that N(A(z)) is non-trivial for all z in the domain of A(). Then for any real number w >0, there exists an m x n matrix valued function B() such that
is a fraction of polynomials. (ii) R(B(z)) = N(A(z)) for almost all z. (iii) The poles of B are integral multiples of cvi, cv> 0, are simple, and (iv) IIB(z)II (i)
Proof Suppose that A() is an m x n matrix valued function such that is a fraction of polynomials for all i and j. Suppose also that N(A(z)) is non-trivial for all z in the domain of A(•). Let X be an n x m matrix of Then AXA = A is a consistent linear system of at most mn unknowns equations in mn unknowns. Denote this new system by (s). Since the coefficients of s are fractions of polynomials, there exists a real number K such that all minors of(s) are identically zero, or identically non-zero, for z K. Thus s can be solved by row operations (non-uniquely) to give a F(•) such that for z K; AFA = A, the entries of F(z) are fractions of polynomials in z, rank (F(z)) is constant, and rank (F(z)) is the maximum possible (dim N(A(z))). Note that is a fraction of
polynomials for alliandj. Let z,,...,zq be the poles ofFA. Let rj,...,rq denote their multiplicities. Let r0 be such that FA = CX z I'°) as z —, co.
Seta=r0+r,+... +rq+3.Define 8(z)
I
{ fl (z — zj' fl (z — ipw)' }(I — F(z)A(z)).
= j=1
p=l Then B clearly satisfies (I), (iii) and (iv). Since (ii) holds for I z K, it holds
for almost all z by analytic continuation. • We can now prove the following.
Theorem 9.5.6
The following are equivalent.
(a) There exists an x0,x1 for which optimal controls exists, but are not unique.
(b) There is a trajectory from zero to zero of zero cost with non-zero control.
(c) Q is not invertible for all p.
APPLICATIONS OF THE DRAZIN INVERSE
195
Proof Clearly (b) (a) since J[O, 0] = 0. To see that (a) (b). let (x, ii), (i. ii) be two optimal solutions from x0 to x,. Then there exists 2.2 so that (2. x, ii) and (2. x, ii)
satisfy (3). Thus (2 — 2, x — x, ii — ii) satisfies (3) and hence is optimal by Theorem 1. But (x — i)(t0) = (x — i)(t,) =0 and ii — ii is not identically zero. That J[x — 1, ii — ii] = 0 follows from (8). Suppose now that (b) holds so that there exists x, ii such that J[x, UI =0, x(t0) = O,x(t1) = 0, and u is non-zero. Since J[x.u] = 0 it is clear from (2) that Hx = 0 and Qu = 0. Extend xii periodically to [— x, x] and replace t by t — t0. Call the new functions î, ü. Thus Hx =0, Qü =0, and
i=Ai+Bü,t#n(t, —t0),n=O, ±1, ±2,... Sinceüisboundedand
sectionally continuous on finite intervals, x is continuous, and x is of exponential order, we can take Laplace transforms to get HL[i] =0,
QL[u] =0 and L[i] = (s — A)- 'BL[ü]. Thus L[ü](s)€N(Q)r N(H(s — A) 'B) for all s in some right half plane. By Theorem 4, we have is not invertible for all p. is not invertible for all p. From the proof of Conversely, suppose that for p = it. t real. N(H(p — A) 'B) = Theorem 4 we have
Thus N(Q)r' N(H(p — A) 'B) = N(QM) for almost all p. Now applying =0, and with co = 2ir/(t, — yields a B, such that Theorem 5 to =0. Let be satisfies (iii). (iv). But then QB =0, and H(p — A) Let by vector such that is not identically zero. Denote i(s) = (s — A) 'B(s). Then we have that
Hi(s) = O,Q#(s) = 0, and i(s) = (s — A)
(20)
Let x be the inverse Laplace transform of i, u the inverse Laplace transform of From (20) and (iv) we have Hi =0, Qü =0, x = Ai + Bü, i(0) =0, and ü(0) =0. [29, p. 1841. Furthermore, a is non-zero. Finally, we get since the poles of i(s) were simple and multiples of 2ni/(t, — that i,11 are periodic with period (, — [29, p. 188]. Replace i,ü by x = i(t + t0),u = ti(t + ta). Then x(t0) = x(z,) = 0, J[x.u] = 0, and x =
•
Ax + Bu. Thus (c) = (b). It is possible to have invertible almost always and still have non-zero optimal trajectories of zero cost. Of course, the control u must then be zero.
Example 9.5.2 Let Q = I, A = I, B =0, H =0 in (2) and (3). Then Q is invertible for large p since Q is. Clearly x = exp(A(t — t0))x0 is a trajectory of zero cost from x0 to x, = exp(A(t, — t0))x0. But ii = 0 and J[x,u] = 0. Note also if x0 =0, then x 0. We will make no use of 'controllability', and hence will not define it. For the benefit of the reader familiar with the concept, note that the invertibility of Q is logically independent of the controllability of (2) since for any choice of A, B, setting Q = I makes invertible almost always, while setting Q = H =0 makes 0. Note also that in Example 2, the pair (A, B) was completely controllable and was invertible. However, optimal controls only existed for certain
196 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
pairs x0, x,. Thus the assumption of controllability does not seem to
simplify matters if Q, H are allowed to be singular. The method of this section can be applied, of course, to any problem which leads to a system of the form Az + Bz = f. However, the special form of the A makes most of the calculations possible since it allowed us to use Theorem 7.8.1. Any problem which leads to a system with A=
IA1 01 be solved much as was (9), provided, of course, I can L"2 UJ I
pA + B is invertible for some p. We shall now describe several such problems. The calculation of the solutions parallels those just done, so a description of the problem will suffice. (Hx, x) + (Qu, u) + J <x, a >dt where a is a vector. Then the right hand side of (4) has = (as, 0*, 0*)* instead of the zero vector. Theorem 2.3 can be used to solve this non-homogeneous system to get
For example, suppose that the cost is given by
Q = ADe ADésJeADbsads +
+e
(I —
A"AeADAq.
The integral can be evaluated by using Theorem 2.4. For this problem, it is important to know whether or not the cost is positive. Another variation on the same type of problem is process (1) with the
+ 2 +
cost functional J[x,u] =
IH C*1
]
L
is
10
positive semi-definite [1, pp. 461—463]. In this case the system
to be solved is 1
10
0 01111 1
IA*
OIIxI÷I0
Lo 0 OJLuJ
LB*
H —A C
C*1 IA1
rol
QJLuJ
L0J
—B lIxI=I0I.
Solution proceeds almost exactly as when C =0, though QM has a slightly different form.
The analysis developed here can be also applied with little change to the following non-optimal control problem. Given output y, state vector x, and process i = Ax + Bu, find a control u such that y = Cx + Du. The appropriate system then is
II Olixi I—A —B1Ix1 101 Lo
c
D]Lu][yj
.
If y and u are the same size vectors, then (21) is the non-homogeneous form of equation (9). It may be solved by our techniques under the = D + C(p — A) 'B is invertible. assumption that
(21)
APPLICATIONS OF THE DRAZIN INVERSE
197
One frequently does not want to have D a square matrix in (21). If D is not square, then (21) can often be solved using Theorems 7.10.16 and 7.10.20.
Discrete control systems arise both as discretized versions of continuous systems and as systems of independent interest. By using the results of Section 3 some discrete problems can be handled in much the same way as the continuous ones were handled. For example, consider the following:
Discrete control problem Let A, B, Q, H be matrices of sizes n x n, n x m, in x in, and n x n respectively. Assume that Q. H are positive semi-definite. Let N be a fixed integer. Given the process (22)
the cost
J[x,u] =
(Hx1,x1)
+(Qu1,uJ,
(23)
1=0
and the initial position x0, find the control sequence which minimizes the u= cost (23). Here x = Note that the terminal position is not specified whereas it was in the continuous problems considered earlier.
Theorem 9.5.7 The Discrete Control Problem has a solution {x1}, satisfy and only there exists IAJ such that the sequences
IH _A* 0
LO
for i =0, 1,... , N
Proof
01
r— A
0
0 0
I
B] lxii
101
OIAj=I0I
Q]LuJ L with x0 given and AN =0, UN =0.
OJLu1+1J — 1,
—
(24)
LOJ
Since UNOflIY appears in the cost and does not effect the {x1}, UN
=0. Take UN =0. To see the
may be taken to be any vector such that N—i
necessity of(24), consider J[x.u] +
AN=0.Then
(A1,x1.. — Ax1 — Bu,) and set 1=0
a(x1 —Axo—Buo,...,xN—AxN_i a(x1,... ,xN)
BuN_l)_ 1
—
where (z1, ... ,zN) is to be considered as a list of the n entries of z1 then the n entries of z2, etc. Thus one gets by the usual theory of Lagrange
multipliers that
Hx1_A*A1+A1...1=0,i=l,...,N_1, (25)
HxN+AN...l=O, Qu1 — B*A1 =0,
i=0,...,N—1
198 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS is
necessary. But (25) is equivalent to (24) since AN was taken equal to zero.
On the other hand if (xj, {uj, {Aj satisfy (25), then one may show, almost exactly as for Theorem 1, that
J[sx+(1 —s)*.su+(l has a minimum of s = 1. We omit the details. •
1,
For the control problem considered here, x0 can be arbitrary.
Theorem 9.5.8 Suppose that Q + B*( — pA* + I) 'pH(p — A) 'B is invertible for some scalar p (and hence for all but a finite number of 4 Then for every x0 there exists a solution to the Control Problem.
Proof Given x0, J[x, u] defines a
function on
below, it suffices to show that J[x, uJ goes to 00 as
m•
Since 112
J is bounded
does. If Q is
invertible, this is clear. Suppose then that Q is singular and — pA* + I) 1pH(jt — A)- 'B is non-singular for almost all p. Q+ = Suppose for purposes of contradiction that there exists a sequence of N-i controlsequences
{u1,},i=O....,N— l;r=O...,suchthat
but J[x,, U,] is bounded. We shall show that, in fact, {u1,) is bounded as r-. oo. Since J[x,,u] bounded, one has (Qu0,,u0,) is bounded. Hence Q112u0, is bounded. But, (AHx,,,x1,) = + Bu0,) 112 is also bounded. Then H"2Bu0, is bounded since x0, = x0 for all r. But is invertible for almost all p. so that u0, is bounded. Hence x,, is bounded. Proceeding in this manner, one is
gets
P4±111
u1,
112 is bounded. Thus J attains its minimum as desired. U
01
1=0
We can now solve the Discrete Control Problem. Let A,
Then (24) becomes for i =0, ... , N
—
= LH
A]'
I,
fA, olIz+,1 fB, B21[z1110 0]Lu1+j LB3 B4JLU1jLO
L0
—
(26)
Proposition 9.5.4 = B4 — B3HM 'B2 = Q + B*(_ pA* + I) 'H x (p — A) 'B where p is such that and (p — A) are invertible. Then
fpA,+B, L
B3
is invertible
B21
B]
and only If Q is invertible.
It is assumed from here on that (27) and pA1 + B1 are invertible.
(27)
APPLICATIONS OF THE DRAZIN INVERSE
199
Multiply (26) by the inverse of (27) to get
f
011z1+11+1 0]Lu1+1]
LM
[M
0]D1
01D
01
and
0]'
0]
(28)
= I. But
+
= 0, x0 given,
with UN = 0, FNM
o11ze1=1o1 IjLu] L0J'
zM
0 0
1]
— ')(I — Here = (WN + WPZM + ... + By Theorem 3.3, all solutions of (28) are given by —
F—
[u1] — L
0hz0
0]
—
1=
OjLuo O1FZN]
÷1
0]
[
(29)
I]Lo ]
A solution (29) will satisfy the boundary conditions if and only if
+ (—
=
= (LMNM +
u0 = ZN
(30)
— —
(32)
+ (I —
= (—
and
0=
—
Recall that PNM + becomes — M( —
-
= I. Thus
(33)
—
and
commute. Using (32), (33)
—
=0. The
—
preceding discussion is summarized in the following theorem. Suppose that QM is invertible, that N> md (Np), and x0 is specWed. Then XN are obtained by solving
Theorem 9.5.9
=0, and
(I —
The control sequence is given by u0
=
=(
x (I
and for i >0,
= — (LMNM
—
+
—
—
As mentioned earlier, one is probably better off to follow the steps in the proof of Theorem 9 rather than try to utilize the formulas.
200 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
If the process is not completely controllable, and x0 is a point that cannot be steered to the origin, then XN will be unequal to zero. A very simple example is obtained by taking A = I, B =0, and Q invertible. is invertible. In this case, of course, one would get ; = Then and u, =0 for all I. It is possible to have Q not invertible, the process not completely controllable and still be invertible and our results apply. One may take
?]asanexamPle. In applications it frequently happens that Q is invertible. Unlike the continuous control problem, the discrete problem can still give rise to a singular difference equation when Q is invertible. If Q in the Discrete Control Problem is non-singular, then = Q - lB*).. for
i=O,1,...,N—land(24)becomes 0
Az
LH
L
o
i (34)
=[:]
for i=O,1,...,N—1,and2N=0. A is invertible if and only if A is. However, there always exists a p such that pA + is invertible so that Theorem 3.3 can always be applied. The difference equation (34) has the advantage that one can work with matrices that are 2n x 2n instead of(2n + m) x (2n + m). While N Ind(A) was assumed in the statement of the theorems, the assumption is not really necessary. if N < Ind(A), one may still use Theorem 3.3 to solve (24). Note that Theorem 5.7 holds even if Q is a singular for all p.
6.
Functions of a matrix
Iff(A) is an analytic function defined on a neighbourhood of the eigenvalues of the n x n matrix A, then there exists a family of projections 1, ... ,r, such that P1P,=0 if I, and , k1—1
f(A) =
'
11m0
' "(A —
m!
(1)
= md (A —
(Equivalently, is the multiplicity of as a root of the minimal polynomial of A.) Formula (1) has been known for some time, see [49) for example. The purpose of this short section is to observe that the may be explicitly written in terms of the Drazin inverse. where
Theorem 9.6.1 Suppose that AECN X N has r distinct eigenvalues (A1 ,... '1,), that Ind(A — A,) = Ic,, and thatf(A) is analytic on a neighbourhood of the
APPLICATIONS OF THE DRAZIN INVERSE
201
Then
f(A)= (I
—
(A
—
—
A.)).
From [49] it suffices to show that
= I, and
AQ1 = Q1A, (A — A)Q1 is nilpotent,
= Q1
(3)
0.
Suppose the Jordan form of A is given by TAT' = Diag{J, , ... ,J,} + N1 with where = nilpotent. Then = Diag{Q11, ... , Q1j with Q1, =0 for I i, = I for i = I. Thus (3) holds. • The are often referred to as the idempotent component matrices of A.
Corollary 9.6.1 eM
Let A be as in Theorem 1. Then
=
—
— (A
(4)
—
—
Proof Let 1(A) = eA, B = tA in (2). Observe that the eigenvalues of tA — A1). are {tA,, ... ,tAj and for t > O(tA — tA1)D(IA — tA1) = (A — U Using Corollary 9.6.1, one may get the following version of Theorem 2.3. Theorem 9.6.2 Suppose there exists a c such that cA + B is invertible. = (cA + B) 1B, and CA = AA + ñfor all A. Suppose Let A = (cA + B)- 1A, is not invertible for some A. Let {A,,... ,2,} be the Afor which AA + B is not invertible. Then all solutions of Ax + Bx =0 are of the form CA
r
=
(—
(5)
—
m.
or equivalently,
, (—
rn
— c + ADreAut(I
—
[(At — c)A + I]D[(A1 — c)A
+ I])q (6)
where k. = md CAL, q an arbitrary vector in
Proof The general solution of Ax + Bx =0 is x = e
ADBtADAq,
q an
arbitrary vector by Theorem 10.13. Since cA + = I, we have x=e
AD(I_CA)IADAq
=
AD +cI)IADAq.
(7)
Also if A1A + B is not invertible, then A.A + is not. Hence AA + I — cA is not. Thus 21A + B is not invertible if and only if — (A1 — c) - is an eigenvalue of A. But then (— + c) is an eigenvalue of AD. Thus A,A + B is not invertible if and only if A. is an eigenvalue of ci — Both (4) and (6)
now follow from (4), (7) and a little algebra. •
202 GENERALIZED IN VERSES OF LINEAR TRANSFORMATIONS
Note that if the k1 in Theorem 1 are unknown or hard to compute, one may use n in their place. It is interesting to note that while AA + and + B have the same eigenvalues (2 for which det(AA + B) = 0), it is the algebraic multiplicity of the eigenvalue in the pencil 2A + that is important and not the multiplicity in 2A + B. In some sense, Ax + =0 is a more natural
equation to consider than Ax + Bx =0. It is possible to get formulas like (5), (6) using inverses other than the Drazin. However, they tend to often be less satisfying, either because they apply to more restrictive cases, introduce extraneous solutions, or are more cumbersome. For example, one may prove the following corollary.
Corollary 9.6.2 Let A, B be as in Theorem 1. Then all solutions of Ax + Bx = 0 are of the form m
=
(—
—
m.
(8)
The proof of Corollary 2 is left to the exercises. Formula (8) has several disadvantages in comparison to (5) or (6). First At and CA do not necessarily commute. Secondly, in (5) or (6) one has q = x(O) while in (8) one needs to find q, such that —
= q, which may be a non-trivial task.
1=1
7.
Weak Drazin inverses
The preceding sections have given several applications of the Drazin inverse.
It can, however, be difficult to compute the Drazin inverse. One way to lessen this latter problem is to look for a generalized inverse that would play much the same role for AD as the (1)-inverses play for At. One would not expect such an inverse to be unique. It should, at least in some cases of interest, be easier to compute. It should also be usable as a replacement for in many of the applications. Finally, it should have additional applications of its own. Consider the difference equation (1)
From Section 3 we know that all solutions of (1) are o( the form x, = It is the fact that the Drazin inverse solves (I) that helps explain its applications to differential equations in Section 2. We shall define an inverse so that it solves (1) when (1) is consistent. Note that in (1), we have x,, = for I 0. Thus if our inverse is to always solve (1) it the must send R(Ak), k = Ind(A), onto itself and have its restriction to
APPLICATIONS OF THE DRAZIN INVERSE
203
to R(A"). That is. it provides the unique as the inverse of A solution to Ax = b, XER(Ak), when bE R(A"). same
Definition 9.7.1
"and k = Ind(A). Then B is a weak
that Drazin inverse, denoted Ad, Suppose
(d)
B is called a projective weak Drazin inverse of A if B satisfies (d) and
(p) R(BA) = R(AAD). B is called a commuting weak Drazin inverse of A (1 B satisfies (d) and
(c) AB=BA. B is called a minimal rank weak Drazin inverse of A B satisfies (d) and (m) rank(B) = rank(AD).
Definition 9.7.2
An (ia,... , i,,)-int'erse of A is a matrix B satisfying the
properties listed in the m-tu pie. Here i, E { 1,2,3,4. d, m, c. p }. The integers 1, 2, 3, 4 represent the usual defining relations of the Moore—Penrose inverse. Properties d, m, c, p are as in Definition 1. We shall only be concerned with properties { 1,2. m, d, c. p }. Note that
they are all invariant under a simultaneous similarity of A and B. Also 'B = Ak, and get note that one could define a right weak (d)-inverse by a theory analogous to that developed here. Theorem 9.7.1 Suppose that AEC" non-singular matrix such that TAT...I
X
k
= md A. Suppose TEC" XI?
C non-singular. Nk =0.
=
is
a
(2)
Then B is a (d)-inuerse of A if and only if
TBT'
X,Yarbitrary.
(3)
B is an (m,d)-inverse for A if and only if
Ic.-1 xl
TBT -1
Lo
B is a (p. d)-inverse
TBT'
=
oj'
Xarburary.
(4)
of A if and only if X arbitrary, YN =0.
(5)
B is a (c,d)-inverse of A ifand only if
rc-1 o' YNNY. Tff1'=Lo y]'
(6)
204 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
B is a (I, d).inverse of A
(land only if
TBT'
XN=O,Na(1).inverseofN.
(7)
B is a (2,d)-inverse of A (land only :f
YNY=Y,XNY=O.
(8)
If TAT'
is nilpotent, then (3)—(8) are to be interpreted as the (2,2)-block in the matrix. If A is invertible, then all reduce to A -
Proof Let A be written as in (2). That each of (3)—(8) are the required types of inverses is a straight-forward verification. Suppose then that B is a (d)-inverse of A. The case when A is nilpotent or invertible is trivial, so assume that A is neither nilpotent nor invertible. Since B leaves R(Ak)
invariant, we have TBT1
(d) gives only ZC"' =
Izxl
= Lo Hence
for some Z, X, Y. Substituting into
Z = C1 and (3) follows. (4) is clear.
Assume now that B satisfies (3). If B is a (p, d)-inverse then
iT' XN1\_ (F '
RkLO
° o
Thus (5) follows. If B is a (c, d)-inverse of A, then
r' cxl Lo
NY]
XN Lo YN
1'
But then CkX = XNk =0 and (6) follows. Similarly, (7) and (8) follow from (3) and the definition of properties { 1,2). Note that any number I md (A) can be used in place of k in (d). In and Ak+ lAd Ak. Although AdA and AAd are not general, AkA4A always projections, both are the identity on R(Ak). From (3), (4), and (6);
Corollary 9.7.1
is the unique (p.c. d')-inverse of A. AD is also a
(2, p, c, d)-inverse and is the unique (2, c, d)-inverse of A by definition.
Corollary 9.7.2
Suppose that Ind(A) =
1.
Then
(i) B is a (1,d)-inverse of A (land only (lB is a (d)-inverse, and (ii) B is a (2,d)-inverse of A (land only (1 B is an (m,d)-inverse.
Corollary 9.7.3
Suppose that Ind(A)
2. Then there are no (1,c,d)-
inverses or (1, p. 6)-inverses.
Proof Suppose that md (A) 2 and B is a (1, c, d)-inverse of A. Then by =0 (3), (6), (7) we have X =0, NYN = N, and NY = YN. But then
APPLICATIONS OF THE DRAZIN INVERSE 205
which is a contradiction. If B is a (1, p. d)-inverse we have by (3), (5), (7)
that X =0, Y =0, and NON = N which is a contradiction. U Most of the (d)-inverses are not spectral in the sense of [40] since no assumptions have been placed on N(A), N(Ad). However;
Corollary 9.7.4
The operation of taking (m,d)-inverses has the spectral mapping property. That is. 2 is a non-zero eigenvalue for A (land only if 1/2 is a non-zero eigenvalue for the (m, d)-inverse B. Furthermore, the eigenspaces for 2 and 1/2 are the same. Both A and B either have a zero eigenvalue or are invertible. The zero eigenspaces need not be the same.
Note that if 2 is a non-zero elgenvalue of A, then 1/1 is an eigenvalue of any (d)-inverse of A.
Corollary 9.7.5 JfB1,... , B, are (d)-inverses of A, then B1B2 ... B, is a is a (d')-inverse of Atm. (d)-inverse of A'. In particular, Corollary 5 is not true for (1)-inverses. For B
is
=
a (1,2)-
but B2 = 0 and hence B2 is not a (1)-inverse of
inverse of A
= A2 = A. This is not surprising, for (A')2 may not be even a (1)-inverse of A2.
Theorem 9.7.2
Ind(A) = k. Then
Suppose that
(i) (AD + Z(I — ADA)IZ€Cn is the set of all (d)-inverses of A, ADA {AD ADAZ(I — (ii) is the set of all (m,d)-inverses of A, + — ADA) (AD ZA = AZ) is the set of all (c, d)-inverses of A, (iii) + Z(I X
and
(iv) {AD + (I — ADA)[A(I — ADA)] 1(1 — ADA)[A(I ADA) =0) is the set of all (1, d)-inverses of A.
—
ADA)]A(I —
Proof (i)—(iv) follow from Theorem 1. We have omitted the (p. d)- and (2, d)-inverses since they are about as appealing as (iv). U Just as it is possible to calculate A' given an A, one may calculate A" from any Ad.
Jfk = lnd(A), then AD = (Ad)I+ 'A'for any I k. The next two results are the weak Drazin equivalents of Theorem 7.8.1.
Corollary 9.7.6
Theorem 9.7.3
Suppose that
A
where C is
= invertible. Then all (d)-inverses of A are given by Ad
IC1 _C.1DEd+Z(I_EàE)]. Lo
Ed
j'
206 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS Ed any (t)-inverse of E, Ea an (m, d)-inverse of E, Z an arbitrary
matrix of the
correct size.
Proof Suppose A Then Ak
with C invertible. Let k = Ind(A) = Ind(E).
=
ic' e1 = Lo
II
.
.
01.
Ek]' where Ois some matrix. Now the range of[0 0]
in R(Ak). Hence AD and any Ad agree on it. Thus (10)
Now suppose (10) is a (d)-inverse of A. Then AAdAk = A'. Hence
fC D1ICi Lo EJLO
XI1ICA 811C' & X2J1.o
II
CX1+DX211C' ThusLo EX2 ]Lo &+(CX1+DX2)E'=O, EX2E'=E'. If AdAk +2
E'f[0
E'
E'][0
E']'°'
01 ic' el
(11) (12)
= A' is to hold, one must have X2 a (d)-inverse of E. Let X2 = E'
for some (d)-inverse of E. Then (12) holds. Now (11) becomes X1E' = — C IDEdEk. Let Eà be an (m,d)-inverse of E. Then EaE is a projection onto R(E'). Hence X1 must be of the form — C IDEa + Z(I — EaE) and (9) follows. To see that (9) defines a (d)-inverse of A is a direct computation. = A' implies ABA' = A', the It should be pointed out that while two conditions are not equivalent.
Corollary 9.7.7
Suppose there exists an invertible T such that (13)
]T is an (m,d)-inverse
with C invertible and N nilpotent. Then T
for A. If one wanted AD from (13) it would be given by the more complicated
expressionTADTI
rc-i
=Lo
/k-i
C'XN'
Although for block triangular matrices it is easier to compute a weak Drazin than a Drazin inverse, in practice one frequently does not have a block triangular matrix to begin with. We now give two results which are the weak Drazin analogues of Algorithm 7.6.1.
Theorem 9.7.4
Suppose that
and that p(x) = x'(c0 + ... + c,xl,
APPLICATIONS OF THE DRAZIN INVERSE 207
0, is the characteristic (or minimal) polynomial of A.
c0
Then
Ad=__(c11+...+c,Ar_l)
(14)
is a (cd)-inverse of A. If(14) is not invertible, then Ad + (I invertible (c,d)-inverse of A.
—
AdA) is an
Proof Since p(A) =0, we have (c01 + ... + c,A')A1 =0. Hence (c11 + ... + c,A' ')A'4' = — c0A'. Since Ind(A) I, we have that (14) is a (d)-inverse. It is commuting since it is a polynomial in A. Now let A be as in (2). Then since Ad is a (c, d)-inverse it is in the form (6). But then
y= is
—
!(c11 + c2N + ... + c,N' c0
1)•
Ifc1 #0, then V is invertible since N
nilpotent and we are done. Suppose that c1 =0, then
is nilpotent. That Ad + (I — AdA) is a (c, d)-inverse
follows from the fact that Ad Note that Theorem 4 requires no information on eigenvalues or their multiplicities to calculate a (C, d)-inverse. If A has rational entries, (14) would provide an exact answer if exact arithmetic were used. Theorem 4 suggests that a variant of the Souriau — Frame algorithm could be used to compute (c, d)-inverses. In fact, the algorithm goes through almost unaltered.
Theorem 9.7.5
Suppose that A
and
X
Let B0 = I. For j = 1,2, ...
, n,
let
If p5#O, but
=0, then (15)
is a (c,d)-inverse. In fact, (14) and (15) are the same matrix.
Proof Let k = Ind(A). Observe that = — — — ... — If r is the smallest integer such that B, =0 and s is the largest such that #0, then Ind(A) = r — s. Since B, =0, we have A'=p1A'1 — ... —p5A'5=O. Hence,
A'5 = —(A' — p1A'
... — p5
1)
=
!(As—' _P1A5_2_..._P,_,I)Ar_s41.Thatis,Ak=(!B5....i)Ak+1
desired. U
as
208 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Suppose that A, "and AB = BA. Let Ad be any (4)-inverse of A. Then AdBAAD = BAdAAD = BADAAD = ADBAAD.
Lemma 9.7.1
Proof If AB = BA, then ADB = BAD. Also if A is given by (2), then
TBT' =
with B1C = CB1. Lemma 1 now follows from
Theorem 1. • As an immediate consequence of Lemma 1, one may use a (d)-inverse in many of the applications of the Drazin inverse. For example, see the next theorem.
Theorem 9.7.6
Suppose that A,BeC""'. Suppose that Ax + Bx = 0 has
unique solutions for consistent initial conditions, that is, there is a scalar c such that (cA+ B) is invertible. Let A = (cA + B) 1A, = (cA + B) 18. Let k = md (A), jfAx + Bx =0, x(O) = q, is consistent, then the solution is If Ad is an (m,d)-inverse of A, then all solutions of Ax + Bx =0 x=e are of the form x = qeC", and the space of consistent initial conditions is R(AAd) = R(AI2A).
Note in Theorem 6 that AAd need not equal AdA even if is an (m, d)inverse of A. Weak Drazin inverses can also be used in the theory of Markov chains. For example, the next result follows from the results of Chapter 8.
If T is the transition matrix of an rn-state ergodic chain and if A = I — T, then the rows of I — AdA are all equal to the unique fixed probability vector ofTfor any (4)-inverse of A. Theorem 9.7.8
8.
Exercises
Exercises 1—6 provide a generalization of some of the results in Section 2.
Proofs may be found in [18]. 1. Suppose that A, B are m x n matrices. Let ()° denote a (2)-inverse. Show that the following are equivalent: (i) (AA + B)°A, + B)°B commute. (ii) (AA + B)(AA + B)°A[I —v.A + B)°(AA + B)] = 0. (iii) + B)(AA + B)°B{I — (AA + + B)] =0.
+ B)tA, 2. Prove that if A,B are hermitian, then + B)tB commute if and only if there exists a I such that N(XA + B) = N(A) N(B). Furthermore, if I exists, then (IA + B)tA, (IA + B)tB commute. 3. Prove that if A, BeC" 'are such that one is EP and the other is positive semi-definite, then there exists A such that AA + B is invertible if and only if N(A) N(S) = {0). 4. Prove that if A, Be C" Xli are such that one is EP and one is positive semi-definite, then there exists A such that N(AA ÷ B) = N(A) rs N(S).
APPLICATIONS OF THE DRAZIN INVERSE 209 5.
Suppose that A, are such that N(A) N(B) reduces both A and B. Suppose also that there exists a 2 such that N(AA + B) = N(A) N(B). Prove that when Ax + Bx =1, f n-times continuously differentiable is consistent if and only if + B) for all t, that is, (AA + B) x B)tf = f. And that if it is consistent, then all solutions are of the + form X
= + [(AA + B)D(AA + B) — ADarADAq
+e
—
+ [I — (2A + B)D(A.A + B)]g
where A = (AA + B)DA, B = (A.A + B)DB, I = (AA + q is an arbitary vector, g an arbitrary vector valued function, and k = md (A). 6. Prove that if A, B are EP and one is positive semi-definite, 2 as in Exercise 4, then all solutions of Ax + Bx = f are in the form given in Exercise 5. 7. Derive formula (8) in Corollary 9.6.2. 8. Derive an expression for the consistent set of initial conditions for Ax + Bx = f when f is n-times differentiable and AA + B is onto. 9. Verify that Corollary 9.7.6 is true. 10. Fill in the details in the proof of Theorem 9.5.7. + 'B = A". 11. If A E and k = md (A), define a right weak Drazin by Develop the right equivalent of Theorems 1, 2, 3 and their corollaries. 12. Solve Ax(t) + Bx(t) = b, A, B as in Example 2.2, b = [1 20]*. X
Answer: x1(t) =
—
x2(r) =
—
x3(t) = 13.
+ 2x3(0)) —
—
+
+ 2x3(0)) — + 2x3(0)) —
—
—
—
+ 2t —
t
Let T be a matrix of the form (1). Assume each P, > 0(If any P, = 0, Thus, we then there would never be anyone in the age interval A, agree to truncate the matrix T just before the first zero survival probability.) The matrix T is non-singular if and only if bm (the last birth rate) is non-zero. Show that the characteristic equation for T is
O=xm —b,x"''
—p1p2b3x"3 — Pm_2bm.- ,)x —(p,p2 ... ibm)•
10
Continuity of the generalized inverse
1.
Introduction
Consider the following statement: X (A) If is a sequence of matrices and C"' converges to an invertible matrix A, then for large enough j, A, is invertible and A
to its obvious theoretical interest, statement (A) has practical computational content. First, if we have a sequence of 'nice' matrices which gets close to A, it tells us that gets close to A'. Thus approximation methods might be of use in computing Secondly, statement (A) gives us information on how sensitive the inverse of A is to terrors' in determining A. It tells us that if our error in determining A was 'small', then the error resulting in A1 due to the error in A will also be 'small'. This chapter will determine to what extent statement (A) is true for the Moore—Penrose and Drazin generalized inverses. But first, we must discuss what we mean by 'near', 'small', and 'converges to'.
2. Matrix norms In linear algebra the most common way of telling when things are close is by the use of norms. Norms are to vectors what absolute value is to numbers.
Definition 10.2.1. A function p sending a vector space V into the positive reals is called a norm all u,vE V and
(I) p(u)=Oiffu=O. and (iii) p(u + v) p(u) + p(v) (ii)
(triangle inequality).
We will usually denote p(u) by u
CONTINUITY OF THE GENERALIZED INVERSE
211
There are many different norms that can be put on C". If uEC" and u has coordinates (u1, ... , un), then the sup norm of u is given by sup 1
The p-norm of u
is
given by 1/p
forp1.
The function u hf,, is not a norm for 0 p < since it fails to satisfy (iii). The norm I! ii 112 is the ordinary Euclidean norm of u, that is, u 112 is the geometric length of u. We are using the term norm a little loosely. To be precise we would have to say that is actually a family of norms, one for each C". However, to avoid unenlightening verbage we shall continue to talk of the norm the norm etc. are isomorphic, as a vector space, to C'"". The m x n matrices, C'" Thus C'" can be equipped with any of the above norms. However, in AB A liii B whenever working out estimates it is extremely helpful 1
AB is defined. A norm for which AB hAil IIBII for called a matrix norm. Not all norms are matrix norms. Example 10.2.1 the
If A =
of C'"" applied to
then define A II = max 1a11I. This is just
Now let
A = B = 1, but AB = 2. Thus this norm is not a matrix norm. There is a standard way to develop a matrix norm from a vector norm. is a norm on C' for all r. Define hA IL5 by Suppose that and
A 1105 = sup (II Au iii: ueC", u hIS = I). It is possible to generalize this
definition by using a different norm in C'" to 'measure' Au than the one used in C" to 'measure' u. However, in working problems it is usually We will not need easier to use a fixed vector norm such as or the more general definition. There is another formulation of II Proposition 10.2.1 Suppose that A e C'" Xfl and is a norm on C'" and Thus IA IL5 = I Au KIIuII5 for every ueC"}. The proof of Proposition 1 is left to the exercises. If A is thought of as a linear transformation from C" to C'", where C" and C'" are equipped with the norm then A L is the norm that is usually used. A L is also referred to as the operator norm of A relative to hence the subscript os. Conversely, if is a matrix norm, then by identifying C' and C' 'it is a matrix norm we have induces a norm on C', say IL. Since
IIAuIL hAil hulL.
212 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
If(1) occurs for a pair of norms and on C" and Ctm. we on By Proposition 1 we say that consistent with the vector norm know that is consistent with IL if and only if II A A We pause briefly to recall the definition of a limit. ii
Definition 10.2.2
Suppose that is a sequence of m x n matrices and A or lim Then = is a norm on C'" converges to A, (written XsI
ii
A)
j
every real number c> 0, there exists a real number i such A fi then ii Notice that the definition of convergence seems to depend on the norm
used. However, in finite dimensional spaces there is no difficulty.
Theorem 10.2.1
Suppose that dime,jsional vector space V. Then if
and and
f, are two norms on afinite Off, are equivalent.
That
is,
there exist constants, k, I such that k u u for all U E V. u Theorem 1 is a standard result in most introductory courses in linear
algebra. A proof may be found, for example, in [49]. Theorem 1 tells us that if A1 A with regard to one norm, then A1 A with regard to any norm. It is worth noting that A1 —' A if and only if the entries of converge to the corresponding entries of A. To further develop this circle of ideas, and for future reference, let us see what form Theorem 1 takes for the norms we have been looking at. Recall that an inequality is called sharp if it cannot be improved by multiplying one side by a scalar. Theorem 10.2.2 Suppose that ueC" and that p. q are two real numbers greater or equal to one. Then (1)
(ii)
iii
huh0 u
n—
(iv) n
— hi
ii
u
II u II q
if p
u
q
n
—
1/q
u
for all p, q
1.
Furthermore (i) and (ii) are sharp.
Proof (iv) follows from (i) and (ii) while (ii) is merely a rewriting of (i). To prove (i) assume uEC" and note that
/.,
In If u = e1, then ones,
then hull
u
In
\1/p
i=J
\lIp
(Z
iiuhh0 = max 1u11
)P)
(max j
(max I u.f)(
/)
=
\Ifp
li')
=
u JI
i=1
=
u
= 1 while ff
=
1
so u
=
u
Thus
is sharp. Of u consists of n is sharp.
CONTINUITY OF THE GENERALIZED INVERSE
213
observe that from (i)and (ii) we have " lUll4. The statement that II,, U U if 1 p
u Ii,,
ii
u
Ii
shows that Jensen's inequality is sharp. U We now turn to the problem of determining the matrix norms We begin with IlIL. Suppose that A = a
Au
= sup
= suP{
sup {
j=1
J
*
K
u
}
{
l
and
u, }
iai,i}
±
(2)
Thus
(3)
= and equality To get equality in (3) we need a vector u such that u occurs in (2). To get equality in the second inequality of (2) we can take Iu,l = 1, 1 j n. The first inequality of(2) will be an equality if 1
a.,uj =
u,l for that row of A for which
1a1,i is
maximum. Denote
1
this row by k. Then define u
if
as
follows,
0
u IL. This and (3) gives us the
u
following.
i=
Proposition 10.2.2
1
and A
=
then
Sup
{
a common formula. However, in the special case p = or p = 00, fl is reasonably calculable. Since p = 1,2,00 are the most common of the p-norms used this is not as bad as it sounds. We leave the derivation of the 1101 norm to the exercises. 1
Proposition 10.2.3 If A eCTM
then A L = max
{
214 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
The Euclidean norm 11112 is of much interest. Suppose that AeCTM
Xn
and
then Au
= (Au, Au) = (A*Au, u).
(4)
Since A*A is seif-adjoint, every vector UEC" can be written as u =
v, where v, satisfy (AA)v1 = for a real number 2,. Let be the largest eigenvalue of A*A with associated eigenvector u,. Then from (4), Thus we have and Proposition 10.2.4.
Proposition 10.2.4
then if A 1102 = 21/2 where 2 is the largest
eigenvalue of ABA. That is, if All02 is the largest singular value of A. In addition to and i,,, there are other matrix norms which can be used either to estimate and or in their own right. Two
common ones are given in the exercises. 3. Matrix norms and invertibility
In order to understand the convergence properties of the generalized inverse we shall briefly review the situation for invertible matrices. Throughout this section will denote any matrix norm on xl, such that lI'lI =1.Theassumptionthat 11111 =ldoesnotruleoutanyofthe matrix norms discussed in Section 2 and simplifies our formulas. XII
Proposition 10.3.1
Suppose that A e CNXII which is consistent with a vector norm
is a matrix norm on and IL. Then for every eigenvalue
AofA,I2l hAil. The proof of Proposition 1 is straightforward and is left to the exercises. Note that the vector norm of does not appear explicitly in Proposition 1. We will prove statement (A) of the introduction and develop some norm estimates for A — B The next proposition
is basic.
Proposition 10.3.2 11(1 — A)1
Proof Let B =
If hA if <1, then (I — A) is invertible and
1 —liAii
A if
a0
<1.
LetS= 11=0
Thus B = (I
—
A)-1. The estimate for if B follows from the
representation for B, 1IB1I
=
E
—
IA
•
CONTINUITY OF THE GENERALIZED INVERSE
215
"and lii — A < 1 then A is invertible and
Corollary 10.3.1
H
1-ll-All If — A < 1, then A = I — (I — A) and the result follows from Proposition 2. • Proof
H
The next two results are also basic to this section. We begin by
establishing that if A is invertible and B is close to A, then B is invertible. Alternatively we can show that if B is small then A + B is invertible. We choose the latter. Now (A + B) = A(I + A - 'B) where we are assuming that A is invertible. By Proposition 2, (I + A - 'B) will be invertible if A A 'B <1. A sufficient condition for this is clearly II B < A -'liii B
II
A'—(A+B)' =[A1(A+B)—IJ(A+By' =[I+A'B—I](A+BY' =A'B(A+ B)-' =A'B[A(I +A'B)]'
=A'B[I+A1B]'A'.
Thus 1lA'
—(A + B)—'
IIB1I 11A' 112 II(' + A'B)'
II
l1A' 211B11 The second inequality follows from Proposition 2 B ll• — A
while the third follows from 11A'BII the following result. Theorem 10.3.1 then (A
II
A
+ — 1
Suppose that AeC" B) is invertible. Furthermore — (A
+ B) 'II
1lA
X
11111 BlI <1.
We have proven
B II
is invertible.
<1/Il A'
1A' 2 Bf —
A
—
(2)
11111 B
Corollary 10.3.2
!fAeCTM is invertible and { A1 —' A, then for large enough j, exists and
c Ctm
such that
A
A and apply Theorem 1. U
Example 10.3.1
Then A
= 1/2 <
1.
But I — A is not invertible. Thus Proposition 2
was a matrix norm. depends on the fact The estimates given in Proposition 2, Corollary 1, and Theorem 1 are all sharp as can be seen from the scalar case.
216 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Continuity of the Moore—Penrose generalized inverse
4.
We now wish to see to what extent the results of Section 3 can be extended
to the generalized inverse. The first thing to notice is that statement (A) of Section 1 is not true for the generalized inverse. Example 10.4.1 while A'
?i.]andA does not converge to anything,
Thus A,— A but
=
much less to At. The problem that immediately presents itself is to determine necessary —, At. Surprisingly it A implies and sufficient conditions so that has a nice answer. In order to try and get some feeling for what is happening let us return A to Example 1. Rather than talking of A+ where —, 0. ía, b.1 where for simplicity we assume that aj, are Let E,
=
0 so that
all real. We will also assume that E, —,
0.
0,
—'0,
At where A
We wish to investigate when does (A +
0,
and
=
Since a, —, 0 we might as well assume that I aj I < 1 for all j.
Then
b, '
L
d,
c,
rank at least one. That is, rank (A + E,) rank (A). We shall find out later that this is typical. There are two cases to consider for a particularj. has
Case /
A + E, is invertible. In this case E
(A +
A+ such that
Caso II
A + E1=
=
1
[
—
a,+1j1.
is singular. In this case rank (A + E1) =
[ai+ 1
1)
b,
1
(1)
so there exists
'][i
Thus (A + E )t — —
1
(1 +
Faj + 1
(1 + a1)2 + c)L
b1
(2)
CONTINUITY OF THE GENERALIZED INVERSE
217
Now suppose that A + E, has rank I for j greater than some J0. Then we are in Case H Ion j0. Now = b,/(a, + 1) so that Since we a,—'O, b1—.O, c,—'O, get from (2) that (A + and as desired. Suppose however that A + does not have rank 1 for all j greater than somej0. Then there exists a subsequence Em such that rank (A + Em) = 2 and E,, —, 0 for all integers m in the subsequence. Thus a,, —.0, d,, —.0. b,, —, 0, and cm —, 0. But the (2,2) entry of(A + Em)' is 3 (a,,, + 1)dm —
which does not converge to anything much less zero.
We have then that for our particular example, the following. At if and only if there is aj0 such (B) Given we have (A + rank (A) forj j0. that rank (A + such Statement (B) turns out to be valid for any c that E, —, 0. The proof will proceed much as in the special case. After some preliminary results we shall consider different cases involving rank (A) and rank (A + Es). To begin, recall the following fact.
Fact 10.4.1 If ', then Ax
is invertible.
x
is
a matrix norm on
and
— 1•
A 10 For all xeC" we will also need the following.
Fact 10.4.2
If
is ve
any norm on a vector space V. then u +
V
The generalized inverse version of Fact 1 takes the following form.
Proposition 10.4.1 Suppose that and is a matrix norm on q, At x / for XE R(A*). p 1, q 1. Then Ax 0
Then x = AtAu At Ax so that Proof Suppose Ax flx fl/Il At as desired. • In the discussion following Example 1 we noted that in that example rank (A + rank (A). This is typical as the next proposition shows. fi
Proposition 10.4.2 Suppose that A€CM Xfl and is a matrix norm on Kg p 1, q 1. IfEe CTM and fi E < 1/fl At fi, then rank (A + E) rank (A).
Proof Supposethatrank(A)=rand hEll < 1/IIA'lI. Let {ut,...,ur}be a basis for R(A*). It suffices to show that ((A + E)u1, ... ,(A + is a linearly independent subset of R(A + E). Suppose that 0=
iz
+ E)u,; 1
218 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Then if x =
0 we get that i—I
0= E x1(A + E)e1 = Ax Ax
= Ax + Ex
(A + E) i—i
Ax
—
Ex (by Fact 2)
—
x / At x / H At
0
E liii x
—
x/
—
At
(by Proposition 1) =0.
Thus rank (A + E) rank (A). • Proposition 2 has several useful corollaries.
If A,
Corollary 10.4.1
and 1
IIABII <max{llAtli,
(1)
OBtll}'
then rank (A) = rank (B).
Proof Let E = A — B and notice that (1) implies that hA — Bli <1/il At ii and IIA—Bli
<1/hIBthl.Thusrank(A)=rank(B+E)rank(B)and
rank (B) = rank (A — E) rank (A) so that rank (A) = rank (B). As a special case of Corollary 1 we have the next corollary.
Corollary 10.4.2 If P. Q are orthogonal projectors in CA lip — QIL2 < 1, then rank(P) = rank(Q). The proof of half of statement (B) is now immediate.
Lemma 10.4.1
c
Suppose that
and(A + E)t_.At where
Then there exists aj0 such that rank(A
+
= rank(A)
JJo Pmof Suppose that
—0 and (A +
At. But (A +
-. A. Thus
... AAt = But the limit can be taken with respect to any norm by Theorem 2.1. Thus by Definition 2.2 there exists aj0 such that ifj j0, then = rank (A) forj J0 now follows from < 1. That rank (A + 0 o2 = (A +
+
—
Corollary 2. •
The rest of this section will be divided into two parts. The first will be
devoted to a proof of statement (B). The proof will be somewhat qualitative in nature. The second part will consist of a quantitative discussion of the same ideas. such that E1—i0. and {EJ) c Theorem 10.4.1 Suppose that _'At and only if rank(A + = rank(A)forj greater than Then (A + somefixedj0.
CONTINUITY OF THE GENERALIZED INVERSE
219
Proof Lemma 1 takes care of the only if part. Suppose that 0 and rank(A + = rank A = r forj j0. Now there exists unitary matrices UeCa' "', "and invertible matrix BeC' such that C = UAV
=
[
(2)
]. F12(j) F22
"
L F 21V/
(3)
Notice that rank (C + F,) = rank (A + and rank (C) = rank (A). Further= V*AtU* and (C + F)t = [U(A + = V*(A + E,)tU*, more since —, if and only if (A + E1)t _. At. For notational convenience (C +
we will omit thej in (3). We wish to get a formula for (C + F)t. Let liii = 111102• Since F—O we may assume fl F sup (II F,, fi, F,, H F2, fi, fi F22 II). Thus rank (B + F,,) = rank (B) by Proposition 2 and the fact that B is of full rank. Thus fi
IB+F,1 F,21
rankl
L
r21
I = rank (B).
22J
By Lemma 3.3.1 C + F can be written as
IB+F,1 L
F2,
I F,2 F21(B + F,,) 'F,2] — [F21(B + F1 x
Thus
+
(4)
[I + (B* ÷ by Theorem 3.3.4. Now F,1, F1 F12,F21,F22—'O since F—0. By Corollary 2,(B+ F,,)' —'B'. But 0. Similarly (B* + then (B + F, ,) + F,
x
1
(B+F,,)'-'O.ThusX,-'I, X2-IasF-'O.So and
[I (B + F, ,Y 'F,2]
[10] as F— 0.
Thus
as F —, 0 and the proof is We assumed that = and IIAII= IICII.
complete. • so
that we could assert that fi
fi
= fi
fi
220 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Theorem 1 can be proved without using Theorem 3.3.4 in several ways.
Some are modifications of our argument here. Others, such as the original one due to Penrose, are completely different. In example 1 we had rank (A) and AJ =j = A — A111. This behaviour is typical as the next result shows. Theorem 10.4.2 Then (A +
and that rank(A + E)> rank (A). E for any operator norm
Suppose that A, EEC'"
E)t
1
/
Proof Suppose that A, EEC'" x" and that rank (A + E)> rank A. Then dim N(A)> dimN(A + E). Hence there is a vector u€N(A) of norm 1 such The proof of this last fact is left to the exercises. Now that ueN(A + UER(A* + E). Hence (A + E)t(A + E)u = u or (A + E)tEu = u. But then 1 II(A +E)tEII +E)'II hEll and II(A +E)tll
shows that the inequality of Theorem 2 is sharp. An obvious consequence of Theorem 2 is that if —' 0 but rank(A + E1)> rank (A), then not only does (A + + At but (A + E,)t is not even bounded in norm. Theorem I is theoretically satisfying in the sense that it completely -. At. However, in some situations it is characterizes when (A + important to be able to estimate the difference (error) between (A + Es)' — A' and A'. The rest of this section is devoted to estimating (A + In proving Theorem 1 we used unitary matrices. Unitary matrices are especially good when working with the Euclidean operator norm 111102 for ifU* = and then UA 1102 = hA We will use the Euclidean operator norm since it allows us to use the simplified block terms of Theorem 1. We shall also use the notation of the proof of Theorem 1. Let 1
0_F F* L
Also set !P,
(B*4F* ii ''X
2'
12'
and Y'2 = [1
=
rr11,
F2,
0]. Then (C + F)t = 8 B 19 while
C' =
B' !P2. Taking the norm of both sides gives (C + F)'
—
C' x
Certain factors in (5) are
+
B—
— "i
II
(5)
liii
obvious. B-
= Ct = At and
In order to estimate the rest we will use the following lemma.
Lemma 10.4.2
IfAeC'" XII BeC'
and
denotes the Euclidean
=
1.
CONTINUITY OF THE GENERALIZED INVERSE
A
operator, norm, then
Proof of Lemma
Suppose ueC,
Ilu
=
=
+
112
221
B 112)12.
= I. Then
+
hAil2 + 11B112.
+ 18112 as desired. Notice that since
Thus
D
= D for the Euclidean operator norm we also have l[A,B]112 11A112+ 118112. S We now begin to estimate the various terms in 9, and
To simplify
matters we assume that 11B' II IF11
(B + F1
II
<1, hI(B+F11)' II
11111 F21
II
< 1,and
< 1.
(6)
These assumptions will be discussed more fully later. Now II
F12
=
1111(1 11(1
11F12h1
+B
1F1
•+. B'F11)'
II
II
(1)
118'F1, iu18h11
by Proposition 3.2 and assumptions (6). Let 11)_I. Then (7) becomes
+
1B
=
118-111(1 —
B' x
F12
(8)
F21 II.
(9)
Similarly,
+
) — lF*
II
II
that 11(1 + DD*)u Thus II(I+DD*)l11 1. Hence Observe that for any DeCtm
ii for all ueCTM.
1.
1,and
(10)
By Lemma 2 we can now conclude from (8), (9),(10) that (11)
Now
xl I + Ft1) lxi
1—
1
(12)
and —
= [X2
—
I X2(B* +
By Theorem 3.1 we calculate that
(13)
_ 222 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
—
X1
+
II(B + F11) II
— 11(8
+
+ F1
'II
II(B+F11)—'F12112
—
II
1— lI(B+F11)—1F1211
Or a2 F12 112
lII_X1lIla2IIF
(14)
112
by (8). Similarly, —
X2
a211 F21
i—
112
F21
(15)
Ill. ii
Combining (8), (10), (12) and (14) with Lemma 2 gives
F 1211112 112
2
) +IIF
'2N2
(16)
1211
1211
II
In the same manner we get that 2 2
II
2111
112
II II
2111
(17) into (5) gives us that
(C + F)t — C'
F21 a
(1 + a2 F12 112)1/2 II a211F21fl2 22+1) (1 —a2fIF21II )
1/2
1/2
a2IIF12II2
+aIIF12II( (1 —
F21 112)2
+ 1)
(18)
It follows from (2) and (3) that if we define E11 = PR(A)EPR(A.),
=
E12 = PR(A)EPN(A), PN(AS)EPR(A.), E22 = PN(A.)EPN(A),
then
(19)
I
Theorem 10.4.3 Suppose that A, and rank(A + E) = rank (A). E <1/2 where Suppose furl her that At = II 11o2 Define as in (19). Let a = At 11/(1 — hA' E11 II). Then
II(A+ E)t
_At ahIE
II hAt 11(1 +a2h1E12hh2)"2 a2hIE21hI2
1/2
) x( (1—a2hIE21hI 22+1)
+ahIE12hh At ((1
a2hIE12hI2
E21
112)2
1/2
+ 1)
.
(20)
CONTINUITY OF THE GENERALIZED INVERSE
223
Proof We need only show that II
E <1/2 implies conditions (6) and that E for I i, are satisfied. Notice that = j2. Thus 11B' lIE,, At hEll <1/2< 1 so the Iirst condition is satisfied. To see that the second and third conditions are satisfied we calculate that II
II
(B+F 11
IIBII
<
F
I
—
1/2
F
1— 1/2
11B'II 11F1111
—1
as desired. U
There are several nice features to (20). The first is that H At factors out of the right hand side. Thus the 'percentage error' 10011 (A + E)t —
At Il/Il At Ills easy to estimate. The second is that it is probably undesirable However, if is replaced by in many cases to actually compute then an estimate can be obtained. Since any number K, E., K < E 1102 1102' one could use some of the more easy to compute norms In in Section 2 to estimate E 1102 and hence to estimate E, particular, we get II
I!(A+E)t—Athl 2IIAtII IIEII(1 +(1
I
\112
(21)
for any 11E1102 lIED <
0] rio Ii 0 and B = 0
rio Example 10.4.2
Let A =
I
0
0
LO
0J
= 1h11O2
LetE= 10
1
1
0
0 0
0
We wish
0]
hEll2 =
Thus
< 1/2 and Theorem 3 can be used. Now
0.1(1 — 0.03364.
0
Lo 0
so that B= A + E. Observe that IlAtil = 0.1. Also DElI
E
11
[0
ro
At 1111
i
0
1
Note that
rl/(lo+e)
Bt=I
[C
— At 0.11647. Substituting into (21) gives Thus one conclusion of our estimate is that Il C
i/Ii+e £
C
where e denotes a term <0.03364 in absolute value. The exact value for Bt is
11/10—1/1010
Bt=l
L
0 1/101
10/1111 1/11
0 0
—1/1111
0
—
For some purposes (21) is sufficient. However, in a particular problem
224 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
one might want a better estimate such as (20). Other estimates exist. We
will give one due to Stewart. Its proof is left to the (starred) exercises.
Theorem 10.4.4 Suppose A, E E Ctm Define the as in (19). Suppose further that < and that rank(A + E) = rank(A). Let liii E1 1
K=
I
At
+ 11jj2)
1
/
K11E1111\1 = yic and Il/Il A A — The number K = A At Ills called the condition number of A. K measures
where y
=
1
the amount of distortion of the unit ball of caused by the linear transformation induced by A. Theorem 3.1 may be written using K. Since = AAt we have K 1 if A 0. Let 2, and denote the largest and smallest non-zero singular values of A. (See page 6 for the definition of singular values.) It follows from the singular value decomposition, Theorem 0.2.2, that hA = A, and hAt = 1/2g. Thus K = 2,/As. It is sometimes useful to have an estimate for At — Bt even if rank (A) rank (B). The following Proposition is helpful.
Proposition 10.4.3 Suppose that A, BeCTM At
—
Bt = Bt(B — A)At + (I — BtB)(A*
Then —
B*)At*At + BtBt*(A* — B*)
x(I_AAt). The proof of Proposition 3 is straightforward and is left to the exercises. From Proposition 3 we get quickly that Theorem 10.4.5 Suppose that A,BECrn*A. If II II At — Bt 3 max { At 112,11 Bt 112)11 A — Bil.
= 111102' then
II
Proof From Proposition 3 we have that A'
—
B'
A
—
At + A'
B 11(11 Bt
2 II
+
IIA—BII 3max{IIA'fl2,IIB'112). U The 3 in Theorem 5 can be replaced by (1 +
5.
See [89].
Matrix valued functions
Theorem 10.4.1 has another formulation which is of interest. Let A(t) be an m x n matrix valued function for t in some interval [a, b]. That is,
1a11(t)
A(t)=I
. . .
:
:
...
atb,
CONTINUITY OF THE GENERALIZED INVERSE
225
where a.,(t) is a complex valued function and is defined on [a. b]. If is continuous for I I m, n, then A is called continuous. 1
This is equivalent to saying urn A(t) — A(t0) = 0, for a
b. If A(t)
10
invertible for tE[a, bj, then [A(t)} defines a matrix function for IE[a,bJ. We denote this function by A '(t). It is immediate from Theorem 3.1 that: is
Proposition 10.5.1
If A(t)
is a continuous n x n matrix valued function on [a,bJ such that A(r) is invertible for a t b, then A '(1) = [A(t)] is
a continuous matrix valued function on [a. b]. Proposition I may also be proved by Cramer's rule.
If A(t) is an m x n matrix valued function we define the n x m matrix valued function At() by At(t) = [A(t)]'. Theorem 4.1 gives us the following extension of Proposition 1.
Theorem 10.5.1 Suppose that A() is a continuous m x n matrix valued function defined on [a, bJ. Then At(t) is continuous on [a, b] and only if rank(A(t)) is constant on [a,b]. Define the rank functionr(t) by r(t) = rank(A(t)). The discontinuities of At(t) occur when A(t) changes rank. That is, at the discontinuities of r(t). We wish to establish 'how many' discontinuities At(t) may have if A(t) is continuous. The discussion requires a certain familiarity with the concepts of open and closed sets such as is found in a standard first course in real analysis.
Example 10.5.1
Letf(t)= t sin(ir/t) 110< t 1 andf(0)= 0. Let
?1.Thenr(t)=2ift Let $4' = {t0IA? is
not continuous at tj. Then 5" =
I.
fl an
integer}U {0).
Notice that the set 5" of Example 1 has an infinite number of points in it. However, it is closed (contains all its limit points) and has no interior (contains no open sets). This behaviour is typical. Theorem 10.5.2
Suppose that A() is a continuous m x n matrix valued function defined on [a, b]. Let $4' = { ç€ [a, b] I At(t) is not continuous at t, }. Then $4' is closed and has no interior. Thus there exists a collection of open intervals {(a1,b1)} such that At(.) is continuous on each (a1,b1) and the closure of (J (a1, b.) is all of [a, b].
Proof Suppose that A(•) is a continuous m x n matrix valued function defined on [a,b]. Let 5" = {t0Ir(e) is not continuous at Since r(t) is not continuous only when it is not constant this $4' is the same as the of Theorem 2. If the determinant of a fixed submatrix of A(t) is taken we
226 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
a continuous function of t. Let be the sum of the absolute values of the determinants of all I x I submatrices of A(t). (For convenience 1 I n, are continuous functions on [a, b] and suppose n m.) Then = 0 for j I. Notice that r(t) = sup { iI 4'.(t) # 0). if = 0, then and Let is closed since c = {t14)1(t)= 0). Notice that is continuous. Let denote the boundary of 6".. For a closed set 9's, which are not interior points (loosely speaking the is those points in edge of 6°.). For the 6" of Example 1 we have = 9'. Now is a get
closed set with no interior. Hence
is a closed set with no interior. iz I
We shall show that 6"
=
(J ô6°1.
Suppose that r(t) is not continuous at t0, that is, çe 6°. Let r = r(t0). By the continuity of A(t) and Proposition 4.2 we have r(t) r(t0) for t near ti,. Since r(t) is not continuous at t0 there exists a sequence t,—' t0 r(t0) + 1. But such that 1(t0) = 0 and 1(t) 0. Thus to€t36"r+i. Hence 5F
J
To show equality suppose that 10E (J i=
such that t0€35",. r(t0) = r and
Let r be the smallest integer
1
Let be a sequence such that ti—' t0 but Then r + 1. Thus r(t) is not continuous at t0 and t0€b°.
a It should be noted that set of discontinuities of At(t) can still be very complicated for there exist closed sets with no interior which are uncountable. Example 10.5.2 Let 6° be any closed subset of [0,1] which has no interior. Definef(t) = inf{It — sI :se9'}. Thenf is continuous and .9' = {te[0, 1] :f(t) = 0). Let A(t) = [f(t)] so that At(t) = [(f(t))t]. Then the set of discontinuities of At(t) is 9'. It is also useful to be able to differentiate matrix valued functions. If A(t) is an m x n matrix valued function on [a, b], we define dA(t,) by dA(t0) = urn [A(t) — A(t0)]/(t —
t0)
provided the limit exists (any matrix norm may be used). This is equivalent
to saying that dA(t0) =
where
=
dA(t0) is called the
derivative of A at t0. If At(t) is to be differentiable at it must be continuous at t0, hence of constant rank in some open interval containing Provided that this
CONTINUITY OF THE GENERALIZED INVERSE
227
happens the differentiability of A(t) at t0 implies the differentiability of
At(t) at t0. Theorem 10.5.3
Suppose that A(t) is an m x n matrix valued function defined on [a, b]. If rank(A(t)) is constant, then is differentiable on [a, bJ and
dAt =
Proof
—
)A A +
At(dA)At +
L.
Suppose that A(t) is differentiable on [a, b] and that rank (A(t)) is
constant. Then urn [A(t) — A(t0)]/(t
—
= At(ç). Since
ç) = dA(ç), urn t—1o
we are differentiating with respect to a real variable we have [d(A*)](10) =
[(dA)(t0)]t, that is (dA)* = d(A*), where A*(t) = [A(t)]*. Thus the symbol dA* is well defined. By Proposition 4.3 we have (At(t)
At(10))/(t
At(t0) { [A(t) — A(t0)]/(t — tv,) )At(t0) + (I — At(tJA(t0)) { — — t0) }At(t)*At(t) + A(to)tA(tc,)?* { [A(t) — A*(t0)]/(t — } (1 — A(t)A(t)t). —
—
t0)
—
Taking the limit as t -. i, of both sides gives the formula for dAt and the
differentiability of At. • It is important to notice that in the proof of Theorem 3 we used the fact that t was a real variable. Theorem 3, as stated, is not valid for t a complex variable. Let us see why. For the remainder of this section suppose that z is a complex variable and A(z) is an analytic, m x n matrix valued function defined on a connected open set Q. That is, is analytic for ZEQ, i in, n. If At(z) were also analytic, then so would be A(z)At(z) and At(z)A(z). But 1
A(z)A'(z) 1102 and
1
A(z)At(z) 1102 are identically one on Q. Thus A(z)At(z)
and At(z)A(z) are identically constant. (A vector-valued version of the maximum modulus theorem is used to prove this last assertion.) Thus R(A(z)) and N(A(z)) are independent of z if both A(z) and At(z) are analytic. Suppose now that R(A(z)) and N(A(z)) are independent of z. Then there exist constant unitary matrices U, V such that
]u* where A1(z) is analytic and invertible. Of course, some of the zero submatrices in this decomposition of A(z) may not be present. But then, At(z) is analytic since 1(z) is.
Theorem 10.5.4
If A(z) is an analytic, m x n matrix valued function defined on a connected open set Q, then At(z) is analytic on Q and only if R(A(z)) and N(A(z)) are independent of z.
228 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
The assumption that R(A(z)) and N(A(z)) are independent of z may be
too restrictive for some applications. If one is willing to use an inverse other than the Moore—Penrose inverse, then the situation is somewhat more flexible. It is possible to pick a (1,2)-inverse A2(z) so that A(z) and A(z) are both meromorphic on the same domain. (Recall that a function is called meromorphic on a domain if it is analytic except at isolated points called poles where A(z) satisfies urn (z — z.rA(z) =0 for some in.) An example will help illustrate the general situation.
[zO Example 10.5.3
Iz'
-
A2(z)
=
10
0 0
Lo
— 1/2
11
Let A(z) = 10
z
—1
LO
z
—IJ
and
0 0 —
1/2
•
Notice that N(A(z)) is not independent
of z, but A(z) is analytic except at z=0 where A(z) has a rank change.
The projectors
Ii A;(z)A(z)=I 0
0 0]
Ii
0 01 and A(z)A(z)= 10
[0 —z IJ
LO
0
01
1/2 1/2
1/21
1/2]
are both analytic for all z, including zero, but A(z)A(z) is of non-constant norm. This behaviour is typical. Theorem 10.5.5
Let A(z) be a meromorphic m x n matrix valued function Then there is a n x m matrix valued defined on a neighbourhood of on a neighbourhood of z0 such that: function A2(z) defined
A(z) is a (1,2)-inverse for A(z)for each z # z0. (ii) A(z) is analytic on a deleted neighbourhood of 20. (iii) A(z)A(z) and A(z)A(z) are analytic on a neighbourhood of (i)
The proof of Theorem 5 may be found in [6:1. Theorem 5 is a local result in that it talks about behaviour near a point. It is possible to get global versions. Theorem 4 and 5 could be useful in a variety of settings. For example, equations of the form A(z)x =0 occur in the general eigenvalue problem, the study of vibrating mechanical systems, and some damping problems. If the ideas of Chapter 5 are applied to circuits with capacitors and inductors, then the impedance matrix is a meromorphic matrix valued function since it involves polynomials in w and 11w. In particular, Theorem 5 might be useful when using the characterization of the impedance matrix of a shorted n-port network given in Proposition 5.3.1.
CONTINUITY OF THE GENERALIZED INVERSE
6.
229
Non-linear least squares problems: an example
Theorem 5.2 can be useful in non-linear least squares problems. Some
non-linear problems are very similar to those discussed in Chapter 2, Sections 6 and 7. In Section 2.7 we discussed fitting a function of the form
... to a set of data. There the g.(x) were known functions and the /3, were
parameters to be estimated. A somewhat more general problem is to fit a function of the form
... +
+
y=
(I)
to a set of data. Here the are known functions while the /3., and are parameters. The g1 may be vector valued. Equations of = the form (I) appear widely. For example.
+
= $1
and (—)
+
y = fl1e1" +
are both in the form of (I). Functions of the form of (2) are common
throughout the sciences. Any process that can be described by a linear differential equation (with reasonably good coefficients) will have solutions of the form (1) where k will be the order of the equation. Sometimes it is clear from the problem what the g(x, i) look like. A particular example is the iron Mössbauer spectrum with two sites of different electric field gradient and one single line. Here (I) takes the form:
y=
+
+ /33t
—
P4[2 +
—
+2( Pr__________ +
+
+2+(
— 1)2
_)2
F__________ We are not going into the full theory of fitting (1) to a set of data since
it would lead us too far astray. We will however work out an example that illustrates the basic ideas and difficulties.
Example 10.6.1
Suppose that we wish to fit the equation (4)
230 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
to the data points (ti, y.) which are (0,0), (1, 1), and (2,3). Here we assume
is a real parameter. Thus we wish to minimize the error in
0f30+133 +e, I
= 3 =
+ +
+ e2 + e3.
This can be written as
+ e where
y
(5)
11 '1
101
=I I = I 11, Ly3J
[3]
=
1
e2
and e
e23]
= e2
Li We will minimize e with respect to the Euclidean norm. For any value of Substituting into (5) gives we minimize e by choosing fi = — = (I To minimize that e = y — =y— e we must minimize 11(1 — or equivalently, maximize = It is clear from the data, that =0 gives a poor fit. Assume then that # 1. Then II
=2—
—
+
A direct calculation gives 19e42
16—8ev—
=
2—
—
+
Differentiate this and set equal to zero to locate potential maxima. The =0. This is a — result is 8 — + + + 24e32 — Its roots are — 0.8, 1, and 2. The root — 0.8 sixth degree polynomial in is out since 0, and I is ruled out by assumption. We shall discuss the one root later. That leaves e2 = 2 or = In 2. Then = [— 1, 1) so that y = —1 + or y = — 1 + 2' is the curve of form (4) which bests fits our data. Note that, in fact, we have an exact fit. In Example I we discarded two roots — 0.8 and 1. Clearly the — 0.8 was extraneous. Where did the 1 come from? Consider the problem of finding extrema off/g wheref, g are two differentiable functions. Proceeding formally we getf'/g —fg'/g2 =0 or qf' —fg' =0. If we now notice that g(x) = x2 we would conclude that x =0 was a potential extremum. But f/g is possibly not even defined for x =0. This is exactly what happened in Example 1. The root e2 = 1, corresponded to when was zero. — was not of full rank and the term 2— + There need not always exist a best fit.
CONTINUITY OF THE GENERALIZED INVERSE
Example 10.6.2 equation
231
If the process of Example 1 is used to try to fit the (7)
to the data (0.0). (1, 1)and (2,2) then, as in Example 1, we get a polynomial but the only positive root of this one is 1. It is not too difficult to find values of 0,fl0, and fl1 which give a better fit than =0 ever does. What has happened here is that there is no best fit. By picking close enough to zero and correctly choosing the one can fit (7) with as small an error as desired. But an exact fit is impossible since the data is colinear and (7) is strictly concave up or constant. Aside from these problems this least squares technique is, in general, is of full rank the formulas considerably more complicated. Even for can be very complicated. Secondly, if there are several a's, then will be a function of several variables which makes maximization more difficult. Finally, even when the maximization of reduces to finding the roots of a polynomial as our example did, the polynomial may be of large degree and require numerical methods to find its roots. The reader interested in the numerical methods necessary for solving such problems is referred to the paper of Golub and Pereyra [37]. Note that in working our example we used the differentiability of Where Theorem 5.2 comes in explicitly is in the and not theoretical development of the general technique. 7.
Other inverses
It should be pointed out that it does not make sense to try and duplicate the results on the continuity of for (i,j, k)-inverses. The reason is obvious, for A (t) (or any other (i,j, k)-inverse) is not a well defined function. Thus if A, A and rank (A) = rank (A) one could still have A fail to even converge. Example 10.7.1
ç=
Let A
Ii
01
= Lo
0]'
A,
Il+l/j 01 and
=L
0
0]
(—Wi 0
]
even converge. If A and Al is a uniquely defined matrix for each j, then it is possible to discuss the convergence of the sequence {Al). This would be necessary, for example, in discussing iterative algorithms or error bounds for particular methods of calculating inverses. See Chapter 12 for some such results. For uniquely defined inverses, such as the Drazin, it is possible to consider continuity. The conditions under which the Drazin inverse is
232 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
continuous are similar to but not identical to those under which the Moore — Penrose inverse is continuous.
o]
[1 0
[1
Let
Example 10.7.2
A
0
Then
—'
A,
—'
=
0 0 0 0 0 0 0
AD, but rank (A)> rank (A) and md (A)> md (A).
0 0
100 000 001 000
0 0 0 0 0
0
0000
while AD =0. Thus
—, A,
rank (A) = rank (A),
= Ind(A), but A
Notice in Example 2 that core-rank be the key.
the rank of A" where k = md (A). = core-rank A. This turns out to
-. A. Then and that -. AD and only jf there exists anj0 such that core-rank (A) = core-rank (A)
Theorem 10.7.1
Suppose that
A E C"
X
forjj0.
Before proving Theorem I we need two preparatory results. The first is a generalization of Corollary 4.2.
PeCm xm are projectors (not Suppose that —' P. Then there is a J0 such necessarily orthogonal). Suppose furt her that that rank (Ps) = rank (P) for J0•
Proposition 10.7.1
P we Proof Suppose that P, P where = P, and P2 = P. Since have rank (Ps) rank (P) for large enough J• that there does not P Let = P exist a J0 such that rank (Ps) = rank (P) for j L. Then there exists a subsequence PA such that rank (Ps)> rank (P). That is, dim dim R(P). But N(P) is complementary to R(P). Thus for each J,, there is a #0. But then eN(P), and vector u, such that u, )u. =Puj +E.u. =Ej U. . Let 11.11 bean U. =P, lkfor ThenkII E j,'. But F!, if —p0 and we have a
contradiction. Thus the required J0 does exist. I We next prove a special case of Theorem 1.
Proposition 10.7.2 Suppose that
A€C"'
and
—'
A. Suppose
CONTINUITY OF THE GENERALIZED INVERSE
further that md (A,) = md (A) and core-rank (A,) = core-rank (A) for Then A? —, AD.
greater than some fixed
Proof
233
j
Suppose that A1 —. A, md (A1) = md (A) = k and core-rank (A1) =
= core-rank (A). From Chapter 7 we know while AD = l)tAk. But rank(AJk4 1) = core-rank (A1) = core-rank (A) = 1)t by Theorem 4.1 Proposition 2 rank(A2k4 1)• Thus (A,2k l)t —, now follows. U We are now ready to prove Theorem 1.
Proof of Theorem 1 Suppose that A ., A are m x
m matrices and A1— A.
We will first prove the only if part of fheorem 1. Suppose that A?-. AD. But AJA? is a projector onto Then AJA? —, and AAD is a Thus rank(A.,A?) = core-rank (A1) and projector onto rank (AAD) = core-rank (A). That core-rank (A,) = core-rank (A) for large j now follows from Proposition 1 and the fact that AJA? —, AAD. To prove the if part of Theorem 1 assume that core-rank (A,) = core-rank (A). Let A1 = C + N, and A = C + N be the core-nilpotent decompositions A' for all integers 1. Pick in Chapter 7. Now of A1 and A
= and A' = C'. Hence C'— C'. Now I sup {Ind(A1), Ind(A)}. Then = rank(C'). core-rank (A1)= core-rank (A) = = Ind(C) are either both zero or both one. This implies that -. by Proposition 2. We may assume the indices are one Thus = (CD)1. else A is invertible and we are done. But = (C?)' and = converges to (C" = Hence A? = C?
•
We would like to conclude this section by examining the continuity of
the index. In working with the rank it was helpful to observe that if A1—. A, then rank (A1) rank (A) for large enough j. This is not true for the index.
Example 10.7.4
Let
=0 while lnd(A) =2. Notice that A? + AD in Example 4. Proposition 10.7.3 Suppose that then there exists aj0 such that Ind(A)
Ind(A)forj j.
A and
AD. Let J0 be such that md does not Proof Suppose A1—. A and take on any of its finite number of values a finite number of times for be the subsequence such that jj0. Let 1= inf{Ind(A) and md (A1) =1. Let N be the nilpotent parts of A1, A respectively. Then — —. A'(I — ADA) = N'. Hence md (N) l.U 0= (NJ =
234 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
8.
Exercises
Prove that 11 Ii,,, p 1 defines a norm on C". on C" and C'" that 2. Prove that for AeC'" • and norm 1.
sup { II Au
=I} K u 115 for every uEC}.
: UE C", II u
= inf{K: Au 115
3. Suppose that {Ak} is a sequence of rn x n matrices. Let cC'" ".Prove that Ak —, A if and only if A=
=
lim k-.
for 1 i rn, 1
4.
Show that if Ac C'" XII, then
5.
If Ae C'"
then
let A
n. A 1101
= max {
= n max ajj I = n II A
II
Since C'" XIS is
the 11112 noim. IA 112
isomorphic to C'"" we can give
/
i
\1/2
=
ii (a)
is a matrix norm in the following sense. If
Prove that A, BE C" XIS then
AB A
lB
(b) Prove that 11112 is a matrix norm. (This will probably require the following inequality:
/
\1/2/
i,
Ei_i
N
i—i
•
\1/2
1—1
which goes by the name of Cauchy's inequality.) Show that Suppose that A E C" (c)
!
(d)
—
(e)
A A A
A
A 1101 A fl
A A II
o2
and
A
(f) n"211A112 11A1102 hAil2
(g) n"211A112 (h) n"211A112 hAil01 (1)
(i) n"2iiAiI02 hAIL1 (k) n' hAil00 hAIL1 (1)
In parts (l)—(k) determine the correct inequalities if
AEC'"" rather than CNX. 6. Show that the inequalities (c)—(k) of Exercise 5 are all sharp. 7. Prove Proposition 10.3.1.
CONTINUITY OF THE GENERALIZED INVERSE 8.
Let I be the identity matrix on CflXft. Show that 11111
235
1 for any
matrix norm on 9. Let be any matrix norm 11111> 1 is permitted. 1111 A <1, prove that (I — A) is invertible and estimate 11(1
—
A)-'
10. Prove Fact 10.4.1. 11. Prove Fact 10.4.2. 12. Give an example to show that Proposition 10.4.2 is no longer true if E < 1 / H t is weakened to fi E 1 / 13. Suppose M, N are subspaces of a norm on Prove that if dim M > dim N, and K is a complementary subspace to N then there such that hull = 1. *14. Prove Theorem 10.4.4. (Hint: first prove for the case that AeCtm x1 has rank n.) 15. Prove Proposition 10.4.3.
9.
References and further reading
A more complete discussion of matrix norms may be found in [33] and
[37]. In [37] norms are discussed in terms of their unit balls { U: U 1). The relationship between various norms is studied in [33]. In particular, Exercise 5 is from [33]. Theorem 4.1 was given in Penrose's first paper on generalized inverses [67). His proof was based on the characteristic function of A*A and appears as an aside on the bottom of page 408. In a follow up paper [68] he discussed approximating At. The development given here is similar to that of Stewart [88]. A more restrictive treatment that applies some of the ideas to error estimation is [9]. An infinite dimensional treatment is given in [63]. Another treatment that is restricted to hermitian matrices is [76]. The paper by Robertson and Rosenberg [75] deals with matrix valued measures. In particular, they prove matrix versions of the Hahn—Jordan decomposition, the Radon—Nikodym theorem, and the Lebesgue decomposition. Their results use generalized inverses. As a lemma they establish that if A(t) is a measurable m x n matrix valued function (that is, a.fr) is measurable for I I m, 1 n) and if B(t) = [A(t)]t,
then B(t) is a measurable n x m matrix valued function. Proposition 4.3 and Theorem 5.3 are from [37). There is a nice bibliography at the end of [37] which includes a reference which explains formula (3).
11
Linear programming
1.
Introduction and basic theory
This chapter will discuss how the theory of linear programming relates to
the theory of generalized inverses. The chapter is not designed to teach the reader the full theory or applications of linear programming. We ignore, for example, the simplex method. We will begin by describing the basic linear programming problem. Then several basic theorems will be presented. Selected proofs will be given to give the reader an idea of some of the techniques involved. We conclude by showing how the generalized inverse can occur in working with linear programming problems. Hopefully by the end of this chapter, the reader will have a good idea of the part that the theory of the generalized inverse can play in the theory of linear programming. To begin with, we should probably point Out that the name 'linear programming' can be somewhat misleading. It is not concerned with computer programming as such, though computational algorithms play an important part. Rather the theory concerns maximizing and minimizing linear expressions with respect to linear constraints and linear inequalities. These problems arise, for example, in allocation of resources and transportation problems. The 'program' is thus more of a 'schedule' or 'allocation scheme'. In order to motivate the formulation of the general mathematical problem, let us consider the following simplified situation. A manufacturing company makes three products P1 , P2 and P3. Each product uses the inputs of electricity, labour, iron and copper. Suppose the amounts used are given by Table 4.1. The numbers are in terms of amount of input required (in some appropriate units) per unit output of product. For example, each unit of P3 uses 4 units of electricity. The problem is to maximize the profit where the profit per unit of product is I for P1. 2 for P2. and 1 for P3. We assume that all available product can be sold. However, there are certain constraints. We suppose that there are only 20 units of labour, 10 units of
LINEAR PROGRAMMING 237 Table 4.1 P1
P2
P3
Elect.
4
8
4
Labour Iron Copper
3
3
I
I
I
0
I
2
2
iron, and 5 units of copper available each week. The problem may be formalized mathematically as follows. Let x1, x2 , x3 denote the quantities of products P1,P2, P3 to be produced in a week.
Problem A 3x1
Maximize x1 + 2x2 + x3 subject to the constraints
+ 3x2 + x3
20,
x1 +x2 10, x1 + 2x2 + 2x3 5,
andx1 This can
be put
into a more standard form as follows. Let
x4 = 20— 3x1 — 3x2 — x3, x5 = 10— x1 — x2, and
x6=5—x1 —2x2—2x3. Then Problem A becomes Problem B.
Problem
B Maximize x1 + 2x2 + x3 subject to
+3x2+x3+x4=20, x1+x2+x3=10, x1+2x2+2x3+x6=5, 3x1
and
x6 represent unused available input and are called
slack variables. The general linear programming problem can be
formulated using Problem A as a model. One might be tempted to define it as maximization of a linear function subject to any combination of inequalities and equalities. However, by multiplying inequalities by the appropriate sign we may get all the equalities in the same direction. for all I, then Ifx =(xl,...,xM)ER" andy and ifx1 we write x y. This notation will simplify our calculations. Let
={xERN:xO}.
Since linear programming problems are usually done with real values we if x y is will work with them. However, most of the theory is valid for interpreted to mean Re(x — y) 0 and we maximize the real valued functional, Re(x,c). Here Re z is the vector whose ith entry is the real part of the ith entry of z.
238 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS beRtm, and Definition 11.1.1 Suppose that The general linear programming problem is to maximize (x, c) subject to the constraints Ax b and x 0. Note that by maximizing (x, — C) we can obtain the minimum of (x, c).
Definition 11.1.2 A linear programming problem is in standard form it is oftheform Ax = b, x 0, Note that the general linear programming problem can be put in standard form by using slack variables. Problem B was the standard form of Problem A. Problem B has the advantage that one is working with equality constraints. Problem A will in general have a smaller matrix A. Thus while Problem B might be easier to work with theoretically, Problem A might be simpler since it involves smaller sized matrices and vectors.
Going along with each linear programming problem is another one called its dual. Definition 11.1.2 (c, x) subject
The dual
of the linear programming problem: maximize
to Ax b, x 0 is, minimize (b, y) subject to
c, y 0.
The dual of Problem A would be Problem C.
Problem C
Minimize 2Oy1 + lOy2 + 5y3 subject to
3y1 +y2+y3I
y1 +2y3l
1
and This dual problem has an interpretation related to that of Problem A.
Problem A amounts to maximizing total net revenue, while Problem C consists of minimizing the total 'accounting value' of the inputs. The y,'s are the values to be assigned to the inputs. They are sometimes called 'shadow prices.' The equations (1) say that the values given to inputs must not be less than the contribution of the inputs to net revenue. We shall see shortly that Problem A and Problem C are equivalent in an appropriate sense.
We now turn to the mathematical treatment of linear programming problems. For either the general problem or its dual, a vector is called feasible if it satisfies the constraints. A feasible vector which maximizes the functional in the general problem (or minimizes the functional in the dual problem) will be called optimal for the general (or dual) problem. It should be noted that the dual of the general problem and the dual of the problem in standard form are equivalent. That is, they have the same feasible vectors, optimal vectors, and minimal value of the functional.
Proposition 11.1.1 equivalent duals.
The genera! problem and its standard form have
LINEAR PROGRAMMING 239
It will be more convenient for us to work with the standard form. The first problem is to determine when feasible solutions of Ax = b, x 0 exist, that is, when the constraints are consistent. The constraints Ax = b, x both bER(A) and A beN(A) +
Proposition 11.1.2 only
0, are
consistent !f and
Proof Solutions of Ax = b exist, of course, if and only if be R(A). If bER(A), then the set of all solutions to Ax = b is Ab + N(A) for any (1)-inverse A. Thus the set of constraints it consistent if and only if (Ab + N(A)) is non-empty. This happens if and only if
U From Proposition 2 and a lot more work we can get a characterization of consistent constraints due to Farkas. A proof using generalized inverse notation may be found in [11].
Theorem 11.1.1
(Farkas). Suppose that Ac R'" following are equivalent: (i) Ax = b, x
X
be Rm. Then the
0 is consistent,
(ii) A*y 0 implies that (b, y) 0.
A linear programming problem and its dual are closely related. We
summarize this relationship in the following fundamental theorem. The theorem is standard. A proof may be found in Simonnard [84], for example. Theorem 11.1.2
A linear programming problem and its dual either both have optimal solutions or neither does. If they both do, the maximum value of the original problem equals the minimum value of the dual. This common value is called the optimal value of the problem. This theorem has several consequences. First of all, it says that if either
the original or the dual has no feasible vector, then the other cannot achieve a maximum (or minimum) even if it has a feasible vector. Suppose that x, is a feasible vector for a linear programming problem while y is feasible for its dual. Then Ax0 b and A*y0 c. But then (b, y0)
c*x0 =
(c, x0).
(2)
According to Theorem 2, if y, x are optimal feasible vectors, then (b, y) = (c, x). This provides a way of testing two feasible vectors to see if they are both optimal solutions. Notice that (2) also says that if the original problem has a feasible vector x0, then (b,y) is bounded below. There would then seem to be hope for a minimum. Similarly, if y0 is a feasible vector for the dual, we have (c, x) is bounded above by (b, y0). In fact, the following is true.
240 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Theorem 11.1.3
Iff(x) = (c, x) is bounded above (below) on {x: Ax = b, x O}, thenf attains its maximum (minimum) on {x : Ax = b, x O}. It should be noted that Theorem 3 is not true if {x : Ax = b, x O} is
replaced by an arbitrary convex set. In light of our observation that feasible vectors of the general problem and its dual form bounds for (x, c) and (x, b), the next result follows from Theorem 3.
Theorem 11.1.4 A necessary and sufficient condition for one of the two problems (and hence both) to have optimal vectors is that they both have feasible vectors. Under certain conditions it is easy to show (x, C) is bounded above. As pointed out earlier, any vector satisfying the constraints is of the form x0 = Ab + (I — AA)x0. Thus sup(x0,c) = (Ab,c) + sup((I — AA)x0,c), where x0 ranges over all feasible vectors. One way to get that the supremum on the right exists is to have (I — AA)*c =0. For then, ((I — AA)x0,c) =0 and the supremum is (Ab, c). If (I — AA)*c =0, then c*(I — AA) =0. Thus we have:
Proposition 11.1.3 A sufficient condition that thefunctionf(x) = (c, x) be bounded above is that c*AA = Proposition 3 is a very special case. If c*AA = thenf(x) = (Ab,c) for all feasible x. Thus every feasible vector is optimal. Another, and more reasonable, way to guarantee thatf(x) = (c, x) attains a maximum is to have that (Ab + the set of feasible vectors, is a closed, bounded set. That it is closed is clear since A b + N(A) and are closed. (Closed being used here in the sense of having all of its limit points.) One way that (Ab + could be unbounded would be the existence of an h0E N(A), h0 0, such that h0 0. In this case if he N(A) were
such that Ab + h 0, we would have Ab + h + 2h0 0 for A 0. In fact, the existence of non-zero h0E N(A) and sufficient if feasible solutions exist.
turns out to be necessary
+ N(A)) Suppose that the set is nonempty. Then it is unbounded (f and only (1 there exists a non-zero h0EN(A)
Proposition 11.1.4 such that h0 0.
Proof We need only show the only if part. Suppose that {Ab + N(A) } n is unbounded. Let {Ab + h,,j be an unbounded sequence made up of is an unbounded sequence. Let Then vectors in (Ab + {kj is a bounded sequence in RN, it has a convergent Since km = tim! H h, subsequence which we will denote by k1. Let k0 denote the limit of this subsequence. That is k, -. k0. Since k1€N(A) and N(A) is closed we have k0eN(A). Note that (Ab + h1)/11h111 -0 + k0 = k0. Thus k0eR",. since
Ab +
and k0 is the required vector. U
LINEAR PROGRAMMING 241
2.
Pyle's reformulation
To illustrate one way that the generalized inverse can be used in linear programming problems we will discuss a method developed by Pyle to reformulate a linear programming problem into a non-negative fixed point problem. The ways in which the generalized inverse are used in this section are typical of many applications of the theory of generalized inverses to to linear programming. Another application is discussed in the final section. We will not give all the proofs but rather outline Pyle's argument. The proofs are assigned as exercises or may be found in his paper. Consider the following problem and its dual:
(P1) Maximize (x,c) where Ax = b,x 0. (Dl) Minimize (y.b) where A*y c,y 0. Here
The first step is to reformulate (P1) and (Dl) as problems in which every feasible solution of the new problems is an optimal solution of (P1) and (Dl). This requires being able to express optimality of vectors as an algebraic condition. This algebraic condition will then be added to both of our original problems to get one large problem. The derivation of the algebraic condition requires us to first reformulate (P1) and (Dl). If x satisfies the constraints of(Pl), we know that x = Atb + (I — AtA)X. Thus (x,c) = (Atbc) + ((I — AtA)XC) = (Atb,c) + (x,(I — AtA)C). (P1) is thus equivalent to
(P2) Maximize (x.(I — AtA)C) where Ax = b (or AtAx = A'b). x 0. This is obviously equivalent to
(P3) Minimize (x, — (I — AtA)C) where AtAX = Atb, x 0.
Now we shall rewrite (Dl). The dual of (P2) is (D2) Minimize (y,
where
(I — AtA)C.
To remove the inequality in (D2) observe that if y is a feasible solution of(D2). then AtAy (I — AtA)C. Thus z = — (I — AtA)C + AtAy 0. Thus z is a feasible solution of(I — AtA)Z = — (1 — AtA)C, z 0.
Conversely, if(I — AtA)Z = — (I — AtA)C, z 0. Then z = — (I — AtA)C + (AtA)y and AtAy 0. Furthermore (y, Atb) = (z, Atb) if z, y are so related. Thus there is a many to one correspondence between feasible solutions of(D2) and feasible solutions of
(D3) Minimize (z,Atb), where (I — AtA)Z =
— (I
—
AtA)C.
Z
0.
The next theorem provides the basis for the algebraic condition we are looking for. Theorem 11.2.1
Suppose that x, z are feasible vectors for (P3) and (D3). Then (x,z) = 0 (fond only tfx is optimal for (P3) and z is optimal for (D3).
242 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
It will simplify some of our later calculations, and reduce the size of the matrices involved, if we use partial isometries to replace the projectors appearing in the constraints of(P3) and (D3). Let q = dim N(A). E* will be onto R(A*) while F* will be an isometry from an isometry from
onto N(A). Then F=[ai,...,aq]* and E= is an orthonormal basis for N(A) and {aq+1, ... , ;} is an orthonormal basis for R(A*). The only restraint that we put on the choice ofbasis is that a1 =(I _AtA)c/II(I — AtA)cII, and =Atb/IIAtbII. We AtA)C rule out the possibility that (I — =0, for then all feasible solutions to the original problem are optimal by Proposition 1.3. If Atb =0, then for consistent constraints we have b =0 and (x, c) is either zero or unbounded on the set of feasible vectors. So we assume Atb #0. Since E*, are isometrics, we have that EE*E = E and FF*F = F. Note that AtA = E*E. Thus EAtA = E. Similarly (I — AtA) = F*F. Let C0 = (I — AtA)C. Problems (P3) and (D3) may now be written as (P4) Minimize (x, — c0), where Ex = EAtb,
x 0
and (D4) Minimize (y, Atb), where Fy = F( — c0), y 0.
Notice that x = Atb is a solution of Ex = EA'b and N(E) = R(A*). = N(A). Thus if x is a feasible vector for (P4), then x must be of the form g
(1) i= 1
Similarily, if y is a feasible vector for (D4), then
y=—c0+
(2)
1+1
Now if x, y are feasible for (P4) and (D4), then they are feasible for (P3) and (D3). They will both be optimal if and only if(x, y) =0. Substituting (1)
and (2) into(x,y)= 0 gives
= (Atb + = (Atb,
—c0 +
i1
— c0) —
1q+1 11
+
a.).
iq+1
But and
Atb
LINEAR PROGRAMMING 243
Equations (3), (4) and the equality constraints of (P4), (D4) may be
expressed in one set of constraints as EAtb
0
[fl =
(5)
We have then that:
Theorem 11.2.2
Any solution of(5) which is non-negative in its first 2n
components provides optimal solutions of(P4) and (D4).
It is easy to modify (5) so that the desired solutions have all components where non-negative. Let = 0 and 0. Then (5) — becomes
[E
0
0
0 0
F
0
0 0
—11c011/IIAtbII
0
1
lrxl
11c011/IIA'bIII —1 JLP1J
EAtb
=
—Ec0 0
6
0
Theorem 11.2.3
Any solution of(6) which is non-negative in all of its components provides optimal solutions of(P4) and (D4). Let B be the coefficient matrix of (6). Then Theorem 3 says that solutions
of (P5)
Bz=d, z0.
provide optimal solutions of (P4) and (D4). Problem (P5) can be rewritten
so as to get
(P6) Px =
x,
x 0, P an orthogonal projector.
To see what P has to be, observe that we are asking for R(P) and { z : Bz = d} to be the same. A reasonable way to try and guarantee this is to require that {z : Bz = d} c R(P). If Bz = d, then z = B'd (I — BtB)z. P would thus be the sum of (I — BtB) and a projector onto the subspace spanned by Btd. Let P = (I — BtB) + (Btd)(Btd)*/ if Btd if 2• Then P is a hermitian projector and R(P) {z : Bz = d). Let us examine the relationship between solutions of (PS) and (P6). Suppose that z is a solution of (PS). Then z 0 and z = Btd + (I — BtB)ZE R(P). Thus z is a solution of (P6). Suppose that x is a solution of (P6). Then x 0 and Px = x. This implies that BtBx — (B?d)(Btd)* —
Btd
2
—
,
(Btd, x)
— if Btd
2
Btd
Multiplying both sides by B and using the fact that Bx d is = consistent,
244 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
we get that Bx = (B'd, x)d/ Btd 112. Let
1= (Btd, x)> 0, then z will be a solution of (P5) since we assumed x 0. If(Btd,x)= 0, then xeR(I — BtB) = N(B) and x 0. We
would then have (Btd + is unbounded. This in turn would imply that in our original problems the constraints defined an unbounded 0 set. Suppose that (Btd, x) <0. Then z 0 so that z = B'd (I — that Btd h 0 since (PS) is assumed and there is an he N(A) such consistent. But then h — (I — BtB)z = (B'd + h) — (B'd + (I — BtB)z) 0. Again we have N(B)r is non-empty and (Btd ÷ would be unbounded. Summarizing these observations, we have the next theorem.
Theorem 11.2.4
If in problems (P1) and (Dl) the constraints define a bounded subset of Fe!', then solutions of(P6) provide solutions of(P5) by means of equation (7). Thus solutions of(P6) will provide optimal solutions
of(P1) and (Dl). It is worth noting that Bt, and hence P. are fairly easy to compute. One way is to observe that B is of full row rank and hence B' = B5(BB5) Because of the particular entries involved, the matrix (BB5) is easy to invert. An even easier way to find Bt. is to observe that by slightly modifying the last two rows of B we can get a matrix C such that Bz = d and Cz = d have the same solutions but C is a partial isometry. Then C is used in place of B. From Chapter 2 we know that C' =
Exercises
3.
1. For a set .5° c
define .9'° =
:(z,s) 0 for all scsi. Prove that
for 5°,if any sets in (a)
(b) .5°
if implies if0 c .500
(c) $P C (54PO)0 = $000
(d) (e)
(clb°)°
cL9' is the closure of 6" in the Euclidean norm on R'.
(f) $°°+if°c($°+.fl° 2. Verify that if M RA is a subspace, then M° = M1.
3. Verify that 4. Prove that + N(A) is closed. (Hint: It is possible to define an n x p matrix C such that = + N(A) where p = n + dim N(A). In other words, R",. + N(A) is a polyhedral cone. Using C, C', and the fact that a closed convex set has an element of minimal norm, one can show that + N(A) is closed. Details may be found in [11, p. 380].) 5. Prove Theorem 11.2.1.
LINEAR PROGRAMMING 245
6.
Verify that if BE
X
dE R'".
then P = (I
—
BtB)
+ B'd
2(Bd)(Bd)
is an orthogonal projector. 7. Let B be the coefficient matrix in equation (6). Section 2. Verify that B is of full row rank and calculate Bt by using Bt = B*(BB*YI. 8. Let B be the coefficient matrix in equation (6). Section 2. Modify the last two rows of B to get a new matrix C such that Cz = d and Bz = d have the same solutions and C is a partial isometry. 9. Prove Proposition 11.1.1.
4.
References and further reading
There are many books on linear programming. We list three [48], [84],
[85] in the references. They are the ones we have found most useful. [48] is the most technical and [85] the least. The exposition in [84] seemed well written. The generalized inverse is also discussed in [70] by Pyle and Cline. They are concerned with gradient projection methods. That is, they approach optimal vectors by moving through the set of feasible vectors. This is in contrast to the simplex method which goes from vertex to vertex around the edge of the convex set of feasible vectors. (If there is an optimal vector, there must be one at a vertex.) A proof of Farkas's theorem can be found in Ben-Israel [11]. The complex case and a more thorough discussion of polars and cones is also given in [11].
12
Computational concerns
1.
Introduction
This chapter will consider the problem of computing generalized inverses.
We will not go into a detailed analysis of the different methods. Books exist on just computation of least squares problems. Rather we shall discuss some of the common methods and when they would be most useful. The bibliography at the end of this book will have references that go into a more detailed analysis. Our procedure will be as follows. We shall first discuss some of the difficulty with calculating generalized inverses. We will talk mainly about the Moore—Penrose inverse but the difficulties apply to all. Then we shall consider the problem of computing At. A particular algorithm involving the singular value decomposition will be developed in some detail. Then a section on A, and finally a section on AD. The first thing to note is that we are talking about calculating, say At, and not necessarily about solving Ax = b in the least squares sense. One has the same distinction in working with invertible matrices A. If one wishes to solve Ax = b, the quickest way is not to calculate A - and then A 1b. It takes of order n3 operations to calculate A-' and another n2 to form A - 1b. The direct solution of Ax = b by Gaussian elimination can be done in n3/3 operations. (Here operations are multiplications or divisions.) Similarly, it takes more time to calculate At and then Atb, than to directly calculate Atb. The algorithms we shall discuss fall into three broad groups. The first is full rank factorizations and singular value decompositions. The second is iterative. The third we shall loosely describe as 'other'. It consists of various special ways of calculating generalized inverses. These methods are usually of most use for small matrices, and will tend to be mainly for the Drazin inverse. The reader interested in actual programs, numerical experiments, and error analysis is referred to the references of the last section. It will be assumed that the calculations are not being done entirely in
COMPUTATIONAL CONCERNS 247
exact arithmetic. Thus there is some number ç >0, which depends on the
equipment used, such that numbers less than ç are considered zero. Several algorithms suitable for hand calculation or exact arithmetic on small matrices have been given earlier. For small matrices, those methods are sometimes preferable to the more complicated ones we shall now discuss. This chapter is primarily interested in computer calculation for 'large' matrices. Throughout this chapter denotes a matrix norm as described in Chapter 11. For invertible A, we define the condition number of A with We frequently write DC respect to the norm as = hA instead of ac(A). If A is singular, then i4A) =
2.
HA
At
Calculation of A'
This section will be concerned with computing
The first difficulty is that this is not, as stated, a well-posed problem. If A is a matrix which is not of full column or row rank then it is possible to change the rank of A
by an arbitrarily small perturbation. Using the notation of Chapter 12 we have:
Unpleasant fact Suppose AECTM XI, is of neither full column nor full row rank. Then for any real number K, and any 6>0, there exists a matrix E, E <6, such that II(A + E)' — A' K.
Proof Let UEN(A) and veR(A) be vectors of norm one. Let E = min{1/K,e}. Then hEll = çand rank(A +E)= rank(A)+ 1. where Hence (A + E)t — At K by Theorem 10.4.2. If A is determined experimentally, or is entered in decimal notation on a computer, or there is round off, then it is not obvious whether it makes sense to talk about computing At unless A is of full rank. One method of posing the problem is by using the singular value decomposition (Theorem 0.2.2). Let X be an m x n matrix. By Theorem 0.2.2 there exist unitary matrices U, Vso that
]v
(1)
where 1(X) = Diag[a1,... ,o,]. Then
:]u*. Now let
]v Herec
248 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
depends on the computing equipment available and the desired accuracy. Now take The entries of 1(A) are the eigenvalues of(AA)"2.
Since eigenvalues vary continuously with the matrix's entries, one has that 1(A + E) and 1(A) have the same rank if fl E Ills small enough. Thus urn IIEII—O
—
II
((A +
II
=
(3)
by Theorem 10.4.1. Thus the problem of computing is used as the definition of the Moore — Penrose inverse.
is well-posed if(2)
It should be noted that the rate of convergence of (3) depends on A and E and not just the condition number of A. Example 12.2.1
0].Then
Let
— (A + = large error if 8< II
c, it equals 0 if < c. For this A, we have a <& Note K(A) = 1
if
<2e and no error for
for 11112
There is another way to view the calculation of At as a well-posed problem. Fix A e 'i" and consider the equations
AXA—A=E1,
(4)
XAX—X=E2,
(5)
AX_X*A*=E3,
(6)
XA_A*X*=E4.
(7)
and In terms of our previous discussion, X may be thought of as the computed estimate and the E. as error terms. We shall now solve (4)—(7) for X in terms of A, At and the E.. Equation (7) gives AXA — AMX* = AE4, or — EA*. by (4), E1 + A — AA*X* = AE4. Thus XAA* = A* + Multiplying on the right by gives XAAt = [A* +
—
EA*]A*tAt.
(8)
Similarly from (6) we get AXA — X*A*A = E3A and hence
AtAX = AtA*t[A* +
—
A*Efl.
(9)
Substituting (8) and (9) into (5) gives
E2+X=XAX=XAAtAA1AX = [A* + — EA*]A*tAtAAtA*?[A* + — A*Er] = [A' + — EAtJAA?A*t[A* + — A*Efl — E]AtA*f[A* + — A*Efl = [I +
= A' + [EiA*f — E]A' + — +
—
—
Efl
Efl.
As an immediate consequence of (10) we have the following results.
(10)
COMPUTATIONAL CONCERNS 249
Theorem 12.2.1 If is a sequence of n x m matrices such that the {X,A — A*X') and {AX,A — A) sequences {XAXr — X,}, {AX. — all converge to zero, then {X,} converges to At.
Theorem 12.2.2
Let A C" ",XE C" "'and
l,2,3,4by(4)—(7). Then
such that 11B11 = IIBII. Define X—
denote a matrix norm
E2 +
{
E1
+ E3 +
P11
E4 II)
(I)
—fllAtO2{211E1l1+0E111l1E31l+IIE4HIIE1II) + hAt 11311E1
112
Note that estimate (11) in Theorem 2 seems to suggest that minimizing
is important if E has large condition number. Theorems 1 and 2 show that if A liii At and At are small, then calculating At is well-posed in the sense that if X comes close to satisfying the defining conditions of the Moore—Penrose inverse, then X must be close to At. Algorithms exist for calculating the singular value decompositions that are stable. That is, error accumulates no faster than the condition number would seem to warrant. Perhaps the best known one is due to Golub and Reinsch [39]. Their method consists of using Givens rotations and E1
H
Householder transformations to reduce A to the form
where J is
diagonal. This method is stable primarily because the Householder matrices and Givens rotations are unitary. Thus their application does not increase the norm of the error matrix. This algorithm is discussed in more detail in the next section. One might think that the full-rank case would always be easier to work with. Assume is full column rank. Then from Theorem 1.3.2
At = (A*A) 1A*.
(12)
However, taking AA is to be avoided unless one knows something about the singular values of A. Let A = UZV where I = Diag{a1, ... ,a}. Then A*A = V*12V where £2 = Diag , o). If a1 is small, then might be negligible. For example, if round off is 1011, a singular value of 10-6 would be lost.
fi
Example 12.2.2
ii
LetA= IP 0I.ThenA*A= LO PJ
If $ is small, we could have $2
fi+p2 I
I
L
negligible and (AA)
1
=
In any event, for small ji there is a loss of information. These same comments apply to computing At by writing A = BC where BC is a full rank factorization. Then At = CtBt by Theorem 1.3.2. It is possible with a little effort to still utilize a full rank factorization and
250 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
= C*(CC*)l or avoid the possible ill-conditioning caused by taking Bt = (B*B) lBS if only one of CC5 or B5B is ill-conditioned. If, however, both CC5 and BB are ill-conditioned then the method will not help much. The interested reader is referred to [35]. 11(12) is to be used, then double precision arithmetic is recommended. In fact, some numerical analysts recommend always using double precision. This will be especially true when working with the Drazin inverse as will be explained later. Another way of calculating the Moore — Penrose inverse is by an iterative scheme. Iterative methods exist for finding the inverse of a non-singular matrix. Their principal use is in taking a computed inverse and yielding a more accurate one. However, the algorithms for iteratively calculating the Moore—Penrose inverse are not generally self-correcting. One of the more commonly discussed iterative methods is the one due to Ben-Israel [8]. One form of it is the following. Its proof and some related results are left to the exercises. Theorem 12.2.3
ff0 <
denote the largest eigenvalue of AA5, A€Cm Let = X,(21 — AX,). Then < 2/A1, set X0 = xA5 and define
X—' At as r—' co. Stewart has shown that the iterative scheme of Theorem 3 takes at least
2 log2[c(Aj] iterations to produce an approximation to (Aàt. Thus it can be slow. The method does turn out to be stable. Error does not accumulate more rapidly than would be expected. Each iteration requires 2mn2 multiplications. This method and its variants are not competitive for large matrices with the singular value decomposition. Stewart has also shown that if the entries of A are large so that the round-off error is large in magnitude, then errors can accumulate rapidly so a careful analysis of rounding error is needed. Proper splittings have also been recently proposed as a method of calculating either (1)-inverses or the Moore—Penrose inverse [14]. These methods require first the splitting and then an iteration. Details may be found in Exercises 11—16 and in [14]. At this point it should be pointed out that while the singular value decomposition is a widely used method for computing At there are two schools of thought. Let us use the phrase 'elimination method' to designate those methods of computing At which are based on some variation of Gaussian elimination. Algorithms 1.3.1, 1.3.2,3.3.1 and 3.3.2 are of this type. To begin with there is no such thing as a universal algorithm. Given any non-trivial algorithm there is usually an example for which it works poorly. Secondly, if an algorithm is to be able to handle, less well conditioned problems it frequently must be made more sophisticated. On this basis there are those who argue that elimination methods have their place. Elimination methods usually are much simpler with substantially lower operation counts. Consequently, they are quicker to run with less accumulation of machine error.
COMPUTATIONAL CONCERNS
251
It would be unfortunate if after reading this chapter, a reader computed At for a 4 x 4 matrix with integer entries by using the singular value decomposition. By the time he had it entered on a machine, he could have obtained the answer at this desk, exactly if need be, by an elimination method and drunk a second cup of coffee. For well conditioned matrices of moderate to small size there are strong arguments for using an elimination method. In this same vein we might point out that in our computing experiments with A° we tried powering A by using QR factorization. While this is a stable method we discovered, not suprisingly, that for 6 x 6 or 8 x 8 matrices we got more error than if we just naively multiplied them out. The reason was the substantial increase in the operation count and the loss of information. When A was entered exactly (had integer entries for example), some error was introduced in the factorization whereas direct multiplication often produced the exact answer. If one suspects that a matrix is severely ill-conditioned and/or very high accuracy is needed, then the best idea of all may be to compute in exact (residue) arithmetic. While this may substantially increase the computational effort needed, it will provide At exactly if A has rational entries. Elimination methods are preferable with residue arithmetic since they have lower operation counts and produce exact answers. The basic idea is to rescale the original matrix so that it has integral entries. Then compute in modular arithmetic. If the numbers involved are large, then multiple modulus arithmetic may be necessary. A good discussion of three elimination methods and associated algorithms in modular arithmetic may be found in [14]. See also [87]. Residue arithmetic can be further studied in [90], [92]. See also the other references of [14]. Exact computation may also be done in p-adic number systems (see [14] for references).
3.
Computation of the singular value decomposition
This section will hopefully develop the Golub—Reinsch algorithm in
sufficient detail so that the interested reader will understand its basic structure. We shall first briefly discuss Householder transformations and Givens rotations. Then we shall give an outline of the algorithm. Finally, we shall discuss each step in greater detail. We shall not discuss the actual organization of such a program or worry about storage. The interested reader is referred to [52, Chapter 18] or [15], [36], [38], [39]. Recall from Chapter 10 that v 2 =
I
Definition 12.3.1 Householder transformation. It is easy to see that H= a
—
2uu*, is called a
H u
a liv II 2e1 where
252 GENERALIZED IN VERSES OF LINEAR TRANSFORMATIONS
a=+1
<0. Let H
0, a = — 1
= I — 2uu*/u*u. Then
Hv= —oIIv112e1.
Definition 12.3.2 A Givens rotation is a matrix G = [g1 ,... ,gj, such that g. = e4 except for two values of i ; i1
I
01
c
00 sl
10 wsthc
2
•
CJ
01
0
+s =1. 2
A Givens rotation is unitary and is used to introduce zeros.
Proposition 12.3.2 Given
and indices 1 i,
the vector
+ c= = v.2/(1v1112 + 1v12 12)112. Then the Givens rotation G defined by i1 ,i2, c, s is such that (Gv)1 = v. i1 or i2 ,(Gv)11 = (I v11 + (Gv)12 = 0.
I v.2
The operation performed by G in Proposition 2 is often referred to as zeroing the i2-entry and placing it in the i1-position. Note that only i1 , are affected at all. Note that (n — 1) Givens rotation could be used on a vector v to get a multiple of e1 but one Householder transformation will do the job. We now present the algorithm.
Algorithm 12.3.1
Given AECTM XII Assume
m n. If m n, work
with A*.
(I) Perform at most 2n — to get
1
Householder transformations Q,, H1 to A
QTAH=Q((QIA)H2.H)=[:]BEcAx. where q1
B=
e2 q2
0 (1)
0 Note B is bidiagonal.
(II) II some e. =0, then B =
[1
]
and the singular value
decompositions of B1 ,B2 may be computed separately. (III) If all e1 0 but some =0, pre-multiplication by (n — k) Given s
COMPUTATIONAL CONCERNS 253
rotation G1 gives q1
—
e2
0
0
0
—
0
I
0 I
Lo
0
ñ2
Application of Givens rotations on the right to produces a matrix with zero last column. (IV) Steps II, III have reduced the problem to computing the singular value decomposition of B in the form (1) where all q. 0, all the e. 0. Say B is n x n. Now set B1 = B, where Uk,Vk are = is upper orthogonal and to be defined. The Uk, Vk are chosen so that bidiagonal for all k and the (n — 1, n)-entry of Bk goes to zero as k —' Thus after a finite number of steps
01
B= k
0
0
4
and 0 is less than some agreed-on level of precision. Hence
Bk=[
¶].
(2)
(V) Step IV (and possibly II, III) is repeated on terminates in at most n — 1 repetitions to give UAV
=
in (2). The process
[1] where D is diagonal.
and U, V are unitary. (IV) Compute A? = U*[Dt 0] V'.
This completes our outline of the algorithm. We shall now discuss some of the steps in more detail.
Step I
If A = [a1 , a2,... , a], calculate Q1
Q1a1 is a multiple of e1. Let Q1A so
that
rr1l =
by Proposition 1 so that
Use Proposition 1 to find an
LrmJ (That is,
254 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
H2[] 0 i
(Q1A)H2 =
(2)
:
:
LOx {
I
x
Now repeat this procedure on the lower right hand block of (3). Note that
if H is an n — 1 x n —1 Householder transformation, then
Lo n x n Householder transformation. Continuing, we get (1) in at most 2n — I Householder transformations.
Step II
an
Clear.
Step I/I
Suppose ek 0 but =0. The rotation is defined as in Proposition 2 and places a zero in the (k,j)-position, the rotation effecting only the kth and jth rows of B.
Step IV This is the most difficult part. It constitutes a modified implicit QR-procedure. Suppose B is in the form (1) with all 0. Set B1 = B. Suppose that B,,, has been calculated. We shall show how to get Let Bm be in the form (1), B,,, ben x n. Set Bm+
1=
—
— —
+
1)/2e,,q,,_1, and
—
f[f+ (1 +f2)112J if f0
[f— (1 +f2)"2] iff< 0. + e,,2 — e,,q,,_ 1/t.
Set a = Now — cr1 has [x, x, 0, ... , for a first column. Calculate by Proposition 2 a rotation R1 to zero out the (2,1)-entry of BBM — al. This rotation would involve only the first two columns. Thus
xx x
x
e3
q3.
0 A series of Givens rotations are now performed on BR1 to return it to
bidiagonal form. Let R1 operate on columns i and I + 1 ;T1 operate on rows i and i + 1. Then B,,,.,.1
(4)
Each rotation takes a non-zero entry, zeros it, and sends the non-zero part
COMPUTATIONAL CONCERNS 255
to a new place. The pattern is as follows for a 5 x 5:
0
X
X
21
X
X
24
23
X
X
26
zs
x
x
0
non-zero at stage i and zero at the other stages. Thus T1 zeros and creates a non-zero entry at z2. R2 zeros z2 and creates a non-zero entry at z3. The B, computed in this way will again be bidiagonal and in the form of (1). (This requires proof.) Steps V and IV are reasonably self-explanatory. Here
4.
is
(1)-inverses
When dealing with generalized inverses that are not uniquely defined, one
cannot talk about error in the same sense as we did in Sections 1 and 2. Probably the easiest way to compute a (1)-inverse is as follows. By reducing to echelon form, one can find permutation matrices P. Q so that
PAQ
U is square and rank (U) = rank (A). Then
=
'
a (1)-inverse of A. Note that since there is some choice as to which columns go into U one might in some cases be able to control to some extent how well conditioned U is. How to check a computed inverse will depend on how it is being used. For example, even if A is invertible it is sometimes possible to find X so that AX — I is small but XA — I is large. is
Example 124.1 ABA
—
A=
I
L0
Let A
0], =
B
Then
=
I which is small. Thus B comes
—CJ
Iii
ru
(1)-inverse. However B gives BL
i.j = Lo]
which has the actual solution x =
.
as
to being a
a solution of Ax
['_
Ii = Li 1
is
the B is small and (1)-inverses solve consistent systems. As Example I shows some care must be taken. If row operations are to be used to compute A, then a well-conditioned matrix is desirable.
5.
Computation of the Drazin inverse
Since AD
=
I)tAt for 1
Ind(A),
(1)
256 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
one might suppose that computation of the Drazin inverse would be, at
worst, just a little more work than computation of the Moore—Penrose inverse. However, there are several problems. The first difficulty is that small perturbations in A can cause arbitrarily large perturbations in AD. Unlike the Moore—Penrose, the singular value decomposition of A does not immediately help because (UAW)D is not, in general, equal to W*ADU* when U, W are unitary. The following weaker version of Theorem 2.1 shows that computation of AD does make some sense when using nonexact arithmetic. Of course, there is no theoretical difficulty if exact arithmetic is used and A is known exactly. xn Theorem 12.5.1 IfAeC" and {X,} is such that {X.AX, — X,}, {AX, — X,A}, IAk.I. 'X, — all converge to zero and there is an L such X
Lfor all r, then Xr —, A".
X,
that
Proof
Suppose {Xj, A satisfy the assumptions of Theorem 1. Suppose of {X,.} X, does not converge to AD. Then there is a subsequence
such that 11X5—A1 e>Oforallsand some e>O. Since {Xj is bounded, it has a subsequence {Xj which converges. Let X0 = urn X,. Then X0
AD and X0A = AX0, X0AX0 = X0,
1X0 = A. Thus X = AD
U which is a contradiction. Hence X, The next theorem is true with no assumption on md (A) but the proof came too late for inclusion. AD.
A a Suppose that {A2X, — A), sequence of matrices such that the sequences {AX, — X,A}, and {XrAX, — X,) all converge to zero, then X, —, A
Theorem 12.5.2
Proof Suppose A has index one and {XJ is a sequence which satisfies the assumptions of Theorem 2. A is similar by some non-singular B to a matrix of the form o
o
Let BXB1
where C is invertible.
ix
(r)
=
X (r)1 Then by assumption, X4(r)j
1CX1(r) CX2(r)1 L
0
C2X1(r)
L0
iX3(r)C 01
0 ]Lx3rc o],0, O]o OJLOOJ
C2X2(r)]
—
(2)
(3)
COMPUTATIONAL CONCERNS 257
and
Xi(r)CX2(r)1 — X2(r)1 LX3(r)CX1(r) X3(r)CX3(r)J LX3(r) X4(r)J
(4)
From (2), we get X2(r) —'0, X3(r) -+0, since C is invertible. But then X4(r) 0 from (4). Equation (3) yields X1 (r) C - Hence X, —, A
as desired. U Another difficulty is that the index is not generally known. There are
exceptions, such as was the case with the Markov chains in Chapter 8. Thus to use (1) one must either get an estimate on the index or use n instead of k. However, using n can introduce large errors for moderately sized matrices. Example 12.5.1
=
and I is an r x r identity
If
] where A is 20 x 20,
Let A
for 1 r
19. Then A41 =
is considered zero, then (1) would produce
is actually
11001
[1
= 82
10
2
0]
=0, whereas AD
0
L
Numerical experiments using (1) with reasonably conditioned matrices
have shown that one can run into difficulty even with n less than 10. The difficulty in Example 1 is the loss of 'small' eigenvalues. Suppose A = P
P1 is the Jordan canonical form for A.
Then
1
IP'•
AM=PI
OJ
L
+1J1 corresponds to an eigenvalue A. such that rounds to zero, then (1) will produce a commuting (2)-inverse and not AD. Thus in checking a computed AD, AD, it is important to check whether it reduces powers. However, if there is a 1. x t. such that rounds to A' where as in fact zero, then to machine accuracy we will have AD is not close to AD. Thus if one has a 20 x 20 matrix of low index, using (1) with n =20 could lead to erroneous answers that would not be detected. For these reasons we prefer other methods to (1). Computation of the Jordan canonical form is also very sensitive so that one should not try and compute it in full detail unless it is needed. If one is going to calculate A 11(11 AD + 1) or some such, a much better idea of the conditioning would be
If any of the t1
C(A) =
x
H P1111 P 111(11
Ill + II
II')
258 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
is the Jordan form of A. This C(A) of course depends on P. However, if it is not too large, the eigenvalues are not likely to be lost and eigenspaces are not nearly parallel. The more one is willing to lump eigenvalues together and weaken the Jordan form, the less computational difficulty there should be. Let N be an n x n nilpotent Jordan block. Then, N is nilpotent with index n and, of course, ND =0. Zero is the only eigenvalue of N. Now let N be the same as N except for an >0 in the (n, 1) place. Then N— = £ but R has n eigenvalues of modulus The singular values of N, however, are (n — 1) ones and e, while the singular values of N are (n — 1) ones and zero. If c = 10-20 n = 20 then has an eigenvalue of 0.1 whereas if N — if = 10 20• It is because of this stability of the singular values, in particular the zero ones, and the instability of the zero eigenvalues that we suggest the following method for calculating This method is based on the orthogonal deflation method in [39]. Since there are established subroutines for calculating singular value decompositions, it should be fairly easy to implement. where
Algorithm 12.5.1 (I) Given A, calculate the singular value decomposition of
A=U[f If
then AD = A' =
If Ind(A)> 0, write 0 0
VAV*_VUF1
(II) Now calculate the singular value decomposition of
If
is invertible, go to Step IV. If not, then
111 0] VAU)V*_VU ii 'Lo 1
0 0
1
Thus
0 0 0
L
0
1
= 1
o] 0J
(4)
COMPUTATIONAL CONCERNS 259
(III) Continue in the manner of Step II, at each step calculating the and performing the appropriate singular value decomposition of =0, then AD =0. If some multiplication as in (4) to get is non-singular, go to Step IV. If some (IV) We now have k = Ind(A) and 0
0
01
FB,
WAW*=
=1 31
lc+1.I
*
I
k+1.2
1B
•.
k+l.k
L
I
(5)
I
2
I
0
where B1 is invertible, N is nilpotent. and W is unitary. The Drazin inverse of(S) may be computed as 0 0
where X is the solution of XB1 — NX = B2. The rows x1 of X are recursively r. the ith row of B2, then x1 = r1B1 and solved as follows. If N = x1
= (ri +
Note that the singular value decompositions are performed on
successively smaller matrices. Also the unitary matrix W is the product of
matrices of the form ['
and the size of V. decreases with i. Thus the
amount of computation decreases on each step. If k is not large in comparison with the nullity of the core of A, this method seems reasonable. If one suspects that k is comparable to the nullity of the core of A, then it might be better to use a method like the double Francis QR algorithm and get
U*AU=T where T is upper-triangle with the zero diagonal entries listed first. Thus
O.xl 1' A
0 —
0
x
[N B1 L° B2
—I
where N is nilpotent and B2 is invertible. Then Theorem 7.8.1 can be
used on T. Since the singular value decomposition is considered as reliable a way
(6)
260 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
any of determining numerical rank, the deflation algorithm produces a reliable value of the index. On the other hand, (6) provides information on the eigenvalues of A. As pointed out earlier if eigenvalues are small when taken to the kth power, then the algebraic definition cannot be used to numerically distinguish between AD and some commuting (2)-inverse. Thus the added information provided by (6) could be helpful. If one does an operational count on finding the Drazin inverse of a 20 x 20 matrix with core-rank 8 and index 3, then the deflation method entails about the same work as the power method using n = 20, and avoids the risk of losing small eigenvalues. If the deflation method reveals a small index, then (1) can be computed as a check using k instead of n. Note that R(I — ADA) can be immediately read off from (5) or (6) without any additional effort. Another method that has some strong points is the one given in Theorem 7.8.2. There successive full-rank factorizations of successively smaller matrices are performed until an invertible matrix is reached. This method involves only elementary row operations, and matrix multiplications. It is fairly easy to program. as
6.
Previous algorithms
In other parts of this book we have presented several algorithms. This section will list the algorithms and those parts of the book that discuss computation. In general, the algorithms were only compared for operational counts. Error analyses are not given. The methods are all fairly easy to program and when tested by the authors worked well for small wellconditioned matrices (less than 10 x 10). All algorithms terminate in a finite number of steps.
1. Algorithm 1.3.1, (page 16). Computes A' from geometric definition. 2. Algorithm 1.3.2, (page 16). Computes At by computing a full rank factorization. 3. Algorithm 3.2.1, (page 51). Computes At when A is rank one modification of a matrix B for which Bt has been computed. 4. Algorithm 3.3.1, (page 56). Computes A' by a sequence of rank one modifications. 5. Algorithm 3.3.2, (page 57). Computes A' when A is a hermitian matrix. Computation of any A may be reduced to computing the Moore— Penrose inverse of a hermitian matrix. 6. Algorithm 7.2.1, (page 125). Computes AD by computing corenilpotent decomposition. 7. Theorem 7.5.2, (page 130). Computes AD from eigenvalues of A and their multiplicities.
COMPUTATIONAL CONCERNS 261
Algorithm 7.5.1, (page 134). Computes AD by a finite number of recursively defined operations. 8.
9. Section 8.5 Discusses computation of A and w is the fixed probability vector. Of course, throughout the text there are results, on partitioned matrices for example. that can be useful in special cases. The results of Chapter 10 on continuity of generalized inversion may be useful in error analysis. 7. 1.
Exercises (Alternate Proof of Theorem 12.2.1 for real matrices.) Take AERMXI*. Definef
x
x
x
f(X)=(AXA — A, XAX — X, AX — X*A*, XA — A*X*).
(1)
At any the derivative off, denotedf'(X0), is a linear X x x '"into transformation from R'" m x R" ". Show is that its value at XE R" X
f'(X0)X = (AXA, XAX — X, AX — X*A*, XA — A*X*).
(2)
Show that f'(X0) is one-to-one for all X0, and thatf is also one-to-one. Conclude thatf has a continuous inverse from its range onto R" thus proving Theorem 12.2.1 for real matrices. 2. CflXfl
by
g(X) = (AX — XA,
'X
—
Ak, XAX — X).
Show that g'(X0)X = (AX — XA, Ak+ 'X, X0AX + XAX0
- X).
Show that g'(X0) need not be one-to-one but that
is one-to-one. 3. Using Exercise 2 show that for any AEC" ",there exists a constant K such that if g(X,) —'0 and X, K for a sequence { X,), then X, -. AD. Exercises 4—9 are from [7], [8], and [13].
4. Suppose X0eC" '"satisfies (1) X0 = MB0, B0eC'" '",B0 non-singular, (ii) X0 = C0A*, C0E C" ",C0 non-singular, (iii) AX — < 1, and (iv) X0A — PR( <1, where is a multiplicative matrix ask—' cc. norm. Let Xk+i =Xk(2PR(A)—AXk). Then 5. Let A1 denote the largest eigenvalue of AA*. 110 < < 2/A1, set X0= xA* and define Xk+l = Xk(2I — AX,). Show X,—'A' as k—' cc. H
k
6.
Define
as in Exercise 5. Let X, =
as k —' cc.
A*(I —
Show X, —, At
262 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Show the convergence in Exercise 6 is of the first order, the convergence in Exercise 5 is of the second order. 8. Let u beas in Exercise 5. Let Show Zk —, AAt as k —, and '1R(A) — Zk+l — Zk 112 7.
II
9.
be as in Exercise 8. Show Let rank A.
II
converges monotonically to
Exercises 11—16 are from [14], [51]. Then A = M — N is a proper splitting if R(A) = R(M), 10. Let N(A) = N(M). Let p(S) denote the spectral radius. In Exercises 11—16, M, N is to be considered a proper splitting of A. =MtNX1÷Mt where A =M —Nis a proper splitting of A. 11. Let —, At if and only if p(MtN) < 1. Show 12. Show that if M4 isa (1,4)-inverse forM, then (I — is well-defined and a (1,4)-inverse for A. 13. Show that if M is a (1,3)-inverse for M, then (I — is well-defined and a (1,3)-inverse for A. 14. Show that At = (I — MtN) 'Me.
15. Show that if M is a (1,3) or (1,4)-inverse and Xk+i = MNXk + M, then Xk+l converges if and only if p(MN) < 1. If it converges it converges to a (1,3) or (1,4)-inverse, respectively, 16. Show that if A has full column rank, there exists a proper splitting of
A such that Mt = M. 17.
18. Prove Proposition 12.3.1. 19. Prove Proposition 12.3.2. Prove Theorem 12.5.2 without assuming Ind(A) 1. (See 'Continuity of the Drazin Inverse' by S. L. Campbell (to appear) for details.)
Bibliography
mentioned in Chapter 0, reference [64] has an annotated 1775 item bibliography on generalized inverses. Accordingly, we have made no attempt to be complete. The references listed fall into three groups. Most are explicitly mentioned in the text. We have also referenced only that part of our work that appears in the book and was co-authored with others, principally N. J. Rose. Finally, the idea of the Drazin inverse has recently proved useful in the study of singularly perturbed autonomous systems. This recent work, [19], [20], [22], [25], [26] and other related applications, [21], [58], [61] for example, were not included in the text due to page and time limitations, but are included in the references. I Athens, M. and Falb, P. L. Optima! Control. McGraw-Hill, New York, 1966. 2 Anderson, W. N. Jr. Shorted operators, SIAM J. app!. Math. 20, As
520—525, 1971.
Anderson, W. N. Jr. and Duflin, R. J. Series and parallel addition of matrices. J. Math. Anal. App/ic. 26, 576—594, 1969. 4 Anderson, W. N. Jr., Duffin, R. J. and Trapp, 0. E. Matrix operations induced by network connections. SIAM J. Control 13, 3
446—461, 1975. 5
Anderson, W. N. Jr. and Trapp, 0. E. Shorted operators II. SIAM J. app!. Math. 28, 60-71, 1975.
Bart, H., Kaashoek, M. A. and Lay, D. C. Relative inverses of meromorphic operator functions and associated holomorphic projection functions. Mash. Ann. (to appear). 7 Ben-Israel, A. An iterative method for computing the generalized inverse of an arbitrary matrix, Math. Comp. 19,452—455, 1965. 8 Ben-Israel, A. A note on an iterative method for generalized inversion of matrices. Math. Comp. 20,439—440, 1966. 9 Ben-Israel, A. On error bounds for generalized inverses. SIAM J. nwner. Anal. 3, 585-592, 1966.
6
264 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS 10 11
Ben-Israel, A. and Charnes, A. Generalized inverses and the BottDuffin network analysis, J. math. Anal. Applic. 7,428—435, 1963. Ben-Israel, A. Linear equations and inequalities on finite dimensional, real or complex vector spaces: a unified theory. J. math. Anal. Applic. 27, 367—389, 1969.
12
Ben-Israel, A. A note on partitioned matrices and equations. SIAM Review 11,247—250, 1969.
13
Ben-Israel, A. and Cohen, D. On iterative computation of generalized inverses and associated projections. SIAM I. numer. Anal. 3,410—419, 1966.
14
Berman, A. and Plemmons, R. J. Cones and iterative methods for best least-squares solutions of linear systems. SIAM J. numer. Anal. 11, 145—154, 1974.
15
Businger, P. A. and Golub, 0. H. Algorithm 358 singular value decomposition of a complex matrix. Communications of ACM 12, 564—565, 1969.
16
Campbell, S. L. Differentiation of the Drazin inverse. SIAMJ. appi. Math. 30, 703—707, 1976.
17
Campbell, S. L. The Drazin inverse of an infinite matrix. SIAM).
18
app!. Math. 31,492—503, 1976. Campbell, S. L. Linear systems of differential equations with singular coefficients. SIAM J. math. Anal. 8, 1057—1066, 1977.
Campbell, S. L. On the limit of a product of matrix exponentials. Linear multilinear AIg. 6, 55—59, 1978. 20 Campbell, S. L. Singular perturbation of autonomous linear systems II. J. Eqn. 29, 362-373, 1978. 21 Campbell, S. L. Limit behavior of solutions of singular difference 19
equations. Linear Aig. applic. 23, 167—178, 1979.
Campbell, S. L. Singular perturbation of autonomous linear systems IV (submitted). 23 Campbell, S. L. and Meyer, C. D. Jr. Recent applications of the Drazin inverse. In Recent Applications of Generalized Inverses M. Nashad, Ed. Pitman Pub. Co., London, 1979. 24 Campbell, S. L. Meyer, C. D. Jr. and Rose, N. J. Applications of the Drazin inverse to linear systems of differential equations. SIAM I. 22
app!. Math. 31,411—425, 1976. 25
Campbell, S. L. and Rose, N. J. Singular perturbation of autonomous linear systems. SIAM). math. Anal. 10, 542—551, 1979.
26
Campbell, S. L. and Rose, N. J. Singular perturbation of autonomous linear systems III. Houston J. Math. 44, 527—539, 1978.
Cederbaum, I. On equivalence of resistive n-port networks. IEEE Trans. Circuit Theory, Vol. CT-l2, 338—344, 1965. 28 Cederbaum, I. and Lempel, A. Parallel connection of n-port networks. IEEE Trans. Circuit Theory, Vol. CT-14, 274—279, 1967. 29 Churchill, R. V. Operational Mathematics. McGraw-Hill, New York, 27
1958.
BIBLIOGRAPHY 265 30 31
32
Cline, R. E. Representations for the generalized inverse of sums of matrices. SIAM J. nwner. Anal. Series B, 2, 99—114, 1965. Cline, R. E. Representations for the generalized inverse of a partitioned matrix. SIAM J. app!. Math. 12, 588—600, 1964. Drazin, M. P. Pseudoinverses in associative rings and semigroups. Amer. Math. Month!)' 65, 506—5 14, 1968.
33
34 35
36 37
38
39
40
41
Faddeev, D. K. and Faddeeva, V. N. Computational Methods of Linear Algebra, (translated by Robert C. Williams). W. H. Freeman and Co., San Francisco, 1963. Gantmacher, F. R. The Theory of Matrices, Volume II. Chelsea Publishing Company, New York, 1960. Gallie, 1. M. Calculation of the generalized inverse of a matrix, Technical Report CS-1975-7, Computer Science Department, Duke University. Golub, 0. and Kahan, W. Calculating the singular values and pseudoinverse of a matrix. SIAM J. numer. Anal. Series B., 2, 205—224, 1965. Golub, 0. H. and Pereyra, V. The differentiation of pseudo-inverses and non-linear least squares problems whose variables separate. SIAM /. numer. Anal. 10,413—432, 1973. Golub, 0. H. and Wilkinson, J. H. Ill conditioned eigensystems and the computation of the Jordan canonical form. Technical Report, STAN-CS-75-478. Golub, G. H. and Reinsch, C. Singular value decomposition and least squares solutions. Numer. Math. 14, 403—420, 1970. Greville, T. N. E. Spectral generalized inverses of square matrices. MRC Tech. Sum. Rep. 823, Mathematics Research Center, University of Wisconsin, Madison, 1967. Greville, T. N. E. The Souriau-Frame algorithm and the Drazin pseudoinverse. Linear Alg. Applic. 6, 205—208, 1973.
42
43
44 45
46 47 48
49 50
Hakimi, S. L. and Manherz, R. K. The generalized inverse in network analysis and quadratic error-minimization problems. IEEE Trans. Circuit Theory Nov., 559—562, 1969. Householder, A. S. The Theory of Matrices in Numerical Analysis. Blaisdell Publishing Co., New York, 1964.
Huelsman, L. P. Circuits, Matrices, and Linear Vector Spaces. McGraw-Hill, New York, 1963. Jacobson, D. H. Totally singular quadratic minimization problems. IEEE Trans. Automatic Corn. 16, 651—657, 1971. Kemeny, J. 0. and Snell, J. L. Finite Markov Chains. D. Van Nostrand Company, New York, 1960. Kemeny, J. 0. Snell, J. L. and Knapp, A. W. Denumerable Markov Chains. D., Van Nostrand Company, New York, 1966. Linear Inequalities and Related Systems (Kuhn, H. W. and Tucker, A. W. Eds.), Princeton University Press, Princeton, N. J., 1956. Lancaster, P. Theory of Matrices. Academic Press, New York, 1969. Lay, D. C. Spectral properties of generalized inverses of linear
266 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
operators. SIAM J. app!. Math. 29, 103—109, 1975. 51
Lawson, L. M. Computational methods for generalized inverse matrices arising from proper splittings. Linear Aig. Applic. 12, 111—126, 1975.
Lawson, C. L. and Hanson, R. J. Solving Least Squares Problems. Prentice-Hall, New Jersey, 1974. 53 Meyer, C. D. Jr. Generalized inverses of triangular matrices. SIAM J. app!. Math. 18,401—406, 1970. 54 Meyer, C. D. Jr. Generalized inverses of block triangular matrices. 52
55
SIAM J. app!. Math. 19, 741—750, 1970. Meyer, C. D. Jr. The Moore—Penrose inverse of a bordered matrix. Linear Aig. Applic. 5, 375—381, 1972.
56
Meyer, C. D. Jr. Generalized inversion of modified matrices. SIAM I. app!. Math. 24, 315—323, 1973.
57
Meyer, C. D. Jr. An alternative expression for the mean first passage matrix. Linear AIg. Applic. 22,41—47, 1978.
Meyer, C. D. Jr. and Plemmons, R. J. Convergent powers of a matrix with applications to iterative method for singular linear systems. SlAM J. numer. Anal. 14,699—705, 1977. 59 Meyer, C. D. Jr. and Rose, N. J. The index and the Drazin inverse of block triangular matrices. SIAM J. app!. Math. 33, 1—7, 1976. 60 Meyer, C. D. Jr. and Shoaf, J. M. Updating finite Markov chains by 58
using techniques of generalized matrix inversion. J. Stat. Comp. and Simulation, II, 163—181, 1980. 61
Meyer, C. D. Jr. and Stadelmaier, M. W. Singular M-matnces and inverse positivity. Linear Aig. Applic. 22, 129—156, 1978.
62 63
64 65 66 67
Mihalyffy, L. An alternative representation of the generalized inverse of partitioned matrices, Linear Aig. Applic. 4,95—100, 1971. Moore, R. H. and Nashed, M. Z. Approximations to generalized inverses. University of Wis., MRC Tech. Summary Report # 1294. Generalized Inverses and Applications (Nashed, M. Z. Ed.), Academic Press, New York, 1976. Noble, B. Applied Linear Algebra. Prentice-Hall, New Jersey, 1969. Pearl, M. Matrix Theory and Finite Mathematics. McGraw-Hill, Inc., New York, 1973. Penrose, R. A generalized inverse for matrices. Proc. Cambridge Phil. Soc. 51,406—413, 1955.
68
Penrose, R. On best approximate solutions of linear matrix equations.
69
Proc. Cambridge Phil. Soc. 52, 17—19, 1955. Pyle, L. D. The generalized inverse in linear programming. Basic structure. SIAM I. app!. Math. 22, 335—355, 1972.
70
Pyle, L. D. and Cline, R. E. The generalized inverse in linear programming—interior gradient projection methods. SIAM J. app!. Math. 24, 511—534, 1973.
71
Rao, C. R. Some thoughts on regression and prediction, Part 1. Sankhyã 37, Series C., 102—120, 1975.
72
Rao, J. V. V. Some more lepresentations for the generalized inverse of a partitioned matrix. SIAM J. app!. Math. 24, 272—276, 1973.
BIBL$OGRAPHV 267 73
74
Rao, T. M. Subramanian, K. and Krishnamurthy, E. V. Residue arithmetic algorithms for exact computation of g-inverses of matrices. SIAM.!. numer. Anal. 13, 155—171, 1976. Robert, P. On the group inverse of a linear transformation. J. math. Anal. App/ic. 22, 658—669, 1968.
75 76 77
Robertson, J. B. and Rosenberg, M. The decomposition of matrix valued measures. Mich. math. J. 15, 353—368, 1968. Rosenberg, M. Range decomposition and generalized inverse of non-negative hermitian matrices. SIAM Review 11, 568—571, 1969. Rose, N. J. A note on computing the Drazin inverse. Linear Aig. App/ic. 15, 95—98, 1976.
Rose, N. J. The Laurent expansion of a generalized resolvent with some applications. SIAM J. app!. Math. (to appear). 79 Schwerdtfeger, H. Introduction to Linear Algebra and the Theory of Matrices. P. Noordhoff, N. V., Groningen, Holland, 1961. 80 Scholnik, H. D. A new approach to linear programming, preliminary report. 81 Shinozaki, N., Sibuya, M. and Tanabe, K. Numerical algorithms for the Moore—Penrose inverse of a matrix: direct methods. Ann. Inst. statist. Math. 24, 193—203, 1972. 82 Shinozaki, N., Sibuya, M. and Tanabe, K. Numerical algorithms for the Moore—Penrose inverse of a matrix: iterative methods. 78
Ann. Inst. statist. Math. 24, 621 —629, 1972. 83
Shoaf, J. M. The Drazin inverse of a rank-one modification of a square matrix. Ph.D. Dissertation, North Carolina State University, 1975.
Simonnard, M. Linear Programming. Prentice-Hall, Inc., Englewood Cliffs, N. J., 1966. 85 Smythe, W. R. and Johnson, L. A. Introduction to Linear Programming, with Applications. Prentice-Hall, Inc., Englewood Cliffs, N. J., 1966. 86 Söderstrom, T. and Stewart, G. W. On the numerical properties of an iterative method for computing the Moore—Penrose generalized inverse. SIAM.!. numer. Anal. 11, 6 1—74, 1974. 87 Stallings, W. T. and Boullion, T. L. Computation of pseudo-inverse matrices using residue arithmetic. SIAM Review 14, 152—537, 1972. 88 Stewart, 0. W. On the continuity of the generalized inverse. SIAM J. 84
app!. Math. 17, 33—45, 1968.
Stewart, G. W. On the perturbation of pseudo-inverses, projections and linear least squares problems. SIAM Review, 19, 634—662, 1977. 90 Szabo, S. and Tanaka, ft. Residue Arithmetic and Its Application to Computer Technology. McGraw-Hill, New York, 1967. 91 Wedin, P.-A. Perturbation bounds in connection with singular for information Behandling value decomposition. Nordisk 89
12,99—111,
1972.
92 Young, D. M. and Gregory, R. T. A
Mathematics, Vol.
Survey of Numerical
2, Addision-Wesley, Reading, Mass. 1973.
Index
Absorbing, chain, 152, 165, 169 state, 152 Absorbtion probability, 166 Adjoining a row and/or a column, 54 Algorithms, 260 Analytic matrix valued function, 227 matrices, 109, 118 Backward population projection, 184
Best linear unbiased estimate, 109 Block, form, 3 triangular matrix, 61 Blue, 109,112,113,114 Bou—Duflln inverse, 117
Calculation of the Moore-Penrose inverse, 247 Canonical form for the Drazan inverse, 122 Cauchy's inequality, 234 Characteristic polynomial, 132, 207
aosest vector, 29 Coethcient of determination, 35 Common solution, 98 Commutator-(X,YJ, 23 Commuting weak Drazin inverse, 203 Complementary subspace, 3 Component matrices, 201 Computation of the Drazm inverse, 255 Computational concerns, 246 Condition number, 224,247 Conformable, 53
Conjugate transpose, 2 Consistent, initial vector, 172, 182 model, 106 norms, 212 systems, 93
Constrained, generalized inverse, 68 least squares solutions, 65 minimization, 63
Continuity, of generalized inverses, 210 of the Drazin inverse, 232 of the index, 233 of the Moore—Penrose inverse, 216 Core of a matrix, 127 Core—nilpotent decomposition, 128 Curve fitting, 39 Cyclic chain, 152
Deflation, 258 Derivative, of matrix valued function, 226 of Moore—Penrose inverse, 227 Design matrix, 32 Difference equations—systems, 181 Differential equations—systems, 171 Direct sum, 3 Directed graph, 117 Discrete control problem, 197 Distinguished columns, IS Doubly stochastic matrix, 163 Drazin inverse, a polynomial, 130
270 INDEX algebraic definition, 122
asa limit, 136 functional definition, 122 of a partitioned matrix, 139 Dual, 238 Electrical engineering, 77 EP, matrices, 208 matrix, 74, 129
Equation solving inverses, 93 Equivalent norms, 212 Ergodic, chain, 152 158 set, 151, 166 state, 151, 165 Estimable, 106 Estimating equation, 34 EucLidean norm, 28 Expected, absorption times, 169 number of times in a state, 158
Farkas theorem, 239 Feasible solution, 238 Final space, 72 Finite Markov chain, 151 Fitting a linear combination of functions, 43 Fixed probability vector, 156, 160, 162,208 Flat, 31 Full rank factorization, 14, 15, 16, 148, 149,249 Functional, error, 30 relationship, 30 Functions of a matrix, 200 Fundamental matrix of constrained minimization, 63 Gaussian elimination, 246, 250 Gauss—Markov theorem, 105 General solution of AXB = C, 97 Generalized, eigenvector of grade P, 129 inverse, functional definition, 8 Moore definition, 9 of a sum, 46 Penrose definition, 9
Newton method, 118 Givens rotation, 249, 252 Goodness of fit, 34, 38,39 Group inverse, 124 of a block triangular matrix, 143 Hermite, echelon form, 15 form, 115, 117 Hermitian, 2 Homogeneous, Markov chain, 151 systems—differential equations, 175 Householder transformation, 249,251 Idempotent matrices, 201 Impedance, 78 matrix, 80 Inconsistent systems, 94 Index, of a block triangular matrix, 142 of a matrix, 121, 138 of a product, 150 of zero eigenvalue, 132 Initial, space, 72 value problem, 172 Inputs, 78 Integer solution, 116 Integration of exponential, 176 Intercept hypothesis, 32 Invariant subspace, 4, 7 Inverse-,(i,j, k), 91, 95
(l),96 (l,2),96 (1, 2, 3, 4), 96
(1,3), 96 (1,3,4), 25 (1,4), 96 Inverse function, 8 Isometry, 71 Iterative methods, 250, 261,262
Kirchhoff law, 78 Kronecker product, 115 I_east,
squares generalized inverse, 95 squares solution, 28
INDEX
Leslie, matrix, 185
population model, 184 Limited probability, 168 Limits for transition matrices, 155 Linear, estimation, 104 functional reLationship, 30 hypothesis, 30 model, 105 programming, 237 unbiased estimates, 107 Linearly unbiased estimate, 106 Markov process, 151 Matrix, norm, 211 valued functions, 224 Maximal element, 127 Mean first passage matrix, 159 Measurable matrix valued functions, 235
Measurement errors, 30 Meromorphic matrix valued function, 228
Minimal, least squares solution, 28 norm constrained least squares solutions, 67 polynomial, 132,207 rank weak Drazin inverse, 203 V-norm solution, 117 V-seminorm W-least squares solution, 118 Minimum, norm generalized inverse, 94 variance linear unbiased estimate, 107 Modified matrices, 51 Moore—Penrose inverse, 10,92 as an integral, 26 of block triangular matrix, 62 of partitioned matrix, 54, 55,58 of rank 1 matrix, 25 n-port network, 78 Nilpotent part of a matrix, 128 No intercept hypothesis, 32 Nonlinear least squares, 229 Norm, matnx, 211
271
operator, 211 vector, 210 Normal, 5 equations, 29 NR-Generalized inverse, 92 Null space, 2
Oblique projector, 92 Operation, 52 Operator norm, 211 Optimal, control, 187, 189 value, 239 vector, 238 Orthogonal projector, 2, 3, 29, 92, 218 Outputs, 78 p-norm, 211 Parallel sum, 82, 84 Partial isometry, 72 Partitioned, matrices, 116 (1,2)-inverse 100 (1)-inverse 98 matrix, 53
Perturbation of the Moore-Penrose inverse, 224
Polar fonn, 5, 73 Polyhedral cone, 244 Port, 78 Positive semi-definite, 102 Prescribed range null space inverse,
92,93 Product moment correlation, 35 Projective weak Drazin inverse, 203 Projector, 2 Proper splitting, 262 Properly partitioned block triangular matrix, 61 Property, of homogeneity, 78 of superposition, 78 Pyle's formulation, 241
Qudratic cost functional, 188 Range, 2 Rank, function, 225 of block triangular matrix, 104 of partitioned matrix, 102
272
INDEX
Rank 1 modification, 47, 116 Reciprocal network, 81 Rectangular systems of differential eq., 178 Reducing subspace, 4,7 Regular chain, 152, 157 Relative error, 34 Resistive network, 82 Restricted, linear model, 115 linear transformation, 9, 91, 121 model, 106 Reverse order law, 19, 26, 95, 115, 149 Row, echelon form, 14 space, 101
Shorted, matrices, 86 matrix, 87 Singular value, 6, 214 decomposition, 6, 247, 251 Slack variables, 237
Special, mapping property, 75, 205 properties of the Drazin inverse, 129 Standard basis, 47 Star cancellation law, 3 Stochastic matrix, 152 sup norm, 211
Terminals, 77 Tractable, 172, 182 Transient set, 151, 165, 168 Transition matrix, 151
Unitary matrices, 12, 72 Variance of first passage time, 160 Weak Drazin inverses, 202, 203 Weighted, generalized inverse, 117 least squares solution, 117 Moore—Penrose inverse, 118 normal equations, 115