Generalized Inverses of Linear Transformations
Books in the Classics in Applied Mathematics series are monographs and textbooks declared out of print by their original publishers, though they are of continued importance and interest to the mathematical community. SIAM publishes this series to ensure that the information presented in these texts is not lost to today's students and researchers.
Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington

Editorial Board
John Boyd, University of Michigan
Leah Edelstein-Keshet, University of British Columbia
William G. Faris, University of Arizona
Nicholas J. Higham, University of Manchester
Peter Hoff, University of Washington
Mark Kot, University of Washington
Hilary Ockendon, University of Oxford
Peter Olver, University of Minnesota
Philip Protter, Cornell University
Gerhard Wanner, L'Université de Genève

Classics in Applied Mathematics
C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Translated by G. W. Stewart
Richard Bellman, Introduction to Matrix Analysis
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability
Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
*First time in print.

Classics in Applied Mathematics (continued)
Beresford N. Parlett, The Symmetric Eigenvalue Problem
Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow
Peter W. M. John, Statistical Design and Analysis of Experiments
Tamer Başar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition
Emanuel Parzen, Stochastic Processes
Petar Kokotović, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Témam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. I: Theory
M. Vidyasagar, Nonlinear Systems Analysis, Second Edition
Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice
Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations
Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods
Leah Edelstein-Keshet, Mathematical Models in Biology
Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations
J. L. Hodges, Jr. and E. L. Lehmann, Basic Concepts of Probability and Statistics, Second Edition
George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable: Theory and Technique
Friedrich Pukelsheim, Optimal Design of Experiments
Israel Gohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications
Lee A. Segel with G. H. Handelman, Mathematics Applied to Continuum Mechanics
Rajendra Bhatia, Perturbation Bounds for Matrix Eigenvalues
Barry C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics
Charles A. Desoer and M. Vidyasagar, Feedback Systems: Input-Output Properties
Stephen L. Campbell and Carl D. Meyer, Generalized Inverses of Linear Transformations
Generalized Inverses of Linear Transformations
Stephen L. Campbell
Carl D. Meyer North Carolina State University Raleigh, North Carolina
Society for Industrial and Applied Mathematics Philadelphia
Copyright © 2009 by the Society for Industrial and Applied Mathematics This SIAM edition is an unabridged republication of the work published by Dover Publications, Inc., 1991, which is a corrected republication of the work first published by Pitman Publishing Limited, London, 1979.
10 9 8 7 6 5 4 3 2 1
All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.
Library of Congress Cataloging-in-Publication Data
Campbell, S. L. (Stephen La Vern)
Generalized inverses of linear transformations / Stephen L. Campbell, Carl D. Meyer.
p. cm. -- (Classics in applied mathematics ; 56) Originally published: London: Pitman Pub., 1979. Includes bibliographical references and index. ISBN 978-0-898716-71-9 1. Matrix inversion. 2. Transformations (Mathematics) I. Meyer, C. D. (Carl Dean) II. Title. QA188.C36 2009 512.9'434--dc22 2008046428
SIAM is a registered trademark.
To 4ciI1L
Contents

Preface to the Classics Edition
Preface

0 Introduction and other preliminaries
1 Prerequisites and philosophy
2 Notation and basic geometry
3 Exercises

1 The Moore–Penrose or generalized inverse
1 Basic definitions
2 Basic properties of the generalized inverse
3 Computation of A†
4 Generalized inverse of a product
5 Exercises

2 Least squares solutions
1 What kind of answer is A†b?
2 Fitting a linear hypothesis
3 Estimating the unknown parameters
4 Goodness of fit
5 An application to curve fitting
6 Polynomial and more general fittings
7 Why A†?

3 Sums, partitioned matrices and the constrained generalized inverse
1 The generalized inverse of a sum
2 Modified matrices
3 Partitioned matrices
4 Block triangular matrices
5 The fundamental matrix of constrained minimization
6 Constrained least squares and constrained generalized inverses
7 Exercises

4 Partial isometries and EP matrices
1 Introduction
2 Partial isometries
3 EP matrices
4 Exercises

5 The generalized inverse in electrical engineering
1 Introduction
2 n-port networks and the impedance matrix
3 Parallel sums
4 Shorted matrices
5 Other uses of the generalized inverse
6 Exercises
7 References and further reading

6 (i, j, k)-Generalized inverses and linear estimation
1 Introduction
2 Definitions
3 (1)-inverses
4 Applications to the theory of linear estimation
5 Exercises

7 The Drazin inverse
1 Introduction
2 Definitions
3 Basic properties of the Drazin inverse
4 Spectral properties of the Drazin inverse
5 A^D as a polynomial in A
6 A^D as a limit
7 The Drazin inverse of a partitioned matrix
8 Other properties

8 Applications of the Drazin inverse to the theory of finite Markov chains
1 Introduction and terminology
2 Introduction of the Drazin inverse into the theory of finite Markov chains
3 Regular chains
4 Ergodic chains
5 Calculation of A# and w for an ergodic chain
6 Non-ergodic chains and absorbing chains
7 References and further reading

9 Applications of the Drazin inverse
1 Introduction
2 Applications of the Drazin inverse to linear systems of differential equations
3 Applications of the Drazin inverse to difference equations
4 The Leslie population growth model and backward population projection
5 Optimal control
6 Functions of a matrix
7 Weak Drazin inverses
8 Exercises

10 Continuity of the generalized inverse
1 Introduction
2 Matrix norms
3 Matrix norms and invertibility
4 Continuity of the Moore–Penrose generalized inverse
5 Matrix valued functions
6 Non-linear least squares problems: an example
7 Other inverses
8 Exercises
9 References and further reading

11 Linear programming
1 Introduction and basic theory
2 Pyle's reformulation
3 Exercises
4 References and further reading

12 Computational concerns
1 Introduction
2 Calculation of A†
3 Computation of the singular value decomposition
4 (1)-inverses
5 Computation of the Drazin inverse
6 Previous algorithms
7 Exercises

Bibliography
Index
Preface to the Classics Edition
The first edition of Generalized Inverses of Linear Transformations was written toward the end of a period of active research on generalized inverses. Generalized inverses of various kinds have become a standard and important mathematical concept in many areas. The core chapters of this book, consisting of Chapters 1–7, 10, and 12, provide a development of most of the key generalized inverses. This presentation is as up to date and readable as ever and can be profitably read by anyone interested in learning about generalized inverses and their application.
Two of the application chapters, however, turned out to be on the ground floor of the development of application areas that have gone on to become significant areas of applied mathematics.
Chapter 8 focuses on applications involving Markov chains. While the basic relation between the group inverse and the theory of Markov chains is still relevant, several advances have been made. Most notably, there has been a wealth of new results concerning the use of the group inverse to characterize the sensitivity of the stationary probabilities to perturbations in the underlying transition probabilities; representative results are found in [8, 9, 13, 16, 23, 28, 29, 36, 37, 38, 39, 40].¹ More generally, the group inverse has found applications involving expressions for differentiation of eigenvectors and eigenvalues [10, 11, 12, 14, 30, 31]. Since the original version of this book appeared, researchers in more theoretical areas have been applying the group inverse concept to the study of M-matrices, graph theory, and general nonnegative matrices [4, 5, 6, 7, 17, 18, 19, 20, 21, 22, 32, 33, 34]. Finally, the group inverse has recently proven to be fundamental in the analysis of Google's PageRank system. Some of these applications are described in detail in [26, 27].
¹Citations here correspond only to the references immediately following this preface.
Chapter 9 discusses the Drazin inverse and its application to differential equations of the form Ax′ + Bx = f. In Chapter 9 these equations are called singular systems of differential equations, and some applications to control problems are given. It turns out that many physical processes are most naturally modeled by such implicit differential equations. Since the publication of the first edition of this book there has been a major investigation of the applications, numerical solution, and theory behind such implicit differential equations. Today, rather than being called singular systems, they are more often called differential algebraic equations (DAEs) in applied mathematics and called either DAEs or descriptor systems in the sciences and engineering. Chapter 9 still provides a good introduction to the linear time invariant case, but now it should be viewed as the first step in understanding a much larger and very important area. Readers interested in reading further about DAEs are referred to the general developments [1, 3, 15, 24] and the more technical books [25, 35].
There has been, of course, some additional work on generalized inverses since the first edition was published. A large and more recent bibliography can be found in [2].
Stephen L. Campbell
Carl D. Meyer
September 7, 2008
References
[1] Ascher, U. M. and Petzold, L. R. Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations. SIAM, Philadelphia, 1998.
[2] Ben-Israel, A. and Greville, T. N. E. Generalized Inverses: Theory and Applications. 2nd ed. Springer-Verlag, New York, 2003.
[3] Brenan, K. E., Campbell, S. L., and Petzold, L. R. Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations. Classics in Appl. Math. 14, SIAM, Philadelphia, 1995.
[4] Catral, M., Neumann, M., and Xu, J. Proximity in group inverses of M-matrices and inverses of diagonally dominant M-matrices. Linear Algebra Appl. 409, 32–50, 2005.
[5] Catral, M., Neumann, M., and Xu, J. Matrix analysis of a Markov chain small-world model. Linear Algebra Appl. 409, 126–146, 2005.
[6] Chen, Y., Kirkland, S. J., and Neumann, M. Group generalized inverses of M-matrices associated with periodic and nonperiodic Jacobi matrices. Linear Multilinear Algebra 39, 325–340, 1995.
[7] Chen, Y., Kirkland, S. J., and Neumann, M. Nonnegative alternating circulants leading to M-matrix group inverses. Linear Algebra Appl. 233, 81–97, 1996.
[8] Cho, G. and Meyer, C. Markov chain sensitivity measured by mean first passage times. Linear Algebra Appl. 316, 21–28, 2000.
[9] Cho, G. and Meyer, C. Comparison of perturbation bounds for the stationary distribution of a Markov chain. Linear Algebra Appl. 335, 137–150, 2001.
[10] Deutsch, E. and Neumann, M. Derivatives of the Perron root at an essentially nonnegative matrix and the group inverse of an M-matrix. J. Math. Anal. Appl. 102, 1–29, 1984.
[11] Deutsch, E. and Neumann, M. On the first and second derivatives of the Perron vector. Linear Algebra Appl. 71, 57–76, 1985.
[12] Deutsch, E. and Neumann, M. On the derivative of the Perron vector whose infinity norm is fixed. Linear Multilinear Algebra 21, 75–85, 1987.
[13] Funderlic, R. E. and Meyer, C. Sensitivity of the stationary distribution vector for an ergodic Markov chain. Linear Algebra Appl. 76, 1–17, 1986.
[14] Golub, G. H. and Meyer, Jr., C. D. Using the QR factorization and group inversion to compute, differentiate, and estimate the sensitivity of stationary probabilities for Markov chains. SIAM J. Alg. Discrete Meth. 7, 273–281, 1986.
[15] Hairer, E. and Wanner, G. Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems. 2nd ed. Springer Ser. Comput. Math. 14, Springer-Verlag, Berlin, 1996.
[16] Ipsen, I. and Meyer, C. D. Uniform stability of Markov chains. SIAM J. Matrix Anal. Appl. 15, 1061–1074, 1994.
[17] Kirkland, S. J. and Neumann, M. Convexity and concavity of the Perron root and vector of Leslie matrices with applications to a population model. SIAM J. Matrix Anal. Appl. 15, 1092–1107, 1994.
[18] Kirkland, S. J. and Neumann, M. Group inverses of M-matrices associated with nonnegative matrices having few eigenvalues. Linear Algebra Appl. 220, 181–213, 1995.
[19] Kirkland, S. J. and Neumann, M. The M-matrix group generalized inverse problem for weighted trees. SIAM J. Matrix Anal. Appl. 19, 226–234, 1998.
[20] Kirkland, S. J. and Neumann, M. Cutpoint decoupling and first passage times for random walks on graphs. SIAM J. Matrix Anal. Appl. 20, 860–870, 1999.
[21] Kirkland, S. J., Neumann, M., and Shader, B. L. Distances in weighted trees and group inverse of Laplacian matrices. SIAM J. Matrix Anal. Appl. 18, 827–841, 1997.
[22] Kirkland, S. J., Neumann, M., and Shader, B. L. Bounds on the subdominant eigenvalue involving group inverses with applications to graphs. Czech. Math. J. 48, 1–20, 1998.
[23] Kirkland, S. J., Neumann, M., and Sze, N.-S. On optimal condition numbers for Markov chains. Numer. Math. 110, 521–537, 2008.
[24] Kumar, A. and Daoutidis, P. Control of Nonlinear Differential Algebraic Equation Systems with Applications to Chemical Processes. Chapman and Hall/CRC, Boca Raton, FL, 1999.
[25] Kunkel, P. and Mehrmann, V. Differential-Algebraic Equations: Analysis and Numerical Solution. EMS Textbooks in Math., European Mathematical Society, Zürich, 2006.
[26] Langville, A. and Meyer, C. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, Princeton, NJ, 2006.
[27] Langville, A. and Meyer, C. D. Updating Markov chains with an eye on Google's PageRank. SIAM J. Matrix Anal. Appl. 27, 968–987, 2006.
[28] Meyer, C. The character of a finite Markov chain. In Linear Algebra, Markov Chains, and Queueing Models. IMA Vol. Math. Appl. 48, Springer, New York, 1993, 47–58.
[29] Meyer, C. Sensitivity of the stationary distribution of a Markov chain. SIAM J. Matrix Anal. Appl. 15, 715–728, 1994.
[30] Meyer, C. Matrix Analysis and Applied Linear Algebra. 2nd ed. SIAM, Philadelphia, to appear.
[31] Meyer, C. and Stewart, G. W. Derivatives and perturbations of eigenvectors. SIAM J. Numer. Anal. 25, 679–691, 1988.
[32] Neumann, M. and Werner, H. J. Nonnegative group inverses. Linear Algebra Appl. 151, 85–96, 1991.
[33] Neumann, M. and Xu, J. A parallel algorithm for computing the group inverse via Perron complementation. Electron. J. Linear Algebra 13, 131–145, 2005.
[34] Neumann, M. and Xu, J. A note on Newton and Newton-like inequalities for M-matrices and for Drazin inverses of M-matrices. Electron. J. Linear Algebra 15, 314–328, 2006.
[35] Riaza, R. Differential-Algebraic Systems: Analytical Aspects and Circuit Applications. World Scientific, River Edge, NJ, 2008.
[36] Seneta, E. Sensitivity to perturbation of the stationary distribution: Some refinements. Linear Algebra Appl. 108, 121–126, 1988.
[37] Seneta, E. Perturbation of the stationary distribution measured by ergodicity coefficients. Adv. Appl. Probab. 20, 228–230, 1988.
[38] Seneta, E. Sensitivity analysis, ergodicity coefficients, and rank-one updates for finite Markov chains. In Numerical Solution of Markov Chains (W. J. Stewart, ed.). Marcel Dekker, New York, 1991, 121–129.
[39] Seneta, E. Explicit forms for ergodicity coefficients of stochastic matrices. Linear Algebra Appl. 191, 245–252, 1993.
[40] Seneta, E. Sensitivity of finite Markov chains under perturbation. Statist. Probab. Lett. 17, 163–168, 1993.
Preface
During the last two decades, the study of generalized inversion of linear transformations and related applications has grown to become an important topic of interest to researchers engaged in the study of linear mathematical problems as well as to practitioners concerned with applications of linear mathematics.
The purpose of this book is twofold. First, we try to present a unified treatment of the general theory of generalized inversion which includes topics ranging from the most traditional to the most contemporary. Secondly, we emphasize the utility of the concept of generalized inversion by presenting many diverse applications in which generalized inversion plays an integral role.
This book is designed to be useful to the researcher and the practitioner, as well as the student. Much of the material is written under the assumption that the reader is unfamiliar with the basic aspects of the theory and applications of generalized inverses. As such, the text is accessible to anyone possessing a knowledge of elementary linear algebra.
This text is not meant to be encyclopedic. We have not tried to touch on all aspects of generalized inversion, nor did we try to include every known application. Due to considerations of length, we have been forced to restrict the theory to finite dimensional spaces and neglect several important topics and interesting applications.
In the development of every area of mathematics there comes a time when there is a commonly accepted body of results, and referencing is limited primarily to more recent and less widely known results. We feel that the theory of generalized inverses has reached that point. Accordingly, we have departed from previous books and not referenced many of the more standard facts about generalized inversion.
To the many individuals who have made an original contribution to the theory of generalized inverses we are deeply indebted. We are especially indebted to Adi Ben-Israel, Thomas Greville, C. R. Rao, and S. K. Mitra whose texts undoubtedly have had an influence on the writing of this book.
In view of the complete (annotated) bibliographies available in other texts, we made no attempt at a complete list of references.
Special thanks are extended to Franklin A. Graybill and Richard J. Painter who introduced author Meyer to the subject of generalized inverses and who provided wisdom and guidance at a time when they were most needed.
S. L. Campbell
C. D. Meyer, Jr.
North Carolina State University at Raleigh
0
Introduction and other preliminaries
1.
Prerequisites and philosophy
The study of generalized inverses has flourished since its rebirth in the
early 1950s. Numerous papers have developed both its theory and its applications. The subject has advanced to the point where a unified treatment is possible. It would be desirable to have a book that treated the subject from the viewpoint of linear algebra, and not with regard to a particular application. We do not feel that the world needs another introduction to linear algebra. Accordingly, this book presupposes some familiarity with the basic facts and techniques of linear algebra as found in most introductory courses. It is our hope that this book would be suitable for self-study by either students or workers in other fields. Needed ideas that a person might well have forgotten or never learned, such as the singular value decomposition, will be stated formally. Unless their proof is illuminating or illustrates an important technique, it will be relegated to the exercises or a reference. There are three basic kinds of chapters in this book. Chapters 0, 1, 2, 3, 4, 6, 7, 10, and 12 discuss the theory of the generalized inverse and related notions. They are a basic introduction to the mathematical theory. Chapters 5, 8,9 and 11 discuss applications. These chapters are intended to illustrate the uses of generalized inverses, not necessarily to teach how to use them. Our goal has been to write a readable, introductory book which will whet the appetite of the reader to learn more. We have tried to bring the reader far enough so that he can proceed into the literature, and yet not bury him under a morass of technical lemmas and concise, abbreviated proofs. This book reflects our rather definite opinions on what an introductory book is and what it should include. In particular, we feel that the numerous applications are necessary for a full appreciation of the theory. Like most types of mathematics, the introduction of the various generalized inverses is not necessary. One could do mathematics without
ever defining a ring or a continuous function. However, the introduction
of generalized inverses, as with rings and continuous functions, enables us to more clearly see the underlying structure, to more easily manipulate it, and to more easily express new results. No attempt has been made to have the bibliography comprehensive. The bibliography of [64] is carefully annotated and contains some 1775 entries. In order to keep this book's size down we have omitted any discussion of the infinite dimensional case. The interested reader is referred to [64].
2.
Notation and basic geometry
Theorems, facts, propositions, lemmas, corollaries and examples are numbered consecutively within each section. A reference to Example 3.2.3 refers to the third example in Section 2 of Chapter 3. If the reader were already in Chapter 3, the reference would be just to Example 2.3. Within Section 2, the reference would be to Example 3. Exercise sections are scattered throughout the chapters. Exercises with an (*) beside them are intended for more advanced readers. In some cases they require knowledge from an outside area like complex function theory. At other times they involve fairly complicated proofs using basic ideas.
C (R) is the field of complex (real) numbers. C^{m×n} (R^{m×n}) is the vector space of m × n complex (real) matrices over C (R). C^n (R^n) is the vector space of n-tuples of complex (real) numbers over C (R). We will frequently not distinguish between C^n (R^n) and C^{n×1} (R^{n×1}). That is, n-tuples will be written as column vectors. Except where we specifically state otherwise, it is to be assumed that we are working over the complex field. If A ∈ C^{m×n}, then with respect to the standard bases of C^n and C^m, A induces a linear transformation à : C^n → C^m by Ãu = Au for every u ∈ C^n. Whenever we go from one of A or à to the other, it is to be understood that it is with respect to the standard basis. The capital letters A, B, C, X, Y, Z are reserved for matrices or their corresponding linear transformations. Subspaces are denoted by the capital letters M, N. Subspaces are always linear subspaces. The letters U, V, W are reserved for unitary matrices or partial isometries. I always denotes the identity matrix; if I ∈ C^{n×n} we sometimes write I_n. Vectors are denoted by b, u, v, y, etc., scalars by a, b, λ, k, etc.
R(A) denotes the range of A, that is, the linear span of the columns of A. The range of à is denoted R(Ã). Since A is derived from à by way of the standard basis, we have R(A) = R(Ã). The null space of A, N(A), is {x : Ax = 0}. A matrix A is hermitian if its conjugate transpose, A*, equals A. If A² = A, then A is called a projector of C^n onto R(A). Recall that rank(A) = tr(A) if A² = A. If A² = A and A = A*, then A is called an orthogonal projector. If A, B ∈ C^{n×n}, then [A, B] = AB − BA. The inner product between two vectors u, v ∈ C^n is denoted by (u, v). If 𝒮 is a subset of C^n, then 𝒮^⊥ = {u ∈ C^n : (u, v) = 0 for every v ∈ 𝒮}. The smallest subspace of C^n containing 𝒮 is denoted LS(𝒮). Notice that 𝒮^⊥⊥ = LS(𝒮).
Suppose now that M, N1, and N2 are subspaces of C^n. Then N1 + N2 = {u + v : u ∈ N1 and v ∈ N2}, while AN1 = {Au : u ∈ N1}. If M = N1 + N2 and N1 ∩ N2 = {0}, then M is called the direct sum of N1 and N2. In this case we write M = N1 ∔ N2. If M = N1 + N2 and N1 ⊥ N2, that is, (u, v) = 0 for every u ∈ N1 and v ∈ N2, then M is called the orthogonal sum of N1 and N2. This will be written M = N1 ⊕ N2. If two vectors are orthogonal, their sum will frequently be written with a ⊕ also. If C^n = N1 ∔ N2, then N1 and N2 are called complementary subspaces. Notice that C^n = N1 ⊕ N1^⊥. The dimension of a subspace M is denoted dim M. One of the most basic facts used in this book is the next proposition.
Proposition 0.2.1  Suppose that A ∈ C^{m×n}. Then R(A) = N(A*)^⊥.
Proof  We will show that R(A) ⊆ N(A*)^⊥ and dim R(A) = dim N(A*)^⊥. Suppose that u ∈ R(A). Then there is a v such that Av = u. If w ∈ N(A*), then (u, w) = (Av, w) = (v, A*w) = (v, 0) = 0. Thus u ∈ N(A*)^⊥ and R(A) ⊆ N(A*)^⊥. But dim R(A) = rank A = rank A* = m − dim N(A*) = dim N(A*)^⊥. Thus R(A) = N(A*)^⊥. ∎
A useful consequence of Proposition 1 is the 'star cancellation law'.
Proposition 0.2.2 (Star cancellation law)  Suppose that A ∈ C^{m×n} and B, C ∈ C^{n×p}. Then (i) A*AB = A*AC if and only if AB = AC. Also (ii) N(A*A) = N(A), and (iii) R(A*A) = R(A*).
Proof  (i) may be rewritten as A*A(B − C) = 0 if and only if A(B − C) = 0. Clearly (i) and (ii) are equivalent. To see that (ii) holds, notice that by Proposition 1, N(A*) = R(A)^⊥ and thus A*Ax = 0 if and only if Ax = 0. To see (iii), note that using (ii) we have that R(A*A) = N(A*A)^⊥ = N(A)^⊥ = R(A*). ∎
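For readers who like to experiment, both propositions are easy to check numerically. The following sketch is our own illustration and is not part of the original text; it assumes NumPy, and the random matrix and tolerance are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 5, 3
    A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

    # Orthonormal bases from the SVD: columns of U1 span R(A), columns of U2 span R(A)-perp.
    U, s, Vh = np.linalg.svd(A)
    r = np.sum(s > 1e-12)
    U1, U2 = U[:, :r], U[:, r:]

    # Proposition 0.2.1: A* annihilates exactly R(A)-perp, so R(A)-perp = N(A*).
    assert np.allclose(A.conj().T @ U2, 0)                      # U2 lies in N(A*)
    assert np.linalg.matrix_rank(A.conj().T @ U1) == r          # A* is injective on R(A)

    # Star cancellation law, part (ii): N(A*A) = N(A), so the ranks agree.
    assert np.linalg.matrix_rank(A.conj().T @ A) == np.linalg.matrix_rank(A)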
Propositions 1 and 2 are basic and will be used frequently in what follows without comment.
If M is a subspace of C^n, then we may define the orthogonal projector, P_M, of C^n onto M by P_M u = u if u ∈ M and P_M u = 0 if u ∈ M^⊥. Notice that P_{M^⊥} = I − P_M and that for any orthogonal projector P, we have P = P_{R(P)}.
It is frequently helpful to write a matrix A in block form, that is, express A as a matrix made up of submatrices A_{ij}, A = [A_{ij}]. If B is a second block matrix, then (AB)_{ij} = Σ_k A_{ik}B_{kj} and (A + B)_{ij} = A_{ij} + B_{ij}, provided the submatrices are the correct size to permit the indicated multiplications and additions.
Example 0.2.1  Let A ∈ C^{3×4} and B ∈ C^{4×2}. Partition A = [A11 A12 A13; A21 A22 A23] (with A11 = [1] and A12 = [2, 3] in the first block row) and B = [B1; B2; B3] (with B1 = [1 0] and B3 = [1, 0]), the remaining blocks being sized conformably. Then
AB = [A11 A12 A13; A21 A22 A23][B1; B2; B3] = [A11B1 + A12B2 + A13B3; A21B1 + A22B2 + A23B3].
may be viewed as matrices over a ring. We shall not do so. This notation is especially useful when dealing with large matrices or matrices which have submatrices of a special type. There is a close connection between block matrices, invariant subspaces, and projections which we now wish to develop. then M is called Definition 0.2.1 If M is a subspace of and an invariant subspace of A Ef and only (lAM = (Au : ueM} c M. If M is an invariant subspace for both A and then M is called a reducing subspace
of A.
Invariant and reducing subspaces have a good characterization in terms of projectors.
Proposition 0.2.3
Let PM and M isa subspace of be the orthogonal projector onto M. Then (i) M is an invariant subspace for A (land only if PMAPM = APM. (ii) M is a reducing subspace for A (land only (fAPM = PMA. The proof of Proposition 3 is left to the next set of exercises. Knowledge of invariant subspaces is useful for it enables the introduction of blocks of Suppose
zeros into a matrix. Invariant subspaces also have obvious geometric importance in studying the effects of on
Proposition 0.2.4 Suppose that A E CN and that M is an invariant subspace of A of dimension r. Then there exists a unitary matrix U such that
(i) A=U*IIA
Al
If M is a reducing subspace of A, then (ii) A = U*I
L°
A 11
ii
1 IU where
A22j
' A 22
Then CN = M M1. Let Proof Let M be a subspace of orthonormal basis for M and be an orthonormal basis for
be an Then
INTRODUCTION AND OTHER PRELIMINARIES
5
Order the vectors in P so that P2 is an orthonormal basis for are listed first. those in Let U be a unitary transformation that maps the standard basis for onto the ordered basis fi. Then P = P1
M
r
—
Lo
—
0]'
I
12
*
LA21
(1)
A,2
r = dim M. Suppose now that M is an invariant subspace for A. Thus PMAPM = APM by Proposition 3. This is equivalent to where
E C'
X
UPMUUAUUPMU = U*AUU*PMU. Substituting (1) into this gives IA11 L0
0]
[A11
0
0JLA,1 0
Thus A21 =0 and part (i) of Proposition 4 follows. Part (ii) follows by substituting (1) into U*PMUU*AU = U*AUU*PMU. • If AECnXn, then R(A) is always an invariant subspace for A. If A is hermitian, then every invariant subspace is reducing. In particular. R(A) is reducing. If A = A*, then there exists a unitary marix U and an
Proposition 0.2.5
irnertible hermitian matrix A1 such that A =
1
U.
Proposition 5 is. of course, a special case of the fact that every hennitian matrix is unitarily equivalent to a diagonal matrix. Viewed in this manner. it is clear that a similar result holds if hermitian is replaced by normal where a matrix is called normal if A*A = AA*. We assume that the reader is already familiar with the fact that normal and hermitian matrices are unitarily equivalent to diagonal matrices. Our purpose here is to review some of the 'geometry' of invariant subspaces and to gain a facility with the manipulation of block matrices. Reducing subspaces are better to work with than invariant subspaces, has no reducing = n> 1, does have invariant subspaces
but reducing subspaces need not always exist. A X
subspaces. Every matrix in since it has eigenvectors. And if is a set of eigenvectors corresponding to a particular eigenvalue of A, then LS(S1') is an invariant subspace for A. We shall see later that unitary and hermitian matrices are often easier to work with. Thus it is helpful if a matrix can be written as a product of such factors. If there is such a decomposition. It is called the polar form. The name comes from the similarity between it and the polar form of a complex number z = re1° where reR and 1e101 = I.
Theorem 0.2.1
If
then there exists a unitary matrix U and
hermitian matrices B,C such that A = UB = CU.
6
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
If A E cm
n, then one cannot hope to get quite as good an expression for A since A is not square. There are two ways that Theorem 1 can be extended. One is to replace U by what is called a partial isometry. This will be discussed in a later section. The other possibility is to replace B by a matrix like a hermitian matrix. By 'like a hermitian matrix' we are thinking of the block form given in Proposition 5. where m
Theorem 0.2.2
(Singular Value Decomposition) Suppose that A E Ctm 'C". Then there exist unitary matrices UeCM 'Cm and VeC" 'C", and an invertible
hermitian diagonal matrix D = Diag{a1, ... ,a,}, whose diagonal entries are the positive square roots of the eigenvalues of(A*A) repeated according
to multiplicity, such that A = u
0°]v.
The proofs of Theorems I and 2 are somewhat easier if done with the notation of generalized inverses. The proofs will be developed in the exercises following the section on partial isometrics. Two comments are in order. First, the matrix
] is not a square
matrix, although D is square. Secondly, the name 'Singular Value Decomposition' comes from the numbers fri, ... , a,) which are frequently referred to as the singular values of A. The notation of the functional calculus is convenient and will be used from time to time. If C = U Diag{A1, ... , for some unitary U and is a function defined on then we definef(C) = U Diag{f(21), ... This definition only makes sense if C is normal. If p(A) = aMA" + ... + a0, then p(C) as we have defined it here agrees with the standard definition of p(C) = aRC" + ... + a0!. For any A E C" 'Ca a(A) denotes the set of eigenvalues of A.
I
3.
Exercises
Prove that if P is hermitian, then P is a projector if and only if P3 = P2. 2. Prove that if M1 c M2 C" are invariant subspaces for A, then A is 1.
unitarily equivalent to a matrix of the form J 0
X X where X
LOOXJ
denotes a non-zero block. Generalize this to t invariant subspaces such that 3. If M2 c C" are reducing subspaces for A show that A is unitarily
Ix equivalent to a matrix in the form reducing subspaces such that M1
4. Prove Proposition 2.3.
0
01
0 X 0 1. Generalize this to t LOOXJ ...
M.
INTRODUCTION AND OTHER PRELIMINARIES X is an invariant subspace for A, then and M c 5. Prove that if A M1 is an invariant subspace for A*.
6. Suppose A = A*. Give necessary and sufficient conditions on A to guarantee that for every pair of reducing subspaces M1 , M2 of A that either M1 ±M2 or M1rM2
{O}.
7
1
The Moore—Penrose or generalized inverse
1.
Basic definitions
Equations of the form
Ax = b,AECMXA,XECA,b€Cm
(1)
X9, occur in many pure and applied problems. If A and is invertible, then the system of equations (1) is, in principle, easy to solve. The unique solution is x = A - 1b. If A is an arbitrary matrix in C"' ", then it becomes more difficult to solve (1). There may be none, one, or an infinite number of solutions depending on whether b€ R(A) and whether n-rank (A)> 0. One would like to be able to find a matrix (or matrices) C, such that solutions of (1) are of the form Cb. But if R(A), then (1) has no solution. This will eventually require us to modify our concept of what a solution of (1) is. However, as the applications will illustrate, this is not as unnatural as it sounds. But for now we retain the standard definition of solution. To motivate our first definition of the generalized inverse, consider the functional equation
(2)
wheref is a real-valued function with domain S". One procedure for solving (2) is to restrict the domain off to a smaller set is one to so that one. Then an inverse functionj' from R(J) to b°' is defined byj 1(y) = x
if xe9" andf(x) = y. Thus! '(y) is a solution of(2) for y€R(J). This is how the arcsec, arcsin, and other inverse functions are normally defined. The same procedure can be used in trying to solve equation (1). As usual, we let be the linear function from C" into C"' defined by = Ax for xeC". To make a one to one linear transformation it must be restricted to a subspace complementary to N(A). An obvious one is N(A)- = R(A*). This suggests the following definition of the generalized inverse.
Definition 1.1.1 (Functional definition of the generalized inverse)  If A ∈ C^{m×n}, define the linear transformation A† : C^m → C^n by A†x = (Ã restricted to R(A*))^{-1}x if x ∈ R(A), and A†x = 0 if x ∈ R(A)^⊥. The matrix of this transformation with respect to the standard bases is denoted A† and is called the generalized inverse of A.
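Definition 1 is constructive, and one can build A† directly from it: invert A on R(A) by sending each vector back to its unique preimage in R(A*), and send R(A)^⊥ to zero. The sketch below is our own illustration (assuming NumPy; the test matrix is arbitrary, and the least squares routine is used only because its minimum-norm solution is exactly the preimage lying in R(A*)); it compares the result with numpy.linalg.pinv.

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # 5 x 4, rank 3

    U, s, Vh = np.linalg.svd(A)
    r = np.sum(s > 1e-10)
    Ur = U[:, :r]                              # orthonormal basis of R(A)

    # For each basis vector of R(A), take its (unique) preimage in R(A*):
    # lstsq returns the minimum-norm solution, which lies in R(A*).
    W = np.linalg.lstsq(A, Ur, rcond=None)[0]  # n x r
    A_dagger = W @ Ur.conj().T                 # inverse on R(A), zero on R(A)-perp

    assert np.allclose(A_dagger, np.linalg.pinv(A))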
It is easy to check that AA†x = 0 if x ∈ R(A)^⊥ and AA†x = x if x ∈ R(A). Similarly, A†Ax = 0 if x ∈ N(A) and A†Ax = x if x ∈ R(A*) = R(A†). Thus AA† is the orthogonal projector of C^m onto R(A) while A†A is the orthogonal projector of C^n onto R(A*) = R(A†). This suggests a second definition of the generalized inverse, due to E. H. Moore.
Definition 1.1.2 (Moore definition of the generalized inverse)  If A ∈ C^{m×n}, then the generalized inverse of A is defined to be the unique matrix A† ∈ C^{n×m} such that
(a) AA† = P_{R(A)}, and
(b) A†A = P_{R(A*)}.
Moore's definition was given in 1935 and then more or less forgotten. This is possibly due to the fact that it was not expressed in the form of Definition 2 but rather in a more cumbersome (no pun intended) notation. An algebraic form of Moore's definition was given in 1955 by Penrose who was apparently unaware of Moore's work.
Definition 1.1.3 (Penrose definition of the generalized inverse)  If A ∈ C^{m×n}, then A† is the unique matrix in C^{n×m} such that
(i) AA†A = A,
(ii) A†AA† = A†,
(iii) (AA†)* = AA†,
(iv) (A†A)* = A†A.
The first important fact to be established is the equivalence of the definitions.
Theorem 1.1.1  The functional, Moore and Penrose definitions of the generalized inverse are equivalent.
Proof We have already noted that if At satisfies Definition 1. then it satisfies equations (a) and (b). If a matrix At satisfies (a) and (b) then it immediately satisfies (iii) and (iv). Furthermore (i) follows from (a) by observing that AAtA = PR(A,A = A. (ii) will follow from (b) in a similar manner. Since Definition I was constructive and the A' it constructs satisfies (a), (b) and (i)—(iv), the question of existence in Definitions 2 and 3 is already taken care of. There are then two things remaining to be proven. One is that a solution of equations (i)—(iv) is a solution of(a) and (b). The second is that a solution of(a) and (b) or (i)—(iv) is unique. Suppose then that A' is a matrix satisfying (i)—(iv). Multiplying (ii) on the left by A gives (AAt)2 = (AAt). This and (iii) show that AAt is an orthogonal projector. We must show that it has range equal to the range
of A. Using (i) and the fact that R(BC) c R(B) for matrices B and C, we get R(A) = R(AAtA) c R(AAt) R(A), so that R(A) = R(AAt). as desired. The proof that AtA = is similar and is Thus = left to the reader as an exercise. One way to show uniqueness is to show that if At satisfies (a) and (b), or (i)—(iv), then it satisfies Definition 1. Suppose then that At is a matrix satisfying (i)—(iv), (a), and (b). If then by (a), AAtx =0. Thus by (ii) A'x = AtAAtX = A'O = 0. II xeR(A), then there exist yeR(A*) such that Ay = x. But AtX = AtAy = y. The last equality follows by observing that taking the adjoint of both sides of 1x. Thus (1) gives PR(At)A = A* so that R(A) c R(At). But y = At satisfies Definition I. U As this proof illustrates, equations (i) and (ii) are, in effect, cancellation laws. While we cannot say that AB = AC implies B = C, we can say that if AtAB = AtAC then AB = AC. This type of cancellation will frequently appear in proofs and the exercises. For obvious reasons, the generalized inverse is often referred to as the Moore—Penrose inverse. Note also that if A E C" "and A is invertible, then A -' = At so that the generalized inverse lives up to its name.
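The four Penrose conditions and the two Moore conditions are easy to verify numerically for the pseudoinverse returned by a library routine. A small sketch of our own (assuming NumPy; the rank-deficient test matrix is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 6))   # 4 x 6, rank 3
    Ap = np.linalg.pinv(A)                                          # the generalized inverse A†

    # Penrose conditions (i)-(iv).
    assert np.allclose(A @ Ap @ A, A)
    assert np.allclose(Ap @ A @ Ap, Ap)
    assert np.allclose((A @ Ap).conj().T, A @ Ap)
    assert np.allclose((Ap @ A).conj().T, Ap @ A)

    # Moore conditions: AA† and A†A are the orthogonal projectors onto R(A) and R(A*),
    # here built independently from an SVD.
    U, s, Vh = np.linalg.svd(A)
    r = np.sum(s > 1e-10)
    P_RA = U[:, :r] @ U[:, :r].T           # projector onto R(A)
    P_RAstar = Vh[:r].T @ Vh[:r]           # projector onto R(A*)
    assert np.allclose(A @ Ap, P_RA)
    assert np.allclose(Ap @ A, P_RAstar)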
2.
Basic properties of the generalized inverse
Before proceeding to establish some of what is true about generalized inverses, the reader should be warned about certain things that are not true. While it is true that R(A*) = R(At), if At is the generalized inverse, condition (b) in Definition 2 cannot be replaced by AtA =
Example 1.2.1
o] = A' = XAX
?].SinceXA=AX= =
X satisfies AX =
and hence X #
and XA
At: Note that XA
=
But
and thus
X. If XA = PR( A.), AX = PR(A), and in addition XAX = X, then
X = At. The proof of this last statement is left to the exercises. In computations involving inverses one frequently uses (AB)' = B- 1A A and B are invertible. This fails to hold for generalized inverses even if AB = BA.
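A concrete numerical counterexample to the reverse order law is easy to produce; the matrices below are our own illustrative choices (not necessarily the ones used in the text's examples), and the sketch assumes NumPy.

    import numpy as np

    A = np.array([[1.0, 1.0], [0.0, 1.0]])    # invertible
    B = np.array([[1.0, 0.0], [1.0, 0.0]])    # rank one

    lhs = np.linalg.pinv(A @ B)
    rhs = np.linalg.pinv(B) @ np.linalg.pinv(A)
    assert not np.allclose(lhs, rhs)          # (AB)† differs from B†A† for this pair

    # Likewise (A†)² need not equal (A²)†: take an idempotent, non-hermitian matrix.
    C = np.array([[1.0, 1.0], [0.0, 0.0]])    # C² = C
    Cp = np.linalg.pinv(C)
    assert np.allclose(C @ C, C)
    assert not np.allclose(Cp @ Cp, np.linalg.pinv(C @ C))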
Fact 1.2.1 If as BtAt. Furthermore Example 1.2.2
then (AB)t is not necessarily the same is
not necessarily equal to (A2)t.
Ii
A2
= A while At2 =
Thus (At)2A2 =
which is not a projection.
(A2)t.
Thus (At)2 Ways of calculating At will be given shortly. The generalized inverses in Examples 1 and 2 can be found directly from Definition 1 without too much difficulty. Examples 2 illustrates another way in which the properties of the generalized inverse differ from those of the inverse. If A is invertible, then
2Eo(A) if and only
If A
in Example 2, then
—
(7(A) = {1,0} while a(At) =
If A is similar to a matrix C, then A and C have the same eigenvalues. the same Jordan form, and the same characteristic polynomial. None of these are preserved by taking of the generalized inverse.
1 Example 1.2.2
Ii B= 10
0 1
Let A =
1
1
0 —2
2
L—i
ii
—11 Then A =
BJB'
where
1J
1
10 1 0] IIandJ=IO 0 0J.ThecharacteristicpolynomialofA
Liooi
Loo2J
and J is
ro o = Ji 0
A2 and 2— 2.
LO
and the characteristic polynomial of divisors 22,(2 — 1/2).
is
—
1/2)
0
o
0 1/2
with elementary
11
An easy computation gives At = 1/12
6
2—11 0
L—' —2
6
1J
— (1 — (1 + and hence a diagonal Jordan form. Thus, if A and C are similar, then about the only thing that one can always say about At and C is that they have the same rank. A type of inverse that behaves better with respect to similarity is discussed in Chapter VII. Since the generalized inverse does not have all the properties of the inverse, it becomes important to know what properties it does have and which identities it does satisfy. There are, of course, an arbitrarily large number of true statements about generalized inverses. The next theorem lists some of the more basic properties.
But At has characteristic polynomial 1(2 —
Theorem 1.2.1 (P1)
Then
Suppose that
(At)t=A 0
(P2) (At)* = (A*)t (P3) IfAeC,(AA)t = A'At where
ifA=0. = AAAt = AtAA* (P4) (P5) (A*A)t = AtA*t
= — 412 #0 and
=0
12
At = (A*A)?A* = A*(AA*)? (P7) (UAV)t = v*A?u* where U, V are unitary matrices. (P6)
Proof We will discuss the properties in the order given. A look at Definition 2 and a moment's thought show that (P1) is true. We leave (P2) and (P3) to the exercises. (P4) follows by taking the adjoints of both A = (AAt)A and A = A(AtA). (P5), since it claims that something is a generalized inverse, can be checked by using one of the definitions. Definition 2 is the quickest. AtA*tA*A = At(A*tA*)A = A?(AAt)*A = AtAA'A = AtA = Similarly, A*AA?A*? = A*(AAt)A*t = = A*(AAt)*A*t = A*(A*tA*)A*t = (A*A*?)(A*A*?) = A*A*t = = A?A*? by Definition 2. (A*A)? = Thus = = (P6) follows from (P5). (P7) is left as an exercise, a
01t
IA
Proposition
1.2.1
•.
I
I
:
0
=I [0
Ló The proof of Proposition 1 is left as an exercise. As the proof of Theorem I illustrates, it is frequently helpful to know the ranges and null spaces of expressions involving At. For ease of reference we now list several of these basic geometric properties. (P8) and (P9) have been done already. The rest are left as an exercise. :AmJ
Theorem 1.2.2 If then (P8) R(A) = R(AAt) = R(AA*) (P9) R(At) = R(A*) = R(AtA) = (PlO) R(I — AA') = N(AAt) = = N(At) = R(A)(P11) R(I — AtA)= N(AtA) = N(A)= 3.
Computation of At
In learning any type of mathematics, the working out of examples is useful, if not essential. The calculation of At from A can be difficult, and will be discussed more fully later. For the present we will give two methods which will enable the reader to begin to calculate the generalized inverses of small matrices. The first method is worth knowing because using it should help give a feeling of what A' is. The method consists of 'constructing' At according to Definition 1.
Example 1.3.1
[1127 Let A = I 0
Ii
2
2 I . Then R(A*) is
0
1
spanned by
L'°' {
[i].
[o]. [o]}. A subset forming a basis of R(A*) is
THE MOORE—PENROSE OR GENERALIZED INVERSE
[?] =
=
roi
3
and At 4 =
I1
. I
We now
must calculate a basis for R(A)-- =
N(A*).
Lii
1
+ x4 [1/2] where
Solving the system A*x =0 we get that x = x3
—1
x3.x4eC. Then At
—11
=At
1/2
1/21
= O€C3. Combining
all
of this
01
0 3
3
—1
gives At 2
4
1/2
21 21 ri
At=I0
0 1
1
Li
1J
—1 1/2 =
1
0
0
1
0 0 0
01
oJ
[1
0
10
1
o
01
0 Olor
Lii
0
oJ
3
3
—1
—1
2
4
1/2
1,2
2
I
1
0
2
I
0
1
—l
The indicated inverse can always be taken since its columns form a basis for R(A) R(A)L and hence are a linearly independent set of four vectors in C4. Below is a formal statement of the method described in Example 1. Xfl Theorem 1.3. 1 Let AECTM have rank r. If {v1 , v2,... "r} is a basis for R(A*) and {w1 , w2, ... , is a basis for N(A*), then
Proof By using Definition I
I
I
A'[Av 1I••.I'Av,:w I
I
Furthermore,
I
I
I
I
I
I
1.1
it
is clear that
I
{Av1,Av2, ... ,Av,) must be a basis for R(A). Since R(A)1 =
N(A*), it follows that the matrix [Av1 ...
w1
...
non-singular. The desired result is now immediate. •
must be
13
14 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
The second method involves a formula which is sometimes useful. It
depends on the following fact:
Proposition
then there exists that A = BC and r = rank (A) = rank (B) = rank (C).
such
1.3.1
The proof of Proposition 1 is not difficult and is left as an exercise. It means that the next result can, in theory, always be used to calculate At. or (B*BYl if C or B have See Chapter 12 for a caution on taking small singular values. Theorem 1.3.2 If A = BC where r = rank (A) = rank (B) = rank (C), then At
Prool Notice that
= C(CC) '(BB) iB*.
and CC are rank r matrices in C' 'so that it X
makes sense to take their inverses. Let X = C*(CC*) *(B*B) IB*. We will show X satisfies Definition 3. This choice is made on the grounds that the more complicated an expression is, the more difficult it becomes geometrically to work with it, and Definition 3 is algebraic. Now AX = BCC*(CC*) i(B*B) 1B* = B(BB) 'B,so(AX) = AX. Also XA = C*(CC*) IB*BC = C*(CC*) 1C, so (XA)* = XA. Thus (iii) and (iv) hold. To check (1) and (ii) use XA = C*(CC*) 'C to get that A(XA) = BC(C*(CC*)... 'C) = BC = A. And (XA)X = C(CC) 1CC*(CC*)i x (B*B) 'B = C*(CC*) '(BB) IB* = X. Thus X = At by Definition 3.U
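The formula of Theorem 2 translates directly into code. A minimal sketch of our own (assuming NumPy; the random full-rank factors are arbitrary, and in practice one should heed the warning about small singular values, since CC* or B*B may be ill conditioned):

    import numpy as np

    def pinv_from_full_rank_factorization(B, C):
        # A = B C with rank(A) = rank(B) = rank(C) = r; returns C*(CC*)^{-1}(B*B)^{-1}B*.
        Bs, Cs = B.conj().T, C.conj().T
        return Cs @ np.linalg.inv(C @ Cs) @ np.linalg.inv(Bs @ B) @ Bs

    rng = np.random.default_rng(4)
    B = rng.standard_normal((5, 2))           # full column rank
    C = rng.standard_normal((2, 4))           # full row rank
    A = B @ C                                 # rank 2

    assert np.allclose(pinv_from_full_rank_factorization(B, C), np.linalg.pinv(A))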
Example 1.3.2
1.A=BC where
Let
BeC2'" and CeC1 x3 In fact, a little thought shows that A
Then B*B=[5],CC*=[6].ThusAt= Ill
1
2].
2
LzJ 1
is typical as the next result shows.
Theorem 1.3.3
If
and rank(A) = 1, then At = !A* where
The proof is left to the exercises. The method of computing At described in Example 1.3.1 and the method of Theorem 1.3.2 may both be executed by reducing A by elementary row operations.
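For a rank-one matrix the generalized inverse is a rescaled conjugate transpose, the scalar being the sum of the squared moduli of the entries of A (its squared Frobenius norm). A quick numerical check, our own sketch assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(5)
    u = rng.standard_normal((4, 1)) + 1j * rng.standard_normal((4, 1))
    v = rng.standard_normal((1, 3)) + 1j * rng.standard_normal((1, 3))
    A = u @ v                                  # a rank-one 4 x 3 matrix

    alpha = np.sum(np.abs(A) ** 2)             # = tr(A*A), the squared Frobenius norm
    assert np.allclose(A.conj().T / alpha, np.linalg.pinv(A))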
Definition 1.3.1 A matrix echelon form .fE is of the form
which has rank r is said to be in row
(1) (m — r) X
THE MOORE—PENROSE OR GENERALIZED INVERSE
where the elements c, of C (
15
= C,. satisfy the following conditions,
(1)
(ii) The first non-zero entry in each row of C is 1. (iii) If = 1 is the first non-zero entry of the ith row, then the jth column of C is the unit vector e1 whose only non-zero entry is in the ith position.
For example, the matrix
120 —2 3501 400 E= 0 0 0 0 0 000 0000 000 0000 00
1
1
3
(2)
is in row echelon form. Below we state some facts about the row echelon form, the proofs of which may be found in [65]. For A e CTM "such that rank (A) = r:
(El) A can always be row reduced to row echelon form by elementary row operations (i.e. there always exists a non-singular matrix P€C" such that PA = EA where EA is in row echelon form). (E2) For a given A, the row echelon form EA obtained by row reducing A is unique. (E3) If Eq is the row echelon form for A and the unit vectors in EA appear in columns i2,... , and i,, then the corresponding columns of A are a basis for R(A). This particular basis is called the set of distinguished columns of A. The remaining columns are called the undistinguished columns of A. (For example, if A is a matrix such that its row echelon form is given by (2) then the first, third, and sixth columns of A are the distinguished columns. (E4) If EA is the row echelon form (1) for A, then N(A) = = N(C). (ES) If (1) is the row echelon form for A, and if the matrix made up of the distinguished columns of A (in the same order as they are in A), then A = BC where C is obtained from the row echelon form. This is a full rank factorization such as was described in Proposition 1.
Very closely related to the row echelon form is the hermite echelon form. However, the hermite echelon form is defined only for square matrices.
Definition 1.3.2
tf its elements
A matrix
is said to be in hermite echelon form satisfies the following conditions.
(i) H is upper triangular (i.e. =0 when i (ii) is either 0 or 1. (iii) If h,1 =0, then h, = Ofor every k, 1 k (iv) If h1, = 1, then hkg = Ofor every k # I.
>j). n.
16
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
For example, the matrix
120 0000 3501 000 —2 4 0 0 00 H=000 0000 000 0000 000 0013 000 0000 1
is in hermite form. Below are some facts about the hermite form, the proofs of which may be found in [65). For can always be row reduced to a hermite form. If A is reduced to its row echelon form, then a permutation of the rows can always be performed to obtain a hermite form. (H2) For a given matrix A the hermite form HA obtained by row reducing A is unique. (H3) = HA (i.e. HA is a projection). (H4) N(A) = N(HA) = R(I — HA) and a basis for N(A) is the set of non-zero columns of I — HA. A
We can now present the methods of Theorems 1 and 2 as algorithms.
Algorithm 1.3.1
To obtain the generalized inverse of a square matrix
Ae CA
Row reduce A* to its hermite form HA,. (II) Select the distinguished columns of A*. Label these columns v, and place them as columns in a matrix L. '1' (III) Form the matrix AL. (IV) Form I — HA. and select the non-zero columns from this matrix. Label these columns w1 ,w2,... , (V) Place the columns of AL and the w1's as columns in a matrix (I)
M= rows of M -
and compute M '.(Actually only the first r l are
needed.)
(VI) Place the first r rows of M '(in the same order as they appear in M ')in a matrix called R. (VII) Compute At as At = LR. Although Algorithm 1 is stated for square matrices, it is easy to use it for non-square matrices. Add zero rows or zero columns to construct a square
matrix and use the fact that Algorithm 1.3.2 inverse for any
=
= [AtjO*].
To obtain the full rank factorization and the generalized
THE MOORE—PENROSE OR GENERALIZED INVERSE 17
(I) Reduce A to row echelon form EA.
(II) Select the distinguished columns of A and place them as the columns in a matrix B in the same order as they appear in A. (III) Select the non-zero rows from EA and place them as rows in a matrix C in the same order as they appear in EA. (IV) Compute (CC*yl and (B*B)'. (V) Compute A' as At = C*(CC*) l(B*BY We will use Algorithm 1 to find At where
Example 1.3.3
ri A
4 6
2
1
—10 0
1
1
0 0
1
40
12 Lo
(I) Using elementary row operations on A* we get that its hermite echelon form is
10
10
00
01
H
(H) The first, second and fourth columns of A* are distinguished. Thus
ft 2 0 12 4 0 L=11 0
0
6
1
(III) Then
AL=
22 34
34 56
5
6
41 61
461 1
(IV) The non-zero column oil — HA. is = [— 1, 1/2. 1,0]*. (V) Putting AL and w1 into a matrix M and then computing M' gives r22
34
M— 134
56 6
J5
4 6
—
40
—20
19
14
1
46 —4 —44
0
40
20
1
1/2
1
61
(VI) The first three rows of M ' give R as 40 i 1
—20 14
50 —26
[—46—4 -44
—90 18
342
50 —26
40
—
901
18!
oJ
18 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
(VII) Thus
4 —1 —27
1
2 20 0
—
Example 1.3.4
A=
8
—10 0
—2 —54 25
—45
0
45
We will now use Algorithm 2 to fmd At where
12141 24066 0 24066 1
2
3
3
(I) Using elementary row operations we reduce A to its row echelon form
Ii
2
0
3
31
10
0
1
1
—21
EA=10
00
0
01
L0000
oJ
(II) The first and third columns are distinguished. Thus
B=[? 1]. (III) The matrix C is made up of the non-zero rows of EA so that
20 Lo 0
(IV) Now CC*
1
3
3
1
—2
[23
B*B =[1?
—
Calculating
=
we get
(V) Substituting the results of steps (II), (III) and (IV) into the formula
for At
At = C*(CC*) l(B*B) 1B*
=1
27 54 207 288 —333
6
3
6
12
6
12
—40 —20 —40 —22 —11 —22 98
49
98
THE MOORE—PENROSE OR GENERALIZED INVERSE
19
Theorem 2 is a good illustration of one difficulty with learning from a
text in this area. Often the hard part is to come up with the right formula. To verify it is easy. This is not an uncommon phenomenon. In differential equations it is frequently easy to verify that a given function is a solution. The hard part is to show one exists and then to find it. In the study of generalized inverses, existence is usually taken care of early. There remains then the problem of finding the right formula. For these reasons, we urge the reader to try and derive his own theorems as we go. For example, can you come up with an alternative formula to that of Theorem 2? The resulting formula should, of course, only involve At on one side. Ideally, Then ask yourseff, can I do better by it would not even involve B' and imposing special conditions on B and C or A? Under what conditions does the formula simplify? The reader who approaches each problem, theorem and exercise in this manner will not only learn the material better, but will be a better mathematician for it. 4.
Generalized inverse of a product
As pointed out in Section 2, one of the major shortcomings of the
Moore—Penrose inverse is that the 'reverse order law' does not always hold, that is, (AB)' is not always BtAt. This immediately suggests two questions. What is (AB)'? When does (AB)t = BtAt? The question, 'What is (AB)'?' has a lot of useless, or non-answer, answers. For example, (AB)' = is a non-answer. It merely restates condition (ii) of the Penrose definition of (AB)t. The decision as to whether or not an answer is an answer is subjective and comes with experience. Even then, professional mathematicians may differ on how good an answer is depending on how they happen to view the problem and mathematics. The authors feel that a really good answer to the question, 'What is (AB)'?' does not, and probably will not exist. However, an answer should:
(A) have some sort of intuitive justification if possible; (B) suggest at least a partial answer to the other question, 'When does
= B'A'?' Theorem 4.1 is, to our knowledge, the best answer available. We shall now attack the problem of determining a formula for (AB)'. The first problem is to come up with a theorem to prove. One way to come up with a conjecture would be to perform algebraic manipulations on (AB)' using the Penrose conditions. Another, and the one we now follow, is to draw a picture and make an educated guess. If that guess does not work, then make another. Figure 1.1 is, in a sense, not very realistic. However, the authors find it a convenient way to visualize the actions of linear transformations. The vertical lines stand for CTM, and C. A sub-interval is a subspace. The rest of the interval is a (possibly orthogonal) complementary subspace. A
20 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS C
1'
— C
C,,
---—=A '45R (5) Fig. 1.1
shaded band represents the one to one mapping of one subspace onto Xfl another. It is assumed that A€C'" and BEC" ". In the figure: (a, c) =
R(B*), (a', c') = R(B), (b', d') = R(A*), and (b", d") = R(A). The total shaded band from C" to C" represents the action of B (or The part that is shaded more darkly represents PR(A.)B = AtAB. The total shaded area
from CN to C'" is A. The darker portion is APR(B) = ABBt. The one to one from C" to C'" may be viewed as the v-shaped portion of the mapping dark band from C" to C'". Thus to 'undo' AB, one would trace the dark band backwards. That is, (AB)t = (PRA.B)t(APRB)t. There are only two things wrong with this conjecture. For one,
AB = A(AtA)(BBt)B
(1)
and not AB = A(BBt)(A?A)B. Secondly, the drawing lies a bit in that the interval (b', c') is actually standing for two slightly skewed subspaces and not just one. However, the factorization (I) does not help and Fig. 1.1 does seem to portray what is happening. We are led then to try and prove the following theorem due to Cline and Greville.
Theorem 1.4.1
IfA€C'"
X
and BEC"
X
then (AB)t = (PR(A. )B)(APR(B)).
Proof Assuming that one has stumbled across this formula and is wondering if it is correct, the most reasonable thing to do is see if the formula for (AB)t satisfies any of the definitions of the generalized inverse. Let X = (PR( A.)B)'(APR(B))t = (AtAB)t(ABBt)t. We will proceed as follows. We will assume that X satisfies condition (i) of the Penrose definition. We will then try and manipulate condition (1) to get a formula that can be verified independently of condition (1). If our manipulations are reversible, we will have shown condition (i) holds. Suppose then that ABXAB = AB, or equivalently, AB(AtAB)?(ABBt)tAB = AB.
(2)
Multiply (2) on the left by At and on the right by Bt. Then (2) becomes A?AB(AtAB)t(ABBt)tABBt = AtABBt,
(3)
THE MOORE—PENROSE OR GENERALIZED INVERSE
or,
21
A.PR(B1• Equivalently,
BVA' ) =
(4)
= To see that (4) is true, we will show that if E1
=
R(BPBB' R(A),E2 =
(5)
Suppose first that u€R(B)-. Then E1u = E2u =0. Suppose then that ueR(B). Now to find E1u we will need to But BBtR(A*) is a subspace of R(B). Let u = u1 calculate R(A*) and u2e[BBtR(A*)]1 n R(B). Then where u1 then E1u = E2u for all
=
R(BPBB'
=
R(B)U1 =
(6)
AtAu1.
(7)
Equality (7) follows since u1 ER(B) and the projection of R(B) onto AtAR(B) is accomplished by A'A. Now E2u = PR(A.)PR(B)u = PR(A.)u = AtAii, SO = A'Au1, that is, if U2EN(A) = (4) will now follow provided that R(A*)J.. Suppose then that vER(A*). By definition, u2 BBty. Thus o = (BBtv,u2) = (V,BBtU2) = (v,u2). Hence u2eR(A*)I. as desired. (4) is now established. But (4) is equivalent to (3). Multiply (3) on the left by A and the right by B. This gives (2). Because of the particular nature of X it turns out to be easier to use ABXAB = AB to show that X satisfies the Moore definition Of(AB)t. Multiply on the right by (AB)t. Then we have But N(X) c N((ABBt)t) = R(ABBt)1 = R(AB). ABXPR(AB) = Thus XPR(As)
=
and hence
ABX=PR(AB).
(8)
)XAB = Now multiply ABXAB = AB on the left by (AB)t to get B•A• But R(X) c R((AtAB)t) = R((AtAB)*) = R(B*A*A*t) = R(B*A*) and hence BA X = X. Thus XAB=PR((AB).).
(9)
Equations (8) and (9) show that Theorem 1 holds by the Moore definition.
U Comment It is possible that some readers might have some misgivings about equality (7). An easy way to see that R(B)U1 = AtAu1 is as EAtAR(B). But follows. Suppose that u1 ER(B). Then A(u1 — AtAu1)= Au1 — Au1 = 0. Thus u1 — AtAu1eR(A*)1 c(AfAR(B))L. Hence u1 = AtAU1 (u1 — Thus PR(AAB)u3 = A'Au1. The formula for simplifies if either = I or = I.
Corollary 1.4.1 Suppose that AECTM "and BeC" P (i) If rank (A) = n, then (AB)' = Bt(APR(a,)t. )B)A. (ii) If rank (B) = n, then (AB)' =
22 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Part (i) (or ii) of Corollary 1 is, of course, valid if A (or B) is invertible.
and rank(A)= rank(B) = n. Then (AB)t = BtAt and At = A*(AA*)l while Bt = (B*B) 1B*. The formulas for At and Bt in Corollary 1 come from Theorem 3.2 and B In fact Corollary 1 can be and the factoring of A = derived from Theorem 3.2 also. The assumptions of Corollary 2 are much more stringent than are necessary to guarantee that (AB)t = BtAt.
Corollary 1.4.2
Example 1.4.1
Suppose that
LetA=
10101 10
0
1a001
ii andB=
10
andBt=
01 where a,b,ceC.
LOOd
L000J 10001 0 01
b
10
bt
cland(AB)t= Ibt 0 0l.Ontheotherhand ctOJ L000J Lo 0
10001
BtAt= Ibt 0 LO
sothat(AB)t=BtAt.NoticethatBA=
ctOJ
IOaOl 10 0 bI. L000J
By varying the values of a,b and cone can see that (AB)t = B?At
possible without
(i)AB=BA (a#b) (ii) rank(A)=rank(B) (iii)
(a=b=0,c=1)
rank(AB)=rank(BA) (a=b#0,c=0).
The list can be continued, but the point has been made. The question
remains, when is (AB)t = BtAt? Consider Example 1 again. What was it about that A and B that made (AB)t = BtAt? The only thing that A and B seem to have in common are some invariant subspaces. The subspaces R(A), R(A), N(A), and N(A) are all invariant for both A and B. A statement about invariant subspaces is also a statement about projectors. A possible method of attack has suggested itself. We will assume that = BtAt. From this we will try to derive statements about projectors. In Example 1, MA and B were simultaneously diagonalizabk so we should see if MA works in. Finally, we should check to see if our conditions are necessary as well as sufficient. Assume then that and
= BtAt.
(10)
Theorem 1 gives another formula for (AB)'. Substitute that into (10) to get (AtAB)t(ABBt)t = B?At. To change this into a projector equation,
THE MOORE—PENROSE OR GENERALIZED INVERSE
23
multiply on the left by (A'AB)and on the right by (ABBt), to give
PR(A'AB) P
—'P R(A') PR(B)1
11
By equation (4), (11) can be rewritten as
and
=
hence is a projector. But the product of two hermitian projectors is a projector if and only if the two hermitian projectors commute. Thus (recall that [X, Y] = XY — YX) = 0, or equivalently, [A'A,BB'] = 0, If(AB)t = B?At. (12)
Is (12) enough to guarantee (10)? We continue to see if we can get any additional conditions. Example 1 suggested that an AA term might be useful. If(10) holds, then ABBtAI is hermitian. But then A*(ABBtAt)A is hermitian. Thus A*(ABBt)A?A = AtABBtA*A. (13) Using (12) and the fact that or
= A5, (13) becomes
=
{A*A,BBtJ = 0.
(14)
Condition (14) is a stronger condition than (12) since it involves AA and not JUSt
AA )•
In Theorem 1 there was a certain symmetry in the formula. It seems unreasonable that in conditions for (10) that either A or B should be more important. We return to equation (10) to try and derive a formula like (14) but with the roles of A and B 'reversed'. Now BtAtAB is hermitian is hermitian. Proceedsince we are assuming (10) holds. Thus ing as before, we get that
[BB*,AtA]=O.
(15)
Continued manipulation fails to produce any conditions not implied by
(14) and (15). We are led then to attempt to prove the following theorem. Theorem 1.4.2
Suppose that A e C'" statements are equivalent:
Xfl
and BE C"
X
Then the following
(AB)' = BtAt. (ii) BB*AtA and A*ABBt are hermitian. (iii) R(A5) is an invariant subspace for BB5 and R(B) is an invariant (1)
subs pace of A5A.
= 0 and =0. AtABBSAS (v) = BB5A5 and BB'A5AB = A5AB.
(iv)
Proof Statement (iii) is equivalent to equations (14) and (15) so that (i) implies (ii). We will first show that (ii)—(v) are all equivalent. Since BBS and A5A are hermitian, all of their invariant subspaces are reducing. Thus (ii) and (iii) are equivalent. Now observe that if C is a matrix and M a
24 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
= (I — = then + PMCPtI. This + is an invariant subspace if and only if =0. Thus says that (iii) and (iv) are equivalent. Since (v) is written algebraically, we will show it is equivalent to (ii). Assume then that BB*AtA is hermitian. Then BB*AtA = AtABB*. Thus BB*(AtA)A* = A?ABB*A*, or subspace,
BB*A* = AtABB*A*.
(16)
Similarly if A*ABBt is hermitian. then BBtA*AB = A*AB
(17)
so that (ii) implies (v). Assume now that (16) and (17) hold. Multiply (17) on the right by Bt and (16) on the right by A*t. The new equations are precisely statement (iii) which is equivalent to (ii). Thus (ii)—(v) are all equivalent. Suppose now that (ii)—(v) hold. We want to prove (1). Observe that BtAt = Bt(BBt)(AtA)At = Bt(AtA)(BBt)At, while by Theorem I (AB)t = Theorem 2 will be proven if we can show that
(AtAB)t=Bt(AtA) and
(18)
(ABBt)t = BBtAt.
(19)
To see 11(18) holds, check the Penrose equations. Let X = Bt(AtA). Then AtABX = AtABB?AtA = AtAAtABBt = A?ABBt = A'ABB') = Thus Penrose conditions (i) and (iii) are satisfied. Now X(AtAB) = Bt(AtA)(AtA)B = Bt(AtA)B. Thus X(AtAB)X = Bt(AtA)BB?(AtA) = BtAtA = X and Penrose condition (ii) is satisfied. There remains then only to show that X(A'AB) is hermitian. But A?ABB* is hermitian by assumption (ii). Thus B?(A?ABB*)Bt* is hermitian and Bt(A?A)BB*B*t = Bt(AtA)B = X(A'AB) as desired. The proof of (19) is similar and left to
the exercises. S It is worth noticing that conditions (ii)—(v) of Theorem 1.4.2 make Fig. 1.1 correct. The interval (b', c') would stand for one subspace R(AtABB?) rather than two skewed subspaces, R(AtABBt) and R(BBtAtA). A reader might think that perhaps there is a weaker appearing set of conditions than (ii)—(v) that would imply (ABt) = BtAt. The next Example shows that even with relatively simple matrices the full statement of the conditions is needed. 10 0 01 ri 0 0] Example 1.4.2 Let A = 10 1 and B = 10 1 0 I. Then B = B* = Lo i oJ Lo 0 OJ 1
10
00 ] 100
Bt=B2andAt=IO 11 Lo
i]'I=Io 0
Li oJ J
[0 i
01
1I.N0wBB*AtAis
—ij
hermitian so that [BB*, AtA] = 0 and [BBs, AtA] =0. However,
10001
A*ABBt = 10 2 LO
I
which is not hermitian so that (AB)t # B'At. 0J
THE MOORE—PENROSE OR GENERALIZED INVERSE
25
An easy to verify condition that implies (AB)t = BtAt is the following.
Corollary 1.4.3 IJA*ABB* = BB*A*A, then (AB)t = B'A'. The proof of Corollary 2 is left to the exercises. Corollary 2 has an advantage over conditions (ii)—(iv) of Theorem 2 in that one does not have to calculate At. Bt, or any projectors to verify it. It has the disadvantage that it is only sufficient and not necessary. Notice that the A, B in Example I satisfy [AtA, BB*] =0 while those in Example 2 do not. There is another approach to the problem of determining when (AB)t = BtAt. It is to try and define a different kind of inverse of A, call it, A so B A . This approach will not be discussed. that (AB)
5. 1.
Exercises Each of the following is an alternative set of equations whose unique solution X is the generalized inverse of A. For each definition show that it is equivalent to one of the three given in the text.
(a) AX = PR(A),MX ) = N(A). (b) AX = PR(A),XA = PR(A.),XAX = X. (c) XAA* =
d
A*,XX*A* = x.
XAx—J"
if
10 ifxeN(A*).
(e) XA = PR(A),N(X) = N(A*). Comment: Many of these have appeared as definitions or theorems in the
literature. Notice the connection between Example 2.1, Exercise 1(b), and Exercise 3 below. 2. Derive a set of conditions equivalent to those given in Definition 1.2 or Definition 1.3. Show they are equivalent. Can you derive others? 3. Suppose that A€Cm Xfl Prove that a matrix X satisfies AX = XA = if and only if X satisfies (i), (iii), and (iv) of Definition 1.3. Such an X is called a (1,3,4)-inverse of A and will be discussed later. Observe that it cannot be unique if it is not At since trivially At is also a (1,3,4)-inverse.
4. Calculate At from Theorem 3.1 when A
and when
=
Hint: For the second matrix see Example 6. 5. Show that if rank (A) =
1,
then At = !A* where k = 1=1 j= 1
6.
If and rank (A) = n, notice that in Theorem 3.2, C may be chosen as an especially simple invertible matrix. Derive the formula
26 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
for At under the assumption that rank A = n. Do the same for the case
when rank (A) = 0 f 0 0 7. Let A = 1
I
0
L—1
1
0
21 I. Cakulate At from Definition 1.1. 1
—1 0]
8. Verify that (At)* = (A*)t. 9. Verify that (AA)t = AtAt, AEC, where ,V = 10. If AeCTM
Xiv
1
if A
0 and 0' =0.
and U, V are unitary matrices, verify that (UAV)t =
V*AtU*. 11. Derive an explicit formula for A' that involves no (t)'s by using the singular value decomposition.
*12. Verify that At =
*13. Verify the At =
I 2iri
f !(zI z
— A*A) IA*dz where C is a closed contour
containing the non-zero eigenvalues of A*A, but not containing the zero eigenvalue of A*A in or on it.
14. Prove Proposition 1.1. Exercises 15—21 are all drawn from the literature. They were originally done by Schwerdtfeger, Baskett and Katz, Greville, and Erdelyi. Some follow almost immediately from Theorem 4.1. Others require more work.
such that rank A = rank B and if the eigenvectors corresponding to non-zero elgenvalues of the two matrices ASA and BB* span the same space, then (AB)' = B'A'. and AAt = AtA,BBt = BtB,(AB)tAB= 16. Prove that if AEC AB(AB)', and rank A = rank B = rank(AB), then (AB)' = BtAt. Be 17. Assume that AECTM Show that the following statements are equivalent. 15. Prove that if AECrnXv and
X
(i) (AB)t = B'A'. (ii) A?ABB*A*ABBt = BB*A*A. (iii) A'AB = B(AB)'AB and BBtA* = A*AB(AB)t. (iv) (AtABBt)* = (AtABBt)t and the two matrices ABBtAt and UtAtAB are both hermitian.
18. Show that if [A, = 0, [At, (Bk] =0, [B, =0, and [Bt,PR(A,)] = 0, then (AB)' = 19. Prove that if A* = At and B* = Bt and if any of the conditions of Exercise 18 hold, then (AB)' = BtAt = B*A*. 20. Prove that if A* = A' and the third and fourth conditions of Exercise 18 hold, then (AB)' = BtAt = BtA*. 21. Prove that if B* = B' and the first and second conditions of Exercise 18 hold, then (AB)t = BtAt. 22. Prove that if [A*A, BB*] =0, then (AB)t = B'A'.
THE MOORE—PENROSE OR GENERALIZED INVERSE
27
Verify that the product of two hermitian matrices A and B is hermitian if and only if [A, B] =0. 24. Suppose that P. Q are hermitian projectors. Show that PQ is a projector if and only if [P, Q] =0. = 25. Assuming (ii)—(v) of Theorem 8, show that (APR(B))t = BtAt. PR(B)A without directly using the fact that (AS)' = 26. Write an expression for (PAQ)t when P and Q are non-singular. 27. Derive necessary and sufficient conditions on P and A, P non-singular, 23.
for (P 'AP)' to equal P 'A'P. X
28. Prove that if Ae Br and the entries of A are rational numbers, then the entries of At are rational.
2
Least squares solutions
1.
What kind of answer is Atb?
At this point the reader should have gained a certain facility in working with generalized inverses, and it is time to find out what kind of solutions they give. Before proceeding we need a simple geometric lemma. Recall \1/2 if w = [w1, ... , w,3*GCP, then l( w = E 1w112) = (w*w)U2 that denotes the Euclidean norm of w.
Lemma 2.1.1
Ifu,veC"and(u,v)=O,thenhlu+v112=11u112+11v112.
Proof Suppose that
and (u,v) =0. Then
llu+v112 =(u+v,u+v)=(u,u)+(v,u)+(u,v)+(v,v)= hull2 + 11,112. Now consider again the problem of finding solutions u to (1)
If(1) is inconsistent, one could still look for u that makes Au — as possible.
b
as small
Definition 2.1.1
Suppose that and b€Cm. Then a vector is called a least squares solution to Ax = b if II Au — b Av — b for all veC'. A vector u is called a minimal least squares solution to Ax = b tf u is a least squares solution to Ax = b and liii < w for all other least squares solutions w.
The name 'least squares' comes from the definition of the Euclidean
norm as the square root of a sum of squares. If beR(A), then the notions of solution and least squares solution obviously coincide. The next theorem speaks for itself. Theorem 2.1.1
Suppose that AeCtm'" and beCTM. Then Atb is the
minima! least squares solution to Ax = b.
LEAST SQUARES SOLUTIONS 29
Proof Notice that IIAx—b112 = II(Ax
(I -AA')b112
—
= IlAx —
+ 11(1 —AAt)b112.
Thus x will be a least squares solution if and only if x is a solution of the consistent system Ax = AAtb. But solutions of Ax = AAtb are of the form
AtA)h = — A'A)h. Since 11x112 = IIAtbII2 we see that there is exactly one minimal least squares solution x = Atb. As a special case of Theorem 2.1.1, we have the usual description of an orthogonal projection.
x=
—
Corollary 2.1.1
Suppose that M is a subs pace of
•
and
is the
orthogonal projector of onto M. If beCI*, then PMb is the unique closest vector in M to b with respect to the Euclidean norm.
In some applications, the minimality of the norm of a least squares solution is important, in others it is not. If the minimality is not important, then the next theorem can be very useful.
Theorem 2.1.2 Suppose that AECTM "and beCTM. Then the following statements are equivalent (1) u is a least squares solution of Ax = b,
(ii) u is a solution of Ax = AAtb, (iii) ii is a solution of A*Ax = A*b. (iv) u is of the form A'b + h where heN(A).
Proof We know from the proof of Theorem 1 that (i), (ii) and (iv) are equivalent. If (1) holds, then multiplying Au = b on the left by A* gives (iii). On the other hand, multiplying A*Au = A*b on the left by A*t gives Au = AA'b. Thus (iii) implies (ii). U Notice that the system of equations in statement (iii) of Theorem 2 does not involve At and is a consistent system of equations. They are called the normal equations and play an important role in certain areas of statistics.
It was pointed out during the introduction to this section that if X satisfies AXA = A, and be R(A), then Xb is a solution to (1). Thus, for consistent systems a weaker type of inverse than the Moore—Penrose would suffice. However, if then the condition AXA = A is not enough to guarantee that Xb is a least squares solution.
Fact There exist matrices X, A and vector b, R(A), such that AXA = A but Xb is not a least squares solution of Ax = b. Example 2.1.1
If X satisfies AXA = A, then X is of the
Let A
= form 1
Lx21
squares
1. Let b =
X22J
1 1. Then by Theorem 2 a vector u isa least L1J
solution to Ax = b if and only if Ax = b1 where b1
least squares solution, then IIAu — bil =
11b1 —
bli
=
1.
r1i
If u is a
= Loi [1 +2X1 2].
But A(Xb) =
30 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
= (1 +41 x12 12)1/2. If x22 0 then A(Xb) — b > I and Thus A(Xb) — Xb will not be a least squares solution Ax = b.
Example 2 also points out that one can get least squares solutions of the form Xb where X is not At. Exactly what conditions need to be put on X to guarantee Xb is a least squares solution to Ax = b will be discussed in Chapter 6.
2.
Fitting a linear hypothesis
Consider the law of gravitation which says that the force of attraction y
between two unit-mass points is inversely proportional to the square of the distance d between the points. If x = l/d2. then the mathematical formulation of the relationship between y and x isy =f(x) = /ix where fi is an unknown constant. Because the functionf is a linear function, we consider this to be a linear functional relationship between x and y. Many such relationships exist. The physical sciences abound with them. Suppose that an experiment is conducted in which a distance d0 between two unit masses is set and the force of attraction y0 between them is measured. A value for the constant fi is then obtained as fi = y0/x0 = However, if the experiment is conducted a second time, one should not be greatly surprised if a slightly different value of fi is obtained. Thus, for the purposes of estimating the value of fi, it is more realistic to say that for each fixed value of x, we expect the observed values yj of yto satisfy an equation of the form y1 = fix + e1 where ej is a measurement error which occurs more or less at random. Furthermore, if continued observations of y were made at a fixed value for x, it is natural to expect that the errors would average out to zero in the long run. Aside from measurement errors, there may be another reason why different observations of y might give rise to different values of/i. The force of attraction may vary with unknown quantities other than distance (e.g. the speed of the frame of reference with respect to the speed of light). That is, the true functional relationship may be y = fix + g(u1 , u2, ... ,UN) where the function g is unknown. Here again, it may not be unreasonable to expect that at each fixed value of x, the function g assumes values more or less at random and which average out to zero in the long run. This second type of error will be called functional error. Many times, especially in the physical sciences, the functional relationship between the quantities in question is beyond reproach so that measurement error is the major consideration. However, in areas such as economics, agriculture, and the social sciences the relationships which exist are much more subtle and one must deal with both types of error. The above remarks lead us to the following definition.
Definition 2.2.1 When we hypothesize that y is related linearly to x1 ,x2, ... ,xN, we are hypothesizing that for each set of values p1 = (x11, x12,... , x1)for x1, x2,... ,x,,, the observations y1for y atp1 can be expressed where(i)/i0,fi1,...,fiare asy,=fi0+fi1x11 +fi2x12 + ...
LEAST SQUARES SOLUTIONS
31
unknown constants (called parameters). (ii) e,1 is a value assumed by an unknown real valued function e, such that e, has the property that the values which it assumes will 'average out' to zero over all possible observat ions y at p..
That is, when we hypothesize that y is related linearly to x3 , x2, ...
,;,
are hypothesizing that for each point p = (x1, , the 'expected ... , value', E(,y.), of the observation y. at (that is, the 'average observation' at we
p1) satisfies the equation
=
+
+
+ ... +
and not that y1 does. This can be easily pictured in the case when only two
variables are involved. Suppose we hypothesize that y is related linearly to the single variable x. This means that we are assuming the existence of a linef(x) = II,,, + such that each point (x1, E(y.)) lies on this straight line. See Fig. 21.
In the case when there are n independent variables, we would be hypothesizing the existence of a surface in (which is the translate of a subspace) which passes through the points (p1. E(y.)). We shall refer to such a surface as a flat. In actual practice, the values E(y.) are virtually impossible to obtain exactly. Nevertheless, we will see in the next section that it is often possible to obtain good estimates for the unknown parameters, and therefore produce good estimates for the E(y.)'s while also producing a reasonable facsimile of the hypothesized line of flat. The statistically knowledgeable reader will by now have observed that we have avoided, as much as possible, introducing the statistical concepts which usually accompany this type of problem. Instead, we have introduced vague terms such as 'average out'. Admittedly, these terms being incorporated in a definition would (and should) make a good mathematician uncomfortable. However, our purpose in this section is to examine just the basic aspects of fitting a linear hypothesis without introducing statistical
The set of oil
E(.Vm)
possible
at x,,, set of all possible observations
4The set of oil possible observations at x1
A'
Fig. 2.1
32 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
concepts. For some applications, the methods of this section may be
sufficient. A rigorous treatment of linear estimation appears in Chapter 6. In the next two sections we will be concerned with the following two basic problems.
(I) Having hypothesized a linear relationship, y, = + + ... + for the unknown parameters P1. 1=0,1,... find estimates , n. + e,, (II) Having obtained estimates for the p1's, develop a criterion to help decide to what degree the functionf(x1 , x2, ... , = Po + + ... + 'models' the situation under question. 3.
Estimating the unknown parameters
We will be interested in two different types of hypotheses.
Definition 2.3.1
When the term
is present in the expression (1)
we shall refer to (1) as an intercept hypothesis. When does not appear, we will call (1) a no intercept hypothesis. Suppose that we have hypothesized that y is related linearly to by the no intercept hypothesis x1 , x2,...
,;
y1 = P1;1
+ P2;2 + ... +
(2)
+ e,1.
To estimate the parameters P1 , select (either at random or by ... , design) a set of values for the x's. Call them p1 = [x11,x12, ... ,x1J. Then observe a value for y at p1 and call this observation y1. Next select a second set of values for the x's and call them p2 = [x21 ,x22, ... ,x2j(they need not be distinct from the first set) and observe a corresponding value for y. Call it y2. Continue the process until m sets of values for the x's and m observations for y have been obtained. One usually tries to have rn> n. If the observations for the x's are placed as rows in a matrix
xli x21
xml
pm_i
which we will call the design matrix, and the observed values for y are
placed in a vector y = [y1 ,...
we may write our hypothesis (2) as
y=Xb+e, where b is the vector of unknown parameters b =
(3)
,... , p,,JT and e, is the
unknown e,,= [e,1,... ,e,,JT.
In the case of an intercept hypothesis (1) the design matrix X1 in the equation
y=X,b,+e,
(4)
LEAST SQUARES SOLUTIONS 33
takes on a slightly different appearance from the design matrix X which
arose in (3). For an intercept hypothesis. X1
is
of the form
ri I
I
[1 x2 1
I
i=I
:
Li
Xmi
and b3 is of the form
I
Xm2 b1
L'
maJipnx(n+1)
= [IJ0IbT]T,
=
b=
Consider a no intercept hypothesis and the associated matrix equation b is to use the (3). One of the most useful ways to obtain estimates information contained in X and y, and impose the demand that 6 be a vector such that 'X6 is as close to y as possible', or equivalently, 'e, is as close to 0 as possible. That is, we require 6 to be a least squares solution of Xb = y. Therefore, from Theorem 1.2, any vector of the form 6= Xty + h,heN(X), could serve as an estimate for b. If X is not of full column rank, to select a particular estimate one must impose further restrictions on 6. In passing, we remark that one may always impose the restriction that 11611 be minimal among all least squares estimates so that the desired estimate is 6= Xty. Depending on the application, this may or may not be the estimate to choose. defined by x6 = is For each least squares estimate 6 of b, the values', E(y), where an estimate for the vector of
E(y)= [E(y1),... Although it is a trivial consequence of Theorem 1.2, it is useful to observe the following. The vector = X6 is the same for all least sjuares solutions 6, of X6 = Moreover,for all least squares solutions 5, = x6 = = XXty and r = y — = = (I — XXt)y.
Theorem 2.3.1
A closely related situation is the following. Suppose (as is the case in
many applications) one wishes to estimate or predict a value for a particular linear combination (5)
of the
on the basis of previous observations. Here,
= [c1 ,... , ce].
That is, we want to predict a value, y(c*), for y at the point = [C1, c2, ... , cjon the basis of observations made at the points p1 , p2, ... p,,,. If = cab, we know that it may be possible to have we use = infinitely many estimates 6. Hence y(c*) could vary over an infinite set of values in which there may be a large variation. However, there when c*b is invariant among all least squares estimates 6, so that y(c*) has a unique value.
Theorem 2.3.2
Let
The linear form
least squares solutions of X6 = c*16
= C*X?Y.
invariant among all y !f and only iice R(X*); in which case, is
34 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Proof If S is a least squares solution of X6 = y, then S is of the form 6 = X'y + h where heN(X). Thus c"S = cXty + ch is invariant if and only if c*h =0 for all hEN(X). That is, ceR(X*) = N(X)L. • Note that in the important special case when X has full column rank, cb is trivially invariant for any Most of the discussion in this section concerned a no intercept hypothesis and the matrix equation (3). By replacing X by X1 and b by b,, similar remarks can be made about an intercept hypothesis via the matrix equation (4).
4.
Goodness of fit
Consider first the case of a no intercept hypothesis. Suppose that for
various sets of values p, of the x's, we have observed corresponding values y, for y and set up the equation
y=Xb+e,.
(1)
From (1) we obtain a set of least squares estimates parameters ,81
,
for the
...
As we have seen, one important application is to use the to estimate or predict a value for y for a given point = [c1 ,c2, ... ,cJ by means
of what we shall refer to as the estimating equation (2)
How good is the estimating equation? One way to measure its effectiveness is to use the set of observation points which gave rise to X and measure how close the vector y of observed values is to the vector of estimated values. That is how close does the flat defined by (2) come to passing From through the points (ps, y— Theorem 3.1 we know that is invariant among all least squares estimates S and that y — = r = 11(1— XXt)y II. One could be tempted to say that if r Ills small, then our estimating
equation provides a good fit for our data. But the term 'small' is relative. If we are dealing with a problem concerning distances between celestial objects, the value r = 10 ft might be considered small whereas the same value might be considered quite large if we are dealing with distances between electrons in an atomic particle. Therefore we need a measure of relative error rather than absolute error. Consider Fig. 2.2. This diagram Suggests another way to measure how close y is to The magnitude of the angle 9 between the two vectors is such a measure. In Ce', it is more convenient to measure Icos 0,, rather than I°I, by means of the equation cos 0i_j!.tll —
_IIxxtyII —
(Throughout, we assume y 0, otherwise there is no problem.) LikeWise, Isin 0, or stan might act as measures of relative error. Since y can be
LEAST SQUARES SOLUTIONS 35
R
Fig. 2.2
decomposed into two components, one in R(X) and the other in R(X)--, y = + r, one might say, in rough terms, that Icos 01 represents the 1. percentage ofy which lies in R(X). Let R = cos 0 so that 0 RI Notice that if IRI = 1, then all of y is in R(X),y = and r = 0. If R = 0, then y j.. R(X), =0, and r = y. Thus when I RI = 1, the flat defined by the actually passes 131x1 + j32x2 + ... + equationf(x1,x2, ... through each of the data points (p1. y1) so that we have an 'exact' or as possible and 'perfect' fit. When R =0, y is as far away from we say that we have no fit at all. In practice, it is common to use the term R2 = cos2 0 = I! 112/Il 112 rather than I RI to measure the goodness of fit. Note that R2 RI since R 1. Thus R2 is a more conservative measure. For example, if R2 = 0.96, one would probably consider the fit to be fairly good since this indicates that, in fact, about 98% of y lies in R(X). A familiar form for R2, when all numbers are real, is /.,. \2 1=1
R2—
1=1
1=1
where
denotes the ith entry of XXty = and y. is the ith entry of y.
This follows because II
112
[y*yJctyj = [y*fl = 1=1
Hence R2 =
Notice that R and R2
2 II 3' II
r=
is not. In statistical circles, R goes by the name of the product moment correlation and R2 is known as the between the observed y1's and the predicted are unit free measures whereas
coefficient of determination.
y
—
36 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Consider now the case of the intercept hypothesis
= Po + Pixji + ... +
+
As mentioned earlier, this gives rise to the matrix equation y = X,b + e,
where X, = [i!X1. If one wishes to measure the goodness of fit in this case, one may be
tempted to copy the no intercept case and use IIx,x,yII
(3
This would be a mistake. In using the matrix X,, more information would be used than the data provided because of the first column, j. The
expression (3) does not provide a measure of how well the flat
fitsthedatapoints Instead, the expression (3) measures how well the flat in C".2 fits the points = + + f(xO,xl,x2, ... (1, p.. y.). In order to decide on a measure of goodness of fit, we need the following fact. = The vector 6, = Theorem 2.4.1 Let X, ,fl,J' = [p0:ST]bT = [Ps.... ,$j, isa least squares solution of X,b, = y and only if = !j*(y — X6) (4) and 6 is a least squares solution of (5)
Here J =
is
a matrix of ones.
Proof Suppose first that
satisfies (4) and that Sis a least squares b"]' is a least squares solution of solution of(5). To show that = X,b, = y, we shall use Theorem 1.2 and show that 6, satisfies the normal = equations, Note first that
lw
'Lxi —I
Therefore, x1'x,S, = (6)
Xb) +
j*y
I X*Jy
—
x*JxG + x*x6
LEAST SQUARES SOLUTIONS
Since S is a least squares solution of (5), we
is a solution of
/
/ \
I
'
mj\ mj 1
1
/
\2
/ \
know from Theorem 2.1 that 6 I
'
mj\ mj 1
\*/ /
/ =II__J) 1
37
1
(7)
\
orthogonal projector onto N(J)). the equation (7) is equivalent to X*X6 — !X*JXS = X*Y
or
—
!X*Jy — !X*JX6 + X*X6 = X*Y.
(8)
= which proves that 6, is a = = y. Conversely, assume now that is a least
Therefore, (6) becomes x,x,61
least squares solution of squares solution of x16, = y. Then 61 satisfies the normal equations = That is, fl0 and S must satisfy m
—
I
—
Direct multiplication yields (9)
X*j$0 + X*X6 = X*y.
Equation (9) implies that value of X*Jy —
(10)
=
into (10) yields
X*JX6 + X*X6 =
—
X6), which is (4). Substituting this
—
X6) + X*X6 = X*y or equivalently,
which is equation (9). Hence, 6
satisfies (7) so that S is a least squares solution of (i
(i
—
—
j )y, and the theorem is proven.
Let XMand yMdenote the matrices XM =
(i !J)y and let i1 =!
and =
(i
—
—
!
and
y.. That is,
xS =
YM
=
the mean of
—
the jth column of X, is the mean of the values assumed by thejth independent variable Likewise, is just the mean of all of the observations y, for y.
38 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
and YM are
Therefore,
the matrices
I x21—X1 X22—X2
XMI
:
1Y2—YI
I'YMI
x,,2—i2
I
Theorem 1 says that in obtaining least squares solutions of X,51 = y, X15 = YM In we are really only obtaining least squares solutions effect, this says that fitting the intercept hypothesis + P,x, + y. = /1,,, + fl1x13 + fl2x12 + by the theory of least squares is equivalent to fitting the no intercept hypothesis
=
—
+ P2(x,2
—
—
x2) +
+
—
i.,) +
Thus, a measure of goodness of fit for an intercept hypothesis is 2
p2_ —
M
JM
— —
2
;
2 2
A familiar form for R2 when all numbers are real is
(11)
[91_irJ2) i—i
where y. is the ith entry of y and (11), we must first prove that
=
is the ith entry of = X,Xy. To prove (12)
—
To see that (12) is true, note that so that by Theorem 1,
is a least squares solution of X,,6 = y
[!j*y — L
a least squares solution of = y. Thus all least squares solutions of X161 = y are of the form 61=s+ h where heN(X1). Because Xy is a least squares solution of = y, there must be a vector b0e N(X1) such that is
= X1(x + h0) = X1s = !jj*(y —
= s + h0. Therefore,
= !Jy =
+
—
+
= !Jy + (i —
+ =
+
from which (12) follows. Now observe that (12)
LEAST SQUARES SOLUTIONS 39
implies that the ith entry (YM)j of
=
is
given by (13)
We can now obtain (11) as 2
R2=
YM
4
=
IIYMII2
IIYMIIIIY%jII
(14)
=
By using (13) along with the definition of YM we see that (14) reduces to (11). We summarize the preceding discussion in the following theorem.
Theorem 2.4.2
For the no intercept hypothesis
= fl1x11 +
+ ... +
+ e,. the number,
)2 XXty 112_Il
R2 —
112
-
11y1I2
1=1
1=1
is a measure of goodness offit. For the intercept hypothesis, = fl0 + measure of goodness offit is given by + + ... + fi
)2 R2
YM
where XM =
=
-
11211
=11
(i
—
=
IIYMII
YM =
(i
—
Here X1
and
is the ith entry of
andJ =jj*,j =[1,...,lJ*.
In each case 0 R2 1 and is free of units. When R2 = 1, the fit is exact and when R =0, there is no fit at all. 5.
An application to curve fitting
Carl Friedrich Gauss was a famous and extremely gifted scientist who lived
from 1777 to 1855. In January of 1801 an astronomer named G. Piazzi briefly observed and then lost a 'new planet' (actually this 'new planet' was the asteroid now known as Ceres). During the rest of 1801 astronomers and other scientists tried in vain to relocate this 'new planet' of Piazzi. The task of finding this 'new planet' on the basis of a few observations seemed hopeless. Astronomy was one of the many areas in which Gauss took an active interest. In September of 1801, Gauss decided to take up the challenge of
40 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
finding the lost planet. Gauss hypothesized an elliptical orbit rather than
the circular approximation which previously was the assumption of the astronomers of that time. Gauss then proceeded to develop the method of least squares. By December, the task was completed and Gauss informed the scientific community not only where to look, but also predicted the position of the lost planet at any time in the future. They looked and it was where Gauss had predicted it would be. This extraordinary feat of locating a tiny, distant heavenly body from apparently insufficient data astounded the scientific conununity. Furthermore, Gauss refused to reveal his methods. These events directly lead to Gauss' fame throughout the entire scientific community (and perhaps most of Europe) and helped to establish his reputation as a mathematical and scientific genius of the highest order. Because of Gauss' refusal to reveal his methods, there were those who even accused Gauss of sorcery. Gauss waited until 1809 when he published his Theoria Motus Corporum Coelestiwn In Sectionibus Conicis Dolem Ambientium to systematically develop the theory of least squares and his methods of orbit calculation. This was in keeping with Gauss' philosophy to publish nothing but well polished work of lasting significance. Gauss lived before linear algebra as such existed and he solved the problem of finding Ceres by techniques of calculus. However, it can be done fairly simply without calculus. For the sake of exposition, we will treat a somewhat simplified version of the problem Gauss faced. To begin with, assume that the planet travels an elliptical orbit centred about a known point and that m observations were made. Our version of Gauss' problem is this.
Problem A
Suppose that (x1 , y1), (x2 , y2),... , (x,,,, y,,) represent the m
coordinates in the plane where the planet was observed. Find the ellipse in standard position x2/a2 ÷ y2/b2 = 1, which comes as close to the data
points as possible. If there exists an ellipse which actually passes through the m data points, then there exist parameters = 1/a2, = 1/b2, which satisfy each of the m equations fl1(x1)2 + $2(y,)2 = 1 for I = 1,2, ... , m. However, due to measurement error, or functional error, or both, it is reasonable to expect =1 that no such ellipse exists. In order to find the ellipse fl1x2 + which is 'closest' to our m data points, let
=
1
for I = 1,2,... ,m.
Then, in matrix notation, (1) is written as I
e11 e21
X2
eJ
x2
;i LP2J
'
(1)
LEAST SQUARES SOLUTIONS 41
or e = Xb — j. There are many ways to minimize the
could require that
1e11
be minimal
or that max
For example, we I,1e21,...
be
i—I
minimal. However, Gauss himself gave an argument to support the claim
that the restriction that minimal
e 112 be
(2)
i—i
gives rise to the 'best closest ellipse', a term which we will not define here.
(See Chapter 6.) Intuitively, the restriction (2) is perhaps the most reasonable if for no other reason than that it agrees with the usual concept of euclidean length or distance. Thus Problem A can be reformulated as follows.
Problem B
Find a vector 6=
, fl2]T
that is a least squares solution
of Xb = j. From Theorem 2.4, we know that all possible least squares solutions are of the form 6 = Xtj + h where hEN(X). In our example, the rank of XeC"2 wilibe two unless for each i= l,2,...,m; in which case
the data points line on a straight line. Assuming non-colinearity, it follows that N(X) = {O} and there is a unique least squares solution 6= Xtj = (X*X) 1X"j. That the matrix X is of full column rank is characteristic of curve fitting by least squares techniques. Example 2.5.1 We will find the ellipse in standard position which comes as close as possible to the four data points (1, 1), (0,2), ( — 1,1), and (— 1,2).
ThenX=[? the least squares solution to X6 = j and e = I( X6 — if is approximately = I is the ellipse that fits 'best' (Fig. 2.3). A measure of goodness of fit is R2 = f 2/lIi 112 * 0.932 ( * means 'approximately equal to') which is a decent fit. Notice that there is nothing in the working of Problem B that forced Xtj to have positive coefficients. If instead of an ellipse, we had tried to fit a hyperbola in standard position to the data, we would have wound up with the same least squares problem which has only an ellipse as a least squares solutions. To actually get a least squares problem equivalent to Problem A it would have to look something like this: 0.5. Thus j71x2 +
II
Problem C f Au — b H
Find a vector u with positive coefficients such that for all v with positive coefficients.
Av — b
II
The idea of a constrained least squares problem will not be discussed
here. It is probably unreasonable to expect to know ahead of time the
42 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
0
OzOo$o point Fig. 2.3 The ellipse
x2
+1 yz =
orientation of a trajectory. Perhaps a more reasonable problem than
Problem A would be:
Problem 0 Given some data points find the conic section that provides the 'closest fit.' For convenience assume that the conic section does not go through the origin. Then it may be written as ax2 + by2 + cx + dy +fxy = I. If we use the same four data points of the previous example there are many least squares solutions all of which pass through the four data points. The minimal least squares solution is a hyperbola.
Polynomial and more general fittings
6.
In the previous section we were concerned with fitting a conic to a set of
data points. A variation which occurs quite frequently is that of trying to find the nth degree polynomial (1)
Usually one has which best fits m data points (x1 ,yj, (x2,y2), ... rn> n + 1; otherwise there is no problem because jim n + 1, then the interpolation polynomial of degree m — I n provides an exact fit.
We proceed as before by setting (2) J—o
Thus
Il I! es,,
XNS
XM*XJ
fbi
y11
II]
yJ '
LEAST SQUARES SOLUTIONS 43
or e = Xb — y. If the restriction that e be minimal is imposed, then a closest nth degree polynomial to our data points has as its coefficients Where the are the components of a least squares solution b = Xty h,heN(X) of Xb = y. Notice that if the xi's are all distinct then X has full column rank. (It is an example of what is known as a Vandermonde segment.) Hence N(X) = {O} and there is a unique least = (X*X) 1X*y. squares solution, 5= To measure goodness of fit, observe that (2) was basically an intercept hypothesis so that from Theorem 4.2, one would use the coefficient of 112/Il YM determination R2 = A slightly more general situation than polynomial fitting is the following. Suppose that you are given n functions g1(x) and n linear functions I. of Now suppose that you are given m data points k unknown parameters (x1, y,). The problem is to find values so that
... is as close to the data points as possible. Let
WJk; and define e. =
x=:
=
+ ... +
!,gjx1) — y.. Then the corresponding matrix
e= [e1,...,e,,JT,b= [$1
equation ise= g1(x1) g1(x2)
,
g2(x1) ... g2(x2) ...
Note that this problem is equivalent to finding values so that is as close to the data points as y = fl1L1(x) + fl2L2(x) + ... + possible where L1(x) is a linear combination of the functions g1(x),
To insure that e is minimal, the parameters must be the components of a least squares solution 6,, of XW6W = y. By Theorem 1.7 we have 5w
=
x1)W) (XPR(w))Y + Ii, he N(XW).
In many situations W is invertible. It is also frequently the case that X has full column rank. If W is invertible and N(X) = {O}, then N(XW) = (0) so that (2) gives a unique least squares solution
= W 'Xty = W i(X*x) IX*y = (X*XW) 'X1y. Example 2.6.1
We will find parameters
$2. and
(3) so
that the
function
f(x)= $1x + fi2x2 + fi3 (sin x) best fits the four data points (0,1),
(, 3). and (it 4). Then we will
44 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
find the
so that
f(x) = (fir + ft2)x + (P2 + = P1x + + x2) +
+ sin x + sinx)
best fits the same four data points. In each case we will also compute R2, the coefficient of determination. In the first case, we seek least squares solutions of = y where
000
1
1
2
X=4irir
3
4
In this case, X has full column rank so that the least squares solution is unique and is given by = Xty [9.96749, — 2.76747, — 5.82845]T. Also
0.9667
so that we have a fairly good fit.
11101
Inthesecondcase,W= 10
1
1
LOOLJ
solution is given by
fi
—
1
10
1
Lo
0
ii 1 9.967491
ii iJ
1—2.767471 L—
5.82845J
1 6.906511
=
I
3.060981.
[— 5.82845J
Since N(X) = {O} and W is invertible we have XW(XW)t = XWW XXt and 2
=
IIXW(XW)'y12
hO In closing this section we would like to note that while in curve fitting problems it is usually possible to get X of full column rank. There are times when this is not the case. In the study of Linear Statistical Models, in say Economics, one sometimes has several that tend to move together (are highly correlated). As a result some columns of X, if not linearly dependent, will be nearly so. In these cases Xis either not of full rank or is ill-conditioned (see Chapter 12) [71]. The reader interested in a more statistically rigorous treatment of least squares problems is referred to Section 6.4.
LEAST SQUARES SOLUTIONS 45
7.
WhyAt?
It may come as a surprise to the reader who is new to the ideas in this book that not everyone bestows upon the Moore—Penrose inverse the same central role that we have bestowed upon it so far. The reason for this disfavour has to do with computability. There is another type of inverse, which we denote for now by A' such that A'b is a least squares solution to Ax = b. (A' is any matrix such that AA' = The computation of A' or A'b frequently requires fewer arithmetic operations than the computation of At. Thus, if one is only interested in finding a least squares solution, then A'b is fine and there would appear to be no need for At. Since this is the case in certain areas, such as parts of statistics, they are usually happy with an A'b and are not too concerned with At. Because they are useful we will discuss the A' in the chapter on other types of inverses (Chapter 6). We feel, however, that the generalized inverse deserves the central role it has played so far. The first and primary reason is pedagogical. A' stands for a particular matrix while A' is not unique. Two different formulas for an A' of a given matrix A might lead to two different matrices. The authors believe very strongly that for the readers with only an introductory knowledge of linear algebra and limited 'mathematical maturity' it is much better to first learn thoroughly the theory of the Moore—Penrose generalized inverse. Then with a firm foundation they can easily learn about the other types of inverses, some of which are not unique and some of which need not always exist.
Secondly, a standard way to check an answer is to calculate it again by a different means. This may not work if one calculates an A' by two different techniques, for it is quite possible that the two different correct approaches will produce very different appearing answers. But no matter how one calculates A' for a given matrix A, the answer should be the same.
3
Sums, partitioned matrices and the constrained generalized inverse
1.
The generalized inverse of a sum
For non-singular matrices A, B, and A + B, the inverse of the sum is rarely the sum of the inverses. In fact, most would agree that a worthwhile
expression for (A +'is not known in the general case. This would tend to make one believe that there is not much that can be said, in general, about (A + B)t. Although this may be true, there are some special cases which may prove to be useful. In the first section we will state two results and prove a third. The next sections apply the ideas of the first to develop computational algorithms for At and prove some results on the generalized inverse of special kinds of partitioned matrices. Our first result is easily verified by checking the four Penrose conditions of Definition 1.1.3. Theorem 3.1.1 If A, B€Cm Xn and if AB* = 0 and B*A = 0, then (A + B)t = At + Bt. The hypothesis of Theorem 1 is equivalent to requiring that R(A*) R(B*) and R(A) j. R(B). Clearly this is very restrictive. If the hypothesis of Theorem I is relaxed to require only that R(A*) j.. R(B*), which is still a very restrictive condition, or if we limit our attention to special sums which have the form AA* + BB*, it is then possible to prove that the following rather complicated formulas hold.
j
Theorem 3.1.2 IfAeC BeCTM ", then (AA* + BB*)? = (I — Ct*B*)At [I — AtB(I — CtC)KB*At*JAt(I — BCt) + C'C' where C = (I — AAt)B, K = [I + (I — CtC)B*At*AtB(I — CtC)] '. If A, and AB* = 0, then (A + B)t = At +(I — AtB)[C? +(I — CtC) x KB*At*At(I — where C and K are defined above. Since Theorem 2 is stated only to give the reader an idea of what types of statements about sums are possible and will not be used in the sequel, its proof is omitted. The interested reader may find the proof in Cline's paper [30].
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 47
We will now develop a useful formulation for the generalized inverse of
a particular kind of sum. For any matrix BeC'" ",the matrix B can be denote the written as a sum of matrices of rank one. For example, let matrix in C'" "which contains a I in the (i.j)th position and 0's elsewhere. then If B = (1)
is the sum of rnn rank one matrices. It can be easily shown that if rank (B) = then B can be written as the sum of just r matrices of rank one. Furthermore, if FECrnx? is any matrix of rank one, then F can be written as the product of two vectors, F = cd*, where ceC'", deC". Thus BeC'" can always be written as (2)
Throughout this chapter e1 will denote a vector with a I in the ith place and zeros elsewhere. Thus if ... ,eJ C'", then {e1 ,... ,e,,J would be the standard basis for C'". If B has the decomposition given in (1), let = where e1eC'". Then the representation (2) assumes the form B= It should be clear that a representation such as (2) is not unique. Now if one had at one's disposal a formula by which one could g-invert (invert in the generalized sense) a sum of the form A + cd* where ceC'" and deC", then B could be written as in (2) and (A + B)? could be obtained by
recursively using this formula. In order for a formula for (A + cd)' to be useful, it is desirable that it be of the form (A + cd)' = At + G where G a matrix made up of sums and products of only the matrices A, At, c, d, and their conjugate transposes. The reason for this requirement will become clearer in Sections 2 and 3. Rather than present one long complicated expression for (A + cd*)f Exercise 3.7.18.), it is more convenient to consider the following six logical possibilities which are clearly exhaustive. (1)
and
and 1 + d*Afc arbitrary; and 1 + d*A?C =0;
(ii) ceR(A) and (iii) ceR(A) and d arbitrary and 1 + dA'c #0; (iv) and deR(A*) and 1 + d*Atc =0;
(v) c arbitrary and deR(A) and 1 + d*Afc #0; (vi) ceR(A) and deR(A*) and I +d*Afc=0. Throughout the following discussion, we will make frequent use of the fact that the generalized inverse of a non-zero vector x is given by = x*/IIx 112 where lix 112 =(x,x). Theorem 3.1.3 For CEC'" and dEC" let k = the column AtC, h = the row d*At, u = the column (I — AAt)c, v = the row d*(I — AtA), and = the scalar I + d*AtC. (Notice that ceR(A) and only if u =0 and
48 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
deR(A*)
and only if,
= 0.) Then the generalized inverse of(A + cd*) is
as follows. (1)
Ifu#Oand,#O,then(A+cd*)t=At_kut _vth+p,tut.
(ii)
If u = 0 and #0, then (A + cd*)? = A' +
and
(iv)
qt = _QI')2k*At
÷k).
where p1 =
—
= 11k0211v112+1P12.
Ifu#0,,=O,andfl=0,then(A+cd*)t=At_Athth_ku?.
(v) Ifv=OandP#0, then (A +cd*)t =At
whereP2=_(0")Ath*+k). 02 = 11h11211u112 + 1P12. (vi) Ifu=O,,=O,andp=0,then(A+cd*)t=At_kktAt_
Athth + (ktAtht)kh. Before proving Theorem 1, two preliminary facts are needed. We state them as lemmas.
IA
Lemma 3.1.1
L'
uli—i.
—PJ
Proof This follows immediately from the factorization
IA+cd* L
0*
ci —li—Lb
O1FA
'JL'
ullI kill
0
—PJL0" lJLd*
Lemma 3.1.2 If M and X are matrices such that XMMt = X and MtM = XM, then X = Mt.
Proof Mt = (MtM)Mt = XMMt = X. U We now proceed with the proof of Theorem 3. Throughout, we assume
c #0 and d #0.
Proof of (i). Let X1 denote the right-hand side of the equation in (i) and let M = A + cd*. The proof consists of showing that X1 satisfies the four Penrose conditions. Using Mt =0, dv' = 1, = — 1, and c — Ak = AAt it is easy to see that MX1 = + wit so that the thfrd Penrose condition holds. Using UtA =0, utc = 1, be = — 1, and d* — hA = v, one obtnint X1M = AtA = and hence the fourth condition holds. The first and second conditions follow easily. Proof of (ii). Let X2 denote the right-hand side of the equality (ii). By using = 0,d*vt = 1, and d*k = — 1, it is seen that (A + cd*)X2 = AA', Ak = C,
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 49
which is hermitian. From the facts that ktAtA = kt. hc = — I, and d* — hA = v, it follows that X2(A + cd*) = A'A — kk' + v'v, which is also
hermitian. The first and second Penrose conditions are now easily verified.
Proof of (iii) This case is the most difficult. Here u = 0 so that CER(A) and hence it follows that R(A + cdi c R(A). Since fi 0 it is clear from Lemma 1 that rank (A + cd*) = rank (A) so that R(A + cdi = R(A). Therefore
+
(A +
= AAt
(3)
because AAt is the unique orthogonal projector onto R(A). Let X3 denote the right-hand side of the equation in (iii). Because = it follows immediately from (3) that X3(A + cd*)(A + cd*)t = X3. Hence the first condition of Lemma 2 is satisfied. To show that the second condition of Lemma 2 is also satisfied, we first The matrix AtA — show that (A + cd*)?(A + cd*) = AtA — kk' + kkt + is hermitian and idempotent. The fact that it is hermitian is clear and the fact that it is idempotent follows by direct computation using AtAk = k, AtAp1 = — k, and kkt p1 = — k. Since the rank of an idempotent matrix is equal to its trace and since trace is a linear function, it follows = Tr(AtA) — = Tr(A'A — kkt + that rank(AtA — kkt + Tr(kkt) + Now, kkt and are idempotent matrices of rank = trace = 1 and AtA is an idempotent matrix whose rank is equal to rank (A), so that rank(AtA — kk' + rank(A + cd*). (4) 1,d*p1 = I —a1fl',and UsingthefactsAk=c,Ap1 = d*AtA = — v, one obtains (A + cd*)(AtA — kkt + p1 = A + cd* — 2, c(v + flk' + 'p11). Now, II so that = k 2a1 I P1 2=PhIklL2 and hence +fiIIkII = — v _flkt. 112
Ii
Thus, (A + cd*)(AtA — kkt + p1p'1) = A + cd*. Because A'A — kkt + is an orthogonal projector, it follows that R(A* + dc*) c R(AtA — kkt + By virtue of(4), we conclude that R(A* + dc*) = R(AtA — kkt + or p1p'1), and hence(A* + dc*)(A* ÷ dci' = AtA — kk' + equivalently, (A + + cdi = AtA — kkt + p1 To show that X3(A + cd*) = A'A + p1 — kkt, we compute X3(A + cd*). 1, Observe that k*AfA = k*, = — and + d* = — 11v112P1k+v. 1
Now, X3(A + cd*)
/
1
a, = AtA +
—
= AtA + !,*k* —
a,
a,
—
+
" k2 - v*)d* P
p,d* — a1
+ d*)
ft
+ p,d*
50 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=AtA+!v*k*__p,(v_ Write v as = — fill k fl + parentheses and use the fact that X3(A + cd*) = Since
+
AtA+
and substitute this in the expression in 2 p1 = 1fi12c' Ilk II to obtain p1k*.
+ p, P,
+
II
2k =
—
k —k —k = — kk'. we arrive at X3(A + cd*) = A'A + Thus (A + cd*)t(A + cd*) = X3(A + cd*) so that X3 = (A + Cd*)t by Lemma 2. Proof of (iv) and (v). (iv) follows from (ii) and (v) follows from (iii) by
taking conjugate transposes and using the fact that for any matrix = (Me)'. h'h and AtA — kkt is an Proof of (vi). Each of the matrices AAt — orthogonal projector. The fact that they are idempotent follows from AA'h' = lit, MA' = h,A'Ak = k and ktAtA = kt. It is clear that each is hermitian. Moreover, the rank of each is equal to its trace and hence each has rank equal to rank (A) — 1. Also, since u =0, v =0, and fi =0, it follows from Lemma 1.1 that rank(A + cd*) = rank(A) — 1. Hence, rank(A + cd*) = rank (AA'
—
hth)
= rank (AAt
—
k'k).
(5)
With the facts AAtC = c, hc = — I, and hA = d*, it is easy to see that (AAt — hth)(A + cd*) = (A + cd*), so that R(A + cd*) c R(AAt — h'h). Likewise, using d*A?A = d*, d*k = — 1, and Ak = c, one sees that (A + cd*)(AtA — kk') = A + cd*. Hence R(A* + dc*) R(A'A — kk'). By virtue of (5), it now follows that
(A + cd*)(A + cd*)t = AA'
—
h'h, and
(6)
(A + cd*)t(A + cd*) = A'A
—
kk'.
(7)
If X4 denotes the right-hand side of (vi), use (6) and the fact that bAA' = h to obtain X4(A + cd*)(A + cd*)t = X4 which is the first condition of Lemma 2. Use k'A'A = kt, hA = d*, and hc = — 1 to obtain X4(A + cd) =
AtA — kk'. Then by (7), we have that the second condition of Lemma 2 is satisfied. Hence X4 = (A + cd*)f
Corollary 3.1.1
•
When ceR(A), deR(A*), and
inverse of A + cd* is given by (A +
0, the generalized
= At —
At
—
Proof Setv=Oin(iii),u=Oin(v). U Corollary 1 is the analogue of the well known formula which states that
if both A and A + cd are non-singular, then (A + cdT' =
A'
—
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
2.
51
Modified matrices
At first glance. the results of Theorem 1.3 may appear too complicated
to be any practical value. However, a closer examination of Theorem 1.3 will reveal that it may be very useful and it is not difficult to apply to a large class of problems. Suppose that one is trying to model a particular situation with a mathematical expression which involves a matrix A C'" XsI and its generalized inverse A'. For a variety of reasons, it is frequently the case that one wishes to modify the model by changing one or more entries of A to produce a 'modified' matrix A, and then to compute At. The modified model involving A and A' may then be analysed and compared with the original model to determine what effects the modifications produce. A similar situation which is frequently encountered is that an error is discovered in a matrix A of data for which A' has been previously calculated. It then becomes necessary to correct or modify A to produce a matrix A and then to compute the generalized inverse A' of the modified matrix. In each of the above situations, it is highly desirable to use the already known information; A,A' and the modifications made. in the computation of A' rather than starting from scratch. Theorem 1.3 allows us to do this since any matrix modification can always be accomplished by the addition of one or more rank one matrices. To illustrate these ideas, consider the common situation in which one wishes to add a scalar to the (i,j)th entry of A€C'" to produce the modified matrix A. Write A as where Write A' as A' =
= [c1 ...
(1)
rr =
[r
That is, g•j denotes the (i,j)-entry of At, c1 is the ith column of At, and r1 is the ith row of At. The dotted lines which occur in the block matrix of At are included to help the reader distinguish the blocks and their arrangement. They will be especially useful in Section 3 where some blocks have rather complicated expressions. To use Theorem 1.3 on the modified matrix (1), order the computation as follows.
Algorithm 3.2.1 To g-invert the modified matrix A + (I) Compute k and h. This is easy since k = Ate1 and h =
(II) Compute u and v by u = e1 — Ac1 and v =
(Ill) Compute
this is also easy since fi = 1 +
so that
—
so that fi = 1 + (IV) Decide which of the six cases to use according as u, ,, and are zero or non-zero. (V) Depending on which case is to be used, carefully arrange the he1,
52 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
computation of the terms involved so as to minimize the number of
multiplications and divisions performed. To illustrate step (V) of Algorithm 1, consider the term kktAt, which has to be computed in cases (ii) and (vi) of Theorem 1.3. It could be computed in several ways. Let us examine two of them. To obtain kt (we k 112 = assume) k #0) we use k' = This requires 2n operations (an operation is either a multiplication or division). If we perform the calculations by next forming the product kk' and then the product (kkt)At, it would be necessary to do an additional mn2 + n2 operations, making a total of n2(m + 1) + 2n operations. However, if kktAt is computed by first obtaining kt and then forming the product ktAt, followed then by forming the product (k(ktAt)), the number of operations required is reduced to 2n(m + 1). This could amount to a significant saving in time and effort as compared to the former operational count. It is important to observe that the products AA' or AtA do not need to be explicitly computed in order to use Theorem 1.3. If one were naive enough to form the products AAt or AtA, a large amount of unnecessary effort would be expended.
Example 3.2.1 2
Suppose
0
A= 10
1
—1
Lo
0
1
that
3 ii 0I,andA'=— 3 12
—iJ
—3
01
5 7
41
4J
3-7-8J
has been previously computed. Assume that an error has been discovered in A in that the (3,3)-entry of A should have been zero instead of one. Then A is corrected by adding — 1 to the (3,3)-entry. Thus the modified matrix is A = A + e3( — To obtain A' we proceed as follows. (I) The terms k and h are first read from A' as
—4].
7
(II) The terms u and v are easily calculated. Ac3 = e3 so that u =0.
v=
—1 —1
—
ij.
(III) The term $ is also read from At as fi = 1 (IV) Since u =0, v used to obtain At.
0, and
—
g33
=
#0, case (iii) of Theorem 1.3 must be
(V) Computing the terms in case (iii) we get
k 112
= c3 112 =
and
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
53
I
Then by (in) of Theorem 1.3., A =
2
—2
2
2
— 5
2
0
3.
0 —6
Partitioned matrices where A, B, C and D are four matrices such that E is
Let E
=
also a matrix. Then A, B, C, D are called conformable. There are two ways to think of E. One is that E is made up of blocks A, B, C, D. In this case an E to the blocks are, in a sense, considered fixed. If one is trying to have a certain property, then one might experiment with a particular size or kind of blocks. This is especially the case in certain more advanced areas of mathematics such as Operator Theory where specific examples of linear operators on infinite dimensional vector spaces are often defined in terms of block matrices. One can also view E as partitioned into its blocks. In this viewpoint one starts with E, views it as a partitioned matrix and tries to compute things about E from the blocks it is partitioned into. In this viewpoint E is fixed and different arrangements of blocks may be considered. Partitioned matrix and block matrix are equivalent mathematical terms. However, in a given area of mathematics one of the two terms is more likely to be in vogue. We shall try to be in style. Of course, a partitioned or block matrix may have more or less than four blocks.
54 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
This section is concerned with how to compute the generalized inverse
of a matrix in terms of various partitions of it. As with Section 2, it is virtually impossible to come up with usable results in the general case. However, various special cases can be handled and, as in Section 2, they are not only theoretically interesting, but lead to useful algorithms. In fact. Theorem 1.3 will be the basis of much of this section, including the first case we consider. Let be partitioned by slicing off its last column, so that P = [B cJ where BeCtm and ceCtm. Our objective is to obtain a useful P may also be written as P = [Bj011 + 1] where expression for 01ECM, O2ECA.
and
Then P is in a form for which Theorem 1.3 applies. Let A = = [0 1]. Using the notation of Theorem 1.3 and the fact that At
one easily obtains h = d*At = 0 so that
=
= 1 + d*Afc = 1 and
v = d* #0. Also, u = (I — AAt)C = (I — BB')c. Thus, there are two cases to consider, according as to whether u #0 or u =0. Consider first the case when u #0 (i.e. Then case (1) in Theorem 1.3 is used to obtain pt In this case
AC=[fBtcl 0
IBtl IB'cutl
101
10
L
Next, consider the case when u =0 so that (iii) of Theorem 1.3 must be
used. Let k = Btc. Then = — k*Bt so that Bt —
=
= 1 + c*B'*Btc = 1 + k*k,
=
and —
kk*Bt 1 + k*k k*Bt
.
Thus we have the following theorem.
1 + kk Theorem 3.3.1 For ceCTM, and P = let k = Btc and u = (I — BBt)C = c — Bk. The generalized inverse of P is given by t
IBt_kyl
(ut
]whereY=l(l+k*k)_lk.Bt ifu=0.
[B]
Theorem 4 can be applied to matrices of the form p = r a row vector, by using the conjugate transpose. By using Theorem 1 together with Theorem 1.3, it is possible to consider the more general partitioned matrix M
=
where
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
dECN,
so that M can be written as
and xeC. Let P =
+
+
M
55
=
=
Theorem 1.3 can be applied to obtain Mt as M' = [Pt 01 + G. pt is
I+
known from Theorem 1. Clearly, x
[Pt oiI?1 = I L
0. Thus either case (i)or case (v) of Theorem 1.3 must be
.J
used, depending
on whether or not
Idi
IIA*
The details, which are somewhat involved, are left to the interested reader, or one may see Meyer's paper [551. We state the end result. Theorem 3.3.2
let
For AECtm N CEC
:],k=Atc,h=dsAt,u=(I_AAt)
c,
= I + 11k02,w2= 1 + The generalized inverse for M is as follows.
(i) If U
0 and V
lAt — kut — vth — 5,tut
0, then Mt =
I
u
L
At (ii)
10
lkk*At
—
'0
I
then M'
(iii)
where p1
— k.
=[ 2k*At
=
(iv) If v =0 and t5 =0, then M
iAt_
where p2
(vi)
=
0, then Mt
— k,
u
11,112 +
t
—i
I
rAt
= cv,
—
*
u
L
(v) If v = 0 and 5
—
,
0
—3-- IAtII*U*
=L +
1]'.
=5
—
h, and 42 = W2 U 112 + 1512.
Ifu=O,v=O, andS=O, then
M' = I
[
—kA
0
J
+ k*Ath*Iklrh, —1
w1w2[—lJ''
56 GENERAUZED INVERSES OF LINEAR TRANSFORMATIONS
Frequently one finds that one is dealing with a hermitian or a real
symmetric matrix. The following corollary might then be useful.
Corollary 3.3.1
For AECTMXM such that A = A* and for xeR, the
generalized inverse of the hermitian matrix H =
0, then Ht
(1) If u
rAt
k
tk*
as follows: (5
t' U:
t.
=L
(ii)
If u =0 and 6=0, then
H' =
1
1 and 2 may be used to recursively compute a generalized inverse, and were the basis of some early methods. However, the calculations are sensitive to ill conditioning (see Chapter 12). The next two algorithms, while worded in terms of calculating At should only be used for that purpose on small sized, reasonably well conditioned matrices. The real value of the next two algorithms, like Algorithm 2.1, is in updating. Only in this case instead of updating A itself, one is adding rows and/or columns to A. This corresponds, for example in least squares problems, to either making an additional observation or adding a new parameter to the model.
Algorithm 3.3.1 To g-invert any (I)
(II) For i 2, set B = (III) k = B_ 1c1, (IV) u1 = —
!c1],
1k1,
if ii,
0,
and k*Bt
(VI) B
if
0,
=
(VII) Then
Example 3.3.1
Suppose that we have the linear system Ax = b, and
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
57
have computed At so we know that
10 A— —
2
1
3
1
11
'r
At
1
0
'
1
1
—1
—1
3
But now we wish to add another independent variable and solve Ai = b where
10
—
A
=
1
21
0
3
1
1
1
1
—1
—t by computing A.
We will use Algorithm 1. In the notation of Algorithm I, A = B3, A = B2,
c*=[1 0
1
—2
=
3
—
1
7], so that 2
3
2 —2
0
1
1-2
1]
—7J• If the matrix in our model is always hermitian and we add both an independent variable and a row, the next Algorithm could be useful. L
Algorithm 3.3.2
5
3
To g-invert H =
such
that H = H*
(I) Set A1=h11.
(II) For i 2, set
=
where c. =
I,] (III) Let k = A1_ 1c1, (IV) ô. = — and (V) u1
0, then
=
V1
=
and
At (VII) If U1 = 0 and 61 #0, then
(VIII) If ii, =0 and
At
= [_L
= 0, then let r1 =
A;j;L1
58 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=
=( At =
so that
and z1 =
(r1_z1)*
L
(IX) Then
For a general hermitian matrix it is difficult to compare operational counts, however Algorithm 2 is usually more efficient than Algorithm 1 when applicable since it utilizes the symmetry. There is, of course, no clear cut point at which it is better to recompute At from scratch then use augmentation methods. If A were 11 x 4 and 2 rows were added, the authors would go with Algorithm 1. If A were 11 x 4 and 7 rows were added, we would recompute At directly from A. It is logical to wonder what the extensions of Theorems 1 and 2 are.
That is, what are [A!C]t and
when C and D no longer are just
columns and B is no longer a scalar? When A, B, C and D are general conformable matrices, the answer to 'what is [A! C]?' is difficult. A useful
answer to 'what is
rA Cit ?' is not yet known though formulas exist. LD B]
The previous discussion suggests that in some cases, at least when C has a 'small' number of columns, such as extensions could be useful. We will begin by examining matrices of the form [A! C] where A and C are general conformable matrices. One representation for [A!C]t is as follows.
Theorem 3.3.3 For AeCTM 'and CECTM [A C] can be written as
r
AA rA'clt I i — LT*(I + fl'*) I
I
VA —
I(At
—
where B = (I — AAt)C and T = AtC(I
the generalized inverse of
AtCRt
AtCBt) + Bt —
BtB).
Proof One verifies that the four Penrose conditions are satisfied. U A representation similar to that of Theorem 3 is possible for matrices partitioned in the form
by taking transposes.
The reader should be aware that there are many other known ways of representing the generalized inverse for matrices partitioned as [A! C] or
as []. The interested reader is urged to consult the following references to obtain several other useful representations. (See R. E. Cline [31], A. Ben-Israel [12], and P. V. Rao [72].)
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
As previously mentioned, no useful representation for
rA L
R
59
Cl' has, D]
up to this time, been given where A, C, R and D are general comformable matrices. However, if we place some restrictions on the blocks in question, we can obtain some useful results.
Lemma 3.3.1
If A, C, R and D are conformable matrices such that A is then D = RA 'C.
square and non-singular and rank (A) = rank
Furthermore, EJP = RA' and Q = A'C then
Proof
The factorization I
F
O1[A
I][R
yelds rank
I
I
—A'Cl fA
DJL0
i
IAC1 = rank rA;
0
]Lo 1
0
D—RA'C
= rank (A) +
rank (D — RA - 'C). Therefore, it can be concluded that rank (D — RA 'C) =0, or equivalently, D = RA 'C. The factorization (1) follows
directly. • Matrices of the type discussed in Lemma 1 have generalized inverses which possess a relatively simple form. Theorem 3.3.4
Let A, C, R and D be conformable matrices such that
A is square, non-singular, and rank (A) any matrices such that
[A C]t =
[
[R D] =
= rank
If P and Q are
Q],
then
([I + P*P]A[I + QQ*Jy 1[I P*]
and G = [I, Q]. Notice B= = that rank(B) = rank(G) = rank(A) = rank(M) = r = (number of columns
Proof Let
M
of B) = number of rows of G). Thus, we may apply Theorem 1.3.2 to obtain Mt = (BG)t = G*(GG*) '(BB) IB*. Since (B*B) 1B* = [A*(I + P*P)A] = A 1(1 + p*p) l[UP*] and G*(GG*rl =
+
desired result is obtained. •
It is always possible to perform a permutation of rows and columns to
60 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
any matrix so as to bring a full-rank non-singular block to the upper left
hand corner. Theorem 3.3.4 may then be used as illustrated below. Example 3.3.3 In order to use Theorem 4 to find Mt, the first step is to reduce M to row echelon form, EM. This not only will reveal the rank of M, but will also indicate a full set of linearly independent columns.
LetM=
2
1
2
1
1
3
1
2
4
2
2
6
0 0
1
2' —1 0
2
,sothatEM=
0 1
0 0 0
1/2 1/2 0
5/21 1/2 0
00000
24 015 Thus, rank(M)
=2 and the first and third columns of M form a full independent set. Let F be the 5 x 5 permutation matrix obtained by exchanging the second and third rows of so that [1
11213
MF =
—
[2
independent
=
X2]. The next step is to select two
01415
rows from the
matrix X1. This may be
accomplished in several
rows reduction to echelon form, or one might
ways. One could have obtained this information by noting which
were interchanged
during the
just look at X1 and select the appropriate rows, or one might reduce to echelon form. In our example, it is easy to see that the first and third rows of X1 are independent. Let E be the 4 x 4 permutation matrix obtained by exchanging the second and third rows of 14 so that
EMF
=
11
1
Ii
—1
12
1
2
1
3
2
0
2
01415
[2
IA
C
= I
permutation matrices are unitary, Theorem 1.2.1 allows us to write (EMF)t = F*M?E* so that = Now apply Theorem 4 to obtain (EMF)t. In our example, Since
so that
—88
66
—6
36
—12
30
15
—35
30
—20
9
1
18
10
18
15
36
30
33
330
1
—3 —6 —6 —12 15
66 30
9
18
33
—55
—88 —55 —35 —20 1
10
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
Theorem 3.3.5.
Let A. C, R and D be conformable matrices such that
lAd
A is square, non-singular, and rank (A) = rank
matrix M
61
R
=
and let
D
•
Let M denoe the
The matrix W is
l[A* R*].
non-singular and Mt =
Proof We first prove the matrix W is non-singular. Write W as W = A*AA* + R*RA* + A*CC* + R*DC*. From Lemma I we know = that D = RA 'C so that W = A*AA + R*RA* + A*CC* + R*RA (A*A + R*R)A - l(AA* + CCt). Because A is non-singular, the matrices (AtA + RtR) and (AAt + CCt), are both positive definite, and thus non-singular. Therefore, W must be non-singular. Furthermore, W' = + CC*yl + RtR) '.Using this, one can now verify
the four Penrose conditions are satisfied. U In both Theorems 4 and 5, it is necessary to invert only one matrix whose dimensions are equal to those of A. In Theorem 5, it is not necessary to obtain the matrices P and Q as in Theorem 4. However, where problems of ill-conditioning are encountered (see Chapter 12), Theorem 4 might be preferred over Theorem 5.
4.
Block triangular matrices
Definition 3.4.1 For conformable matrices T,, , T,2 , T21, and T22, matrices oftheform fT11
L°
0
122J
LT21
T22
are called upper block triangular and lower block triangular, respectively. It is important to note that neither T,, nor T22 are required to be square in the above definition. Throughout this section, we will discuss only upper block triangular matrices. For each statement we make about upper block triangular matrices, there is a corresponding statement possible for lower block triangular matrices.
Definition 3.4.2
For an upper block triangular matrix,
T is a properly partitioned upper block triangular matrix if T is upper block triangular of the form T = I L
A
"22J
where the
62 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
dimensions of G11 and G22 are the same as the dimensions of the transposes of T11 and T22, respectively. Any partition ofT which makes T into a
properly partitioned matrix is called a proper partition for T.
111101 lo 0 0 ii
LetT=
Example 3.4.1
Loooi]
1/30 1/3 0 1/30
0 0 0
0 1/2 1/2 There are several ways to partition T so that T will have an upper block triangular form. For example,
Iii 1101 and T2 Ii = 100101 L000: iJ Loolo 1 i
are two different partitions of T which both give rise to upper block triangular forms. Clearly, T, is a proper partition of T while T2 is not. In fact, T1 is the only proper partition of T. Example 3.4.2 If T is an upper block triangular matrix which is partitioned as (1) whether T11 and T22 are both non-singular, then T is properly partitioned because
T_l_1Tu
I
T1 22
1 I
Not all upper block triangular matrices can be properly partitioned.
Example 3.4.3
Let T =
111
2 L20
'I
11
2
12
4 0J . Since there are no zeros in
11
L001J
—11
8
25J The next theorem characterizes properly partitioned matrices. —
10
Theorem 3.4.1 Let T be an upper block triangular matrix partitioned as (1). T is properly partitioned (land only c 1) and Furthermore, when T is properly partitioned, Tt is given by R(Tr2)
rTt all —L
_TfT Tt I
I
i
(Note the resemblance between this expression and that of Example 2.)
Proof Suppose first that T is properly partitioned so that Tt is upper block triangular. It follows that iTt and TtT must also be upper block triangular. Since iT' and VT are hermitian, it must be the case that they
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 63
are of the form
fR
0]
FL
0
Lo
R2]
Lo
L2
By using the fact that 1'TtT = T, one obtains R1T11 =T11 and L2T22 =T22, R1T12 =T12 and T12L2 =T12
(3)
Also, R1 , R2, L1 and L2 must be orthogonal projectors because they are
hermitian and idempotent. Since R1 = T1 1X for some X and R1T11 = T11, Likewise, we can conclude R(R = i)' and hence T = From (3), L2 = VT22 for some Y and T22 = T22L2 implies L2 = we now have 2
T —T12a"dP
T*_T* 12 12'
4
c
(5)
and therefore R(T12) c R(T11) and
To prove the converse, one first notes that (5) implies (4) and then uses this to show that the four Penrose conditions of Definition 1.1.3 are satisfied by the matrix (2). A necessary condition for an upper block triangular to be properly partitioned is easily obtained from Theorem 1.
•
Let T be partitioned as in (1). If T is properly partitioned, rank(T11) + rank(T22).
Corollary 3.4.1 then rank(T)
Proof If T is properly partitioned, then Tt is given by (2) and T1
rTTtL'_ 1T12 = T12 so that iTt =L
rank(T)= rank(TTt) = rank(T1 rank(T22). U 5.
1
—
i
1
0 22
+ rank
Thus, 22
= rank(T1
+
The fundamental matrix of constrained minimization
Definition 3.5.1
Let V E CA
be any matrix in
xr•
x
be a positive semi-definite matrix and let C*
The block matrix B
Iv C*1 is called the =LC 0 ]
fundamental matrix of constrained minimization.
This matrix is so named because of its importance in the theory of constrained minimization, as is demonstrated in the next section. It also plays a fundamental role in the theory of linear estimation. (See Section 4 of Chapter 6.) Throughout this section, the letter B will always denote the matrix of Definition 1. Our purpose here is to obtain a form for Bt.
64 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Note that ifS is the permutation matrix S =
then
rc* vit
B=[0 vi cjS*sothatBt=SLo rc*
Thus, we may use Theorem 3.4.1 to obtain the following result.
Theorem 3.5.1
Bt =
jf and only tfR(V) c R(C*).
In the case when R(V) R(C*), it is possible to add an appropriate term to the expression in Theorem 1 to get Bt.
Theorem 3.5.2 For any positive semi-definite let E = I — and let Q = (EVE)t. Then Bt
Proof
XII
—
=
and any C*ECN
VCt].
(1)
+
Since V is positive semi-definite, there exists
XII
such
that
Now, E = E* = E2so that
V=
(2) Q = (E*A*AE)t = ([AE]*AE)t and hence R(Q) = R( [AE]*) R(EA*) c R(E). This together with the fact that Q = Q* implies
EQ=QandQE=Q,
(3)
so that CQ = 0 and Q*C =0. Let X denote the right-hand side 01(1). We shall show that X satisfies the four Penrose conditions. Using the above information, calculate BX as
+ EVQ 0
—
BX
:
EVCt
- EVQVCt
Use (3) to write EVQ = EVEQ = QtQ and EVQV = EVEQEV = EA*AE(AE;t(AE)*t(AE)*A = EA*AE(AE)t(AE) x (AE)tA = EA*AE(AE)f A = (AE)*(AE)(AE)tA = (AE)*A
=EA*A=EV.
(4)
Thus,
+ Q'Q
BX — L
0
0 1
:ccti' :
which is hermitian. Using (5), compute BXB
(5)
=
From (3) and (4) it is easy to get that EVEQV = EVQV = EV, and hence BXB = B. It follows by direct computation using (5) that XBX = X.
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 65
Finally, compute XB
I
QVE+CtC
01
:
From (3)
= QVE = QEVE = QQ'. In a manner similar to that used in obtaining (4), one can show that VQVE = yE, so that XB
=
which is hermitian. We have shown that X satisfies all four Penrose
conditions so that Bt = X. U 6.
Constrained least squares and constrained generalized inverses
In this section, we deal with two problems in constrained minimization. Let beC'", and f€C". Let 5" denote the set
5= {xIx=Ctf+ N(C)}. That is, 9' is the set of solutions (or least squares solutions) of Cx = 1, depending on whether or not fER(C). .9' will be the set of constraints. It
can be argued that the function Ax — b attains a minimum value on .9'. The two problems which we will consider are as follows. Problem
Find mm
1
Ax
—
b
and describe the points in 9' at which
the minimum is attained as a function of A, b, C, and
f.
Problem 2 Among the points in 9 for which the minimum of Problem 1 is attained, show there is a unique point of minimal norm and then describe it asa function of A,b,C, and f. The solutions of these two problems rest on the following fundamental theorem. This theorem also indicates why the term 'fundamental matrix' was used in Definition 3.5.1. Theorem 3.6.1 q(x) = Ax — b Ax0 —
2•
Let A, b, C, 1, and .5" be as described above, and let A vector x satisfies the conditions that XØE.9' and
bil Ax — bil for all XE9' !fand only !f there is a vector
such that z0
IA*A
Lc
[°] is a least squares solution of the system = :
C*][x]
fA*b
o]Ly]Lf
Proof Let B and v denote the block matrices IA*A C*1 IA*bl B Suppose first that is a least squares = =L g solution of Bz = v. From Theorem 2.1.2, we have that Bz0 = BBt,. From equations (2) and (5) of Section 5, we have I
66 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
BB
ICtC + (AE)t(AE) =[
o
where E = I
-'
—
C C,
so that Bz, = BBt, implies A*Ax0 = C*y. = CtCA*b + (AE)t(AE)A*b = CtCA*b + (AE)t(AE)(AE)*b = CtCA*b + (AE)*b = A*b
(1)
and
Cx0=CCtf. From (2) we know x E9'. Write x, = Ctf + h0 where h0eN(C). For every xe9' we have x = C1f+ so that
IIAC'f+ = AC'f + =
(2)
Ax —b112 —
ACtf — Ah, +
A; — b
(3)
112
For all h€N(C), we may use (1) to get (Ah,Ax0 — b) = (h,A*(Ax0
—
b))= +
(h, _C*y0)= —(Ch,y0)=O. Hence(3)becomesq(x)= q(x0) so that q(x) q(x) for all xeb°, as desired. Conversely, suppose x0e9' and q(x) q(x). If Ctm is decomposed as Ctm =
A(N(C)) + [A(N(C))]'-, then
A; — b = Ah + w, where hEN(C),WE[A(N(C))]-'-.
(4)
We can write q(x0) =
Ah + w II 2 = H
Ah 2 + w 2.
(5)
Now observe that (x0 — h)E$° because x0E$" and heN(C) implies C(x0 — I.) = CCtf. By hypothesis, we have q(x) q(x) for all xeb' so that q(x0) q(x0 — h) = II (Ax — b) — Ala 112 = (A1 + w) — Ah 112 = w (from (4)) = q(x0) — Ala 112 (from (5)). Thus Ah =0 and (Ax0 — b)E
[A(N(C))]' by (4). Hence for any geN(C), 0= (Ag, Ax0 — b) = (g, A*Ax0 A*b), and (A*Ax, — = R(C*). This means there exists a — A*b or vector( — such that C*( — y0) = A*Ax + C*y, = A*b = A*b — (AE)*b + (AE)*b
(6)
= A*b — (AE)*b + (AE)t(AE)(AE)*b = A*b — EA*b + (AE)IAEA*b
= [CtC + (AE)I(AE)]A*b Now (6) together with the fact that x0e9', gives
1
= RBt
rA*bl L
therefore
[c
is a least squares solution of —
• IJ
FA*bl
The solution to Problem 11$ obtained directly from Theorem 1.
j
and
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
Theorem 3.6.2
The set of vectors M c .9' at which mm
67
Ax — b is
atta.ned is given by
M = {(AE)t(b — AC'f) + Ctf+(I
=I—
(M will be called the set of constrained least squares solutions). Furthermore where E
Ax — b
mm
= 11(1 —
A(AE)t)(ACtf —
b)
(7)
Proof From Theorem I, we have M
=
Bt[A;b]+(I — BtB)['L].
and
arbitrary).
By Theorem 5.2, M =
QQt = [(AE)*(AE)]l(AE)*(AE) = = (AE)t(AE)sO that M becomes M {QA*(b — ACtf) + Ctf + (I — Note that, R((AE)') = R((AE)*) = R(EA*) R(E), so that Q
E(AE)t = (AE)t and (3) of Section 5 yields QA* = (EQE)A* = E(AE)t(AE)*t(AE)* = E(AE)t = (AE)t. Thus M becomes M = {(AE)t(b — ACtf) + Ctf+ For each mEM, we wish to write the (I expression Am — b Ii. In order to do this, observe (8) implies that A(AE)t(AE) = AE and = so that A(I — when
(8)
=0
for all {eN(C). Expression (7) now follows. I The solution to Problem 2 also follows quickly. Let M denote the set of constrained least squares solutions as given in Theorem 6.2. If u denotes the vector u = (AE)t(b — Act f) + Cti, then u is the unique constrained least squares x for all XE M such solution of minimal norm. That is, U EM and
Theorem 3.6.3
that x
u.
Proof The fact that UE M is a consequence of Theorem 2 by taking =0. To see u has minimal norm, suppose x€M and use Theorem 2 to = N(ct*) Since R((AE)t(AE)) = write x = u + (I — (AE,t(AE)g, R((AE)*) = R(EA*) R(E) = N(Ct*), it follows that ct*(AE)t AE =0 and Therefore it is now a simple matter to verify that u (I — — (AE)t(AE)g 112 112 u 112 with equality holding if and lix = u 112 + 11(1 only if(I — = 0, i.e if and only if u = x. U From Theorems 2 and 3, one sees that the matrix (AE)t is the basic quantity which allows the solution of the constrained least squares problem to be written in a fashion analogous to that of the solution of the unconstrained problem. Suppose one wished to define a 'constrained generalized inverse for
68 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
A with respect to C' so that it would have the same type of least squares
properties in the constrained sense as At has in the unconstrained sense. Suppose you also wanted it to reduce to At when no constraints are present (i.e. C = 0). The logical definition would be the matrix (AE)t. Definition 3.6.1 For Ae C and CE Xfl the constrained generalized inverse of A with respect to C, denoted by is defined to be — CtC))t. = (APN(c))t = (A(I (Notice that reduces to At when C = 0.) The definition of could also have been formulated algebraically, see Exercise 7.18. The solutions of Problem 1 and Problem 2 now take on a familiar form. The constrained least squares solution of Ax = b of minimal norm is. + (1—
xM =
(9)
The set of constrained least squares solutions is given by
M=
+ (I —
(10)
Furthermore, mm
lAx —bli =
(11)
b)lt.
11(1 —
The special case when the set of constraints defines a subspace instead of just a flat deserves mention as a corollary. Let V be a subspace of and P = The point X_E Vofminimal norm at which mm Ax — b H is attained is given by
Corollary 3.6.1
(12)
and the set of points M Vat which mm H Ax — b H is attained is "€1,
(13)
Furthermore, mm
Ax
—
b
= H (I —
AA' )b Il.
(14)
Proof C = and f=0, in (9),(10), and (11). Whether or not the constrained problem Ax = b, x e V is consistent also has an obvious answer. Corollary 3.6.2 If Vis a subspace of Ax = b, xe Vhas a solution and only
and P =
= b (i.e. problem is consistent, then the solution set is given by V) and the minimal norm solution is Xm =
then the problem
If the + (I —
Proof The problem is consistent if and only if the quantity in (14) is zero, that is, = b. That this is equivalent to saying
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 69
follows from (8). The rest of the proof follows from (13) and (12). U In the same fashion one can analyse the consistency of the problem Ax = b, xe{f+ Vis a subspace) or one can decide when two systems
possess a common solution. This topic will be discussed in Chapter 6 from a different point of view. 7. 1.
Exercises Use Theorem 3.1.3 to prove Theorem 3.3.2.
2. Prove that if rank(A) =
R(C)
R(A), and R(R) c R(A*),
then D = RAT. 3. Let Q = D — RAtC. If R(C) c R(A), R(R*) c R(A*), R(R) c R(Q), and R(C*) R(Q*), prove that — fA — fAt + AtCQtRAt —QtRAt LR Qt
DJL
4. Let P = A — CDtR. If R(R)
R(D), R(C*) c R(D*), R(R*)
R(P*), and
R(P), write an expression for
R(C)
in terms of
5. If M =
C, R and D.
..
rA
.
[c* Dj is a positive semi-definite hermitian matrix such
that R(C*At) R(D — C*AtC), write an expression for Mt. 6. Suppose A is non-singular in the matrix M of Exercise 5. Under this assumption, write an expression for Mt. 7. If T22 is non-singular in (1) of Section 4 prove that B= — T1 1)T12 + is non-singular and then prove that
8.
If T11 is non-singular in Exercise 7, write an expression for Tt.
9. Let T
= IA
LO*
ci where
Derive an
CECTM, and
expression for TT
10. Prove that the generalized inverse of an upper (lower) triangular matrix T of rank r is again upper (lower) triangular if and only if there
exists a permutation matrix P such that PTP
] where
=
T1 E C' 'is a non-singular upper (lower) triangular matrix. 11. For such that rank(A) = r, prove that AtA = AA' if and only if there exists a unitary matrix W such that W*AW
=
] where
70 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
T1 eC'
xr
a nonsingular triangular matrix. Use Exercise 10 and the fact that any square matrix is unitarily equivalent to an upper (lower) triangular matrix. is
and
12. For
write an expression for
rAlt LRi
13. Prove Theorem 3.4.1 for lower block triangular matrices. 14. Give an example to show that the condition rank(T) = rank(T1 + rank (T22) is not sufficient for (1) of Section 4 to be properly partitioned. 15. Complete the proof of Theorem 3.3.3. 16. Complete the proof of Theorem 3.3.5. 17. If V is a positive definite hermitian matrix and if C is conformable, let K = V + CCt and R = C*KtC. Show that
Cit IKt — KtCRtC*Kt
Iv
[Ct
:
oJ
[
:
KtcRt
RtC*Kt
18. The constrained generalized inverse of A with respect to C is the unique solution X of the five equations (1) AXA = A on N(C), (2) XAX = X, (3) (AX)t = AX, (4) PN(C)(XA)t = XA, on N(C) (5) CX =0. 19. Complete the proof of Theorem 3.1.1 20. Derive Theorem 3.3.1 from Theorem 3.3.3.
4
Partial isometries and EP matrices
1.
Introduction
There are certain special types of matrices which occur frequently and called unitary if A* = A - l, have useful properties. For example, A e hermitian if A = A*, and normal if A*A = AA*. This should suggest to the reader questions like: when is A* = At?, when is A = At?, and when is
AtA = AAt? The answering of such questions is useful in understanding the generalized in'.'erse and is probably worth doing for that reason alone. It turns out, however, that the matrices involved are useful. It should probably be pointed out that one very rarely has to use partial isometrics or the polar form. The ideas discussed in this short chapter tend to be geometrical in nature and if there is a geometrical way of doing something then there is probably an algebraic way (and conversely). It is the feeling of the authors, however, that to be able to view a problem from more than one viewpoint is advantageous. Accordingly, we have tried to develop both the geometric and algebraic theory as we proceed. Throughout this chapter denotes the Eudhdean norm on C". 2.
Partial isometries
Part of the difficulity with generalizing the polar form in Theorem 0.3.1 X form to AECM X m # n, was the need for a 'non-square unitary'.
We will now develop the appropriate generalization of a unitary matrix. Definition 4.2.1 Suppose that VeCtm x n m. Then V is called an isometry fl Vu = u fl for all UECN. The equation Vu = u may be rewritten (Vu, Vu) = (u, u) or (V*yu, u) = (u, u). Now if C1, C2 are hermitian matrices in C" then (C1u,u)= (C2u,u) for all UEC" if and only if C1 = C2. Thus we have: Proposition 4.2.1 is an isonietry if and only if V*V = A more general concept than isometry is that of a partial isometry.
72
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
a subs pace M. Then Definition 4.2.2 Let =M partial isometry (of M into Ctm) and only if (1)
is a
IIVUH=IIuIIforallu€Mand
(ii) Vu=Oifu€M'.
The subspace M is called the initial space of V and R(V) is called the final space.
A partial isometry V (or y) sends its initial space onto its final space without changing the lengths of vectors in its initial space or the angles between them. In other words, a partial isometry can be viewed as the identification of two subspaces. Orthogonal projections are a special type of partial isometry. Partial isometrics are easy to characterize. Theorem 4.2.1
Suppose that VeCtm
X
Then the following are equivalent.
(i) V is a partial isometry
(ii)
V*=Vt.
VV*—D a R( 's') s"" (iv) V=VV*V. (v) V* = (vi) (VV)2 = (V*V).
—
R( V) —
Initial space oF V
(vii) (VV*)2 =
Proof The equivalence of (1) and (iv)—(vii) is left to the exercises, while the equivalence of (ii) and (iii) is the Moore definition of yt• Suppose then that V is a partial isometry and M is its initial space. If ueM, then (Vu, Vu) = (V*Vu, u) = (u, u). But also R(V*V) = R(V*) = N(V)1 = M. Thus is hermitian. If ueM-'-, then V*Vu = 0 since )LYIM =IIM since Vu =0. Thus Similar arguments show that VV* = and = (iii) follows. To show that (iii) implies (I) the above argument can be done in reverse.
Corollary 4.2.1 If V is a partial isometry, then so is V*. For partial isometrics the Singular Value Decomposition mentioned in Chapter 0 takes a form that is worth noting. We are not going to prove the Singular Value Decomposition but our proof of this special case and of the general polar form should help the reader do so for himself.
Proposition 4.2.2
Xfl
Suppose that V cC"' is a partial isometry of rank r. Then there exist unitary matrices UeC"' X and We C" X "such that
:Jw. Proof Suppose that V cC'" XIt is a partial isometry. Let M = R(V*) be its initial space. Let { b1,... ,b,} be an orthonormal basis for M. Extend this to
PARTIAL ISOMETRIES AND EP MATRICES 73
an orthonormal basis
... ,bj of C". Since V is
= {b1 ,... ,b,,
isometric on M, {Vb1, ... ,Vbj is an orthonormal basis for R(V). Extend {Vb1,... ,Vb,} to an orthonormal basis = {Vb1, ... of CTM. Let W be the unitary transformation which changes a vector to its Let U be the unitary transformation coordinates with respect to basis which changes a 132-coordinate vector into a coordinate vector with respect to the standard basis of CTM. Then (1) follows. • We are now in a position to prove the general polar form.
Theorem 4.2.2
(General Polar Form). Suppose that AeCTM
Xn
Then
(1) There exists a hermitian BeC" such that N(B) = N(A) and a partial isometry such that R(V) = R(A), N(V) = N(B), and A = VB. X (ii) There exists a hermitian Ce C'" TM such that R(C) = R(A) and a partial isometry W such that N(W) = N(A), R(W) = R(C), and A = CW.
Proof The proof is motivated by the complex number idea it generalizes. We will prove (1) of Theorem If z = rern, then r = (zzl"2 and e" = z(zzT 2 and leave (ii), which is similar, to the exercises. Let B = (Recall the notation of page 6.) Then BeC" and N(A*A) N(B) = = N(A). Let V = AB'. We must show that V is the required partial isometry. Notice that Bt is hermitian, N(Bt) = N(B), and R(Bt) = R(B). Thus R(V) = R(ABt) = R(AB) = R(A(A*A)) = R(A) and N(V) = N(AA*A) = N(A) = N(B) as desired. Suppose then that ueN(V)1 = R(B). Then Vu 112 = (Vu, Vu) = (ABtU, AB'u) = (BtA*ABtU, u) = (BtBIBtU, u) = u Thus V is the required partial isometry. U The proof of the singular value decomposition theorem is left to the
exercises. Note that if D is square, then
ID O'lIi L o'
01
o'iLo
where
ID L0
01
0] can be factored as Ii 01. a partial
ID 01.is square and
k
[o
Of
isometry. A judicious use of this observation, Theorem 2, and the proof of Proposition 2 should lead to a proof of the singular value decomposition. While partial isometries are a generalization of unitary matrices there are some differences. For example, the columns or rows of a partial isometry need not be an orthonormal set unless a subset of the standard basis for C" or C'" is a basis for R(V*) or R(V).
Example 4.2.1
0
.ThenVisa
0 partial isometry but neither the columns nor the rows (or a subset thereof) form an orthonormal basis for R(V) or R(V). It should also be noted that, in general, the product of a pair of partial isometrics need not be a partial isometry. Also, unlike unitary operators,
74 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
square partial isometrics can have eigenvalues of modulus unequal to
one or zero. Example 4.2.2
Let
v=
Then V is a partial isometry and
a(V)= 10
Example 4.2.3
3.
Let V =
1
0
01 .
1
[000J
Then V is a partial isometry and
EP matrices
The identities A5A
= AM for normal matrices and A 'A = AA' for
invertible matrices are sometimes useful. This suggests that it might be
helpful to know when AtA = AAt.
Definition 4.3.1
and rank(A)= r. If AtA = AAt,
Suppose that
then A is called an EP,, or simply EP, matrix. The basic facts about EP matrices are set forth in the next theorem.
Theorem 4.3.1 (I)
Suppose that AeC"
X
Then the following are equivalent.
AisEP
(ii) R(A) = R(A5) = R(A) N(A) (iii) (iv) There exists a unitarr matrix U and an invertible r x r matrix A1, r = rank (A), such that (1)
: U5.
Proof (1). (ii) and (iii) are clearly equivalent. That (iv) implies (iii) is obvious. To see that (iii) implies (iv) let 13 be an orthonormal basis for C" consisting of first an orthonormal basis for R(A) and then an orthonormal basis for N(A). is then the coordinate transformation from standard
coordinates to fl-coordinates. • If A is EP and has the factorization given by (1), then since U, unitary
are
(2)
Since EP matrices have a nice form it is helpful if one can tell when a matrix is EP. This problem will be discussed again later. Several conditions implying EP are given in the exercises.
PARTIAL ISOMETRIES AND EP MATRICES
75
It was pointed out in Chapter 1 that, unlike the taking of an inverse, the taking of a generalized inverse does not have a nice 'spectral mapping property'. If A e invertible, then Aec(A) if and only
')
(3)
and Ax = Ax
if and only if A 'x =
(1\ x. )
(4)
While it is difficult to characterize matrices which satisfy condition (3), it is relatively easy to characterize those that satisfy condition (4). Notice that (4) implies (3).
Theorem 4.3.2
Suppose
that
Then A is EP if and only jf
(Ax = Ax if and only if Atx = Atx).
(5)
Proof Suppose that A is EP. By Theorem 1, A = is unitary and A11 exists. Then Ax = Ax if and only
]
U where U
]
Ux = A
=0, then u1 =0, and AtX =0. Thus (5) holds for A =0. If A
0,
then u2 =0 and u1 is an eigenvector for A1. Thus (5) follows from (2) and (4). Suppose now that (5) holds. Then N(A) = = Thus A is EP
by condition (iii) of Theorem 1. U Corollary 4.3.1 If A is EP, then Aec(A) if and only tfAtea(At). Corollary 1 does not of course, characterize when A is EP.
Example 4.3.1
Notice that a(A) = {O) and At = A*.
Let A
= Thus Aeo(A) if and only if A'ec(A). However, AtA
AAt_.[i 4. 1.
while
=
Exercises If V, W, and VW arc partial isometrics, show that (VW)t = WtVt using only Theorem 1.
76 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=0, If V. W are partial isometries and [W, VVt] =0 or [V. then WV is a partial isometry. a partial isometry and U, W are unitary, show that 3. If VeC" x UVW is a partial isometry. 4. Show that the following conditions are equivalent. 2.
(i) V is a partial isometry (ii) VV*V = V (iii) V*VV* = (iv) (V*V)2 = = VV* (v) 5. Prove part (ii) of Theorem 2. *6. Prove the Singular Value Decomposition Theorem (Theorem 0.2). 7. Prove that if A*A = AA*, then A is EP. 8. Prove that the following are equivalent.
(a) A is EP
(b) [AtA,A+At]=0 (c) [AAt, A + At] =0 (d) [AtA,A + A*] = 0 (e) [AAt,A + A*] = 0
(f) [A,AtA]=O (g) [A,AAt] = 0
9. Prove that if A is EP, then (At)2 = (A2)t. Find an example of a matrix A 0, such that (A2)t = (At)2 but A is not EP. *10. Prove that A is EP if and only if both (At)2 = (A2)t and R(A) = R(A2). *11. Prove that A is EP if and only if R(A2)= R(A) and [AtA,AAt] = 0. = (Al)t, then [AtA,AAt] = 0 but not conversely. Comment: Thus the result of Exercise 11 implies the result of Exercise 10. Exercise 11 has a fairly easy proof if the condition [AtA, AAt] =0 is translated into a decomposition of C". 12. Suppose that X = What can you say about X? Give an example X of a X such that X = a partial isometry. What conditions in addition to X = Xt are needed to make X a partial isometry? 13. Prove that V is an orthogonal projector if and only if V = Vt = 14. Prove that if A, B are EP (not necessarily of the same rank) and AB = BA, then (AB)t = BtAt.
5. The generalized inverse in electrical engineering
1.
Introduction
In almost any situation where a system of linear equations occurs there is the possibility of applications of the generalized inverse. This chapter will describe a place where the generalized inverse appears in electrical engineering. To make the exposition easily accessible to those with little knowledge of circuit theory, we have kept the examples and discussion at an elementary level. Technical terms will often be followed by intuitive definitions. No attempt has been made to describe all the uses of generalized inverses in electrical engineering, but rather, one particular use will be discussed in some detail. Additional uses will be mentioned in the closing paragraphs to this chapter. It should be understood that almost everything done here can be done for more complex circuits. Of course, curve fitting and least squares analysis as discussed in Chapter 2 is useful in electrical engineering. The applications of this chapter are of a different sort. The Drazin Inverse of Chapter 7 as shown in Chapter 9 can be used to study linear systems of differential equations with singular coefficients. Such equations sometimes occur in electrical circuits if, for example, there are dependent sources.
2.
n-port network and the impedance matrix
It is sometimes desirable, or necessary, to consider an electrical network in terms of how it appears from the outside. One should visualize a box (the network) from which lead several terminals (wires). The idea is to describe the network in terms of measurements made at the terminals. One thus characterizes the network by what it does, rather than what it physically is. This is the so-called 'black box' approach. This approach appears in many other fields such as nuclear engineering where the black
78 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
box might be a nuclear reactor and the terminals might represent
measurements of variables such as neutron flow, temperature, etc. We will restrict ourselves to the case when the terminals may be treated in pairs. Each pair is called a port. It is assumed that the amount of current going into one terminal of a port is the same as that coming out of the other terminal of the same port. This is a restriction on the types of devices that might be attached to the network at the port. It is not a restriction on the network. A network with n ports is called an n-port network.
Given an n-port network there are a variety of ways to characterize it depending on what one wants to do. In particular, there are different kinds of readings that can be taken at the ports. Those measurements thought of as independent variables are called inputs. Those thought of as dependent variables are called outputs. We will assume that our networks have the properties of homogeneity and superposition. Homogeneity says that if the inputs are multiplied by a factor, then the outputs are multiplied by that same factor. If the network has the property of superposition, then the output for the sum of several inputs is the sum of the outputs for each input. We will use current as our input and voltage as our output. Kirchhofl's laws are useful in trying to determine if a particular pair of terminals are acting like a port. We will also use them to analyse a particular circuit. A node is the place where two or more wires join together. A loop is any closed conducting path. KIRCH HOFF'S CURRENT LAW: The algebraic sum of all the instantaneous currents leaving a node is zero. KIRCH HOFF'S VOLTAGE LAW: The algebraic sum of all the voltage drops around any loop is zero. Kirchhoff's current law may also be applied to the currents entering and leaving a network if there are no current sources inside the network. Suppose that r denotes a certain amount of resistance to the current in a wire. We will assume that our wires have no resistance and that the resistance is located in certain devices called resistors. Provided that the resistance of the wires is 'small' compared with that of other devices in the circuit this is not a 'bad' approximation of a real circuit. Let v denote the voltage (pressure forcing current) across the resistor. The voltage across the resistor is also sometimes referred to as the 'voltage drop' across the resistor, or the 'change in potential'. Let i denote the current in the resistor. Then
v=ir. that r is constant but v and i vary with time. lithe one-sided Laplace transform is taken of both sides of (1), then v = ir where v and i are now functions of a frequency variable rather than of a time variable. When v and I are these transformed functions (for any circuit), then the ratio v/i is called the impedance of the circuit. Impedance is in the same units (ohms) as resistance but is a function of frequency. If the circuit Suppose
(1)
THE GENERALIZED INVERSE IN ELECTRICAL ENGINEERING
Poril
79
Port3
0-
JPod4 Fig. 5.1 A 4.pori network.
consists only of resistors, then the impedance is constant and equals the
resistance. Impedance is usually denoted by a z. In order to visualize what is happening it is helpful to be able to picture
a network. We will denote a current source by t ,a resistor by
a
terminal by and a node by —o-----. The reader should be aware that not all texts distinguish between terminals and nodes as we do. We reserve the word 'terminal' for the ports. Our current sources will be idea! current sources in that they are assumed to have zero resistance. Resistors are assumed to have constant resistance. Before proceeding let us briefly review the definition of an n-port network. Figure 5.1 is a 4-port network where port 1 is open, port 2 has a current source applied across it, port 3 is short-circuited, and port 4 has a resistor across it. Kirchhofls current law can be applied to show that ports 1, 2 and 3 actually are ports, that is, the current entering one terminal is the same as that leaving the other terminal. Port 1 is a port since it is open and there is no current at all. Now consider the network in Fig. 5.2. The network in Fig. 5.2 is not a 4-port network. As before, the pairs of terminals 5 and 6, 7 and 8, do form ports. But there is no way to guarantee,
(4
I
5
2
6
:3
Fig. 5.2 A network which is not an n-port.
80 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Port 2
Port3
Port
Fig. 5.3
without looking inside the box, that the current coming out of terminal 4 is the same as that flowing into any of terminals 1, 2 or 3. Thus terminal 4 cannot be teamed up with any other terminal to form a port. There are, of course, ways of working with terminals that cannot be considered as ports, but we will not discuss them here. It is time to introduce the matrices. Consider a 3-port network, Fig. 5.3, which may, in fact, be hooked up to other networks not shown. Let be the potential (voltage) across the jth port. Let be the current through one of the terminals of the jth port. Since the v,, are variables, it really does not matter which way the arrow for points. Given the values of the voltages v1, v2, v3 and are determined. But we have assumed our network was homogeneous and had the property of superposition. Thus the can be written in terms of the by a system of linear equations. = Z11i1 + Z12j2 + z13i3 V2 = Z21i1 + Z22i2 + Z23j3, = z31i1 + z32i2 + z33i3 V1
(2)
or in matrix notation,
= Zi, where v,ieC3,
Z is called the impedance matrix of the network since it has the same units as impedance and (3) looks like (1). In the system of equations (2), i,, and the are all functions of the frequency variable mentioned earlier. If there are devices other than just resistors, such as capacitors, in the network, then Z will vary with the frequency. The numbers have a fairly elementary physical meaning. Suppose that we take the 3-port of Fig. 5.3 and apply a current of strength i1 across the terminals forming port 1, leave ports 2 and 3 open, and measure the voltage across port 3. Now an ideal voltmeter has infinite resistance, that is, there is no current in it. (In reality a small amount of current goes through it.) Thus i3 =0. Since port 2 was left open, we have i2 =0. Then (2) says that v3 = z31i1 or z31 = v3/i1 when = = 0. The other Zkj have similar interpretations. We shall calculate the impedance matrix of the network Example 1 in Fig. 5.4. In practice Z would be calculated by actual physical
THE GENERALIZED INVERSE IN ELECTRICAL ENGINEERING
81
Iport3
_J
1_
Fig. 5.4 A particular 3-port network. The circled number give the resistance in ohms of the resistor.
measurements of currents and voltages. We shall calculate it by looking
'inside the box'. If a current i1
is
applied across port 1 we have the situation
in Fig. 5.5. The only current is around the indicated loop. There is a resistance of
1 ohm on this loop so that v1 = Thus = v1/i1 = I. Now there is no current in the rest of the network so there can be no changes in potential. This means that v2 =0 since there is no potential change across the terminals forming port 2. It also means that the potential v3 across port 3 is the same as the potential between nodes a and b in Fig. 5.5. That is, v3 = 1 also. Hence z21 = v2/i1 =0 and z31 = v3/i1 = 1. Continuing we get
Ii
0
Li
2
Z=IO 2
11
(4)
21.
3j
In order to calculate z33 recall that if two resistors are connected in series (Fig. 5.6), then the resistance of the two considered as one resistor is the sum of the resistance of each. Several comments about the matrix (4) are in order. First, the matrix (4) is hermitian. This happened because our network was reciprocal. A network is reciprocal if when input and output terminals are interchanged, the relationship between input and output is unchanged. That is, = Second, the matrix (4) had only constant terms since the network in Fig. 5.5
'I
L___ -J Fig. 5.5 Application of a current to port I.
82 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Fig. 5.6 Two resistors in series.
was resistive, that is, composed of only resistors. Finally, notice that (4) was not invertible. This, of course, was due to the fact that v3 = v1 + v2. One might argue that v3 could thus be eliminated. However, this dependence might not be known a priori. Also the three-port might be needed for joining with other networks. We shall also see later that theoretical considerations sometimes lead to singular matrices. 3.
Parallel sums
Suppose that R3 and R2 are two resistors with resistances r1 and r2. Then if R1 and R2 are in series (see Fig. 5.6) we have that the total resistance is
r1 + r2. The resistors may also be wired in parallel (Fig. 5.7). The total resistance of the circuit elements in Fig. 5.7 is r1r2/(r1 + r2) unless r1 = r2 =0 in which case it is zero. The number r1r2/(r1 + r2)
(1)
is called the parallel sum of r1 and r2. It is sometimes denoted r1 : r2. This section will discuss to what extent the impedance matrices of two n-ports, in series or in parallel, can be computed from formulas like those of simple resistors. It will be convenient to alter our notation of an n-port slightly by writing the 'input' terminals on the left and the 'output' terminals on the right. The numbers j, j' will label the two terminals forming the jth port. Thus the 3-port in Fig. 5.8a would now be written as in Fig. 5.8b. The notation of Fig. 5.8a is probably more intuitive while that of Fig. 5.8b is more convenient for what follows. The parallel and series connection of two n-ports is done on a port basis.
Fig. 5.7 Two resistors wired in parallel.
I
t
(0)
Fig. 5.8 Two ways of writing a 3-port network.
THE GENERALIZED INVERSE IN ELECTRICAL ENGINEERING
83
FIg. 5.9 Series connection of two 3-ports.
That is, in the series connection of two n-port networks, the ports labelled 1 are in series, the ports labelled 2 are in series, etc. (Fig. 5.9). Note, though, that the designation of a port as 1, 2, 3,... is arbitrary. Notice that the parallel or series connection of two n-ports forms what appears to be a new n-port (Fig. 5.10).
Proposition 5.3.1
Suppose that one has two n-ports N1 and N2 with impedance matrices Z1 and Z2. Then the impedance matrix of the series connection of N1 and N2 will be Z1 + Z2 provided that the two n-ports are
stillfunctioning as n-ports. Basically, the provision says that one cannot expect to use Z1 and Z2,
if in the series connection,
and N2 no longer act like they did when
Z1 and Z2 were computed.
It is not too difficult to see why Proposition 1 is true. Let N be the network formed by the series connection of two n-ports N1 and N2. Suppose in the series connect.c,n that N1 and N2 still function as n-ports. Apply a current of I amps across the ith port of N. Then a current of magnitude I goes into the 'first' terminal of port i of N1. Since N1 is an n-port, the same amount of current comes out of the second terminal of port i of N1 and into the first terminal of port i of N2. But N2 is also functioning as an n-port. Thus I amps flow out of terminal 2 of the ith port of N2. The resulting current is thus equivalent to having applied a current of! amps across the ith ports of and N2 separately. But the is the sum of the potentials potential across the jth port of N, denoted across the jth ports of N1 and N2 since the second terminal of port j of N1 and the first terminal of port j of N2 are at the same potential. But Nk, = k = 1,2 are functioning as n-ports so we have that = and where the superscript refers to the network. Thus = vs/I = (41) + = as desired. + 1
Fig. 5.10 Parallel connection of Iwo 3-ports.
84 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
L______ I FIg. 5.11 Two 2-ports connected in series.
Example 5.3.1 Consider the series connection of two 2-ports shown in Fig. 5.11. All resistors are assumed to be 1 ohm. The impedance matrix of while that of the second is
the first 2-port in Fig. 5.11 is Z1
Suppose now that a current of magnitude I is applied across
=
port 1 of the combined 2-port of Fig. 5.11. The resistance between nodes a and b is 1:2=2/3. The potential between a and b is thus 2/3 I. But what is important, 1/3 of the current goes through branch b and 2/3 through branch a. Thus in the series hookup of Fig. 5.11, there is I amperes of current going in terminal 1 of the first 2-port but only 21/3 coming out of terminal I of the first 2-port. Thus the first port of the first 2-port is no longer acting like a port. If the impedance matrix Z of the entire network
z1 + z2. in many cases,
of Fig. 5.11 is calculated, we get Z
=
however, the n-ports still act like n-ports when in series and one may add the impedance matrices. In practice, there is a simple procedure that can be used to check if the n-ports are still functioning as n-ports. We see then that when n-ports are in series, that the impedance matrix of the whole network can frequently be calculated by adding the individual impedance matrices. Likewise, when in parallel a formula similar to (1) can sometimes be used to calculate the impedance matrix. Suppose that Then define the parallel sum A :B of A and B by
A :B = A(A + B)tB. If a reciprocal network is composed solely of resistive elements, then the impedance matrix Z is not only hermitian but also positive semi-definite. That is, (Zx, x) 0 for all If Z is positive semi-definite, we sometimes write Z 0. If Z is positive semi-definite, then Z is hermitian. (This depends on the fact that and not just Ra.) if A — B 0 for A A is greater than or equal to B.
Proposition 5.3.2
Suppose that N1 and N2 are two reciprocal n-ports which are resistive networks with impedance matrices Z1 and Z2. Then the impedance matrix of the parallel connection of N1 and N2 is Z1 : Z2.
Proof In order to prove Proposition 2 we need to use three facts about the parallel sum of hermitian positive semi-definite matrices Z1, Z2. The first is that (Z1 : Z2) = (Z2 : Z1). The second is that R(Z1) + R(Z2) = R(Z1 + Z2), so that, in particular, R(Z1), R(Z2) ⊂ R(Z1 + Z2). The third is that (Z1 + Z2)† is hermitian, since N(Zi*) = N(Zi), i = 1, 2. The proof of these facts is left to the exercises. Let N1 and N2 be two n-ports connected in parallel to form an n-port N. Let Z1, Z2 and Z be the impedance matrices of N1, N2 and N respectively. Similarly, let i1, i2, i and v1, v2, v be the current and voltage vectors for N1, N2 and N. To prove Proposition 2 we must show that
v = Z1(Z1 + Z2)†Z2 i = (Z1 : Z2)i.   (2)
The proof of (2) will follow the derivation of the simple case when N1, N2 are two resistors and Z1, Z2 are positive real numbers. The current vector i may be decomposed as
i = i1 + i2.   (3)
But v = v1 = v2 since N1 and N2 are connected in parallel. Thus
v = Z1 i1, and v = Z2 i2.   (4)
We will now transform (3) into the form of (2). Multiply (3) by Z1 and Z2 to get the two equations Z1 i = Z1 i1 + Z1 i2 = v + Z1 i2, and Z2 i = Z2 i1 + Z2 i2 = v + Z2 i1. Now multiply both of these equations by (Z1 + Z2)†. This gives
(Z1 + Z2)†Z1 i = (Z1 + Z2)†v + (Z1 + Z2)†Z1 i2,   (5)
(Z1 + Z2)†Z2 i = (Z1 + Z2)†v + (Z1 + Z2)†Z2 i1.   (6)
Multiply (5) on the left by Z2 and (6) on the left by Z1. Equations (5) and (6) become
(Z2 : Z1)i = Z2(Z1 + Z2)†v + (Z2 : Z1)i2,  and  (Z1 : Z2)i = Z1(Z1 + Z2)†v + (Z1 : Z2)i1.   (7)
But (Z1 : Z2) = (Z2 : Z1), i = i1 + i2, and Z1 + Z2 is hermitian. Thus addition of the two equations in (7) gives us that
(Z1 : Z2)i = (Z1 + Z2)(Z1 + Z2)†v = P_R(Z1+Z2) v.   (8)
Now the impedance matrix gives v from i. Thus v must be in R(Z1) and R(Z2) by (4), so that P_R(Z1+Z2) v = v, and (8) becomes (Z1 : Z2)i = v as desired. ■
Example 5.3.2
Consider the parallel connection of two 3-port networks shown in Fig. 5.12.   (9)
Fig. 5.12 The parallel connection of N1 and N2.
The impedance matrices of N1 and N2 are

Z1 = [1 0 1]        Z2 = [1 0 1]
     [0 2 2]             [0 1 1]
     [1 2 3]             [1 1 2]

By Proposition 2, the impedance matrix of circuit (9) is Z1 : Z2 = Z1(Z1 + Z2)†Z2. Here

(Z1 + Z2)† = [2 0 2]†  =  (1/324) [ 84  -60  24]
             [0 3 3]              [-60   66   6]
             [2 3 5]              [ 24    6  30]

so that

Z1 : Z2 = [1/2   0   1/2]
          [ 0   2/3  2/3]
          [1/2  2/3  7/6].

The generalized inverse can be computed easily by several methods. The reader is encouraged to verify that the values obtained from Z1 : Z2 agree with those obtained by direct computation from (9).
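The computation above is easy to check numerically. A sketch (Python with NumPy assumed) forms Z1 : Z2 with the Moore–Penrose inverse and compares it with the matrix obtained in the example:

```python
import numpy as np

Z1 = np.array([[1., 0., 1.],
               [0., 2., 2.],
               [1., 2., 3.]])
Z2 = np.array([[1., 0., 1.],
               [0., 1., 1.],
               [1., 1., 2.]])

# Parallel sum Z1 : Z2 = Z1 (Z1 + Z2)^dagger Z2
par = Z1 @ np.linalg.pinv(Z1 + Z2) @ Z2

expected = np.array([[1/2, 0,   1/2],
                     [0,   2/3, 2/3],
                     [1/2, 2/3, 7/6]])
print(np.allclose(par, expected))   # True
```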
4.
Shorted matrices
The generalized inverse appears in situations other than just parallel
connections. Suppose that one is interested in a 3-port network N with impedance matrix Z. Now short out port 3 to produce a new 3-port network N', and denote its impedance matrix by Z'. Since v3 is always zero in N', we must have Z'31 = Z'32 = Z'33 = 0; that is, the bottom row of Z' is zero. If N is a
reciprocal network, then the third column of Z' must also consist of zeros. Z' would then have the form
Z' = [Z'11  Z'12  0]
     [Z'21  Z'22  0]   (1)
     [ 0     0    0].
The obvious question is: What is the relationship between the Z_kj and the Z'_kj? The answer, which at first glance is probably not obvious, is:
Proposition 5.4.1
Suppose that N is a resistive n-port network with impedance matrix Z. Partition Z as
Z = [Z11  Z12]
    [Z21  Z22],
where Z22 is s × s, 1 ≤ s ≤ n. Then
Z' = [Z11 − Z12 Z22† Z21   0]
     [        0            0]
is the impedance matrix of the network N' formed by shorting the last s ports of N, if N is reciprocal.
Proof Write i = [i0; i_s] and v = [v0; v_s], where i_s, v_s ∈ C^s. Then v = Zi may be written as
v0 = Z11 i0 + Z12 i_s,
v_s = Z21 i0 + Z22 i_s.   (2)
Suppose now that the last s ports of N are shorted. We must determine the matrix X such that v0 = X i0. Since the last s ports are shorted, v_s = 0. Thus the second equation of (2) becomes Z22 i_s = −Z21 i0. Hence
i_s = −Z22†Z21 i0 + [I − Z22†Z22]h = −Z22†Z21 i0 + h′, where h′ ∈ N(Z22).   (3)
If (Zi, i) = 0, then Zi = 0 since Z ≥ 0. Thus N(Z22) ⊂ N(Z12). (Consider i with i0 = 0.) Substituting equation (3) into the first equation of (2) now gives v0 = Z11 i0 + Z12 i_s = Z11 i0 + Z12(−Z22†Z21 i0 + h′) = (Z11 − Z12 Z22†Z21)i0, as desired. The zero blocks appear in the Z' matrix for the same reason that zeros appeared in the special case (1). ■
Z' is sometimes referred to as a shorted matrix. Properties of shorted matrices often correspond to physical properties of the circuit. We will mention one. Others are developed in the exercises along with a generalization of the definition of shorted matrix. Suppose that Z, Z' are as in Proposition 1. Then Z ≥ Z' ≥ 0. This corresponds to the physical fact that a short circuit can only lower the resistance of a network and not increase it. It is worth noting that in the formula for Z' in Proposition 1 a weaker
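As a numerical illustration (a sketch with NumPy assumed; the matrix and the partition size are chosen arbitrarily), the shorted matrix can be formed directly from the formula of Proposition 1, and the ordering Z ≥ Z' ≥ 0 checked through eigenvalues:

```python
import numpy as np

def shorted_matrix(Z, s):
    """Short the last s ports: return Z11 - Z12 Z22^dagger Z21,
    padded with zero blocks to the size of Z."""
    n = Z.shape[0]
    Z11, Z12 = Z[:n-s, :n-s], Z[:n-s, n-s:]
    Z21, Z22 = Z[n-s:, :n-s], Z[n-s:, n-s:]
    top = Z11 - Z12 @ np.linalg.pinv(Z22) @ Z21
    Zp = np.zeros_like(Z)
    Zp[:n-s, :n-s] = top
    return Zp

# A positive semi-definite (resistive, reciprocal) impedance matrix.
A = np.array([[1., 0., 1.], [0., 1., 1.]])
Z = A.T @ A                                    # Z >= 0 by construction
Zp = shorted_matrix(Z, 1)                      # short the last port
print(np.linalg.eigvalsh(Z - Zp) >= -1e-12)    # Z >= Z'
print(np.linalg.eigvalsh(Zp) >= -1e-12)        # Z' >= 0
```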
type of inverse than the generalized inverse would have sufficed. However,
as we will see later, it would then have been more difficult to show that Z ≥ Z'. In the parallel sum the Penrose conditions are needed and a weaker inverse would not have worked.
5.
Other uses of the generalized inverse
The applications of the generalized inverse in Sections 3 and 4 were chosen
partly because of their uniqueness. There are other uses of the generalized inverse which are more routine. For example, suppose that we have an n-port network N with impedance matrix Z. Then v = Zi. It might be desirable to be able to produce a particular output v0. In that case we would want to solve v0 = Zi. If v0 ∉ R(Z), then we must seek approximate solutions. This would be a least squares problem as discussed in Chapter 2. Z†v0 would correspond to that least squares solution which requires the least current input (in the sense that ||i|| is minimized). Of course, this approach would work for inputs and outputs other than just current inputs and voltage outputs. The only requirements are that with respect to the new variables the network has the properties of homogeneity and superposition; otherwise we cannot get a linear system of equations unless a first order approximation is to be taken. In practice, there should also be a way to compute the matrix of coefficients of the system of equations. Another use of the generalized inverse is in minimizing quadratic forms subject to linear constraints. Recall that a quadratic form is a function φ from C^n to C of the form φ(x) = (Ax, x) for a fixed A ∈ C^{n×n}. The instantaneous power dissipated by a circuit, the instantaneous value of the energy stored in the inductors (if any are in the circuit), and the instantaneous value of the energy stored in the capacitors, may all be written in the form (Ai, i) where i is a vector made up of the loop currents. A description of loop currents and how they can be used to get a system of equations describing a network may be found in Chapter III of Huelsman. His book provides a very good introduction to and amplification of the ideas presented in this chapter.
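For instance, the minimal-current least squares input i = Z†v0 can be computed directly; a small sketch (Python with NumPy assumed, with the singular impedance matrix and target output chosen only for illustration):

```python
import numpy as np

# A singular impedance matrix and a desired (possibly unattainable) output.
Z = np.array([[2., 0., 2.],
              [0., 3., 3.],
              [2., 3., 5.]])
v0 = np.array([1., 1., 0.])

i = np.linalg.pinv(Z) @ v0   # least squares solution of Zi = v0 with minimal ||i||
print(i, np.linalg.norm(Z @ i - v0))
```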
6.
Exercises
For Exercises (1)–(8) assume that A, B, C, D ≥ 0 are n × n hermitian matrices.
1. Show that A : B = B : A.
2. Show that A : B ≥ 0.
3. Prove that R(A : B) = R(A) ∩ R(B).
4. Prove that (A : B) : C = A : (B : C).
*5. Prove that Tr(A : B) ≤ (Tr A) : (Tr B).
*6. Prove that det(A : B) ≤ (det A) : (det B).
Fig. 5.13   Fig. 5.14
7. a. Prove that if A ≥ B, then A : C ≥ B : C.
b. Formulate the physical analogue of 7a in terms of impedances.
*8. Show that (A + B) : (C + D) ≥ A : C + B : D. (This corresponds to the assertion that Fig. 5.13 has more impedance than Fig. 5.14 since the latter has more paths.)
For Exercises (9)–(14) assume that E is a hermitian positive semi-definite operator on C^n. Pick a subspace M ⊂ C^n. Let B be an orthonormal basis consisting of first an orthonormal basis for M and then one for M⊥. With respect to B, E has the matrix
E = [A  B; B*  C], where E ≥ 0. Define E_M = [A − BC†B*  0; 0  0].
9. Prove that E ≥ E_M.
10. Show that if D is hermitian positive semi-definite, E ≥ D ≥ 0, and R(D) ⊂ M, then E_M ≥ D.
*11. Prove that E_M = lim_{n→∞} E : nP_M (a numerical illustration is sketched after this exercise list).
For the next three exercises F is another hermitian positive semi-definite matrix partitioned with respect to B just like E was.
12. Suppose that E ≥ F ≥ 0. Show that E_M ≥ F_M ≥ 0.
13. Prove that (E + F)_M ≥ E_M + F_M.
14. Determine when equality holds in Exercise 13. Let L, M be subspaces. Prove that P_{L∩M} = 2(P_L : P_M).
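Exercise 11 links the parallel sum to the shorted matrices of Section 4, and the limit can be watched numerically. A sketch (NumPy assumed, with M taken to be the span of the first basis vector and E an arbitrarily chosen positive semi-definite matrix):

```python
import numpy as np

A = np.array([[2., 1., 0.], [1., 2., 1.], [0., 1., 2.]])
E = A @ A.T                              # hermitian positive semi-definite
PM = np.diag([1., 0., 0.])               # orthogonal projector onto M = span{e1}

def parallel_sum(X, Y):
    return X @ np.linalg.pinv(X + Y) @ Y

# Shorted matrix E_M: keep the M-block of E and subtract B C^dagger B*.
B, C = E[:1, 1:], E[1:, 1:]
EM = np.zeros_like(E)
EM[:1, :1] = E[:1, :1] - B @ np.linalg.pinv(C) @ B.T

print(np.allclose(parallel_sum(E, 1e8 * PM), EM, atol=1e-6))   # True
```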
7.
References and further reading
A good introduction to the use of matrix theory in circuit theory is [44].
Our Section 2 is a very condensed version of the development there. In particular, Huelsman discusses how to handle more general networks than ours with the port notation by the use of 'grounds' or reference nodes. Matrices other than the impedance matrix are also discussed in detail. Many papers have been published on n-ports. The papers of Cederbaum, together with their bibliographies, will get the interested reader started. Two of his papers are listed in the bibliography at the end of this book [27], [28]. The parallel sum of matrices has been studied by Anderson and colleagues in [2], [3], [4], [5] and his thesis. Exercises (1)–(8) come from [3] while (9)–(14) are propositions and lemmas from [2]. The theory of shorted operators is extended to operators on Hilbert space in [5]. In [4] the operations of ordinary and parallel addition are treated as special
cases of a more general type of matrix addition. The minimization of quadratic forms is discussed in [10]. The authors of [42] use the generalized inverse to minimize quadratic forms and eliminate 'unwanted variables'. The reader interested in additional references is referred to the bibliographies of the above and in particular [4].
6
(i, j, k)-generalized inverses and linear estimation
1.
Introduction
We have seen in the earlier chapters that the generalized inverse A† of A ∈ C^{m×n}, although useful, has some shortcomings. For example: computation of A† can be difficult, A† is lacking in desirable spectral properties, and the generalized inverse of a product is not necessarily the product of the generalized inverses in reverse order. It seems reasonable that in order to define a generalized inverse which overcomes one or more of these deficiencies, one must expect to give up something. The importance one attaches to the various types of generalized inverses will depend on the particular applications which one has in mind. For some applications the properties which are lost will not be nearly as important as those properties which are gained. From a theoretical point of view, the definition and properties of the generalized inverse defined in Chapter 1 are probably more elegant than those of this chapter. However, the concepts of this chapter are considered by many to be more practical than those of the previous chapters.
2.
Definitions
Recall that L(C", C'") denotes the set of linear transformations from C"
into C'". For 4eL(C",C'"), was defined in Chapter 1 as follows. C" was denoted the decomposed into the direct sum of N(4) and a one to one mapping of restriction of 4 to N(4)', so that 4j was onto R(4). was then defined to be Atx -
if xeR(A) — tO
if
Instead of considering orthogonal complements of N(A) and R(A), one could consider any pair of complementary subspaces and obtain a linear transformation which could be considered as a generalized inverse for A.
Definition 6.2.1
(Functional Definition) Let A ∈ L(C^n, C^m) and let N and R be complementary subspaces of N(A) and R(A), that is, C^n = N(A) + N and C^m = R(A) + R. Let A1 = A|_N (i.e. A restricted to N). Note that A1 is a one to one mapping of N onto R(A), so that A1⁻¹ : R(A) → N exists. For x ∈ C^m, let x = r1 + r2 where r1 ∈ R(A) and r2 ∈ R. The function Q_{N,R} defined by
Q_{N,R}x = A1⁻¹r1
is either called the (N, R)-generalized inverse for A or a prescribed range/null space generalized inverse for A. For a given N and R, Q_{N,R} is a uniquely defined linear transformation from C^m into C^n. Therefore, for A ∈ C^{m×n}, A induces such a function and we can define G_{N,R} ∈ C^{n×m} to be the matrix of Q_{N,R} (with respect to the standard basis).
In the terminology of Definition 1, A† is the (R(A*), N(A*))-generalized inverse for A. In order to avoid confusion, we shall henceforth refer to A† as the Moore–Penrose inverse of A. In Chapter 1 three equivalent definitions for A† were given: a functional definition, a projective definition (Moore's definition), and an algebraic definition (Penrose's definition). We can construct analogous definitions for the (N, R)-generalized inverse. It will be assumed throughout this section that N, R are as in Definition 1. The projection operators which we will be dealing with in this chapter will not necessarily be orthogonal projectors. So as to avoid confusion, the following notation will be adopted. Notation. To denote the oblique projector whose range is M and whose null space is N we shall use the symbol P_{M,N}. The symbol P_M will denote, as before, the orthogonal projector whose range is M and whose null space is M⊥.
is M'. Starting with Definition 1 it is straightforward to arrive at the following
two alternative characterizations of GNR.
Definition 6.2.2 (Projective Definition) For A ∈ C^{m×n}, G_{N,R} is called the (N, R)-generalized inverse for A if R(G_{N,R}) = N, N(G_{N,R}) = R, AG_{N,R} = P_{R(A),R} and G_{N,R}A = P_{N,N(A)}.
Definition 6.2.3 (Algebraic Definition) For A ∈ C^{m×n}, G_{N,R} is the (N, R)-generalized inverse for A if R(G_{N,R}) = N, N(G_{N,R}) = R, AG_{N,R}A = A, and G_{N,R}AG_{N,R} = G_{N,R}.
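Before turning to the equivalence proof, a small numerical sketch may help. The construction below (Python with NumPy assumed; the helper name nr_inverse is ours, not the book's) builds G_{N,R} from bases of the chosen complements, using the fact that G is determined by G(AU) = U and GV = 0:

```python
import numpy as np

def nr_inverse(A, U, V):
    """(N, R)-generalized inverse of A, where the columns of U span a
    complement N of N(A) and the columns of V span a complement R of R(A)."""
    AU = A @ U
    M = np.hstack([AU, V])        # square and invertible by the complement assumptions
    rhs = np.hstack([U, np.zeros((U.shape[0], V.shape[1]))])
    return rhs @ np.linalg.inv(M)

# Example: a rank-1 matrix with arbitrarily chosen complements.
A = np.array([[1., 2.],
              [2., 4.]])
U = np.array([[1.], [0.]])        # N = span{(1,0)}, a complement of N(A) = span{(2,-1)}
V = np.array([[0.], [1.]])        # R = span{(0,1)}, a complement of R(A) = span{(1,2)}
G = nr_inverse(A, U, V)
print(np.allclose(A @ G @ A, A), np.allclose(G @ A @ G, G))   # True True
```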
Theorem 6.2.1
The functional, the projective, and the algebraic definitions of the (N, R)-generalized inverse are equivalent.
Proof We shall first prove that Definition 1 is equivalent to Definition 2 and then that Definition 2 is equivalent to Definition 3. If Q_{N,R} satisfies the conditions of Definition 1, then it is clear that R(Q_{N,R}) = N and N(Q_{N,R}) = R. But AQ_{N,R} is the identity function on R(A) and the zero function on R. Thus AG_{N,R} = P_{R(A),R}. Similarly G_{N,R}A = P_{N,N(A)}, and Definition 1 implies Definition 2. Conversely, if G_{N,R} satisfies the conditions of Definition 2, then N and R must be complementary subspaces for N(A) and R(A) respectively. It also follows that AQ_{N,R} must be the identity on R(A) and zero on R, while Q_{N,R}A is the identity on N and zero on N(A). Thus Q_{N,R} satisfies the conditions of Definition 1, and hence Definition 2 implies Definition 1. That Definition 2 implies Definition 3 is clear. To complete the proof, we need only to show that Definition 3 implies Definition 2. Assuming G satisfies the conditions of Definition 3, we obtain (AG)² = AG and (GA)² = GA so that AG and GA are projectors. Furthermore, R(AG) ⊂ R(A) = R(AGA) ⊂ R(AG) and R(GA) ⊂ R(G) = R(GAG) ⊂ R(GA). Thus R(AG) = R(A) and R(GA) = R(G) = N. Likewise, it is a simple matter to show that N(AG) = N(G) = R and N(GA) = N(A), so that AG = P_{R(A),R} and GA = P_{N,N(A)}, which are the conditions of Definition 2 and the proof is complete. ■
Corollary 6.2.1 For A ∈ C^{m×n}, the class of all prescribed range/null space generalized inverses for A is precisely the set
{G ∈ C^{n×m} | AGA = A and GAG = G},   (1)
i.e. those matrices which satisfy the first and second Penrose conditions. The definition of a prescribed range/null space inverse was formulated as an extension of the Moore–Penrose inverse with no particular applications in mind. Let us now be a bit more practical and look at a problem of fundamental importance. Consider a system of linear equations written as Ax = b. If A is square and non-singular, then one of the characteristics of A⁻¹ is that A⁻¹b is the solution. In order to generalize this property, one might ask, for A ∈ C^{m×n}, what are the characteristics of a matrix G ∈ C^{n×m} such that Gb is a solution of Ax = b for every b ∈ C^m for which Ax = b is consistent? That is, what are the characteristics of G if
AGb = b for every b ∈ R(A)?   (2)
In loose terms, we are asking what do the 'equation solving generalized inverses of A' look like? This is easy to answer. Since AGb = b for every b ∈ R(A), it is clear that AGA = A. Conversely, suppose that G satisfies AGA = A. For every b ∈ R(A) there exists an x_b such that Ax_b = b. Therefore, AGb = AGAx_b = Ax_b = b for every b ∈ R(A). Below is a formal statement of our observations.
Theorem 6.2.2
For A ∈ C^{m×n}, G has the property that Gb is a solution of Ax = b for every b ∈ C^m for which Ax = b is consistent if and only if
G ∈ {X ∈ C^{n×m} | AXA = A}.   (3)
Thus the 'equation solving generalized inverses for A' are precisely those which satisfy the first Penrose condition of Definition 1.1.3.
Let us now be more particular. Suppose we seek G such that, in addition to being an 'equation solving inverse' in the sense of (2), we also require that for each b ∈ R(A), ||Gb|| ≤ ||z|| for all z ≠ Gb and z ∈ {x | Ax = b}. That is, for each b ∈ R(A) we want Gb to be the solution of minimal norm. On the basis of Theorem 2.1.1 we may restate our objective as follows. For each b ∈ R(A) we require that Gb = A†b. Therefore, G must satisfy the equation
GA = A†A = P_{R(A*)},   (4)
which is equivalent to
AGA = A and (GA)* = GA.   (5)
The equivalence of (4) and (5) is left to the exercises. Suppose now that G is any matrix which satisfies (5). All of the above implications are reversible so that Gb must be the solution of minimal norm. Below is a formal statement of what we have just proven.
Theorem 6.2.3
For AeC'"
and beR(A), A(Gb) = band
forallz#Gbandze{xIAx=b) = A and (XA)* = XA). We define the term minimum nonn generalized inverse to be a matrix which satisfies the first and fourth Penrose conditions of Definition 1.1.3. Let us now turn our attention to inconsistent systems of equations. As in Chapter 2, the statement Ax = b is to be taken as an open statement and the set of vectors {z I Az is equal to b) (i.e. the 'solution set' for the open statement) may or may not be empty, depending on whether beR(A). When the solution set for Ax = b is empty, we say Ax = b is inconsistent. In dealing with inconsistent equations, a common practice is to seek a least squares solution as defined in Definition 2.1.1. As we saw in Theorem 2.1.2, Atb is always a least squares solution of Ax = b. However, Atb is a special least squares solution. It is the least squares solution of minimal norm. In some applications, one might settle for obtaining any least squares solution and not care about the one of minimal norm. For Ae C'" X let us try to determine the characteristics of a matrix G such that Gb is a least squares solution of Ax = b for all be Ctm. To begin with, we can infer from Corollary 2.1.1 that AGb — b is minimal if and only if AGb = Pft(A)b = AAtb. This being true for all beC'" yields AG = AAt. But AG = AAt is equivalent to AGA = A and (AG)* = AG. The proof of this equivalence is left to the exercises. We formally state the above observations in the following theorem.
Theorem 6.2.4 For for all be C'"
Gb is a least squares solution of Ax = b
and only if
= A and (AX) = AX).
We define the term least squares generalized inverse to be a matrix which satisfies the first and third Penrose conditions.
Looking at Corollary 1 and Theorems 2–4, one sees that each of the different types of G matrices discussed can be characterized as a solution to some subset of the Penrose conditions of Definition 1.1.3. To simplify our nomenclature we make the following standard definition.
Definition 6.2.4 For A ∈ C^{m×n}, a matrix G ∈ C^{n×m} is called an (i, j, k)-inverse for A if G satisfies the ith, jth, and kth Penrose conditions:
(1) AXA = A,  (2) XAX = X,  (3) (AX)* = AX,  (4) (XA)* = XA.
Furthermore, the set of all (i, j, k)-inverses for A will be denoted by A{i, j, k}. For example, G is a (1,3)-inverse for A if AGA = A and (AG)* = AG. We write G ∈ A{1,3}. Note that G may or may not satisfy either of the
other two Penrose conditions. This notation requires one to pay particular attention to how the equations are ordered, but experience has shown this convention to be efficient and useful.
Notation. For A ∈ C^{m×n}, A⁻ will be used to designate an arbitrary element of A{1}. The notation is a convenience that must be treated with some care. It is frequently used to make statements which hold for the entire class A{1}. For example, the statement rank(AA⁻) = rank(A) is understood to mean that 'rank(AG) = rank(A) for every G ∈ A{1}'. The phrase 'for every G ∈ A{1}' will always be implicit, unless otherwise stated, but generally will not appear. Because expressions involving the ( )⁻ notation are not always uniquely defined matrices, ambiguities can arise. For example, in investigating the possibility of a reverse order law for (1)-inverses, what should it mean to write (AB)⁻ = B⁻A⁻? It is better to avoid the notation in situations of this type. At times it will be necessary to formulate statements more explicitly than the ( )⁻ notation allows. Some authors have assigned special notations to different kinds of (i, j, k)-inverses. We will not do this. Since almost every application of (i, j, k)-inverses involves subsets of A{1} (the equation solvers) we will simply use the ( )⁻ notation together with a qualifying phrase. For example, we might say 'let A⁻ be a least squares inverse' when we wish to designate an arbitrary element of A{1,3}, or we can simply write 'let A⁻ ∈ A{1,3}'. The term 'generalized inverse' would be inappropriate if the related concepts did not coincide with the usual notion of matrix inverse in the special case that the matrix under consideration is non-singular. Note that if A is non-singular, then A{1} = {A⁻¹}. Note also that 0 ∈ A{2} even if A is non-singular. Table 6.1 summarizes the information concerning the important types of (i, j, k)-inverses.
Table 6.1

(1)-inverse. Terminology: an Equation Solving Inverse (sometimes called a g-inverse). Properties: G ∈ A{1} if and only if Gb is a solution of Ax = b for every b ∈ R(A).

(1,2)-inverse. Terminology: a prescribed range/null space inverse (the (N, R)-inverse); some authors have also called this a reflexive inverse. Properties: if G ∈ A{1,2}, then N(A) + R(G) = C^n and R(A) + N(G) = C^m. That is, each (1,2)-inverse defines complementary subspaces for N(A) as well as R(A). Conversely, each pair (N, R), where N and R are complements of N(A) and R(A) respectively, uniquely determines a (1,2)-inverse G_{N,R} with R(G_{N,R}) = N and N(G_{N,R}) = R.

(1,3)-inverse. Terminology: a Least Squares Inverse. Properties: G ∈ A{1,3} if and only if Gb is a least squares solution of Ax = b for every b ∈ C^m.

(1,4)-inverse. Terminology: a Minimum Norm Inverse. Properties: G ∈ A{1,4} if and only if Gb is the minimum norm solution of Ax = b for every b ∈ R(A).

(1,2,3,4)-inverse. Terminology: the Moore–Penrose Inverse, A†. Properties: A{1,2,3,4} contains exactly one element, A†, which is the (R(A*), N(A*))-inverse for A. A†b is the minimal norm least squares solution of Ax = b. If b ∈ R(A), then A†b is the solution of minimal norm.
There are, of course, several possible (i, j, k)-inverses which are not included in Table 6.1. However, they are of lesser importance and their properties can be inferred from those listed in the table. For example, a generalized inverse which will provide least squares solutions and whose range and null space are complements of N(A) and R(A), respectively, must clearly be a (1,2,3)-inverse.
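Since each class in Table 6.1 is cut out by a subset of the four Penrose conditions, a candidate inverse can be classified numerically. The following sketch (Python with NumPy assumed; the helper name penrose_conditions is ours) reports which conditions a given G satisfies:

```python
import numpy as np

def penrose_conditions(A, G, tol=1e-10):
    """Return the set of Penrose conditions (1)-(4) satisfied by G."""
    conds = set()
    if np.allclose(A @ G @ A, A, atol=tol): conds.add(1)
    if np.allclose(G @ A @ G, G, atol=tol): conds.add(2)
    if np.allclose((A @ G).conj().T, A @ G, atol=tol): conds.add(3)
    if np.allclose((G @ A).conj().T, G @ A, atol=tol): conds.add(4)
    return conds

A = np.array([[1., 0.], [0., 0.], [0., 0.]])
print(penrose_conditions(A, np.linalg.pinv(A)))        # {1, 2, 3, 4}: the Moore-Penrose inverse
print(penrose_conditions(A, np.array([[1., 0., 5.],
                                       [7., 0., 0.]])))  # {1}: an equation solving inverse only
```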
3.
(1)-Inverses
As already mentioned, the important types of (i,j, k)-inverses are usually
members of A{1}. Therefore, we will take some time to discuss (1)-inverses. All of the facts listed below are self-evident and are presented here for completeness.
Theorem 6.3.1
If A ∈ C^{m×n} and A⁻ ∈ A{1}, then
(i) rank(A⁻) ≥ rank(A);
(ii) rank(AA⁻) = rank(A⁻A) = rank(A);
(iii) (A⁻)* ∈ A*{1};
(iv) for non-singular P and Q, Q⁻¹A⁻P⁻¹ ∈ (PAQ){1};
(v) if A has full column rank, then A⁻A = I_n;
(vi) if A has full row rank, then AA⁻ = I_m;
(vii) if P has full column rank and Q has full row rank, then Q⁻A⁻P⁻ ∈ (PAQ){1};
(viii) [A⁻ 0; 0 0] is a (1)-inverse for [A 0; 0 0];
(ix) if A is hermitian, then there exists a hermitian (1)-inverse for A (for example, A†);
(x) if A is positive (negative) semi-definite, then there exists a positive (negative) semi-definite (1)-inverse for A (for example, A†).
Theorem 6.3.2
For A, X, B, C conformable matrices, the matrix equation AXB = C has a solution if and only if AA⁻CB⁻B = C, in which case the set of all solutions is given by
{X} = {A⁻CB⁻ + H − A⁻AHBB⁻ | H arbitrary}.   (1)
Proof If AXB = C is consistent, then there exists an X0 such that C = AX0B. Thus AA⁻CB⁻B = AA⁻AX0BB⁻B = AX0B = C. If C = AA⁻CB⁻B, then X = A⁻CB⁻ is a particular solution. To prove (1), note first that for every H, A⁻CB⁻ + H − A⁻AHBB⁻ is a solution of AXB = C. Given a particular solution X0, there exists an H0 such that X0 = A⁻CB⁻ + H0 − A⁻AH0BB⁻ (by the consistency condition); take H0 = X0. ■
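The consistency test and the particular solution of Theorem 2 are easy to check numerically; in the sketch below (NumPy assumed, random data for illustration) the Moore–Penrose inverses serve as the (1)-inverses A⁻ and B⁻:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))   # rank-deficient
B = rng.standard_normal((3, 2))
X_true = rng.standard_normal((3, 3))
C = A @ X_true @ B                               # guarantees consistency

Am, Bm = np.linalg.pinv(A), np.linalg.pinv(B)    # Moore-Penrose inverses are (1)-inverses
consistent = np.allclose(A @ Am @ C @ Bm @ B, C) # the consistency condition of Theorem 2
X = Am @ C @ Bm                                  # a particular solution when consistent
print(consistent, np.allclose(A @ X @ B, C))     # True True
```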
Corollary 6.3.1 For A ∈ C^{m×n} and b ∈ R(A), the set of all solutions of Ax = b can be written as
{x} = {A⁻b + (I − A⁻A)h | h ∈ C^n}.
Likewise, for X ∈ C^{m×n} and c ∈ R(X*), the solution set of l*X = c* is
{l*} = {c*X⁻ + h*(I − XX⁻) | h ∈ C^m}.
Corollary 6.3.2 For A ∈ C^{m×n}, the set of solutions to AX = 0 is given by
{X} = {(I − A⁻A)H | H ∈ C^{n×p}},
and the set of solutions to YA = 0 is given by
{Y} = {H(I − AA⁻) | H ∈ C^{q×m}}.
Theorem 6.3.3
If A ∈ C^{m×n}, then A{1} can be characterized by either of the following:
A{1} = {A† + H − A†AHAA† | H ∈ C^{n×m}},   (2)
A{1} = {A† + H(I − AA†) + (I − A†A)K | H, K ∈ C^{n×m}}.   (3)
Proof To prove (2), note that AXA = A is always consistent since AA†A = A. The result in (2) now follows from Theorem 2. To prove (3), note that A† + H(I − AA†) + (I − A†A)K ∈ A{1} for all H, K, and that for any particular element A⁻ ∈ A{1}, there exist an H0 and K0 such that A⁻ = A† + H0(I − AA†) + (I − A†A)K0, namely H0 = A⁻ − A† and K0 = A⁻AA†. ■
Since (1)-inverses provide solutions to consistent systems, it seems only natural to investigate the possibility of using (1)-inverses to provide a common solution, if one exists, to two systems. Let A ∈ C^{m×n}, B ∈ C^{p×n}, a ∈ C^m, b ∈ C^p, and consider the two systems
Ax=aandBx=b.
(4)
The problem is to find an
which satisfies both. This problem is clearly equivalent to solving the partitioned system
IA]
ía]
(5)
LB]X = Lb] If (1)-inverses could be obtained for the partitioned matrix M
= [B]' then
the system (4), could be completely analysed. The following result provides the solution to this problem.
Theorem 6.3.4
IA]
matrix M
For
a (1)-inverse for the partitioned
is given by
= Lii GM = [(I — (I — AA)(B(I — AA)fB)A
(I — AA)(B(I
—
AA))]. (6)
Further,nore,for CECTM Xe, a (1)-inverse for the partitioned matrix N = [A C] is given by
GN
1A(1 — C((I — AA)C)(I — AA)) ((I — AA)Cf(1 — AA) =L
7
If (f is taken to mean (1, 2)-inverse, then GM eM (1,2) and GN eN (1,2). The representation for the entire class M (1) as well as N (1) can be given in terms of GM and GN. We have chosen not to present these representations. Theorem 4 is proven by simply verifying the defining equations are satisfied. We can now say something about (4).
Theorem 6.3.5
aeR(A), beR(B), and let x0 and Xb Let be any two particular solutions for the systems
Ax=aandBx=b respectively. Let F = B(I
(8) —
AA). The following statements are equivalent.
The two systems in (8) possess a common solution.
B; — beR(F) = R(B/N(A)) —
XbEN(A) + N(B).
(9) (10)
(11)
Furthermore, when a common solution exists, a particular common solution is given by
x = (I — (I
—
AA)FB)x0 + (I — AA)Fb
(12)
and the set of all common solutions can be written as + (I — AA)(I
—
FF)hIhEC"}.
(13)
Proof The chain of implication to be proven is(1l) (11) Suppose (11) holds. Then X0 — = + nbEN(B) so that B; — b = B(x0 — xb) = BnQER(F)
If (10) holds, then the vector x
(10)
since
flQEN(A),
of (12) is a common solution
= A; = a, and
+ FF1.
= B; —
(14)
= b — FFb. — b) = B; — b, or B; — Now (10) yields that Therefore, (14) becomes Bx = b. (9) (11): If there exists a common solution then the two solution sets Thus there exist must intersect. That is, + N(A) } {xb + N(B) } and (11) vectors ;eN(A) and nbEN(B) such that x0 + n0 = Xb + follows. To obtain the set of all solutions, use the fact that they can be written as
+
=
{
=
—
+
+ (I —
G4])h}
where GM is given in (6). Now,
I
—
= I — (AA —(I— AA)FBAA) — (I — AA)FB =
(I — AA)(I — FB + FBAA) = (I — AA)(I — FB(I — AA)) = (I — AA)(I — FF), which gives (13). We shall now present some results on finer partitions than those discussed in Theorem 4. Representations for (1)- and (1,2)-inverses of
•
matrices partitioned as
M_1AIC [R ID where A, C, R, and D are any conformable matrices will be given. First, we need a technical lemma and some notation. Notation. For and TeT{1}, ET and FT will denote the
ITT.
Lemma 6.3.1
Let
such that N
=
and let X,
be (1)- or (1,2)-inversesfor X, Y and W, respectively, which satisfy
and
=0, and WX =0,
0,
=0. Then, depending on how ()
is interpreted, a (1)- or (I, 2)-inverse for N is given by
Q[ZYI—I].
(15)
where Q = FY(EWZFVIEW.
Proof The proof amounts to showing that the defining relations are satisfied. Let L1 and L2 denote the first and second term of the sum on the right-hand side of (15), so that
NL1N
Ix
1NLN 2
=LWTE
0
=
=Z— and hence, EWZYY + WWZ + EWZQZFY = Z so that NNN = N. If () Now, EWZQZFV =
and
is
interpreted as meaning a (1,2)-inverse, then a direct calculation shows that L1NL1 = L1, L1NL2 =0, and L2NL1 =0. By using the fact that Z is a (1)-inverse for Q, it is also easy to verify that L2NL2 = L2 and hence
NNN=N. •
In passing, we remark that there are three other forms of Lemma 1 which are possible by considering the following three sets of hypotheses:
ZY=0 and YZ=0. XW=OandWX=O.
YX=0,
=0, WZ =0,
(16)
(17)
=0.
=0 and
(18)
By performing a permutation of rows, or columns, or both, and then applying Theorem 2.1, a representation which resembles (15) can be obtained for each of the previous cases. With the aid of Lemma 1, it is now possible to develop a representation for a (1)- or (1,2)-inverse of a completely general partitioned matrix. Theorem 6.3.6
Let MeCTM "denote the matrix
IAC
MLRD. and let Z = D — RAC, Y = EAC, W = RFA and Q = FV(EWZFVIEW. or (1,2)-inverse for M, depending on how (f is interpreted, is given by
M
—
IA
—
ACYEA
—
+
[FAWZ +
—
FAWRA YEA
—
FAWZYEA
FAW :
+
—
I].
0
Proof For the moment, let (f denote a (1,2)-inverse and let P.S and N denote the matrices
rI Then, M = PNS
IA V II Ac1 1j.S=[0 jandN=Lw z
ol
and
a (1.2)-inverse for M
is
given by
M=S'NP1.
(20)
and W, the matrices G = YEA and H = FAW are (1,2)-inverses for V and W, respectively, such that GA =0 and AH =0. Since it is also true that each (1,2)-inverse A satisfies A Y =0 and WA =0, Lemma 1 is used to obtain a (1,2)-inverse for N as For every (1,2)-inverse
+
N=
I
—
J] (21)
Using (21) with (20) yields (19), the desired result for (1,2)-inverses. If (f is interpreted as meaning only a (1)-inverse, it is a matter of direct
computation to verify that the matrix (19) is still a (1)-inverse. U Observe that M may also be factored in three other ways as
II IAR
I
R_DCAJ[ i I1IR
o
RD
ERD
i
C—AR'D]LO
i
ICD L i
1ICA
O][ C
hID
1IDR
EDR
ojLcFD A—CDR][ i
I
o
Coupling each of these with the appropriate form of Lemma I which is obtained from (16), (17) or (18), one can use the same method as in the proof of Theorem 6 to derive three other representations for M which resemble that obtained in Theorem 6. Theorem 6 has several useful consequences. Corollary 6.3.3
If M =
where R(C) c R(A) and
RS(R) c RS(A),(RS() denotes the row space), then a particular (1)- or (1,2)-inverse for M is given by
M-
1A + AC(D — RAC)RA
—
AC(D — RACI
=L This gives the familiar form which occurs when the matrices M and A are non-singular and is taken as By using Lemma 3, we have the following.
If M
Corollary 6.3.4
where AECrXr and rank(M)
=
rank(A) = r, then a particular (1)- or (1,2)-inversefor M is given by 0 M_1A o Lo In many applications, the matrices involved are either positive or
negative semi-definite. In these cases, (1)-inverses of partitioned matrices are easy to find. Below we give the result for positive semi-definite matrices. The results for negative semi-definite matrices are left as exercises.
IfS =
Corollary 6.3.5
is positive semi-definite, then a
particular (1)-inverse for S is given by
- 1A + AC(D — C*AC)C*A
I
—
—
= Proof
Since S is positive semi-definite, S can be written as
i' 5*]'
S— —
—
—
D
c
Clearly, R(C) =
= R(A) and RS(C*) = = RS(A). The result now follows from Corollary 3. • Partitioned matrices of the form X
[x* 1
0
occurred in our treatment of the constrained least squares problem of Section 6 in Chapter 3. They also occur in statistical applications. As we shall see in the next section, (1)-inverses of B can also be used to present a unified treatment of the subject of linear models. Theorem 6 provides a (1)-inverse of B which will be used in the next section. Corollary 6.3.6 B
Iv lxi
= B
—101
Let
be positive semi-definite,
A particular (1)-inverse for B is given by
1111
.
—
VX
-
where Q =
(EXVFX.)EX. Theorem 6 can also be used to represent the rank of a large partitioned matrix in terms of ranks of matrices of lower order.
Theorem 6.3.7
Let M€CMXJ denote the matrix M
The rank
of M is given by rank(M) = rank(A) + rank(Y) + rank(W) + rank(U), where V = EAC. W = RFA. and U = EW(D — = EWZF.. Moreover, the expressions rank(Y), rank(W), and rank(U) do not depend upon which (1)-inverses are used.
Proof For every (1)-inverse M of M. the product MM is idempotent. Using Theorem 6 we compute MM as 0
1
I
so that (Tr = trace)
rank(M) = rank(MM) = Tr(MM) = Tr(AA) + Tr(YYEA) + Tr(WW) + (22) are Because E1Y = V and WW, and = U, and since AA, idempotent. (22) becomes rank(M) = rank(A) + rank(Y) = rank(W) + rank(U). To show that rank(Y), rank(W), and rank(U) are independent of the (1)-inverses used, let G1 and G2 be two (1)-inverses for A and let
E1=I—AG11E2=I—AG2,Y1 =E1Cand Y2=E2C. Because E1E2 = E1 and E2E1 = E2, it follows that V1 = E1Y2 andY2 = that rank(Y1) = rank(Y2). Similar remarks may be made about W. From the first part of the theorem, rank(U) = rank(M) — rank(A) — rank(Y) — rank(W), so that rank(U) is also constant with respect to the so
(1)-inverses used. U By performing row permutations, column permutations, or both, one can obtain three more forms of Theorem 7. Corollary 6.3.7
For the matrix M of Theorem 6.3.7,
rank(M) ≤ rank(A) + rank(C) + rank(R) + rank(Z),
where Z = D − RA⁻C, A⁻ ∈ A{1}.
Proof The inequality follows directly from Theorem 7 since
Y=EAC, W=RFA,andU=EWZFV. • Corollary 6.3.8
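The rank relations of Theorem 7 and Corollary 7 are easy to observe numerically. A sketch (NumPy assumed), using a small example in which R(C) ⊂ R(A) and the row space of R lies in that of A, so that the bound of Corollary 7 reduces to rank(M) = rank(A) + rank(Z):

```python
import numpy as np

A = np.array([[1., 2.], [2., 4.]])      # rank 1
C = A @ np.array([[1.], [3.]])          # R(C) lies in R(A)
R = np.array([[2., 4.]])                # row space of R lies in that of A
D = np.array([[7.]])
M = np.block([[A, C], [R, D]])

Z = D - R @ np.linalg.pinv(A) @ C       # pinv(A) serves as one choice of A⁻
lhs = np.linalg.matrix_rank(M)
rhs = sum(np.linalg.matrix_rank(T) for T in (A, C, R, Z))
print(lhs, "<=", rhs,
      "; here rank(M) = rank(A) + rank(Z) =",
      np.linalg.matrix_rank(A) + np.linalg.matrix_rank(Z))
```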
If rank(M) = rank(A), then
R(C) c R(A), RS(R) c RS(A), and D
= RAC
for every A. Conversely, (1 the three conditions of(23) hold for some then rank(M) = rank(A).
Proof Ifrank(M)=rank(A),thenY=0W=O,andU=0.Y=O R(A) and W =0 implies RS(R) RS(A). Since Y =0 and W =0, it follows that U = Z and U =0 implies D = RA C. Conversely, implies R(C)
the three conditions of (23) imply Y =0, W =0, and U =0. U Again, there are three additional forms of Corollaries 7 and 8 which can be proven. For example:
(23)
If R(C) c R(A) and RS(R) c RS(A), then rank(M) = = rank(A) + rank(D — RA C)for every A. Conversely, (A) and rank(D — RA C)for some A, then R(C) rank(A) +
Corollary 6.3.9
RS(R)
RS(A).
We now turn to the slightly more specialized problem of block triangular matrices. That is, either R =0 or C =0. We shall limit the discussion to upper block triangular matrices. For each statement concerning upper block triangular matrices, there is a corresponding statement about lower block triangular matrices easily proved using transposes. Let T =
AECMXn, CECrnXr, DECPXr
From Theorem 6 one gets that:
Theorem 6.3.8
A (1)-inverse for T is always given by
—CD1 where Q = FD(ECFD)EA. Furthermore, then (24) yields a (1,2)-inverse for T. From Theorem 7, we have that:
(24)
represents any (1,2)-inverse,
Theorem 6.3.9 rank(A) +
4.
For every choice of(1)-inverses in EA and FD, rank(T) = rank(D) + ranlc(EACFD).
Applications to the theory of linear estimation
Once one realizes that generalized inverses can be used to provide
expressions for solutions (or least squares solutions) to a linear system of algebraic equations, it is only natural to use this tool in connection with the statistical theory of linear estimation. Indeed, the popularity of generalized inverses during the last two decades was, in large part, due to the interest statisticians exhibited for the subject. Much of the early theory of generalized inverses was developed by statisticians with specific applications relating to linear estimation in mind. One advantage of introducing generalized inverses into the theory of linear estimation is that a unified theory can be presented which draws no distinction between full rank and rank deficient models or between models with singular variance matrices as opposed to those with non-singular variance matrices. We will confine our discussion to applications involving problems of linear estimation. A complete treatment of how generalized inverses are utilized in statistical applications would require another book almost the size of this one. In Chapter 2 the application of the Moore—Penrose inverse to least
problems was presented in a way that avoided the introduction of statistical terminology. However, in this section we will assume the reader is familiar with standard statistical terminology and some of the basic concepts pertaining to statistical models. We will analyse the linear model y = Xb + e where y is an (n x 1) vector of observable random variables, X is an (n x k) matrix of known constants, b is a (k x 1) unknown vector of parameters, and e is a (n x 1) vector of non-observable random variables with zero expectation and variance matrix E(ee*) = Var(y) = a2V. Here V is positive semi-definite and known, but is unknown. We will denote this model by (y, Xb, r2V). If rank (X) = k and V = I, then the celebrated Gauss—Markov theorem guarantees that the least squares solution 6= Xty = (X*X) 1Xy provides the minimum variance linear unbiased estimate of b. However, in the general case where X is possibly rank deficient or V is possibly singular, there may be no unbiased estimate of b. Then only certain linear functions of b are unbiasedly estimable. The problem is to obtain minimum variance linear unbiased estimates for estimable linear functions of b as well as an unbiased estimate for Throughout this section, all matrices will have real entries and (.)* denotes the transpose. When V is singular, there are some natural restrictions on y as well as b. In order to derive these restrictions, as well as other results, we will need to make frequent use of the following fact. squares
Lemma 6.4.1
For the model (y, Xb, σ²V) and for l1, l2 ∈ R^n and a1, a2 ∈ R, Cov(l1*y + a1, l2*y + a2) = σ²l1*Vl2 and Var(l*y + a) = σ²l*Vl.
Proof Cov(l1*y + a1, l2*y + a2) = E[(l1*(y − Xb))(l2*(y − Xb))] = E[l1*ee*l2] = σ²l1*Vl2. That Var(l*y + a) = σ²l*Vl follows by taking l1 = l2 = l. ■
We now investigate the restrictions which are naturally present when V is singular. Since V is positive semi-definite, there exists an orthogonal matrix S such that S*VS is the diagonal matrix, 0 22 ID2 01 2. 0jwhere21#Oand
Lo
0. r = rank(V). If T =
then it is easy to see that the model
(y, Xb, cr2V) is equivalent to the model
(I) Let S be partitioned as S =
where P is(n x r). Then (1)can be
written in the equivalent form 2
Q*y_.Q*xb+Q*e It follows that Var(Q*e) = 0 and E(Q*e) = 0 so that Q*y = Q*Xb with probability 1.
(3)
Equation (3) is just a set of linear restrictions on b. If the linear system in (3) is assumed to be consistent, then the model (y, Xb, o2V) is called a consistent model. Therefore, the model (2). and hence (y, Xb, o2V), is equivalent to a restricted model of the form (4)
where = D 1P*y, = D é = D 1P*e, R Q*X,and f= Rb = f, (7.21) has the obvious interpretation notation (i,,
The
b such that Rb = f, Var(é) = 021.
=
In the sequel, we will always assume all models we write are consistent.
Definition 6.4.1 A linear function c*b is said to be linearly unbiasedly estimable under (y, Xb, σ²V) if there exists a vector l ∈ R^n and a scalar a ∈ R such that E(l*y + a) = c*b for all b such that Q*Xb = Q*y, where Q* is as in (2). Whenever we use the term 'estimable' in the sequel, we will mean linearly unbiasedly estimable.
We are now in a position to characterize those vectors c such that c*b is unbiasedly estimable. Theorem 6.4.1
The function c*b is estimable under (y, Xb, o2V) if and
only if ceR(X*).
Proof
From Corollary 3.1, Q*Xb = Q*y if and only if
beS" = {(Q*XfQ*y + [I — (Q*xf(Q*X)]hlhepkk
and zeR such that E(l*y + x) = c*b, beb"
There exists
+ z = c*b, b€U' — l*X)b = z, be 9" —
=
l*X)[(Q*XIQ*y + (I — (Q*X)(Q*X))h] = 1*X)(Q*X)Q*y — l*X)[I — (Q*X)(Q*X)J = 0 —
Ia= = (
— I*X)(Q*X)Q*y + (c*
—
I*X)(Q*X)Q*]X
—
—
— l*X
•
There are several important consequences of Theorem 1.
Corollary 6.4.1 A linear function I*y + a is an unbiased estimate of c*b under (y, Xb, o2V) if and only !f there exists a vector d such that = y*d, with probability 1, and X*(I + d) = C.
In some of the literature on linear estimation, it is often stated that if l is a vector such that l*y is an unbiased estimate for c*b under (y, Xb, σ²V), then X*l = c. This is wrong. Corollary 4.1 shows that X*l = c is sufficient but not necessary for l*y to be an unbiased estimate for c*b under (y, Xb, σ²V).
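Estimability itself is easy to test numerically: by Theorem 1, c*b is estimable exactly when c lies in R(X*), the row space of X. A sketch (Python/NumPy assumed; the design matrix is a made-up rank-deficient one-way layout):

```python
import numpy as np

def is_estimable(X, c, tol=1e-10):
    """c*b is estimable under (y, Xb, sigma^2 V) iff c lies in R(X*)."""
    proj = X.T @ np.linalg.pinv(X.T) @ c   # orthogonal projection of c onto R(X*)
    return np.linalg.norm(proj - c) < tol

X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.]])
print(is_estimable(X, np.array([0., 1., -1.])))   # True: a group contrast
print(is_estimable(X, np.array([0., 1., 0.])))    # False: an individual effect
```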
We next need a result which will tell us what the entire set of linear unbiased estimates for c*b looks like. Suppose c*b is estimable under (y, Xb, o2V). The set of linear unbiased estimates for c*b is given by
Theorem 6.4.2
where
Proof Suppose first that
=
Clearly, E(cl') =
for some EU. Then cli = such that = c*b. Conversely, suppose is a linear
unbiased estimate for c*b. Then, by definition, there exists a vector I and a scalar a such that = I*y + a. Since i/i is unbiased, Corollary 4.1 guarantees that there is a vector d such that c' = (1+ d)*X and d*y = (with probability 1) so that
= I*y + a = I*y + d*y = (I + d)*y.
•
= c' and hence EU. with Therefore, has the form As an immediate consequence of Theorem 2, we have the following. Corollary 6.4.2 Suppose that c*b is estimable under (y, Xb, o2V). 1 is a vector such I*y is an unbiased estimate of c*b and only if there exists a = = vector { such that where Using Corollary 6.3.1, the set of unbiased estimates for c*b given in Theorem 2 can be written in terms of any element of X { 1}. Corollary 6.4.3
Suppose c*b is estimable under (y, Xb, o2V) and
XEX{1}. The set of linear unbiased estimates for c*b is given by
U=
{c'Xy +
We now address the problem of finding a form for the minimum variance linear unbiased estimate of an estimable function. From
Theorem 2 and Lemma 1, we know that when cb is estimable under (y, Xb, a2V), is a linear unbiased estimate for cb which has minimum
variance if and only if there is a vector minimum
such that
=
and the
minzVz where 9=
(6)
is attained at z = Since V is positive semi-definite, there exists a matrix A such that V = MA. Thus (6) is equivalent to minhiAzil where
(7)
This is a constrained least squares problem of the type studied in Section 3.6. Theorem 3.6.1 guarantees that the minimum (7) is attained at a vector z0 if and only if there is a vector A such that
Izi.isa least squares solution of Iv .
Let B =
:
XlIzl
101
(8)
Lc]
Using Corollary 3.6, it is easy to see that
[0]ER(B)
because for the B of Corollary 3.6,
BB[0]
=
(9)
=
by Theorem 6.4.1. Thus we can make an even stronger statement than (8) by replacing the phrase 'least squares solution' in (8) with the word 'solution' to give the following result. When cab is estimable under (y, Xb, c,r2V), is a linear unbiased estimate of cab which has minimum variance and only there are
Theorem 6.4.3 vectors
and A such that
= {y and
x][c]Io
fv
0]LafLc
LX*
A useful equivalent formulation of Theorem 3 is as follows.
Theorem 6.4.4 When cab is estimable under (y, Xb, a2V), is a linear unbiased estimate of c*b which has minimwn variance if and only if there is a vector such that each of the following conditions hold. (i)
=
(ii)X{=c (iii)
The linear unbiased estimate of cb which has minimum variance is unique in the following sense. Theorem 6.4.5
Suppose that cb is estimable under (y, Xb, a2'I). If
10
and are both linear unbiased estimates of c*b which hae'e minimum variance, then i/is = with probability I.
Proof From Theorem 4 we know that there exist vectors
and
such that (11)
il'
and
(12)
—{2)ER(X).
(13)
From (13) we know there is a vector h such that
—
we can use Lemma I together with (12) to obtain Var(il'1
—
"2) = a2({
—
Thus, there is a constant such that However, (11) and (12) imply that K =
({ —
=
—
=0. U
—
=
— —
= Xh so that
=0.
with probability I. — = =
In light of this result, we make the following definition.
Definition 6.4.2 When c*b is estimable under (y, Xb, a2V), is called the best linear unbiased estimate (or BLUE)for c*b when is the unique linear unbiased estimate for c*b with minimum variance.
If c', is the BLUE for cb, then i,1'
unique. There are, however, generally infinitely many vectors satisfying the conditions of Theorem 4 which can give rise to ci'. Although there may be a slight theoretical interest in representing all of the associated with the BLUE this is usually not the problem of prime concern. The important problem is to obtain some formula for i/i. If any one particular satisfying the conditions of Theorem 4 can be determined, then the problem of finding the BLUE of cb is considered to be solved. It is clear from Theorem 3 that knowledge of any (1)-inverse of the matrix
is
can provide a representation
for the BLUE of cb. Such a (1)-inverse can also provide other valuable information. Before pursuing this further, we need the following definition. Definition 6.4.3
For the linear model (y, Xb, o2V), let B denote the
matrix. (V is n x n and X is n x k).
An n x n matrix is said to be a B11-matrix it appears as the upper left hand block in some BeB{1}. Likewise, those n x k matrices which appear as an upper right hand block in some B are called B12-matrices; those k x n matrices which appear in the lower left hand corner of some B are called B21-matrices; and those k x k matrices which appear in the lower right hand corner of some B are called B22-matrices.
A somewhat amazing fact about B1çmatrices is that each class is
completely independent of every other class in the sense that if Q is any 1-matrix, U is any B1 2-matrix, L is any B21-matrix, and T is any is always a (1)-inverse
B22-matrix, then the composite matrix
for B. Furthermore, the .-matrices can be computed independently of each block which appears in a B can be one another. This means calculated as a separate entity without regard to any other block which might appear in the same B (or any other B). In order to establish these facts, we need some preliminary lemmas. Lemma 6.4.2
Let E = I — XX'. A matrix Q is a B1 1-matrix
and only
satisfies the four equations
E(V—VQV)=0,
(14)
VQX=O, X*QV=O,
(15)
X*QX=O.
(17)
Proof
(16)
Suppose first that Q is a B11 -matrix. Then there must exist
This implies,
matrices W12, W21, and W22 such that
by direct multiplication, that (1) VQV + XW21 V + YW + XW22X = V; (ii) VQX + XW21X = X; (iii) X*QV + X*W12X* = X*; and (iv) X*QX =0. Note that (iv) is equation (17). To establish (15), multiply (ii) on the left by X*Q* and use (iv) to obtain XQVQX =0. Since V is positive semi-definite, V = A and it is easy to see that (15) follows. Equation (16) follows in a similar manner. Equations (ii) and (iii) now degenerate to XW21X = X and X*W12X* = X*.
(18)
To establish (14), notice that for every vector I,, and BeB{1}, (9) guarantees that BB
_rol 101
Lx*hi = [x.h]
From this it follows that
VW12X* + XW22X* =0 so that (i) becomes
V—VQV=XW21V.
(19)
Equation (14) is obtained by multiplying (19) on the left by E. Conversely, if Q satisfies (14)—(17), then F and Q is
Q
(I_QV)X?*
a B11-matrix. •
B!
GENERALIZED INVERSES AND LINEAR ESTIMATION
111
Lemma 6.4.3
The term VQV is invariant for all B11 -matrices Q. Moreover. for every B11 -matrix Q, VQV = A*(AE)(AE)tA where
E=I_XXtandV=A*A. Proof
Suppose Q isa B11-matrix so that (14)—(17) hold. By direct multiplication, along with (15) and (16), and the fact that Xt = (X*X)tX* = X*(XX*)t, it can be verified that VEQEV = VQV. Let K = EVE and QKKt G= + KtKQ — Q. It now follows from (14) that QeK{1} and hence GE K { 1). Now observe that Q can be written as
+ E(I — KtK)Q + EGE
Q = Q(! —
(20)
since (15) and (16) imply KQE = KQ and EQK = QK. But from K = Kt, it follows that KKt = K'K and (I — KKt)EV =0= VE(I — KtK). Therefore, (20) together with KKtEV = EV and VEK'K = VE yields VQV =
VEGEV= Now use the fact that GEK{l} to obtain VQV = VEKtEV = VE(EA*AE)tEV = A*AE(AE)t(AE)*$EA*A = A*(AE)(AE)tA. U
Let D = A*[I — (AE)(AE)tJA where A and E are as in
Lemma 6.4.4
Lemma 3. (By virtue of Lemma 3, D = V — VQV where Q can be any B11 -matrix.) Each of the following statements hold.
U is a B12-matrix if and only ifUEXt{1} and VUXt = D.
(21)
and XLV = D.
L isB21-matrix ifand only
(22)
T is B22-matrix (land only if XTXt = — D.
(23)
Proof of (21). Suppose first that UeXt{l} where VUXt = D. To see that U is a B12-matrix, let Q = E(EVE)tE and verify that
M-1 uLxt(I_VQ)
U
E
Bi
I
by observing that Corollary 3.6 implies that Q is a B11-matrix so that (14)—(17) can be used. Conversely, suppose that U is a B12-matrix. This
means there exist matrices Q, L and T such that
1}.
From (18)we have that UEXt{1}. The fact that VUX* = V — VQV = D follows from (19). Thus (21) holds. The same type of argument is used to prove (22), (23) except in place one uses of ML
MT
IQ =
LII r
=
I
(I_QV)Xt*1
for (22) and
(I_QV)Xt*1
for (23).
U
By combining the results of Lemmas 2—4 one arrives at the following
important result concerning the independence of the various classes of
Theorem 6.4.6 If is any B1 1-matrix. G12 is any B1 2-matrix, G21 is any B21 -matrix and G22 is any B22-matrix. then the composite matrix I
r
is a (1 )-rnverse for
B. Furthermore,the matrices C, can be
L"21 computed independently of each other. The equations on which such calculations must be based are given in (14)—(17) and (21)—(23).
Although it is not necessary to know a B11 -matrix in order to compute - matrix can be useful B12-, B21 -, or B2 2-matrices, knowledge of any since the matrix D of Lemma 4 is then readily available. Once D is known, a set of B1 B21-, and B22-matrices can be easily computed. The importance of the different B11-matrices in linear estimation is given in the following fundamental theorem. If c*b is estimable under (y, Xb,
Theorem 6.4.7
i2V), then each of the
following is true. If G12 is any B12-matrix, then the BLUE of c*b is given by (24)
If C21
is any B21 -matrix, then the BLUE of c*b is also given by
(24')
Suppose and respectively. If
are both estimable with BLUE's
and
is any B22-matrix. then
=
(25)
If G22 is any B22-matrix and i/i is the BLUE of c*b, then
_a2c*G22c.
(26)
If G11 is any B11 -matrix, then an unbiased estimator for o2 is given by Iy*G 1y where y = Tr(G1 1V).
(27)
Proof of (24): II C12 is a B12 -matrix, then there exist matrices Q, L,
and T such that
[Q
for c*b is given by
Theorem 3 guarantees that the BLUE
= {*y where satisfies (10). Therefore, a solution for
for any B. Thus one solution for
[f] is
is G12c, and hence
= Proof of(24'): If G21 isa B21-matrix, then by Lemma 4,
is a
B1 2-matrix, (24') now follows from (24).
Proof of(25): If C22 is a B22-matrix, then there exist matrices Q, U and
IQU1 ,-. IeB{1}. From Theorem 1, we know that there
L such that I
L'
exists a vectors h1 and h2 such that X*h5 = c1 and X*1I2 = c2. Use (24)
together with Lemma 1 to obtain ,
= = = =
= =
=
—
—
o2htXG22X*h2
(from (21)) (from (23)) (from (21) since UEX*{1})
is immediate. is also given by — Proof of (26): This is obtained from (25) by taking Cs = c2. Proof of (27): If G11 is a B11-matrix, then Y*GY = (Xb — e)G1 1(Xb — e) = bXtG1 3Xb — 2b*X*G1 1e + e*G1 1e. Using (17), together with the fact that E(e) =0, yields The fact that Cov(sfr1.
E(y*G 1y) = E(e*G1 1e) = E[Tr(G1 1ee*)] = Tr[E(G1 = a2Tr(G1 1V).
•
0. It can be shown In (27) we made the assumption that Tr(G1 1V) that Tr(G1 1V) =0 if and only if R(V) c R(X), which is clearly a pathological situation. The details are left as exercises. Theorem 7 shows that once any element of B { 1) is known, the problem of inference from a general linear model is completely solved and the problem of inference is thus reduced to the calculation of specific Ba-matrices. Actually, knowledge of any B11 -matrix together with any element of X (1) will suffice in order to produce the quantities of Theorem 7 (i.e. a priori knowledge of a B12-, B21- or a B22-matrix is not necessary). Theorem 6.4.8
If c*b is estimable under (y. Xb. a2V) and J'Q is any 3-matrix and X is any element of X 1), then each of the following is true.
The BLUE of cb is ç(' = c*X(I — VQ)y. = o2c*X(V (ii) If i/i is the BLUE of c*b, then (1)
—
VQV)X*c.
= o2c*XDX*c.
(iii) If cl'1 and cl'2 are the BLUE's of Cb and
respectively,
(assuming each are estimable) then —
VQV)X*c2.
= (iv) An unbiased estimation for
is
ly*Qy where y = Tr(QV).
Proof If Q is a B1 1-matrix then (14)—(17) hold and it is not difficult to show that
I
Q
(I_QV)X*
1 B 11).
The desired result now follows from Theorem 7. U
If any Xis known, then a B11 -matrix is always available via the formula Q = (I — XX)*[(I — XX)V(I — XX)*J(I — XX). However, -matrix is known, then computing X is unnecessary. The next result shows that once a B11-matrix is known, then all one needs is any solution 1* of the system I*X = c. if a
Theorem 6.4.9
If c*b is estimable under (y, Xb, a2V) and J* is any = I*X and Q is any B11 -matrix, then each of the following is
solution of true.
(I) The BLUE
of c*b is = l*(I — VQ)y
(ii) If i/i is the BLUE ofc*b, then (iii)
= a21*(V — VQV)I = a21*Dl.
respectively, then and #2 are BLUE's and = and — VQV)l = o21tD12 where ITX = 2
— c*
—V
Proof We know from Theorem 1 that I*X = is always consistent so that = c*XXt. For a particular solution, there is always a particular member X€X{1} such that 1 = c*X, namely x = X' + — c*tc*Xt. The desired conclusions now follow from Theorem 8. • We conclude by considering the special, but important, case when V is
non-singular. It is a simple exercise to show that (X*V 'X) X'V1 eX{1}. It is then easy to use Lemma 2 to show that [V' — V IX(X*V 1X)
'X)X* a B11 -matrix. Therefore, D = V — VQV = and it is clear from Lemma 4 that (X*V is a B21-matrix (X*V and — - 1X) is a B22-matrix. These observations together with (7) give the following useful result. x
is
Corollary 6.4.4
If c*b is estimable under (y, Xb, σ²V) and V is non-singular, then each of the following hold.
(i) The BLUE of c*b is ψ = c*(X*V⁻¹X)⁻X*V⁻¹y.
(ii) Var(ψ) = σ²c*(X*V⁻¹X)⁻c and Cov(ψ1, ψ2) = σ²c1*(X*V⁻¹X)⁻c2, where ψ1 and ψ2 are the BLUE's for c1*b and c2*b.
(iii) An unbiased estimator for σ² is given by (1/γ)y*[V⁻¹ − V⁻¹X(X*V⁻¹X)⁻X*V⁻¹]y, where γ = n − rank(X).
Perhaps the most common situation encountered is when V = I, in
which case we have the following.
Corollary 6.4.5
If c*b is estimable under (y, Xb, σ²I), then each of the following hold.
(i) The BLUE of c*b is ψ = c*(X*X)⁻X*y.
(ii) Var(ψ) = σ²c*(X*X)⁻c and Cov(ψ1, ψ2) = σ²c1*(X*X)⁻c2, where ψ1 and ψ2 are the BLUE's for c1*b and c2*b.
(iii) An unbiased estimator for σ² is given by (1/γ)y*[I − X(X*X)⁻X*]y, where γ = n − rank(X).
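Corollary 5 translates directly into a few lines of code. The sketch below (NumPy assumed, with simulated data from a rank-deficient one-way layout) computes the BLUE of an estimable contrast and the unbiased estimate of σ²; any (1)-inverse of X*X could be used, and the Moore–Penrose inverse is a convenient choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-deficient one-way layout: y = Xb + e with Var(e) = sigma^2 I.
X = np.array([[1., 1., 0.]] * 5 + [[1., 0., 1.]] * 5)
b_true = np.array([1., 2., -1.])
sigma = 0.5
y = X @ b_true + sigma * rng.standard_normal(10)

c = np.array([0., 1., -1.])                 # an estimable contrast (c lies in R(X*))
XtX_inv = np.linalg.pinv(X.T @ X)           # one (1)-inverse of X*X
blue = c @ XtX_inv @ X.T @ y                # BLUE of c*b, Corollary 6.4.5(i)

resid = y - X @ XtX_inv @ X.T @ y
dof = len(y) - np.linalg.matrix_rank(X)
sigma2_hat = resid @ resid / dof            # unbiased estimate of sigma^2, part (iii)
print(blue, sigma2_hat)
```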
Notice that (X*X)⁻X* ∈ X{1,3}, so that (X*X)⁻X*y is just another way of representing any least squares solution of Xb = y. Also, c*(X*X)⁻ is just any solution, l*, of c* = l*(X*X). Thus Corollary 5 can also be stated in terms of solutions of c* = l*(X*X) or in terms of least squares solutions of Xb = y. Similar remarks can be made about the results in Corollary 4 because c*(X*V⁻¹X)⁻ represents any solution of c* = l*(X*V⁻¹X) and (X*V⁻¹X)⁻X*V⁻¹y represents any weighted least squares solution of Xb = y. (By a weighted least squares solution of Xb = y, we mean any vector z such that ||Xz − y||²_{V⁻¹} = (Xz − y)*V⁻¹(Xz − y) is minimized, or equivalently, any solution of the weighted normal equations X*V⁻¹Xz = X*V⁻¹y.) In conclusion, we note that not only are linear models with singular variance matrices representable as restricted linear models but that restricted linear models (y, Xb | Rb = f, σ²V) are just special cases of linear models where the variance matrix is singular. Indeed, one can always write
ỹ = [y; f],  X̃ = [X; R],  Ṽ = [V 0; 0 0],
and it is clear that the restricted model (y, Xb | Rb = f, σ²V) is equivalent to (ỹ, X̃b, σ²Ṽ), where Ṽ is singular.
5.
Exercises
Verify each of the following assertions.
1. (AØB)e(AØB){1} where A®B denotes the Kronecker product of A and B.
2. Let G = U(VAUIV and let rank(A) = r. Each of the following is true. (a) GeA{l) iffrank(VAU) = r. (b) GeA and R(G) = R(U) if rank(VAU) = rank(U). (c) and N(G) = N(V) if rank(VAU) = rank(V). (d) G is a (R(U), N(V))-inverse for A iffrank(U) = rank(V) = rank(VAU) = r. If rank(A*VA) = rank(A), then A(A*VAI(A*VA) = A and (A*VA)(A*VA)A* = A*. 4. Verify A(A*AIA* = AAt. 5. II R(C) c R(A) and RS(R) c RS(A), then RAC is invariant over A I 3.
AAAeA{1,2). 7. For reC' rAA =
6.
8. For Ge A { 1), the following statements are equivalent: (a) GE A { 1,2), (b) rank(A)= rank(G), (c) G = G,AG2 for some G, , G2eA{1J. 9. If has rank r, then there exists G1eA{ I) such that rank(G1) = r+ I, 1= 0, 1, 2, ... , min(m,n). 10. If and P is a non-singular matrix such that PA = H where H is the Hermite canonical form for A, then PEA { 1). 11. For every there exists a GeA{1) and FeB{1)
such FGe(AB){1).
116 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS 12.
Let KrnXn denotes the set of m x n matrices with integer entries. II
then there exists
such that AGEKMXM and
(Hint: Consider C = QStP where PAQ = SandS is the Smith canonical form with P and Q being non-singular.) where beR(A). Let C = QStP (as described and 13. Let above). Ax = b has an integer solution if and only if GbE XI which case the general integer solution is given by x = Gb + (I — GA)h. .
hEKNX I. 14.
and Q* be permutation matrices such that M =
Let
= rand let T = A 'C, S = RA
where rank(M) =
Then
01 (j)
(ii)
(iii) (iv) 15.
Q[...](I +T1'*) 'A '(I + S*S)[I!S*]P= Mt.
Let P and Q be non-singular matrices and let A be an r x r
non-singular matrix such that M = p'
LetG=Q
1•
P.Then
(i) GeM{l) iffZ=A'. iffZ = A1 and W = VAU. U=
(ii) (iii)
W= —
where P
= (iv) where Q
iffZ=A'.V =
—Qk?,U
= [Q, Q2]
and 16.
For A€CrnxhI. CECtm. deCk. let E = I —
AA. F = I — AA, and
fl=1 A (1)-inverse for A + cd* is given by one of the following: (i)
-
-
Acc*E*E
FF*dd*A
FF*dC*E*E
(A + cd*) = A — c*E*Ec — d*FF*d + P(c*E*Ec)(d*FF*d) when cØR(A),
GENERALIZED INVERSES AND LINEAR ESTIMATION
FF*ddSA
(ii) (A + cd*) = A — d*FFSd when (A
=0. CER(A),
+ cd*i = A — r 'A cdA when
or dER(A*).
-
(iv) (A + cd*) = A
117
0 and either c€R(A)
- AccEE when fi =0, — c*Es&
dER(A*)
(v) (A + cd*) = A when $ = 0, c€R(A),
17. At = A*(A*AA*)A*. 18. G€A{2} 1ff there exist a pair of orthogonal projections P and Q such that G = (PAQ)t.
19. GEA{1,2} 1ff there exist A, A€A{1} such that G = AAA. 20.
A{1,3)={At+(I_AA)HlHisarbitrary},A(1,4}= (At + K(I — AA)I K is arbitrary}.
21. Let A be n x n. If P is a non-singular matrix such that PA*A is in Hermite form, then PA*€A(1,3}. 22. Let M be a subspace and let P = and P1 = The constrained system Ax + y = b, xeM, yeM1 is consistent lIT beR(AP + P); in which case the solutions are x = P(AP + P)b and y = b — Ax. When
(AP + PJ1 exists, the matrix G = P(AP + P1) 'b is called the Boft—Duffin inverse.
23. (AP + P1)' exists
exists where the columns of K form a basis for M. The Bott—Duffin inverse is G = K(KAK) lK*. 24. When it exists, the Bott—Duffin inverse is the (M. M1)-inverse of PAP. 25. Let A, denote an incidence matrix of a directed graph consisting
ofmnodes (N,,N2,...,N,,,}andndirectedpaths{P1,P2,...,Pj between nodes. That is, a., = 1 if is a path directed away from N., —1 if P, is a path directed into N., and a1,=O ifP, is a path neither leads away from or into N.. Suppose the graph is connected (i.e. every pair of nodes is connected by some sequence of 1
paths.) If GeA{1,3}, then I — AG = —J where J is a matrix of l's. 26. If A is the incidence matrix of a connected di-graph, then rank(A) = m — 1, where m = number of nodes. 27. Let W be a positive definite matrix and let "v, be the norm associated with W (i.e. x = x*Wx). G is a matrix such that x = Gb is a weighted least squares solution (II Ax — b is minimal) for every b ill G satisfies AGA = A and (WAG)* = WAG (A weighted (1,3)inverse).
28. Let V be positive definite. Gy is a V '-least squares solution of Xb = y for all y if G is a B21-matrix. 29. Let V be positive definite and suppose Ax = b is consistent. G is a matrix such that x = Gb is the minimal V-norm solution of Ax = b for all beR(A) if G satisfies AGA = A and (VGA)* = VGA (A weighted (1,4)-inverse).
118 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS 30.
(Weighted Moore-Penrose inverse) AGA = A, GAG = G, (WAG) = WAG, and = VGA, ill for all b, Gb is the W-least squares solution of Ax = b which has minimal V-norm. Moreover, there exists a unique solution for G which can be expressed as G = V"2 x (W1I2AV - h12)t w 112
= v - 1A* WA(A WAY - 'A*WA)A*W.
31. Let V be positive semi-definite and let fl x denote the semi-norm (x*Vx)h12. A vector . is called a minimum V-semi-norm solution of the = c and { is minimal among all system X*z = c, ceR(X*) if solutions. The following statements are equivalent.
(i) Cc is a minimum V-semi-norm for every ceR(X*). (ii) G is a B12-matrix. (iii) GeX*{1} and XG*V = VGX*. (The same as in Exercise 29). 32. Let W be positive semi-definite. G is a matrix such that Gb is a W-least squares solution of Ax b, for all b, ifi A*WAG = A*W. This last equation is equivalent to the two conditions WAGA = WA and (WAG)* = WAG. (Notice that G is not necessarily in A { I), as was the case in Exercise 27.) 33. Let V and W be positive semi-definite. C is a matrix such that Gb is a minimal V-semi-norm W-least squares solution of Ax = b 1ff G satisfies the four conditions WAGA = WA, VGAG = VG, (WAG)* = WAG, and (VGA)* = VGA. (If V is positive definite, there exists a unique solution for G. If V is just semi-definite, G may not be unique.) 34. If V and W are positive semi-definite and X = A*WA, then every B12-matrix satisfies the four conditions of Exercise 33. 35. If Q is any B1 ,-matrix, then Tr(QV) = rank[V X] — rank[X]: Furthermore, Tr(QV)= 01ff R(V)c_ R(X) if 0 isa B,,-matrix.
36. (V + XX*IX[X*(V + XXIXI is always a B, 2-matnx. If R(X) c R(V), then VX(X*V X) is a B, 2-matnx.
37. The matrix
non-singular if rank(VA
= it and
rank(XRk)= k. 38. If M is any matrix such that R(V + XMX)= R([V!X])and if W = (V + XMX*), then L = (X*WX)X*W is a B21-matrix, L is a B,2-matrix, W(I — XL) is a B,,-matrix, and (XWXI — M is a B22-matrix. 39. The following statements are equivalent. (i) The invariant term D of Lemma 4.4 is the zero matrix. (ii) 0 is a B22-matrix. (iii) R(V) = R(VIN(x.)). (iv) rank(V) = Tr(VQ) where is any B,,-matrix. (v) R(V) R(X) =0. 40. (Use of 2-inverses in a generalized Newton's Method.) Let x0eCa let B(x0,r)be the open ball of radius r centred at x0. Let I be a function f: B(x0,r)-. C and let J(x)eC be defined for
xeB(x0,r)where X(x)eJ(x){2}. Suppose 5,; and yare constants such
GENERALIZED INVERSES AND LINEAR ESTIMATION
119
that the following hold:
IIf(u)—f(v)—J(v)(u — w)II Lilu—vU, for u, veB(x0,r) with u — veR(X(v)). (ii) (X(u) — X(v))f(v) u— for u, veB(x0, r). (iii) cIIX(u) I +yö< 1 for ueB(x0,r) (iv) X(x) 111(x) II <(1 + ö)r. (1)
I
converges to a point Then the sequence =X— peB(x0, r) which is a solution of X(p)f(x) =0. (If X(p) has full column rank, then p is a solution of f(x) = 0.) 41. For any choices of (1)-inverses for X,X*, and K = EXVFX. where = I — XX, = I — - X, the following statements are true. IfQ is a B11-matrix, there exist matrices Z1 ,Z2,G such that GeK{1} (I — KK)Z2 + and GEE. Conversely, Q = Z1(I — ÷ the matrix Q in(s) is a for every pair Z1 ,Z2, and every
B1 1-matrix.
7
The Drazin inverse
1.
Introduction
In the previous chapters, the Moore—Penrose inverse and the other (i,j, k)-inverses were discussed in some detail. A major characteristic of the (i,j, k)-inverses is the fact that they provide some type of solution, or least squares solution, for a system of linear algebraic equations. That is, they are 'equation solving' inverses. However, we also saw that there are some desirable properties that the (i,j, k)-inverses do not usually possess. For example, if A, BE then there is no class, C(i,j, k), of(i,j, k)-inverses for A and B such that A, BE C(i,j, k) implies any of the following: X
(i) AA=AA, (ii)
= (AP) for positive integers p.
(iii) Aeo(A)=.Veor(A ), (iv)
'A =
A is similar to B via the
similarity transformation P. then A is similar to B via P. Depending on the intended applications, it might be desirable to give up the algebraic equation solving properties the (1)-inverses possess in exchange for a generalized inverse which possesses some other 'inverselike' properties. The Group and Drazin generalized inverses of this chapter will be of such a compromising nature. In many ways, they more closely resemble the true non-singular inverse than do the (i,j, k)-inverses. They will possess all of the above mentioned properties. Although the Drazin inverse will not provide solutions of linear algebraic equations, it will provide solutions for systems of linear differential equations and linear difference equations as will be shown in Chapter 9. Up to this point the underlying field has always been taken to be the field of complex numbers. Although this was not always necessary, the complex numbers provided the most natural setting for the development of the Moore- Penrose inverse as well as most of the other (i,j, k)-invcrsc. To extend the concepts of the previous chapters to matrices over different
THE DRAZIN $NVERSE
121
fields is somewhat artificial. One soon finds that the kind of field needed in order to obtain analogous results must possess properties which mimic
those of the complex numbers. There is nothing special about the complex numbers when it comes to defining the Drazin inverse. However, many of the results in the latter part of this chapter depend on the taking of limits. Rather than get into a technical discussion of the type of topology needed on the field, we shall merely note that almost all our results extend to arbitrary fields. The Group inverse, as we shall see later, is just a special case of the Drazin inverse. However, because the Group inverse appears in some interesting applications, (see Chapter 8) we consider it as a separate entity.
Definitions
2.
The Drazin inverse will only be defined for square matrices. Just as was
the case when defining the (i,j, k)-inverses, there are at least two different approaches possible when formulating the definition. These are the functional or geometric definition and the algebraic definition. The algebraic definition was first given by M. P. Drazin in 1958 in the setting of an abstract ring. We will give both definitions and then show that they are equivalent. Before doing this, some preliminary geometrical facts are needed. Throughout this chapter, we adopt the convention that 0° = I.
Lemma 7.2.1
Let A be a linear transformation on C". There exists a non-negative integer k such that C" = R(4k) + N(4k).
Proof Let k
be
the smallest non-negative integer such that
...or
...
c equivalently, N(4°) N(4) c ... c N(4"2) = .... Suppose that Then there exists a zeC" such that 4kz = x. Thus, 42kz = AkX =0, so that zeN(42k) = N(4k). Thus x =0. Suppose, that rank(Ak) = r so that dim[N(Ak)J = n — r. If {v1 ... ,v,} is a basis for R(4k) and if {v,,.,... is a basis for N(4"), it is easy to show that {v1,... ,v,, is a basis for C". U The number k which was introduced in Lemma 1 will be very important. 1)
Definition 7.2.1
N(4k) = N(4kf 1) =
Let 4 be a linear transformation on C". The smallest
non-negative integer k such that C" = R(4k) + N(4k), or equivalently, the smallest non-negative integer k such that rank(4'9 = rank(4k4 I), is called the index of4 and is denoted by Ind(4).
Note that if A is invertible, Ind(4) =0. Also lnd(0) = 1. Several different characterizations of the index will be developed in the
sequel. 114 is a linear transformation on C" and Ind(4) = k, 4 restricted to R(4k)) is an invertible linear transformation on R(4k).
Lemma 7.2.2 then
122 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Proof We
• now formulate a definition of the Drazin inverse of a linear
transformation on
Definition 7.2.2
(Functional Definition.) Let 4 be a linear transformation such that Ind(4) = k. Let XECA and write x = u +v where uER(4k) and veN(4'9. Let The linear transformation defined by = ADx = is called the Drãzin inverse of 4. For let 4 be the linear transformation induced on C" by A. The Drazin inverse, AD, of A is defined to be the matrix 0f4D with respect to the standard basis. on
Theorem 7.2.1
(The Canonical Form Representation For A and AD.) is such that lnd(A) = k >0, then there exists a non-singular matrix P such that (1)
where C is non-singular and N is nilpotent of index k. Furthermore, if P, C and N are any matrices satisfying the above conditions, then
]P_l.
(2)
Proof Let 4 be the linear transformation induced on
,...
B=
be the basis for
by A. Let
constructed in the proof
of Lemma I so that {v1,...,vj isabasisforR(4k)and{v,+1,...,v0} isa basis for N(4"). Since R(4t9, N(4") are invariant subspaces for 4 and Ak(N(Ak)) = The form we have the block form for A lIP = [vi'... for AD follows from the definition of if P is as specified. However if P. C, N are such that (1) holds, and C is non-singular, and Nk =0, then the first r columns of P are a basis for R(4k) while the remaining columns are a basis for N(4"). Thus (2) for any P., C or N by Definition 2. U 4D, is as follows. The algebraic definition of
Definition 7.2.3 (Algebraic Definition.) If and 1JADECn
with Ind(A) = k
is such that
ADAAD = AD,
(3)
AAD = ADA, and
.
(4)
(5) then AD is called the Drazin inverse of A.
Theorem 7.2.2
For
to the algebraic definition
the functional definition of AD is equivalent
of AD.
THE DRAZIN INVERSE
Proof Write A as in (1). That AD satisfies (3). (4) and (5) Suppose then that X satisfies (3), (4) and (5).
Ix
Now
11
is
123
trivial.
xl 12
Iwhere X11 and Care the same size.
From (5) we have X11 =Ck,Ck4l X12 =0. Thus X11 = C-' and X12 =0. But also XAk+I =Ak by (4) and (5). Thus X21 =0. There remains to show that X22 =0. From (3) and (4) we have
= X22. Thus N&.... 'X22 =
Nk(X22)2
=0. But then Nk 2X22 = (X22) =0. Continuing in this manner gives X22 =0 as desired. U
N(X22)2
Notice that AD exists and is unique for all since the functional definition is constructive in nature. Some important facts that are evident either from the definitions or from the above proof are listed in the following corollary.
Corollary 7.2.1
If
and Ind(A) = k, then
(i) R(AD) = R(Ak), (ii) N(A°) = N(Ak),
(iii) AAD = A°A =
A4 ).N(
(iv) (I _AAD)=(I _ADA)= l'N(A1).R(A4). (v) for a non-negative integer p. A' and only (1 p k, and (vi) if A is non-singular, then AD = A '. The number k = Ind(A) was used in the algebraic definition. Actually, any non-negative integer p. p k, could have been used.
Theorem 7.2.3
Let
negative integer and Xe
be such that lnd(A) = k. If p is a nonsuch that XAX = X, AX = XA, and
= A', then p k and X =
A' implies that = R(A') so that p k. Write 'X = Ak. p = k + i. Then (AD)IAk+l = This reduces to Thus X satisfies the conditions of the algebraic definition of AD. U Something that should immediately strike one's attention when looking at Definition 3 is that AD is not always a (1)-inverse for A. This, of course, means that AD is not an 'equation solver'. That is, if b a consistent system of algebraic equations, then ADb may not be a solution. In fact, ADb is a solution of Ax b if and only if be R(Ak) X
where k = Ind(A).•
There are special cases when AD is a (1)-inverse for A E
Theorem 7.2.4
For
X
AADA = A if and only !flnd(A)
A'
1.
Proof If Ind(A)= 0, then AD = and AADA = A. Suppose that Ind(A) 1. Then relative to(l), (2) we have AADA = A if and only if 0 = N. But 0 = N Wand only if Ind(A)= 1. U
124 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
The special case when Ind(A)
rise to what is known as the Group inverse. Notice that in this case, (5) can be rewritten as AADA = A.
Definition 7.2.4
If
1 gives
is such that Ind(A)
then the Drazin inverse of A is called the Group inverse of A and is denoted by A'. When is characterized as the unique matrix satisfying the three it exists, 1,
equations AA'A=A,AAA' =A,andAA =AA. The following theorem makes it clear why is used.
the term
'group inverse'
Theorem 7.2.5
A linear transformation 4 on which has rank r belongs to a multiplicative group, G, of linear transformations on if and only if Ind(4) 1. Furthermore, if 4EG and if 49eG is the multiplicative inverse of 4 within G, then 49
=
an 4eG such 4 is in a multiplicative group G, then there exists 4D 49449 = 49. Then 49= = 4'4, and by Theorem 3 =4,
that
Also Ind(4) as in (1),
1. Conversely,
(Ix G=1P[0
01
0]P
suppose Ind(4)
1. Then with P defined
1 XeC'x ',r=rank(C)1,
is a multiplicative group containing 4. • As a special case of Theorem 1 (or Theorem 5) we have the following.
Corollary 7.2.2
For AeC
A' exists
and only if there exist
non-singular matrices P and C such that A =
The following is a simple example of a group of singular matrices.
Example 7.2.1
Consider the following subset of RNXN.
11... G=
1
...
ii...1
It is clear that G is a multiplicative group. The multiplicative identity in
Gis
then thegroupinverseofA is A
=_!1j
Another algebraic characterization of AD is illustrated in the heuristic
diagram of Fig. 7.1. is a semi-group and the G's (one for each idempotent), are the maximal subgroups of CN N Clearly, { Gj is a disjoint family but not a partition of CN* n• If Ind(A) 1, then, as pointed out for some i, and AD = A' is just the inverse of A with respect earlier, to the group G1. If Ind(A) = k> 1, then it is not difficult to show that k can X
THE DRAZIN INVERSE
125
= Some maximal
klnd (A) (A1 )*
A°
Fig. 7.1
as that number such that Ak E G,, for some r, but Ak-I ØG,, for all r. Thus, Ak has an inverse, X, within G,. The Drazin l(A&)*. 'X = inverse of A is simply AD = Suppose that the Jordan form of A is be characterized
J1
° o
0
0
... 0
1
0
0
0
oo...ió 0
o
0
01!
0
olO
o
0
0
...
0
(If a field other than C is used one still gets (1) but not possibly
If the Jordan blocks, J1, are arranged so that the diagonal elements of ,J1.,.2, J1,J2,... ,J are non-zero and the diagonal elements of arc zeros, then the matrices C and N in Theorem 1 may be taken to be
rJ1o o...ol
C=10 J20
...
andN= 10 [o
J20
...
0
o
Theorem 1 will be fundamental in the development of the theory of the Drazin inverse. However, Theorem 1 also has a practical side. One may use this theorem to compute the Drazin inverse.
Algorithm 7.2.1
Computation of A° where A E
Xii
and Ind(A) = k.
(I) Let p be an integer such that p k. (p can always be taken to
equal to n if no smaller value can be determined.) If =0, then AD =0. Thus assume 0. (II) Row reduce to its Hermite echelon form, HA,. (See Definition 1.3.2.) The sequence of reducing matrices need not be saved.
126 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
(III) By noting the position of the non-zero diagonal elements in Hi,, • select the distinguished columns from and call them (This is a basis for R(Ak).)
(IV) Form the matrix I —
and save its non-zero columns. Call (This is a basis for N(Ak).) (V) Construct the non-singular matrix P = ... I (VI) Compute P'
them v,
I
(VII) Form the product P 'AP. This matrix will be in the form
P 'AP
where C is non-singular and N is nilpotent.
=
(VIII) ComputeC'. (IX) Compute
by forming the product
Example 7.2.2
01
0
0]
Let
12
0]
0
1.
1
[—i
rc-'
—1 —l
We shall find AD by using this algorithm.
(I) Since we don't know what Ind(A) is let p = 3. Then
18001
01,
0
L 0 0 oJ and
(II)
11001 H=I0 0 01. L000J
(III) Thus,, =
81
1
—8
L
is a basis for R(Ak).
OJ
10001 (IV) Now I—HA= 10 0I,sothat L0o1J 1
101
V2=I1 LOJ
101
JolformabasisforN(Ak). L1J
18001 1
LOO1J (VI)
11001 P1=!18 8 01, 8Lo 0 8J
and
THE DRAZIN INVERSE
127
P'AP=
(VII)
(VIII) Since C = 2, C' =
We thus get
11100 00
11001 L000J
AD=PI0 0
(IX)
L000
The next characterization of AD may be useful if one tries to formulate a definition for the Drazin inverse of a linear transformation on an infinite dimensional vector space [17]. For A let C denote the class, X
C=
= XA and XAX = X}.
(Clearly, C is non-empty since OEC.) Define a partial ordering on C by X1 X2 if and only if X1AX2 = X1 = X2AX1.
Theorem 7.2.6
A° is the maximal element of C.
Proof Suppose XeC. Then X = for n = 1,2,... Thus, R(X) c for each n. In particular, R(X) c R(Ak) where k = Ind(A) so that A°AX = X. Furthermore, it is easy to see that c N(X) for N(X). It follows from this that XAA° = X. every n. In particular, Therefore, X AD for every XE C.
•
3.
Basic properties of the Drazin inverse
This section will present basic results about the Drazin inverse.
In Section 2, we saw that AD was not always a (1)-inverse for Though AADA is usually unequal to A, the product AADA = A2AD still plays an important role.
Definition 7.3.1
For
the product
= AADA = A2AD = ADA2
is called the core of A.
Intuitively, the 'core' of A should contain that which is basic to the structure of A. If is removed from A, then not much should remain. The next theorem shows in what sense this is true. Theorem 7.3.1
If
then A — CA =
is a nilpotent matrix of
index k = Ind(A).
Proof The theorem is trivial if Ind(A) =0. Thus assume Ind(A) 1
and notice that (NA)k = (A — AADA)k = (A(I — AAD))k = Ak(I — AAD) =
•
128 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Definition 7.3.2
A
For
CA = (I — AAD)A is CA + NA is called the
—
called the nilpotent part of A. The decomposition A = core-nilpotent decomposition of A.
In terms of the canonical form representation of Theorem 2.1, we have the following. Theorem 7.3.2
If AeC0 "is written as A =
where
P and C are non-singular and N is nilpotent of index k = Ind(A), then
The core-nilpotent decomposition of A is unique in the following sense. A Theorem 7.3.3 For A=X+Y where XY = YX =0. Ind(X) I, and Y is nilpotent of index k = Ind(A). Moreover, this unique decomposition is given by A = CA + NA.
Proof Let X, Y be as described in Theorem 3. If md (X) =0, then V =0 and A is invertible. Suppose then that Ind(X) = 1. Let P. C be invertible
'Then V =
matrices so that X =
since
XY = YX =0 and C is invertible. Thus Y2 is nilpotent with Ind(Y2) = Ind(A) since Y is. But A =X + Y = Y
so
that X =
= NA by Theorem 2. •
Corollary 7.3.1 = NA, , and
If A e C' xn and if p is a positive integer, then If p Ind(A), then + = CA, + NA, =
= CA., =
The next lemma summarizes some of the basic relationships between
A,CA,NA,and AD. Lemma 7.3.1
For
'
the following statements are true.
I
I
iflnd(A)=0
(ii) NACA=CANA=O. (iii) NAAD = ADNA = 0. (iv) CAAAD = AADCA = CA. (v)
(vi) A =
CA
(vii)
if and only if Ind(A) 1. = AD.
(viii) AD = (ix) (AD)* = (A*)D.
(In the case of a general field, (*) is taken to mean transpose)
THE DRAZIN INVERSE
129
There are cases when the Drazin inverse coincides with the Moore—
Penrose inverse.
Theorem 7.3.4
For
X
A° = At (land only if A is an EP matrix,
(See Chapter 7for a discussion of EP matrices.)
Proof If A is EP, then AAt = AtA. Since At is always a(l,2)-inverse for A, it follows that At = A' = AD. Conversely, if At = AD, then AAt = AAD = ADA = AtA so that A must be EP. • 4.
Spectral properties of the Drazin inverse
In what follows, o() will always denote the spectrum, that is the set of eigenvalues. For a non-singular matrix A, it is easily proven that AEO(A) if and only if A 'ea(A 1). Furthermore, x is an eigenvector for A corresponding to A if and only if x is an eigenvector for A corresponding
toA'.
Recall the definition of a generalized eigenvector.
Definition 7.4.1
If A€C"
and x is a non-zero vector such that there =0 and is a positive integer p and a scala! AEO(A)for which (A — (A — Al)"- 1x 0, then x is called a generalized eigenvector for A of grade p.
An eigenvector of grade one is, of course, just an eigenvector. For a non-singular matrix A, it is well known that x is a generalized eigenvector for A of grade p corresponding to Aeo'(A) if and only if x is a generalized eigenvector for A' of grade p corresponding to A'Ea(A 1)• The next theorem shows that the same situation holds for Drazin inverses of singular matrices.
Theorem 7.4.1
For
such that Ind(A)= k, Aeo(A) (land only if
x is a generalized eigenvector for A of grade p corresponding to Aeo(A), A #0, (land only (lx is a generalized eigenvector for AD of grade p corresponding to A' Furthermore, x is a generalized eigenvector for A corresponding to A =0 (land only (fXE N(Ak) = N(AD).
Proof If Ind(A) =0 we are done. Suppose that A =
x=P
.
L"2J
Then x is a generalized eigenvector for A of grade p for A
0
if and only if u2 =0 and u, is a generalized eigenvector of grade p for C. Since C is invertible and A =
case is obvious. • Corollary 7.4.2
L
0
0 JI"
we
are done. The A =0
Let be such that Ind(A) = k. If x is a generalized eigenvector for A corresponding to A #0, then XGR(Ak).
130 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
5.
A° as a polynomial in A
If A is a non-singular matrix, then it is easy to show that A1 can be expressed as a polynomial in A. This property does not carry over to the (i,j, k)-inverses. In particular, if A is square, then there may not exist a polynomial p(x) such that At = p(A). However, the Drazin inverse of A is always expressible as a polynomial in A.
Theorem 7.5.1
If
then there exists a polynomial p(x) such that
AD = p(A).
Proof
where
Use Theorem 2.1 and write A as A =
P
and C are non-singular and N is nilpotent of index k = Ind(A). Since c is non-singular, we know that there exists a polynomial q(x) such that '.Then C' = q(C). Let p(x) be the polynomial defined by p(x) = q(N)]k
[C_l
=AD.
The polynomial constructed in the proof of Theorem 1 is generally of
much higher degree then is actually necessary. The next theorem shows how one might actually construct a polynomial p(x) such that p(A) = AD. Unlike Theorem 1 it uses the fact that A is a matrix over C.
Theorem 7.5.2 [77] Let distinct eigenvalues of A and
of
Suppose that {A0,Z, ,22, ... ,t,} are the =0. Let denote the algebraic multiplicity
letm=n-m0=m1+m2+...+m1. Let p(x) be the polynomial of
... degreen— coefficients are the unique solutions of the following m x m system of linear equations. denotes the ith derivative with respect to x.)
= = —
2'"'
fori=1,2,...,t, ('"'-'pci
Then p(A) = AD.
Proof
Since
A is similar to a Jordan form. Write
THE DRAZIN INVERSE
131
where J and N are the block diagonal matrices,
A=
J = Diag[B1, ... ,Bj, N = Diag[F1,
,F9]. Each is an elementary Jordan block corresponding to a non-zero eigenvalue. That is, each B, is of the form A,
I
o
o
0
0
A,
1
0
0
o o
0 0
0,
A,
0 0
0
...
A,
(1)
1
O...O
A,
and s m1. Each F, is an elementary Jordan block corresponding to a zero eigenvalue. That is, each is of the form (1) with A, =0. Clearly, J is non-singular and Ne CTM0 "o is nilpotent of index k = Ind(A) in0. Therefore, AD
]T- 1• Now, p(A) =
=
0]T 1, because NN0 = 0 implies p(N) = 0. Since p(J) = =
Diag[p(B1),... ,p(Bh)], it suffices to show that using (I), it is not difficult to verify that p(A,)
p'(A,)
p"(21)
p(S_ 1'(A)
1!
2!
(s—i)!
B' for eachj. But
p'(A,)
0
I!
p(B1) = p"(A,)
2! p'(A,) 1!
0
0 1
—1
0
0
SxS
(1Y'l
I
A,
A
—1
.4..
= B;1. •
—1 "1
0 Thus, p(A) =
0
•
0...0
1
A,
132 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Theorem 2 can sometimes be useful in computing AD. This is particularly true if in0 is large with respect to n. The following is an example where
Theorem 2 can be used quite effectively.
Example 7.5.1
Let A
We shall use
.
=
[
=
=
—
Theorem 2 to compute The first (and, in general, the most difficult) step is to compute the eigenvalues for A. They are a(A) = {O, 0, 1, 1 }. Thus, in0 =2 and m1 =2 so that Theorem 2 implies that AD can be expressed as AD = A2(cx0I + oi1A) since p(x) = Now and + are the solutions of the system:
ç
1
—1 =p'(l)=2ix0+3a1 = — 3, and
Therefore,
AD=A2(41_3A)= For each A E
X
there are two polynomials of special importance.
These are the characteristic polynomial and the minimal polynomial. Let us examine each one. Consider first the minimal polynomial for A,
It is not difficult to show that A is non-singular if and only if; #0;
in which case, A' =
—
+ CLd....
+ ... + a2A + x11). Now,
assume A is singular so that ç =0. Let i be the smallest number such that = ..• This number, I, is sometimes called the index of the zero eigenvalue. The next theorem (valid in a general field) shows that the index of the zero eigenvalue of A is the same as the index of A.
Theorem 7.5.3
+ ... +cçx', with
IfA€C
#0, is the minimal polynomial for A, then I = Ind(A).
Proof Use Theorem 2.1 and write A as A =
0]P_1 where C is
non-singular and N is nilpotent of index k = md (A). Since ni(A) =0, we can conclude that 0= m(N) = Nd + + ... + ; + +1 + ;N1
+ ...
+cz1J)
THE DRAZIN INVERSE
is
133
invertible we have N' =0. Hence I k. Suppose that k < i. Then, ADAi=Ai_l.
(2)
Write m(x) = x1q(x) so that 0= rn(A) = A1q(A). Multiply both sides of this 'q(A). Thus, the polynomial r(x) = by AD and use (2) to obtain 0= such that r(A) =0 and deg[r(x)] <deg[rn(x)]. This is a x' 1q(x) is contradiction. Therefore, we can conclude that k = i.
•
Corollary 7.5.1
Let k = Ind(A), and m0 denote the algebraic multiplicity of the zero eigenvalue. It is always the case that m0 k.
Proof The minimal polynomial, from Theorem 5.3, is m(x) = xd_k + ,Xd ,x + ;). (ç 0), and m(x) must divide p(x). + + 2d-
•
1
X
When one uses Theorem 2 to compute the Drazin inverse of A E
it is necessary to compute each eigenvalue of A along with the multiplicities of each eigenvalue. Many times one can compute the coefficients of the characteristic polynomial for A easier than the eigenvalues. The following theorem shows how to obtain AD from the characteristic polynomial for A. and let k = Ind(A). Write the characteristic
Theorem 7.5.4 Let equation for A as 0= foq(x), 0). Let
+ fl,,
,x"'
+
,x +
=
if m0
+ ... +
+
—
+
(3)
ifm0=n
0,
Then, AD = A' [r(A)
+
'for each integer I
k.
=0, and the result is trivial. Proof If m0 = n, then A is nilpotent and to Thus assume m0 < n. Multiply both sides of 0= Amoq(A) by By obtain 0= ADq(A). From this, it easily follows that AD = raising both sides to the (I + 1)th power, we obtain (AD)s ii = AAD[r(Afl
Multiplication on both sides of this by A' yields AD = A'[r(A)]"'. • Since the index of a matrix can never exceed its size, nor the number rn of Corollary 1, we have the following.
Corollary 7.5.2
For
=
=
where
r(x) is the polynomial in Theorem 4.
For
the coefficients in the characteristic equation
for A can be computed recursively by the well known algorithm [43] (Tr denotes trace). (4)
134 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
where (5)
=
1A1' + ...
the next result showing how this algorithm may be used to obtain the matrix r(A) is immediate. Since
+
Theorem 7.5.5 AD =0.
Let
and let r(x) be as in (3).
If n > m0, then r(A) =
and
where
—
computed from (4) and (5). Thus AD
=
If n = rn,, then are
for each I Ind(A).
—
NoticethatifS,=0,thenO=S+j =S..2= ...
and obtain r(A), to P0 = 0. Thus, it is easy to use Theorem 5 = and AD as follows. To compute
Algorithm 7.5.1
for
(I) Set S0 = I and recursively compute
=
+
=
until some S1 =0, but S1_1 #0.
—
3
(II) Let u
=
that number such that and = fin-u-2 = =0. (Notice that n — u = m0, the algebraic multiplicity of the
be
zero eigenvalue.)
(III) Let I = n
as AD =
(IV) Compute
=
compute
— u and
—
Note that not all of the computed Si's must be saved. If fi,, - #0, then 2can be forgotten. However needs to be saved until next
non-zero fi appears. Also notice that this algorithm produces the value of the algebraic multiplicity of the zero elgenvalue for A. Example 7.5.2
Let —8 —10
6
—3']
8
—4)
i
—1
1
L—2
2
—2
[io
A—' 12 I
oJ' 2]
We shall use Algorithm I to compute AD. (I) Successive calculations give
S0=I,
r I I
12 1
L—2
—8 —13
6 —31 8
—1 —2
—41
01'
2—2—1J
THE DRAZIN INVERSE 12
—10
4
—4
4 —4
4
r—14 AS1
= L
—12
12
—10
S2=AS1+21=[4
4
2
—
=2,
P0 = —
= 0,
=
1
135
—4J 5
4 —4 4 —2 0
0 S3
=
AS3 =
0000
0000
and
S4=AS3= 0. Therefore t =4 in this example and the algebraic multiplicity of the zero eigenvalue is in0 = 2.
(II) Setu=2. (III) Set 1=2. (IV) Compute AD =
=
—
as follows. Since S2 = AS1 +
—
= = + Write this as we have that $2(AS1)]S1. Since AS2 and AS1 have already been computed, only two matrix multiplications are necessary. This is more efficient than forming directly. In general, one can always do something like the product this when this algorithm is used. Now $21,
—12
12
24 = —24 — 12 12
00
6
—12 —24
12
12
6
—
00
16
—12
8
—4
—8
8
—8
8
[(AS2)S1—2(AS1)]=
—
136 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
and
12
—8
—16
16
r—16
L
16
—16]
Therefore, 2 —
8
I
'—I—i L—2
6.
1
—1
1
2
—2
2
A° as a limit
It was previously shown how the Moore— Penrose inverse could be expressed by means of a limiting process. In this section, we will show how the Drazin inverse and the index of a square matrix can also be characterized in terms of a limiting process. Whenever we consider the limit, as z 0, of an expression involving (A + zI)1 we shall assume that —
If
Definition 7.6.1
and '1 CA and NA denote the core and
nhlpotent part of A, respectively, then for integers m — 1, we define (AD,
,fm=O and
ifm1 (0
,
ifm=—1 EfmO,
ifm=—I
(0,
ifm=O.
afm1
(.Nr,
(.
Theorem 7.6.1 z-'O
and let Ind(A) = k. For every integer 1 k,
Let Ae
+zI)'A'.
(1)
For every non-negative integer I, AD
Proof Ic
= urn (A"' + zI)
(2)
If k =0, then A is non-singular, and the result is evident. For
>0, use Theorem 3.1 and write
(A"' + zI)
r(cs+1
I —'C'
I
01
-:
=
+ zI) 'C' = C- ', (2) is proven. A' and (1) follows. •
Since C is non-singular, and Jim (C"'
If l k,
z-O
then C? = Since it is always true that k n, we
also have the following
THE DRAZIN INVERSE
Corollary 7.7.1
AD
For
The index of
= urn
137
+ zI) 'An.
also be characterized in terms of a limit. Before doing this, we need some preliminary results. The first is an obvious consequence of Theorem 3.1. can
Xli
Lemma 7.6.1 Let Ind(A") = I and only
a singular matrix. For a positive integer p, p Ind(A). Equivalently, the smallest positive integer Ifor which Ind(A1) = 1 is the index of A. be
Lemma 7.6.2 Let be a nilpotent matrix such that Ind(N) = k. For non-negative integers m and p. the limit (3)
z-0 exists
and only
+ p k. When the limit exists, its value is given by
lit4
=
urn zm(N + zI)
ifm>O ,fm=O
0
z—0
(4)
Proof If N = 0, then, from Lemma 2.1, we know that k =
1.
The limit
under consideration reduces to
ifp=O
ifpl.
0,
z—0
It is evident this limit will exist if and only if either p 1 or m 1, which is equivalent to m + p 1. Thus the result is established for k = 1. Assume k—i
N1
nowthatk1,i.e.N#O.Since(N+zIY1=
Z
1=0
+(_lr_2zNm+P_2 ÷
+(—
+
(—
z
+ (5)
.
If m + p k, then clearly the limit (3) exists. Conversely, if the limit (3) exists, then it can be seen from (5) that
Theorem 7.6.2
For
= 0 and hence m + p k. U
where Ind(A) = k and for non-negative
integers m and p. the limit (6) urn z"(A + zI) z-0 exists tf and only !fm + p k: in which case the value of the limit is given by
limf'(A + zI)'A" =
{( — Ir
(I_AAD)Am4Pi,
(7)
138 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Proof If k =0, the result is immediate. Assume k 1 and use Theorem 2.1 where P and C are non-singular and N is
to write A = nilpotent of index k. Then
zm(A + zI) 'A"=
(8)
Because C is non-singular, we always have
limz'"(C+zI)
1
,
z-.O
ifm>O ifm=O
(9)
Thus the limit (6) exists if and only if the limit lim z"(N + zI) exists, :-. 0 which, by Lemma 2, is equivalent to saying m + p k. The expression (7)
is obtained from (8) by using (9) and (4). • There are some important corollaries to the above theorem. The first characterizes Ind(A) in terms of a limit. Corollary 7.6.2
For AeC"
the following statements are equivalent
(i) Ind(A) = k. (ii) k is the smallest non-negative integer such that the limit lim (A + zI) tAk exists. z-0 (iii) k is the smallest non-negative integer such that the limit urn z*(A +
exists.
(iv) If Ind(A) = k, then lim (A + zI) iAk = (AAD)Akl z-0 (v) And when k> 0, lim zk(A + zI)' = (—
=
1)
1)
z-0
Corollary 7.6.3 For lim (A + zI) '(A' + z'I) = z-'O
Corollary 7.6.4
A''.
For
and for every integer l Ind(A) >0,
the following statements are equivalent.
(1) Ind(A)l. (ii) lim(A+zI)1A=AA'. z-0
=I—AA'.
(iii) Jim
z-0 The index can also be characterized in terms of the limit (1).
Theorem 7.6.3
For
X
the smallest non-negative integer, 1, such
that
lim (A'11 +zI)'A' exists is the index of A.
(10)
THE DRAZIN INVERSE
139
Proof If Ind(A) = 0, then the existence of(10) is obvious. So suppose Ind(A) = k 1. Using Theorem 2.1 we get
zI)
— —
0
L
1
The term (C'4 ' + zI) IC: has a limit for alt 1 0 since C is invertible.
which has a limit if and only if N' =0. That is, 1 Ind(A). U
The Drazin inverse of a partitioned matrix
7.
This section will investigate the Drazin inverse of matrices partitioned as
M
fl where A and C are always assumed to be square.
=
Unfortunately, at the present time there is no known representation for MD with A, B, C, D arbitrary. However, we can say something if either D =0 or B =0. In the following theorem, we assume D =0.
If M
Theorem 7.7.1
where A and Care square,
= k = Ind(A), and 1= Ind(C), then MD =
X = (AU)2[
(AD)IBCi](I
—
[AD
CD] where
CCD)
jzO
+ (I —
—
= (AD)2[
—
ADBCD
CCD)
IwO
rk-* 1 + (I — AAD)I Z AB(CDY l(CD)2 — ADBCD. Ls.O J (We define 00
(2)
I)
Proof Expand the term AX as follows. i—i
lCD
'BC' — 0
o
-
k—i
s—i
AX = k—I
2 —
0
140 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=
/
\ / ADBCCD
1-2
I ADB +
(AD)1. 2BC11' 1—I
0
++
+ /k—I
+1
iB(CDYi +
— (
k-i
+E
0
1k-i
= ADB +
1-2
)—
1—2
1—2
(AD)i+2BCI.I.l 0 —
—
ADBCCD — 0
k-i
ADA1+ iB(CD)i+l
+
—
AADBC.
Now expand the term XC as follows. (AD)I.I.2BCI.41
XC =
k—i
I—i
1—i
o
(AD)i+2BC1+2CD +
—
0
o k— I
—
—
ADBCDC
0
/1—2
'+' +(AD)l1BCI
=1
'BC') + (BcD +
+
1B(CDY11)
+ It
—
\
/1—2
1—f
\o
I
k—i
AIB(CDY+1)
—
(ADABCD
ADBCDC.
ADX AB cD]=[o CD][o ci'
AB ADX
Fromthisitfollowsthat[0
so that condition (3) of Definition 2.3 is satisfied. To show that condition (2)
holds, note that
FAD X irA B1IAD X] [0 CDJ[o cj[o CDf[0
I
ADAX + XCCD + ADBCD CD
:
Thus, it is only necessary to show that ADAX + XCCD + ADBCD = X.
However, this is immediate from (2). Thus condition (2) of Definition 2.3 is satisfied. Finally, we will show that
IA
[o cJ
[0
X 1
IA
cDf[o c] IA
.]
(3)
IA' S(p)1 =Lo c'
THE DRAZIN INVERSE
141
p—i
'BC'. Thus, since n + 2> k and n + 2>1,
whereS(p)= 1=0
IA
]
X
[0 C]
An42X+S(fl+2)CD
c°][o
[o
Therefore, it is only necessary to show that 2X + S(n + 2)CD S(n + 1). Observe first that since 1+ k < n + 1, it must be the case that AhI(AD)i
=
for I = 1, 2, ...
=
(AD)IBCI 1(1
(4)
,1 — 1.
Thus, 1 Li=o
ri-i
= L 1—0
—
CCD)
1BCD
—
(5)
J 1
An-IBCI](I — CCD) —
I—i
I—i
1=0
10
IBCD
= =
Now, S(n +
IBCICD
=
IBCICD 1=0
1—0 11+ i
+ 1 1
1
- IBCICD =
By writing
1BCD
.IBCICD
+
1—0
=A11BCD+
1=1 l—1
1=0
we
obtain 1_i S(n
+
2)CD
=
1BCD
lCD +
+ 1=0
(6) 1=1+ 1
It is now easily seen from (5) and (6) that 1—1
A"2X +S(n + 2)CD_ Z
+1—1+1 Z
i—I
n
1=0
1=1
=
=
A"'BC' = S(n +
1),
i—0
which is the desired result. U By taking transposes we also have the following.
142 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
If
Corollary 7.7.1
L
where
A andC are square
= with Ind(A) = k and Ind(C)
1, then LD is given by LD
AD] where
=
X is the matrix given in (2).
There are many cases when one deals with a matrix in which two blocks are zero matrices. Corollary 7.7.2
Let
lAB] Mi=[0
lAo]
1001
A
A is square and each
is square. Then,
rAD
rAD 0
—L0
lOB
]'
2—
til "IMD_ 3 — (AD)2B AD 0]'
and B(AD)2 AD
Each of these cases follows directly from Theorem 1 and Corollary 1.
The next result shows how lnd(M) is related to Ind(A) and Ind(C). Theorem 7.7.2
If M
with A, C square, then
= Max{Ind(A), Ind(C)} Ind(M) lnd(A) + Ind(C).
[
Proof By using (iii) of Corollary 6.2 we know that if Ind(M) =
m,
then
the limit
limf'(M+zI)'
(7)
z-0
exists. Since
z"(M + 1)'—
+ 0
f'(C+zI)'
J'
(8)
one can see that the existence of the limit (7) implies that the limits
lim z"(A + 21)_i and urn f'(C + zI)' exist. From Corollary 6.2 we can z—'O
conclude that Ind(A) m = Ind(M) and Ind(C) m = Ind(M), which
establishes the first inequality of the theorem. On the other hand, if Ind(A) = k and Ind(C) =1, then by Theorem 6.2 the limits lim zkft(A + zI) lim + zI)' and lim 2k '(A + zI)'B(C + ZI)_1 = z—O z—O z—'O lim [z"(A + zI) I] B[z(C + zI)'J each exist. z—O
THE DRAZIN IN VERSE 143
+ zI)
Thus urn
exists and lnd(M) k +
I.
U
In the case when either A or C is non-singular, the previous theorem reduces to the following.
Corollary 7.7.3
Let AECPXr,
C
non-singular, (Ind(C) = 0), then lnd(M) = Ind(A). Likewise, if A is nonsingular, then Ind(M) = Ind(C).
The case in which Ind(M) 1 is of particular interest and will find applications in the next chapter. The next theorem characterizes these matrices. Theorem 7.7.3
If
and M
then Ind(M)
1
=
ifand only if each of the following conditions is true: Ind(A) 1, Ind(C) 1,
(9)
and
(I—AA')B(I—CC')=O.
(10)
Furthermore, when M' exists, it is given by
c'
Lo
(11)
Proof Suppose first that Ind(M) 1. Then from Theorem 2, it follows that Ind(A) 1, Ind(C) I and (9) holds. Since Ind(M) 1, we know that MD = M' is a (1)-inverse for M. Also, Theorem 5.1 guarantees that M' is
a polynomial in M so that M' must be an upper block triangular (1)-inverse for M. Theorem 6.3.8 now implies that (10) must hold. Conversely, suppose that (9) and (10) hold. Then (9) implies that AD = A' and CD = C'. Since A' and C' are (1)-inverses for A and C, (10) along with Theorem 6.3.8 implies that there exists an upper block triangular (1)-inverse for M. Theorem 6.3.9 implies that rank(M) = rank(A) + rank(C). Similarly, rank(M2) = rank(A2) + rank(C2) = rank(A) + rank(C) = rank(M)
so that Ind(M) 1. The explicit form forM' given in (11) is a direct consequence of (2).
U
Corollary 7.7.4
If Ind(A) N(C) c N(B), then Ind(M)
1, 1
Ind(C) 1, and either R(B)
R(A) or
where M is as in Theorem 3.
Proof R(B) c R(A) implies that (I — AA' )B =0 and N(C) c N(B)
impliesthatB(I—CC')=O. •
It is possible to generalize Theorem 3 to block triangular matrices of a
general index.
144 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Theorem 7.7.4
Let
both be singular
non-singular, then Corollary 3 applies) and let M
A"'
positive integer p, let S(p) =
either is
For each
=[
with the convention that 00
= I.
i— 0
Then, Ind(M)
m if and only
each of the following conditions are true:
Ind(A)m, Ind(C)m,and
(12) (13)
(I — AAD)S(m)(I
—
CCD) =0.
(14)
Proof Notice that for positive integers p.
M"
(15)
= [s—— Assume first that Ind(M) m. Then Ind(MM) = 1. From Theorem 3 and the singularity of A, C we can conclude that Ind(AM) = 1, Ind(C'") = 1, and (I — Am(Am)D)S(m)(I
(16)
=0.
—
(17)
Then (12), (13) hold by (16). Clearly, (17) reduces to (14). Conversely, suppose (12)—(14) hold. Then (16) and (17) hold. Theorem 3 now implies
that Ind(Mm) = 1. Therefore, Ind(M) 1. • SeCNXS, such that rank(R) = n and Lemma 7.7.1 For rank(S) = n, it is true that rank(RTS) = rank(T).
Proof Note that RtR = and = Thus rank(RTS) rank(T) rank(RtRTSSt) rank(RTS). We now consider the Drazin inverse of a non-triangular partitioned matrix.
•
Theorem 7.7.5
Let
and M
If rànk(M) = rank(A)= r,
= then Ind(M) = Ind[A(I — QP)] +1= Ind[(I — QP)A] +1, where
P=CA' andQ=A'B.
Proof From Lemma 3.3.3, we have that D = CA 'B so that M=
Q]. Thus, for every positive integer i;
+
QJ
+ QP)AJ'1[I
Q]. (18)
THE DRAZIN INVERSE
Since
[i,]
A has full column rank and [I
Q]
145
has full row rank, we can
')= rank([A(I + QP)]"). Therefore, rank([A(I + QP)]")= rank([A(I + QP)] 1) if and only if ')= rank(M). Hence Ind([A(I + QP)]) + 1 = Ind(M). In a similar manner, one can show that Ind[(I + QP)A] + 1 = Ind(M). • conclude from Lemma 1 that
An immediate corollary is as follows.
Corollary 7.7.5 For the situation of Theorem 5, Ind(M) = 1 and only tf (I + QP) is non-singular. The results we are developing now are not only useful in computing the index but also in computing AD. Theorem 7.7.6 rank(A) = r, then
1]A[(sA)2riI!A_IB]
=
(19)
Proof Let R denote the matrix R
1][
=
A 'B],
and let m = Ind(M). By using (18), we obtain l[(44S)2]DA[I
'R
By Theorem 5 we know Ind(AS) = m —
1
A 'B].
so that
'[(4%S)2J° =
(ASr-'. Thus, it now follows that
'A[I
'R
A 1B] = MM.
The facts that MR = RM and RMR are easily verified and we have that R satisfies the algebraic definition for MD. The second equality of
(19)is similarly verified. U The case when Ind(M) = 1 is of particular interest.
Theorem 7.7.7
Let
If rank(M)=
and M
=
S'
exists where S = I + rank(A) = r, then Ind(M) = 1 if and only A 1BCA '.When Ind(M) =1, M' is given by
M'
146 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
+BC)'A(A2 + BC)'[A!BI. This result follows directly from Corollary 5 and Theorem 6. Before showing how some of these results concerning partitioned matrices can be used to obtain the Drazin inverse of a general matrix we need one more Lemma. Consider a matrix of the form M
Lemma 7.7.2 AeCPXF.
] where
=
Ind(A) Ind(M) Ind(A)+ 1.
(20)
Furthermore, suppose rank(M) = r. Then Ind(M) = non-singular.
1
and only
A is
Proof (20) follows from Theorem 2. To prove the second statement of the lemma, first note that md (A) =0 implies md (M) = md (0) = 1 by Corollary 3. Conversely, if rank (M) = r and md (M) = 1, then r = rank(M2) or equivalently, r = rank ([A2, AB]) rank (A) r. Thus rank (A) = r so
that A-' exists. • The next theorem can be used to compute the group inverse in case
Ind(M)=' 1. Theorem 7.7.8
Let
where R is a non-singular matrix
suchthatRM_[
]whereUeC". Then MD — R
LO
—
If Ind(M) =1, then M' =
0
R
R
Proof
Since Ind(M) = Ind(RMR '), we know from Lemma 2 that Ind(M) = 1 if and only if U' exists. The desired result now follows from
Corollary 2. • Example 7.7.1
1='Ii
12 Li
2
Let 1
4
2
0
0
We shall calculate M' using Theorem 7. Row reduce [M I] to
R]
THE DRAZIN INVERSE
147
where EM is in row echelon form. (R is not unique.) Then,
Ii 0 01 EM=I0
0
1
ii
0 0
1
LO 0 0]
[—2
[1
—fl, 0]
1
2
0
andRM=E%I.NowR1=12 4
1
[1
0
0
120 [KV sothatRMR'=
00:0 00
Clearly, U' exists. This imphes that Ind(M) = I and rig—' ' ii—2v1
[0
—
0
J
R—
—8
4
2
—10 —5
21
From Lemma 6.1, we know that lip Ind(M), then Ind(M") =
1. Thus, = the above method could be applied to to obtain For a general matrix M with p Ind(M), MD is then given by MD = = Another way Theorem 7 can be used to obtain M of index greater than 1 is described below. Suppose p Ind(M). Then
Ind((RMR ')')= Ind(RM"R ')= Ind(M")= 1. Thus if RM
then RMR'
S
=
T
0]and(RMR'Y'=
].
It follows from (20) that Ind(SP) 1. Therefore one can use Theorem 7 to (This is an advantage because
fmd MD
=R
=
r(SP).sP—
'L
is a smaller size matrix.) Then, 1
1
(SP)#SP.. 2T1
]R.
Finally, note that the singular value decomposition can be used in conjunction with Theorem 7. (See Chapter 12).
8.
Other properties
It is possible to express the Drazin inverse in terms of any (1)-inverse. Theorem 7.8.1
If
is such that Ind(A) = k, then for each integer
148 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
I k, and AD =
any (1)-inverse of l)tAI
X= AD
01
= P[0
rc-2:-i
AD
=
In particular,
',P and C non-singular and N nilpotent.
Proof Let A = Then
;
'.If X is a (1)-inverse, then it is easy to see
1
where X1, X2, X3 are arbitrary. That
L"i
= A'XA' is easily verified by multiplying the block matrices together. U In Theorem 1.5, the Moore—Penrose inverse was expressed by using the full rank factorization of a matrix. The Drazin inverse can also be obtained by using the full rank factorization.
Theorem 7.8.2 Suppose A e and perform a sequence offull rank factorizations: A = B,C1,C1B, = B2C2,C2B2 = B3C3,... so that isafull rank factorization of C,_ ,B1_ 1 ,for i = 2,3 Eventually, there will be a pair offactors, Bk and Ck, such that either (CkBk) 1 exists or CkBk =0. If k denotes the first integer for which this occurs, then
fk when (CkBk)' exist 'lk+l whenCkBk=O.
I nd(A'
—
When CkBk is non-singular, rank(Ak) = number of columns of Bk = number of rows of
and
R(Ak)= R(B, B2...
Bk), N(Ak)= N(CkCk...l ... C1).
Moreover, AD
— 5B1 B2 ... 0
C1
when
exists
when CkBk=O.
Proof IfC1B1isp xpand has rankq
=B1 B2.. Bk(CkBk)C*Ck_l is p x r and Ck is r x p, then
rank(BkCk) = r. Since CkB, is r x r and non-singular, it follows that so that Lemma 7.1 guarantees that rank(CkBk) = r = Since k is the smallest ')= rank(CkBk) = rank(BkCk) =
THE DRAZIN INVERSE
149
integer for which this holds, it must be the case that Ind(A) = k. The fact that rank(Ak) = number of columns of Bk = number of rows of Ck is clear.
By using the fact that the B's and C's are full rank factors, it is not difficult to see that R(Ak) = R(B1 B2 Bk) and N(Ak) = N(CkC&_I C1). If =0, then it is clear that A must be nilpotent of index k + 1. To prove the formula given for AD is valid, one simply verifies that the three conditions of the algebraic definition are satisfied. This is straightforward
and is left as an exercise. • There are several methods for performing full rank factorizat ions. One is the elimination scheme described in Algorithm 1.2. The others depend on orthogonalization techniques such as the modified Gram—Schmidt algorithm. Needless to say, the method chosen to perform the factorizations at each step can influence the final result.
Corollary 7.8.1
If
'B(CB)2C = 0, then AD = = BC is afull rank factorization. A
'where
Corollary 7.8.2 If is such that Ind(A) = A = B(CB) - 2C. rank factorization for A, then Theorem 7.8.3
If
and A = BC is afull
is such that rank(A) = 1, then
[Tr(A)12A when Tr(A)
Proof If rank(A) =
1,
0 and AD =0 when Tr(A) =
= A# =
0.
then A can be written as A = cd* where and Tr(cd*) = Tr(d*c) cd*cd* deCo. Now, Tr(A) = = Tr(A)A. = dc. Thus A2 = If Tr(A) 0, then R(A2) = R(A) so that Ind(A) 1. The fact that 1,
A# = [Tr(A)]2A can now be deduced from Corollary 9.2, or else one can
verify by direct computation the requirements of Definition 2.4. U In general, the reverse order law does not hold for the Drazin inverse. That is, (AB)D # BDAD. In the case of the Moore—Penrose inverse, we saw that very strong conditions had to be placed on A and B in order to guarantee that (AB)t = B'At. Even the commutativity of A and B is not strong enough to guarantee that (AB)t = BtAt. However, commutativity of A and B is enough to guarantee that (AB)D = BDAD.
Theorem 7.8.4
If A,
are such that AB = BA, then
= BDAD = ADBD,
(i)
(ii) ADB = BAD and ABD = BDA.
In general,
= A [(BA)2]DB (iii) even tfAB#BA. Proof
Assume first that AB = BA. It follows from Theorem 6.1 that
150 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
and (ii) now are is a polynomial in easily proven. Assume now that A and B do not necessarily commute. To prove (iii), let Y = A[(BA)2]DB. Clearly YABY = Y and ABY = A
YAB = A(BA)DB. Let k =
Ind(BA)}. Then (ABr2Y =
(ABr2A(BA)2"B = (ABr 1ABA(BA)2°B = (ABr 1A(BA)DB = 1• A(BA)k+ l(BA)DB = A(BA)kB = Therefore, by Theorem 2.2
Y=(AB)D. •
Corollary 7.8.3 Let A, be such that AB = BA. Then Ind(AB) max Ind(B) }. Given a solution to just one of the three defining conditions, Ak+ 'X = Ak, from it. one can construct
Theorem 7.8.5
Let be a matrix such that AD for somel Ind(A)= k. Then
Proof If
Let
and
'B = A', then
B=P[B 1
Thus
=
=
:1
= AD.
8
Applications of the Drazin inverse to the theory of finite Markov chains
1.
Introduction and terminology
Let {X, : teF
R} be an indexed set of random variables. If P is a
probability measure such that
whenever t1
152 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
An rn-state Markov chain is said to be ergodic lithe transition matrix of the chain is irreducible, or equivalently, its states form a single ergodic set. An ergodic chain is said to be regular if the transition matrix T has the 0. (For property that there exists a positive integer p such that
XeRhx,X >0 means each entry of X is positive.)
If an ergodic chain is not regular, then it is said to be cyclic. It can be shown that if an ergodic chain is cyclic, then each state can only be entered at periodic intervals. A state is said to be absorbing if once it is entered, it can never be left. A chain is said to be an absorbing chain if it has at least one absorbing state and from every state it is possible to reach an absorbing state (but not necessarily in one step). The theory of finite Markov chains provides one of the most beautiful and elegant applications of the theory of matrices. The classical theory of Markov chains did not include concepts relating to generalized inversion of matrices. In this chapter it will be demonstrated how the theory of generalized inverses can be used to unify the theory of finite Markov chains. It is the Drazin inverse rather than any of the (i,j, k)-inverses which must be used. Some types of (1)-inverses, including the Moore—Penrose inverse, can be 'forced' into the theory because of their equation solving abilities. However, they lead to cumbersome expressions which do little to enhance or unify the theory and provide no practical or computational advantage. Throughout this chapter it is assumed that the reader is familiar with the classical theory as it is presented in the text by Kemeny and Snell [46]. All matrices used in this chapter are assumed to have only real entries so that (.)* should be taken to mean transpose.
2.
Introduction of the Drazin inverse into the theory of finite Markov chains.
For an rn-state chain whose transition matrix is T, we will be primarily concerned with the matrix A = I — T. Virtually everything that one wants to know about a chain can be extracted from A and its Drazin inverse. One of the most important reasons for the usefulness of the Drazin inverse is the fact that Ind(A) = I for every transition matrix T so that the Drazin inverse is also the group inverse. This fact is obtainable from the classical theory of elementary divisors. However, we will present a different proof utilizing the theory of generalized inverses. After the theorem is proven, we will use the notation A in place of in order to emphasize the fact that we are dealing with the group inverse.
Theorem 8.2.1
If TE W" is any transition matrix (i.e. T is a stochastic matrix) and if A = I — T, then Ind(A) = 1 (i.e. exists).
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 153
Proof The proof is in two parts. Part I is for the case when T is irreducible. Part H is for the case when T is reducible. (I) If T is a stochastic matrix and j is a vector of l's, then Tj = j so that = 1,(see page 211), it follows that p(T)= 1. 1EO(T). Since p(T) lIT If T is irreducible, then the Perron — Frobenius Theorem implies that the eigenvalue 1 has algebraic multiplicity equal to one. Thus, Oec(A) with algebraic multiciplicity equal to one. Therefore, Ind(A) = 1, which is exists by Theorem 7.2.4. N equivalent to saying that Before proving Part II of this theorem, we need the following fact. Lemma 8.2.1 If B 0 is irreducible and M 0 Is a non-zero matrix that B + M = S is a transition matrix, then p(B) < 1.
such
Proof Suppose the proposition is false. Then p(B) 1. However, since 1. Thus, p(B) S is stochastic and M 0, it follows that B B 1. Therefore, it must be the case that 1 = p(B) = p(B*). The Perron—Frobenius Theorem implies that there exists a positive eigenvector. v >0, corresponding to the eigenvalue 1 for B. Thus, v = Bv = = j*S*v — j*M*v = (S* — M)v. By using the fact that Sj = j, we obtain j*i, — Therefore, j*M*v =0. However, this is impossible because
•
We are now in a position to give the second part of the proof of Theorem 1. (II) Assume now that the transition matrix is reducible. By a suitable permutation of the states, we can write
T
ix
(—
indicates equality after a suitable permutation
LO ZJ
hasbeenperformed)
where X and Z are square. If either X or Z is reducible, we can perform another permutation so that
ruvw TJ0 CD. Lo
0
E
If either U, C or E, is reducible, then another permutation is performed. Continuing in this manner, we eventually get B11
0
B22
o
O..:B,,,,
is irreducible. If one or more rows of blocks are all zero except where =0 for for the diagonal block (i.e. if there are subscripts I such that
154 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
each
k # I), then perform one last permutation and write T11
T12
O
T23
...
T1, T2,
T1,+2 T2,..2
T1,+1
Tir+i
...
...
T 0
0
o
Each T11 (1= 1,2,... ,n) is irreducible. From Part I of this proof, we know that (I — T11)' exists for every i. However, for i = 1,2, ... ,r, there is at least <1 #0. It follows from Lemma 2.1 that one index k # I such that exists for i= 1,2,...,r. Wecan now for 1= 1,2,...r. Therefore,(I
conclude that there exists a permutation matrix P such that A can be written as A— L
22J
I
where G11 is non-singular and
exists. It now follows from
Theorem 7.7.3 that A must exist and —
I
L
Thus,
I
22
the proof is complete. •
Notice that for every transition matrix T, it is always the case that
j€N(A) = N(A) so that A'j =0. Furthermore, it is always the case that =j. We will frequently use these observations together with the following well-known Lemma, which we state without proof. (I —
Lemma 8.2.2 (I) Every transition matrix T is similar to a matrix of the form where lØo(K).
(II) If T is the transition matrix of an ergodic chain, then k = 1, i.e. T is similar to a matrix of the form
(III) If T is the transition matrix of a regular chain, then lim K =0. U-. We are now in a position to relate the single expression I — AA to
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 155
various types of limiting processes which are frequently encountered in the
theory of finite Markov chains.
Theorem 8.2.2
Let 1 be the transition matrix of an rn-state chain and let A = I — T. Then
hm
for every transition matrix.
lim (xI + (1 — x)Tr
for every transition matrix T and 0< a < 1.
J—AA' = tim Jim
r r
for every regular chain. for every absorbing chain.
Proof For every transition matrix T, we know from Lemma 2 that there exists a non-singular matrix S such that (2)
and I
Therefore, I — K is non-singular and
=s-'[:_.!
and
J_AA#
(3)
Assume first that T is the transition matrix of a regular chain. Then from =0. It is now clear that Lemma 2 we know that k = 1 and Jim I,—
Next, consider I = oci + (1 — x)T, 0<
a<
1.
It is clear that 1
is
a transition
matrix whose eigenvalues are = a + (1 — A€o(T). convex combination of 1 and A, it follows each X different from 1 is inside the unit circle (i.e. I I < I if # 1) so that by considering (2) it is clear that each X is a
= p(aI + (1 — a)K) < 1 and
StI.Jim t' \
/
.
1i
=lim I
LO
0
xl + (1 —
Assume next that T is the transition matrix of an absorbing chain with
156 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
exactly r absorbing states. Then there exists a permutation matrix P such
that
T=
p
and r =
For an absorbing chain, it is well-known that I + Q + Q2 + ... = (I — so that
Q)'
V= Since A has the form
Theorem 7.7.3 yields
A' —
(I—Q)'J
L—(I—QYTR:
Therefore, (4)
Finally, assume that T is any transition matrix and is written in the form (2). Since I — K is non-singular, we may write
I+K+K2+...
(5)
By using (2) and (3) together with (5), it is a simple matter to verify that
=
n
all n, it follows that urn
(I—r)A' n
+ I — AA'. Since hr
= 1 for
(I—r)A' =0, and hence n
a In the case of ergodic chains, the matrix I — AA' has a very special structure.
Theorem 8.2.3
If T is the transition matrix of an rn-state ergodic chain,
then each row of I — AA' is the same vector is the fixed probability (row) vector of T.
Proof Thisfollowsfromthefactthatlim a-.
= [w1,w2, ... ,w,,,J, which
T'=I—AA'. S
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 157
Throughout the rest of this chapter we will let W denote the limiting matrix. I — AA 3.
Regular chains
Theorem 8.3.1
A'=lim
•For every transition matrix T,
n—i
n—i
k
—k
If T is the transition matrix of a regular chain, then the expression reduces
toA'= Proof Write T as in (I) of Lemma 2.2 so that n-I k
E"
S.
k—O
Since I — K is non-singular,
nffl_k&i _(I-.-K)_IKP.EKk1 n
Lnk_O
]•
From the first part of Theorem 2.2, it follows that Inn
I+K+... ÷Ki =0. 1*
Therefore,lim
n-i —k
and hence
urn I
(
)
The second equality in the first part of the theorem follows because I — W = AA Assume now that T is the transition matrix of a regular chain. Write T as in Part (II) of Lemma 2.2. From Part (III) of that Lemma,
= (I — K)
we know that urn K" =0 so that
1•
Therefore,
k=O
=
= A.
The second part of the proof of Theorem 2.3 provides an interpretation for each of the entries of A' for a regular chain.
158 GENERALiZED INVERSES OF LINEAR TRANSFORMATIONS
Theorem 8.3.2
Let I be the transition matrix of a regular chain and let denote the matrix whose (i,j)th entry is the expected number of times the in the first n stages (the initial plus n — 1 stages) when chain is in state Then A' = lim (Ne" — nW). the chain was initially in
Proof The result follows by combining the fact that 1=0
together with Lemma 3.1 since
—
nW) = tim
—
( Ii=0
w))= A'. U
A thus has the following meaning for a when regular chain. For large n, the expected number of times in state where w. is the differs from nw1 by approximately initially in state jth component of the fixed probability vector In loose terms, one + n(I — AA') for large n. Furthermore, for large n, could write since one can compare two starting states in terms of the elements of = — tim — The (i,j)th entry,
For an initial probability vector the jth component, of p*N(n) gives the expected number of times in state .Y in n stages. The following corollary provides a comparison of and for two different initial probability vectors. Corollary 8.3.1
Let T be the transition matrix of a regular chain, let be an initial probability vector, and let w* be the fixed probability vector for T. Then tim — nw) = and for two initial vectors and
=
Tr(A') =
tim
Furthermore,
—
= tim
—
—
paN(n)j),
for every
It
initial probability vector
Proof The first two limits are immediate. The third limit follows fiom the second and the fact that A'j =0 since — 1
=
— k
-.
—
p*A# e1
It
U 4.
Ergodic chains
In this section we will extend our attention to investigate ergodic chains in generaL It will be shown that the matrix A' is the fundamental quantity in the theory of ergodic chains. Virtually everything that one would want to know about an ergodic chain can be determined by computing A'. We will begin by investigating the mean first passage times (i.e. the
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 159
expected number of steps it takes to go from state
to
for the
first time.)
Let M denote the matrix whose (i,j)th entry is the expected number of steps before entering state for the first time after the initial state .9's. M is called the mean first passage matrix. For a square matrix X, the diagonal matrix obtained by setting all off diagonal entries of X equal to zero is denoted by Xd. J,, will denote the matrix of all l's. If the size of J,, is understood from the context, then the subscript m will be omitted.
If T is the transition matrix of an rn-state ergodic chain
Theorem 8.4.1
whose fixed probability vector is passage matrix is given by
= [w1 , w2, ... , w,,,], then the mean first (1)
where D is the diagonal matrix 0
0... 0 0...
0
=[(I—A.A)d]'.
0 WI,,
Proof It is known that the mean first passage matrix is the unique solution of the matrix equation
AX=J—TXd.
(2)
We simply verify that the right hand side of (1) satisfies the equation (2). Let R denote the right hand side of (1) and observe that = D. Now,
AR=A(I—A'
J—TD=J—TR. U Corollary 8.4.1
Consider an ergodic chain whose fixed probability rector is = [w1 , w2, ... , wJ. If the chain starts in state .9'., then the expected number of steps taken before returning to state S" for the first time is given by M11 = 1/(1
—
For an initial probability vector the kth component, (p*M)k, of p*M is the expected number of steps before entering state br the first time. Consider the case of a regular chain that has gone through n steps before it is observed. If n is sufficiently large, the initial probability vector may be taken to be p*(O) = In this case, p*(t) = When this t =0, 1,2 situation occurs, we say that the chain is observed in equilibrium. The next theorem relates the diagonal elements of A' to the expected number of steps taken before entering state when the initial distribution is p*(0) =
160 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
If T is the transition matrix of an ergodic chain and is the fixed probability vector of T, then
Theorem 8.4.2
(w*M)k= 1
A' W&
Use this Proof From Theorem 4.1, we have M = D — A'D + w*D + w*JAI D= together with the fact that wA' =0 to obtain w*M =
A'
= 1 + (j*AID)k = 1 + —a.
j(I + Ad'D). Therefore,
U
provides a comparison of the expected number of steps before entering state '9'k for two different This expression is given below in terms initial distributions and The kth component of the vector
—
of A'. Theorem 8.4.3 vectors
For an ergodic chain and for two initial probability
and
—
=
—
Proof This follows from Theorem 4.1 since
—
A#)D.
=
=
•
We now address ourselves to the problem of obtaining the variances
of the first passage times. In Theorem 4.1, we saw how the matrix produces the expected first passage times. The following theorem shows that also produces the variances of the first passage times. For an ergodic chain, let V denote the matrix whose (i,j)-entry is the variance of the number of steps required to reach state for the first time after the initial state 5'.. For Xe W" X m let X denote the matrix whose entries are = the squares of the entries of X, i.e. The matrix of variances of the first passage times is given by V = B — where M is the mean first
Theorem 8.4.4.
passage matrix and
B=
+ I) + 2(A'M — J[A'MId).
(3)
Proof If V = B — M5, then it is well known that B must be the unique solution of AB = J — + 2T(M — Me). From Theorem 4.1, we know that M — Md = M — D = — (A' — )D. Therefore B must be the unique solution of AB = J — TBd — 2T(A' — JA7)D. Let R denote the right hand side of (3) and observe that = Rd O,AM = J — TD,AA'M = 2DArD + D. Now, use the facts that AJ = — A'TD, and TJ = J to obtain AR =(J — + I) — 2A'TD = J— — 2T(A' — )D. Therefore, (3) must be true. S 5.
Calculation of A* and
for an ergodic chain
practical methods for calculating the group inverse of a general square matrix G (provided of course that C, exists) were given in Some
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS
161
Chapter 7. However, for the present situation we can take advantage of
the special structure of A = I — T and devise an efficient algorithm by
which to compute A'. Theorem 8.5.1
Let T be the transition matrix of an rn-state ergodic chain and let A = I — T. Every principal submatrix of A of size k x k, k = 1,2,3, . - , m — 1, is non-singular. Furthermore, the inverse of each principal submatrix of A is a non-negative matrix.
Proof Since T is the transition matrix of an ergodic chain, T is irreducible and p(T)= 1. Let A be any k x k submatrix of A, where 1 k m — 1, so that A = I —1 where T is a k x k principal submatrix of T. It is well known that p(T) < p(T) = 1. Thus, A must be non-singular. For every matrix B 0, it is known that p(B) < 1 ii and only if(l — B)' exists and 0 since T 0. • (I — B)1 0. Therefore, we can conclude that (A)' As a direct consequence of this theorem, we obtain the following result. Corollary 8.5.1 If T is the transition matrix of an rn-state ergodic chain and we write A as
l),cERm_
where UeR
U' O.
then U' exists and
and
Using this result, it is now possible to give a useful formula for obtaining
A'. Theorem 8.5.2 Ld* where U E
For an rn-state ergodic chain, write A as
I
-1)
notation: h*
and e
CE or ',d
1— h*j,and F= U1
—
ö>O,fl> 1,andA'
is given by
FjhF
-5
U-'-tp
Adopt the following
I
p
Then
Fj
/32
Proof To show S >0 and /3> 1, observe that d* 0. From Corollary 5.1, it follows that U1 0 and hence h* 0. However, h* is not the zero vector. Otherwise, if h =0 then d* =0, because U is non-singular, and this would imply that T is reducible, which is impossible. It now follows that 5>0 and /3> 1. To show the validity of(1), first note that (I —
162
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=I+jh*/fl.LetH=(I_jh*)IU*(t_jh*)isothat Uijh* jh*UI ôjh*
H=U'+
—--p--.
(2)
+ By a direct calculation, using (2). it is easy to see that Fjh*F U
U1 + h*F
= H. Likewise, use (2) to show that
—
Fj
=
= Hj, and
ö
= — h*Hj. Therefore, the matrix on the right
hand side of (1) can be written as
[H
—Hj
I
(3
_h*Hj
Lh*H
Since rank (A) = rank (U) = m — I.
A_lU
Uk h*Uk
Lh*u
From Theorem 7.7.6. A' is given by
where'k =
A'
A can be written as
—
(4)
h*Hk]
Lh*H
the row sums of A are all zero, it follows that Uj + c =0 and hencej = — U1c = — k so that the matrices in (3) and (4) are equal. • For an ergodic chain, if one desires to compute the fixed probability vector one does not need to know the entire matrix A'. As demonstrated in Theorem 2.3, knowledge of any single row of is sufficient. Theorem 5.2 provides an easily obtainable row, namely the last one, and thereby provides one with a relatively simple way of computing Because
Theorem 8.5.3
If T is the transition rnatrix of an rn-state ergodic chain and A = I — T is partitioned as
A_lU
:
[d*
cl_lu
where U E p(rn -1)
-
:
—Uj
_d*j
then the fixed probability vector of T is given by
1_h*j= 1 _d*Ulj. Proof From Theorem 2.3, w* = e From (1), rZ =
M
MPL
so that
1
w*=e*_r*A_ff_h*
—
i
11
S
•
rA, where r is the last row of
=
hi]. Therefore
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS
163
From a computational standpoint it is important to point out that when
Theorem 5.3 is used to compute w, it is not necessary to explicitly calculate U— just as it is not necessary to explicitly calculate C- in order to solve a non-singular system of linear equations Cx = b. Indeed, one may consider the vector h to be the solution of the non-singular system U*x = d and proceed with the solution of the system by conventional methods.
Corollary 8.5.2
For an erqodic chain.
I—AA'
=
Proof The first result follows since W = I
—
AAZ the second follows
from the first. • pointed out above, for an ergodic chain it is not necessary to explicitly compute A4, or even U- 1, in order to compute the fixed is readily probability vector But if one knows A4 or U', then available. However, it is just the reverse that is often encountered in applications. That is. by theoretical considerations or perhaps by previous experience or experimentation, one knows what the fixed probability vector, w, or the limiting matrix W, has to be. In order to obtain some of the information about the chain which was discussed in the previous section, such as the mean first passage matrix, the matrix A needs to be computed. It seems reasonable to try to use the already known information about or W in order to obtain A4 rather than starting from scratch and using only the knowledge of A and formula (1). The next theorem shows how this can be done. Before stating that result, we give a very simple example of a situation where the fixed probability vector is known beforehand. As
Example 8.5.1 Consider an rn-state ergodic chain which is 'symmetric'. That is, the one-step probabilities satisfy = for each i,j = 1,2, ... ,m. This implies that the transition matrix T is a doubly stochastic matrix. In particular, j*T = The vector > 0 is a fixed vector for T but it is not a probability vector. However this can easily be fixed by multiplying by Now,
= (!j*)T,
>0, and
=
1.
Since the fixed probability vector is unique, it must be the case that w*
=
Thus a symmetric ergodic chain is an example (which occurs
164 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
frequently) of a situation where
is known beforehand. when or W is Let us now return to the problem of finding already known. If W is known, then AA' is known since AA' = I — W. The matrix AA' can be used to obtain A' as follows.
Theorem 8.5.4 For an rn-state ergodic chain with transition matrix T, write A = I — T as
[d where UE W"'
I
1)
("
The matrix
is given by
A' Proof
(5)
If A is any (1)-inverse for A then AA'AAA' = AAAAA# =
A'AA' = A'. But X
=
is a (1)-inverse for A since
U Frequently, one may wish to check a computed inverse. If the matrix under question is non-singular, one can compare the products of the original matrix by the computed inverse with the identity matrix. However, if the matrix A has index 1 and a computed is checked by
comparing the product AA'A with A, AA' with A'A, and A'AA' with A', then one has probably done more work doing the 'check' than in computing the original quantity. Since the number of arithmetic operations necessary to form the indicated matrix products is relatively large, it is possible that factors such as roundoff error can render the 'check' almost useless and leave the investigator totally unsure about the quantity he has computed. The next theorem provides an alternate means by which one can 'check' a computed A' for an ergodic chain. For an ergodic chain, the fixed probability vector (or equivalently, the limiting matrix W) is either known from theoretical considerations, or else can usually be computed without much difficulty by simply proceeding as suggested in Theorem 5.3. Furthermore, iterative improvement techniques work well for producing very accurate numerical solutions for For a general chain, one can use the second part of Theorem 2.2 to obtain W. Once one has confidently obtained w (or W), then a computed A' can be checked by using the following result. For any chain with limiting matrix W, A' can be characterized as the unique solution X of the two equations WX =0 and
Theorem 8.5.5
AX=I—W. The proof is by direct substitution. lithe chain is ergodic, then WX =0 can be replaced by W*X = 0*.
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 165
This theorem provides a method by which one can have some confidence
in a computed value for A# since if X is a matrix such that WX is 'close' to 0 and AX is 'close' to I — W, then X must be 'close' to A More precisely, we have the following. Theorem 8.5.6 For any chain, if is a sequence of matrices such that —, 0 and AX, —, I — W., then X, —, A #
Proof
—
X, = AA*(A*
—
Xe)— WX, =
As with Theorem 5, if the chain is ergodic, then
—
X,)] — WX—'O
—' 0
can be replaced
by 6.
Non-ergodic chains and absorbing chains
In this section we will show that A and I — AA can be useful in extracting information from non-ergodic chains. We first consider the problem of classifying the states as being either ergodic or transient. For chains with large numbers of states the problem of classifying the states,
or equivalently, putting the transition matrix in a 'canonical' block triangular form is non-trivial. The following theorem shows how the projection I — AA can be used to classify the states. (As before A = I where T is the transition matrix.)
—
T
For a general chain, state 6". is a transient state (land only (I the ith column of I — AA is entirely zero. Equivalently, .9'. is an
Theorem 8.6.1
ergodic state (land only if the ith column contains at least one non-zero entry.
Proof Perform the necessary permutations so that T has the form (1) of Section 2. Then all transient states are listed before any ergodic states, with the partition as indicated in (1) of Section 2. As argued in Theorem 2.1, A has the form A
=
where G31 is non-singular, Ind(G22) =
1,
and P is a permutation matrix.
By using Theorem 7.7.3, 1 — AA * is seen to have the form
I—AA' Furthermore, every column of I entry since
1I—T,+1.,+1 G22 =
L0
0
—
G22Gr2 contains at least one non-zero
166 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
and each T,1 (i> r) is the transition matrix of an ergodic chain. Hence = I — (I — T11)(I — Tx)> 0 because it is the limiting matrix of the I—
chain associated with T11 and it is well known that the limiting matrix of
an ergodic chain is strictly positive. • I — AA' can provide a distinction between the transient states and the ergodic states and it can completely solve the problem of determining the ergodic sets. Theorem 8.6.2 same ergodic set,
and 9'k belong to the For a general chain, states the ith and kth rows of I — AA are equal.
Proof Write the transition matrix in the form (1) of Section 2 so that (1) holds. The desired result follows from 0
[
0
because each of the rows of I — A11A! are identical (i> r) since each
represents the limiting matrix of an ergodic chain. U For a general chain with more than one ergodic set, the elements of I— can be used to obtain the probabilities of eventual absorption into any one particular ergodic set for each possible starting state. I—
Theorem 8.6.3
For a general chain, let [9'J denote the equivalence class denote the set of (ergodic set) determined by the ergodic state Let indices of those states which belong to [5",]. IfS', is a transient state, then P (eventual absorption into [1/k] I initially in .9',) =
— AA
Proof Permute the states so that T has the form (1) of Section 2. Replace each T.,, i> r, by an identity matrix and call the resulting matrix 1'. From Theorem 2.1 we know that < 1 for i = 1,2,... ,r so that urn '1" exists A-.
and therefore must be given by urn f
= I — AA' where A = I — f. This
modified chain is clearly an absorbing chain and the probability of eventual absorption into state when initially in 5", is given by
(urn
/1k
= (I — AA')Ik. From this, it should be clear that in the original
chain the probability of eventual absorption into the set [9',] is simply
the sum over
of the absorption probabilities. That is,
P (eventual absorption into [5",] I initially in .9'.) =
(I —
AA '),,. (3)
'ES."
We must now show that the
can be eliminated from (3). In order to do
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS
167
this, write A and A as
Ill
:oi
G
A—
'
I
E
Theorem 2.1 guarantees that
I-
II
ndA—
12
I
G22. is non-singular and Theorem 7.7.3 yields
=
When T is in the form (1) of Section 2, the set of indicies 91k Will be = {h,h + 1, h + 2,... h + r}. Partition I — AA' and sequential. I— as follows:
columns h, h +
I
o...0 0...0
1,...h+t
WIN
...
Wpq
row i is
in here W,fl
=
(4)
0...0
0".0 0...0 and
V
W1qWqq •..
i.r+ i
1w,+ i.,+ +
I—AM =
+
+
columns h, h + 1, ... h + 1
W,,qWqq
1
row
... W,,,W,,,
v-.lis }
...
...
(5)
W+ir+i
...
0
...
0
...
0
0
where W
=I—
the gth row of
In here
Suppose the ith row of I — AA lies along If P denotes the probability given in (3), then it
168 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
is
clear that P is given by
P = gth row sum of W,,,.
(6)
It is also evident that
= gth row sum of
(I —
(7)
lEfk
Since Wq is the limiting matrix of Tqq and Tqq is the transition matrix of an ergodic it follows that the rows of Wqq are identical and sum to 1. Therefore, the gth row sum of W,,qWq = gth row sum of W,,,, and the
desired result now follows by virtue ol(6) and (7). U Theorem 8.6.4
If
is a transient state and is an ergodic state in an ergodic class [6"j which is regular, then the limiting value for the nth step = (I — AA')Ik. transition probability from 50• to 60k is given by lim SI-.
Proof It is not hard to see that urn absorption
= P (eventual
where
=
A-.
into [9'J I initially in 6".) and
the component of the fixed probability vector associated with [5°,] corresponding to the state 6",. Suppose [9'J corresponds to Tqq when the transition matrix T is written in the form (1) of Section 2, and suppose the ith row of I — AA' lies along the gth row of the block in (5). The kth column of I — AA' must therefore lie along one of the columns of W say thefth one, so that = we can use (6) to obtain (I — AA = (gth row sum of WN) x is
W
W,Pk
U
=
itself contain important information about a general chain with more than one transient state. The elements of
Theorem 8.6.5
If 6". and 9', are transient states, then (A')1, is the expected number of times in .9', when initially in Furthermore 5°. and 6", belong to the same transient set (A')1, > 0 and (A' >0
Proof Permute the states so that T has the form (1) of Section 2 so that T=
and A =
[!j9.-! T2E]
where p(Q) < I. Notice that TSk = Qil because 6". and 6", are both
\
/1,—i
transient states. By using the fact that (
/a-I
T'
\
is the Q' ) = ( ) Ii* \s"o 1k expected number of times in 6°, in n steps when initially in 6",, it is easy to see that the expected number of times in .9', when initially in is
\:=o
lim
(a_i)
•
1=0
Theorem 8.6.6
For a general chain, let
denote the set of indices
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 169
corresponding to the transient states. If the chain is initially in the transient states then is the expected number of times the chain is in a k€f transient state. (Expected number of times to reach an ergodic set.)
Proof If a pennutation is performed so that the transition matrix has the form (1) of Section 2, then
<1.
T=
If Q is r x r, then if = { 1,2,... , r} and the previous theorem implies that (A')11 =
(I
k.f
is
—
the expected number of times the chain is in a
transient state when initially in
Theorem 8.6.7
IfS'1 and b"k are transient states, then
(A' [2A' — I] — A
= Variance of the number of times in
when
initially in 6". and
E ([2A'
—
IJA'
—
= Variance of the number of times the chain is in a transient state when initially in 6".
where f is the set of indices corresponding to the transient states and
are as described in Theorems 4.1 and 4.4.
and
The proof is left as an exercise.
As direct corollaries of the above theorems, we obtain as special (but extremely useful) cases the following results about absorbing chains.
If T
is the transition matrix for an absorbing chain, then the following statements are true.
Corollary 8.6.1
is an absorbing state, then (I — AA' is the probability of being when initially in b". absorbed into are non-absorbing states, then (A' is the expected (ii) If b"1 and when initially in number of times in (iii) If if is the set of indices corresponding to the non-absorbing states, then is the expected number of steps until absorption when initially (1)
If
in the non-absorbing state 5".. — I] — are non-absorbing states, then (A (iv) ff11'1 and is the variance of the nwnber of times in b"k when initially in 6's. (v) If if is the set of indices corresponding to the non-absorbing states, — I]A' — As')11 is the variance of the numbers of steps until then
absorption when initially in
In order to analyse a chain by utilizing the classical theory, it is always
170 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
to first permute or relabel the states so that the transition matrix assumes the canonical form (I) of Section 2. However, by analysing the chain using A and I — AA , the problem of first permuting the states may be completely avoided since all results involving A # or I — AA' are independent of how the states are ordered or labelled. In fact, the results of this section help to perform a classification of states rather than requiring that a classification previously exist. necessary
7.
References and further reading
Almost any good text on probability theory treats the subject of finite
Markov chains. However, not all authors use the tool of matrix theory for their development. In [34] and [66], the reader can find a good development of the subject in terms of matrix theory. In [46], the probabilistic approach is combined with the matrix theory approach. This text can provide the reader with all the needed background necessary to read this chapter. Only the case where the state space is finite has been considered in this chapter. The industrious student might see what he can do with the subject when the state space is countably infinite, in which case one is dealing with infinite matrices. A good place to start is by reading [47]. See also [17].
9
Applications of the Drazin inverse
1.
Introduction
The previous two chapters have developed the basic theory of the Drazin
inverse and the applications of a special case, the group inverse, to Markov chains. This chapter will develop the application of the Drazin inverse to singular differential and difference equations. We shall also discuss where these singular equations occur. 2.
Applications of the Drazin inverse to linear systems of differential equations
In this section, we will be concerned with systems of first order linear differential equations of the form Ax(t) + Bx(t) = f(i), x(t0) = ceCa where and x(t) and (kt) are vector valued functions of the real variable t, and f(t) is continuous in some interval containing t0. If A is non-singular, then the classical theory applies and one has the following situation.
(I) The general solution of the homogeneous equation, Ax(t) + Bx(t) =0, is given by x(t) = A (H) The homogeneous initial value problem, Ai(t) + Bx(t) =0, x(t0) = c, has the unique solution x(t) = e " ISQ_lo)C
(III) The general solution of the inhomogeneous equation Ai(t) + ' iaiJ e'"f(s)ds, Bx(t)= ftt), is given by x(t) = +A
e'
aeR,qeC. (IV) The inhomogeneous initial value problem, Ax(t) + Bx(t) =
x(t) = c, has the unique solution x(t)= e' 'Bft.to)c + A x f(s)ds.
e SO
172 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
In this section, we will examine what happens in each of these problems when A is a singular matrix. When A is a singular matrix, things can happen that are impossible when A exists. For example, the homogeneous initial value problem may be inconsistent, that is, there may not exist a solution. If there is a solution, it need not be unique. The following is a simple example that illustrates this fact.
Example 9.2.1
Let A
and B
=
=
?]• Then the initial
value problem Ax(t) + Bx(t) =0, x(0) = [1,11* clearly has no solution. 10
1
IIA= 10
0
01
10
1
0 IandB=10
0
[000]
11
Olandweimposetheinitialcondition
L000J
x(0) = [1, 1, = c, then it is not difficult to see that the initial value problem Ax(t) + Bx(t) =0, x(0) = c, has infinitely many solutions. Notice that in each of the above examples, we even have that AB = BA.
The situations illustrated above motivate the following definitions.
Definition 9.2.1 For and t0ER, the vector is said to be a consistent initial vector associated with t0 for the equation Ax(t) + Bx(t) = 1(t) when the initial value problem Ai(t) + Bx(t) = 1(t), x(t,) = C, possesses at least one solution.
Definition 9.2.2 The equation Ax(t) + Bx(t) = 1(t) is said to be tractable at the point t0 the initial value problem Ax(t) + Bx(t) = f(t), x(t0) = c has a unique solution for each consistent initial vector, C, associated with t0.
If the homogeneous equation Ax(t) + Bx(t) =0 is tractable at some point So we may simply say the equation t0e R, then it is tractable at every t is tractable. Our goals are as follows. (1) Characterize tractable homogeneous equations. (ii) Provide, in closed form, the general solution of every tractable homogeneous equation. (iii) Characterize the set of consistent initial vectors for tractable homogeneous equations. (iv) Provide, in closed form, a particular solution for the inhomogeneous equation when the homogeneous equation is tractable. (v) Characterize the set of consistent initial vectors associated with a point t0 for an inhomogeneous equation when the homogeneous equation is tractable. (vi) Provide, in closed form, the unique solution of Ai(t) + Bx(t) = 1(t), x(t0) = c where c is a consistent initial vector associated with t0 and the differential equation is tractable. The key to accomplishing (i)—(vi) is the following two results.
APPLICATIONS OF THE DRAZIN INVERSE
Theorem 9.2.1
X
For A, BE
173
the homogeneous differential equation
Ax(t)+Bx(t)=O is tractable exists.
(1)
there exists a scalar AEC such that (2A + B)-
and only
Lemma 9.2.1 Let A, (AA + B)1 exists, and let
X
l
such that
,• Suppose there exists a
(2)
Then
=
Theorem 1 shows that assuming (1A + B) is invertible is a natural assumption. Lemma 1 means that we can assume for proof purposes that A and B commute. We shall prove the Lemma first.
If there exists AEC such that (A.A + B)-' exists, then
Proof of Lemma 1. Proof of Theorem
•
that + B) exists. Let AA and beas defined in (2). Clearly Ax(t) + =0 is. Taking a similarity Bx(t) =0 is tractable if and only if AAX(t) + Suppose first that there exists AEC such
1
we may write A
Ic 0]
ALO NJ'
B
Il—Ac 0
0
1_IB,
01
_Ix,(t)
I—,N]LO
(3)
= I. Since C is invertible, Cx,(t) + (I — AC)x2(t) =0 is since ).Aa + tractable. Thus it suffices to show Nx2(t) + (J — AN)x2(t) =0 is tractable.
(4)
Let k = Ind(N) and multiply (12) by Then (I — 'x2(t) = 0. Hence 1x2(t)= 0. Multiply (12) by Then 'x2(t) + Nk_2x2(t) (I — = 0. Continuing in this manner = 0 so that we get that x2(t) =0 and Ni2(t) + (I — AN)x2(t) =0 is trivially tractable. Suppose now that Ax(t) + Bx(t) = 0 is tractable. We need to show that there is a AeC such that (AA + B) is invertible. Suppose that this is not true. Then (AA + B) is singular for all AEC. This means that for each
IEC, there is a vector
+ B)vA =0 and
(AA
Let
,
,
such that 0.
be a finite linearly dependent set of such vectors. Let
C be such that not all the
are 0. Then z(t) =
= 0, where
is not identically zero and is
174 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
easily seen to be a solution of (1). However,
= 0.
z(O)
= Thus, there are two different solutions of(1), namely z(t) and 0, which satisfy the initial condition x(O) =0. Therefore, (1) is not tractable at t =0. which contradicts our hypothesis. Hence, (AA + B)-' exists for some
2€C. •
The next theorem will be used to show that most of our later development
is independent of the scalar 2 which is used in the expression (AA + B)-'. Theorem 9.2.2 Suppose that A, BECn*hI are such that there exists a AEC so that (IA + B)-' exists. Let AA = (IA + B)- 'A, BA = (IA + B)- 'B, For all for which + B)' and and 1A = (IA + B)- for (pA + B)-' exist, the following statements are true.
=
(5)
=
and
= =
and
(6)
=
(7)
=
(8)
(9)
Proof To prove (5), write = [(GA + B)- IAIDA = [(GA + B)- '(pA + B)(4uA + B)- IA]DA = +
=
by Theorem 7.9.4. + + B)]A + B) B) 'A + B) '(IA +
=
=
The proof of (6) is similar and is left as an exercise. To prove (7), write A, = [(xA + B)- '(pA+ B)J(pA + B)- 'A = + = + Since and commute, it follows that for each positive integer m,
R(A') = Thus (7) follows. To prove (8), use the same technique used to prove (5) to
=
obtain
'1=
+ B)-
+ B)]f =
+ B).-
+ B)
The proof of(9) is similar. In view of the preceding theorem, we can now drop the subscript A and appear. whenever the terms R(AA), Ind(AA), We shall do so. Let us return to the proof of Theorem 1. Recall that the original system x
+ B)
APPLICATIONS OF THE DRAZIN INVERSE
was
175
equivalent to the pair of equations
Cx1(t)+B1x1(t)=O, CB1=B1C
(10)
Nx1(t) + B2x2(t) = 0,
(11)
NB2 = B2N,
B2 invertible, and the only solution of(l1) was x2(t) 0. But (10) is consistent for any x1(t0) and the unique solution is x1(t) = exp( — C x (t — t0))x1(t0). Thus we have proved the first part of the next Theorem. Suppose Ai(t) + Bx(t) =0 is tractable. Then the general
Theorem 9.2.3 solution is given by
x(t)=e
qECN.
(12)
isa consistent initial vector for the homogeneous equation if
A vector
and only ifc€R(At) = R(ADA). is k-times continuously differentiable around t0. Then Suppose that the non-homogeneous equation Ai(t) + Bx(t) = f(t) always possesses solutions and a particular solution is given by
x(t) = e
+ (I —
J:0e
—
(13)
Moreover, the expression (13) is independent of A. The general solution is given by
x(t) = e
—
— to)AADq
+e
—
J
e
ADf(S)dS
(14)
—
+ (I —
Let * = (I — AAD)E( —
Then w is independent of A.
1=0
A vector is a consistent initial vector associated with t0eR for the inhomogeneous equation if and only {* + R(Ak) }. Furthermore, the inhomogeneous equation is tractable at t0 and the unique solution of the initial value problem with X(tr,) = c, c a consistent initial vector associated with t0, is given by (14) with q = c.
Proof (14) will follow from (12) and (13). We have already shown (12). To see (13) let
x,(t) = ADe_
e
x2(t) = (I — AAD) 1—0
where we have taken ç
=0 for notational convenience. We shall show that
176 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Ax,(t)+
= AADf(t) and
(15)
Ax2(t) +
= (I — AAD)f(t)
(16)
To verily (15), note that Ax,(t) =
+
—
Ax2(t) = A(I
AADf(t) =
—
+ ADe
—
=
A°*te
+ AA"f(t),asdesired. We now verify(16).
—
lf(I+
—
= (I —
1—0
—
1)'
x
1—0
"(t) = (I —
= (I —
—
x
i—i
= (I — = —
(— 1)' i—i
x2(t) + (1— AAD)i(:) where the fact that (I —
—
I_i
( — 1)'
lAi(ED)14 1f(t) =
+ (I —
+
—
A, AD commute has been used freely.
Thus, x,(t) + x2(t) is a particular solution as desired. The characterization of the consistent initial vectors for the inhomogeneous equation follow directly from (14). That the solutions are independent of A follows from
Theorem 2. U An important special case is when B is invertible. Then we may take
f=B'f.
A=OandA=B'A,
The Drazin inverse can sometimes be useful even when A is invertible. If f(s) is a constant vector 1, the general solution of x(:) + Bx(t) = f(t) is given by
r
1
essdS]f.
(17)
a
If B' exists, then (17) is easily evaluated since feuds = B
+ G,
is more GE However, if B is singular, then the evaluation of difficult. The next result shows how to do it using the Drazin inverse.
Theorem 9.2.4
If
and Ind(B) = k, then
... Ge
X N
Proof Use the series expansion for
+ G]=
+(I —
Corollary 9.2.1
If
x(t) = BDf+ t(I — BBD)f_
to obtain
then
t3(I
... +
APPLICATIONS OF THE DRAZIN INVERSE (
flk- hZk(I
BBD)Bk_l
177
I is a solution of x(t) + Bx(t) = I. (Note, this is a
polynomial in t.)
Corollary 9.2.2
For each t. let C(t) denote the set of consistent initial vectors associated with tfor Ax(t) + Bx(t) = 1(t) where (AA + BY' exists for some A and 1(t) is k-times continuously Then d(C(t), C(t0)) 0 as t —, ç, where d(C(t), C(t0)) = sup inf X—y IJ
'i€C(10)
The proof is left as an exercise. We also note in passing, the next theorem. If(AA + B)' exists for some A, then
Theorem 9.2.5 AAD = lim 4—x
Proof
AD
= urn
it
2—0
-
Since
I = AA2 + B2 we obtain
AD_ = AAD +
The first limit A
is independent of A. The second limit follows from
follows since
S
Example 9.2.2 Consider the homogeneous differential equation Ax(t) + Bx(t) =0 where 2 0 —21 1 0 A=J—1 0 21,B=I—27 —22 —17 14 2] 10 L 2 3 L 18
1
1
1
Note that A and B are both singular and do not commute. Since A + B turns out to be invertible we multiply on the left by (A + B)-'to get Ax(s) + =0 where —5 5
iI—3
A=(A+B)'A=—I
6
—41
6
5
—2 —2
The eigenvalues of A are 0, 1,3 so that AD may be computed by Theorem 7.5.2 to be 1
1—27 —41
AD=_I
54
77
27L—27
—34
—28
46 —14
The consistency condition for initial conditions is thus (I — AAD)x(0) =
f
— 18
L
14
18
9
—
14 7
4]
—
101 Ix,(0)1 10 x2(0) I = 0. 5J Lx3(0)J
7]
178 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
There is only one independent equation involved, 9x1 (0) + 7x2(O) + 5x3(O) =0.
Since
—
(18)
= {O, 0,2/3), it is not difficult to compute the matrix
exponential as 1
x(t) = e
=
fl8 0 L 0
1 — e2t3
2(1 — e2113)1 rx1(o)
16(1 — e2"3)
26 — 8e2"3
13(e2"3 —
1)
26e2"3
—
8
J
Lx3(0)
Equation (18) can be used to eliminate one of the x1(0).
For many applications it is desirable to be able to solve Ax + Bx = I when A, B are rectangular. We shall develop two important special cases. For each case, the general solution will be given. Derivation of the appropriate set of consistent initial conditions for the non-homogeneous equations is left to the reader. Another generalization of the results of this section may be found in Exercises 1—6. We will first consider the case when (AA + B) is one-to-one for some A. Theorem 9.2.6
Suppose (Alt + B) is one-to-one. Then all solutions
Ai + Bx =0 are of the form x=e
of
where q€R(ADA)
and
=0 for m = 0, 1,... ,n.
[I — (AA + B)(AA +
Here A = (Alt + B)tA,
(19)
= (AA + B)tB.
=0. Proof If x is a solution of Ax + Bx =0, then x is a solution of Ax + = M and AA + = I. Hence x = A°*IADAq by Theorem 3. But BADA]e_ =0 for all t. Substituting back in gives [— + =0 for all t, or equivalently, Thus [— + BAADJe_ [BA —
=0 for m =0,1,2.... But = A(AA + B)tB = A(AA + B)t(AA + B)— A(AA + B)tAA = A — AA(AA + B)tA
=A_(AA+B)(AA+B)tA+B(AA+B)tA
=[I_(AA+B)(AA+B)t)A÷BA. U Corollary 9.2.3 If AA + B is one-to-one, and N(XA* + B*) = N(A*)r N(B*), then all solutions of Ax + Bx =0 are of the form x = e A°bIADAq where q is an arbitrary vector.
I
= N(XA* + B*) = N(A*) n N(B). But R(A) N(A) Proof R(AA + so that R(A) R(AA + B)-. Thus (19) holds for all qeR(AA"). U
APPLICATIONS OF THE DRAZIN INVERSE
Example 9.2.3
179
[?]. Then (AA + B) is one-to-one = [b], B = and N(AA + B) = N(A) N(B) = (O} for all 2. However, N(XA + B*) # N(B*) for all A. Ai + Bx =0 has only x =0 as a solution. Multiplying by (AA + B)t = (IA 12 + 1)-i [2., 1] we get 2(1212 + l) l x + (1212 + l) 'x = 0 which has the non-zero solutions x = "q. Let A
Suppose (A.A + B) is one-to-one and Ax + Bx consistent. Then all solutions of Ax + Bx = fare of the form
Theorem 9.2.7
= f is
+ ADe
x=
+ (I — A =(AA + B)tA,
=(AA + BrB, k = Ind(A), and f= (AA + B)tf.
Proof !fxsolvesAx+ Bx=f,then AA + = I. Thus (20) follows from Theorem 3. • Theorem 7 is not as completely satisfying as our other results since we have not stated precisely for which f is Ax + Bx = f consistent when AA + B is one-to-one. While the genóral problem appears difficult, we do have the following. Theorem 9.2.8 Suppose AA + B is one-to-one and N(AA* + B*) = N(A*) n N(B*). Then Ax + Bx = f is consistent and only jf
(I_(AA+B)(AA+B)t)f=O. Proof
Suppose AA + B is one-to-one and N(XA* + B*) = N(A*) n N(B*).
+ B*) = Now (2A + B)(AA + B)' is the identity on R(AA + B) = + B)'A = A and Thus(AA + 2 (AA+B)(AA+B)'B=B. Henceforany x, ifwe setf=Al+ Bx,weget
(AA + B)(AA + B)'f = f. On the other hand, if(2A + B)(AA + B)'f = f, then = I is = I. Since Ax + Ax + Bx = is equivalent to Ax +
consistent, so is Ax + Bx = f. U The special cases when A or B is one-to-one are of some interest. B being one-to-one is the case of most interest for the applications of Section 5. Theorem 9.2.9
Suppose that A is one-to-one. Then Ax consistent if and only tff is of the form
f=
—
+ Bx = f is
AAt)Bg
where h is an arbitrary function and
g=
+
AtBt
*?BsAth(S)th,
(22)
180 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
q an arbitrary constant. Conversely, (22) is the general solution.
I has the form (21), then g given in
Proof Suppose A is one-to-one. Then Ai + Bx = I is equivalent to the pair of equations:
x+AtBx=Atf, and (I —
AAt)Bx = (I
—
(23)
AAt)I.
(24)
Now AAtI can be chosen arbitrarily, say AAth. Then (23) uniquely determines x giving (21). Substituting x into (24) gives (1 — AAt)(. A similar result is possible if B is one-to-one.
Theorem 9.2.10
Suppose that B is one-to-one. Then Ax consistent tf and only 1ff is of the form
•
+ Bx = I is
I = BB'h + (I — BB')Ag
(25)
where h is arbitrary and
g=
+
+ [I — (BtA)D(BtA)]
(26)
E( —
k = Ind(B'A), q arbitrary. Conversely, 1ff has the form (24), then g in (25) is the general solution.
Proof Suppose B is one-to-one. Then Ax + Bx = I is equivalent to B'Ax + x = Btf, and (I — BBt)AX
= (1
—
BBt)I.
(27) (28)
Again BBtf is arbitrary. From (27) x is determined uniquely in terms of B'I. Then (I — BBt)f must follow from (28). N We now turn to the case when AA + B is onto. Let A, B be m x n matrices. Let A be such that AA + B is onto. Define P=(AA + B)t(AA + B). Then Ax + Bx =fbecomes
AP*+BPx=f—A(I—P)x—B(I—P)x. Or, equivalently, A(AA + B)'[(AA + B)x] + B(AA + B)t[(AA + B)x]
=f—A(I—P)x—B(I—P)x.
(29)
But A[A(AA + B)t] + [B(AA + Bt)] = I. Thus (29) is, in terms of(AA + B)x, a differential equation of the type already solved and hence has a solution for any choice of (I — P)x.
APPLICATIONS OF THE DRAZIN INVERSE
181
Theorem 9.2.11 Suppose that )A + B is onto and I is n-times B)t. Let B(AA+ B)t. Let + B)t(AA B)Jh — B[1 — (AA + B)t(,A + B)]h where h is an arbitrary (n + 1)-times vector valued function. Then all solutions of + Bx = fare of the form
x = (AA + B)'{e
+ (I
—
A°BIAADq
+ ADe_ + [I — (AA + B)t
ADA)
x(AA+B)Jh. an arbitrary constant vector, k = md (A). The formulas in Theorem 11 simplify considerably if A or B are onto. For the applications of Section 5, the case when B is onto is the more important.
q
Theorem 9.2.12 are of the form
x = Bt{e_
Suppose that B is onto. Then all solutions of Ax + Bx = I + C°e
+ (I h an
k
—
+ [1 —
—
arbitrary function, q an arbitary vector, g =
B'B]h.
I — A[I — B'BJh, C
= AB'.
= !nd(C).
Theorem 12 comes immediately from Theorem 11 by setting A = 0 and noting that ñ = I.
Theorem 9.2.13 are of the form
x = At{e_
Suppose that A is onto. Then all solutions of Ax + Bx = I
+e
+ [I — A'A]h
where h is an arbitrary function and g = I — B[I
—
(30)
A'A]h.
Proof This one is easier to prove directly. Suppose A is onto and rewrite Ax + Bx = f as (Ax) + BAt(AX) = f — B[I — AtA]X. Taking [I — A'A]x arbitrary we can solve uniquely for Ax, A1Ax = x, to get (30). U 3.
Applications of the Drazin inverse to difference equations
The Drazin inverse also arises naturally in attempting to solve difference
equations with singular coefficient matrices. To illustrate why the Drazin
182 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
inverse works and other types of generalized inverses don't work in dealing
with this type of difference equation, consider the following difference equation: A is singular.
At first glance, one might be tempted to introduce a (1)-inverse for A. However if one stops and thinks a moment, one can see that one must have that . .. = xfl = =
= = ... = as well as = k = Ind(A). Thus, the problem could be stated as follows: given find a R(Ak). vector ; such that A; +1 = and + 1E By examining Definition 2.2, one can see that the above problem has = AD;. a solution, the solution is unique, and is given by Not unexpectedly the solution of the difference equation proceeds much as for the differential equation. Definition 9.3.1 For A, Be Ctm X the vector cc C"' is called a consistent initial vector for the difference equation = B; + if the initial value problem = B; + f,,, x0 = c, n = 1,2,... has a solution
for;.
Definition 9.3.2 The difference equation tractable if the initial value problem =
= B; +
B; +
is said to be
x0 = c, n = 1,2,... has
a unique solution for each consistent initial vector c.
Theorem 9.3.1 Axe,,.,
The homogeneous difference equation
=Bx
is tractable if and only if there exists a scalar
such that (A,A + B)'
exists.
Proof The proof follows the same lines as the proof of Theorem 2.1 except that xA1(t) = is replaced with U = The difference analogue of Theorem 2.3 is as follows.
Theorem 9.3.2
If the homogeneous equation (1)
is tractable, then the general solution is given by
JAADq
—
ifn=0
ctm
exists. Furthermore, ceC"' is a consistent initial vector for (1) if
APPLICATIONS OF THE DRAZIN INVERSE
183
and only where k = Ind(A). In this case the unique solution, = (ADñy1c, n = 0, 1,2,3 subject to x = c, is given by The inhomogene-
= B; +
ous equation
forn.1, =
+ AD
is also tractable. Its general solution is,
- -
£
—
(I
—
=(2A — B)'f,k = Ind(A), is independent of 1. Let * = — (I — AAD) x
=(2A
A =(A.A — B)
and qeC"'. The solution
—
The vector c is a consistent initial vector jf and only
c lies
1=0
in the flat {* + R(A") }.
Proof Since (1) is tractable, multiplying by (AA — B)' gives the = equivalent equation After a similarity we get, as in the proof of Theorem 2.1,
IC 0
= C"(t + = (I + and the solution Thus =0, of the homogeneous equation follows. (2) may be verified directly as in the proof of Theorem 2.3. U
It is interesting to note that the solution (2) for
depends not only
on then+lpast vectors
1•
Z(A- 'By'-'-
When A is non-singular,
=(A'Brq +
depends only on the past vectors
and
In many applications one has a difference equation holding for only a subset of the x,. The difference equation that is discussed in Section 5 is solved by the following theorem.
Theorem 9.3.3
Suppose that A, B are square matrices and there exists a scalar 1 such that IA + B is non-singular. Set A = (IA + B) 'A and Then all solutions of +Bx1=0,i=0,...,N—Iare — (1 — A"A)xN. given by = ( +(
Proof Suppose that there exists a I such that AA + B is non-singular. Taking a similarity we get as in (3)
A_IA, 0 L0
0 1 Mk_0x
1
MJ'
L0
with B, = I — IA,,B2 = I
—
B2]'
—
'
AM. Then the difference equation is equivalent
184 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
to the decoupled equations
Thus w =( — AJ 4.
1B1)1w0, and v.
= ( — B 1M)N_*vN. •
The Leslie population growth model and backward population projection
Suppose that a population is partitioned according to age groups. Given specific rates of fertility and mortality, along with an initial age distribution, the Leslie model provides the age distribution of the survivors and descendants of the initial population at successive, discrete points in time. It is a standard demographic practice to consider only one sex at a time. We will consider only the female portion of a population. Select a unit of time (e.g. 5 years, or 1 year, or 10 s, or 0.5 ps, etc.). Let At denote one unit of time. Select an integer m such that m(At) is the maximum age to be consider. Construct m disjoint age classes or age intervals; A1
= (0, At], A2 = (At,2At],...
= ((m — 1)At, mAt].
Let t0 denote an initial point in time and for some integer n let
t=n(&). Let us agree to say a female belongs to Ak at time t if she is living at time t, and her age lies in Ak at time t. To define the survival and birth rates, let
Pk(t) be the probability that a female in Ak at time t will be in Ak+l at time t + At (survival rates). Let bk(t) be the expected number of daughters produced in the time interval [t, t + At), which are alive at time t + At, by a female in Ak (birth rates). Furthermore, let nk(t) be the expected number of females in Ak at time t. Finally, let n(t) = [n1(t), ... , For convenience, adopt the notation n(t1) = n(i), pk(tI) = and bk(tI) = bk(i). Suppose we know the age distribution n(i) of our population at time t.. From this together with the survival rates and birth rates, we can obtain the expected age distribution of the population at time
n1(i+ 1) n2(i + 1) fl3(i + 1)
[b3(i) J p1(i)
=
+')
0
LÔ
b2(i) 0 p2(i) 0
0 ... 0 ...
0 0
0 ... p.,_
0 0
n1(i) n2(i) n3(i)
0]
nrn(i)
(1)
or n(i + 1) = T(i)n(i). The expression (1) is the Leslie model. Many times, the survival rates and birth rates are constant with respect to the time scale under consideration. Let us make this assumption and write bk(t)=bk, so
that (1) becomes
n(i+1)=Tn(i)
(2)
APPLICATIONS OF THE DRAZIN INVERSE
185
We shall refer to T as the Leslie matrix. Suppose now that we are given
an initial population distribution, n(O). It is easy to see that we can now project forward into time and produce the expected population at a future time, say t = by n(k) = Tn(k — 1) = T2n(k — 2) = ... = T"n(O). We wish to deal with the problem of projecting a population distribution backward in time in order to determine what kind of population distribution there had to exist in the past, in order to produce the present population distribution. Such a problem might arise, for example, in a situation where one has statistics giving the age distribution for population A at only the time t. and other statistics giving the age distribution for population B at a different time, say t. + If one wishes to make a comparison of the two populations at time i, then it is necessary to project population B backward in time. Since n(i) = T 1n(i ÷ 1) the problem of backward population projection is trivial in the case when the matrix T of(1) is non-singular. If T is singular, the problem is more interesting. The Leslie matrix is very often singular. As a simple example, consider the population of human American females. Let & = 5 years and m =20 so that the age classes are: A1 = (0, 5],A2 = (5,10],... ,A19 = (90, 95], A20 = (95,100]. Almost everyone would agree that at least b20 =0. Suppose the Leslie matrix is given by b1
b2
Op20... o
0
0
T=
0 0
0 0 0
0... 0
0 0
0...0 O...0
0 0 0
PIN-k....l
0
0
0...
0
b3...bM_k_l
p1O 0...
(3)
0 0
0 0
0...
0
0 ...
Pm-k
0
0
o
0
0 ...
0
0
_1T11 [T2,
0
1
j
0 Pm-k+i
0... 0 0... 0
0
0 0
0
0 I
T22
The complete statement that we can make concerning backward population projection is the following. Theorem 9.4.1 For the Leslie population growth model whose matrix is given by (3), and for an integer x 0, let j be an integer such that
Oj
k + x. The future distributions n(k + x) determine the past distributions n(k + x —j) as follows.
186 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
if 0 j x,
(T")'n(k + x)
+ x)1
<Jk
+x
(4)
where VGRJ_x is arbitrary, n1(k + X)E
is the vector of the first m — (j — x) components of n(k + x), and M is the leading principal submatrix obtained from T by deleting the last j — x rows and columns from T.
Proof To prove the top half of (4) note that since Oj x, we have n(k + x _j)ER(rk), n(k + X)ER(Tk) and n(k + x) = + x —j). Thus (TJ)Dfl(k n(k + x — j) = + x) = + x). To show the bottom half of (4), first write T as
T
[M
:
01
=
where
so that, by Corollary 7.7.1, TD
MD 10-I =
Note that n(k)
=
÷ x)
In1 (k) 1
(k + x) 1
=
=
To complete the proof, it suffices to show [j — x])
1(MDY_xni(k)
v arbitrary. ], =L To simplify our notation , let 1= [I — x]. n(k
—
(5)
Partition M as b2
p1O
b3...bM_k_) 0 0
0...
bm_k
0
0...
0
0
0 0
0... 0...
0
0
0 0
0
0...
0
0
O
p20...
0 0
o
0
01
o
0...
•
0 0
o
0
M-——— —
0 ...
0
Pm-k+1
0 ...
0
0
0
Notice that lnd(M) = k
p-i S(p)=
(6)
I
0 0
— I.
0
0... 0...
0 0
0 0 0-
For each positive integer, p, let
IM
01
O1P
rMP
Nj
Lsp Ni]
APPLICATIONS OF THE DRAZIN INVERSE
187
Given the distribution n(k), there had to exist some initial population, n(O), (not necessarily unique) and an intermediate population, n(k — which gave rise to n(k). Write n(O) and n(k — I) as n(O)
n(k
=
—0= Now Tkn(O) = n(k) so that n1(k)ER(Mk) =
where n2(O), n2(k —
R(M&1. But T'n(k
—
I)
Also note that = n(k — I) so that n1(k and Ind(M) = k — 1 we have = n(k). Since M'n1(k — 1) =
n1(k — 1) is uniquely determined by n1(k) as n1(k — 1) =
To finish the proof it suffices to show that any vector u of the form (5) gives rise to the distribution n(k) after 1 intervals of time have elapsed. = n1(k), we have Since M'n(k — 1) = M(MD)mfl1(k) = TU
r
n1(k)
=
Thus it is only necessary to show that If n(O) = then T&n(O)
5.
= n2(k).
is any initial distribution which gives rise to n(k),
= n(k) implies n (k) =
Mkn1(O) and n2(k) = S(k)n1(O). Hence = S(1)(MD)IM n1(O) = S(1)Mk = S(k)n1(O) = n2(O).
Optimal control
In Section 2 we showed how to find solutions for linear systems of differential equations with singular coefficients provided that the solutions were uniquely determined by consistent initial conditions. In this section we shall apply those results to an optimal control problem. The problem presented will provide an interesting example of the type of differential equation studied in Section 2. In general, an optimal control problem involves a process x, which is regulated by a control u. The problem is to choose a control u so as to cause x to have some type of desired behaviour and minimize a cost J[x, uJ. The cost may, of course, take many forms. It may be time, total energy, or something else. The desired behaviour of the process may range
Finally, the process may from going to zero to hitting a moving depend on the control in a variety of ways, often non-linear. We shall present a particular problem and handle it in some detail. Of course, similar problems may also be analyzed using these techniques. Let A, B be n x n and n x m matrices respectively. All matrices and scalars are allowed to be complex though, of course, in many applications they are real. The usual inner product for complex (or real) vectors is denoted (.,.). Let Q, H be positive semi-definite m x m and n x n matrices. Finally, let x, ii denote vector valued functions of the real variable z. x is n x I while u is m x I.
188 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
We consider the autonomous (time independent coefficients) control
process
x=Ax+Bs.
(1)
on the time interval [t0,
J[x, u] =
I
with quadratic cost functional
(Hx, x) + (Qu, u)dt.
(2)
to
The dot in (1) denotes differentiation with respect to t.
If one has a fixed pair of vectors x0, x1 such that there exist controls u so that the process xis at x0 at a time t0 and x1 at time t1, then one can ask for a control that minimizes the cost (2) subject to the restraint that x(t0) = x0x(t1) = x1. Using the theory of Lagrange multipliers one gets the system of equations •
A+A*2+Hx=0 x—Ax—Bu=O
(3)
Bt+Qu=O as necessary conditions for optimization in this sense [1). If Q is invertible, then u can be eliminated from the second equation and the resulting system formed by the first two equations solved directly. We shall be most interested then in the case when Q is not invertible, though our results will include the case when Q is invertible. The system (3) can be rewritten as
Il
0 01 IA]
10
1
IA*
OIIxI+I0
LO 0 OJLuJ
LB*
H
01111
101
—A —BIIxJ=I0I. 0
QJ[uJ
LOJ
Note that (4) has leading coefficient singular.
We assume throughout that controls are continuous. All statements concerning optimality are made with respect to the control problem of this section and this linear manifold of controls. Optima) control problems with singular matrices in the quadratic cost functional have received much attention. They occur naturally as a first order approximation to more general optimal control problems. [45] surveys the known results on one such problem with singular matrices in the cost. The approach given here has the advantage that it leads to explicit solutions for the problem studied, as well as a procedure for solution. These explicit, closed form solutions, also simplify the proof and development of the mathematical theory for the problem studied. We shall first show that if (3) has a solution satisfying the boundary conditions, then u must be an optimal control.
(4)
APPLICATIONS OF THE DRAZIN INVERSE
189
Theorem 9.5.1 Suppose that x, u, A is a solution of(3) and x(t0) = x0, x(t1) = x1. Then u is an optimal control.
Proof To show that J [I, a]
J [x, u] for all î, ii satisfying (1) and the boundary conditions it is clearly equivalent to show that çb(s)
has
= J[sx + (1 — s)l, su + (1
—
s)ü]
a minimum at s = 1 for all I, ó. Let J0 =
I) + (Qu, i)dt,
3 = J[1, ü], and J = J[x, uJ. Then a direct calculation gives 4)(s) = s2(J
—
2J0
+ 3) + s(2J0 — 2!) + .1. Since 4) is quadratic in s it has
a maximum or minimum at s = (III I
Jto
1
if and only if J0 = J, or equivalently,
Cl,
(Hx,i)+(Qu,ü)dt= I (Hx,x)+(Qu,u)dt.
(5)
Jto
0 for all s so that if(S) holds there must be a minimum. Clearly (5) is equivalent to However, 4)(s)
— x)dz
=
— iI)dt.
(6)
(Hx,l)—(Hx,x)dt.
(7)
But
(Hxx—x)dt=
Cli
Jlo
Jt0
Now(Hx,x)=(—A— A*A,x)= _(A,x)_(A*A,x). But (AA,x)=(A,Ax)= (A, x — Bu) = (A, 1) — (A, Bu) = (A, ) — (B*A, u) = (A, x) ÷ (Qu, u). Thus (Hx, x) = Hence
(A, x) — (A, I) —
Cli
Jto
(Hx,i — x)dt =
(Qu, u). Similarly, (Hx, I) = (A, 1) — (A, 1) — (Qu, ü). C,,
(2,x)+(2,x)+(Qu,u)—(2,l)—(2,l)—(Qu,ã)dt
,J1O
ii
=(A,x)
Ii
—(Al) 10
to
+
(Qu,u)—(Qu,ü)dt Jto
C,
= JI to (Qu,u—ã)dt. U Note that Theorem 1 says that solutions of (3) satisfying the boundary conditions provide optimal controls even if the differential equation (3) has non-unique solutions for consistent initial conditions. Of course, in that case the optimal controls may not be unique. As a useful by-product of the proof of Theorem 1 we have that Si
J[x,u]=—2(A,x) I0
190 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
To simplify the solving of (4) rewrite it as
Ai+Bz=0 where A
(9)
], B =
=
['
:2]. Here I is 2n x 2n, O],and B4=Q.
B,
(10)
Clearly (p + B1) 'exists except for a finite number of p. Define
Q,=B4—B3(p+B,I'B2.
(11)
We now need the following easily verified result whose proof we omit. pA + B is invertible almost always if and only if invertible almost always. is + B1 are invertible. Let Assume that p. A, B are such that pA + B, are defined by A = (pA + B)- 'A, B = (pA + B) 'B. Then
Proposition 9.5.1
0]' n_I 01
LMP
01II_pN I] L—PMM
0
Using Corollary 7.7.2 we get
01
ADA_I
0 L
To evaluate e
0]
-
"i' note that for integers r
I
01'
1
[NDZ
0] =
L
0
1,
]
0
0
Thus the power series expansion of the exponential gives
r
0
e
e
I
Using Theorem 2.3 we see that the general solution of (9) is
I e
0
—
0 0
From the original equation (4) we have that
= e1
where
= ).(t0),
(12)
APPLICATIONS OF THE DRAZIN INVERSE
191
and
(13)
Thus we have shown the following.
Theorem 9.5.2
If
terms of x, A by (13) While (13) gives
is invertible, then the optimal control u is given in an optimal control exists.
[L]
explicitly, (13) does not give u directly in terms of
x. We now turn to this problem. Let E(t) = e -
—
[Ei(o) E2(t)] where the EAt), i = =
1,2, 3,4
are all n x n matrices. Suppose that (3) has a solution. Let
=
= A0. Then
Note that this is possible if and only if 1 1 Now E(ti)L]or
Lx(t1]=
x1 = E3(r1)A0 + E4(t1)x0.
(14)
Once ).0,x0 are known, x,u follow from (12) and (13). On the other hand if (14) is viewed as defining x1, then from (12) x will go from x0 to x1. Thus we have established the following result. Theorem 9.5.3 Suppose that is invertible almost always. For a given x0, x1 there is an optimal control that takes x from x0 to x1 in the time interval [t0, t1:J tf and only the equation (14) has a solution A0 such
It is possible, under our assumptions, for x to be able to go from x0 to x1 but not have an optimal control existing if Ng
L"oJ
=
.
We shall give a simple example that illustrates this. It shall also serve to illustrate our method. satisfying (14), is inconsistent in
Example 9.5.1 matrices. The process is then simply x = u, and the cost is 1
x=[x1,x2J1,
u=[u1,u2]T.
192
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
The system (4) becomes 11 10
0
LO
0
I
0]IA1 10 1 01121 10] 0 —1 lix 1=101. 01 lxi +10 oJLuJ LI 0 QJLuJ LOJ
(15)
Since B is invertible, we may take p =0 in (pA + B) - L• Now
B1=I0
0
Li
0
QI
01-' fo
[0 i
—I
=11
I
—Q]
0
0
Lo —i
0
Multiplying (15) by B-' gives
Q 0112] 121
10 1
0
LO
—I
OIlxl+IxI=0. OJLuJ
(16)
LuJ
It is straightforward then, to get that the solutions to (15) are given by
110
121
0
Q
(17)
Q 011x01. 0 OJLu0J
LuJ \L0 —Q 0J JL—Q It is clear that for any x0,x1 there exists a control u sending x0 to x1. But for scalar c. Thus in
the x in (17) only takes on values of the form
order for an optimal control to exist, x0,
must be of the form
A look at the power series for the exponential in (17) shows that
121
lx
1
cosh(t—t0)Q+(I—Q)
1=1 —sinh(t—t0)Q
LuJ
L — cosh(t
—
t0)Q + Q
0] cosh(t—t0)Q+(I—Q) 0
—sinh(t--t0)Q — sinh(t
—
t0)Q
I
Q0 0
Q
L—Q °
OiLuoJ
— sinh(t — r0)Q cosh(t — t0)Q 01 cosh(t—t0)Q —sinh(t—t0)Q 01 =I cosh(t — t0)Q + Q — sinh(t — t0)Q 0J Lu0 L—
Ifx0
and x,
([o
Q
we see that t = t0 gives u= Q20. Since
0])we must have
then u0 =[— = [k], and
APPLICATIONS OF THE DRAZIN INVERSE
193
Letting t = t, gives
c, =
— sinh(t1
+ cosh(t1 — t0)c0.
—
we have
Solving (18) for
— t0){c,
=
(18)
—
cosh(t,
—
t0)c0}/sinh(t1
—
—
sinh(t —
0
L
as the optimal control. x can also be easily solved for if desired. We have arrived then at the following procedure for solving the original problem. Given x0, x1 determine whether it is possible to go from x0 to x1 with an optimal control by solving (if possible) (14) for such that a0
If
LX0J
is found, use the bottom half of (12) for x if x is
Use (12) and (3) to get the optimal control u. In working a given problem, it is sometimes simpler to solve (4) directly using the techniques used in deriving the formulas (12) and (13) as done in this example, rather than try to use the formulas directly. At this point, an obvious question is What does QM being invertible mean?' That is, 'What is the physical significance of assuming the The answer itself is easily comprehended. The proof, invertibility of however, requires some knowledge about Laplace transforms. The reader without an understanding of Laplace transforms and analytic functions is encouraged to read the statement of the theorems. From (10) and (11) we have needed.
p_A][_B] (19)
Proposition 9.5.2
Proof
If Q is invertible, then
is almost always invertible.
jim
matrices form an open set. U If Q is invertible, then it is obvious from (4) that u can be solved for in terms of x, A. Theorem 3 shows that this can happen even when Q is not invertible. We note without proof the following proposition.
Proposition 9.5.3
1fF, G are positive semi-definite r x r matrices, then F + G is invertible and only N(G) = {O}. Of course, is invertible almost always for real p ii and only if it is almost always invertible for complex p. Let p = is where s is real. Then (19) x H( — is
+ A)- 'B. From Proposition 3 we have that
is invertible almost
194 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
always if and only if (0) =
N(Q)rN(B*(
N(H112( — is + A)- 'B) = real s. Thus we have proven that:
+ A) I*H(_ is + A) 'B)= N(H( — is + A)- 'B) for almost all —
is
Theorem 9.5.4
is invertible for almost all and only if N(Q)n N(H( — is + A)- 'B) = (0) for almost every real s.
We need a technical result on analytic (1)-inverses before proceeding.
Theorem 9.5.5
Suppose that A(•) is an m x n matrix valued function such that A.Iz) is a fraction of polynomials for all i and). Suppose also that N(A(z)) is non-trivial for all z in the domain of A(). Then for any real number w >0, there exists an m x n matrix valued function B() such that
is a fraction of polynomials. (ii) R(B(z)) = N(A(z)) for almost all z. (iii) The poles of B are integral multiples of cvi, cv> 0, are simple, and (iv) IIB(z)II (i)
Proof Suppose that A() is an m x n matrix valued function such that is a fraction of polynomials for all i and j. Suppose also that N(A(z)) is non-trivial for all z in the domain of A(•). Let X be an n x m matrix of Then AXA = A is a consistent linear system of at most mn unknowns equations in mn unknowns. Denote this new system by (s). Since the coefficients of s are fractions of polynomials, there exists a real number K such that all minors of(s) are identically zero, or identically non-zero, for z K. Thus s can be solved by row operations (non-uniquely) to give a F(•) such that for z K; AFA = A, the entries of F(z) are fractions of polynomials in z, rank (F(z)) is constant, and rank (F(z)) is the maximum possible (dim N(A(z))). Note that is a fraction of
polynomials for alliandj. Let z,,...,zq be the poles ofFA. Let rj,...,rq denote their multiplicities. Let r0 be such that FA = CX z I'°) as z —, co.
Seta=r0+r,+... +rq+3.Define 8(z)
I
{ fl (z — zj' fl (z — ipw)' }(I — F(z)A(z)).
= j=1
p=l Then B clearly satisfies (I), (iii) and (iv). Since (ii) holds for I z K, it holds
for almost all z by analytic continuation. • We can now prove the following.
Theorem 9.5.6
The following are equivalent.
(a) There exists an x0,x1 for which optimal controls exists, but are not unique.
(b) There is a trajectory from zero to zero of zero cost with non-zero control.
(c) Q is not invertible for all p.
APPLICATIONS OF THE DRAZIN INVERSE
195
Proof Clearly (b) (a) since J[O, 0] = 0. To see that (a) (b). let (x, ii), (i. ii) be two optimal solutions from x0 to x,. Then there exists 2.2 so that (2. x, ii) and (2. x, ii)
satisfy (3). Thus (2 — 2, x — x, ii — ii) satisfies (3) and hence is optimal by Theorem 1. But (x — i)(t0) = (x — i)(t,) =0 and ii — ii is not identically zero. That J[x — 1, ii — ii] = 0 follows from (8). Suppose now that (b) holds so that there exists x, ii such that J[x, UI =0, x(t0) = O,x(t1) = 0, and u is non-zero. Since J[x.u] = 0 it is clear from (2) that Hx = 0 and Qu = 0. Extend xii periodically to [— x, x] and replace t by t — t0. Call the new functions î, ü. Thus Hx =0, Qü =0, and
i=Ai+Bü,t#n(t, —t0),n=O, ±1, ±2,... Sinceüisboundedand
sectionally continuous on finite intervals, x is continuous, and x is of exponential order, we can take Laplace transforms to get HL[i] =0,
QL[u] =0 and L[i] = (s — A)- 'BL[ü]. Thus L[ü](s)€N(Q)r N(H(s — A) 'B) for all s in some right half plane. By Theorem 4, we have is not invertible for all p. is not invertible for all p. From the proof of Conversely, suppose that for p = it. t real. N(H(p — A) 'B) = Theorem 4 we have
Thus N(Q)r' N(H(p — A) 'B) = N(QM) for almost all p. Now applying =0, and with co = 2ir/(t, — yields a B, such that Theorem 5 to =0. Let be satisfies (iii). (iv). But then QB =0, and H(p — A) Let by vector such that is not identically zero. Denote i(s) = (s — A) 'B(s). Then we have that
Hi(s) = O,Q#(s) = 0, and i(s) = (s — A)
(20)
Let x be the inverse Laplace transform of i, u the inverse Laplace transform of From (20) and (iv) we have Hi =0, Qü =0, x = Ai + Bü, i(0) =0, and ü(0) =0. [29, p. 1841. Furthermore, a is non-zero. Finally, we get since the poles of i(s) were simple and multiples of 2ni/(t, — that i,11 are periodic with period (, — [29, p. 188]. Replace i,ü by x = i(t + t0),u = ti(t + ta). Then x(t0) = x(z,) = 0, J[x.u] = 0, and x =
•
Ax + Bu. Thus (c) = (b). It is possible to have invertible almost always and still have non-zero optimal trajectories of zero cost. Of course, the control u must then be zero.
Example 9.5.2 Let Q = I, A = I, B =0, H =0 in (2) and (3). Then Q is invertible for large p since Q is. Clearly x = exp(A(t — t0))x0 is a trajectory of zero cost from x0 to x, = exp(A(t, — t0))x0. But ii = 0 and J[x,u] = 0. Note also if x0 =0, then x 0. We will make no use of 'controllability', and hence will not define it. For the benefit of the reader familiar with the concept, note that the invertibility of Q is logically independent of the controllability of (2) since for any choice of A, B, setting Q = I makes invertible almost always, while setting Q = H =0 makes 0. Note also that in Example 2, the pair (A, B) was completely controllable and was invertible. However, optimal controls only existed for certain
196 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
pairs x0, x,. Thus the assumption of controllability does not seem to
simplify matters if Q, H are allowed to be singular. The method of this section can be applied, of course, to any problem which leads to a system of the form Az + Bz = f. However, the special form of the A makes most of the calculations possible since it allowed us to use Theorem 7.8.1. Any problem which leads to a system with A=
IA1 01 be solved much as was (9), provided, of course, I can L"2 UJ I
pA + B is invertible for some p. We shall now describe several such problems. The calculation of the solutions parallels those just done, so a description of the problem will suffice. (Hx, x) + (Qu, u) + J <x, a >dt where a is a vector. Then the right hand side of (4) has = (as, 0*, 0*)* instead of the zero vector. Theorem 2.3 can be used to solve this non-homogeneous system to get
For example, suppose that the cost is given by
Q = ADe ADésJeADbsads +
+e
(I —
A"AeADAq.
The integral can be evaluated by using Theorem 2.4. For this problem, it is important to know whether or not the cost is positive. Another variation on the same type of problem is process (1) with the
+ 2 +
cost functional J[x,u] =
IH C*1
]
L
is
10
positive semi-definite [1, pp. 461—463]. In this case the system
to be solved is 1
10
0 01111 1
IA*
OIIxI÷I0
Lo 0 OJLuJ
LB*
H —A C
C*1 IA1
rol
QJLuJ
L0J
—B lIxI=I0I.
Solution proceeds almost exactly as when C =0, though QM has a slightly different form.
The analysis developed here can be also applied with little change to the following non-optimal control problem. Given output y, state vector x, and process i = Ax + Bu, find a control u such that y = Cx + Du. The appropriate system then is
II Olixi I—A —B1Ix1 101 Lo
c
D]Lu][yj
.
If y and u are the same size vectors, then (21) is the non-homogeneous form of equation (9). It may be solved by our techniques under the = D + C(p — A) 'B is invertible. assumption that
(21)
APPLICATIONS OF THE DRAZIN INVERSE
197
One frequently does not want to have D a square matrix in (21). If D is not square, then (21) can often be solved using Theorems 7.10.16 and 7.10.20.
Discrete control systems arise both as discretized versions of continuous systems and as systems of independent interest. By using the results of Section 3 some discrete problems can be handled in much the same way as the continuous ones were handled. For example, consider the following:
Discrete control problem Let A, B, Q, H be matrices of sizes n x n, n x m, in x in, and n x n respectively. Assume that Q. H are positive semi-definite. Let N be a fixed integer. Given the process (22)
the cost
J[x,u] =
(Hx1,x1)
+(Qu1,uJ,
(23)
1=0
and the initial position x0, find the control sequence which minimizes the u= cost (23). Here x = Note that the terminal position is not specified whereas it was in the continuous problems considered earlier.
Theorem 9.5.7 The Discrete Control Problem has a solution {x1}, satisfy and only there exists IAJ such that the sequences
IH _A* 0
LO
for i =0, 1,... , N
Proof
01
r— A
0
0 0
I
B] lxii
101
OIAj=I0I
Q]LuJ L with x0 given and AN =0, UN =0.
OJLu1+1J — 1,
—
(24)
LOJ
Since UNOflIY appears in the cost and does not effect the {x1}, UN
=0. Take UN =0. To see the
may be taken to be any vector such that N—i
necessity of(24), consider J[x.u] +
AN=0.Then
(A1,x1.. — Ax1 — Bu,) and set 1=0
a(x1 —Axo—Buo,...,xN—AxN_i a(x1,... ,xN)
BuN_l)_ 1
—
where (z1, ... ,zN) is to be considered as a list of the n entries of z1 then the n entries of z2, etc. Thus one gets by the usual theory of Lagrange
multipliers that
Hx1_A*A1+A1...1=0,i=l,...,N_1, (25)
HxN+AN...l=O, Qu1 — B*A1 =0,
i=0,...,N—1
198 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS is
necessary. But (25) is equivalent to (24) since AN was taken equal to zero.
On the other hand if (xj, {uj, {Aj satisfy (25), then one may show, almost exactly as for Theorem 1, that
J[sx+(1 —s)*.su+(l has a minimum of s = 1. We omit the details. •
1,
For the control problem considered here, x0 can be arbitrary.
Theorem 9.5.8 Suppose that Q + B*( — pA* + I) 'pH(p — A) 'B is invertible for some scalar p (and hence for all but a finite number of 4 Then for every x0 there exists a solution to the Control Problem.
Proof Given x0, J[x, u] defines a
function on
below, it suffices to show that J[x, uJ goes to 00 as
m•
Since 112
J is bounded
does. If Q is
invertible, this is clear. Suppose then that Q is singular and — pA* + I) 1pH(jt — A)- 'B is non-singular for almost all p. Q+ = Suppose for purposes of contradiction that there exists a sequence of N-i controlsequences
{u1,},i=O....,N— l;r=O...,suchthat
but J[x,, U,] is bounded. We shall show that, in fact, {u1,) is bounded as r-. oo. Since J[x,,u] bounded, one has (Qu0,,u0,) is bounded. Hence Q112u0, is bounded. But, (AHx,,,x1,) = + Bu0,) 112 is also bounded. Then H"2Bu0, is bounded since x0, = x0 for all r. But is invertible for almost all p. so that u0, is bounded. Hence x,, is bounded. Proceeding in this manner, one is
gets
P4±111
u1,
112 is bounded. Thus J attains its minimum as desired. U
01
1=0
We can now solve the Discrete Control Problem. Let A,
Then (24) becomes for i =0, ... , N
—
= LH
A]'
I,
fA, olIz+,1 fB, B21[z1110 0]Lu1+j LB3 B4JLU1jLO
L0
—
(26)
Proposition 9.5.4 = B4 — B3HM 'B2 = Q + B*(_ pA* + I) 'H x (p — A) 'B where p is such that and (p — A) are invertible. Then
fpA,+B, L
B3
is invertible
B21
B]
and only If Q is invertible.
It is assumed from here on that (27) and pA1 + B1 are invertible.
(27)
APPLICATIONS OF THE DRAZIN INVERSE
199
Multiply (26) by the inverse of (27) to get
f
011z1+11+1 0]Lu1+1]
LM
[M
0]D1
01D
01
and
0]'
0]
(28)
= I. But
+
= 0, x0 given,
with UN = 0, FNM
o11ze1=1o1 IjLu] L0J'
zM
0 0
1]
— ')(I — Here = (WN + WPZM + ... + By Theorem 3.3, all solutions of (28) are given by —
F—
[u1] — L
0hz0
0]
—
1=
OjLuo O1FZN]
÷1
0]
[
(29)
I]Lo ]
A solution (29) will satisfy the boundary conditions if and only if
+ (—
=
= (LMNM +
u0 = ZN
(30)
— —
(32)
+ (I —
= (—
and
0=
—
Recall that PNM + becomes — M( —
-
= I. Thus
(33)
—
and
commute. Using (32), (33)
—
=0. The
—
preceding discussion is summarized in the following theorem. Suppose that QM is invertible, that N> md (Np), and x0 is specWed. Then XN are obtained by solving
Theorem 9.5.9
=0, and
(I —
The control sequence is given by u0
=
=(
x (I
and for i >0,
= — (LMNM
—
+
—
—
As mentioned earlier, one is probably better off to follow the steps in the proof of Theorem 9 rather than try to utilize the formulas.
200 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
If the process is not completely controllable, and x0 is a point that cannot be steered to the origin, then XN will be unequal to zero. A very simple example is obtained by taking A = I, B =0, and Q invertible. is invertible. In this case, of course, one would get ; = Then and u, =0 for all I. It is possible to have Q not invertible, the process not completely controllable and still be invertible and our results apply. One may take
?]asanexamPle. In applications it frequently happens that Q is invertible. Unlike the continuous control problem, the discrete problem can still give rise to a singular difference equation when Q is invertible. If Q in the Discrete Control Problem is non-singular, then = Q - lB*).. for
i=O,1,...,N—land(24)becomes 0
Az
LH
L
o
i (34)
=[:]
for i=O,1,...,N—1,and2N=0. A is invertible if and only if A is. However, there always exists a p such that pA + is invertible so that Theorem 3.3 can always be applied. The difference equation (34) has the advantage that one can work with matrices that are 2n x 2n instead of(2n + m) x (2n + m). While N Ind(A) was assumed in the statement of the theorems, the assumption is not really necessary. if N < Ind(A), one may still use Theorem 3.3 to solve (24). Note that Theorem 5.7 holds even if Q is a singular for all p.
6.
Functions of a matrix
Iff(A) is an analytic function defined on a neighbourhood of the eigenvalues of the n x n matrix A, then there exists a family of projections 1, ... ,r, such that P1P,=0 if I, and , k1—1
f(A) =
'
11m0
' "(A —
m!
(1)
= md (A —
(Equivalently, is the multiplicity of as a root of the minimal polynomial of A.) Formula (1) has been known for some time, see [49) for example. The purpose of this short section is to observe that the may be explicitly written in terms of the Drazin inverse. where
Theorem 9.6.1 Suppose that AECN X N has r distinct eigenvalues (A1 ,... '1,), that Ind(A — A,) = Ic,, and thatf(A) is analytic on a neighbourhood of the
APPLICATIONS OF THE DRAZIN INVERSE
201
Then
f(A)= (I
—
(A
—
—
A.)).
From [49] it suffices to show that
= I, and
AQ1 = Q1A, (A — A)Q1 is nilpotent,
= Q1
(3)
0.
Suppose the Jordan form of A is given by TAT' = Diag{J, , ... ,J,} + N1 with where = nilpotent. Then = Diag{Q11, ... , Q1j with Q1, =0 for I i, = I for i = I. Thus (3) holds. • The are often referred to as the idempotent component matrices of A.
Corollary 9.6.1 eM
Let A be as in Theorem 1. Then
=
—
— (A
(4)
—
—
Proof Let 1(A) = eA, B = tA in (2). Observe that the eigenvalues of tA — A1). are {tA,, ... ,tAj and for t > O(tA — tA1)D(IA — tA1) = (A — U Using Corollary 9.6.1, one may get the following version of Theorem 2.3. Theorem 9.6.2 Suppose there exists a c such that cA + B is invertible. = (cA + B) 1B, and CA = AA + ñfor all A. Suppose Let A = (cA + B)- 1A, is not invertible for some A. Let {A,,... ,2,} be the Afor which AA + B is not invertible. Then all solutions of Ax + Bx =0 are of the form CA
r
=
(—
(5)
—
m.
or equivalently,
, (—
rn
— c + ADreAut(I
—
[(At — c)A + I]D[(A1 — c)A
+ I])q (6)
where k. = md CAL, q an arbitrary vector in
Proof The general solution of Ax + Bx =0 is x = e
ADBtADAq,
q an
arbitrary vector by Theorem 10.13. Since cA + = I, we have x=e
AD(I_CA)IADAq
=
AD +cI)IADAq.
(7)
Also if A1A + B is not invertible, then A.A + is not. Hence AA + I — cA is not. Thus 21A + B is not invertible if and only if — (A1 — c) - is an eigenvalue of A. But then (— + c) is an eigenvalue of AD. Thus A,A + B is not invertible if and only if A. is an eigenvalue of ci — Both (4) and (6)
now follow from (4), (7) and a little algebra. •
202 GENERALIZED IN VERSES OF LINEAR TRANSFORMATIONS
Note that if the k1 in Theorem 1 are unknown or hard to compute, one may use n in their place. It is interesting to note that while AA + and + B have the same eigenvalues (2 for which det(AA + B) = 0), it is the algebraic multiplicity of the eigenvalue in the pencil 2A + that is important and not the multiplicity in 2A + B. In some sense, Ax + =0 is a more natural
equation to consider than Ax + Bx =0. It is possible to get formulas like (5), (6) using inverses other than the Drazin. However, they tend to often be less satisfying, either because they apply to more restrictive cases, introduce extraneous solutions, or are more cumbersome. For example, one may prove the following corollary.
Corollary 9.6.2 Let A, B be as in Theorem 1. Then all solutions of Ax + Bx = 0 are of the form m
=
(—
—
m.
(8)
The proof of Corollary 2 is left to the exercises. Formula (8) has several disadvantages in comparison to (5) or (6). First At and CA do not necessarily commute. Secondly, in (5) or (6) one has q = x(O) while in (8) one needs to find q, such that —
= q, which may be a non-trivial task.
1=1
7.
Weak Drazin inverses
The preceding sections have given several applications of the Drazin inverse.
It can, however, be difficult to compute the Drazin inverse. One way to lessen this latter problem is to look for a generalized inverse that would play much the same role for AD as the (1)-inverses play for At. One would not expect such an inverse to be unique. It should, at least in some cases of interest, be easier to compute. It should also be usable as a replacement for in many of the applications. Finally, it should have additional applications of its own. Consider the difference equation (1)
From Section 3 we know that all solutions of (1) are o( the form x, = It is the fact that the Drazin inverse solves (I) that helps explain its applications to differential equations in Section 2. We shall define an inverse so that it solves (1) when (1) is consistent. Note that in (1), we have x,, = for I 0. Thus if our inverse is to always solve (1) it the must send R(Ak), k = Ind(A), onto itself and have its restriction to
APPLICATIONS OF THE DRAZIN INVERSE
203
to R(A"). That is. it provides the unique as the inverse of A solution to Ax = b, XER(Ak), when bE R(A"). same
Definition 9.7.1
"and k = Ind(A). Then B is a weak
that Drazin inverse, denoted Ad, Suppose
(d)
B is called a projective weak Drazin inverse of A if B satisfies (d) and
(p) R(BA) = R(AAD). B is called a commuting weak Drazin inverse of A (1 B satisfies (d) and
(c) AB=BA. B is called a minimal rank weak Drazin inverse of A B satisfies (d) and (m) rank(B) = rank(AD).
Definition 9.7.2
An (ia,... , i,,)-int'erse of A is a matrix B satisfying the
properties listed in the m-tu pie. Here i, E { 1,2,3,4. d, m, c. p }. The integers 1, 2, 3, 4 represent the usual defining relations of the Moore—Penrose inverse. Properties d, m, c, p are as in Definition 1. We shall only be concerned with properties { 1,2. m, d, c. p }. Note that
they are all invariant under a simultaneous similarity of A and B. Also 'B = Ak, and get note that one could define a right weak (d)-inverse by a theory analogous to that developed here. Theorem 9.7.1 Suppose that AEC" non-singular matrix such that TAT...I
X
k
= md A. Suppose TEC" XI?
C non-singular. Nk =0.
=
is
a
(2)
Then B is a (d)-inuerse of A if and only if
TBT'
X,Yarbitrary.
(3)
B is an (m,d)-inverse for A if and only if
Ic.-1 xl
TBT -1
Lo
B is a (p. d)-inverse
TBT'
=
oj'
Xarburary.
(4)
of A if and only if X arbitrary, YN =0.
(5)
B is a (c,d)-inverse of A ifand only if
rc-1 o' YNNY. Tff1'=Lo y]'
(6)
204 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
B is a (I, d).inverse of A
(land only if
TBT'
XN=O,Na(1).inverseofN.
(7)
B is a (2,d)-inverse of A (land only :f
YNY=Y,XNY=O.
(8)
If TAT'
is nilpotent, then (3)—(8) are to be interpreted as the (2,2)-block in the matrix. If A is invertible, then all reduce to A -
Proof Let A be written as in (2). That each of (3)—(8) are the required types of inverses is a straight-forward verification. Suppose then that B is a (d)-inverse of A. The case when A is nilpotent or invertible is trivial, so assume that A is neither nilpotent nor invertible. Since B leaves R(Ak)
invariant, we have TBT1
(d) gives only ZC"' =
Izxl
= Lo Hence
for some Z, X, Y. Substituting into
Z = C1 and (3) follows. (4) is clear.
Assume now that B satisfies (3). If B is a (p, d)-inverse then
iT' XN1\_ (F '
RkLO
° o
Thus (5) follows. If B is a (c, d)-inverse of A, then
r' cxl Lo
NY]
XN Lo YN
1'
But then CkX = XNk =0 and (6) follows. Similarly, (7) and (8) follow from (3) and the definition of properties { 1,2). Note that any number I md (A) can be used in place of k in (d). In and Ak+ lAd Ak. Although AdA and AAd are not general, AkA4A always projections, both are the identity on R(Ak). From (3), (4), and (6);
Corollary 9.7.1
is the unique (p.c. d')-inverse of A. AD is also a
(2, p, c, d)-inverse and is the unique (2, c, d)-inverse of A by definition.
Corollary 9.7.2
Suppose that Ind(A) =
1.
Then
(i) B is a (1,d)-inverse of A (land only (lB is a (d)-inverse, and (ii) B is a (2,d)-inverse of A (land only (1 B is an (m,d)-inverse.
Corollary 9.7.3
Suppose that Ind(A)
2. Then there are no (1,c,d)-
inverses or (1, p. 6)-inverses.
Proof Suppose that md (A) 2 and B is a (1, c, d)-inverse of A. Then by =0 (3), (6), (7) we have X =0, NYN = N, and NY = YN. But then
APPLICATIONS OF THE DRAZIN INVERSE 205
which is a contradiction. If B is a (1, p. d)-inverse we have by (3), (5), (7)
that X =0, Y =0, and NON = N which is a contradiction. U Most of the (d)-inverses are not spectral in the sense of [40] since no assumptions have been placed on N(A), N(Ad). However;
Corollary 9.7.4
The operation of taking (m,d)-inverses has the spectral mapping property. That is. 2 is a non-zero eigenvalue for A (land only if 1/2 is a non-zero eigenvalue for the (m, d)-inverse B. Furthermore, the eigenspaces for 2 and 1/2 are the same. Both A and B either have a zero eigenvalue or are invertible. The zero eigenspaces need not be the same.
Note that if 2 is a non-zero elgenvalue of A, then 1/1 is an eigenvalue of any (d)-inverse of A.
Corollary 9.7.5 JfB1,... , B, are (d)-inverses of A, then B1B2 ... B, is a is a (d')-inverse of Atm. (d)-inverse of A'. In particular, Corollary 5 is not true for (1)-inverses. For B
is
=
a (1,2)-
but B2 = 0 and hence B2 is not a (1)-inverse of
inverse of A
= A2 = A. This is not surprising, for (A')2 may not be even a (1)-inverse of A2.
Theorem 9.7.2
Ind(A) = k. Then
Suppose that
(i) (AD + Z(I — ADA)IZ€Cn is the set of all (d)-inverses of A, ADA {AD ADAZ(I — (ii) is the set of all (m,d)-inverses of A, + — ADA) (AD ZA = AZ) is the set of all (c, d)-inverses of A, (iii) + Z(I X
and
(iv) {AD + (I — ADA)[A(I — ADA)] 1(1 — ADA)[A(I ADA) =0) is the set of all (1, d)-inverses of A.
—
ADA)]A(I —
Proof (i)—(iv) follow from Theorem 1. We have omitted the (p. d)- and (2, d)-inverses since they are about as appealing as (iv). U Just as it is possible to calculate A' given an A, one may calculate A" from any Ad.
Jfk = lnd(A), then AD = (Ad)I+ 'A'for any I k. The next two results are the weak Drazin equivalents of Theorem 7.8.1.
Corollary 9.7.6
Theorem 9.7.3
Suppose that
A
where C is
= invertible. Then all (d)-inverses of A are given by Ad
IC1 _C.1DEd+Z(I_EàE)]. Lo
Ed
j'
206 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS Ed any (t)-inverse of E, Ea an (m, d)-inverse of E, Z an arbitrary
matrix of the
correct size.
Proof Suppose A Then Ak
with C invertible. Let k = Ind(A) = Ind(E).
=
ic' e1 = Lo
II
.
.
01.
Ek]' where Ois some matrix. Now the range of[0 0]
in R(Ak). Hence AD and any Ad agree on it. Thus (10)
Now suppose (10) is a (d)-inverse of A. Then AAdAk = A'. Hence
fC D1ICi Lo EJLO
XI1ICA 811C' & X2J1.o
II
CX1+DX211C' ThusLo EX2 ]Lo &+(CX1+DX2)E'=O, EX2E'=E'. If AdAk +2
E'f[0
E'
E'][0
E']'°'
01 ic' el
(11) (12)
= A' is to hold, one must have X2 a (d)-inverse of E. Let X2 = E'
for some (d)-inverse of E. Then (12) holds. Now (11) becomes X1E' = — C IDEdEk. Let Eà be an (m,d)-inverse of E. Then EaE is a projection onto R(E'). Hence X1 must be of the form — C IDEa + Z(I — EaE) and (9) follows. To see that (9) defines a (d)-inverse of A is a direct computation. = A' implies ABA' = A', the It should be pointed out that while two conditions are not equivalent.
Corollary 9.7.7
Suppose there exists an invertible T such that (13)
]T is an (m,d)-inverse
with C invertible and N nilpotent. Then T
for A. If one wanted AD from (13) it would be given by the more complicated
expressionTADTI
rc-i
=Lo
/k-i
C'XN'
Although for block triangular matrices it is easier to compute a weak Drazin than a Drazin inverse, in practice one frequently does not have a block triangular matrix to begin with. We now give two results which are the weak Drazin analogues of Algorithm 7.6.1.
Theorem 9.7.4
Suppose that
and that p(x) = x'(c0 + ... + c,xl,
APPLICATIONS OF THE DRAZIN INVERSE 207
0, is the characteristic (or minimal) polynomial of A.
c0
Then
Ad=__(c11+...+c,Ar_l)
(14)
is a (cd)-inverse of A. If(14) is not invertible, then Ad + (I invertible (c,d)-inverse of A.
—
AdA) is an
Proof Since p(A) =0, we have (c01 + ... + c,A')A1 =0. Hence (c11 + ... + c,A' ')A'4' = — c0A'. Since Ind(A) I, we have that (14) is a (d)-inverse. It is commuting since it is a polynomial in A. Now let A be as in (2). Then since Ad is a (c, d)-inverse it is in the form (6). But then
y= is
—
!(c11 + c2N + ... + c,N' c0
1)•
Ifc1 #0, then V is invertible since N
nilpotent and we are done. Suppose that c1 =0, then
is nilpotent. That Ad + (I — AdA) is a (c, d)-inverse
follows from the fact that Ad Note that Theorem 4 requires no information on eigenvalues or their multiplicities to calculate a (C, d)-inverse. If A has rational entries, (14) would provide an exact answer if exact arithmetic were used. Theorem 4 suggests that a variant of the Souriau — Frame algorithm could be used to compute (c, d)-inverses. In fact, the algorithm goes through almost unaltered.
Theorem 9.7.5
Suppose that A
and
X
Let B0 = I. For j = 1,2, ...
, n,
let
If p5#O, but
=0, then (15)
is a (c,d)-inverse. In fact, (14) and (15) are the same matrix.
Proof Let k = Ind(A). Observe that = — — — ... — If r is the smallest integer such that B, =0 and s is the largest such that #0, then Ind(A) = r — s. Since B, =0, we have A'=p1A'1 — ... —p5A'5=O. Hence,
A'5 = —(A' — p1A'
... — p5
1)
=
!(As—' _P1A5_2_..._P,_,I)Ar_s41.Thatis,Ak=(!B5....i)Ak+1
desired. U
as
208 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Suppose that A, "and AB = BA. Let Ad be any (4)-inverse of A. Then AdBAAD = BAdAAD = BADAAD = ADBAAD.
Lemma 9.7.1
Proof If AB = BA, then ADB = BAD. Also if A is given by (2), then
TBT' =
with B1C = CB1. Lemma 1 now follows from
Theorem 1. • As an immediate consequence of Lemma 1, one may use a (d)-inverse in many of the applications of the Drazin inverse. For example, see the next theorem.
Theorem 9.7.6
Suppose that A,BeC""'. Suppose that Ax + Bx = 0 has
unique solutions for consistent initial conditions, that is, there is a scalar c such that (cA+ B) is invertible. Let A = (cA + B) 1A, = (cA + B) 18. Let k = md (A), jfAx + Bx =0, x(O) = q, is consistent, then the solution is If Ad is an (m,d)-inverse of A, then all solutions of Ax + Bx =0 x=e are of the form x = qeC", and the space of consistent initial conditions is R(AAd) = R(AI2A).
Note in Theorem 6 that AAd need not equal AdA even if is an (m, d)inverse of A. Weak Drazin inverses can also be used in the theory of Markov chains. For example, the next result follows from the results of Chapter 8.
If T is the transition matrix of an rn-state ergodic chain and if A = I — T, then the rows of I — AdA are all equal to the unique fixed probability vector ofTfor any (4)-inverse of A. Theorem 9.7.8
8.
Exercises
Exercises 1—6 provide a generalization of some of the results in Section 2.
Proofs may be found in [18]. 1. Suppose that A, B are m x n matrices. Let ()° denote a (2)-inverse. Show that the following are equivalent: (i) (AA + B)°A, + B)°B commute. (ii) (AA + B)(AA + B)°A[I —v.A + B)°(AA + B)] = 0. (iii) + B)(AA + B)°B{I — (AA + + B)] =0.
+ B)tA, 2. Prove that if A,B are hermitian, then + B)tB commute if and only if there exists a I such that N(XA + B) = N(A) N(B). Furthermore, if I exists, then (IA + B)tA, (IA + B)tB commute. 3. Prove that if A, BeC" 'are such that one is EP and the other is positive semi-definite, then there exists A such that AA + B is invertible if and only if N(A) N(S) = {0). 4. Prove that if A, Be C" Xli are such that one is EP and one is positive semi-definite, then there exists A such that N(AA ÷ B) = N(A) rs N(S).
APPLICATIONS OF THE DRAZIN INVERSE 209 5.
Suppose that A, are such that N(A) N(B) reduces both A and B. Suppose also that there exists a 2 such that N(AA + B) = N(A) N(B). Prove that when Ax + Bx =1, f n-times continuously differentiable is consistent if and only if + B) for all t, that is, (AA + B) x B)tf = f. And that if it is consistent, then all solutions are of the + form X
= + [(AA + B)D(AA + B) — ADarADAq
+e
—
+ [I — (2A + B)D(A.A + B)]g
where A = (AA + B)DA, B = (A.A + B)DB, I = (AA + q is an arbitary vector, g an arbitrary vector valued function, and k = md (A). 6. Prove that if A, B are EP and one is positive semi-definite, 2 as in Exercise 4, then all solutions of Ax + Bx = f are in the form given in Exercise 5. 7. Derive formula (8) in Corollary 9.6.2. 8. Derive an expression for the consistent set of initial conditions for Ax + Bx = f when f is n-times differentiable and AA + B is onto. 9. Verify that Corollary 9.7.6 is true. 10. Fill in the details in the proof of Theorem 9.5.7. + 'B = A". 11. If A E and k = md (A), define a right weak Drazin by Develop the right equivalent of Theorems 1, 2, 3 and their corollaries. 12. Solve Ax(t) + Bx(t) = b, A, B as in Example 2.2, b = [1 20]*. X
Answer: x1(t) =
—
x2(r) =
—
x3(t) = 13.
+ 2x3(0)) —
—
+
+ 2x3(0)) — + 2x3(0)) —
—
—
—
+ 2t —
t
Let T be a matrix of the form (1). Assume each P, > 0(If any P, = 0, Thus, we then there would never be anyone in the age interval A, agree to truncate the matrix T just before the first zero survival probability.) The matrix T is non-singular if and only if bm (the last birth rate) is non-zero. Show that the characteristic equation for T is
O=xm —b,x"''
—p1p2b3x"3 — Pm_2bm.- ,)x —(p,p2 ... ibm)•
10
Continuity of the generalized inverse
1.
Introduction
Consider the following statement: X (A) If is a sequence of matrices and C"' converges to an invertible matrix A, then for large enough j, A, is invertible and A
to its obvious theoretical interest, statement (A) has practical computational content. First, if we have a sequence of 'nice' matrices which gets close to A, it tells us that gets close to A'. Thus approximation methods might be of use in computing Secondly, statement (A) gives us information on how sensitive the inverse of A is to terrors' in determining A. It tells us that if our error in determining A was 'small', then the error resulting in A1 due to the error in A will also be 'small'. This chapter will determine to what extent statement (A) is true for the Moore—Penrose and Drazin generalized inverses. But first, we must discuss what we mean by 'near', 'small', and 'converges to'.
2. Matrix norms In linear algebra the most common way of telling when things are close is by the use of norms. Norms are to vectors what absolute value is to numbers.
Definition 10.2.1. A function p sending a vector space V into the positive reals is called a norm all u,vE V and
(I) p(u)=Oiffu=O. and (iii) p(u + v) p(u) + p(v) (ii)
(triangle inequality).
We will usually denote p(u) by u
CONTINUITY OF THE GENERALIZED INVERSE
211
There are many different norms that can be put on C". If uEC" and u has coordinates (u1, ... , un), then the sup norm of u is given by sup 1
The p-norm of u
is
given by 1/p
forp1.
The function u hf,, is not a norm for 0 p < since it fails to satisfy (iii). The norm I! ii 112 is the ordinary Euclidean norm of u, that is, u 112 is the geometric length of u. We are using the term norm a little loosely. To be precise we would have to say that is actually a family of norms, one for each C". However, to avoid unenlightening verbage we shall continue to talk of the norm the norm etc. are isomorphic, as a vector space, to C'"". The m x n matrices, C'" Thus C'" can be equipped with any of the above norms. However, in AB A liii B whenever working out estimates it is extremely helpful 1
AB is defined. A norm for which AB hAil IIBII for called a matrix norm. Not all norms are matrix norms. Example 10.2.1 the
If A =
of C'"" applied to
then define A II = max 1a11I. This is just
Now let
A = B = 1, but AB = 2. Thus this norm is not a matrix norm. There is a standard way to develop a matrix norm from a vector norm. is a norm on C' for all r. Define hA IL5 by Suppose that and
A 1105 = sup (II Au iii: ueC", u hIS = I). It is possible to generalize this
definition by using a different norm in C'" to 'measure' Au than the one used in C" to 'measure' u. However, in working problems it is usually We will not need easier to use a fixed vector norm such as or the more general definition. There is another formulation of II Proposition 10.2.1 Suppose that A e C'" Xfl and is a norm on C'" and Thus IA IL5 = I Au KIIuII5 for every ueC"}. The proof of Proposition 1 is left to the exercises. If A is thought of as a linear transformation from C" to C'", where C" and C'" are equipped with the norm then A L is the norm that is usually used. A L is also referred to as the operator norm of A relative to hence the subscript os. Conversely, if is a matrix norm, then by identifying C' and C' 'it is a matrix norm we have induces a norm on C', say IL. Since
IIAuIL hAil hulL.
212 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
If(1) occurs for a pair of norms and on C" and Ctm. we on By Proposition 1 we say that consistent with the vector norm know that is consistent with IL if and only if II A A We pause briefly to recall the definition of a limit. ii
Definition 10.2.2
Suppose that is a sequence of m x n matrices and A or lim Then = is a norm on C'" converges to A, (written XsI
ii
A)
j
every real number c> 0, there exists a real number i such A fi then ii Notice that the definition of convergence seems to depend on the norm
used. However, in finite dimensional spaces there is no difficulty.
Theorem 10.2.1
Suppose that dime,jsional vector space V. Then if
and and
f, are two norms on afinite Off, are equivalent.
That
is,
there exist constants, k, I such that k u u for all U E V. u Theorem 1 is a standard result in most introductory courses in linear
algebra. A proof may be found, for example, in [49]. Theorem 1 tells us that if A1 A with regard to one norm, then A1 A with regard to any norm. It is worth noting that A1 —' A if and only if the entries of converge to the corresponding entries of A. To further develop this circle of ideas, and for future reference, let us see what form Theorem 1 takes for the norms we have been looking at. Recall that an inequality is called sharp if it cannot be improved by multiplying one side by a scalar. Theorem 10.2.2 Suppose that ueC" and that p. q are two real numbers greater or equal to one. Then (1)
(ii)
iii
huh0 u
n—
(iv) n
— hi
ii
u
II u II q
if p
u
q
n
—
1/q
u
for all p, q
1.
Furthermore (i) and (ii) are sharp.
Proof (iv) follows from (i) and (ii) while (ii) is merely a rewriting of (i). To prove (i) assume uEC" and note that
/.,
In If u = e1, then ones,
then hull
u
In
\1/p
i=J
\lIp
(Z
iiuhh0 = max 1u11
)P)
(max j
(max I u.f)(
/)
=
\Ifp
li')
=
u JI
i=1
=
u
= 1 while ff
=
1
so u
=
u
Thus
is sharp. Of u consists of n is sharp.
CONTINUITY OF THE GENERALIZED INVERSE
213
observe that from (i)and (ii) we have " lUll4. The statement that II,, U U if 1 p
u Ii,,
ii
u
Ii
shows that Jensen's inequality is sharp. U We now turn to the problem of determining the matrix norms We begin with IlIL. Suppose that A = a
Au
= sup
= suP{
sup {
j=1
J
*
K
u
}
{
l
and
u, }
iai,i}
±
(2)
Thus
(3)
= and equality To get equality in (3) we need a vector u such that u occurs in (2). To get equality in the second inequality of (2) we can take Iu,l = 1, 1 j n. The first inequality of(2) will be an equality if 1
a.,uj =
u,l for that row of A for which
1a1,i is
maximum. Denote
1
this row by k. Then define u
if
as
follows,
0
u IL. This and (3) gives us the
u
following.
i=
Proposition 10.2.2
1
and A
=
then
Sup
{
a common formula. However, in the special case p = or p = 00, fl is reasonably calculable. Since p = 1,2,00 are the most common of the p-norms used this is not as bad as it sounds. We leave the derivation of the 1101 norm to the exercises. 1
Proposition 10.2.3 If A eCTM
then A L = max
{
214 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
The Euclidean norm 11112 is of much interest. Suppose that AeCTM
Xn
and
then Au
= (Au, Au) = (A*Au, u).
(4)
Since A*A is seif-adjoint, every vector UEC" can be written as u =
v, where v, satisfy (AA)v1 = for a real number 2,. Let be the largest eigenvalue of A*A with associated eigenvector u,. Then from (4), Thus we have and Proposition 10.2.4.
Proposition 10.2.4
then if A 1102 = 21/2 where 2 is the largest
eigenvalue of ABA. That is, if All02 is the largest singular value of A. In addition to and i,,, there are other matrix norms which can be used either to estimate and or in their own right. Two
common ones are given in the exercises. 3. Matrix norms and invertibility
In order to understand the convergence properties of the generalized inverse we shall briefly review the situation for invertible matrices. Throughout this section will denote any matrix norm on xl, such that lI'lI =1.Theassumptionthat 11111 =ldoesnotruleoutanyofthe matrix norms discussed in Section 2 and simplifies our formulas. XII
Proposition 10.3.1
Suppose that A e CNXII which is consistent with a vector norm
is a matrix norm on and IL. Then for every eigenvalue
AofA,I2l hAil. The proof of Proposition 1 is straightforward and is left to the exercises. Note that the vector norm of does not appear explicitly in Proposition 1. We will prove statement (A) of the introduction and develop some norm estimates for A — B The next proposition
is basic.
Proposition 10.3.2 11(1 — A)1
Proof Let B =
If hA if <1, then (I — A) is invertible and
1 —liAii
A if
a0
<1.
LetS= 11=0
Thus B = (I
—
A)-1. The estimate for if B follows from the
representation for B, 1IB1I
=
E
—
IA
•
CONTINUITY OF THE GENERALIZED INVERSE
215
"and lii — A < 1 then A is invertible and
Corollary 10.3.1
H
1-ll-All If — A < 1, then A = I — (I — A) and the result follows from Proposition 2. • Proof
H
The next two results are also basic to this section. We begin by
establishing that if A is invertible and B is close to A, then B is invertible. Alternatively we can show that if B is small then A + B is invertible. We choose the latter. Now (A + B) = A(I + A - 'B) where we are assuming that A is invertible. By Proposition 2, (I + A - 'B) will be invertible if A A 'B <1. A sufficient condition for this is clearly II B < A -'liii B
II
A'—(A+B)' =[A1(A+B)—IJ(A+By' =[I+A'B—I](A+BY' =A'B(A+ B)-' =A'B[A(I +A'B)]'
=A'B[I+A1B]'A'.
Thus 1lA'
—(A + B)—'
IIB1I 11A' 112 II(' + A'B)'
II
l1A' 211B11 The second inequality follows from Proposition 2 B ll• — A
while the third follows from 11A'BII the following result. Theorem 10.3.1 then (A
II
A
+ — 1
Suppose that AeC" B) is invertible. Furthermore — (A
+ B) 'II
1lA
X
11111 BlI <1.
We have proven
B II
is invertible.
<1/Il A'
1A' 2 Bf —
A
—
(2)
11111 B
Corollary 10.3.2
!fAeCTM is invertible and { A1 —' A, then for large enough j, exists and
c Ctm
such that
A
A and apply Theorem 1. U
Example 10.3.1
Then A
= 1/2 <
1.
But I — A is not invertible. Thus Proposition 2
was a matrix norm. depends on the fact The estimates given in Proposition 2, Corollary 1, and Theorem 1 are all sharp as can be seen from the scalar case.
216 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Continuity of the Moore—Penrose generalized inverse
4.
We now wish to see to what extent the results of Section 3 can be extended
to the generalized inverse. The first thing to notice is that statement (A) of Section 1 is not true for the generalized inverse. Example 10.4.1 while A'
?i.]andA does not converge to anything,
Thus A,— A but
=
much less to At. The problem that immediately presents itself is to determine necessary —, At. Surprisingly it A implies and sufficient conditions so that has a nice answer. In order to try and get some feeling for what is happening let us return A to Example 1. Rather than talking of A+ where —, 0. ía, b.1 where for simplicity we assume that aj, are Let E,
=
0 so that
all real. We will also assume that E, —,
0.
0,
—'0,
At where A
We wish to investigate when does (A +
0,
and
=
Since a, —, 0 we might as well assume that I aj I < 1 for all j.
Then
b, '
L
d,
c,
rank at least one. That is, rank (A + E,) rank (A). We shall find out later that this is typical. There are two cases to consider for a particularj. has
Case /
A + E, is invertible. In this case E
(A +
A+ such that
Caso II
A + E1=
=
1
[
—
a,+1j1.
is singular. In this case rank (A + E1) =
[ai+ 1
1)
b,
1
(1)
so there exists
'][i
Thus (A + E )t — —
1
(1 +
Faj + 1
(1 + a1)2 + c)L
b1
(2)
CONTINUITY OF THE GENERALIZED INVERSE
217
Now suppose that A + E, has rank I for j greater than some J0. Then we are in Case H Ion j0. Now = b,/(a, + 1) so that Since we a,—'O, b1—.O, c,—'O, get from (2) that (A + and as desired. Suppose however that A + does not have rank 1 for all j greater than somej0. Then there exists a subsequence Em such that rank (A + Em) = 2 and E,, —, 0 for all integers m in the subsequence. Thus a,, —.0, d,, —.0. b,, —, 0, and cm —, 0. But the (2,2) entry of(A + Em)' is 3 (a,,, + 1)dm —
which does not converge to anything much less zero.
We have then that for our particular example, the following. At if and only if there is aj0 such (B) Given we have (A + rank (A) forj j0. that rank (A + such Statement (B) turns out to be valid for any c that E, —, 0. The proof will proceed much as in the special case. After some preliminary results we shall consider different cases involving rank (A) and rank (A + Es). To begin, recall the following fact.
Fact 10.4.1 If ', then Ax
is invertible.
x
is
a matrix norm on
and
— 1•
A 10 For all xeC" we will also need the following.
Fact 10.4.2
If
is ve
any norm on a vector space V. then u +
V
The generalized inverse version of Fact 1 takes the following form.
Proposition 10.4.1 Suppose that and is a matrix norm on q, At x / for XE R(A*). p 1, q 1. Then Ax 0
Then x = AtAu At Ax so that Proof Suppose Ax flx fl/Il At as desired. • In the discussion following Example 1 we noted that in that example rank (A + rank (A). This is typical as the next proposition shows. fi
Proposition 10.4.2 Suppose that A€CM Xfl and is a matrix norm on Kg p 1, q 1. IfEe CTM and fi E < 1/fl At fi, then rank (A + E) rank (A).
Proof Supposethatrank(A)=rand hEll < 1/IIA'lI. Let {ut,...,ur}be a basis for R(A*). It suffices to show that ((A + E)u1, ... ,(A + is a linearly independent subset of R(A + E). Suppose that 0=
iz
+ E)u,; 1
218 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Then if x =
0 we get that i—I
0= E x1(A + E)e1 = Ax Ax
= Ax + Ex
(A + E) i—i
Ax
—
Ex (by Fact 2)
—
x / At x / H At
0
E liii x
—
x/
—
At
(by Proposition 1) =0.
Thus rank (A + E) rank (A). • Proposition 2 has several useful corollaries.
If A,
Corollary 10.4.1
and 1
IIABII <max{llAtli,
(1)
OBtll}'
then rank (A) = rank (B).
Proof Let E = A — B and notice that (1) implies that hA — Bli <1/il At ii and IIA—Bli
<1/hIBthl.Thusrank(A)=rank(B+E)rank(B)and
rank (B) = rank (A — E) rank (A) so that rank (A) = rank (B). As a special case of Corollary 1 we have the next corollary.
Corollary 10.4.2 If P. Q are orthogonal projectors in CA lip — QIL2 < 1, then rank(P) = rank(Q). The proof of half of statement (B) is now immediate.
Lemma 10.4.1
c
Suppose that
and(A + E)t_.At where
Then there exists aj0 such that rank(A
+
= rank(A)
JJo Pmof Suppose that
—0 and (A +
At. But (A +
-. A. Thus
... AAt = But the limit can be taken with respect to any norm by Theorem 2.1. Thus by Definition 2.2 there exists aj0 such that ifj j0, then = rank (A) forj J0 now follows from < 1. That rank (A + 0 o2 = (A +
+
—
Corollary 2. •
The rest of this section will be divided into two parts. The first will be
devoted to a proof of statement (B). The proof will be somewhat qualitative in nature. The second part will consist of a quantitative discussion of the same ideas. such that E1—i0. and {EJ) c Theorem 10.4.1 Suppose that _'At and only if rank(A + = rank(A)forj greater than Then (A + somefixedj0.
CONTINUITY OF THE GENERALIZED INVERSE
219
Proof Lemma 1 takes care of the only if part. Suppose that 0 and rank(A + = rank A = r forj j0. Now there exists unitary matrices UeCa' "', "and invertible matrix BeC' such that C = UAV
=
[
(2)
]. F12(j) F22
"
L F 21V/
(3)
Notice that rank (C + F,) = rank (A + and rank (C) = rank (A). Further= V*AtU* and (C + F)t = [U(A + = V*(A + E,)tU*, more since —, if and only if (A + E1)t _. At. For notational convenience (C +
we will omit thej in (3). We wish to get a formula for (C + F)t. Let liii = 111102• Since F—O we may assume fl F sup (II F,, fi, F,, H F2, fi, fi F22 II). Thus rank (B + F,,) = rank (B) by Proposition 2 and the fact that B is of full rank. Thus fi
IB+F,1 F,21
rankl
L
r21
I = rank (B).
22J
By Lemma 3.3.1 C + F can be written as
IB+F,1 L
F2,
I F,2 F21(B + F,,) 'F,2] — [F21(B + F1 x
Thus
+
(4)
[I + (B* ÷ by Theorem 3.3.4. Now F,1, F1 F12,F21,F22—'O since F—0. By Corollary 2,(B+ F,,)' —'B'. But 0. Similarly (B* + then (B + F, ,) + F,
x
1
(B+F,,)'-'O.ThusX,-'I, X2-IasF-'O.So and
[I (B + F, ,Y 'F,2]
[10] as F— 0.
Thus
as F —, 0 and the proof is We assumed that = and IIAII= IICII.
complete. • so
that we could assert that fi
fi
= fi
fi
220 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Theorem 1 can be proved without using Theorem 3.3.4 in several ways.
Some are modifications of our argument here. Others, such as the original one due to Penrose, are completely different. In example 1 we had rank (A) and AJ =j = A — A111. This behaviour is typical as the next result shows. Theorem 10.4.2 Then (A +
and that rank(A + E)> rank (A). E for any operator norm
Suppose that A, EEC'"
E)t
1
/
Proof Suppose that A, EEC'" x" and that rank (A + E)> rank A. Then dim N(A)> dimN(A + E). Hence there is a vector u€N(A) of norm 1 such The proof of this last fact is left to the exercises. Now that ueN(A + UER(A* + E). Hence (A + E)t(A + E)u = u or (A + E)tEu = u. But then 1 II(A +E)tEII +E)'II hEll and II(A +E)tll
shows that the inequality of Theorem 2 is sharp. An obvious consequence of Theorem 2 is that if —' 0 but rank(A + E1)> rank (A), then not only does (A + + At but (A + E,)t is not even bounded in norm. Theorem I is theoretically satisfying in the sense that it completely -. At. However, in some situations it is characterizes when (A + important to be able to estimate the difference (error) between (A + Es)' — A' and A'. The rest of this section is devoted to estimating (A + In proving Theorem 1 we used unitary matrices. Unitary matrices are especially good when working with the Euclidean operator norm 111102 for ifU* = and then UA 1102 = hA We will use the Euclidean operator norm since it allows us to use the simplified block terms of Theorem 1. We shall also use the notation of the proof of Theorem 1. Let 1
0_F F* L
Also set !P,
(B*4F* ii ''X
2'
12'
and Y'2 = [1
=
rr11,
F2,
0]. Then (C + F)t = 8 B 19 while
C' =
B' !P2. Taking the norm of both sides gives (C + F)'
—
C' x
Certain factors in (5) are
+
B—
— "i
II
(5)
liii
obvious. B-
= Ct = At and
In order to estimate the rest we will use the following lemma.
Lemma 10.4.2
IfAeC'" XII BeC'
and
denotes the Euclidean
=
1.
CONTINUITY OF THE GENERALIZED INVERSE
A
operator, norm, then
Proof of Lemma
Suppose ueC,
Ilu
=
=
+
112
221
B 112)12.
= I. Then
+
hAil2 + 11B112.
+ 18112 as desired. Notice that since
Thus
D
= D for the Euclidean operator norm we also have l[A,B]112 11A112+ 118112. S We now begin to estimate the various terms in 9, and
To simplify
matters we assume that 11B' II IF11
(B + F1
II
<1, hI(B+F11)' II
11111 F21
II
< 1,and
< 1.
(6)
These assumptions will be discussed more fully later. Now II
F12
=
1111(1 11(1
11F12h1
+B
1F1
•+. B'F11)'
II
II
(1)
118'F1, iu18h11
by Proposition 3.2 and assumptions (6). Let 11)_I. Then (7) becomes
+
1B
=
118-111(1 —
B' x
F12
(8)
F21 II.
(9)
Similarly,
+
) — lF*
II
II
that 11(1 + DD*)u Thus II(I+DD*)l11 1. Hence Observe that for any DeCtm
ii for all ueCTM.
1.
1,and
(10)
By Lemma 2 we can now conclude from (8), (9),(10) that (11)
Now
xl I + Ft1) lxi
1—
1
(12)
and —
= [X2
—
I X2(B* +
By Theorem 3.1 we calculate that
(13)
_ 222 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
—
X1
+
II(B + F11) II
— 11(8
+
+ F1
'II
II(B+F11)—'F12112
—
II
1— lI(B+F11)—1F1211
Or a2 F12 112
lII_X1lIla2IIF
(14)
112
by (8). Similarly, —
X2
a211 F21
i—
112
F21
(15)
Ill. ii
Combining (8), (10), (12) and (14) with Lemma 2 gives
F 1211112 112
2
) +IIF
'2N2
(16)
1211
1211
II
In the same manner we get that 2 2
II
2111
112
II II
2111
(17) into (5) gives us that
(C + F)t — C'
F21 a
(1 + a2 F12 112)1/2 II a211F21fl2 22+1) (1 —a2fIF21II )
1/2
1/2
a2IIF12II2
+aIIF12II( (1 —
F21 112)2
+ 1)
(18)
It follows from (2) and (3) that if we define E11 = PR(A)EPR(A.),
=
E12 = PR(A)EPN(A), PN(AS)EPR(A.), E22 = PN(A.)EPN(A),
then
(19)
I
Theorem 10.4.3 Suppose that A, and rank(A + E) = rank (A). E <1/2 where Suppose furl her that At = II 11o2 Define as in (19). Let a = At 11/(1 — hA' E11 II). Then
II(A+ E)t
_At ahIE
II hAt 11(1 +a2h1E12hh2)"2 a2hIE21hI2
1/2
) x( (1—a2hIE21hI 22+1)
+ahIE12hh At ((1
a2hIE12hI2
E21
112)2
1/2
+ 1)
.
(20)
CONTINUITY OF THE GENERALIZED INVERSE
223
Proof We need only show that II
E <1/2 implies conditions (6) and that E for I i, are satisfied. Notice that = j2. Thus 11B' lIE,, At hEll <1/2< 1 so the Iirst condition is satisfied. To see that the second and third conditions are satisfied we calculate that II
II
(B+F 11
IIBII
<
F
I
—
1/2
F
1— 1/2
11B'II 11F1111
—1
as desired. U
There are several nice features to (20). The first is that H At factors out of the right hand side. Thus the 'percentage error' 10011 (A + E)t —
At Il/Il At Ills easy to estimate. The second is that it is probably undesirable However, if is replaced by in many cases to actually compute then an estimate can be obtained. Since any number K, E., K < E 1102 1102' one could use some of the more easy to compute norms In in Section 2 to estimate E 1102 and hence to estimate E, particular, we get II
I!(A+E)t—Athl 2IIAtII IIEII(1 +(1
I
\112
(21)
for any 11E1102 lIED <
0] rio Ii 0 and B = 0
rio Example 10.4.2
Let A =
I
0
0
LO
0J
= 1h11O2
LetE= 10
1
1
0
0 0
0
We wish
0]
hEll2 =
Thus
< 1/2 and Theorem 3 can be used. Now
0.1(1 — 0.03364.
0
Lo 0
so that B= A + E. Observe that IlAtil = 0.1. Also DElI
E
11
[0
ro
At 1111
i
0
1
Note that
rl/(lo+e)
Bt=I
[C
— At 0.11647. Substituting into (21) gives Thus one conclusion of our estimate is that Il C
i/Ii+e £
C
where e denotes a term <0.03364 in absolute value. The exact value for Bt is
11/10—1/1010
Bt=l
L
0 1/101
10/1111 1/11
0 0
—1/1111
0
—
For some purposes (21) is sufficient. However, in a particular problem
224 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
one might want a better estimate such as (20). Other estimates exist. We
will give one due to Stewart. Its proof is left to the (starred) exercises.
Theorem 10.4.4 Suppose A, E E Ctm Define the as in (19). Suppose further that < and that rank(A + E) = rank(A). Let liii E1 1
K=
I
At
+ 11jj2)
1
/
K11E1111\1 = yic and Il/Il A A — The number K = A At Ills called the condition number of A. K measures
where y
=
1
the amount of distortion of the unit ball of caused by the linear transformation induced by A. Theorem 3.1 may be written using K. Since = AAt we have K 1 if A 0. Let 2, and denote the largest and smallest non-zero singular values of A. (See page 6 for the definition of singular values.) It follows from the singular value decomposition, Theorem 0.2.2, that hA = A, and hAt = 1/2g. Thus K = 2,/As. It is sometimes useful to have an estimate for At — Bt even if rank (A) rank (B). The following Proposition is helpful.
Proposition 10.4.3 Suppose that A, BeCTM At
—
Bt = Bt(B — A)At + (I — BtB)(A*
Then —
B*)At*At + BtBt*(A* — B*)
x(I_AAt). The proof of Proposition 3 is straightforward and is left to the exercises. From Proposition 3 we get quickly that Theorem 10.4.5 Suppose that A,BECrn*A. If II II At — Bt 3 max { At 112,11 Bt 112)11 A — Bil.
= 111102' then
II
Proof From Proposition 3 we have that A'
—
B'
A
—
At + A'
B 11(11 Bt
2 II
+
IIA—BII 3max{IIA'fl2,IIB'112). U The 3 in Theorem 5 can be replaced by (1 +
5.
See [89].
Matrix valued functions
Theorem 10.4.1 has another formulation which is of interest. Let A(t) be an m x n matrix valued function for t in some interval [a, b]. That is,
1a11(t)
A(t)=I
. . .
:
:
...
atb,
CONTINUITY OF THE GENERALIZED INVERSE
225
where a.,(t) is a complex valued function and is defined on [a. b]. If is continuous for I I m, n, then A is called continuous. 1
This is equivalent to saying urn A(t) — A(t0) = 0, for a
b. If A(t)
10
invertible for tE[a, bj, then [A(t)} defines a matrix function for IE[a,bJ. We denote this function by A '(t). It is immediate from Theorem 3.1 that: is
Proposition 10.5.1
If A(t)
is a continuous n x n matrix valued function on [a,bJ such that A(r) is invertible for a t b, then A '(1) = [A(t)] is
a continuous matrix valued function on [a. b]. Proposition I may also be proved by Cramer's rule.
If A(t) is an m x n matrix valued function we define the n x m matrix valued function At() by At(t) = [A(t)]'. Theorem 4.1 gives us the following extension of Proposition 1.
Theorem 10.5.1 Suppose that A() is a continuous m x n matrix valued function defined on [a, bJ. Then At(t) is continuous on [a, b] and only if rank(A(t)) is constant on [a,b]. Define the rank functionr(t) by r(t) = rank(A(t)). The discontinuities of At(t) occur when A(t) changes rank. That is, at the discontinuities of r(t). We wish to establish 'how many' discontinuities At(t) may have if A(t) is continuous. The discussion requires a certain familiarity with the concepts of open and closed sets such as is found in a standard first course in real analysis.
Example 10.5.1
Letf(t)= t sin(ir/t) 110< t 1 andf(0)= 0. Let
?1.Thenr(t)=2ift Let $4' = {t0IA? is
not continuous at tj. Then 5" =
I.
fl an
integer}U {0).
Notice that the set 5" of Example 1 has an infinite number of points in it. However, it is closed (contains all its limit points) and has no interior (contains no open sets). This behaviour is typical. Theorem 10.5.2
Suppose that A() is a continuous m x n matrix valued function defined on [a, b]. Let $4' = { ç€ [a, b] I At(t) is not continuous at t, }. Then $4' is closed and has no interior. Thus there exists a collection of open intervals {(a1,b1)} such that At(.) is continuous on each (a1,b1) and the closure of (J (a1, b.) is all of [a, b].
Proof Suppose that A(•) is a continuous m x n matrix valued function defined on [a,b]. Let 5" = {t0Ir(e) is not continuous at Since r(t) is not continuous only when it is not constant this $4' is the same as the of Theorem 2. If the determinant of a fixed submatrix of A(t) is taken we
226 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
a continuous function of t. Let be the sum of the absolute values of the determinants of all I x I submatrices of A(t). (For convenience 1 I n, are continuous functions on [a, b] and suppose n m.) Then = 0 for j I. Notice that r(t) = sup { iI 4'.(t) # 0). if = 0, then and Let is closed since c = {t14)1(t)= 0). Notice that is continuous. Let denote the boundary of 6".. For a closed set 9's, which are not interior points (loosely speaking the is those points in edge of 6°.). For the 6" of Example 1 we have = 9'. Now is a get
closed set with no interior. Hence
is a closed set with no interior. iz I
We shall show that 6"
=
(J ô6°1.
Suppose that r(t) is not continuous at t0, that is, çe 6°. Let r = r(t0). By the continuity of A(t) and Proposition 4.2 we have r(t) r(t0) for t near ti,. Since r(t) is not continuous at t0 there exists a sequence t,—' t0 r(t0) + 1. But such that 1(t0) = 0 and 1(t) 0. Thus to€t36"r+i. Hence 5F
J
To show equality suppose that 10E (J i=
such that t0€35",. r(t0) = r and
Let r be the smallest integer
1
Let be a sequence such that ti—' t0 but Then r + 1. Thus r(t) is not continuous at t0 and t0€b°.
a It should be noted that set of discontinuities of At(t) can still be very complicated for there exist closed sets with no interior which are uncountable. Example 10.5.2 Let 6° be any closed subset of [0,1] which has no interior. Definef(t) = inf{It — sI :se9'}. Thenf is continuous and .9' = {te[0, 1] :f(t) = 0). Let A(t) = [f(t)] so that At(t) = [(f(t))t]. Then the set of discontinuities of At(t) is 9'. It is also useful to be able to differentiate matrix valued functions. If A(t) is an m x n matrix valued function on [a, b], we define dA(t,) by dA(t0) = urn [A(t) — A(t0)]/(t —
t0)
provided the limit exists (any matrix norm may be used). This is equivalent
to saying that dA(t0) =
where
=
dA(t0) is called the
derivative of A at t0. If At(t) is to be differentiable at it must be continuous at t0, hence of constant rank in some open interval containing Provided that this
CONTINUITY OF THE GENERALIZED INVERSE
227
happens the differentiability of A(t) at t0 implies the differentiability of
At(t) at t0. Theorem 10.5.3
Suppose that A(t) is an m x n matrix valued function defined on [a, b]. If rank(A(t)) is constant, then is differentiable on [a, bJ and
dAt =
Proof
—
)A A +
At(dA)At +
L.
Suppose that A(t) is differentiable on [a, b] and that rank (A(t)) is
constant. Then urn [A(t) — A(t0)]/(t
—
= At(ç). Since
ç) = dA(ç), urn t—1o
we are differentiating with respect to a real variable we have [d(A*)](10) =
[(dA)(t0)]t, that is (dA)* = d(A*), where A*(t) = [A(t)]*. Thus the symbol dA* is well defined. By Proposition 4.3 we have (At(t)
At(10))/(t
At(t0) { [A(t) — A(t0)]/(t — tv,) )At(t0) + (I — At(tJA(t0)) { — — t0) }At(t)*At(t) + A(to)tA(tc,)?* { [A(t) — A*(t0)]/(t — } (1 — A(t)A(t)t). —
—
t0)
—
Taking the limit as t -. i, of both sides gives the formula for dAt and the
differentiability of At. • It is important to notice that in the proof of Theorem 3 we used the fact that t was a real variable. Theorem 3, as stated, is not valid for t a complex variable. Let us see why. For the remainder of this section suppose that z is a complex variable and A(z) is an analytic, m x n matrix valued function defined on a connected open set Q. That is, is analytic for ZEQ, i in, n. If At(z) were also analytic, then so would be A(z)At(z) and At(z)A(z). But 1
A(z)A'(z) 1102 and
1
A(z)At(z) 1102 are identically one on Q. Thus A(z)At(z)
and At(z)A(z) are identically constant. (A vector-valued version of the maximum modulus theorem is used to prove this last assertion.) Thus R(A(z)) and N(A(z)) are independent of z if both A(z) and At(z) are analytic. Suppose now that R(A(z)) and N(A(z)) are independent of z. Then there exist constant unitary matrices U, V such that
]u* where A1(z) is analytic and invertible. Of course, some of the zero submatrices in this decomposition of A(z) may not be present. But then, At(z) is analytic since 1(z) is.
Theorem 10.5.4
If A(z) is an analytic, m x n matrix valued function defined on a connected open set Q, then At(z) is analytic on Q and only if R(A(z)) and N(A(z)) are independent of z.
228 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
The assumption that R(A(z)) and N(A(z)) are independent of z may be
too restrictive for some applications. If one is willing to use an inverse other than the Moore—Penrose inverse, then the situation is somewhat more flexible. It is possible to pick a (1,2)-inverse A2(z) so that A(z) and A(z) are both meromorphic on the same domain. (Recall that a function is called meromorphic on a domain if it is analytic except at isolated points called poles where A(z) satisfies urn (z — z.rA(z) =0 for some in.) An example will help illustrate the general situation.
[zO Example 10.5.3
Iz'
-
A2(z)
=
10
0 0
Lo
— 1/2
11
Let A(z) = 10
z
—1
LO
z
—IJ
and
0 0 —
1/2
•
Notice that N(A(z)) is not independent
of z, but A(z) is analytic except at z=0 where A(z) has a rank change.
The projectors
Ii A;(z)A(z)=I 0
0 0]
Ii
0 01 and A(z)A(z)= 10
[0 —z IJ
LO
0
01
1/2 1/2
1/21
1/2]
are both analytic for all z, including zero, but A(z)A(z) is of non-constant norm. This behaviour is typical. Theorem 10.5.5
Let A(z) be a meromorphic m x n matrix valued function Then there is a n x m matrix valued defined on a neighbourhood of on a neighbourhood of z0 such that: function A2(z) defined
A(z) is a (1,2)-inverse for A(z)for each z # z0. (ii) A(z) is analytic on a deleted neighbourhood of 20. (iii) A(z)A(z) and A(z)A(z) are analytic on a neighbourhood of (i)
The proof of Theorem 5 may be found in [6:1. Theorem 5 is a local result in that it talks about behaviour near a point. It is possible to get global versions. Theorem 4 and 5 could be useful in a variety of settings. For example, equations of the form A(z)x =0 occur in the general eigenvalue problem, the study of vibrating mechanical systems, and some damping problems. If the ideas of Chapter 5 are applied to circuits with capacitors and inductors, then the impedance matrix is a meromorphic matrix valued function since it involves polynomials in w and 11w. In particular, Theorem 5 might be useful when using the characterization of the impedance matrix of a shorted n-port network given in Proposition 5.3.1.
CONTINUITY OF THE GENERALIZED INVERSE
6.
229
Non-linear least squares problems: an example
Theorem 5.2 can be useful in non-linear least squares problems. Some
non-linear problems are very similar to those discussed in Chapter 2, Sections 6 and 7. In Section 2.7 we discussed fitting a function of the form
... to a set of data. There the g.(x) were known functions and the /3, were
parameters to be estimated. A somewhat more general problem is to fit a function of the form
... +
+
y=
(I)
to a set of data. Here the are known functions while the /3., and are parameters. The g1 may be vector valued. Equations of = the form (I) appear widely. For example.
+
= $1
and (—)
+
y = fl1e1" +
are both in the form of (I). Functions of the form of (2) are common
throughout the sciences. Any process that can be described by a linear differential equation (with reasonably good coefficients) will have solutions of the form (1) where k will be the order of the equation. Sometimes it is clear from the problem what the g(x, i) look like. A particular example is the iron Mössbauer spectrum with two sites of different electric field gradient and one single line. Here (I) takes the form:
y=
+
+ /33t
—
P4[2 +
—
+2( Pr__________ +
+
+2+(
— 1)2
_)2
F__________ We are not going into the full theory of fitting (1) to a set of data since
it would lead us too far astray. We will however work out an example that illustrates the basic ideas and difficulties.
Example 10.6.1
Suppose that we wish to fit the equation (4)
230 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
to the data points (ti, y.) which are (0,0), (1, 1), and (2,3). Here we assume
is a real parameter. Thus we wish to minimize the error in
0f30+133 +e, I
= 3 =
+ +
+ e2 + e3.
This can be written as
+ e where
y
(5)
11 '1
101
=I I = I 11, Ly3J
[3]
=
1
e2
and e
e23]
= e2
Li We will minimize e with respect to the Euclidean norm. For any value of Substituting into (5) gives we minimize e by choosing fi = — = (I To minimize that e = y — =y— e we must minimize 11(1 — or equivalently, maximize = It is clear from the data, that =0 gives a poor fit. Assume then that # 1. Then II
=2—
—
+
A direct calculation gives 19e42
16—8ev—
=
2—
—
+
Differentiate this and set equal to zero to locate potential maxima. The =0. This is a — result is 8 — + + + 24e32 — Its roots are — 0.8, 1, and 2. The root — 0.8 sixth degree polynomial in is out since 0, and I is ruled out by assumption. We shall discuss the one root later. That leaves e2 = 2 or = In 2. Then = [— 1, 1) so that y = —1 + or y = — 1 + 2' is the curve of form (4) which bests fits our data. Note that, in fact, we have an exact fit. In Example I we discarded two roots — 0.8 and 1. Clearly the — 0.8 was extraneous. Where did the 1 come from? Consider the problem of finding extrema off/g wheref, g are two differentiable functions. Proceeding formally we getf'/g —fg'/g2 =0 or qf' —fg' =0. If we now notice that g(x) = x2 we would conclude that x =0 was a potential extremum. But f/g is possibly not even defined for x =0. This is exactly what happened in Example 1. The root e2 = 1, corresponded to when was zero. — was not of full rank and the term 2— + There need not always exist a best fit.
CONTINUITY OF THE GENERALIZED INVERSE
Example 10.6.2 equation
231
If the process of Example 1 is used to try to fit the (7)
to the data (0.0). (1, 1)and (2,2) then, as in Example 1, we get a polynomial but the only positive root of this one is 1. It is not too difficult to find values of 0,fl0, and fl1 which give a better fit than =0 ever does. What has happened here is that there is no best fit. By picking close enough to zero and correctly choosing the one can fit (7) with as small an error as desired. But an exact fit is impossible since the data is colinear and (7) is strictly concave up or constant. Aside from these problems this least squares technique is, in general, is of full rank the formulas considerably more complicated. Even for can be very complicated. Secondly, if there are several a's, then will be a function of several variables which makes maximization more difficult. Finally, even when the maximization of reduces to finding the roots of a polynomial as our example did, the polynomial may be of large degree and require numerical methods to find its roots. The reader interested in the numerical methods necessary for solving such problems is referred to the paper of Golub and Pereyra [37]. Note that in working our example we used the differentiability of Where Theorem 5.2 comes in explicitly is in the and not theoretical development of the general technique. 7.
Other inverses
It should be pointed out that it does not make sense to try and duplicate the results on the continuity of for (i,j, k)-inverses. The reason is obvious, for A (t) (or any other (i,j, k)-inverse) is not a well defined function. Thus if A, A and rank (A) = rank (A) one could still have A fail to even converge. Example 10.7.1
ç=
Let A
Ii
01
= Lo
0]'
A,
Il+l/j 01 and
=L
0
0]
(—Wi 0
]
even converge. If A and Al is a uniquely defined matrix for each j, then it is possible to discuss the convergence of the sequence {Al). This would be necessary, for example, in discussing iterative algorithms or error bounds for particular methods of calculating inverses. See Chapter 12 for some such results. For uniquely defined inverses, such as the Drazin, it is possible to consider continuity. The conditions under which the Drazin inverse is
232 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
continuous are similar to but not identical to those under which the Moore — Penrose inverse is continuous.
o]
[1 0
[1
Let
Example 10.7.2
A
0
Then
—'
A,
—'
=
0 0 0 0 0 0 0
AD, but rank (A)> rank (A) and md (A)> md (A).
0 0
100 000 001 000
0 0 0 0 0
0
0000
while AD =0. Thus
—, A,
rank (A) = rank (A),
= Ind(A), but A
Notice in Example 2 that core-rank be the key.
the rank of A" where k = md (A). = core-rank A. This turns out to
-. A. Then and that -. AD and only jf there exists anj0 such that core-rank (A) = core-rank (A)
Theorem 10.7.1
Suppose that
A E C"
X
forjj0.
Before proving Theorem I we need two preparatory results. The first is a generalization of Corollary 4.2.
PeCm xm are projectors (not Suppose that —' P. Then there is a J0 such necessarily orthogonal). Suppose furt her that that rank (Ps) = rank (P) for J0•
Proposition 10.7.1
P we Proof Suppose that P, P where = P, and P2 = P. Since have rank (Ps) rank (P) for large enough J• that there does not P Let = P exist a J0 such that rank (Ps) = rank (P) for j L. Then there exists a subsequence PA such that rank (Ps)> rank (P). That is, dim dim R(P). But N(P) is complementary to R(P). Thus for each J,, there is a #0. But then eN(P), and vector u, such that u, )u. =Puj +E.u. =Ej U. . Let 11.11 bean U. =P, lkfor ThenkII E j,'. But F!, if —p0 and we have a
contradiction. Thus the required J0 does exist. I We next prove a special case of Theorem 1.
Proposition 10.7.2 Suppose that
A€C"'
and
—'
A. Suppose
CONTINUITY OF THE GENERALIZED INVERSE
further that md (A,) = md (A) and core-rank (A,) = core-rank (A) for Then A? —, AD.
greater than some fixed
Proof
233
j
Suppose that A1 —. A, md (A1) = md (A) = k and core-rank (A1) =
= core-rank (A). From Chapter 7 we know while AD = l)tAk. But rank(AJk4 1) = core-rank (A1) = core-rank (A) = 1)t by Theorem 4.1 Proposition 2 rank(A2k4 1)• Thus (A,2k l)t —, now follows. U We are now ready to prove Theorem 1.
Proof of Theorem 1 Suppose that A ., A are m x
m matrices and A1— A.
We will first prove the only if part of fheorem 1. Suppose that A?-. AD. But AJA? is a projector onto Then AJA? —, and AAD is a Thus rank(A.,A?) = core-rank (A1) and projector onto rank (AAD) = core-rank (A). That core-rank (A,) = core-rank (A) for large j now follows from Proposition 1 and the fact that AJA? —, AAD. To prove the if part of Theorem 1 assume that core-rank (A,) = core-rank (A). Let A1 = C + N, and A = C + N be the core-nilpotent decompositions A' for all integers 1. Pick in Chapter 7. Now of A1 and A
= and A' = C'. Hence C'— C'. Now I sup {Ind(A1), Ind(A)}. Then = rank(C'). core-rank (A1)= core-rank (A) = = Ind(C) are either both zero or both one. This implies that -. by Proposition 2. We may assume the indices are one Thus = (CD)1. else A is invertible and we are done. But = (C?)' and = converges to (C" = Hence A? = C?
•
We would like to conclude this section by examining the continuity of
the index. In working with the rank it was helpful to observe that if A1—. A, then rank (A1) rank (A) for large enough j. This is not true for the index.
Example 10.7.4
Let
=0 while lnd(A) =2. Notice that A? + AD in Example 4. Proposition 10.7.3 Suppose that then there exists aj0 such that Ind(A)
Ind(A)forj j.
A and
AD. Let J0 be such that md does not Proof Suppose A1—. A and take on any of its finite number of values a finite number of times for be the subsequence such that jj0. Let 1= inf{Ind(A) and md (A1) =1. Let N be the nilpotent parts of A1, A respectively. Then — —. A'(I — ADA) = N'. Hence md (N) l.U 0= (NJ =
234 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
8.
Exercises
Prove that 11 Ii,,, p 1 defines a norm on C". on C" and C'" that 2. Prove that for AeC'" • and norm 1.
sup { II Au
=I} K u 115 for every uEC}.
: UE C", II u
= inf{K: Au 115
3. Suppose that {Ak} is a sequence of rn x n matrices. Let cC'" ".Prove that Ak —, A if and only if A=
=
lim k-.
for 1 i rn, 1
4.
Show that if Ac C'" XII, then
5.
If Ae C'"
then
let A
n. A 1101
= max {
= n max ajj I = n II A
II
Since C'" XIS is
the 11112 noim. IA 112
isomorphic to C'"" we can give
/
i
\1/2
=
ii (a)
is a matrix norm in the following sense. If
Prove that A, BE C" XIS then
AB A
lB
(b) Prove that 11112 is a matrix norm. (This will probably require the following inequality:
/
\1/2/
i,
Ei_i
N
i—i
•
\1/2
1—1
which goes by the name of Cauchy's inequality.) Show that Suppose that A E C" (c)
!
(d)
—
(e)
A A A
A
A 1101 A fl
A A II
o2
and
A
(f) n"211A112 11A1102 hAil2
(g) n"211A112 (h) n"211A112 hAil01 (1)
(i) n"2iiAiI02 hAIL1 (k) n' hAil00 hAIL1 (1)
In parts (l)—(k) determine the correct inequalities if
AEC'"" rather than CNX. 6. Show that the inequalities (c)—(k) of Exercise 5 are all sharp. 7. Prove Proposition 10.3.1.
CONTINUITY OF THE GENERALIZED INVERSE 8.
Let I be the identity matrix on CflXft. Show that 11111
235
1 for any
matrix norm on 9. Let be any matrix norm 11111> 1 is permitted. 1111 A <1, prove that (I — A) is invertible and estimate 11(1
—
A)-'
10. Prove Fact 10.4.1. 11. Prove Fact 10.4.2. 12. Give an example to show that Proposition 10.4.2 is no longer true if E < 1 / H t is weakened to fi E 1 / 13. Suppose M, N are subspaces of a norm on Prove that if dim M > dim N, and K is a complementary subspace to N then there such that hull = 1. *14. Prove Theorem 10.4.4. (Hint: first prove for the case that AeCtm x1 has rank n.) 15. Prove Proposition 10.4.3.
9.
References and further reading
A more complete discussion of matrix norms may be found in [33] and
[37]. In [37] norms are discussed in terms of their unit balls { U: U 1). The relationship between various norms is studied in [33]. In particular, Exercise 5 is from [33]. Theorem 4.1 was given in Penrose's first paper on generalized inverses [67). His proof was based on the characteristic function of A*A and appears as an aside on the bottom of page 408. In a follow up paper [68] he discussed approximating At. The development given here is similar to that of Stewart [88]. A more restrictive treatment that applies some of the ideas to error estimation is [9]. An infinite dimensional treatment is given in [63]. Another treatment that is restricted to hermitian matrices is [76]. The paper by Robertson and Rosenberg [75] deals with matrix valued measures. In particular, they prove matrix versions of the Hahn—Jordan decomposition, the Radon—Nikodym theorem, and the Lebesgue decomposition. Their results use generalized inverses. As a lemma they establish that if A(t) is a measurable m x n matrix valued function (that is, a.fr) is measurable for I I m, 1 n) and if B(t) = [A(t)]t,
then B(t) is a measurable n x m matrix valued function. Proposition 4.3 and Theorem 5.3 are from [37). There is a nice bibliography at the end of [37] which includes a reference which explains formula (3).
11
Linear programming
1.
Introduction and basic theory
This chapter will discuss how the theory of linear programming relates to
the theory of generalized inverses. The chapter is not designed to teach the reader the full theory or applications of linear programming. We ignore, for example, the simplex method. We will begin by describing the basic linear programming problem. Then several basic theorems will be presented. Selected proofs will be given to give the reader an idea of some of the techniques involved. We conclude by showing how the generalized inverse can occur in working with linear programming problems. Hopefully by the end of this chapter, the reader will have a good idea of the part that the theory of the generalized inverse can play in the theory of linear programming. To begin with, we should probably point Out that the name 'linear programming' can be somewhat misleading. It is not concerned with computer programming as such, though computational algorithms play an important part. Rather the theory concerns maximizing and minimizing linear expressions with respect to linear constraints and linear inequalities. These problems arise, for example, in allocation of resources and transportation problems. The 'program' is thus more of a 'schedule' or 'allocation scheme'. In order to motivate the formulation of the general mathematical problem, let us consider the following simplified situation. A manufacturing company makes three products P1 , P2 and P3. Each product uses the inputs of electricity, labour, iron and copper. Suppose the amounts used are given by Table 4.1. The numbers are in terms of amount of input required (in some appropriate units) per unit output of product. For example, each unit of P3 uses 4 units of electricity. The problem is to maximize the profit where the profit per unit of product is I for P1. 2 for P2. and 1 for P3. We assume that all available product can be sold. However, there are certain constraints. We suppose that there are only 20 units of labour, 10 units of
LINEAR PROGRAMMING 237 Table 4.1 P1
P2
P3
Elect.
4
8
4
Labour Iron Copper
3
3
I
I
I
0
I
2
2
iron, and 5 units of copper available each week. The problem may be formalized mathematically as follows. Let x1, x2 , x3 denote the quantities of products P1,P2, P3 to be produced in a week.
Problem A 3x1
Maximize x1 + 2x2 + x3 subject to the constraints
+ 3x2 + x3
20,
x1 +x2 10, x1 + 2x2 + 2x3 5,
andx1 This can
be put
into a more standard form as follows. Let
x4 = 20— 3x1 — 3x2 — x3, x5 = 10— x1 — x2, and
x6=5—x1 —2x2—2x3. Then Problem A becomes Problem B.
Problem
B Maximize x1 + 2x2 + x3 subject to
+3x2+x3+x4=20, x1+x2+x3=10, x1+2x2+2x3+x6=5, 3x1
and
x6 represent unused available input and are called
slack variables. The general linear programming problem can be
formulated using Problem A as a model. One might be tempted to define it as maximization of a linear function subject to any combination of inequalities and equalities. However, by multiplying inequalities by the appropriate sign we may get all the equalities in the same direction. for all I, then Ifx =(xl,...,xM)ER" andy and ifx1 we write x y. This notation will simplify our calculations. Let
={xERN:xO}.
Since linear programming problems are usually done with real values we if x y is will work with them. However, most of the theory is valid for interpreted to mean Re(x — y) 0 and we maximize the real valued functional, Re(x,c). Here Re z is the vector whose ith entry is the real part of the ith entry of z.
238 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS beRtm, and Definition 11.1.1 Suppose that The general linear programming problem is to maximize (x, c) subject to the constraints Ax b and x 0. Note that by maximizing (x, — C) we can obtain the minimum of (x, c).
Definition 11.1.2 A linear programming problem is in standard form it is oftheform Ax = b, x 0, Note that the general linear programming problem can be put in standard form by using slack variables. Problem B was the standard form of Problem A. Problem B has the advantage that one is working with equality constraints. Problem A will in general have a smaller matrix A. Thus while Problem B might be easier to work with theoretically, Problem A might be simpler since it involves smaller sized matrices and vectors.
Going along with each linear programming problem is another one called its dual. Definition 11.1.2 (c, x) subject
The dual
of the linear programming problem: maximize
to Ax b, x 0 is, minimize (b, y) subject to
c, y 0.
The dual of Problem A would be Problem C.
Problem C
Minimize 2Oy1 + lOy2 + 5y3 subject to
3y1 +y2+y3I
y1 +2y3l
1
and This dual problem has an interpretation related to that of Problem A.
Problem A amounts to maximizing total net revenue, while Problem C consists of minimizing the total 'accounting value' of the inputs. The y,'s are the values to be assigned to the inputs. They are sometimes called 'shadow prices.' The equations (1) say that the values given to inputs must not be less than the contribution of the inputs to net revenue. We shall see shortly that Problem A and Problem C are equivalent in an appropriate sense.
We now turn to the mathematical treatment of linear programming problems. For either the general problem or its dual, a vector is called feasible if it satisfies the constraints. A feasible vector which maximizes the functional in the general problem (or minimizes the functional in the dual problem) will be called optimal for the general (or dual) problem. It should be noted that the dual of the general problem and the dual of the problem in standard form are equivalent. That is, they have the same feasible vectors, optimal vectors, and minimal value of the functional.
Proposition 11.1.1 equivalent duals.
The genera! problem and its standard form have
LINEAR PROGRAMMING 239
It will be more convenient for us to work with the standard form. The first problem is to determine when feasible solutions of Ax = b, x 0 exist, that is, when the constraints are consistent. The constraints Ax = b, x both bER(A) and A beN(A) +
Proposition 11.1.2 only
0, are
consistent !f and
Proof Solutions of Ax = b exist, of course, if and only if be R(A). If bER(A), then the set of all solutions to Ax = b is Ab + N(A) for any (1)-inverse A. Thus the set of constraints it consistent if and only if (Ab + N(A)) is non-empty. This happens if and only if
U From Proposition 2 and a lot more work we can get a characterization of consistent constraints due to Farkas. A proof using generalized inverse notation may be found in [11].
Theorem 11.1.1
(Farkas). Suppose that Ac R'" following are equivalent: (i) Ax = b, x
X
be Rm. Then the
0 is consistent,
(ii) A*y 0 implies that (b, y) 0.
A linear programming problem and its dual are closely related. We
summarize this relationship in the following fundamental theorem. The theorem is standard. A proof may be found in Simonnard [84], for example. Theorem 11.1.2
A linear programming problem and its dual either both have optimal solutions or neither does. If they both do, the maximum value of the original problem equals the minimum value of the dual. This common value is called the optimal value of the problem. This theorem has several consequences. First of all, it says that if either
the original or the dual has no feasible vector, then the other cannot achieve a maximum (or minimum) even if it has a feasible vector. Suppose that x, is a feasible vector for a linear programming problem while y is feasible for its dual. Then Ax0 b and A*y0 c. But then (b, y0)
c*x0 =
(c, x0).
(2)
According to Theorem 2, if y, x are optimal feasible vectors, then (b, y) = (c, x). This provides a way of testing two feasible vectors to see if they are both optimal solutions. Notice that (2) also says that if the original problem has a feasible vector x0, then (b,y) is bounded below. There would then seem to be hope for a minimum. Similarly, if y0 is a feasible vector for the dual, we have (c, x) is bounded above by (b, y0). In fact, the following is true.
240 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Theorem 11.1.3
Iff(x) = (c, x) is bounded above (below) on {x: Ax = b, x O}, thenf attains its maximum (minimum) on {x : Ax = b, x O}. It should be noted that Theorem 3 is not true if {x : Ax = b, x O} is
replaced by an arbitrary convex set. In light of our observation that feasible vectors of the general problem and its dual form bounds for (x, c) and (x, b), the next result follows from Theorem 3.
Theorem 11.1.4 A necessary and sufficient condition for one of the two problems (and hence both) to have optimal vectors is that they both have feasible vectors. Under certain conditions it is easy to show (x, C) is bounded above. As pointed out earlier, any vector satisfying the constraints is of the form x0 = Ab + (I — AA)x0. Thus sup(x0,c) = (Ab,c) + sup((I — AA)x0,c), where x0 ranges over all feasible vectors. One way to get that the supremum on the right exists is to have (I — AA)*c =0. For then, ((I — AA)x0,c) =0 and the supremum is (Ab, c). If (I — AA)*c =0, then c*(I — AA) =0. Thus we have:
Proposition 11.1.3 A sufficient condition that thefunctionf(x) = (c, x) be bounded above is that c*AA = Proposition 3 is a very special case. If c*AA = thenf(x) = (Ab,c) for all feasible x. Thus every feasible vector is optimal. Another, and more reasonable, way to guarantee thatf(x) = (c, x) attains a maximum is to have that (Ab + the set of feasible vectors, is a closed, bounded set. That it is closed is clear since A b + N(A) and are closed. (Closed being used here in the sense of having all of its limit points.) One way that (Ab + could be unbounded would be the existence of an h0E N(A), h0 0, such that h0 0. In this case if he N(A) were
such that Ab + h 0, we would have Ab + h + 2h0 0 for A 0. In fact, the existence of non-zero h0E N(A) and sufficient if feasible solutions exist.
turns out to be necessary
+ N(A)) Suppose that the set is nonempty. Then it is unbounded (f and only (1 there exists a non-zero h0EN(A)
Proposition 11.1.4 such that h0 0.
Proof We need only show the only if part. Suppose that {Ab + N(A) } n is unbounded. Let {Ab + h,,j be an unbounded sequence made up of is an unbounded sequence. Let Then vectors in (Ab + {kj is a bounded sequence in RN, it has a convergent Since km = tim! H h, subsequence which we will denote by k1. Let k0 denote the limit of this subsequence. That is k, -. k0. Since k1€N(A) and N(A) is closed we have k0eN(A). Note that (Ab + h1)/11h111 -0 + k0 = k0. Thus k0eR",. since
Ab +
and k0 is the required vector. U
LINEAR PROGRAMMING 241
2.
Pyle's reformulation
To illustrate one way that the generalized inverse can be used in linear programming problems we will discuss a method developed by Pyle to reformulate a linear programming problem into a non-negative fixed point problem. The ways in which the generalized inverse are used in this section are typical of many applications of the theory of generalized inverses to to linear programming. Another application is discussed in the final section. We will not give all the proofs but rather outline Pyle's argument. The proofs are assigned as exercises or may be found in his paper. Consider the following problem and its dual:
(P1) Maximize (x,c) where Ax = b,x 0. (Dl) Minimize (y.b) where A*y c,y 0. Here
The first step is to reformulate (P1) and (Dl) as problems in which every feasible solution of the new problems is an optimal solution of (P1) and (Dl). This requires being able to express optimality of vectors as an algebraic condition. This algebraic condition will then be added to both of our original problems to get one large problem. The derivation of the algebraic condition requires us to first reformulate (P1) and (Dl). If x satisfies the constraints of(Pl), we know that x = Atb + (I — AtA)X. Thus (x,c) = (Atbc) + ((I — AtA)XC) = (Atb,c) + (x,(I — AtA)C). (P1) is thus equivalent to
(P2) Maximize (x.(I — AtA)C) where Ax = b (or AtAx = A'b). x 0. This is obviously equivalent to
(P3) Minimize (x, — (I — AtA)C) where AtAX = Atb, x 0.
Now we shall rewrite (Dl). The dual of (P2) is (D2) Minimize (y,
where
(I — AtA)C.
To remove the inequality in (D2) observe that if y is a feasible solution of(D2). then AtAy (I — AtA)C. Thus z = — (I — AtA)C + AtAy 0. Thus z is a feasible solution of(I — AtA)Z = — (1 — AtA)C, z 0.
Conversely, if(I — AtA)Z = — (I — AtA)C, z 0. Then z = — (I — AtA)C + (AtA)y and AtAy 0. Furthermore (y, Atb) = (z, Atb) if z, y are so related. Thus there is a many to one correspondence between feasible solutions of(D2) and feasible solutions of
(D3) Minimize (z,Atb), where (I — AtA)Z =
— (I
—
AtA)C.
Z
0.
The next theorem provides the basis for the algebraic condition we are looking for. Theorem 11.2.1
Suppose that x, z are feasible vectors for (P3) and (D3). Then (x,z) = 0 (fond only tfx is optimal for (P3) and z is optimal for (D3).
242 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
It will simplify some of our later calculations, and reduce the size of the matrices involved, if we use partial isometries to replace the projectors appearing in the constraints of(P3) and (D3). Let q = dim N(A). E* will be onto R(A*) while F* will be an isometry from an isometry from
onto N(A). Then F=[ai,...,aq]* and E= is an orthonormal basis for N(A) and {aq+1, ... , ;} is an orthonormal basis for R(A*). The only restraint that we put on the choice ofbasis is that a1 =(I _AtA)c/II(I — AtA)cII, and =Atb/IIAtbII. We AtA)C rule out the possibility that (I — =0, for then all feasible solutions to the original problem are optimal by Proposition 1.3. If Atb =0, then for consistent constraints we have b =0 and (x, c) is either zero or unbounded on the set of feasible vectors. So we assume Atb #0. Since E*, are isometrics, we have that EE*E = E and FF*F = F. Note that AtA = E*E. Thus EAtA = E. Similarly (I — AtA) = F*F. Let C0 = (I — AtA)C. Problems (P3) and (D3) may now be written as (P4) Minimize (x, — c0), where Ex = EAtb,
x 0
and (D4) Minimize (y, Atb), where Fy = F( — c0), y 0.
Notice that x = Atb is a solution of Ex = EA'b and N(E) = R(A*). = N(A). Thus if x is a feasible vector for (P4), then x must be of the form g
(1) i= 1
Similarily, if y is a feasible vector for (D4), then
y=—c0+
(2)
1+1
Now if x, y are feasible for (P4) and (D4), then they are feasible for (P3) and (D3). They will both be optimal if and only if(x, y) =0. Substituting (1)
and (2) into(x,y)= 0 gives
= (Atb + = (Atb,
—c0 +
i1
— c0) —
1q+1 11
+
a.).
iq+1
But and
Atb
LINEAR PROGRAMMING 243
Equations (3), (4) and the equality constraints of (P4), (D4) may be
expressed in one set of constraints as EAtb
0
[fl =
(5)
We have then that:
Theorem 11.2.2
Any solution of(5) which is non-negative in its first 2n
components provides optimal solutions of(P4) and (D4).
It is easy to modify (5) so that the desired solutions have all components where non-negative. Let = 0 and 0. Then (5) — becomes
[E
0
0
0 0
F
0
0 0
—11c011/IIAtbII
0
1
lrxl
11c011/IIA'bIII —1 JLP1J
EAtb
=
—Ec0 0
6
0
Theorem 11.2.3
Any solution of(6) which is non-negative in all of its components provides optimal solutions of(P4) and (D4). Let B be the coefficient matrix of (6). Then Theorem 3 says that solutions
of (P5)
Bz=d, z0.
provide optimal solutions of (P4) and (D4). Problem (P5) can be rewritten
so as to get
(P6) Px =
x,
x 0, P an orthogonal projector.
To see what P has to be, observe that we are asking for R(P) and { z : Bz = d} to be the same. A reasonable way to try and guarantee this is to require that {z : Bz = d} c R(P). If Bz = d, then z = B'd (I — BtB)z. P would thus be the sum of (I — BtB) and a projector onto the subspace spanned by Btd. Let P = (I — BtB) + (Btd)(Btd)*/ if Btd if 2• Then P is a hermitian projector and R(P) {z : Bz = d). Let us examine the relationship between solutions of (PS) and (P6). Suppose that z is a solution of (PS). Then z 0 and z = Btd + (I — BtB)ZE R(P). Thus z is a solution of (P6). Suppose that x is a solution of (P6). Then x 0 and Px = x. This implies that BtBx — (B?d)(Btd)* —
Btd
2
—
,
(Btd, x)
— if Btd
2
Btd
Multiplying both sides by B and using the fact that Bx d is = consistent,
244 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
we get that Bx = (B'd, x)d/ Btd 112. Let
1= (Btd, x)> 0, then z will be a solution of (P5) since we assumed x 0. If(Btd,x)= 0, then xeR(I — BtB) = N(B) and x 0. We
would then have (Btd + is unbounded. This in turn would imply that in our original problems the constraints defined an unbounded 0 set. Suppose that (Btd, x) <0. Then z 0 so that z = B'd (I — that Btd h 0 since (PS) is assumed and there is an he N(A) such consistent. But then h — (I — BtB)z = (B'd + h) — (B'd + (I — BtB)z) 0. Again we have N(B)r is non-empty and (Btd ÷ would be unbounded. Summarizing these observations, we have the next theorem.
Theorem 11.2.4
If in problems (P1) and (Dl) the constraints define a bounded subset of Fe!', then solutions of(P6) provide solutions of(P5) by means of equation (7). Thus solutions of(P6) will provide optimal solutions
of(P1) and (Dl). It is worth noting that Bt, and hence P. are fairly easy to compute. One way is to observe that B is of full row rank and hence B' = B5(BB5) Because of the particular entries involved, the matrix (BB5) is easy to invert. An even easier way to find Bt. is to observe that by slightly modifying the last two rows of B we can get a matrix C such that Bz = d and Cz = d have the same solutions but C is a partial isometry. Then C is used in place of B. From Chapter 2 we know that C' =
Exercises
3.
1. For a set .5° c
define .9'° =
:(z,s) 0 for all scsi. Prove that
for 5°,if any sets in (a)
(b) .5°
if implies if0 c .500
(c) $P C (54PO)0 = $000
(d) (e)
(clb°)°
cL9' is the closure of 6" in the Euclidean norm on R'.
(f) $°°+if°c($°+.fl° 2. Verify that if M RA is a subspace, then M° = M1.
3. Verify that 4. Prove that + N(A) is closed. (Hint: It is possible to define an n x p matrix C such that = + N(A) where p = n + dim N(A). In other words, R",. + N(A) is a polyhedral cone. Using C, C', and the fact that a closed convex set has an element of minimal norm, one can show that + N(A) is closed. Details may be found in [11, p. 380].) 5. Prove Theorem 11.2.1.
LINEAR PROGRAMMING 245
6.
Verify that if BE
X
dE R'".
then P = (I
—
BtB)
+ B'd
2(Bd)(Bd)
is an orthogonal projector. 7. Let B be the coefficient matrix in equation (6). Section 2. Verify that B is of full row rank and calculate Bt by using Bt = B*(BB*YI. 8. Let B be the coefficient matrix in equation (6). Section 2. Modify the last two rows of B to get a new matrix C such that Cz = d and Bz = d have the same solutions and C is a partial isometry. 9. Prove Proposition 11.1.1.
4.
References and further reading
There are many books on linear programming. We list three [48], [84],
[85] in the references. They are the ones we have found most useful. [48] is the most technical and [85] the least. The exposition in [84] seemed well written. The generalized inverse is also discussed in [70] by Pyle and Cline. They are concerned with gradient projection methods. That is, they approach optimal vectors by moving through the set of feasible vectors. This is in contrast to the simplex method which goes from vertex to vertex around the edge of the convex set of feasible vectors. (If there is an optimal vector, there must be one at a vertex.) A proof of Farkas's theorem can be found in Ben-Israel [11]. The complex case and a more thorough discussion of polars and cones is also given in [11].
12
Computational concerns
1.
Introduction
This chapter will consider the problem of computing generalized inverses.
We will not go into a detailed analysis of the different methods. Books exist on just computation of least squares problems. Rather we shall discuss some of the common methods and when they would be most useful. The bibliography at the end of this book will have references that go into a more detailed analysis. Our procedure will be as follows. We shall first discuss some of the difficulty with calculating generalized inverses. We will talk mainly about the Moore—Penrose inverse but the difficulties apply to all. Then we shall consider the problem of computing At. A particular algorithm involving the singular value decomposition will be developed in some detail. Then a section on A, and finally a section on AD. The first thing to note is that we are talking about calculating, say At, and not necessarily about solving Ax = b in the least squares sense. One has the same distinction in working with invertible matrices A. If one wishes to solve Ax = b, the quickest way is not to calculate A - and then A 1b. It takes of order n3 operations to calculate A-' and another n2 to form A - 1b. The direct solution of Ax = b by Gaussian elimination can be done in n3/3 operations. (Here operations are multiplications or divisions.) Similarly, it takes more time to calculate At and then Atb, than to directly calculate Atb. The algorithms we shall discuss fall into three broad groups. The first is full rank factorizations and singular value decompositions. The second is iterative. The third we shall loosely describe as 'other'. It consists of various special ways of calculating generalized inverses. These methods are usually of most use for small matrices, and will tend to be mainly for the Drazin inverse. The reader interested in actual programs, numerical experiments, and error analysis is referred to the references of the last section. It will be assumed that the calculations are not being done entirely in
COMPUTATIONAL CONCERNS 247
exact arithmetic. Thus there is some number ç >0, which depends on the
equipment used, such that numbers less than ç are considered zero. Several algorithms suitable for hand calculation or exact arithmetic on small matrices have been given earlier. For small matrices, those methods are sometimes preferable to the more complicated ones we shall now discuss. This chapter is primarily interested in computer calculation for 'large' matrices. Throughout this chapter denotes a matrix norm as described in Chapter 11. For invertible A, we define the condition number of A with We frequently write DC respect to the norm as = hA instead of ac(A). If A is singular, then i4A) =
2.
HA
At
Calculation of A'
This section will be concerned with computing
The first difficulty is that this is not, as stated, a well-posed problem. If A is a matrix which is not of full column or row rank then it is possible to change the rank of A
by an arbitrarily small perturbation. Using the notation of Chapter 12 we have:
Unpleasant fact Suppose AECTM XI, is of neither full column nor full row rank. Then for any real number K, and any 6>0, there exists a matrix E, E <6, such that II(A + E)' — A' K.
Proof Let UEN(A) and veR(A) be vectors of norm one. Let E = min{1/K,e}. Then hEll = çand rank(A +E)= rank(A)+ 1. where Hence (A + E)t — At K by Theorem 10.4.2. If A is determined experimentally, or is entered in decimal notation on a computer, or there is round off, then it is not obvious whether it makes sense to talk about computing At unless A is of full rank. One method of posing the problem is by using the singular value decomposition (Theorem 0.2.2). Let X be an m x n matrix. By Theorem 0.2.2 there exist unitary matrices U, Vso that
]v
(1)
where 1(X) = Diag[a1,... ,o,]. Then
:]u*. Now let
]v Herec
248 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
depends on the computing equipment available and the desired accuracy. Now take The entries of 1(A) are the eigenvalues of(AA)"2.
Since eigenvalues vary continuously with the matrix's entries, one has that 1(A + E) and 1(A) have the same rank if fl E Ills small enough. Thus urn IIEII—O
—
II
((A +
II
=
(3)
by Theorem 10.4.1. Thus the problem of computing is used as the definition of the Moore — Penrose inverse.
is well-posed if(2)
It should be noted that the rate of convergence of (3) depends on A and E and not just the condition number of A. Example 12.2.1
0].Then
Let
— (A + = large error if 8< II
c, it equals 0 if < c. For this A, we have a <& Note K(A) = 1
if
<2e and no error for
for 11112
There is another way to view the calculation of At as a well-posed problem. Fix A e 'i" and consider the equations
AXA—A=E1,
(4)
XAX—X=E2,
(5)
AX_X*A*=E3,
(6)
XA_A*X*=E4.
(7)
and In terms of our previous discussion, X may be thought of as the computed estimate and the E. as error terms. We shall now solve (4)—(7) for X in terms of A, At and the E.. Equation (7) gives AXA — AMX* = AE4, or — EA*. by (4), E1 + A — AA*X* = AE4. Thus XAA* = A* + Multiplying on the right by gives XAAt = [A* +
—
EA*]A*tAt.
(8)
Similarly from (6) we get AXA — X*A*A = E3A and hence
AtAX = AtA*t[A* +
—
A*Efl.
(9)
Substituting (8) and (9) into (5) gives
E2+X=XAX=XAAtAA1AX = [A* + — EA*]A*tAtAAtA*?[A* + — A*Er] = [A' + — EAtJAA?A*t[A* + — A*Efl — E]AtA*f[A* + — A*Efl = [I +
= A' + [EiA*f — E]A' + — +
—
—
Efl
Efl.
As an immediate consequence of (10) we have the following results.
(10)
COMPUTATIONAL CONCERNS 249
Theorem 12.2.1 If is a sequence of n x m matrices such that the {X,A — A*X') and {AX,A — A) sequences {XAXr — X,}, {AX. — all converge to zero, then {X,} converges to At.
Theorem 12.2.2
Let A C" ",XE C" "'and
l,2,3,4by(4)—(7). Then
such that 11B11 = IIBII. Define X—
denote a matrix norm
E2 +
{
E1
+ E3 +
P11
E4 II)
(I)
—fllAtO2{211E1l1+0E111l1E31l+IIE4HIIE1II) + hAt 11311E1
112
Note that estimate (11) in Theorem 2 seems to suggest that minimizing
is important if E has large condition number. Theorems 1 and 2 show that if A liii At and At are small, then calculating At is well-posed in the sense that if X comes close to satisfying the defining conditions of the Moore—Penrose inverse, then X must be close to At. Algorithms exist for calculating the singular value decompositions that are stable. That is, error accumulates no faster than the condition number would seem to warrant. Perhaps the best known one is due to Golub and Reinsch [39]. Their method consists of using Givens rotations and E1
H
Householder transformations to reduce A to the form
where J is
diagonal. This method is stable primarily because the Householder matrices and Givens rotations are unitary. Thus their application does not increase the norm of the error matrix. This algorithm is discussed in more detail in the next section. One might think that the full-rank case would always be easier to work with. Assume is full column rank. Then from Theorem 1.3.2
At = (A*A) 1A*.
(12)
However, taking AA is to be avoided unless one knows something about the singular values of A. Let A = UZV where I = Diag{a1, ... ,a}. Then A*A = V*12V where £2 = Diag , o). If a1 is small, then might be negligible. For example, if round off is 1011, a singular value of 10-6 would be lost.
fi
Example 12.2.2
ii
LetA= IP 0I.ThenA*A= LO PJ
If $ is small, we could have $2
fi+p2 I
I
L
negligible and (AA)
1
=
In any event, for small ji there is a loss of information. These same comments apply to computing At by writing A = BC where BC is a full rank factorization. Then At = CtBt by Theorem 1.3.2. It is possible with a little effort to still utilize a full rank factorization and
250 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
= C*(CC*)l or avoid the possible ill-conditioning caused by taking Bt = (B*B) lBS if only one of CC5 or B5B is ill-conditioned. If, however, both CC5 and BB are ill-conditioned then the method will not help much. The interested reader is referred to [35]. 11(12) is to be used, then double precision arithmetic is recommended. In fact, some numerical analysts recommend always using double precision. This will be especially true when working with the Drazin inverse as will be explained later. Another way of calculating the Moore — Penrose inverse is by an iterative scheme. Iterative methods exist for finding the inverse of a non-singular matrix. Their principal use is in taking a computed inverse and yielding a more accurate one. However, the algorithms for iteratively calculating the Moore—Penrose inverse are not generally self-correcting. One of the more commonly discussed iterative methods is the one due to Ben-Israel [8]. One form of it is the following. Its proof and some related results are left to the exercises. Theorem 12.2.3
ff0 <
denote the largest eigenvalue of AA5, A€Cm Let = X,(21 — AX,). Then < 2/A1, set X0 = xA5 and define
X—' At as r—' co. Stewart has shown that the iterative scheme of Theorem 3 takes at least
2 log2[c(Aj] iterations to produce an approximation to (Aàt. Thus it can be slow. The method does turn out to be stable. Error does not accumulate more rapidly than would be expected. Each iteration requires 2mn2 multiplications. This method and its variants are not competitive for large matrices with the singular value decomposition. Stewart has also shown that if the entries of A are large so that the round-off error is large in magnitude, then errors can accumulate rapidly so a careful analysis of rounding error is needed. Proper splittings have also been recently proposed as a method of calculating either (1)-inverses or the Moore—Penrose inverse [14]. These methods require first the splitting and then an iteration. Details may be found in Exercises 11—16 and in [14]. At this point it should be pointed out that while the singular value decomposition is a widely used method for computing At there are two schools of thought. Let us use the phrase 'elimination method' to designate those methods of computing At which are based on some variation of Gaussian elimination. Algorithms 1.3.1, 1.3.2,3.3.1 and 3.3.2 are of this type. To begin with there is no such thing as a universal algorithm. Given any non-trivial algorithm there is usually an example for which it works poorly. Secondly, if an algorithm is to be able to handle, less well conditioned problems it frequently must be made more sophisticated. On this basis there are those who argue that elimination methods have their place. Elimination methods usually are much simpler with substantially lower operation counts. Consequently, they are quicker to run with less accumulation of machine error.
COMPUTATIONAL CONCERNS
251
It would be unfortunate if after reading this chapter, a reader computed At for a 4 x 4 matrix with integer entries by using the singular value decomposition. By the time he had it entered on a machine, he could have obtained the answer at this desk, exactly if need be, by an elimination method and drunk a second cup of coffee. For well conditioned matrices of moderate to small size there are strong arguments for using an elimination method. In this same vein we might point out that in our computing experiments with A° we tried powering A by using QR factorization. While this is a stable method we discovered, not suprisingly, that for 6 x 6 or 8 x 8 matrices we got more error than if we just naively multiplied them out. The reason was the substantial increase in the operation count and the loss of information. When A was entered exactly (had integer entries for example), some error was introduced in the factorization whereas direct multiplication often produced the exact answer. If one suspects that a matrix is severely ill-conditioned and/or very high accuracy is needed, then the best idea of all may be to compute in exact (residue) arithmetic. While this may substantially increase the computational effort needed, it will provide At exactly if A has rational entries. Elimination methods are preferable with residue arithmetic since they have lower operation counts and produce exact answers. The basic idea is to rescale the original matrix so that it has integral entries. Then compute in modular arithmetic. If the numbers involved are large, then multiple modulus arithmetic may be necessary. A good discussion of three elimination methods and associated algorithms in modular arithmetic may be found in [14]. See also [87]. Residue arithmetic can be further studied in [90], [92]. See also the other references of [14]. Exact computation may also be done in p-adic number systems (see [14] for references).
3.
Computation of the singular value decomposition
This section will hopefully develop the Golub—Reinsch algorithm in
sufficient detail so that the interested reader will understand its basic structure. We shall first briefly discuss Householder transformations and Givens rotations. Then we shall give an outline of the algorithm. Finally, we shall discuss each step in greater detail. We shall not discuss the actual organization of such a program or worry about storage. The interested reader is referred to [52, Chapter 18] or [15], [36], [38], [39]. Recall from Chapter 10 that v 2 =
I
Definition 12.3.1 Householder transformation. It is easy to see that H= a
—
2uu*, is called a
H u
a liv II 2e1 where
252 GENERALIZED IN VERSES OF LINEAR TRANSFORMATIONS
a=+1
<0. Let H
0, a = — 1
= I — 2uu*/u*u. Then
Hv= —oIIv112e1.
Definition 12.3.2 A Givens rotation is a matrix G = [g1 ,... ,gj, such that g. = e4 except for two values of i ; i1
I
01
c
00 sl
10 wsthc
2
•
CJ
01
0
+s =1. 2
A Givens rotation is unitary and is used to introduce zeros.
Proposition 12.3.2 Given
and indices 1 i,
the vector
+ c= = v.2/(1v1112 + 1v12 12)112. Then the Givens rotation G defined by i1 ,i2, c, s is such that (Gv)1 = v. i1 or i2 ,(Gv)11 = (I v11 + (Gv)12 = 0.
I v.2
The operation performed by G in Proposition 2 is often referred to as zeroing the i2-entry and placing it in the i1-position. Note that only i1 , are affected at all. Note that (n — 1) Givens rotation could be used on a vector v to get a multiple of e1 but one Householder transformation will do the job. We now present the algorithm.
Algorithm 12.3.1
Given AECTM XII Assume
m n. If m n, work
with A*.
(I) Perform at most 2n — to get
1
Householder transformations Q,, H1 to A
QTAH=Q((QIA)H2.H)=[:]BEcAx. where q1
B=
e2 q2
0 (1)
0 Note B is bidiagonal.
(II) II some e. =0, then B =
[1
]
and the singular value
decompositions of B1 ,B2 may be computed separately. (III) If all e1 0 but some =0, pre-multiplication by (n — k) Given s
COMPUTATIONAL CONCERNS 253
rotation G1 gives q1
—
e2
0
0
0
—
0
I
0 I
Lo
0
ñ2
Application of Givens rotations on the right to produces a matrix with zero last column. (IV) Steps II, III have reduced the problem to computing the singular value decomposition of B in the form (1) where all q. 0, all the e. 0. Say B is n x n. Now set B1 = B, where Uk,Vk are = is upper orthogonal and to be defined. The Uk, Vk are chosen so that bidiagonal for all k and the (n — 1, n)-entry of Bk goes to zero as k —' Thus after a finite number of steps
01
B= k
0
0
4
and 0 is less than some agreed-on level of precision. Hence
Bk=[
¶].
(2)
(V) Step IV (and possibly II, III) is repeated on terminates in at most n — 1 repetitions to give UAV
=
in (2). The process
[1] where D is diagonal.
and U, V are unitary. (IV) Compute A? = U*[Dt 0] V'.
This completes our outline of the algorithm. We shall now discuss some of the steps in more detail.
Step I
If A = [a1 , a2,... , a], calculate Q1
Q1a1 is a multiple of e1. Let Q1A so
that
rr1l =
by Proposition 1 so that
Use Proposition 1 to find an
LrmJ (That is,
254 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
H2[] 0 i
(Q1A)H2 =
(2)
:
:
LOx {
I
x
Now repeat this procedure on the lower right hand block of (3). Note that
if H is an n — 1 x n —1 Householder transformation, then
Lo n x n Householder transformation. Continuing, we get (1) in at most 2n — I Householder transformations.
Step II
an
Clear.
Step I/I
Suppose ek 0 but =0. The rotation is defined as in Proposition 2 and places a zero in the (k,j)-position, the rotation effecting only the kth and jth rows of B.
Step IV This is the most difficult part. It constitutes a modified implicit QR-procedure. Suppose B is in the form (1) with all 0. Set B1 = B. Suppose that B,,, has been calculated. We shall show how to get Let Bm be in the form (1), B,,, ben x n. Set Bm+
1=
—
— —
+
1)/2e,,q,,_1, and
—
f[f+ (1 +f2)112J if f0
[f— (1 +f2)"2] iff< 0. + e,,2 — e,,q,,_ 1/t.
Set a = Now — cr1 has [x, x, 0, ... , for a first column. Calculate by Proposition 2 a rotation R1 to zero out the (2,1)-entry of BBM — al. This rotation would involve only the first two columns. Thus
xx x
x
e3
q3.
0 A series of Givens rotations are now performed on BR1 to return it to
bidiagonal form. Let R1 operate on columns i and I + 1 ;T1 operate on rows i and i + 1. Then B,,,.,.1
(4)
Each rotation takes a non-zero entry, zeros it, and sends the non-zero part
COMPUTATIONAL CONCERNS 255
to a new place. The pattern is as follows for a 5 x 5:
0
X
X
21
X
X
24
23
X
X
26
zs
x
x
0
non-zero at stage i and zero at the other stages. Thus T1 zeros and creates a non-zero entry at z2. R2 zeros z2 and creates a non-zero entry at z3. The B, computed in this way will again be bidiagonal and in the form of (1). (This requires proof.) Steps V and IV are reasonably self-explanatory. Here
4.
is
(1)-inverses
When dealing with generalized inverses that are not uniquely defined, one
cannot talk about error in the same sense as we did in Sections 1 and 2. Probably the easiest way to compute a (1)-inverse is as follows. By reducing to echelon form, one can find permutation matrices P. Q so that
PAQ
U is square and rank (U) = rank (A). Then
=
'
a (1)-inverse of A. Note that since there is some choice as to which columns go into U one might in some cases be able to control to some extent how well conditioned U is. How to check a computed inverse will depend on how it is being used. For example, even if A is invertible it is sometimes possible to find X so that AX — I is small but XA — I is large. is
Example 124.1 ABA
—
A=
I
L0
Let A
0], =
B
Then
=
I which is small. Thus B comes
—CJ
Iii
ru
(1)-inverse. However B gives BL
i.j = Lo]
which has the actual solution x =
.
as
to being a
a solution of Ax
['_
Ii = Li 1
is
the B is small and (1)-inverses solve consistent systems. As Example I shows some care must be taken. If row operations are to be used to compute A, then a well-conditioned matrix is desirable.
5.
Computation of the Drazin inverse
Since AD
=
I)tAt for 1
Ind(A),
(1)
256 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
one might suppose that computation of the Drazin inverse would be, at
worst, just a little more work than computation of the Moore—Penrose inverse. However, there are several problems. The first difficulty is that small perturbations in A can cause arbitrarily large perturbations in AD. Unlike the Moore—Penrose, the singular value decomposition of A does not immediately help because (UAW)D is not, in general, equal to W*ADU* when U, W are unitary. The following weaker version of Theorem 2.1 shows that computation of AD does make some sense when using nonexact arithmetic. Of course, there is no theoretical difficulty if exact arithmetic is used and A is known exactly. xn Theorem 12.5.1 IfAeC" and {X,} is such that {X.AX, — X,}, {AX, — X,A}, IAk.I. 'X, — all converge to zero and there is an L such X
Lfor all r, then Xr —, A".
X,
that
Proof
Suppose {Xj, A satisfy the assumptions of Theorem 1. Suppose of {X,.} X, does not converge to AD. Then there is a subsequence
such that 11X5—A1 e>Oforallsand some e>O. Since {Xj is bounded, it has a subsequence {Xj which converges. Let X0 = urn X,. Then X0
AD and X0A = AX0, X0AX0 = X0,
1X0 = A. Thus X = AD
U which is a contradiction. Hence X, The next theorem is true with no assumption on md (A) but the proof came too late for inclusion. AD.
A a Suppose that {A2X, — A), sequence of matrices such that the sequences {AX, — X,A}, and {XrAX, — X,) all converge to zero, then X, —, A
Theorem 12.5.2
Proof Suppose A has index one and {XJ is a sequence which satisfies the assumptions of Theorem 2. A is similar by some non-singular B to a matrix of the form o
o
Let BXB1
where C is invertible.
ix
(r)
=
X (r)1 Then by assumption, X4(r)j
1CX1(r) CX2(r)1 L
0
C2X1(r)
L0
iX3(r)C 01
0 ]Lx3rc o],0, O]o OJLOOJ
C2X2(r)]
—
(2)
(3)
COMPUTATIONAL CONCERNS 257
and
Xi(r)CX2(r)1 — X2(r)1 LX3(r)CX1(r) X3(r)CX3(r)J LX3(r) X4(r)J
(4)
From (2), we get X2(r) —'0, X3(r) -+0, since C is invertible. But then X4(r) 0 from (4). Equation (3) yields X1 (r) C - Hence X, —, A
as desired. U Another difficulty is that the index is not generally known. There are
exceptions, such as was the case with the Markov chains in Chapter 8. Thus to use (1) one must either get an estimate on the index or use n instead of k. However, using n can introduce large errors for moderately sized matrices. Example 12.5.1
=
and I is an r x r identity
If
] where A is 20 x 20,
Let A
for 1 r
19. Then A41 =
is considered zero, then (1) would produce
is actually
11001
[1
= 82
10
2
0]
=0, whereas AD
0
L
Numerical experiments using (1) with reasonably conditioned matrices
have shown that one can run into difficulty even with n less than 10. The difficulty in Example 1 is the loss of 'small' eigenvalues. Suppose A = P
P1 is the Jordan canonical form for A.
Then
1
IP'•
AM=PI
OJ
L
+1J1 corresponds to an eigenvalue A. such that rounds to zero, then (1) will produce a commuting (2)-inverse and not AD. Thus in checking a computed AD, AD, it is important to check whether it reduces powers. However, if there is a 1. x t. such that rounds to A' where as in fact zero, then to machine accuracy we will have AD is not close to AD. Thus if one has a 20 x 20 matrix of low index, using (1) with n =20 could lead to erroneous answers that would not be detected. For these reasons we prefer other methods to (1). Computation of the Jordan canonical form is also very sensitive so that one should not try and compute it in full detail unless it is needed. If one is going to calculate A 11(11 AD + 1) or some such, a much better idea of the conditioning would be
If any of the t1
C(A) =
x
H P1111 P 111(11
Ill + II
II')
258 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
is the Jordan form of A. This C(A) of course depends on P. However, if it is not too large, the eigenvalues are not likely to be lost and eigenspaces are not nearly parallel. The more one is willing to lump eigenvalues together and weaken the Jordan form, the less computational difficulty there should be. Let N be an n x n nilpotent Jordan block. Then, N is nilpotent with index n and, of course, ND =0. Zero is the only eigenvalue of N. Now let N be the same as N except for an >0 in the (n, 1) place. Then N— = £ but R has n eigenvalues of modulus The singular values of N, however, are (n — 1) ones and e, while the singular values of N are (n — 1) ones and zero. If c = 10-20 n = 20 then has an eigenvalue of 0.1 whereas if N — if = 10 20• It is because of this stability of the singular values, in particular the zero ones, and the instability of the zero eigenvalues that we suggest the following method for calculating This method is based on the orthogonal deflation method in [39]. Since there are established subroutines for calculating singular value decompositions, it should be fairly easy to implement. where
Algorithm 12.5.1 (I) Given A, calculate the singular value decomposition of
A=U[f If
then AD = A' =
If Ind(A)> 0, write 0 0
VAV*_VUF1
(II) Now calculate the singular value decomposition of
If
is invertible, go to Step IV. If not, then
111 0] VAU)V*_VU ii 'Lo 1
0 0
1
Thus
0 0 0
L
0
1
= 1
o] 0J
(4)
COMPUTATIONAL CONCERNS 259
(III) Continue in the manner of Step II, at each step calculating the and performing the appropriate singular value decomposition of =0, then AD =0. If some multiplication as in (4) to get is non-singular, go to Step IV. If some (IV) We now have k = Ind(A) and 0
0
01
FB,
WAW*=
=1 31
lc+1.I
*
I
k+1.2
1B
•.
k+l.k
L
I
(5)
I
2
I
0
where B1 is invertible, N is nilpotent. and W is unitary. The Drazin inverse of(S) may be computed as 0 0
where X is the solution of XB1 — NX = B2. The rows x1 of X are recursively r. the ith row of B2, then x1 = r1B1 and solved as follows. If N = x1
= (ri +
Note that the singular value decompositions are performed on
successively smaller matrices. Also the unitary matrix W is the product of
matrices of the form ['
and the size of V. decreases with i. Thus the
amount of computation decreases on each step. If k is not large in comparison with the nullity of the core of A, this method seems reasonable. If one suspects that k is comparable to the nullity of the core of A, then it might be better to use a method like the double Francis QR algorithm and get
U*AU=T where T is upper-triangle with the zero diagonal entries listed first. Thus
O.xl 1' A
0 —
0
x
[N B1 L° B2
—I
where N is nilpotent and B2 is invertible. Then Theorem 7.8.1 can be
used on T. Since the singular value decomposition is considered as reliable a way
(6)
260 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
any of determining numerical rank, the deflation algorithm produces a reliable value of the index. On the other hand, (6) provides information on the eigenvalues of A. As pointed out earlier if eigenvalues are small when taken to the kth power, then the algebraic definition cannot be used to numerically distinguish between AD and some commuting (2)-inverse. Thus the added information provided by (6) could be helpful. If one does an operational count on finding the Drazin inverse of a 20 x 20 matrix with core-rank 8 and index 3, then the deflation method entails about the same work as the power method using n = 20, and avoids the risk of losing small eigenvalues. If the deflation method reveals a small index, then (1) can be computed as a check using k instead of n. Note that R(I — ADA) can be immediately read off from (5) or (6) without any additional effort. Another method that has some strong points is the one given in Theorem 7.8.2. There successive full-rank factorizations of successively smaller matrices are performed until an invertible matrix is reached. This method involves only elementary row operations, and matrix multiplications. It is fairly easy to program. as
6.
Previous algorithms
In other parts of this book we have presented several algorithms. This section will list the algorithms and those parts of the book that discuss computation. In general, the algorithms were only compared for operational counts. Error analyses are not given. The methods are all fairly easy to program and when tested by the authors worked well for small wellconditioned matrices (less than 10 x 10). All algorithms terminate in a finite number of steps.
1. Algorithm 1.3.1, (page 16). Computes A' from geometric definition. 2. Algorithm 1.3.2, (page 16). Computes At by computing a full rank factorization. 3. Algorithm 3.2.1, (page 51). Computes At when A is rank one modification of a matrix B for which Bt has been computed. 4. Algorithm 3.3.1, (page 56). Computes A' by a sequence of rank one modifications. 5. Algorithm 3.3.2, (page 57). Computes A' when A is a hermitian matrix. Computation of any A may be reduced to computing the Moore— Penrose inverse of a hermitian matrix. 6. Algorithm 7.2.1, (page 125). Computes AD by computing corenilpotent decomposition. 7. Theorem 7.5.2, (page 130). Computes AD from eigenvalues of A and their multiplicities.
COMPUTATIONAL CONCERNS 261
Algorithm 7.5.1, (page 134). Computes AD by a finite number of recursively defined operations. 8.
9. Section 8.5 Discusses computation of A and w is the fixed probability vector. Of course, throughout the text there are results, on partitioned matrices for example. that can be useful in special cases. The results of Chapter 10 on continuity of generalized inversion may be useful in error analysis. 7. 1.
Exercises (Alternate Proof of Theorem 12.2.1 for real matrices.) Take AERMXI*. Definef
x
x
x
f(X)=(AXA — A, XAX — X, AX — X*A*, XA — A*X*).
(1)
At any the derivative off, denotedf'(X0), is a linear X x x '"into transformation from R'" m x R" ". Show is that its value at XE R" X
f'(X0)X = (AXA, XAX — X, AX — X*A*, XA — A*X*).
(2)
Show that f'(X0) is one-to-one for all X0, and thatf is also one-to-one. Conclude thatf has a continuous inverse from its range onto R" thus proving Theorem 12.2.1 for real matrices. 2. CflXfl
by
g(X) = (AX — XA,
'X
—
Ak, XAX — X).
Show that g'(X0)X = (AX — XA, Ak+ 'X, X0AX + XAX0
- X).
Show that g'(X0) need not be one-to-one but that
is one-to-one. 3. Using Exercise 2 show that for any AEC" ",there exists a constant K such that if g(X,) —'0 and X, K for a sequence { X,), then X, -. AD. Exercises 4—9 are from [7], [8], and [13].
4. Suppose X0eC" '"satisfies (1) X0 = MB0, B0eC'" '",B0 non-singular, (ii) X0 = C0A*, C0E C" ",C0 non-singular, (iii) AX — < 1, and (iv) X0A — PR( <1, where is a multiplicative matrix ask—' cc. norm. Let Xk+i =Xk(2PR(A)—AXk). Then 5. Let A1 denote the largest eigenvalue of AA*. 110 < < 2/A1, set X0= xA* and define Xk+l = Xk(2I — AX,). Show X,—'A' as k—' cc. H
k
6.
Define
as in Exercise 5. Let X, =
as k —' cc.
A*(I —
Show X, —, At
262 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Show the convergence in Exercise 6 is of the first order, the convergence in Exercise 5 is of the second order. 8. Let u beas in Exercise 5. Let Show Zk —, AAt as k —, and '1R(A) — Zk+l — Zk 112 7.
II
9.
be as in Exercise 8. Show Let rank A.
II
converges monotonically to
Exercises 11—16 are from [14], [51]. Then A = M — N is a proper splitting if R(A) = R(M), 10. Let N(A) = N(M). Let p(S) denote the spectral radius. In Exercises 11—16, M, N is to be considered a proper splitting of A. =MtNX1÷Mt where A =M —Nis a proper splitting of A. 11. Let —, At if and only if p(MtN) < 1. Show 12. Show that if M4 isa (1,4)-inverse forM, then (I — is well-defined and a (1,4)-inverse for A. 13. Show that if M is a (1,3)-inverse for M, then (I — is well-defined and a (1,3)-inverse for A. 14. Show that At = (I — MtN) 'Me.
15. Show that if M is a (1,3) or (1,4)-inverse and Xk+i = MNXk + M, then Xk+l converges if and only if p(MN) < 1. If it converges it converges to a (1,3) or (1,4)-inverse, respectively, 16. Show that if A has full column rank, there exists a proper splitting of
A such that Mt = M. 17.
18. Prove Proposition 12.3.1. 19. Prove Proposition 12.3.2. Prove Theorem 12.5.2 without assuming Ind(A) 1. (See 'Continuity of the Drazin Inverse' by S. L. Campbell (to appear) for details.)
Bibliography
mentioned in Chapter 0, reference [64] has an annotated 1775 item bibliography on generalized inverses. Accordingly, we have made no attempt to be complete. The references listed fall into three groups. Most are explicitly mentioned in the text. We have also referenced only that part of our work that appears in the book and was co-authored with others, principally N. J. Rose. Finally, the idea of the Drazin inverse has recently proved useful in the study of singularly perturbed autonomous systems. This recent work, [19], [20], [22], [25], [26] and other related applications, [21], [58], [61] for example, were not included in the text due to page and time limitations, but are included in the references. I Athens, M. and Falb, P. L. Optima! Control. McGraw-Hill, New York, 1966. 2 Anderson, W. N. Jr. Shorted operators, SIAM J. app!. Math. 20, As
520—525, 1971.
Anderson, W. N. Jr. and Duflin, R. J. Series and parallel addition of matrices. J. Math. Anal. App/ic. 26, 576—594, 1969. 4 Anderson, W. N. Jr., Duffin, R. J. and Trapp, 0. E. Matrix operations induced by network connections. SIAM J. Control 13, 3
446—461, 1975. 5
Anderson, W. N. Jr. and Trapp, 0. E. Shorted operators II. SIAM J. app!. Math. 28, 60-71, 1975.
Bart, H., Kaashoek, M. A. and Lay, D. C. Relative inverses of meromorphic operator functions and associated holomorphic projection functions. Mash. Ann. (to appear). 7 Ben-Israel, A. An iterative method for computing the generalized inverse of an arbitrary matrix, Math. Comp. 19,452—455, 1965. 8 Ben-Israel, A. A note on an iterative method for generalized inversion of matrices. Math. Comp. 20,439—440, 1966. 9 Ben-Israel, A. On error bounds for generalized inverses. SIAM J. nwner. Anal. 3, 585-592, 1966.
6
264 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS 10 11
Ben-Israel, A. and Charnes, A. Generalized inverses and the BottDuffin network analysis, J. math. Anal. Applic. 7,428—435, 1963. Ben-Israel, A. Linear equations and inequalities on finite dimensional, real or complex vector spaces: a unified theory. J. math. Anal. Applic. 27, 367—389, 1969.
12
Ben-Israel, A. A note on partitioned matrices and equations. SIAM Review 11,247—250, 1969.
13
Ben-Israel, A. and Cohen, D. On iterative computation of generalized inverses and associated projections. SIAM I. numer. Anal. 3,410—419, 1966.
14
Berman, A. and Plemmons, R. J. Cones and iterative methods for best least-squares solutions of linear systems. SIAM J. numer. Anal. 11, 145—154, 1974.
15
Businger, P. A. and Golub, 0. H. Algorithm 358 singular value decomposition of a complex matrix. Communications of ACM 12, 564—565, 1969.
16
Campbell, S. L. Differentiation of the Drazin inverse. SIAMJ. appi. Math. 30, 703—707, 1976.
17
Campbell, S. L. The Drazin inverse of an infinite matrix. SIAM).
18
app!. Math. 31,492—503, 1976. Campbell, S. L. Linear systems of differential equations with singular coefficients. SIAM J. math. Anal. 8, 1057—1066, 1977.
Campbell, S. L. On the limit of a product of matrix exponentials. Linear multilinear AIg. 6, 55—59, 1978. 20 Campbell, S. L. Singular perturbation of autonomous linear systems II. J. Eqn. 29, 362-373, 1978. 21 Campbell, S. L. Limit behavior of solutions of singular difference 19
equations. Linear Aig. applic. 23, 167—178, 1979.
Campbell, S. L. Singular perturbation of autonomous linear systems IV (submitted). 23 Campbell, S. L. and Meyer, C. D. Jr. Recent applications of the Drazin inverse. In Recent Applications of Generalized Inverses M. Nashad, Ed. Pitman Pub. Co., London, 1979. 24 Campbell, S. L. Meyer, C. D. Jr. and Rose, N. J. Applications of the Drazin inverse to linear systems of differential equations. SIAM I. 22
app!. Math. 31,411—425, 1976. 25
Campbell, S. L. and Rose, N. J. Singular perturbation of autonomous linear systems. SIAM). math. Anal. 10, 542—551, 1979.
26
Campbell, S. L. and Rose, N. J. Singular perturbation of autonomous linear systems III. Houston J. Math. 44, 527—539, 1978.
Cederbaum, I. On equivalence of resistive n-port networks. IEEE Trans. Circuit Theory, Vol. CT-l2, 338—344, 1965. 28 Cederbaum, I. and Lempel, A. Parallel connection of n-port networks. IEEE Trans. Circuit Theory, Vol. CT-14, 274—279, 1967. 29 Churchill, R. V. Operational Mathematics. McGraw-Hill, New York, 27
1958.
BIBLIOGRAPHY 265 30 31
32
Cline, R. E. Representations for the generalized inverse of sums of matrices. SIAM J. nwner. Anal. Series B, 2, 99—114, 1965. Cline, R. E. Representations for the generalized inverse of a partitioned matrix. SIAM J. app!. Math. 12, 588—600, 1964. Drazin, M. P. Pseudoinverses in associative rings and semigroups. Amer. Math. Month!)' 65, 506—5 14, 1968.
33
34 35
36 37
38
39
40
41
Faddeev, D. K. and Faddeeva, V. N. Computational Methods of Linear Algebra, (translated by Robert C. Williams). W. H. Freeman and Co., San Francisco, 1963. Gantmacher, F. R. The Theory of Matrices, Volume II. Chelsea Publishing Company, New York, 1960. Gallie, 1. M. Calculation of the generalized inverse of a matrix, Technical Report CS-1975-7, Computer Science Department, Duke University. Golub, 0. and Kahan, W. Calculating the singular values and pseudoinverse of a matrix. SIAM J. numer. Anal. Series B., 2, 205—224, 1965. Golub, 0. H. and Pereyra, V. The differentiation of pseudo-inverses and non-linear least squares problems whose variables separate. SIAM /. numer. Anal. 10,413—432, 1973. Golub, 0. H. and Wilkinson, J. H. Ill conditioned eigensystems and the computation of the Jordan canonical form. Technical Report, STAN-CS-75-478. Golub, G. H. and Reinsch, C. Singular value decomposition and least squares solutions. Numer. Math. 14, 403—420, 1970. Greville, T. N. E. Spectral generalized inverses of square matrices. MRC Tech. Sum. Rep. 823, Mathematics Research Center, University of Wisconsin, Madison, 1967. Greville, T. N. E. The Souriau-Frame algorithm and the Drazin pseudoinverse. Linear Alg. Applic. 6, 205—208, 1973.
42
43
44 45
46 47 48
49 50
Hakimi, S. L. and Manherz, R. K. The generalized inverse in network analysis and quadratic error-minimization problems. IEEE Trans. Circuit Theory Nov., 559—562, 1969. Householder, A. S. The Theory of Matrices in Numerical Analysis. Blaisdell Publishing Co., New York, 1964.
Huelsman, L. P. Circuits, Matrices, and Linear Vector Spaces. McGraw-Hill, New York, 1963. Jacobson, D. H. Totally singular quadratic minimization problems. IEEE Trans. Automatic Corn. 16, 651—657, 1971. Kemeny, J. 0. and Snell, J. L. Finite Markov Chains. D. Van Nostrand Company, New York, 1960. Kemeny, J. 0. Snell, J. L. and Knapp, A. W. Denumerable Markov Chains. D., Van Nostrand Company, New York, 1966. Linear Inequalities and Related Systems (Kuhn, H. W. and Tucker, A. W. Eds.), Princeton University Press, Princeton, N. J., 1956. Lancaster, P. Theory of Matrices. Academic Press, New York, 1969. Lay, D. C. Spectral properties of generalized inverses of linear
266 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
operators. SIAM J. app!. Math. 29, 103—109, 1975. 51
Lawson, L. M. Computational methods for generalized inverse matrices arising from proper splittings. Linear Aig. Applic. 12, 111—126, 1975.
Lawson, C. L. and Hanson, R. J. Solving Least Squares Problems. Prentice-Hall, New Jersey, 1974. 53 Meyer, C. D. Jr. Generalized inverses of triangular matrices. SIAM J. app!. Math. 18,401—406, 1970. 54 Meyer, C. D. Jr. Generalized inverses of block triangular matrices. 52
55
SIAM J. app!. Math. 19, 741—750, 1970. Meyer, C. D. Jr. The Moore—Penrose inverse of a bordered matrix. Linear Aig. Applic. 5, 375—381, 1972.
56
Meyer, C. D. Jr. Generalized inversion of modified matrices. SIAM I. app!. Math. 24, 315—323, 1973.
57
Meyer, C. D. Jr. An alternative expression for the mean first passage matrix. Linear AIg. Applic. 22,41—47, 1978.
Meyer, C. D. Jr. and Plemmons, R. J. Convergent powers of a matrix with applications to iterative method for singular linear systems. SlAM J. numer. Anal. 14,699—705, 1977. 59 Meyer, C. D. Jr. and Rose, N. J. The index and the Drazin inverse of block triangular matrices. SIAM J. app!. Math. 33, 1—7, 1976. 60 Meyer, C. D. Jr. and Shoaf, J. M. Updating finite Markov chains by 58
using techniques of generalized matrix inversion. J. Stat. Comp. and Simulation, II, 163—181, 1980. 61
Meyer, C. D. Jr. and Stadelmaier, M. W. Singular M-matnces and inverse positivity. Linear Aig. Applic. 22, 129—156, 1978.
62 63
64 65 66 67
Mihalyffy, L. An alternative representation of the generalized inverse of partitioned matrices, Linear Aig. Applic. 4,95—100, 1971. Moore, R. H. and Nashed, M. Z. Approximations to generalized inverses. University of Wis., MRC Tech. Summary Report # 1294. Generalized Inverses and Applications (Nashed, M. Z. Ed.), Academic Press, New York, 1976. Noble, B. Applied Linear Algebra. Prentice-Hall, New Jersey, 1969. Pearl, M. Matrix Theory and Finite Mathematics. McGraw-Hill, Inc., New York, 1973. Penrose, R. A generalized inverse for matrices. Proc. Cambridge Phil. Soc. 51,406—413, 1955.
68
Penrose, R. On best approximate solutions of linear matrix equations.
69
Proc. Cambridge Phil. Soc. 52, 17—19, 1955. Pyle, L. D. The generalized inverse in linear programming. Basic structure. SIAM I. app!. Math. 22, 335—355, 1972.
70
Pyle, L. D. and Cline, R. E. The generalized inverse in linear programming—interior gradient projection methods. SIAM J. app!. Math. 24, 511—534, 1973.
71
Rao, C. R. Some thoughts on regression and prediction, Part 1. Sankhyã 37, Series C., 102—120, 1975.
72
Rao, J. V. V. Some more lepresentations for the generalized inverse of a partitioned matrix. SIAM J. app!. Math. 24, 272—276, 1973.
BIBL$OGRAPHV 267 73
74
Rao, T. M. Subramanian, K. and Krishnamurthy, E. V. Residue arithmetic algorithms for exact computation of g-inverses of matrices. SIAM.!. numer. Anal. 13, 155—171, 1976. Robert, P. On the group inverse of a linear transformation. J. math. Anal. App/ic. 22, 658—669, 1968.
75 76 77
Robertson, J. B. and Rosenberg, M. The decomposition of matrix valued measures. Mich. math. J. 15, 353—368, 1968. Rosenberg, M. Range decomposition and generalized inverse of non-negative hermitian matrices. SIAM Review 11, 568—571, 1969. Rose, N. J. A note on computing the Drazin inverse. Linear Aig. App/ic. 15, 95—98, 1976.
Rose, N. J. The Laurent expansion of a generalized resolvent with some applications. SIAM J. app!. Math. (to appear). 79 Schwerdtfeger, H. Introduction to Linear Algebra and the Theory of Matrices. P. Noordhoff, N. V., Groningen, Holland, 1961. 80 Scholnik, H. D. A new approach to linear programming, preliminary report. 81 Shinozaki, N., Sibuya, M. and Tanabe, K. Numerical algorithms for the Moore—Penrose inverse of a matrix: direct methods. Ann. Inst. statist. Math. 24, 193—203, 1972. 82 Shinozaki, N., Sibuya, M. and Tanabe, K. Numerical algorithms for the Moore—Penrose inverse of a matrix: iterative methods. 78
Ann. Inst. statist. Math. 24, 621 —629, 1972. 83
Shoaf, J. M. The Drazin inverse of a rank-one modification of a square matrix. Ph.D. Dissertation, North Carolina State University, 1975.
Simonnard, M. Linear Programming. Prentice-Hall, Inc., Englewood Cliffs, N. J., 1966. 85 Smythe, W. R. and Johnson, L. A. Introduction to Linear Programming, with Applications. Prentice-Hall, Inc., Englewood Cliffs, N. J., 1966. 86 Söderstrom, T. and Stewart, G. W. On the numerical properties of an iterative method for computing the Moore—Penrose generalized inverse. SIAM.!. numer. Anal. 11, 6 1—74, 1974. 87 Stallings, W. T. and Boullion, T. L. Computation of pseudo-inverse matrices using residue arithmetic. SIAM Review 14, 152—537, 1972. 88 Stewart, 0. W. On the continuity of the generalized inverse. SIAM J. 84
app!. Math. 17, 33—45, 1968.
Stewart, G. W. On the perturbation of pseudo-inverses, projections and linear least squares problems. SIAM Review, 19, 634—662, 1977. 90 Szabo, S. and Tanaka, ft. Residue Arithmetic and Its Application to Computer Technology. McGraw-Hill, New York, 1967. 91 Wedin, P.-A. Perturbation bounds in connection with singular for information Behandling value decomposition. Nordisk 89
12,99—111,
1972.
92 Young, D. M. and Gregory, R. T. A
Mathematics, Vol.
Survey of Numerical
2, Addision-Wesley, Reading, Mass. 1973.
Index
Absorbing, chain, 152, 165, 169 state, 152 Absorbtion probability, 166 Adjoining a row and/or a column, 54 Algorithms, 260 Analytic matrix valued function, 227 matrices, 109, 118 Backward population projection, 184
Best linear unbiased estimate, 109 Block, form, 3 triangular matrix, 61 Blue, 109,112,113,114 Bou—Duflln inverse, 117
Calculation of the Moore-Penrose inverse, 247 Canonical form for the Drazan inverse, 122 Cauchy's inequality, 234 Characteristic polynomial, 132, 207
aosest vector, 29 Coethcient of determination, 35 Common solution, 98 Commutator-(X,YJ, 23 Commuting weak Drazin inverse, 203 Complementary subspace, 3 Component matrices, 201 Computation of the Drazm inverse, 255 Computational concerns, 246 Condition number, 224,247 Conformable, 53
Conjugate transpose, 2 Consistent, initial vector, 172, 182 model, 106 norms, 212 systems, 93
Constrained, generalized inverse, 68 least squares solutions, 65 minimization, 63
Continuity, of generalized inverses, 210 of the Drazin inverse, 232 of the index, 233 of the Moore—Penrose inverse, 216 Core of a matrix, 127 Core—nilpotent decomposition, 128 Curve fitting, 39 Cyclic chain, 152
Deflation, 258 Derivative, of matrix valued function, 226 of Moore—Penrose inverse, 227 Design matrix, 32 Difference equations—systems, 181 Differential equations—systems, 171 Direct sum, 3 Directed graph, 117 Discrete control problem, 197 Distinguished columns, IS Doubly stochastic matrix, 163 Drazin inverse, a polynomial, 130
270 INDEX algebraic definition, 122
asa limit, 136 functional definition, 122 of a partitioned matrix, 139 Dual, 238 Electrical engineering, 77 EP, matrices, 208 matrix, 74, 129
Equation solving inverses, 93 Equivalent norms, 212 Ergodic, chain, 152 158 set, 151, 166 state, 151, 165 Estimable, 106 Estimating equation, 34 EucLidean norm, 28 Expected, absorption times, 169 number of times in a state, 158
Farkas theorem, 239 Feasible solution, 238 Final space, 72 Finite Markov chain, 151 Fitting a linear combination of functions, 43 Fixed probability vector, 156, 160, 162,208 Flat, 31 Full rank factorization, 14, 15, 16, 148, 149,249 Functional, error, 30 relationship, 30 Functions of a matrix, 200 Fundamental matrix of constrained minimization, 63 Gaussian elimination, 246, 250 Gauss—Markov theorem, 105 General solution of AXB = C, 97 Generalized, eigenvector of grade P, 129 inverse, functional definition, 8 Moore definition, 9 of a sum, 46 Penrose definition, 9
Newton method, 118 Givens rotation, 249, 252 Goodness of fit, 34, 38,39 Group inverse, 124 of a block triangular matrix, 143 Hermite, echelon form, 15 form, 115, 117 Hermitian, 2 Homogeneous, Markov chain, 151 systems—differential equations, 175 Householder transformation, 249,251 Idempotent matrices, 201 Impedance, 78 matrix, 80 Inconsistent systems, 94 Index, of a block triangular matrix, 142 of a matrix, 121, 138 of a product, 150 of zero eigenvalue, 132 Initial, space, 72 value problem, 172 Inputs, 78 Integer solution, 116 Integration of exponential, 176 Intercept hypothesis, 32 Invariant subspace, 4, 7 Inverse-,(i,j, k), 91, 95
(l),96 (l,2),96 (1, 2, 3, 4), 96
(1,3), 96 (1,3,4), 25 (1,4), 96 Inverse function, 8 Isometry, 71 Iterative methods, 250, 261,262
Kirchhoff law, 78 Kronecker product, 115 I_east,
squares generalized inverse, 95 squares solution, 28
INDEX
Leslie, matrix, 185
population model, 184 Limited probability, 168 Limits for transition matrices, 155 Linear, estimation, 104 functional reLationship, 30 hypothesis, 30 model, 105 programming, 237 unbiased estimates, 107 Linearly unbiased estimate, 106 Markov process, 151 Matrix, norm, 211 valued functions, 224 Maximal element, 127 Mean first passage matrix, 159 Measurable matrix valued functions, 235
Measurement errors, 30 Meromorphic matrix valued function, 228
Minimal, least squares solution, 28 norm constrained least squares solutions, 67 polynomial, 132,207 rank weak Drazin inverse, 203 V-norm solution, 117 V-seminorm W-least squares solution, 118 Minimum, norm generalized inverse, 94 variance linear unbiased estimate, 107 Modified matrices, 51 Moore—Penrose inverse, 10,92 as an integral, 26 of block triangular matrix, 62 of partitioned matrix, 54, 55,58 of rank 1 matrix, 25 n-port network, 78 Nilpotent part of a matrix, 128 No intercept hypothesis, 32 Nonlinear least squares, 229 Norm, matnx, 211
271
operator, 211 vector, 210 Normal, 5 equations, 29 NR-Generalized inverse, 92 Null space, 2
Oblique projector, 92 Operation, 52 Operator norm, 211 Optimal, control, 187, 189 value, 239 vector, 238 Orthogonal projector, 2, 3, 29, 92, 218 Outputs, 78 p-norm, 211 Parallel sum, 82, 84 Partial isometry, 72 Partitioned, matrices, 116 (1,2)-inverse 100 (1)-inverse 98 matrix, 53
Perturbation of the Moore-Penrose inverse, 224
Polar fonn, 5, 73 Polyhedral cone, 244 Port, 78 Positive semi-definite, 102 Prescribed range null space inverse,
92,93 Product moment correlation, 35 Projective weak Drazin inverse, 203 Projector, 2 Proper splitting, 262 Properly partitioned block triangular matrix, 61 Property, of homogeneity, 78 of superposition, 78 Pyle's formulation, 241
Qudratic cost functional, 188 Range, 2 Rank, function, 225 of block triangular matrix, 104 of partitioned matrix, 102
272
INDEX
Rank 1 modification, 47, 116 Reciprocal network, 81 Rectangular systems of differential eq., 178 Reducing subspace, 4,7 Regular chain, 152, 157 Relative error, 34 Resistive network, 82 Restricted, linear model, 115 linear transformation, 9, 91, 121 model, 106 Reverse order law, 19, 26, 95, 115, 149 Row, echelon form, 14 space, 101
Shorted, matrices, 86 matrix, 87 Singular value, 6, 214 decomposition, 6, 247, 251 Slack variables, 237
Special, mapping property, 75, 205 properties of the Drazin inverse, 129 Standard basis, 47 Star cancellation law, 3 Stochastic matrix, 152 sup norm, 211
Terminals, 77 Tractable, 172, 182 Transient set, 151, 165, 168 Transition matrix, 151
Unitary matrices, 12, 72 Variance of first passage time, 160 Weak Drazin inverses, 202, 203 Weighted, generalized inverse, 117 least squares solution, 117 Moore—Penrose inverse, 118 normal equations, 115