Linear Algebra in Action Harry Dym
Graduate Studies in Mathematics Volume 78
American Mathematical Society, Providence, Rhode Island
Editorial Board
David Cox, Walter Craig, Nikolai Ivanov, Steven G. Krantz, David Saltman (Chair)
2000 Mathematics Subject Classification. Primary 15-01, 30-01, 34-01, 39-01, 52-01, 93-01.
For additional information and updates on this book, visit www.ams.org/bookpages/gsm-78
Library of Congress Cataloging-in-Publication Data
Dym, H. (Harry), 1938-
Linear algebra in action / Harry Dym.
p. cm. - (Graduate studies in mathematics, ISSN 1065-7339 ; v. 78)
Includes bibliographical references and index.
ISBN-13: 978-0-8218-3813-6 (alk. paper)
ISBN-10: 0-8218-3813-X (alk. paper)
1. Algebras, Linear. I. Title.
QA184.2.D96 2006
512'.5--dc22    2006049906
Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy a chapter for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Requests for such permission should be addressed to the Acquisitions Department, American Mathematical Society, 201 Charles Street, Providence, Rhode Island 02904-2294, USA. Requests can also be made by e-mail to reprint-permission@ams.org.
© 2007 by the American Mathematical Society.
All rights reserved. The American Mathematical Society retains all rights except those granted to the United States Government. Printed in the United States of America.
The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability.
Visit the AMS home page at http://www.ams.org/
Dedicated to the memory of our oldest son Jonathan Carroll Dym and our first granddaughter Avital Chana Dym, who were recalled prematurely for no apparent reason, he but 44 and she but 12. Yhi zichram baruch
Contents
Preface
xv
Chapter 1. Vector spaces
1
§1.1.
Preview
1
§1.2.
The abstract definition of a vector space
2
§1.3.
Some definitions
5
§1.4.
Mappings
11
§1.5.
Triangular matrices
13
§1.6.
Block triangular matrices
16
§1.7.
Schur complements
17
§1.8.
Other matrix products
19
Chapter 2. Gaussian elimination
21
§2.1.
Some preliminary observations
22
§2.2.
Examples
24
§2.3.
Upper echelon matrices
30
§2.4.
The conservation of dimension
36
§2.5.
Quotient spaces
38
§2.6.
Conservation of dimension for matrices
38
§2.7.
From U to A
40
§2.8.
Square matrices
41
Chapter 3. Additional applications of Gaussian elimination  45
§3.1. Gaussian elimination redux  45
§3.2.
Properties of BA and AC
48
§3.3.
Extracting a basis
50
§3.4.
Computing the coefficients in a basis
51
§3.5.
The Gauss-Seidel method
52
§3.6.
Block Gaussian elimination
55
§3.7.
{0, 1, ∞}
56
§3.8.
Review
57
Chapter 4. Eigenvalues and eigenvectors
61
§4.1.
Change of basis and similarity
62
§4.2.
Invariant subspaces
64
§4.3.
Existence of eigenvalues
64
§4.4.
Eigenvalues for matrices
66
§4.5.
Direct sums
69
§4.6.
Diagonalizable matrices
71
§4.7.
An algorithm for diagonalizing matrices
73
§4.8.
Computing eigenvalues at this point
74
§4.9.
Not all matrices are diagonalizable
76
§4.10. The Jordan decomposition theorem
78
§4.11. An instructive example
79
§4.12. The binomial formula
82
§4.13. More direct sum decompositions
82
§4.14. Verification of Theorem 4.12
84
§4.15. Bibliographical notes
87
Chapter 5. Determinants
89
§5.1.
Functionals
89
§5.2.
Determinants
90
§5.3.
Useful rules for calculating determinants
93
§5.4.
Eigenvalues
97
§5.5.
Exploiting block structure
99
§5.6.
The Binet-Cauchy formula
102
§5.7.
Minors
104
§5.8.
Uses of determinants
108
§5.9.
Companion matrices
108
§5.10. Circulants and Vandermonde matrices
109
Chapter 6. Calculating Jordan forms  111
§6.1. Overview  112
§6.2. Structure of the nullspaces N_{B^j}  112
§6.3. Chains and cells  115
§6.4. Computing J  116
§6.5. An algorithm for U  117
§6.6. An example  120
§6.7. Another example  122
§6.8. Jordan decompositions for real matrices  126
§6.9. Companion and generalized Vandermonde matrices  128
Chapter 7. Normed linear spaces  133
§7.1. Four inequalities  133
§7.2. Normed linear spaces  138
§7.3. Equivalence of norms  140
§7.4. Norms of linear transformations  142
§7.5. Multiplicative norms  143
§7.6. Evaluating some operator norms  145
§7.7. Small perturbations  147
§7.8. Another estimate  149
§7.9. Bounded linear functionals  150
§7.10. Extensions of bounded linear functionals  152
§7.11. Banach spaces  155
Chapter 8. Inner product spaces and orthogonality  157
§8.1. Inner product spaces  157
§8.2. A characterization of inner product spaces  160
§8.3. Orthogonality  161
§8.4. Gram matrices  163
§8.5. Adjoints  163
§8.6. The Riesz representation theorem  166
§8.7. Normal, selfadjoint and unitary transformations  168
§8.8. Projections and direct sum decompositions  170
§8.9. Orthogonal projections  172
§8.10. Orthogonal expansions  174
§8.11. The Gram-Schmidt method  177
§8.12. Toeplitz and Hankel matrices  178
§8.13. Gaussian quadrature  180
§8.14. Bibliographical notes  183
Chapter 9. Symmetric, Hermitian and normal matrices  185
§9.1. Hermitian matrices are diagonalizable  186
§9.2. Commuting Hermitian matrices  188
§9.3. Real Hermitian matrices  190
§9.4. Projections and direct sums in F^n  191
§9.5. Projections and rank  195
§9.6. Normal matrices  195
§9.7. Schur's theorem  198
§9.8. QR factorization  201
§9.9. Areas, volumes and determinants  202
§9.10. Bibliographical notes  206
Chapter 10. Singular values and related inequalities  207
§10.1. Singular value decompositions  207
§10.2. Complex symmetric matrices  212
§10.3. Approximate solutions of linear equations  213
§10.4. The Courant-Fischer theorem  215
§10.5. Inequalities for singular values  218
§10.6. Bibliographical notes  225
Chapter 11. Pseudoinverses  227
§11.1. Pseudoinverses  227
§11.2. The Moore-Penrose inverse  234
§11.3. Best approximation in terms of Moore-Penrose inverses  237
Chapter 12. Triangular factorization and positive definite matrices  239
§12.1. A detour on triangular factorization  240
§12.2. Definite and semidefinite matrices  242
§12.3. Characterizations of positive definite matrices  244
§12.4. An application of factorization  247
§12.5. Positive definite Toeplitz matrices  248
§12.6. Detour on block Toeplitz matrices  254
§12.7. A maximum entropy matrix completion problem  258
§12.8. Schur complements for semidefinite matrices  262
§12.9. Square roots  265
§12.10. Polar forms  267
§12.11. Matrix inequalities  268
§12.12. A minimal norm completion problem  271
§12.13. A description of all solutions to the minimal norm completion problem  273
§12.14. Bibliographical notes  274
Chapter 13. Difference equations and differential equations  275
§13.1. Systems of difference equations  276
§13.2. The exponential e^{tA}  277
§13.3. Systems of differential equations  279
§13.4. Uniqueness  281
§13.5. Isometric and isospectral flows  282
§13.6. Second-order differential systems  283
§13.7. Stability  284
§13.8. Nonhomogeneous differential systems  285
§13.9. Strategy for equations  285
§13.10. Second-order difference equations  286
§13.11. Higher order difference equations  289
§13.12. Ordinary differential equations  290
§13.13. Wronskians  293
§13.14. Variation of parameters  295
Chapter 14. Vector valued functions  297
§14.1. Mean value theorems  298
§14.2. Taylor's formula with remainder  299
§14.3. Application of Taylor's formula with remainder  300
§14.4. Mean value theorem for functions of several variables  301
§14.5. Mean value theorems for vector valued functions of several variables  301
§14.6. Newton's method  304
§14.7. A contractive fixed point theorem  306
§14.8. A refined contractive fixed point theorem  308
§14.9. Spectral radius  309
§14.10. The Brouwer fixed point theorem  313
§14.11. Bibliographical notes  316
Chapter 15. The implicit function theorem
317
§15.1.
Preliminary discussion
317
§15.2.
The main theorem
319
§15.3.
A generalization of the implicit function theorem
324
§15.4.
Continuous dependence of solutions
326
§15.5.
The inverse function theorem
327
§15.6.
Roots of polynomials
329
§15.7. An instructive example  329
§15.8. A more sophisticated approach  331
§15.9.
Dynamical systems
333
§15.10. Lyapunov functions  335
§15.11. Bibliographical notes  336
Chapter 16. Extremal problems  337
§16.1.
Classical extremal problems
337
§16.2. Extremal problems with constraints  341
§16.3. Examples  344
§16.4.
Krylov subspaces
349
§16.5.
The conjugate gradient method
349
§16.6.
Dual extremal problems
354
§16.7.
Bibliographical notes
356
Chapter 17. Matrix valued holomorphic functions
357
§17.1.
Differentiation
357
§17.2.
Contour integration
361
§17.3.
Evaluating integrals by contour integration
365
§17.4.
A short detour on Fourier analysis
370
§17.5.
Contour integrals of matrix valued functions
372
§17.6.
Continuous dependence of the eigenvalues
375
§17.7.
More on small perturbations
377
§17.8.
Spectral radius redux
378
§17.9.
Fractional powers
381
Chapter 18. Matrix equations  383
§18.1. The equation X - AXB = C  383
§18.2. The Sylvester equation AX - XB = C  385
§18.3. Special classes of solutions  388
§18.4. Riccati equations  390
§18.5. Two lemmas  396
§18.6. An LQR problem  398
§18.7. Bibliographical notes  400
Chapter 19. Realization theory  401
§19.1. Minimal realizations  408
§19.2. Stabilizable and detectable realizations  415
§19.3. Reproducing kernel Hilbert spaces  416
§19.4. de Branges spaces  418
§19.5. R_α invariance  420
§19.6. Factorization of Θ(λ)  421
§19.7. Bibliographical notes  425
Chapter 20. Eigenvalue location problems  427
§20.1. Interlacing  427
§20.2. Sylvester's law of inertia  430
§20.3. Congruence  431
§20.4. Counting positive and negative eigenvalues  433
§20.5. Exploiting continuity  437
§20.6. Gersgorin disks  438
§20.7. The spectral mapping principle  439
§20.8. AX = XB  440
§20.9. Inertia theorems  441
§20.10. An eigenvalue assignment problem  443
§20.11. Bibliographical notes  446
Chapter 21. Zero location problems  447
§21.1. Bezoutians  447
§21.2. A derivation of the formula for H, based on realization  452
§21.3. The Barnett identity  453
§21.4. The main theorem on Bezoutians  455
§21.5. Resultants  457
§21.6. Other directions  461
§21.7. Bezoutians for real polynomials  463
§21.8. Stable polynomials  464
§21.9. Kharitonov's theorem  466
§21.10. Bibliographical notes  467
Chapter 22. Convexity  469
§22.1.
Preliminaries
469
§22.2. Convex functions  471
§22.3. Convex sets in R^n  473
§22.4.
Separation theorems in lR n
475
§22.5.
Hyperplanes
477
§22.6.
Support hyperplanes
479
§22.7.
Convex hulls
480
§22.8.
Extreme points
482
§22.9.
Brouwer's theorem for compact convex sets
485
§22.1O. The Minkowski functional
485
§22.11. The Gauss-Lucas theorem
488
§22.12. The numerical range
489
§22.13. Eigenvalues versus numerical range
491
§22.14. The Heinz inequality
492
§22.15. Bibliographical notes
494
Chapter 23. Matrices with nonnegative entries
495
§23.1.
Perron-Frobenius theory
496
§23.2.
Stochastic matrices
503
§23.3.
Doubly stochastic matrices
504
§23.4.
An inequality of Ky Fan
507
§23.5.
The Schur-Horn convexity theorem
509
§23.6.
Bibliographical notes
513
Appendix A.
Some facts from analysis
515
§A.1.
Convergence of sequences of points
515
§A.2.
Convergence of sequences of functions
516
§A.3.
Convergence of sums
516
§A.4.
Sups and infs
517
§A.5.
Topology
518
§A.6.
Compact sets
518
§A.7.
Normed linear spaces
518
Appendix B. More complex variables  521
§B.1. Power series  521
§B.2. Isolated zeros  523
§B.3. The maximum modulus principle  525
§B.4. ln(1 - λ) when |λ| < 1  525
§B.5. Rouché's theorem  526
§B.6. Liouville's theorem  528
§B.7. Laurent expansions  528
§B.8. Partial fraction expansions  529
Bibliography
531
Notation Index
535
Subject Index
537
Preface
A foolish consistency is the hobgoblin of little minds, ...
Ralph Waldo Emerson, Self-Reliance

This book is based largely on courses that I have taught at the Feinberg Graduate School of the Weizmann Institute of Science over the past 35 years to graduate students with widely varying levels of mathematical sophistication and interests. The objective of a number of these courses was to present a user-friendly introduction to linear algebra and its many applications. Over the years I wrote and rewrote (and then, more often than not, rewrote some more) assorted sets of notes and learned many interesting things en route. This book is the current end product of that process.

The emphasis is on developing a comfortable familiarity with the material. Many lemmas and theorems are made plausible by discussing an example that is chosen to make the underlying ideas transparent in lieu of a formal proof; i.e., I have tried to present the material in the way that most of the mathematicians that I know work rather than in the way they write.

The coverage is not intended to be exhaustive (or exhausting), but rather to indicate the rich terrain that is part of the domain of linear algebra and to present a decent sample of some of the tools of the trade of a working analyst that I have absorbed and have found useful and interesting in more than 40 years in the business. To put it another way, I wish someone had taught me this material when I was a graduate student. In those days, in the arrogance of youth, I thought that linear algebra was for boys and girls and that real men and women worked in functional analysis. However, this is but one of many opinions that did not stand the test of time.
In my opinion, the material in this book can be (and has been) used on many levels. A core course in classical linear algebra topics can be based on the first six chapters, plus selected topics from Chapters 7-9 and 13. The latter treats difference equations, differential equations and systems thereof. Chapters 14-16 cover applications to vector calculus, including a proof of the implicit function theorem based on the contractive fixed point theorem, and extremal problems with constraints. Subsequent chapters deal with matrix valued holomorphic functions, matrix equations, realization theory, eigenvalue location problems, zero location problems, convexity, and matrices with nonnegative entries.

I have taken the liberty of straying into areas that I consider significant, even though they are not usually viewed as part of the package associated with linear algebra. Thus, for example, I have added short sections on complex function theory, Fourier analysis, Lyapunov functions for dynamical systems, boundary value problems and more. A number of the applications are taken from control theory.

I have adapted material from many sources. But the one which was most significant for at least the starting point of a number of topics covered in this work is the wonderful book [45] by Lancaster and Tismenetsky.

A number of students read and commented on substantial sections of assorted drafts: Boris Ettinger, Ariel Ginis, Royi Lachmi, Mark Kozdoba, Evgeny Muzikantov, Simcha Rimler, Jonathan Ronen, Idith Segev and Amit Weinberg. I thank them all, and extend my appreciation to two senior readers, Aad Dijksma and Andrei Iacob, for their helpful, insightful remarks. A special note of thanks goes to Deborah Smith, my copy editor at AMS, for her sharp eye and expertise in the world of commas and semicolons. On the production side, I thank Jason Friedman for typing an early version, and our secretaries Diana Mandelik, Ruby Musrie, Linda Alman and Terry Debesh, all of whom typed selections, and Diana again for preparing all the figures and clarifying numerous mysterious intricacies of LaTeX. I also thank Barbara Beeton of AMS for helpful advice on AMS LaTeX.

One of the difficulties in preparing a manuscript for a book is knowing when to let go. It is always possible to write it better.¹ Fortunately AMS maintains a web page, http://www.ams.org/bookpages/gsm-78, for sins of omission and commission (or just plain afterthoughts).

TAM, ACH TEREM NISHLAM...

October 18, 2006
Rehovot, Israel

¹ Israel Gohberg tells of a conversation with Lev Sakhnovich that took place in Odessa many years ago: Lev: Israel, how is your book with Mark Gregorovic (Krein) progressing? Israel: It's about 85% done. Lev: That's great! Why so sad? Israel: If you would have asked me yesterday, I would have said 95%.
Chapter 1
Vector spaces
The road to wisdom? Well it's plain and simple to express. Err and err and err again, but less and less and less.
Cited in [43]
1.1. Preview

One of the fundamental issues that we shall be concerned with is the solution of linear equations of the form

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1q}x_q &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2q}x_q &= b_2 \\
&\;\;\vdots \\
a_{p1}x_1 + a_{p2}x_2 + \cdots + a_{pq}x_q &= b_p,
\end{aligned}$$

where the a_{ij} and the b_i are given numbers (either real or complex) for i = 1, ..., p and j = 1, ..., q, and we are looking for the x_j for j = 1, ..., q. Such a system of equations is equivalent to the matrix equation

$$Ax = b, \quad\text{where}\quad A = \begin{bmatrix} a_{11} & \cdots & a_{1q} \\ \vdots & & \vdots \\ a_{p1} & \cdots & a_{pq} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ \vdots \\ x_q \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} b_1 \\ \vdots \\ b_p \end{bmatrix}.$$
• RC Cola: The term a_{ij} in the matrix A sits in the i'th row and the j'th column of the matrix; i.e., the first index stands for the number of the row and the second for the number of the column. The order is rc, as in the popular drink by that name.

Given A and b, the basic questions are:

1. When does there exist at least one solution x?
2. When does there exist at most one solution x?
3. How to calculate the solutions, when they exist?
4. How to find approximate solutions?

The answers to these questions are part and parcel of the theory of vector spaces.
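The following small sketch, which is not part of the original text, shows how questions 1, 2 and 4 can be probed numerically for one concrete (and arbitrarily chosen) A and b. It relies on NumPy and on the standard rank criterion: the system has at least one solution exactly when rank A = rank [A | b], and at most one solution exactly when rank A equals the number of columns of A.

```python
import numpy as np

# A hypothetical 3 x 3 system Ax = b, chosen only for illustration.
A = np.array([[0., 2., 3.],
              [1., 5., 3.],
              [2., 6., 3.]])
b = np.array([1., 2., 1.])

rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))

# Existence: rank A == rank [A | b]; uniqueness: rank A == number of columns.
print("at least one solution:", rank_A == rank_Ab)
print("at most one solution: ", rank_A == A.shape[1])

# A least-squares solution exists whether or not the system is consistent,
# and serves as an "approximate solution" in the sense of question 4.
x, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print("x =", x)
```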
1.2. The abstract definition of a vector space

This subsection is devoted to the abstract definition of a vector space. Even though the emphasis in this course is definitely computational, it seems advisable to start with a few abstract definitions which will be useful in future situations as well as in the present.

A vector space V over the real numbers is a nonempty collection of objects called vectors, together with an operation called vector addition, which assigns a new vector u + v in V to every pair of vectors u in V and v in V, and an operation called scalar multiplication, which assigns a vector αv in V to every real number α and every vector v in V, such that the following hold:

1. For every pair of vectors u and v, u + v = v + u; i.e., vector addition is commutative.
2. For any three vectors u, v and w, u + (v + w) = (u + v) + w; i.e., vector addition is associative.
3. There is a zero vector (or, in other terminology, additive identity) 0 ∈ V such that 0 + v = v + 0 = v for every vector v in V.
4. For every vector v there is a vector w (an additive inverse of v) such that v + w = 0.
5. For every vector v, 1v = v.
6. For every pair of real numbers α and β and every vector v, α(βv) = (αβ)v.
7. For every pair of real numbers α and β and every vector v, (α + β)v = αv + βv.
8. For every real number α and every pair of vectors u and v, α(u + v) = αu + αv.
Because of Item 2, we can write u + v + w without brackets; similarly, because of Item 6 we can write αβv without brackets. It is also easily checked that there is exactly one zero vector 0 ∈ V: if 0' is a second zero vector, then 0' = 0' + 0 = 0. A similar argument shows that each vector v ∈ V has exactly one additive inverse, -v = (-1)v in V. Correspondingly, we write u + (-v) = u - v.

From now on we shall use the symbol R to designate the real numbers, the symbol C to designate the complex numbers and the symbol F when the statement in question is valid for both R and C and there is no need to specify. Numbers in F are often referred to as scalars. A vector space V over C is defined in exactly the same way as a vector space V over R except that the numbers α and β which appear in the definition above are allowed to be complex.

Exercise 1.1. Show that if V is a vector space over C, then 0v = 0 for every vector v ∈ V.

Exercise 1.2. Let V be a vector space over F. Show that if α, β ∈ F and if v is a nonzero vector in V, then αv = βv ⟺ α = β. [HINT: α - β ≠ 0 ⟹ v = (α - β)^{-1}((α - β)v).]

Example 1.1. The set of column vectors
$$\mathbb{F}^p = \left\{ \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} : x_i \in \mathbb{F},\ i = 1, \ldots, p \right\}$$

of height p with entries x_i ∈ F that are subject to the natural rules of vector addition

$$\begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} + \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_p + y_p \end{bmatrix}$$

and multiplication of the vector x by a number α ∈ F is the most basic example of a vector space. Note the difference between the number 0 and the vector 0 ∈ F^p. The latter is a column vector of height p with all p entries equal to the number zero.
The set F^{p×q} of p × q matrices with entries in F is a vector space with respect to the rules of vector addition:

$$\begin{bmatrix} x_{11} & \cdots & x_{1q} \\ \vdots & & \vdots \\ x_{p1} & \cdots & x_{pq} \end{bmatrix} + \begin{bmatrix} y_{11} & \cdots & y_{1q} \\ \vdots & & \vdots \\ y_{p1} & \cdots & y_{pq} \end{bmatrix} = \begin{bmatrix} x_{11} + y_{11} & \cdots & x_{1q} + y_{1q} \\ \vdots & & \vdots \\ x_{p1} + y_{p1} & \cdots & x_{pq} + y_{pq} \end{bmatrix},$$

and multiplication by a scalar α ∈ F:

$$\alpha \begin{bmatrix} x_{11} & \cdots & x_{1q} \\ \vdots & & \vdots \\ x_{p1} & \cdots & x_{pq} \end{bmatrix} = \begin{bmatrix} \alpha x_{11} & \cdots & \alpha x_{1q} \\ \vdots & & \vdots \\ \alpha x_{p1} & \cdots & \alpha x_{pq} \end{bmatrix}.$$
Notice that the vector space IFP dealt with a little earlier coincides with the vector space that is designated IFpxl in the current example. Exercise 1.3. Show that the space JR 3 endowed with the rule x D y = [
::i:~:~~~ 1
maX(X3,Y3)
for vector addition and the usual rule for scalar multiplication is not a vector space over R [HINT: Show that this "addition" rule does not admit a zero element; Le., there is no vector a E JR 3 such that a D x = x D a = x for every x E JR3.]
Exercise 1.4. Let C ⊂ R^3 denote the set of vectors a = [a_1, a_2, a_3]^T such that the polynomial a_1 + a_2 t + a_3 t^2 ≥ 0 for every t ∈ R. Show that C is closed under vector addition (i.e., a, b ∈ C ⟹ a + b ∈ C) and under multiplication by positive numbers (i.e., a ∈ C and α > 0 ⟹ αa ∈ C), but that C is not a vector space over R. [REMARK: A set C with the indicated two properties is called a cone.]
p(A) =
L ajA j
of degree
n
j=O with coefficients aj E C is a vector space over C under the natural rules of addition and scalar multiplication. [REMARK: You may assume that 'L,"]=o ajAj = 0 for every A E C if and only if ao = al = ... = an = 0.] Exercise 1.6. Let:F denote the set of continuous real-valued functions f(x) on the interval 0 :$ X :$ 1. Show that :F is a vector space over JR with respect to the natural rules of vector addition ((fl + h)(x) = Jr(x) + h(x)) and scalar multiplication ((af)(x) = af(x)).
1.3. Some definitions
5
In words, the span iR the set of all linear combinations al VI + ... + Vk of the indicated set of vectors, with coefficients aI, ... in IF. It is important to keep in mind that span{ VI, ... ,Vk} may be small in some sense. In fact, span {VI, ... ,Vk} is the smallest vector space that contains the vectors V}, ... ,Vk. The number of vectors k that were used to define the span is not a good indicator of the size of this space. Thus, for example, if
ak
,ak
then span{Vl, V2, V3} = span{vl}' To clarify the notion of the size of the span we need the concept of linear dependence . • Linear dependence: A set of vectors VI, ... ,Vk in a vector space V over IF is said to be linearly dependent over IF if there exists a
1. Vector spaces
6
set of scalars aI, ... ,ak ElF, not all of which are zero, such that a1 VI
+ ... + ak vk = 0 .
Notice that this permits you to express one or more of the given vectors in terms of the others. Thus, if a1 =f. 0, then VI
a2 ak = - -V2 - ... - -Vk a1
a1
and hence span{vI, ... ,vd = span{V2, ... ,vd· Further reductions are possible if the vectors V2, ... ,Vk are still linearly dependent . • Linear independence: A set of vectors VI, ... ,Vk in a vector space V over IF is said to be linearly independent over IF if the only scalars aI, ... ,ak E IF for which a1v1
+ ... + akvk =
0
are a1 = ... = ak = O. This is just another way of saying that you cannot express one of these vectors in terms of the others. Moreover, if {VI, ... ,vd is a set of linearly independent vectors in a vector space V over IF and if
(1.1)
V = a1v1
+ ... + akvk
and V = f31v1
+ ... + f3kvk
for some choice of constants a1, ... ,ak, 131, ... ,13k ElF, then aj = f3j for j = 1, . .. ,k. Exercise 1.9. Verify the last assertion; i.e., if (1.1) holds for a linearly independent set of vectors, {VI, ... ,Vk}, then aj = f3j for j = 1, ... ,k. Show by example that this conclusion is false if the given set of k vectors is not linearly independent . • Basis: A set of vectors VI, ... ,Vk is said to form a basis for a vector space V over IF if (1) span{vI, ... ,Vk} = V. (2) The vectors VI, ... ,Vk are linearly independent. Both of these conditions are essential. The first guarantees that the given set of k vectors is big enough to express every vector V E Vasa linear combination of VI, ... ,Vk; the second that you cannot achieve this with less than k vectors. A nontrivial vector space V has many bases. However, the number of elements in each basis for V is exactly the same and is referred to as the dimension of V and will be denoted dim V. A proof of this statement will be furnished later. The next example should make it plausible.
1.3. Some definitions
7
Example 1.2. It is readily checked that the vectors
[~] ,m
and
m
form a basis for the vector space IF 3 over the field IF. It is also not hard to show that no smaller set of vectors will do. (Thus, dim IF 3 = 3, and, of course, dimIF k = k for every positive integer k.) In a similar vein, the p x q matrices E ij , i = 1, ... ,p,j = 1, ... ,q, that are defined by setting every entry in Eij equal to zero except for the ij entry, which is set equal to one, form a basis for the vector space IFpxq .
• Matrix multiplication: Let A = [aij] be apxq matrix and B = [bstj be a q x r matrix. Then the product AB is the p x r matrix C = [CklJ with entries q
Ckf.
=
L akjbjf., j=1
k
= 1, ... ,p;.e = 1 ...
,r.
Notice that Ckf. is the matrix product of the the k'th row ak of A with the £,th column bf. of B: Ckf. = akbf. = [akl ... akq] [
b~f.]
.
bqf.
Thus, for example, if
A
[~ ~ ] 3
and B =
1
then
AB
[11 20 -13 o
~
1
2
:]
,
-1
[4 7 10 2]. 3 4
5
9
Moreover, if A E IFpxq and x E lF q, then y = Ax is the vector in lF P with components Yi = L3=1 aijXj for i = 1, ... ,po • Identity matrix: We shall use the symbol In to denote the n x n matrix A = [aij], i, j = 1, ... ,n, with aii = 1 for i = 1, ... ,n and aij = 0 for i i= j. Thus,
13~ [H ~].
1. Vector spaces
8
The matrix In is referred to as the n x n identity matrix, or just the identity matrix if the size is clear from the context. The name stems from the fact that Inx = x for every vector x E lFn. • Zero matrix: We shall use the symbol Opxq for the matrix in lF pxq all of whose entries are equal to zero. The subscript p x q will be dropped if the size is clear from the context. The definition of matrix multiplication is such that: • Matrix multiplication is not commutative, Le., even if A and Bare both p x p matrices, in general AB i- BA. In fact, if p > 1, then one can find A and B such that AB = Opxp, but BA i- Opxp. Exercise 1.10. Find a pair of 2 x 2 matrices A and B such that AB but BA i- 02x2. • Matrix multiplication is associative: If C E lFrxs, then (AB)C = A(BC).
A E lF pxq , B E lF qxr
• Matrix multiplication is distributive: If A, A l , A2 E B2 E lF qxr , then
(Al
+ A2)B = AlB + A2B
and
A(Bl
= 02x2
lF pxq
and
and B, B l ,
+ B2) = ABl + AB2 .
• If A E lF pxq is expressed both as an array of p row vectors of length q and as an array of q column vectors of height p:
and if B E lF qxr is expressed both as an array of q row vectors of length r and as an array of r column vectors of height q:
then the product AB can be expressed in the following three ways:
(1.2)
AB= [
1 ~B =
q
alB
Exercise 1.11. Show that if
[Ab l
Abr ]
= Laibi. i=l
1.3. Some definitions
9
then
AB =
[:~~ ~ ~] B + [ ~
and hence that
AB = [
all ] a21
[b ll
b14 ] +
...
[
a12 ] a22
[b 21 b22 b23 b24 ] +
[
a13 ] [b31 ... a23
b34].
Exercise 1.12. Verify the three ways of writing a matrix product in formula (1.2). [HINT: Let Exercise 1.11 serve as a guide.] • Block multiplication: It is often convenient to express a large matrix as an array of sub-matrices (Le., blocks of numbers) rather than as an array of numbers. Then the rules of matrix multiplication still apply (block by block) provided that the block decompositions are compatible. Thus, for example, if
with entries
Aij
E lFPixqj and Bjk E
lFqjXTk,
then
C = AB = [Cij ] ,i = 1, ... ,3, j = 1, ... ,4, where
is a Pi x r j matrix. • Transposes: The transpose of a P x q matrix A is the q x P matrix whose k'th row is equal to the k'th column of A laid sideways, k = 1, ... ,q. In other words, the ij entry of A is equal to the ji entry of its transpose. The symbol AT is used to designate the transpose of A. Thus, for example, if
A =
[~ ~
:
1 ,then AT
=
[:
~]
.
It is readily checked that (1.3)
(ATf = A
and
(ABf = BT AT.
• Hermitian transposes: The Hermitian transpose AH of a P x q matrix A is the same as the transpose AT of A, except that all the entries
1. Vector spaces
10
in the transposed matrix are replaced by their complex conjugates. Thus, for example, if 3i 5+i 4 2-i 6i
A = [ 1
1' then AH
It is readily checked that
(1.4)
(AH)H
= A and (AB)H = BH AH .
• Inverses: Let A E lF pxq . Then: (1) A matrix C E lF qxp is said to be a left inverse of A if CA = lq. (2) A matrix B E lF pxq is said to be a right inverse of A if AB = lp. In the first case A is said to be left invertible. In the second case A is said to be right invertible. It is readily checked that if a matrix A E lF pxq has both a left inverse C and a right inverse B, then B = C: C
= Clp = C(AB) = (CA)B = lqB = B.
Notice that this implies that if A has both a left and a right inverse, then it has exactly one left inverse and exactly one right inverse and (as shown just above) the two are equal. In this instance, we shall say that A is invertible and refer to B = C as the inverse of A and denote it by A-I. In other words, a matrix A E lF pxq is invertible if and only if there exists a matrix B E lF qxp such that AB = lp and BA = lq. In fact, as we shall see later, we must also have q = p in this case.
Exercise 1.13. Show that if A and B are invertible matrices of the same size, then AB is invertible and (AB)-l = B-IA-l. Exercise 1.14. Show that the matrix A =
[ 1~ o~ ~ll
has no left inverses
and no right inverses.
Exercise 1.15. Show that the matrix A =
[~ ~ ~]
has at least two
right inverses, but no left inverses.
Exercise 1.16. Show that if a matrix A E C pxq has two right inverses BI and B2, then >'Bl + (1- >.)B2 is also a right inverse for every choice of>. E C. Exercise 1.17. Show that a given matrix A E lF pxq has either 0, 1 or infinitely many right inverses and that the same conclusion prevails for left inverses.
1.4. Mappings
11
Exercise 1.18. Let Au if Au is invertible, then [Au A 12 ]
E
lF Pxp , Al2 E lF pxq and A21 E lF qxP • Show that
is right invertible and
is left invertible.
1.4. Mappings • Mappings: A mapping (or transformation) T from a subset 'DT of a vector space U into a vector space V is a rule that assigns exactly one vector v E V to each u E 'DT. The set 'DT is called the domain ofT. The fO[IlOW] ing three e[X:f~S4;~ve]some idea of the possibilities:
(a) T: :~
E]R 2
I-t
Xl
(b) T:
{[:~]
(c) T: [ :~ ]
X2 - Xl + 2X2 + 6
E]R2: Xl - X2
#
o}
E ]R 3.
I-t
[lj(XI - X2)] E ]RI.
+
E]R 2
I-t
3XI X2] [ Xl - X2 E ]R 3 .
3XI + X2 The restriction on the domain in the second example is imposed in order to insure that the definition is meaningful. In the other two examples the domain is taken equal to the full vector space. In this framework we shall refer to the set
NT = {u E 'DT : Tu = Ov} as the nullspace (or kernel) of T and the set 'RT = {Tu : u E VT}
as the range (or image) ofT. The subscript V is added to the symbol o in the first definition to emphasize that it is the zero vector in V, not in U . • Linear mapping: A mapping T from a vector space U over IF into a vector space V over the same number field IF is said to be a linear mapping (or a linear transformation) if for every choice of u, v E U and a ElF the following two conditions are met: (1) T(u + v) = Tu + Tv. (2) T(au) = aTu. It is readily checked that if T is a linear mapping from a vector space U over IF into a vector space V over IF, then NTis a subspace of U and 'RT is a subspace of V . Moreover, in the preceding set of three examples, T is linear only in case (c).
1. Vector spaces
12
• The identity: Let U be a vector space over IF. The special linear transformation from U into U that maps each vector U E U into itself is called the identity mapping. It is denoted by the symbol In if U = IFn and by Iu otherwise, though, more often than not, when the underlying space U is clear from the context, the subscript U will be dropped and I will be written in place of Iu. Thus, Iuu = I U = U for every vector U E U. Exercise 1.19. Compute NT and 'RT for each of the three cases (a), (b) and (c) considered above and say which are subspaces and which are not. Linear transformations are intimately connected with matrix multiplication: Exercise 1.20. Show that if T is a linear transformation from a vector space U over IF with basis {Ul, ... , u q } into a vector space V over IF with basis {VI, ... , V p}, then there exists a unique set of scalars aij E IF, i = 1, ... , p and j = 1, . .. , q such that p
(1.5)
TUj = LaijVi
for
j = 1, ... ,q
i=l
and hence that q
(1.6)
p
T(LXjUj) = LYiVi ~Ax=y, j=l
i=l
where x E IFq has components Xl,." , X q, Y E IFP has components Yl, .. ' and the entries aij of A E IFpxq are determined by formula (1.5) .
• WARNING: If A
,YP
E C pxq ,
then matrix multiplication defines a linear map from x E C q to Ax E C p. Correspondingly, the nullspace of this map,
NA = {x E C q : Ax = O},
is a subspace of C q
,
and the range of this map,
'RA = {Ax: x
E
C q },
is a subspace of CP.
However, if A E IR pxq , then matrix multiplication also defines a linear map from x E IR q to Ax E IR P; and in this setting
NA = {x E IRq: Ax = O}
is a subspace of IRq,
and the range of this map,
'RA = {Ax: x
E
IRq},
is a subspace of IRP.
In short, it is important to clarify the space on which A is acting, i.e., the domain of A. This will usually be clear from the context.
1.5. Triangular matrices
13
1.5. Triangular matrices An
n
x
n
matrix A
=
[aij]
is said to be
• upper triangular if all its nonzero entries sit either on or above the diagonal, i.e., if aij = 0 when i > j. • lower triangular if all its nonzero entries sit either on or below the diagonal, i.e., if AT is upper triangular. • triangular if it is either upper triangular or lower triangular. • diagonal if
aij
= 0 when
i
t= j.
Systems of equations based on a triangular matrix are particularly convenient to work with, even if the matrix is not invertible. Example 1.3. Let A E lF 4x4 be a 4 x 4 upper triangular matrix with nonzero diagonal entries and let b be any vector in IF 4 . Then the vector x is a solution of the equation
(1.7)
Ax=b
if and only if allXI
+ al2X2 + al3 X3 + al4X4
bl
+ a23 X3 + a24X4
b2
+ a34 X 4
b3
a22 x 2
a33 X3
a44x 4
b4 .
Therefore, since the diagonal entries of A are nonzero, it is readily seen that these equations admit a (unique) solution, by working from the bottom up: -lb4 a 44 asi(b3 - a34 X4) X2
-
Xl
a~",}(b2
- a23x3 - a24X4)
aii(bl - al2 x 2 - al3X3 - a14 X4) .
Thus, we have shown that for any right-hand side b, the equation (1.7) admits a (unique) solution x. Exploiting the freedom in the choice of b, let ej, j = 1, ... ,4, denote the j'th column of the identity matrix 14 and let Xj denote the solution of the equation AXj = ej for j = 1, . .. ,4. Then the 4 x 4 matrix X
with columns
Xl, ... ,X4
=
[Xl
X2
X3
X4]
is a right inverse of A:
AX = A[XI ... X4] = [AXI'" AX4] [el ... e4] = 14 .
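Here is a small sketch (not from the book) of the back-substitution scheme just described: it solves Ux = b for an upper triangular U with nonzero diagonal entries by working from the bottom row up, and then builds a right inverse column by column by solving Ux_j = e_j, exactly as in the construction of X above. The 4 × 4 matrix used is an arbitrary test case.

```python
import numpy as np

def back_substitute(U, b):
    """Solve Ux = b for upper triangular U with nonzero diagonal entries."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2., 1., 0., 3.],
              [0., 1., 4., 1.],
              [0., 0., 3., 5.],
              [0., 0., 0., 2.]])

# Solve against the columns of the identity to obtain an inverse of U.
X = np.column_stack([back_substitute(U, e) for e in np.eye(4)])
print(np.allclose(U @ X, np.eye(4)))   # True: X is a right inverse
print(np.allclose(X @ U, np.eye(4)))   # True: and also a left inverse
```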
14
1. Vector spaces
Analogous examples can be built for pxp lower triangular matrices. The only difference is that now it is advantageous to work from the top down. The existence of a left inverse can also be obtained by writing down the requisite equations that must be solved. It is easier, however, to play with transposes. This works because A is a triangular matrix with nonzero diagonal entries if and only if AT is a triangular matrix with nonzero diagonal entries and
YA = Ip
¢:::::}
ATyT
= Ip
.
Exercise 1.21. Show that the right inverse X of the upper triangular matrix A that is constructed in the preceding example is also a left inverse and that it is upper triangular. Lemma 1.4. Let A be a p x p triangular matrix. Then (1) A is invertible if and only if all its diagonal entries are different from zero. Moreover, if A is an invertible triangular matrix, then (2) A is upper triangular
¢:::::}
A-I is upper triangular.
(3) A is lower triangular
¢:::::}
A-I is lower triangular.
Proof. Suppose first that A
=
[auo
a 12 ] a22
is a 2 x 2 upper triangular matrix with nonzero diagonal entries au and Then it is readily checked that the matrix equation
A
[~~~ ~~~] = [~ ~],
which is equivalent to the pair of equations
A[
~~~ ] = [ ~]
and A [
~~~ ] = [ ~ ] ,
has exactly one solution X
=
[xu X21
X12] X22
= [ all
-au -1 a12 a 22 -1
0
and that this solution is also a left inverse of A:
-1 a22
1
a22.
1.5. Triangular matrices
15
Thus, every 2 x 2 upper triangular matrix A with nonzero diagonal entries is invertible and -an -1 a12a22 -1
A- 1 = [ aOll
(1.8)
1
-1 a22
is also upper triangular. Now let A and B be upper triangular k x k matrices such that AB = h. Then for every choice of a, b,e E C k and a,(J E C with 0.=1=0,
BA =
[ A
o
b] _
a] [B a cT (J
[AB + acT acT
+ acT [ h acT
Ab + a(J ] a(J Ab + (Ja ] a(J .
Consequently, the product of these two matrices will be equal to h+1 if and only if c = 0, Ab + (Ja = 0 and a(J = 1, that is, if and only if c = 0, b = -(JBa and (J = 1/0.. Moreover, if c, band (J are chosen to meet these conditions, then
since Ba + ab
= Ba + 0.( -(JBa) = o.
Thus, we have shown if k x k upper triangular matrices with nonzero entries on the diagonal are invertible, then the same holds true for (k + 1) x (k + 1) upper triangular matrices with nonzero entries on the diagonal. Therefore, since we already know that 2 x 2 upper triangular matrices with nonzero entries on the diagonal are invertible, it follows by induction that every upper triangular matrix with nonzero entries on the diagonal is invertible and that the inverse is upper triangular. Suppose next that A E Cpxp is an invertible upper triangular matrix with inverse B E Cpxp. Then, upon expressing the identity AB = Ip in block form as
\ [AIo a1] [Bjc . b1] = [I 0 0]1 p-l
0.1
1
{31
with diagonal blocks of size (p - 1) x (p - 1) and 1 xI, respectively, it is readily seen that al{31 = 1. Therefore, 0.1 =1= O. The next step is to play the same game with Al to show that its bottom diagonal entry is nonzero and, continuing this way down the line, to conclude that the diagonal entries of A are nonzero and that the inverse matrix B is also automatically upper triangular. The details are left to the reader.
16
1. Vector spaces
This completes the proof of the asserted statements for upper triangular matrices. The proof for lower triangular matrices may be carried out in D much the same way or, what is simpler, by taking transposes. Exercise 1.22. Show that if A E c nxn and Ak = Onxn for some positive integer k, then In - A is invertible. [HINT: It's enough to show that
(In -A)(In+ A + A2 + ... +A k- 1) = (In +A+A2+ .. . + A k- 1)(In -A) = In.J Exercise 1.23. Show that even though all the diagonal entries of the matrix
A=[H
n
are equal to zero, A is invertible, and find A-I. Exercise 1.24. Use Exercise 1.22 to show that a triangular n x n matrix A with nonzero diagonal entries is invertible by writing
A
=
D + (A - D)
=
D(In
+ D- 1 (A - D)),
where D is the diagonal matrix with d jj = ajj for j key observation is that (D-1(A - D))n = O.J
= 1, ...
,n. [HINT: The
1.6. Block triangular matrices A matrix A E lF nxn with block decomposition
where Aij E lFPiXqj for i,j is said to be
= 1, ... ,k and PI + ... + Pk =
• upper block triangular if Pi i
+ ... + qk = n
=
qi
for i
= 1, . .. , k and
=
qi
for i
= 1, . .. , k and Aj = 0 for
Aij
= 0 for
> j.
• lower block triangular if Pi i
ql
<
j.
• block triangular if it is either upper block triangular or lower block triangular . • block diagonal if Pi
= qi for i = 1, ... ,k and
Aij
= 0 for i
=1=
j.
Note that the blocks Aii in a block triangular decomposition need not be triangular.
1.7. Schur complements
Exercise 1.25. Let A
17
= [OAll
AAI2] be an upper block triangular ma-
qxp
22
trix with invertible diagonal blocks All of size p x p and A22 of size q x q. Show that A is invertible and that -IA 12 A-I] 22 (1.9) A-I = [ AliI - A 11 A-I , Oqxp 22
which generalizes formula (1.8). Exercise 1.26. Use formula (1.9) to calculate the inverse of the matrix
A=[~0 0~ 5~l. Exercise 1.27. Let A
=
[~~~ ~2X2q]
be a lower block triangular matrix
with invertible diagonal blocks All of size p x p and A22 of size q x q. Find a matrix B of the same form as A such that AB = BA = I p+q. 1.7. Schur complements Let
(1.10)
E
=
[~ ~],
where A E Cpx P , BE C pxq , e E C qxp and D E two factorization formulas are extremely useful:
c qxq .
Then the following
(1) If A is an invertible matrix, then
(1.11) and D - e A-I B is referred to as the Schur complement of A with respect to E. (2) If D is an invertible matrix, then
(1.12)
E
=
[Ip 0
BD- I ] [A - BD-Ie Iq 0
and A-BD-Ie is referred to as the Schur complement of D with respect to E. At this point, these two formulas may appear to be simply tedious exercises in block matrix multiplication. However, they are extremely useful. Another proof based on block Gaussian elimination, which leads to even more general factorization formulas, will be presented in Chapter 3. Notice that the first formula exhibits E as the product of an invertible lower triangular matrix
1. Vector spaces
18
times a block diagonal matrix times an invertible upper triangular matrix, whereas the second formula exhibits E as the product of an invertible upper triangular matrix times a block diagonal matrix times an invertible lower triangular matrix.
Exercise 1.28. Verify formulas (1.11) and (1.12) under the stated conditions. Exercise 1.29. Show that if BE C pxq and C E C qxp , then Ip - BC is invertible
(1.13)
¢=::?
Iq - CB
is invertible
and that if these two matrices are invertible, then (1.14) [HINT: Exploit formulas (1.11) and (1.12).]
Exercise 1.30. Let the matrix E be defined by formula (1.10). Show that: A
and D - CA- I B
invertible ==> E
is invertible,
and construct an example to show that the opposite implication is false.
Exercise 1.31. Show that if the matrix E is defined by formula (1.10), then D and A - BD-IC invertible ==> E is invertible, and show by example that the opposite implication is false.
Exercise 1.32. Show that if the blocks A and D in the matrix E defined by formula (1.10) are invertible, then E is invertible
¢=::?
D - CA- 1 B is invertible
¢=::?
A - BD-1C is invertible.
Exercise 1.33. Show that if blocks A and D in the matrix E defined by formula (1.10) are invertible and A - BD-IC is invertible, then (1.15) (A - BD-IC)-l = A-I + A-I B(D - CA- I B)-ICA- I . [HINT: Multiply both sides of the asserted identity by A - BD-IC.]
Exercise 1.34. Show that if if blocks A and D in the matrix E defined by formula (1.10) are invertible and D - CA-1B is invertible, then (1.16) (D - CA- I B)-l = D- 1 + D-1C(A - BD-IC)-l BD- I . [HINT: Multiply both sides of the asserted identity by D - CA- I B.]
Exercise 1.35. Show that if A E C pxP , B E C pxq , C E C qxp and the matrices A and A + BC are both invertible, then the matrix Iq + CA -1 B is invertible and (Iq + CA- 1B)-l = Iq - C(A + BC)-l B.
1.B. Other matrix products
19
Exercise 1.36. Show that if A E CpxP, B E Cpx q , C E C qxp and the matrix A + BC is invertible, then the matrix
[~ ~q]
is invertible, and
find its inverse. Exercise 1.37. Let A E Cpx P, invertible. Show that
[vAH
E CP, v E CP and assume that A is
U
-u].. 1 IS Invert'bl 1 e
and that if these conditions are met, then (Ip
+ uv H A-1)-lu =
u(l
+ v H A-1u)-1 .
Exercise 1.38. Show that if in the setting of Exercise 1.37 the condition 1 + v H A-1u i= 0 is met, then the Sherman-Morrison formula
(1.17)
(A
+ uvH)-l = A-I _
A-1uvH A-I 1 +vHA-1u
holds. Exercise 1.39. Show that if A is a P x q matrix and C is a q x q invertible matrix, then RAG = RA· Exercise 1.40. Show that the upper block triangular matrix
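As a numerical sanity check (an editorial addition, not from the text), the Sherman-Morrison formula (1.17) can be verified directly for random real data, provided the condition 1 + v^H A^{-1} u ≠ 0 is met:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + 4 * np.eye(n)    # invertible for this data
u = rng.standard_normal((n, 1))
v = rng.standard_normal((n, 1))

Ainv = np.linalg.inv(A)
denom = 1 + (v.T @ Ainv @ u).item()                # real data, so v^H = v^T
assert abs(denom) > 1e-12                          # the hypothesis of (1.17)

lhs = np.linalg.inv(A + u @ v.T)
rhs = Ainv - (Ainv @ u @ v.T @ Ainv) / denom
print(np.allclose(lhs, rhs))                        # True
```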
A= [Ad 1~~ 1~: 1 1
o
0
A33
with entries Aj of size Pi XPj is invertible if the diagonal blocks All, A22 and A33 are invertible, and find a formula for A-I. [HINT: Look for a matrix B of the same form as A such that AB = Ipl +P2+P3']
1.8. Other matrix products Two other product rules for matrices that arise in assorted applications are: • The Schur product C = AoB of A = [aij] E cnxn with B = [bij] E nxn is defined as the n x n matrix C = [Cij] with entries Cij = aijbij for i,j = 1, ... ,no
c
• The Kronecker product A®B of A = [aij] E cpxq with B = [bij ] E nxm is defined by the formula
c
al~:B ... A®B= [ aplB
1. Vector spaces
20
The Schur product of two square matrices of the same size is clearly commutative. It is also readily checked that the Kronecker product of real (or complex) matrices is associative:
(A ® B) ® C
= A ® (B ® C)
and satisfies the rules (A®B)T = AT ®BT, (A® B)(C ® D) = AC ® BD, when the indicated matrix multiplications are meaningful. If x E IF k, lFk, Y E lFe and v E lFe, then the last rule implies that
(xT u)(yT v) = (x T ® yT)(u ® v).
U
E
Chapter 2
Gaussian elimination
... People can tell you... do it like this. But that ain't the way to learn. You got to do it for yourself.
Willie Mays, cited in Kahn [40], p.163 Gaussian elimination is a way of passing from a given system of equations to a new system of equations that is easier to analyze. The passage from the given system to the new system is effected by multiplying both sides of the given equation, say
Ax=b, successively on the left by appropriately chosen invertible matrices. The restriction to invertible multipliers is essential. Otherwise, the new system will not have the same set of solutions as the given one. In particular, the left multipliers will be either permutation matrices (which are defined below) or lower triangular matrices with ones on the diagonal. Both types are invertible. The first operation serves to interchange (Le., permute) rows, whereas the second serves to add a multiple of one row to other rows. Thus, for example,
[~
1o 0] [au
a12
a21
a22
a,.] ["" a" a2n = au a12
a31
a32
a3n
o
0 1
a31
a32
a,oJ a1n
,
a3n
whereas
[~ ~] 0 1 0
[au
a,.]
a21
a2n
a31
a3n
=
[ a11+
a12
aall
a21
{3all
+ a31
+ a22 {3a12 + a32 aa12
a,.+ ]
aa1n
a2n
{3al n
+ a3n
-
21
.
2. Gaussian elimination
22
2.1. Some preliminary observations The operation of adding (or subtracting) a constant multiple of one row of a p x q matrix from another row of that matrix can always be achieved by multiplying on the left by a p x p matrix with ones on the diagonal and one other nonzero entry. Every such matrix can be expressed in the form
Ea = Ip + aeieJ with i and j fixed and i
(2.1)
=1=
j ,
where the vectors el ... ,ep denote the standard basis for IFP (Le., the columns in the identity matrix Ip) and a E IF. It is readily seen that the following conclusions hold for the class of matrices £ij of the form (2.1):
(1) £ij is closed under multiplication: Ea E{1 = E a+{1' (2) The identity belongs to £ij: Eo = Ip. (3) Every matrix in £ij is invertible: Ea is invertible and E;;l = E_ a . (4) Multiplication is commutative in £ij: EaE{1 = E{1Ea. Thus, the class of matrices of the form (2.1) is a commutative group with respect to matrix multiplication. The same conclusion holds for the more general class of p x p matrices of the form
(2.2)
Eu=Ip+ueT,
with
uEIFP
and
eTu=O.
The trade secret is the identity, which is considered in the next exercise, or, in less abstract terms, the observation that
[o~ !H] [~ ! H]- [LL H] b 0 1
0 dOl
0 b+d 0 1
and the realization that there is nothing special about the size of this matrix or the second column. Exercise 2.1. Let u, v E IFP be such that eT u = 0 and eT v = O. Show that
(Ip + uen(Ip + yen = (Ip
+ veT)(Ip + uen
= Ip + (v + u)eT .
• Permutation matrices: Every n x n permutation matrix P is obtained by taking the identity matrix In and interchanging some of the rows. Consequently, P can be expressed in terms of the columns ej, j = 1, ... ,n of In and a one to one mapping ()" of the set of integers {I, . .. ,n} onto itself by the formula n
(2.3)
P
= Pa = Leje;(j)' j=l
2.1. Some preliminary observations
23
Thus, for example, if n = 4 and 0-(1) = 3, 0-(2) = 2, 0-(3) = 4 and 0-(4) = 1, then
The set of n x n permutation matrices also forms a group under multiplication, but this group is not commutative (Le., conditions (1)-(3) in the list given above are satisfied, but not (4)). • Orthogonal matrices: An n x n matrix V with real entries is said to be an orthogonal matrix if VTV = In. Exercise 2.2. Show that every permutation matrix is an orthogonal matrix. [HINT: Use formula (2.3).] The following notions will prove useful: • Upper echelon: A p x q matrix U is said to be an upper echelon matrix if the first nonzero entry in row i lies to the left of the first nonzero entry in row i + 1. Thus, for example, the first of the following two matrices is an upper echelon matrix, while the second is not.
[o~~~~!~] o 0 0 0 2 0 0 0 0 0 0
[~~~~] 0 5 0 5 000 0
• Pivots: The first nonzero entry in each row of an upper echelon matrix is termed a pivot. The pivots in the matrix on the left just above are 3, 1 and 2. • Pivot columns: A column in an upper echelon matrix U will be referred to as a pivot column if it contains a pivot. Thus, the first, third and fifth columns of the matrix considered in the preceding paragraph are pivot columns. If GA = U, where G is invertible and U E lF pxq is in upper echelon form with k pivots, then the columns ~l , ••• '~k of A that correspond in position to the pivot columns Uil' ... ,Uik of U will also be called pivot columns (even though the pivots are in U not in A) and the entries Xi!' .. . ,Xik in x E lF q will be referred to as pivot variables.
2. Gaussian elimination
24
2.2. Examples Example 2.1. Consider the equation Ax = b, where
(2.4)
A=
!]
[~2 6~ 3~
and b =
2
[~]1
1. Construct the augmented matrix 0
2
3 1 1]
A= [ 1 5 342 2 6 321
(2.5)
that is formed by adding b as an extra column to the matrix A on the far right. The augmented matrix is introduced to insure that the row operations that are applied to the matrix A are also applied to the vector b. 2. Interchange the first two rows of A to get
1 5 3 4 [ 023 1
(2.6)
263 2 where
:] =P,A,
[~ H]
has been chosen to obtain a nonzero entry in the upper left-hand corner of the new matrix. 3. Subtract two times the top row of the matrix PIA from its bottom row to get
(2.7)
[~
o
~
3
4
3
1
where
El = [
-4 -3 -6
~ ~ ~]
-2 0 1
is chosen to obtain all zeros below the pivot in the first column. 4. Add two times the second row of EIPIA to its third row to get
(2.8) where
[~ ~ ~ ~ ~ ] = E2EIPIA = [U o
0 3 -4 -1
c],
2.2. Examples
25
is chosen to obtain all zeros below the pivot in the second column, U = E2EIPIA is in upper echelon form and c = E 2 E 1 P l b. It was not necessary to permute the rows, since the upper left-hand corner of the block 23 [ o 3 -41 -11] was already nonzero. 5. Try to solve the new system of equations
(2.9)
ux =
[~o ~ ~ i 1[:~] [~ 1 0 3 -4
-1
X3 X4
by solving for the pivot variables from the bottom row up: The bottom row equation is
= -1,
3X3 - 4X4
and hence for the third pivot variable 3X3
X3
we obtain the formula
= 4X4 -1.
The second row equation is 2X2
+ 3X3 + X4 =
and hence for the second pivot variable 2X2
X2
we obtain the formula
+1=
= -3X3 - X4
1,
-5X4
+2.
Finally, the top row equation is Xl
+ 5X2 + 3X3 + 4X4 =
and hence for the first pivot variable Xl
= -5X2
-
3X3 - 4X4
_ -5( -5X4 2 9
= 2X4 -
Xl
+ 2)
-
2,
we get
+2
(4
X4 -
1)
-
4
X4
2.
Thus, we have expressed each of the pivot variables the variable X4. In vector notation,
x=
+2
[~~]
[
Xl, X2, X3
-;3] [~~2] + X4
is a solution of the system of equations (2.9), or equivalently,
(2.10)
in terms of
2. Gaussian elimination
26
(with A and b as in (2.4)) for every choice of X4. However, since the matrices E2, EI and PI are invertible, x is a solution of (2.10) if and only if Ax = b,
i.e., if and only if x is a solution of the original equation. 6. Check that the computed solution solves the original system of equations. Strictly speaking, this step is superfluous, because the construction guarantees that every solution of the new system is a solution of the old system, and vice versa. Nevertheless, this is an extremely important step, because it gives you a way of verifying that your calculations are correct. Conclusions: Since U is a 3 x 4 matrix with 3 pivots, much the same sorts of calculations as those carried out above imply that for each choice of bE ]F3, the equation Ax = b considered in this example has at least one solution x E IF4. Therefore, RA = IF3. Moreover, for any given b, there is a family of solutions of the form x = u + X4V for every choice of X4 E IF. But this implies that Ax = Au + x4Av = Au for every choice of X4 E IF, and hence that vENA. In fact,
This, as we shall see shortly, is a consequence of the number of pivots and their positions. (In particular, anticipating a little, it is not an accident that the dimensions of these two spaces sum to the number of columns of A.) Example 2.2. Consider the equation Ax = b with A=
[~1 ~2 8 4~l :
and b =
[~lb3
1. Form the augmented matrix
A~ [: 2. Interchange the first two rows to get
:
[~ ~ ~ :~l =
PIA
1 2 8 4 b3
with PI as in Step 2 of the preceding example.
2.2. Examples
27
3. Subtract the top row of PIA from its bottom row to get
= [o~ ~ !! b ~~] -b 0 4 3
EIPIA,
2
3
where
4. Subtract the second row of EIPIA from its third row to get
[0~ 0~ 0!!0 where
E2~ [~
_:
~~
]=
E2EIPI A =
[U c],
b3-b2-bt
n
[1241]
U= 0 0 4 3 000 0
5. Try to solve the new system of equations
[~o ~ ! !] [:~]
~~
= [
0 0 0::
b3 - b2 -
]
bl
working from the bottom up. To begin with, the bottom row yields the equation 0 = b3 - b2 - bl. Thus, it is clear that there are no solutions unless b3 = bl + b 2 . If this restriction is in force, then the second row gives us the equation 4X3
+ 3X4 =
bl
and hence, the pivot variable, X3 =
bl
- 3X4
4 Next, the first row gives us the equation Xl
+ 2X2 + 4X3 + X4 =
b2
and hence, the other pivot variable, Xl
= b2 = b2 -
2X2 - 4X3 - X4 2X2 -
(b 1 -
= b2 - bl - 2X2
3X4) - X4
+ 2X4 .
2. Gaussian elimination
28
Consequently, if b3
= bl + b2, then
is a solution of the given system of equations for every choice of in IF.
X2
and
X4
6. Check that the computed solution solves the original system of equations.
Conclusions: The preceding calculations imply that the equation Ax is solvable if and only if
=b
Moreover, for each such b E IF3 there exists a solution of the form x = u + X2Vl + X4V2 for every X2, X4 E IF. In particular, X2Avl + X4Av2 = 0 for every choice of X2 and X4. But this is possible only if AVI = 0 and AV2 = o.
Exercise 2.3. Check that for the matrix A in Example 2.2, RA is the span of the pivot columns of A:
The next example is carried out more quickly.
Example 2.3. Let
A=
[~
0 3 4 1 0 0
~] ~db= [~]
3 6 0 6 8 14 2
b4
Then a vector x E IF5 is a solution of the equation Ax = b if and only if
[~ ~ ~ oo 0] 4 7 000 2 1 0 000
2.2. Examples
29
The pivots of the upper echelon matrix on the left are in columns 2, 3 and 4. Therefore, upon solving for the pivot variables x2, x3 and x4 in terms of x1, x5 and b1, ..., b4 from the bottom row up, we obtain the formulas

0    = b4 - 2 b1
2 x4 = b3 - 2 b2 - b1 - x5
3 x3 = b1 - 4 x4 - 7 x5 = 3 b1 + 4 b2 - 2 b3 - 5 x5
x2   = b2 .

But this is the same as

[ x1 ]     [ x1                             ]
[ x2 ]     [ b2                             ]
[ x3 ]  =  [ (-5 x5 + 3 b1 + 4 b2 - 2 b3)/3 ]
[ x4 ]     [ (-x5 + b3 - 2 b2 - b1)/2       ]
[ x5 ]     [ x5                             ]

        [ 1 ]        [  0   ]        [  0   ]        [  0  ]        [  0   ]
        [ 0 ]        [  0   ]        [  1   ]        [  1  ]        [  0   ]
 = x1   [ 0 ]  + x5  [ -5/3 ]  + b1  [  1   ]  + b2  [ 4/3 ]  + b3  [ -2/3 ]
        [ 0 ]        [ -1/2 ]        [ -1/2 ]        [ -1  ]        [  1/2 ]
        [ 0 ]        [  1   ]        [  0   ]        [  0  ]        [  0   ]

 = x1 u1 + x5 u2 + b1 u3 + b2 u4 + b3 u5 ,

where u1, ..., u5 denote the five vectors in F^5 of the preceding line (and the second entry of u3 is 0, not 1). Thus, we have shown that for each vector b ∈ F^4 with b4 = 2 b1, the vector

x = x1 u1 + x5 u2 + b1 u3 + b2 u4 + b3 u5

is a solution of the equation Ax = b for every choice of x1 and x5. Therefore, x1 u1 + x5 u2 is a solution of the equation Ax = 0 for every choice of x1, x5 ∈ F. Thus, u1, u2 ∈ N_A and, as

Ax = x1 Au1 + x5 Au2 + b1 Au3 + b2 Au4 + b3 Au5 = b1 Au3 + b2 Au4 + b3 Au5 ,

the vectors v1 = Au3, v2 = Au4 and v3 = Au5 belong to R_A.
Exercise 2.4. Let a_j, j = 1, ..., 5, denote the j'th column vector of the matrix A considered in the preceding example. Show that

(1) span{v1, v2, v3} = span{a2, a3, a4}, i.e., the span of the pivot columns of A.
2.3. Upper echelon matrices

The examples in the preceding section serve to illustrate the central role played by the number of pivots in an upper echelon matrix U and their positions when trying to solve systems of equations by Gaussian elimination. Our next main objective is to exploit the special structure of upper echelon matrices in order to draw some general conclusions for matrices in this class. Extensions to general matrices will then be made on the basis of the following lemma:
Lemma 2.4. Let A ∈ F^{p×q} and assume that A ≠ O_{p×q}. Then there exists an invertible matrix G ∈ F^{p×p} such that

(2.11)   G A = U

is in upper echelon form.
Proof. By Gaussian elimination there exists a sequence P1, P2, ..., Pk of p × p permutation matrices and a sequence E1, E2, ..., Ek of lower triangular matrices with ones on the diagonal such that

E_k P_k ⋯ E_2 P_2 E_1 P_1 A

is in upper echelon form. Consequently the matrix G = E_k P_k ⋯ E_2 P_2 E_1 P_1 fulfills the asserted conditions, since it is the product of invertible matrices. □
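A minimal numerical sketch of Lemma 2.4 (in Python/numpy, and reusing the matrix of Example 2.2; it is an illustration only): the elementary matrices of the elimination are accumulated into a single invertible G with GA = U.

```python
import numpy as np

A = np.array([[0., 0., 4., 3.],
              [1., 2., 4., 1.],
              [1., 2., 8., 4.]])

P1 = np.array([[0., 1., 0.],   # interchange rows 1 and 2
               [1., 0., 0.],
               [0., 0., 1.]])
E1 = np.array([[1., 0., 0.],   # subtract row 1 from row 3
               [0., 1., 0.],
               [-1., 0., 1.]])
E2 = np.array([[1., 0., 0.],   # subtract row 2 from row 3
               [0., 1., 0.],
               [0., -1., 1.]])

G = E2 @ E1 @ P1               # product of invertible matrices, hence invertible
U = G @ A
print(U)                       # [[1,2,4,1],[0,0,4,3],[0,0,0,0]] -- upper echelon form
assert abs(np.linalg.det(G)) > 1e-12
```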
Lemma 2.5. Let U ∈ F^{p×q} be an upper echelon matrix with k pivots and let e_j denote the j'th column of I_p for j = 1, ..., p. Then:

(1) k ≤ min{p, q}.
(2) The pivot columns of U are linearly independent.
(3) The span of the pivot columns = span{e1, ..., ek} = R_U; i.e.,
    (a) If k < p, then R_U = { [b; 0] : b ∈ F^k and 0 ∈ F^{p-k} }.
    (b) If k = p, then R_U = F^p.
(4) The first k columns of U^T form a basis for R_{U^T}.

Proof. The first assertion follows from the fact that there is at most one pivot in each column and at most one pivot in each row. Next, let u1, ..., uq
denote the columns of U and let u_{i1}, ..., u_{ik} (with i1 < ... < ik) denote the pivot columns of U. Then clearly

(2.12)   span{u_{i1}, ..., u_{ik}} ⊆ span{u1, ..., uq} ⊆ { [b; 0] : b ∈ F^k and 0 ∈ F^{p-k} } ,

if k < p. On the other hand, the matrix formed by arraying the pivot columns one after the other is of special form:

[u_{i1} ... u_{ik}] = [ U11 ]
                      [ U21 ] ,

where U11 is a k × k upper triangular matrix with the pivots as diagonal entries and U21 = O_{(p-k)×k}. Therefore, U11 is invertible, and, for any choice of b ∈ F^k, the formulas

[u_{i1} ... u_{ik}] U11^{-1} b  =  [ U11 ] U11^{-1} b  =  [ b ]
                                   [ U21 ]                [ 0 ]

imply (2) and that

(2.13)   { [b; 0] : b ∈ F^k and 0 ∈ F^{p-k} }  ⊆  {Ux : x ∈ F^q}  ⊆  span{u_{i1}, ..., u_{ik}} .
The two inclusions (2.12) and (2.13) yield the equality advertised in (a) of (3). The same argument (but with U = U11) serves to justify (b) of (3). Item (4) is easy and is left to the reader. □

Exercise 2.5. Verify (4) of Lemma 2.5.

Exercise 2.6. Let U ∈ F^{p×q} be an upper echelon matrix with k pivots. Show that there exists an invertible matrix K ∈ F^{q×q} such that:

(1) If k < q, then R_{U^T} = { K [b; 0] : b ∈ F^k and 0 ∈ F^{q-k} }.
(2) If k = q, then R_{U^T} = F^q.
[HINT: In case of difficulty, try some numerical examples for orientation.]

Exercise 2.7. Let U be a 4 × 5 matrix of the form

U = [u1 u2 u3 u4 u5] = [ u11 u12 u13 u14 u15 ]
                       [  0   0  u23 u24 u25 ]
                       [  0   0   0  u34 u35 ]
                       [  0   0   0   0   0  ]

with u11, u23 and u34 all nonzero. Show that span{u1, u3, u4} = R_U.
Exercise 2.8. Find a basis for the null space N_U of the 4 × 5 matrix U considered in Exercise 2.7 in terms of its entries u_ij, when the pivots of U are all set equal to one.

Lemma 2.6. Let U ∈ F^{p×q} be in upper echelon form with k pivots. Then:

(1) k ≤ min{p, q}.
(2) k = q ⟺ U is left invertible ⟺ N_U = {0}.
(3) k = p ⟺ U is right invertible ⟺ R_U = F^p.
Un
]
O(p-q)Xq
if q < p
and
U = Uu
if q = p,
where Un is a q x q upper triangular matrix with nonzero diagonal entries. Thus, if q < p and V E IFqxp is written in block form as V
=
[Vu
Vd
with ViI = U 1/ and V12 E IFqx(p-q), then V is a left inverse of choice of Vi2 E IFqx(P-q)j i.e., k = q =? U is left invertible.
U
for every
Suppose next that U is left invertible with a left inverse V. Then
x E Nu
=?
i.e., U left invertible
Ux
=?
= 0 =? 0 = V(Ux) =
(VU)x
= x,
Nu = {a}.
To complete the proof of (2), observe that: The span of the pivot columns of U is equal to the span of all the columns of U, alias Ru. Therefore, every column of U can be expressed as a linear combination of the pivot columns. Thus, as
Nu
= {a} =?
the q columns of U are linearly independent,
it follows that
Nu = {a}
=?
U has q pivots.
Finally, even though the equivalence k = p ¢:=} Ru = IF p is known from Lemma 2.5, we shall present an independent proof of all of (3), because it is instructive and indicates how to construct right inverses, when they exist. We proceed in three steps: (a) k = P =? U is right invertible: If k = p = q, then U is right (and left) invertible by Lemma 1.4. If k = p and q > p, then there exists a
2.3. Upper echelon matrices
33
q x q permutation matrix P that (multiplying U on the right) serves to interchange the columns of U so that the pivots are concentrated on the left, i.e., UP = [Un U12 ] ,
where Un is a p x p upper triangular matrix with nonzero diagonal entries. Thus, if q > p and V E lF qxp is written in block form as
V
= [
~~ ]
with Vn E lF Pxp and V21 E IF(q-p)xp , then
UPV = Ip {::::::} Un Vn
+ U12V21 = Ip {::::::} Vn
=
Ulil(Ip - U12V21) .
Consequently, for any choice of V21 E IF(q-p)x p , the matrix PV will be a right inverse of U if Vn is chosen as indicated just above; i.e., (a) holds. (b) U is right invertible ~ Ru = lF P : If U is right invertible and V is a right inverse of U, then for each choice of b E IF P, x = Vb is a solution of the equation Ux = b:
UV
= Ip
~
U(Vb)
= (UV)b = b;
i.e., (b) holds. (c) Ru = lF P ~ k = p: If Ru = lF P , then there exists a vector v E lF q such that Uv = e p , where ep denotes the p'th column of Ip. If U has less than p pivots, then the last row of U, erU = OT, i.e.,
1 = e~ep
= e~(Uv) = (e~U)v= OTv = 0,
which is impossible. Therefore, Ru = lF P
~
U has p pivots and (c) holds.
o Exercffie 2.9. Let A
~ [~ ~ ~] and B ~ [~ ~ ~]. Fmd
a bMffl fur
each of the spaces RBA, RA and RAB. Exercise 2.10. Find a basis for each of the spaces NBA, NA and NAB for the matrices A and B that are given in the preceding exercise. Exercise 2.11. Show that if A E lF pxq , B E lF Pxp and Ub ... ,Uk is a basis for RA, then span {BU1, ... ,BUk} = RBA and that this second set of vectors will be a basis for RBA if B is left invertible. Exercise 2.12. Show that if A is a p x q matrix and C is a q x q invertible matrix, then RAG = RA·
2. Gaussian elimination
34
Exercise 2.13. Show that if U E lF pxq is a p x q matrix in upper echelon form with p pivots, then U has exactly one right inverse if and only if p = q. If A E lF pxq and U is a subspace of lF q , then
AU={Au: UEU}.
(2.14)
Exercise 2.14. Show that if GA = Band G is invertible (as is the case in formula (2.11) with U = B), then
nB = GnA, NB =NA, nBT = nAT and GTNBT = NAT . Exercise 2.15. Let U E lF pxq be an upper echelon matrix with k pivots, where 1 ~ k ~ p < q. Show that Nu =1= {o}. [HINT: There exists a q x q permutation matrix P (that is introduced to permute the columns of U, if need be) such that UP = [Uu U21
U12] , U22
where Uu is a k x k upper triangular matrix with nonzero diagonal entries, Ul2 E lFkx(q-k), U21 = O(p-k)Xk and U22 = O(p-k)x(q-k) and hence that x
=P
[ Ulil Ul2 ] Y -Iq-k
is a nonzero solution of the equation Ux = 0 for every nonzero vector y E lF q- k .]
Exercise 2.16. Let nL = nL(U) and nR = nR(U) denote the number of left and right inverses, respectively, of an upper echelon matrix U E lF pxq . Show that the combinations (nL = 0, nR = 0), (nL = 0, nR = 00), (nL = 1, nR = 1) and (nL = 00, nR = 0) are possible. Exercise 2.17. In the notation of the previous exercise, show that the combinations (nL = 0, nR = 1), (nL = 1, nR = 0), (nL = 00, nR = 1), (nL = 1, nR = 00) and (nL = 00, nR = 00) are impossible. Lemma 2.7. Let A E lF pxq and assume that NA = {o}. Then p ~ q. Proof. Lemma 2.4 guarantees the existence of an invertible matrix G E lF Pxp such that formula (2.11) is in force and hence that
NA = {o}
{:=:}
Nu = {o} .
Moreover, in view of Lemma 2.6,
Nu
= {o}
{:=:}
U has q pivots.
Therefore, by another application of Lemma 2.6, q
~ p.
o
Theorem 2.8. Let Vb .•. ,Ve be a basis for a vector space V over IF and let UI, ... ,Uk be a basis for a subspace U of V. Then:
2.3. Upper echelon matrices
35
(1) k S: £. (2) k = £ <;::=} U = V. (3) If k < £, then there exist a set of vectors {WI, ... , Wl-k} in V such that {UI, ... ,Uk,Wl, ... ,Wl-k} is a basis for V. Proof. Since U is a subspace of V, each of the vectors Uj can be expressed as a unique linear combination of the vectors in the basis for V, i.e., l
Uj =
L
aijVi ,
for
j = 1, ... , k .
i=1
Thus, the £ x k matrix
[a~1
_ A - ..
all
...
a~k. 1 .
alk
that is based on these coefficients is uniquely defined by these two sets of vectors. The next objective is to show that NA = {o}. To this end, suppose that k
L
aijCj
= 0
for i = 1, ... ,£.
j=1
Then
t
~c;u; = ~c; (t,ai;Vi) = (~%C;)
Vi
= o.
Thus, CI = ... = Ck = 0, since the vectors UI, ... ,Uk are linearly independent. Therefore, NA = {o} and, by Lemma 2.7, k S: £. Suppose next that k = £ and span {UI, •.• ,Uk} =1= V. Then there exists a vector W E V that is linearly independent of {UI' ... , Uk}. But this in turn implies that span {UI' ... , Uk, w} is a subspace of V and hence, by the argument furnished to justify assertion (1), that £ 2: k + 1. But this is impossible if k = £. Therefore, k=£===}U=V.
Conversely, if U = V, then every vector v j, j = 1, . .. , £, can be written as a linear combination of UI, ... ,Uk, and hence the argument used to verify (1) implies that the inequality £ S: k must also be in force. Therefore, k = £. Finally, if k < £, then (2) implies that U =1= V and hence that there exists a vector WI ¢ span {Ul, ... , Uk}. But this implies that {UI,"" Uk, WI} is a linearly independent set of vectors. Moreover, if k + 1 < £, then there exists a vector W2 ¢ span {UI, .. " Uk, WI}. Therefore, {Ul, ... , Uk, WI, W2} is a linearly independent set of vectors inside the vector space V. If k + 2 < £,
2. Gaussian elimination
36
then the procedure continues until ℓ - k new vectors {w1, ..., w_{ℓ-k}} have been added to the original set {u1, ..., uk} to form a basis for V. □

• dimension: The preceding theorem guarantees that if V is a vector space over F with a finite basis, then every basis of V has exactly the same number of vectors. This number is called the dimension of the vector space.

• zero dimension: The dimension of the vector space {0} will be assigned the number zero.

Exercise 2.18. Show that F^k is a k dimensional vector space over F.

Exercise 2.19. Show that if U and V are finite dimensional subspaces of a vector space W over F, then the set U + V that is defined by the formula
(2.15)
U + V = {u + v :  u ∈ U and v ∈ V}

is a vector space over F and

(2.16)   dim (U + V) = dim U + dim V - dim (U ∩ V) .
Exercise 2.20. Let T be a linear mapping from a finite dimensional vector space U over IF into a vector space V over IF. Show that dim RT :::; dim U. [HINT: If UI, . .. ,Un is a basis for U, then the vectors TUj, j = 1,... ,n span RT.] Exercise 2.21. Construct a linear mapping T from a vector space U over IF into a vector space V over IF such that dim RT < dim U.
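Formula (2.16) can also be tested numerically by identifying a subspace with the column space of a matrix and measuring dimensions with matrix ranks. The sketch below is an illustration only; the random matrices and the helper null_space are not part of the text, and the intersection is computed independently of the formula being checked.

```python
import numpy as np

def null_space(M, tol=1e-10):
    # Orthonormal basis (as columns) for the null space of M, via the SVD.
    _, s, vt = np.linalg.svd(M)
    rank = int((s > tol).sum())
    return vt[rank:].T

rng = np.random.default_rng(0)

# Bases (as columns) for two subspaces of R^6 that share exactly one direction.
U = rng.standard_normal((6, 3))
V = np.hstack([U[:, :1] + 2 * U[:, 1:2], rng.standard_normal((6, 2))])

dim_U = np.linalg.matrix_rank(U)                    # 3
dim_V = np.linalg.matrix_rank(V)                    # 3
dim_sum = np.linalg.matrix_rank(np.hstack([U, V]))  # dim (U + V)

# w = U a = V b lies in the intersection exactly when (a, b) is in the
# null space of [U  -V]; mapping such pairs through U spans U ∩ V.
N = null_space(np.hstack([U, -V]))
dim_cap = np.linalg.matrix_rank(U @ N[:3, :]) if N.size else 0

assert dim_sum == dim_U + dim_V - dim_cap           # formula (2.16)
print(dim_U, dim_V, dim_cap, dim_sum)               # 3 3 1 5
```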
2.4. The conservation of dimension

Theorem 2.9. Let T be a linear mapping from a finite dimensional vector space U over F into a vector space V over F (finite dimensional or not). Then

(2.17)   dim N_T + dim R_T = dim U .
Proof. In view of Exercise 2.20, RT is automatically a finite dimensional space regardless of the dimension of V. Suppose first that NT =I- {O}, RT =I{O} and let UI, ... ,Uk be a basis for NT, VI, .•• ,VI be a basis for RT and choose vectors Yj E U such that TYj
= Vj,
j
= 1, ... ,l.
The first item of business is to show that the vectors u1, ..., uk and y1, ..., yℓ are linearly independent over F. Suppose, to the contrary, that there exist scalars α_1, ..., α_k and β_1, ..., β_ℓ such that

(2.18)   Σ_{i=1}^{k} α_i u_i + Σ_{j=1}^{ℓ} β_j y_j = 0 .
Then

T ( Σ_{i=1}^{k} α_i u_i + Σ_{j=1}^{ℓ} β_j y_j ) = T(0) = 0.
But the left-hand side of the last equality can be reexpressed as I
k
1
LaiTui + L{3jTYj = 0 + L{3jVj. i=1 j=l j=1 Therefore, {31 = ... = {31 = 0 and so too, by (2.18), al = ... = ak = O. This completes the proof of the asserted linear independence. The next step is to verify that the vectors U1, . .. ,Uk, Yl,. . . ,YI span U and thus that this set of vectors is a basis for U. To this end, let wE U. Then, since I
Tw
I
= L {3jVj = L {3jTYj ,
j=l j=l for some choice of {31, . .. ,{3£ ElF, it follows that
This means that I
w - L{3jYj E NT j=l and, consequently, this vector can be expressed as a linear combination of Ul, .. ' ,Uk. In other words, k
I
LaiUi + L{3jYj i=l j=l for some choice of scalars al, ... ,ak and {3I, ... ,{31 in IF. But this means that span{Ul, ... ,Uk,Y},'" ,YI} = U and hence, in view of the already exhibited linear independence, that W
=
dimU = k+l
= dimNT + dim RT, as claimed. Suppose next that NT = {O} and 'RT =1= {O}. Then much the same sort of argument serves to prove that if VI, . .. ,VI is a basis for 'RT and if Yj E U is such that TYj = Vj for j = 1, ... ,I, then the vectors YI, ... ,YI are
linearly independent and span U. Thus, dim U = dim R_T = ℓ, and hence formula (2.17) is still in force, since dim N_T = 0.
It remains only to consider the case RT = {O}. But then NT = U, and formula (2.17) is still valid. 0 Remark 2.10. We shall refer to formula (2.17) as the principle of conservation of dimension. Notice that it is correct as stated if U is a finite dimensional subspace of some other vector space W.
2.5. Quotient spaces This section is devoted to a brief sketch of another approach to establishing Theorem 2.9 that is based on quotient spaces. It can be skipped without loss of continuity. • Quotient spaces: Let V be a vector space over IF and let M be a subspace of V and, for U E V, let UM = {u + m : mE M}. Then V j M = {UM : U E V} is a vector space over IF with respect to the rules UM +VM = (U+V)M and a(uM) = (au)M of vector addition and scalar multiplication, respectively. The details are easily filled in with the aid of Exercises 2.22-2.24. Exercise 2.22. Let M be a proper nonzero subspace of a vector space V over IF and, for U E V, let UM = {u + m: mE M}. Show that if X,Y E V, then xM = YM { = } x - Y E M and use this result to describe the set of vectors U E V such that UM = OM. Exercise 2.23. Show that if, in the setting of Exercise 2.22, u, v, x, y E V and if also uM = XM and v M = YM, then (u + V)M = (x + Y)M. Exercise 2.24. Show that if, in the setting of Exercise 2.22, a, {3 E IF and U E V, but U ¢ M, then (au)M = ({3U)M if and only if a = {3. Exercise 2.25. Let U be a finite dimensional vector space over IF and let V be a subspace of U. Show that dimU = dim (UjV) + dim V. Exercise 2.26. Establish the principle of conservation of dimension with the aid of Exercise 2.25.
2.6. Conservation of dimension for matrices One of the prime applications of the principle of conservation of dimension is to the particular linear transformation T from lF q into lF P that is defined by multiplying each vector x E lF P by a given matrix A E lF pxq • Because of its importance, the main conclusions are stated as a theorem, even though
2.6. Conservation of dimension for matrices
39
they are easily deduced from the definitions of the requisite spaces and Theorem 2.9.

Theorem 2.11. If A ∈ F^{p×q}, then

(2.19)   N_A = {x ∈ F^q : Ax = 0} is a subspace of F^q,
(2.20)   R_A = {Ax : x ∈ F^q} is a subspace of F^p and
(2.21)   q = dim N_A + dim R_A .
• rank: If A ∈ F^{p×q}, then the dimension of R_A is termed the rank of A: rank A = dim R_A.

Exercise 2.27. Let A ∈ F^{p×q}, B ∈ F^{p×p} and C ∈ F^{q×q}. Show that:
(1) rank BA ≤ rank A, with equality if B is invertible.
(2) rank AC ≤ rank A, with equality if C is invertible.

Theorem 2.12. If A ∈ F^{p×q}, then

(2.22)   rank A = rank A^T = rank A^H ≤ min{p, q} .

Proof. The statement is obvious if A = O_{p×q}. If A ≠ O_{p×q}, then there exists an invertible matrix G ∈ F^{p×p} such that GA = U is in upper echelon form. Thus,

rank A = rank GA = rank U = the number of pivots of U,

whereas

rank A^T = rank A^T G^T = rank U^T = the number of pivots of U.

The proof that rank A = rank A^H is left to the reader as an exercise.
0
Exercise 2.28. Show that if A ∈ C^{p×q}, then rank A = rank A^H.

Exercise 2.29. Show that if A ∈ C^{p×q} and C ∈ C^{k×q}, then

(2.23)   rank [ A ] = q  ⟺  N_A ∩ N_C = {0} .
              [ C ]

Exercise 2.30. Show that if A ∈ C^{p×q} and B ∈ C^{p×r}, then

(2.24)   rank [A  B] = p  ⟺  N_{A^H} ∩ N_{B^H} = {0} .
Exercise 2.31. Show that if A is a triangular matrix (either upper or lower), then rank A is bigger than or equal to the number of nonzero diagonal entries in A. Give an example of an upper triangular matrix A for which the inequality is strict. Exercise 2.32. Calculate dimNA and dim 'RA in the setting of Exercise 2.4 and confirm that these numbers are consistent with the principle of conservation of dimension.
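In the spirit of Exercise 2.32, here is a small numerical check of the principle of conservation of dimension (an added illustration; it reuses the matrix of Example 2.2 and the two null vectors exhibited there, and numpy's matrix_rank stands in for counting pivots).

```python
import numpy as np

# The 3 x 4 matrix from Example 2.2: q = 4 columns.
A = np.array([[0., 0., 4., 3.],
              [1., 2., 4., 1.],
              [1., 2., 8., 4.]])

rank = np.linalg.matrix_rank(A)            # dim R_A = 2 (pivot columns 1 and 3)

# Two independent null vectors found in Example 2.2.
v1 = np.array([-2., 1., 0., 0.])
v2 = np.array([2., 0., -0.75, 1.])
assert np.allclose(A @ v1, 0) and np.allclose(A @ v2, 0)
nullity = np.linalg.matrix_rank(np.column_stack([v1, v2]))   # dim N_A = 2

assert rank + nullity == A.shape[1]        # q = dim N_A + dim R_A, formula (2.21)
assert rank == np.linalg.matrix_rank(A.T)  # rank A = rank A^T (Theorem 2.12)
```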
2.7. From U to A

The next theorem is an analogue of Lemma 2.6 that is stated for general matrices A ∈ F^{p×q}; i.e., the conclusions are not restricted to upper echelon matrices. It may be obtained from Lemma 2.6 by exploiting formula (2.11). However, it is more instructive to give a direct proof.

Theorem 2.13. Let A ∈ F^{p×q}. Then

(1) rank A = p ⟺ A is right invertible ⟺ R_A = F^p.
(2) rank A = q ⟺ A is left invertible ⟺ N_A = {0}.
(3) If A has both a left inverse B ∈ F^{q×p} and a right inverse C ∈ F^{q×p}, then B = C and p = q.
Since RA
~
lF P , it is clear that RA = lF P
¢:::::}
rank A = p.
Suppose next that RA = lF P • Then the equations AXj=bj , j=l, ... ,p,
are solvable for every choice of the vectors b j . If, in particular, b j is set equal to the j'th column of the identity matrix I p , then
A
[Xl
xp] = [b l
...
This exhibits the q x p matrix X
=
[Xl
Xp]
with columns Xl, .. , ,Xp as a right inverse of A. Conversely, if AC = Ip for some matrix C E lF qxp , then X = Cb is a solution of the equation Ax = b for every choice of bE lF P , i.e., RA = lF P • This completes the proof of (1). Next, (2) follows from (1) and the observation that
N A = {o}
= q (by Theorem 2.11)
¢:::::}
rank A
¢:::::}
rank AT = q
-¢=::}
AT is right invertible
-¢=::}
A is left invertible.
(by Theorem 2.12) (by part (1))
Moreover, (1) and (2) imply that if A is both left invertible and right invertible, then p = q and, as has already been shown, the two one-sided inverses coincide: B = BIp = B(AC) = (BA)C = IqC = C. D
2.8. Square matrices
41
Exercise 2.33. Find the null space NA and the range RA of the matrix A
=
[~5 2~ 0~ 4~ 1 acting on
lR. 4
and check that the principle of conservation of dimension holds.
2.8. Square matrices Theorem 2.14. Let A E IFPxP. Then the following statements are equivalent:
(1) A is left invertible. (2) A is right invertible. (3) NA = {a}.
(4) RA = IFP. Proof.
This is an immediate corollary of Theorem 2.13.
o
Remark 2.15. The equivalence of (3) and (4) in Theorem 2.14 is a special case of the Fredholm alternative, which, in its most provocative form, states that if the solution to the equation Ax = b is unique, then it exists, or to put it better: If A E IFPxp and the equation Ax = b has at most one solution, then it has exactly one.
Lemma 2.16. If A E IFPxp, B E IFPxp and AB is invertible, then both A and B are invertible. Proof. p
Clearly, RAB ~ RA and NAB ;2 N B . Therefore,
= rankAB
~ rank A ~ p
and 0 = dimNAB ~ dimNB ~ O.
The rest is immediate from Theorem 2.14.
o
Lemma 2.17. IfVEIFpx q , then (2.25)
NVHV = N v
and rank VHV = rank V .
Proof. It is easily seen that N v ~ NVHV' since Vx = 0 clearly implies that VHVx = O. On the other hand, if VHVx = 0, then xHVHVx = 0 and hence the vector y = Vx, with entries y, ... ,YP is subject to the constraints P
O=yHy
=LIYjI2. j=l
2. Gaussian elimination
42
Therefore, VHVx = 0 ===} y = Vx = O. This completes the proof of the first assertion in (2.25). The second then follows easily from the principle of conservation of dimension, since VHV and V both have q columns. 0 Exercise 2.34. Show that if A E IFpxq and B E IFqx p, then NAB and only if NA n RB = {O} and NB = {O}. Exercise 2.35. Find a p x q matrix A and a vector b E NA = {O} and yet the equation Ax = b has no solutions.
]RP
= {O} if
such that
Exercise 2.36. Let BE IFnxp, A E IFpxq and let {u!, ... ,Uk} be a basis for RA. Show that {Bu!, ... ,Bud is a basis for RBA if and only if RA nNB = {O}.
Exercise 2.37. Find a basis for RA and NA if A =
1 3 1 82] [ ~ !2 ~ ! ~ . 1
6
11 5 9
Exercise 2.38. Let B E IFnx p, A E lF pxq and let {u!, ... ,Uk} be a basis for RA. Show that: (a) span{BuI, ... ,BUk} = RBA. (b) If B is left invertible, then {Bu!, ... ,BUk} is a basis for RBA. Exercise 2.39. Let A E IF 4x5 , let v!, V2, V3 be a basis for RA and let V = [VI V2 V3]. Show that V HV is invertible, that B = V (V HV) -1 V H is not left invertible and yet RB = RBA. Exercise 2.40. Let UI, U2, U3 be linearly independent vectors in a vector space U over IF and let ll4 = UI + 2U2 + U3. (a) Show that the vectors U}, U2, ll4 are linearly independent and that span {UI' U2, U3} = span {UI' U2, U4}. (b) Express the vector 7Ul +13u2+5u3 as a linear combination of the vectors u}, U2, U4. [Note that the coefficients of all three vectors change.]
Exercise 2.41. For which values of x is matrix [
2] invertible? 4 1 ~ 3 4 x
Exercise 2.42. Show that the matrix A =
[~1
3: 2~]
find its inverse by solving the system of equations A[XI umn by column.
is invertible and X2
X3] = 13 col-
Exercise 2.43. Show that if A E C pxp is invertible, BE C pxq , C E C qxp , DE c qxq and
E = then dim RE
[~ ~],
= dim RA + dim R(D-CA-IB)'
Exercise 2.44. Show that if, in the setting of the previous exercise, D is invertible, then rankE = rankD + rank (A - BD-IC).
:~::: 2~:~e:: a basis for
RA
:e [mtt t!,]U':i::lim[T]t:o: :~:,t::~
3 0 1 2 and a basis for N A .
1
n' ~ n~ible, ~d
Exercise 2.46. Use the method of Gaussian elimination to solve the equation Ax
~
b when A
~ [~ ~
b
[
if
Md •
basis for R A and a basis for N A. Exercise 2.47. Use the method of Gaussian elimination to solve the equation Ax
~ b when A ~
U! ~] , ~ Ul b
IT posmble, and Md •
basis for R A and a basis for N A. Exercise 2.48. Find lower triangular matrices with ones on the diagonal EI, E2, ... and permutation matrices PI, P2, . .. such that Ek PkEk-I Pk-l ... EIPI A
is in upper echelon form for any two of the three preceding exercises. Exercise 2.49. Use Gaussian elimination to find at least two right inverses to the matrix A given in Exercise 2.45. [HINT: Try to solve the equation
A
Xu [ X21 X31
X12 X22 X32
X13] X23 X33
X41
X42
X43
= 13 ,
column by column.] Exercise 2.50. Use Gaussian elimination to find at least two left inverses to the matrix A given in Exercise 2.46. [HINT: Find rightinverses to AT.]
Chapter 3
Additional applications of Gaussian elimination
I was working on the proof of one of my poems all morning, and took out a comma. In the afternoon I put it back again.
Oscar Wilde This chapter is devoted to a number of applications of Gaussian elimination, both theoretical and computational. There is some overlap with conclusions reached in the preceding chapter, but the methods of obtaining them are usually different.
3.1. Gaussian elimination redux

Recall that the method of Gaussian elimination leads to the following conclusion:

Theorem 3.1. Let A ∈ F^{p×q} be a nonzero matrix. Then there exists a set of lower triangular p × p matrices E1, ..., Ek with ones on the diagonal and a set of p × p permutation matrices P1, ..., Pk such that

(3.1)   E_k P_k ⋯ E_1 P_1 A = U

is in upper echelon form. Moreover, in this formula, Pj acts only (if at all) on rows j, ..., p and Ej - I_p has nonzero entries in at most the j'th column.
The extra information on the structure of the permutation matrices may seem tedious, but, as we shall see shortly, it has significant implications: it enables us to slide all the permutations to the right in formula (3.1) without changing the form of the matrices E I , ... , Ek.
-
45
Theorem 3.2. Let A E lF pxq be any nonzero matrix. Then there exists a lower triangular p x p matrix E with ones on the diagonal and a p x p permutation matrix P such that EPA=U
is in upper echelon form. Discussion. To understand where this theorem comes from, suppose first that A is a nonzero 4 x 5 matrix and let el, ... ,e4 denote the columns of 14. Then there exists a choice of permutation matrices PI, P2, P3 and lower triangular matrices
El
E2
E3
-
-
=
[~ ~] [~ ~] [~ ~] 0 0 1 0 0 1 0 0
0 0 1 0
d 1 e 0
0 0 1 0 0 1 0 f
-1 - 4 + Ul e Tl with Ul=
= 14 + u2 e f
with U2 =
-1 - 4 + U3 e T3 with
[~] [;]
~~m,
such that
(3.2) is in upper echelon form. In fact, since P2 is chosen so that it interchanges the second row of EIP1A with its third or fourth row, if necessary, and P3 is chosen so that it interchanges the third row of E2P2EIPIA with its fourth row, if necessary, these two permutation matrices have a special form: P2
-
[01 °IITl] ,
where III
P3
=
[~
where 112 is a 2 x 2 permutation matrix.
g2]'
is a 3 x 3 permutation matrix
This exhibits the pattern, which underlies the fact that
47
where Ej denotes a matrix of the same form as E j • Thus, for example, since e~ P3 = e~ and V2 = P3U2 is a vector of the same form as U2, it follows that P3 E 2
P3(14
+ u2e f)
P3 +v2e~
=
P3 + v2e~ P3 = E~P3 ,
where E~ = 14 +v2e~
is a matrix of the same form as E 2 • In a similar vein P3P2E I
= P3E~ P2 = E~ P3P2
and consequently, E3P3E2P2EIPI = E3E~E~ P3P2PI = EP,
with E
= E3E~E~
and
P
= P3P2PI .
Much the same argument works in the general setting of Theorem 3.2. You have only to exploit the fact that Gaussian elimination corresponds to multiplication on the left by EkPk··· EIPI, where E·J
= Ip + [0] bj
e~J
with
°
E
IF;
j and b·J E lF p -'
and that the p x p permutation matrix Pi may be written in block form as Pi =
[lo 0], i- 1
IIi-l
where Cj
II i - 1
is a (p - i
+ 1) x (p - i + 1) permutation matrix.
Then, letting
= IIi-lbj, PiEj
Pi (lp Pi
+
+
[:J e;)
[~] e;
(lp+[~]e;)~=EjPi since
e; Pi = e;
for
i>j,
for i > j .
Remark 3.3. Theorem 3.2 has interesting implications. However, we wish to emphasize that when Gaussian elimination is used in practice to study the equation Ax = b, it is not necessary to go through all this theoretical
analysis. It suffices to carry out all the row operations on the augmented matrix
[ a_11 ... a_1q  b_1 ]
[  .         .    .  ]
[ a_p1 ... a_pq  b_p ]

and then to solve for the "pivot variables", just as in the examples. But, do check that your answer works.
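As a companion to the remark above, here is a bare-bones sketch of the practical recipe: row reduce the augmented matrix (with row interchanges) and then solve for the pivot variables from the bottom up. It is only an illustration and, to keep it short, it assumes that A is square and invertible.

```python
import numpy as np

def solve_by_elimination(A, b):
    """Solve Ax = b for square invertible A by Gaussian elimination on [A | b]."""
    A = np.asarray(A, dtype=float)
    M = np.hstack([A, np.asarray(b, dtype=float).reshape(-1, 1)])  # augmented matrix
    n = A.shape[0]
    for j in range(n):                       # forward elimination with row interchanges
        p = j + np.argmax(np.abs(M[j:, j]))  # pick the largest available pivot
        M[[j, p]] = M[[p, j]]
        M[j + 1:] -= np.outer(M[j + 1:, j] / M[j, j], M[j])
    x = np.zeros(n)
    for j in range(n - 1, -1, -1):           # back substitution, bottom row up
        x[j] = (M[j, -1] - M[j, j + 1:n] @ x[j + 1:]) / M[j, j]
    return x

A = [[2., 1., -1.], [1., 2., 1.], [3., -1., 6.]]
b = [0., 8., 9.]
x = solve_by_elimination(A, b)
assert np.allclose(np.array(A) @ x, b)       # "do check that your answer works"
```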
In what follows, we shall reap some extra dividends from the representation formula
(3.3)
EPA=U
(that is valid for both real and complex matrices) by exploiting the special structure of the upper echelon matrix U. Exercise 3.1. Show that if A E lF nxn is an invertible matrix, then there exists a permutation matrix P such that
(3.4)
PA=LDU,
where L is lower triangular with ones on the diagonal, U is upper triangular with ones on the diagonal and D is a diagonal matrix. Exercise 3.2. Show that if LlDlUl = L2D2U2, where L j , D j and Uj are n x n matrices of the form exhibited in Exercise 3.1, then L1 = L 2, D1 = D2 and Ul = U2. [HINT: Consider L2lL1D1 = D2U2Ul1.] Exercise 3.3. Show that there exists a 3 x 3 permutation matrix P and a lower triangular matrix
B=
[~:
l nsuch that [! H][ ~ ! n
if and only if a = O.
= BP
=l(f1ran upper triangclar
Exercise 3.4. Find a permutation matrix P such that P A
:::::::::::t:::e~a:r:
= LU, where L
3.2. Properties of BA and AC In this section a number of basic properties of the product of two matrices in terms of the properties of their factors are reviewed for future use. Lemma 3.4. Let A E lF pxq and let B E lF Pxp be invertible. Then:
3.2. Properties of BA and AC
49
(1) A is left invertible if and only if B A is left invertible. (2) A is right invertible if and only if BA is right invertible.
(3) NA =NBA.
(4) BRA = RBA. (5) rank BA = rank A. (6) NA = {O} ~ NBA = {O}.
(7) RA = IF P ~ RBA = IF P •
Proof. The first assertion follows easily from the observation that if C is a left inverse of A, then CA
= Iq
====?
(CB-I)(BA)
= Iq .
Conversely, if C is a left inverse of BA, then
= Iq
C(BA)
====?
(CB)A
= Iq .
This completes the proof of (1). Next, to verify (2), notice that if C is a right inverse of A, then AC
= Ip
====?
(BA)C
= B(AC) = B
====?
BA(CB- I )
= Ip;
i.e., (C B- 1 ) is a right inverse of B A. Conversely, if C is a right inverse of BA, then (BA)C
= Ip
====?
B(AC)
= Ip ====? AC = B- 1 ====?
A(CB)
= Ip;
i.e., CB is a right inverse of A. Items (3) and (4) are easy and are left to the reader. To verify (5), let
{UI, ...
,Uk} be a basis for RA. Then clearly
span {BU1,'" ,BUk}
= RBA.
Moreover, the vectors BU1,'" ,BUk are linearly independent, since
Σ_j α_j (B u_j) = 0  ⟹  B ( Σ_j α_j u_j ) = 0  ⟹  Σ_j α_j u_j ∈ N_B, and N_B = {0}, which forces all the coefficients α_1, ..., α_k to be zero, because the vectors u1, ..., uk are linearly independent. Thus, the vectors B u1, ..., B uk are also linearly independent and hence dim R_{BA}
= k = dim RA ,
which proves (5). Finally, (6) is immediate from (3), and (7) is immediate from (5).
Exercise 3.5. Verify items (3), (4), and (7) of Lemma 3.4.
0
Exercise 3.6. Verify item (5) of Lemma 3.4 on the basis of (3) and the law of conservation of dimension. [HINT: The matrices A and BA have the same number of columns.] Lemma 3.5. Let A E lF pxq and let C E lF qxq be invertible. Then: (1) A is left invertible if and only if AC is left invertible. (2) A is right invertible if and only if AC is right invertible.
= CNAG. (4) RA = RAG. (3) NA
(5) dimNA = dimNAG.
(6) NA (7) RA
= {O} ¢::=} NAG = {O}. = lF P ¢::=} RAG = lF P .
Exercise 3.7. Verify Lemma 3.5. [HINT: The fact that {UI,'" ,Uk} is a basis for NA ¢::=} {C-IUI, ... ,C-IUk} is a basis for NAG is helpful.] Exercise 3.S. Give an example of a pair of matrices A E lF pxq and C E lF qxq such that C is invertible, but NA =1= NAG. Exercise 3.9. Let A E lF pxq , B E lF Pxp and let {UI,'" ,Uk} be a basis for RA. Show that if B is left invertible, then {BUI,'" ,Bud is a basis for RBA. Exercise 3.10. Find a pair of matrices A E lF pxq and B E lF Pxp such that B is not left invertible and yet {BUI,'" , BUk} is a basis for RBA for every basis {UI, ... , Uk} of RA.
3.3. Extracting a basis

Let {v1, ..., vk} be a set of vectors in F^n. A problem of interest is to find a basis for the subspace
v = span{vl, ... ,vd. This problem may be solved by the following sequence of steps: (1) Let A =
[VI ... Vk]'
(2) Use Gaussian elimination to reduce A to an upper echelon matrix U. (3) The pivot columns of A form a basis for V. Example 3.6. Let
Then, following the indicated strategy, we first set

A = [ 1 2  2 0 ]
    [ 1 2  4 1 ]
    [ 3 6 10 2 ] .
Then, by Gaussian elimination, a corresponding upper echelon matrix is U=
[ 1 2 2 0 ]
[ 0 0 2 1 ]
[ 0 0 0 0 ] .
The pivot columns of U are the first and the third. Therefore, by the recipe furnished above,

R_A = span{v1, v2, v3, v4} = span{v1, v3},   and   dim R_A = 2.
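The three-step recipe is easy to mechanize. The helper below is an illustration only (the function name and the tolerance are ad hoc choices, and the matrix is the one of Example 3.6 as far as it can be read): it performs Gaussian elimination and reports the pivot columns, whose corresponding vectors form a basis for the span.

```python
import numpy as np

def pivot_columns(A, tol=1e-10):
    """Indices of the pivot columns of A, found by Gaussian elimination."""
    U = np.asarray(A, dtype=float).copy()
    rows, cols = U.shape
    pivots, r = [], 0
    for c in range(cols):
        if r == rows:
            break
        p = r + np.argmax(np.abs(U[r:, c]))
        if abs(U[p, c]) <= tol:
            continue                          # no pivot in this column
        U[[r, p]] = U[[p, r]]
        U[r + 1:] -= np.outer(U[r + 1:, c] / U[r, c], U[r])
        pivots.append(c)
        r += 1
    return pivots

# The vectors of Example 3.6, arrayed as the columns of A.
A = np.array([[1., 2., 2., 0.],
              [1., 2., 4., 1.],
              [3., 6., 10., 2.]])
print(pivot_columns(A))      # [0, 2]: v1 and v3 form a basis for the span
```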
Exercise 3.11. Use Gaussian elimination to find NA and RA for each of the following choices of the matrix A:
[; ~ ~ ~] [-~ 3 2 6 1
~3 0~ 0~].
1
Exercise 3.12. Find a basis for the span of the vectors
3.4. Computing the coefficients in a basis

Let {u1, ..., uk} be a basis for a k dimensional subspace U of F^n. Then every vector b ∈ U can be expressed as a unique linear combination of the vectors {u1, ..., uk}; i.e., there exists a unique set of coefficients c1, ..., ck such that

b = Σ_{j=1}^{k} c_j u_j .

The problem of computing these coefficients is equivalent to the problem of solving the equation

A c = b,

where A = [u1 ... uk] is the n × k matrix with columns u1, ..., uk and c = [c1 ... ck]^T. This problem, too, can be solved efficiently by Gaussian elimination.
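A minimal sketch of this computation (added for illustration; numpy's least squares routine stands in for explicit Gaussian elimination, and with linearly independent columns it returns the same unique coefficient vector c).

```python
import numpy as np

# A basis u1, u2, u3 for a 3-dimensional subspace of R^5, stored as the columns of A.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

c_true = np.array([2.0, -1.0, 0.5])
b = A @ c_true                              # a vector known to lie in the subspace

c, *_ = np.linalg.lstsq(A, b, rcond=None)   # solves A c = b
assert np.allclose(c, c_true)               # the coefficients are unique
```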
Exercise 3.13. Let
U
=
[UI
U2
U3
U4
U5]
[H ~ H1
=
o
Show that the pivot columns
UI, U3
and
0 0 0 0
form a basis for Ru.
U4
Exercise 3.14. Show that in the setting of the previous exercise, the columns UI, U3 and U5 also form a basis for Ru and calculate the coefficients a, b, c, d, e, f in the two representations
[n~
aud 1m;, + cu,
~
dud eu, + lut
of the vector given on the left.
3.5. The Gauss-Seidel method

Gaussian elimination can be used to find the inverse of a p × p invertible matrix A by solving each of the equations

A x_j = e_j ,   j = 1, ..., p,

where the right-hand side e_j is the j'th column of the identity matrix I_p. Then the formula

A [x_1 ... x_p] = [e_1 ... e_p] = I_p

identifies X = [x_1 ... x_p] as the inverse A^{-1} of A.
The Gauss-Seidel method is a systematic way of organizing all p of these separate calculations into one more efficient calculation by proceeding as follows, given A E lF Pxp , invertible or not: 1. Construct the p x 2p augmented matrix
A = [A
Ip].
2. Carry out elementary row operations on A that are designed to bring A into upper echelon form. This is equivalent to choosing E and P so that EPA = [EPA EPIp ]
=
[U
EP].
3. Observe that U is a p x p upper triangular matrix with k pivots. If k < p, then A is not invertible and the procedure grinds to a halt. If k = p, then Uii 1= 0 for i = 1, ... ,po Therefore, by Lemma 1.4, there exists an upper triangular matrix F such that FU = Ip and hence A-I = FEP. To obtain
3.5. The Gauss-Seidel method
53
A-I numerically, go on to the next steps. 4. Multiply [U EP] on the left by a diagonal matrix D = [dij] with dii = (Uii)-I for i = 1, ... ,p to obtain
[u where now U
DEP] ,
= DU is an upper triangular matrix with ones on the diagonal.
5. Carry out
el~mentary row manipulations on
[U
DEP] that are de-
signed to bring U !o the identi.9-"..:.. This is equivalent to choosing an upper triangular matrix F such that FU = Ip. Then
and hence, as F DEPA = I p , the second block on the right FDEP = FEP = A-I.
6. Check! Multiply your candidate for A-I by A to see if you really get the identity matrix Ip as an answer. Thus, for example, if
then
1 3 1 [ A = 2 8 4
1 0 0 010
047
001
1
,
and two steps of Gaussian elimination lead in turn to the forms
Al
=
[~
3 1
1 0
2 2
-2 1
4 7
0 0
~1
and A2 =
[~
3 1
1
0
2 2
-2
1
0 3
4
-2
~]
3. Additional applications of Gaussian elimination
54
Next, let
A3
=
100] [ 0 ~ 0
o
0
A2
[131 = 0 1 1
l
1
-1
0 0 1
and then subtract the bottom row of A3 from the second and first rows to obtain
A4 =
1 3 0 [ 0 1 0 001
The next to last step is to subtract three times the second row of the first to obtain
A5=
A4 from
1 0 0 [ 010 001
The matrix built from the last 3 columns of A5 A-I. The final step is to check that this matrix, which is conveniently written as
is indeed the inverse of A, i.e., that
AB = 13. Exercise 3.15. Find the inverse of the matrix
[~1 :3 2~l
by the Gauss-
Seidel method. Exercise 3.16. Use the Gauss-Seidel method to find the inverse of the matrix
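The six steps above can be condensed into a few lines of code (a sketch only, not the author's procedure verbatim): it reduces [A | I_p] to [I_p | A^{-1}] by elementary row operations and then performs the check of Step 6 on the 3 × 3 matrix inverted in the worked example.

```python
import numpy as np

def invert_by_row_reduction(A):
    """Reduce [A | I] to [I | A^{-1}] by elementary row operations."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])            # Step 1: the augmented matrix [A | I]
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))  # pivot search, with a row interchange
        if abs(M[p, j]) < 1e-12:
            raise ValueError("matrix is not invertible (fewer than n pivots)")
        M[[j, p]] = M[[p, j]]
        M[j] /= M[j, j]                      # scale the pivot row so the pivot is 1
        for i in range(n):                   # clear the rest of column j
            if i != j:
                M[i] -= M[i, j] * M[j]
    return M[:, n:]                          # the last n columns now hold A^{-1}

A = np.array([[1., 3., 1.],
              [2., 8., 4.],
              [0., 4., 7.]])
B = invert_by_row_reduction(A)
assert np.allclose(A @ B, np.eye(3))         # Step 6: check
```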
3.6. Block Gaussian elimination

Gaussian elimination can also be carried out in block matrices providing that appropriate range conditions are fulfilled. Thus, for example, if
A = [ A11 A12 A13 ]
    [ A21 A22 A23 ]
    [ A31 A32 A33 ]
is a block matrix and if there exists a pair of matrices Kl and K2 such that (3.5)
A2l
= KlAn and
A3l
= K2 A n,
then
[
-~l -K2
0 I
o o
0
I
1 [ A=
Au 0 0
Al2 -KlAn +A22 -K2A n +A32
Ala+ A23 -KlA13
1 .
-K2A 13 +A33
This operation is the block matrix analogue of clearing the first column in conventional Gaussian elimination. The implementation of such a step depends critically on the existence of matrices Kl and K2 that fulfill the conditions in (3.5). If An is invertible, then clearly Kl = A21All and K2 = A3lAll meet the requisite conditions. However, matrices Kl and K2 that satisfy the conditions in (3.5) may exist even if All is not invertible.
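A numerical illustration of the block step (added here; it treats the easy case in which A11 is invertible, so that K1 = A21 A11^{-1} and K2 = A31 A11^{-1} satisfy (3.5)) and checks that the block column below A11 is cleared.

```python
import numpy as np

rng = np.random.default_rng(3)
blocks = [[rng.standard_normal((2, 2)) for _ in range(3)] for _ in range(3)]
A = np.block(blocks)                       # a 6 x 6 matrix built from 2 x 2 blocks A_ij

A11, A21, A31 = blocks[0][0], blocks[1][0], blocks[2][0]
K1 = A21 @ np.linalg.inv(A11)              # A21 = K1 A11
K2 = A31 @ np.linalg.inv(A11)              # A31 = K2 A11

I2, Z = np.eye(2), np.zeros((2, 2))
L = np.block([[I2, Z, Z],
              [-K1, I2, Z],
              [-K2, Z, I2]])               # the block elimination matrix

cleared = L @ A
assert np.allclose(cleared[2:, :2], 0)     # blocks (2,1) and (3,1) are now zero
```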
Lemma 3.7. Let A E lF pxq , BE lF PXT and C E lF PXT • Then: (1) There exists a matrix K E lF TXq such that A = BK if and only if RA ~ RB.
(2) There exists a matrix L
E lF TXq such that A = LC if and only if
RAT ~ RCT.
Proof. Suppose first that RA
~
RB and let ej denote the j'th column of
I q • Then the presumed range inclusion implies that Aej = BUj for some vector Uj E lFT for j = 1, ... ,q and hence that A = A [el ... e q ] =
B [Ul ... u q ] = BK, with K = [Ul ... u q ]. This proves half of (1). The other half is easy and is left to the reader together with (2). D
Exercise 3.17. Complete the proof of Lemma 3.7. The Schur complement formulas (1.11) and (1.12) that were furnished in Chapter 1 and generalizations that are valid under less restrictive assumptions can be obtained by a double application of block Gaussian elimination.
Theorem 3.S. If All E lF Pxp , Al2 E lF pxq , A2l E lF qxp , A22 E IF qxq and the range conditions
(3.6)
are in force, then there exists a pair of matrices K E lF pxq and L E lF qxp such that
A12 = AuK
(3.7)
and
A21 = LAn
and
[~~~ ~~~] =
(3.8)
[1
~] [~l
A22 -
~AllK ] [3 Z]·
Proof. Lemma 3.7 guarantees the existence of a pair of matrices L E lF qxp and K E lF pxq that meet the conditions in (3.7). Thus,
[ Ip -L
0]
Iq
[All A12] _ [ All A21 A22 0
A12 ] A22 - LA12
and
[A~1
A22
~1lA12 ] [3 ~~] = [Ad 1
A22
which in turn leads easily to formula (3.8).
~LA12 ]
, D
Exercise 3.18. Show that formula (3.8) coincides with the first Schur complement formula (1.11) in the special case that An is invertible. Similar considerations lead to a generalization of the second Schur complement formula (1.12).
Exercise 3.19. Let A E lF nxn be a four block matrix with entries All E lF Pxp , A12 E lF pxq , A21 E lF qxp , A22 E lF qxq , where n = p + q. Show that if the range conditions RAT12 ~ RAT22
(3.9)
and RA21 C RA22 -
are in force, then A admits a factorization of the form
3.7. {0, 1, ∞}
Theorem 3.9. The equation Ax = b has either 0, 1, or infinitely many solutions. Proof.
There are three possibilitities to consider:
(1) b¢RA. (2) bE RA and NA = {o}. (3) bE RA and NA
i= {o}.
In case (1) the equation Ax = b has no solutions. Suppose next that bE R,1 and that Xl and X2 are both solutions to the given equation. Then the identities
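The trichotomy of Theorem 3.9 is easy to observe in small examples. The sketch below classifies a system by comparing rank A with rank of the augmented matrix and with the number of columns; the particular matrices are illustrative only.

```python
import numpy as np

def count_solutions(A, b):
    """Return '0', '1' or 'infinitely many' for the system Ax = b."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.hstack([A, b.reshape(-1, 1)]))
    if rank_Ab > rank_A:
        return "0"                      # b is not in R_A
    return "1" if rank_A == A.shape[1] else "infinitely many"   # N_A = {0} or not

A = np.array([[1., 2.], [2., 4.], [0., 1.]])
print(count_solutions(A, [1., 2., 1.]))   # 1
print(count_solutions(A, [1., 3., 1.]))   # 0
print(count_solutions([[1., 2.]], [3.]))  # infinitely many
```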
0= b - b = AXI - AX2 = A(XI - X2) imply that (Xl -X2) E N,1. Thus, in case (2) (Xl -X2) = 0; i.e., the equation has exactly one solution, whereas in case (3) it has infinitely many solutions: If Ax = band U E N,1, then A(x + au) = b for every a E IF. 0 Exercise 3.20. Find a system of 5 equations and 3 unknowns that has exactly one solution and a system of 3 equations and 5 unknowns that has no solutions. Exercise 3.21. Let nL = nL(A) and nR = nR(A) denote the number of left and right inverses, respectively, of a matrix A E IFpxq. Show that the combinations (nL = 0, nR = 0), (nL = 0, nR = (0), (nL = 1, nR = 1) and (nL = 00, nR = 0) are possible. [HINT: To warm up, consider upper echelon matrices first.] Exercise 3.22. In the notation of the previous exercise, show that the combinations (nL = 0, nR = 1), (nL = 1, nR = 0), (nL = 00, nR = 1), (nL = 1, nR = (0) and (nL = 00, nR = (0) are impossible.
3.8. Review

Before we go on, let us review a few of the main facts that have been established in the first three chapters. To this end, let A ∈ F^{p×q} be a nonzero matrix, let E be a p × p lower triangular matrix with ones on the diagonal and let P be a p × p permutation matrix such that
(3.11)
EPA=U
is in upper echelon form with k pivots. Then:
(1) NA = N u · (2) dimNA = dimNu = q - k. (3) dim R,1 = dim Ru = k. (4) RA = the span of the pivot columns of A. Moreover, • The following are equivalent: 1. NA = {O}. 2. A is left invertible. 3. The equation Ax = b has at most one solution x for each righthand side b. 4. k = q. 5. rank A = q.
• The following are equivalent: 1. RA = JFP. 2. A is right invertible. 3. The equation Ax = b has at least one solution x for each righthand side b. 4. k =p. 5. rankA = p. • Law of conservation of dimension:
• rank A = rank AT.
• If p = q, then NA = {O} if and only if RA = JFP. Therefore, if p = q, then A is left invertible if and only if it is right invertible. (Moreover, in this case, as we have already noted in Theorem 2.13, there is exactly one left inverse and exactly one right inverse and they coincide.)
Exercise 3.23. Use the method of Gaussian elimination to solve each of the following systems of linear equations when possible: 2Xl Xl 3Xl
+
X2
+
2X2
X2
5Xl 3Xl 2Xl
+ + + + + +
X3 2X3 4X2 2X2 X2
3XI 2Xl 3XI
0 -1 1
X3
+ + +
6X3 5 X3 3X3
+ + +
+ + +
5 X2
-
4X2
-
6X2
=
8
7X4
-
6.
5X4
=
1
2X4
9 7 5
Exercise 3.24. Discuss the answers obtained in the preceding exercise in terms of NA and RA. Exercise 3.25. Let U and V be subspaces ofJFk and JF l , respectively. Show that
is a subspace of JFk+l and that dim W
= dimU + dim V.
Exercise 3.26. Let Uj be subspaces of JF kj for j = 1, . .. ,f. Show that
is a subspace of JF k l+··+ k t and that dim W = dimUl exhibit a basis for W.
+ ... + dimUl,
and
Exercise 3.27. Let A E C pxn and e E C pxq . Show that there exists at most one matrix B E C nxq such that AB = C and BH v = 0 for every vector vENA. [HINT: The second fact in (2.25) may be helpful.] Exercise 3.28. Let A E C pxn and e E C pxq and let V E C nxk be a matrix whose columns form a basis for N A . Show that if AB = C, then B = (In - V(VHV)-IVH)B meets the conditions A = jje and jjHv = 0 for every vector vENA and is the only matrix in C nxq to do so. Exercise 3.29. Let A, X, B E c nxn be such that AX = XB. Show that if A is invertible, then there exists a matrix e E c nxn such that AX = XC and e is invertible. [HINT: If X is not invertible, then, without loss of generality, you may assume that X = [Xl X 2 ], where the columns of Xl form a basis for Rx and hence AXI = XIK and X2 = XIL for suitably chosen matrices K and L.] Exercise 3.30. Show that if A 'E c nxn meets the condition Ax = cp(x)x for every vector x E C n, where cp(x) is a scalar valued function of x, then A = >..In for some constant>.. E C. Exercise 3.31. Let A E Show that rank
c nxn
and suppose that Ak-l Ak-2
[Af
Ak-l
... ...
i=
0, but Ak = 0.
In ]
=n.
AL ° [HINT: The given matrix can be expressed as the product of its last block column with its first block row.] Exercise 3.32. If
10:1 i=
(3.12)
[
0:1
:
o:k
1, then the matrix equation
a1 o:k-l
. .... .!~ 0: ...
I]
[XO] Xl
:: 1
Xk
-
[0]:
° 1
admits a unique solution x E C HI with bottom entry xk = (1 - 10:1 2)-1. Use Gaussian elimination to verify this statement when k = 2 and k = 3. Exercise 3.33. Show that if A E lF pxq , B E IFqxp and AB is invertible, then A is right invertible and B is left invertible, however, the converse is false.
Chapter 4

Eigenvalues and eigenvectors
Can you imagine a mathematician writing Moby Dick? Let my name be Ishmael, let the captain's name be Ahab, let the boats name be Pequod, and let the whale's name be as in the title. B. A. Cipra [17] This chapter takes the first steps towards establishing the Jordan decomposition theorem for matrices A E c nxn . Computational techniques and a variant for A E lR. nxn in which all the factors in the Jordan decomposition are also constrained to belong to lR. nxn will be considered in Chapter 6. It is convenient to begin the story in the setting of arbitrary linear transformations acting from one finite dimensional vector space into another. At first glance this might appear to be a much too abstract setting for the stated goal of decomposing a matrix. However, it isn't really, because, as was already indicted in Exercise 1.20, every linear transformation T from a q dimensional vector space U over IF into a p dimensional vector space V over IF can be fully described in terms of a p x q matrix whose entries are determined by specifying a basis for each space. Thus, to recap:
If UI,
...
,uq is a basis for U and
VI, ...
,vp is a basis for V, then
P
TUi
= LajiVj j=I
-
61
aji
E IF, i = 1, ... ,q, j = 1, ... ,po Moreover,
q
U
E U ==>
U
= 2: aiUi for some choice of coefficients
ai
ElF
i=l
and T is linear,
=
v
Tn = Tt.<>,U; = t.<>;{TU;) = ~ (t. aj:<>,)
where [
~l
]
[
a~l
a~q]
Vj
= ~f3jVj,
[7]
(3p apl apq aq Thus, the coefficients (3I, ... ,(3p of v = Tu, stacked as a column vector, are obtained by multiplying the stacked coefficients aI, ... ,aq of U by the p x q matrix A with entries aij' In other words, every linear transformation T from lF q to lF P corresponds to multiplication by a p x q matrix A. The converse is also true; i.e., multiplication of vectors in lF q by a matrix A E lF pxq defines a linear transformation from lF q to lF P • There is in fact a one to one correspondence between linear transformations and matrix multiplication. Moreover, we can, as in the preceeding discussion, either think of the given column vectors as an encoding of some specified basis or more simply as the coefficients with respect to the standard basis for the spaces IF q and IF P, respectively.
In this chapter we shall focus on linear mappings from a finite dimensional vector space U over IF into itself and shall take the same basis, say UI, •.. ,Un, for both VT, the domain of T, and 'RT, the range of T. Correspondingly the matrix A E IF nxn, with entries aij, i, j = 1, ... ,n, is defined by the rule n
(4.1)
TUi =
2:
ajiUj,
i = 1, ...
,n.
j=l
There are many different choices that one can make for the basis. Some will turn out to be more convenient than others.
4.1. Change of basis and similarity • similarity: A pair of matrices A, BEe nxn are said to be similar if there exists an invertible matrix C E c nxn such that A = CBC-l. • change of basis: The matrix representation (4.1) of a given linear transformation T from a vector space U into itself changes as the basis changes. However, any two such matrices are similar.
4.1. Change of basis and similarity
63
Theorem 4.1. Let T be a linear mapping from an n dimensional vector space U over IF into itself. Let Ul,'" ,Un and WI, ... ,Wn be two bases for U and suppose that n
TUi
n
= LajiUj, i = 1, ... ,n and TWi = LbjiWj, i = 1, ... ,no j=1
j=1
Then the matrix A = [aji] is similar to the matrix B = [bji ]; i.e., there exists an invertible n x n matrix C such that
A= CBC- 1 . Proof. Let C be the n x n matrix with entries Cij, i, j defined by the rule
= 1, ... ,n, that are
n Wi
= L CsiUs ,
i
= 1, ... ,n.
s=1 Then, on the one hand, TWi
=
n n n LCsiTus = LCSi LajsUj s=1 • s=1 j=1
whereas, on the other hand, n n n TWi = LbtiWt = Lbti LCjtUj t=l t=1 j=1
=
=
n n L(LajsCsi)Uj, j=1 s=1
n n L(LCjtbti)Uj. j=1 t=1
Therefore, upon comparing the coefficients of Uj, we see that n
L ajsCsi s=1
n
= L Cjtbti i.e., AC = CB . t=1
Therefore, in order to complete the proof it remains only to show that C is invertible. But if C were not invertible, then there would exist a nonzero vector x such that Cx = O. But this in turn implies that n n n L XiWi = L(L CjiXi)Uj = 0, i=1 i=1 j=1 which contradicts the presumed linear independence of WI, ...
,Wn .
0
Exercise 4.1. Show that similarity is an equivalence relation, i.e., denoting similarity by rv: (1) A rv Aj (2) A rv B ==> B rv Aj (3) A rv Band
B", C ==> A
rv
C.
4.2. Invariant subspaces Let T be a linear mapping from a vector space U over IF into it8elf. Then a subspace M of U is said to be invariant under T if Tu E M whenever uEM. The simplest invariant subspaces are the one dimensional ones, if they exist. Clearly, a one dimensional invariant sub8pace M = {em : a E IF} based on a nonzero vector u E U is invariant under T if and only if there exists a constant A E IF such that
(4.2)
Tu = AU,
U
i- 0,
or, equivalently, if and only if N(T->'J)
i- {O} ;
i.e., the nullspace of T - AI is not just the zero vector. In this instance, the number A is said to be an eigenvalue of T and the vector U is said to be an eigenvector of T. In fact, every nonzero vector in N(T->'J) is said to be an eigenvector of T. It turns out that if IF = C and U is finite dimensional, then a one dimensional invariant subspace always exists. However, if IF = JR, then T may not have anyone dimensional invariant subspaces. The best that you can guarantee for general T in this case is that there exists a two dimensional invariant subspace. As we shall see shortly, this is connected with the fact that a polynomial with real coefficients (of even degree) may not have any real roots. Exercise 4.2. Show that if T is a linear transformation from a vector space V over IF into itself, then the vector spaces N(T->'J) and n(T->'I) are both invariant under T for each choice of A E IF. Exercise 4.3. The set V of polynomials p(t) with complex coefficients is a vector space over C with respect to the natural rules of vector addition and scalar multiplication. Let Tp = p"(t) + tp'(t) and Sp = p"(t) + t 2p'(t). Show that the subspace Uk of V of polynomials p(t) = Co + CIt + ... + Cktk of degree less than or equal to k is invariant under T but not under S. Find a nonzero polynomial p E U3 and a number A E C such that Tp = Ap. Exercise 4.4. Show that if T is a linear transformation from a vector space
V over IF into itself, then T2 + 5T + 61 = (T + 31) (T + 21).
4.3. Existence of eigenvalues The first theorem in this section serves to establish the existence of at least one eigenvalue A E C for a linear transformation that maps a finite dimensional vector space over C into itself. The second theorem serves to bound
4.3. Existence of eigenvalues
65
the number of distinct eigenvalues of such a transformation by the dimension of the space. Theorem 4.2. Let T be a linear transformation from a vector space V over C into itself and let U i= {O} be a finite dimensional subspace of V that is invariant under T. Then there exists a nonzero vector W E U and a number A E C such that TW=AW.
Proof. By assumption, dimU = f for some positive integer f. Consequently, for any nonzero vector u E U the set of f + 1 vectors
u,Tu, ... ,Tlu is linearly dependent over Cj i.e., there exists a set of complex numbers CO, •. . ,Ce, not all of which are zero, such that cou + ... + clrtu = O.
Let k = max {j : the polynomial p(x)
Cj
i= a}.
Then, by the fundamental theorem of algebra,
= Co + CIX + ... + ClXl = CO + ClX + ... + Ck xk
can be factored as a product of k polynomial factors of degree one with roots 1-'1, ... ,I-'k E C:
Correspondingly, cou + ... + clTeu
=
cou + ... + ckTku
=
ck(T - I-'kI) ... (T - Jl.2I)(T - Jl.II)u = O.
This in turn implies that there are k possibilities:
(1) (T - Jl.II)u = O. (2) (T - 1-'1I)u i= 0 and (T - 1-'2I)(T - Jl.II)u = O. (k) (T - I-'k-lI) ... (T - 1-'1I)u i= 0 and (T - Jl.kI) ... (T - 1-'1I)u =
o.
In the first case, Jl.l is an eigenvalue and u is an eigenvector. In the second case, the vector WI = (T - Jl.II)u is a nonzero vector in U and TWI = Jl.2Wl. Therefore, (T - 1-'1I)u is an eigenvector of T corresponding to the eigenvalue Jl.2.
In the k'th case, the vector Wk-l = (T - Jl.k-lI) ... (T - Jl.II)u is a nonzero vector in U and TWk-l = I-'kWk-l. Therefore, (T - Jl.k-lI)··· (T -l-'lI)u is an eigenvector of T corresponding to the eigenvalue I-'k. 0
Notice that the proof does not guarantee the existence of real eigenvalues for linear transformations T from a vector space V over lR into itself because the polynomial p(x) = CO + CIX + ... + Ckxk may have only complex roots J.tb ••. ,J.tk even if the coefficients CI, ... ,Ck are real; see e.g., Exer
Exercise 4.5. Let T be a linear transformation from a vector space V over lR into itself and let U be a two dimensional subspace of V with basis {Ul' U2}. Show that if TUl = U2 and TU2 = -UI, then T 2u + U = 0 for every vector U E U but that there are no one dimensional subspaces of U that are invariant under T. Why? [HINT: A one dimensional subspace of U is equal to {a(clul +C2U2) : a E lR} for some choice of CI, C2 E lR with icll+ic21 > 0.] Theorem 4.3. Let T be a linear transformation from an n dimensional vector space U over C into itself and let UI, ... ,Uk E U be eigenvectors ofT corresponding to a distinct set of eigenvalues AI, ... ,Ak' Then the vectors UI, ... ,Uk are linearly independent and hence k ::; n. Proof.
Let al, ... ,ak be a set of numbers such that
(4.3) We wish to show that
(1.j
= 0 for j =
1, ... ,k. But now as
(T - AiI)Uj = TUj - AiUj = (Aj - Ai)Uj,
i, j = 1, ... ,k,
it is readily checked that
(1'
-
A I) (T ;\ 1) 2 .. . - k Uj
=
{O(AI if- A2)j =...2,(AI... -,kAk)UI
if
j=l.
Therefore, upon applying the product (T - A2I) .. · (T - AkI) to both sides of (4.3), it is easily seen that the left-hand side
(T - A2I)··· (T - AkI)(alul + ... + akuk) = (AI - A2)'" (AI - An)alul, whereas the right-hand side
(T - A21) ... (T - AkI)O = 0 . Thus, al = 0, since these two sides must match. Similar considerations serve to show that (1.2 = ... = ak = O. Consequently the vectors UI,'" ,Uk are linearly independent and k ::; n, as claimed. 0
4.4. Eigenvalues for matrices The operation of multiplying vectors in lF n by a matrix A E lF nxn is a linear transformation of lF n into itself. Consequently, the definitions that were introduced earlier for the eigenvalue and eigenvector of a linear transformation can be reformulated directly in terms of matrices:
4.4. Eigenvalues for matrices
67
• A point A E JF is said to be an eigenvalue of A E JFnxn if there exists a nonzero vector U E JFn such that Au = AU, Le., if ~A-).,Jn)
-=1=
{O} .
Every nonzero vector U E ~A-).,Jn) is said to be an eigenvector of A corresponding to the eigenvalue A.
• A nonzero vector U E JF n is said to be a generalized eigenvector of the matrix A E JFnxn corresponding to the eigenvalue A E JF if U E N(A->..In)n.
• A vector U E JF n is said to be a generalized eigenvector of order k of the matrix A E JFnxn corresponding to the eigenvalue A E JF if (A - A[n)ku = 0, but (A - A[n)k-l u -=1= O. In this instance, the set of vectors Uj = (A - A[n)(k-i)u for j = 1, ... , k is said to form a Jordan chain of length k; they satisfy the following chain of equalities: (A - A[n)Ul -
0
(A - A[n)U2
Ul
-
(A - A[n)Uk -
Uk-I.
This is equivalent to the formula
(4.4) k-l (A - A[n) [Ul
Uk]
= [Ul ...
Uk]
N,
where
N
=
L ejeJ+1 j=l
and ej denotes the j'th column of [k. Thus, for example, if k then (4.4) reduces to the identity
= 4,
Exercise 4.6. Show that the vectors U1, ... ,Uk in a Jordan chain oflength k are linearly independent. If A1, ... ,Ak are distinct eigenvalues of a matrix A E JF nxn , then:
• The number 'Yj = dimN(A->"jln ), j = 1, ... ,k,
is termed the geometric multiplicity of the eigenvalue Aj. It is equal to the number of linearly independent eigenvectors associated with the eigenvalue Aj.
68 • The number aj =
dimN(A-Ajln)n, j =
1, ... ,k,
is termed the algebraic multiplicity of the eigenvalue Aj. It is equal to the number of linearly independent generalized eigenvectors associated with the eigenvalue Aj. • The inclusions N(A-Ajln ) ~ N(A-Ajln )2 ~ .•• ~ N(A-Ajln)n
(4.6)
guarantee that
(4.7)
"Ij
S aj for j = 1, ... ,k,
and hence (as will follow in part from Theorem 4.12) that
(4.8)
"11
+ ... + "Ik
Sal
+ ... + ak =
n .
• The set
(4.9)
O'(A) = {A
E
C:
N(A->.Jn ) =1=
{O}}
is called the spectrum of A. Clearly, O'(A) is equal to the set {At, ... ,Ak} of all the distinct eigenvalues of the matrix A in C. Theorems 4.2 and 4.3 imply that (1) Every matrix A E c nxn has at least one eigenvalue).. E C. (2) Every matrix A E
c nxn
has at most n distinct eigenvalues in C.
(3) Eigenvectors corresponding to distinct eigenvalues are automatically linearly independent.
Even though (1) implies that 0'( A) =1= 0 for every A E C nxn, it does not guarantee that O'(A) n lR =1= 0 if A E lR nxn. Exercise 4.7. Verify the inclusions (4.6). Exercise 4.8. Show that the matrices
A=
[~
-
~]
and
A= [
~
-1 ] -1
have no real eigenvalues, i.e., σ(A) ∩ ℝ = ∅ in both cases.
Exercise 4.9. Show that although the following upper triangular matrices

\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}, \quad \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}, \quad \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}

have the same diagonal, dim N_(A−2I_3) is equal to three for the first, two for the second and one for the third. Calculate N_((A−2I_3)^j) for j = 1, 2, 3, 4 for each of the three choices of A.
Exercise 4.10. Show that if A ∈ 𝔽^{n×n} is a triangular matrix with entries a_{ij}, then σ(A) = ⋃_{i=1}^{n} {a_{ii}}.
The cited theorems actually imply a little more:
Theorem 4.4. Let A ∈ ℂ^{n×n} and let U be a nonzero subspace of ℂⁿ that is invariant under A, i.e., u ∈ U ⟹ Au ∈ U. Then:
(1) There exists a nonzero vector u ∈ U and a number λ ∈ ℂ such that Au = λu.
(2) If u_1, ..., u_k ∈ U are eigenvectors of A corresponding to distinct eigenvalues λ_1, ..., λ_k, then k ≤ dim U.
Exercise 4.11. Verify Theorem 4.4.
4.5. Direct sums
Let U and V be subspaces of a vector space Y over 𝔽 and recall that

U + V = {u + v : u ∈ U and v ∈ V}.

Clearly, U + V is a subspace of Y with respect to the rules of vector addition and scalar multiplication that are inherited from the vector space Y, since it is closed under vector addition and scalar multiplication. The sum U + V is said to be a direct sum if U ∩ V = {0}. Direct sums are denoted by the symbol ∔, i.e., U ∔ V rather than U + V. The vector space Y is said to admit a sum decomposition if there exists a pair of subspaces U and V of Y such that U + V = Y. In this instance, every vector y ∈ Y can be expressed as a sum of the form y = u + v for at least one pair of vectors u ∈ U and v ∈ V. The vector space Y is said to admit a direct sum decomposition if there exists a pair of subspaces U and V of Y such that U ∔ V = Y, i.e., if U + V = Y and U ∩ V = {0}. If this happens, then V is said to be a complementary space to U and U is said to be a complementary space to V.
Lemma 4.5. Let Y be a vector space over 𝔽 and let U and V be subspaces of Y such that U ∔ V = Y. Then every vector y ∈ Y can be expressed as a sum of the form y = u + v for exactly one pair of vectors u ∈ U and v ∈ V.
Exercise 4.12. Verify Lemma 4.5.
Exercise 4.13. Let T be a linear transformation from a vector space V over ℝ into itself and let U be a two dimensional subspace of V with basis {u_1, u_2}. Show that if Tu_1 = u_1 + 2u_2 and Tu_2 = 2u_1 + u_2, then U is the direct sum of two one dimensional spaces that are each invariant under T.
Lemma 4.6. Let U, V and W be subspaces of a vector space Y over 𝔽 such that U ∔ V = Y and U ⊆ W. Then

W = (W ∩ U) ∔ (W ∩ V).

Proof. Clearly, (W ∩ U) + (W ∩ V) ⊆ W + W = W. To establish the opposite inclusion, let w ∈ W. Then, since Y = U ∔ V, w = u + v for exactly one pair of vectors u ∈ U and v ∈ V. Moreover, under the added assumption that U ⊆ W, it follows that both u and v = w − u belong to W. Therefore, u ∈ W ∩ U and v ∈ W ∩ V, and hence

W ⊆ (W ∩ U) + (W ∩ V). □

Exercise 4.14. Provide an example of three subspaces U, V and W of a vector space Y over 𝔽 such that U ∔ V = Y, but W ≠ (W ∩ U) ∔ (W ∩ V). [HINT: Simple examples exist with Y = ℝ².]
If U_j, j = 1, ..., k, are finite dimensional subspaces of a vector space Y over 𝔽, then the sum
(4.10)  U_1 + ⋯ + U_k = {u_1 + ⋯ + u_k : u_i ∈ U_i for i = 1, ..., k}

is said to be direct if

(4.11)  dim (U_1 + ⋯ + U_k) = dim U_1 + ⋯ + dim U_k.

If U = U_1 + ⋯ + U_k and the sum is direct, then we write U = U_1 ∔ ⋯ ∔ U_k. If k = 2, then formula (2.16) implies that the sum U_1 + U_2 is direct if and only if U_1 ∩ U_2 = {0}. Therefore, the characterization (4.11) is consistent with the definition of the direct sum of two subspaces given earlier.
Exercise 4.15. Give an example of three subspaces U, V and W of ℝ³ such that U ∩ V = {0}, U ∩ W = {0} and V ∩ W = {0}, yet the sum U + V + W is not direct.
Exercise 4.16. Let Y be a finite dimensional vector space over 𝔽. Show that if Y = U ∔ V and V = X ∔ W, then Y = U ∔ X ∔ W.
Lemma 4.7. Let U_j, j = 1, ..., k, be finite dimensional nonzero subspaces of a vector space Y over 𝔽. Then the sum (4.10) is direct if and only if every set of nonzero vectors {u_1, ..., u_k} with u_i ∈ U_i for i = 1, ..., k is a linearly independent set of vectors.
Discussion. To ease the exposition, suppose that k = 3 and let {a_1, ..., a_ℓ} be a basis for U_1, {b_1, ..., b_m} be a basis for U_2 and {c_1, ..., c_n} be a basis for U_3. Clearly,

span{a_1, ..., a_ℓ, b_1, ..., b_m, c_1, ..., c_n} = U_1 + U_2 + U_3.

It is easily checked that if the sum is direct, then the ℓ + m + n vectors indicated above are linearly independent, and hence if u = Σ α_i a_i, v = Σ β_j b_j and w = Σ γ_k c_k are nonzero vectors in U_1, U_2 and U_3, respectively, then they are linearly independent.
Suppose next that every set of nonzero vectors u ∈ U_1, v ∈ U_2 and w ∈ U_3 is linearly independent. Then {a_1, ..., a_ℓ, b_1, ..., b_m, c_1, ..., c_n} must be a linearly independent set of vectors, because if

α_1 a_1 + ⋯ + α_ℓ a_ℓ + β_1 b_1 + ⋯ + β_m b_m + γ_1 c_1 + ⋯ + γ_n c_n = 0

and if, say, α_1 ≠ 0, β_1 ≠ 0 and γ_1 ≠ 0, then

α_1 (a_1 + ⋯ + α_1^{-1} α_ℓ a_ℓ) + β_1 (b_1 + ⋯ + β_1^{-1} β_m b_m) + γ_1 (c_1 + ⋯ + γ_1^{-1} γ_n c_n) = 0,

which implies that α_1 = β_1 = γ_1 = 0, contrary to assumption. The same argument shows that all the remaining coefficients must be zero too. □
Exercise 4.17. Let U = span{u_1, ..., u_k} over 𝔽 and let U_j = {α u_j : α ∈ 𝔽}. Show that the set of vectors {u_1, ..., u_k} is a basis for the vector space U over 𝔽 if and only if U_1 ∔ ⋯ ∔ U_k = U.
4.6. Diagonalizable matrices
A matrix A ∈ 𝔽^{n×n} is said to be diagonalizable if it is similar to a diagonal matrix, i.e., if there exists an invertible matrix U ∈ 𝔽^{n×n} and a diagonal matrix D ∈ 𝔽^{n×n} such that

(4.12)  A = U D U^{-1}.
Theorem 4.8. Let A ∈ ℂ^{n×n} and suppose that A has exactly k distinct eigenvalues λ_1, ..., λ_k ∈ ℂ. Then the sum

N_(A−λ_1 I_n) + ⋯ + N_(A−λ_k I_n)

is direct. Moreover, the following statements are equivalent:
(1) A is diagonalizable.
(2) dim N_(A−λ_1 I_n) + ⋯ + dim N_(A−λ_k I_n) = n.
(3) N_(A−λ_1 I_n) ∔ ⋯ ∔ N_(A−λ_k I_n) = ℂⁿ.
Proof. Suppose first that A is diagonalizable. Then the formula A = U D U^{-1} implies that

A − λI_n = U D U^{-1} − λ U I_n U^{-1} = U (D − λI_n) U^{-1}
and hence that

dim N_(A−λI_n) = dim N_(D−λI_n)

for every point λ ∈ ℂ. In particular, if λ = λ_j is an eigenvalue of A, then

γ_j = dim N_(A−λ_j I_n) = dim N_(D−λ_j I_n)

is equal to the number of times the number λ_j is repeated in the diagonal matrix D. Thus, γ_1 + ⋯ + γ_k = n, i.e., (1) ⟹ (2) and, by Lemma 4.7, (2) ⟺ (3).
It remains to prove that (2) ⟹ (1). Take γ_j linearly independent vectors from N_(A−λ_j I_n) and array them as the column vectors of an n × γ_j matrix U_j. Then

A U_j = U_j Λ_j for j = 1, ..., k,

where Λ_j is the γ_j × γ_j diagonal matrix with λ_j on the diagonal. Thus, upon setting U = [U_1 ⋯ U_k] and D = diag{Λ_1, ..., Λ_k}, it is readily seen that

A U = U D

and, with the help of Theorem 4.3, that the γ_1 + ⋯ + γ_k columns of U are linearly independent, i.e., rank U = γ_1 + ⋯ + γ_k. The formula AU = UD is valid even if γ_1 + ⋯ + γ_k < n. However, if (2) is in force, then U is invertible and A = U D U^{-1}. □
Corollary 4.9. Let A ∈ ℂ^{n×n} and suppose that A has n distinct eigenvalues in ℂ. Then A is diagonalizable.
Exercise 4.18. Verify the corollary.
Formula (4.12) is extremely useful. In particular, it implies that

A² = (U D U^{-1})(U D U^{-1}) = U D² U^{-1},  A³ = U D³ U^{-1},  etc.

The advantage is that the powers D², D³, ..., D^k are easy to compute: if D = diag{d_1, ..., d_n}, then D^k = diag{d_1^k, ..., d_n^k}.
Moreover, this suggests formulas for the matrix exponential (which will be introduced later), such as e^{tA} = U e^{tD} U^{-1}, all of which can be justified.
Exercise 4.19. Show that if a matrix A ∈ 𝔽^{n×n} is diagonalizable, i.e., if A = U D U^{-1} with D = diag{λ_1, ..., λ_n}, and if U = [u_1 ⋯ u_n] and y_1^T, ..., y_n^T denote the rows of U^{-1}, then
(1) A^k = U D^k U^{-1} = \sum_{j=1}^{n} λ_j^k u_j y_j^T;
(2) (A − λI_n)^{-1} = U (D − λI_n)^{-1} U^{-1} = \sum_{j=1}^{n} (λ_j − λ)^{-1} u_j y_j^T, if λ ∉ σ(A).
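As a quick illustration of why (4.12) is so useful for computing powers, here is a minimal NumPy sketch; the 2 × 2 matrix is an arbitrary diagonalizable example chosen for the illustration, not one taken from the text.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])          # an arbitrary diagonalizable matrix

d, U = np.linalg.eig(A)             # A = U @ np.diag(d) @ inv(U)

k = 5
Ak_direct = np.linalg.matrix_power(A, k)
Ak_via_D = U @ np.diag(d ** k) @ np.linalg.inv(U)   # only the diagonal matrix is powered

print(np.allclose(Ak_direct, Ak_via_D))   # True
```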
4.7. An algorithm for diagonalizing matrices
The verification of (2) ⟹ (1) in the proof of Theorem 4.8 contains a recipe for constructing a pair of matrices U and D so that A = U D U^{-1} for a matrix A ∈ ℂ^{n×n} with exactly k distinct eigenvalues λ_1, ..., λ_k when the geometric multiplicities meet the constraint γ_1 + ⋯ + γ_k = n:
(1) Calculate the geometric multiplicity γ_j = dim N_(A−λ_j I_n) for each eigenvalue λ_j of A.
(2) Obtain a basis for each of the spaces N_(A−λ_j I_n) for j = 1, ..., k and let U_j denote the n × γ_j matrix with columns equal to the vectors in this basis.
(3) Let U = [U_1 ⋯ U_k]. Then A U = [A U_1 ⋯ A U_k] = [λ_1 U_1 ⋯ λ_k U_k] = U D, where D = diag{D_1, ..., D_k} and D_j is a γ_j × γ_j diagonal matrix with λ_j on its diagonal.
If γ_1 + ⋯ + γ_k = n, then U will be invertible. The next example illustrates the algorithm.
Example 4.10. Let A ∈ ℂ^{6×6} and suppose that A has exactly 3 distinct eigenvalues λ_1, λ_2, λ_3 ∈ ℂ with geometric multiplicities γ_1 = 3, γ_2 = 1 and γ_3 = 2, respectively. Let {u_1, u_2, u_3} be any basis for N_(A−λ_1 I_6), {u_4} be any basis for N_(A−λ_2 I_6) and {u_5, u_6} be any basis for N_(A−λ_3 I_6). Then it is readily checked that

A[u_1 u_2 u_3 u_4 u_5 u_6] = [u_1 u_2 u_3 u_4 u_5 u_6] diag{λ_1, λ_1, λ_1, λ_2, λ_3, λ_3}.

But, upon setting

U = [u_1 u_2 u_3 u_4 u_5 u_6] and D = diag{λ_1, λ_1, λ_1, λ_2, λ_3, λ_3},

the preceding formula can be rewritten as A U = U D or, equivalently, as A = U D U^{-1}, since U is invertible, because it is a 6 × 6 matrix with six linearly independent column vectors, thanks to Theorem 4.3. Notice that in the notation used in Theorem 4.8 and its proof, k = 3, U_1 = [u_1 u_2 u_3], U_2 = u_4, U_3 = [u_5 u_6], Λ_1 = diag{λ_1, λ_1, λ_1}, Λ_2 = λ_2 and Λ_3 = diag{λ_3, λ_3}.
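The three-step recipe above can be carried out mechanically. The sketch below is only an illustration, assuming SymPy is available; the 3 × 3 symmetric matrix is a hypothetical example (eigenvalues 4 and 1 with geometric multiplicities 1 and 2), not the matrix of Example 4.10.

```python
import sympy as sp

# Hypothetical example (not from the text): eigenvalues 4 and 1, with
# geometric multiplicities 1 and 2, so gamma_1 + gamma_2 = 3 = n.
A = sp.Matrix([[2, 1, 1],
               [1, 2, 1],
               [1, 1, 2]])
n = A.shape[0]

cols, diag_entries = [], []
for lam, alg_mult, basis in A.eigenvects():   # basis spans N_(A - lam*I): step (2)
    cols.extend(basis)
    diag_entries.extend([lam] * len(basis))   # gamma_j copies of lam on the diagonal

U = sp.Matrix.hstack(*cols)                   # step (3): U = [U_1 ... U_k]
D = sp.diag(*diag_entries)

assert U.rank() == n                          # gamma_1 + ... + gamma_k = n, so U is invertible
print(sp.simplify(A - U * D * U.inv()))       # the zero matrix, i.e. A = U D U^{-1}
```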
4.8. Computing eigenvalues at this point
The eigenvalues of a matrix A ∈ ℂ^{n×n} are precisely those points λ ∈ ℂ at which N_(A−λI_n) ≠ {0}. In Chapter 5, we shall identify these points with the values of λ ∈ ℂ at which the determinant det(A − λI_n) is equal to zero. However, as we have not introduced determinants yet, we shall discuss another method that uses Gaussian elimination to find those points λ ∈ ℂ for which the equation Ax − λx = 0 has nonzero solutions x ∈ ℂⁿ. In particular, it is necessary to find those points λ for which the upper echelon matrix U corresponding to A − λI_n has fewer than n pivots.
Example 4.11. Let
Then 1
2-A
1
3
I-A
1.
Thus, permuting the first and third row of A for convenience, we obtain
[12 2-A 3 I-A] A= 1 . [0o1 001 01] 0 3-A 1 1 Next, adding -2 times the first row to the second and A - 3 times the first row to the third, yields
1 0 0] [ 12 2 -3 A [ -2 1 0 A-3 0 1 3-A 1
I-A] 1 = [10
3 1- A ] -4 - A 2A -1 , 0 3A-8 x
1
where
x = 1 + (A - 3)(1 - A) . Since the last matrix on the right is invertible when A = -4, the vector space NAH 13 = {O}. Thus, we can assume that A+4 i= 0 and add (3A-8)j(A+4) times the second row to the third row to get 1 [
0 1
o o (fA -
8) A+4)
I-A]
0][1 o 0 -43- A 2A - 1 1 0 3A - 8 x
I-A]
= [10 -43- A 2A - 1 0
0
y
where
(3A - 8)(2A -1) + A + 4 + (A + 4)(A - 3)(1- A) A+4 -A(A - 5)(A - 1) = A+4 Therefore, N(A-Mn) i= {O} if and only if A = 0, or A = 5, or A = 1. y
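The method of this section can be mimicked symbolically. The sketch below is an illustration under stated assumptions: it uses SymPy, a hypothetical 3 × 3 matrix (not the matrix of Example 4.11), and performs Gaussian elimination on A − λI_3 over the field of rational functions in λ; the values of λ at which some pivot can vanish are then checked one by one, just as the case λ = −4 was checked above.

```python
import sympy as sp

lam = sp.symbols('lambda')

# Hypothetical matrix chosen for illustration (an assumption, not from the text).
A = sp.Matrix([[2, 1, 0],
               [1, 2, 0],
               [0, 0, 3]])
n = A.shape[0]
M = (A - lam * sp.eye(n)).as_mutable()

# Gaussian elimination over the rational functions in lambda.  Each pivot is a
# rational function; the finitely many lambda at which a pivot vanishes are the
# only candidates for eigenvalues, and they are verified separately below.
for i in range(n):
    if M[i, i] == 0:                                   # simple row interchange if needed
        for r in range(i + 1, n):
            if M[r, i] != 0:
                M.row_swap(i, r)
                break
    for r in range(i + 1, n):
        factor = sp.cancel(M[r, i] / M[i, i])
        M.row_op(r, lambda v, j: sp.cancel(v - factor * M[i, j]))

candidates = set()
for i in range(n):
    numerator, _ = sp.fraction(sp.together(M[i, i]))
    candidates |= set(sp.roots(sp.Poly(numerator, lam)))

# lambda_0 is an eigenvalue exactly when N_(A - lambda_0*I) is nontrivial.
eigenvalues = [c for c in candidates if (A - c * sp.eye(n)).nullspace()]
print(sorted(eigenvalues))                             # [1, 3] for this A
```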
Exercise 4.20. Find an invertible matrix U E C 3X3 and a diagonal matrix DE C 3x3 so that A = UDU- 1 when A is chosen equal to the matrix in the preceding example. [HINT: Follow the steps in the algorithm presented in the previous section.] Exercise 4.21. Find an invertible matrix U such that U-lAU is equal to a diagonal matrix D for each of the following two choices of A:
[Hn [H~] D~ n
Exercise 4.22. Repeat Exercise 4.21 for
A=
(REMARK: This is a little harder than the previous exercise, but not much.]
4.9. Not all matrices are diagonalizable
Not all matrices are diagonalizable, even if complex eigenvalues are allowed. The problem is that a matrix may not have enough linearly independent eigenvectors; i.e., the criterion γ_1 + ⋯ + γ_k = n established in Theorem 4.8 may not be satisfied. Thus, for example, if

A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}, then A − 2I_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},

dim N_(A−2I_2) = 1 and dim N_(A−λI_2) = 0 if λ ≠ 2. Similarly, if

A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}, then A − 2I_3 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},

dim N_(A−2I_3) = 1 and dim N_(A−λI_3) = 0 if λ ≠ 2. More elaborate examples may be constructed by taking larger matrices of the same form or by putting such blocks together as in Exercise 4.23.
Exercise 4.23. Calculate dim N_((B_{λ_1} − λ_1 I_13)^j) for j = 1, 2, ... when B_{λ_1} ∈ ℂ^{13×13} is a block diagonal matrix whose diagonal blocks are Jordan cells of assorted sizes, each with λ_1 on its main diagonal, ones on the diagonal just above it and zeros elsewhere, and build an array of symbols x with dim N_((B_{λ_1} − λ_1 I_13)^i) − dim N_((B_{λ_1} − λ_1 I_13)^{i−1}) symbols x in the i'th row for i = 1, 2, .... Check that the number of fundamental Jordan cells in B_{λ_1} of size i × i is equal to the number of columns of height i in the array corresponding to λ_1. The notation
B_{λ_1} = diag{C_{λ_1}^{(ν_1)}, ..., C_{λ_1}^{(ν_r)}}

is a convenient way to describe the matrix B_{λ_1} of Exercise 4.23 in terms of its fundamental Jordan cells C_{λ_1}^{(ν)}, where

(4.14)  C_\alpha^{(\nu)} = \alpha I_\nu + C_0^{(\nu)} = \begin{bmatrix} \alpha & 1 & 0 & \cdots & 0 \\ 0 & \alpha & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ & & & \alpha & 1 \\ 0 & \cdots & & 0 & \alpha \end{bmatrix}

denotes the ν × ν matrix with α on the main diagonal, one on the diagonal line just above the main diagonal and zeros elsewhere. This helps to avoid such huge displays. Moreover, such block diagonal representations are convenient for calculation, because

(4.15)  B = diag{B_1, ..., B_k} ⟹ dim N_B = dim N_{B_1} + ⋯ + dim N_{B_k}.

Nevertheless, the news is not all bad. There is a more general factorization formula than (4.12) in which the matrix D is replaced by a block diagonal matrix J = diag{B_{λ_1}, ..., B_{λ_k}}, where B_{λ_j} is an α_j × α_j upper triangular matrix that is also a block diagonal matrix with γ_j Jordan cells (of assorted sizes) as blocks. It is based on the following fact:
Theorem 4.12. Let A ∈ ℂ^{n×n} and suppose that A has exactly k distinct eigenvalues λ_1, ..., λ_k ∈ ℂ. Then

ℂⁿ = N_((A−λ_1 I_n)^n) ∔ ⋯ ∔ N_((A−λ_k I_n)^n).

The proof of this theorem will be carried out in the next few sections. At this point let us focus instead on its implications.
Example 4.13. Let A ∈ ℂ^{9×9} and suppose that A has exactly three distinct eigenvalues λ_1, λ_2 and λ_3 with algebraic multiplicities α_1 = 4, α_2 = 2 and α_3 = 3, respectively. Let {v_1, v_2, v_3, v_4} be any basis for N_((A−λ_1 I_9)^9), let {w_1, w_2} be any basis for N_((A−λ_2 I_9)^9) and let {x_1, x_2, x_3} be any basis for N_((A−λ_3 I_9)^9). Then, since each of the spaces N_((A−λ_j I_9)^9), j = 1, 2, 3, is invariant under multiplication by the matrix A,

A[v_1 v_2 v_3 v_4] = [v_1 v_2 v_3 v_4] G_1,  A[w_1 w_2] = [w_1 w_2] G_2,  A[x_1 x_2 x_3] = [x_1 x_2 x_3] G_3

for some choice of G_1 ∈ ℂ^{4×4}, G_2 ∈ ℂ^{2×2} and G_3 ∈ ℂ^{3×3}. In other notation, upon setting V = [v_1 v_2 v_3 v_4], W = [w_1 w_2] and X = [x_1 x_2 x_3],
one can write the preceding three sets of equations together as

A[V W X] = [V W X] \begin{bmatrix} G_1 & 0 & 0 \\ 0 & G_2 & 0 \\ 0 & 0 & G_3 \end{bmatrix}

or, equivalently, upon setting U = [V W X], as

(4.16)  A = U \begin{bmatrix} G_1 & 0 & 0 \\ 0 & G_2 & 0 \\ 0 & 0 & G_3 \end{bmatrix} U^{-1},

since the matrix U = [V W X] is invertible, thanks to Theorem 4.12.
Formula (4.16) is the best that can be achieved with the given information. To say more, one needs to know more about the subspaces N_((A−λ_i I_9)^j) for j = 1, ..., α_i and i = 1, 2, 3. Thus, for example, if dim N_(A−λ_i I_9) = 1 for i = 1, 2, 3, then the vectors in N_((A−λ_i I_9)^{α_i}) may be chosen so that

diag{G_1, G_2, G_3} = diag{C_{λ_1}^{(4)}, C_{λ_2}^{(2)}, C_{λ_3}^{(3)}}.

On the other hand, if dim N_(A−λ_i I_9) = 2 for i = 1, 2, 3 and dim N_((A−λ_1 I_9)^2) = 4, then the vectors in N_((A−λ_i I_9)^{α_i}) may be chosen so that

diag{G_1, G_2, G_3} = diag{C_{λ_1}^{(2)}, C_{λ_1}^{(2)}, C_{λ_2}^{(1)}, C_{λ_2}^{(1)}, C_{λ_3}^{(2)}, C_{λ_3}^{(1)}}.

There are still more possibilities. The main facts are summarized in the statement of Theorem 4.14 in the next section.
4.10. The Jordan decomposition theorem
Theorem 4.14. Let A ∈ ℂ^{n×n} and suppose that A has exactly k distinct eigenvalues λ_1, ..., λ_k in ℂ with geometric multiplicities γ_1, ..., γ_k and algebraic multiplicities α_1, ..., α_k, respectively. Then there exists an invertible matrix U ∈ ℂ^{n×n} such that

A U = U J,

where:
(1) J = diag{B_{λ_1}, ..., B_{λ_k}}.
(2) B_{λ_j} is an α_j × α_j block diagonal matrix that is built out of γ_j Jordan cells C_{λ_j}^{(i)} of the form (4.14).
(3) The number of Jordan cells C_{λ_j}^{(i)} in B_{λ_j} with i ≥ ℓ is equal to

(4.17)  dim N_((A−λ_j I_n)^ℓ) − dim N_((A−λ_j I_n)^{ℓ−1}), ℓ = 2, ..., α_j,
or, in friendlier terms, the number of Jordan cells C_{λ_j}^{(i)} in B_{λ_j} of size i × i is equal to the number of columns of height i in the array of symbols

x x ⋯ x   (γ_j symbols in the first row)
x ⋯ x     (dim N_((A−λ_j I_n)^2) − dim N_(A−λ_j I_n) symbols in row 2)
x ⋯       (dim N_((A−λ_j I_n)^3) − dim N_((A−λ_j I_n)^2) symbols in row 3)
⋮

(4) The columns of U are generalized eigenvectors of the matrix A.
(5) (A − λ_1 I_n)^{α_1} ⋯ (A − λ_k I_n)^{α_k} = 0.
(6) If ν_j = min{i : dim N_((A−λ_j I_n)^i) = dim N_((A−λ_j I_n)^n)}, then ν_j ≤ α_j for j = 1, ..., k and (A − λ_1 I_n)^{ν_1} ⋯ (A − λ_k I_n)^{ν_k} = 0.
The verification of this theorem rests on Theorem 4.12. It amounts to showing that the basis of each of the spaces N_((A−λ_j I_n)^n) can be organized in a suitable way. It turns out that the array constructed in (3) is a Young diagram, since the number of symbols in row i + 1 is less than or equal to the number of symbols in row i; see Corollary 6.5. Item (5) is the Cayley-Hamilton theorem. In view of (6), the polynomial p(λ) = (λ − λ_1)^{ν_1} ⋯ (λ − λ_k)^{ν_k} is referred to as the minimal polynomial for A. Moreover, the number ν_j is the "size" of the largest Jordan cell in B_{λ_j}. A detailed proof of the Jordan decomposition theorem is deferred to Chapter 6, though an illustrative example, which previews some of the key ideas, is furnished in the next section.
Exercise 4.24. Show that if
>. i= 0 >. = a Exercise 4.25. Calculate dim.Nc.A->.Ip)t for every >. E C and t = 1,2, ... for the 26 x 26 matrix A = UJU- 1 when J = diag{B>'1,B>'2,B>'3}' the points A=
ucin)u- 1 ,
then dim.Nc.A->.ln)
={
~
if if
>'1,>'2,>'3 are distinct, B>'l is as in Exercise 4.23, B>'2 = diag{Cl!), ci!)} and B >'3 = diag {ci!), ci!), cl~)} . Build an array of symbols x for each eigenvalue >'j with dim.Nc.A->'jlp)i - dimN(A->'jlp)i-l symbols x in the i'th row for i = 1,2, ... and check that the number of fundamental Jordan cells in B>.j of size i x i is equal to the number of columns in the array of height i.
4.11. An instructive example To develop some feeling for Theorem 4.14, we shall first investigate the implications of the factorization A = U JU- 1 on the matrix A when
U = [u_1 ⋯ u_5]

is any 5 × 5 invertible matrix with columns u_1, ..., u_5 and

J = diag{C_{λ_1}^{(3)}, C_{λ_2}^{(2)}} = \begin{bmatrix} \lambda_1 & 1 & 0 & 0 & 0 \\ 0 & \lambda_1 & 1 & 0 & 0 \\ 0 & 0 & \lambda_1 & 0 & 0 \\ 0 & 0 & 0 & \lambda_2 & 1 \\ 0 & 0 & 0 & 0 & \lambda_2 \end{bmatrix}.
Then the matrix equation AU = UJ can be replaced by five vector equations, one for each column of U:

Au_1 = λ_1 u_1, Au_2 = u_1 + λ_1 u_2, Au_3 = u_2 + λ_1 u_3, Au_4 = λ_2 u_4, Au_5 = u_4 + λ_2 u_5.

The first three formulas imply in turn that

u_1 ∈ N_(A−λ_1 I_5), i.e., u_1 is an eigenvector corresponding to λ_1;
u_2 ∉ N_(A−λ_1 I_5) but u_2 ∈ N_((A−λ_1 I_5)^2);
u_3 ∉ N_((A−λ_1 I_5)^2) but u_3 ∈ N_((A−λ_1 I_5)^3);

i.e., u_1, u_2, u_3 is a Jordan chain of length 3. Similarly, u_4, u_5 is a Jordan chain of length 2. This calculation exhibits u_1 and u_4 as eigenvectors. In fact,
(1) if Al
:f A2, then
rumA{A-M,)
~
t
= Al if A = A2 if otherwise
A
otherwise
and
~G
if
dimA{A-M,).
(2) if Al =
A2,
A = Al
if A = A2 otherwise
for every integer
k ~ 3;
then
2 if A = Al dimM(A->.Is) = { 0 otherwise'
4 if A = A] dimN(A->.Is)2 = { 0 otherwis
and dim.Af(A_AIs)k
5 if A = Al = { 0 otherwise
for every integer k
~
3.
The key to these calculations is in the fact that (4.18) for k = 1,2, ... , and the special structure of J. Formula (4.18) follows from the identity (4.19) Because of the block diagonal structure of J, rank J
= rank C(3) + rank C(2) A1 A2
and
Moreover, it is easy to compute the indicated ranks, because a is invertible if {3 i= 0 and cell
Cr)
rank ( Co(V))k
= 1/ - k for k = 1, ...
To illustrate even more graphically, observe that if {3
J - AIls
=
1 0: 0 0 1 0
0 0 0 0 0 . . 0. . . . . 0: . . . .. . . . . . . . . . . 0:{3 1 0 0 0 0 0: 0 {3
0 0
(J - AII5)2 =
1/
x
1/
Jordan
,1/.
= Al - A2 i= 0, then 0 0 0
1 0 0
0
0
0 0 0
0
0 0 0: {3 0 0
0 0 0
················"···2··········
o 0
2{3
f32
and 0 0 0: 0
(J - AII5)3 =
0 0 0: 0 0 0 0: 0 · .......... ···'····3 0 0:{3 0 0 0: 0
o
0 0 0 ... '2' 3{3
(33
Clearly one can construct more elaborate examples of nondiagonalizable matrices A = U JU- l by adding more diagonal block "cells" ct) to J. 1
4.12. The binomial formula
The familiar binomial identity

(a + b)^m = \sum_{k=0}^{m} \binom{m}{k} a^k b^{m-k}

for numbers a and b remains valid for square matrices A and B of the same size if they commute:

(4.20)  (A + B)^m = \sum_{k=0}^{m} \binom{m}{k} A^k B^{m-k} if AB = BA.

If this is unfamiliar, try writing out (A + B)² and (A + B)³. In particular,

(4.21)  (λI + B)^m = \sum_{k=0}^{m} \binom{m}{k} λ^k B^{m-k}.
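A quick numerical sanity check of (4.20), assuming NumPy; the commuting pair A, B below (both are polynomials in the same matrix J, so AB = BA) is an arbitrary choice for illustration.

```python
import numpy as np
from math import comb

J = np.array([[2.0, 1.0],
              [0.0, 2.0]])
A = 3.0 * np.eye(2) + J      # both A and B are polynomials in J,
B = J @ J                    # so they commute
assert np.allclose(A @ B, B @ A)

m = 4
lhs = np.linalg.matrix_power(A + B, m)
rhs = sum(comb(m, k) * np.linalg.matrix_power(A, k) @ np.linalg.matrix_power(B, m - k)
          for k in range(m + 1))

print(np.allclose(lhs, rhs))   # True; the formula can fail when AB != BA
```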
Exercise 4.26. Find a pair of matrices A and B for which the formula (4.20) fails.
4.13. More direct sum decompositions
Lemma 4.15. Let A ∈ 𝔽^{n×n}, λ ∈ 𝔽 and suppose that (A − λI_n)^j u = 0 for some j ≥ 1 and u ∈ 𝔽ⁿ. Then (A − λI_n)^n u = 0.
Proof. Let B = A − λI_n. Then, since the assertion is self-evident if u = 0, it suffices to focus attention on the case when u ≠ 0 and k is equal to the smallest positive integer j such that B^j u = 0 and to consider the set of nonzero vectors

u, Bu, ..., B^{k-1} u.

This set of vectors is linearly independent because if

c_0 u + c_1 Bu + ⋯ + c_{k-1} B^{k-1} u = 0,

then the self-evident identity

B^{k-1}(c_0 u + ⋯ + c_{k-1} B^{k-1} u) = 0

implies that c_0 = 0. Similarly,

B^{k-2}(c_1 Bu + ⋯ + c_{k-1} B^{k-1} u) = 0

implies that c_1 = 0. After k − 1 such steps we are left with

c_{k-1} B^{k-1} u = 0,

which implies that c_{k-1} = 0. This completes the proof of the asserted linear independence. But if k vectors in 𝔽ⁿ are linearly independent, then k ≤ n, and hence B^n u = 0, as claimed. □
Lemma 4.16. Let A ∈ 𝔽^{n×n} and λ ∈ 𝔽. Then

(4.22)  𝔽ⁿ = N_((A−λI_n)^n) ∔ R_((A−λI_n)^n).

Proof. Let B = A − λI_n and suppose first that

u ∈ N_{B^n} ∩ R_{B^n}.

Then B^n u = 0 and u = B^n v for some vector v ∈ 𝔽ⁿ. Therefore,

0 = B^n u = B^{2n} v.

But, by the last lemma, this in fact implies that

u = B^n v = 0.

Thus, the sum is direct. It is all of 𝔽ⁿ by the principle of conservation of dimension:

dim N_((A−λI_n)^n) + dim R_((A−λI_n)^n) = n. □

Lemma 4.17. Let A ∈ 𝔽^{n×n}, let λ_1, λ_2 ∈ 𝔽 and suppose that λ_1 ≠ λ_2. Then

N_((A−λ_2 I_n)^n) ⊆ R_((A−λ_1 I_n)^n).
Proof.
o= =
Let u E N(A->'2 In)n. Then, by formula (4.20),
(A - AIIn + (AI - A2)In)n u
t
. 0
J=
(~) (A -
AIIn)j (AI - A2)n- j u
J
= (AI - A2)nU +
t
(~) (A -
j=l J
= (AI - A2tu + (A - AIIn)
AIIn)j (AI - A2)n- j u
t (~)
(A - AIIn)j-l(AI - A2)n- j U •
j=l J
Therefore, for some polynomial
p(A) = COIn + CIA + ... + Cn_1A n- 1 in the matrix A.
Iterating the last identity for u, we obtain u = (A - AIIn)p(A)(A - AIIn)p(A)u
= (A - AIIn)2P(A)2u,
since
(A - AIln)p(A) = p(A)(A - A1In). Iterating n - 2 more times we see that U
= (A - A1In)np(Atu,
which is to say that
o
as claimed.
Remark 4.18. The last lemma may be exploited to give a quick proof of the fact that generalized eigenvectors corresponding to distinct eigenvalues are automatically linearly independent. To verify this, let
(A - Ajln)nuj
= 0, j = 1, ... , k,
for some distinct set of eigenvalues AI, ... , Ak and suppose that Cl Ul
+ ... + ck Uk
= 0.
Then and, since -C1U1 E.N(A->'lln)n
and, by Lemma 4.17,
C2U2+"'+CkUk E R(A-A1ln)n,
both sides of the last equality must equal zero, thanks to Lemma 4.16. Therefore Cl = 0 and C2U2 + ... + CkUk = O. To complete the verification, just keep on going.
Exercise 4.27. Complete the proof ofthe assertion in the preceding remark when k = 3.
4.14. Verification of Theorem 4.12 Lemma 4.16 guarantees that (4.23)
for every point A E IF. The next step is to obtain an analogous direct sum decomposition for R(A->-.Jn)n.
Lemma 4.19. Let A E lF nxn , let A1,A2 E IF and suppose that Al ::/= A2. Then
(4.24)
R(A-Al1n)n
= N(A-A2 In)n+{R(A->'lIn )n n R(A-A2 In)n}.
Proof.
The sum in (4.24) is direct, thanks to Lemma 4.16. Moreover, if x E
'R(A-Atln)n ,
then, we can write x=u+v
where u E N(A-A2In)n, v E 'R(A-A2In)n and, since the sum in (4.22) is direct, the vectors u and v are linearly independent. Lemma 4.17 guarantees that u E 'R(A-AIln)n.
Therefore, the same holds true for v. Thus, in view of Lemma 4.6, 'R(A-A1ln)n
= 'R(A-AIln)n n N(A-A2 In)n + 'R(A-AIln)n n 'R(A-A2 In)n ,
which coincides with formula (4.24).
0
There is a subtle point in the last proof that should not be overlookedj see Exercise 4.14. Lemma 4.20. Let A E C nxn and suppose that A has exactly k distinct eigenvalues, AI, ... ,Ak E C. Then (4.25)
Proof. Let M denote the intersection of the k sets on the left-hand side of the asserted identity (4.25). Then it is readily checked that M is invariant under Aj i.e, if U E M, then Au E M, because each of the sets 'R(A-Ajln)n is invariant under A: if U E 'R(A-Ajln)n, then U = (A - AjIn)nvj and hence Au = (A - AjIn)n AVj, for j = 1, ... ,k. Consequently, if M =1= {O}, then, by Theorem 4.4, there exists a complex number A and a nonzero vector v E M such that Av - AV = O. But this means that A is equal to one of the eigenvalues, say At. Hence v E N{A->..t1n)' But this in turn implies that v E N(A->..t1n)n n'R(A->..t1n)n = {O} .
o We are now ready to prqve Theorem 4.12, which gives the theoretical justification for the decomposition of J into blocks B Aj , one for each distinct eigenvalue. It states that every vector vEe n can be expressed as a linear combination of generalized eigenvectors of A. Moreover, generalized eigenvectors corresponding to distinct eigenvalues are linearly independent. The theorem is established by iterating Lemma 4.19. Proof of Theorem 4.12. Let us suppose that k 2: 3. Then, by Lemmas 4.16 and 4.19,
and
Therefore,
en = N(A-AlIn)n+N(A-A2In)n+'RCA-AlIn)n n 'RCA- A2In)n . Moreover, since NCA-A3In)n
~ 'RCA-AlIn)"
n 'RCA-A2In)n,
by Lemma 4.17, the supplementary formula 'RCA-AlIn)" n'RCA-A2In)n
= N(A-A3In)n +'RCA-AlI.. )n n'RCA-A2In)n n'RCA-A3In)n
may be verified just as in the proof of Lemma 4.19 and then substituted into the last formula for en. To complete the proof, just keep on going until you run out of eigenvalues and then invoke Lemma 4.20. D The point of Theorem 4.14 is that for every matrix A E e nxn it is possible to find a set of n linearly independent generalized eigenvectors UI, ... ,Un such that
A[UI ... un] =
[UI ...
un]J .
The vectors have to be chosen properly. Details will be furnished in Chapter 6.
Exercise 4.28. If B E lF nxn , then
'RBn
n NBn = {o}. Show by example
that the vector space may contain nonzero vectors.
Exercise 4.29. Show that if A
E
e nxn has exactly two distinct eigenvalues
in C, then 'RCA-AlIn)" n'RCA-A2In)n =
{o} .
Exercise 4.30. Show that if A E e nxn has exactly k distinct eigenvalues AI, ... ,Ak in C with algebraic multiplicities O!l,'" ,O!k, then Ar
JVCA-AIIn)"'l
+ ... +JVCA-Ak1n)<>k "Af
=
en .
Is it possible to reduce the powers further? Explain your answer.
Exercise 4.31. Verify formula (4.15). [HINT: In case of difficulty, start modestly by showing that if B
= diag{B},B2,Bg}, then '.
dimNB =
dimNBl
+ dimNB2 + dimNB3']
Exercise 4.32. Let A be an n x n matrix. (a): Show that if UI," . ,Uk are eigenvectors corresponding to distinct eigenvalues All ... ,Ak, then the vectors Ul, ••• , Uk are linearly independent. (Try to give a simple direct proof that exploits the fact that (A - Alln)'" (A - Aj1n)Ui = (Ai - AI)'" (Ai - Aj)ud (b): Use the conclusions of part (a) to show that if A has n distinct eigenvalues, then A is diagonalizable.
Exercise 4.33. Let U E en, Y E en and BE e nxn be such that B 4 u = 0, B4y = and the pair of vectors B3u and B3 y are linearly independent in en. Show that the eight vectors u, Bu, B 2u, B 3 u, Y, By, B2y and B3 y are linearly independent in en.
°
Exercise 4.34. Let u E en, y E en and B E e nxn be such that B 4 u = 0, B3 y = and the pair of vectors B3u and B2y are linearly independent in en. Show that the seven vectors u, Bu, B 2u, B 3u, y, By and B2y are linearly independent in en.
°
Exercise 4.35. Let B E e nxn . Show that NB ~ NB2 ~ N B3 ~ '" and that if NBi = NBHl for j = k, then the equality prevails for every integer j> k also. Exercise 4.36. Show that if BE e nxn , then dim NB2 ~ 2 dim N B. [REMARK: The correct way to interpret this is: dim NB2 -dim NB ~ dim N B.] Exercise 4.37. Calculate
[~
!]
100.
[HINT: To see the pattern, write
the given matrix as aI2 + F and note that since F2 = 0, (aI2 F)3, ... , have a simple form.]
+ F)2,
(aI2 +
4.15. Bibliographical notes Earlier versions of this chapter defined the eigenvalues of a matrix A E in terms of the roots of the polynomial det ()".In - A). The present version, which was influenced by a conversation with Sheldon Axler at the Holomorphic Functions Session at MSRI, Berkeley, in 1995 and his paper [5] in the American Math. Monthly, avoids the use of determinants. They appear for the first time in the next chapter, and although they are extremely useful for calculating eigenvalues, they are not needed to establish the Jordan decomposition theorem. To counter balance the title of [5] (which presumably was chosen for dramatic effect) it is perhaps appropriate to add the following words of the distinguished mathematical physicist L. D. Faddeev [30]: If I had to choose a single term to characterize the technical tools used in my research, it would be determinants.
e nxn
Chapter 5
Determinants
Look at him, he doesn't drink, he doesn't smoke, he doesn't chew, he doesn't stay out late, and he still can't hit. Casey Stengel In this chapter we shall develop the theory of determinants. There are several ways to do this, many of which depend upon introducing unnatural looking formulas and/or recipes with little or no motivation and then showing that "they work". The approach adopted here is axiomatic. In particular we shall show that the determinant can be characterized as the one and only one multilinear functional d(A) from nxn to C that meets the two additional constraints d(In) = 1 and d(PA) = -d(A) for every simple n x n permutation P. Later on, in Chapter 9, we shall also give a geometric interpretation of the determinant of a matrix A E lR. nxn in terms of the volume of the parallelopiped generated by its column vectors.
c
5.1. Functionals A function f from a vector space V over IF into IF is called a functional. A functional f on a vector space V over IF is said to be a linear functional if it is a linear mapping from V into IF, i.e., if
f(au for every choice of u, v E n-dimensional vector space elements of a basis for the space V over IF, and if v =
+ f3v) =
af(u)
+ f3f(v)
V and a, f3 E IF.
A linear functional on an is completely determined by its action on the space: if {Vb... ,vn} is a basis for a vector alvl + ... + anv n , for some set of coefficients
-
89
5. Determinants
90 {aI, ... ,an} E IF, then
(5.1)
f(v) = f
(t,";Vi) t,"i!(Vi) =
is prescribed by the n numbers !(VI), ... ,f(vn ). A functional !(VI, ... ,Vk) on an ordered set of vectors {VI, ... ,Vk} belonging to a vector space V is said to be a multilinear functional if it is linear in each entry separately; i.e., !(VI, ... ,Vi + W, ... ,Vk) = !(VI, ... ,Vi, ... ,Vk)
+ !(VI, ...
,W, ... ,Vk)
for every integer i, 1 $ i $ k, and !(VI, ...
,ctVi,' ..
,Vk) = a!(vI, ... ,Vi,··. ,Vk)
for every a E IF. Notice that if, say, k !(VI + WI, V2 + W2, va) =
= 3, then this implies that
=
!(VI, V2 + W2, va) + f(WI, V2 f(vI, V2, va) + f(vI, W2, V3)
+
f(WI, V2, V3)
+ W2, va)
+ f(WI, W2, V3)
and
5.2. Determinants Let En denote the set of all the n! one to one mappings u of the set of integers
{l, ... , n} onto itself and let ei denote the i'th column of the identity matrix In. Then the formula Pu =
?= eie~(i) n
~=I
[
=
e~(I) 1 : eT
u(n) that was introduced earlier defines a one to one correspondence between the set of all n x n permutation matrices Pu and the set En. A permutation Pu E lR nxn with n ;?: 2 is said to be simple if u interchanges exactly two of the integers in the set {I, ... ,n} and leaves the rest alone; i.e., an n x n permutation matrix P is simple if and only if it can be expressed as P
= L ejeJ + eile~ + ei2e~ , jEA
where A = {I, ... ,n} \ {iI, i2} and il and i2 are distinct integers between 1 andn. Exercise 5.1. Show that if P is a simple permutation, then P = pT.
Theorem 5.1. There is exactly one way of assigning a complex number d(A) to each complex n x n matrix A that meets the following three requirements: 1° d(ln) = 1. 2° d(PA) = -d(A) for every simple permutation matrix P. 3° d(A) is a multilinear functional of the rows of A.
Discussion. The first two of these requirements are easily understood. The third is perhaps best visualized by example. Thus, if
A=
[:~~ :~: :~:l' a31 a32 a33
then, since [an
al2
a13] = [an
0 0] + [0
0] + [0 0
al2
a13] ,
rule 3° applied to the top row of A implies that
= and
([a~l a~2 a~3l) + ([a~l a~2 a~3l) al2 d
a31
+ al3 d
a32
a33
a31
a32
a33
([a~l a~2 a~3l)· a31
a32 a33
This last formula can be rewritten more efficiently by invoking the notation and e;, for the i'th row of the matrix A and the i'th row of the identity matrix 13 , respectively, as
a:
e:=
d(A)
= i;alid ([ ~-l
~]) a3
Moreover, since and
a; =
3
L a3k e; , k=l
another two applications of rule 3° lead to the formula
which is an explicit formula for d(A) in terms of the entries ast in the matrix
A and the numbers d ([
~ ] ) , which in fact are equal to 0 if one or more
of the rows coincide, thanks to the next lemma. Granting this fact for the moment, the last expression simplifies to
d(A)
=
L
alO' (1)a2O'(2)a3O'(3)d ( [
=i:::
e 0'(3)
O'EI:a
l) ,
where, as noted earlier, En denotes the set of all the n! one to one mappings of the set {1, ... ,n} onto itself. It is pretty clear that analogous formulas hold for A E C nxn for every positive integer n:
(5.2)  d(A) = \sum_{\sigma \in \Sigma_n} a_{1\sigma(1)} \cdots a_{n\sigma(n)} \, d\!\left(\begin{bmatrix} e_{\sigma(1)}^T \\ \vdots \\ e_{\sigma(n)}^T \end{bmatrix}\right) = \sum_{\sigma \in \Sigma_n} a_{1\sigma(1)} \cdots a_{n\sigma(n)} \, d(P_\sigma).

Moreover, if P_σ is equal to the product of k simple permutations, then d(P_σ) = (−1)^k.
The unique number d(A) that is determined by the three conditions in Theorem 5.1 is called the determinant of A and will be denoted det(A) or det A from now on.
Exercise 5.2. Use the three rules in Theorem 5.1 to show that if A ∈ ℂ^{2×2}, then det A = a_{11}a_{22} − a_{12}a_{21}.
Exercise 5.3. Use the three rules in Theorem 5.1 to show that if A ∈ ℂ^{3×3}, then
det A = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} − a_{11}a_{23}a_{32} − a_{12}a_{21}a_{33} − a_{13}a_{22}a_{31}.
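For small n, formula (5.2) can be evaluated literally, using the fact that d(P_σ) = (−1)^k is the sign of the permutation σ. The sketch below is only an illustration (the 3 × 3 matrix is an arbitrary choice, and NumPy is assumed); summing over all n! permutations is, of course, hopelessly slow for large n.

```python
import numpy as np
from itertools import permutations

def sign(sigma):
    """Sign of a permutation: (-1)^(number of inversions)."""
    inv = sum(1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
              if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def det_by_definition(A):
    """Formula (5.2): sum over all n! permutations of sign * product of entries."""
    n = A.shape[0]
    return sum(sign(s) * np.prod([A[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [1.0, 0.0, 6.0]])

print(det_by_definition(A), np.linalg.det(A))   # both equal 22 (up to rounding)
```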
5.3. Useful rules for calculating determinants
93
5.3. Useful rules for calculating determinants Lemma 5.2. The determinant of a matrix A E C nxn satisfies the following rules: 4° If two rows of A are identical, then det A = O. 5° If B is the matrix that is obtained by adding a multiple of one row of A to another row of A, then det B = det A. 6° If A has a row in which all the entries are equal to zero, then det A = O. 7° If two rows of A are linearly dependent, then det A = O.
Discussion. Rules 4°-7° are fairly easy consequences of 1°-3°, especially if you tackle them in the order that they are listed. Thus, for example, if two rows of A match and if P denotes the permutation that interchanges these two rows, then A = P A and hence det A = det (P A) = - det A , which clearly justifies 4°. Rule 5° is most easily understood by example: If, say,
A
=[
~ 1 ~d
then
det B
= det
B
= A+ ae2
[~ 1+
a dci [
i
~= [ ~ ~ 1 ~ 1=
det A
+ O.
Rule 6° is left to the reader. Rule 7° follows from 6° and if, say, n aa3 + {3a4 = 0 and a -1= 0, the observation that
a det A
= det
~ 1=
[ a a3
J ~1
det [ a a3 +{3
->
~
= 4,
~
= O.
->
~
A number of supplementary rules that are useful to calculate determinants will now be itemized in numbers running from 8° to 13°, interspersed with discussion. 8° If A E
c nxn is either upper triangular or lower triangular, then det A
= all ... ann.
Discussion.
To clarify 8°, suppose for example that
Then by successive applications of rules 3° and 5°, we obtain det A = a33 det [
° ° °
° ° °
au al2 a l3] [an a22 a23 = a33 det
= a3S a22 det
1
[a~l a~2 ~]
° °
= aaaa22aU det 13·
= a33a22 det
1
a12 001] a22
[a~1
° °0] °° 1
1
Thus, in view of rule 1°, det A
= ana22a33,
as claimed. Much the same sort of argument works for lower triangular matrices, except then it is more convenient to work from the top row down rather than from the bottom row up. Lemma 5.S. If E E C nxn is a lower triangular matrix with ones on the diagonal, then
det (EA) = det A
(5.3)
for every A E C nxn. Discussion. Rule 5° implies that det (EA) = det A if E is a lower triangular matrix with ones on the diagonal and exactly one nonzero entry below the diagonal. But this is enough, since a general lower triangular matrix E with ones on the diagonal can be expressed as the product E = E1 ... Ek of k matrices with ones on the diagonal and exactly one nonzero entry below the diagonal, as in the next exercise. 0 Exercise 5.4. Let E=
[a~1 ~ ~ ~ 1
a31 a32 1 a4l a42 a43 and let ei, i = 1, ... ,4 denote the standard basis for C 4. Show that
+ a3le3e f)(I4 + a41 e4ef) x (14 + a32e3e I) (14 + a42e4eI}(14 + flo:43e4ef). If A E c nxn , then A is invertible if and only if det(A) i= 0. E
9°
-
(14 + a21e2ef) (14
5.3. Useful rules for calculating determinants
Proof.
95
In the usual notation, let
(5.4)
U = EPA
be in upper echelon form. Then U is automatically upper triangular (since it is square in this application) and, by the preceding rules, det(EP A)
= det(P A) = ± det A.
Therefore, IdetAI = Idet UI = lUll'" unnl· But this serves to establish the assertion, since A is invertible
{=}
U is invertible
U is invertible
{=} Ull'"
and U nn
=f 0.
o 10° If A, BE
c nxn , then det(AB) = det Adet B = det(BA).
Proof. If det B = 0, then the asserted identities are immediate from rule 9°, since B, AB and BA are then all noninvertible matrices. If det B =f 0, set (A) = det(AB) cp det B and check that cp(A) meets rules 1°- 3°. Then cp(A) = det A,
since there is only one functional that meets these three conditions, i.e., det(AB) = det A det B,
as claimed. Now, having this last formula for every choice of A and B, invertible or not, we can interchange the roles of A and B to obtain det(BA)
= det Bdet A = det Adet B = det(AB).
o Exercise 5.5. Show that if det B =f 0, then the functional cp(A) = dd~\AJ) meets conditions 1°-3°. [HINT: To verify 3°, observe that if al,' .. ,an designate the rows of A, then the rows of AB are alB, ... ,anB.] 11° If A E c nxn and A is invertible, then det(A-I) = {det A}-l, Proof. 12° If A E
Invoke rule 10° and the formula det(AA-I) = det(In) = 1.
c nxn , then det(A) = det(AT ).
0
5. Determinants
96
Proof.
Invoking the formula EPA = U and rules 10° and 8°, we see that det(P) det(A) = det(U) = un ... U nn .
Next, another application of these rules to the transposed formula ATpTET = UT
leads to the formulas
But now as P can be written as the product
of simple permutations, it follows that
pT = pl . .. p!
= Pk ... PI
is again the product of k simple permutations. Therefore, det(P)
= (_l)k = det(pT)
and hence
as claimed.
D
13° If A E C nxn, then rules 3° to 7° remain valid if the word rows is replaced by the word columns and the row interchange in rule 2° is replaced by a column interchange. Proof. reader.
This is an easy consequence of 12°. The details are left to the D
Exercise 5.6. Complete the proof of 13°. Exercise 5.7. Calculate the determinants of the following matrices by Gaussian elimination:
[~
3 4 0 1
2 1 2 0
!l [~ I U ~ 0 1 0 1
1 0 0 1
1
3 2 0 0
2 1 3 1
1
[~
0 2 0 1
0
3 1 2
[HINT: If, in the usual notation, EPA = U, then Idet AI = I det UI.]
I
1
Exercise 5.S. Calculate the determinants of the matrices in the previous exercise by rules 1° to 13°.
5.4. Eigenvalues Determinants play a useful role in calculating the eigenvalues of a matrix A E lFnxn. In particular, if A = U JU- 1 , where J is in Jordan form, then
Therefore, by rules 10°, 11° and 8°, applied in that order,
det(Aln - A) = det(Aln - J) = (A - jl1)(A - J22) ... (A - jnn) , where jii, i = 1, ...
,n, are the diagonal entries of J.
The polynomial
p(A) = det(Aln - A)
(5.5)
is termed the characteristic polynomial of A. In particular, a number A is an eigenvalue of the matrix A if and only if p(A) = o. Thus, for example, to find the eigenvalues of the matrix
look for the roots of the polynomial
This leads readily to the conclusion that the eigenvalues of the given matrix A are Al = 3 and A2 = -1. Moreover, if J = diag {3, -I}, then
which yields the far from obvious conclusion
The argument propogates: If AI, . .. ,Ak denote the distinct eigenvalues of A, and if Q;i denotes the algebraic multiplicity of the eigenvalue Ai, i = 1, ... ,k, then the characteristic polynomial can be written in the more revealing form
(5.6) It is readily checked that
p(A) =
(A - Al In r°l!l (A - A2Inyl<2 ... (A - Ak1nY"I, U(J - A1Inyl!1(J - A2Inyr2 ... (J - Ak1nYl!kU-1
- o.
5. Determinants
98
This serves to justify the Cayley-Hamilton theorem that was referred to in the discussion of Theorem 4.14. In more striking terms, the CayleyHamilton theorem states that det (AIn - A) ao + ... + an_lA n- 1 + An (5.7) ~ aoIn + ... + an_1An-l + An = 0 Exercise 5.9. Show that if J = diag {Ci~), ci~), ci~)}, then (J - Alho)5(J - A2ho)3(J - A3IlO)2 = O. Exercise 5.10. Show that if Al = A2 in Exercise 5.9, then (J - AIIlO)5(J - A3I1O)2 = O. Exercise 5.10 illustrates the fact that if /lj, j = 1, ... ,k, denotes the size of the largest Jordan cell in the matrix J with Aj on its diagonal, then p( A) = 0 holds for the possibly lower degree polynomial
Pmin(A) = (A - Altl (A - A2t2 ... (A - Aktk , which is the minimal polynomial referred to in the discussion of Theorem 4.14:
Pmin(A) =
(A - AIIntl (A - A2Int2 ... (A - AkIn)lIk = U(J - AIInt1(J - A2In)II2 ... (J - AkIn)lIkU-1 =
o.
Two more useful formulas that emerge from this analysis are:
(5.8) and
(5.9) where the trace of an n x n matrix A is defined as the sum of its diagonal elements:
(5.10)
trace A
= au + a22 + ... + ann.
The verification of the last formula depends upon the fact that n
(5.11)
trace (AB) =
n
L L aijbji = trace (BA). i=l j=l
Thus, in particular, (5.12)
trace A
= trace (UJU- 1 ) =
which leads easily to the stated result.
trace (JU-1U)
= traceJ,
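A quick numerical check of (5.8) and (5.9), i.e., that the product of the eigenvalues of A (repeated according to algebraic multiplicity) equals det A and that their sum equals trace A; the matrix below is an arbitrary illustration, and NumPy is assumed.

```python
import numpy as np

A = np.array([[3.0, 1.0, 0.0],
              [0.0, 2.0, 4.0],
              [1.0, 0.0, 1.0]])   # arbitrary example

eig = np.linalg.eigvals(A)        # eigenvalues, repeated per algebraic multiplicity

print(np.isclose(np.prod(eig), np.linalg.det(A)))   # det A   = product of eigenvalues
print(np.isclose(np.sum(eig), np.trace(A)))         # trace A = sum of eigenvalues
```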
5.5. Exploiting block structure The calculation of determinants is often simplified by taking advantage of block structure. Lemma 5.4. If A E
c nxn is block diagonal, A
=
[~l
i.e., if A is of the form
.z2]
with square blocks An E C pxp and A22 E C qxq on the diagonal, then
det A = det An det A22 . Proof. The basic factorization formula of Gaussian elimination guarantees the existence of a pair of lower triangular matrices EI E C pxP , E2 E c qxq with ones on the diagonal and a pair of permutation matrices PI E C pxP , P2 E c qxq such that EIPIA n
= Un and E 2P2A22 = U22
are in upper echelon form and hence automatically upper triangular. Thus, det Un
= det EIPIA n = det PIAu and det U22 = det P2A22.
Moreover, since
[d
~]
~] [A~l
[2
.z2] =
and and
[ Uu
o
[~l ~2]
0]
U22
are both triangular, it follows that det [ EI 0 and det
0] =
E2
[U~l ~2]
det EI det E2 = 1
= det Uu det U22 .
Therefore, det [Uu
0] U22 det Un det U22
o
=
det PI det Au det P2 det A22 . The proof is completed by checking that PI det [ 0
0] _
P2
- det
[PI 0
0] [
Iq
det
Ip 0
o
5. Determinants
100
Theorem 5.5. Let A E
c nxn be expressed in block form as A = [All A12] , A21 A22
with square blocks An E C kxk and A22 E C (n-k)x(n-k) on the diagonal.
(1) If A22 is invertible, then
detA=det(All-A12A2"lA21)detA22.
(5.13)
(2) If An is invertible, then det A = det(A22
(5.14)
-
A21Ail A12 ) det An .
Proof. The first assertion follows easily from Lemma 5.4 and the identity for the Schur complement of A22 with respect to A:
0]
A = [Ik A12A2"l] [ An - A12A2"lA21 o In-k 0 A22
0], A22 A21 In-k
[_{k
which is valid when A22 is invertible; the second rests on the identity
A=
[
Ik
0] [An
A21All In-k
0
0
A22 - A21Ail A12
][h 0
which is valid when An is invertible.
Ail A12 ] In-k
'
o
Corollary 5.6. If A is block triangular, i.e., if
0]
A = [An A21 A22
or A = [ All 0
with square diagonal blocks, then det A Proof.
= det Au det A22 .
Observe first that
[A~l 1~:] = [~ 1~:] [A~l I~k] and then invoke (2) of Theorem 5.5 to calculate the determinant of the first matrix on the right and (1) of the same theorem to calculate the determinant of the second matrix on the right. It is important to keep in mind that (1) and (2) cannot be invoked directly, because the diagonal blocks An and A22 are not assumed to be invertible. The formula for the determinant of the second block matrix can be verified in much the same way or, what is even easier, by noting that the transpose of the second block matrix has the same block form as the first 0 and then invoking the fact that det A = det AT.
5.5. Exploiting block structure
101
Exercise 5.11. Show that if A, BE c nxn are expressed in compatible four block form with An, Bn E C kxk, A 22 , B22 E C (n-k) x (n-k) and if AB = In, then (5.15) det An i- 0 {::::::::} det B22 i- 0 and det A22 i- 0 {::::::::} det Bn i- o.
[HINT: Anx = 0
==}
B 22 A 21 X = 0, B 22 y = 0
==}
AnBI2Y = 0 etc.]
Exercise 5.12. Show that if A, BE c nxn are expressed in compatible four block form with An, Bn E C kxk , A 22 , B22 E C (n-k)x(n-k) and if AB = In, then det A22 d B _ det An and (5.16) det B22 = det A . et 22 - det A
[HINT: In view of Exercise 5.11, it only remains to consider the cases det An i- 0 and det A22 i- 0, respectively.] Remark 5.7. The identities in (5.16) are special cases of a more general formula due to Jacobi that expresses the determinant of an arbitrary square submatrix of A-I in terms of the determinant of the complementary submatrix of A and det A; see e.g., Exercise 5.21. Exercise 5.13. Show that if B is a p x q matrix and G is a q x p matrix, then det{Ip - BG} = det{Iq - GB} and that q + rank{Ip - BG} = p + rank{Iq - GB} . [HINT: Imbed Band G appropriately in a (p+q) x (p+q) matrix and then exploit Theorem 5.5.] Exercise 5.14. Show that if A E C pxq and BE C qxp , then
(5.17)
det ()"Ip
-
AB) =
)..p-q det ()..Iq -
BA)
if )..
i- o.
Exercise 5.15. Show that if u E C P, then det (Ip - uu H ) if uHu i- 1.
i- 0 if and only
Exercise 5.16. Let A E c nxn be invertible and let u, v E cn. Show that the matrix A + uvT is invertible if and only if 1 + v T A-Iu i- O. Exercise 5.17. Calculate the determinant of the matrix 1 1 1 1 0 0 0 0 2 2 2 0 0 0 0 0 3 3 0 0 0 A= 0 0 0 4 0 0 0 9 8 7 6 1 2 3 1 5 9 3 0 4 1 8 8 8 6 0 2 2
5. Determinants
102
[HINT: This is easy if you exploit the block triangular structure of A.] Exercise 5.1S. Calculate the determinant of the matrix
[Z
~].
5.6. The Binet-Cauchy formula The Binet-Cauchy formula is a useful tool for calculating the determinant of the product AB of a matrix A E C kxn and a matrix B E C nxk in terms of the determinants of certain square subblocks of A and B when n 2: k. The notation
for the determinant of the indicated k x k subblock of the matrix C with entries taken from rows i l ,.·· , ik and columns jI, ... ,jk of the matrix C will be needed. To amplify further, if, say, C is a 7 x 5 matrix, then
C 1: 4: 5 (
2 3 6 )
= det
[C2l C3l
C24 C34
Cfll
Cfl4
C25] C35 C65
•
Theorem 5.S. If A E C kxn , BE C nxk and k::; n, then detAB=
L
A( .I, ...
... 1:531 <32<"'<3k:5n
,~
)I, ... ,)k
)B( jI, ... ,jk) I, ... ,k
Discussion. The proof consists of two steps. The first is to exploit the identities =
[A AB] -In Onxk
_
[AB A Onxk -In
][0h
IOn]
to obtain the formula det
[-~n O~k]
det
[~~k -~n] det [Z ~]
= (-1 tk det [AB
A] Onxk -In
(-ItkdetAB det(-In) = (-l)n(k+l)detAB
5.6. The Binet-Cauchy formula
103
or, equivalently,
(5.18)
detAB = (_l)n(k+l)det [ A -In
OkXk] B .
The second step is to use the multilinearity of determinants to help evaluate the determinant of the matrix on the right. Thus, for example, if
A = [a1 C = [C1 and
a2
a3] is a 2 x 3 matrix, with columns aI, a2, a3
in
C2
C3] is a 3 x 3 matrix, with columns C1, C2, C3
in ((:3
((:2
,
= [hI h2] is a 3 x 2 matrix, with columns hI, h2
B
then, since the determinant of the full matrix is a multilinear functional of its columns, det [a1 a2 33 0 C1 C2 C3 hI
=
0] h2
@+®, where @ = det [ a1
C1
in view of Corollary 5.6, and
®
=
=
-
det [a1 a2 a3 0 C1 C2 0 hI
0] h2
det [ a 1 a2 o C2
0] h2
a3
_ det [a1 a3 o 0
- det [al
0 hI
0
a2 C2
0 hI
+ det [0 a2 33
0] h2
C1
c2
0 hI
0
+ det [a2 a3 0 c2
0
Cl
0] h2 0 hI
0] h2
a3] det [C2 hI h2]
+ det [a2
a3] det [C1 hI
h2].
The next step is to show that if C3 = -13, then @ = -A ( 1,2 ) B ( 1,2 ) 1,2 1,2
(5.19) and (5.20)
®
= -A ( 1,2 ) B ( 1,3 ) _ A ( 1,2 ) B ( 2,3 ) .
1,3
1,2
2,3
1,2
The Binet-Cauchy formula for this example is now easily completed by combining formulas (5.18)-(5.20).
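A brute-force check of Theorem 5.8, assuming NumPy: det(AB) is recovered by summing, over all selections 1 ≤ j_1 < ⋯ < j_k ≤ n, the product of the determinant of the corresponding k columns of A with the determinant of the corresponding k rows of B. The 2 × 4 and 4 × 2 matrices below are arbitrary illustrations.

```python
import numpy as np
from itertools import combinations

def binet_cauchy(A, B):
    """det(AB) as a sum over all k-fold column/row selections (Theorem 5.8)."""
    k, n = A.shape
    total = 0.0
    for cols in combinations(range(n), k):      # 1 <= j_1 < ... < j_k <= n
        idx = list(cols)
        total += np.linalg.det(A[:, idx]) * np.linalg.det(B[idx, :])
    return total

A = np.array([[1.0, 2.0, 0.0, 1.0],
              [3.0, 1.0, 1.0, 0.0]])            # 2 x 4
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0],
              [1.0, 1.0]])                      # 4 x 2

print(binet_cauchy(A, B), np.linalg.det(A @ B))   # the two numbers agree
```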
Exercise 5.19. Verify formulas (5.19) and (5.20).
5. Determinants
104
Exercise 5.20. Write the Binet-Cauchy formula for det AB for A E C 2x3 and B E C 2x3 in terms of the entries aij of A and bij of B. Do not compute the relevant determinants. Exercise 5.21. Show that if A, BE C 5x5 and AB (5.21)
= 15, then
1 3 4) ( 5)=_A 245 det A
3
5.7. Minors The ij minor A{ij} of a matrix A E c nxn is defined as the determinant of the (n - 1) x (n - 1) matrix that is obtained by deleting the i'th row and the j'th column of A. Thus, for example, if
A
[13 1]
[2
= ~ ~ ~ ,then A{12} = det 1
Theorem 5.9. If A is an n x n matrix, then det A can be expressed as an expansion along the i'th row: n
det A =
(5.22)
2: aijA{ij} (_1)i+i j=l
for each choice of i, i = 1, ... ,n, or as an expansion along the j'th column: n
(5.23)
detA
= 2: aijA{ii}(-I)i+j i=l
for each choice of j, j = 1, ... ,n.
Discussion. The conclusion depends heavily upon the fact that the determinant is linear in each row separately and each column separately. Thus, for example, if A is a 4 x 4 matrix, then we can write the third row of A as [a31
a32
a33
a34]
+ a32 [0 0] + a34 [0
= a31 [1
0 0 0]
1 0 0]
+ a33 [0
0 1
0 0 1]
5.7. Minors
105
and invoke linearity to obtain
det A
= a31
+ a33
[an det 1
a12
a13
a22
a23
0
0
a41
a42
a43
a44
[an det 0
a12
a13
a21
a22
a23
a14] a24
0
1
a41
a42
a43
a21
a1'] o + a24
a32
o + a34
[all det 0
a12
a13
a21
a22
a23
1
0
aa24 0
a41
a42
a43
a44
[an det 0
a12
a13
a21
a22
a23
0
0
a41
a42
a43
a44
14]
14]
aa24 1 . a44
Next, by invoking rule 5°, we can knock out all the entries in the same column as the one in each of the last four determinants. Thus, for example, the determinant multiplying a32 is seen to be equal to
:~: :~:] a43
= (-1)3det
[o ~
o
a44
a~3 a~4]
a23
a24
a43
a44
= (-1)3A{32}, thanks to rules 2°, 13° and Lemma 5.4. Similar considerations lead to the formula det A
= a31A{31}
- a32A{32}
+ a33A{33} -
a34A{34}'
All the other formulas for computing the determinant of a matrix A E C 4x4 can be established in much the same way. The verification offormula (5.22) for A E nxn is similar. It rests on the observation that the rows of the matrix A can be written as 'Ej=o aijeJ and hence, for example, the expansion in minors along the first row of A is obtained from the development
a:
c
The verification of formula (5.23) rests on analogous decompositions for the columns of A. MORAL: Formulas (5.22) and (5.23) yield 2n different ways of calculating the determinant of an n × n matrix, one for each row and one for each column, respectively. It is usually advantageous to expand along the row or column with the most zeros.
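Formula (5.22) translates directly into code. The sketch below (an illustration assuming NumPy, not part of the text) expands det A in minors along any chosen row and checks that every row gives the same answer as a library determinant.

```python
import numpy as np

def minor(A, i, j):
    """The ij minor: determinant of A with row i and column j deleted."""
    return np.linalg.det(np.delete(np.delete(A, i, axis=0), j, axis=1))

def det_by_row_expansion(A, i=0):
    """Formula (5.22): expansion of det A in minors along row i (0-based here,
    so the sign (-1)**(i + j) matches the book's 1-based (-1)**(i + j))."""
    n = A.shape[0]
    return sum(A[i, j] * minor(A, i, j) * (-1) ** (i + j) for j in range(n))

A = np.array([[2.0, 0.0, 1.0, 0.0],
              [1.0, 3.0, 0.0, 2.0],
              [0.0, 1.0, 1.0, 0.0],
              [4.0, 0.0, 0.0, 1.0]])

print([round(det_by_row_expansion(A, i), 6) for i in range(4)],
      round(np.linalg.det(A), 6))   # every row expansion gives the same value
```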
Exercise 5.22. Evaluate the determinant of the 4 x 4 matrix
A=
[n ~ ~l
twice, first begin by expanding in minors along the third column and then begin by expanding in minors along the fourth column. Exercise 5.23. Show that if A E C nxn, then the ij minor A{ij} is equal to (-1 )i+j det X, where X denotes the matrix A with its i'th row replaced by e Tj . Theorem 5.10. If A E entries
c nxn
and if C E
c nxn
denotes the matrix with
then:
(1) AC = CA = det A . In. (2) If detA # 0, then A is invertible and A-I =
aef-A C.
Discussion. If A is a 3 x 3 matrix, then this theorem states that (5.24) an aI2 a13 ] [ A{1I} -A{2I} A{3I}] [det A 0 0] [a2I a22 a23 - A{12} A{22} - A{32} = 0 det A 0 a3I a32 a33 A{13} - A{23} A{33} 0 0 det A This formula may be verified by three simple sets of calculations. The first set is based on the formula
a~2
a:3] = XA{1I} - yA{12} + ZA{13} det [a:1 a31 a32 a33 and the observation that: (5.25)
• if (x, y, z) = (an, a12, aI3), then the left-hand side of (5.25) is equal to detA; • if (x, y, z) = (a21' a22, a23), then, by rule 4°, the left-hand side of (5.25) is equal to 0; • if (x, y, z) = (a31' a32, a33), then, by rule 4°, the left-hand side of (5.25) is equal to O. These three evaluations can be recorded in the following more revealing way:
[:~~ :~: :~:] ~~~~~ [-
a31 a32 a33
A{13}
] = det A
[~]. 0
5.7. Minors
107
The next set of calculations uses the formula an
det
[
x
l3 az ]
a31
a33
= -XA{21} + yA{22} -
ZA{23}
to verify that an [ a21
al2 a22
a 13 ] [ -A{21} ] a23 A{22} =
a31
a32
a33
[0]
det A I .
°
-A{23}
D
The rest should be clear.
Exercise 5.24. Formulate and verify the analogue of formula (5.24) for 4 x 4 matrices. Exercise 5.25. Let A be an n x n matrix and let f(x) = det (In + xA). Show that f'(O) = traceA. [HINT: First write the determinant in terms of the minors of either the first row or the first column and then differentiate. Lots of things drop out when x = 0. Try a 2 x 2 or a 3 x 3 first to get oriented.] Exercise 5.26. Give a second proof of the formula in Exercise 5.25 on the basis of the Jordan decomposition A = U JU- 1 that is described in Theorem 4.14. Exercise 5.27. Show that if A E C3x3 is invertible and Ax = b, then
Xl
=
det A
det A
X2 =
and state and verify the analogous formula for example of Cramer's rule.] Exercise 5.2S. Show that if AEC
(5.26)
bll
A, B E
X3.
[REMARK: This is an
c nxn
and
det
[a~l
a22
anI
a n2
+ b21A + ... + bnlAn - 1 =
AB = In,
~n-'l
A
det
then for every
a2n
...
a2nn
A
Exercise 5.29. Compute the inverse of the matrix A =
[~ ~
; ] for
°
1 x those values of x for which A is invertible. [HINT: Exploit formula (5.24).]
5. Determinants
108
Exercise 5.30. Show that if a 12 ] a a22
det [all
a21
+ det
[an a21
a 13 ] a23
f3 + det
[a 12 a22
a 13 ] a23 'Y
then there exists a matrix C E C 3x2 such that [an
al2 a22
a21
[HINT: Imbed the given matrix.]
aij
= 1,
.
a 13 ] C a23
= 12 .
in an appropriately chosen 3 x 3 invertible
Exercise 5.31. Let 8i E C 2 and
aij
= det [ai aj] for
i,j
= 1, ... ,4.
Show that (5.27) [HINT: Consider the matrix
[aal14
a24
a2
a 34 ].]
ag
Exercise 5.32. Show that if the six numbers a12, alg,
a14, a34, a24 and a23 satisfy the identity (5.27), then there exists a matrix A E C 2x4 such
that
aij
Try A =
= det
[01
[8i
x
a12
aj], where ai designates the i'th column of A. [HINT:
Y z ] for a start.] al3 al4
5.8. Uses of determinants In looking back over the results of this chapter, you should keep in mind that determinants play a useful role in: (1) The calculation of the eigenvalues of a matrix A.
c
(2) Checking whether or not a matrix A E nxn is invertible. (3) The calculation of the inverse of an invertible matrix A E C nxn. (4) Solving equations of the form Ax = b when A E
Exercise 5.33. Let A
E C pxq .
c nxn is invertible.
Show that rankA = r if and only if the largest square invertible submatrix of A is of size r x r. [REMARK: A k x k submatrix of A is obtained by deleting p - k rows and q - k columns.]
5.9. Companion matrices A matrix A E C nxn of the form
(5.28)
0 0
1
0
0
1
0 0
0
0
0
1
-aD
-al
-a2
-an-l
A=
5.10. Circulants and Vandermonde matrices
109
is called a companion matrix. Companion matrices playa significant role in the theory of differential equations and the theory of Bezoutians, as will be discussed in Chapters 13 and 2l. Theorem 5.11. Let A E c nxn be a companion matrix of the form (5.28) with distinct eigenvalues AI, ... ,Ak having geometric multiplicities 1'1, ... ,'Yk and algebraic multiplicities aI, ... ,ak, respectively. Then:
= ao + alA + ... + an_lAn- l + An. (2) 'Yj = 1 for j = 1, ... ,k. (3) A is similar to the Jordan matrix J = diag {ct 1 ), •••
(1) det (AIn - A)
(4) A is invertible if and only if ao
=1=
,ct
k )}.
o.
Proof. The formula in (1) is obtained by expanding in minors along the first column and taking advantage of the structure. The second assertion follows from the fact that dim 'R,(A->.In ) ~ n - 1 for every point A E C, because it implies that dimN(A-~jln) = 1 for j = 1, ... ,k. Therefore (3) follows: there is exactly one Jordan cell Ci~j) for each distinct eigenvalue 3 Aj. The last assertion (4) is left to the reader as an exercise. 0 Exercise 5.34. Verify the formula in (1) for n = 2 and n = 3. Exercise 5.35. Verify the formula in (1) for an arbitrary positive integer n by induction. [HINT: Expand in minors along the last column.] Exercise 5.36. Verify assertion (4) in Theorem 5.11. Exercise 5.37. Show that if peA) = ao + alA + a2A 2 + A3 = (A - J.L)3, then 1
o
o1 ] V=V -a2
1 0]
[J.L0 J.L 1 0 0 J.L
where
V =
[J.L21J.L 2J.L0 0]1
[HINT: Exploit the formulas p(J.L) = 0, P'(J.L) = 0 and p"(J.L) denotes the derivative of p with respect to A.]
5.10. Circulants and Vandermonde matrices A matrix A E
c nxn of the form
(5.29) based on the n x n permutation n-l
(5.30)
P =
L eje;+1 + enef j=l
1
=
0
0, where p'
5. Determinants
110
is termed a circulant. To illustrate more graphically, if n = 5, then 1 0 0 0 0
0 0 0 1 0
ao al a2 a3 a4 a4 ao al a2 a3 and A= a3 a4 ao al a2 a2 a3 a4 ao al al a2 a3 a4 ao Because of the special form of A, it is reasonable to look for eigenvectors u of the special form u T = [1 X x 2 x 3 x4]. This will indeed be the case for appropriate choices of x. A matrix V E C nxn with columns of this form: 0 0 p= 0 0 1
0 1 0 0 0
0 0 1 0 0
V=
(5.31)
[A~ Ai- l
is termed a Vandermonde matrix. Exercise 5.38. Show that if P denotes the permutation P defined by formula (5.30), then det (Aln - P) = -1 + An and pn = In. [HINT: Invoke Theorem 5.11 and the Cayley-Hamilton theorem.] Exercise 5.39. Let A E c nxn denote the circulant matrix defined by formula (5.29) and let p{A) = ao + alA + ... + an_lAn-I. Show that
AV = VD,
where
D
= diag {P«(I), ... ,P{(n)},
(j = exp(21rij/n) for j = 1, ... ,n and V is a Vandermonde matrix with Aj = (j for j = 1, ... ,n. Conclude that
(5.32)
det A = P{(I)'" P{(n).
Exercise 5.40. Show that if A E AB=BA.
c nxn and BE c nxn are circulants, then
Exercise 5.41. Find the determinant of the (Vandermonde) matrix V given by formula (5.31) when n = 3. [REMARK: You can calculate this determinant by brute force. But a better way is to let f{x) denote the value of the determinant when Al is replaced by x and observe that f{x) is a polynomial of degree two such that f(A2) = f(A3) = 0.] Exercise 5.42. Let {ao, ... , an} and {30, ... ,.Bn} be two sets of points in C. Show that if ai =1= aj when i =1= j, then there exists a unique polynomial p(A) = CO+CIA+" '+CnA n of degree n such that p(ai) =.Bi for i = 0, ... ,n and find a formula for the coefficients Cj.
Chapter 6
Calculating Jordan forms
Some people believe that football is a matter of life or death. I'm very disappointed with that attitude. I can assure you its much, much more important than that. Bill Shankly, former manager of Liverpool The first (and main) part of this chapter is devoted to calculating the Jordan forms of a given matrix A E c nxn , i.e., to finding representations of the form
(6.1)
A=UJU- 1 ,
where U E C nxn is invertible and J E C nxn is a block diagonal matrix with Jordan cells as blocks. Subsequently, analogous representations for A E IR. nxn when the matrices on the right are also constrained to have real entries are considered. The last section furnishes additional information on companion matrices. In particular, it is shown that a companion matrix Sf that is based on the polynomial
with k distinct roots >'1, ... ,Ak is similar to J = diag {ct1 ), ••• ,Ci~k)}. Thus, every matrix A E C nxn is similar to a block diagonal matrix diag {S91 , ... ,S9t} based on one or more companion matrices. The symbol Sf (S for significant other) is used for the companion matrix to avoid overburdening the letter C.
-
111
6. Calculating Jordan forms
112
6.1. Overview The calculation of the matrices in a representation of the form (6.1) can be conveniently divided into three parts (the notation corresponds to Theorem 4.14): (1) Calculate the eigenvalues of the matrix A from the distinct roots "\1, ... ,..\k of the characteristic polynomial
p(..\) = det (..\In - A) by writing it in factored form as
p(..\)
= (..\ - ..\lrl<1 ... (..\ - ..\k)ll
The distinct roots ..\1," . ,..\k of p(..\) are the distinct eigenvalues of A, and the numbers aI, ... ,ak are their algebraic multiplicities. (2) Compute J = diag {B Al
...
,BAk} by calculating
dim N(A-Ajln)i
for
i
= 1, ... , aj
for each of the distinct eigenvalues ..\1, ... , ..\k, in order to obtain the sizes of the Jordan cells in B Aj from either the diagram discussed in Section 6.4, or formula (4.17). (3) Organize the vectors in N(A-Ajln)Qj into Jordan chains, one Jordan chain for each Jordan cell, in order to calculate the matrices U.
Remark 6.1. The information in (1) is enough to guarantee a factorization of A of the form A = UGU- l , where G = diag {G l , ... ,Gk } for some choice of Gj E Cll<jXll<j, j = 1, ... ,k. The remaining two steps are to show that if the vectors in U are chosen appropriately, then each of the blocks Gj , j = 1, ... ,k, can be expressed as a block diagonal matrix with 'Yj Jordan cells as diagonal blocks and to determine the sizes of these cells.
6.2. Structure of the nullspaces Lemma 6.2. Let BE lF nxn and let U
NBj
E
lF n belong to NBk for some positive
integer k. Then the vectors {Bk-lu, B k- 2u, ...
Proof.
,u} are linearly independent over IF
Suppose first that B k - l u
¢:::=:>
Bk-lu
i- 0 and that
aou + alBu + ... + ak_lBk-lu = 0 for some choice of coefficients ao, ... ,ak ElF. Then, since u E NBk, Bk-l(aou + Q:lBu + ...
which clearly implies that ao
+ ak_lBk-1u) =
aoBk-lu = 0,
= O. Similarly, the identity
B k- 2(a l Bu + ...
+ ak_lBk-1u) =
alBk-lu
=0
i- O.
6.2. Structure of tile nullspaces N Bi
implies that
Ql
113
= O. Continuing in this vein it is readily seen that Bk-1u =1=
o ===> that the vectors {Bk-I U , Bk-2u , ... , u} are linearly independent over IF. Thus, as the converse is self-evident, the proof is complete.
Lemma 6.3. If BE
IFnxn,
D
then:
(1) The null spaces N Bj are ordered by inclusion:
NB r;;.NB2 r;;.NB3 r;;. ....
(2) If N Bi = NBj-l for some integer j 2:: 2, then NBi+l = N Bi . (3) If j 2:: 1 is an integer, then N Bj
=1=
{O}
~ NBi+l =1=
{O}.
Proof. If U E N Bj , then Bi+1u = B(Biu) = BO = 0, which clearly implies that u E NBi+l and hence justifies (1). Suppose next that U E NBi+l and NBi = NBj-l. Then Bu E N Bi , since Bi+1u = Bi (Bu) = O. Therefore, Bu E NBj-l, which implies in turn that u E N Bj . Thus, N Bj
= NBj-l ===> NBi+l r;;. N Bi ,
which, in view of (1), serves to prove (2). Finally, (3) is left to the reader as an exercise. D Exercise 6.1. Verify assertion (3) in Lemma 6.3. Lemma 6.4. Let B E c nxn and assume that NBj-l zs a nonzero proper subspace of N Bj for some integer j 2: 2.
(1) If {UI,'"
,ud
is a basis for NBi-l, and {UI,'" , Uk; VI,··· ,ve} is a basis for N Bj , then f ~ k and span{Bi- 1vI , ... ,Bi-1ve} is an f-dimensional subspace of span{ul, ... , Uk}.
(2) IfalsoNBj is a proper subspace ofNBi+1 and{uI, ... ,Uk; VI, ... ,Vi; WI, ... ,wm } is a basis for NBj+l, then m ~ f ~ k and span {Bi w1 , ... ,Biwm } is an m-dimensional subspace of the f-dimensional space span {Bi- 1v1 , ... , Bi-1ve}.
Proof. To verify (2), observe first that the identity Bi (Bw i) = 0 clearly implies that the vectors BWi E N Bj and hence that span{Bwl, ... ,Bwm } r;;. span{ul, ... ,Uk,VI,··· ,Ye}o But this in turn implies that span {Bi wl , ... ,Bi wm } r;;. span {Bi- 1v1 , ... ,Bi-1vt}. The next step is to check that the vectors Biw1 , ... , Biwm are linearly independent. To this end, suppose that "YIBiwl
+ ... + "YmBiwm = 0
6. Calculating Jordan forms
114
for some set of constants 11, ... ,1m E C. Then II WI + ... + 1mWm E N Bi and, consequently, IIWI
+ ... + ImWm =
alUI
+ ... + akUk + /3lvl + ... + /3fYi
for some choice of constants aI, . .. ,ak, /31, . .. ,/3e E C. However, since the three sets of vectors are linearly independent, this is viable only if all the constants are zero. The same argument serves to show that the vectors Bi- I vl , ... ,Bi- 1vi are linearly independent. This completes the proof of 0 (2). The proof of (1) is similar.
Corollary 6.5. If BE (6.2)
c nxn , then dimNBo = 0 and
dimNBHl - dimNBi ~ dimNBi - dimNBi-l for j = 1,2, ....
In future applications of Lemma 6.4, we shall also need the following result, which will enable us to select basis elements in an appropriate way.
Lemma 6.6. Let span {UI' ... ,Uk} be a k-dimensional subspace of an fdimensional subspace span {VI, ... ,vt} of cn, where k < f. Then there exists an i x i permutation matrix P such that if
vel =
[VI
[VI
Vi] P,
then
Proof. Under the given assumptions there exists a matrix A rank A = k such that [UI
Uk]
= [VI
E C ixk
with
Vi] A .
Let P be an i x i permutation matrix such that the bottom k x k block of pT A is invertible. Thus, in terms of the block decomposition pT A = [AI2] A22
with blocks Al2 E C(l!-k)xk and A22 E C kXk , A22 is invertible and [UI
Uk]
=
[VI
Vi] P pT A
=
[VI
Vi] p
[1~~]
Consequently, the i - k vectors defined by the formula [VI
Vi-k]
= [VI
meet the asserted conditions, since
Vi]
P
[I~k]
.
6.3. Chains and cells
115
and the two R. x R. matrices on the far right in the last formula are invertible.
o 6.3. Chains and cells Recall that a set of vectors U 1. . .. ,Uk E en is said to form a Jordan chain of length k corresponding to an eigenvalue >'0 of A E C nXn if
(A - >'OIn}Ul
and
Ul
i= o.
=
0
(A - >'oIn}u2
Ul
(A - >'oIn}u3
U2
In other words, the n x k matrix
with these vectors as its columns satisfies the identity Uk-I]
o
010 001
o 1
0 0 0 0 0 0
=
[Ul
o
Uk] C~k)
U(Ci:) - >'OIk} , i.e.,
AU = uci:). To illustrate the computation of the number of Jordan cells in J, suppose for the sake of definiteness that B>'1 is a block diagonal matrix with exactly kl Jordan cells ci~), k2 Jordan cells ci~), k3 Jordan cells ci~) and k4 Jordan cells ci~) and let B = B>'1 - >'lIa1 . Then dim N B dim NB2 dim NB 3 dim NB4
+ k2 + k3 + k4 = kl + 2k2 + 2k3 + 2k4 kl + 2k2 + 3k3 + 3k4 kl + 2k2 + 3k3 + 4k4 =
=
kl
il!l .
6. Calculating Jordan forms
116
Thus,
+ k3 + k4 k3 + k4
dim NB2 - dim NB
k2
dim N B 3 - dim NB2 dim NB4 - dim N B 3
k4
,
and hence kl
=
2 dim NB - dim NB2
k2
-
2 dim NB2 - dim N B 3 - dim NB 2 dim N B 3 - dim NB4 - dim NB2
k3 k4
=
dim NB4 - dim N B 3.
The last set of formulas can be written in the uniform pattern (6.3) kj = 2 dim NBi - dim NBi+l - dim NBj-l
since dim NBo j =4, ... ,n.
for
= 0 and, for this choice of numbers,
j = 1,2, ... , n - 1,
dim N Bi
= dim NB4
for
Exercise 6.2. Let A E c nxn be similar to a Jordan matrix J that contains exactly kj Jordan cells CY) of size j xj with /-l on the diagonal for j = 1, ... , I! and let B = A - /-lIn. Show that formula (6.3) for kj is still valid. Exercise 6.3. Calculate dim N Bj for j and
= 1, ... , 15, when B = B>"l
- Alh5
Exercise 6.4. Find an 11 x 11 matrix B such that dim N B = 4, dim N B2 = 7, dim N B 3 = 9, dim NB4 = 10 and dim N B 5 = 11. Exercise 6.5. Let A E c nxn be similar to a Jordan matrix J that contains exactly kl Jordan cells C~l), k2 Jordan cells C~2), ... , k£ Jordan cells C~£) with /-l on the diagonal and let B = A - /-lIn. Show that (6 4) d' .
1m
N. = { B3
ki
kl + k2 + ... + k£ + 2k2 + ... + (j - 1)kj - 1 + j Ef=j ki
if j = 1 if j?2.
Exercise 6.6. Show that in the setting of Exercise 6.5 (6.5)
dim NBi+l - dim N Bi = kj+l
+ ... + k£
for
j = 1, ... ,n -1.
6.4. Computing J To illustrate the construction of J, let A be an n x n matrix with k distinct eigenvalues AI, ... , Ak having geometric multiplicities 11, ... "k and algebraic multiplicities al, ... , ak, respectively. To construct the Jordan blocks
6.5. An algorithm for U
117
associated with AI, let B = A - A1In for short and suppose for the sake of definiteness that 1'1 = 6, Q1 = 15, and, to be more concrete, suppose that: dim
NB = 6, dim NB2 = 10, dim N B3 = 13 and dim NB4 = 15.
These numbers are chosen to meet the two constraints imposed by (1) of Lemma 6.3 and the inequalities in (6.2), but are otherwise completely arbitrary. To see what to expect, construct an array of x symbols with 6 in the first row, 10 - 6 = 4 in the second row, 13 - 10 = 3 in the third row and 15 - 13 = 2 in the fourth row:
x x x x
x x x x x x x x x x x
The Jordan cells will correspond in size to the number of x symbols in each column: two cells of size 4, one cell of size 3, one cell of size 2 and two cells of size 1. The same construction works in general:
Theorem 6.7. Let A E c nxn , J.L E a(A), B = A - J.L1n and dj = dim N Bj for j = 0, ... ,n. Now construct an array of x symbols with dj - dj- 1 x symbols in the j 'th row, stacked as in the example just above, and suppose that exactly rows contain at least one x symbol. Then the number kj of
e
Jordan cells CY) in J is equal to the number of columns in the array that contain exactly j x symbols.
Proof.
In view of formula (6.5) and the fact that do = 0, d1 - do d2
-
=
d1
de - de-1
+ ... + k2 + k1 ke + ... + k2 ke
ke
Therefore, there are k j columns with exactly j x symbols in them for j = 1, ... ,f. 0
6.5. An algorithm for U In this section we shall present an algorithm for choosing a basis of -M,A-Aj)n that serves to build the matrix U in the Jordan decomposition A = U JU- 1 . Let Ai be an eigenvalue of A E
c nxn , let B =
A - Ai1n and let
6. Calculating Jordan forms
118
al, ... , all be a basis for NB, al, ... ,all; bI, ... ,bl2 be a basis for
N B 2,
aI, ... ,all; bI, ... ,bl2 ; Cl,'" ,Cl3 be a basis for
N B3,
aI, ... ,all; bI, ... ,bl2 ; CI, ... ,Cl3; d l ,··. ,dl4 be a basis for NB4 , and suppose that NB4 = NBn. Then, in view of Lemma 6.4, II 2 l2 2 l3 2 l4 and span {BbI, ... ,Bbl2} is an l2-dimensional subspace of span {aI, ... ,all}' span {B 2cI, ... ,B2cl3} is an l3-dimensional subspace of span{Bb l , ... , Bbl 2 }, span {B3dl, ... ,B3dl4} is an l4-dimensional subspace of span {B2cl, ... ,B2 cla}' Moreover, in view of Lemma 6.6, there exists a set of
l3 - l4 vectors CI, ... ,Cla-l4 in {CI, ... ,Cia} l2 - l3 vectors hI, ... ,hl2 -1a in {b l , ... ,bl2 } II - l2 vectors aI, ... ,all -l2 in {aI, ... ,all}' such that the set of II vectors
{B 3d l , ... , B3dl4; B2Cl,'" ,B2Cla-l4; Bh l , ... , Bhl 2 -13; al,'" ,al l -l2 } is a basis for
NB .
The next step is to supplement these vectors with the chains that they generate: £4 clmns
l3 - l4 clmns
l2 - l3 clmns
B2cl ···
Bbl'"
B 2 dl'"
BCl" .
bl'"
Bdl'"
Cl ...
B 3d I
dI
•••
···
The algorithm produces a Jordan chain for each column, i.e.,
l4 Jordan chains of length 4, l3 - l4 Jordan chains of length 3, l2 - £3 Jordan chains of length 2, II - l2 Jordan chains of length 1.
6.5. An algorithm for U
119
The total number of vectors in these chains is equal to
£4 + £3 + £2 + £1 dimNBn. Therefore, since this set of £4 + £3 + £2 + £1 vectors is linearly independent, it is a basis for NBn. Exercise 6.7. Verify that the £4 +£3 +£2 +£1 vectors exhibited in the array just above are linearly independent. Exercise 6.8. Show that in the array exhibited just above the set of vectors in the first k rows is a basis for NBk for k = 1,2,3,4; i.e., the set of vectors in the first row is a basis for N B, the set of vectors in the first two rows is a basis for N B 2, etc.
To complete the construction, let
Vs
[B 3 d s B 2 d s Bds ds]
Xs
[B 2cs Bcs cs] [Bb s bs ] for
Ys
[as]
Ws
for
for
for
8 = 1, ... ,£4,
8 = 1, ... ,£3 - £4 ,
8 = 1, ... '£2 - £3 ,
8=1,· .. ,£1-£2.
Then it is readily checked that BVs
C(2) C(I) = VsCO(4) ,BWs = Ws c(3) 0 ,BXs = Xs 0 ,BYs = Ys 0 '
and hence that if
Ui = [VI ... Vl 4 WI ... Wl 3 -l4 Xl ... Xl 2 -l3 Yl ... 1'l1-l2] , then
BUi = (A - Ai1n)Ui
= Ui(B>'i
- Ai10eJ ,
where B>'i is a block diagonal matrix with £4 Jordan cells ci~), £3 -£4 Jordan cells ci~), £2 - £3 Jordan cells ci~) and £1 - f2 Jordan cells The last identity is equivalent to the identity
ci!) as blocks.
AUi = UiB>'i· This yields the vectors associated with Ai and hence, upon setting
that
AU=UJ. This completes the construction, since U is invertible, by Remark 4.18 and Exercise 6.7.
120
6. Calculating Jordan forms
6.6. An example Let A E
c nxn , let >'1
E
(T(A), let B = A -
dim N B = 2 , dim N B2
>'IIn
for short and suppose that
= 4 and dim NBi = 5 for
j
= 3, ... ,n .
The given information guarantees the existence of five linearly independent vectors aI, a2, b}, b2 and CI such that {al,a2} is a basis for N B , {aI, a2, b l , b2} is a basis for NB2 and {a}, a2, b l , b2, cd is a basis for N B3. Thus, upon supplementing these vectors with the chains that they generate, we obtain the array 2 1 3 4 5 2 B cl Bbl Bb2 al a2 BCI bl b2 Cl
The five vectors in the first row of this array belong to the two-dimensional space NB. The analysis in Section 6.5 guarantees that at least one of the two sets of vectors {B2c1 , Bb l }, {B2CI' Bb 2} is a set of linearly independent vectors. Suppose for the sake of definiteness that B2cI and Bbl are linearly independent. Then the earlier analysis also implies that {B2cl,Bc},c},Bbl,bt} is a basis for NBn. Nevertheless, we shall redo the analysis of this special example from scratch in order to reenforce the underlying ideas. There are six main steps: (1) Every vector in this array of nine vectors is nonzero. (2) span {B2cl} ~ span {Bb l , Bb2} ~ span {a}, a2}. (3) The vectors Bb 1 and Bb2 are linearly independent. (4) If the vectors B 2cl and Bb l are linearly dependent, then the vectors B2cl and Bb2 are linearly independent. (5) If the vectors B2cl and Bb l are linearly independent, then the vectors in columns 1 and 2 are linearly independent. (6) If the vectors B2cI and Bb2 are linearly independent, then the vectors in columns 1 and 3 are linearly independent. To verify (1), suppose first that B2Cl = O. Then CI E N B 2, which implies that Cl E span {a l, a2, bl, b2}. But this contradicts the presumed linear independence of the 5 vectors involved. Therefore, B2cI =1= 0 and hence, BCl =1= 0 and Cl =1= O. Similar reasons insure that Bb l =1= 0 and Bb 2 =1= O. The vectors in the first row of the array are nonzero by choice. To verify the first inclusion in (2), observe that BCl E N B 2, and hence it can be expressed as a linear combination of the basis vectors of that space:
6.6. An example
121
Therefore, B2Cl = ,8lBb l
+ ,82Bb2 .
The second inclusion in (2) is self-evident, since Bbl, Bb 2 E NB and {aI, a2} is a basis for N B . Next, to verify (3), suppose that ,8IBb l
+ ,82Bb2 =
0
for some choice of constants ,81,,82 E C. Then the subsequent formula B(,8lb1 + ,82b2)
implies that ,8lb 1 + ,82b2 E
=0
NB and hence that
,8lb1
+ ,82b 2 =
alaI
+ a2a 2
for some choice of constants aI, a2 E C. However, since the four vectors in the last line are linearly independent, this means that al = a2 = ,81 = fh = O. Therefore, (3) follows. If B 2c1 and Bb l are linearly dependent, then ')'IBcI + ,81bl E NB for some choice of ')'1,,81 E C which are not both equal to zero. In fact, since BCI ¢ NB and b l ¢ N B, both of these constants are different from zero. Similarly, if B2cI and Bb2 are linearly dependent, then ')'2BcI + ,82b2 E NB for some choice of constants ')'2, Ih E C which both differ from zero. Therefore, ')'2(')'l Bc l
+ ,8lbl) -
')'1 (')'2 Bcl
+ ,82b 2) =
')'2,8lbl - ')'1,82 b 2
also belongs to N B , contrary to assumption, unless ')'2,81 This justifies (4).
= 0 and ')'1,82
=
o.
Suppose next that the vectors B2cl and Bbl are linearly independent and that there exist constants such that ')'1 Cl
+ ')'2 Bcl + ')'3 B2C l + ,81 b l + ,82Bbl = O.
Then, upon multiplying both sides on the left by B2, it is readily seen that = O. Next, upon multiplying both sides on the left by B, it follows that
1'1
')'2B2c1
+ ,8lBbl =
0,
which, in view of the conditions imposed in (5), implies that 1'2 = ,81 = O. Thus, the original linear combination of·5 vectors reduces to ')'3B2c1
which forces ')'3 = ,82 is similar.
+ ,82Bbl =
0,
= O. This completes the proof of (5); the proof of (6)
6. Calculating Jordan forms
122
In case (5), the set of vectors {B 2 cI. BCI,CI. BhI. hI} is a basis for N B 3. Moreover, since B[B 2cI
BCI
CI
Bhl
hI]
=
[B3cI
=
[0
=
[B2cI
B2cI
B2cI
BCI
B 2h l
0
Bh l ]
BCI
BCI
ci
Bhl
Bh l ]
bl]N,
where
N = diag {C(3) o ' C(2)} 0
,
it is now readily seen that the vectors UI
= B2CI,
U2
= BCI,
U3
= cI,
= Bhl and
U4
Us
=
hI
are linearly independent and satisfy the equation
(6.6)
A[UI ... Us] = [UI ... us]
Al 0 0 0 0
1 Al 0 0 0
0 1 Al 0 0
0 0 0 Al 0
0 0 0 1 Al
Similar conclusions prevail for case (6), but with h2 in place of hI.
6.7. Another example In this section we shall present a second example to help clarify the general algorithm that was introduced in Section 6.5. To this end, assume for the sake of definiteness that dim NB dim NBj
6, dim
=
15
for
NB2 =
j
10, dim N B3 = 13 and
= 4, ...
,n.
These numbers must meet the constraints imposed by (1) of Lemma 6.3 and the inequalities (6.2), but are otherwise completely arbitrary. The eigenvectors and generalized eigenvectors corresponding to each Jordan block may be constructed as follows: 1.
Construct a basis for
NBn according to the following scheme:
al, ... ,86 is a basis for N B , aI, ... ,86; hI. ... ,h4 is a basis for N B 2, aI, ... ,86; hI. ... ,h4; CI, C2, C3 is a basis for N B 3, aI, ... ,86; hI. ... ,h4; CI, C2, C3; dI. d2 is a basis for N B 4.
6.7. Another example
123
2. Construct chains of powers of B applied to each vector in the basis and display them in columns of nonzero vectors labeled 1-15:
1 B 3d l B2dl Bd l dl
3. to
2 3 B 3d 2 B2cl B 2d 2 BCl Bd2 Cl d2
4 5 B 2c2 B2c3 BC2 BC3 C2 C3
6
Bbl bl
7 8 9 10 Bb2 Bb3 Bb4 al b4 b2 b3
15 il6
Observe that the vectors in the first row of the preceding array belong NB and that
span {B 2cI, B2c2, B2c3} is a3-dimensional subspace of span {Bb l , Bb 2, Bb3, Bb4 }. span {Bb l , Bb2, Bb3, Bb4} is a 4-dimensional subspace of span {aI, ... , il6}. Thus, for example, since Bdl E NB 3, it follows that 643 Bd l = Lajaj + L,8jb; + L';C; j=l
;=1
;=1
for some choice of the 13 coefficients all ... , a6, ,81, .. · , ,84, ')'1, ,2, ')'3 and hence that 3
B 3d l
=
L ')'jB c; . 2
j=l
Moreover, if, say, + = 0, then ad 1 + ,8d2 E N B 3 and consequently, adl + ,8d2 can be expressed as a linear combination of the vectors
aB 3 d l
,8B3 d 2
{al, ... ,a6,bl ·.·,b4,Cl··· ,C3}. However since all these vectors are linearly independent, this forces a O. The remaining assertions may be verified in much the same way. 4.
Build a basis for
= ,8 =
NB by moving from left to right in the ordering
span {B 3d}, B 3d 2} C C
span {B2cl, B2c2, B2c3} span {Bb l , Bb2, Bb3, Bb4} ~ span{al, ... ,i16},
by adding vectors that increase the dimension of the space spanned by those selected earlier, starting with the set {B 3 d}, B 3 d 2 }. Thus, in the present setting, this is done as follows:
6. Calculating Jordan forms
124
(i) Choose a vector Cj such that B 2 cj is linearly independent of {B3d l , B3d2}. There exists at least one such, say C2. Then span {B 3d I, B3d2, B2c2} is a three-dimensional subspace of the three-dimensional space span {B2cI, B2c2, B2c3}' Thus, these two spaces are equal and one moves to the next set of vectors on the right.
(ii) Choose a vector b i such that Bbi is linearly independent of the vectors {B3dl,B3d2,B2c2}' There exists at least one such, say b l . Then span {B3dl, B3d2, B 2c2' Bbd is a four-dimensional subspace of the four-dimensional space span {Bbl' Bb2, Bb3, Bb4}. Thus, these two spaces are equal. (iii) Choose a vector 8i that is linearly independent of {B 3d l , B3d2, B2c2, BbI}, say Ita. Then span{B3dl,B3d2,B2c2,Bbl,a3} is a five-dimensional subspace of the six-dimensional space span {a I, . . . , 116}. Therefore another selection should be made from this set.
(iv) Choose a vector aj that is linearly independent of {B3dl, B3d2, B2c2, Bbl,Ita}, say as. Then span {B3dl, B3d2, B 2c2' Bb l , a3, a5} is a sixdimensional subspace of the six-dimensional space span {aI, . .. , 116}. Therefore the two spaces are equal and the selection procedure is complete.
5. The 15 vectors in the columns corresponding to {B3dl, B 3d 2, B2c2, Bbb Ita, as}, i.e., columns 1, 2, 4, 6, 12, 14, are linearly independent. Since dim NBft = 15, these 15 linearly independent vectors in NBn form a basis for that space. Moreover, if B = A - Alln, then each of the specified columns generates a Jordan cell of height equal to the height of the column. Consider, for example, the cell corresponding to the first column. The four vectors in that column are stacked in order of decreasing powers of B to form an n x 4 matrix:
ci'!
B [B3d l
B2dl Bd l
dl ]
=
[0 B 3d l [B3dl
B 2d l
B 2dl
Bd l ]
Bdl
Thus, upon writing Ul = B 3d l , U2 = B 2d b U3 = Bd l , U4 B = A - Adn, the last formula can be rewritten as
dl
]
C64) .
= dl and setting
125
6.7. Another example
or, equivalently, as
Continuing in this fashion, set U5 = B 3 d 2, U6 = B 2d 2, U7 = Bd 2, Us = d2, Ug = B2c2 , UlO = BC2, Un = C2, Ul2 = Bb l , Ul3 = b l , Ul4 = a3, Ul5 = a2 and It is readily seen that AU = U B Al
B Al = diag {ci~) , ci~) , ci~) , ci~) , ct) , ci~)} .
where
Exercise 6.9. Find a Jordan form J and an invertible matrix U such that 2 0 0 0 0
0 2 2 0 0
0 0 2 0 0
0 0 0 2 0
2 0 0 0 2
= UJU- l
.
Exercise 6.10. Find a Jordan form J and an invertible matrix U such that
[~
2 1 0 0
0 2 1 0
~]
= UJU- l
.
Exercise 6.11. Find a Jordan form J and an invertible matrix U such that
[~ first for x
2 1 0 0
0 0 1 2
~]
= UJU- l
,
= 0 and then for x = 1.
Exercise 6.12. Find a Jordan form J and an invertible matrix U such that
A=
2 0 0 0 0 0 3 0 0 1 0 0 3 0 0 0 0 1 3 0 0 0 0 0 3 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 3 0
1 0 0 0 0 0 2
= UJU- l
.
6. Calculating Jordan forms
126
Exercise 6.13. Find a Jordan form J and an invertible matrix U such that
A=
1 1 13
1 2
-8 -11 8
-5
-22
17
-12 -4 5
0 1 6
0 0 0 1
0 0 0 1
= UJU- l
.
[HINT: The first step is to compute det ()..Is - A). This is easier than it looks at first glance if you take advantage of the block triangular structure of A.]
6.8. Jordan decompositions for real matrices The preceding analysis guarantees the existence of a Jordan decomposition A = U JU- l with J, U E e nxn for every A E e nxn and hence also for every A E lR nxn. However, even if A E lR n x n, J and U may have entries that are not real. Our next objective is to deduce analogous decompositions for A E lR nxn, but with both J and U in lR nxn. It suffices to focus on the Jordan chains that correspond to each of the Jordan cells that appear in J. Thus, if eik) is one of the diagonal blocks in J, the preceding analysis guarantees the existence of a set of linearly independent vectors Ul ... ,Uk E en such that (6.7) is in force or, equivalently, upon setting B = A - Aln , such that (6.8)
[UI
Uk] = [Bk-1Uk
Bk-2Uk
Uk]
and
Bkuk =
O.
Exercise 6.14. Show that the two conditions (6.7) and (6.8) are equivalent. There are two cases to consider for)" E u(A): A E lR and)" ¢ R Case 1: A E lR nxn and)" E u(A) nlR.
Lemma 6.S. Let A
lR nxn, let).. E u(A) nlR and let Ul ... ,Uk be a Jordan chain in en corresponding to a Jordan cell ef). Then there exists a Jordan chain VI .•. ,Vk in 1R n corresponding to eik).
Proof.
E
Let B = A - )"In , let Ul ... ,Uk be a Jordan chain in en corresponding to elk) and let Uj = Xj + iYj, where Xj and Yj denote the real and imaginary parts of the vector Uj, respectively, for j = 1, ... ,k. Then, by assumption, the given set of vectors satisfy the constraint (6.7) and are
127
6.8. Jordan decompositions for real matrices
linearly independent over C. Moreover, in view of the equivalence between (6.7) and (6.8), this means that Uk]
[Ul
=
[Bk-luk
Bk-2Uk
Uk]
and
Bkuk
=0
or, equivalently, that
= Yk] = Xk]
[Bk-lxk
B k- 2x k
Xk],
[Bk-lYk
B k- 2Yk
Yk]
Bkxk = 0,
and
BkYk =
o.
It remains to check that at least one of the two sets of vectors
,xd,
{Bk-lxk,Bk-2xk, ...
{B k - l Yk,B k - 2Yk,··· ,Yk}
is linearly independent over lR. In view of Lemma 6.2, it suffices to show that at least one of the two conditions Bk-lxk =I- 0 and Bk-lYk =I- 0 is in 0 force. But this is clearly the case, since Bk-l uk =I- O. Case 2: A E 1R nxn and A E u(A) n (C \ R). If A E 1R nxn, then the characteristic polynomial p(A) = det (A/n - A) has real coefficients. Therefore, the nonreal roots of p(A) come in conjugate pairs. Thus, for example, if
A
[Ul
U2
U3]
=
[Ul
U2
U3]
[~l
~]
;1
o
Al
0
then, taking the complex conjugate of both sides,
A
[Ul
U2
113
1=
[Ul
U,
113
[1 ~ I.]
1
and, since Al =I- AI, span{ ul, U2, U3} n span{ ul, U2, U3} = {O} . Thus, the rank of the n x 6 matrix [Ul U2 U3 Ul U2 U3] is equal to 6. Therefore, the same holds true for the n x 6 real matrix 1 [Xl
Yl
X2
Y2
X3
Y3] =
2 [Ul
U2
U3
since the matrix
Q=
1 -z 0 0 0 0 0 0 1 -i 0 0 0 0 0 0 1 -i 1 i 0 0 0 0 0 0 1 i 0 0 0 0 0 0 1 i
Ul
U2
U3]
Q,
128
6. Calculating Jordan forms
is invertible. Moreover, upon writing Al in polar coordinates as Al r cos 0 + i r sin 0, it is readily checked that
A
[Xl
YI
X2
Y2
Y3] = [Xl
X3
Yl
X2
Y2
X3
= re i8 =
Y3]
A,
where
A=
r cos 0 r sinO -rsinO rcosO 0 0 0 0 0
1 0 0 0 0 1 0 0 r cos 0 r sin 0 1 0 -r sin 0 r cos 0 1 0 0 r cos 0 r sin 0 0 -r sin 0 r cos 0 0 0
0 0 0
Analogous decompositions hold for other Jordan blocks.
Exercise 6.15. Let A E R nxn and suppose that n 2 2. Show that: (1) There exists a one-dimensional subspace U of under A.
en
that is invariant
(2) There exists a subspace V of R n of dimension less than or equal to two that is invariant under A.
6.9. Companion and generalized Vandermonde matrices Lemma 6.9. Let f(A) = fo
+ hA + ... + fnAn,
fn
# 0,
be a polynomial of degree n, let 0 0
1
0
0
0
0
0
where
Sf= 0
0
0
-ao
-al
-an-2
1 -an-l
denote the companion matrix based on f(A) and let
V(A) = [
~
An- l
I
and f(A) = [
~ f(A)
Then
(6.9)
1 Sf V(A) = A V(A) - fn f(A)
I·
aj
=
fjl fn,
6.9. Companion and generalized Vandermonde matrices
129
and (6.10)
for j = 1, . . . ,n - 1.
Proof.
By direct computation
which coincides with (6.9). The formulas in (6.10) are obtained by differentiating both sides of (6.9) j times with respect to A. 0 Corollary 6.10. In the setting of Lemma 6.9, assume that the polynomial f(A) admits a factorization of the form
with k distinct roots AI,' .. ,Ak, and let (6.11) for j
v- = J
[V(Aj) O!
= 1, . . . ,k. Then
(6.12)
Exercise 6.16. Verify formula (6.12) when mj
A matrix of the form V = [VI a generalized Vandermonde matrix.
= 4.
Vk], with Vj as in (6.11) is called
Corollary 6.11. The vectors in a generalized Vandermonde matrix are linearly independent. Exercise 6.17. Verify Corollary 6.1l. Example 6.12. If
6. Calculating Jordan forms
130
then 0 0 0 0
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
-10 -!1 -12 -fa -14 1 a a2 a3 a4
0 1 2a 3a2 4a3
0 0 1 3a 6a 2
1 (3 (32 (33 (34
1 a a2 a3 a4
0 1 2(3 3(32 4(33
0 1 2a 3a2 4a3
a 1 0 a 0 0 0 0 0 0
0 0 1 3a 6a 2 0 1 a 0 0
0 0 0 (3 0
1 (3 (32 (33 (34
0 1 2(3 3(32 4(33
0 0 0 1 (3
Exercise 6.1S. Verify that the matrix identity in Example 6.12 is correct. Theorem 6.13. Let f(A) be a polynomial of degree n that admits a factorization of the form
with k distinct roots A},'" ,Ak' Then the companion matrix Sf is similar to a Jordan matrix J with one Jordan cell for each root: Sf = VJV- 1 ,
where V is a generalized Vandermonde matrix and C(Cl
Proof. This is an easy consequence of Corollary 6.10. o This circle of ideas can also be run in the other direction, as indicated by the following two exercises. Exercise 6.19. Let A E C nxn have k distinct eigenvalues AI, ... ,Ak with geometric multiplicities ")'1, ... ,'k and algebraic multiplicities al, ... ,ak, respectively. Show that if Ii = 1 for j = 1, ... ,k, then A is similar to a companion matrix Sf based on a polynomial f(A) and find f(A). Exercise 6.20. Show that if A E J
c nxn is similar to the Jordan matrix
= diag {ci~) ,ci~) ,ci!) ,ci~) ,ci!)} ,
then A is also similar to the block diagonal matrix diag {S91 , S92} based on a pair of polynomials gl(A) and g2(A) and find the polynomials. Exercise 6.21. Find a Jordan form for Jl.(In - Jl.Cdn))-1 when Jl.
i= o.
6.9. Companion and generalized Vandermonde matrices
Exercise 6.22. Find a Jordan form J for the matrix A =
131
2 2 o2 0 0 o3
o
100 300 2 0 0 120
1 1 1 2
Exercise 6.23. Find an invertible matrix U E C 5x5 such that A = U JU- 1 for the matrix A with Jordan form J that was considered in Exercise 6.22. Exercise 6.24. Let B E c nxn ; let U1 E N B; V1, V2 E N B2; W1, W2 E N B3j and assume that the 5 vectors B 2wI, B 2w2' BV1, BV2, U1 are linearly independent over C. Show that the 11 vectors B 2wI, BW1, WI, B 2w2' BW2, W2, BV1, V1, BV2, V2, U1 are also linearly independent over C. Exercise 6.25. [Fi~d aD i~ve]rtible matrix U such that U- 1AU is in Jordan
form when A
=
0 2 0 -i 0 1
. [NOTE: i
= A.]
Exercise 6.26. Find a Jordan form J 0 1 0 0 8 -12 A= -1 1 -4 1
for the matrix 0 0 0 1 0 0 6 0 0 0 0 1 0 -4 4
x3 -
6x 2 + 12x - 8
[HINT: You may find the formula
~
= (x - 2)3 useful.]
Exercise 6.27. Find an invertible matrix U such that AU = U J for the matrices A and J considered in Exercise 6.26.
The next three exercises are adapted from [58]. Exercise 6.28. Show that if n ;::: 2, then the matrix (C~n))2 is similar to the matrix C(~) if and only if J1, i- O. I-'
Exercise 6.29. Let B E Cpxp be a triangular matrix with diagonal entries bii = A i- 0 for i = 1, ... ,p and let V E CpxP. Show that B 2V = VB2-<===} BV = VB. [HINT: Separate even and odd powers of B in the binomial expansion of (B - Alp)P = 0 to obtain a pair of invertible matrices P1 = aolp + a2B2 + ... and P2 = bolp + b2B2 + ... such that P1 = BP2.] Exercise 6.30. Let A, B E C nxn and suppose that the eigenvalues of A and B are nonnegative and that NA = NA2 and NB = N B2. Show that A2 = B2 {::::::} A = B. [HINT: Invoke Exercise 6.28 to show that the Jordan decompositions A = U1J 1U1 1 and B = U2J2U;;1 may be chosen so that J1 = h and hence that JrV = V Jr for V = U1 1U2. Then apply Exercise 6.29, block by block, to finish.]
132
6. Calculating Jordan forms
Exercise 6.31. Let A E det (Aln - A). Show that
(6.13) A
UJ
=
[An
c nxn
be a companion matrix and let p(A)
=
tJ LJ [nAn_1t,J A
[(n _
and differentiate once again with respect to A to obtain the next term in the indicated sequence of formulas. Exercise 6.32. Find an invertible matrix U and a matrix J in Jordan form such that A = U JU- 1 if A E C 6X6 is a companion matrix, det (AI6 - A) = (A - AI)4(A - A2)2 and Al =1= A2' [HINT: Exploit the sequence of formulas indicated in (6.13), first with A = Al and then with A = A2'] Exercise 6.33. Let A E c nxn with k distinct eigenvalues Al, ... ,Ak' Show that if the geometric multiplicity Ij of Aj is equal to one for j = 1,. " ,k, then A is similar to a companion matrix.
Chapter 7
N ormed linear spaces
I give you now Professor Twist, A conscientious scientist, Camped on a tropic riverside, One day he missed his loving bride. She had, the guide informed him later, Been eaten by an alligator. Professor Twist could not but smile. "You mean," he said, "a crocodile." The Purist, by Ogden Nash In this chapter we shall consider a number of different ways of assigning a number to each vector in a vector space U over C that gives some indication of its size. Ultimately, our main interest will be in the vector spaces C nand R n. But at this stage of the game it is useful to develop the material in a more general framework, because the extra effort is small and the dividends are significant. We shall also show that if a matrix B E C nxn is sufficiently close to an invertible matrix A E C nxn, then B is invertible too. In other words, the invertibility of a square matrix is preserved under small perturbations of its entries. This suggests the following question: Is the rank of a matrix A E C pxq preserved under small perturbations of its entries? The answer, as we shall see later in Chapter 17, is yes if the rank of A is equal to either p or q, but not otherwise.
7.1. Four inequalities Throughout this subsection s are connected by the formula (7.1)
> 1 and t > 1 will be two fixed numbers that 1
1
-+-=1. s t
-
133
134
7. Normed linear spaces
y
y
b", lliiim o
a
x
a
Figure 1
x
Figure 2
It is readily checked that (7.2) ·11
:; + t = 1 {::::::} (8 -
1){t - 1)
= 1 ~ {s -
l)t = s
~
(t - 1)8 = t .
Lemma 7.1. Let a > 0, b> 0, s> 1, t> 1 and (s - l)(t - 1) = 1. Then
(7.3)
as
ab $ -; +
bt
t'
with equality if and only if as = bt .
Proof. The inequality will be obtained by comparing the areas of a rectangle R with horizontal sides of length a and vertical sides of length b with the area of the shaded regions that are formed between the x-axis and the curve y = x s - 1 , 0 $ x $ a, and the y axis and the same curve, now written as x = yt-l, for 0 $ y $ b, as sketched in the two figures. The first figure corresponds to the case as-I> b; the second to the case s 1 a - < b, and (7.2) guarantees that y
= x s- 1 ~ X = yt-l
.
The rest is straightforward. It is clear from the figures that in each setting the area of the rectangle is less than or equal to the sum of the area of the vertically shaded piece and the area of the horizontally shaded piece: ab <
foa xs-1dx + fob yt-1dy yt IY=b -X Ix=a +S
s x=O
as
bt
-+-. s t
t y=O
135
7.1. Four inequalities
The figures also make it clear that equality will prevail in formula (7.3) if and only if a s - 1 = b or, equivalently, if and only if a(s-l)t = bt . But this is the same as the stated condition, since as = a(s-l)t. 0
Lemma 7.2. (HOlder's inequality) Let s > 1, t > 1 and (s-l)(t-l) = 1. Then (7.4)
n
{
n
~ {; lakl s
{; lakbkl
} 1/S
{
n
{; Ibkl t
} 1ft
Moreover, equality will prevail in (7.4) if and only if the vectors u with components Uj = laj IS and v with components Vj = Ibj It are linearly dependent. Proof. We may assume that the right-hand side of the asserted inequality is not equal to zero, because otherwise the inequality is self-evident. [Why?] Let
Then
n
L
n
lakl s = 1 and
k=l and hence, in view of Lemma 7.1,
L l.Bklt = 1, k=l
~ lak.Bkl ~ ~ lakl s + ~ l.Bklt = ~ + ~ = 1.
L...J L...J s L...Jt k=l k=l k=l This yields the desired inequality because
~ I f.l 1= L...J akfJk
t
Ek=llakbkl
l/t .
l/s
(Ej=l laj IS)
k=l
s
(Ekj=l Ib j It)
Finally, equality will prevail in (7.4) if and only if either (1) the right-hand side is equal to zero or (2) the right-hand side is not equal to zero and
lai.Bil =
lail s + l.Bilt s
t
for i
= 1, ...
,n.
Lemma 7.1 implies that the latter condition holds if and only if
lail s = l.Bilt
for i
= 1, ... ,n,
i.e., if and only if for i
= 1, ... ,n ..
This completes the proof, since (1) and (2) are equivalent to the linear dependence of the vectors u and v. 0
7. Normed linear spaces
136
The case s = 2 is of special interest because then t = 2 and the inequality (7.4) assumes a more symmetric form and gets a special name:
Lemma 7.3. (The Cauchy-Schwarz inequality) Let a, bEen with components al, ... ,an and bI , ... , bn , respectively. Then
dim span{ a, b}
:s: 1 .
Proof.
The inequality is immediate from (7.4) by choosing s as already remarked, then forces t = 2).
Exercise 7.1. Show that if 0:, (3
=2
E IR and () E [0, 271"), then 0: cos ()
(which, D
+(3 sin () :s:
.J + (32 and that the upper bound is achieved for some choice of (). 0: 2
Lemma 7.4. (Minkowski's inequality) Let 1 :s: s < {
(7.5)
£; n
jak + bkjS
}l/S
{n
:s: {; jakjS
}l/S
00.
{n
+ {; jbkjS
Then }l/S
.
Proof.
The case s = 1 is an immediate consequence of the fact that for every pair of complex numbers a and b, ja + bj :s: jaj + jbj. On the other hand, if s > 1, then n
L jak + bkjS-ljak + bkj k=l n
<
L jak + bkjS-l(jakj + jbkj) . k=I
and
tr n
jak + bkjS-ljbkj:S:
{
n
{; jak + bkj(S-l)t
Combining the last three inequalities, we obtain
}I/t {
n
{; jbkjS
}l/S
7.1. Four inequalities
137
Now, if n
L lak + bkl
s
>0,
k=l
then we can divide both sides of the last inequality by {}:::~=llak to obtain the desired inequality (7.5). It remains to consider the case l:~=l lak inequality (7.5) is self-evident.
+ bk 1
8
+ bkIS}I/t
= 0 . BlIt then the
0
Exercise 7.2. Let al, ... ,an and bI, ... ,bn be nonnegative numbers and let 1 < S < 00. Show that
(7.6)
{t. lak + bkl' } ~ {t. lakl' } + {t. lbkl'} 1/,
1/,
1/_
if and only if the vectors a and b with components aI, ... , an and bl , ... ,bn , respectively, are linearly dependent. [HINT: See how to change the inequalities in the proof of Minkowski's inequality to equalities.] Remark 7.5. The inequality (7.3) is a special case of a more general statement that is usually referred to as Young's inequality: If a I, . .. , an and PI , . .. , Pn are positive numbers such that then (7.7)
ail al·· ·an :::; -
:1 + ... + p~ = 1,
a~n
+ ... + - .
Pn A proof is spelled out in the following three exercises. PI
Exercise 7.3. Let aI, ... , an and Cll ... ,en be positive numbers such that CI + ... + Cn = 1 and let P > 1. Show that
(7.8) [HINT: Write cjaj
= c~/q (c~/p aj) and then invoke Holder's inequality.]
Exercise 7.4. Verify Young's inequality when n = 3 by exploiting the inequality (7.3) to show that if 1
1
- = P PI
+-1 P2
and
1
1
q
P3
7. Normed linear spaces
138 P-apl/p + P-a'P2lp < (P-aPl (3) PI I P2 2 PI I
+ P2 P-a'P2) lip 2 •
(4) Verify Young's inequality for n = 3. [HINT: The inequality (7.8) is useful for (3).] Exercise 7.5. Verify Young's inequality. [HINT: Use the steps in the preceding exercise as a guide.] Exercise 7.6. Use Young's inequality to show that the geometric mean of a given set of positive numbers bl, ... ,bn is less than or equal to its arithmetic mean, i.e.,
(7.9)
7.2. N armed linear spaces A vector space U over IF is said to be a normed linear space if there exists a number
(1) (2) (3) (4)
Any such function c,o(x) is said to be a norm and is usually denoted by the symbolllxll, or by the symbolllxllu, if it is desired to clarify the space under consideration. The inequality in (4) is called the triangle inequality. Lemma 7.6. LetU be a normed linear space overlF with norm
for every choice of x and y in U. Proof. Item (4) in the property list of norms, i.e., the triangle inequality, implies that c,o(x) = c,o(x - y + y) ~
~
which is equivalent to the stated inequality.
~
o
7.2. Normed linear spaces
139
The simplest example of a norm on vectors x = ~.1=1 XjUj in a finite dimensional vector space U over IF with basis Ul, ... ,Un is
cp(x) = max {Ixjl : 1 ~ j
~
n}.
To verify the triangle inequality, note that if y = ~.1=1 YjUj, then
IXj
+ Yjl
~ IXjl
+ IYjl
~ m~ IXjl 3
+ m~ IYjl· 3
Exercise 7.7. Let U be a vector space over e with basis Ub ... ,Un. Show that for each choice of s in the interval 1 ~ S < 00 the formula
cp (tXjUj) = 3=1
{t
IXjls}IIS
3=1
also defines a norm on U. [IDNT: Minkowski's inequality (7.5) is useful.] In the special case that U = lF n and Uj = ej, the j'th column of In, the norms considered above are commonly denoted by the symbols IIxll oo and IIxlls, respectively:
(7.10)
IIxli oo = max {Ixjl : 1 ~ j ~ n}
and
IIS (7.11)
{
IIxll.= t,IX;I'
}
forl~s
This notation is also sometimes adopted in general normed linear spaces U, but then care must be taken, because the numbers in formulas (7.10) and (7.11) depend upon the choice of the basis. The most important norms in en are: IIxlh, IIxll2 and IIxll oo ; the choice s = 2 yields the familiar Euclidean norm: (7.12)
Exercise 7.S. Show that if s ;:::: 1 and t ;:::: 0, then (7.13)
[HINT: IfYj
IIxli s ;:::: IIxlls+t ;:::: IIxli oo
for each vector x E
= (IIxll s)-llxjl, then 0 ~ Yj
en.
~ 1 and ~.1=o yrt ~
2:.1=0 yj = 1.]
Exercise 7.9. Show that limsToo IIxll s = IIxli oo for each vector x E
en.
Exercise 7.10. Sketch the sets {x E lR,2: IIxlit ~ I} for t = 1,2 and
00.
7. Normed linear spaces
140
Exercise 7.11. Show that C pxq is a normed linear space over C with re-
spect to each of the norms (7.14)
IIAlls = { {L:f=1 L:3=1 laijlS
r ls
if
1
~
8
max{laijl:i=1, ... ,Pij=1, ...
in which aij denotes the ij entry of the
< 00 ,q} if
8=00,
matri~ A.
Exercise 7.12. Show that if A = [: :] with a > 0, then IIA211s >
when 2 <
8
IIAII~
~ 00.
Exercise 7.13. Show that the matrix A defined in Exercise 7.12 satisfies the inequality if 411A211s > IIAII~ for each choice of 8 in the interval 1 < 8 ~ 00.
A subset Q of a normed linear space U over IF is said to be convex if
x, y E Q = } tx + (1 - t)y E Q for every 0 The balls of radius r > 0 (7.15) B r {a) = {x E U: lIa - xII < r} and Br (a)
~
t
~
= {x E U: II a -
1.
xII ~ r}
are both convex sets.
Exercise 7.14. Verify the claim that the open and closed balls defined in (7.15) are both convex.
7.3. Equivalence of norms The proof of the next theorem depends upon some elementary concepts from analysis. A brief survey of the facts needed here and further on is furnished in Appendix I. However, the proof may be skipped without loss of continuity.
Theorem 7.7. On a finite dimensional vector space all norms are equivalent; i.e., if c,o(x) and 1jJ(x) are norms on a finite dimensional vector space U over IF, then there exists a pair of positive constants II and 12 such that 11cp(X) ~ 1jJ(x) ~ 12c,o(X). Proof. Let Ub ... , Un be a basis for U, let x = L:j=1 XjUj and y = L:j=1 YjUj and let €i denote the i'th column of the identity matrix In. Then
l
(t(X' -y,)u;) {
n
~ ~ IXi - Yilcp(Ui) ~ ~ IXi - Yil 2
}1/2 { n
~ cp(Ui)2
}1/2
_'
7.3. Equivalence of norms
141
by the Cauchy-Schwarz inequality. But this proves that (7.16)
Icp(x) - cp(y)1 :::; ,aIISx - Syl12
and
Icp(x)l:::; ,allSxll2,
where
and S denotes the linear transformation from U onto lF n that is defined by the formula
(t
S
XjUj) =
j=1
t
xjej .
j=1
The first inequality in (7.16) guarantees that cp(x) is continuous. Moreover, by the properties of a norm, cp(x) > 0 for every vector x in the set {x : IISxll2 = I}. Now let
a
= inf {cp(x) : IISxll2 = I}.
By the definition of infimum, there exists a sequence of vectors Xl, X2, ... in U with IISXjll2 = 1 such that cp(Xj) ~ a as j i 00. Since {v E lF n : IIvll2 = I} is a closed bounded set in lF n , it is a compact subset of lF n and hence there exist a subsequence of vectors Xkl' Xk2 . .. in U and a vector y E U such that IISXkj - Syll2 ~ 0 as j i 00. But this in turn implies that cp(Xk j ) ~ cp(y) as j i 00 and hence that a = cp(y) > O. Therefore, (7.17)
a :::; cp(u) :::; (3 when
IISuII2 = 1.
Now take any vector U E U with U =1= O. Then the inequality (7.17) is applicable to u/IISu1I2 and implies that
a :::; cp(u/IISuII2) :::; (3 or, equivalently, that (7.18)
for every nonzero vector u E U. But this last inequality is clearly valid for u = 0 also. A similar pair of inequalities holds for "p(u): (7.19)
with 0 < al < {31' The statement of the theorem now follows easily by combining (7.18) and (7.19): (3a "p(u) :::; cp(u) :::; .t."p(u). I
al
o
7. Normed linear spaces
142
Remark 7.8. Even though all norms on a finite dimensional normed linear space are equivalent in the sense established above, particular choices may be most appropriate for certain applications. Thus, for example, if the entries Ui in a vector u E IR n denote deviations from a navigational path, such as a channel through shallow waters, it's important to keep lIuli oo small. If a, hEIR 2, then although lIa - hll2 is equal to the usual Euclidean distance between the points a and h, the norm lIa- hilI might give a better indication of the driving distance.
7.4. Norms of linear transformations The set of linear transformations from a finite dimensional normed linear space U over IF into a normed linear space V over IF is clearly a vector space over IF with respect to the natural rules of vector addition:
(SI
+ S2)U = SIU + S2U
and scalar multiplication:
(as)u = a(Su).
A particularly useful norm on a linear transformation S from a finite dimensional normed linear space U over IF into a normed linear space V over IF is defined by the following recipe: (7.20)
IISllu,v = max {IiSullv
: lIullu::S I}.
This norm will be referred to as the operator norm of the linear transformation. The usefulness of this number rests on the fact that it is multiplicative in the sense that is spelled out in Theorem 7.10. But the first order of business is to verify that this number defines a norm and to itemize some of its properties. Theorem 7.9. The number IISlIu,v that is defined by formula (7.20) defines a norm on the set of linear transformations S from a finite dimensional normed linear space U over IF into a normed linear space V over IF . Moreover,
(1) IISullu::S IISlIu,vllullu for every vector U E U. (2) IISlIu,v = max {IlSuliv : u E U and lIullu = I}. (3) IISlIu,v =
max {III~II~ : u E U
and
u i=
o}.
Proof. It is readily checked that the number IISlIu,v meets the first three of the four stated requirements for a norm. To verify the fourth, let SI and S2 be linear transformations from U into V and let u E U with Ilullu ::s 1. Then, by the triangle inequality,
II(SI + S2)ullv =
II Sl u + S2 u llv
::s II SI U llv + II S2u llv ::s
IISIilu,v + I S21Iu,v.
7.5. Multiplicative norms
143
Thus, as this inequality holds for every choice of u E U with lIuliu follows that Next, to verify
~
1, it
I SI + S211u,v ~ IISlllu,v + II S2I1u,v. (1), choose U E U with u :f:. 0 and let u W=
lIuliu.
Then, IISuliv
= IISwllv lIuliu
~ IISlIu,v lIuliu ,
since IIwllu = 1. This verifies (1). Moreover, the formula IISuliv
= IISwllv lIullu with Ilullu
~1
implies that IISuliv ~ max {IISyllv : y E U
and
IIYllu = I}
and hence that IISllu,v ~ max {IISyliv : y E U and IIYllu = I}. This serves to verify (2), since the opposite inequality is self-evident. The verification of (3) rests on similar arguments and is left to the reader.
o Theorem 7.10. Let U, V and W be finite dimensional inner product spaces over IF and let SI be a linear transformation from U into V and let S2 be a linear transformation from V into W. Then
(7.21) Proof.
By (1) of Theorem 7.9, II S 2S 1 U llw ~ II S 211v,w II S I U llv ~ II S211v,w IISll1u,v Ilullu,
which leads easily to (7.21). Exercise 7.15. Let A
=
0
[~ ~]
IIABlis and IIBAlis for 1 ~ any, IIABlis ~ IIAlIsliBlis.
s
and
~ 00,
B=
[~ ~].
Calculate IIAlls,
IIBlIs,
and determine for which values of s, if
7.5. Multiplicative norms The ideas developed in the previous section are now applied to matrices. We shall say that a norm II . II on C nxn is multiplicative if
(7.22)
IIABII
~ IIAIIIIBIl
for every
A,B E C nxn .
7. Normed linear spaces
144
In particular, we shall show that the norm of A as a linear transformation that sends x E endowed with the norm Ilxll s into y = Ax E endowed with the norm IIYlls that is defined by the rule .
en
en
IIAlIs,s = max{IIAxlls : x E
(7.23)
en and Ilxll s ~ I}
for some choice of s in the interval 1 ~ s ~ 00 is a multiplicative norm. In fact this definition is easily extended to nonsquare matrices A E e pxq with different norms on e q and e p by setting (7.24) IIAlIs,t = max{IIAxllt: x E
e q and IIxll s ~ I}
for
1 ~ s, t ~
00.
The norm IIAlIs,t is an operator norm; its value depends upon the choice of sand t. In this definition the numbers sand t are not linked by formula (7.1). Explicit evaluations of the number IIAlIs,t for some specific choices of sand t will be discussed in the next section. The remainder of this section is devoted to checking that formula (7.24) defines a norm on e pxq and that this norm is multiplicative in the sense that IIABII ~ IIAIIIIBII when the product of the two matrices is well defined. The first step is to explore the number IIAlIs,t.
Lemma 7.11. Let A E C pxq and let 1 ~ s, t
~ 00.
Then
(1) IIAxli t ~ IIAlls,tllxlls for every x E e q . (2) IIAlIs,t = max{llAxllt : x E e q and Ilxll s = I}.
(3) IIAlIs,t = max {1I,~iI~t : x
i=
o}.
Proof. The inequality advertised in (1) is self-evident if x = therefore, that x
i= 0 and let u = (lIxlls)-lx.
Then lIull s
o.
Suppose,
= 1 and
IIAxllt = IIAulltllxlls ~ IIAlIs,tllxlls. Thus, the proof of (1) is complete. Next, to establish (2), let
J.£s,t(A) = max{IIAxllt : x E e q and Ilxll s = 1}. Then
IIAlIs,t = -
<
e q and IIxli s ~ 1} max{IIAxllt: x E e q , IIxli s ~ 1 and x i= O} x max{IIA Ilxll s IItllXlis : x E e q , Ilxli s ~ 1 and x i= O} J.£s,t(A)max{llxlls: x E e q and Ilxll s ~ 1}, max{IIAxllt: x E
Le., IIAlIs,t ~ J.£s,t(A) .
145
7.6. Evaluating some operator norms
However, at the same time, J.Ls,t{A) ~ IIAlls,t, since the maximization in the definition of the latter is taken over a larger set of vectors. Thus, (2) is in force. Much the same argument serves to justify (3). 0 Exercise 7.16. Justify the formula for computing IIAlls,t that is given in (3) of Lemma 7.11.
Lemma 7.12. The vector space c pxq of p x q matrices A over C equipped with the norm IIAlIs,t is a normed linear space over C. Proof. It is readily checked that IIAlIs,t 2 0, with equality if and only if A = Opxq, and that IlaAlls,t = laIIIAlIs,t. Therefore, it remains only to verify that if A and B are p x q matrices, then IIA + Blls,t ~ IIAlls,t + IIBlls,t. But this is immediate from the definition of IIA + Blls,t and the inequality II(A + B)xllt ~ IIAxllt
+ IIBxllt ~
IIAlls,tllXlls
+ IIBlls,tllxlls.
o Lemma 7.13. Let A E C pxk , BE C kxq and 1 ~ r, 8, t ~
00.
Then
IIABIIr.t ~ IIBllr,sIIAlIs,t. Proof.
The proof rests on the observation that IIABxllt
<
IIAlls,tllBxlls
~
IIAlIs,tIlBllr,sllxll r
and (3) of Lemma 7.11. The most important application of the last lemma is when l'
0 = 8 = t = 2:
(7.25) Exercise 7.17. Show that if A
[~ ~]
=
E ]R2x2 and d2
= a2 + b2 + c2 ,
then
max{IIAxll~ : x E]R2 and II x l12 = 1} = [HINT: If u E ]R2 and IIul12 refer to Exercise 7.1.]
= 1,
then u T
~
+ Jd4 -
= [cosO
2
4a 2 c2
sinO], and to finish,
7.6. Evaluating some operator norms Lemma 7.14. If A E C pxq , then:
(1) IIAlll,1 = m~{I:f=llaijl}· J
(2) IIAlloo,oo = mrx {I:]=ll a ijl}. (3) IIA112,2 = 81, where sI is the maximum eigenvalue of the matrix AH A.
146
7. Normed linear spaces
(4) IIA111,oo = IIAlloo,l = ~8?C laijl· t,3
(5) IIAII2,oo
= mrx {(I:3=11~jI2)1/2}.
(6) IIA1l1,2 = m~ {(I:f=1IaijI2)1/2}. 3
Discussion.
To obtain the first formula, observe that
and hence that
(7.26) This establishes the inequality
IIAlIl.l <: fi)'"
(7.27)
{t, I"';I} .
To obtain equality, it suffices to exhibit a vector x E e q such that x i= 0 and equality prevails in formula (7.26). Suppose for the sake of definiteness that the maximum in (7.27) is achieved when j = q. Then for the vector u with u q = 1 and all other coordinates equal to zero, we obtain lIull1 = 1 and
IIAulh
IIAlh,l ~ -llul11
q =?=?= aijUj =?= laiql =mre ?= laijl ,=1 3=1 ,=1 ,=1 p
p
{
p
}
.
This completes the proof of the first formula.
Next, to obtain the second formula, observe that q q q aijXj ::; laij IIXj I ::; laij IIlxli oo j=l j=l j=l
L
L
L
and hence that
(7.28)
II Ax II 00 = mrx {
t
aijXj } ::; mrx
3=1
{t
laij I} II xII 00 ,
3=1
i.e.,
(7.29)
IIAlloo,oo ::; mrx
{t
laijl } .
3=1
en
To obtain equality in (7.29), it suffices to exhibit a vector x E such that xi- 0 and equality prevails in (7.28). Suppose for the sake of definiteness
7.7. Small perturbations
147
that the maximum in (7.29) is attained at i = 1 and that it is not equal to zero, and let u be the vector in en with entries
Then
lIuli oo =
1 and
IIAull oo ~ IIAlloo,oo ~ Ilull oo ~ f=t aljUj This completes the proof of the second assertion if A i= Opxq' However, if A = Opxq, then the asserted formula is self-evident. We shall postpone the proof of the third assertion to Lemma 10.3 and leave the remaining assertions to the reader. 0
Exercise 7.1B. Compute the maximum eigenvalue of the matrix AHA when A =
[~ ~]
E
IR 2 x 2 and show that it is equal to the maximum that was
calculated in Exercise 7.17.
7.7. Small perturbations The central idea of this section is that if A E C nxn is an invertible matrix and if B E c nxn is close to A in the sens&that IIA - BII is small enough with respect to some multiplicative norm, then B is also invertible. The main conclusion is based on the following lemma, which is important in its own right. Convergence of infinite sums and Cauchy sequences, which enter into the proof, are discussed briefly in Appendix I.
Lemma 7.15. If X E C nxn and
IIXII < 1 with respect to
some multiplica-
tive norm, then: (1) In - X is invertible. (2) (In -
(3)
X)-l = Ef=o xj; i.e.,
II (In - X)-lll ~
the sum converges.
1- tixil'
Proof. To verify (1) it suffices to show that 1 is not an eigenvalue of X. But if u = Xu, then the self-evident inequality
lIuli = IIXuli ~ IIXllliuli implies that
o ~ (1 -IIXIDllull ~ 0
148
7. Normed linear spaces
and hence that u
= 0 and
In - X is invertible. Next, let k
Sk=
LXi. i=O
Then k+j
=
IISk+j - Skll
II
Hj
L
XiII ~
i=k+1 k+j
<
L
L
IIXil1
i=k+1 00
IIXll i ~
i=k+1
L
IIXll i
i=k+1
which can be made as small as you like by choosing k large enough, since IIXI! < 1. Consequently, the sequence of matrices So, S1,'" is a Cauchy sequence in nxn and therefore must tend to a limit Soo. (That is the meaning of the infinite sum in (2).) The proof of (3) is a small variation of the proof of (2). The details are D left to the reader.
c
Exercise 7.19. Verify the bound in (3) of Lemma 7.15. Exercise 7.20. Let A E c nxn and)" E C. Show that if is invertible and 1I()"In - A)-III ~ (1)..1 - IIAID- I .
1)..1 > IIAII, then
A
Theorem 7.16. Let A, B E C nxn and suppose that A is invertible, that IIA-lil $ 'Y and that liB - All < 1/'y with respect to some multiplicative norm for some number "I > O. Then: (1) B is invertible.
(2)
IIB- l il ~
Proof.
1- "1111- All'
Let
and set
X
= A- 1 (A - B).
Then, since B = A(I-X), the desired results are immediate from Lemma 7.15 and the estimate
D
149
7.8. Another estimate
• The spectral radius To-(A) of a matrix A E formula
To-(A)
= max {IAI :
c nxn is defined by the
A E (T(A)}.
Thus, if AI, ... ,Ak denote the distinct eigenvalues of A, then To-(A) max{IAII,··· ,IAkl}·
=
Remark 7.17. Parts (1) and (2) of Lemma 7.15 hold under the assumption that $r_\sigma(X) < 1$, which is less restrictive than the constraint $\|X\| < 1$, since
$$\textrm{(7.30)}\qquad r_\sigma(X) \le \|X\|$$
for any multiplicative norm. The next exercise should help to clarify this point.
Exercise 7.21. Calculate $\|X\|$, $r_\sigma(X)$, $(I_2 - X)^{-1}$ and $\|(I_2 - X)^{-1}\|$ for the matrix $X = [\,1/2 \ \ 1/2\,]$, using the formula in Exercise 7.17 to evaluate the norms.
Exercise 7.22. Show that if $X \in \mathbb{C}^{n\times n}$ and the spectral radius $r_\sigma(X) < 1$, then $I_n - X$ is invertible and the sequence of matrices $\{S_k\}$, $k = 0, 1, \ldots$, defined in the proof of Lemma 7.15 is still a Cauchy sequence in $\mathbb{C}^{n\times n}$. However, the inequality $\|(I_n - X)^{-1}\| \le (1 - r_\sigma(X))^{-1}$ may fail. [HINT: You may find Exercise 7.21 helpful.]
Exercise 7.23. Calculate $\max\{\|(I_3 - X)^{-1}x\|_2 : x \in \mathbb{R}^3 \ \text{and}\ \|x\|_2 = 1\}$ for the matrix $X = [\,1/\sqrt{2}\ \ 1\ \ 0\,;\ 1/2\ \cdots\,]$.
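Remark 7.17 is also easy to see in a small example of one's own devising (the matrix below is an arbitrary illustration of mine, not the matrix of Exercise 7.21): the spectral radius can be far smaller than the norm, the series still converges when $r_\sigma(X) < 1$, but the bound of part (3) of Lemma 7.15 with $r_\sigma$ in place of $\|X\|$ can fail. A sketch assuming NumPy:

import numpy as np

X = np.array([[0.5, 10.0],
              [0.0,  0.5]])                # r_sigma(X) = 0.5 but ||X||_2 > 10

r_sigma = max(abs(np.linalg.eigvals(X)))
print(r_sigma, np.linalg.norm(X, 2))       # the bound (7.30) is far from tight here

# The Neumann series still converges because r_sigma(X) < 1 (Exercise 7.22) ...
S, P = np.zeros((2, 2)), np.eye(2)
for _ in range(200):
    S += P
    P = P @ X
print(np.allclose(S, np.linalg.inv(np.eye(2) - X)))    # True

# ... but ||(I - X)^{-1}|| <= (1 - r_sigma(X))^{-1} fails for this X:
print(np.linalg.norm(np.linalg.inv(np.eye(2) - X), 2), 1 / (1 - r_sigma))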
7.8. Another estimate
The information in the next lemma will be useful in Chapter 17.
Lemma 7.18. Let $A, B \in \mathbb{C}^{n\times n}$ and $\lambda \in \mathbb{C}$, and suppose that $\lambda I_n - A$ is invertible, that $\|(\lambda I_n - A)^{-1}\| \le \gamma$ and that $\|B - A\| < 1/\gamma$ with respect to some multiplicative norm for some number $\gamma > 0$. Then:
(1) $\lambda I_n - B$ is invertible.
(2) $\|(\lambda I_n - B)^{-1}\| \le \dfrac{\gamma}{1 - \gamma\|B - A\|}\,.$
(3) $\|(\lambda I_n - A)^{-1} - (\lambda I_n - B)^{-1}\| \le \dfrac{\gamma^2\|B - A\|}{1 - \gamma\|B - A\|}\,.$
Proof. Clearly,
$$\lambda I_n - B = \lambda I_n - A - (B - A) = (\lambda I_n - A)\{I_n - (\lambda I_n - A)^{-1}(B - A)\} = (\lambda I_n - A)(I_n - X)\,,$$
with $X = (\lambda I_n - A)^{-1}(B - A)$ for short. Moreover,
$$\|X\| = \|(\lambda I_n - A)^{-1}(B - A)\| \le \|(\lambda I_n - A)^{-1}\|\,\|B - A\| \le \gamma\|B - A\| < 1\,,$$
by assumption. Therefore, by Lemma 7.15, the matrix $I_n - X$ is invertible, and hence so is $\lambda I_n - B$, which justifies (1). Thus,
$$\|(\lambda I_n - B)^{-1}\| = \|(I_n - X)^{-1}(\lambda I_n - A)^{-1}\| \le \|(I_n - X)^{-1}\|\,\|(\lambda I_n - A)^{-1}\| \le \gamma(1 - \|X\|)^{-1} \le \gamma(1 - \gamma\|B - A\|)^{-1}\,,$$
which justifies (2).
Finally, the bound furnished in (3) is an easy consequence of the formula
$$(\lambda I_n - A)^{-1} - (\lambda I_n - B)^{-1} = (\lambda I_n - A)^{-1}(A - B)(\lambda I_n - B)^{-1}$$
and the preceding estimates. $\Box$
7.9. Bounded linear functionals
A function $f(x)$ from a normed linear space $X$ over $\mathbb{F}$ into $\mathbb{F}$ is said to be a bounded linear functional if
(1) $f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$ for every choice of $x, y \in X$ and $\alpha, \beta \in \mathbb{F}$.
(2) There exists a constant $c_f$ such that $|f(x)| \le c_f\|x\|$ for every $x \in X$.
The least constant $c_f$ for which (2) holds is termed the norm of $f$ and is designated by the symbol $\|f\|$. Thus,
$$\|f\| = \sup\left\{\frac{|f(x)|}{\|x\|} : x \in X \ \text{and}\ x \ne 0\right\} = \sup\{|f(x)| : x \in X \ \text{and}\ \|x\| = 1\}\,.$$
Theorem 7.19. Let $f$ be a bounded linear functional from a normed linear space $X$ over $\mathbb{F}$ into $\mathbb{F}$, let
$$N_f = \{x \in X : f(x) = 0\}$$
and suppose that $N_f \ne X$. Then there exists a vector $x_0 \in X$ such that $X = N_f + \{\alpha x_0 : \alpha \in \mathbb{F}\}$.
Proof. Choose $x_0 \in X$ such that $f(x_0) \ne 0$. Then the formula
$$x = \left(x - \frac{f(x)}{f(x_0)}\,x_0\right) + \left(\frac{f(x)}{f(x_0)}\,x_0\right)$$
clearly displays the fact that $N_f + \operatorname{span}\{x_0\} = X$. It is also easy to see that this sum is direct, because if $y = \alpha x_0$ belongs to $N_f$, then $f(y) = \alpha f(x_0) = 0$, which forces $\alpha = 0$ and hence $y = 0$. $\Box$
Exercise 7.24. Let $f(x)$ be a linear functional on $\mathbb{C}^n$ and let $f(e_j) = a_j$ for $j = 1, \ldots, n$, where $e_j$ denotes the $j$'th column of $I_n$. Show that if $s > 1$ and $t = s/(s-1)$, then
$$\max\{|f(x)| : \|x\|_s \le 1\} = \left(\sum_{j=1}^n |a_j|^t\right)^{1/t}.$$
[HINT: It's easy to show that $|f(x)| \le (\sum_{j=1}^n |a_j|^t)^{1/t}\|x\|_s$. The particular vector $x$ with coordinates $x_j = \overline{a_j}\,|a_j|^{t-2}$ when $a_j \ne 0$ is useful for showing that the maximum is attained.]
Exercise 7.25. Let $f(x)$ be a linear functional on $\mathbb{C}^n$ and let $f(e_j) = a_j$ for $j = 1, \ldots, n$, where $e_j$ denotes the $j$'th column of $I_n$. Show that
$$\max\{|f(x)| : \|x\|_\infty \le 1\} = \sum_{j=1}^n |a_j|\,.$$
[HINT: See the hint in Exercise 7.24.]
Exercise 7.26. Let $f(x)$ be a linear functional on $\mathbb{C}^n$ and let $f(e_j) = a_j$ for $j = 1, \ldots, n$, where $e_j$ denotes the $j$'th column of $I_n$. Show that $\max\{|f(x)| : \|x\|_1 \le 1\} = \max\{|a_j| : j = 1, \ldots, n\}$. [HINT: See the hint in Exercise 7.24.]
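The three maxima in Exercises 7.24-7.26 can be checked numerically with the extremal vectors suggested in the hints. The following sketch is an illustration only (it assumes NumPy, uses arbitrary coefficients $a_j$, and for Exercise 7.24 shows only the case $s = t = 2$):

import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(5) + 1j * rng.standard_normal(5)    # f(x) = sum_j a_j x_j
f = lambda x: a @ x

# Exercise 7.25: over ||x||_oo <= 1 the maximum is sum |a_j|, attained at x_j = conj(a_j)/|a_j|
x_inf = np.conj(a) / np.abs(a)
print(abs(f(x_inf)), np.abs(a).sum())

# Exercise 7.26: over ||x||_1 <= 1 the maximum is max |a_j|, attained at one unimodular coordinate
j = np.abs(a).argmax()
x_1 = np.zeros(5, dtype=complex)
x_1[j] = np.conj(a[j]) / abs(a[j])
print(abs(f(x_1)), np.abs(a).max())

# Exercise 7.24 with s = t = 2: the maximum is the Euclidean norm of a
x_2 = np.conj(a) / np.linalg.norm(a)
print(abs(f(x_2)), np.linalg.norm(a))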
7.10. Extensions of bounded linear functionals In this section we consider the problem of extending a bounded linear functional f that is specified on a proper subspace U of a normed linear space X over IF to the full space X in such a way that the norm of the extension F is the same as the norm of f, i.e., sup {1F(x)1 : x E X and IIxll S 1} = sup {If(u)1 : U E U and lIuli S 1}. The fact that this is possible lies at the heart of the Hahn-Banach theorem. It turns out to be more useful to phrase the extension problem in a slightly more general form that requires a new definition: A real-valued function p(x) on a vector space X over IF is said to be a seminorm if for every choice of x, y E X and a E IF the following two conditions are met:
(1) $p(x + y) \le p(x) + p(y)$.  (2) $p(\alpha x) = |\alpha|\,p(x)$.
Exercise 7.27. Show that if $p(x)$ is a seminorm on a vector space $X$ over $\mathbb{F}$, then $p$ also automatically satisfies the following additional three conditions:
(3) $p(0) = 0$.  (4) $p(x) \ge 0$ for every $x \in X$.  (5) $p(x - y) \ge |p(x) - p(y)|$.
Theorem 7.20. Let $p$ be a seminorm on a finite dimensional vector space $X$ over $\mathbb{F}$, and let $f$ be a linear functional on a proper subspace $U$ of $X$ such that $f(u) \in \mathbb{F}$ and $|f(u)| \le p(u)$ for every $u \in U$. Then there exists a linear functional $F$ on the full space $X$ such that
(1) $F(u) \in \mathbb{F}$ and $F(u) = f(u)$ for every $u \in U$.
(2) $|F(x)| \le p(x)$ for every $x \in X$.
Proof.
Suppose first that $\mathbb{F} = \mathbb{C}$ and let
$$g(u) = \frac{f(u) + \overline{f(u)}}{2} \quad\text{and}\quad h(u) = \frac{f(u) - \overline{f(u)}}{2i}$$
denote the real and imaginary parts of $f(u)$, respectively. Then
$$f(u) = g(u) + ih(u)\,,$$
and it is readily checked that
$$g(\alpha u_1 + \beta u_2) = \alpha g(u_1) + \beta g(u_2) \quad\text{and}\quad h(\alpha u_1 + \beta u_2) = \alpha h(u_1) + \beta h(u_2)$$
for every choice of $u_1, u_2 \in U$ and $\alpha, \beta \in \mathbb{R}$. Moreover, $g(u) \le |f(u)| \le p(u)$ for every $u \in U$.
Let $v_1 \in X \setminus U$ and suppose that there exists a real-valued function $G$ on $U + \{\alpha v_1 : \alpha \in \mathbb{R}\}$ such that
$$G(u + \alpha v_1) = g(u) + \alpha G(v_1) \quad\text{and}\quad G(u + \alpha v_1) \le p(u + \alpha v_1)$$
for every choice of $u \in U$ and $\alpha \in \mathbb{R}$. Then $g(u) + \alpha G(v_1) \le p(u + \alpha v_1)$.
Thus, if $\alpha > 0$, then
$$g(u) + \alpha G(v_1) \le p\left(\alpha\left\{\frac{u}{\alpha} + v_1\right\}\right) = \alpha\, p\left(\frac{u}{\alpha} + v_1\right)$$
for every $u \in U$; i.e.,
$$\textrm{(7.31)}\qquad G(v_1) \le p(y + v_1) - g(y) \quad\text{for every } y \in U\,.$$
On the other hand, if $\alpha < 0$, then
$$g(u) + \alpha G(v_1) = g(u) - |\alpha|\,G(v_1) = G(u - |\alpha| v_1) \le p(u - |\alpha| v_1) = |\alpha|\, p\left(\frac{u}{|\alpha|} - v_1\right),$$
and hence
$$\textrm{(7.32)}\qquad G(v_1) \ge g(x) - p(x - v_1) \quad\text{for every } x \in U\,.$$
Thus, in order for such an extension to exist, the number $G(v_1)$ must meet the two sets of inequalities (7.31) and (7.32). Fortunately, this is possible: The inequality
$$g(x) + g(y) = g(x + y) \le p(x + y) = p(x - v_1 + y + v_1) \le p(x - v_1) + p(y + v_1)$$
implies that $g(x) - p(x - v_1) \le p(y + v_1) - g(y)$
and hence that
$$\textrm{(7.33)}\qquad \sup\{g(x) - p(x - v_1) : x \in U\} \ \le\ \inf\{p(y + v_1) - g(y) : y \in U\}\,,$$
so that $G(v_1)$ can indeed be chosen to meet both (7.31) and (7.32).
The next step is to extend $f(u)$. This is facilitated by the observation that
$$g(iu) + ih(iu) = f(iu) = if(u) = ig(u) - h(u)\,,$$
which implies that $h(u) = -g(iu)$ and hence that
$$f(u) = g(u) - ig(iu)\,.$$
This suggests that $F(x) = G(x) - iG(ix)$ might be a reasonable choice for the extension of $f(u)$, if it's a linear functional with respect to $\mathbb{C}$ that meets the requisite bound. It's clear that $F(x + y) = F(x) + F(y)$. However, it's not clear that $F(\alpha x) = \alpha F(x)$ for every $\alpha \in \mathbb{C}$, since $G(\alpha x) = \alpha G(x)$ only for $\alpha \in \mathbb{R}$. To verify this, let $\alpha = a + ib$ with $a, b \in \mathbb{R}$. Then
$$F((a+ib)y) = G((a+ib)y) - iG(i(a+ib)y) = aG(y) + bG(iy) - iaG(iy) + ibG(y) = (a+ib)G(y) - i(a+ib)G(iy) = (a+ib)F(y)\,.$$
Moreover, if $F(y) \ne 0$, then upon writing the complex number $F(y)$ in polar coordinates as $F(y) = e^{i\theta}|F(y)|$, it follows that
$$|F(y)| = F(e^{-i\theta}y) = G(e^{-i\theta}y) \le p(e^{-i\theta}y) = |e^{-i\theta}|\,p(y) = p(y)\,.$$
Therefore, since $F(0) = 0 = p(0)$, the inequality $|F(y)| \le p(y)$ holds for every vector $y \in U + \{\alpha v_1 : \alpha \in \mathbb{C}\}$. This completes the proof for a one dimensional extension. The procedure can be repeated until the extension is defined on the full finite dimensional space $X$ over $\mathbb{C}$.
The proof for the case $\mathbb{F} = \mathbb{R}$ is easily extracted from the preceding analysis and is left to the reader as an exercise. $\Box$
Exercise 7.28. Verify Theorem 7.20 for the case $\mathbb{F} = \mathbb{R}$. [HINT: The key is to verify (7.33) with $f = g$.]
Exercise 7.29. Show that if $X$ is a finite dimensional normed linear space over $\mathbb{R}$, then Theorem 7.20 remains valid if $p(x)$ is only assumed to be a sublinear functional on $X$; i.e., if for every choice of $x, y \in X$, $p(x)$ satisfies the constraints (1) $\infty > p(x) \ge 0$; (2) $p(x + y) \le p(x) + p(y)$; (3) $p(\alpha x) = \alpha p(x)$ for all $\alpha > 0$.
7.11. Banach spaces
A normed linear space $U$ over $\mathbb{F}$ is said to be a Banach space over $\mathbb{F}$ if every Cauchy sequence $v_1, v_2, \ldots$ in $U$ tends to a limit $v \in U$, i.e., if there exists a vector $v \in U$ such that $\lim_{n\uparrow\infty}\|v_n - v\|_U = 0$. In this section we shall show that finite dimensional normed linear spaces are automatically Banach spaces. Not all normed linear spaces are Banach spaces.
Exercise 7.30. Let $U$ be the space of continuous real-valued functions on the interval $0 \le x \le 1$ equipped with the norm
$$\|f\|_U = \int_0^1 |f(x)|\,dx\,.$$
Show that $U$ is not a Banach space.
Remark 7.21. The vector space $U$ considered in Exercise 7.30 is a Banach space with respect to the norm $\|f\|_U = \max\{|f(x)| : 0 \le x \le 1\}$. This is a consequence of the Ascoli-Arzela theorem; see e.g., [73]. Thus, norms in infinite dimensional normed linear spaces are not necessarily equivalent.
Exercise 7.31. Let $u_1, \ldots, u_\ell$ be a basis for a normed linear space $U$ over $\mathbb{F}$. Show that the functional
$$\varphi\left(\sum_{j=1}^{\ell} c_j u_j\right) = \sum_{j=1}^{\ell} |c_j|$$
defines a norm on $U$.
Theorem 7.22. Every finite dimensional normed linear space $U$ over $\mathbb{F}$ is automatically a Banach space over $\mathbb{F}$.
Proof. Let $u_1, \ldots, u_\ell$ be a basis for $U$ and let $\{v_j\}_{j=1}^\infty$ be a Cauchy sequence in $U$. Then, for every $\varepsilon > 0$ there exists a positive integer $N$ such that $\|v_{n+k} - v_n\|_U < \varepsilon$ for $n \ge N$ and $k \ge 1$. Let $\varphi(v)$ denote the norm introduced in Exercise 7.31. Then, in view of Theorem 7.7, $\alpha\|v\|_U \le \varphi(v) \le \beta\|v\|_U$ for some constants $0 < \alpha < \beta$. Thus the $i$'th coefficient $c_{in}$ of $v_n$ with respect to the basis $\{u_1, \ldots, u_\ell\}$ of $U$ is subject to the bound
$$|c_{i,n+k} - c_{in}| \ \le\ \sum_{j=1}^{\ell} |c_{j,n+k} - c_{jn}| \ =\ \varphi(v_{n+k} - v_n) \ \le\ \beta\|v_{n+k} - v_n\|_U\,.$$
But this means that $c_{in} \to d_i$ as $n \uparrow \infty$. Moreover, if
$$v = \sum_{i=1}^{\ell} d_i u_i\,,$$
then the inequalities
$$\alpha\|v - v_n\|_U \ \le\ \varphi\left(\sum_{i=1}^{\ell}(c_{in} - d_i)u_i\right) \ \le\ \sum_{i=1}^{\ell} |c_{in} - d_i|$$
clearly imply that $\|v - v_n\|_U \to 0$ as $n \uparrow \infty$. $\Box$
Chapter 8
Inner product spaces and orthogonality
A proof should be as simple as possible, but no simpler.
Paraphrase of Albert Einstein's remark on deep truths In this chapter we shall first introduce the notion of an inner product space and characterize its essential features. We then define orthogonality and study projections, orthogonal projections and related applications, including methods of orthogonalization and Gaussian quadrature.
8.1. Inner product spaces
A vector space $U$ over $\mathbb{F}$ is said to be an inner product space if there is a number $(u, v)_U \in \mathbb{F}$ associated with every pair of vectors $u, v \in U$ such that:
(1) $(u + w, v)_U = (u, v)_U + (w, v)_U$ for every $w \in U$.
(2) $(\alpha u, v)_U = \alpha(u, v)_U$ for every $\alpha \in \mathbb{F}$.
(3) $(u, v)_U = \overline{(v, u)_U}$.
(4) $(u, u)_U \ge 0$ with equality if and only if $u = 0$.
The number $(u, v)_U$ is termed the inner product. Items (1) and (2) imply that the inner product is linear in the first entry and hence, in particular, that
$$2(0, v)_U = (2\cdot 0, v)_U = (0, v)_U\,,$$
which implies that $(0, v)_U = 0$. Item (3) then serves to guarantee that the inner product is additive in the second entry, i.e., $(u, v + w)_U = (u, v)_U + (u, w)_U$; however,
$$(u, \beta v)_U = \overline{\beta}\,(u, v)_U\,.$$
Usually we drop the subscript $U$ from the symbol $(u, v)_U$ and simply write $(u, v)$.
Exercise 8.1. Let $U$ be an inner product space over $\mathbb{F}$ and let $u \in U$. Show that
$$(u, v) = 0 \ \text{for every } v \in U \iff u = 0$$
and (consequently)
$$(u_1, v) = (u_2, v) \ \text{for every } v \in U \iff u_1 = u_2\,.$$
The symbol $(x, y)_{st}$, which is defined for $x, y \in \mathbb{F}^n$ by the formula
$$\textrm{(8.1)}\qquad (x, y)_{st} = y^H x = \sum_{i=1}^n \overline{y_i}\,x_i\,,$$
will be used on occasion to denote the standard inner product on $\mathbb{F}^n$. The conjugation in this formula can be dropped if $x, y \in \mathbb{R}^n$. It is important to bear in mind that there are many other inner products that can be imposed on $\mathbb{F}^n$:
Exercise 8.2. Show that if $B \in \mathbb{C}^{n\times n}$ is invertible, then the formula
$$\textrm{(8.2)}\qquad (x, y) = (By)^H Bx$$
defines an inner product on $\mathbb{C}^n$.
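A quick numerical check of Exercise 8.2, together with the Cauchy-Schwarz inequality (8.3) proved next, may be helpful. The sketch below is an illustration only; it assumes NumPy and that a randomly generated $B$ is invertible (which holds with probability one):

import numpy as np

rng = np.random.default_rng(3)
n = 4
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # assumed invertible
ip = lambda x, y: np.conj(B @ y) @ (B @ x)        # (x, y) = (By)^H (Bx)

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

print(np.isclose(ip(x, y), np.conj(ip(y, x))))                    # Hermitian symmetry
print(ip(x, x).real > 0)                                          # positivity for x != 0
print(abs(ip(x, y)) <= np.sqrt(ip(x, x).real * ip(y, y).real))    # Cauchy-Schwarz (8.3)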
Lemma 8.1. (The Cauchy-Schwarz inequality for inner products) Let $U$ be an inner product space over $\mathbb{C}$ with inner product $(u, v)$ for every pair of vectors $u, v \in U$. Then
$$\textrm{(8.3)}\qquad |(u, v)| \le \{(u, u)\}^{1/2}\{(v, v)\}^{1/2}\,,$$
with equality if and only if $\dim\operatorname{span}\{u, v\} \le 1$.
Proof.
There are two cases to consider:
(1) If (u, v) = 0, then the inequality is clear. Moreover, if equality holds in this case, then 0= {(u, u)}I/2 {(v, v)}I/2 , which forces at least one of the vectors u,v to be equal to the vector 0, and hence the two vectors are linearly dependent.
(2) If $(u, v) \ne 0$, then, in polar coordinates, $(u, v) = |(u, v)|e^{i\theta} = re^{i\theta}$. Set $\lambda = xe^{i\theta}$, $x \in \mathbb{R}$, $a = (u, u)$, $b = (v, v)$ and observe that
$$(u + \lambda v, u + \lambda v) = (u, u) + (u, \lambda v) + (\lambda v, u) + (\lambda v, \lambda v) = a + 2xr + x^2 b = b\left(x + \frac{r}{b}\right)^2 + a - \frac{r^2}{b} \ \ge\ 0$$
for every choice of $x \in \mathbb{R}$. (The condition $(u, v) \ne 0$ insures that $v \ne 0$ and hence permits us to divide by $b$.) Thus, upon choosing
$$x = -\frac{r}{b}\,,$$
we conclude that
$$a - \frac{r^2}{b} = (u + \lambda v, u + \lambda v) \ge 0\,,$$
which justifies the asserted inequality. Finally, if this last inequality is an equality, then $(u + \lambda v, u + \lambda v) = 0$ when $\lambda = xe^{i\theta}$ and $x$ is chosen as above. But this implies that $u + \lambda v = 0$ for this choice of $\lambda$ and hence that $u$ and $v$ are linearly dependent. $\Box$
1/2.
(c) Show that U is an inner product space with respect to the inner b product (1, g) = fa f(t)g(t)dt. Exercise 8.4. Show that if f(t) and g(t) are continuous complex valued functions f(t) on the finite closed interval [a, b], then
lib
f(t)g(t) dt I
2
~
lb
If(t)1 2dt
ib
Ig(t)1 2dt
with equality if and only if there exists a pair of constants a, f3 E C such that af(t) + f3g(t) = 0 for every point t E [a, b].
8. Inner product spaces and orthogonality
160
Exercise 8.5. Show that the space U = C pxq endowed with the inner product (A, B) = trace{B H A} is a pq-dimensional inner product space.
8.2. A characterization of inner product spaces An inner product space U over F is automatically a normed linear space with respect to the norm lIuli = {(u, u)} 1/2 :
= {(au, au)} 1/2 = {aa(u, U)}1/2 = lailiull,
Ilaull
lIuli ;::: 0 with equality if and only if u = 0, and, by the Cauchy-Schwarz inequality for inner products,
lIu + vll 2
(u + v, u + v) = lIull 2+ (u, v) + (v, u) + IIvll2 ::; lIul/ 2 + I/ullllvil + IIvllllull + IIvl12 = (llull+ IIvll)2. -
It is natural to ask whether or not the converse is true: Is every normed linear space automatically an inner product space? The answer is no, because the nOrm induced by the inner product has an extra property: Lemma 8.2. LetU be an inner product space over F. Then the norm lIuli = { (u, u) }1/2 induced by the inner product satisfies the parallelogram law
(8.4)
Moreover, the inner product can be recovered from the norm by the formula (8.5)
(u, v) = {
iL:k=likllu+ikvIl2 if F=C !L:~=1(-1)kllu+(-1)kvIl2
Proof.
if F=IR
Both formulas are straightforward computations.
o
Exercise 8.6. Verify formula (8.4) in the setting of Lemma 8.2. Exercise 8.7. Verify formula (8.5) in the setting of Lemma 8.2. Exercise 8.8. Let U be a normed linear space, and for u, v E U, let f (u) = lIu + vII· Show that If(u2) - f(u1)1 ::; IIU2 - uIl! for any two elements U1,U2 EU. Theorem 8.3. Let U be a normed linear space over C in which the parallelogram law (8.4) holds. Then formula (8.5) defines an inner product in U; i.e., the four defining characteristics of an inner product that were enumerated in the previous section are all met when (u, v) is defined by formula
(8.5).
8.3. Orthogonality
161 b \
\ \ \ \
\ \ \ \ \
\
\ \ \ \ \
\
L_L-----a o \
Figure 1 Discussion. We shall not give the details of the proof, but rather shall list a number of steps, each one of which is relatively simple to verify and which taken together yield the asserted result.
(a) (u, u) = Ilu11 2. (b) (u, v)
= (v, u).
(c) (u,O) = O. (d) (x, y) + (u, v) = 2((x + u)/2, (y + v)/2) + 2((x - u)/2, (y - v)/2). (e) (x + u, y) = (x, y) + (u, y). (f) (mx, y) = m(x, y) for any positive integer m. (g) ((m/n)x, y) = (m/n)(x, y) for any two positive integers m, n. (h) (ax, y) = a(x, y) for any real number a. (i) (ax, y) = a(x, y) for any complex number a. HINT: Exercise 8.8 helps in the transition from (g) to (h).
0
Exercise 8.9. Let U = en endowed with the standard inner product. Show that if n ~ 2 and if u = el and v = e2, where ej denotes the j'th column of In, and if 1 ~ s ~ 00, then Ilu + vll~
+ Ilu -
vll~ = 211ull; + 211vll;
<===}
s
= 2.
8.3. Orthogonality The role of the inner product is perhaps best motivated by considering the law of cosines in R 3 equipped with Ilxll = IIx112. Then the cosine of the angle () between the line segment running from 0 to a = (aI, a2, a3) and the line segment running from 0 to b = (bl'~' b3) in Figure 1 is
8. Inner product spaces and orthogonality
162 In particular, 3
L aibi = 0
<===?
cos () = 0
<===?
a 1.. h.
i=l
The law of cosines serves to motivate the following definitions in an inner product space U over IF with inner product (u, v)u: • Orthogonal vectors: A pair of vectors U and v in U is said to be orthogonal if (u, v)u = o. • Orthogonal family: A set of nonzero vectors {UI, ... ,Uk} in U is said to be an orthogonal family if
(Ui, Uj)u =
0 for i
t= j.
The assumption that none of the vectors UI, ... ,Uk are equal to 0 serves to guarantee that they are automatically linearly independent. • Orthonormal family: A set of vectors UI, .. ' , Uk in U is said to be an orthonormal family if (1) it is an orthogonal family and (2) the vectors Ui, i = 1, ... ,k, are all normalized to have unit length, i.e., IIUiIl~ = (Ui' Ui)U = 1, i = 1, ... ,k.
• Orthogonal decomposition: A pair of subspaces V and W of U is said to form an orthogonal decomposition of U if (1) V+W=U, (2) (v, w)u = 0 for every v E V and wE W. Orthogonal decompositions will be indicated by the symbol
U=VEBW. • Orthogonal complement: If V is a subspace of an inner product space U over IF, then the set
Vl. = {u E U: (u, v)u = 0 for every v
E
V}
is referred to as the orthogonal complement of V in U. It is a subspace of U. Exercise 8.10. Show that every orthogonal sum decomposition is a direct sum decomposition and give an example of a direct sum decomposition that is not an orthogonal decomposition. Exercise 8.11. Show that if {UI, . .. ,Uk} is an orthogonal family of nonzero vectors in an inner product space U over IF, then UI, ... ,Uk are linearly independent.
163
8.5. Adjoints
8.4. Gram matrices Let VI, ... ,Vk be a set of vectors in an inner product space U over IF. Then the k x k matrix G with entries
(8.6) is called the Gram matrix of the given set of vectors. Note that G = GH. Lemma 8.4. Let U be an inner product space over IF and let G denote the Gram matrix of a set of vectors VI, ... ,Vk in U. Then G is invertible if and only if the vectors VI, ... ,Vk are linearly independent over IF. Proof. Let c, d E lFk with components Cl, ... ,Ck and d l , ... ,dk , respectively, and let V = 2:7=1 CjVj and w = 2:~=1 ~Vi. Then it is readily checked that (8.7) Suppose first that G is invertible and that 2:~=1 CjVj = 0 for some choice of Cl, . .. ,Ck ElF. Then, in view of formula (8.7), k
k
o = (LCjVj'L~Vi)U = dHGc j=l
i=l
for every choice of d l , ... ,dk ElF. Therefore, Gc = 0, which in turn implies that c = 0, since G is invertible. Thus, the vectors VI, ... ,Vk are linearly independent. Suppose next that the vectors VI, ..• ,Vk are linearly independent and that c E Nc. Then, by formula (8.7), k (LCjVj, j=l
k
LCiVi)U = cHGc = O. i=l
Therefore, 2:7=1 CjVj = 0 and hence, in view of the presumed linear independence, Cl = ... = Ck = o. Thus, G is invertible. 0 Exercise 8.12. Verify formula (8.7).
8.5. Adjoints In this section we introduce the notion of the adjoint S* of a linear transformation S from one inner product space into another. It is important to keep in mind that the adjoint depends upon the inner products of the two spaces.
164
8. Inner product spaces and ortllOgonality
Theorem 8.5. Let U and V be a pair of finite dimensional inner product spaces over IF and let S be a linear transformation from U into V. Then there exists exactly one linear transformation S* from V into U such that (SU, v}v
(8.8)
= (u, S*v}u
for every choice of U E U and v E V.
Proof. It is easy to verify that there is at most one linear transformation from V into U for which (8.8) holds. If there were two such linear transformations, Si and S2 I then (Su, v}v
= (u, Siv}u = (u, S;'v}u
and hence for every choice of U it follows that
(U, (S; - S;')v)u = 0 E U and v E V. Thus, upon choosing u = (Si - S2)v, ((S; - S2)v, (Si - S;')v)u
= 0,
which implies that (Si - S2)v = 0 for every vector v E V, i.e., Si = S2' The verification of the existence of S* in the present setting is by computation (though a more elegant approach that is applicable in more general settings is via the Riesz representation theorem, which is discussed in the next section). Let UI,'" ,11q be a basis for U, let VI, ... ,vp be a basis for V and let U and V denote the corresponding Gram matrices, with entries Uti = (Ui' Ut}u
and Vjk = (Vk' Vj}v,
respectively. It suffices to show that there exists a linear transformation S* from V to U such that (SUi,Vj)V=(Ui,S*Vj)u
for
and
i=l, ... ,q
j=l, ... ,p.
Et=I akivk and suppose for the moment that S* exists and that El=l btjUt. Then
Let SUi = S*v j =
(SUi, Vj)v
=
and
(Ui, S*Vj}v =
p
p
k=l
k=l
L aki(vk, Vj}v = L Vjkaki q
q
t=l
t=l
L btj(Ui, Ut}u = L Utibtj .
Thus, upon setting A = [akiJ and B = [btj], it follows that the last two formulas will match if and only if V A = BHU and hence, since the Gram matrices V and U are invertible, if and only if B = (UH)-l AHVH .
165
B.5. Adjoints
Thus, the linear transformation S* from V into U that is defined by the formula q
S*Vj =
L btjut (with B = (U H)-1 AHVH = [btj]) t=l
is the one and only linear transformation from V to U that meets the stated requirements. 0 Corollary 8.6. If U = lF q and V = lF P are both endowed with the standard inner product and if the linear transformation is multiplication by A E C pxq , then A* = AH. Exercise 8.13. Verify Corollary 8.6. Example 8.7. Let U = C pxq equipped with the inner product (A,B)u = trace BH A, let V = C P equipped with the standard inner product, let u E C q and let S denote the linear transformation from U into V that is defined by the formula SA = Au for every A E U. Then the adjoint S* must satisfy the identity (SA, v}v = (A, S*v}u for every choice of A E C pxq and v E CPo Thus,
(SA, v}v
(Au, v}v = v HAu trace {v HAu} = trace {uv HA} (A,vuH}u,
i.e.,
(A, S*v}u = (A, vuH}u for every v E CPo Therefore, in view of Exercise 8.1, (8.9)
S*v
= YUH for every v
E V.
Exercise 8.14. Verify the identification (8.9) in the preceding example by checking that for all rank one matrices A = xyH, with x E CP and y E cq,
(SA,v}v =yHuvHx and
(A,S*v}u =yH(S*v)Hx.
Exercise 8.15. Let U = C n equipped with the inner product (u, v}u = Ej=1 jVjUj for vectors u, v E C n with components Ul, ... ,Un and VI, ••• ,Vn , respectively. Find the adjoint A* of a matrix A E c nxn with respect to this inner product. Lemma 8.8. Let U, V and W be finite dimensional inner product spaces over IF and let Sand SI be linear transformations from U into V and let T be a linear transformation from V into W. Then:
(1) (as + [3S1)* = (is* (2) (S*)* = S.
+ fjSi
for every choice of a and [3 in IF.
8. Inner product spaces and orthogonality
166 (3) (TS)*
= S*T*.
(4) If U = V and I denotes the identity in U, then 1* = I. Proof.
The formulas
(as + f3S1)U, v)v -
a(Su, v)v + f3(SlU, v)v a(u,S*v)u+f3(u,S;v)u
=
(u, CiS*v)u + (u, {jS;v)u (u, (CiS* + {jS;)v)u
serve to verify (1). The proof of the remaining assertions is similar and is left to the reader. 0
Exercise 8.16. Complete the proof of Lemma 8.8. Lemma 8.9. Let U and V be finite dimensional inner product spaces over ]F and let S be a linear transformation from U into V. Then Ns = Ns*s, i.e.,
Su = 0
(8.10)
<===?
S* Su
= 0.
Proof. Suppose first that S* Su = 0 for some vector u E U. Then the formulas (Su, Su)v = (S* Su, u)u = (0, u)u = 0 imply that Su = O. Therefore, Ns*s ~ N s . To complete the proof, it remains to check that the opposite inclusion holds too. But that is selfevident. 0
Exercise 8.17. Let T be a linear transformation from a finite dimensional inner product space U into itself, and let V be a subspace of U. Show that TV ~ V <===? T*Vl. ~ Vl. .
(8.11)
8.6. The Riesz representation theorem It is convenient to begin with an elementary exercise to help set the scene.
Exercise 8.1S. Let U be an inner product space over ]F, let y E U and let
f(x)
(8.12) Show that (8.13)
f
= (x, y)u
for every x E U.
is a linear functional on U and that
IIYllu = max{lf(x) I : x E U
and
Ilxllu = I}.
A natural question is: Does every linear functional on U admit a representation of the form (8.12) for some vector y E U? In view of Exercise 8.1, there is at most one such vector y. The solution of the next exercise guarantees that there is at least one (and hence exactly one) such vector y if U is finite dimensional.
Exercise 8.19. Let U be a finite dimensional inner product space over IF, with basis {UI," . ,un} and Gram matrix G, and let f be a linear functional on U. Show that
f{x) = (x, y)u for every x E U , where y
= L~l diui and di is the i'th component of the vector d
= G- I [f{Ul) ... f(Un)]H.
Exercise 8.19 is a good exercise in the use of Gram matrices. A more elegant approach that works equally well in infinite dimensional Hilbert spaces is based on the Riesz representation theorem; see Theorem 8.10, below. An inner product space U over IF is said to be a Hilbert space over IF if every Cauchy sequence Xl, X2, ... in U tends to a limit y E U in the norm induced by the inner product. A Hilbert space may also be characterized as a Banach space in which the norm satisfies the parallelogram law, just as in Section 8.2. A linear transformation S from a Hilbert space U over IF into a Hilbert space V over IF is said to be bounded if there exists a number 'Y > 0 such that (8.14)
IISullv ~ 'Ylluliu for every vector u E U .
Moreover, IISliu,v is the smallest 'Y for which (8.14) holds; see Appendix I for additional discussion of this terminology. Finite dimensional inner product spaces are automatically Hilbert spaces, and linear transformations from a finite dimensional inner product space U into an inner product space V are automatically bounded. Theorem 8.10. (Riesz representation) Let f(x) be a bounded linear functional on a Hilbert space U over IF. Then there exists a unique vector y E U such that
f{x) = (x, y)u for every x
(8.15)
E
U.
Moreover, (8.16)
Ilyliu
= max{lf{x)1 : x E U and Ilxliu =
I}.
Proof. Let N, = {x E U : f{x) = a}. Then, referring to Appendix I, if need be, it is readily checked that
(1) N, is a closed subspace of U. (2) N, = U if and only if f{x) = 0 for every x E U. (3) f(u)v - f(v)u E N, for every choice of u and v in U. (4) If N, is a proper subspace of U, then the orthogonal complement of N, in U is a one dimensional subspace of U.
Consequently, if Nf is a proper subspace of U, then there exists a nonzero vector Xo E U that is orthogonal to Nf . Thus, as f(x)xo - f(xo)x E N f , it follows that f(x)(xo, xo)u = f(xo)(x, xo)u
for every vector x E U.
This serves to establish (8.15) with
l[XJ y = (xa,xo)u Xo·
If N f = U, then f(x) = 0 for every vector x E U and hence y (8.16) has already been established in Exercise 8.18.
= o. Assertion D
Theorem 8.11. Let U and V be Hilbert spaces over IF and let S be a bounded linear transformation from U into V. Then there exists exactly one bounded linear transformation S* from V into U such that
= (u, S*v}u IISII = IIS*II·
(Su, v}v Moreover,
for every choice of u E U
and v E V .
Discussion. The formula fv(u)
= (Su, v}v
defines a bounded linear functional on U for each given v E V. Therefore, by the ruesz representation theorem, there exists a unique vector w E U such that fv(u) = (u, w)u. But this in turn implies that (Su, v}v
= (u, w)u for every choice of u
E U.
Since there is only one such w for each v E V, we may define S*v = w. It remains to check that S* is a bounded linear transformation from V into U. The details are left to the reader. D
8.7. Normal, selfadjoint and unitary transformations A linear transformation T from an inner product space U over IF into itself is said to be • normal if T*T = TT*, i.e., if (Tu, Tv}u
= (T*u, T*v)u
for every choice of u, v E U .
• selfadjoint if T = T*, i.e., if (Tu, v}u = (u, Tv)u
• unitary if T*T
= TT* =
(Tu, Tv}u = (u, v)u
for every choice of u, v E U. I, i.e., if
for every choice of u, v E U .
It is important to bear in mind that each of these three classes of transformations depends upon the inner product and that selfadjoint and unitary transformations are automatically normal. Theorem 8.12. Let T be a normal transformation from an n-dimensional
inner product space U over C into itself. Then (1) Tu = AU ¢::::::> T*u = Xu. (2) There exists an orthonormal basis {Ub •. ' ,un} of U and a set of complex numbers {AI, ... ,An} (not necessarily distinct) such that TUj
= AjUj
= 1, ... ,no = 1, ••• ,n.
for j
(3) If also T = T*, then Aj E 1R for j (4) If also T*T = I, then IAjl = 1 for j = 1, ... ,n. Proof. that
The first assertion follows from Lemma 8.9 and the observation
T*Tu = 0 {:::=} TT*u = 0 ¢::::::> T*u = 0 , with T replaced by AI - T. This does the trick, since T is normal if and only if AI - T is normal and (AI - T)* = XI - T*, i.e., Tu = 0
¢::::::>
(AI - T)u = 0
{:::=}
(XI - T*)u = O.
Since U is invariant under T, Theorem 4.2 guarantees the existence of a vector Ul E U such that IluIIIu = 1 and TUI = AIUI. N ow let us suppose that we have established the existence of an orthonormal family {UI,' .. , Uk} of eigenvectors of T for some positive integer k and let
.ck = span {UI,'" ,Uk} and .ct = {u EU: (v, u}u =
0 for every v E.cd·
Then, .ct is invariant under T: (v, Uj)u = 0
===}
(Tv, Uj)u = (v, T*Uj}u = (v, AjUj}u = Aj(v, Uj)u = 0
for j = 1, ... ,k. Therefore, if k < n, there is a vector ukH of norm one in .ct such that TUkH = AkH UkH' Thus, if .ck exists for k < n, then .ckH exists and the construction continues until k = n. This completes the proof of (2). If T is selfadjoint, then the string of equalities
Aj
=
Aj(Uj, Uj)u = (Tuj, Uj)U
= (Uj, T*uj)u
=
(Uj, TUj)u = (Uj, AjUj}u = Aj(Uj, Uj)u
=
Aj
implies that Aj E JR, i.e., the eigenvalues of T are real. This completes the proof of (3). The proof of (4) is left to the reader as an exercise. 0
Exercise 8.20. Show that IAjl = 1 for each eigenvalue Aj of the unitary transformation T considered in Theorem 8.12. B.B. Projections and direct sum decompositions
• Projections: A linear transformation P of a vector space U over IF into itself is said to be a projection if P is idempotent, i.e., if P 2 =P. • Orthogonal projections: A linear transformation P of an inner product space U over IF into itself is said to be an orthogonal projection if P is idempotent and selfadjoint with respect to the given inner product, i.e., if
p2 = P and (Pu, v}u
= (u, Pv}u
for every pair of vectors u, v E U. Exercise 8.21. Let Ul and U2 be a pair of orthonormal vectors in an inner product space U over IF and let a ElF. Show that the transformation P that is defined by the formula Pu = (u, Ul + aU2}UUl is a projection but is not an orthogonal projection unless a = O. Lemma 8.13. Let P be a projection in a vector space U over IF, and let 'Rp
= {Px:
x E U}
and let
Np
= {x E U:
Px = o}.
Then (8.17) Proof.
Let x E U. Then clearly x
= Px + (I - P)x
and Px E'Rp. Moreover, (I - P)x E N p , since P(I - P)x
= (P - p 2 )x = (P - P)x = O.
Thus, U = 'Rp +Np. The sum is direct because Y E 'Rp <==> y = Py and y E N p
<==:}
Py
=0.
o Lemma 8.13 exhibits U as the direct sum of the spaces V = 'Rp and W = N p that are defined in terms of a given projection P. Conversely, every direct sum decomposition U = V+W defines a projection P on U with V = 'Rp and W =Np .
Lemma 8.14. Let V and W be subspaces of a vector space U over IF and suppose that U = V+W. Then: (1) For every vector u E U there exists exactly one vector Pvu E V such that u - Pvu E W.
(2) Pvv = v for every v E V. (3) Pvw
=0
for every w E W.
(4) Pv is linear on U and P~ = Pv. (5) V
= RPv
and W =NPv.
(6) ffU is an inner product space with inner product (., ·)u, then
(8.18)
(v, w)u
=0
for every choice of v E V and w E W
if and only if (8.19)
(Pvx, y)u = (x, PvY)u for every choice of x, y
E
u.
Proof. Item (1) is immediate from the definition of a direct sum decomposition. Items (2) and (3) then follow from the decompositions v = v + 0 and w = 0 + w, respectively. Items (4) and (5) are left to the reader as exercises. To verify (6), suppose first that (8.18) is in force. Then, since Pvu E V and (I - Pv)u E W for every vector U E U, (Pv x , y)u
= (Pvx, PvY)u + (Pvx, (1 - Pv )y)u = (Pvx, pVY)u
(x, pVY)u
= (Pv x , PvY)u + ((1 - Pv )x, PvY)u = (Pvx, PvY)u
and for every choice of x, y E U, i.e., (8.18) in force and v E V and w E W, then (v, w)
~
(8.19). Conversely, if (8.19) is
= (Pvv, w) = (v, Pvw) = (v,O) = o. D
Exercise 8.22. Verify assertions (4) and (5) in Lemma 8.14. Exercise 8.23. Let {v, w} be a basis for a vector space U over IF. Find the projection Pvu of the vector u = 2v + 3w onto the space V with respect to each of the following direct sum decompositions: U = V+W andU = V+WI, when V = span {v}, W = span{w} and WI = span{w + v}. It is important to keep in mind that Pv depends upon both V and the complementary space W. However, if Pv is an orthogonal projection, then W is taken equal to the orthogonal complement Vl. of V, i.e., (8.20)
W
= Vl. = {u E U: (v, u)u = 0 for every v E V}.
8.9. Orthogonal projections The next result is an analogue of Lemma 8.14 for orthogonal projections that is formulated in terms of one subspace V of U rather than in terms of a pair of complementary subspaces V and W. This is possible because, as noted at the end of the previous section, the second space W is implicitly specified as the orthogonal complement VJ.. of V.
Lemma 8.15. Let U be an inner product space over IF, let V be a subspace ofU with basis {Vb ••• ,VA:} and Gram matrix G with entries gij = (Vj, Vi)U. Then:
(1) For every vector $u \in U$, there exists exactly one vector $P_Vu \in V$ such that $u - P_Vu \in V^\perp$; it is given by the formula
$$\textrm{(8.21)}\qquad P_Vu = \sum_{j=1}^k (G^{-1}b)_j\,v_j\,, \quad\text{where } b \text{ is the vector in } \mathbb{F}^k \text{ with components } b_i = (u, v_i)_U\,;$$
i.e.,
$$\textrm{(8.22)}\qquad U = V \oplus V^\perp\,.$$
(2) Pv is a linear transformation ofU into U that maps U onto V. More. over, Pv = P~ = Pvi i. e., Pv is an orthogonal projection. (3) lIu - vlI~ ~ lIu - Pvull~ for every vector V E V, with equality if and only
if V
= Pvu.
(4) IIPvull~ ~ Ilull~, with equality if and only ifu E V. Moreover, IIPvull~ = bHG-lb.
en
is endowed with the standard inner product and V [VI'" VAll, then G = VHV and (8.23) Pv = V(VHV)-lV H . (5) If U =
Proof. The first assertion is equivalent to the claim that there exists exactly one choice of coefficients
( (u -
Cl, • . . ,Ck
t, CjVj) ,Vi) ~ U
E
e such that
0 for i
~ 1, ... , k,
or, equivalently in terms of the entries in the Gram matrix, that k
2:.: gijCj
for i = 1, ... ,k. j=l But this in turn is the same as to say that the vector equation b = Gc has a unique solution c E e A: with components CI, .•. ,Ck for each choice of the (u, Vi)U =
vector b. However, since G is invertible by Lemma 8.4, this is the case and Pvu is uniquely specified by formula (8.21). Moreover, this formula clearly displays the fact that Pv is a linear transformation of U into V, since the vector b depends linearly on u. The rest of (2) is left to the reader as an exercise. Next, since (u - Pvu, Pvu - v)u = 0 for v E V, it is readily seen that (8.24)
Ilu -
vlI~
= lIu - Pvu + Pvu - vlI~ = lIu - Pvull~ + IIPvu - vlI~ ,
which serves to justify (3). The first part of (4) follows from formula (8.24) with v = 0; the second part is a straightforward calculation (it's a special case of (8.7)). Finally, (5) follows from (1), since G = VHV and b = VHu in the given setting. 0 Exercise 8.24. Show that in the setting of Lemma 8.15, k
lIu - LCjvjl12 ~ lIull 2-
(8.25)
bHG-lb
j=1
with equality if and only if Cj = (G-1b)j for j = 1, ... ,k. Exercise 8.25. Verify directly that the transformation Pv defined by formula (8.23) for U = en endowed with the standard inner product meets the following conditions: (1) (PV)2=PV.
(2) (PV)H = Pv.
(3) PVVj = Vj for j = 1, ... , k. (4) Pvu = 0 if u E V..l, computed with respect to the standard inner product. [HINT: Items (1), (2) and (4) are easy, as is (3), if you take advantage of the fact that Vj = Vej, where ej is the j'th column vector of In.] Exercise 8.26. Calculate the norm of the projection P that is defined in Exercise 8.21. Exercise 8.27. Show that if P is a nonzero projection matrix, then:
(a) (b)
I!PII = 1 if P is an orthogonal projection matrix. I!PII can be very large if P is not an orthogonal projection matrix.
Exercise 8.28. Let
[UI
U2
Ug
U4 U5
ua]
11 2 0 = [ 1 1
o andletU = span{Ul,U2,U3,U4}, V and W2 = span{u5}'
1 2 4 1
35 04]
1 0 1 0 0 -1 1
'
1 -1
= span{UI,U2,ug},
WI
=
span{u4}
(a) Find a basis for the vector space V. (b) Show that U = V+W1 and U = V+W2. (c) Find the projection of the vector U6 onto the space V with respect to the first direct sum decomposition. (d) Find the projection of the vector U6 onto the space V with respect to the second direct sum decomposition. (e) Find the orthogonal projection of the vector U6 onto the space V.
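Formula (8.23) and parts (2) and (3) of Lemma 8.15 are easy to check numerically. The sketch below is an illustration only (it assumes NumPy, real data and the standard inner product): it verifies idempotence and selfadjointness of $P_V = V(V^HV)^{-1}V^H$ and that $P_Vu$ is at least as close to $u$ as any other point of the subspace.

import numpy as np

rng = np.random.default_rng(6)
V = rng.standard_normal((5, 2))                     # columns span a 2-dimensional subspace of R^5
P = V @ np.linalg.inv(V.T @ V) @ V.T                # formula (8.23), real case

print(np.allclose(P @ P, P), np.allclose(P, P.T))   # P^2 = P and P = P^H

u = rng.standard_normal(5)
best = np.linalg.norm(u - P @ u)
for _ in range(1000):                               # ||u - v|| >= ||u - P_V u|| for v in the span
    v = V @ rng.standard_normal(2)
    assert np.linalg.norm(u - v) >= best - 1e-12
print("minimality checked")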
8.10. Orthogonal expansions If the vectors VI, ... ,Vk that are specified in the Lemma 8.15 are orthonormal in U, then the formulas simplify, because G = h, and the conclusions can be reformulated as follows:
Lemma 8.16. Let VI, ... ,Vk be an orthonormal set of vectors in an inner produ.ct space U over 1F and let V = span {VI, ... , Vk}' Then: (1) The vectors Vb ... , Vk are linearly independent.
(2) Vl.={UEU: (Vj,u)u=Oforj=l, ... ,k}. (3) The orthogonal projection Pvu of a vector the formu.la PvU =
(u, VI)UVI
U E
U onto V is given by
+ ... + (u, Vk)UVk.
(4) IIPvulI~ = E7=II(u, Vj)UI2 for every vector U E U. (5) (Bessel's inequality) EJ=II(u, Vj)UI2 ::; lIull~, with equality if and only ifu E V. Notice that: (1) It is easy to calculate the coefficients of a vector V in the span of VI, ... , Vk and its norm in terms of these coefficients: k
V = LCjVj => Cj = (v, Vj)u j=1
k
and
Ilvll~ =
L ICjI2. j=1
8.10. Orthogonal expansions
175
Pv of a vector
(2) It is easy to calculate the coefficients of the projection U onto V = span{vl, ... ,Vk}: k
Pv u
=L
k ejvj
===* ej = (u, Vj}u and IIPvulI~ = L
j=1
ICjI2.
j=1
(3) The coefficients ej, j = 1, ... ,k, computed in (2) do not change if the space V is enlarged by adding more orthonormal vectors. It is important to note that to this point the analysis in this section is applicable to any inner product space. Thus, for example, we may choose U equal to the set of continuous complex valued functions on the interval [0, 1], with inner product
(I, g)u =
101 f(t)g(t)dt.
Then it is readily checked that the set of functions
j = 1, ... ,k,
is an orthonormal family in U for any choice of the integer k. Consequently,
t
Ii,' 1(t)'I'; (t)dtl' $i,'I/(t)I'dt,
by the last item in Lemma 8.16. Exercise 8.29. Show that no matter how large you choose k, the family
(8.26)
k
(u, w)u = L(u, Uj)u (w, Uj)u j=1
and (u, u)u = L I(u, Uj)UI2 j=1
for every choice of U and w in U. Proof.
Since the given basis for U is orthonormal, k U=
k
LCjUj and w = LdiUj, j=1 i=1
where
Cj=(U,Uj)U
and
dj=(w,vj)u,
for
j=I, ... ,k.
Therefore, k
(U, W)U
k
k
k
= (I>jUj, L diuj) = L cjdi (Uj, Ui)U = L cidi , j=1
i=1
i,j=1
i=1
by the presumed orthonormality of the basis. The rest is plain.
0
Lemma 8.18. Let U and V be inner product spaces over IF with orthonormal bases Ul, ... ,uq and VI, .•. ,vP1 respectively, and let S be a linear transformation from U into V. Then q
p
L IISujll~ = L IIS*Vill~· j=1 i=1
(8.27) Proof.
By Lemma 8.17, p
p
IISujll~ = L I(SUj, i=1
vi)vl 2
=L
I(uj, S*vi)uI2 .
i=1
Therefore, q
p
q
p
2: 2:
L IISujll~ = I(uj, S*vi)uI2 = L IIS*Vill~· j=1 i=1 j=1 i=1
o Lemma 8.19. Let S be a linear transformation from an inner product space U over 1F into itself and let {u}, ... ,un} and {w}, ... ,wn } be any two orthonormal bases for U. Then: n
n
j=1
j=1
2: (SUj , Uj)u = L(SWj, Wj)u.
(8.28) Proof.
By Lemma 8.17, n
n
(SUj, Uj)U = L(SUj, Wi)U (Uj, Wi)U = L(SUj, Wi)U (Wi, Uj)U i=1 i=1 and
(Wi, S*Wi)U
n
n
j=1
j=1
= L(Wi, Uj)U (S*Wi, Uj)U = 2: (Wi, Uj)U (SUj, Wi)U.
Therefore, n
L(SUj,Uj)U j=1 as claimed.
n
=
L(Wi,S*Wi)U i=1
n
=
L(SWi,Wi)U, i=1
o
Exercise 8.30. Show that in the setting of Lemma 8.19 n
n
n
LIISujll~ = LIISwjll~
(8.29)
j=1
=
LIIs*ujll~. j=I
j=1
8.11. The Gram-Schmidt method Let {U 1, . .. ,Uk} be a set of linearly independent vectors in an inner product space U over IF. The Gram-Schmidt procedure is a method for finding a set of orthonormal vectors {VI, . .. ,Vk} such that
= span{ul, ... , Uj} for j = 1, ... ,k. The steps are as follows: Let ?vj denote the orthogonal projection onto Vj Vj
= span{vI, ...
,Vj}
and then: (1) Set VI (2) Set
= uI/llulliu. Then IIvlliu = 1.
W2 and check that
= U2 - ?vl U2 = U2 -
(W2' Vl)U
(U2' VI}UVI
= (U2' VI)U - (U2' VI)U (VI, VI)U = 0
and, since U2 and VI are linearly independent, IIW211u we can set V2 = w2/IIw211u. Notice that
=1=
O. Therefore,
span{uI, U2} = span{vb V2} . (3) Set
W3 = U3 -
PV2U3 =
U3 - (U3, VI}UVI - (U3, V2)U V 2
and check that
Therefore we can set
and verify that VI, V2, V3 is an orthonormal set of vectors such that span{vI, V2, V3}
= span{ub U2, U3} .
The first three steps should suffice to transmit the general idea. A complete formal proof depends upon induction, i.e., showing that if the procedure works for the first j vectors (and j < k), then it works for the first j + 1 vectors: Thus suppose V!, ... ,Vj is an orthonormal family of vectors such that
Then, set WjH
= UjH -?vj UjH = UjH -
(UjH' VI)UVI - ... - (Uj+I, Vj)UVj
and observe that (WjH' VI)U = ... = (wj+I, Vj)u = 0 and IIWjHllu
=1=
0.
Therefore, we can set Vj+! = wjH/llwjHllu and check that VI, ... ,VjH is an orthonormal set of vectors such that span{VI, ... ,VjH} = span{uI, ... ,UjH}
o
to finish.
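The procedure just described is short to implement. The following is a minimal sketch for the standard inner product on $\mathbb{C}^n$ (it assumes NumPy and is meant to mirror the steps above, not to be a numerically robust implementation):

import numpy as np

def gram_schmidt(U):
    """Orthonormalize the linearly independent columns of U (standard inner product)."""
    V = []
    for u in U.T:                          # iterate over the columns u_1, u_2, ...
        w = u.astype(complex)
        for v in V:                        # subtract the projection onto the earlier v's
            w = w - (w @ np.conj(v)) * v   # w - (w, v) v
        V.append(w / np.linalg.norm(w))
    return np.column_stack(V)

rng = np.random.default_rng(7)
U = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
Q = gram_schmidt(U)
print(np.allclose(Q.conj().T @ Q, np.eye(3)))       # the columns of Q are orthonormal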
Exercise 8.31. Show that if {UI, ... ,Uk} is a set of k linearly independent vectors in lF n , then there exists an invertible upper triangular matrix B E lF kxk such that the matrix V = [UI ... Uk] B has orthonormal columns. [HINT: This is a byproduct of the Gram-Schmidt procedure.] Exercise 8.32. Show that if A E c nxn is invertible, then there exists an invertible upper triangular matrix B such that AB is unitary and an invertible lower triangular matrix C such that C A is unitary. [HINT: Exploit Exercise 8.31.] Exercise 8.33. Show that if A E lR nxn is invertible, then there exist an invertible upper triangular matrix B such that AB is an orthogonal matrix and an invertible lower triangular matrix C such that C A is an orthogonal matrix. [IDNT: Exploit Exercise 8.31.] Exercise 8.34. Find a set of three polynomialspo(t) = a, PI(t) = b+ct, and Pa (t) = d + et + ft 2 with real coefficients a, b, c, d, e, f so that they form an f(t)g(t)dt. orthonormal set with respect to the real inner product (f, g) =
J;
8.12. Toeplitz and Hankel matrices The structure of the Gram matrix G of a set of vectors {VI, ... , Vk} in an inner product space U over C depends upon the choice of the vectors and upon the inner product. In many applications the Gram matrix that comes into play is either a Toeplitz matrix or a Hankel matrix. A matrix A E C nxn is said to be a Toeplitz matrix if its entries aij, i, j = 1, ... , n, depend only upon i - j, i.e., if aij = ai-j for i, j = 1, ... ,n, for some set of 2n - 1 numbers a-(n-I), ... , ao, ... , an-I. Toeplitz matrices occur naturally in the theory of stationary (and weakly stationary) sequences and in approximation and extension problems involving trigonometric polynomials.
A matrix A E c nxn is said to be a Hankel matrix if its entries aij, i,j = 1, ... ,n, depend only upon i + j, Le., if
aij
= f3i+j-l
for i, j
= 1, ... ,n,
for some choice of the 2n - 1 numbers 131, . .. ,f32n-l. Hankel matrices occur in problems involving polynomial (and rational) approximation on IR or subsets of lR. If U is the space of continuous functions f(e i6 ) on the unit circle and an inner product is defined in terms of a function w( ei6 ) by the formula 1
r21r -g(e-i6 )w(e '(} )f(e '(} )dO,
(j,g)u = 27r Jo where w(e i(}) > 0 for 0
gjk
Z
Z
:s 0 < 27r, and if cpj(ei6 ) = eij6 for j = 1, ... ,n, then
r 1r
1 "6 '(} 'k6 = (CPk, CPj)U = 27r Jo e-I3 w(e )eZ dO = aj-k, Z
where 1 121r w(e''6 )e- Z3"6 dO 27r 0 is the j'th Fourier coefficient of w(ei6 ); i.e., a is a Toeplitz matrix. On the other hand, if U is the space of continuous functions f (x) on a subinterval of IR and an inner product is defined in terms of a function w{x) by the formula a' = J
(j, g)u =
Id g(x)w(x)f(x)dx,
where w{x) > 0 on the interval c < x
gjk
= (CPk,CPj)U =
l
c
< d, d
and if cPj{x) = xj, then
,
k
x'w(x)z dx
= bj+k'
where
Le.,
a is a Hankel matrix.
These simple examples help to illustrate the great interest in developing efficient schemes for solving matrix equations of the form Gx = band calculating a-I when a is either a Toeplitz or a Hankel matrix. Let 0 0
n
(8.30)
Zn
=
L eje~_j+1 = j=1
[f
0
0 1
...
0
n-l
:1
and N n =
I: ejef+1 . j=l
Exercise 8.35. Show that A E C nxn is a Toeplitz matrix if and only if ZnA is a Hankel matrix. Exercise 8.36. Show that if A E c nxn is a Hankel matrix with aij = !3i+j-l for i,j = 1, ... ,n, then, in terms of the matrices Z = Zn and N = N n defined in formula (8.30), n-l
n
A
= L!3jZ(NT )n-j
+ L!3n+jZNj.
j=1
j=1
Exercise 8.37. Show that if A E c nxn is a Toeplitz matrix with aij = ai-j, then, in terms of the matrices Z = Zn and N = N n defined in formula (8.30), n-l
n-l
A= 6""" a-iNi i=O
+ 6ai(N """ T )i . i=l
Exercise 8.38. The n x n Hankel matrix Hn with entries hij = 1/(i + j + 1) for i, j = 0, ... ,n - 1 is known as the Hilbert matrix. Show that the Hilbert matrix is invertible. [HINT: fol xixjdx = 1/(i + j + 1).] Exercise 8.39. Show that if Hn denotes the n x n Hankel matrix introduced in Exercise 8.38 and if a E en and bEe n are vectors with components ao, ... ,an-l and 1>0, ••• ,bn-l. respectively, then (8.31)
(Hna, b)st
= 2~ 127r o
('I:
bke-ikt) (ie-it (71' - t»)
~o
('I:
aje- ijt ) dt.
3~
Exercise 8.40. Show that if Hn denotes the n x n Hankel matrix introduced in Exercise 8.38, then IIHn 11 2,2 < 71'. [HINT: First use formula (8.31) to prove that I(Hna, b)stl ~ 71'llaIl21IbI12.]
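The definitions of this section, the relation of Exercise 8.35 and the Hilbert matrix of Exercise 8.38 are easy to experiment with. A sketch assuming NumPy (the particular entries are arbitrary choices of mine):

import numpy as np

n = 4
a = np.arange(-(n - 1), n)                                    # a_{-(n-1)}, ..., a_0, ..., a_{n-1}
T = np.array([[a[(i - j) + n - 1] for j in range(n)] for i in range(n)])   # Toeplitz: t_ij = a_{i-j}

beta = np.arange(1, 2 * n)                                    # beta_1, ..., beta_{2n-1}
H = np.array([[beta[i + j] for j in range(n)] for i in range(n)])          # Hankel: h_ij = beta_{i+j-1}

Z = np.fliplr(np.eye(n))                                      # the reversal matrix Z_n of (8.30)
ZT = Z @ T                                                    # Exercise 8.35: Z_n A is Hankel
print(all(ZT[i, j] == ZT[i - 1, j + 1] for i in range(1, n) for j in range(n - 1)))

Hilbert = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
print(np.linalg.cond(Hilbert))                                # invertible, but badly conditioned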
8.13. Gaussian quadrature Let w(x) denote a positive continuous function on a finite interval a ~ x ~ b and let U denote the inner product space over 1R of continuous complex valued functions on this interval, equipped with the inner product
(f, g)u =
lb
g(x )w(x )f(x )dx .
Let Pk, k = 0, 1, ... ,denote the k+l-dimensional subspace of polynomials of degree less than or equal to k (with complex coefficients), let s:t3n denote the orthogonal projection ofU onto P n , let Mx denote the linear transformation on U of multiplication by the independent variable x and let
Sn = s:t3nMxlPn
for
n = 0,1 ... ;
8.13. Gaussian quadrature
i.e., Sn maps f E Pn and Sn = S~; i.e.,
---+
181
f.PnMxf. Then clearly Pn is invariant under Sn
(Snf, g)u = (I, Sng)U
for every choice of
f, g E Pn ·
Consequently, there exists an orthonormal set of vectors
Sn'Pj
= Aj'Pj
and
Aj E lR. for j
= 0, ...
, n.
Thus, if
'1rk(X)=X k for k=O,I, ... , then, for 1 ::; k ::; n + 1 and j = 0, ... ,n, ('Pj, '1rk)U = ('Pj, Sn'Trk-I)U = (Sn'Pj, 'Trk-I)U = Aj('Pj, 'Trk-I}U, and hence, upon iterating this formula, we obtain (8.32) ('Pj, '1rk}U = Aj('Pj, '1ro)u for j = 0, ... ,n' and
k
= 1, ...
,n + 1.
Lemma 8.20. If p( x) is any polynomial of degree less than or equal to n + 1 with complex coefficients, then:
(1) ('Pj,p}u = p(Aj)('Pj, '1ro)u,
°
(2) 'Pj(Aj)('Pj, '1ro}u = 1 and 'Pk(Aj) = if j -:f. k. (3) If p(x) is a polynomial of degree n + 1 such that then p(Aj) = Proof.
°for
Let p(x)
('Pj,p}=o
for j=O, ... ,n,
j = 0, ... ,no
= L~!Ol Cixi. Then, in view of formula (8.32), n+l
('Pj,p}u = ('Pj,CO)u+
I: Ci(
n+l
=
Co ('Pj , l}u
+ I: CiA; ('Pj, 'Tro)u i=l
= p(Aj )('Pj, 'Tro}u , which verifies (1) and, upon choosing p = 'Pk, yields the formula
'Pk(Aj)('Pj, '1ro)u = ('Pj, 'Pk)U, which leads easily to (2), since the 'Pj are orthonormal. Finally, (3) is an easy consequence of the formula in (1) and the fact that ('Pj, '1ro)u -:f. for j = 0, ... ,n, which is immediate from (2). 0
°
8. Inner product spaces and orthogonality
182 Theorem 8.21. Let Wj
l
= 1( 7r0, 'Pj}UI2 for j = 0, ... ,n. Then the formula n
b
(8.33)
w(x)f(x)dx =
L Wi/(Ai) i=O
a
is valid for every polynomial f(x) of degree less than or equal to 2n + 1 with complex coefficients. Proof. It suffices to verify this formula for f(x) = 7rk(X), k Consider first the case k = i + j with i, j = 0, ... ,n. Then
lb
7ri (x)w(x)7rj {x)dx =
= 0, ... ,2n+1.
(7rj,7ri}U
~ (~(or;, 'P,)u 'P"
t.
(or" 'Pt)u 'Pt ) u
n
=
L
(7rj, 'Ps}u ('Pt, 7ri}U ('Ps, 'Pt}u
s,t=O n
=
L 7rj(As)7ri(As)('Ps, 7ro}u(7ro, 'Ps}u s=O
To complete the proof, it remains only to check that
1 b
(8.34)
w(x)x 2n+1dx
a
2n+l
=
L A~n+1I(7r0,'Ps}uI2. t=Q
o
The details are left to the reader.
Exercise 8.41. Justify formula (8.34). [HINT: First check that the integral is equal to (7rn , Sn7rn}U.] Finite sums like (8.33) that serve to approximate definite integrals, with equality for a reasonable class of functions, are termed quadrature formulas. The notation (8.35) and will prove useful in the next three exercises.
Kn
=
Exercise 8.42. Show that in the setting of this section, n
s:t3n Xn +1 =
L
bjx j ,
where
j=O
hj and the (n + 1) x (n
+ 1) Hankel matrix Hn are defined in {8.35}.
Exercise 8.43. Show that in the setting of this section, Sn
n ~ ajx j J=O
n [bo] =~ bjxl {:::::} : = H;;l Kn J=O
bn
[ao]:
,
an
where the {n+ 1} x {n+ 1} Hankel matrices Hn and Kn are defined in {8.35}. Exercise 8.44. Show that if a and b are vectors with components and bo, ... ,bn , respectively, then (8.36) n
(L ;=0
n
ajxj ,
L
n
n
bkXk)U
=
bH
Hn a
k=O
and (Sn
L
;=0
ao, ... ,an
a; xl ,
L
bkXk)U
= b H Kna,
k=O
where the (n+1) x {n+1} Hankel matrices Hn and Kn are defined in (8.35).
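For the special case $w(x) \equiv 1$ on $[-1, 1]$, the nodes $\lambda_i$ and weights $w_i$ of Theorem 8.21 are the classical Gauss-Legendre nodes and weights, which NumPy supplies. The sketch below is an illustration only; it confirms that (8.33) is exact for a random polynomial of degree $2n + 1$:

import numpy as np

n = 3                                                # n + 1 = 4 nodes
nodes, weights = np.polynomial.legendre.leggauss(n + 1)

rng = np.random.default_rng(8)
coeffs = rng.standard_normal(2 * n + 2)              # a polynomial of degree 2n + 1
p = np.polynomial.Polynomial(coeffs)

exact = p.integ()(1.0) - p.integ()(-1.0)             # integral of p over [-1, 1] with w(x) = 1
quad = weights @ p(nodes)                            # right-hand side of (8.33)
print(np.isclose(exact, quad))                       # True: exact up to degree 2n + 1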
8.14. Bibliographical notes Example 8.7 is adapted from the monograph by Borwein and Lewis [10J. The bound on the norm of the Hilbert matrix that is developed in Exercises 8.39 and 8.40 is adapted from the discussion of the Hilbert matrix in Peller [56J. Another way to compute this bound is discussed in Chapter 16. The treatment of Gaussian quadrature that is presented in this chapter, including the proofs of Lemma 8.20 and Theorem 8.21, is adapted from the PhD thesis of Han Degani [19J.
Chapter 9
Symmetric, Hermitian and normal matrices
Not everything that one thinks, should one say; not everything that one says, should one write, and not everything that one writes, should one publish. Dictum of the Soloveitchik family, cited in [12], p. 135 In this chapter we shall focus primarily on the inner product space equipped with the standard inner product
en
(9.1) The subscript st will be used initially to emphasize the dependence of the result under discussion on the inner product, especially in the formulation of the result, but may be dropped when the intent is clear from the surrounding content. A number of facts that are easily available from the analysis in Chapter 8 will be re-proved in this setting by different arguments because of the importance of both the facts and the methods. Moreover, in many respects this chapter is a natural continuation of Chapter 6, since it focuses on matrices and the implications of extra structure of a matrix A on the corresponding Jordan forms J. Recall that a matrix A E lF nxn is said to be symmetric if A = AT. It is said to be Hermitian if A = AH. If A E jRnxn, then A is Hermitian if and only if it is symmetric. But this is not true if A E C nxn. Thus, for example, the matrix
A=
[!
~] is symmetric but not Hermitian,
-
185
9. Symmetric, Hermitian and normal matrices
186
whereas the matrix
B=
[~i ~] is Hermitian but not symmetric.
A matrix A E lF pxq is said to be isometric if AHA = I q • A matrix A E R nxn is said to be an orthogonal matrix if AT A= In.
A matrix A E
e nxn is said to be a unitary matrix if AHA=In
.
The preceding three definitions are linked to the standard inner product; see Exercises 9.1, 9.2 and 9.13. Exercise 9.1. Show that the columns of an n x n orthogonal matrix form an orthonormal family in R n with respect to the standard inner product (9.1). Exercise 9.2. Show that the columns of an n x n unitary matrix form an orthonormal family in en with respect to the standard inner product (9.1).
e
Exercise 9.3. Let A E nxn . Show that AHA
(Ax, Ax) st
=
(x, x) st
= In if and only if
for every x E en.
Our first main objective is to show that every Hermitian matrix is diagonalizable; i.e., A = U DU- 1 , where D is a diagonal matrix with real entries and U may be chosen to be unitary.
9.1. Hermitian matrices are diagonalizable Lemma 9.1. Let A E e nxn be a Hermitian matrix. Then:
(1) The eigenvalues of A are real (even if A is complex). (2) Eigenvectors corresponding to distinct eigenvalues of A are orthogonal with respect to the standard inner product (9.1).
Proof. Let u, v E en be such that Au of nonzero vectors u and v. Then a(u, v}st
= au and Av = j3v for some pair
= (au, v}st = (Au, v}st = (u, AH v) st = (u, Av) st = (u, j3v}st = {j(u, v}st.
Therefore, (9.2)
{a -{j)(u, v}st = 0 and
(a - a)(u, u}st =
o.
The second equality, which follows from the first by choosing v = u and a = /3, implies that
a=a, and hence that the eigenvalues of A are automatically real. Thus, 73 = /3, and if a =I /3, then the first formula in (9.2) implies that u is orthogonal to v. 0 Exercise 9.4. Let A E C pxq • (a) Show that RA nNAH = {o}. (b) Show that if A = AH, then p = q and
RA nNA
= {o}.
Lemma 9.2. If A E C nxn is Hermitian, then .N(A->.In)k
= N(A->.In )
for every positive integer k and every complex number A.
Proof. If A is not an eigenvalue of A, then A - A1n and (A - AIn)k are both invertible matrices and hence N(A->.In ) = N(A->.In)k = {o}. On the other hand, if A is an eigenvalue of A and u E N(A->.In)k+l for some positive integer k, then the vector v = (A - AIn)ku
belongs to R(A->.In ) nN(A->.In)' Moreover, since A E R , the matrix A - AIn is also Hermitian; i.e., A - AIn = (A - AIn)H. Therefore, v = 0, because R(A->.In ) n .N(A->.In)H = {O}, thanks to Exercise 9.4. Thus,
(A - AIn)k+1u = 0 ~ (A - AIn)ku = 0 for every positive integer k, which justifies the asserted identity.
0
Theorem 9.3. If A E c nxn is Hermitian, then A is unitarily equivalent to a diagonal matrix D E R nxn; i.e., there exists a unitary matrix U E c nxn and a diagonal matrix D E R nxn such that
(9.3) Proof. Let AI, ... ,Ak denote the distinct eigenvalues of A. Then, in view of Lemma 9.2, the algebraic multiplicity aj of each eigenvalue Aj is equal to the geometric multiplicity Ij' Therefore, each of the Jordan cells in the Jordan decomposition of A is 1 x 1; that is to say, the Jordan matrix J in the Jordan decomposition A = UJU- 1 must be of the form
where B Aj is an (Xj x (Xj diagonal matrix with Aj on the diagonal. In particular, J is a diagonal matrix. Consequently each column in the matrix U is an eigenvector of A. By Lemma 9.1, the eigenvectors corresponding to distinct eigenvalues are automatically orthogonal. Moreover, the columns in U corresponding to the same eigenvalue can be chosen orthonormal (by the Gram-Schmidt procedure). Thus, by choosing all the columns in U to have norm one, we end up with a unitary matrix U. D Example 9.4. Let A be a 5 x 5 Hermitian matrix with characteristic polynomial p(A) = (A - Al)3(A - A2)2, where Al =I- A2. Then, by Theorem 9.3, dimN(A1Is-A) = 3 and dimN( A2ls- A ) = 2. Let UI, U2, U3 be an orthonormal basis for N(A1Is-A) and let U4, Us be an orthonormal basis for N( A2ls- A )' This can always be achieved by invoking the Gram-Schmidt method, in each nullspace separately, if need be. Therefore, since the eigenvectors of a Hermitian matrix that correspond to distinct eigenvalues are automatically orthogonal, the full set UI, ... ,Us is an orthonormal basis for C s. Thus, upon setting
U = [UI
and D
.•. us]
= diagonal{>.l, AI, AI, A2, A2},
one can readily check that
AU=UD and that U is unitary. Remark 9.5. Since U is unitary
{::::::::>
U is invertible and U- 1 = U H ,
the computation of the inverse of a unitary matrix is remarkably simple. Moreover,
U unitary
===> (Uu, UV)st = (u, v)st
for every choice of u, v ∈ ℂ^n.

Exercise 9.5. Show that if E = [A B; C D] is a 2p × 2p matrix with p × p blocks A = A^H, B = B^H, C = C^H and D = D^H, then λ ∈ σ(E) ⟺ λ̄ ∈ σ(E). [HINT: It suffices to focus on … .]
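Theorem 9.3 is easy to experiment with numerically. The following sketch is an illustration added here (it is not part of the original text): it uses numpy's eigh routine, which is designed for Hermitian matrices, to produce a unitary U and a real diagonal D satisfying (9.3); the random test matrix is an arbitrary choice.

import numpy as np

# Build a Hermitian test matrix A = B + B^H (any Hermitian A would do).
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = B + B.conj().T

# eigh is tailored to Hermitian matrices: it returns real eigenvalues
# and a unitary matrix U whose columns are orthonormal eigenvectors.
eigvals, U = np.linalg.eigh(A)
D = np.diag(eigvals)

print(np.allclose(A, U @ D @ U.conj().T))        # A = U D U^H, as in (9.3)
print(np.allclose(U.conj().T @ U, np.eye(5)))    # U is unitary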
9.2. Commuting Hermitian matrices

Theorem 9.6. Let A ∈ ℂ^{n×n} and B ∈ ℂ^{n×n} be Hermitian matrices. Then AB = BA if and only if there exists a single unitary matrix U ∈ ℂ^{n×n} that diagonalizes both A and B.

Proof. Suppose first that there exists a unitary matrix U ∈ ℂ^{n×n} such that D_A = U^H A U and D_B = U^H B U are both diagonal matrices. Then, since D_A D_B = D_B D_A,
AB = U D_A U^H U D_B U^H = U D_A D_B U^H = U D_B D_A U^H = U D_B U^H U D_A U^H = BA.
The proof of the converse is equally simple in the special case that A has n distinct eigenvalues λ_1, ..., λ_n, because then the formulas
Au_j = λ_j u_j, j = 1, ..., n, and A(Bu_j) = B(Au_j) = λ_j (Bu_j)
imply that Bu_j = β_j u_j for j = 1, ..., n and some choice of β_1, ..., β_n ∈ ℂ. But, since u_1, ..., u_n may be chosen orthonormal, this is the same as to say that
AU = U diag{λ_1, ..., λ_n} and BU = U diag{β_1, ..., β_n}
for some unitary matrix U ∈ ℂ^{n×n}.
Suppose next that AB = BA and A has k distinct eigenvalues λ_1, ..., λ_k with geometric multiplicities γ_1, ..., γ_k, respectively. Then there exists a set of k isometric matrices U_1 ∈ ℂ^{n×γ_1}, ..., U_k ∈ ℂ^{n×γ_k} such that U = [U_1 ⋯ U_k] is unitary and AU_j = λ_j U_j. Therefore, for j = 1, ..., k,
A(BU_j) = BAU_j = λ_j (BU_j),
which implies that the columns of the matrix BU_j belong to N_{(A−λ_j I_n)} and hence, since the columns of U_j form a basis for that space, that there exist matrices C_j such that
BU_j = U_j C_j for j = 1, ..., k.
The supplementary formulas
C_j = U_j^H U_j C_j = U_j^H B U_j = C_j^H for j = 1, ..., k
exhibit C_j as a γ_j × γ_j Hermitian matrix. Therefore, upon writing C_j = W_j D_j W_j^H for j = 1, ..., k, with W_j unitary and D_j diagonal, and setting
V_j = U_j W_j for j = 1, ..., k and W = diag{W_1, ..., W_k},
one can readily check that
AV_j = AU_j W_j = λ_j U_j W_j = λ_j V_j for j = 1, ..., k
and
BV_j = BU_j W_j = U_j C_j W_j = U_j W_j D_j = V_j D_j for j = 1, ..., k.
Thus, the matrix V = [V_1 ⋯ V_k] = UW is a unitary matrix that serves to diagonalize both A and B. □
9.3. Real Hermitian matrices

In this section we shall show that if A = A^H and A ∈ ℝ^{n×n}, then the unitary matrix U in formula (9.3) may also be chosen in ℝ^{n×n}.

Theorem 9.7. If A = A^H and A ∈ ℝ^{n×n}, then there exist an orthogonal matrix Q ∈ ℝ^{n×n} and a real diagonal matrix D ∈ ℝ^{n×n} such that
(9.4)  A = QDQ^T.

Proof. Let μ ∈ σ(A) and let u_1, ..., u_ℓ be a basis for the nullspace of the matrix B = A − μI_n. Then, since B ∈ ℝ^{n×n}, the real and imaginary parts of the vectors u_j also belong to N_B: if u_j = x_j + iy_j with x_j and y_j in ℝ^n for j = 1, ..., ℓ, then
A(x_j + iy_j) = μ(x_j + iy_j) ⟹ Ax_j = μx_j and Ay_j = μy_j for j = 1, ..., ℓ;
i.e., the vectors
x_j = (u_j + ū_j)/2 and y_j = (u_j − ū_j)/(2i)
also belong to N_B. Moreover, since
span{x_1, ..., x_ℓ, y_1, ..., y_ℓ} = N_B,
ℓ of these vectors form a basis for N_B. Next, by invoking the Gram-Schmidt procedure, we can find an orthonormal basis of ℓ vectors in ℝ^n for N_B. If A has k distinct eigenvalues λ_1, ..., λ_k, let Q_i, i = 1, ..., k, denote the n × γ_i matrix that is obtained by stacking the vectors that are obtained by applying the procedure described above to B_i = A − λ_i I_n for i = 1, ..., k. Then AQ_i = λ_i Q_i and
A[Q_1 ⋯ Q_k] = [Q_1 ⋯ Q_k] D, where D = diag{λ_1 I_{γ_1}, ..., λ_k I_{γ_k}}.
Moreover, the matrix Q = [Q_1 ⋯ Q_k] is an orthogonal matrix, since all the columns in Q have norm one and, by Lemma 9.1, the columns in Q_i are orthogonal to the columns in Q_j if i ≠ j. □

Lemma 9.8. If A ∈ ℝ^{p×q}, then
max{‖Ax‖_st : x ∈ ℂ^q and ‖x‖_st = 1} = max{‖Ax‖_st : x ∈ ℝ^q and ‖x‖_st = 1}.

Proof. Since A ∈ ℝ^{p×q}, A^H A is a real q × q Hermitian matrix. Therefore, A^H A = QDQ^T, where Q ∈ ℝ^{q×q} is orthogonal and D ∈ ℝ^{q×q} is diagonal. Let δ = max{λ : λ ∈ σ(A^H A)}, let x ∈ ℂ^q and let y = Q^T x. Then δ ≥ 0 and
‖Ax‖²_st = ⟨A^H Ax, x⟩_st = ⟨QDQ^T x, x⟩_st = ⟨DQ^T x, Q^T x⟩_st = ⟨Dy, y⟩_st
= Σ_{j=1}^q d_{jj} y_j ȳ_j ≤ δ Σ_{j=1}^q y_j ȳ_j = δ‖y‖²_st = δ‖Q^T x‖²_st = δ‖x‖²_st.
Thus,
max{‖Ax‖_st : x ∈ ℂ^q and ‖x‖_st = 1} = √δ.
However, it is readily seen that this maximum can be attained by choosing x = Qe_1, the first column of Q. But this proves the claim, since Qe_1 ∈ ℝ^q. □
9.4. Projections and direct sums in 𝔽^n

• Projections: A matrix P ∈ 𝔽^{n×n} is said to be a projection if P² = P.
• Orthogonal projections: A matrix P ∈ 𝔽^{n×n} is said to be an orthogonal projection (with respect to the standard inner product (9.1)) if P² = P and P^H = P.

Thus, for example,
P = [1 a; 0 0]
is a projection, but it is not an orthogonal projection with respect to the standard inner product unless a = 0.

Lemma 9.9. Let P ∈ 𝔽^{n×n} be a projection, let R_P = {Px : x ∈ 𝔽^n} and let N_P = {x ∈ 𝔽^n : Px = 0}. Then
(9.5)  𝔽^n = R_P ∔ N_P.

Proof. Clearly x = Px + (I_n − P)x for every vector x ∈ 𝔽^n. Therefore, since Px ∈ R_P and (I_n − P)x ∈ N_P, it follows that 𝔽^n = R_P + N_P. It remains only to show that the indicated sum is direct. This is left to the reader as an exercise. □

Exercise 9.6. Show that if P is a projection on a vector space 𝒰 over 𝔽, then R_P ∩ N_P = {0}.
Exercise 9.7. Let e_j denote the j'th column of the identity matrix I_4 for j = 1, ..., 4, and let u, w_1, w_2, w_3 and w_4 be given vectors in 𝔽^4. Compute the projection of the vector u onto the subspace 𝒱 with respect to the direct sum decomposition 𝔽^4 = 𝒱 ∔ 𝒲 when:
(a) 𝒱 = span{e_1, e_2} and 𝒲 = span{w_1, w_2}.
(b) 𝒱 = span{e_1, e_2} and 𝒲 = span{w_3, w_4}.
(c) 𝒱 = span{e_1, e_2, w_1} and 𝒲 = span{w_4}.
[REMARK: The point of this exercise is that the coefficients of e_1 and e_2 are different in all three settings.]
Lemma 9.10. Let 𝔽^n = 𝒱 ∔ 𝒲 and let V = [v_1 ⋯ v_k], where {v_1, ..., v_k} is a basis for 𝒱, and let W = [w_1 ⋯ w_ℓ], where {w_1, ..., w_ℓ} is a basis for 𝒲. Then:
(1) The matrix [V W] is invertible.
(2) The projection P_𝒱 of 𝔽^n onto 𝒱 with respect to the decomposition 𝔽^n = 𝒱 ∔ 𝒲 is given by the formula
(9.6)  P_𝒱 = V [I_k 0] [V W]^{-1}.

Proof. Let u ∈ 𝔽^n. Then there exists a unique vector c ∈ 𝔽^n with entries c_1, ..., c_n such that
u = c_1 v_1 + ⋯ + c_k v_k + c_{k+1} w_1 + ⋯ + c_n w_ℓ
or, equivalently, u = [V W]c. Consequently, (1) holds and
P_𝒱 u = c_1 v_1 + ⋯ + c_k v_k = V [I_k 0] c, where c = [V W]^{-1} u. □

If P is an orthogonal projection, then formula (9.6) simplifies with the help of the following simple observation:

Lemma 9.11. Let A ∈ 𝔽^{p×q}, let u ∈ 𝔽^q and v ∈ 𝔽^p. Then
(9.7)  ⟨Au, v⟩_st = ⟨u, A^H v⟩_st.

Proof. This is a pure computation:
⟨Au, v⟩_st = v^H (Au) = (v^H A) u = (A^H v)^H u = ⟨u, A^H v⟩_st. □
Lemma 9.12. In the setting of Lemma 9.10, P_𝒱 is an orthogonal projection with respect to the standard inner product (9.1) if and only if
(9.8)  ⟨Vx, Wy⟩_st = 0 for every choice of x ∈ 𝔽^k and y ∈ 𝔽^ℓ.
Moreover, if (9.8) is in force, then
(9.9)  P_𝒱 = V (V^H V)^{-1} V^H.

Proof. Let P = P_𝒱. If P = P^H, then
⟨v_i, w_j⟩ = ⟨Pv_i, w_j⟩ = ⟨v_i, P^H w_j⟩ = ⟨v_i, Pw_j⟩ = ⟨v_i, 0⟩ = 0.
Thus, the constraint (9.8) is in force. Conversely, if (9.8) is in force, then, since V^H W = O_{k×ℓ} and W^H V = O_{ℓ×k}, it is readily checked that
(9.10)  [V W]^{-1} = [ (V^H V)^{-1} V^H ; (W^H W)^{-1} W^H ]
and hence that formula (9.6) simplifies to
P_𝒱 = V [I_k 0] [ (V^H V)^{-1} V^H ; (W^H W)^{-1} W^H ] = V (V^H V)^{-1} V^H,
as claimed. □

Exercise 9.8. Double check the validity of formula (9.10) by computing [V W]^{-1} [V W] and [V W] [V W]^{-1}.
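Formula (9.9) can be checked directly in a few lines. The sketch below is an added illustration (not from the original text); the matrix V is an arbitrary full-column-rank choice, and the checks confirm that V(V^T V)^{-1}V^T is a Hermitian projection whose range is the column span of V.

import numpy as np

rng = np.random.default_rng(1)
V = rng.standard_normal((6, 2))          # linearly independent columns

# Formula (9.9): the orthogonal projection onto the column span of V.
P = V @ np.linalg.inv(V.T @ V) @ V.T

print(np.allclose(P @ P, P))             # P is a projection
print(np.allclose(P, P.T))               # ... and it is Hermitian, hence orthogonal
x = rng.standard_normal(6)
print(np.allclose(V.T @ (x - P @ x), 0)) # x - Px is orthogonal to the range of V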
Lemma 9.13. Let P ∈ 𝔽^{n×n} be an orthogonal projection (with respect to the standard inner product (9.1)). Then
(1) N_P is orthogonal to R_P (with respect to the standard inner product).
(2) 𝔽^n = R_P ⊕ N_P.

Proof. Let u ∈ R_P and v ∈ N_P. Then
⟨u, v⟩_st = ⟨Pu, v⟩_st = ⟨u, P^H v⟩_st = ⟨u, Pv⟩_st = ⟨u, 0⟩_st = 0,
since P = P^H and v ∈ N_P. This completes the proof of (1). The second assertion is then immediate from Lemma 9.9. □

The next result includes a more general version of (2) of the last lemma that is often useful.
Lemma 9.14. Let A ∈ 𝔽^{p×q}, R_A = {Ax : x ∈ 𝔽^q}, N_A = {x ∈ 𝔽^q : Ax = 0}, etc. Then, with respect to the standard inner product:
(1) 𝔽^p = R_A ⊕ N_{A^H}.
(2) 𝔽^q = R_{A^H} ⊕ N_A.
(3) R_A = R_{AA^H} and R_{A^H} = R_{A^H A}.
(4) N_A = N_{A^H A} and N_{A^H} = N_{AA^H}.

Proof. Since rank A = rank A^T = rank A^H, the principle of conservation of dimension implies that
p = dim R_{A^H} + dim N_{A^H} = dim R_A + dim N_{A^H}.
Therefore, to complete the proof of the first assertion, it suffices to show that R_A is orthogonal to N_{A^H}. To this end, let u ∈ R_A and v ∈ N_{A^H}. Then, since u = Ax for some vector x ∈ ℂ^q,
⟨u, v⟩_st = ⟨Ax, v⟩_st = ⟨x, A^H v⟩_st = ⟨x, 0⟩_st = 0.
This completes the proof of the first assertion. The second then follows immediately by replacing A by A^H in the first.
In particular, (2) implies that every vector u ∈ 𝔽^q can be expressed as a sum of the form u = A^H v + w for some choice of v ∈ 𝔽^p and w ∈ N_A. Thus, Au = A(A^H v + w) = AA^H v. This shows that R_A ⊆ R_{AA^H}. Therefore, since the opposite inclusion is self-evident, equality must prevail, which proves the first formula in (3). Next, the implications
A^H A u = 0 ⟹ u^H A^H A u = 0 ⟹ ‖Au‖_st = 0 ⟹ Au = 0
yield the inclusion N_{A^H A} ⊆ N_A. Therefore, since the opposite inclusion is self-evident, equality must prevail. This justifies the first assertion in (4). The second assertions in (3) and (4) follow by interchanging A and A^H. □

Exercise 9.9. Show directly that R_A ∩ N_{A^H} = {0} for every p × q matrix A. [HINT: It suffices to show that ⟨u, u⟩_st = 0 for vectors u that belong to both of these spaces.]

The next lemma will be useful in the sequel, particularly in the development of singular value decompositions, in the next chapter.

Lemma 9.15. Let V ∈ 𝔽^{n×r} be a matrix with r columns that are orthonormal in 𝔽^n with respect to the standard inner product. Then r ≤ n. Moreover, if r < n, then we can add n − r columns to V to obtain a unitary matrix.

Proof. By Lemma 9.14,
𝔽^n = R_V ⊕ N_{V^H}.
By assumption, the columns v_1, ..., v_r span R_V. If n = r, then V is unitary and there is nothing left to do. If r < n, let w_{r+1}, ..., w_n be a basis for N_{V^H}. By the Gram-Schmidt algorithm, there exists an orthonormal family v_{r+1}, ..., v_n that also spans N_{V^H}. The matrix [V v_{r+1} ⋯ v_n] with columns v_1, ..., v_n is unitary. □

Exercise 9.10. Let A ∈ ℂ^{p×q} and B ∈ ℂ^{p×r}. Show that … .
9.5. Projections and rank

Lemma 9.16. Let P and Q be projection matrices in 𝔽^{n×n} such that ‖P − Q‖ < 1. Then rank P = rank Q.

Proof. The inequality ‖P − Q‖ < 1 implies that the matrix I_n − (P − Q) is invertible and hence that
rank P = rank{P(I_n − (P − Q))} = rank{PQ} ≤ min{rank P, rank Q}.
Therefore, rank P ≤ rank Q. On the other hand, since Q and P can be interchanged in the preceding analysis, the inequality rank Q ≤ rank P must also be in force. Therefore, rank P = rank Q, as claimed. □
9.6. Normal matrices

In Section 9.1, we showed that every Hermitian matrix A can be diagonalized "by" a unitary matrix U; i.e., AU = UD, where D is a diagonal matrix. This is such a useful result that it would be nice if it held true for other classes of matrices. If so, then a natural question is: What is the largest class of matrices which can be diagonalized "by" a unitary matrix? The answer is the class of normal matrices.

• Normal matrices: A matrix A ∈ ℂ^{n×n} is said to be normal if A^H A = AA^H.

Notice that in addition to the class of n × n Hermitian matrices, the class of n × n normal matrices includes the class of n × n unitary matrices.

Lemma 9.17. If A ∈ ℂ^{n×n} is normal, then
N_A = N_{A^H}.

Proof. This is a consequence of the following sequence of implications:
Au = 0 ⟺ ‖Au‖_st = 0 ⟺ ⟨A^H Au, u⟩_st = 0 ⟺ ⟨AA^H u, u⟩_st = 0 ⟺ ‖A^H u‖_st = 0 ⟺ A^H u = 0. □

Lemma 9.18. If A ∈ ℂ^{n×n} is a normal matrix, then
N_{A^k} = N_A
for every positive integer k.

Proof. Let k be a positive integer and let u ∈ N_{A^{k+1}}. Then the vector v = A^k u belongs to R_A ∩ N_A. Therefore, since N_A = N_{A^H} by Lemma 9.17 and R_A ∩ N_{A^H} = {0} by Lemma 9.14, it follows that v = 0. This proves that N_{A^{k+1}} ⊆ N_{A^k} and hence, as the opposite inclusion is self-evident, that N_{A^{k+1}} = N_{A^k} for every positive integer k. Thus, N_{A^k} = N_A for every positive integer k, as advertised. □
Exercise 9.11. Show that if A ∈ ℂ^{n×n} is normal, then (A^k)^H A = A(A^H)^k for every nonnegative integer k.

Exercise 9.12. Give a second proof of Lemma 9.18 by justifying the following sequence of implications:
A^k u = 0 ⟺ (A^k)^H A^k u = 0 ⟺ (A^H A)^k u = 0 ⟺ A^H A u = 0 ⟺ Au = 0.
[HINT: Lemma 9.2 is applicable to A^H A.]

Lemma 9.19. Let A ∈ ℂ^{n×n} be a normal matrix. Then
N_{(A−λI_n)^k} = N_{(A−λI_n)}
for every complex number λ ∈ ℂ and every positive integer k.

Proof. In view of Lemma 9.18, it suffices to observe that if A is normal, then A − λI_n is also normal for every complex number λ ∈ ℂ. □

The preceding result guarantees that normal matrices are diagonalizable. It remains to check that the diagonalization can be effected by unitary matrices.
Lemma 9.20. The eigenvectors of a normal matrix corresponding to distinct eigenvalues are automatically orthogonal.
Proof. Let A ∈ ℂ^{n×n} be a normal matrix and suppose that
Au = αu and Av = βv
for some pair of nonzero vectors u and v. Then, by Lemma 9.17, A^H v = β̄v and hence
α⟨u, v⟩_st = ⟨Au, v⟩_st = ⟨u, A^H v⟩_st = ⟨u, β̄v⟩_st = β⟨u, v⟩_st.
Therefore, (α − β)⟨u, v⟩_st = 0, which clearly implies that u is orthogonal to v if α ≠ β. □

The preceding analysis shows that normal matrices can be diagonalized by unitary matrices. The next theorem shows that this is the end of the line:

Theorem 9.21. Let A ∈ ℂ^{n×n}. Then there exists an orthonormal basis of ℂ^n consisting of eigenvectors of A if and only if A is normal.

Proof. Suppose that u_1, ..., u_n is an orthonormal family of eigenvectors of A, let λ_1, ..., λ_n denote the corresponding eigenvalues and let U = [u_1 ⋯ u_n] be the n × n matrix with columns u_1, ..., u_n and D = diag{λ_1, ..., λ_n}. Then AU = UD and, since U is unitary, A = UDU^H and A^H = UD^H U^H. Therefore, since
DD^H = D^H D = diag{|λ_1|², ..., |λ_n|²},
one can readily see that
AA^H = UDD^H U^H = UD^H DU^H = A^H A;
i.e., A is normal. Conversely, if A is normal, then there exists an orthonormal basis of ℂ^n made up of eigenvectors of A, since the eigenvectors corresponding to distinct eigenvalues are orthogonal by Lemma 9.20 and a linearly independent set of eigenvectors corresponding to the same eigenvalue can be replaced by an orthonormal set via the Gram-Schmidt orthogonalization method. □
Exercise 9.13. Show that if A ∈ 𝔽^{p×q} is an isometric matrix, then ⟨Au, Av⟩_st = ⟨u, v⟩_st for every choice of u, v ∈ 𝔽^q.

Exercise 9.14. Let U be an n × n unitary matrix. Show directly that the eigenvectors corresponding to distinct eigenvalues are orthogonal with respect to the standard inner product in ℂ^n and that if μ is an eigenvalue of U, then |μ| = 1.
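As a small numerical companion to Exercise 9.14 (an added illustration, not part of the original text), one can manufacture a unitary matrix — here by taking the Q factor of a random complex matrix, an arbitrary construction — and confirm that all of its eigenvalues lie on the unit circle.

import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(M)                          # Q is unitary

print(np.allclose(Q.conj().T @ Q, np.eye(4)))   # unitarity check
mu = np.linalg.eigvals(Q)
print(np.allclose(np.abs(mu), 1.0))             # every eigenvalue has |mu| = 1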
9.7. Schur's theorem

We have observed that every normal matrix is unitarily equivalent to a diagonal matrix. A rather useful theorem of Issai Schur states that every square matrix is unitarily equivalent to a triangular matrix:

Theorem 9.22. Let A ∈ ℂ^{n×n}. Then there exists a unitary matrix V ∈ ℂ^{n×n} such that
(9.11)  V^H A V = S
is upper triangular. Moreover, if A is similar to an upper triangular matrix B, then V can be chosen so that the diagonal entries of S coincide with the diagonal entries of B; i.e., s_jj = b_jj for j = 1, ..., n.
Proof. Let A ∈ ℂ^{n×n} and let U = [u_1 ⋯ u_n] be an invertible matrix in ℂ^{n×n} with columns u_1, ..., u_n such that B = U^{-1} A U is upper triangular. Such a matrix U always exists, because B may be taken equal to a Jordan form of A. Then it is readily checked that
Au_1 ∈ span{u_1}, Au_2 ∈ span{u_1, u_2}, ..., Au_j ∈ span{u_1, ..., u_j},
and hence that M_j = span{u_1, ..., u_j} is a j-dimensional subspace of ℂ^n such that
M_1 ⊂ M_2 ⊂ ⋯ ⊂ M_n and AM_j ⊆ M_j for j = 1, ..., n.
By the Gram-Schmidt procedure we can construct an orthonormal set of vectors v_1, ..., v_n such that
span{v_1, ..., v_j} = span{u_1, ..., u_j} for j = 1, ..., n.
Therefore,
Av_j ∈ span{v_1, ..., v_j} for j = 1, ..., n.
But this means that the matrix V with columns v_1, ..., v_n can be written as V = US for some invertible upper triangular matrix S and hence that
AV = AUS = UBS = VS^{-1}BS.
This proves the first assertion, since V is unitary and S^{-1}BS is upper triangular. Moreover, upon writing
S = D_1 + X_1, B = D_0 + X_0 and S^{-1} = D_2 + X_2,
where D_j is diagonal and X_j is strictly upper triangular (i.e., upper triangular with zero entries on the diagonal), it is readily checked that
S^{-1}BS = (D_2 + X_2)(D_0 + X_0)(D_1 + X_1) = D_2 D_0 D_1 + X_3,
where X_3 is strictly upper triangular. Hence, as D_2 D_0 D_1 = D_0 D_2 D_1 and D_2 = D_1^{-1}, the diagonal component of S^{-1}BS agrees with the diagonal component of B, as claimed. □

The usefulness of the decomposition (9.11) rests on the fact that the diagonal entries of S run through the eigenvalues of A repeated according to algebraic multiplicity. To verify this statement, recall that if A is an n × n matrix with eigenvalues μ_1, ..., μ_n, then
det(λI_n − A) = (λ − μ_1) ⋯ (λ − μ_n).
On the other hand, formula (9.11) implies that
det(λI_n − A) = det(λI_n − S) = (λ − s_11) ⋯ (λ − s_nn),
since S is triangular. In particular,
Σ_{j=1}^n |μ_j|² = Σ_{j=1}^n |s_jj|².
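Schur's decomposition is implemented in standard numerical libraries. The sketch below is an added illustration (not part of the original text) and assumes scipy is available; it computes the complex Schur form A = V S V^H and confirms that S is upper triangular and that its diagonal carries the eigenvalues of A, in line with the discussion above.

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# Complex Schur form: A = V S V^H with V unitary and S upper triangular.
S, V = schur(A, output='complex')

print(np.allclose(A, V @ S @ V.conj().T))      # A = V S V^H, as in (9.11)
print(np.allclose(np.tril(S, -1), 0))          # S is upper triangular
# The diagonal of S is the spectrum of A (up to ordering).
print(np.allclose(np.sort_complex(np.diag(S)),
                  np.sort_complex(np.linalg.eigvals(A))))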
Corollary 9.23. Let A be an n × n matrix with eigenvalues μ_1, ..., μ_n, repeated according to algebraic multiplicity. Then
(9.12)  Σ_{j=1}^n |μ_j|² ≤ Σ_{i,j=1}^n |a_ij|².

Proof. By Schur's theorem, there exists a unitary matrix U such that U^H A U = S is upper triangular and s_ii = μ_i for i = 1, ..., n. Therefore,
trace{A^H A} = trace{US^H U^H U S U^H} = trace{US^H S U^H} = trace{S^H S U^H U} = trace{S^H S},
which supplies the identity
Σ_{i,j=1}^n |s_ij|² = Σ_{i,j=1}^n |a_ij|²
for the entries s_ij of S and a_ij of A. Therefore,
Σ_{i=1}^n |μ_i|² = Σ_{i=1}^n |s_ii|² ≤ Σ_{i,j=1}^n |a_ij|². □
Schur used the estimate (9.12) to give a simple proof of one of Hadamard's inequalities (9.13):

Corollary 9.24. Let A ∈ ℂ^{n×n} and let γ = max{|a_ij| : i, j = 1, ..., n}. Then
(9.13)  |det A| ≤ γ^n n^{n/2}.

Proof. Let μ_1, ..., μ_n denote the eigenvalues of A repeated according to their algebraic multiplicity. Then det A = μ_1 μ_2 ⋯ μ_n, and the inequality between the geometric and arithmetic means followed by an application of the bound (9.12) implies that
|μ_1|² |μ_2|² ⋯ |μ_n|² ≤ ((|μ_1|² + |μ_2|² + ⋯ + |μ_n|²)/n)^n ≤ ((1/n) Σ_{i,j=1}^n |a_ij|²)^n ≤ (nγ²)^n.
The rest is plain sailing. □

Exercise 9.15. Let A, B ∈ ℂ^{n×n}. Show that
(a) det(λI_n − AB) = det(λI_n − BA) for every point λ ∈ ℂ.
(b) The matrices AB and BA have the same set of eigenvalues with the same algebraic multiplicities.
Exercise 9.16. Let A ∈ ℂ^{n×n}. Show that trace A = Σ_{j=1}^n ⟨Au_j, u_j⟩ for every orthonormal basis {u_1, ..., u_n} of ℂ^n. [HINT: trace A = trace(U^H AU).]

Exercise 9.17. Let μ_1, ..., μ_n denote the eigenvalues of A ∈ ℂ^{n×n}, repeated according to their algebraic multiplicity, and let δ_1(A), ..., δ_n(A) denote the eigenvalues of A^H A. Show that Σ_{j=1}^n |μ_j|² ≤ Σ_{j=1}^n δ_j(A).

Exercise 9.18. Let A, B ∈ ℂ^{n×n} and assume that AB = (AB)^H. Show that trace{(AB)^H AB} ≤ trace{(BA)^H BA}. [HINT: Exploit the preceding exercises.]
9.8. QR factorization

Lemma 9.25. Let A ∈ 𝔽^{p×q} and suppose that rank A = q. Then there exist a unique isometric matrix Q ∈ 𝔽^{p×q} and a unique upper triangular matrix R ∈ 𝔽^{q×q} with positive entries on the diagonal such that A = QR.

Proof. The existence of at least one factorization of the indicated form is a consequence of the Gram-Schmidt procedure. To verify the asserted uniqueness, suppose that there were two such factorizations: A = Q_1 R_1 and A = Q_2 R_2. Then
R_1^H R_1 = R_1^H Q_1^H Q_1 R_1 = A^H A = R_2^H Q_2^H Q_2 R_2 = R_2^H R_2
and hence
R_1 R_2^{-1} = (R_1^H)^{-1} R_2^H = (R_2 R_1^{-1})^H.
Therefore, since the left-hand side of the last equality is upper triangular while the right-hand side is lower triangular, it follows that
R_1 R_2^{-1} = D = (D^H)^{-1}
is a diagonal matrix with positive diagonal entries. Therefore, in view of the relation D = (D^H)^{-1}, it follows that D = I_q. Thus, Q_1 = Q_2 and R_1 = R_2, as claimed. □

Exercise 9.19. Let A ∈ ℝ^{4×3} be a matrix with rank A = 3. Find an isometric matrix Q ∈ ℝ^{4×3} and an upper triangular matrix R ∈ ℝ^{3×3} with positive entries on the diagonal such that A = QR.
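Numerically, the factorization of Lemma 9.25 can be obtained from any QR routine after a sign normalization. The sketch below is an added illustration (not part of the original text); the random test matrix is an arbitrary full-rank choice, and the sign flip enforces the positive diagonal required for uniqueness.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))            # full column rank with probability one

# numpy's reduced QR gives A = Q R with Q isometric (Q^T Q = I_3),
# but the diagonal of R need not be positive.
Q, R = np.linalg.qr(A)

# Flip signs column-by-column to enforce the normalization of Lemma 9.25.
signs = np.sign(np.diag(R))
signs[signs == 0] = 1.0
Q, R = Q * signs, signs[:, None] * R

print(np.allclose(A, Q @ R))               # A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))     # Q is isometric
print(np.all(np.diag(R) > 0))              # positive diagonal
print(np.allclose(R, np.triu(R)))          # R is upper triangular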
9.9. Areas, volumes and determinants

To warm up, consider the following:

Exercise 9.20. Let a, b ∈ ℝ² and let V = [a b] denote the 2 × 2 matrix with columns a and b. Then the area of the parallelogram generated by a and b is equal to |det V|.

Lemma 9.26. Let a, b ∈ ℝ^n and let V = [a b] denote the n × 2 matrix with columns a and b. Then the area of the parallelogram generated by a and b is equal to {det(V^H V)}^{1/2}.

Proof. To begin with, let us assume that a and b are linearly independent. Then the area of the parallelogram of interest (as drawn in Figure 1) is equal to h‖a‖, where h denotes the height of the parallelogram over the base a.

Figure 1. Parallelogram generated by a and b

Thus, the main chore is to figure out h. To this end, let 𝒜 = span{a}. Then, by formula (8.23),
P_𝒜 b = a(a^H a)^{-1} a^H b = ⟨b, a⟩ ‖a‖^{-2} a and b − P_𝒜 b = b − a(a^H a)^{-1} a^H b.
Thus,
h² = ⟨b − P_𝒜 b, b − P_𝒜 b⟩ = ⟨b − P_𝒜 b, b⟩ = b^H b − b^H a(a^H a)^{-1} a^H b = ‖b‖² − |⟨a, b⟩|² ‖a‖^{-2}.
Consequently,
(9.14)  (area)² = h² ‖a‖² = ‖a‖² ‖b‖² − |⟨a, b⟩|².
To complete the proof, observe that
V^H V = [a^H; b^H][a b] = [⟨a, a⟩ ⟨b, a⟩; ⟨a, b⟩ ⟨b, b⟩]
and hence that
(9.15)  det(V^H V) = ‖a‖² ‖b‖² − |⟨a, b⟩|².
Thus, we obtain
(9.16)  area = {det(V^H V)}^{1/2},
as claimed, at least when a and b are linearly independent. Formula (9.16) remains valid, however, even if a and b are linearly dependent because then both sides in formula (9.16) are equal to zero. □

As a byproduct of the proof of the last lemma we obtain the formula
(9.17)  |⟨a, b⟩|² = ‖a‖² ‖b‖² − (area)².
This yields another proof of the Cauchy-Schwarz inequality for vectors in ℝ^n:
|⟨a, b⟩| ≤ ‖a‖ ‖b‖, with equality if and only if the area is equal to zero, i.e., if and only if a and b are colinear.

Lemma 9.27. Let vol{a, b, c} denote the volume of the parallelepiped generated by the vectors a, b, c ∈ ℝ³ and let W = [a b c]. Then
(9.18)  vol{a, b, c} = |det W|.

Proof. Suppose first that the three given vectors are linearly independent. Then, in view of Lemma 9.26, the volume we are after is given by the formula
vol{a, b, c} = h {det[V^H V]}^{1/2},
where V is the 3 × 2 matrix with columns a and b and h is the distance of the point (c_1, c_2, c_3) from the plane 𝒱 generated by the vectors a and b. Thus, h is equal to the length of the vector c − P_𝒱 c = c − V(V^H V)^{-1} V^H c:
h² = ⟨c − V(V^H V)^{-1} V^H c, c − V(V^H V)^{-1} V^H c⟩ = ⟨c − V(V^H V)^{-1} V^H c, c⟩,
since P_𝒱 = V(V^H V)^{-1} V^H is an orthogonal projection, i.e.,
⟨c − P_𝒱 c, P_𝒱 c⟩ = ⟨P_𝒱(c − P_𝒱 c), c⟩ = ⟨0, c⟩ = 0.
Consequently,
h² = ‖c‖² − ⟨V(V^H V)^{-1} V^H c, c⟩ = c^H c − c^H V(V^H V)^{-1} V^H c,
and thus
(vol{a, b, c})² = {‖c‖² − c^H V(V^H V)^{-1} V^H c} det(V^H V).
The next step is to observe that
W = [V c] and W^H W = [V^H V  V^H c; c^H V  c^H c].
Therefore, by formula (5.14),
det W^H W = det(V^H V){c^H c − c^H V(V^H V)^{-1} V^H c},
which serves to complete the proof for the case of three linearly independent vectors, since |det W|² = det W^H W. However, the formula remains valid even if a, b and c are linearly dependent because then both sides of formula (9.18) are equal to zero. □
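Lemmas 9.26 and 9.27 can be verified numerically in a few lines. The following sketch is an added illustration (not from the original text); the vectors a, b, c are arbitrary choices, and the cross product is used only as an independent check of the area.

import numpy as np

a = np.array([1.0, 2.0, 0.5])
b = np.array([0.0, 1.0, 1.0])
c = np.array([2.0, 0.0, 1.0])

# Lemma 9.26: area of the parallelogram spanned by a and b in R^3.
V = np.column_stack([a, b])
area = np.sqrt(np.linalg.det(V.T @ V))
print(np.isclose(area, np.linalg.norm(np.cross(a, b))))   # independent check

# Lemma 9.27: volume of the parallelepiped spanned by a, b and c.
W = np.column_stack([a, b, c])
vol = abs(np.linalg.det(W))
print(np.isclose(vol, np.sqrt(np.linalg.det(W.T @ W))))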
In order to extend these formulas to higher dimensions, it is necessary to develop the definition of volume in ℝ^n for n > 3, which we shall not do. The monograph [71] provides a good introduction to this class of ideas.

Exercise 9.21. Let 𝒜 = {A ∈ ℝ^{n×n} : A^T = A} and let S denote the linear transformation from 𝒜 into ℝ^{kn} that is defined by the formula
S(A) = [Ae_1; ⋮; Ae_k],
where e_j denotes the j'th column of I_n for j = 1, ..., n and k ≤ n. Show that
(a) 𝒜 is an (n² + n)/2 dimensional vector space over ℝ.
(b) dim N_S = ((n − k)² + (n − k))/2.
(c) nk − dim R_S = k(k − 1)/2.

Exercise 9.22. Show that the conclusions of Exercise 9.21 remain valid if e_1, ..., e_k is replaced by any set v_1, ..., v_k of k linearly independent vectors in ℝ^n.
Exercise 9.23. Let {u_1, ..., u_k} and {v_1, ..., v_ℓ} be two sets of linearly independent vectors in ℝ^n and let S denote the linear transformation from ℝ^{n×n} into ℝ^{(k+ℓ)n} that is defined by the formula
S(A) = [Au_1; ⋮; Au_k; ⋯].
Show that
(a) dim N_S = (n − k)(n − ℓ).
(b) n(k + ℓ) − dim R_S = kℓ.

Exercise 9.24. Let v ∈ ℝ^n, A ∈ ℝ^{k×n} and let u_1, ..., u_q be linearly independent vectors in N_A ∩ ℝ^n such that ⟨v, u_j⟩_st = 0 for j = 1, ..., q. Show that if rank A = n − q, then v^T is a linear combination of the rows of A.

Exercise 9.25. Let
A = [a γ; β δ] and G(a) = [a β; β −ā]
be 2 × 2 unitary matrices with β ≥ 0. Show that β = √(1 − |a|²) and that
A = G(a) [1 0; 0 a_1], where |a_1| = 1.
[HINT: Consider G(a)^H A.]

A matrix A ∈ ℂ^{n×n} is said to be an upper Hessenberg matrix if a_ij = 0 for i ≥ j + 2. Thus, for example,
[a_11 a_12 a_13 a_14; a_21 a_22 a_23 a_24; 0 a_32 a_33 a_34; 0 0 a_43 a_44]
is a 4 × 4 upper Hessenberg matrix.

Exercise 9.26. Show that if A is a unitary 4 × 4 upper Hessenberg matrix with nonnegative entries on the subdiagonal and G(a) is as in Exercise 9.25, then A admits a factorization of the form
A = G_1(a_1) G_2(a_2) G_3(a_3) D_4
for some choice of constants a_j with |a_j| ≤ 1 and β_j = √(1 − |a_j|²) for j = 1, 2, 3 and |a_4| = 1, where G_j(a_j) denotes G(a_j) acting on rows and columns j and j + 1 of I_4 and D_4 = diag{1, 1, 1, a_4}. [HINT: Keep the conclusions of Exercise 9.25 in mind.]
Exercise 9.27. Formulate and solve an analogue of Exercise 9.26 for a 5 x 5 unitary upper Hessenberg matrix A with nonnegative entries on the subdiagonal.
9.10. Bibliographical notes

Exercise 9.18 is adapted from a statement in an expository article by Bhatia [8]. Exercises 9.21, 9.22 and 9.23 are adapted from Lemma 9.5 in Camino et al. [14]. Exercises 9.25-9.27 are adapted from a theorem in [2].
Chapter 10
Singular values and related inequalities
Let's throw everything away. Then there will be room for what's left. Irene Dym

• WARNING: From now on, unless explicitly indicated otherwise, ⟨u, v⟩ = v^H u and ‖u‖ = √(u^H u) for vectors u, v ∈ 𝔽^k, and ‖A‖ = ‖A‖_{2,2} for matrices A.
10.1. Singular value decompositions

Let A ∈ ℂ^{p×q}. Then the matrices A^H A and AA^H are Hermitian matrices of sizes q × q and p × p, respectively. Moreover, if A^H A u = αu and AA^H v = βv, then the formulas
α⟨u, u⟩ = ⟨A^H Au, u⟩ = ⟨Au, Au⟩ ≥ 0 and β⟨v, v⟩ = ⟨AA^H v, v⟩ = ⟨A^H v, A^H v⟩ ≥ 0
clearly imply that the eigenvalues of A^H A and AA^H are nonnegative. Therefore, by Theorem 9.3, there exists a unitary matrix U ∈ ℂ^{q×q} such that
(10.1)  U^H A^H A U = diag{s_1², ..., s_q²},
where the numbers s_1², ..., s_q² designate the eigenvalues of A^H A, and, in keeping with the usual conventions, it is assumed that they are indexed so that
s_1 ≥ s_2 ≥ ⋯ ≥ s_q ≥ 0.
The numbers s_j, j = 1, ..., q, are referred to as the singular values of A. If rank A = r, then s_r > 0 and s_j = 0 for j = r + 1, ..., q, if r < q.
Exercise 10.1. Show that if A ∈ ℂ^{p×q}, then
(10.2)  R_A = R_{AA^H} and rank A^H A = rank A = rank AA^H.

Exercise 10.2. Show that if A ∈ ℂ^{p×q} and 1 ≤ r < q, then
(10.3)  rank A^H A = r ⟺ s_r > 0 and s_{r+1} = 0.
Theorem 10.1. Let A ∈ ℂ^{p×q} be a matrix of rank r with singular values s_1, ..., s_q and let D = diag{s_1, ..., s_r} ∈ ℝ^{r×r} (with s_1 ≥ ⋯ ≥ s_r > 0). Then there exists a unitary matrix V ∈ ℂ^{p×p} and a unitary matrix U ∈ ℂ^{q×q} such that
(10.4)  A = V [D  O_{r×(q−r)}; O_{(p−r)×r}  O_{(p−r)×(q−r)}] U^H  if r < min{p, q},
        A = V [D; O_{(p−r)×r}] U^H                                if r = q,
        A = V [D  O_{r×(q−r)}] U^H                                 if r = p,
        A = V D U^H                                                 if r = p = q.
Moreover, if A ∈ ℝ^{p×q}, then the unitary matrices U and V in (10.4) may be chosen to have real entries, i.e., to be orthogonal matrices.

Proof. Let u_j denote the j'th column of the unitary matrix U that appears in formula (10.1). Then A^H A u_j = s_j² u_j for j = 1, ..., q. Let
(10.5)  Au_j = s_j v_j for j = 1, ..., r.
Then
⟨s_j v_j, s_k v_k⟩ = ⟨Au_j, Au_k⟩ = ⟨A^H A u_j, u_k⟩ = s_j² ⟨u_j, u_k⟩ for j, k = 1, ..., r.
Therefore, ⟨v_j, v_k⟩ = δ_{jk} for j, k = 1, ..., r. Thus, in matrix notation,
(10.6)  A[u_1 ⋯ u_r] = [v_1 ⋯ v_r] D.
If r = p, then the matrix V = [v_1 ⋯ v_r] is unitary. If r < p, then, in view of Lemma 9.15, we can add columns v_{r+1}, ..., v_p so that V = [v_1 ⋯ v_p] is a unitary matrix. Consequently, we can rewrite formula (10.6) as
(10.7)  A[u_1 ⋯ u_r] = V D if r = p and A[u_1 ⋯ u_r] = V [D; O_{(p−r)×r}] if r < p.
If r = q, then (10.7) yields the second formula in (10.4) if r < p and the fourth formula in (10.4) if r = p. If r < q, then, since
A^H A u_j = 0 for j = r + 1, ..., q ⟹ Au_j = 0 for j = r + 1, ..., q,
we can add the last q − r columns of U on the left of (10.7), balanced by q − r zero columns on the right of (10.7), to obtain
AU = V [D  O_{r×(q−r)}] if r = p and AU = V [D  O_{r×(q−r)}; O_{(p−r)×r}  O_{(p−r)×(q−r)}] if r < min{p, q},
which yields the remaining two formulas in (10.4). Finally, if A ∈ ℝ^{p×q}, then A^H A is a real Hermitian matrix and so, in view of the analysis in Section 9.3, the unitary matrix U in formula (10.1) may be chosen in ℝ^{q×q} and hence the unitary matrix V may be chosen in ℝ^{p×p}. □
Formula (10.4) is called the singular value decomposition of A.

Exercise 10.3. Show that if A ∈ ℂ^{p×q} and rank A = r, then the nonzero singular values of A coincide with the nonzero singular values of A^H; i.e., s_j(A) = s_j(A^H) for j = 1, ..., r.
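Formula (10.4) corresponds directly to the svd routine in numpy. The sketch below is an added illustration (not part of the original text); the test matrix is an arbitrary rank-deficient choice, and the last line anticipates the identity ‖A‖_{2,2} = s_1 established in Lemma 10.3 below.

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))   # 5 x 7, rank 3

# Full SVD: A = V diag(s) U^H in the notation of (10.4)
# (numpy returns the factors in the order u, s, vh with A = u @ diag(s) @ vh).
Vfac, s, Uh = np.linalg.svd(A, full_matrices=True)

r = int(np.sum(s > 1e-12 * s[0]))        # numerical rank
print(r)                                 # 3

# Rebuild A from the first r singular triples, as in formula (10.9) below.
V1, D, U1h = Vfac[:, :r], np.diag(s[:r]), Uh[:r, :]
print(np.allclose(A, V1 @ D @ U1h))
print(np.isclose(np.linalg.norm(A, 2), s[0]))   # largest singular value = ||A||_{2,2}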
Corollary 10.2. Let A ∈ ℂ^{p×q} be a matrix of rank r, r ≥ 1, and let v_1, ..., v_p and u_1, ..., u_q denote the columns of the unitary matrices V ∈ ℂ^{p×p} and U ∈ ℂ^{q×q} that appear in the singular value decomposition (10.4) of A. Let
(10.8)  U_1 = [u_1 ⋯ u_r], D = diag{s_1, ..., s_r} and V_1 = [v_1 ⋯ v_r].
Then
(10.9)  A = V_1 D U_1^H = Σ_{j=1}^r v_j s_j u_j^H and A^H = U_1 D V_1^H = Σ_{j=1}^r u_j s_j v_j^H.

Proof. The first formula is equivalent to formula (10.4), and the second follows easily from the first. □

Exercise 10.4. Show that if A ∈ ℂ^{p×q} is expressed in the form (10.9), then R_A = span{v_1, ..., v_r} and N_A = span{u_{r+1}, ..., u_q}.

Exercise 10.5. Show that if A ∈ ℂ^{p×q} is expressed in the form (10.9), then R_{A^H} = span{u_1, ..., u_r} and N_{A^H} = span{v_{r+1}, ..., v_p}.

Exercise 10.6. Let A ∈ ℂ^{p×q} be expressed in the form (10.9) and let
(10.10)  A^† = U_1 D^{-1} V_1^H.
Show that A^† A A^† = A^†, A A^† A = A, (A^† A)^H = A^† A and (A A^†)^H = A A^†.
In the next chapter, we shall identify the matrix A^† as the Moore-Penrose inverse of A and shall show that it is the only matrix in ℂ^{q×p} that meets the four conditions in Exercise 10.6.

Lemma 10.3. Let A ∈ ℂ^{p×q}. Then ‖A‖_{2,2} = s_1.
Proof. Let M denote the middle term in the decomposition (10.4), so that A = VMU^H, and let x ∈ ℂ^q and y = U^H x. Then, by (10.4),
‖Ax‖₂² = ‖VMU^H x‖₂² = ‖MU^H x‖₂² = ‖My‖₂² = Σ_{j=1}^r |s_j y_j|² ≤ s_1² Σ_{j=1}^r |y_j|² ≤ s_1² ‖y‖₂².
Therefore, since ‖y‖₂ = ‖x‖₂, it follows that ‖Ax‖₂ ≤ s_1 ‖x‖₂ for every x ∈ ℂ^q and hence that ‖A‖_{2,2} ≤ s_1. On the other hand, since
‖Au_1‖₂ = ‖MU^H u_1‖₂ = ‖Me_1‖₂ = s_1,
it follows that ‖A‖_{2,2} ≥ s_1 and hence that equality prevails. □

The next result extends Lemma 10.3 and serves to characterize all the singular values of A ∈ ℂ^{p×q} in terms of an approximation problem.

Lemma 10.4. If A ∈ ℂ^{p×q}, then its singular values s_1, ..., s_q can be characterized as
(10.11)  s_{j+1} = min{‖A − B‖_{2,2} : B ∈ ℂ^{p×q} and rank B ≤ j}.

Proof. Suppose first that rank B = rank A. Then clearly the choice B = A minimizes the norm in (10.11). Suppose next that rank A = r and rank B = k with k < r and let u_1, ..., u_q and v_1, ..., v_p denote the columns of a pair of unitary matrices U and V in a singular value decomposition of A. Let
x = Σ_{j=1}^{k+1} c_j u_j with Σ_{j=1}^{k+1} |c_j|² = 1.
Then
Ax = Σ_{j=1}^{k+1} c_j s_j v_j and Bx = Σ_{j=1}^{p} ⟨Bx, v_j⟩ v_j.
The next step is to show that x may be chosen to be orthogonal to all the vectors B^H v_1, ..., B^H v_{k+1}. In view of the chosen form of x, this is the same as to say that there exists a choice of coefficients c_1, ..., c_{k+1} such that
Σ_{j=1}^{k+1} v_i^H B u_j c_j = Σ_{j=1}^{k+1} c_j ⟨Bu_j, v_i⟩ = 0 for i = 1, ..., k + 1.
This last requirement can be written more transparently in terms of the matrices U_1 = [u_1 ⋯ u_{k+1}] and V_1 = [v_1 ⋯ v_{k+1}] and the vector c ∈ ℂ^{k+1} with components c_1, ..., c_{k+1} as
V_1^H B U_1 c = 0.
However, since rank V_1^H B U_1 ≤ k, there exists a vector c with ‖c‖ = 1 that meets this requirement. For such a choice of c,
‖Ax − Bx‖² = ‖Σ_{j=1}^{k+1} c_j s_j v_j − Σ_{j=k+2}^{p} ⟨Bx, v_j⟩ v_j‖²
           = Σ_{j=1}^{k+1} |c_j|² s_j² + Σ_{j=k+2}^{p} |⟨Bx, v_j⟩|²
           ≥ Σ_{j=1}^{k+1} |c_j|² s_j² ≥ s_{k+1}² Σ_{j=1}^{k+1} |c_j|² = s_{k+1}².
Thus,
(10.12)  ‖A − B‖_{2,2} ≥ s_{k+1} for every B ∈ ℂ^{p×q} with rank B ≤ k.
It remains to show that there exists a choice of B ∈ ℂ^{p×q} with rank B ≤ k that attains equality in (10.12). This is an easy consequence of the singular value decomposition and is left to the reader as an exercise. □
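Lemma 10.4 is the basis for low-rank approximation by truncating the singular value decomposition. The sketch below is an added illustration (not part of the original text): keeping the k largest singular triples produces a rank-k matrix B with ‖A − B‖_{2,2} = s_{k+1}, i.e., B attains the bound (10.12). The test matrix is an arbitrary choice.

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))

Vfac, s, Uh = np.linalg.svd(A, full_matrices=False)

k = 2
# Truncated SVD: keep the k largest singular triples.
B = Vfac[:, :k] @ np.diag(s[:k]) @ Uh[:k, :]

print(np.isclose(np.linalg.norm(A - B, 2), s[k]))   # ||A - B||_{2,2} = s_{k+1}
print(np.linalg.matrix_rank(B) == k)                # B has rank k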
Exercise 10.7. Show that there exists a choice of B E C pxq with rank B :::; k that attains equality in (10.12). Exercise 10.8. Let A E C pxq • Show that (10.13)
IIAII
= max {1(Ax,y}1 : x E C q ,
Y E C P , and Ilxll
= Ilyll =
I}.
Exercise 10.9. Let A E C pxq . Show that (10.14)
Exercise 10.10. Let A E C pxq and suppose that SI (A) :::; 1. Show that the matrix Ip-AB is invertible for every choice of BE C qxp with sl(B) :::; 1 if and only if sl(A) < 1. Exercise 10.11. Let A E C nxn and let AI, ... of the matrix B=[fH
,A2n
denote the eigenvalues
~]
repeated according to their mutiplicity and indexed so that Al Express these eigenvalues in terms of the singular values of A. Exercise 10.12. Show that if A = AH E IIABII2 = IIBA2 BII.
c nxn and B =
BH E
~
...
~ A2n.
c nxn , then
Exercise 10.13. Redo Exercises 9.17 and 9.18 using singular value decompositions.
10.2. Complex symmetric matrices
c nxn is symmetric, then AH = A. Theorem 10.5. (Takagi) If A E c nxn
If A E
and A = AT, then there exists a unitary matrix W E C nxn and a diagonal matrix D such that (10.15) A=WDWT and DHD=diag{s~, ... ,s~}, where
SI ~ ••• ~ Sn
are the singular values of A.
Proof. Let 0"1 > ... > 0"£ denote the distinct singular values of A. Then the condition A = AT implies that the spaces Mi = N(AHA-(1~In) meet the constraint AMi ~ Mi for i = 1, ... ,f; i.e., if u E M i , then Au E Mi: AH A(Au)
= AA(Au) = A(AH Au) = A(O";u) = 0"; Au.
213
10.3. Approximate solutions of linear equations
e,
e
c
If dimMi = 1 for i = 1, ... , then = n and the matrix U E nxn with columns Ui E Mi of norm Iluill = 1 for i = 1, ... ,n is unitary and meets the supplementary condition
AU
D is a diagonal matrix and DH D = diag {s~, ... , s~} .
= U D, where
Consequently, W = U satisfies the conditions in (10.15). On the other hand, if dim Mi = ki > 1 for some i, then the construction of the matrix W is a little more complicated: let Ul E Mi with IiUlli = 1 and let VI
{
=
Ul O"iUl
+ AUI
if Ul and AUI are linearly dependent if UI and AUI are linearly independent .
Then it is readily checked that VI E Mi and AVI = al VI, where lall = O"i. Next, choose a nonzero vector U2 E Mi that is orthogonal to VI and define V2
=
{
U2 O"iU 2
+ AU2
if U2 and if U2 and
and check that V2 E Mi and
AV2
AU2 AU2
are linearly dependent are linearly independent
= a2V2, where la21
= O"i, and that
(V2, VI) = O. Continuing this way, generate an orthogonal basis Vb ... , Vki of Mi with the property AVj = ajvj and lajl = O"i for j = 1, ... , ki . Let Wi denote the n x ki matrix with columns WI, ... , Wki based on the normalized vectors
Then, since T-Wi Wi
= hi
and
AWi
= WiDi
.
where Di = dlag {ai, ... , G:kJ ,
it is readily checked that the matrix W = [WI ... Wi] with blocks Wi of 0 size n x k i is a unitary matrix that meets the conditions of (10.15).
10.3. Approximate solutions of linear equations Let A E C pxq and let bE CPo Then the equation
Ax=b has a solution x E C q if and only if b ERA. However, if b¢ RA, then a reasonable strategy is to minimize {IIAx - bll over x E C q }
lO. Singular values and related inequalities
214
with respect to some norm 11·11. The most convenient norm is 11·112, because it fits naturally with the standard inner product. More often than not we shall, as warned earlier, drop the subscript. Lemma 10.6. If A E C pxq , bE CP and P'RA denotes the orthogonal projection of CP onto RA, then
(10.16) with equality if and only if Ax = P'RAb. Moreover, if rank A = rand VI = [VI .. . v r ] is built from the first r columns in the matrix V in (10.4) and 81, .•• ,sr are the positive singular values of A, then r
P'RA b = Vi
(10.17)
VlH b
=
L (b, Vj)Vj j=1
and
(10.18)
II(Ip - P'RA)bI1 2
=
r
p
j=I
j=r+l
L I(b, Vj)12 = L
IIbl1 2 -
I(b, Vj)12.
Moreover, ifuI,.'. ,uq denote the columns in the matrix U in (10.4), UI = [UI ... Ur] and U2 = [Ur+l ... u q ], then the vector
(10.19)
x=U,D-'vtb+U, [:,]
is a solution
0/ the equation
(10.20) for every choice of the coefficients
Cr+l, . ..
,cq .
Proof. Formula (10.17) is an application of Exercise 10.4 and Lemma 9.12. Formula (10.18) then follows from the decomposition p
r
b - P'RAb = L(b, Vj)Vj - L(b, Vj)Vj = j=1
j=l
p
L
(b, Vj)Vj
j=r+l
and the fact that the vectors VI, ... ,Vr form an orthonormal basis for RA. Finally, since the vectors Ul, ... ,uq form an orthonormal basis for C q, every vector x E C q can be expressed as q X=
LCjUj. j=1
10.4. The Courant-Fischer theorem
215
Thus, with the aid of Corollary 10.2, it follows that r
Ax = LSjCjVj j=1
and hence that Ax = PRA h if and only if Cj
= (h,SjVj)
Since there are no constraints on indeed a solution of (10.20).
for j Cj
= 1, ... ,r.
for j = r + 1, ...
,q, formula
(10.19) is 0
Exercise 10.14. In the setting of Lemma 10.6, show that if r < q and At denotes the Moore-Penrose inverse of A introduced in Exercise 10.6, then the vector x
(10.21)
= Ath = ~ (h, Vj) Uj ~
S· 3
j=1
may be characterized as the solution of (10.20) with the smallest norm.
Exercise 10.15. In the setting of Lemma 10.6, show that if r = q, then AH A is invertible and the solution x of equation (10.20) given by formula (10.19) may be expressed as x = (AH A)-1 AHh.
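The least squares discussion above can be illustrated numerically. The sketch below is an addition (not part of the original text); it uses numpy's pinv and lstsq, both of which return the minimum-norm least squares solution described in Exercise 10.14, and it checks that the residual is orthogonal to R_A, in keeping with Lemma 10.6. The rank-deficient test matrix is an arbitrary choice.

import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4)) @ np.diag([1.0, 1.0, 0.5, 0.0])   # rank 3 < q
b = rng.standard_normal(6)

# Minimum-norm least squares solution x = A^+ b, as in (10.21).
x = np.linalg.pinv(A) @ b

# lstsq also returns the minimum-norm least squares solution.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))

# The residual b - Ax is orthogonal to R_A, i.e., Ax is the projection of b onto R_A.
print(np.allclose(A.T @ (b - A @ x), 0))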
10.4. The Courant-Fischer theorem Let Sj denote the set of all j-dimensional subspaces of en for j = 0, ... , n, where it is to be understood that So = {O} and Sn = en.
Theorem 10.7. (Courant-Fischer) Let A E e nxn be a Hermitian matrix with eigenvalues AI,.·. ,An. (1) If Al
~
...
~
An, then
(10.22)
.
Aj = mm max XESj
{(AX,x) (
X,X
and x
): x E X
t=
o} ,
j = 1, ... ,n.
t= 0 }
, j = 1, ... ,n.
(10.23)
. {(AX,x) An +l-j = max mm :x XESj (x,x) (2) If Al (10.24) Aj
~
...
~
= XESj max
E
X
and x
An, then min {
(1
x , ~) : x E X x,x
and x
t= O}
, j
= 1, ...
,n.
10. Singular values and related inequalities
216 (10.25) An+l-j =
FJ~
max {
(~~~)}
:xEX
and x
i=
o} ,
j = 1, ... ,n.
Proof. Let UI, ... ,Un be an orthonormal set of eigenvectors of A corresponding to the eigenvalues AI, ... ,An. To prove (10.22), let Uj = span {Uj, ... ,un}. Then
n Uj
X
since dim X Then
=j
=J. {O} for every choice of X
and dimUj
= n + 1- j.
E Sj ,
Choose v E X
n Uj
with v =J.
o.
n
V = LCiUi i=j
and hence
n
(Av, v) =
L
n
AilCil 2
L ICil
~ Aj
i=j
2
= Aj(V, v} .
i=j
Therefore,
(AX,x) max { ( ): x E X
x,x
and x =J. 0
}
~ Aj ,
for every X E 5j, which in turn implies that
. max {(AX,x) : x E X mm
xesj
(x, x}
and
On the other hand, as
(Ax, x) max { (x, x} : x E span {UI, ... ,Uj}
and x =J. 0 }
= Aj ,
it follows that
. {(AX,x) : XEX mmmax (x,x)
xeS;
and hence that equality prevails. To prove (1O.23), let Wj = span{uI, ... ,Un-j+l} . Then X n Wj =J. {o}
for every X E 5j. Thus, for every X E Sj, we can find a nonzero vector W E X n Wj. But this implies that n-j+l W=
L
i=1
CiUi
10.4. The Courant-Fischer theorem
217
and hence that n-j+l
(Aw, w)
L
=
n-j+l
L
Ail cil :s; An -j+1 2
i=l
ICil 2 = An -j+1 (w, w).
i=l
Therefore, min {
(Ax, x) : x EX (x, x)
and x
i= 0 } :s;
(Aw,w) (w, w) :s; An -j+1 .
Thus, as the space X is an arbitrary member of Sj, it follows that
. max mm
XESj
{(AX, ( x) ): X,X
\ and x T~} 0 :s; I\n-j+l .
xEX
To get the opposite inequality, it suffices to note that min {
(1:~) :
y E span {Un-j+l, ...
,Un} and y i= 0 } = An -j+1 .
The verification of (10.24) and (10.25) is left to the reader.
o
Exercise 10.16. Show that if aij denote the entries of the matrix
A=
[~ ~ ~l
Exercise 10.17. Show that if the eigenvalues of A = AH are ordered so that Al ~ ... ~ An, then formulas (10.24) and (10.25) hold. Exercise 10.18. Show that if A E e nxn is a Hermitian matrix with eigenvalues Al :s; ... :s; An and Xl. denotes the orthogonal complement of X in en, then An-j+l =
min max
XESj
{(1x,~) x,x
:x
E
Xl.
and x
i=
o}
for j = 1, ... ,no
Exercise 10.19. Show that if A E e nxn is a Hermitian matrix with eigenvalues Al :s; ... :s; An and Xl. denotes the orthogonal complement of X in en, then Aj =
max min { (1 x , ~) : x E Xl. x,x
XESj
and x
i=
e
o}
for j = 1, ... ,n.
Lemma 10.8. Let A, B E nxn and let sj{A) and sj{BA), j = 1, ... ,n, denote the singular values of A, Band BA, respectively. Then:
Sj{BA) :s; IIBIIsj{A).
218
Proof.
10. Singular values and related inequalities
SI (A) ~
Since
s;(A)
...
~
sn(A),
max min {(A H Ay,y) : y E Y and lIyll
YESj
max min {IiAyIl2 : y E Y and lIyll
YESj
= I}
= I} .
Correspondingly,
s;(BA) = max min {IIBAyIl2 : y YESj
E
Y and Ilyll
=
I} .
Therefore, as IIBAYII ~ IIBIlIIAYII , it follows that min {IiBAyIl2 : y E Y and Ilyll = I} ~ IIBII 2 min{IIAYIl2: y E Y and lIyll =
I}
and hence that
Sj(BA)2 ~ IIBII 2sj(A)2 for j = 1, ... ,n. This serves to prove the lemma, since sj(BA) tion.
~
0 and sj(A)
~
0 by defini0
Exercise 10.20. Let A E c nxn , B E c nxn and let sj(AB) and sj(A) denote the singular values of the matrices AB and A, respectively. Show that 8j(AB) ~ IIBlIsj(A). Exercise 10.21. Let A E c nxn be a Hermitian matrix with eigenvalues -AI ~ ..• ~ -An. Show that -An ~ min aii ~ max aii ~ -AI.
10.5. Inequalities for singular values Lemma 10.9. Let A E C nxn, let of A and let 1 ~ k ~ n. Then
81 ~ ... ~ 8 n
det(W H AH AW) ~ s~ ... s~ det(WHW)
denote the singular values
for every choice of WE C nxk .
Proof. Theorem 10.1 guarantees the existence of a unitary matrix U E C nxn such that U HAH AU = D2
with
D = diag{sI, ... ,sn}.
Therefore, W HUD 2U HW
WHAHAW =
WHUDDUHW
= BBH ,
10.5. Inequalities for singular values
where B formula,
219
= F D and F = W H U. Let C = BH. Then, by the Binet-Cauchy
det(BB H) =
L
,~
B ( .1, ...
..
I~Jl <"'<Jk~n
31, .. · ,3k
)
C( il,". ,ik) . 1, ... , k
Consequently, as
B ( .1, ... , ~ ) = C ( il,'" ,ik 31, ... ,3k 1, ... , k
(10.26)
) ,
it follows that (10.27) Moreover, since B (
1, ... ,k ) 31, .. · ,3k
=
and 0
~
sil ... Sjk
IB
8j1··· 8jk F
~ SI ... Sk,
( .1, ... ,
~
( .1, ... ,k. ) 311 .. · ,3k
it follows that
)
31. .. · ,3k
12 ~ s~ .. · s~
IF ( ?, ... ,~
31. .. · ,3k
) 12
and hence, by formula (10.27),
det(BB H ) ~ s~ ... s~ det(FF H ). Thus,
as claimed. Exercise 10.22. Verify formula (10.26).
o
10. Singular values and related inequalities
220
c
Let A E nxn , let Sl ~ '" ~ Sn denote the singular values of A and suppose that the eigenvalues of A are repeated according to algebraic multiplicity and are indexed so that IAll ~ IA21 ~ ... ~ IAnl. Then
Lemma
10.10~
IAll .. ·IAkl$sl···Sk for k=1, ... ,n. By Schur's theorem, there exists a unitary matrix U such that = Aj for j = 1, ... , n. Thus, if
Proof.
T = U H AU is upper triangular and tjj VH = [h Okx(n-k)], then
AI'" Ak
= det{VHTV} = det Tn,
where Tn denotes the upper left-hand k x k corner of T. Moreover, since T is upper triangular, V HTH TV _ VH -
[Tlf. 0] [Tn0 TM T~
T12] [ Ik ] _ [ Tn ] - TH T22 0 0 - n
T
11·
Therefore, IAI ... Akl 2 _
Idet{V HTV}1 2 = Idet Tnl 2 = det{T!iTll} det{VHTHTV} = det{VHU H AHUU H AUV} det{VHU H AH AUV}
<
s~ ... s%det{VHUHUV},
_
by Lemma 10.9. This is equivalent to the claimed inequality, since det{VHUHUV} = 1
for the given choice of V H when U is unitary.
o
Corollary 10.11. Let A E C nxn with singular values Sl ~ ... ~ Sn and eigenvalues AI, ... , An repeated according to algebraic multiplicity and indexed so that IAll ~ ... ~ IAnl. Then Sk = 0 ==> Ak = 0 (Le., IAkl > 0 ==> Sk > 0).
Lemma 10.12. Let {al, ... ,an} and {bl, ... ,bn } be two sequences of real numbers such that al ~ a2 ~ ... ~ an; bl ~ b2 ~ ... ~ bn and k
k
j=l
j=1
L aj $ L bj, Then
k
Le
aj
j=l Proof.
for k = 1, ... , n .
k
$
Le
bj
,
for k = 1, ... ,n .
j=1
It is readily checked that eX =
JCxoo (x-s)eSds
10.5. Inequalities for singular values
221
or, equivalently, in terms of the notation _{x-sfor X-s>O (x-s )+o x-s:5:0, that
Consequently,
and k
ii = Lk
L
j=l
1
00
(bj - s)+eSds .
j=l-oo
Thus, in order to establish the stated inequality, it suffices to show that k
k
L(aj - s)+ :5: ~)bj - s)+ j=l
j=l
for every s E R To this end, let a(s)
= (al
- s)+
+ ... + (ak -
s)+ and (3(s)
= (b l
-
s)+
+ ... + (b k -
s)+
and consider the following cases: (1) If s < ak, then a(s)
(al-s)+···+(ak- s )
< (bl-S)+···+(bk-s) < (bl -s)++···+(bk- S )+={3(s).
= 2, ... ,k, then (al - s)+ + ... + (aj - s)+
(2) If aj:5: s < aj-l, for j a(s)
(al - s)
+ ... + (aj-1 -
s)
< (b1- S )+···+(bj-1- S)
< (bl-sh+···+(bk-S)+ (3) If s
~
aI, then a(s) = 0 and so (3(s)
~
= (3(s) .
a(s), since (3(s)
~
O.
0
Theorem 10.13. Let A E c nxn , let Sl, ... ,Sn denote the singular values of A and let AI, ... ,An denote the eigenvalues of A repeated according to algebraic multiplicity and indexed so that IAll ~ ... ~ IAnl. Then k
(1)
k
LIAjIP:5:L~ j=1
j=l
forp>Oandk=l, ... ,no
10. Singular values and related inequalities
222
(2)
k
k
j=l
j=l
II (1+rIAjl) S II (l+rsj)
Proof.
forr > 0 and k = 1, ... ,n .
Lemma 10.10 guarantees that IAll·· ·IAkl S Sl ... Sk
•
Suppose that IAk I > O. Then In IAll and hence, if p
+ ... + InlAkl S Insl +
... + Insk
> 0,
or, equivalently, lnlAll P + ... + In IAklP S Insi + ... + In~ . Consequently, Lemma 10.12 is applicable to the numbers In?;, j = 1, ... ,k, and yields the inequality
aj
elnl>'ll" + ... + e1nl>'kl P S e1nsi + ... + elns~
= In IAjIP, bj =
,
which is equivalent to IAII P + ... + IAkl P S si + ... + ~
.
Thus we have established the inequality (1) for every integer k E {1, ... , n} for which IAlcl > O. However, this is really enough, because if Ai = 0, then IAjl S Sj for j = l, ... ,n. Thus, for example, if n = 5 and IA31 > 0 but ~ = 0, then the asserted inequality (1) holds for k = 1,2,3 by the preceding analysis. However, it must also hold for k = 4 and k = 5, since A4 = 0 ==? A5 = 0 and thus IA41 S S4 and IA51 S S5. The second inequality may be verified in much the same way by invoking the formula
cp(x) =
1:
(x - s)cp"(s)ds
with
cp{x)
= In(l +
reX) and r > 0 .
This works because X
cp"(x) -- (1 +rerex)2 > - 0 £or every x
The details are left to the reader.
E
TIl> l.l'lo. •
D
Exercise 10.23. Verify the integral representation for cp(x), assuming that cp{x) , cp'{x) and cp"{x) are nice continuous functions that tend to zero quickly
10.5. Inequalities for singular values
223
enough as x ~ -00 so that the integrals referred to in the following hint converge. [HINT: Under the given assumptions on t.p,
t.p(x) =
i:
t.p'(s)ds =
i: (l
X
i: {i~
t.p"(U)dU} ds
dS) t.p"(u)du.]
Lemma 10.14. Let A E c nxn with singular values SI
81 ~ ... ~
Sn. Then
+ ... + Sk = max{ltrace(VHU AV)I : UU H = In, V
E C nxk and
VHV
= h}.
Proof. Let B = VHU AV and let Al(B), ... ,Ak(B) denote the eigenvalues of B repeated according to their algebraic multiplicity. Then, since k
trace B = L:Aj(B) ,
j=1 Theorem 10.13 implies that k
k
j=1
j=1
L: IAj(B)1 :::; L: 8j(B) .
Itrace BI ~
Moreover, by Lemma 10.8, 8j(V HUAV)
sj(B) =
<
IIVH IlIIU1I8j(A)11V1l
sj(A). Therefore, for every choice of U E c nxn and V and VHV = h, =
E
C nxk , with UHU = In
k
Itrace(VHUAV)1 ~ L:8j(A) . j=1 The next step is to show that there exists a choice of U and V of the requisite form for which equality is attained. The key ingredient is the singular value decomposition
A = V1SUf1, in which VI and Ul are unitary,
S = diag{sl, ... ,sn} and
8j =
sj(A).
In these terms, trace (VHU AV) = trace (V HUVI sUf1V} ,
224
10. Singular values and related inequalities
which, upon choosing
VH = [Ik OjUf and U = Ul V1H, simplifies to
trace(VHUAV) = trace{[h OjS[h OJH} =
Sl
+ ... +Sk· o
The next theorem summarizes a number of important properties of singular values. Theorem 10.15. Let A,B E c nxn and let sj(A) and sj(B), j
=
1, ... ,n,
denote the singular values of A and B, respectively. Then: (1) sj(A) = Sj(AH). (2) sj(BA) ~ IIBllsj(A).
(3) sj(AB)
~
sj(A)IIBIl.
(4) n~=lsj(AB) ~ n~=l sj(A)
n;=l sj(B).
(5) ~;=lsj(A + B) ~ ~;=l sj(A) + ~~=l sj(B). Proof. Items (1)-(3) are covered by Exercise 10.3, Lemma 10.8 and Exercise 10.20, respectively.
Next, a double application of Lemma 10.9 yields the inequalities
det{V HBH AH ABV} :::; S1 (A)2 .. , sk(A)2 det{VH BH BV} :::; s1(A)2.,. sk(A)2 s1 (B)2 ... 8k(B)2 det{VHV} for every matrix V E C nxk . Thus, if U is a unitary matrix such that
BH AH AB = U [
81 (AB)2
".
] UH , 8n (AB)2
the choice yields the formulas
81(AB)2
= [h OJ
[
", 8n(AB)2] [
=
[8 M B)' 8k(AB)'] ,
~1
10.6. Bibliographical notes
225
which leads easily to the inequality (4). Finally, the justification of (5) rests on Lemma 10.14 and the observation that for any unitary matrix U E c nxn and any V E C nxk with VHV = h,
trace{VHU(A + B)V}
=
trace{VHUAV} + trace{VHUBV}
and hence (by that lemma) that
Itrace{VHU(A + B)V}I
:::; Itrace{VHUAV}I k
+ Itrace{VHUBV} I
k
< ~:::>j(A) + LSj(B) . j=l j=l The exhibited inequality is valid for every choice of U and V of the indicated 0 form. Thus, upon maximizing the left-hand side, we obtain (5). Exercise 10.24. Let A E C nxn, let /3b ... ,/3n and "Yl, ... ,"In denote the eigenvalues of the Hermitian matrices
B
= (A + AH)j2
and C
= (A -
AH)j(2i),
respectively, and let A E O'(A). Show that
A+"X
/31 :::; -2- :::; /3n
and "11:::;
[HINT: If Ax = AX, then A + "X = (Ax, x)
A-"X 2 i :::; "In .
+ (x, Ax).]
Remark 10.16. A number of inequalities exist for the real and imaginary parts of eigenvalues. Thus, for example, if A E nxn and Aj = /3j + i'Yj, j = 1, ... , n, denote the eigenvalues of A, repeated according to algebraic multiplicity and indexed so that IA11 ~ ... ~ IAnl and B = (A - AH)j(2i), then
c
k
k
L !"Ijl :::; LSj(B) . j=l j=l See e.g., p. 57 of [35].
10.6. Bibliographical notes Theorem 10.5 was adapted from an article by Takagi [66]. The last section was adapted from Gohberg-Krein [35], which contains Hilbert space versions of most of the cited results, and is an excellent source of supplementary information.
Chapter 11
Pseudoinverses
"How long have you been hearing confessions?" "About fifteen years." "What has confession taught you about men?" ... "the fundamental fact is that there is no such thing as a grown up person .... " Andre Malraux [49] To set the stage, it is useful to recall that if A E lFPxq, then
A
is left invertible
{:::::} NA = {O} {:::::} rank A = q
and
A
is right invertible
{:::::} 'RA
= lF P {:::::} rank A = p.
Thus, if rank A < min {p, q}, then A is neither left invertible nor right invertible.
11.1. Pseudoinverses A matrix AO E lF qxp is said to be a pseudoinverse of a matrix A E lF pxq if
(11.1) It is readily checked that if A is left invertible, then every left inverse of A is a pseudoinverse, Le., (11.2)
BA = Iq => ABA = A and BAB = B.
Similarly, if A is right invertible, then every right inverse of A is a pseudoinverse, Le.,
(11.3)
AG = Ip => ACA = A and GAG
= G.
-
227
11. Pseudoinverses
228
However, although there are matrices A which are neither left invertible nor right invertible, every matrix A has a pseudoinverse. Moreover, AO is a pseudoinverse of A if and only if A is a pseudoinverse of A 0 •
Exercise 11.1. Let A be a 4 x 5 matrix such that EPA = U is an upper echelon matrix with pivots in the 11, 22 and 34 positions. Show that there exists an invertible 5 x 5 lower triangular matrix F and a 5 x 5 permutation matrix II such that
1 000
o IIFU T
=
1 0 0
0 0 1 0 000 0 o 000
Theorem 11.1. Every matrix A E lF pxq admits a pseudoinverse AO E lF qxP . Moreover, if rank A = r > 0 and the singular value decomposition (10.4) of A is expressed generically as (11.4)
A
=V
Orx(q-r) ] U H O(p-r)x(q-r) ,
[D O(p-r)xr
where V E lF Pxp and U E invertible, then
lF qxq
are unitary and D = diag {81' ...
,8 r }
zs
(11.5) is a pseudoinverse of A for every choice of B1 E lFrx(p-r) and B2 E IF(q-r)xr. Furthermore, every generalized inverse AO of A can be expressed this way.
Proof. If A = Opxq, then AO only pseudoinverse of A.
=
Oqxp
is readily seen to be the one and
Suppose next that rank A = r > 0 and note that every matrix can be written in the form
11 = U
[Rn R21
A E lF qxp
R12] VH , R22
where U, V are as in (11.4), Rn E lF rxr , R12 E lFrx(p-r), R21 E IF(q-r)xr and R22 E IF(q-r)x(p-r). The constraint AAA = A is met if and only if
i.e., if and only if Rn = D- 1 .
11.1. Pseudoinverses
Next, fixing Rll if and only if [
229
= D- 1 , we see that the second constraint XAX =
X is met
D- 1
R21
i.e., if and only if R21DR12 = R22· Thus, A is a pseudoinverse of A if and only if it can be expressed in the form
X-
1 U[DR21
R12] VH R21DR12 .
But this is exactly the assertion of the lemma (with Bl
= R12 and B2 = R21)' o
Lemma 11.2. Let A E lF pxq and let AO be a pseudoinverse of A. Then: (1) RAAo (2) NAAo
= RA. = NAo.
(3) dim RA = dim RAo. Proof.
Clearly RAAo ~ RA = RAAoA ~ RAAo.
Therefore, equality (1) prevails. On the other hand, NAo ~ NAAo ~ NAoAAo = NAo,
which serves to establish (2). Finally, by the principle of conservation of dimension (applied first to A ° and then to AA 0 ) and the preceding two formulas,
+ dim RAo dim NAAo + dim RAo,
p = dim NAo
=
whereas p
= dim NAAo + dim RAAo = dim NAAo + dim RA.
Assertion (3) drops out by comparing the two formulas for p. 0 It is instructive to verify assertions (1)-(3) of the last lemma via the decompositions (11.4) and (11.5). Exercise 11.2. Verify assertions (1)-(3) of Lemma 11.2 via the decompositions (11.4) and (11.5). Lemma 11.3. Let AO E lF qxp be a pseudoinverse of A E lF pxq • Then:
230
11. Pseudoinverses
(1) lF P = 'RA+NAo. (2) lF q = 'RAo+NA.
Proof.
First observe that AAo and AO A are both projections, since
(AAO)(AAO) = (AAO A)AO = AAo and Thus, as
(11.6)
'Rp+Np = lFk for any projection P
E
lF kxk
,
lF P = 'RAAo+NAAo and lF q = 'RAoA+NAo A . The first conclusion now drops out easily from Lemma 11.2. The second follows from the first since A ° is a pseudoinverse of A if and only if A is a pseudoinverse of A°. D
Remark 11.4. Lemma 11.2 exhibits the fact that if A ° is a generalized inverse for a matrix A E lF pxq , then NAo is a complementary subspace for 'RA in lF P and 'RAo is a complementary subspace for NA in lF q . Our next objective is to establish a converse statement. The proof will exploit the general form (11.5) for pseudoinverses and the following preliminary observation. Lemma 11.5. Let A E lF pxq , let AO E lF qxp be a pseudoinverse of A and suppose that rankA = r > 0 and that A and AO are expressed in the forms (11.4) and (11.5), respectively. Then:
(11.7)
'RA =
{v [0(p-r)xr Ir ]
(11.8)
NAo = { V
(11.9)
'RAo =
(11.10) Proof.
{u
NA = {
u : U E lFr} ,
[-I~_~l] v : v E lF P- r } [i:D] u:
U
,
E lFr} ,
u [Or4~:r)] v : v E IF(q-r)}
By definition,
{V [g g] = {V [g] [Ir {V [~]
'RA =
=
u:
UHx: x E lF q }
O]X:XElF q} U
E lFr} .
.
231
11.1. Pseudoinverses
Similarly, = {U
RAO
[~~l]
={U[~~l] =
{U [i;D]
[Ir DBI] VHx : x E lF [Ir
P}
DBI]X:XErp}
u: u
Err} .
Suppose next that x E NAo. Then
D-I] However, since U [ B2 is left invertible, this holds if and only if
Thus, upon writing
with
U
E
rr and v
E
rp-r, we see that
or equivalently that
Therefore, X
=
-DBI] V [ Ip-r v.
This proves that
and hence in fact serves to establish equality, since the opposite inclusion is self-evident. The formula for
NA is established in much the same way.
Exercise 11.3. Verify the formula for
NA that is given in Lemma 11.5.
0
232
11. Pseudoinverses
Remark 11.6. Formulas (11.7) and (11.8) confirm the already established fact that RA +NAo
{v [0 - {v [ =
-
-DB!
]
Ir (p-r)xr
I(p-r) x (p-r)
Ir O(p-r)xr
I(p-r) x (p-r)
[u]v :u
E lF r
and
v
E lF P - r }
-DBl. ] x: x E lF P }
= lF P ,
since both of the p x p matrices are invertible. Similarly, formulas (11.9) and (11.10) confirm the already established fact that RAo +NA =
{u [B~D
[u]v :u
E lF r
Orx(q-r)] Iq-r
and
v
E
lF q- r }
= lF q •
Exercise 11.4. Use formulas (11.7)-(11.10) to confirm that 'RA nNAo = {O} and 'RAo nNA = {O}.
Exercise 11.5. Show that, in the setting of Lemma 11.5, (11.11)
'RA
is orthogonal to NAo
NA
is orthogonal to
¢:::=:}
BI = Orx(p-r)
and (11.12)
Theorem 11.7. Let A E lF pxq and let respectively, such that 'RA+X
RAo X
¢:::=:}
B2
= O(q-r)xr .
and Y be subspaces oflF P and lF q
= lF P and NA+Y = lF q .
Then there exists a pseudoinverse AO of A such that NAo
=X
and 'RAo
= y.
Proof. Suppose first that X and Y are proper nonzero subspaces of lF P and lF q respectively, let r = rank A and let {Xl, ... ,xp - r } be a basis for X. Then, in terms of the representation (11.4), we can write
for some choice of C E lFrx(p-r) and E E IF(P-r)x(p-r). Thus, in view of (11.7), RA + X =
{v [O(p~r)xr
~] [~]
:
u E lF r and v E lF P - r } .
233
11.1. Pseudoinverses
Moreover, since
$$\mathcal R_A + \mathcal X = \mathbb F^p \iff E \text{ is invertible},$$
it follows that $E$ is invertible and
$$\mathcal R_A + \mathcal X = \left\{V\begin{bmatrix}I_r & CE^{-1}\\ 0_{(p-r)\times r} & I_{p-r}\end{bmatrix}\begin{bmatrix}u\\ Ev\end{bmatrix} : u\in\mathbb F^r \text{ and } v\in\mathbb F^{p-r}\right\}.$$
Choose $B_1 = -D^{-1}CE^{-1}$. Next, let $y_1,\ldots,y_r$ be a basis for $\mathcal Y$ and write
$$[y_1\ \cdots\ y_r] = U\begin{bmatrix}G\\ H\end{bmatrix},$$
where $G\in\mathbb F^{r\times r}$ and $H\in\mathbb F^{(q-r)\times r}$. Then
$$\mathcal N_A + \mathcal Y = \left\{U\begin{bmatrix}G & 0_{r\times(q-r)}\\ H & I_{q-r}\end{bmatrix}\begin{bmatrix}u\\ v\end{bmatrix} : u\in\mathbb F^r \text{ and } v\in\mathbb F^{q-r}\right\} = \mathbb F^q$$
if and only if $G$ is invertible. Thus,
$$\mathcal N_A + \mathcal Y = \left\{U\begin{bmatrix}I_r & 0_{r\times(q-r)}\\ HG^{-1} & I_{q-r}\end{bmatrix}\begin{bmatrix}Gu\\ v\end{bmatrix} : u\in\mathbb F^r \text{ and } v\in\mathbb F^{q-r}\right\} = \left\{U\begin{bmatrix}I_r & 0_{r\times(q-r)}\\ HG^{-1} & I_{q-r}\end{bmatrix}x : x\in\mathbb F^q\right\}.$$
Choose $B_2 = HG^{-1}D^{-1}$. Then, for the specified choices of $B_1$ and $B_2$, the matrix $A^\circ$ defined by formula (11.5) is a pseudoinverse of the matrix $A$ in (11.4) such that
$$\mathcal X = \left\{V\begin{bmatrix}-DB_1\\ I_{p-r}\end{bmatrix}v : v\in\mathbb F^{p-r}\right\} = \mathcal N_{A^\circ} \quad\text{and}\quad \mathcal Y = \left\{U\begin{bmatrix}I_r\\ B_2D\end{bmatrix}u : u\in\mathbb F^r\right\} = \mathcal R_{A^\circ},$$
as claimed; see Lemma 11.5. This completes the proof when $\mathcal X$ and $\mathcal Y$ are proper subspaces of $\mathbb F^p$ and $\mathbb F^q$, respectively. The remaining cases are left to the reader. □

Exercise 11.6. Let $A\in\mathbb F^{p\times q}$ and suppose that $\mathcal R_A = \mathbb F^p$ and $\mathcal N_A \dot+ \mathcal Y = \mathbb F^q$ for some proper nonzero subspace $\mathcal Y$ of $\mathbb F^q$. Show that there exists a pseudoinverse $A^\circ$ of $A$ such that $\mathcal N_{A^\circ} = \{0\}$ and $\mathcal R_{A^\circ} = \mathcal Y$.

Exercise 11.7. Let
Find a pseudoinverse $A^\circ$ of the matrix $A$ such that $\mathcal N_{A^\circ} = \mathcal Y$ and $\mathcal R_{A^\circ} = \mathcal X$.
Exercise 11.8. Let $A\in\mathbb C^{p\times q}$ admit a singular value decomposition of the form $A = VSU^H$, where $V\in\mathbb C^{p\times p}$ and $U\in\mathbb C^{q\times q}$ are both unitary. Suppose further that $\operatorname{rank}A = r$, $S = \operatorname{diag}\{D, 0_{(p-r)\times(q-r)}\}$ and that $1\le r < \min\{p,q\}$.
(1) Find formulas for $A^HA$, $AA^H$, $AA^\dagger$ and $A^\dagger A$.
(2) Show that the ranges of $A^HA$ and $A^\dagger A$ coincide.
(3) Show that the ranges of $AA^H$ and $AA^\dagger$ coincide.
(4) Describe the null spaces of the four matrices considered in (1) in terms of appropriately chosen sub-blocks of $U$ and $V$.
11.2. The Moore-Penrose inverse

Theorem 11.8. Let $A\in\mathbb F^{p\times q}$. Then there exists exactly one matrix $A^\dagger\in\mathbb F^{q\times p}$ that meets the four conditions
(11.13) $AA^\dagger A = A$, $A^\dagger AA^\dagger = A^\dagger$, $AA^\dagger = (AA^\dagger)^H$ and $A^\dagger A = (A^\dagger A)^H$.
Proof. If $A = 0_{p\times q}$, then the matrix $A^\dagger = 0_{q\times p}$ clearly meets the four conditions in (11.13). If $\operatorname{rank}A = r > 0$ and
(11.14) $A = V\begin{bmatrix}D & 0_{r\times(q-r)}\\ 0_{(p-r)\times r} & 0_{(p-r)\times(q-r)}\end{bmatrix}U^H = V_1DU_1^H,$
where $V = [V_1\ V_2]$ and $U = [U_1\ U_2]$ are unitary matrices with first blocks $V_1$ and $U_1$ of sizes $p\times r$ and $q\times r$, respectively, and
(11.15) $D = \operatorname{diag}\{s_1,\ldots,s_r\}$ with $s_1\ge\cdots\ge s_r > 0$
is a diagonal matrix based on the nonzero singular values of $A$, then the matrix $A^\dagger\in\mathbb F^{q\times p}$ defined by the formula
(11.16) $A^\dagger = U\begin{bmatrix}D^{-1} & 0\\ 0 & 0\end{bmatrix}V^H = U_1D^{-1}V_1^H$
meets the four conditions in (11.13). It remains to check uniqueness. Let $B\in\mathbb F^{q\times p}$ and $C\in\mathbb F^{q\times p}$ both satisfy the four conditions in (11.13) and let $Y = B^H - C^H$. Then the formulas
$$A = ABA = A(BA)^H = AA^HB^H \quad\text{and}\quad A = ACA = A(CA)^H = AA^HC^H$$
imply that $0 = AA^HY$ and hence that
$$\mathcal R_Y \subseteq \mathcal N_{AA^H} = \mathcal N_{A^H}.$$
On the other hand, the formulas
$$B = BAB = B(AB)^H = BB^HA^H \quad\text{and}\quad C = CAC = C(AC)^H = CC^HA^H$$
imply that $Y = A(BB^H - CC^H)$ and hence that $\mathcal R_Y \subseteq \mathcal R_A$. Thus, as $\mathcal R_A\cap\mathcal N_{A^H} = \{0\}$ by Lemma 9.14, it follows that $Y = 0$, as needed. □

The unique matrix $A^\dagger$ that satisfies the four conditions in (11.13) is called the Moore-Penrose inverse of $A$. In view of the last two formulas in (11.13), $AA^\dagger$ and $A^\dagger A$ are both orthogonal projections with respect to the standard inner product. Correspondingly, the direct sum decompositions exhibited in Lemma 11.3 become orthogonal decompositions if the Moore-Penrose inverse $A^\dagger$ is used in place of an arbitrary pseudoinverse $A^\circ$.

Lemma 11.9. Let $A^\dagger\in\mathbb F^{q\times p}$ be the Moore-Penrose inverse of $A\in\mathbb F^{p\times q}$. Then:
(1) $\mathbb F^p = \mathcal R_A \oplus \mathcal N_{A^\dagger}$.
(2) $\mathbb F^q = \mathcal R_{A^\dagger} \oplus \mathcal N_A$.

Proof. Since $A^\dagger$ is a pseudoinverse, Lemma 11.3 guarantees the direct sum decompositions
$$\mathbb F^p = \mathcal R_A \dot+ \mathcal N_{A^\dagger} \quad\text{and}\quad \mathbb F^q = \mathcal R_{A^\dagger} \dot+ \mathcal N_A.$$
To complete the proof of (1), we need to show that the two spaces $\mathcal R_A$ and $\mathcal N_{A^\dagger}$ are orthogonal with respect to the standard inner product. To this end, let $x\in\mathcal R_A$ and $y\in\mathcal N_{A^\dagger}$. Then, since $\mathcal R_A = \mathcal R_{AA^\dagger}$ and $AA^\dagger$ is a projection, $x = AA^\dagger x$. Therefore,
$$\langle x, y\rangle = \langle AA^\dagger x, y\rangle = \langle x, (AA^\dagger)^Hy\rangle = \langle x, AA^\dagger y\rangle = 0,$$
as needed. The proof of (2) is immediate from (1) and the fact that $(A^\dagger)^\dagger = A$. □

Exercise 11.9. Show that if $A\in\mathbb C^{p\times p}$, $B\in\mathbb C^{p\times q}$ and $\mathcal R_B\subseteq\mathcal R_A$, then $AA^\dagger B = B$.
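As a numerical aside (not part of the original text), the next Python/NumPy sketch builds $A^\dagger$ from the singular value decomposition exactly as in (11.14)–(11.16) and checks the four Penrose conditions (11.13); it also compares the result with numpy.linalg.pinv, which computes the same matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, r = 6, 4, 3
A = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))  # rank r

# Moore-Penrose inverse via the SVD, as in (11.16): A† = U1 D^{-1} V1^H.
V, s, UH = np.linalg.svd(A)
V1, U1, D = V[:, :r], UH.conj().T[:, :r], np.diag(s[:r])
A_dag = U1 @ np.linalg.inv(D) @ V1.conj().T

# The four conditions in (11.13).
checks = [
    np.allclose(A @ A_dag @ A, A),
    np.allclose(A_dag @ A @ A_dag, A_dag),
    np.allclose(A @ A_dag, (A @ A_dag).conj().T),
    np.allclose(A_dag @ A, (A_dag @ A).conj().T),
]
print(checks, np.allclose(A_dag, np.linalg.pinv(A)))
```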
Lemma 11.10. Let
$$M = \begin{bmatrix}A & B\\ B^H & C\end{bmatrix}$$
be a Hermitian matrix with square diagonal blocks such that $\mathcal R_B\subseteq\mathcal R_A$. Then $M$ admits a factorization of the form
(11.17) $M = \begin{bmatrix}I & 0\\ B^HA^\dagger & I\end{bmatrix}\begin{bmatrix}A & 0\\ 0 & C - B^HA^\dagger B\end{bmatrix}\begin{bmatrix}I & A^\dagger B\\ 0 & I\end{bmatrix}.$
Proof. The formula is easily verified by direct calculation, since $B^HA^\dagger A = B^H$ and $AA^\dagger B = B$ when the presumed inclusion is in force. □

Exercise 11.10. Show that $\mathcal R_{A^\dagger} = \mathcal R_{A^H}$ and $\mathcal N_{A^\dagger} = \mathcal N_{A^H}$. [HINT: This is an easy consequence of the representations (11.14) and (11.16).]

Exercise 11.11. Show that if $A\in\mathbb C^{p\times q}$, then the matrix $AA^\dagger$ is an orthogonal projection from $\mathbb C^p$ onto $\mathcal R_A$.

Exercise 11.12. Use the representation formulas (11.14) and (11.16) to give a new proof of the following two formulas, for any matrix $A\in\mathbb C^{p\times q}$:
(1) $\mathbb C^p = \mathcal R_A \oplus \mathcal N_{A^H}$ (with respect to the standard inner product).
(2) $\mathbb C^q = \mathcal R_{A^H} \oplus \mathcal N_A$ (with respect to the standard inner product).

Exercise 11.13. Show that if $B, C\in\mathbb C^{p\times q}$, $A\in\mathbb C^{q\times q}$ and $\operatorname{rank}B = \operatorname{rank}C = \operatorname{rank}A = q$, then
(11.18) $(BAC^H)^\dagger = C(C^HC)^{-1}A^{-1}(B^HB)^{-1}B^H$
and give explicit formulas for $(BAC^H)^\dagger(BAC^H)$ and $(BAC^H)(BAC^H)^\dagger$ in terms of $B$, $B^H$, $C$ and $C^H$.

Exercise 11.14. Show that if $A\in\mathbb C^{p\times q}$, then $A^\dagger AA^H = A^HAA^\dagger = A^H$.
Exercise 11.15. Show that if
~]
E=[%H is a Hermitian matrix such that Rc Et of E is given by the formula (11.19)
Et =
~
'RBH, then the Moore-Penrose inverse
[-(Bt~~CBt
(B2 H ].
[HINT: Exploit Exercise 11.14 and the fact that Rc ~ RBH
C.] Exercise 11.16. Let
C=
0]
B [A BH 00'
====}
Bt BC =
where B is invertible, and let At denote the Moore-Penrose inverse of A. Show that the matrix (B-l)H
[~
g]
B- 1
is a pseudoinverse of C, but it is not a Moore-Penrose inverse. Exercise 11.17. Show that the matrix
$$\begin{bmatrix}AA^\dagger & 0\\ B^HA^\dagger & 0\end{bmatrix}$$
is a projection but not an orthogonal projection with respect to the standard inner product (unless $B^HA^\dagger = 0$).

Exercise 11.18. Let $A_1, A_2\in\mathbb C^{p\times q}$ and $B_1, B_2\in\mathbb C^{p\times r}$ and suppose that $\mathcal R_{B_1}\subseteq\mathcal R_{A_1}$ and $\mathcal R_{B_2}\subseteq\mathcal R_{A_2}$. Show by example that this does not imply that $\mathcal R_{B_1+B_2}\subseteq\mathcal R_{A_1+A_2}$. [HINT: Try $B_i = u_iv_i^H$ and $A_i = u_iw_i^H$ for $i = 1, 2$, with $v_1$ orthogonal to $v_2$ and $w_1 = w_2$.]

Exercise 11.19. Let $A\in\mathbb C^{p\times p}$, $B\in\mathbb C^{p\times q}$,
$$M = \begin{bmatrix}A & B\\ B^H & 0\end{bmatrix}$$
and suppose that $BB^H = I_p$. Show that the Moore-Penrose inverse
$$M^\dagger = \begin{bmatrix}0 & B\\ B^H & -B^HAB\end{bmatrix}.$$

Exercise 11.20. Let $B\in\mathbb C^{p\times q}$. Show that:
(1) $B^\dagger B$ is the orthogonal projection of $\mathbb C^q$ onto $\mathcal R_{B^H}$.
(2) $BB^\dagger$ is the orthogonal projection of $\mathbb C^p$ onto $\mathcal R_B$.
11.3. Best approximation in terms of Moore-Penrose inverses

The Moore-Penrose inverse of a matrix $A$ with singular value decomposition (11.14) is given by the formula (11.16).

Lemma 11.11. If $A\in\mathbb C^{p\times q}$ and $b\in\mathbb C^p$, then
(11.20) $\|Ax - b\|^2 = \|Ax - AA^\dagger b\|^2 + \|(I_p - AA^\dagger)b\|^2$
for every $x\in\mathbb C^q$.

Proof. The stated formula follows easily from the decomposition
$$Ax - b = (Ax - AA^\dagger b) - (I_p - AA^\dagger)b = AA^\dagger(Ax - b) - (I_p - AA^\dagger)b,$$
since
$$\langle AA^\dagger(Ax-b), (I_p - AA^\dagger)b\rangle = \langle Ax-b, (AA^\dagger)^H(I_p - AA^\dagger)b\rangle = \langle Ax-b, (AA^\dagger)(I_p - AA^\dagger)b\rangle = \langle Ax-b, 0\rangle = 0. \qquad\Box$$

Formula (11.20) exhibits the fact that the best approximation to $b$ that we can hope to get by vectors of the form $Ax$ is obtained by choosing $x$ so that
$$Ax = AA^\dagger b.$$
This is eminently reasonable, since $AA^\dagger b$ is equal to the orthogonal projection of $b$ onto $\mathcal R_A$. The particular choice
$$x = A^\dagger b$$
has one more feature:
(11.21) $\|A^\dagger b\| \le \|y\|$ for every vector $y$ such that $Ay = AA^\dagger b$.
To verify this, observe that if $y$ is any vector for which $Ay = AA^\dagger b$, then $y - A^\dagger b\in\mathcal N_A$; i.e.,
$$y = A^\dagger b + u \quad\text{for some vector } u\in\mathcal N_A.$$
Therefore, since
$$\langle A^\dagger b, u\rangle = \langle A^\dagger AA^\dagger b, u\rangle = \langle A^\dagger b, A^\dagger Au\rangle = \langle A^\dagger b, 0\rangle = 0,$$
it follows that $\|y\|^2 = \|A^\dagger b\|^2 + \|u\|^2$. Thus, $\|y\| \ge \|A^\dagger b\|$, with equality if and only if $y = A^\dagger b$.
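To make the preceding discussion concrete, here is a small numerical illustration (not from the original text) in Python: among all least squares solutions of $Ax\approx b$ for a rank-deficient $A$, the choice $x = A^\dagger b$ has the smallest norm, in agreement with (11.20) and (11.21).

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, r = 7, 5, 3
A = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))  # rank-deficient
b = rng.standard_normal(p)

x_dag = np.linalg.pinv(A) @ b            # x = A† b
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Both produce the same (minimal) residual, namely ||(I - AA†)b|| ...
print(np.allclose(np.linalg.norm(A @ x_dag - b), np.linalg.norm(A @ x_ls - b)))
# ... and any other least squares solution x = A†b + u with u in N_A is longer.
u = np.linalg.svd(A)[2][-1]              # a unit vector in the null space of A
x_other = x_dag + u
print(np.allclose(A @ x_other, A @ x_dag), np.linalg.norm(x_other) > np.linalg.norm(x_dag))
```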
Remark 11.12. If $A$ is a $p\times q$ matrix of rank $q$, then $A^HA$ is invertible and another recipe for obtaining an approximate solution to the equation $Ax = b$ is based on the observation that if $x$ is a solution, then $A^HAx = A^Hb$ and hence $x = (A^HA)^{-1}A^Hb$.

Exercise 11.21. Let $A\in\mathbb F^{p\times q}$. Show that if $\operatorname{rank}A = q$, then $A^HA$ is invertible and $A^\dagger = (A^HA)^{-1}A^H$.
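In the full-column-rank case of Remark 11.12 and Exercise 11.21 the two recipes coincide. The short Python check below is an illustration added to this copy (real data is used purely for simplicity).

```python
import numpy as np

rng = np.random.default_rng(7)
p, q = 8, 3
A = rng.standard_normal((p, q))                # full column rank (generically)
b = rng.standard_normal(p)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)   # x = (A^H A)^{-1} A^H b
x_pinv = np.linalg.pinv(A) @ b                 # x = A† b
print(np.allclose(x_normal, x_pinv))
```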
Chapter 12
Triangular factorization and positive definite matrices
Half the harm that is done in this world
Is due to people who want to feel important.
They don't mean to do harm-but the harm does not interest them.
Or they do not see it, or they justify it
Because they are absorbed in the endless struggle
To think well of themselves.
T. S. Eliot, The Cocktail Party

This chapter is devoted primarily to positive definite and semidefinite matrices and related applications. To add perspective, however, it is convenient to begin with some general observations on the triangular factorization of matrices. In a sense this is not new, because the formula
$$EPA = U \quad\text{or, equivalently,}\quad A = P^{-1}E^{-1}U$$
that emerged from the discussion of Gaussian elimination is almost a triangular factorization. Under appropriate extra assumptions on the matrix $A\in\mathbb C^{n\times n}$, the formula $A = P^{-1}E^{-1}U$ holds with $P = I_n$.

WARNING: We remind the reader that from now on $\langle u, v\rangle = \langle u, v\rangle_{st}$, the standard inner product, and $\|u\| = \|u\|_2$ for vectors $u, v\in\mathbb F^n$, unless indicated otherwise. Correspondingly, $\|A\| = \|A\|_{2,2}$ for matrices $A$.
12.1. A detour on triangular factorization

The notation
(12.1) $A_{[j,k]} = \begin{bmatrix} a_{jj} & \cdots & a_{jk}\\ \vdots & & \vdots\\ a_{kj} & \cdots & a_{kk}\end{bmatrix}$ for $A\in\mathbb C^{n\times n}$ and $1\le j\le k\le n$
will be convenient.
Theorem 12.1. A matrix $A\in\mathbb C^{n\times n}$ admits a factorization of the form
(12.2) $A = LDU,$
where $L\in\mathbb C^{n\times n}$ is a lower triangular matrix with ones on the diagonal, $U\in\mathbb C^{n\times n}$ is an upper triangular matrix with ones on the diagonal and $D\in\mathbb C^{n\times n}$ is an invertible diagonal matrix, if and only if the submatrices
(12.3) $A_{[1,k]}$ are invertible for $k = 1,\ldots,n$.
Moreover, if the conditions in (12.3) are met, then there is only one set of matrices, $L$, $D$ and $U$, with the stated properties for which (12.2) holds.

Proof. Suppose first that the condition (12.3) is in force. Then, upon expressing
$$A = \begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}$$
in block form with $A_{11}\in\mathbb C^{p\times p}$, $A_{22}\in\mathbb C^{q\times q}$ and $p + q = n$, we can invoke the first Schur complement formula
$$A = \begin{bmatrix}I_p & 0\\ A_{21}A_{11}^{-1} & I_q\end{bmatrix}\begin{bmatrix}A_{11} & 0\\ 0 & A_{22} - A_{21}A_{11}^{-1}A_{12}\end{bmatrix}\begin{bmatrix}I_p & A_{11}^{-1}A_{12}\\ 0 & I_q\end{bmatrix}$$
repeatedly to obtain the asserted factorization formula (12.2). Thus, if $A_{11} = A_{[1,n-1]}$, then $a_n = A_{22} - A_{21}A_{11}^{-1}A_{12}$ is a nonzero number and the exhibited formula states that
$$A = L_n\begin{bmatrix}A_{[1,n-1]} & 0\\ 0 & a_n\end{bmatrix}U_n,$$
where $L_n\in\mathbb C^{n\times n}$ is a lower triangular matrix with ones on the diagonal and $U_n\in\mathbb C^{n\times n}$ is an upper triangular matrix with ones on the diagonal. The next step is to apply the same procedure to the $(n-1)\times(n-1)$ matrix $A_{[1,n-1]}$. This yields a factorization of the form
$$A_{[1,n-1]} = L_{n-1}\begin{bmatrix}A_{[1,n-2]} & 0\\ 0 & a_{n-1}\end{bmatrix}U_{n-1},$$
where $L_{n-1}\in\mathbb C^{(n-1)\times(n-1)}$ is a lower triangular matrix with ones on the diagonal and $U_{n-1}\in\mathbb C^{(n-1)\times(n-1)}$ is an upper triangular matrix with ones on the diagonal. Therefore,
$$A = L_n\begin{bmatrix}L_{n-1} & 0\\ 0 & 1\end{bmatrix}\begin{bmatrix}A_{[1,n-2]} & 0 & 0\\ 0 & a_{n-1} & 0\\ 0 & 0 & a_n\end{bmatrix}\begin{bmatrix}U_{n-1} & 0\\ 0 & 1\end{bmatrix}U_n,$$
which is one step further down the line. The final formula is obtained by iterating this procedure $n - 3$ more times.

Conversely, if $A$ admits a factorization of the form (12.2) with the stated properties, then, upon writing the factorization in block form as
$$\begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix} = \begin{bmatrix}L_{11} & 0\\ L_{21} & L_{22}\end{bmatrix}\begin{bmatrix}D_{11} & 0\\ 0 & D_{22}\end{bmatrix}\begin{bmatrix}U_{11} & U_{12}\\ 0 & U_{22}\end{bmatrix},$$
it is readily checked that $A_{11} = L_{11}D_{11}U_{11}$ or, equivalently, that
$$A_{[1,k]} = L_{[1,k]}D_{[1,k]}U_{[1,k]} \quad\text{for } k = 1,\ldots,n.$$
Thus, $A_{[1,k]}$ is invertible for $k = 1,\ldots,n$, as needed.

To verify uniqueness, suppose that $A = L_1D_1U_1 = L_2D_2U_2$. Then the identity $L_2^{-1}L_1D_1 = D_2U_2U_1^{-1}$ implies that $L_2^{-1}L_1D_1$ is both upper and lower triangular and hence must be a diagonal matrix, which is readily seen to be equal to $D_1$. Therefore, $L_1 = L_2$ and, by an analogous argument, $U_1 = U_2$, which then forces $D_1 = D_2$. □

Theorem 12.2. A matrix $A\in\mathbb C^{n\times n}$ admits a factorization of the form
(12.4) $A = UDL,$
where $L\in\mathbb C^{n\times n}$ is a lower triangular matrix with ones on the diagonal, $U\in\mathbb C^{n\times n}$ is an upper triangular matrix with ones on the diagonal and $D\in\mathbb C^{n\times n}$ is an invertible diagonal matrix, if and only if the blocks
(12.5) $A_{[k,n]}$ are invertible for $k = 1,\ldots,n$.
Moreover, if the conditions in (12.5) are met, then there is only one set of matrices, $L$, $D$ and $U$, with the stated properties for which (12.4) holds.

Proof. The details are left to the reader. They are easily filled in with the proof of Theorem 12.1 as a guide. □

Exercise 12.1. Prove Theorem 12.2.

Exercise 12.2. Let $P_k = \operatorname{diag}\{I_k, 0_{(n-k)\times(n-k)}\}$. Show that
(a) $A\in\mathbb C^{n\times n}$ is upper triangular if and only if $AP_k = P_kAP_k$ for $k = 1,\ldots,n$.
(b) $A\in\mathbb C^{n\times n}$ is lower triangular if and only if $P_kA = P_kAP_k$ for $k = 1,\ldots,n$.

Exercise 12.3. Show that if $L\in\mathbb C^{n\times n}$ is lower triangular, $U\in\mathbb C^{n\times n}$ is upper triangular and $D\in\mathbb C^{n\times n}$ is diagonal, then
(12.6) $(LDU)_{[1,k]} = L_{[1,k]}D_{[1,k]}U_{[1,k]}$ and $(UDL)_{[k,n]} = U_{[k,n]}D_{[k,n]}L_{[k,n]}$ for $k = 1,\ldots,n$.
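The recursive Schur complement argument in the proof of Theorem 12.1 is also an algorithm. The following Python sketch (added to this copy as an illustration, not taken from the book) computes the LDU factorization of a matrix whose leading principal submatrices are invertible by Gaussian elimination without pivoting and checks the reconstruction.

```python
import numpy as np

def ldu(A):
    """LDU factorization with no pivoting; assumes every A[1,k] is invertible,
    as in condition (12.3) of Theorem 12.1."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), np.eye(n)
    D = np.zeros((n, n))
    S = A.copy()                      # successive Schur complements
    for k in range(n):
        D[k, k] = S[k, k]             # the k-th pivot
        L[k + 1:, k] = S[k + 1:, k] / S[k, k]
        U[k, k + 1:] = S[k, k + 1:] / S[k, k]
        # Schur complement of the (k,k) entry in the remaining block.
        S[k + 1:, k + 1:] -= np.outer(S[k + 1:, k], S[k, k + 1:]) / S[k, k]
    return L, D, U

A = np.array([[2.0, 1, 1], [4, 3, 3], [8, 7, 9]])
L, D, U = ldu(A)
print(np.allclose(L @ D @ U, A))
print(np.diag(D))   # the pivots; their partial products are det A[1,k]
```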
12.2. Definite and semidefinite matrices

A matrix $A\in\mathbb C^{n\times n}$ is said to be positive semidefinite over $\mathbb C^n$ if
(12.7) $\langle Ax, x\rangle \ge 0$ for every $x\in\mathbb C^n$;
it is said to be positive definite over $\mathbb C^n$ if
(12.8) $\langle Ax, x\rangle > 0$ for every nonzero vector $x\in\mathbb C^n$.
The notation $A\succeq 0$ will be used to indicate that the matrix $A\in\mathbb C^{n\times n}$ is positive semidefinite over $\mathbb C^n$. Similarly, the notation $A\succ 0$ will be used to indicate that the matrix $A\in\mathbb C^{n\times n}$ is positive definite over $\mathbb C^n$. Moreover, if $A\in\mathbb C^{n\times n}$ and $B\in\mathbb C^{n\times n}$, then $A\succeq B$ and $A\succ B$ mean that $A - B\succeq 0$ and $A - B\succ 0$, respectively. Correspondingly, a matrix $A\in\mathbb C^{n\times n}$ is said to be negative semidefinite over $\mathbb C^n$ if $-A\succeq 0$ and negative definite over $\mathbb C^n$ if $-A\succ 0$.

Lemma 12.3. If $A\in\mathbb C^{n\times n}$ and $A\succeq 0$, then:
(1) $A$ is automatically Hermitian.
(2) The eigenvalues of $A$ are nonnegative numbers.
(3) $A\succ 0 \iff$ the eigenvalues of $A$ are all positive $\iff \det A > 0$.

Proof. If $A\succeq 0$, then
$$\langle Ax, x\rangle = \overline{\langle Ax, x\rangle} = \langle x, Ax\rangle \quad\text{for every } x\in\mathbb C^n.$$
Therefore, by a straightforward calculation,
$$4\langle Ax, y\rangle = \sum_{k=1}^4 i^k\langle A(x + i^ky), (x + i^ky)\rangle = \sum_{k=1}^4 i^k\langle (x + i^ky), A(x + i^ky)\rangle = 4\langle x, Ay\rangle;$$
i.e., $\langle Ax, y\rangle = \langle x, Ay\rangle$ for every choice of $x, y\in\mathbb C^n$. Therefore, (1) holds.

Next, let $x$ be an eigenvector of $A$ corresponding to the eigenvalue $\lambda$. Then $\lambda\langle x, x\rangle = \langle Ax, x\rangle\ge 0$. Therefore $\lambda\ge 0$, since $\langle x, x\rangle > 0$. This justifies assertion (2); the proof of (3) is left to the reader. □

WARNING: The conclusions of Lemma 12.3 are not true under the less restrictive constraint $\langle Ax, x\rangle\ge 0$ for every $x\in\mathbb R^n$. Thus, for example, if
$$A = \begin{bmatrix}2 & -2\\ 0 & 2\end{bmatrix} \quad\text{and}\quad x = \begin{bmatrix}x_1\\ x_2\end{bmatrix},$$
then
$$\langle Ax, x\rangle = (x_1 - x_2)^2 + x_1^2 + x_2^2 > 0$$
for every nonzero vector $x\in\mathbb R^2$. However, $A$ is clearly not Hermitian.

Exercise 12.4. Show that if $A\in\mathbb C^{n\times n}$ and $A\succeq 0$, then
$$A\succ 0 \iff \text{all the eigenvalues of } A \text{ are positive} \iff \det A > 0.$$

Exercise 12.5. Show that if $V\in\mathbb C^{n\times n}$ is invertible, then
$$A\succ 0 \iff V^HAV\succ 0.$$

Exercise 12.6. Show that if $V\in\mathbb C^{n\times k}$ and $\operatorname{rank}V = k$, then
$$A\succ 0 \Longrightarrow V^HAV\succ 0,$$
but the converse implication is not true if $k < n$.

Exercise 12.7. Show that if the $n\times n$ matrix $A = [a_{ij}]$, $i, j = 1,\ldots,n$, is positive semidefinite over $\mathbb C^n$, then $|a_{ij}|^2\le a_{ii}a_{jj}$.

Exercise 12.8. Show that if $A\in\mathbb C^{n\times n}$, $n = p + q$ and
$$A = \begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix},$$
where $A_{11}\in\mathbb C^{p\times p}$ and $A_{22}\in\mathbb C^{q\times q}$, then
$$A\succ 0 \iff A_{11}\succ 0,\ A_{21} = A_{12}^H \text{ and } A_{22} - A_{21}A_{11}^{-1}A_{12}\succ 0.$$

Exercise 12.9. Show that if $A\in\mathbb C^{p\times q}$, then
$$\|A\|\le 1 \iff I_q - A^HA\succeq 0 \iff I_p - AA^H\succeq 0.$$
[HINT: Use the singular value decomposition of A.]
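The equivalences in Exercise 12.9 are easy to see numerically. The snippet below (an illustration added to this copy, not from the book) scales a random matrix into a contraction and checks that both Gram-type matrices are positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 6))
A = 0.9 * A / np.linalg.norm(A, 2)        # scale so that ||A|| <= 1

# Exercise 12.9: ||A|| <= 1  iff  I_q - A^H A >= 0  iff  I_p - A A^H >= 0.
print(np.linalg.norm(A, 2) <= 1)
print(np.all(np.linalg.eigvalsh(np.eye(6) - A.T @ A) >= -1e-12))
print(np.all(np.linalg.eigvalsh(np.eye(4) - A @ A.T) >= -1e-12))
```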
Exercise 12.10. Show that if A E c nxn and A
[~ ~] Exercise 12.11. Show that if A E
= AH, then
to.
c nxn and A t
0, then
[~ ~]tO. Exercise 12.12. Let U E C nxn be unitary and let A E C nxn. Show that if A >- 0 and AU >- 0, then U = In. [HINT: Consider (Ax, x) for eigenvectors x of U.]
12.3. Characterizations of positive definite matrices

A basic question of interest is to check when an $n\times n$ matrix $A = [a_{ij}]$, $i, j = 1,\ldots,n$, is positive definite over $\mathbb C^n$. The next theorem supplies a number of equivalent characterizations.

Theorem 12.4. If $A\in\mathbb C^{n\times n}$, then the following statements are equivalent:
(1) $A\succ 0$.
(2) $A = A^H$ and the eigenvalues of $A$ are all positive; i.e., $\lambda_j > 0$ for $j = 1,\ldots,n$.
(3) $A = V^HV$ for some $n\times n$ invertible matrix $V$.
(4) $A = A^H$ and $\det A_{[1,k]} > 0$ for $k = 1,\ldots,n$.
(5) $A = LL^H$, where $L$ is a lower triangular invertible matrix.
(6) $A = A^H$ and $\det A_{[k,n]} > 0$ for $k = 1,\ldots,n$.
(7) $A = UU^H$, where $U$ is an upper triangular invertible matrix.

Proof. Let $\{u_1,\ldots,u_n\}$ denote an orthonormal set of eigenvectors corresponding to $\lambda_1,\ldots,\lambda_n$. Then, since $\langle u_j, u_j\rangle = 1$, the formula
$$\lambda_j = \lambda_j\langle u_j, u_j\rangle = \langle Au_j, u_j\rangle, \quad\text{for } j = 1,\ldots,n,$$
clearly displays the fact that (1)$\Longrightarrow$(2). Next, if (2) is in force, then $D = \operatorname{diag}\{\lambda_1,\ldots,\lambda_n\}$ admits a square root $D^{1/2} = \operatorname{diag}\{\sqrt{\lambda_1},\ldots,\sqrt{\lambda_n}\}$ and hence the diagonalization formula $A = UDU^H$ with $U = [u_1\ \cdots\ u_n]$ can be rewritten as $A = V^HV$ with $V = D^{1/2}U^H$ invertible. Thus, (2)$\Longrightarrow$(3), and, upon setting
$$\Pi_k = \begin{bmatrix}I_k\\ 0_{(n-k)\times k}\end{bmatrix} \quad\text{and}\quad V_1 = V\Pi_k,$$
it is readily seen that
$$A_{[1,k]} = \Pi_k^HA\Pi_k = \Pi_k^HV^HV\Pi_k = V_1^HV_1.$$
But this implies that
$$\langle\Pi_k^HA\Pi_kx, x\rangle = \langle V_1^HV_1x, x\rangle = \langle V_1x, V_1x\rangle > 0$$
for every nonzero vector $x\in\mathbb C^k$, since $V_1$ has $k$ linearly independent columns. Therefore, (3) implies (4). However, in view of Theorem 12.1, (4) implies that $A = L_1DU_1$, where $L_1\in\mathbb C^{n\times n}$ is a lower triangular matrix with ones on the diagonal, $U_1\in\mathbb C^{n\times n}$ is an upper triangular matrix with ones on the diagonal and $D\in\mathbb C^{n\times n}$ is an invertible diagonal matrix. Thus, as $A = A^H$ in the present setting, it follows that
$$(U_1^H)^{-1}L_1 = D^HL_1^HU_1^{-1}D^{-1}$$
and therefore, since the left-hand side of the last identity is lower triangular and the right-hand side is upper triangular, the matrix $(U_1^H)^{-1}L_1$ must be a diagonal matrix. Moreover, since both $U_1$ and $L_1$ have ones on their diagonals, it follows that $(U_1^H)^{-1}L_1 = I_n$, i.e., $U_1^H = L_1$. Consequently,
$$A_{[1,k]} = \Pi_k^HA\Pi_k = \Pi_k^HU_1^HDU_1\Pi_k = (\Pi_k^HU_1^H\Pi_k)(\Pi_k^HD\Pi_k)(\Pi_k^HU_1\Pi_k)$$
and
$$\det A_{[1,k]} = \det\{(L_1)_{[1,k]}\}\det\{D_{[1,k]}\}\det\{(U_1)_{[1,k]}\} = d_{11}\cdots d_{kk}$$
for $k = 1,\ldots,n$. Therefore, $D$ is positive definite over $\mathbb C^n$, as is $A = L_1DL_1^H$. The formula advertised in (5) is obtained by setting $L = L_1D^{1/2}$. It is also clear that (5) implies (1). Next, the matrix identity
$$\begin{bmatrix}0 & I_{n-k}\\ I_k & 0\end{bmatrix}\begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}\begin{bmatrix}0 & I_k\\ I_{n-k} & 0\end{bmatrix} = \begin{bmatrix}A_{22} & A_{21}\\ A_{12} & A_{11}\end{bmatrix}$$
clearly displays the fact that (4) holds if and only if (6) holds. Moreover, since (7) implies (1), it remains only to show that (6) implies (7) in order to complete the proof. This is left to the reader as an exercise.

Exercise 12.13. Verify the implication (6)$\Longrightarrow$(7) in Theorem 12.4.
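Several of the characterizations in Theorem 12.4 can be tested mechanically. The sketch below (added here as an illustration; it is not part of the book) takes a Hermitian matrix and compares three of the tests: positivity of the eigenvalues, positivity of the leading principal minors $\det A_{[1,k]}$, and the existence of a Cholesky factor $A = LL^H$, which numpy.linalg.cholesky produces exactly when $A\succ 0$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)        # real symmetric and positive definite

eig_test = np.all(np.linalg.eigvalsh(A) > 0)                             # condition (2)
minor_test = all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))  # condition (4)
try:
    L = np.linalg.cholesky(A)                                            # condition (5)
    chol_test = np.allclose(L @ L.T, A)
except np.linalg.LinAlgError:
    chol_test = False

print(eig_test, minor_test, chol_test)   # all True for a positive definite A
```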
Exercise 12.14. Let A E e nxn and let DA = diag{au ... ,ann} denote the n x n diagonal matrix with diagonal entries equal to the diagonal entries of A. Show that DAis multiplicative on upper triangular matrices in the sense that if A and B are both n x n upper triangular matrices, then DAB = DADB and thus, if A is invertible, DA-l = (DA)-l. Remark 12.5. The proof that a matrix A that is positive definite over LLH for some lower triangular invertible matrix L can also be based on the general factorization formula EPA = U that was established as a byproduct of Gaussian elimination. The proof may be split into two parts. The first part is to check that, since A >- 0, there always exists a lower triangular matrix E with ones on the diagonal such that EA=U
en admits a factorization of the form A =
is in upper echelon form and hence upper triangular. Once this is verified, the second part is easy: The identity UEH = EAEH
= (EAEH)H = EU H
implies that D = UEH is a positive definite matrix that is both lower triangular and upper triangular. Therefore,
= diag{ d u , ... ,dnn } is a diagonal matrix with d jj > 0 for j = 1, ... ,n. Thus, D
D has a positive
square root: D=F 2
,
where and consequently A
= (E- 1 F)(E- 1 F)H
.
This is a representation of the desired form, since L = E- 1 F is lower triangular. Notice that djj is the j'th pivot of U and E- 1 = (D-1U)H. Exercise 12.15. Show that if A E e 3x3 and A >- 0, then there exists a lower triangular matrix E with ones on the diagonal such that EA is upper triangular. Exercise 12.16. Show that if A E e 3x3 and A >- 0, then there exists an upper triangular matrix F with ones on the diagonal such that F A is lower triangular. [HINT: This is very much like Gaussian elimination in spirit, except that now you work from the bottom row up instead of from the top row down.]
247
12.4. An application of factorization
Exercise 12.17. Let A = [AI A 2], where Al E C nxs , A2 E C nxt and s + t = r. Show that if rank A = r, then the matrices AHA, A¥ AI, A~ A2 and Ar A2 - Ar AI(Af AI)-l Af A2 are all positive definite (over complex spaces of appropriate sizes).
[~
Exercise 12.18. Show that if x E JR, then the matrix positive definite over C 3 if and only if (x _1)2
~]
; x 1 1
will be
< 1/2.
12.4. An application of factorization Lemma 12.6. Let A E
A where c,d E C n -
=
c nxn
[a~l ~]
and D,E E
l
and suppose that A
and that
[b~I ~],
and A-I = c(n-l)x(n-I).
>- 0
Then
(12.9)
Proof. In view of Theorem 12.4, A = LLH , where L E C nXn is an invertible lower triangular matrix. Therefore, n
n
n
L
X jej)11 2 . (A(el - LXjej), el - LXjej) = IILH(ei j=2 j=2 j=2
Let
Vj
= LHej for j = 1, ...
V = [V2 ...
,n,
vn ]
and V = span {V2' ... ,vn }.
Then, since L is invertible, the vectors VI,." ,Vn are linearly independent and hence the orthogonal projection Pv of C n onto V is given by the formula Pv = V(VHV)-IV H . Thus the minimum of interest is equal to
Ilvl - PvvI1l 2 = (VI -
PVV}, VI) =
IIVIII2 - v{iV(VHV)-IVHVI'
It remains to express this number in terms of the entries in the original
matrix A by taking advantage of the formulas cH] = A = LLH = [vf] [an c D VH
[VI V] = [VfVI vfV] VHVI VHV
The rest is left to the reader. Exercise 12.19. Complete the proof of Lemma 12.6.
o
Exercise 12.20. Let A E min Xl,··· ,Xn-l
c nxn and assume that A >- O.
(A(e n -
n-1
n-l
j=1
j=1
Evaluate
L xjej), en - L xjej}
in terms of the entries in A and the entries in A-I.
12.5. Positive definite Toeplitz matrices In this section we shall sketch some applications related to factorization in the special case that the given positive definite matrix is a Toeplitz matrix. Theorem 12.7. Let
!~ Tn = [ .
tto1
..
. ..
:::
..
..
.
tn
...
tl
::~l . >- 0 ...
(n)
and
[
fn =
(n)
n
L 'Y]~) Aj
Pn(A) =
(n)]
'YOn
= T-n 1 'Ynn
to
and let
'Yoo
(n) 'Ynn
n
and
qn(A)
=
L 'Y;~) Aj . j=O
j=O
Then:
(1) E~j=o Ai'Yt)w.i is related to the polynomials Pn(A) and qn(A) by the formula ( 12.10)
(2)
~
Ai
L..J i,j=O
~~)w.i = Pn(A)h~~)} -IPn{w)* - Awqn{A)h~~} -lqn(W)* 1 - Aw
'Y'3
(n) _
(n)
_
MI' . -
'Yij - 'Yn-j,n-i - 'Yji Jor z, J -
0
, ... , n.
(3) qn{A) = Anpn{1/A). (4) The polynomial Pn(A) has no roots in the closed unit disc.
(5) If Sn >- 0 is an (n + 1) x (n + 1) Toeplitz matrix such that T;lel = S;le1 1 thenSn =Tn. Proof. Since Tn formulas to write
>- 0
(12.11)
x:]
[~~) =
[
X
{
(~)}-1
'YOO
===}
f
n
>- 0 we can invoke the Schur complement
and (12.12)
rn
where
~ xH
[:;,
7r~]
[Ino
Yb~r:2}-l]
=
[",(n) 101
01 [
[y - yb~~}-lyH
,~~
0
1 ...
",(n)] IOn
'
yH
=
['Y{n) InO
••.
In
0]
b~~}-lyH
'Y(n) ] In,n-l '
1
X denotes the
lower right-hand n x n corner of r n and Y denotes the upper left-hand n x n corner of r n' Thus,
[1
~ ... ~nJrn [1]
and a second development based on the second Schur complement formula yields the identity
[1
~
...
~nJ r n[1] ~ qn(~)b!::l} + [1 ...
-lq,,(w)'
~n-'l [y - yb~~} -'yH] [;_,] .
The proof of formula (12.10) is now completed by verifying that X - xba~)} -lx H
(12.13)
= r n-l = Y
- yb~~} -lyH ,
where (12.14)
[,~~) .. . '~Z)l for k
:
:
",(k) Ikk
",(k) 'kk
-1
= Tk
[t~ t~l
.. .
and Tk = : :
tk tk-l
= 0, ... , n. The details are left to the reader as an exercise.
t~k] :
to
To verify (2), let 8i j denote the Kronecker delta symbol, i.e., 8ij
=
{
1 if i=j 0 if i:f j ,
and write n
n
"~
(n) ti-s'Ysj
8ij = 8n-i,n-j = ~ tn-i-s'Y;~_j s=o
-
8=0
-
n
n
s=o
s=o
~ tn-i-(n-8)'Y~~8,n-j = ~ 'Y~~s,n_jts-i n
-
~ 'Yj:)tS-i , s=o
which, upon comparing the last two sums, yields the first formula in (2); the second follows from the fact that Tn and r n are Hermitian matrices. Suppose next that Pn(w) = O. Then formula (12.10) implies that (12.15)
_\w\2 qn (w) {'Y!::2} -1 qn(W)* = (1- \w\2)
t
wi'Yt)wj ,
i,j=O which is impossible if \w\ < 1, because then the left-hand side of the identity (12.15) is less than or equal to zero, whereas the right-hand side is positive. Thus, \Pn(w) I > 0 if \w\ < 1. Moreover, if \w\ = 1, then formula (12.15) implieS that (]n{w) = 0 also. Thus, formula (12.10) implies that
(12.16)
0=[1
A··· Anjrn[il
for all A E C, which is impossible. Finally, in view of items (2) and (3), formula (12.1O) can be rewritten
which exhibits the fact that if Tn >- 0, then all the entries 'Yijn) are completely D determined by the first column of r n, and hence serves to verify (5) . Exercise 12.21. Verify the identity (12.13) in the setting of Theorem 12.7. [HINT: Use the Schur complement formulas to calculate r;;:-l alias Tn from the two block decompositions (12.11) and (12.12).]
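Theorem 12.7 also lends itself to a quick numerical check. The following Python sketch (an illustration added to this copy, not part of the book) builds a small Hermitian positive definite Toeplitz matrix, inverts it, and verifies item (2), the symmetry $\gamma_{ij}^{(n)} = \gamma_{n-j,n-i}^{(n)}$, and item (4), that the polynomial formed from the first column of $\Gamma_n$ has no roots in the closed unit disc.

```python
import numpy as np

# A small Hermitian positive definite Toeplitz matrix T_n (here n = 3, so 4 x 4);
# the entries t_0, ..., t_3 are chosen to make T_n diagonally dominant.
t = np.array([3.0, 0.5 + 0.2j, 0.3 - 0.1j, 0.1 + 0.1j])
n = len(t) - 1
T = np.empty((n + 1, n + 1), dtype=complex)
for i in range(n + 1):
    for j in range(n + 1):
        T[i, j] = t[i - j] if i >= j else np.conj(t[j - i])
assert np.all(np.linalg.eigvalsh(T) > 0)

G = np.linalg.inv(T)                       # Gamma_n with entries gamma_ij
# Item (2) of Theorem 12.7: gamma_ij = gamma_{n-j, n-i}.
print(np.allclose(G, G[::-1, ::-1].T))
# Item (4): p_n(lambda) = sum_j gamma_{j0} lambda^j has no roots in |lambda| <= 1.
coeffs = G[:, 0]                           # gamma_{00}, ..., gamma_{n0}
roots = np.roots(coeffs[::-1])             # np.roots expects the highest power first
print(np.all(np.abs(roots) > 1))
```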
Theorem 12.7 is just the tip of the iceberg; it can be generalized in many directions. Some indications are sketched in the next several exercises and the next section, all of which can be skipped without loss of continuity.
Exercise 12.22. Show that if Tn :>- 0, then torization
rn
admits the triangular fac-
(12.18)
where (12.19)
Ln
=
(n) 'Yoo (n) 'Y10
0
0
(n-1) 'Yoo
0
(n) 'YnO
(n-1) 'Yn -1,0
(0) 'YOO
(n) 'Yoo
LHn-
0
(n) 'Y01 (n-1) 'YOO
(n) 'YOn (n-1) 'Y0,n-1
0
0
(0) 'YOO
and
(n)}-l "00 { (n-1)}-1 , • .., { 'YOO (0)}-1} · {{ 'YOO D n = dlag .
(12.20)
Exercise 12.23. Find formulas in terms of 'Y~) analogous to those given in the preceding exercise for the factors in a triangular factorization of the form (12.21)
where Tn :>- 0,
Un
is an upper triangular matrix and Dn is a diagonal matrix.
Positive definite Toeplitz matrices playa significant role in the theory of prediction of stationary stochastic sequences, which, when recast in the language of trigonometric approximation, focuses on evaluations of the following sort: (12.22)
min
{~ r'lf leinO - I: cjeijOl2 f(eiO)dO : co, ... 27r Jo
,Cn-1 E
j=O
c}
=
h~';)}-l
and (12.23)
. {mm
1
1 11
27r 0
2'1f
n - ~ ~ cJ.eijOl2f( eiO)dO'. CI, ••. j=l
,Cn
E
tr \l...
}
_ -
{
(n)}-l , 'Yoo
where, for ease of exposition, we assume that f(e iO ) is a continuous function of () on the interval 0 ::; () ::; 27r such that f(e iO ) > 0 on this interval. Let
252
12. Triangular factorization and positive definite matrices
Tn = Tn (1) denote the Toeplitz matrix with entries (12.24)
~
tJ· =
f21r f(eiO)e-ijOd8 27r Jo
for
j = 0, ±1, ±2, ....
Exercise 12.24. Show that (12.25)
bO] if b = [ b: ' n
then
1 121r
27r
0
ILn
bjeijOl2 f(eiO)dO = bHTnb .
j=O
Exercise 12.25. Show that if Tn >- 0 and u H = [tn (12.26)
0]
In [Tn-l Tn = [uHT;!l 1 OH
Tn =
[1
VHT;!l] 1
o
then
0 1[InOH
h~~2} -1
Exercise 12.26. Show that if Tn >- 0 and v H = [tl
(12.27)
tt],
...
[ha~)}-l
OH
Tn -
0
...
tn], then
1[Tn_Iv _11
1
Exercise 12.27. Show that if Tn >- 0, then
(12.28)
ha~)}-lha~-l)}-l"'hag)}-l
detTn -
h~r::}-lhi~~Ll}-l ... hag)}-l.
=
Exercise 12.28. Verify formula (12.22). [HINT: Exploit formulas (12.25) and (12.26).] Exercise 12.29. Verify formula (12.23). [HINT: Exploit formulas (12.25) and (12.27).] Exercise 12.30. Use the formulas in Lemma 8.15 for calculating orthogonal projections to verify (12.22) and (12.23) another way. Exercise 12.31. Show that if Tn >- 0, then 'Ya~) = 'Y~r:: and
'Ya~-l) ~ 'Ya~) .
(12.29)
[HINT: The monotonicity is an easy consequence offormula (12.23).J Exercise 12.32. Show that if Tn >- 0, then the polynomials k
(12.30)
qk(.x)
= L 'Yj~) .xj
for
k
= 0, ...
,n
j=O
are orthogonal with respect to the inner product
r
1r - - '0 1 '0 (12.31) (qj,qk),= 27rJo qk(eiO)f(e' )qj(e' )dO,
and
(k)
(qk,qk)'='Ykk'
253
12.5. Positive definite Toeplitz matrices
Exercise 12.33. Use the orthogonal polynomials defined by formula (12.30) to give a new proof of formula (12.22). [HINT: Write (n = Ej=o CjQj«().] Exercise 12.34. Let f(e iO ) = Ih(e i OI 2 , where h«() = E~o hj(j, E~o Ihjl < 00 and Ih«()1 > 0 for 1(1 ::; 1. Granting that l/h has the same properties as h (which follows from a theorem of Norbert Wiener), show that limb~":2}-l = Ihol2.
(12.32)
nloo
[HINT: 11 - Ej=l cjeijOl2 f(e iO ) = Ih(e10 ) - Ej=1 cjeijOh(elo)12 = Iho + u(e iO )12 where u(e iO ) = ~c:o ~~ c'eijOh(elO) is orthogonal to I , w}=1 h'e } ijO - w}=l J ho with respect to the inner product of Exercise 8.3 adapted to [0, 27l'].] The next lemma serves to guarantee that the conditions imposed on f(e iO ) in Exercise 12.34 are met if f«() = a«()/b«(), where a«() = E;=-k aj(j and b«() = E~=-f bj(j are trigonometric polynomials such that a«() > 0 and b«() > 0 when 1(1 = 1. Lemma 12.8. (Riesz-Fejer) Let n
f«() ==
L
h(j
for
1(1 = 1
j=-n be a trigonometric polynomial such that If«()1 > 0 for every point ( E C with 1(1 = 1 and fn =1= O. Then there exists a polynomial
«( -
f«() = l
1 for j
for
1(1 = 1
= 1, ... ,no
Proof. Under the given assumptions, it is readily checked that f - j = fj for j = 0, ... ,n and hence, that f«(3) = f(l/ (3) for every point (3 E C \ {O}. Moreover, since g«() = (n f«() = f-n + h-n( + ... + fn(2n is a polynomial of degree 2n with g(O) = f -n =1= 0, g«()
= a«( -
(31)'"
«( - (32n)
for some choice of points a, (31, ... , (32n E C \ {O}. However, in view of the preceding discussion, these roots can be indexed so that l(3j I > 1 and (3j+n = 1/(3j for j = 1, ... , n. Therefore, n
f«()
= (-n a
II «( - (3j)«( -1/(3j) j=l n
-
(-1)n a«(31'''(3n)-lII«(-(3j)«(-(3j) if j=l
1(1=1.
The polynomial 'Pn(() = V(-1)naCBloo . .Bn)-l TIj=l(( - .Bj) meets the stated requirements of the lemma. D
12.6. Detour on block Toeplitz matrices The interplay between the two Schur complements that was used to establish formula (12.10) is easily adapted to the more general setting of block Toeplitz matrices (12.33)
t ~'Ok]
.L, rT'k = [tt:'Ok
with blocks ti E CpxP, i = 0, ... ,k,
and their inverses (12.34)
rk
=
[~~) ... 1'~)
'~.,z)]
with blocks ,~k) E CpxP, i,j
= 0, ...
,k,
(k)
'kk
when they exist. Lemma 12.9. Let Tk denote the block Toeplitz matrices defined by formula (12.33) and suppose that n 2 1 and Tn is invertible. Let r n = T;l be decomposed into blocks as in formula (12.34). Then the following are equivalent: (1) Tn and ,~) are invertible. (2) Tn and Tn-l are invertible. (3) Tn and 1'~':l are invertible.
Proof. The proof is an easy consequence of the two Schur decompositions used in the proof of Theorem 12.7 except that now block Schur decompositions are used and the vectors x H and yH are replaced by the block rows [,a~) and [,~~) . . . 'Y~~-l ] , respectively. The details are left to the reader. D
.. . 'Y/::/]
Exercise 12.35. Show that in the setting of Lemma 12.9, det
'Y(n) 100
= det 'Y(n) Inn •
Theorem 12.10. Let Tn and Tn-l be invertible block Toeplitz matrices of the form (12.33) and let r n = T;l. Then the matrix polynomials n
(12.35)
Pn(.X)
=
L Aj'Y;~) , j=O
P~(A) =
n
L Aj'Ya~) j=O
n
(12.36)
Qn(A)
=
L Aj,;~) j=O
and Q~(A) =
n
L Aj'Y~;) j=O
255
12.6. Detour on block Toeplitz matrices
are connected by the formula
Proof. The proof is an almost exact paraphrase of the verification of formula (12.10), except that now block Schur decompositions are used and the vectors x H and yH are replaced by the block rows h'a~) and
... ,a:)]
(n) [ 'nO
. ..
(n)]
'n,n-l'
respect·lveIy.
The well known Gohberg-Heinig formula for r n can be obtained from formula (12.37) with the aid of the following evaluation: Lemma 12.11. Let
X(-\)
=
"
L-\iXi
and Y(-\) =
i=O
L" -\iYi i=O
be matrix polynomials with coefficients Xi E C pxq and 0, ... ,.e such that (12.38)
X(eiO)Y(eiB)H = Opxp for 058
Yi
E
c qxp for i
< 2rr.
Then (12.39)
X{-\)Y{w)H 1- -\w X"-1
~ -'li(A) [X':
Xl]
Yl!-l
...
0
X"
y,H
X'o·· ] ·
0
o
y 2H
y;H 2 y;H 2 y;H 3
H [YI. ..
"
\II{-\) = \II"-I{-\) = [lp -\lp Proof.
l
1
y:H
0
0 y:H
l
... X2
[Xl = -\11(-\) : where
y:H
X"
X2 X2 X3 X"
...
0
... ...
\II (w)H
0 y:H
"
!
H Y. ]
... -\"-1] .
The condition (12.38) is equivalent to the condition
-H X{-\)Y{1/-\) = Opxp for
-\ E
C \ {O}.
'li(w)H,
=
Thus, if w =I 0 and j.L
= l/w, then
X(A)Y(W)H 1- AW
=
{X (A) - X(l/w)}Y(w)H 1- AW £
.
.
£
.
.
~
=
A3 - j.L3 H -j.L LJ XjY(w) '0 A-j.L 3=
_
_j.L ~ ,\J - j.L3 X -Y(w)H LJ A-j.L 3 j=I
£ j-I
=
-j.L LLAij.Lj-I-iXjY(w)H j=Ii=O i-I s
=
-j.L LLAi j.Ls-iXS +1Y (w)H s=Oi=O i-I
=
i-I
-j.L L Ai Lj.LS-iXS+1Y (w)H , i=O s=i
which serves to identify the coefficient of Ai in the first formula on the righthand side of (12.39) with £-1
£-1
£
-j.L Lj.LS-iXS+1Y (w)H = - L Lwt- S +i- I X s+1ytH . s=i s=i t=o Moreover, since the right-hand side of the last formula is a polynomial in W, it can be reexpressed as
- {Xi+I twt-IytH + Xi+2 t wt- 2ytH + ... + X£ twt-HiytH} t=O t=o t=o -
- {Xi+l twt-1ytH + Xi+2 t wt- 2ytH + ... + Xi t wt-HiytH} t=l t=2 t=i-i - {Xi+!
~wtyt!l + Xi+2 ~wtyt!2 + ... + Xi twtyt!i_i} . t=O
t=O
t=O
Therefore, the coefficient of AiWj in the first formula on the right-hand side of (12.39) is equal to
- {Xi+lYj~l and to
+ Xi+2Yj!2 + ... + XiYl!-i+j}
if i ~ j
12.6. Detour on block Toeplitz matrices
257
which yields the first formula in (12.39). The second formula follows easily from the first upon noting that
[XI X2 Xe
.. .
X2 X3
...
Xe-l
.. .
...
Xe
=
I] [I
0
0
X2
Xl]
0
[0
0 Ip
;e
Jp
0
!] 0
Theorem 12.12. (Gohberg-Heinig) In the setting of Theorem 12.10, (n) (n) (n) (n) 0 0 'Ynn 'Ynn 'YOn 'Yn-l,n (12.40) T;;l =
0
(n) 'Ynn
'YIn
0
0
(n) 'Ynn
(n) 'YnO
0
0
0
(n)
(n)
(n)
(n) 'Ynn
'Yn,n-l 1 Dn (n)
(n)
'Ynl
'YnO
0 0
0
0
0
(n)
(n)
'YOn
0
0
(n)
(n) 'YOn
0
'Y1O
(n) 'YnO
'Y20
(n)
0 0
'YnO
(n)
0
'YOI
(n)
'Y02
where · {(n) (n)} D o = dlag 'YOO , ... , 'Yoo
Proof.
an d d'lag {(n) 'Ynn,'"
(n)} , 'Ynn .
Let
and Then Xj = {
['YJ~)h6~)}-1
'Yj-l,n
[0 Ip]
j=n+1
for
(n)
{(n) 'Ynn
}-l]
for
j
= 1, ... ,n
and
{
Y;
[h6j)}H
~ [0
(n) 'Ynn
(n)
'Yn-l,O
D- 1 'YO,n-l 0
0 0
0
-htJ-l}H]
-bi':i}H]
for j
for
j=l, ... ,n
~ n +1
(n) 'YOn
and the formula emerges from formula (12.39) upon making the requisite substitutions. D
12.7. A maximum entropy matrix completion problem In this section we shall consider the problem of completing a matrix in C nxn that belongs to the class c~xn =
{A
E c nxn :
A>- O}
when only the entries in the 2m + 1 central diagonals of A, i.e., the entries with indices in the set
aij
Am
(12.41)
= {(i,j):
i,j = 1, ... ,n and
Ii - jl:::; m},
are given. It is tempting to set the unknown entries equal to zero. However, the matrix that is obtained this way is not necessarily positive definite; see Exercise 12.18 for a simple example. A remarkable fact is that there exists exactly one completion A E c~xn of the partially specified A such that ef(A)-lej = 0 for (i,j) ¢ Am. We shall sketch an algorithm for obtaining this p~icular completion that is based on factorization and shall show that A can also be characterized as the completion which maximizes the determinant. Because of this property A is commonly referred to as the maximum entropy completion.
Theorem 12.13. Let m be an integer such that 0 :::; m :::; n - 1 and let {bij:
(i,j)
E
Am}
be a given set of complex numbers. Then there exists a matrix A E such that (12.42)
aij
= bij for (i,j)
E
c~xn
Am
if and only if (12.43)
[
1>- 0
jj
bj,j+m
bj;m,j
bj+~,j+m
b
:
for j = 1, ... ,n - m.
Discussion. The proof of necessity is easy and is left to the reader as an exercise. The verification of sufficiency is by construction and is most easily understood by example. To this end, let n = 5 and m = 2, and let X E C 5x5 denote the lower triangular matrix with entries Xij that are set equal to zero for i > j + 2 ( i.e., X41 = X51 = X52 = 0) and are determined by the following equations when j :::; i :::; j + 2: (12.44)
[bU b21
b12 b22
b 13 ] b23
b31
b32
b33
[xu] = [1] X21 X31
0, 0
[b22 b32
b23 b33
b24] b34
b42
b43
b44
12.7. A maximum entropy matrix completion problem
259
(12.45)
[~:: ~:: ~::] [~::] [~] b53
b54
b55
0
X53
[b44 b54
b45] [X44] b55 X54
[1]° ,
=
°
b55X55 =
1.
Next, let D = diag {xu, ... ,X55}. Since Xjj > for j = 1, ... ,5, D >- 0 and the matrix L = X D- 1 is lower triangular with ones on the diagonal. Now set (12.46) Then clearly A E C~X5 and equations (12.44) and (12.45) are in force, but with aij in place of bij . Therefore, the numbers Cij = bij - aij are solutions of the equations (12.47)
cu [ C2l
CI2 C22
CI3] C23
C22 [ C32
C23 C33
C24] [X22] C34 X32 =
C3l
C32
C33
C42
C43
C44
X42
[0]0
°
,
(12.48)
[~:: ~:: ~::] [~::] [~] =
C53
C54
C55
0
X53
[C44 C54
C45] [X44] C55 X54
=
[0]0 ,
C55X55
= o.
But, since the Xjj are positive and each of the five submatrices are Hermitian, it is readily seen that Cij = 0 for all the indicated entries; i.e., aij = bij for Ii - jl :S 2. Thus, the matrix A constructed above is a positive definite completion. 0 At first glance it might seem that the missing entries in the partially specified matrix should be set equal to zero. However, as we have already noted, Exercise 12.18 shows that the matrix that arises this way is not necessarily positive definite over en. The matrix A that is constructed in Theorem 12.13 inherits special properties from the construction. Lemma 12.14.
If
A = XXH and X E
c nxn
is a lower triangular invert-
ible matrix, then
(12.49)
aij = 0
for
i - j 2: k
-<==> Xij
= 0
for
i - j 2: k .
Discussion. The verification of (12.49) becomes transparent if the calculations are organized properly. The underlying ideas are best conveyed by example. Let A E C 7x7 and suppose that k = 3. Then the entries aij in A that meet the constraint i - j 2: k with k = 3 can be expressed in terms of the corresponding entries Xij in the lower triangular matrix X by means of
the formulas
["41] a5l a6l
=
a71
X5l _ [X51 X52] [_] [X41] [a52] Xu, a62 = X6l X62 ~~~ X6l
a72
X71
X71
,
X72
(12.50)
[X61 x62 X63] X71 X72 X73
[a 63 ] a73
[=:] _ X33
4
and
a74
=L
X7j X 4j .
j=l
Thus, as the diagonal entries of X are all nonzero, assertion (12.71) is easily verified for this special case. The general case may be established in just the same way. There is a companion result, which we state without proof:
Lemma 12.15. If A = yyH and Y E C nxn is an upper triangular invertible matrix, then (12.51)
aij
= 0 for i - j :::; -k
<¢:::=>
Exercise 12.36. Verify Lemma 12.15 if n
Yij =
0
for
i - j :::; -k.
= 7 and k = 3.
Theorem 12.16. Let m be an integer such that 0:::; m :::; n - 1 and let
be a given set of complex numbers such that the conditions (12.43) are in Then there exists exactly one matrix A E c~xn such that
force.
(12.52)
aij
=
bij
for
(i,j) E Am
and (12.53)
Proof. In view of Lemma 12.14, the matrix A = (LH)-l D-l L -1 that was constructed in the discussion of Theorem 12.13 meets both of the stated conditions. Suppose next that Al E c~xn is a second matrix that meets the conditions (12.52) and (12.53). Then, by Theorem 12.4, Al admits a factorization of the form
All =
LlDlL{f
for some lower triangular matrix LI E c nxn with ones on the diagonal and some diagonal matrix Dl E c~xn. Moreover, by Lemma 12.14 the entries Zij in the lower triangular matrix Z = LID are equal to zero for i > j + m. Consequently, the entries Zij with j :::; i :::; j + m are determined by the same
12.7. A maximum entropy matrix completion problem
261
equations as the Xij for j ~ i ~ j + m, Le., by equations (12.44) and (12.45) if n = 5 and m = 2, or, in general, by the equations
(12.54)
BUJ+mJ
L~J ~ [t]
w
j~1,
... ,n-m
and (12.55)
BU.nJ
[:i'J m
for j
~ n-m+
1, ... ,no
Thus, Zij = Xij for i, j = 1, ... , n and hence Al = Ai i.e., the proof of uniqueness is complete. 0 Theorem 12.17. Let m be an integer such that 0 {bij :
~
m
~
n - 1 and let
(i,j) E Am}
be a given set of complex numbers such that the conditions (12.43) are in force. Let A E c~xn meet conditions (12.52) and (12.53) and let C E c~xn meet condition (12.52). Then (1) det A 2: det C, with equality if and only if A = C.
(2) If A = LADAL~ and C = LcDcLg, in which LA and Lc are lower triangular with ones on the diagonal and D A and Dc are n x n diagonal matrices, then D A t Dc, with equality if and only if A = C. Proof.
In view of Theorem 12.4, A = (XH)-l DX- l and C
= (yH)-lGy-l
,
where X E c nxn and Y E c nxn are lower triangular matrices with ones on the diagonal, D E c~xn and G E c~xn are diagonal matrices and Xij = 0 for i 2: m + j. Therefore, the formulas C
= A + (C - A) and Z = y-l X
imply that ZHGZ=D+XH(C-A)X.
Thus, as Z is lower triangular with ones on the diagonal and the diagonal entries of XH (C - A)X are all equal to zero, n d jj
= Lgssl z sjl2 = gjj + L9ssl z sjl2 2: gjj s=j
s>j
with strict inequality unless Zsj = 0 for s > j, i.e., unlessZ = In. This completes the proof of (1). Much the same argument serves to justify (2).
o
12. Triangular factorization and positive definite matrices
262
Remark 12.18. Theorem 12.16 can also be expressed in terms of the orthogonal projection PAm that is defined by the formula PAm A
L
=
(A, eieJ)eieJ
(i,j)EAm
c
on the inner product space nxn with inner product (A, B) = trace {BH A}: If the conditions of Theorem 12.16 are met and if Q E C nxn with qij = bij for (i, j) E Am, then there exists exactly one matrix A E c~xn such that PAmA = Q and
{In - PAm)A- 1 = O.
This formulation suggests that results analogous to those discussed above can be obtained in other algebras, which is indeed the case.
12.8. Schur complements for semidefinite matrices In this section we shall show that if E to, then analogues of the Schur complement formulas hold even if neither of the block diagonal entries are invertible. (Similar formulas hold if E :::S 0.) Lemma 12.19. Let A E C pxP , DE
c qxq , n = p + q,
E=[Bt
and let
~]
be positive semidefinite over C n. Then:
(1) NAc;;.,NBH andNDc;;.,NB . (2) 'R-B c;;., 'R-A and'RBH c;;., 'RD. (3) AAtB = Band DDtBH = BH.
(4) The matrix E admits the (lower-upper) factorization (12.56)
_ [ Ip E- BHAt
0 ] [ A Iq 0
0 ] [Ip D-BHAtB 0
At B ] Iq ,
where At denotes the Moore-Penrose inverse of A.
(5) The matrix E admits the (upper-lower) factorization (12.57)
E = [Ip
o
BDt] [ A - BDt BH Iq 0
0] [ D
Ip DtBH
where Dt denotes the Moore-Penrose inverse of D.
Proof.
Since E is presumed to be positive semidefinite, the inequality xH(Ax + By)
+ yH (BHx+ Dy)
~ 0
must be in force for every choice of x E C P and y E C q. If, in particular, x E NA, then this reduces to xHBy+yH(BHx+Dy) ~ 0
12.8. Schur complements for semidefinite matrices
263
for every choice of y E C q and hence, upon replacing y by cy, to cxHBy
+ cyHBHx+ c2yHDy ~ 0
for every choice of E > 0 as well. Consequently, upon dividing through by c and then letting c 1 0, it follows that xHBy+yHBHx ~ 0
for every choice of y E C q. But if y = - BH x, then the last inequality implies that
Therefore, BHx=O,
which serves to complete the proof of the first statement in (1) and, since the orthogonal complements of the indicated sets satisfy the opposite inclusion, implies that RAH = (NA).L ;2 (NBH).L = RB. Since A = A H , this verifies the first assertion in (2); the proofs of the second assertions in (1) and (2) are similar. The fourth assertion is a straightforward consequence of the formula AAt A = A and the fact that A
= AH ===> (At)H = At .
o
Items (3) and (5) are left to the reader. Exercise 12.37. Verify items (3) and (5) of Lemma 12.19. Theorem 12.20. If A E of the form
c nxn
and A t 0, then A admits factorizations
(12.58)
where L is lower triangular with ones on the diagonal, U is upper triangular with ones on the diagonal, and Dl and D2 are n x n diagonal matrices with nonnegative entries. Since A[l.k]
Proof.
to for
k = 1, ... , n, formula (12.56) implies that
_ - [A[l.k-l] 0 A[l.k] - Lk
0]
-H Uk Lk
for
k -_ 2, ...
,n,
where Lk is a k x k lower triangular matrix with ones on the diagonal and Uk ~ 0: A
0]
= L-n[A[l.n-I] Oun
L- H A n' ... , [1.2]
= L- 2 [A[l.l] 0
0]
U2
L- H
2·
The first formula in (12.58) is obtained by setting Lk = diag {Lk,In-d and writing A
= A l1 ,n] = LnLn-l ... L2 diag{au, a2, ... ,an} Lr ... L:[_I L:[ .
The second formula in (12.57) is verified on the basis of (12.58) in much the same way. 0 Exercise 12.38. Let
Show that if a> b> c> 0 and ac > b2 , then: (1) The matrices A and B are both positive definite over (2) The matrix AB is not positive definite over e 2. (3) The matrix AB + BA is not positive definite over
e 2.
e 2.
Exercise 12.39. Show that the matrix AB considered in Exercise 12.38 is not positive definite over 1R 2 • Exercise 12.40. Let A E e nxn and B E e nxn both be positive semidefinite over en. Show that A 2 B2 + B2 A 2 need not be positive semidefinite over en. [HINT: See Exercise 12.38.] Exercise 12.41. Let A E e nxn be expressed in block form as
A = [Au A12] A21 A22 with square blocks Au and A22 and suppose that A t O. Show that: (1) There exists a matrix K E
(2) A = [;H
~] [A~l
e pxq such that A12 =
A22 _ %H AuK]
AuK.
[3 z]·
(3) KHAuK = A21AilA12' Exercise 12.42. Show that in the setting of Exercise 12.41
e qxp such that A21 = A22K. KA22KH [Ip 0].
(1) There exists a matrix K
(2)
A=
[11' o
(3) KA22KH
KH] [Au Iq 0
E
0]
A22
K Iq
= A12A~2A21'
Exercise 12.43. Let A = BBH, where B E e nxk and rankB = k; let UI,··· , Uk be an orthonormal basis for 'RBH; and let Ae = E~=l BUjuy BH for £. = 1, ... ,k. Show that A = Ak and that A-Ae is a positive semidefinite matrix of rank k - f. for £. = 1, . .. , k - 1.
12.9. Square roots
265
Exercise 12.44. Show that if A E c nxn , then (12.59)
det A[l,k) ~ 0 for
A!: 0 ====> A = AH and
k
= 1, ... , n,
but the converse implication is false. Exercise 12.45. Let A E C nxn. Show that
A = AH and O"(A) C [0,00) <===> A t: O.
(12.60)
12.9. Square roots Theorem 12.21. If A E
BE
c
nxn
such that B !:
c nxn and A!: 0, then there is exactly one matrix
° and B2
= A.
Proof. If A E c nxn and A !: 0, then there exists a unitary matrix U and a diagonal matrix D = diag{du, ... ,linn} with nonnegative entries such that A = UDU H • Therefore, upon setting D 1/ 2
1/ 2 } , /2 dnn = d'lag {d1u,···,
it is readily checked that the matrix B = UD 1/ 2 U H is again positive semidefinite and
B2 = (UDl/2U H )UDl/2UH )
= UDU H = A.
This completes the proof of the existence of at least one positive semidefinite square root of A. Suppose next that there are two positive semidefinite square roots of A, say Bl and B 2. Then, since Bl and B2 are both positive semidefinite over en and hence Hermitian, there exist a pair of unitary matrices U1 and U2 and a pair of diagonal matrices Dl !: 0 and D2 !: such that
°
Bl = U1DIU[i
and B2 = U2D2UJ! .
Thus, as it follows that and hence that
(UJ!U1D 1 - D2UJ!U1)D1 + D2 (UJ!Ul Dl - D2UJ!Ul) = O. But this in turn implies that the matrix
X = UJ!U1Dl - D2UJ!Ul is a solution of the equation
266
12. Triangular factorization and positive definite matrices
The next step is to show that X = 0 is the only solution of this equation. Upon writing ' {del) (2)} Dl - dlag 11' . .., del)} nn and D2 -- d'lag {(2) d11 , · · · , dnn
one can readily check that the equation
Xij,
the ij entry of the matrix X, is a solution of
XijdB) . (1) h Thus, If d jj + d(2) ii > 0, t en
,
+ di~2) Xij = O.
(2) = O. 0 n t he ot her h an d ,1'fd(l) j j + d ii = 0, = di~2) = 0 and, as follows from the definition of X, Xij = 0 in this
Xij
then dB) case too. Consequently,
UJ!UlD1 - D2Ut!U1 = X = 0; i.e., as claimed. o If A ~ 0, the symbol A 1/2 will be used to denote the unique n x n matrix B ~ 0 with B2 = A. Correspondingly, B will be referred to as the square root of A. The restriction that B ~ 0 is essential to insure uniqueness. Thus, for example, if A is Hermitian, then the formula
exhibits the matrix as a square root of for every choice of C that commutes with A. In particular, 0] [Ik 0] [Ik C -Ik C -h -
[h0
0]
h
for every C E C kxk
.
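Theorem 12.21 and the remark above are easy to illustrate numerically. The Python sketch below (added to this copy as an illustration, not from the book) computes the unique positive semidefinite square root via an eigendecomposition and then exhibits a second Hermitian square root that fails the positivity constraint.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
G = rng.standard_normal((n, n))
A = G @ G.T                      # A is positive semidefinite

# The unique positive semidefinite square root of Theorem 12.21: B = U D^{1/2} U^H.
w, U = np.linalg.eigh(A)
B = U @ np.diag(np.sqrt(np.clip(w, 0, None))) @ U.T
print(np.allclose(B @ B, A), np.all(np.linalg.eigvalsh(B) >= -1e-12))

# Without the constraint B >= 0 there are many Hermitian square roots:
# flipping the sign of one eigenvalue still squares back to A.
s = np.ones(n); s[0] = -1.0
C = U @ np.diag(s * np.sqrt(np.clip(w, 0, None))) @ U.T
print(np.allclose(C @ C, A), np.all(np.linalg.eigvalsh(C) >= 0))  # True, False
```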
Exercise 12.46. Show that if A, BEe nxn and if A >- 0 and B = B H , then there exists a matrix V E c nxn such that
VH AV = In and VH BV = D = diag{.A1, ... ,.An}. [HINT: Reexpress the problem in terms of U = A1/2V.] Exercise 12.47. Show that if A, B E c nxn and A ~ B >- 0, then B- 1 ~ A-I >- O. [HINT: A - B >- 0 ~ A- I/2 BA-1/2 ~ In.] Exercise 12.48. Show that if A, BE traceAB ~ 0 (even if AB ~ 0).
c nxn and if A ~ 0
and B
~
0, then
12.10. Polar forms
267
12.10. Polar forms If A E C pxq and r = rankA ~ 1, then the formula A = VIDU[I that was obtained in Corollary 10.2 on the basis of the singular value decomposition of A can be reexpressed in polar form: A
(12.61)
= VIU{' (UIDU{') and
A = (ViDViH)VIU[I ,
where VIU{' maps RAH isometrically onto RA, U1DU[I = {AH AP/2 is positive definite on RAH and VIDV1H = {AAHP/2 is positive definite on R A. These formulas are matrix analogues of the polar decomposition of a complex number. Theorem 12.22. Let A E C pxq • Then (1) rankA = q if and only if A admits a factorization of the form A = VIPI , where VI E C pxq is isometric; i.e., V1HVi = I q , and PI E c qxq is positive definite over C q • (2) rankA = p if and only if A admits a factorization of the form A = P2U2, where U2 E C pxq is coisometric; i.e., u2uf = I p , and P2 E C pxp is positive definite over C P • Proof. If rankA = q, then p factorization of the form A= V
~ q
and, by Theorem 10.1, A admits a
[~] u=v [3] DU,
where V and U are unitary matrices of sizes p x p and q x q, respectively, and D E c qxq is positive definite over C q • But this yields a factorization of the asserted form with VI
=
V
[3]
U and PI =
uH DU.
Conversely, if
A admits a factorization of this form, it is easily seen that rank A = q. The details are left to the reader. Assertion (2) may be established in much the same way or by invoking (1) and passing to transposes. The details are left to the reader. D Exercise 12.49. Complete the proof of assertion (1) in Theorem 12.22. Exercise 12.50. Verify assertion (2) in Theorem 12.22. Exercise 12.51. Show that if UU H = VVH for a pair of matrices U, V E C nxd with rank U = rank V = d, then U = V K for some unitary matrix K E C dxd . Exercise 12.52. Find an isometric matrix Vi and a matrix PI
that
[~
!]
= V,h
>- 0 such
12. Triangular factorization and positive definite matrices
268
12.11. Matrix inequalities Lemma 12.23. If F E C pxq , G E C rxq and FH F - G HG !:::: 0, then there exists exactly one matrix K E C rxp such that
(12.62)
G
= KF
and Ku
= 0 for every
Moreover, this matrix K is contractive:
Proof.
u E NFH .
IIKII ::; 1.
The given conditions imply that
(FHFx, x) ~ (GHGx,x)
for every x E C q .
Thus, Fx = 0 ==} IIGxl1 = 0 ==} Gx = 0 ; i.e., N F ~ N G and hence 'R,GH ~ 'R,FH. Therefore, there exists a matrix Kf E C pxr such that GH = FH Kf.
If NFH (12.62).
= {O}, then the matrix K = Kl meets both of the conditions in
If NFH =1= {O} and V E C pxi is a matrix whose columns form a basis for NFH, then
FH(K{! + VL) = FHK{! = G H for every choice of L E C ixr . Moreover, (Kl
+ LHVH)V = 0
{::::::> {::::::>
LH = -Kl V(VHV)-l Kl + LHV H = K1(Ip - V(VHV)-lVH).
Thus, the matrix K = Kl(Ip - V(VHV)-lV) meets the two conditions stated in (12.62). This is eminently reasonable, since Ip - V(VHV)-l VH is the formula for the orthogonal projection of C P onto 'R,F. It is readily checked that if K E C rxp is a second matrix that meets the two conditions in (12.62), then K = K. The details are left to the reader. It remains to check that K is contractive. Since CP
= 'R,F (f)NFH,
every vector u E CP can be expressed as u = Fx + Vy for some choice of x E C q and y E C i. Correspondingly,
(Ku,Ku) = (K(Fx+ Vy), K(Fx + Vy)) = (KFx,KFx) =
(Gx, Gx) ::; (Fx, Fx)
< (Fx, Fx) + (Vy, Vy) = (u, u) .
o Exercise 12.53. Show that if K E crxp and K E C rxp both meet the two conditions in (12.62), then K = K and hence that K is uniquely specified in terms of the Moore-Penrose inverse Ft of F by the formula K = GFFt.
Corollary 12.24. If, in the setting of Lemma 12.23, FH F = GHG, then the unique matrix K that meets the two conditions in (12.62) is an isometry on RF. Proof.
This is immediate from the identity (K Fx, K Fx)
= (Gx, Gx) = (Fx, Fx) ,
which is valid for every x E C q. Lemma 12.25. Let A E O. Then:
c nxn
D
and BE
(1) There exists a matrix K E KHK t O. (2) A 1/2 t B1/2. (3) det A 2: det B 2:
c nxn
c nxn ,
and suppose that A t B t
such that B = KH AK and
In -
o.
Moreover, if A >- 0, then
(4) det A
= det
B if and only if A
= B.
Proof. Lemma 12.23 with F = A 1/2 and G = B1/2 guarantees the existence of a contractive matrix K E C nxn such that K A 1/2 = B1/2. Therefore, since B 1/ 2 = (Bl/2)H = A 1 / 2K H,
B = B1/2 B 1/ 2 = KA1/2(KA1/2)H = KAKH . Next, in view of Exercise 20.1, it suffices to show that all the eigenvalues of the Hermitian matrix A1/2 - B1/2 are nonnegative in order to verify (2). To this end, let (Al/2 - B1/2)U = AU for some nonzero vector u. Then
((A1/2
+ Bl/2)(Al/2 -
B1/2)u, u) ((A + B 1/ 2A 1/ 2 - A 1/ 2B 1/ 2 - B)u, u)
((A - B)u, u) 2: 0, since
+ A(U, u) = (A 1/2 B1/2u, U) . A 2: 0 if ((A1/2 + B1/2)u, u) > O. On the
(B1/2 A 1/2u, u) = (B1/2U, B1/2u)
The last inequality implies that other hand, if ((A1/2 + B1/2)u, u) = 0, then (A1/2u, u) and hence A = O.
= (B1/2u, u)
= 0
To obtain (3), observe first that in view of (1), the eigenvalues Ill,·· . ,Iln of KH K are subject to the bounds 0 ~ Ilj ~ 1 for j = 1, ... ,n. Therefore, det B = det (KAKH) = det (KH K) det A = (Ill··' Iln) det A ~ det A.
12. Triangular factorization and positive definite matrices
270
Moreover, if det B = det A and A is invertible, then J.LI KHK = In. Therefore, since KAI/2 = BI/2 = (BI/2)H
B
= ... = J.Ln = 1, i.e., = Al/2KH,
= A I/ 2KH KAI/2 = AI/2InAl/2 = A,
which justifies (4) and completes the proof. Exercise 12.54. Let A
=
[i
~]
and B
D
= [~ ~].
Show that A - B
~ 0,
but A2 - B2 has one positive eigenvalue and one negative eigenvalue. Theorem 12.26. If Al E C nx8 , A2 E C nxt , rankA I
A = [AI
= s, rankA2 = t and
A 2], then det (AH A) ~ det (A¥ AI) det (Alf A 2) ,
with equality if and only if A¥ A2 = O. Proof.
Clearly
AH A =
[A~ Al A~ A2]. A2 Al A2 A2
Therefore, since A¥ Al is invertible by Exercise 12.17, it follows from the Schur complement formulas that det (AHA) = det (A¥ AI) det (Alf A2 - Alf AI(A¥ AJ}-I A¥ A2). Thus, as
Alf A2 - Alf AI(A¥ AI)-I A¥ A2
j
Alf A 2 ,
Lemma 12.25 guarantees that det (Alf A2 - Alf Al (A¥ A I )-1 A¥ A 2) ~ det (Alf A 2) , with equality if and only if
Alf A2 - Alf A1(A¥ A 1)-1 A¥ A2 = Alf A 2 . This serves to complete the proof, since the last equality holds if and only if A¥A2 =0. D The lemma leads to another inequality (12.63) that is also credited to Hadamard. This inequality is sharper than the inequality (9.13). Corollary 12.27. Let A = [al aj E en for j = 1, ... , n. Then
an]
be an n x n matrix with columns
n
(12.63)
Idet A 12
~
II aJaj . j=1
Moreover, if A is invertible, then equality holds in (12.63) if and only if the columns of A are orthogonal.
12.12. A minimal norm completion problem
271
Proof.
The basic strategy is to iterate Theorem 12.26. The details are left to the reader. 0
Exercise 12.55. Complete the proof of Corollary 12.27. Exercise 12.56. Show that if U, V E C nxd and rank U = rank V = d, then UU H = VVH {::=} U = VK for some unitary matrix K E C dxd • Exercise 12.57. Show that if A E x E R(ln-A) if and only if
c nxn
and 0 ::S A ::S In, then a vector
lim((In - 8A)-lx,x) oj!
< 00.
[HINT: The result is transparent if A is diagonal.] Exercise 12.58. Show that if A, BE
c nxn and if A >-
°
and B >- 0, then
A+B v'det A det B $ det - 2 - '
(12.64)
Exercise 12.59. Show that if A, B E c nxn and if AB = then there exists a unitary matrix U E c nxn such that U H AU
where An >-
°
=
g]
[A~l
and UHBU
=
[g
°
but A+B >- 0,
~2]'
and B22 >- O.
12.12. A minimal norm completion problem The next result, which is usually referred to as Parrott's lemma, is a nice application of the preceding circle of ideas. Lemma 12.28. Let A E C pxq , BE C pxr and C E (12.65)
min
{II [~ ~] I :D
E
c sxq .
Then
csxr } = max {II [ ~ ]II, I [A B] II} .
The proof will be developed in a sequence of auxiliary lemmas, most of which will be left to the reader to verify. Lemma 12.29. Let A E C pxq . Then
IIAII $
,
{::=}
'l Iq -
AHA to{::=} ,2Ip - AAH to.
Proof. The proof is easily extracted from the inequalities in Exercise 12.9.
o
Lemma 12.30. Let A E C pxq , BE C pxr and C E
(1) ,
~ 1\ [ ~ ] I {::=} ,2Iq -
AHA t CHC.
c sxq .
Then
12. Triangular factorization and positive definite matrices
272
Proof. This is an easy consequence of the preceding lemma and the fact that ‖E‖ = ‖E^H‖. □

Lemma 12.31. If A ∈ ℂ^{p×q} and ‖A‖ ≤ γ, then

(12.66)  (γ² I_q − A^H A)^{1/2} A^H = A^H (γ² I_p − A A^H)^{1/2}

and

(12.67)  {(γ² I_q − A^H A)^{1/2}}^† A^H = A^H {(γ² I_p − A A^H)^{1/2}}^†.

Proof. These formulas may also be established with the aid of the singular value decomposition of A. □

Lemma 12.32. If A ∈ ℂ^{p×q}, p + q = n and ‖A‖ ≤ γ, then the matrix

(12.68)  E = [ A   (γ² I_p − A A^H)^{1/2} ; (γ² I_q − A^H A)^{1/2}   −A^H ]

satisfies the identity

E E^H = [ γ² I_p  0 ; 0  γ² I_q ] = γ² I_n.

Proof. This is a straightforward multiplication, thanks to Lemma 12.31. □
Lemma 12.33. Let A ∈ ℂ^{p×q}, B ∈ ℂ^{p×r} and C ∈ ℂ^{s×q} and suppose that

γ ≥ max { ‖ [A ; C] ‖ , ‖ [A  B] ‖ }.

Then there exists a matrix D ∈ ℂ^{s×r} such that

‖ [A  B ; C  D] ‖ ≤ γ.

Proof. The given inequality implies that

γ² I_q − A^H A ⪰ C^H C  and  γ² I_p − A A^H ⪰ B B^H.

Therefore, by Lemma 12.23,

(12.69)  B = (γ² I_p − A A^H)^{1/2} X  and  C = Y (γ² I_q − A^H A)^{1/2}

for some choice of X ∈ ℂ^{p×r} and Y ∈ ℂ^{s×q} with ‖X‖ ≤ 1 and ‖Y‖ ≤ 1. Thus, upon setting D = −Y A^H X, it is readily seen that

[ A  B ; C  D ] = [ I_p  0 ; 0  Y ] E [ I_q  0 ; 0  X ],

where E is given by formula (12.68). But this does the trick, since E E^H = γ² I_n by Lemma 12.32 and the norm of each of the two outside factors on the right is equal to one. □
12.13. A description of all solutions to the minimal norm completion problem

Theorem 12.34. A matrix D ∈ ℂ^{s×r} achieves the minimum in (12.65) if and only if it can be expressed in the form

(12.70)  D = −Y A^H X + (I_s − Y Y^H)^{1/2} Z (I_r − X^H X)^{1/2},

where X and Y are as in (12.71) and Z is any matrix in ℂ^{s×r} such that

(12.72)  Z^H Z ⪯ γ² I_r.

Discussion. We shall outline the main steps in the proof:

1. A matrix D ∈ ℂ^{s×r} achieves the minimum in (12.65) if and only if

(12.73)  ‖ [A  B ; C  D] ‖ ≤ γ,  where γ = max { ‖ [A ; C] ‖ , ‖ [A  B] ‖ }.

2. In view of Lemma 12.31 and the formulas in (12.69),

γ² I_{q+r} − [ A^H ; B^H ] [A  B] = M^H M,

where

M = [ (γ² I_q − A^H A)^{1/2}   −A^H X ; 0   γ (I_r − X^H X)^{1/2} ].

3. In view of (12.73), the identity in Step 2 and Lemma 12.23, there exists a unique matrix [K_1  K_2] with components K_1 ∈ ℂ^{s×q} and K_2 ∈ ℂ^{s×r} such that

[C  D] = [K_1  K_2] M = [ K_1 (γ² I_q − A^H A)^{1/2}   −K_1 A^H X + K_2 γ (I_r − X^H X)^{1/2} ].

4. K_1 = Y, since

M^H [ u_1 ; u_2 ] = 0 ⟺ (γ² I_q − A^H A)^{1/2} u_1 = 0 and −X^H A u_1 + γ (I_r − X^H X)^{1/2} u_2 = 0
⟺ (γ² I_q − A^H A)^{1/2} u_1 = 0 and (I_r − X^H X)^{1/2} u_2 = 0,
because

X^H A = B^H {(γ² I_p − A A^H)^{1/2}}^† A = B^H A {(γ² I_q − A^H A)^{1/2}}^†

and 𝒩(W^H) = 𝒩(W^†) for any matrix W ∈ ℂ^{k×k}.

5. Extract the formula

D = −K_1 A^H X + γ K_2 (I_r − X^H X)^{1/2}

from Step 3 and then, taking note of the fact that K_1 K_1^H + K_2 K_2^H ⪯ I_s, replace K_1 by Y and γ K_2 by (I_s − Y Y^H)^{1/2} Z.
12.14. Bibliographical notes

The section on maximum entropy interpolants is adapted from the paper [24]. It is included here to illustrate the power of factorization methods. The underlying algebraic structure is clarified in [25]; see also [34] for further generalizations. A description of all completions of the problem considered in Section 12.7 may be found, e.g., in Chapter 10 of [21]. Formulas (12.32) and (12.28) imply that

(12.74)  lim_{n↑∞} (1/n) ln det T_n(f) = ln |h_0|² = (1/2π) ∫_0^{2π} ln f(e^{iθ}) dθ

for the Toeplitz matrix T_n(f) based on the Fourier coefficients of the function f under consideration. This is a special case of a theorem that was proved by Szegő in 1915 and is still the subject of active research today; see, e.g., [9] and [65] for two recent expository articles on the subject, [64] for additional background material, and the references cited in all three. Lemma 12.8 is due to Fejér and Riesz. Formula (12.40) is one way of writing a formula due to Gohberg and Heinig. Other variants may be obtained by invoking appropriate generalizations of the observation

[ 0 0 1 ; 0 1 0 ; 1 0 0 ] [ a 0 0 ; b a 0 ; c b a ] [ 0 0 1 ; 0 1 0 ; 1 0 0 ] = [ a b c ; 0 a b ; 0 0 a ].

The minimal norm completion problem is adapted from [31] and [74], both of which cite [18] as a basic reference for this problem. Exercises 12.58 and 12.59 are adapted from [69] and [26], respectively.
Chapter 13
Difference equations and differential equations
There are few vacancies in the Big Leagues for the man who is liable to steal second with the bases full.
Christy Mathewson, cited in [40], p. 136

In this chapter we shall focus primarily on four classes of equations:

(1) x_{k+1} = A x_k, k = 0, 1, …, in which A ∈ 𝔽^{p×p} and x_0 ∈ 𝔽^p are specified.

(2) x′(t) = A x(t) for t ≥ a, in which A ∈ 𝔽^{p×p} and x(a) ∈ 𝔽^p are specified.

(3) x_{k+p} = a_1 x_{k+p−1} + ⋯ + a_p x_k for k = 0, 1, …, in which a_1, …, a_p ∈ 𝔽, a_p ≠ 0 and x_0, …, x_{p−1} are specified.

(4) x^{(p)}(t) = a_1 x^{(p−1)}(t) + a_2 x^{(p−2)}(t) + ⋯ + a_p x(t), in which a_1, …, a_p ∈ 𝔽 and x(a), …, x^{(p−1)}(a) are specified.

It is easy to exhibit solutions to the first-order vector equations described in (1) and (2). The main effort is to understand the behavior of these solutions when k and t tend to ∞ with the help of the Jordan decomposition of the matrix A. The equations in (3) and (4) are then solved by imbedding them in first-order vector equations of the kind considered in (1) and (2), respectively. Two extra sections that deal with second-order equations with nonconstant coefficients have been added because of the importance of this material in applications.
13.1. Systems of difference equations

The easiest place to start is with the system of difference equations (or, in other terminology, the discrete dynamical system)

(13.1)  x_{k+1} = A x_k,  k = 0, 1, …,

in which A ∈ ℂ^{p×p} and x_0 ∈ ℂ^p are specified and the objective is to understand the behavior of the solution x_n as n gets large. Clearly

x_n = A^n x_0.

However, this formula does not provide much insight into the behavior of x_n. This is where the fact that A is similar to a Jordan matrix J comes into play:

(13.2)  A = V J V^{-1} ⟹ x_n = V J^n V^{-1} x_0  for n = 0, 1, ….

The advantage of this new formulation is that J^n is relatively easy to compute: If A is diagonalizable, then

J = diag {λ_1, …, λ_p},  J^n = diag {λ_1^n, …, λ_p^n}

and

x_n = Σ_{j=1}^{p} d_j λ_j^n v_j

is a linear combination of the eigenvectors v_j of A, alias the columns of V, with coefficients that are proportional to λ_j^n. If A is not diagonalizable, then

J = diag {J_1, …, J_r},

where each block entry J_i is a Jordan cell, and

J^n = diag {J_1^n, …, J_r^n}.

Consequently the key issue reduces to understanding the behavior of the n'th power (C_λ^{(m)})^n of the m × m Jordan cell C_λ^{(m)} as n tends to ∞. Fortunately, this is still relatively easy:
Lemma 13.1. If N = C_λ^{(m)} − λ I_m = C_0^{(m)}, then

(13.3)  (C_λ^{(m)})^n = Σ_{j=0}^{m−1} \binom{n}{j} λ^{n−j} N^j  when n ≥ m.

Proof. Since N commutes with λ I_m, the binomial theorem is applicable and supplies the formula

(C_λ^{(m)})^n = (λ I_m + N)^n = Σ_{j=0}^{n} \binom{n}{j} λ^{n−j} N^j.

But this is the same as formula (13.3), since N^j = 0 for j ≥ m. □
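For readers who like to experiment, formula (13.3) is easy to confirm numerically. The sketch below is a hypothetical illustration in Python/NumPy; the Jordan cell is taken to be upper triangular, with λ on the diagonal and 1's on the superdiagonal, and the sizes are arbitrary choices.

import numpy as np
from math import comb

lam, m, n = 0.9, 4, 12
C = lam * np.eye(m) + np.diag(np.ones(m - 1), 1)   # the Jordan cell C_lambda^(m)
N = C - lam * np.eye(m)                            # nilpotent part, N^m = 0

direct = np.linalg.matrix_power(C, n)
binom = sum(comb(n, j) * lam ** (n - j) * np.linalg.matrix_power(N, j)
            for j in range(m))                     # formula (13.3)
print(np.allclose(direct, binom))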
Exercise 13.1. Show that if J = diag {λ_1, …, λ_p}, V = [v_1 ⋯ v_p] and (V^{-1})^T = [w_1 ⋯ w_p], then the solution (13.2) of the system (13.1) can be expressed in the form

x_n = Σ_{j=1}^{p} λ_j^n v_j w_j^T x_0.

Exercise 13.2. Show that if, in the setting of Exercise 13.1, |λ_1| > |λ_j| for j = 2, …, p, then

lim_{n↑∞} (1/λ_1^n) x_n = v_1 w_1^T x_0.
Exercise 13.3. The output Un of a chemical plant at time n, n = 0,1, ...• is modelled by a system of the form Un = Anno. Show that if A
=
[~ -~;~ ~] o
and Uo
= [ : ] , then c
0 1/4
l~m Un = [ a ~ 3b ] . n
0
00
Exercise 13.4. Find an explicit formula for the solution = Anuo when
Un
of the system
Un
Notice that it is not necessary to compute V^{-1} in the formula for the solution in (13.2). It is enough to compute V^{-1} x_0, which is often much less work: set y_0 = V^{-1} x_0 and solve the equation V y_0 = x_0.

Exercise 13.5. Calculate V^{-1} x_0 both directly (i.e., by first calculating V^{-1} and then calculating the product V^{-1} x_0) and indirectly by solving the equation V y_0 = x_0, and compare the effort.
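The same comparison is easy to carry out in code. A minimal sketch, assuming NumPy and using an arbitrarily chosen diagonalizable matrix in place of the specific data of Exercise 13.5, computes y_0 once by solving V y_0 = x_0 and then evaluates x_n = V J^n y_0 without ever forming V^{-1}.

import numpy as np

A = np.array([[0.5, 1.0],
              [0.0, 0.25]])
x0 = np.array([1.0, 2.0])

# eigendecomposition A = V J V^{-1}; here J is diagonal
evals, V = np.linalg.eig(A)

# preferred: solve V y0 = x0 instead of forming the inverse of V
y0 = np.linalg.solve(V, x0)

n = 10
xn = V @ (evals ** n * y0)          # x_n = V J^n y_0
print(np.allclose(xn, np.linalg.matrix_power(A, n) @ x0))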
13.2. The exponential e^{tA}

Our next objective is to develop formulas analogous to (13.2) for the solution of a first-order vector differential equation. To do this, it is useful to first discuss the exponential e^{tA} of a matrix A ∈ ℂ^{n×n}. It is well known that for every complex number α the exponential e^α may be expressed as a power series

e^α = Σ_{k=0}^{∞} α^k / k!,
which converges in the full complex plane ℂ. The same recipe may be used for square matrices A, thanks to the following lemma.

Lemma 13.2. Let A = [a_{ij}], i, j = 1, …, p, be a p × p matrix and let

a = max { |a_{ij}| : i, j = 1, …, p }.

Then the ij entry of A^k is subject to the bound

(13.4)  |(A^k)_{ij}| ≤ (ap)^k / p  for i, j = 1, …, p and k = 1, 2, ….

Proof. The proof is by induction. The details are left to the reader. □

Thus, for A ∈ ℂ^{p×p}, we may define

(13.5)  e^A = I_p + A + A²/2! + ⋯.
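Definition (13.5) also suggests an obvious, if crude, way to approximate e^A numerically: sum the series until the remaining terms are negligible. The following sketch assumes NumPy and SciPy are available; scipy.linalg.expm is used only as an independent reference value, and the matrix and the number of terms are arbitrary illustrative choices.

import numpy as np
from scipy.linalg import expm

def exp_series(A, terms=30):
    """Partial sum I + A + A^2/2! + ... of the series (13.5)."""
    S = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k      # term now equals A^k / k!
        S = S + term
    return S

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
print(np.allclose(exp_series(A), expm(A)))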
Exercise 13.6. Verify the bound (13.4).

Exercise 13.7. Show that if A ∈ ℂ^{p×p}, then the partial sums

S_k = Σ_{j=0}^{k} A^j / j!

form a Cauchy sequence in the normed linear space ℂ^{p×p} with respect to any multiplicative norm on that space.

Exercise 13.8. Show that if A ∈ ℂ^{p×p}, then

(13.6)  ‖ (e^{hA} − I_p − hA)/h ‖ ≤ (e^{|h| ‖A‖} − 1 − |h| ‖A‖)/|h| ≤ (e^{|h| ‖A‖} − 1) ‖A‖.

Exercise 13.9. Show that if A, B ∈ ℂ^{p×p} and AB = BA, then e^{A+B} = e^A e^B.

WARNING: In general, e^{A+B} ≠ e^A e^B.

Exercise 13.10. Exhibit a pair of matrices A, B ∈ ℂ^{p×p} such that e^{A+B} ≠ e^A e^B.

Exercise 13.11. Show that if A, B ∈ ℂ^{p×p}, then

lim_{(s,t)→(0,0)} (e^{tA} e^{sB} e^{−tA} e^{−sB} − I_p)/(st) = AB − BA.

Let F(t) = e^{tA}. Then

F(0) = I_p
and

(F(t + h) − F(t))/h = e^{tA} (e^{hA} − I_p)/h,

which tends to

e^{tA} A = A e^{tA}

as h tends to zero, thanks to the bound (13.6). Thus, the derivative

F′(t) = lim_{h→0} (F(t + h) − F(t))/h = A F(t).

The same definition is used for the derivative of any suitably smooth matrix valued function F(t) = [f_{ij}(t)] with entries f_{ij}(t) and implies that

F′(t) = [f′_{ij}(t)],

and correspondingly

∫_a^b F(s) ds = [ ∫_a^b f_{ij}(s) ds ];

i.e., differentiation and integration of a matrix valued function is carried out on each entry in the matrix separately.

Exercise 13.12. Show that if F(t) is an invertible suitably smooth p × p matrix valued function on the interval a < t < b, then

(13.7)  lim_{h→0} (F(t + h)^{-1} − F(t)^{-1})/h = −F(t)^{-1} F′(t) F(t)^{-1}  for a < t < b.

[HINT: F(t + h)^{-1} − F(t)^{-1} = F(t + h)^{-1} (F(t) − F(t + h)) F(t)^{-1}.]
Exercise 13.13. Calculate eA when A =
[~ ~].
Exercise 13.14. Calculate eA when A =
[ab ab].
[HINT: aI2 and A - aI2
commute.] Exercise 13.15. Calculate eA when A =
[~b ~].
[HINT: aI2 and A- aI2
commute.]
13.3. Systems of differential equations

In view of the preceding analysis, it should be clear that for any vector c ∈ ℂ^p, the vector function

(13.8)  x(t) = e^{(t−a)A} c

is a solution of the system

(13.9)  x′(t) = A x(t), t ≥ a, with initial conditions x(a) = c.
The advantage of this formulation is its simplicity. The disadvantage is that it is hard to see what's going on. But this is where the Jordan decomposition theorem comes to the rescue, just as before: If A = V J V^{-1} for some Jordan matrix J, then e^{tA} = V e^{tJ} V^{-1} and

(13.10)  x(t) = V e^{(t−a)J} d,  where d = V^{-1} x(a).

Note that it is not necessary to calculate V^{-1}, since only d is needed. The advantage of this new formula is that it is easy to calculate e^{tJ}: If J = diag {λ_1, …, λ_p}, then e^{tJ} = diag {e^{tλ_1}, …, e^{tλ_p}} and hence, upon writing V = [v_1 ⋯ v_p] and d^T = [d_1 ⋯ d_p],

(13.11)  x(t) = Σ_{j=1}^{p} d_j e^{(t−a)λ_j} v_j,

which exhibits the solution x(t) of the system (13.9) as a linear combination of the eigenvectors v_1, …, v_p of A with coefficients that depend upon the eigenvalues of A and vary with t. If A is not diagonalizable, then J = diag {J_1, …, J_r} and e^{tJ} = diag {e^{tJ_1}, …, e^{tJ_r}}, where each block entry J_i is a Jordan cell. Consequently, the solution x(t) of the system (13.9) is now a linear combination of generalized eigenvectors of A and it is important to understand the behavior of e^{t C_λ^{(m)}} as t tends to ∞. Fortunately, this too is relatively easy. Thus, for example, if m = 3 and N = C_λ^{(3)} − λ I_3, then

e^{t C_λ^{(3)}} = e^{tλ I_3} e^{tN} = e^{tλ} e^{tN} = e^{tλ} { I_3 + tN + (t²/2) N² } = [ e^{tλ}  t e^{tλ}  (t²/2) e^{tλ} ; 0  e^{tλ}  t e^{tλ} ; 0  0  e^{tλ} ].
The same pattern propagates for every Jordan cell:

Lemma 13.3. If N = C_λ^{(m)} − λ I_m = C_0^{(m)}, then

(13.12)  e^{t C_λ^{(m)}} = e^{tλ} e^{tN} = e^{tλ} Σ_{j=0}^{m−1} (tN)^j / j!.

Proof. The proof is easy and is left to the reader as an exercise. □

Exercise 13.16. Verify formula (13.12).
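As with (13.3), formula (13.12) is easy to check numerically. A minimal sketch in Python, assuming NumPy and SciPy and again taking the Jordan cell to be upper triangular with arbitrarily chosen λ, m and t, compares the finite sum with a general-purpose matrix exponential.

import numpy as np
from math import factorial
from scipy.linalg import expm

lam, m, t = -0.3, 3, 2.0
C = lam * np.eye(m) + np.diag(np.ones(m - 1), 1)   # C_lambda^(m)
N = C - lam * np.eye(m)

lhs = expm(t * C)
rhs = np.exp(t * lam) * sum(np.linalg.matrix_power(t * N, j) / factorial(j)
                            for j in range(m))     # formula (13.12)
print(np.allclose(lhs, rhs))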
Exercise 13.17. Show that if J = diag {λ_1, …, λ_p}, V = [v_1 ⋯ v_p] and (V^{-1})^T = [w_1 ⋯ w_p], then the solution (13.8) of the system (13.9) can be expressed in the form

x(t) = Σ_{j=1}^{p} e^{(t−a)λ_j} v_j w_j^T x(a).

Exercise 13.18. Show that if, in the setting of Exercise 13.17, |λ_1| > |λ_j| for j = 2, …, p, then

lim_{t↑∞} e^{−tλ_1} x(t) = e^{−aλ_1} v_1 w_1^T x(a).
Exercise 13.19. Give an explicit formula for e^{tA} when

A = [ 0  1  0 ; −1  0  1 ; 0  −1  0 ].

[HINT: You may use the fact that the eigenvalues of A are equal to 0, i√2 and −i√2.]

Exercise 13.20. Let A = V J V^{-1}, where

J = [ 2  1  0 ; 0  2  0 ; 0  0  3 ],  V = [v_1  v_2  v_3]  and  (V^T)^{-1} = [w_1  w_2  w_3].

Evaluate the limit of the matrix valued function e^{−3t} e^{tA} as t ↑ ∞.
13.4. Uniqueness

Formula (13.8) provides a (smooth) solution to the first-order vector differential equation (13.9). However, it remains to check that there are no others.

Lemma 13.4. The differential equation (13.9) has only one solution x(t) with continuous derivative x′(t) on the interval a ≤ t ≤ b that meets the specified initial condition at t = a.

Proof. Suppose to the contrary that there are two solutions x(t) and y(t). Then

x(t) − y(t) = ∫_a^t { x′(s) − y′(s) } ds = ∫_a^t A { x(s) − y(s) } ds.
Therefore, upon setting u(s) = x(s) − y(s) for a ≤ s ≤ b and iterating the last equality, we obtain the formula

u(t) = A^n ∫_a^t ∫_a^{s_1} ⋯ ∫_a^{s_{n−1}} u(s_n) ds_n ⋯ ds_1,

which in turn leads to the inequality

‖u(t)‖ ≤ M ‖A^n‖ (b − a)^n / n! ≤ M ‖A‖^n (b − a)^n / n!

for M = max { ‖u(t)‖ : a ≤ t ≤ b }. If n is large enough, then ‖A‖^n (b − a)^n / n! < 1 and hence

0 ≤ M ( 1 − ‖A‖^n (b − a)^n / n! ) ≤ 0.

Therefore, M = 0; i.e., there is only one smooth solution of the differential equation (13.9) that meets the given initial conditions. □

Much the same sort of analysis leads to Gronwall's inequality:

Exercise 13.21. Let h(t) be a continuous real-valued function on the interval a ≤ t ≤ b. Show that

∫_a^t h(s_2) ( ∫_a^{s_2} h(s_1) ds_1 ) ds_2 = ( ∫_a^t h(s) ds )² / 2!,

∫_a^t h(s_3) [ ∫_a^{s_3} h(s_2) ( ∫_a^{s_2} h(s_1) ds_1 ) ds_2 ] ds_3 = ( ∫_a^t h(s) ds )³ / 3!,

etc.

Exercise 13.22. (Gronwall's inequality) Let α > 0 and let u(t) and h(t) be continuous real-valued functions on the interval a ≤ t ≤ b such that

u(t) ≤ α + ∫_a^t h(s) u(s) ds  and  h(t) ≥ 0  for a ≤ t ≤ b.

Show that

u(t) ≤ α exp ( ∫_a^t h(s) ds )  for a ≤ t ≤ b.

[HINT: Iterate the inequality and exploit Exercise 13.21.]
13.5. Isometric and isospectral flows

A matrix B ∈ ℝ^{p×p} is said to be skew-symmetric if B = −B^T. Analogously, B ∈ ℂ^{p×p} is said to be skew-Hermitian if B = −B^H.

Exercise 13.23. Let B ∈ ℂ^{p×p}. Show that if B is skew-Hermitian, then e^B is unitary.
Exercise 13.24. Let F(t) = e^{tB}, where B ∈ ℝ^{p×p}. Show that F(t) is an orthogonal matrix for every t ∈ ℝ if and only if B is skew-symmetric. [HINT: If F(t) is orthogonal, then the derivative {F(t) F(t)^T}′ = 0.]

Exercise 13.25. Let B ∈ ℝ^{p×p} and let x(t), t ≥ 0, denote the solution of the differential equation x′(t) = B x(t) for t ≥ 0 that meets the initial condition x(0) = c ∈ ℝ^p.

(a) Show that (d/dt) ‖x(t)‖² = x(t)^T (B + B^T) x(t) for every t ≥ 0.

(b) Show that if B is skew-symmetric, then ‖x(t)‖ = ‖x(0)‖ for every t ≥ 0.
Exercise 13.26. Let A ∈ ℝ^{p×p} and let U(t), t ≥ 0, be a one-parameter family of p × p real matrices such that U′(t) = B(t) U(t) for t > 0 and U(0) = I_p. Show that F(t) = U(t) A U(t)^{-1} is a solution of the differential equation

(13.13)  F′(t) = B(t) F(t) − F(t) B(t)  for t ≥ 0.

Exercise 13.27. Show that if F(t) is the only smooth solution of a differential equation of the form (13.13) with suitably smooth B(t), then F(t) = U(t) F(0) U(t)^{-1} for t ≥ 0. [HINT: Consider U(t) F(0) U(t)^{-1} when U(t) is a solution of U′(t) = B(t) U(t) with U(0) = I_p.]

A pair of matrix valued functions F(t) and B(t) that are related by equation (13.13) is said to be a Lax pair, and the solution F(t) = U(t) F(0) U(t)^{-1} is said to be isospectral because its eigenvalues are independent of t.
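Both phenomena described in this section are easy to observe numerically. The sketch below assumes NumPy and SciPy; the skew-symmetric B is generated at random and B(t) is taken to be the constant B for simplicity, so U(t) = e^{tB}. It checks that e^{tB} is orthogonal and that the eigenvalues of U(t) F(0) U(t)^{-1} do not move with t.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
p = 4
M = rng.standard_normal((p, p))
B = M - M.T                          # skew-symmetric
F0 = rng.standard_normal((p, p))

for t in (0.5, 1.0, 2.0):
    U = expm(t * B)                  # solves U' = B U with U(0) = I
    # isometric: U is orthogonal
    print(np.allclose(U.T @ U, np.eye(p)))
    # isospectral: eigenvalues of U F0 U^{-1} equal those of F0
    Ft = U @ F0 @ np.linalg.inv(U)
    print(np.allclose(np.sort_complex(np.linalg.eigvals(Ft)),
                      np.sort_complex(np.linalg.eigvals(F0))))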
13.6. Second-order differential systems If A = V JV- I is a 2 x 2 matrix that is similar to a Jordan matrix J, then either J
=
[~I ~J
or
J=
[~1
;J.
In the first case, A has two linearly independent eigenvectors,
A and
[VI V2]
=
[VI V2]
[~I ~2]
VI
and
V2:
where we have set
In the second case
[e oAl t
etJ =
te tAl ] etAl
and only the first column of
v=
[VI V2]
is an eigenvector of A. Defining dl and d2 as before, we now obtain the formula U(t) = [VI V2] = dletAlVI
[e~l
t;t:ll]
[~~]
+ d2(tetAlvl + etAlv2) .
13.7. Stability

The formulas

u_n = V J^n V^{-1} u_0  and  x(t) = V e^{(t−a)J} V^{-1} x(a)

express the solutions u_n and x(t) of equations (13.1) and (13.9) as linear combinations of the eigenvectors and generalized eigenvectors of A with coefficients that depend upon n and t, respectively, and the eigenvalues. Thus, the "dynamic" behavior depends essentially upon the magnitudes |λ_j|, j = 1, …, p, in the first case and the real parts of λ_j, j = 1, …, p, in the second:

‖u_n‖ ≤ ‖V‖ ‖J^n‖ ‖V^{-1} u_0‖

and, similarly,

‖x(t)‖ ≤ ‖V‖ ‖e^{(t−a)J}‖ ‖V^{-1} x(a)‖.

These bounds are particularly transparent when J = diag {λ_1, …, λ_p}, because then

‖J^n‖ = α^n  and  ‖e^{(t−a)J}‖ = e^{(t−a)β/2},

where

α = max { |λ_j| : j = 1, …, p }  and  β = max { λ_j + λ̄_j : j = 1, …, p }.

In particular,

(a) J diagonal (or not) and α < 1 ⟹ lim_{n↑∞} ‖u_n‖ = 0.

(b) J diagonal and α ≤ 1 ⟹ ‖u_n‖ is bounded.

(c) J diagonal (or not) and β < 0 ⟹ lim_{t→∞} ‖x(t)‖ = 0.
(d) J diagonal and β ≤ 0 ⟹ ‖x(t)‖ is bounded for t > 0.

Exercise 13.28. Show by example that item (b) in the list just above is not necessarily correct if the assumption that J is a diagonal matrix is dropped.

Exercise 13.29. Show by example that item (d) in the list just above is not necessarily correct if the assumption that J is a diagonal matrix is dropped.
13.8. Nonhomogeneous differential systems

In this section we shall consider nonhomogeneous differential systems, i.e., systems of the form

x′(t) = A x(t) + g(t),  α ≤ t < β,

where A ∈ ℝ^{n×n} and g(t) is a continuous n × 1 real vector valued function on the interval α ≤ t < β. Then, since

x′(t) − A x(t) = e^{tA} (e^{−tA} x(t))′,

it is readily seen that the given system can be reexpressed as

(e^{−sA} x(s))′ = e^{−sA} g(s)

and hence, upon integrating both sides from α to a point t ∈ (α, β), that

e^{−tA} x(t) − e^{−αA} x(α) = ∫_α^t (e^{−sA} x(s))′ ds = ∫_α^t e^{−sA} g(s) ds

or, equivalently, that

(13.14)  x(t) = e^{(t−α)A} x(α) + ∫_α^t e^{(t−s)A} g(s) ds  for α ≤ t < β.
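Formula (13.14) can be checked against a direct numerical integration of the system. The following sketch assumes NumPy and SciPy; the matrix A, the forcing g and the initial vector are arbitrary choices made only for illustration. The integral in (13.14) is evaluated with the trapezoidal rule and the result is compared with an initial value solver.

import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp, trapezoid

A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
g = lambda t: np.array([np.sin(t), 1.0])
x_alpha, alpha, t_end = np.array([1.0, 0.0]), 0.0, 2.0

# right-hand side of (13.14), with the integral evaluated on a fine grid
s = np.linspace(alpha, t_end, 2001)
vals = np.array([expm((t_end - si) * A) @ g(si) for si in s])
x_formula = expm((t_end - alpha) * A) @ x_alpha + trapezoid(vals, s, axis=0)

# reference: integrate x'(t) = A x(t) + g(t) directly
sol = solve_ivp(lambda t, x: A @ x + g(t), (alpha, t_end), x_alpha,
                rtol=1e-10, atol=1e-12)
print(np.allclose(x_formula, sol.y[:, -1], atol=1e-5))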
13.9. Strategy for equations

To this point we have shown how to exploit the Jordan decomposition of a matrix in order to study the solutions of a first-order vector difference equation and a first-order vector differential equation. The next item of business is to study higher order scalar difference equations and higher order scalar differential equations. In both cases the strategy is to identify the solution with a particular coordinate of the solution of a first-order vector equation. This will lead to vector equations of the form u_{k+1} = A u_k and x′(t) = A x(t), respectively. However, now A will be a companion matrix and hence Theorem 5.11 supplies an explicit formula for det (λ I_n − A), which is simply related to the scalar difference/differential equation under consideration. Moreover, A is similar to a Jordan matrix with only one Jordan cell for each distinct eigenvalue. Consequently, it is possible to develop an algorithm for writing down the solution, as will be noted in subsequent sections.
Exercise 13.30. Show that if A is a companion matrix, then, in the notation of Theorem 5.11,

(13.15)  A is invertible ⟺ λ_1 ⋯ λ_k ≠ 0 ⟺ a_0 ≠ 0.
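The strategy just described is short enough to spell out in code. The sketch below is a hypothetical Python/NumPy illustration with an arbitrarily chosen third-order recursion, not tied to any particular exercise: it builds the companion matrix of x_{k+3} = c_1 x_{k+2} + c_2 x_{k+1} + c_3 x_k and recovers x_n as a coordinate of u_n = A^n u_0, exactly as in the embedding used in the next sections.

import numpy as np

# x_{k+3} = c1 x_{k+2} + c2 x_{k+1} + c3 x_k, with c3 != 0
c1, c2, c3 = 1.0, 0.5, -0.25
x0, x1, x2 = 1.0, 0.0, 2.0

# companion matrix of the recursion, acting on u_k = (x_k, x_{k+1}, x_{k+2})
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [c3,  c2,  c1]])
u = np.array([x0, x1, x2])

xs = [x0, x1, x2]
for _ in range(10):
    u = A @ u                      # u_{k+1} = A u_k
    xs.append(u[-1])               # the last coordinate is the newest term

# direct check against the recursion itself
ys = [x0, x1, x2]
for k in range(10):
    ys.append(c1 * ys[-1] + c2 * ys[-2] + c3 * ys[-3])
print(np.allclose(xs, ys))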
13.10. Second-order difference equations

To warm up, we shall begin with the second-order difference equation

(13.16)  x_n = a x_{n−1} + b x_{n−2},  n = 2, 3, …,  with b ≠ 0,

where a and b are fixed and x_0 and x_1 are given. The objective is to obtain a formula for x_n and, if possible, to understand how x_n behaves as n ↑ ∞. We shall solve this second-order difference equation by embedding it into a first-order vector equation as follows: First observe that

x_{n−1} = [0  1] [ x_{n−2} ; x_{n−1} ]

and then, to fill out the left-hand side, add the row x_n = [b  a] [ x_{n−2} ; x_{n−1} ] to get

[ x_{n−1} ; x_n ] = [ 0  1 ; b  a ] [ x_{n−2} ; x_{n−1} ],  n = 2, 3, ….

Thus, upon setting

u_0 = [ x_0 ; x_1 ],  u_1 = [ x_1 ; x_2 ],  …,  u_n = [ x_n ; x_{n+1} ]

and

(13.17)  A = [ 0  1 ; b  a ],

we obtain the sequence u_1 = A u_0, u_2 = A u_1, …, i.e.,

u_n = A^n u_0.

Since A is a companion matrix, Theorem 5.11 implies that

det (λ I_2 − A) = λ² − aλ − b

and hence the eigenvalues of A are

λ_1 = (a + √(a² + 4b))/2  and  λ_2 = (a − √(a² + 4b))/2.
287
Therefore, A is similar to a Jordan matrix of the form J =
[~l ~J
if Al
# A2
J=
and
[~l
{I]
if Al = A2.
Moreover, since b # 0 by assumption, the formula
(A - A1)(A - A2) Case 1 (AI
=
# O.
A2 - aA - b ==? AIA2
# A2): 0] V-I
(13.18)
A~
uo·
Consequently, (13.19)
Xn
= [1 0] V
[A~0
0]
A~ V
-1
Uo .
However, it is not necessary to calculate V and V-I. It suffices to note that formula (13.19) guarantees that Xn must be of the form Xn
= aAl + {3X2 (AI # A2)
and then to solve for a and {3 from the given "initial conditions" Xo and
Xl.
Example 13.5. Xn = 3Xn -1
+ 4Xn-2,
n = 2,3, ... ,
Xo = 5 and Xl = O.
Discussion. The roots of the equation A2-3A-4 are Al = 4 and A2 =-1. Therefore, the solution Xn must be of the form Xn
= a4n + {3( _1)n, n = 0, 1, ....
The initial condition
= 5 ==? a + {3 =
Xo
5,
whereas, the initial condition Xl
= 0 ==? 4a - {3 = O.
Thus, we see that a = 1, {3 = 4, and hence the solution is xn=4n+4(-lt
for
n=O,I, ....
Case 2 (AI = A2): Un
n = A Uo
n V-I = V [A~ nA~-I] V-I = V [AlI] 0 Al Uo 0 A~ no .
Consequently Xn
=
nA~-I] V-I [0 1] V [ A~ 0 An Uo 1
288
13. Difference equations and differential equations
must be of the form
Xn = aAl + j3nAl . Notice that since Al ::/= 0, a (positive or negative) power of Al can be absorbed into the constant 13 in the last formula for X n . Example 13.6.
Xn = 2Xn-l - Xn-2 Xo
=3
for
and Xl
n = 2,3, ...
= 5.
The equation A2 - 2A + 1 = 0 has two equal roots:
Discussion.
Al = A2 = 1. Therefore,
Xn = a(l)n + j3n(l)n = a Substituting the initial conditions Xo = a = 3 and we see that
13 =
+ j3n.
XI = 3 + 13 = 5 ,
2 and hence that
Xn = 3 + 2n for n = 0,1, .... We are thus led to the following recipe: The solution of the second-order difference equation
Xn
= aXn-1 + bXn-2,
n
= 2,3, ...
,
with b::/= 0,
Xo = c and XI = d may be obtained as follows:
(1) Solve for the roots AI, A2 of the quadratic equation A2 = aA + b and note that the factorization
(A - AI)(A - A2) = A2 - aA - b implies that AIA2 = b ::/= O. (2) Express the solution as
Xn = { aAf + j3A~
if A1::/= A2
aAf + j3nAf if Al = A2 for some choice of a and 13. (3) Solve for a and j3 by invoking the initial conditions: C
= Xo = a + j3 and
C
=
d = XI
= aAI + j3A2 if AI::/= A2
{
Xo = a
and d = XI
= aAI + j3AI
if Al
= A2
13.11. Higher order difference equations
289
Exercise 13.31. Find an explicit formula for X n , for n = 0,1, ... , given that Xo = -1, Xl = 2 and Xk+l = 3Xk - 2Xk-l for k = 1,2, ... . Exercise 13.32. The Fibonacci sequence X n , n = 0, 1, ... , is prescribed by the initial conditions Xo = 1, Xl = 1 and the difference equation xn+l = Xn + Xn-l for n = 1,2, .... Find an explicit formula for Xn and use it to calculate the golden mean, limnjoo xn/xn+l'
13.11. Higher order difference equations Similar considerations apply to higher order difference equations. The solution to the p'th order equation (13.20) x n+p
= ClXn+p-l + C2Xn+p-2 + ... + cpxn, n = 0, 1, ... , with
Cp
=I 0
and given initial conditions Xo, Xl, ..• ,Xp _ 1, can be obtained from the solution to the first-order vector equation Un =
AUn-l
for n=p,p+l, ...
where
(13.21)
Un =
[xn~p+ll . Xn-l
0 0
and
1 0
0 1
0 0
A=
Xn
0
0
0
1
Cp
Cp-l
Cp-2
Cl
The nature of the solution will depend on the eigenvalues of the matrix A. A convenient recipe for obtaining the solution of equation (13.20) is: (1) Find the roots of the polynomial AP - ClAP - l - ... - Cp. (2) If AP - ClAP-l - ... - cp = (A - Al)'1:1 •.. (A - AkY:l
= LPj(n)Aj , where Pj is a polynomial of degree a-j -1. j=l
(3) Invoke the initial conditions to solve for the coefficients of the polynomials Pj' Discussion.
The algorithm works because A is a companion matrix. Thus, det (Alp - A) = AP -
and hence, if
CIAP - 1 - ... - Cp
13. Difference equations and differential equations
290
with distinct roots AI, ... ,Ak, then A is similar to the Jordan matrix
· {C(c::tI) C(c::t k )} J = d lag A1' . .., Ak ' with one Jordan cell for each distinct eigenvalue. Therefore, the solution must be of the form indicated in (2).
Remark 13.7. The equation AP - CIAP-l - ... - Cp = 0 may be obtained with minimum thought by letting Xj = Aj in equation (13.20) and then factoring out the highest common power of A. Exercise 13.33. Find the solution of the third-order difference equation =
X n +3
3Xn +2
-
3Xn +1
subject to the initial conditions (x - 1)3 = x3 - 3x2 + 3x-1.]
+ Xn
,
n = 0,1, ...
= 1, Xl = 2 and
Xo
X2
8.
[HINT:
13.12. Ordinary differential equations Ordinary differential equations with constant coefficients can be solved by imbedding them in first-order vector differential equations and exploiting the theory developed in Section 13.3. Thus, for example, to solve the secondorder differential equation
x"(t) = ax'(t) + bx(t), t
(13.22)
~ 0
with initial conditions
x(O) =
C
and x'(O)
= d,
introduce the new variables
UI(t) = x(t) and U2(t) = x'(t). Then and U~(t) =
x"(t) = aU2(t) + bUI(t).
Consequently, the vector
[~~~g]
u(t) =
is a solution of the first-order vector equation
u'(t)
=
[u~(t)] = u~(t)
= Au(t),
t
~
[0b a1] [UI(t)] U2(t) 0,
13.12. Ordinary differential equations
291
with
[~
A=
!]
u(O) =
and
[~].
Thus,
u(t) = etA
[~]
and
x(t) =
[1 0] etA [~]
.
Let AI. A2 denote the roots of A2 - aA - b. Then, since A is a companion matrix, there are only two possible Jordan forms: J
=
[~1 ~2]
Case 1 (AI
i= A2
if Al
i= A2): tJ _
e -
[~
and J =
[e-Xlt 0
;J
if Al
= A2 .
0]
eA2t
and hence the solution x(t) of equation (13.22) must be of the form
x(t)
=
ae Alt + {3e A2t ,
for some choice of the constants a and {3.
tJ _
e
-
[e-XI t teAl t] 0 eAlt
and hence the solution x(t) of the equation must be of the form
x(t) = ae-Xlt + {3te Alt . In both cases, the constants a and {3 are determined by the initial conditions. The recipe for solving a p'th order differential equation (13.23) with constant coefficients that is subject to the constraint ap initial conditions x(a)
= C1, ...
,x(P-1)(a)
f: 0 and to the
= Cp
is similar: (1) Find the roots of the polynomial AP
-
(a1AP-1
+ ... + ap ).
13. Difference equations and differential equations
292
+ ... + ap) = (>.. - >"d
(>.. - >"k)Qk with k distinct roots >"1, ... ,>"k, then the solution x(t) to the given equation is of the form x(t) = e(t-a)AI P1 (t) + ... + e(t-a)Akpk(t) ,
(2) If >"P - (a1>..p-1
Q1 •••
where pj(t) is a polynomial of degree OJ - 1 for j = 1, ... ,k.
(3) Find the coefficients of the polynomials pj(t) by imposing the initial conditions. Discussion.
Let
Then
u'(t) = Au(t) for t ~ a, where 0 0
1
0
0
1
0 0
and
A= 0
0
0
1
ap
tlp-l
ap-2
al
u(a)
~ ~ c
[:]
Thus,
u(t) = e(t-a)A c and x(t) = [1 0 .. .
0] u(t)
for
t
~
a.
The special form of the solution indicated in (2) follows from the fact that A is a companion matrix and hence is similar to the Jordan matrix
C(Qk)} . · {C(Ql) J = dlag AI' ••• , Ak
o Remark 13.8. The equation >"P - al>..p-l - ... - ap = 0 may be obtained with minimum thought in this setting too by letting x(t) = eAt in equation (13.23) and then factoring out the term eAt. Example 13.9. The recipe for solving the third-order differential equation
x"'(t) = ax"(t) + bx'(t) + cx(t), t ~ 0 and c =1= 0, is: (1) Solve for the roots >"1, >"2, >"3 of the polynomial >..3 - a>..2 - b>" - c.
13.13. Wronskians
293
(2) The solution is
x(t) = o:e A1t + /3e A2t + ,eA3t if AI. A2, A3 are all different,
= o:e A1t + /3te A1t + ,eA3t if Al = A2#= A3, x(t) = o:e A1t + /3te A1t + 2eA1t if Al = A2 = A3. x(t)
,t
(3) Determine the constants and xl/(O).
0:,
/3"
from the initial conditions x(O), x'(O)
Exercise 13.34. Find the solution of the third-order differential equation
= 3X(2)(t) -
x(3)(t)
3x(I)(t) +x(t) , t 2:: 0,
subject to the initial conditions x(O)
=1,
Exercise 13.35. Let u'(t) =
x(l)(O) = 2 , x(2)(0) = 8 .
[~ ~] u(t)
for
t 2:: O. Show in two dif-
ferent ways that Ilu(t)1I2 = Ilu(0)1I2 if 0: + a = 0: first by showing that the derivative of Ilu(t)112 with respect to t is constant and then by invoking Exercise 13.23. Exercise 13.36. In the setting of Exercise 13.35, describe Ilu(t)1I2 as t if 0: + a #= O. Exercise 13.37. Evaluate limttoo equation
y(t) =
[~
i
00
r 2 e- 2t y(t) for the solution yet) of the
-t n
y(t), t
~ 0, when y(O) = [
i]
13.13. Wronskians To this point we have considered only differential equations with constant coefficients. A significant number of applications involve differential equations with coefficients that also depend upon the independent variable, i.e., equations of the form (13.24) ap(t)x(P)(t)
+ ap_l(t)x(P-l)(t) + ... + al(t)x(l)(t) + ao(t)x(t) =
get)
on either a finite or infinite subinterval of R. Although we shan consider only second-order differential equations in the sequel, it is instructive to begin in the more general setting of p'th order differential equations.
13. Difference equations and differential equations
294
Lemma 13.10. Let Ul(t), ... ,up(t) be solutions of the homogeneous equa-
tion ap(t)x(p)(t)+ap_l(t)x(p-l)(t)+ .. '+al(t)x(l)(t)+ao(t)x(t) = 0, a ~ t ~ (3, in which the coefficients are assumed to be continuous real-valued functions on a finite interval a ~ t ~ (3 with ap(t) > 0 on this interval. Let
(13.25)
Then
Discussion.
=
Let p = 3. Then
[u(') (t)
..,\') (t)
u\') (t)]
det u~l)(t) u~l)(t) u~l)(t) u~2)(t) u~2)(t) u~2)(t)
U2(t) U3(t) ] [U'(t) 2 +det ui )(t) u~2)(t) u~2) (t) ui2)(t) u~2) (t) u~2)(t)
[ u,(t) U2(t) U3(t) ] + u~l)(t) u~l)(t) u~l)(t) u~3)(t) u~3)(t)
u~3)(t)
-
[ u,(t) u,(t) U3(t) ] 0+0+ ui1)(t) u~l)(t) u~l)(t) ui3)(t) u~3)(t) u~3) (t)
=
_ a2(t) (t)
a3(t)
,
since
a3(t)u)3)(t) + a2(t)u;2)(t) + al (t)u;l) (t) + ao(t)uj(t) = 0 for j = 1,2,3. But this leads easily to the stated conclusion. It is clear that the same argument is applicable for general p. The function
13.14. Variation of parameters
295
13.14. Variation of parameters The method of variation of parameters provides a solution to a nonhomogeneous equation of the form (13.24) in terms of linear combinations of the solutions to the corresponding homogeneous equation, in which the coefficients are permitted to depend on the independent variable t. We shall illustrate the method for the second-order differential equation
a(t)y"(t)
(13.26)
+ b(t)y'(t) + c(t)y(t) = g(t).
Let
U(t) = dl(t)UI(t) + d2(t)U2(t) be a linear combination of solutions Ul(t) and U2(t) to the homogeneous
equation
a(t)y"(t) + b(t)y'(t) + c(t)y(t) = 0,
(13.27)
with coefficients dI (t) and d2 (t) that are allowed to vary with the independent variable t. To explore this idea, note that
u'(t) = dIU~
+ d2U~ + ~Ul + d~U2'
and hence upon choosing dl(t) and d 2 (t) so that d~UI
on the interval a ::; t
+ d~U2 =
0
< /3, it follows that
u"(t)
= dIU~
+ d2U~ + ~u~ + d~u~.
Therefore
au" + bu' + cu =
dl (au~ + bu~ + CUI) + d2(au~ + bu~ +a(d~u~ a(d~u~
+ CU2)
+ d~u~)
+ d~u~).
Thus, the problem of interest reduces to finding coefficients d l (t) and d2 (t) such that
+ d~U2 - 0 a(d~u~ + d~u~) - get) d~UI
or, equivalently,
UI'1 [ pu
pUU~2] [ d~d~ ] --
[ a-Ipg 0 ]
with
p(t) = exp
{It ab((ss))dS} . ~
The extra factor p( t) has been introduced in the array of equations in order to take advantage of Lemma 13.10, which guarantees that the determinant of the 2 x 2 matrix on the left is equal to a constant: det [uI,
pU I
u 2,] = cp(t)p(t) = cp(a). pU2
13. Difference equations and differential equations
296
Exercise 13.39. Verify that the function u(t) specified in formula (13.28) is a solution of the differential equation (13.26) for every choice of the constants d1(a) and d2(a). [HINT: The formulas
(13.29)
d dt
Jar f(s)ds = t
f(t)
and
d dt
Jtr
f3
f(s)ds = - f(t)
may be usefu1.] Exercise 13.40. Show that if the arbitrary constants in formula (13.28) are specified as d2(a) = 0 and d1(a) = ,),-1 a(s)-1p(S)U2(S)g(s)ds, then the solution can be expressed as
J:
u(t) =
lf3 G(t, s) Q
P((S))9(S)dS, ')'a S
where
G(t s) = {U1(t)U2(S) if a:::; t:::; s:::; (3 , U1(S)U2(t) if a:::; s:::; t:::; (3 (The kernel G(t, s) is called the Green function of the problem.) Exercise 13.41. Use the formulas in Exercise 13.40 to show that for any choice of a E JR and b E JR, there exist a pair of constants K1 and K2 such that au(a}+btt'(a} = Kl(aul(a)+bu~(a)) and au((3)+bu'((3) = K2( au2((3) + bu~((3)).
Chapter 14
Vector valued functions
In my experience, those people who think they know all the answers, don't know all the questions. A Chinese proverb puts it well: Trust only those who doubt.
In this chapter we shall discuss vector valued functions of one and many variables and some of their applications. We begin with some notation for classes of functions with different degrees of smoothness that will prove convenient. Let Q be an open subset of IR n and let Q denote the closure of Q. A function f that maps Q into IR is said to belong to the class
C(Q) if f is continuous on Q, C(Q) if f is continuous on Q, Ck(Q) for some positive integer k if f and all its partial derivatives of order up to and including k are continuous on Q, Ck(Q) for some positive integer k if f E C(Q) and f and all its partial derivatives of order up to and including k extend continuously to Q. A vector valued function f from Q or Q into IR m is said to belong to one of the four classes listed above if all its components belong to that class. Moreover, on occasion, f is said to be smooth if it belongs to Ck(Q) for k large enough for the application at hand. The notation Br{a) and Br{a) for balls of radius r > 0 that was introduced in (7.15) will be useful. • The warnings posted in the preceding chapters are still in effect.
-
297
298
14. Vector valued functions
14.1. Mean value theorems We begin with the classical mean value theorem for real-valued functions f (x) of one variable x that are defined on the closed interval
[a, b]
= {x E lR : a ~ x ~ b}.
The proof can be found in many textbooks (see e.g. [4]) and will not be given here. Theorem 14.1. Let f(x) be a continuous real-valued function on the finite
closed interval [a, b] and suppose that the derivative f'(x) exists for each point x in the open interval (a, b)
= {x E lR : a < x < b} .
Then f(b) - f(a) = f'(c)(b - a)
(14.1)
for some point c E (a, b). We turn next to the generalized mean value theorem. Theorem 14.2. Let f(x) and g(x) be continuous real-valued functions on
the finite closed interval [a, b] and suppose that the derivatives f' (x) and g'(x) exist for each point x in the open interval (a, b). Then (14.2)
{f(b) - f(a)}g'(c) = {g(b) - g(a)}f'(c)
for some point c E (a, b). Proof.
Let
h(x) = f(x){g(b) - g(a)} - g(x){f(b) - f(a)}. Then, by Theorem 14.1,
h(b) - h(a) = h'(c)(b - a) for some point c E (a, b). However, since
h(b) - h(a) = 0 and b > a, this implies that
h'(c) = 0 for some point c E (a, b). But that is the same as the asserted statement.
D
Exercise 14.1. Let p(x) = ao + alX + ... + anx n be a polynomial of degree n with n distinct real roots al < ... < an, where n ~ 2. Show that p'(x) has n - 1 real roots {31 < ... < {3n-l such that aj < {3j < {3j+1 for j = 1, ... ,n-1. Exercise 14.2. Use the mean value theorem to show that if b then v'ab - a ~ (b - a)/2.
>
a
> 0,
299
14.2. Taylor's formula with remainder
14.2. Taylor's formula with remainder Theorem 14.3. Let f E Cn- 1 ([a, b]) on the finite closed interval [a, b] and suppose that the n'th order derivative f(n}(x) exists for each point x in the open interval (a, b) . Then
f(b) = f(a)
(14.3)
n-l
+ '" f(k}(a) L-
k=l
()k b- a k!
+ f(n} (c)
()n b- a n!
for some point c E (a, b). Proof. Let 9 E Cn - 1 ([a, b]) on the finite closed interval [a, b] be such that the n'th order derivative g(n) (x) exists for each point x in the open interval (a, b) and let n-l j(k)( )
cp(x) = f(x)
+L
k! x (b - x)k
k=l
and n-l
(k)( )
"p(x)=g(x)+L g k!x (b-x)k. k=l
Then cp(x) and "p(x) meet the hypotheses of the generalized mean value theorem, Theorem 14.2. Therefore, by that theorem,
{cp(b) - cp(a)}"p'(c) = {"p(b) - "p(a)}cp'(c) for some point c E (a, b). Thus, as
cp(b) = f(b) , "p(b) = g(b) and, by a short calculation,
cp'(x) =
f (n)( x ) (b _ xt- 1 and "p'(x) =
(n - I)!
9
(n)( )
x (b - xt- 1 ,
(n - I)!
we see that
{f(b) - cp(a)}g(n) (c) = {g(b) - "p(a)} f(n) (c) for some point
C
E
(a, b). To complete the proof, let
g(x) = (x - a)n. Then, as
o
for k = 0, ... ,n - 1 ,
n! for every point x and
g(b)
(b - a)n,
E lR.
14. Vector valued functions
300
the last formula reduces to
{f(b) - rp(a)}n! = (b - a)nf(n) (c) ,
o
which is equivalent to formula (14.3).
14.3. Application of Taylor's formula with remainder Formula (14.3) is useful for calculating f(b) from f(a) and its derivatives f(l)(a), f(2)(a), ... , when b is close to a. Thus, upon setting b = a + h, we can reexpress formula (14.3) as
f(a
(14.4)
+ h) -
{
f(a)
hk}
n-l
+ "" f(k)(a)- = ~ k! k=l
hn f(n)(c)n!
and use the right-hand side to estimate the difference between the true value of f(a + h) and the approximant: n-l
f(a)
+L
hk f(k)(a) k!
k=l
Thus, for example, in order to calculate (27.1)5/3 to an accuracy of 1/lDO, let f(x) = x 5/ 3 ,a = 27 and b = 27.1. Then as the formula
feb) = f(a)
+ f'(a)(b -
a)
+ f"(c) (b ~!a)2
translates to (27.1)5/3 _ {(27)5/3 + ~(27)2/3~} = lD c- 1/ 3_ 1_. 3 10 9 200 '
that is
(27.1)5/3 _ {35 + ~}\ = c- 1/ 3 \ 2 180 ' for seme number c that lies between 27 and 27.1. In particular, this constraint implies that c> 27 and hence that c- 1/ 3 < Consequently
1.
(271)5/3 _ {35 \ .
+ ~}\ < 2
-
(1/3) 180
= _1
540'
Thus, the error in approximating (27.1)5/3 by 35 + ~ is less than 1/(540). Exercise 14.3. Show that the error in approximating (27.1)5/3 by (27)5/3 is bigger than 3/2.
14.5. Theorems for vector valued functions of several variables
301
14.4. Mean value theorem for functions of several variables Let f(x) = f(Xl,'" ,xn ) be a real-valued function of the vector x with components Xl, ... ,Xn and suppose that the partial derivatives (x) exist 1 in some region, say al < Xl < bl , ... , an < Xn < bn . Then we shall write
if
(14.5)
(V f)(x) = [M,(x)
for the 1 x n row vector with entries -3!;(x), j 1 (V f)(x) is termed the gradient of f.
=
1, ... ,n. The vector
Theorem 14.4. Let Q = {x E ]Rn : al < Xl < bl , ... ,an < Xn < bn } and let f(x) = f(xI, ... ,xn ) be a continuous real-valued function of the variables Xl,. .. ,Xn in the bounded closed region Q and suppose that the partial derivatives (Xl, . .. ,Xn ) exist for each point (X I, . . . ,xn ) in the J open region Q. Then
-sf.
(14.6)
f(b) - f(a) = (Vf)(c)(b - a)
for some point c = a between a and b.
+ to(b -
Proof.
a), 0 < to < 1, on the open line segment
Let
h(t)
= f(a+t(b-a)) f(XI(t), ... ,xn(t)) ,
where
Xj(t)
= aj + t(bj - aj)
for 0 ~ t ~ L Then clearly h(t) is continuous on the interval [0,1]' and the derivative
h'(t)
=
of L ax' (a+t(b - a))(bj - aj) n
j=l
J
= (V f) (a + t(b - a)) (b - a) exists for each point t in the open interval (0,1). Therefore, by Theorem 14.1, there exists a point to E (0,1) such that h(l) - h(O) = h'(to). But, in view of the preceding calculation, this is easily seen to be the same as formula 0 (14.6) with c = a + to(b - a).
14.5. Mean value theorems for vector valued functions of several variables We turn now to vector valued functions
302
14. Vector valued functions
of several variables. We assume that each of the components Ji(X), i = 1, ... ,p, of f(x) is real- valued. Thus f(x) defines a mapping from some subset of IR q into IR p. Theorem 14.5. Assume that each oj the components Ji(x), i = 1, ... ,p, oJf(x) is a continuous real-valued function oj the variables Xl, ... ,Xq in the bounded closed region al $ Xl $ b l , ... ,aq $ Xq $ bq and that the partial derivatives
afi afi -a (x) = -a (Xl. ... Xj Xj exist Jor each point Xq < bq • Then
(Xl. •.• ,Xq)
,Xq)
in the open region
al
< Xl <
bl, ...
,aq <
(14.7)
for some set of points Ci =
a + ti(b - a), 0 < ti < 1, i = 1, ... ,po
Proof. This is an immediate consequence of Theorem 14.4, applied to each 0 component !i(x) of f{x) separately. Corollary 14.6. In the setting of Theorem 14.5,
for some set of points Proof.
CI, ...
,cp in the open line segment between a and b.
By definition, p
IIf(b) - f(a)112 =
L {Ji(b) -
fi(a)}2 .
i=l
Moreover, by Theorem 14.5 and the Cauchy-Schwarz inequality,
Ifi(b) -Ji(a)1 = I(b - a, V'fi(Ci?) I $lIb-allllV'!i(ci?1I
= lib - al!l!V'Ji(ci)lI· But this is easily seen to be equivalent to the asserted statement.
0
14.5. Theorems for vector valued functions of several variables
303
Theorem 14.7. Let
h(Xl!':' f(x) = [
,X q )]
: fp(xI, ... ,Xq)
be a continuous map from the bounded region al ~ Xl ~ bb ... ,aq ~ Xq ~ bq into JRP such that all the partial derivatives ~(x), j = 1, ... ,p, k = 1, ... ,q, exist for al < Xl < bl,'" ,ak < Xq < bk and let
[
~(X)
..
~(X)l
Jf(X) = :
:
~(x) OXI
~(x) oX q
denote the Jacobian matrix of the mapping f. Then
Ilf(b) - f(a) II ~ II Jf(C) II lib - all for some point c on the open line segment between a and b.
Proof.
Let
u = f(b) - f(a) and let h(t) = uTf(a + t(b - a)).
Then, by the classical mean value theorem, h(l) - h(O)
= h'(to)(1- 0)
for some point to E (0,1). But now as h(1) - h(O)
= uTf(b) - uTf(a) = lIull 2
and
d
h'(t)
p
L ujfj(a + t(b - a)) t . = L L aar (a+ t(b - a))(bk -
= -d
J=
1
P
q
Uj
j=l
k=l
J Xk
= (Jf(a + t(b - a))(b - a), u),
the mean value theorem yields the formula
lIull 2 = (Jf(c)(b - a), u) for some point c= a
+ to(b -
a)
ak)
14. Vector valued functions
304
on the open line segment between a and b. Thus, by the Cauchy-Schwarz inequality, IIuII 2 ~ II Jf(c)(b - a)llllull, which leads easily to the advertised result. o
14.6. Newton's method Newton's method is an iterative scheme for solving equations of the form f(x) = 0 for vector valued functions f E C2 (Q) that map an open subset Q of llV into lR P . The underlying idea is that if Xo is close to a point UO at which f(uO) = 0 and the Jacobian matrix Jf(UO) is invertible, then, since 0= f(uO)
~
f(xo)
+ Jf(xo)(uO -
xo)
and Jf(xO) is invertible, the point UO should be close to Xl = Xo - Jf(xO)-lf(xo)
and even closer to etc.
Theorem 14.8. LetQ be a nonempty open subset oflR P , letfE C1(Q) map Q into lR P and suppose that there exists a point UO E Q such that
(1) f(uO) = O. (2) The Jacobian matrix
Jf(X) =
?k(X) ox!
M:(X) P
is invertible at UO .
(3) There exists a pair of numbers a ball Bp(uO) C Q and IIJf(a) - Jf(b)II ~ alia - bll
> 0 and p > 0 such that the open for a, b E Bp(uO).
f3 > 0 and 6 > 0 such that is invertible and IIJf(X)-11l ~ f3 for X E Bo(uO).
Then there exists a pair of numbers
(14.8)
Jf(X)
Moreover, if
(14.9)
Xi+! = Xi - Jf(Xi)-lf(Xi) ,
i = 0, 1, ... ,
and Xi E Bo(uO) ,
14.6. Newton's method
305
then (14.10) Proof. Suppose that the vector Xi belongs to the open ball Bo(uO) for some choice of & in the interval 0 < &:s: p such that the conditions in (14.8) are in force and let Then Xi+! - UO = Xi - UO - Jf(xd- 1{f(Xi) - f(uO)} , since f(uO) = O. Let
h(s) = f(Xi
+ s(UO -
xd)
for
O:S: s :s: 1.
Then h(1) - h(O) = 11 h'(s)ds
f(uO) - f(Xi) =
11 (Jf)(Xi
+ S(UO - xi))ds(uO -
Xi).
Thus, (14.11) Xi+! - UO = Jf(Xi)-1
r {Jf(Xi + s(UO - Xi)) - Jf(Xi)}ds(uO - Xi) .10 i
and hence by (3),
II Jf(Xi)-1 11 {Jf(Xi + S(UO -
Ilxi+! - u011
+ S(UO -
<
0:(3 11 IIXi
<
0:(3 11 sdslluO - xil1 2 ,
Xi)) - Jf(Xi)}ds(uO - xi)11
Xi) - xilldsll u O- Xiii
o
which coincides with (14.10).
Corollary 14.9. If, in the setting of Theorem 14.8, &1 = min{&,2j(0:,B)} and Xo E BOl (UO), then XiEBol(UO) Proof.
and
Iluo-Xi+111:S:lluo-Xill
The proof is left to the reader.
Exercise 14.4. Verify Corollary 14.9.
for
i=1,2, ....
0
306
14. Vector valued functions
Exercise 14.5. Show that the Newton step (14.11) for solving the equation X2 - a = 0 to find the square roots of a > 0 is
xn+1 =
~ (xn + xan )
if Xn
and calculate Xl, X2, X3 when a = 4 and Xo
i- 0 ,
= ± l.
Exercise 14.6. Show that in the setting of Exercise 14.5
1
IIXn+1- xnll :::; 211x~
I
1IIIxn
2
-
xn-IiI .
Exercise 14.7. Show that the Newton step (14.11) for solving the equation x3 - a = 0 to find the cube roots of a is
xn+1 = and calculate
XI, X2, X3
~ (2xn + xa~)
if Xn
i- 0 ,
when a = 8 and Xo = ±l.
14.7. A contractive fixed point theorem Theorem 14.10. Let f{x) be a continuous map of a closed subset E of a normed linear space X over IF into itself such that IIf(b) - f{a) II :::; Kllb - all
for some constant K,O Then:
< K < 1, and every pair of points a, b in the set E.
(1) There is exactly one point x* E E such that f(x*) = x*. (2) If Xo E E and X n +1 = f(x n ) for n = 0, 1, ... , then x*
=
lim X n ;
nioo
i.e., the limit exists and is independent of how the initial point Xo is chosen and (3) IIx* -
Xnll :::;
Kn
1- K"xI - xoll·
Proof. Choose any point Xo E E and then define the sequence of points XI,X2, ... by the rule Then clearly IIX2 - xIiI = IIf(xI) - f(xo) II
II X3 -
:::; KlixI -
xoll
x211 = IIf(x2) - f(xdll :::; Kllx2 - xIII :::; K211xI - xoll
14.7. A contractive fixed point theorem
307
and hence
Ilxn+k - xnll :s; Ilxn+k - xn+k-IiI + ... + IIxn+1 - xnll :s; (Kn+k-l + ... + Kn)llxl - xoll Kn
:s; 1_K"xl-xol. Therefore, since K n tends to 0 as n i 00, this last bound guarantees that the sequence {xn} is a Cauchy sequence in the closed subset E of X. Thus, Xn+k converges to a limit X* in E as k i 00, which justifies the inequality in (3). Moreover,
IIf(x*) - x*1I = IIf(x*} - f(xn} + Xn+1 - x*1I :s; IIf(x*} - f(xn} II + IIXn+1 - x*1I :s; Kllx* - xnll + IIxn+l - x*1I 2Kn :s; 1- K"xl - xoll· Thus, as this upper bound can be made arbitrarily small by choosing n large, we must have x* = f(x*}; i.e., x* is a fixed point of f. This establishes the existence of a fixed point. The next step is to verify uniqueness. To this end, suppose that x* and y * are both fixed points of f in the set E. Then
o:s; IIx* - y*1I = IIf(x*} - f(y*) II :s; Kllx* - y*lI· Therefore, This proves that
o Example 14.11. Let A E Cpx P , B E c qxq and C E C pxq . Then the equation X-AXB=C
has a unique solution X E C pxq if IIAIIIIBIl < 1. (Much stronger results will be obtained in Chapter 18.) Discussion. Let f(X) =C+AXB. Then clearly the function f maps X E C pxq into f(X} E C pxq , and X is a solution of the given equation if and only if f(X) = X. The conclusion
308
14. Vector valued functions
now follows from the last theorem (with E = C pxq identified as C pq ) and the observation that
Ilf(X) -
f(Y)1I =
IIA(X - Y)BII :s: IIAIIIIX - YIIIIBII·
Exercise 14.8. Let E = {x E Ii : O:S: x Show that: (a)
:s: 1} and let
f(x) = (1
f maps E into E.
(b) There does not exist a positive constant "f < If(b) - f(a)1 :s: "fIb - al for every choice of a, bEE. (c)
+ x 2 )j2.
f has exactly one fixed point
1 such that
x* E E.
Exercise 14.9. Show that the polynomial p(x) = 1 - 4x + x 2 - x 3 has at least one root in the interval 0 :s: x :s: 1. [HINT: Use the fixed point theorem.] Exercise 14.10. Show that the function f(x, y)
= [
fixed point inside the set of points (x, y) E Ii2 : x 2 + y2
g;:1{}3 ]
has a
:s: 1.
Exercise 14.11. Show that if A E IiPxp and IIIp - All < 1, then, for any choice of bE IiP and Uo E IiP, the vectors Un+! = b + (Ip - A)un converge to a solution x of the equation Ax = b as n tends to infinity. [HINT: Ax = b if and only if b + (Ip - A)x = x.]
14.8. A refined contractive fixed point theorem The fixed point theorem that we proved above assumed that f(x) was a continuous mapping of a closed subset E of a normed linear space V over IF into itself such that IIf(x) - f(y) II
:s: Kllx -
yll
for some constant K, 0 < K < 1, for every pair of vectors x and y in E. The next theorem relaxes this constraint. Theorem 14.12. Let f(x) be a continuous map of a closed subset E of a normed linear space V over IF into itself such that the j 'th iterate flil
=f0f0
.•. 0
f
of f satisfies the constraint
for some constant K, 0 < K < 1, with respect to any norm f has a unique fixed point x* in E.
II
lion V. Then
14.9. Spectral radius
309
Proof. Let g(x) = fbj(x). Then, by Theorem 14.7, g has a unique fixed point x*. Moreover, f(x*) = f(g(x*)) = g(f(x*)). But this exhibits f(x*) as a fixed point of g. Thus, as g has only one fixed point, we must have f(x*) = x*; o i.e., x* is a fixed point of f. Example 14.13. Let A E CpxP, B E equation
c qxq
and C E C pxq . Then the
X -AXB = C has a unique solution X if IIAnllllBnll < 1 for some positive integer n. This conclusion is stronger than the one obtained in the last section. It rests on the observation that the j'th iterate of the function f(X) C + AX B satisfies the inequality E C pxq
Ilfbj(X) - fbj(Y)11 = II Ai (X - Y)Bili ~ IIAillllX - YIIIIBili.
14.9. Spectral radius The last example (and in fact earlier considerations on the growth o(the solutions to the equations studied in Chapter 13 indicates the importance of estimates of the size of IIAnll. The next theorem provides a remarkable connection between these numbers and the spectral radius
(14.12)
ru(A) = max {IAI : A E u(A)}
for
A
E lF Pxp .
It is easy to obtain a bound: Lemma 14.14. Let A E lF PxP . Then (14.13)
for every positive integer n. Proof. Let Ax = AX for some nonzero vector x E C p. Then, since A nx = An x , it is readily seen that
and hence that IAnl ~ IIAnll for every eigenvalue A of A. Therefore, the spectral radius ru{A) of the matrix J is clearly subject to the bound (14.13) for every positive integer n. 0
14. Vector valued functions
310
It is a little harder to obtain appropriate upper bounds on IIAnlll/n. It is convenient to first establish the following result: Lemma 14.15. If A E lF Pxp is similaT to B E lF Pxp and iflimnloo IIAnlll/n exists, then lim IIAnll l / n = lim IIB n I l/n . nloo
nloo
Proof. By assumption, there exists an invertible matrix U E lF Pxp such that A = U BU- 1 . Therefore,
IIAnll = IIUBnU-11l :s; 11U1I1IBn1l1lU-11l and where and cn ! 0 as n i 00. The proof is now easily completed, since the formula B = U- 1 AU yields the supplementary bound
D
Exercise 14.12. Show that the numbers Cn that are defined in the proof of Lemma 14.15 tend monotonically to zero as n i 00. Theorem 14.16. Let A E lF PxP . Then (14.14)
lim I An II l/n = Tu(A);
nioo
i.e., the indicated limit exists and is equal to the spectral radius of A. Proof. Let A be similar to a Jordan matrix J. Then Tu(A) hence, in view of Lemma 14.15, it suffices to show that (14.15)
= Tu(J), and
lim II In II l/n = Tu(J) .
nioo
Since (14.15) is clear if Tu(J) = 0, it is necessary to consider only the case Tu(J) > O. Suppose first that J = cir) is a single Jordan cell with J1 i- O. Then J = B + C, with B = J1Ip and C = C}r) - J1Ip = CaP). Therefore, since
BC = C Band C k = 0 for k ~ p,
14.9. Spectral radius
311
the binomial theorem is applicable: If n
t (~)Bn-kCk
In =
> p,
then
k=O
= Bn +
(7) B n- 1C
Bn-pH {BP-1 + and, since IIBII
=
IIlI and IICII
+ ... +
~:
(7) Bp- C 2
1) B n-pHCp-1
+ ... +
~: I)CP-1}
= 1,
IlJnll :::; II Bn-pH II {IIBP- 111 + n1lBp- 2 CII + ... + nP-1I1CP-llI}
:::; IIB n-pH llnp- 1 (1 + IIBII)P-1 = Illl n n P- 1 (1 +
11l1- 1 )P-1.
Therefore, (14.16) where 1 + 8n
{n(1 exp
+ 11l1- 1 )} (p-1)/n
{P:l
[lnn+ln(I+ IIlI - 1)]}
1 as njoo.
---t
Thus, the two bounds (14.13) and (14.16) imply that
ra(J) :::; IIJnll l / n :::; ra(J){1 + 8n }, which serves to complete the verification of (14.15) when J Jordan cell with Il oF 0, since 8n ---t 0 as n j 00.
= C}r)
is a single
The next step is to observe that formula (14.15) holds if the p x p matrix
J = diag { J1, ... ,Jr }, where
J ~. = is a Jordan cell of size IIrll 1/ n
Vi
x =
Vi
with Ili
C(Vi) P-i
oF 0 for i
= 1, ... ,r. Then
max{IIJfIl 1/ n , ... , IIJ~1I1/n}
< max {11l1InP , ... , Illr InP } < r a(J)nP for large enough n .
o Remark 14.17. Formula (14.14) is valid in a much wider context than was considered here; see e.g., Chapter 18 of W. Rudin [60].
312
14. Vector valued functions
Theorem 14.18. Let A and B be p x p matrices that commute. Then
a(A + B)
~
a(A) + a(B).
Proof. Let u be an eigenvector of A + B corresponding to the eigenvalue f..L. Then
(A+B)u = f..LU and hence, since BA = AB,
(A + B)Bu = B(A + B)u
= f..LBu;
that is to say, N(A+B-p.lp ) is invariant under B. Therefore, by Theorem 4.2, there exists an eigenvector v of B in this null space. This is the same as to say that (A+B)v = f..LV and Bv = {3v where {3 is an eigenvalue of B. But this in turn implies that
Av = (f..L - (3)v; Le., the number a = f..L - (3 is an eigenvalue of A. Thus we have shown that
f. L
E
a(A + B) ==> f..L = a
+ (3,
a E a(A) and (3 E a{B).
where
But that is exactly what we wanted to prove.
D
Theorem 14.19. If A and Bare p x p matrices such that AB
= BA, then
(1) rq(A + B) $ rq(A) + rq{B). (2) rq(AB) $ rq(A)ru{B). Proof. The first assertion is an immediate consequence of Theorem 14.18 and the definition of spectral radius. The second is left to the reader as an exercise. 0 Exercise 14.13. Verify the second assertion in Theorem 14.19. Exercise 14.14. Verify the first assertion in Theorem 14.19 by estimating II(A+B)nll with the aid of the binomial theorem. [REMARK: This is not as easy as the proof furnished above, but has the advantage of being applicable in wider circumstances.] Exercise 14.15. Show that if A,B E c nxn , then ru(AB) if AB =1= BA. [HINT: Recall formula (5.17).] Exercise 14.16. Show that if A =
ru{AB) > ru{A) ru{B)
and
[~ ~]
and B =
= ru{BA), even
G ~], then
rq(A + B) > ru{A)
+ ru{B). IIAII. ru(A) + IIBII,
Exercise 14.17. Show that if A is a normal matrix, then ru{A) = Exercise 14.18. Show that if A, B E c nxn , then r A+B $ even if the two matrices do not commute.
14.10. The Brouwer fixed point theorem
313
14.10. The Brouwer fixed point theorem A set K is said to have the fixed point property if for every continuous mapping T of K into K, there is an x E K such that Tx = x. This section is devoted to the Brouwer fixed point theorem, which states that the closed unit ball has the fixed point property. The proof rests on the following preliminary result: Theorem 14.20. Let B = {x E Rn : Ilxll ~ I}. There does not exist a function f E C2 (B) that maps B into its boundary S = {x E R n : IIxll = I} such that f(x) = x for every point XES. Discussion.
Suppose to the contrary that there does exist a function f E
C2 (B) that maps B into its boundary S such that f(x) = x for every point XES, and, to ease the exposition, let us focus on the case n = 3, so that
Let
Dr{x)
= det Jr(x) = det
g£~ (x) g£~ (x) g£~ (x)
g£~ (x) g£~ (x) g£~ (x) Then, since f maps B into the boundary S, 3
1=
Ilf(x)11
2 =
L fi(X)2
for every point
x E B.
i=l
Therefore,
a
(3
o = aXj ~ fi(X)2
) = 2 tt3fi(x) ax; ar (x)
for j = 1,2,3 and x E B, and consequently
g£~ (x) g£~ (x) g£~ (x) [hex) hex) hex)]
g£~ (x) g£~ (x) g£~ (x) ala (x) ala (x) ala (x)
0X1
Bi2
0Xi
= [0
0 0]
14. Vector valued functions
314
if x E B. Thus, Dr(x) = 0 for every point x E B. Moreover, if denotes the ij minor of the matrix under consideration, then
Mij(X)
3
Dr(x)
=
2:) _l)1+j ~f~ (x)M
1j (x)
j=1
XJ
3
_
~)_l)1+j {~(IIMIj) - II aMIj } j=1
aXj
.
aXj
Next, in order to evaluate the sum of the second terms on the right, it is convenient to let
and then to note that
aru -ar ar Xl
12 X2
+
13 X3
-- aXI a det [~ ~] - aX2 a det [~ ~] + aX3 a det [~ aX2 aX3 aXI aX3 aXI
~] aX2
Thus, to this point we know that
The next step is to evaluate the last integral another way with the aid of Gauss' divergence theorem, which serves to reexpress the volume integral of interest in terms of a surface integral over the boundary of B:
14.10. The Brouwer fixed point theorem
315
This implies that
JJr
Df(X)dxldx2dx3 =
iB
J1s
3
JI(x)
:?=( -l)1+jXj M lj(x)du )=1
and leads to the problem of evaluating
ah
ah
ah
OX! ax; 8x3 on the boundary S of the ball B. To this end, let x( t) , -1 ::; t ::; 1, be a smooth curve in S such that x(O) = U and X/(O) = v. Then
dfi(X(t)) = afi x' (t) + ali x' (t) + afi x' (t). dt aXl 1 aX2 2 aX3 3 However, since fi(X(t)) = Xi(t) this last expression is also equal to Thus, writing the gradient gradfi(u) = V Ii(u) as a column vector, (grad Ii(u) - ei, v) = 0 for
x~(t).
UE S
for every choice of v that is tangent to S at the point u. Therefore, grad fi(U) - ei = AiU for some constant Ai E R. In other notation, grad fi(X) - ei = AiX. Thus, the determinant of interest is equal to det
[A:~l A3Xl
1
= Xl, A2 X: \ 1 A:!3 A3X2 A3X3 + 1
which leads to the contradiction
o=
Jis
xidu .
Therefore there does not exist a function f E C2 (B) that maps B into its boundary S such that f (x) = x for every point XES. 0 Theorem 14.21. Let f(x) be a continuous mapping of the closed unit ball B
= {x E lRn : IIxll ::; I}
into itself. Then there is a point x E B such that f(x)
= x.
Proof. If the theorem is false, then there exists a continuous function f(x) that maps B into itself such that IIf{x) - xii> 0 for every point x E B. Therefore, since B is compact and f(x) - x is continuous on B, there exists an c > 0 such that Ilf(x) - xii ~ c for every point x E B.
14. Vector valued functions
316
Let g E C2 (B) be a mapping of B into itself such that Ilg(x) - f(x) II ~ c/2 for x E B. Then Ilg(x) - xII ~ c/2 for x E B. Now choose a point hex) on the line generated by x and g(x) such that Ilb(x)11 = 1 and x lies between hex) and g(x) in the sense that x = tb(x)
+ (1 -
t)g(x)
for some 0 < t :S 1; t = 0 is ruled out since Ilg(x) - xii hex)
= x + (1 ~
t)
(x - g(x))
~
c/2. Then
= x + c(x)(x - g(x)) ,
where the coefficient c(x) = (1 - t)/t is nonnegative and may be expressed as ( ) _ -(x - g(x), x) + {(x - g(x),x}2 + Ilx - g(x)112(1_llxI12)}1/2 ex Ilx _ g(x)112 ' since c(x) ~ 0 and IIb(x)1I = 1. But this exhibits b(x) as a function of class C2(B) such that b(x) = x for points x E B with Ilxll = 1, which is impossible in view of Theorem 14.20. D The Brouwer fixed point theorem can be strengthened to: Every closed bounded convex subset ofJRn has the fixed point property; see Chapter 22 .. There are also more general versions in infinite dimensional spaces:
The Leray-Schauder Theorem: Every compact convex subset in a Banach space has the fixed point property; see e.g. [62]' for a start.
14.11. Bibliographical notes The discussion of Newton's method is adapted from [57]. A more sophisticated version due to Kantorovich may be found in the book [62] by Saaty and Bram. The discussion of Theorem 14.20 is adopted from an expository article by Yakar Kannai [41]. The proof of Theorem 14.21 is adapted from [62].
Chapter 15
The implicit function theorem
It seems that physicists do not object to rigorous proofs provided that they are short and simple. I have much sympathy with this point of view. Unfortunately it has not always been possible to provide proofs of this kind.
E. C. Titchmarsh [67] This chapter is devoted primarily to the implicit function theorem and a few of its applications. The last two sections are devoted to an application of vector calculus to dynamical systems and a test for their stability.
15.1. Preliminary discussion To warm up, consider first the problem of describing the set of solutions u E jRn to the equation Au = b, when A E jRPxn, b E jRP and rank A = p. The rank condition implies that there exists an n x n permutation matrix P such that the last p columns of the matrix AP are linearly independent. Thus, upon writing
AP = [An
A 12 ]
with A12 E jRPxp invertible, x E equation can be rewritten as
o
and
jRq,
[;]
= pT u
Y E lR P and n
=
p+q, the original
b- Au = b -APpTu =
b - [An
A 12]
[;]
= b - Anx - A 12 y.
-
317
15. The implicit function theorem
318
Thus, (15.1)
Au - b = 0
<===}
Y = All(b - AllX)
j
i.e., the constraint Au - b = 0 implicitly prescribes some of the entries in bu in terms of the others. The implicit function theorem is based on the same circle of ideas applied to the problem of describing the set of vectors u E ~ n of the vector equation g(u) = 0 when g(u) E ~p. The idea is that if g(a) = 0 and u is close to a, then g(u) should behave approximately the same as g(a) + Vg(a) (u - a), where Vg = V
[~11 = ~11 [V
gp
Vgp
To elaborate further, it is instructive to consider the example of a vector valued function
that maps the set
Q = {(x,y)
E]R2
x ~3
:
Ilx - xoll < a and Ily - Yoll <,8}
into 1R 3 such that:
(1) g E C1(Q). (2) g(xo, YO) = o. (3) The matrix
Bo =
[~('GJ,yOl
W,(7,yOl]
g~~ (xo, YO)
t(Xo,Yo)
is invertible. The mean value theorem yields the formula
for points (x, y) E Q, where (15.2)
c· J
= [Xo] +t· [x - xo] 0< tj < 1, Yo J Y - Yo '
j
= 1,2,3.
15.2. The main theorem
319
Let
and
so that
g(x, y) - g(xo, YO) = A(tb t2, t3)(X - xo) + B(tl' t2, t3)(Y - yo). Thus, if
g(Xo,yo) = 0 and B(tl' t2, t3) is invertible, then in order to find values of y that will make g(x, y) = 0, it is tempting to set (15.3)
y
= Yo - B(tb t2, t3)-1 A(tb t2, t3)(X - xo).
Indeed the invertibility of the matrix B(tb t2, t3) is guaranteed byehoosing IIx - xoll < 'Y and Ily - yoll < 8 with 'Y and 8 small enough, since Bo = B(O, 0, 0) is invertible and the partial derivatives are continuous in the vicinity of the point (xo, yo). However, formula (15.3) is deceptive, because the vectors Cl. C2, C3 depend upon y. That is to say, in formula (15.2) the right-hand side also depends upon y except in the special case that g(x,y) is of the form
t
Ao(x - xo) + Bo(y - YO) for some pair of matrix valued functions Ao = Ao (x) and Bo = Bo (x) that g(x, y)
=
are independent of y. Thus, we are forced to do something a little more clever.
15.2. The main theorem Theorem 15.1. Let
g(x,y) =
91 (Xl,
...
'X~.' Yl.·· . 'YP)]
[
9p(Xl, ... ,Xq,Yl,'" ,Yp) be a vector valued function that maps the set
Q = {(x,y) into lR p such that:
E lR q
x lR P
:
Ilx-xoli < a
and
IIY-Yoll <.B}
320
15. The implicit function theorem
(1) g E C1(Q). (2) g(xo, YO) = o. (3) The p x p matrix
[
G~(x,y) =
~(X,y) ... t(x,y)] :
89
Wt(x,y)
U!(x,y)
is invertible at the point (Xo,Yo).
Then there exists a pair of positive numbers 'Y and 6 such that for every x in the ball B.y(xo) = {x E lR q : Ilx - xoll < 'Y}
= cp(x) in the ball
there exists exactly one point y
B5(YO)
=
{y
E lR P :
lIy - Yoll < 6}
such that g(x, cp(x)) = O. Moreover,
(15.4)
if g E Ck(Q) ,
then
cp E Ck(B.y(xo)) .
Proof. Let B
= G~(xo, YO)
and, in order to invoke the fixed point theorem, let
f(x,y) = y - B-1g(x,y). Then
g(x,y)
=
0
if and only if f(x,y) = y.
Moreover, the array of partial derivatives
8ft
F,(x,y) = is simply related to
[
lJih
...
Ui
G~(x,y):
F~(x,y)
Therefore, since F~(xo,Yo) 'Y > 0 and 6 > 0 such that
= Ip -
B-IG~(X,y).
= 0, assumption (1) guarantees the existence of
(1) IIF~(x, y)1I < K < 1 and
(2) IIB-1g(x, Yo) II ~ (1- K)~
15.2. The main theorem
for Ilx - xoll and let h(y)
321
< 'Y and Ily - Yoll < 8. Now fix any point x in the ball Hy(xo) = f(x, y) and
,
[~(y)... ~(Y)l
H (y) = :
:.
oh
oh
'iiii7 (y)
'iii!; (y)
Then, by Theorem 14.7, the inequality Ilh(b) - h(a)11 ~ IIH'(e)llllb - all
holds for any two points a and b in the ball Bo(yo), where e = a+to(b-a) for some 0 < to < 1 is a point on the line segment joining the points a and b and consequently also belongs to this ball: lie - Yoll
-
11(1- to)(a - Yo) + to(b - Yo)11
< (1- to)lI(a-Yo)lI+toll(b-Yo)11 < (1 - to)8 + to8 = 8. Thus, as
H'(e) = F~(x, e) and
IIF~(x, e)11 < K,
it follows that Ilh(b) - h(a) II ~ Kllb - all
(15.5)
for every pair of points a and b in the open ball Bo(yo). Now let
E = Bo! (Yo) = {y E llV: lIy - Yo II ~ 8t} for any choice of 81 that meets the inequality ~ proof into five parts in order to clarify the logic.
<
81
<
8 and break the
(a) h maps the closed set E into itself. If y E E, then Ilh(y) - Yoll
< IIh(y) - h(yo) II + Ilh(yo) - yoll
<
Klly - yoll + Ilb(yo) - yoll
< K81 + Ilh(yo) - yoll and hence, as IIh(yo) - yoll
=
8
Ilf(x, Yo) - f(xo, Yo) II = IIB- 1 g(x, yo)11 ~ (1- K)2
and 8/2 < 81 ,
8
IIh(y) - yoll ~ K8 1 + (1- K)2 ~ K8 1 + (1- K)/h ~ 81 .
(b) The inequality (15.5) guarantees that IIh(b) - h(a)11 ~ Kllb - all holds for every pair of points a, bEE.
322
15. The implicit function theorem
(c) Invoke the fixed point theorem to h(y) to conclude that for each x in the ball B)'(xo), there exists a unique y =
f(x,y) = y. However, if there were two such fixed points y* and w* in B,,(yo), then the inequality (15.5) implies that lIy* - w*11 = IIh(y*) - h(w*)11 ~ Klly* - w*lI,
K < 1,
with
which is viable only if y * = w *. Therefore, uniqueness prevails in B6(Yo) too. But this is equivalent to the stated conclusion. (e) Verify the implication (15.4). To obtain the differentiability of
where Ci
= [
for
ti E (0,1),
and hence, upon setting
i
= 1, ...
...
,p,
~(Cl)l Yp ,
:
U!(C
p)
that (15.6)
L
[b -
a] = M [
The matrices L = L(Cl,'" ,cp )
and
M = M(Cl, ... ,cp )
are matrices of sizes p x q and p x p, respectively, that depend upon a and b. If 'Y is small enough, then M is invertible and IIM- 1 LII ~ Kl for all points
15.2. The main theorem
323
a and b in By(xo), Thus,
Ilcp(b) - cp(a) II
:::; KIilb -
all,
which establishes the continuity of cp(x) in By(xo), In particular, this guarantees that Ci
~ [ cp~)]
as b
~ a.
The calculation of the partial derivatives of cp(x) is completed by choosing b
= a + cek,
k
= 1, ... ,q,
in formula (15.6), where ek is the k'th column of the identity matrix Iq and c is a small real number. Then, for example, if k = 1, we obtain the formula
_[~(CJ) . . 8g ~(cp)
-
[
~(Cl)
'I',(a, +<,a2,··· ,a,: - 'I',(a"a2,." ,a,)]. [ cpp(al + c, a2, ... ,aq) - cpp(al' a2, ... ,aq)
. 8g lff!I (c
p)
...
c
Letting c lOwe obtain formulas for 0£J., ... , gCPp. Formulas for the other 8XI Xl partial derivatives are obtained from the other choices of ek. 0 Exercise 15.1. Let a, band c be real numbers and let
Show that if
g(xo, YO)
8g
= 0 and 8y (xo, YO) # 0,
then to each point X that is sufficiently close to Xo (Le., Ix - xol < 0 for small enough 0) there is exactly one point y such that g(x, y) = O. Since this single point y is uniquely determined by the x, this is the same as to say that y = cp(x) for such x. Exercise 15.2. Show that in the setting of Exercise 15.1, one can solve for X as a function of y in the neighborhood of any point (xo, YO) Such that
g(xo, YO)
8g
= 0 and 8x (xo, YO) # O.
15. The implicit function theorem
324
Exercise 15.3. Let 9l(X, Yl, Y2) = x2(Yi + y~) - 5 and 92(X, Yl, Y2) = (x Y2)2 + 2. Show that in a neighborhood of the point x = 1, Yl = -1, Y2 = 2, the curve of intersection of the two surfaces 91 (x, Yl, Y2) = 0 and 92(X,Yl,Y2) = 0 can be described by a pair of functions Yl = CPl(X) and Y2 = CP2(X).
yi -
Exercise 15.4. Let
s = {[::]
E
R 4:
Xl - 2X2 Xl - 2X2
X4
+a!~~ -
+ 2X3 -
= 0 }, u = [:] and v =
X4
X4 = 0
[~]
1
-
1
Use the implicit function theorem to show that it is possible to solve for X3 and X4 as functions of Xl and X2 for points in S that are close to u and to points in S that are close to v and write down formulas for these functions for each of these two cases.
15.3. A generalization of the implicit function theorem
I g!
Theorem 15.2. Let Q be an open subset of R n that contains the point uo, let 9i E Cl(Q) for i = 1, ... ,k and suppose that
~(u) rank
t(U)
[
...
t;(u)
(u)
~ p for
u
E
B,(u)' c Q
for some r > 0 and let q = n - p. Then there exists a permutation a of the indices {I, ... ,n} of the components of u and a pair of numbers I > 0 and 6 > 0 such that if
(15.7) Xi
= UO'(i)
for i = 1, ... , q
and
Yi
= UO'(q+i)
for i
= 1, ...
,p,
then:
(2) For each point in the ball B/,(xo) = {x E Rq : exists exactly one point y = cp(x) in the ball {y
IIx - x011 < I}, there Ily - y011 < 6}
E RP:
such that 9i (u) = 0
for i
= 1, . . . ,k when
[uut) Uu(n)
1= [ (x) ] . cP
325
15.3. A generalization of the implicit function theorem
Proof.
The basic idea is to first reorder the functions 91, . .. ,9k so that rank
. . ~(U) 1
[~(U)
=p
~(U) aUI
~(U) aUn
and then to relabel the independent variables in accordance with (15.7) so that rank
[~:(UO) ... t:(UO )1= p. ~(UO)
~(UO)
aYI
ayp
The existence of a function cp(x) such that (2) and (3) hold for i = 1, ... ,p then follows from the implicit function theorem. To complete the proof when k > p, it remains to check that (2) holds for i = p + 1, . .. ,k. To this end, fix i in this range; let with
[
UU(~)(t) .
] = [
x(t) ] cp(x(t))
for
O:::;t:::;l,
Uu(n)(t)
where x(O) = XO and x(t) is a smooth curve inside the ball B')'(xO); and set h(t) = 9i(U(t)). Then, since P
(V'9i)(U(t))
=
Laj(u(t))(V'9j)(U(t)) j=1
for 0 :::; t :::; 1 and h(O) = 0,
h(t) -
lot :s h(s)ds lot (V'9i)(U(S))u'(s)ds
1,' {t, aj(u(s))(V9j)(U(S)) } u'(s)d" =
it t o
0,
j=l
aj(u(s)) {(V'9j)(U(S))U'(s)} ds
326
15. The implicit function theorem
because
(\79j)(U(s))u'(s)
d
= ds 9j (u(s)) = 0
for
j
= 1, ...
,p
and
0 < s < 1. D
15.4. Continuous dependence of solutions The implicit function theorem is often a useful tool to check the continuous dependence of the solution of an equation on the coefficients appearing in the equation. Suppose, for example, that X E lR. 2x2 is a solution of the matrix equation
for some fixed choice of the matrices A, B E lR. 2 x 2 . Then we shall invoke the implicit function theorem to show that if A changes only a little, then X will also change only a little. To this end, let F(A,X)
= ATX + XA -
B
so that lij(A,X)
where
el, e2
= eT(AT X + XA
- B)ej,
i,j
= 1,2,
denote the standard basis vectors in lR. 2. Then, upon writing X =
[Xll X21
X12], X22
one can readily check that alij aXst
ef(AT ese[
T
asiet ej
+ ese[ A)ej
+ atjeiT e s .
Thus, alij aXn
=
alij aX12 alij aX21 alij aX22
=
T aliel ej
+ aljeiT el
T alie2 ej
+ a2jeiT el
T a2iel ej
+ aljeiT e2
T a2i e 2 ej
+ a2jeiT e2 .
15.5. The inverse function theorem
327
Correspondingly,
gill gill gill gin gig gli; gli; gE;
ghl gh1 gh2 Xll gh2 XlI
X12
~l
~2
Xu
X12
~l
~2
X12
X21
X22
8 8
21
22
8 8
21
22
Now suppose that F(Ao, Xo) = 0 and that the matrix on the right in the last identity is invertible when the terms aij are taken from Ao. Then the implicit function theorem guarantees the existence of a pair of numbers 'Y > 0 and {) > 0 such that for every matrix A E lR 2x2 in the ball IIA - Aoll < 'Y there exists a unique X = cp(A) in the ballllX - Xoll < {) such that F(A, X) = 0 and hence that cP (A) is a continuous function of X in the ball II A - Ao II < 'Y.
15.5. The inverse function theorem Theorem 15.3. Suppose that the p x 1 real vector valued Junction
!I(Xb .... ,XP)] f(x) = [
: Jp(Xl. ... , xp)
is in C1 (Ba:(xo)) Jor some a > 0 and that the Jacobian matrix
Jdx) =
[
~(X)
...
~(X)I
:
!!iE.(x) aXl
!!iE.(x) axp
is invertible at the point Xo. Let Yo = f(xa). Then there exist a pair oj numbers'Y > 0 and {) > 0 such that Jor each point y E lRP in the ball BtS(Yo) there exists exactly one point x in the ball By(xa) such that Y = f(x). Moreover, the Junction x = '!9(y) is in C1 (Bo(Yo)).
Proof.
Let g(x, y) = f(x) - y and let
~(x,y) G~(x,y)
=
[
: 89 ~(x,y)
M;(x,y)
I.
~(x,y) p
Then, since g(xo, Yo) = 0 and the matrix G~ (xo, Yo) = J(xa) is invertible, the implicit function theorem guarantees the existence of a pair of positive numbers'Y and {) such that for each vector y E BtS(Yo), there exists exactly
328
15. The implicit function theorem
one point x = '!9 (y) such that g( '!9(y) , y) = 0 and that moreover, '!9(y) will have continous first order partial derivatives in Bo(yo). But this is equivalent to the asserted statement. 0
Exercise 15.5. Let g(x) be a continous mapping of lR P into lR P such that all the partial derivatives
gfli. exist and are continuous on lR
p.
(xl , ... , X P) gI
g(x)
=[
:
!l!zl(x) ,... aXl
1,Jg(x) =[ :
I
Write
3
!l!zl(x) axp
:,
g~~ (x) , . . . g;~ (x)
gp(Xl, ... ,Xp)
yO = g(XO) and suppose that the matrix B = Jg(XO) is invertible and that IIIp - B-lJg(x)!I < 1/2 for every point x in the closed ball Bo(xO). Show that if p = 8/(2!1B-l!l), then for each fixed y in the closed ball Bp(YO), there exists exactly one point x E Bo(xo) such that g(x) = y. [HINT: Show that for each point y E Bp(YO), the function h(x) = x - B-l(g(x) - y) has a fixed point in B6(XO).] Exercise 15.6. Let Xl -X2 g(x) = [ X22 +X3 X~
- 2X3
+1
(a) Calculate Jg(x), B
1
= Jg(XO) and B-1.
(b) Show that < 5/3. (c) Show that if y E lR 3 is fixed and h(x) = x - B-l(g(x) - y), then Jh(x) = B-l(Jg(xo) - Jg(x)).
IIB- 1 11 2
(d) Show that IiJh(x) II ~ 211B-lllllx - xOII· (e) Show that if 211B- 1 118 < 1/2, then for each fixed point y in the closed ball Bp(YO), with p = 8/(21IB- l !l), there exists exactly one point x in the closed ball Bo(xo) such that g(x) = y.
Exercise 15.7. Let UO E lR 2 and let f E C2(Br(uO) and suppose that
[~~~~~~~n
is invertible for every pair of vectors u, v
that if a, bE Br(uO), then f(a) = f(b)
¢:::::>
E
Br(uO). Show
a = b.
Exercise 15.8. Show that the condition in Exercise 15.7 cannot be weakened to
[~~j~j~:j]
is invertible for every vector
U
E
Br(uO). [HINT: Con-
sider the function f(x) with components h(x) = Xl cos X2 and h(x) Xl sin X2 in a ball of radius 27r centered at the point (37r, 27r).
15.7. An instructive example
329
Exercise 15.9. Calculate the Jacobian matrix Jf(x) of the function f(x) with components fi(Xl,X2,X3) = xd(1 + Xl + X2 + X3) for i = 1,2,3 that are defined at all points x E lR 3 with Xl + X2 + X3 # -1. Exercise 15.10. Show that the vector valued function that is defined in Exercise 15.8 defines a one to one map from its domain of definition in lR 3 and find the inverse mapping.
15.6. Roots of polynomials Theorem 15.4. The roots of the polynomial
f(>..)
= >..n + al>..n-l + ... + an
vary continuously with the coefficients al, ... ,an' This theorem is of great importance in applications. It guarantees that a small change in the coefficients al,'" ,an of the polynomial causes only a small change in the roots of the polynomiaL It is usually proved by Rouche's theorem from the theory of complex variables; see e.g. pp. 153-154 of [7] and Appendix B. Below, we shall treat a special case of this theorem in which the polynomial has distinct roots via the implicit function theorem. The full result will be established later by invoking a different circle of ideas in Chapter 17. Another approach is considered in Exercise 17.15.
15.7. An instructive example To warm up, consider first the polynomial
p(>..)
= >..3 + >..2 _ 4>.. + 6.
It has three roots:
>"1 = 1 + i, >"2 = 1 - i, and >"3 = -3. This means that the equation (15.8)
(p, + ill)3
+ a(p, + ill)2 + b(p, + ill) + C =
in terms of the 5 real variables
p" 1I,
0
a, b, c is satisfied by the choices:
= 1, 1I = 1, a = 1, b = -4, c = 6 p, = 1, 1I = -1, a = 1, b = -4, c = 6 p, = -3, 1I = 0, a = 1, b = -4, c = 6. p,
To put this into the setting of the implicit function theorem, let us express
f(a, b, c, p, + ill) = (p, + ill)3 + a(JL + ill)2 + b(p, + ill) + c = p,3 + 3p, 2ill + 3JL( ill)2 + (ill)3
+ a(p,2 + 2JLill + (ill)2) + b(p, + ill) + C
15. The implicit function theorem
330
in terms of its real and imaginary parts as
f(a,b,c,j1+i//) = h(a,b,c,j1,//) +ih(a,b,c,j1,//), where
and
Thus, we have converted the study of the solutions of the roots of the equation
with real coefficients a, b, c to the study of the solutions of the system
h(a, b, c, j1, //) = 0 h(a, b, c, j1, //) =
o.
The implicit function theorem guarantees the continuous dependence of the pair (J.t, //) on (a, b, c) in the vicinity of a solution provided that the matrix
b.(a,b,c,J.t,//) =
~~ ~l ~f [~
is invertible. Let us explore this at the point a = 1, b = -4, c = 6, j1 = 1, // 1. To begin with (15.9)
8h 2 2 8j1 = 3j1 - 3// + 2aj1 + b,
(15.10)
-
8II
(15.11)
= -6j1// - 2a// , 8// 8h 8j1 = 6j1// + 2a// ,
(15.12)
a;:
=
3j12 - 3//2 + 2aJ.t + b.
=
15.8. A more sophisticated approach
331
Therefore,
{)h
{)Jl (1, -4,6,1,1) = -2,
{)h {)v
(1, -4,6,1,1) = -8,
{)h
{)Jl (1, -4,6,1,1) = 8,
{)h {)v
(1, -4,6,1,1)
and (15.13)
D.(I, -4,6,1,1) = det
= -2
[-2 -8] = 8
-2
22 + 82
= 68.
Thus we can conclude that if the coefficients a, b, c of the polynomial ..\3 + a..\2
+ b..\ + c
change a little bit from 1, -4, 6, then the root in the vicinity of 1 + i will only change a little bit. Similar considerations apply to the other two roots in this example. Exercise 15.11. Show that there exists a pair of numbers 'Y > 0 and 8 > 0 such that the polynomial ..\3 + a..\ 2 + b..\ + c with real coefficients has exactly one root..\ = Jl+iv in the ball Jl2+(v-2)2 < 8 if (a-l)2+(b-4)2+(c-4)2 < 'Y. [HINT: The polynomial..\3+..\2 +4..\+4 has three distinct roots: 2i, -2i and -1.]
15.8. A more sophisticated approach The next step is to see if we can redo this example in a more transparent way that will enable us to generalize the procedure. The answer is yes, and the key rests in looking carefully at the formulas (15.9)-(15.12) and noting that {)h = {)h and {)h = _ {)h {)Jl
{)v
.
{)v
{)J.L
and hence that the determinant of interest is equal to
{)h {)h _ {)h {)h = ({)h)2 + ({)h)2 {)J.L {)v
{)v {)J.L
{)J.L
{)Jl
1{)J.L + i{)h 12 = l.8{)Jlf /2 {)J.L
= {)h
=
1:~12
15. The implicit function theorem
332 Moreover, in the case at hand, f(l, -4,6, A)
= A3 + A2 - 4A + 6 = (A - Ad(A - A2)(A - A3)
and
af aA (1, -4,6, AI) = (AI - A2)(AI - A3) =I 0 because the roots are distinct. Lemma 15.5. Let
f(A)
=
=
Then
An + alAn - 1 + ... + an (J..L + illt + al(J..L + illt- l + ... + an· aj aJ..L
Proof.
=
aj aA
and
af all
.af
= '/, aA .
By the chain rule,
aj af aA af af af aA .af aJ..L = aA aJ..L = aA and all aA all = '/, aA . Thus, if we write
f(A) = h (J..L, 11) + ih(J..L, 11) where
h(J..L, 11) and h(J..L, 11) are now both real functions, we see that ah +iah aJ..L aJ..L
=
~ (ah +i ah ). i
all
all
Matching real and imaginary parts, we obtain
aIt = a12 and a12 = _ ah aJ..L all aJ..L all in this case also. These are the well-known Cauchy-Riemann equations, which will resurface in Chapter 17. In particular, this analysis leads to the conclusion that, for the vector function f with components hand 12,
which is nonzero at simple roots of the polynomial f(A). Thus, the implicit function theorem guarantees that the roots of a polynomial f (A) depend continuously on the coefficients of the polynomial if f (A) has distinct roots.
15.9. Dynamical systems
333
15.9. Dynamical systems Dynamical systems are equations of the form x'(t) = f(x(t))
(15.14)
t ~0
for
or, in the discrete case, Xk+l = f(Xk)
(15.15)
for
k
= 0,1, ... ,
where f maps an open set 0 eRn into R n and is constrained to be smooth enough to guarantee the existence and uniqueness of a solution to the stated equation for each given initial condition x(O) E o. Example 15.6. Let f be a continuous mapping of R n into itself such that Ilf(x) - f(y) II ~ ,llx - yll for all vectors x, y ERn and let 0 < b < 00. Then there exists exactly one continuous vector valued function x(t) such that x(t) = v
(15.16)
+ lot f(x(s))ds
for 0
~ t ~ b.
Moreover, x E C((O, 00)) and x'(t) = f(x(t))
(15.17)
Proof.
Let xo(t) = v for 0
Xk+l(t) = v
~
t
~
for
0 < t < b.
b and let
for 0 ~ t
+ lot f(Xk(S))ds
~ band
k
= 0,1, ....
Then the vector valued functions Xk(t), k = 0,1, ... , are continuous on the interval 0 ~ t ~ band Xl(t) - xo(t) =
lot f(v)ds = f(v)t.
Therefore, upon setting (3 = Ilf(v)ll, one can readily see that Ilx2(t) - xl(t)11
<
<
lot Ilf(Xl(S)) - f(xo(s)) lids {3'Y lot sds = {3,t 2/2
and, upon iterating this procedure, that
15. The implicit function theorem
334
Consequently, {3
<
,
k+i bt)j
L·, j=k+1 J.
{3 bt)k+1 < _
e'Yt
,(k+1)!
{3 (,b)k+l
'Yb
< ~(k+1)!
e ,
which can be made arbitrarily small by choosing k large enough. This suffices to guarantee that the continuous functions Xk (t) converge uniformly to a continuous limit x(t) on the interval 0 ::; t ::; b. Moreover, limx(t+h)-x(t) h
_
h-+O
lim-1 (t+hf(x(s))ds h
h-+O
=
Jt
f(x(t))
for each point t E (O,b); i.e.,
x'(t) = f(x(t))
for 0 < t < b.
To obtain uniqueness, suppose that x(t) and y(t) satisfy (15.16) and let
6 = max{lIx(s) - y(s)ll: 0::; s ::; b}. Then IIx(t) - y(t) II
lifat {f(x(s)) - f(Y(S))}dsll
<
fat ,lIx(s) -
y(s) lids
< ,8t. Therefore, upon invoking this bound in the basic inequality IIx(t) - y(t) II ::; ,
fat IIx(s) -
y(s) lids ,
r
2.
we obtain the inequality IIx(t) - y(t) II ::; 6,2
Jo
t
sds = 6 bt
and, upon iterating this procedure, IIx(t) - y(t) II
< 8 bt )k k!
< 8 bb )k
which tends to zero as k i
00.
for 0 ::; t ::; b, k! Therefore, x(t) = y(t).
15.10. Lyapunov functions
335
15.10. Lyapunov functions Let f be a continuous map of an open set 0 C 1R n into 1R n and suppose that f(wo) = 0 at a point Wo E O. A real-valued function cp E CI(O) is said to be a Lyapunov function for the dynamical system x'(t) = f(x(t)),
< 00, at Wo if
0~ t
(1) cp(wo) = 0 and cp(x) > 0 for x E 0 \ {wo}. (2) (V'cp,f(x)) ~ 0 for x E 0 \ {wo}. A Lyapunov function cp(x) on 0 is said to be a strict Lyapunov function if the inequality in (2) is strict, i.e., if (3) ((V'cp) (x) , f(x)) < 0 for x E 0 \ {wo}. Theorem 15.7. Let x(t), 0
:s; t < 00, be a solution of the dynamical system
x'(t) = f(x(t))
for t ~ 0
with x(O) = Xo
and let cp(x) be a Lyapunov function for this system at the point woo Then:
> 0, there exists a 8 such that Ilx(O) - woll < 8 ===> IIx(t) - woll < E for all t
(1) Given any
E
~ O.
(2) If cp(x) is a strict Lyapunov function for the system, then there exists a 8 > 0 such IIx(O) - woll
< 8 ===> x(t) - Wo as t i
00.
Proof. Let E > 0 be given and let 0 < EI ~ E be such that Bel (wo) C O. Let a = min{ cp(x) : IIx - woll = EI} and choose 0 < 8 < El such that max{cp(x) : IIx - woll
:s; 8} = al < a.
Let x(t) denote the trajectory of the given dynamical system with initial value x(O) E B6(WO)' Then, since d dt cp(x(t)) = (V'cp)(x), f(x)) , it follows that
lt2 + lt2
= cp(X(tl)) +
dd cp(x(s))ds t
h
=
cp(x(tt))
((V'
tl
~
for tl
< t2,
and hence that
cp(x(t)) Therefore, IIx(t) - woll
~
< EI for all t
~
for all t
~
O.
O. This completes the proof of (1).
15. The implicit function theorem
336
Suppose next that cp(x) is a strict Lyapunov function on n and that Ilx(t) - woll ~ c for all t ~ o. Then, in order to establish (2), it follows from (1) that it suffices to show that there does not exist an cO > 0 such that Ilx(t) - woll ~
cO
for all sufficiently large t.
This is because the function
h(x) = (V'cp)(x), f(x») is subject to the lower bound
h(x) ~ P > 0 for cO ~ IIx - woll ~ c and hence Icp(x(t2» - cp(x(tl»1 =
lilt2 h(X(S»dsl ~ (t2 - tl)p,
t2 > tl ,
which is not consistent with the fact that cp(x(t» tends to a limit as t
i
00.
D
Exercise 15.12. Show that if A E
(Ax,x)
~
0 for all x E jRn
jRnxn
~
Exercise 15.13. Show that if A E
and A
(Ax, x)
jRnxn
= AT, then
~
0
for all x E
en.
and AT = A, then
= (Ax,x) + (Ay,y) for all vectors x,y E jRn. Exercise 15.14. Let A, BE jRnxn, B = BT. Show that cp(x) = (Bx,x) (A(x+iy),x+iy)
is a Lyapunov function for the dynamical system x'(t) = Ax(t), 0 ~ t < at the point 0 if and only if B ~ 0 and BA + AT B :j O.
00,
Exercise 15.15. Show that in the setting of Exercise 15.14, cp(x) is a strict Lyapunov function for the considered system at the point 0 if and only if
B ~ 0 and BA+ATB -< O. Exercise 15.16. A dynamical system x'(t) = f(x(t», t ~ 0, is said to be a gradient system if f(x) = -V' g(x) for some real valued function 9 E C1 (jR n). Show that for such a system
d dtg(x(t»
~ 0
for every t
> o.
Exercise 15.17. Let 9 E C1(jRn) and let Xo be an isolated minimum of g(x). Show that g(x) - g(xo) is a strict Lyapunov function for the system x'{t) = -V'g(x(t» in a neighborhood of the point Xo.
15.11. Bibliographical notes The presented proof of Theorem 15.1 was adapted from Saaty and Bram [62]. The treatment of Lyapunov functions was adapted from La Salle and Lefschetz [46].
Chapter 16
Extremal problems
So I wrote this tune-took me three months. I wanted to keep it simple, elegant. Complex things are easy to do. Simplicity's the real challenge. I worked on it every day until I began to get it right. Then I worked on it some more.... Finally, one night I played it.
R. J. Waller [70], p. 168. This chapter is devoted primarily to classical extremal problems and extremal problems with constraints, which are resolved by the method of Lagranges multipliers. Applications to conjugate gradients and dual extremal problems are also considered.
16.1. Classical extremal problems Let f(x) = f(Xl,'" ,xn ) be a real-valued function of the variables XI, ... ,Xn that is defined in some open set 0 C R n and suppose that f E C1(O) and a En. Then Theorem 14.4 guarantees that the directional derivative: (16.1)
(Duf) (a) = lim f(a + EU) - f(a) dO E
exists for every choice of u ERn withllull = 1 and supplies the formula
(Duf) (a) = (Vf)(a)u.
(16.2)
If a is a local maximum, then
f(a)
~
f(a + EU)
for all unit vectors u and all sufficiently small positive numbers
E.
Thus,
E > 0 ==} f(a + EU) - f(a) ::; 0 ==} (Duf) (a) = (V f)(a)u ::; 0 E
-
337
16. Extremal problems
338
for all unit vectors U ERn. However, since the same inequality holds when u is replaced by -u, it follows that the last inequality must in fact be an equality: If a is a local maximum, then (Vf)(a)u = (Duf) (a)
=0
for all directions u. Therefore, as similar arguments lead to the same conclusion when a is a local minimum point for f(x), we obtain the following result: Theorem 16.1. Let Q be an open subset of R n and let f E Cl(Q). If a vector a E Q is a local maximum or a local minimum for f(x), then (16.3)
(Vf)(a) =
Olxn'
WARNING: The condition (16.3) is necessary but not sufficient for a to be a local extreme point (Le., a local maximum or a local minimum). Thus, for example, the point (0,0) is not a local extreme point for the function f(Xl, X2) = xf - x~, even though (V f)(0, 0) = [0 0].
More can be said if f E C2 (Q), because then, if Br(a) C Q and b E Br(a), Taylor's formula with remainder applied to the function
a»
h(t) = f(a + t(b implies that
h(1) = h(O) + h'(O) . 1 + h"(to) .
~~
for some point to E (0,1). But this is the same as to say that
feb) == f(a)
n
~
+ L ~(a)(bj j=l
aj)
xJ
&f
1 n
+2L
(bi - ai) 8 .8 . (c)(bj - aj), Xt
i,j=l
xJ
where
c=a
+ to(b -
a)
is a point on the open line segment between a and b. Thus, upon writing the gradient (Vf)(a) as a 1 x n row vector: (16.4)
[M.(a)
Vf(a) =
and introducing the Hessian
(16.5)
Hf(c) =
[
a!2~Xl (c) :
af 2
OXn8xi(c)
a!'J~n (el] , a2f
~(c)
16.1. Classical extremal problems
339
we can rewrite the formula for f(b) as (16.6)
f(b) = f(a)
+ (VJ)(a)(b -
a)
1
+ 2(Hr(c)(b -
a), (b - a)).
Let us now choose b = a+Eu,
where u is a unit vector and E is a positive number that will eventually tend to zero. Then the last formula implies that
f(a + EU) E - f(a) = (nf)() v a u
1 (H feu, () u ) . + 2E
Exercise 16.1. Show that if A E R nxn, then the following two conditions are equivalent:
> 0 for every nonzero vector u
(1) A = AT and (Au, u)
ERn.
(2) (Au, u) > 0 for every nonzero vector u E. en. In view of Exercise 16.1, the notation A >- 0 may be used for real symmetric matrices that are positive definite over R n as well as for matrices A E C nxn that are positive definite over en. Correspondingly we define (16.7)
R~xn
= {A
E
R nxn : A >- O}.
Exercise 16.2. Let A, B E R nxn and suppose that A >- 0, B = BT and IIA - BII < Amin, where Amin denotes the smallest eigenvalue of A. Show that B >- O. [HINT: (Bu, u) = (Au, u) + ((B - A)u, u).] Exercise 16.3. Let A, B E R nxn and suppose that A >- 0 and B = BT. Show that A + EB >- 0 if lEI is sufficiently smalL If f E C2 (Br(a)) and r
(16.8)
> E, then, in view of formula (16.6), 1
(V J)(a) = Olxn =* f(a + EU) - f(a) = 2E2(Hf(C)U, u)
for some point c on the open line segment between a easily to the following conclusions:
+ EU and a and leads
Theorem 16.2. Let f(x) = f(xI, ... ,xn ) belong to C2(Q), where Q ~ Rn is an open set that contains the point a. Then:
(16.9)
(V J)(a)
= Olxn and Hf(a) >- 0 =* a is a local minimum for f(x).
(16.10) (V J)(a)
= Olxn and Hf(a) -< 0 =* a is a local maximum for f(x).
16. Extremal problems
340
Proof. The proof of (16.9) follows from (16.8) and Exercise 16.2. The latter is applicable because Hf(a) )-- 0 and Hf(c) is a real symmetric matrix that tends to Hf(a) when c -+ O. The verification of (16.10) is similar. D Theorem 16.2 implies that the behavior of a smooth function f(x) in the vicinity of a point a at which (\1 f)(a) = Olxn depends critically on the eigenvalues of the real symmetric matrix Hf(a). Example 16.3. Let f( u, v) = a( u-1)2 +.B( v - 2)3 with nonzero coefficients a E JR and .B E JR. Then
8f
8f
8u(u,v)=2a(u-1) and
8v(u,v)=3.B(v-2)2.
Hence, (\1 f)(u, v) = [0
0] if u
v = 2. However, the point (1,2) is not a local maximum point or a local minimum point for the function feu, v). The Hessian
[s 82 f
lJVfJU
= 1 and
lJv] = ~
[2a 0
8v
and Hf(1,2) =
[2~ ~]
is neither positive definite nor negative definite. Exercise 16.4. Show that the Hessian Hg(l, 2) of the function
g(u, v) = a(u _1)2
+ .B(v -
2)4
is the same as the Hessian Hf (1, 2) of the function considered in the preceding example.
WARNING: A local minimum or maximum point is not necessarily an absolute minimum or maximum point: In the figure, f(x) has a local minimum at x = c and a local maximum at x = b. However, the absolute maximum value of f(x) in the closed interval a :::; x :::; d is attained at the point d and the absolute minimum value of f (x) in this interval is attained at the point x=a. Exercise 16.5. Show that if a = 1 and .B = 1, then the point (1,2) is a local minimum for the function
g(u, v) = (u _1)2
+ (v -
2)4,
but it is not a local minimum point for the function
feu, v)
= (u _1)2
+ (v -
2)3.
16.2. Extremal problems with constraints
341
f(x)
x
o
a
b
c
d
Exercise 16.6. Let f E C2 (JR 2 ) and suppose that (V f)(a, b) = [0 0], and let Al and A2 denote the eigenvalues of Hf(a, b). Show that the point (a, b) is: (i) a local minimum for f if Al > 0 and A2 > 0, (ii) a local maximum for f if Al < 0 and A2 < 0, (iii) neither a local maximum nor a local minimum if IAIA21 > 0, but AIA2 < o. Exercise 16.7. In many textbooks on calculus the conclusions formulated in Exercise 16.6 are given in terms of the second-order partial derivatives a = (82f /8x 2)(a, b), f3 = (82f /8y2)(a, b) and , = (82f /8x8y) (a, b) by the conditions (i) a > 0 , f3 > 0 and af3 > OJ (ii) a < 0 , f3 < o and af3 > OJ (iii) af3 < 0, respectively. Show that the two formulations are equivalent.
_,2
_,2
_,2
Exercise 16.8. Let Q be an open convex subset of JR n and suppose that f E C2 (Q) and Hf(x) >- 0 for every point x E Q. Show that if a E Q, then (16.11)
(Vf)(a) = 0 ===> f(b)
> f(a) for every point bE Q.
16.2. Extremal problems with constraints In this section we shall consider extremal problems with constraints, using the method of Lagrange multipliers. Let a be an extreme point of the function f(xI, ... ,xn ) when the variables (Xl, ... ,Xn ) are subject to the constraint g(XI, ... ,Xn ) = O. Geometrically, this amounts to evaluating to f (Xl, ... ,Xn ) at the points of the surface determined by the constraint g(XI, . .. ,Xn ) = O. Thus, for
16. Extremal problems
342
xi
example, if g(Xl, .. ' , Xn) = + ... + x; - 1, the surface is a sphere of radius 1. If x(t), -1 ~ t ~ 1, is any smooth curve on this surface passing through the point a with x(O) = a, then, in view of the formulas
g(x(t)) = 0 and it follows that
d dtg(x(t)) = (V'g)(x(t))x/(t)
d dtg(x(t))
for all
t E (-1,1),
= (V'g)(x(t))x'(t) = 0
for all t E (-1, 1) and hence, in particular, that
(V'g)(a)x'(O)
=
o.
At the same time, since a is a local extreme point for
0=
~f(X(O)) =
f (x)
we also have
(V' J) (a)x' (0).
Thus, if the set of possible vectors x' (0) fill out an n - 1 dimensional space, then
(V'f)(a)
=
A(V'g)(a)
for some constant..\. Our next objective is to present a precise version of thisatgument, with the help of the implicit function theorem.
Theorem 16.4. Let feu) = f(ul, ... ,un), gl(U) = gl(Ul, ... ,Un), ... , BIe(U) = Bk(UI, .. ' ,Un) be real-valued functions in Cl(Q) for some open set Qin R R., 'Where k < n. Let
S= {(UI, ... ,un) E Q: gj(Ul, ... ,un) = 0 for j
= 1, ... ,k}
and assume that:
(1) There ezists a point a E S and a number a > 0 such that the open ball BO/(a) is a subset of Q and either feu) ~ f(a) for all U E S n Ba(a) or feu) ~ f(a) for all U E S n Ba(a).
(V'gl)(U) (2) rank [
:
1 p for all points =
U
in the ball Ba(a).
(V'Bk)(U) Then there exists a set of k constants AI, ... ,..\k such that
(V'f)(a) = Al (V'gl)(a) + ... + "\k(V'gk) (a) . Proof. The general implicit function theorem guarantees the existence of a pair of constants 'Y > 0 and (j > 0 and a permutation matrix P E IR nxn such that if (16.12)
Pu = [ ; ]
with x E IR q ,
Y E IR P,
P + q = nand
Pa = [ ;: ] ,
16.2. Extremal problems with constraints
343
then for each point x in the ball By(xo) there exists exactly one point y = 'P(x) in the ball B6(YO) such that
9i(U)
= 0
i = 1, ... ,k
for
Pu = [ 'P(x) ] .
when
Moreover, 'P(x) E C1(By(xo)), Let x(t), -1 :::; t :::; 1, be a curve in B')'(xo) with x(O)
u(t) = P
T [
= XO and let
x(t) ] 'P(x(t)) .
Then,
d
(16.13)
dt!(u(t))lt=o
= (Vf)(a)u'(O) = 0
and
9i(U(t))=O for
-1
Therefore,
d
dt 9i (U(t))
=
(V9i)(U(t))u'(t)
= 0
for
-1
< t < 1 and
i = 1, ... ,k.
In particular,
(V9d(a)u'(O) = (V9i)(U(O))u'(O) = 0 for i = 1, ... , k. Therefore, the vector u'(O) belongs to the null space NA of the k x n matrix
A= [
(V9~)(a) ]
.
(V9k) (a) Since rank A = p, by assumption, dim NA = q. Next, consider the curve
x(t) = XO + tw,
-1:::; t :::; 1,
which belongs to the ball {x E lR q: IIx - XO II < 'Y} for every vector w E lR q with IIwll < 'Y. Therefore, since pT = p-l, it follows that
u'(O) where
B
=
=
pT [ x'(O) ] = pT [
y'(O)
W
]
Bw'
I ~:(x')
[~:(x') ... ~(x')
~i
(x') . . .
.
16. Extremal problems
344
Thus, the considered set of vectors u'(O) span NA. Consequently, if v and v T u'(O) = 0
E ~n
for all vectors u' (0) of the indicated form, then v E RAT. Therefore, (V' f)(a) E RAT, which is equivalent to the asserted conclusion. o
16.3. Examples Example 16.5. Let A E lR pxq and let
f(x) = (Ax, Ax) and g(x) = (x, x) - l. The problem is to find
maxf(x) subject to the constraint
g(x)
=
o.
Discussion. The first order of business is to verify the following formulas for the gradients, written as column vectors: (16.14)
(V' J)(x) = 2AT Ax
(16.15)
(V'g)(x)
= 2x.
Next, Theorem 16.4 guarantees that if a is an extreme point for this problem, then there exists a constant a such that (V' J)(a)
= a(V'g)(a).
But this in turn implies that 2AT Aa = 2aa, and hence, since a -:f 0, that a is an eigenvalue of AT A. The maximum value where Sl is the largest singular value of A. is obtained by choosing a =
sr,
Example 16.6. Let A be a real p x q matrix of rank r > 1 and let
f(x) = (Ax,Ax), go(x) = (x, x) -1 and gl(X) where The problem is to find
maxf(x) subject to the constraints
go{x) = 0 and gl(X) = O.
=
(x, Ul),
345
16.3. Examples
Discussion.
By Theorem 16.4, there exists a pair of real constants 0: and
(3 such that
(16.16)
+ (3(\7gl) (a)
(\7 f)(a) = a(\7go)(a)
at each extreme point a of the given problem. Therefore,
+ (3Ul'
2AT Aa = 20:a
(16.17)
where the constraints supply the supplementary information that (a, a) = 1
and
(a, Ul)
= O.
Since AT A is a real symmetric (and hence a Hermitian) matrix, it has a set of q orthonormal eigenvectors Ul, ... ,uq such that the corresponding eigenvalues are the squares of the singular values of A: AT AUj
where Sl 2: for C q ,
S2
2: ... 2:
= S;Uj,
2: O. Thus, as
Sq
= 1, ...
j
UI, ...
,q,
,uq is an orthonormal basis
q
a
= L CjUj,
where
Cj
= (a, Uj)
for
= 1, ...
j
,q.
j=l
This last formula exhibits once again the advantage of working with an orthonormal basis: it is easy to calculate the coefficients. In particular, the constraint gl(a) = 0 forces C! = 0 and hence we are left with q
a= LCjUj. j=2
Substituting this into formula (16.17), we obtain 2AT A
q
q
j=2
j=2
L CjUj = 20: L CjUj + (3Ul,
which reduces to 2
q
q
j=2
j=2
L CjS;Uj = 2a L CjUj + (3ul·
Therefore, we must have (3 since the constraint
= 0 and CjS;
= O:Cj for j q
go(a) = 0 ===>
L c; = 1, j=2
we see that q
(Aa, Aa)
=L j=2
c;sJ ~
= 2, ... ,q. Moreover,
16. Extremal problems
346
and hence that the maximum value of (Aa,Aa) subject to the two given constraints is obtained by choosing Q: = 8~, C2 = 1 and Cj = 0 for j = 2, ... ,q. This gives a geometric interpretation to 82. Analogous interpretations hold for the other singular values of A. Exercise 16.9. Let A be the (n + 1) x (n + 1) matrix with entries aij = 1/(i + j + 1) for i,j = 0, ... ,no Show that A ~ 0 and that the largest singular value 81 of A is subject to the bound (16.18) 81
< y'k + 1/2
-
1 L j=o(k+j+l)y'j+l/2 n
for some integer k,
0:::; k :::; n.
[HINT: Let Uj, j = 0, ... ,n denote the entries in an eigenvector corresponding to 81 and choose k so that Jk + 1/21u kl = max {Jj + 1/2Iujl}·] Remark 16.7. The inequality (16.18) combined with the sequence of inequalities
fa
1
n
(k + j
< jn+1/2
+ 1)Jj + 1/2
-1/2
dx (x
+ k + 1/2)Jx + 1/2
rJ (n+1/2
=
10
2dy
y2
+ k + 1/2 <
1f
Jk
+ 1/2
provides another proof that the norm of the Hilbert matrix considered in Exercises 8.38-8.40 is less than 1f. Exercise 16.10. Let A E ~pxp and B = A + AT. Show that 1 (16.19) (Ax, x) = 2(Bx, x) for every x E ~ p • Exercise 16.11. Let A E ~ 4x4, let Al > A2 > A3 > A4 denote the eigenvalues of the matrix B = A + AT and let Ub U2, U3, U4 denote a corresponding set of normalized eigenvectors; i.e., (Uj, Uj) = 1 and BUj = AjUj for j = 1, ... ,4. Find the maximum value of the function f(x) = (Ax, x) over all vectors x E ~ 4 that are subject to the two constraints (x, x) = 1 and (x, v) = 0, where v = (3/5)U1 + (4/5)U2. Exercise 16.12. In the setting of Exercise 16.11, find the minimum value of f(x) over all vectors x E ~ 4 that are subject to the same two constraints that are considered there. Example 16.8. The problem of maximizing f(X) = (det x)2 over the set of all matrices X E ~pxp that meet the constraint traceXX T = 1 fits naturally into the setting of Lagrange multipliers upon setting g(X) = trace X XT - 1.
347
16.3. Examples
Discussion. This is a problem with p2 variables, Xij, i,j = 1,000 ,po It turns out to be convenient to organize the gradient of f(X) as a p x p matrix with ij entry equal to let h(X)
l£j 0 To warm up to the problem at hand,
= det X and note, for example, that the formula
h(X) = xnMn - X12M12 + X13M13 for the determinant in terms of the minors
+ ... + (-l)l+PXlpMlp Mij
clearly implies that
oh
--=-M12 OX12
and that, in general, (16.20)
Since the maximum value of f(X) under the given constraint is nonzero, there is no loss in generality in assuming that X is invertible. Then, in view of Theorem 5.9,
and hence (V' f)(X)
= 2(det X) (\7h)(X) = 2J(X)(XT)-1
0
Thus, as (V'f)(X) = a(V'g)(X) at a critical point Xo of the constrained problem and (V'g)(X) = 2X, it follows that
2f(Xo)(XJ,)-1 = a2Xo . Therefore,
J(Xo)Ip = aXoxl , which, upon computing the trace of both sides, implies that pf(Xo) = a and hence that
f(Xo)Ip = pf(Xo)XoXl Consequently, XoXl
0
= p-llp and
(det XO)2 = det Xo det xl = p-p
0
It is readily confirmed that p-p is indeed the maximum value of the function under consideration by using singular value decompositions; see Exercise ... Exercise 16.13. Find the maximum value of the function f(X) = In(det X)2 over all matrices X E ~pxp with traceXX T = 1. Exercise 16.14. Find the maximum value ofthe function f(X) = trace X XT over all matrices X E ~pxp with det X = 1.
16. Extremal problems
348
Exercise 16.15. Let C E lR nxn , Q = {X E lR nxn
f(X)
= (X, C) -lndet X for
:
det X > O} and let
X E Q,
where (X, C) = trace {CTX}. Show that if (Vf)(X) is written as an n x n matrix with ij entry equal to af jaXij, then (V f)(X) = C - (XT)-l. Exercise 16.16. Find max {trace A : A E lR nxn
and
trace (AT A)
= I}.
Exercise 16.17. Let B E lR~xn, Q = {A E lR~xn : aij = bij for li-jl :::; m} and let f(X) = lndet X. Use the method of Lagranges multipliers to show that if A E Q and I(A) ~ I(A) for every matrix A E Q, then ef(A)-lej = 0 for Ii - jl > m. Exercise 16.18. Let 01 < ... < Ok and let p(x) = CO + CIX + ... + cnxn be a polynomial of degree n ~ k with coefficients Co, . .. , Cn E R Show that if p(Oj) = {3j for j = 1, ... ,k, then
t.
q 2: bT (UT U)-lb,
where
U
~ [;1
;,]
wd
b
~ [1J '
Ok
O~
and find a polynomial with coefficients that achieve the exhibited minimum. Exercise 16.19. Let ei denote the i'th column of In for i = 1, ... , nand let n
Q = {x E lR n : efx > 0 for i = 1, ... ,n and
L efx = 1} . i=l
Show that if a E Q, b E Q and u E lR n are vectors with components al,'" ,an, b1 ••• ,bn and UI ... ,Un, respectively, then
m~ {(u,a) -In (teUibi )} = tailn ab~~ .
ueR
i=l
i=l
[HINT: First rewrite the function that is to be maximized as :L~l ai In ~, where Ci = eUibi!{:L:=1 eUsb s } and note that the vector c with components Ci, i = 1, ... ,n, belongs to Q. Therefore, the quantity of interest can be identified with n
max Lai ln i=l
X·
b~ ~
over x E lR n with Xi> 0 and subject to g(x) =
Xl
+ ... + Xn -
1 = O.
Exercise 16.20. Use Lagrange's method to find the point on the line of intersection of the two planes alXI + a2x2 + a3X3 + ao = 0, blXI + b2X2 + b3X3 + bo = 0 which is nearest to the origin. You may assume that the two
16.5. The conjugate gradient method
349
planes really intersect, but you should explain where this enters into the calculation. Exercise 16.21. Use Lagrange's method to find the shortest distance from the point (0, b) on the y axis to the parabola x 2 - 4y = 0. Exercise 16.22. Use Lagrange's method to find the maximum value of (Ax, x) subject to the conditions (x, x) = 1 and (Ul' x) = 0, where Ul is a nonzero vector in N(S~In-AT A)' sl is the largest singular value of A and A
= AT E
]Rnxn.
16.4. Krylov subspaces The k'th Krylov subspace of]R n generated by a nonzero vector u E ]R nand a matrix A E ]R nxn is defined by the formula 1tk
= span{ u, Au, ... ,Ak- l u} for
Clearly, dim 1tl = 1 and dim 1tk
~
k
= 1,2, ....
k for k = 2,3, ....
Lemma 16.9. If1tk+l = 1tk for some positive integer k, then 1tj = 1tk for every integer j 2:: k. Proof. If1tk+l = 1tk for some positive integer k, then Aku = Ck_lAk-1u+ .. ,+cou for some choice of coefficients co, ... ,Ck-l E R Therefore, Ak+lu = Ck_lAk + ... + coAu, which implies that 1tk+2 = 1tk+l and hence that 1tk+2 = 1tk· The same argument implies that 1tk+3 = 1tk+2 = 1tk+l and so on down the line, to complete the proof. D Exercise 16.23. Let A E ]R nxn, let u ERn and let k 2:: 1 be a positive integer. Show that if A >- 0, then the matrix
[
(Au, u) (Au, Au)
(A 2 u, u) (A 2u, Au)
(Au,lk-l u )
(Aku, u) (Aku, Au) (Aku,
1
~k-lU)
is invertible if and only if the vectors u, Au, ... ,Ak-lu are linearly independent in R n .
16.5. The conjugate gradient method The method of conjugate gradients is an iterative approach to solving equations of the form Ax = b for matrices A E R~xn when b E ]Rn. It is presented in this chapter because the justification of this method is based on solving an appropriately chosen sequence of minimization problems.
16. Extremal problems
350
Let A E IR nxn , bE IR n and 1
cp(x) = 2" (Ax, x} - (b,x)
(16.21)
for every x E IRn. Then the gradient (Vcp)(x), written as a column vector, is readily seen to be equal to
(Vcp)(x) =
[~]1 a:
= 2" (A + AT)x - b,
*£
and the Hessian H
Therefore, upon invoking the general formula (16.6), one can readily check that 1 cp(y) - cp{x) = ((Vcp)(x),y - x) + 4((A + AT)(y - x), (y - x)). Thus, if the given matrix A E lR~xn, which will be the case for the rest of this section, then the function cp(x) attains its minimum value at exactly one point a E IR n, namely when Aa = b. This is the one and only point in R n at which the gradient Vcp(x) = Ax - b vanishes. Consequently, the unique solution x of the equation Ax = b is the unique point at which cp(x) attains its minimum value. The conjugate gradient method exploits this fact in order to solve for the solution of the equation by solving for this minimum recursively, i.e., via a sequence of minimization problems. Lemma 16.10. Let A E lR~xn and let Q be a closed nonempty convex subset ofR n. Then there exists exactly one point q E Q at which the function cp{x) defined by formula (16.21) attains its minimum value, i.e., at which
cp(q) Proof.
Let Al
~
...
~
~
cp(x)
for every x E Q.
An denote the eigenvalues of A. Then, Al > 0 and 1
cp(x)
2" (Ax, x) - (b, x) 1
2
> 2"Alllxli2 - IIbll211xll2 =
1
IlxIl2(2"AlllxIl2 -li b Il2),
which clearly tends to 00 as IIxl12 tends to 00. Thus, if y E Q, there exists a number R > 0 such that cp(x) > cp(y) if IIxli2 2: R. Consequently, cp(x) will achieve its lowest values in the set Q n {x: IIxl12 ~ R}. Thus, as this set is
16.5. The conjugate gradient method
351
closed and bounded and ~(x) is a continuous function of x, ~(x) will attain its minimum value on this set. The next step is to verify that ~(x) attains its minimum at exactly one point in the set Q. The proof is based on the fact that ~(x) is strictly convex; i.e., ~(tu
(16.22)
+ (1 -
t)v) < t~(u)
+ (1 -
t)~(v)
for every pair of distinct vectors u and v in R n and every number t E (0,1). Granting this statement, which is left to the reader as an exercise, one can readily see that if ~(u) = ~(v) = , for two distinct vectors u and v in Q and if t E (0,1), then tu + (1- t)v E Q and
, <
= v.
D
Exercise 16.24. Verify that the function
+ (1 -
t)v) =
t~(u)
+ (1 -
t)~(v)
1
- 2t(1 - t)(A(u - v), u - v)
is valid for every pair of vectors u and v in R n and every number t E R 1 Lemma 16.11. Let A E R~xn, bERn, and for any vector Xo E IR n such that u = Axo - b "" 0, Xo E IR n, let 'Hj, j = 1,2, ... , denote the Krylov subspaces based on the matrix A and the vector u. Let f denotes the smallest positive integer such that 'HHI = 'Hi. Then the vector a = A- 1 b meets the following constraints:
(1) a E Xo + 'Hi. (2) a ¢ Xo + 'Hj for j = 1, ... ,f - 1.
Proof.
By assumption, Alu E 'Hi, and hence Alu = Cl_IAl-iu + ... + COU
for some choice of coefficients Cl-I. ... ,co E R Moreover, CO 1= 0, because otherwise Alu = Cl_IAl-iu + ... + cIAu, and hence, as A is invertible, it follows that Al-Iu = Cl_2Al-Iu+ . +Cl U. But this implies that 'Hi = 'Hi-I. which contradicts the definition of f. Thus, the polynomial p(x)
= xl - {Cl_lXl - 1 + ... + CO}
352
16. Extremal problems
meets the conditions p(O)
i= 0 and p(A)u = O.
Therefore,
r(x) = p(x) - p(O) x
is a polynomial of degree I - 1 and
p(O)u + Ar(A)u = p(A)u = O. But this is the same as to say that
p(O)A(Xo - a) = -Ar(A)u and hence, as A is invertible, that
a-
X()
= p(O)-lr(A)u;
that is to say, a E Xo + 1il, as claimed in (1).
+ 1ij for some integer j . 1 a - Xo = cj_1AJ- U + ... + cou for some set of coefficients co, ... ,Cj-l E 1R and hence that To verify (2), suppose that a E Xo
~
1; i.e.,
u = A(Xo - a) = -cj-1Aju - ... - coAu. But this implies that the vectors u, ... ,Aju are linearly dependent. Therefore, j ~ I. 0 Lemma 16.10 guarantees that for each integer j, j = 1, ... ,I, there exists a unique point Xj in Xo + 1ij at which 'P(x) attains its minimum value, i.e.,
ep(Xj) Lemma 16.12. If k
(1) (2) (3) (4) (5) (6)
Xk - X()
~
= min{'P(x): x E
Xo +1ij}.
I, then dim1ik = k and:
E 1ik'
(AXk - b, y) = 0 for every vector y E 1ik' AXk-b=U+A(Xk-XO). AXk - bE 1ik+l' (A(Xk+1 - Xk), y} = 0 for every vector y E 1ik' The vectors
(16.24)
rj
= AXj -
b,
j
= 0, ...
,k - 1,
form an orthogonal basis for the space 1ik with respect to the standard inner product in lR n . Proof.
Item (1) is by definition. Item (2) follows from the formula ( r7
()
vep x
k,
}
l'
v = 1m
.0-+0
'P(Xk + cv) - 'P(Xk) , C
16.5. The conjugate gradient method
353
which is valid for every vector v E 11.k, and the definition of the point Xk. The remaining assertions are easy if tackled in the presented order and are left to the reader. 0 Exercise 16.25. Verify assertions (3)-(6) of Lemma 16.12. Exercise 16.26. Show that properties (1) and (2) of Lemma 16.12 serve to uniquely determine Xk. Let Po = u and let Pj' j = 0, ... ,k - 1, be a basis for 11.k that is orthogonal with respect to the inner product (Ax, y), i.e., (Api,Pj) =
°
for
,e ·-1
i,j = 0, ...
and
i =1= j,
and suppose further that these vectors are normalized so that rj - Pj E 11.j.
Lemma 16.13. If k Proof.
If k ~
~
e- 1, then Xk+l -
e- 1, then Xk+1 -
Xk
= CkPk'
Xk E 11.k+1' Therefore, k
xk+1 - Xk = L
'YjPj
j=O
for some choice of real constants 'Yo, ... ,'Yk, since Po, ... ,Pk is a basis for 11.k+ 1. Moreover, (A(Xk+1 - Xk), Pi) = 'Yi(Api, Pi) for i = 0, ... ,k. Therefore, by (5) of Lemma 16.12, 'Yi = hence the result follows with Ck = 'Yk. Exercise 16.27. Show that if k (16.25)
Xk+1 - Xk
Lemma 16.14. If k Proof.
~
= CkPk
e-
~
°
for i < k, and 0
e- 1, then .h
WIt
Ck
1, then Pk - rk
(rk' Pk) ) Pk,Pk
= - (A
= dk-lPk-l'
By construction, Pk - rk E 11.k' Therefore, k-1 Pk - rk = LOiPi i=O
for some choice of coefficients Oi E R However, if i
~
k - 2, then
-(Ark,Pi)
-(rk,Api) 0, since rk is orthogonal to 11.k-l with respect to the standard inner product. Therefore, Pk - rk = Ok-1Pk-l, which is of the claimed form. 0
16. Extremal problems
354
Exercise 16.28. Show that if k ~ f - 1, then (16.26)
Pk - rk
= dk-1Pk-l
.
wIth
dk-l
(Ark, Pk-l)
= - (A
Pk-l, Pk-l
).
Exercise 16.29. Show directly that cp{Xj) ~ cp{xj-d for j = 1, ... ,C. Moral: The preceding analysis provides a recursive scheme for calculating a via formulas (16.24)-(16.26): Choose any vector Xo E lR n and set ro = Po = Axo - b. Then run the recursions for C steps. The desired solution a = Xe. Thus, it is not necessary to solve the extremal problems that were considered above. Exercise 16.30. Use formula (16.24) and the recursions (16.25) and (16.26) to solve the equation Ax = b when A
[~ ~]
=
and b
=-
[~].
16.6. Dual extremal problems Recall that if X is a finite dimensional normed linear space over IF, then every linear functional I on X is bounded. The set of linear functionals on such a finite dimensional space X is denoted by X'. Theorem 16.15. Let X be a finite dimensional normed linear space over C, let U be a subspace 01 X and let UO = {j E X' : I{u) = 0 lor every u E U} Then: (1) For each vector x EX, min Ilx - ull = UEU
max I/(x)l.
I
EUo
11111 ~ 1 (2) For each bounded linear functional
I
E
X',
min III - gil =
max I/(u)l·
g E UO
u EU
Ilull ~ 1 Proof. To establish (1), let x E X and suppose further that x ¢ U, because otherwise (1) is self-evident, since both sides of the asserted equality are equal to zero. Then for any f E UO with IIfll ~ 1 and any u E U, If(x)1 =
If(x - u)1 < 11/11 Ilx - ull ~
Therefore, (16.27)
If{x)1 ~ d,
IIx-ull·
355
16.6. Dual extremal problems
where
d = inf{lIx - ull : u
U} = min{llx - ull : u E U}, since X is finite dimensional. Now, since x f/. U, we may define the linear functional fo on the subspace E
Ul = {ax + f3u: a, f3 E C
and
u E U}
by the formula
fo(ax + f3u) = ad. Then
Ifo(ax + f3u)1 =
laid
~
for every choice of y E U. If a
!
lal IIx - yll = Ilax - aY11 0, the particular choice
f3 y=--u a yields the inequality
Ifo(ax + f3u)1 ~ II (ax + f3u)lI, which is valid for all choices of a, f3 E C and u E U. Theorem 7.20 (the junior version of the Hahn-Banach Theorem) guarantees the existence of a bounded linear functional h on the full space X such that
h (ax + f3u)
=
fo(ax + f3u)
for all choices of a, f3 E C and u E U and Ilhll =sup{lfo(ax+f3u)l: II ax + f3ull ~ I}. Thus, h E uo, Ilhll ~ 1 and h(x) = fo(x) = d. Therefore, the upper bound on the left-hand side of equation (16.27) is attained by h(x). Next, to verify (2), fix f E X' and suppose f f/. uo; otherwise both sides of (2) are equal to zero. Then If(u)1 = If(u) - g(u)1 ~ Ilf - gliliull for every choice of g E UO and u E U. Therefore, (16.28)
sup {If(u)1 : u E U
and
Ilull ~ I} ~ inf {Ilf - gil: g E UO}.
Now let fo = flu, the restriction of f to the subspace U. By Theorem 7.20, there exists a bounded linear functional h E X' such that
h(u)
=
fo(u)
=
f(u)
for every vector u E U and sup{lh(x)1 : x E X and IIxll ~ I} = sup{lfo(u)I : u E U aIid lIull ~ I}, i.e.,
Ilhll = sup{lf(u)1 : U E U and
lIull ~ I}.
Chapter 17
Matrix valued holomorphic functions
like most normally constituted writers Martin had no use for any candid opinion that was not wholly favorable.
Patrick O'Brian [54], p. 162 The main objective of this chapter is to introduce matrix valued contour integrals of the form
1r cp(A) (Aln - A)-IdA, where r is a simple closed smooth curve that does not pass through any of the eigenvalues of A and cp(A) is analytic in an open set that contains the curve r and its interior. Integral formulas of this type are extremely useful, both in the present context of matrices as well as for more general classes of operators. In this chapter they will be used to establish the continuous dependence of the eigenvalues of a matrix on the entries in the matrix and then, subsequently, to give an elegant proof of the formula for the spectral radius and some formulas for the fractional powers of matrices. We begin, however, with a brief introduction to the theory of scalar analytic functions of one complex variable. A number of useful supplementary facts are surveyed in Appendix 2.
17.1. Differentiation A complex valued function f (A) of the complex variable A that is defined in an open set n of the complex plane C is said to be holomorphic (or
-
357
17. Matrix valued holomorphic functions
358 analytic) in
n if the limit
+ e) - f()..) e-+o exists for every point ).. E n. This is a very strong constraint because the variable e in this difference ratio is complex and the definition requires the limit to be the same regardless of howe tends to zero. Thus, for example, if f()..) is analytic in n, then the directional derivative !'()..) = lim f()..
(17.1)
e
ifJ (DfJf)()..) = lim f().. + re ) - f()..) r~O r
(17.2)
exists and
(17.3) for every angle 9. In particular, because of (17.1),
(Dof)()..) = e- i -rr/2(D-rr/2f)()..);
(17.4)
i.e., in more conventional notation
(17.5) This last formula can also be obtained by more familiar manipulations by writing).. = x + iy, e = a + i{3, f()..) = g(x, y) + ih(x, y), where x,y,a,{3,g(x,y) and h(x,y) are all real, and noting that
(17.6) f()..
+ e) - I()..)
e
g(x + a, y + {3) - g(x, y) = a + i{3
Then, since the limit when {3 = 0 and a a = 0 and {3 ~ 0, it follows that (17.7)
og ox (x,y)
~
ih(x + a, y + a
+ {3) + i{3
hex, y) .
0 must agree with the limit when
1 {Og .oh } + ~.oh 8x (x, y) = i 8y (x,y) + 1, 8y (x, y)
,
which is just another way of writing formula (17.6). Clearly formula (17.7) (and hence also formula (17.6» is equivalent to the pair of equations og 8h 8h 8g (17.8) ox(x,y) = 8y(x,y) and 8x(x,y)=-8y(x,y). These are the Cauchy-Riemann equations. But it's easiest to remember them in the form (17.5). There is a converse statement: If (17.8) holds in some open set n and if the indicated partial derivatives are continuous in n, then f()..) = g(x, y) + ih(x, y) is holomorphic in n. For sharper statements, see e.g. [60]. Theorem 17.1. Let n be an open nonempty subset of C and let ~n denote the set of functions that are holomorphic (i.e., analytic) in n. Then:
359
17.1. Differentiation
(1) (2) (3) (4) (5) (6)
f E
~n ==}
f()..) is continuous in O.
f E ~n, 9 E ~n ==} oJ + (3g E ~n for every choice of 0:, (3 E C. f E ~n, 9 E ~n ==} fg E ~n and (fg)'()..) = f'()..)g()..) + f()..)g'()..). f E ~n, f()..) -# 0 for any point).. EO==} (1/ f) E ~n. 9E f E
~n,
g(O) CO l , 0 1 open, f E ~nl ==} fog E ~n, 0: EO==} Raf E ~n, where f(~ - f(o:)
(Raf)()..) = {
(17.9)
-
1'(0:) (7)
f
E ~n ==}
Proof.
Let
f'
E ~n.
0: E O.
Then
If(o: +~) - f(o:)1 <
0: + ~ E 0
if )..
0:
if
~n·
-# 0:
)..=0:
if I~I is small enough and
I f(o: + ~~ -
f(o:)
II~I
I f(o: +~~ -
f(o:) -
f'(o:)II~1 + 1!'(0:)11~1,
which clearly tends to zero as I~I ----t O. Therefore, (1) holds. The next four items in this list are easy to verify directly from the definition. Items (6) and (7) require more extensive treatment. They will be discussed after we introduce contour integration. 0 We turn now to some examples. Example 17.2. The polynomialp()..) = ao+al)..+·· ·+an)..n is holomorphic in the whole complex plane C.
Discussion. In view of Items 2 and 3 of Theorem 17.1, it suffices to verify this for polynomials PI ()..) = ao + a1).. of degree one. But this is easy: Pl()..+~)-p1()..)
~
=
ao+al()..+~)-ao-a1)..
~
=al'
Therefore, the powers )..k
= )...).. ... ).. , k = 1,2, ...
,n,
are holomorphic in C as is p()..). Exercise 17.1. Verify directly that lim ().. + ~)n - )..n = n)..n-I
~-+o
~
Example 17.3. f()..) =
OX
for every positive integer n
~ 0: is analytic in C \ {o:}.
~ 2.
17. Matrix valued holomorphic functions
360 Discussion.
f(>..
Clearly
+0 -
~ {>..+~-a >..~a}
f(>..) =
~
>"-a-(>"+~-a)
-
~(>..+e-a)(>"-a)
-1
-1
~----~~--~~~--~
(>..+e-a)(>..-a)
as
~ ~
(>..-a)2'
o.
Discussion. This is immediate from Item 3 of Theorem 17.1 and the preceding two examples.
Example 17.5. f(>..) = 1'(>") = af(>..}·
eCl
is analytic in C for every fixed a E C and
Discussion. In view of Item 5 of Theorem 17.1, it suffices to verify that eA is analytic. To this end, write >.. = J-l + ill so that
f(>..) = eA =
eILeiv
= elL (cos 1I + i sin 1I)
and, consequently,
~~ (>..) =
f(>..) and
~~ (>..) =
elL ( - sinll
+ i cos 1I) =
if(>..) .
Thus, as the Cauchy-Riemann equations are satisfied, f(>..) is analytic in C. A perhaps more satisfying proof can be based on the exhibited formula for f(>..) and Taylor's formula with remainder applied to the real valued functions eIL,cosll,sinll, to check the existence of the limit in formula (17.1). Yet another approach is to first write
e
and then to verify the term inside the curly brackets tends to a as tends to 0, using the power series expansion for the exponential. It all depends upon what you are willing to assume to begin with.
17.2. Contour integration
361
17.2. Contour integration A directed curve (or contour) r in the complex plane C is the set of points {-y(t) : a :::; t :::; b} traced out by a complex valued function, E C([a,b]) as t runs from a to b. The curve is said to be closed if ,(a) = ,(b), it is said to be smooth if, E C1([a, b]), it is said to be simple if, is one to one on the open interval a < t < b; i.e., if a < h, t2 < band ,(tt) = ,(t2), then tl = t2' The simplest contours are line segments and arcs of circles. Thus, for example, if r is the horizontal line segment directed from O!l + if3 to 0!2 + if3 and O!l < 0!2, then we may choose
,(t) = t + if3,
O!l :::;
t :::;
0!2
or ,(t)
= O!l + t(0!2 -
al)
+ if3,
0:::; t :::; 1.
The second parametrization is valid even if O!l > a2. If r is the vertical line segment directed from a + if31 to O! + if32 and f31 < f32, then we may choose ,(t) = a + it, f31 :::; t :::; f32. If r is a circular arc of radius R directed from Reia. to Re if3 and a < f3, then we may choose ,(t) = Re it , a :::; t :::; f3. A curve r is said to be piecewise smooth if it is a finite union of smooth curves, such as a polygon. The contour integral f(>")d>" of a continuous complex valued function that is defined on a smooth curve r that is parametrized by, E C1([a, b]) is defined by the formula
fr
(17.10)
ir
f(>..)d>.. =
lb
f(,(t)h'(t)dt.
The numerical value of the integral depends upon the curve r, but not upon the particular choice of the (one to one) function ,(t) that is used to describe the curve, as the following exercise should help to clarify. Exercise 17.2. Use the rules of contour integration to calculate the integral (17.10) when f(>..) = >.. and (a) ,(t) = t for 1 :::; t :::; 2; (b) ,(t) = t 2 for 1 :::; t :::; )2; (c) ,(t) = et for 0 :::; t :::; In 2 and (d) ,(t) = 1 + sin t for
0:::;t:::;7r/2. Exercise 17.3. Use the rules of contour integration to calculate the integral (17.10) when f(>..) = >.. and r is the rectangle directed counterclockwise with vertices -a - ib, a - ib, a + ib, -a + ib, where a > 0 and b > O. Exercise 17.4. Repeat the preceding exercise for (positive, zero or negative) and the same curve r.
f (>..)
=
>.. n, n an integer
Theorem 17.6. Let f(>..) be analytic in some open nonempty set O. Let r be a simple smooth closed curve in 0 such that all the points enclosed by r also belong to O. Then
irf(>")d>" = O.
362
17. Matrix valued holomorphic functions
Discussion. Consider first the special case when r is a rectangle with vertices al +ibl,a2+ibl,a2+ib2,al +ib2, withal < a2, bl < b2 and suppose that the curve is directed counterclockwise. Then the integral over the two horizontal segments of r is equal to
l a2 l b2
f(x
+ ibl)dx -
la2 f(x + i~)dx = l a2 {l b2 --a af (x + iy)dy } dx, Y at
al
at
bl
whereas the integral over the vertical segments of r is equal to
f(a2
+ iy)idy -
lb<J f(al + iy)idy = lb2i
bl
bl
bt
{l a2 -aaf al
(x + iy)dx } dy.
X
Therefore, assuming that it is legitimate to interchange the order of integration on the right-hand side of the first of these two formulas, we see that
{~la2 {- :f (x + iy) + i: f (x + iY )} dxdy Jbl al y X
{f(A)dA =
Jr
0,
=
by the Cauchy-Riemann equations (17.5). The conclusion for general r is obtained by approximating it as in Figure 1. The point is that the sum of the contour integrals over every little box is equal to zero, since the integral around the edges of each box is equal to zero. However, in the sum, each interior edge is integrated over twice, once in each direction. Therefore, the integrals over the interior edges cancel out and you are left with an integral over the outside edges, which is approximately equal to the integral over the curve r and in fact tends to the integral over this curve as the boxes shrink to zero.
---....-
,--
r--',/
I \.
--
...... i'-,
"'" \
......,
\
j
!/ //
".......
I
-'--.
.......
"
)
"~
--
Figure 1
--
17.2. Contour integration
363
Figure 2 Theorem 17.7. Let f(>.) be holomorphic in some open nonempty set n. Let r be a simple closed curve in n that is directed counterclockwise such that all the points enclosed by r also belong to n. Then
r
_1_ f(>.) d>' = {f(o.) if a is inside r 27fi Jr >. - a 0 if a is outside r
.
Proof. Let a be inside r and introduce a new curve f1 + r~ + ra + r~, as in Figure 2, such that a is outside this new curve, ra is "most" of a circle of radius r centered at the point a and
r
2-
r
f(>.) d>' = r f(>.) d>' 27fi Jr (>. - a) E~ 2'1ri Jr'1 (>. - a) .
_1
Then, by the construction,
r
Jq
f(>.) d>' = _
(>. - a)
r
Jq
f(>.) d>' _
(>. - a)
r
f f(>.) d>' _ f(>.) d>.. Jr~ (>. - a) Jr~ (>. - a)
Now, as € tends to zero, the first and third integrals on the right cancel each other out and the second integral
r
f(>.) d>' Jr3 (>. - a) as
€
-t _
r J o
7r
f(o. +,reiO ) ireiod(} (a + re~o - a)
tends to O. The final formula follows from the fact that 27r f(o. + reiO ) , -;--'-~----;;,...---'--:-iretO d(} - t 27fi f (a) o (a + re~o - a)
as r tends to O.
1
17. Matrix valued holomorphic functions
364
It remains to consider the case that the point a is outside the statement is immediate from Theorem 17.6.
Theorem 17.8. Let Then f' E ~n·
n
r.
But then 0
be a nonempty open subset of C and let f E
~n.
Proof. Let wEn and let r = w + rei/} , 0 ~ () < 271", where r is chosen small enough so that ,X - wEn when I,X - wi ~ r. Then
r
f(w) = ~ f(,X) d,X 271"1,lr ,X - w and
f(w
as
e
-+
+ e) -
e
f{w)
~Jf('x) {
=
e
271"i
1 _ _ 1_}d'x 'x-w-e 'x-w
r
f('x) d,X 271"ilr ('x-w-e)('x-w)
=
_1
-+
_1
r
f(,X) d,X 271"i lr (,X - w)2
O. Therefore, 1
I
f (w) and hence
f'(W +e) - f'(w) = _1 271"i
e
r
f(,X)
= 271"i lr (,X _ w)2d'x
1e
f{'x) {
which tends to the limit
1
_
(,X - w - e)2
'Y
1
} d,X,
(,X - w)2
r
~ f(,X) d,X 271"i lr (,X - w)3 as
etends to O.
o
Corollary 17.9. In the setting of Theorem 17.8,
(17.11) for k
f{k)(w) k!
=
r
f(,X) d,X 271"i lr (,X - w)k+1
_1
= 0,1, ....
Exercise 17.5. Verify Corollary 17.9. Exercise 17.6. Verify assertion (6) in Theorem 17.l. Theorem 17.10. Let n be a nonempty open subset of C and let f(,X) g(,X)jh('x), where 9 E ~n and
h('x)
= (,X -
al)k 1
•••
(,X - an)kn
=
17.3. Evaluating integrals by contour integration
365
is a polynomial with n distinct roots a1, . .. ,an' Let r be a simple closed smooth curve directed counterclockwise such that r and all the points inside r belong to n. Suppose further that a1, . .. ,ae are inside rand ae+ 1, . .. ,an lie outside r. Then
where
{(,X - aj)kj f(,X)}(k r 1) Res(f, aj) = >'~~j (k j _ I)! ' .
and the superscript kj -1 in the formula indicates the order of differentiation. Discussion. The number Res(f, aj) is called the residue of f at the point aj. The basic strategy is much the same as the proof of Theorem 17.10 except that now f little discs have to be extracted, one for each of the distinct zeros of h('x) inside the curve r. This leads to the formula e 1 . { f('x)d'x, ~ { f('x)d'x = -2
L.
2nJr
3=
1
nJr.J
where rj is a small circle ofradius rj centered at aj that is directed counterclockwise, and it is assumed that rj < (1/2) min{lai - akl : i, k = 1, ... , f} and that {A E
fj('x) = (,X - aj)kj f('x). Then, since fj('x) is holomorphic in an open set that contains interior points, formula (17.11) yields the evaluation (krl)
~
rj .
{ f('x)d'x = ~ ( h('x) d,X = fj (a 3 ) 2n Jr j 271'~ Jrj (,X - aj)kj (kj - I)!
for j
= 1, ...
and all its
,
,f, which coincides with the advertised formula.
17.3. Evaluating integrals by contour integration Having come so far, it is worth expending a little extra energy to review some evaluations of integrals that emerge as a very neat application of contour integration and also serve as a good introduction to some of the basic formulas of Fourier analysis, which will be the subject of the next section. 00 1 Example 17.11. 2""1dx = 71'.
1
-00
Discussion.
X
+
Let 1 f(,X) = ,X2 + 1 '
g(,X) = (,X - i)f('x) =
~ +z 1\
366
17. Matrix valued holomorphic functions
o
-R
R
Figure 3 and let r R denote the semicircle of radius R in the upper half plane, including the base (-R, R), directed counterclockwise as depicted in Figure 3. Then
l
R
1 -2--1 dx
-Rx +
= IR + HR,
where
IR =
f(>')d>' = (
g(>').d>' z = 27rig( i) = 7r if R > 1 , {
JrR
JrR >. -
since 9 is holomorphic in C \ {-i}, and
IIR =
-1
f(>.)d>.,
CR
the integral over the circular arc C R = Rei6 , 0 :::; () :::; 7r, tends to zero as R i 00, since
1
00
Example 11.12. Discussion.
eitx
-2--dx -00 x + 1
= 7re- ltl if t E JR.
Let eit>.
eit>.
f(>.) = >.2 + 1 and g(>.) = (>. - i)f(>.) = >. + i and let r R denote the contour depicted in Figure 3. Then, since 9 is holomorphic in C\ {-i}, the strategy introduced in Example 17.11 yields the
17.3. Evaluating integrals by contour integration
367
o
-R
R
Figure 4 evaluations
r f(>')d>' = JrRr f(>').d>' = 27rig(i) -2
JrR
if R> 1
and
Thus, if t > 0, then
IfaR f(>')d>.1 ~ fo7r R2~ 1 dO, which tends to zero as R i 00. If t < 0, then this bound is no longer valid; however, the given integral may be evaluated by completing the line segment [-R R] with a semicircle in the lower half plane.
Exercise 17.7. Show that
1
00
-00
eitx
~dx = 7ret x +
by integrating along the curve
Example 17.13.
1
00
x
-00
Discussion.
rR shown in Figure 4.
1- costx 2
if t < 0
. dx = 7r!t!lf t E R
Let
f(>.)
=
1 - cos t>. >.2
=
2 - e it>. - e-it>. 2>.2 .
368
17. Matrix valued holomorphic functions
0
...
•
..
-r Vr
-R
• R
Figure 5
Then
f is holomorphic in C and, following the strategy of Example 1,
j-RR f(x)dx = IR + IIR, where
and
r f(ReiB)iReiBdO.
= - ( f()")d)" = lOR 10
IIR
However, this does not lead to any useful conclusions because IIR does not tend to zero as R t 00 (due to the presence of both eit >. and e- it >' inside the integral). This is in fact good news because IR = O. It is tempting to split f ()..) into the two pieces as
f()..) =
h ()..) + h()..)
with 1 - eit >.
h()..) =
2)..2
1 - e- it >.
h()..) =
and
2)..2
and then, if sa.y t > 0, to integrate h ()..) around a contour in the upper half plane and h()..) around a contour in the lower half plane. However, this does not work because the integrals
i:
h(x)dx and
1:
h(x)dx
are not well defined (because of the presence of a pole at zero). This new difficulty is resolved by first noting that
j
R f(x)dx
-R
= (
lLR
f()")d)",
where LR is the directed path in Figure 5. Since this path detours the troublesome point ).. = 0,
j
R f(x)dx = -R
(
lLR
{h()..)
+ h()..)} d)" =
(
lLR
h()")d)" +
r h()")d)",
lLR
369
17.3. Evaluating integrals by contour integration
-R+ib
R+ib
R
-R Figure 6
and hence if t > 0, R> 0 and CR depicts the circular arc in Figure 3,
r
~
h (>.)d>. =
r
~
r
h (>.)d>. +
~
h (>.)d>. -
r
~
h (>.)d>.
n't - fo7r h(Rei9 )iRei9 dO ----+ rrt
as R j
00 ;
whereas, if DR denotes the circular arc depicted in Figure 4,
27r r 12(>') + r 12(>')d>' -1 12 (Re JLR JDR 7r
1
i9 )iRe ifJ dO
27r
12 (Re i9 )iRei9 dO
0+ 7r ----+0
as Rjoo.
This completes the evaluation if t > O. The result for t < 0 may be obtained by exploiting the fact that 1 - cos tx is an even function of t. Exercise 17.8. Verify the evaluation of the integral given in the preceding example by exploiting the fact that
1
00
1:
-00
Example 17.14.
1- cos tXd = l' 2 x 1m x c!O
e- a (x-ib)2 dx
=
1
00
1:
-00
1 - cos tXd 2 2 X. x +c
e- ax2 dx if a > 0 and b E R
Discussion. Let r R denote the counterclockwise rectangular path indicated in Figure 6 and let f(>.) = e- aA2 • Then, since f(>.) is holomorphic in the whole complex plane C,
o=
r f(>')d>' JrR
=
l l
R
f(x)dx
-R R
-R
f(x
+
r f(R + iy)idy Jo b
+ ib)dx -
r f( -R + iy)idy. Jo b
17. Matrix valued holomorphic functions
370
Next, invoking the bound
Iexp {-a(R + ib)2}1 = Iexp {-a(R2 + 2iRb -
b2)}1 = exp {-a(R 2 - b2)},
one can readily see that the integrals over the vertical segments of the rectangle tend to zero as R i 00 and hence that
I:
e- ax2 dx =
I:
f(x)dx =
I:
f(x
+ ib)dx
I:
e- a(x+ib)2 dx,
for every choice of b > O. Since the same argument works for b < 0, the verification of the asserted formula is complete.
17.4. A short detour on Fourier analysis Let (17.12) denote the Fourier transform of f (whenever the integral is meaningful) and let 1 for a < x < b and t = b - a. fab{X) = { 0 elsewhereThen
and
I: I
lab (Jl) 12 dJl
= =
I: ei~b ~ ei~a 1
1 21
00
1ei~~ - 112 dJl
'tJL
-00
=
12 dJL
00
c~s Jlt dJl JL (by the formula in Example 17.13) 1-
-00
I:
21ft =
21f
Ifab{X)1 2 dx.
Exercise 17.9. Show that if a < b :::; c < d, then
I:
[HINT: Let
lcd(Jl)lab(JL)dJl =
o.
17.4. A short detour on Fourier analysis
371
ei/L(d-b) _ ei/L(c-b) _ ei/L(d-a)
+ ei/L(c-a)
{t2 and exploit the fact that g(,X) is holomorphic in C and the coefficients of i{t in the exponential terms in g({t) are all nonnegative.] Exercise 17.10. Show that
for all points x E JR other than a and b. In view of the formulas in Exercises 17.9 and 17.10, it is now easy to check that (17.13)
and (17.14)
for functions
1 of the form n
I(x) =
L CjIUjbj{X) , j=l
where al < bl ::; a2 < b2 ::; ... ::; an < bn and CI,' .. ,Cn is any set of complex numbers. The first formula (17.13) exhibits a way of recovering I(x) from its Fourier transform i({t). Accordingly, the auxiliary transform (17.15)
(appropriately interpreted) is termed the inverse Fourier transform. The second formula (17.14) is commonly referred to as the Parseval/Plancherel or Pareseval-Plancherel formula. It exhibits the fact that
where
111112 =
{I:
1
II{x) 12
dX} 2 ,
for the class of functions under consideration. However, the conclusion is valid for the class of 1 which belong to the space L2 of 1 such that 1112 is integrable in the sense of Lebesgue on the line JR.
17. Matrix valued holomorphic functions
372
Exercise 11.11. Show that (17.14) holds if and only if
2~
(17.16)
I:
1(J1.)g(J1.)dJ1. =
I:
f(x)g(x)dx
holds for every pair of piecewise constant functions f(x) and g(x). [HINT: This is just (8.5).]
1
The space £2 has the pleasant feature that f E £2 ¢:::::;> E £2. An even pleasanter class for Fourier analysis is the Schwartz class 5 of infinitely differentiable functions f(x) on R such that lim
x1+00
lxi f(k)(x)1 =
lim Ix j f(k)(x)1
xL-co
=0
for every pair of nonnegative integers j and k. Exercise 17.12. Show that if f E 5, then its Fourier transform 1p..) enjoys the following properties: (a) (-iA)i j(A)
= J~oo ei>.x f(j}(x)dx
for j
= 1,2, ... .
(b) (-iD>.)k1 = J~oo ei>.xx k f(x)dx for k = 1,2, ... .
(c)
1 E 5.
You may take it as known that if f E 5, then the derivative
D>.1 = lim 1(A + ~) ~--o
- j(A)
~
can be brought inside the integral that defines the transform. Exercise 17.13. Show that if f(x) and g(x) belong to the Schwartz class S, then the convolution (17.17)
(f 0 g)(x) =
I:
f(x - y)g(y)dy
belongs to the class 5 and that (17.18)
--
~
(f 0 g)(A) = f(A)g(A). 2/
~
2
~
Exercise 17.14. Show that if f(x) = e- x 2, then f(/-L) = e-I-' /2 f(O). [HINT: Exploit the formula that was established in Example 17.14.]
17.5. Contour integrals of matrix valued functions The contour integral
h
F(A)dA
17.5. Contour integrals of matrix valued functions
373
of a p x q matrix valued function
is defined by the formula
where
aij= hfij()")d)", i=l, ... ,p, j=l, ... ,q; i.e., each entry is integrated separately. It is readily checked that
h {F()")
+ G()")}d)" =
h F()")d)" + h G()")d)"
and that if Band C are appropriately sized constant matrices, then
h BF()")Cd)" = B (h F()")d)" ) C . Moreover, if
h
matrix valued function on r, then
IIh
F()")d)..11
~
lb
11F{t(t)) II I,'(t) Idt .
Proof. This is a straightforward consequence of the triangle inequality applied to the Riemann sums that are used to approximate the integral. 0 Lemma 17.16. Let J
= C~k) = p,h + N
be a Jordan cell and let r be a simple smooth counterclockwise directed curve in the complex plane C that does not intersect the point /-t. Then (17.19)
17. Matrix valued holomorphic functions
374 Proof.
Clearly,
)..h - J and, since N k
= ().. - J-L)h - N
= Okxk,
Therefore,
2~i
lr
k
(>.I. - J)-ld>' =
~ L~i
lr
(>.
~ I')i d>' }
Nj-l .
But this yields the asserted formula, since 1. -2 7rZ
irf ().. -1 J-L ).J d)" = 0
and
~ 27ri
f _1_d)" = ir ).. - J-L
{1
0
if j > 1
if J-L is inside r if J-L is outside r
. D
Let A E C nxn admit a Jordan decomposition of the form (17.20)
where J1, ... ,Je denote the Jordan cells of J, UI, ... ,Ue denote the corresponding block columns of U and VI, . .. , Vi denote the corresponding block rows of U- 1 • Consequently, (17.21) i=1
and, if Ji is a Jordan cell of size ni x ni, then t.
(17.22)
()"In - Atl = U()"In - J)-IU- 1 =
L Ui()..In; -
Ji)-lVi.
i=l
Note that if A has k distinct eigenvalues with geometric multiplicities "11, ... , 'Yk, then '- = "11 + ... + 'Yk in formula (17.22). Lemma 17.17. Let A E c nxn admit a Jordan decomposition of the form (17.20), where the ni x ni Jordan cell Ji = (3iIni +Nni' and let r be a simple
17.6. Continuous dependence of the eigenvalues
375
smooth counterclockwise directed curve in C that does not intersect any of the eigenvalues of A. Then 1 -2'
(17.23)
~I
1 r
e (AIn - A)-IdA = ~ L..J U·X·V; t t t, i=l
where if (3i if (3i
Proof.
is inside is outside
r r
This is an easy consequence of formula (17.22) and Lemma 17.16.
o It is readily checked that the sum on the right-hand side of formula (17.23) is a projection:
(t.
U,X,
v. )' ~
t.
U,X,
v. .
Therefore, the integral on the left-hand side of (17.23),
PrA = -1,
lr
(AIn - A) -1 dA, r is also a projection. It is termed the Riesz projection. (17.24)
2~1
Lemma 17.18. Let A E c nxn , let det{Aln - A) = (A - A1Yl<1 ... (AAk)(l
(17.25)
rank Pt-
=
L
Qi
where G = {i : Ai
is inside
r}.
iEG
Proof.
The conclusion rests on the observation that e rank (U {diag {XI, ... ,Xe}} V) rank UiXiVi
L
i=l
rank {diag{Xb .. ' ,Xe}} , since U and V are invertible. Therefore, the rank of the indicated sum is equal to the sum of the sizes of the nonzero Xi that intervene in the formula for Pt-, which agrees with formula (17.25). 0
17.6. Continuous dependence of the eigenvalues In this section we shall use the projection formulas Pt- t~ establish the continuous dependence of the eigenvalues of a matrix A E nxn on A. The strategy is to show that if B E c nxn is sufficiently close to A, then
c
376
17. Matrix valued holomorphic functions
IIP{~l - pfll < 1, and hence, by Lemma 9.16, rankPt = rankPf. The precise formulation is given in the next theorem, in which
Theorem 17.19. Let A, BE
c nxn
and let
det(AIn - A) = (A - AI)Ql ... (A - Ak)Qk , where the k points Al ... ,Ak are distinct. Then for every a 8 > 0 such that if II A - B II < 8, then
E
> 0, there exists
k
UDf(Ad·
u(B) C
i=I
Moreover, if eo =
~minH'\ -
Ajl : i
i= j}
and e < eo, then each disk DE(Ai) will contain exactly of B, counting multiplicities.
(Yi
points of spectrum
Proof. Let r < eo and let rj = rj(r) denote a circle of radius r centered at Aj and directed counterclockwise. This constraint on r insures that the rj nrk = 0 if j :/= k and that Aj is the only eigenvalue of A inside the closed disc Dr(Aj) = {A E C: IA - Ajl ::; r}. The remainder of the proof rests on the following four estimates that are taken from Lemma 7.18: (1) If A E r j , then AIn - A is invertible and there exists a constant "(j such that II (Aln - A)-III::; "(j < 00 for j = 1, ... ,k. (2) If"( = max{-rj: j invertible and
= 1, ...
,k}
IIA - BII < ~,then AIn - B is
and
II(AIn - B)-III ::; 1-
"(111- All
for every point A E Uj=I r j. (3) The bound II( AI - A)-I _ (AI _ B)-III n
n
<
"(2 liB
- All
- 1 - "(liB - All
is in force for every point A E Uj=I r j.
(4) In view of (3) and Lemma 17.15, IIPt- J
pfll
=
J
<
~ { 2wl~
(AIn - A)-I - (AIn - B)-IdA J
"(211B - Allr 1- "(liB - All·
17.7. More on small perturbations
377
At first glance, the last upper bound might look strange, unless you keep in mind that the number, also depends upon r and can be expected to increase as r decreases. Nevertheless, this bound can be made less than one by choosing 1
liB-Ali <
Thus, given any c Lemma 9.16, for j
=
rank
,+,r
2'
> 0, let r < min{c,co} and 8 < (r+,2 r )-l; then, by
p,
pt. =
rank 3 = rank 3 1, ... ,k, and hence by Lemma 17.18,
P' 3
(XJ'
= the sum of the algebraic multiplicities of the eigenvalues of B inside Df(Aj) ,
o
which yields the desired result.
Exercise 17.15. Show that the roots A1, ... ,An of the polynomial p(>..) = ao + alA + ... + anA n with an =1= 0 depend continuously on the coefficients ao, ... ,an of the polynomial. [HINT: Exploit companion matrices.]
17.7. More on small perturbations In this section we shall continue to investigate the effect of small changes in the entries of a matrix. Lemma 17.20. If A E c nxn has n distinct eigenvalues, then there exists a number 8 > 0 such that every matrix B E nxn for which IIA - BII < 8 also has n distinct eigenvalues.
c
This is an easy consequence of Theorem 17.19. o The next result shows that any matrix A E c nxn can be approximated arbitrarily well by a matrix B E c nxn with n distinct eigenvalues; i.e., the cla..'ls of n x n matrices with n distinct eigenvalues is dense in c nxn . Proof.
Lemma 17.21. If A E c nxn and 8> 0 are specified, then there exists a matrix BE c nxn with n distinct eigenvalues such that IIA - BII < 15. Proof. Let A = PJP-l be the Jordan decomposition of the matrix A and let D be a diagonal matrix such that the diagonal entries of the matrix D + J are all distinct and
(IIPlIlIp- 1 11)-lt5 Then the matrix B = P(D + J)P- l Idiil
for i = 1, ... , n. values and
~
has n distinct eigen-
o
17. Matrix valued holomorphic functions
378
Note that this lemma does not say that every matrix B that meets the inequality IIA - BII < 8 has n distinct eigenvalues. Since the class of n x n matrices with n distinct eigenvalues is a subclass of the set of diagonalizable matrices, it is reasonable to ask whether or not a diagonalizable matrix remains diagonalizable if some of its entries are changed just a little. The answer is not always! Exercise 17.16. Let A that
IIA -
=
[~ ~]
and B =
[~ ~], where (3 =1= o.
Show
BII = 1(31, but that A is diagonalizable, whereas B is not.
Exercise 17.17. Let A E C pxq and suppose that rankA = k and k < min{p, q}. Show that for every € > 0 there exists a matrix B E C pxq such that IIA - BII < € and rankB = k + 1. [HINT: Use the singular value decomposition of A.] Some conclusions A subset X of the set of p x q matrices C pxq is said to be open if for every matrix A E X there exists a number 8 > 0 such that the open ball
B6(A) = {B E C pxq : IIA - BII < 8} is also a subset of X. The meaning of this condition is that if the entries in a matrix A E X are changed only slightly, then the new perturbed matrix will also belong to the class X. This is a significant property in applications and computations because the entries in any matrix that is obtained from data or from a numerical algorithm are only known approximately. The preceding analysis implies that: rank A = min{p, q}} is an open subset of C pxq . (2) {A E C pxq : rankA < min{p, q}} is not an open subset of C pxq . (3) {A E c nxn : with n distinct eigenvalues} is an open subset of c nxn . (4) {A E c nxn : A is diagonalizable} is not an open subset of c nxn . (1) {A E C pxq
:
The set {A E C nxn : with n distinct eigenvalues} is particularly useful because it is both open and dense in c nxn , thanks to Lemma 17.21. Open dense sets are said to be generic. Exercise 17.18. Show that {A E C nxn
:
A
is invertible} is a generic set.
17.8. Spectral radius redux
In this section we shall use the methods of complex analysis to obtain a simple proof of the formula
ru(A)
=
lim IIAklll/k
kjoo
17.8. Spectral radius redux
379
for the spectral radius
rq(A) = max{IAI : A E a(A)} of a matrix A E C nxn. Since the inequality
rq(A) ~ IIAklli/k for k = 1,2, ... is easily verified, it suffices to show that (17.26)
Lemma 17.22. If A E c nxn and a(A) belongs to the set of points enclosed by a simple smooth counterclockwise directed closed curve r, then (17.27)
Proof. Let g(A) = Ak; let r r denote a circle of radius r centered at zero and directed counterclockwise, and suppose that r > IIAII. Then
-1,
ill
21f't
rr
Ak(AIn - A)-IdA =
2.....
rr
Il •
g(A)
L 00
Aj dA
\j+1 j=O /\
= ~ g(j)(O) Aj = Ak L...J" J.
'0 J=
.
The assumption r > IIAII is used to guarantee the uniform convergence of the sum L~o A- j Aj on r r and subsequently to justify the interchange in the order of summation and integration. Thus, to this point, we have established formula (17.27) for r = rr and r > IIAII. However, since Ak(AIn - A)-l is holomorphic in an open set that contains the points between and on the curves rand r r, it follows that
~ 27rt
rAk(AIn - A)-IdA
lr
=
~ 21f't
r Ak(AIn - A)-IdA.
lr
r
D
Corollary 17.23. If A E c nxn and a(A) belongs to the set of points enclosed by a simple smooth counterclockwise directed closed curve r, then (17.28)
for every polynomial p(A).
17. Matrix valued holomorphic functions
380
Theorem 17.24. If A E C nxn I then
(17.29) i. e., the limit exists and is equal to the modulus of the maximum eigenvalue
of A. Proof.
Fix
E
> 0, let r = rq(A) + E, and let I'r
= max{II(AIn - A)-III: IAI = r}.
Then, by formula (17.27), IIAkll
112~i fo
=
~ -.!..
{21f
27r 10
21f
(re i9 )k(rei9 In - A)-lirei9 dOll
rk II (re i9 I n
_
A)-llirdO
Thus, and, as (rl'r)l/k ~ 1 as k i
00,
it follows that lim sup IIAk I l/k ~ r = ru(A)
+ E.
kjoo
The inequality lim sup IIAklll/k ~ rq(A) kjoo
is then obtained by letting k = 1,2, ... , it follows that
E
1 O. Therefore, since rq(A)
~ IIAklll/k for
rq(A) ~ liminf IIAklll/k ~ lim sup IIAklll/k ~ rq(A) , kjoo
kjoo
which serves to establish formula (17.29).
D
Exercise 17.19. Show that if A E c nxn and cr(A) belongs to the set of points enclosed by a simple smooth counterclockwise directed closed curve r, then (17.30)
1 -2' 7r'l
1
~ -., Ai . eA (AIn - A) -1 dA = L...t r . 0 J. J=
17.9. Fractional powers
381
Let A E c nxn and let J(>..) be holomorphic in an open set n that contains (T(A). Then, in view offormulas (17.28) and (17.30) it is reasonable to define (17.31) where r is any simple smooth counterclockwise directed closed curve in that encloses (T(A) such that every point inside r also belongs to n. This definition is independent of the choice of r and is consistent with the definitions of J(A) considered earlier.
n
Exercise 17.20. Show that if, in terms of the notation introduced in (17.21), A = UIC~)Vl + U2C~q)V2' then (j)
p-l
(17.32)
q-l
.
J(A) = Ul ~ J .}a) (Qr»)3Vl + U2 ~ J j=O
J.
j=O
(j)
.~a) (Caq»)jV2
J.
for every function J(>..) that is holomorphic in an open set that contains the points a and (3. Exercise 17.21. Show that in the setting of Exercise 17.20 det (>..In - J(A))
(17.33)
= (>.. - J(a))P(>.. - J((3))q .
Exercise 17.22. Show that if A E open set that contains (T(A), then (17.34) det (>..In - A)
=
c nxn
and J(>..) is holomorphic in an
(>.. - >"lYl.. - >"k)Cl
det (>..In - J(A))
= (>.. - f(>"I))Cl.. - J(>"k))Cl
[HINT: The main ideas are contained in Exercises 17.20 and 17.21. The rest is just more elaborate bookkeeping. 1 Theorem 17.25. (The spectral mapping theorem) Let A E c nxn and let J(>..) be holomorphic in an open set that contains (T(A). Then
(17.35) Proof.
J-l
E
(T(J(A))
~
J(J-l)
E
(T(A) .
This is immediate from formula (17.34).
D
17.9. Fractional powers
c nxn . Show that if A >- 0, then A I / 2 = ~ f v').(>..In _ A)-Id>"
Exercise 17.23. Let A
E
2m lr for any simple closed smooth curve r in the open right half plane that includes the eigenvalues of A in its interior.
17. Matrix valued holomorphic functions
382
o
Figure 7 Exercise 17.24. Let A, B E that if 0 < t < 1, then (17.36)
c nxn
and suppose that A >- B >- O. Show
1 . { At {(AIn - A)-l - (AIn - B)-I} dA, At - Bt = -2 1C'Z
1r
where r indicates the curve in Figure 7, and then, by passing to appropriate limits, obtain the formula (17.37)
At _ Bt = sin7rt 7r
roo xt(xIn + A)-l(A _ B)(xIn + B)-ldt.
10
Exercise 17.25. Use formula (17.37) to show that if A, BE
(17.38)
A>- B >- 0
===}
At >- Bt
for
0
< t < 1.
c nxn , then
Chapter 18
Matrix equations
confusion between creativity and originality. Being original entails saying something that nobody has said before. Originality... must be exhibited, or feigned, for academic advancement. Creativity, by contrast, reflects the inner experience of the individual overcoming a challenge. Creativity is not diminished when one achieves ... what has already been discovered ...
Shalom Carmy [15], p. 26 In this chapter we shall analyze the existence and uniqueness of solutions to a number of matrix equations that occur frequently in applications. The notation I1+
= {A E C : A +:X > O}
and I1_
= {A E C : A +:X < O}
for the open right and open left half plane, respectively, will be useful.
18.1. The equation X - AXE = C BE cqxq and C E Cpx q; let ab ... ,ap and {31, ... ,{3q denote the eigenvalues of the matrices A and B (repeated according to their algebraic multiplicity), respectively; and let T denote the linear transformation from c pxq into C pxq that is defined by the rule
Theorem 18.1. Let A
(18.1)
E CpxP,
T: X E C pxq ---+ X - AXB E C pxq .
Then (18.2)
NT
= {Opxq} ~ ai{3j =1= 1
for i = 1, ... ,p
and j = 1, ... ,q.
-
383
18. Matrix equations
384
Proof. Ui E CP
Let AUi = aiui and BT Vj = /3jVj for some pair of nonzero vectors and Vj E cq and let X = UiVr Then the formula
TX
= Uivf -
AUivf B
= (1 -
ai/3j )uivf
clearly implies that the condition stated in (18.2) is necessary for NT #{Opxq}. To prove the sufficiency of this condition, invoke Jordan decompositions A = U JU- 1 and B = V]V-l of these matrices. Then, since X - AXB
=0
~ X - UJU-1XV]V- 1 ~
U- 1 XV
the proof is now completed by setting Y
(18.3)
-
=0 = 0,
J(U- 1 XV)]
= U- 1 XV and writing
J
and
=
[~~ o
0
: ...
Je
I ,Ie,
in block diagonal form in terms of their Jordan cells Jr, ... ,Jk and~, ... respectively. Then, upon expressing Y in compatible block form with blocks Yij, it is readily seen that (18.4) Y - JY J = 0 ~ Yij - JiYijJj = 0
i = 1, ... ,k and j = 1, ... f .
for
Thus, if the Jordan cells Ji = CJ!:;) = aiIpi + Nand then
Yij - JiYijJj = Yij - (aiIpi
Jj = C~~j)
+ N)"}ijJj = "}ij(Iqj -
However, if 1- ai/3j #- 0, then the matrix Iqj - aiJj is invertible, and hence, upon setting -
-
M = Jj(Iqj - aiJj)
= /3j I qj
+ N,
aiJj) - N"}ijJj.
= (1 - ai/3j )Iqj - aiN
-1
the equation "}ij - JiYijJj = 0 reduces to
Yij = NYijM,
which iterates to Yij
= Nk"}ijMk
and hence implies that Yij = 0, since N k = 0 for large enough k. Therefore, Y = 0 and X = UYV- 1 = O. This completes the proof of the sufficiency of the condition ai/3j #- 1 to insure that NT = {Opxq}, i.e., that X = 0 is D the only solution of the equation X - AX B = O.
18.2. The Sylvester equation AX - X B = C
385
Theorem 18.2. Let A E C pxP , B E c qxq and C E C pxq and let a!, ... , a p and fh, ... , /3q denote the eigenvalues of the mat'rices A and B (repeated according to their algebraic multiplicity), respectively. Then the equation (18.5)
X-AXB=C
has a unique solution X E c pxq if and only if adJj i and j.
i
1 for every choice of
Proof. This is immediate from Theorem 18.1 and the principle of conservation of dimension: If T is the linear transformation that is defined by the rule (18.1), then pq = dim NT + dim RT. Therefore, T maps onto C pxq if and only if NT = 0, i.e., if and only if ai/3j i 1 for every choice of i and j. 0 Corollary 18.3. Let A E C pxP , C E C pxp and let al, ... , a p denote the eigenvalues of the matrix A. Then the Stein equation (18.6)
X - ABXA = C
has a unique solution X E C pxp if and only if 1 - aiaj of i and j.
i
0 for every choice
Exercise 18.1. Verify the corollary. Exercise 18.2. Let A = C~2) and B = C~2) and suppose that afj = l. Show that the equation X - AX B = C has no solutions if either C2l i 0 or nCll i BC 22. Exercise 18.3. Let A = C~2) and B = C~2) and suppose that u/3 = l. Show that if C2l = 0 and UCll = /3c22, then the equation X - AX B = C has infinitely many solutions. Exercise 18.4. Find the unique solution X E C pxp of equation (18.6) when A = C6P), C = el u B + ue{i + epe:! and u B = [0 Sl Sp-1].
18.2. The Sylvester equation AX - X B = C The strategy for studying the equation AX - X B for the equation X - AX B = C.
=C
is much the same as
Theorem 18.4. Let A E Cpx P , B E C qxq and let aI, ... ,ap and /31. ... , /3q denote the eigenvalues of the matrices A and B (repeated according to their algebraic multiplicity), respectively, and let T denote the linear transformation from cpxq into C pxq that is defined by the rule (18.7)
T: X E C pxq ~ AX - XB E C pxq .
18. Matrix equations
386
Then (18.8) NT={Opxq}¢=}O!i-{3j:f:O
for i=I, ... ,p,
j=I, ... ,q.
Proof. Let AUi = O!iUi and BT Vj = {3jVj for some pair of nonzero vectors Ui E C p and v j E C q and let X = UiVJ. Then the formula TX =
AUiVJ - uivJB = (O!i - {3j)UiVJ
clearly implies that the condition stated in (18.8) is necessary for NT = {Opxq}. To prove the sufficiency of this condition, invoke Jordan decompositions A = U JU- 1 and B = V JV- 1 of these matrices. Then, since
AX - XB = 0
=0
{:::=:>
U JU- 1 X - XV JV- 1
{:::=:>
J (U- 1 XV) - (U- 1 XV)
J = 0,
the proof is now completed by setting Y = U- 1 XV and writing Zand J In block diagonal form in terms of their Jordan cells J1, ... ,Jk and J 1, ... ,Je, respectively, just as in (18.3). Then, upon expressing Y in compatible block form with blocks Yij, it is readily seen that (18.9) JY - Y J Thus, if Ji
=0
{:::=:>
JiYij - YijJj = 0
for
i
= 1, ...
- = {3jlqj + N,- then = O!i1Pi + Nand Jj
JiYij - YijJj = (O!i1Pi
+ N)Yij
,k and j
- YijJj = Yij(O!i1qj -
However, if O!i - {3j :f: 0, then the matrix O!i1qi invertible, and hence, upon setting -
M = -(O!i1qj - Jj)
-1
Jj =
= 1, ... £.
Jj) + NYij . {3j)Iqj -
(O!i -
IV
is
,
the equation reduces to
Yij = NYijM,
which iterates to
Yij = NkYijMk
for
k
= 2,3, ...
and hence implies that Yij = 0, since N k = 0 for large enough k. This completes the proof of the sufficiency of the condition O!i - {3j :f: 0 to insure that NT = {Opxq}. 0 Theorem IS.S. Let A E Cpx P , B E C qxq and C E cpxq and let 0!1, ... ,O!p and {31, ... ,{3q denote the eigenvalues of the matrices A and B (repeated according to their algebraic multiplicity), respectively. Then the equation
has a unique solution X i and j.
E
AX-XB=C c pxq if and only if O!i -
{3j
:f: 0 for
any choice of
18.2. The Sylvester equation AX - X B
=C
387
Proof. This is an immediate corollary of Theorem 18.4 and the principle 0 of conservation of dimension. The details are left to the reader. Exercise 18.5. Complete the proof of Theorem 18.5. Exercise 18.6. Let A E
e nxn.
(18.10)
Show that the Lyapunov equation
AH X
+ XA = Q
has a unique solution for each choice of Q E 0"( _AH) = 0.
e nxn
if and only if O"(A)
n
Lemma 18.6. If A, Q E e nxn , and if O"(A) C IL and -Q t 0, then the Lyapunov equation (18.10) has a unique solution X E e nxn . Moreover this solution is positive semidefinite with respect to en. Proof.
Since O"(A) C IL, the matrix
z=
-
1 00
etAH QetAdt
is well defined and is positive semidefinite with respect to
AHZ =
1 _1 (! 00
-
etAH ) QetAdt
_ {e tAH Qe tA
=
Moreover,
AHetAHQetAdt
00
Q+
en.
1
00
100
t=O
_
fOO
Jo
etAH ~(QetA)dt} dt
etAH QetAdtA
Q-ZA.
Thus, the matrix Z is a solution of the Lyapunov equation (18.10) and hence, as the assumption O"(A) C IL implies that O"(A) n O"(AH) = cp, there is only 0 one, by Exercise 18.6. Therefore, X = Z is positive semidefinite. A number of refinements of this lemma may be found in [45]. Exercise 18.7. Let A E c nxn . Show that if O"(A) c II+, the open right half plane, then the equation AH X + XA = Q has a unique solution for every choice of Q E e nxn and that this solution can be expressed as
X for every choice of Q E
by parts.]
=
e nxn .
1 00
e-tAH Qe-tAdt
[HINT: Integrate the formula
18. Matrix equations
388
Exercise 18.8. Show that in the setting of Exercise 18.7, the solution X can also be expressed as X
= --2 1 7l"
1
00
(ipJn
+ Ali)-lQ(ij..tln -
A)-ldj..t.
-00
Exercise 18.9. Let .4 = diag {Au, .422} be a block diagonal matrix in c nxn with a(Au) C II+ and a(A22) elL, let Q E c nxn and let Y E c nxn and Z E c nxn be solutions of the Lyapunov equation Ali X + XA = Q. Show that if Y and Z are written in block form consistent with the block decomposition of A, then Yu = Zu and Y22 = Z22. Exercise 18.10. Let A, Q E c nxn . Show that if a(A) n ilR = 0 and if Y and Z are both solutions of the same Lyapunov equation A H X + X A = Q such that Y - Z t 0, then Y = Z. [HINT: To warm up, suppose first that A = diag {All, A22}, where a(All) C II+ and a(A:12) C II_ and consider Exercise 18.9.]
C6
Exercise 18.11. Let A = E~=l ejeJ+l = 4 ) and let T denote the linear transformation from C 4x4 into itself that is defined by the formula T X = AliX -XA. (a) Calculate dim NT. (b) Show that a matrix X E C 4x4 with entries
H
matrix equation A X - X A
= [:
is a Hankel matrix with
= a,
Xu
Xij
is a solution of the
T~b ~C]
cOO = band
X12
if and only if X
0 X13
= c.
C6
4 ) and let T denote the linear transformation Exercise 18.12. Let A = 4x4 from C into itself that is defined by the formula T X = X - A H X A.
(a) Calculate dim NT. (b) Show that a matrix X E C 4x4 is a solution of the matrix equation X _ A HXA __ [fage
o~b o~c ~~]
if and only if X is a Toeplitz matrix.
18.3. Special classes of solutions Let A E C nxn and let: • t'+(A) = the number of zeros of det (>.In - A) in II+ . • t'_(A)
= the number of zeros of det (>.In -
A) in II_.
389
18.3. Special classes of solutions
• Eo(A)
= the number of zeros of det ()"In
A) in ilR.
-
The triple (E+(A), E_(A), Eo(A)) is called the inertia of A; since multiplicities are counted E+(A) + E_(A) + Eo(A) = n. Theorem 18.7. Let A E c nxn and suppose that £T(A) there exists a Hermitian matrix G E c nxn such that
n ilR
= >. Then
(1) AHG+GA:>-O. (2) E+(G)
= E+(A), E_(G) = E_(A) and Eo(G) = Eo(A) = O.
Proof. Suppose first that E+(A) = p ~ 1 and E_(A) = q ~ 1. Then the assumption £T(A) n ilR = > guarantees that p + q = n and that A admits a Jordan decomposition U JU- 1 of the form A
= U [Jl 0 ] U- 1 o h
c
with Jl E C pxP , £T(J1 ) C TI+, hE qxq and £T(h) C TI_. Let Pu E C pxp be positive definite over C P and P22 E c definite over C q. Then
tx)
Xu = io e
-tJH 1
Plle
qxq
be positive
-tJ
ldt
is a positive definite solution of the equation J(i Xu
+ XU J1 = Pu
and
X22 = -
10
00
etJf P22eth dt
is a negative definite solution of the equation
Jr X 22 + X22J2 = P22 . Let
X = diag {Xu, X22} and P = diag {Pu , P22} . Then and hence
(UH)-lJHUH(UH)-lXU-l Thus, the matrix G
+ (UH)-lXU-1UJU- 1 =
(UH)-lpU- 1 .
= (UH)-l XU- 1 is a solution of the equation AHG + GA = (UH)-lpU- 1
and, with the help of Sylvester's inertia theorem (which is discussed in Chapter 20), is readily seen to fulfill all the requirements of the theorem. The cases p = 0, q = nand p = n, q = 0 are left to the reader. 0
18. Matrix equations
390
Exercise 18.13. Complete the proof of Theorem 18.7 by verifying the cases = 0 and p = n.
p
18.4. Riccati equations In this section we shall investigate the existence and uniqueness of solutions X E en x n to the Riccati equation (18.11)
AHX +XA+XRX + Q = 0,
in which A, R, Q E c nxn , R = RH and Q = QH. This class of equations has important applications. Moreover, the exploration of their properties has the added advantage of serving both as a useful review and a nonartificial application of a number of concepts considered earlier. The study of the Riccati equation (18.11) is intimately connected with the invariant subspaces of the matrix (18.12)
which is often referred to as the Hamiltonian matrix in the control theory literature. The first order of business is to verify that the eigenvalues of C are symmetrically distributed with respect to the imaginary axis ilR., or, to put it more precisely: Lemma 18.8. The roots of the polynomial p(>.) = det(>.I2n metrically distributed with respect to ilR..
-
C) are sym-
Proof. This is a simple consequence of the identity SCS- 1 = _CH
,
where (18.13) D
Exercise 18.14. Verify the identity SCS- 1 Lemma 18.8.
= -CH and the assertion of
If a(C) n iR = 0, then Lemma 18.8 guarantees that C admits a Jordan decomposition of the form (18.14)
G U [J1o h0] U- 1' =
391
18.4. Riccati equations
It turns out that the upper left-hand n x n corner Xl of the matrix U will playa central role in the subsequent analysis; i.e., upon writing
so that (18.15)
G[
~~
] = [
~~ ] A
and
~(A) c IL ,
the case in which Xl is invertible will be particularly significant. Lemma 18.9. If ~(G) n ilR. = 0 and formula (18.15) is in force for some matrix A E C nxn (that is not necessarily in Jordan form), then
(18.16) Proof.
Let
Then
ZA
[xf
XH]S 2 [ Xl X2 ] A
[Xf X:1SG
[~~ ]
= -[Xf X:]GHS [
~~
]
_AH[Xf XH1S [ Xl ] X2 2 _AHZ. Consequently, the matrix Z is a solution of the equation
However, since ~(A) c IL and hence ~(AH) c IL, it follows from Theorem 18.5 that Z = 0 is the only solution of the last equation. 0 Theorem 18.10. If ~(G) n ilR. = is invertible, then:
0 and the matrix Xl in formula (18.15)
(1) The matrix X = X2Xl1 is a solution of the Riccati equation (18.11).
(2) X = X H . (3) ~(A+RX) c IL .
392
18. Matrix equations
Proof. If Xl is invertible and X = X2Xl1, then formula (18.15) implies that
and hence, upon filling in the block entries in G and writing this out in detail, that =
X1AXl I X(XIAXll) .
Therefore,
-Q - AH X = X(A + RX) , which serves to verify {1}. Assertion (2) is immediate from Lemma 18.9, whereas (3) follows from the formula A + RX = X1AX11 and the fact that O"(A) C IL. D Theorem 18.10 established conditions that guarantee the existence of a solution X to the Riccati equation (18.11) such that O"(A + RX) c IL. There is a converse:
Theorem IS.I1. If X = XH is a solution of the Riccati equation (18.11) such that u{A + RX) c IT_, then O"(G) n ilR = 0 and the matrix Xl in formula (18.15) is invertible. Proof. If X is a solution of the Riccati equation with the stated properties, then
G[In]
X
=
[A RH][In]=[ A+R~ ] -Q -A X -Q-A X
=
[ ; ] (A
+ RX) .
Moreover, upon invoking the Jordan decomposition
A+RX = PJIP-1, we see that
which serves to identify the columns of the matrix
as a full set of eigenvectors and generalized eigenvectors of the matrix G corresponding to the eigenvalues of G in IL. Thus, Xl = P is invertible D and, in view of Lemma 18.8, 0"( G) n ilR = 0.
18.4. Riccati equations
393
Theorem 18.12. The Riccati equation (18.11) has at most one solution X E e nxn such that X = XH and O"(A + RX) c IL. Proof. Let X and Y be a pair of Hermitian solutions of the Riccati equation (18.11) such that O"(A + RX) c IL and O"(A + RY) c IL. Then, since and AHy +YA+YRY +Q=O,
it is clear that AH(X - Y)
+ (X -
Y)A+XRX - YRY = O.
= Y H, this last equation can also be reexpressed as (A + Ry)H (X - Y) + (X - Y)(A + RX) = 0 ,
However, as Y
which exhibits X - Y as the solution of an equation of the form BZ+ZC=O
with O"(B) c IL and O"(C) C IL. Theorem 18.5 insures that this equation has at most one solution. Thus, as Z = Onxn is a solution, it is in fact the 0 only solution. Therefore X = Y, as claimed. The preceding analysis leaves open the question as to when the conditions imposed on the Hamiltonian matrix G are satisfied. The next theorem provides an answer to this question when R = -BBH and Q = CHC. Theorem 18.13. Let A E (a) rank [ A
-;In ]=
(b) rank[A - )"In plane.
e nxn , BE e nxk , C E e rxn
and suppose that
n for every point A E ilR and
B] = n for every point A E IT+, the closed right half
Then there exists exactly one Hermitian solution X of the Riccati equation AHX +XA -XBBHX +CHC = 0
(18.17)
such that O"(A - BBH X) c IT_. Moreover, this solution X is positive semidefinite over en, and if A, Band C are real matrices, then X E JR nxn .
Proof.
Let -BBH] A G = [ -CHC _AH
and suppose first that (a) and (b) are in force and that
x] [x]
-BBH ] [ A [ -CHC _AH y
=)..
y
18. Matrix equations
394
for some choice of x E
cn, y
E
C n and A E C. Then
(A - AIn)X = BBHy
and Therefore, and
Thus,
-(A+:X)(X,y) =
((A-AIn)x,y) - ((A+XIn)x,y) IIBH YI~
+ IICxlI~
and hence A + X = 0 ==:::} BH Y = 0 and
Cx = 0
,
which in turn implies that [A-CAIn
]x=o and yH[A+XIn
B]=O
when A E ilR. However, in view of (a) and (b), this is viable only if x and y = o. Consequently, u(G) n ilR = 0. The next step is to show that if (a) and (b) are in force and if
[ _tH -!::] [~~ ] = [ ~~ ] C
A and
rank [
=0
~~ ] = n,
where XI,X2,A E c nxn and u(A) C IL, then Xl is invertible. Suppose that u E NXl . Then -BBH X2U = XIAu ,
and hence, as xfXI = Xf!X2 by Lemma 18.9,
_u H xf BBH X2U = u H xf XIAu
-IIB HX2ull2 =
u H Xf!X2Au
= o.
XIAu = -BBH X2 U = 0, which means that NXl is invariant under A and hence that either NXl = {O} or that Av = AV for some point A E IL and some nonzero vector v E N Xl. In the latter case,
18.4. Riccati equations
395
and _AH X 2v = X2Av = ).X2v ,
which is the same as to say vHXr[A+ Xln
B]
for some point ). E 11_. Therefore, since that X 2 v = O. Therefore, [
~~
]v
= OH
-X E 11+, assumption (b) implies
= 0 ~ v = 0 ~ NXl = {O}
~ Xl
is invertible.
Thus, in view of Theorems 18.10 and 18.12, there exists exactly one Hermitian solution X of the Riccati equation (18.17) such that a(A - BBH X) c 11_. If the matrices A, Band C are real, then the matrix X is also a Hermitian solution of the Riccati equation (18.17) such that a(A - BBH X) c IL. Therefore, in this case, X E ~ nxn. It remains only to verify that this solution X is positive semidefinite with respect to en. To this end, it is convenient to reexpress the Riccati equation AHX +XA-XBBHX +CHC = 0
as (A - BBHX)HX +X(A-BBHX) = -CHC -XBBHX ,
which is of the form where a(AI)
c
11_ and - Q !: O.
The desired result then follows by invoking Lemma 18.6.
0
c
Exercise 18.15. Let A E nxn , BE C nxk • Show that if a(A)ni~ = 0 and rank [A - )'In B] = n for every point). E 11+, then there exists exactly one Hermitian solution X of the Riccati equation AH X + XA - XBB HX = 0 such that a(A - BBH X) c IL.
For future applications, it will be convenient to have another variant of Theorem 18.13. Theorem 18.14. Let A E c nxn , BE C nxk , Q E suppose that Q !: 0, R ~ 0,
A - )'In] (a) rank [ Q
=n
c nxn ,
.
for every pomt ). E i~
and
(b) rank [A - )'In
B] = n for every point). E 11+.
R E C kXk ; and
18. Matrix equations
396
Then there exists exactly one Hermitian solution X of the Riccati equation AHX +XA-XBR-1BHX +Q
=0
such that cr(A - BR- I BH X) c IL. Moreover, this solution X is positive semidefinite over en, and if A, Band C are real matrices, then X E lR n x n . Proof. Since Q t: 0 there exists a matrix C E C rxn such that C H C = Q and rank C = rank Q = r. Thus, upon setting BI = BR- I/ 2 , we see that
[ A
-Q
-BR-~BH] = [ AH -A -C C
-Blf/!] -A
is of the form considered in Theorem 18.13. Moreover, since
_ 0 [ A -C>.In ] u-
¢=::}
[ A -Q>.In]
- 0
U-,
condition (a) implies that r ank [ A -C>.In ] = n
£or every pomt . A\ E
'lID
'/,~.
Furthermore, as rank [A - >.In B] = rank [A - >.In
BR- 1/ 2 ],
assumption (b) guarantees that rank [A - >.In
B I ] = n for every point>. E II+.
The asserted conclusion is now an immediate consequence of Theorem 18.13.
o Exercise 18.16. Show that if N, Y, M E c nxn , then the n x n matrix valued function X(t) = etNYe tM is a solution of the differential equation
X'(t) = NX(t) that meets the initial condition X(O)
+ X(t)M
= Y.
18.5. Two lemmas The two lemmas in this section are prepared for use in the next section. Lemma 18.15. Let A,Q E c nxn , B,L E C nxk , R E C kxk ,
E=[L~ ~] and suppose that E (18.18)
t: 0,
R >- 0 and that
rankE = rankQ+rankR.
18.5. Two lemmas
397
Then the formulas (18.19)
and rank [A - )"In
(18.20)
B]
= rank [A -
)"In
B]
are valid for the matrices (18.21)
and every point ).. E C.
Proof.
The formula
[L~ ~] = [~ L~:l] [~ ~] [R-~nLH Z] implies that rankE = rankQ + rankR
and
Q to.
Thus, in view of assumption (18.18), rank Q = rank Q and, since Q =
Q+
LR- LH is the sum of two positive semidefinite matrices, 1
However, since rankQ = rankQ
==}
dimNQ = dimNQ ,
the last inclusion is in fact an equality:
NQ =NQ and N Q ~NLH and hence, - )"In [ A -Q
1 -0 u-
{:=:}
[ A - )"In] u-- 0 .
Q
The conclusion (18.19) now follows easily from the principle of conservation of dimension. The second conclusion (18.20) is immediate from the identity
o
18. Matrix equations
398
-
-
Lemma 18.16. Assume that the matrices A, A, Q, Q, B, L, Rand E are as in Lemma 18.15 and that
rank E = rank Q + rank R,
(18.22)
. rank [ A -QAln] = n Jf.or every pomt
(18.23)
\
1\
'11])
E Z.l!'\.
and rank [A - A1n
(18.24)
B] = n for every point A E II+ .
Then there exists exactly one Hermitian solution X E equation
e nxn
of the Riccati
(18.25)
such that (T(A - BR- 1 BH X) C II_. Moreover, this solution X is positive semidefinite over en, and if the matrices A, B, Q, Land R are real, then X E lRnxn.
Proof.
Under the given assumptions, Lemma 18.15 guarantees that rank [ A -QAln
1= n for every point A
E
ilR
and
B] = n for every point A E II+.
rank [A-Aln
-
-
Therefore, Theorem 18.14 is applicable with A in place of A and Q in place ofQ. D
18.6. An LQR problem Let A E lR nxn and B E lR nxk and let
x(t) = etAx(O) + lot e(t-s)A Bu(s)ds,
0
~ t < 00,
be the solution of the first-order vector system of equations
x'(t)
=
Ax(t)
+ Bu(t), t ~ 0,
in which the vector x(O) E lR n and the vector valued function u( t) E lR k, t ~ 0, are specified. The LQR (linear quadratic regulator) problem in control engineering is to choose u to minimize the value of the integral (18.26)
18.6. An LQR problem
399
when
Q=
[3 QT E ~nxn,
L E ~nxk, R
~]~O, =
RT E ~kxk
and R is assumed to be
invertible. The first step in the analysis of this problem is to reexpress it in simpler form by invoking the Schur complement formula: [ Q LT
L] = [In0
LR- 1 ] [ Q - LR- 1LT
Ik
R
0
0] [ In R
R- 1LT
0 ] h .
Then, upon setting Q=Q-LR-1LT
1..=A-BR-1LT ,
and
v(s) = R-1LT x(s)
+ u(s),
the integral (18.26) can be reexpressed more conveniently as (18.27) where the vectors x( s) and v( s) are linked by the equation x'(s) = 1..x(s) + Bv(s) ,
(18.28)
i.e., (18.29) Lemma IS.17. Let X be the unique Hermitian solution of the Riccati equation (18.25) such that 0"(1.. - BR-1BTX) c IL. Then X E ~nxn and (18.30) Z(t)
= x(O)T Xx(O) - x(tf Xx(t) +
Let
Proof.
lot IIR-l/2(BT Xx(s) + Rv(s))II~ ds.
=
x'(s)T Xx(s) (Ax(s)
+ x(s)T Xx'(s)
+ Bv(s)fXx(s) +x(sfX(1..x(s) + Bv(s))
=
x(sf(1..TX + X1..)x(s) +v(sfBTXx(s) +x(sfXBv(s)
=
x(s)T(XBR-l BTX - Q)x(s)
=
(x(sfXB
+ v(sf BTXx(s) +x(s)TXBv(s)
+ v(s)TR)R-l(BTXx(s) + Rv(s))
-x(sfQx(s) - v(sfRv(s).
18. Matrix equatiolls
400
Therefore, Z(t)
lot {x(sfQx(s) + v(s)T Rv(s)}ds -
i
t d
o
-d {x(s)TXx(s)}ds s
+ lot (BT Xx(s) + Rv(s)f R-1(BT Xx(s) + Rv(s))ds,
o
which is the same as the advertised formula.
Theorem 18.18. If the hypotheses of Lemma 18.16 are met and if X E lR nxn is the unique Hermitian solution of the Riccati equation (18.25) with a(..4 - BR- 1 BT X) c IT_, then: (1) Z(t) ~ x(O)T Xx(O) - x(t)T Xx(t) with equality if v(s) = -R-1BTXx(s) for 0::; s::; t. (2) Ifv(s) = -R-IBTXx(s) for 0::; s < 00, then Z(oo) = x(Of Xx(O). Proof. The first assertion is immediate from formula (18.30). Moreover, if v(s) is chosen as specified in assertion (2), then x(t) is a solution of the vector differential equation x'(t) = (..4 - BR- 1 BT X)x(t) ,
and hence, as u(..4 - BR- 1 BT X) C IT_, it is readily checked that x(t)
as t 1 00.
-----t
0 0
18.7. Bibliographical notes The discussion of Riccati equations and the LQR problem was partially adapted from the monograph [74], which is an excellent source of supplementary information on both of these topics. The monograph [44] is recommended for more advanced studies. Exercise 18.4 is adapted from [1].
Chapter 19
Realization theory
... was so upset when her mother married one that she took to mathematics and Hebrew directly ... though she was the prettiest girl for miles around ...
Patrick O'Brian [53], p. 122 A function f (>.) is said to be rational if it can be expressed as the ratio of two polynomials: (19.1) A rational function f(>.) is said to be proper if f(>.) tends to a finite limit as >. --t 00; it is said to be strictly proper if f(>.) ----t 0 as >. --t 00. If ak i= 0, (3n i= 0 and the numerator and denominator have no common factors, then the degree of f (>.) is defined by the rule (19.2)
degf(>.) = max{k,n} .
In this case f(>.) is proper if n
~
k. A p
X
q mvf (matrix valued function)
is said to be rational if each of its entries is rational. It is said to be proper if each of its entries is proper or, equivalently, if F(>.) tends to a finite limit as >. --t 00 and strictly proper if F(>') ----t 0 as >. --t 00. Theorem 19.1. Let F(>') be a p X q rational mvf that is proper. Then there exists an integer n > 0 and matrices D E C pxq , C E C pxn , A E nxn and
c
-
401
19. Realization theory
402
BE C nxq such that
F{>.) = D + C{>.In - A)-l B .
(19.3) Proof.
Let us suppose first that
Xl
Xk F{>.) = (>. _ w)k
(19.4)
+ ... + >. _ w + Xo
,
where the Xj are constant p x q matrices. Let A denote the kp x kp block Jordan cell of the form
o o A=
o Let n
0
0
0
Ip wIp
...
= kp and let N=A-wIn
.
Then
(>.In - A)-1 = =
((>. - w)In - N)-l (>. - w)-1 {In + >. ~ w
+ ... + (>. ~~~-l }
since Nk = O. Therefore, the top block row of {>.In - A)-l is equal to
[Ip
O](>.In - A)-1 =
0
[~ >.-w
Ip (>. - w)2
Thus, upon setting
D=Xo, C= [Ip
0
01
and
B~ [1: 1
it is readily seen that the mvf F{>.) specified in (19.4) can be expressed in the form (19.3) for the indicated choices of A, B, C and D. A proper rational p x q mvf F(>') will have poles at a finite number of distinct points WI, ••• ,We E C and there will exist mvf's F I {>')={,
X 1k1
/\ - WI
)k
Xu
1
+"'+,/\ -
WI
Xekl.
,... ,Fe{>')=(,/\ -
with matrix coefficients X ij E C pxq such that e
F{>.) -
L F {>.) j
j=1
We
)k
XlI
I.
+ ... + >. -
We
19. Realization theory
403
is holomorphic and bounded in the whole complex plane. Therefore, by Liouville's theorem (applied to each entry of $F(\lambda) - \sum_{j=1}^{\ell} F_j(\lambda)$ separately),
$$
F(\lambda) - \sum_{j=1}^{\ell} F_j(\lambda) = D\,,
$$
a constant $p \times q$ matrix. Moreover,
$$
D = \lim_{\lambda \to \infty} F(\lambda)\,,
$$
and, by the preceding analysis, there exist matrices $C_j \in \mathbb{C}^{p\times n_j}$, $A_j \in \mathbb{C}^{n_j\times n_j}$, $B_j \in \mathbb{C}^{n_j\times q}$ and $n_j = pk_j$ for $j = 1, \ldots, \ell$, such that
$$
F_j(\lambda) = C_j(\lambda I_{n_j} - A_j)^{-1}B_j
\qquad\text{and}\qquad
n = n_1 + \cdots + n_\ell\,.
$$
Formula (19.3) for $F(\lambda)$ emerges upon setting
$$
C = [\,C_1 \ \cdots \ C_\ell\,]\,,\qquad A = \mathrm{diag}\{A_1, \ldots, A_\ell\}
\qquad\text{and}\qquad
B = \begin{bmatrix} B_1\\ \vdots\\ B_\ell \end{bmatrix}. \qquad\square
$$
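The construction in the first part of the proof is easy to carry out numerically. The sketch below (illustrative only; the scalar data $w$, $X_0$, $X_1$, $X_2$ are assumptions) assembles $A$, $B$, $C$, $D$ for a $1\times 1$ example of (19.4) with $k = 2$ and compares (19.3) with the partial fraction expansion at a test point.

```python
import numpy as np

# Scalar (p = q = 1) example with k = 2, w = 2 and assumed coefficients X0, X1, X2
w, X0, X1, X2 = 2.0, 0.5, 3.0, -1.0

A = np.array([[w, 1.0],
              [0.0, w]])            # the 2x2 Jordan cell used in the proof
B = np.array([[X1], [X2]])          # stacked coefficients X_1, ..., X_k
C = np.array([[1.0, 0.0]])          # [I_p  0 ... 0]
D = np.array([[X0]])

lam = 1.0 + 2.0j                    # any test point away from w
F_realization = D + C @ np.linalg.inv(lam * np.eye(2) - A) @ B
F_partial_fractions = X2 / (lam - w) ** 2 + X1 / (lam - w) + X0
print(F_realization[0, 0], F_partial_fractions)   # the two values coincide
```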
Formula (19.3) is called a realization of $F(\lambda)$. It is far from unique.

Exercise 19.1. Check that the mvf $F(\lambda)$ defined by formula (19.3) does not change if $C$ is replaced by $CS$, $A$ by $S^{-1}AS$ and $B$ by $S^{-1}B$ for some invertible matrix $S \in \mathbb{C}^{n\times n}$.

Exercise 19.2. Show that if $F_1(\lambda) = D_1 + C_1(\lambda I_{n_1} - A_1)^{-1}B_1$ is a $p \times q$ mvf and $F_2(\lambda) = D_2 + C_2(\lambda I_{n_2} - A_2)^{-1}B_2$ is a $q \times r$ mvf, then
$$
F_1(\lambda)\,F_2(\lambda) = D_3 + C_3(\lambda I_n - A_3)^{-1}B_3\,,
$$
where
$$
D_3 = D_1D_2\,,\qquad C_3 = [\,C_1 \quad D_1C_2\,]\,,\qquad
A_3 = \begin{bmatrix} A_1 & B_1C_2\\ 0 & A_2 \end{bmatrix},\qquad
B_3 = \begin{bmatrix} B_1D_2\\ B_2 \end{bmatrix}
$$
and $n = n_1 + n_2$.
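A quick numerical check of the product formulas in Exercise 19.2 (a sketch with randomly generated data; all matrices and dimensions are assumptions chosen only for illustration):

```python
import numpy as np
rng = np.random.default_rng(0)

n1, n2, p, q, r = 3, 2, 2, 2, 2
A1 = rng.standard_normal((n1, n1)); B1 = rng.standard_normal((n1, q))
C1 = rng.standard_normal((p, n1));  D1 = rng.standard_normal((p, q))
A2 = rng.standard_normal((n2, n2)); B2 = rng.standard_normal((n2, r))
C2 = rng.standard_normal((q, n2));  D2 = rng.standard_normal((q, r))

# Series-connection realization of F1(lambda) F2(lambda)
A3 = np.block([[A1, B1 @ C2], [np.zeros((n2, n1)), A2]])
B3 = np.vstack([B1 @ D2, B2])
C3 = np.hstack([C1, D1 @ C2])
D3 = D1 @ D2

lam = 0.7 + 0.3j
F1 = D1 + C1 @ np.linalg.inv(lam * np.eye(n1) - A1) @ B1
F2 = D2 + C2 @ np.linalg.inv(lam * np.eye(n2) - A2) @ B2
F3 = D3 + C3 @ np.linalg.inv(lam * np.eye(n1 + n2) - A3) @ B3
print(np.allclose(F1 @ F2, F3))   # True
```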
Exercise 19.3. Show that if $D \in \mathbb{C}^{p\times p}$ is invertible, then
$$
(19.5)\qquad \{D + C(\lambda I_n - A)^{-1}B\}^{-1} = D^{-1} - D^{-1}C\big(\lambda I_n - [A - BD^{-1}C]\big)^{-1}BD^{-1}\,.
$$
Let $C \in \mathbb{C}^{p\times n}$, $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{n\times q}$. Then the pair $(A, B)$ is said to be controllable if the controllability matrix $\mathfrak{C} = [\,B \quad AB \quad \cdots \quad A^{n-1}B\,]$ is right invertible, i.e., if $\mathrm{rank}\,\mathfrak{C} = n$.
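In floating point arithmetic the rank condition is typically tested with a numerical rank routine. The following sketch (the example matrices and the helper name controllability_matrix are assumptions, not from the text) builds the controllability matrix and checks that it has rank $n$:

```python
import numpy as np

def controllability_matrix(A, B):
    """Return [B, AB, ..., A^{n-1} B] for A of size n x n."""
    n = A.shape[0]
    blocks, M = [], B
    for _ in range(n):
        blocks.append(M)
        M = A @ M
    return np.hstack(blocks)

A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, -2.0, 3.0]])
b = np.array([[0.0], [0.0], [1.0]])

C_mat = controllability_matrix(A, b)
print(np.linalg.matrix_rank(C_mat) == A.shape[0])   # True: (A, b) is controllable
```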
The pair $(C, A)$ is said to be observable if the observability matrix
$$
\mathfrak{O} = \begin{bmatrix} C\\ CA\\ \vdots\\ CA^{n-1} \end{bmatrix}
$$
is left invertible, i.e., if its null space $\mathcal{N}_{\mathfrak{O}} = \{0\}$.

Exercise 19.4. Show that $(C, A)$ is an observable pair if and only if the pair $(A^H, C^H)$ is controllable.

Lemma 19.2. The following are equivalent:
(1) $(A, B)$ is controllable.

(2) $\mathrm{rank}\,[\,A - \lambda I_n \quad B\,] = n$ for every point $\lambda \in \mathbb{C}$.

(3) The rows of the mvf $(\lambda I_n - A)^{-1}B$ are linearly independent in the sense that if $u^H(\lambda I_n - A)^{-1}B = 0^H$ for $u \in \mathbb{C}^n$ and all $\lambda$ in some open nonempty subset $\Omega$ of $\mathbb{C}$ that does not contain any eigenvalues of $A$, then $u = 0$.

(4) $\int_0^t e^{sA}BB^He^{sA^H}\,ds \succ 0$ for every $t > 0$.

(5) For each vector $v \in \mathbb{C}^n$ and each $t > 0$, there exists a $q \times 1$ vector valued function $u(s)$ on the interval $0 \le s \le t$ such that
$$
\int_0^t e^{(t-s)A}Bu(s)\,ds = v\,.
$$

(6) The matrix $\mathfrak{C}\mathfrak{C}^H$ is invertible.
Proof. (1)$\Longrightarrow$(2). Let $u \in \mathbb{C}^n$ be orthogonal to the columns of the mvf $[\,A - \lambda I_n \quad B\,]$ for some point $\lambda \in \mathbb{C}$. Then
$$
u^HA = \lambda u^H \qquad\text{and}\qquad u^HB = 0^H\,.
$$
Therefore,
$$
u^HA^kB = \lambda^k u^HB = 0^H \quad\text{for } k = 0, \ldots, n-1\,,
$$
i.e., $u^H\mathfrak{C} = 0^H$. Thus, $u = 0$.
(2)$\Longrightarrow$(1). Suppose that $u^HA^kB = 0^H$ for $k = 0, \ldots, n-1$ for some nonzero vector $u \in \mathbb{C}^n$. Then $\mathcal{N}_{\mathfrak{C}^H} \ne \{0\}$ and hence, since $\mathcal{N}_{\mathfrak{C}^H}$ is invariant under $A^H$, $A^H$ has an eigenvector in this nullspace. Thus, there exists a nonzero vector $v \in \mathbb{C}^n$ and a point $\alpha \in \mathbb{C}$ such that
$$
A^Hv = \alpha v \qquad\text{and}\qquad \mathfrak{C}^Hv = 0\,.
$$
But this implies that
$$
v^H[\,A - \bar{\alpha}I_n \quad B\,] = 0^H\,,
$$
which is incompatible with (2). Thus, $\mathcal{N}_{\mathfrak{C}^H} = \{0\}$; i.e., $(A, B)$ is controllable.
(1)$\Longleftrightarrow$(3). This follows from the observation that
$$
u^H(\lambda I_n - A)^{-1}B = 0^H \ \text{ for } \lambda \in \Omega
\iff u^H(\lambda I_n - A)^{-1}B = 0^H \ \text{ for } \lambda \in \mathbb{C}\setminus\sigma(A)
$$
$$
\iff u^H\sum_{j=0}^{\infty}\frac{A^j}{\lambda^{j+1}}\,B = 0^H \ \text{ for } |\lambda| > \|A\|
\iff u^HA^kB = 0^H \ \text{ for } k = 0, \ldots, n-1
\iff u^H\mathfrak{C} = 0^H\,.
$$
The verification of (4), (5) and (6) is left to the reader. $\square$
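Condition (2) — the Hautus test — only has to be checked at the eigenvalues of $A$, since $[\,A - \lambda I_n \quad B\,]$ automatically has rank $n$ when $\lambda \notin \sigma(A)$. A small numerical sketch of this test (the example matrices are assumptions chosen so that the test fails):

```python
import numpy as np

def is_controllable_hautus(A, B, tol=1e-10):
    """Check rank [A - lambda*I, B] = n at every eigenvalue lambda of A."""
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        M = np.hstack([A - lam * np.eye(n), B])
        if np.linalg.matrix_rank(M, tol) < n:
            return False
    return True

A = np.diag([1.0, 2.0, 2.0])
B = np.array([[1.0], [1.0], [0.0]])
print(is_controllable_hautus(A, B))
# False: a single input cannot control an eigenvalue of geometric multiplicity two
```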
Exercise 19.5. Show that $(A, B)$ is controllable if and only if condition (4) in Lemma 19.2 is met.

Exercise 19.6. Show that $(A, B)$ is controllable if and only if condition (5) in Lemma 19.2 is met.

Exercise 19.7. Show that $(A, B)$ is controllable if and only if $\mathfrak{C}\mathfrak{C}^H$ is invertible.

Lemma 19.3. The following are equivalent:

(1) $(C, A)$ is observable.

(2) $\mathrm{rank}\begin{bmatrix} A - \lambda I_n\\ C \end{bmatrix} = n$ for every point $\lambda \in \mathbb{C}$.

(3) The columns of the mvf $C(\lambda I_n - A)^{-1}$ are linearly independent in the sense that if $u \in \mathbb{C}^n$ and $C(\lambda I_n - A)^{-1}u = 0$ for all points $\lambda$ in some open nonempty subset $\Omega$ of $\mathbb{C}$ that does not contain any eigenvalues of $A$, then $u = 0$.

(4) $\int_0^t e^{sA^H}C^HCe^{sA}\,ds \succ 0$ for every $t > 0$.

(5) For each vector $v \in \mathbb{C}^n$ and each $t > 0$, there exists a $p \times 1$ vector valued function $u(s)$ on the interval $0 \le s \le t$ such that
$$
\int_0^t e^{(t-s)A^H}C^Hu(s)\,ds = v\,.
$$

(6) The matrix $\mathfrak{O}^H\mathfrak{O}$ is invertible.
Proof. (1)$\Longrightarrow$(2). Let
$$
\begin{bmatrix} A - \lambda I_n\\ C \end{bmatrix}u = 0
$$
for some vector $u \in \mathbb{C}^n$ and some point $\lambda \in \mathbb{C}$. Then $Au = \lambda u$ and $Cu = 0$. Therefore, $CA^ku = \lambda^kCu = 0$ for $k = 1, 2, \ldots$ also, and hence $u = 0$ by (1).

(2)$\Longrightarrow$(1). Clearly $\mathcal{N}_{\mathfrak{O}}$ is invariant under $A$. Therefore, if $\mathcal{N}_{\mathfrak{O}} \ne \{0\}$, then it contains an eigenvector of $A$. But this means that there is a nonzero vector $v \in \mathcal{N}_{\mathfrak{O}}$ such that $Av = \alpha v$, and hence that
$$
\begin{bmatrix} A - \alpha I_n\\ C \end{bmatrix}v = 0\,.
$$
But this is incompatible with (2). Therefore, $(C, A)$ is observable.

(3)$\Longleftrightarrow$(1). This follows from the observation that
$$
(19.6)\qquad C(\lambda I_n - A)^{-1}u = 0 \ \text{ for } \lambda \in \Omega
\iff CA^ku = 0 \ \text{ for } k = 0, \ldots, n-1\,.
$$
The details and the verification that (4), (5) and (6) are each equivalent to observability are left to the reader. $\square$

Exercise 19.8. Verify the equivalence (19.6) and then complete the proof that (3) is equivalent to (1) in Lemma 19.3.

Exercise 19.9. Show that in Lemma 19.3, (4) is equivalent to (1).

Exercise 19.10. Show that in Lemma 19.3, (5) is equivalent to (1).

Exercise 19.11. Show that the pair $(C, A)$ is observable if and only if the matrix $\mathfrak{O}^H\mathfrak{O}$ is invertible.

Exercise 19.12. Let $F(\lambda) = I_p + C(\lambda I_n - A)^{-1}B$ and $G(\lambda) = I_p - C_1(\lambda I_n - A_1)^{-1}B_1$. Show that if $C_1 = C$ and $(C, A)$ is an observable pair, then $F(\lambda)G(\lambda) = I_p$ if and only if $B_1 = B$ and $A_1 = A - BC$.

A realization $F(\lambda) = D + C(\lambda I_n - A)^{-1}B$ of a $p \times q$ rational nonconstant mvf $F(\lambda)$ is said to be an observable realization if the pair $(C, A)$ is observable; it is said to be a controllable realization if the pair $(A, B)$ is controllable.

Theorem 19.4. Let $F(\lambda)$ be a nonconstant proper rational $p \times q$ mvf such that
$$
F(\lambda) = D_1 + C_1(\lambda I_{n_1} - A_1)^{-1}B_1 = D_2 + C_2(\lambda I_{n_2} - A_2)^{-1}B_2\,,
$$
and suppose that both of these realizations are controllable and observable. Then:
(1) $D_1 = D_2$ and $n_1 = n_2$.

(2) There exists exactly one invertible $n_1 \times n_1$ matrix $Y$ such that $C_1 = C_2Y$, $A_1 = Y^{-1}A_2Y$ and $B_1 = Y^{-1}B_2$.

Proof. It is readily checked that
$$
F(\infty) = D_1 = D_2
\qquad\text{and that}\qquad
C_1(\lambda I_{n_1} - A_1)^{-1}B_1 = C_2(\lambda I_{n_2} - A_2)^{-1}B_2\,.
$$
Let
$$
\widetilde{\mathfrak{O}}_2 = \begin{bmatrix} C_2\\ C_2A_2\\ \vdots\\ C_2A_2^{\,n_1-1} \end{bmatrix}
\qquad\text{and}\qquad
\widetilde{\mathfrak{C}}_2 = [\,B_2 \quad A_2B_2 \quad \cdots \quad A_2^{\,n_1-1}B_2\,]\,,
$$
and bear in mind that $\widetilde{\mathfrak{C}}_2 \ne \mathfrak{C}_2$ and $\widetilde{\mathfrak{O}}_2 \ne \mathfrak{O}_2$ unless $n_2 = n_1$. Nevertheless, the identity
$$
\mathfrak{O}_1\mathfrak{C}_1 = \widetilde{\mathfrak{O}}_2\widetilde{\mathfrak{C}}_2
$$
holds. Moreover, under the given assumptions, the observability matrix $\mathfrak{O}_1$ is left invertible, whereas the controllability matrix $\mathfrak{C}_1$ is right invertible. Thus,
$$
\mathrm{rank}(\mathfrak{O}_1) = \mathrm{rank}(\mathfrak{O}_1\mathfrak{C}_1\mathfrak{C}_1^H) \le \mathrm{rank}(\mathfrak{O}_1\mathfrak{C}_1) \le \mathrm{rank}(\mathfrak{O}_1) = n_1\,,
$$
which implies in turn that
$$
n_1 = \mathrm{rank}(\mathfrak{O}_1\mathfrak{C}_1) = \mathrm{rank}(\widetilde{\mathfrak{O}}_2\widetilde{\mathfrak{C}}_2) \le n_2\,.
$$
However, since the roles played by the two realizations in the preceding analysis can be reversed, we must also have $n_2 \le n_1$. Therefore, equality prevails and so $\widetilde{\mathfrak{O}}_2 = \mathfrak{O}_2$ and $\widetilde{\mathfrak{C}}_2 = \mathfrak{C}_2$. Next, observe that the identity
$$
\mathfrak{O}_1B_1 = \mathfrak{O}_2B_2
$$
implies that $B_1 = XB_2$, where $X = (\mathfrak{O}_1^H\mathfrak{O}_1)^{-1}\mathfrak{O}_1^H\mathfrak{O}_2$. Similarly, the identity
$$
C_1\mathfrak{C}_1 = C_2\mathfrak{C}_2
$$
implies that $C_1 = C_2Y$, where $Y = \mathfrak{C}_2\mathfrak{C}_1^H(\mathfrak{C}_1\mathfrak{C}_1^H)^{-1}$.
Moreover,
$$
XY = (\mathfrak{O}_1^H\mathfrak{O}_1)^{-1}\mathfrak{O}_1^H\mathfrak{O}_2\,\mathfrak{C}_2\mathfrak{C}_1^H(\mathfrak{C}_1\mathfrak{C}_1^H)^{-1}
= (\mathfrak{O}_1^H\mathfrak{O}_1)^{-1}\mathfrak{O}_1^H\mathfrak{O}_1\,\mathfrak{C}_1\mathfrak{C}_1^H(\mathfrak{C}_1\mathfrak{C}_1^H)^{-1}
= I_{n_1}\,.
$$
Thus, $X = Y^{-1}$.
Finally, from the formula
$$
\mathfrak{O}_1A_1\mathfrak{C}_1 = \mathfrak{O}_2A_2\mathfrak{C}_2
$$
we obtain
$$
A_1 = (\mathfrak{O}_1^H\mathfrak{O}_1)^{-1}\mathfrak{O}_1^H\mathfrak{O}_2\,A_2\,\mathfrak{C}_2\mathfrak{C}_1^H(\mathfrak{C}_1\mathfrak{C}_1^H)^{-1}
= XA_2Y = Y^{-1}A_2Y\,. \qquad\square
$$
Exercise 19.13. Verify the asserted uniqueness of the invertible matrix Y that is constructed in the proof of (2) of Theorem 19.4.
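The matrix $Y$ of assertion (2) can be computed exactly as in the proof. The sketch below (illustrative; the particular realization and the similarity matrix $S$ are assumptions) builds a second realization from a minimal one via Exercise 19.1, recovers $Y = \mathfrak{C}_2\mathfrak{C}_1^H(\mathfrak{C}_1\mathfrak{C}_1^H)^{-1}$, and verifies the three identities in (2).

```python
import numpy as np

def ctrb(A, B):
    """Controllability matrix [B, AB, ..., A^{n-1} B]."""
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

# A minimal (controllable and observable) realization, chosen for illustration
A1 = np.array([[0.0, 1.0], [-2.0, -3.0]])
B1 = np.array([[0.0], [1.0]])
C1 = np.array([[1.0, 0.0]])

# A second realization of the same F(lambda), obtained as in Exercise 19.1
S = np.array([[2.0, 1.0], [0.5, 3.0]])            # any invertible matrix works
A2, B2, C2 = np.linalg.inv(S) @ A1 @ S, np.linalg.inv(S) @ B1, C1 @ S

# The matrix Y constructed in the proof of Theorem 19.4
K1, K2 = ctrb(A1, B1), ctrb(A2, B2)
Y = K2 @ K1.T @ np.linalg.inv(K1 @ K1.T)

print(np.allclose(C1, C2 @ Y),
      np.allclose(A1, np.linalg.inv(Y) @ A2 @ Y),
      np.allclose(B1, np.linalg.inv(Y) @ B2))      # True True True
```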
19.1. Minimal realizations

Let
$$
(19.7)\qquad F(\lambda) = D + C(\lambda I_n - A)^{-1}B
$$
for some choice of the matrices $C \in \mathbb{C}^{p\times n}$, $A \in \mathbb{C}^{n\times n}$, $B \in \mathbb{C}^{n\times q}$ and $D \in \mathbb{C}^{p\times q}$. Then this realization is said to be minimal if the integer $n$ is as small as possible, and then the number $n$ is termed the McMillan degree of $F(\lambda)$.
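In view of the criterion established in Theorem 19.5 below, minimality can be tested numerically by computing the ranks of the controllability and observability matrices. A minimal sketch (the helper names and the example realization are assumptions, not from the text):

```python
import numpy as np

def ctrb(A, B):
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

def obsv(C, A):
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

def is_minimal(A, B, C):
    """Minimal iff (A, B) is controllable and (C, A) is observable."""
    n = A.shape[0]
    return (np.linalg.matrix_rank(ctrb(A, B)) == n and
            np.linalg.matrix_rank(obsv(C, A)) == n)

# A non-minimal 3-state realization of a transfer function of McMillan degree 2
A = np.diag([-1.0, -2.0, -3.0])
B = np.array([[1.0], [1.0], [0.0]])   # the third state is unreachable
C = np.array([[1.0, 1.0, 1.0]])
print(is_minimal(A, B, C))            # False
```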
Theorem 19.5. A realization (19.7) for a proper rational nonconstant function $F(\lambda)$ is minimal if and only if the pair $(C, A)$ is observable and the pair $(A, B)$ is controllable.

For ease of future reference, it is convenient to first prove two preliminary lemmas that are of independent interest.

Lemma 19.6. The controllability matrix $\mathfrak{C}$ has rank $k < n$ if and only if there exists an invertible matrix $T$ such that

(1) $T^{-1}AT = \begin{bmatrix} A_{11} & A_{12}\\ 0 & A_{22} \end{bmatrix}$, $\ T^{-1}B = \begin{bmatrix} B_1\\ 0 \end{bmatrix}$, where $A_{11} \in \mathbb{C}^{k\times k}$, $B_1 \in \mathbb{C}^{k\times q}$, and

(2) the pair $(A_{11}, B_1)$ is controllable.
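Before turning to the proof, here is a numerical sketch of the decomposition (illustrative only; the example matrices are assumptions, and an orthonormal basis of the column space of $\mathfrak{C}$ is used for $X$ in place of a selection of columns of $\mathfrak{C}$, which yields the same block structure):

```python
import numpy as np
from scipy.linalg import orth, null_space

def ctrb(A, B):
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

A = np.array([[1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
B = np.array([[1.0], [0.0], [0.0]])

K = ctrb(A, B)
X = orth(K)                 # basis for the column space of the controllability matrix
Y = null_space(K.T)         # any complement works; here, the orthogonal one
T = np.hstack([X, Y])
At = np.linalg.inv(T) @ A @ T
Bt = np.linalg.inv(T) @ B
k = X.shape[1]
print(k, np.allclose(At[k:, :k], 0), np.allclose(Bt[k:], 0))   # 2 True True
```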
Proof. Suppose first that $\mathfrak{C}$ has rank $k < n$ and let $X$ be an $n \times k$ matrix whose columns are selected from the columns of $\mathfrak{C}$ in such a way that
$$
\mathrm{rank}\,X = \mathrm{rank}\,\mathfrak{C} = k\,.
$$
Next, let $\ell = n - k$ and let $Y$ be an $n \times \ell$ matrix such that the $n \times n$ matrix
$$
T = [\,X \quad Y\,]
$$
is invertible and express $T^{-1}$ in block row form as
$$
T^{-1} = \begin{bmatrix} U\\ V \end{bmatrix},
$$
where $U \in \mathbb{C}^{k\times n}$ and $V \in \mathbb{C}^{\ell\times n}$. Then, the formula
$$
I_n = T^{-1}T = \begin{bmatrix} U\\ V \end{bmatrix}[\,X \quad Y\,]
= \begin{bmatrix} UX & UY\\ VX & VY \end{bmatrix}
$$
implies that
$$
UX = I_k\,,\quad UY = 0_{k\times\ell}\,,\quad VX = 0_{\ell\times k}
\quad\text{and}\quad VY = I_\ell\,.
$$
Moreover, since
$$
(19.8)\qquad A\mathfrak{C} = \mathfrak{C}E\,,\quad \mathfrak{C} = XF\,,\quad X = \mathfrak{C}G
\quad\text{and}\quad B = \mathfrak{C}L
$$
for appropriate choices of the matrices $E$, $F$, $G$ and $L$, it follows easily that
$$
AX = XFEG \qquad\text{and}\qquad B = XFL\,.
$$
Therefore,
$$
VAX = VXFEG = 0 \qquad\text{and}\qquad VB = VXFL = 0\,.
$$
Thus, in a self-evident notation,
$$
T^{-1}AT = \begin{bmatrix} U\\ V \end{bmatrix}A[\,X \quad Y\,]
= \begin{bmatrix} UAX & UAY\\ VAX & VAY \end{bmatrix}
= \begin{bmatrix} A_{11} & A_{12}\\ 0 & A_{22} \end{bmatrix}
\qquad\text{and}\qquad
T^{-1}B = \begin{bmatrix} UB\\ VB \end{bmatrix} = \begin{bmatrix} B_1\\ 0 \end{bmatrix},
$$
where $A_{11} \in \mathbb{C}^{k\times k}$ and $B_1 \in \mathbb{C}^{k\times q}$. Furthermore, since $T^{-1}A^jB = (T^{-1}AT)^jT^{-1}B$ for $j = 0, 1, \ldots, n-1$, it is readily checked that
$$
T^{-1}\mathfrak{C} = \begin{bmatrix} B_1 & A_{11}B_1 & \cdots & A_{11}^{\,n-1}B_1\\ 0 & 0 & \cdots & 0 \end{bmatrix}
$$
and hence that
$$
k = \mathrm{rank}\,\mathfrak{C} = \mathrm{rank}\,T^{-1}\mathfrak{C}
= \mathrm{rank}\,[\,B_1 \quad A_{11}B_1 \quad \cdots \quad A_{11}^{\,n-1}B_1\,]\,.
$$
Thus, $(A_{11}, B_1)$ is controllable. This completes the proof in one direction. The converse is easy. $\square$

Exercise 19.14. Verify the assertions in formula (19.8).

Lemma 19.7. The observability matrix $\mathfrak{O}$ has rank $k < n$ if and only if there exists an invertible matrix $T$ such that

(1) $CT = [\,C_1 \quad 0\,]$, $\ T^{-1}AT = \begin{bmatrix} A_{11} & 0\\ A_{21} & A_{22} \end{bmatrix}$, where $C_1 \in \mathbb{C}^{p\times k}$, $A_{11} \in \mathbb{C}^{k\times k}$, and

(2) the pair $(C_1, A_{11})$ is observable.

Proof. Suppose first that the observability matrix $\mathfrak{O}$ has rank $k < n$ and let $U$ be a $k \times n$ matrix whose rows are selected from the rows of $\mathfrak{O}$ in such a way that
$$
\mathrm{rank}\,U = \mathrm{rank}\,\mathfrak{O} = k\,.
$$
Then, let $\ell = n - k$ and let $V$ be any $\ell \times n$ matrix such that the $n \times n$ matrix $\begin{bmatrix} U\\ V \end{bmatrix}$ is invertible, and set
$$
T = [\,X \quad Y\,] = \begin{bmatrix} U\\ V \end{bmatrix}^{-1},
$$
where $X \in \mathbb{C}^{n\times k}$ and $Y \in \mathbb{C}^{n\times\ell}$. The next step is to check that
$$
CY = 0 \qquad\text{and}\qquad UAY = 0\,.
$$
The details may be worked out by adapting the arguments used to prove Lemma 19.6 and are left to the reader, as is the rest of the proof. $\square$

Exercise 19.15. Complete the proof of Lemma 19.7.

Proof of Theorem 19.5. If the realization (19.7) is not controllable, then, by Lemma 19.6, there exists an invertible matrix $T \in \mathbb{C}^{n\times n}$ such that
$$
T^{-1}AT = \begin{bmatrix} A_{11} & A_{12}\\ 0 & A_{22} \end{bmatrix}
\qquad\text{and}\qquad
T^{-1}B = \begin{bmatrix} B_1\\ 0 \end{bmatrix}
$$
and the pair $(A_{11}, B_1) \in \mathbb{C}^{k\times k} \times \mathbb{C}^{k\times q}$ is controllable. Let
CT= [C1 C2]. Then, as
$$
(\lambda I_n - T^{-1}AT)^{-1} =
\begin{bmatrix} (\lambda I_k - A_{11})^{-1} & (\lambda I_k - A_{11})^{-1}A_{12}(\lambda I_\ell - A_{22})^{-1}\\ 0 & (\lambda I_\ell - A_{22})^{-1} \end{bmatrix},
$$
it is easily seen that
$$
[\,C_1 \quad C_2\,](\lambda I_n - T^{-1}AT)^{-1}\,T^{-1}B = C_1(\lambda I_k - A_{11})^{-1}B_1\,,
$$
which contradicts the presumed minimality of the original realization. Therefore, a minimal realization must be controllable. The proof that a minimal realization must be observable follows from Lemma 19.7 in much the same way. Conversely, a realization that is both controllable and observable must be minimal, thanks to Theorem 19.4 and the fact that a minimal realization 0 must be controllable and observable, as was just proved above. Exercise 19.16. Let (G, A) E C pxn x c nxn be an observable pair and let u E cn. Show that G('\In _A)-lu has a pole at a if and only if ('\In _A)-IU has a pole at a. [HINT: First show that it suffices to focus on the case that A = Gin) is a single Jordan cell.] Exercise 19.17. Let (G, A) E C pxn x c nxn be an observable pair and let u(,\) = Uo + '\Ul + ... + ,\kUk be a vector polynomial with coefficients in cn. Show that G('\In - A)-lu('\) has a pole at a if and only if ('\In - A)-lu('\) has a pole at a. [HINT: Try Exercise 19.16 first to warm up.] Theorem 19.8. Let the realization
F('\) = D + G('\In - A)-1 B be minimal. Then the poles of F('\) coincide with the eigenvalues of A. Discussion. Since F('\) is holomorphic in C \ O'(A), every pole of F(A) must be an eigenvalue of A. To establish the converse statement, there is no loss of generality in assuming that A is in Jordan form. Suppose, for the sake of definiteness, that A = diag {G~~), G~;)}
with
WI
i= W2 .
Then, upon writing C
= [Cl,'" ,C5]
and
BT
= [bl,'" , b 5] ,
one can readily see that G('\h - A)-1 B
=
+
cl b
I + cl b f + c2 b I + cl b f + c2 b f + c3 b I
(,\ - Wl)3 c4b
g
(,\ - W2)2
(A - WI)
(,\ - wI)2
C4br + c5b
g
+-,....;;---,-~
(,\ - W2)
19. Realization theory
412
Moreover, item (2) in Lemma 19.3 implies that -1 A - WI
A - WI rank
0 0 0 0 C1
0 0 0 C2
0 -1 A -WI 0 0 C3
A - W2
0 0 0 -1
0 C4
A - W2 C5
0 0 0
=n
for every point A E C and hence, in particular that the vectors C1 and C4 are both different from zero. Similarly, item (2) in Lemma 19.2 implies that the vectors hI and hg are both nonzero. Therefore, the matrices C1 hI and c4bg are nonzero, and thus both WI and W2 are poles of F(A). D Theorem 19.8 does not address multiplicities of the poles versus the multiplicities of the eigenvalues. This issue is taken up in Theorem 19.9. Theorem 19.9. Let F(A) = D + C(AIn - A)-l B be an m x l rational mvf and suppose that A E c nxn has k distinct eigenvalues AI, ... ,Ak with algebraic multiplicities 01. ... ,Ok, respectively. Let F~j) J
OJ =
0
F U)
F,(j)
1
O!j-l
F~j)
F,Cj) 2
J
0
0
F~j) J
be the block Toeplitz matrix based on the matrix coefficients FiU) of (A Aj ) -i, i = 1, . .. ,OJ, in the Laurent expansion of F (A) in the vicinity of the point Aj. Then the indicated realization of F(A) is minimal ~f n.nrl. nnly if
rank OJ Proof.
= OJ for j = 1, ... ,k.
Let
A = =
UJU- 1 [U1· .. u.]
[Jl. J[J.l
- EJ=l UjJj Vj, where Uj E C nxaj, Vj E C O!j xn and J j E C O!j XO!j are the components in a Jordan decomposition of A that are connected with Aj for j = 1, ... ,k. Then, just as in Chapter 17, if i = j if i=!=j
19.1. Minimal realizations
413
and the n x n matrices
Pj
pJ =
are projections, i.e., Jordan cells.
= Uj Vj, j = 1, ... , k, even though here Jj may be include several
Pj,
The formula C(>"ln - A)-l B
-
CU(>"ln - J)-lU- 1 B
- c C
{t {t
Ui(AI", - Ji)-lV; } B
U,((A - A,)!", - N,)-l V; } B
implies that the poles in the Laurent expansion of F(>") in the vicinity of the point >"j are given by CU·((>.. - >..J·)1a ]. - N·)-lVB J J J _ {la; -CUj >"_>"j
Nj
N?-l
+ (>"_>"j)2 + ... + {>"_>"j)a;
Con::>equently,
XN"Y
nj =
[
o
XN",-ly XN"'y
o
0
with X = CUj, Y = VjB, N = N j and K = for j > K,
(19.9)
nj
=
XN Xl [: _ XN'"
Next, let (19.1O)
~
[N"'Y
.. . .. .
aj
}
VjB.
I :X.: XY
-1. Therefore, since Nj = 0
~
N",-ly
... Y].
414 for j
19. Realization theory
= 1, . ..
,k. It is readily checked that
(C, A) is observable {:::::} rank OJ =
aj
for j = 1,··· ,k
(A, B) is controllable {:::::} rank (tj =
aj
for j
(19.11) and (19.12)
= 1, ...
,k.
Moreover, in view of Exercises 19.20 and 19.21 (below), the ranks of the two factors in formula (19.9) for OJ are equal to rank OJ and rank (tj, respectively. Therefore, rank (OJ) ::; min{rank OJ, rank (tj} ::; aj, and hence as it is readily seen that rank OJ
= aj {=} rank OJ = aj and rank (tj = aj.
Therefore, the indicated realization is minimal if and only if rank OJ for j = 1, ... ,k.
=
aj
0
Exercise 19.18. Show that the coefficients Fji) of (A-Aj)-i in the Laurent expansion of the matrix valued function F(A) considered in Theorem 19.9 are given by the formula
Fij)
= CPj(A - AjIn)i-I B for £ = 1,··· ,aj,
where Pj = Uj Vj is the projector defined in the proof of the theorem. Exercise 19.19. Show that the projector Pj defined in the proof of Theorem 19.9 is the lliesz projector that is defined by the formula 1 . { (AIn - A)-IdA, Pj = -2 7rZ
Jrj
if r j is a small enough circle centered at Aj and directed counterclockwise. Exercise 19.20. Show that if C E C pxn and A E C nxn, then
rank
CA
[c ~Ai
I
= rank
[CC(aIn + A) 1 C(aIn + A)i
for every positive integer £ and every point a E C. Exercise 19.21. Show that if A E
c nxn and B
E
C nxq , then
rank [B AB ... AiB] = rank [B (aIn + A)B ... (aIn for every positive integer £ and every point a E C.
+ A)iB]
19.2. Stabilizable and detectable realizations
415
19.2. Stabilizable and detectable realizations
c
c
A pair of matrices $(A, B) \in \mathbb{C}^{n\times n} \times \mathbb{C}^{n\times r}$ is said to be stabilizable if there exists a matrix $K \in \mathbb{C}^{r\times n}$ such that $\sigma(A + BK) \subset \Pi_-$.
c
Lemma 19.10. Let (A,B) E
c nxn x c nxr .
Then the following two condi-
tions are equivalent: (1) (A, B) is stabilizable. (2) rank [A - >..In B] = n for every point>.. E II+. Proof. Suppose first that (A, B) is stabilizable and that u(A+BK) C II_ for some K E C rxn. Then the formula
B] = [A->..In
[A+BK ->..In
B] [IKn
0] Ir
clearly implies that
B] = rank [A + BK - >..In
rank [A - >..In
B]
and hence that (1) ====? (2), since A + BK - >..In is invertible for>.. E II+. Suppose next that (2) is in force. Then, by Theorem 18.13 there exists a solution X of the Riccati equation (18.17) with C = In such that u(A BBH X) c II_. Therefore, the pair (A, B) is stabilizable. 0 A pair of matrices (C, A) E c mxn x c nxn is said to be detectable if there exists a matrix L E c nxm such that u(A + LC) c II_.
Lemma 19.11. Let (C, A) E
c mxn x c nxn .
Then the following two con-
ditions are equivalent: (1) (C, A) is detectable. (2) rank [A->"In] C
.>.. = n for every pomt E II+.
Proof. Suppose first that u(A + LC) the formula
[~ I~]
c
II_ for some L E
[ A -C>..In ] = [ A + L~ - >..In ]
clearly implies that
A - >..In ] _ k [ A rank [ C -ran
+ LC C
for every point >.. E C and hence that (1) ====? (2). Suppose next that (2) is in force. Then rank [AH - >..In
C H] = n
>..In ]
c nxm .
Then
19. Realization theory
416
for every point A E II+, and hence, by Theorem 18.13 with G
=
[AH -In
-OHC] -A '
there exists a Hermitian solution X of the Riccati equation AX +XAH _XCHCX +In = 0
such that Therefore u(A - XOHO)
c II_.
o Exercise 19.22. Show that (A, B) is stabilizable if and only if (BH, AH) is detectable. Exercise 19.23. Show that if (C, A) is detectable and (A, B) is stabilizable, then there exist matrices K and L such that
U([L~ A-~~~LC]) cII_. [HINT: [In In
0]
-In
-1 [
A - B K ] [In LO A-BK-LC In
0]
-In -
[A - BK
0
BK ] A-LC
.J
19.3. Reproducing kernel Hilbert spaces A Hilbert space 'It of complex m x 1 vector valued functions that are defined on a nonempty subset n of e is said to be a reproducing kernel Hilbert space if there exists an m x m mvf Kw(A) that is defined on n x n such that for every choice of wEn and u E em the following two conditions are fulfilled: (1) Kwu E 'It.
(2) (I, Kwu}Ji = u H f(w) for every f
E
'It.
An m x m mvf Kw(A) that meets these two conditions is called a reproducing kernel for 'It. Lemma 19.12. Let 'It be a reproducing kernel Hilbert space of em valued functions that are defined on a nonempty subset n of e with reproducing kernel Kw(A). Then (1) The reproducing kernel is unique; i.e., if Lw(A) is also a reproducing kernel for 'It, then Kw (A) = Lw (A) for all points A, wEn. (2) Kw(A) = K>.(w)H for all points A, WEn.
19.3. Reproducing kernel Hilbert spaces
417
(3) I:~j=l uf KWj (Wi)Uj ~ 0 for every choice of the points WI, ... and the vectors
Ul ... ,Un E
,Wn E
n
em.
Proof. If Lw(>") and Kw(>") are reproducing kernels for the reproducing kernel Hilbert space 1t, then
v HLa ({3)u -
(Lau, K{3v)7-£ = (K{3v, L au)7-£ u HK{3(a)v = (u HK{3(a)v)H vHK{3(a)Hu
for every choice of u and
v in em. Therefore,
for every choice of a and {3 in
n.
In particular, this implies that
and hence, upon invoking both of the last two identities, that
for every choice of a and (3 in n. This completes the proof of (1) and (2). The third assertion is left to the reader. D Exercise 19.24. Justify assertion (3) of Lemma 19.12. In this section we shall focus on a class of finite dimensional reproducing kernel Hilbert spaces. Theorem 19.13. Every finite dimensional Hilbert space 1t of strictly proper rational m x 1 vector valued functions can be identified as a space (19.13)
M(X) = Mr(X) = {F(>..)Xu: u
E
en},
endowed with the inner product
(FXu, FXV)M(X) = vHXu,
(19.14) where
(19.15)
r = (C, A) E e mxn x e nxn
is an observable pair,
F(>") = C(>..In - A)-I
(19.16) and
(19.17)
X t 0
is an n x n matrix with
rank X = dim 1t.
19. Realization theory
418
Proof. Let {fl(.~)"" ,fr(),)} be a basis for an r dimensional inner product space M of strictly proper m x 1 rational vector valued functions. Then, in view of Theorem 19.1, there exists a set of matrices C E e mxn , A E e nxn and BEe nxr such that the m x r matrix valued function with columns f 1 (),), ... , fr(),) admits a minimal realization of the form [f1 (),)
..•
fr (),)] = C()'In - A)-l B.
Moreover, if C denotes the Gram matrix with entries gij
= (fj,fi)M, for
i,j
= 1, ... ,r,
and F()') = C()'In - A)-I, then 1i = {F{),)Bu : u E
er }
and (19.18)
(F Bx, F BY)1i = yH Cx
for every choice of x, y E
er .
Let X = BC-1BH. Then it is readily checked that NBH = N x and hence that'RB = 'Rx. Thus X t 0, rank X = r and formulas (19.13) and (19.14) drop out easily upon taking x = C- 1 BH u and y = C- 1 BH v in formula 0
(19.18).
Exercise 19.25. Verify formulas (19.13) and (19.14) and check that the inner product in the latter is well defined; i.e., if FXUl = FXU2 and FXVI = FXV2, then V{iXUl = V!jXU2. Theorem 19.14. The space M(X) defined by formula (19.13) endowed with the inner product (19.14) is a reproducing kernel Hilbert space with reproducing kernel (19.19)
K~()') = F()')XF{w)H.
Proof. Clearly Ktt{),)u E M (as a function of ),) for every choice of u E em and wEn. Let f E M. Then f{),) = F()')Xv for some vector Then, in view of formula (19.18), v E
en.
(f,K;;:U)M =
(FXv,FXF{w)Hu)M u HF{w)Xv = u H f{w) ,
for every vector u E em and every point WEn, as needed.
19.4. de Branges spaces A matrix J E (19.20)
e mxm is said to be a signature matrix if
o
19. Realization theory
418
Proof. Let {fl(,x), ... , fr(,x)} be a basis for an r dimensional inner product space M of strictly proper m x 1 rational vector valued functions. Then, in view of Theorem 19.1, there exists a set of matrices C E em x n, A E en x n and B E e nxr such that the m x r matrix valued function with columns fl (,x), ... ,frCX) admits a minimal realization of the form [fl(,x) ...
fr(,x)] = C(,xln - A)-1 B.
Moreover, if C denotes the Gram matrix with entries gij
and F(,x)
= (fj,fi)M,
for
i,j
= 1, ... ,r,
= C(,xln - A)-I, then 'H = {F(,x)Bu:
u E
er }
and (19.18)
(F Bx, F By)'H. = yH Cx for every choice of x, y
E
e
r .
Let X = BC-l BH. Then it is readily checked that NBB = N x and hence that RB = Rx. Thus X t 0, rank X = r and formulas (19.13) and (19.14) drop out easily upon taking x = C- 1 BH u and y = C- 1 BH v in formula (19.18). 0
Exercise 19.25. Verify formulas (19.13) and (19.14) and check that the inner product in the latter is well defined; i.e., if FXUl = FXU2 and FXVI = FXV2, then v{f XUl = v!f XU2. Theorem 19.14. The space M(X) defined by formula (19.13) endowed with the inner product (19.14) is a reproducing kernel Hilbert space with reproducing kernel (19.19)
K~(,x)
= F(,x)XF(w)H .
Clearly K~(,x)u E M (as a function of ,x) for every choice of u E em and wEn. Let f E M. Then f(,x) = F(,x)Xv for some vector v E en. Then, in view of formula (19.18),
Proof.
(j,K!:U)M =
(FXv,FXF(w)Hu)M u H F(w)Xv = u Hf(w) ,
for every vector u E em and every point wEn, as needed.
19.4. de Branges spaces A matrix J E (19.20)
e mxm is said to be a signature matrix if
o
19.4. de Branges spaces
419
Exercise 19.26. Show that if J E c mxm is a signature matrix, then either J = ±Im or J = Udiag{Ip, -Iq}U H , with U unitary, p ~ 1, q ~ 1 and p+q = m. The finite dimensional reproducing kernel Hilbert space M(X) will be called a de Branges space H(8) if there exists a proper rational m x m mvf 8('x) and an m x m signature matrix J such that
F('x)XF(w)H = J - 8flJ:(w)H
(19.21)
Theorem 19.15. The finite dimensional reproducing kernel Hilbert space M(X) is a de Branges space H(8) if and only if the Hermitian matrix X is a solution of the Riccati equation (19.22) X AH + AX + XC HJCX = O. Moreover, if X is a solution of the Riccati equation (19.22)' then 8('x) is uniquely specified by the formula 8('x) = 1m - F('x)XC HJ ,
(19.23)
up to a constant factor K E C mxm on the right, which is subject to the constraint
KJKH=J. Proof. Suppose first that there exists a proper rational m x m matrix valued function 8('x) that satisfies the identity (19.21). Then (19.24)
(,X
+ w)F('x)X(wln - AH)-lCH = J - 8('x)J8(w)H ,
and hence, upon letting w tend to infinity, it follows that
F('x)XC H = J - 8('x)J8(00)H. The identity (19.24) also implies that
J - 8('x)J8( _X)H = 0 and consequently, upon letting ,X = iv with v E JR, that
8(iv)J8(iv)H = J and hence that
8(00)J8(00)H = J. Thus, 8(00) is invertible and the last formula can also be written as
8(00)HJ8(00) = J. But this in turn implies that
F('x)XC HJ8(00) = =
(J - 8('x)J8(00)H)J8(00) 8(00) - 8('x).
19. Realization theory
420
Therefore, 8(>') is uniquely specified by the formula
=
8(>')
(Im - F(>')XC H J)8( 00) ,
up to a multiplicative constant factor K = 8(00) on the right, which meets the cOllstraint K J K H = J. Moreover, since 8 (00 ) J8 (00 ) H = J, (Irn - F(>')XC H J)8(00)J8(00)H(Im - JCXF(w)ll) J - F(>')XC R - CXF(w)H + F(>')XC H JCXF(w)H
8(>')J8(w)H
J - F(>.){·.· }F(w)R, where { ... }
X(w1n - AH) =
+ ()''In -
A)X - XC H JCX
()..+w)X_(XAH+AX+XCHJCX).
Therefore, (19.25)
J - 8()")J8(w)H )..+w
=
F()")XF(w)H F(>')(XAH
+ AX + XC R JCX)F(w)H )..+w
Thus, upon comparing the last formula with the identity (19.21), it is readily seen that F()")(XAH
+ AX + XC R JCX)F(w)H = 0
and hence, as (C, A) is observable, that X must be a solution of the Riccati equation (19.22). Conversely, if X is a solution of the Riccati equation (19.22) and if 8()") is then defined by formula (19.23), the calculations leading to the formula (19.25) serve to justify formula (19.21). 0
R:r. invariance
19.5.
Lemma 19.16. Let M(X) denote the space defined in terms of an obserlJable pair·r = (C,A) E c mxn x c nxn and an n x n matrix X ~ 0 as in Theorem 19.13. Then the following conditions are eq'uivalent: (1) The space M(X) is inva'riant under the action of the backward sh'ift operator (19.26)
(Rexf)()..) = f()..) - f(a) )..-a
for ever'y point a E C \ a(A). (2) The space M(X) is invariant under the action of the operator Rex for at least one point a E C \ a(A).
19.6. Fa.ctorization of 8("\)
421
e nxn . for some ma.trix An E e nxn with O"(An) ~ O"(A).
(3) AX = XAn for some matrix An E
(4) AX = XAn
Proof. Let h j ("\) = F("\)Xej for j = 1, ... , n, where ej denotes the j'th column of the identity matrix In and suppose that (2) is in force for some point nEe \ O"(A). Then, since (Roh))("\)
=
F("\) - F(n)X ,,\ _ n
ej
F("\) { (o.In - A1
=~"\In
- A)} {01n _ A)-I Xej
-F("\)(nIn - A)-I Xej , the invariance assumption guarantees the existence of a set of vectors en such that
-F("\)(aIn - A)-l Xej = F("\)Xvj
for
Vj
E
j = 1, ... , n.
Consequently,
-(nIn-A) -I·Xej=Xvj
for
j=1, ... ,n,
and hence
-(o.In - A)-l X [el
...
en] = X
[VI
vnJ
which is the same as to say that
-(aLn - A)-I X = XQa for some matrix Qa E e nxn . In view of Lemma 20.14, there is no loss of generality in aSl:;uming that Qn is invertible and consequently that
AX
= X{o.In + Q;;l),
which serves to justify the implication (2) ===} (3). The equivalence (3) {::::::> (4) is covered by another application of Lemma 20.14. The remaining implications (4) ===} (1) ===} (2) are easy and are left to the reader. 0 Exercise 19.27. Complete the proof of Lemma 19.16 by justifying the implications (4) ===} (1) ===} (2).
19.6. Factorization of 8(.A) We shall assume from now on that X E e nxn is a positive semidefinite solution of the Riccati equation (19.27) XA H + AX +XCHJCX = 0, and shall obtain a factorization of the matrix valued 8("\) based on a decomposition of the space M(X). In particular, formula {19.27} implies that
AX
= X{_AH - CHJCX)
19. Realization theory
422
and hence, in view of Lemma 19.16, that M(X) is invariant under the action of RO/ for every choice of a E C \ u(A). Thus, if a E C \ u(A), Lemma 19.16 guarantees the existence of a nonzero vector valued function gl E M and a scalar f..t I E C such that
Therefore, since gl('\) = F('\)XUI
for some vector UI E
cn such that XUI i= 0, it follows that
(RO/gI )('\)
= -F('\)(a1n - A)-l XUI = f..tIF('\)XUI
and hence that
-(aln
-
A)-l XUI
But this in turn implies that f..tl rewritten as
i=
= f..tIXUI.
O. Thus, the previous formula can be
(19.28) and, consequently, (19.29)
and (19.30) Let
MI = {,8gl : ,8 E C} denote the one dimensional subspace of M spanned by gl (,\) and let III denote the orthogonal projection of M onto MI. Then II1FXv = =
(FXv,gl)M u{iXv FX g Ul (gl,gl)M I - U{iXUI FXUI(u{iXUl)-lu{iXv = FXIV
with (19.31) Let QI = UI{u{i XUl)-lu{i X.
Then, since Ql is a projection, i.e., Qi = Qb it is readily checked that Xl = XQI
and
= xQi = XIQl = Q{ixi
19.6. Factorization of 8(A)
423
Thus, with the aid of the formula AXl =wlXl,
it follows that Q{i AXl -
WIXI
= WIQ{i Xl
= AXI
and Q{iXA HQ1 =X1AH.
Consequently, upon multiplying the Riccati equation (19.27) on the left by Q{i and on the right by Q1, it follows that
o =
Q{iXA HQ1+Q{iAXQ1+Q{ixC HJCXQ1 = X1AH + AX1 + X 1C H JCX 1 ;
i.e., Xl is a rank one solution of the Riccati equation (19.27). Therefore,
MI
= 1l({h)
is a de Branges space based upon the matrix valued function
19 1 (A) = Im - F(A)X1 CHJ = Im _
C~lCHJ. -WI
Mr denote the orthogonal complement of M1 in M(X). Then Mr = {(I - II 1)FXu: u E en} = {FX u: u E en}, (19.32)
Let
2
where
Let and By formula (19.5),
191(A)-1 = Im + C(AIn - A 1)-1 X1C H J, where Al
= A + Xl cH JC .
Moreover, by straightforward calculations that are left to the reader, (19.33) (19.34)
424
19. Realization theory
and, since both X and Xl are solutions of (19.27), X 2 the Riccati equation
~
0 is a solution of
(19.35) with rank X 2
=r-
1. Thus, as the pair
(C,Ad = (C,A+XlCHJC) is observable, we can define
FI(A) = C(AIn - At)-l and the space
M2 = {FI(A)X2U : U endowed with the inner product
E
en}
(FIX2U, FlX2V)M2 = v HX 2u is also Ra invariant for each point
E
Q
e \ O"(At).
Therefore,
M2 = 1-£(8 2 ), and the factorization procedure can be iterated to obtain a factorization
8(A) = '!9l(A)··· '!9k(A) of SeA) as a product of k elementary factors of McMillan degree one with k = rankX. Exercise 19.28. Verify the statements in (19.32). Exercise 19.29. Verify formula (19.34). [HINT: The trick in this calculation (and others of this kind) is to note that in the product, the two terms
C(AIn - A)-IXCHJ
+ C(AIn -
Al)-l XlC HJC(AIn - A)-l XC HJ
can be reexpressed as
C(AIn - Altl{AIn - Al
+ XlC HJC}(AIn -
A)-l XC HJ,
which simplifies beautifully.] Exercise 19.30. Show that if f E M2, then
[HINT: First check that
t?ICA)C(AIn - AI)-l X2 = C(AIn - A)-l X 2 and then exploit the fact that
X 2 = X(In - Ql) = (In - Q{f)X(In - Qd·] Exercise 19.31. Show that rankX2
= rank X - rank Xl.
19.7. Bibliographical notes
425
Exercise 19.32. Let A E C nxn, e E C mxn, J E C mxm, let ).1. ... ,).k denote the distinct eigenvalues of A; and let P be a solution of the Stein equation P - AH PA = e H Je. Show that if 1- Ai).j =I- 0 for i,j = 1, ... ,k, then No ~ N p . [HINT: No is invariant under A.] Exercise 19.33. Let A E C nxn, e E C mxn, J E C mxm; and let P be a solution of the Lyapunov equation AH P + PA = e H Je. Show that if O'(A) n 0'( _AH) = 0, then No ~ N p . [HINT: No is invariant under A.]
19.7. Bibliographical notes The monographs [14] and [12] are good sources of supplementary information on realization theory and applications to control theory. Condition (2) in Lemmas 19.2, 19.3, 19.10, 19.11 and variations thereof are usually referred to as Hautus tests or Popov-Belevich-Hautus tests. Theorem 19.9 is adapted from [6]. Exercise 19.23 is adapted from Theorem 4.3 in Chapter 3 of [61]. The connection between finite dimensional de Branges spaces and Riccati equations is adapted from [22]. This connection lends itself to a rather clean framework for handling a number of bitangential interpolation problems; see e.g., [23]; the treatment of factorization in the last section is adapted from the article [20], which includes extensions of the factorization discussed here to nonsquare matrix valued functions.
Chapter 20
Eigenvalue location problems
When I'm finished [shooting] that bridge... I'll have made it into something of my own, by lens choice, or camera angle, or general composition, and most likely by some combination of all those. I don't just take things as given, I try to make them into something that reflects my personal consciousness, my spirit. I try to find the poetry in the image. Waller [70], p. 50 If A E
c nxn
and A = A H , then u(A)
c
1R and hence:
• f+(A) = the number of positive eigenvalues of A, counting multiplicities;
• f_ (A) = the number of negative eigenvalues of A, counting multiplicities;
• fo(A) = the number of zero eigenvalues of A, counting multiplicities. Thus,
20.1. Interlacing Theorem 20.1. Let B be the upper left k x k corner of a (k + 1) x (k + 1) Hermitian matrix A and let Al(A) ~ ... ~ Ak+1(A) and Al(B) ~ ... ~
-
427
428
20. Eigenvalue location problems
Ak(B) denote the eigenvalues of A and B, respectively. Then Aj(A)
(20.1)
Proof.
~
Aj(B)
~
A)+1 (A)
j = 1, ... ,k.
for
Let
a(X)
= max {(Ax, x) : x E X and Ilxll = I}
for each subspace X of C k+1 and let b(Y)
= max {(By,y) : y E Y and Ilyll = I},
for each subspace Y of C k. Let Sj denote the set of all j-dimensional subspaces of C k+1 for j = 1, ... ,k + 1; let 7j denote the set of all jdimensional subspaces of C k for j = 1, ... ,k; and let SJ denote the set of all j-dimensional subspaces of Ck+l for j = 1, ... ,k that are orthogonal to eHI, the k + l'st column of Ik+l. Then, by the Courant-Fischer theorem,
Aj(A) =
min a(X)
XESj
< min a(X) = min b(Y) XES;
Aj(B)
YETj
for
j = 1, ... ,k.
The second inequality in (20.1) depends upon the observation that for each j + I-dimensional subspace X of CHI, there exists at least one jdimensional subspace Y of C k such that (20.2) Thus, as
(By,y) = for y E
(A [~] ,[~])
==}
Y and such a pair of spaces Y and
b(Y)
~ a(X)
X, it follows that
Aj(B) = min b(Y) ~ a(X). YETj
Therefore, as this lower bound is valid for each subspace X E Sj+1, it is also valid for the minimum over all X E S)+I, i.e., Aj(B)~
min a(X) = Aj+1(A).
XESHl
o Exercise 20.1. Find a 2-dimensional subspace Y of C 3 such that (20.2) holds for each of the following two choices of X:
20.1. In terlacing
429
Exercise 20.2. Show that if X is a j + I-dimensional subspace of C k+1 with basis UI, ... ,Uk+l, then there exists a j-dimensional subspace Y of C k such that (20.2) holds. [HINT: Build a basis VI, ... ,Vk for y, with Exercise 20.1 as a guide.] Exercise 20.3. Let A = AH E Al(A):S ... :s An(A) and AI(B)
Aj(A) + Al (B)
~
Aj(A + B)
c nxn ~
~
and B = BH E ... ~ An(B), then
Aj(A) + An(B)
c nxn .
for
Show that if
j = 1, ... ,n.
[HINT: Invoke the Courant-Fischer theorem.] Theorem 20.2. Let Al and A2 be n x n Hermitian matrices with eigenvalues A~l) :s ... :s A~I) and Ai2 ) ~ ••. ~ A~2), respectively, and suppose further that Al - A2 ~ O. Then: (1)
\
(1)
(2) f
;::: Aj
.
Jor J = 1, ... ,n.
(2) If also rank(A l
-
A2)
= 1', then A]I) ~ A]~r for j
= 1, ... ,n -
1'.
Proof. The first assertion is a straightforward consequence of the CourantFischer theorem and is left to the reader as an exercise. To verify (2), let B = Al - A 2 . Then, since rankB = l' by assumption, dimNB = n - 1', and hence, for any k dimensional subspace Y of en,
dimY+dimNB -dim(Y +NB )
dim(ynNB ) ;:::
k+n-1'-n
=
k-1'.
Thus, if k > 'r and 5 j denotes the set of all j-dimensional subspaces of en, then min max{ (A1y,y) : y E U and Ilyll = I}
UESk_,·
ynNB and Ilyil = I} max{(A2Y,y) + (By,y): y E ynNB and Ilyll = I} max{(A 2 y,y): y E ynNB and Ilyll = I}
< max{(A 1 y,y): y
E
< max{(A 2 y,y): y E Y and Iiyll = I} . Therefore, since this inequality is valid for every choice of Y E 5k, it follows that (1) (2) Ak - r :s Ak for k = r + 1, ... ,n . o But this is equivalent to the asserted upper bound in (2). Exercise 20.4. Verify the first assertion of Theorem 20.2. Exercise 20.5. Let A = B +/,uuH , where BEe nxn is Hermitian, u E en, /' E 1R; and let Al(A) ~ ... ~ An(A), Al(B) :s ... ~ An{B) denote the eigenvalues of A and B, respectively. Show that
20. Eigenvalue location problems
430
(a) If I ~ 0, then Aj(B) :::; Aj{A) :::; Aj+1{B) for j An(B) :::; An{A). (b)
:::; 0, then Aj-l (B) :::; Aj{A) :::; Aj{B) for j Al(B).
1f T
=
= 1, ... ,n - 1 and 2, ... ,n and Al(A) :::;
Exercise 20.6. Show that in the setting of Exercise 20.5, n
Aj(A) = Aj{B) + cjf,
where
Cj
~ 0 and
L Cj = u R u. j=1
[HINT: 2:j=1 Aj{A) = 2:j=1 (Auj, Uj) = 2:j=1 (Avj, Vj) for any two orthonormal sets of vectors {Ul' ... ,un} and {VI, ... ,vn} in en.J
e
Exercise 20.7. Let AO be a pseudoinverse of a matrix A = AR E nxn such that A ° is also Hermitian. Show that &± (A) = &± (A 0 ) and &0 (A) =
&o(AO). A tridiagonal Hermitian matrix An E lR. nxn of the form
al b1 0 b1 a2 b2 n n-l An = Lajeje] + Lbj(eje]+1 +ej+l e ]) = 0 b2 a3 j=1 j=1 0
0
0
0 0 0
0
0 0
... bn- 1 an
with bj > 0 and aj E lR. is termed a Jacobi matrix. Exercise 20.S. Show that a Jacobi matrix An+1 has n + 1 distinct eigenvalues Al < ... < An+l and that if J.Ll < ... < J.Ln denote the eigenvalues of the Jacobi matrix An, then >"j < J.Lj < Aj+1 for j = 1, ... ,n.
20.2. Sylvester's law of inertia Theorem 20.3. Let A and B = C R AC be Hermitian matrices of sizes
n x nand m x m, respectively. Then &+(A)
~
&+(B)
and &_(A)
~
&_(B) ,
with equality in both if rank A = rank B . Proof. Since A and B are Hermitian matrices, there exists a pair of invertible matrices U E e nxn and V E e mxm such that
20.3. Congruence
431
where Q = UCV- 1 • Thus, upon expressing the n x m matrix in block form as
Q=
[~~~ ~~: ~~: 1' Q31
Q32
Q33 where the heights of the block rows are 81, t1 and n - 81 - t1 and the widths of the block columns are 82, t2 and m - 82 - t2, respectively, it is readily
seen that IS2
= QK Q11
- Q~ Q21
and
It2
= -Q~Q12 + Q~Q22 .
Therefore,
QK Q11 = IS2 + Q~ Q21
Q~Q22 = It2 + Q~Q21 .
and
The first of these formulas implies that NQll = {O}, and hence the principle of conservation of dimension applied to the 81 x 82 matrix Q11 implies that 82 = rank Q11. But this in turn implies that Q11 must have 82 linearly independent rows and hence that 81 :2: 82. By similar reasoning, t1 :2: t2. Finally, if rank A = rank B = r, then t1 = r - 81 and t2 = r - 82 and hence t1
2: t2
¢:::::}
r-
81
:2: r -
82 ¢:::::} 82
2:
81 .
o
Thus, under this extra condition, equality prevails.
Corollary 20.4. (Sylvester's law of inertia) Let A and B = C H AC be two n x n Hermitian matrices and suppose that C is invertible. Then
20.3. Congruence A pair of matrices A E c nxn and B E c nxn is said to be congruent if there exists an invertible matrix C E c nxn such that A = CHBC. This connection will be denoted by the symbol A rv B. Remark 20.5. In terms of congruence, Sylvester's law of inertia is:
c nxn ,
=
c±(B)
Exercise 20.9. Let U E c nxn be a unitary matrix and let A E Show that if A ~ 0 and AU ~ 0, then U = In.
c nxn .
If A,B E and coCA)
=
B co(B).
=
BH and A
rv
B, then A
=
A H, c±(A)
20. Eigenvalue location problems
432
Lemma 20.6. If B E C pxq , then
0 E = [ BH
B]
0
[ BBH
0 ] _BH B
0
rv
and E±(E) = rankB.
Proof. The first step is to note that, in terms of the Moore-Penrose inverse Bt of B,
[_~H
(B2H] [%H
~] [It ~:] = 2 [Bgt
_%H B ]
and, by Theorem 5.5, det
[;;t
~:]
= det
(1q
+ Bt B) > O.
The conclusion then follows upon multiplying both sides of the preceding identity by the matrix
~ v'2
[Y 0] 0 Iq ,
where Y
= (BBH)1/2 + Ip _ BBt = yH '
This does the trick, since Y is invertible and
YBBty = (BBH)1/2 BBt(BBH)1/2 = BBH .
o Exercise 20.10. Show that the matrix Y that is defined in the proof of Lemma 20.6 is invertible. Exercise 20.11. Furnish a second proof of Lemma 20.6 by showing that if p+q = n, then det (>..In - E) = >.(p-q) det (>.2Iq - BH B). [HINT: Show that if rankB = k, then E±(E) = k and Eo(E) = n - 2k.] Lemma 20.7. If BE C pxq and
e = e H E c nxn , then
1
0 B [BBH 0e E= [ 00 eo", 0 00 BH 0 0 0 0 _BHB
1
and hence
Proof. This is an easy variant of Lemma 20.6: Just multiply the given matrix on the right by the invertible constant matrix
Ip 0 -OB K= [ 0 In Bt 0 Iq
1
20.4. Counting positive and negative eigenvalues
433
and on the left by K H to start things off. This yields the formula
KHEK= [
1
2BBt 0 0 0 CO, o 0 -2BHB
which leads easily to the desired conclusion upon multiplying KH EK on the left and the right by the invertible constant Hermitian matrix 1 . y'2dlag {Y, y'2In , Iq} ,
o
where Y is defined in the proof of Lemma 20.6.
Exercise 20.12. Show that congruence is an equivalence relation, i.e., (i) A", A; (ii) A", B ===> B '" A; and (iii) A B, B '" C ===> A '" C. I".J
Exercise 20.13. Let A
=
A H, C
=
CH and E
=
[:H
~].
Show that
E = EH and that £±(E) ~ £±(A) and £±(E) ~ £±(C). [HINT: Exploit Theorem 20.3.] Exercise 20.14. Let ej denote the j'th column of In. Show that the eigenvalues of Zn = 2:/;=1 eje~+1_j must be equal to either 1 or -1 without calculating them. [HINT: Show that Z!! = Zn and Z!! Zn = In.] Exercise 20.15. Let Zn be the matrix defined in Exercise 20.14.
= 2k is even, then £+(Zn) = £-(Zn) = k. (b) Show that if n = 2k + 1 is odd, then £+(Zn) = k + 1 and £-(Zn) = k. (a) Show that if n
[HINT: Verify and exploit the identity
[!~k ~:] [~ ~] [~ -I~k] =
2diag {h, -h}.] Exercise 20.16. Confirm the conclusions in Exercise 20.14 by calculating >.h -Zk] det (>.In - Zn). [HINT: If n = 2k, then >.In - Zn = [-Zk >'Ik .]
20.4. Counting positive and negative eigenvalues Lemma 20.S. Let A
(20.3)
Then (20.4)
= AH
E C kxk , BE C kxl ; and let
20. Eigenvalue location problems
434
Proof. Let m = rank B and suppose that m 2:: 1, because otherwise there is nothing to prove. Then, the singular value decomposition of B yields the factorization B = V SU H, where V and U are unitary matrices of sizes k x k and f x f, respectively, S - [
D
-
Ok'xm
Omxi' ] Ok/xl'
'
k' = k - m f' = f - m ,
,
D is an m x m positive definite diagonal matrix, and for the sake of definiteness, it is assumed that k' 2:: 1 and f' 2:: 1. Thus,
E
0]
=
[V
'"
VHAV S] [ SH 0
=
[ A21 A22 0
o
U
~u ~12 D
o
0 0
0]
S ] [VH 0
[VHAV SH
o
UH
DO] 0
0 0
00
'
where the block decomposition of A = V H AV is chosen to be compatible with that of S, i.e., Au E c mxm , A12 E c mxi', A22 E C ixi . Moreover, Au = A{{, A12 = A~, A22 = A~ and, since D is invertible, the last matrix on the right is congruent to
[~ f g g], o
0
0
A22 which is congruent to
0
[
o o o
00 0] 0
D
0
D 0
0 0
0 0
as may be easily verified by symmetric block Gaussian elimination for the first congruence and appropriately chosen permutations for the second; the details are left to the reader as an exercise. Therefore, by Lemma 20.6,
E±(E) = t:±(A22 ) + t:± ([ ~ =
g])
t:±(A22 ) + rank (D) 2:: rank (D).
This completes the proof, since rank (D)
= rank (B).
o
Exercise 20.17. Show that if B,C E C nxk and A = BC H + CB H , then t:±(A) S k. [HINT: Write A = [B C]J[B C]H for an appropriately chosen signature matrix J and invoke Theorem 20.3.] Exercise 20.18. Show that if B E inverse of
B, then
Cpxq and Bt denotes the Moore-Penrose [%H ~] '" [Bgt -~tB]'
20.4. Counting positive and negative eigenvalues
435
Exercise 20.19. Evaluate E±(Ej) and Eo(Ej) for the matrices
[~: ~]
El =
[g ~ ~ g] .
E2 =
and
0
Ie Lemma 20.9. Let A
= AH,
0
0
= DH and
D
E= [BAH
~
CH
0
g] 0
Then
Let C = VSU H be the singular value decomposition of C. Then
Proof.
B S] O, [BHA D
E",
0
SH
where
A = VH AV and B = VHB.
0
Thus, if
A and B are written in
where F is a positive definite diagonal matrix and compatible block form as
A .
-
respectIvely, then An
E",
(20.5)
~ [i:: i~ 1 and [~: 1' -H = An,
A12
An A21 -H Bl
A12 A22 -H B2
Bl B2 D
F
0 0
0 0
0
= A-H 21 , A22 =
F
0 0 0 0
0 0 0 0 0
fV
-H A22 and
0 0 0
0
0
F
A22 -H B2
B2 D
0 0
0 0
0 0 0 0
F
0
0 0 0 0 0
Therefore, upon applying Theorem 20.3 to the identity I
[~ ~] ~ 0
D
0
0 0 0 0
0 0 I
0 0 0
0 0
0
I
T
0 0 0 F 0 A22 B2 0 -H 0 B2 D 0 F 0 0 0 0 0 0 0
0 0 0 0 0
I
0 0 0 0
0 0 I
0 0 0
0 0
0
I
,
20. Eigenvalue location problems
436
and then invoking Lemma 20.7, it is readily seen that e±(E) 2:: rankF + e±(D).
o
This completes the proof, since rankF = rankC.
Lemma 20.10. Let E = EH be given by formula (20.3) with rank A 2:: 1 and let At denote the Moore-Penrose inverse of A. Then
e±(E) 2:: e±(A) + e'f(BAt B H ).
(20.6)
Proof. Since A = AH, there exists a unitary matrix U and an invertible diagonal matrix D such that
A=U[g g]U
H.
Thus, upon writing and UH B
=[
~~ ~ ] = [ ~~ ]
in compatible block form, it is readily seen that
E=[6 ~ 1[~f1
0 0
B, B2
BH 2
1[ 0
UH
0
Le.,
E",
[ ~f1
0 0
B,
Bfj
~2
1 .
Moreover, since D is invertible,
where E1 =
[~H ~] - [~H] D- 1 [0
B1] = [
~
Thus, [±(E) = e±(D) + e±(E1) ' which justifies both the equality
(20.7) and the inequality (20.8)
~ ],
437
20.5. Exploiting continuity
Much the same sort of analysis implies that £±(El) ~ £±( -B{i D- 1 B 1 )
= £'f(B{i D- 1B 1 )
and hence, as
[g-l
B~]
[B{i BHU
Z] [ ~~ ]
[g-1 Z] U HB
BHAtB ,
implies that £±(Ed ~ £'f(BH At B). The asserted inequality (20.6) now drops out easily upon inserting the last 0 inequality into the identity (20.7). Exercise 20.20. Let B E C pxq , C
= C HE c qxq
and let E
=
[%H ~].
Show that £±(E) ~ max {£±(C), rank (B)}.
20.5. Exploiting continuity In this section we shall illustrate by example how to exploit the fact that the eigenvalues of a matrix A E C nxn are continuous functions of the entries in the matrix to obtain information on the location of its eigenvalues. But see also Exercise 20.22. The facts in Appendix A may be helpful. Theorem 20.11. Let
A
B
Ip ]
E = [BH Ip 0 Ip 0 0 in which A, BE C pxp and A Proof.
= AH.
,
Then £+(E)
= 2p
and £_(E) = p.
Let E{t)
tA
tB I p] Ip 0 . 0 0
= [ tBH Ip
Then it is readily checked that:
(1) E(t) is invertible for every choice of t E JR. (2) £+(E(O))
= 2p and £_(E(O)) = p.
(3) The set 0 1 = {t E JR : £+(E(t)) = 2p} is an open subset of JR. (4) The set 02
= {t E JR :
= 0 1 U O2 , (6) 0 1 i= 0.
(5) JR
£+(E(t))
< 2p} is an open subset of JR.
20. Eigenvalue location problems
438
Thus, as the connected set R = fh U O2 is the union of two open sets, Item (6) implies that O2 = 0. Therefore, 0 1 = R. D Exercise 20.21. Verify the six items that are listed in the proof of Theorem 20.1l. Exercise 20.22. Show that the matrix E(t) that is introduced in the proof of Theorem 20.11 is congruent to E(O) for every choice of t E R
20.6. Gersgorin disks Let A E
e nxn with entries G,;,j and let n
Pi(A)
=L
laijl-Iaiil
for
i
= 1, ...
,n.
j=1
Then the set
ri(A) = {A E e: IA - aiil ~ Pi(A)} is called the i'th Gersgorin disk of A. Theorem 20.12. If A E e nxn , then:
(1) u(A) ~ Uf=lri(A). (2) A union 01 of k Gersgorin disks that has no points in common with the union 02 of the remaining n - k Gersgorin disks contains exactly k eigenvalues of A. Proof. If A E u(A), then there exists a nonzero vector U E components UI, .•. ,Un such that (Aln - A)u = o. Suppose that
IUkl = max {Iujl : j = 1, ... ,n}. Then the identity ,
n
L akjUj - akkuk
(A - akk)uk =
j=1
implies that n
I(A - akk)ukl <
L lakjllujl- lakkllukl j=l
< Pk(A)lukl· Therefore, n
A E rk(A)
C
Uri(A) . i=l
This completes the proof of (1).
en
with
439
20.7. The spectral mapping principle
Next, to verify (2), let D = diag{all, ... ,ann} and let
B(t) = D + t(A - D)
for
0
~
t
~
1.
Then
01(t)
~
0 1(1)
and the fact that 0 1 (1)
~
01(t) u 02(t)
for 0 ~ t ~ 1, where OJ(t) denotes the union of the disks with centers in OJ(l) but with radii pi(B(t)) = tpi(B(I)) for 0 ~ t ~ 1. Clearly 0 1 (0) contains exactly k eigenvalues of D = B(O), and O2 (0) contains exactly n-k eigenvalues of D = B(O). Moreover, since the eigenvalues of B(t) must belong to fh(t) u 02(t) and vary continuously with t, the assertion follows from the inclusions
O"(B(t))
and
02(t)
~
O2(1)
for
0
~
t
~
1
o
n 02(1) = 0.
Exercise 20.23. Show that the spectral radius ru(A) of a matrix A E cnxn is subject to the bound
ru(A)
~ max
{t
!aij! : i = 1, ... , n} .
3=1
Exercise 20.24. Show that the spectral radius ru(A) of a matrix A E c nxn is subject to the bound n
ru(A) ~ max{L !aij!: j = 1, ... ,n}. i=1
Exercise 20.25. Let A E c nxn . Show that if aii > Pi(A) for i then A is invertible.
= 1, ...
,n,
Exercise 20.26. Let A E C nxn. Show that A is a diagonal matrix if and only if O"(A) = u~lri(A).
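The disks and the inclusion in part (1) of Theorem 20.12 are easy to check numerically. The following sketch (the example matrix is an assumption chosen for illustration) computes the centers and radii of the Gersgorin disks and verifies that every eigenvalue lies in some disk.

```python
import numpy as np

def gershgorin_disks(A):
    """Return (center, radius) for each Gersgorin disk of A."""
    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    return list(zip(np.diag(A), radii))

A = np.array([[4.0, 0.5, 0.2],
              [0.3, -1.0, 0.1],
              [0.1, 0.2, 2.0]])

disks = gershgorin_disks(A)
for lam in np.linalg.eigvals(A):
    assert any(abs(lam - c) <= r for c, r in disks)   # every eigenvalue lies in some disk
print(disks)
```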
20.1. The spectral mapping principle In this section we shall give an elementary proof of the spectral mapping principle for polynomials because it fits the theme of this chapter; see Theorem 17.25 for a stronger result. Theorem 20.13. Let A E c nxn , let A!, ... ,Ak denote the distict eigenval-
ues of A and let p(A) be a polynomial. Then (20.9) det (Aln - A) ===}
(A - A1)Cll ... (A - Ak)Clk det (Aln - p(A)) = (A - p(Ad)Cll ... (A - p(Ak))Clk ;
z.e., (20.10)
O"(p(A)) = p(O"(A)).
440
20. Eigenvalue location problems
Proof. Let V be an invertible matrix such that V-I AV = J is in Jordan form. Then it is readily checked that Jk = Dk + Tk for k = 1,2 .... , where D = diag {dn, ... ,dnn } is a diagonal matrix and Tk is strictly upper triangular. Thus, if the polynomial p(A) = ao + alA + ... + aeAe, then
e
p(J) = p(D) +
L ajTj . j=l
Consequently, det (Aln - p(A))
det (Aln - p(J))
= det (Aln - p(D))
= (A - p(du )) ... (A - p(dnn )) ,
o
which yields (20.9) and (20.10).
20.8. AX
= XB
Lemma 20.14. Let A,X,B E c nxn and suppose that AX = XB. Then there exists a matrix C E c nxn such that AX = X(B + C) and (T(B + C) ~ (T(A). Proof. If X is invertible, then (T(A) = (T(B); i.e., the matrix C = 0 does the trick. Suppose therefore that X is not invertible and that C~k) is a k x k Jordan cell in the Jordan decomposition of B = U JU- 1 such that j3 f/- (T(A). Then there exists a set of k linearly independent vectors UI, ... ,Uk in en such that
and
i.e.,
and
AXUj = j3XUj
Since j3 if
where UI, ...
f/-
~
+ Uj-l
for j = 2, ... ,k. (T(A), it is readily checked that XUj = 0 for j = 1, ... , k. Thus,
~
I
are the rows in V = U- corresponding to the columns ,Uk, and a E (T(A), then
VI, ... ,Vk
XB I =XB=AX,
20.9. Inertia theorems
441
and the diagonal entry of the block under consideration in the Jordan decomposition of Bl now belongs to O'(A) and not to O'(B). Moreover, none of the other Jordan blocks in the Jordan decomposition of B are affected by this change. The same procedure can now be applied to change the diagonal entry of any Jordan cell in the Jordan decomposition of Bl from a point that is not in O'(A) to a point that is in O'{A). The proof is completed by iterating this procedure. 0
Exercise 20.27. Let A, X, B E c nxn . Show that if AX = XB and the columns of V E C nxk form a basis for N x , then there exists a matrix L E C kxn such that O'{B + VL) ~ O'{A).
20.9. Inertia theorems Theorem 20.15. LetA E c nxn and suppose thatG E matrix such that A HG + G A :>- O. Then:
c nxn is a Hermitian
(1) G is invertible. (2) IT{A) n ilR = 0. (3) c+(G) = c+(A), c_(G) = c_(A) and (in view 01(1) and (2)) co(G) = co(A) = O. Proof.
Let
Then
Gu = 0 ===> uHQu = 0 ===> u
=0
since Q :>- O. Therefore (1) holds. Similarly Au
= AU, u
=1=
0 ===> (A + X)uHGu = uHQu > O.
Therefore, A + X=1= 0; i.e., (2) holds. Suppose now for the sake of definiteness that A has p eigenvalues in the open right half plane II+ and q eigenvalues in the open left half plane with p ~ 1 and q ~ 1. Then p + q =·n and A = U JU- 1 with a Jordan matrix J=[3
~],
where J 1 is a p x p Jordan matrix with O'(Jt} C II+ and h is a q x q Jordan matrix with 0'(J2) c II_. Thus, the formula AHG+GA = Q can be rewritten as JH(UHGU) + (UHGU)J = UHQU and hence, upon writing UHGU
= [Kn K12] and UHQU = K21
K22
[Pu P21
P12] P22
20. Eigenvalue location problems
442
in block form that is compatible with the decomposition of J,
[~
;:]
[~~~ ~~:] + [~~~ ~~:] [3 ~] = [ ~~~
Therefore, and
Jf K22 + K22J2 = P22 .
In view of Exercise 18.6, both of these equations have unique solutions. Moreover, it is readily checked that Kn >- 0, K22 -< 0 and
[~~~ ~~:] = CH [~1 with C=
[3
K22 _
K~KlilK12 ] C,
Kl~:12].
Consequently, the Sylvester inertia theorem implies that £+(G)
= c+(UHGU) = £+(Kn) = p = c+(A)
and
= £-(K22 - K21Klil K 12) = q = c_(A). The same conclusions prevail when either p = 0 or q = o. The details are £_(G) = £_(UHGU)
left to the reader.
0
Exercise 20.28. Verify that Kn >- 0 and K22 -< 0, as was asserted in the proof of Theorem 20.15. [HINT: Kn = 1000 e- sJfi Pne-shds.j Exercise 20.29. Complete the details of the proof of Theorem 20.15 when
p=o. A more elaborate argument yields the following supplementary conclusion: Theorem 20.16. Let A E c nxn and let G E such that:
c nxn
be a Hermitian matrix
(1) AHG + GA t: O. (2) u(A) n ilR. = 0. (3) G is invertible. Then c+(A) tion).
= £+(G), c_(A) = c_(G) (and £o(A) = co(G) = 0 byassump-
Proof. The asserted result is an immediate consequence of the following lemma. 0
20.10. An eigenvalue assignment problem
443
Lemma 20.17. Let A E c nxn and let G E c nxn be a Hermitian matrix such that A H G + GAt: O. Then the following implications hold:
n ilR = 0 => £+(G)
£+(A) and £_(G) ~ £_(A). (2) G is invertible => £+(G) 2: £+(A) and £_(G) 2: £_(A).
(1) O"(A)
~
Proof. Suppose first that O"(A) n ilR = 0. Then, by Theorem 18.7, there exists a Hermitian matrix Go such that A H Go + GoA >- O. Therefore,
for every e >
o.
Thus, by Theorem 20.15,
for every e > O. Moreover, since the eigenvalues of G + eGo are continuous functions of e and G+eGo is invertible for every e > 0, the desired conclusion follows by letting e ~ O. Next, if G is invertible and Ae = A + eG- l , then A~G+GAe =AH G+GA+2cIn
>- 0
for every choice of e > O. Therefore, by Theorem 20.15, £+(G) = £+(Ae) and £_(G) = £_(Ae) for every choice of e > O. Then, since the eigenvalues of Ae are continuous functions of e, the inequalities in (2) follow by letting e ~ O. 0
20.10. An eigenvalue assignment problem A basic problem in control theory amounts to shifting the eigenvalues of a given matrix A to preassigned values, or a preassigned region, by an appropriately chosen additive perturbation of the matrix, which in practice is implemented by feedback. Since the eigenvalues of A are the roots of its characteristic polynomial, this corresponds to shifting det (>.In - A) to a polynomial Co + ... + Cn_l>.n-l + >.n with suitable roots. Theorem 20.18. Let
(20.11) where Sf denotes the companion matrix based on the polynomial f(>.) = det(>.In - A) = ao + alA + ... + an_l>.n-l + An
20. Eigenvalue location problems
444
and H f denotes the Hankel matrix
(20.12)
Hf=
[
al
a2
a2
a3
... ...
an_Ill 1 0
:
1
o
0
0
based on the coefficients of f()..). If (A, b) is controllable, then e: is invertible.
Proof. Let N = C~n) = L,j~f ejef+1 be the n x n matrix with ones on the first superdiagonal and zeros elsewhere. Then clearly, Ae: =
A[b Ab··· An-Ib]
-
[Ab···An-lb 0]
=
(£ NT
=
[Ab A 2 b· .. Anb]
+ [0···0 Anb]
+ [0···0 Anb] = (£8J,
since by the Cayley-Hamilton theorem, and, consequently, 0 ... [: [0···0 Anb] = e :
o
0 :
0
Moreover, if (£ is controllable, then it is invertible, since e: is a square matrix. The verification of the second identity in (20.11) is left to the reader. D Exercise 20.30. Verify the second identity in (20.11).
Exercbe 20.31. L& A = u E
[~ ~
:] and let b =
m.
Find a vector
e 3 such that u(A+ buH) = {2,3,4}.
Theorem 20.19. If(A, b) E enxnxe n is controllable, then for each choice of co, CI, ..• , Cn-I E e, there exists a vector u E en such that det()..In - A - buH) = Co
Proof.
+ CI).. + ... + Cn_l)..n-l + )..n.
Let 8 g denote the companion matrix based on the polynomial
g()..) = CO + CI).. + ... + Cn_l)..n-l
+ )..n.
Then it suffices to show that there exists a vector u E en such that A + buH is similar to 8 g • Since Ae:Hf = e:Hf8f by Theorem 20.18, and e:Hf is invertible, it is enough to check that (A + buH)(£Hf = e:Hf8g
20.10. An eigenvalue assignment problem
445
or, equivalently, that bUH ([.Hf
=
Sf).
In order for this equality to hold, it is necessary that buH
=
Sf)Hj1
I.e.,
uH
=
(bHb)-lb H
It remains to check that this choice of u really works, i.e.,
(A + b(bHb)-lb H
Sf )Hj1
The main steps in the verification are to check that
(1) Sg - Sf = enwT , where w T = [ao - Co (2) Hf(Sg - Sf)
=
an-l - Cn-I]'
el w T .
(3)
The details are left to the reader as an exercise.
o
Exercise 20.32. Complete the proof of Theorem 20.19 by verifying the four items in the list furnished towards the end of the proof of that theorem. Theorem 20.20. If (A, B) E c nxn X C nxk is a controllable pair with k > 1, then there exist a matrix C E C kxn and a vector b E RB such that (A + BC, b) is a controllable pair. Discussion. Suppose for the sake of definiteness that A E C 10 x 10, B = [b i ... b 5 ], and then permute the columns in the controllability matrix to obtain the matrix
Then, moving from left to right, discard vectors that may be expressed as linear combinations of vectors that sit to their left. Suppose further that A 3b 1 E
span {bl, Ab 1, A 2b l }
A5b2 E
span {bl, Abl, A 2b 1, b 2, ... ,A4 b 2}
A2b3
span{bl,Abl,A2bl,b2,'" ,A4 b 2 ,b3,Ab3}
E
and that the matrix
Q = [b i Ab 1 A 2 b 1 b 2 Ab2 A 2b2 A 3b 2 A 4 b 2 b3 Ab3 ] is invertible. Let ej denote the j'th column of Is and let fk denote the k'th column of lIo and set G = [0 0 e2 0 0 0 0 e3 0 0]
20. Eigenvalue location problems
446
and C = GQ-l. Then
+ BC)b l (A + BC)2bl (A
=
+ BGQ-IQfl = Ab l (A + BC)Ab l = A2bl + BGQ-IQf2
Ab l
= A2bl (A
(A + BC)A2b l
+ BC)3b l =
A3bl + Be2
= A 3 b l + BGQ-IQf3
= A3b l + b 2 .
Similar considerations lead to the conclusion that [bl (A + BC)b l ... (A + BC)9b l ] = QU ,
where U is an upper triangular matrix with ones on the diagonal. Therefore, since QU is invertible, (A + BC, b l ) is a controllable pair. Theorem 20.21. Le'- (A, B) E e nxn x e nxk be a controllable pair and let {J.Lb ... ,J.Ln} be any set of points in en (not necessarily distinct). Then there exists a matrix K E e kxn such that det{Aln
-
A - BK)
= (A -
J.LI) ... (A - J.Ln).
Proof. If k = 1, then the conclusion is given in Theorem 20.19. If k > 1, then, by Theorem 20.20, there exists a matrix C E e kxn and a vector bE 'RB such that (A + BC, b) is a controllable pair. Therefore, by Theorem 20.19, there exists a vector u E en such that det(Aln - A - BC - buH) = (A - J.Ld ... (A - J.Ln). However, BC + buH = B(C +vuH ) for some v E e k , since b E the proof may be completed by choosing K = C + vu H .
'RB.
Thus, 0
20.11. Bibliographical notes Theorem 20.15 is due to Ostrowski and Schneider, and independently, to M. G. Krein; see p. 445 of [45] for a discussion of the history and a converse statement. Theorem 20.16 is due to Carlson and Schneider. The proofs presented here are adapted from the presentation in [45]. The discussion of Theorem 20.20 is partially adapted from Heymann [38]. A number of the congruence theorems were taken from the paper [27].
Chapter 21
Zero location problems
Just because it hurts, it doesn't mean that it's good for you. Michael Dym In this chapter we shall present two recipes for counting the number of common roots of two polynomials and applications thereof. In particular, we shall discuss stable polynomials and a criterion of Kharitonov for checking the stability of a family of polynomials with coefficients that are only specified within certain bounds.
21.1. Bezoutians Let
f(>.) = fo
+ II>' + ... + fn>.n
, fn
to,
be a polynomial of degree n with coefficients fo, . .. ,fnEe and let
g(>.) = go + gl>' + ... + gn>.n be a polynomial of degree less than or equal to n with coefficients go, ... ,gn E C, at least one of which is nonzero. Then the matrix BE c nxn with entries bij, i,j = 0, ... ,n -1, that is uniquely defined by the formula
is called the Bezoutian of the polynomials f(>.) and g(>.) and will be denoted by the symbol B (J, g).
-
447
448
21. Zero location problems
The first main objective of this chapter is to verify the formula
dimNB = v(j,g) , where
v(j, g)
the number of common roots of the polynomials f()..) and g()..), counting multiplicities.
In particular, it is readily seen that if f(o:) = 0 and g(o:) = 0, then
for every point ).. E C and hence that
Moreover, if f(o:) = /,(0:)
= 0 and g(o:) = g'(o:) = 0, then the identity
f()..)g'(p,) - g()..)f'(p,) ).._p,
+ f()..)g(p,)
- g()..)f(p,) ().._p,)2
o 1 = [1
).. ... )..n-ljB
2p, (n - 1)p,n-2
which is obtained by differentiating both sides of formula (21.1) with respect to p" implies that
o (21.2)
1 20: (n - 1)o:n-2
=0
21.1. Bezoutians
449
for every point). E C. Therefore, dimNB 2:: 2, since the vector
(2L3) v(o) = [
J-,
I
o and its derivative
1 2a
v(1)(a) =
(n -1)an - 2
with respect to a both belong to
NB and are linearly independent.
Much the same sort of reasoning leads rapidly to the conclusion that if f(a) = f(1)(a) = ... f(k-l) (a) = 0 and g(a) = g(l)(a) = ... g(k-l)(a) = 0, then the vectors v(a), ... , v(a)(k-l) all belong to N B . Thus, as these vectors are linearly independent if k ::; n, dimNB 2:: k. Moreover, if f({3) = f(1)({3) = ... f(j-l)({3) = 0 and g({3) = g(I)({3) = ... g(j-I)({3) = 0, then the vectors v({3), ... , v(j-I)({3) all belong to NB. Therefore, since this set of vectors is linearly independent of the set v(a), ... ,v(k-I)(a), as will be shown in the next section, dimNB 2:: k+ j. Proceeding this way, it is rapidly deduced that
dimNB 2:: v(f,g). The verification of equality is more subtle and will require a number of steps. The first step is to obtain a formula for B(f, 'Pk) for the monomial 'Pk().) = ).k, k = 0, ... ,n, in terms of the Hankel matrices (21.4)
,
H[k,n1 _
fk fk+l
fk+l
fn
0
fn 0
-
o H,10,k-11
fo!I
0
(for k = 1, ... ,n), the projectors k
(21.5)
E(j,k)
=
L eie[
for
1::; j ::; k ::; n
i=j
based on the columns ei of In and the n x n matrices 0 1 0 0 0 1 (21.6)
0 0
n-I
=L
N -- C(n)0 -
0 0 0 0
1 0
i=1
eie41"
fo !I
A-I
21. Zero location problems
450
Theorem 21.1. If f(>..) = fo + ... + fn>..n is a polynomial of degree n, i.e., fn =1= 0, and n ~ 2, and if ") = >..k, then
H[l,n] I
(21.7)
BU,
if k=O,
1 11 [Ht :-[O,n-l]
-HI
Proof.
if k=n.
Suppose first that 1 :::; k :::; n - 1. Then
where
and
If j
H}k~,.nll
< k, as in (1), then
On the other hand, if j > k, then
if l:::;k:::;n-l,
21.1. Bezoutians
451
Thus, f(>.)p,k - >.k f(p,) >.-p,
s,t=O
k-l
- L fipk-l p,i + >.k-2 p,i+l + ... + >.i p,k-l} i=O
n
+
L
Ii{>.i-lp,k
+ >.i-2p,k+l + ... + >.kp,j-l} ,
j=k+l
and hence the nonzero entries in the Bezoutian matrix B are specified by the formulas (21.8)
bk-1,i
= bk- 2,i+l = ... = bj,k-l = -Ii for j = 0, ...
bi-1,k
= bi - 2,k+l = ... = bk,j-l = Ii for j = k + 1, ... ,n .
,k - 1
and (21.9)
But this is exactly the same as the matrix that is depicted in the statement of the theorem for the case 1 ~ k ~ n - 1. It remains to consider the cases k = 0 and k = n: k
= 0 ===:> CD = 0 and (]) =
t
Ii { >.i
>.
j=l
=
p,j }
P,
and hence that formula (21.9) is in force for k = O. Similarly, k
=n
===:>
CD =
L Ii {>.j p,n>. __ >.np,j} and (]) = 0
n-l
P,
j=O
and hence that formula (21.8) is in force for k = n.
o
Corollary 21.2. B(j, g) is a symmetric Hankel matrix. Proof. Formula (21.7) clearly implies that B(j, 'Pk) is a symmetric Hankel matrix. Therefore, the same conclusion holds for n
B(j,g) = LgjB(f''Pj) , k=O since a linear combination of n x n symmetric Hankel matrices is a symmetric Hankel matrix. 0 Remark 21.3. It is convenient to set
(21.10)
H I -- H[l,n] I
and
-
-
H, = H,
[O,n-l]
21. Zero location problems
452
for short. Then the formula for k = 1, ... ,n - 1 can be expressed more succinctly as (21.11) BU, 'Pk) = E(k+l,n) HfN k - N n- kHfE(1,k) if k = 1, ... ,n - 1. Exercise 21.1. Verify the formulas
(NT)k N k =
[~ In~k] = E(k+l,n)
for
k
= 1, ...
,n - 1.
for
k
= 1, ...
,n - 1.
Exercise 21.2. Verify the formulas
Nk(NT)k
=
[InOk
~] = E(l,n-k)
Exercise 21.3. Show that formula (21.11) can be expressed as (21.12) BU, 'Pk)
= (NT)k N kHfN k - N n- kHfNn-k(NT)n-k
for
k
= 0, ...
,n.
21.2. A derivation of the formula for H f based on realization If N E lR nxn is the matrix defined by formula (21.6), then 1
o
A
1
o o
An -
Thus a polynomial f(A) (21.13)
fo
1
An -
2
1
= fo + IIA··· + fnAn admits the realization
+ IIA··· + fnAn
= fo
+ A[II ... fnJ(In
- ANT)-lel .
In this section we shall use (21.13) to obtain a new proof the first formula in (21.7). Exercise 21.4. Verify formula (21.13). Lemma 21.4. If f(A) is a polynomial of degree n 2 1, then
BU, 1) = Hf Proof.
.
In view of formula (21.13),
f(A) - f(J.l)
= =
[II··· fn]{A(In - ANT)-l - J.l(I - I-£NT)-l }el [II··· fn]{(A - J.l)(In - ANT)-l(In - J.lNT)-I}el
and, consequently, (21.14)
f(Al- f(J.l) = [II ... fnJ(I - ANT)-l(I - J.lNT)-lel . -1-£
21.3. The Barnett identity
453
The next step is to verify the identity
[/1 ... fn](In - )"NT)-l = [1
(21.15)
).. ... )..n-l]Hf·
This follows easily from the observation that (21.16)
[/1 ... fn](NT)k = ef+1Hf
for
k
= 0, ... ,n -1 :
n-l
[/1 ... fn](In - )"NT)-l = [/1 ... fn] I)"NT)j j=O n-l
- L )..jej+1Hf = [1)..··· )..n-l]Hf· j=O
Thus, substituting formula (21.15) into formula (21.14),
f()..) - f(J1,)
[1 )..
i.e.,
BU, cpo) =
o
Hf as claimed.
21.3. The Barnett identity The next order of business is to establish the identity (21.17)
°
BU, CPk) = HfSj
for
k
= 0, ...
,n.
The case k = has already been established twice: once in Theorem 21.1 and then again in Lemma 21.4.
Lemma 21.5. The identity E(l,k) HfSj = _Nn- kH,E(l,k)
holds for k
= 1, ... ,no
Proof. It is readily checked by direct calculation that the asserted identity is valid when k = 1, since T TT el HfSf = -[10 Olx(n-d = -enHfelel . Thus, proceeding by induction, assume that the formula is valid for k - 1 and let where
454
21. Zero location problems
By the induction hypothesis,
"--
E (l,k-l)H Sk -
,
,.
N(n-k+1)H E(l,k-l)S
Therefore, since E(l,k-l)S,
the term
CD
= E(l,k-l)N = NE(l,k)
and
H,N
= NTH"
can be reexpressed as
CD =
_N(n-k)NH,NE(l,k)
=
_N(n-k) NN T H,E(l,k)
=
_N(n-k) E(l,n-l) H,E(l,k) . ~ is
Next, the key to evaluating
eIH,Sj -
the observation that
[A··· fn
OlX(k-1)lSj
=
[fk··· fn 01X(k_1)lN k- 1S,
=
[OlX(k-l)
=
-[fo··· fk-l Olx(n-l)]
A··· fn]S,
-e~H,E(l,k)
and Thus,
CD + ~ = _Nn- k {E(l,n-l) + ene~} H,E(l,k) = _N n- k H,E(l,k) ,
o
as needed.
Lemma 21.6. The identity E(k+1,n) H,Sj
= E(k+1,n) H,N k
holds for k = 0, ... ,n - 1.
Proof.
The special triangular structure of H, yields the identity E(k+1,n) H,
= E(k+1,n) H,E(l,n-k)
.
The asserted conclusion then follows from the fact that E(l,n-k) sj = E(l,n-k) N k .
o Theorem 21. 7. The identity H,Sj = E(k+l,n) H,N k - N n - k H,E(l,k) holds for k
= 0,1, ...
,n, with the understanding that E(n+1,n)
= E(l,O) = o.
455
21.4. The main theorem on Bezoutians
Proof.
This is an immediate consequence of the preceding two lemmas.
o Theorem 21.8. (The Barnett identity) If f()..) = fo + 11>' + ... + fn)..n is a polynomial of degree n (i.e., fn =1= 0) and g()..) is a polynomial of degree ~ n with at least one nonzero coefficient, then (21.18)
Proof.
B(f,g)
= H, g(8,).
By formula (21.11), the identity
B(f, 'Pk) = E(k+l,n) H,N k - N n- kH,E(l,k) holds for k = 0, 1, ... ,n, with self-evident conventions at k (see (21.12). Thus, in view of Theorem 21.7,
= 0 and k = n
Therefore, n
n
B(f,g) = LgkB(f,'Pk) = LgkH,Sj, k=O
k=O
o
as needed.
21.4. The main theorem on Bezoutians This section is devoted to the statement and proof of the main theorem on Bezoutians. It is convenient to first establish a lemma for Jordan cells.
Lemma 21.9. If g()..) is a polynomial and N = C~p), then
(21.19)
g(P-l)V') (p-l!
g()..)
g(l)V')
o
g()..)
gt )W
o
o
g(>.)
1.
2
p-2.
and rankg(Cf})
=
{
if g()..) =1= 0 p ~ s if g(>.) = ... = g(s-l)()..)
where, in the last line, s is an integer such that 1
=0
~
but g(s)()..)
s :::; p.
=1=
0 '
21. Zero location problems
456
Proof. Let r denote a circle of radius R > and is directed counterclockwise. Then
1).1 that is centered at the origin
= ~ { g(()((Ip - )'Ip - N)-ld(
g().Ip + N)
27rZ
lr
p-1
g( ()
'"" _1 {
~ 27ri
3=0
-
lr (( - ).)1+1
~ g(j)().) ~ j=O
.,
Nj d(
j
N.
J.
The formula for the rank of g(Cf}) is clear from the fact that the matrix under consideration is an upper triangular Toeplitz matrix. Thus, for example, if p = 3, then g().) g(1)().) [ g().Ia + N) = 0 g().)
o
0
But this clearly exhibits the fact that rank g().Ia + N)
=
3 if g().) { 2 if g().) 1 if g().)
#0 = 0 and g(l)().) # 0 = g(1)().) = 0 and g(2)().) # 0
o Exercise 21.5. Confirm formula (21.19) for the polynomial n
g().)
n
= Lgk).k
by writing g().Ip + N)
=L
k=O
gk()'Ip + N)k
k=O
and invoking the binomial formula. [REMARK: This is a good exercise in manipulating formulas, but it's a lot more work.] Theorem 21.10. If f().) is a polynomial of degree n (i.e., fn # 0) and g().) is a polynomial of degree::; n with at least one nonzero coefficient, then dim NB(f,g) is equal to the number of common roots of f().) and g().) counting multiplicities. Proof. The proof rests on formula (21.18). There are three main ingredients in understanding how to exploit this formula: (1) The special structure of Jordan forms J, of companion matrix
f().)
= fn(). - ).1)m 1
••• (). -
Sr
).k)mk
with k distinct roots).1.··· ,).k, where fn # 0 and m1 + .. ·+mk then (up to permutation of the blocks) the Jordan form J,
= diag{C(m 1 ) >'1
'
C(m 2 )
>'2'
•••
If
d mk )}.
'>'k
= n,
21.5. Resultants
457
= U J,U- 1 ,
(2) The special structure of 9(8,): If 8, U9(J,)U- 1 . Therefore,
then 9(8,) =
9(J,) = diag{9(Ci:n1 )), ••• , 9(Ct~k))} and correspondingly, k
dim NB(f,g)
= dim N9(Sf) = L
dim N
dmj) . g(
j=1
)..j
)
(3) The formulas in Lemma 21.9, which clarifies the connection between dim N (mj) and the order of )..j as a zero of 9()..). g(G)..
)
J
Let Vj = dim Ng(Ajlmj+N) for j = 1, ... , k.
Then, clearly, Vj
= mj
Moreover, if Vj
-
rank9(Cimj)) J
> 0, then 9()..) = ().. -
and
Vj
> 0 ~ 9()..j) = o.
)..jtj hj()"), where hj()"j)
i= 0;
o
i.e., 9()..) has a zero of order Vj at the point )"j. Exercise 21.6. Show that if A E
9(B) =
c nxn , BE cnxn and AB =
L 9(k)(A) , (B k.
BA, then
n
A)k
k=O
for every polynomial 9()..) of degree ~ n. Exercise 21. 7 . Use Theorem 21.10 and formula (21.18) to calculate the number of common roots of the polynomials I(x) = 2 - 3x + x 3 and 9(X) = -2 +x +x2.
21.5. Resultants The 2n x 2n matrix
10 h o 10 R(f,9)
=
0
In-l In-2
In In-l
0
In
o o
10
0 In h h ........................ .................................
90 91 0 90
9n-l 9n-2
o
90
0
9n 9n-l
0 9n
0
0
92
9n
21. Zero location problems
458
based on the coefficients of the polynomials f(>.) = fo + II>' + ... + fn>.n and g(>.) = go + gl>' + ... + gn>.n is termed the resultant of f(>.) and g(>.). Theorem 21.11. If f(>.) is a polynomial of degree n (i.e., fn t= 0) and g(>.) is a polynomial of degree ~ n with at least one nonzero coefficient, then dimNR(f,g) = the number of common roots of f(>.) and g(>.) counting multiplicities.
The proof rests on a number of matrix identities that are used to show that dim NB(f,g) = dim NR(f,g) . It is convenient to express the relevant matrices in terms of the n x n matrices: 0 1 0 0
0 1 1 0
0 0 0 0
and
(21.20) Z= 0 1 1 0
0 0 0 0
fn 0
0 0
1 0
n-l
=L
Hf=
0 0
N=
0 0 0 0
II h h (21.21)
0 1
fn-j Z(NT" )J,
j=O
0
fn
0
0
fo
II (21.22)
fo fo
(21.23)
T,
n-l
= Lfj ZNj,
Hf=
o
II
fn-l
II . . . fo
...
fn-l fn-2
=[::: .. . o· fo
I
j=O
=
nL-l j=O
" f"J NJ
and 0
(21.24)
[ fn fn-l Gf
= ZHf =
il
fn
U
n-l
T" =L fn-j (N )3.
j=O
h
21.5. Resultants
459
Lemma 21.12. The matrices Z and N satisfy the relations
(21.25)
= NTZ and ZN T = NZ.
ZN
Proof.
In terms of the standard basis ej, j = 1, ...
,n, of en,
n-l n Z = L eje~_j+1 and N = L ei e T+l· j=1 i=1 Therefore, n
n-l Leje~-i+l L ei e T+1 j=1 i=1
ZN
n
L ej(e~_j+1 en-j+1)e~-i+2 j=2 n
L ej j=2
e~-i+2'
which is a real symmetric matrix: n
(ZN)T
n
= Len-i+2 eJ = L eie~_i+2 = ZN. j=2
i=2
Therefore, ZN
= NTZT = NTZ,
since Z = ZT. This justifies the first formula in (21.25). The proof of the second formula is similar. 0 Lemma 21.13. If f(>.) = fo + II>' + ... + fn(>.)n and g(>.) ... + gn(>.)n, then the Bezoutian
= go + g1>' +
(21.26) and (21.27)
Proof.
Clearly,
f(>.)g(J1.) - g(>.)f(J1.) >'-J1.
f(>.) - f(J1.) g(J1.) _ g(>.) -g(J1.) f(J1.) >'-J1. >'-J1. = v(>'? {Hfv(J1.)9(J1.) - Hg v(J1.)f(J1.)},
=
460
21. Zero location problems
in view of formulas (21.3) and (21.7). Moreover,
v(~)g(~)
=
[J1 1g(~) o
i.e.,
(21.28) Consequently,
f(>.)g(l-£) - g(>.)f(l-£)
v(>.f {HjTg - HgTj} v(l-£)
=
>'-1-£
+ I-£nv(>.f{HjGg - HgG j } v(I-£). Thus, in order to complete the proof, it suffices to verify (21.27). However, by formulas (21.21) and (21.24),
(I: - (I:
H,Gg - HgG, =
fn_jZ(NT)j)
)=0
(I:
gn_k(NT)k)
k=O
gn_kZ(NT)k)
k=O
(~fn_j(NT)j) )=0
n-ln-l
L L fn-j gn-k Z((NT)i+k -
=
j=Ok=O =
(NT)k+ j )
.
0,
as claimed.
D
Proof of Theorem 21.11. Lemma 21.13 clearly implies that
[ Hj -Hg ] [Tg
o
In
Tf
Gg ] Gf
=
[BU,g) 0 ] T, Gj
or, equivalently, that
[-~~
Zf] [£ g;] = [:~"g)
~f]·
461
21.6. Other directions Therefore, since H, and G, are invertible when fn =I 0 and
=
[T,Tg G,] Gg ,
NRU,g)
= dim NBU,g)
R(f,g)
it follows that dim
,
as claimed.
*
o
Exercise 21.8. Show that the polynomial Pn(.X) = 2:.1=0 has simple roots. [HINT: It is enough to show that R(Pn, p~) is invertible.] Exercise 21.9. Use Theorem 21.11 to calculate the number of common roots of the polynomials f(x) = 2 - 3x + x 3 and g(x) = -2 + x + x 2 •
21.6. Other directions Theorem 21.14. Let f()..) = fo + .,. + fn)..n be a polynomial of degree n (i.e., fn =I 0), let g()..) = go + ... + gn)..n be a polynomial of degree less than or equal to n that is not identically zero and let B = B (f, g) . Then S'f B - BS, = 0
(21.29)
and B is a solution of the matrix equation
(21.30)
Sf B - BS, = fn(v - v){[gO .,. gn-l]
where
v
T
+ gn vT },
T = enS, = - fn1 [fo ... fn-I].
Proof. In view of Corollary 21.2, B = BT, Therefore, since B(f, 'Pk) = H,Sj, by Theorem 21.8, it follows in particular that H,
= H'f and H,S, = (H,S,?
and hence, upon reexpressing the last identity, that (21.31)
H,S,
= (H,S,? = S'fH] = S'fH"
which serves to justify the first assertion. The next step is to observe that S'f
= NT +ve~
and and hence that Sf = S'f + (v - v) e~ . Therefore, SfB
= S'fB + (v -
v) e~B
= BS, + (v-v) e~B.
21. Zero location problems
462
Moreover, by Theorem 21.8, n
e~B
= e~Hfg(Sf) = fn
e[g(Sf)
= fn Lgke[Sj. k=O
The asserted formula (21.30) now follows easily from the fact that T el
Skf- {
e[+k if k = 0, ... ,n - 1 vT
if k = n.
o Theorem 21.15. Let f(>.) = fo (i.e., fn =1= 0), let
+ ... + fn>.n
be a polynomial of degree n
and let B = B(f, f#). Then:
(1) -iB is a Hermitian matrix. (2) The Bezoutian B = B(f, f#) is a solution of the matrix equation (21.32) . S?X - XSf = Ifnl 2(v - v)(v - v? for X E c nxn . (3) The Hermitian matrix B = -iB is a solution of the matrix equation SHX + XS = Ifnl 2(v - v)(v - v)H for X E c nxn , where S = -iSf. (4) If B = BU, f#) is invertible, then E+(B) =
the number of roots of f(>.) in C+
and the number of roots of f(>.) in C_ . Proof. Let g(>.) = Ej=ogk>.k. Then g(>.) = f#(>.) if and only if gj = fj for j = 0, ... ,n. In this case, the right-hand side of equation (21.30) is equal to T fn(v - v)(f nV
+ [!- 0 ... -f n-l])
Ifni 2(v- - v)(v - -v)T = Ifnl 2(v - v)(v - v)H .
=
Thus (2) holds and (3) follows easily from (2):
SHB+BS=S?B-BSf =
IfnI 2(v-v)(v-v)H t
O.
Moreover, if B is invertible, then, by Theorem 21.10, f(>.) and f#(>.) have no common roots; i.e., if A1, ... ,Ak denote the distinct roots of f(>'), then
{>'b ... ,Ak} n {AI, ... ,Ak} = 0.
21.7. Bezoutians for real polynomials
463
In particular, f (>.) has no real roots and therefore a (Sf) n lR = 0 and hence 0'(8) n ilR = 0. Therefore, by Theorem 20.15,
&+(8)
= &+(13) and &_(8)
=
&_(13).
But this is easily seen to be equivalent to assertion (4), since -i>. E 11+
{::=>
>. E C+
f(>.) = 0 {::=> >.
and
E
O'(Sf)
{::=>
-i>. E 0'(8).
o
Item (1) is left to the reader.
21. 7. Bezoutians for real polynomials Theorem 21.16. Let
a(>.) =
n
n
j=O
j=O
L aj>.j and b(>.) = L bj>.j
be a pair of polynomials with real coefficients and suppose that an Then
=1= 0
and
Ibol + ... + Ibnl > O.
(1) B(a, b) = B(a, b)H. (2) B(a, b) -< 0 if and only if the polynomial a(>.) has n real roots >'1 < ... < >'n and a'(>'j)b(>'j) < 0 for j = 1, ... ,n. Proof. Let B formulas
= B(a, b) and recall the formula in
(21.3) for v(>.). The
a(>.)b(J.L) - b(>.)a(J.L) >'-J.L -
=
-
(a(>')b(]l) - ~(>.)a(]l)) >'-J.L -
=
-
( a(]l)b(>'2 - b(]l)a(>.)) J.L->' (v(jL)T Bv(X)) H
H
H
v(>.)T BHv(J.L) imply that v(>.f Bv(J.L) = v(>.f BHv(J.L) for every choice of>. and J.L in C and hence that (1) holds. To verify (2), suppose first that B
-< O. Then the formulas
a(>')b(X) - b(>')a(X) = v(X)H Bv(X) < 0 if >. =1= X imply that a(>.)
=1=
>.->. 0 if >. ¢ lR, because
a(>.) = 0 {::=> a(>.) = 0 {::=> a(X) = 0
464
21. Zero location problems
(since a(A) has real coefficients) and this would contradict the last inequality. Therefore, the roots of a(A) are real. Moreover, if a(f-L) = 0 for some point f-L E JR., then _
lim (a(A) - a(f-L))b(f-L) A - f-L
A-+p,
=
lim a(A)b(f-L) - b(A)a(f-L) A-+p, A - f-L
=
v(f-L)H BV(f-L) < O.
(Thus, the roots of a(A) are simple.) Next, suppose conversely that the roots of a(A) are real and a' (f-L)b(f-L) < 0 at every root f-L. Then the roots of a(A) are simple and hence, ordering them as Al < A2 < ... < An, the formulas
V(Aj)H BV(Ak) = a(Aj)b(Ak) - b(Aj)a(Ak) = 0 if j
-=1=
k
Aj - Ak and
V(Aj)H BV(Aj) = a'(Aj)b{Aj) < 0 imply that the matrix VH BV, in which
V = [V{Al) ... V(An)] is the n x n Vandermonde matrix with columns V(Aj), is a diagonal matrix with negative entries on the diagonal. Therefore, B -< 0, since V is invertible. 0
Exercise 21.10. Show that if, in the setting of Theorem 21.16, B( a, b) -< 0 and bn = 0, then bn-l -=1= 0 and the polynomial b(A) has n - 1 real roots f-Lb . .. ,f-Ln-l which interlace the roots of a(A), i.e., Al < f-Ll < A2 < ... < An-l < f-Ln-l < An.
Exercise 21.11. Show that if, in the setting of Theorem 21.16, bn -=1= 0, then B(a, b) -< 0 if and only if b(A) has n real roots f-Ll < ... < f-Ln and b'(f-Lj)a(f-Lj) > 0 for j = 1, ... ,n.
21.8. Stable polynomials A polynomial is said to be stable if all of its roots lie in the open left half plane IL.
Theorem 21.17. Let p( A) = Po +PI A+ ... +PnA n be a polynomial of degree n with coefficients Pj E JR. and Pn -=1= 0 and let
(21.33)
a(A)
= p(iA) +2P( -iA)
and b(A)
= p(iA) ~;( -iA) .
Then p(A) is stable if and only if the Bezoutian B(a, b) -< O.
21.8. Stable polynomials
465
Remark 21.18. If n = 2m, then
a(..\) = Po - P2..\2
+ ... + (-1)mp2m..\2m
and b(..\) = PI..\ - P3..\3
+ ... + (_1)m-I p2m _ I ..\2m-I.
If n = 2m + 1, then
a(..\)
= Po - P2..\2 + ... + (-1)mp2m..\2m
and b(..\)
Proof. that:
= PI..\ - P3..\3 + ... + (-1)mp2m+1..\2m+1.
Let j(..\) = p(i..\) and j#(..\) = jeX). Then it is readily checked
(1) a(..\) = j(..\) +2 j #(..\). (2) b(..\) = j(..\) 2/#(..\) . (3) B(a, b)
= - B(j'l/#).
(4) p(..\) is stable if and only if the roots of f(..\) belong to the open upper half plane C+. Now, to begin the real work, suppose that the Bezoutian B(a, b) -< O. Then -iB(j, f#) >- 0 and hence, in view of Theorem 21.15, the roots of the polynomial f (..\) are all in C+. Therefore, the roots of p(..\) are all in 1L; i.e., p(..\) is stable. The main step in the proof of the converse is to show that if p(..\) is stable, then B(a, b) is invertible because if B(a, b) is invertible, then B(j, f#) is invertible, and hence, by Theorem 21.15, the number of roots of j (..\) in C+ =
the number of roots of p(..\) in IL
=
n;
i.e., -iB(j, j#) >- 0 and therefore B(a, b) -< O. To complete the proof, it suffices to show that a(..\) and b(..\) have no common roots. Suppose to the contrary that a(a) = b(a) = 0 for some point a E C. Then p(ia) = a(a) + ib(a) = O. Therefore, ia E 1L. However, since a(..\) and b(..\) have real coefficients, it follows that p(ia) = a(a) + ib(a) = a(a) + ib(a) = 0
21. Zero location problems
466
also. Therefore, if n is a common root of a(A) and b(A), then in and ia. both belong to IL; i.e., the real part of ia is negative and the real part of ia. is negative. But this is impossible. D Exercise 21.12. Verify the four assertions (1)-(4) that are listed at the beginning of the proof of Theorem 21.17. [HINT: To check (4) note that if p(A) = (A-Ad ml ... (A-Ak)m/o, thenp(iA) = (iA+i 2AI)m l ... (iA+i2Ak)mk.] Exercise 21.13. Show that if p(A) and q(A) are two stable polynomials with either the same even part or the same odd part, then tp(A) + (1- t)q(A) is stable when t ::; 1. (In other words, the set of stable polynomials with real coefficients that have the same even (respectively odd) part is a convex set.)
°: ;
21.9. Kharitonov's theorem A problem of great practical interest is to determine when a given polynomial
p(A) = Po + PIA + ... + Pn An is stable. Moreover, in realistic problems the coefficients may only be known approximately, i.e., within the bounds (21.34)
'!!.i ::; Pi ::; Pj
for j
= 0, ... ,n.
Thus, in principle it is necessary to show that every polynomial that meets the stated constraints on its coefficients has all its roots in IL. A remarkable theorem of Kharitonov states that it is enough to check four extremal cases. Theorem 21.19. (Kharitonov) Let '!!.j ::; Pj' j = 0, ... ,n, be given, and let
'Pl(A) =
l!.o + P2 A2 + '!!.4 A4 + .. .
'P2(A) = Po + P A2 + P4 A4 + .. . -2 1/JI (A) = P A+ P3A3 + !:.5 p A5 + ... -1 1/J2(A) = PIA + &A3 + P5 A5 + .... Then every polynomial
p(A) = Po + PIA + ... + Pn An with coefficients Pi that are subject to the constraints (21.34) is stable if and only if the four polynomials
Pjk(A) are stable.
= 'Pj(A) + 1/Jk(A) ,
j, k
= 1,2,
21.10. Bibliographical notes
467
Proof. One direction of the asserted equivalence is self-evident. The strategy for proving the other rests on Theorem 21.17, which characterizes the stability of a polynomial p(A) in terms of the Bezoutian B(a, b) of the pair of associated polynomials a(A)
=
p(iA) +2P(-iA)
and b(A) = p(iA) ~r -iA) .
However, since
pjk(iA)+Pjk(-iA) 2
pjk(iA) - Pjk( -iA) 2i
.('A)
-
Z
'l/Jk(iA) - 'l/Jk( -iA) = _ '.1. ('A) 2i Z
Thus, upon setting
it remains to show that
B(Clj, bk) -< 0 for j, k = 1,2 ===> B(a, b) -< O. Theorem 21.16 is useful for this. > O. Then Suppose for the sake of definiteness that n = 2m and P -n clearly Cl1(A) ::; a(A) ::; Cl2(A) for every point A E JR. Moreover, since the Bezoutians B(Cl1' b1) -< 0 and B(Cl2' b1) -< 0, b1(A) has n-1 distinct real roots and the polynomials Cl1 (A) and Cl2 (,x) each have n distinct real roots that interlace the roots of b1 (A). Thus, by Exercise 21.10, the polynomial a(A) also has n real roots AI,.. . ,'xn that interlace the roots of b1 (A) and meet the constraint a'('xj)b1(Aj) < O. Therefore, by Theorem 21.16, B(a, b1) -< O. Similar considerations imply that a'(Aj)b2(Aj) < 0 and B(a, b2) -< O. Moreover, since b1(A) ::; b('x) ::; b2(A) for A > 0, b2(A) ::; b(A) ::; b1('x) for'x < 0 and b1 (0) = b(O) = b2(0) = 0, the roots of a('x) interlace the roots of b(A) and a'('xj)b(Aj) < 0 for j = 1, ... ,no Thus, Theorem 21.16 guarantees that B(a, b) -< O. This completes the proof for the case under consideration. The other cases may be handled in much the same way. The details are left to the reader. 0
21.10. Bibliographical notes My main source for early versions of this chapter was the book [45J. However, a number of the proofs changed considerably as the chapter evolved. Formula (12.10) and its block matrix analogue (12.37) exhibit the inverse of a Toeplitz matrix as a Bezoutian that is tailored to the UIilt circle rather than to JR or iJR. The application of Bezoutians to verify Kharitonov's theorem is due to A. Olshevsky and V. Olshevsky [55J. There are also versions
468
21. Zero location problems
of Kharitonov's theorem for polynomials with complex coefficients. Exercise 21.13 is based on an observation in the paper [16J.
Chapter 22
Convexity
All extremism, fanaticism and obscumntism come from a lack of security. A person who is secure cannot be an extremist. He uses his heart and his mind in a normal fashion. I resent very much that certain roshei yeshiva and certain teachers want to impose their will upon the boys.. ... It is up to the individual to make the choice. Rabbi Joseph B. Soloveitchik, cited in [59], p. 237 and p. 240 • WARNING The warnings in preceding chapters should be kept in mind.
22.1. Preliminaries Recall that a subset Q of a vector space U over IF is said to be convex if u, v E Q => AU + (1- A)V E Q for every A such that 0:::; A:::; 1 j and if U is a normed linear space over IF with norm II II and v E U, then Br(v) = {u E U : Ilu - vII :::; r} is a closed convex subset of U, whereas Br{v) = {u E U : Ilu - vII < r} is an open convex subset of U for every choice of r > 0 Exercise 22.1. Verify that the two sets indicated just above are both convex. [HINT: AUI + (1- A)U2 - v = A(UI - v) + (1- A)(U2 - v).] Exercise 22.2. Show that Q = {A E
c nxn : At O}is a convex set.
Exercise 22.3. Show that
Q = {(A,B,C) E c nxn x is a convex set.
c nxn X c nxn : IIAII:::; 1, IIBII:::; 1 and IICII $1}
-
469
22. Convexity
470
Figure 1 Exercise 22.4. Show that the four-sided figure in Figure 1 is not convex.
A convex combination of n vectors is a sum of the form n
L::
(22.1)
.xiVi
with
.xi
~ 0 and
i=l
Lemma 22.1. Let Q be a nonempty subset of a vector space U over IF. Then Q is convex if and only if it is closed under convex combinations, i. e., if and only if for every integer n ~ 1, n
Vb.·· ,Vn E
Q ==}
LAiVi E i=l
Q
for every choice of nonnegative numbers .xl, ... ,.xn such that .xl + ... +.xn = 1. Proof. Suppose first that Q is convex, and that VI, ... ,Vn E Q. Then, if n > 2, .xl < 1 and J.£; = .x; / (1 - .x;) for j = 2, ... ,n, the formula .xlVI
+ .x2 V2 + ... + .xnvn = AIVI + (1- .xl) {
.x2V2
+ ... + .xn Vn } 1- Al
implies that .xlVI
+ ... + Anvn E Q ¢::::::} J.£2V2 + ... + J.£nVn E Q.
Thus, Q is closed under convex combinations of n vectors if and only if it is closed under convex combinations of n - 1 vectors. Therefore, Q is closed under convex combinations of n vectors if and only if it is closed under convex combinations of 2 vectors, i.e., if and only if Q is convex. 0 Lemma 22.2. Let x E
en, let Q be a nonempty subset of en and let d = inf{llx - ull : u E Q} .
Then the following conclusions hold:
22.2. Convex functions
471
(a) If Q is closed, then there exists at least one vector Uo E Q such that
IIx-uoll .
d=
(b) If Q is closed and convex, then there exists exactly one vector no E Q
such that
IIx- uoll . Choose a sequence of vectors UI, U2, ... d=
Proof.
Ilx for k
Uk
II
E
Q such that
1 ~ d+k
= 1,2, .... Then the bound
Ilukll = IIUk - X + xii
~
Iluk - xii + IIxll
1 ~ d + k + Ilxll
guarantees that the vectors Uk are bounded and hence that a subsequence Ukl' Uk2' ... converges to a limit uo, which must belong to Q if Q is closed. The bounds 1 d ~ Ilx - uoll ~ IIx - Uk j I + Ilukj - uoll ~ d + k- + Ilukj - uoll J
serve to complete the proof of (a). Suppose next that Q is both closed and convex and that d = Then, by the parallelogram law,
Ilx - voll· 4d2
IIx - uoll =
211x - uoll 2+ 211x - vol1 2= Ilx - Uo + x - vol1 2+ Ilvo - uol1 2 = 411x - uo; Vo 112 + Ilvo - uoll 2~ 4d2 + Ilvo - uol1 2.
Therefore, 0
~
IIvo - uoll, which proves uniqueness.
o
22.2. Convex functions A real-valued function f(x) that is defined on a convex subset Q of a vector space U over lR is said to be convex if
(22.2)
f(tx
for x, y E Q and 0
~
(22.3)
t
+ (1 ~
t)y)
~
tf(x) + (1 - t)f(y)
1, or, equivalently, if Jensen's inequality holds: n
n
i=1
i=1
f(L AiXi) ~ L Ai! (Xi)
for every convex combination E~=I AiXi of vectors in Q.
Lemma 22.3. Let f(x) be a convex function on an open subintenJal Q of lR and let a < c < b be three points in Q. Then
(22.4)
f(c) - f(a) < f(b) - f(a) < f(b) - f(c) . c-a b-a b-c
472
Proof.
22. Convexity
Let 0 < t < 1 and c = ta + (1- t)b. Then the inequality
f(c)
~
tf(a) + (1 - t)f(b)
implies that
feb) - f(c)
~
t(f(b) - f(a)),
or, equivalently, that
feb) - f(c) > feb) - f(a) b- a ' t(b - a) which serves to prove the second inequality, since t(b-a) = b-c. The proof of the first inequality is established in much the same way. The first step is to observe that f(c) - f(a) ~ (1 - t)(f(b) - f(a)). The rest is left to the reader as an exercise.
o
Exercise 22.5. Complete the proof of Lemma 22.3. Lemma 22.4. Let Q = (a, {3) be an open subinterval of IR and let f E C2 (Q). Then f(x) is convex on Q if and only if /,,(x) ~ 0 at every point XEQ.
Proof. Suppose first that f(x) is convex on Q and let a < c < b be three points in Q. Then upon letting c 1 a in the inequality (22.4), one can readily see that
f'ea) 5, feb) - f(a) . b-a Next, upon letting c i b in the same set of inequalities, it follows that
feb) - f(a) 5, f'(b). b-a Thus, fl(a) ~ f'(b) when a 5, band f(x) is convex. Therefore, /,,(x) ~ 0 at every point x E Q.
Conversely, if /,,(x) ~ 0 at every point x E Q and if a < c < b are three points in Q, then, by the mean value theorem,
f(c) - f(a) = f'(f.) c-a
for some point f. E (a, c)
and
feb) - f(c) = f'e-f}) for some point ", E (c, b). b-c Therefore, since f'(f.) 5, 1'(",), it follows that (f(c) - f(a))(b - c) 5, (f(b) - f(c))(c - a) . But this in turn implies that
f(c)(b - a)
~
(b - c)f(a) + (c - a)f(b) ,
22.3. Convex sets in lR n
473
which, upon setting c = ta + (1 - t)b for any choice of t E (0,1), is easily seen to be equivalent to the requisite condition for convexity. 0 Exercise 22.6. Let f(x) = xr on the set Q = (0,00). Show that I(x) is convex on Q if and only if r ~ 1 or r ~ 0 and that - I (x) is convex on Q if and only if 0 ~ r ~ 1. Lemma 22.5. Let Q be an open nonempty convex subset of lR. n, and let IE C2(Q) be a real-valued function on Q with Hessian Hf(x). Then
(22.5)
I
is convex on Q il and only il Hf(x)
~ 0
on Q
for every point x E Q.
Proof. Let Ix,y = {t : x + ty E Q} for any pair of distinct vectors x, y E Q. Then it is readily checked that Ix,y is an open nonempty convex subset of R Moreover, since the function g(t) = I(x + ty)
satisfies the identities
and Ilg(tl) + (1 - ll)g(t2) = Ill(x + tlY) + (1 - Il)f(x + t2Y) , it is also readily seen that f is convex on Q if and only if 9 is convex on Ix,y for every choice of x, y E Q. Thus, in view of Lemma 22.4 it follows that f is convex on Q if and only if
(~:;) (0) ~ 0
for every choice of x, y E Q.
But this serves to complete the proof, since n (8 2f ) ( fJ29) 8t (0) = L Yi 8x.8x. 2
i,j=l
~
(x)Yj.
J
o Corollary 22.6. Let Q be a closed nonempty convex subset of lRfl, and let f E C2 (Q) be a convex real-valued function on Q. Then f attains its minimum value in Q.
22.3. Convex sets in R n Lemma 22.7. Let Q be a closed nonempty convex subset olRfl, let x E lR n and let Ux be the unique element in Q that is closest to x. Then (22.6)
(x - Ux, U - UX) ~ 0 for every U E Q .
474
22. Convexity
Proof. Let u E Q. Then clearly (1 - -X)ux + -Xu E Q for every number ). in the interval 0 ~ ). ~ 1. Therefore, Ilx - uxll~
< IIx - (1- -X)ux - ).ull~ IIx - Ux - -X(u - ux)lI~ =
since). E
~
IIx - uxll~ - 2-X(x - ux, u - ux)
and all the vectors belong to
~ n.
+ ).211u -
uxll~ ,
But this in turn implies that
2-X(x - Ux, u - ux) ~ ).211u - uxll~ and hence that 2(x - Ux, u - ux) ~ -Xllu - uxll~ for every -X in the interval 0 < -X ~ 1. (The restriction). > 0 is imposed in order to permit division by -X in the line preceding the last one.) The desired inequality (22.6) now drops out easily upon letting). 1 o. 0 Exercise 22.7. Let B be a closed nonempty convex subset of ~ n, let ao E ~ n and let hex) = (ao - bo, x), where bo is the unique vector in B that is closest to ao. Show that if ao ¢ B, then there exists a number 6 > 0 such that h(ao) ~ 6 + h(b) for every vector b E B. Lemma 22.S. Let U be a nonempty subspace of ~ n and let x E ~ n. Then there exists a unique vector Ux E U that is closest to x. Moreover,
(x - Ux, u)
(22.7)
= 0 for every vector
u
EU .
Proof. The existence and uniqueness of Ux follows from Lemma 22.7, since a nonempty subspace of ~ n is a closed nonempty convex set. Lemma 22.7 implies that (x - ux, u) = (x - Ux, u
+ Ux -
ux) ~ 0
for every vector u E U. Therefore, since U is a subspace, the supplementary inequality (x - Ux, -u) ~ 0 is also in force for every vector u E U.
o
Lemma 22.9. Let Q be a closed nonempty convex subset of ~ n, let x, y E ~ n and let Ux and Uy denote the unique elements in Q that are closest to x and y, respectively. Then IIUx-uyll ~ IIx-yll·
(22.8) Proof.
Let 0:
= (x - Ux, Uy - ux) and {3 = (y - Uy, Ux - Uy) .
22.4. Separation theorems in
~n
475
In view of Lemma 22.7, a ::; 0 and (J ::; O. Therefore
II(x - ux) - (y - uy) + (ux - uy)ll~
Ilx - Yl12 =
lI(x-ux)-(y-uy)II~-a-{J+llux-uyll~
> Ilux - uyll~ , as claimed. 0 The inequality (22.8) implies that if Q is a closed nonempty convex subset of ~ n, then the mapping from x E ~ n - 7 Ux E Q is continuous. This fact will be used to advantage in the next section, which deals with separation theorems.
22.4. Separation theorems in
~n
The next theorem extends Exercise 22.7.
Theorem 22.10. Let A and B be disjoint nonempty closed convex sets in ~ n such that B is also compact. Then there exists a point ao E A and a point b o E B such that
(ao - bo, a)
1 2 2 2 > "2{lIaoI1 2 -li boll 2 + llao - bo11 2} 1
2
2
2
> "2{llaolb -Ilbolb -liao - bo11 2} 2:: (ao - bo, b) for every choice of a E A and b E B. Proof. Let fA (x) denote the unique point in A that is closest to x E Then, by Lemma 22.7, (x - fA(X), a - f A(X)) ::; 0 for every
~ n.
a EA.
Moreover, since fA(X) is a continuous function of x by Lemma 22.9,
g(x) = Ilx - fA (x) II is a continuous scalar valued function of x E ~ n. In particular, 9 is continuous on the compact set B, and hence there exists a vector b o E B such that Ilbo - fA (bo)II ::; lib - fA(b)11 for every b E B. Let ao = fA(bo). Then
IIbo - aoll ::; lib - fA(b) II ::; lib - aoll for every choice of b E B, and hence, as B is convex,
IIbo - aoll 2 < 11(1 - 'x)bo +,Xb - aoll~ - 11'x(b - bo) - (ao - bo)ll~ = ,X211b - boll~ - 2'x(b - b o, ao - bo) + llao - boll~
22. Convexity
476
for 0 ::; A ::; 1. But this reduces to the inequality
2A(ao - bo, b - bo} ::; A211b - boll~ and hence implies that
2(ao - b o, b - b o} ::; Allb - boll~ for every A in the interval 0 < A ::; 1. Thus, upon letting A 1 0, we obtain the auxiliary inequality
(ao - bo, b - bo) ::; 0 for every b
E
B,
which, in turn, yields the inequality
(22.9)
(ao - bo, b) ::; (ao - bo, bo) for every b
E B.
Moreover, Lemma 22.7 implies that
(bo - ao, a - ao) ::; 0 for every a
E
A
and hence that
(22.10)
(ao - bo,a)
~(ao
- bo,ao) for every a EA.
Next, since
2(ao, bo} = llaoll~ + Ilboll~ - llao - boll~ and ao -:f:. bo, it is readily checked that
(ao - bo,ao) =
llaoll~ - ~{lIaoll~ + IIboll~ -ilao -
=
1 2 2 2 "2{lIaolb -li b ol1 2 + llao - bo11 2}
>
~{Ilaoll~ -liboll~ -ilao -
=
(ao - bo, bo) .
bolln
bolln
The asserted conclusion now drops out easily upon combining the last chain of inequalities with the inequalities (22.9) and (22.10). D
Theorem 22.11. Let A and B be disjoint nonempty closed convex sets in R n such that B is compact. Then there exists a linear functional f(x) on R n and a pair of numbers Cl, C2 E R such that
(22.11)
f(a}
~
Cl > C2
~
f(b}
for every choice of a E A and b E B.
Proof.
By Theorem 22.10, there exists a pair of points ao E A and bo E B
such that for every a E
(ao - bo,a) ~ Cl > C2 A and b E B, where
~
(ao - bo, b)
2Cl = llaoll~ - Ilboll~ + llao - boll~
477
22.5. Hyperplanes
and 2C2
The inequality Cl > C2 completed by defining
= llaoll~ -Ilboll~ -ilao - boll~ . holds because An B = 0. The proof is now easily f(x) = (x, ao
- bo) .
o 22.5. Hyperplanes The conclusions of Theorem 22.11 are often stated in terms of hyperplanes: A subset Q of an n dimensional vector space V over IF is said to be a hyperplane in V if there exists an element v E V such that the set
Q-v={U-v:UEQ} is an n - 1 dimensional subspace of V. Or, to put it another way, Q is a hyperplane in V if and only if there exists a vector v E V and an n - 1 dimensional subspace W of V such that
Lemma 22.12. Let X and Y be two subspaces of a vector space V over IF, let u, v E V and suppose that
U+X=v+y. Then X
= Y and U - v EX.
Proof. Let x EX. Then, under the given assumptions, there exists a pair of vectors y 1, Y2 E Y such that U = v + y 1 and U + x = v + y 2
.
Therefore, X
=Y2 -Yl
and hence, since Y is a vector space, X ~ y. But, by much the same argument, Y ~ X and consequently X = Y and thus U - VEX. 0 Lemma 22.13. Let V be an n dimensional vector space over IF, let a ElF and let f be a linear functional on V that is not identically zero. Then the set Qf(a)
= {v E V : f(v)
= a}
is a hyperplane. Conversely if Q is a hyperplane in V, then there exists a point a E IF and a linear functional f on V such that Q = QJ(a).
22. Convexity
478
Proof. To show that Q f (a) is a hyperplane, let u, W E Q f (a). Then u E Qf(O), since
W -
f(w - u) = f(w) - f(u) = a - a = O. Therefore, Qf(a) ~ u + Qf(O) and hence, as the opposite inclusion u + Qf(O) ~ Qf(a) is easily verified, it follows that Qf(a) = u + Qf(O). Thus, as Qf(O) is an n-l dimensional subspace of V, Qf(a) is a hyperplane.
Conversely, if Q is a hyperplane, then there exists a vector u E V and an n - 1 dimensional subspace W of V such that Q=u+W.
There are two cases to consider: (a) u ¢ Wand (b) u E W. In case (a), V = {au + w : a E IF and w E W}.
Moreover, if then (al - (2)u
= W2 -
WI
and hence as W2 - WI E Wand u ¢ W, al = a2; i.e., the coefficient of u in the representation of a vector v E Vasa linear combination of u and a vector in W is unique. Consequently, the functional fu that is defined by the rule fu(v) = a for vectors v = au + w
with wE W
is a well-defined linear functional on V and
Q = {v E V : fu(v) = l}. This completes case (a). In case (b) choose a vector y ¢ Wand note that every vector v E V admits a unique representation of the form v = ay + w
with w E W.
Correspondingly, the formula fy(ay+w) = a defines a linear functional on V and Q = {v : fy(v) = O} .
o Theorem 22.11 states that if C2 < a < Cl. then the hyperplane Qf(a) that is defined in terms of the linear functional f that meets the constraints in (22.11) separates the two sets A and B.
22.6. Support hyperplanes
479
22.6. Support hyperplanes A hyperplane H in IR n is said to be a support hyperplane of a nonempty proper convex subset Q of IR n if every vector in Q sits inside one of the closed halfspaces determined by H. In other words, if H = {x E Rn : (x, u) = c}, then either (x, u) ~ c for every vector in Q, or (x, u) 2:: c for every vector in Q. Theorem 22.14. Let a E IR n belong to the boundary of a nonempty convex subset Q of IR n. Then: (1) There exists a point b E IR n such that lib - xii 2:: lib - all for every point x E Q, the closure of Q. (2) The hyperplane H
= {a + x : x
E
IR n
and
(x, b - a) = O}
is a support hyperplane for Q through the point a. Proof. Let 0 < E < 1 and let C E Be(a) be such that c f/. Q. By Theorem 22.11, there exists a vector u E IR n such that the hyperplane if = {x E R n : (x - C, u) = O} does not intersect Q. It is left to the reader to check that (22.12)
~
.
mm{lIx - all : x E H} =
(c - a, u)
~
lIull
E.
Thus, if d E IR n is the point in the intersection of Bl (a) and the line through a and the point in if that achieves the minimum distance given in (22.12), then lid - xII 2:: 1- E for every x E Q. Thus, as E can be an arbitrarily small positive number, and the function
fcJ(x) = min{lIx - qll : q
E
Q}
is continuous, it follows that max {f(J(x) : IIx-ali = I} = 1 and is attained by some point b E IR n with lib - all lib - xii 2 f(J(b) 2 lib - all = 1 for
= 1. Thus, x E Q.
This completes the proof of (1). Next, Lemma 22.7 implies that (b - a, x - a) ~ 0 for every point x E Q, which serves to complete the proof, since the given hyperplane can also be written as H = {x E IR n: (x - a, b - a) = O} .
o Exercise 22.8. Verify formula (22.12).
22. Convexity
480 22.7. Convex hulls
Let Q be a subset of a vector space V over IF. The convex hull of Q is the smallest convex set in V that contains Q. Since the intersection of two convex sets is convex, the convex hull of Q can also be defined as the intersection of all convex sets in V that contain Q. The symbol conv Q will be used to denote the convex hull of a set Q.
Lemma 22.15. Let Q be a subset oflF n . Then the convex hull ofQ is equal to the set of all convex combinations of elements in Q, i. e., (22.13)
conv
Q= {ttiXi: n ~ 1, ti ~ 0, t t i = 1 and i=l
Xi E
Q} .
I
Proof. It is readily seen that the set on the right-hand side of (22.13) is a convex set: if n
U
=
L tiXi is a convex combination of
Xl, ... ,Xn E
Q
i=l
and k
Y=
L SjYj is a convex combination of Yl'··· ,yk E Q , j=1
then
AU + (1 - ;\)v =
n
k
i=1
j=1
L ;\tixi + L(1-
;\)SjYj
is again a convex combination of elements of Q for every choice of ;\ in the interval 0 ~ A ~ 1, since for such ;\, ;\ti ~ 0, (1 - A)Sj ~ 0 and At1
+ ... + ;\tn + (1 -
;\)Sl
+ ... + (1 -
;\)Sk
=1
.
Thus, the right-hand side of (22.13) is a convex set that contains Q. Moreover, since every convex set that contains Q must contain the convex combinations of elements in Q, the right hand side of (22.13) is the smallest 0 convex set that contains Q.
Theorem 22.16. (Caratheodory) Let Q be a nonempty subset of R n. Then every vector x E cony Q is a convex combination of at most n + 1 vectors in Q. Proof.
Let
22.7. Convex hulls
Then xEconvQ
481
[Xl]
~
E
[~] =
convQl
t,
If the vectors Yj
=
k
a; [:;] with
[:J '
aj
> 0,
Q and
Xj E
L
aj
= 1.
j=1
j = 1, ... ,k,
are linearly independent, then k ::; n + 1, as claimed. If not, then there exists a set of coefficients (31, . . . ,(3k E lR such that k
L(3jYj = 0 j=1
and 'P = {j:
(3j
> O}
=1=
0.
Let ,=min{;:: jE'P}. Then aj - ,(3j ~ 0 for j = 1, ... ,k, and, since at least one of these numbers is equal to zero, the formula (22.14) displays the vector on the left as a convex combination of at most k - 1 vectors. If these vectors are linearly independent, then there are at most n + 1 of them. If not, the same argument can be repeated to eliminate additional vectors from the representation until a representation of the form (22.14), but with k ::; n + 1, is obtained. The resulting identities k
k
1 = L(aj - ,(3j) j=1 serve to complete the proof.
and x = L(aj j=1
,(3j)Xj
o
Exercise 22.9. Let Q be a nonempty subset of lR n . Show that every nonzero vector x E conv Q can be expressed as a linear combination 2:J=l ajxj of f linearly independent vectors in Q with positive coefficients. [IDNT: The justification is a variant of the proof of Theorem 22.16.] The conclusions in Exercise 22.9 can be extended to cones: If Q is a nonempty subset of lR n, then the cone generated by Q is the set KQ =
{t
3=1
ajxj :
Xj
E Q and
aj
>
o}
482
22. Convexity
of all finite linear combinations of vectors in Q with positive coefficients. Exercise 22.10. Let Q be a nonempty subset of Rn. Show that every nonzero vector x E KQ can be expressed as a linear combination L:~=1 ajxj of £ linearly independent vectors in Q with positive coefficients. Lemma 22.17. Let Q be a nonempty subset of R n. Then
(1) Q open ====} conv Q is open. (2) Q compact ====} conv Q is compact. Proof. Suppose first that Q is open and that x = L:;=1 CjXj is a convex combination of vectors Xl, ... ,Xl E Q. Then there exists an c > 0 such that Xj + u E Q for j = 1, ... ,£ and every vector U ERn with Ilull < c. Therefore, l
x+u= LCj(Xj+u) EQ j=l
for every vector u E Rn with IIuli < c; i.e., convQ is open. Suppose next that Q is compact and that {Xj}, j = 1, 2, ... , is an infinite sequence of vectors in conv Q that converges to a vector Xo ERn. Then, by Theorem 22.16, there exists a sequence of matrices Aj E R nx(n+1) with columns in Q and a sequence of vectors Cj E R n+l with nonnegative coefficients and IIcjlll = 1 such that Xj = Ajcj for j = 1,2,.... By the presumed compactness of Q and the compactness of the {c E R n+1 : II CII 1 = I}, there exists a subsequence nl < n2,··· such that Anj ---t A, cnj ---t C and Xo
= j---+oo lim Xn1- = j---+oo lim Anjcn1- = Ac,
where A E R nx(nH) with columns in Q and C E R n+1 with nonnegative coefficients and IIcill = 1. Thus, Xo = Ac E conv Q. This proves that conv Q is closed. Since conv Q is also clearly a bounded subset of R n+1, it must be compact. 0
22.8. Extreme points Let Q be a convex subset of R n. A vector u E Q is said to be an extreme point of Q if
o < a < 1, x, y E Q
and
u = ax + (1 - a)y
==}
x = Y = u.
Lemma 22.18. Every nonempty compact convex subset of R n contains at least one extreme point. Proof. Let Q be a nonempty compact convex subset of R n and let f(x) IIxil for every vector x E Q. The inequality
If(x) - f(y)1 = "Ixil - IIylll :::; IIx - yll
=
22.8. Extreme points
483
implies that f is continuous on Q. Therefore f attains its maximum value on Q; i.e., there exists a vector u E Q such that Ilull 2 IIxll for every x E Q. The next step is to show that u is an extreme point of Q. It suffices to restrict attention to the case u i= O. But if u = ax + (1 - a)y for some pair of vectors x and y in Q and some a with a < a < 1, then the inequalities lIuli = lIax + (1- a)YII ::; allxll
+ (1- a)lIyll ::; lIull
clearly imply that Ilxll = lIyll = lIuli. Thus, (u, u)
= =
a(x, u) + (1- a)(y, u) ::; allullllxll allul1 2 + (1- a)llull 2 = lIull 2 .
+ (1- a)lIulillyll
Therefore, equality is attained in the Cauchy-Schwarz inequality, and hence x = au and y = bu for some choice of a and b in R However, since Ilxll = lIyll = lIull, it is readily seen that the only viable possibilities are a = b = 1. Therefore, u is an extreme point of Q. 0 Lemma 22.19. The set of extreme points of a non empty compact convex subset of ~ 2 is closed. Discussion. The stated conclusion is self-evident if A is either a single point or a subset of a line in ~ 2. Suppose, therefore, that neither of these cases prevails and let a = limkjoo ak be the limit of a sequence aI, a2, ... of extreme points of A. Then a must belong to the boundary of A; i.e., for every r > a the open unit ball Br(a) of radius r > a centered at a contains points in A and in ~ 2 \ A. Therefore, there exists a line L through the point a such that all the points in A sit on one side of L. Without loss of generality, we may assume that L = {x E ~ 2 : Xl = '"'{} for some '"'{ E ~ and that A C {x E ~2 : Xl ::; '"'{}. Thus, if a = ab + (1 - a)c
with b, c E A
and
a < a < 1,
then, since the first coordinates are subject to the constraints al = 'Y, bl ::; 'Y and CI ::; ,",{, it is readily seen that al = bl = Cl, i.e., b, c E An L. Moreover, if a is not an extreme point of A, then there exists a choice of points b, c E An L with b2 > a2 > C2 and a point d E A such that Br(d) C A for some r > a. Consequently, ak E conv{b,c,d} U {x E ~2: C2
< X2 < b2}
for all sufficiently large k. But this is not possible, since the ak were presumed to be extreme points of A. The conclusions of Lemma 22.19 do not propagate to higher dimensions: Exercise 22.11. Let QI = {x E ~3 : x~ + x~ ::; 1 and X3 = a}, Q2 = {x E ~3 : Xl = 1,X2 = a and -1::; X3 ::; I}. Show that the set of extreme points of the set conv (Q1 U Q2) is not a closed subset of ~ 3 .
22. Convexity
484
We turn next to a finite dimensional version of the Krein-Milman theorem. Theorem 22.20. Let Q be a nonempty compact convex set in IR n. Then Q is equal to the convex hull of its extreme points. Discussion. By Lemma 22.18, the set E of extreme points of Q is nonempty. Let E denote the closure of E and let F = cony E denote the convex hull of E and suppose that there exists a vector qo E Q such that qo ¢ F. Then, since F is closed, Theorem 22.11 guarantees that there exists a real linear functional h on IR n and a number 8 E IR such that h(~)
(22.15)
> 82 h(x) for every x E F.
Let ,=sup{h(X):XEQ}
and
Eh={xEQ:h(x)=,}.
The inequality (22.15) implies that, > 6 and hence that En Eh = 0. On the other hand, it is readily checked that Eh is a compact convex set. Therefore, by Lemma 22.18, it contains extreme points. The next step is to check that if Xo E Eh is an extreme point of Eh, then it is also an extreme point of Q: If Xo = au + (1- a)v for some pair of vectors u, v E Q and 0 < a < 1, then the identity h(xa) = ah(u)
+ (1 -
a)h(v)
implies that h(xa) = h(u) = h(v) =, and hence that u, v E Eh. Therefore, since Xo is an extreme point for Eh, Xo = U = v. Thus, Xo E E, which proves that En Eh =f. 0; i.e., if F
=f. Q ,
then
E n Eh =
0 and E n Eh =f. 0 ,
which is clearly impossible. Therefore, F = Q; i.e., cony E = Q. It remains to show that Q = cony E. In view of Lemma 22.19, this is the case if Q C 1R 2 , because then E = E. Proceeding inductively, suppose that in fact Q = cony E if Q is a subset of IR k for k < p, let Q be a subset of IR P and let q E E. Then q belongs to the boundary of Q and, by translating and rotating Q appropriately, we can assume that q belongs to the hyperplane H = {x E IR P : Xl = O} and that Q is a subset of the halfspace H _ = {x E IR P : Xl ~ O}. Thus, Q n H can be identified with a compact convex subset of IR p-l. Let E' denote the extreme points of Q n H. By the induction hypothesis Q n H = cony E'. Therefore, since q E Q n H, and cony E' ~ cony E, it follows that (22.16)
E
~
convE,
and hence that Q = cony E, as claimed.
o
22.10. The Minkowski functional
485
Theorem 22.21. Let Q be a nonempty compact convex set in
]Rn. Then every vector in Q is a convex combination of at most n + 1 extreme points ofQ·
Proof. This is an immediate consequence of Caratheodory's theorem and the Krein-Milman theorem.
0
22.9. Brouwer's theorem for compact convex sets A simple argument serves to extend the Brouwer fixed point theorem to compact convex subsets of ]R n.
Theorem 22.22. Let Q be a nonempty compact convex subset of 1R nand let f be a continuous mapping of Q into Q. Then there exists at least one point q E Q such that f( q) = q. Proof. Since Q is compact, there exists an r > 0 such that the closed ball Br(O) = {x E 1R n: IIxll ~ r} contains Q. Then, by Lemma 22.2, for each point x E Br(O) there exists a unique vector ~ E Q that is closest to x. Moreover, by Lemma 22.9, the function g from Br(O) into Q that is defined by the rule g(x) = ~ is continuous. Therefore the composite function h(x) = f(g(x)) is a continuous map of Br(O) into itself and therefore has a fixed point in Br(O). But this serves to complete the proof: f(g(x)) = x
===? f(~)
= x ===? x E Q ===? x = ~
===?
f(x) = x.
o Exercise 22.12. Show that the function f(x) =
(Xl
+ X2)/2]
[
J X IX2 maps the set Q = {x E 1R 2 : 1 ~ Xl ~ 2 and 1 ~ X2 ~ 2} into itself and then invoke Theorem 22.22 to establish the existence of fixed points in this set and find them.
Exercise 22.13. Show that the function f defined in Exercise 22.12 does not satisfy the constraint IIf(x) - f(y)1I < 'Yllx - yll for all vectors x, y in the set Q that is considered there if 'Y < 1. [HINT: Consider the number of fixed points.]
22.10. The Minkowski functional Let X be a normed linear space over 1F and let Q ~ X. Then the functional
PQ(x) = inf{t
>0:
T
E
Q}
486
22. Convexity
is called the Minkowski functional. If the indicated set of t is empty, then PQ(x) = 00. Lemma 22.23. Let Q be a convex subset of a normed linear space X over IF such that Q ;2 Br(O) for some r > 0
and let int Q and Q, denote the interior and the closure of Q, respectively. Then: (1) (2) (3) (4)
PQ(x + y) ~ PQ(x) + PQ(Y) for x, Y EX. pQ(ax) = apQ(x) for a ~ 0 and x EX. PQ(x) is continuous. {x EX: PQ(X) < I} = intQ.
(5) {x EX: PQ(X) ~ I} = Q. (6) If Q is also bounded, then PQ(x)
= 0 '* x = o.
Proof. Let X,Y E X and suppose that a-Ix E Q and choice of a > 0 and 13 > o. Then, since Q is convex,
f3- Iy
E
Q for some
x+y = _a_(a-Ix) + _f3_(f3- I y)
a+f3
a+f3
a+f3
belongs to Q and hence,
PQ(X + y) ~ a
+ 13 .
Consequently, upon letting a run through a sequence of values al ~ a2 ~ ... that tend to PQ(x) and letting 13 run through a sequence of values 131 ~ 132 ~ ... that tend to PQ(y), one can readily see that
PQ(x + y) ~ PQ(x)
+ PQ(Y)
.
Suppose next that a > 0 and PQ(x) = a. Then there exists a sequence of numbers t1, t2, . .. such that tj > 0, x E Q and lim tj = a . tj jjoo Therefore, since ax atj
E
Q and
limatj = aa, jjoo
fQ(ax)
~
afQ(x) .
However, the same argument yields the opposite inequality:
afQ(x) = afQ(a-Iax) ~ aa- 1 fQ(ax) = fQ(ax) . Therefore, equality prevails. This completes the proof of (2) when a > O. However, (2) holds when a = 0, because PQ(O) = O.
22.10. The Minkowski functional
If x
=1=
487
0, then
PQ(x) :::; 2rllxll ,
since
Therefore, since the last inequality is clearly also valid if x = 0, and, as follows from (1),
IpQ(x) - pQ(y)1 :::; PQ(x - y), it is easily seen that PQ(x) is a continuous function of x on X. Items (4) and (5) are left to the reader. Finally, to verify (6), suppose that PQ(x) = O. Then there exists a sequence of points al 2:: a2 2:: ... decreasing to 0 such that ajI x E Q. Therefore, since Q is bounded, say Q ~ {x : IIxll :::; C}, the inequality Ilatxll :::; C implies that Ilxll :::; ajC for j = 1,2 ... and hence that x = 0.
o Exercise 22.14. Complete the proof of Lemma 22.23 by verifying items (4) and (5). Exercise 22.15. Show that in the setting of Lemma 22.23, PQ(x) x E Q and x E Q ==} PQ(x) :::; 1.
< 1 ==}
The proof of the next theorem that is presented below serves to illustrate the use of the Minkowski functional. The existence of a support hyperplane is already covered by Theorem 22.14.
Theorem 22.24. Let Q be a convex subset of lR n such that Br(q) c Q for some q E lR n and some r > O. Let v E lR nand U be a k-dimensional subspace of lR n such that 0 :::; k < n and the set V = v + U has no points in common with int Q. Then there exist a vector y E lR n and a constant c E lR. such that (x,y) = c if x E V and (x,y) < c if x E intQ. Proof. By an appropriate translation of the problem we may assume that q = 0, and hence that ¢ V. Thus, there exists a linear functional f on the vector space W = {av + U : a E lR.} such that
°
V = {w E W: f(w) = I}. The next step is to check that
(22.17)
f(x) :::; PQ(x)
for every vector
x E W.
Since PQ(x) 2:: 0, the inequality (22.17) is clearly valid if f(x) :::; O. On the other hand, if x E Wand f(x) = a > 0, then a-Ix E V and thus, as V n intQ = 0,
a> 0 ==}
~PQ(x) = PQ (~)
2:: 1 = f
(~) = ~f(x);
22. Convexity
488
i.e., f(x)
> 0 ===> f(x)
~ PQ(x) .
Thus, (22.17) is verified. Therefore, by the variant of the Hahn-Banach theorem discussed in Exercise 7.29, there exists a linear functional F on R n such that F(x) = f(x) for x E Wand F(x) ~ PQ(x) for every x ERn. Let H = {x ERn: F(x) = I}. Then F(x) ~ PQ(x) < 1 when x E int Q, whereas F(x) = 1 when x E V, since V ~ H. Thus, as F(x) = (x, y) for some vector y ERn, it follows that (x, y) < 1 when x E int Q and (x, y) = 1 when x E H. 0
22.11. The Gauss-Lucas theorem

Theorem 22.25. Let f(λ) = a_0 + a_1 λ + ⋯ + a_n λ^n be a polynomial of degree n ≥ 1 with coefficients a_i ∈ C for i = 0, …, n and a_n ≠ 0. Then the roots of the derivative f′(λ) lie in the convex hull of the roots of f(λ).

Proof. Let λ_1, …, λ_n denote the roots of f(λ), allowing repetitions as needed. Then

    f(λ) = a_n (λ − λ_1) ⋯ (λ − λ_n)

and

    f′(λ)/f(λ) = 1/(λ − λ_1) + ⋯ + 1/(λ − λ_n)
               = (λ̄ − λ̄_1)/|λ − λ_1|² + ⋯ + (λ̄ − λ̄_n)/|λ − λ_n|².

Thus, if f′(μ) = 0 and f(μ) ≠ 0, then

    μ {1/|μ − λ_1|² + ⋯ + 1/|μ − λ_n|²} = λ_1/|μ − λ_1|² + ⋯ + λ_n/|μ − λ_n|²,

which, upon setting

    t_j = (1/|μ − λ_j|²) / (1/|μ − λ_1|² + ⋯ + 1/|μ − λ_n|²)  for j = 1, …, n,

yields μ = t_1 λ_1 + ⋯ + t_n λ_n with t_j ≥ 0 and t_1 + ⋯ + t_n = 1. This completes the proof, since the conclusion for the case f′(μ) = f(μ) = 0 is self-evident. □
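As a quick numerical illustration of the proof (not part of the text), the following sketch recomputes each critical point μ of a sample polynomial as the convex combination Σ_j t_j λ_j with the weights t_j constructed above; the particular polynomial is an arbitrary choice.

```python
import numpy as np

# Coefficients of f(z) = z^4 - (3+2j) z^2 + 1j, highest degree first (numpy convention).
coeffs = np.array([1.0, 0.0, -(3 + 2j), 0.0, 1j])
roots = np.roots(coeffs)                 # the lambda_j
crit = np.roots(np.polyder(coeffs))      # roots of f'

for mu in crit:
    w = 1.0 / np.abs(mu - roots) ** 2    # weights 1/|mu - lambda_j|^2
    t = w / w.sum()                      # the convex coefficients t_j from the proof
    # mu is recovered as the convex combination sum_j t_j * lambda_j
    print(mu, np.allclose(mu, np.sum(t * roots)))
```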
22.12. The numerical range

Let A ∈ C^{n×n}. The set

    W(A) = {⟨Ax, x⟩ : x ∈ C^n and ||x|| = 1}

is called the numerical range of A. The objective of this section is to show that W(A) is a convex subset of C. We begin with a special case.
Lemma 22.26. Let B ∈ C^{n×n} and let x, y ∈ C^n be vectors with ||x|| = ||y|| = 1 such that ⟨Bx, x⟩ = 1 and ⟨By, y⟩ = 0. Then for each number λ in the interval 0 < λ < 1, there exists a vector v_λ ∈ C^n with ||v_λ|| = 1 such that ⟨Bv_λ, v_λ⟩ = λ.

Proof. Let

    u_t = tγx + (1 − t)y,

where |γ| = 1 and 0 ≤ t ≤ 1. Then

    ⟨Bu_t, u_t⟩ = t²⟨Bx, x⟩ + t(1 − t){γ⟨Bx, y⟩ + γ̄⟨By, x⟩} + (1 − t)²⟨By, y⟩
                = t² + t(1 − t){γ⟨Bx, y⟩ + γ̄⟨By, x⟩}.

The next step is to show that there exists a choice of γ such that

    γ⟨Bx, y⟩ + γ̄⟨By, x⟩

is a real number. To this end it is convenient to write B = C + iD in terms of its real and imaginary parts

    C = (B + B^H)/2  and  D = (B − B^H)/(2i).

Then, since C and D are both Hermitian matrices,

    γ⟨Bx, y⟩ + γ̄⟨By, x⟩ = γ⟨Cx, y⟩ + iγ⟨Dx, y⟩ + γ̄⟨Cy, x⟩ + iγ̄⟨Dy, x⟩
                          = γc + γ̄c̄ + i{γd + γ̄d̄},

where

    c = ⟨Cx, y⟩  and  d = ⟨Dx, y⟩

are both independent of t. Now, in order to eliminate the imaginary component, set

    γ = 1 if d = 0  and  γ = i|d|^{-1}d̄ if d ≠ 0.

Then, for this choice of γ,

    ⟨Bu_t, u_t⟩ = t² + t(1 − t)(γc + γ̄c̄).

Moreover, since ⟨Bx, x⟩ = 1 and ⟨By, y⟩ = 0, the vectors x and y are linearly independent. Thus,

    ||u_t||² = t² + t(1 − t){⟨γx, y⟩ + ⟨y, γx⟩} + (1 − t)² ≠ 0

for every choice of t in the interval 0 ≤ t ≤ 1. Therefore, the vector

    v_t = u_t/||u_t||

is a well-defined unit vector and

    ⟨Bv_t, v_t⟩ = [t² + t(1 − t){γc + γ̄c̄}] / [t² + t(1 − t){⟨γx, y⟩ + ⟨y, γx⟩} + (1 − t)²]

is a continuous real-valued function of t on the interval 0 ≤ t ≤ 1 such that

    ⟨Bv_0, v_0⟩ = 0  and  ⟨Bv_1, v_1⟩ = 1.

Therefore, the equation ⟨Bv_t, v_t⟩ = λ has at least one solution t in the interval 0 ≤ t ≤ 1 for every choice of λ in the interval 0 ≤ λ ≤ 1. □
Theorem 22.27. (Toeplitz-Hausdorff) The numerical range W(A) of a matrix A ∈ C^{n×n} is a convex subset of C.

Proof. The objective is to show that if ||x|| = ||y|| = 1 and if

    ⟨Ax, x⟩ = a  and  ⟨Ay, y⟩ = b,

then for each choice of the number t in the interval 0 ≤ t ≤ 1, there exists a vector u_t such that

    ||u_t|| = 1  and  ⟨Au_t, u_t⟩ = ta + (1 − t)b.

If a = b, then ta + (1 − t)b = a = b, and hence we can choose u_t = x or u_t = y. Suppose therefore that a ≠ b and let

    B = αA + βI_n,

where α, β are solutions of the system of equations

    aα + β = 1
    bα + β = 0.

Then

    ⟨Bx, x⟩ = α⟨Ax, x⟩ + β⟨x, x⟩ = aα + β = 1

and

    ⟨By, y⟩ = α⟨Ay, y⟩ + β⟨y, y⟩ = bα + β = 0.

Therefore, by Lemma 22.26, for each choice of t in the interval 0 ≤ t ≤ 1, there exists a vector w_t such that

    ||w_t|| = 1  and  ⟨Bw_t, w_t⟩ = t.

But this in turn is the same as to say

    α⟨Aw_t, w_t⟩ + β⟨w_t, w_t⟩ = t = t(aα + β) + (1 − t)(bα + β) = α{ta + (1 − t)b} + β.

Thus, as ⟨w_t, w_t⟩ = 1 and α ≠ 0,

    ⟨Aw_t, w_t⟩ = ta + (1 − t)b,

as claimed. □
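The following sketch (not part of the text) samples the numerical range with random unit vectors; it gives only a rough Monte-Carlo picture of W(A), and the example matrices anticipate the comparison with σ(A) made in the next section.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_numerical_range(A, m=5000):
    """Return m sample values x^H A x with x drawn from the unit sphere of C^n."""
    n = A.shape[0]
    X = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return np.einsum('ki,ij,kj->k', X.conj(), A, X)

# Non-normal example from Section 22.13: A = [[0, 1], [0, 0]] has sigma(A) = {0},
# but its numerical range is the closed disc of radius 1/2.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
print(np.max(np.abs(sample_numerical_range(A))))          # approaches 0.5

# For a normal matrix the samples stay inside the convex hull of the eigenvalues.
N = np.diag([1.0, 1j, -1.0])
print(np.max(np.abs(sample_numerical_range(N))))          # at most 1
```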
22.13. Eigenvalues versus numerical range

The eigenvalues of a matrix A ∈ C^{n×n} clearly belong to the numerical range W(A) of A, i.e., σ(A) ⊆ W(A). Therefore,

(22.18)    conv σ(A) ⊆ W(A)  for every A ∈ C^{n×n},

since W(A) is convex. In general, however, these two sets can be quite different. If

    A = [ 0  1 ]
        [ 0  0 ],

for example, then

    σ(A) = {0}  and  W(A) = {āb : a, b ∈ C and |a|² + |b|² = 1}.

The situation for normal matrices is markedly different:
Theorem 22.28. Let A ∈ C^{n×n} be a normal matrix, i.e., AA^H = A^H A. Then the convex hull of σ(A) is equal to the numerical range of A, i.e., conv σ(A) = W(A).

Proof. Since A is normal, it is unitarily equivalent to a diagonal matrix; i.e., there exists a unitary matrix U ∈ C^{n×n} such that

    U^H A U = diag{λ_1, …, λ_n}.

The columns u_1, …, u_n of U form an orthonormal basis for C^n. Thus, if x ∈ C^n and ||x|| = 1, then

    x = Σ_{i=1}^n c_i u_i

is a linear combination of u_1, …, u_n,

    ⟨Ax, x⟩ = ⟨A Σ_{i=1}^n c_i u_i, Σ_{j=1}^n c_j u_j⟩ = Σ_{i,j=1}^n λ_i c_i c̄_j ⟨u_i, u_j⟩ = Σ_{i=1}^n λ_i |c_i|²

and

    Σ_{i=1}^n |c_i|² = ||x||² = 1.

Therefore, W(A) ⊆ conv(σ(A)) and hence, as the opposite inclusion (22.18) is already known to be in force, the proof is complete. □

Exercise 22.16. Verify the inclusion conv(σ(A)) ⊆ W(A) for normal matrices A ∈ C^{n×n} by checking directly that every convex combination Σ_{i=1}^n t_i λ_i of the eigenvalues λ_1, …, λ_n of A belongs to W(A). [HINT: Σ_{i=1}^n t_i λ_i = Σ_{i=1}^n t_i ⟨Au_i, u_i⟩ = ⟨A Σ_{i=1}^n √t_i u_i, Σ_{j=1}^n √t_j u_j⟩.]
Exercise 22.17. Find the numerical range of the matrix
[~ ~ ~].
22.14. The Heinz inequality

Lemma 22.29. Let A = A^H ∈ C^{p×p}, B = B^H ∈ C^{q×q} and Q ∈ C^{p×q}. Then the following inequalities hold under the extra conditions indicated in each item.

(1) If also p = q, Q = Q^H and A is invertible, then

(22.19)    2||Q|| ≤ ||AQA^{-1} + A^{-1}QA||.

(2) If also p = q and A is invertible, then (22.19) is still in force.

(3) If A and B are invertible, then

(22.20)    2||Q|| ≤ ||AQB^{-1} + A^{-1}QB||.

(4) If A ⪰ 0 and B ⪰ 0, then

(22.21)    2||AQB|| ≤ ||A²Q + QB²||.

Proof. Let λ ∈ σ(Q); since Q = Q^H, λ is real. Then λ ∈ σ(A^{-1}QA), and hence there exists a unit vector x ∈ C^p such that A^{-1}QAx = λx. Consequently,

    λ = ⟨A^{-1}QAx, x⟩  and  λ = λ̄ = ⟨x, A^{-1}QAx⟩ = ⟨AQA^{-1}x, x⟩,

since A and Q are Hermitian. Therefore,

    2|λ| = |⟨(AQA^{-1} + A^{-1}QA)x, x⟩| ≤ ||AQA^{-1} + A^{-1}QA||,

which leads easily to the inequality (22.19), because ||Q|| = max{|λ| : λ ∈ σ(Q)} when Q = Q^H.

To extend (1) to matrices Q ∈ C^{p×p} that are not necessarily Hermitian, apply (1) to the matrices

    Q̃ = [ 0    Q ]        and    Ã = [ A  0 ]
         [ Q^H  0 ]                    [ 0  A ].

This leads easily to (2), since

    ||Q̃|| = ||Q||  and  ||ÃQ̃Ã^{-1} + Ã^{-1}Q̃Ã|| = ||AQA^{-1} + A^{-1}QA||.

Next, (3) follows from (2) by setting

    Q̃ = [ 0  Q ]        and    Ã = [ A  0 ]
         [ 0  0 ]                    [ 0  B ].

Finally, to obtain (4), let A_ε = A + εI_p and B_ε = B + εI_q with ε > 0. Then, since A_ε and B_ε are invertible Hermitian matrices, we can invoke (22.20) (with Q replaced by A_εQB_ε) to obtain the inequality

    2||A_εQB_ε|| ≤ ||A_ε(A_εQB_ε)B_ε^{-1} + A_ε^{-1}(A_εQB_ε)B_ε|| = ||A_ε²Q + QB_ε²||,

which tends to the asserted inequality as ε ↓ 0. □

Theorem 22.30. (Heinz) Let A ∈ C^{p×p}, Q ∈ C^{p×q}, B ∈ C^{q×q} and suppose that A ⪰ 0 and B ⪰ 0. Then

(22.22)    ||A^t Q B^{1−t} + A^{1−t} Q B^t|| ≤ ||AQ + QB||  for 0 ≤ t ≤ 1.

Proof. Let f(t) = ||A^t Q B^{1−t} + A^{1−t} Q B^t||, let 0 ≤ a < b ≤ 1 and set c = (a + b)/2 and d = b − c. Then, as c = a + d and 1 − c = 1 − b + d,

    f(c) = ||A^c Q B^{1−c} + A^{1−c} Q B^c|| = ||A^d (A^a Q B^{1−b} + A^{1−b} Q B^a) B^d||
         ≤ (1/2) ||A^{2d} (A^a Q B^{1−b} + A^{1−b} Q B^a) + (A^a Q B^{1−b} + A^{1−b} Q B^a) B^{2d}||
         = (1/2) ||A^b Q B^{1−b} + A^{1−a} Q B^a + A^a Q B^{1−a} + A^{1−b} Q B^b||
         ≤ (f(a) + f(b))/2,

where the first inequality is (22.21); i.e., f(t) is a convex function on the interval 0 ≤ t ≤ 1. Thus, as the upper bound in formula (22.22) is equal to f(0) = f(1) and f(t) is continuous on the interval 0 ≤ t ≤ 1, it is readily seen that f(t) ≤ f(0) for every point t in the interval 0 ≤ t ≤ 1. □
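A small numerical check of (22.22) can be run along the following lines (not part of the text); fractional powers of positive semidefinite matrices are formed here via an eigendecomposition, and the helper names psd_power and opnorm are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def psd_power(M, t):
    """M^t for a positive semidefinite Hermitian matrix M, via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)            # guard against tiny negative round-off
    return (V * w**t) @ V.conj().T

def opnorm(M):
    return np.linalg.norm(M, 2)          # largest singular value

p, q = 4, 3
X = rng.standard_normal((p, p)); A = X @ X.T      # A is positive semidefinite
Y = rng.standard_normal((q, q)); B = Y @ Y.T      # B is positive semidefinite
Q = rng.standard_normal((p, q))

rhs = opnorm(A @ Q + Q @ B)
for t in np.linspace(0.0, 1.0, 11):
    lhs = opnorm(psd_power(A, t) @ Q @ psd_power(B, 1 - t)
                 + psd_power(A, 1 - t) @ Q @ psd_power(B, t))
    assert lhs <= rhs + 1e-9             # the Heinz inequality (22.22)
print("Heinz inequality verified on this sample:", rhs)
```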
Theorem 22.31. Let A ∈ C^{p×p}, B ∈ C^{p×p} and suppose that A ⪰ 0 and B ⪰ 0. Then

(22.23)    ||A^s B^s|| ≤ ||AB||^s  for 0 ≤ s ≤ 1.

Proof. Let Q = {u : 0 ≤ u ≤ 1 for which (22.23) is in force} and let s and t be a pair of points in Q. Then, with the help of the auxiliary inequality

    ||A^{(s+t)/2} B^{(s+t)/2}||² = ||B^{(s+t)/2} A^{s+t} B^{(s+t)/2}||
                                 = r_σ(B^{(s+t)/2} A^{s+t} B^{(s+t)/2})
                                 = r_σ(B^s A^{s+t} B^t) ≤ ||B^s A^{s+t} B^t||,

it is readily checked that Q is convex. The proof is easily completed, since 0 ∈ Q and 1 ∈ Q. □

Exercise 22.18. Show that if A and B are as in Theorem 22.31, then φ(s) = ||A^s B^s||^{1/s} is an increasing function of s for s > 0.
22.15. Bibliographical notes

A number of the results stated in this chapter can be strengthened. The monograph by Webster [71] is an eminently readable source of supplementary information on convexity in R^n. Exercise 22.11 was taken from [71]. Applications of convexity to optimization may be found in [10] and the references cited therein. The proof of Theorem 22.14 is adapted from the expository paper [11]; the proof of Theorem 22.24 is adapted from [48]. The presented proof of the Krein-Milman theorem, which works in more general settings (with the convex hull replaced by the closure of the convex hull), is adapted from [73]. The presented proof of the convexity of the numerical range is based on an argument that is sketched briefly in [36]; Halmos credits it to C. W. R. de Boor. The presented proof also works for bounded operators in Hilbert space; see McIntosh [50] for another very attractive approach. The proof of the Heinz inequality is taken from a beautiful short paper [32] that establishes the equivalence of the inequalities in (1)-(4) of Lemma 22.29 with the Heinz inequality (22.22) for bounded operators in Hilbert space and sketches the history. The elegant passage from (22.21) to (22.22) is credited there to an unpublished paper of A. McIntosh. The proof of Theorem 22.31 is adapted from [33].

The notion of convexity can be extended to matrix-valued functions: a function f that maps a convex set Q of symmetric p × p matrices into a set of symmetric q × q matrices is said to be convex if

    f(tX + (1 − t)Y) ⪯ t f(X) + (1 − t) f(Y)

for every t ∈ (0, 1) when X and Y belong to Q. Thus, for example, for the function f(X) = X^r on the set Q = {X ∈ R^{n×n} : X ⪰ 0}, f is convex if 1 ≤ r ≤ 2 or −1 ≤ r ≤ 0, and −f is convex if 0 ≤ r ≤ 1; see [3].
Chapter 23
Matrices with nonnegative entries
Be wary of writing many books, there is no end, and much study is wearisome to the flesh. Ecclesiastes 12:12
Matrices with nonnegative entries play an important role in numerous applications. This chapter is devoted to the study of some of their special properties.

A rectangular matrix A ∈ R^{n×m} with entries a_ij, i = 1, …, n, j = 1, …, m, is said to be nonnegative if a_ij ≥ 0 for i = 1, …, n and j = 1, …, m; A is said to be positive if a_ij > 0 for i = 1, …, n and j = 1, …, m. The notations A ≥ 0 and A > 0 are used to designate nonnegative matrices A and positive matrices A, respectively. Note the distinction with the notation A ≻ 0 for positive definite and A ⪰ 0 for positive semidefinite matrices that was introduced earlier. The symbols A ≥ B and A > B will be used to indicate that A − B ≥ 0 and A − B > 0, respectively.

A nonnegative square matrix A ∈ R^{n×n} is said to be irreducible if for every pair of indices i, j ∈ {1, …, n} there exists an integer k ≥ 1 such that the ij entry of A^k is positive; i.e., in terms of the standard basis e_i, i = 1, …, n, of R^n, if

    ⟨A^k e_j, e_i⟩ > 0

for some positive integer k that may depend upon i and j. This is less restrictive than assuming that there exists an integer k ≥ 1 such that A^k > 0.
Exercise 23.1. Show that the matrix

    A = [ 0  1 ]
        [ 1  0 ]

is a nonnegative irreducible matrix, but that A^k is never a positive matrix.

Exercise 23.2. Show that the matrix A = [~ ~] is irreducible, but the matrix B = [~ ~] is not irreducible and, more generally, that every triangular nonnegative matrix is not irreducible.

Lemma 23.1. Let A ∈ R^{n×n} be a nonnegative irreducible matrix. Then (I_n + A)^{n−1} is positive.
Proof. Suppose to the contrary that the ij entry of (I_n + A)^{n−1} is equal to zero for some choice of i and j. Then, in view of the formula

    (I_n + A)^{n−1} = Σ_{k=0}^{n−1} C(n−1, k) A^{n−1−k} = Σ_{k=0}^{n−1} C(n−1, k) A^k

and the fact that every term in these sums is nonnegative, it follows that

    ⟨A^k e_j, e_i⟩ = 0  for k = 0, …, n − 1.

Therefore, by the Cayley-Hamilton theorem (which exhibits A^k for k ≥ n as a linear combination of I_n, A, …, A^{n−1}),

    ⟨A^k e_j, e_i⟩ = 0  for k = 0, 1, …,

which contradicts the assumed irreducibility. □
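Lemma 23.1 suggests a convenient numerical test for irreducibility: for n ≥ 2 the converse implication also holds, since a positive (I + A)^{n−1} produces, for every index pair, some power of A with a positive entry there. The sketch below is not part of the text, and the sample matrices merely echo the spirit of Exercises 23.1 and 23.2.

```python
import numpy as np

def is_irreducible(A):
    """Test irreducibility of a nonnegative square matrix A (n >= 2):
    A is irreducible iff every entry of (I + A)^(n-1) is positive."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + A, n - 1)
    return bool(np.all(M > 0))

print(is_irreducible([[0, 1], [1, 0]]))      # True: cyclic, as in Exercise 23.1
print(is_irreducible([[1, 1], [0, 1]]))      # False: triangular, as in Exercise 23.2
```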
Exercise 23.3. Let A ∈ R^{n×n} have nonnegative entries and let D ∈ R^{n×n} be a diagonal matrix with strictly positive diagonal entries. Show that

    A is irreducible ⟺ AD is irreducible ⟺ DA is irreducible.
23.1. Perron-Frobenius theory

The main objective of this section is to show that if A is a square nonnegative irreducible matrix, then r_σ(A), the spectral radius of A, is an eigenvalue of A with algebraic multiplicity equal to one and that it is possible to choose a corresponding eigenvector u to have strictly positive entries.

Theorem 23.2. (Perron-Frobenius) Let A ∈ R^{n×n} be a nonnegative irreducible matrix. Then A ≠ O_{n×n} and:

(1) r_σ(A) ∈ σ(A).
(2) There exists a positive vector u ∈ R^n such that Au = r_σ(A)u.
(3) There exists a positive vector v ∈ R^n such that A^T v = r_σ(A)v.
(4) The algebraic multiplicity of r_σ(A) as an eigenvalue of A is equal to one.

The proof of this theorem is divided into a number of lemmas. Let

    C = {x ∈ R^n : x ≥ 0 and x ≠ 0},   B = {x ∈ R^n : ||x||_2 = 1}   and   C_A = {(I_n + A)^{n−1}x : x ∈ C}

and, for x ∈ C, let

    δ_A(x) = min_i {⟨Ax, e_i⟩ / ⟨x, e_i⟩ : ⟨x, e_i⟩ > 0}.
Lemma 23.3. Let A ∈ R^{n×n} be a nonnegative irreducible matrix and let x ∈ C ∩ B be such that Ax − δ_A(x)x ≠ 0. Then

    y = (I_n + A)^{n−1}x / ||(I_n + A)^{n−1}x||_2

is a positive vector that belongs to C_A ∩ B and δ_A(y) > δ_A(x).

Proof. Clearly, Ax − δ_A(x)x ≥ 0 for every x ∈ C, and so too for every x ∈ C ∩ B. Moreover, if Ax − δ_A(x)x ≠ 0, then (I_n + A)^{n−1}(Ax − δ_A(x)x) > 0. Consequently, the vector y defined above belongs to C_A ∩ B and, since y > 0 and Ay − δ_A(x)y > 0, the inequality

    Σ_{j=1}^n a_ij y_j > δ_A(x) y_i

is in force for i = 1, …, n. Therefore δ_A(y) > δ_A(x), as asserted. □
Exercise 23.4. Show that if A ∈ R^{n×n} is a nonnegative irreducible matrix, then δ_A(x) ≤ ||A|| for every x ∈ C ∩ B.

Exercise 23.5. Show that if A ∈ R^{n×n} is a nonnegative irreducible matrix, then δ_A(x) ≤ max_i {Σ_{j=1}^n a_ij}.

Exercise 23.6. Evaluate δ_A(x) for the matrix A = [~ ~] and the vectors x = [~ c]^T for c ≥ 0.
Exercise 23.7. Let A E lR nxn be a nonnegative irreducible matrix. Show that 8A is continuous on CAn B but is not necessarily continuous on the set C n B. [HINT: Exercise 23.6 serves to illustrate the second assertion.] Lemma 23.4. Let A E lR nxn be a nonnegative irreducible matrix. Then there exists a vector Y E CAn B such that
O'A{Y) Proof.
~
O'A(x)
for every x E C n B.
Let
PA = sup{O'A(x) : x E C n B} . Then, by the definition of supremum, there exists a sequence Xl, X2, . .. of vectors in C n B such that 5A (Xj) -+ PA as j i 00. Moreover, by passing to a subsequence if need be, there is no loss of generality in assuming that O'A(XI) ~ O'A(x2) ~ ... and that Xj -+ x, as j i 00. Let (In + A)n-l Xj . (In + A)n-Ix Yj = II (In + A)n-1xjIl2 for J = 1,2,... and Y = II (In + A)n- I xI12 . Then it is left to the reader to check that: (a) The vectors Y and YI,y2"" all belong to CA (b) Yj -+ y, as j i 00. (c) OA(Yj) -+ OA(Y), as j i 00.
n B.
Therefore, the vector Y exhibits all the advertised properties.
o
Exercise 23.8. Complete the proof of Lemma 23.4 by justifying assertions (a), (b) and (c). [mNT: Exploit Exercise 23.7 for part (c).] Lemma 23.5. Let A E lR nxn be a nonnegative irreducible matrix; then PA is an eigenvalue of A. Moreover, if u E C n B is such that OA(U) ~ O'A(x) for every x E C n B, then Au = O'A(u)u and u E CA n B.
Proof.
By the definition of O'A(u),
Au - O'A(u)u
~
0 .
Moreover, if Au - OA(U)U # 0 and (In + A)n-Iu Y = II (In + A)n- I uI12 ' by Lemma 23.3. But this contradicts the presumed maximality of O'A(u). Therefore Au = OA(U)U, The last equality implies further that
+ A)n-Iu = (1 + OA(U))n-l u and hence, as 1 + O'A(u) > 0, that u E CA n B. (In
D
Lemma 23.6. Let A E lR nxn be a nonnegative irreducible matrix. Then PA = ru(A), the spectral radius of A.
23.1. Perron-Frobenius theory
499
Proof. Let.A E u(A). Then there exists a nonzero vector x E en such that .Ax = Ax; i.e., n
.Axi = L aijXj
for
i = 1, ... ,n.
j=l
Therefore, since aij
~
0, n
n
aijXj ~ L aijlxjl
1.Allxil = L j=l
for
i = 1, ... ,n,
j=1
which in turn implies that I.AI
~
1 IXil
n
~ aijlxjl
if
IXil
#0.
Thus, the vector v with components Vi = IXil belongs to C and meets the inequality
IAI
~ 8A(V) ~ PA .
Since the inequality I.AI ~ PA is valid for all eigenvalues of A, it follows that ru(A) ~ PA. However, since PA has already been shown to be an eigenvalue 0 of A, equality must prevail: ru(A) = PA.
Lemma 23.7. Let A E jRnxn be a nonnegative irreducible matrix and let x be a nonzero vector in en such that Ax = ru(A)x. Then there exists a number c E e such that CXi > 0 for every entry Xi in the vector x and ex E CA. Let x be a nonzero vector in en such that Ax = r u (A)x and let v E jR n be the vector with components Vi = IXi I, for i = 1, ... ,n. Then v E C and
Proof.
n
ru(A)vi
n
= ru(A)lxil = LaijXj ~ LaijVj , i = 1, ... ,n, j=l
j=1
i.e., the vector Av - ru(A)v is nonnegative. But this in turn implies that either Av - ru(A)v = 0 or (In + A)n-I(Av - ru(A)v) > O. The second possibility leads to a contradiction of the fact that ru(A) = PA. Therefore, equality must prevail. Moreover, in this case, the subsequent equality
(In
+ At-Iv = "(v
with "(
= (1 + ru(A)t- 1
implies that v E CA and hence that Vi > 0 for i = 1, ... ,n. Furthermore, the two formulas Ax = ru(A)x and Av = ru(A)v imply that Akx = ru(A)kx and Akv = ru(A)kv for k = 0,1, .... Therefore, x and v are eigenvectors of the matrix B = (In + A)n-I, corresponding to the eigenvalue "(; i.e.,
Bx = l'x and Bv = l'v. Then, since Vi = that n
n
j=l
j=l
lXii,
L bijlxjl = 1'Ixil = L bijxj
the last two formulas imply
for
i
= 1, ...
,n.
Moreover, since bij > 0 for all entries in the matrix B, and IXjl > 0 for = 1, ... ,n, since v E CA, the numbers Cj = bijxj, j = 1, ... ,n, are all nonzero and satisfy the constraint
j
ICI + ... +enl = hi + hi + ... + len I . But this is only possible if Cj = ICjleiO for j = 1, ... ,n for some fixed (), i.e., if and only if Xj = IXjleiO for j = 1, ... ,n. Therefore, the number c = e- iO fulfills the assertion of the lemma: ex E CA. D Exercise 23.9. Show that if the complex numbers Cl, ... , en E e\ {O}, then ICI + ... + enl = ICll + IC21 + ... + lenl if and only if there exists a number () E [0,211") such that Cj = eiOlcj I for j = 1, ... ,n. Lemma 23.8. Let A E lR nxn be a nonnegative irreducible matrix. Then the geometric multiplicity of ru(A) as an eigenvalue of A is equal to one, i.e., dimN(ru(A)In-A) = 1.
en
Proof. Let u and v be any two nonzero vectors in such that Au = ru(A)u and Av = Tu(A)v. Then the entries Ul, ... ,Un of u and the entries VI, ••• ,Vn of v are all nonzero and VI u - Ul V is also in the null space of the matrix A-Tu{A)In. Thus, by the preceding lemma, either VlU-UlV = 0 or there exists a number C E e such that C(VlUj - UlVj) > 0 for j = 1, ... ,n. However, the second situation is clearly impossible since the first entry in the vector C( VI U - UI v) is equal to zero. Thus, u and v are linearly dependent. D Lemma 23.9. Let A E lR nxn and B E lR nxn be nonnegative matrices such that A is irreducible and A - B is nonnegative (i. e., A 2:: B 2:: 0). Then: (1) Tu(A) 2:: Tu(B) . .(2) Tu(A) = Tu(B)
{=:}
A = B.
Proof. Let (3 E u(B) with I,BI = ru(B), let By = f3y for some nonzero and let v E lR n be the vector with Vi = IYi I for i = 1, ... ,n. vector y E Then, in the usual notation,
en
n
f3Yi = LbijYj
j=l
for
i
= 1, ... ,n
and so (23.1)
n n n n Tu(B)Vi = 1,6IIYil = L bijYj ~ L bijlYjl ~ L aijlYj\ = L aijVj. j=l j=l j=l j=l Therefore, since v E C, this implies that Tu(B)
~
8A(V)
and hence that Tu(B) ~ Tu(A). Suppose next that Tu(A) = Tu(B). Then the inequality (23.1) implies that Av - Tu(A)v ~ o. But this forces Av - Tu(A)v = 0 because otherwise Lemma 23.6 yields a contradiction to the already established inequality Tu(A) ~ 8A(U) for every U E C. But this in turn implies that n
L(aij - bij)vj = 0 for i = 1, ... ,n i=l and thus, as aij - bij ~ 0 and Vj > 0, we must have aij = bij for every choice of i, j E {I, ... ,n}, i.e., Tu(A) = Tu(B) ===> A = B. The other direction is self-evident. 0 Lemma 23.10. Let A E IR nxn be a nonnegative irreducible matrix. Then the algebraic multiplicity of Tu(A) as an eigenvalue of A is equal to one.
Proof. It suffices to show that T u (A) is a simple root of the characteristic polynomial cp('x) = det('xln - A) of the matrix A. Let
Cll.('x) ... [ C('x) = : Cnl('x)
C1n.('x)
1
:'
Cnn('x)
where
Cij('x) = (-l)i+j('xln - A)ji and ('xln - A)ji denotes the determinant of the (n -1) x (n -1) matrix that is obtained from 'xln - A by deleting the j'th row and the i'th column of 'xln - A. Then, as ('xln - A)C('x)
= C('x)('xln -
A)
= cp('x)In,
it follows that
(Tu(A)In - A)C(T/T(A)) = Onxn and hence that each nonzero column of the matrix C(ru(A)) is an eigenvector of A corresponding to the eigenvalue ru(A). Therefore, in view of Lemma 23.7, each column of C(r/T(A)) is a constant multiple of the unique
vector u E CA n B such that Au = ru(A)u. Next, upon differentiating the formula C(A)(Aln - A) = CP(A)In with respect to A, we obtain
C'(A)(Aln - A) + C(A) = cp'(A)In and, consequently,
C(ru(A))u = cp'(ru(A))u . Thus, in order to prove that cp'(ru(A)) =1= 0, it suffices to show that at least one entry in the vector C(ru(A))u is not equal to zero. However, since AT is also a nonnegative irreducible matrix and ru(A) = ru(AT), much the same sort of analysis leads to the auxiliary conclusion that there exists a unique vector v E CAT n B such that ATv = ru(A)v and consequently that each column of C(ru(A))T is a constant multiple of v. But this is the same as to say that each row of C(ru(A)) is a constant multiple of v T . Thus, in order to show that the bottom entry in C(ru(A))u is not equal to zero, it suffices to show that Cnn(ru{A)) =1= O. By definition,
Cnn{ru(A)) = det(ru(A)In- 1 -.-4)
,
where A is the (n -1) x (n - 1) matrix that is obtained from A by deleting its n'th row and its n'th column. Let
A
B = [
O(n-l)xl ]
.
0
Olx(n-l)
Then clearly B ~ 0 and A - B ~ O. Moreover, A - B =1= 0, since A is irreducible and B is not. Thus, by Lemma 23.9, ru(A) > ru(B) and consequently, ru{A)In - B is invertible. But this in turn implies that
ru(A)cnn(ru(A)) = det(ru(A)In - B) and hence that Cnn{ru(A))
=1=
=1=
0,
0, as needed to complete the proof.
0
Exercise 23.10. Show that if A E lR nxn is a nonnegative irreducible matrix, then, in terms of the notation used in the proof of Lemma 23.10,
    C(r_σ(A)) = γ u v^T  for some γ > 0.

The proof of Theorem 23.2 is an easy consequence of the preceding lemmas. Under additional assumptions one can show more:

Theorem 23.11. Let A ∈ R^{n×n} be a nonnegative irreducible matrix such that A^k is positive for some integer k ≥ 1, and let λ ∈ σ(A). Then

    λ ≠ r_σ(A) ⟹ |λ| < r_σ(A).

Exercise 23.11. Prove Theorem 23.11.
Exercise 23.12. Let A ∈ R^{n×n} be a nonnegative irreducible matrix with spectral radius r_σ(A) = 1. Let B = A − xy^T, where x = Ax and y = A^T y are positive eigenvectors of the matrices A and A^T, respectively, corresponding to the eigenvalue 1 and normalized so that y^T x = 1. Show that:

(a) σ(B) ⊆ σ(A) ∪ {0}, but 1 ∉ σ(B).
(b) lim_{N→∞} (1/N) Σ_{k=1}^N B^k = 0.
(c) B^k = A^k − xy^T for k = 1, 2, ….
(d) lim_{N→∞} (1/N) Σ_{k=1}^N A^k = xy^T.

[HINT: If r_σ(B) < 1, then it is readily checked that B^k → 0 as k → ∞. However, if r_σ(B) = 1, then B may have complex eigenvalues of the form e^{iθ} and a more careful analysis is required that exploits the fact that lim_{N→∞} (1/N) Σ_{k=1}^N e^{ikθ} = 0 if e^{iθ} ≠ 1.]
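The following sketch (not part of the text) illustrates Theorem 23.2 and Exercise 23.12(d) numerically: power iteration produces the positive Perron vectors, and the Cesàro averages of A^k converge to xy^T. The rescaling of A so that r_σ(A) = 1 and the helper name power_iteration are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# A random positive (hence irreducible) matrix, rescaled so that r_sigma(A) = 1.
A = rng.random((4, 4)) + 0.1
A /= np.max(np.abs(np.linalg.eigvals(A)))

def power_iteration(M, iters=200):
    x = np.ones(M.shape[0])
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)
    return x

x = power_iteration(A)               # right Perron vector
y = power_iteration(A.T)             # left Perron vector
x, y = np.abs(x), np.abs(y)
y /= y @ x                           # normalize so that y^T x = 1

# Cesaro averages (1/N) sum_{k=1}^N A^k converge to x y^T (Exercise 23.12(d)).
N = 2000
S = np.zeros_like(A); P = np.eye(A.shape[0])
for _ in range(N):
    P = P @ A
    S += P
print(np.max(np.abs(S / N - np.outer(x, y))))   # small
```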
23.2. Stochastic matrices

A nonnegative matrix P ∈ R^{n×n} with entries p_ij, i, j = 1, …, n, is said to be a stochastic matrix if

(23.2)    Σ_{j=1}^n p_ij = 1  for every i ∈ {1, …, n}.

Stochastic matrices play a prominent role in the theory of Markov chains with a finite number of states.
Exercise 23.13. Let P ∈ R^{n×n} be a stochastic matrix and let e_i denote the i'th column of I_n for i = 1, …, n. Show that e_i^T P^k e_j ≤ 1 for k = 1, 2, …. [HINT: Justify and exploit the formula e_i^T P^{k+1} e_j = Σ_{s=1}^n e_i^T P e_s e_s^T P^k e_j.]

Exercise 23.14. Show that the spectral radius r_σ(P) of a stochastic matrix P is equal to one. [HINT: Invoke the bounds established in Exercise 23.13 to justify the inequality r_σ(P) ≤ 1.]

Exercise 23.15. Show that if P ∈ R^{n×n} is an irreducible stochastic matrix with entries p_ij for i, j = 1, …, n, then there exists a positive vector u ∈ R^n with entries u_i for i = 1, …, n such that u_j = Σ_{i=1}^n u_i p_ij for j = 1, …, n. [HINT: Exploit Theorem 23.2 and Exercises 23.13 and 23.14.]

Exercise 23.16. Show that the matrix

    P = [ 1/2   0   1/2 ]
        [ 1/4  1/2  1/4 ]
        [ 1/8  3/8  1/2 ]

is an irreducible stochastic matrix and find a positive vector u ∈ R^3 that meets the conditions discussed in Exercise 23.15.
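For the matrix of Exercise 23.16, such a vector u can be computed as a left eigenvector of P for the eigenvalue 1; the following lines (not part of the text) carry this out with numpy.

```python
import numpy as np

P = np.array([[1/2, 0,   1/2],
              [1/4, 1/2, 1/4],
              [1/8, 3/8, 1/2]])

# Irreducibility check in the spirit of Lemma 23.1, and the row sums of a stochastic matrix.
n = P.shape[0]
print(np.all(np.linalg.matrix_power(np.eye(n) + P, n - 1) > 0))   # True
print(P.sum(axis=1))                                               # all ones

# A positive u with u_j = sum_i u_i p_ij, i.e. a left eigenvector of P for the eigenvalue 1.
w, V = np.linalg.eig(P.T)
u = np.real(V[:, np.argmin(np.abs(w - 1))])
u = u / u.sum()                       # normalize; the entries are positive by Theorem 23.2
print(u, u @ P - u)                   # u @ P - u is (numerically) zero
```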
23.3. Doubly stochastic matrices

A nonnegative matrix P ∈ R^{n×n} is said to be a doubly stochastic matrix if both P and P^T are stochastic matrices, i.e., if (23.2) and

(23.3)    Σ_{i=1}^n p_ij = 1  for every j ∈ {1, …, n}

are both in force. The main objective of this section is to establish a theorem of Birkhoff and von Neumann that states that every doubly stochastic matrix is a convex combination of permutation matrices. It turns out that the notion of permanents is a convenient tool for obtaining this result.
If A ∈ C^{n×n}, the permanent of A, abbreviated per(A) or per A, is defined by the rule

(23.4)    per(A) = Σ_{σ ∈ Σ_n} a_{1σ(1)} ⋯ a_{nσ(n)},

where the summation is taken over the set Σ_n of all n! permutations σ of the integers {1, …, n}. This differs from the formula (5.2) for det A because the term d(P_σ) is replaced by the number one. There is also a formula for computing per A that is analogous to the formula for computing determinants by expanding by minors:

(23.5)    per A = Σ_{j=1}^n a_ij per A^{(ij)}  for each choice of i,

where A^{(ij)} denotes the (n − 1) × (n − 1) matrix that is obtained from A by deleting the i'th row and the j'th column.
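For small matrices the permanent can be evaluated directly from (23.4) or recursively from (23.5); the sketch below (not part of the text) does both, with the function names chosen here only for illustration.

```python
import numpy as np
from itertools import permutations

def permanent(A):
    """per(A) by the defining sum (23.4); fine for small n."""
    A = np.asarray(A)
    n = A.shape[0]
    return sum(np.prod([A[i, s[i]] for i in range(n)]) for s in permutations(range(n)))

def permanent_expansion(A, i=0):
    """per(A) by expansion along row i, as in formula (23.5)."""
    A = np.asarray(A)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0
    for j in range(n):
        minor = A[np.ix_([r for r in range(n) if r != i],
                         [c for c in range(n) if c != j])]
        total += A[i, j] * permanent_expansion(minor)
    return total

A = np.array([[1., 2., 0.], [0., 1., 1.], [1., 1., 1.]])
print(permanent(A), permanent_expansion(A))     # both give the same value (here 4.0)
```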
Exercise 23.17. Show that if A E ]Fnxn, then (i) per A is a multilinear functional of the rows of Aj (ii) per PA = per A = per AP for every n x n permutation matrix Pi and (iii) per In = 1. Exercise 23.18. Let A =
[~ ~], where B and D are square matrices.
Show that per A
= per B . per D .
Exercise 23.19. Let A E 1R 2x2 be a nonnegative matrix. Show that per A = o if and only if A contains a 1 x 2 sub matrix of zeros or a 2 x 1 submatrix of zeros. Exercise 23.20. Let A E 1R 3x3 be a nonnegative matrix. Show that per A = o if and only if A contains an r x s submatrix of zeros where r + s = 3 + 1. Lemma 23.12. Let A E IR nxn be a nonnegative matrix. Then per A = 0 if and only if there exists an r x s zero submatrix of A with r + s = n + 1.
Proof. Suppose first that per A = O. Then, by Exercise 23.19, the claim is true for 2 x 2 matrices A. Suppose that in fact the assertion is true for k x k matrices when k < n and let A E R nxn. Then formula (23.5) implies that aijper A(ij} = 0 for every choice of i, j = 1, ... ,n. If aij = 0, for j = 1, ... , n and some i, then A has an n x 1 submatrix of zeros. If aij i= 0 for some j, then per A(ij} = 0 and so by the induction assumption A(ij} has an r x s submatrix of zeros with r + s = n. By permuting rows and columns we can assume Orxs is the upper right-hand block of A, i.e., P I AP2 =
[~ ~]
for some pair of permutation matrices PI and P2. Thus, as per B per D
= per (PI AP2) = per A = 0,
it follows that either per B = 0 or per D = O. Suppose, for the sake of definiteness, that per B = o. Then, since B E R rxr and r < n, the induction assumption guarantees the existence of an i x j submatrix of zeros in B with i + j = r + 1. Therefore, by permuting columns we obtain a zero submatrix of A of size i x (j + s). This fits the assertion, since
i+j+s=r+l+s=n+l. The other cases are handled similarly. To establish the converse, suppose now that A has an r x s submatrix of zeros with r + s = n + 1. Then, by permuting rows and columns, we can without loss of generality assume that
A=[~ ~]
,r+s=n+1.
Thus, as B E Rrx(n-s) and r x (n - s) = r x (r - 1), any product of the form
aI u(l)a2u(2) ... aru(r) ... anu(n) is equal to zero, since at least one of the first r terms in the product sits in the zero block. D
Lemma 23.13. Let A E R nxn be a doubly stochastic matrix. Then per A >
o.
Proof. If per A = 0, then, by the last lemma, we can assume that
23. Matrices with nonnegative entries
506
where B E 1R rx(n-s), C E 1R (n-r) x (n-s) and r + s = n + 1. Let Ea denote the sum of the entries in the matrix G. Then, since A is doubly stochastic, r = EB S EB
+ Eo = n - s;
i.e., r + s = n, which is not compatible with the assumption per A = o. Therefore, per A> 0, as claimed. 0 Theorem 23.14. (Birkhoff-von Neumann) Let P E lR nxn be a doubly stochastic matrix. Then P is a convex combination of finitely many permutation matrices. Proof. If P is a permutation matrix, then the assertion is self-evident. If P is not a permutation matrix, then, in view of Lemma 23.13 and the fact that P is doubly stochastic, there exists a permutation a of the integers {1, ... ,n} such that 1 > Plu(I)P2u(2) ... Pnu(n) > O. Let Al
= min{PIU(I), ...
,Pnu(n)}
and let III be the permutation matrix with 1's in the ia(i) position for i = 1, ... ,n. Then it is readily checked that PI = P - AlIII
1- >'1
is a doubly stochastic matrix with at least one more zero entry than P and that P = AlIII + (1 - AdPI . If PI is not a permutation matrix, then the preceding argument can be repeated; i.e., there exists a number A2, 0 < A2 < 1, and a permutation matrix II2 such that P2 = PI - A2II2 1 - A2 is a doubly stochastic matrix with at least one more zero entry than Pl. Then P = AlIII + (1- AI){A2II2 + (1- A2)P2} .
Clearly this procedure must terminate after a finite number of steps.
0
Exercise 23.21. Let Q denote the set of doubly stochastic n x n matrices. (a) Show that Q is a convex set and that every n x n permutation matrix is an extreme point of Q. (b) Show that if P E Q and P is not a permutation matrix, then P is not an extreme point of Q. (c) Give a second proof of Theorem 23.14 based on the Krein-Milman theorem.
23.4. An inequality of Ky Fan Let A, B E lR. nxn be a pair of symmetric matrices with eigenvalues J-t1 ~ J-t2 ~ ... ~ J-tn and ~ 112 ~ ... ~ lin, respectively. Then the Cauchy-Schwarz inequality applied to the inner product space C nxn with inner product
III
(A, B) = trace{ BH A}
leads easily to the inequality trace{ AB}
~
{t {t II;} J-t]} 1/2
3=1
1/2
3=1
In this section we shall use the Birkhoff-von Neumann theorem and the Hardy-Littlewood-Polya rearrangement lemma to obtain a sharper result for real symmetric matrices. The Hardy-Littlewood-Polya rearrangement lemma, which extends the observation that (23.6) to longer ordered sequences of numbers, can be formulated as follows: Lemma 23.15. Let a and b be vectors in lR. n with entries al an and bi ~ b2 ~ ... ~ bn , respectively. Then aT Pb
(23.7)
~
a2
~
...
~
~ aTb
for every n x n permutation matrix P. Proof. Let P = L:j=1 eje~(j) for some one to one mapping u of the integers {I, ... ,n} onto themselves, and suppose that P =I In. Then there exists a smallest positive integer k such that u(k) =I k. If k > 1, this means that u(l) = 1, ... ,u(k - 1) = k - 1 and k = u(£), for some integer £ > k. Therefore, bk = bu(e) ~ bU(k) and hence the inequality (23.6) implies that
akbu(k)
+ aebu(e)
~
=
akbu(l) + aebu(k) akbk + aebu(k) .
In the same way, one can rearrange the remaining terms to obtain the inequality (23.7). 0 Lemma 23.16. Let A, B E lR. nxn be symmetric matrices with eigenvalues
J-ti
~
J-t2
~
...
~
J-tn and
III
~
112
~
...
~
lin ,
respectively. Then (23.8)
trace{ AB} ~ J-tilli
+ ... + J-tnlln
,
with equality if and only if there exists an n x n orthogonal matrix U that diagonalizes both matrices and preserves the order of the eigenvalues in each.
Proof. Under the given assumptions there exists a pair of n x n orthogonal matrices U and V such that
A = UDAUT and B = VDBV T , where D A = diag{Jl.l, ... , Jl.n} and DB = diag{vl' ... ,vn }
.
Thus trace{AB} = =
trace{UDAUTVDBVT} trace{DAWDBWT} n
L
=
Jl.iW;jVj ,
i,j=1
where Wij denotes the ij entry of the matrix W = UTV. Moreover, since W is an orthogonal matrix, the matrix Z E IR nxn with entries Zij = W;j' i,j = 1, ... ,n, is a doubly stochastic matrix and consequently, by the Birkhoff-von Neumann theorem,
8=1
is a convex combination of permutation matrices. Thus, upon setting x T = [Jl.I. ... ,Jl.n] and yT = [VI, .•• ,vn ] and invoking Lemma 23.15, it is readily seen that l
trace{AB}
LAsxTpsY s=1 l
<
L
A8 XT Y =
xT Y ,
8=1
as claimed.
o
The case of equality is left to the reader.
Remark 23.17. A byproduct of the proof is the observation that the Schur product U 0 U of an orthogonal matrix U with itself is a doubly stochastic matrix. Doubly stochastic matrices of this special form are often referred to as orthostochastic matrices. Not every doubly stochastic matrix is an orthostochastic matrix; see Exercise 23.23 The subclass of orthostochastic matrices play a special role in the next section. Exercise 23.22. Show that in the setting of Lemma 23.16,
trace{AB} 2:
Jl.IVn
+ ... + Jl.nVI
.
Exercise 23.23. Show that the doubly stochastic matrix A
~ 1[~ ~ ~l
is not an orthostochastic matrix.
23.5. The Schur-Horn convexity theorem
A = [aij] ,i, j = 1, ... ,n be a real symmetric matrix with eigenvalues J..lI, ... ,J..ln· Then, by Theorem 9.7, there exists an orthogonal matrix Q E R nxn such that A = QDQT, where D = diag{J..l1, ... ,J..ln}. Thus,
Let
n
aii = LqtjJ..lj j=l
for
i
and the vector dA with components aii for dA
=
= 1, ... ,n
= 1, ...
,n is given by the formula
[a~l] = B [~1] , ann
J..ln
where B denotes the orthostochastic matrix with entries
bij = qtj
for
i, j
= 1, ... ,n.
This observation is due to Schur [63]. By Theorem 23.14,
L
B=
cuPu
uEI:n
is a convex combination of permutation matrices Pu . Thus, upon writing Pu in terms of the standard basis ei, i = 1, ... ,n for R n as n
Pu =
L eie~(i)
,
i=l
it is readily checked that the vector dA =
L Cu [J..lU;(l)] uEI: n J..lu(n)
is a convex combination of the vectors corresponding to the eigenvalues of the matrix A and all their permutations. In other words, (23.9)
dA E conv
{[J..lU;(l)] : J..lu(n)
(j
E
En}
There is a converse statement due to Horn [39], but in order to state it, we must first introduce the notion of majorization.
Given a sequence {Xl, ..• ,xn } of real numbers, let {Xl, ... ,xn } denote the rearrangement of the sequence in "decreasing" order: Xl ~ ... ~ xn. Thus, for example, ifn = 4 and {Xl, ... ,X4} = {5,3,6,1}, then {Xl, ... , X4} = {6, 5, 3, 1}. A sequence {XI. ... , xn} of real numbers is said to majorize a sequence {Yl, ... , Yn} of real numbers if Xl
= ... Xk > 111 + ... + 11k for
Xl
= ... Xn = 111 + ... + 11n .
and
k
= 1, ... , n - 1
Exercise 23.24. Show that if A E lR nxn is a doubly stochastic matrix and if y = Ax for some x E lR n, then the set of entries {Xl, ... , xn} in the vector x majorizes the set of entries {yI, .. . , Yn} in the vector y. [HINT: If Xl ~ ... Xn , Y1 ~ ... Yn and 1 ~ i ~ k ~ n, then Yi
n
k-l
n
k-l
j=l
j=l
j=k
j=l
= L UijXj ~ L aijXj + Xk L aij = L aij(xi -
Now exploit the fact that Ef=l aij ~ 1 to bound Yl
Xk)
+ Xk·
+ ... + Yk.]
Lemma 23.18. Let {Xl. ... ,xn } and {Yl, ... , Yn} be two sequences of real numbers such that {XI. ... , xn} majorizes {Yl, ... , Yn}. Then there exists a set of n - 1 orthonormal vectors Ul, ... , Un-l and a permutation (J E ~n such that n
(23.10)
Yu(i)
= L)Ui)JXj
for i
= 1, ...
, n - 1.
j=l
Discussion. Without loss of generality, we may assume that Xl ~ ... ~ Xn and Y1 ~ ... ~ Yn. To ease the presentation, we shall focus on the case n = 4. Then the given assumptions imply that
+ Y2 , Xl + X2 + X3 ~ YI + Y2 + Y3 because of the equality Xl + ... + X4 = Yl + ... + Y4, X4 ~ Y4 , X3 + X4 ~ Y3 + Y4 , X2 + X3 + X4 ~ Y2 + Y3 + Y4 . Xl ~ Yl ,
and,
Xl
+ X2
~ Yl
The rest of the argument depends upon the location of the points Yl, Y2, Y3 with respect to the points Xl. . .. , X4. We shall suppose that Xl > X2 > X3 > X4 and shall consider three cases: Case 1: Y3 such that
~
X3 and Y2
~
X2. Clearly, there exists a choice of
Y3 = C~X3 + dX4 with c~ + c~ Moreover, if w = U 2X2 + V2(-y2x3 + 82x4), where ')'e3
+ 8C4 =
0 and
')'2
= 1.
+ 82 = u 2 + v 2 = 1 ,
C3, C4 E
lR
23.5. The Schur-Horn convexity theorem
then the vectors
UI
and
U2
u[ = [0 0
511
defined by the formulas
C4]
C3
and
uf = [0 u V'Y v8]
are orthonormal and, since
= X3 + X4 - Y3 ~ Y4 ~ Y2 X2, there exists a choice of u, v with u 2 + v 2 = 1 such that Y2 = W,
'Y2X3
and Y2 ~ i.e.,
+ 82x4 = (1 -
C§)X3
+ (1 -
C~)X4
Now, let w = u2XI
+ v 2((32x2 + 'Y2X3 + 82x4) ,
where the numbers (3, "I, 8, u and v are redefined by the constraints 'Y 2C3 + 82c4 = (32b 2 + 'Y 2b3 + 82b4
and (32 + "12
=0
+ 82 = u 2 + v 2 =
1.
Then the vectors
are orthonormal and, since (32x2
+ 'Y2X3 + 82x4 = =
(1 - b~)X2 + (1 - b§ - C§)X3 X2
+ X3 + X4 -
Y2 - Y3
~
Y4
+ (1 ~
b~ - C~)X4
YI
and YI ~ Xl, there exists a choice of real numbers u and v with u 2 + v 2 = 1 such that w = uf x = YI. Case 2: YI 2: X2 and Y2 2: X3· This is a reflected version of Case 1, and much the same sort of analysis as in that case leads rapidly to the existence of representations YI Y2
and alb!
a~xI
+ a~x2 with b~XI + b~X2 + b§X3
+ a2b2 = O.
a~ + a~ = 1, with
b~ + b~ + b§
=1
Therefore, if
w = u2(a2xI
+ (32x2 + 'Y2X3) + v2X4
with aal
+ (3a2 = ab l + (3b 2 + 'Yb3 = 0
then the vectors
and a 2 + (32
+ "12 = u 2 + v 2 = 1,
are orthonormal and, since
(1 - ai - bi)XI Xl
+ X2 + X3 -
+ (1 -
a~ - b~)X2
YI - Y2
~
Yl and a2b2
+ v2 =
+ X3 :S Yl + Y2,
b~X2 + b~X3 with b~ + b5 = 1 aixi + a~x2 + a~x3 with ai + a~
Y2
+ a3b3 =
b5)X3
Y3
and Y3 ~ X4, there exists a choice of u, v with u 2 W = u§x = Y3. Case 3: X3 :S Y3 :S Y2 :S YI :S X2· If X2
+ (1 -
1 such that
then we may write
+ a5 = 1
O. Thus, if W
= U2(a2xl
+ (32X2 + 'lX3) + v2X4,
where
then the vectors
are orthonormal and, since
(1 - ai)xl =
Xl
+ (1 -
+ X2 + X3 -
a~ - b~)X2
YI - Y2
~
+ (1 -
a~ - b5)X3
Y3
and X4 :S Y3, there exists a choice of real numbers u and v with u 2 + v 2 = 1 such that w = u§x = Y3. On the other hand, if X2
and b2C2
Y2
-
Y3
-
+ X3
~
Yl
+ Y2,
then
b~X2 + b5X3 with b~ + b~ = 1 C22 X2 + b2cX3 + C42 X4 WI'th c22 + C32 + C42
+ b3C3 + b4C4 = Yl
=1
0 and the construction is completed by choosing
= U2Xl + v 2((32x2 + 'lX3 + 82x4)
with
and an appropriate choice of u, v with u 2
+ v2 =
1.
Theorem 23.19. (Schur-Horn) Let {J1}, ... ,J1n} be any set of real numbers (not necessarily distinct) and let a E IR n. Then
a E conv {
[J1U:(I)] : (j
E
En}
J1u(n) if and only if there exists a symmetric matrix A E IR nxn with eigenvalues J1}' ... ,J1n such that dA = a.
Proof.
Suppose first that a belongs to the indicated convex hull. Then a
=
L
cuPu
[~l] = [~l] P
J1n
uEEn
J1n
,
where P = I:UEEn cuPu is a convex combination of permutation matrices Pu and hence, by Exercise 23.24, {J1}' ... ,J1n} majorizes {al,'" ,an}, Therefore, by Lemma 23.18, there exists a set {UI, ... ,un-d of n-l orthonormal vectors in IR n and a permutation (j E En such that n
au(i)
= L(Ui);J1j for
i
= 1, ... ,n - 1.
j=l
Let Un E IR n be a vector of norm one that is orthogonal to Then
UI,.·. ,Un-I'
~(un)~l'j ~ ~ (1 -~(Ui);) 1'; =
J11
+ ... + J1n -
(au(l)
+ ... + au(n-l)) = au(n) .
This completes the proof in one direction. The other direction is covered by the first few lines of this section. 0
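Schur's half of the theorem, namely that the eigenvalues of a real symmetric matrix majorize its diagonal entries, can be checked numerically as follows; this sketch is not part of the text and the helper name majorizes is an assumption.

```python
import numpy as np

rng = np.random.default_rng(4)

def majorizes(x, y, tol=1e-10):
    """True if the entries of x majorize the entries of y."""
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    return (np.all(np.cumsum(xs)[:-1] >= np.cumsum(ys)[:-1] - tol)
            and abs(xs.sum() - ys.sum()) < tol)

X = rng.standard_normal((6, 6))
A = (X + X.T) / 2
print(majorizes(np.linalg.eigvalsh(A), np.diag(A)))    # True
```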
Exercise 23.25. The components {Xl, ... ,xn } of x components {YI,' .. ,Yn} of Y E IRn if and only if y E conv{Pux:
(j
E
IR n majorize the
E En}.
Exercise 23.26. Verify Lemma 23.18 when {x!, ... ,xd majorizes {YI, ... , and Xl 2: X2 2: YI 2: Y2 2: Y3 2: X3 2: X4. [HINT: Express YI as a convex
Y4}
combination of Xl and
X4
and
Y3
as a convex combination of X2 and
X3.]
23.6. Bibliographical notes Applications of Perron-Frobenius theory to the control of a group of autonomous wheeled vehicles are found in the paper [47J. The presented proof of Fan's inequality is adapted from an exercise with hints in [10]. Exercise
23.25 is a theorem of Rado. Exercise 23.24 is taken from [37]. Exercise 23.23 is taken from [63]. Additional discussion on the history of the Schur-Horn convexity theorem and references to generalizations may be found in [28]. Related applications are discussed in [13]. The definitive account of permanents up till about 1978 was undoubtedly the book Permanents by Henryk Minc [51]. However, in 1980/81 two proofs of van der Waerden's conjecture, which states that
The permanent of a doubly stochastic n x n matrix is bounded below by n!/nn, with equality if and only if each entry in the matrix is equal to l/n, were published. The later book Nonnegative Matrices [52] includes a proof of this conjecture.
Appendix A
Some facts from analysis
. . . a liberal arts school that administers a light education to students lured by fine architecture and low admission requirements. ... Among financially gifted parents of academically challenged students along the Eastern Seaboard, the college is known as ... a place where ... , barring a felony conviction, ... [your child} ... will get to wear a black gown and attend graduation .. . .
Garrison Keillor [42], p. 7
A.1. Convergence of sequences of points

A sequence of points x_1, x_2, … ∈ R is said to

• be bounded if there exists a finite number M > 0 such that |x_j| ≤ M for j = 1, 2, …,
• be monotonic if either x_1 ≤ x_2 ≤ ⋯ or x_1 ≥ x_2 ≥ ⋯,
• converge to a limit x if for every ε > 0 there exists an integer N such that |x_j − x| < ε if j ≥ N,
• be a Cauchy sequence (or a fundamental sequence) if for every ε > 0 there exists an integer N such that |x_{j+k} − x_j| < ε if j ≥ N and k ≥ 1.

It is easy to see that every convergent sequence is a Cauchy sequence. The converse is also true. The principal facts regarding convergence are:

• A sequence of points x_1, x_2, … ∈ R converges to a limit x ∈ R if and only if it is a Cauchy sequence.
• Every bounded sequence of points in R has a convergent subsequence.
• If every convergent subsequence of a bounded sequence of points converges to the same limit, then the sequence converges to that limit.
• Every bounded monotonic sequence converges to a finite limit.
A.2. Convergence of sequences of functions A sequence of functions h(x), h(x),··· that is defined on a set Q c R is said to converge to a limit f(x) on Q if fn(x) converges to f(x) at each point X E Q, i.e., if for each point X E Q and every E > 0, there exists an integer N such that
l/j(x) - f(x)1 < E if j ~ N. In general, the number N depends upon x. The sequence h (x), h (x), ... is said to converge uniformly to f (x) on Q if for every E > 0 there exists an integer N that is independent of the choice of x E Q such that
l/j(x) - f(x)1 < E for
j
~
N
and every
x
E
Q.
A.3. Convergence of sums A sum 2:~1 aj of points aj E R is said to converge to a limit a E R if the partial sums n
Sn = Laj j=l
tend to a as n i that
00,
i.e., if for every ISn -
al < E
E
> 0, there exists an integer N such for
n > N.
The Cauchy criterion for convergence then translates to: for every there exists an integer N such that
ISn+k -
Sn I < E
for n > Nand
k~1
or, equivalently, n+k
L j=n+l
aj < E for
n > Nand k ~ 1 .
E
>0
In particular, a sufficient (but not necessary) condition for convergence is that for every E > 0 there exists an integer N such that n+k
L
00
lajl <
E
n> Nand k 2: lor, equivalently, Llajl
for
j=l
j=n+l
A.4. Sups and infs If Q is a bounded set of points in JR, then:
m is said to be a lower bound for Q if m
~
M is said to be an upper bound for Q if x
x for every x E Q. ~
M for every x E Q.
Moreover, there exists a unique greatest lower bound in for Q and a unique least upper bound M for Q. These numbers are referred to as the infimum and supremum of Q, respectively. They are denoted by the symbols
in = inf{x : x
E Q}
and
M = sup{x : x E Q},
respectively, and may be characterized by the conditions infimum: in = inf{x : x E Q} if and only if in ~ x for every x E Q, but for every E > 0 there exists at least one point x E Q such that x < in+E. ~
~
supremum: M = sup{x: x E Q} if and only if x ~ M for every x E Q, but for every E > 0 there exists at least one point x E Q such that
x> M
-E.
Let Xl, X2,'"
be a sequence of points in JR such that IXjl ~ M <
00
and
let Then
Ml 2: M2 2: M3 2: ... 2: -M; i.e., Ml, M2, .. ' is a bounded monotone sequence. Therefore limjtoo Mj exists, even though the original sequence Xl, X2, •.. may not have a limit. (Think of the sequence 0,1,0,1, .... ) This number is called the limit superior and is written limsupxj = ~im sup{Xj,Xj+l"'}' jioo 3ioo The limit inferior is defined similarly: lirp.infxj = ~ 3ioo 3ioo
inf{Xj,Xj+l"'}'
A.5. Topology Let
Br(y) = {x E R:
Ix - YI < r}
and
Br(y) = {x E R:
Ix - yl :s; r}.
A point y E R is said to be a limit point of a set Q c R if for every choice of r > 0, Br(y) contains at least one point of Q other than y. A subset Q of R is said to be: open: if for every point y E Q there exists an r > 0 (which usually depends upon y) such that Br(Y) C Q. closed: if Q contains all its limit points. bounded: if Q
c BR(O)
for some R > O.
If AI, A2 ... are open sets, then finite intersections ni=1 Ai and (even) infinite unions U~IAi are open. If B I ,B2 , •.. are closed sets, then finite unions Ui=1 Bi and (even) infinite intersections n~1 Bi are closed. An open set n c C is said to be connected if there does not exist a pair of disjoint nonempty open sets A and B such that n = A U B. In other words, if = Au B, A, B are open and An B = 0, then either A = 0 or
n
B=0.
A.6. Compact sets A collection of open sets {Bo. : a E A} is said to be an open covering of Q if Q c Uo.EA Bo.. Q is said to be compact if every open covering contains a finite collection of open sets Bo. 1 , ••• ,Bo.n such that Q C Uj=l Bo. j • • Let Q C R. Then Q is compact if and only if Q is closed and bounded . • A continuous real-valued function f(x) that is defined on a compact set K attains its maximum value at some point in K and its minimum value at some point in K.
A. 7. N ormed linear spaces The definitions introduced above for the vector space R have natural analogues in the vector space C and in normed linear spaces over R or C. Thus, for example, if X is a normed linear space over C and y EX, it is natural to let
Br(Y)={XEX: IIx-YII 0, Br(y) contains at least one point of Q other than y. A subset Q of R is said to be:
open: if for every point y E Q there exists an r > 0 (which usually depends upon y) such that Br (y) c Q. closed: if Q contains all its limit points. bounded: if Q c BR(O) for some R > O. compact: if every open covering of Q contains a finite open covering of Q. A sequence of vectors XI. X2, ... in X is said to be a Cauchy sequence if for every c > 0 there exists an integer N such that
IIXn+k - xnll < c for every k ~ 1 when n ~ N. A normed linear space X over C or R is said to be complete if every Cauchy sequence converges to a point X EX. Finite dimensional normed linear spaces over C or R are complete. Therefore, their properties are much the same as those recorded for R earlier. Thus, for example, if X is a finite dimensional normed linear space over C or R, then every bounded sequence has a convergent subsequence. Moreover, if every convergent subsequence of a bounded sequence tends to the same limit, then the full original sequence converges to that limit.
A point a E X is said to be a boundary point of a subset A of X if
Br(a) n A
i= 0
and
Br(a) n (X \ A)
i= 0
for every r > 0,
where X \ A denotes the set of points that are in X but are not in A. A point a E X is said to be an interior point of A if Br(a) c A for some r > O. The symbol int A will be used to denote the set of all interior points of A. This set is called the interior of A.
Appendix B
More complex variables
The game was not as close as the score indicated. Rud Rennie (after observing a 19 to 1 rout), cited in [40], p. 48
This appendix is devoted to some supplementary facts on complex variable theory to supplement the brief introduction given in Chapter 17.
B.1. Power series

An infinite series of the form Σ_{n=0}^∞ a_n(λ − w)^n is called a power series and the number R that is defined by the formula

    1/R = limsup_{k↑∞} {|a_k|^{1/k}},

with the understanding that R = ∞ if limsup_{k↑∞} |a_k|^{1/k} = 0 and R = 0 if limsup_{k↑∞} |a_k|^{1/k} = ∞, is termed the radius of convergence of the power series. The name stems from the fact that the series converges if |λ − w| < R and diverges if |λ − w| > R. Thus, for example, if 0 < R < ∞ and 0 < r < R, then the partial sums

    f_n(λ) = Σ_{k=0}^n a_k(λ − w)^k,  n = 0, 1, …,

form a Cauchy sequence for each point λ in the closure

    B̄_r(w) = {λ ∈ C : |λ − w| ≤ r}

of B_r(w) = {λ ∈ C : |λ − w| < r}
if r < R: if r < rl < R, then enough and hence
which tends to zero as n i On the other hand, if
lajl ::; r-;j
for all j ~ n if n is chosen large
00.
1.\ - wi = r > Rand
1 1 1 - <- <- , r rl R
then
lakl 1/ k ~
:1
~ lakll"\-wl k ~ (:I)k > 1 infinitely often.
infinitely often
Therefore, the power series diverges if 1..\ - wi > R. The cases R = 0 and R = 00 are left to the reader. The next order of business is to check that every holomorphic function generates a power series. More precisely:
Lemma B.l. Let I be holomorphic in an open set suppose that BR{W) en lor some R > O. Then
L I(n)(n.,w ) (..\ - wt 00
I{.\) =
for
1..\ -
nc
te, let wEn and
wi < R.
n=O
Proof. Let r R denote a circle centered at w of radius R directed counterclockwise and let 1.\ - wi < R. Then
I{.\) =
~ 27ri
=
_1 (
27ri =
-
f(() d(
JrR
f(() d( (- w - (..\ - w)
~ ( 27ri
=
(
JrR (-..\
~
Jr R(( {
f(() d( w)(1 - 2=~)
liQ ~ (.\_w)n d(
27ri
JrR ( - w ~
~
{1 ( (( _f(()w)n+1
(- w
~ 27ri JrR
which coincides with the asserted formula. Conversely, every convergent power series holomorphic function:
d(
}
n
(..\ - w) ,
o
L:k::O ak("\ - w)k
defines a
Lemma B.2. Let 2:~o ak(>' - w)k be a power series with radius of convergence R. Then the partial sums n
fn(>') =
L ak(>' - wl, n = 0,1, ... , k=O
converge to a holomorphic function f(>.) at every point>. partial sums
E
BR(W) and the
n
f~(>')
=
L kak(>' - w)k-l, n = 1,2, ... , k=l
converge to its derivative f'(>.) at every point>. E BR(W). Moreover, the convergence of both of these sequences of polynomials is uniform in Br(w) for every r < R.
Proof.
Since limnioo n1/n = 1, it follows that
limsup(nlanl)l/n = lim sup lanl 1 / n nioo
nioo
and hence that the two power series 00
00
k=O
k=l
L ak(>' - w)k and L kak(>' - w)k-l have the same radius of convergence R. Moreover, the limits
f(>.) = lim fn(>') nioo
and
g(>.) = lim f~ (>.) nioo
are both holomorphic in BR(W), since the convergence is uniform in Br(w) if 0 < r < R. Furthermore, if>. E BR(W) and rr denotes a circle ofradius r centered at wand directed counterclockwise, then
f'(>.)
=
_1 (
27ri
f(()
lrr (( - >.)2
lim _1 ( nioo 27ri lrr
d(
fn(() d(
(( - >.)2
lim f~(>') = g(>.). nioo
o B.2. Isolated zeros Lemma B.3. Let f be holomorphic in an open connected set 0 and suppose that f(>.) = 0 for every point>. in an infinite set 00 C 0 that contains a limit point. Then f(>.) = 0 for every point>. E O.
Proof.
Let
A = {w En: f(j)(w) = 0 for j = 0, 1, ... } and let B
= {w En: f(k)(w)
=1=
0 for at least one integer k 2 O}.
Clearly Au B = n, An B = 0 and B is open. Moreover, since w E A if and only if there exists a radius rw > 0 such that Brw (w) c nand f ()..) = 0 for every point ).. E Brw (w), it follows that A is also open. The proof is completed by showing that if aI, a2,··· is a sequence of points in no that tend to a limit a E no, then a ¢ B. Therefore a E A, i.e., A = nand B = 0. The verification that a ¢ B follows by showing that the zeros of f ()..) in B are isolated. The details are carried out in a separate lemma. 0 Lemma B.4. Let f()..) admit a power series expansion of the form
f()..) =
f(k+ 1) ( ) f (k)( ) k'w ().. - w)k + (k + 1~ ().. - w)k+1
+ ...
in the ball Br(w) and suppose that f(k)(w) =1= 0 for some nonnegative integer k. Then there exists a number Pw > 0 such that f()..) =1= 0 for all points).. in the annulus 0 < I).. - wi < Pw· Proof. H 0 < rl < r, then Brl (w) C Br (w) and the coefficients in the exhibited power series are subject to the bound
f(j)(w) M <j! - r{' where M = max{lf(()1 : ( E C and Therefore, if
Iw - (I =
rl}.
I).. - wi < rl, then 00
<
L
j=k+1
.< M
M
-·I)..-wl j
ri
(I).. - w1)k+l rl
Consequently,
rl
rl -
I).. - wi·
B.4. In (1 - A) when IAI
<1
which is not equal to zero for 0 enough.
525
< IA -
wi <
r2
when
r2
is chosen small
0
Exercise B.lo Let f be holomorphic in Br(w). Show that if If(A)1 = M for every point A E Br(w), then f(A) is constant in Br(w). [HINT: Use estimates analogous to those used just above to show that f(k}(w) = 0 for k = 1,2, .... J
B.3. The maximum modulus principle Lemma B.5. Let n be an open connected bounded nonempty subset of Ci let f be holomorphic in n and continuous in n, the closure of 0i let an denote the boundary of ni and let M = max {If(A)1 : A E n}. Then: (1) max {If(A)1 : A E an} = M (2) If(A)1 = M for some point A E
n if and only if f
is constant in
n.
Suppose first that f (w) I = M for some point wEn and let Br (w) c n. Then, in the usual notation, Cauchy's formula implies that
Proof.
If(w)1
=
i2~i Irr (~(~ d(i = 12~ fo27r If(w + rei9 )ldO
and thus, as If(w+re i9 )1 ~ M, equality must prevail: Le., If(w+pe i9 )1 = M, first for p = r and then, by the same argument, for 0 ~ p ~ r. Therefore the set A = {A En: If(A)1 = M} is open. Moreover, B = {A En: If(A)1 < M} is open. Therefore, since Au B = n, An B = 0 and n is connected, it follows that either A = 0 or B = 0. The proof is now easily completed by invoking Exercise B.1.
B.4. In (1 - A) when IAI < 1 Lemma B.6. If IAI < 1, then l-A=exp {
An} . -I:-;;: <X>
n=l
Proof.
Let
An} .
g(A) = exp { - ~ -;;: <X>
Then
and
9' (0) = -g(O) = -1.
0
Moreover,
"(A) = (A - l)g'(A) - g(A) = 0 (A- 1)2 . Thus, all the higher order derivatives of g(A) are also equal to zero in B1(0), and hence the power series expansion 00 (k) (0) g(A) = 9 , Ak
9
L
k=O
k.
reduces to
g(A)
= g(O) + g'(O)A =
1 - A. D
Corollary B.7. If IAI
< 1, then An L -. n 00
In(l- A) = -
n=l
B.5. Rouche's theorem Lemma B.S. Let f(A) be holomorphic in an open set n that contains the closed disc Br(w) for some r > 0 and suppose that If(A)1 > 0 if IA - wi = r and let Nf(w,r) denote the number of zeros of f inside Br(w), counting multiplicities. Then
Proof. The main observation is that if f (A) has a zero of order k at some point a E Br(w), then f(A) = (A - a)kp(A), where p(A) is holomorphic in n and p(a) =f O. Thus, if Cp denotes a circle of radius p that is centered at a and directed counterclockwise, then
1
1 {+
1 f'«() 1 27ri C p f«() d( = 27ri C p
k
p'«()} p«() d( = k,
if P is taken sufficiently smalL In other words, the residue of f' / f at the point a is equal to k. The final formula follows by invoking Theorem 17.10 to add up the contribution from each of the distinct zeros of f (A) inside Br(w); there are only finitely many, thanks to Lemma B.3. 0
Theorem B.9. (Rouche) Let f(A) and g(A) be holomorphic in an open set n that contains the closed disc Br (w) for some r > 0 and suppose that
If(A) - g(A)1 < If(A)1
if
IA - wi =
r.
Then f and 9 have the same number of zeros inside Br(w), counting multiplicities.
Proof. Under the given assumptions, If(A)1 > 0 and Ig(A)1 > 0 for every point A on the boundary of Br (w). Therefore, the difference between the number of roots of f inside Br (w) and the number of roots of 9 inside Br (w ) is given by the formula
where
f(() h(() = g(()
for
(E rr
and r r is a circle of radius r directed counterclockwise that is centered at the point w. In view of the prevailing assumptions on f and g, 11 - h( () I < 1 for ( Err,
and h( () is holomorphic in the set {A E C : r - c < IA - wi < r + c} for some c > 0, thanks to Lemma B.4. Therefore, by Lemma B.6, we can write h(()
1 - (1 - h(())
exp{ cp( ()} , where
cp(() = _
f n=l
(1 - h(())n. n
Thus,
h'(() = h(()cp'(() for (E rr and 1 . { cp' (()d( -2
Jr 1 r 27ri Jo 7rZ
= =
=
r
7r
d d(CP(((t))('(t)dt
1 1211" d -2· -d cp(((t))dt 7rZ 0 t 1 -2. {cp(((27r)) - cp(((O)} 7rZ
.
= O. o
B.6. Liouville's theorem If f (>..) is holomorphic in the full complex plane C and If (>..) I ~ M < every >.. E C, then f (>..) is constant.
00
for
Proof. Under the given assumptions
for every point >.. E C. Moreover, the formula
for the coefficients implies that
for every R ~ O. Therefore
f(n)
(0)
= 0 for n
~ 1.
D
Much the same sort of analysis can be used to establish variants of the following sort: Exercise B.2. Show that if f(>..) is holomorphic in the full complex plane C and If(>")1 = 27 + v'31>"1 3 / 2 for every point >.. E C, then f (>..) is a polynomial of degree one.
B.7. Laurent expansions

Let f(λ) be holomorphic in the annulus 0 < |λ − w| < R, let 0 < r₁ < r₂ < R and let Γ_{r_j} be a circle of radius r_j, j = 1, 2, centered at w and directed counterclockwise. Then, for r₁ < |λ − w| < r₂,

(B.1)
$$f(\lambda) = \frac{1}{2\pi i}\int_{\Gamma_{r_2}}\frac{f(\zeta)}{\zeta - \lambda}\,d\zeta - \frac{1}{2\pi i}\int_{\Gamma_{r_1}}\frac{f(\zeta)}{\zeta - \lambda}\,d\zeta.$$

Moreover,
$$\frac{1}{2\pi i}\int_{\Gamma_{r_2}}\frac{f(\zeta)}{\zeta - \lambda}\,d\zeta = \sum_{j=0}^{\infty}\Bigl(\frac{1}{2\pi i}\int_{\Gamma_{r_2}}\frac{f(\zeta)}{(\zeta - w)^{j+1}}\,d\zeta\Bigr)(\lambda - w)^j$$
and
$$-\frac{1}{2\pi i}\int_{\Gamma_{r_1}}\frac{f(\zeta)}{\zeta - \lambda}\,d\zeta = \sum_{j=-\infty}^{-1}\Bigl(\frac{1}{2\pi i}\int_{\Gamma_{r_1}}\frac{f(\zeta)}{(\zeta - w)^{j+1}}\,d\zeta\Bigr)(\lambda - w)^j.$$

Thus, f(λ) can be expressed in the form

(B.2)
$$f(\lambda) = \sum_{j=-\infty}^{\infty}a_j(\lambda - w)^j,$$

where
$$a_j = \frac{1}{2\pi i}\int_{\Gamma_r}\frac{f(\zeta)}{(\zeta - w)^{j+1}}\,d\zeta \quad \text{for} \quad j = \ldots, -1, 0, 1, \ldots,$$
Γ_r denotes a circle of radius r that is centered at w and directed counterclockwise and 0 < r < R. The representation (B.2) is called the Laurent expansion of f about the point w. If a_{−k} ≠ 0 for some positive integer k and a_j = 0 for j < −k, then w is said to be a pole of order k of f(λ).
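The coefficient formula below (B.2) discretizes in the same way as the contour integrals above. This added sketch (the function, the center w = 0 and the radius are arbitrary choices) recovers the Laurent coefficients of f(λ) = 1/(λ(λ − 2)) in the annulus 0 < |λ| < 2, where the expansion is −1/(2λ) − 1/4 − λ/8 − λ²/16 − ⋯, so that w = 0 is a pole of order 1.

```python
import numpy as np

def laurent_coeff(f, w, j, r, m=4000):
    """Approximate a_j = (1/(2*pi*i)) * integral of f(zeta)/(zeta - w)^(j+1) d(zeta), |zeta - w| = r."""
    t = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
    zeta = w + r * np.exp(1j * t)
    dzeta = 1j * (zeta - w) * (2.0 * np.pi / m)
    return np.sum(f(zeta) / (zeta - w) ** (j + 1) * dzeta) / (2j * np.pi)

f = lambda z: 1.0 / (z * (z - 2.0))          # holomorphic in the annulus 0 < |z| < 2

for j in (-3, -2, -1, 0, 1, 2):
    print(j, laurent_coeff(f, 0.0, j, 1.0))
# a_{-3} and a_{-2} are approximately 0, a_{-1} = -1/2, a_0 = -1/4,
# a_1 = -1/8, a_2 = -1/16: a simple pole (order 1) at w = 0.
```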
B.8. Partial fraction expansions

Theorem B.10. Let
$$f(\lambda) = \frac{(\lambda - \alpha_1)^{m_1}\cdots(\lambda - \alpha_k)^{m_k}}{(\lambda - \beta_1)^{n_1}\cdots(\lambda - \beta_\ell)^{n_\ell}},$$
where the k + ℓ points α₁, …, α_k and β₁, …, β_ℓ are all distinct and m₁ + ⋯ + m_k ≤ n₁ + ⋯ + n_ℓ. Then:

(1) f(λ) has a pole of order n_j at the point β_j for j = 1, …, ℓ.

(2) Let
$$g_j(\lambda) = \sum_{i=-n_j}^{-1}a_{ji}(\lambda - \beta_j)^i, \qquad j = 1, \ldots, \ell,$$
denote the sum of the terms with negative indices in the Laurent expansion of f(λ) at the point β_j. Then f(λ) − g_j(λ) is holomorphic in a ball of radius r_j centered at β_j if r_j is chosen small enough.

(3) f(λ) = g₁(λ) + ⋯ + g_ℓ(λ) + c, where c = lim_{λ→∞} f(λ).

Proof. Under the given assumptions
$$f(\lambda) - \{g_1(\lambda) + \cdots + g_\ell(\lambda)\}$$
is holomorphic in all of ℂ and tends to a finite limit as λ → ∞. Therefore
$$|f(\lambda) - \{g_1(\lambda) + \cdots + g_\ell(\lambda)\}| \le M < \infty$$
for all λ ∈ ℂ. Thus, by Liouville's theorem,
$$f(\lambda) - \{g_1(\lambda) + \cdots + g_\ell(\lambda)\} = c$$
for every point λ ∈ ℂ, since each g_j(λ) → 0 as λ → ∞ and hence the constant value of the difference must equal lim_{λ→∞} f(λ) = c. □
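A concrete instance of Theorem B.10 (an added worked example, not from the text): for f(λ) = (λ − 1)/((λ − 2)²(λ + 1)) the poles are β₁ = 2 of order 2 and β₂ = −1 of order 1, and c = lim_{λ→∞} f(λ) = 0. The principal parts can be extracted numerically from the Laurent coefficient formula and recombined to recover f.

```python
import numpy as np

def laurent_coeff(f, w, j, r, m=4000):
    """a_j in the Laurent expansion of f about w, via a discretized contour integral."""
    t = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
    zeta = w + r * np.exp(1j * t)
    dzeta = 1j * (zeta - w) * (2.0 * np.pi / m)
    return np.sum(f(zeta) / (zeta - w) ** (j + 1) * dzeta) / (2j * np.pi)

f = lambda z: (z - 1.0) / ((z - 2.0) ** 2 * (z + 1.0))

# Principal part g_1 at beta_1 = 2 (order 2) and g_2 at beta_2 = -1 (order 1).
a = {j: laurent_coeff(f, 2.0, j, 0.5) for j in (-2, -1)}    # approx. 1/3 and 2/9
b = laurent_coeff(f, -1.0, -1, 0.5)                         # approx. -2/9
g1 = lambda z: a[-2] / (z - 2.0) ** 2 + a[-1] / (z - 2.0)
g2 = lambda z: b / (z + 1.0)

z0 = 0.7 + 0.3j                              # arbitrary test point away from the poles
print(f(z0), g1(z0) + g2(z0))                # agree, since c = 0 for this f
```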
Notation Index Frequently used symbols. A 0 B, 19 A®B,19 At B, 242 A ~ 0,242 At 0, 242 At, 235 A-I, 10 Al/2,266 AO,227 Aij, 104 A[j,k],240 IIAlIs,t, 144 AH,9 AT, 9
£+,388 £_,388 £0,389 e A ,278
j, 370 fog, 372 IF, 3 IFP, 3 IFPxq,4 gV,371 Hf(c),338
1m, 358 B(I,g),447 $, 172
C ( ~I, .•. , i.k ),102 )1, ... ,)k d:),77 C,3 C~xn. 258 C+.465 C(Q),297 C(Q),297 Ck (Q),297 Ck(Q),297 \!, 403 convQ,480 +,69 (Duf), 337 :D,404 detA,92
Ir f()")d)", 361 Jf(x),303 mvf,401
(V' f), 301 NT, 11 pt,375 per A, 504 TI_,383 TI+,383 Q,297 R(f.g),457 Ro,420 JR,3 RT,11
r.,.(A), 149
"',431 S',164 Sf, 111, 128 I: n ,90
S,372 (T(A), 68 Sj, 208
IISlIu,v, 142 (u, v)U, 157 Vol, 162
W(A),489
(x, Y)st, 158
IIxll oo , 139 IIxll., 139 {O, 1, oo}, 56
Subject Index
additive identity, 2 additive inverse, 2, 3 adjoint, 163 algebraic multiplicity, 68 algorithm for computing J, 116 for computing U, 117 for diagonalizing matrices, 73 See also recipe approximate solutions, 213 area, 202 Ascoli-Arzela theorem, 155 associative, 8 Axler, Sheldon, 87 backward shift, 420 Banach space, 155, 167 Barnett identity, 453, 455 basis, 6, 36 Bessel's inequality, 174 best approximation, 237 Bezoutian, 447, 459, 463 Bhatia, R., 206 Binet-Cauchy, 102 binomial formula, 82 binomial theorem, 276 Birkhoff-von Neumann, 506 block multiplication, 9 Borwein, J. M., 183 boundary point, 519 bounded, 167, 515, 518 linear functional, 150 Bram, J., 316, 336 Burg, J. P., 356 Camino, J. F., 206
Caratheodory, C., 480 Carlson, D. H., 446 Cassels, J. W. S., 356 Cauchy sequence, 278, 515, 519, 521 Cauchy-Riemann equations, 332, 358 Cauchy-Schwarz inequality, 136, 158, 203 Cayley-Hamilton theorem, 79, 98, 444 change of basis, 62 characteristic polynomial, 97, 112 circulant, 11 0 closed, 361, 518 under vector addition, 5 coisometric, 267 column vectors, 8 commutative group, 22 commute, 82 compact, 518, 519 complementary space, 69, 171 complete, 519 computing the coefficients in a basis, 51 cone, 4, 481 congruence, 433 congruent, 431 conjugate gradient, 349, 350 connected, 518 conservation of dimension, 36, 38 for matrices, 38 continuous dependence of solutions, 326 contour integral, 361 contour integration, 365 controllable, 403, 404, 408, 444, 446 realization, 406 converge, 515, 516 uniformly, 516 convergence of sums, 516 convex, 350, 469
combination, 470, 506 functions, 471 hull, 480, 491 subset, 490 convolution, 372 cosine, 161 law of, 162 Courant-Fischer, 215 Cramer's rule, 107
Brouwer, 313, 485 contractive, 306 refined contractive, 308 Fourier coefficient, 179 Fourier transform, 370 Fredholm alternative, 41 functional, 89 sublinear, 154 fundamental sequence, 515
de Boor, C. W. R., 494 de Branges space, 419 Degani, Han, 183 dense, 378 detectable, 415 determinant, 92, 202 diagonal, 13 block, 16 diagonalizable, 71, 76, 186 differentiation, 279 dimension, 6, 36 direct, 70, 170 sum, 69 decomposition, 69, 82 directed curve, 361 directional derivative, 337 distributive, 8 domain, 11, 12 Dym, Irene, 207 Dym, Michael, 447 dynamical system, 333, 335 discrete, 276
Gauss' divergence theorem, 314 Gauss-Lucas theorem, 488 Gauss-Seidel, 52 Gaussian elimination, 21 Gaussian quadrature, 180 generic, 378 geometric multiplicity, 67 Gersgorin disk, 438 Gohberg, I., xvi Gohberg-Heinig, 255, 257, 274 Gohberg-Krein, 225 golden mean, 289 gradient, 301, 344, 347, 350 system, 336 Gram-Schmidt, 177, 188 Green function, 296 Gronwall's inequality, 282
eigenvalue, 64, 67, 97, 242, 280 assignment problem, 443 eigenvector, 64, 67, 280 generalized, 67, 84 elementary factors, 424 Ellis, R. S., 356 equivalence of norms, 140 equivalence relation, 433 exponential, 277 extracting a basis, 50 extremal problems dual,354 with constraints, 341 extreme point, 482 local, 338
Hadamard, J., 200, 270 Hahn-Banach Theorem, 152, 355 Halmos, P., 494 Hardy-Littlewood-Polya rearrangement lemma, 507 Hautus test, 425 Heinz, E., 493 Hermitian, 185, 207, 242 matrices commuting, 188 real, 190 transposes, 9 Hessian, 338, 350, 473 Heymann, M., 446 higher order difference equations, 289 Hilbert space, 167 Holder's inequality, 135 holomorphic, 357, 522 Horn, A., 509 hyperplanes, 477
factorization, 241, 247, 248, 262, 263, 421, 424 Faddeev, L. D., 87 Fan, K., 507 Fejer and Riesz, 274 Fibonacci sequence, 289 fixed point, 307 theorem, 313
idempotent, 170 identity, 12 image, 11 implicit function theorem, 316, 324, 326, 329 general, 342 inequalities for singular values, 218 infimum, 517
inner product, 157 space, 157, 160 standard, 158 integration, 279 interior, 519 point, 519 interlace, 464 interlacing, 427 invariant subspaces, 64 inverse, 10, 52 left, 10,40 right, 10, 40 inverse Fourier transform, 371 invertible, 10, 14, 16 left, 10, 11, 32, 40, 49, 50, 227 right, 10, 11, 32, 40, 49, 50, 227 irreducible, 495 isolated zeros, 523 isometric, 186, 267, 282 isometry, 269 isospectral, 283 flows, 282 Jensen's inequality, 471 Jordan cell, 77, 115, 116, 276, 280, 402 Jordan chain, 80, 115 of length k, 67 Kannai, Y., 316 Kantorovich, L. V., 316 kernel,l1 Kharitonov, V. L., 466 Krein, M. G., xvi, 446 Krein-Milman theorem, 484, 494, 506 Kronecker product, 19 Krylov subspace, 349, 351 Lagrange multipliers, 341 LaSalle, J., 336 Laurent expansion, 413, 528 Lax pair, 283 Lebesgue, H., 371 Lefschetz, S., 336 Leray-Schauder Theorem, 316 Lewis, A. S., 183 limit inferior, 517 limit point, 518 limit superior, 517 linear combinations, 5 linear functional, 89, 476 linear quadratic regulator, 398 linear transformation, 12 linearly dependent, 5 linearly independent, 6 Liouville's theorem, 403, 528 local maximum, 338, 339 lower bound, 517
greatest, 517 Lyapunov equation, 387 Lyapunov function, 335, 336 strict, 335, 336 majorize, 510 mapping, 11 matrix algorithm for diagonalizing, 73 algorithm for Jordan J, 116 algorithm for U, 117 augmented, 24 companion, 109, 111, 128, 132, 285, 377, 443,456 complex symmetric, 212 conservation of dimension for, 38 contractive, 268 controllability, 403, 408, 443 Gram, 163 Hamiltonian, 390 Hankel, 179, 180, 183, 388, 444, 449 Hermitian commuting, 188 real, 190 Hessenberg, 205 Hilbert, 180, 346 identity, 7 inequalities, 268 Jacobi,430 Jacobian, 303, 329 multiplication, 7, 8 normal, 195, 196, 491 observability, 404, 410 orthogonal, 23, 190 orthostochastic, 508 permutation, 22 real, Jordan decompositions for, 126 semidefinite, 262 signature, 418 stochastic, 503 doubly, 504, 505 Toeplitz, 178, 179, 248, 388, 412 block,254 unitary, 186 Vandermonde, 110, 464 generalized, 129 zero, 8 maximum entropy completion, 258 maximum modulus principle, 525 McIntosh, A., 494 McMillan degree, 408, 424 mean value theorem, 298 generalized, 298 minimal,408 norm completion, 271 problem, solutions to, 273 polynomial, 79, 98
minimum, 350 local, 338, 339 Minkowski functional, 486, 487 Minkowski's inequality, 136 minor, 104 monotonic, 515 Moore-Penrose inverse, 210, 215, 235, 268, 434,436 multilinear functional, 90 multiplicative, 142, 143 negative definite, 242 negative semidefinite, 242 Newton step, 306 Newton's method, 304 nonhomogeneous differential systems, 285 nonhomogeneous equation, 295 nonnegative, 495 norm, 138 of linear transformations, 142 normal, 168, 197 matrix, 195, 196, 491 normed linear space, 138 nullspace, 11 numerical range, 489-491 observable, 404, 405, 408 pair, 404 realization, 406 Olshevsky, A., 467 Olshevsky, V., 467 open, 378, 518 covering, 518 operator norm, 142, 144 ordinary differential equations, 290 orthogonal, 162, 186, 283 complement, 162,171,172 decomposition, 162 expansions, 174 family, 162 projection, 170, 172-174, 191, 192, 214, 235, 262 sum decomposition, 162 orthonormal family, 162 Ostrowski, A., 446 parallelogram law, 160, 167, 471 Parrott's lemma, 271 Parseval,371 partial fraction expansions, 529 Peller, V., 183 permanent, 504 Perron-Frobenius, 496 piecewise smooth, 361 pivot, 23 columns, 23, 30 variables, 23
Plancherel, M., 371 polar form, 267 pole, 413, 529 polynomial, 329 Popov-Belevich-Hautus test, 425 positive, 495 definite, 242, 244 semidefinite, 242 power series, 521 projection, 170, 191, 230 proper, 401 pseudoinverse, 227 QR factorization, 201 quadrature formulas, 182 quotient spaces, 38 radius of convergence, 521 range, 11 conditions, 55 rank, 39, 40, 49, 50, 108, 195 rational, 401 RC Cola, 2 realization, 403, 452 recipe for solving difference equations, 289 for solving differential equations, 291 reproducing kernel, 416 Hilbert space, 416 residue, 365 resultant, 458 Riccati equation, 390, 419 Riesz projection, 375 Riesz representation, 167 Riesz-Fejer, 253 roots, 329 Rouche's Theorem, 329, 526 Rudin, W., 311 Saaty, T. L., 316, 336 Sakhnovich, L., xvi scalars, 3 multiplication, 2, 5 Schneider, H., 446 Schur complement, 17, 55, 56, 100, 262 Schur product, 19, 508 Schur's theorem, 220 Schur, Issai, 198 Schur-Horn, 513 Schwartz class, 372 second-order difference equation, 286 second-order differential systems, 283 selfadjoint, 168, 170 seminorm, 152 separation theorems, 475 sequence of functions, 516 sequence of points, 515, 516
Sherman-Morrison, 19 significant other, 111 similar, 310 similarity, 62 simple, 90, 361 permutation, 90 singular value, 208, 211, 212, 217, 224 decomposition, 209, 2lO skew-Hermitian, 282 skew-symmetric, 282, 283 small perturbation, 147, 377 smooth, 297, 361 span, 5 spectral mapping principle, 439 spectral mapping theorem, 381 spectral radius, 149, 309, 379 spectrum, 68 square root, 265, 266 stability, 284 stabilizable, 415 stable, 464, 465 standard basis, 22 Stein equation, 385 strictly convex, 351 strictly proper, 401 subspaces, 5 sum, 170 sum decomposition, 69 orthogonal, 162 support hyperplane, 479, 487 supremum, 517 Sylvester equation, 385 Sylvester inertia theorem, 442 Sylvester's law of inertia, 431 symmetric, 185 system of difference equations, 276 Takagi, T., 212, 225 Taylor's formula with remainder, 299, 300 Toeplitz-Hausdorff, 490 transformation, 11 transposes, 9 triangle inequality, 138 triangular, 13, 14 block,16 factorization, 240 lower, 13, 14, 242 lower block, 16 upper, 13, 14, 242 upper block, 16 tridiagonal,430 uniqueness, 281 unitary, 168, 188, 208 matrix, 186 upper bound, 517 least, 517
upper echelon, 23, 46 van der Waerden's conjecture, 514 variation of parameters, 295 vector, 2 addition, 2 is associative, 2 is commutative, 2 row, 8 space, 2 zero, 2,3 volume, 202, 203 Warning, 12, 110, 207, 239, 243, 278, 297, 338, 340, 469 Webster, R., 494 Wiener, Norbert, 253 Wronskian, 294 Young diagram, 79 Young's inequality, 137 zero dimension, 36
Linear algebra permeates mathematics, perhaps more so than any other single subject. It plays an essential role in pure and applied mathematics, statistics, computer science, and many aspects of physics and engineering. This book conveys in a user-friendly way the basic and advanced techniques of linear algebra from the point of view of a working analyst. The techniques are illustrated by a wide sample of applications and examples that are chosen to highlight the tools of the trade. In short, this is material that the author wishes he had been taught as a graduate student. Roughly the first third of the book covers the basic material of a first course in linear algebra. The remaining chapters are devoted to applications drawn from vector calculus, numerical analysis, control theory, complex analysis, convexity and functional analysis. In particular, fixed point theorems, extremal problems, matrix equations, zero location and eigenvalue location problems, and matrices with nonnegative entries are discussed. Appendices on useful facts from analysis and supplementary information from complex function theory are also provided for the convenience of the reader. The book is suitable as a text or supplementary reference for a variety of courses on linear algebra and its applications, as well as for self-study.