Rajendra Bhatia

Matrix Analysis

Graduate Texts in Mathematics 169

Editors: S. Axler, F.W. Gehring, P.R. Halmos

Springer
New York Berlin Heidelberg Barcelona Budapest Hong Kong London Milan Paris Santa Clara Singapore Tokyo
Rajendra Bhatia
Indian Statistical Institute
New Delhi 110 016
India
Editorial Board

S. Axler
Department of Mathematics
Michigan State University
East Lansing, MI 48824
USA

F.W. Gehring
Department of Mathematics
University of Michigan
Ann Arbor, MI 48109
USA

P.R. Halmos
Department of Mathematics
Santa Clara University
Santa Clara, CA 95053
USA
Mathematics Subject Classification (1991): 15-01, 15A16, 15A45, 47A55, 65F15

Library of Congress Cataloging-in-Publication Data

Bhatia, Rajendra, 1952-
  Matrix analysis / Rajendra Bhatia.
    p. cm. -- (Graduate texts in mathematics; 169)
  Includes bibliographical references and index.
  ISBN 0-387-94846-5 (alk. paper)
  1. Matrices. I. Title. II. Series.
  QA188.B485 1996
  512.9'434--dc20          96-32217

Printed on acid-free paper.
© 1997 Springer-Verlag New York, Inc.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Production managed by Victoria Evarretta; manufacturing supervised by Jeffrey Taub.
Photocomposed pages prepared from the author's LaTeX files.
Printed and bound by Maple-Vail Book Manufacturing Group, York, PA.
Printed in the United States of America.

9 8 7 6 5 4 3 2 1

ISBN 0-387-94846-5 Springer-Verlag New York Berlin Heidelberg
SPIN 10524488
Preface
A good part of matrix theory is functional analytic in spirit. This statement can be turned around. There are many problems in operator theory, where most of the complexities and subtleties are present in the finite-dimensional case. My purpose in writing this book is to present a systematic treatment of methods that are useful in the study of such problems.

This book is intended for use as a text for upper division and graduate courses. Courses based on parts of the material have been given by me at the Indian Statistical Institute and at the University of Toronto (in collaboration with Chandler Davis). The book should also be useful as a reference for research workers in linear algebra, operator theory, mathematical physics and numerical analysis.

A possible subtitle of this book could be Matrix Inequalities. A reader who works through the book should expect to become proficient in the art of deriving such inequalities. Other authors have compared this art to that of cutting diamonds. One first has to acquire hard tools and then learn how to use them delicately.

The reader is expected to be very thoroughly familiar with basic linear algebra. The standard texts Finite-Dimensional Vector Spaces by P.R. Halmos and Linear Algebra by K. Hoffman and R. Kunze provide adequate preparation for this. In addition, a basic knowledge of functional analysis, complex analysis and differential geometry is necessary. The usual first courses in these subjects cover all that is used in this book.

The book is divided, conceptually, into three parts. The first five chapters contain topics that are basic to much of the subject. (Of these, Chapter 5 is more advanced and also more special.) Chapters 6 to 8 are devoted to
perturbation of spectra, a topic of much importance in numerical analysis, physics and engineering. The last two chapters contain inequalities and perturbation bounds for other matrix functions. These too have been of broad interest in several areas.

In Chapter 1, I have given a very brief and rapid review of some basic topics. The aim is not to provide a crash course but to remind the reader of some important ideas and theorems and to set up the notations that are used in the rest of the book. The emphasis, the viewpoint, and some proofs may be different from what the reader has seen earlier. Special attention is given to multilinear algebra; and inequalities for matrices and matrix functions are introduced rather early. After the first chapter, the exposition proceeds at a much more leisurely pace. The contents of each chapter have been summarised in its first paragraph.

The book can be used for a variety of graduate courses. Chapters 1 to 4 should be included in any course on Matrix Analysis. After this, if perturbation theory of spectra is to be emphasized, the instructor can go on to Chapters 6, 7 and 8. With a judicious choice of topics from these chapters, she can design a one-semester course. For example, Chapters 7 and 8 are independent of each other, as are the different sections in Chapter 8. Alternately, a one-semester course could include much of Chapters 1 to 5, Chapter 9, and the first part of Chapter 10. All topics could be covered comfortably in a two-semester course. The book can also be used to supplement courses on operator theory, operator algebras and numerical linear algebra.

The book has several exercises scattered in the text and a section called Problems at the end of each chapter. An exercise is placed at a particular spot with the idea that the reader should do it at that stage of his reading and then proceed further. Problems, on the other hand, are designed to serve different purposes. Some of them are supplementary exercises, while others are about themes that are related to the main development in the text. Some are quite easy while others are hard enough to be contents of research papers. From Chapter 6 onwards, I have also used the problems for another purpose. There are results, or proofs, which are a bit too special to be placed in the main text. At the same time they are interesting enough to merit the attention of anyone working, or planning to work, in this area. I have stated such results as parts of the Problems section, often with hints about their solutions. This should enhance the value of the book as a reference, and provide topics for a seminar course as well. The reader should not be discouraged if he finds some of these problems difficult. At a few places I have drawn attention to some unsolved research problems. At some others, the existence of such problems can be inferred from the text. I hope the book will encourage some readers to solve these problems too.

While most of the notations used are the standard ones, some need a little explanation:

Almost all functional analysis books written by mathematicians adopt the convention that an inner product (u, v) is linear in the variable u and
conjugate-linear in the variable v. Physicists and numerical analysts adopt the opposite convention, and different notations as well. There would be no special reason to prefer one over the other, except that certain calculations and manipulations become much simpler in the latter notation. If u and v are column vectors, then u*v is the product of a row vector and a column vector, hence a number. This is the inner product of u and v. Combined with the usual rules of matrix multiplication, this facilitates computations. For this reason, I have chosen the second convention about inner products, with the belief that the initial discomfort this causes some readers will be offset by the eventual advantages. (Dirac's bra and ket notation, used by physicists, is different typographically but has the same idea behind it.)

The k-fold tensor power of an operator is represented in this book as ⊗^k A, the antisymmetric and the symmetric tensor powers as ∧^k A and ∨^k A, respectively. This helps in thinking of these objects as maps, A ↦ ⊗^k A, etc. We often study the variational behaviour of, and perturbation bounds for, functions of operators. In such contexts, this notation is natural.

Very often we have to compare two n-tuples of numbers after rearranging them. For this I have used a pictorial notation that makes it easy to remember the order that has been chosen. If x = (x_1, ..., x_n) is a vector with real coordinates, then x↓ and x↑ are vectors whose coordinates are obtained by rearranging the numbers x_j in decreasing order and in increasing order, respectively. We write x↓ = (x↓_1, ..., x↓_n) and x↑ = (x↑_1, ..., x↑_n), where x↓_1 ≥ ··· ≥ x↓_n and x↑_1 ≤ ··· ≤ x↑_n.

The symbol ||| · ||| stands for a unitarily invariant norm on matrices: one that satisfies the equality |||UAV||| = |||A||| for all A and for all unitary U, V. A statement like |||A||| ≤ |||B||| means that, for the matrices A and B, this inequality is true simultaneously for all unitarily invariant norms. The supremum norm of A, as an operator on the space C^n, is always written as ||A||. Other norms carry special subscripts. For example, the Frobenius norm, or the Hilbert-Schmidt norm, is written as ||A||_2. (This should be noted by numerical analysts, who often use the symbol ||A||_2 for what we call ||A||.)

A few symbols have different meanings in different contexts. The reader's attention is drawn to three such symbols. If x is a complex number, |x| denotes the absolute value of x. If x is an n-vector with coordinates (x_1, ..., x_n), then |x| is the vector (|x_1|, ..., |x_n|). For a matrix A, the symbol |A| stands for the positive semidefinite matrix (A*A)^{1/2}. If J is a finite set, |J| denotes the number of elements of J. A permutation on n indices is often denoted by the symbol σ. In this case, σ(j) is the image of the index j under the map σ. For a matrix A, σ(A) represents the spectrum of A.

The trace of a matrix A is written as tr A. In analogy, if x = (x_1, ..., x_n) is a vector, we write tr x for the sum Σ x_j.

The words matrix and operator are used interchangeably in the book. When a statement about an operator is purely finite-dimensional in content,
I use the word matrix. If a statement is true also in infinite-dimensional spaces, possibly with a small modification, I use either the word matrix or the word operator. Many of the theorems in this book have extensions to infinite-dimensional spaces.

Several colleagues have contributed to this book, directly and indirectly. I am thankful to all of them. T. Ando, J.S. Aujla, R.B. Bapat, A. Ben-Israel, I. Ionascu, A.K. Lal, R.-C. Li, S.K. Narayan, D. Petz and P. Rosenthal read parts of the manuscript and brought several errors to my attention. Fumio Hiai read the whole book with his characteristic meticulous attention and helped me eliminate many mistakes and obscurities. Long-time friends and co-workers M.D. Choi, L. Elsner, J.A.R. Holbrook, R. Horn, F. Kittaneh, A. McIntosh, K. Mukherjea, K.R. Parthasarathy, P. Rosenthal and K.B. Sinha have generously shared with me their ideas and insights. These ideas, collected over the years, have influenced my writing. I owe a special debt to T. Ando. I first learnt some of the topics presented here from his Hokkaido University lecture notes. I have also learnt much from discussions and correspondence with him. I have taken a lot from his notes while writing this book.

The idea of writing this book came from Chandler Davis in 1986. Various logistic difficulties forced us to abandon our original plans of writing it together. The book is certainly the poorer for it. Chandler, however, has contributed so much to my mathematics, to my life, and to this project, that this is as much his book as it is mine.

I am thankful to the Indian Statistical Institute, whose facilities have made it possible to write this book. I am also thankful to the Department of Mathematics of the University of Toronto and to NSERC Canada, for several visits that helped this project take shape. It is a pleasure to thank V.P. Sharma for his LaTeX typing, done with competence and with good cheer, and the staff at Springer-Verlag for their help and support. My most valuable resource while writing has been the unstinting and ungrudging support from my son Gautam and wife Irpinder. Without that, this project might have been postponed indefinitely.

Rajendra Bhatia
Contents

Preface

I     A Review of Linear Algebra
      I.1   Vector Spaces and Inner Product Spaces
      I.2   Linear Operators and Matrices
      I.3   Direct Sums
      I.4   Tensor Products
      I.5   Symmetry Classes
      I.6   Problems
      I.7   Notes and References

II    Majorisation and Doubly Stochastic Matrices
      II.1  Basic Notions
      II.2  Birkhoff's Theorem
      II.3  Convex and Monotone Functions
      II.4  Binary Algebraic Operations and Majorisation
      II.5  Problems
      II.6  Notes and References

III   Variational Principles for Eigenvalues
      III.1 The Minimax Principle for Eigenvalues
      III.2 Weyl's Inequalities
      III.3 Wielandt's Minimax Principle
      III.4 Lidskii's Theorems
      III.5 Eigenvalues of Real Parts and Singular Values
      III.6 Problems
      III.7 Notes and References

IV    Symmetric Norms
      IV.1  Norms on C^n
      IV.2  Unitarily Invariant Norms on Operators on C^n
      IV.3  Lidskii's Theorem (Third Proof)
      IV.4  Weakly Unitarily Invariant Norms
      IV.5  Problems
      IV.6  Notes and References

V     Operator Monotone and Operator Convex Functions
      V.1   Definitions and Simple Examples
      V.2   Some Characterisations
      V.3   Smoothness Properties
      V.4   Loewner's Theorems
      V.5   Problems
      V.6   Notes and References

VI    Spectral Variation of Normal Matrices
      VI.1  Continuity of Roots of Polynomials
      VI.2  Hermitian and Skew-Hermitian Matrices
      VI.3  Estimates in the Operator Norm
      VI.4  Estimates in the Frobenius Norm
      VI.5  Geometry and Spectral Variation: the Operator Norm
      VI.6  Geometry and Spectral Variation: wui Norms
      VI.7  Some Inequalities for the Determinant
      VI.8  Problems
      VI.9  Notes and References

VII   Perturbation of Spectral Subspaces of Normal Matrices
      VII.1 Pairs of Subspaces
      VII.2 The Equation AX - XB = Y
      VII.3 Perturbation of Eigenspaces
      VII.4 A Perturbation Bound for Eigenvalues
      VII.5 Perturbation of the Polar Factors
      VII.6 Appendix: Evaluating the (Fourier) Constants
      VII.7 Problems
      VII.8 Notes and References

VIII  Spectral Variation of Nonnormal Matrices
      VIII.1 General Spectral Variation Bounds
      VIII.4 Matrices with Real Eigenvalues
      VIII.5 Eigenvalues with Symmetries
      VIII.6 Problems
      VIII.7 Notes and References

IX    A Selection of Matrix Inequalities
      IX.1  Some Basic Lemmas
      IX.2  Products of Positive Matrices
      IX.3  Inequalities for the Exponential Function
      IX.4  Arithmetic-Geometric Mean Inequalities
      IX.5  Schwarz Inequalities
      IX.6  The Lieb Concavity Theorem
      IX.7  Operator Approximation
      IX.8  Problems
      IX.9  Notes and References

X     Perturbation of Matrix Functions
      X.1   Operator Monotone Functions
      X.2   The Absolute Value
      X.3   Local Perturbation Bounds
      X.4   Appendix: Differential Calculus
      X.5   Problems
      X.6   Notes and References

References

Index
I
A Review of Linear Algebra

In this chapter we review, at a brisk pace, the basic concepts of linear and multilinear algebra. Most of the material will be familiar to a reader who has had a standard Linear Algebra course, so it is presented quickly with no proofs. Some topics, like tensor products, might be less familiar. These are treated here in somewhat greater detail. A few of the topics are quite advanced and their presentation is new.

I.1 Vector Spaces and Inner Product Spaces
Throughout this book we will consider finite-dimensional vector spaces over the field C of complex numbers. Such spaces will be denoted by symbols V, W, V_1, V_2, etc. Vectors will, most often, be represented by symbols u, v, w, x, etc., and scalars by a, b, s, t, etc. The symbol n, when not explained, will always mean the dimension of the vector space under consideration.

Most often, our vector space will be an inner product space. The inner product between the vectors u, v will be denoted by (u, v). We will adopt the convention that this is conjugate-linear in the first variable u and linear in the second variable v. We will always assume that the inner product is definite; i.e., (u, u) = 0 if and only if u = 0. A vector space with such an inner product is then a finite-dimensional Hilbert space. Spaces of this type will be denoted by symbols H, K, etc. The norm arising from the inner product will be denoted by ||u||; i.e., ||u|| = (u, u)^{1/2}.

As usual, it will sometimes be convenient to deal with the standard Hilbert space C^n. Elements of this vector space are column vectors with n coordinates. In this case, the inner product (u, v) is the matrix product u*v obtained by multiplying the column vector v on the left by the row vector u*. The symbol * denotes the conjugate transpose for matrices of any size. The notation u*v for the inner product is sometimes convenient even when the Hilbert space is not C^n.

The distinction between column vectors and row vectors is important in manipulations involving products. For example, if we write elements of C^n as column vectors, then u*v is a number, but uv* is an n × n matrix (sometimes called the "outer product" of u and v). However, it is typographically inconvenient to write column vectors. So, when the context does not demand this distinction, we may write a vector x with scalar coordinates x_1, ..., x_n simply as (x_1, ..., x_n). This will often be done in later chapters. For the present, however, we will maintain the distinction between row and column vectors.

Occasionally our Hilbert spaces will be real, but we will use the same notation for them as for the complex ones. Many of our results will be true for infinite-dimensional Hilbert spaces, with appropriate modifications at times. We will mention this only in passing.

Let X = (x_1, ..., x_k) be a k-tuple of vectors. If these are column vectors, then X is an n × k matrix. This notation suggests matrix manipulations with X that are helpful even in the general case. For example, let X = (x_1, ..., x_k) be a linearly independent k-tuple. We say that a k-tuple Y = (y_1, ..., y_k) is biorthogonal to X if (y_i, x_j) = δ_ij. This condition is expressed in matrix terms as Y*X = I_k, the k × k identity matrix.
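The biorthogonality condition Y*X = I_k can be checked numerically. The sketch below is my own illustration, not from the text: for k = n and invertible X, the condition Y*X = I forces Y = (X^{-1})*. The example uses a real 2×2 basis, so the conjugate transpose is just the transpose.

```python
# Hypothetical check: Y = (X^{-1})* is biorthogonal to the basis X.
X = [[1.0, 1.0],
     [0.0, 1.0]]          # columns x_1 = (1, 0), x_2 = (1, 1)

# Inverse of the 2x2 matrix X.
det = X[0][0]*X[1][1] - X[0][1]*X[1][0]
Xinv = [[ X[1][1]/det, -X[0][1]/det],
        [-X[1][0]/det,  X[0][0]/det]]

# Y = (X^{-1})^T, so that Y^T X = X^{-1} X = I.
Y = [[Xinv[j][i] for j in range(2)] for i in range(2)]

# Biorthogonality: (y_i, x_j) = delta_ij for columns y_i of Y, x_j of X.
for i in range(2):
    for j in range(2):
        ip = sum(Y[r][i] * X[r][j] for r in range(2))
        assert abs(ip - (1.0 if i == j else 0.0)) < 1e-12
print("Y is biorthogonal to X")
```

Here y_i is the i-th row of X^{-1} written as a column, and (y_i, x_j) is exactly the (i, j) entry of X^{-1}X.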
Exercise I.1.1 Given any k-tuple of linearly independent vectors X as above, there exists a k-tuple Y biorthogonal to it. If k = n, this Y is unique.

The Gram-Schmidt procedure, in this notation, can be interpreted as a matrix factoring theorem. Given an n-tuple X = (x_1, ..., x_n) of linearly independent vectors, the procedure gives another n-tuple Q = (q_1, ..., q_n) whose entries are orthonormal vectors. For each k = 1, 2, ..., n, the vectors {x_1, ..., x_k} and {q_1, ..., q_k} have the same linear span. In matrix notation this can be expressed as an equation, X = QR, where R is an upper triangular matrix. The matrix R may be chosen so that all its diagonal entries are positive. With this restriction the factors Q and R are both unique.

If the vectors x_j are not linearly independent, this procedure can be modified. If the vector x_k is linearly dependent on x_1, ..., x_{k-1}, set q_k = 0; otherwise proceed as in the Gram-Schmidt process. If the kth column of the matrix Q so constructed is zero, put the kth row of R to be zero. Now we have a factorisation X = QR, where R is upper triangular and Q has orthogonal columns, some of which are zero. Take the nonzero columns of Q and extend this set to an orthonormal basis. Then, replace the zero columns of Q by these additional basis vectors. The new matrix Q now has orthonormal columns, and we still have X = QR, because the new columns of Q are matched with zero rows of R. This is called the QR decomposition.

Similarly, a change of orthogonal bases can be conveniently expressed in these notations as follows. Let X = (x_1, ..., x_k) be any k-tuple of vectors and E = (e_1, ..., e_n) any orthonormal basis. Then, the columns of the matrix E*X are the representations of the vectors comprising X, relative to the basis E. When k = n and X is an orthonormal basis, then E*X is a unitary matrix. Furthermore, this is the matrix by which we pass between coordinates of any vector relative to the basis E and those relative to the basis X. Indeed, if
u = Ea = Xb,

then we have

    a_j = e_j* u,   b_j = x_j* u;   i.e.,   a = E*u,   b = X*u.

Hence, a = E*Xb and b = X*Ea.
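The Gram-Schmidt construction of the QR factorisation described above can be sketched numerically. The helper below is my own illustration (plain classical Gram-Schmidt for linearly independent columns, with the positive-diagonal normalisation), not code from the text.

```python
import math

def qr_gram_schmidt(cols):
    """cols: list of linearly independent column vectors (lists of floats).
    Returns (q_cols, R) with orthonormal q_cols and upper triangular R,
    R having positive diagonal, so that X = QR column by column."""
    n, k = len(cols[0]), len(cols)
    q_cols, R = [], [[0.0] * k for _ in range(k)]
    for j in range(k):
        v = list(cols[j])
        for i in range(j):
            R[i][j] = sum(q_cols[i][r] * cols[j][r] for r in range(n))  # (q_i, x_j)
            v = [v[r] - R[i][j] * q_cols[i][r] for r in range(n)]
        R[j][j] = math.sqrt(sum(t * t for t in v))   # positive for independent cols
        q_cols.append([t / R[j][j] for t in v])
    return q_cols, R

# x_1 = (1, 1, 0), x_2 = (1, 0, 1): two independent columns in R^3.
q_cols, R = qr_gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
print(R)   # upper triangular with positive diagonal
```

Each q_j is the j-th column orthogonalised against the earlier q_i and normalised; the inner products (q_i, x_j) land in R exactly as in the matrix equation X = QR.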
Exercise I.1.2 Let X be any basis of H and let Y be the basis biorthogonal to it. Using matrix multiplication, X gives a linear transformation from C^n to H. The inverse of this is given by Y*. In the special case when X is orthonormal (so that Y = X), this transformation is inner-product-preserving if the standard inner product is used on C^n.

Exercise I.1.3 Use the QR decomposition to prove Hadamard's inequality: if X = (x_1, ..., x_n), then

    |det X| ≤ ∏_{j=1}^{n} ||x_j||.

Equality holds here if and only if either the x_j are mutually orthogonal or some x_j is zero.
I.2 Linear Operators and Matrices

Let L(V, W) be the space of all linear operators from a vector space V to a vector space W. If bases for V, W are fixed, each such operator has a unique matrix associated with it. As usual, we will talk of operators and matrices interchangeably.

For operators between Hilbert spaces, the matrix representations are especially nice if the bases chosen are orthonormal. Let A ∈ L(H, K), and let E = (e_1, ..., e_n) be an orthonormal basis of H and F = (f_1, ..., f_m) an orthonormal basis of K. Then, the (i, j)-entry of the matrix of A relative to these bases is a_ij = f_i* A e_j = (f_i, A e_j). This suggests that we may say that the matrix of A relative to these bases is F*AE.

In this notation, composition of linear operators can be identified with matrix multiplication as follows. Let M be a third Hilbert space with orthonormal basis G = (g_1, ..., g_p). Let B ∈ L(K, M). Then
    matrix of (B · A) = G*(B · A)E = G*B FF* AE = (G*BF)(F*AE) = (matrix of B)(matrix of A).
The second step in the above chain is justified by Exercise I.1.2.

The adjoint of an operator A ∈ L(H, K) is the unique operator A* in L(K, H) that satisfies the relation

    (z, Ax)_K = (A*z, x)_H   for all x ∈ H and z ∈ K.
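With the convention (u, v) = u*v, the defining relation of the adjoint can be verified directly for a concrete matrix. The complex 2×2 example below is my own.

```python
# Check (z, Ax) = (A*z, x) with (u, v) = u* v (conjugate-linear in the first slot).
def inner(u, v):
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

A = [[1 + 2j, 3j],
     [2,      1 - 1j]]
# A* is the conjugate transpose of A.
A_star = [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

x, z = [1j, 2], [3, 1 - 1j]
lhs = inner(z, matvec(A, x))
rhs = inner(matvec(A_star, z), x)
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)
```

The same computation, with the roles of matrix and conjugate transpose swapped, confirms Exercise I.2.1 below: for fixed orthonormal bases, the matrix of A* is the conjugate transpose of the matrix of A.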
Exercise I.2.1 For fixed bases in H and K, the matrix of A* is the conjugate transpose of the matrix of A.

For the space L(H, H) we use the more compact notation L(H). In the rest of this section, and elsewhere in the book, if no qualification is made, an operator would mean an element of L(H).

An operator A is called self-adjoint or Hermitian if A = A*, skew-Hermitian if A = -A*, unitary if AA* = I = A*A, and normal if AA* = A*A.

A Hermitian operator A is said to be positive or positive semidefinite if (x, Ax) ≥ 0 for all x ∈ H. The notation A ≥ 0 will be used to express the fact that A is a positive operator. If (x, Ax) > 0 for all nonzero x, we will say A is positive definite, or strictly positive. We will then write A > 0. A positive operator is strictly positive if and only if it is invertible. If A and B are Hermitian, then we say A ≥ B if A - B ≥ 0.

Given any operator A we can find an orthonormal basis y_1, ..., y_n such that for each k = 1, 2, ..., n, the vector Ay_k is a linear combination of y_1, ..., y_k. This can be proved by induction on the dimension n of H. Let λ_1 be any eigenvalue of A, y_1 an eigenvector corresponding to λ_1, and M the 1-dimensional subspace spanned by it. Let N be the orthogonal complement of M, and let P_N denote the orthogonal projection on N. For y ∈ N, let A_N y = P_N Ay. Then, A_N is a linear operator on the (n - 1)-dimensional space N. So, by the induction hypothesis, there exists an orthonormal basis y_2, ..., y_n of N such that for k = 2, ..., n the vector A_N y_k is a linear combination of y_2, ..., y_k. Now y_1, ..., y_n is an orthonormal basis for H, and each Ay_k is a linear combination of y_1, ..., y_k for k = 1, 2, ..., n. Thus, the matrix of A with respect to this basis is upper triangular. In other words,
every matrix A is unitarily equivalent (or unitarily similar) to an upper triangular matrix T; i.e., A = QTQ*, where Q is unitary and T is upper triangular. This triangular matrix is called a Schur Triangular Form for A. An orthonormal basis with respect to which A is upper triangular is called a Schur basis for A.

If A is normal, then T is diagonal, and we have Q*AQ = D, where D is a diagonal matrix whose diagonal entries are the eigenvalues of A. This is the Spectral Theorem for normal matrices.

The Spectral Theorem makes it easy to define functions of normal matrices. If f is any complex function, and if D is a diagonal matrix with λ_1, ..., λ_n on its diagonal, then f(D) is the diagonal matrix with f(λ_1), ..., f(λ_n) on its diagonal. If A = QDQ*, then f(A) = Qf(D)Q*. A special consequence, used very often, is the fact that every positive operator A has a unique positive square root. This square root will be written as A^{1/2}.
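The recipe f(A) = Qf(D)Q* can be traced in a small computation. The sketch below is my own 2×2 construction (not from the text): a real symmetric positive matrix is diagonalised by a closed-form rotation, f(t) = √t is applied to the eigenvalues, and we verify that the result squares back to A.

```python
import math

A = [[2.0, 1.0],
     [1.0, 2.0]]   # symmetric, positive (eigenvalues 3 and 1)

# Diagonalise the symmetric 2x2 matrix by a rotation Q through angle theta.
theta = 0.5 * math.atan2(2.0 * A[0][1], A[0][0] - A[1][1])
c, s = math.cos(theta), math.sin(theta)
Q = [[c, -s], [s, c]]
# Eigenvalues: the diagonal of Q^T A Q.
lam1 = c*c*A[0][0] + 2*c*s*A[0][1] + s*s*A[1][1]
lam2 = s*s*A[0][0] - 2*c*s*A[0][1] + c*c*A[1][1]

# f(A) = Q diag(sqrt(lam1), sqrt(lam2)) Q^T  -- the positive square root A^(1/2).
f1, f2 = math.sqrt(lam1), math.sqrt(lam2)
root = [[f1*Q[i][0]*Q[j][0] + f2*Q[i][1]*Q[j][1] for j in range(2)] for i in range(2)]

# Verify (A^(1/2))^2 = A.
sq = [[sum(root[i][k]*root[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(sq)
```

Because the square root of each eigenvalue is taken with the positive sign, the result is itself a positive matrix, illustrating the uniqueness statement for A^{1/2}.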
Exercise I.2.2 Show that the following statements are equivalent:
(i) A is positive.
(ii) A = B*B for some B.
(iii) A = T*T for some upper triangular T.
(iv) A = T*T for some upper triangular T with nonnegative diagonal entries.

If A is positive definite, then the factorisation in (iv) is unique. This is called the Cholesky Decomposition of A.
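The factorisation in (iv) can be computed by the standard elimination recurrence. The sketch below is my own illustration for a real positive definite matrix (so T* is simply the transpose); it builds the upper triangular T row by row.

```python
import math

def cholesky_upper(A):
    """Return upper triangular T with T^T T = A and positive diagonal,
    for a real positive definite matrix A."""
    n = len(A)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = sum(T[k][i] * T[k][j] for k in range(i))
            if i == j:
                T[i][i] = math.sqrt(A[i][i] - s)   # > 0 for positive definite A
            else:
                T[i][j] = (A[i][j] - s) / T[i][i]
    return T

A = [[4.0, 2.0],
     [2.0, 3.0]]   # positive definite (my own example)
T = cholesky_upper(A)
print(T)   # [[2.0, 1.0], [0.0, sqrt(2)]]
```

Uniqueness is visible in the recurrence: each entry of T is forced once the earlier rows are known, provided the diagonal is taken positive.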
Exercise I.2.3 (i) Let {A_α} be a family of mutually commuting operators. Then, there is a common Schur basis for {A_α}. In other words, there exists a unitary Q such that Q*A_αQ is upper triangular for all α.
(ii) Let {A_α} be a family of mutually commuting normal operators. Then, there exists a unitary Q such that Q*A_αQ is diagonal for all α.
·
·
·
Q* (A* A)Q =
( �i �).
I.
6
A Review of
Note that
Linear Algebra
Qi ( *A )A Ql
S�,
==
; Q (*A
A)Q2
==
0
.
The second of these relations implies that Q2 A 0. From the first one we can conclude that if WI AQlS+ 1' then WiWl Ik · Choose w2 so that W (WI, 2 W ) is unitary. Then, we have ==
==
==
==
W *AQ
==
W AQl w � i ( w; AQl w:
A2 Q 2 AQ 2
) ( ==
s0+
0o
).
This is the Singular Value Decomposition: for every matrix A there exist unitaries W and Q such that

\[ W^*AQ = S, \]

where S is the diagonal matrix whose diagonal entries are the singular values of A. Note that in the above representation the columns of Q are eigenvectors of A*A and the columns of W are eigenvectors of AA* corresponding to the eigenvalues s_j^2(A), 1 ≤ j ≤ n. These eigenvectors are called the right and left singular vectors of A, respectively.

Exercise I.2.4 (i) The Singular Value Decomposition leads to the Polar Decomposition: every operator A can be written as A = UP, where U is unitary and P is positive. In this decomposition the positive part P is unique, P = |A|. The unitary part U is unique if A is invertible.
(ii) An operator A is normal if and only if the factors U and P in the polar decomposition of A commute.
(iii) We have derived the Polar Decomposition from the Singular Value Decomposition. Show that it is possible to derive the latter from the former.
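The passage from the Singular Value Decomposition to the Polar Decomposition can be sketched numerically. Writing A = WSQ* gives U = WQ* and P = QSQ* (a NumPy illustration under that assumption, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# SVD: A = W S Q*. Then U = W Q* is unitary and P = Q S Q* is positive,
# giving the polar decomposition A = U P with P = |A|.
W, s, Qh = np.linalg.svd(A)
U = W @ Qh
P = Qh.conj().T @ np.diag(s) @ Qh

assert np.allclose(U @ P, A)                      # A = U P
assert np.allclose(U.conj().T @ U, np.eye(3))     # U is unitary
assert np.all(np.linalg.eigvalsh(P) >= -1e-12)    # P is positive
```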
Every operator A can be decomposed as a sum

\[ A = \mathrm{Re}\,A + i\,\mathrm{Im}\,A, \]

where Re A = (A + A*)/2 and Im A = (A − A*)/2i. This is called the Cartesian Decomposition of A into its "real" and "imaginary" parts. The operators Re A and Im A are both Hermitian.

The norm of an operator A is defined as
\[ \|A\| = \sup_{\|x\|=1} \|Ax\|. \]

We also have

\[ \|A\| = \sup_{\|x\|=\|y\|=1} |(y, Ax)|. \]

When A is Hermitian we have

\[ \|A\| = \sup_{\|x\|=1} |(x, Ax)|. \]
When A is normal we have

\[ \|A\| = \max\{|\lambda_j| : \lambda_j \text{ is an eigenvalue of } A\}. \]

An operator A is said to be a contraction if ‖A‖ ≤ 1. We also use the adjective contractive for such an operator. A positive operator A is contractive if and only if A ≤ I.

To distinguish it from other norms that we consider later, the norm ‖A‖ will be called the operator norm or the bound norm of A. Another useful norm is the norm
\[ \|A\|_2 = \Big( \sum_{j=1}^n s_j^2(A) \Big)^{1/2} = (\mathrm{tr}\, A^*A)^{1/2}, \]

where tr stands for the trace of an operator. If a_{ij} are the entries of a matrix representation of A relative to an orthonormal basis of H, then

\[ \|A\|_2 = \Big( \sum_{i,j} |a_{ij}|^2 \Big)^{1/2}. \]

This makes this norm useful in calculations with matrices. It is called the Frobenius norm or the Schatten 2-norm or the Hilbert–Schmidt norm. Both ‖A‖ and ‖A‖_2 have an important invariance property called unitary invariance: we have ‖A‖ = ‖UAV‖ and ‖A‖_2 = ‖UAV‖_2 for all unitary U, V.

Any two norms on a finite-dimensional space are equivalent. For the norms ‖A‖ and ‖A‖_2 it follows from the properties listed above that
\[ \|A\| \le \|A\|_2 \le n^{1/2}\,\|A\| \quad \text{for every } A. \]
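The two characterisations of ‖A‖_2 and the comparison with the operator norm can be checked directly (a NumPy sketch, which is an assumption of this illustration and not part of the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))

op = np.linalg.norm(A, 2)        # operator norm = largest singular value
fro = np.linalg.norm(A, 'fro')   # Frobenius norm = (sum |a_ij|^2)^(1/2)

# ||A||_2 agrees with (tr A*A)^(1/2) and with the singular value formula
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(fro, np.sqrt(np.trace(A.conj().T @ A)))
assert np.isclose(fro, np.sqrt(np.sum(s**2)))

# ||A|| <= ||A||_2 <= n^(1/2) ||A||
assert op <= fro + 1e-12 and fro <= np.sqrt(n) * op + 1e-12
```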
Exercise I.2.5 Show that matrices with distinct eigenvalues are dense in the space of all n × n matrices. (Use the Schur Triangularisation.)

Exercise I.2.6 If ‖A‖ < 1, then I − A is invertible, and

\[ (I - A)^{-1} = I + A + A^2 + \cdots, \]

a convergent power series. This is called the Neumann Series.
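The convergence of the Neumann Series is easy to watch numerically (a NumPy sketch under that assumption, not part of the text): the partial sums of I + A + A² + ··· approach (I − A)^{-1} when ‖A‖ < 1.

```python
import numpy as np

A = np.array([[0.2, 0.3], [0.1, 0.4]])
assert np.linalg.norm(A, 2) < 1   # ||A|| < 1, so the series converges

# Partial sums I + A + A^2 + ... approach (I - A)^{-1}
S, term = np.eye(2), np.eye(2)
for _ in range(100):
    term = term @ A
    S += term

assert np.allclose(S, np.linalg.inv(np.eye(2) - A))
```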
Exercise I.2.7 The set of all invertible matrices is a dense open subset of the set of all n × n matrices. The set of all unitary matrices is a compact subset of the set of all n × n matrices. These two sets are also groups under multiplication. They are called the general linear group GL(n) and the unitary group U(n), respectively.
Exercise I.2.8 For any matrix A the series

\[ \exp A = \sum_{k=0}^{\infty} \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \cdots \]

converges. This is called the exponential of A. The matrix exp A is always invertible and (exp A)^{-1} = exp(−A). Conversely, every invertible matrix can be expressed as the exponential of some matrix. Every unitary matrix can be expressed as the exponential of a skew-Hermitian matrix.
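The assertions of the exercise can be tested with the partial sums of the series (a NumPy sketch, assumed here and not part of the text): for a skew-Hermitian A, the computed exp A satisfies (exp A)^{-1} = exp(−A) and is unitary.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-Hermitian

def expm_series(M, terms=60):
    """Partial sum of the series sum_k M^k / k!."""
    S, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        S += term
    return S

E = expm_series(A)
# (exp A)^{-1} = exp(-A), and exp of a skew-Hermitian matrix is unitary
assert np.allclose(E @ expm_series(-A), np.eye(2))
assert np.allclose(E.conj().T @ E, np.eye(2))
```

(SciPy users would reach for scipy.linalg.expm instead; the truncated series keeps the sketch self-contained.)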
The numerical range or the field of values of an operator A is the subset W(A) of the complex plane defined as

\[ W(A) = \{(x, Ax) : \|x\| = 1\}. \]

Note that

W(UAU*) = W(A) for all U ∈ U(n),
W(aA + bI) = aW(A) + b for all a, b ∈ C.
It is clear that if λ is an eigenvalue of A, then λ is in W(A). It is also clear that W(A) is a closed set. An important property of W(A) is that it is a convex set. This is called the Toeplitz–Hausdorff Theorem; an outline of its proof is given in Problem I.6.2.

Exercise I.2.9 (i) When A is normal, the set W(A) is the convex hull of the eigenvalues of A. For nonnormal matrices, W(A) may be bigger than the convex hull of its eigenvalues. For Hermitian operators, the first statement says that W(A) is the closed interval whose endpoints are the smallest and the largest eigenvalues of A.
(ii) If a unit vector x belongs to the linear span of the eigenspaces corresponding to eigenvalues λ_1, ..., λ_k of a normal operator A, then (x, Ax) lies in the convex hull of λ_1, ..., λ_k. (This fact will be used frequently in Chapter III.)

The number w(A) defined as

\[ w(A) = \sup_{\|x\|=1} |(x, Ax)| \]

is called the numerical radius of A.

Exercise I.2.10 (i) The numerical radius defines a norm on L(H).
(ii) w(UAU*) = w(A) for all U ∈ U(n).
(iii) w(A) ≤ ‖A‖ ≤ 2w(A) for all A.
(iv) w(A) = ‖A‖ if (but not only if) A is normal.
The spectral radius of an operator A is defined as

\[ \mathrm{spr}(A) = \max\{|\lambda| : \lambda \text{ is an eigenvalue of } A\}. \]

We have noted that spr(A) ≤ w(A) ≤ ‖A‖, and that the three are equal if (but not only if) the operator A is normal.
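The chain spr(A) ≤ w(A) ≤ ‖A‖ ≤ 2w(A), and its strictness for nonnormal operators, can be probed numerically. The sketch below (NumPy assumed, not part of the text) estimates w(A) by sampling unit vectors; for the nilpotent matrix used here spr = 0, w = 1/2 and ‖A‖ = 1.

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # a nonnormal nilpotent matrix

spr = max(abs(np.linalg.eigvals(A)))     # spectral radius (= 0 here)
opn = np.linalg.norm(A, 2)               # operator norm (= 1 here)

# crude Monte Carlo estimate of the numerical radius w(A)
rng = np.random.default_rng(2)
w = 0.0
for _ in range(2000):
    x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    x /= np.linalg.norm(x)
    w = max(w, abs(x.conj() @ A @ x))

# spr(A) <= w(A) <= ||A|| <= 2 w(A)  (small slack for the sampling error)
assert spr <= w + 1e-9 and w <= opn + 1e-9 and opn <= 2 * w + 0.05
```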
I.3 Direct Sums
If U, V are vector spaces, their direct sum is the space of columns (u, v)ᵀ with u ∈ U and v ∈ V. This is a vector space with vector operations naturally defined coordinatewise. If H, K are Hilbert spaces, their direct sum is a Hilbert space with inner product defined as

\[ \left( \begin{pmatrix} h \\ k \end{pmatrix}, \begin{pmatrix} h' \\ k' \end{pmatrix} \right) = (h, h')_{\mathcal H} + (k, k')_{\mathcal K}. \]

We will always denote this direct sum as H ⊕ K.

If M and N are orthogonally complementary subspaces of H, then the fact that every vector x in H has a unique representation x = u + v with u ∈ M and v ∈ N implies that H is isomorphic to M ⊕ N. This isomorphism is given by a natural, fixed map. So we say that H = M ⊕ N. When a distinction is necessary we call this an internal direct sum. If M, N are subspaces of H complementary in the algebraic but not in the orthogonal sense, i.e., if M and N are disjoint and their linear span is H, then every vector x in H has a unique decomposition x = u + v as before, but not with orthogonal summands. In this case we write H = M + N and say H is the algebraic direct sum of M and N.

If H = M ⊕ N is an internal direct sum, we may define the injection of M into H as the operator I_M ∈ L(M, H) such that I_M(u) = u for all u ∈ M. Then I_M^* is an element of L(H, M) defined as I_M^* x = Px for all x ∈ H, where P is the orthoprojector onto M. Here one should note that I_M^* is not the same as P because they map into different spaces; that is why their adjoints can be different. Similarly define I_N. Then (I_M, I_N) is an isometry from the ordinary ("external") direct sum M ⊕ N onto H. If H = M ⊕ N and A ∈ L(H), then using this isomorphism we can write A as a block-matrix

\[ A = \begin{pmatrix} B & C \\ D & E \end{pmatrix}, \]

where B ∈ L(M), C ∈ L(N, M), etc. Here, for example, C = I_M^* A I_N. The usual rules of matrix operations hold for block matrices. Adjoints are obtained by taking "conjugate transposes" formally.
If the subspace M is invariant under A, i.e., Ax ∈ M whenever x ∈ M, then in the above block-matrix representation of A we must have D = 0. Indeed, this condition is equivalent to M being invariant. If both M and its orthogonal complement N are invariant under A, we say that M reduces A. In this case, both C and D are 0. We then say that the operator A is the direct sum of B and E and write A = B ⊕ E.
Exercise I.3.1 Let A = A_1 ⊕ A_2. Show that
(i) W(A) is the convex hull of W(A_1) and W(A_2), i.e., the smallest convex set containing W(A_1) ∪ W(A_2).
(ii) ‖A‖ = max(‖A_1‖, ‖A_2‖), spr(A) = max(spr(A_1), spr(A_2)), w(A) = max(w(A_1), w(A_2)).
Direct sums in which each summand H_j is the same space H arise often in practice. Very often, some properties of an operator A on H are reflected in those of some other operators on H ⊕ H. This is illustrated in the following propositions.

Lemma I.3.2 Let A ∈ L(H). Then, the operators

\[ \begin{pmatrix} A & A \\ A & A \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 2A & 0 \\ 0 & 0 \end{pmatrix} \]

are unitarily equivalent in L(H ⊕ H).

Proof. The equivalence is implemented by the unitary operator

\[ U = \frac{1}{\sqrt{2}} \begin{pmatrix} I & I \\ I & -I \end{pmatrix}. \]

Corollary I.3.3 An operator A on H is positive if and only if the operator \(\begin{pmatrix} A & A \\ A & A \end{pmatrix}\) on H ⊕ H is positive.

This can also be seen by writing

\[ \begin{pmatrix} A & A \\ A & A \end{pmatrix} = \begin{pmatrix} A^{1/2} & 0 \\ A^{1/2} & 0 \end{pmatrix} \begin{pmatrix} A^{1/2} & A^{1/2} \\ 0 & 0 \end{pmatrix} \]

and using Exercise I.2.2.

Corollary I.3.4 For every A ∈ L(H) the operator

\[ \begin{pmatrix} |A| & A^* \\ A & |A^*| \end{pmatrix} \]

is positive.

Proof. Let A = UP be the polar decomposition of A. Then,

\[ \begin{pmatrix} |A| & A^* \\ A & |A^*| \end{pmatrix} = \begin{pmatrix} P & PU^* \\ UP & UPU^* \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & U \end{pmatrix} \begin{pmatrix} P & P \\ P & P \end{pmatrix} \begin{pmatrix} I & 0 \\ 0 & U^* \end{pmatrix}. \]

Note that \(\begin{pmatrix} I & 0 \\ 0 & U \end{pmatrix}\) is a unitary operator on H ⊕ H.

Proposition I.3.5 An operator A on H is contractive if and only if the operator

\[ \begin{pmatrix} I & A \\ A^* & I \end{pmatrix} \]

on H ⊕ H is positive.
Proof. If A has the singular value decomposition A = USV*, then

\[ \begin{pmatrix} I & A \\ A^* & I \end{pmatrix} = \begin{pmatrix} U & 0 \\ 0 & V \end{pmatrix} \begin{pmatrix} I & S \\ S & I \end{pmatrix} \begin{pmatrix} U^* & 0 \\ 0 & V^* \end{pmatrix}. \]

Hence \(\begin{pmatrix} I & A \\ A^* & I \end{pmatrix}\) is positive if and only if \(\begin{pmatrix} I & S \\ S & I \end{pmatrix}\) is positive. Also, ‖A‖ = ‖S‖. So we may assume, without loss of generality, that A = S. Now let W be the unitary operator on H ⊕ H that sends the orthonormal basis {e_1, e_2, ..., e_{2n}} to the basis {e_1, e_{n+1}, e_2, e_{n+2}, ..., e_n, e_{2n}}. Then, the unitary conjugation by W transforms the matrix \(\begin{pmatrix} I & S \\ S & I \end{pmatrix}\) to a direct sum of n two-by-two matrices

\[ \begin{pmatrix} 1 & s_j \\ s_j & 1 \end{pmatrix}, \quad 1 \le j \le n. \]

This is positive if and only if each of the summands is positive, which happens if and only if s_j ≤ 1 for all j, i.e., S is a contraction.
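Proposition I.3.5 gives a practical test for contractivity via the positivity of a block matrix. A NumPy sketch (assumed, not part of the text):

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """Check positivity of a Hermitian matrix via its smallest eigenvalue."""
    return np.linalg.eigvalsh(M).min() >= -tol

def block(A):
    """The operator (I  A; A*  I) on H + H."""
    n = A.shape[0]
    I = np.eye(n)
    return np.block([[I, A], [A.conj().T, I]])

contraction = np.array([[0.3, 0.4], [0.0, 0.5]])   # ||A|| <= 1
expansion   = np.array([[2.0, 0.0], [0.0, 0.1]])   # ||A|| > 1

assert np.linalg.norm(contraction, 2) <= 1 and is_psd(block(contraction))
assert np.linalg.norm(expansion, 2) > 1 and not is_psd(block(expansion))
```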
Exercise I.3.6 If A is a contraction, show that

\[ A^*(I - AA^*)^{1/2} = (I - A^*A)^{1/2} A^*. \]

Use this to show that if A is a contraction on H, then the operators

\[ U = \begin{pmatrix} A & (I - AA^*)^{1/2} \\ (I - A^*A)^{1/2} & -A^* \end{pmatrix}, \qquad V = \begin{pmatrix} A & -(I - AA^*)^{1/2} \\ (I - A^*A)^{1/2} & A^* \end{pmatrix} \]

are unitary operators on H ⊕ H.
Exercise I.3.7 For every matrix A, the matrix \(\begin{pmatrix} I & A \\ 0 & I \end{pmatrix}\) is invertible and its inverse is \(\begin{pmatrix} I & -A \\ 0 & I \end{pmatrix}\). Use this to show that if A, B are any two n × n matrices, then

\[ \begin{pmatrix} I & -A \\ 0 & I \end{pmatrix} \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}. \]

This implies that AB and BA have the same eigenvalues. (This last fact can be proved in another way as follows. If B is invertible, then AB = B^{-1}(BA)B. So, AB and BA have the same eigenvalues. Since invertible matrices are dense in the space of all matrices, and a general known fact in complex analysis is that the roots of a polynomial vary continuously with the coefficients, the above conclusion also holds in general.)

Direct sums with more than two summands are defined in the same way. We will denote the direct sum of spaces H_1, ..., H_k as ⊕_{j=1}^k H_j, or simply as ⊕_j H_j.
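The fact that AB and BA have the same eigenvalues is easy to observe numerically (a NumPy sketch, assumed here and not part of the text):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

ev_AB = np.sort_complex(np.linalg.eigvals(A @ B))
ev_BA = np.sort_complex(np.linalg.eigvals(B @ A))

# AB and BA have the same eigenvalues
assert np.allclose(ev_AB, ev_BA)
```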
I.4 Tensor Products
Let V_j, 1 ≤ j ≤ k, be vector spaces. A map F from the Cartesian product V_1 × ··· × V_k to another vector space W is called multilinear if it depends linearly on each of the arguments. When W = C, such maps are called multilinear functionals. When k = 2, the word multilinear is replaced by bilinear. Bilinear maps, thus, are maps F : V_1 × V_2 → W that satisfy the conditions

\[ F(u, av_1 + bv_2) = aF(u, v_1) + bF(u, v_2), \]
\[ F(au_1 + bu_2, v) = aF(u_1, v) + bF(u_2, v), \]

for all a, b ∈ C; u, u_1, u_2 ∈ V_1 and v, v_1, v_2 ∈ V_2. We will be looking most often at the special situation when each V_j is the same vector space. As a special example, consider a Hilbert space H and fix two vectors x, y in it. Then,

\[ F(u, v) = (x, u)(y, v) \]

is a bilinear functional on H. We see from this example that it is equally natural to consider conjugate-multilinear functionals as well. Even more generally, we could study functions that are linear in some variables and conjugate-linear in others. As an example, let A ∈ L(H, K), and for u ∈ K and v ∈ H, let F(u, v) = (u, Av)_K. Then F depends linearly on v and conjugate-linearly on u. Such functionals are called sesquilinear; an inner product is a functional of this sort. The example given above is the "most general" example of a sesquilinear functional: if F(u, v) is any sesquilinear functional on K × H, then there exists a unique operator A ∈ L(H, K) such that F(u, v) = (u, Av). In this sense our first example is not the most general example of a bilinear functional. Bilinear functionals F(u, v) on H that can be expressed as F(u, v) = (x, u)(y, v) for some fixed x, y ∈ H are called elementary. They are special, as the following exercise will show.
Exercise I.4.1 Let x, y, z be linearly independent vectors in H. Find a necessary and sufficient condition that a vector w must satisfy in order that the bilinear functional

\[ F(u, v) = (x, u)(y, v) + (z, u)(w, v) \]

be elementary.

The set of all bilinear functionals is a vector space. The result of this exercise shows that the subset consisting of elementary functionals is not closed under addition. We will soon see that a convenient basis for this vector space can be constructed with elementary functionals as its members. The procedure, called the tensor product construction, starts by taking formal linear combinations of symbols x ⊗ y with x ∈ H, y ∈ K; then
reducing this space modulo suitable equivalence relations; then identifying the resulting space with the space of bilinear functionals.

More precisely, consider all finite sums of the type Σ c_i (x_i ⊗ y_i), c_i ∈ C, x_i ∈ H, y_i ∈ K, and manipulate them formally as linear combinations. In this space the expressions

\[ a(x \otimes y) - (ax \otimes y), \qquad a(x \otimes y) - (x \otimes ay), \]
\[ (x_1 + x_2) \otimes y - (x_1 \otimes y + x_2 \otimes y), \qquad x \otimes (y_1 + y_2) - (x \otimes y_1 + x \otimes y_2) \]

are next defined to be equivalent to 0, for all a ∈ C; x, x_1, x_2 ∈ H and y, y_1, y_2 ∈ K. The set of all linear combinations of expressions x ⊗ y for x ∈ H, y ∈ K, after reduction modulo these equivalences, is called the tensor product of H and K and is denoted as H ⊗ K. Each term c(x ⊗ y) determines a conjugate-bilinear functional F*(u, v) on H × K by the natural rule

\[ F^*(u, v) = c\,(u, x)(v, y). \]

This can be extended to sums of such terms, and the equivalences were chosen in such a way that equivalent expressions (i.e., expressions giving the same element of H ⊗ K) give the same functional. The complex conjugate of each such functional gives a bilinear functional. These ideas can be extended directly to k-linear functionals, including those that are linear in some of the arguments and conjugate-linear in others.
Theorem I.4.2 The space of all bilinear functionals on H is linearly spanned by the elementary ones. If (e_1, ..., e_n) is a fixed orthonormal basis of H, then to every bilinear functional F there correspond unique vectors x_1, ..., x_n such that

\[ F^* = \sum_j e_j \otimes x_j. \]

Every sequence x_j, 1 ≤ j ≤ n, leads to a bilinear functional in this way.

Proof. Let F be a bilinear functional on H. For each j, F*(e_j, v) is a conjugate-linear function of v. Hence there exists a unique vector x_j such that F*(e_j, v) = (v, x_j) for all v. Now, if u = Σ a_j e_j is any vector in H, then F(u, v) = Σ a_j F(e_j, v) = Σ (e_j, u)(x_j, v). In other words, F* = Σ e_j ⊗ x_j as asserted.
A more symmetric form of the above statement is the following:

Corollary I.4.3 If (e_1, ..., e_n) and (f_1, ..., f_n) are two fixed orthonormal bases of H, then every bilinear functional F on H has a unique representation

\[ F = \sum_{i,j} a_{ij} (e_i \otimes f_j)^*. \]

(Most often, the choice (e_1, ..., e_n) = (f_1, ..., f_n) is the convenient one for using the above representations.)

Thus, it is natural to denote the space of conjugate-bilinear functionals on H by H ⊗ H. This is an n²-dimensional vector space. The inner product on this space is defined by putting

\[ (x_1 \otimes y_1, x_2 \otimes y_2) = (x_1, x_2)(y_1, y_2) \]

and then extending this definition to all of H ⊗ H in a natural way. It is easy to verify that this definition is consistent with the equivalences used in defining the tensor product. If (e_1, ..., e_n) and (f_1, ..., f_n) are orthonormal bases in H, then e_i ⊗ f_j, 1 ≤ i, j ≤ n, form an orthonormal basis in H ⊗ H. For the purposes of computation it is useful to order this basis lexicographically: we say that e_i ⊗ f_j precedes e_k ⊗ f_l if and only if either i < k, or i = k and j < l.

Tensor products such as H ⊗ K or K* ⊗ H can be defined by imitating the above procedure. Here the space K* is the space of all conjugate-linear functionals on K. This space is called the dual space of K. There is a natural identification between K and K* via a conjugate-linear, norm-preserving bijection.
Exercise I.4.4 (i) There is a natural isomorphism between the spaces K ⊗ H* and L(H, K) in which the elementary tensor k ⊗ h* corresponds to the linear map that takes a vector u of H to (h, u)k. This linear transformation has rank one, and all rank one transformations can be obtained in this way.
(ii) Give an explicit construction of this isomorphism.
(iii) The space L(H, K) is a Hilbert space with inner product (A, B) = tr A*B. The set E_{ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ n, is an orthonormal basis for this space. Show that the map in (i) carries an orthonormal basis of K ⊗ H* onto this basis.
Corresponding facts about multilinear functionals and tensor products of several spaces are proved in the same way. We will use the notation ⊗^k H for the k-fold tensor product H ⊗ H ⊗ ··· ⊗ H.

Tensor products of linear operators are defined as follows. We first define A ⊗ B on elementary tensors by putting (A ⊗ B)(x ⊗ y) = Ax ⊗ By. We then extend this definition linearly to all linear combinations of elementary tensors, i.e., to all of H ⊗ H. This extension involves no inconsistency.
It is obvious that (A ⊗ B)(C ⊗ D) = AC ⊗ BD, that the identity on H ⊗ H is given by I ⊗ I, and that if A and B are invertible, then so is A ⊗ B and (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}. A one-line verification shows that (A ⊗ B)* = A* ⊗ B*. It follows that A ⊗ B is Hermitian if (but not only if) A and B are Hermitian; A ⊗ B is unitary if (but not only if) A and B are unitary; A ⊗ B is normal if (and only if) A and B are normal. (The trivial cases A = 0 or B = 0 must be excluded for the last assertion to be valid.)

Exercise I.4.5 Suppose it is known that M is an invariant subspace for A. What invariant subspaces for A ⊗ A can be obtained from this information alone?

For operators A, B on different spaces H and K, the tensor product can be defined in the same way as above. This gives an operator A ⊗ B on H ⊗ K. Many of the assertions made earlier for the case H = K remain true in this situation.

Exercise I.4.6 Let A and B be two matrices (not necessarily of the same size). Relative to the lexicographically ordered basis on the space of tensors, the matrix for A ⊗ B can be written in block form as follows: if A = (a_{ij}), then

\[ A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{n1}B & \cdots & a_{nn}B \end{pmatrix}. \]

Especially important are the operators A ⊗ A ⊗ ··· ⊗ A, which are k-fold tensor products of an operator A ∈ L(H). Such a product will be written more briefly as A^{⊗k} or ⊗^k A. This is an operator on the n^k-dimensional space ⊗^k H. Some of the easily proved and frequently used properties of these products are summarised below:

1. (⊗^k A)(⊗^k B) = ⊗^k (AB).

2. (⊗^k A)^{-1} = ⊗^k A^{-1} when either inverse exists.

3. (⊗^k A)* = ⊗^k A*.
4. If A is Hermitian, unitary, normal or positive, then so is ⊗^k A.

5. If a_1, ..., a_k (not necessarily distinct) are eigenvalues of A with eigenvectors u_1, ..., u_k, respectively, then a_1 ··· a_k is an eigenvalue of ⊗^k A and u_1 ⊗ ··· ⊗ u_k is an eigenvector for it.

6. If s_{i_1}, ..., s_{i_k} (not necessarily distinct) are singular values of A, then s_{i_1} ··· s_{i_k} is a singular value of ⊗^k A.

The reader should formulate and prove analogous statements for tensor products A_1 ⊗ A_2 ⊗ ··· ⊗ A_k of different operators.
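The block form of Exercise I.4.6 is exactly what numpy.kron computes, and properties 1 and 5 above can be checked on it (a NumPy sketch, assumed here and not part of the text):

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0]])
B = np.array([[2.0, 0.0], [1.0, 1.0]])

K = np.kron(A, B)   # the block matrix (a_ij B)

# multiplicativity: (A x B)(A x B) = A^2 x B^2,
# a special case of (A x B)(C x D) = AC x BD
assert np.allclose(K @ K, np.kron(A @ A, B @ B))

# eigenvalues of A x B are the products of eigenvalues of A and of B
# (here A has eigenvalues 1, 3 and B has eigenvalues 2, 1)
ev = np.sort(np.linalg.eigvals(K).real)
prods = np.sort([a * b for a in np.linalg.eigvals(A).real
                       for b in np.linalg.eigvals(B).real])
assert np.allclose(ev, prods)
```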
I.5 Symmetry Classes
In the space ⊗^k H there are two especially important subspaces (for nontrivial cases, k > 1 and n > 1).

The antisymmetric tensor product of vectors x_1, ..., x_k in H is defined as

\[ x_1 \wedge \cdots \wedge x_k = (k!)^{-1/2} \sum_{\sigma} \varepsilon_\sigma\, x_{\sigma(1)} \otimes \cdots \otimes x_{\sigma(k)}, \]

where σ runs over all permutations of the k indices and ε_σ is ±1, depending on whether σ is an even or an odd permutation. (ε_σ is called the signature of σ.) The factor (k!)^{-1/2} is chosen so that if the x_j are orthonormal, then x_1 ∧ ··· ∧ x_k is a unit vector. The antisymmetry of this product means that

\[ x_1 \wedge \cdots \wedge x_i \wedge \cdots \wedge x_j \wedge \cdots \wedge x_k = -\, x_1 \wedge \cdots \wedge x_j \wedge \cdots \wedge x_i \wedge \cdots \wedge x_k, \]

i.e., interchanging the position of any two of the factors in the product amounts to a change of sign. In particular, x_1 ∧ ··· ∧ x_k = 0 if any two of the factors are equal.

The span of all antisymmetric tensors x_1 ∧ ··· ∧ x_k in ⊗^k H is denoted by ∧^k H. This is called the kth antisymmetric tensor product (or tensor power) of H.

Given an orthonormal basis (e_1, ..., e_n) in H, there is a standard way of constructing an orthonormal basis in ∧^k H. Let Q_{k,n} denote the set of all strictly increasing k-tuples chosen from {1, 2, ..., n}; i.e., I ∈ Q_{k,n} if and only if I = (i_1, i_2, ..., i_k), where 1 ≤ i_1 < i_2 < ··· < i_k ≤ n. For such an I let e_I = e_{i_1} ∧ ··· ∧ e_{i_k}. Then, {e_I : I ∈ Q_{k,n}} gives an orthonormal basis of ∧^k H. Such I are sometimes called multiindices. It is conventional to order them lexicographically. Note that the cardinality of Q_{k,n}, and hence the dimensionality of ∧^k H, is \(\binom{n}{k}\). If, in particular, k = n, the space ∧^k H is 1-dimensional. This plays a special role later on. When k > n the space ∧^k H is {0}.
Exercise I.5.1 Show that the inner product (x_1 ∧ ··· ∧ x_k, y_1 ∧ ··· ∧ y_k) is equal to the determinant of the k × k matrix ((x_i, y_j)).

The symmetric tensor product of x_1, ..., x_k is defined as

\[ x_1 \vee \cdots \vee x_k = (k!)^{-1/2} \sum_{\sigma} x_{\sigma(1)} \otimes \cdots \otimes x_{\sigma(k)}, \]

where σ, as before, runs over all permutations of the k indices. The linear span of all these vectors comprises the subspace ∨^k H of ⊗^k H. This is called the kth symmetric tensor power of H.

Let G_{k,n} denote the set of all nondecreasing k-tuples chosen from {1, 2, ..., n}; i.e., I ∈ G_{k,n} if and only if I = (i_1, ..., i_k), where 1 ≤ i_1 ≤ ··· ≤ i_k ≤ n. If such an I consists of l distinct indices i_1, i_2, ..., i_l with multiplicities m_1, ..., m_l, respectively, put m(I) = m_1! m_2! ··· m_l!. Given
The elementary tensors x @ · · · @ x, with all factors equal, are all in the subspace v k 1t. Do they span it? Exercise 1.5.3 Let M be a pdimensional subspace of 1t and N its or thogonal complement. Choosing j vectors from M and k  j vectors from N and form ing the linear span of the antisymmetric tensor products of all suc h vectors, we get different sub spaces of 1\ k 'H; for example, one of those is 1\ kM . Determine all the subspaces thus obtained and their dimensionalk ities. Do the same for v ?t. Exercise 1.5.4 If dim 7t == 3, then dim ® 3 7t == 27, dim /\ 3 7l == 1 and dim V 3 7t = 1 0. In terms of an orthonormal basis of 1l, write an element of (A 3 7t EB v3 7t ) j_ . The permanent of a matrix A == ( aij ) is defined as per A == La lu ( l ) . . . anu (n) . Exercise 1.5 . 2
0"
where a varies over all permutations on n symbols. Note t h at in contrast to the determinant , the permanent is not invariant under similarities. Thus, matrices of the same operator relative to different bases may have different permanents. ,
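The definition of the permanent, and its failure to be similarity-invariant, can be illustrated directly (a NumPy sketch, assumed here and not part of the text; the brute-force sum over permutations is only practical for small n):

```python
import numpy as np
from itertools import permutations

def per(A):
    """Permanent: like the determinant but without the signs."""
    n = A.shape[0]
    return sum(np.prod([A[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

A = np.array([[1.0, 2.0], [3.0, 4.0]])
assert np.isclose(per(A), 1 * 4 + 2 * 3)   # ad + bc, versus det = ad - bc

# the permanent is NOT invariant under similarities
S = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.linalg.inv(S) @ A @ S
assert not np.isclose(per(A), per(B))
assert np.isclose(np.linalg.det(A), np.linalg.det(B))   # the determinant is
```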
Exercise I.5.5 Show that the inner product (x_1 ∨ ··· ∨ x_k, y_1 ∨ ··· ∨ y_k) is equal to the permanent of the k × k matrix ((x_i, y_j)).

The spaces ∧^k H and ∨^k H are also referred to as "symmetry classes" of tensors; there are other such classes in ⊗^k H. Another way to look at them is as the ranges of the respective symmetry operators. Define P_∧ and P_∨ as linear operators on ⊗^k H by first defining them on the elementary tensors as

\[ P_\wedge (x_1 \otimes \cdots \otimes x_k) = (k!)^{-1/2}\, x_1 \wedge \cdots \wedge x_k, \]
\[ P_\vee (x_1 \otimes \cdots \otimes x_k) = (k!)^{-1/2}\, x_1 \vee \cdots \vee x_k, \]

and extending them by linearity to the whole space. (Again, it should be verified that this can be done consistently.) The constant factor in the above definitions has been chosen so that both these operators are idempotent. They are also Hermitian. The ranges of these orthoprojectors are ∧^k H and ∨^k H, respectively.
If A ∈ L(H), then Ax_1 ∧ ··· ∧ Ax_k lies in ∧^k H for all x_1, ..., x_k in H. Using this, one sees that the space ∧^k H is invariant under the operator ⊗^k A. The restriction of ⊗^k A to this invariant subspace is denoted by ∧^k A or A^{∧k}. This is called the kth antisymmetric tensor power or the kth Grassmann power of A. We could have also defined it by first defining it on the elementary antisymmetric tensors as

\[ (\wedge^k A)(x_1 \wedge \cdots \wedge x_k) = Ax_1 \wedge \cdots \wedge Ax_k, \]

and then extending it linearly to the span ∧^k H of these tensors.

Exercise I.5.6 Let A be a nilpotent operator. Show how to obtain, from a Jordan basis for A, a Jordan basis for ∧²A.
The space ∨^k H is also invariant under the operator ⊗^k A. The restriction of ⊗^k A to this invariant subspace is written as ∨^k A or A^{∨k} and called the kth symmetric tensor power of A. Some essential and simple properties of these operators are summarised below:

1. (∧^k A)(∧^k B) = ∧^k (AB), (∨^k A)(∨^k B) = ∨^k (AB).

2. (∧^k A)* = ∧^k A*, (∨^k A)* = ∨^k A*.

3. (∧^k A)^{-1} = ∧^k A^{-1}, (∨^k A)^{-1} = ∨^k A^{-1}.

4. If A is Hermitian, unitary, normal or positive, then so are ∧^k A and ∨^k A.

5. If a_1, ..., a_k are eigenvalues of A (not necessarily distinct) belonging to eigenvectors u_1, ..., u_k, respectively, then a_1 ··· a_k is an eigenvalue of ∨^k A belonging to eigenvector u_1 ∨ ··· ∨ u_k; if in addition the vectors u_j are linearly independent, then a_1 ··· a_k is an eigenvalue of ∧^k A belonging to eigenvector u_1 ∧ ··· ∧ u_k.

6. If s_1, ..., s_n are the singular values of A, then the singular values of ∧^k A are s_{i_1} ··· s_{i_k}, where (i_1, ..., i_k) vary over Q_{k,n}; the singular values of ∨^k A are s_{i_1} ··· s_{i_k}, where (i_1, ..., i_k) vary over G_{k,n}.

7. tr ∧^k A is the kth elementary symmetric polynomial in the eigenvalues of A; tr ∨^k A is the kth complete symmetric polynomial in the eigenvalues of A. (These polynomials are defined as follows. Given any n-tuple (a_1, ..., a_n) of numbers or other commuting objects, the kth elementary symmetric polynomial in them is the sum of all terms a_{i_1} a_{i_2} ··· a_{i_k} for (i_1, i_2, ..., i_k) in Q_{k,n}; the kth complete symmetric polynomial is the sum of all terms a_{i_1} a_{i_2} ··· a_{i_k} for (i_1, i_2, ..., i_k) in G_{k,n}.)

For A ∈ L(H), consider the operator A ⊗ I ⊗ ··· ⊗ I + I ⊗ A ⊗ I ⊗ ··· ⊗ I + ··· + I ⊗ I ⊗ ··· ⊗ A. (There are k summands, each of which is a product
of k factors.) The eigenvalues of this operator on ⊗^k H are sums of eigenvalues of A. Both the spaces ∧^k H and ∨^k H are invariant under this operator. One pleasant way to see this is to regard this operator as the t-derivative at t = 0 of ⊗^k (I + tA). The restriction of this operator to the space ∧^k H will be of particular interest to us; we will write this restriction as A^{[k]}. If u_1, ..., u_k are linearly independent eigenvectors of A belonging to eigenvalues a_1, ..., a_k, then u_1 ∧ ··· ∧ u_k is an eigenvector of A^{[k]} belonging to eigenvalue a_1 + ··· + a_k.

Now, fixing an orthonormal basis (e_1, ..., e_n) of H, identify A with its matrix (a_{ij}). We want to find the matrix representations of ∧^k A and ∨^k A relative to the standard bases constructed earlier. The basis of ∧^k H we are using is e_I, I ∈ Q_{k,n}. The (I, J)-entry of ∧^k A is (e_I, (∧^k A)e_J). One may verify that this is equal to a subdeterminant of A. Namely, let A[I|J] denote the k × k matrix obtained from A by expunging all its entries a_{ij} except those for which i ∈ I and j ∈ J. Then, the (I, J)-entry of ∧^k A is equal to det A[I|J].

The special case k = n leads to the 1-dimensional space ∧ⁿH. The operator ∧ⁿA on this space is just the operator of multiplication by the number det A. We can thus think of det A as being equal to ∧ⁿA.

The basis of ∨^k H we are using is m(I)^{-1/2} e_I, I ∈ G_{k,n}. The (I, J)-entry of the matrix ∨^k A can be computed as before, and the result is somewhat similar to that for ∧^k A. For I = (i_1, ..., i_k) and J = (j_1, ..., j_k) in G_{k,n}, let A[I|J] now denote the k × k matrix whose (r, s)-entry is the (i_r, j_s)-entry of A. Since repetitions of indices are allowed in I and J, this is not a submatrix of A this time. One verifies that the (I, J)-entry of ∨^k A is (m(I)m(J))^{-1/2} per A[I|J]. In particular, per A is one of the diagonal entries of ∨ⁿA: the (I, I)-entry for I = (1, 2, ..., n).
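The description of the entries of ∧^k A as subdeterminants det A[I|J] can be implemented directly; the result is classically called the kth compound matrix. A NumPy sketch (assumed, not part of the text) builds it and verifies properties 5 and 7 above on a diagonal example:

```python
import numpy as np
from itertools import combinations

def compound(A, k):
    """k-th antisymmetric power: entries are det A[I|J] for I, J in Q_{k,n}."""
    n = A.shape[0]
    idx = list(combinations(range(n), k))   # Q_{k,n}, lexicographic
    return np.array([[np.linalg.det(A[np.ix_(I, J)]) for J in idx]
                     for I in idx])

A = np.diag([1.0, 2.0, 3.0])
C2 = compound(A, 2)

# eigenvalues of /\^2 A are products of pairs of eigenvalues: 1*2, 1*3, 2*3
assert np.allclose(np.sort(np.diag(C2)), [2.0, 3.0, 6.0])

# tr /\^2 A is the 2nd elementary symmetric polynomial in the eigenvalues
assert np.isclose(np.trace(C2), 1*2 + 1*3 + 2*3)

# /\^n A is multiplication by det A
assert np.isclose(compound(A, 3)[0, 0], np.linalg.det(A))
```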
Exercise I.5.7 Prove that for any vectors u_1, ..., u_k, v_1, ..., v_k we have

\[ |\det((u_i, v_j))|^2 \le \det((u_i, u_j)) \, \det((v_i, v_j)), \]
\[ |\mathrm{per}((u_i, v_j))|^2 \le \mathrm{per}((u_i, u_j)) \, \mathrm{per}((v_i, v_j)). \]

Exercise I.5.8 Prove that for any two matrices A, B we have

\[ |\mathrm{per}(AB)|^2 \le \mathrm{per}(AA^*) \, \mathrm{per}(B^*B). \]

(The corresponding relation for determinants is an easy equality.)
Exercise I.5.9 (Schur's Theorem) If A is positive, then per A ≥ det A. (Hint: Using Exercise I.2.2 write A = T*T for an upper triangular T. Then use the preceding exercise cleverly.)
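Schur's Theorem can be spot-checked on random positive matrices (a NumPy sketch, assumed here and not part of the text; per is the brute-force permanent, practical only for small n):

```python
import numpy as np
from itertools import permutations

def per(A):
    """Brute-force permanent of a square matrix."""
    n = A.shape[0]
    return sum(np.prod([A[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

# a positive matrix, built in the form B*B of Exercise I.2.2
rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
A = B.T @ B

# Schur's Theorem: per A >= det A for positive A
assert per(A) >= np.linalg.det(A) - 1e-9
```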
We have observed earlier that for any vectors x_1, ..., x_k in H we have

\[ \|x_1 \wedge \cdots \wedge x_k\|^2 = \det((x_i, x_j)). \]

When H = Rⁿ, this determinant is also the square of the k-dimensional volume of the parallelepiped having x_1, ..., x_k as its sides. To see this, note that neither the determinant nor the volume in question is altered if we add to any of these vectors a linear combination of the others. Performing such operations successively, we can reach an orthogonal set of vectors, some of which might be zero. In this case it is obvious that the determinant is equal to the square of the volume; hence that was true initially too.

Given any k-tuple X = (x_1, ..., x_k), the matrix ((x_i, x_j)) = X*X is called the Gram matrix of the vectors x_j; its determinant is called their Gram determinant.
1.6
Problems
, u11 ) , not necessarily ort honor Given a basis U = ( u 1 , vn ) ? mal, in 'H, how would you compute the biorthogonal basis ( v1 1 Find a formula that expresses ( vj , x) for each x E 1t and j = 1 , 2, . . . , k in terms of Gram matrices .
P roblem 1 .6. 1 .
, • • •
• • •
Problem I.6.2. A proof of the Toeplitz–Hausdorff Theorem is outlined below. Fill in the details.

Note that W(A) = {(x, Ax) : ‖x‖ = 1} = {tr Axx* : x*x = 1}. It is enough to consider the special case dim H = 2. In higher dimensions, this special case can be used to show that if x, y are any two vectors, then any point on the line segment joining (x, Ax) and (y, Ay) can be represented as (z, Az), where z is a vector in the linear span of x and y.

Now, on the space of 2 × 2 Hermitian matrices, consider the linear map Φ(T) = tr AT. This is a real linear map from a space of 4 real dimensions (the 2 × 2 Hermitian matrices) to a space of 2 real dimensions (the complex plane). We want to prove that the image under Φ of the set of rank-one projections xx*, x*x = 1, is convex. Every such projection can be written as

\[ \begin{pmatrix} \cos t \\ e^{i\omega} \sin t \end{pmatrix} \begin{pmatrix} \cos t & e^{-i\omega} \sin t \end{pmatrix} = \frac{1}{2} I + \frac{1}{2} \begin{pmatrix} \cos 2t & e^{-i\omega} \sin 2t \\ e^{i\omega} \sin 2t & -\cos 2t \end{pmatrix}. \]

As t and ω vary, this set is a 2-sphere centred at ½I and having radius 1/√2 in the Frobenius norm. The image of a 2-sphere under a linear map with range in R² must be either an ellipse with interior, or a line segment, or a point; in any case, a convex set.
Problem I.6.3. By the remarks in Section 5, vectors x_1, ..., x_k are linearly dependent if and only if x_1 ∧ ··· ∧ x_k = 0. This relationship between linear dependence and the antisymmetric tensor product goes further. Two sets {x_1, ..., x_k} and {y_1, ..., y_k} of linearly independent vectors have the same linear span if and only if x_1 ∧ ··· ∧ x_k = c y_1 ∧ ··· ∧ y_k for some constant c. Thus, there is a one-to-one correspondence between k-dimensional subspaces of a vector space W and 1-dimensional subspaces of ∧ᵏW generated by elementary tensors x_1 ∧ ··· ∧ x_k.
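The correspondence in Problem I.6.3 can be checked in coordinates: for k = 2 in ℝ³, the components of x_1 ∧ x_2 in the standard basis of ∧²ℝ³ are the three 2 × 2 minors of the matrix with rows x_1, x_2, and two independent pairs span the same plane exactly when these component vectors are proportional. A small sketch under those assumptions (the helper name is ours):

```python
# Components of x ^ y in the basis e1^e2, e1^e3, e2^e3 of /\^2 R^3:
# these are the 2x2 minors of the 2x3 matrix with rows x and y.
def wedge2(x, y):
    return (x[0]*y[1] - x[1]*y[0],
            x[0]*y[2] - x[2]*y[0],
            x[1]*y[2] - x[2]*y[1])

x1, x2 = (1, 0, 1), (0, 1, 1)
y1, y2 = (1, 1, 2), (1, -1, 0)    # y1 = x1 + x2, y2 = x1 - x2: same span
w, v = wedge2(x1, x2), wedge2(y1, y2)

# w and v are proportional (here v = -2 w), as the problem predicts
assert all(w[i] * v[j] == w[j] * v[i] for i in range(3) for j in range(3))
```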
Problem I.6.4. How large must dim W be in order that there exist some element of ∧²W which is not elementary?

Problem I.6.5. Every vector w of W induces a linear operator T_w from ∧ᵏW to ∧ᵏ⁺¹W as follows. T_w is defined on elementary tensors as T_w(v_1 ∧ ··· ∧ v_k) = v_1 ∧ ··· ∧ v_k ∧ w, and then extended linearly to all of ∧ᵏW. It is, then, natural to write T_w(x) = x ∧ w for any x ∈ ∧ᵏW. Show that a nonzero vector x in ∧ᵏW is elementary if and only if the space {w ∈ W : x ∧ w = 0} is k-dimensional. (When W is a Hilbert space, the operators T_w are called creation operators and their adjoints are called annihilation operators in the physics literature.)
Problem I.6.6. (The n-dimensional Pythagorean Theorem) Let x_1, ..., x_n be orthogonal vectors in ℝⁿ. Consider the n-dimensional simplex S with vertices 0, x_1, ..., x_n. Think of the (n − 1)-dimensional simplex with vertices x_1, ..., x_n as the "hypotenuse" of S and the remaining (n − 1)-dimensional faces of S as its "legs". By the remarks in Section 5, the k-dimensional volume of the simplex formed by any k points y_1, ..., y_k together with the origin is (k!)⁻¹ ‖y_1 ∧ ··· ∧ y_k‖. The volume of a simplex not having 0 as a vertex can be found by translating it. Use this to prove that the square of the volume of the hypotenuse of S is the sum of the squares of the volumes of the n legs.
Problem I.6.7. (i) Let Q_∧ be the inclusion map from ∧ᵏ𝓗 into ⊗ᵏ𝓗 (so that Q_∧* equals the projection P_∧ defined earlier) and let Q_∨ be the inclusion map from ∨ᵏ𝓗 into ⊗ᵏ𝓗. Then, for any A ∈ ℒ(𝓗),

\[ \wedge^k A = P_{\wedge} (\otimes^k A) Q_{\wedge}, \qquad \vee^k A = P_{\vee} (\otimes^k A) Q_{\vee}. \]

(ii) ‖∧ᵏA‖ ≤ ‖A‖ᵏ and ‖∨ᵏA‖ ≤ ‖A‖ᵏ.

(iii) |det A| ≤ ‖A‖ⁿ and |per A| ≤ ‖A‖ⁿ.
I. A Review of Linear Algebra
Problem I.6.8. For an invertible operator A obtain a relationship between A⁻¹, ∧ⁿA, and ∧ⁿ⁻¹A.

Problem I.6.9. (i) Let {e_1, ..., e_n} and {f_1, ..., f_n} be two orthonormal bases in 𝓗. Show that

(ii) Let P and Q be orthogonal projections in 𝓗, each of rank n − 1. Let x, y be unit vectors such that Px = Qy = 0. Show that
Problem I.6.10. If the characteristic polynomial of A is written as

\[ t^n + a_1 t^{n-1} + \cdots + a_n, \]

then the coefficient a_k is equal to (−1)ᵏ tr ∧ᵏA; up to this sign, it is the sum of all k × k principal minors of A.

Problem I.6.11. (i) For any A, B ∈ ℒ(𝓗) we have

\[ \otimes^k A - \otimes^k B = \sum_{j=1}^{k} C_j, \]

where C_j = A ⊗ ··· ⊗ A ⊗ (A − B) ⊗ B ⊗ ··· ⊗ B, the factor A − B occurring in the jth position. Hence,

\[ \| \otimes^k A - \otimes^k B \| \le k M^{k-1} \| A - B \|, \quad \text{where } M = \max( \|A\|, \|B\| ). \]

(ii) The norms of ∧ᵏA − ∧ᵏB and ∨ᵏA − ∨ᵏB are therefore also bounded by kM^{k−1}‖A − B‖.

(iii) For n × n matrices A, B,

\[ |\det A - \det B| \le n M^{n-1} \|A - B\|, \qquad |\operatorname{per} A - \operatorname{per} B| \le n M^{n-1} \|A - B\|. \]

(iv) The example A = aI, B = (a + ε)I for small ε shows that these inequalities are sometimes sharp. When ‖A‖ and ‖B‖ are far apart, find a simple improvement on them.

(v) If A, B are n × n matrices with characteristic polynomials

\[ t^n + a_1 t^{n-1} + \cdots + a_n \quad \text{and} \quad t^n + b_1 t^{n-1} + \cdots + b_n, \]

respectively, then

\[ |a_k - b_k| \le k \binom{n}{k} M^{k-1} \|A - B\|, \quad \text{where } M = \max( \|A\|, \|B\| ). \]

Problem I.6.12. Let A, B be positive operators with A ≥ B (i.e., A − B is positive). Show that

\[ \otimes^k A \ge \otimes^k B, \quad \wedge^k A \ge \wedge^k B, \quad \vee^k A \ge \vee^k B, \quad \det A \ge \det B, \quad \operatorname{per} A \ge \operatorname{per} B. \]
Problem I.6.13. The Schur product or the Hadamard product of two matrices A and B is defined to be the matrix A ∘ B whose (i, j)-entry is a_ij b_ij. Show that this is a principal submatrix of A ⊗ B and derive from this fact two significant properties:

(i) ‖A ∘ B‖ ≤ ‖A‖ ‖B‖ for all A, B.

(ii) If A, B are positive, then so is A ∘ B. (This is called Schur's Theorem.)

Problem I.6.14. (i) Let A = (a_ij) be an n × n positive matrix. Let

\[ r_i = \sum_{j=1}^{n} a_{ij}, \quad 1 \le i \le n, \qquad s = \sum_{i,j} a_{ij}. \]

Show that

\[ s^n \operatorname{per} A \ge n! \prod_{i=1}^{n} |r_i|^2 . \]

[Hint: Represent A as the Gram matrix of some vectors x_1, ..., x_n as in Exercise I.5.10. Let u = s^{−1/2}(x_1 + ··· + x_n). Consider the vectors u ∨ u ∨ ··· ∨ u and x_1 ∨ ··· ∨ x_n, and use the Cauchy–Schwarz inequality.]

(ii) Show that equality holds in the above inequality if and only if either A has rank 1 or A has a row of zeroes.

(iii) If, in addition, all a_ij are nonnegative and all r_i = 1 (so that the matrix A is doubly stochastic as well as positive semidefinite), then

\[ \operatorname{per} A \ge \frac{n!}{n^n} . \]
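The inequality of Problem I.6.14 (i), and the doubly stochastic case (iii), can be tested numerically on small matrices; the permanent is computed here by brute force over permutations (an illustrative sketch, not the book's code):

```python
import math
from itertools import permutations

def per(A):
    # permanent by brute force; fine for small n
    n = len(A)
    total = 0.0
    for sigma in permutations(range(n)):
        p = 1.0
        for i in range(n):
            p *= A[i][sigma[i]]
        total += p
    return total

A = [[2.0, 1.0], [1.0, 2.0]]                 # a positive matrix
n = len(A)
r = [sum(row) for row in A]                  # row sums r_i
s = sum(r)                                   # s = sum of all entries
assert s**n * per(A) >= math.factorial(n) * math.prod(ri**2 for ri in r)

# part (iii): doubly stochastic and positive semidefinite, per A >= n!/n^n
D = [[0.5, 0.5], [0.5, 0.5]]
assert per(D) >= math.factorial(2) / 2**2 - 1e-12
```

For this D the bound is attained with equality, matching the equality case a_ij = 1/n discussed after part (iii).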
Here equality holds if and only if a_ij = 1/n for all i, j.

Problem I.6.15. Let A be Hermitian with eigenvalues α_1 ≥ α_2 ≥ ··· ≥ α_n. In Exercise I.2.7 we noted that

\[ \alpha_1 = \max \{ \langle x, Ax \rangle : \|x\| = 1 \}, \qquad \alpha_n = \min \{ \langle x, Ax \rangle : \|x\| = 1 \}. \]

Using these relations and tensor products, we can deduce some other extremal representations:

(i) For every k = 1, 2, ..., n,

\[ \sum_{j=1}^{k} \alpha_j = \max \sum_{j=1}^{k} \langle x_j, A x_j \rangle, \qquad \sum_{j=n-k+1}^{n} \alpha_j = \min \sum_{j=1}^{k} \langle x_j, A x_j \rangle, \]

where the maximum and the minimum are taken over all choices of orthonormal k-tuples (x_1, ..., x_k) in 𝓗. The first statement is referred to as Ky Fan's Maximum Principle. It will reappear in Chapter II (with a different proof) and subsequently.

(ii) If A is positive, then for every k = 1, 2, ..., n,

\[ \prod_{j=n-k+1}^{n} \alpha_j = \min \prod_{j=1}^{k} \langle x_j, A x_j \rangle, \]

where the minimum is taken over all choices of orthonormal k-tuples (x_1, ..., x_k) in 𝓗. [Hint: You may need to use the Hadamard Determinant Theorem, which says that the determinant of a positive matrix is bounded above by the product of its diagonal entries. This is also proved in Chapter II.]

(iii) If A is positive, then for every I ∈ Q_{k,n},

\[ \prod_{j=n-k+1}^{n} \alpha_j \le \det A[I] \le \prod_{j=1}^{k} \alpha_j . \]
Problem I.6.16. Let A be any n × n matrix with eigenvalues α_1, ..., α_n. Show that

\[ \left| \alpha_j - \frac{\operatorname{tr} A}{n} \right| \le \left( \frac{n-1}{n} \right)^{1/2} \left( \|A\|_2^2 - \frac{|\operatorname{tr} A|^2}{n} \right)^{1/2} \]

for all j = 1, 2, ..., n. (Results such as this are interesting because they give some information about the location of the eigenvalues of a matrix in terms of more easily computable functions like the Frobenius norm ‖A‖₂ and the trace. We will see several such statements later.)

[Hint: First prove that if x = (x_1, ..., x_n) is a vector with x_1 + ··· + x_n = 0, then

\[ \max_j |x_j| \le \left( \frac{n-1}{n} \right)^{1/2} \|x\| . ] \]
Problem I.6.17. (i) Let z_1, z_2, z_3 be three points on the unit circle. Then the numerical range of an operator A is contained in the triangle with vertices z_1, z_2, z_3 if and only if A can be expressed as A = z_1A_1 + z_2A_2 + z_3A_3, where A_1, A_2, A_3 are positive operators with A_1 + A_2 + A_3 = I.

[Hint: It is easy to see that if A is a sum of this form, then W(A) is contained in the given triangle. The converse needs some work to prove. Let z be any point in the given triangle. Then one can find a_1, a_2, a_3 such that a_j ≥ 0, a_1 + a_2 + a_3 = 1, and z = a_1z_1 + a_2z_2 + a_3z_3. These are the "barycentric coordinates" of z and can be obtained as follows. Let γ = Im(z̄_1z_2 + z̄_2z_3 + z̄_3z_1). Then, for j = 1, 2, 3,

\[ a_j = \frac{1}{\gamma} \operatorname{Im} \Bigl( \overline{(z - z_{j+1})} \, (z_{j+2} - z_{j+1}) \Bigr), \]

where the subscript indices are counted modulo 3. Put

\[ A_j = \frac{1}{\gamma} \operatorname{Im} \Bigl( (A - z_{j+1} I)^* (z_{j+2} - z_{j+1}) \Bigr). \]

Then the A_j have the required properties.]

(ii) Let W(A) be contained in a triangle with vertices z_1, z_2, z_3 lying on the unit circle. Then, choosing A_1, A_2, A_3 as above, write

\[ I - A^* A = \sum_{j=1}^{3} (I - \bar{z}_j A)^* A_j (I - \bar{z}_j A). \]

This, being a sum of three positive matrices, is positive. Hence, by Proposition I.3.5, A is a contraction.

(iii) If W(A) is contained in a triangle with vertices z_1, z_2, z_3, then ‖A‖ ≤ max_j |z_j|. This is Mirman's Theorem.

Problem I.6.18. If an operator T has the Cartesian decomposition T = A + iB with A and B positive, then

\[ \|T\|^2 \le \|A\|^2 + \|B\|^2 . \]

Show that, if A or B is not positive, then this need not be true. [Hint: To prove the above inequality note that W(T) is contained in a rectangle in the first quadrant. Find a suitable triangle that contains it and use Mirman's Theorem.]
I.7 Notes and References
Standard references on linear algebra and matrix theory include P.R. Halmos, Finite-Dimensional Vector Spaces, Van Nostrand, 1958; F.R. Gantmacher, Matrix Theory, 2 volumes, Chelsea, 1959; and K. Hoffman and R. Kunze, Linear Algebra, 2nd ed., Prentice Hall, 1971. A recent work is R.A. Horn and C.R. Johnson, in two volumes, Matrix Analysis and Topics in Matrix Analysis, Cambridge University Press, 1985 and 1990. For more of multilinear algebra, see W. Greub, Multilinear Algebra, 2nd ed., Springer-Verlag, 1978, and M. Marcus, Finite-Dimensional Multilinear Algebra, 2 volumes, Marcel Dekker, 1973 and 1975. A brief treatment that covers all the basic results may be found in M. Marcus and H. Minc, A Survey of Matrix Theory and Matrix Inequalities, Prindle, Weber and Schmidt, 1964, reprinted by Dover in 1992.

Though not as important as the determinant, the permanent of a matrix is an interesting object with many uses in combinatorics, geometry, and physics. A book devoted entirely to it is H. Minc, Permanents, Addison-Wesley, 1978.

Apart from the symmetric and the antisymmetric tensors, there are other symmetry classes of tensors. Their study is related to the glorious subject of representations of finite groups. See J.-P. Serre, Linear Representations of Finite Groups, Springer-Verlag, 1977.

The result in Exercise I.3.6 is due to P.R. Halmos, and is the beginning of a subject called Dilation Theory. See Chapter 23 of P.R. Halmos, A Hilbert Space Problem Book, 2nd ed., Springer-Verlag, 1982.

The proof of the Toeplitz–Hausdorff Theorem in Problem I.6.2 is taken from C. Davis, The Toeplitz–Hausdorff theorem explained, Canad. Math. Bull., 14 (1971) 245–246. For a different proof, see P.R. Halmos, A Hilbert Space Problem Book.

For relations between Grassmann spaces and geometry, as indicated in Problem I.6.3, see, for example, I.R. Porteous, Topological Geometry, Cambridge University Press, 1981.

The simple proof of the Pythagorean Theorem in Problem I.6.6 is due to S. Ramanan. Among the several papers in quantum physics where ideas very close to those in Problems I.6.3 and I.6.5 are used effectively is one by N.M. Hugenholtz and R.V. Kadison, Automorphisms and quasi-free states of the CAR algebra, Commun. Math. Phys., 43 (1975) 181–197.

Inequalities like the ones in Problem I.6.11 were first discovered in connection with perturbation theory of eigenvalues. This is summarised in R. Bhatia, Perturbation Bounds for Matrix Eigenvalues, Longman, 1987. The simple identity at the beginning of Problem I.6.11 was first used in this context in R. Bhatia and L. Elsner, On the variation of permanents, Linear and Multilinear Algebra, 27 (1990) 105–110.

The results and the ideas of Problem I.6.14 are from M. Marcus and M. Newman, Inequalities for the permanent function, Ann. of Math., 75 (1962)
47–62. In 1926, B.L. van der Waerden had conjectured that the inequality in part (iii) of Problem I.6.14 would hold for all doubly stochastic matrices. This conjecture was proved, in two separate papers in 1981, by G.P. Egorychev and D. Falikman. An expository account is given in J.H. van Lint, The van der Waerden conjecture: two proofs in one year, Math. Intelligencer, 4 (1982) 72–77.

The results of Problem I.6.15 are all due to Ky Fan, On a theorem of Weyl concerning eigenvalues of linear transformations I, II, Proc. Nat. Acad. Sci. U.S.A., 35 (1949) 652–655, 36 (1950) 31–35, and A minimum property of the eigenvalues of a Hermitian transformation, Amer. Math. Monthly, 60 (1953) 48–50.

A special case of the inequality of Problem I.6.16 occurs in P. Tarazaga, Eigenvalue estimates for symmetric matrices, Linear Algebra and Appl., 135 (1990) 171–179.

Mirman's Theorem is proved in B.A. Mirman, Numerical range and norm of a linear operator, Trudy Seminara po Funkcional'nomu Analizu, No. 10 (1968), pp. 51–55. The inequality of Problem I.6.18 is also noted there as a corollary. Our proof of Mirman's Theorem is taken from Y. Nakamura, Numerical range and norm, Math. Japonica, 27 (1982) 149–150.
II Majorisation and Doubly Stochastic Matrices

Comparison of two vector quantities often leads to interesting inequalities that can be expressed succinctly as majorisation relations. There is an intimate relation between majorisation and doubly stochastic matrices. These topics are studied in detail here. We place special emphasis on majorisation relations between the eigenvalue n-tuples of two matrices. This will be a recurrent theme in the book.

II.1 Basic Notions
Let x = (x_1, ..., x_n) be an element of ℝⁿ. Let x↓ and x↑ be the vectors obtained by rearranging the coordinates of x in the decreasing and the increasing orders, respectively. Thus, if x↓ = (x_1↓, ..., x_n↓), then x_1↓ ≥ ··· ≥ x_n↓. Similarly, if x↑ = (x_1↑, ..., x_n↑), then x_1↑ ≤ ··· ≤ x_n↑. Note that

\[ x_j^{\uparrow} = x_{n-j+1}^{\downarrow}, \quad 1 \le j \le n. \tag{II.1} \]

Let x, y ∈ ℝⁿ. We say that x is majorised by y, in symbols x ≺ y, if

\[ \sum_{j=1}^{k} x_j^{\downarrow} \le \sum_{j=1}^{k} y_j^{\downarrow}, \quad 1 \le k \le n, \tag{II.2} \]

and

\[ \sum_{j=1}^{n} x_j = \sum_{j=1}^{n} y_j . \tag{II.3} \]
Example: If x_i ≥ 0 and Σx_i = 1, then

\[ \Bigl( \tfrac{1}{n}, \ldots, \tfrac{1}{n} \Bigr) \prec (x_1, \ldots, x_n) \prec (1, 0, \ldots, 0). \]
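The two relations in this example are easy to verify mechanically from conditions (II.2) and (II.3); a short sketch (the helper function is ours, not the book's):

```python
def majorised(x, y, tol=1e-12):
    # x majorised by y: compare partial sums of decreasingly sorted vectors
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    px = py = 0.0
    for a, b in zip(xs, ys):
        px, py = px + a, py + b
        if px > py + tol:
            return False
    return abs(px - py) < tol          # the totals must agree

x = (0.2, 0.3, 0.5)                    # x_i >= 0 with sum 1
n = len(x)
assert majorised([1.0 / n] * n, x)
assert majorised(x, [1.0, 0.0, 0.0])
assert not majorised([1.0, 0.0, 0.0], x)
```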
The notion of majorisation occurs naturally in various contexts. For example, in physics, the relation x ≺ y is interpreted to mean that the vector x describes a "more chaotic" state than y. (Think of x_i as the probability of a system being found in state i.) Another example occurs in economics. If x_1, ..., x_n and y_1, ..., y_n denote incomes of individuals 1, 2, ..., n, then x ≺ y would mean that there is a more equal distribution of incomes in the state x than in y. The above example illustrates this.

From (II.1) we have

\[ \sum_{j=1}^{k} x_j^{\uparrow} = \sum_{j=1}^{n} x_j - \sum_{j=1}^{n-k} x_j^{\downarrow} . \]

Hence x ≺ y if and only if

\[ \sum_{j=1}^{k} x_j^{\uparrow} \ge \sum_{j=1}^{k} y_j^{\uparrow}, \quad 1 \le k \le n, \tag{II.4} \]

and

\[ \sum_{j=1}^{n} x_j = \sum_{j=1}^{n} y_j . \tag{II.5} \]
Let e denote the vector (1, 1, ..., 1), and for any subset I of {1, 2, ..., n} let e_I denote the vector whose jth component is 1 if j ∈ I and 0 if j ∉ I. Given a vector x ∈ ℝⁿ, let

\[ \operatorname{tr} x = \sum_{j=1}^{n} x_j = \langle x, e \rangle, \]

where ⟨·, ·⟩ denotes the inner product in ℝⁿ. Note that

\[ \sum_{j=1}^{k} x_j^{\downarrow} = \max_{|I| = k} \langle x, e_I \rangle, \]

where |I| stands for the number of elements in the set I. So, x ≺ y if and only if for each subset I of {1, 2, ..., n} there exists a subset J with |I| = |J| such that

\[ \langle x, e_I \rangle \le \langle y, e_J \rangle \tag{II.6} \]

and

\[ \operatorname{tr} x = \operatorname{tr} y . \tag{II.7} \]
We say that x is (weakly) submajorised by y, in symbols x ≺_w y, if condition (II.2) is fulfilled. Note that in the absence of (II.3), the conditions (II.2) and (II.4) are not equivalent. We say that x is (weakly) supermajorised by y, in symbols x ≺^w y, if condition (II.4) is fulfilled.

Exercise II.1.1
(i) x ≺ y ⟺ x ≺_w y and x ≺^w y.
(ii) If a is a positive real number, then x ≺_w y ⟹ ax ≺_w ay, and x ≺^w y ⟹ ax ≺^w ay.
(iii) x ≺_w y ⟺ −x ≺^w −y.
(iv) For any real number a, x ≺ y ⟹ ax ≺ ay.
Remark II.1.2 The relations ≺, ≺_w, and ≺^w are all reflexive and transitive. None of them, however, is a partial order. For example, if x ≺ y and y ≺ x, we can only conclude that x = Py, where P is a permutation matrix. If we say that x ∼ y whenever x = Py for some permutation matrix P, then ∼ defines an equivalence relation on ℝⁿ. If we denote by ℝⁿ_sym the resulting quotient space, then ≺ defines a partial order on this space. This relation is also a partial order on the set {x ∈ ℝⁿ : x_1 ≥ ··· ≥ x_n}. These statements are true for the relations ≺_w and ≺^w as well.

For a, b ∈ ℝ, let a ∨ b = max(a, b) and a ∧ b = min(a, b). For x, y ∈ ℝⁿ, define

\[ x \vee y = (x_1 \vee y_1, \ldots, x_n \vee y_n), \qquad x \wedge y = (x_1 \wedge y_1, \ldots, x_n \wedge y_n). \]

Let

\[ x^+ = x \vee 0, \qquad |x| = x \vee (-x). \]

In other words, x⁺ is the vector obtained from x by replacing the negative coordinates by zeroes, and |x| is the vector obtained by taking the absolute values of all coordinates.

With these notations we can prove the following characterisation of majorisation that does not involve rearrangements:

Theorem II.1.3 Let x, y ∈ ℝⁿ. Then,

(i) x ≺_w y if and only if for all t ∈ ℝ

\[ \sum_{j=1}^{n} (x_j - t)^+ \le \sum_{j=1}^{n} (y_j - t)^+ . \tag{II.8} \]

(ii) x ≺^w y if and only if for all t ∈ ℝ

\[ \sum_{j=1}^{n} (t - x_j)^+ \le \sum_{j=1}^{n} (t - y_j)^+ . \tag{II.9} \]

(iii) x ≺ y if and only if for all t ∈ ℝ

\[ \sum_{j=1}^{n} |x_j - t| \le \sum_{j=1}^{n} |y_j - t| . \tag{II.10} \]
Proof. Let x ≺_w y. If t ≥ x_1↓, then (x_j − t)⁺ = 0 for all j, and hence (II.8) holds. Let x_{k+1}↓ ≤ t ≤ x_k↓ for some 1 ≤ k ≤ n, where, for convenience, x_{n+1}↓ = −∞. Then

\[ \sum_{j=1}^{n} (x_j - t)^+ = \sum_{j=1}^{k} (x_j^{\downarrow} - t) = \sum_{j=1}^{k} x_j^{\downarrow} - kt \le \sum_{j=1}^{k} y_j^{\downarrow} - kt \le \sum_{j=1}^{n} (y_j - t)^+, \]

and, hence, (II.8) holds. To prove the converse, note that if t = y_k↓, then

\[ \sum_{j=1}^{n} (y_j - t)^+ = \sum_{j=1}^{k} (y_j^{\downarrow} - t) = \sum_{j=1}^{k} y_j^{\downarrow} - kt . \]

But

\[ \sum_{j=1}^{k} x_j^{\downarrow} - kt = \sum_{j=1}^{k} (x_j^{\downarrow} - t) \le \sum_{j=1}^{n} (x_j - t)^+ . \]

So, if (II.8) holds, then we must have

\[ \sum_{j=1}^{k} x_j^{\downarrow} \le \sum_{j=1}^{k} y_j^{\downarrow}, \]

i.e., x ≺_w y. This proves (i). The statements (ii) and (iii) have similar proofs. ∎
Corollary II.1.4 If x ≺ y in ℝⁿ and u ≺ w in ℝᵐ, then (x, u) ≺ (y, w) in ℝⁿ⁺ᵐ. In particular, x ≺ y if and only if (x, u) ≺ (y, u) for all u.
An n × n matrix A = (a_ij) is called doubly stochastic if

\[ a_{ij} \ge 0 \quad \text{for all } i, j, \tag{II.11} \]

\[ \sum_{i=1}^{n} a_{ij} = 1 \quad \text{for all } j, \tag{II.12} \]

\[ \sum_{j=1}^{n} a_{ij} = 1 \quad \text{for all } i. \tag{II.13} \]
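Conditions (II.11)–(II.13) translate directly into a checker; a small sketch (the function name is ours):

```python
# Sketch: test conditions (II.11)-(II.13) entrywise.
def is_doubly_stochastic(A, tol=1e-12):
    n = len(A)
    if any(a < -tol for row in A for a in row):
        return False
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in A)
    cols_ok = all(abs(sum(A[i][j] for i in range(n)) - 1.0) < tol
                  for j in range(n))
    return rows_ok and cols_ok

P = [[0.0, 1.0], [1.0, 0.0]]            # a permutation matrix
M = [[0.5, 0.5], [0.5, 0.5]]
assert is_doubly_stochastic(P) and is_doubly_stochastic(M)
assert not is_doubly_stochastic([[1.0, 0.5], [0.0, 0.5]])
```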
Exercise II.1.5 A linear map A on ℂⁿ is called positivity-preserving if it carries vectors with nonnegative coordinates to vectors with nonnegative coordinates. It is called trace-preserving if tr Ax = tr x for all x. It is called unital if Ae = e. Show that a matrix A is doubly stochastic if and only if the linear operator A is positivity-preserving, trace-preserving, and unital. Show that A is trace-preserving if and only if its adjoint A* is unital.

Exercise II.1.6 (i) The class of n × n doubly stochastic matrices is a convex set and is closed under multiplication and the adjoint operation. It is, however, not a group. (ii) Every permutation matrix is doubly stochastic and is an extreme point of the convex set of all doubly stochastic matrices. (Later we will prove Birkhoff's Theorem, which says that all extreme points of this convex set are permutation matrices.)

Exercise II.1.7 Let A be a doubly stochastic matrix. Show that all eigenvalues of A have modulus less than or equal to 1, that 1 is an eigenvalue of A, and that ‖A‖ = 1.

Exercise II.1.8 If A is doubly stochastic, then

\[ |Ax| \le A(|x|), \]

where, as usual, |x| = (|x_1|, ..., |x_n|) and we say that x ≤ y if x_j ≤ y_j for all j.
There is a close relationship between majorisation and doubly stochastic matrices. This is brought out in the next few theorems.

Theorem II.1.9 A matrix A is doubly stochastic if and only if Ax ≺ x for all vectors x.

Proof. Let Ax ≺ x for all x. First choosing x to be e and then e_i = (0, 0, ..., 1, 0, ..., 0), 1 ≤ i ≤ n, one can easily see that A is doubly stochastic. Conversely, let A be doubly stochastic. Let y = Ax. To prove y ≺ x we may assume, without loss of generality, that the coordinates of both x and y are in decreasing order. (See Remark II.1.2 and Exercise II.1.6.) Now note that for any k, 1 ≤ k ≤ n, we have

\[ \sum_{j=1}^{k} y_j = \sum_{j=1}^{k} \sum_{i=1}^{n} a_{ij} x_i . \]

If we put t_i = Σ_{j=1}^{k} a_ij, then 0 ≤ t_i ≤ 1 and Σ_{i=1}^{n} t_i = k. We have

\[
\sum_{j=1}^{k} y_j - \sum_{j=1}^{k} x_j
= \sum_{i=1}^{n} t_i x_i - \sum_{i=1}^{k} x_i + \Bigl( k - \sum_{i=1}^{n} t_i \Bigr) x_k
= \sum_{i=1}^{k} (t_i - 1)(x_i - x_k) + \sum_{i=k+1}^{n} t_i (x_i - x_k)
\le 0 .
\]
Further, when k = n we must have equality here simply because A is doubly stochastic. Thus, y ≺ x. ∎

Note that if x, y ∈ ℝ² and x ≺ y, then

\[ (x_1, x_2) = \bigl( t y_1 + (1 - t) y_2, \; (1 - t) y_1 + t y_2 \bigr) \quad \text{for some } 0 \le t \le 1. \]

Note also that if x, y ∈ ℝⁿ and x is obtained by averaging any two coordinates of y in the above sense while keeping the rest of the coordinates fixed, then x ≺ y. More precisely, call a linear map T on ℝⁿ a T-transform if there exist 0 ≤ t ≤ 1 and indices j, k such that

\[ Ty = \bigl( y_1, \ldots, y_{j-1}, \; t y_j + (1 - t) y_k, \; y_{j+1}, \ldots, y_{k-1}, \; (1 - t) y_j + t y_k, \; y_{k+1}, \ldots, y_n \bigr). \]

Then Ty ≺ y for all y.
Theorem II.1.10 For x, y ∈ ℝⁿ, the following statements are equivalent:

(i) x ≺ y.

(ii) x is obtained from y by a finite number of T-transforms.

(iii) x is in the convex hull of all vectors obtained by permuting the coordinates of y.

(iv) x = Ay for some doubly stochastic matrix A.
Proof. When n = 2, we have already seen that (i) ⟹ (ii). We will prove this implication for a general n by induction. Assume that we have it for dimensions up to n − 1. Let x, y ∈ ℝⁿ. Since x↓ and y↓ can be obtained from x and y by permutations, and each permutation is a product of transpositions, which are surely T-transforms, we can assume without loss of generality that x_1 ≥ x_2 ≥ ··· ≥ x_n and y_1 ≥ y_2 ≥ ··· ≥ y_n. Now, if x ≺ y, then y_n ≤ x_1 ≤ y_1. Choose k such that y_k ≤ x_1 ≤ y_{k−1}. Then x_1 = ty_1 + (1 − t)y_k for some 0 ≤ t ≤ 1. Let

\[ T_1 z = \bigl( t z_1 + (1 - t) z_k, \; z_2, \ldots, z_{k-1}, \; (1 - t) z_1 + t z_k, \; z_{k+1}, \ldots, z_n \bigr) \]

for all z ∈ ℝⁿ. Then note that the first coordinate of T_1 y is x_1. Let

\[ x' = (x_2, \ldots, x_n), \qquad y' = \bigl( y_2, \ldots, y_{k-1}, \; (1 - t) y_1 + t y_k, \; y_{k+1}, \ldots, y_n \bigr). \]

We will show that x′ ≺ y′. Since y_1 ≥ ··· ≥ y_{k−1} ≥ x_1 ≥ x_2 ≥ ··· ≥ x_n, we have for 2 ≤ m ≤ k − 1

\[ \sum_{j=2}^{m} x_j \le \sum_{j=2}^{m} y_j . \]

For k ≤ m ≤ n,

\[
\sum_{j=2}^{k-1} y_j + \bigl[ (1 - t) y_1 + t y_k \bigr] + \sum_{j=k+1}^{m} y_j
= \sum_{j=1}^{m} y_j - t y_1 - (1 - t) y_k
= \sum_{j=1}^{m} y_j - x_1
\ge \sum_{j=1}^{m} x_j - x_1 = \sum_{j=2}^{m} x_j .
\]

The last inequality is an equality when m = n since x ≺ y. Thus x′ ≺ y′. So by the induction hypothesis there exist a finite number of T-transforms T_2, ..., T_r on ℝⁿ⁻¹ such that x′ = (T_r ··· T_2)y′. We can regard each of them as a T-transform on ℝⁿ if we prohibit them from touching the first coordinate of any vector. We then have

\[ x = (T_r \cdots T_2 T_1) y, \]

and that is what we wanted to prove. Now note that a T-transform is a convex combination of the identity map and some permutation. So a product of such maps is a convex combination of permutations. Hence (ii) ⟹ (iii). The implication (iii) ⟹ (iv) is obvious, and (iv) ⟹ (i) is a consequence of Theorem II.1.9. ∎

A consequence of the above theorem is that the set {x : x ≺ y} is the convex hull of all points obtained from y by permuting its coordinates.
Exercise II.1.11 If U = (u_ij) is a unitary matrix, then the matrix (|u_ij|²) is doubly stochastic. Such a doubly stochastic matrix is called unitary-stochastic; it is called orthostochastic if U is real orthogonal. Show that if x = Ay for some doubly stochastic matrix A, then there exists an orthostochastic matrix B such that x = By. (Use induction.)

Exercise II.1.12 Let A be an n × n Hermitian matrix. Let diag(A) denote the vector whose coordinates are the diagonal entries of A and λ(A) the vector whose coordinates are the eigenvalues of A specified in any order. Show that

\[ \operatorname{diag}(A) \prec \lambda(A). \tag{II.14} \]

This is sometimes referred to as Schur's Theorem.
Exercise II.1.13 Use the majorisation (II.14) to prove that if λ_j↓(A) denote the eigenvalues of an n × n Hermitian matrix arranged in decreasing order, then for all k = 1, 2, ..., n

\[ \sum_{j=1}^{k} \lambda_j^{\downarrow}(A) = \max \sum_{j=1}^{k} \langle x_j, A x_j \rangle, \tag{II.15} \]

where the maximum is taken over all orthonormal k-tuples of vectors {x_1, ..., x_k} in ℂⁿ. This is Ky Fan's maximum principle. (See Problem I.6.15 also.) Show that the majorisation (II.14) can be derived from (II.15). The two statements are, thus, equivalent.
Exercise II.1.14 Let A, B be Hermitian matrices. Then for all k = 1, 2, ..., n,

\[ \sum_{j=1}^{k} \lambda_j^{\downarrow}(A + B) \le \sum_{j=1}^{k} \lambda_j^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B). \tag{II.16} \]

Exercise II.1.15 For any matrix A, let Ã be the Hermitian matrix

\[ \tilde{A} = \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix} . \tag{II.17} \]

Then the eigenvalues of Ã are the singular values of A together with their negatives. Denote the singular values of A arranged in decreasing order by s_1(A), ..., s_n(A). Show that for any two n × n matrices A, B and for any k = 1, 2, ..., n

\[ \sum_{j=1}^{k} s_j(A + B) \le \sum_{j=1}^{k} s_j(A) + \sum_{j=1}^{k} s_j(B). \tag{II.18} \]

When k = 1, this is just the triangle inequality for the operator norm ‖A‖. For each 1 ≤ k ≤ n, define ‖A‖_(k) = Σ_{j=1}^{k} s_j(A). From (II.18) it follows that ‖A‖_(k) defines a norm. These norms are called the Ky Fan k-norms.
II.2 Birkhoff's Theorem
We start with a combinatorial problem known as the Matching Problem. Let B = {b_1, ..., b_n} and G = {g_1, ..., g_n} be two sets of n elements each, and let R be a subset of B × G. When does there exist a bijection f from B to G whose graph is contained in R? This is called the Matching Problem or the Marriage Problem for the following reason. Think of B as a set of boys, G as a set of girls, and (b_i, g_j) ∈ R as saying that the boy b_i knows the girl g_j. Then the above question can be phrased as: when can one arrange a monogamous marriage in which each boy gets married to a girl he knows? We will call such a matching a compatible matching.

For each i let G_i = {g_j : (b_i, g_j) ∈ R}. This represents the set of girls whom the boy b_i knows. For each k-tuple of indices 1 ≤ i_1 < ··· < i_k ≤ n, let G_{i_1···i_k} = ∪_{r=1}^{k} G_{i_r}. This represents the set of girls each of whom is known to one of the boys b_{i_1}, ..., b_{i_k}. Clearly a necessary condition for a compatible matching to be possible is that |G_{i_1···i_k}| ≥ k for all k = 1, 2, ..., n. Hall's Marriage Theorem says that this condition is sufficient as well.

Theorem II.2.1 (Hall) A compatible matching between B and G can be found if and only if

\[ | G_{i_1 \cdots i_k} | \ge k \quad \text{for all } 1 \le i_1 < \cdots < i_k \le n, \; k = 1, 2, \ldots, n. \tag{II.19} \]

Proof. Only the sufficiency of the condition needs to be proved. This is done by induction on n. Obviously, the Theorem is true when n = 1. First assume that |G_{i_1···i_k}| ≥ k + 1 for all 1 ≤ i_1 < ··· < i_k ≤ n, 1 ≤ k < n. In other words, if 1 ≤ k < n, then every set of k boys together knows at least k + 1 girls. Pick any boy and marry him to one of the girls he knows. This leaves n − 1 boys and n − 1 girls; condition (II.19) still holds, and hence the remaining boys and girls can be compatibly matched.

If the above assumption is not met, then there exist k indices i_1, ..., i_k, k < n, for which |G_{i_1···i_k}| = k. In other words, there exist k boys who together know exactly k girls. By the induction hypothesis these k boys and girls can be compatibly matched. Now we are left with n − k unmarried boys and as many unmarried girls. If some set of h of these boys knew fewer than h of these remaining girls, then together with the earlier k these h + k boys would have known fewer than h + k girls. (The earlier k boys did not know any of the present n − k maidens.)
So, condition (II.19) is satisfied for the remaining n − k boys and girls, who can now be compatibly married by the induction hypothesis. ∎
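For small instances, Hall's condition and the existence of a compatible matching can both be checked by brute force, which makes the equivalence in Theorem II.2.1 easy to experiment with (a sketch; the dictionaries encode the sets G_i):

```python
from itertools import combinations, permutations

def hall(knows):
    # Hall's condition (II.19): every set of k boys knows >= k girls
    n = len(knows)
    return all(len(set().union(*(knows[b] for b in S))) >= k
               for k in range(1, n + 1)
               for S in combinations(range(n), k))

def matchable(knows):
    # brute force: does some bijection boys -> girls respect "knows"?
    n = len(knows)
    return any(all(s[b] in knows[b] for b in range(n))
               for s in permutations(range(n)))

good = {0: {0, 1}, 1: {1}, 2: {1, 2}}
bad = {0: {1}, 1: {1}, 2: {0, 1, 2}}    # boys 0 and 1 know only girl 1
assert hall(good) and matchable(good)
assert not hall(bad) and not matchable(bad)
```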
Exercise II.2.2 (The König–Frobenius Theorem) Let A = (a_ij) be an n × n matrix. If σ is a permutation on n symbols, the set {a_{1σ(1)}, a_{2σ(2)}, ..., a_{nσ(n)}} is called a diagonal of A. Each diagonal contains exactly one element from each row and from each column of A. Show that the following two statements are equivalent:

(i) every diagonal of A contains a zero element;

(ii) A has a k × ℓ submatrix with all entries zero for some k, ℓ such that k + ℓ > n.
One can see that the statement of the König–Frobenius Theorem is equivalent to that of Hall's Theorem.

Theorem II.2.3 (Birkhoff's Theorem) The set of n × n doubly stochastic matrices is a convex set whose extreme points are the permutation matrices.

Proof. We have already made a note of the easy part of this theorem in Exercise II.1.6. The harder part is showing that every extreme point is a permutation matrix. For this we need to show that each doubly stochastic matrix is a convex combination of permutation matrices.

This is proved by induction on the number of positive entries of the matrix. Note that if A is doubly stochastic, then it has at least n positive entries. If the number of positive entries is exactly n, then A is a permutation matrix.

We first show that if A is doubly stochastic, then A has at least one diagonal with no zero entry. Choose any k × ℓ submatrix of zeroes that A might have. We can find permutation matrices P_1, P_2 such that P_1AP_2 has the form

\[ P_1 A P_2 = \begin{pmatrix} 0 & B \\ C & D \end{pmatrix}, \]

where 0 is a k × ℓ matrix with all entries zero. Since P_1AP_2 is again doubly stochastic, the rows of B and the columns of C each add up to 1. Hence k + ℓ ≤ n. So at least one diagonal of A must have all its entries positive, by the König–Frobenius Theorem.

Choose any such positive diagonal and let a be the smallest of the elements of this diagonal. If A is not a permutation matrix, then a < 1. Let P be the permutation matrix obtained by putting ones on this diagonal and let

\[ B = \frac{A - aP}{1 - a} . \]

Then B is doubly stochastic and has at least one more zero entry than A has. So by the induction hypothesis B is a convex combination of permutation matrices. Hence so is A, since A = (1 − a)B + aP. ∎
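The proof is constructive, and for small matrices it can be run directly: repeatedly pick a diagonal with positive entries, subtract the largest feasible multiple of the corresponding permutation matrix, and stop when nothing is left. A brute-force sketch (it searches all diagonals, which the proof does not need to do):

```python
from itertools import permutations

def birkhoff(A, tol=1e-12):
    """Return [(weight, sigma), ...] with A = sum of weight * P_sigma."""
    A = [row[:] for row in A]
    n = len(A)
    out = []
    while True:
        # diagonal whose smallest entry is largest (hence positive if any is)
        best = max(permutations(range(n)),
                   key=lambda s: min(A[i][s[i]] for i in range(n)))
        a = min(A[i][best[i]] for i in range(n))
        if a < tol:
            break
        out.append((a, best))
        for i in range(n):
            A[i][best[i]] -= a
    return out

A = [[0.5, 0.5], [0.5, 0.5]]
decomp = birkhoff(A)
assert abs(sum(w for w, _ in decomp) - 1.0) < 1e-9
```

Here `decomp` expresses A as half the identity permutation plus half the transposition, in line with the theorem.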
Remark. There are n! permutation matrices of size n. Birkhoff's Theorem tells us that every n × n doubly stochastic matrix is a convex combination of these n! matrices. This number can be reduced as a consequence of a general theorem of Carathéodory. This says that if X is a subset of an m-dimensional linear variety in ℝᴺ, then any point in the convex hull of X can be expressed as a convex combination of at most m + 1 points of X. Using this theorem one sees that every n × n doubly stochastic matrix can be expressed as a convex combination of at most n² − 2n + 2 permutation matrices.

Doubly substochastic matrices defined below are related to weak majorisation in the same way as doubly stochastic matrices are related to majorisation. A matrix B = (b_ij) is called doubly substochastic if

\[ b_{ij} \ge 0 \quad \text{for all } i, j, \]

\[ \sum_{i=1}^{n} b_{ij} \le 1 \quad \text{for all } j, \]

\[ \sum_{j=1}^{n} b_{ij} \le 1 \quad \text{for all } i. \]
Exercise II.2.4 B is doubly substochastic if and only if it is positivity-preserving, Be ≤ e, and B*e ≤ e.

Exercise II.2.5 Every square submatrix of a doubly stochastic matrix is doubly substochastic. Conversely, every doubly substochastic matrix B can be dilated to a doubly stochastic matrix A. Moreover, if B is an n × n matrix, then this dilation A can be chosen to have size at most 2n × 2n. Indeed, if R and C are the diagonal matrices whose jth diagonal entries are the sums of the jth rows and the jth columns of B, respectively, then

\[ A = \begin{pmatrix} B & I - R \\ I - C & B^* \end{pmatrix} \]

is a doubly stochastic matrix.
Exercise II.2.6 The set of all n x n doubly substochastic matrices is con vex; its extreme points are matrices having at most one entry 1 in each row and each column and all other entries zero. Exercise II. 2 . 7 A matrix B with nonnegative entries is doubly sub stochas tic if and only if there exists a doubly stochastic matrix A such that b ij < a ij for all i , j == 1 , 2, . . , n .
.
Our next theorem connects doubly substochastic matrices to weak majorisation.
II.2 Birkhoff's Theorem
Theorem II.2.8 (i) Let x, y be two vectors with nonnegative coordinates. Then x ≺_w y if and only if x = By for some doubly substochastic matrix B.
(ii) Let x, y ∈ ℝⁿ. Then x ≺_w y if and only if there exists a vector u such that x ≤ u and u ≺ y.

Proof. If x, u ∈ ℝⁿ and x ≤ u, then clearly x ≺_w u. So, if in addition u ≺ y, then x ≺_w y.

Now suppose that x, y are nonnegative vectors and x = By for some doubly substochastic matrix B. By Exercise II.2.7 we can find a doubly stochastic matrix A such that b_ij ≤ a_ij for all i, j. Then x = By ≤ Ay. Hence, x ≺_w y.

Conversely, let x, y be nonnegative vectors such that x ≺_w y. We want to prove that there exists a doubly substochastic matrix B for which x = By. If x = 0, we can choose B = 0, and if x ≺ y, we can even choose B to be doubly stochastic by Theorem II.1.10. So, assume that neither of these is the case. Let r be the smallest of the positive coordinates of x, and let s = Σyⱼ − Σxⱼ. By assumption s > 0. Choose a positive integer m such that r > s/m. Dilate both vectors x and y to (n + m)-dimensional vectors x′, y′ defined as

x′ = (x₁, …, xₙ, s/m, …, s/m),
y′ = (y₁, …, yₙ, 0, …, 0).

Then x′ ≺ y′. Hence x′ = Ay′ for some doubly stochastic matrix A of size n + m. Let B be the n × n submatrix of A sitting in the top left corner. Then B is doubly substochastic and x = By. This proves (i).

Finally, let x, y ∈ ℝⁿ and x ≺_w y. Choose a positive number t so that x + te and y + te are both nonnegative, where e = (1, 1, …, 1). We still have x + te ≺_w y + te. So, by (i) there exists a doubly substochastic matrix B such that x + te = B(y + te). By Exercise II.2.7 we can find a doubly stochastic matrix A such that b_ij ≤ a_ij for all i, j. But then x + te ≤ A(y + te) = Ay + te. Hence, if u = Ay, then x ≤ u and u ≺ y. ∎

Exercise II.2.9 A matrix A is doubly substochastic if and only if for every x ≥ 0 we have Ax ≥ 0 and Ax ≺_w x. (Compare with Theorem II.1.9.)

Exercise II.2.10 Let x, y ∈ ℝⁿ and let x ≥ 0, y ≥ 0. Then x ≺_w y if and only if x is in the convex hull of the 2ⁿn! points obtained from y by permutations and sign changes of its coordinates (i.e., vectors of the form (±y_σ(1), ±y_σ(2), …, ±y_σ(n)), where σ is a permutation).
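Part (i) of Theorem II.2.8 can be checked numerically: applying any doubly substochastic matrix to a nonnegative vector produces a weakly majorised one. A small sketch (the random data and the rescaling trick are illustrative choices):

```python
import numpy as np

def weakly_majorised(x, y, tol=1e-12):
    """x <_w y : compare partial sums of the decreasingly sorted vectors."""
    return np.all(np.cumsum(np.sort(x)[::-1]) <= np.cumsum(np.sort(y)[::-1]) + tol)

rng = np.random.default_rng(0)
y = rng.uniform(0.0, 1.0, size=4)               # nonnegative vector
B = rng.uniform(0.0, 1.0, size=(4, 4))
B /= max(B.sum(axis=0).max(), B.sum(axis=1).max())   # force all line sums <= 1
x = B @ y                                        # then x <_w y by the theorem
```

The scaling makes every row and column sum of B at most 1, so B is doubly substochastic and the theorem applies.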
II.3 Convex and Monotone Functions

In this section we will study maps from ℝⁿ to ℝᵐ that preserve various orders. Let f : ℝ → ℝ be any function. We will denote the map induced by f on ℝⁿ also by f; i.e., f(x) = (f(x₁), …, f(xₙ)) for x ∈ ℝⁿ. An elementary and useful characterisation of majorisation is the following.
Theorem II.3.1 Let x, y ∈ ℝⁿ. Then the following two conditions are equivalent:
(i) x ≺ y.
(ii) tr φ(x) ≤ tr φ(y) for all convex functions φ from ℝ to ℝ.

Proof. Let x ≺ y. Then x = Ay for some doubly stochastic matrix A. So xᵢ = Σ_{j=1}^n a_ij yⱼ, where a_ij ≥ 0 and Σ_{j=1}^n a_ij = 1. Hence for every convex function φ, φ(xᵢ) ≤ Σ_{j=1}^n a_ij φ(yⱼ). Summing over i gives Σᵢ φ(xᵢ) ≤ Σⱼ φ(yⱼ); i.e., tr φ(x) ≤ tr φ(y). Conversely, the functions t, −t, and (t − a)₊, for each real a, are all convex. Now apply Theorem II.1.3 (iii). ∎

Exercise II.3.2 For x, y ∈ ℝⁿ the following two conditions are equivalent:
(i) x ≺_w y.
(ii) tr φ(x) ≤ tr φ(y) for all monotonically increasing convex functions φ from ℝ to ℝ.
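A quick numerical illustration of Theorem II.3.1 (the particular vectors and the two convex test functions t² and |t| are illustrative choices):

```python
import numpy as np

y = np.array([5.0, 3.0, -2.0, 0.0])
A = 0.5 * np.eye(4) + 0.5 * np.full((4, 4), 0.25)   # a doubly stochastic matrix
x = A @ y                                            # an averaging of y, so x < y

# tr phi(x) <= tr phi(y) for convex phi:
gap_sq = np.square(y).sum() - np.square(x).sum()     # phi(t) = t^2
gap_abs = np.abs(y).sum() - np.abs(x).sum()          # phi(t) = |t|
```

Both gaps come out nonnegative, as the theorem requires for any convex φ.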
Note that in the two statements above it suffices to consider only continuous functions.

A real valued function φ on ℝⁿ is called Schur-convex or S-convex if

x ≺ y ⟹ φ(x) ≤ φ(y). (II.20)

(This terminology might seem somewhat inappropriate because the condition (II.20) expresses preservation of order rather than convexity. However, the above two propositions do show that ordinary convex functions are related to this notion. Also, if x ≺ y, then x is obtained from y by an averaging procedure. The condition (II.20) says that the value of φ is diminished when such a procedure is applied to its argument. Later on, we will come across other notions of averaging, and corresponding notions of convexity.) We will study more general maps that include Schur-convex maps.
Consider maps Φ : ℝⁿ → ℝᵐ. The domain of Φ will be either all of ℝⁿ or some convex set invariant under coordinate permutations of its elements. Such a map Φ will be called

monotone increasing if x ≤ y ⟹ Φ(x) ≤ Φ(y),
monotone decreasing if −Φ is monotone increasing,
convex if Φ(tx + (1 − t)y) ≤ tΦ(x) + (1 − t)Φ(y), 0 ≤ t ≤ 1,
concave if −Φ is convex,
isotone if x ≺ y ⟹ Φ(x) ≺_w Φ(y),
strongly isotone if x ≺_w y ⟹ Φ(x) ≺_w Φ(y),
and strictly isotone if x ≺ y ⟹ Φ(x) ≺ Φ(y).

Note that when m = 1 isotone maps are precisely the Schur-convex maps. The next few propositions provide examples of such maps. We will denote by Sₙ the group of n × n permutation matrices.
Theorem II.3.3 Let Φ : ℝⁿ → ℝᵐ be a convex map. Suppose that for any P ∈ Sₙ there exists P′ ∈ Sₘ such that

Φ(Px) = P′Φ(x) (II.21)

for all x ∈ ℝⁿ. Then Φ is isotone. If, in addition, Φ is monotone increasing, then Φ is strongly isotone.

Proof. Let x ≺ y in ℝⁿ. By Theorem II.1.10 there exist P₁, …, P_N in Sₙ and positive real numbers t₁, …, t_N with Σtⱼ = 1 such that x = Σtⱼ Pⱼ y. So, by the convexity of Φ,

Φ(x) ≤ Σtⱼ Φ(Pⱼ y) = Σtⱼ Pⱼ′ Φ(y) = z, say.

Then z ≺ Φ(y). So Φ(x) ≤ z and z ≺ Φ(y), and hence Φ(x) ≺_w Φ(y). Thus Φ is isotone.

Now suppose Φ is also monotone increasing. Let u ≺_w y. Then by Theorem II.2.8 there exists x such that u ≤ x and x ≺ y. Hence Φ(u) ≤ Φ(x) ≺_w Φ(y). So Φ is strongly isotone. ∎
Corollary II.3.4 If φ : ℝ → ℝ is a convex function, then the induced map φ : ℝⁿ → ℝⁿ is isotone. If, in addition, φ is monotone increasing, then this induced map is strongly isotone.

Proof. The induced map is convex and satisfies the condition (II.21) with P′ = P, so Theorem II.3.3 yields the above corollary. ∎

Example II.3.5 From the above results we can conclude that
(i) x ≺ y in ℝⁿ ⟹ |x| ≺_w |y|.
(ii) x ≺ y in ℝⁿ ⟹ x² ≺_w y².
(iii) x ≺_w y in ℝⁿ₊ ⟹ xᵖ ≺_w yᵖ for p ≥ 1.
(iv) x ≺_w y in ℝⁿ ⟹ x₊ ≺_w y₊.
(v) If φ is any function such that φ(eᵗ) is convex and monotone increasing in t, then log x ≺_w log y in ℝⁿ₊ ⟹ φ(x) ≺_w φ(y).
(vi) log x ≺_w log y in ℝⁿ₊ ⟹ x ≺_w y.
(vii) For x, y ∈ ℝⁿ₊ with coordinates arranged in decreasing order,

Π_{j=1}^k xⱼ ≤ Π_{j=1}^k yⱼ, 1 ≤ k ≤ n ⟹ Σ_{j=1}^k xⱼ ≤ Σ_{j=1}^k yⱼ, 1 ≤ k ≤ n.

Here ℝⁿ₊ stands for the collection of vectors x ≥ 0 (or, at places, x > 0). All functions are understood in the coordinatewise sense. Thus, e.g., |x| = (|x₁|, …, |xₙ|).
As an application we have the following very useful theorem.

Theorem II.3.6 (Weyl's Majorant Theorem) Let A be an n × n matrix with singular values s₁ ≥ ⋯ ≥ sₙ and eigenvalues λ₁, …, λₙ arranged in such a way that |λ₁| ≥ ⋯ ≥ |λₙ|. Then for every function φ : ℝ₊ → ℝ₊ such that φ(eᵗ) is convex and monotone increasing in t, we have

(φ(|λ₁|), …, φ(|λₙ|)) ≺_w (φ(s₁), …, φ(sₙ)). (II.22)

In particular, we have

(|λ₁|ᵖ, …, |λₙ|ᵖ) ≺_w (s₁ᵖ, …, sₙᵖ) (II.23)

for all p > 0.
Proof. The spectral radius of a matrix is bounded by its operator norm. Hence, |λ₁| ≤ s₁. Apply this argument to the antisymmetric tensor powers ⋀ᵏA. This gives

Π_{j=1}^k |λⱼ| ≤ Π_{j=1}^k sⱼ, 1 ≤ k ≤ n. (II.24)

Now use the assertion of II.3.5 (vii). ∎

Note that we have

Π_{j=1}^n |λⱼ| = Π_{j=1}^n sⱼ, (II.25)
both the expressions being equal to (det A*A)^{1/2}.

Remark II.3.7 Returning to Theorem II.3.3, we note that when m = 1 the condition (II.21) just says that

Φ(Px) = Φ(x) (II.26)

for all x ∈ ℝⁿ and P ∈ Sₙ. So, in this case Theorem II.3.3 says that if a function Φ : ℝⁿ → ℝ is convex and permutation invariant, then it is isotone (i.e., Schur-convex).

Also note that every isotone function Φ from ℝⁿ to ℝ has to be permutation invariant, because Px and x majorise each other, and hence isotony of Φ implies equality of Φ(Px) and Φ(x).

Exercise II.3.8 Let ψ : ℝⁿ → ℝ be a convex function and let Φ(x) = max_{P ∈ Sₙ} ψ(Px). Then Φ is isotone. If, in addition, ψ is monotone increasing, then Φ is strongly isotone.

Exercise II.3.9 Let φ : ℝ → ℝ be convex. For each k = 1, 2, …, n, define functions φ^(k) : ℝⁿ → ℝ by

φ^(k)(x) = max_σ Σ_{j=1}^k φ(x_σ(j)),

where σ runs over all permutations on n symbols. Then φ^(k) is isotone. If, in addition, φ is monotone increasing, then φ^(k) is strongly isotone. Note that this applies, in particular, to

φ^(n)(x) = Σ_{j=1}^n φ(xⱼ) = tr φ(x).

Compare this with Theorem II.3.1. The special choice φ(t) = t gives

φ^(k)(x) = Σ_{j=1}^k xⱼ↓.
Example II.3.10 For x ∈ ℝⁿ let x̄ = (1/n) Σⱼ xⱼ. Let

V(x) = (1/n) Σⱼ (xⱼ − x̄)².

This is called the variance function. Since the maps x ↦ (xⱼ − x̄)² are convex, V(x) is isotone (i.e., Schur-convex).

Example II.3.11 For x ∈ ℝⁿ₊ let

H(x) = −Σⱼ xⱼ log xⱼ,

where by convention we put t log t = 0 if t = 0. Then H is called the entropy function. Since the function f(t) = t log t is convex for t ≥ 0, we see that −H(x) is isotone. (This is sometimes expressed by saying that the entropy function is anti-isotone or Schur-concave on ℝⁿ₊.) In particular, if xⱼ ≥ 0 and Σxⱼ = 1 we have

H(1, 0, …, 0) ≤ H(x₁, …, xₙ) ≤ H(1/n, …, 1/n),

which is a basic fact about entropy.

Example II.3.12 For p ≥ 1 the function
Φ(x) = Σ_{j=1}^n xⱼᵖ

is isotone on ℝⁿ₊. In particular, if xⱼ ≥ 0 and Σxⱼ = 1, we have

(n + 1)ᵖ / nᵖ⁻¹ ≤ Σ_{j=1}^n (xⱼ + 1)ᵖ.

Example II.3.13 A function Φ : ℝⁿ → ℝ₊ is called a symmetric gauge function if
(i) Φ is a norm on ℝⁿ,
(ii) Φ(Px) = Φ(x) for all x ∈ ℝⁿ, P ∈ Sₙ,
(iii) Φ(ε₁x₁, …, εₙxₙ) = Φ(x₁, …, xₙ) if εⱼ = ±1,
(iv) Φ(1, 0, …, 0) = 1.
(The last condition is an inessential normalisation.) Examples of symmetric gauge functions are

Φ_p(x) = (Σ_{j=1}^n |xⱼ|ᵖ)^{1/p}, 1 ≤ p < ∞,
Φ_∞(x) = max_{1 ≤ j ≤ n} |xⱼ|.
These norms are commonly used in functional analysis. If the coordinates of x are arranged so as to have |x₁| ≥ ⋯ ≥ |xₙ|, then

Φ_(k)(x) = Σ_{j=1}^k |xⱼ|, 1 ≤ k ≤ n,

is also a symmetric gauge function. This is a consequence of the majorisations (II.29) and (i) in Example II.3.5. Every symmetric gauge function is convex on ℝⁿ and is monotone on ℝⁿ₊ (Problem II.5.11). Hence by Theorem II.3.3 it is strongly isotone; i.e.,

x ≺_w y in ℝⁿ₊ ⟹ Φ(x) ≤ Φ(y).
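The strong-isotonicity statement can be sanity-checked numerically for the Φ_p family (the two test vectors below are an illustrative choice satisfying x ≺_w y):

```python
import numpy as np

def phi_p(x, p):
    """The symmetric gauge function Phi_p (l_p norm)."""
    x = np.abs(np.asarray(x, dtype=float))
    return x.max() if np.isinf(p) else (x ** p).sum() ** (1.0 / p)

x = np.array([2.0, 1.0, 1.0])   # partial sums: 2, 3, 4
y = np.array([3.0, 2.0, 0.5])   # partial sums: 3, 5, 5.5  =>  x <_w y
checks = [phi_p(x, p) <= phi_p(y, p) for p in (1, 1.5, 2, np.inf)]
```

Every Φ_p value of x is dominated by that of y, as strong isotonicity predicts.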
For differentiable functions there are necessary and sufficient conditions characterising Schur-convexity:

Theorem II.3.14 A differentiable function φ : ℝⁿ → ℝ is isotone if and only if
(i) φ is permutation invariant, and
(ii) for each x ∈ ℝⁿ and for all i, j,

(xᵢ − xⱼ) (∂φ/∂xᵢ (x) − ∂φ/∂xⱼ (x)) ≥ 0.

Proof. We have already observed that every isotone function is permutation invariant. To see that it also satisfies (ii), let i = 1, j = 2, without any loss of generality. For 0 ≤ t ≤ 1 let

x(t) = ((1 − t)x₁ + tx₂, tx₁ + (1 − t)x₂, x₃, …, xₙ). (II.27)

Then x(t) ≺ x = x(0). Hence φ(x(t)) ≤ φ(x(0)), and therefore

0 ≥ [d/dt φ(x(t))]_{t=0} = −(x₁ − x₂) (∂φ/∂x₁ (x) − ∂φ/∂x₂ (x)).

This proves (ii).

Conversely, suppose φ satisfies (i) and (ii). We want to prove that φ(u) ≤ φ(x) whenever u ≺ x. Using (i), we may assume that

u = ((1 − s)x₁ + sx₂, sx₁ + (1 − s)x₂, x₃, …, xₙ)

for some 0 ≤ s ≤ 1/2. Let x(t) be as in (II.27). Then

φ(u) − φ(x) = ∫₀ˢ d/dt φ(x(t)) dt
= −∫₀ˢ (x₁ − x₂) [∂φ/∂x₁ (x(t)) − ∂φ/∂x₂ (x(t))] dt
= −∫₀ˢ [(x(t)₁ − x(t)₂)/(1 − 2t)] [∂φ/∂x₁ (x(t)) − ∂φ/∂x₂ (x(t))] dt
≤ 0,

because of (ii) and the condition 0 ≤ s ≤ 1/2. ∎
Example II.3.15 (A Schur-convex function that is not convex) Let φ : I² → ℝ, where I = (0, 1), be the function

φ(x₁, x₂) = log(1/x₁ − 1) + log(1/x₂ − 1).

Using Theorem II.3.14 one can check that φ is Schur-convex on the set {(x₁, x₂) ∈ I² : x₁ + x₂ ≤ 1}. However, the function log(1/x − 1) is convex on (0, 1/2] but not on [1/2, 1).
Using Theorem 11. 3. 14 one can check that is Schurconvex on the set However, the function log( �  1 ) is convex on (0, � ] but not on [� , 1 ) . Example II.3.16 {The elementary symmetric polynomials) For each k == 1 , 2, · · · , n, let Sk IRn � IR be the functions :
Sk (x)
==
L
1 < i1
X i 1 X i2
•
•
Xi k .
•
These are called the elementary symmetric polynomials of the variables x 1 , . . . , X n . These are invariant under permutations. We have the identities n
and
Sk (X I , · · · , Xi , . . . , x n )  Sk ( XI , · · · , Xj , · · · , X n )
== ( xi  x i ) Sk l ( xl , . , x i , . . , xi , . .
.
.
.
.
, xn ) ,
where the circumflex indicates that the term below it has been omitted. l.lsing these one finds via Theorem 11. 3. 14 that each Sk is Schurconcave; i. e. ,  Sk is isotone, on IR� . The special case k
n
==
n says that if x , y E IR� and x < y, then II Xj j= l
n
>
II Yi ·
j= l
Theorem II.3.17 (The Hadamard Determinant Theorem) If A is an n × n positive matrix, then

det A ≤ Π_{j=1}^n aⱼⱼ.

Proof. Use Schur's Theorem (Exercise II.1.12) and the above statement about the Schur-concavity of the function f(x) = Πⱼ xⱼ on ℝⁿ₊. ∎
More generally, if λ₁, …, λₙ are the eigenvalues of a positive matrix A, we have for k = 1, 2, …, n

Sₖ(λ₁, …, λₙ) ≤ Sₖ(a₁₁, …, aₙₙ). (II.28)

Exercise II.3.18 If A is an m × n complex matrix, then

det(AA*) ≤ Π_{i=1}^m Σ_{j=1}^n |a_ij|².

(See Exercise I.1.3.)

Exercise II.3.19 Show that the ratio Sₖ(x)/Sₖ₋₁(x) is Schur-concave on the set of positive vectors for k = 2, …, n. Hence, if A is a positive matrix, then

Sₙ(a₁₁, …, aₙₙ)/Sₙ₋₁(a₁₁, …, aₙₙ) ≥ Sₙ(λ₁, …, λₙ)/Sₙ₋₁(λ₁, …, λₙ), …,
S₂(a₁₁, …, aₙₙ)/S₁(a₁₁, …, aₙₙ) ≥ S₂(λ₁, …, λₙ)/S₁(λ₁, …, λₙ),

while S₁(a₁₁, …, aₙₙ) = tr A = S₁(λ₁, …, λₙ).

Proposition II.3.20 If A is an n × n positive definite matrix, then

(det A)^{1/n} = min { (1/n) tr AB : B is positive and det B = 1 }.

If A is positive semidefinite, then the same relation holds with min replaced by inf.

Proof. It suffices to prove the statement about positive definite matrices; the semidefinite case follows by a continuity argument. Using the spectral theorem and the cyclicity of the trace, the general case of the proposition can be reduced to the special case when A is diagonal. So, let A be diagonal with diagonal entries λ₁, …, λₙ. Then, using the arithmetic-geometric mean inequality and Theorem II.3.17, we have

(1/n) tr AB = (1/n) Σⱼ λⱼ bⱼⱼ ≥ (Πⱼ λⱼ)^{1/n} (Πⱼ bⱼⱼ)^{1/n} ≥ (det A)^{1/n} (det B)^{1/n},

for every positive matrix B. Hence, (1/n) tr AB ≥ (det A)^{1/n} if det B = 1. When B = (det A)^{1/n} A⁻¹ this becomes an equality. ∎
Corollary II.3.21 (The Minkowski Determinant Theorem) If A, B are n × n positive matrices, then

(det(A + B))^{1/n} ≥ (det A)^{1/n} + (det B)^{1/n}.
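A numerical check of the Minkowski Determinant Theorem on random positive definite pairs (again an illustrative construction, not from the book):

```python
import numpy as np

rng = np.random.default_rng(3)
n, ok = 4, True
for _ in range(10):
    X, Y = rng.normal(size=(n, n)), rng.normal(size=(n, n))
    A, B = X @ X.T + np.eye(n), Y @ Y.T + np.eye(n)   # positive definite
    lhs = np.linalg.det(A + B) ** (1 / n)
    rhs = np.linalg.det(A) ** (1 / n) + np.linalg.det(B) ** (1 / n)
    ok = ok and lhs >= rhs * (1 - 1e-9)
```

In every trial the n-th root of det(A + B) dominates the sum of the n-th roots.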
II.4 Binary Algebraic Operations and Majorisation
For x ∈ ℝⁿ we have seen in Section II.1 that

Σ_{j=1}^k xⱼ↓ = max_{|I| = k} Σ_{j ∈ I} xⱼ.

It follows that if x, y ∈ ℝⁿ, then

x + y ≺ x↓ + y↓. (II.29)

In this section we will study majorisation relations of this form for sums, products, and other functions of two vectors.

A map φ : ℝ² → ℝ is called lattice superadditive if

φ(s₁ ∨ s₂, t₁ ∨ t₂) + φ(s₁ ∧ s₂, t₁ ∧ t₂) ≥ φ(s₁, t₁) + φ(s₂, t₂). (II.30)

We will call a map φ monotone if it is either monotonically increasing or monotonically decreasing in each of its arguments. In this section we will adopt the following notation. Given φ : ℝ² → ℝ, we will denote by Φ the map from ℝⁿ × ℝⁿ to ℝⁿ defined as

Φ(x, y) = (φ(x₁, y₁), …, φ(xₙ, yₙ)). (II.31)

Example II.4.1 (i) φ(s, t) = s + t is a monotone and lattice superadditive function on ℝ².
(ii) φ(s, t) = st is a monotone and lattice superadditive function on ℝ²₊.
For (i) above we have Φ(x, y) = (x₁ + y₁, …, xₙ + yₙ) = x + y for x, y ∈ ℝⁿ, and for (ii) we have Φ(x, y) = (x₁y₁, …, xₙyₙ) = x · y.

Theorem II.4.2
If φ is a monotone and lattice superadditive function on ℝ², then

Φ(x↓, y↑) ≺_w Φ(x, y) ≺_w Φ(x↓, y↓) (II.32)

for all x, y ∈ ℝⁿ.

Proof. Note that if we apply a coordinate permutation simultaneously to x and y, then Φ(x, y) is subjected to the same permutation. So we may assume that x = x↓; i.e., x₁ ≥ x₂ ≥ ⋯ ≥ xₙ. Next note that we can find a finite sequence of vectors u(0), u(1), …, u(N) such that y↓ = u(0), y↑ = u(N), y = u(j) for some 1 ≤ j ≤ N, and each u(k+1) is obtained from u(k) by interchanging two components in such a way as to move from the arrangement y↓ to y↑; i.e., we pick up two indices i < j such that u(k)ᵢ ≥ u(k)ⱼ and interchange these two components to obtain the vector u(k+1). So, to prove (II.32) it suffices to prove

Φ(x, u(k+1)) ≺_w Φ(x, u(k)) (II.33)

for k = 0, 1, …, N − 1. Since we have already assumed x₁ ≥ x₂ ≥ ⋯ ≥ xₙ, to prove (II.33) we need to prove the two-dimensional majorisation

(φ(s₁, t₂), φ(s₂, t₁)) ≺_w (φ(s₁, t₁), φ(s₂, t₂)) (II.34)

if s₁ ≥ s₂ and t₁ ≥ t₂. Now, by the definition of weak majorisation, this is equivalent to the two inequalities

φ(s₁, t₂) ∨ φ(s₂, t₁) ≤ φ(s₁, t₁) ∨ φ(s₂, t₂),
φ(s₁, t₂) + φ(s₂, t₁) ≤ φ(s₁, t₁) + φ(s₂, t₂),

for s₁ ≥ s₂ and t₁ ≥ t₂. The first of these follows from the monotony of φ and the second from the lattice superadditivity. ∎
E
IR+.
where X · y ( x 1Y 1 , ==
Corollary II.4.4
E
�n
xl + y T
<
x+y
<
xl . y T
<w
X y
<w
(x , y)
<
•
, X nYn ) · For x, y E IR. n .
.
x l + yl .
(11.35)
xl . y l '
(11. 36 )
(x l , y l ) .
(II.37)
.
(x l , y T )
<
Proof. If x > 0 and y > 0 , this follows from (II.36) . In the general case , choose t large enough so that x + t e > 0 and y + t e > 0 and apply the • special result.
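The rearrangement inequality (II.37) is easy to check on random data (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
ok = True
for _ in range(20):
    x, y = rng.normal(size=6), rng.normal(size=6)
    xd = np.sort(x)[::-1]            # x sorted decreasingly (x down-arrow)
    yd = np.sort(y)[::-1]            # y sorted decreasingly
    yu = np.sort(y)                  # y sorted increasingly (y up-arrow)
    ok = ok and (xd @ yu - 1e-12 <= x @ y <= xd @ yd + 1e-12)
```

Pairing the sorted sequences similarly maximises the inner product; pairing them oppositely minimises it.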
The inequality (II.37) has a "mechanical" interpretation when x ≥ 0 and y ≥ 0. On a rod fixed at the origin, hang weights yᵢ at the points at distances xᵢ from the origin. The inequality (II.37) then says that the maximum moment is obtained if the heaviest weights are the farthest from the origin.
Exercise II.4.5 The function φ : ℝ² → ℝ defined as φ(s, t) = s ∧ t = min(s, t) is monotone and lattice superadditive on ℝ². Hence, for x, y ∈ ℝⁿ,

x↓ ∧ y↑ ≺_w x ∧ y ≺_w x↓ ∧ y↓.

II.5 Problems

Problem II.5.1. If a doubly stochastic matrix A is invertible and A⁻¹ is also doubly stochastic, then A is a permutation matrix (each of its entries is either 0 or 1).

Problem II.5.2. Let y ∈ ℝⁿ. The set C(y) = {x ∈ ℝⁿ : x ≺ y} is the convex hull of the points (y_σ(1), …, y_σ(n)), where σ varies over permutations. Let a ∈ ℝⁿ₊. The set {x ∈ ℝⁿ₊ : x ≺_w a} is the convex hull of the points (ε₁a_σ(1), …, εₙa_σ(n)), where σ varies over permutations and each εⱼ is either 0 or 1.

Problem II.5.3. Let y ∈ ℝⁿ. The set {x ∈ ℝⁿ : |x| ≺_w |y|} is the convex hull of points of the form (ε₁y_σ(1), …, εₙy_σ(n)), where σ varies over permutations and each εⱼ = ±1.

Problem II.5.4. Let A = [A₁₁ A₁₂; A₂₁ A₂₂] be a 2 × 2 block matrix, and let C(A) = [A₁₁ 0; 0 A₂₂]. If U = [I 0; 0 −I], then we can write

C(A) = ½ (A + UAU*).

Let λ(A) and s(A) denote the n-vectors whose coordinates are the eigenvalues and the singular values of A, respectively. Use (II.18) to show that

s(C(A)) ≺_w s(A).

If A is Hermitian, use (II.16) to show that

λ(C(A)) ≺ λ(A).

Problem II.5.5. More generally, let P₁, …, P_r be a family of mutually orthogonal projections in ℂⁿ such that ΣPⱼ = I. Then the operation

C(A) = Σⱼ Pⱼ A Pⱼ

is called a pinching of A. In an appropriate choice of basis this means that if A = (A_ij), 1 ≤ i, j ≤ r, is written in block form, then C(A) = diag(A₁₁, A₂₂, …, A_rr), the block-diagonal matrix obtained by replacing the off-diagonal blocks of A by zeros. Each such pinching is a product of r − 1 pinchings of the 2 × 2 type introduced in Problem II.5.4. Show that for every pinching C,

s(C(A)) ≺_w s(A) (II.38)

for all matrices A, and

λ(C(A)) ≺ λ(A) (II.39)

for all Hermitian matrices A. When P₁, …, Pₙ are the projections onto the coordinate axes, we get as a special case of (II.38) above

|tr A| ≤ Σ_{j=1}^n sⱼ(A) = ||A||₁. (II.40)

From (II.39) we get as a special case Schur's Theorem

diag(A) ≺ λ(A),
which we saw before in Exercise II.1.12.

Problem II.5.6. Let A be positive. Then

det A ≤ det C(A) (II.41)

for every pinching C. This is called Fischer's inequality and includes the Hadamard Determinant Theorem as a special case.

Problem II.5.7. For each k = 1, 2, …, n and for each pinching C show that for positive definite A

Sₖ(λ(A)) ≤ Sₖ(λ(C(A))), (II.42)

where Sₖ(λ(A)) denotes the kth elementary symmetric polynomial of the eigenvalues of A. This inequality, due to Ostrowski, includes (II.28) as a special case. It also includes (II.41) as a special case.

Problem II.5.8. If ⋀ᵏA denotes the kth antisymmetric tensor power of A, then the above inequality can be written as

tr ⋀ᵏA ≤ tr ⋀ᵏ(C(A)). (II.43)

The operator inequality

⋀ᵏA ≤ ⋀ᵏ(C(A))

is not always true. This is shown by the following example. Let

A = ( 2 0 0 1 )        P = ( 1 0 0 0 )
    ( 0 1 1 0 )            ( 0 1 0 0 )
    ( 0 1 2 0 )            ( 0 0 0 0 )
    ( 1 0 0 1 ),           ( 0 0 0 0 ),

and let C be the pinching induced by the pair of projections P and I − P. (The space ⋀²ℂ⁴ is 6-dimensional.)
Problem II.5.9. Let λ = {λ₁, …, λₙ} and μ = {μ₁, …, μₙ} be two n-tuples of complex numbers. Let

d(λ, μ) = min_σ max_{1 ≤ j ≤ n} |λⱼ − μ_σ(j)|,

where the minimum is taken over all permutations σ on n symbols. This is called the optimal matching distance between the unordered n-tuples λ and μ. It defines a metric on the space ℂⁿ_sym of such n-tuples. Show that we also have

d(λ, μ) = max_{I, J ⊆ {1, 2, …, n}, |I| + |J| = n + 1} min_{i ∈ I, j ∈ J} |λᵢ − μⱼ|.
Problem II.5.10. This problem gives a refinement of Hall's Theorem under an additional assumption that is often fulfilled in matching problems. In the notations introduced at the beginning of Section II.2, define

Bᵢ = {b : the boy b knows the girl gᵢ}, 1 ≤ i ≤ n.

This is the set of boys known to the girl gᵢ. Let

B_{i₁⋯iₖ} = ∪_{r=1}^k B_{iᵣ}, 1 ≤ i₁ < ⋯ < iₖ ≤ n.

Suppose that for each k = 1, 2, …, ⌈n/2⌉ and for every choice of indices 1 ≤ i₁ < ⋯ < iₖ ≤ n,

|B_{i₁⋯iₖ}| ≥ min(2k, n).

Show that then |B_{i₁⋯iₖ}| ≥ k for every k = 1, 2, …, n and every choice of indices. Hence a compatible matching between B and G exists.

Problem II.5.11. (i) Show that every symmetric gauge function is continuous.
(ii) Show that if Φ is a symmetric gauge function, then

Φ_∞(x) ≤ Φ(x) ≤ Φ₁(x) for all x ∈ ℝⁿ.

(iii) If 0 ≤ tⱼ ≤ 1, then

Φ(t₁x₁, …, tₙxₙ) ≤ Φ(x₁, …, xₙ).

(iv) Every symmetric gauge function is monotone on ℝⁿ₊.
(v) If x, y ∈ ℝⁿ and |x| ≤ |y|, then Φ(x) ≤ Φ(y) for every symmetric gauge function Φ.
(vi) If x, y ∈ ℝⁿ₊, then x ≺_w y if and only if Φ(x) ≤ Φ(y) for every symmetric gauge function Φ.

Problem II.5.12. Let f : ℝ₊ → ℝ₊ be a concave function such that f(0) = 0.
(i) Show that f is subadditive: f(a + b) ≤ f(a) + f(b) for all a, b ∈ ℝ₊.
(ii) Let Φ be defined as

Φ(x) = Σⱼ f(xⱼ).

Then Φ is Schur-concave (on ℝᵐ₊ for every m).
(iii) Note that for x, y ∈ ℝⁿ₊, (x, y) ≺ (x + y, 0) in ℝ²ⁿ₊.
(iv) From (ii) and (iii) conclude that the function

F(x) = Σ_{j=1}^n f(|xⱼ|)

is subadditive on ℝⁿ.
(v) Special examples lead to the following inequalities for vectors x, y ∈ ℝⁿ:

Σ_{j=1}^n |xⱼ + yⱼ|ᵖ ≤ Σ_{j=1}^n |xⱼ|ᵖ + Σ_{j=1}^n |yⱼ|ᵖ, 0 < p ≤ 1,

Σ_{j=1}^n log(1 + |xⱼ + yⱼ|) ≤ Σ_{j=1}^n log(1 + |xⱼ|) + Σ_{j=1}^n log(1 + |yⱼ|).
Problem II.5.13. Show that a map φ : ℝ² → ℝ is lattice superadditive if and only if

φ(x₁ + δ₁, x₂ − δ₂) + φ(x₁ − δ₁, x₂ + δ₂) ≤ φ(x₁ + δ₁, x₂ + δ₂) + φ(x₁ − δ₁, x₂ − δ₂)

for all (x₁, x₂) and for all δ₁, δ₂ ≥ 0. If φ is twice differentiable, this is equivalent to the condition

∂²φ/∂s∂t ≥ 0.
Problem II.5.14. Let φ : ℝ² → ℝ be a monotone increasing lattice superadditive function, and let f be a monotone increasing and convex function from ℝ to ℝ. Show that if φ and f are twice differentiable, then the composition f ∘ φ is monotone and lattice superadditive. When φ(s, t) = s + t, show that this is also true if f is monotone decreasing. These statements are also true without any differentiability assumptions.

Problem II.5.15. For x, y ∈ ℝⁿ₊,

−log(x↓ + y↑) ≺_w −log(x + y) ≺_w −log(x↓ + y↓),
log(x↓ · y↑) ≺_w log(x · y) ≺_w log(x↓ · y↓).

From the first of these relations it follows that

Π_{j=1}^n (xⱼ↓ + yⱼ↓) ≤ Π_{j=1}^n (xⱼ + yⱼ) ≤ Π_{j=1}^n (xⱼ↓ + yⱼ↑).
Problem II.5.16. Let x, y, u be vectors in ℝⁿ all having their coordinates in decreasing order. Show that
(i) ⟨x, u⟩ ≤ ⟨y, u⟩ if x ≺ y;
(ii) ⟨x, u⟩ ≤ ⟨y, u⟩ if x ≺_w y and u ∈ ℝⁿ₊.
In particular, this means that if x, y ∈ ℝⁿ, x ≺_w y, and u ∈ ℝⁿ₊, then

⟨x↓, u↓⟩ ≤ ⟨y↓, u↓⟩.

[Use Theorem II.3.14 or the telescopic summation identity

Σ_{j=1}^k aⱼbⱼ = Σ_{j=1}^k (aⱼ − aⱼ₊₁)(b₁ + ⋯ + bⱼ),

where aⱼ, bⱼ, 1 ≤ j ≤ k, are any numbers and aₖ₊₁ = 0.]
II.6 Notes and References
Many of the results of this chapter can be found in the classic Inequalities by G.H. Hardy, J.E. Littlewood, and G. Polya, Cambridge University Press, 1934, which gave the first systematic treatment of this theme. The more recent treatise Inequalities: Theory of Majorization and Its Applications by A.W. Marshall and I. Olkin, Academic Press, 1979, is a much more detailed and exhaustive text devoted entirely to the study of majorisation.
It is an invaluable resource on this topic. For the reader who wants a quicker introduction to the essentials of majorisation and its applications in linear algebra, the survey article Majorization, doubly stochastic matrices and comparison of eigenvalues by T. Ando, Linear Algebra and Its Applications, 118 (1989) 163-248, is undoubtedly the ideal course. Our presentation is strongly influenced by this article, from which we have freely borrowed. The distance d(λ, μ) introduced in Problem II.5.9 is commonly employed in the study of variation of roots of polynomials and eigenvalues of matrices, since these are known with no preferred ordering. See Chapter 6. The result of Problem II.5.10 is due to L. Elsner, C. Johnson, J. Ross, and J. Schonheim, On a generalised matching problem arising in estimating the eigenvalue variation of two matrices, European J. Combinatorics, 4 (1983) 133-136.
Several of the theorems in this chapter have converses. For illustration we mention two of these. Schur's Theorem (II.14) has a converse; it says that if d and λ are real vectors with d ≺ λ, then there exists a Hermitian matrix A whose diagonal entries are the components of d and whose eigenvalues are the components of λ. Weyl's Majorant Theorem (II.3.6) has a converse; it says that if λ₁, …, λₙ are complex numbers and s₁, …, sₙ are positive real numbers ordered as |λ₁| ≥ ⋯ ≥ |λₙ| and s₁ ≥ ⋯ ≥ sₙ, and if

Π_{j=1}^k |λⱼ| ≤ Π_{j=1}^k sⱼ for 1 ≤ k ≤ n,
Π_{j=1}^n |λⱼ| = Π_{j=1}^n sⱼ,

then there exists an n × n matrix A whose eigenvalues are λ₁, …, λₙ and singular values s₁, …, sₙ. For more such theorems, see the book by Marshall and Olkin cited above.

Two results very close to those in II.3.16-II.3.21 and II.5.6-II.5.8 are given below. M. Marcus and L. Lopes, Inequalities for symmetric functions and Hermitian matrices, Canad. J. Math., 9 (1957) 305-312, showed that the map Φ : ℝⁿ₊ → ℝ given by Φ(x) = (Sₖ(x))^{1/k} is Schur-concave for 1 ≤ k ≤ n. Using this they showed that for positive matrices A, B

(Sₖ(λ(A + B)))^{1/k} ≥ (Sₖ(λ(A)))^{1/k} + (Sₖ(λ(B)))^{1/k}. (II.44)

This can also be expressed by saying that the map A ↦ (tr ⋀ᵏA)^{1/k} is concave on the set of positive matrices. For k = n, this reduces to the statement

[det(A + B)]^{1/n} ≥ [det A]^{1/n} + [det B]^{1/n},

which is the Minkowski determinant inequality.
56
II.
Majorisation and Doubly Stochastic Matrices
E . H. Lieb, Convex trace functions and the Wigner YanaseDyson conjec ture, Advances in Math. , 1 1 ( 1973) 267288, proved some striking operator inequalities in connection with the W.Y.D. conj ecture on the concavity of entropy in quantum mechanics. These were proved by different techniques and extended in other directions by T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl. , 26 ( 1979) 203241 . One consequence of these results is the inequality (11.45) for all positive matrices A, B and for all implies that
k == 1 , 2,
. . . , n.
In particular, this
When k == n , this reduces to the Minkowski determinant inequality. Some of these inequalities are proved in Chapter 9.
III Variational Principles for Eigenvalues
In this chapter we will study inequalities that are used for localising the spectrum of a Hermitian operator. Such results are motivated by several interrelated considerations. It is not always easy to calculate the eigenvalues of an operator. However, in many scientific problems it is enough to know that the eigenvalues lie in some specified intervals. Such information is provided by the inequalities derived here. While the functional dependence of the eigenvalues on an operator is quite complicated, several interesting relationships between the eigenvalues of two operators A, B and those of their sum A + B are known. These relations are consequences of variational principles. When the operator B is small in comparison to A, then A + B is considered as a perturbation of A or an approximation to A. The inequalities of this chapter then lead to perturbation bounds or error bounds. Many of the results of this chapter lead to generalisations, or analogues, or open problems in other settings discussed in later chapters.

III.1 The Minimax Principle for Eigenvalues

The following notation will be used throughout this chapter. If A, B are Hermitian operators, we will write their spectral resolutions as Auⱼ = αⱼuⱼ, Bvⱼ = βⱼvⱼ, 1 ≤ j ≤ n, always assuming that the eigenvectors uⱼ and the eigenvectors vⱼ are orthonormal and that α₁ ≥ α₂ ≥ ⋯ ≥ αₙ and β₁ ≥ β₂ ≥ ⋯ ≥ βₙ. When the dependence of the eigenvalues on the operator is to be emphasized, we will write λ↓(A) for the vector with components λ₁↓(A), …, λₙ↓(A), where λⱼ↓(A) are arranged in decreasing order; i.e., λⱼ↓(A) = αⱼ. Similarly, λ↑(A) will denote the vector with components λⱼ↑(A), where λⱼ↑(A) = αₙ₋ⱼ₊₁, 1 ≤ j ≤ n.
.
Theorem 111. 1 . 1 {Poincare's Inequality) Let A be a Hermitian operator on 1t and let M be any kdimensional subspace of 1t. Then there exist unit vectors x, y in M such that (x, Ax ) < Ak (A) and (y, A y ) > � k (A) . Proof. Let N be the subspace spanned by the eigenvectors A corresponding to the eigenvalues Ai (A), . . . , A� (A). Theil:
dim M
+
dim N == n +
U k , . . . , Un
of
1,
and hence the intersection of M and N is nontrivial. Pick up a unit vector x in M n N. Then we can write x
==
n
n
j=k
j= k
L�j Uj , where L 1�i l 2 = 1 . Hence,
n
n
j= k
j=k
This proves the first statement. The second can be obtained by applying this to the operator A instead of A. Equally well, one can repeat the argument, applying it to the given kdimensional space M and the ( n • k + I)dimensional space spanned by u 1 , u2 , . . . , Un  k + l · Corollary III. 1 . 2 {The Minimax Principle) Let A be a Hermitian opera tor on 7t. Then
max
M C 'H. d i m M =k
min
dim
min (x, Ax) EM
x l lx l l=l
M C 'H. M=n  k+l
max (x , A x ) .
xEM l l x l l=l
Proof. By Poincare's inequality, if M is any kdimensional subspace of 1t, then min(x, Ax) < A k ( A ) , where x varies over unit vectors in M . But if M is the span of { u 1 , . , uk } , then this last inequality becomes an equality. That proves the first statement. The second can be obtained from the first • by applying it to A instead of A. X
.
.
This minimax principle is sometimes called the CourantFischerWeyl minimax principle. Exercise 111. 1.3 In the proof of the minimax principle we made a par ticular choice of M . This choice is not always unique. For example, if A k (A ) == Ak + l (A) , there would be a whole 1parameter family of such subspaces obtained by choosing different eigenvectors of A belonging to A.i ( A) .
This is not surprising. More surprising, perhaps even shocking, is the fact that we could have λₖ↓(A) = min{⟨x, Ax⟩ : x ∈ M, ||x|| = 1} even for a k-dimensional subspace M that is not spanned by eigenvectors of A. Find an example where this happens. (There is a simple example.)

Exercise III.1.4 In the proof of Theorem III.1.1 we used a basic principle of linear algebra:

dim(M₁ ∩ M₂) ≥ dim M₁ + dim M₂ − dim(M₁ + M₂) ≥ dim M₁ + dim M₂ − n,

for any two subspaces M₁ and M₂ of an n-dimensional vector space. Derive the corresponding inequality for an intersection of three subspaces.
An equivalent formulation of the Poincaré inequality is in terms of compressions. Recall that if $V$ is an isometry of a Hilbert space $\mathcal{M}$ into $\mathcal{H}$, then the compression of $A$ by $V$ is defined to be the operator $B = V^*AV$. Usually we suppose that $\mathcal{M}$ is a subspace of $\mathcal{H}$ and $V$ is the injection map. Then $A$ has a block-matrix representation in which $B$ is the northwest corner entry:
$$A = \begin{pmatrix} B & * \\ * & * \end{pmatrix}.$$
We say that $B$ is the compression of $A$ to the subspace $\mathcal{M}$.

Corollary III.1.5 (Cauchy's Interlacing Theorem) Let $A$ be a Hermitian operator on $\mathcal{H}$, and let $B$ be its compression to an $(n-k)$-dimensional subspace $\mathcal{N}$. Then for $j = 1, 2, \ldots, n-k$,
$$\lambda_j^{\downarrow}(A) \;\ge\; \lambda_j^{\downarrow}(B) \;\ge\; \lambda_{j+k}^{\downarrow}(A). \tag{III.1}$$

Proof. For any $j$, let $\mathcal{M}$ be the span of the eigenvectors $v_1, \ldots, v_j$ of $B$ corresponding to its eigenvalues $\lambda_1^{\downarrow}(B), \ldots, \lambda_j^{\downarrow}(B)$. Then $(x, Bx) = (x, Ax)$ for all $x \in \mathcal{M}$. Hence,
$$\lambda_j^{\downarrow}(B) \;=\; \min_{\substack{x\in\mathcal{M} \\ \|x\|=1}} (x, Bx) \;=\; \min_{\substack{x\in\mathcal{M} \\ \|x\|=1}} (x, Ax) \;\le\; \lambda_j^{\downarrow}(A).$$
This proves the first assertion in (III.1). Now apply this to $-A$ and its compression $-B$ to the given subspace $\mathcal{N}$. Note that
$$\lambda_j^{\downarrow}(-A) = -\lambda_{n-j+1}^{\downarrow}(A), \qquad \lambda_j^{\downarrow}(-B) = -\lambda_{(n-k)-j+1}^{\downarrow}(B) \quad \text{for all } 1 \le j \le n-k.$$
Replacing $j$ by $(n-k)-j+1$, the first inequality then yields $\lambda_j^{\downarrow}(B) \ge \lambda_{j+k}^{\downarrow}(A)$, which is the second inequality in (III.1). ∎
The above inequalities look especially nice when $B$ is the compression of $A$ to an $(n-1)$-dimensional subspace: then they say that
$$\lambda_1^{\downarrow}(A) \ge \lambda_1^{\downarrow}(B) \ge \lambda_2^{\downarrow}(A) \ge \lambda_2^{\downarrow}(B) \ge \cdots \ge \lambda_{n-1}^{\downarrow}(B) \ge \lambda_n^{\downarrow}(A). \tag{III.2}$$
This explains why this is called an interlacing theorem.
Exercise III.1.6 The Poincaré inequality, the minimax principle, and the interlacing theorem can be derived from each other. Find an independent proof for each of them using Exercise III.1.4. (This "dimension-counting" for intersections of subspaces will be used in later sections too.)

Exercise III.1.7 Let $B$ be the compression of a Hermitian operator $A$ to an $(n-1)$-dimensional space $\mathcal{M}$, and write $\alpha_j = \lambda_j^{\downarrow}(A)$, $\beta_j = \lambda_j^{\downarrow}(B)$. If, for some $k$, the space $\mathcal{M}$ contains the vectors $u_1, \ldots, u_k$, then $\beta_j = \alpha_j$ for $1 \le j \le k$. If $\mathcal{M}$ contains $u_k, \ldots, u_n$, then $\alpha_j = \beta_{j-1}$ for $k < j \le n$.
Exercise III.1.8 (i) Let $A_n$ be the $n \times n$ tridiagonal matrix with entries $a_{ii} = 2\cos\theta$ for all $i$, $a_{ij} = 1$ if $|i-j| = 1$, and $a_{ij} = 0$ otherwise. Show that the determinant of $A_n$ is $\sin(n+1)\theta / \sin\theta$.

(ii) Show that the eigenvalues of $A_n$ are given by $2\bigl(\cos\theta + \cos\frac{j\pi}{n+1}\bigr)$, $1 \le j \le n$.

(iii) The special case when $a_{ii} = 2$ for all $i$ arises in Rayleigh's finite-dimensional approximation to the differential equation of a vibrating string. In this case the eigenvalues of $A_n$ are
$$\lambda_j^{\uparrow}(A_n) = 4\sin^2 \frac{j\pi}{2(n+1)}, \qquad 1 \le j \le n.$$

(iv) Note that, for each $k < n$, the matrix $A_{n-k}$ is a compression of $A_n$. This example provides a striking illustration of Cauchy's interlacing theorem.
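The tridiagonal family of Exercise III.1.8 is convenient for checking these statements numerically. A sketch, assuming NumPy (the size and the value of $\theta$ are arbitrary choices for illustration):

```python
import numpy as np

def A_n(n, theta=0.0):
    """Tridiagonal matrix of Exercise III.1.8: 2*cos(theta) on the
    diagonal, 1 on the first off-diagonals."""
    return (2 * np.cos(theta) * np.eye(n)
            + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1))

n, theta = 8, 0.7

# (i) det A_n = sin((n+1) theta) / sin(theta)
assert np.isclose(np.linalg.det(A_n(n, theta)),
                  np.sin((n + 1) * theta) / np.sin(theta))

# (iii) with 2 on the diagonal, the spectrum is {4 sin^2(j pi / (2(n+1)))}
spec = np.sort(np.linalg.eigvalsh(A_n(n)))
formula = np.sort(4 * np.sin(np.arange(1, n + 1) * np.pi / (2 * (n + 1))) ** 2)
assert np.allclose(spec, formula)

# (iv) A_{n-1} is a compression of A_n: the interlacing (III.2)
big = np.sort(np.linalg.eigvalsh(A_n(n)))[::-1]      # decreasing order
small = np.sort(np.linalg.eigvalsh(A_n(n - 1)))[::-1]
assert all(big[j] >= small[j] >= big[j + 1] for j in range(n - 1))
```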
It is illuminating to think of the variational characterisation of eigenvalues as the solution of an extremal problem in analysis. If $A$ is a Hermitian operator on $\mathbb{R}^n$, the search for the top eigenvalue of $A$ is just the problem of maximising the function $F(x) = x^*Ax$ subject to the constraint that the function $G(x) = x^*x$ has the fixed value 1. The extremum must occur at a critical point, and by the method of Lagrange multipliers the condition for a point $x$ to be critical is $\nabla F(x) = \lambda \nabla G(x)$, which becomes $Ax = \lambda x$. Our earlier arguments got to the extremum problem from the algebraic eigenvalue problem, and this argument has gone the other way.

If additional constraints are imposed, the maximum can only decrease. Confining $x$ to an $(n-k)$-dimensional subspace is equivalent to imposing
$k$ linearly independent linear constraints on it. These can be expressed as $H_j(x) = 0$, where $H_j(x) = w_j^* x$ and the vectors $w_j$, $1 \le j \le k$, are linearly independent. Introducing additional Lagrange multipliers $\mu_j$, the condition for a critical point is now $\nabla F(x) = \lambda \nabla G(x) + \sum_j \mu_j \nabla H_j(x)$; i.e., $Ax - \lambda x$ is no longer required to be 0 but merely to be a linear combination of the $w_j$.

Look at this in block-matrix terms. Our space has been decomposed into a direct sum of a space $\mathcal{N}$ and its orthogonal complement, which is spanned by $\{w_1, \ldots, w_k\}$. Relative to this direct sum decomposition we can write
$$A = \begin{pmatrix} B & C \\ C^* & D \end{pmatrix}.$$
Our vector $x$ is now constrained to lie in $\mathcal{N}$, and the requirement for it to be a critical point is that $(A - \lambda I)\binom{x}{0}$ lies in $\mathcal{N}^{\perp}$. This is exactly the requirement that $x$ be an eigenvector of the compression $B$.

If two interlacing sets of real numbers are given, they can be realised as the eigenvalues of a Hermitian matrix and one of its compressions. This is a converse to one of the theorems proved above:

Theorem III.1.9 Let $\alpha_j$, $1 \le j \le n$, and $\beta_i$, $1 \le i \le n-1$, be real numbers such that
$$\alpha_1 \ge \beta_1 \ge \alpha_2 \ge \beta_2 \ge \cdots \ge \beta_{n-1} \ge \alpha_n.$$
Then there exists a compression of the diagonal matrix $A = \operatorname{diag}(\alpha_1, \ldots, \alpha_n)$ having $\beta_i$, $1 \le i \le n-1$, as its eigenvalues.
Proof. Let $Au_j = \alpha_j u_j$; then $\{u_j\}$ constitute the standard orthonormal basis in $\mathbb{C}^n$. There is a one-to-one correspondence between $(n-1)$-dimensional orthogonal projection operators $P$ and unit vectors $z$ given by $P = I - zz^*$. Each unit vector, in turn, is completely characterised by its coordinates $\zeta_j$ with respect to the basis $u_j$. We have $z = \sum_j \zeta_j u_j$, where $\zeta_j = (u_j, z)$ and $\sum_j |\zeta_j|^2 = 1$. We will find conditions on the numbers $\zeta_j$ so that, for the corresponding orthoprojector $P = I - zz^*$, the compression of $A$ to the range of $P$ has eigenvalues $\beta_i$. Since $PAP$ is a Hermitian operator of rank $n-1$, we must have
$$\prod_{i=1}^{n-1} (\lambda - \beta_i) \;=\; \operatorname{tr} \wedge^{n-1} \bigl[P(\lambda I - A)P\bigr].$$
If $E_j$ are the projectors defined as $E_j = I - u_j u_j^*$, then
$$\wedge^{n-1}(\lambda I - A) \;=\; \sum_{j=1}^{n} \prod_{k \ne j} (\lambda - \alpha_k) \, \wedge^{n-1} E_j.$$
Using the result of Problem I.6.9 one sees that
$$\wedge^{n-1} P \cdot \wedge^{n-1} E_j \cdot \wedge^{n-1} P \;=\; |\zeta_j|^2 \, \wedge^{n-1} P.$$
Since $\operatorname{rank} \wedge^{n-1} P = 1$, the above three relations give
$$\prod_{i=1}^{n-1} (\lambda - \beta_i) \;=\; \sum_{j=1}^{n} |\zeta_j|^2 \prod_{k \ne j} (\lambda - \alpha_k), \tag{III.3}$$
an identity between polynomials of degree $n-1$, which the $\zeta_j$ must satisfy if $B$ has spectrum $\{\beta_i\}$. We will show that the interlacing inequalities between the $\alpha_j$ and $\beta_i$ ensure that we can find $\zeta_j$ satisfying (III.3) and $\sum_{j=1}^{n} |\zeta_j|^2 = 1$. We may assume, without loss of generality, that the $\alpha_j$ are distinct. Put
$$\gamma_j \;=\; \frac{\prod_{i=1}^{n-1} (\alpha_j - \beta_i)}{\prod_{k \ne j} (\alpha_j - \alpha_k)}, \qquad 1 \le j \le n. \tag{III.4}$$
The interlacing property ensures that all $\gamma_j$ are nonnegative. Now choose $\zeta_j$ to be any complex numbers with $|\zeta_j|^2 = \gamma_j$. Then the equation (III.3) is satisfied for the values $\lambda = \alpha_j$, $1 \le j \le n$, and hence it is satisfied for all $\lambda$. Comparing the leading coefficients of the two sides of (III.3), we see that $\sum_j |\zeta_j|^2 = 1$. This completes the proof. ∎
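The proof of Theorem III.1.9 is constructive, and can be traced numerically. The sketch below (NumPy assumed; the numbers $\alpha$, $\beta$ are an arbitrary strictly interlacing example) builds the weights $\gamma_j$ of (III.4), forms the unit vector $z$ with $|\zeta_j|^2 = \gamma_j$, and checks that the compression of $\operatorname{diag}(\alpha)$ to the orthocomplement of $z$ has eigenvalues $\beta$.

```python
import numpy as np

alpha = np.array([5.0, 3.0, 1.0, -2.0])      # alpha_1 > ... > alpha_n
beta = np.array([4.0, 2.0, 0.0])             # strictly interlaces alpha
n = len(alpha)

# Weights gamma_j from (III.4); interlacing makes them nonnegative, and
# comparing leading coefficients in (III.3) forces them to sum to 1.
gamma = np.array([np.prod(alpha[j] - beta)
                  / np.prod(alpha[j] - np.delete(alpha, j))
                  for j in range(n)])
assert np.all(gamma >= 0) and np.isclose(gamma.sum(), 1.0)

z = np.sqrt(gamma)                           # unit vector, |zeta_j|^2 = gamma_j

# Orthonormal basis of the range of P = I - z z*: complete z to a basis.
Q, _ = np.linalg.qr(np.column_stack([z, np.eye(n)[:, :n - 1]]))
V = Q[:, 1:]                                 # n x (n-1), spans the z-perp space

B = V.T @ np.diag(alpha) @ V                 # compression of diag(alpha)
assert np.allclose(np.sort(np.linalg.eigvalsh(B))[::-1], beta)
```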
III.2 Weyl's Inequalities

Several relations between the eigenvalues of Hermitian matrices $A$, $B$, and $A+B$ can be obtained using the ideas of the previous section. Most of these results were first proved by H. Weyl.
Theorem III.2.1 Let $A$, $B$ be $n \times n$ Hermitian matrices. Then
$$\lambda_j^{\downarrow}(A+B) \;\le\; \lambda_i^{\downarrow}(A) + \lambda_{j-i+1}^{\downarrow}(B) \quad \text{for } i \le j, \tag{III.5}$$
$$\lambda_j^{\downarrow}(A+B) \;\ge\; \lambda_i^{\downarrow}(A) + \lambda_{j-i+n}^{\downarrow}(B) \quad \text{for } i \ge j. \tag{III.6}$$

Proof. Let $u_j$, $v_j$, and $w_j$ denote the eigenvectors of $A$, $B$, and $A+B$, respectively, corresponding to their eigenvalues in decreasing order. Let $i \le j$. Consider the three subspaces spanned by $\{w_1, \ldots, w_j\}$, $\{u_i, \ldots, u_n\}$, and $\{v_{j-i+1}, \ldots, v_n\}$, respectively. These have dimensions $j$, $n-i+1$, and $n-j+i$; since these add up to $2n+1$, by Exercise III.1.4 the three subspaces have a nontrivial intersection. Let $x$ be a unit vector in their intersection. Then
$$\lambda_j^{\downarrow}(A+B) \;\le\; (x, (A+B)x) \;=\; (x, Ax) + (x, Bx) \;\le\; \lambda_i^{\downarrow}(A) + \lambda_{j-i+1}^{\downarrow}(B).$$
This proves (III.5). If $A$ and $B$ in this inequality are replaced by $-A$ and $-B$, we get (III.6). ∎

Corollary III.2.2 For each $j = 1, 2, \ldots, n$,
$$\lambda_j^{\downarrow}(A) + \lambda_n^{\downarrow}(B) \;\le\; \lambda_j^{\downarrow}(A+B) \;\le\; \lambda_j^{\downarrow}(A) + \lambda_1^{\downarrow}(B). \tag{III.7}$$

Proof. Put $i = j$ in the above inequalities. ∎
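Weyl's inequalities lend themselves to a brute-force numerical check over all admissible index pairs. A sketch assuming NumPy (with 0-based indices, so each subscript shifts down by one):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

def rand_herm(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

def eig_desc(M):
    """Eigenvalues of a Hermitian matrix in decreasing order."""
    return np.sort(np.linalg.eigvalsh(M))[::-1]

A, B = rand_herm(n), rand_herm(n)
a, b, c = eig_desc(A), eig_desc(B), eig_desc(A + B)
tol = 1e-10

for j in range(n):
    for i in range(j + 1):          # i <= j: (III.5), lambda_{j-i+1} -> index j-i
        assert c[j] <= a[i] + b[j - i] + tol
    for i in range(j, n):           # i >= j: (III.6), lambda_{j-i+n} -> index j-i+n-1
        assert c[j] >= a[i] + b[j - i + n - 1] - tol

# (III.7) is the special case i = j.
assert np.all(c <= a + b[0] + tol) and np.all(c >= a + b[-1] - tol)
```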
It is customary to state these and related results as perturbation theorems, whereby $B$ is a perturbation of $A$; that is, $B = A + H$. In many of the applications $H$ is small and the object is to give bounds for the distance of $\lambda(B)$ from $\lambda(A)$ in terms of $H = B - A$.

Corollary III.2.3 (Weyl's Monotonicity Theorem) If $H$ is positive, then
$$\lambda_j^{\downarrow}(A+H) \;\ge\; \lambda_j^{\downarrow}(A) \quad \text{for all } j.$$

Proof. By the preceding corollary, $\lambda_j^{\downarrow}(A+H) \ge \lambda_j^{\downarrow}(A) + \lambda_n^{\downarrow}(H)$, and all the eigenvalues of $H$ are nonnegative. Alternately, note that $(x, (A+H)x) \ge (x, Ax)$ for all $x$ and use the minimax principle. ∎

Exercise III.2.4 If $H$ is positive and has rank $k$, then
$$\lambda_j^{\downarrow}(A+H) \;\ge\; \lambda_j^{\downarrow}(A) \;\ge\; \lambda_{j+k}^{\downarrow}(A+H) \quad \text{for } j = 1, 2, \ldots, n-k.$$
This is analogous to Cauchy's interlacing theorem.
Exercise III.2.5 Let $H$ be any Hermitian matrix. Then
$$\lambda_j^{\downarrow}(A) - \|H\| \;\le\; \lambda_j^{\downarrow}(A+H) \;\le\; \lambda_j^{\downarrow}(A) + \|H\|.$$
This can be restated as:

Corollary III.2.6 (Weyl's Perturbation Theorem) Let $A$ and $B$ be Hermitian matrices. Then
$$\max_j |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \;\le\; \|A - B\|.$$
Exercise III.2.7 For Hermitian matrices $A$, $B$, we have
$$\|A - B\| \;\le\; \max_j |\lambda_j^{\downarrow}(A) - \lambda_j^{\uparrow}(B)|.$$
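Both perturbation bounds are easy to confirm numerically. The sketch below (NumPy assumed; the random symmetric pair is illustrative) checks Corollary III.2.6 and the companion upper bound of Exercise III.2.7, with $\|\cdot\|$ the operator norm.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
X = rng.standard_normal((n, n)); A = (X + X.T) / 2
Y = rng.standard_normal((n, n)); B = (Y + Y.T) / 2

a_desc = np.sort(np.linalg.eigvalsh(A))[::-1]   # lambda^down(A)
b_desc = np.sort(np.linalg.eigvalsh(B))[::-1]   # lambda^down(B)
b_incr = b_desc[::-1]                           # lambda^up(B)
opnorm = np.linalg.norm(A - B, 2)               # operator norm = top singular value

tol = 1e-10
assert np.max(np.abs(a_desc - b_desc)) <= opnorm + tol   # Corollary III.2.6
assert opnorm <= np.max(np.abs(a_desc - b_incr)) + tol   # Exercise III.2.7
```

Read together, the two assertions are exactly Theorem III.2.8 below for the operator norm.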
It is useful to have another formulation of the above two inequalities, which will be in conformity with more general results proved later. We will denote by $\operatorname{Eig} A$ a diagonal matrix whose diagonal entries are the eigenvalues of $A$. If these are arranged in decreasing order, we write this matrix as $\operatorname{Eig}^{\downarrow}(A)$; if in increasing order, as $\operatorname{Eig}^{\uparrow}(A)$. The results of Corollary III.2.6 and Exercise III.2.7 can then be stated as:

Theorem III.2.8 For any two Hermitian matrices $A$, $B$,
$$\|\operatorname{Eig}^{\downarrow}(A) - \operatorname{Eig}^{\downarrow}(B)\| \;\le\; \|A - B\| \;\le\; \|\operatorname{Eig}^{\downarrow}(A) - \operatorname{Eig}^{\uparrow}(B)\|.$$

Weyl's inequality (III.5) is equivalent to an inequality due to Aronszajn connecting the eigenvalues of a Hermitian matrix to those of any two complementary principal submatrices. For this let us rewrite (III.5) as
$$\lambda_{i+j-1}^{\downarrow}(A+B) \;\le\; \lambda_i^{\downarrow}(A) + \lambda_j^{\downarrow}(B), \tag{III.8}$$
for all indices $i$, $j$ such that $i + j - 1 \le n$.

Theorem III.2.9 (Aronszajn's Inequality) Let $C$ be an $n \times n$ Hermitian matrix partitioned as
$$C = \begin{pmatrix} A & X \\ X^* & B \end{pmatrix},$$
where $A$ is a $k \times k$ matrix. Let the eigenvalues of $A$, $B$, and $C$ be $\alpha_1 \ge \cdots \ge \alpha_k$, $\beta_1 \ge \cdots \ge \beta_{n-k}$, and $\gamma_1 \ge \cdots \ge \gamma_n$, respectively. Then
$$\gamma_{i+j-1} + \gamma_n \;\le\; \alpha_i + \beta_j \quad \text{for all } i, j \text{ with } i + j - 1 \le n. \tag{III.9}$$
Proof. First assume that $\gamma_n = 0$. Then $C$ is a positive matrix. Hence $C = D^*D$ for some matrix $D$. Partition $D$ as $D = (D_1 \ D_2)$, where $D_1$ has $k$ columns. Then
$$C = \begin{pmatrix} D_1^* D_1 & D_1^* D_2 \\ D_2^* D_1 & D_2^* D_2 \end{pmatrix}, \quad \text{so that } A = D_1^* D_1 \text{ and } B = D_2^* D_2.$$
Note that $DD^* = D_1 D_1^* + D_2 D_2^*$. Now the nonzero eigenvalues of the matrix $C = D^*D$ are the same as those of $DD^*$. The same is true for the matrices $A = D_1^* D_1$ and $D_1 D_1^*$, and also for the matrices $B = D_2^* D_2$ and $D_2 D_2^*$. Hence, using Weyl's inequality (III.8), we get (III.9) in this special case.

If $\gamma_n \ne 0$, subtract $\gamma_n I$ from $C$. Then all the eigenvalues of $A$, $B$, and $C$ are translated by $\gamma_n$. By the special case considered above we have
$$\gamma_{i+j-1} - \gamma_n \;\le\; (\alpha_i - \gamma_n) + (\beta_j - \gamma_n),$$
which is the same as (III.9). ∎
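A numerical spot-check of Aronszajn's inequality (III.9), assuming NumPy; the matrix and the partition size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 2
Z = rng.standard_normal((n, n))
C = (Z + Z.T) / 2                                    # Hermitian (real symmetric)

a = np.sort(np.linalg.eigvalsh(C[:k, :k]))[::-1]     # alpha: k x k corner
b = np.sort(np.linalg.eigvalsh(C[k:, k:]))[::-1]     # beta: complementary corner
g = np.sort(np.linalg.eigvalsh(C))[::-1]             # gamma: whole matrix

tol = 1e-10
for i in range(1, k + 1):                # 1-based indices as in (III.9)
    for j in range(1, n - k + 1):
        assert g[i + j - 2] + g[n - 1] <= a[i - 1] + b[j - 1] + tol
```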
We have derived Aronszajn's inequality from Weyl's inequality. But the argument above can be reversed. Let $A$, $B$ be $n \times n$ Hermitian matrices and let $C = A + B$. Let the eigenvalues of these matrices be $\alpha_1 \ge \cdots \ge \alpha_n$, $\beta_1 \ge \cdots \ge \beta_n$, and $\gamma_1 \ge \cdots \ge \gamma_n$, respectively. We want to prove that $\gamma_{i+j-1} \le \alpha_i + \beta_j$. This is the same as
$$\gamma_{i+j-1} - (\alpha_n + \beta_n) \;\le\; (\alpha_i - \alpha_n) + (\beta_j - \beta_n).$$
Hence, we can assume, without loss of generality, that both $A$ and $B$ are positive. Then $A = D_1^* D_1$ and $B = D_2^* D_2$ for some matrices $D_1$, $D_2$. Hence,
$$C = D_1^* D_1 + D_2^* D_2 = (D_1^* \ D_2^*) \binom{D_1}{D_2}.$$
Consider the $2n \times 2n$ matrix
$$E = \binom{D_1}{D_2} (D_1^* \ D_2^*) = \begin{pmatrix} D_1 D_1^* & D_1 D_2^* \\ D_2 D_1^* & D_2 D_2^* \end{pmatrix}.$$
Then the eigenvalues of $E$ are the eigenvalues of $C$ together with $n$ zeroes. Aronszajn's inequality for the partitioned matrix $E$ then gives Weyl's inequality (III.8). By this procedure, several linear inequalities for the eigenvalues of a sum of Hermitian matrices can be transformed into those for the eigenvalues of block Hermitian matrices, and vice versa.

III.3 Wielandt's Minimax Principle
The minimax principle (Corollary III.1.2) gives an extremal characterisation for each eigenvalue $\lambda_j^{\downarrow}$ of a Hermitian matrix $A$. Ky Fan's maximum principle (Problem I.6.15 and Exercise II.1.13) provides an extremal characterisation for the sum $\lambda_1^{\downarrow} + \cdots + \lambda_k^{\downarrow}$ of the top $k$ eigenvalues of $A$. In this section we will prove a deeper result due to Wielandt that subsumes both these principles by providing an extremal representation of any sum $\lambda_{i_1}^{\downarrow} + \cdots + \lambda_{i_k}^{\downarrow}$. The proof involves a more elaborate dimension-counting for intersections of subspaces than was needed earlier.

We will denote by $\mathcal{V} + \mathcal{W}$ the vector sum of two vector spaces $\mathcal{V}$ and $\mathcal{W}$, by $\mathcal{V} - \mathcal{W}$ any linear complement of a space $\mathcal{W}$ in $\mathcal{V}$, and by $\operatorname{span}\{v_1, \ldots, v_k\}$ the linear span of the vectors $v_1, \ldots, v_k$.
Lemma III.3.1 Let $\mathcal{W}_1 \supset \mathcal{W}_2 \supset \cdots \supset \mathcal{W}_k$ be a decreasing chain of vector spaces with $\dim \mathcal{W}_j \ge k - j + 1$. Let $w_j$, $1 \le j \le k-1$, be linearly independent vectors such that $w_j \in \mathcal{W}_j$, and let $\mathcal{U}$ be their linear span. Then there exists a nonzero vector $u$ in $\mathcal{W}_1 - \mathcal{U}$ such that the space $\mathcal{U} + \operatorname{span}\{u\}$ has a basis $v_1, \ldots, v_k$ with $v_j \in \mathcal{W}_j$, $1 \le j \le k$.

Proof. This will be proved by induction on $k$. The statement is easily verified when $k = 2$. Assume that it is true for a chain consisting of $k-1$ spaces. Let $w_1, \ldots, w_{k-1}$ be the given vectors and $\mathcal{U}$ their linear span. Let $\mathcal{S}$ be the linear span of $w_2, \ldots, w_{k-1}$. Apply the induction hypothesis to the chain $\mathcal{W}_2 \supset \cdots \supset \mathcal{W}_k$ to pick up a vector $v$ in $\mathcal{W}_2 - \mathcal{S}$ such that the space $\mathcal{S} + \operatorname{span}\{v\}$ is equal to $\operatorname{span}\{v_2, \ldots, v_k\}$ for some linearly independent vectors $v_j \in \mathcal{W}_j$, $j = 2, \ldots, k$. This vector $v$ may or may not be in the space $\mathcal{U}$. We will consider the two possibilities.

Suppose $v \in \mathcal{U}$. Then $\mathcal{U} = \mathcal{S} + \operatorname{span}\{v\}$, because $\mathcal{U}$ is $(k-1)$-dimensional and $\mathcal{S}$ is $(k-2)$-dimensional. Since $\dim \mathcal{W}_1 \ge k$, there exists a nonzero vector $u$ in $\mathcal{W}_1 - \mathcal{U}$. Then $u, v_2, \ldots, v_k$ form a basis for $\mathcal{U} + \operatorname{span}\{u\}$. Put $v_1 = u$. All requirements are now met.

Suppose $v \notin \mathcal{U}$. Then $w_1 \notin \mathcal{S} + \operatorname{span}\{v\}$, for if $w_1$ were a linear combination of $w_2, \ldots, w_{k-1}$ and $v$, then $v$ would be a linear combination of $w_1, w_2, \ldots, w_{k-1}$, and hence an element of $\mathcal{U}$. So, $\operatorname{span}\{w_1, v_2, \ldots, v_k\}$ is a $k$-dimensional space that must, therefore, be $\mathcal{U} + \operatorname{span}\{v\}$. Now $w_1 \in \mathcal{W}_1$ and $v_j \in \mathcal{W}_j$, $j = 2, \ldots, k$. Again all requirements are met. ∎
Theorem III.3.2 Let $\mathcal{V}_1 \subset \mathcal{V}_2 \subset \cdots \subset \mathcal{V}_k$ be linear subspaces of an $n$-dimensional vector space $\mathcal{V}$, with $\dim \mathcal{V}_j = i_j$, where $1 \le i_1 < i_2 < \cdots < i_k \le n$. Let $\mathcal{W}_1 \supset \mathcal{W}_2 \supset \cdots \supset \mathcal{W}_k$ be subspaces of $\mathcal{V}$, with $\dim \mathcal{W}_j = n - i_j + 1 = \operatorname{codim} \mathcal{V}_j + 1$. Then there exist linearly independent vectors $v_j \in \mathcal{V}_j$, $1 \le j \le k$, and linearly independent vectors $w_j \in \mathcal{W}_j$, $1 \le j \le k$, such that
$$\operatorname{span}\{v_1, \ldots, v_k\} = \operatorname{span}\{w_1, \ldots, w_k\}.$$

Proof. When $k = 1$ the statement is obviously true. (We have used this repeatedly in the earlier sections.) The general case will be proved by induction on $k$. So, let us assume that the theorem has been proved for $k-1$ pairs of subspaces.

By the induction hypothesis, choose $v_j \in \mathcal{V}_j$ and $w_j \in \mathcal{W}_j$, $1 \le j \le k-1$, two sets of linearly independent vectors having the same linear span $\mathcal{U}$. Note that $\mathcal{U}$ is a subspace of $\mathcal{V}_k$. For $j = 1, \ldots, k$, let $\mathcal{S}_j = \mathcal{W}_j \cap \mathcal{V}_k$. Then note that
$$n \;\ge\; \dim \mathcal{W}_j + \dim \mathcal{V}_k - \dim \mathcal{S}_j \;=\; (n - i_j + 1) + i_k - \dim \mathcal{S}_j.$$
Hence,
$$\dim \mathcal{S}_j \;\ge\; i_k - i_j + 1 \;\ge\; k - j + 1.$$
Note that $\mathcal{S}_1 \supset \mathcal{S}_2 \supset \cdots \supset \mathcal{S}_k$ are subspaces of $\mathcal{V}_k$ and $w_j \in \mathcal{S}_j$ for $j = 1, 2, \ldots, k-1$. Hence, by Lemma III.3.1 there exists a vector $u$ in $\mathcal{S}_1 - \mathcal{U}$ such that the space $\mathcal{U} + \operatorname{span}\{u\}$ has a basis $u_1, \ldots, u_k$, where $u_j \in \mathcal{S}_j \subset \mathcal{W}_j$, $j = 1, 2, \ldots, k$. But $\mathcal{U} + \operatorname{span}\{u\}$ is also the linear span of $v_1, \ldots, v_{k-1}$ and $u$. Put $v_k = u$. Then $v_j \in \mathcal{V}_j$, $j = 1, 2, \ldots, k$, and they span the same space as the $u_j$. ∎
Exercise III.3.3 If $\mathcal{V}$ is a Hilbert space, the vectors $v_j$ and $w_j$ in the statement of the above theorem can be chosen to be orthonormal.

Proposition III.3.4 Let $A$ be a Hermitian operator on $\mathcal{H}$ with eigenvectors $u_j$ belonging to the eigenvalues $\lambda_j^{\downarrow}(A)$, $j = 1, 2, \ldots, n$.

(i) Let $\mathcal{V}_j = \operatorname{span}\{u_1, \ldots, u_j\}$, $1 \le j \le n$. Given indices $1 \le i_1 < \cdots < i_k \le n$, choose orthonormal vectors $x_{i_j}$ from the spaces $\mathcal{V}_{i_j}$, $j = 1, \ldots, k$. Let $\mathcal{V}$ be the span of these vectors, and let $A_{\mathcal{V}}$ be the compression of $A$ to the space $\mathcal{V}$. Then
$$\lambda_j^{\downarrow}(A_{\mathcal{V}}) \;\ge\; \lambda_{i_j}^{\downarrow}(A) \quad \text{for } j = 1, \ldots, k.$$

(ii) Let $\mathcal{W}_j = \operatorname{span}\{u_j, \ldots, u_n\}$, $1 \le j \le n$. Choose orthonormal vectors $x_{i_j}$ from the spaces $\mathcal{W}_{i_j}$, $j = 1, \ldots, k$. Let $\mathcal{W}$ be the span of these vectors and $A_{\mathcal{W}}$ the compression of $A$ to $\mathcal{W}$. Then
$$\lambda_j^{\downarrow}(A_{\mathcal{W}}) \;\le\; \lambda_{i_j}^{\downarrow}(A) \quad \text{for } j = 1, \ldots, k.$$

Proof. Let $y_1, \ldots, y_k$ be the eigenvectors of $A_{\mathcal{V}}$ belonging to its eigenvalues $\lambda_1^{\downarrow}(A_{\mathcal{V}}), \ldots, \lambda_k^{\downarrow}(A_{\mathcal{V}})$. Fix $j$, $1 \le j \le k$, and in the space $\mathcal{V}$ consider the spaces spanned by $\{x_{i_1}, \ldots, x_{i_j}\}$ and $\{y_j, \ldots, y_k\}$, respectively. The dimensions of these two spaces add up to $k+1$, while the space $\mathcal{V}$ is $k$-dimensional. Hence there exists a unit vector $u$ in the intersection of these two spaces. For this vector we have
$$\lambda_j^{\downarrow}(A_{\mathcal{V}}) \;\ge\; (u, A_{\mathcal{V}} u) \;=\; (u, Au) \;\ge\; \lambda_{i_j}^{\downarrow}(A).$$
This proves (i). The statement (ii) has exactly the same proof. ∎
Theorem III.3.5 (Wielandt's Minimax Principle) Let $A$ be a Hermitian operator on an $n$-dimensional space $\mathcal{H}$. Then for any indices $1 \le i_1 < \cdots < i_k \le n$ we have
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) \;=\; \max_{\substack{\mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k \\ \dim \mathcal{M}_j = i_j}} \; \min_{\substack{x_j \in \mathcal{M}_j \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, Ax_j)$$
$$=\; \min_{\substack{\mathcal{N}_1 \supset \cdots \supset \mathcal{N}_k \\ \dim \mathcal{N}_j = n - i_j + 1}} \; \max_{\substack{x_j \in \mathcal{N}_j \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, Ax_j).$$
Proof. We will prove the first statement; the second has a similar proof. Let $\mathcal{V}_{i_j} = \operatorname{span}\{u_1, \ldots, u_{i_j}\}$, where, as before, the $u_j$ are eigenvectors of $A$ corresponding to $\lambda_j^{\downarrow}(A)$. For any unit vector $x$ in $\mathcal{V}_{i_j}$, $(x, Ax) \ge \lambda_{i_j}^{\downarrow}(A)$. So, if $x_j \in \mathcal{V}_{i_j}$ are orthonormal vectors, then
$$\sum_{j=1}^{k} (x_j, Ax_j) \;\ge\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$
Since the $x_j$ were quite arbitrary, we have
$$\min_{\substack{x_j \in \mathcal{V}_{i_j} \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, Ax_j) \;\ge\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$
Hence, the desired result will be achieved if we prove that, given any subspaces $\mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k$ with $\dim \mathcal{M}_j = i_j$, we can find orthonormal vectors $x_j \in \mathcal{M}_j$ such that
$$\sum_{j=1}^{k} (x_j, Ax_j) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$
Let $\mathcal{N}_j = \mathcal{W}_{i_j} = \operatorname{span}\{u_{i_j}, \ldots, u_n\}$, $j = 1, 2, \ldots, k$. These spaces were considered in Proposition III.3.4(ii). We have $\mathcal{N}_1 \supset \mathcal{N}_2 \supset \cdots \supset \mathcal{N}_k$ and $\dim \mathcal{N}_j = n - i_j + 1$. Hence, by Theorem III.3.2 and Exercise III.3.3 there exist orthonormal vectors $x_j \in \mathcal{M}_j$ and orthonormal vectors $y_j \in \mathcal{N}_j$ such that
$$\operatorname{span}\{x_1, \ldots, x_k\} = \operatorname{span}\{y_1, \ldots, y_k\} = \mathcal{W}, \ \text{say}.$$
By Proposition III.3.4(ii),
$$\lambda_j^{\downarrow}(A_{\mathcal{W}}) \;\le\; \lambda_{i_j}^{\downarrow}(A) \quad \text{for } j = 1, 2, \ldots, k.$$
Hence,
$$\sum_{j=1}^{k} (x_j, Ax_j) \;=\; \sum_{j=1}^{k} (x_j, A_{\mathcal{W}} x_j) \;=\; \operatorname{tr} A_{\mathcal{W}} \;=\; \sum_{j=1}^{k} \lambda_j^{\downarrow}(A_{\mathcal{W}}) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$
This is what we wanted to prove. ∎

Exercise III.3.6 Note that
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) \;=\; \sum_{j=1}^{k} (u_{i_j}, A u_{i_j}).$$
We have seen that the maximum in the first assertion of Theorem III.3.5 is attained when $\mathcal{M}_j = \mathcal{V}_{i_j} = \operatorname{span}\{u_1, \ldots, u_{i_j}\}$, $j = 1, \ldots, k$, and with this choice the minimum is attained for $x_j = u_{i_j}$, $j = 1, \ldots, k$. Are there other choices of subspaces and vectors for which these extrema are attained? (See Exercise III.1.3.)
Exercise III.3.7 Let $[a, b]$ be an interval containing all the eigenvalues of $A$, and let $\Phi(t_1, \ldots, t_k)$ be any real-valued function on $[a,b] \times \cdots \times [a,b]$ that is monotone in each variable and permutation-invariant. Show that for each choice of indices $1 \le i_1 < \cdots < i_k \le n$,
$$\Phi\bigl(\lambda_{i_1}^{\downarrow}(A), \ldots, \lambda_{i_k}^{\downarrow}(A)\bigr) \;=\; \max_{\substack{\mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k \\ \dim \mathcal{M}_j = i_j}} \; \min_{\substack{x_j \in \mathcal{M}_j \text{ orthonormal} \\ \mathcal{W} = \operatorname{span}\{x_1, \ldots, x_k\}}} \Phi\bigl(\lambda_1^{\downarrow}(A_{\mathcal{W}}), \ldots, \lambda_k^{\downarrow}(A_{\mathcal{W}})\bigr),$$
where $A_{\mathcal{W}}$ is the compression of $A$ to the space $\mathcal{W}$. In Theorem III.3.5 we have proved the special case of this with $\Phi(t_1, \ldots, t_k) = t_1 + \cdots + t_k$.
III.4 Lidskii's Theorems

One important application of Wielandt's minimax principle is in proving a theorem of Lidskii giving a relationship between the eigenvalues of Hermitian
matrices $A$, $B$, and $A+B$. This is quite like our derivation of some of the results in Section III.2 from those in Section III.1.

Theorem III.4.1 Let $A$, $B$ be Hermitian matrices. Then for any choice of indices $1 \le i_1 < \cdots < i_k \le n$,
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A+B) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B). \tag{III.10}$$
Proof. By Theorem III.3.5 there exist subspaces $\mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k$, with $\dim \mathcal{M}_j = i_j$, such that
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A+B) \;=\; \min_{\substack{x_j \in \mathcal{M}_j \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, (A+B)x_j).$$
By Ky Fan's maximum principle,
$$\sum_{j=1}^{k} (x_j, Bx_j) \;\le\; \sum_{j=1}^{k} \lambda_j^{\downarrow}(B)$$
for any choice of orthonormal vectors $x_1, \ldots, x_k$. The above two relations imply that
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A+B) \;\le\; \min_{\substack{x_j \in \mathcal{M}_j \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, Ax_j) \;+\; \sum_{j=1}^{k} \lambda_j^{\downarrow}(B).$$
Now, using Theorem III.3.5 once again, it can be concluded that the first
term on the right-hand side of the above inequality is dominated by $\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A)$. ∎

Corollary III.4.2 If $A$, $B$ are Hermitian matrices, then the eigenvalues of $A$, $B$, and $A+B$ satisfy the following majorisation relation:
$$\lambda^{\downarrow}(A+B) - \lambda^{\downarrow}(A) \;\prec\; \lambda^{\downarrow}(B). \tag{III.11}$$

Exercise III.4.3 (Lidskii's Theorem) The vector $\lambda^{\downarrow}(A+B)$ is in the convex hull of the vectors $\lambda^{\downarrow}(A) + P\lambda^{\downarrow}(B)$, where $P$ varies over all permutation matrices. [This statement and those of Theorem III.4.1 and Corollary III.4.2 are, in fact, equivalent to each other.]
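The majorisation (III.11) can be checked directly on random examples. A sketch assuming NumPy (recall that $x \prec y$ means the decreasingly sorted partial sums of $x$ never exceed those of $y$, with equal totals):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6

def rand_herm(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

def eig_desc(M):
    return np.sort(np.linalg.eigvalsh(M))[::-1]

A, B = rand_herm(n), rand_herm(n)
d = eig_desc(A + B) - eig_desc(A)        # the vector in (III.11)
b = eig_desc(B)

# d is majorised by lambda^down(B): partial sums of the sorted d are
# dominated, and both totals equal tr B.
partial_d = np.cumsum(np.sort(d)[::-1])
partial_b = np.cumsum(b)
assert np.all(partial_d <= partial_b + 1e-10)
assert np.isclose(partial_d[-1], partial_b[-1])
```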
Lidskii 's Theorem can be proved without calling upon the more intricate Wielandt's principle. We will see several other proofs in this book, each highlighting a different viewpoint. The second proof given below is in the spirit of other results of this chapter.
Lidskii's Theorem (second proof). We will prove Theorem III.4.1 by induction on the dimension $n$. Its statement is trivial when $n = 1$. Assume it is true up to dimension $n-1$. When $k = n$, the inequality (III.10) needs no proof. So we may assume that $k < n$. Let $u_j$, $v_j$, and $w_j$ be the eigenvectors of $A$, $B$, and $A+B$ corresponding to their eigenvalues $\lambda_j^{\downarrow}(A)$, $\lambda_j^{\downarrow}(B)$, and $\lambda_j^{\downarrow}(A+B)$. We will consider three cases separately.

Case 1. $i_k < n$. Let $\mathcal{M} = \operatorname{span}\{w_1, \ldots, w_{n-1}\}$, and let $A_{\mathcal{M}}$, $B_{\mathcal{M}}$ be the compressions of $A$, $B$ to the space $\mathcal{M}$. Then, by the induction hypothesis,
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A_{\mathcal{M}} + B_{\mathcal{M}}) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A_{\mathcal{M}}) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B_{\mathcal{M}}).$$
The inequality (III.10) follows from this by using the interlacing principle (III.2) and Exercise III.1.7.

Case 2. $1 < i_1$. Let $\mathcal{M} = \operatorname{span}\{u_2, \ldots, u_n\}$. By the induction hypothesis,
$$\sum_{j=1}^{k} \lambda_{i_j - 1}^{\downarrow}(A_{\mathcal{M}} + B_{\mathcal{M}}) \;\le\; \sum_{j=1}^{k} \lambda_{i_j - 1}^{\downarrow}(A_{\mathcal{M}}) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B_{\mathcal{M}}).$$
Once again, the inequality (III.10) follows from this by using the interlacing principle and Exercise III.1.7.

Case 3. $i_1 = 1$. Given the indices $1 = i_1 < i_2 < \cdots < i_k \le n$, pick up the indices $1 \le \ell_1 < \ell_2 < \cdots < \ell_{n-k} \le n$ such that the set $\{i_j : 1 \le j \le k\}$ is the complement of the set $\{n - \ell_j + 1 : 1 \le j \le n-k\}$ in the set $\{1, 2, \ldots, n\}$. These new indices now come under Case 1. Use (III.10) for this set of indices, but for the matrices $-A$ and $-B$ in place of $A$, $B$. Then note that $\lambda_j^{\downarrow}(-A) = -\lambda_{n-j+1}^{\downarrow}(A)$ for all $1 \le j \le n$. This gives
$$\sum_{j=1}^{n-k} -\lambda_{n-\ell_j+1}^{\downarrow}(A+B) \;\le\; \sum_{j=1}^{n-k} -\lambda_{n-\ell_j+1}^{\downarrow}(A) + \sum_{j=1}^{n-k} -\lambda_{n-j+1}^{\downarrow}(B).$$
Now add $\operatorname{tr}(A+B) = \operatorname{tr} A + \operatorname{tr} B$ to both sides of the above inequality to get
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A+B) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B).$$
This proves the theorem. ∎
As in Section III.2, it is useful to interpret the above results as perturbation theorems. The following statement for Hermitian matrices $A$, $B$ can be derived from (III.11) by changing variables:
$$\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B) \;\prec\; \lambda(A - B). \tag{III.12}$$
This can also be written as
$$\lambda^{\downarrow}(A) + \lambda^{\uparrow}(B) \;\prec\; \lambda(A+B) \;\prec\; \lambda^{\downarrow}(A) + \lambda^{\downarrow}(B). \tag{III.13}$$
In fact, the two right-hand majorisations are consequences of the weaker maximum principle of Ky Fan. As a consequence of (III.12) we have:

Theorem III.4.4 Let $A$, $B$ be Hermitian matrices and let $\Phi$ be any symmetric gauge function on $\mathbb{R}^n$. Then
$$\Phi\bigl(\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B)\bigr) \;\le\; \Phi\bigl(\lambda(A - B)\bigr) \;\le\; \Phi\bigl(\lambda^{\downarrow}(A) - \lambda^{\uparrow}(B)\bigr).$$
Note that Weyl's perturbation theorem (Corollary III.2.6) and the inequality in Exercise III.2.7 are very special cases of this theorem. The majorisations in (III.13) are significant generalisations of those in (II.35), which follow from these by restricting $A$, $B$ to be diagonal matrices. Such "noncommutative" extensions exist for some other results; they are harder to prove. Some are given in this section; many more will occur later.

It is convenient to adopt the following notational shorthand. If $x$, $y$, $z$ are $n$-vectors with nonnegative coordinates, we will write
$$\log x \prec_w \log y \quad \text{if} \quad \prod_{j=1}^{k} x_j \le \prod_{j=1}^{k} y_j \quad \text{for } k = 1, \ldots, n; \tag{III.14}$$
$$\log x \prec \log y \quad \text{if} \quad \log x \prec_w \log y \quad \text{and} \quad \prod_{j=1}^{n} x_j = \prod_{j=1}^{n} y_j; \tag{III.15}$$
and
$$\log x - \log z \prec_w \log y \quad \text{if} \quad \prod_{j=1}^{k} x_{i_j} \le \prod_{j=1}^{k} y_j \prod_{j=1}^{k} z_{i_j} \tag{III.16}$$
for all indices $1 \le i_1 < \cdots < i_k \le n$. Note that we are allowing the possibility of zero coordinates in this notation.
Theorem III.4.5 (Gel'fand–Naimark) Let $A$, $B$ be any two operators on $\mathcal{H}$. Then the singular values of $A$, $B$, and $AB$ satisfy the majorisation
$$\log s(AB) - \log s(B) \;\prec_w\; \log s(A). \tag{III.17}$$

Proof. We will use the result of Exercise III.3.7. Fix any index $k$, $1 \le k \le n$. Choose any $k$ orthonormal vectors $x_1, \ldots, x_k$, and let $\mathcal{W}$ be their linear span. Let $\Phi(t_1, \ldots, t_k) = t_1 t_2 \cdots t_k$. Express $AB$ in its polar form $AB = UP$. Then, denoting by $T_{\mathcal{W}}$ the compression of an operator $T$ to the subspace $\mathcal{W}$, we have
$$|\det P_{\mathcal{W}}|^2 \;=\; |\det((x_i, P_{\mathcal{W}} x_j))|^2 \;=\; |\det((x_i, P x_j))|^2 \;=\; |\det((A^* U x_i, B x_j))|^2.$$
Using Exercise I.5.7 we see that this is dominated by
$$\det\bigl((A^* U x_i, A^* U x_j)\bigr) \cdot \det\bigl((B x_i, B x_j)\bigr).$$
The second of these determinants is equal to $\det (B^*B)_{\mathcal{W}}$; the first is equal to $\det (AA^*)_{U\mathcal{W}}$, and by Corollary III.1.5 is dominated by $\prod_{j=1}^{k} s_j^2(A)$. Hence, we have
$$|\det P_{\mathcal{W}}|^2 \;\le\; \prod_{j=1}^{k} s_j^2(A) \, \det (B^*B)_{\mathcal{W}}.$$
Now, using Exercise III.3.7, we can conclude that
$$\Bigl(\prod_{j=1}^{k} \lambda_{i_j}^{\downarrow}(P)\Bigr)^2 \;\le\; \prod_{j=1}^{k} \lambda_{i_j}^{\downarrow}(|B|^2) \prod_{j=1}^{k} s_j^2(A),$$
i.e.,
$$\prod_{j=1}^{k} s_{i_j}(AB) \;\le\; \prod_{j=1}^{k} s_{i_j}(B) \prod_{j=1}^{k} s_j(A) \tag{III.18}$$
for all $1 \le i_1 < \cdots < i_k \le n$. This, by definition, is what (III.17) says. ∎
Remark. The statement
$$\prod_{j=1}^{k} s_j(AB) \;\le\; \prod_{j=1}^{k} s_j(A) \prod_{j=1}^{k} s_j(B), \tag{III.19}$$
which is a special case of (III.18), is easier to prove. It is just the statement $\|\wedge^k(AB)\| \le \|\wedge^k A\| \, \|\wedge^k B\|$. If we temporarily introduce the notation $s^{\downarrow}(A)$ and $s^{\uparrow}(A)$ for the vectors whose coordinates are the singular values of $A$ arranged in decreasing order and in increasing order, respectively, then the inequalities (III.18) and (III.19) can be combined to yield
$$\log s^{\downarrow}(A) + \log s^{\uparrow}(B) \;\prec\; \log s(AB) \;\prec\; \log s^{\downarrow}(A) + \log s^{\downarrow}(B) \tag{III.20}$$
for any two matrices $A$, $B$. In conformity with our notation, this is a symbolic representation of the inequalities
$$\prod_{j=1}^{k} s_{i_j}(A) \prod_{j=1}^{k} s_{n-i_j+1}(B) \;\le\; \prod_{j=1}^{k} s_j(AB) \;\le\; \prod_{j=1}^{k} s_j(A) \prod_{j=1}^{k} s_j(B)$$
for all $1 \le i_1 < \cdots < i_k \le n$. It is illuminating to compare this with the statement (III.13) for eigenvalues of Hermitian matrices.
Corollary III.4.6 (Lidskii) Let $A$, $B$ be two positive matrices. Then all the eigenvalues of $AB$ are nonnegative, and
$$\log \lambda^{\downarrow}(A) + \log \lambda^{\uparrow}(B) \;\prec\; \log \lambda(AB) \;\prec\; \log \lambda^{\downarrow}(A) + \log \lambda^{\downarrow}(B). \tag{III.21}$$

Proof. It is enough to prove this when $B$ is invertible, since every positive matrix is a limit of such matrices. For invertible $B$ we can write
$$AB \;=\; B^{-1/2} \bigl(B^{1/2} A B^{1/2}\bigr) B^{1/2}.$$
Now $B^{1/2} A B^{1/2}$ is positive; hence the matrix $AB$, which is similar to it, has nonnegative eigenvalues. Now, from (III.20) we obtain
$$\log \lambda^{\downarrow}(A^{1/2}) + \log \lambda^{\uparrow}(B^{1/2}) \;\prec\; \log s(A^{1/2} B^{1/2}) \;\prec\; \log \lambda^{\downarrow}(A^{1/2}) + \log \lambda^{\downarrow}(B^{1/2}). \tag{III.22}$$
But $s^2(A^{1/2} B^{1/2}) = \lambda^{\downarrow}(B^{1/2} A B^{1/2}) = \lambda^{\downarrow}(AB)$. So, the majorisations (III.21) follow from (III.22). ∎
III.5 Eigenvalues of Real Parts and Singular Values

The Cartesian decomposition $A = \operatorname{Re} A + i \operatorname{Im} A$ of a matrix $A$ associates with it two Hermitian matrices $\operatorname{Re} A = \frac{A + A^*}{2}$ and $\operatorname{Im} A = \frac{A - A^*}{2i}$. It is of interest to know relationships between the eigenvalues of these matrices, those of $A$, and the singular values of $A$. Weyl's majorant theorem (Theorem II.3.6) provides one such relationship: $\log |\lambda(A)| \prec \log s(A)$. Some others, whose proofs are in the same spirit as others in this chapter, are given below.

Proposition III.5.1 (Fan–Hoffman) For every matrix $A$,
$$\lambda_j^{\downarrow}(\operatorname{Re} A) \;\le\; s_j(A) \quad \text{for all } j = 1, \ldots, n.$$

Proof. Let $x_j$ be eigenvectors of $\operatorname{Re} A$ belonging to its eigenvalues $\lambda_j^{\downarrow}(\operatorname{Re} A)$, and $y_j$ eigenvectors of $|A|$ belonging to its eigenvalues $s_j(A)$, $1 \le j \le n$. For each $j$ consider the spaces $\operatorname{span}\{x_1, \ldots, x_j\}$ and $\operatorname{span}\{y_j, \ldots, y_n\}$. Their dimensions add up to $n+1$, so they have a nonzero intersection. If $x$ is a unit vector in their intersection, then
$$\lambda_j^{\downarrow}(\operatorname{Re} A) \;\le\; (x, (\operatorname{Re} A)x) \;=\; \operatorname{Re}(x, Ax) \;\le\; |(x, Ax)| \;\le\; \|Ax\| \;=\; (x, A^*Ax)^{1/2} \;\le\; s_j(A). \qquad \blacksquare$$
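The Fan–Hoffman inequality is immediate to verify numerically. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Eigenvalues of Re A in decreasing order, and singular values of A
# (numpy's svd already returns them in decreasing order).
re_eigs = np.sort(np.linalg.eigvalsh((A + A.conj().T) / 2))[::-1]
svals = np.linalg.svd(A, compute_uv=False)

assert np.all(re_eigs <= svals + 1e-10)     # lambda_j(Re A) <= s_j(A) for all j
```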
Exercise III.5.2 (i) Let $A$ be the $2 \times 2$ matrix $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$. Then $s_2(A) = 0$, but $\operatorname{Re} A$ has two nonzero eigenvalues. Hence the vector $|\lambda(\operatorname{Re} A)|^{\downarrow}$ is not dominated by the vector $s(A)$.

(ii) However, note that $|\lambda(\operatorname{Re} A)| \prec_w s(A)$ for every matrix $A$. (Use the triangle inequality for Ky Fan norms.)

Proposition III.5.3 (Ky Fan) For every matrix $A$ we have
$$\operatorname{Re} \lambda(A) \;\prec\; \lambda(\operatorname{Re} A).$$

Proof. Arrange the eigenvalues $\lambda_j(A)$ in such a way that $\operatorname{Re}\lambda_1(A) \ge \cdots \ge \operatorname{Re}\lambda_n(A)$. Let $x_1, \ldots, x_n$ be an orthonormal Schur basis for $A$ such that $\lambda_j(A) = (x_j, Ax_j)$. Then $\operatorname{Re}\lambda_j(A) = (x_j, (\operatorname{Re} A)x_j)$. Let $\mathcal{W} = \operatorname{span}\{x_1, \ldots, x_k\}$. Then
$$\sum_{j=1}^{k} \operatorname{Re}\lambda_j(A) \;=\; \sum_{j=1}^{k} (x_j, (\operatorname{Re} A)x_j) \;=\; \operatorname{tr}(\operatorname{Re} A)_{\mathcal{W}} \;=\; \sum_{j=1}^{k} \lambda_j\bigl((\operatorname{Re} A)_{\mathcal{W}}\bigr) \;\le\; \sum_{j=1}^{k} \lambda_j^{\downarrow}(\operatorname{Re} A).$$
Since $\sum_{j=1}^{n} \operatorname{Re}\lambda_j(A) = \operatorname{tr} \operatorname{Re} A = \sum_{j=1}^{n} \lambda_j(\operatorname{Re} A)$, the majorisation follows. ∎
Exercise III.5.4 Give another proof of Proposition III.5.3 using Schur's theorem (given in Exercise II.1.12).

Exercise III.5.5 (i) Let $X$, $Y$ be Hermitian matrices. Suppose that their eigenvalues can be indexed as $\lambda_j(X)$ and $\lambda_j(Y)$, $1 \le j \le n$, in such a way that $\lambda_j(X) \le \lambda_j(Y)$ for all $j$. Then there exists a unitary $U$ such that $X \le U^* Y U$.

(ii) For every matrix $A$ there exists a unitary matrix $U$ such that $\operatorname{Re} A \le U^* |A| U$.

An interesting consequence of Proposition III.5.1 is the following version of the triangle inequality for the matrix absolute value:
Theorem III.5.6 (R. C. Thompson) Let $A$, $B$ be any two matrices. Then there exist unitary matrices $U$, $V$ such that
$$|A + B| \;\le\; U|A|U^* + V|B|V^*.$$

Proof. Let $A + B = W|A+B|$ be a polar decomposition of $A+B$. Then we can write
$$|A+B| \;=\; W^*(A+B) \;=\; \operatorname{Re} W^*(A+B) \;=\; \operatorname{Re} W^*A + \operatorname{Re} W^*B.$$
Now use Exercise III.5.5(ii). ∎
Exercise III.5.7 (i) Find $2 \times 2$ matrices $A$, $B$ such that the inequality $|A+B| \le |A| + |B|$ is false for them.

(ii) Find $2 \times 2$ matrices $A$, $B$ for which there does not exist any unitary matrix $U$ such that $|A+B| \le U(|A| + |B|)U^*$.

III.6 Problems
Problem III.6.1 (The minimax principle for singular values). For any operator $A$ on $\mathcal{H}$ we have
$$s_j(A) \;=\; \max_{\mathcal{M} : \dim \mathcal{M} = j} \; \min_{\substack{x \in \mathcal{M} \\ \|x\|=1}} \|Ax\| \;=\; \min_{\mathcal{N} : \dim \mathcal{N} = n-j+1} \; \max_{\substack{x \in \mathcal{N} \\ \|x\|=1}} \|Ax\|$$
for $1 \le j \le n$.
Problem III.6.2 Let $A$, $B$ be any two operators. Then
$$s_j(AB) \le \|B\| \, s_j(A), \qquad s_j(AB) \le \|A\| \, s_j(B) \qquad \text{for } 1 \le j \le n.$$
Problem III.6.3 For $j = 0, 1, \ldots, n$, let
$$\mathcal{R}_j = \{T \in \mathcal{L}(\mathcal{H}) : \operatorname{rank} T \le j\}.$$
Show that for $j = 1, 2, \ldots, n$,
$$s_j(A) \;=\; \min_{T \in \mathcal{R}_{j-1}} \|A - T\|.$$

Problem III.6.4 Show that if $A$ is any operator and $H$ is any operator of rank $k$, then
$$s_j(A) \;\ge\; s_{j+k}(A + H), \qquad j = 1, 2, \ldots, n-k.$$

Problem III.6.5 For any two operators $A$, $B$ and any two indices $i$, $j$ such that $i + j \le n + 1$, we have
$$s_{i+j-1}(A+B) \;\le\; s_i(A) + s_j(B),$$
$$s_{i+j-1}(AB) \;\le\; s_i(A) \, s_j(B).$$
Problem III.6.6 Show that for every operator $A$ and for each $k = 1, 2, \ldots, n$, we have
$$\sum_{j=1}^{k} s_j(A) \;=\; \max \sum_{j=1}^{k} |(y_j, Ax_j)|,$$
where the maximum is over all choices of orthonormal $k$-tuples $x_1, \ldots, x_k$ and $y_1, \ldots, y_k$. This can also be written as
$$\sum_{j=1}^{k} s_j(A) \;=\; \max \sum_{j=1}^{k} (x_j, UAx_j),$$
where the maximum is taken over all choices of unitary operators $U$ and orthonormal $k$-tuples $x_1, \ldots, x_k$. Note that for $k = 1$ this reduces to the statement
$$\|A\| \;=\; \max_{\|x\| = \|y\| = 1} |(y, Ax)|.$$
For k == 1 , 2, . . . the above extremal representations can be used to give k another proof of the fact that the expressions I I A I I c k ) == L sj ( A) are norms . j=l ( See Exercise II. l. l 5 . ) ( aij ) be a Hermitian matrix. For each i Problem 111.6.7. Let A 1, . . . let 1/ 2 ri == L 1 aij l 2 j::f:. i Show that each interval [a ii  ri, aii + ri ] contains at least one eigenvalue of A . Problem 111.6.8. Let a 1 > a 2 > > a n be the eigenvalues of a Her mitian matrix A. We have seen that the n  1 eigenvalues of any principal submatrix of A interlace with these numbers. If 81 > 82 > · · · > 8n  1 are the roots of the polynomial that is the derivative of the characteristic polynomial of A, then we have by Rolle's Theorem , n,
, n,
·
·
·
Show that for each j there exists a principal submatrix B of A for which a1 > AJ ( B) > 8j and another principal submatrix C for which 8j >
AJ (C) > ai + l ·
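The localisation statement of Problem III.6.7 is easy to test numerically. The sketch below is not from the book; it assumes NumPy and a randomly generated complex Hermitian matrix, and checks that each interval [a_ii − r_i, a_ii + r_i] captures at least one eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2          # a random Hermitian matrix

eigs = np.linalg.eigvalsh(A)      # real eigenvalues
for i in range(n):
    # r_i is the Euclidean norm of the off-diagonal part of row i.
    r_i = np.sqrt(sum(abs(A[i, j]) ** 2 for j in range(n) if j != i))
    # Some eigenvalue lies within r_i of the diagonal entry a_ii.
    assert np.min(np.abs(eigs - A[i, i].real)) <= r_i + 1e-12
```

This is a refinement of the Gershgorin picture: the radius here is the l_2 norm of the off-diagonal row, not its l_1 norm.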
Problem III.6.9. Most of the results in this chapter gave descriptions of eigenvalues of a Hermitian operator A in terms of the numbers (x, Ax) when x varies over unit vectors. Sometimes in computational problems an "approximate" eigenvalue λ and an "approximate" eigenvector x are already known. The number (x, Ax) can then be used to further refine this information. For a given unit vector x, let

ρ = (x, Ax),   ε = ||(A − ρ)x||.

(i) Let (a, b) be an open interval that contains ρ but does not contain any eigenvalue of A. Show that

(b − ρ)(ρ − a) ≤ ε².

(ii) Show that there exists an eigenvalue α of A such that

|α − ρ| ≤ ε.
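Part (ii) of Problem III.6.9 is the basic residual bound: the Rayleigh quotient of any unit vector lies within the residual norm of some true eigenvalue. The following sketch is an illustration added here (not from the original text), assuming NumPy and a random symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                  # random real symmetric matrix

x = rng.standard_normal(n)
x /= np.linalg.norm(x)             # an arbitrary unit "approximate eigenvector"

rho = x @ A @ x                    # Rayleigh quotient rho = (x, Ax)
eps = np.linalg.norm(A @ x - rho * x)

eigs = np.linalg.eigvalsh(A)
# Part (ii): some eigenvalue alpha satisfies |alpha - rho| <= eps.
assert np.min(np.abs(eigs - rho)) <= eps + 1e-12
```

The better x approximates an eigenvector, the smaller ε becomes, and the bound tightens accordingly.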
Problem III.6.10. Let ρ and ε be defined as in the above problem. Let (a, b) be an open interval that contains ρ and only one eigenvalue α of A. Then

ρ − ε²/(b − ρ) ≤ α ≤ ρ + ε²/(ρ − a).

This is called the Kato–Temple inequality. Note that if ρ − a and b − ρ are much larger than ε, then this improves the inequality in part (ii) of Problem III.6.9.

Problem III.6.11. Show that for every Hermitian matrix A

∑_{j=1}^k λ_j^↓(A) = max tr UAU*,
∑_{j=1}^k λ_j^↑(A) = min tr UAU*,

for 1 ≤ k ≤ n, where the extrema are taken over k × n matrices U that satisfy UU* = I_k, I_k being the k × k identity matrix. Show that if A is positive, then

∏_{j=1}^k λ_j^↓(A) = max det UAU*,
∏_{j=1}^k λ_j^↑(A) = min det UAU*.

(See Problem I.6.15.)

Problem III.6.12. Let A, B be any matrices. Then

∑_{j=1}^n s_j(A) s_j(B) = sup_{U,V} |tr UAVB| = sup_{U,V} Re tr UAVB,
where U, V vary over all unitary matrices.

Problem III.6.13. (Perturbation theorem for singular values) Let A, B be any n × n matrices and let Φ be any symmetric gauge function. Then

Φ(s(A) − s(B)) ≤ Φ(s(A − B)).

In particular,

max_j |s_j(A) − s_j(B)| ≤ ||A − B||.

[Hint: See Theorem III.4.4 and Exercise II.1.15.]

Problem III.6.14. For positive matrices A, B, and then for Hermitian matrices A, B, show that

⟨λ^↓(A), λ^↑(B)⟩ ≤ tr AB ≤ ⟨λ^↓(A), λ^↓(B)⟩.

(Compare these with (II.36) and (II.37).)

Problem III.6.15. Let A, B be Hermitian matrices. Use the second part of Problem III.6.14 to show that

||λ^↓(A) − λ^↓(B)||₂ ≤ ||A − B||₂ ≤ ||λ^↓(A) − λ^↑(B)||₂.

Note the analogy between this and Theorem III.2.8. (In Chapter IV we will see that both these results are true for a whole family of norms called unitarily invariant norms. This more general result is a consequence of Theorem III.4.4.)

III.7 Notes and References
As pointed out in Exercise III.1.6, many of the results in Sections III.1 and III.2 could be derived from each other. Hence, it seems fair to say that the variational principles for eigenvalues originated with A.L. Cauchy's interlacing theorem. A pertinent reference is Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes, 1829, in A.L. Cauchy, Oeuvres Complètes (IIe Série), Volume 9, Gauthier-Villars.

The minimax principle was first stated by E. Fischer, Über quadratische Formen mit reellen Koeffizienten, Monatsh. Math. Phys., 16 (1905) 234-249. The monotonicity principle and many of the results of Section III.2 were proved by H. Weyl in Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen, Math. Ann., 71 (1911) 441-469. In a series of papers beginning with Über die Eigenwerte bei den Differentialgleichungen der mathematischen Physik, Math. Z., 7 (1920) 1-57,
R. Courant exploited the full power of the minimax principle. Thus the principle is often described as the Courant-Fischer-Weyl principle.

As the titles of these papers suggest, the variational principles for eigenvalues were discovered in connection with problems of physics. One famous work where many of these were used is The Theory of Sound by Lord Rayleigh, reprinted by Dover in 1945. The modern applied mathematics classic Methods of Mathematical Physics by R. Courant and D. Hilbert, Wiley, 1953, is replete with applications of variational principles. For a still more recent source, see M. Reed and B. Simon, Methods of Modern Mathematical Physics, Volume 4, Academic Press, 1978. Of course, here most of the interest is in infinite-dimensional problems, and consequently the results are much more complicated. The numerical analyst could turn to B.N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, 1980, and to G.W. Stewart and J.G. Sun, Matrix Perturbation Theory, Academic Press, 1990.

The converse to the interlacing theorem given in Theorem III.1.9 was first proved in L. Mirsky, Matrices with prescribed characteristic roots and diagonal elements, J. London Math. Soc., 33 (1958) 14-21. We do not know whether the similar question for higher dimensional compressions has been answered. More precisely, let α_1 ≥ ··· ≥ α_n and β_1 ≥ ··· ≥ β_n be real numbers such that Σα_i = Σβ_i. What conditions must these numbers satisfy so that there exists an orthogonal projection P of rank k such that the matrix A = diag(α_1, ..., α_n), when compressed to range P, has eigenvalues β_1, ..., β_k, and when compressed to (range P)^⊥ has eigenvalues β_{k+1}, ..., β_n? (Theorem III.1.9 is the case k = n − 1.)

Aronszajn's inequality appeared in N. Aronszajn, Rayleigh-Ritz and A. Weinstein methods for approximation of eigenvalues. I. Operators in a Hilbert space, Proc. Nat. Acad. Sci. U.S.A., 34 (1948) 474-480. The elegant proof of its equivalence to Weyl's inequality is due to H.W. Wielandt, Topics in the Analytic Theory of Matrices, mimeographed lecture notes, University of Wisconsin, 1967.

Theorem III.3.5 was proved in H.W. Wielandt, An extremum property of sums of eigenvalues, Proc. Amer. Math. Soc., 6 (1955) 106-110. The motivation for Wielandt was that he "did not succeed in completing the interesting sketch of a proof given by Lidskii" of the statement given in Exercise III.4.3. He noted that this is equivalent to what we have stated as Theorem III.4.1, and derived it from his new minimax principle. Interestingly, now several different proofs of Lidskii's Theorem are known. The second proof given in Section III.4 is due to M.F. Smiley, Inequalities related to Lidskii's, Proc. Amer. Math. Soc., 19 (1968) 1029-1034. We will see some other proofs later. However, Theorem III.3.5 is more general, has several other applications, and has led to a lot of research. An account of the earlier work on these questions may be found in A.R. Amir-Moez, Extreme Properties of Linear Transformations and Geometry in Unitary Spaces, Texas Tech University, 1968, from which our treatment of Section III.3 has been adapted. An attempt to extend these ideas to infinite
dimensions was made in R.C. Riddell, Minimax problems on Grassmann manifolds, Advances in Math., 54 (1984) 107-199, where connections with differential geometry and some problems in quantum physics are also developed. The tower of subspaces occurring in Theorem III.3.5 suggests a connection with Schubert calculus in algebraic geometry. This connection is yet to be fully understood.

Lidskii's Theorem has an interesting history. It appeared first in V.B. Lidskii, On the proper values of a sum and product of symmetric matrices, Dokl. Akad. Nauk SSSR, 75 (1950) 769-772. It seems that Lidskii provided an elementary (matrix analytic) proof of the result which F. Berezin and I.M. Gel'fand had proved by more advanced (Lie theoretic) techniques in connection with their work that appeared later in Some remarks on the theory of spherical functions on symmetric Riemannian manifolds, Trudi Moscow Math. Ob., 5 (1956) 311-351. As mentioned above, difficulties with this "elementary" proof led Wielandt to the discovery of his minimax principle.

Among the several directions this work opened up, one led to the following question. What relations must three n-tuples of real numbers satisfy in order to be the eigenvalues of some Hermitian matrices A, B, and A + B? Necessary conditions are given by Theorem III.4.1. Many more were discovered by others. A. Horn, Eigenvalues of sums of Hermitian matrices, Pacific J. Math., 12 (1962) 225-242, derived necessary and sufficient conditions in the above problem for the case n = 4, and wrote down a set of conditions which he conjectured would be necessary and sufficient for n > 4. In a short paper, Spectral polyhedron of a sum of two Hermitian matrices, Functional Analysis and Appl., 10 (1982) 76-77, B.V. Lidskii has sketched a "proof" establishing Horn's conjecture. This proof, however, needs a lot of details to be filled in; these have not yet been published by B.V. Lidskii (or anyone else). When should a theorem be considered to be proved? For an interesting discussion of this question, see S. Smale, The fundamental theorem of algebra and complexity theory, Bull. Amer. Math. Soc. (New Series), 4 (1981) 1-36.

Theorem III.4.5 was proved in I.M. Gel'fand and M. Naimark, The relation between the unitary representations of the complex unimodular group and its unitary subgroup, Izv. Akad. Nauk SSSR Ser. Mat., 14 (1950) 239-260. Many of the questions concerning eigenvalues and singular values of sums and products were first framed in this paper. An excellent summary of these results can be found in A.S. Markus, The eigen- and singular values of the sum and product of linear operators, Russian Math. Surveys, 19 (1964) 92-120.

The structure of inequalities like (III.10) and (III.18) was carefully analysed in several papers by R.C. Thompson and his students. The asymmetric way in which A and B enter (III.10) is remedied by one of their inequalities,
which says

∑_{j=1}^k λ^↓_{i_j + p_j − j}(A + B) ≤ ∑_{j=1}^k λ^↓_{i_j}(A) + ∑_{j=1}^k λ^↓_{p_j}(B)

for any indices 1 ≤ i_1 < ··· < i_k ≤ n and 1 ≤ p_1 < ··· < p_k ≤ n such that i_k + p_k − k ≤ n. A similar generalisation of (III.18) has also been proved. References to this work may be found in the book by Marshall and Olkin cited in Chapter II.

Proposition III.5.1 is proved in K. Fan and A.J. Hoffman, Some metric inequalities in the space of matrices, Proc. Amer. Math. Soc., 6 (1955) 111-116. Results of Proposition III.5.3, Problems III.6.5, III.6.6, III.6.11, and III.6.12 were first proved by Ky Fan in several papers. References to these may be found in I.C. Gohberg and M.G. Krein, Introduction to the Theory of Linear Nonselfadjoint Operators, American Math. Society, 1969, and in the Marshall-Olkin book cited earlier.

The matrix triangle inequality (Theorem III.5.6) was proved in R.C. Thompson, Convex and concave functions of singular values of matrix sums, Pacific J. Math., 66 (1976) 285-290. An extension to infinite dimensions was attempted in C. Akemann, J. Anderson, and G. Pedersen, Triangle inequalities in operator algebras, Linear and Multilinear Algebra, 11 (1982) 167-178. For operators A, B on an infinite-dimensional Hilbert space there exist isometries U, V such that

|A + B| ≤ U|A|U* + V|B|V*.

Also, for each ε > 0 there exist unitaries U, V such that

|A + B| ≤ U|A|U* + V|B|V* + εI.

It is not known whether the εI part in the last statement is necessary.

Refinements of the interlacing principle such as the one in Problem III.6.8 have been obtained by several authors, including R.C. Thompson. See, for example, his paper Principal submatrices II, Linear Algebra Appl., 1 (1968) 211-243.

One may wonder whether there are interlacing theorems for singular values. There are, although they are a little different from the ones for eigenvalues. This is best understood if we extend the definition of singular values to rectangular matrices. Let A be an m × n matrix. Let r = min(m, n). The r numbers that are the common eigenvalues of (A*A)^{1/2} and (AA*)^{1/2} are called the singular values of A. (Sometimes a sequence of zeroes is added to make max(m, n) singular values in all.) Many of the results for singular values that we have proved can be carried over to this setting. See, e.g., the books by Horn and Johnson cited in Chapter I.
Let A be a rectangular matrix and let B be a matrix obtained by deleting any row or any column of A. Then the minimax principle can be used to prove that the singular values of A and B interlace. The reader should work this out, and see that when A is an n × n matrix and B a principal submatrix of order n − 1, then this gives
s_1(A) ≥ s_1(B) ≥ s_3(A),
s_2(A) ≥ s_2(B) ≥ s_4(A),
. . . . . . . . . . . . . . .
s_{n−2}(A) ≥ s_{n−2}(B) ≥ s_n(A),
s_{n−1}(A) ≥ s_{n−1}(B) ≥ 0.
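The interlacing pattern above can be verified numerically. The sketch below is an added illustration (not from the book), assuming NumPy and a random square matrix whose first row and column are deleted to form the principal submatrix.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 7
A = rng.standard_normal((n, n))
B = A[1:, 1:]                     # a principal submatrix of order n - 1

sA = np.linalg.svd(A, compute_uv=False)   # decreasing singular values
sB = np.linalg.svd(B, compute_uv=False)

for j in range(n - 1):
    assert sA[j] >= sB[j] - 1e-12         # s_j(A) >= s_j(B)
    if j + 2 < n:
        assert sB[j] >= sA[j + 2] - 1e-12 # s_j(B) >= s_{j+2}(A)
```

Note the shift by two: deleting a row and a column can push a singular value of B below s_{j+1}(A), but never below s_{j+2}(A).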
For more such results, see R.C. Thompson, Principal submatrices IX, Linear Algebra and Appl., 5 (1972) 1-12.

Inequalities like the ones in Problems III.6.9 and III.6.10 are called "residual bounds" in the numerical analysis literature. For more such results, see the book by Parlett cited above, and F. Chatelin, Spectral Approximation of Linear Operators, Academic Press, 1983. Several refinements, extensions, and applications of these results in atomic physics are described in the book by Reed and Simon cited above.

The results of Theorem III.4.4 and Problem III.6.13 were noted by L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. Oxford Ser. (2), 11 (1960) 50-59. This paper contains a lucid survey of several related problems and has stimulated a lot of research. The inequalities in Problem III.6.15 were first stated in K. Löwner, Über monotone Matrixfunktionen, Math. Z., 38 (1934) 177-216.

Let A = UP be a polar decomposition of A. Weyl's majorant theorem gives a relationship between the eigenvalues of A and those of P (the singular values of A). A relation between the eigenvalues of A and those of U was proved by A. Horn and R. Steinberg, Eigenvalues of the unitary part of a matrix, Pacific J. Math., 9 (1959) 541-550. This is in the form of a majorisation between the arguments of the eigenvalues:

arg λ(A) ≺ arg λ(U).

A theorem very much like Theorems III.4.1 and III.4.5 was proved by A. Nudel'man and P. Svarcman, The spectrum of a product of unitary matrices, Uspehi Mat. Nauk, 13 (1958) 111-117. Let A, B be unitary matrices. Label the eigenvalues of A, B, and AB as e^{iα_1}, ..., e^{iα_n}; e^{iβ_1}, ..., e^{iβ_n}; and e^{iγ_1}, ..., e^{iγ_n}, respectively, in such a way that

2π > α_1 ≥ ··· ≥ α_n ≥ 0,
2π > β_1 ≥ ··· ≥ β_n ≥ 0,
2π > γ_1 ≥ ··· ≥ γ_n ≥ 0.
If α_1 + β_1 < 2π, then for any choice of indices 1 ≤ i_1 < ··· < i_k ≤ n we have

∑_{j=1}^k γ_{i_j} ≤ ∑_{j=1}^k α_{i_j} + ∑_{j=1}^k β_j.

These inequalities can also be written in the form of a majorisation between n-vectors:

γ − α ≺ β.

For a generalisation in the same spirit as the one of inequalities (III.10) and (III.18) mentioned earlier, see R.C. Thompson, On the eigenvalues of a product of unitary matrices, Linear and Multilinear Algebra, 2 (1974) 13-24.
IV. Symmetric Norms
In this chapter we study norms on the space of matrices that are invariant under multiplication by unitaries. Their properties are closely linked to those of symmetric gauge functions on R^n. We also study norms that are invariant under unitary conjugations. Some of the inequalities proved in earlier chapters lead to inequalities involving these norms.

IV.1 Norms on C^n
Let us begin by considering the familiar p-norms frequently used in analysis. For a vector x = (x_1, ..., x_n) we define

||x||_p = (∑_{i=1}^n |x_i|^p)^{1/p},   1 ≤ p < ∞,   (IV.1)
||x||_∞ = max_{1≤i≤n} |x_i|.   (IV.2)

For each 1 ≤ p ≤ ∞, ||x||_p defines a norm on C^n. These are called the p-norms or the l_p-norms. The notation (IV.2) is justified because of the fact that

||x||_∞ = lim_{p→∞} ||x||_p.   (IV.3)

Some of the pleasant properties of this family of norms are

||x||_p = || |x| ||_p,   (IV.4)
||x||_p ≤ ||y||_p if |x| ≤ |y|,   (IV.5)
||x||_p = ||Px||_p for all x ∈ C^n, P ∈ S_n.   (IV.6)

(Recall the notations: |x| = (|x_1|, ..., |x_n|), and |x| ≤ |y| if |x_j| ≤ |y_j| for 1 ≤ j ≤ n. S_n is the set of permutation matrices.) A norm on C^n is called gauge invariant or absolute if it satisfies the condition (IV.4), monotone if it satisfies (IV.5), and permutation invariant or symmetric if it satisfies (IV.6). The first two of these conditions turn out to be equivalent:

Proposition IV.1.1 A norm on C^n is gauge invariant if and only if it is monotone.
Proof. Monotonicity clearly implies gauge invariance. Conversely, if a norm ||·|| is gauge invariant, then to show that it is monotone it is enough to show that ||x|| ≤ ||y|| whenever x_j = t_j y_j for some real numbers 0 ≤ t_j ≤ 1, j = 1, 2, ..., n. Further, it suffices to consider the special case when all t_j except one are equal to 1. But then

||(y_1, ..., t y_k, ..., y_n)||
= || (1+t)/2 (y_1, ..., y_k, ..., y_n) + (1−t)/2 (y_1, ..., −y_k, ..., y_n) ||
≤ (1+t)/2 ||(y_1, ..., y_n)|| + (1−t)/2 ||(y_1, ..., −y_k, ..., y_n)||
= ||(y_1, ..., y_n)||.
Example IV.1.2 Consider the following norms on R²:

(i) ||x|| = |x_1| + |x_2| + |x_1 − x_2|.
(ii) ||x|| = |x_1| + |x_1 − x_2|.
(iii) ||x|| = 2|x_1| + |x_2|.

The first of these is symmetric but not gauge invariant, the second is neither symmetric nor gauge invariant, while the third is not symmetric but is gauge invariant.
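The three claims in Example IV.1.2 can be checked on a single test vector. This small sketch is an added illustration (not part of the book's text), assuming NumPy; a symmetry failure is witnessed by swapping coordinates, a gauge-invariance failure by flipping one sign.

```python
import numpy as np

norm1 = lambda x: abs(x[0]) + abs(x[1]) + abs(x[0] - x[1])   # (i)
norm2 = lambda x: abs(x[0]) + abs(x[0] - x[1])               # (ii)
norm3 = lambda x: 2 * abs(x[0]) + abs(x[1])                  # (iii)

x = np.array([1.0, -2.0])
swap = x[::-1]                    # permuted coordinates
flip = np.array([x[0], -x[1]])    # one sign changed, same |x|

# (i) is symmetric but not gauge invariant:
assert norm1(x) == norm1(swap) and norm1(x) != norm1(flip)
# (ii) is neither symmetric nor gauge invariant:
assert norm2(x) != norm2(swap) and norm2(x) != norm2(flip)
# (iii) is gauge invariant but not symmetric:
assert norm3(x) == norm3(flip) and norm3(x) != norm3(swap)
```

A single counterexample vector suffices to refute each invariance property, while the equalities here only illustrate (rather than prove) the properties that do hold.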
Norms that are both symmetric and gauge invariant are especially interesting. Before studying more examples and properties of such norms, let us make a few remarks. Let T be the circle group; i.e., the multiplicative group of all complex numbers of modulus 1. Let S_n ∘ T be the semidirect product of S_n and T. In other words, this is the group of all n × n matrices that have exactly one nonzero entry on each row and each column, and this nonzero entry has modulus 1. We will call such matrices complex permutation matrices. Then a norm ||·|| on C^n is symmetric and gauge invariant if and only if

||x|| = ||Tx|| for all complex permutations T.   (IV.7)
In other words, the group of (linear) isometries for ||·|| contains S_n ∘ T as a subgroup. (Linear isometries for a norm ||·|| are those linear transformations on C^n that preserve ||·||.)

Exercise IV.1.3 For the Euclidean norm ||x||_2 = (∑ |x_i|²)^{1/2}, the group of isometries is the group of all unitary matrices, which is much larger than the complex permutation group. Show that for each of the norms ||x||_1 and ||x||_∞ the group of isometries is the complex permutation group.

Note that gauge invariant norms on C^n are determined by those on R^n. Symmetric gauge invariant norms on R^n are called symmetric gauge functions. We have come across them earlier (Example II.3.13). To repeat, a map Φ : R^n → R_+ is called a symmetric gauge function if

(i) Φ is a norm,
(ii) Φ(Px) = Φ(x) for all x ∈ R^n and P ∈ S_n,
(iii) Φ(ε_1 x_1, ..., ε_n x_n) = Φ(x_1, ..., x_n) if ε_j = ±1.

In addition, we will always assume that Φ is normalised, so that

(iv) Φ(1, 0, ..., 0) = 1.
The conditions (ii) and (iii) can be expressed together by saying that Φ is invariant under the group S_n ∘ Z_2 consisting of permutations and sign changes of the coordinates. Notice also that a symmetric gauge function is completely determined by its values on R^n_+.

Example IV.1.4 If the coordinates of x are arranged so that |x_1| ≥ |x_2| ≥ ··· ≥ |x_n|, then for each k = 1, 2, ..., n, the function

Φ_(k)(x) = ∑_{j=1}^k |x_j|   (IV.8)

is a symmetric gauge function. We will also use the notation ||x||_(k) for these. The parentheses are used to distinguish these norms from the p-norms defined earlier. Indeed, note that ||x||_(1) = ||x||_∞ and ||x||_(n) = ||x||_1.

We have observed in Problem II.5.11 that these norms play a very distinguished role: if Φ_(k)(x) ≤ Φ_(k)(y) for all k = 1, 2, ..., n, then Φ(x) ≤ Φ(y) for every symmetric gauge function Φ. Thus an infinite family of norm inequalities follows from a finite one.

Proposition IV.1.5 For each k = 1, 2, ..., n,

Φ_(k)(x) = min {Φ_(n)(u) + k Φ_(1)(v) : x = u + v}.   (IV.9)
Proof. We may assume, without loss of generality, that x ∈ R^n_+ and that its coordinates are arranged in decreasing order. If x = u + v, then

Φ_(k)(x) ≤ Φ_(k)(u) + Φ_(k)(v) ≤ Φ_(n)(u) + k Φ_(1)(v).

If we choose

u = (x_1 − x_k, x_2 − x_k, ..., x_k − x_k, 0, ..., 0),
v = (x_k, ..., x_k, x_{k+1}, ..., x_n),

then u + v = x,

Φ_(n)(u) = Φ_(k)(x) − k x_k,   Φ_(1)(v) = x_k,

and the proposition follows.
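The decomposition constructed in the proof of Proposition IV.1.5 is easy to exhibit concretely. The following sketch is an added numerical check (not from the book), assuming NumPy and a small decreasing positive vector.

```python
import numpy as np

def phi_k(x, k):
    """Ky Fan gauge function: sum of the k largest |x_j|."""
    return np.sort(np.abs(x))[::-1][:k].sum()

x = np.array([5.0, 3.0, 2.0, 1.0])   # positive, decreasing
k = 2

# The optimal split from the proof: u carries the excess over x_k
# in the first k coordinates, v is capped at x_k there.
u = np.array([x[0] - x[k - 1], x[1] - x[k - 1], 0.0, 0.0])
v = x - u                             # (x_k, ..., x_k, x_{k+1}, ..., x_n)

assert np.allclose(u + v, x)
# Phi_(n)(u) + k * Phi_(1)(v) attains Phi_(k)(x):
assert np.isclose(np.abs(u).sum() + k * np.abs(v).max(), phi_k(x, k))
```

Here Φ_(2)(x) = 5 + 3 = 8, and the split gives Φ_(n)(u) + 2 Φ_(1)(v) = 2 + 2·3 = 8.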
We now derive some basic inequalities. If f is a convex function on an interval I and if α_i, i = 1, 2, ..., n, are nonnegative real numbers such that ∑_{i=1}^n α_i = 1, then

f(∑_{i=1}^n α_i t_i) ≤ ∑_{i=1}^n α_i f(t_i) for all t_i ∈ I.

Applying this to the function f(t) = −log t on the interval (0, ∞), one obtains the fundamental inequality

a_1^{α_1} a_2^{α_2} ··· a_n^{α_n} ≤ α_1 a_1 + α_2 a_2 + ··· + α_n a_n   (IV.10)

for nonnegative real numbers a_i. This is called the (weighted) arithmetic-geometric mean inequality. The special choice α_1 = α_2 = ··· = α_n = 1/n gives the usual arithmetic-geometric mean inequality

(a_1 a_2 ··· a_n)^{1/n} ≤ (a_1 + a_2 + ··· + a_n)/n.   (IV.11)
(IV. 1 1 ) Theorem IV. 1 . 6 Let p, q be real numbers with p > 1 and x , y E lRn . Then for every symmetric gauge function
!+�
==
1 . Let
(IV. 12) Pro of.
From the inequality (IV. 10) one obtains q x iP IYi l , lx Yl < p + .
q
88 · .
IV.
Symmetric Norms
and hence
( IV. 13)
For t > 0, if we replace x , y by t x and t  1 y , then the lefthand side of ( IV. 13) does not change. Hence, (IV. l4) But, if cp(t)
1 == tP a + b qtq , p
where t , a , b > 0 ,
then plain differentiation shows that min cp(t) a 1 1Pb 1 f q . So, (IV. l2) follows from ( IV. 14) . ==
•
When ( n ) , ( IV. l2) reduces to the familiar Holder inequality n n n L l x i Yi l < (2= lx i i P ) 1 1P( L !Yi l q ) l f q _ i=l i= l i= l We will refer to ( IV. 12) as the Holder inequality for symmetric gauge functions. The special case p 2 will be called the CauchySchwarz ==
==
inequality for symmetric gauge functions.
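The point of Theorem IV.1.6 is that Hölder's inequality holds for every symmetric gauge function, not just Φ_(n). Here is a quick check (an added illustration, not from the book) using the Ky Fan gauge functions Φ_(k) and random vectors, assuming NumPy.

```python
import numpy as np

def phi_k(x, k):
    """Ky Fan gauge function: sum of the k largest moduli."""
    return np.sort(np.abs(x))[::-1][:k].sum()

rng = np.random.default_rng(4)
x = rng.standard_normal(6)
y = rng.standard_normal(6)
p, q = 3.0, 1.5                     # conjugate exponents: 1/p + 1/q = 1

for k in range(1, 7):
    lhs = phi_k(x * y, k)
    rhs = phi_k(np.abs(x) ** p, k) ** (1 / p) * phi_k(np.abs(y) ** q, k) ** (1 / q)
    assert lhs <= rhs + 1e-12       # (IV.12) with Phi = Phi_(k)
```

For k = n this is the classical Hölder inequality; for k = 1 it reads ||xy||_∞ ≤ || |x|^p ||_∞^{1/p} || |y|^q ||_∞^{1/q}.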
Exercise IV.1.7 Let p, q, r be positive real numbers with 1/p + 1/q = 1/r. Show that for every symmetric gauge function Φ we have

[Φ(|xy|^r)]^{1/r} ≤ [Φ(|x|^p)]^{1/p} [Φ(|y|^q)]^{1/q}.   (IV.15)

Theorem IV.1.8 Let Φ be any symmetric gauge function and let p ≥ 1. Then for all x, y ∈ R^n,

[Φ(|x + y|^p)]^{1/p} ≤ [Φ(|x|^p)]^{1/p} + [Φ(|y|^p)]^{1/p}.   (IV.16)

Proof. When p = 1, the inequality (IV.16) is a consequence of the triangle inequalities for the absolute value on R^n and for the norm Φ. Let p > 1 and let q = p/(p − 1). It is enough to consider the case x ≥ 0, y ≥ 0. Make this assumption and write

(x + y)^p = x · (x + y)^{p−1} + y · (x + y)^{p−1}.

Now, using the triangle inequality for Φ and then the Hölder inequality (IV.12), we get

Φ((x + y)^p) ≤ Φ(x · (x + y)^{p−1}) + Φ(y · (x + y)^{p−1})
≤ {[Φ(x^p)]^{1/p} + [Φ(y^p)]^{1/p}} [Φ((x + y)^{q(p−1)})]^{1/q}
= {[Φ(x^p)]^{1/p} + [Φ(y^p)]^{1/p}} [Φ((x + y)^p)]^{1/q},

since q(p − 1) = p. If we divide both sides of the above inequality by [Φ((x + y)^p)]^{1/q}, we get (IV.16).

Once again, when Φ = Φ_(n) the inequality (IV.16) reduces to the familiar Minkowski inequality. So, we will call (IV.16) the Minkowski inequality for symmetric gauge functions.
Exercise IV.1.9 Let Φ be a symmetric gauge function and let p ≥ 1. Let

Φ^{(p)}(x) = [Φ(|x|^p)]^{1/p}.   (IV.17)

Show that Φ^{(p)} is also a symmetric gauge function. Note that, if Φ_p is the family of l_p-norms, then

Φ_{p_1}^{(p_2)} = Φ_{p_1 p_2} for all p_1, p_2 ≥ 1,   (IV.18)

and, if Φ_(k) is the norm defined by (IV.8), then

Φ_(k)^{(p)}(x) = (∑_{j=1}^k |x_j|^p)^{1/p},   (IV.19)

where the coordinates of x are arranged as |x_1| ≥ |x_2| ≥ ··· ≥ |x_n|.
Just as the Euclidean norm has especially interesting properties among the l_p-norms, the norms Φ^{(2)}, where Φ is any symmetric gauge function, have some special interest. We will give these norms a name:

Definition IV.1.10 Ψ is called a quadratic symmetric gauge function, or a Q-norm, if Ψ = Φ^{(2)} for some symmetric gauge function Φ. In other words,

Ψ(x) = [Φ(|x|²)]^{1/2}.   (IV.20)

Exercise IV.1.11 (i) Show that an l_p-norm is a Q-norm if and only if p ≥ 2. (ii) More generally, show that for each k = 1, 2, ..., n, Φ_(k)^{(p)} is a Q-norm if and only if p ≥ 2.

Exercise IV.1.12 We saw earlier that if Φ_(k)(x) ≤ Φ_(k)(y) for all k = 1, 2, ..., n, then Φ(x) ≤ Φ(y) for all symmetric gauge functions Φ. Show that if Φ_(k)^{(2)}(x) ≤ Φ_(k)^{(2)}(y) for all k = 1, 2, ..., n, then Φ^{(2)}(x) ≤ Φ^{(2)}(y) for all symmetric gauge functions Φ; i.e., Ψ(x) ≤ Ψ(y) for all quadratic symmetric gauge functions Ψ.
If Φ is a norm on C^n, the dual of Φ is defined as

Φ′(x) = sup_{Φ(y)=1} |⟨x, y⟩|.   (IV.21)

(It is easy to see that Φ′ is a norm even when Φ is a function on C^n that does not necessarily satisfy the triangle inequality but meets the other requirements of a norm.)
Exercise IV.1.13 If Φ is a symmetric gauge function, then so is Φ′.

Exercise IV.1.14 Show that for any norm Φ,

|⟨x, y⟩| ≤ Φ(x) Φ′(y) for all x, y.   (IV.22)

Exercise IV.1.15 Let 1/p + 1/q = 1. Then

Φ′_p = Φ_q.   (IV.23)

Exercise IV.1.16 Let Φ and Ψ be two norms such that Φ(x) ≤ c Ψ(x) for all x, where c > 0. Show that

Φ′(x) ≥ c^{−1} Ψ′(x) for all x.

We shall call a symmetric gauge function a Q′-norm if it is the dual of a Q-norm. The l_p-norms for 1 ≤ p ≤ 2 are examples of Q′-norms.

Exercise IV.1.17 (i) Let Φ be a norm such that Φ = Φ′. Show that Φ must be the Euclidean norm. (ii) Let Φ be a Q-norm that is also a Q′-norm. Show that Φ must be the Euclidean norm. (Use Exercise IV.1.16 and the fact that every symmetric gauge function is bounded by the l_1-norm.)

Exercise IV.1.18 For each k = 1, 2, ..., n, the dual of the norm Φ_(k) is given by

Φ′_(k)(x) = max {Φ_(1)(x), (1/k) Φ_(n)(x)}.   (IV.24)

Prove this using Proposition IV.1.5 and Exercise IV.1.16.

Some ways of generating symmetric gauge functions are described in the following exercises.

Exercise IV.1.19 Let 1 = a_1 ≥ a_2 ≥ ··· ≥ a_n ≥ 0. Given a symmetric gauge function Φ, define

Ψ(x) = max_{σ ∈ S_n} Φ(a_1 x_{σ(1)}, ..., a_n x_{σ(n)}).

Then Ψ is a symmetric gauge function.

Exercise IV.1.20 (i) Let Φ be a symmetric gauge function on R^n. Let m < n. If x ∈ R^m, let x̃ = (x_1, ..., x_m, 0, 0, ..., 0), and define Ψ(x) = Φ(x̃). Then Ψ is a symmetric gauge function on R^m. (ii) Conversely, let Φ be a symmetric gauge function on R^m. For x ∈ R^n we may define Ψ(x) = Φ(x_1^↓, ..., x_m^↓), where x_1^↓ ≥ ··· ≥ x_n^↓ is the decreasing rearrangement of (|x_1|, ..., |x_n|). Then Ψ is a symmetric gauge function on R^n.
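The duality formula (IV.24) for the Ky Fan gauge functions can be probed numerically through the inequality (IV.22). The following sketch is an added illustration (not from the book), assuming NumPy and random test vectors.

```python
import numpy as np

def phi_k(x, k):
    """Ky Fan gauge function: sum of the k largest moduli."""
    return np.sort(np.abs(x))[::-1][:k].sum()

def phi_k_dual(y, k):
    # The dual in (IV.24): max of the sup-norm and (1/k) times the l_1 norm.
    return max(np.abs(y).max(), np.abs(y).sum() / k)

rng = np.random.default_rng(5)
for _ in range(100):
    x = rng.standard_normal(5)
    y = rng.standard_normal(5)
    for k in range(1, 6):
        # Duality inequality (IV.22): |<x, y>| <= Phi(x) * Phi'(y).
        assert abs(x @ y) <= phi_k(x, k) * phi_k_dual(y, k) + 1e-12
```

For k = 1 the formula gives the l_1 norm as the dual of the sup-norm, and for k = n it gives back the sup-norm as the dual of the l_1 norm.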
IV.2 Unitarily Invariant Norms on Operators on C^n
In this section, C^n will always stand for the Hilbert space C^n with inner product ⟨·, ·⟩ and the associated norm ||·||. (No subscript will be attached to this "standard" norm, as was done in the previous section.) If A is a linear operator on C^n, we will denote by ||A|| the operator (bound) norm of A defined as

||A|| = sup_{||x||=1} ||Ax||.   (IV.25)

As before, we denote by |A| the positive operator (A*A)^{1/2} and by s(A) the vector whose coordinates are the singular values of A, arranged as s_1(A) ≥ s_2(A) ≥ ··· ≥ s_n(A). We have

||A|| = || |A| || = s_1(A).   (IV.26)

Now, if U, V are unitary operators on C^n, then |UAV| = V*|A|V, and hence

||A|| = ||UAV||   (IV.27)

for all unitary operators U, V. This last property is called unitary invariance. Several other norms have this property. These are frequently useful in analysis, and we will study them in some detail. We will use the symbol |||·||| to mean a norm on n × n matrices that satisfies

|||UAV||| = |||A|||   (IV.28)

for all A and for unitary U, V. We will call such a norm a unitarily invariant norm on the space M(n) of n × n matrices. We will normalise such norms so that they all take the value 1 on the matrix diag(1, 0, ..., 0).

There is an intimate connection between these norms and symmetric gauge functions on R^n; the link is provided by singular values.
Theorem IV.2.1 Given a symmetric gauge function Φ on R^n, define a function on M(n) by

|||A|||_Φ = Φ(s(A)).   (IV.29)

Then this defines a unitarily invariant norm on M(n). Conversely, given any unitarily invariant norm |||·||| on M(n), define a function on R^n by

Φ_{|||·|||}(x) = |||diag(x)|||,   (IV.30)

where diag(x) is the diagonal matrix with entries x_1, ..., x_n on its diagonal. Then this defines a symmetric gauge function on R^n.
Proof. Since s(UAV) = s(A) for all unitary U, V, |||·|||_Φ is unitarily invariant. We will prove that it obeys the triangle inequality; the other conditions for it to be a norm are easy to verify. For this, recall the majorisation (II.18)

s(A + B) ≺_w s(A) + s(B) for all A, B ∈ M(n),

and then use the fact that Φ is monotone (Proposition IV.1.1), so that

Φ(s(A + B)) ≤ Φ(s(A) + s(B)) ≤ Φ(s(A)) + Φ(s(B)).

The converse statement is verified directly from the definitions.

Two families of unitarily invariant norms deserve special mention. The first is the family of Schatten p-norms

||A||_p = Φ_p(s(A)) = [∑_{j=1}^n (s_j(A))^p]^{1/p}, 1 ≤ p < ∞,   (IV.31)
||A||_∞ = Φ_∞(s(A)) = s_1(A) = ||A||.   (IV.32)

The second is the class of Ky Fan k-norms defined as

||A||_(k) = ∑_{j=1}^k s_j(A), 1 ≤ k ≤ n.   (IV.33)

Among the p-norms, the ones for the values p = 1, 2, ∞ are used most often. As we have noted, ||A||_∞ is the same as the operator norm ||A|| and the Ky Fan norm ||A||_(1). The norm ||A||_1 is the same as ||A||_(n). This is equal to tr(|A|) and hence is called the trace norm, and is sometimes written as ||A||_tr. The norm

||A||_2 = [∑_{j=1}^n (s_j(A))²]^{1/2}   (IV.34)

is also called the Hilbert-Schmidt norm or the Frobenius norm (and is sometimes written as ||A||_F for that reason). It will play a basic role in our analysis. For A, B ∈ M(n) let

⟨A, B⟩ = tr A*B.   (IV.35)

This defines an inner product on M(n), and the norm associated with this inner product is ||A||_2; i.e.,

||A||_2 = ⟨A, A⟩^{1/2} = (tr A*A)^{1/2}.   (IV.36)
If the matrix A has entries a_ij, then

||A||_2 = (∑_{i,j} |a_ij|²)^{1/2}.   (IV.37)

Thus the norm ||A||_2 is the Euclidean norm of the matrix A when it is thought of as an element of C^{n²}. This fact makes this norm easily computable and geometrically tractable. The main importance of the Ky Fan norms lies in the following:
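The three descriptions of ||A||_2 — via singular values (IV.34), via the trace inner product (IV.36), and entrywise (IV.37) — agree, and this is easy to confirm numerically. The sketch below is an added illustration (not from the book), assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))

s = np.linalg.svd(A, compute_uv=False)
fro_from_svals = np.sqrt((s ** 2).sum())          # (IV.34)
fro_from_trace = np.sqrt(np.trace(A.T @ A))       # (IV.36), real case: A* = A^T
fro_entrywise = np.sqrt((np.abs(A) ** 2).sum())   # (IV.37)

assert np.isclose(fro_from_svals, fro_from_trace)
assert np.isclose(fro_from_trace, fro_entrywise)
```

The entrywise form is the cheapest to compute, which is one reason the Frobenius norm is so convenient in practice.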
Theorem IV.2.2 (Fan Dominance Theorem) Let A, B be two n × n matrices. If

||A||_(k) ≤ ||B||_(k) for k = 1, 2, ..., n,

then

|||A||| ≤ |||B||| for all unitarily invariant norms.

Proof. This is a consequence of the corresponding assertion about symmetric gauge functions. (See Example IV.1.4.)
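One cannot, of course, test "all unitarily invariant norms" numerically, but the Fan Dominance Theorem can be spot-checked on the Schatten family. The following sketch is an added illustration (not from the book): it builds A, B with prescribed singular values whose partial sums are ordered, assuming NumPy and random orthogonal factors obtained by QR.

```python
import numpy as np

rng = np.random.default_rng(7)

def rand_orth(n):
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

# Prescribed singular values with ||A||_(k) <= ||B||_(k) for every k.
sA, sB = np.array([3.0, 2.0, 1.0]), np.array([4.0, 2.5, 1.0])
assert all(sA[:k].sum() <= sB[:k].sum() for k in (1, 2, 3))

A = rand_orth(3) @ np.diag(sA) @ rand_orth(3)
B = rand_orth(3) @ np.diag(sB) @ rand_orth(3)

# Fan dominance then gives |||A||| <= |||B||| for every unitarily
# invariant norm; spot-check the Schatten p-norms.
for p in (1.0, 1.5, 2.0, 3.0, np.inf):
    sa = np.linalg.svd(A, compute_uv=False)
    sb = np.linalg.svd(B, compute_uv=False)
    norm = lambda s: s.max() if p == np.inf else (s ** p).sum() ** (1 / p)
    assert norm(sa) <= norm(sb) + 1e-12
```

The theorem reduces an infinite family of norm comparisons to the n Ky Fan comparisons, which is what makes it so useful.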
Since

Φ_(1)(x) ≤ Φ(x) ≤ Φ_(n)(x) for all x ∈ R^n and for all (normalised) symmetric gauge functions Φ,

we have

||A|| ≤ |||A||| ≤ ||A||_(n) = ||A||_1   (IV.38)

for all A ∈ M(n) and for all unitarily invariant norms |||·|||.

Analogous to Proposition IV.1.5 we have:

Proposition IV.2.3 For each k = 1, 2, ..., n,

||A||_(k) = min {||B||_(n) + k||C|| : A = B + C}.   (IV.39)

Proof. If A = B + C, then

||A||_(k) ≤ ||B||_(k) + ||C||_(k) ≤ ||B||_(n) + k||C||.

Now let s(A) = (s_1, ..., s_n) and choose unitary U, V so that A = U[diag(s_1, ..., s_n)]V. Let

B = U[diag(s_1 − s_k, s_2 − s_k, ..., s_k − s_k, 0, ..., 0)]V,
C = U[diag(s_k, s_k, ..., s_k, s_{k+1}, ..., s_n)]V.

Then A = B + C,

||B||_(n) = ∑_{j=1}^k s_j − k s_k = ||A||_(k) − k s_k,

and ||C|| = s_k, so that ||B||_(n) + k||C|| = ||A||_(k).

A norm ν on M(n) is called symmetric if for A, B, C in M(n)

ν(BAC) ≤ ||B|| ν(A) ||C||.   (IV.40)

Proposition IV.2.4 A norm on M(n) is symmetric if and only if it is unitarily invariant.

Proof. If ν is a symmetric norm, then for unitary U, V we have ν(UAV) ≤ ν(A) and ν(A) = ν(U^{−1}(UAV)V^{−1}) ≤ ν(UAV). So, ν is unitarily invariant. Conversely, by Problem III.6.2, s_j(BAC) ≤ ||B|| ||C|| s_j(A) for all j = 1, 2, ..., n. So, if ν is unitarily invariant and Φ is the corresponding symmetric gauge function, then ν(BAC) = Φ(s(BAC)) ≤ ||B|| ||C|| Φ(s(A)) = ||B|| ||C|| ν(A).

In particular, this implies that every unitarily invariant norm is submultiplicative:

|||AB||| ≤ |||A||| |||B||| for all A, B.
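The symmetric-norm property (IV.40) and the resulting submultiplicativity can be illustrated with the trace norm. The sketch below is an added numerical check (not from the book), assuming NumPy and random matrices.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

trace_norm = lambda M: np.linalg.svd(M, compute_uv=False).sum()  # ||M||_(n)
op = lambda M: np.linalg.norm(M, ord=2)                          # operator norm

# (IV.40) with nu the trace norm, specialised to a product of two factors:
assert trace_norm(A @ B) <= op(A) * trace_norm(B) + 1e-10
assert trace_norm(A @ B) <= trace_norm(A) * op(B) + 1e-10
# Submultiplicativity, since the operator norm never exceeds the trace norm:
assert trace_norm(A @ B) <= trace_norm(A) * trace_norm(B) + 1e-10
```

Submultiplicativity follows from (IV.40) together with the lower bound ||A|| ≤ |||A||| of (IV.38), exactly as in the text.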
Inequalities for sums and products of singular values of matrices, when combined with the inequalities for symmetric gauge functions proved in Section IV.1, lead to interesting statements about unitarily invariant norms. This is illustrated below.

Theorem IV.2.5 If A, B are n × n matrices, then for every symmetric gauge function Φ and every r > 0,

Φ(s^r(AB)) ≤ Φ(s^r(A) s^r(B)).   (IV.41)

(Here s^r(A) denotes the vector with coordinates (s_j(A))^r, and the product of two vectors is taken coordinatewise.)

Proof. If ∧^k A is the kth antisymmetric tensor product of A, then

||∧^k A|| = s_1(∧^k A) = ∏_{j=1}^k s_j(A),   1 ≤ k ≤ n.

Hence, since ∧^k(AB) = (∧^k A)(∧^k B),

∏_{j=1}^k s_j(AB) ≤ ∏_{j=1}^k s_j(A) s_j(B),   1 ≤ k ≤ n.

Now use the statement II.3.5(vii).
Corollary IV.2.6 (Hölder's Inequality for Unitarily Invariant Norms) For every unitarily invariant norm, for all $A, B \in M(n)$, and for all $p, q$ with $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$,
$$|||AB||| \le |||\,|A|^p\,|||^{1/p}\;|||\,|B|^q\,|||^{1/q}. \tag{IV.42}$$

Proof. Use the special case of (IV.41) for $r = 1$ to get
$$\Phi(s(AB)) \le \Phi(s(A)\,s(B))$$
for every symmetric gauge function. Now use Theorem IV.1.6 and the fact that $(s(A))^p = s(|A|^p)$. $\blacksquare$
Exercise IV.2.7 Let $p, q, r$ be positive real numbers with $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$. Then for every unitarily invariant norm
$$|||\,|AB|^r\,|||^{1/r} \le |||\,|A|^p\,|||^{1/p}\;|||\,|B|^q\,|||^{1/q}. \tag{IV.43}$$
Choosing $p = q = 2$ (so that $r = 1$), one gets from this
$$|||AB|||^2 \le |||A^*A|||\;|||B^*B|||. \tag{IV.44}$$
This is the Cauchy-Schwarz inequality for unitarily invariant norms.
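A quick numerical sanity check of (IV.44) for a few unitarily invariant norms (a sketch only; the test matrices are arbitrary choices, and `schatten` computes Schatten $p$-norms from singular values, with $p = \infty$ giving the operator norm):

```python
import numpy as np

def schatten(M, p):
    """Schatten p-norm: the l_p norm of the singular value vector."""
    return np.linalg.norm(np.linalg.svd(M, compute_uv=False), p)

A = np.array([[1.0, 2.0], [0.0, 1.0]])
B = np.array([[0.5, -1.0], [2.0, 0.0]])

checks = []
for p in (1, 2, np.inf):   # trace norm, Frobenius norm, operator norm
    lhs = schatten(A @ B, p) ** 2
    rhs = schatten(A.conj().T @ A, p) * schatten(B.conj().T @ B, p)
    checks.append(lhs <= rhs + 1e-10)
assert all(checks)
```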
Exercise IV.2.8 Given a unitarily invariant norm $|||\cdot|||$ on $M(n)$, define, for $p \ge 1$,
$$|||A|||^{(p)} = |||\,|A|^p\,|||^{1/p}. \tag{IV.45}$$
Show that this is a unitarily invariant norm. Note that if $\|\cdot\|_\Phi$ is the norm corresponding to the symmetric gauge function $\Phi$, then
$$\|A\|_\Phi^{(p)} = \Phi^{(p)}(s(A)), \tag{IV.46}$$
and
$$\|A\|_{(k)}^{(p)} = \Bigl(\sum_{j=1}^{k} s_j^p(A)\Bigr)^{1/p} \quad \text{for } p \ge 1,\ 1 \le k \le n. \tag{IV.47}$$
Definition IV.2.9 A unitarily invariant norm on $M(n)$ is called a Q-norm if it corresponds to a quadratic symmetric gauge function; i.e., $|||\cdot|||$ is a Q-norm if and only if there exists a unitarily invariant norm $|||\cdot|||''$ such that
$$|||A|||^2 = |||A^*A|||''. \tag{IV.48}$$
Note that the norm $\|\cdot\|_p$ is a Q-norm if and only if $p \ge 2$, because
$$\|A\|_p^2 = \|A^*A\|_{p/2}. \tag{IV.49}$$
The norms defined in (IV.47) are Q-norms if and only if $p \ge 2$.
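The identity (IV.49) behind the Q-norm property can be checked directly (an illustration with an arbitrary matrix; for $p \ge 2$ the right-hand side is a genuine Schatten norm, which is exactly what makes $\|\cdot\|_p$ a Q-norm):

```python
import numpy as np

def schatten(M, p):
    """Schatten p-norm computed from singular values (valid for any p > 0)."""
    s = np.linalg.svd(M, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

A = np.array([[1.0, 1.0], [0.0, 2.0]])
for p in (2.0, 3.0, 4.0):
    assert abs(schatten(A, p) ** 2 - schatten(A.conj().T @ A, p / 2)) < 1e-10
```

The identity holds algebraically for every $p > 0$, since $s_j(A^*A) = s_j(A)^2$; the restriction $p \ge 2$ is only what makes $\|\cdot\|_{p/2}$ a norm.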
Exercise IV.2.10 Let $\|\cdot\|_Q$ denote a Q-norm. Observe that the following conditions are equivalent:
(i) $\|A\|_Q \le \|B\|_Q$ for all Q-norms.
(ii) $|||A^*A||| \le |||B^*B|||$ for all unitarily invariant norms.
(iii) $\|A\|_{(k)}^{(2)} \le \|B\|_{(k)}^{(2)}$ for $k = 1, 2, \ldots, n$.
(iv) $(s(A))^2 \prec_w (s(B))^2$.

Duality in the space of unitarily invariant norms is defined via the inner product (IV.35). If $|||\cdot|||$ is a unitarily invariant norm, define $|||\cdot|||'$ as
$$|||A|||' = \sup_{|||B||| = 1} |\langle A, B\rangle| = \sup_{|||B||| = 1} |\operatorname{tr} A^*B|. \tag{IV.50}$$
It is easy to see that this defines a norm that is unitarily invariant.

Proposition IV.2.11 Let $\Phi$ be a symmetric gauge function on $\mathbb{R}^n$ and let $\|\cdot\|_\Phi$ be the corresponding unitarily invariant norm on $M(n)$. Then $\|\cdot\|_\Phi' = \|\cdot\|_{\Phi'}$.

Proof. We have from (II.40) and (IV.41)
$$|\operatorname{tr} A^*B| \le \operatorname{tr}|A^*B| = \sum_{j=1}^{n} s_j(A^*B) \le \sum_{j=1}^{n} s_j(A)\,s_j(B).$$
It follows that $\|A\|_\Phi' \le \Phi'(s(A)) = \|A\|_{\Phi'}$. Conversely,
$$\Phi'(s(A)) = \sup\Bigl\{\sum_{j=1}^{n} s_j(A)\,y_j : y \in \mathbb{R}_+^n,\ \Phi(y) = 1\Bigr\} = \sup\{\operatorname{tr}[\operatorname{diag}(s(A))\operatorname{diag}(y)] : \|\operatorname{diag}(y)\|_\Phi = 1\} \le \|\operatorname{diag}(s(A))\|_\Phi' = \|A\|_\Phi'. \qquad \blacksquare$$
Exercise IV.2.12 From statements about duals proved in Section IV.1, we can now conclude that
(i) $|\operatorname{tr} A^*B| \le |||A||| \cdot |||B|||'$ for every unitarily invariant norm.
(ii) $\|A\|_p' = \|A\|_q$ for $1 \le p \le \infty$, $\frac{1}{p} + \frac{1}{q} = 1$.
(iii) $\|A\|_{(k)}' = \max\bigl\{\|A\|_{(1)},\ \tfrac{1}{k}\|A\|_{(n)}\bigr\}$, $1 \le k \le n$.
(iv) The only unitarily invariant norm that is its own dual is the Hilbert-Schmidt norm $\|\cdot\|_2$.
(v) The only norm that is a Q-norm and is also the dual of a Q-norm is the norm $\|\cdot\|_2$.

Duals of Q-norms will be called Q'-norms. These include the norms $\|\cdot\|_p$, $1 \le p \le 2$.

An important property of all unitarily invariant norms is that they are all reduced by pinchings. If $P_1, \ldots, P_k$ are mutually orthogonal projections such that $P_1 \oplus P_2 \oplus \cdots \oplus P_k = I$, then the operator on $M(n)$ defined as
$$\mathcal{C}(A) = \sum_{j=1}^{k} P_j A P_j \tag{IV.51}$$
is called a pinching operator. It is easy to see that
$$|||\mathcal{C}(A)||| \le |||A||| \tag{IV.52}$$
for every unitarily invariant norm. (See Problem II.5.5.) We will call this the pinching inequality. Let us illustrate one use of this inequality.
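The pinching inequality can be checked numerically on the Ky Fan norms, which by Fan dominance control all unitarily invariant norms. A sketch (the matrix and the block partition are arbitrary choices):

```python
import numpy as np

def ky_fan(M, k):
    return np.linalg.svd(M, compute_uv=False)[:k].sum()

A = np.array([[2.0, 1.0, 0.5],
              [1.0, 1.0, -1.0],
              [0.5, 3.0, 0.0]])
# Pinching onto the coordinate blocks {0} and {1, 2}.
P1 = np.diag([1.0, 0.0, 0.0])
P2 = np.diag([0.0, 1.0, 1.0])
CA = P1 @ A @ P1 + P2 @ A @ P2   # block-diagonal part of A

ok = all(ky_fan(CA, k) <= ky_fan(A, k) + 1e-10 for k in (1, 2, 3))
assert ok
```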
Theorem IV.2.13 Let $A, B \in M(n)$. Then for every unitarily invariant norm on $M(2n)$
$$\frac{1}{2}\left|\left|\left|\begin{pmatrix} A+B & 0\\ 0 & A+B\end{pmatrix}\right|\right|\right| \le \left|\left|\left|\begin{pmatrix} A & 0\\ 0 & B\end{pmatrix}\right|\right|\right| \le \left|\left|\left|\begin{pmatrix} |A|+|B| & 0\\ 0 & 0\end{pmatrix}\right|\right|\right|. \tag{IV.53}$$

Proof. The first inequality follows easily from the observation that $\begin{pmatrix} A & 0\\ 0 & B\end{pmatrix}$ and $\begin{pmatrix} B & 0\\ 0 & A\end{pmatrix}$ are unitarily equivalent. If we prove the second inequality in the special case when $A, B$ are positive, the general case follows easily. So, assume $A, B$ are positive. Then
$$\begin{pmatrix} A+B & 0\\ 0 & 0\end{pmatrix} = \begin{pmatrix} A^{1/2} & B^{1/2}\\ 0 & 0\end{pmatrix}\begin{pmatrix} A^{1/2} & 0\\ B^{1/2} & 0\end{pmatrix},$$
where $A^{1/2}, B^{1/2}$ are the positive square roots of $A, B$. Since $T^*T$ and $TT^*$ are unitarily equivalent for every $T$, the matrix $\begin{pmatrix} A+B & 0\\ 0 & 0\end{pmatrix}$ is unitarily equivalent to
$$\begin{pmatrix} A^{1/2} & 0\\ B^{1/2} & 0\end{pmatrix}\begin{pmatrix} A^{1/2} & B^{1/2}\\ 0 & 0\end{pmatrix} = \begin{pmatrix} A & A^{1/2}B^{1/2}\\ B^{1/2}A^{1/2} & B\end{pmatrix}.$$
But $\begin{pmatrix} A & 0\\ 0 & B\end{pmatrix}$ is a pinching of this last matrix. $\blacksquare$
As a corollary we have:

Theorem IV.2.14 (Rotfel'd) Let $f : \mathbb{R}_+ \to \mathbb{R}_+$ be a concave function with $f(0) = 0$. Then the function $F$ on $M(n)$ defined by
$$F(A) = \sum_{j=1}^{n} f(s_j(A)) \tag{IV.54}$$
is subadditive.

Proof. The second inequality in (IV.53) can be written as a majorisation
$$(s(A), s(B)) \prec_w (s(|A| + |B|),\ 0)$$
for all $A, B \in M(n)$. We also know that $s(|A| + |B|) \prec_w s(A) + s(B)$. Hence
$$(s(A), s(B)) \prec_w (s(A) + s(B),\ 0).$$
Now proceed as in Problem II.5.12. $\blacksquare$
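Rotfel'd's theorem can be illustrated with the concave function $f(t) = \sqrt{t}$, $f(0) = 0$; the matrices below are arbitrary test data, not from the text:

```python
import numpy as np

def F(M):
    """F(M) = sum of f(s_j(M)) with f = sqrt, as in (IV.54)."""
    return np.sqrt(np.linalg.svd(M, compute_uv=False)).sum()

A = np.array([[1.0, 2.0], [3.0, -1.0]])
B = np.array([[0.0, 1.0], [-2.0, 4.0]])
assert F(A + B) <= F(A) + F(B) + 1e-10   # subadditivity
```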
Exercise IV.2.15 Let $|||\cdot|||$ be a unitarily invariant norm on $M(n)$. For $m < n$ and $A \in M(m)$, define
$$|||A|||^{\sim} = \left|\left|\left|\begin{pmatrix} A & 0\\ 0 & 0\end{pmatrix}\right|\right|\right|.$$
Show that $|||\cdot|||^{\sim}$ defines a unitarily invariant norm on $M(m)$. We will use this idea of "dilating" $A$ and of going from $M(n)$ to $M(2n)$ in later chapters.

Procedures given in Exercises IV.1.19 and IV.1.20 can be adapted to matrices to generate unitarily invariant norms.
IV.3  Lidskii's Theorem (Third Proof)

Let $\lambda^{\downarrow}(A)$ denote the $n$-vector whose coordinates are the eigenvalues of a Hermitian matrix $A$ arranged in decreasing order. Lidskii's Theorem, for which we gave two proofs in Section III.4, says that if $A, B$ are Hermitian matrices, then we have the majorisation
$$\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B) \prec \lambda^{\downarrow}(A - B). \tag{IV.55}$$
We will give another proof of this theorem now, using the easier ideas of Sections III.1 and III.2.
Exercise IV.3.1 One corollary of Lidskii's Theorem is that, if $A$ and $B$ are any two matrices, then
$$|s(A) - s(B)| \prec_w s(A - B). \tag{IV.56}$$
See Problem III.6.13. Conversely, show that if (IV.56) is known to be true for all matrices $A, B$, then we can derive from it the statement (IV.55). [Hint: Choose real numbers $\alpha, \beta$ such that $A + \alpha I \ge B + \beta I \ge 0$.]

We will prove (IV.56) by a different argument. To prove this we need to prove that for each of the Ky Fan symmetric gauge functions $\Phi_{(k)}$, $1 \le k \le n$, we have the inequality
$$\Phi_{(k)}(|s(A) - s(B)|) \le \Phi_{(k)}(s(A - B)). \tag{IV.57}$$
We will prove this for $\Phi_{(1)}$ and $\Phi_{(n)}$, and then use the interpolation formulas (IV.9) and (IV.39). For Hermitian matrices $A, B$ we have
$$|\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|, \quad 1 \le j \le n.$$
This can be proved easily by another argument also. For any $j$ consider the subspaces spanned by $\{u_1, \ldots, u_j\}$ and $\{v_j, \ldots, v_n\}$, where $u_i, v_i$, $1 \le i \le n$, are eigenvectors of $A$ and $B$ corresponding to their eigenvalues $\lambda_i^{\downarrow}(A)$ and $\lambda_i^{\downarrow}(B)$, respectively. Since the dimensions of these two spaces add up to $n + 1$, they have a nonzero intersection. For a unit vector $x$ in this intersection we have $\langle x, Ax\rangle \ge \lambda_j^{\downarrow}(A)$ and $\langle x, Bx\rangle \le \lambda_j^{\downarrow}(B)$. Hence, we have
$$\|A - B\| \ge |\langle x, (A - B)x\rangle| \ge \lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B).$$
So, by symmetry,
$$|\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|, \quad 1 \le j \le n.$$
From this, as before, we can get $\max_j |s_j(A) - s_j(B)| \le \|A - B\|$ for any two matrices $A$ and $B$. This is the same as saying
$$\Phi_{(1)}(|s(A) - s(B)|) \le \Phi_{(1)}(s(A - B)). \tag{IV.58}$$
Let $T$ be a Hermitian matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge \lambda_{p+1} \ge \cdots \ge \lambda_n$, where $\lambda_p \ge 0 > \lambda_{p+1}$. Choose a unitary matrix $U$ such that $T = UDU^*$, where $D$ is the diagonal matrix $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Let $D^+ = \operatorname{diag}(\lambda_1, \ldots, \lambda_p, 0, \ldots, 0)$ and $D^- = \operatorname{diag}(0, \ldots, 0, -\lambda_{p+1}, \ldots, -\lambda_n)$. Let $T^+ = UD^+U^*$, $T^- = UD^-U^*$. Then both $T^+$ and $T^-$ are positive matrices and
$$T = T^+ - T^-. \tag{IV.59}$$
This is called the Jordan decomposition of $T$.
Lemma IV.3.2 If $A, B$ are $n \times n$ Hermitian matrices, then
$$\sum_{j=1}^{n} |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|_{(n)}. \tag{IV.60}$$

Proof. Using the Jordan decomposition of $A - B$ we can write
$$\|A - B\|_{(n)} = \operatorname{tr}(A - B)^+ + \operatorname{tr}(A - B)^-.$$
If we put
$$C = A + (A - B)^- = B + (A - B)^+,$$
then $C \ge A$ and $C \ge B$. Hence, by Weyl's monotonicity principle, $\lambda_j^{\downarrow}(C) \ge \lambda_j^{\downarrow}(A)$ and $\lambda_j^{\downarrow}(C) \ge \lambda_j^{\downarrow}(B)$ for all $j$. From these inequalities it follows that
$$|\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le 2\lambda_j^{\downarrow}(C) - \lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B).$$
Hence,
$$\sum_{j=1}^{n} |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \operatorname{tr}(2C - A - B) = \|A - B\|_{(n)}. \qquad \blacksquare$$
Corollary IV.3.3 For any two $n \times n$ matrices $A, B$ we have
$$\Phi_{(n)}(s(A) - s(B)) = \sum_{j=1}^{n} |s_j(A) - s_j(B)| \le \|A - B\|_{(n)}. \tag{IV.61}$$

Theorem IV.3.4 For $n \times n$ matrices $A, B$ we have the majorisation
$$|s(A) - s(B)| \prec_w s(A - B).$$
Proof. Choose any index $k = 1, 2, \ldots, n$ and fix it. By Proposition IV.2.3, there exist $X, Y \in M(n)$ such that
$$A - B = X + Y \quad \text{and} \quad \|A - B\|_{(k)} = \|X\|_{(n)} + k\|Y\|.$$
Define vectors $\alpha, \beta$ as
$$\alpha = s(X + B) - s(B), \qquad \beta = s(A) - s(X + B).$$
Then
$$s(A) - s(B) = \alpha + \beta.$$
Hence, by Proposition IV.1.5 (or Proposition IV.2.3 restricted to diagonal matrices) and by (IV.58) and (IV.61), we have
$$\Phi_{(k)}(s(A) - s(B)) \le \Phi_{(n)}(\alpha) + k\,\Phi_{(1)}(\beta) \le \|X\|_{(n)} + k\|Y\| = \|A - B\|_{(k)},$$
since $\Phi_{(n)}(\alpha) = \Phi_{(n)}(s(X + B) - s(B)) \le \|X\|_{(n)}$ by (IV.61), and $\Phi_{(1)}(\beta) = \Phi_{(1)}(s(A) - s(X + B)) \le \|Y\|$ by (IV.58). This proves the theorem. $\blacksquare$
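The weak majorisation of Theorem IV.3.4 can be checked by comparing partial sums after sorting in decreasing order (a sketch with arbitrary test matrices):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, -1.0]])
B = np.array([[2.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 2.0, 1.0]])

# |s(A) - s(B)| rearranged decreasingly, versus s(A - B).
d = np.sort(np.abs(np.linalg.svd(A, compute_uv=False)
                   - np.linalg.svd(B, compute_uv=False)))[::-1]
e = np.linalg.svd(A - B, compute_uv=False)   # already decreasing
ok = all(d[:k].sum() <= e[:k].sum() + 1e-10 for k in range(1, 4))
assert ok
```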
As we observed in Exercise IV.3.1, this theorem is equivalent to Lidskii's Theorem.

In Section III.2 we introduced the notation $\operatorname{Eig} A$ for a diagonal matrix whose diagonal entries are the eigenvalues of a matrix $A$. The majorisations for the eigenvalues of Hermitian matrices lead to norm inequalities
$$|||\operatorname{Eig}^{\downarrow}(A) - \operatorname{Eig}^{\downarrow}(B)||| \le |||A - B||| \tag{IV.62}$$
for all unitarily invariant norms. This is just another way of expressing Theorem III.4.4. The inequalities of Theorem III.2.8 and Problem III.6.15 are special cases of this. We will see several generalisations of this inequality and still other proofs of it.

Exercise IV.3.5 If $\operatorname{Sing}^{\downarrow}(A)$ denotes the diagonal matrix whose diagonal entries are $s_1(A), \ldots, s_n(A)$, then it follows from Theorem IV.3.4 that for any two matrices $A, B$
$$|||\operatorname{Sing}^{\downarrow}(A) - \operatorname{Sing}^{\downarrow}(B)||| \le |||A - B|||$$
for every unitarily invariant norm. Show that in this case the "opposite inequality" $|||A - B||| \le |||\operatorname{Sing}^{\downarrow}(A) - \operatorname{Sing}^{\downarrow}(B)|||$ is not always true.
IV.4  Weakly Unitarily Invariant Norms
Consider the following numbers associated with an $n \times n$ matrix:
(i) $|\operatorname{tr} A|$, the absolute value of the trace;
(ii) $\operatorname{spr} A = \max_{1 \le j \le n} |\lambda_j(A)|$, the spectral radius of $A$;
(iii) $w(A) = \max_{\|x\| = 1} |\langle x, Ax\rangle|$, the numerical radius of $A$.

Of these, the first one is a seminorm but not a norm on $M(n)$, the second one is not a seminorm, and the third one is a norm. (See Exercise I.2.10.)

All three functions of a matrix described above have an important invariance property: they do not change under unitary conjugations; i.e., the transformations $A \to UAU^*$, $U$ unitary, do not change these functions. Indeed, the first two are invariant under the larger class of similarity transformations $A \to SAS^{-1}$, $S$ invertible. The third one is not invariant under all such transformations.

Exercise IV.4.1 Show that no norm on $M(n)$ can be invariant under all similarity transformations.

Unlike the norms that were studied in Section IV.2, none of the three functions mentioned above is invariant under all transformations $A \to UAV$, where $U, V$ vary over the unitary group $\mathrm{U}(n)$. We will call a norm $\tau$ on $M(n)$ weakly unitarily invariant (wui, for short) if
$$\tau(A) = \tau(UAU^*) \quad \text{for all } A \in M(n),\ U \in \mathrm{U}(n). \tag{IV.63}$$
Examples of such norms include the unitarily invariant norms and the numerical radius. Some more will be constructed now.

Exercise IV.4.2 Let $E_{11}$ be the diagonal matrix with its top left entry 1 and all other entries zero. Then
$$w(A) = \max_{U \in \mathrm{U}(n)} |\operatorname{tr} E_{11}UAU^*|. \tag{IV.64}$$
Equivalently,
$$w(A) = \max\{|\operatorname{tr} AP| : P \text{ is an orthogonal projection of rank } 1\}.$$

Given a matrix $C$, let
$$w_C(A) = \max_{U \in \mathrm{U}(n)} |\operatorname{tr} CUAU^*|, \quad A \in M(n). \tag{IV.65}$$
This is called the C-numerical radius of $A$.

Exercise IV.4.3 For every $C \in M(n)$, the $C$-numerical radius $w_C$ is a wui seminorm on $M(n)$.

Proposition IV.4.4 The $C$-numerical radius $w_C$ is a norm on $M(n)$ if and only if (i) $C$ is not a scalar multiple of $I$, and (ii) $\operatorname{tr} C \ne 0$.
Proof. If $C = \lambda I$ for some $\lambda \in \mathbb{C}$, then $w_C(A) = |\lambda|\,|\operatorname{tr} A|$, and this is zero if $\operatorname{tr} A = 0$. So $w_C$ cannot be a norm. If $\operatorname{tr} C = 0$, then $w_C(I) = |\operatorname{tr} C| = 0$. Again $w_C$ is not a norm. Thus (i) and (ii) are necessary conditions for $w_C$ to be a norm.

Conversely, suppose $w_C(A) = 0$. If $A$ were a scalar multiple of $I$, this would mean that $\operatorname{tr} C = 0$. So, if $\operatorname{tr} C \ne 0$, then $A$ is not a scalar multiple of $I$. Hence $A$ has an eigenspace $\mathcal{M}$ of dimension $m$, for some $0 < m < n$. Since $e^{tK}$ is a unitary matrix for all real $t$ and skew-Hermitian $K$, the condition $w_C(A) = 0$ implies in particular that
$$\operatorname{tr} Ce^{tK}Ae^{-tK} = 0 \quad \text{if } t \in \mathbb{R},\ K = -K^*.$$
Differentiating this relation at $t = 0$, one gets
$$\operatorname{tr}(AC - CA)K = 0 \quad \text{if } K = -K^*.$$
Hence, we also have $\operatorname{tr}(AC - CA)X = 0$ for all $X \in M(n)$. Hence $AC = CA$. (Recall that $\langle S, T\rangle = \operatorname{tr} S^*T$ is an inner product on $M(n)$.) Since $C$ commutes with $A$, it leaves invariant the $m$-dimensional eigenspace $\mathcal{M}$ of $A$ we mentioned earlier. Now, note that since $w_C(A) = w_C(UAU^*)$, $C$ also commutes with $UAU^*$ for every $U \in \mathrm{U}(n)$. But $UAU^*$ has the space $U\mathcal{M}$ as an eigenspace. So, $C$ also leaves $U\mathcal{M}$ invariant for all $U \in \mathrm{U}(n)$. But this would mean that $C$ leaves all $m$-dimensional subspaces invariant, which in turn would mean $C$ leaves all one-dimensional subspaces invariant, which is possible only if $C$ is a scalar multiple of $I$. $\blacksquare$

More examples of wui norms are given in the following exercise.

Exercise IV.4.5 (i) $\tau(A) = \|A\| + |\operatorname{tr} A|$ is a wui norm. More generally, the sum of any wui norm and a wui seminorm is a wui norm.
(ii) $\tau(A) = \max(\|A\|, |\operatorname{tr} A|)$ is a wui norm. More generally, the maximum of any wui norm and a wui seminorm is a wui norm.
(iii) Let $W(A)$ be the numerical range of $A$. Then its diameter $\operatorname{diam} W(A)$ is a wui seminorm on $M(n)$. It can be used to generate wui norms as in (i) and (ii). Of particular interest would be the norm $\tau(A) = w(A) + \operatorname{diam} W(A)$.
(iv) Let $m(A)$ be any norm on $M(n)$. Then
$$\tau(A) = \max_{U \in \mathrm{U}(n)} m(UAU^*)$$
is a wui norm.
(v) Let $m(A)$ be any norm on $M(n)$. Then
$$\tau(A) = \int_{\mathrm{U}(n)} m(UAU^*)\,dU,$$
where the integral is with respect to the (normalised) Haar measure on $\mathrm{U}(n)$, is a wui norm.
(vi) Let
$$\tau(A) = \max_{e_1, \ldots, e_n}\ \max_{i,j} |\langle e_i, Ae_j\rangle|,$$
where $e_1, \ldots, e_n$ varies over all orthonormal bases. Then $\tau$ is a wui norm. How is this related to (ii) and (iv) above?
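The maximum in (IV.65) ranges over the whole unitary group; sampling Haar-random unitaries therefore yields only a lower bound on $w_C(A)$. The following sketch illustrates this for $C = E_{11}$, where $w_C$ is the numerical radius (the sample count is an arbitrary choice; the nilpotent test matrix has $w(A) = 1/2$, a standard fact for the $2 \times 2$ Jordan block):

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(n):
    """Haar-distributed unitary via QR with phase correction."""
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, R = np.linalg.qr(Z)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

def wc_lower_bound(C, A, samples=2000):
    """Monte Carlo lower bound for w_C(A) = max_U |tr C U A U*|."""
    best = abs(np.trace(C @ A))          # U = I is one admissible choice
    for _ in range(samples):
        U = haar_unitary(A.shape[0])
        best = max(best, abs(np.trace(C @ U @ A @ U.conj().T)))
    return best

C = np.diag([1.0, 0.0])                  # C = E_11, so w_C is the numerical radius
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # w(A) = 1/2
est = wc_lower_bound(C, A)
assert est <= 0.5 + 1e-9                 # a lower bound never exceeds w_C(A)
assert est > 0.4                         # and gets close with enough samples
```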
Let $S$ be the unit sphere in $\mathbb{C}^n$,
$$S = \{x \in \mathbb{C}^n : \|x\| = 1\},$$
and let $C(S)$ be the space of all complex valued continuous functions on $S$. Let $dx$ denote the normalised Lebesgue measure on $S$. Consider the familiar $L_p$-norms on $C(S)$ defined as
$$\|f\|_p = \Bigl(\int_S |f(x)|^p\,dx\Bigr)^{1/p},\quad 1 \le p < \infty, \qquad \|f\|_\infty = \max_{x \in S} |f(x)|. \tag{IV.66}$$
Since the measure $dx$ is invariant under rotations, the above norms satisfy the invariance property
$$N_p(f \circ U) = N_p(f) \quad \text{for all } f \in C(S),\ U \in \mathrm{U}(n).$$
We will call a norm $N$ on $C(S)$ a unitarily invariant function norm if
$$N(f \circ U) = N(f) \quad \text{for all } f \in C(S),\ U \in \mathrm{U}(n). \tag{IV.67}$$
The $L_p$-norms are important examples of such norms.

Now, every $A \in M(n)$ induces, naturally, a function $f_A$ on $S$ by its quadratic form:
$$f_A(x) = \langle x, Ax\rangle. \tag{IV.68}$$
The correspondence $A \to f_A$ is a linear map from $M(n)$ into $C(S)$, which is one-to-one. So, given a norm $N$ on $C(S)$, if we define a function $N'$ on $M(n)$ as
$$N'(A) = N(f_A), \tag{IV.69}$$
then $N'$ is a norm on $M(n)$. Further,
$$N'(UAU^*) = N(f_{UAU^*}) = N(f_A \circ U^*).$$
So, if $N$ is a unitarily invariant function norm on $C(S)$, then $N'$ is a wui norm on $M(n)$. The next theorem says that all wui norms arise in this way:
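For $N = \|\cdot\|_\infty$ the induced norm $N'$ is the numerical radius, which is a supremum over the sphere; sampling unit vectors gives a lower bound on it. A sketch (the Hermitian test matrix and sample count are arbitrary choices; for Hermitian $A$, $w(A)$ equals the spectral radius, here 3):

```python
import numpy as np

rng = np.random.default_rng(1)

def f_A(A, x):
    """The quadratic function f_A(x) = <x, Ax> of (IV.68)."""
    return np.vdot(x, A @ x)

A = np.array([[1.0, 0.0], [0.0, -3.0]])   # Hermitian, w(A) = spr(A) = 3
best = 0.0
for _ in range(4000):
    z = rng.normal(size=2) + 1j * rng.normal(size=2)
    x = z / np.linalg.norm(z)             # approximately uniform on the sphere
    best = max(best, abs(f_A(A, x)))
assert best <= 3.0 + 1e-9                 # each sample is at most the sup norm
assert best > 2.8                         # and the sampled maximum gets close
```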
Theorem IV.4.6 A norm $\tau$ on $M(n)$ is weakly unitarily invariant if and only if there exists a unitarily invariant function norm $N$ on $C(S)$ such that $\tau = N'$, where the map $N \to N'$ is defined by relations (IV.68) and (IV.69).

Proof. We need to prove that every wui norm $\tau$ on $M(n)$ is of the form $N'$ for some unitarily invariant function norm $N$. Let $F = \{f_A : A \in M(n)\}$. This is a finite-dimensional linear subspace of $C(S)$. Given a wui norm $\tau$, define $N_0$ on $F$ by
$$N_0(f_A) = \tau(A). \tag{IV.70}$$
Then $N_0$ defines a norm on $F$, and further, $N_0(f \circ U) = N_0(f)$ for all $f \in F$. We will extend $N_0$ from $F$ to all of $C(S)$ to obtain a norm $N$ that is unitarily invariant. Clearly, then $\tau = N'$.

This extension is obtained by an application of the Hahn-Banach Theorem. The space $C(S)$ is a Banach space with the supremum norm $\|f\|_\infty$. The finite-dimensional subspace $F$ has two norms, $N_0$ and $\|\cdot\|_\infty$. These must be equivalent: there exist constants $0 < \alpha \le \beta < \infty$ such that
$$\alpha\|f\|_\infty \le N_0(f) \le \beta\|f\|_\infty \quad \text{for all } f \in F.$$
Let $G$ be the set of all linear functionals on $F$ that have norm less than or equal to 1 with respect to the norm $N_0$; i.e., the linear functional $g$ is in $G$ if and only if $|g(f)| \le N_0(f)$ for all $f \in F$. By duality then
$$N_0(f) = \sup_{g \in G} |g(f)| \quad \text{for every } f \in F.$$
Now $|g(f)| \le \beta\|f\|_\infty$ for $g \in G$ and $f \in F$. Hence, by the Hahn-Banach Theorem, each $g$ can be extended to a linear functional $\tilde{g}$ on $C(S)$ such that $|\tilde{g}(f)| \le \beta\|f\|_\infty$ for all $f \in C(S)$. Now define
$$\theta(f) = \sup_{g \in G} |\tilde{g}(f)| \quad \text{for all } f \in C(S).$$
Then $\theta$ is a seminorm on $C(S)$ that coincides with $N_0$ on $F$. Let
$$\mu(f) = \max\{\theta(f),\ \alpha\|f\|_\infty\}, \quad f \in C(S).$$
Then $\mu$ is a norm on $C(S)$, and $\mu$ coincides with $N_0$ on $F$. Now define
$$N(f) = \sup_{U \in \mathrm{U}(n)} \mu(f \circ U), \quad f \in C(S).$$
Then $N$ is a unitarily invariant function norm on $C(S)$ that coincides with $N_0$ on $F$. The proof is complete. $\blacksquare$

When $N = \|\cdot\|_\infty$, the norm $N'$ induced by the above procedure is the numerical radius $w$. Another example is discussed in the Notes. The $C$-numerical radii play a useful role in proving inequalities for wui norms:
Theorem IV.4.7 For $A, B \in M(n)$ the following statements are equivalent:
(i) $\tau(A) \le \tau(B)$ for all wui norms $\tau$.
(ii) $w_C(A) \le w_C(B)$ for all upper triangular matrices $C$ that are not scalars and have nonzero trace.
(iii) $w_C(A) \le w_C(B)$ for all $C \in M(n)$.
(iv) $A$ can be expressed as a finite sum $A = \sum z_k U_k B U_k^*$, where $U_k \in \mathrm{U}(n)$ and $z_k$ are complex numbers with $\sum |z_k| \le 1$.

Proof. By Proposition IV.4.4, when $C$ is not a scalar and $\operatorname{tr} C \ne 0$, each $w_C$ is a wui norm. So (i) $\Rightarrow$ (ii).

Note that $w_C(A) = w_A(C)$ for all pairs of matrices $A, C$. So, if (ii) is true, then $w_A(C) \le w_B(C)$ for all upper triangular nonscalar matrices $C$ with nonzero trace. Since $w_A$ and $w_B$ are wui, and since every matrix is unitarily equivalent to an upper triangular matrix, this implies that $w_A(C) \le w_B(C)$ for all nonscalar matrices $C$ with nonzero trace. But such $C$ are dense in the space $M(n)$. So $w_A(C) \le w_B(C)$ for all $C \in M(n)$. Hence (iii) is true.

Let $\mathcal{K}$ be the convex hull of all matrices $e^{i\theta}UBU^*$, $\theta \in \mathbb{R}$, $U \in \mathrm{U}(n)$. Then $\mathcal{K}$ is a compact convex set in $M(n)$. The statement (iv) is equivalent to saying that $A \in \mathcal{K}$. If $A \notin \mathcal{K}$, then by the Separating Hyperplane Theorem there exists a linear functional $f$ on $M(n)$ such that $\operatorname{Re} f(A) > \operatorname{Re} f(X)$ for all $X \in \mathcal{K}$. For this linear functional $f$ there exists a matrix $C$ such that $f(Y) = \operatorname{tr} CY$ for all $Y \in M(n)$. (Problem IV.5.8.) For these $f$ and $C$ we have
$$w_C(A) = \max_{U \in \mathrm{U}(n)} |\operatorname{tr} CUAU^*| \ge |\operatorname{tr} CA| = |f(A)| \ge \operatorname{Re} f(A) > \max_{X \in \mathcal{K}} \operatorname{Re} f(X) = \max_{\theta, U} \operatorname{Re} \operatorname{tr} Ce^{i\theta}UBU^* = \max_{U} |\operatorname{tr} CUBU^*| = w_C(B).$$
So, if (iii) were true, then (iv) cannot be false. Clearly (iv) $\Rightarrow$ (i). $\blacksquare$
The family $w_C$ of $C$-numerical radii, where $C$ is not a scalar and has nonzero trace, thus plays a role analogous to that of the Ky Fan norms in the family of unitarily invariant norms. However, unlike the Ky Fan family on $M(n)$, this family is infinite. It turns out that no finite subfamily of wui norms can play this role.
More precisely, there does not exist any finite family $\tau_1, \ldots, \tau_m$ of wui norms on $M(n)$ that would lead to the inequalities $\tau(A) \le \tau(B)$ for all wui norms $\tau$ whenever $\tau_j(A) \le \tau_j(B)$, $1 \le j \le m$. For if such a family existed, then we would have
$$\{X : \tau(X) \le \tau(I) \text{ for all wui norms } \tau\} = \bigcap_{j=1}^{m}\{X : \tau_j(X) \le \tau_j(I)\}. \tag{IV.71}$$
Now each of the sets in this intersection contains 0 as an interior point (with respect to some fixed topology on $M(n)$). Hence the intersection also contains 0 as an interior point. However, by Theorem IV.4.7, the set on the left-hand side of (IV.71) reduces to the set $\{zI : z \in \mathbb{C},\ |z| \le 1\}$, and this set has an empty interior in $M(n)$.

Finally, note an important property of all wui norms:
$$\tau(\mathcal{C}(A)) \le \tau(A) \tag{IV.72}$$
for all $A \in M(n)$ and all pinchings $\mathcal{C}$ on $M(n)$. In Chapter 6 we will prove a generalisation of Lidskii's inequality (IV.62), extending it to all wui norms.

IV.5  Problems
Problem IV.5.1. When $0 < p < 1$, the function $\Phi_p(x) = (\sum |x_i|^p)^{1/p}$ does not define a norm. Show that in lieu of the triangle inequality we have
$$\Phi_p(x + y) \le 2^{\frac{1}{p}-1}\,[\Phi_p(x) + \Phi_p(y)].$$
(Use the fact that $f(t) = t^p$ on $\mathbb{R}_+$ is subadditive when $0 < p \le 1$ and convex when $p \ge 1$.) Positive homogeneous functions that do not satisfy the triangle inequality but a weaker inequality $\varphi(x + y) \le c[\varphi(x) + \varphi(y)]$, $c > 1$, are sometimes called quasinorms.

Problem IV.5.2. More generally, show that for any symmetric gauge function $\Phi$ and $0 < p < 1$, if we define $\Phi^{(p)}$ as in (IV.17), then
$$\Phi^{(p)}(x + y) \le 2^{\frac{1}{p}-1}\,[\Phi^{(p)}(x) + \Phi^{(p)}(y)].$$

Problem IV.5.3. All norms on $\mathbb{C}^n$ are equivalent in the sense that if $\Phi$ and $\Psi$ are two norms, then there exists a constant $K$ such that $\Phi(x) \le K\Psi(x)$ for all $x \in \mathbb{C}^n$. Let
$$K_{\Phi,\Psi} = \inf\{K : \Phi(x) \le K\Psi(x) \text{ for all } x \in \mathbb{C}^n\}.$$
Find $K_{\Phi,\Psi}$ when $\Phi, \Psi$ are both members of the family $\Phi_p$.

Problem IV.5.4. Show that for every norm $\Phi$, $\Phi'' = \Phi$; i.e., the dual of the dual of a norm is the norm itself. Find the constants $K_{\Phi,\Psi}$ when $\Psi = \Phi'$.

Problem IV.5.5. Find the duals of the norms $\Phi^{(p)}_{(k)}$ defined by (IV.19). (These are somewhat complicated.)

Problem IV.5.6. For $0 < p < 1$ and a unitarily invariant norm $|||\cdot|||$ on $M(n)$, let $|||A|||^{(p)} = |||\,|A|^p\,|||^{1/p}$. Show that
$$|||A + B|||^{(p)} \le 2^{\frac{1}{p}-1}\,\bigl[|||A|||^{(p)} + |||B|||^{(p)}\bigr].$$
Problem IV.5.7. Choosing $p = q = 2$ in (IV.43) or (IV.42), one obtains
$$|||\,|A^*B|^{1/2}\,|||^2 \le |||A|||\;|||B|||.$$
This, like the inequality (IV.44), is also a form of the Cauchy-Schwarz inequality for unitarily invariant norms. Show that this is just the inequality (IV.44) restricted to Q-norms.

Problem IV.5.8. Let $f$ be any linear functional on $M(n)$. Show that there exists a unique matrix $X$ such that $f(A) = \operatorname{tr} XA$ for all $A \in M(n)$.

Problem IV.5.9. Use Theorem IV.2.14 to show that for all $A, B \in M(n)$
$$\det(I + |A + B|) \le \det(I + |A|)\,\det(I + |B|).$$

Problem IV.5.10. More generally, show that for $0 < p \le 1$ and $\mu > 0$
$$\det(I + \mu|A + B|^p) \le \det(I + \mu|A|^p)\,\det(I + \mu|B|^p).$$

Problem IV.5.11. Let $\ell_p$ denote the space $\mathbb{C}^n$ with the $p$-norm defined in (IV.1) and (IV.2), $1 \le p \le \infty$. For a matrix $A$, let $\|A\|_{p \to p'}$ denote the norm of $A$ as a linear operator from $\ell_p$ to $\ell_{p'}$; i.e.,
$$\|A\|_{p \to p'} = \sup_{\|x\|_p = 1} \|Ax\|_{p'}.$$
Show that
$$\|A\|_{1 \to 1} = \max_j \sum_i |a_{ij}|, \qquad \|A\|_{\infty \to \infty} = \max_i \sum_j |a_{ij}|, \qquad \|A\|_{1 \to \infty} = \max_{i,j} |a_{ij}|.$$
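The three formulas of Problem IV.5.11 can be verified by brute force on a small real example, using the fact that the supremum over the $\ell_1$ unit ball is attained at a basis vector, and over the $\ell_\infty$ unit ball at a vector of signs (the matrix is an arbitrary choice):

```python
import numpy as np

A = np.array([[1.0, -2.0, 0.5],
              [3.0, 0.0, -1.0],
              [-0.5, 2.0, 2.0]])

col_sums = np.abs(A).sum(axis=0)   # candidates for ||A||_{1->1}
row_sums = np.abs(A).sum(axis=1)   # candidates for ||A||_{inf->inf}

# ||A||_{1->1}: extreme points of the l1 ball are +/- basis vectors.
norm_1_1 = max(np.linalg.norm(A[:, j], 1) for j in range(3))
assert abs(norm_1_1 - col_sums.max()) < 1e-12

# ||A||_{1->inf}: also attained at a basis vector.
norm_1_inf = max(np.linalg.norm(A[:, j], np.inf) for j in range(3))
assert abs(norm_1_inf - np.abs(A).max()) < 1e-12

# ||A||_{inf->inf}: extreme points of the l_inf ball are sign vectors.
signs = [np.array([a, b, c]) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
norm_inf_inf = max(np.linalg.norm(A @ x, np.inf) for x in signs)
assert abs(norm_inf_inf - row_sums.max()) < 1e-12
```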
None of these norms is weakly unitarily invariant.

Problem IV.5.12. Show that there exists a weakly unitarily invariant norm $\tau$ such that $\tau(A) \ne \tau(A^*)$ for some $A \in M(n)$.

Problem IV.5.13. Show that there exists a weakly unitarily invariant norm $\tau$ such that $\tau(A) > \tau(B)$ for some positive matrices $A, B$ with $A \le B$.

Problem IV.5.14. Let $\tau$ be a wui norm on $M(n)$. Define $\nu$ on $M(n)$ as $\nu(A) = \tau(|A|)$. Then $\nu$ is a unitarily invariant norm if and only if $\tau(A) \le \tau(B)$ whenever $0 \le A \le B$.

Problem IV.5.15. Show that for every wui norm $\tau$
$$\tau(\operatorname{Eig} A) = \inf\{\tau(SAS^{-1}) : S \in \mathrm{GL}(n)\}.$$
When is the infimum attained?

Problem IV.5.16. Let $\tau$ be a wui norm on $M(n)$. Show that for every $A$
$$\tau(A) \ge \frac{|\operatorname{tr} A|}{n}\,\tau(I).$$
Use this to show that
$$\min\{\tau(A - B) : \operatorname{tr} B = 0\} = \frac{|\operatorname{tr} A|}{n}\,\tau(I).$$

IV.6  Notes and References
The first major paper on the theory of unitarily invariant norms and symmetric gauge functions was by J. von Neumann, Some matrix inequalities and metrization of matric space, Tomsk. Univ. Rev., 1 (1937) 286-300, reprinted in his Collected Works, Pergamon Press, 1962. A famous book devoted to the study of such norms (for compact operators in a Hilbert space) is R. Schatten, Norm Ideals of Completely Continuous Operators, Springer-Verlag, 1960. Other excellent sources of information are the books by I.C. Gohberg and M.G. Krein cited in Chapter III, by A. Marshall and I. Olkin cited in Chapter II, and by R. Horn and C.R. Johnson cited in Chapter I. A succinct but complete summary can be found in L. Mirsky's paper cited in Chapter III. Much more on matrix norms (not necessarily invariant ones) can be found in the book Matrix Norms and Their Applications, by G.R. Belitskii and Y.I. Lyubich, Birkhäuser, 1988.
The notion of Q-norms is mentioned explicitly in R. Bhatia, Some inequalities for norm ideals, Commun. Math. Phys., 111 (1987) 33-39. (The possible usefulness of the idea was suggested by C. Davis.) The Cauchy-Schwarz inequality (IV.44) is proved in R. Bhatia, Perturbation inequalities
for the absolute value map in norm ideals of operators, J. Operator Theory, 19 (1988) 129-136. This, and a whole family of inequalities including the one in Problem IV.5.7, are studied in detail by R.A. Horn and R. Mathias in two papers, An analog of the Cauchy-Schwarz inequality for Hadamard products and unitarily invariant norms, SIAM J. Matrix Anal. Appl., 11 (1990) 481-498, and Cauchy-Schwarz inequalities associated with positive semidefinite matrices, Linear Algebra Appl., 142 (1990) 63-82. Many of the other inequalities in this section occur in K. Okubo, Hölder-type norm inequalities for Schur products of matrices, Linear Algebra Appl., 91 (1987) 13-28. A general study of these and related inequalities is made in R. Bhatia and C. Davis, Relations of linking and duality between symmetric gauge functions, Operator Theory: Advances and Applications, 73 (1994) 127-137.
Theorems IV.2.13 and IV.2.14 were proved by S. Ju. Rotfel'd, The singular values of a sum of completely continuous operators, in Topics in Mathematical Physics, Consultants Bureau, 1969, Vol. 3, pp. 73-78. See also R.C. Thompson, Convex and concave functions of singular values of matrix sums, Pacific J. Math., 66 (1976) 285-290. The results of Problems IV.5.9 and IV.5.10 are also due to Rotfel'd.
The proof of Lidskii's Theorem given in Section IV.3 is adapted from F. Hiai and Y. Nakamura, Majorisations for generalised s-numbers in semifinite von Neumann algebras, Math. Z., 195 (1987) 17-27.
The theory of weakly unitarily invariant norms was developed in R. Bhatia and J.A.R. Holbrook, Unitary invariance and spectral variation, Linear Algebra Appl., 95 (1987) 43-68. Theorem IV.4.6 is proved in this paper. More on C-numerical radii can be found in C.K. Li and N.K. Tsing, Norms that are invariant under unitary similarities and the C-numerical radii, Linear and Multilinear Algebra, 24 (1989) 209-222. Theorem IV.4.7 is taken from this paper. A part of this theorem (the equivalence of conditions (i) and (iv)) was proved in R. Bhatia and J.A.R. Holbrook, A softer, stronger Lidskii theorem, Proc. Indian Acad. Sciences (Math. Sciences), 99 (1989) 75-83. Two papers dealing with wui norms for infinite-dimensional operators are C.K. Fong and J.A.R. Holbrook, Unitarily invariant operator norms, Canad. J. Math., 35 (1983) 274-299, and C.K. Fong, H. Radjavi, and P. Rosenthal, Norms for matrices and operators, J. Operator Theory, 18 (1987) 99-113.
The theory of wui norms is not developed as fully as that of unitarily invariant norms. Theorem IV.4.6 would be useful if one could make the correspondence between $\tau$ and $N$ more explicit. As things stand, this has not been done even for some well-known and much-used norms like the $L_p$-norms. When $N$ is the $L_\infty$ function norm, we have noted that $N'(A) = w(A)$. When $N$ is the $L_2$ function norm, then it is shown in the Bhatia-Holbrook (1987) paper cited above that
$$N'(A) = \left(\frac{\|A\|_2^2 + |\operatorname{tr} A|^2}{n + n^2}\right)^{1/2}.$$
For other values of $p$, the correspondence has not been worked out. For a recent survey of several results on invariant norms see C.K. Li, Some aspects of the theory of norms, Linear Algebra Appl., 212/213 (1994) 71-100.
V  Operator Monotone and Operator Convex Functions

In this chapter we study an important and useful class of functions called operator monotone functions. These are real functions whose extensions to Hermitian matrices preserve order. Such functions have several special properties, some of which are studied in this chapter. They are closely related to properties of operator convex functions. We shall study both of these together.
V.1  Definitions and Simple Examples
Let $f$ be a real function defined on an interval $I$. If $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix whose diagonal entries $\lambda_j$ are in $I$, we define $f(D) = \operatorname{diag}(f(\lambda_1), \ldots, f(\lambda_n))$. If $A$ is a Hermitian matrix whose eigenvalues $\lambda_j$ are in $I$, we choose a unitary $U$ such that $A = UDU^*$, where $D$ is diagonal, and then define $f(A) = Uf(D)U^*$. In this way we can define $f(A)$ for all Hermitian matrices (of any order) whose eigenvalues are in $I$. In the rest of this chapter, it will always be assumed that our functions are real functions defined on an interval (finite or infinite, closed or open) and are extended to Hermitian matrices in this way.

We will use the notation $A \le B$ to mean $A$ and $B$ are Hermitian and $B - A$ is positive. The relation $\le$ is a partial order on Hermitian matrices.

A function $f$ is said to be matrix monotone of order $n$ if it is monotone with respect to this order on $n \times n$ Hermitian matrices, i.e., if $A \le B$ implies $f(A) \le f(B)$. If $f$ is matrix monotone of order $n$ for all $n$, we say $f$ is matrix monotone or operator monotone.
A function is said to be matrix convex of order n if for all Hermitian matrices A and B and for all real numbers 0 < A < 1 ,
f
n x n
f ( ( 1  A) A + AB) < ( 1  A) f (A ) + A f ( B ) . If f is matrix convex of all orders, we say that
operator convex.
(V. 1 )
f is matrix convex or
(No te that if the eigenvalues of A and B are all in an interval /, then the eigenvalues of any convex combination of A , B are also in /. This is an easy consequence of results in Chapter III . ) We will consider continuous functions only. In this case, the condition (V. l ) can be replaced by the more special condition (V.2)
( Functions satisfying (V.2) are called midpoint operator convex, and if they are continuous, then they are convex. ) A function f is called operator concave if the function f is operator convex. It is clear that the set of operator monotone functions and the set of operator convex functions are both closed under positive linear combina tions and also under (pointwise) limits. In other words, if , g are operator monotone, and if a, {3 are positive real numbers, then a + f3g is also oper are operator monotone, and if ( x ) ( x ) , then ator monotone. If is also operator monotone. The same is true for operator convex functions.
f
f f f f f Example V. l . l The function f(t) == a + (3t is operator monotone {on every interval} for every a E IR and {3 > 0. It is operator convex for all n
n
7
a, {J E �
The first surprise is in the following example. Example V. 1.2 f(t) == t 2 [0 , oo)
The function on is not operator mono tone. In other words, there exist positive matrices A , B such that B  A is positive but B2  A2 is not. To see this, take Example V. 1.3
The function f(t) t2 is operator convex on every in terval. To see this, note that for any Hermitian matrices A, B , 2 2 2A2 + B2  ( A + B ) 2 A B) == _!_ A AB BA) B  0. ( ( 4 4 2 2 This{3 shows that the function f ( t) + (3t + "(t2 is operator convex for all IR, > 0 . ==
=
a,
E
!
1
+
== a
>
1 14
V.
O perator Mo notone and O perator Co nvex Functions
Example V.1.4 The function f(t) = t^3 on [0, ∞) is not operator convex. To see this, let

A = (1 0; 0 0),   B = (2 1; 1 1).

Then

(A^3 + B^3)/2 − ((A + B)/2)^3 = (1/4)(11 9; 9 7),

and this is not positive, since its determinant is negative.
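The midpoint-convexity gap above can also be checked numerically (an illustrative numpy sketch, not part of the original argument):

```python
import numpy as np

# The matrices from Example V.1.4: both positive semidefinite.
A = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])

cube = lambda X: X @ X @ X
# If t^3 were operator convex, this gap would be positive semidefinite.
gap = (cube(A) + cube(B)) / 2 - cube((A + B) / 2)

print(gap)                      # equals (1/4) * [[11, 9], [9, 7]]
print(np.linalg.eigvalsh(gap))  # one eigenvalue is negative
```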
Examples V.1.2 and V.1.4 show that very simple functions which are monotone (convex) as real functions need not be operator monotone (operator convex). A complete description of operator monotone and operator convex functions will be given in later sections. It is instructive to study a few more examples first. The operator monotonicity or convexity of some functions can be proved by special arguments that are useful in other contexts as well.

We will repeatedly use two simple facts. If A is positive, then A ≤ I if and only if spr(A) ≤ 1. An operator A is a contraction (‖A‖ ≤ 1) if and only if A*A ≤ I; this is also equivalent to the condition AA* ≤ I. The following elementary lemma is also used often.

Lemma V.1.5 If B ≥ A, then for every operator X we have X*BX ≥ X*AX.

Proof. For every vector u we have

⟨u, X*BXu⟩ = ⟨Xu, BXu⟩ ≥ ⟨Xu, AXu⟩ = ⟨u, X*AXu⟩.

This proves the lemma. An equally brief proof goes as follows. Let C be the positive square root of the positive operator B − A. Then

X*(B − A)X = X*CCX = (CX)*CX ≥ 0.   •

Proposition V.1.6 The function f(t) = −1/t is operator monotone on (0, ∞).

Proof. Let B ≥ A > 0. Then, by Lemma V.1.5, I ≥ B^{−1/2} A B^{−1/2}. Since the map T → T^{−1} is order-reversing on commuting positive operators, we have I ≤ B^{1/2} A^{−1} B^{1/2}. Again using Lemma V.1.5, we get from this B^{−1} ≤ A^{−1}.   •
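The conclusion of Proposition V.1.6 (inversion reverses the order on positive operators) is easy to test on random matrices; a quick numpy sketch, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))
A = X @ X.T + np.eye(n)        # strictly positive
Y = rng.standard_normal((n, n))
B = A + Y @ Y.T                # B >= A > 0

# B >= A > 0 implies A^{-1} >= B^{-1}.
gap = np.linalg.inv(A) - np.linalg.inv(B)
print(np.linalg.eigvalsh(gap).min())   # nonnegative, up to rounding
```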
Lemma V.1.7 If B ≥ A ≥ 0 and B is invertible, then ‖A^{1/2} B^{−1/2}‖ ≤ 1.

Proof. If B ≥ A ≥ 0, then

I ≥ B^{−1/2} A B^{−1/2} = (A^{1/2} B^{−1/2})* A^{1/2} B^{−1/2},

and hence ‖A^{1/2} B^{−1/2}‖ ≤ 1.   •
V.1 Definitions and Simple Examples
Proposition V.1.8 The function f(t) = t^{1/2} is operator monotone on [0, ∞).

Proof. Let B ≥ A ≥ 0. Suppose B is invertible. Then, by Lemma V.1.7,

spr(B^{−1/4} A^{1/2} B^{−1/4}) = spr(A^{1/2} B^{−1/2}) ≤ ‖A^{1/2} B^{−1/2}‖ ≤ 1.

Since B^{−1/4} A^{1/2} B^{−1/4} is positive, this implies that I ≥ B^{−1/4} A^{1/2} B^{−1/4}. Hence, by Lemma V.1.5, B^{1/2} ≥ A^{1/2}. This proves the proposition under the assumption that B is invertible. If B is not strictly positive, then for every ε > 0, B + εI is strictly positive. So (B + εI)^{1/2} ≥ A^{1/2}. Let ε → 0. This shows that B^{1/2} ≥ A^{1/2}.   •
Theorem V.1.9 The function f(t) = t^r is operator monotone on [0, ∞) for 0 ≤ r ≤ 1.
Proof. Let r be a dyadic rational, i.e., a number of the form r = m/2^n, where n is any positive integer and 1 ≤ m ≤ 2^n. We will first prove the assertion for such r. This is done by induction on n. Proposition V.1.8 shows that the assertion of the theorem is true when n = 1. Suppose it is also true for all dyadic rationals j/2^{n−1}, in which 1 ≤ j ≤ 2^{n−1}. Let B ≥ A and let r = m/2^n. Suppose m ≤ 2^{n−1}. Then, by the induction hypothesis, B^{m/2^{n−1}} ≥ A^{m/2^{n−1}}. Hence, by Proposition V.1.8, B^{m/2^n} ≥ A^{m/2^n}.

Suppose m > 2^{n−1}. If B ≥ A > 0, then A^{−1} ≥ B^{−1}. Using Lemma V.1.5, we have

B^{m/2^n} A^{−1} B^{m/2^n} ≥ B^{m/2^n} B^{−1} B^{m/2^n} = B^{m/2^{n−1} − 1}.

By the same argument,

A^{−1/2} B^{m/2^n} A^{−1} B^{m/2^n} A^{−1/2} ≥ A^{−1/2} B^{m/2^{n−1} − 1} A^{−1/2} ≥ A^{−1/2} A^{m/2^{n−1} − 1} A^{−1/2}   (by the induction hypothesis).

This can also be written as

(A^{−1/2} B^{m/2^n} A^{−1/2})^2 ≥ (A^{m/2^n − 1})^2.

So, by the operator monotonicity of the square root,

A^{−1/2} B^{m/2^n} A^{−1/2} ≥ A^{m/2^n − 1}.

Hence, B^{m/2^n} ≥ A^{m/2^n}. We have shown that B ≥ A ≥ 0 implies B^r ≥ A^r for all dyadic rationals r in [0, 1]. Such r are dense in [0, 1]. So we have B^r ≥ A^r for all r in [0, 1]. By continuity, this is true even when A is only positive semidefinite.   •

Exercise V.1.10 Another proof of Theorem V.1.9 is outlined below. Fill in the details.
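Theorem V.1.9 is easy to observe numerically for random positive pairs; a sketch (illustrative only) with matrix powers computed through the spectral decomposition:

```python
import numpy as np

def mpow(X, r):
    """X^r for positive semidefinite X, via the spectral decomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.clip(w, 0.0, None) ** r) @ V.T

rng = np.random.default_rng(1)
n = 5
G = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))
A = G @ G.T              # A >= 0
B = A + H @ H.T          # B >= A

for r in (0.25, 0.5, 0.75, 1.0):
    print(r, np.linalg.eigvalsh(mpow(B, r) - mpow(A, r)).min())  # all >= 0, up to rounding
```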
(i) The composition of two operator monotone functions is operator monotone. Use this and Proposition V.1.6 to prove that the function f(t) = t/(1 + t) is operator monotone on (0, ∞).

(ii) For each λ > 0, the function f(t) = t/(λ + t) is operator monotone on (0, ∞).

(iii) One of the integrals calculated by contour integration in Complex Analysis is

∫_0^∞ λ^{r−1}/(1 + λ) dλ = π cosec rπ,   0 < r < 1.   (V.3)

By a change of variables, obtain from this the formula

t^r = (sin rπ / π) ∫_0^∞ (λ^{r−1} t)/(λ + t) dλ,   (V.4)

valid for all t ≥ 0 and 0 < r < 1.

(iv) Thus, we can write

t^r = ∫_0^∞ (t/(λ + t)) dμ(λ),   0 < r < 1,   (V.5)

where μ is a positive measure on (0, ∞). Now use (ii) to conclude that the function f(t) = t^r is operator monotone on (0, ∞) for 0 < r < 1.
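Formula (V.4) can be spot-checked by numerical quadrature; the sketch below (an illustration, not part of the text) uses the substitution λ = e^u, which turns the integral into one over the real line:

```python
import numpy as np

r, t = 0.5, 2.0

# Substituting lambda = e^u in (V.4), the integrand becomes e^{ru} * t / (e^u + t).
u = np.linspace(-40.0, 40.0, 400001)
g = np.exp(r * u) * t / (np.exp(u) + t)
integral = np.sum((g[:-1] + g[1:]) / 2) * (u[1] - u[0])   # trapezoid rule

approx = np.sin(r * np.pi) / np.pi * integral
print(approx, t ** r)    # both close to sqrt(2) = 1.41421...
```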
Example V.1.11 The function f(t) = |t| is not operator convex on any interval that contains 0. To see this, take

A = (1 1; 1 1),   B = (−2 0; 0 0).

Then

|A| = (1 1; 1 1),   |B| = (2 0; 0 0),   |A| + |B| = (3 1; 1 1).

But |A + B| = √2 I. So |A| + |B| − |A + B| is not positive. (See also Exercise III.5.7.)
Example V.1.12 The function f(t) = t ∨ 0 is not operator convex on any interval that contains 0. To see this, take A, B as in Example V.1.11. Since the eigenvalues of A are 2 and 0, f(A) = A; since the eigenvalues of B are −2 and 0, f(B) = 0. So (1/2)(f(A) + f(B)) = (1/2)(1 1; 1 1). Any positive matrix dominated by this must have (1, −1) as an eigenvector with 0 as the corresponding eigenvalue. Since (1/2)(A + B) does not have (1, −1) as an eigenvector, neither does f((A + B)/2). So f((A + B)/2) is not dominated by (1/2)(f(A) + f(B)).
Exercise V.1.13 Let I be any interval. For a ∈ I, let f(t) = (t − a) ∨ 0. Then f is called an "angle function" angled at a. If I is a finite interval, then every convex function on I is a limit of positive linear combinations of linear functions and angle functions. Use this to show that angle functions are not operator convex.

Exercise V.1.14 Show that the function f(t) = t ∨ 0 is not operator monotone on any interval that contains 0.
Exercise V.1.15 Let A, B be positive. Show that

(A^{−1} + B^{−1})/2 − ((A + B)/2)^{−1} = (1/2)(A^{−1} − B^{−1})(A^{−1} + B^{−1})^{−1}(A^{−1} − B^{−1}) ≥ 0.

Therefore, the function f(t) = 1/t is operator convex on (0, ∞).
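The identity in Exercise V.1.15 can be confirmed numerically (an illustrative numpy sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X, Y = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A = X @ X.T + np.eye(n)        # strictly positive
B = Y @ Y.T + np.eye(n)

Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
lhs = (Ai + Bi) / 2 - np.linalg.inv((A + B) / 2)
rhs = (Ai - Bi) @ np.linalg.inv(Ai + Bi) @ (Ai - Bi) / 2

print(np.max(np.abs(lhs - rhs)))        # ~ 0 : the two sides agree
print(np.linalg.eigvalsh(lhs).min())    # >= 0 : midpoint convexity of 1/t
```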
V.2  Some Characterisations
There are several different notions of averaging in the space of operators. In this section we study the relationship between some of these operations and operator convex functions. This leads to some characterisations of operator convex and operator monotone functions and to the interrelations between them.

In the proofs that follow, we will frequently use properties of operators on the direct sum H ⊕ H to draw conclusions about operators on H. This technique was outlined briefly in Section I.3.

Let K be a contraction on H. Let L = (I − KK*)^{1/2}, M = (I − K*K)^{1/2}. Then the operators U, V defined as

U = (K L; M −K*),   V = (K −L; M K*)   (V.6)

are unitary operators on H ⊕ H. (See Exercise I.3.6.) More specially, for each 0 ≤ λ ≤ 1, the operator

W = (λ^{1/2} I   −(1 − λ)^{1/2} I ; (1 − λ)^{1/2} I   λ^{1/2} I)   (V.7)

is a unitary operator on H ⊕ H.

Theorem V.2.1 Let f be a real function on an interval I. Then the following two statements are equivalent:

(i) f is operator convex on I.

(ii) f(C(A)) ≤ C(f(A)) for every Hermitian operator A (on a Hilbert space H) whose spectrum is contained in I and for every pinching C (in the space H).
Proof. (i) ⇒ (ii): Every pinching is a product of pinchings by two complementary projections. (See Problems II.5.4 and II.5.5.) So we need to prove this implication only for pinchings C of the form

C(X) = (X + U*XU)/2,   where U = (I 0; 0 −I).

For such a C,

f(C(A)) = f((A + U*AU)/2) ≤ (f(A) + f(U*AU))/2 = (f(A) + U*f(A)U)/2 = C(f(A)).

(ii) ⇒ (i): Let A, B be Hermitian operators on H, both having their spectrum in I. Consider the operator T = (A 0; 0 B) on H ⊕ H. If W is the unitary operator defined in (V.7), then the diagonal entries of W*TW are λA + (1 − λ)B and (1 − λ)A + λB. So if C is the pinching on H ⊕ H induced by the projections onto the two summands, then

C(W*TW) = (λA + (1 − λ)B   0 ; 0   (1 − λ)A + λB).

By the same argument,

C(f(W*TW)) = C(W* f(T) W) = (λ f(A) + (1 − λ) f(B)   0 ; 0   (1 − λ) f(A) + λ f(B)).

So the condition f(C(W*TW)) ≤ C(f(W*TW)) implies that

f(λA + (1 − λ)B) ≤ λ f(A) + (1 − λ) f(B).   •
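Theorem V.2.1 can be illustrated numerically with the operator convex function f(t) = t^2 of Example V.1.3; a sketch (my illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                        # Hermitian

def pinch(M, k=3):
    """Pinching induced by the projections onto the first k and last n-k coordinates."""
    C = np.zeros_like(M)
    C[:k, :k], C[k:, k:] = M[:k, :k], M[k:, k:]
    return C

# f(t) = t^2 is operator convex, so C(f(A)) - f(C(A)) should be positive.
gap = pinch(A @ A) - pinch(A) @ pinch(A)
print(np.linalg.eigvalsh(gap).min())     # >= 0, up to rounding
```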
Exercise V.2.2 The following conditions are equivalent:

(i) f is operator convex on I.

(ii) f(A_M) ≤ (f(A))_M for every Hermitian operator A with its spectrum in I, and for every compression T → T_M.

(iii) f(V*AV) ≤ V* f(A) V for every Hermitian operator A (on H) with its spectrum in I, and for every isometry V from any Hilbert space into H.

(See Section III.1 for the definition of a compression.)
Theorem V.2.3 Let I be an interval containing 0, and let f be a real function on I. Then the following conditions are equivalent:

(i) f is operator convex on I and f(0) ≤ 0.

(ii) f(K*AK) ≤ K* f(A) K for every contraction K and every Hermitian operator A with spectrum in I.

(iii) f(K_1* A K_1 + K_2* B K_2) ≤ K_1* f(A) K_1 + K_2* f(B) K_2 for all operators K_1, K_2 such that K_1* K_1 + K_2* K_2 ≤ I and for all Hermitian A, B with spectrum in I.

(iv) f(PAP) ≤ P f(A) P for all projections P and Hermitian operators A with spectrum in I.
Proof. (i) ⇒ (ii): Let T = (A 0; 0 0) and let U, V be the unitary operators defined in (V.6). Then

U*TU = (K*AK   K*AL ; LAK   LAL),   V*TV = (K*AK   −K*AL ; −LAK   LAL).

Hence,

(K*AK   0 ; 0   LAL) = (U*TU + V*TV)/2.

So,

(f(K*AK)   0 ; 0   f(LAL)) = f((U*TU + V*TV)/2)
  ≤ (f(U*TU) + f(V*TV))/2
  = (U* f(T) U + V* f(T) V)/2
  = (1/2) { U* (f(A) 0; 0 f(0)) U + V* (f(A) 0; 0 f(0)) V }
  ≤ (1/2) { U* (f(A) 0; 0 0) U + V* (f(A) 0; 0 0) V }
  = (K* f(A) K   0 ; 0   L f(A) L),

where the last inequality uses f(0) ≤ 0. Hence, f(K*AK) ≤ K* f(A) K.

(ii) ⇒ (iii): Let T = (A 0; 0 B) and K = (K_1 0; K_2 0). Then K is a contraction. Note that

K*TK = (K_1* A K_1 + K_2* B K_2   0 ; 0   0).
Hence,

f(K*TK) ≤ K* f(T) K = (K_1* f(A) K_1 + K_2* f(B) K_2   0 ; 0   0),

and comparing the top left corners gives (iii).
(iii) ⇒ (iv) obviously.

(iv) ⇒ (i): Let A, B be Hermitian operators with spectrum in I and let 0 ≤ λ ≤ 1. Let T = (A 0; 0 B), P = (I 0; 0 0), and let W be the unitary operator defined by (V.7). Then

P W*TW P = (λA + (1 − λ)B   0 ; 0   0).

So,

(f(λA + (1 − λ)B)   0 ; 0   f(0)) = f(P W*TW P) ≤ P f(W*TW) P = P W* f(T) W P = (λ f(A) + (1 − λ) f(B)   0 ; 0   0).

Hence, f is operator convex and f(0) ≤ 0.   •
Exercise V.2.4 (i) Let λ_1, λ_2 be positive real numbers such that λ_1 λ_2 ≥ ‖C‖^2. Then the operator

(λ_1 I   C* ; C   λ_2 I)

is positive. (Use Proposition I.3.5.)

(ii) Let (A   C* ; C   B) be a Hermitian operator. Then for every ε > 0, there exists λ > 0 such that

(A   C* ; C   B) ≤ (A + εI   0 ; 0   λI).
The next two theorems are among the several results that describe the connections between operator convexity and operator monotonicity.

Theorem V.2.5 Let f be a (continuous) function mapping the positive half-line [0, ∞) into itself. Then f is operator monotone if and only if it is operator concave.
Proof. Suppose f is operator monotone. If we show that f(K*AK) ≥ K* f(A) K for every positive operator A and every contraction K, then it will follow from Theorem V.2.3 that f is operator concave. Let T = (A 0; 0 0), and let U be the unitary operator defined in (V.6). Then

U*TU = (K*AK   K*AL ; LAK   LAL).

By the assertion in Exercise V.2.4(ii), given any ε > 0, there exists λ > 0 such that

U*TU ≤ (K*AK + εI   0 ; 0   λI).

Hence, by the operator monotonicity of f,

U* f(T) U = f(U*TU) ≤ (f(K*AK + εI)   0 ; 0   f(λ) I).

Since f(0) ≥ 0, the top left corner of U* f(T) U dominates K* f(A) K. Comparing top left corners above, this shows K* f(A) K ≤ f(K*AK + εI) for every ε > 0. Hence K* f(A) K ≤ f(K*AK).

Conversely, suppose f is operator concave. Let 0 ≤ A ≤ B. Then for any 0 < λ < 1 we can write

λB = λA + (1 − λ) (λ/(1 − λ))(B − A).

Since f is operator concave, this gives

f(λB) ≥ λ f(A) + (1 − λ) f((λ/(1 − λ))(B − A)).

Since f(X) is positive for every positive X, it follows that f(λB) ≥ λ f(A). Now let λ → 1. This shows f(B) ≥ f(A). So f is operator monotone.   •
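By Theorems V.1.9 and V.2.5, the square root is operator concave; a numerical spot-check (illustrative numpy sketch, not part of the text):

```python
import numpy as np

def sqrtm_psd(X):
    """Square root of a positive semidefinite matrix via its spectral decomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

rng = np.random.default_rng(4)
n = 5
A = (lambda G: G @ G.T)(rng.standard_normal((n, n)))
B = (lambda G: G @ G.T)(rng.standard_normal((n, n)))

# Midpoint concavity: sqrt((A+B)/2) >= (sqrt(A) + sqrt(B))/2.
gap = sqrtm_psd((A + B) / 2) - (sqrtm_psd(A) + sqrtm_psd(B)) / 2
print(np.linalg.eigvalsh(gap).min())   # >= 0, up to rounding
```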
Corollary V.2.6 Let f be a continuous function from (0, ∞) into itself. If f is operator monotone, then the function g(t) = 1/f(t) is operator convex.

Proof. Let A, B be positive operators. Since f is operator concave,

f((A + B)/2) ≥ (f(A) + f(B))/2.

Since the map X → X^{−1} is order-reversing and convex on positive operators (see Proposition V.1.6 and Exercise V.1.15), this gives

[f((A + B)/2)]^{−1} ≤ [(f(A) + f(B))/2]^{−1} ≤ (f(A)^{−1} + f(B)^{−1})/2.

This is the same as saying g is operator convex.   •
Exercise V.2.7 Let I be an interval containing 0, and let f be a real function on I with f(0) ≤ 0. Show that for every Hermitian operator A with spectrum in I and for every projection P,

f(PAP) ≤ P f(PAP) = P f(PAP) P.
Exercise V.2.8 Let f be a continuous real function on [0, ∞). Then for all positive operators A and projections P,

f(A^{1/2} P A^{1/2}) A^{1/2} P = A^{1/2} P f(PAP).

(Prove this first, by induction, for f(t) = t^n. Then use the Weierstrass approximation theorem to show that this is true for all f.)

Theorem V.2.9 Let f be a (continuous) real function on the interval [0, α). Then the following two conditions are equivalent:

(i) f is operator convex and f(0) ≤ 0.

(ii) The function g(t) = f(t)/t is operator monotone on (0, α).

Proof. (i) ⇒ (ii): Let 0 < A ≤ B. Then 0 < A^{1/2} ≤ B^{1/2}. Hence, B^{−1/2} A^{1/2} is a contraction by Lemma V.1.7. Therefore, using Theorem V.2.3, we see that

f(A) = f(A^{1/2} B^{−1/2} B B^{−1/2} A^{1/2}) ≤ A^{1/2} B^{−1/2} f(B) B^{−1/2} A^{1/2}.

From this one obtains, using Lemma V.1.5,

A^{−1/2} f(A) A^{−1/2} ≤ B^{−1/2} f(B) B^{−1/2}.

Since all functions of an operator commute with each other, this shows that A^{−1} f(A) ≤ B^{−1} f(B). Thus g is operator monotone.

(ii) ⇒ (i): If f(t)/t is monotone on (0, α), we must have f(0) ≤ 0. We will show that f satisfies the condition (iv) of Theorem V.2.3. Let P be any projection and let A be any positive operator with spectrum in (0, α). Then there exists an ε > 0 such that (1 + ε)A has its spectrum in (0, α). Since P + εI ≤ (1 + ε)I, we have A^{1/2}(P + εI)A^{1/2} ≤ (1 + ε)A. So, by the operator monotonicity of g, we have

A^{−1/2} (P + εI)^{−1} A^{−1/2} f(A^{1/2}(P + εI)A^{1/2}) ≤ (1 + ε)^{−1} A^{−1} f((1 + ε)A).

Multiply both sides on the right by A^{1/2}(P + εI) and on the left by its conjugate (P + εI)A^{1/2}. This gives

A^{−1/2} f(A^{1/2}(P + εI)A^{1/2}) A^{1/2}(P + εI) ≤ (1 + ε)^{−1} (P + εI) f((1 + ε)A) (P + εI).

Let ε → 0. This gives

A^{−1/2} f(A^{1/2} P A^{1/2}) A^{1/2} P ≤ P f(A) P.

Use the identity in Exercise V.2.8 to reduce this to P f(PAP) ≤ P f(A) P, and then use the inequality in Exercise V.2.7 to conclude that f(PAP) ≤ P f(A) P, as desired.   •
As corollaries to the above results, we deduce the following statements about the power functions .
Theorem V.2.10 On the positive half-line (0, ∞), the functions f(t) = t^r, where r is a real number, are operator monotone if and only if 0 ≤ r ≤ 1.

Proof. If 0 ≤ r ≤ 1, we know that f(t) = t^r is operator monotone by Theorem V.1.9. If r is not in [0, 1], then the function f(t) = t^r is not concave on (0, ∞). Therefore, it cannot be operator monotone, by Theorem V.2.5.   •

Exercise V.2.11 Consider the functions f(t) = t^r on (0, ∞). Use Theorems V.2.9 and V.2.10 to show that if r > 0, then f(t) is operator convex if and only if 1 ≤ r ≤ 2. Use Corollary V.2.6 to show that f(t) is operator convex if −1 ≤ r < 0. (We will see later that f(t) is not operator convex for any other value of r.)

Exercise V.2.12 A function f from (0, ∞) into itself is both operator monotone and operator convex if and only if it is of the form f(t) = α + βt, with α, β ≥ 0.
Exercise V.2.13 Show that the function f(t) = −t log t is operator concave on (0, ∞).

V.3  Smoothness Properties
Let I be the open interval (−1, 1). Let f be a continuously differentiable function on I. Then we denote by f^[1] the function on I × I defined as

f^[1](λ, μ) = (f(λ) − f(μ))/(λ − μ)  if λ ≠ μ,   f^[1](λ, λ) = f'(λ).

The expression f^[1](λ, μ) is called the first divided difference of f at (λ, μ). If A is a diagonal matrix with diagonal entries λ_1, ..., λ_n, all of which are in I, we denote by f^[1](A) the n × n symmetric matrix whose (i, j)-entry is f^[1](λ_i, λ_j). If A is Hermitian and A = UΛU*, let f^[1](A) = U f^[1](Λ) U*.

Now consider the induced map f on the set of Hermitian matrices with eigenvalues in I. Such matrices form an open set in the real vector space of all Hermitian matrices. The map f is called (Fréchet) differentiable at A if there exists a linear transformation Df(A) on the space of Hermitian matrices such that for all H,

‖f(A + H) − f(A) − Df(A)(H)‖ = o(‖H‖).   (V.8)

The linear operator Df(A) is then called the derivative of f at A. Basic rules of the Fréchet differential calculus are summarised in Chapter X. If
f is differentiable at A, then

Df(A)(H) = (d/dt)|_{t=0} f(A + tH).   (V.9)

There is an interesting relationship between the derivative Df(A) and the matrix f^[1](A). This is explored in the next few paragraphs.
Lemma V.3.1 Let f be a polynomial function. Then for every diagonal matrix A and for every Hermitian matrix H,

Df(A)(H) = f^[1](A) ∘ H,   (V.10)

where ∘ stands for the Schur product of two matrices.

Proof. Both sides of (V.10) are linear in f. Therefore, it suffices to prove this for the powers f(t) = t^p, p = 1, 2, 3, .... For such f, using (V.9), one gets

Df(A)(H) = Σ_{k=1}^{p} A^{k−1} H A^{p−k}.

This is a matrix whose (i, j)-entry is (Σ_{k=1}^{p} λ_i^{k−1} λ_j^{p−k}) h_{ij}. On the other hand, f^[1](λ_i, λ_j) = Σ_{k=1}^{p} λ_i^{k−1} λ_j^{p−k}. So this is the (i, j)-entry of f^[1](A) ∘ H.   •

Corollary V.3.2 If A = UΛU* and f is a polynomial function, then

Df(A)(H) = U [f^[1](Λ) ∘ (U*HU)] U*.   (V.11)

Proof. Note that

(d/dt)|_{t=0} f(UΛU* + tH) = U [(d/dt)|_{t=0} f(Λ + tU*HU)] U*,

and use (V.10).   •

Theorem V.3.3 Let f ∈ C^1(I), and let A be a Hermitian matrix with all its eigenvalues in I. Then

Df(A)(H) = f^[1](A) ∘ H,   (V.12)

where ∘ denotes the Schur product in a basis in which A is diagonal.

Proof. Let A = UΛU*, where Λ is diagonal. We want to prove that

Df(A)(H) = U [f^[1](Λ) ∘ (U*HU)] U*.   (V.13)
This has been proved for all polynomials f. We will extend its validity to all f ∈ C^1 by a continuity argument. Denote the right-hand side of (V.13) by ∇f(A)(H). For each f in C^1, ∇f(A) is a linear map on Hermitian matrices. We have

‖∇f(A)(H)‖_2 = ‖f^[1](Λ) ∘ (U*HU)‖_2.

All entries of the matrix f^[1](Λ) are bounded by max_{|t| ≤ ‖A‖} |f'(t)|. (Use the mean value theorem.) Hence

‖∇f(A)(H)‖_2 ≤ max_{|t| ≤ ‖A‖} |f'(t)| ‖H‖_2.   (V.14)

Let H be a Hermitian matrix with norm so small that the eigenvalues of A + H are in I. Let [a, b] be a closed interval in I containing the eigenvalues of both A and A + H. Choose a sequence of polynomials f_n such that f_n → f and f_n' → f' uniformly on [a, b]. Let L be the line segment joining A and A + H in the space of Hermitian matrices. Then, by the mean value theorem (for Fréchet derivatives), we have

‖f_m(A + H) − f_n(A + H) − (f_m(A) − f_n(A))‖ ≤ ‖H‖ sup_{X ∈ L} ‖Df_m(X) − Df_n(X)‖ = ‖H‖ sup_{X ∈ L} ‖∇f_m(X) − ∇f_n(X)‖.   (V.15)

This is so because we have already shown that Df_n = ∇f_n for the polynomial functions f_n. Let ε be any positive real number. The inequality (V.14) ensures that there exists a positive integer n_0 such that for m, n ≥ n_0 we have

sup_{X ∈ L} ‖∇f_m(X) − ∇f_n(X)‖ < ε/3   (V.16)

and

‖∇f_n(A) − ∇f(A)‖ < ε/3.   (V.17)

Let m → ∞ and use (V.15) and (V.16) to conclude that

‖f(A + H) − f(A) − (f_n(A + H) − f_n(A))‖ ≤ (ε/3)‖H‖.   (V.18)

If ‖H‖ is sufficiently small, then by the definition of the Fréchet derivative, we have

‖f_n(A + H) − f_n(A) − ∇f_n(A)(H)‖ ≤ (ε/3)‖H‖.   (V.19)

Now we can write, using the triangle inequality,

‖f(A + H) − f(A) − ∇f(A)(H)‖ ≤ ‖f(A + H) − f(A) − (f_n(A + H) − f_n(A))‖ + ‖f_n(A + H) − f_n(A) − ∇f_n(A)(H)‖ + ‖(∇f(A) − ∇f_n(A))(H)‖,

and then use (V.17), (V.18), and (V.19) to conclude that, for ‖H‖ sufficiently small, we have

‖f(A + H) − f(A) − ∇f(A)(H)‖ ≤ ε‖H‖.

But this says that Df(A) = ∇f(A).   •
Let t → A(t) be a C^1 map from the interval [0, 1] into the space of Hermitian matrices that have all their eigenvalues in I. Let f ∈ C^1(I), and let F(t) = f(A(t)). Then, by the chain rule, F'(t) = Df(A(t))(A'(t)). Therefore, by the theorem above, we have

F(1) − F(0) = ∫_0^1 f^[1](A(t)) ∘ A'(t) dt,   (V.20)

where for each t the Schur product is taken in a basis that diagonalises A(t).
Theorem V.3.4 Let f ∈ C^1(I). Then f is operator monotone on I if and only if, for every Hermitian matrix A whose eigenvalues are in I, the matrix f^[1](A) is positive.

Proof. Let f be operator monotone, and let A be a Hermitian matrix whose eigenvalues are in I. Let H be the matrix all whose entries are 1. Then H is positive. So, A + tH ≥ A if t ≥ 0. Hence, f(A + tH) − f(A) is positive for small positive t. This implies that Df(A)(H) ≥ 0. So, by Theorem V.3.3, f^[1](A) ∘ H ≥ 0. But, for this special choice of H, this just says that f^[1](A) ≥ 0.

To prove the converse, let A, B be Hermitian matrices whose eigenvalues are in I, and let B ≥ A. Let A(t) = (1 − t)A + tB, 0 ≤ t ≤ 1. Then A(t) also has all its eigenvalues in I. So, by the hypothesis, f^[1](A(t)) ≥ 0 for all t. Note that A'(t) = B − A ≥ 0 for all t. Since the Schur product of two positive matrices is positive, f^[1](A(t)) ∘ A'(t) is positive for all t. So, by (V.20), f(B) − f(A) ≥ 0.   •
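For f(t) = √t (operator monotone by Proposition V.1.8) the matrix f^[1](A) of Theorem V.3.4 has entries 1/(√λ_i + √λ_j), a Cauchy-type matrix, and its positivity can be observed directly; a sketch (my illustration, not from the text):

```python
import numpy as np

# Loewner matrix of f(t) = sqrt(t): (sqrt(a) - sqrt(b)) / (a - b) = 1/(sqrt(a) + sqrt(b)),
# which also gives the correct diagonal value f'(l) = 1/(2 sqrt(l)).
rng = np.random.default_rng(6)
lam = rng.uniform(0.1, 10.0, size=7)
s = np.sqrt(lam)
L = 1.0 / (s[:, None] + s[None, :])

print(np.linalg.eigvalsh(L).min())   # positive, up to rounding
```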
Lemma V.3.5 If f is continuous and operator monotone on (−1, 1), then for each −1 ≤ λ ≤ 1 the function g_λ(t) = (t + λ) f(t) is operator convex.

Proof. We will prove this using Theorem V.2.9. First assume that f is continuous and operator monotone on [−1, 1]. Then the function f(t − 1) is operator monotone on [0, 2). Let g(t) = t f(t − 1). Then g(0) = 0 and the function g(t)/t is operator monotone on (0, 2). Hence, by Theorem V.2.9, g(t) is operator convex on (0, 2). This implies that the function h_1(t) = g(t + 1) = (t + 1) f(t) is operator convex on [−1, 1). Instead of f(t), if the same argument is applied to the function −f(−t), which is also operator
monotone on (−1, 1], we see that the function h_2(t) = −(t + 1) f(−t) is operator convex on [−1, 1). Changing t to −t preserves convexity. So the function h_3(t) = h_2(−t) = (t − 1) f(t) is also operator convex. But for |λ| ≤ 1,

g_λ(t) = ((1 + λ)/2) h_1(t) + ((1 − λ)/2) h_3(t)

is a convex combination of h_1 and h_3. So g_λ is also operator convex.

Now, given f continuous and operator monotone on (−1, 1), the function f((1 − ε)t) is continuous and operator monotone on [−1, 1] for each ε > 0. Hence, by the special case considered above, the function (t + λ) f((1 − ε)t) is operator convex. Let ε → 0, and conclude that the function (t + λ) f(t) is operator convex.   •

The next theorem says that every operator monotone function on I is in the class C^1. Later on, we will see that it is actually in the class C^∞. (This is so even if we do not assume that it is continuous to begin with.) In the proof we make use of some differentiability properties of convex functions and smoothing techniques. For the reader's convenience, these are summarised in Appendices 1 and 2 at the end of the chapter.

Theorem V.3.6 Every operator monotone function f on I is continuously differentiable.
Proof. Let 0 < ε < 1, and let f_ε be a regularisation of f of order ε. (See Appendix 2.) Then f_ε is a C^∞ function on (−1 + ε, 1 − ε). It is also operator monotone. Let f̄(t) = lim_{ε→0} f_ε(t). Then f̄(t) = (1/2)[f(t+) + f(t−)].

Let g_ε(t) = (t + 1) f_ε(t). Then, by Lemma V.3.5, g_ε is operator convex. Let g(t) = lim_{ε→0} g_ε(t). Then g(t) is operator convex. But every convex function (on an open interval) is continuous. So g(t) is continuous. Since g(t) = (t + 1) f̄(t) and t + 1 > 0 on I, this means that f̄(t) is continuous. Hence f(t) = f̄(t). We thus have shown that f is continuous.

Let g(t) = (t + 1) f(t). Then g is a convex function on I. So g is left and right differentiable, and the one-sided derivatives satisfy the properties

g_−'(t) ≤ g_+'(t),   lim_{s↓t} g_+'(s) = g_+'(t),   lim_{s↑t} g_+'(s) = g_−'(t).   (V.21)

But g_±'(t) = f(t) + (t + 1) f_±'(t). Since t + 1 > 0, the derivatives f_±'(t) also satisfy relations like (V.21).

Now let A = (s 0; 0 t), s, t ∈ (−1, 1). If ε is sufficiently small, s and t are in (−1 + ε, 1 − ε). Since f_ε is operator monotone on this interval, by Theorem V.3.4, the matrix f_ε^[1](A) is positive. This implies that

((f_ε(s) − f_ε(t))/(s − t))^2 ≤ f_ε'(s) f_ε'(t).
1 28
V.
Operator Monotone and Operator Convex Functions
above inequality gives, in the limit , the inequality
Now let s l t, and use the fact that the derivatives of f satisfy relations like ( V.21 ) . This gives
< ! u� (t) + t� (t) ] [J� (t) + �� ( t) J , which implies that f� ( t) == f!_ ( t) . Hence f is differentiable. [ !� (t W
The relations • (V. 2 1 ) , which are satisfied by f too , show that f ' is continuous.
Just as monotonicity of functions can be studied via first divided differ ences, convexity requires second divided differences. These are defined as follows. Let f be twice continuously differentiable on the interval I._ Then j [2] is a function defined on I x I x I as follows. If A1 , A2 , A3 are distinct
! [ .1\} ' ' .1\, 2 , .1\, 3 ) 2J (
== j [ 1 l ( A l , A 2 )  j [1l ( A l , A3 ) .1\2 '  .1\' 3
•
For other values of A1 , A 2 , A 3 , j( 2] is defined by continuity; e.g. , / [2 1 ( A , A, A) == � ! ( A ) . 2 "
Show that if A1 , A 2 , A3 are distinct, then j [21 (Al , A 2 , A3 ) is the quotient of the two determinants
Exercise V.3.7 Show that if λ_1, λ_2, λ_3 are distinct, then f^[2](λ_1, λ_2, λ_3) is the quotient of the two determinants

det (f(λ_1) f(λ_2) f(λ_3) ; λ_1 λ_2 λ_3 ; 1 1 1)   and   det (λ_1^2 λ_2^2 λ_3^2 ; λ_1 λ_2 λ_3 ; 1 1 1).

Hence the function f^[2] is symmetric in its three arguments.

Exercise V.3.8 If f(t) = t^m, m = 2, 3, ..., show that

f^[2](λ_1, λ_2, λ_3) = Σ_{p+q+r=m−2, p,q,r≥0} λ_1^p λ_2^q λ_3^r.
Exercise V.3.9 (i) Let f(t) = t^m, m ≥ 2. Let A be an n × n diagonal matrix, A = Σ_{i=1}^{n} λ_i P_i, where P_i are the projections onto the coordinate axes. Show that for every H,

(d^2/dt^2)|_{t=0} f(A + tH) = 2 Σ_{p+q+r=m−2} A^p H A^q H A^r = 2 Σ_{p+q+r=m−2} Σ_{1≤i,j,k≤n} λ_i^p λ_j^q λ_k^r P_i H P_j H P_k,

and hence

(1/2)(d^2/dt^2)|_{t=0} f(A + tH) = Σ_{i,j,k} f^[2](λ_i, λ_j, λ_k) P_i H P_j H P_k.   (V.22)

(ii) Use a continuity argument, like the one used in the proof of Theorem V.3.3, to show that this last formula is valid for all C^2 functions f.

Theorem V.3.10 If f ∈ C^2(I) and f is operator convex, then for each μ ∈ I the function g(λ) = f^[1](μ, λ) is operator monotone.

Proof. Since f is in the class C^2, g is in the class C^1. So, by Theorem
V.3.4, it suffices to prove that, for each n, the n × n matrix with entries g^[1](λ_i, λ_j) is positive for all λ_1, ..., λ_n in I.

Fix n and choose any λ_1, ..., λ_{n+1} in I. Let A be the diagonal matrix with entries λ_1, ..., λ_{n+1}. Since f is operator convex and is twice differentiable, for every Hermitian matrix H the matrix (d^2/dt^2)|_{t=0} f(A + tH) must be positive. If we write P_1, ..., P_{n+1} for the projections onto the coordinate axes, we have an explicit expression for this second derivative in (V.22). Choose H to be the Hermitian matrix whose (n+1)th column is (ξ_1, ..., ξ_n, 0), whose (n+1)th row is its conjugate, and whose remaining entries are all zero, where ξ_1, ..., ξ_n are any complex numbers. Let x be the (n+1)-vector (1, 1, ..., 1, 0). Then

⟨x, P_i H P_j H P_k x⟩ = δ_{j,n+1} ξ_i ξ̄_k   (V.23)

for 1 ≤ i, j, k ≤ n + 1, where δ_{j,n+1} is equal to 1 if j = n + 1, and is equal to 0 otherwise, and we set ξ_{n+1} = 0. So, using the positivity of the matrix (V.22) and then (V.23), we have

0 ≤ Σ_{1≤i,j,k≤n+1} f^[2](λ_i, λ_j, λ_k) ⟨x, P_i H P_j H P_k x⟩ = Σ_{1≤i,k≤n} f^[2](λ_i, λ_{n+1}, λ_k) ξ_i ξ̄_k.

But

f^[2](λ_i, λ_{n+1}, λ_k) = (f^[1](λ_{n+1}, λ_i) − f^[1](λ_{n+1}, λ_k)) / (λ_i − λ_k) = g^[1](λ_i, λ_k)

(putting λ_{n+1} = μ in the definition of g). So we have

0 ≤ Σ_{1≤i,k≤n} g^[1](λ_i, λ_k) ξ_i ξ̄_k.

Since the ξ_i are arbitrary complex numbers, this is equivalent to saying that the n × n matrix [g^[1](λ_i, λ_k)] is positive.   •
Corollary V.3.11 If f ∈ C^2(I), f(0) = 0, and f is operator convex, then the function g(t) = f(t)/t is operator monotone.

Proof. By the theorem above, the function f^[1](0, t) is operator monotone. But this is just the function f(t)/t in this case.   •
Corollary V.3.12 If f is operator monotone on I and f(0) = 0, then the function g(t) = ((t + λ)/t) f(t) is operator monotone for |λ| ≤ 1.

Proof. First assume that f ∈ C^2(I). By Lemma V.3.5, the function g_λ(t) = (t + λ) f(t) is operator convex. By Corollary V.3.11, therefore, g_λ(t)/t is operator monotone. If f is not in the class C^2, consider its regularisations f_ε. These are in C^2. Apply the special case of the above paragraph to the functions f_ε(t) − f_ε(0), and then let ε → 0.   •
Corollary V.3.13 If f is operator monotone on I and f(0) = 0, then f is twice differentiable at 0.

Proof. By Corollary V.3.12, the function g(t) = (1 + 1/t) f(t) is operator monotone, and by Theorem V.3.6, it is continuously differentiable. So the function h defined as h(t) = (1/t) f(t), h(0) = f'(0), is continuously differentiable. This implies that f is twice differentiable at 0.   •

Exercise V.3.14 Let f be a continuous operator monotone function on I. Then the function F(t) = ∫_0^t f(s) ds is operator convex.
Exercise V.3.15 Let f ∈ C^1(I). Then f is operator convex if and only if for all Hermitian matrices A, B with eigenvalues in I we have

f(A) − f(B) ≥ f^[1](B) ∘ (A − B),

where ∘ denotes the Schur product in a basis in which B is diagonal.
V.4  Loewner's Theorems

Consider all functions f on the interval (−1, 1) that are operator monotone and satisfy the conditions

f(0) = 0,   f'(0) = 1.   (V.24)

Let K be the collection of all such functions. Clearly, K is a convex set. We will show that this set is compact in the topology of pointwise convergence and will find its extreme points. This will enable us to write an integral representation for functions in K.

Lemma V.4.1 If f ∈ K, then

f(t) ≤ t/(1 − t)  for 0 ≤ t < 1,
f(t) ≥ t/(1 + t)  for −1 < t ≤ 0,
|f''(0)| ≤ 2.
Proof. Let A = (t 0; 0 0). By Theorem V.3.4, the matrix

f^[1](A) = (f'(t)   f(t)/t ; f(t)/t   1)

is positive. Hence,

(f(t)/t)^2 ≤ f'(t).   (V.25)
Let g_±(t) = (t ± 1) f(t). By Lemma V.3.5, both functions g_± are convex. Hence their derivatives are monotonically increasing functions. Since g_±'(t) = f(t) + (t ± 1) f'(t) and g_±'(0) = ±1, this implies that

f(t) + (t − 1) f'(t) ≥ −1   for t ≥ 0,   (V.26)

and

f(t) + (t + 1) f'(t) ≤ 1   for t ≤ 0.   (V.27)

From (V.25) and (V.26) we obtain

f(t) + 1 ≥ (1 − t) f(t)^2 / t^2   for t ≥ 0.   (V.28)

Now suppose that for some 0 < t < 1 we have f(t) > t/(1 − t). Then f(t)(1 − t)/t > 1, and hence f(t)^2 (1 − t)/t^2 > f(t)/t. So, from (V.28), we get f(t) + 1 > f(t)/t. But this gives the inequality f(t) < t/(1 − t), which contradicts our assumption. This shows that f(t) ≤ t/(1 − t) for 0 ≤ t < 1. The second inequality of the lemma is obtained by the same argument, using (V.27) instead of (V.26).
We have seen in the proof of Corollary V.3.13 that

f'(0) + (1/2) f''(0) = lim_{t→0} [(1 + 1/t) f(t) − f'(0)] / t.

Let t ↓ 0 and use the first inequality of the lemma to conclude that this limit is smaller than 2. Let t ↑ 0, and use the second inequality to conclude that it is bigger than 0. Together, these two imply that |f''(0)| ≤ 2.   •
convergence.
in
The set K is compact
the topology of pointwise
Proof. Let { fi } be any net in K. By the lemma above, the set { fi (t ) } is bounded for each t . So, by Tychonoff ' s Theorem, there exists a subnet { fi } that converges pointwise to a bounded function f . The limit function f is operator monotone, and f (O ) == 0 . If we show that f ' (0) == 1 , we would have shown that f E K, and hence that K is compact. By Corollary V.3 . 12, each of the functions ( 1 + ! ) fi (t) is monotone 1 on (  1 , 1 ) . Since for all i , lim ( l +  ) fi (t) == !I (0) == 1 , we see that t+0 t ( 1 + l )fi (t ) > 1 if t > 0 and is < 1 if t < 0. Hence, if t > 0, we have ( 1 + ) J (t ) > 1 ; and if t < 0, we have the opposite inequality. Since f is continuously differentiable, this shows that f ' ( 0) == 1. •
!
Proposition V.4.3. All extreme points of the set K have the form

    f(t) = \frac{t}{1 - \alpha t},  where \alpha = \frac{1}{2} f''(0).

Proof. Let f \in K. For each \lambda, -1 < \lambda < 1, let

    g_\lambda(t) = (1 + \frac{\lambda}{t}) f(t) - \lambda.

By Corollary V.3.12, g_\lambda is operator monotone. Note that g_\lambda(0) = 0, since f(0) = 0 and f'(0) = 1. Also, g_\lambda'(0) = 1 + \frac{1}{2} \lambda f''(0). So the function h_\lambda defined as

    h_\lambda(t) = \frac{1}{1 + \frac{1}{2} \lambda f''(0)} \left[ (1 + \frac{\lambda}{t}) f(t) - \lambda \right]

is in K. Since |f''(0)| \le 2, we see that |\frac{1}{2} \lambda f''(0)| < 1. We can write

    f = \frac{1}{2} (1 + \frac{1}{2} \lambda f''(0)) h_\lambda + \frac{1}{2} (1 - \frac{1}{2} \lambda f''(0)) h_{-\lambda}.

So, if f is an extreme point of K, we must have f = h_\lambda. This says that

    (1 + \frac{1}{2} \lambda f''(0)) f(t) = (1 + \frac{\lambda}{t}) f(t) - \lambda,

from which we can conclude that

    f(t) = \frac{t}{1 - \frac{1}{2} f''(0) t}.  •

V.4 Loewner's Theorems

Theorem V.4.4. For each f in K there exists a unique probability measure \mu on [-1, 1] such that

    f(t) = \int_{-1}^{1} \frac{t}{1 - \lambda t} \, d\mu(\lambda).    (V.29)
Proof. For -1 < \lambda < 1, consider the functions h_\lambda(t) = \frac{t}{1 - \lambda t}. By Proposition V.4.3, the extreme points of K are included in the family \{h_\lambda\}. Since K is compact and convex, it must be the closed convex hull of its extreme points. (This is the Krein-Milman Theorem.) Finite convex combinations of elements of the family \{h_\lambda : -1 < \lambda < 1\} can also be written as \int h_\lambda \, d\nu(\lambda), where \nu is a probability measure on [-1, 1] with finite support. Since f is in the closure of these combinations, there exists a net \{\nu_i\} of finitely supported probability measures on [-1, 1] such that the net f_i(t) = \int h_\lambda(t) \, d\nu_i(\lambda) converges to f(t). Since the space of probability measures is weak* compact, the net \nu_i has an accumulation point \mu. In other words, a subnet of \int h_\lambda \, d\nu_i(\lambda) converges to \int h_\lambda \, d\mu(\lambda). So

    f(t) = \int h_\lambda(t) \, d\mu(\lambda) = \int \frac{t}{1 - \lambda t} \, d\mu(\lambda).

Now suppose that there are two measures \mu_1 and \mu_2 for which the representation (V.29) is valid. Expand the integrand as a power series

    \frac{t}{1 - \lambda t} = \sum_{n=0}^{\infty} t^{n+1} \lambda^n,

convergent uniformly in |\lambda| \le 1 for every fixed t with |t| < 1. This shows that

    \sum_{n=0}^{\infty} t^{n+1} \int_{-1}^{1} \lambda^n \, d\mu_1(\lambda) = \sum_{n=0}^{\infty} t^{n+1} \int_{-1}^{1} \lambda^n \, d\mu_2(\lambda)

for all |t| < 1. The identity theorem for power series now shows that

    \int_{-1}^{1} \lambda^n \, d\mu_1(\lambda) = \int_{-1}^{1} \lambda^n \, d\mu_2(\lambda),   n = 0, 1, 2, \ldots.

But this is possible if and only if \mu_1 = \mu_2.  •
One consequence of the uniqueness of the measure \mu in the representation (V.29) is that every function h_{\lambda_0} is an extreme point of K (because it can be represented as an integral like this with \mu concentrated at \lambda_0). The normalisations (V.24) were required to make the set K compact. They can now be removed. We have the following result.
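As an illustration of (V.29) (a numerical sketch of mine, not from the text): for f(t) = log(1+t), direct integration shows that the representing measure \mu is Lebesgue measure on [-1, 0], since \int_{-1}^{0} \frac{t}{1 - \lambda t} \, d\lambda = \log(1+t). The quadrature below checks this.

```python
import math

def rep_integral(t, n=200_000):
    # midpoint rule for the integral in (V.29) with d(mu)(lambda) = d(lambda)
    # restricted to [-1, 0] (the measure representing log(1 + t))
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        lam = -1.0 + (k + 0.5) * h
        total += t / (1.0 - lam * t)
    return total * h

for t in (0.3, 0.9, -0.4):
    assert abs(rep_integral(t) - math.log(1.0 + t)) < 1e-6
```

The grid size is an arbitrary choice; the same check works for any t in (-1, 1).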
Corollary V.4.5. Let f be a nonconstant operator monotone function on (-1, 1). Then there exists a unique probability measure \mu on [-1, 1] such that

    f(t) = f(0) + f'(0) \int_{-1}^{1} \frac{t}{1 - \lambda t} \, d\mu(\lambda).    (V.30)

Proof. Since f is monotone and is not a constant, f'(0) \ne 0. Now note that the function \frac{f(t) - f(0)}{f'(0)} is in K.  •

It is clear from the representation (V.30) that every operator monotone function on (-1, 1) is infinitely differentiable. Hence, by the results of earlier sections, every operator convex function is also infinitely differentiable.
Theorem V.4.6. Let f be a nonlinear operator convex function on (-1, 1). Then there exists a unique probability measure \mu on [-1, 1] such that

    f(t) = f(0) + f'(0) t + \frac{1}{2} f''(0) \int_{-1}^{1} \frac{t^2}{1 - \lambda t} \, d\mu(\lambda).    (V.31)

Proof. Assume, without loss of generality, that f(0) = 0 and f'(0) = 0. Let g(t) = f(t)/t. Then g is operator monotone by Corollary V.3.11, g(0) = 0, and g'(0) = \frac{1}{2} f''(0). So g has a representation like (V.30), from which the representation (V.31) for f follows.  •

We have noted that the integral representation (V.30) implies that every operator monotone function on (-1, 1) is infinitely differentiable. In fact, we can conclude more. This representation shows that f has an analytic continuation
    f(z) = f(0) + f'(0) \int_{-1}^{1} \frac{z}{1 - \lambda z} \, d\mu(\lambda)    (V.32)

defined everywhere on the complex plane except on (-\infty, -1] \cup [1, \infty). Note that

    \operatorname{Im} \frac{z}{1 - \lambda z} = \frac{\operatorname{Im} z}{|1 - \lambda z|^2}.

So f defined above maps the upper half-plane H_+ = \{z : \operatorname{Im} z > 0\} into itself. It also maps the lower half-plane H_- into itself. Further, f(\bar z) = \overline{f(z)}. In other words, the function f on H_- is an analytic continuation of f on H_+ across the interval (-1, 1) obtained by reflection. This is a very important observation, because there is a very rich theory of analytic functions in a half-plane that we can exploit now. Before doing so, let us now do away with the special interval (-1, 1). Note that a function f is operator monotone on an interval (a, b) if and only if the function
f(\frac{b-a}{2} t + \frac{b+a}{2}) is operator monotone on (-1, 1). So, all results obtained for operator monotone functions on (-1, 1) can be extended to functions on (a, b). We have proved the following.

Theorem V.4.7. If f is an operator monotone function on (a, b), then f has an analytic continuation to the upper half-plane H_+ that maps H_+ into itself. It also has an analytic continuation to the lower half-plane H_-, obtained by reflection across (a, b). The converse of this is also true: if a real function f on (a, b) has an analytic continuation to H_+ mapping H_+ into itself, then f is operator monotone on (a, b). This is proved below.
Let P be the class of all complex analytic functions defined on H_+ with their ranges in the closed upper half-plane \{z : \operatorname{Im} z \ge 0\}. This is called the class of Pick functions. Since every nonconstant analytic function is an open map, if f is a nonconstant Pick function, then the range of f is contained in H_+. It is obvious that P is a convex cone, and the composition of two nonconstant functions in P is again in P.
Exercise V.4.8

(i) For 0 < r < 1, the function f(z) = z^r is in P.
(ii) The function f(z) = \log z is in P.
(iii) The function f(z) = \tan z is in P.
(iv) The function f(z) = -\frac{1}{z} is in P.
(v) If f is in P, then so is the function -\frac{1}{f}.

Given any open interval (a, b), let P(a, b) be the class of Pick functions that admit an analytic continuation across (a, b) into the lower half-plane and the continuation is by reflection. In particular, such functions take only real values on (a, b), and if they are nonconstant, they assume real values only on (a, b). The set P(a, b) is a convex cone.

Let f \in P(a, b) and write f(z) = u(z) + iv(z), where as usual u(z) and v(z) denote the real and imaginary parts of f. Since v(x) = 0 for a < x < b, we have v(x + iy) - v(x) \ge 0 if y > 0. This implies that the partial derivative v_y(x) \ge 0 and hence, by the Cauchy-Riemann equations, u_x(x) \ge 0. Thus, on the interval (a, b), f(x) = u(x) is monotone. In fact, we will soon see that f is operator monotone on (a, b). This is a consequence of a theorem of Nevanlinna that gives an integral representation of Pick functions. We will give a proof of this now using some elementary results from Fourier analysis. The idea is to use the conformal equivalence between H_+ and the unit disk D to transfer the problem to D, and then study the real part u of f. This is a harmonic function on D, so we can use standard facts from Fourier analysis.
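The membership claims in Exercise V.4.8 can be spot-checked numerically: each candidate function should map sampled points of H_+ into the closed upper half-plane. This is only an illustrative sketch of mine (the sampling region is arbitrary), not a proof.

```python
import cmath
import random

def is_pick_on_samples(func, trials=500, seed=0):
    # sample points z with Im z > 0 and test that Im func(z) >= 0 there
    rng = random.Random(seed)
    for _ in range(trials):
        z = complex(rng.uniform(-5.0, 5.0), rng.uniform(1e-3, 5.0))
        if func(z).imag < -1e-12:
            return False
    return True

candidates = [
    lambda z: cmath.sqrt(z),     # z^r with r = 1/2 (principal branch)
    cmath.log,                   # principal logarithm
    cmath.tan,
    lambda z: -1.0 / z,
]
assert all(is_pick_on_samples(f) for f in candidates)
```

Note that `cmath` uses exactly the principal branches assumed in the text, with branch cuts on the negative real axis.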
Theorem V.4.9. Let u be a nonnegative harmonic function on the unit disk D = \{z : |z| < 1\}. Then there exists a finite measure m on [0, 2\pi] such that

    u(re^{i\theta}) = \int_0^{2\pi} \frac{1 - r^2}{1 + r^2 - 2r \cos(\theta - t)} \, dm(t).    (V.33)

Conversely, any function of this form is positive and harmonic on the unit disk D.

Proof. Let u be any continuous real function defined on the closed unit disk that is harmonic in D. Then, by a well-known and elementary theorem in analysis,

    u(re^{i\theta}) = \frac{1}{2\pi} \int_0^{2\pi} \frac{1 - r^2}{1 + r^2 - 2r \cos(\theta - t)} u(e^{it}) \, dt
                    = \frac{1}{2\pi} \int_0^{2\pi} P_r(\theta - t) u(e^{it}) \, dt,    (V.34)

where P_r(\theta) is the Poisson kernel (defined by the above equation) for 0 \le r < 1, 0 \le \theta \le 2\pi. If u is nonnegative, put dm(t) = \frac{1}{2\pi} u(e^{it}) \, dt. Then m is a positive measure on [0, 2\pi]. By the mean value property of harmonic functions, the total mass of this measure is

    \frac{1}{2\pi} \int_0^{2\pi} u(e^{it}) \, dt = u(0).    (V.35)

So we do have a representation of the form (V.33) under the additional hypothesis that u is continuous on the closed unit disk. The general case is a consequence of this. Let u be positive and harmonic in D. Then, for \varepsilon > 0, the function u_\varepsilon(z) = u(\frac{z}{1+\varepsilon}) is positive and harmonic in the disk |z| < 1 + \varepsilon. Therefore, it can be represented in the form (V.33) with a measure m_\varepsilon(t) of finite total mass u_\varepsilon(0) = u(0). As \varepsilon \to 0, u_\varepsilon converges to u uniformly on compact subsets of D. Since the measures m_\varepsilon all have the same mass, using the weak* compactness of the space of probability measures, we conclude that there exists a positive measure m such that

    u(re^{i\theta}) = \lim_{\varepsilon \to 0} u_\varepsilon(re^{i\theta}) = \int_0^{2\pi} \frac{1 - r^2}{1 + r^2 - 2r \cos(\theta - t)} \, dm(t).

Conversely, since the Poisson kernel P_r is nonnegative, any function represented by (V.33) is nonnegative.  •
Theorem V.4.9 is often called the Herglotz Theorem. It says that every nonnegative harmonic function on the unit disk is the Poisson integral of a positive measure.
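The Poisson representation (V.34) is easy to test numerically. In the sketch below (my illustration; the test function and grid are arbitrary choices), u(z) = Re z^2 + 2 is harmonic and positive on the closed disk, and its boundary values reproduce it through the Poisson kernel; at r = 0 this also checks the mean value property (V.35).

```python
import math

def u(r, theta):
    # harmonic on the plane: the real part of z^2, shifted to be positive
    return r * r * math.cos(2.0 * theta) + 2.0

def poisson_integral(r, theta, n=20_000):
    # (1/2pi) * integral of P_r(theta - t) * u(e^{it}) dt, midpoint rule;
    # the periodic trapezoid/midpoint rule converges very fast here
    total = 0.0
    for k in range(n):
        t = 2.0 * math.pi * (k + 0.5) / n
        kernel = (1.0 - r * r) / (1.0 + r * r - 2.0 * r * math.cos(theta - t))
        total += kernel * u(1.0, t)
    return total / n

for (r, theta) in [(0.0, 0.0), (0.5, 1.0), (0.9, 2.5)]:
    assert abs(poisson_integral(r, theta) - u(r, theta)) < 1e-8
```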
Recall that two harmonic functions u, v are called harmonic conjugates if the function f(z) = u(z) + iv(z) is analytic. Every harmonic function u has a harmonic conjugate that is uniquely determined up to an additive constant.

Theorem V.4.10. Let f(z) = u(z) + iv(z) be analytic on the unit disk D. If u(z) \ge 0, then there exists a finite positive measure m on [0, 2\pi] such that

    f(z) = \int_0^{2\pi} \frac{e^{it} + z}{e^{it} - z} \, dm(t) + iv(0).    (V.36)
Conversely, every function of this form is analytic on D and has a positive real part.

Proof. By Theorem V.4.9, the function u can be written as in (V.33). The Poisson kernel P_r, 0 \le r < 1, can be written as

    P_r(\theta) = \frac{1 - r^2}{1 + r^2 - 2r \cos\theta} = \operatorname{Re} \frac{1 + re^{i\theta}}{1 - re^{i\theta}} = \operatorname{Re} \sum_{n=-\infty}^{\infty} r^{|n|} e^{in\theta}.

Hence,

    P_r(\theta - t) = \operatorname{Re} \frac{e^{it} + re^{i\theta}}{e^{it} - re^{i\theta}},

and

    u(z) = \operatorname{Re} \int_0^{2\pi} \frac{e^{it} + z}{e^{it} - z} \, dm(t).

So, f(z) differs from this last integral only by an imaginary constant. Putting z = 0, one sees that this constant is iv(0). The converse statement is easy to prove.  •
Next, note that the disk D and the half-plane H_+ are conformally equivalent, i.e., there exists an analytic isomorphism between these two spaces. For z \in D, let

    \zeta(z) = i \, \frac{1 + z}{1 - z}.    (V.37)

Then \zeta \in H_+. The inverse of this map is given by

    z(\zeta) = \frac{\zeta - i}{\zeta + i}.    (V.38)

Using these transformations, we can establish an equivalence between the class P and the class of analytic functions on D with positive real part. If f is a function in the latter class, let

    \varphi(\zeta) = i f(z(\zeta)).    (V.39)
Then \varphi \in P. The inverse of this transformation is

    f(z) = -i \varphi(\zeta(z)).    (V.40)

Using these ideas we can prove the following theorem, called Nevanlinna's Theorem.
Theorem V.4.11. A function \varphi is in the Pick class if and only if it has a representation

    \varphi(\zeta) = \alpha + \beta\zeta + \int_{-\infty}^{\infty} \frac{1 + \lambda\zeta}{\lambda - \zeta} \, d\nu(\lambda),    (V.41)

where \alpha is a real number, \beta \ge 0, and \nu is a positive finite measure on the real line.
Proof. Let f be the function on D associated with \varphi via the transformation (V.40). By Theorem V.4.10, there exists a finite positive measure m on [0, 2\pi] such that

    f(z) = \int_0^{2\pi} \frac{e^{it} + z}{e^{it} - z} \, dm(t) - i\alpha.

If f(z) = u(z) + iv(z), then \alpha = -v(0), and the total mass of m is u(0). If the measure m has a positive mass at the singleton \{0\}, let this mass be \beta. Then the expression above reduces to

    f(z) = \int_{(0, 2\pi)} \frac{e^{it} + z}{e^{it} - z} \, dm(t) + \beta \, \frac{1 + z}{1 - z} - i\alpha.

Using the transformations (V.38) and (V.39), we get from this

    \varphi(\zeta) = \alpha + \beta\zeta + i \int_{(0, 2\pi)} \frac{e^{it}(\zeta + i) + (\zeta - i)}{e^{it}(\zeta + i) - (\zeta - i)} \, dm(t).

The last term above is equal to

    \int_{(0, 2\pi)} \frac{\zeta \cos\frac{t}{2} - \sin\frac{t}{2}}{\zeta \sin\frac{t}{2} + \cos\frac{t}{2}} \, dm(t).

Now, introduce a change of variables \lambda = -\cot\frac{t}{2}. This maps (0, 2\pi) onto (-\infty, \infty). The measure m is transformed by the above map to a finite measure \nu on (-\infty, \infty), and the above integral is transformed to

    \int_{-\infty}^{\infty} \frac{1 + \lambda\zeta}{\lambda - \zeta} \, d\nu(\lambda).

This shows that \varphi can be represented in the form (V.41). It is easy to see that every function of this form is a Pick function.  •
There is another form in which it is convenient to represent Pick functions. Note that

    \frac{1 + \lambda\zeta}{\lambda - \zeta} = (\lambda^2 + 1) \left[ \frac{1}{\lambda - \zeta} - \frac{\lambda}{\lambda^2 + 1} \right].

So, if we write d\mu(\lambda) = (\lambda^2 + 1) \, d\nu(\lambda), then we obtain from (V.41) the representation

    \varphi(\zeta) = \alpha + \beta\zeta + \int_{-\infty}^{\infty} \left[ \frac{1}{\lambda - \zeta} - \frac{\lambda}{\lambda^2 + 1} \right] d\mu(\lambda),    (V.42)

where \mu is a positive Borel measure on \mathbb{R} for which \int \frac{1}{\lambda^2 + 1} \, d\mu(\lambda) is finite. (A Borel measure on \mathbb{R} is a measure defined on Borel sets that puts finite mass on bounded sets.)

Now we turn to the question of uniqueness of the above representations. It is easy to see from (V.41) that

    \alpha = \operatorname{Re} \varphi(i).    (V.43)

Therefore, \alpha is uniquely determined by \varphi. Now let \eta be any positive real number. From (V.41) we see that

    \frac{\varphi(i\eta)}{i\eta} = \frac{\alpha}{i\eta} + \beta + \int_{-\infty}^{\infty} \frac{1 + i\lambda\eta}{i\eta(\lambda - i\eta)} \, d\nu(\lambda).
As \eta \to \infty, the integrand converges to 0 for each \lambda. The real and imaginary parts of the integrand are uniformly bounded by 1 when \eta \ge 1. So by the Lebesgue Dominated Convergence Theorem, the integral converges to 0 as \eta \to \infty. Thus,

    \beta = \lim_{\eta \to \infty} \varphi(i\eta)/(i\eta),    (V.44)

and thus \beta is uniquely determined by \varphi. Now we will prove that the measure \mu in (V.42) is uniquely determined by \varphi. Denote by \mu also the unique right continuous monotonically increasing function on \mathbb{R} satisfying \mu(0) = 0 and \mu((a, b]) = \mu(b) - \mu(a) for every interval (a, b]. (This is called the distribution function associated with d\mu.) We will prove the following result, called the Stieltjes inversion formula, from which it follows that \mu is unique.
Theorem V.4.12. If the Pick function \varphi is represented by (V.42), then for any a, b that are points of continuity of the distribution function \mu, we have

    \mu(b) - \mu(a) = \lim_{\eta \to 0} \frac{1}{\pi} \int_a^b \operatorname{Im} \varphi(x + i\eta) \, dx.    (V.45)
Proof. From (V.42) we see that

    \frac{1}{\pi} \int_a^b \operatorname{Im} \varphi(x + i\eta) \, dx = \frac{1}{\pi} \int_a^b \left[ \beta\eta + \int_{-\infty}^{\infty} \frac{\eta}{(\lambda - x)^2 + \eta^2} \, d\mu(\lambda) \right] dx,

the interchange of integrals being permissible by Fubini's Theorem. As \eta \to 0, the first term in the square brackets above goes to 0. The inner integral can be calculated by the change of variables u = \frac{x - \lambda}{\eta}. This gives

    \int_a^b \frac{\eta \, dx}{(x - \lambda)^2 + \eta^2} = \int_{(a-\lambda)/\eta}^{(b-\lambda)/\eta} \frac{du}{u^2 + 1} = \arctan\frac{b - \lambda}{\eta} - \arctan\frac{a - \lambda}{\eta}.

So to prove (V.45), we have to show that

    \mu(b) - \mu(a) = \lim_{\eta \to 0} \frac{1}{\pi} \int_{-\infty}^{\infty} \left[ \arctan\frac{b - \lambda}{\eta} - \arctan\frac{a - \lambda}{\eta} \right] d\mu(\lambda).
We will use the following properties of the function arctan. This is a monotonically increasing odd function on (-\infty, \infty) whose range is (-\frac{\pi}{2}, \frac{\pi}{2}). So,

    0 \le \arctan\frac{b - \lambda}{\eta} - \arctan\frac{a - \lambda}{\eta} < \pi.

If (b - \lambda) and (a - \lambda) have the same sign, then by the addition law for arctan we have

    \arctan\frac{b - \lambda}{\eta} - \arctan\frac{a - \lambda}{\eta} = \arctan\frac{\eta(b - a)}{\eta^2 + (b - \lambda)(a - \lambda)}.

If x is positive, then

    \arctan x = \int_0^x \frac{dt}{1 + t^2} \le \int_0^x dt = x.

Now, let \varepsilon be any given positive number. Since a and b are points of continuity of \mu, we can choose \delta such that

    \mu(a + \delta) - \mu(a - \delta) < \varepsilon/5,
    \mu(b + \delta) - \mu(b - \delta) < \varepsilon/5.
We then have

    \left| \mu(b) - \mu(a) - \frac{1}{\pi} \int_{-\infty}^{\infty} \left[ \arctan\frac{b-\lambda}{\eta} - \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda) \right|

    \le \frac{1}{\pi} \int_a^b \left[ \pi - \arctan\frac{b-\lambda}{\eta} + \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda)
      + \frac{1}{\pi} \int_{-\infty}^a \left[ \arctan\frac{b-\lambda}{\eta} - \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda)
      + \frac{1}{\pi} \int_b^{\infty} \left[ \arctan\frac{b-\lambda}{\eta} - \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda)

    \le \frac{2\varepsilon}{5}
      + \frac{1}{\pi} \int_{-\infty}^{a-\delta} \arctan\frac{\eta(b-a)}{\eta^2 + (b-\lambda)(a-\lambda)} \, d\mu(\lambda)
      + \frac{1}{\pi} \int_{b+\delta}^{\infty} \arctan\frac{\eta(b-a)}{\eta^2 + (b-\lambda)(a-\lambda)} \, d\mu(\lambda)
      + \frac{1}{\pi} \int_{a+\delta}^{b-\delta} \left[ \pi - \arctan\frac{b-\lambda}{\eta} + \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda).

Note that in the two integrals with infinite limits, the arguments of arctan are positive. In the middle integral the variable \lambda runs between a + \delta and b - \delta. For such \lambda, \frac{b-\lambda}{\eta} \ge \frac{\delta}{\eta} and \frac{a-\lambda}{\eta} \le -\frac{\delta}{\eta}. So the right-hand side of the above inequality is dominated by

    \frac{2\varepsilon}{5}
      + \frac{\eta(b-a)}{\pi} \int_{-\infty}^{a-\delta} \frac{d\mu(\lambda)}{\eta^2 + (b-\lambda)(a-\lambda)}
      + \frac{\eta(b-a)}{\pi} \int_{b+\delta}^{\infty} \frac{d\mu(\lambda)}{\eta^2 + (b-\lambda)(a-\lambda)}
      + \frac{1}{\pi} \int_{a+\delta}^{b-\delta} \left[ \pi - 2\arctan\frac{\delta}{\eta} \right] d\mu(\lambda).

The first two integrals are finite (because of the properties of d\mu). The third term is dominated by 2(\frac{\pi}{2} - \arctan\frac{\delta}{\eta})[\mu(b) - \mu(a)]. So we can choose \eta small enough to make each of the last three terms smaller than \varepsilon/5. This proves the theorem.  •
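A numerical sketch of the inversion formula (V.45) (my own illustration, not from the text): take \varphi(\zeta) = Log \zeta, the principal branch. For x < 0, Im Log(x + i\eta) tends to \pi as \eta \downarrow 0, so the associated measure \mu has density 1 on (-\infty, 0) and \mu(b) - \mu(a) = b - a for a < b < 0.

```python
import cmath
import math

def stieltjes_inversion(a, b, eta, n=20_000):
    # (1/pi) * integral over [a, b] of Im(Log(x + i*eta)) dx, midpoint rule
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += cmath.log(complex(x, eta)).imag
    return total * h / math.pi

# mu(b) - mu(a) should be b - a for negative a < b
assert abs(stieltjes_inversion(-2.0, -1.0, eta=1e-4) - 1.0) < 1e-3
```

The cutoff eta and grid size are arbitrary; shrinking eta further sharpens the agreement, as the theorem predicts.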
We have shown above that all the terms occurring in the representation (V.42) are uniquely determined by the relations (V.43), (V.44), and (V.45).

Exercise V.4.13. We have proved the relations (V.33), (V.36), (V.41), and (V.42) in that order. Show that all these are, in fact, equivalent. Hence, each of these representations is unique.
Proposition V.4.14. A Pick function \varphi is in the class P(a, b) if and only if the measure \mu associated with it in the representation (V.42) has zero mass on (a, b).

Proof. Let \varphi(x + i\eta) = u(x + i\eta) + iv(x + i\eta), where u, v are the real and imaginary parts of \varphi. If \varphi can be continued across (a, b), then as \eta \downarrow 0, on any closed subinterval [c, d] of (a, b), v(x + i\eta) converges uniformly to a bounded continuous function v(x) on [c, d]. Hence,

    \mu(d) - \mu(c) = \frac{1}{\pi} \int_c^d v(x) \, dx,

i.e., d\mu(x) = \frac{1}{\pi} v(x) \, dx. If the analytic continuation to the lower half-plane is by reflection across (a, b), then v is identically zero on [c, d], and hence so is \mu.

Conversely, if \mu has no mass on (a, b), then for \zeta in (a, b) the integral in (V.42) is convergent, and is real valued. This shows that the function \varphi can be continued from H_+ to H_- across (a, b) by reflection.  •

The reader should note that the above proposition shows that the converse of Theorem V.4.7 is also true. It should be pointed out that the formula (V.42) defines two analytic functions, one on H_+ and the other on H_-. If these are denoted by \varphi and \psi, then \varphi(\bar\zeta) = \overline{\psi(\zeta)}. So \varphi and \psi are reflections of each other. But they need not be analytic continuations of each other. For this to be the case, the measure \mu should be zero on an interval (a, b) across which the function can be continued analytically.
Exercise V.4.15. If a function f is operator monotone on the whole real line, then f must be of the form f(t) = \alpha + \beta t, \alpha \in \mathbb{R}, \beta \ge 0.
Let us now look at a few simple examples.
Example V.4.16. The function \varphi(\zeta) = -\frac{1}{\zeta} is a Pick function. For this function, we see from (V.43) and (V.44) that \alpha = \beta = 0. Since \varphi is analytic everywhere in the plane except at 0, Proposition V.4.14 tells us that the measure \mu is concentrated at the single point 0.
Example V.4. 1 7
cp( i ) = Re e in:/4 = �From (V. 44) we see that {3 == 0 . If ( == A + iry is any complex number, then 1( I 2 = ( 1 ( 1 + A ) 112 + . sgn (1 ( 1  A ) 1 12 , 2 2 where sgn is the sign of defined to be 1 if > 0 and if < 0 . 1 12 = ( > 0 , we have Im cp( ) ('<1; >.. ) . As ! 0 , 1 ( 1 comes closer to So if j A I. So, Im cp(A + iry ) converges to 0 if A > 0 and to IAI1/ 2 if A < 0. Since a = Re
TJ
t
TJ
rJ ,
rJ
1
rJ
TJ
cp
is positive on the right halfaxis, the measure J.L has no mass at measure can now be determined from {V. 4 5). We have, then
0(
rJ
0.
The
1 / 2 A ) IAI (V.46) dA. + J A  ( A2 + 1 y'2 CXJ Example V .4. 18 Let cp( ( ) == Log (, where Log is the principal branch of the logarithm, defined everywhere except on ( 0] by the formula Log ( == l n l (I + i Arg ( . The function Arg ( is the principal branch of ( 1 / 2 ==
1 _
1

1r

 oo ,
the argument, taking values in (  1r, 1r] . We then have Re(Log i) == 0 a iry Lo ( ) == 0. � lim {3 T] +
CXJ
'l rJ
As rJ l 0, Im (Log( A + i'T])) converges to 1r if A < 0 and to 0 if A > 0. So from (V. 4 5) we see that, the measure J.L is just the restriction of the Lebesgue measure to (oo, 0] . Thus, (V.47) Exercise V.4. 19 For 0 function cp( ( ) == (r . Show
r<
1,
let (r denote the principal branch of the
0(
(r == cos r21f + sin r7r J A  (  A2 A+ 1 ) 1Ai r d.A. CXJ 1r
This includes
(V. 4 6) as a

special case.
1
(V .48 )
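The representation (V.47) can be verified by quadrature. In the sketch below (an illustration; the substitution and grid size are my choices), the change of variables \lambda = -u/(1-u) maps (0, 1) onto (-\infty, 0) and makes the transformed integrand smooth and bounded.

```python
import cmath

def log_via_V47(zeta, n=200_000):
    # midpoint rule for (V.47) after the substitution lambda = -u/(1-u),
    # with Jacobian d(lambda)/du = -1/(1-u)**2 (sign absorbed by the limits)
    h = 1.0 / n
    total = 0.0 + 0.0j
    for k in range(n):
        u = (k + 0.5) * h
        lam = -u / (1.0 - u)
        jac = 1.0 / (1.0 - u) ** 2
        total += (1.0 / (lam - zeta) - lam / (lam * lam + 1.0)) * jac
    return total * h

for zeta in (2.0, 1.0 + 1.0j):
    assert abs(log_via_V47(zeta) - cmath.log(zeta)) < 1e-4
```

The same scheme, with the extra weight |lambda|**r, would check (V.48).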
Let now f be any operator monotone function on (0, \infty). We have seen above that f must have the form

    f(t) = \alpha + \beta t + \int_{-\infty}^{0} \left[ \frac{1}{\lambda - t} - \frac{\lambda}{\lambda^2 + 1} \right] d\mu(\lambda).

By a change of variables we can write this as

    f(t) = \alpha + \beta t + \int_0^{\infty} \left[ \frac{\lambda}{\lambda^2 + 1} - \frac{1}{\lambda + t} \right] d\mu(\lambda),    (V.49)

where \alpha \in \mathbb{R}, \beta \ge 0, and \mu is a positive measure on (0, \infty) such that

    \int_0^{\infty} \frac{d\mu(\lambda)}{\lambda^2 + 1} < \infty.    (V.50)

Suppose f is such that

    f(0) := \lim_{t \downarrow 0} f(t) > -\infty.    (V.51)

Then, it follows from (V.49) that \mu must also satisfy the condition

    \int_0^1 \frac{1}{\lambda} \, d\mu(\lambda) < \infty.    (V.52)

We have from (V.49)

    f(t) - f(0) = \beta t + \int_0^{\infty} \left[ \frac{1}{\lambda} - \frac{1}{\lambda + t} \right] d\mu(\lambda)
                = \beta t + \int_0^{\infty} \frac{\lambda t}{\lambda + t} \cdot \frac{1}{\lambda^2} \, d\mu(\lambda).

Hence, we can write f in the form

    f(t) = \gamma + \beta t + \int_0^{\infty} \frac{\lambda t}{\lambda + t} \, d\omega(\lambda),    (V.53)

where \gamma = f(0) and d\omega(\lambda) = \frac{1}{\lambda^2} \, d\mu(\lambda). From (V.50) and (V.52), we see that the measure \omega satisfies the conditions

    \int_0^1 \lambda \, d\omega(\lambda) < \infty   and   \int_1^{\infty} d\omega(\lambda) < \infty.    (V.54)

These two conditions can, equivalently, be expressed as a single condition

    \int_0^{\infty} \frac{\lambda}{1 + \lambda} \, d\omega(\lambda) < \infty.    (V.55)
We have thus shown that an operator monotone function on (0, \infty) satisfying the condition (V.51) has a canonical representation (V.53), where \gamma \in \mathbb{R}, \beta \ge 0, and \omega is a positive measure satisfying (V.55).

The representation (V.53) is often useful for studying operator monotone functions on the positive half-line (0, \infty). Suppose that we are given a function f as in (V.53). If \omega satisfies the conditions (V.54), then

    \int_0^{\infty} \left[ \frac{1}{\lambda} - \frac{\lambda}{\lambda^2 + 1} \right] \lambda^2 \, d\omega(\lambda) < \infty,

and we can write

    f(t) = \left\{ \gamma + \int_0^{\infty} \left[ \frac{1}{\lambda} - \frac{\lambda}{\lambda^2 + 1} \right] \lambda^2 \, d\omega(\lambda) \right\} + \beta t + \int_0^{\infty} \left[ \frac{\lambda}{\lambda^2 + 1} - \frac{1}{\lambda + t} \right] \lambda^2 \, d\omega(\lambda).

So, if we put the number in braces above equal to \alpha, and d\mu(\lambda) = \lambda^2 \, d\omega(\lambda), then we have a representation of f in the form (V.49).
Exercise V.4.20. Use the considerations in the preceding paragraphs to show that, for 0 < r < 1 and t > 0, we have

    t^r = \frac{\sin r\pi}{\pi} \int_0^{\infty} \frac{\lambda^{r-1} t}{\lambda + t} \, d\lambda.    (V.56)

(See Exercise V.1.10 also.)

Exercise V.4.21. For t > 0, show that

    \log(1 + t) = \int_1^{\infty} \frac{\lambda t}{\lambda + t} \cdot \frac{1}{\lambda^2} \, d\lambda.    (V.57)
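Both exercises can be spot-checked numerically; the sketch below is my own illustration, and the substitutions are arbitrary choices. For (V.56) with r = 1/2, putting \lambda = (u/(1-u))^2 turns the integrand into the smooth function 2t/(u^2 + t(1-u)^2) on (0, 1); for (V.57), \lambda = 1/u gives the smooth integrand t/(1 + ut).

```python
import math

def rhs_V56_half(t, n=200_000):
    # (sin(pi/2)/pi) times the integral in (V.56) with r = 1/2;
    # after lambda = (u/(1-u))**2 the integrand is 2*t/(u**2 + t*(1-u)**2)
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += 2.0 * t / (u * u + t * (1.0 - u) ** 2)
    return total * h / math.pi

def rhs_V57(t, n=200_000):
    # lambda = 1/u maps (0, 1] onto [1, inf); the integrand becomes t/(1 + u*t)
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += t / (1.0 + u * t)
    return total * h

for t in (0.5, 2.0):
    assert abs(rhs_V56_half(t) - math.sqrt(t)) < 1e-6
    assert abs(rhs_V57(t) - math.log(1.0 + t)) < 1e-9
```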
Appendix 1. Differentiability of Convex Functions

Let f be a real valued convex function defined on an interval I. Then f has some smoothness properties, which are listed below.

The function f is Lipschitz on any closed interval [a, b] contained in I^0, the interior of I. So f is continuous on I^0.
At every point x in I^0, the right and left derivatives of f exist. These are defined, respectively, as

    f'_+(x) := \lim_{y \downarrow x} \frac{f(y) - f(x)}{y - x},

    f'_-(x) := \lim_{y \uparrow x} \frac{f(y) - f(x)}{y - x}.

Both these functions are monotonically increasing on I^0. Further,

    \lim_{x \downarrow w} f'_+(x) = f'_+(w),   \lim_{x \uparrow w} f'_+(x) = f'_-(w).

The function f is differentiable except on a countable set E in I^0, i.e., at every point x in I^0 \setminus E the left and right derivatives of f are equal. Further, the derivative f' is continuous on I^0 \setminus E.

If a sequence of convex functions converges at every point of I, then the limit function is convex. The convergence is uniform on any closed interval [a, b] contained in I^0.

Appendix 2. Regularisation of Functions

The convolution of two functions leads to a new function that inherits the stronger of the smoothness properties of the two original functions. This is the idea behind "regularisation" of functions.

Let \varphi be a real function of class C^\infty with the following properties:

    \varphi \ge 0,   \operatorname{supp} \varphi = [-1, 1],   \int \varphi(x) \, dx = 1.

For \varepsilon > 0, let \varphi_\varepsilon(x) = \frac{1}{\varepsilon} \varphi(\frac{x}{\varepsilon}). Then \operatorname{supp} \varphi_\varepsilon = [-\varepsilon, \varepsilon] and \varphi_\varepsilon has all the other properties of \varphi listed above. The functions \varphi_\varepsilon are called mollifiers or smooth approximate identities. If f is a locally integrable function, we define its regularisation of order \varepsilon as the function

    f_\varepsilon(x) = (f * \varphi_\varepsilon)(x) = \int f(x - y) \varphi_\varepsilon(y) \, dy = \int f(x - \varepsilon t) \varphi(t) \, dt.
The family f_\varepsilon has the following properties.

1. Each f_\varepsilon is a C^\infty function.

2. If the support of f is contained in a compact set K, then the support of f_\varepsilon is contained in an \varepsilon-neighbourhood of K.

3. If f is continuous at x_0, then \lim_{\varepsilon \downarrow 0} f_\varepsilon(x_0) = f(x_0).

4. If f has a discontinuity of the first kind at x_0, then \lim_{\varepsilon \downarrow 0} f_\varepsilon(x_0) = \frac{1}{2}[f(x_0+) + f(x_0-)]. (A point x_0 is a point of discontinuity of the first kind if the left and right limits of f at x_0 exist; these limits are denoted as f(x_0-) and f(x_0+), respectively.)

5. If f is continuous, then f_\varepsilon(x) converges to f(x) as \varepsilon \to 0. The convergence is uniform on every compact set.

6. If f is differentiable, then, for every \varepsilon > 0, (f_\varepsilon)' = (f')_\varepsilon.

7. If f is monotone, then, as \varepsilon \to 0, f'_\varepsilon(x) converges to f'(x) at all points x where f'(x) exists. (Recall that a monotone function can have discontinuities of the first kind only and is differentiable almost everywhere.)
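The sketch below (an illustration with my own choices of \varphi and f) builds the standard bump mollifier, normalises it numerically, and checks properties 3 and 4 on a unit step function.

```python
import math

def bump(x):
    # the standard C-infinity bump supported on [-1, 1], not yet normalised
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

_N = 20_000
_H = 2.0 / _N
_NODES = [-1.0 + (k + 0.5) * _H for k in range(_N)]
_Z = sum(bump(t) for t in _NODES) * _H       # normalising constant

def step(x):
    # unit step: a discontinuity of the first kind at 0
    return 1.0 if x > 0 else 0.0

def regularise(f, x, eps):
    # f_eps(x) = integral of f(x - eps*t) * phi(t) dt, with phi = bump/_Z
    total = sum(f(x - eps * t) * bump(t) for t in _NODES) * _H
    return total / _Z

assert abs(regularise(step, 1.0, 0.1) - 1.0) < 1e-12   # property 3 (continuity point)
assert abs(regularise(step, 0.0, 0.1) - 0.5) < 1e-6    # property 4 (jump midpoint)
```

The midpoint value 1/2 at the jump falls out of the symmetry of the bump, exactly as property 4 predicts.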
V.5 Problems
Problem V.5.1. Show that the function f(t) = \exp t is neither operator monotone nor operator convex on any interval.

Problem V.5.2. Let f(t) = \frac{at + b}{ct + d}, where a, b, c, d are real numbers such that ad - bc > 0. Show that f is operator monotone on every interval that does not contain the point -\frac{d}{c}.
Problem V.5.3. Show that the derivative of an operator convex function need not be operator monotone.

Problem V.5.4. Show that for r < -1, the function f(t) = t^r on (0, \infty) is not operator convex. (Hint: The function f^{[1]}(1, t) cannot be continued analytically to a Pick function.) Together with the assertion in Exercise V.2.11, this shows that on the half-line (0, \infty) the function f(t) = t^r is operator convex if -1 \le r \le 0 or if 1 \le r \le 2; and it is not operator convex for any other real r.
Problem V.5.5. A function g on [0, \infty) is operator convex if and only if it is of the form

    g(t) = \alpha + \beta t + \gamma t^2 + \int_0^{\infty} \frac{\lambda t^2}{\lambda + t} \, d\mu(\lambda),

where \alpha, \beta are real numbers, \gamma \ge 0, and \mu is a positive finite measure.
Problem V.5.6. Let f be an operator monotone function on (0, \infty). Then (-1)^{n-1} f^{(n)}(t) \ge 0 for n = 1, 2, \ldots. [A function g on (0, \infty) is said to be completely monotone if for all n \ge 0, (-1)^n g^{(n)}(t) \ge 0. There is a theorem of S.N. Bernstein that says that a function g is completely monotone if and only if there exists a positive measure \mu such that

    g(t) = \int_0^{\infty} e^{-\lambda t} \, d\mu(\lambda).]

The result of this problem says that the derivative of an operator monotone function on (0, \infty) is completely monotone. Thus, f has a Taylor expansion f(t) = \sum_{n=0}^{\infty} a_n (t - 1)^n, in which the coefficients a_n are positive for all odd n and negative for all even n \ge 2.
Problem V.5.7. Let f be a function mapping (0, \infty) into itself. Let g(t) = [f(t^{-1})]^{-1}. Show that if f is operator monotone, then g is also operator monotone. If f is operator convex and f(0) = 0, then g is operator convex.

Problem V.5.8. Show that the function \varphi(\zeta) = -\cot\zeta is a Pick function. Show that in its canonical representation (V.42), \alpha = \beta = 0 and the measure \mu is atomic with mass 1 at the points n\pi for every integer n. Thus, we have the familiar series expansion

    -\cot\zeta = \sum_{n=-\infty}^{\infty} \left[ \frac{1}{n\pi - \zeta} - \frac{n\pi}{n^2\pi^2 + 1} \right].
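The series in Problem V.5.8 converges absolutely once the terms for n and -n are paired, since each pair is O(1/n^2). The sketch below (the cutoff N and test point are my choices) compares a partial sum with -cot \zeta.

```python
import cmath
import math

def neg_cot_partial(zeta, N=100_000):
    # partial sum of the expansion in Problem V.5.8, pairing the terms n and -n
    total = -1.0 / zeta          # the n = 0 term
    for n in range(1, N + 1):
        for m in (n, -n):
            total += 1.0 / (m * math.pi - zeta) \
                     - (m * math.pi) / (m * m * math.pi * math.pi + 1.0)
    return total

zeta = 0.5 + 0.5j
exact = -cmath.cos(zeta) / cmath.sin(zeta)
assert abs(neg_cot_partial(zeta) - exact) < 1e-3
```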
Problem V.5.9. The aim of this problem is to show that if a Pick function \varphi satisfies the growth restriction

    \sup_{\eta > 0} |\eta \varphi(i\eta)| < \infty,    (V.58)

then its representation (V.42) takes the simple form

    \varphi(\zeta) = \int_{-\infty}^{\infty} \frac{1}{\lambda - \zeta} \, d\mu(\lambda),    (V.59)

where \mu is a finite measure. To see this, start with the representation (V.41). The condition (V.58) implies the existence of a constant M that bounds, for all \eta > 0, the quantity |\eta \varphi(i\eta)|, and hence also its real and imaginary parts. This gives two inequalities:

    \left| \alpha\eta + \int_{-\infty}^{\infty} \frac{\eta(1 - \eta^2)\lambda}{\lambda^2 + \eta^2} \, d\nu(\lambda) \right| \le M,

    \left| \beta\eta^2 + \int_{-\infty}^{\infty} \frac{\eta^2(1 + \lambda^2)}{\lambda^2 + \eta^2} \, d\nu(\lambda) \right| \le M.

From the first, conclude that

    \alpha = \lim_{\eta \to \infty} \int_{-\infty}^{\infty} \frac{(\eta^2 - 1)\lambda}{\lambda^2 + \eta^2} \, d\nu(\lambda) = \int_{-\infty}^{\infty} \lambda \, d\nu(\lambda).

From the second, conclude that \beta = 0 and

    \int_{-\infty}^{\infty} \frac{\eta^2(1 + \lambda^2)}{\lambda^2 + \eta^2} \, d\nu(\lambda) \le M.

Taking limits as \eta \to \infty, this gives

    \int_{-\infty}^{\infty} (1 + \lambda^2) \, d\nu(\lambda) = \int_{-\infty}^{\infty} d\mu(\lambda) \le M.

Thus, \mu is a finite measure. From (V.41), we get

    \varphi(\zeta) = \int_{-\infty}^{\infty} \lambda \, d\nu(\lambda) + \int_{-\infty}^{\infty} \frac{1 + \lambda\zeta}{\lambda - \zeta} \, d\nu(\lambda) = \int_{-\infty}^{\infty} \frac{\lambda^2 + 1}{\lambda - \zeta} \, d\nu(\lambda).

This is the same as (V.59). Conversely, observe that if \varphi has a representation like (V.59), then it must satisfy the condition (V.58).
I .A � t dJ.L (A) , CXJ
f (t ) == Q + {3t 
0
where a E �' {3 > 0 and J.L is a positive measure such that J l dJ.L ( A ) < oo . Then f is operator monotone. Find operator monotone functions that can not be expressed in this form.
V.6 Notes and References
Operator monotone functions were first studied in detail by K. Löwner (C. Loewner) in a seminal paper, Über monotone Matrixfunktionen, Math. Z., 38 (1934) 177-216. In this paper, he established the connection between operator monotonicity, the positivity of the matrix of divided differences (Theorem V.3.4), and Pick functions. He also noted that the functions f(t) = t^r, 0 \le r \le 1, and f(t) = \log t are operator monotone on (0, \infty).
Operator convex functions were studied, soon afterwards, by F. Kraus, Über konvexe Matrixfunktionen, Math. Z., 41 (1936) 18-42. In another well-known paper, Beiträge zur Störungstheorie der Spektralzerlegung, Math. Ann., 123 (1951) 415-438, E. Heinz used the theory of operator monotone functions to study several problems of perturbation theory for bounded and unbounded operators. The integral representation (V.41) in this context seems to have been first used by him. The operator monotonicity of the map A \mapsto A^r for 0 \le r \le 1 is sometimes called the "Loewner-Heinz inequality", although it was discovered by Loewner.

J. Bendat and S. Sherman, Monotone and convex operator functions, Trans. Amer. Math. Soc., 79 (1955) 58-71, provided a new perspective on the theorems of Loewner and Kraus. Theorem V.4.4 was first proved by them, and used to give a proof of Loewner's theorems.

A completely different and extremely elegant proof of Loewner's Theorem, based on the spectral theorem for (unbounded) selfadjoint operators, was given by A. Koranyi, On a theorem of Löwner and its connections with resolvents of selfadjoint transformations, Acta Sci. Math. Szeged, 17 (1956) 63-70.

Formulas like (V.13) and (V.22) were proved by Ju. L. Daleckii and S.G. Krein, Formulas of differentiation according to a parameter of functions of Hermitian operators, Dokl. Akad. Nauk SSSR, 76 (1951) 13-16. It was pointed out by M.G. Krein that the resulting Taylor formula could be used to derive conditions for operator monotonicity.

A concise presentation of the main ideas of operator monotonicity and convexity, including the approach of Daleckii and Krein, was given by C. Davis, Notions generalizing convexity for functions defined on spaces of matrices, in Convexity: Proceedings of Symposia in Pure Mathematics, American Mathematical Society, 1963, pp. 187-201. This paper also discussed other notions of convexity, examples and counterexamples, and was very influential.
A full book devoted to this topic is Monotone Matrix Functions and Analytic Continuation, by W.F. Donoghue, Springer-Verlag, 1974. Several ramifications of the theory and its connections with classical real and complex analysis are discussed there.

In a set of mimeographed lecture notes, Topics on Operator Inequalities, Hokkaido University, Sapporo, 1978, T. Ando provided a very concise modern survey of operator monotone and operator convex functions. Anyone who wishes to learn the Koranyi method mentioned above should certainly read these notes.

A short proof of Löwner's Theorem appeared in G. Sparr, A new proof of Löwner's theorem on monotone matrix functions, Math. Scand., 47 (1980) 266-274. In another brief and attractive paper, Jensen's inequality for operators and Löwner's theorem, Math. Ann., 258 (1982) 229-241, F. Hansen and G.K. Pedersen provided another approach.
V.6 Notes and References
Much of Sections 2, 3, and 4 is based on this paper of Hansen and Pedersen. For the latter parts of Section 4 we have followed Donoghue. We have also borrowed freely from Ando and from Davis. Our proof of Theorem V.1.9 is taken from M. Fujii and T. Furuta, Löwner-Heinz, Cordes and Heinz-Kato inequalities, Math. Japonica, 38 (1993) 73-78. Characterisations of operator convexity like the one in Exercise V.3.15 may be found in J.S. Aujla and H.L. Vasudeva, Convex and monotone operator functions, Ann. Polonici Math., 62 (1995) 1-11. Operator monotone and operator convex functions are studied in R.A. Horn and C.R. Johnson, Topics in Matrix Analysis, Chapter 6. See also the interesting paper R.A. Horn, The Hadamard product, in C.R. Johnson, ed., Matrix Theory and Applications, American Mathematical Society, 1990. A short, but interesting, section of the Marshall-Olkin book (cited in Chapter 2) is devoted to this topic. Especially interesting are some of the examples and connections with statistics that they give. Among several applications of these ideas, there are two that we should mention here. Operator monotone functions arise often in the study of electrical networks. See, e.g., W.N. Anderson and G.E. Trapp, A class of monotone operator functions related to electrical network theory, Linear Algebra Appl., 15 (1975) 53-67. They also occur in problems related to elementary particles. See, e.g., E. Wigner and J. von Neumann, Significance of Löwner's theorem in the quantum theory of collisions, Ann. of Math., 59 (1954) 418-433. There are important notions of means of operators that are useful in the analysis of electrical networks and in quantum physics. An axiomatic approach to the study of these means was introduced by F. Kubo and T. Ando, Means of positive linear operators, Math. Ann., 249 (1980) 205-224.
They establish a one-to-one correspondence between the class of operator monotone functions $f$ on $(0, \infty)$ with $f(1) = 1$ and the class of operator means.
VI Spectral Variation of Normal Matrices
Let $A$ be an $n \times n$ Hermitian matrix, and let $\lambda_1^\downarrow(A) \ge \lambda_2^\downarrow(A) \ge \cdots \ge \lambda_n^\downarrow(A)$ be the eigenvalues of $A$ arranged in decreasing order. In Chapter III we saw that $\lambda_j^\downarrow(A)$, $1 \le j \le n$, are continuous functions on the space of Hermitian matrices. This is a very special consequence of Weyl's Perturbation Theorem: if $A, B$ are two Hermitian matrices, then
$$\max_j |\lambda_j^\downarrow(A) - \lambda_j^\downarrow(B)| \le \|A - B\|.$$
In turn, this inequality is a special case of the inequality (IV.62), which says that if $\mathrm{Eig}^\downarrow(A)$ denotes the diagonal matrix with entries $\lambda_j^\downarrow(A)$ down its diagonal, then we have
$$|||\mathrm{Eig}^\downarrow(A) - \mathrm{Eig}^\downarrow(B)||| \le |||A - B|||$$
for all Hermitian matrices A, B and for all unitarily invariant norms. In this chapter we explore how far these results can be carried over to normal matrices. The first difficulty we face is that , if the matrices are not Hermitian, there is no natural way to order their eigenvalues. So, the problem has to be formulated in terms of optimal matchings. Even after this has been done, analogues of the inequalities above turn out to be a little more complicated. Though several good results are known, many await discovery.
VI.1 Continuity of Roots of Polynomials
Every polynomial of degree $n$ with complex coefficients has $n$ complex roots. These are unique, except for an ordering. It is thus natural to think of them as an unordered $n$-tuple of complex numbers. The space of such $n$-tuples is denoted by $\mathbb{C}^n_{sym}$. This is the quotient space obtained from the space $\mathbb{C}^n$ via the equivalence relation that identifies two $n$-tuples if their coordinates are permutations of each other. The space $\mathbb{C}^n_{sym}$ thus inherits a natural quotient topology from $\mathbb{C}^n$. It also has a natural metric: if $\lambda = \{\lambda_1, \ldots, \lambda_n\}$ and $\mu = \{\mu_1, \ldots, \mu_n\}$ are two points in $\mathbb{C}^n_{sym}$, then
$$d(\lambda, \mu) = \min_\sigma \max_j |\lambda_j - \mu_{\sigma(j)}|,$$
where the minimum is taken over all permutations $\sigma$. See Problem II.5.9. This metric is called the optimal matching distance between $\lambda$ and $\mu$.

Exercise VI.1.1 Show that the quotient topology on $\mathbb{C}^n_{sym}$ and the metric topology generated by the optimal matching distance are identical.
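The optimal matching distance is easy to compute by brute force for small $n$, minimising over all $n!$ permutations. The following Python sketch is our illustration (the function name is ours, not the book's):

```python
import itertools

def optimal_matching_distance(lam, mu):
    # d(lambda, mu) = min over permutations sigma of max_j |lam_j - mu_sigma(j)|
    return min(
        max(abs(l - m) for l, m in zip(lam, perm))
        for perm in itertools.permutations(mu)
    )

# Two unordered triples of complex numbers.
lam = [0j, 1 + 0j, 1j]
mu = [0.1 + 0j, 1.05 + 0j, 0.9j]
d = optimal_matching_distance(lam, mu)
```

Here the identity pairing already matches each point to its nearest partner, so $d = 0.1$.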
Recall that, if
$$f(z) = z^n - a_1 z^{n-1} + a_2 z^{n-2} + \cdots + (-1)^n a_n \qquad \text{(VI.1)}$$
is a monic polynomial with roots $\alpha_1, \ldots, \alpha_n$, then the coefficients $a_j$ are elementary symmetric polynomials in the variables $\alpha_1, \ldots, \alpha_n$, i.e.,
$$a_j = \sum_{1 \le i_1 < \cdots < i_j \le n} \alpha_{i_1} \cdots \alpha_{i_j}. \qquad \text{(VI.2)}$$
By the Fundamental Theorem of Algebra, we have a bijection $S : \mathbb{C}^n_{sym} \to \mathbb{C}^n$ defined as
$$S(\{\alpha_1, \ldots, \alpha_n\}) = (a_1, \ldots, a_n). \qquad \text{(VI.3)}$$
Clearly $S$ is continuous, by the definition of the quotient topology. We will show that $S^{-1}$ is also continuous. For this we have to show that for every $\varepsilon > 0$, there exists $\delta > 0$ such that if $|a_j - b_j| < \delta$ for all $j$, then the optimal matching distance between the roots of the monic polynomials that have $a_j$ and $b_j$ as their coefficients is smaller than $\varepsilon$. Let $\zeta_1, \ldots, \zeta_k$ be the distinct roots of the monic polynomial $f$ that has coefficients $a_j$. Given $\varepsilon > 0$, we can choose circles $\Gamma_j$, $1 \le j \le k$, centred at $\zeta_j$, each having radius smaller than $\varepsilon$ and such that none of them intersects any other. Let $\Gamma$ be the union of the boundaries of all these circles. Let $\eta = \inf_{z \in \Gamma} |f(z)|$. Then $\eta > 0$. Since $\Gamma$ is a compact set, there exists a positive number $\delta$ such that if $g$ is any monic polynomial with coefficients $b_j$, and $|a_j - b_j| < \delta$ for all $j$, then $|f(z) - g(z)| < \eta$ for all $z \in \Gamma$. So, by Rouché's Theorem $f$ and $g$ have the same number of zeroes inside each $\Gamma_j$, where the zeroes are counted with multiplicities. Thus we can pair each root of $f$ with a root of $g$ in
such a way that the distance between any two pairs is smaller than $\varepsilon$. In other words, the optimal matching distance between the roots of $f$ and $g$ is smaller than $\varepsilon$. We have thus proved the following.
Theorem VI.1.2 The map $S$ is a homeomorphism between $\mathbb{C}^n_{sym}$ and $\mathbb{C}^n$.

The continuity of $S^{-1}$ means that the roots of a polynomial vary continuously with the coefficients. Since the coefficients of its characteristic polynomial change continuously with a matrix, it follows that the eigenvalues of a matrix also vary continuously. More precisely, the map $M(n) \to \mathbb{C}^n_{sym}$ that takes a matrix to the unordered tuple of its eigenvalues is continuous. A different kind of continuity question is the following. If $z \mapsto A(z)$ is a continuous map from a domain $G$ in the complex plane into $M(n)$, then do there exist $n$ continuous functions $\lambda_1(z), \ldots, \lambda_n(z)$ on $G$ such that for each $z$ they are the eigenvalues of the matrix $A(z)$? The example below shows that this is not always the case.
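This continuity can be observed numerically: perturbing the coefficients of a polynomial slightly moves its roots, as an unordered tuple, only slightly. A NumPy sketch (ours; `np.roots` uses a different sign convention for the coefficients than (VI.1), which does not affect the distance):

```python
import itertools
import numpy as np

def matching_distance(a, b):
    return min(max(abs(x - y) for x, y in zip(a, p))
               for p in itertools.permutations(b))

# f(z) = z^3 - 6z^2 + 11z - 6 has roots 1, 2, 3.
f = [1.0, -6.0, 11.0, -6.0]
# Perturb each coefficient by about 1e-6.
g = [1.0, -6.0 + 1e-6, 11.0 - 1e-6, -6.0 + 1e-6]
d = matching_distance(np.roots(f), np.roots(g))
```

Since the roots 1, 2, 3 are well separated, the optimal matching distance here is of the same order as the perturbation of the coefficients.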
Example VI.1.3 Let $A(z) = \begin{pmatrix} 0 & z \\ 1 & 0 \end{pmatrix}$. The eigenvalues of $A(z)$ are $\pm z^{1/2}$. These cannot be represented by two single-valued continuous functions on any domain $G$ that contains zero.
In two special situations, the answer to the question raised above is in the affirmative. If either the eigenvalues of $A(z)$ are all real, or if $G$ is an interval on the real line, a continuous parametrisation of the eigenvalues of $A(z)$ is possible. This is shown below. Consider the map from $\mathbb{R}^n_{sym}$ to $\mathbb{R}^n$ that rearranges an unordered $n$-tuple $\{\lambda_1, \ldots, \lambda_n\}$ in decreasing order as $(\lambda_1^\downarrow, \ldots, \lambda_n^\downarrow)$. From the majorisation relation (II.35) it follows that this map reduces distances, i.e.,
$$\max_j |\lambda_j^\downarrow - \mu_j^\downarrow| \le \min_\sigma \max_j |\lambda_j - \mu_{\sigma(j)}|.$$
Hence, in particular, this is a continuous map. So, if all the eigenvalues of $A(z)$ are real, enumerating them as $\lambda_1^\downarrow(z) \ge \cdots \ge \lambda_n^\downarrow(z)$ gives a continuous parametrisation for them. We should remark that while this is the most natural way of ordering real $n$-tuples, it is not always the most convenient. It could destroy the differentiability of these functions, which some other ordering might confer on them. For example, on any interval containing 0 the two functions $\pm t$ are differentiable. But rearrangement in the way above leads to the functions $\pm|t|$, which are not differentiable at 0. For maps from an interval we have the following.
Theorem VI.1.4 Let $\lambda$ be a continuous map from an interval $I$ into the space $\mathbb{C}^n_{sym}$. Then there exist $n$ continuous complex functions $\lambda_j(t)$ on $I$ such that $\lambda(t) = \{\lambda_1(t), \ldots, \lambda_n(t)\}$ for each $t \in I$.
Proof. For brevity we will call $n$ functions whose existence is asserted by the theorem a continuous selection for $\lambda$. Suppose a continuous selection $\lambda_j^{(1)}(t)$ exists on a subinterval $I_1$ and another continuous selection $\lambda_j^{(2)}(t)$ exists on a subinterval $I_2$. If $I_1$ and $I_2$ have a common point $t_0$, then $\{\lambda_j^{(1)}(t_0)\}$ and $\{\lambda_j^{(2)}(t_0)\}$ are identical up to a permutation. So a continuous selection exists on $I_1 \cup I_2$. It follows that, if $J$ is a subinterval of $I$ such that each point of $J$ has a neighbourhood on which a continuous selection exists, then a continuous selection exists on the entire interval $J$. Now we can prove the theorem by induction on $n$. The statement is obviously true for $n = 1$. Suppose it is true for dimensions smaller than $n$. Let $K$ be the set of all $t \in I$ for which all the $n$ elements of $\lambda(t)$ are equal. Then $K$ is a closed subset of $I$. Let $L = I \setminus K$. Let $t_0 \in L$. Then $\lambda(t_0)$ has at least two distinct elements. Collect all the copies of one of these elements. If these are $k$ in number (i.e., $k$ is the multiplicity of the chosen element), then the $n$ elements of $\lambda(t_0)$ are now divided into two groups with $k$ and $n - k$ elements, respectively. These two groups have no element in common. Since $\lambda(t)$ is continuous, for $t$ sufficiently close to $t_0$ the elements of $\lambda(t)$ also split into two groups of $k$ and $n - k$ elements, each of which is continuous in $t$. By the induction hypothesis, each of these groups has a continuous selection in a neighbourhood of $t_0$. Taken together, they provide a continuous selection for $\lambda$ in this neighbourhood. So, a continuous selection exists on each component of $L$. On its complement $K$, $\lambda(t)$ consists of just one element $\lambda(t)$ repeated $n$ times. Putting these together we obtain a continuous selection for $\lambda(t)$ on all of $I$. •
Corollary VI.1.5 Let $a_j(t)$, $1 \le j \le n$, be continuous complex valued functions defined on an interval $I$. Then there exist continuous functions $\alpha_1(t), \ldots, \alpha_n(t)$ that, for each $t \in I$, constitute the roots of the monic polynomial $z^n - a_1(t) z^{n-1} + \cdots + (-1)^n a_n(t)$.
Corollary VI.1.6 Let $t \mapsto A(t)$ be a continuous map from an interval $I$ into the space of $n \times n$ matrices. Then there exist continuous functions $\lambda_1(t), \ldots, \lambda_n(t)$ that, for each $t \in I$, are the eigenvalues of $A(t)$.
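Corollary VI.1.6 can be illustrated numerically by tracking the eigenvalues along a grid of the interval and, at each step, matching the new eigenvalues greedily to the previous ones. This is a sketch of one practical way to produce approximately continuous selections (our code, not the construction used in the proof):

```python
import numpy as np

def eigenvalue_paths(A_of_t, ts):
    # Greedy continuous selection: match each new eigenvalue to the
    # nearest previous one, so that each path moves little per step.
    paths = [np.sort_complex(np.linalg.eigvals(A_of_t(ts[0])))]
    for t in ts[1:]:
        ev = list(np.linalg.eigvals(A_of_t(t)))
        nxt = []
        for p in paths[-1]:
            j = min(range(len(ev)), key=lambda k: abs(ev[k] - p))
            nxt.append(ev.pop(j))
        paths.append(np.array(nxt))
    return np.array(paths)

A = lambda t: np.array([[0.0, t], [1.0, 0.0]])   # eigenvalues +-sqrt(t)
ts = np.linspace(-1.0, 1.0, 201)
paths = eigenvalue_paths(A, ts)
max_step = np.max(np.abs(np.diff(paths, axis=0)))
```

For this matrix (the one from Example VI.1.3) the two paths pass through 0 at $t = 0$; each step stays small even though no single-valued analytic labelling exists on a complex neighbourhood of 0.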
VI.2 Hermitian and Skew-Hermitian Matrices
In this section we derive some bounds for the distance between the eigenvalues of a Hermitian matrix $A$ and those of a skew-Hermitian matrix $B$. This will reveal several new facets of the general problem that are quite different from the case when both $A, B$ are Hermitian. Let us recall here, once again, the theorem that is the prototype of the results we seek.
Theorem VI.2.1 (Weyl's Perturbation Theorem) Let $A, B$ be Hermitian matrices with eigenvalues $\lambda_1^\downarrow(A) \ge \cdots \ge \lambda_n^\downarrow(A)$ and $\lambda_1^\downarrow(B) \ge \cdots \ge \lambda_n^\downarrow(B)$, respectively. Then
$$\max_j |\lambda_j^\downarrow(A) - \lambda_j^\downarrow(B)| \le \|A - B\|. \qquad \text{(VI.4)}$$
We have seen two different proofs of this, one in Section III.2 and the other in Section IV.3. It is the latter idea which, in modified forms, will be used often in the following paragraphs.
Theorem VI.2.2 Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix. Let their eigenvalues $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ be arranged in such a way that
$$|\alpha_1| \ge \cdots \ge |\alpha_n|, \qquad |\beta_1| \ge \cdots \ge |\beta_n|. \qquad \text{(VI.5)}$$
Then
$$\max_j |\alpha_j - \beta_{n-j+1}| \le \|A - B\|. \qquad \text{(VI.6)}$$
Proof. For a fixed index $j$, consider the eigenspaces of $A$ and $B$ corresponding to their eigenvalues $\{\alpha_1, \ldots, \alpha_j\}$ and $\{\beta_1, \ldots, \beta_{n-j+1}\}$, respectively. Let $x$ be a unit vector in their intersection. Then
$$\|A - B\|^2 = \tfrac{1}{2}(\|A - B\|^2 + \|A + B\|^2) \ge \tfrac{1}{2}(\|(A - B)x\|^2 + \|(A + B)x\|^2) = \|Ax\|^2 + \|Bx\|^2 \ge |\alpha_j|^2 + |\beta_{n-j+1}|^2 = |\alpha_j - \beta_{n-j+1}|^2.$$
At the first step above, we used the equality $\|T\| = \|T^*\|$, valid for all $T$; at the third step we used the parallelogram law; and at the last step the fact that $\alpha_j$ is real and $\beta_{n-j+1}$ is imaginary. •
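The bound of Theorem VI.2.2 can be spot-checked with random matrices. In this NumPy sketch (ours, not the book's), `np.linalg.norm(·, 2)` is the operator norm:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2          # Hermitian
B = (Y - Y.conj().T) / 2          # skew-Hermitian

# Order both spectra by decreasing modulus, as in (VI.5).
alpha = sorted(np.linalg.eigvalsh(A), key=abs, reverse=True)
beta = sorted(np.linalg.eigvals(B), key=abs, reverse=True)

lhs = max(abs(alpha[j] - beta[n - 1 - j]) for j in range(n))
rhs = np.linalg.norm(A - B, 2)
```

The quantity `lhs` is the left side of (VI.6); it never exceeds `rhs`.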
For Hermitian pairs $A, B$ we have seen analogues of the inequality (VI.4) for other unitarily invariant norms. It is, therefore, natural to ask for similar kinds of results when $A$ is Hermitian and $B$ skew-Hermitian. It is convenient to do this in the following setup. Let $T$ be any matrix and let $T = A + iB$ be its Cartesian decomposition into real and imaginary parts, $A = \frac{T + T^*}{2}$ and $B = \frac{T - T^*}{2i}$. The theorem below gives majorisation relations between the eigenvalues of $A$ and $B$, and the singular values of $T$. From these several inequalities can be obtained. We will use the notation $\{x_j\}_j$ to mean an $n$-vector whose $j$th coordinate is $x_j$.
Theorem VI.2.3 Let $A, B$ be Hermitian matrices with eigenvalues $\alpha_j$ and $\beta_j$, respectively, ordered so that
$$|\alpha_1| \ge \cdots \ge |\alpha_n|, \qquad |\beta_1| \ge \cdots \ge |\beta_n|.$$
Let $T = A + iB$, and let $s_j$ be the singular values of $T$. Then the following majorisation relations are satisfied:
$$\{|\alpha_j + i\beta_{n-j+1}|^2\}_j \prec_w \{s_j^2\}_j, \qquad \text{(VI.7)}$$
$$\{\tfrac{1}{2}(s_j^2 + s_{n-j+1}^2)\}_j \prec_w \{|\alpha_j + i\beta_j|^2\}_j. \qquad \text{(VI.8)}$$

Proof.
For any two Hermitian matrices $X, Y$ we have the majorisations (III.13):
$$\lambda^\downarrow(X) + \lambda^\uparrow(Y) \prec \lambda(X + Y) \prec \lambda^\downarrow(X) + \lambda^\downarrow(Y).$$
Choosing $X = A^2$, $Y = B^2$, this gives
$$\{\alpha_j^2 + \beta_{n-j+1}^2\}_j \prec \lambda(A^2 + B^2) \prec \{\alpha_j^2 + \beta_j^2\}_j. \qquad \text{(VI.9)}$$
Now note that
$$T^*T + TT^* = 2(A^2 + B^2),$$
and
$$\lambda_j(T^*T) = \lambda_j(TT^*) = s_j^2.$$
So, choosing $X = T^*T$ and $Y = TT^*$ in the first majorisation above gives
$$\{\tfrac{1}{2}(s_j^2 + s_{n-j+1}^2)\}_j \prec \lambda(A^2 + B^2) \prec \{s_j^2\}_j. \qquad \text{(VI.10)}$$
Since majorisation is a transitive relation, the two assertions (VI.7) and (VI.8) follow from (VI.9) and (VI.10). •
For each $p \ge 2$, the function $\varphi(t) = t^{p/2}$ is convex and monotone on $[0, \infty)$. Hence, the weak majorisations (VI.7) and (VI.8) lead to
$$\{|\alpha_j + i\beta_{n-j+1}|^p\}_j \prec_w \{s_j^p\}_j, \qquad \text{(VI.11)}$$
$$\{2^{-p/2}(s_j^2 + s_{n-j+1}^2)^{p/2}\}_j \prec_w \{|\alpha_j + i\beta_j|^p\}_j. \qquad \text{(VI.12)}$$
These two relations include the inequalities
$$\sum_{j=1}^n |\alpha_j + i\beta_{n-j+1}|^p \le \sum_{j=1}^n s_j^p, \qquad \text{(VI.13)}$$
$$2^{-p/2} \sum_{j=1}^n (s_j^2 + s_{n-j+1}^2)^{p/2} \le \sum_{j=1}^n |\alpha_j + i\beta_j|^p \qquad \text{(VI.14)}$$
for $p \ge 2$. If $a_1$ and $a_2$ are any two nonnegative real numbers, then the function $g(t) = (a_1^t + a_2^t)^{1/t}$ is monotonically decreasing on $0 < t < \infty$. So if $p \ge 2$, then
$$a_1^p + a_2^p \le (a_1^2 + a_2^2)^{p/2}. \qquad \text{(VI.15)}$$
Using this we get from (VI.14) the inequality
$$2^{1-p/2} \sum_{j=1}^n s_j^p \le \sum_{j=1}^n |\alpha_j + i\beta_j|^p \qquad \text{(VI.16)}$$
for $p \ge 2$.
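The inequalities (VI.13) and (VI.16) can be verified numerically for random Hermitian pairs; the sketch below (ours) uses $p = 3$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 3.0                                     # any p >= 2
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2                          # Hermitian
B = (Y + Y.conj().T) / 2                          # Hermitian
T = A + 1j * B
s = np.linalg.svd(T, compute_uv=False)            # s_1 >= ... >= s_n
alpha = sorted(np.linalg.eigvalsh(A), key=abs, reverse=True)
beta = sorted(np.linalg.eigvalsh(B), key=abs, reverse=True)

# (VI.13): sum |alpha_j + i beta_{n-j+1}|^p <= sum s_j^p
lhs13 = sum(abs(alpha[j] + 1j * beta[n - 1 - j]) ** p for j in range(n))
rhs13 = sum(s ** p)
# (VI.16): 2^(1-p/2) sum s_j^p <= sum |alpha_j + i beta_j|^p
lhs16 = 2 ** (1 - p / 2) * sum(s ** p)
rhs16 = sum(abs(alpha[j] + 1j * beta[j]) ** p for j in range(n))
```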
Exercise VI.2.4 For $0 < p \le 2$, the function $\varphi(t) = t^{p/2}$ is concave on $[0, \infty)$. Use this to show that for these values of $p$, the weak majorisations (VI.11) and (VI.12) are valid with $\prec_w$ replaced by $\prec^w$. All the four inequalities (VI.13)-(VI.16) now go in the opposite direction.
Let $A$ be any matrix with eigenvalues $\alpha_1, \ldots, \alpha_n$, counted in any order. We have used the notation $\mathrm{Eig}\,A$ to mean a diagonal matrix that has entries $\alpha_j$ down its diagonal. If $\sigma$ is a permutation, we will use the notation $\mathrm{Eig}_\sigma(A)$ for the diagonal matrix with entries $\alpha_{\sigma(1)}, \ldots, \alpha_{\sigma(n)}$ down its diagonal. The symbol $\mathrm{Eig}^{|\downarrow|}(A)$ will mean the diagonal matrix whose diagonal entries are the eigenvalues of $A$ in decreasing order of magnitude, i.e., the $\alpha_j$ arranged so that $|\alpha_1| \ge \cdots \ge |\alpha_n|$. In the same way, $\mathrm{Eig}^{|\uparrow|}(A)$ will stand for the diagonal matrix whose diagonal entries are the eigenvalues of $A$ arranged in increasing order of magnitude, i.e., the $\alpha_j$ rearranged so that $|\alpha_1| \le |\alpha_2| \le \cdots \le |\alpha_n|$. With these notations, we have the following theorem for the distance between the eigenvalues of a Hermitian and a skew-Hermitian matrix, in the Schatten $p$-norms.
Theorem VI.2.5 Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix. Then,
(i) for $2 \le p \le \infty$, we have
$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p \le \|A - B\|_p, \qquad \text{(VI.17)}$$
$$\|A - B\|_p \le 2^{\frac{1}{2} - \frac{1}{p}} \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p; \qquad \text{(VI.18)}$$
(ii) for $1 \le p \le 2$, we have
$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p \le 2^{\frac{1}{p} - \frac{1}{2}} \|A - B\|_p, \qquad \text{(VI.19)}$$
$$\|A - B\|_p \le \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p. \qquad \text{(VI.20)}$$
All the inequalities above are sharp. Further,
(iii) for $2 \le p \le \infty$, we have
$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p \le \|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_p \le \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p \qquad \text{(VI.21)}$$
for all permutations $\sigma$;
(iv) for $1 \le p \le 2$, we have
$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p \le \|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_p \le \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p \qquad \text{(VI.22)}$$
for all permutations $\sigma$.
Proof. For $p \ge 2$, the inequalities (VI.17) and (VI.18) follow immediately from (VI.13) and (VI.16). For $p = \infty$, the same inequalities remain valid by a limiting argument. The next two inequalities of the theorem follow from the fact that, for $1 \le p \le 2$, both of the inequalities (VI.13) and (VI.16) are reversed. The special case of statements (i) and (ii) in which $A$ and $B$ commute is adequate for proving (iii) and (iv). The sharpness of all the inequalities can be seen from the $2 \times 2$ example:
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \qquad \text{(VI.23)}$$
Here $\|A - B\|_p = 2$ for all $1 \le p \le \infty$. The eigenvalues of $A$ are $\pm 1$, those of $B$ are $\pm i$. Hence, for every permutation $\sigma$,
$$\frac{\|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_p}{\|A - B\|_p} = 2^{\frac{1}{p} - \frac{1}{2}}$$
for all $1 \le p \le \infty$. •
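The ratio $2^{1/p - 1/2}$ attained by a pair of this kind (a Hermitian $A$ with eigenvalues $\pm 1$, a skew-Hermitian $B$ with eigenvalues $\pm i$, and $\|A - B\|_p = 2$) can be confirmed numerically; the matrices below are one such choice (our sketch):

```python
import numpy as np

A = np.array([[0.0, 1.0], [1.0, 0.0]])    # Hermitian, eigenvalues +1, -1
B = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-Hermitian, eigenvalues +i, -i

def schatten(M, p):
    # Schatten p-norm: the l_p norm of the singular values.
    return float(sum(np.linalg.svd(M, compute_uv=False) ** p) ** (1 / p))

p = 4.0
norm_AB = schatten(A - B, p)                  # equals 2 for every p >= 1
# Every pairing of {1, -1} with {i, -i} puts sqrt(2) in each diagonal slot.
eig_dist = (2 * np.sqrt(2) ** p) ** (1 / p)   # ||Eig(A) - Eig_sigma(B)||_p
ratio = eig_dist / norm_AB                    # = 2^(1/p - 1/2)
```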
Note that the inequality (VI.6) is included in (VI.17). There are several features of these inequalities that are different from the corresponding inequality (IV.62) for a pair of Hermitian matrices $A, B$. First, the inequalities (VI.18) and (VI.19) involve a constant term on the right that is bigger than 1. Second, the best choice of this term depends on the norm $\|\cdot\|_p$. Third, the optimal matching of the eigenvalues of $A$ with those of $B$ (the one that will minimise the distance between them) changes with the norm. In fact, the best pairing for the norms $2 \le p \le \infty$ is the worst one for the norms $1 \le p \le 2$, and vice versa. All these new features reveal that the spectral variation problem for pairs of normal matrices $A, B$ is far more intricate than the one for Hermitian pairs.
Exercise VI.2.6 Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix. Show that for every unitarily invariant norm we have
$$|||\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)||| \le 2\, |||A - B|||, \qquad \text{(VI.24)}$$
$$|||A - B||| \le \sqrt{2}\, |||\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)|||. \qquad \text{(VI.25)}$$
The term $\sqrt{2}$ in the second inequality cannot be replaced by anything smaller.

VI.3 Estimates in the Operator Norm

In this section we will obtain estimates of the distance between the eigenvalues of two normal matrices $A$ and $B$ in terms of $\|A - B\|$. Apart from
the optimal matching distance, which has already been introduced, we will consider other distances. If $L, M$ are two closed subsets of the complex plane $\mathbb{C}$, let
$$s(L, M) = \sup_{\lambda \in L} \mathrm{dist}(\lambda, M) = \sup_{\lambda \in L} \inf_{\mu \in M} |\lambda - \mu|. \qquad \text{(VI.26)}$$
The Hausdorff distance between $L$ and $M$ is defined as
$$h(L, M) = \max(s(L, M), s(M, L)). \qquad \text{(VI.27)}$$
Exercise VI.3.1 Show that $s(L, M) = 0$ if and only if $L$ is a subset of $M$. Show that the Hausdorff distance defines a metric on the collection of all closed subsets of $\mathbb{C}$.
Note that $s(L, M)$ is the smallest number $\delta$ such that every element of $L$ is within a distance $\delta$ of some element of $M$; and $h(L, M)$ is the smallest number $\delta$ for which this, as well as the symmetric assertion with $L$ and $M$ interchanged, is true. Let $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$ be two unordered $n$-tuples of complex numbers. Let $L$ and $M$ be the subsets of $\mathbb{C}$ whose elements are the entries of these two tuples. If some entry among $\{\lambda_j\}$ or $\{\mu_j\}$ has multiplicity bigger than 1, then the cardinality of $L$ or $M$ is smaller than $n$.
Exercise VI.3.2
(i) The Hausdorff distance $h(L, M)$ is always less than or equal to the optimal matching distance $d(\{\lambda_1, \ldots, \lambda_n\}, \{\mu_1, \ldots, \mu_n\})$.
(ii) When $n = 2$, the two distances are equal.
(iii) The triples $\{0, m - \varepsilon, m + \varepsilon\}$ and $\{-\varepsilon, \varepsilon, m\}$ provide an example in which $h(L, M) = \varepsilon$ and the optimal matching distance is $m - 2\varepsilon$. Thus, for $n \ge 3$, the second distance can be arbitrarily larger than the first.
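The three distances of this section are short to code, and triples of the kind in part (iii) exhibit the gap between the Hausdorff and matching distances concretely. Our sketch, with $m = 10$ and $\varepsilon = 1/2$:

```python
import itertools

def s_dist(L, M):
    # s(L, M): the farthest any point of L sits from the set M.
    return max(min(abs(l - m) for m in M) for l in L)

def hausdorff(L, M):
    return max(s_dist(L, M), s_dist(M, L))

def matching_distance(L, M):
    return min(max(abs(l - m) for l, m in zip(L, p))
               for p in itertools.permutations(M))

m, eps = 10.0, 0.5
L = [0.0, m - eps, m + eps]
M = [-eps, eps, m]
h = hausdorff(L, M)           # = eps
d = matching_distance(L, M)   # = m - 2*eps
```

As sets the two triples nearly coincide, but any bijective pairing must send one point across the gap of length about $m$, up to the two $\varepsilon$-shifts.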
If $A$ is an $n \times n$ matrix, we will use the notation $\sigma(A)$ for both the subset of the complex plane that consists of all the eigenvalues of $A$, and for the unordered $n$-tuple whose entries are the eigenvalues of $A$ counted with multiplicity. Since we will be talking of the distances $s(\sigma(A), \sigma(B))$, $h(\sigma(A), \sigma(B))$, and $d(\sigma(A), \sigma(B))$, it will be clear which of the two objects is being represented by $\sigma(A)$. Note that the inequalities (VI.4) and (VI.6) say that
$$d(\sigma(A), \sigma(B)) \le \|A - B\|, \qquad \text{(VI.28)}$$
if either $A$ and $B$ are both Hermitian, or one is Hermitian and the other skew-Hermitian.

Theorem VI.3.3 Let $A$ be a normal and $B$ an arbitrary matrix. Then
$$s(\sigma(B), \sigma(A)) \le \|A - B\|. \qquad \text{(VI.29)}$$
Proof. Let $\varepsilon = \|A - B\|$. We have to show that if $\beta$ is any eigenvalue of $B$, then $\beta$ is within a distance $\varepsilon$ of some eigenvalue $\alpha_j$ of $A$. By applying a translation, we may assume that $\beta = 0$. If none of the $\alpha_j$ is within a distance $\varepsilon$ of this, then $A$ is invertible. Since $A$ is normal, we have
$$\|A^{-1}\| = \frac{1}{\min_j |\alpha_j|} < \frac{1}{\varepsilon}.$$
Hence,
$$\|A^{-1}(B - A)\| \le \|A^{-1}\| \, \|B - A\| < 1.$$
Since $B = A(I + A^{-1}(B - A))$, this shows that $B$ is invertible. But then $B$ could not have had a zero eigenvalue. •

Another proof of this theorem goes as follows. Let $A$ have the spectral resolution $A = \sum_j \alpha_j u_j u_j^*$, and let $v$ be a unit vector such that $Bv = \beta v$. Then
$$\|A - B\|^2 \ge \|(A - B)v\|^2 = \sum_j |\alpha_j - \beta|^2 |u_j^* v|^2.$$
Since the $u_j$ form an orthonormal basis, $\sum_j |u_j^* v|^2 = 1$. Hence, the above inequality can be satisfied only if $|\alpha_j - \beta|^2 \le \|A - B\|^2$ for at least one index $j$.

Corollary VI.3.4 If $A$ and $B$ are $n \times n$ normal matrices, then
$$h(\sigma(A), \sigma(B)) \le \|A - B\|. \qquad \text{(VI.30)}$$
For $n = 2$, we have
$$d(\sigma(A), \sigma(B)) \le \|A - B\|. \qquad \text{(VI.31)}$$
This corollary also follows from the proposition below. We will use the notation D ( a , p) for the open disk of radius p centred at a, and D ( a, p) for the closure of this disk.
Proposition Vl.3.5 Let A and B be normal matrices, II A  E ll . If any disk D ( a , p) contains k eigenvalues of A, D ( a , p + £) contains at least k eigenvalues of B .
and let £ == then the disk
Without loss of generality, we may assume that a 0. Suppose D ( O, p) contains k eigenvalues of A but D (O, p + £ ) contains less than k eigenvalues of B . Choose a unit vector x in the intersection of the eigenspace of A corresponding to its eigenvalues lying inside D ( O , p) and the eigenspace of B corresponding to its eigenvalues lying outside D (O, p +£ ) . We then have f l Ax II < p and I I B x j j > p + £. We also have l ! Bx ll  II Ax il < I I ( B  A ) x ll < • H B  A II == £. This is a contradiction. Proof.
==
162
VI .
S pect ral Variation o f Normal Matrices
Exercise VI.3.6 Use the special case $\rho = 0$ of Proposition VI.3.5 to prove Corollary VI.3.4.
Given a subset $X$ of the complex plane and a matrix $A$, let $m_A(X)$ denote the number of eigenvalues of $A$ inside $X$.
Exercise VI.3.7 Let $A, B$ be two $n \times n$ normal matrices. Let $K_1, K_2$ be two convex sets such that $m_A(K_1) \le k$ and $m_B(K_2) \ge k + 1$. Then $\mathrm{dist}(K_1, K_2) \le \|A - B\|$. [Hint: Let $\rho \to \infty$ in Proposition VI.3.5.]

Exercise VI.3.8 Use this to give another proof of Theorem VI.2.1.

Exercise VI.3.9 Let $A, B$ be two $n \times n$ unitary matrices whose eigenvalues lie in a semicircle of the unit circle. Label both the sets of eigenvalues in the counterclockwise order. Then
$$\max_j |\lambda_j(A) - \lambda_j(B)| \le \|A - B\|.$$
Hence,
$$d(\sigma(A), \sigma(B)) \le \|A - B\|.$$

Exercise VI.3.10 Let $T$ be the unit circle, $I$ any closed arc in $T$, and for $\varepsilon > 0$ let $I_\varepsilon$ be the arc $\{z \in T : |z - w| \le \varepsilon \text{ for some } w \in I\}$. Let $A, B$ be unitary matrices with $\|A - B\| = \varepsilon$. Show that $m_A(I) \le m_B(I_\varepsilon)$.

Theorem VI.3.11 For any two $n \times n$ unitary matrices,
$$d(\sigma(A), \sigma(B)) \le \|A - B\|.$$

Proof. The proof will use the Marriage Theorem (Theorem II.2.1) and the exercise above. Let $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$ be the eigenvalues of $A$ and $B$, respectively. Let $\Lambda$ be any subset of $\{\lambda_1, \ldots, \lambda_n\}$. Let $\mu(\Lambda) = \{\mu_j : |\mu_j - \lambda_i| \le \varepsilon \text{ for some } \lambda_i \in \Lambda\}$, where $\varepsilon = \|A - B\|$. By the Marriage Theorem, the assertion would be proved if we show that $|\mu(\Lambda)| \ge |\Lambda|$. Let $I(\Lambda)$ be the set of all points on the unit circle $T$ that are within distance $\varepsilon$ of some point of $\Lambda$. Then $\mu(\Lambda)$ contains exactly those $\mu_i$ that lie in $I(\Lambda)$. Let $I(\Lambda)$ be written as a disjoint union of arcs $I_1, \ldots, I_r$. For each $1 \le k \le r$, let $J_k$ be the arc contained in $I_k$ all whose points are at distance $\ge \varepsilon$ from the boundary of $I_k$. Then $I_k = (J_k)_\varepsilon$. From Exercise VI.3.10 we have
$$\sum_{k=1}^r m_A(J_k) \le \sum_{k=1}^r m_B(I_k).$$
But all the elements of $\Lambda$ are in some $J_k$. This shows that $|\Lambda| \le |\mu(\Lambda)|$. •
There is one difference between Theorem VI.3.11 and most of our earlier results of this type. Now nothing is said about the order in which the
eigenvalues of $A$ and $B$ are arranged for the optimal matching. No canonical order can be prescribed in general. In Problem VI.8.3, we outline another proof of Theorem VI.3.11 which says, in effect, that for optimal matching the eigenvalues of $A$ and $B$ can be counted in the cyclic order on the circle provided the initial point is chosen properly. The catch is that this initial point depends on $A$ and $B$ and we do not know how to find it.
Exercise VI.3.12 Let $A = c_1 U_1$, $B = c_2 U_2$, where $U_1, U_2$ are unitary matrices and $c_1, c_2$ are complex numbers. Show that $d(\sigma(A), \sigma(B)) \le \|A - B\|$.
By now we have seen that the inequality (VI.28) is valid in the following situations:
(i) $A$ and $B$ both Hermitian;
(ii) $A$ Hermitian and $B$ skew-Hermitian;
(iii) $A$ and $B$ both unitary (or both scalar multiples of unitaries);
(iv) $A$ and $B$ both $2 \times 2$ normal matrices.
The example below shows that this inequality breaks down for arbitrary normal matrices $A, B$ when $n \ge 3$.
Example VI.3.13 Let $A$ be the $3 \times 3$ diagonal matrix with diagonal entries $\lambda_1 = 1$, $\lambda_2 = \frac{-1 + \sqrt{3}\,i}{2}$, $\lambda_3 = \frac{-1 - \sqrt{3}\,i}{2}$, the cube roots of unity. Let $v^T = \left(\sqrt{3/8},\ 1/2,\ \sqrt{3/8}\right)$, and let $U = I - 2vv^T$. Then $U$ is a unitary matrix. Let $B = -U^* A U$. Then $B$ is a normal matrix with eigenvalues $\mu_j = -\lambda_j$, $j = 1, 2, 3$. One can check that
$$d(\sigma(A), \sigma(B)) = 1, \qquad \|A - B\| = \sqrt{27/28}.$$
So,
$$\frac{d(\sigma(A), \sigma(B))}{\|A - B\|} = \sqrt{28/27} = 1.0183+.$$
In the next chapter we will show that there exists a constant $c < 2.91$ such that for any two $n \times n$ normal matrices $A, B$
$$d(\sigma(A), \sigma(B)) \le c\,\|A - B\|.$$
For Hermitian matrices $A, B$ we have a reverse inequality:
$$\|A - B\| \le \max_{1 \le j \le n} |\lambda_j^\downarrow(A) - \lambda_j^\uparrow(B)|.$$
The quantity on the right is the distance between the eigenvalues of $A$ and those of $B$ when the "worst" pairing is made. An analogous result for normal matrices is proved below.
Theorem VI.3.14 Let $A$ and $B$ be normal matrices with eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$, respectively. Then, there exists a permutation $\sigma$ such that
$$\max_j |\lambda_j - \mu_{\sigma(j)}| \ge \frac{1}{\sqrt{2}}\, \|A - B\|. \qquad \text{(VI.32)}$$
Proof. The matrices $A \otimes I$ and $I \otimes B$ are both normal and commute with each other. Hence $A \otimes I - I \otimes B$ is normal. The eigenvalues of this matrix are all the differences $\lambda_i - \mu_j$, $1 \le i, j \le n$. Hence
$$\|A \otimes I - I \otimes B\| = \max_{i,j} |\lambda_i - \mu_j|.$$
So, the inequality (VI.32) is equivalent to
$$\|A - B\| \le \sqrt{2}\, \|A \otimes I - I \otimes B^T\|.$$
This is, in fact, true for all $A, B$ and is proved below. •

Theorem VI.3.15 For all matrices $A, B$,
$$\|A - B\| \le \sqrt{2}\, \|A \otimes I - I \otimes B^T\|. \qquad \text{(VI.33)}$$
Proof. We have to prove that for all $x, y$ in $\mathbb{C}^n$,
$$|\langle x, (A - B)y \rangle| \le \sqrt{2}\, \|A \otimes I - I \otimes B^T\|\, \|x\|\, \|y\|.$$
We have
$$|\langle x, (A - B)y \rangle| = |x^* A y - x^* B y| = |\mathrm{tr}(Ayx^* - yx^*B)| \le \|Ayx^* - yx^*B\|_1.$$
The matrix $Ayx^* - yx^*B$ has rank at most 2. So,
$$\|Ayx^* - yx^*B\|_1 \le \sqrt{2}\, \|Ayx^* - yx^*B\|_2.$$
Let $\bar{x}$ be the vector whose components are the complex conjugates of the components of $x$. Then with respect to the standard basis $e_i \otimes e_j$ of $\mathbb{C}^n \otimes \mathbb{C}^n$, the $(i, j)$-coordinate of the vector $(A \otimes I)(y \otimes \bar{x})$ is $\sum_k a_{ik} y_k \bar{x}_j$. This is also the $(i, j)$-entry of the matrix $Ayx^*$. In the same way, the $(i, j)$-entry of $yx^*B$ is the $(i, j)$-coordinate of the vector $(I \otimes B^T)(y \otimes \bar{x})$. Thus, we have
$$\|Ayx^* - yx^*B\|_2 = \|(A \otimes I - I \otimes B^T)(y \otimes \bar{x})\| \le \|A \otimes I - I \otimes B^T\|\, \|y \otimes \bar{x}\| = \|A \otimes I - I \otimes B^T\|\, \|x\|\, \|y\|.$$
This proves the theorem. •
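The inequality (VI.33) holds for all matrices and is simple to test with `np.kron` (our sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
I = np.eye(n)

lhs = np.linalg.norm(A - B, 2)
rhs = np.sqrt(2) * np.linalg.norm(np.kron(A, I) - np.kron(I, B.T), 2)
```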
The example (VI.23) shows that the inequality (VI.32) is sharp. Note that in this example $A$ and $B$ are both unitary. Also, $A$ is Hermitian and $B$ is skew-Hermitian. In contrast, the factor $\sqrt{2}$ in (VI.32) can be replaced by 1 if $A, B$ are both Hermitian.
VI.4 Estimates in the Frobenius Norm
We will use the symbol $S_n$ to mean the set of permutations on $n$ symbols, as well as the set of $n \times n$ permutation matrices. (To every permutation $\sigma$ there corresponds a unique matrix $P$ that has entries 1 in the $(i, j)$ place if and only if $j = \sigma(i)$, and all whose remaining entries are zero.) Let $\Omega_n$ be the set of all $n \times n$ doubly stochastic matrices. This is a convex polytope, and by Birkhoff's Theorem (Theorem II.2.3) its extreme points are permutation matrices.
Theorem VI.4.1 (Hoffman-Wielandt) Let $A$ and $B$ be normal matrices with eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$, respectively. Then
$$\min_{\sigma \in S_n} \Big( \sum_{i=1}^n |\lambda_i - \mu_{\sigma(i)}|^2 \Big)^{1/2} \le \|A - B\|_2 \le \max_{\sigma \in S_n} \Big( \sum_{i=1}^n |\lambda_i - \mu_{\sigma(i)}|^2 \Big)^{1/2}. \qquad \text{(VI.34)}$$
Proof. Choose unitary matrices U, V such that U A U * == D 1 , V BV * == D2 , where D 1 == diag ( A I , . . . , An) and D2 == diag(J.LI , . , J.Ln ) · Then, by uni tary invariance of the Frobenius norm, II A  El l� == I I U * D 1 U  V* D 2 V II � == 2 I I D 1 W  W D 2 ll , where W == UV * , another unitary matrix. If the matrix W has entries Wij , this can be written as
..
'L
,J
'L ,J
is an affine function on the set O n of doubly stochastic matrices. So it attains its minimum at one of the extreme points of O n . Thus, there exists a permutation matrix (Pij ) such that
'l. ,J
If this matrix corresponds to the permutation I I A  El l � >
·7 .
this says that
�I A i  J.la(i) l 2 'L
This proves the first inequality in (VI.34) . The same argument for the maximum instead of the minimum gives the other inequality. • Note that for Hermitian matrices, the inequality (VI.34) was proved ear lie r in Problem III.6. 15. In this case, we also proved that the same inequality is true for all unitarily invariant norms.
166
VI .
Spectral Variation o f Normal Matrices
In  general , there is no prescript ion for finding the permutation O" that minimises the Euclidean distance between the eigenvalues of A and those of B . However, if A is Hermitian with eigenvalues enume rated as ) q > A2 > > A n , then an enumeration o f J.Li in which Re /L l > Re J.L 2 > > Re fL n is the best one. To see this, j ust note that if _\ 1 > .:\ 2 and Re J.L 1 > Re JL2 , then ·
·
·
·
·
·
The same argument shows t h at an enumeration for whi ch the maximum < Re f..L n · (What distance is attained is one for which Re f..L l < Re J.L 2 < does this say when B is skewHermitian? ) Using the notations introduced in Sect io n VI.2, t he inequality (VI.34 ) can be rewritten as ·
B) min a ii E ig (A)  E ig a ( I I 2
<
IIA  BII 2
<
·
·
max i! Eig(A)  Eig a ( B) Ib  ( V I . 35) a
There is another way of looking at this. Since the eigenvalues of a no rmal matrix completely determine the matr ix up to a unitary conjugation , the inequality ( V I. 35) is equivalent to saying that for any two diagonal matrices A, B min jj A  P BP* 1 ! 2 < I I A  U BU * lb p
<
max i! A  PB P* 1 !2 , p
(VI.36)
where U is any unitary matrix and P varies over al l permutation matrices. Given any matrix B, let Us be the s et Us == { UBU* : U E U ( n ) } ,
where U(n) is the group consisting of unitary matrices. Then Us is a compact set called the unitary orbit of B. For a fixed di agon al matrix A, consider the function f(X) == II A  X ll 2  The inequal ity (VI.36) then says that if B is a no ther diagonal matrix, then on the compact se t UB b ot h the minimum and the maximum of f are attained at diago nal matrices (j ust some permutations of B) . In ot he r words, the minimum and the maximum on the unitary orbit are b ot h contained in the permutation orbit. This is an interesting fact from the point of view of calculus and geom etry. We will see below that if A, B are real diagonal matrices, a stronger statement can be p roved using calculus. This will also serve to introduce some e l ement ary ideas of differential geometry used in later sections. A differentiable function U (t) , where t is real and U ( t) is unitary, is called a differentiable curve thro ugh I if U (O) == I D i ffe rentiat i ng the equation U(t)U(t)* == I at t == 0 shows that for such a curve U' (O) is skewHe r mit i an . The matrix U' (O) is called the tangent vector to U(t) at /. If K == U' ( O ) , then etK is anothe r differentiable curve through I with t angent vector K at I. Thus , the curves U(t) and e t K have the same t angent vector and so .
represent the same curve locally, i.e., they are equal to the first degree of approximation. The tangent space to the manifold $U(n)$ at the point $I$ is the linear space that consists of all these tangent vectors. We have seen that this is the real vector space $K(n)$ consisting of all skew-Hermitian matrices.

If $\mathcal{U}_A$ is the unitary orbit of a matrix $A$, then every differentiable curve through $A$ can be represented locally as $e^{tK}Ae^{-tK}$ for some skew-Hermitian $K$. The derivative of this curve at $t = 0$ is $KA - AK$. This is usually written as $[K, A]$ and called a Lie bracket or a commutator. Thus the tangent space to the manifold $\mathcal{U}_A$ at the point $A$ is the space
$$T_A\mathcal{U}_A = \{[A, K] : K \in K(n)\}. \tag{VI.37}$$
Note that this implies that $T_A\mathcal{U}_A$ is contained in $K(n)$ if $A \in K(n)$. The sesquilinear form $\langle A, B\rangle = \operatorname{tr} A^{*}B$ is an inner product on the space $M(n)$. The symbol $S^{\perp}$ will mean the orthogonal complement of a space $S$ with respect to this inner product.
Lemma VI.4.2 For every $A \in K(n)$, the orthogonal complement of $T_A\mathcal{U}_A$ in $K(n)$ is the set of all $Y$ that commute with $A$.

Proof. Let $Y \in K(n)$. Then $Y \in (T_A\mathcal{U}_A)^{\perp}$ if and only if for every $K$ in $K(n)$ we have
$$0 = \langle Y, [A, K]\rangle = \operatorname{tr} Y^{*}(AK - KA) = -\operatorname{tr}(YAK - YKA) = \operatorname{tr}[A, Y]K.$$
This is possible if and only if $[A, Y] = 0$. •
The set of all matrices $Y$ that commute with $A$ is called the commutant or the centraliser of $A$, and is denoted by $Z(A)$. The lemma above says that in the space $K(n)$, $(T_A\mathcal{U}_A)^{\perp} = Z(A)$ for every $A$.
Theorem VI.4.3 Let $A \in K(n)$ and let $f(X) = \|A - X\|_2$. Let $B$ be any other element of $K(n)$. Then $B_0$ is an extreme point for the function $f$ on the unitary orbit $\mathcal{U}_B$ if and only if $B_0$ commutes with $A$.

Proof. A point $B_0$ is an extreme point if and only if the straight line joining $A$ and $B_0$ is perpendicular to $\mathcal{U}_B$ at $B_0$. By Lemma VI.4.2 this is so if and only if $A - B_0$ commutes with $B_0$, i.e., if and only if $A$ commutes with $B_0$. •
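The statement of inequality (VI.36) lends itself to a quick numerical check. The sketch below (an illustration, not part of the original text; it assumes NumPy, and the diagonal matrices are arbitrary test choices) conjugates a diagonal $B$ by random unitaries and confirms that the Frobenius distance to a fixed diagonal $A$ never leaves the interval spanned by the permutation orbit:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

a = np.array([0.0, 1.0, 3.0])          # eigenvalues of the diagonal matrix A
b = np.array([2.0, -1.0, 0.5])         # eigenvalues of the diagonal matrix B
A = np.diag(a)

def frob(X):
    return np.linalg.norm(X, "fro")

# Frobenius distances from A to the permutation orbit of B.
perm_dists = [frob(A - np.diag(b[list(p)]))
              for p in itertools.permutations(range(3))]
lo, hi = min(perm_dists), max(perm_dists)

# Distances from A to random points of the unitary orbit of B.
B = np.diag(b).astype(complex)
for _ in range(200):
    Z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    Q, _ = np.linalg.qr(Z)             # Q is unitary
    d = frob(A - Q @ B @ Q.conj().T)
    assert lo - 1e-9 <= d <= hi + 1e-9  # inequality (VI.36)

print(lo, hi)
```

For these particular eigenvalues the permutation-orbit extremes are 1.5 and 4.5, and every random unitary conjugate lands between them.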
For skew-Hermitian (or Hermitian) matrices $A, B$, this gives another proof of Theorem VI.4.1. However, in this case Theorem VI.4.3 says much more. From the first theorem we can conclude that if $A$ and $B$ are normal, then the global minimum and maximum of the (Frobenius) distance from $A$ to $\mathcal{U}_B$ are attained among matrices that commute with $A$. The second theorem says that when $A$ and $B$ are both Hermitian, this is true for all local extrema as well.

This last statement is not true when $A$ is Hermitian and $B$ is skew-Hermitian. For in this case,
$$\|A - UBU^{*}\|_2^2 = \|A\|_2^2 + \|UBU^{*}\|_2^2 = \|A\|_2^2 + \|B\|_2^2$$
for all $U$. Thus the entire orbit $\mathcal{U}_B$ is at a constant distance from $A$. Hence, every point on $\mathcal{U}_B$ is an extremal point. However, not every point on $\mathcal{U}_B$ need commute with $A$.
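This constant-distance phenomenon is easy to verify numerically. A minimal sketch (assuming NumPy; the random $4\times 4$ matrices are an arbitrary choice), exploiting that a Hermitian and a skew-Hermitian matrix are orthogonal in the trace inner product:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (X + X.conj().T) / 2               # Hermitian
Y = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = (Y - Y.conj().T) / 2               # skew-Hermitian

target = np.sqrt(np.linalg.norm(A, "fro")**2 + np.linalg.norm(B, "fro")**2)

for _ in range(50):
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(Z)             # random unitary
    d = np.linalg.norm(A - Q @ B @ Q.conj().T, "fro")
    assert abs(d - target) < 1e-9      # the whole orbit is equidistant from A
```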
VI.5 Geometry and Spectral Variation: the Operator Norm
The first theorem below says that if $A$ is a normal matrix and $B$ is any matrix close to $A$, then the optimal matching distance $d(\sigma(A), \sigma(B))$ is bounded by $\|A - B\|$. This is a local phenomenon; global versions of this are what we seek in the next paragraphs.
Theorem VI.5.1 Let $A$ be a normal matrix, and let $B$ be any matrix such that $\|A - B\|$ is smaller than half the distance between any two distinct eigenvalues of $A$. Then $d(\sigma(A), \sigma(B)) \le \|A - B\|$.
Proof. Let $\alpha_1, \ldots, \alpha_k$ be all the distinct eigenvalues of $A$. Let $\epsilon = \|A - B\|$. By Theorem VI.3.3, all the eigenvalues of $B$ lie in the union of the disks $D(\alpha_j, \epsilon)$. By the hypothesis, these disks are mutually disjoint. We claim that if the eigenvalue $\alpha_j$ has multiplicity $m_j$, then the disk $D(\alpha_j, \epsilon)$ contains exactly $m_j$ eigenvalues of $B$, counted with their respective multiplicities. Once this is established, the statement of the theorem follows easily.

Let $A(t) = (1-t)A + tB$, $0 \le t \le 1$. This is a continuous map from $[0,1]$ into the space of matrices, and we have $A(0) = A$, $A(1) = B$. Note that $\|A - A(t)\| = t\epsilon$, and so all the eigenvalues of $A(t)$ also lie in the disks $D(\alpha_j, \epsilon)$ for each $0 \le t \le 1$. By Corollary VI.1.6, as $t$ moves from 0 to 1, the eigenvalues of $A(t)$ trace continuous curves that join the eigenvalues of $A$ to those of $B$. None of these curves can jump from one of the disks $D(\alpha_j, \epsilon)$ to another. So, if we start off with $m_j$ such curves in the disk $D(\alpha_j, \epsilon)$, we must end up with exactly as many. •
Example VI.3.13 shows that if no condition is imposed on $B$, then the conclusion of the theorem above is no longer valid, even when $B$ is normal. However, this does suggest a new approach to the problem. Let $A, B$ be normal matrices, and let $\gamma(t)$ be a curve joining $A$ and $B$, such that each $\gamma(t)$ is a normal matrix. Then in a small neighbourhood of $\gamma(t)$ the spectral variation inequality of Theorem VI.5.1 holds. So, the (total) spectral variation between the endpoints of the curve must be bounded by the length of this curve. This idea is made precise below.

Let $N$ denote the set of normal matrices of a fixed size $n$. If $A$ is an element of $N$, then so is $tA$ for all real $t$. Thus the set $N$ is path-connected. However, $N$ is not an affine set. A continuous map $\gamma$ from any interval $[a, b]$ into $N$ will be called a normal path or a normal curve. If $\gamma(a) = A$ and $\gamma(b) = B$, we say that $\gamma$ is a path joining $A$ and $B$; $A$ and $B$ are then the endpoints of $\gamma$. The length of $\gamma$, with respect to the norm $\|\cdot\|$, is defined as
$$\ell_{\|\cdot\|}(\gamma) = \sup \sum_{k=0}^{m-1}\|\gamma(t_{k+1}) - \gamma(t_k)\|, \tag{VI.38}$$
where the supremum is taken over all partitions of $[a, b]$ as $a = t_0 < t_1 < \cdots < t_m = b$. If this length is finite, the path $\gamma$ is said to be rectifiable. If the function $\gamma$ is a piecewise $C^1$ function, then
$$\ell_{\|\cdot\|}(\gamma) = \int_a^b \|\gamma'(t)\|\,dt. \tag{VI.39}$$
Theorem VI.5.2 Let $A$ and $B$ be normal matrices, and let $\gamma$ be a rectifiable normal path joining them. Then
$$d(\sigma(A), \sigma(B)) \le \ell_{\|\cdot\|}(\gamma). \tag{VI.40}$$
Proof. For convenience, let us choose the parameter $t$ to vary in $[0, 1]$. For $0 \le r \le 1$, let $\gamma_r$ be that part of the curve which is parametrised by $[0, r]$. Let
$$G = \{r \in (0, 1] : d(\sigma(A), \sigma(\gamma(r))) \le \ell_{\|\cdot\|}(\gamma_r)\}.$$
The theorem will be proved if we show that the point 1 is in $G$. Since the function $\gamma$, the arclength, and the distance $d$ are all continuous in their arguments, the set $G$ is closed. So it contains the point $g = \sup G$. We have to show that $g = 1$. Suppose $g < 1$. Let $S = \gamma(g)$. Using Theorem VI.5.1, we can find a point $t$ in $(g, 1]$ such that, if $T = \gamma(t)$, then $d(\sigma(S), \sigma(T)) \le \|S - T\|$. But then
$$d(\sigma(A), \sigma(\gamma(t))) \le d(\sigma(A), \sigma(S)) + d(\sigma(S), \sigma(T)) \le \ell_{\|\cdot\|}(\gamma_g) + \|S - T\| \le \ell_{\|\cdot\|}(\gamma_t).$$
By the definition of $g$, this is not possible. •
An effective estimate of $d(\sigma(A), \sigma(B))$ can thus be obtained if one could find the length of the shortest normal path joining $A$ and $B$. This is a difficult problem, since the geometry of the set $N$ is poorly understood. However, the theorem above does have several interesting consequences.
Exercise VI.5.3 Let $A, B \in N$. Then the line segment joining $A$ and $B$ lies in $N$ if and only if $A - B$ is in $N$.

Theorem VI.5.4 If $A, B$ are normal matrices such that $A - B$ is also normal, then $d(\sigma(A), \sigma(B)) \le \|A - B\|$.

Proof. The path $\gamma$ consisting of the line segment joining $A, B$ is a normal path by Exercise VI.5.3. Its length is $\|A - B\|$. •
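For Hermitian matrices the theorem just proved reduces to Weyl's perturbation bound, which can be checked numerically. A sketch (assuming NumPy; for Hermitian matrices the operator-norm optimal matching pairs eigenvalues in sorted order):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
X = rng.normal(size=(n, n)); A = (X + X.T) / 2   # Hermitian
Y = rng.normal(size=(n, n)); B = (Y + Y.T) / 2   # Hermitian, so A - B is too

# Optimal matching distance: sorted eigenvalue lists, sup metric.
d = np.max(np.abs(np.sort(np.linalg.eigvalsh(A)) - np.sort(np.linalg.eigvalsh(B))))
assert d <= np.linalg.norm(A - B, 2) + 1e-12      # Weyl's perturbation theorem
```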
For Hermitian matrices $A, B$, the condition of the theorem above is satisfied. So this theorem includes Weyl's perturbation theorem as a special case.

A more substantial application of Theorem VI.5.2 is obtained as follows. It turns out that there exist normal matrices $A, B$ for which $A - B$ is not normal, but there exists a normal path that joins them and has length $\|A - B\|$. Note that this path cannot be the line segment joining $A$ and $B$; however, it has the same length as the line segment. What makes this possible is the fact that the metric under consideration is not Euclidean, and so geodesics need not always be straight lines. (Of course, by the definition of the arclength and the triangle inequality, no path joining $A, B$ could have length smaller than $\|A - B\|$.)

Let $S$ be any subset of $M(n)$. We will say that $S$ is metrically flat in the metric induced by the norm $\|\cdot\|$ if any two points $A, B$ of $S$ can be joined by a path that lies entirely within $S$ and has length $\|A - B\|$. To emphasize the dependence on the norm $\|\cdot\|$, we will also call such a set $\|\cdot\|$-flat. Of course, every affine set is metrically flat. A nontrivial example of a $\|\cdot\|$-flat set is given by the theorem below. Let $\mathcal{U}$ be the set of $n \times n$ unitary matrices and $\mathbb{C}\cdot\mathcal{U}$ the set of all constant multiples of unitary matrices.
Theorem VI.5.5 The set $\mathbb{C}\cdot\mathcal{U}$ is $\|\cdot\|$-flat.
Proof. First note that $\mathbb{C}\cdot\mathcal{U}$ consists of just nonnegative real multiples of unitary matrices. Let $A_0 = r_0U_0$ and $A_1 = r_1U_1$ be any two elements of this set, where $r_0, r_1 \ge 0$. Choose an orthonormal basis in which the unitary matrix $U_1U_0^{-1}$ is diagonal:
$$U_1U_0^{-1} = \operatorname{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n}), \qquad -\pi < \theta_j \le \pi.$$
Reduction to such a form can be achieved by a unitary conjugation. Such a process changes neither eigenvalues nor norms. So, we may assume that
all matrices are written with respect to the above orthonormal basis. Let
$$K = \operatorname{diag}(i\theta_1, \ldots, i\theta_n).$$
Then $K$ is a skew-Hermitian matrix whose eigenvalues are in the interval $(-i\pi, i\pi]$. We have
$$\|r_0I - r_1U_1U_0^{-1}\| = \max_j |r_0 - r_1\exp(i\theta_j)|;$$
we may assume that this maximum is attained at $j = 1$.
This last quantity is the length of the straight line joining the points $r_0$ and $r_1\exp(i\theta_1)$ in the complex plane. Parametrise this line segment as $r(t)\exp(it\theta_1)$, $0 \le t \le 1$. This can be done except when $|\theta_1| = \pi$, an exceptional case to which we will return later. The equation above can then be written as
$$\|A_0 - A_1\| = \int_0^1 \big|[r(t)\exp(it\theta_1)]'\big|\,dt = \int_0^1 |r'(t) + ir(t)\theta_1|\,dt.$$
Now let $A(t) = r(t)\exp(tK)U_0$, $0 \le t \le 1$. This is a smooth curve in $\mathbb{C}\cdot\mathcal{U}$ with endpoints $A_0$ and $A_1$. The length of this curve is
$$\int_0^1 \|A'(t)\|\,dt = \int_0^1 \|r'(t)\exp(tK)U_0 + r(t)K\exp(tK)U_0\|\,dt = \int_0^1 \|r'(t)I + r(t)K\|\,dt,$$
since $\exp(tK)U_0$ is a unitary matrix. But
$$\|r'(t)I + r(t)K\| = \max_j |r'(t) + ir(t)\theta_j| = |r'(t) + ir(t)\theta_1|.$$
Putting the last three equations together, we see that the path $A(t)$ joining $A_0$ and $A_1$ has length $\|A_0 - A_1\|$.

The exceptional case $|\theta_1| = \pi$ is much simpler. The piecewise linear path that joins $A_0$ to 0 and then to $A_1$ has length $r_0 + r_1$. This is equal to $|r_0 - r_1\exp(i\theta_1)|$ and hence to $\|A_0 - A_1\|$. •

Using Theorems VI.5.2 and VI.5.5, we obtain another proof of the result of Exercise VI.3.12.
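The construction in the proof can be traced numerically. The sketch below (assuming NumPy; the angles and radii are arbitrary test values, with the largest $|\theta_j|$ placed first so the maximum in the proof is attained at $j=1$) builds the path $A(t) = r(t)\exp(tK)U_0$ with $U_0 = I$ and checks that its discretised operator-norm length equals $\|A_0 - A_1\|$:

```python
import numpy as np

theta = np.array([1.5, 0.7, -0.3])     # eigen-angles of U1; max |theta| first
r0, r1 = 1.0, 2.0
A0 = r0 * np.eye(3, dtype=complex)
A1 = r1 * np.diag(np.exp(1j * theta))

op = lambda X: np.linalg.norm(X, 2)    # operator norm
chord = op(A0 - A1)                    # = max_j |r0 - r1 e^{i theta_j}|

# Straight segment from r0 to r1*e^{i theta_1} in C, written in polar form;
# its argument, divided by theta_1, recovers the path parameter t.
s = np.linspace(0.0, 1.0, 4001)
z = (1 - s) * r0 + s * r1 * np.exp(1j * theta[0])
t = np.angle(z) / theta[0]

length, prev = 0.0, A0
for k in range(1, len(s)):
    cur = np.abs(z[k]) * np.diag(np.exp(1j * t[k] * theta))
    length += op(cur - prev)
    prev = cur

assert abs(length - chord) < 1e-9      # the path is a geodesic: length = chord
```

The path stays inside $\mathbb{C}\cdot\mathcal{U}$ at every step (a positive multiple of a diagonal unitary), yet it is not the straight segment from $A_0$ to $A_1$.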
Exercise VI.5.6 Let $A, B$ be normal matrices whose eigenvalues lie on concentric circles $C(A)$ and $C(B)$, respectively. Show that $d(\sigma(A), \sigma(B)) \le \|A - B\|$.
Theorem VI.5.7 The set $N$ consisting of $n \times n$ normal matrices is $\|\cdot\|$-flat if and only if $n \le 2$.
Proof. Let $A, B$ be $2 \times 2$ normal matrices. If the eigenvalues of $A$ and those of $B$ lie on two parallel lines, we may assume that these lines are parallel to the real axis. Then the skew-Hermitian part of $A - B$ is a scalar, and hence $A - B$ is normal. The straight line joining $A$ and $B$ then lies in $N$. If the eigenvalues do not lie on parallel lines, then they lie on two concentric circles. If $a$ is the common centre of these circles, then $A$ and $B$ are in the set $a + \mathbb{C}\cdot\mathcal{U}$. This set is $\|\cdot\|$-flat. Thus, in either case, $A$ and $B$ can be joined by a normal path of length $\|A - B\|$. If $n \ge 3$, then $N$ cannot be $\|\cdot\|$-flat, because of Theorem VI.5.2 and Example VI.3.13. •
Example VI.5.8 Here is an example of a Hermitian matrix $A$ and a skew-Hermitian matrix $B$ that cannot be joined by a normal path of length $\|A - B\|$. Let
$$A = \begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 & 0\\ -1 & 0 & 1\\ 0 & -1 & 0 \end{pmatrix}.$$
Then $\|A - B\| = 2$. If there were a normal path of length 2 joining $A, B$, then the midpoint of this path would be a normal matrix $C$ such that $\|A - C\| = \|B - C\| = 1$. Since each entry of a matrix is dominated by its norm, this implies that $|c_{21} - 1| \le 1$ and $|c_{21} + 1| \le 1$. Hence $c_{21} = 0$. By the same argument, $c_{32} = 0$. So
$$A - C = \begin{pmatrix} * & * & *\\ 1 & * & *\\ * & 1 & * \end{pmatrix},$$
where $*$ represents an entry whose value is not yet known. But if $\|A - C\| = 1$, we must have
$$A - C = \begin{pmatrix} 0 & 0 & *\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{pmatrix}. \qquad \text{Hence } C = \begin{pmatrix} 0 & 1 & *\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix}.$$
But then $C$ could not have been normal. •
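The assertions of this example can be checked directly. A sketch (assuming NumPy, and using the matrices as read above, with the candidate midpoint $C$ taken with its free entry set to zero):

```python
import numpy as np

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=complex)    # Hermitian
B = np.array([[0, 1, 0], [-1, 0, 1], [0, -1, 0]], dtype=complex)  # skew-Hermitian
C = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=complex)    # forced midpoint

op = lambda X: np.linalg.norm(X, 2)
assert abs(op(A - B) - 2) < 1e-12                 # ||A - B|| = 2
assert abs(op(A - C) - 1) < 1e-12                 # C is equidistant ...
assert abs(op(B - C) - 1) < 1e-12                 # ... from A and B
# ... but C fails to be normal:
assert op(C @ C.conj().T - C.conj().T @ C) > 0.9
```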
VI.6 Geometry and Spectral Variation: wui Norms
In this section we consider the possibility of extending to all (weakly) unitarily invariant norms results obtained in the previous section for the operator norm. Given a wui norm $\tau$, the $\tau$-optimal matching distance between the eigenvalues of two (normal) matrices $A, B$ is defined as
$$d_{\tau}(\sigma(A), \sigma(B)) = \min_{P}\,\tau\big(\operatorname{Eig}A - P(\operatorname{Eig}B)P^{-1}\big), \tag{VI.41}$$
where, as before, $\operatorname{Eig}A$ is a diagonal matrix with the eigenvalues of $A$ down its diagonal (in any order) and where $P$ varies over all permutation matrices. We want to compare this with the distance $\tau(A - B)$. The main result in this section is an extension of the path inequality of Theorem VI.5.2 to all wui norms. From this several interesting conclusions can be drawn. Let us begin with an example illustrating that not all results for the operator norm have straightforward extensions.

Example VI.6.1 For $0 \le t \le \pi$, let
$$U(t) = \begin{pmatrix} 0 & e^{it}\\ 1 & 0 \end{pmatrix}.$$
Then, for every unitarily invariant norm,
$$|||U(t) - U(0)||| = |1 - e^{it}| = 2\sin\frac{t}{2}.$$
In the trace norm (the Schatten 1-norm), we have
$$d_1\big(\sigma(U(t)), \sigma(U(0))\big) = 2\,|1 - e^{it/2}| = 4\sin\frac{t}{4}.$$
So,
$$\frac{d_1\big(\sigma(U(t)), \sigma(U(0))\big)}{\|U(t) - U(0)\|_1} = \sec\frac{t}{4} > 1 \quad \text{for } t \ne 0.$$
Thus, we might have $d_1(\sigma(A), \sigma(B)) > \|A - B\|_1$ even for arbitrarily close normal matrices $A, B$. Compare this with Theorems VI.5.1 and VI.4.1.
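The numbers in this example can be reproduced with a short computation (pure standard-library Python; $t = 1$ is an arbitrary test value, and the matrix form of $U(t)$ is as reconstructed above):

```python
import cmath
import math

t = 1.0
# Eigenvalues of U(t) = [[0, e^{it}], [1, 0]] are the square roots of e^{it}.
evs_t = [cmath.exp(1j * t / 2), -cmath.exp(1j * t / 2)]
evs_0 = [1, -1]                                  # eigenvalues of U(0)

# U(t) - U(0) has rank one, with single singular value |e^{it} - 1|.
trace_norm = abs(cmath.exp(1j * t) - 1)

# Trace-norm optimal matching: the better of the two possible pairings.
d1 = min(abs(evs_t[0] - evs_0[0]) + abs(evs_t[1] - evs_0[1]),
         abs(evs_t[0] - evs_0[1]) + abs(evs_t[1] - evs_0[0]))

assert abs(trace_norm - 2 * math.sin(t / 2)) < 1e-12
assert abs(d1 - 4 * math.sin(t / 4)) < 1e-12
assert d1 / trace_norm > 1                       # ratio is sec(t/4) > 1
```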
The Q-norms are special in this respect, as we will see below. Let $\Phi$ be any finite subset of $\mathbb{C}$. A map $F : \mathbb{C} \to \Phi$ is called a retraction onto $\Phi$ if $|z - F(z)| = \operatorname{dist}(z, \Phi)$, i.e., $F$ maps every point in $\mathbb{C}$ to one of the points in $\Phi$ that is at the least distance from it. Such an $F$ is not unique if $\Phi$ has more than one element. Let $\Phi$ be a subset of $\mathbb{C}$ that has at most $n$ elements, and let $N(\Phi)$ be the set of all $n \times n$ normal matrices $A$ such that $\sigma(A) \subset \Phi$. If $F$ is a retraction onto $\Phi$, then for every normal matrix $B$ with eigenvalues $\beta_1, \ldots, \beta_n$ and for every $A$ in $N(\Phi)$ we have
$$\|B - F(B)\| = \max_j |\beta_j - F(\beta_j)| = \max_j \operatorname{dist}(\beta_j, \Phi) \le s(\sigma(B), \sigma(A)) \le \|B - A\| \tag{VI.42}$$
by Theorem VI.3.3. Note that the normality of $B$ was required at the first step and that of $A$ at the last. This inequality has a generalisation.

Theorem VI.6.2 Let $\Phi$ be a finite set of cardinality at most $n$. Let $F$ be a retraction onto $\Phi$. Then for every normal matrix $B$ and for every $A \in N(\Phi)$ we have
$$\|B - F(B)\|_Q \le \|B - A\|_Q \tag{VI.43}$$
for every Q-norm.
Proof. By Exercise IV.2.10, the inequality (VI.43) is equivalent to the weak majorisation
$$\big[s\big(B - F(B)\big)\big]^2 \prec_w \big[s(A - B)\big]^2.$$
If $\beta_1, \ldots, \beta_n$ are the eigenvalues of $B$, this is equivalent to saying that for all $1 \le k \le n$ we have
$$\sum_{j=1}^{k} |\beta_{i_j} - F(\beta_{i_j})|^2 \le \sum_{j=1}^{k} s_j^2(A - B)$$
for every choice of indices $i_1, \ldots, i_k$. By Ky Fan's maximum principle (Exercise II.1.13),
$$\sum_{j=1}^{k} s_j^2(A - B) = \max \sum_{j=1}^{k} \|(A - B)v_j\|^2,$$
where the maximum is taken over all orthonormal $k$-tuples $v_1, \ldots, v_k$. In particular, if $e_j$ are unit vectors such that $Be_j = \beta_j e_j$, then
$$\sum_{j=1}^{k} s_j^2(A - B) \ge \sum_{j=1}^{k} \|(A - \beta_{i_j})e_{i_j}\|^2.$$
But if $\beta$ is any complex number and $e$ any unit vector, then $\|(A - \beta)e\| \ge \operatorname{dist}(\beta, \sigma(A))$. (See the second proof of Theorem VI.3.3.) Hence, we have
$$\sum_{j=1}^{k} s_j^2(A - B) \ge \sum_{j=1}^{k} |\beta_{i_j} - F(\beta_{i_j})|^2,$$
and this completes the proof. •
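Theorem VI.6.2 can be tested numerically for the Frobenius norm, which is a Q-norm. A sketch (assuming NumPy; the spectra are random choices), building the retraction $F$ on the eigenvalues of a normal $B$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5

def random_unitary():
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

phi = rng.normal(size=n) + 1j * rng.normal(size=n)   # the finite set Phi
QA = random_unitary()
A = QA @ np.diag(phi) @ QA.conj().T                  # a member of N(Phi)

beta = rng.normal(size=n) + 1j * rng.normal(size=n)
QB = random_unitary()
B = QB @ np.diag(beta) @ QB.conj().T                 # an arbitrary normal B

# Retraction: send each eigenvalue of B to a nearest point of Phi.
F_beta = np.array([phi[np.argmin(np.abs(phi - b))] for b in beta])
FB = QB @ np.diag(F_beta) @ QB.conj().T

fro = lambda X: np.linalg.norm(X, "fro")
assert fro(B - FB) <= fro(B - A) + 1e-9              # inequality (VI.43)
```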
Exercise VI.6.3 Show that the assertion of Theorem VI.6.2 is not true for the Schatten $p$-norms, $1 \le p < 2$. (See the example in (VI.23).)
Corollary VI.6.4 Let $A$ be a normal matrix, and let $B$ be another normal matrix such that $\|A - B\|$ is smaller than half the distance between the distinct eigenvalues of $A$. Then
$$d_Q(\sigma(A), \sigma(B)) \le \|A - B\|_Q$$
for every Q-norm. (The quantity on the left is the Q-norm optimal matching distance.)

Proof. Let $\epsilon = \|A - B\|$. In the proof of Theorem VI.5.1 we saw that all the eigenvalues of $B$ lie within the region comprising the disks of radius $\epsilon$ around the eigenvalues of $A$. Further, each such disk contains as many eigenvalues of $A$ as of $B$ (multiplicities counted). The retraction $F$ of Theorem VI.6.2 then achieves a one-to-one pairing of the eigenvalues of $A$ and those of $B$. •
Replacing the operator norm by any other norm $\tau$ in (VI.38), we can define the $\tau$-length of a path $\gamma$ by the same formula. Denote this by $\ell_{\tau}(\gamma)$.
Exercise VI.6.5 Let $A$ and $B$ be normal matrices, and let $\gamma$ be a normal path joining them. Then for every Q-norm we have
$$d_Q(\sigma(A), \sigma(B)) \le \ell_Q(\gamma).$$
This includes Theorem VI.5.2 as a special case. We will now extend this inequality to its broadest context.
Proposition VI.6.6 Let $A$ be a normal matrix and let $\delta$ be half the minimum distance between the distinct eigenvalues of $A$. Then there exists a positive number $M$ (depending on $\delta$ and the dimension $n$) such that any normal matrix $B$ with $\|A - B\| < \delta$ has a representation $B = UB'U^{*}$, where $B'$ commutes with $A$ and $U$ is a unitary matrix with $\|I - U\| \le M\|A - B\|$.
Proof. Let $\alpha_j$, $1 \le j \le r$, be the distinct eigenvalues of $A$, and let $m_j$ be the multiplicity of $\alpha_j$. Choose an orthonormal basis in which $A = \oplus_j \alpha_j I_j$, where $I_j$, $1 \le j \le r$, are identity submatrices of dimensions $m_j$. By the argument used in the proof of Theorem VI.5.1, the eigenvalues of $B$ can be grouped into diagonal blocks $D_j$, where $D_j$ has dimension $m_j$ and every eigenvalue of $D_j$ is within distance $\delta$ of $\alpha_j$. This implies that
$$\|(\alpha_jI_k - D_k)^{-1}\| \le \frac{1}{\delta} \quad \text{if } j \ne k.$$
If $D = \oplus_j D_j$, then there exists a unitary matrix $W$ such that $B = WDW^{*}$. With respect to the above splitting, let $W$ have the block decomposition $W = [W_{jk}]$, $1 \le j, k \le r$. Then
$$\|A - B\| = \|A - WDW^{*}\| = \|AW - WD\| = \big\|[W_{jk}(\alpha_jI_k - D_k)]\big\|.$$
Hence, for $j \ne k$, $\|W_{jk}\| \le \frac{1}{\delta}\|A - B\|$. Hence, there exists a constant $K$ that depends only on $\delta$ and $n$ such that $\|W - X\| \le K\|A - B\|$, where $X = \oplus_j W_{jj}$ is the diagonal part in the block decomposition of $W$. Note that $\|X\| \le 1$ by the pinching inequality. Let $W_{jj} = V_jP_j$ be the polar decomposition of $W_{jj}$, with $V_j$ unitary and $P_j$ positive. Then
$$\|W_{jj} - V_j\| = \|P_j - I_j\| \le \|P_j^2 - I_j\| = \|W_{jj}^{*}W_{jj} - I_j\|,$$
since $P_j$ is a contraction. Let $V = \oplus_j V_j$. Then $V$ is unitary, and from the above inequality we see that
$$\|X - V\| \le \|X^{*}X - I\| = \|X^{*}X - W^{*}W\|.$$
Hence,
$$\|W - V\| \le \|W - X\| + \|X - V\| \le \|W - X\| + \|X^{*}X - W^{*}W\| \le \|W - X\| + \|(X^{*} - W^{*})X\| + \|W^{*}(X - W)\| \le 3\|W - X\| \le 3K\|A - B\|.$$
If we put $U = WV^{*}$ and $M = 3K$, we have $\|I - U\| \le M\|A - B\|$ and $B = WDW^{*} = UVDV^{*}U^{*} = UB'U^{*}$, where $B' = VDV^{*}$. Since $B'$ is block-diagonal with diagonal blocks of size $m_j$, it commutes with $A$. This completes the proof. •
Proposition VI.6.7 Given a normal matrix $A$, a wui norm $\tau$, and an $\epsilon > 0$, there exists a small neighbourhood of $A$ such that for any normal matrix $B$ in this neighbourhood we have
$$d_{\tau}(\sigma(A), \sigma(B)) \le (1 + \epsilon)\,\tau(A - B).$$

Proof. Choose $B$ so close to $A$ that the conditions of Proposition VI.6.6 are satisfied. Let $U, B', M$ be as in that proposition. Let $S = U - I$, so that $U = I + S$ and $U^{*} = I - S + S^2U^{*}$. Then
$$A - B = A - B' + [B', S] + UB'SU^{*} - B'S.$$
Hence,
$$\tau(A - B' + [B', S]) \le \tau(A - B) + \tau(UB'SU^{*} - B'S).$$
Since $A$ and $B'$ are commuting normal matrices, they can be diagonalised simultaneously by a unitary conjugation. In this basis the diagonal of $[B', S]$ will be zero. So, by the pinching inequality,
$$\tau(A - B') \le \tau(A - B' + [B', S]).$$
The two inequalities above give
$$d_{\tau}(\sigma(A), \sigma(B)) \le \tau(A - B) + \tau(UB'SU^{*} - B'S).$$
Now choose $k$ such that
$$\frac{1}{k} \le \frac{\tau(X)}{\|X\|} \le k \quad \text{for all } X \ne 0.$$
Then, using Proposition VI.6.6, we get
$$\tau(UB'SU^{*} - B'S) \le 2\,\tau(B'S) \le 2kM\|B\|\,\|A - B\| \le 2k^2M\|B\|\,\tau(A - B).$$
Now, if $B$ is so close to $A$ that we have $2k^2M\|B\| \le \epsilon$, then the inequality of the proposition is valid. •
Theorem VI.6.8 Let $A, B$ be normal matrices, and let $\gamma$ be any normal path joining them. Then there exists a permutation matrix $P$ such that for every wui norm $\tau$ we have
$$\tau\big(\operatorname{Eig}A - P(\operatorname{Eig}B)P^{-1}\big) \le \ell_{\tau}(\gamma). \tag{VI.44}$$
Proof. For convenience, let $\gamma(t)$ be parametrised on the interval $[0, 1]$. Let $\gamma(0) = A$, $\gamma(1) = B$. By Theorem VI.1.4, there exist continuous functions $\lambda_1(t), \ldots, \lambda_n(t)$ that represent the eigenvalues of the matrix $\gamma(t)$ for each $t$. Let $D(t)$ be the diagonal matrix with diagonal entries $\lambda_j(t)$. We will show that
$$\tau\big(D(0) - D(1)\big) \le \ell_{\tau}(\gamma). \tag{VI.45}$$
Let $\tau$ be any wui norm, and let $\epsilon$ be any positive number. Let $\gamma[s, t]$ denote the part of the path $\gamma(\cdot)$ that is defined on $[s, t]$. Let
$$G = \{t : \tau(D(0) - D(t)) \le (1 + \epsilon)\,\ell_{\tau}(\gamma[0, t])\}. \tag{VI.46}$$
Because of continuity, $G$ is a closed set, and hence it includes its supremum $g$. We will show that $g = 1$. If this is not the case, then we can choose $g' > g$ so close to $g$ that Proposition VI.6.7 guarantees
$$\tau\big(D(g) - PD(g')P^{-1}\big) \le (1 + \epsilon)\,\tau\big(\gamma(g) - \gamma(g')\big) \tag{VI.47}$$
for some permutation matrix $P$. Now note that
$$\tau\big(D(g) - P^{-1}D(g)P\big) \le \tau\big(D(g') - D(g)\big) + \tau\big(D(g) - PD(g')P^{-1}\big),$$
and hence, if $g'$ is sufficiently close to $g$, we will have $\tau(D(g) - P^{-1}D(g)P)$ small relative to the minimum distance between the distinct eigenvalues of $D(g)$. We thus have $D(g) = P^{-1}D(g)P$. Hence
$$\tau\big(D(g) - D(g')\big) = \tau\big(P^{-1}D(g)P - D(g')\big) = \tau\big(D(g) - PD(g')P^{-1}\big).$$
So, from (VI.47),
$$\tau\big(D(g) - D(g')\big) \le (1 + \epsilon)\,\tau\big(\gamma(g) - \gamma(g')\big).$$
From the definition of $g$ as the supremum of the set $G$ in (VI.46), we have
$$\tau\big(D(0) - D(g)\big) \le (1 + \epsilon)\,\ell_{\tau}(\gamma[0, g]).$$
Combining the two inequalities above, we get
$$\tau\big(D(0) - D(g')\big) \le (1 + \epsilon)\,\ell_{\tau}(\gamma[0, g']).$$
This contradicts the definition of $g$. So $g = 1$. •
The inequality (VI.45) tells us not only that for all normal $A, B$ and for all wui norms $\tau$ we have
$$d_{\tau}(\sigma(A), \sigma(B)) \le \ell_{\tau}(\gamma), \tag{VI.48}$$
but also that a matching of $\sigma(A)$ with $\sigma(B)$ can be chosen which makes this work simultaneously for all $\tau$. Further, this matching is the natural one obtained by following the curves $\lambda_j(t)$ that describe the eigenvalues of the family $\gamma(t)$. Several corollaries can be obtained now.
Let A , B be unitary matrices, and let K be any skew Hermitian matrix such that BA 1 e p K . Then, for every unitarily in variant norm I l l Il l , we have,
Theorem VI.6.9
==
x
·
d lll · l l l ( o ( A ) , o ( B) ) < Il l K il l 
Proof. Let 1 (t) == (exp tK)A, 0 < t < 1 . Then t , !(0) == A, ! ( 1 ) == B. So, by Theorem VL 6.8, d lll · lll ( cr ( A ) , cr ( B) )
<
(VL49)
1 (t)
j ll h/ (t) l l ! dt .
==
K ( exp t K) A So l ll ! ' ( t ) l ll == III K III 
Theorem VI.6. 10
invariant norm
unitary for all
I
0
But 1' (t)
is
.
•
Let A, B be unitary matrices. Then for every unitarily d lll · l l l ( cr ( A ) , cr ( B ) ) <
; lil A  B i ll 
(VI. 50)
Proof. In view of Theorem VI.6.9, we need to show that
$$\inf\{|||K||| : BA^{-1} = \exp K\} \le \frac{\pi}{2}\,|||A - B|||.$$
Choose a $K$ whose eigenvalues are contained in the interval $(-i\pi, i\pi]$. By applying a unitary conjugation, we may assume that $K = \operatorname{diag}(i\theta_1, \ldots, i\theta_n)$. Then
$$|||A - B||| = |||I - BA^{-1}||| = |||\operatorname{diag}(1 - e^{i\theta_1}, \ldots, 1 - e^{i\theta_n})|||.$$
But if $-\pi < \theta \le \pi$, then $|\theta| \le \frac{\pi}{2}|1 - e^{i\theta}|$. Hence, $|||K||| \le \frac{\pi}{2}\,|||A - B|||$ for every unitarily invariant norm. •
We now give an example to show that the factor $\pi/2$ in the inequality (VI.50) cannot be reduced if the inequality is to hold for all unitarily invariant norms and all dimensions. Recall that for the operator norm and for the Frobenius norm we have the stronger inequality with 1 instead of $\pi/2$ (Theorem VI.3.11 and Theorem VI.4.1).
e. '
Let A + and A _ be the unitary matrices obtained by in the bottom left corner to an upper Jordan matrix, 0
A± ==
0
0
±1
1 0
0 1
0 0
0 0
0 0
1 0
Then for the trace norrn we have I I A+  A_ ll 1 2 . The eigenvalues of A ± are the n roots of ± l . One can see that the J l · l h optimal matching distance between these two ntuples approaches as n oo . ==
�
n
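The quantities in this example are easy to compute for a moderate $n$ (a sketch assuming NumPy; $n = 6$ keeps the brute-force matching cheap):

```python
import itertools
import numpy as np

n = 6

def corner_jordan(sign):
    M = np.diag(np.ones(n - 1), 1)   # upper Jordan block
    M[-1, 0] = sign                  # bottom-left corner entry
    return M

Ap, Am = corner_jordan(1.0), corner_jordan(-1.0)

# The difference has a single nonzero entry of size 2: trace norm 2.
s = np.linalg.svd(Ap - Am, compute_uv=False)
assert abs(s.sum() - 2.0) < 1e-12

# Eigenvalues: n-th roots of +1 and of -1; brute-force the l1 matching.
ev_p = np.exp(2j * np.pi * np.arange(n) / n)
ev_m = np.exp(1j * np.pi * (2 * np.arange(n) + 1) / n)
d1 = min(sum(abs(ev_p[i] - ev_m[p[i]]) for i in range(n))
         for p in itertools.permutations(range(n)))

# The optimal matching rotates each root to its neighbour: 2n sin(pi/2n) -> pi.
assert abs(d1 - 2 * n * np.sin(np.pi / (2 * n))) < 1e-9
assert d1 > 2.0                      # already exceeds ||A+ - A-||_1
```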
The next theorem is a generalisation of, and can be proved using the same idea as, Theorem VI.5.4.
Theorem VI.6.12 If $A, B$ are normal matrices such that $A - B$ is also normal, then for every wui norm $\tau$,
$$d_{\tau}(\sigma(A), \sigma(B)) \le \tau(A - B). \tag{VI.51}$$
This inequality, or rather just its special case when $\tau$ is restricted to unitarily invariant norms and $A, B$ are Hermitian, can be used to get yet another proof of Lidskii's Theorem. We have seen this argument earlier in Chapter IV. The stronger result we now have at our disposal gives a stronger version of Lidskii's Theorem. This is shown below.

Let $x, y$ be elements of $\mathbb{C}^n$. We will say that $x$ is majorised by $y$, in symbols $x \prec y$, if $x$ is a convex combination of vectors obtained from $y$ by permuting its coordinates, i.e., $x = \sum a_{\sigma}y_{\sigma}$, a finite sum in which each $y_{\sigma}$ is
a vector whose coordinates are obtained by applying the permutation $\sigma$ to the coordinates of $y$, and the $a_{\sigma}$ are positive numbers with $\sum a_{\sigma} = 1$. When $x, y$ are real vectors, this is already familiar. We will say $x$ is softly majorised by $y$, in symbols $x \prec_s y$, if we can write $x$ as a finite sum $x = \sum z_{\sigma}y_{\sigma}$ in which the $z_{\sigma}$ are complex numbers such that $\sum|z_{\sigma}| \le 1$.

Exercise VI.6.13 Let $x, y$ be two vectors in $\mathbb{C}^n$ such that $\sum_{j=1}^{n}x_j$ and $\sum_{j=1}^{n}y_j$ are not zero. If $x \prec_s y$ and $\sum_{j=1}^{n}x_j = \sum_{j=1}^{n}y_j$, then $x \prec y$.

Proposition VI.6.14 Let $A, B$ be $n \times n$ normal matrices, and let $\lambda(A), \lambda(B)$ be two $n$-vectors whose coordinates are the eigenvalues of $A, B$, respectively. Then $\tau(A) \le \tau(B)$ for all wui norms $\tau$ if and only if $\lambda(A) \prec_s \lambda(B)$.
Proof. Suppose T ( A ) < T(B) for all wui norms T. Then, using The orem IV.4 . 7, we can write the diagonal matrix Eig(A) as a finite sum Eig(A) == L:zk UkEig(B) Uk: , in which Uk are unitary matrices and E l zk l < 1 . This shows that A. ( A ) == :Ezk Sk (A.(B) ) , where each Sk is an orthostochas tic matrix. (An orthostochastic matrix S is a doubly stochastic matrix such that S ij == l u ij 1 2 , where Uij are the entries of a unitary matrix . ) By Birkhoff ' s Theorem each Sk is a convex combination of permutation ma trices . Hence, A.(A) <s A.(B) . The converse follows by the same argument without recourse to Birkhoff's Theorem. •
Let A , B be normal matrices such that A  B is also normal. Then the eigenvalues of A and B can be arranged in such a way that if >.. ( A) and >.. ( B) are the nvectors with these eigenvalues as their coordinates, then
Theorem Vl. 6 . 1 5
>.. (A)  A.(B) < .A(A  B) .
( VI. 52 )
Proof. Use Theorem VL6.8 and the observation in Theorem VL 6. 12 to conclude that we can arrange the eigenvalues in such a way that T ( E ig A  Eig B) < T (A  B)
for every wui norm T. By Proposition VL6. 14, this is equivalent to saying A.(A)  A. ( B) < s A.(A  B) ,
where .A ( A) is the vector whose entries are the diagonal entries of the diag onal matrix Eig A. By a small perturbation, if necessary, we may assume that trA # tr B. Since the components of the vectors A. ( A)  A. ( B) and >.. (A  B) must have the same (nonzero) sum, we have in fact majorisation • rather than just the soft majorisation proved above.
V I . 7 Some Inequalities for the D eterminant
181
We::cc an call this the Lidskii Theorem for normal matrices . It in clu des the classical Lidskii Theorem as a special case. Exercise Vl. 6 . 1 6 Let A, B be normal matrices such that A B is also normal. Let T be any wui norm. Show that there exists a permutation matrix 
P such that
VI . 7
T ( A  B ) < T (Eig A  P(Eig B ) P 1 ) .
(VI. 53)
S ome Inequalities for t he Determinant
The determinant of the sum A + B of two matrices has no simple relation with the determinants of A and B. Some interesting inequalities can be derived using ideas introduced in this chapter. These are proved below.
Let A and B be Hermitian matrices with eigenvalues and fJ1 , . . . , f3n , respectively. Then
Theorem VI. 7.1 a 1 , . . . , an
n
min a IT (ai + f3a (i) ) < det (A + B ) < max
a
i=l
where O" varies over all permutations. Proof.
n
IT (a i + f3a(i) ) ,
i= l
(VI. 54)
I f A and B commute, they can be diagonalised simultaneously, n
and hence det (A + B)
II (a i + f3a ( i ) )
==
for some
O" .
So, the inequality
i =l (VL5 4) is trivial in this case. Next note that the two extreme sides of (VI. 54) are invariant under the transformation B U BU* for every unitary U. Hence, it suffices to prove that for a fixed Hermitian matrix A the function f ( H) == det ( A + H ) on the unitary orbit Us of another Hermitian matrix B attains its minimum and maximum at points that commute with A. Let B0 be any extreme point of f on Us . Then, we must have �
d dt
t =O
for every skewHermitian
det (A + e t K B0 e  t K )
K.
==
0'
Now,
det ( A + etK B0 e  t K ) = det(A + Bo + t [ K, Bo] ) + O ( t 2 ) . Note that , if X , Y are any two matrices and X is invertible , then det(X + tY)
== det X ( l + t tr Yx  1 ) + O (t2 ) .
So, if A + B0 is invertible, the condition (VI. 55) reduces to tr [K, Bo] (A + B0 )  1
==
0.
(VI. 55)
182
VI .
Spectral Variation
of
Normal Mat rices
This is equivalent to saying
tr K ( Bo (A + Bo )  1  ( A + Bo )  1 B0 )
==
0.
If this is to be true for all skewHermitian K , we must have Thus B0 commutes with ( A + B0 )  1 , hence with A + B0 , and hence with A. This proves the theorem under the assumption that A + B0 is invertible. The general case follows from this by a limiting argument. • Exercise VI. 7.2
0, then n
Let A and B be Hermitian matrices. If .x; ( A ) + .X � ( B ) > n
(VI. 56)
j=l j =l This is true, in particular, when A and B are positive matrices. Theorem VI. 7.3 Let A, B be Hermitian matrices with eigenvalues O'.j and {3j , respectively, ordered so that
Let T == A + i B . Then I det T l
<
n II lnj + i f3n  j + l l ·
j= 1
(VI. 57)
Proof. The function f (t) == � log t is concave on the positive halfline. Hence, using the majorisation (VI. 7) and Corollary II.3.4, we have n
L j= l Hen ce,
+ if3n  j + I I > L j= l
n
n
j= l
j= 1
Proposition VI. 7.4
tian. Then
I det T l
log laj
n
==
log sj .
•
Let T A + iB, where A is positive and B Hermi ==
n det A II [ 1 + Sj (A  1 1 2 B A  1 12 ) 2 ] 1 /2 . j =l
( VI. 58)
VI . 7 Some Inequalities for the Determinant
P roof.
Since T == A 1 1 2 ( 1 �. i A  l /2 BA  11 2 ) A 1 12 , we have det T det A · det (J + iA  1 1 2 BA  1 1 2 ) . ==
183
(VI. 5 9)
Note that
I det (J + i A  l /2 BA  1 1 2 ) ! 2 det [ (/ + i A  1 1 2 BA  1 1 2 ) ( 1  i A  1 12 B A  1 1 2 )] det [/ + (A  1 /2 BA  1 / 2 ) 2 ] n IJ [ l + Sj (A  1 /2 BA 1 / 2 ) 2 ] . j=1
So, (VL58) follows from (VI. 59) and (VI. 60) .
(VI.60) •
If the matrix A in the Cartesian decomposition T then I det Tj > det A . Theorem VI. 7. 6 Let T A + i B, where A and B are positive matrices with eigenvalues a 1 > · · > a n and fJ1 > · · · > f3n , respectively. Then,
Corollary VI. 7. 5 A + iB is positive,
==
==
·
n l det T j > IJ i aJ + if3J I · j= l
(VI.61)
Proof. We may assume, without loss of generality, that both $A, B$ are positive definite. Because of the relations (VI.59) and (VI.60), the theorem will be proved if we show
$$\prod_{j=1}^{n}\left[1 + s_j(A^{-1/2}BA^{-1/2})^2\right] \ \ge\ \prod_{j=1}^{n}\left(1 + \alpha_j^{-2}\beta_j^2\right).$$
Note that $s_j(A^{-1/2}BA^{-1/2}) = s_j(A^{-1/2}B^{1/2})^2$. From (III.20) we have
$$\left\{\log s_{n-j+1}(A^{-1/2}) + \log s_j(B^{1/2})\right\}_j \ \prec\ \left\{\log s_j(A^{-1/2}B^{1/2})\right\}_j.$$
This is the same as saying
$$\left\{\log\left(\alpha_j^{-1}\beta_j\right)^{1/2}\right\}_j \ \prec\ \left\{\log s_j(A^{-1/2}B^{1/2})\right\}_j.$$
Since the function $\log(1 + e^{4t})$ is convex in $t$, using Corollary II.3.4 we obtain from the last majorisation
$$\sum_{j=1}^{n} \log\left(1 + \alpha_j^{-2}\beta_j^2\right) \ \le\ \sum_{j=1}^{n} \log\left(1 + s_j(A^{-1/2}B^{1/2})^4\right) = \sum_{j=1}^{n} \log\left(1 + s_j(A^{-1/2}BA^{-1/2})^2\right).$$
This gives the desired inequality. ∎
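The two determinant bounds (VI.57) and (VI.61) sandwich $|\det T|$ when $A$ and $B$ are both positive: the eigenvalues paired in opposite order give the upper bound, and paired in the same order the lower bound. A numerical illustration (random positive matrices as arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

def random_positive(n, rng):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return X @ X.conj().T + 0.1 * np.eye(n)

A = random_positive(n, rng)
B = random_positive(n, rng)
T = A + 1j * B

alpha = np.sort(np.linalg.eigvalsh(A))[::-1]   # alpha_1 >= ... >= alpha_n
beta = np.sort(np.linalg.eigvalsh(B))[::-1]    # beta_1 >= ... >= beta_n

det_T = abs(np.linalg.det(T))
upper = np.prod(np.abs(alpha + 1j * beta[::-1]))   # (VI.57): opposite pairing
lower = np.prod(np.abs(alpha + 1j * beta))         # (VI.61): same pairing
assert lower <= det_T <= upper
```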
VI. Spectral Variation of Normal Matrices
Exercise VI.7.7 Show that if only one of the two matrices $A$ and $B$ is positive, then the inequality (VI.61) is not necessarily true.

A very natural generalisation of Theorem VI.7.1 would be the following statement. If $A$ and $B$ are normal, then $\det(A + B)$ lies in the convex hull of the products $\prod_{i=1}^{n}(\alpha_i + \beta_{\sigma(i)})$. This is called the Marcus–de Oliveira Conjecture and is a well-known open problem in matrix theory.
VI.8 Problems
Problem VI.8.1. Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix. Show that

for every Q-norm.

Problem VI.8.2. Let $T = A + iB$, where $A$ and $B$ are Hermitian. Show that, for $2 \le p \le \infty$,
$$\max_{\sigma}\left\|\operatorname{Eig} A + \operatorname{Eig}_{\sigma}(iB)\right\|_p \ \le\ 2^{1/2 - 1/p}\,\|T\|_p,$$
and that, for $1 \le p \le 2$,
Problem VI.8.3. A different proof of Theorem VI.3.11 is outlined below. Fill in the details. Let $n \ge 3$ and let $A, B$ be $n \times n$ unitary matrices. Assume that the eigenvalues $\alpha_i$ and $\beta_i$ are distinct and the distances $|\alpha_i - \beta_j|$ are also distinct. If $\gamma_1, \gamma_2$ are two points on the unit circle, we write $\gamma_1 < \gamma_2$ if the minor arc from $\gamma_1$ to $\gamma_2$ goes counterclockwise. We write $(\alpha\beta\gamma)$ if the points $\alpha, \beta, \gamma$ on the unit circle are in counterclockwise cyclic order. Number the indices modulo $n$; e.g., $\alpha_{n+1} = \alpha_1$. Label the eigenvalues of $A$ so as to have the order $(\alpha_1 \alpha_2 \cdots \alpha_n)$. Let $\delta = d(\sigma(A), \sigma(B))$. Assume that $\delta < 2$; otherwise, there is nothing to prove. Label the eigenvalues of $B$ as $\beta_1, \ldots, \beta_n$ in such a way that for any subset $J$ of $\{1, 2, \ldots, n\}$ and for any permutation $\sigma$
$$\max_{i \in J}|\alpha_i - \beta_i| \ \le\ \max_{i \in J}|\alpha_i - \beta_{\sigma(i)}|.$$
Then $\delta = \max_i |\alpha_i - \beta_i|$. Assume, without loss of generality, that this maximum is attained at $i = 1$ and that $\alpha_1 < \beta_1$. Check the following.

(i) If $\beta_i < \alpha_i$, then neither $(\alpha_1\beta_i\beta_1)$ nor $(\alpha_1\alpha_i\beta_1)$ is possible.

(ii) There exists $j$ such that $|\alpha_{j+1} - \beta_j| \ge \delta$. Choose and fix one such $j$.

(iii) We have $(\alpha_1\beta_1\beta_j\alpha_{j+1})$.

(iv) For $1 \le i \le j$ we have $(\beta_1\beta_i\beta_j)$.

Let $K_A$ be the arc from $\alpha_{j+1}$ positively to $\alpha_1$ and $K_B$ the arc from $\beta_1$ positively to $\beta_j$. Then there are $n - j + 1$ of the $\alpha_i$ in $K_A$ and $j$ of the $\beta_i$ in $K_B$. Use Proposition VI.3.5 now.

Problem VI.8.4. Let $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ be any complex numbers. Show that there is a number $\gamma$ such that
$$\max_i |\alpha_i - \gamma| + \max_j |\beta_j - \gamma| \ \le\ \sqrt{2}\,\max_{i,j}|\alpha_i - \beta_j|.$$
(The proof might be long but is not too difficult.) Use this to get another proof of Theorem VI.3.14.
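The assertion of Problem VI.8.4 can be probed numerically. The sketch below (with hypothetical random data) uses a crude grid search to exhibit a witness $\gamma$; any grid point with small enough cost already certifies the existence claim.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(5) + 1j * rng.standard_normal(5)   # the alpha_i
b = rng.standard_normal(5) + 1j * rng.standard_normal(5)   # the beta_j

def cost(g):
    return np.abs(a - g).max() + np.abs(b - g).max()

# Crude grid search over a box containing all the points.
pts = np.concatenate([a, b])
xs = np.linspace(pts.real.min() - 1, pts.real.max() + 1, 201)
ys = np.linspace(pts.imag.min() - 1, pts.imag.max() + 1, 201)
best = min(cost(x + 1j * y) for x in xs for y in ys)

bound = np.sqrt(2) * np.abs(a[:, None] - b[None, :]).max()
assert best <= bound   # a grid point witnesses the claimed gamma
```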
Problem VI.8.5. Let $A$ be a Hermitian and $B$ a normal matrix. If the eigenvalues $\alpha_j$ of $A$ are enumerated as $\alpha_1 \ge \cdots \ge \alpha_n$, and if the eigenvalues $\beta_j$ of $B$ are enumerated so that $\operatorname{Re}\beta_1 \ge \cdots \ge \operatorname{Re}\beta_n$, then
$$\max_{1 \le j \le n} |\alpha_j - \beta_j| \ \le\ \sqrt{2}\,\|A - B\|.$$
Problem VI.8.6. Let $A$ be a normal matrix with eigenvalues $\alpha_1, \ldots, \alpha_n$. Let $B$ be any other matrix and let $\varepsilon = \|A - B\|$. By Theorem VI.3.3, all the eigenvalues of $B$ are contained in the set $D = \bigcup_j D(\alpha_j, \varepsilon)$. Use the argument in the proof of Theorem VI.5.1 to show that each connected component of $D$ contains as many eigenvalues of $B$ as of $A$. Use this and the Matching Theorem (Theorem II.2.1) to show that
$$d(\sigma(A), \sigma(B)) \ \le\ (2n - 1)\,\|A - B\|.$$
[If $A$ and $B$ are both normal, this argument together with the result of Problem II.5.10 shows that $d(\sigma(A), \sigma(B)) \le n\|A - B\|$. However, in this case, the Hoffman–Wielandt inequality gives a stronger result: $d(\sigma(A), \sigma(B)) \le \sqrt{n}\,\|A - B\|$. We will see in the next chapter that, in fact, $d(\sigma(A), \sigma(B)) \le 3\|A - B\|$ in this case.]

Problem VI.8.7. Let $A$ be a Hermitian matrix with eigenvalues $\alpha_1 \ge \cdots \ge \alpha_n$, and let $B$ be any matrix with eigenvalues $\beta_j$ arranged so that $\operatorname{Re}\beta_1 \ge \cdots \ge \operatorname{Re}\beta_n$. Let $\operatorname{Re}\beta_j = \mu_j$ and $\operatorname{Im}\beta_j = \nu_j$. Choose an orthonormal basis in which $B$ is upper triangular and
$$B = M + iN + iR,$$
where $M = \operatorname{diag}(\mu_1, \ldots, \mu_n)$, $N = \operatorname{diag}(\nu_1, \ldots, \nu_n)$, and $R$ is strictly upper triangular. Show that
$$\|\operatorname{Im}(A - B)\|_2^2 = \|N\|_2^2 + \tfrac{1}{2}\|R\|_2^2.$$
Hence,

Show that

Combine the inequalities above to obtain
Compare this with the result of Problem VI.8.5; note that there $B$ was assumed to be normal.

Problem VI.8.8. It follows from the result of the above problem that if $A$ is Hermitian and $B$ an arbitrary matrix, then $d(\sigma(A), \sigma(B)) \le c_n\|A - B\|$, where the factor $c_n$ grows like $\sqrt{n}$. This factor can be replaced by another that grows only like $\log n$. For this one needs the following fact, which we state without proof. (See the discussion in the Notes at the end of the chapter.) Let $Z$ be an $n \times n$ matrix whose eigenvalues are all real. Then
$$\|Z - Z^*\| \ \le\ \gamma_n \|Z + Z^*\|,$$
where
$$\gamma_n = \frac{2}{n}\sum_{j=1}^{[n/2]} \cot\frac{(2j-1)\pi}{2n}.$$
The constant $\gamma_n$ is the smallest one for which the above norm inequality is true. Approximating the sum by integrals, it is easy to see that $\gamma_n/\log n$ approaches $2/\pi$ as $n \to \infty$.
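The constant $\gamma_n$ and its logarithmic growth are easy to compute; a small sketch (the function name `gamma` is ours):

```python
import numpy as np

def gamma(n):
    # gamma_n = (2/n) * sum_{j=1}^{[n/2]} cot((2j - 1) * pi / (2n))
    j = np.arange(1, n // 2 + 1)
    return (2.0 / n) * np.sum(1.0 / np.tan((2 * j - 1) * np.pi / (2 * n)))

for n in (10, 100, 1000, 10000):
    print(n, gamma(n), gamma(n) / np.log(n))
```

The printed ratio $\gamma_n/\log n$ creeps towards $2/\pi \approx 0.637$ quite slowly, since $\gamma_n \approx (2/\pi)\log n$ plus a bounded term.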
Using the notations of Problem 7, show that
$$\max_j |\nu_j| \ \le\ \|\operatorname{Im} B\| = \|\operatorname{Im}(A - B)\|,$$
$$\max_j |\alpha_j - \mu_j| \ \le\ \|A - M\| \ \le\ \|\operatorname{Re}(A - B)\| + \tfrac{1}{2}\|R - R^*\|.$$
Let $Z = N + R$. Then $Z$ has only real eigenvalues, $Z - Z^* = R - R^*$, and $Z + Z^* = 2\operatorname{Im}(A - B)$. Hence,
$$\max_j |\alpha_j - \beta_j| \ \le\ \|\operatorname{Re}(A - B)\| + (\gamma_n + 1)\|\operatorname{Im}(A - B)\|.$$
This shows that
$$d(\sigma(A), \sigma(B)) \ \le\ (\gamma_n + 2)\,\|A - B\|.$$
Problem VI.8.9. Let $A$ be the Hermitian matrix with entries
$$a_{ij} = \frac{1}{|i - j|} \quad\text{if } i \ne j, \qquad a_{ii} = 0 \quad\text{for all } i.$$
Let $B = A + C$, where $C$ is the skew-Hermitian matrix with entries
$$c_{ij} = \frac{1}{i - j} \quad\text{if } i \ne j, \qquad c_{ii} = 0 \quad\text{for all } i.$$
Then $B$ is strictly lower triangular, hence all its eigenvalues are $0$. Show that $\|A - B\| \le \pi$ for all $n$, and $\|A\| = O(\log n)$. (This needs some work.) Since $A$ is Hermitian, this means that its spectral radius is $O(\log n)$. Thus, in this example, $d(\sigma(A), \sigma(B)) = O(\log n)$ and $\|A - B\| \le \pi$. So, the bound obtained in Problem 8 is not too loose.

Problem VI.8.10. For any matrix $A$, let $A_D$ denote its diagonal part and $A_L$, $A_U$ its parts below and above the diagonal. Thus $A = A_L + A_D + A_U$. Show that if $A$ is an $n \times n$ normal matrix, then

The example in VI.6.11 shows that this inequality is sharp.

Problem VI.8.11. Let $A$ be a normal and $B$ an arbitrary $n \times n$ matrix. Choose an orthonormal basis in which $B$ is upper triangular. In this basis write $A = A_L + A_D + A_U$, $B = B_D + B_U$. By the Hoffman–Wielandt Theorem

Note that
$$A - B + B_U = (A - B)_L + (A - B)_D + A_U.$$
Use the result of Problem VI.8.10 to show that

Hence, we have, for $A$ normal and $B$ arbitrary,
188
VI.
S pectral Variation of Normal Matrices
From this, we get for the Schatten $p$-norms

Show that for $1 \le p \le 2$ these inequalities are sharp. (See Example VI.6.11.) For $p = \infty$, this gives
$$d(\sigma(A), \sigma(B)) \ \le\ n\,\|A - B\|,$$
which is an improvement on the result of Problem 6 above. If $A$ is Hermitian, then $\|A_U\|_2 = \|A_L\|_2$. Using this, one obtains a slightly different proof of the last inequality in Problem VI.8.7.

Problem VI.8.12. Let $A$ be an $n \times n$ Hermitian matrix partitioned as
$$A = \begin{pmatrix} M & R^* \\ R & H \end{pmatrix},$$
where $M$ is a $k \times k$ matrix. Let the eigenvalues of $A$ be $\lambda_1 \ge \cdots \ge \lambda_n$, those of $M$ be $\mu_1 \ge \cdots \ge \mu_k$, and let the singular values of $R$ be $\rho_1 \ge \rho_2 \ge \cdots$. Show that there exist indices $1 \le i_1 < \cdots < i_k \le n$ such that for every symmetric gauge function we have

In other words, for every unitarily invariant norm we have

In particular, we have

and

Use an argument similar to the one in the proof of Theorem VI.4.1 to show that the factor $\sqrt{2}$ in the last inequality can be replaced by $1$. This raises the question whether we have

for all unitarily invariant norms. This is not so, as can be seen from the example
$$A = \begin{pmatrix} 0 & 1 & \sqrt{3} \\ 1 & 0 & 0 \\ \sqrt{3} & 0 & 0 \end{pmatrix}.$$
Problem VI.8.13. Let $S$ be a closed subset of $\mathbb{C}$, and let $F$ be a retraction onto $S$. Let $N(S)$ be the set of all normal matrices whose spectrum is contained in $S$. Show that if $S$ is a convex set, then for every unitarily invariant norm
$$|||B - F(B)||| \ \le\ |||B - A|||,$$
whenever $B$ is a normal matrix and $A \in N(S)$. If $S$ is any closed set, then the above inequality is true for all Q-norms.

Problem VI.8.14. The aim of this problem, and the next, is to outline an alternative approach to the normal path inequality (VI.44). This uses slightly more sophisticated notions of differential geometry. Let $A$ be any $n \times n$ matrix, and let $\mathcal{O}_A$ be the orbit of $A$ under the action of the group $GL(n)$; i.e.,
$$\mathcal{O}_A = \{gAg^{-1} : g \in GL(n)\}.$$
This is called the similarity orbit of $A$; it is the set of all matrices similar to $A$. Every differentiable curve in $\mathcal{O}_A$ passing through $A$ can be parametrised locally as $e^{tX}Ae^{-tX}$, $X \in M(n)$. By the same argument as in Section VI.4, the tangent space to $\mathcal{O}_A$ at the point $A$ can be characterised as
$$T_A\mathcal{O}_A = \{[A, X] : X \in M(n)\}.$$
The orthogonal complement of this space in $M(n)$ can be calculated as in Lemma VI.4.2. Show that it is $Z(A^*)$, the set of all matrices commuting with $A^*$.

Now, a matrix $A$ is normal if and only if $Z(A^*) = Z(A)$. So, for a normal matrix we have a direct sum decomposition
$$M(n) = T_A\mathcal{O}_A \oplus Z(A).$$
Now, if $B \in \mathcal{O}_A$, then $B$ and $A$ have the same set of eigenvalues, and hence

If $B \in Z(A)$, then there is an orthonormal basis in which $A$ and $B$ are both upper triangular. Hence, for such a $B$,

Now, let $\gamma(t)$, $0 \le t \le 1$, be a $C^1$ curve in the space of normal matrices. Let $\gamma(0) = A_0$, $\gamma(1) = A_1$. Let $\varphi(A) = d_2(\sigma(A_0), \sigma(A))$. At each point $\gamma(t)$ consider the decomposition
$$M(n) = T_{\gamma(t)}\mathcal{O}_{\gamma(t)} \oplus Z(\gamma(t))$$
obtained above. Then, as we move along $\gamma(t)$, the rate of change of the function $\varphi$ is zero in the first direction in this decomposition, and in the second, it is bounded by the rate of change of the argument. Hence we should have
$$\varphi(\gamma(1)) \ \le\ \int_0^1 \|\gamma'(t)\|_2\, dt.$$
Prove this. Note that this says that
$$d_2(\sigma(A_0), \sigma(A_1)) \ \le\ \int_0^1 \|\gamma'(t)\|_2\, dt$$
if $\gamma$ is a $C^1$ curve passing through normal matrices and joining $A_0$ to $A_1$.

Problem VI.8.15. The two crucial properties of the Frobenius norm used above were its invariance under the conjugations $A \mapsto UAU^*$ and the pinching inequality. The first made it possible to change to any orthonormal basis, and the second was used to conclude that the diagonal of a matrix has norm smaller than the whole matrix. Both these properties are enjoyed by all wui norms. So, the method outlined above can be adapted to work for all wui norms to give the same result. (Some conditions on the path are necessary to ensure differentiability of the functions involved.)

Problem VI.8.16. Fill in the details in the following outline of a proof of the statement: every complex matrix with trace $0$ is a commutator of two matrices. Let $A$ be a matrix such that $\operatorname{tr} A = 0$. Assume that $A$ is upper triangular. Let $B$ be the nilpotent upper Jordan matrix (i.e., $B$ has all entries $0$ except the ones on the first superdiagonal, which are all $1$). Then $Z(B^*)$ contains only polynomials in $B^*$. (This is a general fact: $Z(X)$ contains only polynomials in $X$ if and only if in the Jordan form of $X$ there is just one block for each different eigenvalue.) Thus $Z(B^*)$ consists of lower triangular matrices with constant diagonals. Show that $A$ is orthogonal to all such matrices. Hence $A$ is in the space $T_B\mathcal{O}_B$, and so $A = [B, C]$ for some $C$.
VI.9 Notes and References
Perturbation theory for eigenvalues is of interest to mathematicians, physicists, engineers, and numerical analysts. Among the several books that deal with this topic are the venerable classics, T. Kato, Perturbation Theory for Linear Operators, Springer-Verlag, 1966, and J.H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, 1965. The first is addressed to the problems of quantum physics, the second to those of numerical analysis. Matrix Computations by G.H. Golub and C.F. Van Loan, The Johns Hopkins University Press, 1983, has enough of interest for the theorist and for the designer of algorithms. Much closer to the spirit of our book (but a lot more appealing to the numerical analyst) is Matrix Perturbation Theory by G.W. Stewart and J.-G. Sun, Academic Press, 1990. Much of the material in this chapter has appeared before in R. Bhatia, Perturbation Bounds for Matrix Eigenvalues, Longman, 1987.

In Section VI.1, we have done the bare minimum of qualitative analysis required for the later sections. The questions of differentiability and analyticity of eigenvalues and eigenvectors, asymptotic expansions and their convergence, and other related matters are discussed at great length by T. Kato, and by H. Baumgärtel, Analytic Perturbation Theory for Matrices and Operators, Birkhäuser, 1984. See also M. Reed and B. Simon, Analysis of Operators, Academic Press, 1978. The proof of Theorem VI.1.2 given here is adopted from H. Whitney, Complex Analytic Varieties, Addison-Wesley, 1972. Other proofs may be found in R. Bhatia and K.K. Mukherjea, The space of unordered tuples of complex numbers, Linear Algebra Appl., 52/53 (1983) 765–768. The proof of Theorem VI.1.4 is taken from T. Kato.

Theorem VI.4.1 was proved in A.J. Hoffman and H.W. Wielandt, The variation of the spectrum of a normal matrix, Duke Math. J., 20 (1953) 37–39. This was believed to be the next best thing to having an analogue of Weyl's Perturbation Theorem for normal matrices. The conjecture, that for normal A, B the optimal matching distance in each unitarily invariant norm is bounded by the corresponding distance between A and B, seems to have been raised first in L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quarterly J. Math., 11 (1960) 50–59. The problem, of course, turned out to be much more subtle than imagined at that time.

Theorem VI.2.2 is due to V.S. Sunder, Distance between normal operators, Proc. Amer. Math. Soc., 84 (1982) 483–484. The rest of Section 2 is based on T. Ando and R. Bhatia, Eigenvalue inequalities associated with the Cartesian decomposition, Linear and Multilinear Algebra, 22 (1987) 133–147.
Theorem VI.3.3 is one of the several results proved in the paper Norms and exclusion theorems, Numer. Math., 2 (1960) 137–141, by F.L. Bauer and C.T. Fike. Theorem VI.3.11 was first proved by R. Bhatia and C. Davis, A bound for the spectral variation of a unitary operator, Linear and Multilinear Algebra, 15 (1984) 71–76. Their proof is summarised in Problem VI.8.3. This approach of ordering eigenvalues in the cyclic order is elaborated further by L. Elsner and C. He, Perturbation and interlace theorems for the unitary eigenvalue problem, Linear Algebra Appl., 188/189 (1993) 207–229, where many related results are proved. The proof given in Section 3 is adapted from R.H. Herman and A. Ocneanu, Spectral analysis for automorphisms of UHF C*-algebras, J. Funct. Anal., 66 (1986) 1–10. The first example of 3 × 3 normal matrices A, B for which d(σ(A), σ(B)) > ‖A − B‖ was constructed by J.A.R. Holbrook, Spectral variation of normal matrices, Linear Algebra Appl., 174 (1992) 131–144. This was done using a directed computer search. The bare-hands example VI.3.13 was shown to us by G.M. Krause. The inequality (VI.33) appears in T. Ando, Bounds for the antidistance, J. Convex Analysis, 2 (1996) 1–3. It had been observed earlier by R. Bhatia that this would lead to Theorem VI.3.14. A different proof of this theorem was found independently by M. Omladič and P. Šemrl, On the distance between normal matrices, Proc. Amer. Math. Soc., 110 (1990) 591–596. The idea of the simpler proof in Problem VI.8.4 is due to L. Elsner.

The geometric ideas of Sections VI.5 and VI.6 were introduced in R. Bhatia, Analysis of spectral variation and some inequalities, Trans. Amer. Math. Soc., 272 (1982) 323–332. They were developed further in three papers by R. Bhatia and J.A.R. Holbrook: Short normal paths and spectral variation, Proc. Amer. Math. Soc., 94 (1985) 377–382; Unitary invariance and spectral variation, Linear Algebra Appl., 95 (1987) 43–68; and A softer, stronger Lidskii theorem, Proc. Indian Acad. Sci. (Math. Sci.), 99 (1989) 75–83. Much of Sections 5 and 6 is based on these papers. Example VI.5.8 is due to M.D. Choi. The special operator norm case of the inequality (VI.49) was proved by K.R. Parthasarathy, Eigenvalues of matrix-valued analytic maps, J. Austral. Math. Soc. (Ser. A), 26 (1978) 179–197. Theorem VI.6.10 and Example VI.6.11 are taken from R. Bhatia, C. Davis and A. McIntosh, Perturbation of spectral subspaces and solution of linear operator equations, Linear Algebra Appl., 52/53 (1983) 45–67. The inequality (VI.53) for (strongly) unitarily invariant norms was first proved by V.S. Sunder, On permutations, convex hulls and normal operators, Linear Algebra Appl., 48 (1982) 403–411.

P.R. Halmos, Spectral approximants of normal operators, Proc. Edinburgh Math. Soc., 19 (1974) 51–58, initiated the study of operator approximation problems of the following kind. Given a normal operator A, find the operator closest to A from the class of normal operators that have their spectrum in a given closed set. The result of Theorem VI.6.2 for infinite-dimensional Hilbert spaces (and for closed sets) was proved, for the Schatten p-norms with p ≥ 2, by R. Bouldin, Best approximation of a normal operator in the Schatten p-norm, Proc. Amer. Math. Soc., 80 (1980) 277–282. The result for Q-norms, as well as the special one for all unitarily invariant norms given in Problem VI.8.11, was proved by R. Bhatia, Some inequalities for norm ideals, Commun. Math. Phys., 111 (1987) 33–39.

Theorem VI.7.1 is due to M. Fiedler, Bounds for the determinant of the sum of Hermitian matrices, Proc. Amer. Math. Soc., 30 (1971) 27–31. The inequality (VI.56) is also proved in this paper. Theorem VI.7.3 is due to J.F. Queiró and A.L. Duarte, On the Cartesian decomposition of a matrix, Linear and Multilinear Algebra, 18 (1985) 77–85, while Theorem VI.7.6 is due to N. Bebiano. The proofs given here are taken from the paper by T. Ando and R. Bhatia cited above. There are several papers related to the Marcus–de Oliveira conjecture. For a recent survey, see N. Bebiano, New developments on the Marcus–Oliveira Conjecture, Linear Algebra Appl., 197/198 (1994) 793–802.
The results in Problem VI.8.7 are due to W. Kahan, Spectra of nearly Hermitian matrices, Proc. Amer. Math. Soc., 48 (1975) 11–17, as are the ones in Problem VI.8.8 (essentially) and Problem VI.8.9. In an earlier paper, Every n × n matrix Z with real spectrum satisfies ‖Z − Z*‖ ≤ ‖Z + Z*‖(log₂ n + 0.038), Proc. Amer. Math. Soc., 39 (1973) 235–241, Kahan proved the inequality in the title of the paper and used it to obtain the perturbation bound in the later paper. The exact value of γₙ in this inequality was obtained by A. Pokrzywa, Spectra of operators with fixed imaginary parts, Proc. Amer. Math. Soc., 81 (1981) 359–364. The appearance of a constant growing like log n in this inequality is related to another important problem. Let T be the linear operator on M(n) that takes every matrix to its upper triangular part. Then sup ‖T(X)‖/‖X‖ is known to be O(log n). From this one can see (on reducing Z to an upper triangular form) that the constant γₙ also must have this order. Related results may be found in R. Mathias, The Hadamard operator norm of a circulant and applications, SIAM J. Matrix Anal. Appl., 14 (1993) 1152–1167. The results in Problems VI.8.10 and VI.8.11 are taken from J.-G. Sun, On the variation of the spectrum of a normal matrix, Linear Algebra Appl., 246 (1996) 215–223.

Bounds like the one given in Problem VI.8.12 are called residual bounds in numerical analysis. See W. Kahan, Numerical linear algebra, Canadian Math. Bull., 9 (1966) 757–801, where the special improvement for the Frobenius norm is obtained. The general result for all unitarily invariant norms is proved in the book by Stewart and Sun. The example at the end of this problem was given in R. Bhatia, On residual bounds for eigenvalues, Indian J. Pure Appl. Math., 23 (1992) 865–866. The result in Problem VI.8.16 is a well-known theorem; the proof we have outlined here was shown to us by V.S. Sunder.
VII Perturbation of Spectral Subspaces of Normal Matrices
In Chapter 6 we saw that the eigenvalues of a (normal) matrix change continuously with the matrix. The behaviour of eigenvectors is more complicated. The following simple example is instructive. Let
$$A = \begin{pmatrix} 1 + \varepsilon & 0 \\ 0 & 1 - \varepsilon \end{pmatrix} \oplus H, \qquad
B = \begin{pmatrix} 1 & \varepsilon \\ \varepsilon & 1 \end{pmatrix} \oplus H,$$
where $H$ is Hermitian. The eigenvalues of the first $2 \times 2$ block of $A$ are $1 + \varepsilon$, $1 - \varepsilon$. The same is true for $B$. The corresponding normalised eigenvectors are $(1, 0)$ and $(0, 1)$ for $A$, and $\frac{1}{\sqrt{2}}(1, 1)$ and $\frac{1}{\sqrt{2}}(1, -1)$ for $B$. As $\varepsilon \to 0$, $B$ and $A$ approach each other, but their eigenvectors remain stubbornly apart. Note, however, that the eigenspaces that these two eigenvectors of $A$ and $B$ span are identical. In this chapter we will see that interesting and useful perturbation bounds may be obtained for eigenspaces corresponding to closely bunched eigenvalues of normal matrices. Before we do this, it is necessary to introduce notions of distance between two subspaces. Also, it turns out that this perturbation problem is closely related to the solution of the matrix equation $AX - XB = Y$. This equation, called the Sylvester Equation, arises in several other contexts. So, we will study it in some detail before applying the results to the perturbation problem at hand.
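The phenomenon in this example is easy to reproduce numerically. The sketch below pads the $2 \times 2$ blocks with the arbitrary choice $H = (5)$; it is an illustration, not part of the text.

```python
import numpy as np

eps = 1e-6
# The 2x2 blocks of the example, padded with an arbitrary choice H = (5).
A = np.diag([1 + eps, 1 - eps, 5.0])
B = np.array([[1.0, eps, 0.0],
              [eps, 1.0, 0.0],
              [0.0, 0.0, 5.0]])

wa, Va = np.linalg.eigh(A)   # eigenvalues in ascending order
wb, Vb = np.linalg.eigh(B)

# The eigenvectors for the eigenvalue 1 - eps stay far apart as eps -> 0:
v_a, v_b = Va[:, 0], Vb[:, 0]
print(abs(v_a @ v_b))        # about 0.707, not close to 1

# But the eigenspace of the cluster {1 - eps, 1 + eps} is the same:
Pa = Va[:, :2] @ Va[:, :2].T
Pb = Vb[:, :2] @ Vb[:, :2].T
assert np.allclose(Pa, Pb)
```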
VII.1 Pairs of Subspaces
We will be dealing with block decompositions of matrices . To keep track of dimensions, we will find it convenient to write £
k
A == for a blockmatrix in which k and f are the number of columns, and m and p are the number of rows in the blocks indicated. Theorem VII. l . l {The QR Decomposition) Let m > n . Then there is an m x m unitary matrix Q
( R0 )
A be an such that
m x n
matrix,
n
Q * A ==
n m  n
(VII. l )
'
where R is upper triangular with nonnegative real diagonal entries. Proof. For a square matrix A , this was proved in Chapter 1. The same proof also works here. (In essence this is just the GramSchmidt pro • cess. ) The matrix R above is called the R factor of A.
Let A be an matrix with rank A Then the decomposition of A has positive diagonal elements and is uniquely determined. (See Exercise I. 2. 2.) If we write
Exercise VII. 1 . 2 R factor in the QR
m x n
n
==
n.
m  n
then we have Thus Q 1 is uniquely determined by A. However, Q 2 need not be unique. Note the range of A is the range of Q 1 , and its orthogonal complement is the range of Q2 . Exercise VII. 1 . 3 Let A be an m x n matrix with < Then there exists an unitary matrix W such that m
n.
n x n
L 0
m
n  m
AW == ( ) m, where L is lower triangular and has nonnegative real diagonal entries.
1 96
VII. Perturbation of Spectral
S ubspaces
of Normal
Matrices
We remark here that it is only for convenience that we choose R (and L ) to have nonnegative diagonal entries. By modifying Q (and W) , we can also make R (and L) have non positive real diagonal entries.
Let A be an matrix with rank A == r. Then there permutation matrix P and an m m 'L£nitary matrix Q such
Exercise VII. 1 . 4
exists an that
n x n
m x n
x
Q*AP =
(�
1
where R1 1 is an r r upper triangular matrix with positive diagonal entries. This is called a rank revealing QR decomposition . Exercise VII. 1 . 5 Let A be an m matrix with rank A == r. Then there exists an m m unitary matrix Q and an unitary matrix W such that Q*AW � x
x n
x
n x n
=
( �),
where T is an r r triangular matrix with positive diagonal entries. Theorem VII. 1 . 6 {The CS Decomposition} Let W be an unitary matrix partitioned as x
n x n
m
f
W ==
)!
(VIL2)
where .e < m. Then there exist unitary matrices U == diag(U1 1 , U22 ) and v == diag(Vl l ' v2 2 ) ' where ull ' vl l are .e f matrices, such that X
f
U*WV == where
C 0 < c1 <
C s 0
f
S c 0
ml
0
0 I
.e
f ml
(VII.3)
and S are nonnegative diag n al matrice5, with diagonal < ce < 1 and 1 > s 1 > · · · > se > 0, respectively, and o
·
·
entries
·
Proof. For the sake of brevity, let us call a map X U * XV on the space of n x n matrices a Utransform, if U, V are block diagonal unitary matrices with top left blocks of size f x f. The product of two Utransforrns is again a Utransform. We will prove the theorem by showing that one can change the matrix W in (VIL2) to the form (VIL3) by a succession of Utransforms. 7
VI I . l Pairs of S ubspaces
Let ul l ' vl l be .e
X
197
f unitary matrices such that k
pk 0 I
where C1 is a diagonal matrix with diagonal entries 0 < c 1 < c2 < · · · < Ck < 1 . This is j ust the singular value decomposition: since W is unitary, all singular values of W1 1 are bounded by 1 . Then the Utransform in which U == diag (U1 1 , I) and V == diag ( V1 1 , I) reduces W to the form k Pk m
k c1 0 ?
Pk 0 I
m
? .
?.
? .
? .
(VIL4)
where the structures of the three blocks whose entries are indicated by ? are yet to be determined. Let W2 1 denote now the bottom left corner of the matrix (VIL4) . By the QR Decomposition Theorem we can find an m x m unitary matrix Q 22 such that p m 
p '
(VIL 5 )
where R is upper triangular with nonnegative diagonal entries. The U transform in which U == diag (I, Q22 ) and V == diag (I, I) leaves the top left corner of (VII . 4) unchanged and changes the bottom left corner to the form (VII. 5) . Assume that this transform has been carried out . Using the fact that the columns of a unitary matrix are orthonormal, one sees that the last f  k columns of R must be zero . Now examine the remaining columns, proceeding from left to right. Since C 1 is diagonal with nonnega tive diagonal entries, all of which are strictly smaller than 1 , one sees that R is also diagonal. Thus the matrix W is now reduced to the form fk 0 I
m
k Pk
k c1 0
k Pk mf
s1 0 0
0 0 0
? .
?. ? . ?
(VIL6)
?
in which 51 is diagonal with Cr + Sr == I , and hence 0 < S1 < I. The structures of the two blocks on the right are yet to be determined. Now , by
1 98
VII.
Pert urbat ion o f Spect ral S ubspaces o f Normal Matrices
Exercise VI I . 1 . 3 and the remark followi n g it , we can find e1n m x m unitary matrix V22 \v h i ch on right multiplication converts the top right block of (VII. 6 ) to the form f
f (L
mf 0 ),
(VII. 7)
in which L is lower triangular with nonpositive diago n al entries. The U transform in which U == diag ( I, I) and V == diag (I, V22 ) leaves the two left blocks in (VIL6) unchanged and converts the top right block to the form (VII . 7) . Again, orthonormality of the rows of a unitary matrix and a repetition of the argument in the preceding step show that after this Utransform the matrix W is reduced to the form
k e k k
Ek me
k c1 0 s1 0 0
fk 0
I
0 0 0
I I
I I I
k  81 0
fk 0 0
mf 0 0
X33 x43 X53
X34 X44 X54
X3 s X4s Xss
(VIL8)
Now, we determine the form of the bottom right corner. Since the rows of a unitary matrix are mutually orthogonal, we must have cl s l == slx33 · But C1 and 51 are diagonal and 8 is invertible. Hence, we must have 1 X33 == C1 . But then the blocks X34 , X3 5 , X43 , X53 must all be 0, since the matrix (VIL8) is unitary. So, this matrix has the form
Let X
=
k Ek
k c1 0
fk 0
I
k  81 0
fk 0 0
mf 0 0
k fk mf
s1 0 0
0 0 0
c1 0 0
0 x44 Xs 4
0 X45 Xs 5
(VI L9)
( i:: ;:: ) . Then X is a un tary matrix of size m  k. Let i
U == diag ( Jp_ , Ik , X ) , where lp_ and Ik are the identity operators of sizes f and k , respectively. Then, multiplying ( VII.9) on the left by U*  another
VI I. l Pairs of Subspaces
1 99
Utransform  we reduce i t to the form k
fk 0 I
k fk
s1
0
0 0
mf
0
0
k fk
fk 0 0
mf 0 0
c1 0
0 I
0 0
0
0
I
( VII. lO )
)
If we now put C ==
and S ==
k 1
(�
fk 0 I
k 1
fk 0 0
(�
)
k p_  k
)
k p_  k ,
•
then the matrix (VII. lO ) is in the desired form (VII.3) .
Let W be as in ( VII. 2) but with f > m . Show that there exist unitary matrices U = diag ( U11 , U22) and V diag ( V1 1 , V22 ) , where U1 1 , V1 1 are f f matrices, such that
Exercise VII. I . 7
==
X
nf
U* WV =
c 0
s
2£  n 0 I
0
nf S 0
c
where
nf 2£  n nf
C and S are nonnegative diagonal matrices with 0 < c1 < < Cne < 1 and 1 > s 1 > > s n  i > 0, C2 + S2 1. ·
·
·
·
·
·
(VIL l i )
diagonal entries respectively, and
==
The form of the matrices C and S in the above decompositions suggests an obvious interpretation in terms of angles. There exist (acute) angles oj , ; > e l > ()2 > . . . > 0, such that Cj == cos Oj and Sj = sin Oj . One of the major applications of the CS decomposition is the facility it provides for analysing the relative position of two subspaces of
en .
Let X 1 , Y1 be n f matrices with orthonormal columns. Then there exist f_ X f unitary matrices ul and vl and an n X n unitary matrix Q with the following properties.
Theorem VII. 1 . 8
x
200
VI I .
(i) If 2P <
Pert urbat ion of Spectral S ubspaces of Normal Tv1 atrices
n,
then p
Q X 1 U1 ==
I 0 0
f
P n  2P
f
c S 0
Q Y1 V1 =
(VIL 12)
p
P n  2P
(VII . 13)
where C, S are diagonal matrices with diagonal entries2 0 < c 1 < 1 and 1 > sl > . . . > S f_ > 0 , respectively, and C 2 + S == I {ii) If 2P > n, then
·
·
·
< ce <
.
nP I 0 0
2P  n 0
nP
c 0
s
I
(VIL 14)
0
nP 2£  n n£
2P  n 0 I 0
n£ 2£  n
(VIL 1 5)
nf
C, S are diagonal matrices with diagonal entries 0 < c 1 < 2 2 C n i < 1 and 1 > s 1 > · · · > S n i > 0 , respectively, and C + 5 = I .
where
·
·
·
<
Proof. Let 2P < n. Choose n x (n  f) matrices X2 and Y2 such that X == (X1 X2 ) and Y == ( Y1 Y2 ) are unitary. Let
By Theorem VII. l . 6 we can find block diagonal unitary matrices U == diag(Ul , U2 ) and v == diag(Vl , V2 ) , in which ul and vl are p X £ unitaries, such that C s 0
S c 0
0 0 I
Let Q == (XU) * == (X1 U1 X2 U2 ) * . Then from the first columns of the two sides of the above equation we obtain the equation ( VIL 13) . For this Q the equation (VII. 1 2) is also true.
VI I . l Pairs of S ubs paces
201
When 2£ > n , the assertion of the theorem follows, in the same manner, • fro m the decompos ition (VII. l l ) . This theorem can be interpreted as follows. Let £ and F be £dimensional subspaces of e n . Choose orthonormal bases XI ' . . . ' X.e and Yl ' . . . ' Y.e for these spaces. Let x l == (x l X 2 . . . Xt) , yl == ( Y l Y2 . . . Y.e ) . Premultiply ing X 1 , Y1 by Q corresponds to a unitary transformation of the whole space e n , while postmultiplying X 1 by U1 and Y1 by V1 corresponds to a change of bases within the spaces £ and F, respectively. Thus , the theorem says that, if 2£ < n , then there exists a unitary transformation Q of e n such that the columns of the matrices on the righthand sides in (VII. 12) and (VII. 13) form orthonormal bases for Q£ and Q:F, respectively. The span of those columns in the second matrix, for which s1 == 1 , is the orthogonal complement of Q£ in Q:F. When 2£ > n , the columns of the matrices on the righthand sides of (VII. 14) and (VII. 15) form orthonormal bases for Q£ and QF, respectively. The last 2£  n columns are orthonormal vectors in the intersection of these two spaces. The space spanned by those columns of the second matrix, in which s1 1 , is the orthogonal complement of Q£ in Q:F. The reader might find it helpful to see what the above theorem says when £ and F are lines or planes in JR3 . Using the notation above, we set ==
8(£, F) == arcsin S. This is called the angle operator between the subspaces £ and F. It is a diagonal matrix, and its diagonal entries are called the canonical angles between the subspaces £ and F . If the columns of a matrix X are orthonormal and span the subspace £, then the orthogonal projection onto £ is given by the matrix E == X X * . This fact is used repeatedly below.
Exercise VII.1.9  Let 𝓔 and 𝓕 be subspaces of ℂⁿ. Let X and Y be matrices with orthonormal columns that span 𝓔 and 𝓕, respectively. Let E, F be the orthogonal projections with ranges 𝓔, 𝓕. Then the nonzero singular values of EF are the same as the nonzero singular values of X*Y.

Exercise VII.1.10  Let 𝓔, 𝓕 be subspaces of ℂⁿ of the same dimension, and let E, F be the orthogonal projections with ranges 𝓔, 𝓕. Then the singular values of EF are the cosines of the canonical angles between 𝓔 and 𝓕, and the nonzero singular values of E^⊥F are the sines of the nonzero canonical angles between 𝓔 and 𝓕.

Exercise VII.1.11  Let 𝓔, 𝓕 and E, F be as above. Then the nonzero singular values of E − F are the nonzero singular values of E^⊥F, each counted twice; i.e., these are the numbers s₁, s₁, s₂, s₂, ....
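The content of Exercises VII.1.9–VII.1.11 is easy to check numerically: the singular values of X*Y (equivalently, the nonzero singular values of EF) are the cosines of the canonical angles, and the nonzero singular values of E^⊥F are their sines. A small sketch in Python with NumPy, for an illustrative pair of 2-dimensional subspaces of ℝ⁴ (the choice of bases here is ours, not the text's):

```python
import numpy as np

# Orthonormal bases for two 2-dimensional subspaces of R^4 (illustrative choice).
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]]) / np.sqrt(2)   # spans the subspace E
Y = np.array([[1., 0.], [0., 0.], [0., 1.], [0., 0.]])        # spans the subspace F

E = X @ X.T                  # orthogonal projection onto span(X)
F = Y @ Y.T                  # orthogonal projection onto span(Y)
P = np.eye(4) - E            # the projection E-perp

cosines = np.linalg.svd(X.T @ Y, compute_uv=False)        # cosines of canonical angles
angles = np.arccos(np.clip(cosines, 0.0, 1.0))

s_EF = np.linalg.svd(E @ F, compute_uv=False)             # should match the cosines
s_PF = np.linalg.svd(P @ F, compute_uv=False)             # should match the sines
```

Here both canonical angles come out equal to π/4, and the nonzero singular values of EF and E^⊥F are the corresponding cosines and sines.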
VII.  Perturbation of Spectral Subspaces of Normal Matrices
Note that, by Exercise VII.1.10, the angle operator Θ(𝓔, 𝓕) does not depend on the choice of any particular bases in 𝓔 and 𝓕. Further, Θ(𝓔, 𝓕) = Θ(𝓕, 𝓔). It is natural to define the distance between the spaces 𝓔 and 𝓕 as ‖E − F‖. In view of Exercise VII.1.11, this is also the number ‖E^⊥F‖. More generally, we might consider |||E^⊥F|||, for every unitarily invariant norm, to define a distance between the spaces 𝓔 and 𝓕. In this case,

    |||E − F||| = |||E^⊥F ⊕ EF^⊥|||.
We could use the numbers |||E^⊥F||| to measure the separation of 𝓔 and 𝓕, even when they have different dimensions. Even the principal angles can be defined in this case:
Exercise VII.1.12  Let x, y be any two vectors in ℂⁿ. The angle between x and y is defined to be the number ∠(x, y) in [0, π/2] such that

    ∠(x, y) = cos⁻¹ ( |y*x| / (‖x‖ ‖y‖) ).

Let 𝓔 and 𝓕 be subspaces of ℂⁿ, and let dim 𝓔 = ℓ ≥ dim 𝓕 = m. Define angles θ₁, ..., θ_m recursively as

    θ_k = max_{x ∈ 𝓔, x ⊥ [x₁,...,x_{k−1}]}   min_{y ∈ 𝓕, y ⊥ [y₁,...,y_{k−1}]}   ∠(x, y),

where x_k, y_k denote vectors at which this extremum is attained. Then π/2 ≥ θ₁ ≥ ··· ≥ θ_m ≥ 0. The numbers θ_k are called the principal angles between 𝓔 and 𝓕. Show that when dim 𝓔 = dim 𝓕, this coincides with the earlier definition of principal angles.
Exercise VII.1.13  Show that for any two orthogonal projections E, F we have ‖E − F‖ ≤ 1.
Proposition VII.1.14  Let E, F be two orthogonal projections such that ‖E − F‖ < 1. Then the ranges of E and F have the same dimension.

Proof.  Let 𝓔, 𝓕 be the ranges of E and F. Suppose dim 𝓔 > dim 𝓕. We will show that 𝓔 ∩ 𝓕^⊥ contains a nonzero vector x; since Ex = x and Fx = 0 for such a vector, this will show that ‖E − F‖ = 1. Let 𝓖 = E𝓕. Then 𝓖 ⊂ 𝓔, and dim 𝓖 ≤ dim 𝓕 < dim 𝓔. Hence 𝓔 ∩ 𝓖^⊥ contains a nonzero vector x. It is easy to see that 𝓔 ∩ 𝓖^⊥ ⊂ 𝓕^⊥. Hence x ∈ 𝓕^⊥.  ∎

In most situations in perturbation theory we will be interested in comparing two projections E, F such that ‖E − F‖ is small. The above proposition shows that in this case dim 𝓔 = dim 𝓕.
Example VII.1.15  Let

    X₁ = (1/√2) [ 1  0          Y₁ = [ 1  0
                  1  0                 0  0
                  0  1                 0  1
                  0  1 ] ,             0  0 ] .

The columns of X₁ and of Y₁ are orthonormal vectors. If we choose the unitary matrices

    U₁ = V₁ = (1/√2) [ 1   1
                       1  −1 ]

and

    Q = (1/2) [ 1   1   1   1
                1   1  −1  −1
                1  −1   1  −1
                1  −1  −1   1 ] ,

then we see that

    Q X₁ U₁ = [ 1  0          Q Y₁ V₁ = (1/√2) [ 1  0
                0  1                             0  1
                0  0                             1  0
                0  0 ] ,                         0  1 ] .

Thus in the space ℝ⁴ (or ℂ⁴), the canonical angles between the 2-dimensional subspaces spanned by the columns of X₁ and Y₁, respectively, are π/4, π/4.

VII.2  The Equation AX − XB = Y
We study in some detail the Sylvester equation

    AX − XB = Y.   (VII.16)

Here A is an operator on a Hilbert space 𝓗, B is an operator on a Hilbert space 𝓚, and X, Y are operators from 𝓚 into 𝓗. Most of the time we will be interested in the situation where 𝓚 = 𝓗 = ℂⁿ, and we will state and prove our results for this special case. The extension to the more general situation is straightforward. We are given A and B, and we ask the following questions about the above equation. When is there a unique solution X for every Y? What is the form of the solution? Can we estimate ‖X‖ in terms of ‖Y‖?
Theorem VII.2.1  Let A, B be operators with spectra σ(A) and σ(B), respectively. If σ(A) and σ(B) are disjoint, then the equation (VII.16) has a unique solution X for every Y.
Proof.  Let 𝒯 be the linear operator on the space of operators defined by 𝒯(X) = AX − XB. The conclusion of the theorem can be rephrased as: 𝒯 is invertible if σ(A) and σ(B) are disjoint. Let 𝒜(X) = AX and ℬ(X) = XB. Then 𝒯 = 𝒜 − ℬ, and 𝒜 and ℬ commute (regardless of whether A and B do). Hence σ(𝒯) ⊂ σ(𝒜) − σ(ℬ). If x is an eigenvector of A with eigenvalue α, then the matrix X, one of whose columns is x and the rest of whose columns are zero, is an eigenvector of 𝒜 with eigenvalue α. Thus the eigenvalues of 𝒜 are just the eigenvalues of A, each counted n times as often. So σ(𝒜) = σ(A). In the same way, σ(ℬ) = σ(B). Hence σ(𝒯) ⊂ σ(A) − σ(B). So, if σ(A) and σ(B) are disjoint, then 0 ∉ σ(𝒯). Thus 𝒯 is invertible.  ∎
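For matrices, the theorem can be exercised numerically with SciPy, whose `solve_sylvester` routine solves AX + XB = Q; to solve AX − XB = Y one passes −B. A sketch with illustrative matrices whose spectra are disjoint:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)

# sigma(A) near {3, 4, 5} and sigma(B) near {-1, 0}: disjoint spectra.
A = np.diag([3.0, 4.0, 5.0]) + 0.1 * rng.standard_normal((3, 3))
B = np.diag([-1.0, 0.0]) + 0.1 * rng.standard_normal((2, 2))
Y = rng.standard_normal((3, 2))

# SciPy solves AX + XB = Q, so rewrite AX - XB = Y as solve_sylvester(A, -B, Y).
X = solve_sylvester(A, -B, Y)
```

The residual AX − XB − Y vanishes to machine precision, confirming the unique solvability in this generic case.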
It is instructive to note that the scalar equation ax − xb = y has a unique solution x for every y if a − b ≠ 0. The condition 0 ∉ σ(A) − σ(B) can be interpreted as a generalisation of this to the matrix case. This analogy will be helpful in the discussion that follows.

Consider the scalar equation ax − xb = y. Exclude the trivial cases in which a = b and in which either a or b is zero. The solution of this equation can be written as

    x = a⁻¹ (1 − b/a)⁻¹ y.

If |b| < |a|, the middle factor on the right can be expanded as a convergent power series, and we can write

    x = a⁻¹ Σ_{n=0}^{∞} (b/a)ⁿ y = Σ_{n=0}^{∞} a^{−n−1} y bⁿ.

This is surely a complicated way of writing x = y/(a − b). However, it suggests, in the operator case, the form of the solution given in the theorem below. For the proof of the theorem we will need the spectral radius formula. This says that the spectral radius of any operator A is given by the formula

    spr(A) = lim_{n→∞} ‖Aⁿ‖^{1/n}.
Theorem VII.2.2  Let A, B be operators such that σ(B) ⊂ {z : |z| < ρ} and σ(A) ⊂ {z : |z| > ρ} for some ρ > 0. Then the solution of the equation AX − XB = Y is

    X = Σ_{n=0}^{∞} A^{−n−1} Y Bⁿ.   (VII.17)
Proof.  We will prove that the series converges; it is then easy to see that the X so defined is a solution of the equation. Choose ρ₁ < ρ < ρ₂ such that σ(B) is contained in the disk {z : |z| < ρ₁} and σ(A) is outside the disk {z : |z| < ρ₂}. Then σ(A⁻¹) is inside the disk {z : |z| < ρ₂⁻¹}. By the spectral radius formula, there exists a positive integer N such that for n ≥ N we have ‖Bⁿ‖ < ρ₁ⁿ and ‖A⁻ⁿ‖ < ρ₂⁻ⁿ. Hence, for n ≥ N,

    ‖A^{−n−1} Y Bⁿ‖ < (ρ₁/ρ₂)ⁿ ‖A⁻¹Y‖.

Thus the series in (VII.17) is convergent.  ∎
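The series (VII.17) is easy to test numerically: with σ(B) inside and σ(A) outside a common circle, a few dozen partial sums already solve the equation to machine precision. A sketch with illustrative matrices:

```python
import numpy as np

A = np.array([[3.0, 1.0], [0.0, 4.0]])    # eigenvalues 3, 4 (outside |z| = 1)
B = np.array([[0.5, 0.2], [0.0, -0.4]])   # eigenvalues 0.5, -0.4 (inside |z| = 1)
Y = np.array([[1.0, 2.0], [3.0, 4.0]])

# Partial sums of X = sum_{n >= 0} A^{-n-1} Y B^n.
Ainv = np.linalg.inv(A)
X = np.zeros((2, 2))
left, right = Ainv, np.eye(2)             # A^{-n-1} and B^n for n = 0
for _ in range(60):
    X = X + left @ Y @ right
    left, right = left @ Ainv, right @ B
```

The terms decay geometrically (roughly like (ρ₁/ρ₂)ⁿ, here about (1/6)ⁿ), so truncation after 60 terms is far below machine precision.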
Another solution of (VII.16) is obtained from the following considerations. If Re(b − a) < 0, the integral ∫₀^{∞} e^{t(b−a)} dt is convergent and has the value (a − b)⁻¹. Thus, in this case, the solution of the equation ax − xb = y can be expressed as

    x = ∫₀^{∞} e^{−ta} y e^{tb} dt.

This is the motivation for the following theorem.
Theorem VII.2.3  Let A and B be operators whose spectra are contained in the open right half-plane and the open left half-plane, respectively. Then the solution of the equation AX − XB = Y can be expressed as

    X = ∫₀^{∞} e^{−tA} Y e^{tB} dt.   (VII.18)
Proof.  It is easy to see that the hypotheses ensure that the integral given above is convergent. If X is the operator defined by this integral, then

    AX − XB = ∫₀^{∞} (A e^{−tA} Y e^{tB} − e^{−tA} Y e^{tB} B) dt = [−e^{−tA} Y e^{tB}]₀^{∞} = Y.

So X is indeed the solution of the equation.  ∎
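Formula (VII.18) can be checked by direct quadrature of the integrand, built from matrix exponentials. The matrices below are illustrative, and the trapezoid rule on a truncated interval is a deliberately crude but adequate integrator here, since the integrand decays exponentially:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0], [0.0, 3.0]])     # spectrum {2, 3}: right half-plane
B = np.array([[-1.0, 0.5], [0.0, -2.0]])   # spectrum {-1, -2}: left half-plane
Y = np.array([[1.0, 0.0], [2.0, -1.0]])

# Trapezoid rule for X = integral_0^inf e^{-tA} Y e^{tB} dt; the integrand
# decays like e^{-3t}, so truncating at t = 20 loses essentially nothing.
ts = np.linspace(0.0, 20.0, 20001)
vals = np.array([expm(-t * A) @ Y @ expm(t * B) for t in ts])
dt = ts[1] - ts[0]
X = (vals[0] + vals[-1]) / 2 * dt + vals[1:-1].sum(axis=0) * dt
```

The residual AX − XB − Y is then small, limited only by the quadrature step.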
Notice that in both the theorems above we made a special assumption about the way o ( A) and o ( B ) are separated. No such assumption is made in the theorem below. Once again, it is helpful to consider the scalar case first. Note that
(a 
1 () ( b  ( )
==
(
1
a
1 ( b(
)
1 b
a·
So, if r is any closed contour in the complex plane with winding numbers 1 around a and 0 around b, then by Cauchy's integral formula we have
27f i . 1 d( == b J ( a  () ( b  () a 
r
Thus the solution of the equation x ==
ax  x b == y can be expressed as
y 1 d(. 27fi J ( a  () ( b  ( ) r
VI I .
206
Perturbat ion of Spectral Subspaces o f Normal Matrices
The appropriate generalisation for operators is the following.
Let A and B be operators whose spectra are disjoint from each other. Let r be any closed contour in the complex plane with winding numbers 1 around o(A) and 0 around o(B ) . Then the solution of the equation AX  X B == Y can be expressed as Theorem Vll.2.4
( VII. 1 9) Proof. If A X  X B Y, then for every complex number ( , (A  () X  X ( B  () == Y . If A  ( and B  ( are invertible, this gives
X ( B  ( )  1  ( A  ( )  1 X == ( A  ( )  1 Y ( B  ( )  1 .
Integrate both sides over the given contour r and note that J ( B  ()  1 d( 0 and  J (A  ()  1 d( r
==
21ril. This proves the theorem.
r
==
•
Our principal interest is in the case when A and B in the equation (VII. 16) are both normal or, even more specially, Hermitian or unitary. In these cases more special forms of the solution can be obtained. Let A and B be both Hermitian. Then iA and iB are skewHermitian, and hence their spectra lie on the imaginary line. This is j ust the opposite of the situation that Theorem VII. 2.3 was addressed to. If we were to imitate 00 that solution, we would try out the integral J e  i tA y e itB dt . This, however, 0
does not converge . This can be remedied by inserting a convergence factor: a function f in L 1 (�) . If we set
X=
00
j 
oo
it it e  A ye B
f ( t ) dt ,
then this is a welldefined operator for each f E £ 1 (IR) , since for each t the exponentials occurring above are unitary operators. Of course, such an X need not be a solution of the equation (VII. 16) . Can a special choice of f make it so? Once again, it is instructive to first examine the scalar case. In this case, the above expression reduces to
,..
,.. x == yf( a  b ) ,
where f is the Fourier transform of f, defined as
}( s) =
CXJ
J

oo
e
 its
j ( t ) dt.
(VII.20)
V I I . 2 The Equat ion AX

XB
1 b , we do have .. So, if we choose an f such that j(a  b ) == The following theorem generalises this to operators. a
=
Y
207
ax  xb ==
y.
Let A , B be Hermitian operators whose spectra are dis joint from each other. Let f be any function in L 1 (IR) such that j ( s) == � whenever s E o(A)  o (B) . Then the solution of the equation AX  X B == Y can be expressed as CXJ e itA y e i t B f(t) dt. X (VII.2 1 ) Theorem VII.2 . 5
=
j
 CXJ
Proof. Let a and (3 be eigenvalues of A and B with eigenvectors u and i A is unitary and its adjoint is v , respectively. Then, using the fact that e t e  it A , we see that
( e i tA A u, y e i t B v ) e it ( /3 cx ) a ( u, Yv) .
A similar consideration shows that
( u , e  itA y eitB Bv) == e i t ( f3  cx ) f3 ( u , Y v ) .
Hence, if X is given by (VII.2 1 ) , we have
( u , ( A X  X B)v)
( a  ,B) ( u , Yv)
J e i t (f3<> ) f(t ) dt CXJ
(a  (J) (u , Yv) f (a  {3) (u , Yv) . ()()
"
Since eigenvectors of
AX  X B == Y.
A and B both span the whole space, this shows that
•
The two theorems below can be proved using the same argument as above. For a function f in L 1 ( IR2 ) we will use the notation j for its Fourier transform, defined as
]( s 1 , sz ) =
J J ei (t , s, +t2s2 ) f ( t1, tz ) dt 1 dtz. CXJ CXJ
 CXJ  ()()
Let A and B be normal operators whose spectra are disjoint from each other. Let A == A1 + i A2 , B == B1 + i B2 where A 1 and A2 are commuting Hermitian operators and so are B1 and B2 . Let f be any function in L1 (IR2 ) such that ] (s 1 , s2 ) == 1 � . whenever s 1 + is2 E s
Theorem VII.2.6
,
2S2
208
Perturbat ion o f S p ectral S ubspaces of Nor mal Matrices
VI I .
o(A)  o ( B )
expressed as
.
of
the equation
AX  X B
J j e  i (t l A l +t2 A2 ) yei (t 1 B 1 +t2 B2 ) f (t 1 , t 2 )dt 1 dt2 . CX)
X=
Then the solution
CX)
==
Y
can be
CX)
(VII. 22)
 CX)
Theorem VII.2. 7 
Let A and B be unitary operators whose spectra are disjoint from each other. Let { a n } CX) be any sequence in £ 1 such that CX)
CX)
L
n =  CXJ
in an e e ==
1
1  e2· e
whenever
Then the solution of the equation AX  X B == Y can be expressed as CX)
X == L a n A n  l yB n .
(VIL2 3)
n= CXJ
The different formulae obtained above lead to estimates for I I I X III when A and B are normal. These estimates involve I II Y I I I and the separation 8 between o(A) and o(B) , where
8 == d ist ( o(A) , a (B ) ) == min { I A  Jt I : A E o (A) ,
JL
E o(B) } .
The special case of the Frobenius norm is the simplest. Theorem Vll.2 . 8 Let A and B be normal matrices, and let 8 == dist (o(A) , o( B)) > 0 . Then the solution X of the equation AX  X B ==
Y
satisfies the inequality
(VII. 24) Proof. If A and B are both diagonal with diagonal entries A 1 , . , A n and f..L l , . . . , f..L n , respectively, then the entries of X and Y are related by the equation Xij == Yij / ( A i  Jtj )  From this (VII. 24) follows immediately. If A, B are any normal matrices, we can find unitary matrices U, V and diagonal matrices A ' , B' such that A == U A' U * and B == V B'V * . The equation AX  X B == Y can be rewritten as .
.
UA' U * X  X VB' V * = Y and then as
A' (U* X V )  ( U * X V ) B' == U* Y V.
So, we now have the same type of equation but with diagonal A ' , B ' . Hence,
I I U * X V [[2 <
� [ [ U* YV [[ 2 ·
By the unitary invariance of the Frobenius norm this is the same as the • inequality (VII.24) .
Examp le
V I I . 2 The Equation AX  X B
=
Y
209
If A or B is not normal, no inequality like ( 1llf. 24) is true in general. For example, if A == Y == I and B == ( � � ) , then the equation AX  X B == Y has the solution X == ( � � ) . Here 8 == 1 , I I Y I I 2 == )2, but I I X I J 2 can be made arbitrarily large by choosing t large. Thus we ca nnot even have a bound like I I X I I 2 < � I I Y I I 2 for any cons tant c in this case. Exa mpl e VII. 2 . 10 In this example all matrices involved are Hermitian: VII. 2 . 9
X=
Then
� IIY II 
A, B .
( Vf51
Vf5 3
),
Y=
6 ( 2Vf5
2Vf5 6
).
Here 8 == 2 . But, for the operator norm, I I X I I > Thu s, the inequ ality I I X I I < i I I Y I I need not hold even for Hermitian
AX

X B == Y .
In the next theorems we will see that we do have ! I I X I I I < � I l l Y I l l for a small constant c when A and B are normal. When the spectra of A and B are separated in a special way, we can choose c == 1 .
Let A and B be normal operators such that the spec trum of B is contained in a disk D( a, p ) and the spectrum of A lies outside a concentric disk D( a, p+8) . Then, the solution of the equation AX  X B == Y satisfies the inequality
Theorem Vll.2. 1 1
I ll X Il l
�
< i ll Y I ll
(VIL 25)
for every unitarily invariant norrn. Proof. Applying a translation , we can assume that a == 0. Then the solution X can be expressed as the infinite series (VII. I 7) . From this we get
I ll X I ll
00
<
<
n L I J A l Ji + l J II Y III I I B J i n n =O 00 II I Y I J I L (p + 8)  n  l Pn n =O
1 b iii Y I I I 
•
Either by taking a limit p � oo in the above argument or by using the form of the solution (VII. 18) , we can prove the following.
VI I .
210
Pert urbation o f Spectral Subspaces
o f Normal
Matrices
Let A and B be normal operators with ()(A ) and () ( B ) lying in halfplanes separated by strip of width 8 . Then the solution of the equation AX  XB == Y satisfies the inequality {VI/. 25). Exercise VII.2 . 13 Here is an alternate proof of Theorem VI/. 2. 1 1 . As sume, without loss of generality, that a == 0. Then A is invertible. Write X == A  1 (Y + X B) and obtain the inequality (VI/. 25} directly from this . Exercise VII. 2. 14 Choose unit vectors u and v such that X v == I I X I ! u and X * u == I I X I ! v ; i. e., u and v are left and right singular vectors of X corresponding to its largest singular value. Then ( u, ( AX X B)v) == I I X II ( (u, A u)  (v , Bv) ) . Use this to prove Theorem VI/. 2. 1 1 in the special case of the operator norm. Theorem VII.2. 1 5 Let A and B be Hermitian operators with d ist (()(A) , ()(B) ) == 8 > 0 . Then, the solution of the equation A X  X B == Y satisfies the inequality Theorem VII.2 . 12
a

III X III <
� 11 1¥111
(VII.2 6)
for every unitarily invariant norm, where c 1 is a positive real number de fined as c 1 == i nf{ l l f i i Ll f :
E
1
L ( IR) , f ( s ) "
=
1
when lsi > s

1}.
(VII. 27)
Proof. Let f6 be any function in £ 1 ( R) such that Jo ( s ) == ! whenever l s i > 8. By Theorem VII.2.5 we have
j
CXJ
X=
ei
tA y itB e fo ( t )dt .
 oo
Hence ,
I l l X III < 111¥111
00
j l fo (t ) ldt = � 111¥111 j l f (t) l dt, CXJ
 oo

oo
! whenever l si > 1. Any where f ( t ) == fo ( t /8) . Note that ](s ) this property satisfies the above inequality. ==
f with •
Exactly the same argument , using Theorem VII.2.6 now leads to the following. Theorem VII.2 . 16 Let dist (()(A) , ()(B) ) == 8 > 0 . Y satisfies the inequality
A and B be normal operators with Then the solution of the equation A X  XB ==
I ll X III <
� 111¥111
(VII.28 )
VII.3
Pertur bation of Eigenspaces
211
for every unitarily invariant norm, where (VII.29) The exact evaluation of the constants c 1 and c2 is an intricate problem in Fourier analysis. This is discussed in the Appendix at the end of this chapter. It is known that and 1r
C2 < 2 Further, with this value of VII . 3
c1 ,
j sin t dt < 2. 9 1 . 7r
0
t
the inequality (VII.26) is sharp.
Perturbation of Eigenspaces
Given a normal operator A and a subset S of C, we will write PA ( S ) for the orthogonal projection onto the subspace spanned by the eigenvectors of A corresponding to those of its eigenvalues that lie in S. If sl and s2 are two disjoint sets, and if E == PA ( SI ) and F == PA ( S2 ) , then E and F are mutually orthogonal. If A and B are two normal opera tors, and if E == PA ( S1 ) and F == Ps ( S2 ) , then we might expect that if B is close to A and S1 and S2 are far apart, then E is nearly orthogonal to F. This is made precise in the theorems below.
Let A , B be normal operators. Let S1 and S2 be two subsets of the complex plane that are separated by either an annulus of width 8 or a strip of width 8. Let E PA ( SI ) , F Ps ( S2 )  Then, for every unitarily invariant norm,
Theorem VII.3 . 1
==
==
III EF III < � III E (A  B )F III < Proof. Since E commutes with A and (VII.30) can be written as
�
� I l i A  B i ll ·
F with B,
III EF III < III AE F  E FB ll l 
(VII.30)
the first inequality in
Now let EF == X . This is an operator from the space ran F to the space ran E. Restricted to these spaces, the operators B and A have their spec tra inside S2 and S1 , respectively. Thus the above inequality follows from Theorem VII. 2 . 1 1 when S1 and S2 are separated by an annulus, and from Theorem VII.2. 12 when they are separated by a strip.
212
VI I .
Pert urb ation o f S p ect ral S ubspaces o f Normal M atrices
The second inequality in
( VII.30)
is true because I I E I I == I I F I I == 1 .
•
The special case of this theorem when A and B are Hernlitian , S1 is an interval [a, b] and S2 the complement in IR of the interval (a  8 , b + 8 ) , is known as the DavisKahan sin8 Theorem. ( We saw in Section 1 that l i EF I I is the sine of the angle between ran E and ran F ..i . ) With no special assumption on the way S1 and S2 are separated, we can derive the following two theorems from Theorems VII. 2 . 1 5 and VII. 2 . 1 6 by the argument used above.
Let A and B be Hermitian operators, and let S1 , S2 be any two subsets of JR. such that dist (S1 , S2 ) == 8 > 0 . Let E == PA ( S1 ) , F == Ps ( S2 ) . Then, for every unitarily invariant norm, c E (A  B) AB (VII.31)
Theorem VII. 3 . 2
III EF III <
; III
F II I <
� lil
l ll ,
where c1 is the constant defined by (VII. 27). {We know that c 1 == ; . ) Theorem VII. 3 . 3 Let A and B be normal operators, and let S1 , S2 be any two subsets of the complex plane such that dist ( S1 , S2 ) == 8 > 0 . Let E == p ( sl ) ' F == Ps ( s2 ) . Then, for every unitarily invariant norm, A
III EF III <
� III E (A  B) F III < � lil A  B ill ,
where c2 is the constant defined by (VII. 29) . (We know that c2
(VII.32) < 2. 9 1 . )
Finally, note that for the Frobenius norm alone, we have a stronger result as a consequence of Theorem VII. 2.8.
Let A and B be normal operators and let S1 , S2 be any two subsets of the complex plane such that dist ( S1 , S2 ) == 8 > 0. Let E == PA (S1 ) , F == Ps ( S2 )  Then
Theorem Vll. 3 . 4
1 1 I I E F I I 2 < 8 1 1 E ( A  B) F I I 2 < 8 1 J A  B l l 2 · VII . 4
A Perturbation Bound for Eigenvalues
An important corollary of Theorem VII.3.3 is the following bound for the distance between the eigenvalues of two normal matrices.
There exists a constant 1 < c < 3, such that the optimal matching distance d(a (A) , a (B) ) between the eigenvalues of any two normal matrices A and B is bounded as
Theorem VII.4. 1
c,
d ( a (A) , a (B) ) < e l l A  B ll 
(VII.33)
VI L 5 Perturbation of the Polar Fac tors
213
Proof. � will show that the inequality (VII . 33 ) is true if c == c 2 , the constant in the inequality (VII.32 ) . Let rJ == c2 l 1 A  E ll and suppose d ( a ( A ) , a ( B )) > 'TJ · Then we can find a 8 > rJ such that d ( a ( A) , a ( B )) > 8. By the Marriage Theorem, this is possible if and only if there exists a set 51 consisting of k eigenvalues of A , 1 < k < n , such that the 8neighbourhood { z : dist (z, 51 ) < 8} contains less than k eigenvalues of B. Let S2 be the set of all eigenvalues of B outside this neighbourhood. Then dist (S 1 , S2 ) > 8. Let E == PA (S1 ) , F == Ps ( S2 ) · Then the dimension of the range of E is k, and that of the range of F is at least n  k + 1 . Hence I I E F I I == 1. On the other hand, the inequality (VII.32) implies that I I EF I I <
� II A  E ll = � < 1 .
This is a contradiction. S o the inequality (VII.33) is valid if we choose c == c2 ( < 2 . 9 1 ) . Example VI. 3. 13 shows that any constant c for which the inequality • (VII.33) is valid for all normal matrices A , B must be larger than 1 . 0 18. We should remark that, for Hermitian matrices, this reasoning using Theorem VII.3.2 will give the inequality d ( a ( A ) , a ( B )) < ; I I A BJI . However, in this case, we have the stronger inequality d ( a ( A ) , a ( B )) < I I A  B I I  So, this may not be the best method of deriving spectral variation bounds. However, for normal matrices, nothing more effective has been found yet. 
VII . 5
Perturbation of the Polar Factors
Let A == UP be the polar decomposition of A. The positive part P in this decomposition is P == ! A I == (A* A) 1 1 2 and is always unique. The unitary part U is unique if A is invertible. Then U == A P 1 . It is of interest to know how a change in A affects its polar factors U and P. Some results on this are proved below. Let A and B be invertible operators with polar decompositions A == UP and B == VQ, respectively, where U and V are unitary, and P and Q are positive. Then, 
l i l A  B ill == III Up  VQ III ==
III P  U * V Q II I
for every unitarily invariant norm. By symmetry,
lil A  B ill == III Q  V* u P ill · Let Y == P  U* VQ , Z = Q  V * UP.
VI I.
214
Perturbation of Spectral Subspaces of Normal Mat rices
Then
Y + Z*
==
P(I  U * V) +
( /  U* V ) Q .
(VII.34)
This equation is of the form studied in Section VII.2. Note that () ( P) is a subset of the real line bounded below by s n (A) I I A  1 11  1 and O"(Q) is a subset of the real line bounded below by s n (B) == I I B 1 I I  1 . Hence, dist (a(P), (j(Q) ) == s n (A) + sn (B) . Hence, by Theorem VII. 2. 1 1 , ==
Ill I  U * V lll
<
s n (A)
� sn (B) lil Y + Z * Ill 
Since III Y III == III Z III == li l A  B ill and I ll !  U * V III == I I I U  V III , this gives the following theorem.
Let A and B be invertible operators, and let the unitary factors in their polar decompositions. Then
Theorem VII. 5 . 1
U, V
be
(VII.35)
for every unitarily invariant norm I ll I l l Exercise Vll. 5 . 2 Find matrices A , B for which (V/!.35} is an equality. Exercise VII. 5.3 Let A , B be invertible operators. Show that ·
II I I A I  I B I I II
where
m
< (1
+
== min( II A I I , II B J I ) .
II A  l ii  1
:1 1 B  l l !  l )
2
Ili A  B i ll ,
(VII.36)
For the Frobenius norm alone, a simpler inequality can be obtained as shown below. Lemma VII.5.4
the inequality
Let f be a Lipschitz continuous function on C satisfying
Then, for all matrices X
< klz  wj,
for all z , w E C. and all normal matrices A, we have
l f( z )  f ( w) l
l l f (A) X  X f (A) ll 2
<
k l l AX  X A I I 2 ·
Proof. Assume, without loss of generality, that A == diag ( A 1 , . . . , An ) Then, if X is any matrix with entries X ij , we have l l f ( A ) X  X f ( A) II �
't
,J
z ,J
•
VI I . 5 Perturbation of the Po lar Factors
Lemma VII. 5 . 5 VII. 5.4. Let A , B
215
Let f be functi on satisfying the conditions of Lemma be any two normal matrices. Then, for every matrix X , a.
I I J ( A ) X  X f ( B) \ 1 2
Proof. Let T == ( � � ) , Y == ( g by T and Y, respectively. Corollary VII. 5.6
< k l l AX  XB \ 1 2 ·
� ) . Replace A and X in Lemma VII. 5.4 •
If A and B are normal matrices, then (VII.37)
Theorem VII. 5 . 7
Let A , B be any two matrices. Then
1 \ I A I  I B \ 1 1 � + 1 1 \ A* I  I B* I I\ �
< 2 \\ A  E l l � 
(VII. 38)
Proof. Let T == ( J. � ) , S == ( ;. � ) . Then T and S are Hermitian. Note • that I T I == ( I �* I 1 � 1 ) . So, the inequality (VII.38 ) follows from (VII.37) . It follows from (VII . 38) that (VII.39 ) The next example shows that the Lipschitz constant V2 in the above in equality cannot be replaced by a smaller number. Example VII. 5.8
Let A=
Then l A \ == A and As
E
� 0,
( � � ), B= ( � � )·
I B I ==
1
v1 + £ 2
(
1 £
£ 2 E
1 B I Il2 approaches v2. the ratio II IAJ II A  B JI2
).
We will continue the study of perturbation of the function I A I in later chapters. A useful consequence of Theorem VII.5. 7 is the following p erturbation bound for singular vectors. Theorem VII. 5.9 Let S1 , S2 be that dist(S1 , 52 ) == 8 > 0 . Let A
two subsets of the positive half line such and B be any two matrices. Let E and E' be the orthogonal projections onto the subspaces spanned by the right and the left singular vectors of A corresponding to its singular values in 81 . Let F and F' be the projections associated with B in the same way, corresponding to its singular values in S2 • Then (VII.40 )
216
Proof.
\TI L
Perturbation of Spectral S ubspaces of Normal Matrices
By Theorem VII.3.4 we have < <
1
8 1 1 I A I  I B I II 2 ,
� II [ A * [  I B * I I I 2 ·
These inequalities, together with (VII. 38) , lead to the inequality (VII.40) .
VII. 6
Appendix : Evaluating t he const ants
•
( Fourier )
The analysis in Section VII. 2 has led to some extremal problems in Fourier analysis. Here we indicate how the constants c 1 and c2 defined by (VII. 27) and (VII.29) may be evaluated. The symbol 1 1 ! 1 1 1 will denote the norm in the space £ 1 for functions defined on JR. or on IR. 2 . We are required to find a function f in L 1 , with minimal norm, such that ] (s) == ! when lsi > 1 . Since j must be continuous, we might begin by taking a continuous function that coincides with ! for l s i > 1 and then taking f to be its inverse Fourier transform. The difficulty is that the function 1. is not in £ 1 , and hence its inverse Fourier transform may not be defined. Note, however, that the function ! is square integrable at oo. So it is the Fourier transform of an L 2 function. We will show that under suitable conditions its inverse Fourier transform is in L 1 , and find one that has the least norm. Since the function 1 is an odd function, it would seem economical to extend it inside the domain (  1 , 1 ) , so that the extended function is an odd function on JR. This is indeed so. Let f E L 1 ( IR ) and suppose ] (s ) == ! when l si > 1 . Let !odd be the odd part of J , !o dd ( t) == f ( t ) J ( t ) Then f:dd (s) == ! when l s i > 1 and ll !odd ii i < ll f ll 1 · Thus the cons tant c 1 is also the infimum of II f II 1 over all odd functions in L 1 for which J ( s) == ! when lsi > 1. Now note that if j is odd, then s
s

00
] (s )
J j (t) e itsdt
 oo
i
=
i

.
00
J J (t ) sin ts dt
 oo
J Re j(t) sin ts dt + J Im j(t) sin ts dt . 00

oo
00

00
If this is to be equal to � when l s i > 1 , the Fourier transform of Ref should have its support in (  1 , 1 ) . Thus, it is enough to consider purely imaginary
VII . 6 Appendix : Evaluating the
( Fo urier )
const ant s
217
functions f in our extremal problem. Equivalently, let C be the class of all odd real functions in L 1 (IR) such that j ( s ) == 215 when I s I > 1 . Then c1
== inf{ ll f lh :
f E C}.
(VIL 4 1 ) 2 rr
Now, let g be any bounded fun12tion with period 21r having a Fourier
L#O an ei nt. Th s last condition means that J0 g ( t)dt == 0 . n
series expansi on
i
Then, for any f in
C,
00
J
_
f ( t )g (x  t) d t ==
00
L
n#O
�n einx _
(VII .42)
'l n
Note that this expression does not depend on the choice of f . For a real number x , let sgnx be defined as  1 i f x is negative and 1 if x is nonnegative. Let fo be an element o f C such that sgn fo ( t) == sgn sin t .
(VII.43)
Note that the functio n sgn fo then satisfies the requirements made on the preceding paragraph . Hence, we have
j l fo (t ) l dt 00
j fo ( t ) sgn fo (t ) dt =  j fo ( t ) sg 00
in
00
 oo
 oo
g
n
fo ( t )d t
 oo
 j f (t ) sgn fo ( t )dt < j l f (t ) i dt 00
00
 oo
 oo
for every f E C. ( Use (VIL42) with x == 0 and see the remark following it. ) Thus c 1 == li fo 11 1 , where fo is any function in C satisfying (VII.43) . We will now exhibit such a function. We have remarked earlier that it is natural to obtain fo as the inverse Fourier transform of a continuous odd function cp such that cp ( s) == ! for l s i > 1 . First we must find a good sufficient condition on cp so that its inverse Fourier tr an sfo r m cj! (t) =
� 2
j cp ( s) sin ts ds (..'()

oo
( which, by definition, is in L2 ) is in L1 . Suppose cp is differentiable on
the whole real line and its derivative
integration
by
parts shows
cp'
is of bounded variation. Then, an
j cp ( s) sin ts ds = � j cos ts cp'(s)ds. 00
00

CXJ
218
VI I .
Pert urbation o f Spectral S ubspaces o f No rmal Matrices
Another integration by parts, this time for RiemannStieltjes integrals, shows that 00 CXJ
J tp ( s ) sin ts ds = t� J sin ts dtp' (s ) .

CXJ
CXJ
As t + oo this decays as t� . So <j; ( t ) is integrable at oo . Since
cpo ( s ) ==

for l si > 1 for 0 < j s j < 1 for s == 0.
1/s
1  7r2 cot 2 s 0 7f
s
(V I I . 44 )
From the familiar series expansion
CXJ 2z  cot  z ==  + L 2 2 z n = l z 2  4n 2 1
7f
7f
( see L.V. Ahlfors, Complex Analysis, 2nd ed. , p. 188) one sees that 00
cp a ( s ) == L
n= l
2s for 0 < s < 1 . 4n 2  s 2
This shows that cp 0 is a convex function in 0 < s < 1, and hence cp � is of bounded variation in this domain. On the rest of the positive halfline too cp� is of bounded variation. So cp 0 does meet the conditions that are sufficient to ensure that fo == 0,
2 fo ( t) 2fo ( t )  2 fo ( t + ) 1r
fo ( t )  fo ( t + 2 1r )
1
1
j cot ; 0
sin t
t +
fo ( t )
sin ts ds,
sin ( t + 1r )
Ut  t �
t + 7r 1r
The quantity inside the brackets is positive t � oo , we can write 00
s
,
+ 2( t
for
all
� 21r) ] sin t. t.
Since f0 (t )
L [ !o (t + 2 n7r)  fo (t + ( 2 n + l)1r)] sin t h ( t ) sin t ,
n =O
+
0
as
VI I . 6 Append ix: Evaluating t he ( Fo urier) const ants
219
whe re h(t) > 0. This shows that fo satisfi es the condition (VII.43) . Finally, II fo 1! 1 can be evaluated from the available data. We can easily see that 4 1 1  (sin t +  sin 3t +  sin 5t + n 5 3
sgn sin t
2.
nz
L n odd
Hence , using (VII.42 ) we obtain 00
j l fo (t ) l dt

()()
2:_ n
·
e int .
00
!  oo
·
·
)
fo (t) sgn fo ( t)dt 1
n
2 � nL n odd
2
n
We have shown that c 1 == ; . This result , and its proof given above, are due to B . Sz . Nagy. The twovariable problem, through which c2 is defined, is more com plicated. The exact value of c2 is not known. We will show that c2 is fi nite by showing that there does exist a function f in L1 (JR 2 ) such that j (s 1 , s 2 ) == 81_; i s 2 when sf + s� > 1. We will then sketch an argument that leads to an estimate of c2 , skipping the technical details. It is convenient to identify a point ( x , y) in JR? with the complex variable z = x + i y. The differential operator fz = ; anni hilates every +i complex holomorphic function. It is a well known fact (see, e.g. , W. Rudin, Functional Analysis, p. 205) that the Fourier transform of the tempered distribution 1z is  2 zrri . (The normalisations we have chosen are different from those of Rudin.) Let 'P be a c oo function on JR2 that vanishes in a neighbourhood of the origin, and is 1 outside another neighbourhood of the origin. Let '¢( z) == z 1 r,o( ) . We will show that the inverse Fourier transform if; is in L . Note that
( %x
%Y )
z
This
'TJ
(
z
) .· ==
!!._,,,( ) == � d'fJ(z) z z dz . dz o/
is a c oo function with compact support. Hence , 'fJ is in the Schwartz space S. Let i] E S be its inverse Fourier transform. Then (ignoring constant factors) 1/;(z) == i] (z )/ z . Since i] is integrable at oo , so is 7/J. At the origin, ! is integrable and fJ( z) bounded. Hence if; is integrable at the origin. This shows that c2 < oo . Consider the tempered distribution fo (z) == 2� . We know that f� ( �) == 1 1 � . However, f0 ¢ L . To fix it up we seek an element p in the space of tempered d i st r i but io ns S' such that 7r 7. Z
VI I .
220
E
(i) p
Pert urbation o f S p ectral S ubspaces o f Normal Matrices
£ 1 and supp p is contained in the unit disk D , L1 .
(ii) if f == fo + p , t he n f is in
Note that c2 == inf l l f lh over su ch
Writing
z ==
rei0 , one sees that
p.
(X)
=
7f
l fl l 1 2� j rdr j I�  iei0 21rp (z ) j dO.  7!"
0
Let
=
F(r) Then
7f
J
 7!"
7f
j iei6p ( z ) de .
(VII.45 )
1
1
1   ie1· o 21rp( z ) j dB > 2 7r j   F(r) l , r r
and there is equality here i f ei0p( r ei0 ) is independent of e . Hence, we can restrict attention to only those p that satisfy the additional condition
( iii)
zp z )
(
is a radial function.
Putting
G( r)
we see that c2
=
==
inf
1  rF(r) ,
(VII.46)
J IG ( r)ldr,
(VIL47)
(X)
0
where is defined via (VII.45) and (VII.46) for all p that satis fy the conditions (i) , (ii) , and ( iii) above. The twovariable minimisation p roblem is thus reduced to a onevariable problem. Using the conditions on p, one can characterise the functions that enter here. This involves a little m ore intricate analysis, which we will skip. The con c l us ion is that the functions G that enter in (VII.4 7 ) are all £ 1 fun ct i ons of the form G == g , where g is a continuous even function
G
G
supported in [ 1 , 1] such that
c2
=
inf {
(X)
j l.§ ( t ) l dt : 0
g
J g(t) dt = 1. In other words, 1
1
even, supp g
=
g be the function g(t)
If we choose to This gives the estimate c2
<
1r .
[ 1 , 1 ] , ==
1
j
g =
1 , g E £1 } . (VI I . 4 8)
 l t l, then g(t )
==
sin2 ( ; ) ( � ) 2 .
221
VI I . 7 P roblen1s
A better estimate is obtained from the function g (t)
7r
7r == 4
for ! t l < 1 .
cos 2 t
Then g(t) ==
7r
2
A little computation shows that
4 7r 2  t 2
cos t
.
sin dt < 2 . 9090 1 . J i§(t ) idt ; J CXJ
7r
=
0
t
t
0
Thus c2 < 2.9 1. The interested reader may find the details in the paper A n extremal problem in Fourier analysis with applications to operator theory, by R. Bhatia, C. Davis , and P. Koosis, J. Functional Analysis, 82 ( 1989) 138 1 50.
VII . 7
P roblems
Problem VII. 6 . 1 . Let £ be any subspace of e n . For any vector
8(x, £) Then 8(x, £) is equal to
==
yE£
min l l x
II ( /  E) x l i ·
X
let
 Yll 
If £, F are two subspaces of e n ' let
p (£, F) == max{ max 8 ( x , F) , max 8 ( y , F) } . xEE
II X II = 1
Let dim £ Show that
==
where E and
dim F, and let F
8
y E :F
II y II = 1
be the angle operator between £ and F.
p (£, :F ) == II sin 8 11 == l i E  F ll , are the orthogonal projections onto the spaces £ and F.
Problem VII.6.2. Let A, B be operators whose spectra are disjoint from each other. Show that the operator ( � � ) is similar to ( � � ) for every C. Problem VII.6. 3. Let A, B be operators whose spectra are disjoint from each other. Show that if C commutes with A + B and with AB , t hen C commutes with both A and B. Problem VII. 6.4. The equation AX + XA * == I is called the Lyapunov equation. Show that if o(A) is contained in the open left halfplane, then the Lyapunov equation has a unique solution X , and this solution is positive and invertible. 
222
VI I .
Perturbation o f Spectral S ubspaces of Normal Mat rices
Problem VII. 6 . 5 . Let A and B be any two matrices. Suppose that all singular values of A are at a distance greater than 8 from any singular value of B . Show that for every X ,
Problem VII. 6 . 6 . Let A , B be normal operators. Let S1 and S2 be two subsets of the complex plane separated by a strip of width 8. Let E PA ( S1 ) , F == Ps (S2 ) . Suppose E (A  B) E == 0. If T ( t ) is the function T(t) == t/ vl  t2 , show that ==
I I T ( I EF ! ) I I <
� I I E (A  B) II 
Prove that this inequality is also valid for all unitarily invariant norms. This is called the tanG theorem . Problem VII.6. 7. Show that the inequality (VII.28) cannot be true if c2 < ; . ( Hint: Choose the trace norm and find suitable unitary matrices A , B. ) Problem VII. 6 . 8 . Show that the conclusion of Theorem VII.2.8 cannot be true for any Schatten pnorm if p =/= 2. Problem VII.6.9. Let A , B be unitary matrices, and let dist (a (A) , a( B) ) == b == J2. If some eigenvalue of A is at distance greater than J2 from a ( B) , then a( A) and a(B) can be separated by a strip of width J2. In this case, the solution of AX  X B == Y can be obtained from Theorem VIL2.3. Assume that all points of a ( A) are at distance J2 from all points of O"(B) . Show that the solution one obtains using Theorem VII.2. 7 in this case is If O" ( A) == { 1 ,  1 } and O" ( B) == { i  i } , this reduces to ,
X == 1 /2 (AY  YB) . Problem VII.6. 1 0 . A reformulation of the Sylvester equation in terms of tensor products is outlined below. Let
cp ( I ® A)
AEij , EijAr ,
VI I. 8 Notes
and
References
223
where AT is the transpose of A. Thus the multiplication operator A ( X ) == AX on .C ( 7t ) can be identified with the operator A ® I on 1t ® 1t * , and the operator B(X ) == X B can be identified with I ® B T . The operator T == A  B then corresponds to
A ® !  I ® B T.
Use this to give another proof of Theorem VII. 2 . 1 . Sometimes it is more convenient to identify .C ( 7t ) with 1t ® 1t instead of 1l ® 7t * . In this case, we have a bijection
VII . 8
Notes and References
Angles between two subspaces of IRn were studied in detail by C. Jordan, Essai sur la geometrie a n dimensions, Bull. Soc. Math. France, 3 ( 1875) 103 1 74. These ideas were reinvented, developed, and used by several math ematicians and statisticians. The detailed analysis of these in connection with perturbation of eigenvectors was taken up by C. Davis, Separation of two linear subspaces, Acta Sci. Math. (Szeged) , 19 ( 1958) 172 1 87. This work was continued by him in The rotation of eigenvectors by a pertur bation, J. Math. Anal. Appl. , 6 ( 1963) 159 173, and eventually led to the famous paper by C. Davis and W.M. Kahan, The rotation of eigenvectors by a perturbation III, SIAM J. Numer. Anal. 7( 1 970) 146. The CS decomposi tion in its explicit form as given in Theorem VII. 1 . 6 is due to G.W. Stewart,
On the perturbation of pseudoinverses, projections, and linear least squares problems, SIAM Rev. , 1 9 ( 1 977) 634662. It occurs implicitly in the paper
by Davis and Kahan cited above as well as in another important paper, A. Bjorck and G H. Golub, Numerical methods for computing angles be tween linear subspaces, Math. Comp. 27( 1973) 579594. Further references may be found in an excellent survey article, C . C. Paige and M. Wei, His tory and generality of the CS Decomposition. Linear Algebra Appl. , 208/209 ( 1994) 303326. Many of the proofs in Section VII. 1 are slight modifications of those given by Stewart and Sun in Matrix Perturbation Theory. The importance of the Sylvester equation AX  X B == Y in connec tion with perturbation of eigenvectors was recognised and emphasized by Davis and Kahan, and in the subsequent paper, G.W. Stewart, Error and .
perturbation bounds for subspaces associated with certain eigenvalue prob lems , SIAM Rev. , 15 ( 1973) 727 764. Almost all results we have derived for
this equation are true also in infinitedimensional Hilbert spaces. More on this equation and its applications, and an extensive bibliography, may be found in R. B hatia and P. Rosenthal, How and why to solve the equation AX  X B == Y , BulL London Math. Soc. , 29( 1997) to appear. Theorem VII. 2 . 1 was proved by J.H. Sylvester, Sur f 'equation
224
VI I .
Pert urbation of Sp ectral S ubspaces o f Normal M atrices
C . R. Acad. Sci . P ari s 99( 1884) 6771 and 1 15 1 16 . The proof given here i s due to G . Lumer and M . Rosenblum, Linear op erator equations, Proc. Amer. Math. Soc. , 10 ( 1959) 3241 . The solution (VII. 1 8) is due to E . Heinz , Beitriige zur Storungstheorie der Spektralzer legung, Math. Ann. , 123 ( 19 5 1 ) 415438. The general solution (VII. l9) is due to M. Rosenblum, On the operator equation B X  XA = Q, Duke Math. J . , 23 ( 1 956) 263270. Much of t h e rest of Section VII.2 is based on the paper by R. Bhatia, C. Davis and A. Mc intosh , Perturbation of spectral subspaces and solution of linear operator equations, Linear Algebra Appl. , 52/53 ( 1983) 4567. For Hermitian operators, Theorem VI L 3. 1 was proved in the Davis Kahan paper cited above. This is very well known among numerical an alysts as the sinO theorem and has been used frequently by them. This paper also contains a tane theorem ( see Problem VII.6.6 ) as well as sin28 and tan2B theoren1s , all for Hermitian operators. Theorem VII.3.4 (for Her mitian operators ) is also proved there. The rest of Section VIL3 is based on the paper by Bhatia, Davis, and Mcintosh cited above, as is Section VII.4. Theorem VII. 5 . 1 is due to R.C.Li, New perturbation bounds for the uni tary polar factor, SIAM J . Matrix Anal. Appl. , 16 ( 1995) 327332. Lemmas VII.5.4, VII.5.5 and Corollary VII. 5.6 were proved in F. Kittaneh, On Lip schitz functions of norrnal operators, Proc. Amer. Math. Soc. , 94 ( 1985) 416418. Theorem VII. 5 . 7 is also due to F. Kittaneh , Inequalities for the Schatten pnorrn IV, Commun. Math. Phys . , 106 ( 1 986) 58 1585. The in equality (VII. 39) was proved earlier by H. Araki and S. Yamagami, An inequality for the Hilbert Schmidt norm, Commun. Math. Phys. , 81 ( 1 981) 8998. Theorem VII. 5.9 was proved in R. Bhatia and F. Kittaneh, On some perturbation inequalities for operators, Linear Algebra Appl. , 106 ( 1988) 271279. See also P.A. 
Wedin , Perturbation bounds in connection with sin gular value decomposition, BIT, 12( 1972) 99 1 1 1 . The beautiful result c 1 == ; , and its proof in the Appendix, are taken from B . Sz.Nagy, Uber die Ungleichung von H. Bohr, Math. Nachr. , 9 ( 1 953) 255259. Problems such as this, are called minimal extrapolation pro blems for the Fourier transform. See H.S. Shapiro, Topics in Approximation The ory, Sprin ge r Lecture Notes in Mathematics Vol. 187 ( 1971 ) . In Theorem VIL2.5 , all that is required of f is that ](s) = ; when 2 points if we are s E CT( A )  O"(B) . This fixes the value of j only at n dealing with n x n matrices. For each n , let b ( n ) be the smallest constant, for which we have I [ X [[ < I [ A X  X B II , en m atrice s px
== x q ,
b�)
whenever A, B are n x n Hermitian matrices such that dist (o(A) , 8. R. McEachin has shown that b(2)
v'6 2
�
1 . 22474 ( see Example VII.2. 10)
a
( B)) ==
b( 3) and that
8 + s VIO
18
VI I . 8 Notes and References �
1 . 32285
b == nlim b(n ) + oo
22 5
==
Thus the inequality (VII.26) is sharp with
7r
 .
2
c1
==
; . See, R. McEachin ,
A sharp estimate in an operator inequality, Pro c. A mer. l\1ath. Soc. , 1 1 5
( 1 99 2) 161165 and Analyzing specific cases of an operator inequality, Linear Alge bra Appl. , 208/20 9 ( 1994) 343365. The quantity p(£, F) defined in Problem VII.6 . 1 is sometimes called the gap between £ and F. This and related measures of the distance between two subspaces of a Banach space are used extensively by T . Kato, Pertur bation Theory for Linear Operators, Chapter 4.
VIII S p ect ral Variation of Nonnormal Mat rices
In Chapter 6 we saw that if A and B are both Hermitian or both unitary, then the optimal matching distance d(o(A) , o(B)) is bounded by I I A  B I I . We also saw that for arbitrary normal matrices A , E this need not always be true ( Example VI.3 . 13) . However, in this case, we do have a slightly weaker inequality d( a (A) , o(B) ) < 3 I I A  E l l ( Theorem VII.4. 1 ) . If one of the matrices A, E is Hermitian and the other is arbitrary, then we can only have an inequality of the form d(o(A) , o(B) ) < c( n ) I I A  E ll , where c( n ) is a constant that grows like log n ( Problems VI.8.8 and VI.8.9) . A more striking change of behaviour takes place if no restriction is placed on either A or B. Let A be the n x n nilpotent upper Jordan matrix; i.e. , the matrix that has all entries 1 on its first diagonal above the main diagonal and all other entries 0. Let B be the matrix obtained from A by adding an entry £ in the bottom left corner. Then the eigenvalues of B are the nth roots of £. So d(o(A) , o(B) ) == £1/ n , whereas I I A  E l l == £. When £ is small, the quantity c l / n is much larger. No inequality like d(o(A) , o(B) ) < c ( n ) II A  B II can be true in this case. In this chapter we will obtain bounds for d(o(A) , o( B ) ) , where A, B are arbitrary matrices. These bounds are much weaker than the ones for normal matrices. We will also obtain stronger results for matrices that are not normal but have some other special properties.
VI I I . I G eneral S pectral Variation Bounds
VIII . l
227
General S pectral Variation Bounds
Throughout this section, A and B will be two n x n matrices with eigenval ues a 1 , . . . , an , and fJ1 , . . , f3n , respectively. In Section VI. 3 , \Ve introduced the notation ( VII I l ) s ( a (B) , o( A ) ) == m?X �nlai  ,Bj 1 .
.
J
'l
A bound for this number is given in the following theorem. Theorem VIII. l . l Let A, B be n
x n
matrices. Then
(VIII.2) Proof. Let j be the index for which the maximum in the definition (VIII. I ) is attained. Choose an orthonormal basis e 1 , . , en such that Be 1 f3j e1 . Then .
.
==
[m�nl ai  (3j l ] n 'l n
<
IJ l a i  ,Bj I
==
l det (A  (31 1 ) 1
i= l
by Hadamard's inequality (Exercise !. 1 . 3) . The first factor on the right hand side of the above inequality can be written as I I (A  B) e 1 J I and is, therefore, bounded by I I A  B I I  The remaining n 1 factors can be bounded 
as
•
This is adequate to derive (VIII.2) . Example VIII. 1.2 Let A equal.
==
 B == I .
Then the two sides of (VII/. 2} are
Compare this theorem with Theorem VI. 3.3. Since the righthand side of (VIII.2) is symmetric in A and B, we have a bound for the Hausdorff distance as well: (VIII.3) Exercise VIII. 1 . 3 A bound for the optimal matching distance d( o(A) , o( B)) can be derived from Theorem VII/. 1 . 1 . The argument is similar to the one used in Pro blem VI. 8. 6 and is outlined below. (i) Fix A, and for any B let
228
VI I I .
S p ectral Variation o f Nonnormal Matrices
where lvf max( I I A I I , I I B I I ) . Let a 1 , . . . , a n be the eigenvalues of A. Let D ( a 2 , c( B )) be the closed disk with radius c ( B ) and centre a i . Then, The orem VII/. 1 . 1 says that CJ(B) is contained in the set D o btained by taking the union of these disks. (ii) Let A( t ) ( 1  t ) A + tB , 0 < t < 1 . Then A(O) == A, A(1) == B , and c(A(t) ) < c( B ) for all t . Thus, for each 0 < t < 1 , o ( A ( t )) is contained in D. {iii} Since the n eigenvalues of A ( t) are continuous functions of t , each connected component of D contains as many eigenvalues of B as of A. {iv) Use the Matching Theorem t o show that this implies ==
==
d ( a(A) , a ( B ))
<
(2n  1) (2M) l  l/ n ii A  B l l 1 / n .
(VIII.4)
{v) Interchange the roles of A and B and use the result of Problem 1!. 5. 1 0
to o btain the stronger inequality
(VIII.5) The example given in the introduction shows that the exponent 1/n oc curring on the righthand side of (VIII.5) is necessary. But then homogene ity considerations require the insertion of another factor like (2 M ) l  l/n. However, the first factor n on the righthand side of (VIII. 5) can be replaced by a much smaller constant factor. This is shown in the next theorem. We will use a classical result of Chebyshev used frequently in approximation theory: if p is any monic polynomial of degree n , then
1 max jp(t) I > 2 l . O
(VIII.6)
(This can be found in standard texts such as P. Henrici, Elements of Nu merical Analysis, Wiley, 1964, p. 194; T.J. Rivlin, A n Introduction to the Approximation of Functions, Dover, 1981 , p. 3 1 . ) The following lemma is a generalisation of this inequality. Lemma VIII. 1.4 Let r be a continuous curve in the complex plane with endpoints a and b. If p is any monic polynomial of degree n, then (VIII. 7) Proof. Let L be the straight line through a and b and L between a and b:
S the segment of
L
{z : z == a + t(b  a) , t E �}
S
{ z : z == a + t (b  a ) , O < t < 1 } .
For every point z in C, let z' denote its orthogonal projection onto L. Then l z  w l > l z'  w' l for all z and w .

VI I I . l General Spectral Vari ation Bo unds
 Let A t�, i == 1 , . . . , n , be the roots of p . Let .:\� == a + ti ( b t i E JR, and let z == a + t ( b  a ) be any point on L . Then n IT i
i= l
�
z  A I
==
n IT
i= l
I ( t  ti ) ( b  a ) I
==
lb  a!n
n
Ti
i= l

l t  ti l ·
From this and the inequality (VIII.6) applied to the polynomial we can conclude that there exists a point z0 on S for which n
IT i
i=l
'
zo  .A i l
> jb  a!n 22n  l
a
),
II n
229
i= l
\vhere
( t  tz ) ,
·
Since r is a continuous curve joining a and b , z0 == .:\� for some A o E r . Since l Ao  A i l > I A �  A � l , we have shown that there exists a point .:\ 0 on
r
n IT
l b ;:,� I n s uch that l p ( .A o ) l == . I Ao  A i > 1 2 i= l l
Theorem VIII. 1 . 5 Let A and B be two n
x
•
n matrices. Then (VIII. 8 )
Proof. Let A ( t ) == ( 1  t)A + tB, 0 < t < 1 . The eigenvalues of A(t) trace n continuous curves in the plane as t changes from 0 to 1. The initial points of these curves are the eigenvalues of A , and their final points are those of B . So, to prove (VIII. 8) it suffices to show that if r is one of these curves and a and b are the endpoints of r, then I a  b l is bounded by the righthand side of (VIII.8) . Assume that I I A l l < II B I I without any loss of generality. By Lemma VIII. l.4, there exists a point .:\0 on r such that I det (A  A o l ) l >
jb  a!n 2 _1 . 2 n
Choose 0 < t0 < 1 such that .:\ 0 is an eigenvalue of ( 1  t0 ) A + t0 B . In the proof of Theorem VIII. l . l we have seen that if X , Y are any two n x n matrices and if .A is an eigenvalue of Y, then J
I det( X  AI ) I < I I X  Y I ! ( I I X I I + I Y I I ) n  1 ·
Choose X == A and Y == ( 1 t0 ) A + t0 B . This gives 1;��� � < I det (A Aol ) l < II A  B li ( II A I I + I I B I I ) n l _ Taking nth roots, we obtain the desired • conclusion. 
Note that we have, in fact, shown that the factor 4 in the inequality (VIII.8) can be replaced by the smaller number 4 x 2  1 /n . A further im provement is possible; see the Notes at the end of the chapter. However, the best possible inequality of this type is not yet known.
VII I .
230
Spectral Variat ion of Nonnormal Matrices
Perturbation of Ro ots of Polynomials
VIII . 2
The ideas used above also lead to bounds for the distance between the roots of two polynomials. This is discussed below. Lemma VIII. 2. 1 Let f(z) nomial. Let
==
zn
+
a 1 z n l + · · · + an
be any monic poly
Then all the roots of f are bounded (in absolute value) by
Proof.
If l z l
> Jt ,
Jt .
(VIII.9 )
then
f(z)
a l + · · · + an 1+zn z a 2  · · ·  an al  > 1 z zn z2 1 1 1 1      · · ·  2 22 2n 0. > 
>
•
Such z cannot, therefore, be a root of f.
Let a 1 , , O n be the roots of a monic polynomial f. We will denote by Root f the unordered ntuple { a 1 , . , a n } as well as the subset of the plane whose elements are the roots of f. We wish to find bounds for the optimal matching distance d (Root f, Root g) in terms of the distance between the coefficients of two monic polynomials f and g . Let .
.
.
.
.
f(z ) g(z)
(VIII. 10)
be two polynomials. Let
(VIII. 1 1) (VIII.12) The bounds given below are in terms of these quantities. Theorem VIII. 2.2 Let f, g be two monic polynomials Then
where
Jt
n s (Ro ot f, Root g ) < L ! a k  bk !JL n  k , k= l is given by { VIII. 9} .
as
in (VII/. 1 0} .
(VIII. l3)
VIII.2
Proof.
Perturbation of Ro ots of Poly nomials
23 1
We have
n f(z)  g( z ) == L (a k  bk )z n  k . k= l So, if a is any root of j, then, by Lemma VIII.2. 1 ,
l g (a) l
n
< <
If the roots of g are {3 1 , . . .
L l a k  b k l lnl n  k k= l n L ! ak  b k l /1n  k_ k =l
, f3n , this says that n n IT Ia  f3j I < L lak  b k ! 11n  k · j= l k=l
So , •
This proves the theorem.
Corollary VIII.2 . 3 The Hausdorff distance between the roots of f and g is bounded as
h(Rootf, Rootg) < G ( f, g ).
(VIII. 14)
Theorem VIII.2 . 4 The optimal matching distance be tween the roots of f and g is bounded as d(Root j, Root g) < 4 8 ( / , g ) .
(VIII. 15)
Proof. The argument is similar to that used in proving Theorem VIII. l .5. Let ft ( l  t ) f + tg , 0 < t < 1 . If A is a root of ft , then by Lemma VIII.2. 1 , I A I < 1 and we have ==
l t (f( .A )  g (A) ) l < l f( A )  g ( A ) I n < Li a k  bk l l.A i n  k < [G ( j , g ) ] n . k= l The roots of ft trace continuous curves t changes from 0 to 1 . The initial points of these curves are the roots of f , and the final points are the I f( A ) I
n
as
roots of g. Let r be any one of these curves, and let a, b be its endpoints. Then, by Lemma VIII. l .4, there exists a point A on r such that
232
VI I I .
Spectral Variat ion o f No nnormal l\1 atrices
This shows that ja 
bj < 4
X
2l/n G (f, g ) ,
•
and that is enough for proving the theorem. h =
( f + g) /2 . Then any convex combination of f can also be expressed as h + t ( f  g) for some t with It! < � . Use
Exercise VIII. 2.5 Let and g this to show that
d ( Root j, Root g ) < 4 1  1 /n G ( f g ) . ,
(VIII. 16)
The first factor in the above inequality can be reduced further; see the Notes at the end of the chapter. However, it is not known what the optimal value of this factor is. It is known that no constant smaller than 2 can replace this factor if the inequality is to be valid for all degrees n. Note that the only property of 1 used in the proof is that it is an upper bound for all the roots of the polynomial ( 1  t) f + tg. Any other constant with this property could be used instead. Exercise VIII. 2.6 In Pro blem I. 6. 1 1 , a bound for the distance between the coefficients of the characteristic po lynomials of two matrices was ob tained. Use that and the combinatorial identity
tkG) k =O
=
n2 n  l
n matrices A, B d ( O" (A) , O" (B )) < n l/ n (8 M) l  1 /n ii A  B ll l/ n , ( VI I I . 1 7 ) where M == max( II A II , I I B II ) . This is weaker than the bound obtained in to show that for any two
n x
Theorem VIII. l . 5.
VII I . 3
Diagonalisable Matrices
A matrix A is said to be diagonalisable if it is similar to a diagonal matrix; i.e. , if there exists an invertible matrix S and a diagonal matrix D such that A == svs  1 . This is equivalent to saying that there are n linearly independent vectors in e n that are eigenvectors for A. If S is unitary ( or the eigenvectors of A orthonormal) , A is normal. In this section we will derive some perturbation bounds for diagonalisable matrices. These are natural generalisations of some results obtained for normal matrices in Chapter 6. The condition number of an invertible matrix S is defined as cond ( S)
==
II SI I II S  1 11 
Note that cond ( S) > 1 , and cond ( S) == 1 if and only if S is a scalar multiple of a unitary matrix. Our first theorem is a generalisation of Theorem VI. 3.3.
VI I I . 3 D iago nalis able M atrices
233
Theorem VIII . 3 . 1 Let A == SDS 1 , where D is a diagonal matrix and S an invertible matrix. Then, for any matrix B , s(o(B) , o(A) ) < cond( S) I I A  E l l 
(VIII. 18)
Proof. The proof of Theorem Vl. 3.3 can be modified to give a proof of this. Let E. = cond(S) I I A  B ll . We want to show that if (3 is any eigenvalue of B, then (3 is within a distance c of some eigenvalue a.1 of A. By applying a translation, we may assume that {3 = 0. If none of the O'.j is within a distance c of this , then A is invertible and
1 So,
J I A  1 ( B  A) l l < I I A  1 II II E  A l l < 1 .
Hence, I + A  1 (E  A) is invertible, and so is B = A(I + A 1 (B  A) ) . • But then B could not have had a zero eigenvalue. Note that the properties of the operator norm used above are (i) I + A is invertible if I I A l l < 1 ; (ii) I I AB II < I I A I I I I B II for all A, B; (iii) l i D I I == max l di l if D = diag( d 1 , . . . , dn ) . There are several other norms that satisfy these properties. For example, norms induced by the pnorms on e n , 1 < p < oo , all have these three properties. So, the inequality (VIII. 18) is true for a large class of norms. Exercise VI�I.3 . 2 Using continuity arguments and the Matching Theo rem show that if A and B are as in Theorem VIII. 3. 1 , then d(o(A) , o(B) ) < (2n  l)cond(S) I I A  E ll . If B is also diagonalisable and B
== T D'T  1 , then
d(o(A) , o(B) ) < n cond ( S)cond(T) I I A  E l l An inequality stronger than this will be proved below by other means.
Theorem VIII. 3 . 1 also follows from the following theorem. Both of these are called the BauerFike Theorems. Theorem VIII.3.3 Let S be an invertible matrix. If (3 is an eigenvalue of B but not of A, then (VIIL 1 9)
VI I I .
234
Proof.
Spect ral Vari ation of Nonnormal Matrices
We can write
S(B  {3/) S  1
S[(A  {31) + B  A] S  1 S(A  {3/) S  1 {I + S(A  {31 )  1 s  1 S(B  A) S  1 } . ·
Note that the matrix on the lefthand side of this equation is not invertible. Since A  {3! is invertible, the matrix inside the braces on the righthand side is not invertible. Hence,
1 < li S( A  f3I)  1 S  1 . S(B  A)S  1 11 < 11 s (A  f3 I)  l s 1 11 11 S (B  A )s  1 11 .
•
This proves the theorem.
We now obtain, for diagonalisable matrices , analogues of some of the major perturbation bounds derived in earlier chapters for Hermitian ma trices and for normal matrices. Some auxiliary theorems about norms of commutators are proved first. Theorem VIII.3.4 Let A, B be Hermitian operators, and let r be a pos itive operator whose smallest eigenvalue is
1
{i. e. ,
r > 1 I > 0) .
III Ar  r B i l l > 1 III A  B ill
The n
(VIII. 20 )
for every unitarily invariant norm.
Proof.
Let T == Ar  r B, and let Y
Y
==
==
T + T * . Then
(A  B)r + r(A  B).
This is the Sylvester equation that we studied in Chapter 7. From Theorem VII . 2.12, we get
21 III A  B ill < IllY I ll < 2 I II TI I I
==
2 III Ar  r B il l 
This proves the theorem.
•
Corollary VIII.3.5 Let A, B be any two operators, and let r be a positive opemtor, r > 7! > 0. Then
III ( Ar  r B) EB (A* r  r B * ) lll > 'Y IIl (A  B ) EB (A  B) I l l
( VIII . 21)
for every unitarily invariant norm.
Proof.
( J. � )
This follows from (VIII.20) applied to the Hermitian operators • and ( �* � ) , and the positive operator ( � � ) .
VI I I . 3 Diagonalisable M atrices
235
Corollary VIII.3 . 6 · Let A . and B be unitary operators, and let r be a positive operator, r > 1I > 0 . Then, for every unitarily invariant norm, ( VIII. 22 )
III Ar  r B i ll > 1 II I A  B i l l Proof.
If A and B are unitary, then
sj ( Ar  rB)
==
sj ( rB*  A* r )
==
sj (A*r  rB* ) .
Thus the operator (Ar  r B) EB (A * r  r B * ) has the same singular values as those of Ar  r B, each counted twice. From this we see that ( VIII. 2 1 ) implies ( VIII.22 ) for all Ky Fan norms, and hence for all unitarily invariant • norms. Corollary VIII.3. 7 Let A, B be normal operators, and let r be a positive operator, r > rl > 0. Then
( VIII.23 ) Proof. Suppose that A is normal and its eigenvalues are a 1 , . . . , a n · Then (choosing an orthonormal basis in which A is diagonal) one sees that for every X
z ,J
If A, B are normal� then, applying this to ( � in place of X , one obtains
�)
in place of A and ( g
I I AX  X B l l 2 == II A * X  X B * 1 1 2 
�)
( VIII. 24 )
Using this, the inequality ( VIII.23 ) can be derived from ( VIII.2 1 ) .
•
A famous theorem (called the FugledePutnam Theorem, valid in Hilbert spaces of finite or infinite dimensions) says that if A and B are normal, then for any operator X, AX == XB if and only if A* X == X B* . The equality ( VIII.24 ) says much more than this.
normal A, B, the inequality (VIII. 23} is no t al wa ys true if the HilbertSchmidt norm is replaced by the operator norm. A numerical example i l lust t ing this is given below. Let
Example VIII.3.8 For r ==
ra
diag(0.6384 , 0. 6384, 1 .0000 ) ,
 0. 5205  0. 1642i 0. 1 299 + 0 . 1 709i 0. 2850  0. 1808i
0. 1042  0.3618i 0.42 1 8 + 0.4685i 0. 3850  0. 4257i
 0. 1326  0.0260i  0. 5692  0 .3178 i
0. 2973  0. 1 71 5i
VI I I .
236
B
==
S p ectral Vari ation of Nonnormal Matrices
 0 .6040 + 0 . 1 760i 0 . 0582 + 0. 2850i 0 .408 1  0. 3333i
0. 5 1 28  0 . 2865i 0.0 1 54 + 0.4497i
 0.072 1  0. 2545i
0 . 1 306 + 0.0 1 54i  0 . 500 1  0 . 2833i  0 . 2686 + 0.0247i
Then
l i Ar  r B I I = 0.8763. ! II A  E l l Theorem VIII. 3.4 and Corollary VIII.3. 7 are used in the proofs below.
Alternate proofs of both these results are sketched in the problems. These proofs do not draw on the results in Chapter 7. Theorem VIII.3.9 Let A, B be any two matrices such that A == S D1 S 1 , B == T D 2 T 1 , where S, T are invertible matrices and D 1 , D 2 are real diag onal matrices. Then
1 2 IIIEig 1 ( A)  Eig1 ( B) I I I < [cond ( S)cond (T)] 1 III A  B ill
(VIII.25)
for every unitarily invariant norm.
Proof. When A, B are Hermitian, this has already been proved; see (IV.62) . This special case will be used to prove the general result. We can write
Hence,
We could also write
and get
1 IIIT  1 SD 1  D2 T  S IIl
I I T  1 I I IIIA  B III I I S I I . Let s 1 T have the singular value decomposition s 1 T == Ur V. Then
<
!II D 1 ur v  urv D 2 III III U * n 1 ur  r v D2 V * 1 1 1 == IIIA ' r  r B ' lll ,
V D 2 V * are Hermitian matrices. Note that where A' U* D 1 U and B' r  1 s == V * r 1 U* . So, by the same argument , ==
==
We have, thus, two inequalities
n iii A  B il l > I I I A r  r B ' III , '
VI I I . 3 D iagonalisable M atrices
237
and
/311J A  B ill > III A ' r l  r 1 B ' lll , where a == II S 1 JI I I T I I , (3 IIT  1 I I II SI I  Combining these two inequalities and using the triangle inequality, we have
==
[ 1/2 (  ) 1 / 2 ] 2 > 0 implies that � The operator inequality ( � ) r!3
+ rf3 >
1
(a{3� 1; 2 I. Hence, by Theorem VIII.3.4, 2 lii A  B i ll >
1
III A '  B ' lll � 1 2 1 (n
But A' and B ' are Hermitian matrices with the same eigenvalues as those of A and B , respectively. Hence, by the result for Hermitian matrices that was mentioned at the beginning, Ili A '  B ' lll > III Eig 1 (A)  Eig 1 ( B ) III Combining the three inequalities above leads to (VIII. 25) .
•
Theorem VIII. 3 . 10 Let A, B be any two matrices such that A == SD 1 S 1 , B T D2T 1 , where S, T are invertible matrices and D 1 , D 2
==
are diagonal matrices. Then
1 2 d2 ( o( A ) , o (B ) ) < [cond(S)cond(T ) ] 1 I I A  B i b 
(VIII. 26)
Proof. When A, B are normal, this is j ust the HoffmanWielandt in equality; see (VI. 34) . The general case can be obtained from this using the inequality (VIII. 23) . The argument is the same as in the proof of the • preceding theorem. Theorems VIII.3.9 and VIII. 3 . 1 0 do reduce to the ones proved earlier for Hermitian and normal matrices. However, neither of them gives tight bounds. Even in the favourable case when A and B commute, the lefthand side of ( VIII. 24) is generally smaller than I l i A B i l l , and this is aggravated further by introducing the condition number coefficients.

Exercise VIII. 3 . 1 1 Let A and B be as in Theorem VIII. 3. 10. Suppose that all eigenvalues of A and B have modulus 1 . Show that (VIII. 27) d lll 111 ( a( A ) , a( B )) < [cond ( S)cond ( T )] 1 1 2 II I A B il l
;

for all unitarily invariant norms. For the special case of the operator norm, the factor ; above can be replaced by 1. [Hint: Use Corollary VIII. 3. 6 and the theorems on unitary matrices in Chapte r 6.}
238
VI I I .
VII I . 4
Spectral Variat io n of Nonnormal M atrices
Matrices with Real Eigenvalues
In this section we will consider a collection R of matrices that has two special properties: R is a real vector space and every element of R has only real eigenvalues. The set of all Hermitian matrices is an example of such a collection. Another example is given below. Such families of matrices arise in the study of vectorial hyperbolic differential equations. The behaviour of the eigenvalues of such a family has some similarities to that of Hermitian matrices. This is studied below. Example VIII.4. 1 Fix a block decomposition of matrices in which all di agonal blocks are square. Let R be the set of all matrices that are block upper triangular in this decomposition and whose diagonal blocks are Her mitian. Then R is a real vector space (of real dimension n2 ) and every element of R has real eigenvalues.
In this book we have called a matrix positive if it is Hermitian and all its eigenvalues are nonnegative. A matrix A will be called laxly positive if all eigenvalues of A are nonnegative. This will be written symbolically as 0 ≤_L A. If all eigenvalues of A are positive, we will say A is strictly laxly positive. We say A ≤_L B if B − A is laxly positive. We will see below that if R is a real vector space of matrices each of which has only real eigenvalues, then the laxly positive elements form a convex cone in R. So, the order ≤_L defines a partial order on R. Given two matrices A and B, we say that λ is an eigenvalue of A with respect to B if there exists a nonzero vector x such that Ax = λBx. Thus, eigenvalues of A with respect to B are the n roots of the equation det(A − λB) = 0. These are also called generalised eigenvalues.

Lemma VIII.4.2 Let A, B be two matrices such that every real linear combination of A and B has real eigenvalues. Suppose B is strictly laxly positive. Then for every real λ, A + λI has real eigenvalues with respect to B.
Proof. We have to show that for any real λ the equation

    det(A + λI − μB) = 0    (VIII.28)

is satisfied by n real μ. Let μ be any given real number. Then, by hypothesis, there exist n real λ that satisfy (VIII.28), namely the eigenvalues of μB − A. Denote these as φ_k(μ) and arrange them so that φ₁(μ) ≥ φ₂(μ) ≥ ··· ≥ φ_n(μ). We have

    det(A + λI − μB) = ∏_{k=1}^{n} (λ − φ_k(μ)).    (VIII.29)

By the results of Section VI.1, each φ_k(μ) is continuous as a function of μ. For large |μ|, (1/μ)(μB − A) is close to B. So, (1/μ)φ_k(μ) approaches λ_k^↓(B) as
μ → ∞, and λ_k^↑(B) as μ → −∞. Since B is strictly laxly positive, this implies that φ_k(μ) → ±∞ as μ → ±∞. So, every λ in ℝ is in the range of φ_k for each k = 1, 2, . . . , n. Thus, for each λ, there exist n real μ that satisfy (VIII.28). ∎
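The lemma is easy to test in its simplest instance, a Hermitian pencil. The sketch below is an editorial illustration, not from the book; it assumes NumPy. With A Hermitian and B positive definite (hence strictly laxly positive), the eigenvalues of A + λI with respect to B are the ordinary eigenvalues of B⁻¹(A + λI), and they should all be real.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X = rng.standard_normal((n, n)); A = (X + X.T) / 2          # Hermitian
Y = rng.standard_normal((n, n)); B = Y @ Y.T + np.eye(n)    # positive definite

worst = 0.0
for lam in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    # eigenvalues of A + lam*I with respect to B:
    # the roots of det(A + lam*I - mu*B) = 0, here via B^{-1}(A + lam*I)
    mus = np.linalg.eigvals(np.linalg.solve(B, A + lam * np.eye(n)))
    worst = max(worst, np.max(np.abs(mus.imag)))
print(worst)
```

Since B⁻¹(A + λI) is similar to the Hermitian matrix B^{−1/2}(A + λI)B^{−1/2}, the imaginary parts are zero up to rounding.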
Proposition VIII.4.3 Let A, B be two matrices such that every real linear combination of A and B has real eigenvalues. Suppose A is (strictly) laxly negative. Then every eigenvalue of A + iB has (strictly) negative real part.

Proof. Let μ = μ₁ + iμ₂ be an eigenvalue of A + iB. Then det(A + iB − μ₁I − iμ₂I) = 0. Multiply this by iⁿ to get

    det[(−B + μ₂I) + i(A − μ₁I)] = 0.

So the matrix −B + μ₂I has an eigenvalue −i with respect to the matrix A − μ₁I, and it has an eigenvalue i with respect to the matrix −(A − μ₁I). By hypothesis, every real linear combination of A − μ₁I and −B + μ₂I has real eigenvalues. Hence, by Lemma VIII.4.2, A − μ₁I cannot be either strictly laxly positive or strictly laxly negative. In other words,

    λ_n^↓(A) ≤ μ₁ ≤ λ_1^↓(A).

This proves the proposition. ∎

Exercise VIII.4.4 With notations as in the above proof, show that λ_n^↓(B) ≤ μ₂ ≤ λ_1^↓(B).
Theorem VIII.4.5 Let R be a real vector space whose elements are matrices with real eigenvalues. Let A, B ∈ R and let A ≤_L B. Then λ_k^↓(A) ≤ λ_k^↓(B) for k = 1, 2, . . . , n.

Proof. We will prove a more general statement: if A, B ∈ R and 0 ≤_L B, then λ_k^↓(A + μB) is a monotonically increasing function of the real variable μ. It is enough to prove this when 0 <_L B; the general case follows by continuity. In the notation of the proof of Lemma VIII.4.2 (applied with −A in place of A), λ_k^↓(A + μB) = φ_k(μ). Suppose φ_k(μ) decreases in some interval. Then we can choose a real number λ such that λ − φ_k(μ) increases from a negative to a positive value in this interval. Since φ_k(μ) → ±∞ as μ → ±∞, λ − φ_k(μ) vanishes for at least three values of μ. So, in the representation (VIII.29) this factor contributes at least three zeroes. The remaining factors contribute at least one zero each. So, for this λ, the equation (VIII.28) has at least n + 2 roots μ. This is impossible. ∎

Theorem VIII.4.6 Let R be a real vector space whose elements are matrices with real eigenvalues. Let A, B ∈ R. Then

    λ_k^↓(A) + λ_n^↓(B) ≤ λ_k^↓(A + B) ≤ λ_k^↓(A) + λ_1^↓(B)    (VIII.30)
for k = 1, 2, . . . , n.
Proof. The matrix B − λ_n^↓(B)I is laxly positive. So, by the argument in the proof of the preceding theorem, λ_k^↓(A + μB) − μλ_n^↓(B) is a monotonically increasing function of μ. Choose μ = 0, 1 to get the first inequality in (VIII.30). The same argument shows that λ_k^↓(A + μB) − μλ_1^↓(B) is a monotonically decreasing function of μ. This leads to the second inequality. ∎
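The monotonicity used in these two proofs can be observed directly. The sketch below is an editorial illustration, not from the book; it assumes NumPy and uses the Hermitian family, the simplest instance of such a space R. For Hermitian A and positive semidefinite C (so 0 ≤_L C), each curve μ ↦ λ_k^↓(A + μC) should be nondecreasing.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
X = rng.standard_normal((n, n)); A = (X + X.T) / 2   # Hermitian
Y = rng.standard_normal((n, n)); C = Y @ Y.T         # positive semidefinite

mus = np.linspace(-3.0, 3.0, 25)
# Row j holds the sorted eigenvalues of A + mus[j]*C.
curves = np.array([np.linalg.eigvalsh(A + mu * C) for mu in mus])

# Each eigenvalue curve should be nondecreasing in mu.
diffs = np.diff(curves, axis=0)
print(diffs.min())
```

A negative value here (beyond rounding error) would contradict the theorem.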
Corollary VIII.4.7 On the vector space R, the function λ_1^↓(A) is convex and the function λ_n^↓(A) is concave in the argument A.

Theorem VIII.4.8 Let A and B be two matrices such that all real linear combinations of A and B have real eigenvalues. Then

    max_{1≤k≤n} |λ_k^↓(A) − λ_k^↓(B)| ≤ spr(A − B) ≤ ‖A − B‖.    (VIII.31)
Proof. Let R be the real vector space generated by A and B. By Theorem VIII.4.6,

    λ_k^↓(A) + λ_n^↓(B − A) ≤ λ_k^↓(B) ≤ λ_k^↓(A) + λ_1^↓(B − A).

So,

    |λ_k^↓(B) − λ_k^↓(A)| ≤ max(|λ_1^↓(B − A)|, |λ_n^↓(B − A)|) = spr(A − B) ≤ ‖A − B‖. ∎

Note that Weyl's Perturbation Theorem is included in this as a special case.
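The Hermitian special case (Weyl's Perturbation Theorem) is easily checked. The sketch below is an editorial illustration, not from the book; it assumes NumPy. It compares the three quantities in (VIII.31) for a random Hermitian pair.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
X = rng.standard_normal((n, n)); A = (X + X.T) / 2
Y = rng.standard_normal((n, n)); B = (Y + Y.T) / 2

lamA = np.sort(np.linalg.eigvalsh(A))[::-1]   # decreasing enumeration
lamB = np.sort(np.linalg.eigvalsh(B))[::-1]

lhs = np.max(np.abs(lamA - lamB))
spr = np.max(np.abs(np.linalg.eigvals(A - B)))  # spectral radius of A - B
opnorm = np.linalg.norm(A - B, 2)               # operator norm

print(lhs, spr, opnorm)
```

For Hermitian matrices the spectral radius and the operator norm of A − B coincide, so the middle and right quantities agree up to rounding.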
Exercise VIII.4.9 Show that if only A, B, and A + B are assumed to have real eigenvalues, then the inequality (VIII.31) might not be true.
VIII.5  Eigenvalues with Symmetries

We have remarked earlier that the exponent 1/n occurring in the bound (VIII.8) is unavoidable. However, if A and B are restricted to some special classes, this can be improved. In this section we identify some useful classes of matrices where this exponent can be improved (though not eliminated altogether). These are matrices whose eigenvalues appear as pairs ±λ or, more generally, as tuples {λ, ωλ, . . . , ω^{p−1}λ}, where ω is a pth root of unity. We will give interesting examples of large classes of such matrices, and then show how this symmetric distribution of their eigenvalues can be exploited to get better bounds.
Example VIII.5.1 Let Aᵀ denote the transpose of a matrix A. A complex matrix is called symmetric if Aᵀ = A and skew-symmetric if Aᵀ = −A. If A is a skew-symmetric matrix, then λ is an eigenvalue of A if and only if −λ is. The class of all such matrices forms a Lie algebra. This is the Lie algebra associated with the complex orthogonal group.

Example VIII.5.2 If Aᵀ is similar to −A, then, clearly, λ is an eigenvalue of A if and only if −λ is. The Lie algebra corresponding to the symplectic Lie group contains matrices that have this property. Let n be an even number, n = 2r. Let

    J = ( 0  I )
        ( I  0 ) ,

where I is the identity matrix of order r. Let A be an n × n matrix such that Aᵀ = −JAJ⁻¹. It is easy to see that we can then write

    A = ( A₁   A₂   )
        ( A₃  −A₁ᵀ ) ,

where A₁, A₂, A₃ are r × r matrices of which A₂ and A₃ are skew-symmetric. The collection of all such matrices is the Lie algebra associated with the symplectic group.
Example VIII.5.3 Let X be a matrix of order n = pr having the special form

    X = ( 0    A₁   0    ···   0       )
        ( 0    0    A₂   ···   0       )
        ( ⋮              ⋱    ⋮       )
        ( 0    0    0    ···   A_{p−1} )
        ( A_p  0    0    ···   0       ) ,

where A₁, . . . , A_p are matrices of order r. Let Y = diag(I_r, ωI_r, . . . , ω^{p−1}I_r), where ω is the primitive pth root of unity. Then Y⁻¹XY = ωX. So, if λ is an eigenvalue of X, then so are ωλ, ω²λ, . . . , ω^{p−1}λ.
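The similarity Y⁻¹XY = ωX and the resulting symmetry of the spectrum can be verified numerically. The sketch below is an editorial illustration, not from the book; it assumes NumPy, with p = 3 and r = 2 chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(5)
p, r = 3, 2
n = p * r
w = np.exp(2j * np.pi / p)   # primitive p-th root of unity

A_blocks = [rng.standard_normal((r, r)) for _ in range(p)]
X = np.zeros((n, n), dtype=complex)
for i in range(p - 1):                        # A_1, ..., A_{p-1} on the superdiagonal
    X[i*r:(i+1)*r, (i+1)*r:(i+2)*r] = A_blocks[i]
X[(p-1)*r:, :r] = A_blocks[p-1]               # A_p in the bottom-left corner

Y = np.diag(np.repeat(w ** np.arange(p), r))
sim_ok = np.allclose(np.linalg.solve(Y, X @ Y), w * X)   # Y^{-1} X Y = w X

eigs = np.linalg.eigvals(X)
# The spectrum should be invariant under multiplication by w.
gap = max(min(abs(w * lam - eigs)) for lam in eigs)
print(sim_ok, gap)
```

`gap` measures how far ω·σ(X) is from σ(X); it should vanish up to rounding.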
Exercise VIII.5.4 Let

    Z = ( R    A₁ )
        ( A₂  −R ) ,

and suppose R commutes with A₁. Show that tr Z^k = 0 if k is odd. Use this to show that λ is an eigenvalue of Z if and only if −λ is.
Exercise VIII.5.5 Let ω be the primitive pth root of unity. If X, Y are two matrices such that XY = ωYX, then (X + Y)^p = X^p + Y^p.

Exercise VIII.5.6 Let Z be a matrix of order n = pr having the special form

    Z = ( R     A₁    0     ···   0        )
        ( 0     ωR    A₂    ···   0        )
        ( ⋮                 ⋱    ⋮        )
        ( 0     0     ···  ω^{p−2}R  A_{p−1} )
        ( A_p   0     ···   0    ω^{p−1}R  ) ,

where R commutes with A₁, A₂, . . . , A_p. Use the result of the preceding exercise to show that tr Z^k = 0 if k is not an integral multiple of p. Use this to show that if λ is an eigenvalue of Z, then so are ωλ, ω²λ, . . . , ω^{p−1}λ. (This is true even when R commutes with A₁, . . . , A_{p−1} only.)

For brevity, an n-tuple will be called p-Carrollian if n = pr and the elements of the tuple can be enumerated as

    (a₁, . . . , a_r, ωa₁, . . . , ωa_r, . . . , ω^{p−1}a₁, . . . , ω^{p−1}a_r),    (VIII.32)

where ω is the primitive pth root of unity. We have seen above several examples of matrices whose eigenvalues are p-Carrollian.
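The identity of Exercise VIII.5.5 can be tested on a standard pair satisfying XY = ωYX, the clock and shift matrices. The sketch below is an editorial illustration, not from the book; it assumes NumPy.

```python
import numpy as np

p = 5
w = np.exp(2j * np.pi / p)

# Clock and shift matrices: a standard pair with X Y = w Y X.
X = np.diag(w ** np.arange(p))        # clock: diag(1, w, ..., w^{p-1})
Y = np.roll(np.eye(p), 1, axis=0)     # shift: Y e_j = e_{j+1 mod p}

rel_ok = np.allclose(X @ Y, w * (Y @ X))

lhs = np.linalg.matrix_power(X + Y, p)
rhs = np.linalg.matrix_power(X, p) + np.linalg.matrix_power(Y, p)
print(rel_ok, np.abs(lhs - rhs).max())
```

Here X^p = Y^p = I, so the exercise predicts (X + Y)^p = 2I; the printed residual should vanish up to rounding.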
Exercise VIII.5.7 Let s_k, 1 ≤ k ≤ n, denote the elementary symmetric polynomials in n variables. If (α₁, . . . , α_n) is a Carrollian n-tuple written in the form (VIII.32), show that, modulo a sign factor, we have

    s_k(α₁, . . . , α_n) = 0 if k ≠ jp,   s_k(α₁, . . . , α_n) = s_j(α₁^p, . . . , α_r^p) if k = jp.

Use this to show that if α₁, . . . , α_n are roots of the polynomial

    f(z) = zⁿ + a₁z^{n−1} + ··· + a_n,

then α₁^p, . . . , α_r^p are roots of the polynomial

    F(z) = z^r + a_p z^{r−1} + a_{2p} z^{r−2} + ··· + a_{rp}.
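Both claims of this exercise can be confirmed numerically. The sketch below is an editorial illustration, not from the book; it assumes NumPy, with p = 3 and r = 2 chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(6)
p, r = 3, 2
n = p * r
w = np.exp(2j * np.pi / p)

alpha = rng.standard_normal(r) + 1j * rng.standard_normal(r)   # alpha_1, ..., alpha_r
roots = np.concatenate([w**j * alpha for j in range(p)])       # a p-Carrollian n-tuple

f = np.poly(roots)   # coefficients [1, a_1, ..., a_n] of prod (z - root)

# Coefficients a_k vanish unless p divides k ...
coef_ok = all(abs(f[k]) < 1e-8 for k in range(1, n + 1) if k % p)

# ... and F(z) = z^r + a_p z^{r-1} + ... + a_{rp} has roots alpha_i^p.
F = f[::p]                     # picks out 1, a_p, a_2p, ..., a_rp
resid = np.abs(np.polyval(F, alpha**p)).max()
print(coef_ok, resid)
```

Indeed f(z) = ∏ᵢ(z^p − αᵢ^p), so f is a polynomial in z^p, which is what the two checks exhibit.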
Proposition VIII.5.8 Let f, g be monic polynomials of degree n as in (VIII.10). Suppose n = pr and the roots of f and g both are p-Carrollian. Let γ be as in (VIII.11). Then the roots of f and g can be labelled as α₁, . . . , α_n and β₁, . . . , β_n in such a way that

    max_j |α_j^p − β_j^p| ≤ 4 [ Σ_{k=1}^{r} |a_{kp} − b_{kp}| γ^{p(r−k)} ]^{1/r}.    (VIII.33)

Proof. Use Theorem VIII.2.4 and Exercise VIII.5.7. ∎

Theorem VIII.5.9 Let n = pr and let A, B be two n × n matrices whose eigenvalues are p-Carrollian. Then

    d(σ(A^p), σ(B^p)) ≤ 4 c_{r,p} M^{p−1/r} ‖A − B‖^{1/r},    (VIII.34)

where M = max(‖A‖, ‖B‖) and

    c_{r,p} = [ Σ_{k=1}^{r} kp C(pr, kp) ]^{1/r},    (VIII.35)

C(pr, kp) denoting a binomial coefficient.
Proof. See Exercise VIII.2.6 and use the preceding proposition. ∎
The two results above give bounds not on the distance between the roots themselves but on that between their pth powers. If all of them are outside a neighbourhood of zero, a bound on the distance between the roots can be obtained from this. This needs the following lemma.
Lemma VIII.5.10 Let x, y be complex numbers such that |x| ≥ ρ, |y| ≥ ρ, and |x^p − y^p| ≤ C. Then, for some k, 0 ≤ k ≤ p − 1,

    |x − ω^k y| ≤ C/ρ^{p−1},    (VIII.36)

where ω is the primitive pth root of unity.
Proof. Compare the coefficients of t in the identity

    ∏_{k=0}^{p−1} [t − (x − ω^k y)] = (−1)^p [(x − t)^p − y^p]

to see that

    s_{p−1}(x − y, x − ωy, . . . , x − ω^{p−1}y) = (−1)^{p−1} p x^{p−1}.

The right-hand side has modulus larger than pρ^{p−1}, and the left-hand side is a sum of p terms. Hence, at least one of them should have modulus larger than ρ^{p−1}. So, there exists k, 0 ≤ k ≤ p − 1, such that

    ∏_{j≠k} |x − ω^j y| ≥ ρ^{p−1}.

But

    ∏_{j=0}^{p−1} |x − ω^j y| = |x^p − y^p| ≤ C.

This proves the lemma. ∎

When p = 2, the inequality (VIII.36) can be strengthened. To see this note that

    |x − y|² + |x + y|² = 2(|x|² + |y|²) ≥ 4ρ².

So, either |x − y| or |x + y| must be larger than 2^{1/2}ρ. Consequently, one of them must be smaller than C/2^{1/2}ρ. Thus if the eigenvalues of A and B are p-Carrollian and all have modulus larger than ρ, then d(σ(A), σ(B)) is bounded by C/ρ^{p−1}, where C is the quantity on the right-hand side of (VIII.34). When p = 2, this bound can be improved further to C/2^{1/2}ρ. The major improvement over the bounds obtained in Section 2 is that now the bounds involve ‖A − B‖^{1/r} instead of ‖A − B‖^{1/n}. For low values of p, the factors c_{r,p} can be evaluated explicitly.
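Lemma VIII.5.10 can be stress-tested by random sampling. The sketch below is an editorial illustration, not from the book; it assumes NumPy. For each sampled pair (x, y) with |x|, |y| ≥ ρ it checks that some kth rotation ω^k y lies within C/ρ^{p−1} of x, where C = |x^p − y^p|.

```python
import numpy as np

rng = np.random.default_rng(7)
p = 4
w = np.exp(2j * np.pi / p)
rho = 0.5

worst_ratio = 0.0
for _ in range(200):
    mx, my = rng.uniform(rho, 3.0, size=2)          # moduli at least rho
    tx, ty = rng.uniform(0.0, 2.0 * np.pi, size=2)  # random phases
    x = mx * np.exp(1j * tx)
    y = my * np.exp(1j * ty)
    C = abs(x**p - y**p)
    best = min(abs(x - w**k * y) for k in range(p))
    worst_ratio = max(worst_ratio, best / (C / rho**(p - 1)))
print(worst_ratio)
```

The lemma asserts that the printed ratio never exceeds 1.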
VIII.6  Problems

Problem VIII.6.1. Let f(z) = zⁿ + a₁z^{n−1} + ··· + a_n be a monic polynomial. Let μ₁, . . . , μ_n be the numbers |a_k|^{1/k}, 1 ≤ k ≤ n, rearranged in decreasing order. Show that all the roots of f are bounded (in absolute value) by μ₁ + μ₂. This is an improvement on the result of Lemma VIII.2.1.
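This root bound is easy to probe numerically. The sketch below is an editorial illustration, not from the book; it assumes NumPy. It samples random monic polynomials and records by how much the largest root modulus exceeds μ₁ + μ₂ (it never should).

```python
import numpy as np

rng = np.random.default_rng(8)
excess = []
for _ in range(100):
    n = int(rng.integers(2, 8))
    a = rng.standard_normal(n) * float(rng.uniform(0.0, 3.0))
    coeffs = np.concatenate([[1.0], a])   # f(z) = z^n + a_1 z^{n-1} + ... + a_n
    roots = np.roots(coeffs)
    mu = np.sort(np.abs(a) ** (1.0 / np.arange(1, n + 1)))[::-1]
    excess.append(np.abs(roots).max() - (mu[0] + mu[1]))
worst = max(excess)
print(worst)   # nonpositive up to rounding
```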
Problem VIII.6.2. Fill in the details in the following alternate proof of Theorem VIII.3.1. Let β be an eigenvalue of B but not of A. If Bx = βx, then

    x = (βI − A)⁻¹(B − A)x = S(βI − D)⁻¹S⁻¹(B − A)x.

Hence,

    ‖x‖ ≤ cond(S) ‖B − A‖ ‖(βI − D)⁻¹‖ ‖x‖.

From this it follows that

    min_j |β − α_j| ≤ cond(S) ‖B − A‖.

Notice that this proof too relies only on those properties of ‖·‖ that are shared by many other norms (like the ones induced by the p-norms on ℂⁿ). See the remark following Theorem VIII.3.1.
Problem VIII.6.3. Let B be any matrix with entries b_ij. The disks

    D_i = { z : |z − b_ii| ≤ Σ_{j≠i} |b_ij| },   1 ≤ i ≤ n,

are called the Gersgorin disks of B. The Gersgorin Disk Theorem says that

    σ(B) ⊂ ∪_{i=1}^{n} D_i,

and that any connected component of the set ∪_i D_i contains as many eigenvalues of B as the number of disks that form this component. The proof of this is outlined below.

Consider the vector norm ‖x‖_∞ = max_i |x_i| on ℂⁿ. The norm it induces on operators is

    ‖A‖_{∞→∞} = max_i Σ_{j=1}^{n} |a_ij|.

Let D be the diagonal of B, and let H = B − D. Let β be an eigenvalue of B but not of D. Then

    βI − B = βI − H − D = (βI − D)[I − (βI − D)⁻¹H].

Since βI − B is not invertible, neither is the matrix in the square brackets. Hence,

    1 ≤ ‖(βI − D)⁻¹H‖_{∞→∞}.

From this, the first part of the theorem follows. The second part follows from the continuity argument we have used often. Let B(t) = D + tH, 0 ≤ t ≤ 1. Then B(0) = D, B(1) = B; the eigenvalues of B(t) trace continuous curves that join the eigenvalues of D to those of B. Note that the proof of the first part is very similar to that of Theorem VIII.3.3; in fact, it is a special case of the earlier one.
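The first part of the theorem is straightforward to verify in code. The sketch below is an editorial illustration, not from the book; it assumes NumPy. It forms the Gersgorin disks of a random complex matrix and checks that every eigenvalue lies in at least one of them.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 6
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

centers = np.diag(B)
radii = np.abs(B).sum(axis=1) - np.abs(centers)   # off-diagonal row sums

eigs = np.linalg.eigvals(B)
# Every eigenvalue must lie in at least one Gersgorin disk.
in_some_disk = [np.any(np.abs(z - centers) <= radii + 1e-10) for z in eigs]
print(in_some_disk)
```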
Problem VIII.6.4. Given any matrix A, we can find a unitary U such that

    U*AU = T = D + N,

where T is upper triangular, D is diagonal, and N is strictly upper triangular and, hence, nilpotent. Such a reduction is not unique. The measure of nonnormality of A is defined as

    Δ(A) = inf ‖N‖,

where the infimum is taken over all N that occur in the possible triangular forms of A given above. Now let B be any other matrix, and let β be an eigenvalue of B but not of A. From (VIII.19) we have

    ‖(D − βI + N)⁻¹‖⁻¹ ≤ ‖A − B‖.

Show that

    (D − βI + N)⁻¹ = [I + (D − βI)⁻¹N]⁻¹ (D − βI)⁻¹
                   = [I − (D − βI)⁻¹N + {(D − βI)⁻¹N}² + ··· + (−1)^{n−1}{(D − βI)⁻¹N}^{n−1}] (D − βI)⁻¹.

Let δ = dist(β, σ(A)). From this equation and the inequality before it conclude that

    δ ≤ ‖A − B‖ { 1 + Δ(A)/δ + (Δ(A)/δ)² + ··· + (Δ(A)/δ)^{n−1} }.

Now show that

    δⁿ / (δ^{n−1} + δ^{n−2}Δ(A) + ··· + Δ(A)^{n−1}) ≤ ‖A − B‖.

This is Henrici's Theorem. Let f(t) = tⁿ/(1 + t + ··· + t^{n−1}). Then f(t) is close to tⁿ for small values of t, and to t for large values of t. Thus, when Δ(A) is close to 0, i.e., when A is close to being normal, the above bound leads to the asymptotic inequality

    s(σ(B), σ(A)) ≲ ‖A − B‖.

In Theorem VI.3.3 we saw that if A is normal, then s(σ(B), σ(A)) ≤ ‖A − B‖.
Problem VIII.6.5. Let ν be any norm on the space of matrices. The ν-measure of nonnormality of A is defined as

    Δ_ν(A) = inf ν(N),

where N is as in Problem VIII.6.4. Suppose that the norm ν is such that ‖A‖ ≤ ν(A) for all A. Show that Δ(A) in Henrici's Theorem can be replaced by Δ_ν(A).

Problem VIII.6.6. For the Hilbert–Schmidt norm ‖·‖₂, the measure of nonnormality satisfies the inequality

    Δ₂²(A) ≤ [(n³ − n)/12]^{1/2} ‖A*A − AA*‖₂

for every n × n matrix A. (The proof is a little intricate.)

Problem VIII.6.7. Let A have the Jordan canonical form J = SAS⁻¹. Let m be the size of the largest Jordan block in J. Let B be any other matrix. Show that for every eigenvalue β of B there is an eigenvalue α of A such that
Problem VIII.6.8. Let A, B, Γ be as in Theorem VIII.3.4. Let (A − B)x_j = λ_j x_j, where the vectors x_j are orthonormal and the eigenvalues λ_j are indexed in such a way that s_j := s_j(A − B) = |λ_j|. Let y_j be the orthonormal vectors that satisfy the relations (A − B)x_j = s_j y_j. Note that y_j = ±x_j. Note also that the difference of AΓ − ΓB and (A − B)Γ is skew-Hermitian. Use this to show that, for 1 ≤ k ≤ n,

    Re Σ_{j=1}^{k} ⟨x_j, (AΓ − ΓB)y_j⟩ = Re Σ_{j=1}^{k} ⟨x_j, (A − B)Γ y_j⟩
                                       = Σ_{j=1}^{k} s_j ⟨y_j, Γ y_j⟩ ≥ Σ_{j=1}^{k} s_j.

Use this to give an alternate proof of Theorem VIII.3.4. (See Problem III.6.6.)
Problem VIII.6.9. Fill in the details in the following proof of Corollary VIII.3.7. Let D = Γ − ½I. Then

    ‖AΓ − ΓB‖₂² = ‖(AD − DB) + ½(A − B)‖₂²
                = ‖AD − DB‖₂² + ¼‖A − B‖₂² + Re tr (AD − DB)*(A − B).

So, it suffices to show that the last term is positive. This can be seen by writing

    2 Re tr (AD − DB)*(A − B) = tr {(AD − DB)*(A − B) + (A − B)*(AD − DB)}

and then using cyclicity of the trace to reduce this to

    tr D[(A − B)*(A − B) + (A − B)(A − B)*].
Problem VIII.6.10. (i) Let X be a contractive matrix; i.e., let ‖X‖ ≤ 1. Show that there exist unitaries U and V such that X = ½(U + V). Use this to show that if D₁ and D₂ are real diagonal matrices, then

    |||D₁X − XD₂||| ≤ |||D₁^↓ − D₂^↑|||

for every unitarily invariant norm, where D^↓ (respectively D^↑) denotes the diagonal matrix whose entries are those of D rearranged in decreasing (increasing) order. [See (IV.62).]

(ii) Let A = SD₁S⁻¹, B = TD₂T⁻¹, where S and T are invertible matrices and D₁, D₂ are real diagonal matrices. Show that

    |||A − B||| ≤ cond(S) cond(T) |||Eig^↓(A) − Eig^↑(B)|||.
Problem VIII.6.11. Let A and B be any two diagonalisable matrices with eigenvalues λ₁, . . . , λ_n and μ₁, . . . , μ_n, respectively. Let A = SD₁S⁻¹, B = TD₂T⁻¹, where S and T are invertible matrices and D₁, D₂ are diagonal matrices. Show that

    ‖A − B‖₂ ≤ cond(S) cond(T) max_π ( Σ_i |λ_i − μ_{π(i)}|² )^{1/2},

where π varies over all permutations on n symbols. [See Theorem VI.4.1.]
Problem VIII.6.12. Let A be a Hermitian matrix with eigenvalues α₁, . . . , α_n. Let B be any other matrix. For 1 ≤ j ≤ n, let

    D_j = { z : |z − α_j| ≤ ‖A − B‖, |Im z| ≤ ‖Im(A − B)‖ }.

The regions D_j are disks flattened on the top and bottom by horizontal lines. Show that the eigenvalues of B are contained in ∪_j D_j, and that each connected component of this set contains as many eigenvalues of A as of B.
Problem VIII.6.13. Let R be a real vector space whose elements are matrices with real eigenvalues. Show that the function Σ_{j=1}^{k} λ_j^↓(A) is a convex function of A on this space for 1 ≤ k ≤ n. Show that the function Σ_{j=1}^{k} λ_j^↑(A) is concave on R.
Problem VIII.6.14. If R₁ is invertible, then

    ( R₁  A₁ ) ( I  −R₁⁻¹A₁ )   =   ( R₁  0              )
    ( A₂  R₂ ) ( 0   I      )       ( A₂  R₂ − A₂R₁⁻¹A₁ ).

Use this to show that if

    Z = ( R    A₁ )
        ( A₂  −R )

and R commutes with A₁, then Z and −Z have the same eigenvalues. (Show that they have the same characteristic polynomials.) This gives another proof of the statement at the end of Exercise VIII.5.6, for p = 2. The same method works for p > 2. For instance, the case p = 3 is dealt with as follows. If R₁, R₂ are invertible, then

    ( R₁  A₁  0  ) ( I  −R₁⁻¹A₁   R₁⁻¹A₁R₂⁻¹A₂ )   =   ( R₁    0            0                    )
    ( 0   R₂  A₂ ) ( 0   I        −R₂⁻¹A₂      )       ( 0     R₂           0                    )
    ( A₃  0   R₃ ) ( 0   0         I           )       ( A₃   −A₃R₁⁻¹A₁    R₃ + A₃R₁⁻¹A₁R₂⁻¹A₂ ).

Derive similar factorisations for p > 3, and use this to prove the statement at the end of Exercise VIII.5.6.
VIII.7  Notes and References
Many of the topics in this chapter have been presented earlier in R. Bhatia, Perturbation Bounds for Matrix Eigenvalues, Longman, 1987, and in G.W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, 1990. Some results that were proved after the publication of these books have, of course, been included here.

The first major results on perturbation of roots of polynomials were proved by A. Ostrowski, Recherches sur la méthode de Graeffe et les zéros des polynômes et des séries de Laurent, Acta Math., 72 (1940) 99–257. See also Appendices A and B of his book Solution of Equations and Systems of Equations, Academic Press, 1960. Theorem VIII.2.2 is due to Ostrowski. Using this he proved an inequality weaker than (VIII.15); this had a factor (2n − 1) instead of 4. The argument used by him is the one followed in Exercise VIII.1.3. Ostrowski was also the first to derive perturbation bounds for eigenvalues of arbitrary matrices in his paper Über die Stetigkeit von charakteristischen Wurzeln in Abhängigkeit von den Matrizenelementen, Jber. Deut. Math.-Verein., 60 (1957) 40–42. See also Appendix K of his book cited above. The inequality he proved involved the matrix norm ‖A‖_L = Σ_{i,j} |a_ij|, which is easy to compute but is not unitarily invariant. With this norm, his inequality is like the one in (VIII.4).

An inequality for d(σ(A), σ(B)) in terms of the unitarily invariant Hilbert–Schmidt norm was proved by R. Bhatia and K.K. Mukherjea, On the rate of change of spectra of operators, Linear Algebra Appl., 27 (1979) 147–157. They followed the approach in Exercise VIII.2.6 and, after a little tidying up, their result looks like (VIII.4) but with the larger norm ‖·‖₂ instead of ‖·‖. This approach was followed, to a greater success, in R. Bhatia and S. Friedland, Variation of Grassmann powers and spectra, Linear Algebra Appl., 40 (1981) 1–18. In this paper, the norm ‖·‖ was used and an inequality slightly weaker than (VIII.4) was proved. An improvement of these inequalities in which (2n − 1) is replaced by n was made by L. Elsner, On the variation of the spectra of matrices, Linear Algebra Appl., 47 (1982) 127–138. The major insightful observation was that the Matching Theorem does not exploit the symmetry between the polynomials f and g, nor the matrices A and B, under consideration. Theorem VIII.1.1 is also due to L. Elsner, An optimal bound for the spectral variation of two matrices, Linear Algebra Appl., 71 (1985) 77–80. The argument using Chebyshev polynomials, that we have employed in Sections VIII.1 and VIII.2, seems to have been first used by A. Schönhage, Quasi-GCD computations, J. Complexity, 1 (1985) 118–137. (See Theorem 2.7 of this paper.) It was discovered independently by D. Phillips, Improving spectral variation bounds with Chebyshev polynomials, Linear Algebra Appl., 133 (1990) 165–173. Phillips proved a weaker inequality than (VIII.8)
with a factor 8 instead of 4. This argument was somewhat simplified and used again by R. Bhatia, L. Elsner, and G. Krause, Bounds for the variation of the roots of a polynomial and the eigenvalues of a matrix, Linear Algebra Appl., 142 (1990) 195–209. Theorems VIII.1.5 and VIII.2.4 (and their proofs) have been taken from this paper. Using finer results from Chebyshev approximation, G. Krause has shown that the factor 4 occurring in these inequalities can be replaced by 3.08. See his paper Bounds for the variation of matrix eigenvalues and polynomial roots, Linear Algebra Appl., 208/209 (1994) 73–82. It was shown by Bhatia, Elsner, and Krause in the paper cited above that, in the inequality (VIII.15), the factor 4 cannot be replaced by anything smaller than 2.

Theorems VIII.3.1 and VIII.3.3 were proved in the very influential paper, F.L. Bauer and C.T. Fike, Norms and exclusion theorems, Numer. Math., 2 (1960) 137–141. See the discussion in Stewart and Sun, p. 177. The basic idea behind results in Section VIII.3 from Theorem VIII.3.4 onwards is due to W. Kahan, Inclusion theorems for clusters of eigenvalues of Hermitian matrices, Technical Report, Computer Science Department, University of Toronto, 1967. Theorem VIII.3.4 for the special case of the operator norm is proved in this report. The inequality (VIII.23) is due to J.-G. Sun, On the perturbation of the eigenvalues of a normal matrix, Math. Numer. Sinica, 6 (1984) 334–336. The ideas of Kahan's and Sun's proofs are outlined in Problems VIII.6.8 and VIII.6.9. Theorem VIII.3.4, in its generality, was proved in R. Bhatia, C. Davis, and F. Kittaneh, Some inequalities for commutators and an application to spectral variation, Aequationes Math., 41 (1991) 70–78. The three corollaries were also proved there. These authors then used their commutator inequalities to derive weaker versions of Theorems VIII.3.9 and VIII.3.10; in all these, the square root in the inequalities (VIII.25) and (VIII.26) is missing. For the operator norm alone, the inequality (VIII.25) was proved by T.-X. Lu, Perturbation bounds for eigenvalues of symmetrizable matrices, Numerical Mathematics: a Journal of Chinese Universities, 16 (1994) 177–185 (in Chinese). The inequalities (VIII.25)–(VIII.27) have been proved recently by R. Bhatia, F. Kittaneh, and R.-C. Li, Some inequalities for commutators and an application to spectral variation II, Linear and Multilinear Algebra, to appear. The inequality in Problem VIII.6.10 was proved in R. Bhatia, L. Elsner, and G. Krause, Spectral variation bounds for diagonalisable matrices, Preprint 94-098, SFB 343, University of Bielefeld. Example VIII.3.8 (and another example illustrating the same phenomenon for the trace norm) was constructed in this paper. The inequality in Problem VIII.6.11 was found by L. Elsner and S. Friedland, Singular values, doubly stochastic matrices and applications, Linear Algebra Appl., 220 (1995) 161–169.

The results of Section VIII.4 were discovered by P.D. Lax, Differential equations, difference equations and matrix theory, Comm. Pure Appl. Math., 11 (1958) 175–194. Lax was motivated by the theory of linear partial
differential equations of hyperbolic type, and his proofs used techniques from this theory. The paper of Lax was followed by one by H.F. Weinberger, Remarks on the preceding paper of Lax, Comm. Pure Appl. Math., 11 (1958) 195–196. He gave simple matrix-theoretic proofs of these theorems, which we have reproduced here. L. Gårding later pointed out that these are special cases of his results for hyperbolic polynomials that appeared in his papers Linear hyperbolic partial differential equations with constant coefficients, Acta Math., 84 (1951) 1–62, and An inequality for hyperbolic polynomials, J. Math. Mech., 8 (1959) 957–966. A characterisation of the kind of spaces R discussed in Section VIII.4 was given by H. Wielandt, Lineare Scharen von Matrizen mit reellen Eigenwerten, Math. Z., 53 (1950) 219–225.

It was observed by R. Bhatia, On the rate of change of spectra of operators II, Linear Algebra Appl., 36 (1981) 25–32, that better perturbation bounds can be obtained for matrices whose eigenvalues occur in pairs ±λ. This was carried further in the paper Symmetries and variation of spectra, Canadian J. Math., 44 (1992) 1155–1166, by R. Bhatia and L. Elsner, who considered matrices whose eigenvalues are p-Carrollian. See also the paper by R. Bhatia and L. Elsner, The q-binomial theorem and spectral symmetry, Indag. Math., N.S., 4 (1993) 11–16. The material in Section VIII.5 is taken from these three papers.

The bound in Problem VIII.6.1 is due to Lagrange. There are several interesting and useful bounds known for the roots of a polynomial. Since the roots of a polynomial are the eigenvalues of its companion matrix, some of these bounds can be proved by using bounds for eigenvalues. An interesting discussion may be found in Horn and Johnson, Matrix Analysis, pages 316–319.

The Gersgorin Disk Theorem was proved in S.A. Gersgorin, Über die Abgrenzung der Eigenwerte einer Matrix, Izv. Akad. Nauk SSSR, Ser. Fiz.-Mat., 6 (1931) 749–754. A matrix is called diagonally dominant if |a_ii| > Σ_{j≠i} |a_ij|, 1 ≤ i ≤ n. Every diagonally dominant matrix is nonsingular. Gersgorin's Theorem is a corollary. This theorem is applied to the study of several perturbation problems in J.H. Wilkinson, The Algebraic Eigenvalue Problem. A comprehensive discussion is also given in Horn and Johnson, Matrix Analysis. The results of Problems VIII.6.4, VIII.6.5, and VIII.6.6 are due to P. Henrici, Bounds for iterates, inverses, spectral variation and fields of values of nonnormal matrices, Numer. Math., 4 (1962) 24–39. Several other very interesting results that involve the measure of nonnormality are proved in this paper. For example, we know that the numerical range W(A) of a matrix A contains the convex hull H(A) of the eigenvalues of A, and that the two sets are equal if A is normal. Henrici gives a bound for the distance between the boundaries of H(A) and W(A) in terms of the measure of nonnormality of A.
There are several different ways to measure the nonnormality of a matrix. Problem VIII.6.6 relates two such measures by an inequality. The relations between several different measures of nonnormality are discussed in L. Elsner and M.H.C. Paardekooper, On measures of nonnormality of matrices, Linear Algebra Appl., 92 (1987) 107–124. Is a nearly normal matrix near to an (exactly) normal matrix? More precisely, for every ε > 0, does there exist a δ > 0 such that if ‖A*A − AA*‖ < δ then there exists a normal B such that ‖A − B‖ < ε? The existence of such a δ for each fixed dimension n was shown by C. Pearcy and A. Shields, Almost commuting matrices, J. Funct. Anal., 33 (1979) 332–338. The problem of finding a δ depending only on ε but not on the dimension n is linked to several important questions in the theory of operator algebras. This has been shown to have an affirmative solution in a recent paper: H. Lin, Almost commuting selfadjoint matrices and applications, preprint, 1995. No explicit formula for δ is given in this paper. In an infinite-dimensional Hilbert space, the answer to this question is in the negative because of index obstructions. The inequality in Problem VIII.6.7 was proved in W. Kahan, B.N. Parlett, and E. Jiang, Residual bounds on approximate eigensystems of nonnormal matrices, SIAM J. Numer. Anal., 19 (1982) 470–484. The inequality in Problem VIII.6.12 was proved by W. Kahan, Spectra of nearly Hermitian matrices, Proc. Amer. Math. Soc., 48 (1975) 11–17. For 2 × 2 block-matrices, the idea of the argument in Problem VIII.6.14 is due to M.D. Choi, Almost commuting matrices need not be nearly commuting, Proc. Amer. Math. Soc., 102 (1988) 529–533. This was extended to higher order block-matrices by R. Bhatia and L. Elsner, Symmetries and variation of spectra, cited above.
IX
A Selection of Matrix Inequalities

In this chapter we will prove several inequalities for matrices. From the vast collection of such inequalities, we have selected a few that are simple and widely useful. Though they are of different kinds, their proofs have common ingredients already familiar to us from earlier chapters.

IX.1  Some Basic Lemmas
If A and B are any two matrices, then AB and BA have the same eigenvalues. (See Exercise I.3.7.) Hence, if f(A) is any function on the space of matrices that depends only on the eigenvalues of A, then f(AB) = f(BA). Examples of such functions are the spectral radius, the trace, and the determinant. If A is normal, then the spectral radius spr(A) is equal to ‖A‖. Using this, we can prove the following two useful propositions.

Proposition IX.1.1 Let A, B be any two matrices such that the product AB is normal. Then, for every unitarily invariant norm, we have

    |||AB||| ≤ |||BA|||.    (IX.1)

Proof. For the operator norm this is an easy consequence of the two facts mentioned above; we have

    ‖AB‖ = spr(AB) = spr(BA) ≤ ‖BA‖.

The general case needs more argument. Since AB is normal, s_j(AB) = |λ_j(AB)|, where |λ₁(AB)| ≥ ··· ≥ |λ_n(AB)| are the eigenvalues of AB
arranged in decreasing order of magnitude. But |λ_j(AB)| = |λ_j(BA)|. By Weyl's Majorant Theorem (Theorem II.3.6), the vector |λ(BA)| is weakly majorised by the vector s(BA). Hence we have the weak majorisation s(AB) ≺_w s(BA). From this the inequality (IX.1) follows. ∎
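Proposition IX.1.1 can be exercised numerically. The sketch below is an editorial illustration, not from the book; it assumes NumPy. To obtain a pair with AB normal, it takes an invertible A, a normal N, and sets B = A⁻¹N, so that AB = N while BA = A⁻¹NA; the operator norms are then compared.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 5

A = rng.standard_normal((n, n)) + n * np.eye(n)   # shift keeps A comfortably invertible
q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
N = q @ np.diag(rng.standard_normal(n) + 1j * rng.standard_normal(n)) @ q.conj().T
B = np.linalg.solve(A, N)                         # B = A^{-1} N

AB, BA = A @ B, B @ A
normal_ok = np.allclose(AB @ AB.conj().T, AB.conj().T @ AB)   # AB is normal

nab = np.linalg.norm(AB, 2)   # operator norm = largest singular value
nba = np.linalg.norm(BA, 2)
print(normal_ok, nab, nba)
```

As the proof shows, ‖AB‖ = spr(N) = spr(BA) ≤ ‖BA‖, so the first norm never exceeds the second.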
Proposition IX.1.2 Let A, B be any two matrices such that the product AB is Hermitian. Then, for every unitarily invariant norm, we have

    |||AB||| ≤ |||Re(BA)|||.    (IX.2)

Proof. The eigenvalues of BA, being the same as the eigenvalues of the Hermitian matrix AB, are all real. So, by Proposition III.5.3, we have the majorisation λ(BA) ≺ λ(Re BA). From this we have the weak majorisation |λ(BA)| ≺_w |λ(Re BA)|. (See Examples II.3.5.) The rest of the argument is the same as in the proof of the preceding proposition. ∎
Some of the inequalities proved in this chapter involve the matrix exponential. An extremely useful device in proving such results is the following theorem.
Theorem IX.1.3 (The Lie Product Formula) For any two matrices $A, B$,
$$\lim_{m \to \infty} \left( \exp\frac{A}{m}\, \exp\frac{B}{m} \right)^m = \exp(A + B). \tag{IX.3}$$

Proof. For any two matrices $X, Y$, and for $m = 1, 2, \ldots$, we have
$$X^m - Y^m = \sum_{j=0}^{m-1} X^{m-1-j} (X - Y) Y^j.$$
Using this we obtain
$$\|X^m - Y^m\| \le m\, M^{m-1} \|X - Y\|, \tag{IX.4}$$
where $M = \max(\|X\|, \|Y\|)$. Now let
$$X_m = \exp\frac{A+B}{m}, \qquad Y_m = \exp\frac{A}{m}\,\exp\frac{B}{m}, \qquad m = 1, 2, \ldots .$$
Then $\|X_m\|$ and $\|Y_m\|$ both are bounded above by $\exp\left(\frac{\|A\| + \|B\|}{m}\right)$. From the power series expansion for the exponential function, we see that
$$X_m - Y_m = \left[ I + \frac{A+B}{m} + \frac{1}{2}\Big(\frac{A+B}{m}\Big)^2 + \cdots \right] - \left[ I + \frac{A}{m} + \frac{1}{2}\Big(\frac{A}{m}\Big)^2 + \cdots \right]\left[ I + \frac{B}{m} + \frac{1}{2}\Big(\frac{B}{m}\Big)^2 + \cdots \right] = O\!\left(\frac{1}{m^2}\right)$$
for large $m$.
Hence, using the inequality (IX.4), we see that
$$\|X_m^m - Y_m^m\| \le m \exp(\|A\| + \|B\|)\, O\!\left(\frac{1}{m^2}\right).$$
This goes to zero as $m \to \infty$. But $X_m^m = \exp(A+B)$ for all $m$. Hence, $\lim_{m \to \infty} Y_m^m = \exp(A+B)$. This proves the theorem. ■
The reader should compare the inequality (IX.4) with the inequality in Problem I.6.11.
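The convergence asserted by the Lie Product Formula is easy to observe numerically. The following sketch (Python with NumPy; the series-based `expm` helper is our own illustrative choice, not from the text) compares $(e^{A/m}e^{B/m})^m$ with $e^{A+B}$ for increasing $m$:

```python
import numpy as np

def expm(X, terms=80):
    """Matrix exponential via its power series (adequate for moderate ||X||)."""
    E = np.eye(X.shape[0], dtype=complex)
    T = np.eye(X.shape[0], dtype=complex)
    for k in range(1, terms):
        T = T @ X / k
        E += T
    return E

rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((4, 4))
B = 0.5 * rng.standard_normal((4, 4))

target = expm(A + B)
errors = []
for m in (1, 10, 100, 1000):
    approx = np.linalg.matrix_power(expm(A / m) @ expm(B / m), m)
    errors.append(np.linalg.norm(approx - target, 2))
# The error behaves like O(1/m), as the proof above predicts.
```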
Exercise IX.1.4 Show that for any two matrices $A, B$,
$$\lim_{m \to \infty} \left( \exp\frac{B}{2m}\, \exp\frac{A}{m}\, \exp\frac{B}{2m} \right)^m = \exp(A + B).$$
Exercise IX.1.5 Show that for any two matrices $A, B$,
$$\lim_{t \to 0} \left( \exp\frac{tB}{2}\, \exp tA\, \exp\frac{tB}{2} \right)^{1/t} = \exp(A + B).$$

IX.2 Products of Positive Matrices
In this section we prove some inequalities for the norm, the spectral radius, and the eigenvalues of the product of two positive matrices.
Theorem IX.2.1 Let $A, B$ be positive matrices. Then, for $0 \le s \le 1$,
$$\|A^s B^s\| \le \|AB\|^s. \tag{IX.5}$$

Proof. Let
$$D = \{\, s \in [0,1] : \|A^s B^s\| \le \|AB\|^s \,\}.$$
Then $D$ is a closed subset of $[0,1]$ and contains the points 0 and 1. So, to prove the theorem, it suffices to prove that if $s$ and $t$ are in $D$, then so is $\frac{s+t}{2}$. We have
$$\big\|A^{\frac{s+t}{2}} B^{\frac{s+t}{2}}\big\|^2 = \big\|B^{\frac{s+t}{2}} A^{s+t} B^{\frac{s+t}{2}}\big\| = \mathrm{spr}\big(B^{\frac{s+t}{2}} A^{s+t} B^{\frac{s+t}{2}}\big) = \mathrm{spr}(B^s A^{s+t} B^t) \le \|B^s A^{s+t} B^t\| \le \|B^s A^s\|\, \|A^t B^t\| = \|A^s B^s\|\, \|A^t B^t\|.$$
At the first step we used the relation $\|T\|^2 = \|T^*T\|$, and at the last step the relation $\|T^*\| = \|T\|$ for all $T$. If $s, t$ are in $D$, this shows that
$$\big\|A^{\frac{s+t}{2}} B^{\frac{s+t}{2}}\big\| \le \|AB\|^{(s+t)/2},$$
and this proves the theorem. ■
An equivalent formulation of the above theorem is given below, with another proof that is illuminating.
Theorem IX.2.2 If $A, B$ are positive matrices with $\|AB\| \le 1$, then $\|A^s B^s\| \le 1$ for $0 \le s \le 1$.

Proof. We can assume that $A > 0$. The general case follows from this by a continuity argument. We then have the chain of implications
$$\|AB\| \le 1 \Rightarrow \|AB^2A\| \le 1 \Rightarrow AB^2A \le I \Rightarrow B^2 \le A^{-2} \ \text{(by Lemma V.1.5)} \Rightarrow B^{2s} \le A^{-2s} \ \text{(by Theorem V.1.9)} \Rightarrow A^s B^{2s} A^s \le I \ \text{(by Lemma V.1.5)} \Rightarrow \|A^s B^{2s} A^s\| \le 1 \Rightarrow \|A^s B^s\| \le 1. \ \blacksquare$$
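Theorem IX.2.1 (equivalently, Theorem IX.2.2) can be spot-checked numerically. The sketch below, with helper names of our own, draws random positive matrices and tests $\|A^sB^s\| \le \|AB\|^s$:

```python
import numpy as np

def rand_psd(n, rng):
    """A random positive semidefinite matrix G G^T (our helper)."""
    G = rng.standard_normal((n, n))
    return G @ G.T

def psd_power(A, s):
    """A^s for positive semidefinite A, via the spectral decomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0, None)**s) @ V.T

rng = np.random.default_rng(1)
violations = 0
for _ in range(100):
    A, B = rand_psd(4, rng), rand_psd(4, rng)
    nAB = np.linalg.norm(A @ B, 2)         # operator norm ||AB||
    for s in (0.1, 0.3, 0.5, 0.7, 0.9):
        lhs = np.linalg.norm(psd_power(A, s) @ psd_power(B, s), 2)
        if lhs > nAB**s * (1 + 1e-10):     # small relative tolerance
            violations += 1
```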
Another equivalent formulation is the following theorem.
Theorem IX.2.3 Let $A, B$ be positive matrices. Then, for $t \ge 1$,
$$\|AB\|^t \le \|A^t B^t\|. \tag{IX.6}$$

Proof. From Theorem IX.2.1, we have $\|A^{1/t} B^{1/t}\| \le \|AB\|^{1/t}$ for $t \ge 1$. Replace $A, B$ by $A^t, B^t$, respectively. ■

Exercise IX.2.4 Let $A, B$ be positive matrices. Then
(i) $\|A^{1/t} B^{1/t}\|^t$ is a monotonically decreasing function of $t$ on $(0, \infty)$.
(ii) $\|A^t B^t\|^{1/t}$ is a monotonically increasing function of $t$ on $(0, \infty)$.
In Section 5 we will see that the inequalities (IX.5) and (IX.6) are, in fact, valid for all unitarily invariant norms. Results akin to the ones above can be proved for the spectral radius in place of the norm. This is done below. If $A$ and $B$ are positive, the eigenvalues of $AB$ are positive. (They are the same as the eigenvalues of the positive matrix $A^{1/2} B A^{1/2}$.) If $T$ is any matrix with positive eigenvalues, we will enumerate its eigenvalues as $\lambda_1(T) \ge \lambda_2(T) \ge \cdots \ge \lambda_n(T) > 0$. Thus $\lambda_1(T)$ is equal to the spectral radius $\mathrm{spr}(T)$.

Theorem IX.2.5 If $A, B$ are positive matrices with $\lambda_1(AB) \le 1$, then $\lambda_1(A^s B^s) \le 1$ for $0 \le s \le 1$.

Proof. As in the proof of Theorem IX.2.2, we can assume that $A > 0$. We then have the chain of implications
$$\lambda_1(AB) \le 1 \Rightarrow \lambda_1(A^{1/2} B A^{1/2}) \le 1 \Rightarrow A^{1/2} B A^{1/2} \le I \Rightarrow B \le A^{-1} \Rightarrow B^s \le A^{-s} \Rightarrow A^{s/2} B^s A^{s/2} \le I \Rightarrow \lambda_1(A^{s/2} B^s A^{s/2}) \le 1 \Rightarrow \lambda_1(A^s B^s) \le 1.$$
This proves the theorem. ■
It should be noted that all implications in this proof and that of Theorem IX.2.2 are reversible with one exception: if $A \ge B \ge 0$, then $A^s \ge B^s$ for $0 \le s \le 1$, but the converse is not true.
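The failure of the converse can be exhibited concretely. In the sketch below (our own example, not one from the text), $A^{1/2} \ge B^{1/2}$ holds while $A \ge B$ fails:

```python
import numpy as np

# Take X >= Y >= 0 whose squares are NOT comparable, and set A = X^2, B = Y^2.
X = np.array([[2.0, 1.0], [1.0, 1.0]])   # positive definite
Y = np.array([[1.0, 0.0], [0.0, 0.0]])   # positive semidefinite, X - Y >= 0
A, B = X @ X, Y @ Y                      # so A^(1/2) = X, B^(1/2) = Y

def is_psd(M):
    """Check positive semidefiniteness up to round-off."""
    return np.linalg.eigvalsh(M).min() >= -1e-12

root_order = is_psd(X - Y)   # A^(1/2) >= B^(1/2) holds
full_order = is_psd(A - B)   # but A >= B fails: det(A - B) = -1 < 0
```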
Theorem IX.2.6 Let $A, B$ be positive matrices. Then, for $0 \le s \le 1$,
$$\lambda_1(A^s B^s) \le \lambda_1^s(AB). \tag{IX.7}$$

Proof. Let $\lambda_1(AB) = \alpha^2$. If $\alpha \ne 0$, we have $\lambda_1\left(\frac{A}{\alpha}\,\frac{B}{\alpha}\right) = 1$. So, by Theorem IX.2.5, $\lambda_1(A^s B^s) \le \alpha^{2s} = \lambda_1^s(AB)$. If $\alpha = 0$, we have $\lambda_1(A^{1/2} B A^{1/2}) = 0$, and hence $A^{1/2} B A^{1/2} = 0$. From this it follows that the range of $A$ is contained in the kernel of $B$. But then $A^{s/2} B^s A^{s/2} = 0$, and hence $\lambda_1(A^s B^s) = 0$. ■
Exercise IX.2.7 Let $A, B$ be positive matrices. Show that, for $t \ge 1$,
$$\lambda_1^t(AB) \le \lambda_1(A^t B^t). \tag{IX.8}$$

Exercise IX.2.8 Let $A, B$ be positive matrices. Show that
(i) $[\lambda_1(A^{1/t} B^{1/t})]^t$ is a monotonically decreasing function of $t$ on $(0, \infty)$.
(ii) $[\lambda_1(A^t B^t)]^{1/t}$ is a monotonically increasing function of $t$ on $(0, \infty)$.
Using familiar arguments involving antisymmetric tensor products, we can now obtain stronger results.
Theorem IX.2.9 Let $A, B$ be positive matrices. Then, for $0 < t \le u < \infty$, we have the weak majorisation
$$\lambda^{1/t}(A^t B^t) \prec_w \lambda^{1/u}(A^u B^u). \tag{IX.9}$$

Proof. For $k = 1, 2, \ldots, n$, consider the operators $\wedge^k A$ and $\wedge^k B$. The result of Exercise IX.2.8(ii) applied to these operators in place of $A, B$ yields the inequalities
$$\prod_{j=1}^{k} \lambda_j^{1/t}(A^t B^t) \le \prod_{j=1}^{k} \lambda_j^{1/u}(A^u B^u) \tag{IX.10}$$
for $k = 1, 2, \ldots, n$. The assertion of II.3.5(vii) now leads to the majorisation (IX.9). ■
Theorem IX.2.10 Let $A, B$ be positive matrices. Then for every unitarily invariant norm we have
$$|||B^t A^t B^t||| \le |||(BAB)^t|||, \quad \text{for } 0 \le t \le 1, \tag{IX.11}$$
$$|||(BAB)^t||| \le |||B^t A^t B^t|||, \quad \text{for } t \ge 1. \tag{IX.12}$$

Proof. We have
$$\|A^{t/2} B^t\| \le \|A^{1/2} B\|^t$$
for $0 \le t \le 1$, by Theorem IX.2.1. So
$$\|B^t A^t B^t\| = \|A^{t/2} B^t\|^2 \le \|A^{1/2} B\|^{2t} = \|BAB\|^t.$$
This is the same as saying that
$$s_1(B^t A^t B^t) \le s_1((BAB)^t).$$
Replacing $A$ and $B$ by their antisymmetric tensor powers, we obtain, for $1 \le k \le n$,
$$\prod_{j=1}^{k} s_j(B^t A^t B^t) \le \prod_{j=1}^{k} s_j^t(BAB).$$
By the argument used in the preceding theorem, this gives the majorisation
$$s(B^t A^t B^t) \prec_w s((BAB)^t),$$
which gives the inequality (IX.11). The inequality (IX.12) is proved in exactly the same way. ■
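Taking the trace norm in (IX.11) and (IX.12) (on positive matrices the trace norm is just the trace), the theorem can be checked numerically; the helpers below are our own:

```python
import numpy as np

def rand_psd(n, rng):
    """A random positive semidefinite matrix G G^T (our helper)."""
    G = rng.standard_normal((n, n))
    return G @ G.T

def psd_power(M, t):
    """M^t for a positive semidefinite M, via the spectral decomposition."""
    M = (M + M.T) / 2                      # symmetrise against round-off
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0, None)**t) @ V.T

rng = np.random.default_rng(2)
ok = True
for _ in range(50):
    A, B = rand_psd(3, rng), rand_psd(3, rng)
    for t in (0.25, 0.5, 0.75):            # (IX.11): tr(B^t A^t B^t) <= tr((BAB)^t)
        lhs = np.trace(psd_power(B, t) @ psd_power(A, t) @ psd_power(B, t))
        ok &= lhs <= np.trace(psd_power(B @ A @ B, t)) * (1 + 1e-9)
    for t in (1.5, 2.0, 3.0):              # (IX.12): tr((BAB)^t) <= tr(B^t A^t B^t)
        lhs = np.trace(psd_power(B @ A @ B, t))
        rhs = np.trace(psd_power(B, t) @ psd_power(A, t) @ psd_power(B, t))
        ok &= lhs <= rhs * (1 + 1e-9)
```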
Exercise IX.2.11 Derive (as a special case of the above theorem) the following inequality of Araki–Lieb–Thirring. Let $A, B$ be positive matrices, and let $s, t$ be positive real numbers with $t \ge 1$. Then
$$|||(B^s A^s B^s)^t||| \le |||B^{st} A^{st} B^{st}|||. \tag{IX.13}$$

IX.3 Inequalities for the Exponential Function

For every complex number $z$, we have $|e^z| = |e^{\mathrm{Re}\,z}|$. Our first theorem is a matrix version of this.
Theorem IX.3.1 Let $A$ be any matrix. Then
$$|||e^A||| \le |||e^{\mathrm{Re}\,A}||| \tag{IX.14}$$
for every unitarily invariant norm.
Proof. For each positive integer $m$, we have $\|A^m\| \le \|A\|^m$. This is the same as saying that $s_1(A^m) \le s_1^m(A)$, or $s_1(A^{*m} A^m) \le s_1^m(A^* A)$. Replacing $A$ by $\wedge^k A$, we obtain for $1 \le k \le n$
$$\prod_{j=1}^{k} s_j(A^{*m} A^m) \le \prod_{j=1}^{k} s_j([A^* A]^m).$$
Now, if we replace $A$ by $e^{A/m}$, we obtain
$$\prod_{j=1}^{k} s_j(e^{A^*} e^{A}) \le \prod_{j=1}^{k} s_j\big([e^{A^*/m} e^{A/m}]^m\big).$$
Letting $m \to \infty$, and using the Lie Product Formula, we obtain
$$\prod_{j=1}^{k} s_j(e^{A^*} e^{A}) \le \prod_{j=1}^{k} s_j(e^{A^* + A}).$$
Taking square roots, we get
$$\prod_{j=1}^{k} s_j(e^A) \le \prod_{j=1}^{k} s_j(e^{\mathrm{Re}\,A}).$$
This gives the majorisation
$$s(e^A) \prec_w s(e^{\mathrm{Re}\,A})$$
(see II.3.5(vii)), and hence the inequality (IX.14). ■
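The proof actually establishes the weak majorisation $s(e^A) \prec_w s(e^{\mathrm{Re}\,A})$, which can be tested directly on the partial sums of singular values; a sketch, with a series-based exponential of our own:

```python
import numpy as np

def expm_series(X, terms=80):
    """Matrix exponential via its power series (fine for moderate ||X||)."""
    E = np.eye(X.shape[0], dtype=complex)
    T = np.eye(X.shape[0], dtype=complex)
    for k in range(1, terms):
        T = T @ X / k
        E += T
    return E

rng = np.random.default_rng(3)
ok = True
for _ in range(50):
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    sA = np.linalg.svd(expm_series(A), compute_uv=False)                 # s(e^A)
    sR = np.linalg.svd(expm_series((A + A.conj().T) / 2), compute_uv=False)  # s(e^{Re A})
    # weak majorisation: every partial sum on the left is dominated
    ok &= np.all(np.cumsum(sA) <= np.cumsum(sR) * (1 + 1e-9))
```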
It is easy to construct an example of a $2 \times 2$ matrix $A$ for which $|||e^A|||$ and $|||e^{\mathrm{Re}\,A}|||$ are not equal. Our next theorem is valid for a large class of functions. It will be convenient to give this class a name.
Definition IX.3.2 A continuous complex-valued function $f$ on the space of matrices will be said to belong to the class $\mathcal{T}$ if it satisfies the following two properties:
(i) $f(XY) = f(YX)$ for all $X, Y$.
(ii) $|f(X^{2m})| \le f([XX^*]^m)$ for all $X$, and for $m = 1, 2, \ldots$.

Exercise IX.3.3 (i) The functions trace and determinant are in $\mathcal{T}$.
(ii) For every $k$, $1 \le k \le n$, the function $\varphi_k(X) = \mathrm{tr} \wedge^k X$ is in $\mathcal{T}$. (These are the coefficients in the characteristic polynomial of $X$.)
(iii) Let $\lambda_j(X)$ denote the eigenvalues of $X$ arranged so that $|\lambda_1(X)| \ge |\lambda_2(X)| \ge \cdots \ge |\lambda_n(X)|$. Then, for $1 \le k \le n$, the function $f_k(X) = \prod_{j=1}^{k} \lambda_j(X)$ is in $\mathcal{T}$, and so is the function $|f_k(X)|$. [Hint: Use Theorem II.3.6.]
(iv) For $1 \le k \le n$, the function $g_k(X) = \sum_{j=1}^{k} |\lambda_j(X)|$ is in $\mathcal{T}$.
(v) Every symmetric gauge function of the numbers $|\lambda_1(X)|, \ldots, |\lambda_n(X)|$ is in $\mathcal{T}$.

Exercise IX.3.4 (i) If $f$ is any complex-valued function on the space of matrices that satisfies the condition (ii) in Definition IX.3.2, then $f(A) \ge 0$ if $A \ge 0$. In particular, $f(e^A) \ge 0$ for every Hermitian matrix $A$.
(ii) If $f$ satisfies both conditions (i) and (ii) in Definition IX.3.2, then $f(AB) \ge 0$ if $A$ and $B$ are both positive. In particular, $f(e^A e^B) \ge 0$ if $A$ and $B$ are Hermitian.

The principal result about the class $\mathcal{T}$ is the following.
Theorem IX.3.5 Let $f$ be a function in the class $\mathcal{T}$. Then for all matrices $A, B$, we have
$$|f(e^{A+B})| \le f(e^{\mathrm{Re}\,A}\, e^{\mathrm{Re}\,B}). \tag{IX.15}$$

Proof. For each positive integer $m$, we have for all $X, Y$
$$|f([XY]^{2^m})| \le f([XY Y^* X^*]^{2^{m-1}}) = f([X^*X\, YY^*]^{2^{m-1}}).$$
Here, the inequality at the first step is a consequence of the property (ii), and the equality at the last step is a consequence of the property (i) of functions in $\mathcal{T}$. Repeat this argument to obtain
$$|f([XY]^{2^m})| \le f([(X^*X)^2 (YY^*)^2]^{2^{m-2}}) \le \cdots \le f([X^*X]^{2^{m-1}} [YY^*]^{2^{m-1}}).$$
Now let $A, B$ be any two matrices. Put $X = e^{A/2^m}$ and $Y = e^{B/2^m}$ in the above inequality to obtain
$$|f([e^{A/2^m} e^{B/2^m}]^{2^m})| \le f([e^{A^*/2^m} e^{A/2^m}]^{2^{m-1}} [e^{B/2^m} e^{B^*/2^m}]^{2^{m-1}}).$$
Now let $m \to \infty$. Then, by the continuity of $f$ and the Lie Product Formula, we can conclude from the above inequality that
$$|f(e^{A+B})| \le f(e^{(A+A^*)/2}\, e^{(B+B^*)/2}) = f(e^{\mathrm{Re}\,A}\, e^{\mathrm{Re}\,B}). \ \blacksquare$$

Corollary IX.3.6 Let $f$ be a function in the class $\mathcal{T}$. Then
$$|f(e^A)| \le f(e^{\mathrm{Re}\,A}) \tag{IX.16}$$
for every matrix $A$, and
$$f(e^{A+B}) \le f(e^A e^B) \tag{IX.17}$$
for all Hermitian matrices $A, B$.

Particularly noteworthy is the following special consequence.
Theorem IX.3.7 Let $A, B$ be any two Hermitian matrices. Then
$$|||e^{A+B}||| \le |||e^A e^B||| \tag{IX.18}$$
for every unitarily invariant norm.

Proof. Use (IX.17) for the special functions in Exercise IX.3.3(iv). This gives the majorisation
$$\lambda(e^{A+B}) \prec_w |\lambda(e^A e^B)|,$$
and hence, by Weyl's Majorant Theorem, the weak majorisation $s(e^{A+B}) \prec_w s(e^A e^B)$. This proves the theorem. ■

Choosing $f(X) = \mathrm{tr}\,X$, we get from (IX.17) the famous Golden–Thompson inequality: for Hermitian $A, B$ we have
$$\mathrm{tr}\, e^{A+B} \le \mathrm{tr}(e^A e^B). \tag{IX.19}$$
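The Golden–Thompson inequality (IX.19) is easy to verify numerically for random Hermitian matrices; a sketch, with an eigendecomposition-based exponential of our own:

```python
import numpy as np

def expm_h(H):
    """exp of a Hermitian (here real symmetric) matrix via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

rng = np.random.default_rng(4)
gaps = []
for _ in range(50):
    A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
    B = rng.standard_normal((4, 4)); B = (B + B.T) / 2
    lhs = np.trace(expm_h(A + B)).real        # tr e^{A+B}
    rhs = np.trace(expm_h(A) @ expm_h(B)).real  # tr(e^A e^B)
    gaps.append(rhs - lhs)                    # should never be (significantly) negative
```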
Exercise IX.3.8 Let $A, B$ be Hermitian matrices. Show that for every unitarily invariant norm
$$\left|\left|\left| \left( \exp\frac{tB}{2}\, \exp tA\, \exp\frac{tB}{2} \right)^{1/t} \right|\right|\right|$$
decreases to $|||\exp(A+B)|||$ as $t \downarrow 0$. As a special consequence of this we have a stronger version of the Golden–Thompson inequality:
$$\mathrm{tr}\exp(A+B) \le \mathrm{tr}\left( \exp\frac{tB}{2}\, \exp tA\, \exp\frac{tB}{2} \right)^{1/t} \quad \text{for all } t > 0.$$
[Use Theorem IX.2.10 and Exercise IX.1.5.]

IX.4 Arithmetic-Geometric Mean Inequalities

The classical arithmetic-geometric mean inequality for numbers says that $\sqrt{ab} \le \frac{1}{2}(a+b)$ for all positive numbers $a, b$. From this we see that for complex numbers $a, b$ we have $|ab| \le \frac{1}{2}(|a|^2 + |b|^2)$. In this section we obtain some matrix versions of this inequality. Several corollaries of these inequalities are derived in this and later sections.
Lemma IX.4.1 Let $Y_1, Y_2$ be any two positive matrices, and let $Y = Y_1 - Y_2$. Let $Y = Y^+ - Y^-$ be the Jordan decomposition of the Hermitian matrix $Y$. Then, for $j = 1, 2, \ldots, n$,
$$\lambda_j(Y^+) \le \lambda_j(Y_1) \quad \text{and} \quad \lambda_j(Y^-) \le \lambda_j(Y_2).$$
(See Section IV.3 for the definition of the Jordan decomposition.)

Proof. Suppose $\lambda_j(Y)$ is nonnegative for $j = 1, \ldots, p$ and negative for $j = p+1, \ldots, n$. Then $\lambda_j(Y^+)$ is equal to $\lambda_j(Y)$ if $j = 1, \ldots, p$, and is zero for $j = p+1, \ldots, n$. Since $Y_1 = Y + Y_2 \ge Y$, we have $\lambda_j(Y_1) \ge \lambda_j(Y)$ for all $j$, by Weyl's Monotonicity Principle. Hence, $\lambda_j(Y_1) \ge \lambda_j(Y^+)$ for all $j$. Since $Y_2 = Y_1 - Y \ge -Y$, we have $\lambda_j(Y_2) \ge \lambda_j(-Y)$ for all $j$. But $\lambda_j(-Y) = \lambda_j(Y^-)$ for $j = 1, \ldots, n-p$, and $\lambda_j(Y^-) = 0 \le \lambda_j(Y_2)$ for $j > n-p$. Hence, $\lambda_j(Y_2) \ge \lambda_j(Y^-)$ for all $j$. ■
Theorem IX.4.2 Let $A, B$ be any two matrices. Then
$$s_j(A^* B) \le \tfrac{1}{2}\, \lambda_j(AA^* + BB^*) \tag{IX.20}$$
for $1 \le j \le n$.

Proof. Let $X$ be the $2n \times 2n$ matrix $X = \begin{pmatrix} A & B \\ 0 & 0 \end{pmatrix}$. Then
$$XX^* = \begin{pmatrix} AA^* + BB^* & 0 \\ 0 & 0 \end{pmatrix}, \qquad X^*X = \begin{pmatrix} A^*A & A^*B \\ B^*A & B^*B \end{pmatrix}.$$
The off-diagonal part of $X^*X$ can be written as
$$Y = \begin{pmatrix} 0 & A^*B \\ B^*A & 0 \end{pmatrix} = \frac{1}{2}\left\{ X^*X - U(X^*X)U^* \right\},$$
where $U$ is the unitary matrix $\begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}$. Note that both of the matrices in the braces above are positive. Hence, by the preceding lemma,
$$\lambda_j(Y^{\pm}) \le \tfrac{1}{2}\, \lambda_j(X^*X).$$
But $X^*X$ and $XX^*$ have the same eigenvalues. Hence, both $\lambda_j(Y^+)$ and $\lambda_j(Y^-)$ are bounded above by $\frac{1}{2}\lambda_j(AA^* + BB^*)$. Now note that, by Exercise II.1.15, the eigenvalues of $Y$ are the singular values of $A^*B$ together with their negatives. Hence we have the inequality (IX.20). ■

Corollary IX.4.3 Let $A, B$ be any two matrices. Then there exists a unitary matrix $U$ such that
$$|A^*B| \le \tfrac{1}{2}\, U(AA^* + BB^*)U^*. \tag{IX.21}$$

Corollary IX.4.4 Let $A, B$ be any two matrices. Then
$$|||A^*B||| \le \tfrac{1}{2}\, |||AA^* + BB^*||| \tag{IX.22}$$
for every unitarily invariant norm.

The particular position of the stars in (IX.20), (IX.21), and (IX.22) is not an accident. If we have
$$A = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix},$$
then $s_1(AB) = \sqrt{2}$, but $\frac{1}{2}s_1(AA^* + BB^*) = 1$. The presence of the unitary $U$ in (IX.21) is also essential: it cannot be replaced by the identity matrix even when $A, B$ are Hermitian. This is illustrated, for instance, by the pair $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$. A considerable strengthening of the inequality (IX.22) is given in the theorem below.
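Inequality (IX.20) — with the stars placed as stated — can be checked numerically; a sketch (real matrices, so $A^* = A^T$):

```python
import numpy as np

rng = np.random.default_rng(5)
ok = True
for _ in range(100):
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    s = np.linalg.svd(A.T @ B, compute_uv=False)        # s_j(A* B), decreasing
    lam = np.linalg.eigvalsh(A @ A.T + B @ B.T)[::-1]   # eigenvalues, decreasing
    ok &= np.all(s <= lam / 2 * (1 + 1e-10))            # s_j <= (1/2) lambda_j
```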
Theorem IX.4.5 For any three matrices $A, B, X$, we have
$$|||A^* X B||| \le \tfrac{1}{2}\, |||AA^*X + XBB^*||| \tag{IX.23}$$
for every unitarily invariant norm.

Proof. First consider the special case when $A = B$, and $A, X$ are Hermitian. Then $AXA$ is also Hermitian. So, by Proposition IX.1.2,
$$|||AXA||| \le |||\mathrm{Re}(A^2 X)||| = \tfrac{1}{2}\, |||A^2 X + XA^2|||, \tag{IX.24}$$
which is just the desired inequality in this special case.
Next consider the more general situation, when $A$ and $B$ are Hermitian and $X$ is any matrix. Let
$$T = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & X \\ X^* & 0 \end{pmatrix}.$$
Then, by the special case considered above,
$$|||TYT||| \le \tfrac{1}{2}\, |||T^2 Y + Y T^2|||. \tag{IX.25}$$
Multiplying out the block-matrices, one sees that
$$TYT = \begin{pmatrix} 0 & AXB \\ BX^*A & 0 \end{pmatrix}, \qquad T^2 Y + Y T^2 = \begin{pmatrix} 0 & A^2X + XB^2 \\ B^2X^* + X^*A^2 & 0 \end{pmatrix}.$$
Hence, we obtain from (IX.25) the inequality
$$|||AXB||| \le \tfrac{1}{2}\, |||A^2 X + X B^2|||. \tag{IX.26}$$
Finally, let $A, B, X$ be any matrices. Let $A = A_1 U$, $B = B_1 V$ be polar decompositions of $A$ and $B$. Then
$$AA^*X + XBB^* = A_1^2 X + X B_1^2,$$
while
$$|||A^*XB||| = |||U^* A_1 X B_1 V||| = |||A_1 X B_1|||.$$
So, the theorem follows from the inequality (IX.26). ■
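Theorem IX.4.5 holds for every unitarily invariant norm; the sketch below tests (IX.23) in the spectral, Frobenius, and trace norms simultaneously:

```python
import numpy as np

rng = np.random.default_rng(6)
ok = True
for _ in range(100):
    A, B, X = (rng.standard_normal((4, 4)) for _ in range(3))
    L = A.T @ X @ B                         # A* X B (real case)
    R = (A @ A.T @ X + X @ B @ B.T) / 2     # (1/2)(AA*X + XBB*)
    for ord_ in (2, 'fro', 'nuc'):          # three unitarily invariant norms
        ok &= np.linalg.norm(L, ord_) <= np.linalg.norm(R, ord_) * (1 + 1e-10)
```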
Exercise IX.4.6 Another proof of the theorem can be obtained as follows. First prove the inequality (IX.24) for Hermitian $A$ and $X$. Then, for arbitrary $A, B$, and $X$, let $T$ and $Y$ be the matrices
$$T = \begin{pmatrix} 0 & 0 & A & 0 \\ 0 & 0 & 0 & B \\ A^* & 0 & 0 & 0 \\ 0 & B^* & 0 & 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & X & 0 & 0 \\ X^* & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$
and apply the special case to them.

Exercise IX.4.7 Construct an example to show that the inequality (IX.20) cannot be strengthened in the way that (IX.23) strengthens (IX.22).

When $A, B$ are both positive, we can prove a result stronger than (IX.26). This is the next theorem.
Theorem IX.4.8 Let $A, B$ be positive matrices and let $X$ be any matrix. Then, for each unitarily invariant norm, the function
$$f(t) = |||A^{1+t} X B^{1-t} + A^{1-t} X B^{1+t}||| \tag{IX.27}$$
is convex on the interval $[-1, 1]$ and attains its minimum at $t = 0$.

Proof. Without loss of generality, we may assume that $A > 0$ and $B > 0$. Since $f$ is continuous and $f(t) = f(-t)$, both conclusions will follow if we show that $f(t) \le \frac{1}{2}[f(t+s) + f(t-s)]$ whenever $t \pm s$ are in $[-1, 1]$. For each $t$, let $\mathcal{M}_t$ be the mapping
$$\mathcal{M}_t(Y) = \frac{1}{2}(A^t Y B^{-t} + A^{-t} Y B^t).$$
For each $Y$, we have
$$|||Y||| = |||A^t (A^{-t} Y B^{-t}) B^t||| \le \tfrac{1}{2}\, |||A^t Y B^{-t} + A^{-t} Y B^t|||$$
by Theorem IX.4.5. Thus $|||Y||| \le |||\mathcal{M}_t(Y)|||$. From this it follows that
$$|||\mathcal{M}_t(AXB)||| \le |||\mathcal{M}_s \mathcal{M}_t(AXB)|||$$
for all $s, t$. But
$$\mathcal{M}_s \mathcal{M}_t = \tfrac{1}{2}(\mathcal{M}_{t+s} + \mathcal{M}_{t-s}).$$
So we have
$$|||\mathcal{M}_t(AXB)||| \le \tfrac{1}{2}\left\{ |||\mathcal{M}_{t+s}(AXB)||| + |||\mathcal{M}_{t-s}(AXB)||| \right\}.$$
Since $|||\mathcal{M}_t(AXB)||| = \frac{1}{2} f(t)$, this shows that
$$f(t) \le \tfrac{1}{2}\,[f(t+s) + f(t-s)].$$
This proves the theorem. ■

Corollary IX.4.9 Let $A, B$ be positive matrices and $X$ any matrix. Then, for each unitarily invariant norm, the function
$$g(\nu) = |||A^{\nu} X B^{1-\nu} + A^{1-\nu} X B^{\nu}||| \tag{IX.28}$$
is convex on $[0, 1]$.

Proof. Replace $A, B$ by $A^{1/2}, B^{1/2}$ in (IX.27). Then put $\nu = \frac{1-t}{2}$. ■

Corollary IX.4.10 Let $A, B$ be positive matrices and let $X$ be any matrix. Then, for $0 \le \nu \le 1$ and for every unitarily invariant norm,
$$|||A^{\nu} X B^{1-\nu} + A^{1-\nu} X B^{\nu}||| \le |||AX + XB|||. \tag{IX.29}$$

Proof. Let $g(\nu)$ be the function defined in (IX.28). Note that $g(1) = g(0) = |||AX + XB|||$. So the assertion follows from the convexity of $g$. ■

IX.5 Schwarz Inequalities
In this section we shall prove some inequalities that can be considered to be matrix versions of the Cauchy–Schwarz inequality. Some inequalities of this kind have already been proved in Chapter 4. Let $A, B$ be any two matrices and $r$ any positive real number. Then, we saw that
$$|||\, |A^*B|^r \,|||^2 \le |||(AA^*)^r|||\; |||(BB^*)^r||| \tag{IX.30}$$
for every unitarily invariant norm. The choice $r = \frac{1}{2}$ gives the inequality
$$|||\, |A^*B|^{1/2} \,|||^2 \le |||(AA^*)^{1/2}|||\; |||(BB^*)^{1/2}|||, \tag{IX.31}$$
while the choice $r = 1$ gives
$$|||A^*B|||^2 \le |||AA^*|||\; |||BB^*|||. \tag{IX.32}$$
See Exercise IV.2.7 and Problem IV.5.7. It was noted there that the inequality (IX.32) is included in (IX.31). We will now obtain more general versions of these in the same spirit as that of Theorem IX.4.5. The generalisation of (IX.32) is proved easily and is given first, even though it is subsumed in the theorem that follows it.
Theorem IX.5.1 Let $A, B, X$ be any three matrices. Then, for every unitarily invariant norm,
$$|||A^*XB|||^2 \le |||AA^*X|||\; |||XBB^*|||. \tag{IX.33}$$

Proof. First assume that $X$ is a positive matrix. Then
$$|||A^*XB|||^2 = |||A^*X^{1/2} \cdot X^{1/2}B|||^2 = |||(X^{1/2}A)^*(X^{1/2}B)|||^2 \le |||X^{1/2}AA^*X^{1/2}|||\; |||X^{1/2}BB^*X^{1/2}|||,$$
using the inequality (IX.32). Now use Proposition IX.1.1 to conclude that
$$|||A^*XB|||^2 \le |||AA^*X|||\; |||XBB^*|||.$$
This proves the theorem in this special case. Now let $X$ be any matrix, and let $X = UP$ be its polar decomposition. Then, by unitary invariance,
$$|||A^*XB||| = |||U^*A^*UPB|||, \qquad |||AA^*X||| = |||AA^*UP||| = |||U^*AA^*UP|||, \qquad |||XBB^*||| = |||UPBB^*||| = |||PBB^*|||.$$
So, the general theorem follows by applying the special case to the triple $U^*AU, B, P$. ■
Theorem IX.5.2 Let $A, B, X$ be any three matrices. Then, for every positive real number $r$, and for every unitarily invariant norm, we have
$$|||\, |A^*XB|^r \,|||^2 \le |||\, |AA^*X|^r \,|||\; |||\, |XBB^*|^r \,|||. \tag{IX.34}$$

Proof. Let $X = UP$ be the polar decomposition of $X$. Then
$$A^*XB = A^*UPB = (P^{1/2}U^*A)^* P^{1/2}B.$$
So, from the inequality (IX.30), we have
$$|||\, |A^*XB|^r \,|||^2 \le |||(P^{1/2}U^*AA^*UP^{1/2})^r|||\; |||(P^{1/2}BB^*P^{1/2})^r|||. \tag{IX.35}$$
Now note that
$$\lambda^r(P^{1/2}U^*AA^*UP^{1/2}) = \lambda^r(AA^*\, UPU^*).$$
Using Theorem IX.2.9, we have
$$\lambda^r(AA^*\, UPU^*) \prec_w \lambda^{r/2}([AA^*]^2 [UPU^*]^2).$$
But $(UPU^*)^2 = UP^2U^* = XX^*$. Hence,
$$\lambda^{r/2}([AA^*]^2 [UPU^*]^2) = s^r(AA^*X).$$
Thus
$$\lambda^r(P^{1/2}U^*AA^*UP^{1/2}) \prec_w s^r(AA^*X),$$
and hence
$$|||(P^{1/2}U^*AA^*UP^{1/2})^r||| \le |||\, |AA^*X|^r \,|||. \tag{IX.36}$$
In the same way, we have
$$\lambda^r(PBB^*) \prec_w \lambda^{r/2}(P^2[BB^*]^2) = s^r(PBB^*) = s^r(XBB^*),$$
and hence
$$|||(P^{1/2}BB^*P^{1/2})^r||| \le |||\, |XBB^*|^r \,|||. \tag{IX.37}$$
Combining the inequalities (IX.35), (IX.36), and (IX.37), we get (IX.34). ■
The following corollary of Theorem IX.5.1 should be compared with (IX.29).

Corollary IX.5.3 Let $A, B$ be positive matrices and let $X$ be any matrix. Then, for $0 \le \nu \le 1$, and for every unitarily invariant norm,
$$|||A^{\nu} X B^{1-\nu}||| \le |||AX|||^{\nu}\; |||XB|||^{1-\nu}. \tag{IX.38}$$

Proof. For $\nu = 0, 1$, the inequality (IX.38) is a trivial statement. For $\nu = \frac{1}{2}$, it reduces to the inequality (IX.33). We will prove it, by induction, for all indices $\nu = k/2^n$, $k = 0, 1, \ldots, 2^n$. The general case then follows by continuity.
Let $\nu = \frac{2k+1}{2^n}$ be any dyadic rational. Then $\nu = \mu + \rho$, where $\mu = \frac{2k}{2^n}$, $\rho = \frac{1}{2^n}$. Suppose that the inequality (IX.38) is valid for all dyadic rationals with denominator $2^{n-1}$. Two such rationals are $\mu$ and $\lambda = \mu + 2\rho = \nu + \rho$. Then, using the inequality (IX.33) and this induction hypothesis, we have
$$|||A^{\nu} X B^{1-\nu}||| = |||A^{\rho}(A^{\mu} X B^{1-\lambda})B^{\rho}||| \le |||A^{2\rho} A^{\mu} X B^{1-\lambda}|||^{1/2}\; |||A^{\mu} X B^{1-\lambda} B^{2\rho}|||^{1/2}$$
$$= |||A^{\lambda} X B^{1-\lambda}|||^{1/2}\; |||A^{\mu} X B^{1-\mu}|||^{1/2} \le |||AX|||^{\lambda/2} |||XB|||^{(1-\lambda)/2}\; |||AX|||^{\mu/2} |||XB|||^{(1-\mu)/2}$$
$$= |||AX|||^{(\lambda+\mu)/2}\; |||XB|||^{1-(\lambda+\mu)/2} = |||AX|||^{\nu}\; |||XB|||^{1-\nu}.$$
This proves that the desired inequality holds for all dyadic rationals. ■
Corollary IX.5.4 Let $A, B$ be positive matrices and let $X$ be any matrix. Then, for $0 \le \nu \le 1$, and for every unitarily invariant norm,
$$|||A^{\nu} X B^{\nu}||| \le |||X|||^{1-\nu}\; |||AXB|||^{\nu}. \tag{IX.39}$$

Proof. Assume without loss of generality that $A$ is invertible; the general case follows from this by continuity. We have, using (IX.38),
$$|||A^{\nu} X B^{\nu}||| = |||(A^{-1})^{1-\nu}(AX)B^{1-(1-\nu)}||| \le |||A^{-1} \cdot AX|||^{1-\nu}\; |||AXB|||^{\nu} = |||X|||^{1-\nu}\; |||AXB|||^{\nu}. \ \blacksquare$$
Note that the inequality (IX.5) is a very special case of (IX.39).
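Inequality (IX.39) can be tested the same way; the helper names below are our own:

```python
import numpy as np

def rand_psd(n, rng):
    """A random positive semidefinite matrix G G^T."""
    G = rng.standard_normal((n, n))
    return G @ G.T

def psd_power(A, s):
    """A^s for positive semidefinite A, via the spectral decomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0, None)**s) @ V.T

rng = np.random.default_rng(12)
ok = True
for _ in range(50):
    A, B = rand_psd(4, rng), rand_psd(4, rng)
    X = rng.standard_normal((4, 4))
    for v in (0.2, 0.5, 0.8):
        for ord_ in (2, 'fro', 'nuc'):
            lhs = np.linalg.norm(psd_power(A, v) @ X @ psd_power(B, v), ord_)
            rhs = (np.linalg.norm(X, ord_) ** (1 - v)
                   * np.linalg.norm(A @ X @ B, ord_) ** v)
            ok &= lhs <= rhs * (1 + 1e-10)
```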
Exercise IX.5.5 Since $|||AA^*||| = |||A^*A|||$, the stars in the inequality (IX.32) could have been placed differently. Much less freedom is allowed for the generalisation (IX.33). Find a $2 \times 2$ example in which $|||A^*XB|||^2$ is larger than $|||A^*AX|||\; |||XBB^*|||$.
Apart from norms, there are other interesting functions for which Schwarz-like inequalities can be obtained. This is done below. It is convenient to have a name for the class of functions we shall study.

Definition IX.5.6 A continuous complex-valued function $f$ on the space of matrices will be said to belong to the class $\mathcal{L}$ if it satisfies the following two conditions:
(i) $f(B) \ge f(A) \ge 0$ if $B \ge A \ge 0$.
(ii) $|f(A^*B)|^2 \le f(A^*A) f(B^*B)$ for all $A, B$.

We have seen above that every unitarily invariant norm is a function in the class $\mathcal{L}$. Other examples are given below.

Exercise IX.5.7 (i) The functions trace and determinant are in $\mathcal{L}$.
(ii) The function spectral radius is in $\mathcal{L}$.
are the singular values of A , then for each
n,
1 < k < n the function
fk (A )
== II k
S j (A) is in .C .
j=l
{vi) If Aj (A) denote the eigenvalues of A arranged as I A 1 (A) j >
I A n (A) I , then for 1 < k <
n the function fk ( A )
k
== II Aj (A)
·
·
·
>
is in £ .
j= l
Exercise IX.5.8 Another class of functions $\mathcal{T}$ was introduced in IX.3.2. The two classes $\mathcal{T}$ and $\mathcal{L}$ have several elements in common. Find examples to show that neither of them is contained in the other.

A different characterisation of the class $\mathcal{L}$ is obtained below. For this we need the following theorem, which is also useful in other contexts.
Theorem IX.5.9 Let $A, B$ be positive operators on $\mathcal{H}$, and let $C$ be any operator on $\mathcal{H}$. Then the operator $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ on $\mathcal{H} \oplus \mathcal{H}$ is positive if and only if there exists a contraction $K$ on $\mathcal{H}$ such that $C = B^{1/2} K A^{1/2}$.

Proof. By Proposition I.3.5, $K$ is a contraction if and only if $\begin{pmatrix} I & K^* \\ K & I \end{pmatrix}$ is positive. The positivity of this matrix implies the positivity of the matrix
$$\begin{pmatrix} A & A^{1/2}K^*B^{1/2} \\ B^{1/2}KA^{1/2} & B \end{pmatrix}.$$
(See Lemma V.1.5.) This proves one of the asserted implications. To prove the converse, first note that if $A$ and $B$ are invertible, then the argument can be reversed; and then note that the general case can be obtained from this by continuity. ■
Theorem IX.5.10 A (continuous) function $f$ is in the class $\mathcal{L}$ if and only if it satisfies the following two conditions:
(a) $f(A) \ge 0$ for all $A \ge 0$.
(b) $|f(C)|^2 \le f(A) f(B)$ for all $A, B, C$ such that $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ is positive.

Proof. If $f$ satisfies condition (i) in Definition IX.5.6, then it certainly satisfies the condition (a) above. Further, if $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ is positive, then, by the preceding theorem, $C = B^{1/2} K A^{1/2}$, where $K$ is a contraction. So, if $f$ satisfies condition (ii) in IX.5.6, then
$$|f(C)|^2 \le f(A^{1/2}K^*KA^{1/2}) f(B).$$
Since $A^{1/2}K^*KA^{1/2} \le A$, we also have $f(A^{1/2}K^*KA^{1/2}) \le f(A)$ from the condition (i) in IX.5.6.
Now suppose $f$ satisfies conditions (a) and (b). Let $B \ge A \ge 0$. Write
$$\begin{pmatrix} B & A \\ A & B \end{pmatrix} = \begin{pmatrix} B-A & 0 \\ 0 & B-A \end{pmatrix} + \begin{pmatrix} A & A \\ A & A \end{pmatrix}.$$
The first matrix in this sum is obviously positive; the second is also positive by Corollary I.3.3. Thus the sum is also positive. So it follows from (a) and (b) that $f(A) \le f(B)$. Next note that we can write, for any $A$ and $B$,
$$\begin{pmatrix} A^*A & A^*B \\ B^*A & B^*B \end{pmatrix} = \begin{pmatrix} A^* & 0 \\ B^* & 0 \end{pmatrix} \begin{pmatrix} A & B \\ 0 & 0 \end{pmatrix}.$$
Since the two matrices on the right-hand side are adjoints of each other, their product is positive. Hence we have
$$|f(A^*B)|^2 \le f(A^*A) f(B^*B).$$
This shows that $f$ is in $\mathcal{L}$. ■

This characterisation leads to an easy proof of the following theorem of E.H. Lieb.
Theorem IX.5.11 Let $A_1, \ldots, A_m$ and $B_1, \ldots, B_m$ be any matrices. Then, for every function $f$ in the class $\mathcal{L}$,
$$\left| f\Big( \sum_i A_i^* B_i \Big) \right|^2 \le f\Big( \sum_i A_i^* A_i \Big)\, f\Big( \sum_i B_i^* B_i \Big), \tag{IX.40}$$
$$\left| f\Big( \sum_i A_i \Big) \right|^2 \le f\Big( \sum_i |A_i| \Big)\, f\Big( \sum_i |A_i^*| \Big). \tag{IX.41}$$
Proof. For each $i = 1, \ldots, m$, the matrix
$$\begin{pmatrix} A_i^*A_i & A_i^*B_i \\ B_i^*A_i & B_i^*B_i \end{pmatrix}$$
is positive, being the product of $\begin{pmatrix} A_i & B_i \\ 0 & 0 \end{pmatrix}$ and its adjoint. The sum of these matrices is, therefore, also positive. So the inequality (IX.40) follows from Theorem IX.5.10. Each of the matrices $\begin{pmatrix} |A_i| & A_i^* \\ A_i & |A_i^*| \end{pmatrix}$ is also positive; see Corollary I.3.4. So the inequality (IX.41) follows by the same argument. ■
Exercise IX.5.12 For any two matrices $A, B$, we have $\mathrm{tr}(|A+B|) \le \mathrm{tr}(|A| + |B|)$. Show by an example that this inequality is not true if tr is replaced by det. Show that we have
$$[\det(|A+B|)]^2 \le \det(|A| + |B|)\, \det(|A^*| + |B^*|). \tag{IX.42}$$
A similar inequality holds for every function in $\mathcal{L}$.

IX.6 The Lieb Concavity Theorem
Let $f(A, B)$ be a real valued function of two matrix variables. Then $f$ is called jointly concave if
$$f(\alpha A_1 + (1-\alpha)A_2,\; \alpha B_1 + (1-\alpha)B_2) \ge \alpha f(A_1, B_1) + (1-\alpha) f(A_2, B_2)$$
for all $0 \le \alpha \le 1$ and for all $A_1, A_2, B_1, B_2$. In this section we will prove the following theorem due to E.H. Lieb. The importance of the theorem, and its consequences, are explained later.

Theorem IX.6.1 (Lieb) For each matrix $X$ and each real number $0 \le t \le 1$, the function
$$f(A, B) = \mathrm{tr}\, X^* A^t X B^{1-t}$$
is jointly concave on pairs of positive matrices.

Note that $f(A, B)$ is positive if $A, B$ are positive. To prove this theorem we need the following lemma.
Lemma IX.6.2 Let $R_1, R_2, S_1, S_2, T_1, T_2$ be positive operators on a Hilbert space. Suppose $R_1$ commutes with $R_2$, $S_1$ commutes with $S_2$, and $T_1$ commutes with $T_2$, and
$$R_1 \ge S_1 + T_1, \qquad R_2 \ge S_2 + T_2. \tag{IX.43}$$
Then, for $0 \le t \le 1$,
$$R_1^t R_2^{1-t} \ge S_1^t S_2^{1-t} + T_1^t T_2^{1-t}. \tag{IX.44}$$

Proof. Let $E$ be the set of all $t$ in $[0,1]$ for which the inequality (IX.44) is true. It is clear that $E$ is a closed set and it contains 0 and 1. We will first show that $E$ contains the point $\frac{1}{2}$, then use this to show that $E$ is a convex set. This would prove the lemma.
Let $x, y$ be any two vectors. Then
$$|\langle x, (S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) y \rangle| \le |\langle x, S_1^{1/2}S_2^{1/2} y \rangle| + |\langle x, T_1^{1/2}T_2^{1/2} y \rangle| \le \|S_1^{1/2}x\| \|S_2^{1/2}y\| + \|T_1^{1/2}x\| \|T_2^{1/2}y\| \le \left[ \|S_1^{1/2}x\|^2 + \|T_1^{1/2}x\|^2 \right]^{1/2} \left[ \|S_2^{1/2}y\|^2 + \|T_2^{1/2}y\|^2 \right]^{1/2}$$
by the Cauchy–Schwarz inequality. This last expression can be rewritten as
$$\left[ \langle x, (S_1 + T_1)x \rangle \right]^{1/2} \left[ \langle y, (S_2 + T_2)y \rangle \right]^{1/2}.$$
Hence, by the hypothesis, we have
$$|\langle x, (S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) y \rangle| \le \left[ \langle x, R_1 x \rangle \langle y, R_2 y \rangle \right]^{1/2}.$$
Using this, we see that, for all unit vectors $u$ and $v$,
$$|\langle u, R_1^{-1/2}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/2} v \rangle| = |\langle R_1^{-1/2}u, (S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/2} v \rangle| \le \left[ \langle R_1^{-1/2}u, R_1 R_1^{-1/2}u \rangle \langle R_2^{-1/2}v, R_2 R_2^{-1/2}v \rangle \right]^{1/2} = 1.$$
This shows that
$$\|R_1^{-1/2}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/2}\| \le 1.$$
Using Proposition IX.1.1, the commutativity of $R_1$ and $R_2$, and the inequality above, we see that
$$\|R_1^{-1/4}R_2^{-1/4}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/4}R_1^{-1/4}\| \le 1.$$
This is equivalent to the operator inequality
$$R_1^{-1/4}R_2^{-1/4}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/4}R_1^{-1/4} \le I.$$
From this, it follows that
$$S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2} \le R_1^{1/2}R_2^{1/2}.$$
(See Lemma V.1.5.) This shows that the set $E$ contains the point $1/2$.
Now suppose that $\mu$ and $\nu$ are in $E$. Then
$$R_1^{\mu} R_2^{1-\mu} \ge S_1^{\mu} S_2^{1-\mu} + T_1^{\mu} T_2^{1-\mu},$$
$$R_1^{\nu} R_2^{1-\nu} \ge S_1^{\nu} S_2^{1-\nu} + T_1^{\nu} T_2^{1-\nu}.$$
These two inequalities are exactly of the form (IX.43). Hence, using the special case of the lemma that has been proved above, we have
$$(R_1^{\mu} R_2^{1-\mu})^{1/2} (R_1^{\nu} R_2^{1-\nu})^{1/2} \ge (S_1^{\mu} S_2^{1-\mu})^{1/2} (S_1^{\nu} S_2^{1-\nu})^{1/2} + (T_1^{\mu} T_2^{1-\mu})^{1/2} (T_1^{\nu} T_2^{1-\nu})^{1/2}.$$
Using the hypothesis about commutativity, we see from this inequality that $\frac{1}{2}(\mu + \nu)$ is in $E$. This shows that $E$ is convex. ■
Proof of Theorem IX.6.1: Let $A, B$ be positive operators on $\mathcal{H}$. Let $\mathcal{A}$ and $\mathcal{B}$ be the left and right multiplication operators on the space $\mathcal{L}(\mathcal{H})$ induced by $A$ and $B$; i.e., $\mathcal{A}(X) = AX$ and $\mathcal{B}(X) = XB$. Using the results of Exercise I.4.4 and Problem VII.6.10, one sees that $\mathcal{A}$ and $\mathcal{B}$ are positive operators on (the Hilbert space) $\mathcal{L}(\mathcal{H})$.
Now, suppose $A_1, A_2, B_1, B_2$ are positive operators on $\mathcal{H}$. Let $A = A_1 + A_2$, $B = B_1 + B_2$. Let $\mathcal{A}_1, \mathcal{A}_2, \mathcal{A}$ denote the left multiplication operators on $\mathcal{L}(\mathcal{H})$ induced by $A_1, A_2$, and $A$, respectively, and $\mathcal{B}_1, \mathcal{B}_2, \mathcal{B}$ the right multiplication operators induced by $B_1, B_2$, and $B$, respectively. Then $\mathcal{A} = \mathcal{A}_1 + \mathcal{A}_2$, $\mathcal{B} = \mathcal{B}_1 + \mathcal{B}_2$. Hence, by Lemma IX.6.2,
$$\mathcal{A}^t \mathcal{B}^{1-t} \ge \mathcal{A}_1^t \mathcal{B}_1^{1-t} + \mathcal{A}_2^t \mathcal{B}_2^{1-t}$$
for $0 \le t \le 1$. This is the same as saying that for every $X$ in $\mathcal{L}(\mathcal{H})$
$$\langle X, \mathcal{A}^t \mathcal{B}^{1-t} X \rangle \ge \langle X, \mathcal{A}_1^t \mathcal{B}_1^{1-t} X \rangle + \langle X, \mathcal{A}_2^t \mathcal{B}_2^{1-t} X \rangle,$$
or that
$$\mathrm{tr}\, X^* A^t X B^{1-t} \ge \mathrm{tr}\, X^* A_1^t X B_1^{1-t} + \mathrm{tr}\, X^* A_2^t X B_2^{1-t}.$$
From this, it follows that
$$f(\alpha A_1 + (1-\alpha)A_2,\; \alpha B_1 + (1-\alpha)B_2) \ge \alpha f(A_1, B_1) + (1-\alpha) f(A_2, B_2).$$
This shows that $f$ is concave. ■

Another proof of this theorem is outlined in Problem IX.8.17. Using the identification of $\mathcal{L}(\mathcal{H})$ with $\mathcal{H} \otimes \mathcal{H}$, Lieb's Theorem can be seen to be equivalent to the following theorem.

Theorem IX.6.3 (T. Ando) For each $0 \le t \le 1$, the map $(A, B) \mapsto A^t \otimes B^{1-t}$ is jointly concave on pairs of positive operators on $\mathcal{H}$.
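Joint concavity in Lieb's Theorem can be probed numerically at random midpoints; the sketch below (our own helpers, with a small multiple of the identity added to keep the matrices positive definite) tests the defining inequality at $\alpha = 1/2$:

```python
import numpy as np

def rand_pd(n, rng):
    """A random positive definite matrix (our helper)."""
    G = rng.standard_normal((n, n))
    return G @ G.T + 0.1 * np.eye(n)

def psd_power(A, s):
    """A^s for positive definite A, via the spectral decomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 1e-12, None)**s) @ V.T

def lieb_f(A, B, X, t):
    """f(A, B) = tr X* A^t X B^{1-t} (real case, so X* = X^T)."""
    return np.trace(X.T @ psd_power(A, t) @ X @ psd_power(B, 1 - t))

rng = np.random.default_rng(10)
X = rng.standard_normal((3, 3))
ok = True
for t in (0.3, 0.5, 0.8):
    for _ in range(50):
        A1, A2 = rand_pd(3, rng), rand_pd(3, rng)
        B1, B2 = rand_pd(3, rng), rand_pd(3, rng)
        mid = lieb_f(0.5 * (A1 + A2), 0.5 * (B1 + B2), X, t)
        ok &= mid >= 0.5 * lieb_f(A1, B1, X, t) + 0.5 * lieb_f(A2, B2, X, t) - 1e-8
```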
Exercise IX. 6 .4 Let t1 , t 2 be two positive numbers such that t 1 + t 2 < 1 . At1 ® B t 2 is jointly concave on pairs of positive Show that the map (A, B ) operators on 1l . [Hint: The map A As is monotone and concave on positive operators for 0 < s < 1 . See Chapter 5.} t
t
Lieb proved his theorem in connection with problems connected with entropy in quantum mechanics. This is explained below (with some simplifications). The function S(A) = −tr A log A, where A is a positive matrix, is called the entropy function. It is a concave function on the set of positive operators. In fact, we have seen that the function f(t) = −t log t is operator concave on (0, ∞). Let K be a given Hermitian operator. The entropy of A relative to K is defined as

S(A, K) = ½ tr [A^{1/2}, K]²,

where [X, Y] stands for the Lie bracket (or the commutator) XY − YX. This concept was introduced by Wigner and Yanase, and extended by Dyson, who considered the functions

S_t(A, K) = ½ tr [A^t, K][A^{1−t}, K],   0 < t < 1.   (IX.45)

The Wigner-Yanase-Dyson conjecture said that S_t(A, K) is concave in A on the set of positive matrices. Lieb's Theorem implies that this is true. To see this, note that

S_t(A, K) = tr KA^t KA^{1−t} − tr K²A.   (IX.46)

Since the function g(A) = −tr K²A is linear in A, it is also concave. Hence, the concavity of S_t(A, K) follows from that of tr KA^t KA^{1−t}. But that is a special case of Lieb's Theorem.

Given any operator X, define

I_t(A, X) = tr X*A^t XA^{1−t} − tr X*XA,   0 < t < 1.   (IX.47)

Note that I₀(A, X) = 0. When X is Hermitian, this reduces to the function S_t defined earlier. Lieb's Theorem implies that I_t(A, X) is a concave function of A. Hence, the function I(A, X) defined as

I(A, X) = (d/dt)|_{t=0} I_t(A, X) = tr (X*(log A)XA − X*X(log A)A)   (IX.48)

is also concave. Let A, B be positive matrices. The relative entropy of A and B is defined as

S(A|B) = tr (A(log A − log B)).   (IX.49)
This notion was introduced by Umegaki, and generalised by Araki to the von Neumann algebra setting.
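The commutator identity (IX.46) behind the Wigner-Yanase-Dyson reduction is easy to verify numerically. The following check (an added illustration, not from the text, assuming NumPy) compares the two sides for a random positive A and Hermitian K.

```python
# Check (IX.46): (1/2) tr [A^t, K][A^{1-t}, K] = tr K A^t K A^{1-t} - tr K^2 A.
import numpy as np

rng = np.random.default_rng(1)
n, t = 4, 0.4

M = rng.standard_normal((n, n))
A = M @ M.T + 0.1 * np.eye(n)                 # positive definite
K = rng.standard_normal((n, n)); K = (K + K.T) / 2   # Hermitian

w, V = np.linalg.eigh(A)
At  = (V * w**t) @ V.T                        # A^t
A1t = (V * w**(1 - t)) @ V.T                  # A^(1-t)

comm = lambda X, Y: X @ Y - Y @ X
lhs = 0.5 * np.trace(comm(At, K) @ comm(A1t, K))
rhs = np.trace(K @ At @ K @ A1t) - np.trace(K @ K @ A)
print(abs(lhs - rhs) < 1e-10)
```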
Theorem IX.6.5 (Lindblad) The function S(A|B) defined above is jointly convex in A, B.

Proof. Consider the block-matrices

T = (A  0; 0  B),   X = (0  0; I  0).

Then note that

S(A|B) = −I(T, X).

We noted earlier that I(T, X) is concave in the argument T. ■

Exercise IX.6.6 (i) Show that for every pinching 𝒞, S(𝒞(A)|𝒞(B)) ≤ S(A|B).
(ii) Let τ be the normalised trace function on n × n matrices; i.e., τ(A) = (1/n) tr A. Show that for all positive matrices A, B,

τ(A)(log τ(A) − log τ(B)) ≤ τ(A(log A − log B)).   (IX.50)

This is called the Peierls-Bogoliubov Inequality. (There are other inequalities that go by the same name.)
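The inequality (IX.50) can be tested numerically. Below is a small sketch (an illustration added here, not part of the text, assuming NumPy); the matrix logarithm of a positive definite matrix is computed through its spectral decomposition.

```python
# Check (IX.50): tau(A)(log tau(A) - log tau(B)) <= tau(A (log A - log B)).
import numpy as np

rng = np.random.default_rng(2)
n = 4

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

def logm(A):
    """log of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

A, B = rand_pd(), rand_pd()
tau = lambda X: float(np.trace(X)) / n       # normalised trace

lhs = tau(A) * (np.log(tau(A)) - np.log(tau(B)))
rhs = tau(A @ (logm(A) - logm(B)))
print(lhs <= rhs + 1e-10)
```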
IX.7 Operator Approximation
An operator approximation problem consists of finding, for a given operator A, the element nearest to it from a special class. Some problems of this type are studied in this section. In formulating and interpreting these results, it is helpful to have an analogy: if arbitrary operators are thought of as complex numbers, then Hermitian operators should be thought of as real numbers, unitary operators as complex numbers of modulus one, and positive operators as positive real numbers. Of course, this analogy has its limitations, since multiplication of complex numbers is commutative and that of operators is not. The first theorem below is easy to prove and sets the stage for later results.
Theorem IX.7.1 Let A be any operator and let Re A = ½(A + A*). Then, for every Hermitian operator H and for every unitarily invariant norm,

|||A − Re A||| ≤ |||A − H|||.   (IX.51)

Proof. Recall that |||T||| = |||T*||| for every T. Using this fact and the triangle inequality, we have

|||A − ½(A + A*)||| = ½|||A − A*||| = ½|||A − H + H − A*|||
≤ ½(|||A − H||| + |||(A − H)*|||) = |||A − H|||.

This proves the theorem. ■
The inequality (IX.51) is sometimes stated in words as: in every unitarily invariant norm, Re A is a Hermitian approximant to A. The next theorem says that a unitary approximant to A is any unitary that occurs in its polar decomposition.
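For the Frobenius norm (which is unitarily invariant), the inequality (IX.51) can be spot-checked against random Hermitian competitors. This is an added illustration (not from the text), assuming NumPy.

```python
# Re A is a Hermitian approximant to A in the Frobenius norm.
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
ReA = (A + A.conj().T) / 2

dist_re = np.linalg.norm(A - ReA)   # Frobenius norm distance
ok = True
for _ in range(100):
    H = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (H + H.conj().T) / 2        # random Hermitian competitor
    ok = ok and dist_re <= np.linalg.norm(A - H) + 1e-12
print(ok)
```

For the Frobenius norm this is just the statement that Re A is the orthogonal projection of A onto the real subspace of Hermitian matrices.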
Theorem IX.7.2 If A = UP, where U is unitary and P positive, then

|||A − U||| ≤ |||A − W||| ≤ |||A + U|||   (IX.52)

for every unitary W and for every unitarily invariant norm.

Proof. By unitary invariance, the inequality (IX.52) is equivalent to

|||P − I||| ≤ |||P − U*W||| ≤ |||P + I|||.

So the assertion of the theorem is equivalent to the following: for every positive operator P and unitary operator V,

|||P − I||| ≤ |||P − V||| ≤ |||P + I|||.   (IX.53)

This will be proved using the spectral perturbation inequality (IV.62). Let

P̃ = (0  P; P  0),   Ṽ = (0  V; V*  0).

Then P̃ and Ṽ are Hermitian. The eigenvalues of P̃ are the singular values of P together with their negatives. (See Exercise II.1.15.) The same is true for Ṽ, which means that it has eigenvalues 1 and −1, each with multiplicity n. We thus have

Eig↓(P̃) − Eig↓(Ṽ) = [Eig↓(P) − I] ⊕ [−Eig↑(P) + I],
Eig↓(P̃) − Eig↑(Ṽ) = [Eig↓(P) + I] ⊕ [−Eig↑(P) − I].

So, from (IV.62), we have

|||[Eig↓(P) − I] ⊕ [Eig↑(P) − I]||| ≤ |||(P − V) ⊕ (P − V)*||| ≤ |||[Eig↓(P) + I] ⊕ [Eig↑(P) + I]|||.

This is equivalent to the pair of inequalities (IX.53). ■
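The unitary polar factor can be computed from the singular value decomposition, and the two-sided bound (IX.52) tested against random unitaries. A sketch (added illustration, not from the text, assuming NumPy), in the Frobenius norm:

```python
# A = U P with U = W Vh from the SVD A = W diag(s) Vh; check (IX.52).
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

W_, s, Vh = np.linalg.svd(A)
U = W_ @ Vh                       # unitary factor of the polar decomposition

lower = np.linalg.norm(A - U)     # |||A - U|||
upper = np.linalg.norm(A + U)     # |||A + U|||
ok = True
for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    d = np.linalg.norm(A - Q)     # distance to a random unitary
    ok = ok and lower <= d + 1e-12 and d <= upper + 1e-12
print(ok)
```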
The two approximation problems solved above are subsumed in a more general question. Let Φ be a closed subset of the complex plane, and let 𝒩(Φ) be the collection of all normal operators whose spectrum is contained in Φ. The problem is to find an element of 𝒩(Φ) nearest to a given operator A. Theorem IX.7.1 solves this problem when Φ is the real line, and Theorem IX.7.2 when Φ is the unit circle. When Φ is the whole plane or the positive half-line, the problem becomes much harder, and the full solution is not known. Note that in the first case (Φ = ℂ) we are asking for a normal approximant to A, and in the second case (Φ = ℝ₊) for a positive approximant to A. Some results on this problem, which are easy to describe and also are directly related to other parts of this book, are given below. We have already come across a special case of this problem in Chapter 6. Let F be a retraction of the plane onto the subset Φ; i.e., F is a map of ℂ onto Φ such that |z − F(z)| ≤ |z − w| for all z ∈ ℂ and w ∈ Φ. Such a map always exists; it is unique if (and only if) Φ is convex. We have the following theorem.
Theorem IX.7.3 Let F be a retraction of the plane onto the closed set Φ. Suppose Φ is convex. Then, for every normal operator A, we have

|||A − F(A)||| ≤ |||A − N|||   (IX.54)

for all N ∈ 𝒩(Φ) and for all unitarily invariant norms. If the set Φ is not convex, the inequality (IX.54) may not be true for all unitarily invariant norms, but is still true for all Q-norms. (See Theorem VI.6.2 and Problem VI.8.13.)
Exercise IX.7.4 Let A be a Hermitian operator, and let A = A₊ − A₋ be its Jordan decomposition. (Both A₊ and A₋ are positive operators.) Use the above theorem to show that, if P is any positive operator, then

|||A − A₊||| ≤ |||A − P|||   (IX.55)

for every unitarily invariant norm. If A is normal, then for every positive operator P

|||A − (Re A)₊||| ≤ |||A − P|||.   (IX.56)
Theorem IX.7.5 Let A be any operator. Then for every positive operator P,

||A − (Re A)₊||₂ ≤ ||A − P||₂.   (IX.57)

Proof. Recall that ||A||₂² = ||Re A||₂² + ||Im A||₂². Hence,

||A − (Re A)₊||₂² = ||Re A − (Re A)₊||₂² + ||Im A||₂²,
||A − P||₂² = ||Re A − P||₂² + ||Im A||₂².

From (IX.55), we see that ||Re A − (Re A)₊||₂² is bounded by ||Re A − P||₂². This leads to the inequality (IX.57). ■

The problem of finding positive approximants to an arbitrary operator A is much more complex for other norms. See the Notes at the end of the chapter. For normal approximants, we have a solution in all unitarily invariant norms only in the 2 × 2 case.
Theorem IX.7.6 Let A be an upper triangular matrix

A = (λ₁  b; 0  λ₂).   (IX.58)

Let θ = arg(λ₁ − λ₂), and let

N₀ = (λ₁  b/2; (b̄/2)e^{2iθ}  λ₂).   (IX.59)

Then N₀ is normal, and for any normal matrix N we have

|||A − N₀||| ≤ |||A − N|||   (IX.60)

for every unitarily invariant norm.
Proof. It is easy to check that N₀ is normal. Let N = (x₁  x₂; x₃  x₄) be any normal matrix. We must have |x₂| = |x₃| in that case.

Now note that, if T = (t₁  t₂; t₃  t₄) is any matrix, we can write its off-diagonal part (0  t₂; t₃  0) as ½(T − UTU*), where U is the diagonal matrix with diagonal entries 1 and −1. Hence, for every unitarily invariant norm, we have

|||T||| ≥ |||(0  t₂; t₃  0)|||.

Using this, we see that

|||A − N||| ≥ |||(0  b − x₂; −x₃  0)||| = ||| diag(b − x₂, −x₃) |||.

But

|b| ≤ |b − x₂| + |x₂| = |b − x₂| + |x₃|.

Thus the vector ½(|b|, |b|) is weakly majorised by the vector (|b − x₂|, |x₃|), which, in turn, is weakly majorised by the vector (s₁(A − N), s₂(A − N)), as seen above. Since A − N₀ has singular values (½|b|, ½|b|), this proves the inequality (IX.60). ■

Since every 2 × 2 matrix is unitarily equivalent to an upper triangular matrix of the form (IX.58), this theorem tells us how to find a normal matrix closest to it.
Exercise IX.7.7 The measure of nonnormality of a matrix, with respect to any norm, was defined in Problems VIII.6.4 and VIII.6.5. Theorem IX.7.6, on the other hand, gives for 2 × 2 matrices a formula for the distance to the set of all normal matrices. What is the relation between these two numbers for a given unitarily invariant norm?
IX.8 Problems
Problem IX.8.1. Let A, B be positive matrices, and let m, k be positive integers with m ≥ k. Use the inequality (IX.13) to show that

tr (AB)^m ≤ tr (A^{m/k} B^{m/k})^k.   (IX.61)

The special case,

tr (AB)^m ≤ tr A^m B^m,   (IX.62)

is called the Lieb-Thirring inequality.
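The Lieb-Thirring inequality (IX.62) can be checked on random positive matrices. An added illustration (not from the text), assuming NumPy:

```python
# Check tr (AB)^m <= tr A^m B^m for positive A, B.
import numpy as np

rng = np.random.default_rng(5)
n, m = 4, 3

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

A, B = rand_pd(), rand_pd()
mp = np.linalg.matrix_power
lhs = np.trace(mp(A @ B, m))        # tr (AB)^m, real and nonnegative
rhs = np.trace(mp(A, m) @ mp(B, m)) # tr A^m B^m
print(lhs <= rhs + 1e-8)
```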
Problem IX.8.2. Let A, B be Hermitian matrices. Show that for every positive integer m

(i) |tr (AB)^{2m}| ≤ tr A^{2m}B^{2m},
(ii) |tr (A^m B^m)²| ≤ tr A^{2m}B^{2m},
(iii) |tr (AB)^{4m}| ≤ tr (A^{2m}B^{2m})².

(Hint: By the Weyl Majorant Theorem, |tr X^m| ≤ tr |X|^m for every matrix X.) Note that if

A = (1  1; 1  −1),   B = (0  1; 1  1),

then |tr (AB)³| = 5, |tr A³B³| = 4, tr (AB)⁶ = 9, and tr (A³B³)² = 0. This shows the failure of possible extensions of the inequalities (i) and (iii) above.
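The entries of A and B are partly illegible in this copy; the pair used below is a reconstruction chosen to reproduce all four values stated in the text, and the computation (an added illustration, assuming NumPy) confirms them.

```python
# Verify the counterexample values 5, 4, 9, 0 from Problem IX.8.2.
import numpy as np

A = np.array([[1, 1], [1, -1]])
B = np.array([[0, 1], [1, 1]])
mp = np.linalg.matrix_power

v1 = abs(np.trace(mp(A @ B, 3)))            # |tr (AB)^3|
v2 = abs(np.trace(mp(A, 3) @ mp(B, 3)))     # |tr A^3 B^3|
v3 = np.trace(mp(A @ B, 6))                 # tr (AB)^6
v4 = np.trace(mp(mp(A, 3) @ mp(B, 3), 2))   # tr (A^3 B^3)^2
print(v1, v2, v3, v4)                       # 5 4 9 0
```

So 5 = |tr (AB)³| > |tr A³B³| = 4 and 9 = tr (AB)⁶ > tr (A³B³)² = 0, refuting the candidate extensions.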
Problem IX.8.3. Are there any natural generalisations of the above inequalities when three matrices A, B, C are involved? Take, for instance, the inequality (IX.62). A product of three positive matrices need not have positive eigenvalues. One still might wonder whether |tr (ABC)²| ≤ |tr A²B²C²|. Construct an example to show that this need not be true.
Problem IX.8.4. A possible generalisation of the Golden-Thompson inequality (IX.19) would have been tr e^{A+B+C} ≤ |tr (e^A e^B e^C)| for any three Hermitian matrices A, B, C. This is false. To see this, let S₁, S₂, S₃ be the Pauli spin matrices

S₁ = (0  1; 1  0),   S₂ = (0  −i; i  0),   S₃ = (1  0; 0  −1).

If a₁, a₂, a₃ are any real numbers and a = (a₁² + a₂² + a₃²)^{1/2}, show that

exp(a₁S₁ + a₂S₂ + a₃S₃) = (cosh a) I + (sinh a)/a (a₁S₁ + a₂S₂ + a₃S₃).
Let A = tS₁, B = tS₂, C = tS₃. Show that

tr e^{A+B+C} = 2 cosh(√3 t),
|tr (e^A e^B e^C)| = 2 cosh(√3 t)[1 − (1/12)t⁴ + O(t⁶)].

For small t, the first quantity is bigger than the second.
Problem IX.8.5. Show that the Lie Product Formula has a generalisation: for any k matrices A₁, A₂, …, A_k,

lim_{m→∞} (exp(A₁/m) exp(A₂/m) ⋯ exp(A_k/m))^m = exp(A₁ + A₂ + ⋯ + A_k).
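The convergence can be observed directly: the error decays roughly like 1/m. A sketch for k = 3 Hermitian matrices (an added illustration, not from the text, assuming NumPy):

```python
# (exp(A1/m) exp(A2/m) exp(A3/m))^m -> exp(A1 + A2 + A3) as m grows.
import numpy as np

def expm_h(H):
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

rng = np.random.default_rng(6)
n = 3
def sym(M): return (M + M.T) / 2
A1, A2, A3 = (sym(rng.standard_normal((n, n))) for _ in range(3))
target = expm_h(A1 + A2 + A3)

def trotter(m):
    step = expm_h(A1 / m) @ expm_h(A2 / m) @ expm_h(A3 / m)
    return np.linalg.matrix_power(step, m)

err8 = np.linalg.norm(trotter(8) - target)
err64 = np.linalg.norm(trotter(64) - target)
print(err64 < err8)   # the approximation improves as m increases
```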
Problem IX.8.6. Show that for any two matrices A, B we have

|||A*B + B*A||| ≤ |||AA* + BB*|||

and

|||A*B + B*A||| ≤ |||A*A + B*B|||

for every unitarily invariant norm.
Problem IX.8.7. Let X, Y be positive. Show that for every unitarily invariant norm

|||X − Y||| ≤ ||| (X  0; 0  Y) |||.

From this, it follows that, for every A,

||A*A − AA*|| ≤ ||A||².

Problem IX.8.8. Let A, B be positive matrices and let X be any matrix. Show that for all unitarily invariant norms, and for 0 < v < 1,

|||A^v X B^{1−v} + A^{1−v} X B^v||| ≤ |||AX + XB|||.
Problem IX.8.9. Let A, B be positive operators and let T be any operator such that ||T*x|| ≤ ||Ax|| and ||Tx|| ≤ ||Bx|| for all x. Show that, for all x, y and for 0 < v < 1,

|⟨y, Tx⟩| ≤ ||A^{1−v} y|| ||B^v x||.

[Hint: From the hypotheses, it follows that A⁻¹T and TB⁻¹ are contractions. The inequality (IX.38) then implies that (A⁻¹)^{1−v} T (B⁻¹)^v is a contraction.]

Problem IX.8.10. Use the result of the above problem to prove the following. For all operators T, vectors x, y, and for 0 < v < 1,

|⟨y, Tx⟩| ≤ || |T*|^{1−v} y ||  || |T|^v x ||.

This inequality is called the Mixed Schwarz Inequality.
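The Mixed Schwarz Inequality can be spot-checked with random data. In the sketch below (an added illustration, not from the text, assuming NumPy), |T| = (T*T)^{1/2} and |T*| = (TT*)^{1/2} and their fractional powers are computed spectrally.

```python
# Check |<y, Tx>| <= || |T*|^(1-v) y || * || |T|^v x ||.
import numpy as np

rng = np.random.default_rng(7)
n, v = 4, 0.3
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def psd_pow(A, p):
    """A^p for positive semidefinite A (tiny negative eigenvalues clipped)."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0, None)**p) @ V.conj().T

absT_v    = psd_pow(T.conj().T @ T, v / 2)        # |T|^v
absTs_1mv = psd_pow(T @ T.conj().T, (1 - v) / 2)  # |T*|^(1-v)

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

lhs = abs(y.conj() @ (T @ x))
rhs = np.linalg.norm(absTs_1mv @ y) * np.linalg.norm(absT_v @ x)
print(lhs <= rhs + 1e-9)
```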
Problem IX.8.11. Show that if A, B are positive matrices, then

det(I + A + B) ≤ det(I + A) det(I + B).

Then use this and Theorem IX.5.11 to show that, for any two matrices A, B,

|det(I + A + B)| ≤ det(I + |A|) det(I + |B|).

(See Problem IV.5.9 for another proof of this.)
Problem IX.8.12. Show that for all positive matrices A, B

tr (A(log A − log B)) ≥ tr (A − B).   (IX.63)

A suitably chosen pair of 2 × 2 positive matrices A, B shows that we may not have the operator inequality A(log A − log B) ≥ A − B.
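The trace inequality (IX.63) is easy to test numerically. An added illustration (not from the text), assuming NumPy:

```python
# Check (IX.63): tr(A (log A - log B)) >= tr(A - B) for positive A, B.
import numpy as np

rng = np.random.default_rng(8)
n = 4

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

def logm(A):
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

A, B = rand_pd(), rand_pd()
lhs = np.trace(A @ (logm(A) - logm(B)))
rhs = np.trace(A - B)
print(lhs >= rhs - 1e-10)
```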
Problem IX.8.13. Let f be a convex function on an interval I. Let A, B be two Hermitian matrices whose spectra are contained in I. Show that

tr [f(A) − f(B)] ≥ tr [(A − B) f′(B)].   (IX.64)

The special choice f(t) = t log t gives the inequality (IX.63).
Problem IX.8.14. Let A be a Hermitian matrix and f any convex function. Then for every unit vector x,

f(⟨x, Ax⟩) ≤ ⟨x, f(A)x⟩.

This implies that, for any orthonormal basis x₁, …, x_n,

Σ_{j=1}^n f(⟨x_j, Ax_j⟩) ≤ tr f(A).

The name Peierls-Bogoliubov inequality is sometimes used for this inequality, or for its special cases f(t) = e^t, f(t) = e^{−t}, etc.
Problem IX.8.15. The concavity assertion in Exercise IX.6.4 can be generalised to several variables. Let t₁, t₂, t₃ be positive numbers such that t₁ + t₂ + t₃ < 1. Let A₁, A₂, A₃ be positive operators. Note that

A₁^{t₁} ⊗ A₂^{t₂} ⊗ A₃^{t₃} = (A₁^{t₁} ⊗ A₂^{t₂} ⊗ I)(I ⊗ I ⊗ A₃^{t₃}).

Use the concavity of the first factor above (which has been proved in Exercise IX.6.4) and the integral representation (V.4) for the second factor to prove that the map (A₁, A₂, A₃) → A₁^{t₁} ⊗ A₂^{t₂} ⊗ A₃^{t₃} is jointly concave on triples of positive operators. More generally, prove that for positive numbers t₁, …, t_k with t₁ + ⋯ + t_k < 1, the map that takes a k-tuple of positive operators (A₁, …, A_k) to the operator A₁^{t₁} ⊗ ⋯ ⊗ A_k^{t_k} is jointly concave.
Problem IX.8.16. A special consequence of the above is that the map A → ⊗^k A^{1/k} is concave on positive operators for all k = 1, 2, …. Use this to prove the following inequalities for n × n positive matrices A, B:

(i) ⊗^k (A + B)^{1/k} ≥ ⊗^k A^{1/k} + ⊗^k B^{1/k},
(ii) ∧^k (A + B)^{1/k} ≥ ∧^k A^{1/k} + ∧^k B^{1/k},
(iii) ∨^k (A + B)^{1/k} ≥ ∨^k A^{1/k} + ∨^k B^{1/k},
(iv) det(A + B)^{1/n} ≥ det A^{1/n} + det B^{1/n},
(v) per(A + B)^{1/n} ≥ per A^{1/n} + per B^{1/n},
(vi) c_k((A + B)^{1/k}) ≥ c_k(A^{1/k}) + c_k(B^{1/k}) for 1 ≤ k ≤ n,

where c_k(A) = tr ∧^k (A). The inequality (iv) above is called the Minkowski Determinant Theorem and has been proved earlier (Corollary II.3.21).
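Inequality (iv), the Minkowski Determinant Theorem, is the easiest of the list to verify numerically. An added illustration (not from the text), assuming NumPy:

```python
# Check det(A+B)^(1/n) >= det(A)^(1/n) + det(B)^(1/n) for positive A, B.
import numpy as np

rng = np.random.default_rng(9)
n = 4

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

A, B = rand_pd(), rand_pd()
lhs = np.linalg.det(A + B) ** (1 / n)
rhs = np.linalg.det(A) ** (1 / n) + np.linalg.det(B) ** (1 / n)
print(lhs >= rhs - 1e-10)
```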
Problem IX.8.17. Outlined below is another proof of the Lieb Concavity Theorem which uses results on operator concave functions proved in Chapter 5.

(i) Consider the space 𝓛(ℋ) ⊕ 𝓛(ℋ) with the inner product

⟨(X₁, X₂), (Y₁, Y₂)⟩ = tr (X₁*Y₁ + X₂*Y₂).

(ii) Let A₁, A₂ be invertible positive operators on ℋ and let A = ½(A₁ + A₂). Let

Δ(R) = ARA⁻¹,   Δ₁₂(R, S) = (A₁RA₁⁻¹, A₂SA₂⁻¹).

Then Δ is a positive operator on the Hilbert space 𝓛(ℋ) and Δ₁₂ is a positive operator on the Hilbert space 𝓛(ℋ) ⊕ 𝓛(ℋ).

(iii) Note that for any X in 𝓛(ℋ)

tr X*A^t XA^{1−t} = ⟨XA^{1/2}, Δ^t(XA^{1/2})⟩

and

tr (X*A₁^t XA₁^{1−t} + X*A₂^t XA₂^{1−t}) = ⟨(XA₁^{1/2}, XA₂^{1/2}), Δ₁₂^t(XA₁^{1/2}, XA₂^{1/2})⟩.

(iv) Let V be the map from 𝓛(ℋ) into 𝓛(ℋ) ⊕ 𝓛(ℋ) defined as

V(XA^{1/2}) = (1/√2)(XA₁^{1/2}, XA₂^{1/2}).

Show that V is an isometry. Show that V*Δ₁₂V = Δ.

(v) Since the function f(s) = s^t on [0, ∞) is operator concave for 0 < t < 1, using Exercise V.2.4, we obtain

Δ^t = (V*Δ₁₂V)^t ≥ V*Δ₁₂^t V.

(vi) This shows that

tr X*A^t XA^{1−t} ≥ ½ tr (X*A₁^t XA₁^{1−t} + X*A₂^t XA₂^{1−t})

when A₁ and A₂ are invertible. By continuity, this is true for all positive operators A₁ and A₂. In other words, for all 0 < t < 1, the function

f(A) = tr X*A^t XA^{1−t}

is concave.

(vii) Use 2 × 2 operator matrices (A  0; 0  B) and (0  0; X  0) to complete the proof of Lieb's concavity theorem.
Problem IX.8.18. Theorem IX.7.1 can be generalised as follows. Let φ be a mapping of the space of n × n matrices into itself that satisfies three conditions:

(i) φ² is the identity map; i.e., φ(φ(A)) = A for all A.
(ii) φ is real linear; i.e., φ(αA + βB) = αφ(A) + βφ(B) for all A, B and all real α, β.
(iii) A and φ(A) have the same singular values for all A.

Then the set 𝓘(φ) = {A : φ(A) = A} is a real linear subspace of the space of matrices. For each A, the matrix ½(A + φ(A)) is in 𝓘(φ), and for all unitarily invariant norms

|||A − ½(A + φ(A))||| ≤ |||A − B||| for all B ∈ 𝓘(φ).
Examples of such maps are φ(A) = ±A*, φ(A) = ±Aᵀ, and φ(A) = ±Ā, where Aᵀ denotes the transpose of A and Ā denotes the matrix obtained by taking the complex conjugate of each entry of A.
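For the example φ(A) = Aᵀ, the set 𝓘(φ) is the symmetric matrices and ½(A + φ(A)) is the symmetric part of A. The sketch below (an added illustration, not from the text, assuming NumPy) checks the minimality claim in the Frobenius norm against random symmetric competitors.

```python
# Nearest fixed point of phi(A) = A^T is the symmetric part (A + A^T)/2.
import numpy as np

rng = np.random.default_rng(10)
n = 5
A = rng.standard_normal((n, n))
half = (A + A.T) / 2              # (1/2)(A + phi(A))

ok = True
for _ in range(100):
    S = rng.standard_normal((n, n)); S = (S + S.T) / 2   # element of I(phi)
    ok = ok and np.linalg.norm(A - half) <= np.linalg.norm(A - S) + 1e-12
print(ok)
```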
Problem IX.8.19. The Cayley transform of a Hermitian matrix A is the unitary matrix C(A) defined as

C(A) = (A − iI)(A + iI)⁻¹.

If A, B are two Hermitian matrices, we have

(1/2i)[C(A) − C(B)] = (B + iI)⁻¹ (A − B) (A + iI)⁻¹.

Use this to show that for all j,

(1/2) s_j(C(A) − C(B)) ≤ s_j(A − B).

[Note that ||(A + iI)⁻¹|| ≤ 1 and ||(B + iI)⁻¹|| ≤ 1.] In particular, this gives

(1/2) |||C(A) − C(B)||| ≤ |||A − B|||

for every unitarily invariant norm.
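The singular value bound can be checked directly. An added illustration (not from the text), assuming NumPy:

```python
# Check (1/2) s_j(C(A) - C(B)) <= s_j(A - B) for Hermitian A, B.
import numpy as np

rng = np.random.default_rng(11)
n = 5

def rand_herm():
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

A, B = rand_herm(), rand_herm()
I = np.eye(n)
cayley = lambda H: (H - 1j * I) @ np.linalg.inv(H + 1j * I)

s_C = np.linalg.svd(cayley(A) - cayley(B), compute_uv=False)   # decreasing
s_AB = np.linalg.svd(A - B, compute_uv=False)                  # decreasing
print(bool(np.all(s_C / 2 <= s_AB + 1e-10)))
```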
Problem IX.8.20. A 2 × 2 block matrix A = (A₁₁  A₁₂; A₂₁  A₂₂), in which the four matrices A_ij are normal and commute with each other, is called binormal. Show that such a matrix is unitarily equivalent to a matrix A = (A₁  B; 0  A₂), in which A₁, A₂, B are diagonal matrices and B is positive. Let

N₀ = (A₁  ½B; ½U²B  A₂),

where U is the unitary operator such that A₁ − A₂ = U|A₁ − A₂|. Show that in every unitarily invariant norm we have

|||A − N₀||| ≤ |||A − N|||

for all 2n × 2n normal matrices N.
Problem IX.8.21. An alternate proof of the inequality (IX.55) is outlined below. Choose an orthonormal basis in which A is diagonal and A = (A₊  0; 0  −A₋). In this basis, let P have the block decomposition P = (P₁₁  P₁₂; P₂₁  P₂₂). By the pinching inequality,

|||A − P||| ≥ ||| (A₊ − P₁₁) ⊕ (−A₋ − P₂₂) |||.

Since both A₋ and P₂₂ are positive, |||A₋||| ≤ |||A₋ + P₂₂|||. Use this to prove the inequality (IX.55). This argument can be modified to give another proof of (IX.56) also. For this we need the following fact. Let T and S be operators such that 0 ≤ Re T ≤ Re S and Im T = Im S. If T is normal, then |||T||| ≤ |||S||| for every unitarily invariant norm. Prove this using the Fan Dominance Theorem (Theorem IV.2.2) and the result of Problem III.6.6.
IX.9 Notes and References
Matrix inequalities of the kind studied in this chapter can be found in several books. The reader should see, particularly, the books by Horn and Johnson, Marshall and Olkin, and Marcus and Minc, mentioned in Chapters 1 and 2; and, in addition, M.L. Mehta, Matrix Theory, second edition, Hindustan Publishing Co., 1989, and B. Simon, Trace Ideals and Their Applications, Cambridge University Press, 1979. Proposition IX.1.1 is proved in B. Simon, Trace Ideals, p. 95. Proposition IX.1.2, for the operator norm, is proved in A. McIntosh, Heinz inequalities and perturbation of spectral families, Macquarie Math. Reports, 1979; and for all unitarily invariant norms in F. Kittaneh, A note on the arithmetic-geometric mean inequality for matrices, Linear Algebra Appl., 171 (1992) 1-8. The Lie product formula has been extended to semigroups of operators in Banach spaces by H.F. Trotter. Our treatment of the material between Theorem IX.2.1 and Exercise IX.2.8 is based on T. Furuta, Norm inequalities equivalent to Löwner-Heinz theorem, Reviews in Math. Phys., 1 (1989) 135-137. The inequality (IX.5) can also be found in H.O. Cordes, Spectral Theory of Linear Differential Operators and Comparison Algebras, Cambridge University Press, 1987. Theorem IX.2.9 is taken from B. Wang and M. Gong, Some eigenvalue inequalities for positive semidefinite matrix power products, Linear Algebra Appl., 184 (1993) 249-260. The inequality (IX.13) was proved by H. Araki, On an inequality of Lieb and Thirring, Letters in Math. Phys., 19 (1990) 167-170. Theorem IX.2.10 is a rephrasing of some other results proved in this paper. The Golden-Thompson inequality is important in statistical mechanics. See S. Golden, Lower bounds for the Helmholtz function, Phys. Rev. B, 137 (1965) 1127-1128, and C.J. Thompson, Inequality with applications in statistical mechanics, J. Math. Phys., 6 (1965) 1812-1813. It was generalised by A. Lenard, Generalization of the Golden-Thompson inequality, Indiana Univ. Math. J., 21 (1971) 457-468, and further by C.J. Thompson, Inequalities and partial orders on matrix spaces, Indiana Univ. Math. J., 21 (1971) 469-480. These ideas have been developed further in the much more general setting of Lie groups by B. Kostant, On convexity, the Weyl group and the Iwasawa decomposition, Ann. Sci. E.N.S., 6 (1973) 413-455, and subsequently by others. Inequalities complementary to the Golden-Thompson inequality and its stronger version in Exercise IX.3.8 have been proved by F. Hiai and D. Petz, The Golden-Thompson trace inequality is complemented, Linear Algebra Appl., 181 (1993) 153-185, and by T. Ando and F. Hiai, Log majorization and complementary Golden-Thompson type inequalities, Linear Algebra Appl., 197/198 (1994) 113-131. Theorem IX.3.1 is a rephrasing of some results in J.E. Cohen, Inequalities for matrix exponentials, Linear Algebra Appl., 111 (1988) 25-28. Further results on such inequalities may be found in J.E. Cohen, S. Friedland, T. Kato, and F.P.
Kelly, Eigenvalue inequalities for products of matrix exponentials, Linear Algebra Appl., 45 (1982) 55-95, in D. Petz, A survey of certain trace inequalities, Banach Centre Publications, 30 (1994) 287-298, and in Chapter 6 of Horn and Johnson, Topics in Matrix Analysis. Theorem IX.4.2 was proved in R. Bhatia and F. Kittaneh, On the singular values of a product of operators, SIAM J. Matrix Analysis, 11 (1990) 272-277. The generalisation given in Theorem IX.4.5 is due to R. Bhatia and C. Davis, More matrix forms of the arithmetic-geometric mean inequality, SIAM J. Matrix Analysis, 14 (1993) 132-136. Many of the other results in Section IX.4 are from these two papers. The proof outlined in Exercise IX.4.6 is due to F. Kittaneh, A note on the arithmetic-geometric mean inequality for matrices, Linear Algebra Appl., 171 (1992) 1-8. A generalisation of the inequality (IX.21) has been proved by T. Ando, Matrix Young inequalities, Operator Theory: Advances and Applications, 75 (1995) 33-38. If p, q > 1 and 1/p + 1/q = 1, then the operator inequality |AB*| ≤ U((1/p)|A|^p + (1/q)|B|^q)U* is valid for some unitary U. Theorems IX.5.1 and IX.5.2 were proved in R. Bhatia and C. Davis, A Cauchy-Schwarz inequality for operators with applications, Linear Algebra Appl., 223 (1995) 119-129. For the case of the operator norm, the inequality (IX.38) is due to E. Heinz, as are the inequality (IX.29) and the one in Problem IX.8.8. See E. Heinz, Beiträge zur Störungstheorie der Spektralzerlegung, Math. Ann., 123 (1951) 415-438. Our approach to these inequalities follows the one in the paper by A. McIntosh cited above. The inequality in Problem IX.8.9 is also due to E. Heinz. The Mixed Schwarz inequality in Problem IX.8.10 was proved by T. Kato, Notes on some inequalities for linear operators, Math. Ann., 125 (1952) 208-212. (The papers by Heinz, Kato, and McIntosh do much of this for unbounded operators in infinite-dimensional spaces.) The class ℒ in Definition IX.5.6 was introduced by E.H. Lieb, Inequalities for some operator and matrix functions, Advances in Math., 20 (1976) 174-178. Theorem IX.5.11 was proved in this paper. These functions are also studied in R. Merris and J.A. Dias da Silva, Generalized Schur functions, J. Algebra, 35 (1975) 442-448. B. Simon (Trace Ideals, p. 99) calls them Liebian functions. The characterisation in Theorem IX.5.10 has not appeared before; it simplifies the proof of Theorem IX.5.11 considerably. The Lieb Concavity Theorem was proved by E.H. Lieb, Convex trace functions and the Wigner-Yanase-Dyson conjecture, Advances in Math., 11 (1973) 267-288. The proof given here is taken from B. Simon, Trace Ideals. T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl., 26 (1979) 203-241, takes a different approach. Using the concept of operator means, he first proves Theorem IX.6.3 (and its generalisation in Problem IX.8.15) and then deduces Lieb's Theorem from it. The proof given in Problem IX.8.17 is taken from D. Petz, Quasi-entropies for finite quantum systems, Rep.
Math. Phys., 21 (1986) 57-65. Theorem IX.6.5 was proved by G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys., 33 (1973) 305-322. Our proof is taken from A. Connes and E. Størmer, Entropy for automorphisms of II₁ von Neumann algebras, Acta Math., 134 (1975) 289-306. The reader would have guessed from the titles of these papers that these inequalities are useful in physics. The book Quantum Entropy and Its Use by M. Ohya and D. Petz, Springer-Verlag, 1993, contains a very detailed study of such inequalities. Another pertinent reference is D. Ruelle, Statistical Mechanics, Benjamin, 1969. The inequalities in Problem IX.8.16 are taken from T. Ando, Inequalities for permanents, Hokkaido Math. J., 10 (1981) 18-36, and R. Bhatia and C. Davis, Concavity of certain functions of matrices, Linear and Multilinear Algebra, 17 (1985) 155-164. Theorems IX.7.1 and IX.7.2 were proved in K. Fan and A.J. Hoffman, Some metric inequalities in the space of matrices, Proc. Amer. Math. Soc., 6 (1955) 111-116. The inequalities in Problem IX.8.19 were also proved in this paper. The result in Problem IX.8.18 is due to C.K. Li and N.K. Tsing, On the unitarily invariant norms and some related results, Linear and Multilinear Algebra, 20 (1987) 107-119. Two papers by P.R. Halmos, Positive approximants of operators, Indiana Univ. Math. J., 21 (1972) 951-960, and Spectral approximants of normal operators, Proc. Edinburgh Math. Soc., 19 (1974) 51-58, made the problem of operator approximation popular among operator theorists. The results in Theorem IX.7.3, Exercise IX.7.4, and in Problem IX.8.21 were proved in these papers for the special case of the operator norm (but more generally for Hilbert space operators). The first paper of Halmos also tackles the problem of finding a positive approximant to an arbitrary operator, in the operator norm. The solution is different from the one for the Hilbert-Schmidt norm given in Theorem IX.7.5, and the problem is much more complicated. The problem of finding the closest normal matrix has been solved completely only in the 2 × 2 case. Some properties of the normal approximant and algorithms for finding it are given in A. Ruhe, Closest normal matrix finally found!, BIT, 27 (1987) 585-598. The result in Problem IX.8.20 was proved, in the special case of the operator norm, by J. Phillips, Nearest normal approximation for certain normal operators, Proc. Amer. Math. Soc., 67 (1977) 236-240. The general result was proved in R. Bhatia, R. Horn, and F. Kittaneh, Normal approximants to binormal operators, Linear Algebra Appl., 147 (1991) 169-179. An excellent survey of matrix approximation problems, with many references and applications, can be found in N.J. Higham, Matrix nearness problems and applications, in the collection Applications of Matrix Theory, Oxford University Press, 1989. A particularly striking application of Theorem IX.7.2 has been found in quantum chemistry. Given n linearly independent unit vectors e₁, …, e_n in an n-dimensional Hilbert space, what is the orthonormal basis f₁, …, f_n that is closest to the e_j, in the sense that Σ ||e_j − f_j||² is minimal? The Gram-Schmidt procedure does not lead to such an orthonormal basis. The chemist P.O. Löwdin, On the
non-orthogonality problem connected with the use of atomic wave functions in the theory of molecules and crystals, J. Chem. Phys., 18 (1950) 365-374, found a procedure to obtain such a basis. The problem is clearly equivalent to that of finding a unitary matrix closest to an invertible matrix, in the Hilbert-Schmidt norm. Theorem IX.7.2 solves the problem for all unitarily invariant norms. The importance of such results is explained in J.A. Goldstein and M. Levy, Linear algebra and quantum chemistry, American Math. Monthly, 98 (1991) 710-718.
X  Perturbation of Matrix Functions
In earlier chapters we derived several inequalities that describe the variation of eigenvalues, eigenvectors, determinants, permanents , and tensor powers of a matrix. Similar problems for some other matrix functions are studied in this chapter.
X.1 Operator Monotone Functions
If a, b are positive real numbers, then it is easy to see that |a^r − b^r| ≥ |a − b|^r if r ≥ 1, and |a^r − b^r| ≤ |a − b|^r if 0 ≤ r ≤ 1. The inequalities in this section are extensions of these elementary inequalities to positive operators A, B. Instead of the power functions f(t) = t^r, 0 ≤ r ≤ 1, we shall consider the more general class of operator monotone functions.
Theorem X.1.1 Let f be an operator monotone function on [0, ∞) such that f(0) = 0. Then for all positive operators A, B,

||f(A) − f(B)|| ≤ f(||A − B||).   (X.1)

Proof. Since f is concave (Theorem V.2.5) and f(0) = 0, we have f(a + b) ≤ f(a) + f(b) for all nonnegative numbers a, b. Let a = ||A − B||. Then A − B ≤ aI. Hence, A ≤ B + aI and f(A) ≤ f(B + aI). By the subadditivity property of f mentioned above, therefore, f(A) ≤ f(B) + f(a)I. Thus f(A) − f(B) ≤ f(a)I and, by symmetry, f(B) − f(A) ≤ f(a)I. This implies that |f(A) − f(B)| ≤ f(a)I. Hence,

||f(A) − f(B)|| ≤ f(a) = f(||A − B||). ■
Note the special consequence of the above theorem:

||A^r − B^r|| ≤ ||A − B||^r,   0 < r < 1,   (X.2)

for any two positive operators A, B. Note also that the argument in the above proof shows that

|||f(A) − f(B)||| ≤ f(||A − B||) |||I|||

for every unitarily invariant norm.
Exercise X.1.2 Show that, for 2 × 2 positive matrices, the inequality ||A^{1/2} − B^{1/2}||₂ ≤ ||A − B||₂^{1/2} is not always valid. (It is false even when B = 0.)
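One concrete choice (an added illustration; the exercise itself leaves the example to the reader) is A = I and B = 0: then ||A^{1/2} − B^{1/2}||₂ = √2 while ||A − B||₂^{1/2} = 2^{1/4}.

```python
# Counterexample for the Frobenius norm version of (X.2) with r = 1/2.
import numpy as np

A, B = np.eye(2), np.zeros((2, 2))
sqA, sqB = A, B                        # A^{1/2} = I and B^{1/2} = 0 here
lhs = np.linalg.norm(sqA - sqB)        # sqrt(2) ~ 1.414
rhs = np.linalg.norm(A - B) ** 0.5     # 2**0.25 ~ 1.189
print(lhs > rhs)                       # the claimed inequality fails
```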
The inequality (X.2) can be rewritten in another form:

||A^r − B^r|| ≤ || |A − B|^r ||,   0 < r < 1.   (X.3)
This has a generalisation to all unitarily invariant norms. Once again, for this generalisation, it is convenient to consider the more general class of operator monotone functions. Recall that every operator monotone function f on [0, ∞) has an integral representation

f(t) = γ + βt + ∫₀^∞ (λt)/(λ + t) dw(λ),   (X.4)

where γ = f(0), β ≥ 0, and w is a positive measure such that ∫₀^∞ λ/(1 + λ) dw(λ) < ∞. (See (V.53).)
Theorem X.1.3 Let f be an operator monotone function on [0, ∞) such that f(0) = 0. Then for all positive operators A, B and for all unitarily invariant norms,

|||f(A) − f(B)||| ≤ |||f(|A − B|)|||.   (X.5)
In the proof of the theorem, we will use the following lemma.

Lemma X.1.4 Let X, Y be positive operators. Then

|||(X + I)⁻¹ − (X + Y + I)⁻¹||| ≤ |||I − (Y + I)⁻¹|||

for every unitarily invariant norm.

Proof. Since (X + I)⁻¹ ≤ I, by Lemma V.1.5 we have Y^{1/2}(X + I)⁻¹Y^{1/2} ≤ Y. Hence,

[Y^{1/2}(X + I)⁻¹Y^{1/2} + I]⁻¹ ≥ (Y + I)⁻¹.

Therefore, by the Weyl Monotonicity Principle,

λ_j(I − [Y^{1/2}(X + I)⁻¹Y^{1/2} + I]⁻¹) ≤ λ_j(I − (Y + I)⁻¹)

for all j. Note that Y^{1/2}(X + I)⁻¹Y^{1/2} has the same eigenvalues as those of (X + I)^{−1/2}Y(X + I)^{−1/2}. So, the above inequality can also be written as

λ_j(I − [(X + I)^{−1/2}Y(X + I)^{−1/2} + I]⁻¹) ≤ λ_j(I − (Y + I)⁻¹).

From the identity

(X + I)⁻¹ − (X + Y + I)⁻¹ = (X + I)^{−1/2}{I − [(X + I)^{−1/2}Y(X + I)^{−1/2} + I]⁻¹}(X + I)^{−1/2}

and the fact that ||(X + I)^{−1/2}|| ≤ 1, we see that

λ_j((X + I)⁻¹ − (X + Y + I)⁻¹) ≤ λ_j(I − [(X + I)^{−1/2}Y(X + I)^{−1/2} + I]⁻¹).

Thus, for all j,

λ_j((X + I)⁻¹ − (X + Y + I)⁻¹) ≤ λ_j(I − (Y + I)⁻¹).

This is more than what we need to prove the lemma. ■
Proof of Theorem X.1.3: By the Fan Dominance Property (Theorem IV.2.2) it is sufficient to prove the inequality (X.5) for the special class of norms ‖·‖_(k), k = 1, 2, …, n. We will first consider the case when A − B is positive. Let C = A − B. We want to prove that
$$\| f(B + C) - f(B) \|_{(k)} \le \| f(C) \|_{(k)}. \tag{X.6}$$
Let σ_j = s_j(C), j = 1, 2, …, n. Since the σ_j are the eigenvalues of the positive operator C, we have
$$s_j(h(C)) = h(\sigma_j), \quad j = 1, \dots, n,$$
for every monotonically increasing nonnegative function h(t) on [0, ∞). Thus
$$\| h(C) \|_{(k)} = \sum_{j=1}^{k} h(\sigma_j), \quad k = 1, \dots, n,$$
for all such functions h.
X. Perturbation of Matrix Functions
Now, f has the representation (X.4) with γ = 0. The functions βt and λt/(λ + t) are nonnegative, monotonically increasing functions of t. Hence,
$$\| f(C) \|_{(k)} = \beta \| C \|_{(k)} + \int_0^\infty \lambda\, \| C (C + \lambda I)^{-1} \|_{(k)}\, dw(\lambda).$$
In the same way, we can obtain from the integral representation of f
$$\| f(B+C) - f(B) \|_{(k)} \le \beta \| C \|_{(k)} + \int_0^\infty \lambda\, \| (B+C)(B+C+\lambda I)^{-1} - B(B+\lambda I)^{-1} \|_{(k)}\, dw(\lambda).$$
Thus, our assertion (X.6) will be proved if we show that for each λ > 0
$$\| (B+C)(B+C+\lambda I)^{-1} - B(B+\lambda I)^{-1} \|_{(k)} \le \| C(C+\lambda I)^{-1} \|_{(k)}.$$
Now note that we can write
$$X(X + \lambda I)^{-1} = I - \Big( \frac{X}{\lambda} + I \Big)^{-1}.$$
So, the above inequality follows from Lemma X.1.4. This proves the theorem in the special case when A − B is positive.

To prove the general case we will use the special case proved above and two simple facts. First, if X, Y are Hermitian with positive parts X⁺, Y⁺ in their respective Jordan decompositions, then the inequality X ≤ Y implies that ‖X⁺‖_(k) ≤ ‖Y⁺‖_(k) for all k. This is an immediate consequence of Weyl's Monotonicity Principle. Second, if X₁, X₂, Y₁, Y₂ are positive operators such that X₁X₂ = 0, Y₁Y₂ = 0, ‖X₁‖_(k) ≤ ‖Y₁‖_(k), and ‖X₂‖_(k) ≤ ‖Y₂‖_(k) for all k, then we have ‖X₁ + X₂‖_(k) ≤ ‖Y₁ + Y₂‖_(k) for all k. This can be easily seen using the fact that since X₁ and X₂ commute, they can be simultaneously diagonalised, and so can Y₁, Y₂.

Now let A, B be any two positive operators. Since A − B ≤ (A − B)⁺, we have A ≤ B + (A − B)⁺, and hence f(A) ≤ f(B + (A − B)⁺). From this we have
$$f(A) - f(B) \le f(B + (A-B)^+) - f(B),$$
and, therefore, by the first observation above,
$$\| [\, f(A) - f(B) \,]^+ \|_{(k)} \le \| f(B + (A-B)^+) - f(B) \|_{(k)}$$
for all k. Then, using the special case of the theorem proved above, we can conclude that
$$\| [\, f(A) - f(B) \,]^+ \|_{(k)} \le \| f((A-B)^+) \|_{(k)}$$
for all k. Interchanging A and B, we have
$$\| [\, f(B) - f(A) \,]^+ \|_{(k)} \le \| f((B-A)^+) \|_{(k)}$$
for all k. Now note that
$$f([A-B]^+)\, f([B-A]^+) = 0, \qquad f([A-B]^+) + f([B-A]^+) = f(|A-B|),$$
$$[\, f(A)-f(B) \,]^+ [\, f(B)-f(A) \,]^+ = 0, \qquad [\, f(A)-f(B) \,]^+ + [\, f(B)-f(A) \,]^+ = |f(A)-f(B)|.$$
Thus, by the second observation above, the two inequalities imply that
$$\| f(A) - f(B) \|_{(k)} \le \| f(|A-B|) \|_{(k)}$$
for all k. This proves the theorem.
•
Exercise X.1.5 Show that the conclusion of Theorem X.1.3 is valid for all nonnegative operator monotone functions on [0, ∞); i.e., we can replace the condition f(0) = 0 by the condition f(0) ≥ 0.

One should note two special corollaries of the theorem: for all positive operators A, B and for all unitarily invariant norms, we have
$$|||\, A^r - B^r \,||| \le |||\, |A-B|^r \,|||, \quad 0 < r \le 1, \tag{X.7}$$
$$|||\, \log(I+A) - \log(I+B) \,||| \le |||\, \log(I + |A-B|) \,|||. \tag{X.8}$$
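Both corollaries are easy to test numerically; a sketch in the trace norm (the helpers psd, mfun, and trnorm are ours, not the book's):

```python
import numpy as np

def psd(n, rng):
    X = rng.standard_normal((n, n))
    return X @ X.T

def mfun(S, f):
    # apply a scalar function to a positive matrix via eigendecomposition
    w, V = np.linalg.eigh(S)
    return (V * f(np.clip(w, 0, None))) @ V.T

def trnorm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(1)
n, ok = 5, True
for _ in range(50):
    A, B = psd(n, rng), psd(n, rng)
    absD = mfun((A - B) @ (A - B), np.sqrt)   # |A - B|
    for r in (0.3, 0.5, 0.9):                 # inequality (X.7)
        lhs = trnorm(mfun(A, lambda t: t**r) - mfun(B, lambda t: t**r))
        ok &= lhs <= trnorm(mfun(absD, lambda t: t**r)) + 1e-8
    I = np.eye(n)                             # inequality (X.8)
    lhs = trnorm(mfun(A + I, np.log) - mfun(B + I, np.log))
    ok &= lhs <= trnorm(mfun(absD + I, np.log)) + 1e-8
print(ok)
```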
Theorem X.1.6 Let g be a continuous strictly increasing map of [0, ∞) onto itself. Suppose that the inverse map g⁻¹ is operator monotone. Then for all positive operators A, B and for all unitarily invariant norms, we have
$$|||\, g(A) - g(B) \,||| \ge |||\, g(|A-B|) \,|||. \tag{X.9}$$
Proof. Let f = g⁻¹. Since f is operator monotone, it is concave by Theorem V.2.5. Hence g is convex. From Theorem X.1.3, with g(A) and g(B) in place of A and B, we have
$$|||\, A - B \,||| \le |||\, f(|g(A) - g(B)|) \,|||.$$
This is equivalent to the weak majorisation
$$\{ s_j(A-B) \} \prec_w \{ s_j( f(|g(A) - g(B)|) ) \}.$$
Since f is monotone,
$$s_j( f(|g(A) - g(B)|) ) = f( s_j( g(A) - g(B) ) )$$
for each j. So, we have
$$\{ s_j(A-B) \} \prec_w \{ f( s_j( g(A) - g(B) ) ) \}.$$
Since g is convex and monotone, by Corollary II.3.4 we have from this
$$\{ g( s_j(A-B) ) \} \prec_w \{ s_j( g(A) - g(B) ) \}.$$
Since g is monotone, this is the same as saying that
$$\{ s_j( g(|A-B|) ) \} \prec_w \{ s_j( g(A) - g(B) ) \},$$
and this, in turn, implies the inequality (X.9).
•
Two special corollaries that complement the inequalities (X.7) and (X.8) are worthy of note. For all positive operators A, B and for all unitarily invariant norms,
$$|||\, A^r - B^r \,||| \ge |||\, |A-B|^r \,|||, \quad \text{if } r \ge 1, \tag{X.10}$$
$$|||\, \exp A - \exp B \,||| \ge |||\, \exp(|A-B|) - I \,|||. \tag{X.11}$$
Exercise X.1.7 Derive the inequality (X.10) from (X.7) using Exercise IV.2.8.
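The reversed inequalities (X.10) and (X.11) can be checked numerically in the same way as before (a sketch; the helper names are ours):

```python
import numpy as np

def psd(n, rng):
    X = rng.standard_normal((n, n))
    return X @ X.T / n            # keep eigenvalues modest for exp

def mfun(S, f):
    w, V = np.linalg.eigh(S)
    return (V * f(np.clip(w, 0, None))) @ V.T

def trnorm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(2)
n, ok = 5, True
for _ in range(50):
    A, B = psd(n, rng), psd(n, rng)
    absD = mfun((A - B) @ (A - B), np.sqrt)
    for r in (1.5, 2.0, 3.0):     # inequality (X.10)
        lhs = trnorm(mfun(A, lambda t: t**r) - mfun(B, lambda t: t**r))
        ok &= lhs >= trnorm(mfun(absD, lambda t: t**r)) - 1e-8
    # inequality (X.11)
    lhs = trnorm(mfun(A, np.exp) - mfun(B, np.exp))
    ok &= lhs >= trnorm(mfun(absD, np.exp) - np.eye(n)) - 1e-8
print(ok)
```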
Is there an inequality like (X.10) for Hermitian operators A, B? First note that if A is Hermitian, only positive integral powers of A are meaningfully defined. So, the question is whether |||(A − B)^m||| can be bounded above by |||A^m − B^m|||. No such bound is possible if m is an even integer; for the choice B = −A we have A^m − B^m = 0. For odd integers m, we do have a satisfactory answer.

Theorem X.1.8 Let A, B be Hermitian operators. Then for all unitarily invariant norms, and for m = 1, 2, …,
$$|||\, (A - B)^{2m+1} \,||| \le 4^m\, |||\, A^{2m+1} - B^{2m+1} \,|||. \tag{X.12}$$
For the proof we need the following lemma.

Lemma X.1.9 Let A, B be Hermitian operators, let X be any operator, and let m be any positive integer. Then, for j = 1, 2, …, m,
$$|||\, A^{m+j} X B^{m-j+1} - A^{m-j+1} X B^{m+j} \,||| \le |||\, A^{m+j+1} X B^{m-j} - A^{m-j} X B^{m+j+1} \,|||. \tag{X.13}$$
Proof. By the arithmetic-geometric mean inequality proved in Theorem IX.4.5, we have
$$|||\, A X B \,||| \le \tfrac{1}{2}\, |||\, A^2 X + X B^2 \,|||$$
for all operators X and Hermitian A, B. We will use this to prove the inequality (X.13). First consider the case j = 1. We have
$$|||\, A^{m+1} X B^m - A^m X B^{m+1} \,||| = |||\, A (A^m X B^{m-1} - A^{m-1} X B^m) B \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^2 (A^m X B^{m-1} - A^{m-1} X B^m) + (A^m X B^{m-1} - A^{m-1} X B^m) B^2 \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^{m+2} X B^{m-1} - A^{m-1} X B^{m+2} \,||| + \tfrac{1}{2}\, |||\, A^{m+1} X B^m - A^m X B^{m+1} \,|||.$$
Hence
$$|||\, A^{m+1} X B^m - A^m X B^{m+1} \,||| \le |||\, A^{m+2} X B^{m-1} - A^{m-1} X B^{m+2} \,|||.$$
This shows that the inequality (X.13) is true for j = 1. The general case is proved by induction. Suppose that the inequality (X.13) has been proved for j − 1 in place of j. Then, using the arithmetic-geometric mean inequality, the triangle inequality, and this induction hypothesis, we have
$$|||\, A^{m+j} X B^{m-j+1} - A^{m-j+1} X B^{m+j} \,||| = |||\, A (A^{m+j-1} X B^{m-j} - A^{m-j} X B^{m+j-1}) B \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^2 (A^{m+j-1} X B^{m-j} - A^{m-j} X B^{m+j-1}) + (A^{m+j-1} X B^{m-j} - A^{m-j} X B^{m+j-1}) B^2 \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^{m+j+1} X B^{m-j} - A^{m-j} X B^{m+j+1} \,||| + \tfrac{1}{2}\, |||\, A^{m+j-1} X B^{m-(j-1)+1} - A^{m-(j-1)+1} X B^{m+j-1} \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^{m+j+1} X B^{m-j} - A^{m-j} X B^{m+j+1} \,||| + \tfrac{1}{2}\, |||\, A^{m+j} X B^{m-j+1} - A^{m-j+1} X B^{m+j} \,|||.$$
This proves the desired inequality.
•
Proof of Theorem X.1.8: Using the triangle inequality and a very special case of Lemma X.1.9 (the case X = I, j = m), we have
$$|||\, A^{2m}(A-B) + (A-B)B^{2m} \,||| \le |||\, A^{2m+1} - B^{2m+1} \,||| + |||\, A^{2m} B - A B^{2m} \,||| \le 2\, |||\, A^{2m+1} - B^{2m+1} \,|||. \tag{X.14}$$
Let C = A − B, and choose an orthonormal basis x_j such that C x_j = λ_j x_j, where |λ_j| = s_j(C) for j = 1, 2, …, n. Then, by the extremal representation of the Ky Fan norms given in Problem III.6.6,
$$\| A^{2m} C + C B^{2m} \|_{(k)} \ge \sum_{j=1}^{k} | \langle x_j, (A^{2m} C + C B^{2m}) x_j \rangle | = \sum_{j=1}^{k} |\lambda_j| \big\{ \langle x_j, A^{2m} x_j \rangle + \langle x_j, B^{2m} x_j \rangle \big\}.$$
Now use the observation in Problem IX.8.14 and the convexity of the function f(t) = t^{2m} to see that
$$\langle x_j, A^{2m} x_j \rangle + \langle x_j, B^{2m} x_j \rangle \ge \| A x_j \|^{2m} + \| B x_j \|^{2m} \ge 2^{1-2m} \big( \| A x_j \| + \| B x_j \| \big)^{2m} \ge 2^{1-2m} \| A x_j - B x_j \|^{2m} = 2^{1-2m} |\lambda_j|^{2m}.$$
We thus have
$$\| A^{2m} C + C B^{2m} \|_{(k)} \ge \sum_{j=1}^{k} 2^{1-2m} |\lambda_j|^{2m+1}.$$
Since this is true for all k, we have
$$|||\, (A-B)^{2m+1} \,||| \le 2^{2m-1}\, |||\, A^{2m} C + C B^{2m} \,|||$$
for all unitarily invariant norms. Combining this with (X.14), we obtain the inequality (X.12). Observe that when B = −A, the two sides of (X.12) are equal. •
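A numerical check of (X.12) in the trace norm, including the equality case B = −A (a sketch; the helper names are ours):

```python
import numpy as np

def herm(n, rng):
    X = rng.standard_normal((n, n))
    return (X + X.T) / 2

def trnorm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(3)
n, ok = 5, True
for _ in range(100):
    A, B = herm(n, rng), herm(n, rng)
    C = A - B
    for m in (1, 2):
        p = 2 * m + 1
        lhs = trnorm(np.linalg.matrix_power(C, p))
        rhs = 4**m * trnorm(np.linalg.matrix_power(A, p) - np.linalg.matrix_power(B, p))
        ok &= lhs <= rhs + 1e-8

# equality when B = -A (m = 1)
p = 3
lhs = trnorm(np.linalg.matrix_power(A - (-A), p))
rhs = 4 * trnorm(np.linalg.matrix_power(A, p) - np.linalg.matrix_power(-A, p))
ok &= abs(lhs - rhs) <= 1e-8 * rhs
print(ok)
```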
X.2 The Absolute Value

In Section VII.5 we obtained bounds for ||| |A| − |B| ||| in terms of |||A − B|||. More such bounds are obtained in this section. Since |A| = (A*A)^{1/2}, results for the square root function obtained in the preceding section are useful here.
Theorem X.2.1 Let A, B be any two operators. Then, for every unitarily invariant norm,
$$|||\, |A| - |B| \,||| \le \sqrt{2}\, \big( |||A+B|||\; |||A-B||| \big)^{1/2}. \tag{X.15}$$
Proof. From the inequality (X.7) we have
$$|||\, |A| - |B| \,||| \le |||\, |A^*A - B^*B|^{1/2} \,|||. \tag{X.16}$$
Note that
$$A^*A - B^*B = \tfrac{1}{2} \big\{ (A+B)^*(A-B) + (A-B)^*(A+B) \big\}. \tag{X.17}$$
Hence, by Theorem III.5.6, we can find unitaries U and V such that
$$|A^*A - B^*B| \le \tfrac{1}{2} \big\{ U |(A+B)^*(A-B)| U^* + V |(A-B)^*(A+B)| V^* \big\}.$$
Since the square root function is operator monotone, this operator inequality is preserved if we take square roots on both sides. Since every unitarily invariant norm is monotone, this shows that
$$|||\, |A^*A - B^*B|^{1/2} \,|||^2 \le \tfrac{1}{2}\, |||\, \big[ U |(A+B)^*(A-B)| U^* + V |(A-B)^*(A+B)| V^* \big]^{1/2} \,|||^2.$$
By the result of Problem IV.5.6, we have
$$|||\, |X + Y|^{1/2} \,|||^2 \le 2 \big( |||\, |X|^{1/2} \,|||^2 + |||\, |Y|^{1/2} \,|||^2 \big)$$
for all X, Y. Hence,
$$|||\, |A^*A - B^*B|^{1/2} \,|||^2 \le |||\, |(A+B)^*(A-B)|^{1/2} \,|||^2 + |||\, |(A-B)^*(A+B)|^{1/2} \,|||^2.$$
By the Cauchy-Schwarz inequality (IV.44), the right-hand side is bounded above by 2 |||A+B||| |||A−B|||. Thus the inequality (X.15) now follows from (X.16). •
Example X.2.2 Let
$$A = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad B = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix}.$$
Then
$$\| \, |A| - |B| \, \|_1 = 2, \qquad \| A + B \|_1 = \| A - B \|_1 = \sqrt{2}.$$
So, for the trace norm the inequality (X.15) is sharp. An improvement of this inequality is possible for special norms.
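One concrete pair attaining equality in (X.15) for the trace norm (the pair we use here for a direct numerical check; any pair with the stated norm values serves the same purpose) can be verified in a few lines:

```python
import numpy as np

def trnorm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

def absval(X):
    # |X| = (X*X)^{1/2}
    w, V = np.linalg.eigh(X.conj().T @ X)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

A = np.array([[1.0, 1.0], [0.0, 0.0]]) / np.sqrt(2)
B = np.array([[1.0, -1.0], [0.0, 0.0]]) / np.sqrt(2)

lhs = trnorm(absval(A) - absval(B))                        # left side of (X.15)
gap = np.sqrt(2) * np.sqrt(trnorm(A + B) * trnorm(A - B))  # right side of (X.15)
print(round(lhs, 6), round(gap, 6))   # -> 2.0 2.0
```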
Theorem X.2.3 Let A, B be any two operators. Then for every Q-norm (and thus, in particular, for every Schatten p-norm with p ≥ 2) we have
$$\| \, |A| - |B| \, \|_Q \le \big( \| A+B \|_Q\, \| A-B \|_Q \big)^{1/2}. \tag{X.18}$$
Proof. By the definition of Q-norms, the inequality (X.18) is equivalent to the assertion that
$$|||\, (|A| - |B|)^2 \,||| \le \big( |||\, |A+B|^2 \,|||\; |||\, |A-B|^2 \,||| \big)^{1/2} \tag{X.19}$$
for all unitarily invariant norms. From the inequality (X.10) we have
$$|||\, (|A| - |B|)^2 \,||| \le |||\, A^*A - B^*B \,|||.$$
Using the identity (X.17), we see that
$$|||\, A^*A - B^*B \,||| \le \tfrac{1}{2} \big\{ |||\, (A+B)^*(A-B) \,||| + |||\, (A-B)^*(A+B) \,||| \big\}.$$
Now, using the Cauchy-Schwarz inequality given in Problem IV.5.7, we see that each of the two terms in the brackets above is dominated by
$$\big( |||\, |A+B|^2 \,|||\; |||\, |A-B|^2 \,||| \big)^{1/2}.$$
This proves the inequality (X.19). •
Theorem X.2.4 Let A, B be any two operators. Then, for all Schatten p-norms with 1 ≤ p ≤ 2,
$$\| \, |A| - |B| \, \|_p \le 2^{1/p - 1/2} \big( \| A+B \|_p\, \| A-B \|_p \big)^{1/2}. \tag{X.20}$$
Proof. Let
$$\| X \|_p := \Big( \sum_j s_j(X)^p \Big)^{1/p} \quad \text{for all } p > 0.$$
When p ≥ 1, these are the Schatten p-norms. When 0 < p < 1, this defines a quasi-norm. Instead of the triangle inequality, we have
$$\| X + Y \|_p \le 2^{1/p - 1} \big( \| X \|_p + \| Y \|_p \big), \quad 0 < p \le 1. \tag{X.21}$$
(See Problems IV.5.1 and IV.5.6.) Note that for all positive real numbers r and p, we have
$$\| \, |X|^r \, \|_p = \| X \|_{pr}^{\,r}. \tag{X.22}$$
Thus, the inequality (X.7), restricted to the p-norms, gives for all positive operators A, B
$$\| A^{1/2} - B^{1/2} \|_p \le \| A - B \|_{p/2}^{1/2}. \tag{X.23}$$
Hence, for any two operators A, B,
$$\| \, |A| - |B| \, \|_p \le \| A^*A - B^*B \|_{p/2}^{1/2}, \quad 1 \le p \le \infty.$$
Now use the identity (X.17) and the property (X.21) to see that, for 1 ≤ p ≤ 2,
$$\| A^*A - B^*B \|_{p/2} \le 2^{2/p - 2} \big\{ \| (A+B)^*(A-B) \|_{p/2} + \| (A-B)^*(A+B) \|_{p/2} \big\}.$$
From the relation (X.22) and the Cauchy-Schwarz inequality (IV.44), it follows that each of the two terms in the brackets is dominated by ‖A+B‖_p ‖A−B‖_p. Hence
$$\| A^*A - B^*B \|_{p/2} \le 2^{2/p - 1}\, \| A+B \|_p\, \| A-B \|_p$$
for 1 ≤ p ≤ 2. This proves the theorem. •

The example given in X.2.2 shows that, for each 1 ≤ p ≤ 2, the inequality (X.20) is sharp. In Section VII.5 we saw that
$$\| \, |A| - |B| \, \|_2 \le \sqrt{2}\, \| A - B \|_2 \tag{X.24}$$
for any two operators A, B. Further, if both A and B are normal, the factor √2 can be replaced by 1. Can one prove a similar inequality for the operator norm instead of the Hilbert-Schmidt norm? Of course, we have from (X.24)
$$\| \, |A| - |B| \, \| \le \sqrt{2n}\, \| A - B \| \tag{X.25}$$
for all operators A, B on an n-dimensional Hilbert space H. It is known that the factor √(2n) in the above inequality can be replaced by a factor c_n = O(log n); and even when both A, B are Hermitian, such a factor is necessary. (See the Notes at the end of the chapter.) In a slightly different vein, we have the following theorem.
Theorem X.2.5 (T. Kato) For any two operators A, B we have
$$\| \, |A| - |B| \, \| \le \frac{2}{\pi}\, \| A - B \| \Big( 2 + \log \frac{\| A \| + \| B \|}{\| A - B \|} \Big). \tag{X.26}$$
Proof. The square root function has an integral representation (V.4); this says that
$$t^{1/2} = \frac{1}{\pi} \int_0^\infty \lambda^{-1/2}\, \frac{t}{\lambda + t}\, d\lambda.$$
We can rewrite this as
$$t^{1/2} = \frac{1}{\pi} \int_0^\infty \big[ \lambda^{-1/2} - \lambda^{1/2} (\lambda + t)^{-1} \big]\, d\lambda.$$
Using this, we have
$$|A| - |B| = \frac{1}{\pi} \int_0^\infty \lambda^{1/2} \big[ (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \big]\, d\lambda. \tag{X.27}$$
We will estimate the norm of this integral by splitting it into three parts. Let
$$\alpha = \| A - B \|^2, \qquad \beta = \big( \| A \| + \| B \| \big)^2.$$
Now note that if X, Y are two positive operators, then −Y ≤ X − Y ≤ X, and hence ‖X − Y‖ ≤ max(‖X‖, ‖Y‖). Using this we see that
$$\Big\| \int_0^\alpha \lambda^{1/2} \big[ (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \big]\, d\lambda \Big\| \le \int_0^\alpha \lambda^{-1/2}\, d\lambda = 2\alpha^{1/2} = 2\, \| A - B \|. \tag{X.28}$$
From the identity
$$(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} = (|B|^2 + \lambda)^{-1} (A^*A - B^*B) (|A|^2 + \lambda)^{-1} \tag{X.29}$$
and the identity (X.17), we see that
$$\| (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \| \le \lambda^{-2}\, \| A + B \|\, \| A - B \| \le \lambda^{-2}\, \beta^{1/2}\, \| A - B \|.$$
Hence,
$$\Big\| \int_\beta^\infty \lambda^{1/2} \big[ (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \big]\, d\lambda \Big\| \le \beta^{1/2}\, \| A - B \| \int_\beta^\infty \lambda^{-3/2}\, d\lambda = 2\, \| A - B \|. \tag{X.30}$$
Since A*A − B*B = B*(A − B) + (A* − B*)A, from (X.29) we have
$$(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} = (|B|^2 + \lambda)^{-1} B^* (A - B) (|A|^2 + \lambda)^{-1} + (|B|^2 + \lambda)^{-1} (A^* - B^*) A (|A|^2 + \lambda)^{-1}. \tag{X.31}$$
Note that
$$\| (|B|^2 + \lambda)^{-1} B^* \| = \| B (|B|^2 + \lambda)^{-1} \| = \| \, |B|\, (|B|^2 + \lambda)^{-1} \| \le \frac{1}{2 \lambda^{1/2}},$$
since the maximum value of the function f(t) = t/(t² + λ) is 1/(2λ^{1/2}). By the same argument,
$$\| A (|A|^2 + \lambda)^{-1} \| \le \frac{1}{2 \lambda^{1/2}}.$$
So, from (X.31) we obtain
$$\| (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \| \le \lambda^{-3/2}\, \| A - B \|.$$
Hence
$$\Big\| \int_\alpha^\beta \lambda^{1/2} \big[ (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \big]\, d\lambda \Big\| \le \| A - B \| \int_\alpha^\beta \lambda^{-1}\, d\lambda = \| A - B \| \log \frac{\beta}{\alpha} = 2\, \| A - B \| \log \frac{\| A \| + \| B \|}{\| A - B \|}. \tag{X.32}$$
Combining (X.27), (X.28), (X.30), and (X.32), we obtain the inequality (X.26). •
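Kato's inequality (X.26) can be tested numerically for both nearby and well-separated pairs (a sketch; the helpers opnorm and absval are ours):

```python
import numpy as np

def opnorm(X):
    return np.linalg.norm(X, 2)

def absval(X):
    w, V = np.linalg.eigh(X.conj().T @ X)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

rng = np.random.default_rng(4)
n, ok = 6, True
for _ in range(100):
    A = rng.standard_normal((n, n))
    for B in (A + 1e-3 * rng.standard_normal((n, n)),   # a nearby pair
              rng.standard_normal((n, n))):             # an unrelated pair
        d = opnorm(A - B)
        bound = (2 / np.pi) * d * (2 + np.log((opnorm(A) + opnorm(B)) / d))
        ok &= opnorm(absval(A) - absval(B)) <= bound + 1e-10
print(ok)
```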
X.3 Local Perturbation Bounds

Inequalities obtained above are global, in the sense that they are valid for all pairs of operators A, B. Some special results can be obtained if B is restricted to be close to A, or when both are restricted to be away from 0. It is possible to derive many interesting inequalities by using only elementary calculus on normed spaces. A quick review of the basic concepts of the Fréchet differential calculus that are used below is given in the Appendix.

Let f be any continuously differentiable map on an open interval I. Then the map that f induces on the set of Hermitian matrices whose spectrum is contained in I is Fréchet differentiable. This has been proved in Theorem V.3.3, and an explicit formula for the derivative is also given there. For each A, the derivative Df(A) is a linear operator on the space of all Hermitian matrices. The norm of this operator is defined as
$$\| Df(A) \| = \sup_{\| B \| = 1} \| Df(A)(B) \|. \tag{X.33}$$
More generally, any unitarily invariant norm on Hermitian matrices leads to a corresponding norm for the linear operator Df(A); we denote this as
$$||| Df(A) ||| = \sup_{||| B ||| = 1} ||| Df(A)(B) |||. \tag{X.34}$$
For some special functions f, we will find upper bounds for these quantities. Among the functions we consider are operator monotone functions on (0, ∞). The square root function f(t) = t^{1/2} is easier to handle, and since it is especially important, it is worthwhile to deal with it separately.
Theorem X.3.1 Let f(t) = t^{1/2}, 0 < t < ∞. Then for every positive operator A, and for every unitarily invariant norm,
$$||| Df(A) ||| \le \tfrac{1}{2}\, \| A^{-1} \|^{1/2}. \tag{X.35}$$
Proof. The function g(t) = t², 0 < t < ∞, is the inverse of f. So, by the chain rule of differentiation, Df(A) = [Dg(f(A))]⁻¹ for every positive operator A. Note that Dg(A)(X) = AX + XA for every X. So
$$[ Dg(f(A)) ](X) = A^{1/2} X + X A^{1/2}.$$
If A has eigenvalues a₁ ≥ ⋯ ≥ a_n > 0, then dist(σ(A^{1/2}), σ(−A^{1/2})) = 2 a_n^{1/2} = 2 \| A^{-1} \|^{-1/2}. Hence, by Theorem VII.2.12,
$$||| [ Dg(f(A)) ]^{-1} ||| \le \tfrac{1}{2}\, \| A^{-1} \|^{1/2}.$$
This proves the theorem. •
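Since Df(A) inverts the map X ↦ A^{1/2}X + XA^{1/2}, it can be computed explicitly in the eigenbasis of A, where it acts entrywise as division by a_i^{1/2} + a_j^{1/2}. The sketch below (helper names ours) checks the bound (X.35) in the operator norm and verifies that the computed X really solves the equation:

```python
import numpy as np

rng = np.random.default_rng(5)
n, ok = 6, True
for _ in range(100):
    X0 = rng.standard_normal((n, n))
    A = X0 @ X0.T + 0.1 * np.eye(n)       # positive definite
    w, V = np.linalg.eigh(A)
    B = rng.standard_normal((n, n))
    B = (B + B.T) / 2
    B /= np.linalg.norm(B, 2)             # ||B|| = 1
    # Df(A)(B): solve A^{1/2} X + X A^{1/2} = B in the eigenbasis of A
    Bt = V.T @ B @ V
    Xt = Bt / (np.sqrt(w)[:, None] + np.sqrt(w)[None, :])
    X = V @ Xt @ V.T
    sqA = (V * np.sqrt(w)) @ V.T
    ok &= np.linalg.norm(sqA @ X + X @ sqA - B, 2) <= 1e-8   # residual check
    bound = 0.5 * np.sqrt(np.linalg.norm(np.linalg.inv(A), 2))
    ok &= np.linalg.norm(X, 2) <= bound + 1e-10              # the bound (X.35)
print(ok)
```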
Exercise X.3.2 Let f ∈ C¹(I) and let f′ be the derivative of f. Show that
$$\| f'(A) \| = \| Df(A)(I) \| \le \| Df(A) \|. \tag{X.36}$$
Thus, for the function f(t) = t^{1/2} on (0, ∞),
$$\| Df(A) \| = \| f'(A) \| \tag{X.37}$$
for all positive operators A.
Theorem X.3.3 Let φ be the map that takes an invertible operator A to its absolute value |A|. Then, for every unitarily invariant norm,
$$||| D\varphi(A) ||| \le \mathrm{cond}(A) = \| A^{-1} \|\, \| A \|. \tag{X.38}$$
Proof. Let g(A) = A*A. Then Dg(A)(B) = A*B + B*A. Hence |||Dg(A)||| ≤ 2‖A‖. The map φ is the composite f ∘ g, where f(A) = A^{1/2}. So, by the chain rule, Dφ(A) = Df(g(A)) · Dg(A) = Df(A*A) · Dg(A). Hence
$$||| D\varphi(A) ||| \le ||| Df(A^*A) |||\; ||| Dg(A) |||.$$
The first term on the right is bounded by ½‖A⁻¹‖ by Theorem X.3.1, and the second by 2‖A‖. This proves the theorem. •

The following theorem generalises Theorem X.3.1.
Theorem X.3.4 Let f be an operator monotone function on (0, ∞). Then, for every unitarily invariant norm,
$$||| Df(A) ||| \le \| f'(A) \| \tag{X.39}$$
for all positive operators A.

Proof. Use the integral representation (V.49) to write
$$f(t) = \alpha + \beta t + \int_0^\infty \Big( \frac{\lambda}{\lambda^2 + 1} - \frac{1}{\lambda + t} \Big)\, d\mu(\lambda),$$
where α, β are real numbers, β ≥ 0, and μ is a positive measure. Thus
$$f(A) = \alpha I + \beta A + \int_0^\infty \Big[ \frac{\lambda}{\lambda^2 + 1}\, I - (\lambda + A)^{-1} \Big]\, d\mu(\lambda).$$
Using the fact that, for the function g(A) = A⁻¹, we have Dg(A)(B) = −A⁻¹BA⁻¹, we obtain from the above expression
$$Df(A)(B) = \beta B + \int_0^\infty (\lambda + A)^{-1} B (\lambda + A)^{-1}\, d\mu(\lambda).$$
Hence
$$||| Df(A) ||| \le \beta + \int_0^\infty \| (\lambda + A)^{-1} \|^2\, d\mu(\lambda). \tag{X.40}$$
From the integral representation we also have
$$f'(t) = \beta + \int_0^\infty \frac{1}{(\lambda + t)^2}\, d\mu(\lambda).$$
Hence
$$\| f'(A) \| = \Big\| \beta I + \int_0^\infty (\lambda + A)^{-2}\, d\mu(\lambda) \Big\|. \tag{X.41}$$
If A has eigenvalues a₁ ≥ ⋯ ≥ a_n, then since β ≥ 0, the right-hand sides of both (X.40) and (X.41) are equal to
$$\beta + \int_0^\infty (\lambda + a_n)^{-2}\, d\mu(\lambda).$$
This proves the theorem. •
Exercise X.3.5 Let f be an operator monotone function on (0, ∞). Show that
$$\| Df(A) \| = \| Df(A)(I) \| = \| f'(A) \|.$$
Once we have estimates for the derivative Df(A), we can obtain bounds for |||f(A) − f(B)||| when B is close to A. These bounds are obtained using Taylor's Theorem and the mean value theorem. Using Taylor's Theorem, we obtain from Theorems X.3.3 and X.3.4 above the following.

Theorem X.3.6 Let A be an invertible operator. Then for every unitarily invariant norm
$$|||\, |A| - |B| \,||| \le \mathrm{cond}(A)\, ||| A - B ||| + O\big( ||| A - B |||^2 \big) \tag{X.42}$$
for all B close to A.

Theorem X.3.7 Let f be an operator monotone function on (0, ∞). Let A be any positive operator. Then for every unitarily invariant norm
$$||| f(A) - f(B) ||| \le \| f'(A) \|\, ||| A - B ||| + O\big( ||| A - B |||^2 \big) \tag{X.43}$$
for all positive operators B close to A. For the functions f(t) = t^r, 0 < r < 1, we have from this
$$||| A^r - B^r ||| \le r\, \| A^{r-1} \|\, ||| A - B ||| + O\big( ||| A - B |||^2 \big). \tag{X.44}$$
The use of the mean value theorem is illustrated in the proof of the following theorem.
Theorem X.3.8 Let f be an operator monotone function on (0, ∞), and let A, B be two positive operators that are bounded below by a; i.e., A ≥ aI and B ≥ aI for the positive number a. Then for every unitarily invariant norm
$$||| f(A) - f(B) ||| \le f'(a)\, ||| A - B |||. \tag{X.45}$$
Proof. Use the integral representation of f as in the proof of Theorem X.3.4. We have
$$f'(A) = \beta I + \int_0^\infty (\lambda + A)^{-2}\, d\mu(\lambda).$$
If A ≥ aI, then
$$f'(A) \le \beta I + \Big[ \int_0^\infty (\lambda + a)^{-2}\, d\mu(\lambda) \Big] I = f'(a)\, I.$$
Let A(t) = (1 − t)A + tB, 0 ≤ t ≤ 1. If A and B are bounded below by a, then so is A(t). Hence, using the mean value theorem, the inequality (X.39), and the above observation, we have
$$||| f(A) - f(B) ||| \le \max_{0 \le t \le 1} ||| Df(A(t))(A'(t)) ||| \le \max_{0 \le t \le 1} \| f'(A(t)) \|\, ||| A'(t) ||| \le f'(a)\, ||| A - B |||.$$
•
A special corollary is the inequality
$$||| A^r - B^r ||| \le r\, a^{r-1}\, ||| A - B |||, \quad 0 < r \le 1, \tag{X.46}$$
valid for operators A, B such that A ≥ aI and B ≥ aI for some positive number a. Other inequalities of this type are discussed in the Problems section.

Let A = UP be the polar decomposition of an invertible matrix A. Then P = |A| and U = AP⁻¹. Using standard rules of differentiation, one can obtain from this expressions for the derivative of the map A ↦ U, and then obtain perturbation bounds for this map in the same way as was done for the map A ↦ |A|. There is, however, a more effective and simpler way of doing this. The advantage of this new method, explained below, is that it also works for other decompositions like the QR decomposition, where explicit formulae for the two factors are not known. For this added power there is a small cost to be paid: the slightly more sophisticated notion of differentiation on a manifold of matrices has to be used. We have already used similar ideas in Chapter 6.

In the space M(n) of n × n matrices, let GL(n) be the set of all invertible matrices, U(n) the set of all unitary matrices, and P(n) the set of all positive (definite) matrices. All three are differentiable manifolds. The set GL(n) is an open subset of M(n), and hence the tangent space to GL(n) at each of its points is the space M(n). The tangent space to U(n) at the point I, written as T_I U(n), is the space K(n) of all skew-Hermitian matrices. This has been explained in Section VI.4. The tangent space at any other point U of U(n) is T_U U(n) = U·K(n) = { US : S ∈ K(n) }. Let H(n) be the space of all Hermitian matrices. Both H(n) and K(n) are real vector spaces, and H(n) = iK(n). The set P(n) is an open subset of H(n), and hence the tangent space to P(n) at each of its points is H(n).

The polar decomposition gives a differentiable map Φ from GL(n) onto U(n) × P(n). This is the map Φ(A) = (Φ₁(A), Φ₂(A)) = (U, P), where the invertible matrix A has the polar decomposition A = UP. Earlier in this section we called Φ₂ the map φ. Let Ψ be the inverse of the map Φ; i.e., Ψ(U, P) = UP. This is a much simpler object to handle, since it is just a product map. We can calculate the derivative of this map and then use the inverse function theorem to get the derivative of the map Φ.

Theorem X.3.9 Let Φ₁ be the map from GL(n) into U(n) that takes an invertible matrix to the unitary part in its polar decomposition.
Then the derivative of Φ₁ at A = UP, evaluated at the point UX, is given by the formula
$$[ D\Phi_1(UP) ](UX) = 2U \int_0^\infty e^{-tP}\, (i\, \mathrm{Im}\, X)\, e^{-tP}\, dt. \tag{X.47}$$
Proof. The domain of the linear map DΨ(U, P) is the tangent space to the manifold U(n) × P(n) at the point (U, P). This space is (U·K(n), H(n)). The range of DΨ(U, P) is the tangent space to GL(n) at UP. This space is M(n). We will use the decomposition M(n) = U·K(n) + U·H(n) that arises from the Cartesian decomposition. By the definition of the derivative, we have
$$[ D\Psi(U, P) ](US, H) = \frac{d}{dt}\Big|_{t=0} \Psi( U e^{tS}, P + tH ) = \frac{d}{dt}\Big|_{t=0} U e^{tS} (P + tH) = USP + UH$$
for all S ∈ K(n) and H ∈ H(n). The derivative DΦ(UP) is the inverse of this map. Suppose
$$[ D\Phi(UP) ](UX) = (UM, N),$$
where M ∈ K(n) and N ∈ H(n). Since Φ and Ψ are inverse to each other, from the two equations above we see that
$$UX = [ D\Psi(U, P) ](UM, N) = UMP + UN,$$
and hence
$$X = MP + N.$$
Our task now is to find M from this equation. Note that M is skew-Hermitian and N Hermitian. Hence, from the above equation, we obtain
$$MP + PM = X - X^* = 2i\, \mathrm{Im}\, X.$$
This equation was studied in Chapter 7. From Theorem VII.2.3 we have its solution
$$M = 2 \int_0^\infty e^{-tP}\, (i\, \mathrm{Im}\, X)\, e^{-tP}\, dt.$$
This gives us the expression (X.47).
•
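Formula (X.47) can be checked against a central difference of the unitary polar factor. Instead of evaluating the integral, the sketch below solves MP + PM = 2i Im X in the eigenbasis of P, which is equivalent by Theorem VII.2.3 (the helper polar_u and all other names are ours):

```python
import numpy as np

def polar_u(A):
    # unitary factor of the polar decomposition A = UP, via the SVD
    W, s, Vh = np.linalg.svd(A)
    return W @ Vh

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
Y = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

U = polar_u(A)
P = U.conj().T @ A                      # positive factor
X = U.conj().T @ Y                      # write the direction as Y = U X
w, V = np.linalg.eigh((P + P.conj().T) / 2)
C = V.conj().T @ (X - X.conj().T) @ V   # 2i Im X in the eigenbasis of P
M = V @ (C / (w[:, None] + w[None, :])) @ V.conj().T   # solves MP + PM = 2i Im X
deriv = U @ M                           # the right-hand side of (X.47)

h = 1e-6
fd = (polar_u(A + h * Y) - polar_u(A - h * Y)) / (2 * h)
ok = np.linalg.norm(fd - deriv) / np.linalg.norm(deriv) < 1e-5
print(ok)
```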
Corollary X.3.10 For every unitarily invariant norm we have
$$||| D\Phi_1(UP) ||| = \| P^{-1} \|. \tag{X.48}$$
Proof. Using (X.47) and properties of unitarily invariant norms, we see that
$$||| D\Phi_1(UP)(UX) ||| \le 2 \int_0^\infty \| e^{-tP} \|\, ||| X |||\, \| e^{-tP} \|\, dt.$$
If P has eigenvalues a₁ ≥ ⋯ ≥ a_n, then ‖e^{−tP}‖ = e^{−t a_n}. So,
$$||| D\Phi_1(UP)(UX) ||| \le 2 \int_0^\infty e^{-2t a_n}\, ||| X |||\, dt = \frac{1}{a_n}\, ||| X ||| = \| P^{-1} \|\, ||| X |||.$$
Hence,
$$||| D\Phi_1(UP) ||| = \sup_{||| X ||| = 1} ||| D\Phi_1(UP)(UX) ||| \le \| P^{-1} \|.$$
The choice X = i vv*/|||vv*|||, where v is an eigenvector of P belonging to the eigenvalue a_n, shows that the last inequality is in fact an equality. •

Two corollaries follow; the first one is obtained using the mean value theorem and the second one using Taylor's Theorem.
Corollary X.3.11 Let A₀, A₁ be two elements of GL(n), and let U₀, U₁ be the unitary factors in their polar decompositions. Suppose that the line segment A(t) = (1 − t)A₀ + tA₁, 0 ≤ t ≤ 1, lies inside GL(n). Then, for every unitarily invariant norm,
$$||| U_0 - U_1 ||| \le \max_{0 \le t \le 1} \| A(t)^{-1} \| \cdot ||| A_0 - A_1 |||. \tag{X.49}$$
Corollary X.3.12 Let A₀ be an invertible matrix with polar decomposition A₀ = U₀P₀. Then, for a matrix A = UP in a neighbourhood of A₀, we have
$$||| U_0 - U ||| \le \| A_0^{-1} \|\, ||| A_0 - A ||| + O\big( ||| A_0 - A |||^2 \big). \tag{X.50}$$
Exercise X.3.13 From the proof of Theorem X.3.9 one can also extract a bound for the derivative of the map A ↦ |A|. What does this give? Compare it with the result of Theorem X.3.3.
Let us see now how this method works for a perturbation analysis of the QR decomposition. Let Δ⁺(n) be the set of all upper triangular matrices with positive diagonal entries. Each element A of GL(n) has a unique factoring A = QR, where Q ∈ U(n) and R ∈ Δ⁺(n). Thus the QR decomposition gives rise to an invertible map Φ from GL(n) onto U(n) × Δ⁺(n). Let Δ_re(n) be the set of all upper triangular matrices with real diagonal entries. This is a real vector space, and Δ⁺(n) is an open set in it. Thus the tangent space to Δ⁺(n), at any of its points, is Δ_re(n). For each A = QR in GL(n),
the derivative DΦ(A) is a linear map from the vector space M(n) onto the vector space (Q·K(n), Δ_re(n)). We want to calculate the norm of this. First note that the spaces K(n) and Δ_re(n) are complementary to each other in M(n). We have a vector space decomposition
$$M(n) = K(n) + \Delta_{re}(n). \tag{X.51}$$
Every matrix X splits as X = K + T in this decomposition; the entries of X, K and T are related as follows:
$$\begin{aligned} k_{jj} &= i\, \mathrm{Im}\, x_{jj} && \text{for all } j, \\ k_{ij} &= -\bar{x}_{ji} && \text{for } j > i, \\ k_{ij} &= x_{ij} && \text{for } i > j, \\ t_{jj} &= \mathrm{Re}\, x_{jj} && \text{for all } j, \\ t_{ij} &= x_{ij} + \bar{x}_{ji} && \text{for } j > i, \\ t_{ij} &= 0 && \text{for } i > j. \end{aligned} \tag{X.52}$$
Exercise X.3.14 Let P₁ and P₂ be the complementary projection operators in M(n) corresponding to the decomposition (X.51). Show that
$$\| P_1 \|_2 = \| P_2 \|_2 = \sqrt{2},$$
where
$$\| P_1 \|_2 = \sup_{\| X \|_2 = 1} \| P_1 X \|_2,$$
and ‖·‖₂ stands for the Frobenius (Hilbert-Schmidt) norm.
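The splitting (X.52) and the bound √2 in the exercise are easy to confirm numerically (a sketch; the helper split is ours):

```python
import numpy as np

def split(X):
    # X = K + T per (X.52): K skew-Hermitian, T upper triangular, real diagonal
    n = X.shape[0]
    K = np.zeros_like(X)
    iL = np.tril_indices(n, -1)
    K[iL] = X[iL]                   # below the diagonal, K agrees with X
    K -= K.conj().T                 # above the diagonal: k_ij = -conj(x_ji)
    K += 1j * np.diag(np.imag(np.diag(X)))   # k_jj = i Im x_jj
    return K, X - K

rng = np.random.default_rng(7)
n, ok = 6, True
for _ in range(200):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    K, T = split(X)
    ok &= np.allclose(K, -K.conj().T)            # K is skew-Hermitian
    ok &= np.allclose(np.tril(T, -1), 0)         # T is upper triangular
    ok &= np.allclose(np.imag(np.diag(T)), 0)    # with real diagonal
    ok &= np.linalg.norm(K) <= np.sqrt(2) * np.linalg.norm(X) + 1e-12
    ok &= np.linalg.norm(T) <= np.sqrt(2) * np.linalg.norm(X) + 1e-12
print(ok)
```

A strictly lower triangular X shows that the constant √2 is attained for P₁.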
Now let Ψ be the map from U(n) × Δ⁺(n) onto GL(n) defined as Ψ(Q, R) = QR. Then Ψ and Φ are inverse to each other. The derivative DΨ(Q, R) is a linear map whose domain is the tangent space to the manifold U(n) × Δ⁺(n) at the point (Q, R). This space is (Q·K(n), Δ_re(n)). Its range is the space M(n) = Q·K(n) + Q·Δ_re(n). By the definition of the derivative, we have
$$[ D\Psi(Q, R) ](QK, T) = \frac{d}{dt}\Big|_{t=0} \Psi( Q e^{tK}, R + tT ) = \frac{d}{dt}\Big|_{t=0} Q e^{tK} (R + tT) = QKR + QT$$
for all K ∈ K(n) and T ∈ Δ_re(n). The derivative DΦ(QR) is a linear map from M(n) onto (Q·K(n), Δ_re(n)). Suppose
$$[ D\Phi(QR) ](QX) = (QM, N),$$
where M ∈ K(n) and N ∈ Δ_re(n). Then we must have
$$QX = [ D\Phi(QR) ]^{-1}(QM, N) = [ D\Psi(Q, R) ](QM, N) = QMR + QN.$$
Hence
$$X = MR + N.$$
So, we have the same kind of equation as we had in the analysis of the polar decomposition. There is one vital difference, however. There, the matrices M, N were skew-Hermitian and Hermitian, respectively, and instead of the upper triangular factor R we had the positive factor P. So, taking adjoints, we could eliminate N and get another equation that we could solve explicitly. We cannot do that here. But there is another way out. We have from the above equation
$$X R^{-1} = M + N R^{-1}. \tag{X.53}$$
Here M ∈ K(n); and both N and R⁻¹ are in Δ_re(n), and hence so is their product NR⁻¹. Thus the equation (X.53) is nothing but the decomposition of XR⁻¹ with respect to the vector space decomposition (X.51). In this way, we now know M and N explicitly. We thus have the following theorem.
Theorem X.3.15 Let Φ₁, Φ₂ be the maps from GL(n) into U(n) and Δ⁺(n) that take an invertible matrix to the unitary and the upper triangular factors in its QR decomposition. Then for each X ∈ M(n), the derivatives DΦ₁(QR) and DΦ₂(QR), evaluated at the point QX, are given by the formulae
$$[ D\Phi_1(QR) ](QX) = Q\, P_1( X R^{-1} ), \qquad [ D\Phi_2(QR) ](QX) = P_2( X R^{-1} )\, R,$$
where P₁ and P₂ are the complementary projection operators in M(n) corresponding to the decomposition (X.51).

Using the result of Exercise X.3.14, we obtain the first corollary below. The next two corollaries are then obtained using the mean value theorem and Taylor's Theorem.
Corollary X.3.16 Let Φ₁, Φ₂ be the maps that take an invertible matrix A to the Q and R factors in its QR decomposition. Then
$$\| D\Phi_1(A) \|_2 \le \sqrt{2}\, \| A^{-1} \|, \qquad \| D\Phi_2(A) \|_2 \le \sqrt{2}\, \mathrm{cond}(A) = \sqrt{2}\, \| A \|\, \| A^{-1} \|.$$
Corollary X.3.17 Let A₀, A₁ be two elements of GL(n) with their respective QR decompositions A₀ = Q₀R₀ and A₁ = Q₁R₁. Suppose that the line segment A(t) = (1 − t)A₀ + tA₁, 0 ≤ t ≤ 1, lies in GL(n). Then
$$\| Q_0 - Q_1 \|_2 \le \sqrt{2} \max_{0 \le t \le 1} \| A(t)^{-1} \|\; \| A_0 - A_1 \|_2,$$
$$\| R_0 - R_1 \|_2 \le \sqrt{2} \max_{0 \le t \le 1} \mathrm{cond}(A(t))\; \| A_0 - A_1 \|_2.$$
Corollary X.3.18 Let A₀ = Q₀R₀ be an invertible matrix. Then for every matrix A = QR close to A₀,
$$\| Q_0 - Q \|_2 \le \sqrt{2}\, \| A_0^{-1} \|\, \| A_0 - A \|_2 + O\big( \| A_0 - A \|_2^2 \big),$$
$$\| R_0 - R \|_2 \le \sqrt{2}\, \mathrm{cond}(A_0)\, \| A_0 - A \|_2 + O\big( \| A_0 - A \|_2^2 \big).$$
For most other unitarily invariant norms, the norms of the projections P₁ and P₂ onto the two summands in (X.51) are not as easy to calculate. Thus this method does not lead to attractive bounds for those norms in the case of the QR decomposition.
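The formula of Theorem X.3.15 for DΦ₁ can be checked against a central difference of the Q factor. The sketch below works with real matrices, so K(n) consists of the skew-symmetric matrices; qr_pos renormalizes numpy's QR factorization so that R has positive diagonal, and p1 is the projection P₁ of (X.51) (all names are ours):

```python
import numpy as np

def qr_pos(A):
    # QR factorization normalized so that R has positive diagonal
    Q, R = np.linalg.qr(A)
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0
    return Q * s, (R.T * s).T

def p1(X):
    # projection onto skew-symmetric matrices along real upper triangular ones
    K = np.tril(X, -1)
    return K - K.T

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 5))
Y = rng.standard_normal((5, 5))

Q, R = qr_pos(A)
# [D Phi_1(QR)](Y) with Y = Q X, i.e. X = Q^T Y
deriv = Q @ p1(Q.T @ Y @ np.linalg.inv(R))

h = 1e-6
Qp, _ = qr_pos(A + h * Y)
Qm, _ = qr_pos(A - h * Y)
fd = (Qp - Qm) / (2 * h)
ok = np.linalg.norm(fd - deriv) / np.linalg.norm(deriv) < 1e-5
print(ok)
```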
X.4 Appendix: Differential Calculus

We will review very quickly some basic concepts of the Fréchet differential calculus, with special emphasis on matrix analysis. No proofs are given. Let X, Y be real Banach spaces, and let L(X, Y) be the space of bounded linear operators from X to Y. Let U be an open subset of X. A continuous map f from U to Y is said to be differentiable at a point u of U if there exists T ∈ L(X, Y) such that
$$\lim_{v \to 0} \frac{ \| f(u + v) - f(u) - Tv \| }{ \| v \| } = 0. \tag{X.54}$$
It is easy to see that such a T, if it exists, is unique. If f is differentiable at u, the operator T above is called the derivative of f at u. We will use for it the notation Df(u). This is sometimes called the Fréchet derivative. If f is differentiable at every point of U, we say that it is differentiable on U. One can see that, if f is differentiable at u, then for every v ∈ X,
$$Df(u)(v) = \frac{d}{dt}\Big|_{t=0} f(u + tv). \tag{X.55}$$
This is also called the directional derivative of f at u in the direction v. The reader will recall from elementary calculus of functions of two variables that the existence of directional derivatives in all directions does not ensure differentiability. Some illustrative examples are given below.

Example X.4.1 (i) The constant function f(x) = c is differentiable at all points, and Df(x) = 0 for all x.
(ii) Every linear operator T is differentiable at all points, and is its own derivative; i.e., DT(u)(v) = Tv for all u, v in X.
(iii) Let X, Y, Z be real Banach spaces, and let B : X × Y → Z be a bounded bilinear map. Then B is differentiable at every point, and its derivative DB(u, v) is given as
$$DB(u, v)(x, y) = B(x, v) + B(u, y).$$
(iv) Let X be a real Hilbert space with inner product ⟨·,·⟩, and let f(u) = ‖u‖² = ⟨u, u⟩. Then f is differentiable at every point, and
$$Df(u)(v) = 2 \langle u, v \rangle.$$
The next set of examples is especially important for us.
Example X.4.2 In these examples X = Y = L(H).
(i) Let f(A) = A^2. Then

    Df(A)(B) = AB + BA.

(ii) More generally, let f(A) = A^n, n ≥ 2. From the binomial expansion of (A + B)^n one can see that

    Df(A)(B) = Σ_{j+k=n−1, j,k≥0} A^j B A^k.

(iii) Let f(A) = A^{-1} for each invertible A. Then Df(A)(B) = −A^{-1} B A^{-1}.
(iv) Let f(A) = A*A. Then Df(A)(B) = A*B + B*A.
(v) Let f(A) = e^A. Use the formula

    e^{A+B} − e^A = ∫_0^1 e^{(1−t)A} B e^{t(A+B)} dt

(called the Dyson expansion) to show that

    Df(A)(B) = ∫_0^1 e^{(1−t)A} B e^{tA} dt.
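These formulas lend themselves to numerical verification. A sketch (numpy and scipy assumed available; the random test matrices, the crude trapezoidal rule for the Dyson integral, and the tolerances are all ad hoc choices):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = 0.5 * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

def dirder(f, A, B, t=1e-6):
    # central-difference approximation to Df(A)(B), via formula (X.55)
    return (f(A + t * B) - f(A - t * B)) / (2 * t)

# (i) f(A) = A^2 : Df(A)(B) = AB + BA
err_sq = np.linalg.norm(dirder(lambda X: X @ X, A, B) - (A @ B + B @ A))

# (iii) f(A) = A^{-1} : Df(A)(B) = -A^{-1} B A^{-1}
C = A + 5 * np.eye(4)                      # comfortably invertible
Cinv = np.linalg.inv(C)
err_inv = np.linalg.norm(dirder(np.linalg.inv, C, B) - (-Cinv @ B @ Cinv))

# (v) f(A) = e^A : Df(A)(B) = integral_0^1 e^{(1-t)A} B e^{tA} dt (Dyson)
m = 200                                    # trapezoidal rule on [0, 1]
dyson = np.zeros_like(A)
for k in range(m + 1):
    s = k / m
    w = 0.5 / m if k in (0, m) else 1.0 / m
    dyson += w * expm((1 - s) * A) @ B @ expm(s * A)
err_exp = np.linalg.norm(dirder(expm, A, B) - dyson)
```

All three errors should be small; only the last carries the quadrature error of the trapezoidal rule.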
The usual rules of differentiation are valid: if f_1, f_2 are two differentiable maps, then f_1 + f_2 is differentiable and

    D(f_1 + f_2)(u) = Df_1(u) + Df_2(u).

The composite of two differentiable maps f and g is differentiable and we have the chain rule

    D(g ∘ f)(u) = Dg(f(u)) · Df(u).

In the special situation when g is linear, this reduces to

    D(g ∘ f)(u) = g · Df(u).
One important rule of differentiation for real functions is the product rule: (fg)′ = f′g + fg′. If f and g are two maps with values in a Banach space, their product is not defined, unless the range is an algebra as well. Still, a general product rule can be established. Let f, g be two differentiable maps from X into Y_1, Y_2, respectively. Let B be a continuous bilinear map from Y_1 × Y_2 into Z. Let φ be the map from X to Z defined as φ(x) = B(f(x), g(x)). Then for all u, v in X

    Dφ(u)(v) = B(Df(u)(v), g(u)) + B(f(u), Dg(u)(v)).

This is the product rule for differentiation. A special case of this arises when Y_1 = Y_2 = L(Y), the algebra of bounded operators in a Banach space Y. Now φ(x) = f(x)g(x) is the usual product of two operators. The product rule then is

    Dφ(u)(v) = [Df(u)(v)] · g(u) + f(u) · [Dg(u)(v)].

Exercise X.4.3
(i) Let f be the map A ↦ A^{-1} on GL(n). Use the product rule to show that

    Df(A)(B) = −A^{-1} B A^{-1}.

This can also be proved directly.
(ii) Let f(A) = A^{-2}. Show that

    Df(A)(B) = −A^{-1} B A^{-2} − A^{-2} B A^{-1}.

(iii) Obtain a formula for the derivative of the map f(A) = A^{-n}, n = 3, 4, ....
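The product rule can be sanity-checked on Exercise X.4.3(i): taking f(A) = A^2 and g(A) = A^{-1}, the product φ(A) = A^2 A^{-1} = A, so the rule must produce Dφ(A)(B) = B. A sketch (numpy assumed; the shift by 4I is only there to keep the random A safely invertible):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # comfortably invertible
B = rng.standard_normal((4, 4))
Ainv = np.linalg.inv(A)

Df = A @ B + B @ A            # derivative of f(A) = A^2 at A, direction B
Dg = -Ainv @ B @ Ainv         # derivative of g(A) = A^{-1} (Exercise X.4.3(i))

# product rule for phi(A) = f(A) g(A) = A; hence Dphi(A)(B) must equal B
Dphi = Df @ Ainv + (A @ A) @ Dg
err = np.linalg.norm(Dphi - B)
```

Algebraically, (AB + BA)A^{-1} − A^2 A^{-1} B A^{-1} = B, so `err` is at rounding level.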
Perhaps the most useful theorem of calculus is the Mean Value Theorem.
Theorem X.4.4 (The Mean Value Theorem) Let f be a differentiable map from an interval I of the real line into a Banach space X. Then for each closed interval [a, b] contained in I,

    ||f(b) − f(a)|| ≤ |b − a| · sup_{a≤t≤b} ||Df(t)||.

This is the version we have used often in the book, with [a, b] = [0, 1]. There is a more general statement:
Theorem X.4.5 (The Mean Value Theorem) Let f be a differentiable map from a convex subset U of a Banach space X into the Banach space Y. Let a, b ∈ U and let L be the line segment joining them. Then

    ||f(b) − f(a)|| ≤ ||b − a|| · sup_{u∈L} ||Df(u)||.
(Note that there are three different norms that occur in the above inequality. These are the norms of the spaces Y, X, and L(X, Y), respectively.)

Higher order Fréchet derivatives can be identified with multilinear maps. This is explained below.

Let f be a differentiable map from X to Y. At each point u, the derivative Df(u) is an element of the Banach space L(X, Y). Thus we have a map Df from X into L(X, Y), defined as Df : u ↦ Df(u). If this map is differentiable at a point u, we say that f is twice differentiable at u. The derivative of the map Df at the point u is called the second derivative of f at u. It is denoted as D^2 f(u). This is an element of the space L(X, L(X, Y)). This space is isomorphic to another Banach space, which is easier to handle.

Let L_2(X, Y) be the space of bounded bilinear maps from X × X into Y. The elements of this space are maps f from X × X into Y that are linear in both variables, and for which there exists a constant c such that

    ||f(x_1, x_2)|| ≤ c ||x_1|| ||x_2||    for all x_1, x_2 ∈ X.

The infimum of all such c is called ||f||. This is a norm on the space L_2(X, Y), and the space is a Banach space with this norm. If φ is an element of L(X, L(X, Y)), let

    φ̃(x_1, x_2) = [φ(x_1)](x_2)    for x_1, x_2 ∈ X.

Then φ̃ ∈ L_2(X, Y). It is easy to see that the map φ ↦ φ̃ is an isometric isomorphism. Thus the second derivative of a twice differentiable map f from X to Y can be thought of as a bilinear map from X × X to Y. It is easy to see that this map is symmetric in the two variables; i.e.,

    D^2 f(u)(v_1, v_2) = D^2 f(u)(v_2, v_1)
for all u, v_1, v_2. (This symmetry property is extremely helpful in guessing the expression for the second derivative of a given map.) Some examples on the space of matrices are given below.

Example X.4.6 Let X = M(n) and let f(A) = A^2, A ∈ M(n). We have seen that Df(A)(B) = AB + BA for all A, B. Note that Df(A) = L_A + R_A, where L_A and R_A are linear operators on M(n), the first one left multiplication by A and the second one right multiplication by A. The map Df : A ↦ Df(A) is a linear map from M(n) into L(M(n)). So the derivative of this map, at each point, is the map itself. Thus for each A, D^2 f(A) = Df. In other words,

    [D^2 f(A)](B) = Df(B) = L_B + R_B.

If we think of D^2 f(A) as a linear map from M(n) into L(M(n)), we have

    [D^2 f(A)(B_1)](B_2) = (L_{B_1} + R_{B_1})(B_2) = B_1 B_2 + B_2 B_1

for all B_1, B_2. If we think of it as a bilinear map, we have

    [D^2 f(A)](B_1, B_2) = B_1 B_2 + B_2 B_1.

Note that the right-hand side is independent of A. So the map A ↦ D^2 f(A) is a constant map. These are noncommutative analogues of the facts that if f(x) = x^2, then f′(x) = 2x and f″(x) = 2.
Example X.4.7 Let f(A) = A^3. We have seen that

    Df(A)(B) = A^2 B + ABA + BA^2.

This is the noncommutative analogue of the fact that if f(x) = x^3, then f′(x) = 3x^2. What is the second derivative? From the formula f″(x) = 6x, and the fact that D^2 f(A) is a symmetric bilinear map, we can guess that

    [D^2 f(A)](B_1, B_2) = A B_1 B_2 + A B_2 B_1 + B_1 A B_2 + B_2 A B_1 + B_1 B_2 A + B_2 B_1 A.

Prove that this indeed is the right formula for D^2 f(A). Note that the map A ↦ D^2 f(A) is linear.
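One way to gain confidence in the guessed formula (short of the requested proof) is a numerical check with a symmetric mixed difference; since f(A) = A^3 is a cubic polynomial, the difference quotient below is exact up to rounding. (A sketch with numpy; the matrices are random choices.)

```python
import numpy as np

rng = np.random.default_rng(3)
A, B1, B2 = (rng.standard_normal((4, 4)) for _ in range(3))
f = lambda X: X @ X @ X

# the guessed symmetric bilinear expression for [D^2 f(A)](B1, B2)
guess = (A @ B1 @ B2 + A @ B2 @ B1 + B1 @ A @ B2
         + B2 @ A @ B1 + B1 @ B2 @ A + B2 @ B1 @ A)

# symmetric mixed second difference approximating [D^2 f(A)](B1, B2)
t = 1e-3
num = (f(A + t*B1 + t*B2) - f(A + t*B1 - t*B2)
       - f(A - t*B1 + t*B2) + f(A - t*B1 - t*B2)) / (4 * t**2)

err = np.linalg.norm(num - guess)
```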
Example X.4.8 More generally, let f(A) = A^n. From the binomial theorem one can see that

    [D^2 f(A)](B_1, B_2) = Σ_{j+k+ℓ=n−2, j,k,ℓ≥0} [A^j B_1 A^k B_2 A^ℓ + A^j B_2 A^k B_1 A^ℓ].
Example X.4.9 Let f(A) = A^{-1}, A ∈ GL(n). We know that Df(A)(B) = −A^{-1} B A^{-1} for all B ∈ M(n). This is the noncommutative analogue of the formula (x^{-1})′ = −x^{-2}. The analogue of the formula (x^{-1})″ = 2x^{-3} is the following:

    [D^2 f(A)](B_1, B_2) = A^{-1} B_1 A^{-1} B_2 A^{-1} + A^{-1} B_2 A^{-1} B_1 A^{-1}.

This can be guessed from the bilinearity and symmetry properties that D^2 f(A) must have. It can be proved formally by the rules of differentiation.
Example X.4.10 Let f(A) = A^{-2}, A ∈ GL(n). We know that Df(A)(B) = −A^{-1} B A^{-2} − A^{-2} B A^{-1}. Show that

    [D^2 f(A)](B_1, B_2) = A^{-2} B_1 A^{-1} B_2 A^{-1} + A^{-2} B_2 A^{-1} B_1 A^{-1}
        + A^{-1} B_1 A^{-2} B_2 A^{-1} + A^{-1} B_2 A^{-2} B_1 A^{-1}
        + A^{-1} B_1 A^{-1} B_2 A^{-2} + A^{-1} B_2 A^{-1} B_1 A^{-2}.

This is the analogue of the formula (x^{-2})″ = 6x^{-4}.
Example X.4.11 Let f(A) = A*A. We have seen that Df(A)(B) = A*B + B*A. Show that

    D^2 f(A)(B_1, B_2) = B_1* B_2 + B_2* B_1.

Note that this expression does not involve A. So the map A ↦ D^2 f(A) is a constant map.
Derivatives of higher order can be defined by repeating the above procedure. The pth derivative of a map f from X to Y can be identified with a p-linear map from the space X × X × ⋯ × X (p copies) into Y. A convenient method of calculating the pth derivative of f is provided by the formula

    [D^p f(u)](v_1, ..., v_p) = (∂^p/∂t_1 ⋯ ∂t_p)|_{t_1=⋯=t_p=0} f(u + t_1 v_1 + ⋯ + t_p v_p).

Compare this with the formula (X.55) for the first derivative.
Example X.4.12 Let f(A) = A^n, A ∈ L(H). Then for p = 1, 2, ..., n,

    [D^p f(A)](B_1, ..., B_p) = Σ_{σ∈S_p} Σ_{j_1+⋯+j_{p+1}=n−p, j_k≥0} A^{j_1} B_{σ(1)} A^{j_2} B_{σ(2)} ⋯ A^{j_p} B_{σ(p)} A^{j_{p+1}},

where S_p is the set of all permutations on p symbols. There are n!/(n−p)! terms in the above double sum. These are all words of length n in which n − p of the letters are A and the remaining letters are B_1, ..., B_p, each occurring exactly once. Notice that this expression is linear and symmetric in each of the variables B_1, ..., B_p. When dim H = 1, this reduces to the formula for the pth derivative of the function f(x) = x^n:

    f^{(p)}(x) = n(n−1)⋯(n−p+1) x^{n−p} = (n!/(n−p)!) x^{n−p}.
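The double sum can be checked by brute force for small n and p: enumerate every word of length n containing n − p letters A and each B_i exactly once, count them, and compare their sum against a mixed difference quotient. A sketch (numpy assumed; n = 4, p = 2):

```python
import numpy as np
from itertools import permutations
from math import factorial

rng = np.random.default_rng(4)
n, p = 4, 2
A, B1, B2 = (rng.standard_normal((3, 3)) for _ in range(3))
Bs = [B1, B2]

# all words of length n with n-p letters A and B1,...,Bp each once
terms = []
for pos in permutations(range(n), p):     # ordered positions for B1,...,Bp
    letters = [A] * n
    for b, i in zip(Bs, pos):
        letters[i] = b
    word = np.eye(3)
    for M in letters:
        word = word @ M
    terms.append(word)

count = len(terms)                        # should equal n!/(n-p)!
deriv = sum(terms)                        # [D^p f(A)](B1, B2) for f(A) = A^n

# compare with a symmetric mixed difference of f(A) = A^n
f = lambda X: np.linalg.matrix_power(X, n)
t = 1e-3
num = (f(A + t*B1 + t*B2) - f(A + t*B1 - t*B2)
       - f(A - t*B1 + t*B2) + f(A - t*B1 - t*B2)) / (4 * t**2)
err = np.linalg.norm(num - deriv)
```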
The reader should work out some more simple examples to see the expressions for higher derivatives.

Another important theorem of calculus, Taylor's Theorem, has an analogue in the Fréchet calculus. Of the different versions possible, the one that is most useful for us is given below. Let f be a (p+1)-times differentiable map from a Banach space X into a Banach space Y. For h ∈ X, write [h]^m to mean the m-tuple (h, h, ..., h). Then, for all x ∈ X and for small h,

    || f(x + h) − f(x) − Σ_{m=1}^{p} (1/m!) D^m f(x)([h]^m) || = O(||h||^{p+1}).

From this we get

    || f(x + h) − f(x) || ≤ Σ_{m=1}^{p} (1/m!) ||D^m f(x)|| ||h||^m + O(||h||^{p+1}).
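For a concrete illustration, take f(A) = A^3 and p = 2; then the Taylor remainder is exactly h^3, so shrinking ||h|| by a factor of 10 should shrink the remainder by a factor of 10^3. A sketch (numpy assumed, random matrices):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
H = rng.standard_normal((4, 4))

def remainder(s):
    # || f(A+h) - f(A) - Df(A)(h) - (1/2) D^2 f(A)([h]^2) ||  for f(A) = A^3
    h = s * H
    d1 = A @ A @ h + A @ h @ A + h @ A @ A
    d2_half = A @ h @ h + h @ A @ h + h @ h @ A
    fAh = np.linalg.matrix_power(A + h, 3)
    return np.linalg.norm(fAh - np.linalg.matrix_power(A, 3) - d1 - d2_half)

r1, r2 = remainder(1e-2), remainder(1e-3)
ratio = r1 / r2      # close to 10^3, confirming O(||h||^{p+1}) with p = 2
```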
Finally, let us write down formulae for higher order derivatives of the composite of two maps. These are already quite intricate for real functions. If φ(x) = f(g(x)), then we have

    φ′(x) = f′(g(x)) g′(x),
    φ″(x) = f″(g(x)) [g′(x)]^2 + f′(g(x)) g″(x),
    φ‴(x) = f‴(g(x)) [g′(x)]^3 + 3 f″(g(x)) g′(x) g″(x) + f′(g(x)) g‴(x).
If X, Y, Z are Banach spaces, g is a map from X to Y, and f is a map from Y to Z, then for the derivatives of the composite map φ = f ∘ g we have the following formulae. By the chain rule,

    Dφ(x) = Df(g(x)) Dg(x).

The second and the third derivatives are bilinear and trilinear maps, respectively. For them we have the formulae:

    [D^2 φ(x)](x_1, x_2) = [D^2 f(g(x))](Dg(x)(x_1), Dg(x)(x_2)) + Df(g(x))([D^2 g(x)](x_1, x_2)),

    [D^3 φ(x)](x_1, x_2, x_3) = [D^3 f(g(x))](Dg(x)(x_1), Dg(x)(x_2), Dg(x)(x_3))
        + [D^2 f(g(x))](Dg(x)(x_1), [D^2 g(x)](x_2, x_3))
        + [D^2 f(g(x))](Dg(x)(x_2), [D^2 g(x)](x_1, x_3))
        + [D^2 f(g(x))](Dg(x)(x_3), [D^2 g(x)](x_1, x_2))
        + Df(g(x))([D^3 g(x)](x_1, x_2, x_3)).

The reader should convince himself that considerations of domains and ranges of the maps involved, symmetry in the variables, and the demand that in the case of real functions we should recover the old formulae lead to these general formulae. He can then try proving them.

We have also used the notion of the derivative of a map between manifolds. If X and Y are differentiable manifolds in finite-dimensional vector spaces, and f is a differentiable map from X to Y, then at a point u of X the derivative Df(u) is a linear map from the linear space T_u X into the linear space T_{f(u)} Y. These are the tangent spaces to X and Y at u and f(u), respectively. All manifolds we considered are subsets of M(n). Of these, GL(n), P(n), and Δ_+(n) are open subsets of vector subspaces of M(n). So these vector spaces are the tangent spaces for the manifolds. The only closed manifold we considered is U(n). It is easy to find the tangent space at any point of this manifold. This was done in Chapter 6. Most of the results of Fréchet calculus can be restated in this setup with appropriate modifications.
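The second-derivative formula can be exercised on a case where everything is computable by hand: take g(X) = X^2 and f(Y) = Y^2, so that φ(X) = X^4. A sketch (numpy assumed; the symmetric mixed difference is accurate enough here to compare against):

```python
import numpy as np

rng = np.random.default_rng(6)
X, B1, B2 = (rng.standard_normal((3, 3)) for _ in range(3))

Dsq = lambda Y, V: Y @ V + V @ Y      # derivative of the squaring map at Y
D2sq = lambda V, W: V @ W + W @ V     # its second derivative (constant in Y)

g = X @ X                              # phi = f o g with f, g both squaring
formula = D2sq(Dsq(X, B1), Dsq(X, B2)) + Dsq(g, D2sq(B1, B2))

phi = lambda Y: np.linalg.matrix_power(Y, 4)
t = 1e-3
num = (phi(X + t*B1 + t*B2) - phi(X + t*B1 - t*B2)
       - phi(X - t*B1 + t*B2) + phi(X - t*B1 - t*B2)) / (4 * t**2)
err = np.linalg.norm(num - formula)
```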
X.5 Problems
Problem X.5.1. Let f be a nonnegative operator monotone function on (0, ∞).

(i) Show that if A is positive and U unitary, then

    |||f(A)U − Uf(A)||| ≤ |||f(|AU − UA|)|||.

(ii) Let A be positive and X Hermitian. Let U be the Cayley transform of X. We have

    U = (X − i)(X + i)^{-1},    X = i(I + U)(I − U)^{-1} = 2i(I − U)^{-1} − iI.

Show that

    |||f(A)X − Xf(A)||| ≤ 2 ||(I − U)^{-1}||^2 |||f(|AU − UA|)|||.

Use the relation between U and X again to estimate the last factor, and show that

    |||f(A)X − Xf(A)||| ≤ ((1 + ||X||^2)/2) |||f(2|AX − XA|)|||.

(iii) Let A, B be positive and X arbitrary. Use the above inequality to show that

    |||f(A)X − Xf(B)||| ≤ ((1 + ||X||^2)/2) |||f(2|AX − XB|)|||.

[Hint: Use 2 × 2 block matrices.] When X = I, this reduces to the inequality (X.5).
Problem X.5.2. Let f be a nonnegative operator monotone function. Let A, B be positive matrices and let X be any contraction. Show that

    |||f(A)X − Xf(B)||| ≤ (5/4) |||f(|AX − XB|)|||.

[Hint: Use the result of the preceding problem, replacing X there by (1/2)X.]
Problem X.5.3. From the above inequality it follows that if A, B are positive and X is any matrix, then for 0 < r < 1,

    ||A^r X − X B^r|| ≤ (5/4) ||X||^{1−r} ||AX − XB||^r.

Show that we have under these conditions

    |||A^r X − X B^r||| ≤ (5/4) ||X||^{1−r} ||| |AX − XB|^r |||.

[Hint: Reduce the general case to the special case A = B. Use Hölder's inequality.]
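A numerical spot check of the first inequality for r = 1/2 (scipy assumed for the fractional matrix power; one random instance, so this illustrates rather than proves anything):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

rng = np.random.default_rng(7)
n, r = 4, 0.5
M1 = rng.standard_normal((n, n))
M2 = rng.standard_normal((n, n))
A = M1 @ M1.T + np.eye(n)          # random positive definite matrices
B = M2 @ M2.T + np.eye(n)
X = rng.standard_normal((n, n))

Ar = fractional_matrix_power(A, r).real
Br = fractional_matrix_power(B, r).real

lhs = np.linalg.norm(Ar @ X - X @ Br, 2)   # operator (spectral) norm
rhs = 1.25 * np.linalg.norm(X, 2)**(1 - r) * np.linalg.norm(A @ X - X @ B, 2)**r
ok = bool(lhs <= rhs)
```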
Problem X.5.4. Let A, B be any two operators. Show that

Problem X.5.5. Let A be a positive operator such that A ≥ aI > 0. Show that for every X,

    2a |||X||| ≤ |||AX + XA|||.

[Use the results in Section VII.2.] Use this to show that if A, B are positive operators such that A^{1/2} + B^{1/2} ≥ aI > 0, then

    |||A^{1/2} − B^{1/2}||| ≤ (1/a) |||A − B|||.

[Hint: Consider the operators A^{1/2} + B^{1/2} and A^{1/2} − B^{1/2}.]
Problem X.5.6. Let A and B be positive operators such that A ≥ aI > 0 and B ≥ bI > 0. Show that for every nonnegative operator monotone function f on (0, ∞),

    |||f(A) − f(B)||| ≤ C(a, b) |||A − B|||,

where C(a, b) = (f(a) − f(b))/(a − b) if a ≠ b, and C(a, b) = f′(a) if a = b.
Problem X.5.7. Let f be a real function on (0, ∞), and let f^{(n)} be its nth derivative. Let f also denote the map induced by f on positive operators. Let D^n f(A) be the nth order Fréchet derivative of this map at the point A. Let

    D^{(n)} = { f : ||D^n f(A)|| = ||f^{(n)}(A)|| for all positive A }.

We have seen that every operator monotone function is in the class D^{(1)}. Show that it is in D^{(n)} for all n = 1, 2, ....
Problem X.5.8. Several examples of functions that are not operator monotone but are in D^{(1)} are given below.

(i) Show that for each integer n, the function f(t) = t^n on (0, ∞) is in the class D^{(1)}.

(ii) Show that the function f(t) = a_0 + a_1 t + ⋯ + a_n t^n on (0, ∞), where n is any positive integer and the coefficients a_j are nonnegative, is in the class D^{(1)}.

(iii) Any function on (0, ∞) that has a power series expansion with nonnegative coefficients is in the class D^{(1)}.

(iv) Use the Dyson expansion to show that the exponential function is in the class D^{(1)}.

(v) Let

    f(t) = ∫_0^∞ e^{−λt} dμ(λ),

where μ is a positive measure on (0, ∞). Show that f ∈ D^{(1)}. [Use part (iv).]

(vi) From Euler's integral for the gamma function, we can write, for r > 0,

    t^{−r} = (1/Γ(r)) ∫_0^∞ e^{−λt} λ^{r−1} dλ.

Use this to show that for each r > 0, the function f(t) = t^{−r} on (0, ∞) is in D^{(1)}.
Problem X.5.9. The Cholesky decomposition of a positive definite matrix A is the (unique) factoring A = R*R, where R is an upper triangular matrix with positive diagonal entries. This gives an invertible map from the space P(n) onto the space Δ_+(n). Show that

for every A. Use this to write local perturbation bounds for this map.
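A sketch of the map in code (numpy assumed; numpy's `cholesky` returns the lower factor L with A = LL*, so R = Lᵀ recovers the book's A = R*R convention), together with a finite-difference look at its local sensitivity in one direction:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # positive definite

def chol_R(A):
    # the map A -> R, with A = R* R, R upper triangular, diag(R) > 0
    return np.linalg.cholesky(A).T

R = chol_R(A)
ok_factor = np.allclose(R.T @ R, A)
ok_shape = np.allclose(R, np.triu(R)) and bool(np.all(np.diag(R) > 0))

# finite-difference estimate of the derivative in one Hermitian direction E
E = rng.standard_normal((n, n)); E = (E + E.T) / 2
t = 1e-6
dR = (chol_R(A + t * E) - chol_R(A - t * E)) / (2 * t)
growth = np.linalg.norm(dR) / np.linalg.norm(E)   # one sample of the local sensitivity
```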
Problem X.5.10. A matrix is called strongly nonsingular if all of its leading principal minors are nonzero. Such matrices form a dense open set in the space M(n). Every strongly nonsingular matrix A can be factored as A = LR, where L is a lower triangular matrix and R an upper triangular matrix. Further, L can be chosen to have all of its diagonal entries equal to 1. With this restriction the factoring is unique. This is the LR decomposition familiar in linear algebra and numerical analysis. Let S be the set of strongly nonsingular matrices, Δ_1 the set of lower triangular matrices with unit diagonal, and Δ_ns the set of nonsingular upper triangular matrices. Let Λ_1 : S → Δ_1 and Λ_2 : S → Δ_ns be the maps defined by Λ_1(A) = L and Λ_2(A) = R. Follow the approach in Section X.3 to obtain the bounds

    ||DΛ_1(A)|| ≤ cond(L) ||R^{-1}||,    ||DΛ_2(A)|| ≤ cond(R) ||L^{-1}||.

Use these to obtain local perturbation bounds for the LR decomposition.
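A minimal Doolittle-style sketch of the LR factoring (numpy assumed; no pivoting, so it relies on strong nonsingularity exactly as the problem does):

```python
import numpy as np

def lr_decomposition(A):
    """A = L R with L unit lower triangular, R upper triangular.
    Plain Gaussian elimination without pivoting; valid when all
    leading principal minors of A are nonzero."""
    n = A.shape[0]
    L = np.eye(n)
    R = A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = R[i, k] / R[k, k]      # nonzero pivot by strong nonsingularity
            L[i, k] = m
            R[i, k:] -= m * R[k, k:]
    return L, np.triu(R)

rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # safely strongly nonsingular
L, R = lr_decomposition(A)
ok = (np.allclose(L @ R, A)
      and np.allclose(L, np.tril(L))
      and np.allclose(np.diag(L), 1.0))
```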
X.6 Notes and References
Most of the results in this chapter can be proved for infinite-dimensional Hilbert space operators. Many of them are valid for operator algebras as well.

Let f be a continuous real function on an interval that contains the spectra of two Hermitian operators A, B (on a Hilbert space H). The problem of finding bounds for ||f(A) − f(B)|| in terms of ||A − B|| has been investigated in great detail by many authors. Many deep results on this were obtained by the Russian school of Birman, which includes Farforovskaya, Naboko, Solomyak, and others. When f is differentiable and f′ is bounded, one would expect to find inequalities of the form ||f(A) − f(B)|| ≤ C ||f′||_∞ ||A − B||. Counterexamples to show that such inequalities are not true, in general, were constructed by Yu.B. Farforovskaya, An estimate of the norm ||f(B) − f(A)|| for selfadjoint operators A and B, Zap. Nauch. Sem. LOMI, 56 (1976) 143–162. (English translation: J. Soviet Math. 14, No. 2 (1980).) It was shown by M.Sh. Birman and M.Z. Solomyak that such inequalities can be found under stronger smoothness assumptions. The reader should see their paper titled Double Stieltjes operator integrals, English translation, in Topics in Mathematical Physics, Volume 1, Consultant Bureau, New York, 1967.

Theorem X.1.1 is taken from F. Kittaneh and H. Kosaki, Inequalities for the Schatten p-norm V, Publ. Res. Inst. Math. Sci., 23 (1987) 433–443. The inequality (X.3) was proved by M.Sh. Birman, L.S. Koplienko, and M.Z. Solomyak, Estimates of the spectrum of the difference between fractional powers of selfadjoint operators, Izvestiya Vysshikh Uchebnykh Zavedenii Mat., 19 (1975) 3–10. Its generalisation in Theorem X.1.3 is due to T. Ando, Comparison of norms |||f(A) − f(B)||| and |||f(|A − B|)|||, Math. Z., 197 (1988) 403–409. Our discussion of the material between Theorem X.1.3 and Exercise X.1.7 is taken from this paper.
For p-norms, the inequality (X.10) has another formulation: if A, B are positive, then

    ||A^{1/t} − B^{1/t}||_{tp} ≤ ||A − B||_p^{1/t}    for p ≥ 1, t ≥ 1.
The special case t = 2 of this was proved by R.T. Powers and E. Størmer, Free states of the canonical anticommutation relations, Commun. Math. Phys., 16 (1970) 1–33. The point of this formulation is that if A, B are positive Hilbert space operators and their difference A − B is in the Schatten class I_p, then A^{1/t} − B^{1/t} is in the class I_{tp}, and the above inequality is valid.

Theorem X.1.8 is due to D. Jocić and F. Kittaneh, Some perturbation inequalities for selfadjoint operators, J. Operator Theory, 31 (1994) 3–10. The proof of Lemma X.1.9 given here is due to R. Bhatia, A simple proof of an operator inequality of Jocić and Kittaneh, J. Operator Theory, 31 (1994) 21–22. As in the preceding paragraph, for the Schatten p-norms, the inequality (X.12) can be written as

    ||A − B||_{(2m+1)p} ≤ 2^{2m/(2m+1)} ||A^{2m+1} − B^{2m+1}||_p^{1/(2m+1)}

for m = 1, 2, ..., p ≥ 1, and Hermitian A, B. The result is valid in infinite-dimensional Hilbert spaces. A corollary of this is the statement that if the difference A^{2m+1} − B^{2m+1} is in the Schatten class I_p, then A − B is in the class I_{(2m+1)p}.
The first inequality in Problem X.5.3 was proved by G.K. Pedersen, A commutator inequality (unpublished note). The generalisation in Problem X.5.2, the inequalities in Problem X.5.1, and the second inequality in Problem X.5.3 are due to R. Bhatia and F. Kittaneh, Some inequalities for norms of commutators, SIAM J. Matrix Anal., 18 (1997) to appear. The motivation for Pedersen was a result of W.B. Arveson, Notes on extensions of C*-algebras, Duke Math. J., 44 (1977) 329–355. Let f be a continuous function on [0,1] with f(0) = 0, and let ε > 0. Arveson showed that there exists a δ > 0 such that if A and X are elements in the unit ball of a C*-algebra and A ≥ 0, then ||AX − XA|| < δ implies ||f(A)X − Xf(A)|| < ε. The inequality in Problem X.5.3 is a quantitative version of this for the special class of functions f(t) = t^r, 0 < r < 1. Weaker results proved earlier and their applications may be found in C.L. Olsen and G.K. Pedersen, Corona C*-algebras and their applications to lifting problems, Math. Scand., 64 (1989) 63–86. It is conjectured that the factor 5/4 occurring in these inequalities can be replaced by 1.

The inequality (X.20) for p = 1 was proved by H. Kosaki, On the continuity of the map φ ↦ |φ| from the predual of a W*-algebra, J. Funct. Anal., 59 (1984) 123–131. For the Schatten p-norms, p ≥ 2, the inequality (X.18) was proved by F. Kittaneh and H. Kosaki in their paper cited above. The other parts of Theorems X.2.3 and X.2.4, and Theorem X.2.1, were proved by R. Bhatia, Perturbation inequalities for the absolute value map in norm ideals of operators, J. Operator Theory, 19 (1988) 129–136.

The constant √n in (X.25) can be replaced by a factor c_n ≈ log n. This has been known for some time and is related to other important problems in operator theory. See two papers by A. McIntosh, Counterexample to a question on commutators, Proc. Amer. Math. Soc., 29 (1971) 337–340, and
Functions and derivations of C*-algebras, J. Funct. Anal., 30 (1978) 264–275. It is also known that such a factor is indeed necessary, both for the operator norm and for the corresponding inequality for the trace norm ||·||_1. This implies that, if H is infinite-dimensional, then the map A ↦ |A| on L(H) is not Lipschitz continuous. Nor is it Lipschitz continuous on the Schatten ideal I_1. The inequality (X.24) due to Araki and Yamagami, on the other hand, shows that on the Hilbert-Schmidt ideal I_2 this map is continuous. For other values of p, 1 < p < ∞, E.B. Davies, Lipschitz continuity of functions of operators in the Schatten classes, J. London Math. Soc., 37 (1988) 148–157, showed that there exists a constant γ_p that depends on p, but not on the dimension n, such that

    || |A| − |B| ||_p ≤ γ_p ||A − B||_p

for all A, B. Theorem X.2.5 was proved in T. Kato, Continuity of the map S ↦ |S| for linear operators, Proc. Japan Acad., 49 (1973) 157–160, and interpreted to mean that the map A ↦ |A| is "almost Lipschitz". Results close to this were obtained by Yu.B. Farforovskaya in the papers cited above.

Bounds like the ones in Section X.3 have been of interest to numerical analysts and physicists. References to much of this work may be found in R. Bhatia, Matrix factorizations and their perturbations, Linear Algebra Appl., 197/198 (1994) 245–276. Theorem X.3.1, and the proof given here, are due to C.J. Kenney and A.J. Laub, Condition estimates for matrix functions, SIAM J. Matrix Analysis, 10 (1989) 191–209. Theorem X.3.4 was proved in R. Bhatia, First and second order perturbation bounds for the operator absolute value, Linear Algebra Appl., 208/209 (1994) 367–376. Theorems X.3.3, X.3.6, X.3.7, and X.3.8 are also proved in this paper. The inequality in Problem X.5.5 is taken from J.L. van Hemmen and T. Ando, An inequality for trace ideals, Commun. Math. Phys., 76 (1980) 143–148. This paper has references to physics literature, where such inequalities are used. The inequality in Problem X.5.6 is proved in the paper by F. Kittaneh and H. Kosaki cited earlier. Most of the results after Theorem X.3.9 in Section X.3 were proved by R. Bhatia and K. Mukherjea, Variation of the unitary part of a matrix, SIAM J. Matrix Analysis, 15 (1994) 1007–1014. The full potential of this method was exploited in the paper cited at the beginning of this paragraph, where several other matrix decompositions of interest in numerical analysis are studied. The results of Problems X.5.9 and X.5.10 are obtained in this paper. (Some of these were proved earlier, using different methods, by A. Barrlund, R. Mathias, G.W. Stewart, and J.-G. Sun.) Bounds for the second derivative of the map A ↦ |A| are obtained in R.
Bhatia, First and second order perturbation bounds for the operator absolute value, Linear Algebra Appl., 208/209 (1994) 367–376; and for derivatives of higher orders in R. Bhatia, Perturbation bounds for the operator absolute value, Linear Algebra Appl., 226 (1995) 639–645. The reader may try to
prove such inequalities using the methods explained in Section X.4. Since this map is the composite of two maps, A ↦ A*A ↦ (A*A)^{1/2}, its analysis can be broken into two parts. No good bounds of higher order are known for other matrix decompositions.

Results in Parts (v) and (vi) of Problem X.5.8 are taken from R. Bhatia and K.B. Sinha, Variation of real powers of positive operators, Indiana Univ. Math. J., 43 (1994) 913–925. In this paper it is also shown that the functions f(t) = t^r on (0, ∞) belong to the class D^{(1)} if r ≥ 2, but not if 1 < r < √2. We have already seen that these functions are in D^{(1)} for all real numbers r ≤ 1.

In Section X.4 we have given a bare outline of differential calculus. More on this may be found in J. Dieudonné, Foundations of Modern Analysis, Academic Press, 1960, and in A. Ambrosetti and G. Prodi, A Primer of Nonlinear Analysis, Cambridge University Press, 1993. For calculus on manifolds, the reader could see S. Lang, Introduction to Differentiable Manifolds, John Wiley, 1962. In our exposition we have included several examples of matrix functions and formulae for higher derivatives of composite maps that are not easily found in other sources.
References
C. Akemann, J. Anderson and G. Pedersen, Triangle inequalities in operator algebras, Linear and Multilinear Algebra, 11 (1982) 167–178.
A. Ambrosetti and G. Prodi, A Primer of Nonlinear Analysis, Cambridge University Press, 1993.
A.R. Amir-Moez, Extreme Properties of Linear Transformations and Geometry in Unitary Spaces, Texas Tech. University, 1968.
W.N. Anderson and G.E. Trapp, A class of monotone operator functions related to electrical network theory, Linear Algebra Appl., 15 (1975) 53–67.
T. Ando, Topics on operator inequalities, Hokkaido University, Sapporo, 1978.
T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl., 26 (1979) 203–241.
T. Ando, Inequalities for permanents, Hokkaido Math. J., 10 (1981) 18–36.
T. Ando, Comparison of norms |||f(A) − f(B)||| and |||f(|A − B|)|||, Math. Z., 197 (1988) 403–409.
T. Ando, Majorization, doubly stochastic matrices and comparison of eigenvalues, Linear Algebra Appl., 118 (1989) 163–248.
T. Ando, Matrix Young inequalities, Operator Theory: Advances and Applications, 75 (1995) 33–38.
T. Ando, Bounds for the antidistance, J. Convex Analysis, 2 (1996) 1–3.
T. Ando and R. Bhatia, Eigenvalue inequalities associated with the Cartesian decomposition, Linear and Multilinear Algebra, 22 (1987) 133–147.
T. Ando and F. Hiai, Log majorization and complementary Golden-Thompson type inequalities, Linear Algebra Appl., 197/198 (1994) 113–131.
H. Araki, On an inequality of Lieb and Thirring, Letters in Math. Phys., 19 (1990) 167–170.
H. Araki and S. Yamagami, An inequality for the Hilbert-Schmidt norm, Commun. Math. Phys., 81 (1981) 89–98.
N. Aronszajn, Rayleigh-Ritz and A. Weinstein methods for approximation of eigenvalues. I. Operators in a Hilbert space, Proc. Nat. Acad. Sci. U.S.A., 34 (1948) 474–480.
W.B. Arveson, Notes on extensions of C*-algebras, Duke Math. J., 44 (1977) 329–355.
J.S. Aujla and H.L. Vasudeva, Convex and monotone operator functions, Ann. Polonici Math., 62 (1995) 1–11.
F.L. Bauer and C.T. Fike, Norms and exclusion theorems, Numer. Math., 2 (1960) 137–141.
H. Baumgärtel, Analytic Perturbation Theory for Matrices and Operators, Birkhäuser, 1984.
N. Bebiano, New developments on the Marcus-Oliveira conjecture, Linear Algebra Appl., 197/198 (1994) 793–802.
G.R. Belitskii and Y.I. Lyubich, Matrix Norms and Their Applications, Birkhäuser, 1988.
J. Bendat and S. Sherman, Monotone and convex operator functions, Trans. Amer. Math. Soc., 79 (1955) 58–71.
F. Berezin and I.M. Gel'fand, Some remarks on the theory of spherical functions on symmetric Riemannian manifolds, Trudi Moscow Math. Ob., 5 (1956) 311–351.
R. Bhatia, On the rate of change of spectra of operators II, Linear Algebra Appl., 36 (1981) 25–32.
R. Bhatia, Analysis of spectral variation and some inequalities, Trans. Amer. Math. Soc., 272 (1982) 323–332.
R. Bhatia, Some inequalities for norm ideals, Commun. Math. Phys., 111 (1987) 33–39.
R. Bhatia, Perturbation Bounds for Matrix Eigenvalues, Longman, 1987.
R. Bhatia, Perturbation inequalities for the absolute value map in norm ideals of operators, J. Operator Theory, 19 (1988) 129–136.
R. Bhatia, On residual bounds for eigenvalues, Indian J. Pure Appl. Math., 23 (1992) 865–866.
R. Bhatia, A simple proof of an operator inequality of Jocić and Kittaneh, J. Operator Theory, 31 (1994) 21–22.
R. Bhatia, Matrix factorizations and their perturbations, Linear Algebra Appl., 197/198 (1994) 245–276.
R. Bhatia, First and second order perturbation bounds for the operator absolute value, Linear Algebra Appl., 208/209 (1994) 367–376.
R. Bhatia, Perturbation bounds for the operator absolute value, Linear Algebra Appl., 226 (1995) 639–645.
R. Bhatia and C. Davis, A bound for the spectral variation of a unitary operator, Linear and Multilinear Algebra, 15 (1984) 71–76.
R. Bhatia and C. Davis, Concavity of certain functions of matrices, Linear and Multilinear Algebra, 17 (1985) 155–164.
R. Bhatia and C. Davis, More matrix forms of the arithmetic-geometric mean inequality, SIAM J. Matrix Analysis, 14 (1993) 132–136.
R. Bhatia and C. Davis, Relations of linking and duality between symmetric gauge functions, Operator Theory: Advances and Applications, 73 (1994) 127–137.
R. Bhatia and C. Davis, A Cauchy-Schwarz inequality for operators with applications, Linear Algebra Appl., 223 (1995) 119–129.
R. Bhatia, C. Davis, and F. Kittaneh, Some inequalities for commutators and an application to spectral variation, Aequationes Math., 41 (1991) 70–78.
R. Bhatia, C. Davis and A. McIntosh, Perturbation of spectral subspaces and solution of linear operator equations, Linear Algebra Appl., 52/53 (1983) 45–67.
R. Bhatia and L. Elsner, On the variation of permanents, Linear and Multilinear Algebra, 27 (1990) 105–110.
R. Bhatia and L. Elsner, Symmetries and variation of spectra, Canadian J. Math., 44 (1992) 1155–1166.
R. Bhatia and L. Elsner, The q-binomial theorem and spectral symmetry, Indag. Math., N.S., 4 (1993) 11–16.
R. Bhatia, L. Elsner, and G. Krause, Bounds for the variation of the roots of a polynomial and the eigenvalues of a matrix, Linear Algebra Appl., 142 (1990) 195–209.
R. Bhatia, L. Elsner, and G. Krause, Spectral variation bounds for diagonalisable matrices, Preprint 94-098, SFB 343, University of Bielefeld, Aequationes Math., to appear.
R. Bhatia and S. Friedland, Variation of Grassmann powers and spectra, Linear Algebra Appl., 40 (1981) 1–18.
R. Bhatia and J.A.R. Holbrook, Short normal paths and spectral variation, Proc. Amer. Math. Soc., 94 (1985) 377–382.
R. Bhatia and J.A.R. Holbrook, Unitary invariance and spectral variation, Linear Algebra Appl., 95 (1987) 43–68.
R. Bhatia and J.A.R. Holbrook, A softer, stronger Lidskii theorem, Proc. Indian Acad. Sci. (Math. Sci.), 99 (1989) 75–83.
R. Bhatia, R. Horn, and F. Kittaneh, Normal approximants to binormal operators, Linear Algebra Appl., 147 (1991) 169-179.
R. Bhatia and F. Kittaneh, On some perturbation inequalities for operators, Linear Algebra Appl., 106 (1988) 271-279.
R. Bhatia and F. Kittaneh, On the singular values of a product of operators, SIAM J. Matrix Analysis, 11 (1990) 272-277.
R. Bhatia and F. Kittaneh, Some inequalities for norms of commutators, SIAM J. Matrix Analysis, 18 (1997), to appear.
R. Bhatia, F. Kittaneh, and R.-C. Li, Some inequalities for commutators and an application to spectral variation II, Linear and Multilinear Algebra, to appear.
R. Bhatia and K.K. Mukherjea, On the rate of change of spectra of operators, Linear Algebra Appl., 27 (1979) 147-157.
R. Bhatia and K.K. Mukherjea, The space of unordered tuples of complex numbers, Linear Algebra Appl., 52/53 (1983) 765-768.
R. Bhatia and K. Mukherjea, Variation of the unitary part of a matrix, SIAM J. Matrix Analysis, 15 (1994) 1007-1014.
R. Bhatia and P. Rosenthal, How and why to solve the equation AX - XB = Y, Bull. London Math. Soc., 29 (1997), to appear.
R. Bhatia and K.B. Sinha, Variation of real powers of positive operators, Indiana Univ. Math. J., 43 (1994) 913-925.
M.Sh. Birman, L.S. Koplienko, and M.Z. Solomyak, Estimates of the spectrum of the difference between fractional powers of selfadjoint operators, Izvestiya Vysshikh Uchebnykh Zavedenii Mat., 19 (1975) 3-10.
M.Sh. Birman and M.Z. Solomyak, Double Stieltjes operator integrals, English translation in Topics in Mathematical Physics, Volume 1, Consultants Bureau, New York, 1967.
A. Björck and G.H. Golub, Numerical methods for computing angles between linear subspaces, Math. Comp., 27 (1973) 579-594.
R. Bouldin, Best approximation of a normal operator in the Schatten p-norm, Proc. Amer. Math. Soc., 80 (1980) 277-282.
A.L. Cauchy, Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes (1829), Oeuvres Complètes (IIe Série), Volume 9, Gauthier-Villars.
F. Chatelin, Spectral Approximation of Linear Operators, Academic Press, 1983.
M.D. Choi, Almost commuting matrices need not be nearly commuting, Proc. Amer. Math. Soc., 102 (1988) 529-533.
J.E. Cohen, Inequalities for matrix exponentials, Linear Algebra Appl., 111 (1988) 25-28.
J.E. Cohen, S. Friedland, T. Kato, and F.P. Kelly, Eigenvalue inequalities for products of matrix exponentials, Linear Algebra Appl., 45 (1982) 55-95.
A. Connes and E. Størmer, Entropy for automorphisms of II₁ von Neumann algebras, Acta Math., 134 (1975) 289-306.
H.O. Cordes, Spectral Theory of Linear Differential Operators and Comparison Algebras, Cambridge University Press, 1987.
R. Courant, Über die Eigenwerte bei den Differentialgleichungen der mathematischen Physik, Math. Z., 7 (1920) 1-57.
R. Courant and D. Hilbert, Methods of Mathematical Physics, Wiley, 1953.
Ju.L. Daleckii and S.G. Krein, Formulas of differentiation according to a parameter of functions of Hermitian operators, Dokl. Akad. Nauk SSSR, 76 (1951) 13-16.
E.B. Davies, Lipschitz continuity of functions of operators in the Schatten classes, J. London Math. Soc., 37 (1988) 148-157.
C. Davis, Separation of two linear subspaces, Acta Sci. Math. (Szeged), 19 (1958) 172-187.
C. Davis, Notions generalizing convexity for functions defined on spaces of matrices, in Convexity: Proceedings of Symposia in Pure Mathematics, American Mathematical Society, (1963) 187-201.
C. Davis, The rotation of eigenvectors by a perturbation, J. Math. Anal. Appl., 6 (1963) 159-173.
C. Davis, The Toeplitz-Hausdorff theorem explained, Canad. Math. Bull., 14 (1971) 245-246.
C. Davis and W.M. Kahan, The rotation of eigenvectors by a perturbation III, SIAM J. Numer. Anal., 7 (1970) 1-46.
J. Dieudonné, Foundations of Modern Analysis, Academic Press, 1960.
W.F. Donoghue, Monotone Matrix Functions and Analytic Continuation, Springer-Verlag, 1974.
L. Elsner, On the variation of the spectra of matrices, Linear Algebra Appl., 47 (1982) 127-138.
L. Elsner, An optimal bound for the spectral variation of two matrices, Linear Algebra Appl., 71 (1985) 77-80.
L. Elsner and S. Friedland, Singular values, doubly stochastic matrices and applications, Linear Algebra Appl., 220 (1995) 161-169.
L. Elsner and C. He, Perturbation and interlace theorems for the unitary eigenvalue problem, Linear Algebra Appl., 188/189 (1993) 207-229.
L. Elsner, C. Johnson, J. Ross, and J. Schönheim, On a generalised matching problem arising in estimating the eigenvalue variation of two matrices, European J. Combinatorics, 4 (1983) 133-136.
L. Elsner and M.H.C. Paardekooper, On measures of nonnormality of matrices, Linear Algebra Appl., 92 (1987) 107-124.
Ky Fan, On a theorem of Weyl concerning eigenvalues of linear transformations I, Proc. Nat. Acad. Sci. U.S.A., 35 (1949) 652-655.
Ky Fan, On a theorem of Weyl concerning eigenvalues of linear transformations II, Proc. Nat. Acad. Sci. U.S.A., 36 (1950) 31-35.
Ky Fan, A minimum property of the eigenvalues of a Hermitian transformation, Amer. Math. Monthly, 60 (1953) 48-50.
Ky Fan and A.J. Hoffman, Some metric inequalities in the space of matrices, Proc. Amer. Math. Soc., 6 (1955) 111-116.
Yu.B. Farforovskaya, An estimate of the norm ||f(B) - f(A)|| for selfadjoint operators A and B, Zap. Nauch. Sem. LOMI, 56 (1976) 143-162. (English translation: J. Soviet Math., 14, No. 2 (1980).)
M. Fiedler, Bounds for the determinant of the sum of Hermitian matrices, Proc. Amer. Math. Soc., 30 (1971) 27-31.
E. Fischer, Über quadratische Formen mit reellen Koeffizienten, Monatsh. Math. Phys., 16 (1905) 234-249.
C.K. Fong and J.A.R. Holbrook, Unitarily invariant operator norms, Canad. J. Math., 35 (1983) 274-299.
C.K. Fong, H. Radjavi, and P. Rosenthal, Norms for matrices and operators, J. Operator Theory, 18 (1987) 99-113.
M. Fujii and T. Furuta, Löwner-Heinz, Cordes and Heinz-Kato inequalities, Math. Japonica, 38 (1993) 73-78.
T. Furuta, Norm inequalities equivalent to Löwner-Heinz theorem, Reviews in Math. Phys., 1 (1989) 135-137.
F.R. Gantmacher, Matrix Theory, 2 volumes, Chelsea, 1959.
L. Gårding, Linear hyperbolic partial differential equations with constant coefficients, Acta Math., 84 (1951) 1-62.
L. Gårding, An inequality for hyperbolic polynomials, J. Math. Mech., 8 (1959) 957-966.
I.M. Gel'fand and M. Naimark, The relation between the unitary representations of the complex unimodular group and its unitary subgroup, Izv. Akad. Nauk SSSR Ser. Mat., 14 (1950) 239-260.
S.A. Geršgorin, Über die Abgrenzung der Eigenwerte einer Matrix, Izv. Akad. Nauk SSSR, Ser. Fiz.-Mat., 6 (1931) 749-754.
I.C. Gohberg and M.G. Krein, Introduction to the Theory of Linear Nonselfadjoint Operators, American Math. Society, 1969.
S. Golden, Lower bounds for the Helmholtz function, Phys. Rev. B, 137 (1965) 1127-1128.
J.A. Goldstein and M. Levy, Linear algebra and quantum chemistry, American Math. Monthly, 78 (1991) 710-718.
G.H. Golub and C.F. Van Loan, Matrix Computations, 2nd ed., The Johns Hopkins University Press, 1989.
W. Greub, Multilinear Algebra, 2nd ed., Springer-Verlag, 1978.
P.R. Halmos, Finite-Dimensional Vector Spaces, Van Nostrand, 1958.
P.R. Halmos, Positive approximants of operators, Indiana Univ. Math. J., 21 (1972) 951-960.
P.R. Halmos, Spectral approximants of normal operators, Proc. Edinburgh Math. Soc., 19 (1974) 51-58.
P.R. Halmos, A Hilbert Space Problem Book, 2nd ed., Springer-Verlag, 1982.
F. Hansen and G.K. Pedersen, Jensen's inequality for operators and Löwner's theorem, Math. Ann., 258 (1982) 229-241.
G.H. Hardy, J.E. Littlewood, and G. Pólya, Inequalities, Cambridge University Press, 1934.
E. Heinz, Beiträge zur Störungstheorie der Spektralzerlegung, Math. Ann., 123 (1951) 415-438.
R.H. Herman and A. Ocneanu, Spectral analysis for automorphisms of UHF C*-algebras, J. Funct. Anal., 66 (1986) 1-10.
P. Henrici, Bounds for iterates, inverses, spectral variation and fields of values of non-normal matrices, Numer. Math., 4 (1962) 24-39.
F. Hiai and Y. Nakamura, Majorisation for generalised s-numbers in semifinite von Neumann algebras, Math. Z., 195 (1987) 17-27.
F. Hiai and D. Petz, The Golden-Thompson trace inequality is complemented, Linear Algebra Appl., 181 (1993) 153-185.
N.J. Higham, Matrix nearness problems and applications, in Applications of Matrix Theory, Oxford University Press, 1989.
A.J. Hoffman and H.W. Wielandt, The variation of the spectrum of a normal matrix, Duke Math. J., 20 (1953) 37-39.
K. Hoffman and R. Kunze, Linear Algebra, 2nd ed., Prentice Hall, 1971.
J.A.R. Holbrook, Spectral variation of normal matrices, Linear Algebra Appl., 174 (1992) 131-144.
A. Horn, Eigenvalues of sums of Hermitian matrices, Pacific J. Math., 12 (1962) 225-242.
A. Horn and R. Steinberg, Eigenvalues of the unitary part of a matrix, Pacific J. Math., 9 (1959) 541-550.
R.A. Horn, The Hadamard product, in C.R. Johnson, ed., Matrix Theory and Applications, American Mathematical Society, 1990.
R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge University Press, 1985.
R.A. Horn and C.R. Johnson, Topics in Matrix Analysis, Cambridge University Press, 1990.
R.A. Horn and R. Mathias, An analog of the Cauchy-Schwarz inequality for Hadamard products and unitarily invariant norms, SIAM J. Matrix Analysis, 11 (1990) 481-498.
R.A. Horn and R. Mathias, Cauchy-Schwarz inequalities associated with positive semidefinite matrices, Linear Algebra Appl., 142 (1990) 63-82.
N.M. Hugenholtz and R.V. Kadison, Automorphisms and quasi-free states of the CAR algebra, Commun. Math. Phys., 43 (1975) 181-197.
C. Jordan, Essai sur la géométrie à n dimensions, Bull. Soc. Math. France, 3 (1875) 103-174.
W. Kahan, Numerical linear algebra, Canadian Math. Bull., 9 (1966) 757-801.
W. Kahan, Inclusion theorems for clusters of eigenvalues of Hermitian matrices, Technical Report, Computer Science Department, University of Toronto, 1967.
W. Kahan, Every n x n matrix Z with real spectrum satisfies ||Z - Z*|| ≤ ||Z + Z*|| (log₂ n + 0.038), Proc. Amer. Math. Soc., 39 (1973) 235-241.
W. Kahan, Spectra of nearly Hermitian matrices, Proc. Amer. Math. Soc., 48 (1975) 11-17.
W. Kahan, B.N. Parlett, and E. Jiang, Residual bounds on approximate eigensystems of nonnormal matrices, SIAM J. Numer. Anal., 19 (1982) 470-484.
T. Kato, Notes on some inequalities for linear operators, Math. Ann., 125 (1952) 208-212.
T. Kato, Perturbation Theory for Linear Operators, Springer-Verlag, 1966.
T. Kato, Continuity of the map S → |S| for linear operators, Proc. Japan Acad., 49 (1973) 157-160.
C.J. Kenney and A.J. Laub, Condition estimates for matrix functions, SIAM J. Matrix Analysis, 10 (1989) 191-209.
F. Kittaneh, On Lipschitz functions of normal operators, Proc. Amer. Math. Soc., 94 (1985) 416-418.
F. Kittaneh, Inequalities for the Schatten p-norm IV, Commun. Math. Phys., 106 (1986) 581-585.
F. Kittaneh, A note on the arithmetic-geometric mean inequality for matrices, Linear Algebra Appl., 171 (1992) 1-8.
F. Kittaneh and D. Jocic, Some perturbation inequalities for selfadjoint operators, J. Operator Theory, 31 (1994) 3-10.
F. Kittaneh and H. Kosaki, Inequalities for the Schatten p-norm V, Publ. Res. Inst. Math. Sci., 23 (1987) 433-443.
A. Korányi, On a theorem of Löwner and its connections with resolvents of selfadjoint transformations, Acta Sci. Math. Szeged, 17 (1956) 63-70.
H. Kosaki, On the continuity of the map φ → |φ| from the predual of a W*-algebra, J. Funct. Anal., 59 (1984) 123-131.
B. Kostant, On convexity, the Weyl group and the Iwasawa decomposition, Ann. Sci. E.N.S., 6 (1973) 413-455.
F. Kraus, Über konvexe Matrixfunktionen, Math. Z., 41 (1936) 18-42.
G. Krause, Bounds for the variation of matrix eigenvalues and polynomial roots, Linear Algebra Appl., 208/209 (1994) 73-82.
F. Kubo and T. Ando, Means of positive linear operators, Math. Ann., 249 (1980) 205-224.
S. Lang, Introduction to Differentiable Manifolds, John Wiley, 1962.
P.D. Lax, Differential equations, difference equations and matrix theory, Comm. Pure Appl. Math., 11 (1958) 175-194.
A. Lenard, Generalization of the Golden-Thompson inequality, Indiana Univ. Math. J., 21 (1971) 457-468.
C.K. Li, Some aspects of the theory of norms, Linear Algebra Appl., 212/213 (1994) 71-100.
C.K. Li and N.K. Tsing, On the unitarily invariant norms and some related results, Linear and Multilinear Algebra, 20 (1987) 107-119.
C.K. Li and N.K. Tsing, Norms that are invariant under unitary similarities and the C-numerical radii, Linear and Multilinear Algebra, 24 (1989) 209-222.
R.-C. Li, New perturbation bounds for the unitary polar factor, SIAM J. Matrix Analysis, 16 (1995) 327-332.
R.-C. Li, Norms of certain matrices with applications to variations of the spectra of matrices and matrix pencils, Linear Algebra Appl., 182 (1993) 199-234.
B.V. Lidskii, Spectral polyhedron of a sum of two Hermitian matrices, Functional Analysis and Appl., 10 (1982) 76-77.
V.B. Lidskii, On the proper values of a sum and product of symmetric matrices, Dokl. Akad. Nauk SSSR, 75 (1950) 769-772.
E.H. Lieb, Convex trace functions and the Wigner-Yanase-Dyson conjecture, Advances in Math., 11 (1973) 267-288.
E.H. Lieb, Inequalities for some operator and matrix functions, Advances in Math., 20 (1976) 174-178.
H. Lin, Almost commuting selfadjoint matrices and applications, preprint, 1995.
G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys., 33 (1973) 305-322.
J.H. van Lint, The van der Waerden conjecture: two proofs in one year, Math. Intelligencer, 4 (1982) 72-77.
P.O. Löwdin, On the non-orthogonality problem connected with the use of atomic wave functions in the theory of molecules and crystals, J. Chem. Phys., 18 (1950) 365-374.
K. Löwner, Über monotone Matrixfunktionen, Math. Z., 38 (1934) 177-216.
T.X. Lu, Perturbation bounds for eigenvalues of symmetrizable matrices, Numerical Mathematics: A Journal of Chinese Universities, 16 (1994) 177-185 (in Chinese).
G. Lumer and M. Rosenblum, Linear operator equations, Proc. Amer. Math. Soc., 10 (1959) 32-41.
M. Marcus, Finite-Dimensional Multilinear Algebra, 2 volumes, Marcel Dekker, 1973 and 1975.
M. Marcus and L. Lopes, Inequalities for symmetric functions and Hermitian matrices, Canad. J. Math., 9 (1957) 305-312.
M. Marcus and H. Minc, A Survey of Matrix Theory and Matrix Inequalities, Prindle, Weber and Schmidt, 1964; reprinted by Dover in 1992.
M. Marcus and M. Newman, Inequalities for the permanent function, Ann. of Math., 75 (1962) 47-62.
A.S. Markus, The eigen- and singular values of the sum and product of linear operators, Russian Math. Surveys, 19 (1964) 92-120.
A.W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications, Academic Press, 1979.
R. Mathias, The Hadamard operator norm of a circulant and applications, SIAM J. Matrix Analysis, 14 (1993) 1152-1167.
R. McEachin, A sharp estimate in an operator inequality, Proc. Amer. Math. Soc., 115 (1992) 161-165.
R. McEachin, Analyzing specific cases of an operator inequality, Linear Algebra Appl., 208/209 (1994) 343-365.
A. McIntosh, Counterexample to a question on commutators, Proc. Amer. Math. Soc., 29 (1971) 337-340.
A. McIntosh, Functions and derivations of C*-algebras, J. Funct. Anal., 30 (1978) 264-275.
A. McIntosh, Heinz inequalities and perturbation of spectral families, Macquarie Math. Reports, 1979.
M.L. Mehta, Matrix Theory, 2nd ed., Hindustan Publishing Co., 1989.
R. Merris and J.A. Dias da Silva, Generalized Schur functions, J. Algebra, 35 (1975) 442-448.
H. Minc, Permanents, Addison-Wesley, 1978.
B.A. Mirman, Numerical range and norm of a linear operator, Trudy Seminara po Funkcional'nomu Analizu, No. 10 (1968), 51-55.
L. Mirsky, Matrices with prescribed characteristic roots and diagonal elements, J. London Math. Soc., 33 (1958) 14-21.
L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. Oxford Ser. (2), 11 (1960) 50-59.
Y. Nakamura, Numerical range and norm, Math. Japonica, 27 (1982) 149-150.
A. Nudel'man and P. Svarcman, The spectrum of a product of unitary matrices, Uspehi Mat. Nauk, 13 (1958) 111-117.
M. Ohya and D. Petz, Quantum Entropy and Its Use, Springer-Verlag, 1993.
K. Okubo, Hölder-type norm inequalities for Schur products of matrices, Linear Algebra Appl., 91 (1987) 13-28.
C.L. Olsen and G.K. Pedersen, Corona C*-algebras and their applications to lifting problems, Math. Scand., 64 (1989) 63-86.
M. Omladic and P. Semrl, On the distance between normal matrices, Proc. Amer. Math. Soc., 110 (1990) 591-596.
A. Ostrowski, Recherches sur la méthode de Gräffe et les zéros des polynômes et des séries de Laurent, Acta Math., 72 (1940) 99-257.
A. Ostrowski, Über die Stetigkeit von charakteristischen Wurzeln in Abhängigkeit von den Matrizenelementen, Jber. Deut. Mat.-Verein., 60 (1957) 40-42.
A. Ostrowski, Solution of Equations and Systems of Equations, Academic Press, 1960.
C.C. Paige and M. Wei, History and generality of the CS Decomposition, Linear Algebra Appl., 208/209 (1994) 303-326.
B.N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, 1980.
K.R. Parthasarathy, Eigenvalues of matrix-valued analytic maps, J. Austral. Math. Soc. (Ser. A), 26 (1978) 179-197.
C. Pearcy and A. Shields, Almost commuting matrices, J. Funct. Anal., 33 (1979) 332-338.
G.K. Pedersen, A commutator inequality (unpublished note).
D. Petz, Quasi-entropies for finite quantum systems, Rep. Math. Phys., 21 (1986) 57-65.
D. Petz, A survey of certain trace inequalities, Banach Centre Publications, 30 (1994) 287-298.
D. Phillips, Improving spectral variation bounds with Chebyshev polynomials, Linear Algebra Appl., 133 (1990) 165-173.
J. Phillips, Nearest normal approximation for certain normal operators, Proc. Amer. Math. Soc., 67 (1977) 236-240.
A. Pokrzywa, Spectra of operators with fixed imaginary parts, Proc. Amer. Math. Soc., 81 (1981) 359-364.
I.R. Porteous, Topological Geometry, Cambridge University Press, 1981.
R.T. Powers and E. Størmer, Free states of the canonical anticommutation relations, Commun. Math. Phys., 16 (1970) 1-33.
J.F. Queiró and A.L. Duarte, On the Cartesian decomposition of a matrix, Linear and Multilinear Algebra, 18 (1985) 77-85.
Lord Rayleigh, The Theory of Sound, reprinted by Dover, 1945.
M. Reed and B. Simon, Methods of Modern Mathematical Physics, Volume 4, Academic Press, 1978.
R.C. Riddell, Minimax problems on Grassmann manifolds, Advances in Math., 54 (1984) 107-199.
M. Rosenblum, On the operator equation BX - XA = Q, Duke Math. J., 23 (1956) 263-270.
S.Ju. Rotfel'd, The singular values of a sum of completely continuous operators, in Topics in Mathematical Physics, Vol. 3, Consultants Bureau, 1969, 73-78.
A. Ruhe, Closest normal matrix finally found!, BIT, 27 (1987) 585-598.
D. Ruelle, Statistical Mechanics, Benjamin, 1969.
R. Schatten, Norm Ideals of Completely Continuous Operators, Springer-Verlag, 1960.
A. Schönhage, Quasi-GCD computations, J. Complexity, 1 (1985) 118-137.
J.-P. Serre, Linear Representations of Finite Groups, Springer-Verlag, 1977.
H.S. Shapiro, Topics in Approximation Theory, Springer Lecture Notes in Mathematics, Vol. 187, 1971.
B. Simon, Trace Ideals and Their Applications, Cambridge University Press, 1979.
S. Smale, The fundamental theorem of algebra and complexity theory, Bull. Amer. Math. Soc. (New Series), 4 (1981) 1-36.
M.F. Smiley, Inequalities related to Lidskii's, Proc. Amer. Math. Soc., 19 (1968) 1029-1034.
G. Sparr, A new proof of Löwner's theorem on monotone matrix functions, Math. Scand., 47 (1980) 266-274.
G.W. Stewart, Error and perturbation bounds for subspaces associated with certain eigenvalue problems, SIAM Rev., 15 (1973) 727-764.
G.W. Stewart, On the perturbation of pseudo-inverses, projections, and linear least squares problems, SIAM Rev., 19 (1977) 634-662.
G.W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, 1990.
J.-G. Sun, On the perturbation of the eigenvalues of a normal matrix, Math. Numer. Sinica, 6 (1984) 334-336.
J.-G. Sun, On the variation of the spectrum of a normal matrix, Linear Algebra Appl., 246 (1996) 215-222.
V.S. Sunder, Distance between normal operators, Proc. Amer. Math. Soc., 84 (1982) 483-484.
V.S. Sunder, On permutations, convex hulls and normal operators, Linear Algebra Appl., 48 (1982) 403-411.
J.J. Sylvester, Sur l'équation en matrices px = xq, C.R. Acad. Sci. Paris, 99 (1884) 67-71 and 115-116.
B. Sz.-Nagy, Über die Ungleichung von H. Bohr, Math. Nachr., 9 (1953) 255-259.
P. Tarazaga, Eigenvalue estimates for symmetric matrices, Linear Algebra Appl., 135 (1990) 171-179.
C.J. Thompson, Inequality with applications in statistical mechanics, J. Math. Phys., 6 (1965) 1812-1813.
C.J. Thompson, Inequalities and partial orders on matrix spaces, Indiana Univ. Math. J., 21 (1971) 469-480.
R.C. Thompson, Principal submatrices II, Linear Algebra Appl., 1 (1968) 211-243.
R.C. Thompson, Principal submatrices IX, Linear Algebra Appl., 5 (1972) 1-12.
R.C. Thompson, On the eigenvalues of a product of unitary matrices, Linear and Multilinear Algebra, 2 (1974) 13-24.
J.L. van Hemmen and T. Ando, An inequality for trace ideals, Commun. Math. Phys., 76 (1980) 143-148.
J. von Neumann, Some matrix inequalities and metrization of matric space, Tomsk. Univ. Rev., 1 (1937) 286-300; reprinted in Collected Works, Pergamon Press, 1962.
B. Wang and M. Gong, Some eigenvalue inequalities for positive semidefinite matrix power products, Linear Algebra Appl., 184 (1993) 249-260.
P.A. Wedin, Perturbation bounds in connection with singular value decomposition, BIT, 13, 217-232.
H.F. Weinberger, Remarks on the preceding paper of Lax, Comm. Pure Appl. Math., 11 (1958) 195-196.
H. Weyl, Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen, Math. Ann., 71 (1911) 441-469.
H. Whitney, Complex Analytic Varieties, Addison-Wesley, 1972.
H. Wielandt, Lineare Scharen von Matrizen mit reellen Eigenwerten, Math. Z., 53 (1950) 219-225.
H.W. Wielandt, An extremum property of sums of eigenvalues, Proc. Amer. Math. Soc., 6 (1955) 106-110.
H.W. Wielandt, Topics in the Analytic Theory of Matrices, mimeographed lecture notes, University of Wisconsin, 1967; reprinted in Collected Works, Vol. 2, W. de Gruyter, 1996.
E. Wigner and J. von Neumann, Significance of Löwner's theorem in the quantum theory of collisions, Ann. of Math., 59 (1954) 418-433.
J.H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, 1965.
Index
absolute value
  of a matrix, 296
  of a vector, 30
  perturbation of, 296
adjoint, 4
analytic continuation, 134
analytic functions in a half-plane, 134
angles between subspaces, 201
angle operator, 221
annihilation operator, 21
antisymmetric tensor power, 16
antisymmetric tensor product, 16
arithmetic-geometric mean inequality, 87, 262, 295
averaging, 40, 117
Ando's Concavity Theorem, 273
Araki-Lieb-Thirring inequality, 258
Aronszajn's Inequality, 64
Bauer-Fike Theorem, 233
Bernstein's Theorem, 148
bilinear map, 12, 310, 313
bilinear functional, 12
bilinear functional, elementary, 12
binormal operator, 284
biorthogonal, 2
Birkhoff's Theorem, 37, 180
block-matrix, 9, 195
Borel measure, 139
bound norm, 7
C-numerical radius, 102, 106
canonical angles, 201
  cosines of, 201
Caratheodory's Theorem, 38
Cartesian decomposition, 6, 25, 73, 156, 183
Carrollian n-tuple, 242
Cauchy's integral formula, 205
Cauchy's Interlacing Theorem, 59
Cauchy-Riemann equations, 135
Cauchy-Schwarz inequality
  for matrices, 266, 286, 297, 298
  for symmetric gauge functions, 88
  for unitarily invariant norms, 95, 108
Cayley transform, 284
centraliser, 167
chain rule, 311
Chebyshev's Theorem, 228
Cholesky decomposition, 5, 319
class L, 268
class T, 259
coefficients in the characteristic polynomial, 269
compatible matching, 36
commutant, 167
commutator, 167, 234
complete symmetric polynomial, 18
completely monotone function, 148
compression, 59, 61, 119
concave function, 41, 53, 158, 240, 248, 289
condition number, 232
conformally equivalent, 137
contraction, 7, 119
contractive, 7, 10
convex function, 40, 41, 45, 87, 117, 157, 218, 240, 248, 265, 281
  monotone, 40
  smoothness properties, 145
convolution, 146
creation operator, 21
CS Decomposition, 196, 223
cyclic order, 184, 191
Davis-Kahan sinΘ Theorem, 212
derivative, 124, 310
determinant, 16, 253, 269
  inequality, 3, 19, 21, 47, 51, 108, 181, 182, 183, 271, 281
  perturbation of, 22
diagonal of A, 37
diagonalisable matrix, 232
diagonally dominant, 251
differentiable curve, 166
differentiable manifold, 305
differentiable map, 124, 310
differentiation, rules of, 311
Dilation Theory, 26
distance between subspaces, 202
direct sum, 9
directional derivative, 310
distribution function, 139
doubly stochastic matrix, 23, 32, 165
doubly substochastic matrix, 38, 39
dual norm, 89, 96
dual space, 14
Dyson's expansion, 311, 319
eigenvalues, 24
  continuity of, 152, 168
  continuous parametrization of, 154
  generalised, 238
  of A with respect to B, 238
  of Hermitian matrices, 24, 35, 62, 101
  of product of unitary matrices, 82
elementary symmetric polynomials, 18, 46, 51
entropy, 44, 274, 287
  in quantum mechanics, 274
  relative, 274
Euclidean norm, 86
exponential, 8, 254, 311
extremal representation for eigenvalues, 24, 58, 67, 77
extremal representation for singular values, 75, 76
Fan Dominance Theorem, 93, 284, 291
Fan-Hoffman theorem, 73
field of values, 8
first divided difference, 123
Fischer's inequality, 51
Fourier analysis, 135
Fourier transform, 206, 207, 216
  inverse, 216
  minimal extrapolation problem, 224
Frechet derivative, 310
Frechet differential calculus, 301, 310
Frechet differentiable map, 124
Frobenius norm, 7, 25, 92, 214
Fubini's Theorem, 140
Fuglede-Putnam Theorem, 235
function of bounded variation, 217
gamma function, 319
gap, 225
Gel'fand-Naimark Theorem, 71
general linear group, 7
Gersgorin disks, 244
Gersgorin Disk Theorem, 244
Golden-Thompson inequality, 261, 279, 285
Gram determinant, 20
Gram matrix, 20
Gram-Schmidt procedure, 2, 287
Grassmann power, 18
Hadamard's inequality, 3, 227
Hadamard product, 23
Hadamard Determinant Theorem, 24, 46, 51
Hall's Theorem, 52
Hall's Marriage Theorem, 36
harmonic conjugates, 137
harmonic function, 135
harmonic function, mean value property, 136
Hausdorff distance, 160
Heinz inequalities, 285
Henrici's Theorem, 246
Herglotz Theorem, 136
Hermitian approximant, 276
Hermitian matrix, 4, 8, 57, 155, 254
Hilbert-Schmidt norm, 7, 92, 97, 299
Hoffman-Wielandt inequality, 165, 237
Hoffman-Wielandt Theorem, 165
Holder inequality, 88
  for symmetric gauge functions, 88
  for unitarily invariant norms, 95
hyperbolic PDE, 251
hyperbolic polynomials, 251
inner product, 1
inner product, on matrices, 92
imaginary part of a matrix, 6
invariant subspace, 10
interlacing theorem, 60
  converse, 61
  for singular values, 81
inverse, 293
isometry, 119
isotone, 41
jointly concave, 271, 273
Jordan canonical form, 246
Jordan decomposition, 99, 262, 277, 292
Kato-Temple inequality, 77
König-Frobenius Theorem, 37
Krein-Milman Theorem, 133
Ky Fan k-norms, 35, 92
  main use, 93
Ky Fan's Maximum Principle, 24, 35, 65, 69, 71, 174
lp-norms, 84, 89
lattice superadditive, 48, 53
laxly positive, 238
Lebesgue Dominated Convergence Theorem, 139
length of a curve, 169
lexicographic ordering, 14
Lidskii's Theorem, 69, 179
  second proof, 70
  multiplicative, 73
  third proof, 98
  for normal matrices, 181
Lie algebra, 241
Lie bracket, 167
Lie Product Formula, 254, 280
Liebian function, 286
Lieb's Concavity Theorem, 271
Lieb's Theorem, 270
Lieb-Thirring inequality, 279
Lindblad's Theorem, 275
Lipschitz continuous, 322
Lipschitz constant, 215
Lipschitz continuous function, 214
logarithm, 145
  principal branch, 143
logarithmic majorisation, 71
Loewner's Theorems, 131, 149
Loewner-Heinz inequality, 150
Löwner-Heinz Theorem, 285
Löwdin orthogonalisation, 287
LR decomposition, 319
  perturbation of, 320
Lyapunov equation, 221
majorisation, 28
  weak, 30
  of complex vectors, 179
  soft, 180
manifold, 167
Marcus-de Oliveira Conjecture, 184
Marriage Problem, 36
Marriage Theorem, 162, 213
Matching Problem, 36
Matching Theorem, 185
matrix convex, 113
matrix convex of order n, 113
matrix monotone, 112
matrix monotone of order n, 112
matrix triangle inequality, 81
Matrix Young inequalities, 286
mean value theorem, 303, 307, 312
measure of nonnormality, 245, 251, 278
metrically flat, 170
midpoint operator convex, 113
minimax principle, 58
  of Courant, Fischer and Weyl, 58
  for singular values, 75
Minkowski determinant inequality, 56
Minkowski Determinant Theorem, 47, 282
Minkowski inequality, for symmetric gauge functions, 89
Mirman's Theorem, 25
Mixed Schwarz inequality, 281, 286
mollifiers, 146
monotone, 45, 48
monotone decreasing, 41
monotone increasing, 41
multiindices, 16
multilinear functional, 12
multilinear map, 12
multiplication operator, 223
multiplication operator, left, 273
multiplication operator, right, 273
nearly normal matrix, 252
Neumann Series, 7
Nevanlinna's Theorem, 138
norm, 1, 6
  absolute, 85
  bound, 91
  gauge invariant, 85
  Frobenius, 92
  Hilbert-Schmidt, 92
  Ky Fan, 92
  monotone, 85
  operator, 91
  permutation invariant, 85
  Schatten, 92
  submultiplicative, 94
  symmetric, 85
  trace, 92
  unitarily invariant, 91
  weakly unitarily invariant, 102
normal approximant, 277
normal curve, 169
normal matrices, distance between eigenvalues, 212
normal matrices, path connected, 169
normal matrix, 4, 8, 160, 161, 168, 172, 177, 180, 253
  function of, 5
  spectral resolution, 161
normal path, 169, 177
normal path inequality, 189
numerical radius, 8, 102
numerical range, 8, 25
ν-measure of nonnormality, 246
operator approximation, 192, 275, 287
operator concave function, 113, 121
operator convex function, 113, 130
  integral representation, 134
operator monotone function, 112, 121, 126, 127, 130, 289, 302, 303, 304, 317
  canonical representation, 145
  infinitely differentiable, 134, 290
  integral representation, 134
  inverse of, 293
operator norm, 7
optimal matching, 159
optimal matching distance, 52, 153, 160, 21
orbit, 189
orthostochastic matrix, 35, 180
p-Carrollian, 242
p-norms, 84
pth derivative, 315
partitioned matrix, 64, 188
Peierls-Bogoliubov Inequality, 275, 281
permanent, 17, 19
  inequality, 19, 21, 23
  perturbation of, 22
permutations, 165
permutation invariant, 43
permutation matrix, 32, 37, 165
  complex, 85
permutation orbit, 166
pinching, 50, 97, 118, 275
pinching inequality, 97
  for wui norms, 107
Pick functions, 135, 139
  Nevanlinna's Theorem, 135
  integral representation, 138
Poincaré's Inequality, 58
Poisson kernel, 136
polar decomposition, 6, 213, 267, 276, 305
  perturbation of, 305
positive approximant, 277
positive definite, 4
positive matrices, product of, 255
positive matrix, 4
positive part, 6, 213
positive semidefinite, 4
positivity-preserving, 32
power functions, 123, 145, 289
  operator monotonicity of, 123
  operator convexity of, 123
  principal branch, 143
probability measure, 133
probability measures, weak* compact, 136
principal angles, 202
product rule for differentiation, 312
Pythagorean Theorem, 21
Q-norm, 89, 95, 174, 175, 277
Q'-norm, 90, 97
QR decomposition, 3, 195, 307
  perturbation of, 307
  rank revealing, 196
quasi-norms, 107
quantum chemistry, 287
R factor, 195
real eigenvalues, 238
real part of a matrix, 6
real spectrum, 193
rectifiable normal path, 169
rectifiable path, 169
reducing subspace, 10
regularisation, 146
residual bounds, 193
retraction, 173, 277
roots of polynomials, 230
  continuity of, 154
  perturbation of, 230
Rotfel'd's Theorem, 98
Rouché's Theorem, 153
S-convex, 40
Schatten class, 321
Schatten 2-norm, 7
Schatten p-norm, 92, 297, 298
Schur basis, 5
Schur-concave, 44, 46, 53
Schur-convex, 40, 41, 44
Schur-convexity, and convexity, 46
Schur-convexity, for differentiable functions, 45
Schur product, 23, 124
Schur's Theorem, 23, 35, 47, 51, 74
  converse of, 55
Schur Triangular Form, 5
Schwartz space, 219
second derivative, 313 second divided difference, 128 selfadjoint , 4 sesquilinear functional , 12 signature of a permutation, 16 similarity orbit, 189 similarity transformations, 1 02 singular value decomposition, 6 singular values, 5 inequalities, 94 majorisation, 157 of products, 71 perturbation of, 78 singular vectors, 6 perturbation of, 2 1 5 sinO theorem, 224 skewHermitian, 4, 1 55 skewsymmetric, 24 1 smooth approximate identities, 146 spectral radius, 9, 102, 253, 256, 269 spectral radius formula, 204 spectral resolution, 57 Spectral Theorem, 5 square root, 297, 30 1 of a positive matrix, 5 principal branch, 143 Stieltjes inversion formula, 1 39 strictly isotone, 4 1 strictly positive, 4 strongly isotone, 41 strongly nonsingular, 319 subadditive, 53 subspaces, 20 1 Sylvester equation, 194, 203, 222, 223, 234 condition for a unique solution, 203 norm of the solution, 208 solution of, 204, 205, 206, 207, 208 symmetry classes of tensors , 17 symmetric gauge function, 44, 52, 86 , 90, 260
  quadratic, 89
symmetric matrix, 241
symmetric norm, 94
symmetric tensor power, 16, 18
symmetric tensor product, 16
symplectic Lie algebra, 241
symplectic Lie group, 241
T-transform, 33
tangent space, 167, 305
tangent vector, 166
tanΘ theorem, 222
Taylor's Theorem, 303, 307, 315
tempered distribution, 219
tensor product, 222
  construction, 12
  inner product on, 14
  of operators, 14
  of spaces, 13
  orthonormal basis for, 14
Toeplitz-Hausdorff Theorem, 8, 20
trace, 25, 253, 269
  of a vector, 29
trace inequality, 258, 261, 279, 281
trace norm, 92, 173
trace-preserving, 32
triangle inequality for the matrix absolute value, 74
tridiagonal matrix, 60
twice differentiable map, 313
Tychonoff's Theorem, 132
T-length of a path, 175
T-optimal matching distance, 173
unital, 32
unitary approximant, 276
unitary conjugation, 102, 166
unitary factors, 307
unitary group, 7
unitary invariance, 7
unitary matrix, 4, 162, 178
unitary orbit, 166
unitary part, 6, 82, 213, 305
unitary-stochastic, 35
unitarily equivalent, 5
unitarily invariant function norm, 104
unitarily invariant norm, 91, 93
unitarily similar, 5
unordered n-tuples, 153
  metric on, 153
  quotient topology on, 153
van der Waerden conjecture, 27
variance, 44
weak submajorisation, 30
weak supermajorisation, 30
weakly unitarily invariant norm, 102, 109
Weyl's inequalities, 62, 64
Weyl's Majorant Theorem, 42, 73, 254, 279
  converse of, 55
Weyl's Monotonicity Theorem, 63
Weyl's Monotonicity Principle, 100, 291, 292
Weyl's Perturbation Theorem, 63, 71, 99, 152, 240
Wielandt's Minimax Principle, 67
Wigner-Yanase-Dyson conjecture, 274
wui norm, 102, 173, 177, 190
wui seminorm, 102

Notations
a ∨ b, 30
a ∧ b, 30
A*, 4
A ≥ 0, 4
A ≥ B, 4
A^{1/2}, 5
A ⊗ B, 14
A^{[k]}, 19
A ∘ B, 23
A ≤ B, 112
A ≤_L B, 238
A^T, 241
𝒜, 204
ℬ, 204
C(A), 50
C(S), 104
c_sym, 52
C · U, 170
cond(S), 232
det A, 3
d(λ, μ), 52
Df(A), 124
D, 135
d(σ(A), σ(B)), 160
d_T(σ(A), σ(B)), 173
d(Root f, Root g), 230
Δ(A), 245
Δ_v(A), 246
Df(A), 301
‖Df(A)‖, 301
|||Df(A)|||, 301
a_+(n), 307
a_re(n), 307
Df(u), 310
D²f(u), 313
diag(A), 35
E_u, 16
e, 29
e_1, 29
Eig A, 63
Eig^↓(A), 63
Eig^↑(A), 63
Eig_a(A), 158
Eig^{|↓|}(A), 158
Eig^{|↑|}(A), 158
f(x), 40
f^[1], 123
f^[2], 128
f̂, 206
φ, 44
φ(x), 44
φ^(k)(x), 45
Φ^(p)(x), 89
Φ^(p²), 89
Φ^(p)_{1/2}, 89
h(σ(A), σ(B)), 160
h(Root f, Root g), 231
Im A, 6
ℋ*, 14
ℋ(n), 167
ℒ, 269
ℒ(V, W), 3
ℒ(ℋ), 4
ℒ₂(X, Y), 313
Λ(A), 50
λ₁(A), 57
λ_j(A), 58
λ_j^↓(A), 58
λ_j^↑(A), 58
λ₁(T), 256
ℓ_T(·), 175
m_A(λ), 162
M(n), 91
𝒩, 169
N(·), 16
span{v₁, ..., v_k}, 65
s(L, M), 160
σ(A), 160
s(σ(A), σ(B)), 160
sgn x, 217
S_n, 165
S^⊥, 167
T⁺, 99
T⁻, 99
T(A), 102
T_A 𝒰_A, 167
T_A 𝒪_A, 189
Θ(ℰ, ℱ), 201
Θ(f, g), 230
T_U U(n), 305
T, 259
tr, 29
u*v, 2
U(n), 7
U_B, 166
v + w, 65
v − w, 65
w(A), 8
W(A), 8
x ⊗ y, 12
x₁ ∧ ··· ∧ x_k, 16
x₁ ∨ ··· ∨ x_k, 16
x^↓, 28
x^↑, 28
x ≺ y, 28
x ≺_w y, 30
x ≺^w y, 30
x ≺_s y, 180
x ∨ y, 30
x ∧ y, 30
x⁺, 30
⟨u, v⟩, 2
Z(A), 167
[K, A], 167
⊕_j ℋ_j, 11
⊗^k ℋ, 14
⊗^k A, 15
∧^k ℋ, 16
∨^k ℋ, 16
∧^k A, 18
∨^k A, 18
|A|, 5
| · |, 29
|x|, 30
‖x‖_p, 84
‖x‖_∞, 84
‖x‖_1, 86
‖x‖_(k), 86
‖A‖, 6
‖A‖₂, 7
|||A|||, 91
‖A‖_(k), 35
‖A‖_p, 92
‖A‖_∞, 92
‖A‖₁, 92
||| · |||_Φ, 95
||| · |||′, 96