Rajendra Bhatia

Matrix Analysis

Graduate Texts in Mathematics 169

Editors: S. Axler, F.W. Gehring, P.R. Halmos

Springer
New York Berlin Heidelberg Barcelona Budapest Hong Kong London Milan Paris Santa Clara Singapore Tokyo
Rajendra Bhatia
Indian Statistical Institute
New Delhi 110 016
India
Editorial Board

S. Axler
Department of Mathematics
Michigan State University
East Lansing, MI 48824
USA

F.W. Gehring
Department of Mathematics
University of Michigan
Ann Arbor, MI 48109
USA

P.R. Halmos
Department of Mathematics
Santa Clara University
Santa Clara, CA 95053
USA
Mathematics Subject Classification (1991): 15-01, 15A16, 15A45, 47A55, 65F15

Library of Congress Cataloging-in-Publication Data

Bhatia, Rajendra, 1952-
  Matrix analysis / Rajendra Bhatia.
    p. cm. -- (Graduate texts in mathematics; 169)
  Includes bibliographical references and index.
  ISBN 0-387-94846-5 (alk. paper)
  1. Matrices. I. Title. II. Series.
  QA188.B485 1996
  512.9'434--dc20          96-32217

Printed on acid-free paper.
© 1997 Springer-Verlag New York, Inc.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Production managed by Victoria Evarretta; manufacturing supervised by Jeffrey Taub.
Photocomposed pages prepared from the author's LaTeX files.
Printed and bound by Maple-Vail Book Manufacturing Group, York, PA.
Printed in the United States of America.

9 8 7 6 5 4 3 2 1

ISBN 0-387-94846-5 Springer-Verlag New York Berlin Heidelberg
SPIN 10524488
Preface
A good part of matrix theory is functional analytic in spirit. This statement can be turned around. There are many problems in operator theory, where most of the complexities and subtleties are present in the finite-dimensional case. My purpose in writing this book is to present a systematic treatment of methods that are useful in the study of such problems.

This book is intended for use as a text for upper division and graduate courses. Courses based on parts of the material have been given by me at the Indian Statistical Institute and at the University of Toronto (in collaboration with Chandler Davis). The book should also be useful as a reference for research workers in linear algebra, operator theory, mathematical physics and numerical analysis.

A possible subtitle of this book could be Matrix Inequalities. A reader who works through the book should expect to become proficient in the art of deriving such inequalities. Other authors have compared this art to that of cutting diamonds. One first has to acquire hard tools and then learn how to use them delicately.

The reader is expected to be very thoroughly familiar with basic linear algebra. The standard texts Finite-Dimensional Vector Spaces by P.R. Halmos and Linear Algebra by K. Hoffman and R. Kunze provide adequate preparation for this. In addition, a basic knowledge of functional analysis, complex analysis and differential geometry is necessary. The usual first courses in these subjects cover all that is used in this book.

The book is divided, conceptually, into three parts. The first five chapters contain topics that are basic to much of the subject. (Of these, Chapter 5 is more advanced and also more special.) Chapters 6 to 8 are devoted to
perturbation of spectra, a topic of much importance in numerical analysis, physics and engineering. The last two chapters contain inequalities and perturbation bounds for other matrix functions. These too have been of broad interest in several areas.

In Chapter 1, I have given a very brief and rapid review of some basic topics. The aim is not to provide a crash course but to remind the reader of some important ideas and theorems and to set up the notations that are used in the rest of the book. The emphasis, the viewpoint, and some proofs may be different from what the reader has seen earlier. Special attention is given to multilinear algebra; and inequalities for matrices and matrix functions are introduced rather early. After the first chapter, the exposition proceeds at a much more leisurely pace. The contents of each chapter have been summarised in its first paragraph.

The book can be used for a variety of graduate courses. Chapters 1 to 4 should be included in any course on Matrix Analysis. After this, if perturbation theory of spectra is to be emphasized, the instructor can go on to Chapters 6, 7 and 8. With a judicious choice of topics from these chapters, she can design a one-semester course. For example, Chapters 7 and 8 are independent of each other, as are the different sections in Chapter 8. Alternately, a one-semester course could include much of Chapters 1 to 5, Chapter 9, and the first part of Chapter 10. All topics could be covered comfortably in a two-semester course. The book can also be used to supplement courses on operator theory, operator algebras and numerical linear algebra.

The book has several exercises scattered in the text and a section called Problems at the end of each chapter. An exercise is placed at a particular spot with the idea that the reader should do it at that stage of his reading and then proceed further. Problems, on the other hand, are designed to serve different purposes. Some of them are supplementary exercises, while others are about themes that are related to the main development in the text. Some are quite easy while others are hard enough to be contents of research papers. From Chapter 6 onwards, I have also used the problems for another purpose. There are results, or proofs, which are a bit too special to be placed in the main text. At the same time they are interesting enough to merit the attention of anyone working, or planning to work, in this area. I have stated such results as parts of the Problems section, often with hints about their solutions. This should enhance the value of the book as a reference, and provide topics for a seminar course as well. The reader should not be discouraged if he finds some of these problems difficult. At a few places I have drawn attention to some unsolved research problems. At some others, the existence of such problems can be inferred from the text. I hope the book will encourage some readers to solve these problems too.

While most of the notations used are the standard ones, some need a little explanation:

Almost all functional analysis books written by mathematicians adopt the convention that an inner product (u, v) is linear in the variable u and
conjugate-linear in the variable v. Physicists and numerical analysts adopt the opposite convention, and different notations as well. There would be no special reason to prefer one over the other, except that certain calculations and manipulations become much simpler in the latter notation. If u and v are column vectors, then u*v is the product of a row vector and a column vector, hence a number. This is the inner product of u and v. Combined with the usual rules of matrix multiplication, this facilitates computations. For this reason, I have chosen the second convention about inner products, with the belief that the initial discomfort this causes some readers will be offset by the eventual advantages. (Dirac's bra and ket notation, used by physicists, is different typographically but has the same idea behind it.)

The k-fold tensor power of an operator is represented in this book as ⊗^k A, the antisymmetric and the symmetric tensor powers as ∧^k A and ∨^k A, respectively. This helps in thinking of these objects as maps, A ↦ ⊗^k A, etc. We often study the variational behaviour of, and perturbation bounds for, functions of operators. In such contexts, this notation is natural.

Very often we have to compare two n-tuples of numbers after rearranging them. For this I have used a pictorial notation that makes it easy to remember the order that has been chosen. If x = (x_1, ..., x_n) is a vector with real coordinates, then x↓ and x↑ are vectors whose coordinates are obtained by rearranging the numbers x_j in decreasing order and in increasing order, respectively. We write x↓ = (x↓_1, ..., x↓_n) and x↑ = (x↑_1, ..., x↑_n), where x↓_1 ≥ ··· ≥ x↓_n and x↑_1 ≤ ··· ≤ x↑_n.

The symbol ||| · ||| stands for a unitarily invariant norm on matrices: one that satisfies the equality |||UAV||| = |||A||| for all A and for all unitary U, V. A statement like |||A||| ≤ |||B||| means that, for the matrices A and B, this inequality is true simultaneously for all unitarily invariant norms. The supremum norm of A, as an operator on the space C^n, is always written as ||A||. Other norms carry special subscripts. For example, the Frobenius norm, or the Hilbert-Schmidt norm, is written as ||A||_2. (This should be noted by numerical analysts, who often use the symbol ||A||_2 for what we call ||A||.)

A few symbols have different meanings in different contexts. The reader's attention is drawn to three such symbols. If x is a complex number, |x| denotes the absolute value of x. If x is an n-vector with coordinates (x_1, ..., x_n), then |x| is the vector (|x_1|, ..., |x_n|). For a matrix A, the symbol |A| stands for the positive semidefinite matrix (A*A)^{1/2}. If J is a finite set, |J| denotes the number of elements of J. A permutation on n indices is often denoted by the symbol σ. In this case, σ(j) is the image of the index j under the map σ. For a matrix A, σ(A) represents the spectrum of A.

The trace of a matrix A is written as tr A. In analogy, if x = (x_1, ..., x_n) is a vector, we write tr x for the sum Σ x_j.

The words matrix and operator are used interchangeably in the book. When a statement about an operator is purely finite-dimensional in content,
I use the word matrix. If a statement is true also in infinite-dimensional spaces, possibly with a small modification, I use either the word matrix or the word operator. Many of the theorems in this book have extensions to infinite-dimensional spaces.

Several colleagues have contributed to this book, directly and indirectly. I am thankful to all of them. T. Ando, J.S. Aujla, R.B. Bapat, A. Ben-Israel, I. Ionascu, A.K. Lal, R.-C. Li, S.K. Narayan, D. Petz and P. Rosenthal read parts of the manuscript and brought several errors to my attention. Fumio Hiai read the whole book with his characteristic meticulous attention and helped me eliminate many mistakes and obscurities. Long-time friends and co-workers M.D. Choi, L. Elsner, J.A.R. Holbrook, R. Horn, F. Kittaneh, A. McIntosh, K. Mukherjea, K.R. Parthasarathy, P. Rosenthal and K.B. Sinha have generously shared with me their ideas and insights. These ideas, collected over the years, have influenced my writing. I owe a special debt to T. Ando. I first learnt some of the topics presented here from his Hokkaido University lecture notes. I have also learnt much from discussions and correspondence with him. I have taken a lot from his notes while writing this book.

The idea of writing this book came from Chandler Davis in 1986. Various logistic difficulties forced us to abandon our original plans of writing it together. The book is certainly the poorer for it. Chandler, however, has contributed so much to my mathematics, to my life, and to this project, that this is as much his book as it is mine.

I am thankful to the Indian Statistical Institute, whose facilities have made it possible to write this book. I am also thankful to the Department of Mathematics of the University of Toronto and to NSERC Canada, for several visits that helped this project take shape. It is a pleasure to thank V.P. Sharma for his LaTeX typing, done with competence and with good cheer, and the staff at Springer-Verlag for their help and support. My most valuable resource while writing has been the unstinting and ungrudging support from my son Gautam and wife Irpinder. Without that, this project might have been postponed indefinitely.

Rajendra Bhatia
Contents

Preface

I     A Review of Linear Algebra
      I.1   Vector Spaces and Inner Product Spaces
      I.2   Linear Operators and Matrices
      I.3   Direct Sums
      I.4   Tensor Products
      I.5   Symmetry Classes
      I.6   Problems
      I.7   Notes and References

II    Majorisation and Doubly Stochastic Matrices
      II.1  Basic Notions
      II.2  Birkhoff's Theorem
      II.3  Convex and Monotone Functions
      II.4  Binary Algebraic Operations and Majorisation
      II.5  Problems
      II.6  Notes and References

III   Variational Principles for Eigenvalues
      III.1 The Minimax Principle for Eigenvalues
      III.2 Weyl's Inequalities
      III.3 Wielandt's Minimax Principle
      III.4 Lidskii's Theorems
      III.5 Eigenvalues of Real Parts and Singular Values
      III.6 Problems
      III.7 Notes and References

IV    Symmetric Norms
      IV.1  Norms on C^n
      IV.2  Unitarily Invariant Norms on Operators on C^n
      IV.3  Lidskii's Theorem (Third Proof)
      IV.4  Weakly Unitarily Invariant Norms
      IV.5  Problems
      IV.6  Notes and References

V     Operator Monotone and Operator Convex Functions
      V.1   Definitions and Simple Examples
      V.2   Some Characterisations
      V.3   Smoothness Properties
      V.4   Loewner's Theorems
      V.5   Problems
      V.6   Notes and References

VI    Spectral Variation of Normal Matrices
      VI.1  Continuity of Roots of Polynomials
      VI.2  Hermitian and Skew-Hermitian Matrices
      VI.3  Estimates in the Operator Norm
      VI.4  Estimates in the Frobenius Norm
      VI.5  Geometry and Spectral Variation: the Operator Norm
      VI.6  Geometry and Spectral Variation: wui Norms
      VI.7  Some Inequalities for the Determinant
      VI.8  Problems
      VI.9  Notes and References

VII   Perturbation of Spectral Subspaces of Normal Matrices
      VII.1 Pairs of Subspaces
      VII.2 The Equation AX - XB = Y
      VII.3 Perturbation of Eigenspaces
      VII.4 A Perturbation Bound for Eigenvalues
      VII.5 Perturbation of the Polar Factors
      VII.6 Appendix: Evaluating the (Fourier) Constants
      VII.7 Problems
      VII.8 Notes and References

VIII  Spectral Variation of Nonnormal Matrices
      VIII.1 General Spectral Variation Bounds
      VIII.4 Matrices with Real Eigenvalues
      VIII.5 Eigenvalues with Symmetries
      VIII.6 Problems
      VIII.7 Notes and References

IX    A Selection of Matrix Inequalities
      IX.1  Some Basic Lemmas
      IX.2  Products of Positive Matrices
      IX.3  Inequalities for the Exponential Function
      IX.4  Arithmetic-Geometric Mean Inequalities
      IX.5  Schwarz Inequalities
      IX.6  The Lieb Concavity Theorem
      IX.7  Operator Approximation
      IX.8  Problems
      IX.9  Notes and References

X     Perturbation of Matrix Functions
      X.1   Operator Monotone Functions
      X.2   The Absolute Value
      X.3   Local Perturbation Bounds
      X.4   Appendix: Differential Calculus
      X.5   Problems
      X.6   Notes and References

References

Index
I
A Review of Linear Algebra

In this chapter we review, at a brisk pace, the basic concepts of linear and multilinear algebra. Most of the material will be familiar to a reader who has had a standard Linear Algebra course, so it is presented quickly with no proofs. Some topics, like tensor products, might be less familiar. These are treated here in somewhat greater detail. A few of the topics are quite advanced and their presentation is new.

I.1 Vector Spaces and Inner Product Spaces
Throughout this book we will consider finite-dimensional vector spaces over the field C of complex numbers. Such spaces will be denoted by symbols V, W, V_1, V_2, etc. Vectors will, most often, be represented by symbols u, v, w, x, etc., and scalars by a, b, s, t, etc. The symbol n, when not explained, will always mean the dimension of the vector space under consideration.

Most often, our vector space will be an inner product space. The inner product between the vectors u, v will be denoted by (u, v). We will adopt the convention that this is conjugate-linear in the first variable u and linear in the second variable v. We will always assume that the inner product is definite; i.e., (u, u) = 0 if and only if u = 0. A vector space with such an inner product is then a finite-dimensional Hilbert space. Spaces of this type will be denoted by symbols H, K, etc. The norm arising from the inner product will be denoted by ||u||; i.e., ||u|| = (u, u)^{1/2}.

As usual, it will sometimes be convenient to deal with the standard Hilbert space C^n. Elements of this vector space are column vectors with n coordinates. In this case, the inner product (u, v) is the matrix product u*v obtained by multiplying the column vector v on the left by the row vector u*. The symbol * denotes the conjugate transpose for matrices of any size. The notation u*v for the inner product is sometimes convenient even when the Hilbert space is not C^n.

The distinction between column vectors and row vectors is important in manipulations involving products. For example, if we write elements of C^n as column vectors, then u*v is a number, but uv* is an n × n matrix (sometimes called the "outer product" of u and v). However, it is typographically inconvenient to write column vectors. So, when the context does not demand this distinction, we may write a vector x with scalar coordinates x_1, ..., x_n simply as (x_1, ..., x_n). This will often be done in later chapters. For the present, however, we will maintain the distinction between row and column vectors.

Occasionally our Hilbert spaces will be real, but we will use the same notation for them as for the complex ones. Many of our results will be true for infinite-dimensional Hilbert spaces, with appropriate modifications at times. We will mention this only in passing.

Let X = (x_1, ..., x_k) be a k-tuple of vectors. If these are column vectors, then X is an n × k matrix. This notation suggests matrix manipulations with X that are helpful even in the general case. For example, let X = (x_1, ..., x_k) be a linearly independent k-tuple. We say that a k-tuple Y = (y_1, ..., y_k) is biorthogonal to X if (y_i, x_j) = δ_ij. This condition is expressed in matrix terms as Y*X = I_k, the k × k identity matrix.
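The biorthogonality condition Y*X = I_k can be checked numerically. The sketch below is my own illustration, not from the text: for k = n and invertible X, the condition Y*X = I forces Y = (X^{-1})*. The example uses a real 2×2 basis, so the conjugate transpose is just the transpose.

```python
# Hypothetical check: Y = (X^{-1})* is biorthogonal to the basis X.
X = [[1.0, 1.0],
     [0.0, 1.0]]          # columns x_1 = (1, 0), x_2 = (1, 1)

# Inverse of the 2x2 matrix X.
det = X[0][0]*X[1][1] - X[0][1]*X[1][0]
Xinv = [[ X[1][1]/det, -X[0][1]/det],
        [-X[1][0]/det,  X[0][0]/det]]

# Y = (X^{-1})^T, so that Y^T X = X^{-1} X = I.
Y = [[Xinv[j][i] for j in range(2)] for i in range(2)]

# Biorthogonality: (y_i, x_j) = delta_ij for columns y_i of Y, x_j of X.
for i in range(2):
    for j in range(2):
        ip = sum(Y[r][i] * X[r][j] for r in range(2))
        assert abs(ip - (1.0 if i == j else 0.0)) < 1e-12
print("Y is biorthogonal to X")
```

Here y_i is the i-th row of X^{-1} written as a column, and (y_i, x_j) is exactly the (i, j) entry of X^{-1}X.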
Exercise I.1.1 Given any k-tuple of linearly independent vectors X as above, there exists a k-tuple Y biorthogonal to it. If k = n, this Y is unique.

The Gram-Schmidt procedure, in this notation, can be interpreted as a matrix factoring theorem. Given an n-tuple X = (x_1, ..., x_n) of linearly independent vectors, the procedure gives another n-tuple Q = (q_1, ..., q_n) whose entries are orthonormal vectors. For each k = 1, 2, ..., n, the vectors {x_1, ..., x_k} and {q_1, ..., q_k} have the same linear span. In matrix notation this can be expressed as an equation, X = QR, where R is an upper triangular matrix. The matrix R may be chosen so that all its diagonal entries are positive. With this restriction the factors Q and R are both unique.

If the vectors x_j are not linearly independent, this procedure can be modified. If the vector x_k is linearly dependent on x_1, ..., x_{k-1}, set q_k = 0; otherwise proceed as in the Gram-Schmidt process. If the kth column of the matrix Q so constructed is zero, put the kth row of R to be zero. Now we have a factorisation X = QR, where R is upper triangular and Q has orthogonal columns, some of which are zero. Take the nonzero columns of Q and extend this set to an orthonormal basis. Then, replace the zero columns of Q by these additional basis vectors. The new matrix Q now has orthonormal columns, and we still have X = QR, because the new columns of Q are matched with zero rows of R. This is called the QR decomposition.

Similarly, a change of orthogonal bases can be conveniently expressed in these notations as follows. Let X = (x_1, ..., x_k) be any k-tuple of vectors and E = (e_1, ..., e_n) any orthonormal basis. Then, the columns of the matrix E*X are the representations of the vectors comprising X, relative to the basis E. When k = n and X is an orthonormal basis, then E*X is a unitary matrix. Furthermore, this is the matrix by which we pass between coordinates of any vector relative to the basis E and those relative to the basis X. Indeed, if
u = Ea = Xb,

then we have

    a_j = e_j* u,   b_j = x_j* u;   i.e.,   a = E*u,   b = X*u.

Hence, a = E*Xb and b = X*Ea.
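The Gram-Schmidt construction of the QR factorisation described above can be sketched numerically. The helper below is my own illustration (plain classical Gram-Schmidt for linearly independent columns, with the positive-diagonal normalisation), not code from the text.

```python
import math

def qr_gram_schmidt(cols):
    """cols: list of linearly independent column vectors (lists of floats).
    Returns (q_cols, R) with orthonormal q_cols and upper triangular R,
    R having positive diagonal, so that X = QR column by column."""
    n, k = len(cols[0]), len(cols)
    q_cols, R = [], [[0.0] * k for _ in range(k)]
    for j in range(k):
        v = list(cols[j])
        for i in range(j):
            R[i][j] = sum(q_cols[i][r] * cols[j][r] for r in range(n))  # (q_i, x_j)
            v = [v[r] - R[i][j] * q_cols[i][r] for r in range(n)]
        R[j][j] = math.sqrt(sum(t * t for t in v))   # positive for independent cols
        q_cols.append([t / R[j][j] for t in v])
    return q_cols, R

# x_1 = (1, 1, 0), x_2 = (1, 0, 1): two independent columns in R^3.
q_cols, R = qr_gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
print(R)   # upper triangular with positive diagonal
```

Each q_j is the j-th column orthogonalised against the earlier q_i and normalised; the inner products (q_i, x_j) land in R exactly as in the matrix equation X = QR.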
Exercise I.1.2 Let X be any basis of H and let Y be the basis biorthogonal to it. Using matrix multiplication, X gives a linear transformation from C^n to H. The inverse of this is given by Y*. In the special case when X is orthonormal (so that Y = X), this transformation is inner-product-preserving if the standard inner product is used on C^n.

Exercise I.1.3 Use the QR decomposition to prove Hadamard's inequality: if X = (x_1, ..., x_n), then

    |det X| ≤ ∏_{j=1}^{n} ||x_j||.

Equality holds here if and only if either the x_j are mutually orthogonal or some x_j is zero.
I.2 Linear Operators and Matrices

Let L(V, W) be the space of all linear operators from a vector space V to a vector space W. If bases for V, W are fixed, each such operator has a unique matrix associated with it. As usual, we will talk of operators and matrices interchangeably.

For operators between Hilbert spaces, the matrix representations are especially nice if the bases chosen are orthonormal. Let A ∈ L(H, K), and let E = (e_1, ..., e_n) be an orthonormal basis of H and F = (f_1, ..., f_m) an orthonormal basis of K. Then, the (i, j)-entry of the matrix of A relative to these bases is a_ij = f_i* A e_j = (f_i, A e_j). This suggests that we may say that the matrix of A relative to these bases is F*AE.

In this notation, composition of linear operators can be identified with matrix multiplication as follows. Let M be a third Hilbert space with orthonormal basis G = (g_1, ..., g_p). Let B ∈ L(K, M). Then
    matrix of (B · A) = G*(B · A)E = G*B FF* AE = (G*BF)(F*AE) = (matrix of B)(matrix of A).
The second step in the above chain is justified by Exercise I.1.2.

The adjoint of an operator A ∈ L(H, K) is the unique operator A* in L(K, H) that satisfies the relation

    (z, Ax)_K = (A*z, x)_H   for all x ∈ H and z ∈ K.
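With the convention (u, v) = u*v, the defining relation of the adjoint can be verified directly for a concrete matrix. The complex 2×2 example below is my own.

```python
# Check (z, Ax) = (A*z, x) with (u, v) = u* v (conjugate-linear in the first slot).
def inner(u, v):
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

A = [[1 + 2j, 3j],
     [2,      1 - 1j]]
# A* is the conjugate transpose of A.
A_star = [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

x, z = [1j, 2], [3, 1 - 1j]
lhs = inner(z, matvec(A, x))
rhs = inner(matvec(A_star, z), x)
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)
```

The same computation, with the roles of matrix and conjugate transpose swapped, confirms Exercise I.2.1 below: for fixed orthonormal bases, the matrix of A* is the conjugate transpose of the matrix of A.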
Exercise I.2.1 For fixed bases in H and K, the matrix of A* is the conjugate transpose of the matrix of A.

For the space L(H, H) we use the more compact notation L(H). In the rest of this section, and elsewhere in the book, if no qualification is made, an operator would mean an element of L(H).

An operator A is called self-adjoint or Hermitian if A = A*, skew-Hermitian if A = -A*, unitary if AA* = I = A*A, and normal if AA* = A*A.

A Hermitian operator A is said to be positive or positive semidefinite if (x, Ax) ≥ 0 for all x ∈ H. The notation A ≥ 0 will be used to express the fact that A is a positive operator. If (x, Ax) > 0 for all nonzero x, we will say A is positive definite, or strictly positive. We will then write A > 0. A positive operator is strictly positive if and only if it is invertible. If A and B are Hermitian, then we say A ≥ B if A - B ≥ 0.

Given any operator A we can find an orthonormal basis y_1, ..., y_n such that for each k = 1, 2, ..., n, the vector Ay_k is a linear combination of y_1, ..., y_k. This can be proved by induction on the dimension n of H. Let λ_1 be any eigenvalue of A, y_1 an eigenvector corresponding to λ_1, and M the 1-dimensional subspace spanned by it. Let N be the orthogonal complement of M, and let P_N denote the orthogonal projection on N. For y ∈ N, let A_N y = P_N Ay. Then, A_N is a linear operator on the (n - 1)-dimensional space N. So, by the induction hypothesis, there exists an orthonormal basis y_2, ..., y_n of N such that for k = 2, ..., n the vector A_N y_k is a linear combination of y_2, ..., y_k. Now y_1, ..., y_n is an orthonormal basis for H, and each Ay_k is a linear combination of y_1, ..., y_k for k = 1, 2, ..., n. Thus, the matrix of A with respect to this basis is upper triangular. In other words,
every matrix A is unitarily equivalent (or unitarily similar) to an upper triangular matrix T; i.e., A = QTQ*, where Q is unitary and T is upper triangular. This triangular matrix is called a Schur Triangular Form for A. An orthonormal basis with respect to which A is upper triangular is called a Schur basis for A.

If A is normal, then T is diagonal, and we have Q*AQ = D, where D is a diagonal matrix whose diagonal entries are the eigenvalues of A. This is the Spectral Theorem for normal matrices.

The Spectral Theorem makes it easy to define functions of normal matrices. If f is any complex function, and if D is a diagonal matrix with λ_1, ..., λ_n on its diagonal, then f(D) is the diagonal matrix with f(λ_1), ..., f(λ_n) on its diagonal. If A = QDQ*, then f(A) = Qf(D)Q*. A special consequence, used very often, is the fact that every positive operator A has a unique positive square root. This square root will be written as A^{1/2}.
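The recipe f(A) = Qf(D)Q* can be traced in a small computation. The sketch below is my own 2×2 construction (not from the text): a real symmetric positive matrix is diagonalised by a closed-form rotation, f(t) = √t is applied to the eigenvalues, and we verify that the result squares back to A.

```python
import math

A = [[2.0, 1.0],
     [1.0, 2.0]]   # symmetric, positive (eigenvalues 3 and 1)

# Diagonalise the symmetric 2x2 matrix by a rotation Q through angle theta.
theta = 0.5 * math.atan2(2.0 * A[0][1], A[0][0] - A[1][1])
c, s = math.cos(theta), math.sin(theta)
Q = [[c, -s], [s, c]]
# Eigenvalues: the diagonal of Q^T A Q.
lam1 = c*c*A[0][0] + 2*c*s*A[0][1] + s*s*A[1][1]
lam2 = s*s*A[0][0] - 2*c*s*A[0][1] + c*c*A[1][1]

# f(A) = Q diag(sqrt(lam1), sqrt(lam2)) Q^T  -- the positive square root A^(1/2).
f1, f2 = math.sqrt(lam1), math.sqrt(lam2)
root = [[f1*Q[i][0]*Q[j][0] + f2*Q[i][1]*Q[j][1] for j in range(2)] for i in range(2)]

# Verify (A^(1/2))^2 = A.
sq = [[sum(root[i][k]*root[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(sq)
```

Because the square root of each eigenvalue is taken with the positive sign, the result is itself a positive matrix, illustrating the uniqueness statement for A^{1/2}.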
Exercise I.2.2 Show that the following statements are equivalent:
(i) A is positive.
(ii) A = B*B for some B.
(iii) A = T*T for some upper triangular T.
(iv) A = T*T for some upper triangular T with nonnegative diagonal entries.

If A is positive definite, then the factorisation in (iv) is unique. This is called the Cholesky Decomposition of A.
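The factorisation in (iv) can be computed by the standard elimination recurrence. The sketch below is my own illustration for a real positive definite matrix (so T* is simply the transpose); it builds the upper triangular T row by row.

```python
import math

def cholesky_upper(A):
    """Return upper triangular T with T^T T = A and positive diagonal,
    for a real positive definite matrix A."""
    n = len(A)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = sum(T[k][i] * T[k][j] for k in range(i))
            if i == j:
                T[i][i] = math.sqrt(A[i][i] - s)   # > 0 for positive definite A
            else:
                T[i][j] = (A[i][j] - s) / T[i][i]
    return T

A = [[4.0, 2.0],
     [2.0, 3.0]]   # positive definite (my own example)
T = cholesky_upper(A)
print(T)   # [[2.0, 1.0], [0.0, sqrt(2)]]
```

Uniqueness is visible in the recurrence: each entry of T is forced once the earlier rows are known, provided the diagonal is taken positive.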
Exercise I.2.3 (i) Let {A_α} be a family of mutually commuting operators. Then, there is a common Schur basis for {A_α}. In other words, there exists a unitary Q such that Q*A_αQ is upper triangular for all α.
(ii) Let {A_α} be a family of mutually commuting normal operators. Then, there exists a unitary Q such that Q*A_αQ is diagonal for all α.
·
·
·
Q* (A* A)Q =
( �i �).
I.
6
A Review of
Note that
Linear Algebra
Qi ( *A )A Ql
S�,
==
; Q (*A
A)Q2
==
0
.
The second of these relations implies that Q2 A 0. From the first one we can conclude that if WI AQlS+ 1' then WiWl Ik · Choose w2 so that W (WI, 2 W ) is unitary. Then, we have ==
==
==
==
W *AQ
==
W AQl w � i ( w; AQl w:
A2 Q 2 AQ 2
) ( ==
s0+
0o
).
This is the Singular Value Decomposition: for every matrix A there exist unitaries W and Q such that

\[ W^*AQ = S, \]

where S is the diagonal matrix whose diagonal entries are the singular values of A. Note that in the above representation the columns of Q are eigenvectors of A*A and the columns of W are eigenvectors of AA* corresponding to the eigenvalues s_j^2(A), 1 ≤ j ≤ n. These eigenvectors are called the right and left singular vectors of A, respectively.

Exercise I.2.4 (i) The Singular Value Decomposition leads to the Polar Decomposition: every operator A can be written as A = UP, where U is unitary and P is positive. In this decomposition the positive part P is unique, P = |A|. The unitary part U is unique if A is invertible.
(ii) An operator A is normal if and only if the factors U and P in the polar decomposition of A commute.
(iii) We have derived the Polar Decomposition from the Singular Value Decomposition. Show that it is possible to derive the latter from the former.
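The passage from the Singular Value Decomposition to the Polar Decomposition can be sketched numerically. Writing A = WSQ* gives U = WQ* and P = QSQ* (a NumPy illustration under that assumption, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# SVD: A = W S Q*. Then U = W Q* is unitary and P = Q S Q* is positive,
# giving the polar decomposition A = U P with P = |A|.
W, s, Qh = np.linalg.svd(A)
U = W @ Qh
P = Qh.conj().T @ np.diag(s) @ Qh

assert np.allclose(U @ P, A)                      # A = U P
assert np.allclose(U.conj().T @ U, np.eye(3))     # U is unitary
assert np.all(np.linalg.eigvalsh(P) >= -1e-12)    # P is positive
```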
Every operator A can be decomposed as a sum

\[ A = \mathrm{Re}\,A + i\,\mathrm{Im}\,A, \]

where Re A = (A + A*)/2 and Im A = (A − A*)/2i. This is called the Cartesian Decomposition of A into its "real" and "imaginary" parts. The operators Re A and Im A are both Hermitian.

The norm of an operator A is defined as
\[ \|A\| = \sup_{\|x\|=1} \|Ax\|. \]

We also have

\[ \|A\| = \sup_{\|x\|=\|y\|=1} |(y, Ax)|. \]

When A is Hermitian we have

\[ \|A\| = \sup_{\|x\|=1} |(x, Ax)|. \]
When A is normal we have

\[ \|A\| = \max\{|\lambda_j| : \lambda_j \text{ is an eigenvalue of } A\}. \]

An operator A is said to be a contraction if ‖A‖ ≤ 1. We also use the adjective contractive for such an operator. A positive operator A is contractive if and only if A ≤ I.

To distinguish it from other norms that we consider later, the norm ‖A‖ will be called the operator norm or the bound norm of A. Another useful norm is the norm
\[ \|A\|_2 = \Big( \sum_{j=1}^n s_j^2(A) \Big)^{1/2} = (\mathrm{tr}\, A^*A)^{1/2}, \]

where tr stands for the trace of an operator. If a_{ij} are the entries of a matrix representation of A relative to an orthonormal basis of H, then

\[ \|A\|_2 = \Big( \sum_{i,j} |a_{ij}|^2 \Big)^{1/2}. \]

This makes this norm useful in calculations with matrices. It is called the Frobenius norm or the Schatten 2-norm or the Hilbert–Schmidt norm. Both ‖A‖ and ‖A‖_2 have an important invariance property called unitary invariance: we have ‖A‖ = ‖UAV‖ and ‖A‖_2 = ‖UAV‖_2 for all unitary U, V.

Any two norms on a finite-dimensional space are equivalent. For the norms ‖A‖ and ‖A‖_2 it follows from the properties listed above that
\[ \|A\| \le \|A\|_2 \le n^{1/2}\,\|A\| \quad \text{for every } A. \]
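The two characterisations of ‖A‖_2 and the comparison with the operator norm can be checked directly (a NumPy sketch, which is an assumption of this illustration and not part of the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))

op = np.linalg.norm(A, 2)        # operator norm = largest singular value
fro = np.linalg.norm(A, 'fro')   # Frobenius norm = (sum |a_ij|^2)^(1/2)

# ||A||_2 agrees with (tr A*A)^(1/2) and with the singular value formula
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(fro, np.sqrt(np.trace(A.conj().T @ A)))
assert np.isclose(fro, np.sqrt(np.sum(s**2)))

# ||A|| <= ||A||_2 <= n^(1/2) ||A||
assert op <= fro + 1e-12 and fro <= np.sqrt(n) * op + 1e-12
```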
Exercise I.2.5 Show that matrices with distinct eigenvalues are dense in the space of all n × n matrices. (Use the Schur Triangularisation.)

Exercise I.2.6 If ‖A‖ < 1, then I − A is invertible, and

\[ (I - A)^{-1} = I + A + A^2 + \cdots, \]

a convergent power series. This is called the Neumann Series.
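The convergence of the Neumann Series is easy to watch numerically (a NumPy sketch under that assumption, not part of the text): the partial sums of I + A + A² + ··· approach (I − A)^{-1} when ‖A‖ < 1.

```python
import numpy as np

A = np.array([[0.2, 0.3], [0.1, 0.4]])
assert np.linalg.norm(A, 2) < 1   # ||A|| < 1, so the series converges

# Partial sums I + A + A^2 + ... approach (I - A)^{-1}
S, term = np.eye(2), np.eye(2)
for _ in range(100):
    term = term @ A
    S += term

assert np.allclose(S, np.linalg.inv(np.eye(2) - A))
```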
Exercise I.2.7 The set of all invertible matrices is a dense open subset of the set of all n × n matrices. The set of all unitary matrices is a compact subset of the set of all n × n matrices. These two sets are also groups under multiplication. They are called the general linear group GL(n) and the unitary group U(n), respectively.
Exercise I.2.8 For any matrix A the series

\[ \exp A = \sum_{k=0}^{\infty} \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \cdots \]

converges. This is called the exponential of A. The matrix exp A is always invertible and (exp A)^{-1} = exp(−A). Conversely, every invertible matrix can be expressed as the exponential of some matrix. Every unitary matrix can be expressed as the exponential of a skew-Hermitian matrix.
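The assertions of the exercise can be tested with the partial sums of the series (a NumPy sketch, assumed here and not part of the text): for a skew-Hermitian A, the computed exp A satisfies (exp A)^{-1} = exp(−A) and is unitary.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-Hermitian

def expm_series(M, terms=60):
    """Partial sum of the series sum_k M^k / k!."""
    S, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        S += term
    return S

E = expm_series(A)
# (exp A)^{-1} = exp(-A), and exp of a skew-Hermitian matrix is unitary
assert np.allclose(E @ expm_series(-A), np.eye(2))
assert np.allclose(E.conj().T @ E, np.eye(2))
```

(SciPy users would reach for scipy.linalg.expm instead; the truncated series keeps the sketch self-contained.)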
The numerical range or the field of values of an operator A is the subset W(A) of the complex plane defined as

\[ W(A) = \{(x, Ax) : \|x\| = 1\}. \]

Note that

W(UAU*) = W(A) for all U ∈ U(n),
W(aA + bI) = aW(A) + b for all a, b ∈ C.
It is clear that if λ is an eigenvalue of A, then λ is in W(A). It is also clear that W(A) is a closed set. An important property of W(A) is that it is a convex set. This is called the Toeplitz–Hausdorff Theorem; an outline of its proof is given in Problem I.6.2.

Exercise I.2.9 (i) When A is normal, the set W(A) is the convex hull of the eigenvalues of A. For nonnormal matrices, W(A) may be bigger than the convex hull of its eigenvalues. For Hermitian operators, the first statement says that W(A) is the closed interval whose endpoints are the smallest and the largest eigenvalues of A.
(ii) If a unit vector x belongs to the linear span of the eigenspaces corresponding to eigenvalues λ_1, ..., λ_k of a normal operator A, then (x, Ax) lies in the convex hull of λ_1, ..., λ_k. (This fact will be used frequently in Chapter III.)

The number w(A) defined as

\[ w(A) = \sup_{\|x\|=1} |(x, Ax)| \]

is called the numerical radius of A.

Exercise I.2.10 (i) The numerical radius defines a norm on L(H).
(ii) w(UAU*) = w(A) for all U ∈ U(n).
(iii) w(A) ≤ ‖A‖ ≤ 2w(A) for all A.
(iv) w(A) = ‖A‖ if (but not only if) A is normal.
The spectral radius of an operator A is defined as

\[ \mathrm{spr}(A) = \max\{|\lambda| : \lambda \text{ is an eigenvalue of } A\}. \]

We have noted that spr(A) ≤ w(A) ≤ ‖A‖, and that the three are equal if (but not only if) the operator A is normal.
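The chain spr(A) ≤ w(A) ≤ ‖A‖ ≤ 2w(A), and its strictness for nonnormal operators, can be probed numerically. The sketch below (NumPy assumed, not part of the text) estimates w(A) by sampling unit vectors; for the nilpotent matrix used here spr = 0, w = 1/2 and ‖A‖ = 1.

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # a nonnormal nilpotent matrix

spr = max(abs(np.linalg.eigvals(A)))     # spectral radius (= 0 here)
opn = np.linalg.norm(A, 2)               # operator norm (= 1 here)

# crude Monte Carlo estimate of the numerical radius w(A)
rng = np.random.default_rng(2)
w = 0.0
for _ in range(2000):
    x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    x /= np.linalg.norm(x)
    w = max(w, abs(x.conj() @ A @ x))

# spr(A) <= w(A) <= ||A|| <= 2 w(A)  (small slack for the sampling error)
assert spr <= w + 1e-9 and w <= opn + 1e-9 and opn <= 2 * w + 0.05
```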
I.3 Direct Sums
If U, V are vector spaces, their direct sum is the space of columns (u, v)ᵀ with u ∈ U and v ∈ V. This is a vector space with vector operations naturally defined coordinatewise. If H, K are Hilbert spaces, their direct sum is a Hilbert space with inner product defined as

\[ \left( \begin{pmatrix} h \\ k \end{pmatrix}, \begin{pmatrix} h' \\ k' \end{pmatrix} \right) = (h, h')_{\mathcal H} + (k, k')_{\mathcal K}. \]

We will always denote this direct sum as H ⊕ K.

If M and N are orthogonally complementary subspaces of H, then the fact that every vector x in H has a unique representation x = u + v with u ∈ M and v ∈ N implies that H is isomorphic to M ⊕ N. This isomorphism is given by a natural, fixed map. So we say that H = M ⊕ N. When a distinction is necessary we call this an internal direct sum. If M, N are subspaces of H complementary in the algebraic but not in the orthogonal sense, i.e., if M and N are disjoint and their linear span is H, then every vector x in H has a unique decomposition x = u + v as before, but not with orthogonal summands. In this case we write H = M + N and say H is the algebraic direct sum of M and N.

If H = M ⊕ N is an internal direct sum, we may define the injection of M into H as the operator I_M ∈ L(M, H) such that I_M(u) = u for all u ∈ M. Then I_M^* is an element of L(H, M) defined as I_M^* x = Px for all x ∈ H, where P is the orthoprojector onto M. Here one should note that I_M^* is not the same as P because they map into different spaces; that is why their adjoints can be different. Similarly define I_N. Then (I_M, I_N) is an isometry from the ordinary ("external") direct sum M ⊕ N onto H. If H = M ⊕ N and A ∈ L(H), then using this isomorphism we can write A as a block-matrix

\[ A = \begin{pmatrix} B & C \\ D & E \end{pmatrix}, \]

where B ∈ L(M), C ∈ L(N, M), etc. Here, for example, C = I_M^* A I_N. The usual rules of matrix operations hold for block matrices. Adjoints are obtained by taking "conjugate transposes" formally.
If the subspace M is invariant under A, i.e., Ax ∈ M whenever x ∈ M, then in the above block-matrix representation of A we must have D = 0. Indeed, this condition is equivalent to M being invariant. If both M and its orthogonal complement N are invariant under A, we say that M reduces A. In this case, both C and D are 0. We then say that the operator A is the direct sum of B and E and write A = B ⊕ E.
Exercise I.3.1 Let A = A_1 ⊕ A_2. Show that
(i) W(A) is the convex hull of W(A_1) and W(A_2), i.e., the smallest convex set containing W(A_1) ∪ W(A_2).
(ii) ‖A‖ = max(‖A_1‖, ‖A_2‖), spr(A) = max(spr(A_1), spr(A_2)), w(A) = max(w(A_1), w(A_2)).
Direct sums in which each summand H_j is the same space H arise often in practice. Very often, some properties of an operator A on H are reflected in those of some other operators on H ⊕ H. This is illustrated in the following propositions.

Lemma I.3.2 Let A ∈ L(H). Then, the operators

\[ \begin{pmatrix} A & A \\ A & A \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 2A & 0 \\ 0 & 0 \end{pmatrix} \]

are unitarily equivalent in L(H ⊕ H).

Proof. The equivalence is implemented by the unitary operator

\[ U = \frac{1}{\sqrt{2}} \begin{pmatrix} I & I \\ I & -I \end{pmatrix}. \]

Corollary I.3.3 An operator A on H is positive if and only if the operator \(\begin{pmatrix} A & A \\ A & A \end{pmatrix}\) on H ⊕ H is positive.

This can also be seen by writing

\[ \begin{pmatrix} A & A \\ A & A \end{pmatrix} = \begin{pmatrix} A^{1/2} & 0 \\ A^{1/2} & 0 \end{pmatrix} \begin{pmatrix} A^{1/2} & A^{1/2} \\ 0 & 0 \end{pmatrix} \]

and using Exercise I.2.2.

Corollary I.3.4 For every A ∈ L(H) the operator

\[ \begin{pmatrix} |A| & A^* \\ A & |A^*| \end{pmatrix} \]

is positive.

Proof. Let A = UP be the polar decomposition of A. Then,

\[ \begin{pmatrix} |A| & A^* \\ A & |A^*| \end{pmatrix} = \begin{pmatrix} P & PU^* \\ UP & UPU^* \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & U \end{pmatrix} \begin{pmatrix} P & P \\ P & P \end{pmatrix} \begin{pmatrix} I & 0 \\ 0 & U^* \end{pmatrix}. \]

Note that \(\begin{pmatrix} I & 0 \\ 0 & U \end{pmatrix}\) is a unitary operator on H ⊕ H.

Proposition I.3.5 An operator A on H is contractive if and only if the operator

\[ \begin{pmatrix} I & A \\ A^* & I \end{pmatrix} \]

on H ⊕ H is positive.
Proof. If A has the singular value decomposition A = USV*, then

\[ \begin{pmatrix} I & A \\ A^* & I \end{pmatrix} = \begin{pmatrix} U & 0 \\ 0 & V \end{pmatrix} \begin{pmatrix} I & S \\ S & I \end{pmatrix} \begin{pmatrix} U^* & 0 \\ 0 & V^* \end{pmatrix}. \]

Hence \(\begin{pmatrix} I & A \\ A^* & I \end{pmatrix}\) is positive if and only if \(\begin{pmatrix} I & S \\ S & I \end{pmatrix}\) is positive. Also, ‖A‖ = ‖S‖. So we may assume, without loss of generality, that A = S. Now let W be the unitary operator on H ⊕ H that sends the orthonormal basis {e_1, e_2, ..., e_{2n}} to the basis {e_1, e_{n+1}, e_2, e_{n+2}, ..., e_n, e_{2n}}. Then, the unitary conjugation by W transforms the matrix \(\begin{pmatrix} I & S \\ S & I \end{pmatrix}\) to a direct sum of n two-by-two matrices

\[ \begin{pmatrix} 1 & s_j \\ s_j & 1 \end{pmatrix}, \quad 1 \le j \le n. \]

This is positive if and only if each of the summands is positive, which happens if and only if s_j ≤ 1 for all j, i.e., S is a contraction.
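Proposition I.3.5 gives a practical test for contractivity via the positivity of a block matrix. A NumPy sketch (assumed, not part of the text):

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """Check positivity of a Hermitian matrix via its smallest eigenvalue."""
    return np.linalg.eigvalsh(M).min() >= -tol

def block(A):
    """The operator (I  A; A*  I) on H + H."""
    n = A.shape[0]
    I = np.eye(n)
    return np.block([[I, A], [A.conj().T, I]])

contraction = np.array([[0.3, 0.4], [0.0, 0.5]])   # ||A|| <= 1
expansion   = np.array([[2.0, 0.0], [0.0, 0.1]])   # ||A|| > 1

assert np.linalg.norm(contraction, 2) <= 1 and is_psd(block(contraction))
assert np.linalg.norm(expansion, 2) > 1 and not is_psd(block(expansion))
```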
Exercise I.3.6 If A is a contraction, show that

\[ A^*(I - AA^*)^{1/2} = (I - A^*A)^{1/2} A^*. \]

Use this to show that if A is a contraction on H, then the operators

\[ U = \begin{pmatrix} A & (I - AA^*)^{1/2} \\ (I - A^*A)^{1/2} & -A^* \end{pmatrix}, \qquad V = \begin{pmatrix} A & -(I - AA^*)^{1/2} \\ (I - A^*A)^{1/2} & A^* \end{pmatrix} \]

are unitary operators on H ⊕ H.
Exercise I.3.7 For every matrix A, the matrix \(\begin{pmatrix} I & A \\ 0 & I \end{pmatrix}\) is invertible and its inverse is \(\begin{pmatrix} I & -A \\ 0 & I \end{pmatrix}\). Use this to show that if A, B are any two n × n matrices, then

\[ \begin{pmatrix} I & -A \\ 0 & I \end{pmatrix} \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}. \]

This implies that AB and BA have the same eigenvalues. (This last fact can be proved in another way as follows. If B is invertible, then AB = B^{-1}(BA)B. So, AB and BA have the same eigenvalues. Since invertible matrices are dense in the space of all matrices, and a general known fact in complex analysis is that the roots of a polynomial vary continuously with the coefficients, the above conclusion also holds in general.)

Direct sums with more than two summands are defined in the same way. We will denote the direct sum of spaces H_1, ..., H_k as ⊕_{j=1}^k H_j, or simply as ⊕_j H_j.
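The fact that AB and BA have the same eigenvalues is easy to observe numerically (a NumPy sketch, assumed here and not part of the text):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

ev_AB = np.sort_complex(np.linalg.eigvals(A @ B))
ev_BA = np.sort_complex(np.linalg.eigvals(B @ A))

# AB and BA have the same eigenvalues
assert np.allclose(ev_AB, ev_BA)
```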
I.4 Tensor Products
Let V_j, 1 ≤ j ≤ k, be vector spaces. A map F from the Cartesian product V_1 × ··· × V_k to another vector space W is called multilinear if it depends linearly on each of the arguments. When W = C, such maps are called multilinear functionals. When k = 2, the word multilinear is replaced by bilinear. Bilinear maps, thus, are maps F : V_1 × V_2 → W that satisfy the conditions

\[ F(u, av_1 + bv_2) = aF(u, v_1) + bF(u, v_2), \]
\[ F(au_1 + bu_2, v) = aF(u_1, v) + bF(u_2, v), \]

for all a, b ∈ C; u, u_1, u_2 ∈ V_1 and v, v_1, v_2 ∈ V_2. We will be looking most often at the special situation when each V_j is the same vector space. As a special example, consider a Hilbert space H and fix two vectors x, y in it. Then,

\[ F(u, v) = (x, u)(y, v) \]

is a bilinear functional on H. We see from this example that it is equally natural to consider conjugate-multilinear functionals as well. Even more generally, we could study functions that are linear in some variables and conjugate-linear in others. As an example, let A ∈ L(H, K), and for u ∈ K and v ∈ H, let F(u, v) = (u, Av)_K. Then F depends linearly on v and conjugate-linearly on u. Such functionals are called sesquilinear; an inner product is a functional of this sort. The example given above is the "most general" example of a sesquilinear functional: if F(u, v) is any sesquilinear functional on K × H, then there exists a unique operator A ∈ L(H, K) such that F(u, v) = (u, Av). In this sense our first example is not the most general example of a bilinear functional. Bilinear functionals F(u, v) on H that can be expressed as F(u, v) = (x, u)(y, v) for some fixed x, y ∈ H are called elementary. They are special, as the following exercise will show.
Exercise I.4.1 Let x, y, z be linearly independent vectors in H. Find a necessary and sufficient condition that a vector w must satisfy in order that the bilinear functional

\[ F(u, v) = (x, u)(y, v) + (z, u)(w, v) \]

be elementary.

The set of all bilinear functionals is a vector space. The result of this exercise shows that the subset consisting of elementary functionals is not closed under addition. We will soon see that a convenient basis for this vector space can be constructed with elementary functionals as its members. The procedure, called the tensor product construction, starts by taking formal linear combinations of symbols x ⊗ y with x ∈ H, y ∈ K; then
reducing this space modulo suitable equivalence relations; then identifying the resulting space with the space of bilinear functionals.

More precisely, consider all finite sums of the type Σ c_i (x_i ⊗ y_i), c_i ∈ C, x_i ∈ H, y_i ∈ K, and manipulate them formally as linear combinations. In this space the expressions

\[ a(x \otimes y) - (ax \otimes y), \qquad a(x \otimes y) - (x \otimes ay), \]
\[ (x_1 + x_2) \otimes y - (x_1 \otimes y + x_2 \otimes y), \qquad x \otimes (y_1 + y_2) - (x \otimes y_1 + x \otimes y_2) \]

are next defined to be equivalent to 0, for all a ∈ C; x, x_1, x_2 ∈ H and y, y_1, y_2 ∈ K. The set of all linear combinations of expressions x ⊗ y for x ∈ H, y ∈ K, after reduction modulo these equivalences, is called the tensor product of H and K and is denoted as H ⊗ K. Each term c(x ⊗ y) determines a conjugate-bilinear functional F*(u, v) on H × K by the natural rule

\[ F^*(u, v) = c\,(u, x)(v, y). \]

This can be extended to sums of such terms, and the equivalences were chosen in such a way that equivalent expressions (i.e., expressions giving the same element of H ⊗ K) give the same functional. The complex conjugate of each such functional gives a bilinear functional. These ideas can be extended directly to k-linear functionals, including those that are linear in some of the arguments and conjugate-linear in others.
Theorem I.4.2 The space of all bilinear functionals on H is linearly spanned by the elementary ones. If (e_1, ..., e_n) is a fixed orthonormal basis of H, then to every bilinear functional F there correspond unique vectors x_1, ..., x_n such that

\[ F^* = \sum_j e_j \otimes x_j. \]

Every sequence x_j, 1 ≤ j ≤ n, leads to a bilinear functional in this way.

Proof. Let F be a bilinear functional on H. For each j, F*(e_j, v) is a conjugate-linear function of v. Hence there exists a unique vector x_j such that F*(e_j, v) = (v, x_j) for all v. Now, if u = Σ a_j e_j is any vector in H, then F(u, v) = Σ a_j F(e_j, v) = Σ (e_j, u)(x_j, v). In other words, F* = Σ e_j ⊗ x_j as asserted.
A more symmetric form of the above statement is the following:

Corollary I.4.3 If (e_1, ..., e_n) and (f_1, ..., f_n) are two fixed orthonormal bases of H, then every bilinear functional F on H has a unique representation

\[ F = \sum_{i,j} a_{ij} (e_i \otimes f_j)^*. \]

(Most often, the choice (e_1, ..., e_n) = (f_1, ..., f_n) is the convenient one for using the above representations.)

Thus, it is natural to denote the space of conjugate-bilinear functionals on H by H ⊗ H. This is an n²-dimensional vector space. The inner product on this space is defined by putting

\[ (x_1 \otimes y_1, x_2 \otimes y_2) = (x_1, x_2)(y_1, y_2) \]

and then extending this definition to all of H ⊗ H in a natural way. It is easy to verify that this definition is consistent with the equivalences used in defining the tensor product. If (e_1, ..., e_n) and (f_1, ..., f_n) are orthonormal bases in H, then e_i ⊗ f_j, 1 ≤ i, j ≤ n, form an orthonormal basis in H ⊗ H. For the purposes of computation it is useful to order this basis lexicographically: we say that e_i ⊗ f_j precedes e_k ⊗ f_l if and only if either i < k, or i = k and j < l.

Tensor products such as H ⊗ K or K* ⊗ H can be defined by imitating the above procedure. Here the space K* is the space of all conjugate-linear functionals on K. This space is called the dual space of K. There is a natural identification between K and K* via a conjugate-linear, norm-preserving bijection.
Exercise I.4.4 (i) There is a natural isomorphism between the spaces K ⊗ H* and L(H, K) in which the elementary tensor k ⊗ h* corresponds to the linear map that takes a vector u of H to (h, u)k. This linear transformation has rank one, and all rank one transformations can be obtained in this way.
(ii) Give an explicit construction of this isomorphism.
(iii) The space L(H, K) is a Hilbert space with inner product (A, B) = tr A*B. The set E_{ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ n, is an orthonormal basis for this space. Show that the map in (i) carries an orthonormal basis of K ⊗ H* onto this basis.
Corresponding facts about multilinear functionals and tensor products of several spaces are proved in the same way. We will use the notation ⊗^k H for the k-fold tensor product H ⊗ H ⊗ ··· ⊗ H.

Tensor products of linear operators are defined as follows. We first define A ⊗ B on elementary tensors by putting (A ⊗ B)(x ⊗ y) = Ax ⊗ By. We then extend this definition linearly to all linear combinations of elementary tensors, i.e., to all of H ⊗ H. This extension involves no inconsistency.
It is obvious that (A ⊗ B)(C ⊗ D) = AC ⊗ BD, that the identity on H ⊗ H is given by I ⊗ I, and that if A and B are invertible, then so is A ⊗ B and (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}. A one-line verification shows that (A ⊗ B)* = A* ⊗ B*. It follows that A ⊗ B is Hermitian if (but not only if) A and B are Hermitian; A ⊗ B is unitary if (but not only if) A and B are unitary; A ⊗ B is normal if (and only if) A and B are normal. (The trivial cases A = 0 or B = 0 must be excluded for the last assertion to be valid.)

Exercise I.4.5 Suppose it is known that M is an invariant subspace for A. What invariant subspaces for A ⊗ A can be obtained from this information alone?

For operators A, B on different spaces H and K, the tensor product can be defined in the same way as above. This gives an operator A ⊗ B on H ⊗ K. Many of the assertions made earlier for the case H = K remain true in this situation.

Exercise I.4.6 Let A and B be two matrices (not necessarily of the same size). Relative to the lexicographically ordered basis on the space of tensors, the matrix for A ⊗ B can be written in block form as follows: if A = (a_{ij}), then

\[ A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{n1}B & \cdots & a_{nn}B \end{pmatrix}. \]

Especially important are the operators A ⊗ A ⊗ ··· ⊗ A, which are k-fold tensor products of an operator A ∈ L(H). Such a product will be written more briefly as A^{⊗k} or ⊗^k A. This is an operator on the n^k-dimensional space ⊗^k H. Some of the easily proved and frequently used properties of these products are summarised below:

1. (⊗^k A)(⊗^k B) = ⊗^k (AB).

2. (⊗^k A)^{-1} = ⊗^k A^{-1} when either inverse exists.

3. (⊗^k A)* = ⊗^k A*.
4. If A is Hermitian, unitary, normal or positive, then so is ⊗^k A.

5. If a_1, ..., a_k (not necessarily distinct) are eigenvalues of A with eigenvectors u_1, ..., u_k, respectively, then a_1 ··· a_k is an eigenvalue of ⊗^k A and u_1 ⊗ ··· ⊗ u_k is an eigenvector for it.

6. If s_{i_1}, ..., s_{i_k} (not necessarily distinct) are singular values of A, then s_{i_1} ··· s_{i_k} is a singular value of ⊗^k A.

The reader should formulate and prove analogous statements for tensor products A_1 ⊗ A_2 ⊗ ··· ⊗ A_k of different operators.
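The block form of Exercise I.4.6 is exactly what numpy.kron computes, and properties 1 and 5 above can be checked on it (a NumPy sketch, assumed here and not part of the text):

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0]])
B = np.array([[2.0, 0.0], [1.0, 1.0]])

K = np.kron(A, B)   # the block matrix (a_ij B)

# multiplicativity: (A x B)(A x B) = A^2 x B^2,
# a special case of (A x B)(C x D) = AC x BD
assert np.allclose(K @ K, np.kron(A @ A, B @ B))

# eigenvalues of A x B are the products of eigenvalues of A and of B
# (here A has eigenvalues 1, 3 and B has eigenvalues 2, 1)
ev = np.sort(np.linalg.eigvals(K).real)
prods = np.sort([a * b for a in np.linalg.eigvals(A).real
                       for b in np.linalg.eigvals(B).real])
assert np.allclose(ev, prods)
```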
I.5 Symmetry Classes
In the space ⊗^k H there are two especially important subspaces (for nontrivial cases, k > 1 and n > 1).

The antisymmetric tensor product of vectors x_1, ..., x_k in H is defined as

\[ x_1 \wedge \cdots \wedge x_k = (k!)^{-1/2} \sum_{\sigma} \varepsilon_\sigma\, x_{\sigma(1)} \otimes \cdots \otimes x_{\sigma(k)}, \]

where σ runs over all permutations of the k indices and ε_σ is ±1, depending on whether σ is an even or an odd permutation. (ε_σ is called the signature of σ.) The factor (k!)^{-1/2} is chosen so that if the x_j are orthonormal, then x_1 ∧ ··· ∧ x_k is a unit vector. The antisymmetry of this product means that

\[ x_1 \wedge \cdots \wedge x_i \wedge \cdots \wedge x_j \wedge \cdots \wedge x_k = -\, x_1 \wedge \cdots \wedge x_j \wedge \cdots \wedge x_i \wedge \cdots \wedge x_k, \]

i.e., interchanging the position of any two of the factors in the product amounts to a change of sign. In particular, x_1 ∧ ··· ∧ x_k = 0 if any two of the factors are equal.

The span of all antisymmetric tensors x_1 ∧ ··· ∧ x_k in ⊗^k H is denoted by ∧^k H. This is called the kth antisymmetric tensor product (or tensor power) of H.

Given an orthonormal basis (e_1, ..., e_n) in H, there is a standard way of constructing an orthonormal basis in ∧^k H. Let Q_{k,n} denote the set of all strictly increasing k-tuples chosen from {1, 2, ..., n}; i.e., I ∈ Q_{k,n} if and only if I = (i_1, i_2, ..., i_k), where 1 ≤ i_1 < i_2 < ··· < i_k ≤ n. For such an I let e_I = e_{i_1} ∧ ··· ∧ e_{i_k}. Then, {e_I : I ∈ Q_{k,n}} gives an orthonormal basis of ∧^k H. Such I are sometimes called multiindices. It is conventional to order them lexicographically. Note that the cardinality of Q_{k,n}, and hence the dimensionality of ∧^k H, is \(\binom{n}{k}\). If, in particular, k = n, the space ∧^k H is 1-dimensional. This plays a special role later on. When k > n the space ∧^k H is {0}.
Exercise I.5.1 Show that the inner product (x_1 ∧ ··· ∧ x_k, y_1 ∧ ··· ∧ y_k) is equal to the determinant of the k × k matrix ((x_i, y_j)).

The symmetric tensor product of x_1, ..., x_k is defined as

\[ x_1 \vee \cdots \vee x_k = (k!)^{-1/2} \sum_{\sigma} x_{\sigma(1)} \otimes \cdots \otimes x_{\sigma(k)}, \]

where σ, as before, runs over all permutations of the k indices. The linear span of all these vectors comprises the subspace ∨^k H of ⊗^k H. This is called the kth symmetric tensor power of H.

Let G_{k,n} denote the set of all nondecreasing k-tuples chosen from {1, 2, ..., n}; i.e., I ∈ G_{k,n} if and only if I = (i_1, ..., i_k), where 1 ≤ i_1 ≤ ··· ≤ i_k ≤ n. If such an I consists of l distinct indices i_1, i_2, ..., i_l with multiplicities m_1, ..., m_l, respectively, put m(I) = m_1! m_2! ··· m_l!. Given
The elementary tensors x @ · · · @ x, with all factors equal, are all in the subspace v k 1t. Do they span it? Exercise 1.5.3 Let M be a pdimensional subspace of 1t and N its or thogonal complement. Choosing j vectors from M and k  j vectors from N and form ing the linear span of the antisymmetric tensor products of all suc h vectors, we get different sub spaces of 1\ k 'H; for example, one of those is 1\ kM . Determine all the subspaces thus obtained and their dimensionalk ities. Do the same for v ?t. Exercise 1.5.4 If dim 7t == 3, then dim ® 3 7t == 27, dim /\ 3 7l == 1 and dim V 3 7t = 1 0. In terms of an orthonormal basis of 1l, write an element of (A 3 7t EB v3 7t ) j_ . The permanent of a matrix A == ( aij ) is defined as per A == La lu ( l ) . . . anu (n) . Exercise 1.5 . 2
0"
where a varies over all permutations on n symbols. Note t h at in contrast to the determinant , the permanent is not invariant under similarities. Thus, matrices of the same operator relative to different bases may have different permanents. ,
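The definition of the permanent, and its failure to be similarity-invariant, can be illustrated directly (a NumPy sketch, assumed here and not part of the text; the brute-force sum over permutations is only practical for small n):

```python
import numpy as np
from itertools import permutations

def per(A):
    """Permanent: like the determinant but without the signs."""
    n = A.shape[0]
    return sum(np.prod([A[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

A = np.array([[1.0, 2.0], [3.0, 4.0]])
assert np.isclose(per(A), 1 * 4 + 2 * 3)   # ad + bc, versus det = ad - bc

# the permanent is NOT invariant under similarities
S = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.linalg.inv(S) @ A @ S
assert not np.isclose(per(A), per(B))
assert np.isclose(np.linalg.det(A), np.linalg.det(B))   # the determinant is
```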
Exercise I.5.5 Show that the inner product (x_1 ∨ ··· ∨ x_k, y_1 ∨ ··· ∨ y_k) is equal to the permanent of the k × k matrix ((x_i, y_j)).

The spaces ∧^k H and ∨^k H are also referred to as "symmetry classes" of tensors; there are other such classes in ⊗^k H. Another way to look at them is as the ranges of the respective symmetry operators. Define P_∧ and P_∨ as linear operators on ⊗^k H by first defining them on the elementary tensors as

\[ P_\wedge (x_1 \otimes \cdots \otimes x_k) = (k!)^{-1/2}\, x_1 \wedge \cdots \wedge x_k, \]
\[ P_\vee (x_1 \otimes \cdots \otimes x_k) = (k!)^{-1/2}\, x_1 \vee \cdots \vee x_k, \]

and extending them by linearity to the whole space. (Again, it should be verified that this can be done consistently.) The constant factor in the above definitions has been chosen so that both these operators are idempotent. They are also Hermitian. The ranges of these orthoprojectors are ∧^k H and ∨^k H, respectively.
If A ∈ L(H), then Ax_1 ∧ ··· ∧ Ax_k lies in ∧^k H for all x_1, ..., x_k in H. Using this, one sees that the space ∧^k H is invariant under the operator ⊗^k A. The restriction of ⊗^k A to this invariant subspace is denoted by ∧^k A or A^{∧k}. This is called the kth antisymmetric tensor power or the kth Grassmann power of A. We could have also defined it by first defining it on the elementary antisymmetric tensors as

\[ (\wedge^k A)(x_1 \wedge \cdots \wedge x_k) = Ax_1 \wedge \cdots \wedge Ax_k, \]

and then extending it linearly to the span ∧^k H of these tensors.

Exercise I.5.6 Let A be a nilpotent operator. Show how to obtain, from a Jordan basis for A, a Jordan basis for ∧²A.
The space ∨^k H is also invariant under the operator ⊗^k A. The restriction of ⊗^k A to this invariant subspace is written as ∨^k A or A^{∨k} and called the kth symmetric tensor power of A. Some essential and simple properties of these operators are summarised below:

1. (∧^k A)(∧^k B) = ∧^k (AB), (∨^k A)(∨^k B) = ∨^k (AB).

2. (∧^k A)* = ∧^k A*, (∨^k A)* = ∨^k A*.

3. (∧^k A)^{-1} = ∧^k A^{-1}, (∨^k A)^{-1} = ∨^k A^{-1}.

4. If A is Hermitian, unitary, normal or positive, then so are ∧^k A and ∨^k A.

5. If a_1, ..., a_k are eigenvalues of A (not necessarily distinct) belonging to eigenvectors u_1, ..., u_k, respectively, then a_1 ··· a_k is an eigenvalue of ∨^k A belonging to eigenvector u_1 ∨ ··· ∨ u_k; if in addition the vectors u_j are linearly independent, then a_1 ··· a_k is an eigenvalue of ∧^k A belonging to eigenvector u_1 ∧ ··· ∧ u_k.

6. If s_1, ..., s_n are the singular values of A, then the singular values of ∧^k A are s_{i_1} ··· s_{i_k}, where (i_1, ..., i_k) vary over Q_{k,n}; the singular values of ∨^k A are s_{i_1} ··· s_{i_k}, where (i_1, ..., i_k) vary over G_{k,n}.

7. tr ∧^k A is the kth elementary symmetric polynomial in the eigenvalues of A; tr ∨^k A is the kth complete symmetric polynomial in the eigenvalues of A. (These polynomials are defined as follows. Given any n-tuple (a_1, ..., a_n) of numbers or other commuting objects, the kth elementary symmetric polynomial in them is the sum of all terms a_{i_1} a_{i_2} ··· a_{i_k} for (i_1, i_2, ..., i_k) in Q_{k,n}; the kth complete symmetric polynomial is the sum of all terms a_{i_1} a_{i_2} ··· a_{i_k} for (i_1, i_2, ..., i_k) in G_{k,n}.)

For A ∈ L(H), consider the operator A ⊗ I ⊗ ··· ⊗ I + I ⊗ A ⊗ I ⊗ ··· ⊗ I + ··· + I ⊗ I ⊗ ··· ⊗ A. (There are k summands, each of which is a product
of k factors.) The eigenvalues of this operator on ⊗^k H are sums of eigenvalues of A. Both the spaces ∧^k H and ∨^k H are invariant under this operator. One pleasant way to see this is to regard this operator as the t-derivative at t = 0 of ⊗^k (I + tA). The restriction of this operator to the space ∧^k H will be of particular interest to us; we will write this restriction as A^{[k]}. If u_1, ..., u_k are linearly independent eigenvectors of A belonging to eigenvalues a_1, ..., a_k, then u_1 ∧ ··· ∧ u_k is an eigenvector of A^{[k]} belonging to eigenvalue a_1 + ··· + a_k.

Now, fixing an orthonormal basis (e_1, ..., e_n) of H, identify A with its matrix (a_{ij}). We want to find the matrix representations of ∧^k A and ∨^k A relative to the standard bases constructed earlier. The basis of ∧^k H we are using is e_I, I ∈ Q_{k,n}. The (I, J)-entry of ∧^k A is (e_I, (∧^k A)e_J). One may verify that this is equal to a subdeterminant of A. Namely, let A[I|J] denote the k × k matrix obtained from A by expunging all its entries a_{ij} except those for which i ∈ I and j ∈ J. Then, the (I, J)-entry of ∧^k A is equal to det A[I|J].

The special case k = n leads to the 1-dimensional space ∧ⁿH. The operator ∧ⁿA on this space is just the operator of multiplication by the number det A. We can thus think of det A as being equal to ∧ⁿA.

The basis of ∨^k H we are using is m(I)^{-1/2} e_I, I ∈ G_{k,n}. The (I, J)-entry of the matrix ∨^k A can be computed as before, and the result is somewhat similar to that for ∧^k A. For I = (i_1, ..., i_k) and J = (j_1, ..., j_k) in G_{k,n}, let A[I|J] now denote the k × k matrix whose (r, s)-entry is the (i_r, j_s)-entry of A. Since repetitions of indices are allowed in I and J, this is not a submatrix of A this time. One verifies that the (I, J)-entry of ∨^k A is (m(I)m(J))^{-1/2} per A[I|J]. In particular, per A is one of the diagonal entries of ∨ⁿA: the (I, I)-entry for I = (1, 2, ..., n).
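The description of the entries of ∧^k A as subdeterminants det A[I|J] can be implemented directly; the result is classically called the kth compound matrix. A NumPy sketch (assumed, not part of the text) builds it and verifies properties 5 and 7 above on a diagonal example:

```python
import numpy as np
from itertools import combinations

def compound(A, k):
    """k-th antisymmetric power: entries are det A[I|J] for I, J in Q_{k,n}."""
    n = A.shape[0]
    idx = list(combinations(range(n), k))   # Q_{k,n}, lexicographic
    return np.array([[np.linalg.det(A[np.ix_(I, J)]) for J in idx]
                     for I in idx])

A = np.diag([1.0, 2.0, 3.0])
C2 = compound(A, 2)

# eigenvalues of /\^2 A are products of pairs of eigenvalues: 1*2, 1*3, 2*3
assert np.allclose(np.sort(np.diag(C2)), [2.0, 3.0, 6.0])

# tr /\^2 A is the 2nd elementary symmetric polynomial in the eigenvalues
assert np.isclose(np.trace(C2), 1*2 + 1*3 + 2*3)

# /\^n A is multiplication by det A
assert np.isclose(compound(A, 3)[0, 0], np.linalg.det(A))
```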
Exercise I.5.7 Prove that for any vectors u_1, ..., u_k, v_1, ..., v_k we have

\[ |\det((u_i, v_j))|^2 \le \det((u_i, u_j)) \, \det((v_i, v_j)), \]
\[ |\mathrm{per}((u_i, v_j))|^2 \le \mathrm{per}((u_i, u_j)) \, \mathrm{per}((v_i, v_j)). \]

Exercise I.5.8 Prove that for any two matrices A, B we have

\[ |\mathrm{per}(AB)|^2 \le \mathrm{per}(AA^*) \, \mathrm{per}(B^*B). \]

(The corresponding relation for determinants is an easy equality.)
Exercise I.5.9 (Schur's Theorem) If A is positive, then per A ≥ det A. (Hint: Using Exercise I.2.2 write A = T*T for an upper triangular T. Then use the preceding exercise cleverly.)
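Schur's Theorem can be spot-checked on random positive matrices (a NumPy sketch, assumed here and not part of the text; per is the brute-force permanent, practical only for small n):

```python
import numpy as np
from itertools import permutations

def per(A):
    """Brute-force permanent of a square matrix."""
    n = A.shape[0]
    return sum(np.prod([A[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

# a positive matrix, built in the form B*B of Exercise I.2.2
rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
A = B.T @ B

# Schur's Theorem: per A >= det A for positive A
assert per(A) >= np.linalg.det(A) - 1e-9
```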
We have observed earlier that for any vectors x_1, ..., x_k in H we have

\[ \|x_1 \wedge \cdots \wedge x_k\|^2 = \det((x_i, x_j)). \]

When H = Rⁿ, this determinant is also the square of the k-dimensional volume of the parallelepiped having x_1, ..., x_k as its sides. To see this, note that neither the determinant nor the volume in question is altered if we add to any of these vectors a linear combination of the others. Performing such operations successively, we can reach an orthogonal set of vectors, some of which might be zero. In this case it is obvious that the determinant is equal to the square of the volume; hence that was true initially too.

Given any k-tuple X = (x_1, ..., x_k), the matrix ((x_i, x_j)) = X*X is called the Gram matrix of the vectors x_j; its determinant is called their Gram determinant.
1.6
Problems
, u11 ) , not necessarily ort honor Given a basis U = ( u 1 , vn ) ? mal, in 'H, how would you compute the biorthogonal basis ( v1 1 Find a formula that expresses ( vj , x) for each x E 1t and j = 1 , 2, . . . , k in terms of Gram matrices .
P roblem 1 .6. 1 .
, • • •
• • •
Problem I.6.2. A proof of the Toeplitz–Hausdorff Theorem is outlined below. Fill in the details.

Note that W(A) = {(x, Ax) : ‖x‖ = 1} = {tr Axx* : x*x = 1}. It is enough to consider the special case dim H = 2. In higher dimensions, this special case can be used to show that if x, y are any two vectors, then any point on the line segment joining (x, Ax) and (y, Ay) can be represented as (z, Az), where z is a vector in the linear span of x and y.

Now, on the space of 2 × 2 Hermitian matrices, consider the linear map Φ(T) = tr AT. This is a real linear map from a space of 4 real dimensions (the 2 × 2 Hermitian matrices) to a space of 2 real dimensions (the complex plane). We want to prove that the image under Φ of the set of rank-one projections xx*, x*x = 1, is convex. Every such projection can be written as

\[ \begin{pmatrix} \cos t \\ e^{i\omega} \sin t \end{pmatrix} \begin{pmatrix} \cos t & e^{-i\omega} \sin t \end{pmatrix} = \frac{1}{2} I + \frac{1}{2} \begin{pmatrix} \cos 2t & e^{-i\omega} \sin 2t \\ e^{i\omega} \sin 2t & -\cos 2t \end{pmatrix}. \]

As t and ω vary, this set is a 2-sphere centred at ½I and having radius 1/√2 in the Frobenius norm. The image of a 2-sphere under a linear map with range in R² must be either an ellipse with interior, or a line segment, or a point; in any case, a convex set.
Problem I.6.3. By the remarks in Section 5, vectors x_1, ..., x_k are linearly dependent if and only if x_1 ∧ ··· ∧ x_k = 0. This relationship between linear dependence and the antisymmetric tensor product goes further. Two sets {x_1, ..., x_k} and {y_1, ..., y_k} of linearly independent vectors have the same linear span if and only if x_1 ∧ ··· ∧ x_k = c y_1 ∧ ··· ∧ y_k for some constant c. Thus, there is a one-to-one correspondence between k-dimensional subspaces of a vector space W and 1-dimensional subspaces of ∧ᵏW generated by elementary tensors x_1 ∧ ··· ∧ x_k.
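The correspondence in Problem I.6.3 can be checked in coordinates: for k = 2 in ℝ³, the components of x_1 ∧ x_2 in the standard basis of ∧²ℝ³ are the three 2 × 2 minors of the matrix with rows x_1, x_2, and two independent pairs span the same plane exactly when these component vectors are proportional. A small sketch under those assumptions (the helper name is ours):

```python
# Components of x ^ y in the basis e1^e2, e1^e3, e2^e3 of /\^2 R^3:
# these are the 2x2 minors of the 2x3 matrix with rows x and y.
def wedge2(x, y):
    return (x[0]*y[1] - x[1]*y[0],
            x[0]*y[2] - x[2]*y[0],
            x[1]*y[2] - x[2]*y[1])

x1, x2 = (1, 0, 1), (0, 1, 1)
y1, y2 = (1, 1, 2), (1, -1, 0)    # y1 = x1 + x2, y2 = x1 - x2: same span
w, v = wedge2(x1, x2), wedge2(y1, y2)

# w and v are proportional (here v = -2 w), as the problem predicts
assert all(w[i] * v[j] == w[j] * v[i] for i in range(3) for j in range(3))
```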
Problem I.6.4. How large must dim W be in order that there exist some element of ∧²W which is not elementary?

Problem I.6.5. Every vector w of W induces a linear operator T_w from ∧ᵏW to ∧ᵏ⁺¹W as follows. T_w is defined on elementary tensors as T_w(v_1 ∧ ··· ∧ v_k) = v_1 ∧ ··· ∧ v_k ∧ w, and then extended linearly to all of ∧ᵏW. It is, then, natural to write T_w(x) = x ∧ w for any x ∈ ∧ᵏW. Show that a nonzero vector x in ∧ᵏW is elementary if and only if the space {w ∈ W : x ∧ w = 0} is k-dimensional. (When W is a Hilbert space, the operators T_w are called creation operators and their adjoints are called annihilation operators in the physics literature.)
Problem I.6.6. (The n-dimensional Pythagorean Theorem) Let x_1, ..., x_n be orthogonal vectors in ℝⁿ. Consider the n-dimensional simplex S with vertices 0, x_1, ..., x_n. Think of the (n − 1)-dimensional simplex with vertices x_1, ..., x_n as the "hypotenuse" of S and the remaining (n − 1)-dimensional faces of S as its "legs". By the remarks in Section 5, the k-dimensional volume of the simplex formed by any k points y_1, ..., y_k together with the origin is (k!)⁻¹ ‖y_1 ∧ ··· ∧ y_k‖. The volume of a simplex not having 0 as a vertex can be found by translating it. Use this to prove that the square of the volume of the hypotenuse of S is the sum of the squares of the volumes of the n legs.
Problem I.6.7. (i) Let Q_∧ be the inclusion map from ∧ᵏ𝓗 into ⊗ᵏ𝓗 (so that Q_∧* equals the projection P_∧ defined earlier) and let Q_∨ be the inclusion map from ∨ᵏ𝓗 into ⊗ᵏ𝓗. Then, for any A ∈ ℒ(𝓗),

\[ \wedge^k A = P_{\wedge} (\otimes^k A) Q_{\wedge}, \qquad \vee^k A = P_{\vee} (\otimes^k A) Q_{\vee}. \]

(ii) ‖∧ᵏA‖ ≤ ‖A‖ᵏ and ‖∨ᵏA‖ ≤ ‖A‖ᵏ.

(iii) |det A| ≤ ‖A‖ⁿ and |per A| ≤ ‖A‖ⁿ.
I. A Review of Linear Algebra
Problem I.6.8. For an invertible operator A obtain a relationship between A⁻¹, ∧ⁿA, and ∧ⁿ⁻¹A.

Problem I.6.9. (i) Let {e_1, ..., e_n} and {f_1, ..., f_n} be two orthonormal bases in 𝓗. Show that

(ii) Let P and Q be orthogonal projections in 𝓗, each of rank n − 1. Let x, y be unit vectors such that Px = Qy = 0. Show that
Problem I.6.10. If the characteristic polynomial of A is written as

\[ t^n + a_1 t^{n-1} + \cdots + a_n, \]

then the coefficient a_k is equal to (−1)ᵏ tr ∧ᵏA; up to this sign, it is the sum of all k × k principal minors of A.

Problem I.6.11. (i) For any A, B ∈ ℒ(𝓗) we have

\[ \otimes^k A - \otimes^k B = \sum_{j=1}^{k} C_j, \]

where C_j = A ⊗ ··· ⊗ A ⊗ (A − B) ⊗ B ⊗ ··· ⊗ B, the factor A − B occurring in the jth position. Hence,

\[ \| \otimes^k A - \otimes^k B \| \le k M^{k-1} \| A - B \|, \quad \text{where } M = \max( \|A\|, \|B\| ). \]

(ii) The norms of ∧ᵏA − ∧ᵏB and ∨ᵏA − ∨ᵏB are therefore also bounded by kM^{k−1}‖A − B‖.

(iii) For n × n matrices A, B,

\[ |\det A - \det B| \le n M^{n-1} \|A - B\|, \qquad |\operatorname{per} A - \operatorname{per} B| \le n M^{n-1} \|A - B\|. \]

(iv) The example A = aI, B = (a + ε)I for small ε shows that these inequalities are sometimes sharp. When ‖A‖ and ‖B‖ are far apart, find a simple improvement on them.

(v) If A, B are n × n matrices with characteristic polynomials

\[ t^n + a_1 t^{n-1} + \cdots + a_n \quad \text{and} \quad t^n + b_1 t^{n-1} + \cdots + b_n, \]

respectively, then

\[ |a_k - b_k| \le k \binom{n}{k} M^{k-1} \|A - B\|, \quad \text{where } M = \max( \|A\|, \|B\| ). \]

Problem I.6.12. Let A, B be positive operators with A ≥ B (i.e., A − B is positive). Show that

\[ \otimes^k A \ge \otimes^k B, \quad \wedge^k A \ge \wedge^k B, \quad \vee^k A \ge \vee^k B, \quad \det A \ge \det B, \quad \operatorname{per} A \ge \operatorname{per} B. \]
Problem I.6.13. The Schur product or the Hadamard product of two matrices A and B is defined to be the matrix A ∘ B whose (i, j)-entry is a_ij b_ij. Show that this is a principal submatrix of A ⊗ B and derive from this fact two significant properties:

(i) ‖A ∘ B‖ ≤ ‖A‖ ‖B‖ for all A, B.

(ii) If A, B are positive, then so is A ∘ B. (This is called Schur's Theorem.)

Problem I.6.14. (i) Let A = (a_ij) be an n × n positive matrix. Let

\[ r_i = \sum_{j=1}^{n} a_{ij}, \quad 1 \le i \le n, \qquad s = \sum_{i,j} a_{ij}. \]

Show that

\[ s^n \operatorname{per} A \ge n! \prod_{i=1}^{n} |r_i|^2 . \]

[Hint: Represent A as the Gram matrix of some vectors x_1, ..., x_n as in Exercise I.5.10. Let u = s^{−1/2}(x_1 + ··· + x_n). Consider the vectors u ∨ u ∨ ··· ∨ u and x_1 ∨ ··· ∨ x_n, and use the Cauchy–Schwarz inequality.]

(ii) Show that equality holds in the above inequality if and only if either A has rank 1 or A has a row of zeroes.

(iii) If, in addition, all a_ij are nonnegative and all r_i = 1 (so that the matrix A is doubly stochastic as well as positive semidefinite), then

\[ \operatorname{per} A \ge \frac{n!}{n^n} . \]
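The inequality of Problem I.6.14 (i), and the doubly stochastic case (iii), can be tested numerically on small matrices; the permanent is computed here by brute force over permutations (an illustrative sketch, not the book's code):

```python
import math
from itertools import permutations

def per(A):
    # permanent by brute force; fine for small n
    n = len(A)
    total = 0.0
    for sigma in permutations(range(n)):
        p = 1.0
        for i in range(n):
            p *= A[i][sigma[i]]
        total += p
    return total

A = [[2.0, 1.0], [1.0, 2.0]]                 # a positive matrix
n = len(A)
r = [sum(row) for row in A]                  # row sums r_i
s = sum(r)                                   # s = sum of all entries
assert s**n * per(A) >= math.factorial(n) * math.prod(ri**2 for ri in r)

# part (iii): doubly stochastic and positive semidefinite, per A >= n!/n^n
D = [[0.5, 0.5], [0.5, 0.5]]
assert per(D) >= math.factorial(2) / 2**2 - 1e-12
```

For this D the bound is attained with equality, matching the equality case a_ij = 1/n discussed after part (iii).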
Here equality holds if and only if a_ij = 1/n for all i, j.

Problem I.6.15. Let A be Hermitian with eigenvalues α_1 ≥ α_2 ≥ ··· ≥ α_n. In Exercise I.2.7 we noted that

\[ \alpha_1 = \max \{ \langle x, Ax \rangle : \|x\| = 1 \}, \qquad \alpha_n = \min \{ \langle x, Ax \rangle : \|x\| = 1 \}. \]

Using these relations and tensor products, we can deduce some other extremal representations:

(i) For every k = 1, 2, ..., n,

\[ \sum_{j=1}^{k} \alpha_j = \max \sum_{j=1}^{k} \langle x_j, A x_j \rangle, \qquad \sum_{j=n-k+1}^{n} \alpha_j = \min \sum_{j=1}^{k} \langle x_j, A x_j \rangle, \]

where the maximum and the minimum are taken over all choices of orthonormal k-tuples (x_1, ..., x_k) in 𝓗. The first statement is referred to as Ky Fan's Maximum Principle. It will reappear in Chapter II (with a different proof) and subsequently.

(ii) If A is positive, then for every k = 1, 2, ..., n,

\[ \prod_{j=n-k+1}^{n} \alpha_j = \min \prod_{j=1}^{k} \langle x_j, A x_j \rangle, \]

where the minimum is taken over all choices of orthonormal k-tuples (x_1, ..., x_k) in 𝓗. [Hint: You may need to use the Hadamard Determinant Theorem, which says that the determinant of a positive matrix is bounded above by the product of its diagonal entries. This is also proved in Chapter II.]

(iii) If A is positive, then for every I ∈ Q_{k,n},

\[ \prod_{j=n-k+1}^{n} \alpha_j \le \det A[I] \le \prod_{j=1}^{k} \alpha_j . \]
Problem I.6.16. Let A be any n × n matrix with eigenvalues α_1, ..., α_n. Show that

\[ \left| \alpha_j - \frac{\operatorname{tr} A}{n} \right| \le \left( \frac{n-1}{n} \right)^{1/2} \left( \|A\|_2^2 - \frac{|\operatorname{tr} A|^2}{n} \right)^{1/2} \]

for all j = 1, 2, ..., n. (Results such as this are interesting because they give some information about the location of the eigenvalues of a matrix in terms of more easily computable functions like the Frobenius norm ‖A‖₂ and the trace. We will see several such statements later.)

[Hint: First prove that if x = (x_1, ..., x_n) is a vector with x_1 + ··· + x_n = 0, then

\[ \max_j |x_j| \le \left( \frac{n-1}{n} \right)^{1/2} \|x\| . ] \]
Problem I.6.17. (i) Let z_1, z_2, z_3 be three points on the unit circle. Then the numerical range of an operator A is contained in the triangle with vertices z_1, z_2, z_3 if and only if A can be expressed as A = z_1A_1 + z_2A_2 + z_3A_3, where A_1, A_2, A_3 are positive operators with A_1 + A_2 + A_3 = I.

[Hint: It is easy to see that if A is a sum of this form, then W(A) is contained in the given triangle. The converse needs some work to prove. Let z be any point in the given triangle. Then one can find a_1, a_2, a_3 such that a_j ≥ 0, a_1 + a_2 + a_3 = 1, and z = a_1z_1 + a_2z_2 + a_3z_3. These are the "barycentric coordinates" of z and can be obtained as follows. Let γ = Im(z̄_1z_2 + z̄_2z_3 + z̄_3z_1). Then, for j = 1, 2, 3,

\[ a_j = \frac{1}{\gamma} \operatorname{Im} \Bigl( \overline{(z - z_{j+1})} \, (z_{j+2} - z_{j+1}) \Bigr), \]

where the subscript indices are counted modulo 3. Put

\[ A_j = \frac{1}{\gamma} \operatorname{Im} \Bigl( (A - z_{j+1} I)^* (z_{j+2} - z_{j+1}) \Bigr). \]

Then the A_j have the required properties.]

(ii) Let W(A) be contained in a triangle with vertices z_1, z_2, z_3 lying on the unit circle. Then, choosing A_1, A_2, A_3 as above, write

\[ I - A^* A = \sum_{j=1}^{3} (I - \bar{z}_j A)^* A_j (I - \bar{z}_j A). \]

This, being a sum of three positive matrices, is positive. Hence, by Proposition I.3.5, A is a contraction.

(iii) If W(A) is contained in a triangle with vertices z_1, z_2, z_3, then ‖A‖ ≤ max_j |z_j|. This is Mirman's Theorem.

Problem I.6.18. If an operator T has the Cartesian decomposition T = A + iB with A and B positive, then

\[ \|T\|^2 \le \|A\|^2 + \|B\|^2 . \]

Show that, if A or B is not positive, then this need not be true. [Hint: To prove the above inequality note that W(T) is contained in a rectangle in the first quadrant. Find a suitable triangle that contains it and use Mirman's Theorem.]
I.7 Notes and References
Standard references on linear algebra and matrix theory include P.R. Halmos, Finite-Dimensional Vector Spaces, Van Nostrand, 1958; F.R. Gantmacher, Matrix Theory, 2 volumes, Chelsea, 1959; and K. Hoffman and R. Kunze, Linear Algebra, 2nd ed., Prentice Hall, 1971. A recent work is R.A. Horn and C.R. Johnson, in two volumes, Matrix Analysis and Topics in Matrix Analysis, Cambridge University Press, 1985 and 1990. For more of multilinear algebra, see W. Greub, Multilinear Algebra, 2nd ed., Springer-Verlag, 1978, and M. Marcus, Finite-Dimensional Multilinear Algebra, 2 volumes, Marcel Dekker, 1973 and 1975. A brief treatment that covers all the basic results may be found in M. Marcus and H. Minc, A Survey of Matrix Theory and Matrix Inequalities, Prindle, Weber and Schmidt, 1964, reprinted by Dover in 1992.

Though not as important as the determinant, the permanent of a matrix is an interesting object with many uses in combinatorics, geometry, and physics. A book devoted entirely to it is H. Minc, Permanents, Addison-Wesley, 1978.

Apart from the symmetric and the antisymmetric tensors, there are other symmetry classes of tensors. Their study is related to the glorious subject of representations of finite groups. See J.-P. Serre, Linear Representations of Finite Groups, Springer-Verlag, 1977.

The result in Exercise I.3.6 is due to P.R. Halmos, and is the beginning of a subject called Dilation Theory. See Chapter 23 of P.R. Halmos, A Hilbert Space Problem Book, 2nd ed., Springer-Verlag, 1982.

The proof of the Toeplitz–Hausdorff Theorem in Problem I.6.2 is taken from C. Davis, The Toeplitz–Hausdorff theorem explained, Canad. Math. Bull., 14 (1971) 245–246. For a different proof, see P.R. Halmos, A Hilbert Space Problem Book.

For relations between Grassmann spaces and geometry, as indicated in Problem I.6.3, see, for example, I.R. Porteous, Topological Geometry, Cambridge University Press, 1981.

The simple proof of the Pythagorean Theorem in Problem I.6.6 is due to S. Ramanan. Among the several papers in quantum physics where ideas very close to those in Problems I.6.3 and I.6.5 are used effectively is one by N.M. Hugenholtz and R.V. Kadison, Automorphisms and quasi-free states of the CAR algebra, Commun. Math. Phys., 43 (1975) 181–197.

Inequalities like the ones in Problem I.6.11 were first discovered in connection with perturbation theory of eigenvalues. This is summarised in R. Bhatia, Perturbation Bounds for Matrix Eigenvalues, Longman, 1987. The simple identity at the beginning of Problem I.6.11 was first used in this context in R. Bhatia and L. Elsner, On the variation of permanents, Linear and Multilinear Algebra, 27 (1990) 105–110.

The results and the ideas of Problem I.6.14 are from M. Marcus and M. Newman, Inequalities for the permanent function, Ann. of Math., 75 (1962)
47–62. In 1926, B.L. van der Waerden had conjectured that the inequality in part (iii) of Problem I.6.14 would hold for all doubly stochastic matrices. This conjecture was proved, in two separate papers in 1981, by G.P. Egorychev and D. Falikman. An expository account is given in J.H. van Lint, The van der Waerden conjecture: two proofs in one year, Math. Intelligencer, 4 (1982) 72–77.

The results of Problem I.6.15 are all due to Ky Fan, On a theorem of Weyl concerning eigenvalues of linear transformations I, II, Proc. Nat. Acad. Sci. U.S.A., 35 (1949) 652–655, 36 (1950) 31–35, and A minimum property of the eigenvalues of a Hermitian transformation, Amer. Math. Monthly, 60 (1953) 48–50.

A special case of the inequality of Problem I.6.16 occurs in P. Tarazaga, Eigenvalue estimates for symmetric matrices, Linear Algebra and Appl., 135 (1990) 171–179.

Mirman's Theorem is proved in B.A. Mirman, Numerical range and norm of a linear operator, Trudy Seminara po Funkcional'nomu Analizu, No. 10 (1968), pp. 51–55. The inequality of Problem I.6.18 is also noted there as a corollary. Our proof of Mirman's Theorem is taken from Y. Nakamura, Numerical range and norm, Math. Japonica, 27 (1982) 149–150.
II Majorisation and Doubly Stochastic Matrices

Comparison of two vector quantities often leads to interesting inequalities that can be expressed succinctly as majorisation relations. There is an intimate relation between majorisation and doubly stochastic matrices. These topics are studied in detail here. We place special emphasis on majorisation relations between the eigenvalue n-tuples of two matrices. This will be a recurrent theme in the book.

II.1 Basic Notions
Let x = (x_1, ..., x_n) be an element of ℝⁿ. Let x↓ and x↑ be the vectors obtained by rearranging the coordinates of x in the decreasing and the increasing orders, respectively. Thus, if x↓ = (x_1↓, ..., x_n↓), then x_1↓ ≥ ··· ≥ x_n↓. Similarly, if x↑ = (x_1↑, ..., x_n↑), then x_1↑ ≤ ··· ≤ x_n↑. Note that

\[ x_j^{\uparrow} = x_{n-j+1}^{\downarrow}, \quad 1 \le j \le n. \tag{II.1} \]

Let x, y ∈ ℝⁿ. We say that x is majorised by y, in symbols x ≺ y, if

\[ \sum_{j=1}^{k} x_j^{\downarrow} \le \sum_{j=1}^{k} y_j^{\downarrow}, \quad 1 \le k \le n, \tag{II.2} \]

and

\[ \sum_{j=1}^{n} x_j = \sum_{j=1}^{n} y_j . \tag{II.3} \]
Example: If x_i ≥ 0 and Σx_i = 1, then

\[ \Bigl( \tfrac{1}{n}, \ldots, \tfrac{1}{n} \Bigr) \prec (x_1, \ldots, x_n) \prec (1, 0, \ldots, 0). \]
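The two relations in this example are easy to verify mechanically from conditions (II.2) and (II.3); a short sketch (the helper function is ours, not the book's):

```python
def majorised(x, y, tol=1e-12):
    # x majorised by y: compare partial sums of decreasingly sorted vectors
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    px = py = 0.0
    for a, b in zip(xs, ys):
        px, py = px + a, py + b
        if px > py + tol:
            return False
    return abs(px - py) < tol          # the totals must agree

x = (0.2, 0.3, 0.5)                    # x_i >= 0 with sum 1
n = len(x)
assert majorised([1.0 / n] * n, x)
assert majorised(x, [1.0, 0.0, 0.0])
assert not majorised([1.0, 0.0, 0.0], x)
```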
The notion of majorisation occurs naturally in various contexts. For example, in physics, the relation x ≺ y is interpreted to mean that the vector x describes a "more chaotic" state than y. (Think of x_i as the probability of a system being found in state i.) Another example occurs in economics. If x_1, ..., x_n and y_1, ..., y_n denote incomes of individuals 1, 2, ..., n, then x ≺ y would mean that there is a more equal distribution of incomes in the state x than in y. The above example illustrates this.

From (II.1) we have

\[ \sum_{j=1}^{k} x_j^{\uparrow} = \sum_{j=1}^{n} x_j - \sum_{j=1}^{n-k} x_j^{\downarrow} . \]

Hence x ≺ y if and only if

\[ \sum_{j=1}^{k} x_j^{\uparrow} \ge \sum_{j=1}^{k} y_j^{\uparrow}, \quad 1 \le k \le n, \tag{II.4} \]

and

\[ \sum_{j=1}^{n} x_j = \sum_{j=1}^{n} y_j . \tag{II.5} \]
Let e denote the vector (1, 1, ..., 1), and for any subset I of {1, 2, ..., n} let e_I denote the vector whose jth component is 1 if j ∈ I and 0 if j ∉ I. Given a vector x ∈ ℝⁿ, let

\[ \operatorname{tr} x = \sum_{j=1}^{n} x_j = \langle x, e \rangle, \]

where ⟨·, ·⟩ denotes the inner product in ℝⁿ. Note that

\[ \sum_{j=1}^{k} x_j^{\downarrow} = \max_{|I| = k} \langle x, e_I \rangle, \]

where |I| stands for the number of elements in the set I. So, x ≺ y if and only if for each subset I of {1, 2, ..., n} there exists a subset J with |I| = |J| such that

\[ \langle x, e_I \rangle \le \langle y, e_J \rangle \tag{II.6} \]

and

\[ \operatorname{tr} x = \operatorname{tr} y . \tag{II.7} \]
We say that x is (weakly) submajorised by y, in symbols x ≺_w y, if condition (II.2) is fulfilled. Note that in the absence of (II.3), the conditions (II.2) and (II.4) are not equivalent. We say that x is (weakly) supermajorised by y, in symbols x ≺^w y, if condition (II.4) is fulfilled.

Exercise II.1.1
(i) x ≺ y ⟺ x ≺_w y and x ≺^w y.
(ii) If a is a positive real number, then x ≺_w y ⟹ ax ≺_w ay, and x ≺^w y ⟹ ax ≺^w ay.
(iii) x ≺_w y ⟺ −x ≺^w −y.
(iv) For any real number a, x ≺ y ⟹ ax ≺ ay.
Remark II.1.2 The relations ≺, ≺_w, and ≺^w are all reflexive and transitive. None of them, however, is a partial order. For example, if x ≺ y and y ≺ x, we can only conclude that x = Py, where P is a permutation matrix. If we say that x ∼ y whenever x = Py for some permutation matrix P, then ∼ defines an equivalence relation on ℝⁿ. If we denote by ℝⁿ_sym the resulting quotient space, then ≺ defines a partial order on this space. This relation is also a partial order on the set {x ∈ ℝⁿ : x_1 ≥ ··· ≥ x_n}. These statements are true for the relations ≺_w and ≺^w as well.

For a, b ∈ ℝ, let a ∨ b = max(a, b) and a ∧ b = min(a, b). For x, y ∈ ℝⁿ, define

\[ x \vee y = (x_1 \vee y_1, \ldots, x_n \vee y_n), \qquad x \wedge y = (x_1 \wedge y_1, \ldots, x_n \wedge y_n). \]

Let

\[ x^+ = x \vee 0, \qquad |x| = x \vee (-x). \]

In other words, x⁺ is the vector obtained from x by replacing the negative coordinates by zeroes, and |x| is the vector obtained by taking the absolute values of all coordinates.

With these notations we can prove the following characterisation of majorisation that does not involve rearrangements:

Theorem II.1.3 Let x, y ∈ ℝⁿ. Then,

(i) x ≺_w y if and only if for all t ∈ ℝ

\[ \sum_{j=1}^{n} (x_j - t)^+ \le \sum_{j=1}^{n} (y_j - t)^+ . \tag{II.8} \]

(ii) x ≺^w y if and only if for all t ∈ ℝ

\[ \sum_{j=1}^{n} (t - x_j)^+ \le \sum_{j=1}^{n} (t - y_j)^+ . \tag{II.9} \]

(iii) x ≺ y if and only if for all t ∈ ℝ

\[ \sum_{j=1}^{n} |x_j - t| \le \sum_{j=1}^{n} |y_j - t| . \tag{II.10} \]
Proof. Let x ≺_w y. If t ≥ x_1↓, then (x_j − t)⁺ = 0 for all j, and hence (II.8) holds. Let x_{k+1}↓ ≤ t ≤ x_k↓ for some 1 ≤ k ≤ n, where, for convenience, x_{n+1}↓ = −∞. Then

\[ \sum_{j=1}^{n} (x_j - t)^+ = \sum_{j=1}^{k} (x_j^{\downarrow} - t) = \sum_{j=1}^{k} x_j^{\downarrow} - kt \le \sum_{j=1}^{k} y_j^{\downarrow} - kt \le \sum_{j=1}^{n} (y_j - t)^+, \]

and, hence, (II.8) holds. To prove the converse, note that if t = y_k↓, then

\[ \sum_{j=1}^{n} (y_j - t)^+ = \sum_{j=1}^{k} (y_j^{\downarrow} - t) = \sum_{j=1}^{k} y_j^{\downarrow} - kt . \]

But

\[ \sum_{j=1}^{k} x_j^{\downarrow} - kt = \sum_{j=1}^{k} (x_j^{\downarrow} - t) \le \sum_{j=1}^{n} (x_j - t)^+ . \]

So, if (II.8) holds, then we must have

\[ \sum_{j=1}^{k} x_j^{\downarrow} \le \sum_{j=1}^{k} y_j^{\downarrow}, \]

i.e., x ≺_w y. This proves (i). The statements (ii) and (iii) have similar proofs. ∎
Corollary II.1.4 If x ≺ y in ℝⁿ and u ≺ w in ℝᵐ, then (x, u) ≺ (y, w) in ℝⁿ⁺ᵐ. In particular, x ≺ y if and only if (x, u) ≺ (y, u) for all u.
An n × n matrix A = (a_ij) is called doubly stochastic if

\[ a_{ij} \ge 0 \quad \text{for all } i, j, \tag{II.11} \]

\[ \sum_{i=1}^{n} a_{ij} = 1 \quad \text{for all } j, \tag{II.12} \]

\[ \sum_{j=1}^{n} a_{ij} = 1 \quad \text{for all } i. \tag{II.13} \]
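Conditions (II.11)–(II.13) translate directly into a checker; a small sketch (the function name is ours):

```python
# Sketch: test conditions (II.11)-(II.13) entrywise.
def is_doubly_stochastic(A, tol=1e-12):
    n = len(A)
    if any(a < -tol for row in A for a in row):
        return False
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in A)
    cols_ok = all(abs(sum(A[i][j] for i in range(n)) - 1.0) < tol
                  for j in range(n))
    return rows_ok and cols_ok

P = [[0.0, 1.0], [1.0, 0.0]]            # a permutation matrix
M = [[0.5, 0.5], [0.5, 0.5]]
assert is_doubly_stochastic(P) and is_doubly_stochastic(M)
assert not is_doubly_stochastic([[1.0, 0.5], [0.0, 0.5]])
```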
Exercise II.1.5 A linear map A on ℂⁿ is called positivity-preserving if it carries vectors with nonnegative coordinates to vectors with nonnegative coordinates. It is called trace-preserving if tr Ax = tr x for all x. It is called unital if Ae = e. Show that a matrix A is doubly stochastic if and only if the linear operator A is positivity-preserving, trace-preserving, and unital. Show that A is trace-preserving if and only if its adjoint A* is unital.

Exercise II.1.6 (i) The class of n × n doubly stochastic matrices is a convex set and is closed under multiplication and the adjoint operation. It is, however, not a group. (ii) Every permutation matrix is doubly stochastic and is an extreme point of the convex set of all doubly stochastic matrices. (Later we will prove Birkhoff's Theorem, which says that all extreme points of this convex set are permutation matrices.)

Exercise II.1.7 Let A be a doubly stochastic matrix. Show that all eigenvalues of A have modulus less than or equal to 1, that 1 is an eigenvalue of A, and that ‖A‖ = 1.

Exercise II.1.8 If A is doubly stochastic, then

\[ |Ax| \le A(|x|), \]

where, as usual, |x| = (|x_1|, ..., |x_n|) and we say that x ≤ y if x_j ≤ y_j for all j.
There is a close relationship between majorisation and doubly stochastic matrices. This is brought out in the next few theorems.

Theorem II.1.9 A matrix A is doubly stochastic if and only if Ax ≺ x for all vectors x.

Proof. Let Ax ≺ x for all x. First choosing x to be e and then e_i = (0, 0, ..., 1, 0, ..., 0), 1 ≤ i ≤ n, one can easily see that A is doubly stochastic. Conversely, let A be doubly stochastic. Let y = Ax. To prove y ≺ x we may assume, without loss of generality, that the coordinates of both x and y are in decreasing order. (See Remark II.1.2 and Exercise II.1.6.) Now note that for any k, 1 ≤ k ≤ n, we have

\[ \sum_{j=1}^{k} y_j = \sum_{j=1}^{k} \sum_{i=1}^{n} a_{ij} x_i . \]

If we put t_i = Σ_{j=1}^{k} a_ij, then 0 ≤ t_i ≤ 1 and Σ_{i=1}^{n} t_i = k. We have

\[
\sum_{j=1}^{k} y_j - \sum_{j=1}^{k} x_j
= \sum_{i=1}^{n} t_i x_i - \sum_{i=1}^{k} x_i + \Bigl( k - \sum_{i=1}^{n} t_i \Bigr) x_k
= \sum_{i=1}^{k} (t_i - 1)(x_i - x_k) + \sum_{i=k+1}^{n} t_i (x_i - x_k)
\le 0 .
\]
Further, when k = n we must have equality here simply because A is doubly stochastic. Thus, y ≺ x. ∎

Note that if x, y ∈ ℝ² and x ≺ y, then

\[ (x_1, x_2) = \bigl( t y_1 + (1 - t) y_2, \; (1 - t) y_1 + t y_2 \bigr) \quad \text{for some } 0 \le t \le 1. \]

Note also that if x, y ∈ ℝⁿ and x is obtained by averaging any two coordinates of y in the above sense while keeping the rest of the coordinates fixed, then x ≺ y. More precisely, call a linear map T on ℝⁿ a T-transform if there exist 0 ≤ t ≤ 1 and indices j, k such that

\[ Ty = \bigl( y_1, \ldots, y_{j-1}, \; t y_j + (1 - t) y_k, \; y_{j+1}, \ldots, y_{k-1}, \; (1 - t) y_j + t y_k, \; y_{k+1}, \ldots, y_n \bigr). \]

Then Ty ≺ y for all y.
Theorem II.1.10 For x, y ∈ ℝⁿ, the following statements are equivalent:

(i) x ≺ y.

(ii) x is obtained from y by a finite number of T-transforms.

(iii) x is in the convex hull of all vectors obtained by permuting the coordinates of y.

(iv) x = Ay for some doubly stochastic matrix A.
Proof. When n = 2, we have already seen that (i) ⟹ (ii). We will prove this implication for a general n by induction. Assume that we have it for dimensions up to n − 1. Let x, y ∈ ℝⁿ. Since x↓ and y↓ can be obtained from x and y by permutations, and each permutation is a product of transpositions, which are surely T-transforms, we can assume without loss of generality that x_1 ≥ x_2 ≥ ··· ≥ x_n and y_1 ≥ y_2 ≥ ··· ≥ y_n. Now, if x ≺ y, then y_n ≤ x_1 ≤ y_1. Choose k such that y_k ≤ x_1 ≤ y_{k−1}. Then x_1 = ty_1 + (1 − t)y_k for some 0 ≤ t ≤ 1. Let

\[ T_1 z = \bigl( t z_1 + (1 - t) z_k, \; z_2, \ldots, z_{k-1}, \; (1 - t) z_1 + t z_k, \; z_{k+1}, \ldots, z_n \bigr) \]

for all z ∈ ℝⁿ. Then note that the first coordinate of T_1 y is x_1. Let

\[ x' = (x_2, \ldots, x_n), \qquad y' = \bigl( y_2, \ldots, y_{k-1}, \; (1 - t) y_1 + t y_k, \; y_{k+1}, \ldots, y_n \bigr). \]

We will show that x′ ≺ y′. Since y_1 ≥ ··· ≥ y_{k−1} ≥ x_1 ≥ x_2 ≥ ··· ≥ x_n, we have for 2 ≤ m ≤ k − 1

\[ \sum_{j=2}^{m} x_j \le \sum_{j=2}^{m} y_j . \]

For k ≤ m ≤ n,

\[
\sum_{j=2}^{k-1} y_j + \bigl[ (1 - t) y_1 + t y_k \bigr] + \sum_{j=k+1}^{m} y_j
= \sum_{j=1}^{m} y_j - t y_1 - (1 - t) y_k
= \sum_{j=1}^{m} y_j - x_1
\ge \sum_{j=1}^{m} x_j - x_1 = \sum_{j=2}^{m} x_j .
\]

The last inequality is an equality when m = n since x ≺ y. Thus x′ ≺ y′. So by the induction hypothesis there exist a finite number of T-transforms T_2, ..., T_r on ℝⁿ⁻¹ such that x′ = (T_r ··· T_2)y′. We can regard each of them as a T-transform on ℝⁿ if we prohibit them from touching the first coordinate of any vector. We then have

\[ x = (T_r \cdots T_2 T_1) y, \]

and that is what we wanted to prove. Now note that a T-transform is a convex combination of the identity map and some permutation. So a product of such maps is a convex combination of permutations. Hence (ii) ⟹ (iii). The implication (iii) ⟹ (iv) is obvious, and (iv) ⟹ (i) is a consequence of Theorem II.1.9. ∎

A consequence of the above theorem is that the set {x : x ≺ y} is the convex hull of all points obtained from y by permuting its coordinates.
Exercise II.1.11 If U = (u_ij) is a unitary matrix, then the matrix (|u_ij|²) is doubly stochastic. Such a doubly stochastic matrix is called unitary-stochastic; it is called orthostochastic if U is real orthogonal. Show that if x = Ay for some doubly stochastic matrix A, then there exists an orthostochastic matrix B such that x = By. (Use induction.)

Exercise II.1.12 Let A be an n × n Hermitian matrix. Let diag(A) denote the vector whose coordinates are the diagonal entries of A and λ(A) the vector whose coordinates are the eigenvalues of A specified in any order. Show that

\[ \operatorname{diag}(A) \prec \lambda(A). \tag{II.14} \]

This is sometimes referred to as Schur's Theorem.
Exercise II.1.13 Use the majorisation (II.14) to prove that if λ_j↓(A) denote the eigenvalues of an n × n Hermitian matrix arranged in decreasing order, then for all k = 1, 2, ..., n

\[ \sum_{j=1}^{k} \lambda_j^{\downarrow}(A) = \max \sum_{j=1}^{k} \langle x_j, A x_j \rangle, \tag{II.15} \]

where the maximum is taken over all orthonormal k-tuples of vectors {x_1, ..., x_k} in ℂⁿ. This is Ky Fan's maximum principle. (See Problem I.6.15 also.) Show that the majorisation (II.14) can be derived from (II.15). The two statements are, thus, equivalent.
Exercise II.1.14 Let A, B be Hermitian matrices. Then for all k = 1, 2, ..., n,

\[ \sum_{j=1}^{k} \lambda_j^{\downarrow}(A + B) \le \sum_{j=1}^{k} \lambda_j^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B). \tag{II.16} \]

Exercise II.1.15 For any matrix A, let Ã be the Hermitian matrix

\[ \tilde{A} = \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix} . \tag{II.17} \]

Then the eigenvalues of Ã are the singular values of A together with their negatives. Denote the singular values of A arranged in decreasing order by s_1(A), ..., s_n(A). Show that for any two n × n matrices A, B and for any k = 1, 2, ..., n

\[ \sum_{j=1}^{k} s_j(A + B) \le \sum_{j=1}^{k} s_j(A) + \sum_{j=1}^{k} s_j(B). \tag{II.18} \]

When k = 1, this is just the triangle inequality for the operator norm ‖A‖. For each 1 ≤ k ≤ n, define ‖A‖_(k) = Σ_{j=1}^{k} s_j(A). From (II.18) it follows that ‖A‖_(k) defines a norm. These norms are called the Ky Fan k-norms.
II.2 Birkhoff's Theorem
We start with a combinatorial problem known as the Matching Problem. Let B = {b_1, ..., b_n} and G = {g_1, ..., g_n} be two sets of n elements each, and let R be a subset of B × G. When does there exist a bijection f from B to G whose graph is contained in R? This is called the Matching Problem or the Marriage Problem for the following reason. Think of B as a set of boys, G as a set of girls, and (b_i, g_j) ∈ R as saying that the boy b_i knows the girl g_j. Then the above question can be phrased as: when can one arrange a monogamous marriage in which each boy gets married to a girl he knows? We will call such a matching a compatible matching.

For each i let G_i = {g_j : (b_i, g_j) ∈ R}. This represents the set of girls whom the boy b_i knows. For each k-tuple of indices 1 ≤ i_1 < ··· < i_k ≤ n, let G_{i_1···i_k} = ∪_{r=1}^{k} G_{i_r}. This represents the set of girls each of whom is known to one of the boys b_{i_1}, ..., b_{i_k}. Clearly a necessary condition for a compatible matching to be possible is that |G_{i_1···i_k}| ≥ k for all k = 1, 2, ..., n. Hall's Marriage Theorem says that this condition is sufficient as well.

Theorem II.2.1 (Hall) A compatible matching between B and G can be found if and only if

\[ | G_{i_1 \cdots i_k} | \ge k \quad \text{for all } 1 \le i_1 < \cdots < i_k \le n, \; k = 1, 2, \ldots, n. \tag{II.19} \]

Proof. Only the sufficiency of the condition needs to be proved. This is done by induction on n. Obviously, the Theorem is true when n = 1. First assume that |G_{i_1···i_k}| ≥ k + 1 for all 1 ≤ i_1 < ··· < i_k ≤ n, 1 ≤ k < n. In other words, if 1 ≤ k < n, then every set of k boys together knows at least k + 1 girls. Pick any boy and marry him to one of the girls he knows. This leaves n − 1 boys and n − 1 girls; condition (II.19) still holds, and hence the remaining boys and girls can be compatibly matched.

If the above assumption is not met, then there exist k indices i_1, ..., i_k, k < n, for which |G_{i_1···i_k}| = k. In other words, there exist k boys who together know exactly k girls. By the induction hypothesis these k boys and girls can be compatibly matched. Now we are left with n − k unmarried boys and as many unmarried girls. If some set of h of these boys knew fewer than h of these remaining girls, then together with the earlier k these h + k boys would have known fewer than h + k girls. (The earlier k boys did not know any of the present n − k maidens.)
So, condition (II.19) is satisfied for the remaining n − k boys and girls, who can now be compatibly married by the induction hypothesis. ∎
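For small instances, Hall's condition and the existence of a compatible matching can both be checked by brute force, which makes the equivalence in Theorem II.2.1 easy to experiment with (a sketch; the dictionaries encode the sets G_i):

```python
from itertools import combinations, permutations

def hall(knows):
    # Hall's condition (II.19): every set of k boys knows >= k girls
    n = len(knows)
    return all(len(set().union(*(knows[b] for b in S))) >= k
               for k in range(1, n + 1)
               for S in combinations(range(n), k))

def matchable(knows):
    # brute force: does some bijection boys -> girls respect "knows"?
    n = len(knows)
    return any(all(s[b] in knows[b] for b in range(n))
               for s in permutations(range(n)))

good = {0: {0, 1}, 1: {1}, 2: {1, 2}}
bad = {0: {1}, 1: {1}, 2: {0, 1, 2}}    # boys 0 and 1 know only girl 1
assert hall(good) and matchable(good)
assert not hall(bad) and not matchable(bad)
```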
Exercise II.2.2 (The König–Frobenius Theorem) Let A = (a_ij) be an n × n matrix. If σ is a permutation on n symbols, the set {a_{1σ(1)}, a_{2σ(2)}, ..., a_{nσ(n)}} is called a diagonal of A. Each diagonal contains exactly one element from each row and from each column of A. Show that the following two statements are equivalent:

(i) every diagonal of A contains a zero element;

(ii) A has a k × ℓ submatrix with all entries zero for some k, ℓ such that k + ℓ > n.
One can see that the statement of the König–Frobenius Theorem is equivalent to that of Hall's Theorem.

Theorem II.2.3 (Birkhoff's Theorem) The set of n × n doubly stochastic matrices is a convex set whose extreme points are the permutation matrices.

Proof. We have already made a note of the easy part of this theorem in Exercise II.1.6. The harder part is showing that every extreme point is a permutation matrix. For this we need to show that each doubly stochastic matrix is a convex combination of permutation matrices.

This is proved by induction on the number of positive entries of the matrix. Note that if A is doubly stochastic, then it has at least n positive entries. If the number of positive entries is exactly n, then A is a permutation matrix.

We first show that if A is doubly stochastic, then A has at least one diagonal with no zero entry. Choose any k × ℓ submatrix of zeroes that A might have. We can find permutation matrices P_1, P_2 such that P_1AP_2 has the form

\[ P_1 A P_2 = \begin{pmatrix} 0 & B \\ C & D \end{pmatrix}, \]

where 0 is a k × ℓ matrix with all entries zero. Since P_1AP_2 is again doubly stochastic, the rows of B and the columns of C each add up to 1. Hence k + ℓ ≤ n. So at least one diagonal of A must have all its entries positive, by the König–Frobenius Theorem.

Choose any such positive diagonal and let a be the smallest of the elements of this diagonal. If A is not a permutation matrix, then a < 1. Let P be the permutation matrix obtained by putting ones on this diagonal and let

\[ B = \frac{A - aP}{1 - a} . \]

Then B is doubly stochastic and has at least one more zero entry than A has. So by the induction hypothesis B is a convex combination of permutation matrices. Hence so is A, since A = (1 − a)B + aP. ∎
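The proof is constructive, and for small matrices it can be run directly: repeatedly pick a diagonal with positive entries, subtract the largest feasible multiple of the corresponding permutation matrix, and stop when nothing is left. A brute-force sketch (it searches all diagonals, which the proof does not need to do):

```python
from itertools import permutations

def birkhoff(A, tol=1e-12):
    """Return [(weight, sigma), ...] with A = sum of weight * P_sigma."""
    A = [row[:] for row in A]
    n = len(A)
    out = []
    while True:
        # diagonal whose smallest entry is largest (hence positive if any is)
        best = max(permutations(range(n)),
                   key=lambda s: min(A[i][s[i]] for i in range(n)))
        a = min(A[i][best[i]] for i in range(n))
        if a < tol:
            break
        out.append((a, best))
        for i in range(n):
            A[i][best[i]] -= a
    return out

A = [[0.5, 0.5], [0.5, 0.5]]
decomp = birkhoff(A)
assert abs(sum(w for w, _ in decomp) - 1.0) < 1e-9
```

Here `decomp` expresses A as half the identity permutation plus half the transposition, in line with the theorem.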
Remark. There are n! permutation matrices of size n. Birkhoff's Theorem tells us that every n × n doubly stochastic matrix is a convex combination of these n! matrices. This number can be reduced as a consequence of a general theorem of Carathéodory. This says that if X is a subset of an m-dimensional linear variety in ℝᴺ, then any point in the convex hull of X can be expressed as a convex combination of at most m + 1 points of X. Using this theorem one sees that every n × n doubly stochastic matrix can be expressed as a convex combination of at most n² − 2n + 2 permutation matrices.

Doubly substochastic matrices defined below are related to weak majorisation in the same way as doubly stochastic matrices are related to majorisation. A matrix B = (b_ij) is called doubly substochastic if

\[ b_{ij} \ge 0 \quad \text{for all } i, j, \]

\[ \sum_{i=1}^{n} b_{ij} \le 1 \quad \text{for all } j, \]

\[ \sum_{j=1}^{n} b_{ij} \le 1 \quad \text{for all } i. \]
Exercise II.2.4 B is doubly substochastic if and only if it is positivity-preserving, Be ≤ e, and B*e ≤ e.

Exercise II.2.5 Every square submatrix of a doubly stochastic matrix is doubly substochastic. Conversely, every doubly substochastic matrix B can be dilated to a doubly stochastic matrix A. Moreover, if B is an n × n matrix, then this dilation A can be chosen to have size at most 2n × 2n. Indeed, if R and C are the diagonal matrices whose jth diagonal entries are the sums of the jth rows and the jth columns of B, respectively, then

\[ A = \begin{pmatrix} B & I - R \\ I - C & B^* \end{pmatrix} \]

is a doubly stochastic matrix.
Exercise II.2.6 The set of all n x n doubly substochastic matrices is con vex; its extreme points are matrices having at most one entry 1 in each row and each column and all other entries zero. Exercise II. 2 . 7 A matrix B with nonnegative entries is doubly sub stochas tic if and only if there exists a doubly stochastic matrix A such that b ij < a ij for all i , j == 1 , 2, . . , n .
.
Our next theorem connects doubly substochastic matrices to weak majorisation.
II.2 Birkhoff's Theorem
Theorem II.2.8 (i) Let x, y be two vectors with nonnegative coordinates. Then x ≺_w y if and only if x = By for some doubly substochastic matrix B.
(ii) Let x, y ∈ ℝⁿ. Then x ≺_w y if and only if there exists a vector u such that x ≤ u and u ≺ y.

Proof. If x, u ∈ ℝⁿ and x ≤ u, then clearly x ≺_w u. So, if in addition u ≺ y, then x ≺_w y.

Now suppose that x, y are nonnegative vectors and x = By for some doubly substochastic matrix B. By Exercise II.2.7 we can find a doubly stochastic matrix A such that b_ij ≤ a_ij for all i, j. Then x = By ≤ Ay. Hence, x ≺_w y.

Conversely, let x, y be nonnegative vectors such that x ≺_w y. We want to prove that there exists a doubly substochastic matrix B for which x = By. If x = 0, we can choose B = 0, and if x ≺ y, we can even choose B to be doubly stochastic by Theorem II.1.10. So, assume that neither of these is the case. Let r be the smallest of the positive coordinates of x, and let s = Σyⱼ − Σxⱼ. By assumption s > 0. Choose a positive integer m such that r > s/m. Dilate both vectors x and y to (n + m)-dimensional vectors x′, y′ defined as

x′ = (x₁, …, xₙ, s/m, …, s/m),
y′ = (y₁, …, yₙ, 0, …, 0).

Then x′ ≺ y′. Hence x′ = Ay′ for some doubly stochastic matrix A of size n + m. Let B be the n × n submatrix of A sitting in the top left corner. Then B is doubly substochastic and x = By. This proves (i).

Finally, let x, y ∈ ℝⁿ and x ≺_w y. Choose a positive number t so that x + te and y + te are both nonnegative, where e = (1, 1, …, 1). We still have x + te ≺_w y + te. So, by (i) there exists a doubly substochastic matrix B such that x + te = B(y + te). By Exercise II.2.7 we can find a doubly stochastic matrix A such that b_ij ≤ a_ij for all i, j. But then x + te ≤ A(y + te) = Ay + te. Hence, if u = Ay, then x ≤ u and u ≺ y. ∎

Exercise II.2.9 A matrix A is doubly substochastic if and only if for every x ≥ 0 we have Ax ≥ 0 and Ax ≺_w x. (Compare with Theorem II.1.9.)

Exercise II.2.10 Let x, y ∈ ℝⁿ and let x ≥ 0, y ≥ 0. Then x ≺_w y if and only if x is in the convex hull of the 2ⁿn! points obtained from y by permutations and sign changes of its coordinates (i.e., vectors of the form (±y_σ(1), ±y_σ(2), …, ±y_σ(n)), where σ is a permutation).
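Part (i) of Theorem II.2.8 can be checked numerically: applying any doubly substochastic matrix to a nonnegative vector produces a weakly majorised one. A small sketch (the random data and the rescaling trick are illustrative choices):

```python
import numpy as np

def weakly_majorised(x, y, tol=1e-12):
    """x <_w y : compare partial sums of the decreasingly sorted vectors."""
    return np.all(np.cumsum(np.sort(x)[::-1]) <= np.cumsum(np.sort(y)[::-1]) + tol)

rng = np.random.default_rng(0)
y = rng.uniform(0.0, 1.0, size=4)               # nonnegative vector
B = rng.uniform(0.0, 1.0, size=(4, 4))
B /= max(B.sum(axis=0).max(), B.sum(axis=1).max())   # force all line sums <= 1
x = B @ y                                        # then x <_w y by the theorem
```

The scaling makes every row and column sum of B at most 1, so B is doubly substochastic and the theorem applies.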
II.3 Convex and Monotone Functions

In this section we will study maps from ℝⁿ to ℝᵐ that preserve various orders. Let f : ℝ → ℝ be any function. We will denote the map induced by f on ℝⁿ also by f; i.e., f(x) = (f(x₁), …, f(xₙ)) for x ∈ ℝⁿ. An elementary and useful characterisation of majorisation is the following.
Theorem II.3.1 Let x, y ∈ ℝⁿ. Then the following two conditions are equivalent:
(i) x ≺ y.
(ii) tr φ(x) ≤ tr φ(y) for all convex functions φ from ℝ to ℝ.

Proof. Let x ≺ y. Then x = Ay for some doubly stochastic matrix A. So xᵢ = Σ_{j=1}^n a_ij yⱼ, where a_ij ≥ 0 and Σ_{j=1}^n a_ij = 1. Hence for every convex function φ, φ(xᵢ) ≤ Σ_{j=1}^n a_ij φ(yⱼ). Summing over i gives Σᵢ φ(xᵢ) ≤ Σⱼ φ(yⱼ); i.e., tr φ(x) ≤ tr φ(y). Conversely, the functions t, −t, and (t − a)₊, for each real a, are all convex. Now apply Theorem II.1.3 (iii). ∎

Exercise II.3.2 For x, y ∈ ℝⁿ the following two conditions are equivalent:
(i) x ≺_w y.
(ii) tr φ(x) ≤ tr φ(y) for all monotonically increasing convex functions φ from ℝ to ℝ.
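A quick numerical illustration of Theorem II.3.1 (the particular vectors and the two convex test functions t² and |t| are illustrative choices):

```python
import numpy as np

y = np.array([5.0, 3.0, -2.0, 0.0])
A = 0.5 * np.eye(4) + 0.5 * np.full((4, 4), 0.25)   # a doubly stochastic matrix
x = A @ y                                            # an averaging of y, so x < y

# tr phi(x) <= tr phi(y) for convex phi:
gap_sq = np.square(y).sum() - np.square(x).sum()     # phi(t) = t^2
gap_abs = np.abs(y).sum() - np.abs(x).sum()          # phi(t) = |t|
```

Both gaps come out nonnegative, as the theorem requires for any convex φ.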
Note that in the two statements above it suffices to consider only continuous functions.

A real valued function φ on ℝⁿ is called Schur-convex or S-convex if

x ≺ y ⟹ φ(x) ≤ φ(y). (II.20)

(This terminology might seem somewhat inappropriate because the condition (II.20) expresses preservation of order rather than convexity. However, the above two propositions do show that ordinary convex functions are related to this notion. Also, if x ≺ y, then x is obtained from y by an averaging procedure. The condition (II.20) says that the value of φ is diminished when such a procedure is applied to its argument. Later on, we will come across other notions of averaging, and corresponding notions of convexity.) We will study more general maps that include Schur-convex maps.
Consider maps Φ : ℝⁿ → ℝᵐ. The domain of Φ will be either all of ℝⁿ or some convex set invariant under coordinate permutations of its elements. Such a map Φ will be called

monotone increasing if x ≤ y ⟹ Φ(x) ≤ Φ(y),
monotone decreasing if −Φ is monotone increasing,
convex if Φ(tx + (1 − t)y) ≤ tΦ(x) + (1 − t)Φ(y), 0 ≤ t ≤ 1,
concave if −Φ is convex,
isotone if x ≺ y ⟹ Φ(x) ≺_w Φ(y),
strongly isotone if x ≺_w y ⟹ Φ(x) ≺_w Φ(y),
and strictly isotone if x ≺ y ⟹ Φ(x) ≺ Φ(y).

Note that when m = 1 isotone maps are precisely the Schur-convex maps. The next few propositions provide examples of such maps. We will denote by Sₙ the group of n × n permutation matrices.
Theorem II.3.3 Let Φ : ℝⁿ → ℝᵐ be a convex map. Suppose that for any P ∈ Sₙ there exists P′ ∈ Sₘ such that

Φ(Px) = P′Φ(x) (II.21)

for all x ∈ ℝⁿ. Then Φ is isotone. If, in addition, Φ is monotone increasing, then Φ is strongly isotone.

Proof. Let x ≺ y in ℝⁿ. By Theorem II.1.10 there exist P₁, …, P_N in Sₙ and positive real numbers t₁, …, t_N with Σtⱼ = 1 such that x = Σtⱼ Pⱼ y. So, by the convexity of Φ,

Φ(x) ≤ Σtⱼ Φ(Pⱼ y) = Σtⱼ Pⱼ′ Φ(y) = z, say.

Then z ≺ Φ(y). So Φ(x) ≤ z and z ≺ Φ(y), and hence Φ(x) ≺_w Φ(y). Thus Φ is isotone.

Now suppose Φ is also monotone increasing. Let u ≺_w y. Then by Theorem II.2.8 there exists x such that u ≤ x and x ≺ y. Hence Φ(u) ≤ Φ(x) ≺_w Φ(y). So Φ is strongly isotone. ∎
Corollary II.3.4 If φ : ℝ → ℝ is a convex function, then the induced map φ : ℝⁿ → ℝⁿ is isotone. If, in addition, φ is monotone increasing, then this induced map is strongly isotone.

Proof. The induced map is convex and satisfies the condition (II.21) with P′ = P, so Theorem II.3.3 yields the above corollary. ∎

Example II.3.5 From the above results we can conclude that
(i) x ≺ y in ℝⁿ ⟹ |x| ≺_w |y|.
(ii) x ≺ y in ℝⁿ ⟹ x² ≺_w y².
(iii) x ≺_w y in ℝⁿ₊ ⟹ xᵖ ≺_w yᵖ for p ≥ 1.
(iv) x ≺_w y in ℝⁿ ⟹ x₊ ≺_w y₊.
(v) If φ is any function such that φ(eᵗ) is convex and monotone increasing in t, then log x ≺_w log y in ℝⁿ₊ ⟹ φ(x) ≺_w φ(y).
(vi) log x ≺_w log y in ℝⁿ₊ ⟹ x ≺_w y.
(vii) For x, y ∈ ℝⁿ₊ with coordinates arranged in decreasing order,

Π_{j=1}^k xⱼ ≤ Π_{j=1}^k yⱼ, 1 ≤ k ≤ n ⟹ Σ_{j=1}^k xⱼ ≤ Σ_{j=1}^k yⱼ, 1 ≤ k ≤ n.

Here ℝⁿ₊ stands for the collection of vectors x ≥ 0 (or, at places, x > 0). All functions are understood in the coordinatewise sense. Thus, e.g., |x| = (|x₁|, …, |xₙ|).
As an application we have the following very useful theorem.

Theorem II.3.6 (Weyl's Majorant Theorem) Let A be an n × n matrix with singular values s₁ ≥ ⋯ ≥ sₙ and eigenvalues λ₁, …, λₙ arranged in such a way that |λ₁| ≥ ⋯ ≥ |λₙ|. Then for every function φ : ℝ₊ → ℝ₊ such that φ(eᵗ) is convex and monotone increasing in t, we have

(φ(|λ₁|), …, φ(|λₙ|)) ≺_w (φ(s₁), …, φ(sₙ)). (II.22)

In particular, we have

(|λ₁|ᵖ, …, |λₙ|ᵖ) ≺_w (s₁ᵖ, …, sₙᵖ) (II.23)

for all p > 0.
Proof. The spectral radius of a matrix is bounded by its operator norm. Hence, |λ₁| ≤ s₁. Apply this argument to the antisymmetric tensor powers ⋀ᵏA. This gives

Π_{j=1}^k |λⱼ| ≤ Π_{j=1}^k sⱼ, 1 ≤ k ≤ n. (II.24)

Now use the assertion of II.3.5 (vii). ∎

Note that we have

Π_{j=1}^n |λⱼ| = Π_{j=1}^n sⱼ, (II.25)
both the expressions being equal to (det A*A)^{1/2}.

Remark II.3.7 Returning to Theorem II.3.3, we note that when m = 1 the condition (II.21) just says that

Φ(Px) = Φ(x) (II.26)

for all x ∈ ℝⁿ and P ∈ Sₙ. So, in this case Theorem II.3.3 says that if a function Φ : ℝⁿ → ℝ is convex and permutation invariant, then it is isotone (i.e., Schur-convex).

Also note that every isotone function Φ from ℝⁿ to ℝ has to be permutation invariant, because Px and x majorise each other, and hence isotony of Φ implies equality of Φ(Px) and Φ(x).

Exercise II.3.8 Let ψ : ℝⁿ → ℝ be a convex function and let Φ(x) = max_{P ∈ Sₙ} ψ(Px). Then Φ is isotone. If, in addition, ψ is monotone increasing, then Φ is strongly isotone.

Exercise II.3.9 Let φ : ℝ → ℝ be convex. For each k = 1, 2, …, n, define functions φ^(k) : ℝⁿ → ℝ by

φ^(k)(x) = max_σ Σ_{j=1}^k φ(x_σ(j)),

where σ runs over all permutations on n symbols. Then φ^(k) is isotone. If, in addition, φ is monotone increasing, then φ^(k) is strongly isotone. Note that this applies, in particular, to

φ^(n)(x) = Σ_{j=1}^n φ(xⱼ) = tr φ(x).

Compare this with Theorem II.3.1. The special choice φ(t) = t gives

φ^(k)(x) = Σ_{j=1}^k xⱼ↓.
Example II.3.10 For x ∈ ℝⁿ let x̄ = (1/n) Σⱼ xⱼ. Let

V(x) = (1/n) Σⱼ (xⱼ − x̄)².

This is called the variance function. Since the maps x ↦ (xⱼ − x̄)² are convex, V(x) is isotone (i.e., Schur-convex).

Example II.3.11 For x ∈ ℝⁿ₊ let

H(x) = −Σⱼ xⱼ log xⱼ,

where by convention we put t log t = 0 if t = 0. Then H is called the entropy function. Since the function f(t) = t log t is convex for t ≥ 0, we see that −H(x) is isotone. (This is sometimes expressed by saying that the entropy function is anti-isotone or Schur-concave on ℝⁿ₊.) In particular, if xⱼ ≥ 0 and Σxⱼ = 1 we have

H(1, 0, …, 0) ≤ H(x₁, …, xₙ) ≤ H(1/n, …, 1/n),

which is a basic fact about entropy.

Example II.3.12 For p ≥ 1 the function
Φ(x) = Σ_{j=1}^n xⱼᵖ

is isotone on ℝⁿ₊. In particular, if xⱼ ≥ 0 and Σxⱼ = 1, we have

(n + 1)ᵖ / nᵖ⁻¹ ≤ Σ_{j=1}^n (xⱼ + 1)ᵖ.

Example II.3.13 A function Φ : ℝⁿ → ℝ₊ is called a symmetric gauge function if
(i) Φ is a norm on ℝⁿ,
(ii) Φ(Px) = Φ(x) for all x ∈ ℝⁿ, P ∈ Sₙ,
(iii) Φ(ε₁x₁, …, εₙxₙ) = Φ(x₁, …, xₙ) if εⱼ = ±1,
(iv) Φ(1, 0, …, 0) = 1.
(The last condition is an inessential normalisation.) Examples of symmetric gauge functions are

Φ_p(x) = (Σ_{j=1}^n |xⱼ|ᵖ)^{1/p}, 1 ≤ p < ∞,
Φ_∞(x) = max_{1 ≤ j ≤ n} |xⱼ|.
These norms are commonly used in functional analysis. If the coordinates of x are arranged so as to have |x₁| ≥ ⋯ ≥ |xₙ|, then

Φ_(k)(x) = Σ_{j=1}^k |xⱼ|, 1 ≤ k ≤ n,

is also a symmetric gauge function. This is a consequence of the majorisations (II.29) and (i) in Example II.3.5. Every symmetric gauge function is convex on ℝⁿ and is monotone on ℝⁿ₊ (Problem II.5.11). Hence by Theorem II.3.3 it is strongly isotone; i.e.,

x ≺_w y in ℝⁿ₊ ⟹ Φ(x) ≤ Φ(y).
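The strong-isotonicity statement can be sanity-checked numerically for the Φ_p family (the two test vectors below are an illustrative choice satisfying x ≺_w y):

```python
import numpy as np

def phi_p(x, p):
    """The symmetric gauge function Phi_p (l_p norm)."""
    x = np.abs(np.asarray(x, dtype=float))
    return x.max() if np.isinf(p) else (x ** p).sum() ** (1.0 / p)

x = np.array([2.0, 1.0, 1.0])   # partial sums: 2, 3, 4
y = np.array([3.0, 2.0, 0.5])   # partial sums: 3, 5, 5.5  =>  x <_w y
checks = [phi_p(x, p) <= phi_p(y, p) for p in (1, 1.5, 2, np.inf)]
```

Every Φ_p value of x is dominated by that of y, as strong isotonicity predicts.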
For differentiable functions there are necessary and sufficient conditions characterising Schur-convexity:

Theorem II.3.14 A differentiable function φ : ℝⁿ → ℝ is isotone if and only if
(i) φ is permutation invariant, and
(ii) for each x ∈ ℝⁿ and for all i, j,

(xᵢ − xⱼ) (∂φ/∂xᵢ (x) − ∂φ/∂xⱼ (x)) ≥ 0.

Proof. We have already observed that every isotone function is permutation invariant. To see that it also satisfies (ii), let i = 1, j = 2, without any loss of generality. For 0 ≤ t ≤ 1 let

x(t) = ((1 − t)x₁ + tx₂, tx₁ + (1 − t)x₂, x₃, …, xₙ). (II.27)

Then x(t) ≺ x = x(0). Hence φ(x(t)) ≤ φ(x(0)), and therefore

0 ≥ [d/dt φ(x(t))]_{t=0} = −(x₁ − x₂) (∂φ/∂x₁ (x) − ∂φ/∂x₂ (x)).

This proves (ii).

Conversely, suppose φ satisfies (i) and (ii). We want to prove that φ(u) ≤ φ(x) whenever u ≺ x. Using (i), we may assume that

u = ((1 − s)x₁ + sx₂, sx₁ + (1 − s)x₂, x₃, …, xₙ)

for some 0 ≤ s ≤ 1/2. Let x(t) be as in (II.27). Then

φ(u) − φ(x) = ∫₀ˢ d/dt φ(x(t)) dt
= −∫₀ˢ (x₁ − x₂) [∂φ/∂x₁ (x(t)) − ∂φ/∂x₂ (x(t))] dt
= −∫₀ˢ [(x(t)₁ − x(t)₂)/(1 − 2t)] [∂φ/∂x₁ (x(t)) − ∂φ/∂x₂ (x(t))] dt
≤ 0,

because of (ii) and the condition 0 ≤ s ≤ 1/2. ∎
Example II.3.15 (A Schur-convex function that is not convex) Let φ : I² → ℝ, where I = (0, 1), be the function

φ(x₁, x₂) = log(1/x₁ − 1) + log(1/x₂ − 1).

Using Theorem II.3.14 one can check that φ is Schur-convex on the set {(x₁, x₂) ∈ I² : x₁ + x₂ ≤ 1}. However, the function log(1/x − 1) is convex on (0, 1/2] but not on [1/2, 1).
Using Theorem 11. 3. 14 one can check that is Schurconvex on the set However, the function log( �  1 ) is convex on (0, � ] but not on [� , 1 ) . Example II.3.16 {The elementary symmetric polynomials) For each k == 1 , 2, · · · , n, let Sk IRn � IR be the functions :
Sk (x)
==
L
1 < i1
X i 1 X i2
•
•
Xi k .
•
These are called the elementary symmetric polynomials of the variables x 1 , . . . , X n . These are invariant under permutations. We have the identities n
and
Sk (X I , · · · , Xi , . . . , x n )  Sk ( XI , · · · , Xj , · · · , X n )
== ( xi  x i ) Sk l ( xl , . , x i , . . , xi , . .
.
.
.
.
, xn ) ,
where the circumflex indicates that the term below it has been omitted. l.lsing these one finds via Theorem 11. 3. 14 that each Sk is Schurconcave; i. e. ,  Sk is isotone, on IR� . The special case k
n
==
n says that if x , y E IR� and x < y, then II Xj j= l
n
>
II Yi ·
j= l
Theorem II.3.17 (The Hadamard Determinant Theorem) If A is an n × n positive matrix, then

det A ≤ Π_{j=1}^n aⱼⱼ.

Proof. Use Schur's Theorem (Exercise II.1.12) and the above statement about the Schur-concavity of the function f(x) = Πⱼ xⱼ on ℝⁿ₊. ∎
More generally, if λ₁, …, λₙ are the eigenvalues of a positive matrix A, we have for k = 1, 2, …, n

Sₖ(λ₁, …, λₙ) ≤ Sₖ(a₁₁, …, aₙₙ). (II.28)

Exercise II.3.18 If A is an m × n complex matrix, then

det(AA*) ≤ Π_{i=1}^m Σ_{j=1}^n |a_ij|².

(See Exercise I.1.3.)

Exercise II.3.19 Show that the ratio Sₖ(x)/Sₖ₋₁(x) is Schur-concave on the set of positive vectors for k = 2, …, n. Hence, if A is a positive matrix, then

Sₙ(a₁₁, …, aₙₙ)/Sₙ₋₁(a₁₁, …, aₙₙ) ≥ Sₙ(λ₁, …, λₙ)/Sₙ₋₁(λ₁, …, λₙ), …,
S₂(a₁₁, …, aₙₙ)/S₁(a₁₁, …, aₙₙ) ≥ S₂(λ₁, …, λₙ)/S₁(λ₁, …, λₙ),

while S₁(a₁₁, …, aₙₙ) = tr A = S₁(λ₁, …, λₙ).

Proposition II.3.20 If A is an n × n positive definite matrix, then

(det A)^{1/n} = min { (1/n) tr AB : B is positive and det B = 1 }.

If A is positive semidefinite, then the same relation holds with min replaced by inf.

Proof. It suffices to prove the statement about positive definite matrices; the semidefinite case follows by a continuity argument. Using the spectral theorem and the cyclicity of the trace, the general case of the proposition can be reduced to the special case when A is diagonal. So, let A be diagonal with diagonal entries λ₁, …, λₙ. Then, using the arithmetic-geometric mean inequality and Theorem II.3.17, we have

(1/n) tr AB = (1/n) Σⱼ λⱼ bⱼⱼ ≥ (Πⱼ λⱼ)^{1/n} (Πⱼ bⱼⱼ)^{1/n} ≥ (det A)^{1/n} (det B)^{1/n},

for every positive matrix B. Hence, (1/n) tr AB ≥ (det A)^{1/n} if det B = 1. When B = (det A)^{1/n} A⁻¹ this becomes an equality. ∎
Corollary II.3.21 (The Minkowski Determinant Theorem) If A, B are n × n positive matrices, then

(det(A + B))^{1/n} ≥ (det A)^{1/n} + (det B)^{1/n}.
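A numerical check of the Minkowski Determinant Theorem on random positive definite pairs (again an illustrative construction, not from the book):

```python
import numpy as np

rng = np.random.default_rng(3)
n, ok = 4, True
for _ in range(10):
    X, Y = rng.normal(size=(n, n)), rng.normal(size=(n, n))
    A, B = X @ X.T + np.eye(n), Y @ Y.T + np.eye(n)   # positive definite
    lhs = np.linalg.det(A + B) ** (1 / n)
    rhs = np.linalg.det(A) ** (1 / n) + np.linalg.det(B) ** (1 / n)
    ok = ok and lhs >= rhs * (1 - 1e-9)
```

In every trial the n-th root of det(A + B) dominates the sum of the n-th roots.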
II.4 Binary Algebraic Operations and Majorisation
For x ∈ ℝⁿ we have seen in Section II.1 that

Σ_{j=1}^k xⱼ↓ = max_{|I| = k} Σ_{j ∈ I} xⱼ.

It follows that if x, y ∈ ℝⁿ, then

x + y ≺ x↓ + y↓. (II.29)

In this section we will study majorisation relations of this form for sums, products, and other functions of two vectors.

A map φ : ℝ² → ℝ is called lattice superadditive if

φ(s₁ ∨ s₂, t₁ ∨ t₂) + φ(s₁ ∧ s₂, t₁ ∧ t₂) ≥ φ(s₁, t₁) + φ(s₂, t₂). (II.30)

We will call a map φ monotone if it is either monotonically increasing or monotonically decreasing in each of its arguments. In this section we will adopt the following notation. Given φ : ℝ² → ℝ, we will denote by Φ the map from ℝⁿ × ℝⁿ to ℝⁿ defined as

Φ(x, y) = (φ(x₁, y₁), …, φ(xₙ, yₙ)). (II.31)

Example II.4.1 (i) φ(s, t) = s + t is a monotone and lattice superadditive function on ℝ².
(ii) φ(s, t) = st is a monotone and lattice superadditive function on ℝ²₊.
For (i) above we have Φ(x, y) = (x₁ + y₁, …, xₙ + yₙ) = x + y for x, y ∈ ℝⁿ, and for (ii) we have Φ(x, y) = (x₁y₁, …, xₙyₙ) = x · y.

Theorem II.4.2
If φ is a monotone and lattice superadditive function on ℝ², then

Φ(x↓, y↑) ≺_w Φ(x, y) ≺_w Φ(x↓, y↓) (II.32)

for all x, y ∈ ℝⁿ.

Proof. Note that if we apply a coordinate permutation simultaneously to x and y, then Φ(x, y) is subjected to the same permutation. So we may assume that x = x↓; i.e., x₁ ≥ x₂ ≥ ⋯ ≥ xₙ. Next note that we can find a finite sequence of vectors u(0), u(1), …, u(N) such that y↓ = u(0), y↑ = u(N), y = u(j) for some 1 ≤ j ≤ N, and each u(k+1) is obtained from u(k) by interchanging two components in such a way as to move from the arrangement y↓ to y↑; i.e., we pick up two indices i < j such that u(k)ᵢ ≥ u(k)ⱼ and interchange these two components to obtain the vector u(k+1). So, to prove (II.32) it suffices to prove

Φ(x, u(k+1)) ≺_w Φ(x, u(k)) (II.33)

for k = 0, 1, …, N − 1. Since we have already assumed x₁ ≥ x₂ ≥ ⋯ ≥ xₙ, to prove (II.33) we need to prove the two-dimensional majorisation

(φ(s₁, t₂), φ(s₂, t₁)) ≺_w (φ(s₁, t₁), φ(s₂, t₂)) (II.34)

if s₁ ≥ s₂ and t₁ ≥ t₂. Now, by the definition of weak majorisation, this is equivalent to the two inequalities

φ(s₁, t₂) ∨ φ(s₂, t₁) ≤ φ(s₁, t₁) ∨ φ(s₂, t₂),
φ(s₁, t₂) + φ(s₂, t₁) ≤ φ(s₁, t₁) + φ(s₂, t₂),

for s₁ ≥ s₂ and t₁ ≥ t₂. The first of these follows from the monotony of φ and the second from the lattice superadditivity. ∎
E
IR+.
where X · y ( x 1Y 1 , ==
Corollary II.4.4
E
�n
xl + y T
<
x+y
<
xl . y T
<w
X y
<w
(x , y)
<
•
, X nYn ) · For x, y E IR. n .
.
x l + yl .
(11.35)
xl . y l '
(11. 36 )
(x l , y l ) .
(II.37)
.
(x l , y T )
<
Proof. If x > 0 and y > 0 , this follows from (II.36) . In the general case , choose t large enough so that x + t e > 0 and y + t e > 0 and apply the • special result.
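The rearrangement inequality (II.37) is easy to check on random data (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
ok = True
for _ in range(20):
    x, y = rng.normal(size=6), rng.normal(size=6)
    xd = np.sort(x)[::-1]            # x sorted decreasingly (x down-arrow)
    yd = np.sort(y)[::-1]            # y sorted decreasingly
    yu = np.sort(y)                  # y sorted increasingly (y up-arrow)
    ok = ok and (xd @ yu - 1e-12 <= x @ y <= xd @ yd + 1e-12)
```

Pairing the sorted sequences similarly maximises the inner product; pairing them oppositely minimises it.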
The inequality (II.37) has a "mechanical" interpretation when x ≥ 0 and y ≥ 0. On a rod fixed at the origin, hang weights yᵢ at the points at distances xᵢ from the origin. The inequality (II.37) then says that the maximum moment is obtained if the heaviest weights are the farthest from the origin.
Exercise II.4.5 The function φ : ℝ² → ℝ defined as φ(s, t) = s ∧ t = min(s, t) is monotone and lattice superadditive on ℝ². Hence, for x, y ∈ ℝⁿ,

x↓ ∧ y↑ ≺_w x ∧ y ≺_w x↓ ∧ y↓.

II.5 Problems

Problem II.5.1. If a doubly stochastic matrix A is invertible and A⁻¹ is also doubly stochastic, then A is a permutation matrix (each of its entries is either 0 or 1).

Problem II.5.2. Let y ∈ ℝⁿ. The set C(y) = {x ∈ ℝⁿ : x ≺ y} is the convex hull of the points (y_σ(1), …, y_σ(n)), where σ varies over permutations. Let a ∈ ℝⁿ₊. The set {x ∈ ℝⁿ₊ : x ≺_w a} is the convex hull of the points (ε₁a_σ(1), …, εₙa_σ(n)), where σ varies over permutations and each εⱼ is either 0 or 1.

Problem II.5.3. Let y ∈ ℝⁿ. The set {x ∈ ℝⁿ : |x| ≺_w |y|} is the convex hull of points of the form (ε₁y_σ(1), …, εₙy_σ(n)), where σ varies over permutations and each εⱼ = ±1.

Problem II.5.4. Let A = [A₁₁ A₁₂; A₂₁ A₂₂] be a 2 × 2 block matrix, and let C(A) = [A₁₁ 0; 0 A₂₂]. If U = [I 0; 0 −I], then we can write

C(A) = ½ (A + UAU*).

Let λ(A) and s(A) denote the n-vectors whose coordinates are the eigenvalues and the singular values of A, respectively. Use (II.18) to show that

s(C(A)) ≺_w s(A).

If A is Hermitian, use (II.16) to show that

λ(C(A)) ≺ λ(A).

Problem II.5.5. More generally, let P₁, …, P_r be a family of mutually orthogonal projections in ℂⁿ such that ΣPⱼ = I. Then the operation

C(A) = Σⱼ Pⱼ A Pⱼ

is called a pinching of A. In an appropriate choice of basis this means that if A = (A_ij), 1 ≤ i, j ≤ r, is written in block form, then C(A) = diag(A₁₁, A₂₂, …, A_rr), the block-diagonal matrix obtained by replacing the off-diagonal blocks of A by zeros. Each such pinching is a product of r − 1 pinchings of the 2 × 2 type introduced in Problem II.5.4. Show that for every pinching C,

s(C(A)) ≺_w s(A) (II.38)

for all matrices A, and

λ(C(A)) ≺ λ(A) (II.39)

for all Hermitian matrices A. When P₁, …, Pₙ are the projections onto the coordinate axes, we get as a special case of (II.38) above

|tr A| ≤ Σ_{j=1}^n sⱼ(A) = ||A||₁. (II.40)

From (II.39) we get as a special case Schur's Theorem

diag(A) ≺ λ(A),
which we saw before in Exercise II.1.12.

Problem II.5.6. Let A be positive. Then

det A ≤ det C(A) (II.41)

for every pinching C. This is called Fischer's inequality and includes the Hadamard Determinant Theorem as a special case.

Problem II.5.7. For each k = 1, 2, …, n and for each pinching C show that for positive definite A

Sₖ(λ(A)) ≤ Sₖ(λ(C(A))), (II.42)

where Sₖ(λ(A)) denotes the kth elementary symmetric polynomial of the eigenvalues of A. This inequality, due to Ostrowski, includes (II.28) as a special case. It also includes (II.41) as a special case.

Problem II.5.8. If ⋀ᵏA denotes the kth antisymmetric tensor power of A, then the above inequality can be written as

tr ⋀ᵏA ≤ tr ⋀ᵏ(C(A)). (II.43)

The operator inequality

⋀ᵏA ≤ ⋀ᵏ(C(A))

is not always true. This is shown by the following example. Let

A = ( 2 0 0 1 )        P = ( 1 0 0 0 )
    ( 0 1 1 0 )            ( 0 1 0 0 )
    ( 0 1 2 0 )            ( 0 0 0 0 )
    ( 1 0 0 1 ),           ( 0 0 0 0 ),

and let C be the pinching induced by the pair of projections P and I − P. (The space ⋀²ℂ⁴ is 6-dimensional.)
Problem II.5.9. Let λ = {λ₁, …, λₙ} and μ = {μ₁, …, μₙ} be two n-tuples of complex numbers. Let

d(λ, μ) = min_σ max_{1 ≤ j ≤ n} |λⱼ − μ_σ(j)|,

where the minimum is taken over all permutations σ on n symbols. This is called the optimal matching distance between the unordered n-tuples λ and μ. It defines a metric on the space ℂⁿ_sym of such n-tuples. Show that we also have

d(λ, μ) = max_{I, J ⊆ {1, 2, …, n}, |I| + |J| = n + 1} min_{i ∈ I, j ∈ J} |λᵢ − μⱼ|.
Problem II.5.10. This problem gives a refinement of Hall's Theorem under an additional assumption that is often fulfilled in matching problems. In the notations introduced at the beginning of Section II.2, define

Bᵢ = {b : the boy b knows the girl gᵢ}, 1 ≤ i ≤ n.

This is the set of boys known to the girl gᵢ. Let

B_{i₁⋯iₖ} = ∪_{r=1}^k B_{iᵣ}, 1 ≤ i₁ < ⋯ < iₖ ≤ n.

Suppose that for each k = 1, 2, …, ⌈n/2⌉ and for every choice of indices 1 ≤ i₁ < ⋯ < iₖ ≤ n,

|B_{i₁⋯iₖ}| ≥ min(2k, n).

Show that then |B_{i₁⋯iₖ}| ≥ k for every k = 1, 2, …, n and every choice of indices. Hence a compatible matching between B and G exists.

Problem II.5.11. (i) Show that every symmetric gauge function is continuous.
(ii) Show that if Φ is a symmetric gauge function, then

Φ_∞(x) ≤ Φ(x) ≤ Φ₁(x) for all x ∈ ℝⁿ.

(iii) If 0 ≤ tⱼ ≤ 1, then

Φ(t₁x₁, …, tₙxₙ) ≤ Φ(x₁, …, xₙ).

(iv) Every symmetric gauge function is monotone on ℝⁿ₊.
(v) If x, y ∈ ℝⁿ and |x| ≤ |y|, then Φ(x) ≤ Φ(y) for every symmetric gauge function Φ.
(vi) If x, y ∈ ℝⁿ₊, then x ≺_w y if and only if Φ(x) ≤ Φ(y) for every symmetric gauge function Φ.

Problem II.5.12. Let f : ℝ₊ → ℝ₊ be a concave function such that f(0) = 0.
(i) Show that f is subadditive: f(a + b) ≤ f(a) + f(b) for all a, b ∈ ℝ₊.
(ii) Let Φ be defined as

Φ(x) = Σⱼ f(xⱼ).

Then Φ is Schur-concave (on ℝᵐ₊ for every m).
(iii) Note that for x, y ∈ ℝⁿ₊, (x, y) ≺ (x + y, 0) in ℝ²ⁿ₊.
(iv) From (ii) and (iii) conclude that the function

F(x) = Σ_{j=1}^n f(|xⱼ|)

is subadditive on ℝⁿ.
(v) Special examples lead to the following inequalities for vectors x, y ∈ ℝⁿ:

Σ_{j=1}^n |xⱼ + yⱼ|ᵖ ≤ Σ_{j=1}^n |xⱼ|ᵖ + Σ_{j=1}^n |yⱼ|ᵖ, 0 < p ≤ 1,

Σ_{j=1}^n log(1 + |xⱼ + yⱼ|) ≤ Σ_{j=1}^n log(1 + |xⱼ|) + Σ_{j=1}^n log(1 + |yⱼ|).
Problem II.5.13. Show that a map φ : ℝ² → ℝ is lattice superadditive if and only if

φ(x₁ + δ₁, x₂ − δ₂) + φ(x₁ − δ₁, x₂ + δ₂) ≤ φ(x₁ + δ₁, x₂ + δ₂) + φ(x₁ − δ₁, x₂ − δ₂)

for all (x₁, x₂) and for all δ₁, δ₂ ≥ 0. If φ is twice differentiable, this is equivalent to the condition

∂²φ/∂s∂t ≥ 0.
Problem II.5.14. Let φ : ℝ² → ℝ be a monotone increasing lattice superadditive function, and let f be a monotone increasing and convex function from ℝ to ℝ. Show that if φ and f are twice differentiable, then the composition f ∘ φ is monotone and lattice superadditive. When φ(s, t) = s + t, show that this is also true if f is monotone decreasing. These statements are also true without any differentiability assumptions.

Problem II.5.15. For x, y ∈ ℝⁿ₊,

−log(x↓ + y↑) ≺_w −log(x + y) ≺_w −log(x↓ + y↓),
log(x↓ · y↑) ≺_w log(x · y) ≺_w log(x↓ · y↓).

From the first of these relations it follows that

Π_{j=1}^n (xⱼ↓ + yⱼ↓) ≤ Π_{j=1}^n (xⱼ + yⱼ) ≤ Π_{j=1}^n (xⱼ↓ + yⱼ↑).
Problem II.5.16. Let x, y, u be vectors in ℝⁿ all having their coordinates in decreasing order. Show that
(i) ⟨x, u⟩ ≤ ⟨y, u⟩ if x ≺ y;
(ii) ⟨x, u⟩ ≤ ⟨y, u⟩ if x ≺_w y and u ∈ ℝⁿ₊.
In particular, this means that if x, y ∈ ℝⁿ, x ≺_w y, and u ∈ ℝⁿ₊, then

⟨x↓, u↓⟩ ≤ ⟨y↓, u↓⟩.

[Use Theorem II.3.14 or the telescopic summation identity

Σ_{j=1}^k aⱼbⱼ = Σ_{j=1}^k (aⱼ − aⱼ₊₁)(b₁ + ⋯ + bⱼ),

where aⱼ, bⱼ, 1 ≤ j ≤ k, are any numbers and aₖ₊₁ = 0.]
II.6 Notes and References
Many of the results of this chapter can be found in the classic Inequalities by G.H. Hardy, J.E. Littlewood, and G. Polya, Cambridge University Press, 1934, which gave the first systematic treatment of this theme. The more recent treatise Inequalities: Theory of Majorization and Its Applications by A.W. Marshall and I. Olkin, Academic Press, 1979, is a much more detailed and exhaustive text devoted entirely to the study of majorisation.
It is an invaluable resource on this topic. For the reader who wants a quicker introduction to the essentials of majorisation and its applications in linear algebra, the survey article Majorization, doubly stochastic matrices and comparison of eigenvalues by T. Ando, Linear Algebra and Its Applications, 118 (1989) 163-248, is undoubtedly the ideal course. Our presentation is strongly influenced by this article, from which we have freely borrowed. The distance d(λ, μ) introduced in Problem II.5.9 is commonly employed in the study of variation of roots of polynomials and eigenvalues of matrices, since these are known with no preferred ordering. See Chapter 6. The result of Problem II.5.10 is due to L. Elsner, C. Johnson, J. Ross, and J. Schonheim, On a generalised matching problem arising in estimating the eigenvalue variation of two matrices, European J. Combinatorics, 4 (1983) 133-136.
Several of the theorems in this chapter have converses. For illustration we mention two of these. Schur's Theorem (II.14) has a converse; it says that if d and λ are real vectors with d ≺ λ, then there exists a Hermitian matrix A whose diagonal entries are the components of d and whose eigenvalues are the components of λ. Weyl's Majorant Theorem (II.3.6) has a converse; it says that if λ₁, …, λₙ are complex numbers and s₁, …, sₙ are positive real numbers ordered as |λ₁| ≥ ⋯ ≥ |λₙ| and s₁ ≥ ⋯ ≥ sₙ, and if

Π_{j=1}^k |λⱼ| ≤ Π_{j=1}^k sⱼ for 1 ≤ k ≤ n,
Π_{j=1}^n |λⱼ| = Π_{j=1}^n sⱼ,

then there exists an n × n matrix A whose eigenvalues are λ₁, …, λₙ and singular values s₁, …, sₙ. For more such theorems, see the book by Marshall and Olkin cited above.

Two results very close to those in II.3.16-II.3.21 and II.5.6-II.5.8 are given below. M. Marcus and L. Lopes, Inequalities for symmetric functions and Hermitian matrices, Canad. J. Math., 9 (1957) 305-312, showed that the map Φ : ℝⁿ₊ → ℝ given by Φ(x) = (Sₖ(x))^{1/k} is Schur-concave for 1 ≤ k ≤ n. Using this they showed that for positive matrices A, B

(Sₖ(λ(A + B)))^{1/k} ≥ (Sₖ(λ(A)))^{1/k} + (Sₖ(λ(B)))^{1/k}. (II.44)

This can also be expressed by saying that the map A ↦ (tr ⋀ᵏA)^{1/k} is concave on the set of positive matrices. For k = n, this reduces to the statement

[det(A + B)]^{1/n} ≥ [det A]^{1/n} + [det B]^{1/n},

which is the Minkowski determinant inequality.
56
II.
Majorisation and Doubly Stochastic Matrices
E . H. Lieb, Convex trace functions and the Wigner YanaseDyson conjec ture, Advances in Math. , 1 1 ( 1973) 267288, proved some striking operator inequalities in connection with the W.Y.D. conj ecture on the concavity of entropy in quantum mechanics. These were proved by different techniques and extended in other directions by T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl. , 26 ( 1979) 203241 . One consequence of these results is the inequality (11.45) for all positive matrices A, B and for all implies that
k == 1 , 2,
. . . , n.
In particular, this
When k == n , this reduces to the Minkowski determinant inequality. Some of these inequalities are proved in Chapter 9.
III Variational Principles for Eigenvalues
In this chapter we will study inequalities that are used for localising the spectrum of a Hermitian operator. Such results are motivated by several interrelated considerations. It is not always easy to calculate the eigenvalues of an operator. However, in many scientific problems it is enough to know that the eigenvalues lie in some specified intervals. Such information is provided by the inequalities derived here. While the functional dependence of the eigenvalues on an operator is quite complicated, several interesting relationships between the eigenvalues of two operators A, B and those of their sum A + B are known. These relations are consequences of variational principles. When the operator B is small in comparison to A, then A + B is considered as a perturbation of A or an approximation to A. The inequalities of this chapter then lead to perturbation bounds or error bounds. Many of the results of this chapter lead to generalisations, or analogues, or open problems in other settings discussed in later chapters.

III.1 The Minimax Principle for Eigenvalues

The following notation will be used throughout this chapter. If A, B are Hermitian operators, we will write their spectral resolutions as Auⱼ = αⱼuⱼ, Bvⱼ = βⱼvⱼ, 1 ≤ j ≤ n, always assuming that the eigenvectors uⱼ and the eigenvectors vⱼ are orthonormal and that α₁ ≥ α₂ ≥ ⋯ ≥ αₙ and β₁ ≥ β₂ ≥ ⋯ ≥ βₙ. When the dependence of the eigenvalues on the operator is to be emphasized, we will write λ↓(A) for the vector with components λ₁↓(A), …, λₙ↓(A), where λⱼ↓(A) are arranged in decreasing order; i.e., λⱼ↓(A) = αⱼ. Similarly, λ↑(A) will denote the vector with components λⱼ↑(A), where λⱼ↑(A) = αₙ₋ⱼ₊₁, 1 ≤ j ≤ n.
.
Theorem 111. 1 . 1 {Poincare's Inequality) Let A be a Hermitian operator on 1t and let M be any kdimensional subspace of 1t. Then there exist unit vectors x, y in M such that (x, Ax ) < Ak (A) and (y, A y ) > � k (A) . Proof. Let N be the subspace spanned by the eigenvectors A corresponding to the eigenvalues Ai (A), . . . , A� (A). Theil:
dim M
+
dim N == n +
U k , . . . , Un
of
1,
and hence the intersection of M and N is nontrivial. Pick up a unit vector x in M n N. Then we can write x
==
n
n
j=k
j= k
L�j Uj , where L 1�i l 2 = 1 . Hence,
n
n
j= k
j=k
This proves the first statement. The second can be obtained by applying this to the operator A instead of A. Equally well, one can repeat the argument, applying it to the given kdimensional space M and the ( n • k + I)dimensional space spanned by u 1 , u2 , . . . , Un  k + l · Corollary III. 1 . 2 {The Minimax Principle) Let A be a Hermitian opera tor on 7t. Then
max
M C 'H. d i m M =k
min
dim
min (x, Ax) EM
x l lx l l=l
M C 'H. M=n  k+l
max (x , A x ) .
xEM l l x l l=l
Proof. By Poincare's inequality, if M is any kdimensional subspace of 1t, then min(x, Ax) < A k ( A ) , where x varies over unit vectors in M . But if M is the span of { u 1 , . , uk } , then this last inequality becomes an equality. That proves the first statement. The second can be obtained from the first • by applying it to A instead of A. X
.
.
This minimax principle is sometimes called the CourantFischerWeyl minimax principle. Exercise 111. 1.3 In the proof of the minimax principle we made a par ticular choice of M . This choice is not always unique. For example, if A k (A ) == Ak + l (A) , there would be a whole 1parameter family of such subspaces obtained by choosing different eigenvectors of A belonging to A.i ( A) .
This is not surprising. More surprising, perhaps even shocking, is the fact that we could have λₖ↓(A) = min{⟨x, Ax⟩ : x ∈ M, ||x|| = 1} even for a k-dimensional subspace M that is not spanned by eigenvectors of A. Find an example where this happens. (There is a simple example.)

Exercise III.1.4 In the proof of Theorem III.1.1 we used a basic principle of linear algebra:

dim(M₁ ∩ M₂) ≥ dim M₁ + dim M₂ − dim(M₁ + M₂) ≥ dim M₁ + dim M₂ − n,

for any two subspaces M₁ and M₂ of an n-dimensional vector space. Derive the corresponding inequality for an intersection of three subspaces.
An equivalent formulation of the Poincaré inequality is in terms of compressions. Recall that if $V$ is an isometry of a Hilbert space $\mathcal{M}$ into $\mathcal{H}$, then the compression of $A$ by $V$ is defined to be the operator $B = V^*AV$. Usually we suppose that $\mathcal{M}$ is a subspace of $\mathcal{H}$ and $V$ is the injection map. Then $A$ has a block-matrix representation in which $B$ is the northwest corner entry:
$$A = \begin{pmatrix} B & * \\ * & * \end{pmatrix}.$$
We say that $B$ is the compression of $A$ to the subspace $\mathcal{M}$.

Corollary III.1.5 (Cauchy's Interlacing Theorem) Let $A$ be a Hermitian operator on $\mathcal{H}$, and let $B$ be its compression to an $(n-k)$-dimensional subspace $\mathcal{N}$. Then for $j = 1, 2, \ldots, n-k$,
$$\lambda_j^{\downarrow}(A) \;\ge\; \lambda_j^{\downarrow}(B) \;\ge\; \lambda_{j+k}^{\downarrow}(A). \tag{III.1}$$

Proof. For any $j$, let $\mathcal{M}$ be the span of the eigenvectors $v_1, \ldots, v_j$ of $B$ corresponding to its eigenvalues $\lambda_1^{\downarrow}(B), \ldots, \lambda_j^{\downarrow}(B)$. Then $(x, Bx) = (x, Ax)$ for all $x \in \mathcal{M}$. Hence,
$$\lambda_j^{\downarrow}(B) \;=\; \min_{\substack{x\in\mathcal{M} \\ \|x\|=1}} (x, Bx) \;=\; \min_{\substack{x\in\mathcal{M} \\ \|x\|=1}} (x, Ax) \;\le\; \lambda_j^{\downarrow}(A).$$
This proves the first assertion in (III.1). Now apply this to $-A$ and its compression $-B$ to the given subspace $\mathcal{N}$. Note that
$$\lambda_j^{\downarrow}(-A) = -\lambda_{n-j+1}^{\downarrow}(A), \qquad \lambda_j^{\downarrow}(-B) = -\lambda_{(n-k)-j+1}^{\downarrow}(B) \quad \text{for all } 1 \le j \le n-k.$$
Replacing $j$ by $(n-k)-j+1$, the first inequality then yields $\lambda_j^{\downarrow}(B) \ge \lambda_{j+k}^{\downarrow}(A)$, which is the second inequality in (III.1). ∎
The above inequalities look especially nice when $B$ is the compression of $A$ to an $(n-1)$-dimensional subspace: then they say that
$$\lambda_1^{\downarrow}(A) \ge \lambda_1^{\downarrow}(B) \ge \lambda_2^{\downarrow}(A) \ge \lambda_2^{\downarrow}(B) \ge \cdots \ge \lambda_{n-1}^{\downarrow}(B) \ge \lambda_n^{\downarrow}(A). \tag{III.2}$$
This explains why this is called an interlacing theorem.
Exercise III.1.6 The Poincaré inequality, the minimax principle, and the interlacing theorem can be derived from each other. Find an independent proof for each of them using Exercise III.1.4. (This "dimension-counting" for intersections of subspaces will be used in later sections too.)

Exercise III.1.7 Let $B$ be the compression of a Hermitian operator $A$ to an $(n-1)$-dimensional space $\mathcal{M}$, and write $\alpha_j = \lambda_j^{\downarrow}(A)$, $\beta_j = \lambda_j^{\downarrow}(B)$. If, for some $k$, the space $\mathcal{M}$ contains the vectors $u_1, \ldots, u_k$, then $\beta_j = \alpha_j$ for $1 \le j \le k$. If $\mathcal{M}$ contains $u_k, \ldots, u_n$, then $\alpha_j = \beta_{j-1}$ for $k < j \le n$.
Exercise III.1.8 (i) Let $A_n$ be the $n \times n$ tridiagonal matrix with entries $a_{ii} = 2\cos\theta$ for all $i$, $a_{ij} = 1$ if $|i-j| = 1$, and $a_{ij} = 0$ otherwise. Show that the determinant of $A_n$ is $\sin(n+1)\theta / \sin\theta$.

(ii) Show that the eigenvalues of $A_n$ are given by $2\bigl(\cos\theta + \cos\frac{j\pi}{n+1}\bigr)$, $1 \le j \le n$.

(iii) The special case when $a_{ii} = 2$ for all $i$ arises in Rayleigh's finite-dimensional approximation to the differential equation of a vibrating string. In this case the eigenvalues of $A_n$ are
$$\lambda_j^{\uparrow}(A_n) = 4\sin^2 \frac{j\pi}{2(n+1)}, \qquad 1 \le j \le n.$$

(iv) Note that, for each $k < n$, the matrix $A_{n-k}$ is a compression of $A_n$. This example provides a striking illustration of Cauchy's interlacing theorem.
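The tridiagonal family of Exercise III.1.8 is convenient for checking these statements numerically. A sketch, assuming NumPy (the size and the value of $\theta$ are arbitrary choices for illustration):

```python
import numpy as np

def A_n(n, theta=0.0):
    """Tridiagonal matrix of Exercise III.1.8: 2*cos(theta) on the
    diagonal, 1 on the first off-diagonals."""
    return (2 * np.cos(theta) * np.eye(n)
            + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1))

n, theta = 8, 0.7

# (i) det A_n = sin((n+1) theta) / sin(theta)
assert np.isclose(np.linalg.det(A_n(n, theta)),
                  np.sin((n + 1) * theta) / np.sin(theta))

# (iii) with 2 on the diagonal, the spectrum is {4 sin^2(j pi / (2(n+1)))}
spec = np.sort(np.linalg.eigvalsh(A_n(n)))
formula = np.sort(4 * np.sin(np.arange(1, n + 1) * np.pi / (2 * (n + 1))) ** 2)
assert np.allclose(spec, formula)

# (iv) A_{n-1} is a compression of A_n: the interlacing (III.2)
big = np.sort(np.linalg.eigvalsh(A_n(n)))[::-1]      # decreasing order
small = np.sort(np.linalg.eigvalsh(A_n(n - 1)))[::-1]
assert all(big[j] >= small[j] >= big[j + 1] for j in range(n - 1))
```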
It is illuminating to think of the variational characterisation of eigenvalues as the solution of an extremal problem in analysis. If $A$ is a Hermitian operator on $\mathbb{R}^n$, the search for the top eigenvalue of $A$ is just the problem of maximising the function $F(x) = x^*Ax$ subject to the constraint that the function $G(x) = x^*x$ has the fixed value 1. The extremum must occur at a critical point, and by the method of Lagrange multipliers the condition for a point $x$ to be critical is $\nabla F(x) = \lambda \nabla G(x)$, which becomes $Ax = \lambda x$. Our earlier arguments got to the extremum problem from the algebraic eigenvalue problem, and this argument has gone the other way.

If additional constraints are imposed, the maximum can only decrease. Confining $x$ to an $(n-k)$-dimensional subspace is equivalent to imposing
$k$ linearly independent linear constraints on it. These can be expressed as $H_j(x) = 0$, where $H_j(x) = w_j^* x$ and the vectors $w_j$, $1 \le j \le k$, are linearly independent. Introducing additional Lagrange multipliers $\mu_j$, the condition for a critical point is now $\nabla F(x) = \lambda \nabla G(x) + \sum_j \mu_j \nabla H_j(x)$; i.e., $Ax - \lambda x$ is no longer required to be 0 but merely to be a linear combination of the $w_j$.

Look at this in block-matrix terms. Our space has been decomposed into a direct sum of a space $\mathcal{N}$ and its orthogonal complement, which is spanned by $\{w_1, \ldots, w_k\}$. Relative to this direct sum decomposition we can write
$$A = \begin{pmatrix} B & C \\ C^* & D \end{pmatrix}.$$
Our vector $x$ is now constrained to lie in $\mathcal{N}$, and the requirement for it to be a critical point is that $(A - \lambda I)\binom{x}{0}$ lies in $\mathcal{N}^{\perp}$. This is exactly the requirement that $x$ be an eigenvector of the compression $B$.

If two interlacing sets of real numbers are given, they can be realised as the eigenvalues of a Hermitian matrix and one of its compressions. This is a converse to one of the theorems proved above:

Theorem III.1.9 Let $\alpha_j$, $1 \le j \le n$, and $\beta_i$, $1 \le i \le n-1$, be real numbers such that
$$\alpha_1 \ge \beta_1 \ge \alpha_2 \ge \beta_2 \ge \cdots \ge \beta_{n-1} \ge \alpha_n.$$
Then there exists a compression of the diagonal matrix $A = \operatorname{diag}(\alpha_1, \ldots, \alpha_n)$ having $\beta_i$, $1 \le i \le n-1$, as its eigenvalues.
Proof. Let $Au_j = \alpha_j u_j$; then $\{u_j\}$ constitute the standard orthonormal basis in $\mathbb{C}^n$. There is a one-to-one correspondence between $(n-1)$-dimensional orthogonal projection operators $P$ and unit vectors $z$ given by $P = I - zz^*$. Each unit vector, in turn, is completely characterised by its coordinates $\zeta_j$ with respect to the basis $u_j$. We have $z = \sum_j \zeta_j u_j$, where $\zeta_j = (u_j, z)$ and $\sum_j |\zeta_j|^2 = 1$. We will find conditions on the numbers $\zeta_j$ so that, for the corresponding orthoprojector $P = I - zz^*$, the compression of $A$ to the range of $P$ has eigenvalues $\beta_i$. Since $PAP$ is a Hermitian operator of rank $n-1$, we must have
$$\prod_{i=1}^{n-1} (\lambda - \beta_i) \;=\; \operatorname{tr} \wedge^{n-1} \bigl[P(\lambda I - A)P\bigr].$$
If $E_j$ are the projectors defined as $E_j = I - u_j u_j^*$, then
$$\wedge^{n-1}(\lambda I - A) \;=\; \sum_{j=1}^{n} \prod_{k \ne j} (\lambda - \alpha_k) \, \wedge^{n-1} E_j.$$
Using the result of Problem I.6.9 one sees that
$$\wedge^{n-1} P \cdot \wedge^{n-1} E_j \cdot \wedge^{n-1} P \;=\; |\zeta_j|^2 \, \wedge^{n-1} P.$$
Since $\operatorname{rank} \wedge^{n-1} P = 1$, the above three relations give
$$\prod_{i=1}^{n-1} (\lambda - \beta_i) \;=\; \sum_{j=1}^{n} |\zeta_j|^2 \prod_{k \ne j} (\lambda - \alpha_k), \tag{III.3}$$
an identity between polynomials of degree $n-1$, which the $\zeta_j$ must satisfy if $B$ has spectrum $\{\beta_i\}$. We will show that the interlacing inequalities between the $\alpha_j$ and $\beta_i$ ensure that we can find $\zeta_j$ satisfying (III.3) and $\sum_{j=1}^{n} |\zeta_j|^2 = 1$. We may assume, without loss of generality, that the $\alpha_j$ are distinct. Put
$$\gamma_j \;=\; \frac{\prod_{i=1}^{n-1} (\alpha_j - \beta_i)}{\prod_{k \ne j} (\alpha_j - \alpha_k)}, \qquad 1 \le j \le n. \tag{III.4}$$
The interlacing property ensures that all $\gamma_j$ are nonnegative. Now choose $\zeta_j$ to be any complex numbers with $|\zeta_j|^2 = \gamma_j$. Then the equation (III.3) is satisfied for the values $\lambda = \alpha_j$, $1 \le j \le n$, and hence it is satisfied for all $\lambda$. Comparing the leading coefficients of the two sides of (III.3), we see that $\sum_j |\zeta_j|^2 = 1$. This completes the proof. ∎
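The proof of Theorem III.1.9 is constructive, and can be traced numerically. The sketch below (NumPy assumed; the numbers $\alpha$, $\beta$ are an arbitrary strictly interlacing example) builds the weights $\gamma_j$ of (III.4), forms the unit vector $z$ with $|\zeta_j|^2 = \gamma_j$, and checks that the compression of $\operatorname{diag}(\alpha)$ to the orthocomplement of $z$ has eigenvalues $\beta$.

```python
import numpy as np

alpha = np.array([5.0, 3.0, 1.0, -2.0])      # alpha_1 > ... > alpha_n
beta = np.array([4.0, 2.0, 0.0])             # strictly interlaces alpha
n = len(alpha)

# Weights gamma_j from (III.4); interlacing makes them nonnegative, and
# comparing leading coefficients in (III.3) forces them to sum to 1.
gamma = np.array([np.prod(alpha[j] - beta)
                  / np.prod(alpha[j] - np.delete(alpha, j))
                  for j in range(n)])
assert np.all(gamma >= 0) and np.isclose(gamma.sum(), 1.0)

z = np.sqrt(gamma)                           # unit vector, |zeta_j|^2 = gamma_j

# Orthonormal basis of the range of P = I - z z*: complete z to a basis.
Q, _ = np.linalg.qr(np.column_stack([z, np.eye(n)[:, :n - 1]]))
V = Q[:, 1:]                                 # n x (n-1), spans the z-perp space

B = V.T @ np.diag(alpha) @ V                 # compression of diag(alpha)
assert np.allclose(np.sort(np.linalg.eigvalsh(B))[::-1], beta)
```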
III.2 Weyl's Inequalities

Several relations between the eigenvalues of Hermitian matrices $A$, $B$, and $A+B$ can be obtained using the ideas of the previous section. Most of these results were first proved by H. Weyl.
Theorem III.2.1 Let $A$, $B$ be $n \times n$ Hermitian matrices. Then
$$\lambda_j^{\downarrow}(A+B) \;\le\; \lambda_i^{\downarrow}(A) + \lambda_{j-i+1}^{\downarrow}(B) \quad \text{for } i \le j, \tag{III.5}$$
$$\lambda_j^{\downarrow}(A+B) \;\ge\; \lambda_i^{\downarrow}(A) + \lambda_{j-i+n}^{\downarrow}(B) \quad \text{for } i \ge j. \tag{III.6}$$

Proof. Let $u_j$, $v_j$, and $w_j$ denote the eigenvectors of $A$, $B$, and $A+B$, respectively, corresponding to their eigenvalues in decreasing order. Let $i \le j$. Consider the three subspaces spanned by $\{w_1, \ldots, w_j\}$, $\{u_i, \ldots, u_n\}$, and $\{v_{j-i+1}, \ldots, v_n\}$, respectively. These have dimensions $j$, $n-i+1$, and $n-j+i$; since these add up to $2n+1$, by Exercise III.1.4 the three subspaces have a nontrivial intersection. Let $x$ be a unit vector in their intersection. Then
$$\lambda_j^{\downarrow}(A+B) \;\le\; (x, (A+B)x) \;=\; (x, Ax) + (x, Bx) \;\le\; \lambda_i^{\downarrow}(A) + \lambda_{j-i+1}^{\downarrow}(B).$$
This proves (III.5). If $A$ and $B$ in this inequality are replaced by $-A$ and $-B$, we get (III.6). ∎

Corollary III.2.2 For each $j = 1, 2, \ldots, n$,
$$\lambda_j^{\downarrow}(A) + \lambda_n^{\downarrow}(B) \;\le\; \lambda_j^{\downarrow}(A+B) \;\le\; \lambda_j^{\downarrow}(A) + \lambda_1^{\downarrow}(B). \tag{III.7}$$

Proof. Put $i = j$ in the above inequalities. ∎
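Weyl's inequalities lend themselves to a brute-force numerical check over all admissible index pairs. A sketch assuming NumPy (with 0-based indices, so each subscript shifts down by one):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

def rand_herm(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

def eig_desc(M):
    """Eigenvalues of a Hermitian matrix in decreasing order."""
    return np.sort(np.linalg.eigvalsh(M))[::-1]

A, B = rand_herm(n), rand_herm(n)
a, b, c = eig_desc(A), eig_desc(B), eig_desc(A + B)
tol = 1e-10

for j in range(n):
    for i in range(j + 1):          # i <= j: (III.5), lambda_{j-i+1} -> index j-i
        assert c[j] <= a[i] + b[j - i] + tol
    for i in range(j, n):           # i >= j: (III.6), lambda_{j-i+n} -> index j-i+n-1
        assert c[j] >= a[i] + b[j - i + n - 1] - tol

# (III.7) is the special case i = j.
assert np.all(c <= a + b[0] + tol) and np.all(c >= a + b[-1] - tol)
```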
It is customary to state these and related results as perturbation theorems, whereby $B$ is a perturbation of $A$; that is, $B = A + H$. In many of the applications $H$ is small and the object is to give bounds for the distance of $\lambda(B)$ from $\lambda(A)$ in terms of $H = B - A$.

Corollary III.2.3 (Weyl's Monotonicity Theorem) If $H$ is positive, then
$$\lambda_j^{\downarrow}(A+H) \;\ge\; \lambda_j^{\downarrow}(A) \quad \text{for all } j.$$

Proof. By the preceding corollary, $\lambda_j^{\downarrow}(A+H) \ge \lambda_j^{\downarrow}(A) + \lambda_n^{\downarrow}(H)$, and all the eigenvalues of $H$ are nonnegative. Alternately, note that $(x, (A+H)x) \ge (x, Ax)$ for all $x$ and use the minimax principle. ∎

Exercise III.2.4 If $H$ is positive and has rank $k$, then
$$\lambda_j^{\downarrow}(A+H) \;\ge\; \lambda_j^{\downarrow}(A) \;\ge\; \lambda_{j+k}^{\downarrow}(A+H) \quad \text{for } j = 1, 2, \ldots, n-k.$$
This is analogous to Cauchy's interlacing theorem.
Exercise III.2.5 Let $H$ be any Hermitian matrix. Then
$$\lambda_j^{\downarrow}(A) - \|H\| \;\le\; \lambda_j^{\downarrow}(A+H) \;\le\; \lambda_j^{\downarrow}(A) + \|H\|.$$
This can be restated as:

Corollary III.2.6 (Weyl's Perturbation Theorem) Let $A$ and $B$ be Hermitian matrices. Then
$$\max_j |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \;\le\; \|A - B\|.$$
Exercise III.2.7 For Hermitian matrices $A$, $B$, we have
$$\|A - B\| \;\le\; \max_j |\lambda_j^{\downarrow}(A) - \lambda_j^{\uparrow}(B)|.$$
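Both perturbation bounds are easy to confirm numerically. The sketch below (NumPy assumed; the random symmetric pair is illustrative) checks Corollary III.2.6 and the companion upper bound of Exercise III.2.7, with $\|\cdot\|$ the operator norm.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
X = rng.standard_normal((n, n)); A = (X + X.T) / 2
Y = rng.standard_normal((n, n)); B = (Y + Y.T) / 2

a_desc = np.sort(np.linalg.eigvalsh(A))[::-1]   # lambda^down(A)
b_desc = np.sort(np.linalg.eigvalsh(B))[::-1]   # lambda^down(B)
b_incr = b_desc[::-1]                           # lambda^up(B)
opnorm = np.linalg.norm(A - B, 2)               # operator norm = top singular value

tol = 1e-10
assert np.max(np.abs(a_desc - b_desc)) <= opnorm + tol   # Corollary III.2.6
assert opnorm <= np.max(np.abs(a_desc - b_incr)) + tol   # Exercise III.2.7
```

Read together, the two assertions are exactly Theorem III.2.8 below for the operator norm.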
It is useful to have another formulation of the above two inequalities, which will be in conformity with more general results proved later. We will denote by $\operatorname{Eig} A$ a diagonal matrix whose diagonal entries are the eigenvalues of $A$. If these are arranged in decreasing order, we write this matrix as $\operatorname{Eig}^{\downarrow}(A)$; if in increasing order, as $\operatorname{Eig}^{\uparrow}(A)$. The results of Corollary III.2.6 and Exercise III.2.7 can then be stated as:

Theorem III.2.8 For any two Hermitian matrices $A$, $B$,
$$\|\operatorname{Eig}^{\downarrow}(A) - \operatorname{Eig}^{\downarrow}(B)\| \;\le\; \|A - B\| \;\le\; \|\operatorname{Eig}^{\downarrow}(A) - \operatorname{Eig}^{\uparrow}(B)\|.$$

Weyl's inequality (III.5) is equivalent to an inequality due to Aronszajn connecting the eigenvalues of a Hermitian matrix to those of any two complementary principal submatrices. For this let us rewrite (III.5) as
$$\lambda_{i+j-1}^{\downarrow}(A+B) \;\le\; \lambda_i^{\downarrow}(A) + \lambda_j^{\downarrow}(B), \tag{III.8}$$
for all indices $i$, $j$ such that $i + j - 1 \le n$.

Theorem III.2.9 (Aronszajn's Inequality) Let $C$ be an $n \times n$ Hermitian matrix partitioned as
$$C = \begin{pmatrix} A & X \\ X^* & B \end{pmatrix},$$
where $A$ is a $k \times k$ matrix. Let the eigenvalues of $A$, $B$, and $C$ be $\alpha_1 \ge \cdots \ge \alpha_k$, $\beta_1 \ge \cdots \ge \beta_{n-k}$, and $\gamma_1 \ge \cdots \ge \gamma_n$, respectively. Then
$$\gamma_{i+j-1} + \gamma_n \;\le\; \alpha_i + \beta_j \quad \text{for all } i, j \text{ with } i + j - 1 \le n. \tag{III.9}$$
Proof. First assume that $\gamma_n = 0$. Then $C$ is a positive matrix. Hence $C = D^*D$ for some matrix $D$. Partition $D$ as $D = (D_1 \ D_2)$, where $D_1$ has $k$ columns. Then
$$C = \begin{pmatrix} D_1^* D_1 & D_1^* D_2 \\ D_2^* D_1 & D_2^* D_2 \end{pmatrix}, \quad \text{so that } A = D_1^* D_1 \text{ and } B = D_2^* D_2.$$
Note that $DD^* = D_1 D_1^* + D_2 D_2^*$. Now the nonzero eigenvalues of the matrix $C = D^*D$ are the same as those of $DD^*$. The same is true for the matrices $A = D_1^* D_1$ and $D_1 D_1^*$, and also for the matrices $B = D_2^* D_2$ and $D_2 D_2^*$. Hence, using Weyl's inequality (III.8), we get (III.9) in this special case.

If $\gamma_n \ne 0$, subtract $\gamma_n I$ from $C$. Then all the eigenvalues of $A$, $B$, and $C$ are translated by $\gamma_n$. By the special case considered above we have
$$\gamma_{i+j-1} - \gamma_n \;\le\; (\alpha_i - \gamma_n) + (\beta_j - \gamma_n),$$
which is the same as (III.9). ∎
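A numerical spot-check of Aronszajn's inequality (III.9), assuming NumPy; the matrix and the partition size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 2
Z = rng.standard_normal((n, n))
C = (Z + Z.T) / 2                                    # Hermitian (real symmetric)

a = np.sort(np.linalg.eigvalsh(C[:k, :k]))[::-1]     # alpha: k x k corner
b = np.sort(np.linalg.eigvalsh(C[k:, k:]))[::-1]     # beta: complementary corner
g = np.sort(np.linalg.eigvalsh(C))[::-1]             # gamma: whole matrix

tol = 1e-10
for i in range(1, k + 1):                # 1-based indices as in (III.9)
    for j in range(1, n - k + 1):
        assert g[i + j - 2] + g[n - 1] <= a[i - 1] + b[j - 1] + tol
```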
We have derived Aronszajn's inequality from Weyl's inequality. But the argument above can be reversed. Let $A$, $B$ be $n \times n$ Hermitian matrices and let $C = A + B$. Let the eigenvalues of these matrices be $\alpha_1 \ge \cdots \ge \alpha_n$, $\beta_1 \ge \cdots \ge \beta_n$, and $\gamma_1 \ge \cdots \ge \gamma_n$, respectively. We want to prove that $\gamma_{i+j-1} \le \alpha_i + \beta_j$. This is the same as
$$\gamma_{i+j-1} - (\alpha_n + \beta_n) \;\le\; (\alpha_i - \alpha_n) + (\beta_j - \beta_n).$$
Hence, we can assume, without loss of generality, that both $A$ and $B$ are positive. Then $A = D_1^* D_1$ and $B = D_2^* D_2$ for some matrices $D_1$, $D_2$. Hence,
$$C = D_1^* D_1 + D_2^* D_2 = (D_1^* \ D_2^*) \binom{D_1}{D_2}.$$
Consider the $2n \times 2n$ matrix
$$E = \binom{D_1}{D_2} (D_1^* \ D_2^*) = \begin{pmatrix} D_1 D_1^* & D_1 D_2^* \\ D_2 D_1^* & D_2 D_2^* \end{pmatrix}.$$
Then the eigenvalues of $E$ are the eigenvalues of $C$ together with $n$ zeroes. Aronszajn's inequality for the partitioned matrix $E$ then gives Weyl's inequality (III.8). By this procedure, several linear inequalities for the eigenvalues of a sum of Hermitian matrices can be transformed into those for the eigenvalues of block Hermitian matrices, and vice versa.

III.3 Wielandt's Minimax Principle
The minimax principle (Corollary III.1.2) gives an extremal characterisation for each eigenvalue $\lambda_j^{\downarrow}$ of a Hermitian matrix $A$. Ky Fan's maximum principle (Problem I.6.15 and Exercise II.1.13) provides an extremal characterisation for the sum $\lambda_1^{\downarrow} + \cdots + \lambda_k^{\downarrow}$ of the top $k$ eigenvalues of $A$. In this section we will prove a deeper result due to Wielandt that subsumes both these principles by providing an extremal representation of any sum $\lambda_{i_1}^{\downarrow} + \cdots + \lambda_{i_k}^{\downarrow}$. The proof involves a more elaborate dimension-counting for intersections of subspaces than was needed earlier.

We will denote by $\mathcal{V} + \mathcal{W}$ the vector sum of two vector spaces $\mathcal{V}$ and $\mathcal{W}$, by $\mathcal{V} - \mathcal{W}$ any linear complement of a space $\mathcal{W}$ in $\mathcal{V}$, and by $\operatorname{span}\{v_1, \ldots, v_k\}$ the linear span of the vectors $v_1, \ldots, v_k$.
Lemma III.3.1 Let $\mathcal{W}_1 \supset \mathcal{W}_2 \supset \cdots \supset \mathcal{W}_k$ be a decreasing chain of vector spaces with $\dim \mathcal{W}_j \ge k - j + 1$. Let $w_j$, $1 \le j \le k-1$, be linearly independent vectors such that $w_j \in \mathcal{W}_j$, and let $\mathcal{U}$ be their linear span. Then there exists a nonzero vector $u$ in $\mathcal{W}_1 - \mathcal{U}$ such that the space $\mathcal{U} + \operatorname{span}\{u\}$ has a basis $v_1, \ldots, v_k$ with $v_j \in \mathcal{W}_j$, $1 \le j \le k$.

Proof. This will be proved by induction on $k$. The statement is easily verified when $k = 2$. Assume that it is true for a chain consisting of $k-1$ spaces. Let $w_1, \ldots, w_{k-1}$ be the given vectors and $\mathcal{U}$ their linear span. Let $\mathcal{S}$ be the linear span of $w_2, \ldots, w_{k-1}$. Apply the induction hypothesis to the chain $\mathcal{W}_2 \supset \cdots \supset \mathcal{W}_k$ to pick up a vector $v$ in $\mathcal{W}_2 - \mathcal{S}$ such that the space $\mathcal{S} + \operatorname{span}\{v\}$ is equal to $\operatorname{span}\{v_2, \ldots, v_k\}$ for some linearly independent vectors $v_j \in \mathcal{W}_j$, $j = 2, \ldots, k$. This vector $v$ may or may not be in the space $\mathcal{U}$. We will consider the two possibilities.

Suppose $v \in \mathcal{U}$. Then $\mathcal{U} = \mathcal{S} + \operatorname{span}\{v\}$, because $\mathcal{U}$ is $(k-1)$-dimensional and $\mathcal{S}$ is $(k-2)$-dimensional. Since $\dim \mathcal{W}_1 \ge k$, there exists a nonzero vector $u$ in $\mathcal{W}_1 - \mathcal{U}$. Then $u, v_2, \ldots, v_k$ form a basis for $\mathcal{U} + \operatorname{span}\{u\}$. Put $v_1 = u$. All requirements are now met.

Suppose $v \notin \mathcal{U}$. Then $w_1 \notin \mathcal{S} + \operatorname{span}\{v\}$, for if $w_1$ were a linear combination of $w_2, \ldots, w_{k-1}$ and $v$, then $v$ would be a linear combination of $w_1, w_2, \ldots, w_{k-1}$, and hence an element of $\mathcal{U}$. So, $\operatorname{span}\{w_1, v_2, \ldots, v_k\}$ is a $k$-dimensional space that must, therefore, be $\mathcal{U} + \operatorname{span}\{v\}$. Now $w_1 \in \mathcal{W}_1$ and $v_j \in \mathcal{W}_j$, $j = 2, \ldots, k$. Again all requirements are met. ∎
Theorem III.3.2 Let $\mathcal{V}_1 \subset \mathcal{V}_2 \subset \cdots \subset \mathcal{V}_k$ be linear subspaces of an $n$-dimensional vector space $\mathcal{V}$, with $\dim \mathcal{V}_j = i_j$, where $1 \le i_1 < i_2 < \cdots < i_k \le n$. Let $\mathcal{W}_1 \supset \mathcal{W}_2 \supset \cdots \supset \mathcal{W}_k$ be subspaces of $\mathcal{V}$, with $\dim \mathcal{W}_j = n - i_j + 1 = \operatorname{codim} \mathcal{V}_j + 1$. Then there exist linearly independent vectors $v_j \in \mathcal{V}_j$, $1 \le j \le k$, and linearly independent vectors $w_j \in \mathcal{W}_j$, $1 \le j \le k$, such that
$$\operatorname{span}\{v_1, \ldots, v_k\} = \operatorname{span}\{w_1, \ldots, w_k\}.$$

Proof. When $k = 1$ the statement is obviously true. (We have used this repeatedly in the earlier sections.) The general case will be proved by induction on $k$. So, let us assume that the theorem has been proved for $k-1$ pairs of subspaces.

By the induction hypothesis, choose $v_j \in \mathcal{V}_j$ and $w_j \in \mathcal{W}_j$, $1 \le j \le k-1$, two sets of linearly independent vectors having the same linear span $\mathcal{U}$. Note that $\mathcal{U}$ is a subspace of $\mathcal{V}_k$. For $j = 1, \ldots, k$, let $\mathcal{S}_j = \mathcal{W}_j \cap \mathcal{V}_k$. Then note that
$$n \;\ge\; \dim \mathcal{W}_j + \dim \mathcal{V}_k - \dim \mathcal{S}_j \;=\; (n - i_j + 1) + i_k - \dim \mathcal{S}_j.$$
Hence,
$$\dim \mathcal{S}_j \;\ge\; i_k - i_j + 1 \;\ge\; k - j + 1.$$
Note that $\mathcal{S}_1 \supset \mathcal{S}_2 \supset \cdots \supset \mathcal{S}_k$ are subspaces of $\mathcal{V}_k$ and $w_j \in \mathcal{S}_j$ for $j = 1, 2, \ldots, k-1$. Hence, by Lemma III.3.1 there exists a vector $u$ in $\mathcal{S}_1 - \mathcal{U}$ such that the space $\mathcal{U} + \operatorname{span}\{u\}$ has a basis $u_1, \ldots, u_k$, where $u_j \in \mathcal{S}_j \subset \mathcal{W}_j$, $j = 1, 2, \ldots, k$. But $\mathcal{U} + \operatorname{span}\{u\}$ is also the linear span of $v_1, \ldots, v_{k-1}$ and $u$. Put $v_k = u$. Then $v_j \in \mathcal{V}_j$, $j = 1, 2, \ldots, k$, and they span the same space as the $u_j$. ∎
Exercise III.3.3 If $\mathcal{V}$ is a Hilbert space, the vectors $v_j$ and $w_j$ in the statement of the above theorem can be chosen to be orthonormal.

Proposition III.3.4 Let $A$ be a Hermitian operator on $\mathcal{H}$ with eigenvectors $u_j$ belonging to the eigenvalues $\lambda_j^{\downarrow}(A)$, $j = 1, 2, \ldots, n$.

(i) Let $\mathcal{V}_j = \operatorname{span}\{u_1, \ldots, u_j\}$, $1 \le j \le n$. Given indices $1 \le i_1 < \cdots < i_k \le n$, choose orthonormal vectors $x_{i_j}$ from the spaces $\mathcal{V}_{i_j}$, $j = 1, \ldots, k$. Let $\mathcal{V}$ be the span of these vectors, and let $A_{\mathcal{V}}$ be the compression of $A$ to the space $\mathcal{V}$. Then
$$\lambda_j^{\downarrow}(A_{\mathcal{V}}) \;\ge\; \lambda_{i_j}^{\downarrow}(A) \quad \text{for } j = 1, \ldots, k.$$

(ii) Let $\mathcal{W}_j = \operatorname{span}\{u_j, \ldots, u_n\}$, $1 \le j \le n$. Choose orthonormal vectors $x_{i_j}$ from the spaces $\mathcal{W}_{i_j}$, $j = 1, \ldots, k$. Let $\mathcal{W}$ be the span of these vectors and $A_{\mathcal{W}}$ the compression of $A$ to $\mathcal{W}$. Then
$$\lambda_j^{\downarrow}(A_{\mathcal{W}}) \;\le\; \lambda_{i_j}^{\downarrow}(A) \quad \text{for } j = 1, \ldots, k.$$

Proof. Let $y_1, \ldots, y_k$ be the eigenvectors of $A_{\mathcal{V}}$ belonging to its eigenvalues $\lambda_1^{\downarrow}(A_{\mathcal{V}}), \ldots, \lambda_k^{\downarrow}(A_{\mathcal{V}})$. Fix $j$, $1 \le j \le k$, and in the space $\mathcal{V}$ consider the spaces spanned by $\{x_{i_1}, \ldots, x_{i_j}\}$ and $\{y_j, \ldots, y_k\}$, respectively. The dimensions of these two spaces add up to $k+1$, while the space $\mathcal{V}$ is $k$-dimensional. Hence there exists a unit vector $u$ in the intersection of these two spaces. For this vector we have
$$\lambda_j^{\downarrow}(A_{\mathcal{V}}) \;\ge\; (u, A_{\mathcal{V}} u) \;=\; (u, Au) \;\ge\; \lambda_{i_j}^{\downarrow}(A).$$
This proves (i). The statement (ii) has exactly the same proof. ∎
Theorem III.3.5 (Wielandt's Minimax Principle) Let $A$ be a Hermitian operator on an $n$-dimensional space $\mathcal{H}$. Then for any indices $1 \le i_1 < \cdots < i_k \le n$ we have
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) \;=\; \max_{\substack{\mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k \\ \dim \mathcal{M}_j = i_j}} \; \min_{\substack{x_j \in \mathcal{M}_j \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, Ax_j)$$
$$=\; \min_{\substack{\mathcal{N}_1 \supset \cdots \supset \mathcal{N}_k \\ \dim \mathcal{N}_j = n - i_j + 1}} \; \max_{\substack{x_j \in \mathcal{N}_j \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, Ax_j).$$
Proof. We will prove the first statement; the second has a similar proof. Let $\mathcal{V}_{i_j} = \operatorname{span}\{u_1, \ldots, u_{i_j}\}$, where, as before, the $u_j$ are eigenvectors of $A$ corresponding to $\lambda_j^{\downarrow}(A)$. For any unit vector $x$ in $\mathcal{V}_{i_j}$, $(x, Ax) \ge \lambda_{i_j}^{\downarrow}(A)$. So, if $x_j \in \mathcal{V}_{i_j}$ are orthonormal vectors, then
$$\sum_{j=1}^{k} (x_j, Ax_j) \;\ge\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$
Since the $x_j$ were quite arbitrary, we have
$$\min_{\substack{x_j \in \mathcal{V}_{i_j} \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, Ax_j) \;\ge\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$
Hence, the desired result will be achieved if we prove that, given any subspaces $\mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k$ with $\dim \mathcal{M}_j = i_j$, we can find orthonormal vectors $x_j \in \mathcal{M}_j$ such that
$$\sum_{j=1}^{k} (x_j, Ax_j) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$
Let $\mathcal{N}_j = \mathcal{W}_{i_j} = \operatorname{span}\{u_{i_j}, \ldots, u_n\}$, $j = 1, 2, \ldots, k$. These spaces were considered in Proposition III.3.4(ii). We have $\mathcal{N}_1 \supset \mathcal{N}_2 \supset \cdots \supset \mathcal{N}_k$ and $\dim \mathcal{N}_j = n - i_j + 1$. Hence, by Theorem III.3.2 and Exercise III.3.3 there exist orthonormal vectors $x_j \in \mathcal{M}_j$ and orthonormal vectors $y_j \in \mathcal{N}_j$ such that
$$\operatorname{span}\{x_1, \ldots, x_k\} = \operatorname{span}\{y_1, \ldots, y_k\} = \mathcal{W}, \ \text{say}.$$
By Proposition III.3.4(ii),
$$\lambda_j^{\downarrow}(A_{\mathcal{W}}) \;\le\; \lambda_{i_j}^{\downarrow}(A) \quad \text{for } j = 1, 2, \ldots, k.$$
Hence,
$$\sum_{j=1}^{k} (x_j, Ax_j) \;=\; \sum_{j=1}^{k} (x_j, A_{\mathcal{W}} x_j) \;=\; \operatorname{tr} A_{\mathcal{W}} \;=\; \sum_{j=1}^{k} \lambda_j^{\downarrow}(A_{\mathcal{W}}) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$
This is what we wanted to prove. ∎

Exercise III.3.6 Note that
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) \;=\; \sum_{j=1}^{k} (u_{i_j}, A u_{i_j}).$$
We have seen that the maximum in the first assertion of Theorem III.3.5 is attained when $\mathcal{M}_j = \mathcal{V}_{i_j} = \operatorname{span}\{u_1, \ldots, u_{i_j}\}$, $j = 1, \ldots, k$, and with this choice the minimum is attained for $x_j = u_{i_j}$, $j = 1, \ldots, k$. Are there other choices of subspaces and vectors for which these extrema are attained? (See Exercise III.1.3.)
Exercise III.3.7 Let $[a, b]$ be an interval containing all the eigenvalues of $A$, and let $\Phi(t_1, \ldots, t_k)$ be any real-valued function on $[a,b] \times \cdots \times [a,b]$ that is monotone in each variable and permutation-invariant. Show that for each choice of indices $1 \le i_1 < \cdots < i_k \le n$,
$$\Phi\bigl(\lambda_{i_1}^{\downarrow}(A), \ldots, \lambda_{i_k}^{\downarrow}(A)\bigr) \;=\; \max_{\substack{\mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k \\ \dim \mathcal{M}_j = i_j}} \; \min_{\substack{x_j \in \mathcal{M}_j \text{ orthonormal} \\ \mathcal{W} = \operatorname{span}\{x_1, \ldots, x_k\}}} \Phi\bigl(\lambda_1^{\downarrow}(A_{\mathcal{W}}), \ldots, \lambda_k^{\downarrow}(A_{\mathcal{W}})\bigr),$$
where $A_{\mathcal{W}}$ is the compression of $A$ to the space $\mathcal{W}$. In Theorem III.3.5 we have proved the special case of this with $\Phi(t_1, \ldots, t_k) = t_1 + \cdots + t_k$.
III.4 Lidskii's Theorems

One important application of Wielandt's minimax principle is in proving a theorem of Lidskii giving a relationship between the eigenvalues of Hermitian
matrices $A$, $B$, and $A+B$. This is quite like our derivation of some of the results in Section III.2 from those in Section III.1.

Theorem III.4.1 Let $A$, $B$ be Hermitian matrices. Then for any choice of indices $1 \le i_1 < \cdots < i_k \le n$,
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A+B) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B). \tag{III.10}$$
Proof. By Theorem III.3.5 there exist subspaces $\mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k$, with $\dim \mathcal{M}_j = i_j$, such that
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A+B) \;=\; \min_{\substack{x_j \in \mathcal{M}_j \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, (A+B)x_j).$$
By Ky Fan's maximum principle,
$$\sum_{j=1}^{k} (x_j, Bx_j) \;\le\; \sum_{j=1}^{k} \lambda_j^{\downarrow}(B)$$
for any choice of orthonormal vectors $x_1, \ldots, x_k$. The above two relations imply that
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A+B) \;\le\; \min_{\substack{x_j \in \mathcal{M}_j \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, Ax_j) \;+\; \sum_{j=1}^{k} \lambda_j^{\downarrow}(B).$$
Now, using Theorem III.3.5 once again, it can be concluded that the first
term on the right-hand side of the above inequality is dominated by $\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A)$. ∎

Corollary III.4.2 If $A$, $B$ are Hermitian matrices, then the eigenvalues of $A$, $B$, and $A+B$ satisfy the following majorisation relation:
$$\lambda^{\downarrow}(A+B) - \lambda^{\downarrow}(A) \;\prec\; \lambda^{\downarrow}(B). \tag{III.11}$$

Exercise III.4.3 (Lidskii's Theorem) The vector $\lambda^{\downarrow}(A+B)$ is in the convex hull of the vectors $\lambda^{\downarrow}(A) + P\lambda^{\downarrow}(B)$, where $P$ varies over all permutation matrices. [This statement and those of Theorem III.4.1 and Corollary III.4.2 are, in fact, equivalent to each other.]
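The majorisation (III.11) can be checked directly on random examples. A sketch assuming NumPy (recall that $x \prec y$ means the decreasingly sorted partial sums of $x$ never exceed those of $y$, with equal totals):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6

def rand_herm(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

def eig_desc(M):
    return np.sort(np.linalg.eigvalsh(M))[::-1]

A, B = rand_herm(n), rand_herm(n)
d = eig_desc(A + B) - eig_desc(A)        # the vector in (III.11)
b = eig_desc(B)

# d is majorised by lambda^down(B): partial sums of the sorted d are
# dominated, and both totals equal tr B.
partial_d = np.cumsum(np.sort(d)[::-1])
partial_b = np.cumsum(b)
assert np.all(partial_d <= partial_b + 1e-10)
assert np.isclose(partial_d[-1], partial_b[-1])
```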
Lidskii 's Theorem can be proved without calling upon the more intricate Wielandt's principle. We will see several other proofs in this book, each highlighting a different viewpoint. The second proof given below is in the spirit of other results of this chapter.
Lidskii's Theorem (second proof). We will prove Theorem III.4.1 by induction on the dimension $n$. Its statement is trivial when $n = 1$. Assume it is true up to dimension $n-1$. When $k = n$, the inequality (III.10) needs no proof. So we may assume that $k < n$. Let $u_j$, $v_j$, and $w_j$ be the eigenvectors of $A$, $B$, and $A+B$ corresponding to their eigenvalues $\lambda_j^{\downarrow}(A)$, $\lambda_j^{\downarrow}(B)$, and $\lambda_j^{\downarrow}(A+B)$. We will consider three cases separately.

Case 1. $i_k < n$. Let $\mathcal{M} = \operatorname{span}\{w_1, \ldots, w_{n-1}\}$, and let $A_{\mathcal{M}}$, $B_{\mathcal{M}}$ be the compressions of $A$, $B$ to the space $\mathcal{M}$. Then, by the induction hypothesis,
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A_{\mathcal{M}} + B_{\mathcal{M}}) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A_{\mathcal{M}}) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B_{\mathcal{M}}).$$
The inequality (III.10) follows from this by using the interlacing principle (III.2) and Exercise III.1.7.

Case 2. $1 < i_1$. Let $\mathcal{M} = \operatorname{span}\{u_2, \ldots, u_n\}$. By the induction hypothesis,
$$\sum_{j=1}^{k} \lambda_{i_j - 1}^{\downarrow}(A_{\mathcal{M}} + B_{\mathcal{M}}) \;\le\; \sum_{j=1}^{k} \lambda_{i_j - 1}^{\downarrow}(A_{\mathcal{M}}) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B_{\mathcal{M}}).$$
Once again, the inequality (III.10) follows from this by using the interlacing principle and Exercise III.1.7.

Case 3. $i_1 = 1$. Given the indices $1 = i_1 < i_2 < \cdots < i_k \le n$, pick up the indices $1 \le \ell_1 < \ell_2 < \cdots < \ell_{n-k} \le n$ such that the set $\{i_j : 1 \le j \le k\}$ is the complement of the set $\{n - \ell_j + 1 : 1 \le j \le n-k\}$ in the set $\{1, 2, \ldots, n\}$. These new indices now come under Case 1. Use (III.10) for this set of indices, but for the matrices $-A$ and $-B$ in place of $A$, $B$. Then note that $\lambda_j^{\downarrow}(-A) = -\lambda_{n-j+1}^{\downarrow}(A)$ for all $1 \le j \le n$. This gives
$$\sum_{j=1}^{n-k} -\lambda_{n-\ell_j+1}^{\downarrow}(A+B) \;\le\; \sum_{j=1}^{n-k} -\lambda_{n-\ell_j+1}^{\downarrow}(A) + \sum_{j=1}^{n-k} -\lambda_{n-j+1}^{\downarrow}(B).$$
Now add $\operatorname{tr}(A+B) = \operatorname{tr} A + \operatorname{tr} B$ to both sides of the above inequality to get
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A+B) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B).$$
This proves the theorem. ∎
As in Section III.2, it is useful to interpret the above results as perturbation theorems. The following statement for Hermitian matrices $A$, $B$ can be derived from (III.11) by changing variables:
$$\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B) \;\prec\; \lambda(A - B). \tag{III.12}$$
This can also be written as
$$\lambda^{\downarrow}(A) + \lambda^{\uparrow}(B) \;\prec\; \lambda(A+B) \;\prec\; \lambda^{\downarrow}(A) + \lambda^{\downarrow}(B). \tag{III.13}$$
In fact, the two right-hand majorisations are consequences of the weaker maximum principle of Ky Fan. As a consequence of (III.12) we have:

Theorem III.4.4 Let $A$, $B$ be Hermitian matrices and let $\Phi$ be any symmetric gauge function on $\mathbb{R}^n$. Then
$$\Phi\bigl(\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B)\bigr) \;\le\; \Phi\bigl(\lambda(A - B)\bigr) \;\le\; \Phi\bigl(\lambda^{\downarrow}(A) - \lambda^{\uparrow}(B)\bigr).$$
Note that Weyl's perturbation theorem (Corollary III.2.6) and the inequality in Exercise III.2.7 are very special cases of this theorem. The majorisations in (III.13) are significant generalisations of those in (II.35), which follow from these by restricting $A$, $B$ to be diagonal matrices. Such "noncommutative" extensions exist for some other results; they are harder to prove. Some are given in this section; many more will occur later.

It is convenient to adopt the following notational shorthand. If $x$, $y$, $z$ are $n$-vectors with nonnegative coordinates, we will write
$$\log x \prec_w \log y \quad \text{if} \quad \prod_{j=1}^{k} x_j \le \prod_{j=1}^{k} y_j \quad \text{for } k = 1, \ldots, n; \tag{III.14}$$
$$\log x \prec \log y \quad \text{if} \quad \log x \prec_w \log y \quad \text{and} \quad \prod_{j=1}^{n} x_j = \prod_{j=1}^{n} y_j; \tag{III.15}$$
and
$$\log x - \log z \prec_w \log y \quad \text{if} \quad \prod_{j=1}^{k} x_{i_j} \le \prod_{j=1}^{k} y_j \prod_{j=1}^{k} z_{i_j} \tag{III.16}$$
for all indices $1 \le i_1 < \cdots < i_k \le n$. Note that we are allowing the possibility of zero coordinates in this notation.
Theorem III.4.5 (Gel'fand–Naimark) Let $A$, $B$ be any two operators on $\mathcal{H}$. Then the singular values of $A$, $B$, and $AB$ satisfy the majorisation
$$\log s(AB) - \log s(B) \;\prec_w\; \log s(A). \tag{III.17}$$

Proof. We will use the result of Exercise III.3.7. Fix any index $k$, $1 \le k \le n$. Choose any $k$ orthonormal vectors $x_1, \ldots, x_k$, and let $\mathcal{W}$ be their linear span. Let $\Phi(t_1, \ldots, t_k) = t_1 t_2 \cdots t_k$. Express $AB$ in its polar form $AB = UP$. Then, denoting by $T_{\mathcal{W}}$ the compression of an operator $T$ to the subspace $\mathcal{W}$, we have
$$|\det P_{\mathcal{W}}|^2 \;=\; |\det((x_i, P_{\mathcal{W}} x_j))|^2 \;=\; |\det((x_i, P x_j))|^2 \;=\; |\det((A^* U x_i, B x_j))|^2.$$
Using Exercise I.5.7 we see that this is dominated by
$$\det\bigl((A^* U x_i, A^* U x_j)\bigr) \cdot \det\bigl((B x_i, B x_j)\bigr).$$
The second of these determinants is equal to $\det (B^*B)_{\mathcal{W}}$; the first is equal to $\det (AA^*)_{U\mathcal{W}}$, and by Corollary III.1.5 is dominated by $\prod_{j=1}^{k} s_j^2(A)$. Hence, we have
$$|\det P_{\mathcal{W}}|^2 \;\le\; \prod_{j=1}^{k} s_j^2(A) \, \det (B^*B)_{\mathcal{W}}.$$
Now, using Exercise III.3.7, we can conclude that
$$\Bigl(\prod_{j=1}^{k} \lambda_{i_j}^{\downarrow}(P)\Bigr)^2 \;\le\; \prod_{j=1}^{k} \lambda_{i_j}^{\downarrow}(|B|^2) \prod_{j=1}^{k} s_j^2(A),$$
i.e.,
$$\prod_{j=1}^{k} s_{i_j}(AB) \;\le\; \prod_{j=1}^{k} s_{i_j}(B) \prod_{j=1}^{k} s_j(A) \tag{III.18}$$
for all $1 \le i_1 < \cdots < i_k \le n$. This, by definition, is what (III.17) says. ∎
Remark. The statement
$$\prod_{j=1}^{k} s_j(AB) \;\le\; \prod_{j=1}^{k} s_j(A) \prod_{j=1}^{k} s_j(B), \tag{III.19}$$
which is a special case of (III.18), is easier to prove. It is just the statement $\|\wedge^k(AB)\| \le \|\wedge^k A\| \, \|\wedge^k B\|$. If we temporarily introduce the notation $s^{\downarrow}(A)$ and $s^{\uparrow}(A)$ for the vectors whose coordinates are the singular values of $A$ arranged in decreasing order and in increasing order, respectively, then the inequalities (III.18) and (III.19) can be combined to yield
$$\log s^{\downarrow}(A) + \log s^{\uparrow}(B) \;\prec\; \log s(AB) \;\prec\; \log s^{\downarrow}(A) + \log s^{\downarrow}(B) \tag{III.20}$$
for any two matrices $A$, $B$. In conformity with our notation, this is a symbolic representation of the inequalities
$$\prod_{j=1}^{k} s_{i_j}(A) \prod_{j=1}^{k} s_{n-i_j+1}(B) \;\le\; \prod_{j=1}^{k} s_j(AB) \;\le\; \prod_{j=1}^{k} s_j(A) \prod_{j=1}^{k} s_j(B)$$
for all $1 \le i_1 < \cdots < i_k \le n$. It is illuminating to compare this with the statement (III.13) for eigenvalues of Hermitian matrices.
Corollary III.4.6 (Lidskii) Let $A$, $B$ be two positive matrices. Then all the eigenvalues of $AB$ are nonnegative, and
$$\log \lambda^{\downarrow}(A) + \log \lambda^{\uparrow}(B) \;\prec\; \log \lambda(AB) \;\prec\; \log \lambda^{\downarrow}(A) + \log \lambda^{\downarrow}(B). \tag{III.21}$$

Proof. It is enough to prove this when $B$ is invertible, since every positive matrix is a limit of such matrices. For invertible $B$ we can write
$$AB \;=\; B^{-1/2} \bigl(B^{1/2} A B^{1/2}\bigr) B^{1/2}.$$
Now $B^{1/2} A B^{1/2}$ is positive; hence the matrix $AB$, which is similar to it, has nonnegative eigenvalues. Now, from (III.20) we obtain
$$\log \lambda^{\downarrow}(A^{1/2}) + \log \lambda^{\uparrow}(B^{1/2}) \;\prec\; \log s(A^{1/2} B^{1/2}) \;\prec\; \log \lambda^{\downarrow}(A^{1/2}) + \log \lambda^{\downarrow}(B^{1/2}). \tag{III.22}$$
But $s^2(A^{1/2} B^{1/2}) = \lambda^{\downarrow}(B^{1/2} A B^{1/2}) = \lambda^{\downarrow}(AB)$. So, the majorisations (III.21) follow from (III.22). ∎
III.5 Eigenvalues of Real Parts and Singular Values

The Cartesian decomposition $A = \operatorname{Re} A + i \operatorname{Im} A$ of a matrix $A$ associates with it two Hermitian matrices $\operatorname{Re} A = \frac{A + A^*}{2}$ and $\operatorname{Im} A = \frac{A - A^*}{2i}$. It is of interest to know relationships between the eigenvalues of these matrices, those of $A$, and the singular values of $A$. Weyl's majorant theorem (Theorem II.3.6) provides one such relationship: $\log |\lambda(A)| \prec \log s(A)$. Some others, whose proofs are in the same spirit as others in this chapter, are given below.

Proposition III.5.1 (Fan–Hoffman) For every matrix $A$,
$$\lambda_j^{\downarrow}(\operatorname{Re} A) \;\le\; s_j(A) \quad \text{for all } j = 1, \ldots, n.$$

Proof. Let $x_j$ be eigenvectors of $\operatorname{Re} A$ belonging to its eigenvalues $\lambda_j^{\downarrow}(\operatorname{Re} A)$, and $y_j$ eigenvectors of $|A|$ belonging to its eigenvalues $s_j(A)$, $1 \le j \le n$. For each $j$ consider the spaces $\operatorname{span}\{x_1, \ldots, x_j\}$ and $\operatorname{span}\{y_j, \ldots, y_n\}$. Their dimensions add up to $n+1$, so they have a nonzero intersection. If $x$ is a unit vector in their intersection, then
$$\lambda_j^{\downarrow}(\operatorname{Re} A) \;\le\; (x, (\operatorname{Re} A)x) \;=\; \operatorname{Re}(x, Ax) \;\le\; |(x, Ax)| \;\le\; \|Ax\| \;=\; (x, A^*Ax)^{1/2} \;\le\; s_j(A). \qquad \blacksquare$$
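The Fan–Hoffman inequality is immediate to verify numerically. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Eigenvalues of Re A in decreasing order, and singular values of A
# (numpy's svd already returns them in decreasing order).
re_eigs = np.sort(np.linalg.eigvalsh((A + A.conj().T) / 2))[::-1]
svals = np.linalg.svd(A, compute_uv=False)

assert np.all(re_eigs <= svals + 1e-10)     # lambda_j(Re A) <= s_j(A) for all j
```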
Exercise III.5.2 (i) Let $A$ be the $2 \times 2$ matrix $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$. Then $s_2(A) = 0$, but $\operatorname{Re} A$ has two nonzero eigenvalues. Hence the vector $|\lambda(\operatorname{Re} A)|^{\downarrow}$ is not dominated by the vector $s(A)$.

(ii) However, note that $|\lambda(\operatorname{Re} A)| \prec_w s(A)$ for every matrix $A$. (Use the triangle inequality for Ky Fan norms.)

Proposition III.5.3 (Ky Fan) For every matrix $A$ we have
$$\operatorname{Re} \lambda(A) \;\prec\; \lambda(\operatorname{Re} A).$$

Proof. Arrange the eigenvalues $\lambda_j(A)$ in such a way that $\operatorname{Re}\lambda_1(A) \ge \cdots \ge \operatorname{Re}\lambda_n(A)$. Let $x_1, \ldots, x_n$ be an orthonormal Schur basis for $A$ such that $\lambda_j(A) = (x_j, Ax_j)$. Then $\operatorname{Re}\lambda_j(A) = (x_j, (\operatorname{Re} A)x_j)$. Let $\mathcal{W} = \operatorname{span}\{x_1, \ldots, x_k\}$. Then
$$\sum_{j=1}^{k} \operatorname{Re}\lambda_j(A) \;=\; \sum_{j=1}^{k} (x_j, (\operatorname{Re} A)x_j) \;=\; \operatorname{tr}(\operatorname{Re} A)_{\mathcal{W}} \;=\; \sum_{j=1}^{k} \lambda_j\bigl((\operatorname{Re} A)_{\mathcal{W}}\bigr) \;\le\; \sum_{j=1}^{k} \lambda_j^{\downarrow}(\operatorname{Re} A).$$
Since $\sum_{j=1}^{n} \operatorname{Re}\lambda_j(A) = \operatorname{tr} \operatorname{Re} A = \sum_{j=1}^{n} \lambda_j(\operatorname{Re} A)$, the majorisation follows. ∎
Exercise III.5.4 Give another proof of Proposition III.5.3 using Schur's theorem (given in Exercise II.1.12).

Exercise III.5.5 (i) Let $X$, $Y$ be Hermitian matrices. Suppose that their eigenvalues can be indexed as $\lambda_j(X)$ and $\lambda_j(Y)$, $1 \le j \le n$, in such a way that $\lambda_j(X) \le \lambda_j(Y)$ for all $j$. Then there exists a unitary $U$ such that $X \le U^* Y U$.

(ii) For every matrix $A$ there exists a unitary matrix $U$ such that $\operatorname{Re} A \le U^* |A| U$.

An interesting consequence of Proposition III.5.1 is the following version of the triangle inequality for the matrix absolute value:
Theorem III.5.6 (R. C. Thompson) Let $A$, $B$ be any two matrices. Then there exist unitary matrices $U$, $V$ such that
$$|A + B| \;\le\; U|A|U^* + V|B|V^*.$$

Proof. Let $A + B = W|A+B|$ be a polar decomposition of $A+B$. Then we can write
$$|A+B| \;=\; W^*(A+B) \;=\; \operatorname{Re} W^*(A+B) \;=\; \operatorname{Re} W^*A + \operatorname{Re} W^*B.$$
Now use Exercise III.5.5(ii). ∎
Exercise III.5.7 (i) Find $2 \times 2$ matrices $A$, $B$ such that the inequality $|A+B| \le |A| + |B|$ is false for them.

(ii) Find $2 \times 2$ matrices $A$, $B$ for which there does not exist any unitary matrix $U$ such that $|A+B| \le U(|A| + |B|)U^*$.

III.6 Problems
Problem III.6.1 (The minimax principle for singular values). For any operator $A$ on $\mathcal{H}$ we have
$$s_j(A) \;=\; \max_{\mathcal{M} : \dim \mathcal{M} = j} \; \min_{\substack{x \in \mathcal{M} \\ \|x\|=1}} \|Ax\| \;=\; \min_{\mathcal{N} : \dim \mathcal{N} = n-j+1} \; \max_{\substack{x \in \mathcal{N} \\ \|x\|=1}} \|Ax\|$$
for $1 \le j \le n$.
Problem III.6.2 Let $A$, $B$ be any two operators. Then
$$s_j(AB) \le \|B\| \, s_j(A), \qquad s_j(AB) \le \|A\| \, s_j(B) \qquad \text{for } 1 \le j \le n.$$
Problem III.6.3 For $j = 0, 1, \ldots, n$, let
$$\mathcal{R}_j = \{T \in \mathcal{L}(\mathcal{H}) : \operatorname{rank} T \le j\}.$$
Show that for $j = 1, 2, \ldots, n$,
$$s_j(A) \;=\; \min_{T \in \mathcal{R}_{j-1}} \|A - T\|.$$

Problem III.6.4 Show that if $A$ is any operator and $H$ is any operator of rank $k$, then
$$s_j(A) \;\ge\; s_{j+k}(A + H), \qquad j = 1, 2, \ldots, n-k.$$

Problem III.6.5 For any two operators $A$, $B$ and any two indices $i$, $j$ such that $i + j \le n + 1$, we have
$$s_{i+j-1}(A+B) \;\le\; s_i(A) + s_j(B),$$
$$s_{i+j-1}(AB) \;\le\; s_i(A) \, s_j(B).$$
Problem III.6.6 Show that for every operator $A$ and for each $k = 1, 2, \ldots, n$, we have
$$\sum_{j=1}^{k} s_j(A) \;=\; \max \sum_{j=1}^{k} |(y_j, Ax_j)|,$$
where the maximum is over all choices of orthonormal $k$-tuples $x_1, \ldots, x_k$ and $y_1, \ldots, y_k$. This can also be written as
$$\sum_{j=1}^{k} s_j(A) \;=\; \max \sum_{j=1}^{k} (x_j, UAx_j),$$
where the maximum is taken over all choices of unitary operators $U$ and orthonormal $k$-tuples $x_1, \ldots, x_k$. Note that for $k = 1$ this reduces to the statement
$$\|A\| \;=\; \max_{\|x\| = \|y\| = 1} |(y, Ax)|.$$
For k == 1 , 2, . . . the above extremal representations can be used to give k another proof of the fact that the expressions I I A I I c k ) == L sj ( A) are norms . j=l ( See Exercise II. l. l 5 . ) ( aij ) be a Hermitian matrix. For each i Problem 111.6.7. Let A 1, . . . let 1/ 2 ri == L 1 aij l 2 j::f:. i Show that each interval [a ii  ri, aii + ri ] contains at least one eigenvalue of A . Problem 111.6.8. Let a 1 > a 2 > > a n be the eigenvalues of a Her mitian matrix A. We have seen that the n  1 eigenvalues of any principal submatrix of A interlace with these numbers. If 81 > 82 > · · · > 8n  1 are the roots of the polynomial that is the derivative of the characteristic polynomial of A, then we have by Rolle's Theorem , n,
, n,
·
·
·
Show that for each j there exists a principal submatrix B of A for which a1 > AJ ( B) > 8j and another principal submatrix C for which 8j >
AJ (C) > ai + l ·
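The localisation statement of Problem III.6.7 is easy to test numerically. The sketch below is not from the book; it assumes NumPy and a randomly generated complex Hermitian matrix, and checks that each interval [a_ii − r_i, a_ii + r_i] captures at least one eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2          # a random Hermitian matrix

eigs = np.linalg.eigvalsh(A)      # real eigenvalues
for i in range(n):
    # r_i is the Euclidean norm of the off-diagonal part of row i.
    r_i = np.sqrt(sum(abs(A[i, j]) ** 2 for j in range(n) if j != i))
    # Some eigenvalue lies within r_i of the diagonal entry a_ii.
    assert np.min(np.abs(eigs - A[i, i].real)) <= r_i + 1e-12
```

This is a refinement of the Gershgorin picture: the radius here is the l_2 norm of the off-diagonal row, not its l_1 norm.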
Problem III.6.9. Most of the results in this chapter gave descriptions of eigenvalues of a Hermitian operator A in terms of the numbers (x, Ax) when x varies over unit vectors. Sometimes in computational problems an "approximate" eigenvalue λ and an "approximate" eigenvector x are already known. The number (x, Ax) can then be used to further refine this information. For a given unit vector x, let

ρ = (x, Ax),   ε = ||(A − ρ)x||.

(i) Let (a, b) be an open interval that contains ρ but does not contain any eigenvalue of A. Show that

(b − ρ)(ρ − a) ≤ ε².

(ii) Show that there exists an eigenvalue α of A such that

|α − ρ| ≤ ε.
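Part (ii) of Problem III.6.9 is the basic residual bound: the Rayleigh quotient of any unit vector lies within the residual norm of some true eigenvalue. The following sketch is an illustration added here (not from the original text), assuming NumPy and a random symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                  # random real symmetric matrix

x = rng.standard_normal(n)
x /= np.linalg.norm(x)             # an arbitrary unit "approximate eigenvector"

rho = x @ A @ x                    # Rayleigh quotient rho = (x, Ax)
eps = np.linalg.norm(A @ x - rho * x)

eigs = np.linalg.eigvalsh(A)
# Part (ii): some eigenvalue alpha satisfies |alpha - rho| <= eps.
assert np.min(np.abs(eigs - rho)) <= eps + 1e-12
```

The better x approximates an eigenvector, the smaller ε becomes, and the bound tightens accordingly.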
Problem III.6.10. Let ρ and ε be defined as in the above problem. Let (a, b) be an open interval that contains ρ and only one eigenvalue α of A. Then

ρ − ε²/(b − ρ) ≤ α ≤ ρ + ε²/(ρ − a).

This is called the Kato–Temple inequality. Note that if ρ − a and b − ρ are much larger than ε, then this improves the inequality in part (ii) of Problem III.6.9.

Problem III.6.11. Show that for every Hermitian matrix A

∑_{j=1}^k λ_j^↓(A) = max tr UAU*,
∑_{j=1}^k λ_j^↑(A) = min tr UAU*,

for 1 ≤ k ≤ n, where the extrema are taken over k × n matrices U that satisfy UU* = I_k, I_k being the k × k identity matrix. Show that if A is positive, then

∏_{j=1}^k λ_j^↓(A) = max det UAU*,
∏_{j=1}^k λ_j^↑(A) = min det UAU*.

(See Problem I.6.15.)

Problem III.6.12. Let A, B be any matrices. Then

∑_{j=1}^n s_j(A) s_j(B) = sup_{U,V} |tr UAVB| = sup_{U,V} Re tr UAVB,
where U, V vary over all unitary matrices.

Problem III.6.13. (Perturbation theorem for singular values) Let A, B be any n × n matrices and let Φ be any symmetric gauge function. Then

Φ(s(A) − s(B)) ≤ Φ(s(A − B)).

In particular,

max_j |s_j(A) − s_j(B)| ≤ ||A − B||.

[Hint: See Theorem III.4.4 and Exercise II.1.15.]

Problem III.6.14. For positive matrices A, B, and then for Hermitian matrices A, B, show that

⟨λ^↓(A), λ^↑(B)⟩ ≤ tr AB ≤ ⟨λ^↓(A), λ^↓(B)⟩.

(Compare these with (II.36) and (II.37).)

Problem III.6.15. Let A, B be Hermitian matrices. Use the second part of Problem III.6.14 to show that

||λ^↓(A) − λ^↓(B)||₂ ≤ ||A − B||₂ ≤ ||λ^↓(A) − λ^↑(B)||₂.

Note the analogy between this and Theorem III.2.8. (In Chapter IV we will see that both these results are true for a whole family of norms called unitarily invariant norms. This more general result is a consequence of Theorem III.4.4.)

III.7 Notes and References
As pointed out in Exercise III.1.6, many of the results in Sections III.1 and III.2 could be derived from each other. Hence, it seems fair to say that the variational principles for eigenvalues originated with A.L. Cauchy's interlacing theorem. A pertinent reference is Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes, 1829, in A.L. Cauchy, Oeuvres Complètes (IIe Série), Volume 9, Gauthier-Villars.

The minimax principle was first stated by E. Fischer, Über quadratische Formen mit reellen Koeffizienten, Monatsh. Math. Phys., 16 (1905) 234-249. The monotonicity principle and many of the results of Section III.2 were proved by H. Weyl in Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen, Math. Ann., 71 (1911) 441-469. In a series of papers beginning with Über die Eigenwerte bei den Differentialgleichungen der mathematischen Physik, Math. Z., 7 (1920) 1-57,
R. Courant exploited the full power of the minimax principle. Thus the principle is often described as the Courant-Fischer-Weyl principle.

As the titles of these papers suggest, the variational principles for eigenvalues were discovered in connection with problems of physics. One famous work where many of these were used is The Theory of Sound by Lord Rayleigh, reprinted by Dover in 1945. The modern applied mathematics classic Methods of Mathematical Physics by R. Courant and D. Hilbert, Wiley, 1953, is replete with applications of variational principles. For a still more recent source, see M. Reed and B. Simon, Methods of Modern Mathematical Physics, Volume 4, Academic Press, 1978. Of course, here most of the interest is in infinite-dimensional problems, and consequently the results are much more complicated. The numerical analyst could turn to B.N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, 1980, and to G.W. Stewart and J.G. Sun, Matrix Perturbation Theory, Academic Press, 1990.

The converse to the interlacing theorem given in Theorem III.1.9 was first proved in L. Mirsky, Matrices with prescribed characteristic roots and diagonal elements, J. London Math. Soc., 33 (1958) 14-21. We do not know whether the similar question for higher dimensional compressions has been answered. More precisely, let α_1 ≥ ··· ≥ α_n and β_1 ≥ ··· ≥ β_n be real numbers such that Σα_i = Σβ_i. What conditions must these numbers satisfy so that there exists an orthogonal projection P of rank k such that the matrix A = diag(α_1, ..., α_n), when compressed to range P, has eigenvalues β_1, ..., β_k, and when compressed to (range P)^⊥ has eigenvalues β_{k+1}, ..., β_n? (Theorem III.1.9 is the case k = n − 1.)

Aronszajn's inequality appeared in N. Aronszajn, Rayleigh-Ritz and A. Weinstein methods for approximation of eigenvalues. I. Operators in a Hilbert space, Proc. Nat. Acad. Sci. U.S.A., 34 (1948) 474-480. The elegant proof of its equivalence to Weyl's inequality is due to H.W. Wielandt, Topics in the Analytic Theory of Matrices, mimeographed lecture notes, University of Wisconsin, 1967.

Theorem III.3.5 was proved in H.W. Wielandt, An extremum property of sums of eigenvalues, Proc. Amer. Math. Soc., 6 (1955) 106-110. The motivation for Wielandt was that he "did not succeed in completing the interesting sketch of a proof given by Lidskii" of the statement given in Exercise III.4.3. He noted that this is equivalent to what we have stated as Theorem III.4.1, and derived it from his new minimax principle. Interestingly, now several different proofs of Lidskii's Theorem are known. The second proof given in Section III.4 is due to M.F. Smiley, Inequalities related to Lidskii's, Proc. Amer. Math. Soc., 19 (1968) 1029-1034. We will see some other proofs later. However, Theorem III.3.5 is more general, has several other applications, and has led to a lot of research. An account of the earlier work on these questions may be found in A.R. Amir-Moez, Extreme Properties of Linear Transformations and Geometry in Unitary Spaces, Texas Tech University, 1968, from which our treatment of Section III.3 has been adapted. An attempt to extend these ideas to infinite
dimensions was made in R.C. Riddell, Minimax problems on Grassmann manifolds, Advances in Math., 54 (1984) 107-199, where connections with differential geometry and some problems in quantum physics are also developed. The tower of subspaces occurring in Theorem III.3.5 suggests a connection with Schubert calculus in algebraic geometry. This connection is yet to be fully understood.

Lidskii's Theorem has an interesting history. It appeared first in V.B. Lidskii, On the proper values of a sum and product of symmetric matrices, Dokl. Akad. Nauk SSSR, 75 (1950) 769-772. It seems that Lidskii provided an elementary (matrix analytic) proof of the result which F. Berezin and I.M. Gel'fand had proved by more advanced (Lie theoretic) techniques in connection with their work that appeared later in Some remarks on the theory of spherical functions on symmetric Riemannian manifolds, Trudi Moscow Math. Ob., 5 (1956) 311-351. As mentioned above, difficulties with this "elementary" proof led Wielandt to the discovery of his minimax principle.

Among the several directions this work opened up, one led to the following question. What relations must three n-tuples of real numbers satisfy in order to be the eigenvalues of some Hermitian matrices A, B, and A + B? Necessary conditions are given by Theorem III.4.1. Many more were discovered by others. A. Horn, Eigenvalues of sums of Hermitian matrices, Pacific J. Math., 12 (1962) 225-242, derived necessary and sufficient conditions in the above problem for the case n = 4, and wrote down a set of conditions which he conjectured would be necessary and sufficient for n > 4. In a short paper, Spectral polyhedron of a sum of two Hermitian matrices, Functional Analysis and Appl., 10 (1982) 76-77, B.V. Lidskii has sketched a "proof" establishing Horn's conjecture. This proof, however, needs a lot of details to be filled in; these have not yet been published by B.V. Lidskii (or anyone else). When should a theorem be considered to be proved? For an interesting discussion of this question, see S. Smale, The fundamental theorem of algebra and complexity theory, Bull. Amer. Math. Soc. (New Series), 4 (1981) 1-36.

Theorem III.4.5 was proved in I.M. Gel'fand and M. Naimark, The relation between the unitary representations of the complex unimodular group and its unitary subgroup, Izv. Akad. Nauk SSSR Ser. Mat., 14 (1950) 239-260. Many of the questions concerning eigenvalues and singular values of sums and products were first framed in this paper. An excellent summary of these results can be found in A.S. Markus, The eigen- and singular values of the sum and product of linear operators, Russian Math. Surveys, 19 (1964) 92-120.

The structure of inequalities like (III.10) and (III.18) was carefully analysed in several papers by R.C. Thompson and his students. The asymmetric way in which A and B enter (III.10) is remedied by one of their inequalities,
which says

∑_{j=1}^k λ^↓_{i_j + p_j − j}(A + B) ≤ ∑_{j=1}^k λ^↓_{i_j}(A) + ∑_{j=1}^k λ^↓_{p_j}(B)

for any indices 1 ≤ i_1 < ··· < i_k ≤ n and 1 ≤ p_1 < ··· < p_k ≤ n such that i_k + p_k − k ≤ n. A similar generalisation of (III.18) has also been proved. References to this work may be found in the book by Marshall and Olkin cited in Chapter II.

Proposition III.5.1 is proved in K. Fan and A.J. Hoffman, Some metric inequalities in the space of matrices, Proc. Amer. Math. Soc., 6 (1955) 111-116. Results of Proposition III.5.3, Problems III.6.5, III.6.6, III.6.11, and III.6.12 were first proved by Ky Fan in several papers. References to these may be found in I.C. Gohberg and M.G. Krein, Introduction to the Theory of Linear Nonselfadjoint Operators, American Math. Society, 1969, and in the Marshall-Olkin book cited earlier.

The matrix triangle inequality (Theorem III.5.6) was proved in R.C. Thompson, Convex and concave functions of singular values of matrix sums, Pacific J. Math., 66 (1976) 285-290. An extension to infinite dimensions was attempted in C. Akemann, J. Anderson, and G. Pedersen, Triangle inequalities in operator algebras, Linear and Multilinear Algebra, 11 (1982) 167-178. For operators A, B on an infinite-dimensional Hilbert space there exist isometries U, V such that

|A + B| ≤ U|A|U* + V|B|V*.

Also, for each ε > 0 there exist unitaries U, V such that

|A + B| ≤ U|A|U* + V|B|V* + εI.

It is not known whether the εI part in the last statement is necessary.

Refinements of the interlacing principle such as the one in Problem III.6.8 have been obtained by several authors, including R.C. Thompson. See, for example, his paper Principal submatrices II, Linear Algebra Appl., 1 (1968) 211-243.

One may wonder whether there are interlacing theorems for singular values. There are, although they are a little different from the ones for eigenvalues. This is best understood if we extend the definition of singular values to rectangular matrices. Let A be an m × n matrix. Let r = min(m, n). The r numbers that are the common eigenvalues of (A*A)^{1/2} and (AA*)^{1/2} are called the singular values of A. (Sometimes a sequence of zeroes is added to make max(m, n) singular values in all.) Many of the results for singular values that we have proved can be carried over to this setting. See, e.g., the books by Horn and Johnson cited in Chapter I.
Let A be a rectangular matrix and let B be a matrix obtained by deleting any row or any column of A. Then the minimax principle can be used to prove that the singular values of A and B interlace. The reader should work this out, and see that when A is an n × n matrix and B a principal submatrix of order n − 1, then this gives
s_1(A) ≥ s_1(B) ≥ s_3(A),
s_2(A) ≥ s_2(B) ≥ s_4(A),
. . . . . . . . . . . . . . .
s_{n−2}(A) ≥ s_{n−2}(B) ≥ s_n(A),
s_{n−1}(A) ≥ s_{n−1}(B) ≥ 0.
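The interlacing pattern above can be verified numerically. The sketch below is an added illustration (not from the book), assuming NumPy and a random square matrix whose first row and column are deleted to form the principal submatrix.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 7
A = rng.standard_normal((n, n))
B = A[1:, 1:]                     # a principal submatrix of order n - 1

sA = np.linalg.svd(A, compute_uv=False)   # decreasing singular values
sB = np.linalg.svd(B, compute_uv=False)

for j in range(n - 1):
    assert sA[j] >= sB[j] - 1e-12         # s_j(A) >= s_j(B)
    if j + 2 < n:
        assert sB[j] >= sA[j + 2] - 1e-12 # s_j(B) >= s_{j+2}(A)
```

Note the shift by two: deleting a row and a column can push a singular value of B below s_{j+1}(A), but never below s_{j+2}(A).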
For more such results, see R.C. Thompson, Principal submatrices IX, Linear Algebra and Appl., 5 (1972) 1-12.

Inequalities like the ones in Problems III.6.9 and III.6.10 are called "residual bounds" in the numerical analysis literature. For more such results, see the book by Parlett cited above, and F. Chatelin, Spectral Approximation of Linear Operators, Academic Press, 1983. Several refinements, extensions, and applications of these results in atomic physics are described in the book by Reed and Simon cited above.

The results of Theorem III.4.4 and Problem III.6.13 were noted by L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. Oxford Ser. (2), 11 (1960) 50-59. This paper contains a lucid survey of several related problems and has stimulated a lot of research. The inequalities in Problem III.6.15 were first stated in K. Löwner, Über monotone Matrixfunktionen, Math. Z., 38 (1934) 177-216.

Let A = UP be a polar decomposition of A. Weyl's majorant theorem gives a relationship between the eigenvalues of A and those of P (the singular values of A). A relation between the eigenvalues of A and those of U was proved by A. Horn and R. Steinberg, Eigenvalues of the unitary part of a matrix, Pacific J. Math., 9 (1959) 541-550. This is in the form of a majorisation between the arguments of the eigenvalues:

arg λ(A) ≺ arg λ(U).

A theorem very much like Theorems III.4.1 and III.4.5 was proved by A. Nudel'man and P. Svarcman, The spectrum of a product of unitary matrices, Uspehi Mat. Nauk, 13 (1958) 111-117. Let A, B be unitary matrices. Label the eigenvalues of A, B, and AB as e^{iα_1}, ..., e^{iα_n}; e^{iβ_1}, ..., e^{iβ_n}; and e^{iγ_1}, ..., e^{iγ_n}, respectively, in such a way that

2π > α_1 ≥ ··· ≥ α_n ≥ 0,
2π > β_1 ≥ ··· ≥ β_n ≥ 0,
2π > γ_1 ≥ ··· ≥ γ_n ≥ 0.
If α_1 + β_1 < 2π, then for any choice of indices 1 ≤ i_1 < ··· < i_k ≤ n we have

∑_{j=1}^k γ_{i_j} ≤ ∑_{j=1}^k α_{i_j} + ∑_{j=1}^k β_j.

These inequalities can also be written in the form of a majorisation between n-vectors:

γ − α ≺ β.

For a generalisation in the same spirit as the one of inequalities (III.10) and (III.18) mentioned earlier, see R.C. Thompson, On the eigenvalues of a product of unitary matrices, Linear and Multilinear Algebra, 2 (1974) 13-24.
IV. Symmetric Norms
In this chapter we study norms on the space of matrices that are invariant under multiplication by unitaries. Their properties are closely linked to those of symmetric gauge functions on R^n. We also study norms that are invariant under unitary conjugations. Some of the inequalities proved in earlier chapters lead to inequalities involving these norms.

IV.1 Norms on C^n
Let us begin by considering the familiar p-norms frequently used in analysis. For a vector x = (x_1, ..., x_n) we define

||x||_p = (∑_{i=1}^n |x_i|^p)^{1/p},   1 ≤ p < ∞,   (IV.1)
||x||_∞ = max_{1≤i≤n} |x_i|.   (IV.2)

For each 1 ≤ p ≤ ∞, ||x||_p defines a norm on C^n. These are called the p-norms or the l_p-norms. The notation (IV.2) is justified because of the fact that

||x||_∞ = lim_{p→∞} ||x||_p.   (IV.3)

Some of the pleasant properties of this family of norms are

||x||_p = || |x| ||_p,   (IV.4)
||x||_p ≤ ||y||_p if |x| ≤ |y|,   (IV.5)
||x||_p = ||Px||_p for all x ∈ C^n, P ∈ S_n.   (IV.6)

(Recall the notations: |x| = (|x_1|, ..., |x_n|), and |x| ≤ |y| if |x_j| ≤ |y_j| for 1 ≤ j ≤ n. S_n is the set of permutation matrices.) A norm on C^n is called gauge invariant or absolute if it satisfies the condition (IV.4), monotone if it satisfies (IV.5), and permutation invariant or symmetric if it satisfies (IV.6). The first two of these conditions turn out to be equivalent:

Proposition IV.1.1 A norm on C^n is gauge invariant if and only if it is monotone.
Proof. Monotonicity clearly implies gauge invariance. Conversely, if a norm ||·|| is gauge invariant, then to show that it is monotone it is enough to show that ||x|| ≤ ||y|| whenever x_j = t_j y_j for some real numbers 0 ≤ t_j ≤ 1, j = 1, 2, ..., n. Further, it suffices to consider the special case when all t_j except one are equal to 1. But then

||(y_1, ..., t y_k, ..., y_n)||
= || (1+t)/2 (y_1, ..., y_k, ..., y_n) + (1−t)/2 (y_1, ..., −y_k, ..., y_n) ||
≤ (1+t)/2 ||(y_1, ..., y_n)|| + (1−t)/2 ||(y_1, ..., −y_k, ..., y_n)||
= ||(y_1, ..., y_n)||.
Example IV.1.2 Consider the following norms on R²:

(i) ||x|| = |x_1| + |x_2| + |x_1 − x_2|.
(ii) ||x|| = |x_1| + |x_1 − x_2|.
(iii) ||x|| = 2|x_1| + |x_2|.

The first of these is symmetric but not gauge invariant, the second is neither symmetric nor gauge invariant, while the third is not symmetric but is gauge invariant.
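The three claims in Example IV.1.2 can be checked on a single test vector. This small sketch is an added illustration (not part of the book's text), assuming NumPy; a symmetry failure is witnessed by swapping coordinates, a gauge-invariance failure by flipping one sign.

```python
import numpy as np

norm1 = lambda x: abs(x[0]) + abs(x[1]) + abs(x[0] - x[1])   # (i)
norm2 = lambda x: abs(x[0]) + abs(x[0] - x[1])               # (ii)
norm3 = lambda x: 2 * abs(x[0]) + abs(x[1])                  # (iii)

x = np.array([1.0, -2.0])
swap = x[::-1]                    # permuted coordinates
flip = np.array([x[0], -x[1]])    # one sign changed, same |x|

# (i) is symmetric but not gauge invariant:
assert norm1(x) == norm1(swap) and norm1(x) != norm1(flip)
# (ii) is neither symmetric nor gauge invariant:
assert norm2(x) != norm2(swap) and norm2(x) != norm2(flip)
# (iii) is gauge invariant but not symmetric:
assert norm3(x) == norm3(flip) and norm3(x) != norm3(swap)
```

A single counterexample vector suffices to refute each invariance property, while the equalities here only illustrate (rather than prove) the properties that do hold.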
Norms that are both symmetric and gauge invariant are especially interesting. Before studying more examples and properties of such norms, let us make a few remarks. Let T be the circle group; i.e., the multiplicative group of all complex numbers of modulus 1. Let S_n ∘ T be the semidirect product of S_n and T. In other words, this is the group of all n × n matrices that have exactly one nonzero entry on each row and each column, and this nonzero entry has modulus 1. We will call such matrices complex permutation matrices. Then a norm ||·|| on C^n is symmetric and gauge invariant if and only if

||x|| = ||Tx|| for all complex permutations T.   (IV.7)
In other words, the group of (linear) isometries for ||·|| contains S_n ∘ T as a subgroup. (Linear isometries for a norm ||·|| are those linear transformations on C^n that preserve ||·||.)

Exercise IV.1.3 For the Euclidean norm ||x||_2 = (∑ |x_i|²)^{1/2}, the group of isometries is the group of all unitary matrices, which is much larger than the complex permutation group. Show that for each of the norms ||x||_1 and ||x||_∞ the group of isometries is the complex permutation group.

Note that gauge invariant norms on C^n are determined by those on R^n. Symmetric gauge invariant norms on R^n are called symmetric gauge functions. We have come across them earlier (Example II.3.13). To repeat, a map Φ : R^n → R_+ is called a symmetric gauge function if

(i) Φ is a norm,
(ii) Φ(Px) = Φ(x) for all x ∈ R^n and P ∈ S_n,
(iii) Φ(ε_1 x_1, ..., ε_n x_n) = Φ(x_1, ..., x_n) if ε_j = ±1.

In addition, we will always assume that Φ is normalised, so that

(iv) Φ(1, 0, ..., 0) = 1.
The conditions (ii) and (iii) can be expressed together by saying that Φ is invariant under the group S_n ∘ Z_2 consisting of permutations and sign changes of the coordinates. Notice also that a symmetric gauge function is completely determined by its values on R^n_+.

Example IV.1.4 If the coordinates of x are arranged so that |x_1| ≥ |x_2| ≥ ··· ≥ |x_n|, then for each k = 1, 2, ..., n, the function

Φ_(k)(x) = ∑_{j=1}^k |x_j|   (IV.8)

is a symmetric gauge function. We will also use the notation ||x||_(k) for these. The parentheses are used to distinguish these norms from the p-norms defined earlier. Indeed, note that ||x||_(1) = ||x||_∞ and ||x||_(n) = ||x||_1.

We have observed in Problem II.5.11 that these norms play a very distinguished role: if Φ_(k)(x) ≤ Φ_(k)(y) for all k = 1, 2, ..., n, then Φ(x) ≤ Φ(y) for every symmetric gauge function Φ. Thus an infinite family of norm inequalities follows from a finite one.

Proposition IV.1.5 For each k = 1, 2, ..., n,

Φ_(k)(x) = min {Φ_(n)(u) + k Φ_(1)(v) : x = u + v}.   (IV.9)
Proof. We may assume, without loss of generality, that x ∈ R^n_+ and that its coordinates are arranged in decreasing order. If x = u + v, then

Φ_(k)(x) ≤ Φ_(k)(u) + Φ_(k)(v) ≤ Φ_(n)(u) + k Φ_(1)(v).

If we choose

u = (x_1 − x_k, x_2 − x_k, ..., x_k − x_k, 0, ..., 0),
v = (x_k, ..., x_k, x_{k+1}, ..., x_n),

then u + v = x,

Φ_(n)(u) = Φ_(k)(x) − k x_k,   Φ_(1)(v) = x_k,

and the proposition follows.
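The decomposition constructed in the proof of Proposition IV.1.5 is easy to exhibit concretely. The following sketch is an added numerical check (not from the book), assuming NumPy and a small decreasing positive vector.

```python
import numpy as np

def phi_k(x, k):
    """Ky Fan gauge function: sum of the k largest |x_j|."""
    return np.sort(np.abs(x))[::-1][:k].sum()

x = np.array([5.0, 3.0, 2.0, 1.0])   # positive, decreasing
k = 2

# The optimal split from the proof: u carries the excess over x_k
# in the first k coordinates, v is capped at x_k there.
u = np.array([x[0] - x[k - 1], x[1] - x[k - 1], 0.0, 0.0])
v = x - u                             # (x_k, ..., x_k, x_{k+1}, ..., x_n)

assert np.allclose(u + v, x)
# Phi_(n)(u) + k * Phi_(1)(v) attains Phi_(k)(x):
assert np.isclose(np.abs(u).sum() + k * np.abs(v).max(), phi_k(x, k))
```

Here Φ_(2)(x) = 5 + 3 = 8, and the split gives Φ_(n)(u) + 2 Φ_(1)(v) = 2 + 2·3 = 8.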
We now derive some basic inequalities. If f is a convex function on an interval I and if α_i, i = 1, 2, ..., n, are nonnegative real numbers such that ∑_{i=1}^n α_i = 1, then

f(∑_{i=1}^n α_i t_i) ≤ ∑_{i=1}^n α_i f(t_i) for all t_i ∈ I.

Applying this to the function f(t) = −log t on the interval (0, ∞), one obtains the fundamental inequality

a_1^{α_1} a_2^{α_2} ··· a_n^{α_n} ≤ α_1 a_1 + α_2 a_2 + ··· + α_n a_n   (IV.10)

for nonnegative real numbers a_i. This is called the (weighted) arithmetic-geometric mean inequality. The special choice α_1 = α_2 = ··· = α_n = 1/n gives the usual arithmetic-geometric mean inequality

(a_1 a_2 ··· a_n)^{1/n} ≤ (a_1 + a_2 + ··· + a_n)/n.   (IV.11)
(IV. 1 1 ) Theorem IV. 1 . 6 Let p, q be real numbers with p > 1 and x , y E lRn . Then for every symmetric gauge function
!+�
==
1 . Let
(IV. 12) Pro of.
From the inequality (IV. 10) one obtains q x iP IYi l , lx Yl < p + .
q
88 · .
IV.
Symmetric Norms
and hence
( IV. 13)
For t > 0, if we replace x , y by t x and t  1 y , then the lefthand side of ( IV. 13) does not change. Hence, (IV. l4) But, if cp(t)
1 == tP a + b qtq , p
where t , a , b > 0 ,
then plain differentiation shows that min cp(t) a 1 1Pb 1 f q . So, (IV. l2) follows from ( IV. 14) . ==
•
When ( n ) , ( IV. l2) reduces to the familiar Holder inequality n n n L l x i Yi l < (2= lx i i P ) 1 1P( L !Yi l q ) l f q _ i=l i= l i= l We will refer to ( IV. 12) as the Holder inequality for symmetric gauge functions. The special case p 2 will be called the CauchySchwarz ==
==
inequality for symmetric gauge functions.
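The point of Theorem IV.1.6 is that Hölder's inequality holds for every symmetric gauge function, not just Φ_(n). Here is a quick check (an added illustration, not from the book) using the Ky Fan gauge functions Φ_(k) and random vectors, assuming NumPy.

```python
import numpy as np

def phi_k(x, k):
    """Ky Fan gauge function: sum of the k largest moduli."""
    return np.sort(np.abs(x))[::-1][:k].sum()

rng = np.random.default_rng(4)
x = rng.standard_normal(6)
y = rng.standard_normal(6)
p, q = 3.0, 1.5                     # conjugate exponents: 1/p + 1/q = 1

for k in range(1, 7):
    lhs = phi_k(x * y, k)
    rhs = phi_k(np.abs(x) ** p, k) ** (1 / p) * phi_k(np.abs(y) ** q, k) ** (1 / q)
    assert lhs <= rhs + 1e-12       # (IV.12) with Phi = Phi_(k)
```

For k = n this is the classical Hölder inequality; for k = 1 it reads ||xy||_∞ ≤ || |x|^p ||_∞^{1/p} || |y|^q ||_∞^{1/q}.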
Exercise IV.1.7 Let p, q, r be positive real numbers with 1/p + 1/q = 1/r. Show that for every symmetric gauge function Φ we have

[Φ(|xy|^r)]^{1/r} ≤ [Φ(|x|^p)]^{1/p} [Φ(|y|^q)]^{1/q}.   (IV.15)

Theorem IV.1.8 Let Φ be any symmetric gauge function and let p ≥ 1. Then for all x, y ∈ R^n,

[Φ(|x + y|^p)]^{1/p} ≤ [Φ(|x|^p)]^{1/p} + [Φ(|y|^p)]^{1/p}.   (IV.16)

Proof. When p = 1, the inequality (IV.16) is a consequence of the triangle inequalities for the absolute value on R^n and for the norm Φ. Let p > 1 and let q = p/(p − 1). It is enough to consider the case x ≥ 0, y ≥ 0. Make this assumption and write

(x + y)^p = x · (x + y)^{p−1} + y · (x + y)^{p−1}.

Now, using the triangle inequality for Φ and then the Hölder inequality (IV.12), we get

Φ((x + y)^p) ≤ Φ(x · (x + y)^{p−1}) + Φ(y · (x + y)^{p−1})
≤ {[Φ(x^p)]^{1/p} + [Φ(y^p)]^{1/p}} [Φ((x + y)^{q(p−1)})]^{1/q}
= {[Φ(x^p)]^{1/p} + [Φ(y^p)]^{1/p}} [Φ((x + y)^p)]^{1/q},

since q(p − 1) = p. If we divide both sides of the above inequality by [Φ((x + y)^p)]^{1/q}, we get (IV.16).

Once again, when Φ = Φ_(n) the inequality (IV.16) reduces to the familiar Minkowski inequality. So, we will call (IV.16) the Minkowski inequality for symmetric gauge functions.
Exercise IV.1.9 Let Φ be a symmetric gauge function and let p ≥ 1. Let

Φ^{(p)}(x) = [Φ(|x|^p)]^{1/p}.   (IV.17)

Show that Φ^{(p)} is also a symmetric gauge function. Note that, if Φ_p is the family of l_p-norms, then

Φ_{p_1}^{(p_2)} = Φ_{p_1 p_2} for all p_1, p_2 ≥ 1,   (IV.18)

and, if Φ_(k) is the norm defined by (IV.8), then

Φ_(k)^{(p)}(x) = (∑_{j=1}^k |x_j|^p)^{1/p},   (IV.19)

where the coordinates of x are arranged as |x_1| ≥ |x_2| ≥ ··· ≥ |x_n|.
Just as the Euclidean norm has especially interesting properties among the l_p-norms, the norms Φ^{(2)}, where Φ is any symmetric gauge function, have some special interest. We will give these norms a name:

Definition IV.1.10 Ψ is called a quadratic symmetric gauge function, or a Q-norm, if Ψ = Φ^{(2)} for some symmetric gauge function Φ. In other words,

Ψ(x) = [Φ(|x|²)]^{1/2}.   (IV.20)

Exercise IV.1.11 (i) Show that an l_p-norm is a Q-norm if and only if p ≥ 2. (ii) More generally, show that for each k = 1, 2, ..., n, Φ_(k)^{(p)} is a Q-norm if and only if p ≥ 2.

Exercise IV.1.12 We saw earlier that if Φ_(k)(x) ≤ Φ_(k)(y) for all k = 1, 2, ..., n, then Φ(x) ≤ Φ(y) for all symmetric gauge functions Φ. Show that if Φ_(k)^{(2)}(x) ≤ Φ_(k)^{(2)}(y) for all k = 1, 2, ..., n, then Φ^{(2)}(x) ≤ Φ^{(2)}(y) for all symmetric gauge functions Φ; i.e., Ψ(x) ≤ Ψ(y) for all quadratic symmetric gauge functions Ψ.
If Φ is a norm on C^n, the dual of Φ is defined as

Φ′(x) = sup_{Φ(y)=1} |⟨x, y⟩|.   (IV.21)

(It is easy to see that Φ′ is a norm even when Φ is a function on C^n that does not necessarily satisfy the triangle inequality but meets the other requirements of a norm.)
Exercise IV.1.13 If Φ is a symmetric gauge function, then so is Φ′.

Exercise IV.1.14 Show that for any norm Φ,

|⟨x, y⟩| ≤ Φ(x) Φ′(y) for all x, y.   (IV.22)

Exercise IV.1.15 Let 1/p + 1/q = 1. Then

Φ′_p = Φ_q.   (IV.23)

Exercise IV.1.16 Let Φ and Ψ be two norms such that Φ(x) ≤ c Ψ(x) for all x, where c > 0. Show that

Φ′(x) ≥ c^{−1} Ψ′(x) for all x.

We shall call a symmetric gauge function a Q′-norm if it is the dual of a Q-norm. The l_p-norms for 1 ≤ p ≤ 2 are examples of Q′-norms.

Exercise IV.1.17 (i) Let Φ be a norm such that Φ = Φ′. Show that Φ must be the Euclidean norm. (ii) Let Φ be a Q-norm that is also a Q′-norm. Show that Φ must be the Euclidean norm. (Use Exercise IV.1.16 and the fact that every symmetric gauge function is bounded by the l_1-norm.)

Exercise IV.1.18 For each k = 1, 2, ..., n, the dual of the norm Φ_(k) is given by

Φ′_(k)(x) = max {Φ_(1)(x), (1/k) Φ_(n)(x)}.   (IV.24)

Prove this using Proposition IV.1.5 and Exercise IV.1.16.

Some ways of generating symmetric gauge functions are described in the following exercises.

Exercise IV.1.19 Let 1 = a_1 ≥ a_2 ≥ ··· ≥ a_n ≥ 0. Given a symmetric gauge function Φ, define

Ψ(x) = max_{σ ∈ S_n} Φ(a_1 x_{σ(1)}, ..., a_n x_{σ(n)}).

Then Ψ is a symmetric gauge function.

Exercise IV.1.20 (i) Let Φ be a symmetric gauge function on R^n. Let m < n. If x ∈ R^m, let x̃ = (x_1, ..., x_m, 0, 0, ..., 0), and define Ψ(x) = Φ(x̃). Then Ψ is a symmetric gauge function on R^m. (ii) Conversely, let Φ be a symmetric gauge function on R^m. For x ∈ R^n we may define Ψ(x) = Φ(x_1^↓, ..., x_m^↓), where x_1^↓ ≥ ··· ≥ x_n^↓ is the decreasing rearrangement of (|x_1|, ..., |x_n|). Then Ψ is a symmetric gauge function on R^n.
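The duality formula (IV.24) for the Ky Fan gauge functions can be probed numerically through the inequality (IV.22). The following sketch is an added illustration (not from the book), assuming NumPy and random test vectors.

```python
import numpy as np

def phi_k(x, k):
    """Ky Fan gauge function: sum of the k largest moduli."""
    return np.sort(np.abs(x))[::-1][:k].sum()

def phi_k_dual(y, k):
    # The dual in (IV.24): max of the sup-norm and (1/k) times the l_1 norm.
    return max(np.abs(y).max(), np.abs(y).sum() / k)

rng = np.random.default_rng(5)
for _ in range(100):
    x = rng.standard_normal(5)
    y = rng.standard_normal(5)
    for k in range(1, 6):
        # Duality inequality (IV.22): |<x, y>| <= Phi(x) * Phi'(y).
        assert abs(x @ y) <= phi_k(x, k) * phi_k_dual(y, k) + 1e-12
```

For k = 1 the formula gives the l_1 norm as the dual of the sup-norm, and for k = n it gives back the sup-norm as the dual of the l_1 norm.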
IV.2 Unitarily Invariant Norms on Operators on C^n
In this section, C^n will always stand for the Hilbert space C^n with inner product ⟨·, ·⟩ and the associated norm ||·||. (No subscript will be attached to this "standard" norm, as was done in the previous section.) If A is a linear operator on C^n, we will denote by ||A|| the operator (bound) norm of A defined as

||A|| = sup_{||x||=1} ||Ax||.   (IV.25)

As before, we denote by |A| the positive operator (A*A)^{1/2} and by s(A) the vector whose coordinates are the singular values of A, arranged as s_1(A) ≥ s_2(A) ≥ ··· ≥ s_n(A). We have

||A|| = || |A| || = s_1(A).   (IV.26)

Now, if U, V are unitary operators on C^n, then |UAV| = V*|A|V, and hence

||A|| = ||UAV||   (IV.27)

for all unitary operators U, V. This last property is called unitary invariance. Several other norms have this property. These are frequently useful in analysis, and we will study them in some detail. We will use the symbol |||·||| to mean a norm on n × n matrices that satisfies

|||UAV||| = |||A|||   (IV.28)

for all A and for unitary U, V. We will call such a norm a unitarily invariant norm on the space M(n) of n × n matrices. We will normalise such norms so that they all take the value 1 on the matrix diag(1, 0, ..., 0).

There is an intimate connection between these norms and symmetric gauge functions on R^n; the link is provided by singular values.
Theorem IV.2.1 Given a symmetric gauge function Φ on R^n, define a function on M(n) by

|||A|||_Φ = Φ(s(A)).   (IV.29)

Then this defines a unitarily invariant norm on M(n). Conversely, given any unitarily invariant norm |||·||| on M(n), define a function on R^n by

Φ_{|||·|||}(x) = |||diag(x)|||,   (IV.30)

where diag(x) is the diagonal matrix with entries x_1, ..., x_n on its diagonal. Then this defines a symmetric gauge function on R^n.
Proof. Since s(UAV) = s(A) for all unitary U, V, |||·|||_Φ is unitarily invariant. We will prove that it obeys the triangle inequality; the other conditions for it to be a norm are easy to verify. For this, recall the majorisation (II.18)

s(A + B) ≺_w s(A) + s(B) for all A, B ∈ M(n),

and then use the fact that Φ is monotone (Proposition IV.1.1), so that

Φ(s(A + B)) ≤ Φ(s(A) + s(B)) ≤ Φ(s(A)) + Φ(s(B)).

The converse statement is verified directly from the definitions.

Two families of unitarily invariant norms deserve special mention. The first is the family of Schatten p-norms

||A||_p = Φ_p(s(A)) = [∑_{j=1}^n (s_j(A))^p]^{1/p}, 1 ≤ p < ∞,   (IV.31)
||A||_∞ = Φ_∞(s(A)) = s_1(A) = ||A||.   (IV.32)

The second is the class of Ky Fan k-norms defined as

||A||_(k) = ∑_{j=1}^k s_j(A), 1 ≤ k ≤ n.   (IV.33)

Among the p-norms, the ones for the values p = 1, 2, ∞ are used most often. As we have noted, ||A||_∞ is the same as the operator norm ||A|| and the Ky Fan norm ||A||_(1). The norm ||A||_1 is the same as ||A||_(n). This is equal to tr(|A|) and hence is called the trace norm, and is sometimes written as ||A||_tr. The norm

||A||_2 = [∑_{j=1}^n (s_j(A))²]^{1/2}   (IV.34)

is also called the Hilbert-Schmidt norm or the Frobenius norm (and is sometimes written as ||A||_F for that reason). It will play a basic role in our analysis. For A, B ∈ M(n) let

⟨A, B⟩ = tr A*B.   (IV.35)

This defines an inner product on M(n), and the norm associated with this inner product is ||A||_2; i.e.,

||A||_2 = ⟨A, A⟩^{1/2} = (tr A*A)^{1/2}.   (IV.36)
If the matrix A has entries a_ij, then

||A||_2 = (∑_{i,j} |a_ij|²)^{1/2}.   (IV.37)

Thus the norm ||A||_2 is the Euclidean norm of the matrix A when it is thought of as an element of C^{n²}. This fact makes this norm easily computable and geometrically tractable. The main importance of the Ky Fan norms lies in the following:
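The three descriptions of ||A||_2 — via singular values (IV.34), via the trace inner product (IV.36), and entrywise (IV.37) — agree, and this is easy to confirm numerically. The sketch below is an added illustration (not from the book), assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))

s = np.linalg.svd(A, compute_uv=False)
fro_from_svals = np.sqrt((s ** 2).sum())          # (IV.34)
fro_from_trace = np.sqrt(np.trace(A.T @ A))       # (IV.36), real case: A* = A^T
fro_entrywise = np.sqrt((np.abs(A) ** 2).sum())   # (IV.37)

assert np.isclose(fro_from_svals, fro_from_trace)
assert np.isclose(fro_from_trace, fro_entrywise)
```

The entrywise form is the cheapest to compute, which is one reason the Frobenius norm is so convenient in practice.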
Theorem IV.2.2 (Fan Dominance Theorem) Let A, B be two n × n matrices. If

||A||_(k) ≤ ||B||_(k) for k = 1, 2, ..., n,

then

|||A||| ≤ |||B||| for all unitarily invariant norms.

Proof. This is a consequence of the corresponding assertion about symmetric gauge functions. (See Example IV.1.4.)
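One cannot, of course, test "all unitarily invariant norms" numerically, but the Fan Dominance Theorem can be spot-checked on the Schatten family. The following sketch is an added illustration (not from the book): it builds A, B with prescribed singular values whose partial sums are ordered, assuming NumPy and random orthogonal factors obtained by QR.

```python
import numpy as np

rng = np.random.default_rng(7)

def rand_orth(n):
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

# Prescribed singular values with ||A||_(k) <= ||B||_(k) for every k.
sA, sB = np.array([3.0, 2.0, 1.0]), np.array([4.0, 2.5, 1.0])
assert all(sA[:k].sum() <= sB[:k].sum() for k in (1, 2, 3))

A = rand_orth(3) @ np.diag(sA) @ rand_orth(3)
B = rand_orth(3) @ np.diag(sB) @ rand_orth(3)

# Fan dominance then gives |||A||| <= |||B||| for every unitarily
# invariant norm; spot-check the Schatten p-norms.
for p in (1.0, 1.5, 2.0, 3.0, np.inf):
    sa = np.linalg.svd(A, compute_uv=False)
    sb = np.linalg.svd(B, compute_uv=False)
    norm = lambda s: s.max() if p == np.inf else (s ** p).sum() ** (1 / p)
    assert norm(sa) <= norm(sb) + 1e-12
```

The theorem reduces an infinite family of norm comparisons to the n Ky Fan comparisons, which is what makes it so useful.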
Since

Φ_(1)(x) ≤ Φ(x) ≤ Φ_(n)(x) for all x ∈ R^n and for all (normalised) symmetric gauge functions Φ,

we have

||A|| ≤ |||A||| ≤ ||A||_(n) = ||A||_1   (IV.38)

for all A ∈ M(n) and for all unitarily invariant norms |||·|||.

Analogous to Proposition IV.1.5 we have:

Proposition IV.2.3 For each k = 1, 2, ..., n,

||A||_(k) = min {||B||_(n) + k||C|| : A = B + C}.   (IV.39)

Proof. If A = B + C, then

||A||_(k) ≤ ||B||_(k) + ||C||_(k) ≤ ||B||_(n) + k||C||.

Now let s(A) = (s_1, ..., s_n) and choose unitary U, V so that A = U[diag(s_1, ..., s_n)]V. Let

B = U[diag(s_1 − s_k, s_2 − s_k, ..., s_k − s_k, 0, ..., 0)]V,
C = U[diag(s_k, s_k, ..., s_k, s_{k+1}, ..., s_n)]V.

Then A = B + C,

||B||_(n) = ∑_{j=1}^k s_j − k s_k = ||A||_(k) − k s_k,

and ||C|| = s_k, so that ||B||_(n) + k||C|| = ||A||_(k).

A norm ν on M(n) is called symmetric if for A, B, C in M(n)

ν(BAC) ≤ ||B|| ν(A) ||C||.   (IV.40)

Proposition IV.2.4 A norm on M(n) is symmetric if and only if it is unitarily invariant.

Proof. If ν is a symmetric norm, then for unitary U, V we have ν(UAV) ≤ ν(A) and ν(A) = ν(U^{−1}(UAV)V^{−1}) ≤ ν(UAV). So, ν is unitarily invariant. Conversely, by Problem III.6.2, s_j(BAC) ≤ ||B|| ||C|| s_j(A) for all j = 1, 2, ..., n. So, if ν is unitarily invariant and Φ is the corresponding symmetric gauge function, then ν(BAC) = Φ(s(BAC)) ≤ ||B|| ||C|| Φ(s(A)) = ||B|| ||C|| ν(A).

In particular, this implies that every unitarily invariant norm is submultiplicative:

|||AB||| ≤ |||A||| |||B||| for all A, B.
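The symmetric-norm property (IV.40) and the resulting submultiplicativity can be illustrated with the trace norm. The sketch below is an added numerical check (not from the book), assuming NumPy and random matrices.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

trace_norm = lambda M: np.linalg.svd(M, compute_uv=False).sum()  # ||M||_(n)
op = lambda M: np.linalg.norm(M, ord=2)                          # operator norm

# (IV.40) with nu the trace norm, specialised to a product of two factors:
assert trace_norm(A @ B) <= op(A) * trace_norm(B) + 1e-10
assert trace_norm(A @ B) <= trace_norm(A) * op(B) + 1e-10
# Submultiplicativity, since the operator norm never exceeds the trace norm:
assert trace_norm(A @ B) <= trace_norm(A) * trace_norm(B) + 1e-10
```

Submultiplicativity follows from (IV.40) together with the lower bound ||A|| ≤ |||A||| of (IV.38), exactly as in the text.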
Inequalities for sums and products of singular values of matrices, when combined with the inequalities for symmetric gauge functions proved in Section IV.1, lead to interesting statements about unitarily invariant norms. This is illustrated below.

Theorem IV.2.5 If A, B are n × n matrices, then for every symmetric gauge function Φ and every r > 0,

Φ(s^r(AB)) ≤ Φ(s^r(A) s^r(B)).   (IV.41)

(Here s^r(A) denotes the vector with coordinates (s_j(A))^r, and the product of two vectors is taken coordinatewise.)

Proof. If ∧^k A is the kth antisymmetric tensor product of A, then

||∧^k A|| = s_1(∧^k A) = ∏_{j=1}^k s_j(A),   1 ≤ k ≤ n.

Hence, since ∧^k(AB) = (∧^k A)(∧^k B),

∏_{j=1}^k s_j(AB) ≤ ∏_{j=1}^k s_j(A) s_j(B),   1 ≤ k ≤ n.

Now use the statement II.3.5(vii).
Corollary IV.2.6 (Hölder's Inequality for Unitarily Invariant Norms) For every unitarily invariant norm, for all $A, B \in M(n)$, and for all $p, q$ with $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$,
$$|||AB||| \le |||\,|A|^p\,|||^{1/p}\;|||\,|B|^q\,|||^{1/q}. \tag{IV.42}$$

Proof. Use the special case of (IV.41) for $r = 1$ to get
$$\Phi(s(AB)) \le \Phi(s(A)\,s(B))$$
for every symmetric gauge function. Now use Theorem IV.1.6 and the fact that $(s(A))^p = s(|A|^p)$. $\blacksquare$
Exercise IV.2.7 Let $p, q, r$ be positive real numbers with $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$. Then for every unitarily invariant norm
$$|||\,|AB|^r\,|||^{1/r} \le |||\,|A|^p\,|||^{1/p}\;|||\,|B|^q\,|||^{1/q}. \tag{IV.43}$$
Choosing $p = q = 2$ (so that $r = 1$), one gets from this
$$|||AB|||^2 \le |||A^*A|||\;|||B^*B|||. \tag{IV.44}$$
This is the Cauchy-Schwarz inequality for unitarily invariant norms.
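A quick numerical sanity check of (IV.44) for a few unitarily invariant norms (a sketch only; the test matrices are arbitrary choices, and `schatten` computes Schatten $p$-norms from singular values, with $p = \infty$ giving the operator norm):

```python
import numpy as np

def schatten(M, p):
    """Schatten p-norm: the l_p norm of the singular value vector."""
    return np.linalg.norm(np.linalg.svd(M, compute_uv=False), p)

A = np.array([[1.0, 2.0], [0.0, 1.0]])
B = np.array([[0.5, -1.0], [2.0, 0.0]])

checks = []
for p in (1, 2, np.inf):   # trace norm, Frobenius norm, operator norm
    lhs = schatten(A @ B, p) ** 2
    rhs = schatten(A.conj().T @ A, p) * schatten(B.conj().T @ B, p)
    checks.append(lhs <= rhs + 1e-10)
assert all(checks)
```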
Exercise IV.2.8 Given a unitarily invariant norm $|||\cdot|||$ on $M(n)$, define, for $p \ge 1$,
$$|||A|||^{(p)} = |||\,|A|^p\,|||^{1/p}. \tag{IV.45}$$
Show that this is a unitarily invariant norm. Note that if $\|\cdot\|_\Phi$ is the norm corresponding to the symmetric gauge function $\Phi$, then
$$\|A\|_\Phi^{(p)} = \Phi^{(p)}(s(A)), \tag{IV.46}$$
and
$$\|A\|_{(k)}^{(p)} = \Bigl(\sum_{j=1}^{k} s_j^p(A)\Bigr)^{1/p} \quad \text{for } p \ge 1,\ 1 \le k \le n. \tag{IV.47}$$
Definition IV.2.9 A unitarily invariant norm on $M(n)$ is called a Q-norm if it corresponds to a quadratic symmetric gauge function; i.e., $|||\cdot|||$ is a Q-norm if and only if there exists a unitarily invariant norm $|||\cdot|||''$ such that
$$|||A|||^2 = |||A^*A|||''. \tag{IV.48}$$
Note that the norm $\|\cdot\|_p$ is a Q-norm if and only if $p \ge 2$, because
$$\|A\|_p^2 = \|A^*A\|_{p/2}. \tag{IV.49}$$
The norms defined in (IV.47) are Q-norms if and only if $p \ge 2$.
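The identity (IV.49) behind the Q-norm property can be checked directly (an illustration with an arbitrary matrix; for $p \ge 2$ the right-hand side is a genuine Schatten norm, which is exactly what makes $\|\cdot\|_p$ a Q-norm):

```python
import numpy as np

def schatten(M, p):
    """Schatten p-norm computed from singular values (valid for any p > 0)."""
    s = np.linalg.svd(M, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

A = np.array([[1.0, 1.0], [0.0, 2.0]])
for p in (2.0, 3.0, 4.0):
    assert abs(schatten(A, p) ** 2 - schatten(A.conj().T @ A, p / 2)) < 1e-10
```

The identity holds algebraically for every $p > 0$, since $s_j(A^*A) = s_j(A)^2$; the restriction $p \ge 2$ is only what makes $\|\cdot\|_{p/2}$ a norm.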
Exercise IV.2.10 Let $\|\cdot\|_Q$ denote a Q-norm. Observe that the following conditions are equivalent:
(i) $\|A\|_Q \le \|B\|_Q$ for all Q-norms.
(ii) $|||A^*A||| \le |||B^*B|||$ for all unitarily invariant norms.
(iii) $\|A\|_{(k)}^{(2)} \le \|B\|_{(k)}^{(2)}$ for $k = 1, 2, \ldots, n$.
(iv) $(s(A))^2 \prec_w (s(B))^2$.

Duality in the space of unitarily invariant norms is defined via the inner product (IV.35). If $|||\cdot|||$ is a unitarily invariant norm, define $|||\cdot|||'$ as
$$|||A|||' = \sup_{|||B||| = 1} |\langle A, B\rangle| = \sup_{|||B||| = 1} |\operatorname{tr} A^*B|. \tag{IV.50}$$
It is easy to see that this defines a norm that is unitarily invariant.

Proposition IV.2.11 Let $\Phi$ be a symmetric gauge function on $\mathbb{R}^n$ and let $\|\cdot\|_\Phi$ be the corresponding unitarily invariant norm on $M(n)$. Then $\|\cdot\|_\Phi' = \|\cdot\|_{\Phi'}$.

Proof. We have from (II.40) and (IV.41)
$$|\operatorname{tr} A^*B| \le \operatorname{tr}|A^*B| = \sum_{j=1}^{n} s_j(A^*B) \le \sum_{j=1}^{n} s_j(A)\,s_j(B).$$
It follows that $\|A\|_\Phi' \le \Phi'(s(A)) = \|A\|_{\Phi'}$. Conversely,
$$\Phi'(s(A)) = \sup\Bigl\{\sum_{j=1}^{n} s_j(A)\,y_j : y \in \mathbb{R}_+^n,\ \Phi(y) = 1\Bigr\} = \sup\{\operatorname{tr}[\operatorname{diag}(s(A))\operatorname{diag}(y)] : \|\operatorname{diag}(y)\|_\Phi = 1\} \le \|\operatorname{diag}(s(A))\|_\Phi' = \|A\|_\Phi'. \qquad \blacksquare$$
Exercise IV.2.12 From statements about duals proved in Section IV.1, we can now conclude that
(i) $|\operatorname{tr} A^*B| \le |||A||| \cdot |||B|||'$ for every unitarily invariant norm.
(ii) $\|A\|_p' = \|A\|_q$ for $1 \le p \le \infty$, $\frac{1}{p} + \frac{1}{q} = 1$.
(iii) $\|A\|_{(k)}' = \max\bigl\{\|A\|_{(1)},\ \tfrac{1}{k}\|A\|_{(n)}\bigr\}$, $1 \le k \le n$.
(iv) The only unitarily invariant norm that is its own dual is the Hilbert-Schmidt norm $\|\cdot\|_2$.
(v) The only norm that is a Q-norm and is also the dual of a Q-norm is the norm $\|\cdot\|_2$.

Duals of Q-norms will be called Q'-norms. These include the norms $\|\cdot\|_p$, $1 \le p \le 2$.

An important property of all unitarily invariant norms is that they are all reduced by pinchings. If $P_1, \ldots, P_k$ are mutually orthogonal projections such that $P_1 \oplus P_2 \oplus \cdots \oplus P_k = I$, then the operator on $M(n)$ defined as
$$\mathcal{C}(A) = \sum_{j=1}^{k} P_j A P_j \tag{IV.51}$$
is called a pinching operator. It is easy to see that
$$|||\mathcal{C}(A)||| \le |||A||| \tag{IV.52}$$
for every unitarily invariant norm. (See Problem II.5.5.) We will call this the pinching inequality. Let us illustrate one use of this inequality.
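The pinching inequality can be checked numerically on the Ky Fan norms, which by Fan dominance control all unitarily invariant norms. A sketch (the matrix and the block partition are arbitrary choices):

```python
import numpy as np

def ky_fan(M, k):
    return np.linalg.svd(M, compute_uv=False)[:k].sum()

A = np.array([[2.0, 1.0, 0.5],
              [1.0, 1.0, -1.0],
              [0.5, 3.0, 0.0]])
# Pinching onto the coordinate blocks {0} and {1, 2}.
P1 = np.diag([1.0, 0.0, 0.0])
P2 = np.diag([0.0, 1.0, 1.0])
CA = P1 @ A @ P1 + P2 @ A @ P2   # block-diagonal part of A

ok = all(ky_fan(CA, k) <= ky_fan(A, k) + 1e-10 for k in (1, 2, 3))
assert ok
```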
Theorem IV.2.13 Let $A, B \in M(n)$. Then for every unitarily invariant norm on $M(2n)$
$$\frac{1}{2}\left|\left|\left|\begin{pmatrix} A+B & 0\\ 0 & A+B\end{pmatrix}\right|\right|\right| \le \left|\left|\left|\begin{pmatrix} A & 0\\ 0 & B\end{pmatrix}\right|\right|\right| \le \left|\left|\left|\begin{pmatrix} |A|+|B| & 0\\ 0 & 0\end{pmatrix}\right|\right|\right|. \tag{IV.53}$$

Proof. The first inequality follows easily from the observation that $\begin{pmatrix} A & 0\\ 0 & B\end{pmatrix}$ and $\begin{pmatrix} B & 0\\ 0 & A\end{pmatrix}$ are unitarily equivalent. If we prove the second inequality in the special case when $A, B$ are positive, the general case follows easily. So, assume $A, B$ are positive. Then
$$\begin{pmatrix} A+B & 0\\ 0 & 0\end{pmatrix} = \begin{pmatrix} A^{1/2} & B^{1/2}\\ 0 & 0\end{pmatrix}\begin{pmatrix} A^{1/2} & 0\\ B^{1/2} & 0\end{pmatrix},$$
where $A^{1/2}, B^{1/2}$ are the positive square roots of $A, B$. Since $T^*T$ and $TT^*$ are unitarily equivalent for every $T$, the matrix $\begin{pmatrix} A+B & 0\\ 0 & 0\end{pmatrix}$ is unitarily equivalent to
$$\begin{pmatrix} A^{1/2} & 0\\ B^{1/2} & 0\end{pmatrix}\begin{pmatrix} A^{1/2} & B^{1/2}\\ 0 & 0\end{pmatrix} = \begin{pmatrix} A & A^{1/2}B^{1/2}\\ B^{1/2}A^{1/2} & B\end{pmatrix}.$$
But $\begin{pmatrix} A & 0\\ 0 & B\end{pmatrix}$ is a pinching of this last matrix. $\blacksquare$
As a corollary we have:

Theorem IV.2.14 (Rotfel'd) Let $f : \mathbb{R}_+ \to \mathbb{R}_+$ be a concave function with $f(0) = 0$. Then the function $F$ on $M(n)$ defined by
$$F(A) = \sum_{j=1}^{n} f(s_j(A)) \tag{IV.54}$$
is subadditive.

Proof. The second inequality in (IV.53) can be written as a majorisation
$$(s(A), s(B)) \prec_w (s(|A| + |B|),\ 0)$$
for all $A, B \in M(n)$. We also know that $s(|A| + |B|) \prec_w s(A) + s(B)$. Hence
$$(s(A), s(B)) \prec_w (s(A) + s(B),\ 0).$$
Now proceed as in Problem II.5.12. $\blacksquare$
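Rotfel'd's theorem can be illustrated with the concave function $f(t) = \sqrt{t}$, $f(0) = 0$; the matrices below are arbitrary test data, not from the text:

```python
import numpy as np

def F(M):
    """F(M) = sum of f(s_j(M)) with f = sqrt, as in (IV.54)."""
    return np.sqrt(np.linalg.svd(M, compute_uv=False)).sum()

A = np.array([[1.0, 2.0], [3.0, -1.0]])
B = np.array([[0.0, 1.0], [-2.0, 4.0]])
assert F(A + B) <= F(A) + F(B) + 1e-10   # subadditivity
```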
Exercise IV.2.15 Let $|||\cdot|||$ be a unitarily invariant norm on $M(n)$. For $m < n$ and $A \in M(m)$, define
$$|||A|||^{\sim} = \left|\left|\left|\begin{pmatrix} A & 0\\ 0 & 0\end{pmatrix}\right|\right|\right|.$$
Show that $|||\cdot|||^{\sim}$ defines a unitarily invariant norm on $M(m)$. We will use this idea of "dilating" $A$ and of going from $M(n)$ to $M(2n)$ in later chapters.

Procedures given in Exercises IV.1.19 and IV.1.20 can be adapted to matrices to generate unitarily invariant norms.
IV.3  Lidskii's Theorem (Third Proof)

Let $\lambda^{\downarrow}(A)$ denote the $n$-vector whose coordinates are the eigenvalues of a Hermitian matrix $A$ arranged in decreasing order. Lidskii's Theorem, for which we gave two proofs in Section III.4, says that if $A, B$ are Hermitian matrices, then we have the majorisation
$$\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B) \prec \lambda^{\downarrow}(A - B). \tag{IV.55}$$
We will give another proof of this theorem now, using the easier ideas of Sections III.1 and III.2.
Exercise IV.3.1 One corollary of Lidskii's Theorem is that, if $A$ and $B$ are any two matrices, then
$$|s(A) - s(B)| \prec_w s(A - B). \tag{IV.56}$$
See Problem III.6.13. Conversely, show that if (IV.56) is known to be true for all matrices $A, B$, then we can derive from it the statement (IV.55). [Hint: Choose real numbers $\alpha, \beta$ such that $A + \alpha I \ge B + \beta I \ge 0$.]

We will prove (IV.56) by a different argument. To prove this we need to prove that for each of the Ky Fan symmetric gauge functions $\Phi_{(k)}$, $1 \le k \le n$, we have the inequality
$$\Phi_{(k)}(|s(A) - s(B)|) \le \Phi_{(k)}(s(A - B)). \tag{IV.57}$$
We will prove this for $\Phi_{(1)}$ and $\Phi_{(n)}$, and then use the interpolation formulas (IV.9) and (IV.39). For Hermitian matrices $A, B$ we have
$$|\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|, \quad 1 \le j \le n.$$
This can be proved easily by another argument also. For any $j$ consider the subspaces spanned by $\{u_1, \ldots, u_j\}$ and $\{v_j, \ldots, v_n\}$, where $u_i, v_i$, $1 \le i \le n$, are eigenvectors of $A$ and $B$ corresponding to their eigenvalues $\lambda_i^{\downarrow}(A)$ and $\lambda_i^{\downarrow}(B)$, respectively. Since the dimensions of these two spaces add up to $n + 1$, they have a nonzero intersection. For a unit vector $x$ in this intersection we have $\langle x, Ax\rangle \ge \lambda_j^{\downarrow}(A)$ and $\langle x, Bx\rangle \le \lambda_j^{\downarrow}(B)$. Hence, we have
$$\|A - B\| \ge |\langle x, (A - B)x\rangle| \ge \lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B).$$
So, by symmetry,
$$|\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|, \quad 1 \le j \le n.$$
From this, as before, we can get $\max_j |s_j(A) - s_j(B)| \le \|A - B\|$ for any two matrices $A$ and $B$. This is the same as saying
$$\Phi_{(1)}(|s(A) - s(B)|) \le \Phi_{(1)}(s(A - B)). \tag{IV.58}$$
Let $T$ be a Hermitian matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge \lambda_{p+1} \ge \cdots \ge \lambda_n$, where $\lambda_p \ge 0 > \lambda_{p+1}$. Choose a unitary matrix $U$ such that $T = UDU^*$, where $D$ is the diagonal matrix $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Let $D^+ = \operatorname{diag}(\lambda_1, \ldots, \lambda_p, 0, \ldots, 0)$ and $D^- = \operatorname{diag}(0, \ldots, 0, -\lambda_{p+1}, \ldots, -\lambda_n)$. Let $T^+ = UD^+U^*$, $T^- = UD^-U^*$. Then both $T^+$ and $T^-$ are positive matrices and
$$T = T^+ - T^-. \tag{IV.59}$$
This is called the Jordan decomposition of $T$.
Lemma IV.3.2 If $A, B$ are $n \times n$ Hermitian matrices, then
$$\sum_{j=1}^{n} |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|_{(n)}. \tag{IV.60}$$

Proof. Using the Jordan decomposition of $A - B$ we can write
$$\|A - B\|_{(n)} = \operatorname{tr}(A - B)^+ + \operatorname{tr}(A - B)^-.$$
If we put
$$C = A + (A - B)^- = B + (A - B)^+,$$
then $C \ge A$ and $C \ge B$. Hence, by Weyl's monotonicity principle, $\lambda_j^{\downarrow}(C) \ge \lambda_j^{\downarrow}(A)$ and $\lambda_j^{\downarrow}(C) \ge \lambda_j^{\downarrow}(B)$ for all $j$. From these inequalities it follows that
$$|\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le 2\lambda_j^{\downarrow}(C) - \lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B).$$
Hence,
$$\sum_{j=1}^{n} |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \operatorname{tr}(2C - A - B) = \|A - B\|_{(n)}. \qquad \blacksquare$$
Corollary IV.3.3 For any two $n \times n$ matrices $A, B$ we have
$$\Phi_{(n)}(s(A) - s(B)) = \sum_{j=1}^{n} |s_j(A) - s_j(B)| \le \|A - B\|_{(n)}. \tag{IV.61}$$

Theorem IV.3.4 For $n \times n$ matrices $A, B$ we have the majorisation
$$|s(A) - s(B)| \prec_w s(A - B).$$
Proof. Choose any index $k = 1, 2, \ldots, n$ and fix it. By Proposition IV.2.3, there exist $X, Y \in M(n)$ such that
$$A - B = X + Y \quad \text{and} \quad \|A - B\|_{(k)} = \|X\|_{(n)} + k\|Y\|.$$
Define vectors $\alpha, \beta$ as
$$\alpha = s(X + B) - s(B), \qquad \beta = s(A) - s(X + B).$$
Then
$$s(A) - s(B) = \alpha + \beta.$$
Hence, by Proposition IV.1.5 (or Proposition IV.2.3 restricted to diagonal matrices) and by (IV.58) and (IV.61), we have
$$\Phi_{(k)}(s(A) - s(B)) \le \Phi_{(n)}(\alpha) + k\,\Phi_{(1)}(\beta) \le \|X\|_{(n)} + k\|Y\| = \|A - B\|_{(k)},$$
since $\Phi_{(n)}(\alpha) = \Phi_{(n)}(s(X + B) - s(B)) \le \|X\|_{(n)}$ by (IV.61), and $\Phi_{(1)}(\beta) = \Phi_{(1)}(s(A) - s(X + B)) \le \|Y\|$ by (IV.58). This proves the theorem. $\blacksquare$
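The weak majorisation of Theorem IV.3.4 can be checked by comparing partial sums after sorting in decreasing order (a sketch with arbitrary test matrices):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, -1.0]])
B = np.array([[2.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 2.0, 1.0]])

# |s(A) - s(B)| rearranged decreasingly, versus s(A - B).
d = np.sort(np.abs(np.linalg.svd(A, compute_uv=False)
                   - np.linalg.svd(B, compute_uv=False)))[::-1]
e = np.linalg.svd(A - B, compute_uv=False)   # already decreasing
ok = all(d[:k].sum() <= e[:k].sum() + 1e-10 for k in range(1, 4))
assert ok
```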
As we observed in Exercise IV.3.1, this theorem is equivalent to Lidskii's Theorem.

In Section III.2 we introduced the notation $\operatorname{Eig} A$ for a diagonal matrix whose diagonal entries are the eigenvalues of a matrix $A$. The majorisations for the eigenvalues of Hermitian matrices lead to norm inequalities
$$|||\operatorname{Eig}^{\downarrow}(A) - \operatorname{Eig}^{\downarrow}(B)||| \le |||A - B||| \tag{IV.62}$$
for all unitarily invariant norms. This is just another way of expressing Theorem III.4.4. The inequalities of Theorem III.2.8 and Problem III.6.15 are special cases of this. We will see several generalisations of this inequality and still other proofs of it.

Exercise IV.3.5 If $\operatorname{Sing}^{\downarrow}(A)$ denotes the diagonal matrix whose diagonal entries are $s_1(A), \ldots, s_n(A)$, then it follows from Theorem IV.3.4 that for any two matrices $A, B$
$$|||\operatorname{Sing}^{\downarrow}(A) - \operatorname{Sing}^{\downarrow}(B)||| \le |||A - B|||$$
for every unitarily invariant norm. Show that in this case the "opposite inequality" $|||A - B||| \le |||\operatorname{Sing}^{\downarrow}(A) - \operatorname{Sing}^{\downarrow}(B)|||$ is not always true.
IV.4  Weakly Unitarily Invariant Norms
Consider the following numbers associated with an $n \times n$ matrix:
(i) $|\operatorname{tr} A|$, the absolute value of the trace;
(ii) $\operatorname{spr} A = \max_{1 \le j \le n} |\lambda_j(A)|$, the spectral radius of $A$;
(iii) $w(A) = \max_{\|x\| = 1} |\langle x, Ax\rangle|$, the numerical radius of $A$.

Of these, the first one is a seminorm but not a norm on $M(n)$, the second one is not a seminorm, and the third one is a norm. (See Exercise I.2.10.)

All three functions of a matrix described above have an important invariance property: they do not change under unitary conjugations; i.e., the transformations $A \to UAU^*$, $U$ unitary, do not change these functions. Indeed, the first two are invariant under the larger class of similarity transformations $A \to SAS^{-1}$, $S$ invertible. The third one is not invariant under all such transformations.

Exercise IV.4.1 Show that no norm on $M(n)$ can be invariant under all similarity transformations.

Unlike the norms that were studied in Section IV.2, none of the three functions mentioned above is invariant under all transformations $A \to UAV$, where $U, V$ vary over the unitary group $\mathrm{U}(n)$. We will call a norm $\tau$ on $M(n)$ weakly unitarily invariant (wui, for short) if
$$\tau(A) = \tau(UAU^*) \quad \text{for all } A \in M(n),\ U \in \mathrm{U}(n). \tag{IV.63}$$
Examples of such norms include the unitarily invariant norms and the numerical radius. Some more will be constructed now.

Exercise IV.4.2 Let $E_{11}$ be the diagonal matrix with its top left entry 1 and all other entries zero. Then
$$w(A) = \max_{U \in \mathrm{U}(n)} |\operatorname{tr} E_{11}UAU^*|. \tag{IV.64}$$
Equivalently,
$$w(A) = \max\{|\operatorname{tr} AP| : P \text{ is an orthogonal projection of rank } 1\}.$$

Given a matrix $C$, let
$$w_C(A) = \max_{U \in \mathrm{U}(n)} |\operatorname{tr} CUAU^*|, \quad A \in M(n). \tag{IV.65}$$
This is called the C-numerical radius of $A$.

Exercise IV.4.3 For every $C \in M(n)$, the $C$-numerical radius $w_C$ is a wui seminorm on $M(n)$.

Proposition IV.4.4 The $C$-numerical radius $w_C$ is a norm on $M(n)$ if and only if (i) $C$ is not a scalar multiple of $I$, and (ii) $\operatorname{tr} C \ne 0$.
Proof. If $C = \lambda I$ for some $\lambda \in \mathbb{C}$, then $w_C(A) = |\lambda|\,|\operatorname{tr} A|$, and this is zero if $\operatorname{tr} A = 0$. So $w_C$ cannot be a norm. If $\operatorname{tr} C = 0$, then $w_C(I) = |\operatorname{tr} C| = 0$. Again $w_C$ is not a norm. Thus (i) and (ii) are necessary conditions for $w_C$ to be a norm.

Conversely, suppose $w_C(A) = 0$. If $A$ were a scalar multiple of $I$, this would mean that $\operatorname{tr} C = 0$. So, if $\operatorname{tr} C \ne 0$, then $A$ is not a scalar multiple of $I$. Hence $A$ has an eigenspace $\mathcal{M}$ of dimension $m$, for some $0 < m < n$. Since $e^{tK}$ is a unitary matrix for all real $t$ and skew-Hermitian $K$, the condition $w_C(A) = 0$ implies in particular that
$$\operatorname{tr} Ce^{tK}Ae^{-tK} = 0 \quad \text{if } t \in \mathbb{R},\ K = -K^*.$$
Differentiating this relation at $t = 0$, one gets
$$\operatorname{tr}(AC - CA)K = 0 \quad \text{if } K = -K^*.$$
Hence, we also have $\operatorname{tr}(AC - CA)X = 0$ for all $X \in M(n)$. Hence $AC = CA$. (Recall that $\langle S, T\rangle = \operatorname{tr} S^*T$ is an inner product on $M(n)$.) Since $C$ commutes with $A$, it leaves invariant the $m$-dimensional eigenspace $\mathcal{M}$ of $A$ we mentioned earlier. Now, note that since $w_C(A) = w_C(UAU^*)$, $C$ also commutes with $UAU^*$ for every $U \in \mathrm{U}(n)$. But $UAU^*$ has the space $U\mathcal{M}$ as an eigenspace. So, $C$ also leaves $U\mathcal{M}$ invariant for all $U \in \mathrm{U}(n)$. But this would mean that $C$ leaves all $m$-dimensional subspaces invariant, which in turn would mean $C$ leaves all one-dimensional subspaces invariant, which is possible only if $C$ is a scalar multiple of $I$. $\blacksquare$

More examples of wui norms are given in the following exercise.

Exercise IV.4.5 (i) $\tau(A) = \|A\| + |\operatorname{tr} A|$ is a wui norm. More generally, the sum of any wui norm and a wui seminorm is a wui norm.
(ii) $\tau(A) = \max(\|A\|, |\operatorname{tr} A|)$ is a wui norm. More generally, the maximum of any wui norm and a wui seminorm is a wui norm.
(iii) Let $W(A)$ be the numerical range of $A$. Then its diameter $\operatorname{diam} W(A)$ is a wui seminorm on $M(n)$. It can be used to generate wui norms as in (i) and (ii). Of particular interest would be the norm $\tau(A) = w(A) + \operatorname{diam} W(A)$.
(iv) Let $m(A)$ be any norm on $M(n)$. Then
$$\tau(A) = \max_{U \in \mathrm{U}(n)} m(UAU^*)$$
is a wui norm.
(v) Let $m(A)$ be any norm on $M(n)$. Then
$$\tau(A) = \int_{\mathrm{U}(n)} m(UAU^*)\,dU,$$
where the integral is with respect to the (normalised) Haar measure on $\mathrm{U}(n)$, is a wui norm.
(vi) Let
$$\tau(A) = \max_{e_1, \ldots, e_n}\ \max_{i,j} |\langle e_i, Ae_j\rangle|,$$
where $e_1, \ldots, e_n$ varies over all orthonormal bases. Then $\tau$ is a wui norm. How is this related to (ii) and (iv) above?
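The maximum in (IV.65) ranges over the whole unitary group; sampling Haar-random unitaries therefore yields only a lower bound on $w_C(A)$. The following sketch illustrates this for $C = E_{11}$, where $w_C$ is the numerical radius (the sample count is an arbitrary choice; the nilpotent test matrix has $w(A) = 1/2$, a standard fact for the $2 \times 2$ Jordan block):

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(n):
    """Haar-distributed unitary via QR with phase correction."""
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, R = np.linalg.qr(Z)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

def wc_lower_bound(C, A, samples=2000):
    """Monte Carlo lower bound for w_C(A) = max_U |tr C U A U*|."""
    best = abs(np.trace(C @ A))          # U = I is one admissible choice
    for _ in range(samples):
        U = haar_unitary(A.shape[0])
        best = max(best, abs(np.trace(C @ U @ A @ U.conj().T)))
    return best

C = np.diag([1.0, 0.0])                  # C = E_11, so w_C is the numerical radius
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # w(A) = 1/2
est = wc_lower_bound(C, A)
assert est <= 0.5 + 1e-9                 # a lower bound never exceeds w_C(A)
assert est > 0.4                         # and gets close with enough samples
```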
Let $S$ be the unit sphere in $\mathbb{C}^n$,
$$S = \{x \in \mathbb{C}^n : \|x\| = 1\},$$
and let $C(S)$ be the space of all complex valued continuous functions on $S$. Let $dx$ denote the normalised Lebesgue measure on $S$. Consider the familiar $L_p$-norms on $C(S)$ defined as
$$\|f\|_p = \Bigl(\int_S |f(x)|^p\,dx\Bigr)^{1/p},\quad 1 \le p < \infty, \qquad \|f\|_\infty = \max_{x \in S} |f(x)|. \tag{IV.66}$$
Since the measure $dx$ is invariant under rotations, the above norms satisfy the invariance property
$$N_p(f \circ U) = N_p(f) \quad \text{for all } f \in C(S),\ U \in \mathrm{U}(n).$$
We will call a norm $N$ on $C(S)$ a unitarily invariant function norm if
$$N(f \circ U) = N(f) \quad \text{for all } f \in C(S),\ U \in \mathrm{U}(n). \tag{IV.67}$$
The $L_p$-norms are important examples of such norms.

Now, every $A \in M(n)$ induces, naturally, a function $f_A$ on $S$ by its quadratic form:
$$f_A(x) = \langle x, Ax\rangle. \tag{IV.68}$$
The correspondence $A \to f_A$ is a linear map from $M(n)$ into $C(S)$, which is one-to-one. So, given a norm $N$ on $C(S)$, if we define a function $N'$ on $M(n)$ as
$$N'(A) = N(f_A), \tag{IV.69}$$
then $N'$ is a norm on $M(n)$. Further,
$$N'(UAU^*) = N(f_{UAU^*}) = N(f_A \circ U^*).$$
So, if $N$ is a unitarily invariant function norm on $C(S)$, then $N'$ is a wui norm on $M(n)$. The next theorem says that all wui norms arise in this way:
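For $N = \|\cdot\|_\infty$ the induced norm $N'$ is the numerical radius, which is a supremum over the sphere; sampling unit vectors gives a lower bound on it. A sketch (the Hermitian test matrix and sample count are arbitrary choices; for Hermitian $A$, $w(A)$ equals the spectral radius, here 3):

```python
import numpy as np

rng = np.random.default_rng(1)

def f_A(A, x):
    """The quadratic function f_A(x) = <x, Ax> of (IV.68)."""
    return np.vdot(x, A @ x)

A = np.array([[1.0, 0.0], [0.0, -3.0]])   # Hermitian, w(A) = spr(A) = 3
best = 0.0
for _ in range(4000):
    z = rng.normal(size=2) + 1j * rng.normal(size=2)
    x = z / np.linalg.norm(z)             # approximately uniform on the sphere
    best = max(best, abs(f_A(A, x)))
assert best <= 3.0 + 1e-9                 # each sample is at most the sup norm
assert best > 2.8                         # and the sampled maximum gets close
```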
Theorem IV.4.6 A norm $\tau$ on $M(n)$ is weakly unitarily invariant if and only if there exists a unitarily invariant function norm $N$ on $C(S)$ such that $\tau = N'$, where the map $N \to N'$ is defined by relations (IV.68) and (IV.69).

Proof. We need to prove that every wui norm $\tau$ on $M(n)$ is of the form $N'$ for some unitarily invariant function norm $N$. Let $F = \{f_A : A \in M(n)\}$. This is a finite-dimensional linear subspace of $C(S)$. Given a wui norm $\tau$, define $N_0$ on $F$ by
$$N_0(f_A) = \tau(A). \tag{IV.70}$$
Then $N_0$ defines a norm on $F$, and further, $N_0(f \circ U) = N_0(f)$ for all $f \in F$. We will extend $N_0$ from $F$ to all of $C(S)$ to obtain a norm $N$ that is unitarily invariant. Clearly, then $\tau = N'$.

This extension is obtained by an application of the Hahn-Banach Theorem. The space $C(S)$ is a Banach space with the supremum norm $\|f\|_\infty$. The finite-dimensional subspace $F$ has two norms, $N_0$ and $\|\cdot\|_\infty$. These must be equivalent: there exist constants $0 < \alpha \le \beta < \infty$ such that
$$\alpha\|f\|_\infty \le N_0(f) \le \beta\|f\|_\infty \quad \text{for all } f \in F.$$
Let $G$ be the set of all linear functionals on $F$ that have norm less than or equal to 1 with respect to the norm $N_0$; i.e., the linear functional $g$ is in $G$ if and only if $|g(f)| \le N_0(f)$ for all $f \in F$. By duality then
$$N_0(f) = \sup_{g \in G} |g(f)| \quad \text{for every } f \in F.$$
Now $|g(f)| \le \beta\|f\|_\infty$ for $g \in G$ and $f \in F$. Hence, by the Hahn-Banach Theorem, each $g$ can be extended to a linear functional $\tilde{g}$ on $C(S)$ such that $|\tilde{g}(f)| \le \beta\|f\|_\infty$ for all $f \in C(S)$. Now define
$$\theta(f) = \sup_{g \in G} |\tilde{g}(f)| \quad \text{for all } f \in C(S).$$
Then $\theta$ is a seminorm on $C(S)$ that coincides with $N_0$ on $F$. Let
$$\mu(f) = \max\{\theta(f),\ \alpha\|f\|_\infty\}, \quad f \in C(S).$$
Then $\mu$ is a norm on $C(S)$, and $\mu$ coincides with $N_0$ on $F$. Now define
$$N(f) = \sup_{U \in \mathrm{U}(n)} \mu(f \circ U), \quad f \in C(S).$$
Then $N$ is a unitarily invariant function norm on $C(S)$ that coincides with $N_0$ on $F$. The proof is complete. $\blacksquare$

When $N = \|\cdot\|_\infty$, the norm $N'$ induced by the above procedure is the numerical radius $w$. Another example is discussed in the Notes. The $C$-numerical radii play a useful role in proving inequalities for wui norms:
Theorem IV.4.7 For $A, B \in M(n)$ the following statements are equivalent:
(i) $\tau(A) \le \tau(B)$ for all wui norms $\tau$.
(ii) $w_C(A) \le w_C(B)$ for all upper triangular matrices $C$ that are not scalars and have nonzero trace.
(iii) $w_C(A) \le w_C(B)$ for all $C \in M(n)$.
(iv) $A$ can be expressed as a finite sum $A = \sum z_k U_k B U_k^*$, where $U_k \in \mathrm{U}(n)$ and $z_k$ are complex numbers with $\sum |z_k| \le 1$.

Proof. By Proposition IV.4.4, when $C$ is not a scalar and $\operatorname{tr} C \ne 0$, each $w_C$ is a wui norm. So (i) $\Rightarrow$ (ii).

Note that $w_C(A) = w_A(C)$ for all pairs of matrices $A, C$. So, if (ii) is true, then $w_A(C) \le w_B(C)$ for all upper triangular nonscalar matrices $C$ with nonzero trace. Since $w_A$ and $w_B$ are wui, and since every matrix is unitarily equivalent to an upper triangular matrix, this implies that $w_A(C) \le w_B(C)$ for all nonscalar matrices $C$ with nonzero trace. But such $C$ are dense in the space $M(n)$. So $w_A(C) \le w_B(C)$ for all $C \in M(n)$. Hence (iii) is true.

Let $\mathcal{K}$ be the convex hull of all matrices $e^{i\theta}UBU^*$, $\theta \in \mathbb{R}$, $U \in \mathrm{U}(n)$. Then $\mathcal{K}$ is a compact convex set in $M(n)$. The statement (iv) is equivalent to saying that $A \in \mathcal{K}$. If $A \notin \mathcal{K}$, then by the Separating Hyperplane Theorem there exists a linear functional $f$ on $M(n)$ such that $\operatorname{Re} f(A) > \operatorname{Re} f(X)$ for all $X \in \mathcal{K}$. For this linear functional $f$ there exists a matrix $C$ such that $f(Y) = \operatorname{tr} CY$ for all $Y \in M(n)$. (Problem IV.5.8.) For these $f$ and $C$ we have
$$w_C(A) = \max_{U \in \mathrm{U}(n)} |\operatorname{tr} CUAU^*| \ge |\operatorname{tr} CA| = |f(A)| \ge \operatorname{Re} f(A) > \max_{X \in \mathcal{K}} \operatorname{Re} f(X) = \max_{\theta, U} \operatorname{Re} \operatorname{tr} Ce^{i\theta}UBU^* = \max_{U} |\operatorname{tr} CUBU^*| = w_C(B).$$
So, if (iii) were true, then (iv) cannot be false. Clearly (iv) $\Rightarrow$ (i). $\blacksquare$
The family $w_C$ of $C$-numerical radii, where $C$ is not a scalar and has nonzero trace, thus plays a role analogous to that of the Ky Fan norms in the family of unitarily invariant norms. However, unlike the Ky Fan family on $M(n)$, this family is infinite. It turns out that no finite subfamily of wui norms can play this role.
More precisely, there does not exist any finite family $\tau_1, \ldots, \tau_m$ of wui norms on $M(n)$ that would lead to the inequalities $\tau(A) \le \tau(B)$ for all wui norms $\tau$ whenever $\tau_j(A) \le \tau_j(B)$, $1 \le j \le m$. For if such a family existed, then we would have
$$\{X : \tau(X) \le \tau(I) \text{ for all wui norms } \tau\} = \bigcap_{j=1}^{m}\{X : \tau_j(X) \le \tau_j(I)\}. \tag{IV.71}$$
Now each of the sets in this intersection contains 0 as an interior point (with respect to some fixed topology on $M(n)$). Hence the intersection also contains 0 as an interior point. However, by Theorem IV.4.7, the set on the left-hand side of (IV.71) reduces to the set $\{zI : z \in \mathbb{C},\ |z| \le 1\}$, and this set has an empty interior in $M(n)$.

Finally, note an important property of all wui norms:
$$\tau(\mathcal{C}(A)) \le \tau(A) \tag{IV.72}$$
for all $A \in M(n)$ and all pinchings $\mathcal{C}$ on $M(n)$. In Chapter 6 we will prove a generalisation of Lidskii's inequality (IV.62), extending it to all wui norms.

IV.5  Problems
Problem IV.5.1. When $0 < p < 1$, the function $\Phi_p(x) = (\sum |x_i|^p)^{1/p}$ does not define a norm. Show that in lieu of the triangle inequality we have
$$\Phi_p(x + y) \le 2^{\frac{1}{p}-1}\,[\Phi_p(x) + \Phi_p(y)].$$
(Use the fact that $f(t) = t^p$ on $\mathbb{R}_+$ is subadditive when $0 < p \le 1$ and convex when $p \ge 1$.) Positive homogeneous functions that do not satisfy the triangle inequality but a weaker inequality $\varphi(x + y) \le c[\varphi(x) + \varphi(y)]$, $c > 1$, are sometimes called quasinorms.

Problem IV.5.2. More generally, show that for any symmetric gauge function $\Phi$ and $0 < p < 1$, if we define $\Phi^{(p)}$ as in (IV.17), then
$$\Phi^{(p)}(x + y) \le 2^{\frac{1}{p}-1}\,[\Phi^{(p)}(x) + \Phi^{(p)}(y)].$$

Problem IV.5.3. All norms on $\mathbb{C}^n$ are equivalent in the sense that if $\Phi$ and $\Psi$ are two norms, then there exists a constant $K$ such that $\Phi(x) \le K\Psi(x)$ for all $x \in \mathbb{C}^n$. Let
$$K_{\Phi,\Psi} = \inf\{K : \Phi(x) \le K\Psi(x) \text{ for all } x \in \mathbb{C}^n\}.$$
Find $K_{\Phi,\Psi}$ when $\Phi, \Psi$ are both members of the family $\Phi_p$.

Problem IV.5.4. Show that for every norm $\Phi$, $\Phi'' = \Phi$; i.e., the dual of the dual of a norm is the norm itself. Find the constants $K_{\Phi,\Psi}$ when $\Psi = \Phi'$.

Problem IV.5.5. Find the duals of the norms $\Phi^{(p)}_{(k)}$ defined by (IV.19). (These are somewhat complicated.)

Problem IV.5.6. For $0 < p < 1$ and a unitarily invariant norm $|||\cdot|||$ on $M(n)$, let $|||A|||^{(p)} = |||\,|A|^p\,|||^{1/p}$. Show that
$$|||A + B|||^{(p)} \le 2^{\frac{1}{p}-1}\,\bigl[|||A|||^{(p)} + |||B|||^{(p)}\bigr].$$
Problem IV.5.7. Choosing $p = q = 2$ in (IV.43) or (IV.42), one obtains
$$|||\,|A^*B|^{1/2}\,|||^2 \le |||A|||\;|||B|||.$$
This, like the inequality (IV.44), is also a form of the Cauchy-Schwarz inequality for unitarily invariant norms. Show that this is just the inequality (IV.44) restricted to Q-norms.

Problem IV.5.8. Let $f$ be any linear functional on $M(n)$. Show that there exists a unique matrix $X$ such that $f(A) = \operatorname{tr} XA$ for all $A \in M(n)$.

Problem IV.5.9. Use Theorem IV.2.14 to show that for all $A, B \in M(n)$
$$\det(I + |A + B|) \le \det(I + |A|)\,\det(I + |B|).$$

Problem IV.5.10. More generally, show that for $0 < p \le 1$ and $\mu > 0$
$$\det(I + \mu|A + B|^p) \le \det(I + \mu|A|^p)\,\det(I + \mu|B|^p).$$

Problem IV.5.11. Let $\ell_p$ denote the space $\mathbb{C}^n$ with the $p$-norm defined in (IV.1) and (IV.2), $1 \le p \le \infty$. For a matrix $A$, let $\|A\|_{p \to p'}$ denote the norm of $A$ as a linear operator from $\ell_p$ to $\ell_{p'}$; i.e.,
$$\|A\|_{p \to p'} = \sup_{\|x\|_p = 1} \|Ax\|_{p'}.$$
Show that
$$\|A\|_{1 \to 1} = \max_j \sum_i |a_{ij}|, \qquad \|A\|_{\infty \to \infty} = \max_i \sum_j |a_{ij}|, \qquad \|A\|_{1 \to \infty} = \max_{i,j} |a_{ij}|.$$
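The three formulas of Problem IV.5.11 can be verified by brute force on a small real example, using the fact that the supremum over the $\ell_1$ unit ball is attained at a basis vector, and over the $\ell_\infty$ unit ball at a vector of signs (the matrix is an arbitrary choice):

```python
import numpy as np

A = np.array([[1.0, -2.0, 0.5],
              [3.0, 0.0, -1.0],
              [-0.5, 2.0, 2.0]])

col_sums = np.abs(A).sum(axis=0)   # candidates for ||A||_{1->1}
row_sums = np.abs(A).sum(axis=1)   # candidates for ||A||_{inf->inf}

# ||A||_{1->1}: extreme points of the l1 ball are +/- basis vectors.
norm_1_1 = max(np.linalg.norm(A[:, j], 1) for j in range(3))
assert abs(norm_1_1 - col_sums.max()) < 1e-12

# ||A||_{1->inf}: also attained at a basis vector.
norm_1_inf = max(np.linalg.norm(A[:, j], np.inf) for j in range(3))
assert abs(norm_1_inf - np.abs(A).max()) < 1e-12

# ||A||_{inf->inf}: extreme points of the l_inf ball are sign vectors.
signs = [np.array([a, b, c]) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
norm_inf_inf = max(np.linalg.norm(A @ x, np.inf) for x in signs)
assert abs(norm_inf_inf - row_sums.max()) < 1e-12
```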
None of these norms is weakly unitarily invariant.

Problem IV.5.12. Show that there exists a weakly unitarily invariant norm $\tau$ such that $\tau(A) \ne \tau(A^*)$ for some $A \in M(n)$.

Problem IV.5.13. Show that there exists a weakly unitarily invariant norm $\tau$ such that $\tau(A) > \tau(B)$ for some positive matrices $A, B$ with $A \le B$.

Problem IV.5.14. Let $\tau$ be a wui norm on $M(n)$. Define $\nu$ on $M(n)$ as $\nu(A) = \tau(|A|)$. Then $\nu$ is a unitarily invariant norm if and only if $\tau(A) \le \tau(B)$ whenever $0 \le A \le B$.

Problem IV.5.15. Show that for every wui norm $\tau$
$$\tau(\operatorname{Eig} A) = \inf\{\tau(SAS^{-1}) : S \in \mathrm{GL}(n)\}.$$
When is the infimum attained?

Problem IV.5.16. Let $\tau$ be a wui norm on $M(n)$. Show that for every $A$
$$\tau(A) \ge \frac{|\operatorname{tr} A|}{n}\,\tau(I).$$
Use this to show that
$$\min\{\tau(A - B) : \operatorname{tr} B = 0\} = \frac{|\operatorname{tr} A|}{n}\,\tau(I).$$

IV.6  Notes and References
The first major paper on the theory of unitarily invariant norms and symmetric gauge functions was by J. von Neumann, Some matrix inequalities and metrization of matric space, Tomsk. Univ. Rev., 1 (1937) 286-300, reprinted in his Collected Works, Pergamon Press, 1962. A famous book devoted to the study of such norms (for compact operators in a Hilbert space) is R. Schatten, Norm Ideals of Completely Continuous Operators, Springer-Verlag, 1960. Other excellent sources of information are the books by I.C. Gohberg and M.G. Krein cited in Chapter III, by A. Marshall and I. Olkin cited in Chapter II, and by R. Horn and C.R. Johnson cited in Chapter I. A succinct but complete summary can be found in L. Mirsky's paper cited in Chapter III. Much more on matrix norms (not necessarily invariant ones) can be found in the book Matrix Norms and Their Applications, by G.R. Belitskii and Y.I. Lyubich, Birkhäuser, 1988.
The notion of Q-norms is mentioned explicitly in R. Bhatia, Some inequalities for norm ideals, Commun. Math. Phys., 111 (1987) 33-39. (The possible usefulness of the idea was suggested by C. Davis.) The Cauchy-Schwarz inequality (IV.44) is proved in R. Bhatia, Perturbation inequalities
for the absolute value map in norm ideals of operators, J. Operator Theory, 19 (1988) 129-136. This, and a whole family of inequalities including the one in Problem IV.5.7, are studied in detail by R.A. Horn and R. Mathias in two papers, An analog of the Cauchy-Schwarz inequality for Hadamard products and unitarily invariant norms, SIAM J. Matrix Anal. Appl., 11 (1990) 481-498, and Cauchy-Schwarz inequalities associated with positive semidefinite matrices, Linear Algebra Appl., 142 (1990) 63-82. Many of the other inequalities in this section occur in K. Okubo, Hölder-type norm inequalities for Schur products of matrices, Linear Algebra Appl., 91 (1987) 13-28. A general study of these and related inequalities is made in R. Bhatia and C. Davis, Relations of linking and duality between symmetric gauge functions, Operator Theory: Advances and Applications, 73 (1994) 127-137.
Theorems IV.2.13 and IV.2.14 were proved by S. Ju. Rotfel'd, The singular values of a sum of completely continuous operators, in Topics in Mathematical Physics, Consultants Bureau, 1969, Vol. 3, pp. 73-78. See also R.C. Thompson, Convex and concave functions of singular values of matrix sums, Pacific J. Math., 66 (1976) 285-290. The results of Problems IV.5.9 and IV.5.10 are also due to Rotfel'd.
The proof of Lidskii's Theorem given in Section IV.3 is adapted from F. Hiai and Y. Nakamura, Majorisations for generalised s-numbers in semifinite von Neumann algebras, Math. Z., 195 (1987) 17-27.
The theory of weakly unitarily invariant norms was developed in R. Bhatia and J.A.R. Holbrook, Unitary invariance and spectral variation, Linear Algebra Appl., 95 (1987) 43-68. Theorem IV.4.6 is proved in this paper. More on C-numerical radii can be found in C.K. Li and N.K. Tsing, Norms that are invariant under unitary similarities and the C-numerical radii, Linear and Multilinear Algebra, 24 (1989) 209-222. Theorem IV.4.7 is taken from this paper. A part of this theorem (the equivalence of conditions (i) and (iv)) was proved in R. Bhatia and J.A.R. Holbrook, A softer, stronger Lidskii theorem, Proc. Indian Acad. Sciences (Math. Sciences), 99 (1989) 75-83. Two papers dealing with wui norms for infinite-dimensional operators are C.K. Fong and J.A.R. Holbrook, Unitarily invariant operator norms, Canad. J. Math., 35 (1983) 274-299, and C.K. Fong, H. Radjavi, and P. Rosenthal, Norms for matrices and operators, J. Operator Theory, 18 (1987) 99-113.
The theory of wui norms is not developed as fully as that of unitarily invariant norms. Theorem IV.4.6 would be useful if one could make the correspondence between $\tau$ and $N$ more explicit. As things stand, this has not been done even for some well-known and much-used norms like the $L_p$-norms. When $N$ is the $L_\infty$ function norm, we have noted that $N'(A) = w(A)$. When $N$ is the $L_2$ function norm, then it is shown in the Bhatia-Holbrook (1987) paper cited above that
$$N'(A) = \left(\frac{\|A\|_2^2 + |\operatorname{tr} A|^2}{n + n^2}\right)^{1/2}.$$
For other values of $p$, the correspondence has not been worked out. For a recent survey of several results on invariant norms see C.K. Li, Some aspects of the theory of norms, Linear Algebra Appl., 212/213 (1994) 71-100.
V  Operator Monotone and Operator Convex Functions

In this chapter we study an important and useful class of functions called operator monotone functions. These are real functions whose extensions to Hermitian matrices preserve order. Such functions have several special properties, some of which are studied in this chapter. They are closely related to properties of operator convex functions. We shall study both of these together.
V.1  Definitions and Simple Examples
Let $f$ be a real function defined on an interval $I$. If $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix whose diagonal entries $\lambda_j$ are in $I$, we define $f(D) = \operatorname{diag}(f(\lambda_1), \ldots, f(\lambda_n))$. If $A$ is a Hermitian matrix whose eigenvalues $\lambda_j$ are in $I$, we choose a unitary $U$ such that $A = UDU^*$, where $D$ is diagonal, and then define $f(A) = Uf(D)U^*$. In this way we can define $f(A)$ for all Hermitian matrices (of any order) whose eigenvalues are in $I$. In the rest of this chapter, it will always be assumed that our functions are real functions defined on an interval (finite or infinite, closed or open) and are extended to Hermitian matrices in this way.

We will use the notation $A \le B$ to mean $A$ and $B$ are Hermitian and $B - A$ is positive. The relation $\le$ is a partial order on Hermitian matrices.

A function $f$ is said to be matrix monotone of order $n$ if it is monotone with respect to this order on $n \times n$ Hermitian matrices, i.e., if $A \le B$ implies $f(A) \le f(B)$. If $f$ is matrix monotone of order $n$ for all $n$, we say $f$ is matrix monotone or operator monotone.
A function is said to be matrix convex of order n if for all Hermitian matrices A and B and for all real numbers 0 < A < 1 ,
f
n x n
f ( ( 1  A) A + AB) < ( 1  A) f (A ) + A f ( B ) . If f is matrix convex of all orders, we say that
operator convex.
(V. 1 )
f is matrix convex or
(No te that if the eigenvalues of A and B are all in an interval /, then the eigenvalues of any convex combination of A , B are also in /. This is an easy consequence of results in Chapter III . ) We will consider continuous functions only. In this case, the condition (V. l ) can be replaced by the more special condition (V.2)
( Functions satisfying (V.2) are called midpoint operator convex, and if they are continuous, then they are convex. ) A function f is called operator concave if the function f is operator convex. It is clear that the set of operator monotone functions and the set of operator convex functions are both closed under positive linear combina tions and also under (pointwise) limits. In other words, if , g are operator monotone, and if a, {3 are positive real numbers, then a + f3g is also oper are operator monotone, and if ( x ) ( x ) , then ator monotone. If is also operator monotone. The same is true for operator convex functions.
f
f f f f f Example V. l . l The function f(t) == a + (3t is operator monotone {on every interval} for every a E IR and {3 > 0. It is operator convex for all n
n
7
a, {J E �
The first surprise is in the following example. Example V. 1.2 f(t) == t 2 [0 , oo)
The function on is not operator mono tone. In other words, there exist positive matrices A , B such that B  A is positive but B2  A2 is not. To see this, take Example V. 1.3
The function f(t) t2 is operator convex on every in terval. To see this, note that for any Hermitian matrices A, B , 2 2 2A2 + B2  ( A + B ) 2 A B) == _!_ A AB BA) B  0. ( ( 4 4 2 2 This{3 shows that the function f ( t) + (3t + "(t2 is operator convex for all IR, > 0 . ==
=
a,
E
!
1
+
== a
>
1 14
V.
O perator Mo notone and O perator Co nvex Functions
Example V.1.4 The function f(t) = t^3 on [0, ∞) is not operator convex. To see this, let

A = (1 0; 0 0),   B = (2 1; 1 1).

Then

(A^3 + B^3)/2 − ((A + B)/2)^3 = (1/4)(11 9; 9 7),

and this is not positive, since its determinant is negative.
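The midpoint-convexity gap above can also be checked numerically (an illustrative numpy sketch, not part of the original argument):

```python
import numpy as np

# The matrices from Example V.1.4: both positive semidefinite.
A = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])

cube = lambda X: X @ X @ X
# If t^3 were operator convex, this gap would be positive semidefinite.
gap = (cube(A) + cube(B)) / 2 - cube((A + B) / 2)

print(gap)                      # equals (1/4) * [[11, 9], [9, 7]]
print(np.linalg.eigvalsh(gap))  # one eigenvalue is negative
```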
Examples V.1.2 and V.1.4 show that very simple functions which are monotone (convex) as real functions need not be operator monotone (operator convex). A complete description of operator monotone and operator convex functions will be given in later sections. It is instructive to study a few more examples first. The operator monotonicity or convexity of some functions can be proved by special arguments that are useful in other contexts as well.

We will repeatedly use two simple facts. If A is positive, then A ≤ I if and only if spr(A) ≤ 1. An operator A is a contraction (‖A‖ ≤ 1) if and only if A*A ≤ I; this is also equivalent to the condition AA* ≤ I. The following elementary lemma is also used often.

Lemma V.1.5 If B ≥ A, then for every operator X we have X*BX ≥ X*AX.

Proof. For every vector u we have

⟨u, X*BXu⟩ = ⟨Xu, BXu⟩ ≥ ⟨Xu, AXu⟩ = ⟨u, X*AXu⟩.

This proves the lemma. An equally brief proof goes as follows. Let C be the positive square root of the positive operator B − A. Then

X*(B − A)X = X*CCX = (CX)*CX ≥ 0.   •

Proposition V.1.6 The function f(t) = −1/t is operator monotone on (0, ∞).

Proof. Let B ≥ A > 0. Then, by Lemma V.1.5, I ≥ B^{−1/2} A B^{−1/2}. Since the map T → T^{−1} is order-reversing on commuting positive operators, we have I ≤ B^{1/2} A^{−1} B^{1/2}. Again using Lemma V.1.5, we get from this B^{−1} ≤ A^{−1}.   •
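The conclusion of Proposition V.1.6 (inversion reverses the order on positive operators) is easy to test on random matrices; a quick numpy sketch, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))
A = X @ X.T + np.eye(n)        # strictly positive
Y = rng.standard_normal((n, n))
B = A + Y @ Y.T                # B >= A > 0

# B >= A > 0 implies A^{-1} >= B^{-1}.
gap = np.linalg.inv(A) - np.linalg.inv(B)
print(np.linalg.eigvalsh(gap).min())   # nonnegative, up to rounding
```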
Lemma V.1.7 If B ≥ A ≥ 0 and B is invertible, then ‖A^{1/2} B^{−1/2}‖ ≤ 1.

Proof. If B ≥ A ≥ 0, then

I ≥ B^{−1/2} A B^{−1/2} = (A^{1/2} B^{−1/2})* A^{1/2} B^{−1/2},

and hence ‖A^{1/2} B^{−1/2}‖ ≤ 1.   •
V.1 Definitions and Simple Examples
Proposition V.1.8 The function f(t) = t^{1/2} is operator monotone on [0, ∞).

Proof. Let B ≥ A ≥ 0. Suppose B is invertible. Then, by Lemma V.1.7,

spr(B^{−1/4} A^{1/2} B^{−1/4}) = spr(A^{1/2} B^{−1/2}) ≤ ‖A^{1/2} B^{−1/2}‖ ≤ 1.

Since B^{−1/4} A^{1/2} B^{−1/4} is positive, this implies that I ≥ B^{−1/4} A^{1/2} B^{−1/4}. Hence, by Lemma V.1.5, B^{1/2} ≥ A^{1/2}. This proves the proposition under the assumption that B is invertible. If B is not strictly positive, then for every ε > 0, B + εI is strictly positive. So (B + εI)^{1/2} ≥ A^{1/2}. Let ε → 0. This shows that B^{1/2} ≥ A^{1/2}.   •
Theorem V.1.9 The function f(t) = t^r is operator monotone on [0, ∞) for 0 ≤ r ≤ 1.
Proof. Let r be a dyadic rational, i.e., a number of the form r = m/2^n, where n is any positive integer and 1 ≤ m ≤ 2^n. We will first prove the assertion for such r. This is done by induction on n. Proposition V.1.8 shows that the assertion of the theorem is true when n = 1. Suppose it is also true for all dyadic rationals j/2^{n−1}, in which 1 ≤ j ≤ 2^{n−1}. Let B ≥ A and let r = m/2^n. Suppose m ≤ 2^{n−1}. Then, by the induction hypothesis, B^{m/2^{n−1}} ≥ A^{m/2^{n−1}}. Hence, by Proposition V.1.8, B^{m/2^n} ≥ A^{m/2^n}.

Suppose m > 2^{n−1}. If B ≥ A > 0, then A^{−1} ≥ B^{−1}. Using Lemma V.1.5, we have

B^{m/2^n} A^{−1} B^{m/2^n} ≥ B^{m/2^n} B^{−1} B^{m/2^n} = B^{m/2^{n−1} − 1}.

By the same argument,

A^{−1/2} B^{m/2^n} A^{−1} B^{m/2^n} A^{−1/2} ≥ A^{−1/2} B^{m/2^{n−1} − 1} A^{−1/2} ≥ A^{−1/2} A^{m/2^{n−1} − 1} A^{−1/2}   (by the induction hypothesis).

This can also be written as

(A^{−1/2} B^{m/2^n} A^{−1/2})^2 ≥ (A^{m/2^n − 1})^2.

So, by the operator monotonicity of the square root,

A^{−1/2} B^{m/2^n} A^{−1/2} ≥ A^{m/2^n − 1}.

Hence, B^{m/2^n} ≥ A^{m/2^n}. We have shown that B ≥ A ≥ 0 implies B^r ≥ A^r for all dyadic rationals r in [0, 1]. Such r are dense in [0, 1]. So we have B^r ≥ A^r for all r in [0, 1]. By continuity, this is true even when A is only positive semidefinite.   •

Exercise V.1.10 Another proof of Theorem V.1.9 is outlined below. Fill in the details.
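Theorem V.1.9 is easy to observe numerically for random positive pairs; a sketch (illustrative only) with matrix powers computed through the spectral decomposition:

```python
import numpy as np

def mpow(X, r):
    """X^r for positive semidefinite X, via the spectral decomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.clip(w, 0.0, None) ** r) @ V.T

rng = np.random.default_rng(1)
n = 5
G = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))
A = G @ G.T              # A >= 0
B = A + H @ H.T          # B >= A

for r in (0.25, 0.5, 0.75, 1.0):
    print(r, np.linalg.eigvalsh(mpow(B, r) - mpow(A, r)).min())  # all >= 0, up to rounding
```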
(i) The composition of two operator monotone functions is operator monotone. Use this and Proposition V.1.6 to prove that the function f(t) = t/(1 + t) is operator monotone on (0, ∞).

(ii) For each λ > 0, the function f(t) = t/(λ + t) is operator monotone on (0, ∞).

(iii) One of the integrals calculated by contour integration in Complex Analysis is

∫_0^∞ λ^{r−1}/(1 + λ) dλ = π cosec rπ,   0 < r < 1.   (V.3)

By a change of variables, obtain from this the formula

t^r = (sin rπ / π) ∫_0^∞ (λ^{r−1} t)/(λ + t) dλ,   (V.4)

valid for all t ≥ 0 and 0 < r < 1.

(iv) Thus, we can write

t^r = ∫_0^∞ (t/(λ + t)) dμ(λ),   0 < r < 1,   (V.5)

where μ is a positive measure on (0, ∞). Now use (ii) to conclude that the function f(t) = t^r is operator monotone on (0, ∞) for 0 < r < 1.
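Formula (V.4) can be spot-checked by numerical quadrature; the sketch below (an illustration, not part of the text) uses the substitution λ = e^u, which turns the integral into one over the real line:

```python
import numpy as np

r, t = 0.5, 2.0

# Substituting lambda = e^u in (V.4), the integrand becomes e^{ru} * t / (e^u + t).
u = np.linspace(-40.0, 40.0, 400001)
g = np.exp(r * u) * t / (np.exp(u) + t)
integral = np.sum((g[:-1] + g[1:]) / 2) * (u[1] - u[0])   # trapezoid rule

approx = np.sin(r * np.pi) / np.pi * integral
print(approx, t ** r)    # both close to sqrt(2) = 1.41421...
```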
Example V.1.11 The function f(t) = |t| is not operator convex on any interval that contains 0. To see this, take

A = (1 1; 1 1),   B = (−2 0; 0 0).

Then

|A| = (1 1; 1 1),   |B| = (2 0; 0 0),   |A| + |B| = (3 1; 1 1).

But |A + B| = √2 I. So |A| + |B| − |A + B| is not positive. (See also Exercise III.5.7.)
Example V.1.12 The function f(t) = t ∨ 0 is not operator convex on any interval that contains 0. To see this, take A, B as in Example V.1.11. Since the eigenvalues of A are 2 and 0, f(A) = A; since the eigenvalues of B are −2 and 0, f(B) = 0. So (1/2)(f(A) + f(B)) = (1/2)(1 1; 1 1). Any positive matrix dominated by this must have (1, −1) as an eigenvector with 0 as the corresponding eigenvalue. Since (1/2)(A + B) does not have (1, −1) as an eigenvector, neither does f((A + B)/2). So f((A + B)/2) is not dominated by (1/2)(f(A) + f(B)).
Exercise V.1.13 Let I be any interval. For a ∈ I, let f(t) = (t − a) ∨ 0. Then f is called an "angle function" angled at a. If I is a finite interval, then every convex function on I is a limit of positive linear combinations of linear functions and angle functions. Use this to show that angle functions are not operator convex.

Exercise V.1.14 Show that the function f(t) = t ∨ 0 is not operator monotone on any interval that contains 0.
Exercise V.1.15 Let A, B be positive. Show that

(A^{−1} + B^{−1})/2 − ((A + B)/2)^{−1} = (1/2)(A^{−1} − B^{−1})(A^{−1} + B^{−1})^{−1}(A^{−1} − B^{−1}) ≥ 0.

Therefore, the function f(t) = 1/t is operator convex on (0, ∞).
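The identity in Exercise V.1.15 can be confirmed numerically (an illustrative numpy sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X, Y = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A = X @ X.T + np.eye(n)        # strictly positive
B = Y @ Y.T + np.eye(n)

Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
lhs = (Ai + Bi) / 2 - np.linalg.inv((A + B) / 2)
rhs = (Ai - Bi) @ np.linalg.inv(Ai + Bi) @ (Ai - Bi) / 2

print(np.max(np.abs(lhs - rhs)))        # ~ 0 : the two sides agree
print(np.linalg.eigvalsh(lhs).min())    # >= 0 : midpoint convexity of 1/t
```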
V.2  Some Characterisations
There are several different notions of averaging in the space of operators. In this section we study the relationship between some of these operations and operator convex functions. This leads to some characterisations of operator convex and operator monotone functions and to the interrelations between them.

In the proofs that follow, we will frequently use properties of operators on the direct sum H ⊕ H to draw conclusions about operators on H. This technique was outlined briefly in Section I.3.

Let K be a contraction on H. Let L = (I − KK*)^{1/2}, M = (I − K*K)^{1/2}. Then the operators U, V defined as

U = (K L; M −K*),   V = (K −L; M K*)   (V.6)

are unitary operators on H ⊕ H. (See Exercise I.3.6.) More specially, for each 0 ≤ λ ≤ 1, the operator

W = (λ^{1/2} I   −(1 − λ)^{1/2} I ; (1 − λ)^{1/2} I   λ^{1/2} I)   (V.7)

is a unitary operator on H ⊕ H.

Theorem V.2.1 Let f be a real function on an interval I. Then the following two statements are equivalent:

(i) f is operator convex on I.

(ii) f(C(A)) ≤ C(f(A)) for every Hermitian operator A (on a Hilbert space H) whose spectrum is contained in I and for every pinching C (in the space H).
Proof. (i) ⇒ (ii): Every pinching is a product of pinchings by two complementary projections. (See Problems II.5.4 and II.5.5.) So we need to prove this implication only for pinchings C of the form

C(X) = (X + U*XU)/2,   where U = (I 0; 0 −I).

For such a C,

f(C(A)) = f((A + U*AU)/2) ≤ (f(A) + f(U*AU))/2 = (f(A) + U*f(A)U)/2 = C(f(A)).

(ii) ⇒ (i): Let A, B be Hermitian operators on H, both having their spectrum in I. Consider the operator T = (A 0; 0 B) on H ⊕ H. If W is the unitary operator defined in (V.7), then the diagonal entries of W*TW are λA + (1 − λ)B and (1 − λ)A + λB. So if C is the pinching on H ⊕ H induced by the projections onto the two summands, then

C(W*TW) = (λA + (1 − λ)B   0 ; 0   (1 − λ)A + λB).

By the same argument,

C(f(W*TW)) = C(W* f(T) W) = (λ f(A) + (1 − λ) f(B)   0 ; 0   (1 − λ) f(A) + λ f(B)).

So the condition f(C(W*TW)) ≤ C(f(W*TW)) implies that

f(λA + (1 − λ)B) ≤ λ f(A) + (1 − λ) f(B).   •
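Theorem V.2.1 can be illustrated numerically with the operator convex function f(t) = t^2 of Example V.1.3; a sketch (my illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                        # Hermitian

def pinch(M, k=3):
    """Pinching induced by the projections onto the first k and last n-k coordinates."""
    C = np.zeros_like(M)
    C[:k, :k], C[k:, k:] = M[:k, :k], M[k:, k:]
    return C

# f(t) = t^2 is operator convex, so C(f(A)) - f(C(A)) should be positive.
gap = pinch(A @ A) - pinch(A) @ pinch(A)
print(np.linalg.eigvalsh(gap).min())     # >= 0, up to rounding
```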
Exercise V.2.2 The following conditions are equivalent:

(i) f is operator convex on I.

(ii) f(A_M) ≤ (f(A))_M for every Hermitian operator A with its spectrum in I, and for every compression T → T_M.

(iii) f(V*AV) ≤ V* f(A) V for every Hermitian operator A (on H) with its spectrum in I, and for every isometry V from any Hilbert space into H.

(See Section III.1 for the definition of a compression.)
Theorem V.2.3 Let I be an interval containing 0, and let f be a real function on I. Then the following conditions are equivalent:

(i) f is operator convex on I and f(0) ≤ 0.

(ii) f(K*AK) ≤ K* f(A) K for every contraction K and every Hermitian operator A with spectrum in I.

(iii) f(K_1* A K_1 + K_2* B K_2) ≤ K_1* f(A) K_1 + K_2* f(B) K_2 for all operators K_1, K_2 such that K_1* K_1 + K_2* K_2 ≤ I and for all Hermitian A, B with spectrum in I.

(iv) f(PAP) ≤ P f(A) P for all projections P and Hermitian operators A with spectrum in I.
Proof. (i) ⇒ (ii): Let T = (A 0; 0 0) and let U, V be the unitary operators defined in (V.6). Then

U*TU = (K*AK   K*AL ; LAK   LAL),   V*TV = (K*AK   −K*AL ; −LAK   LAL).

Hence,

(K*AK   0 ; 0   LAL) = (U*TU + V*TV)/2.

So,

(f(K*AK)   0 ; 0   f(LAL)) = f((U*TU + V*TV)/2)
  ≤ (f(U*TU) + f(V*TV))/2
  = (U* f(T) U + V* f(T) V)/2
  = (1/2) { U* (f(A) 0; 0 f(0)) U + V* (f(A) 0; 0 f(0)) V }
  ≤ (1/2) { U* (f(A) 0; 0 0) U + V* (f(A) 0; 0 0) V }
  = (K* f(A) K   0 ; 0   L f(A) L),

where the last inequality uses f(0) ≤ 0. Hence, f(K*AK) ≤ K* f(A) K.

(ii) ⇒ (iii): Let T = (A 0; 0 B) and K = (K_1 0; K_2 0). Then K is a contraction. Note that

K*TK = (K_1* A K_1 + K_2* B K_2   0 ; 0   0).
Hence,

f(K*TK) ≤ K* f(T) K = (K_1* f(A) K_1 + K_2* f(B) K_2   0 ; 0   0),

and comparing the top left corners gives (iii).
(iii) ⇒ (iv) obviously.

(iv) ⇒ (i): Let A, B be Hermitian operators with spectrum in I and let 0 ≤ λ ≤ 1. Let T = (A 0; 0 B), P = (I 0; 0 0), and let W be the unitary operator defined by (V.7). Then

P W*TW P = (λA + (1 − λ)B   0 ; 0   0).

So,

(f(λA + (1 − λ)B)   0 ; 0   f(0)) = f(P W*TW P) ≤ P f(W*TW) P = P W* f(T) W P = (λ f(A) + (1 − λ) f(B)   0 ; 0   0).

Hence, f is operator convex and f(0) ≤ 0.   •
Exercise V.2.4 (i) Let λ_1, λ_2 be positive real numbers such that λ_1 λ_2 ≥ ‖C‖^2. Then the operator

(λ_1 I   C* ; C   λ_2 I)

is positive. (Use Proposition I.3.5.)

(ii) Let (A   C* ; C   B) be a Hermitian operator. Then for every ε > 0, there exists λ > 0 such that

(A   C* ; C   B) ≤ (A + εI   0 ; 0   λI).
The next two theorems are among the several results that describe the connections between operator convexity and operator monotonicity.

Theorem V.2.5 Let f be a (continuous) function mapping the positive half-line [0, ∞) into itself. Then f is operator monotone if and only if it is operator concave.
Proof. Suppose f is operator monotone. If we show that f(K*AK) ≥ K* f(A) K for every positive operator A and every contraction K, then it will follow from Theorem V.2.3 that f is operator concave. Let T = (A 0; 0 0), and let U be the unitary operator defined in (V.6). Then

U*TU = (K*AK   K*AL ; LAK   LAL).

By the assertion in Exercise V.2.4(ii), given any ε > 0, there exists λ > 0 such that

U*TU ≤ (K*AK + εI   0 ; 0   λI).

Hence, by the operator monotonicity of f,

U* f(T) U = f(U*TU) ≤ (f(K*AK + εI)   0 ; 0   f(λ) I).

Since f(0) ≥ 0, the top left corner of U* f(T) U dominates K* f(A) K. Comparing top left corners above, this shows K* f(A) K ≤ f(K*AK + εI) for every ε > 0. Hence K* f(A) K ≤ f(K*AK).

Conversely, suppose f is operator concave. Let 0 ≤ A ≤ B. Then for any 0 < λ < 1 we can write

λB = λA + (1 − λ) (λ/(1 − λ))(B − A).

Since f is operator concave, this gives

f(λB) ≥ λ f(A) + (1 − λ) f((λ/(1 − λ))(B − A)).

Since f(X) is positive for every positive X, it follows that f(λB) ≥ λ f(A). Now let λ → 1. This shows f(B) ≥ f(A). So f is operator monotone.   •
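By Theorems V.1.9 and V.2.5, the square root is operator concave; a numerical spot-check (illustrative numpy sketch, not part of the text):

```python
import numpy as np

def sqrtm_psd(X):
    """Square root of a positive semidefinite matrix via its spectral decomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

rng = np.random.default_rng(4)
n = 5
A = (lambda G: G @ G.T)(rng.standard_normal((n, n)))
B = (lambda G: G @ G.T)(rng.standard_normal((n, n)))

# Midpoint concavity: sqrt((A+B)/2) >= (sqrt(A) + sqrt(B))/2.
gap = sqrtm_psd((A + B) / 2) - (sqrtm_psd(A) + sqrtm_psd(B)) / 2
print(np.linalg.eigvalsh(gap).min())   # >= 0, up to rounding
```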
Corollary V.2.6 Let f be a continuous function from (0, ∞) into itself. If f is operator monotone, then the function g(t) = 1/f(t) is operator convex.

Proof. Let A, B be positive operators. Since f is operator concave,

f((A + B)/2) ≥ (f(A) + f(B))/2.

Since the map X → X^{−1} is order-reversing and convex on positive operators (see Proposition V.1.6 and Exercise V.1.15), this gives

[f((A + B)/2)]^{−1} ≤ [(f(A) + f(B))/2]^{−1} ≤ (f(A)^{−1} + f(B)^{−1})/2.

This is the same as saying g is operator convex.   •
Exercise V.2.7 Let I be an interval containing 0, and let f be a real function on I with f(0) ≤ 0. Show that for every Hermitian operator A with spectrum in I and for every projection P,

f(PAP) ≤ P f(PAP) = P f(PAP) P.
Exercise V.2.8 Let f be a continuous real function on [0, ∞). Then for all positive operators A and projections P,

f(A^{1/2} P A^{1/2}) A^{1/2} P = A^{1/2} P f(PAP).

(Prove this first, by induction, for f(t) = t^n. Then use the Weierstrass approximation theorem to show that this is true for all f.)

Theorem V.2.9 Let f be a (continuous) real function on the interval [0, α). Then the following two conditions are equivalent:

(i) f is operator convex and f(0) ≤ 0.

(ii) The function g(t) = f(t)/t is operator monotone on (0, α).

Proof. (i) ⇒ (ii): Let 0 < A ≤ B. Then 0 < A^{1/2} ≤ B^{1/2}. Hence, B^{−1/2} A^{1/2} is a contraction by Lemma V.1.7. Therefore, using Theorem V.2.3, we see that

f(A) = f(A^{1/2} B^{−1/2} B B^{−1/2} A^{1/2}) ≤ A^{1/2} B^{−1/2} f(B) B^{−1/2} A^{1/2}.

From this one obtains, using Lemma V.1.5,

A^{−1/2} f(A) A^{−1/2} ≤ B^{−1/2} f(B) B^{−1/2}.

Since all functions of an operator commute with each other, this shows that A^{−1} f(A) ≤ B^{−1} f(B). Thus g is operator monotone.

(ii) ⇒ (i): If f(t)/t is monotone on (0, α), we must have f(0) ≤ 0. We will show that f satisfies the condition (iv) of Theorem V.2.3. Let P be any projection and let A be any positive operator with spectrum in (0, α). Then there exists an ε > 0 such that (1 + ε)A has its spectrum in (0, α). Since P + εI ≤ (1 + ε)I, we have A^{1/2}(P + εI)A^{1/2} ≤ (1 + ε)A. So, by the operator monotonicity of g, we have

A^{−1/2} (P + εI)^{−1} A^{−1/2} f(A^{1/2}(P + εI)A^{1/2}) ≤ (1 + ε)^{−1} A^{−1} f((1 + ε)A).

Multiply both sides on the right by A^{1/2}(P + εI) and on the left by its conjugate (P + εI)A^{1/2}. This gives

A^{−1/2} f(A^{1/2}(P + εI)A^{1/2}) A^{1/2}(P + εI) ≤ (1 + ε)^{−1} (P + εI) f((1 + ε)A) (P + εI).

Let ε → 0. This gives

A^{−1/2} f(A^{1/2} P A^{1/2}) A^{1/2} P ≤ P f(A) P.

Use the identity in Exercise V.2.8 to reduce this to P f(PAP) ≤ P f(A) P, and then use the inequality in Exercise V.2.7 to conclude that f(PAP) ≤ P f(A) P, as desired.   •
As corollaries to the above results, we deduce the following statements about the power functions .
Theorem V.2.10 On the positive half-line (0, ∞), the functions f(t) = t^r, where r is a real number, are operator monotone if and only if 0 ≤ r ≤ 1.

Proof. If 0 ≤ r ≤ 1, we know that f(t) = t^r is operator monotone by Theorem V.1.9. If r is not in [0, 1], then the function f(t) = t^r is not concave on (0, ∞). Therefore, it cannot be operator monotone, by Theorem V.2.5.   •

Exercise V.2.11 Consider the functions f(t) = t^r on (0, ∞). Use Theorems V.2.9 and V.2.10 to show that if r > 0, then f(t) is operator convex if and only if 1 ≤ r ≤ 2. Use Corollary V.2.6 to show that f(t) is operator convex if −1 ≤ r < 0. (We will see later that f(t) is not operator convex for any other value of r.)

Exercise V.2.12 A function f from (0, ∞) into itself is both operator monotone and operator convex if and only if it is of the form f(t) = α + βt, with α, β ≥ 0.
Exercise V.2.13 Show that the function f(t) = −t log t is operator concave on (0, ∞).

V.3  Smoothness Properties
Let I be the open interval (−1, 1). Let f be a continuously differentiable function on I. Then we denote by f^[1] the function on I × I defined as

f^[1](λ, μ) = (f(λ) − f(μ))/(λ − μ)  if λ ≠ μ,   f^[1](λ, λ) = f'(λ).

The expression f^[1](λ, μ) is called the first divided difference of f at (λ, μ). If A is a diagonal matrix with diagonal entries λ_1, ..., λ_n, all of which are in I, we denote by f^[1](A) the n × n symmetric matrix whose (i, j)-entry is f^[1](λ_i, λ_j). If A is Hermitian and A = UΛU*, let f^[1](A) = U f^[1](Λ) U*.

Now consider the induced map f on the set of Hermitian matrices with eigenvalues in I. Such matrices form an open set in the real vector space of all Hermitian matrices. The map f is called (Fréchet) differentiable at A if there exists a linear transformation Df(A) on the space of Hermitian matrices such that for all H,

‖f(A + H) − f(A) − Df(A)(H)‖ = o(‖H‖).   (V.8)

The linear operator Df(A) is then called the derivative of f at A. Basic rules of the Fréchet differential calculus are summarised in Chapter X. If
f is differentiable at A, then

Df(A)(H) = (d/dt)|_{t=0} f(A + tH).   (V.9)

There is an interesting relationship between the derivative Df(A) and the matrix f^[1](A). This is explored in the next few paragraphs.
Lemma V.3.1 Let f be a polynomial function. Then for every diagonal matrix A and for every Hermitian matrix H,

Df(A)(H) = f^[1](A) ∘ H,   (V.10)

where ∘ stands for the Schur product of two matrices.

Proof. Both sides of (V.10) are linear in f. Therefore, it suffices to prove this for the powers f(t) = t^p, p = 1, 2, 3, .... For such f, using (V.9), one gets

Df(A)(H) = Σ_{k=1}^{p} A^{k−1} H A^{p−k}.

This is a matrix whose (i, j)-entry is (Σ_{k=1}^{p} λ_i^{k−1} λ_j^{p−k}) h_{ij}. On the other hand, f^[1](λ_i, λ_j) = Σ_{k=1}^{p} λ_i^{k−1} λ_j^{p−k}. So this is the (i, j)-entry of f^[1](A) ∘ H.   •

Corollary V.3.2 If A = UΛU* and f is a polynomial function, then

Df(A)(H) = U [f^[1](Λ) ∘ (U*HU)] U*.   (V.11)

Proof. Note that

(d/dt)|_{t=0} f(UΛU* + tH) = U [(d/dt)|_{t=0} f(Λ + tU*HU)] U*,

and use (V.10).   •

Theorem V.3.3 Let f ∈ C^1(I), and let A be a Hermitian matrix with all its eigenvalues in I. Then

Df(A)(H) = f^[1](A) ∘ H,   (V.12)

where ∘ denotes the Schur product in a basis in which A is diagonal.

Proof. Let A = UΛU*, where Λ is diagonal. We want to prove that

Df(A)(H) = U [f^[1](Λ) ∘ (U*HU)] U*.   (V.13)
This has been proved for all polynomials f. We will extend its validity to all f ∈ C^1 by a continuity argument. Denote the right-hand side of (V.13) by ∇f(A)(H). For each f in C^1, ∇f(A) is a linear map on Hermitian matrices. We have

‖∇f(A)(H)‖_2 = ‖f^[1](Λ) ∘ (U*HU)‖_2.

All entries of the matrix f^[1](Λ) are bounded by max_{|t| ≤ ‖A‖} |f'(t)|. (Use the mean value theorem.) Hence

‖∇f(A)(H)‖_2 ≤ max_{|t| ≤ ‖A‖} |f'(t)| ‖H‖_2.   (V.14)

Let H be a Hermitian matrix with norm so small that the eigenvalues of A + H are in I. Let [a, b] be a closed interval in I containing the eigenvalues of both A and A + H. Choose a sequence of polynomials f_n such that f_n → f and f_n' → f' uniformly on [a, b]. Let L be the line segment joining A and A + H in the space of Hermitian matrices. Then, by the mean value theorem (for Fréchet derivatives), we have

‖f_m(A + H) − f_n(A + H) − (f_m(A) − f_n(A))‖ ≤ ‖H‖ sup_{X ∈ L} ‖Df_m(X) − Df_n(X)‖ = ‖H‖ sup_{X ∈ L} ‖∇f_m(X) − ∇f_n(X)‖.   (V.15)

This is so because we have already shown that Df_n = ∇f_n for the polynomial functions f_n. Let ε be any positive real number. The inequality (V.14) ensures that there exists a positive integer n_0 such that for m, n ≥ n_0 we have

sup_{X ∈ L} ‖∇f_m(X) − ∇f_n(X)‖ < ε/3   (V.16)

and

‖∇f_n(A) − ∇f(A)‖ < ε/3.   (V.17)

Let m → ∞ and use (V.15) and (V.16) to conclude that

‖f(A + H) − f(A) − (f_n(A + H) − f_n(A))‖ ≤ (ε/3)‖H‖.   (V.18)

If ‖H‖ is sufficiently small, then by the definition of the Fréchet derivative, we have

‖f_n(A + H) − f_n(A) − ∇f_n(A)(H)‖ ≤ (ε/3)‖H‖.   (V.19)

Now we can write, using the triangle inequality,

‖f(A + H) − f(A) − ∇f(A)(H)‖ ≤ ‖f(A + H) − f(A) − (f_n(A + H) − f_n(A))‖ + ‖f_n(A + H) − f_n(A) − ∇f_n(A)(H)‖ + ‖(∇f(A) − ∇f_n(A))(H)‖,

and then use (V.17), (V.18), and (V.19) to conclude that, for ‖H‖ sufficiently small, we have

‖f(A + H) − f(A) − ∇f(A)(H)‖ ≤ ε‖H‖.

But this says that Df(A) = ∇f(A).   •
Let t → A(t) be a C^1 map from the interval [0, 1] into the space of Hermitian matrices that have all their eigenvalues in I. Let f ∈ C^1(I), and let F(t) = f(A(t)). Then, by the chain rule, F'(t) = Df(A(t))(A'(t)). Therefore, by the theorem above, we have

F(1) − F(0) = ∫_0^1 f^[1](A(t)) ∘ A'(t) dt,   (V.20)

where for each t the Schur product is taken in a basis that diagonalises A(t).
Theorem V.3.4 Let f ∈ C^1(I). Then f is operator monotone on I if and only if, for every Hermitian matrix A whose eigenvalues are in I, the matrix f^[1](A) is positive.

Proof. Let f be operator monotone, and let A be a Hermitian matrix whose eigenvalues are in I. Let H be the matrix all whose entries are 1. Then H is positive. So, A + tH ≥ A if t ≥ 0. Hence, f(A + tH) − f(A) is positive for small positive t. This implies that Df(A)(H) ≥ 0. So, by Theorem V.3.3, f^[1](A) ∘ H ≥ 0. But, for this special choice of H, this just says that f^[1](A) ≥ 0.

To prove the converse, let A, B be Hermitian matrices whose eigenvalues are in I, and let B ≥ A. Let A(t) = (1 − t)A + tB, 0 ≤ t ≤ 1. Then A(t) also has all its eigenvalues in I. So, by the hypothesis, f^[1](A(t)) ≥ 0 for all t. Note that A'(t) = B − A ≥ 0 for all t. Since the Schur product of two positive matrices is positive, f^[1](A(t)) ∘ A'(t) is positive for all t. So, by (V.20), f(B) − f(A) ≥ 0.   •
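For f(t) = √t (operator monotone by Proposition V.1.8) the matrix f^[1](A) of Theorem V.3.4 has entries 1/(√λ_i + √λ_j), a Cauchy-type matrix, and its positivity can be observed directly; a sketch (my illustration, not from the text):

```python
import numpy as np

# Loewner matrix of f(t) = sqrt(t): (sqrt(a) - sqrt(b)) / (a - b) = 1/(sqrt(a) + sqrt(b)),
# which also gives the correct diagonal value f'(l) = 1/(2 sqrt(l)).
rng = np.random.default_rng(6)
lam = rng.uniform(0.1, 10.0, size=7)
s = np.sqrt(lam)
L = 1.0 / (s[:, None] + s[None, :])

print(np.linalg.eigvalsh(L).min())   # positive, up to rounding
```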
Lemma V.3.5 If f is continuous and operator monotone on (−1, 1), then for each −1 ≤ λ ≤ 1 the function g_λ(t) = (t + λ) f(t) is operator convex.

Proof. We will prove this using Theorem V.2.9. First assume that f is continuous and operator monotone on [−1, 1]. Then the function f(t − 1) is operator monotone on [0, 2). Let g(t) = t f(t − 1). Then g(0) = 0 and the function g(t)/t is operator monotone on (0, 2). Hence, by Theorem V.2.9, g(t) is operator convex on (0, 2). This implies that the function h_1(t) = g(t + 1) = (t + 1) f(t) is operator convex on [−1, 1). Instead of f(t), if the same argument is applied to the function −f(−t), which is also operator
monotone on (−1, 1], we see that the function h_2(t) = −(t + 1) f(−t) is operator convex on [−1, 1). Changing t to −t preserves convexity. So the function h_3(t) = h_2(−t) = (t − 1) f(t) is also operator convex. But for |λ| ≤ 1,

g_λ(t) = ((1 + λ)/2) h_1(t) + ((1 − λ)/2) h_3(t)

is a convex combination of h_1 and h_3. So g_λ is also operator convex.

Now, given f continuous and operator monotone on (−1, 1), the function f((1 − ε)t) is continuous and operator monotone on [−1, 1] for each ε > 0. Hence, by the special case considered above, the function (t + λ) f((1 − ε)t) is operator convex. Let ε → 0, and conclude that the function (t + λ) f(t) is operator convex.   •

The next theorem says that every operator monotone function on I is in the class C^1. Later on, we will see that it is actually in the class C^∞. (This is so even if we do not assume that it is continuous to begin with.) In the proof we make use of some differentiability properties of convex functions and smoothing techniques. For the reader's convenience, these are summarised in Appendices 1 and 2 at the end of the chapter.

Theorem V.3.6 Every operator monotone function f on I is continuously differentiable.
Proof. Let 0 < ε < 1, and let f_ε be a regularisation of f of order ε. (See Appendix 2.) Then f_ε is a C^∞ function on (−1 + ε, 1 − ε). It is also operator monotone. Let f̄(t) = lim_{ε→0} f_ε(t). Then f̄(t) = (1/2)[f(t+) + f(t−)].

Let g_ε(t) = (t + 1) f_ε(t). Then, by Lemma V.3.5, g_ε is operator convex. Let g(t) = lim_{ε→0} g_ε(t). Then g(t) is operator convex. But every convex function (on an open interval) is continuous. So g(t) is continuous. Since g(t) = (t + 1) f̄(t) and t + 1 > 0 on I, this means that f̄(t) is continuous. Hence f(t) = f̄(t). We thus have shown that f is continuous.

Let g(t) = (t + 1) f(t). Then g is a convex function on I. So g is left and right differentiable, and the one-sided derivatives satisfy the properties

g_−'(t) ≤ g_+'(t),   lim_{s↓t} g_+'(s) = g_+'(t),   lim_{s↑t} g_+'(s) = g_−'(t).   (V.21)

But g_±'(t) = f(t) + (t + 1) f_±'(t). Since t + 1 > 0, the derivatives f_±'(t) also satisfy relations like (V.21).

Now let A = (s 0; 0 t), s, t ∈ (−1, 1). If ε is sufficiently small, s and t are in (−1 + ε, 1 − ε). Since f_ε is operator monotone on this interval, by Theorem V.3.4, the matrix f_ε^[1](A) is positive. This implies that

((f_ε(s) − f_ε(t))/(s − t))^2 ≤ f_ε'(s) f_ε'(t).
1 28
V.
Operator Monotone and Operator Convex Functions
above inequality gives, in the limit , the inequality
Now let s l t, and use the fact that the derivatives of f satisfy relations like ( V.21 ) . This gives
< ! u� (t) + t� (t) ] [J� (t) + �� ( t) J , which implies that f� ( t) == f!_ ( t) . Hence f is differentiable. [ !� (t W
The relations • (V. 2 1 ) , which are satisfied by f too , show that f ' is continuous.
Just as monotonicity of functions can be studied via first divided differ ences, convexity requires second divided differences. These are defined as follows. Let f be twice continuously differentiable on the interval I._ Then j [2] is a function defined on I x I x I as follows. If A1 , A2 , A3 are distinct
! [ .1\} ' ' .1\, 2 , .1\, 3 ) 2J (
== j [ 1 l ( A l , A 2 )  j [1l ( A l , A3 ) .1\2 '  .1\' 3
•
For other values of A1 , A 2 , A 3 , j( 2] is defined by continuity; e.g. , / [2 1 ( A , A, A) == � ! ( A ) . 2 "
Show that if A1 , A 2 , A3 are distinct, then j [21 (Al , A 2 , A3 ) is the quotient of the two determinants
Exercise V.3.7 Show that if λ_1, λ_2, λ_3 are distinct, then f^[2](λ_1, λ_2, λ_3) is the quotient of the two determinants

det (f(λ_1) f(λ_2) f(λ_3) ; λ_1 λ_2 λ_3 ; 1 1 1)   and   det (λ_1^2 λ_2^2 λ_3^2 ; λ_1 λ_2 λ_3 ; 1 1 1).

Hence the function f^[2] is symmetric in its three arguments.

Exercise V.3.8 If f(t) = t^m, m = 2, 3, ..., show that

f^[2](λ_1, λ_2, λ_3) = Σ_{p+q+r=m−2, p,q,r≥0} λ_1^p λ_2^q λ_3^r.
Exercise V.3.9 (i) Let f(t) = t^m, m ≥ 2. Let A be an n × n diagonal matrix, A = Σ_{i=1}^{n} λ_i P_i, where P_i are the projections onto the coordinate axes. Show that for every H,

(d^2/dt^2)|_{t=0} f(A + tH) = 2 Σ_{p+q+r=m−2} A^p H A^q H A^r = 2 Σ_{p+q+r=m−2} Σ_{1≤i,j,k≤n} λ_i^p λ_j^q λ_k^r P_i H P_j H P_k,

and hence

(1/2)(d^2/dt^2)|_{t=0} f(A + tH) = Σ_{i,j,k} f^[2](λ_i, λ_j, λ_k) P_i H P_j H P_k.   (V.22)

(ii) Use a continuity argument, like the one used in the proof of Theorem V.3.3, to show that this last formula is valid for all C^2 functions f.

Theorem V.3.10 If f ∈ C^2(I) and f is operator convex, then for each μ ∈ I the function g(λ) = f^[1](μ, λ) is operator monotone.

Proof. Since f is in the class C^2, g is in the class C^1. So, by Theorem
V.3.4, it suffices to prove that, for each n, the n × n matrix with entries g^[1](λ_i, λ_j) is positive for all λ_1, ..., λ_n in I.

Fix n and choose any λ_1, ..., λ_{n+1} in I. Let A be the diagonal matrix with entries λ_1, ..., λ_{n+1}. Since f is operator convex and is twice differentiable, for every Hermitian matrix H the matrix (d^2/dt^2)|_{t=0} f(A + tH) must be positive. If we write P_1, ..., P_{n+1} for the projections onto the coordinate axes, we have an explicit expression for this second derivative in (V.22). Choose H to be the Hermitian matrix whose (n+1)th column is (ξ_1, ..., ξ_n, 0), whose (n+1)th row is its conjugate, and whose remaining entries are all zero, where ξ_1, ..., ξ_n are any complex numbers. Let x be the (n+1)-vector (1, 1, ..., 1, 0). Then

⟨x, P_i H P_j H P_k x⟩ = δ_{j,n+1} ξ_i ξ̄_k   (V.23)

for 1 ≤ i, j, k ≤ n + 1, where δ_{j,n+1} is equal to 1 if j = n + 1, and is equal to 0 otherwise, and we set ξ_{n+1} = 0. So, using the positivity of the matrix (V.22) and then (V.23), we have

0 ≤ Σ_{1≤i,j,k≤n+1} f^[2](λ_i, λ_j, λ_k) ⟨x, P_i H P_j H P_k x⟩ = Σ_{1≤i,k≤n} f^[2](λ_i, λ_{n+1}, λ_k) ξ_i ξ̄_k.

But

f^[2](λ_i, λ_{n+1}, λ_k) = (f^[1](λ_{n+1}, λ_i) − f^[1](λ_{n+1}, λ_k)) / (λ_i − λ_k) = g^[1](λ_i, λ_k)

(putting λ_{n+1} = μ in the definition of g). So we have

0 ≤ Σ_{1≤i,k≤n} g^[1](λ_i, λ_k) ξ_i ξ̄_k.

Since the ξ_i are arbitrary complex numbers, this is equivalent to saying that the n × n matrix [g^[1](λ_i, λ_k)] is positive.   •
Corollary V.3.11 If f ∈ C^2(I), f(0) = 0, and f is operator convex, then the function g(t) = f(t)/t is operator monotone.

Proof. By the theorem above, the function f^[1](0, t) is operator monotone. But this is just the function f(t)/t in this case.   •
Corollary V.3.12 If f is operator monotone on I and f(0) = 0, then the function g(t) = ((t + λ)/t) f(t) is operator monotone for |λ| ≤ 1.

Proof. First assume that f ∈ C^2(I). By Lemma V.3.5, the function g_λ(t) = (t + λ) f(t) is operator convex. By Corollary V.3.11, therefore, g_λ(t)/t is operator monotone. If f is not in the class C^2, consider its regularisations f_ε. These are in C^2. Apply the special case of the above paragraph to the functions f_ε(t) − f_ε(0), and then let ε → 0.   •
Corollary V.3.13 If f is operator monotone on I and f(0) = 0, then f is twice differentiable at 0.

Proof. By Corollary V.3.12, the function g(t) = (1 + 1/t) f(t) is operator monotone, and by Theorem V.3.6, it is continuously differentiable. So the function h defined as h(t) = (1/t) f(t), h(0) = f'(0), is continuously differentiable. This implies that f is twice differentiable at 0.   •

Exercise V.3.14 Let f be a continuous operator monotone function on I. Then the function F(t) = ∫_0^t f(s) ds is operator convex.
Exercise V.3.15 Let f ∈ C^1(I). Then f is operator convex if and only if for all Hermitian matrices A, B with eigenvalues in I we have

f(A) − f(B) ≥ f^[1](B) ∘ (A − B),

where ∘ denotes the Schur product in a basis in which B is diagonal.
V.4  Loewner's Theorems

Consider all functions f on the interval (−1, 1) that are operator monotone and satisfy the conditions

f(0) = 0,   f'(0) = 1.   (V.24)

Let K be the collection of all such functions. Clearly, K is a convex set. We will show that this set is compact in the topology of pointwise convergence and will find its extreme points. This will enable us to write an integral representation for functions in K.

Lemma V.4.1 If f ∈ K, then

f(t) ≤ t/(1 − t)  for 0 ≤ t < 1,
f(t) ≥ t/(1 + t)  for −1 < t ≤ 0,
|f''(0)| ≤ 2.
Proof. Let A = (t 0; 0 0). By Theorem V.3.4, the matrix

f^[1](A) = (f'(t)   f(t)/t ; f(t)/t   1)

is positive. Hence,

(f(t)/t)^2 ≤ f'(t).   (V.25)
Let g_±(t) = (t ± 1) f(t). By Lemma V.3.5, both functions g_± are convex. Hence their derivatives are monotonically increasing functions. Since g_±'(t) = f(t) + (t ± 1) f'(t) and g_±'(0) = ±1, this implies that

f(t) + (t − 1) f'(t) ≥ −1   for t ≥ 0,   (V.26)

and

f(t) + (t + 1) f'(t) ≤ 1   for t ≤ 0.   (V.27)

From (V.25) and (V.26) we obtain

f(t) + 1 ≥ (1 − t) f(t)^2 / t^2   for t ≥ 0.   (V.28)

Now suppose that for some 0 < t < 1 we have f(t) > t/(1 − t). Then f(t)(1 − t)/t > 1, and hence f(t)^2 (1 − t)/t^2 > f(t)/t. So, from (V.28), we get f(t) + 1 > f(t)/t. But this gives the inequality f(t) < t/(1 − t), which contradicts our assumption. This shows that f(t) ≤ t/(1 − t) for 0 ≤ t < 1. The second inequality of the lemma is obtained by the same argument, using (V.27) instead of (V.26).
We have seen in the proof of Corollary V.3.13 that

f'(0) + (1/2) f''(0) = lim_{t→0} [(1 + 1/t) f(t) − f'(0)] / t.

Let t ↓ 0 and use the first inequality of the lemma to conclude that this limit is smaller than 2. Let t ↑ 0, and use the second inequality to conclude that it is bigger than 0. Together, these two imply that |f''(0)| ≤ 2.   •
convergence.
in
The set K is compact
the topology of pointwise
Proof. Let { fi } be any net in K. By the lemma above, the set { fi (t ) } is bounded for each t . So, by Tychonoff ' s Theorem, there exists a subnet { fi } that converges pointwise to a bounded function f . The limit function f is operator monotone, and f (O ) == 0 . If we show that f ' (0) == 1 , we would have shown that f E K, and hence that K is compact. By Corollary V.3 . 12, each of the functions ( 1 + ! ) fi (t) is monotone 1 on (  1 , 1 ) . Since for all i , lim ( l +  ) fi (t) == !I (0) == 1 , we see that t+0 t ( 1 + l )fi (t ) > 1 if t > 0 and is < 1 if t < 0. Hence, if t > 0, we have ( 1 + ) J (t ) > 1 ; and if t < 0, we have the opposite inequality. Since f is continuously differentiable, this shows that f ' ( 0) == 1. •
!
Proposition V.4.3. All extreme points of the set K have the form

    f(t) = \frac{t}{1 - \alpha t},  where \alpha = \frac{1}{2} f''(0).

Proof. Let f \in K. For each \lambda, -1 < \lambda < 1, let

    g_\lambda(t) = (1 + \frac{\lambda}{t}) f(t) - \lambda.

By Corollary V.3.12, g_\lambda is operator monotone. Note that g_\lambda(0) = 0, since f(0) = 0 and f'(0) = 1. Also, g_\lambda'(0) = 1 + \frac{1}{2} \lambda f''(0). So the function h_\lambda defined as

    h_\lambda(t) = \frac{1}{1 + \frac{1}{2} \lambda f''(0)} \left[ (1 + \frac{\lambda}{t}) f(t) - \lambda \right]

is in K. Since |f''(0)| \le 2, we see that |\frac{1}{2} \lambda f''(0)| < 1. We can write

    f = \frac{1}{2} (1 + \frac{1}{2} \lambda f''(0)) h_\lambda + \frac{1}{2} (1 - \frac{1}{2} \lambda f''(0)) h_{-\lambda}.

So, if f is an extreme point of K, we must have f = h_\lambda. This says that

    (1 + \frac{1}{2} \lambda f''(0)) f(t) = (1 + \frac{\lambda}{t}) f(t) - \lambda,

from which we can conclude that

    f(t) = \frac{t}{1 - \frac{1}{2} f''(0) t}.  •

V.4 Loewner's Theorems

Theorem V.4.4. For each f in K there exists a unique probability measure \mu on [-1, 1] such that

    f(t) = \int_{-1}^{1} \frac{t}{1 - \lambda t} \, d\mu(\lambda).    (V.29)
Proof. For -1 < \lambda < 1, consider the functions h_\lambda(t) = \frac{t}{1 - \lambda t}. By Proposition V.4.3, the extreme points of K are included in the family \{h_\lambda\}. Since K is compact and convex, it must be the closed convex hull of its extreme points. (This is the Krein-Milman Theorem.) Finite convex combinations of elements of the family \{h_\lambda : -1 < \lambda < 1\} can also be written as \int h_\lambda \, d\nu(\lambda), where \nu is a probability measure on [-1, 1] with finite support. Since f is in the closure of these combinations, there exists a net \{\nu_i\} of finitely supported probability measures on [-1, 1] such that the net f_i(t) = \int h_\lambda(t) \, d\nu_i(\lambda) converges to f(t). Since the space of probability measures is weak* compact, the net \nu_i has an accumulation point \mu. In other words, a subnet of \int h_\lambda \, d\nu_i(\lambda) converges to \int h_\lambda \, d\mu(\lambda). So

    f(t) = \int h_\lambda(t) \, d\mu(\lambda) = \int \frac{t}{1 - \lambda t} \, d\mu(\lambda).

Now suppose that there are two measures \mu_1 and \mu_2 for which the representation (V.29) is valid. Expand the integrand as a power series

    \frac{t}{1 - \lambda t} = \sum_{n=0}^{\infty} t^{n+1} \lambda^n,

convergent uniformly in |\lambda| \le 1 for every fixed t with |t| < 1. This shows that

    \sum_{n=0}^{\infty} t^{n+1} \int_{-1}^{1} \lambda^n \, d\mu_1(\lambda) = \sum_{n=0}^{\infty} t^{n+1} \int_{-1}^{1} \lambda^n \, d\mu_2(\lambda)

for all |t| < 1. The identity theorem for power series now shows that

    \int_{-1}^{1} \lambda^n \, d\mu_1(\lambda) = \int_{-1}^{1} \lambda^n \, d\mu_2(\lambda),   n = 0, 1, 2, \ldots.

But this is possible if and only if \mu_1 = \mu_2.  •
One consequence of the uniqueness of the measure \mu in the representation (V.29) is that every function h_{\lambda_0} is an extreme point of K (because it can be represented as an integral like this with \mu concentrated at \lambda_0). The normalisations (V.24) were required to make the set K compact. They can now be removed. We have the following result.
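As an illustration of (V.29) (a numerical sketch of mine, not from the text): for f(t) = log(1+t), direct integration shows that the representing measure \mu is Lebesgue measure on [-1, 0], since \int_{-1}^{0} \frac{t}{1 - \lambda t} \, d\lambda = \log(1+t). The quadrature below checks this.

```python
import math

def rep_integral(t, n=200_000):
    # midpoint rule for the integral in (V.29) with d(mu)(lambda) = d(lambda)
    # restricted to [-1, 0] (the measure representing log(1 + t))
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        lam = -1.0 + (k + 0.5) * h
        total += t / (1.0 - lam * t)
    return total * h

for t in (0.3, 0.9, -0.4):
    assert abs(rep_integral(t) - math.log(1.0 + t)) < 1e-6
```

The grid size is an arbitrary choice; the same check works for any t in (-1, 1).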
Corollary V.4.5. Let f be a nonconstant operator monotone function on (-1, 1). Then there exists a unique probability measure \mu on [-1, 1] such that

    f(t) = f(0) + f'(0) \int_{-1}^{1} \frac{t}{1 - \lambda t} \, d\mu(\lambda).    (V.30)

Proof. Since f is monotone and is not a constant, f'(0) \ne 0. Now note that the function \frac{f(t) - f(0)}{f'(0)} is in K.  •

It is clear from the representation (V.30) that every operator monotone function on (-1, 1) is infinitely differentiable. Hence, by the results of earlier sections, every operator convex function is also infinitely differentiable.
Theorem V.4.6. Let f be a nonlinear operator convex function on (-1, 1). Then there exists a unique probability measure \mu on [-1, 1] such that

    f(t) = f(0) + f'(0) t + \frac{1}{2} f''(0) \int_{-1}^{1} \frac{t^2}{1 - \lambda t} \, d\mu(\lambda).    (V.31)

Proof. Assume, without loss of generality, that f(0) = 0 and f'(0) = 0. Let g(t) = f(t)/t. Then g is operator monotone by Corollary V.3.11, g(0) = 0, and g'(0) = \frac{1}{2} f''(0). So g has a representation like (V.30), from which the representation (V.31) for f follows.  •

We have noted that the integral representation (V.30) implies that every operator monotone function on (-1, 1) is infinitely differentiable. In fact, we can conclude more. This representation shows that f has an analytic continuation
    f(z) = f(0) + f'(0) \int_{-1}^{1} \frac{z}{1 - \lambda z} \, d\mu(\lambda)    (V.32)

defined everywhere on the complex plane except on (-\infty, -1] \cup [1, \infty). Note that

    \operatorname{Im} \frac{z}{1 - \lambda z} = \frac{\operatorname{Im} z}{|1 - \lambda z|^2}.

So f defined above maps the upper half-plane H_+ = \{z : \operatorname{Im} z > 0\} into itself. It also maps the lower half-plane H_- into itself. Further, f(\bar z) = \overline{f(z)}. In other words, the function f on H_- is an analytic continuation of f on H_+ across the interval (-1, 1) obtained by reflection. This is a very important observation, because there is a very rich theory of analytic functions in a half-plane that we can exploit now. Before doing so, let us now do away with the special interval (-1, 1). Note that a function f is operator monotone on an interval (a, b) if and only if the function
f(\frac{b-a}{2} t + \frac{b+a}{2}) is operator monotone on (-1, 1). So, all results obtained for operator monotone functions on (-1, 1) can be extended to functions on (a, b). We have proved the following.

Theorem V.4.7. If f is an operator monotone function on (a, b), then f has an analytic continuation to the upper half-plane H_+ that maps H_+ into itself. It also has an analytic continuation to the lower half-plane H_-, obtained by reflection across (a, b). The converse of this is also true: if a real function f on (a, b) has an analytic continuation to H_+ mapping H_+ into itself, then f is operator monotone on (a, b). This is proved below.
Let P be the class of all complex analytic functions defined on H_+ with their ranges in the closed upper half-plane \{z : \operatorname{Im} z \ge 0\}. This is called the class of Pick functions. Since every nonconstant analytic function is an open map, if f is a nonconstant Pick function, then the range of f is contained in H_+. It is obvious that P is a convex cone, and the composition of two nonconstant functions in P is again in P.
Exercise V.4.8

(i) For 0 < r < 1, the function f(z) = z^r is in P.
(ii) The function f(z) = \log z is in P.
(iii) The function f(z) = \tan z is in P.
(iv) The function f(z) = -\frac{1}{z} is in P.
(v) If f is in P, then so is the function -\frac{1}{f}.

Given any open interval (a, b), let P(a, b) be the class of Pick functions that admit an analytic continuation across (a, b) into the lower half-plane and the continuation is by reflection. In particular, such functions take only real values on (a, b), and if they are nonconstant, they assume real values only on (a, b). The set P(a, b) is a convex cone.

Let f \in P(a, b) and write f(z) = u(z) + iv(z), where as usual u(z) and v(z) denote the real and imaginary parts of f. Since v(x) = 0 for a < x < b, we have v(x + iy) - v(x) \ge 0 if y > 0. This implies that the partial derivative v_y(x) \ge 0 and hence, by the Cauchy-Riemann equations, u_x(x) \ge 0. Thus, on the interval (a, b), f(x) = u(x) is monotone. In fact, we will soon see that f is operator monotone on (a, b). This is a consequence of a theorem of Nevanlinna that gives an integral representation of Pick functions. We will give a proof of this now using some elementary results from Fourier analysis. The idea is to use the conformal equivalence between H_+ and the unit disk D to transfer the problem to D, and then study the real part u of f. This is a harmonic function on D, so we can use standard facts from Fourier analysis.
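The membership claims in Exercise V.4.8 can be spot-checked numerically: each candidate function should map sampled points of H_+ into the closed upper half-plane. This is only an illustrative sketch of mine (the sampling region is arbitrary), not a proof.

```python
import cmath
import random

def is_pick_on_samples(func, trials=500, seed=0):
    # sample points z with Im z > 0 and test that Im func(z) >= 0 there
    rng = random.Random(seed)
    for _ in range(trials):
        z = complex(rng.uniform(-5.0, 5.0), rng.uniform(1e-3, 5.0))
        if func(z).imag < -1e-12:
            return False
    return True

candidates = [
    lambda z: cmath.sqrt(z),     # z^r with r = 1/2 (principal branch)
    cmath.log,                   # principal logarithm
    cmath.tan,
    lambda z: -1.0 / z,
]
assert all(is_pick_on_samples(f) for f in candidates)
```

Note that `cmath` uses exactly the principal branches assumed in the text, with branch cuts on the negative real axis.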
Theorem V.4.9. Let u be a nonnegative harmonic function on the unit disk D = \{z : |z| < 1\}. Then there exists a finite measure m on [0, 2\pi] such that

    u(re^{i\theta}) = \int_0^{2\pi} \frac{1 - r^2}{1 + r^2 - 2r \cos(\theta - t)} \, dm(t).    (V.33)

Conversely, any function of this form is positive and harmonic on the unit disk D.

Proof. Let u be any continuous real function defined on the closed unit disk that is harmonic in D. Then, by a well-known and elementary theorem in analysis,

    u(re^{i\theta}) = \frac{1}{2\pi} \int_0^{2\pi} \frac{1 - r^2}{1 + r^2 - 2r \cos(\theta - t)} u(e^{it}) \, dt
                    = \frac{1}{2\pi} \int_0^{2\pi} P_r(\theta - t) u(e^{it}) \, dt,    (V.34)

where P_r(\theta) is the Poisson kernel (defined by the above equation) for 0 \le r < 1, 0 \le \theta \le 2\pi. If u is nonnegative, put dm(t) = \frac{1}{2\pi} u(e^{it}) \, dt. Then m is a positive measure on [0, 2\pi]. By the mean value property of harmonic functions, the total mass of this measure is

    \frac{1}{2\pi} \int_0^{2\pi} u(e^{it}) \, dt = u(0).    (V.35)

So we do have a representation of the form (V.33) under the additional hypothesis that u is continuous on the closed unit disk. The general case is a consequence of this. Let u be positive and harmonic in D. Then, for \varepsilon > 0, the function u_\varepsilon(z) = u(\frac{z}{1+\varepsilon}) is positive and harmonic in the disk |z| < 1 + \varepsilon. Therefore, it can be represented in the form (V.33) with a measure m_\varepsilon(t) of finite total mass u_\varepsilon(0) = u(0). As \varepsilon \to 0, u_\varepsilon converges to u uniformly on compact subsets of D. Since the measures m_\varepsilon all have the same mass, using the weak* compactness of the space of probability measures, we conclude that there exists a positive measure m such that

    u(re^{i\theta}) = \lim_{\varepsilon \to 0} u_\varepsilon(re^{i\theta}) = \int_0^{2\pi} \frac{1 - r^2}{1 + r^2 - 2r \cos(\theta - t)} \, dm(t).

Conversely, since the Poisson kernel P_r is nonnegative, any function represented by (V.33) is nonnegative.  •
Theorem V.4.9 is often called the Herglotz Theorem. It says that every nonnegative harmonic function on the unit disk is the Poisson integral of a positive measure.
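The Poisson representation (V.34) is easy to test numerically. In the sketch below (my illustration; the test function and grid are arbitrary choices), u(z) = Re z^2 + 2 is harmonic and positive on the closed disk, and its boundary values reproduce it through the Poisson kernel; at r = 0 this also checks the mean value property (V.35).

```python
import math

def u(r, theta):
    # harmonic on the plane: the real part of z^2, shifted to be positive
    return r * r * math.cos(2.0 * theta) + 2.0

def poisson_integral(r, theta, n=20_000):
    # (1/2pi) * integral of P_r(theta - t) * u(e^{it}) dt, midpoint rule;
    # the periodic trapezoid/midpoint rule converges very fast here
    total = 0.0
    for k in range(n):
        t = 2.0 * math.pi * (k + 0.5) / n
        kernel = (1.0 - r * r) / (1.0 + r * r - 2.0 * r * math.cos(theta - t))
        total += kernel * u(1.0, t)
    return total / n

for (r, theta) in [(0.0, 0.0), (0.5, 1.0), (0.9, 2.5)]:
    assert abs(poisson_integral(r, theta) - u(r, theta)) < 1e-8
```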
Recall that two harmonic functions u, v are called harmonic conjugates if the function f(z) = u(z) + iv(z) is analytic. Every harmonic function u has a harmonic conjugate that is uniquely determined up to an additive constant.

Theorem V.4.10. Let f(z) = u(z) + iv(z) be analytic on the unit disk D. If u(z) \ge 0, then there exists a finite positive measure m on [0, 2\pi] such that

    f(z) = \int_0^{2\pi} \frac{e^{it} + z}{e^{it} - z} \, dm(t) + iv(0).    (V.36)
Conversely, every function of this form is analytic on D and has a positive real part.

Proof. By Theorem V.4.9, the function u can be written as in (V.33). The Poisson kernel P_r, 0 \le r < 1, can be written as

    P_r(\theta) = \frac{1 - r^2}{1 + r^2 - 2r \cos\theta} = \operatorname{Re} \frac{1 + re^{i\theta}}{1 - re^{i\theta}} = \operatorname{Re} \sum_{n=-\infty}^{\infty} r^{|n|} e^{in\theta}.

Hence,

    P_r(\theta - t) = \operatorname{Re} \frac{e^{it} + re^{i\theta}}{e^{it} - re^{i\theta}},

and

    u(z) = \operatorname{Re} \int_0^{2\pi} \frac{e^{it} + z}{e^{it} - z} \, dm(t).

So, f(z) differs from this last integral only by an imaginary constant. Putting z = 0, one sees that this constant is iv(0). The converse statement is easy to prove.  •
Next, note that the disk D and the half-plane H_+ are conformally equivalent, i.e., there exists an analytic isomorphism between these two spaces. For z \in D, let

    \zeta(z) = i \, \frac{1 + z}{1 - z}.    (V.37)

Then \zeta \in H_+. The inverse of this map is given by

    z(\zeta) = \frac{\zeta - i}{\zeta + i}.    (V.38)

Using these transformations, we can establish an equivalence between the class P and the class of analytic functions on D with positive real part. If f is a function in the latter class, let

    \varphi(\zeta) = i f(z(\zeta)).    (V.39)
Then \varphi \in P. The inverse of this transformation is

    f(z) = -i \varphi(\zeta(z)).    (V.40)

Using these ideas we can prove the following theorem, called Nevanlinna's Theorem.
Theorem V.4.11. A function \varphi is in the Pick class if and only if it has a representation

    \varphi(\zeta) = \alpha + \beta\zeta + \int_{-\infty}^{\infty} \frac{1 + \lambda\zeta}{\lambda - \zeta} \, d\nu(\lambda),    (V.41)

where \alpha is a real number, \beta \ge 0, and \nu is a positive finite measure on the real line.
Proof. Let f be the function on D associated with \varphi via the transformation (V.40). By Theorem V.4.10, there exists a finite positive measure m on [0, 2\pi] such that

    f(z) = \int_0^{2\pi} \frac{e^{it} + z}{e^{it} - z} \, dm(t) - i\alpha.

If f(z) = u(z) + iv(z), then \alpha = -v(0), and the total mass of m is u(0). If the measure m has a positive mass at the singleton \{0\}, let this mass be \beta. Then the expression above reduces to

    f(z) = \int_{(0, 2\pi)} \frac{e^{it} + z}{e^{it} - z} \, dm(t) + \beta \, \frac{1 + z}{1 - z} - i\alpha.

Using the transformations (V.38) and (V.39), we get from this

    \varphi(\zeta) = \alpha + \beta\zeta + i \int_{(0, 2\pi)} \frac{e^{it}(\zeta + i) + (\zeta - i)}{e^{it}(\zeta + i) - (\zeta - i)} \, dm(t).

The last term above is equal to

    \int_{(0, 2\pi)} \frac{\zeta \cos\frac{t}{2} - \sin\frac{t}{2}}{\zeta \sin\frac{t}{2} + \cos\frac{t}{2}} \, dm(t).

Now, introduce a change of variables \lambda = -\cot\frac{t}{2}. This maps (0, 2\pi) onto (-\infty, \infty). The measure m is transformed by the above map to a finite measure \nu on (-\infty, \infty), and the above integral is transformed to

    \int_{-\infty}^{\infty} \frac{1 + \lambda\zeta}{\lambda - \zeta} \, d\nu(\lambda).

This shows that \varphi can be represented in the form (V.41). It is easy to see that every function of this form is a Pick function.  •
There is another form in which it is convenient to represent Pick functions. Note that

    \frac{1 + \lambda\zeta}{\lambda - \zeta} = (\lambda^2 + 1) \left[ \frac{1}{\lambda - \zeta} - \frac{\lambda}{\lambda^2 + 1} \right].

So, if we write d\mu(\lambda) = (\lambda^2 + 1) \, d\nu(\lambda), then we obtain from (V.41) the representation

    \varphi(\zeta) = \alpha + \beta\zeta + \int_{-\infty}^{\infty} \left[ \frac{1}{\lambda - \zeta} - \frac{\lambda}{\lambda^2 + 1} \right] d\mu(\lambda),    (V.42)

where \mu is a positive Borel measure on \mathbb{R} for which \int \frac{1}{\lambda^2 + 1} \, d\mu(\lambda) is finite. (A Borel measure on \mathbb{R} is a measure defined on Borel sets that puts finite mass on bounded sets.)

Now we turn to the question of uniqueness of the above representations. It is easy to see from (V.41) that

    \alpha = \operatorname{Re} \varphi(i).    (V.43)

Therefore, \alpha is uniquely determined by \varphi. Now let \eta be any positive real number. From (V.41) we see that

    \frac{\varphi(i\eta)}{i\eta} = \frac{\alpha}{i\eta} + \beta + \int_{-\infty}^{\infty} \frac{1 + i\lambda\eta}{i\eta(\lambda - i\eta)} \, d\nu(\lambda).
As \eta \to \infty, the integrand converges to 0 for each \lambda. The real and imaginary parts of the integrand are uniformly bounded by 1 when \eta \ge 1. So by the Lebesgue Dominated Convergence Theorem, the integral converges to 0 as \eta \to \infty. Thus,

    \beta = \lim_{\eta \to \infty} \varphi(i\eta)/(i\eta),    (V.44)

and thus \beta is uniquely determined by \varphi. Now we will prove that the measure \mu in (V.42) is uniquely determined by \varphi. Denote by \mu also the unique right continuous monotonically increasing function on \mathbb{R} satisfying \mu(0) = 0 and \mu((a, b]) = \mu(b) - \mu(a) for every interval (a, b]. (This is called the distribution function associated with d\mu.) We will prove the following result, called the Stieltjes inversion formula, from which it follows that \mu is unique.
Theorem V.4.12. If the Pick function \varphi is represented by (V.42), then for any a, b that are points of continuity of the distribution function \mu, we have

    \mu(b) - \mu(a) = \lim_{\eta \to 0} \frac{1}{\pi} \int_a^b \operatorname{Im} \varphi(x + i\eta) \, dx.    (V.45)
Proof. From (V.42) we see that

    \frac{1}{\pi} \int_a^b \operatorname{Im} \varphi(x + i\eta) \, dx = \frac{1}{\pi} \int_a^b \left[ \beta\eta + \int_{-\infty}^{\infty} \frac{\eta}{(\lambda - x)^2 + \eta^2} \, d\mu(\lambda) \right] dx,

the interchange of integrals being permissible by Fubini's Theorem. As \eta \to 0, the first term in the square brackets above goes to 0. The inner integral can be calculated by the change of variables u = \frac{x - \lambda}{\eta}. This gives

    \int_a^b \frac{\eta \, dx}{(x - \lambda)^2 + \eta^2} = \int_{(a-\lambda)/\eta}^{(b-\lambda)/\eta} \frac{du}{u^2 + 1} = \arctan\frac{b - \lambda}{\eta} - \arctan\frac{a - \lambda}{\eta}.

So to prove (V.45), we have to show that

    \mu(b) - \mu(a) = \lim_{\eta \to 0} \frac{1}{\pi} \int_{-\infty}^{\infty} \left[ \arctan\frac{b - \lambda}{\eta} - \arctan\frac{a - \lambda}{\eta} \right] d\mu(\lambda).
We will use the following properties of the function arctan. This is a monotonically increasing odd function on (-\infty, \infty) whose range is (-\frac{\pi}{2}, \frac{\pi}{2}). So,

    0 \le \arctan\frac{b - \lambda}{\eta} - \arctan\frac{a - \lambda}{\eta} < \pi.

If (b - \lambda) and (a - \lambda) have the same sign, then by the addition law for arctan we have

    \arctan\frac{b - \lambda}{\eta} - \arctan\frac{a - \lambda}{\eta} = \arctan\frac{\eta(b - a)}{\eta^2 + (b - \lambda)(a - \lambda)}.

If x is positive, then

    \arctan x = \int_0^x \frac{dt}{1 + t^2} \le \int_0^x dt = x.

Now, let \varepsilon be any given positive number. Since a and b are points of continuity of \mu, we can choose \delta such that

    \mu(a + \delta) - \mu(a - \delta) < \varepsilon/5,
    \mu(b + \delta) - \mu(b - \delta) < \varepsilon/5.
We then have

    \left| \mu(b) - \mu(a) - \frac{1}{\pi} \int_{-\infty}^{\infty} \left[ \arctan\frac{b-\lambda}{\eta} - \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda) \right|

    \le \frac{1}{\pi} \int_a^b \left[ \pi - \arctan\frac{b-\lambda}{\eta} + \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda)
      + \frac{1}{\pi} \int_{-\infty}^a \left[ \arctan\frac{b-\lambda}{\eta} - \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda)
      + \frac{1}{\pi} \int_b^{\infty} \left[ \arctan\frac{b-\lambda}{\eta} - \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda)

    \le \frac{2\varepsilon}{5}
      + \frac{1}{\pi} \int_{-\infty}^{a-\delta} \arctan\frac{\eta(b-a)}{\eta^2 + (b-\lambda)(a-\lambda)} \, d\mu(\lambda)
      + \frac{1}{\pi} \int_{b+\delta}^{\infty} \arctan\frac{\eta(b-a)}{\eta^2 + (b-\lambda)(a-\lambda)} \, d\mu(\lambda)
      + \frac{1}{\pi} \int_{a+\delta}^{b-\delta} \left[ \pi - \arctan\frac{b-\lambda}{\eta} + \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda).

Note that in the two integrals with infinite limits, the arguments of arctan are positive. In the middle integral the variable \lambda runs between a + \delta and b - \delta. For such \lambda, \frac{b-\lambda}{\eta} \ge \frac{\delta}{\eta} and \frac{a-\lambda}{\eta} \le -\frac{\delta}{\eta}. So the right-hand side of the above inequality is dominated by

    \frac{2\varepsilon}{5}
      + \frac{\eta(b-a)}{\pi} \int_{-\infty}^{a-\delta} \frac{d\mu(\lambda)}{\eta^2 + (b-\lambda)(a-\lambda)}
      + \frac{\eta(b-a)}{\pi} \int_{b+\delta}^{\infty} \frac{d\mu(\lambda)}{\eta^2 + (b-\lambda)(a-\lambda)}
      + \frac{1}{\pi} \int_{a+\delta}^{b-\delta} \left[ \pi - 2\arctan\frac{\delta}{\eta} \right] d\mu(\lambda).

The first two integrals are finite (because of the properties of d\mu). The third term is dominated by 2(\frac{\pi}{2} - \arctan\frac{\delta}{\eta})[\mu(b) - \mu(a)]. So we can choose \eta small enough to make each of the last three terms smaller than \varepsilon/5. This proves the theorem.  •
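A numerical sketch of the inversion formula (V.45) (my own illustration, not from the text): take \varphi(\zeta) = Log \zeta, the principal branch. For x < 0, Im Log(x + i\eta) tends to \pi as \eta \downarrow 0, so the associated measure \mu has density 1 on (-\infty, 0) and \mu(b) - \mu(a) = b - a for a < b < 0.

```python
import cmath
import math

def stieltjes_inversion(a, b, eta, n=20_000):
    # (1/pi) * integral over [a, b] of Im(Log(x + i*eta)) dx, midpoint rule
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += cmath.log(complex(x, eta)).imag
    return total * h / math.pi

# mu(b) - mu(a) should be b - a for negative a < b
assert abs(stieltjes_inversion(-2.0, -1.0, eta=1e-4) - 1.0) < 1e-3
```

The cutoff eta and grid size are arbitrary; shrinking eta further sharpens the agreement, as the theorem predicts.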
We have shown above that all the terms occurring in the representation (V.42) are uniquely determined by the relations (V.43), (V.44), and (V.45).

Exercise V.4.13. We have proved the relations (V.33), (V.36), (V.41), and (V.42) in that order. Show that all these are, in fact, equivalent. Hence, each of these representations is unique.
Proposition V.4.14. A Pick function \varphi is in the class P(a, b) if and only if the measure \mu associated with it in the representation (V.42) has zero mass on (a, b).

Proof. Let \varphi(x + i\eta) = u(x + i\eta) + iv(x + i\eta), where u, v are the real and imaginary parts of \varphi. If \varphi can be continued across (a, b), then as \eta \downarrow 0, on any closed subinterval [c, d] of (a, b), v(x + i\eta) converges uniformly to a bounded continuous function v(x) on [c, d]. Hence,

    \mu(d) - \mu(c) = \frac{1}{\pi} \int_c^d v(x) \, dx,

i.e., d\mu(x) = \frac{1}{\pi} v(x) \, dx. If the analytic continuation to the lower half-plane is by reflection across (a, b), then v is identically zero on [c, d], and hence so is \mu.

Conversely, if \mu has no mass on (a, b), then for \zeta in (a, b) the integral in (V.42) is convergent, and is real valued. This shows that the function \varphi can be continued from H_+ to H_- across (a, b) by reflection.  •

The reader should note that the above proposition shows that the converse of Theorem V.4.7 is also true. It should be pointed out that the formula (V.42) defines two analytic functions, one on H_+ and the other on H_-. If these are denoted by \varphi and \psi, then \varphi(\bar\zeta) = \overline{\psi(\zeta)}. So \varphi and \psi are reflections of each other. But they need not be analytic continuations of each other. For this to be the case, the measure \mu should be zero on an interval (a, b) across which the function can be continued analytically.
Exercise V.4.15. If a function f is operator monotone on the whole real line, then f must be of the form f(t) = \alpha + \beta t, \alpha \in \mathbb{R}, \beta \ge 0.
Let us now look at a few simple examples.
Example V.4.16. The function \varphi(\zeta) = -\frac{1}{\zeta} is a Pick function. For this function, we see from (V.43) and (V.44) that \alpha = \beta = 0. Since \varphi is analytic everywhere in the plane except at 0, Proposition V.4.14 tells us that the measure \mu is concentrated at the single point 0.
Example V.4. 1 7
cp( i ) = Re e in:/4 = �From (V. 44) we see that {3 == 0 . If ( == A + iry is any complex number, then 1( I 2 = ( 1 ( 1 + A ) 112 + . sgn (1 ( 1  A ) 1 12 , 2 2 where sgn is the sign of defined to be 1 if > 0 and if < 0 . 1 12 = ( > 0 , we have Im cp( ) ('<1; >.. ) . As ! 0 , 1 ( 1 comes closer to So if j A I. So, Im cp(A + iry ) converges to 0 if A > 0 and to IAI1/ 2 if A < 0. Since a = Re
TJ
t
TJ
rJ ,
rJ
1
rJ
TJ
cp
is positive on the right halfaxis, the measure J.L has no mass at measure can now be determined from {V. 4 5). We have, then
0(
rJ
0.
The
1 / 2 A ) IAI (V.46) dA. + J A  ( A2 + 1 y'2 CXJ Example V .4. 18 Let cp( ( ) == Log (, where Log is the principal branch of the logarithm, defined everywhere except on ( 0] by the formula Log ( == l n l (I + i Arg ( . The function Arg ( is the principal branch of ( 1 / 2 ==
1 _
1

1r

 oo ,
the argument, taking values in (  1r, 1r] . We then have Re(Log i) == 0 a iry Lo ( ) == 0. � lim {3 T] +
CXJ
'l rJ
As rJ l 0, Im (Log( A + i'T])) converges to 1r if A < 0 and to 0 if A > 0. So from (V. 4 5) we see that, the measure J.L is just the restriction of the Lebesgue measure to (oo, 0] . Thus, (V.47) Exercise V.4. 19 For 0 function cp( ( ) == (r . Show
r<
1,
let (r denote the principal branch of the
0(
(r == cos r21f + sin r7r J A  (  A2 A+ 1 ) 1Ai r d.A. CXJ 1r
This includes
(V. 4 6) as a

special case.
1
(V .48 )
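The representation (V.47) can be verified by quadrature. In the sketch below (an illustration; the substitution and grid size are my choices), the change of variables \lambda = -u/(1-u) maps (0, 1) onto (-\infty, 0) and makes the transformed integrand smooth and bounded.

```python
import cmath

def log_via_V47(zeta, n=200_000):
    # midpoint rule for (V.47) after the substitution lambda = -u/(1-u),
    # with Jacobian d(lambda)/du = -1/(1-u)**2 (sign absorbed by the limits)
    h = 1.0 / n
    total = 0.0 + 0.0j
    for k in range(n):
        u = (k + 0.5) * h
        lam = -u / (1.0 - u)
        jac = 1.0 / (1.0 - u) ** 2
        total += (1.0 / (lam - zeta) - lam / (lam * lam + 1.0)) * jac
    return total * h

for zeta in (2.0, 1.0 + 1.0j):
    assert abs(log_via_V47(zeta) - cmath.log(zeta)) < 1e-4
```

The same scheme, with the extra weight |lambda|**r, would check (V.48).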
Let now f be any operator monotone function on (0, \infty). We have seen above that f must have the form

    f(t) = \alpha + \beta t + \int_{-\infty}^{0} \left[ \frac{1}{\lambda - t} - \frac{\lambda}{\lambda^2 + 1} \right] d\mu(\lambda).

By a change of variables we can write this as

    f(t) = \alpha + \beta t + \int_0^{\infty} \left[ \frac{\lambda}{\lambda^2 + 1} - \frac{1}{\lambda + t} \right] d\mu(\lambda),    (V.49)

where \alpha \in \mathbb{R}, \beta \ge 0, and \mu is a positive measure on (0, \infty) such that

    \int_0^{\infty} \frac{d\mu(\lambda)}{\lambda^2 + 1} < \infty.    (V.50)

Suppose f is such that

    f(0) := \lim_{t \downarrow 0} f(t) > -\infty.    (V.51)

Then, it follows from (V.49) that \mu must also satisfy the condition

    \int_0^1 \frac{1}{\lambda} \, d\mu(\lambda) < \infty.    (V.52)

We have from (V.49)

    f(t) - f(0) = \beta t + \int_0^{\infty} \left[ \frac{1}{\lambda} - \frac{1}{\lambda + t} \right] d\mu(\lambda)
                = \beta t + \int_0^{\infty} \frac{\lambda t}{\lambda + t} \cdot \frac{1}{\lambda^2} \, d\mu(\lambda).

Hence, we can write f in the form

    f(t) = \gamma + \beta t + \int_0^{\infty} \frac{\lambda t}{\lambda + t} \, d\omega(\lambda),    (V.53)

where \gamma = f(0) and d\omega(\lambda) = \frac{1}{\lambda^2} \, d\mu(\lambda). From (V.50) and (V.52), we see that the measure \omega satisfies the conditions

    \int_0^1 \lambda \, d\omega(\lambda) < \infty   and   \int_1^{\infty} d\omega(\lambda) < \infty.    (V.54)

These two conditions can, equivalently, be expressed as a single condition

    \int_0^{\infty} \frac{\lambda}{1 + \lambda} \, d\omega(\lambda) < \infty.    (V.55)
We have thus shown that an operator monotone function on (0, \infty) satisfying the condition (V.51) has a canonical representation (V.53), where \gamma \in \mathbb{R}, \beta \ge 0, and \omega is a positive measure satisfying (V.55).

The representation (V.53) is often useful for studying operator monotone functions on the positive half-line (0, \infty). Suppose that we are given a function f as in (V.53). If \omega satisfies the conditions (V.54), then

    \int_0^{\infty} \left[ \frac{1}{\lambda} - \frac{\lambda}{\lambda^2 + 1} \right] \lambda^2 \, d\omega(\lambda) < \infty,

and we can write

    f(t) = \left\{ \gamma + \int_0^{\infty} \left[ \frac{1}{\lambda} - \frac{\lambda}{\lambda^2 + 1} \right] \lambda^2 \, d\omega(\lambda) \right\} + \beta t + \int_0^{\infty} \left[ \frac{\lambda}{\lambda^2 + 1} - \frac{1}{\lambda + t} \right] \lambda^2 \, d\omega(\lambda).

So, if we put the number in braces above equal to \alpha, and d\mu(\lambda) = \lambda^2 \, d\omega(\lambda), then we have a representation of f in the form (V.49).
Exercise V.4.20. Use the considerations in the preceding paragraphs to show that, for 0 < r < 1 and t > 0, we have

    t^r = \frac{\sin r\pi}{\pi} \int_0^{\infty} \frac{\lambda^{r-1} t}{\lambda + t} \, d\lambda.    (V.56)

(See Exercise V.1.10 also.)

Exercise V.4.21. For t > 0, show that

    \log(1 + t) = \int_1^{\infty} \frac{\lambda t}{\lambda + t} \cdot \frac{1}{\lambda^2} \, d\lambda.    (V.57)
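Both exercises can be spot-checked numerically; the sketch below is my own illustration, and the substitutions are arbitrary choices. For (V.56) with r = 1/2, putting \lambda = (u/(1-u))^2 turns the integrand into the smooth function 2t/(u^2 + t(1-u)^2) on (0, 1); for (V.57), \lambda = 1/u gives the smooth integrand t/(1 + ut).

```python
import math

def rhs_V56_half(t, n=200_000):
    # (sin(pi/2)/pi) times the integral in (V.56) with r = 1/2;
    # after lambda = (u/(1-u))**2 the integrand is 2*t/(u**2 + t*(1-u)**2)
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += 2.0 * t / (u * u + t * (1.0 - u) ** 2)
    return total * h / math.pi

def rhs_V57(t, n=200_000):
    # lambda = 1/u maps (0, 1] onto [1, inf); the integrand becomes t/(1 + u*t)
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += t / (1.0 + u * t)
    return total * h

for t in (0.5, 2.0):
    assert abs(rhs_V56_half(t) - math.sqrt(t)) < 1e-6
    assert abs(rhs_V57(t) - math.log(1.0 + t)) < 1e-9
```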
Appendix 1. Differentiability of Convex Functions

Let f be a real valued convex function defined on an interval I. Then f has some smoothness properties, which are listed below.

The function f is Lipschitz on any closed interval [a, b] contained in I^0, the interior of I. So f is continuous on I^0.
At every point x in I^0, the right and left derivatives of f exist. These are defined, respectively, as

    f'_+(x) := \lim_{y \downarrow x} \frac{f(y) - f(x)}{y - x},

    f'_-(x) := \lim_{y \uparrow x} \frac{f(y) - f(x)}{y - x}.

Both these functions are monotonically increasing on I^0. Further,

    \lim_{x \downarrow w} f'_+(x) = f'_+(w),   \lim_{x \uparrow w} f'_+(x) = f'_-(w).

The function f is differentiable except on a countable set E in I^0, i.e., at every point x in I^0 \setminus E the left and right derivatives of f are equal. Further, the derivative f' is continuous on I^0 \setminus E.

If a sequence of convex functions converges at every point of I, then the limit function is convex. The convergence is uniform on any closed interval [a, b] contained in I^0.

Appendix 2. Regularisation of Functions

The convolution of two functions leads to a new function that inherits the stronger of the smoothness properties of the two original functions. This is the idea behind "regularisation" of functions.

Let \varphi be a real function of class C^\infty with the following properties:

    \varphi \ge 0,   \operatorname{supp} \varphi = [-1, 1],   \int \varphi(x) \, dx = 1.

For \varepsilon > 0, let \varphi_\varepsilon(x) = \frac{1}{\varepsilon} \varphi(\frac{x}{\varepsilon}). Then \operatorname{supp} \varphi_\varepsilon = [-\varepsilon, \varepsilon] and \varphi_\varepsilon has all the other properties of \varphi listed above. The functions \varphi_\varepsilon are called mollifiers or smooth approximate identities. If f is a locally integrable function, we define its regularisation of order \varepsilon as the function

    f_\varepsilon(x) = (f * \varphi_\varepsilon)(x) = \int f(x - y) \varphi_\varepsilon(y) \, dy = \int f(x - \varepsilon t) \varphi(t) \, dt.
The family f_\varepsilon has the following properties.

1. Each f_\varepsilon is a C^\infty function.

2. If the support of f is contained in a compact set K, then the support of f_\varepsilon is contained in an \varepsilon-neighbourhood of K.

3. If f is continuous at x_0, then \lim_{\varepsilon \downarrow 0} f_\varepsilon(x_0) = f(x_0).

4. If f has a discontinuity of the first kind at x_0, then \lim_{\varepsilon \downarrow 0} f_\varepsilon(x_0) = \frac{1}{2}[f(x_0+) + f(x_0-)]. (A point x_0 is a point of discontinuity of the first kind if the left and right limits of f at x_0 exist; these limits are denoted as f(x_0-) and f(x_0+), respectively.)

5. If f is continuous, then f_\varepsilon(x) converges to f(x) as \varepsilon \to 0. The convergence is uniform on every compact set.

6. If f is differentiable, then, for every \varepsilon > 0, (f_\varepsilon)' = (f')_\varepsilon.

7. If f is monotone, then, as \varepsilon \to 0, f'_\varepsilon(x) converges to f'(x) at all points x where f'(x) exists. (Recall that a monotone function can have discontinuities of the first kind only and is differentiable almost everywhere.)
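The sketch below (an illustration with my own choices of \varphi and f) builds the standard bump mollifier, normalises it numerically, and checks properties 3 and 4 on a unit step function.

```python
import math

def bump(x):
    # the standard C-infinity bump supported on [-1, 1], not yet normalised
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

_N = 20_000
_H = 2.0 / _N
_NODES = [-1.0 + (k + 0.5) * _H for k in range(_N)]
_Z = sum(bump(t) for t in _NODES) * _H       # normalising constant

def step(x):
    # unit step: a discontinuity of the first kind at 0
    return 1.0 if x > 0 else 0.0

def regularise(f, x, eps):
    # f_eps(x) = integral of f(x - eps*t) * phi(t) dt, with phi = bump/_Z
    total = sum(f(x - eps * t) * bump(t) for t in _NODES) * _H
    return total / _Z

assert abs(regularise(step, 1.0, 0.1) - 1.0) < 1e-12   # property 3 (continuity point)
assert abs(regularise(step, 0.0, 0.1) - 0.5) < 1e-6    # property 4 (jump midpoint)
```

The midpoint value 1/2 at the jump falls out of the symmetry of the bump, exactly as property 4 predicts.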
V.5 Problems
Problem V.5.1. Show that the function f(t) = \exp t is neither operator monotone nor operator convex on any interval.

Problem V.5.2. Let f(t) = \frac{at + b}{ct + d}, where a, b, c, d are real numbers such that ad - bc > 0. Show that f is operator monotone on every interval that does not contain the point -\frac{d}{c}.
Problem V.5.3. Show that the derivative of an operator convex function need not be operator monotone.

Problem V.5.4. Show that for r < -1, the function f(t) = t^r on (0, \infty) is not operator convex. (Hint: The function f^{[1]}(1, t) cannot be continued analytically to a Pick function.) Together with the assertion in Exercise V.2.11, this shows that on the half-line (0, \infty) the function f(t) = t^r is operator convex if -1 \le r \le 0 or if 1 \le r \le 2; and it is not operator convex for any other real r.
Problem V.5.5. A function g on [0, \infty) is operator convex if and only if it is of the form

    g(t) = \alpha + \beta t + \gamma t^2 + \int_0^{\infty} \frac{\lambda t^2}{\lambda + t} \, d\mu(\lambda),

where \alpha, \beta are real numbers, \gamma \ge 0, and \mu is a positive finite measure.
Problem V.5.6. Let f be an operator monotone function on (0, \infty). Then (-1)^{n-1} f^{(n)}(t) \ge 0 for n = 1, 2, \ldots. [A function g on (0, \infty) is said to be completely monotone if for all n \ge 0, (-1)^n g^{(n)}(t) \ge 0. There is a theorem of S.N. Bernstein that says that a function g is completely monotone if and only if there exists a positive measure \mu such that

    g(t) = \int_0^{\infty} e^{-\lambda t} \, d\mu(\lambda).]

The result of this problem says that the derivative of an operator monotone function on (0, \infty) is completely monotone. Thus, f has a Taylor expansion f(t) = \sum_{n=0}^{\infty} a_n (t - 1)^n, in which the coefficients a_n are positive for all odd n and negative for all even n \ge 2.
Problem V.5.7. Let f be a function mapping (0, \infty) into itself. Let g(t) = [f(t^{-1})]^{-1}. Show that if f is operator monotone, then g is also operator monotone. If f is operator convex and f(0) = 0, then g is operator convex.

Problem V.5.8. Show that the function \varphi(\zeta) = -\cot\zeta is a Pick function. Show that in its canonical representation (V.42), \alpha = \beta = 0 and the measure \mu is atomic with mass 1 at the points n\pi for every integer n. Thus, we have the familiar series expansion

    -\cot\zeta = \sum_{n=-\infty}^{\infty} \left[ \frac{1}{n\pi - \zeta} - \frac{n\pi}{n^2\pi^2 + 1} \right].
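The series in Problem V.5.8 converges absolutely once the terms for n and -n are paired, since each pair is O(1/n^2). The sketch below (the cutoff N and test point are my choices) compares a partial sum with -cot \zeta.

```python
import cmath
import math

def neg_cot_partial(zeta, N=100_000):
    # partial sum of the expansion in Problem V.5.8, pairing the terms n and -n
    total = -1.0 / zeta          # the n = 0 term
    for n in range(1, N + 1):
        for m in (n, -n):
            total += 1.0 / (m * math.pi - zeta) \
                     - (m * math.pi) / (m * m * math.pi * math.pi + 1.0)
    return total

zeta = 0.5 + 0.5j
exact = -cmath.cos(zeta) / cmath.sin(zeta)
assert abs(neg_cot_partial(zeta) - exact) < 1e-3
```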
Problem V.5.9. The aim of this problem is to show that if a Pick function \varphi satisfies the growth restriction

    \sup_{\eta > 0} |\eta \varphi(i\eta)| < \infty,    (V.58)

then its representation (V.42) takes the simple form

    \varphi(\zeta) = \int_{-\infty}^{\infty} \frac{1}{\lambda - \zeta} \, d\mu(\lambda),    (V.59)

where \mu is a finite measure. To see this, start with the representation (V.41). The condition (V.58) implies the existence of a constant M that bounds, for all \eta > 0, the quantity |\eta \varphi(i\eta)|, and hence also its real and imaginary parts. This gives two inequalities:

    \left| \alpha\eta + \int_{-\infty}^{\infty} \frac{\eta(1 - \eta^2)\lambda}{\lambda^2 + \eta^2} \, d\nu(\lambda) \right| \le M,

    \left| \beta\eta^2 + \int_{-\infty}^{\infty} \frac{\eta^2(1 + \lambda^2)}{\lambda^2 + \eta^2} \, d\nu(\lambda) \right| \le M.

From the first, conclude that

    \alpha = \lim_{\eta \to \infty} \int_{-\infty}^{\infty} \frac{(\eta^2 - 1)\lambda}{\lambda^2 + \eta^2} \, d\nu(\lambda) = \int_{-\infty}^{\infty} \lambda \, d\nu(\lambda).

From the second, conclude that \beta = 0 and

    \int_{-\infty}^{\infty} \frac{\eta^2(1 + \lambda^2)}{\lambda^2 + \eta^2} \, d\nu(\lambda) \le M.

Taking limits as \eta \to \infty, this gives

    \int_{-\infty}^{\infty} (1 + \lambda^2) \, d\nu(\lambda) = \int_{-\infty}^{\infty} d\mu(\lambda) \le M.

Thus, \mu is a finite measure. From (V.41), we get

    \varphi(\zeta) = \int_{-\infty}^{\infty} \lambda \, d\nu(\lambda) + \int_{-\infty}^{\infty} \frac{1 + \lambda\zeta}{\lambda - \zeta} \, d\nu(\lambda) = \int_{-\infty}^{\infty} \frac{\lambda^2 + 1}{\lambda - \zeta} \, d\nu(\lambda).

This is the same as (V.59). Conversely, observe that if \varphi has a representation like (V.59), then it must satisfy the condition (V.58).
I .A � t dJ.L (A) , CXJ
f (t ) == Q + {3t 
0
where a E �' {3 > 0 and J.L is a positive measure such that J l dJ.L ( A ) < oo . Then f is operator monotone. Find operator monotone functions that can not be expressed in this form.
V.6 Notes and References
Operator monotone functions were first studied in detail by K. Löwner (C. Loewner) in a seminal paper, Über monotone Matrixfunktionen, Math. Z., 38 (1934) 177-216. In this paper, he established the connection between operator monotonicity, the positivity of the matrix of divided differences (Theorem V.3.4), and Pick functions. He also noted that the functions f(t) = t^r, 0 \le r \le 1, and f(t) = \log t are operator monotone on (0, \infty).
Operator convex functions were studied, soon afterwards, by F. Kraus, Über konvexe Matrixfunktionen, Math. Z., 41 (1936) 18-42. In another well-known paper, Beiträge zur Störungstheorie der Spektralzerlegung, Math. Ann., 123 (1951) 415-438, E. Heinz used the theory of operator monotone functions to study several problems of perturbation theory for bounded and unbounded operators. The integral representation (V.41) in this context seems to have been first used by him. The operator monotonicity of the map A \mapsto A^r for 0 \le r \le 1 is sometimes called the "Loewner-Heinz inequality", although it was discovered by Loewner.

J. Bendat and S. Sherman, Monotone and convex operator functions, Trans. Amer. Math. Soc., 79 (1955) 58-71, provided a new perspective on the theorems of Loewner and Kraus. Theorem V.4.4 was first proved by them, and used to give a proof of Loewner's theorems.

A completely different and extremely elegant proof of Loewner's Theorem, based on the spectral theorem for (unbounded) selfadjoint operators, was given by A. Koranyi, On a theorem of Löwner and its connections with resolvents of selfadjoint transformations, Acta Sci. Math. Szeged, 17 (1956) 63-70.

Formulas like (V.13) and (V.22) were proved by Ju. L. Daleckii and S.G. Krein, Formulas of differentiation according to a parameter of functions of Hermitian operators, Dokl. Akad. Nauk SSSR, 76 (1951) 13-16. It was pointed out by M.G. Krein that the resulting Taylor formula could be used to derive conditions for operator monotonicity.

A concise presentation of the main ideas of operator monotonicity and convexity, including the approach of Daleckii and Krein, was given by C. Davis, Notions generalizing convexity for functions defined on spaces of matrices, in Convexity: Proceedings of Symposia in Pure Mathematics, American Mathematical Society, 1963, pp. 187-201. This paper also discussed other notions of convexity, examples and counterexamples, and was very influential.
A full book devoted to this topic is Monotone Matrix Functions and Analytic Continuation, by W.F. Donoghue, Springer-Verlag, 1974. Several ramifications of the theory and its connections with classical real and complex analysis are discussed there.

In a set of mimeographed lecture notes, Topics on Operator Inequalities, Hokkaido University, Sapporo, 1978, T. Ando provided a very concise modern survey of operator monotone and operator convex functions. Anyone who wishes to learn the Koranyi method mentioned above should certainly read these notes.

A short proof of Löwner's Theorem appeared in G. Sparr, A new proof of Löwner's theorem on monotone matrix functions, Math. Scand., 47 (1980) 266-274. In another brief and attractive paper, Jensen's inequality for operators and Löwner's theorem, Math. Ann., 258 (1982) 229-241, F. Hansen and G.K. Pedersen provided another approach.
V.6 Notes and References
Much of Sections 2, 3, and 4 is based on this paper of Hansen and Pedersen. For the latter parts of Section 4 we have followed Donoghue. We have also borrowed freely from Ando and from Davis. Our proof of Theorem V.1.9 is taken from M. Fujii and T. Furuta, Löwner-Heinz, Cordes and Heinz-Kato inequalities, Math. Japonica, 38 (1993) 73-78. Characterisations of operator convexity like the one in Exercise V.3.15 may be found in J.S. Aujla and H.L. Vasudeva, Convex and monotone operator functions, Ann. Polonici Math., 62 (1995) 1-11. Operator monotone and operator convex functions are studied in R.A. Horn and C.R. Johnson, Topics in Matrix Analysis, Chapter 6. See also the interesting paper R.A. Horn, The Hadamard product, in C.R. Johnson, ed., Matrix Theory and Applications, American Mathematical Society, 1990. A short, but interesting, section of the Marshall-Olkin book (cited in Chapter 2) is devoted to this topic. Especially interesting are some of the examples and connections with statistics that they give. Among several applications of these ideas, there are two that we should mention here. Operator monotone functions arise often in the study of electrical networks. See, e.g., W.N. Anderson and G.E. Trapp, A class of monotone operator functions related to electrical network theory, Linear Algebra Appl., 15 (1975) 53-67. They also occur in problems related to elementary particles. See, e.g., E. Wigner and J. von Neumann, Significance of Löwner's theorem in the quantum theory of collisions, Ann. of Math., 59 (1954) 418-433. There are important notions of means of operators that are useful in the analysis of electrical networks and in quantum physics. An axiomatic approach to the study of these means was introduced by F. Kubo and T. Ando, Means of positive linear operators, Math. Ann., 249 (1980) 205-224.
They establish a one-to-one correspondence between the class of operator monotone functions $f$ on $(0, \infty)$ with $f(1) = 1$ and the class of operator means.
VI Spectral Variation of Normal Matrices
Let $A$ be an $n \times n$ Hermitian matrix, and let $\lambda_1^\downarrow(A) \ge \lambda_2^\downarrow(A) \ge \cdots \ge \lambda_n^\downarrow(A)$ be the eigenvalues of $A$ arranged in decreasing order. In Chapter III we saw that $\lambda_j^\downarrow(A)$, $1 \le j \le n$, are continuous functions on the space of Hermitian matrices. This is a very special consequence of Weyl's Perturbation Theorem: if $A, B$ are two Hermitian matrices, then
$$\max_j |\lambda_j^\downarrow(A) - \lambda_j^\downarrow(B)| \le \|A - B\|.$$
In turn, this inequality is a special case of the inequality (IV.62), which says that if $\mathrm{Eig}^\downarrow(A)$ denotes the diagonal matrix with entries $\lambda_j^\downarrow(A)$ down its diagonal, then we have
$$|||\mathrm{Eig}^\downarrow(A) - \mathrm{Eig}^\downarrow(B)||| \le |||A - B|||$$
for all Hermitian matrices A, B and for all unitarily invariant norms. In this chapter we explore how far these results can be carried over to normal matrices. The first difficulty we face is that , if the matrices are not Hermitian, there is no natural way to order their eigenvalues. So, the problem has to be formulated in terms of optimal matchings. Even after this has been done, analogues of the inequalities above turn out to be a little more complicated. Though several good results are known, many await discovery.
VI.1 Continuity of Roots of Polynomials
Every polynomial of degree $n$ with complex coefficients has $n$ complex roots. These are unique, except for an ordering. It is thus natural to think of them as an unordered $n$-tuple of complex numbers. The space of such $n$-tuples is denoted by $\mathbb{C}^n_{sym}$. This is the quotient space obtained from the space $\mathbb{C}^n$ via the equivalence relation that identifies two $n$-tuples if their coordinates are permutations of each other. The space $\mathbb{C}^n_{sym}$ thus inherits a natural quotient topology from $\mathbb{C}^n$. It also has a natural metric: if $\lambda = \{\lambda_1, \ldots, \lambda_n\}$ and $\mu = \{\mu_1, \ldots, \mu_n\}$ are two points in $\mathbb{C}^n_{sym}$, then
$$d(\lambda, \mu) = \min_\sigma \max_j |\lambda_j - \mu_{\sigma(j)}|,$$
where the minimum is taken over all permutations $\sigma$. See Problem II.5.9. This metric is called the optimal matching distance between $\lambda$ and $\mu$.

Exercise VI.1.1 Show that the quotient topology on $\mathbb{C}^n_{sym}$ and the metric topology generated by the optimal matching distance are identical.
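The optimal matching distance is easy to compute by brute force for small $n$, minimising over all $n!$ permutations. The following Python sketch is our illustration (the function name is ours, not the book's):

```python
import itertools

def optimal_matching_distance(lam, mu):
    # d(lambda, mu) = min over permutations sigma of max_j |lam_j - mu_sigma(j)|
    return min(
        max(abs(l - m) for l, m in zip(lam, perm))
        for perm in itertools.permutations(mu)
    )

# Two unordered triples of complex numbers.
lam = [0j, 1 + 0j, 1j]
mu = [0.1 + 0j, 1.05 + 0j, 0.9j]
d = optimal_matching_distance(lam, mu)
```

Here the identity pairing already matches each point to its nearest partner, so $d = 0.1$.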
Recall that, if
$$f(z) = z^n - a_1 z^{n-1} + a_2 z^{n-2} + \cdots + (-1)^n a_n \qquad \text{(VI.1)}$$
is a monic polynomial with roots $\alpha_1, \ldots, \alpha_n$, then the coefficients $a_j$ are elementary symmetric polynomials in the variables $\alpha_1, \ldots, \alpha_n$, i.e.,
$$a_j = \sum_{1 \le i_1 < \cdots < i_j \le n} \alpha_{i_1} \cdots \alpha_{i_j}. \qquad \text{(VI.2)}$$
By the Fundamental Theorem of Algebra, we have a bijection $S : \mathbb{C}^n_{sym} \to \mathbb{C}^n$ defined as
$$S(\{\alpha_1, \ldots, \alpha_n\}) = (a_1, \ldots, a_n). \qquad \text{(VI.3)}$$
Clearly $S$ is continuous, by the definition of the quotient topology. We will show that $S^{-1}$ is also continuous. For this we have to show that for every $\varepsilon > 0$, there exists $\delta > 0$ such that if $|a_j - b_j| < \delta$ for all $j$, then the optimal matching distance between the roots of the monic polynomials that have $a_j$ and $b_j$ as their coefficients is smaller than $\varepsilon$. Let $\zeta_1, \ldots, \zeta_k$ be the distinct roots of the monic polynomial $f$ that has coefficients $a_j$. Given $\varepsilon > 0$, we can choose circles $\Gamma_j$, $1 \le j \le k$, centred at $\zeta_j$, each having radius smaller than $\varepsilon$ and such that none of them intersects any other. Let $\Gamma$ be the union of the boundaries of all these circles. Let $\eta = \inf_{z \in \Gamma} |f(z)|$. Then $\eta > 0$. Since $\Gamma$ is a compact set, there exists a positive number $\delta$ such that if $g$ is any monic polynomial with coefficients $b_j$, and $|a_j - b_j| < \delta$ for all $j$, then $|f(z) - g(z)| < \eta$ for all $z \in \Gamma$. So, by Rouché's Theorem $f$ and $g$ have the same number of zeroes inside each $\Gamma_j$, where the zeroes are counted with multiplicities. Thus we can pair each root of $f$ with a root of $g$ in
such a way that the distance between any two pairs is smaller than $\varepsilon$. In other words, the optimal matching distance between the roots of $f$ and $g$ is smaller than $\varepsilon$. We have thus proved the following.
Theorem VI.1.2 The map $S$ is a homeomorphism between $\mathbb{C}^n_{sym}$ and $\mathbb{C}^n$.

The continuity of $S^{-1}$ means that the roots of a polynomial vary continuously with the coefficients. Since the coefficients of its characteristic polynomial change continuously with a matrix, it follows that the eigenvalues of a matrix also vary continuously. More precisely, the map $M(n) \to \mathbb{C}^n_{sym}$ that takes a matrix to the unordered tuple of its eigenvalues is continuous. A different kind of continuity question is the following. If $z \mapsto A(z)$ is a continuous map from a domain $G$ in the complex plane into $M(n)$, then do there exist $n$ continuous functions $\lambda_1(z), \ldots, \lambda_n(z)$ on $G$ such that for each $z$ they are the eigenvalues of the matrix $A(z)$? The example below shows that this is not always the case.
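This continuity can be observed numerically: perturbing the coefficients of a polynomial slightly moves its roots, as an unordered tuple, only slightly. A NumPy sketch (ours; `np.roots` uses a different sign convention for the coefficients than (VI.1), which does not affect the distance):

```python
import itertools
import numpy as np

def matching_distance(a, b):
    return min(max(abs(x - y) for x, y in zip(a, p))
               for p in itertools.permutations(b))

# f(z) = z^3 - 6z^2 + 11z - 6 has roots 1, 2, 3.
f = [1.0, -6.0, 11.0, -6.0]
# Perturb each coefficient by about 1e-6.
g = [1.0, -6.0 + 1e-6, 11.0 - 1e-6, -6.0 + 1e-6]
d = matching_distance(np.roots(f), np.roots(g))
```

Since the roots 1, 2, 3 are well separated, the optimal matching distance here is of the same order as the perturbation of the coefficients.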
Example VI.1.3 Let $A(z) = \begin{pmatrix} 0 & z \\ 1 & 0 \end{pmatrix}$. The eigenvalues of $A(z)$ are $\pm z^{1/2}$. These cannot be represented by two single-valued continuous functions on any domain $G$ that contains zero.
In two special situations, the answer to the question raised above is in the affirmative. If either the eigenvalues of $A(z)$ are all real, or if $G$ is an interval on the real line, a continuous parametrisation of the eigenvalues of $A(z)$ is possible. This is shown below. Consider the map from $\mathbb{R}^n_{sym}$ to $\mathbb{R}^n$ that rearranges an unordered $n$-tuple $\{\lambda_1, \ldots, \lambda_n\}$ in decreasing order as $(\lambda_1^\downarrow, \ldots, \lambda_n^\downarrow)$. From the majorisation relation (II.35) it follows that this map reduces distances, i.e.,
$$\max_j |\lambda_j^\downarrow - \mu_j^\downarrow| \le \min_\sigma \max_j |\lambda_j - \mu_{\sigma(j)}|.$$
Hence, in particular, this is a continuous map. So, if all the eigenvalues of $A(z)$ are real, enumerating them as $\lambda_1^\downarrow(z) \ge \cdots \ge \lambda_n^\downarrow(z)$ gives a continuous parametrisation for them. We should remark that while this is the most natural way of ordering real $n$-tuples, it is not always the most convenient. It could destroy the differentiability of these functions, which some other ordering might confer on them. For example, on any interval containing 0 the two functions $\pm t$ are differentiable. But rearrangement in the way above leads to the functions $\pm|t|$, which are not differentiable at 0. For maps from an interval we have the following.
Theorem VI.1.4 Let $\lambda$ be a continuous map from an interval $I$ into the space $\mathbb{C}^n_{sym}$. Then there exist $n$ continuous complex functions $\lambda_j(t)$ on $I$ such that $\lambda(t) = \{\lambda_1(t), \ldots, \lambda_n(t)\}$ for each $t \in I$.
Proof. For brevity we will call $n$ functions whose existence is asserted by the theorem a continuous selection for $\lambda$. Suppose a continuous selection $\lambda_j^{(1)}(t)$ exists on a subinterval $I_1$ and another continuous selection $\lambda_j^{(2)}(t)$ exists on a subinterval $I_2$. If $I_1$ and $I_2$ have a common point $t_0$, then $\{\lambda_j^{(1)}(t_0)\}$ and $\{\lambda_j^{(2)}(t_0)\}$ are identical up to a permutation. So a continuous selection exists on $I_1 \cup I_2$. It follows that, if $J$ is a subinterval of $I$ such that each point of $J$ has a neighbourhood on which a continuous selection exists, then a continuous selection exists on the entire interval $J$. Now we can prove the theorem by induction on $n$. The statement is obviously true for $n = 1$. Suppose it is true for dimensions smaller than $n$. Let $K$ be the set of all $t \in I$ for which all the $n$ elements of $\lambda(t)$ are equal. Then $K$ is a closed subset of $I$. Let $L = I \setminus K$. Let $t_0 \in L$. Then $\lambda(t_0)$ has at least two distinct elements. Collect all the copies of one of these elements. If these are $k$ in number (i.e., $k$ is the multiplicity of the chosen element), then the $n$ elements of $\lambda(t_0)$ are now divided into two groups with $k$ and $n - k$ elements, respectively. These two groups have no element in common. Since $\lambda(t)$ is continuous, for $t$ sufficiently close to $t_0$ the elements of $\lambda(t)$ also split into two groups of $k$ and $n - k$ elements, each of which is continuous in $t$. By the induction hypothesis, each of these groups has a continuous selection in a neighbourhood of $t_0$. Taken together, they provide a continuous selection for $\lambda$ in this neighbourhood. So, a continuous selection exists on each component of $L$. On its complement $K$, $\lambda(t)$ consists of just one element $\lambda(t)$ repeated $n$ times. Putting these together we obtain a continuous selection for $\lambda(t)$ on all of $I$. •
Corollary VI.1.5 Let $a_j(t)$, $1 \le j \le n$, be continuous complex valued functions defined on an interval $I$. Then there exist continuous functions $\alpha_1(t), \ldots, \alpha_n(t)$ that, for each $t \in I$, constitute the roots of the monic polynomial $z^n - a_1(t) z^{n-1} + \cdots + (-1)^n a_n(t)$.
Corollary VI.1.6 Let $t \mapsto A(t)$ be a continuous map from an interval $I$ into the space of $n \times n$ matrices. Then there exist continuous functions $\lambda_1(t), \ldots, \lambda_n(t)$ that, for each $t \in I$, are the eigenvalues of $A(t)$.
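Corollary VI.1.6 can be illustrated numerically by tracking the eigenvalues along a grid of the interval and, at each step, matching the new eigenvalues greedily to the previous ones. This is a sketch of one practical way to produce approximately continuous selections (our code, not the construction used in the proof):

```python
import numpy as np

def eigenvalue_paths(A_of_t, ts):
    # Greedy continuous selection: match each new eigenvalue to the
    # nearest previous one, so that each path moves little per step.
    paths = [np.sort_complex(np.linalg.eigvals(A_of_t(ts[0])))]
    for t in ts[1:]:
        ev = list(np.linalg.eigvals(A_of_t(t)))
        nxt = []
        for p in paths[-1]:
            j = min(range(len(ev)), key=lambda k: abs(ev[k] - p))
            nxt.append(ev.pop(j))
        paths.append(np.array(nxt))
    return np.array(paths)

A = lambda t: np.array([[0.0, t], [1.0, 0.0]])   # eigenvalues +-sqrt(t)
ts = np.linspace(-1.0, 1.0, 201)
paths = eigenvalue_paths(A, ts)
max_step = np.max(np.abs(np.diff(paths, axis=0)))
```

For this matrix (the one from Example VI.1.3) the two paths pass through 0 at $t = 0$; each step stays small even though no single-valued analytic labelling exists on a complex neighbourhood of 0.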
VI.2 Hermitian and Skew-Hermitian Matrices
In this section we derive some bounds for the distance between the eigenvalues of a Hermitian matrix $A$ and those of a skew-Hermitian matrix $B$. This will reveal several new facets of the general problem that are quite different from the case when both $A, B$ are Hermitian. Let us recall here, once again, the theorem that is the prototype of the results we seek.
Theorem VI.2.1 (Weyl's Perturbation Theorem) Let $A, B$ be Hermitian matrices with eigenvalues $\lambda_1^\downarrow(A) \ge \cdots \ge \lambda_n^\downarrow(A)$ and $\lambda_1^\downarrow(B) \ge \cdots \ge \lambda_n^\downarrow(B)$, respectively. Then
$$\max_j |\lambda_j^\downarrow(A) - \lambda_j^\downarrow(B)| \le \|A - B\|. \qquad \text{(VI.4)}$$
We have seen two different proofs of this, one in Section III.2 and the other in Section IV.3. It is the latter idea which, in modified forms, will be used often in the following paragraphs.
Theorem VI.2.2 Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix. Let their eigenvalues $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ be arranged in such a way that
$$|\alpha_1| \ge \cdots \ge |\alpha_n|, \qquad |\beta_1| \ge \cdots \ge |\beta_n|. \qquad \text{(VI.5)}$$
Then
$$\max_j |\alpha_j - \beta_{n-j+1}| \le \|A - B\|. \qquad \text{(VI.6)}$$
Proof. For a fixed index $j$, consider the eigenspaces of $A$ and $B$ corresponding to their eigenvalues $\{\alpha_1, \ldots, \alpha_j\}$ and $\{\beta_1, \ldots, \beta_{n-j+1}\}$, respectively. Let $x$ be a unit vector in their intersection. Then
$$\|A - B\|^2 = \tfrac{1}{2}(\|A - B\|^2 + \|A + B\|^2) \ge \tfrac{1}{2}(\|(A - B)x\|^2 + \|(A + B)x\|^2) = \|Ax\|^2 + \|Bx\|^2 \ge |\alpha_j|^2 + |\beta_{n-j+1}|^2 = |\alpha_j - \beta_{n-j+1}|^2.$$
At the first step above, we used the equality $\|T\| = \|T^*\|$, valid for all $T$; at the third step we used the parallelogram law; and at the last step the fact that $\alpha_j$ is real and $\beta_{n-j+1}$ is imaginary. •
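The bound of Theorem VI.2.2 can be spot-checked with random matrices. In this NumPy sketch (ours, not the book's), `np.linalg.norm(·, 2)` is the operator norm:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2          # Hermitian
B = (Y - Y.conj().T) / 2          # skew-Hermitian

# Order both spectra by decreasing modulus, as in (VI.5).
alpha = sorted(np.linalg.eigvalsh(A), key=abs, reverse=True)
beta = sorted(np.linalg.eigvals(B), key=abs, reverse=True)

lhs = max(abs(alpha[j] - beta[n - 1 - j]) for j in range(n))
rhs = np.linalg.norm(A - B, 2)
```

The quantity `lhs` is the left side of (VI.6); it never exceeds `rhs`.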
For Hermitian pairs $A, B$ we have seen analogues of the inequality (VI.4) for other unitarily invariant norms. It is, therefore, natural to ask for similar kinds of results when $A$ is Hermitian and $B$ skew-Hermitian. It is convenient to do this in the following setup. Let $T$ be any matrix and let $T = A + iB$ be its Cartesian decomposition into real and imaginary parts, $A = \frac{T + T^*}{2}$ and $B = \frac{T - T^*}{2i}$. The theorem below gives majorisation relations between the eigenvalues of $A$ and $B$, and the singular values of $T$. From these several inequalities can be obtained. We will use the notation $\{x_j\}_j$ to mean an $n$-vector whose $j$th coordinate is $x_j$.
Theorem VI.2.3 Let $A, B$ be Hermitian matrices with eigenvalues $\alpha_j$ and $\beta_j$, respectively, ordered so that
$$|\alpha_1| \ge \cdots \ge |\alpha_n|, \qquad |\beta_1| \ge \cdots \ge |\beta_n|.$$
Let $T = A + iB$, and let $s_j$ be the singular values of $T$. Then the following majorisation relations are satisfied:
$$\{|\alpha_j + i\beta_{n-j+1}|^2\}_j \prec_w \{s_j^2\}_j, \qquad \text{(VI.7)}$$
$$\{\tfrac{1}{2}(s_j^2 + s_{n-j+1}^2)\}_j \prec_w \{|\alpha_j + i\beta_j|^2\}_j. \qquad \text{(VI.8)}$$

Proof.
For any two Hermitian matrices $X, Y$ we have the majorisations (III.13):
$$\lambda^\downarrow(X) + \lambda^\uparrow(Y) \prec \lambda(X + Y) \prec \lambda^\downarrow(X) + \lambda^\downarrow(Y).$$
Choosing $X = A^2$, $Y = B^2$, this gives
$$\{\alpha_j^2 + \beta_{n-j+1}^2\}_j \prec \lambda(A^2 + B^2) \prec \{\alpha_j^2 + \beta_j^2\}_j. \qquad \text{(VI.9)}$$
Now note that
$$T^*T + TT^* = 2(A^2 + B^2),$$
and
$$\lambda_j(T^*T) = \lambda_j(TT^*) = s_j^2.$$
So, choosing $X = T^*T$ and $Y = TT^*$ in the first majorisation above gives
$$\{\tfrac{1}{2}(s_j^2 + s_{n-j+1}^2)\}_j \prec \lambda(A^2 + B^2) \prec \{s_j^2\}_j. \qquad \text{(VI.10)}$$
Since majorisation is a transitive relation, the two assertions (VI.7) and (VI.8) follow from (VI.9) and (VI.10). •
For each $p \ge 2$, the function $\varphi(t) = t^{p/2}$ is convex and monotone on $[0, \infty)$. Hence, the weak majorisations (VI.7) and (VI.8) lead to
$$\{|\alpha_j + i\beta_{n-j+1}|^p\}_j \prec_w \{s_j^p\}_j, \qquad \text{(VI.11)}$$
$$\{2^{-p/2}(s_j^2 + s_{n-j+1}^2)^{p/2}\}_j \prec_w \{|\alpha_j + i\beta_j|^p\}_j. \qquad \text{(VI.12)}$$
These two relations include the inequalities
$$\sum_{j=1}^n |\alpha_j + i\beta_{n-j+1}|^p \le \sum_{j=1}^n s_j^p, \qquad \text{(VI.13)}$$
$$2^{-p/2} \sum_{j=1}^n (s_j^2 + s_{n-j+1}^2)^{p/2} \le \sum_{j=1}^n |\alpha_j + i\beta_j|^p \qquad \text{(VI.14)}$$
for $p \ge 2$. If $a_1$ and $a_2$ are any two nonnegative real numbers, then the function $g(t) = (a_1^t + a_2^t)^{1/t}$ is monotonically decreasing on $0 < t < \infty$. So if $p \ge 2$, then
$$a_1^p + a_2^p \le (a_1^2 + a_2^2)^{p/2}. \qquad \text{(VI.15)}$$
Using this we get from (VI.14) the inequality
$$2^{1-p/2} \sum_{j=1}^n s_j^p \le \sum_{j=1}^n |\alpha_j + i\beta_j|^p \qquad \text{(VI.16)}$$
for $p \ge 2$.
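The inequalities (VI.13) and (VI.16) can be verified numerically for random Hermitian pairs; the sketch below (ours) uses $p = 3$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 3.0                                     # any p >= 2
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2                          # Hermitian
B = (Y + Y.conj().T) / 2                          # Hermitian
T = A + 1j * B
s = np.linalg.svd(T, compute_uv=False)            # s_1 >= ... >= s_n
alpha = sorted(np.linalg.eigvalsh(A), key=abs, reverse=True)
beta = sorted(np.linalg.eigvalsh(B), key=abs, reverse=True)

# (VI.13): sum |alpha_j + i beta_{n-j+1}|^p <= sum s_j^p
lhs13 = sum(abs(alpha[j] + 1j * beta[n - 1 - j]) ** p for j in range(n))
rhs13 = sum(s ** p)
# (VI.16): 2^(1-p/2) sum s_j^p <= sum |alpha_j + i beta_j|^p
lhs16 = 2 ** (1 - p / 2) * sum(s ** p)
rhs16 = sum(abs(alpha[j] + 1j * beta[j]) ** p for j in range(n))
```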
Exercise VI.2.4 For $0 < p \le 2$, the function $\varphi(t) = t^{p/2}$ is concave on $[0, \infty)$. Use this to show that for these values of $p$, the weak majorisations (VI.11) and (VI.12) are valid with $\prec_w$ replaced by $\prec^w$. All the four inequalities (VI.13)-(VI.16) now go in the opposite direction.
Let $A$ be any matrix with eigenvalues $\alpha_1, \ldots, \alpha_n$, counted in any order. We have used the notation $\mathrm{Eig}\,A$ to mean a diagonal matrix that has entries $\alpha_j$ down its diagonal. If $\sigma$ is a permutation, we will use the notation $\mathrm{Eig}_\sigma(A)$ for the diagonal matrix with entries $\alpha_{\sigma(1)}, \ldots, \alpha_{\sigma(n)}$ down its diagonal. The symbol $\mathrm{Eig}^{|\downarrow|}(A)$ will mean the diagonal matrix whose diagonal entries are the eigenvalues of $A$ in decreasing order of magnitude, i.e., the $\alpha_j$ arranged so that $|\alpha_1| \ge \cdots \ge |\alpha_n|$. In the same way, $\mathrm{Eig}^{|\uparrow|}(A)$ will stand for the diagonal matrix whose diagonal entries are the eigenvalues of $A$ arranged in increasing order of magnitude, i.e., the $\alpha_j$ rearranged so that $|\alpha_1| \le |\alpha_2| \le \cdots \le |\alpha_n|$. With these notations, we have the following theorem for the distance between the eigenvalues of a Hermitian and a skew-Hermitian matrix, in the Schatten $p$-norms.
Theorem VI.2.5 Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix. Then,
(i) for $2 \le p \le \infty$, we have
$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p \le \|A - B\|_p, \qquad \text{(VI.17)}$$
$$\|A - B\|_p \le 2^{\frac{1}{2} - \frac{1}{p}} \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p; \qquad \text{(VI.18)}$$
(ii) for $1 \le p \le 2$, we have
$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p \le 2^{\frac{1}{p} - \frac{1}{2}} \|A - B\|_p, \qquad \text{(VI.19)}$$
$$\|A - B\|_p \le \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p. \qquad \text{(VI.20)}$$
All the inequalities above are sharp. Further,
(iii) for $2 \le p \le \infty$, we have
$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p \le \|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_p \le \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p \qquad \text{(VI.21)}$$
for all permutations $\sigma$;
(iv) for $1 \le p \le 2$, we have
$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p \le \|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_p \le \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p \qquad \text{(VI.22)}$$
for all permutations $\sigma$.
Proof. For $p \ge 2$, the inequalities (VI.17) and (VI.18) follow immediately from (VI.13) and (VI.16). For $p = \infty$, the same inequalities remain valid by a limiting argument. The next two inequalities of the theorem follow from the fact that, for $1 \le p \le 2$, both of the inequalities (VI.13) and (VI.16) are reversed. The special case of statements (i) and (ii) in which $A$ and $B$ commute is adequate for proving (iii) and (iv). The sharpness of all the inequalities can be seen from the $2 \times 2$ example:
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \qquad \text{(VI.23)}$$
Here $\|A - B\|_p = 2$ for all $1 \le p \le \infty$. The eigenvalues of $A$ are $\pm 1$, those of $B$ are $\pm i$. Hence, for every permutation $\sigma$,
$$\frac{\|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_p}{\|A - B\|_p} = 2^{\frac{1}{p} - \frac{1}{2}}$$
for all $1 \le p \le \infty$. •
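The ratio $2^{1/p - 1/2}$ attained by a pair of this kind (a Hermitian $A$ with eigenvalues $\pm 1$, a skew-Hermitian $B$ with eigenvalues $\pm i$, and $\|A - B\|_p = 2$) can be confirmed numerically; the matrices below are one such choice (our sketch):

```python
import numpy as np

A = np.array([[0.0, 1.0], [1.0, 0.0]])    # Hermitian, eigenvalues +1, -1
B = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-Hermitian, eigenvalues +i, -i

def schatten(M, p):
    # Schatten p-norm: the l_p norm of the singular values.
    return float(sum(np.linalg.svd(M, compute_uv=False) ** p) ** (1 / p))

p = 4.0
norm_AB = schatten(A - B, p)                  # equals 2 for every p >= 1
# Every pairing of {1, -1} with {i, -i} puts sqrt(2) in each diagonal slot.
eig_dist = (2 * np.sqrt(2) ** p) ** (1 / p)   # ||Eig(A) - Eig_sigma(B)||_p
ratio = eig_dist / norm_AB                    # = 2^(1/p - 1/2)
```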
Note that the inequality (VI.6) is included in (VI.17). There are several features of these inequalities that are different from the corresponding inequality (IV.62) for a pair of Hermitian matrices $A, B$. First, the inequalities (VI.18) and (VI.19) involve a constant term on the right that is bigger than 1. Second, the best choice of this term depends on the norm $\|\cdot\|_p$. Third, the optimal matching of the eigenvalues of $A$ with those of $B$ (the one that will minimise the distance between them) changes with the norm. In fact, the best pairing for the norms $2 \le p \le \infty$ is the worst one for the norms $1 \le p \le 2$, and vice versa. All these new features reveal that the spectral variation problem for pairs of normal matrices $A, B$ is far more intricate than the one for Hermitian pairs.
Exercise VI.2.6 Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix. Show that for every unitarily invariant norm we have
$$|||\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)||| \le 2\, |||A - B|||, \qquad \text{(VI.24)}$$
$$|||A - B||| \le \sqrt{2}\, |||\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)|||. \qquad \text{(VI.25)}$$
The term $\sqrt{2}$ in the second inequality cannot be replaced by anything smaller.

VI.3 Estimates in the Operator Norm

In this section we will obtain estimates of the distance between the eigenvalues of two normal matrices $A$ and $B$ in terms of $\|A - B\|$. Apart from
the optimal matching distance, which has already been introduced, we will consider other distances. If $L, M$ are two closed subsets of the complex plane $\mathbb{C}$, let
$$s(L, M) = \sup_{\lambda \in L} \mathrm{dist}(\lambda, M) = \sup_{\lambda \in L} \inf_{\mu \in M} |\lambda - \mu|. \qquad \text{(VI.26)}$$
The Hausdorff distance between $L$ and $M$ is defined as
$$h(L, M) = \max(s(L, M), s(M, L)). \qquad \text{(VI.27)}$$
Exercise VI.3.1 Show that $s(L, M) = 0$ if and only if $L$ is a subset of $M$. Show that the Hausdorff distance defines a metric on the collection of all closed subsets of $\mathbb{C}$.
Note that $s(L, M)$ is the smallest number $\delta$ such that every element of $L$ is within a distance $\delta$ of some element of $M$; and $h(L, M)$ is the smallest number $\delta$ for which this, as well as the symmetric assertion with $L$ and $M$ interchanged, is true. Let $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$ be two unordered $n$-tuples of complex numbers. Let $L$ and $M$ be the subsets of $\mathbb{C}$ whose elements are the entries of these two tuples. If some entry among $\{\lambda_j\}$ or $\{\mu_j\}$ has multiplicity bigger than 1, then the cardinality of $L$ or $M$ is smaller than $n$.
Exercise VI.3.2
(i) The Hausdorff distance $h(L, M)$ is always less than or equal to the optimal matching distance $d(\{\lambda_1, \ldots, \lambda_n\}, \{\mu_1, \ldots, \mu_n\})$.
(ii) When $n = 2$, the two distances are equal.
(iii) The triples $\{0, m - \varepsilon, m + \varepsilon\}$ and $\{-\varepsilon, \varepsilon, m\}$ provide an example in which $h(L, M) = \varepsilon$ and the optimal matching distance is $m - 2\varepsilon$. Thus, for $n \ge 3$, the second distance can be arbitrarily larger than the first.
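The three distances of this section are short to code, and triples of the kind in part (iii) exhibit the gap between the Hausdorff and matching distances concretely. Our sketch, with $m = 10$ and $\varepsilon = 1/2$:

```python
import itertools

def s_dist(L, M):
    # s(L, M): the farthest any point of L sits from the set M.
    return max(min(abs(l - m) for m in M) for l in L)

def hausdorff(L, M):
    return max(s_dist(L, M), s_dist(M, L))

def matching_distance(L, M):
    return min(max(abs(l - m) for l, m in zip(L, p))
               for p in itertools.permutations(M))

m, eps = 10.0, 0.5
L = [0.0, m - eps, m + eps]
M = [-eps, eps, m]
h = hausdorff(L, M)           # = eps
d = matching_distance(L, M)   # = m - 2*eps
```

As sets the two triples nearly coincide, but any bijective pairing must send one point across the gap of length about $m$, up to the two $\varepsilon$-shifts.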
If $A$ is an $n \times n$ matrix, we will use the notation $\sigma(A)$ for both the subset of the complex plane that consists of all the eigenvalues of $A$, and for the unordered $n$-tuple whose entries are the eigenvalues of $A$ counted with multiplicity. Since we will be talking of the distances $s(\sigma(A), \sigma(B))$, $h(\sigma(A), \sigma(B))$, and $d(\sigma(A), \sigma(B))$, it will be clear which of the two objects is being represented by $\sigma(A)$. Note that the inequalities (VI.4) and (VI.6) say that
$$d(\sigma(A), \sigma(B)) \le \|A - B\|, \qquad \text{(VI.28)}$$
if either $A$ and $B$ are both Hermitian, or one is Hermitian and the other skew-Hermitian.

Theorem VI.3.3 Let $A$ be a normal and $B$ an arbitrary matrix. Then
$$s(\sigma(B), \sigma(A)) \le \|A - B\|. \qquad \text{(VI.29)}$$
Proof. Let $\varepsilon = \|A - B\|$. We have to show that if $\beta$ is any eigenvalue of $B$, then $\beta$ is within a distance $\varepsilon$ of some eigenvalue $\alpha_j$ of $A$. By applying a translation, we may assume that $\beta = 0$. If none of the $\alpha_j$ is within a distance $\varepsilon$ of this, then $A$ is invertible. Since $A$ is normal, we have
$$\|A^{-1}\| = \frac{1}{\min_j |\alpha_j|} < \frac{1}{\varepsilon}.$$
Hence,
$$\|A^{-1}(B - A)\| \le \|A^{-1}\| \, \|B - A\| < 1.$$
Since $B = A(I + A^{-1}(B - A))$, this shows that $B$ is invertible. But then $B$ could not have had a zero eigenvalue. •

Another proof of this theorem goes as follows. Let $A$ have the spectral resolution $A = \sum_j \alpha_j u_j u_j^*$, and let $v$ be a unit vector such that $Bv = \beta v$. Then
$$\|A - B\|^2 \ge \|(A - B)v\|^2 = \sum_j |\alpha_j - \beta|^2 |u_j^* v|^2.$$
Since the $u_j$ form an orthonormal basis, $\sum_j |u_j^* v|^2 = 1$. Hence, the above inequality can be satisfied only if $|\alpha_j - \beta|^2 \le \|A - B\|^2$ for at least one index $j$.

Corollary VI.3.4 If $A$ and $B$ are $n \times n$ normal matrices, then
$$h(\sigma(A), \sigma(B)) \le \|A - B\|. \qquad \text{(VI.30)}$$
For $n = 2$, we have
$$d(\sigma(A), \sigma(B)) \le \|A - B\|. \qquad \text{(VI.31)}$$
This corollary also follows from the proposition below. We will use the notation D ( a , p) for the open disk of radius p centred at a, and D ( a, p) for the closure of this disk.
Proposition Vl.3.5 Let A and B be normal matrices, II A  E ll . If any disk D ( a , p) contains k eigenvalues of A, D ( a , p + £) contains at least k eigenvalues of B .
and let £ == then the disk
Without loss of generality, we may assume that a 0. Suppose D ( O, p) contains k eigenvalues of A but D (O, p + £ ) contains less than k eigenvalues of B . Choose a unit vector x in the intersection of the eigenspace of A corresponding to its eigenvalues lying inside D ( O , p) and the eigenspace of B corresponding to its eigenvalues lying outside D (O, p +£ ) . We then have f l Ax II < p and I I B x j j > p + £. We also have l ! Bx ll  II Ax il < I I ( B  A ) x ll < • H B  A II == £. This is a contradiction. Proof.
==
162
VI .
S pect ral Variation o f Normal Matrices
Exercise VI.3.6 Use the special case $\rho = 0$ of Proposition VI.3.5 to prove Corollary VI.3.4.
Given a subset $X$ of the complex plane and a matrix $A$, let $m_A(X)$ denote the number of eigenvalues of $A$ inside $X$.
Exercise VI.3.7 Let $A, B$ be two $n \times n$ normal matrices. Let $K_1, K_2$ be two convex sets such that $m_A(K_1) \le k$ and $m_B(K_2) \ge k + 1$. Then $\mathrm{dist}(K_1, K_2) \le \|A - B\|$. [Hint: Let $\rho \to \infty$ in Proposition VI.3.5.]

Exercise VI.3.8 Use this to give another proof of Theorem VI.2.1.

Exercise VI.3.9 Let $A, B$ be two $n \times n$ unitary matrices whose eigenvalues lie in a semicircle of the unit circle. Label both the sets of eigenvalues in the counterclockwise order. Then
$$\max_j |\lambda_j(A) - \lambda_j(B)| \le \|A - B\|.$$
Hence,
$$d(\sigma(A), \sigma(B)) \le \|A - B\|.$$

Exercise VI.3.10 Let $T$ be the unit circle, $I$ any closed arc in $T$, and for $\varepsilon > 0$ let $I_\varepsilon$ be the arc $\{z \in T : |z - w| \le \varepsilon \text{ for some } w \in I\}$. Let $A, B$ be unitary matrices with $\|A - B\| = \varepsilon$. Show that $m_A(I) \le m_B(I_\varepsilon)$.

Theorem VI.3.11 For any two $n \times n$ unitary matrices,
$$d(\sigma(A), \sigma(B)) \le \|A - B\|.$$

Proof. The proof will use the Marriage Theorem (Theorem II.2.1) and the exercise above. Let $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$ be the eigenvalues of $A$ and $B$, respectively. Let $\Lambda$ be any subset of $\{\lambda_1, \ldots, \lambda_n\}$. Let $\mu(\Lambda) = \{\mu_j : |\mu_j - \lambda_i| \le \varepsilon \text{ for some } \lambda_i \in \Lambda\}$, where $\varepsilon = \|A - B\|$. By the Marriage Theorem, the assertion would be proved if we show that $|\mu(\Lambda)| \ge |\Lambda|$. Let $I(\Lambda)$ be the set of all points on the unit circle $T$ that are within distance $\varepsilon$ of some point of $\Lambda$. Then $\mu(\Lambda)$ contains exactly those $\mu_i$ that lie in $I(\Lambda)$. Let $I(\Lambda)$ be written as a disjoint union of arcs $I_1, \ldots, I_r$. For each $1 \le k \le r$, let $J_k$ be the arc contained in $I_k$ all whose points are at distance $\ge \varepsilon$ from the boundary of $I_k$. Then $I_k = (J_k)_\varepsilon$. From Exercise VI.3.10 we have
$$\sum_{k=1}^r m_A(J_k) \le \sum_{k=1}^r m_B(I_k).$$
But all the elements of $\Lambda$ are in some $J_k$. This shows that $|\Lambda| \le |\mu(\Lambda)|$. •
There is one difference between Theorem VI.3.11 and most of our earlier results of this type. Now nothing is said about the order in which the
eigenvalues of $A$ and $B$ are arranged for the optimal matching. No canonical order can be prescribed in general. In Problem VI.8.3, we outline another proof of Theorem VI.3.11 which says, in effect, that for optimal matching the eigenvalues of $A$ and $B$ can be counted in the cyclic order on the circle provided the initial point is chosen properly. The catch is that this initial point depends on $A$ and $B$ and we do not know how to find it.
Exercise VI.3.12 Let $A = c_1 U_1$, $B = c_2 U_2$, where $U_1, U_2$ are unitary matrices and $c_1, c_2$ are complex numbers. Show that $d(\sigma(A), \sigma(B)) \le \|A - B\|$.
By now we have seen that the inequality (VI.28) is valid in the following situations:
(i) $A$ and $B$ both Hermitian;
(ii) $A$ Hermitian and $B$ skew-Hermitian;
(iii) $A$ and $B$ both unitary (or both scalar multiples of unitaries);
(iv) $A$ and $B$ both $2 \times 2$ normal matrices.
The example below shows that this inequality breaks down for arbitrary normal matrices $A, B$ when $n \ge 3$.
Example VI.3.13 Let $A$ be the $3 \times 3$ diagonal matrix with diagonal entries $\lambda_1 = 1$, $\lambda_2 = \frac{-1 + \sqrt{3}\,i}{2}$, $\lambda_3 = \frac{-1 - \sqrt{3}\,i}{2}$, the cube roots of unity. Let $v^T = \left(\sqrt{3/8},\ 1/2,\ \sqrt{3/8}\right)$, and let $U = I - 2vv^T$. Then $U$ is a unitary matrix. Let $B = -U^* A U$. Then $B$ is a normal matrix with eigenvalues $\mu_j = -\lambda_j$, $j = 1, 2, 3$. One can check that
$$d(\sigma(A), \sigma(B)) = 1, \qquad \|A - B\| = \sqrt{27/28}.$$
So,
$$\frac{d(\sigma(A), \sigma(B))}{\|A - B\|} = \sqrt{28/27} = 1.0183+.$$
In the next chapter we will show that there exists a constant $c < 2.91$ such that for any two $n \times n$ normal matrices $A, B$
$$d(\sigma(A), \sigma(B)) \le c\,\|A - B\|.$$
For Hermitian matrices $A, B$ we have a reverse inequality:
$$\|A - B\| \le \max_{1 \le j \le n} |\lambda_j^\downarrow(A) - \lambda_j^\uparrow(B)|.$$
The quantity on the right is the distance between the eigenvalues of $A$ and those of $B$ when the "worst" pairing is made. An analogous result for normal matrices is proved below.
Theorem VI.3.14 Let $A$ and $B$ be normal matrices with eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$, respectively. Then, there exists a permutation $\sigma$ such that
$$\max_j |\lambda_j - \mu_{\sigma(j)}| \ge \frac{1}{\sqrt{2}}\, \|A - B\|. \qquad \text{(VI.32)}$$
Proof. The matrices $A \otimes I$ and $I \otimes B$ are both normal and commute with each other. Hence $A \otimes I - I \otimes B$ is normal. The eigenvalues of this matrix are all the differences $\lambda_i - \mu_j$, $1 \le i, j \le n$. Hence
$$\|A \otimes I - I \otimes B\| = \max_{i,j} |\lambda_i - \mu_j|.$$
So, the inequality (VI.32) is equivalent to
$$\|A - B\| \le \sqrt{2}\, \|A \otimes I - I \otimes B^T\|.$$
This is, in fact, true for all $A, B$ and is proved below. •

Theorem VI.3.15 For all matrices $A, B$,
$$\|A - B\| \le \sqrt{2}\, \|A \otimes I - I \otimes B^T\|. \qquad \text{(VI.33)}$$
Proof. We have to prove that for all $x, y$ in $\mathbb{C}^n$,
$$|\langle x, (A - B)y \rangle| \le \sqrt{2}\, \|A \otimes I - I \otimes B^T\|\, \|x\|\, \|y\|.$$
We have
$$|\langle x, (A - B)y \rangle| = |x^* A y - x^* B y| = |\mathrm{tr}(Ayx^* - yx^*B)| \le \|Ayx^* - yx^*B\|_1.$$
The matrix $Ayx^* - yx^*B$ has rank at most 2. So,
$$\|Ayx^* - yx^*B\|_1 \le \sqrt{2}\, \|Ayx^* - yx^*B\|_2.$$
Let $\bar{x}$ be the vector whose components are the complex conjugates of the components of $x$. Then with respect to the standard basis $e_i \otimes e_j$ of $\mathbb{C}^n \otimes \mathbb{C}^n$, the $(i, j)$-coordinate of the vector $(A \otimes I)(y \otimes \bar{x})$ is $\sum_k a_{ik} y_k \bar{x}_j$. This is also the $(i, j)$-entry of the matrix $Ayx^*$. In the same way, the $(i, j)$-entry of $yx^*B$ is the $(i, j)$-coordinate of the vector $(I \otimes B^T)(y \otimes \bar{x})$. Thus, we have
$$\|Ayx^* - yx^*B\|_2 = \|(A \otimes I - I \otimes B^T)(y \otimes \bar{x})\| \le \|A \otimes I - I \otimes B^T\|\, \|y \otimes \bar{x}\| = \|A \otimes I - I \otimes B^T\|\, \|x\|\, \|y\|.$$
This proves the theorem. •
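The inequality (VI.33) holds for all matrices and is simple to test with `np.kron` (our sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
I = np.eye(n)

lhs = np.linalg.norm(A - B, 2)
rhs = np.sqrt(2) * np.linalg.norm(np.kron(A, I) - np.kron(I, B.T), 2)
```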
The example (VI.23) shows that the inequality (VI.32) is sharp. Note that in this example $A$ and $B$ are both unitary. Also, $A$ is Hermitian and $B$ is skew-Hermitian. In contrast, the factor $\sqrt{2}$ in (VI.32) can be replaced by 1 if $A, B$ are both Hermitian.
VI.4 Estimates in the Frobenius Norm
We will use the symbol $S_n$ to mean the set of permutations on $n$ symbols, as well as the set of $n \times n$ permutation matrices. (To every permutation $\sigma$ there corresponds a unique matrix $P$ that has entries 1 in the $(i, j)$ place if and only if $j = \sigma(i)$, and all whose remaining entries are zero.) Let $\Omega_n$ be the set of all $n \times n$ doubly stochastic matrices. This is a convex polytope, and by Birkhoff's Theorem (Theorem II.2.3) its extreme points are permutation matrices.
Theorem VI.4.1 (Hoffman-Wielandt) Let $A$ and $B$ be normal matrices with eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$, respectively. Then
$$\min_{\sigma \in S_n} \Big( \sum_{i=1}^n |\lambda_i - \mu_{\sigma(i)}|^2 \Big)^{1/2} \le \|A - B\|_2 \le \max_{\sigma \in S_n} \Big( \sum_{i=1}^n |\lambda_i - \mu_{\sigma(i)}|^2 \Big)^{1/2}. \qquad \text{(VI.34)}$$
Proof. Choose unitary matrices U, V such that U A U * == D 1 , V BV * == D2 , where D 1 == diag ( A I , . . . , An) and D2 == diag(J.LI , . , J.Ln ) · Then, by uni tary invariance of the Frobenius norm, II A  El l� == I I U * D 1 U  V* D 2 V II � == 2 I I D 1 W  W D 2 ll , where W == UV * , another unitary matrix. If the matrix W has entries Wij , this can be written as
..
'L
,J
'L ,J
is an affine function on the set O n of doubly stochastic matrices. So it attains its minimum at one of the extreme points of O n . Thus, there exists a permutation matrix (Pij ) such that
'l. ,J
If this matrix corresponds to the permutation I I A  El l � >
·7 .
this says that
�I A i  J.la(i) l 2 'L
This proves the first inequality in (VI.34) . The same argument for the maximum instead of the minimum gives the other inequality. • Note that for Hermitian matrices, the inequality (VI.34) was proved ear lie r in Problem III.6. 15. In this case, we also proved that the same inequality is true for all unitarily invariant norms.
166
VI .
Spectral Variation o f Normal Matrices
In  general , there is no prescript ion for finding the permutation O" that minimises the Euclidean distance between the eigenvalues of A and those of B . However, if A is Hermitian with eigenvalues enume rated as ) q > A2 > > A n , then an enumeration o f J.Li in which Re /L l > Re J.L 2 > > Re fL n is the best one. To see this, j ust note that if _\ 1 > .:\ 2 and Re J.L 1 > Re JL2 , then ·
·
·
·
·
·
The same argument shows t h at an enumeration for whi ch the maximum < Re f..L n · (What distance is attained is one for which Re f..L l < Re J.L 2 < does this say when B is skewHermitian? ) Using the notations introduced in Sect io n VI.2, t he inequality (VI.34 ) can be rewritten as ·
B) min a ii E ig (A)  E ig a ( I I 2
<
IIA  BII 2
<
·
·
max i! Eig(A)  Eig a ( B) Ib  ( V I . 35) a
There is another way of looking at this. Since the eigenvalues of a no rmal matrix completely determine the matr ix up to a unitary conjugation , the inequality ( V I. 35) is equivalent to saying that for any two diagonal matrices A, B min jj A  P BP* 1 ! 2 < I I A  U BU * lb p
<
max i! A  PB P* 1 !2 , p
(VI.36)
where U is any unitary matrix and P varies over al l permutation matrices. Given any matrix B, let Us be the s et Us == { UBU* : U E U ( n ) } ,
where U(n) is the group consisting of unitary matrices. Then Us is a compact set called the unitary orbit of B. For a fixed di agon al matrix A, consider the function f(X) == II A  X ll 2  The inequal ity (VI.36) then says that if B is a no ther diagonal matrix, then on the compact se t UB b ot h the minimum and the maximum of f are attained at diago nal matrices (j ust some permutations of B) . In ot he r words, the minimum and the maximum on the unitary orbit are b ot h contained in the permutation orbit. This is an interesting fact from the point of view of calculus and geom etry. We will see below that if A, B are real diagonal matrices, a stronger statement can be p roved using calculus. This will also serve to introduce some e l ement ary ideas of differential geometry used in later sections. A differentiable function U (t) , where t is real and U ( t) is unitary, is called a differentiable curve thro ugh I if U (O) == I D i ffe rentiat i ng the equation U(t)U(t)* == I at t == 0 shows that for such a curve U' (O) is skewHe r mit i an . The matrix U' (O) is called the tangent vector to U(t) at /. If K == U' ( O ) , then etK is anothe r differentiable curve through I with t angent vector K at I. Thus , the curves U(t) and e t K have the same t angent vector and so .
represent the same curve locally, i.e., they are equal to the first degree of approximation. The tangent space to the manifold $U(n)$ at the point $I$ is the linear space that consists of all these tangent vectors. We have seen that this is the real vector space $K(n)$ consisting of all skew-Hermitian matrices.

If $\mathcal{U}_A$ is the unitary orbit of a matrix $A$, then every differentiable curve through $A$ can be represented locally as $e^{tK}Ae^{-tK}$ for some skew-Hermitian $K$. The derivative of this curve at $t = 0$ is $KA - AK$. This is usually written as $[K, A]$ and called a Lie bracket or a commutator. Thus the tangent space to the manifold $\mathcal{U}_A$ at the point $A$ is the space
$$T_A\mathcal{U}_A = \{[A, K] : K \in K(n)\}. \tag{VI.37}$$
Note that this implies that $T_A\mathcal{U}_A$ is contained in $K(n)$ if $A \in K(n)$. The sesquilinear form $\langle A, B\rangle = \operatorname{tr} A^{*}B$ is an inner product on the space $M(n)$. The symbol $S^{\perp}$ will mean the orthogonal complement of a space $S$ with respect to this inner product.
Lemma VI.4.2 For every $A \in K(n)$, the orthogonal complement of $T_A\mathcal{U}_A$ in $K(n)$ is the set of all $Y$ that commute with $A$.

Proof. Let $Y \in K(n)$. Then $Y \in (T_A\mathcal{U}_A)^{\perp}$ if and only if for every $K$ in $K(n)$ we have
$$0 = \langle Y, [A, K]\rangle = \operatorname{tr} Y^{*}(AK - KA) = -\operatorname{tr}(YAK - YKA) = \operatorname{tr}[A, Y]K.$$
This is possible if and only if $[A, Y] = 0$. •
The set of all matrices $Y$ that commute with $A$ is called the commutant or the centraliser of $A$, and is denoted by $Z(A)$. The lemma above says that in the space $K(n)$, $(T_A\mathcal{U}_A)^{\perp} = Z(A)$ for every $A$.
Theorem VI.4.3 Let $A \in K(n)$ and let $f(X) = \|A - X\|_2$. Let $B$ be any other element of $K(n)$. Then $B_0$ is an extreme point for the function $f$ on the unitary orbit $\mathcal{U}_B$ if and only if $B_0$ commutes with $A$.

Proof. A point $B_0$ is an extreme point if and only if the straight line joining $A$ and $B_0$ is perpendicular to $\mathcal{U}_B$ at $B_0$. By Lemma VI.4.2 this is so if and only if $A - B_0$ commutes with $B_0$, i.e., if and only if $A$ commutes with $B_0$. •
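The statement of inequality (VI.36) lends itself to a quick numerical check. The sketch below (an illustration, not part of the original text; it assumes NumPy, and the diagonal matrices are arbitrary test choices) conjugates a diagonal $B$ by random unitaries and confirms that the Frobenius distance to a fixed diagonal $A$ never leaves the interval spanned by the permutation orbit:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

a = np.array([0.0, 1.0, 3.0])          # eigenvalues of the diagonal matrix A
b = np.array([2.0, -1.0, 0.5])         # eigenvalues of the diagonal matrix B
A = np.diag(a)

def frob(X):
    return np.linalg.norm(X, "fro")

# Frobenius distances from A to the permutation orbit of B.
perm_dists = [frob(A - np.diag(b[list(p)]))
              for p in itertools.permutations(range(3))]
lo, hi = min(perm_dists), max(perm_dists)

# Distances from A to random points of the unitary orbit of B.
B = np.diag(b).astype(complex)
for _ in range(200):
    Z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    Q, _ = np.linalg.qr(Z)             # Q is unitary
    d = frob(A - Q @ B @ Q.conj().T)
    assert lo - 1e-9 <= d <= hi + 1e-9  # inequality (VI.36)

print(lo, hi)
```

For these particular eigenvalues the permutation-orbit extremes are 1.5 and 4.5, and every random unitary conjugate lands between them.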
For skew-Hermitian (or Hermitian) matrices $A, B$, this gives another proof of Theorem VI.4.1. However, in this case Theorem VI.4.3 says much more. From the first theorem we can conclude that if $A$ and $B$ are normal, then the global minimum and maximum of the (Frobenius) distance from $A$ to $\mathcal{U}_B$ are attained among matrices that commute with $A$. The second theorem says that when $A$ and $B$ are both Hermitian, this is true for all local extrema as well.

This last statement is not true when $A$ is Hermitian and $B$ is skew-Hermitian. For in this case,
$$\|A - UBU^{*}\|_2^2 = \|A\|_2^2 + \|UBU^{*}\|_2^2 = \|A\|_2^2 + \|B\|_2^2$$
for all $U$. Thus the entire orbit $\mathcal{U}_B$ is at a constant distance from $A$. Hence, every point on $\mathcal{U}_B$ is an extremal point. However, not every point on $\mathcal{U}_B$ need commute with $A$.
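This constant-distance phenomenon is easy to verify numerically. A minimal sketch (assuming NumPy; the random $4\times 4$ matrices are an arbitrary choice), exploiting that a Hermitian and a skew-Hermitian matrix are orthogonal in the trace inner product:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (X + X.conj().T) / 2               # Hermitian
Y = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = (Y - Y.conj().T) / 2               # skew-Hermitian

target = np.sqrt(np.linalg.norm(A, "fro")**2 + np.linalg.norm(B, "fro")**2)

for _ in range(50):
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(Z)             # random unitary
    d = np.linalg.norm(A - Q @ B @ Q.conj().T, "fro")
    assert abs(d - target) < 1e-9      # the whole orbit is equidistant from A
```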
VI.5 Geometry and Spectral Variation: the Operator Norm
The first theorem below says that if $A$ is a normal matrix and $B$ is any matrix close to $A$, then the optimal matching distance $d(\sigma(A), \sigma(B))$ is bounded by $\|A - B\|$. This is a local phenomenon; global versions of this are what we seek in the next paragraphs.
Theorem VI.5.1 Let $A$ be a normal matrix, and let $B$ be any matrix such that $\|A - B\|$ is smaller than half the distance between any two distinct eigenvalues of $A$. Then $d(\sigma(A), \sigma(B)) \le \|A - B\|$.
Proof. Let $\alpha_1, \ldots, \alpha_k$ be all the distinct eigenvalues of $A$. Let $\epsilon = \|A - B\|$. By Theorem VI.3.3, all the eigenvalues of $B$ lie in the union of the disks $D(\alpha_j, \epsilon)$. By the hypothesis, these disks are mutually disjoint. We claim that if the eigenvalue $\alpha_j$ has multiplicity $m_j$, then the disk $D(\alpha_j, \epsilon)$ contains exactly $m_j$ eigenvalues of $B$, counted with their respective multiplicities. Once this is established, the statement of the theorem follows easily.

Let $A(t) = (1-t)A + tB$, $0 \le t \le 1$. This is a continuous map from $[0,1]$ into the space of matrices, and we have $A(0) = A$, $A(1) = B$. Note that $\|A - A(t)\| = t\epsilon$, and so all the eigenvalues of $A(t)$ also lie in the disks $D(\alpha_j, \epsilon)$ for each $0 \le t \le 1$. By Corollary VI.1.6, as $t$ moves from 0 to 1, the eigenvalues of $A(t)$ trace continuous curves that join the eigenvalues of $A$ to those of $B$. None of these curves can jump from one of the disks $D(\alpha_j, \epsilon)$ to another. So, if we start off with $m_j$ such curves in the disk $D(\alpha_j, \epsilon)$, we must end up with exactly as many. •
Example VI.3.13 shows that if no condition is imposed on $B$, then the conclusion of the theorem above is no longer valid, even when $B$ is normal. However, this does suggest a new approach to the problem. Let $A, B$ be normal matrices, and let $\gamma(t)$ be a curve joining $A$ and $B$, such that each $\gamma(t)$ is a normal matrix. Then in a small neighbourhood of $\gamma(t)$ the spectral variation inequality of Theorem VI.5.1 holds. So, the (total) spectral variation between the endpoints of the curve must be bounded by the length of this curve. This idea is made precise below.

Let $N$ denote the set of normal matrices of a fixed size $n$. If $A$ is an element of $N$, then so is $tA$ for all real $t$. Thus the set $N$ is path-connected. However, $N$ is not an affine set. A continuous map $\gamma$ from any interval $[a, b]$ into $N$ will be called a normal path or a normal curve. If $\gamma(a) = A$ and $\gamma(b) = B$, we say that $\gamma$ is a path joining $A$ and $B$; $A$ and $B$ are then the endpoints of $\gamma$. The length of $\gamma$, with respect to the norm $\|\cdot\|$, is defined as
$$\ell_{\|\cdot\|}(\gamma) = \sup \sum_{k=0}^{m-1}\|\gamma(t_{k+1}) - \gamma(t_k)\|, \tag{VI.38}$$
where the supremum is taken over all partitions of $[a, b]$ as $a = t_0 < t_1 < \cdots < t_m = b$. If this length is finite, the path $\gamma$ is said to be rectifiable. If the function $\gamma$ is a piecewise $C^1$ function, then
$$\ell_{\|\cdot\|}(\gamma) = \int_a^b \|\gamma'(t)\|\,dt. \tag{VI.39}$$
Theorem VI.5.2 Let $A$ and $B$ be normal matrices, and let $\gamma$ be a rectifiable normal path joining them. Then
$$d(\sigma(A), \sigma(B)) \le \ell_{\|\cdot\|}(\gamma). \tag{VI.40}$$
Proof. For convenience, let us choose the parameter $t$ to vary in $[0, 1]$. For $0 \le r \le 1$, let $\gamma_r$ be that part of the curve which is parametrised by $[0, r]$. Let
$$G = \{r \in (0, 1] : d(\sigma(A), \sigma(\gamma(r))) \le \ell_{\|\cdot\|}(\gamma_r)\}.$$
The theorem will be proved if we show that the point 1 is in $G$. Since the function $\gamma$, the arclength, and the distance $d$ are all continuous in their arguments, the set $G$ is closed. So it contains the point $g = \sup G$. We have to show that $g = 1$. Suppose $g < 1$. Let $S = \gamma(g)$. Using Theorem VI.5.1, we can find a point $t$ in $(g, 1]$ such that, if $T = \gamma(t)$, then $d(\sigma(S), \sigma(T)) \le \|S - T\|$. But then
$$d(\sigma(A), \sigma(\gamma(t))) \le d(\sigma(A), \sigma(S)) + d(\sigma(S), \sigma(T)) \le \ell_{\|\cdot\|}(\gamma_g) + \|S - T\| \le \ell_{\|\cdot\|}(\gamma_t).$$
By the definition of $g$, this is not possible. •
An effective estimate of $d(\sigma(A), \sigma(B))$ can thus be obtained if one could find the length of the shortest normal path joining $A$ and $B$. This is a difficult problem, since the geometry of the set $N$ is poorly understood. However, the theorem above does have several interesting consequences.
Exercise VI.5.3 Let $A, B \in N$. Then the line segment joining $A$ and $B$ lies in $N$ if and only if $A - B$ is in $N$.

Theorem VI.5.4 If $A, B$ are normal matrices such that $A - B$ is also normal, then $d(\sigma(A), \sigma(B)) \le \|A - B\|$.

Proof. The path $\gamma$ consisting of the line segment joining $A, B$ is a normal path by Exercise VI.5.3. Its length is $\|A - B\|$. •
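For Hermitian matrices the theorem just proved reduces to Weyl's perturbation bound, which can be checked numerically. A sketch (assuming NumPy; for Hermitian matrices the operator-norm optimal matching pairs eigenvalues in sorted order):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
X = rng.normal(size=(n, n)); A = (X + X.T) / 2   # Hermitian
Y = rng.normal(size=(n, n)); B = (Y + Y.T) / 2   # Hermitian, so A - B is too

# Optimal matching distance: sorted eigenvalue lists, sup metric.
d = np.max(np.abs(np.sort(np.linalg.eigvalsh(A)) - np.sort(np.linalg.eigvalsh(B))))
assert d <= np.linalg.norm(A - B, 2) + 1e-12      # Weyl's perturbation theorem
```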
For Hermitian matrices $A, B$, the condition of the theorem above is satisfied. So this theorem includes Weyl's perturbation theorem as a special case.

A more substantial application of Theorem VI.5.2 is obtained as follows. It turns out that there exist normal matrices $A, B$ for which $A - B$ is not normal, but there exists a normal path that joins them and has length $\|A - B\|$. Note that this path cannot be the line segment joining $A$ and $B$; however, it has the same length as the line segment. What makes this possible is the fact that the metric under consideration is not Euclidean, and so geodesics need not always be straight lines. (Of course, by the definition of the arclength and the triangle inequality, no path joining $A, B$ could have length smaller than $\|A - B\|$.)

Let $S$ be any subset of $M(n)$. We will say that $S$ is metrically flat in the metric induced by the norm $\|\cdot\|$ if any two points $A, B$ of $S$ can be joined by a path that lies entirely within $S$ and has length $\|A - B\|$. To emphasize the dependence on the norm $\|\cdot\|$, we will also call such a set $\|\cdot\|$-flat. Of course, every affine set is metrically flat. A nontrivial example of a $\|\cdot\|$-flat set is given by the theorem below. Let $\mathcal{U}$ be the set of $n \times n$ unitary matrices and $\mathbb{C}\cdot\mathcal{U}$ the set of all constant multiples of unitary matrices.
Theorem VI.5.5 The set $\mathbb{C}\cdot\mathcal{U}$ is $\|\cdot\|$-flat.
Proof. First note that $\mathbb{C}\cdot\mathcal{U}$ consists of just nonnegative real multiples of unitary matrices. Let $A_0 = r_0U_0$ and $A_1 = r_1U_1$ be any two elements of this set, where $r_0, r_1 \ge 0$. Choose an orthonormal basis in which the unitary matrix $U_1U_0^{-1}$ is diagonal:
$$U_1U_0^{-1} = \operatorname{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n}), \qquad -\pi < \theta_j \le \pi.$$
Reduction to such a form can be achieved by a unitary conjugation. Such a process changes neither eigenvalues nor norms. So, we may assume that
all matrices are written with respect to the above orthonormal basis. Let
$$K = \operatorname{diag}(i\theta_1, \ldots, i\theta_n).$$
Then $K$ is a skew-Hermitian matrix whose eigenvalues are in the interval $(-i\pi, i\pi]$. We have
$$\|r_0I - r_1U_1U_0^{-1}\| = \max_j |r_0 - r_1\exp(i\theta_j)|;$$
we may assume that this maximum is attained at $j = 1$.
This last quantity is the length of the straight line joining the points $r_0$ and $r_1\exp(i\theta_1)$ in the complex plane. Parametrise this line segment as $r(t)\exp(it\theta_1)$, $0 \le t \le 1$. This can be done except when $|\theta_1| = \pi$, an exceptional case to which we will return later. The equation above can then be written as
$$\|A_0 - A_1\| = \int_0^1 \big|[r(t)\exp(it\theta_1)]'\big|\,dt = \int_0^1 |r'(t) + ir(t)\theta_1|\,dt.$$
Now let $A(t) = r(t)\exp(tK)U_0$, $0 \le t \le 1$. This is a smooth curve in $\mathbb{C}\cdot\mathcal{U}$ with endpoints $A_0$ and $A_1$. The length of this curve is
$$\int_0^1 \|A'(t)\|\,dt = \int_0^1 \|r'(t)\exp(tK)U_0 + r(t)K\exp(tK)U_0\|\,dt = \int_0^1 \|r'(t)I + r(t)K\|\,dt,$$
since $\exp(tK)U_0$ is a unitary matrix. But
$$\|r'(t)I + r(t)K\| = \max_j |r'(t) + ir(t)\theta_j| = |r'(t) + ir(t)\theta_1|.$$
Putting the last three equations together, we see that the path $A(t)$ joining $A_0$ and $A_1$ has length $\|A_0 - A_1\|$.

The exceptional case $|\theta_1| = \pi$ is much simpler. The piecewise linear path that joins $A_0$ to 0 and then to $A_1$ has length $r_0 + r_1$. This is equal to $|r_0 - r_1\exp(i\theta_1)|$ and hence to $\|A_0 - A_1\|$. •

Using Theorems VI.5.2 and VI.5.5, we obtain another proof of the result of Exercise VI.3.12.
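The construction in the proof can be traced numerically. The sketch below (assuming NumPy; the angles and radii are arbitrary test values, with the largest $|\theta_j|$ placed first so the maximum in the proof is attained at $j=1$) builds the path $A(t) = r(t)\exp(tK)U_0$ with $U_0 = I$ and checks that its discretised operator-norm length equals $\|A_0 - A_1\|$:

```python
import numpy as np

theta = np.array([1.5, 0.7, -0.3])     # eigen-angles of U1; max |theta| first
r0, r1 = 1.0, 2.0
A0 = r0 * np.eye(3, dtype=complex)
A1 = r1 * np.diag(np.exp(1j * theta))

op = lambda X: np.linalg.norm(X, 2)    # operator norm
chord = op(A0 - A1)                    # = max_j |r0 - r1 e^{i theta_j}|

# Straight segment from r0 to r1*e^{i theta_1} in C, written in polar form;
# its argument, divided by theta_1, recovers the path parameter t.
s = np.linspace(0.0, 1.0, 4001)
z = (1 - s) * r0 + s * r1 * np.exp(1j * theta[0])
t = np.angle(z) / theta[0]

length, prev = 0.0, A0
for k in range(1, len(s)):
    cur = np.abs(z[k]) * np.diag(np.exp(1j * t[k] * theta))
    length += op(cur - prev)
    prev = cur

assert abs(length - chord) < 1e-9      # the path is a geodesic: length = chord
```

The path stays inside $\mathbb{C}\cdot\mathcal{U}$ at every step (a positive multiple of a diagonal unitary), yet it is not the straight segment from $A_0$ to $A_1$.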
Exercise VI.5.6 Let $A, B$ be normal matrices whose eigenvalues lie on concentric circles $C(A)$ and $C(B)$, respectively. Show that $d(\sigma(A), \sigma(B)) \le \|A - B\|$.
Theorem VI.5.7 The set $N$ consisting of $n \times n$ normal matrices is $\|\cdot\|$-flat if and only if $n \le 2$.
Proof. Let $A, B$ be $2 \times 2$ normal matrices. If the eigenvalues of $A$ and those of $B$ lie on two parallel lines, we may assume that these lines are parallel to the real axis. Then the skew-Hermitian part of $A - B$ is a scalar, and hence $A - B$ is normal. The straight line joining $A$ and $B$ then lies in $N$. If the eigenvalues do not lie on parallel lines, then they lie on two concentric circles. If $a$ is the common centre of these circles, then $A$ and $B$ are in the set $a + \mathbb{C}\cdot\mathcal{U}$. This set is $\|\cdot\|$-flat. Thus, in either case, $A$ and $B$ can be joined by a normal path of length $\|A - B\|$. If $n \ge 3$, then $N$ cannot be $\|\cdot\|$-flat, because of Theorem VI.5.2 and Example VI.3.13. •
Example VI.5.8 Here is an example of a Hermitian matrix $A$ and a skew-Hermitian matrix $B$ that cannot be joined by a normal path of length $\|A - B\|$. Let
$$A = \begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 & 0\\ -1 & 0 & 1\\ 0 & -1 & 0 \end{pmatrix}.$$
Then $\|A - B\| = 2$. If there were a normal path of length 2 joining $A, B$, then the midpoint of this path would be a normal matrix $C$ such that $\|A - C\| = \|B - C\| = 1$. Since each entry of a matrix is dominated by its norm, this implies that $|c_{21} - 1| \le 1$ and $|c_{21} + 1| \le 1$. Hence $c_{21} = 0$. By the same argument, $c_{32} = 0$. So
$$A - C = \begin{pmatrix} * & * & *\\ 1 & * & *\\ * & 1 & * \end{pmatrix},$$
where $*$ represents an entry whose value is not yet known. But if $\|A - C\| = 1$, we must have
$$A - C = \begin{pmatrix} 0 & 0 & *\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{pmatrix}. \qquad \text{Hence } C = \begin{pmatrix} 0 & 1 & *\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix}.$$
But then $C$ could not have been normal. •
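The assertions of this example can be checked directly. A sketch (assuming NumPy, and using the matrices as read above, with the candidate midpoint $C$ taken with its free entry set to zero):

```python
import numpy as np

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=complex)    # Hermitian
B = np.array([[0, 1, 0], [-1, 0, 1], [0, -1, 0]], dtype=complex)  # skew-Hermitian
C = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=complex)    # forced midpoint

op = lambda X: np.linalg.norm(X, 2)
assert abs(op(A - B) - 2) < 1e-12                 # ||A - B|| = 2
assert abs(op(A - C) - 1) < 1e-12                 # C is equidistant ...
assert abs(op(B - C) - 1) < 1e-12                 # ... from A and B
# ... but C fails to be normal:
assert op(C @ C.conj().T - C.conj().T @ C) > 0.9
```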
VI.6 Geometry and Spectral Variation: wui Norms
In this section we consider the possibility of extending to all (weakly) unitarily invariant norms results obtained in the previous section for the operator norm. Given a wui norm $\tau$, the $\tau$-optimal matching distance between the eigenvalues of two (normal) matrices $A, B$ is defined as
$$d_{\tau}(\sigma(A), \sigma(B)) = \min_{P}\,\tau\big(\operatorname{Eig}A - P(\operatorname{Eig}B)P^{-1}\big), \tag{VI.41}$$
where, as before, $\operatorname{Eig}A$ is a diagonal matrix with the eigenvalues of $A$ down its diagonal (in any order) and where $P$ varies over all permutation matrices. We want to compare this with the distance $\tau(A - B)$. The main result in this section is an extension of the path inequality of Theorem VI.5.2 to all wui norms. From this several interesting conclusions can be drawn. Let us begin with an example illustrating that not all results for the operator norm have straightforward extensions.

Example VI.6.1 For $0 \le t \le \pi$, let
$$U(t) = \begin{pmatrix} 0 & e^{it}\\ 1 & 0 \end{pmatrix}.$$
Then, for every unitarily invariant norm,
$$|||U(t) - U(0)||| = |1 - e^{it}| = 2\sin\frac{t}{2}.$$
In the trace norm (the Schatten 1-norm), we have
$$d_1\big(\sigma(U(t)), \sigma(U(0))\big) = 2\,|1 - e^{it/2}| = 4\sin\frac{t}{4}.$$
So,
$$\frac{d_1\big(\sigma(U(t)), \sigma(U(0))\big)}{\|U(t) - U(0)\|_1} = \sec\frac{t}{4} > 1 \quad \text{for } t \ne 0.$$
Thus, we might have $d_1(\sigma(A), \sigma(B)) > \|A - B\|_1$ even for arbitrarily close normal matrices $A, B$. Compare this with Theorems VI.5.1 and VI.4.1.
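The numbers in this example can be reproduced with a short computation (pure standard-library Python; $t = 1$ is an arbitrary test value, and the matrix form of $U(t)$ is as reconstructed above):

```python
import cmath
import math

t = 1.0
# Eigenvalues of U(t) = [[0, e^{it}], [1, 0]] are the square roots of e^{it}.
evs_t = [cmath.exp(1j * t / 2), -cmath.exp(1j * t / 2)]
evs_0 = [1, -1]                                  # eigenvalues of U(0)

# U(t) - U(0) has rank one, with single singular value |e^{it} - 1|.
trace_norm = abs(cmath.exp(1j * t) - 1)

# Trace-norm optimal matching: the better of the two possible pairings.
d1 = min(abs(evs_t[0] - evs_0[0]) + abs(evs_t[1] - evs_0[1]),
         abs(evs_t[0] - evs_0[1]) + abs(evs_t[1] - evs_0[0]))

assert abs(trace_norm - 2 * math.sin(t / 2)) < 1e-12
assert abs(d1 - 4 * math.sin(t / 4)) < 1e-12
assert d1 / trace_norm > 1                       # ratio is sec(t/4) > 1
```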
The Q-norms are special in this respect, as we will see below. Let $\Phi$ be any finite subset of $\mathbb{C}$. A map $F : \mathbb{C} \to \Phi$ is called a retraction onto $\Phi$ if $|z - F(z)| = \operatorname{dist}(z, \Phi)$, i.e., $F$ maps every point in $\mathbb{C}$ to one of the points in $\Phi$ that is at the least distance from it. Such an $F$ is not unique if $\Phi$ has more than one element. Let $\Phi$ be a subset of $\mathbb{C}$ that has at most $n$ elements, and let $N(\Phi)$ be the set of all $n \times n$ normal matrices $A$ such that $\sigma(A) \subset \Phi$. If $F$ is a retraction onto $\Phi$, then for every normal matrix $B$ with eigenvalues $\beta_1, \ldots, \beta_n$ and for every $A$ in $N(\Phi)$ we have
$$\|B - F(B)\| = \max_j |\beta_j - F(\beta_j)| = \max_j \operatorname{dist}(\beta_j, \Phi) \le s(\sigma(B), \sigma(A)) \le \|B - A\| \tag{VI.42}$$
by Theorem VI.3.3. Note that the normality of $B$ was required at the first step and that of $A$ at the last. This inequality has a generalisation.

Theorem VI.6.2 Let $\Phi$ be a finite set of cardinality at most $n$. Let $F$ be a retraction onto $\Phi$. Then for every normal matrix $B$ and for every $A \in N(\Phi)$ we have
$$\|B - F(B)\|_Q \le \|B - A\|_Q \tag{VI.43}$$
for every Q-norm.
Proof. By Exercise IV.2.10, the inequality (VI.43) is equivalent to the weak majorisation
$$\big[s\big(B - F(B)\big)\big]^2 \prec_w \big[s(A - B)\big]^2.$$
If $\beta_1, \ldots, \beta_n$ are the eigenvalues of $B$, this is equivalent to saying that for all $1 \le k \le n$ we have
$$\sum_{j=1}^{k} |\beta_{i_j} - F(\beta_{i_j})|^2 \le \sum_{j=1}^{k} s_j^2(A - B)$$
for every choice of indices $i_1, \ldots, i_k$. By Ky Fan's maximum principle (Exercise II.1.13),
$$\sum_{j=1}^{k} s_j^2(A - B) = \max \sum_{j=1}^{k} \|(A - B)v_j\|^2,$$
where the maximum is taken over all orthonormal $k$-tuples $v_1, \ldots, v_k$. In particular, if $e_j$ are unit vectors such that $Be_j = \beta_j e_j$, then
$$\sum_{j=1}^{k} s_j^2(A - B) \ge \sum_{j=1}^{k} \|(A - \beta_{i_j})e_{i_j}\|^2.$$
But if $\beta$ is any complex number and $e$ any unit vector, then $\|(A - \beta)e\| \ge \operatorname{dist}(\beta, \sigma(A))$. (See the second proof of Theorem VI.3.3.) Hence, we have
$$\sum_{j=1}^{k} s_j^2(A - B) \ge \sum_{j=1}^{k} |\beta_{i_j} - F(\beta_{i_j})|^2,$$
and this completes the proof. •
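Theorem VI.6.2 can be tested numerically for the Frobenius norm, which is a Q-norm. A sketch (assuming NumPy; the spectra are random choices), building the retraction $F$ on the eigenvalues of a normal $B$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5

def random_unitary():
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

phi = rng.normal(size=n) + 1j * rng.normal(size=n)   # the finite set Phi
QA = random_unitary()
A = QA @ np.diag(phi) @ QA.conj().T                  # a member of N(Phi)

beta = rng.normal(size=n) + 1j * rng.normal(size=n)
QB = random_unitary()
B = QB @ np.diag(beta) @ QB.conj().T                 # an arbitrary normal B

# Retraction: send each eigenvalue of B to a nearest point of Phi.
F_beta = np.array([phi[np.argmin(np.abs(phi - b))] for b in beta])
FB = QB @ np.diag(F_beta) @ QB.conj().T

fro = lambda X: np.linalg.norm(X, "fro")
assert fro(B - FB) <= fro(B - A) + 1e-9              # inequality (VI.43)
```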
Exercise VI.6.3 Show that the assertion of Theorem VI.6.2 is not true for the Schatten $p$-norms, $1 \le p < 2$. (See the example in (VI.23).)
Corollary VI.6.4 Let $A$ be a normal matrix, and let $B$ be another normal matrix such that $\|A - B\|$ is smaller than half the distance between the distinct eigenvalues of $A$. Then
$$d_Q(\sigma(A), \sigma(B)) \le \|A - B\|_Q$$
for every Q-norm. (The quantity on the left is the Q-norm optimal matching distance.)

Proof. Let $\epsilon = \|A - B\|$. In the proof of Theorem VI.5.1 we saw that all the eigenvalues of $B$ lie within the region comprising the disks of radius $\epsilon$ around the eigenvalues of $A$. Further, each such disk contains as many eigenvalues of $A$ as of $B$ (multiplicities counted). The retraction $F$ of Theorem VI.6.2 then achieves a one-to-one pairing of the eigenvalues of $A$ and those of $B$. •
Replacing the operator norm by any other norm $\tau$ in (VI.38), we can define the $\tau$-length of a path $\gamma$ by the same formula. Denote this by $\ell_{\tau}(\gamma)$.
Exercise VI.6.5 Let $A$ and $B$ be normal matrices, and let $\gamma$ be a normal path joining them. Then for every Q-norm we have
$$d_Q(\sigma(A), \sigma(B)) \le \ell_Q(\gamma).$$
This includes Theorem VI.5.2 as a special case. We will now extend this inequality to its broadest context.
Proposition VI.6.6 Let $A$ be a normal matrix and let $\delta$ be half the minimum distance between the distinct eigenvalues of $A$. Then there exists a positive number $M$ (depending on $\delta$ and the dimension $n$) such that any normal matrix $B$ with $\|A - B\| < \delta$ has a representation $B = UB'U^{*}$, where $B'$ commutes with $A$ and $U$ is a unitary matrix with $\|I - U\| \le M\|A - B\|$.
Proof. Let $\alpha_j$, $1 \le j \le r$, be the distinct eigenvalues of $A$, and let $m_j$ be the multiplicity of $\alpha_j$. Choose an orthonormal basis in which $A = \oplus_j \alpha_j I_j$, where $I_j$, $1 \le j \le r$, are identity submatrices of dimensions $m_j$. By the argument used in the proof of Theorem VI.5.1, the eigenvalues of $B$ can be grouped into diagonal blocks $D_j$, where $D_j$ has dimension $m_j$ and every eigenvalue of $D_j$ is within distance $\delta$ of $\alpha_j$. This implies that
$$\|(\alpha_jI_k - D_k)^{-1}\| \le \frac{1}{\delta} \quad \text{if } j \ne k.$$
If $D = \oplus_j D_j$, then there exists a unitary matrix $W$ such that $B = WDW^{*}$. With respect to the above splitting, let $W$ have the block decomposition $W = [W_{jk}]$, $1 \le j, k \le r$. Then
$$\|A - B\| = \|A - WDW^{*}\| = \|AW - WD\| = \big\|[W_{jk}(\alpha_jI_k - D_k)]\big\|.$$
Hence, for $j \ne k$, $\|W_{jk}\| \le \frac{1}{\delta}\|A - B\|$. Hence, there exists a constant $K$ that depends only on $\delta$ and $n$ such that $\|W - X\| \le K\|A - B\|$, where $X = \oplus_j W_{jj}$ is the diagonal part in the block decomposition of $W$. Note that $\|X\| \le 1$ by the pinching inequality. Let $W_{jj} = V_jP_j$ be the polar decomposition of $W_{jj}$, with $V_j$ unitary and $P_j$ positive. Then
$$\|W_{jj} - V_j\| = \|P_j - I_j\| \le \|P_j^2 - I_j\| = \|W_{jj}^{*}W_{jj} - I_j\|,$$
since $P_j$ is a contraction. Let $V = \oplus_j V_j$. Then $V$ is unitary, and from the above inequality we see that
$$\|X - V\| \le \|X^{*}X - I\| = \|X^{*}X - W^{*}W\|.$$
Hence,
$$\|W - V\| \le \|W - X\| + \|X - V\| \le \|W - X\| + \|X^{*}X - W^{*}W\| \le \|W - X\| + \|(X^{*} - W^{*})X\| + \|W^{*}(X - W)\| \le 3\|W - X\| \le 3K\|A - B\|.$$
If we put $U = WV^{*}$ and $M = 3K$, we have $\|I - U\| \le M\|A - B\|$ and $B = WDW^{*} = UVDV^{*}U^{*} = UB'U^{*}$, where $B' = VDV^{*}$. Since $B'$ is block-diagonal with diagonal blocks of size $m_j$, it commutes with $A$. This completes the proof. •
Proposition VI.6.7 Given a normal matrix $A$, a wui norm $\tau$, and an $\epsilon > 0$, there exists a small neighbourhood of $A$ such that for any normal matrix $B$ in this neighbourhood we have
$$d_{\tau}(\sigma(A), \sigma(B)) \le (1 + \epsilon)\,\tau(A - B).$$

Proof. Choose $B$ so close to $A$ that the conditions of Proposition VI.6.6 are satisfied. Let $U, B', M$ be as in that proposition. Let $S = U - I$, so that $U = I + S$ and $U^{*} = I - S + S^2U^{*}$. Then
$$A - B = A - B' + [B', S] + UB'SU^{*} - B'S.$$
Hence,
$$\tau(A - B' + [B', S]) \le \tau(A - B) + \tau(UB'SU^{*} - B'S).$$
Since $A$ and $B'$ are commuting normal matrices, they can be diagonalised simultaneously by a unitary conjugation. In this basis the diagonal of $[B', S]$ will be zero. So, by the pinching inequality,
$$\tau(A - B') \le \tau(A - B' + [B', S]).$$
The two inequalities above give
$$d_{\tau}(\sigma(A), \sigma(B)) \le \tau(A - B) + \tau(UB'SU^{*} - B'S).$$
Now choose $k$ such that
$$\frac{1}{k} \le \frac{\tau(X)}{\|X\|} \le k \quad \text{for all } X \ne 0.$$
Then, using Proposition VI.6.6, we get
$$\tau(UB'SU^{*} - B'S) \le 2\,\tau(B'S) \le 2kM\|B\|\,\|A - B\| \le 2k^2M\|B\|\,\tau(A - B).$$
Now, if $B$ is so close to $A$ that we have $2k^2M\|B\| \le \epsilon$, then the inequality of the proposition is valid. •
Theorem VI.6.8 Let $A, B$ be normal matrices, and let $\gamma$ be any normal path joining them. Then there exists a permutation matrix $P$ such that for every wui norm $\tau$ we have
$$\tau\big(\operatorname{Eig}A - P(\operatorname{Eig}B)P^{-1}\big) \le \ell_{\tau}(\gamma). \tag{VI.44}$$
Proof. For convenience, let $\gamma(t)$ be parametrised on the interval $[0, 1]$. Let $\gamma(0) = A$, $\gamma(1) = B$. By Theorem VI.1.4, there exist continuous functions $\lambda_1(t), \ldots, \lambda_n(t)$ that represent the eigenvalues of the matrix $\gamma(t)$ for each $t$. Let $D(t)$ be the diagonal matrix with diagonal entries $\lambda_j(t)$. We will show that
$$\tau\big(D(0) - D(1)\big) \le \ell_{\tau}(\gamma). \tag{VI.45}$$
Let $\tau$ be any wui norm, and let $\epsilon$ be any positive number. Let $\gamma[s, t]$ denote the part of the path $\gamma(\cdot)$ that is defined on $[s, t]$. Let
$$G = \{t : \tau(D(0) - D(t)) \le (1 + \epsilon)\,\ell_{\tau}(\gamma[0, t])\}. \tag{VI.46}$$
Because of continuity, $G$ is a closed set, and hence it includes its supremum $g$. We will show that $g = 1$. If this is not the case, then we can choose $g' > g$ so close to $g$ that Proposition VI.6.7 guarantees
$$\tau\big(D(g) - PD(g')P^{-1}\big) \le (1 + \epsilon)\,\tau\big(\gamma(g) - \gamma(g')\big) \tag{VI.47}$$
for some permutation matrix $P$. Now note that
$$\tau\big(D(g) - P^{-1}D(g)P\big) \le \tau\big(D(g') - D(g)\big) + \tau\big(D(g) - PD(g')P^{-1}\big),$$
and hence, if $g'$ is sufficiently close to $g$, we will have $\tau(D(g) - P^{-1}D(g)P)$ small relative to the minimum distance between the distinct eigenvalues of $D(g)$. We thus have $D(g) = P^{-1}D(g)P$. Hence
$$\tau\big(D(g) - D(g')\big) = \tau\big(P^{-1}D(g)P - D(g')\big) = \tau\big(D(g) - PD(g')P^{-1}\big).$$
So, from (VI.47),
$$\tau\big(D(g) - D(g')\big) \le (1 + \epsilon)\,\tau\big(\gamma(g) - \gamma(g')\big).$$
From the definition of $g$ as the supremum of the set $G$ in (VI.46), we have
$$\tau\big(D(0) - D(g)\big) \le (1 + \epsilon)\,\ell_{\tau}(\gamma[0, g]).$$
Combining the two inequalities above, we get
$$\tau\big(D(0) - D(g')\big) \le (1 + \epsilon)\,\ell_{\tau}(\gamma[0, g']).$$
This contradicts the definition of $g$. So $g = 1$. •
The inequality (VI.45) tells us not only that for all normal $A, B$ and for all wui norms $\tau$ we have
$$d_{\tau}(\sigma(A), \sigma(B)) \le \ell_{\tau}(\gamma), \tag{VI.48}$$
but also that a matching of $\sigma(A)$ with $\sigma(B)$ can be chosen which makes this work simultaneously for all $\tau$. Further, this matching is the natural one obtained by following the curves $\lambda_j(t)$ that describe the eigenvalues of the family $\gamma(t)$. Several corollaries can be obtained now.
Let A , B be unitary matrices, and let K be any skew Hermitian matrix such that BA 1 e p K . Then, for every unitarily in variant norm I l l Il l , we have,
Theorem VI.6.9
==
x
·
d lll · l l l ( o ( A ) , o ( B) ) < Il l K il l 
Proof. Let 1 (t) == (exp tK)A, 0 < t < 1 . Then t , !(0) == A, ! ( 1 ) == B. So, by Theorem VL 6.8, d lll · lll ( cr ( A ) , cr ( B) )
<
(VL49)
1 (t)
j ll h/ (t) l l ! dt .
==
K ( exp t K) A So l ll ! ' ( t ) l ll == III K III 
Theorem VI.6. 10
invariant norm
unitary for all
I
0
But 1' (t)
is
.
•
Let A, B be unitary matrices. Then for every unitarily d lll · l l l ( cr ( A ) , cr ( B ) ) <
; lil A  B i ll 
(VI. 50)
Proof. In view of Theorem VI.6.9, we need to show that
$$\inf\{|||K||| : BA^{-1} = \exp K\} \le \frac{\pi}{2}\,|||A - B|||.$$
Choose a $K$ whose eigenvalues are contained in the interval $(-i\pi, i\pi]$. By applying a unitary conjugation, we may assume that $K = \operatorname{diag}(i\theta_1, \ldots, i\theta_n)$. Then
$$|||A - B||| = |||I - BA^{-1}||| = |||\operatorname{diag}(1 - e^{i\theta_1}, \ldots, 1 - e^{i\theta_n})|||.$$
But if $-\pi < \theta \le \pi$, then $|\theta| \le \frac{\pi}{2}|1 - e^{i\theta}|$. Hence, $|||K||| \le \frac{\pi}{2}\,|||A - B|||$ for every unitarily invariant norm. •
We now give an example to show that the factor $\pi/2$ in the inequality (VI.50) cannot be reduced if the inequality is to hold for all unitarily invariant norms and all dimensions. Recall that for the operator norm and for the Frobenius norm we have the stronger inequality with 1 instead of $\pi/2$ (Theorem VI.3.11 and Theorem VI.4.1).
e. '
Let A + and A _ be the unitary matrices obtained by in the bottom left corner to an upper Jordan matrix, 0
A± ==
0
0
±1
1 0
0 1
0 0
0 0
0 0
1 0
Then for the trace norrn we have I I A+  A_ ll 1 2 . The eigenvalues of A ± are the n roots of ± l . One can see that the J l · l h optimal matching distance between these two ntuples approaches as n oo . ==
�
n
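The quantities in this example are easy to compute for a moderate $n$ (a sketch assuming NumPy; $n = 6$ keeps the brute-force matching cheap):

```python
import itertools
import numpy as np

n = 6

def corner_jordan(sign):
    M = np.diag(np.ones(n - 1), 1)   # upper Jordan block
    M[-1, 0] = sign                  # bottom-left corner entry
    return M

Ap, Am = corner_jordan(1.0), corner_jordan(-1.0)

# The difference has a single nonzero entry of size 2: trace norm 2.
s = np.linalg.svd(Ap - Am, compute_uv=False)
assert abs(s.sum() - 2.0) < 1e-12

# Eigenvalues: n-th roots of +1 and of -1; brute-force the l1 matching.
ev_p = np.exp(2j * np.pi * np.arange(n) / n)
ev_m = np.exp(1j * np.pi * (2 * np.arange(n) + 1) / n)
d1 = min(sum(abs(ev_p[i] - ev_m[p[i]]) for i in range(n))
         for p in itertools.permutations(range(n)))

# The optimal matching rotates each root to its neighbour: 2n sin(pi/2n) -> pi.
assert abs(d1 - 2 * n * np.sin(np.pi / (2 * n))) < 1e-9
assert d1 > 2.0                      # already exceeds ||A+ - A-||_1
```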
The next theorem is a generalisation of, and can be proved using the same idea as, Theorem VI.5.4.
Theorem VI.6.12 If $A, B$ are normal matrices such that $A - B$ is also normal, then for every wui norm $\tau$,
$$d_{\tau}(\sigma(A), \sigma(B)) \le \tau(A - B). \tag{VI.51}$$
This inequality, or rather just its special case when $\tau$ is restricted to unitarily invariant norms and $A, B$ are Hermitian, can be used to get yet another proof of Lidskii's Theorem. We have seen this argument earlier in Chapter IV. The stronger result we now have at our disposal gives a stronger version of Lidskii's Theorem. This is shown below.

Let $x, y$ be elements of $\mathbb{C}^n$. We will say that $x$ is majorised by $y$, in symbols $x \prec y$, if $x$ is a convex combination of vectors obtained from $y$ by permuting its coordinates, i.e., $x = \sum a_{\sigma}y_{\sigma}$, a finite sum in which each $y_{\sigma}$ is
a vector whose coordinates are obtained by applying the permutation $\sigma$ to the coordinates of $y$, and the $a_{\sigma}$ are positive numbers with $\sum a_{\sigma} = 1$. When $x, y$ are real vectors, this is already familiar. We will say $x$ is softly majorised by $y$, in symbols $x \prec_s y$, if we can write $x$ as a finite sum $x = \sum z_{\sigma}y_{\sigma}$ in which the $z_{\sigma}$ are complex numbers such that $\sum|z_{\sigma}| \le 1$.

Exercise VI.6.13 Let $x, y$ be two vectors in $\mathbb{C}^n$ such that $\sum_{j=1}^{n}x_j$ and $\sum_{j=1}^{n}y_j$ are not zero. If $x \prec_s y$ and $\sum_{j=1}^{n}x_j = \sum_{j=1}^{n}y_j$, then $x \prec y$.

Proposition VI.6.14 Let $A, B$ be $n \times n$ normal matrices, and let $\lambda(A), \lambda(B)$ be two $n$-vectors whose coordinates are the eigenvalues of $A, B$, respectively. Then $\tau(A) \le \tau(B)$ for all wui norms $\tau$ if and only if $\lambda(A) \prec_s \lambda(B)$.
Proof. Suppose T ( A ) < T(B) for all wui norms T. Then, using The orem IV.4 . 7, we can write the diagonal matrix Eig(A) as a finite sum Eig(A) == L:zk UkEig(B) Uk: , in which Uk are unitary matrices and E l zk l < 1 . This shows that A. ( A ) == :Ezk Sk (A.(B) ) , where each Sk is an orthostochas tic matrix. (An orthostochastic matrix S is a doubly stochastic matrix such that S ij == l u ij 1 2 , where Uij are the entries of a unitary matrix . ) By Birkhoff ' s Theorem each Sk is a convex combination of permutation ma trices . Hence, A.(A) <s A.(B) . The converse follows by the same argument without recourse to Birkhoff's Theorem. •
Let A , B be normal matrices such that A  B is also normal. Then the eigenvalues of A and B can be arranged in such a way that if >.. ( A) and >.. ( B) are the nvectors with these eigenvalues as their coordinates, then
Theorem Vl. 6 . 1 5
>.. (A)  A.(B) < .A(A  B) .
( VI. 52 )
Proof. Use Theorem VL6.8 and the observation in Theorem VL 6. 12 to conclude that we can arrange the eigenvalues in such a way that T ( E ig A  Eig B) < T (A  B)
for every wui norm T. By Proposition VL6. 14, this is equivalent to saying A.(A)  A. ( B) < s A.(A  B) ,
where .A ( A) is the vector whose entries are the diagonal entries of the diag onal matrix Eig A. By a small perturbation, if necessary, we may assume that trA # tr B. Since the components of the vectors A. ( A)  A. ( B) and >.. (A  B) must have the same (nonzero) sum, we have in fact majorisation • rather than just the soft majorisation proved above.
V I . 7 Some Inequalities for the D eterminant
181
We::cc an call this the Lidskii Theorem for normal matrices . It in clu des the classical Lidskii Theorem as a special case. Exercise Vl. 6 . 1 6 Let A, B be normal matrices such that A B is also normal. Let T be any wui norm. Show that there exists a permutation matrix 
P such that
VI . 7
T ( A  B ) < T (Eig A  P(Eig B ) P 1 ) .
(VI. 53)
S ome Inequalities for t he Determinant
The determinant of the sum A + B of two matrices has no simple relation with the determinants of A and B. Some interesting inequalities can be derived using ideas introduced in this chapter. These are proved below.
Let A and B be Hermitian matrices with eigenvalues and fJ1 , . . . , f3n , respectively. Then
Theorem VI. 7.1 a 1 , . . . , an
n
min a IT (ai + f3a (i) ) < det (A + B ) < max
a
i=l
where O" varies over all permutations. Proof.
n
IT (a i + f3a(i) ) ,
i= l
(VI. 54)
I f A and B commute, they can be diagonalised simultaneously, n
and hence det (A + B)
II (a i + f3a ( i ) )
==
for some
O" .
So, the inequality
i =l (VL5 4) is trivial in this case. Next note that the two extreme sides of (VI. 54) are invariant under the transformation B U BU* for every unitary U. Hence, it suffices to prove that for a fixed Hermitian matrix A the function f ( H) == det ( A + H ) on the unitary orbit Us of another Hermitian matrix B attains its minimum and maximum at points that commute with A. Let B0 be any extreme point of f on Us . Then, we must have �
d dt
t =O
for every skewHermitian
det (A + e t K B0 e  t K )
K.
==
0'
Now,
det ( A + etK B0 e  t K ) = det(A + Bo + t [ K, Bo] ) + O ( t 2 ) . Note that , if X , Y are any two matrices and X is invertible , then det(X + tY)
== det X ( l + t tr Yx  1 ) + O (t2 ) .
So, if A + B0 is invertible, the condition (VI. 55) reduces to tr [K, Bo] (A + B0 )  1
==
0.
(VI. 55)
182
VI .
Spectral Variation
of
Normal Mat rices
This is equivalent to saying
tr K ( Bo (A + Bo )  1  ( A + Bo )  1 B0 )
==
0.
If this is to be true for all skewHermitian K , we must have Thus B0 commutes with ( A + B0 )  1 , hence with A + B0 , and hence with A. This proves the theorem under the assumption that A + B0 is invertible. The general case follows from this by a limiting argument. • Exercise VI. 7.2
0, then n
Let A and B be Hermitian matrices. If .x; ( A ) + .X � ( B ) > n
(VI. 56)
j=l j =l This is true, in particular, when A and B are positive matrices. Theorem VI. 7.3 Let A, B be Hermitian matrices with eigenvalues O'.j and {3j , respectively, ordered so that
Let T == A + i B . Then I det T l
<
n II lnj + i f3n  j + l l ·
j= 1
(VI. 57)
Proof. The function f (t) == � log t is concave on the positive halfline. Hence, using the majorisation (VI. 7) and Corollary II.3.4, we have n
L j= l Hen ce,
+ if3n  j + I I > L j= l
n
n
j= l
j= 1
Proposition VI. 7.4
tian. Then
I det T l
log laj
n
==
log sj .
•
Let T A + iB, where A is positive and B Hermi ==
n det A II [ 1 + Sj (A  1 1 2 B A  1 12 ) 2 ] 1 /2 . j =l
( VI. 58)
VI . 7 Some Inequalities for the Determinant
P roof.
Since T == A 1 1 2 ( 1 �. i A  l /2 BA  11 2 ) A 1 12 , we have det T det A · det (J + iA  1 1 2 BA  1 1 2 ) . ==
183
(VI. 5 9)
Note that
I det (J + i A  l /2 BA  1 1 2 ) ! 2 det [ (/ + i A  1 1 2 BA  1 1 2 ) ( 1  i A  1 12 B A  1 1 2 )] det [/ + (A  1 /2 BA  1 / 2 ) 2 ] n IJ [ l + Sj (A  1 /2 BA 1 / 2 ) 2 ] . j=1
So, (VL58) follows from (VI. 59) and (VI. 60) .
(VI.60) •
If the matrix A in the Cartesian decomposition T then I det Tj > det A . Theorem VI. 7. 6 Let T A + i B, where A and B are positive matrices with eigenvalues a 1 > · · > a n and fJ1 > · · · > f3n , respectively. Then,
Corollary VI. 7. 5 A + iB is positive,
==
==
·
n l det T j > IJ i aJ + if3J I · j= l
(VI.61)
Proof. We may assume, without loss of generality, that both $A, B$ are positive definite. Because of the relations (VI.59) and (VI.60), the theorem will be proved if we show
$$\prod_{j=1}^{n}\left[1 + s_j(A^{-1/2}BA^{-1/2})^2\right] \ \ge\ \prod_{j=1}^{n}\left(1 + \alpha_j^{-2}\beta_j^2\right).$$
Note that $s_j(A^{-1/2}BA^{-1/2}) = s_j(A^{-1/2}B^{1/2})^2$. From (III.20) we have
$$\left\{\log s_{n-j+1}(A^{-1/2}) + \log s_j(B^{1/2})\right\}_j \ \prec\ \left\{\log s_j(A^{-1/2}B^{1/2})\right\}_j.$$
This is the same as saying
$$\left\{\log\left(\alpha_j^{-1}\beta_j\right)^{1/2}\right\}_j \ \prec\ \left\{\log s_j(A^{-1/2}B^{1/2})\right\}_j.$$
Since the function $\log(1 + e^{4t})$ is convex in $t$, using Corollary II.3.4 we obtain from the last majorisation
$$\sum_{j=1}^{n} \log\left(1 + \alpha_j^{-2}\beta_j^2\right) \ \le\ \sum_{j=1}^{n} \log\left(1 + s_j(A^{-1/2}B^{1/2})^4\right) = \sum_{j=1}^{n} \log\left(1 + s_j(A^{-1/2}BA^{-1/2})^2\right).$$
This gives the desired inequality. ∎
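The two determinant bounds (VI.57) and (VI.61) sandwich $|\det T|$ when $A$ and $B$ are both positive: the eigenvalues paired in opposite order give the upper bound, and paired in the same order the lower bound. A numerical illustration (random positive matrices as arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

def random_positive(n, rng):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return X @ X.conj().T + 0.1 * np.eye(n)

A = random_positive(n, rng)
B = random_positive(n, rng)
T = A + 1j * B

alpha = np.sort(np.linalg.eigvalsh(A))[::-1]   # alpha_1 >= ... >= alpha_n
beta = np.sort(np.linalg.eigvalsh(B))[::-1]    # beta_1 >= ... >= beta_n

det_T = abs(np.linalg.det(T))
upper = np.prod(np.abs(alpha + 1j * beta[::-1]))   # (VI.57): opposite pairing
lower = np.prod(np.abs(alpha + 1j * beta))         # (VI.61): same pairing
assert lower <= det_T <= upper
```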
VI. Spectral Variation of Normal Matrices
Exercise VI.7.7 Show that if only one of the two matrices $A$ and $B$ is positive, then the inequality (VI.61) is not necessarily true.

A very natural generalisation of Theorem VI.7.1 would be the following statement. If $A$ and $B$ are normal, then $\det(A + B)$ lies in the convex hull of the products $\prod_{i=1}^{n}(\alpha_i + \beta_{\sigma(i)})$. This is called the Marcus–de Oliveira Conjecture and is a well-known open problem in matrix theory.
VI.8 Problems
Problem VI.8.1. Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix. Show that

for every Q-norm.

Problem VI.8.2. Let $T = A + iB$, where $A$ and $B$ are Hermitian. Show that, for $2 \le p \le \infty$,
$$\max_{\sigma}\left\|\operatorname{Eig} A + \operatorname{Eig}_{\sigma}(iB)\right\|_p \ \le\ 2^{1/2 - 1/p}\,\|T\|_p,$$
and that, for $1 \le p \le 2$,
Problem VI.8.3. A different proof of Theorem VI.3.11 is outlined below. Fill in the details. Let $n \ge 3$ and let $A, B$ be $n \times n$ unitary matrices. Assume that the eigenvalues $\alpha_i$ and $\beta_i$ are distinct and the distances $|\alpha_i - \beta_j|$ are also distinct. If $\gamma_1, \gamma_2$ are two points on the unit circle, we write $\gamma_1 < \gamma_2$ if the minor arc from $\gamma_1$ to $\gamma_2$ goes counterclockwise. We write $(\alpha\beta\gamma)$ if the points $\alpha, \beta, \gamma$ on the unit circle are in counterclockwise cyclic order. Number the indices modulo $n$; e.g., $\alpha_{n+1} = \alpha_1$. Label the eigenvalues of $A$ so as to have the order $(\alpha_1 \alpha_2 \cdots \alpha_n)$. Let $\delta = d(\sigma(A), \sigma(B))$. Assume that $\delta < 2$; otherwise, there is nothing to prove. Label the eigenvalues of $B$ as $\beta_1, \ldots, \beta_n$ in such a way that for any subset $J$ of $\{1, 2, \ldots, n\}$ and for any permutation $\sigma$
$$\max_{i \in J}|\alpha_i - \beta_i| \ \le\ \max_{i \in J}|\alpha_i - \beta_{\sigma(i)}|.$$
Then $\delta = \max_i |\alpha_i - \beta_i|$. Assume, without loss of generality, that this maximum is attained at $i = 1$ and that $\alpha_1 < \beta_1$. Check the following.

(i) If $\beta_i < \alpha_i$, then neither $(\alpha_1\beta_i\beta_1)$ nor $(\alpha_1\alpha_i\beta_1)$ is possible.

(ii) There exists $j$ such that $|\alpha_{j+1} - \beta_j| \ge \delta$. Choose and fix one such $j$.

(iii) We have $(\alpha_1\beta_1\beta_j\alpha_{j+1})$.

(iv) For $1 \le i \le j$ we have $(\beta_1\beta_i\beta_j)$.

Let $K_A$ be the arc from $\alpha_{j+1}$ positively to $\alpha_1$ and $K_B$ the arc from $\beta_1$ positively to $\beta_j$. Then there are $n - j + 1$ of the $\alpha_i$ in $K_A$ and $j$ of the $\beta_i$ in $K_B$. Use Proposition VI.3.5 now.

Problem VI.8.4. Let $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ be any complex numbers. Show that there is a number $\gamma$ such that
$$\max_i |\alpha_i - \gamma| + \max_j |\beta_j - \gamma| \ \le\ \sqrt{2}\,\max_{i,j}|\alpha_i - \beta_j|.$$
(The proof might be long but is not too difficult.) Use this to get another proof of Theorem VI.3.14.
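The assertion of Problem VI.8.4 can be probed numerically. The sketch below (with hypothetical random data) uses a crude grid search to exhibit a witness $\gamma$; any grid point with small enough cost already certifies the existence claim.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(5) + 1j * rng.standard_normal(5)   # the alpha_i
b = rng.standard_normal(5) + 1j * rng.standard_normal(5)   # the beta_j

def cost(g):
    return np.abs(a - g).max() + np.abs(b - g).max()

# Crude grid search over a box containing all the points.
pts = np.concatenate([a, b])
xs = np.linspace(pts.real.min() - 1, pts.real.max() + 1, 201)
ys = np.linspace(pts.imag.min() - 1, pts.imag.max() + 1, 201)
best = min(cost(x + 1j * y) for x in xs for y in ys)

bound = np.sqrt(2) * np.abs(a[:, None] - b[None, :]).max()
assert best <= bound   # a grid point witnesses the claimed gamma
```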
Problem VI.8.5. Let $A$ be a Hermitian and $B$ a normal matrix. If the eigenvalues $\alpha_j$ of $A$ are enumerated as $\alpha_1 \ge \cdots \ge \alpha_n$, and if the eigenvalues $\beta_j$ of $B$ are enumerated so that $\operatorname{Re}\beta_1 \ge \cdots \ge \operatorname{Re}\beta_n$, then
$$\max_{1 \le j \le n} |\alpha_j - \beta_j| \ \le\ \sqrt{2}\,\|A - B\|.$$
Problem VI.8.6. Let $A$ be a normal matrix with eigenvalues $\alpha_1, \ldots, \alpha_n$. Let $B$ be any other matrix and let $\varepsilon = \|A - B\|$. By Theorem VI.3.3, all the eigenvalues of $B$ are contained in the set $D = \bigcup_j D(\alpha_j, \varepsilon)$. Use the argument in the proof of Theorem VI.5.1 to show that each connected component of $D$ contains as many eigenvalues of $B$ as of $A$. Use this and the Matching Theorem (Theorem II.2.1) to show that
$$d(\sigma(A), \sigma(B)) \ \le\ (2n - 1)\,\|A - B\|.$$
[If $A$ and $B$ are both normal, this argument together with the result of Problem II.5.10 shows that $d(\sigma(A), \sigma(B)) \le n\|A - B\|$. However, in this case, the Hoffman–Wielandt inequality gives a stronger result: $d(\sigma(A), \sigma(B)) \le \sqrt{n}\,\|A - B\|$. We will see in the next chapter that, in fact, $d(\sigma(A), \sigma(B)) \le 3\|A - B\|$ in this case.]

Problem VI.8.7. Let $A$ be a Hermitian matrix with eigenvalues $\alpha_1 \ge \cdots \ge \alpha_n$, and let $B$ be any matrix with eigenvalues $\beta_j$ arranged so that $\operatorname{Re}\beta_1 \ge \cdots \ge \operatorname{Re}\beta_n$. Let $\operatorname{Re}\beta_j = \mu_j$ and $\operatorname{Im}\beta_j = \nu_j$. Choose an orthonormal basis in which $B$ is upper triangular and
$$B = M + iN + iR,$$
where $M = \operatorname{diag}(\mu_1, \ldots, \mu_n)$, $N = \operatorname{diag}(\nu_1, \ldots, \nu_n)$, and $R$ is strictly upper triangular. Show that
$$\|\operatorname{Im}(A - B)\|_2^2 = \|N\|_2^2 + \tfrac{1}{2}\|R\|_2^2.$$
Hence,

Show that

Combine the inequalities above to obtain
Compare this with the result of Problem VI.8.5; note that there $B$ was assumed to be normal.

Problem VI.8.8. It follows from the result of the above problem that if $A$ is Hermitian and $B$ an arbitrary matrix, then $d(\sigma(A), \sigma(B)) \le c_n\|A - B\|$, where the factor $c_n$ grows like $\sqrt{n}$. This factor can be replaced by another that grows only like $\log n$. For this one needs the following fact, which we state without proof. (See the discussion in the Notes at the end of the chapter.) Let $Z$ be an $n \times n$ matrix whose eigenvalues are all real. Then
$$\|Z - Z^*\| \ \le\ \gamma_n \|Z + Z^*\|,$$
where
$$\gamma_n = \frac{2}{n}\sum_{j=1}^{[n/2]} \cot\frac{(2j-1)\pi}{2n}.$$
The constant $\gamma_n$ is the smallest one for which the above norm inequality is true. Approximating the sum by integrals, it is easy to see that $\gamma_n/\log n$ approaches $2/\pi$ as $n \to \infty$.
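The constant $\gamma_n$ and its logarithmic growth are easy to compute; a small sketch (the function name `gamma` is ours):

```python
import numpy as np

def gamma(n):
    # gamma_n = (2/n) * sum_{j=1}^{[n/2]} cot((2j - 1) * pi / (2n))
    j = np.arange(1, n // 2 + 1)
    return (2.0 / n) * np.sum(1.0 / np.tan((2 * j - 1) * np.pi / (2 * n)))

for n in (10, 100, 1000, 10000):
    print(n, gamma(n), gamma(n) / np.log(n))
```

The printed ratio $\gamma_n/\log n$ creeps towards $2/\pi \approx 0.637$ quite slowly, since $\gamma_n \approx (2/\pi)\log n$ plus a bounded term.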
Using the notations of Problem 7, show that
$$\max_j |\nu_j| \ \le\ \|\operatorname{Im} B\| = \|\operatorname{Im}(A - B)\|,$$
$$\max_j |\alpha_j - \mu_j| \ \le\ \|A - M\| \ \le\ \|\operatorname{Re}(A - B)\| + \tfrac{1}{2}\|R - R^*\|.$$
Let $Z = N + R$. Then $Z$ has only real eigenvalues, $Z - Z^* = R - R^*$, and $Z + Z^* = 2\operatorname{Im}(A - B)$. Hence,
$$\max_j |\alpha_j - \beta_j| \ \le\ \|\operatorname{Re}(A - B)\| + (\gamma_n + 1)\|\operatorname{Im}(A - B)\|.$$
This shows that
$$d(\sigma(A), \sigma(B)) \ \le\ (\gamma_n + 2)\,\|A - B\|.$$
Problem VI.8.9. Let $A$ be the Hermitian matrix with entries
$$a_{ij} = \frac{1}{|i - j|} \quad\text{if } i \ne j, \qquad a_{ii} = 0 \quad\text{for all } i.$$
Let $B = A + C$, where $C$ is the skew-Hermitian matrix with entries
$$c_{ij} = \frac{1}{i - j} \quad\text{if } i \ne j, \qquad c_{ii} = 0 \quad\text{for all } i.$$
Then $B$ is strictly lower triangular, hence all its eigenvalues are $0$. Show that $\|A - B\| \le \pi$ for all $n$, and $\|A\| = O(\log n)$. (This needs some work.) Since $A$ is Hermitian, this means that its spectral radius is $O(\log n)$. Thus, in this example, $d(\sigma(A), \sigma(B)) = O(\log n)$ and $\|A - B\| \le \pi$. So, the bound obtained in Problem 8 is not too loose.

Problem VI.8.10. For any matrix $A$, let $A_D$ denote its diagonal part and $A_L$, $A_U$ its parts below and above the diagonal. Thus $A = A_L + A_D + A_U$. Show that if $A$ is an $n \times n$ normal matrix, then

The example in VI.6.11 shows that this inequality is sharp.

Problem VI.8.11. Let $A$ be a normal and $B$ an arbitrary $n \times n$ matrix. Choose an orthonormal basis in which $B$ is upper triangular. In this basis write $A = A_L + A_D + A_U$, $B = B_D + B_U$. By the Hoffman–Wielandt Theorem

Note that
$$A - B + B_U = (A - B)_L + (A - B)_D + A_U.$$
Use the result of Problem VI.8.10 to show that

Hence, we have, for $A$ normal and $B$ arbitrary,
188
VI.
S pectral Variation of Normal Matrices
From this, we get for the Schatten $p$-norms

Show that for $1 \le p \le 2$ these inequalities are sharp. (See Example VI.6.11.) For $p = \infty$, this gives
$$d(\sigma(A), \sigma(B)) \ \le\ n\,\|A - B\|,$$
which is an improvement on the result of Problem 6 above. If $A$ is Hermitian, then $\|A_U\|_2 = \|A_L\|_2$. Using this, one obtains a slightly different proof of the last inequality in Problem VI.8.7.

Problem VI.8.12. Let $A$ be an $n \times n$ Hermitian matrix partitioned as
$$A = \begin{pmatrix} M & R^* \\ R & H \end{pmatrix},$$
where $M$ is a $k \times k$ matrix. Let the eigenvalues of $A$ be $\lambda_1 \ge \cdots \ge \lambda_n$, those of $M$ be $\mu_1 \ge \cdots \ge \mu_k$, and let the singular values of $R$ be $\rho_1 \ge \rho_2 \ge \cdots$. Show that there exist indices $1 \le i_1 < \cdots < i_k \le n$ such that for every symmetric gauge function we have

In other words, for every unitarily invariant norm we have

In particular, we have

and

Use an argument similar to the one in the proof of Theorem VI.4.1 to show that the factor $\sqrt{2}$ in the last inequality can be replaced by $1$. This raises the question whether we have

for all unitarily invariant norms. This is not so, as can be seen from the example
$$A = \begin{pmatrix} 0 & 1 & \sqrt{3} \\ 1 & 0 & 0 \\ \sqrt{3} & 0 & 0 \end{pmatrix}.$$
Problem VI.8.13. Let $S$ be a closed subset of $\mathbb{C}$, and let $F$ be a retraction onto $S$. Let $N(S)$ be the set of all normal matrices whose spectrum is contained in $S$. Show that if $S$ is a convex set, then for every unitarily invariant norm
$$|||B - F(B)||| \ \le\ |||B - A|||,$$
whenever $B$ is a normal matrix and $A \in N(S)$. If $S$ is any closed set, then the above inequality is true for all Q-norms.

Problem VI.8.14. The aim of this problem, and the next, is to outline an alternative approach to the normal path inequality (VI.44). This uses slightly more sophisticated notions of differential geometry. Let $A$ be any $n \times n$ matrix, and let $\mathcal{O}_A$ be the orbit of $A$ under the action of the group $GL(n)$; i.e.,
$$\mathcal{O}_A = \{gAg^{-1} : g \in GL(n)\}.$$
This is called the similarity orbit of $A$; it is the set of all matrices similar to $A$. Every differentiable curve in $\mathcal{O}_A$ passing through $A$ can be parametrised locally as $e^{tX}Ae^{-tX}$, $X \in M(n)$. By the same argument as in Section VI.4, the tangent space to $\mathcal{O}_A$ at the point $A$ can be characterised as
$$T_A\mathcal{O}_A = \{[A, X] : X \in M(n)\}.$$
The orthogonal complement of this space in $M(n)$ can be calculated as in Lemma VI.4.2. Show that it is $Z(A^*)$, the set of all matrices commuting with $A^*$.

Now, a matrix $A$ is normal if and only if $Z(A^*) = Z(A)$. So, for a normal matrix we have a direct sum decomposition
$$M(n) = T_A\mathcal{O}_A \oplus Z(A).$$
Now, if $B \in \mathcal{O}_A$, then $B$ and $A$ have the same set of eigenvalues, and hence

If $B \in Z(A)$, then there is an orthonormal basis in which $A$ and $B$ are both upper triangular. Hence, for such a $B$,

Now, let $\gamma(t)$, $0 \le t \le 1$, be a $C^1$ curve in the space of normal matrices. Let $\gamma(0) = A_0$, $\gamma(1) = A_1$. Let $\varphi(A) = d_2(\sigma(A_0), \sigma(A))$. At each point $\gamma(t)$ consider the decomposition
$$M(n) = T_{\gamma(t)}\mathcal{O}_{\gamma(t)} \oplus Z(\gamma(t))$$
obtained above. Then, as we move along $\gamma(t)$, the rate of change of the function $\varphi$ is zero in the first direction in this decomposition, and in the second, it is bounded by the rate of change of the argument. Hence we should have
$$\varphi(\gamma(1)) \ \le\ \int_0^1 \|\gamma'(t)\|_2\, dt.$$
Prove this. Note that this says that
$$d_2(\sigma(A_0), \sigma(A_1)) \ \le\ \int_0^1 \|\gamma'(t)\|_2\, dt$$
if $\gamma$ is a $C^1$ curve passing through normal matrices and joining $A_0$ to $A_1$.

Problem VI.8.15. The two crucial properties of the Frobenius norm used above were its invariance under the conjugations $A \mapsto UAU^*$ and the pinching inequality. The first made it possible to change to any orthonormal basis, and the second was used to conclude that the diagonal of a matrix has norm smaller than the whole matrix. Both these properties are enjoyed by all wui norms. So, the method outlined above can be adapted to work for all wui norms to give the same result. (Some conditions on the path are necessary to ensure differentiability of the functions involved.)

Problem VI.8.16. Fill in the details in the following outline of a proof of the statement: every complex matrix with trace $0$ is a commutator of two matrices. Let $A$ be a matrix such that $\operatorname{tr} A = 0$. Assume that $A$ is upper triangular. Let $B$ be the nilpotent upper Jordan matrix (i.e., $B$ has all entries $0$ except the ones on the first superdiagonal, which are all $1$). Then $Z(B^*)$ contains only polynomials in $B^*$. (This is a general fact: $Z(X)$ contains only polynomials in $X$ if and only if in the Jordan form of $X$ there is just one block for each different eigenvalue.) Thus $Z(B^*)$ consists of lower triangular matrices with constant diagonals. Show that $A$ is orthogonal to all such matrices. Hence $A$ is in the space $T_B\mathcal{O}_B$, and so $A = [B, C]$ for some $C$.
VI.9 Notes and References
Perturbation theory for eigenvalues is of interest to mathematicians, physicists, engineers, and numerical analysts. Among the several books that deal with this topic are the venerable classics, T. Kato, Perturbation Theory for Linear Operators, Springer-Verlag, 1966, and J.H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, 1965. The first is addressed to the problems of quantum physics, the second to those of numerical analysis. Matrix Computations by G.H. Golub and C.F. Van Loan, The Johns Hopkins University Press, 1983, has enough of interest for the theorist and for the designer of algorithms. Much closer to the spirit of our book (but a lot more appealing to the numerical analyst) is Matrix Perturbation Theory by G.W. Stewart and J.-G. Sun, Academic Press, 1990. Much of the material in this chapter has appeared before in R. Bhatia, Perturbation Bounds for Matrix Eigenvalues, Longman, 1987.

In Section VI.1, we have done the bare minimum of qualitative analysis required for the later sections. The questions of differentiability and analyticity of eigenvalues and eigenvectors, asymptotic expansions and their convergence, and other related matters are discussed at great length by T. Kato, and by H. Baumgärtel, Analytic Perturbation Theory for Matrices and Operators, Birkhäuser, 1984. See also M. Reed and B. Simon, Analysis of Operators, Academic Press, 1978. The proof of Theorem VI.1.2 given here is adopted from H. Whitney, Complex Analytic Varieties, Addison-Wesley, 1972. Other proofs may be found in R. Bhatia and K.K. Mukherjea, The space of unordered tuples of complex numbers, Linear Algebra Appl., 52/53 (1983) 765–768. The proof of Theorem VI.1.4 is taken from T. Kato.

Theorem VI.4.1 was proved in A.J. Hoffman and H.W. Wielandt, The variation of the spectrum of a normal matrix, Duke Math. J., 20 (1953) 37–39. This was believed to be the next best thing to having an analogue of Weyl's Perturbation Theorem for normal matrices. The conjecture, that for normal A, B the optimal matching distance in each unitarily invariant norm is bounded by the corresponding distance between A and B, seems to have been raised first in L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quarterly J. Math., 11 (1960) 50–59. The problem, of course, turned out to be much more subtle than imagined at that time.

Theorem VI.2.2 is due to V.S. Sunder, Distance between normal operators, Proc. Amer. Math. Soc., 84 (1982) 483–484. The rest of Section 2 is based on T. Ando and R. Bhatia, Eigenvalue inequalities associated with the Cartesian decomposition, Linear and Multilinear Algebra, 22 (1987) 133–147.
Theorem VI.3.3 is one of the several results proved in the paper Norms and exclusion theorems, Numer. Math., 2 (1960) 137–141, by F.L. Bauer and C.T. Fike. Theorem VI.3.11 was first proved by R. Bhatia and C. Davis, A bound for the spectral variation of a unitary operator, Linear and Multilinear Algebra, 15 (1984) 71–76. Their proof is summarised in Problem VI.8.3. This approach of ordering eigenvalues in the cyclic order is elaborated further by L. Elsner and C. He, Perturbation and interlace theorems for the unitary eigenvalue problem, Linear Algebra Appl., 188/189 (1993) 207–229, where many related results are proved. The proof given in Section 3 is adapted from R.H. Herman and A. Ocneanu, Spectral analysis for automorphisms of UHF C*-algebras, J. Funct. Anal., 66 (1986) 1–10. The first example of 3 × 3 normal matrices A, B for which d(σ(A), σ(B)) > ‖A − B‖ was constructed by J.A.R. Holbrook, Spectral variation of normal matrices, Linear Algebra Appl., 174 (1992) 131–144. This was done using a directed computer search. The bare-hands example VI.3.13 was shown to us by G.M. Krause. The inequality (VI.33) appears in T. Ando, Bounds for the antidistance, J. Convex Analysis, 2 (1996) 1–3. It had been observed earlier by R. Bhatia that this would lead to Theorem VI.3.14. A different proof of this theorem was found independently by M. Omladič and P. Šemrl, On the distance between normal matrices, Proc. Amer. Math. Soc., 110 (1990) 591–596. The idea of the simpler proof in Problem VI.8.4 is due to L. Elsner.

The geometric ideas of Sections VI.5 and VI.6 were introduced in R. Bhatia, Analysis of spectral variation and some inequalities, Trans. Amer. Math. Soc., 272 (1982) 323–332. They were developed further in three papers by R. Bhatia and J.A.R. Holbrook: Short normal paths and spectral variation, Proc. Amer. Math. Soc., 94 (1985) 377–382; Unitary invariance and spectral variation, Linear Algebra Appl., 95 (1987) 43–68; and A softer, stronger Lidskii theorem, Proc. Indian Acad. Sci. (Math. Sci.), 99 (1989) 75–83. Much of Sections 5 and 6 is based on these papers. Example VI.5.8 is due to M.D. Choi. The special operator norm case of the inequality (VI.49) was proved by K.R. Parthasarathy, Eigenvalues of matrix-valued analytic maps, J. Austral. Math. Soc. (Ser. A), 26 (1978) 179–197. Theorem VI.6.10 and Example VI.6.11 are taken from R. Bhatia, C. Davis and A. McIntosh, Perturbation of spectral subspaces and solution of linear operator equations, Linear Algebra Appl., 52/53 (1983) 45–67. The inequality (VI.53) for (strongly) unitarily invariant norms was first proved by V.S. Sunder, On permutations, convex hulls and normal operators, Linear Algebra Appl., 48 (1982) 403–411.

P.R. Halmos, Spectral approximants of normal operators, Proc. Edinburgh Math. Soc., 19 (1974) 51–58, initiated the study of operator approximation problems of the following kind. Given a normal operator A, find the operator closest to A from the class of normal operators that have their spectrum in a given closed set. The result of Theorem VI.6.2 for infinite-dimensional Hilbert spaces (and for closed sets) was proved, for the Schatten p-norms with p ≥ 2, by R. Bouldin, Best approximation of a normal operator in the Schatten p-norm, Proc. Amer. Math. Soc., 80 (1980) 277–282. The result for Q-norms, as well as the special one for all unitarily invariant norms given in Problem VI.8.11, was proved by R. Bhatia, Some inequalities for norm ideals, Commun. Math. Phys., 111 (1987) 33–39.

Theorem VI.7.1 is due to M. Fiedler, Bounds for the determinant of the sum of Hermitian matrices, Proc. Amer. Math. Soc., 30 (1971) 27–31. The inequality (VI.56) is also proved in this paper. Theorem VI.7.3 is due to J.F. Queiró and A.L. Duarte, On the Cartesian decomposition of a matrix, Linear and Multilinear Algebra, 18 (1985) 77–85, while Theorem VI.7.6 is due to N. Bebiano. The proofs given here are taken from the paper by T. Ando and R. Bhatia cited above. There are several papers related to the Marcus–de Oliveira conjecture. For a recent survey, see N. Bebiano, New developments on the Marcus–Oliveira Conjecture, Linear Algebra Appl., 197/198 (1994) 793–802.
The results in Problem VI.8.7 are due to W. Kahan, Spectra of nearly Hermitian matrices, Proc. Amer. Math. Soc., 48 (1975) 11–17, as are the ones in Problem VI.8.8 (essentially) and Problem VI.8.9. In an earlier paper, Every n × n matrix Z with real spectrum satisfies ‖Z − Z*‖ ≤ ‖Z + Z*‖(log₂ n + 0.038), Proc. Amer. Math. Soc., 39 (1973) 235–241, Kahan proved the inequality in the title of the paper and used it to obtain the perturbation bound in the later paper. The exact value of γₙ in this inequality was obtained by A. Pokrzywa, Spectra of operators with fixed imaginary parts, Proc. Amer. Math. Soc., 81 (1981) 359–364. The appearance of a constant growing like log n in this inequality is related to another important problem. Let T be the linear operator on M(n) that takes every matrix to its upper triangular part. Then sup ‖T(X)‖/‖X‖ is known to be O(log n). From this one can see (on reducing Z to an upper triangular form) that the constant γₙ also must have this order. Related results may be found in R. Mathias, The Hadamard operator norm of a circulant and applications, SIAM J. Matrix Anal. Appl., 14 (1993) 1152–1167. The results in Problems VI.8.10 and VI.8.11 are taken from J.-G. Sun, On the variation of the spectrum of a normal matrix, Linear Algebra Appl., 246 (1996) 215–223.

Bounds like the one given in Problem VI.8.12 are called residual bounds in numerical analysis. See W. Kahan, Numerical linear algebra, Canadian Math. Bull., 9 (1966) 757–801, where the special improvement for the Frobenius norm is obtained. The general result for all unitarily invariant norms is proved in the book by Stewart and Sun. The example at the end of this problem was given in R. Bhatia, On residual bounds for eigenvalues, Indian J. Pure Appl. Math., 23 (1992) 865–866. The result in Problem VI.8.16 is a well-known theorem; the proof we have outlined here was shown to us by V.S. Sunder.
VII Perturbation of Spectral Subspaces of Normal Matrices
In Chapter 6 we saw that the eigenvalues of a (normal) matrix change continuously with the matrix. The behaviour of eigenvectors is more complicated. The following simple example is instructive. Let
$$A = \begin{pmatrix} 1 + \varepsilon & 0 \\ 0 & 1 - \varepsilon \end{pmatrix} \oplus H, \qquad
B = \begin{pmatrix} 1 & \varepsilon \\ \varepsilon & 1 \end{pmatrix} \oplus H,$$
where $H$ is Hermitian. The eigenvalues of the first $2 \times 2$ block of $A$ are $1 + \varepsilon$, $1 - \varepsilon$. The same is true for $B$. The corresponding normalised eigenvectors are $(1, 0)$ and $(0, 1)$ for $A$, and $\frac{1}{\sqrt{2}}(1, 1)$ and $\frac{1}{\sqrt{2}}(1, -1)$ for $B$. As $\varepsilon \to 0$, $B$ and $A$ approach each other, but their eigenvectors remain stubbornly apart. Note, however, that the eigenspaces that these two eigenvectors of $A$ and $B$ span are identical. In this chapter we will see that interesting and useful perturbation bounds may be obtained for eigenspaces corresponding to closely bunched eigenvalues of normal matrices. Before we do this, it is necessary to introduce notions of distance between two subspaces. Also, it turns out that this perturbation problem is closely related to the solution of the matrix equation $AX - XB = Y$. This equation, called the Sylvester Equation, arises in several other contexts. So, we will study it in some detail before applying the results to the perturbation problem at hand.
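The phenomenon in this example is easy to reproduce numerically. The sketch below pads the $2 \times 2$ blocks with the arbitrary choice $H = (5)$; it is an illustration, not part of the text.

```python
import numpy as np

eps = 1e-6
# The 2x2 blocks of the example, padded with an arbitrary choice H = (5).
A = np.diag([1 + eps, 1 - eps, 5.0])
B = np.array([[1.0, eps, 0.0],
              [eps, 1.0, 0.0],
              [0.0, 0.0, 5.0]])

wa, Va = np.linalg.eigh(A)   # eigenvalues in ascending order
wb, Vb = np.linalg.eigh(B)

# The eigenvectors for the eigenvalue 1 - eps stay far apart as eps -> 0:
v_a, v_b = Va[:, 0], Vb[:, 0]
print(abs(v_a @ v_b))        # about 0.707, not close to 1

# But the eigenspace of the cluster {1 - eps, 1 + eps} is the same:
Pa = Va[:, :2] @ Va[:, :2].T
Pb = Vb[:, :2] @ Vb[:, :2].T
assert np.allclose(Pa, Pb)
```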
VII.1 Pairs of Subspaces
We will be dealing with block decompositions of matrices . To keep track of dimensions, we will find it convenient to write £
k
A == for a blockmatrix in which k and f are the number of columns, and m and p are the number of rows in the blocks indicated. Theorem VII. l . l {The QR Decomposition) Let m > n . Then there is an m x m unitary matrix Q
( R0 )
A be an such that
m x n
matrix,
n
Q * A ==
n m  n
(VII. l )
'
where R is upper triangular with nonnegative real diagonal entries. Proof. For a square matrix A , this was proved in Chapter 1. The same proof also works here. (In essence this is just the GramSchmidt pro • cess. ) The matrix R above is called the R factor of A.
Let A be an matrix with rank A Then the decomposition of A has positive diagonal elements and is uniquely determined. (See Exercise I. 2. 2.) If we write
Exercise VII. 1 . 2 R factor in the QR
m x n
n
==
n.
m  n
then we have Thus Q 1 is uniquely determined by A. However, Q 2 need not be unique. Note the range of A is the range of Q 1 , and its orthogonal complement is the range of Q2 . Exercise VII. 1 . 3 Let A be an m x n matrix with < Then there exists an unitary matrix W such that m
n.
n x n
L 0
m
n  m
AW == ( ) m, where L is lower triangular and has nonnegative real diagonal entries.
1 96
VII. Perturbation of Spectral
S ubspaces
of Normal
Matrices
We remark here that it is only for convenience that we choose R (and L ) to have nonnegative diagonal entries. By modifying Q (and W) , we can also make R (and L) have non positive real diagonal entries.
Let A be an matrix with rank A == r. Then there permutation matrix P and an m m 'L£nitary matrix Q such
Exercise VII. 1 . 4
exists an that
n x n
m x n
x
Q*AP =
(�
1
where R1 1 is an r r upper triangular matrix with positive diagonal entries. This is called a rank revealing QR decomposition . Exercise VII. 1 . 5 Let A be an m matrix with rank A == r. Then there exists an m m unitary matrix Q and an unitary matrix W such that Q*AW � x
x n
x
n x n
=
( �),
where T is an r r triangular matrix with positive diagonal entries. Theorem VII. 1 . 6 {The CS Decomposition} Let W be an unitary matrix partitioned as x
n x n
m
f
W ==
)!
(VIL2)
where .e < m. Then there exist unitary matrices U == diag(U1 1 , U22 ) and v == diag(Vl l ' v2 2 ) ' where ull ' vl l are .e f matrices, such that X
f
U*WV == where
C 0 < c1 <
C s 0
f
S c 0
ml
0
0 I
.e
f ml
(VII.3)
and S are nonnegative diag n al matrice5, with diagonal < ce < 1 and 1 > s 1 > · · · > se > 0, respectively, and o
·
·
entries
·
Proof. For the sake of brevity, let us call a map X U * XV on the space of n x n matrices a Utransform, if U, V are block diagonal unitary matrices with top left blocks of size f x f. The product of two Utransforrns is again a Utransform. We will prove the theorem by showing that one can change the matrix W in (VIL2) to the form (VIL3) by a succession of Utransforms. 7
VI I . l Pairs of S ubspaces
Let ul l ' vl l be .e
X
197
f unitary matrices such that k
pk 0 I
where C1 is a diagonal matrix with diagonal entries 0 < c 1 < c2 < · · · < Ck < 1 . This is j ust the singular value decomposition: since W is unitary, all singular values of W1 1 are bounded by 1 . Then the Utransform in which U == diag (U1 1 , I) and V == diag ( V1 1 , I) reduces W to the form k Pk m
k c1 0 ?
Pk 0 I
m
? .
?.
? .
? .
(VIL4)
where the structures of the three blocks whose entries are indicated by ? are yet to be determined. Let W2 1 denote now the bottom left corner of the matrix (VIL4) . By the QR Decomposition Theorem we can find an m x m unitary matrix Q 22 such that p m 
p '
(VIL 5 )
where R is upper triangular with nonnegative diagonal entries. The U transform in which U == diag (I, Q22 ) and V == diag (I, I) leaves the top left corner of (VII . 4) unchanged and changes the bottom left corner to the form (VII. 5) . Assume that this transform has been carried out . Using the fact that the columns of a unitary matrix are orthonormal, one sees that the last f  k columns of R must be zero . Now examine the remaining columns, proceeding from left to right. Since C 1 is diagonal with nonnega tive diagonal entries, all of which are strictly smaller than 1 , one sees that R is also diagonal. Thus the matrix W is now reduced to the form fk 0 I
m
k Pk
k c1 0
k Pk mf
s1 0 0
0 0 0
? .
?. ? . ?
(VIL6)
?
in which 51 is diagonal with Cr + Sr == I , and hence 0 < S1 < I. The structures of the two blocks on the right are yet to be determined. Now , by
1 98
VII.
Pert urbat ion o f Spect ral S ubspaces o f Normal Matrices
Exercise VI I . 1 . 3 and the remark followi n g it , we can find e1n m x m unitary matrix V22 \v h i ch on right multiplication converts the top right block of (VII. 6 ) to the form f
f (L
mf 0 ),
(VII. 7)
in which L is lower triangular with nonpositive diago n al entries. The U transform in which U == diag ( I, I) and V == diag (I, V22 ) leaves the two left blocks in (VIL6) unchanged and converts the top right block to the form (VII . 7) . Again, orthonormality of the rows of a unitary matrix and a repetition of the argument in the preceding step show that after this Utransform the matrix W is reduced to the form
k e k k
Ek me
k c1 0 s1 0 0
fk 0
I
0 0 0
I I
I I I
k  81 0
fk 0 0
mf 0 0
X33 x43 X53
X34 X44 X54
X3 s X4s Xss
(VIL8)
Now, we determine the form of the bottom right corner. Since the rows of a unitary matrix are mutually orthogonal, we must have cl s l == slx33 · But C1 and 51 are diagonal and 8 is invertible. Hence, we must have 1 X33 == C1 . But then the blocks X34 , X3 5 , X43 , X53 must all be 0, since the matrix (VIL8) is unitary. So, this matrix has the form
Let X
=
k Ek
k c1 0
fk 0
I
k  81 0
fk 0 0
mf 0 0
k fk mf
s1 0 0
0 0 0
c1 0 0
0 x44 Xs 4
0 X45 Xs 5
(VI L9)
( i:: ;:: ) . Then X is a un tary matrix of size m  k. Let i
U == diag ( Jp_ , Ik , X ) , where lp_ and Ik are the identity operators of sizes f and k , respectively. Then, multiplying ( VII.9) on the left by U*  another
VI I. l Pairs of Subspaces
1 99
Utransform  we reduce i t to the form k
fk 0 I
k fk
s1
0
0 0
mf
0
0
k fk
fk 0 0
mf 0 0
c1 0
0 I
0 0
0
0
I
( VII. lO )
)
If we now put C ==
and S ==
k 1
(�
fk 0 I
k 1
fk 0 0
(�
)
k p_  k
)
k p_  k ,
•
then the matrix (VII. lO ) is in the desired form (VII.3) .
Let W be as in ( VII. 2) but with f > m . Show that there exist unitary matrices U = diag ( U11 , U22) and V diag ( V1 1 , V22 ) , where U1 1 , V1 1 are f f matrices, such that
Exercise VII. I . 7
==
X
nf
U* WV =
c 0
s
2£  n 0 I
0
nf S 0
c
where
nf 2£  n nf
C and S are nonnegative diagonal matrices with 0 < c1 < < Cne < 1 and 1 > s 1 > > s n  i > 0, C2 + S2 1. ·
·
·
·
·
·
(VIL l i )
diagonal entries respectively, and
==
The form of the matrices C and S in the above decompositions suggests an obvious interpretation in terms of angles. There exist (acute) angles oj , ; > e l > ()2 > . . . > 0, such that Cj == cos Oj and Sj = sin Oj . One of the major applications of the CS decomposition is the facility it provides for analysing the relative position of two subspaces of
en .
Let X 1 , Y1 be n f matrices with orthonormal columns. Then there exist f_ X f unitary matrices ul and vl and an n X n unitary matrix Q with the following properties.
Theorem VII. 1 . 8
x
200
VI I .
(i) If 2P <
Pert urbat ion of Spectral S ubspaces of Normal Tv1 atrices
n,
then p
Q X 1 U1 ==
I 0 0
f
P n  2P
f
c S 0
Q Y1 V1 =
(VIL 12)
p
P n  2P
(VII . 13)
where C, S are diagonal matrices with diagonal entries2 0 < c 1 < 1 and 1 > sl > . . . > S f_ > 0 , respectively, and C 2 + S == I {ii) If 2P > n, then
·
·
·
< ce <
.
nP I 0 0
2P  n 0
nP
c 0
s
I
(VIL 14)
0
nP 2£  n n£
2P  n 0 I 0
n£ 2£  n
(VIL 1 5)
nf
C, S are diagonal matrices with diagonal entries 0 < c 1 < 2 2 C n i < 1 and 1 > s 1 > · · · > S n i > 0 , respectively, and C + 5 = I .
where
·
·
·
<
Proof. Let 2P < n. Choose n x (n  f) matrices X2 and Y2 such that X == (X1 X2 ) and Y == ( Y1 Y2 ) are unitary. Let
By Theorem VII. l . 6 we can find block diagonal unitary matrices U == diag(Ul , U2 ) and v == diag(Vl , V2 ) , in which ul and vl are p X £ unitaries, such that C s 0
S c 0
0 0 I
Let Q == (XU) * == (X1 U1 X2 U2 ) * . Then from the first columns of the two sides of the above equation we obtain the equation ( VIL 13) . For this Q the equation (VII. 1 2) is also true.
VI I . l Pairs of S ubs paces
201
When 2£ > n , the assertion of the theorem follows, in the same manner, • fro m the decompos ition (VII. l l ) . This theorem can be interpreted as follows. Let £ and F be £dimensional subspaces of e n . Choose orthonormal bases XI ' . . . ' X.e and Yl ' . . . ' Y.e for these spaces. Let x l == (x l X 2 . . . Xt) , yl == ( Y l Y2 . . . Y.e ) . Premultiply ing X 1 , Y1 by Q corresponds to a unitary transformation of the whole space e n , while postmultiplying X 1 by U1 and Y1 by V1 corresponds to a change of bases within the spaces £ and F, respectively. Thus , the theorem says that, if 2£ < n , then there exists a unitary transformation Q of e n such that the columns of the matrices on the righthand sides in (VII. 12) and (VII. 13) form orthonormal bases for Q£ and Q:F, respectively. The span of those columns in the second matrix, for which s1 == 1 , is the orthogonal complement of Q£ in Q:F. When 2£ > n , the columns of the matrices on the righthand sides of (VII. 14) and (VII. 15) form orthonormal bases for Q£ and QF, respectively. The last 2£  n columns are orthonormal vectors in the intersection of these two spaces. The space spanned by those columns of the second matrix, in which s1 1 , is the orthogonal complement of Q£ in Q:F. The reader might find it helpful to see what the above theorem says when £ and F are lines or planes in JR3 . Using the notation above, we set ==
8(£, F) == arcsin S. This is called the angle operator between the subspaces £ and F. It is a diagonal matrix, and its diagonal entries are called the canonical angles between the subspaces £ and F . If the columns of a matrix X are orthonormal and span the subspace £, then the orthogonal projection onto £ is given by the matrix E == X X * . This fact is used repeatedly below.
Exercise VII.1.9  Let 𝓔 and 𝓕 be subspaces of ℂⁿ. Let X and Y be matrices with orthonormal columns that span 𝓔 and 𝓕, respectively. Let E, F be the orthogonal projections with ranges 𝓔, 𝓕. Then the nonzero singular values of EF are the same as the nonzero singular values of X*Y.

Exercise VII.1.10  Let 𝓔, 𝓕 be subspaces of ℂⁿ of the same dimension, and let E, F be the orthogonal projections with ranges 𝓔, 𝓕. Then the singular values of EF are the cosines of the canonical angles between 𝓔 and 𝓕, and the nonzero singular values of E^⊥F are the sines of the nonzero canonical angles between 𝓔 and 𝓕.

Exercise VII.1.11  Let 𝓔, 𝓕 and E, F be as above. Then the nonzero singular values of E − F are the nonzero singular values of E^⊥F, each counted twice; i.e., these are the numbers s₁, s₁, s₂, s₂, ....
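The content of Exercises VII.1.9–VII.1.11 is easy to check numerically: the singular values of X*Y (equivalently, the nonzero singular values of EF) are the cosines of the canonical angles, and the nonzero singular values of E^⊥F are their sines. A small sketch in Python with NumPy, for an illustrative pair of 2-dimensional subspaces of ℝ⁴ (the choice of bases here is ours, not the text's):

```python
import numpy as np

# Orthonormal bases for two 2-dimensional subspaces of R^4 (illustrative choice).
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]]) / np.sqrt(2)   # spans the subspace E
Y = np.array([[1., 0.], [0., 0.], [0., 1.], [0., 0.]])        # spans the subspace F

E = X @ X.T                  # orthogonal projection onto span(X)
F = Y @ Y.T                  # orthogonal projection onto span(Y)
P = np.eye(4) - E            # the projection E-perp

cosines = np.linalg.svd(X.T @ Y, compute_uv=False)        # cosines of canonical angles
angles = np.arccos(np.clip(cosines, 0.0, 1.0))

s_EF = np.linalg.svd(E @ F, compute_uv=False)             # should match the cosines
s_PF = np.linalg.svd(P @ F, compute_uv=False)             # should match the sines
```

Here both canonical angles come out equal to π/4, and the nonzero singular values of EF and E^⊥F are the corresponding cosines and sines.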
VII.  Perturbation of Spectral Subspaces of Normal Matrices
Note that, by Exercise VII.1.10, the angle operator Θ(𝓔, 𝓕) does not depend on the choice of any particular bases in 𝓔 and 𝓕. Further, Θ(𝓔, 𝓕) = Θ(𝓕, 𝓔). It is natural to define the distance between the spaces 𝓔 and 𝓕 as ‖E − F‖. In view of Exercise VII.1.11, this is also the number ‖E^⊥F‖. More generally, we might consider |||E^⊥F|||, for every unitarily invariant norm, to define a distance between the spaces 𝓔 and 𝓕. In this case,

    |||E − F||| = |||E^⊥F ⊕ EF^⊥|||.
We could use the numbers |||E^⊥F||| to measure the separation of 𝓔 and 𝓕, even when they have different dimensions. Even the principal angles can be defined in this case:
Exercise VII.1.12  Let x, y be any two vectors in ℂⁿ. The angle between x and y is defined to be the number ∠(x, y) in [0, π/2] such that

    ∠(x, y) = cos⁻¹ ( |y*x| / (‖x‖ ‖y‖) ).

Let 𝓔 and 𝓕 be subspaces of ℂⁿ, and let dim 𝓔 = ℓ ≥ dim 𝓕 = m. Define angles θ₁, ..., θ_m recursively as

    θ_k = max_{x ∈ 𝓔, x ⊥ [x₁,...,x_{k−1}]}   min_{y ∈ 𝓕, y ⊥ [y₁,...,y_{k−1}]}   ∠(x, y),

where x_k, y_k denote vectors at which this extremum is attained. Then π/2 ≥ θ₁ ≥ ··· ≥ θ_m ≥ 0. The numbers θ_k are called the principal angles between 𝓔 and 𝓕. Show that when dim 𝓔 = dim 𝓕, this coincides with the earlier definition of principal angles.
Exercise VII.1.13  Show that for any two orthogonal projections E, F we have ‖E − F‖ ≤ 1.
Proposition VII.1.14  Let E, F be two orthogonal projections such that ‖E − F‖ < 1. Then the ranges of E and F have the same dimension.

Proof.  Let 𝓔, 𝓕 be the ranges of E and F. Suppose dim 𝓔 > dim 𝓕. We will show that 𝓔 ∩ 𝓕^⊥ contains a nonzero vector x; since Ex = x and Fx = 0 for such a vector, this will show that ‖E − F‖ = 1. Let 𝓖 = E𝓕. Then 𝓖 ⊂ 𝓔, and dim 𝓖 ≤ dim 𝓕 < dim 𝓔. Hence 𝓔 ∩ 𝓖^⊥ contains a nonzero vector x. It is easy to see that 𝓔 ∩ 𝓖^⊥ ⊂ 𝓕^⊥. Hence x ∈ 𝓕^⊥.  ∎

In most situations in perturbation theory we will be interested in comparing two projections E, F such that ‖E − F‖ is small. The above proposition shows that in this case dim 𝓔 = dim 𝓕.
Example VII.1.15  Let

    X₁ = (1/√2) [ 1  0          Y₁ = [ 1  0
                  1  0                 0  0
                  0  1                 0  1
                  0  1 ] ,             0  0 ] .

The columns of X₁ and of Y₁ are orthonormal vectors. If we choose the unitary matrices

    U₁ = V₁ = (1/√2) [ 1   1
                       1  −1 ]

and

    Q = (1/2) [ 1   1   1   1
                1   1  −1  −1
                1  −1   1  −1
                1  −1  −1   1 ] ,

then we see that

    Q X₁ U₁ = [ 1  0          Q Y₁ V₁ = (1/√2) [ 1  0
                0  1                             0  1
                0  0                             1  0
                0  0 ] ,                         0  1 ] .

Thus in the space ℝ⁴ (or ℂ⁴), the canonical angles between the 2-dimensional subspaces spanned by the columns of X₁ and Y₁, respectively, are π/4, π/4.

VII.2  The Equation AX − XB = Y
We study in some detail the Sylvester equation

    AX − XB = Y.   (VII.16)

Here A is an operator on a Hilbert space 𝓗, B is an operator on a Hilbert space 𝓚, and X, Y are operators from 𝓚 into 𝓗. Most of the time we will be interested in the situation where 𝓚 = 𝓗 = ℂⁿ, and we will state and prove our results for this special case. The extension to the more general situation is straightforward. We are given A and B, and we ask the following questions about the above equation. When is there a unique solution X for every Y? What is the form of the solution? Can we estimate ‖X‖ in terms of ‖Y‖?
Theorem VII.2.1  Let A, B be operators with spectra σ(A) and σ(B), respectively. If σ(A) and σ(B) are disjoint, then the equation (VII.16) has a unique solution X for every Y.
Proof.  Let 𝒯 be the linear operator on the space of operators defined by 𝒯(X) = AX − XB. The conclusion of the theorem can be rephrased as: 𝒯 is invertible if σ(A) and σ(B) are disjoint. Let 𝒜(X) = AX and ℬ(X) = XB. Then 𝒯 = 𝒜 − ℬ, and 𝒜 and ℬ commute (regardless of whether A and B do). Hence σ(𝒯) ⊂ σ(𝒜) − σ(ℬ). If x is an eigenvector of A with eigenvalue α, then the matrix X, one of whose columns is x and the rest of whose columns are zero, is an eigenvector of 𝒜 with eigenvalue α. Thus the eigenvalues of 𝒜 are just the eigenvalues of A, each counted n times as often. So σ(𝒜) = σ(A). In the same way, σ(ℬ) = σ(B). Hence σ(𝒯) ⊂ σ(A) − σ(B). So, if σ(A) and σ(B) are disjoint, then 0 ∉ σ(𝒯). Thus 𝒯 is invertible.  ∎
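For matrices, the theorem can be exercised numerically with SciPy, whose `solve_sylvester` routine solves AX + XB = Q; to solve AX − XB = Y one passes −B. A sketch with illustrative matrices whose spectra are disjoint:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)

# sigma(A) near {3, 4, 5} and sigma(B) near {-1, 0}: disjoint spectra.
A = np.diag([3.0, 4.0, 5.0]) + 0.1 * rng.standard_normal((3, 3))
B = np.diag([-1.0, 0.0]) + 0.1 * rng.standard_normal((2, 2))
Y = rng.standard_normal((3, 2))

# SciPy solves AX + XB = Q, so rewrite AX - XB = Y as solve_sylvester(A, -B, Y).
X = solve_sylvester(A, -B, Y)
```

The residual AX − XB − Y vanishes to machine precision, confirming the unique solvability in this generic case.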
It is instructive to note that the scalar equation ax − xb = y has a unique solution x for every y if a − b ≠ 0. The condition 0 ∉ σ(A) − σ(B) can be interpreted as a generalisation of this to the matrix case. This analogy will be helpful in the discussion that follows.

Consider the scalar equation ax − xb = y. Exclude the trivial cases in which a = b and in which either a or b is zero. The solution of this equation can be written as

    x = a⁻¹ (1 − b/a)⁻¹ y.

If |b| < |a|, the middle factor on the right can be expanded as a convergent power series, and we can write

    x = a⁻¹ Σ_{n=0}^{∞} (b/a)ⁿ y = Σ_{n=0}^{∞} a^{−n−1} y bⁿ.

This is surely a complicated way of writing x = y/(a − b). However, it suggests, in the operator case, the form of the solution given in the theorem below. For the proof of the theorem we will need the spectral radius formula. This says that the spectral radius of any operator A is given by the formula

    spr(A) = lim_{n→∞} ‖Aⁿ‖^{1/n}.
Theorem VII.2.2  Let A, B be operators such that σ(B) ⊂ {z : |z| < ρ} and σ(A) ⊂ {z : |z| > ρ} for some ρ > 0. Then the solution of the equation AX − XB = Y is

    X = Σ_{n=0}^{∞} A^{−n−1} Y Bⁿ.   (VII.17)
Proof.  We will prove that the series converges; it is then easy to see that the X so defined is a solution of the equation. Choose ρ₁ < ρ < ρ₂ such that σ(B) is contained in the disk {z : |z| < ρ₁} and σ(A) is outside the disk {z : |z| < ρ₂}. Then σ(A⁻¹) is inside the disk {z : |z| < ρ₂⁻¹}. By the spectral radius formula, there exists a positive integer N such that for n ≥ N we have ‖Bⁿ‖ < ρ₁ⁿ and ‖A⁻ⁿ‖ < ρ₂⁻ⁿ. Hence, for n ≥ N,

    ‖A^{−n−1} Y Bⁿ‖ < (ρ₁/ρ₂)ⁿ ‖A⁻¹Y‖.

Thus the series in (VII.17) is convergent.  ∎
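The series (VII.17) is easy to test numerically: with σ(B) inside and σ(A) outside a common circle, a few dozen partial sums already solve the equation to machine precision. A sketch with illustrative matrices:

```python
import numpy as np

A = np.array([[3.0, 1.0], [0.0, 4.0]])    # eigenvalues 3, 4 (outside |z| = 1)
B = np.array([[0.5, 0.2], [0.0, -0.4]])   # eigenvalues 0.5, -0.4 (inside |z| = 1)
Y = np.array([[1.0, 2.0], [3.0, 4.0]])

# Partial sums of X = sum_{n >= 0} A^{-n-1} Y B^n.
Ainv = np.linalg.inv(A)
X = np.zeros((2, 2))
left, right = Ainv, np.eye(2)             # A^{-n-1} and B^n for n = 0
for _ in range(60):
    X = X + left @ Y @ right
    left, right = left @ Ainv, right @ B
```

The terms decay geometrically (roughly like (ρ₁/ρ₂)ⁿ, here about (1/6)ⁿ), so truncation after 60 terms is far below machine precision.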
Another solution of (VII.16) is obtained from the following considerations. If Re(b − a) < 0, the integral ∫₀^{∞} e^{t(b−a)} dt is convergent and has the value (a − b)⁻¹. Thus, in this case, the solution of the equation ax − xb = y can be expressed as

    x = ∫₀^{∞} e^{−ta} y e^{tb} dt.

This is the motivation for the following theorem.
Theorem VII.2.3  Let A and B be operators whose spectra are contained in the open right half-plane and the open left half-plane, respectively. Then the solution of the equation AX − XB = Y can be expressed as

    X = ∫₀^{∞} e^{−tA} Y e^{tB} dt.   (VII.18)
Proof.  It is easy to see that the hypotheses ensure that the integral given above is convergent. If X is the operator defined by this integral, then

    AX − XB = ∫₀^{∞} (A e^{−tA} Y e^{tB} − e^{−tA} Y e^{tB} B) dt = [−e^{−tA} Y e^{tB}]₀^{∞} = Y.

So X is indeed the solution of the equation.  ∎
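Formula (VII.18) can be checked by direct quadrature of the integrand, built from matrix exponentials. The matrices below are illustrative, and the trapezoid rule on a truncated interval is a deliberately crude but adequate integrator here, since the integrand decays exponentially:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0], [0.0, 3.0]])     # spectrum {2, 3}: right half-plane
B = np.array([[-1.0, 0.5], [0.0, -2.0]])   # spectrum {-1, -2}: left half-plane
Y = np.array([[1.0, 0.0], [2.0, -1.0]])

# Trapezoid rule for X = integral_0^inf e^{-tA} Y e^{tB} dt; the integrand
# decays like e^{-3t}, so truncating at t = 20 loses essentially nothing.
ts = np.linspace(0.0, 20.0, 20001)
vals = np.array([expm(-t * A) @ Y @ expm(t * B) for t in ts])
dt = ts[1] - ts[0]
X = (vals[0] + vals[-1]) / 2 * dt + vals[1:-1].sum(axis=0) * dt
```

The residual AX − XB − Y is then small, limited only by the quadrature step.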
Notice that in both the theorems above we made a special assumption about the way o ( A) and o ( B ) are separated. No such assumption is made in the theorem below. Once again, it is helpful to consider the scalar case first. Note that
(a 
1 () ( b  ( )
==
(
1
a
1 ( b(
)
1 b
a·
So, if r is any closed contour in the complex plane with winding numbers 1 around a and 0 around b, then by Cauchy's integral formula we have
27f i . 1 d( == b J ( a  () ( b  () a 
r
Thus the solution of the equation x ==
ax  x b == y can be expressed as
y 1 d(. 27fi J ( a  () ( b  ( ) r
VI I .
206
Perturbat ion of Spectral Subspaces o f Normal Matrices
The appropriate generalisation for operators is the following.
Let A and B be operators whose spectra are disjoint from each other. Let r be any closed contour in the complex plane with winding numbers 1 around o(A) and 0 around o(B ) . Then the solution of the equation AX  X B == Y can be expressed as Theorem Vll.2.4
( VII. 1 9) Proof. If A X  X B Y, then for every complex number ( , (A  () X  X ( B  () == Y . If A  ( and B  ( are invertible, this gives
X ( B  ( )  1  ( A  ( )  1 X == ( A  ( )  1 Y ( B  ( )  1 .
Integrate both sides over the given contour r and note that J ( B  ()  1 d( 0 and  J (A  ()  1 d( r
==
21ril. This proves the theorem.
r
==
•
Our principal interest is in the case when A and B in the equation (VII. 16) are both normal or, even more specially, Hermitian or unitary. In these cases more special forms of the solution can be obtained. Let A and B be both Hermitian. Then iA and iB are skewHermitian, and hence their spectra lie on the imaginary line. This is j ust the opposite of the situation that Theorem VII. 2.3 was addressed to. If we were to imitate 00 that solution, we would try out the integral J e  i tA y e itB dt . This, however, 0
does not converge . This can be remedied by inserting a convergence factor: a function f in L 1 (�) . If we set
X=
00
j 
oo
it it e  A ye B
f ( t ) dt ,
then this is a welldefined operator for each f E £ 1 (IR) , since for each t the exponentials occurring above are unitary operators. Of course, such an X need not be a solution of the equation (VII. 16) . Can a special choice of f make it so? Once again, it is instructive to first examine the scalar case. In this case, the above expression reduces to
,..
,.. x == yf( a  b ) ,
where f is the Fourier transform of f, defined as
}( s) =
CXJ
J

oo
e
 its
j ( t ) dt.
(VII.20)
V I I . 2 The Equat ion AX

XB
1 b , we do have .. So, if we choose an f such that j(a  b ) == The following theorem generalises this to operators. a
=
Y
207
ax  xb ==
y.
Let A , B be Hermitian operators whose spectra are dis joint from each other. Let f be any function in L 1 (IR) such that j ( s) == � whenever s E o(A)  o (B) . Then the solution of the equation AX  X B == Y can be expressed as CXJ e itA y e i t B f(t) dt. X (VII.2 1 ) Theorem VII.2 . 5
=
j
 CXJ
Proof. Let a and (3 be eigenvalues of A and B with eigenvectors u and i A is unitary and its adjoint is v , respectively. Then, using the fact that e t e  it A , we see that
( e i tA A u, y e i t B v ) e it ( /3 cx ) a ( u, Yv) .
A similar consideration shows that
( u , e  itA y eitB Bv) == e i t ( f3  cx ) f3 ( u , Y v ) .
Hence, if X is given by (VII.2 1 ) , we have
( u , ( A X  X B)v)
( a  ,B) ( u , Yv)
J e i t (f3<> ) f(t ) dt CXJ
(a  (J) (u , Yv) f (a  {3) (u , Yv) . ()()
"
Since eigenvectors of
AX  X B == Y.
A and B both span the whole space, this shows that
•
The two theorems below can be proved using the same argument as above. For a function f in L 1 ( IR2 ) we will use the notation j for its Fourier transform, defined as
]( s 1 , sz ) =
J J ei (t , s, +t2s2 ) f ( t1, tz ) dt 1 dtz. CXJ CXJ
 CXJ  ()()
Let A and B be normal operators whose spectra are disjoint from each other. Let A == A1 + i A2 , B == B1 + i B2 where A 1 and A2 are commuting Hermitian operators and so are B1 and B2 . Let f be any function in L1 (IR2 ) such that ] (s 1 , s2 ) == 1 � . whenever s 1 + is2 E s
Theorem VII.2.6
,
2S2
208
Perturbat ion o f S p ectral S ubspaces of Nor mal Matrices
VI I .
o(A)  o ( B )
expressed as
.
of
the equation
AX  X B
J j e  i (t l A l +t2 A2 ) yei (t 1 B 1 +t2 B2 ) f (t 1 , t 2 )dt 1 dt2 . CX)
X=
Then the solution
CX)
==
Y
can be
CX)
(VII. 22)
 CX)
Theorem VII.2. 7 
Let A and B be unitary operators whose spectra are disjoint from each other. Let { a n } CX) be any sequence in £ 1 such that CX)
CX)
L
n =  CXJ
in an e e ==
1
1  e2· e
whenever
Then the solution of the equation AX  X B == Y can be expressed as CX)
X == L a n A n  l yB n .
(VIL2 3)
n= CXJ
The different formulae obtained above lead to estimates for I I I X III when A and B are normal. These estimates involve I II Y I I I and the separation 8 between o(A) and o(B) , where
8 == d ist ( o(A) , a (B ) ) == min { I A  Jt I : A E o (A) ,
JL
E o(B) } .
The special case of the Frobenius norm is the simplest. Theorem Vll.2 . 8 Let A and B be normal matrices, and let 8 == dist (o(A) , o( B)) > 0 . Then the solution X of the equation AX  X B ==
Y
satisfies the inequality
(VII. 24) Proof. If A and B are both diagonal with diagonal entries A 1 , . , A n and f..L l , . . . , f..L n , respectively, then the entries of X and Y are related by the equation Xij == Yij / ( A i  Jtj )  From this (VII. 24) follows immediately. If A, B are any normal matrices, we can find unitary matrices U, V and diagonal matrices A ' , B' such that A == U A' U * and B == V B'V * . The equation AX  X B == Y can be rewritten as .
.
UA' U * X  X VB' V * = Y and then as
A' (U* X V )  ( U * X V ) B' == U* Y V.
So, we now have the same type of equation but with diagonal A ' , B ' . Hence,
I I U * X V [[2 <
� [ [ U* YV [[ 2 ·
By the unitary invariance of the Frobenius norm this is the same as the • inequality (VII.24) .
Examp le
V I I . 2 The Equation AX  X B
=
Y
209
If A or B is not normal, no inequality like ( 1llf. 24) is true in general. For example, if A == Y == I and B == ( � � ) , then the equation AX  X B == Y has the solution X == ( � � ) . Here 8 == 1 , I I Y I I 2 == )2, but I I X I J 2 can be made arbitrarily large by choosing t large. Thus we ca nnot even have a bound like I I X I I 2 < � I I Y I I 2 for any cons tant c in this case. Exa mpl e VII. 2 . 10 In this example all matrices involved are Hermitian: VII. 2 . 9
X=
Then
� IIY II 
A, B .
( Vf51
Vf5 3
),
Y=
6 ( 2Vf5
2Vf5 6
).
Here 8 == 2 . But, for the operator norm, I I X I I > Thu s, the inequ ality I I X I I < i I I Y I I need not hold even for Hermitian
AX

X B == Y .
In the next theorems we will see that we do have ! I I X I I I < � I l l Y I l l for a small constant c when A and B are normal. When the spectra of A and B are separated in a special way, we can choose c == 1 .
Let A and B be normal operators such that the spec trum of B is contained in a disk D( a, p ) and the spectrum of A lies outside a concentric disk D( a, p+8) . Then, the solution of the equation AX  X B == Y satisfies the inequality
Theorem Vll.2. 1 1
I ll X Il l
�
< i ll Y I ll
(VIL 25)
for every unitarily invariant norrn. Proof. Applying a translation , we can assume that a == 0. Then the solution X can be expressed as the infinite series (VII. I 7) . From this we get
I ll X I ll
00
<
<
n L I J A l Ji + l J II Y III I I B J i n n =O 00 II I Y I J I L (p + 8)  n  l Pn n =O
1 b iii Y I I I 
•
Either by taking a limit p � oo in the above argument or by using the form of the solution (VII. 18) , we can prove the following.
VI I .
210
Pert urbation o f Spectral Subspaces
o f Normal
Matrices
Let A and B be normal operators with ()(A ) and () ( B ) lying in halfplanes separated by strip of width 8 . Then the solution of the equation AX  XB == Y satisfies the inequality {VI/. 25). Exercise VII.2 . 13 Here is an alternate proof of Theorem VI/. 2. 1 1 . As sume, without loss of generality, that a == 0. Then A is invertible. Write X == A  1 (Y + X B) and obtain the inequality (VI/. 25} directly from this . Exercise VII. 2. 14 Choose unit vectors u and v such that X v == I I X I ! u and X * u == I I X I ! v ; i. e., u and v are left and right singular vectors of X corresponding to its largest singular value. Then ( u, ( AX X B)v) == I I X II ( (u, A u)  (v , Bv) ) . Use this to prove Theorem VI/. 2. 1 1 in the special case of the operator norm. Theorem VII.2. 1 5 Let A and B be Hermitian operators with d ist (()(A) , ()(B) ) == 8 > 0 . Then, the solution of the equation A X  X B == Y satisfies the inequality Theorem VII.2 . 12
a

III X III <
� 11 1¥111
(VII.2 6)
for every unitarily invariant norm, where c 1 is a positive real number de fined as c 1 == i nf{ l l f i i Ll f :
E
1
L ( IR) , f ( s ) "
=
1
when lsi > s

1}.
(VII. 27)
Proof. Let f6 be any function in £ 1 ( R) such that Jo ( s ) == ! whenever l s i > 8. By Theorem VII.2.5 we have
j
CXJ
X=
ei
tA y itB e fo ( t )dt .
 oo
Hence ,
I l l X III < 111¥111
00
j l fo (t ) ldt = � 111¥111 j l f (t) l dt, CXJ
 oo

oo
! whenever l si > 1. Any where f ( t ) == fo ( t /8) . Note that ](s ) this property satisfies the above inequality. ==
f with •
Exactly the same argument , using Theorem VII.2.6 now leads to the following. Theorem VII.2 . 16 Let dist (()(A) , ()(B) ) == 8 > 0 . Y satisfies the inequality
A and B be normal operators with Then the solution of the equation A X  XB ==
I ll X III <
� 111¥111
(VII.28 )
VII.3
Pertur bation of Eigenspaces
211
for every unitarily invariant norm, where (VII.29) The exact evaluation of the constants c 1 and c2 is an intricate problem in Fourier analysis. This is discussed in the Appendix at the end of this chapter. It is known that and 1r
C2 < 2 Further, with this value of VII . 3
c1 ,
j sin t dt < 2. 9 1 . 7r
0
t
the inequality (VII.26) is sharp.
Perturbation of Eigenspaces
Given a normal operator A and a subset S of C, we will write PA ( S ) for the orthogonal projection onto the subspace spanned by the eigenvectors of A corresponding to those of its eigenvalues that lie in S. If sl and s2 are two disjoint sets, and if E == PA ( SI ) and F == PA ( S2 ) , then E and F are mutually orthogonal. If A and B are two normal opera tors, and if E == PA ( S1 ) and F == Ps ( S2 ) , then we might expect that if B is close to A and S1 and S2 are far apart, then E is nearly orthogonal to F. This is made precise in the theorems below.
Let A , B be normal operators. Let S1 and S2 be two subsets of the complex plane that are separated by either an annulus of width 8 or a strip of width 8. Let E PA ( SI ) , F Ps ( S2 )  Then, for every unitarily invariant norm,
Theorem VII.3 . 1
==
==
III EF III < � III E (A  B )F III < Proof. Since E commutes with A and (VII.30) can be written as
�
� I l i A  B i ll ·
F with B,
III EF III < III AE F  E FB ll l 
(VII.30)
the first inequality in
Now let EF == X . This is an operator from the space ran F to the space ran E. Restricted to these spaces, the operators B and A have their spec tra inside S2 and S1 , respectively. Thus the above inequality follows from Theorem VII. 2 . 1 1 when S1 and S2 are separated by an annulus, and from Theorem VII.2. 12 when they are separated by a strip.
212
VI I .
Pert urb ation o f S p ect ral S ubspaces o f Normal M atrices
The second inequality in
( VII.30)
is true because I I E I I == I I F I I == 1 .
•
The special case of this theorem when A and B are Hernlitian , S1 is an interval [a, b] and S2 the complement in IR of the interval (a  8 , b + 8 ) , is known as the DavisKahan sin8 Theorem. ( We saw in Section 1 that l i EF I I is the sine of the angle between ran E and ran F ..i . ) With no special assumption on the way S1 and S2 are separated, we can derive the following two theorems from Theorems VII. 2 . 1 5 and VII. 2 . 1 6 by the argument used above.
Let A and B be Hermitian operators, and let S1 , S2 be any two subsets of JR. such that dist (S1 , S2 ) == 8 > 0 . Let E == PA ( S1 ) , F == Ps ( S2 ) . Then, for every unitarily invariant norm, c E (A  B) AB (VII.31)
Theorem VII. 3 . 2
III EF III <
; III
F II I <
� lil
l ll ,
where c1 is the constant defined by (VII. 27). {We know that c 1 == ; . ) Theorem VII. 3 . 3 Let A and B be normal operators, and let S1 , S2 be any two subsets of the complex plane such that dist ( S1 , S2 ) == 8 > 0 . Let E == p ( sl ) ' F == Ps ( s2 ) . Then, for every unitarily invariant norm, A
III EF III <
� III E (A  B) F III < � lil A  B ill ,
where c2 is the constant defined by (VII. 29) . (We know that c2
(VII.32) < 2. 9 1 . )
Finally, note that for the Frobenius norm alone, we have a stronger result as a consequence of Theorem VII. 2.8.
Let A and B be normal operators and let S1 , S2 be any two subsets of the complex plane such that dist ( S1 , S2 ) == 8 > 0. Let E == PA (S1 ) , F == Ps ( S2 )  Then
Theorem Vll. 3 . 4
1 1 I I E F I I 2 < 8 1 1 E ( A  B) F I I 2 < 8 1 J A  B l l 2 · VII . 4
A Perturbation Bound for Eigenvalues
An important corollary of Theorem VII.3.3 is the following bound for the distance between the eigenvalues of two normal matrices.
There exists a constant 1 < c < 3, such that the optimal matching distance d(a (A) , a (B) ) between the eigenvalues of any two normal matrices A and B is bounded as
Theorem VII.4. 1
c,
d ( a (A) , a (B) ) < e l l A  B ll 
(VII.33)
VI L 5 Perturbation of the Polar Fac tors
213
Proof. � will show that the inequality (VII . 33 ) is true if c == c 2 , the constant in the inequality (VII.32 ) . Let rJ == c2 l 1 A  E ll and suppose d ( a ( A ) , a ( B )) > 'TJ · Then we can find a 8 > rJ such that d ( a ( A) , a ( B )) > 8. By the Marriage Theorem, this is possible if and only if there exists a set 51 consisting of k eigenvalues of A , 1 < k < n , such that the 8neighbourhood { z : dist (z, 51 ) < 8} contains less than k eigenvalues of B. Let S2 be the set of all eigenvalues of B outside this neighbourhood. Then dist (S 1 , S2 ) > 8. Let E == PA (S1 ) , F == Ps ( S2 ) · Then the dimension of the range of E is k, and that of the range of F is at least n  k + 1 . Hence I I E F I I == 1. On the other hand, the inequality (VII.32) implies that I I EF I I <
� II A  E ll = � < 1 .
This is a contradiction. S o the inequality (VII.33) is valid if we choose c == c2 ( < 2 . 9 1 ) . Example VI. 3. 13 shows that any constant c for which the inequality • (VII.33) is valid for all normal matrices A , B must be larger than 1 . 0 18. We should remark that, for Hermitian matrices, this reasoning using Theorem VII.3.2 will give the inequality d ( a ( A ) , a ( B )) < ; I I A BJI . However, in this case, we have the stronger inequality d ( a ( A ) , a ( B )) < I I A  B I I  So, this may not be the best method of deriving spectral variation bounds. However, for normal matrices, nothing more effective has been found yet. 
VII . 5
Perturbation of the Polar Factors
Let A == UP be the polar decomposition of A. The positive part P in this decomposition is P == ! A I == (A* A) 1 1 2 and is always unique. The unitary part U is unique if A is invertible. Then U == A P 1 . It is of interest to know how a change in A affects its polar factors U and P. Some results on this are proved below. Let A and B be invertible operators with polar decompositions A == UP and B == VQ, respectively, where U and V are unitary, and P and Q are positive. Then, 
l i l A  B ill == III Up  VQ III ==
III P  U * V Q II I
for every unitarily invariant norm. By symmetry,
lil A  B ill == III Q  V* u P ill · Let Y == P  U* VQ , Z = Q  V * UP.
VI I.
214
Perturbation of Spectral Subspaces of Normal Mat rices
Then
Y + Z*
==
P(I  U * V) +
( /  U* V ) Q .
(VII.34)
This equation is of the form studied in Section VII.2. Note that () ( P) is a subset of the real line bounded below by s n (A) I I A  1 11  1 and O"(Q) is a subset of the real line bounded below by s n (B) == I I B 1 I I  1 . Hence, dist (a(P), (j(Q) ) == s n (A) + sn (B) . Hence, by Theorem VII. 2. 1 1 , ==
Ill I  U * V lll
<
s n (A)
� sn (B) lil Y + Z * Ill 
Since III Y III == III Z III == li l A  B ill and I ll !  U * V III == I I I U  V III , this gives the following theorem.
Let A and B be invertible operators, and let the unitary factors in their polar decompositions. Then
Theorem VII. 5 . 1
U, V
be
(VII.35)
for every unitarily invariant norm I ll I l l Exercise Vll. 5 . 2 Find matrices A , B for which (V/!.35} is an equality. Exercise VII. 5.3 Let A , B be invertible operators. Show that ·
II I I A I  I B I I II
where
m
< (1
+
== min( II A I I , II B J I ) .
II A  l ii  1
:1 1 B  l l !  l )
2
Ili A  B i ll ,
(VII.36)
For the Frobenius norm alone, a simpler inequality can be obtained as shown below. Lemma VII.5.4
the inequality
Let f be a Lipschitz continuous function on C satisfying
Then, for all matrices X
< klz  wj,
for all z , w E C. and all normal matrices A, we have
l f( z )  f ( w) l
l l f (A) X  X f (A) ll 2
<
k l l AX  X A I I 2 ·
Proof. Assume, without loss of generality, that A == diag ( A 1 , . . . , An ) Then, if X is any matrix with entries X ij , we have l l f ( A ) X  X f ( A) II �
't
,J
z ,J
•
VI I . 5 Perturbation of the Po lar Factors
Lemma VII. 5 . 5 VII. 5.4. Let A , B
215
Let f be functi on satisfying the conditions of Lemma be any two normal matrices. Then, for every matrix X , a.
I I J ( A ) X  X f ( B) \ 1 2
Proof. Let T == ( � � ) , Y == ( g by T and Y, respectively. Corollary VII. 5.6
< k l l AX  XB \ 1 2 ·
� ) . Replace A and X in Lemma VII. 5.4 •
If A and B are normal matrices, then (VII.37)
Theorem VII. 5 . 7
Let A , B be any two matrices. Then
1 \ I A I  I B \ 1 1 � + 1 1 \ A* I  I B* I I\ �
< 2 \\ A  E l l � 
(VII. 38)
Proof. Let T == ( J. � ) , S == ( ;. � ) . Then T and S are Hermitian. Note • that I T I == ( I �* I 1 � 1 ) . So, the inequality (VII.38 ) follows from (VII.37) . It follows from (VII . 38) that (VII.39 ) The next example shows that the Lipschitz constant V2 in the above in equality cannot be replaced by a smaller number. Example VII. 5.8
Let A=
Then l A \ == A and As
E
� 0,
( � � ), B= ( � � )·
I B I ==
1
v1 + £ 2
(
1 £
£ 2 E
1 B I Il2 approaches v2. the ratio II IAJ II A  B JI2
).
We will continue the study of perturbation of the function I A I in later chapters. A useful consequence of Theorem VII.5. 7 is the following p erturbation bound for singular vectors. Theorem VII. 5.9 Let S1 , S2 be that dist(S1 , 52 ) == 8 > 0 . Let A
two subsets of the positive half line such and B be any two matrices. Let E and E' be the orthogonal projections onto the subspaces spanned by the right and the left singular vectors of A corresponding to its singular values in 81 . Let F and F' be the projections associated with B in the same way, corresponding to its singular values in S2 • Then (VII.40 )
216
Proof.
\TI L
Perturbation of Spectral S ubspaces of Normal Matrices
By Theorem VII.3.4 we have < <
1
8 1 1 I A I  I B I II 2 ,
� II [ A * [  I B * I I I 2 ·
These inequalities, together with (VII. 38) , lead to the inequality (VII.40) .
VII. 6
Appendix : Evaluating t he const ants
•
( Fourier )
The analysis in Section VII. 2 has led to some extremal problems in Fourier analysis. Here we indicate how the constants c 1 and c2 defined by (VII. 27) and (VII.29) may be evaluated. The symbol 1 1 ! 1 1 1 will denote the norm in the space £ 1 for functions defined on JR. or on IR. 2 . We are required to find a function f in L 1 , with minimal norm, such that ] (s) == ! when lsi > 1 . Since j must be continuous, we might begin by taking a continuous function that coincides with ! for l s i > 1 and then taking f to be its inverse Fourier transform. The difficulty is that the function 1. is not in £ 1 , and hence its inverse Fourier transform may not be defined. Note, however, that the function ! is square integrable at oo. So it is the Fourier transform of an L 2 function. We will show that under suitable conditions its inverse Fourier transform is in L 1 , and find one that has the least norm. Since the function 1 is an odd function, it would seem economical to extend it inside the domain (  1 , 1 ) , so that the extended function is an odd function on JR. This is indeed so. Let f E L 1 ( IR ) and suppose ] (s ) == ! when l si > 1 . Let !odd be the odd part of J , !o dd ( t) == f ( t ) J ( t ) Then f:dd (s) == ! when l s i > 1 and ll !odd ii i < ll f ll 1 · Thus the cons tant c 1 is also the infimum of II f II 1 over all odd functions in L 1 for which J ( s) == ! when lsi > 1. Now note that if j is odd, then s
s

00
] (s )
J j (t) e itsdt
 oo
i
=
i

.
00
J J (t ) sin ts dt
 oo
J Re j(t) sin ts dt + J Im j(t) sin ts dt . 00

oo
00

00
If this is to be equal to � when l s i > 1 , the Fourier transform of Ref should have its support in (  1 , 1 ) . Thus, it is enough to consider purely imaginary
VII . 6 Appendix : Evaluating the
( Fo urier )
const ant s
217
functions f in our extremal problem. Equivalently, let C be the class of all odd real functions in L 1 (IR) such that j ( s ) == 215 when I s I > 1 . Then c1
== inf{ ll f lh :
f E C}.
(VIL 4 1 ) 2 rr
Now, let g be any bounded fun12tion with period 21r having a Fourier
L#O an ei nt. Th s last condition means that J0 g ( t)dt == 0 . n
series expansi on
i
Then, for any f in
C,
00
J
_
f ( t )g (x  t) d t ==
00
L
n#O
�n einx _
(VII .42)
'l n
Note that this expression does not depend on the choice of f . For a real number x , let sgnx be defined as  1 i f x is negative and 1 if x is nonnegative. Let fo be an element o f C such that sgn fo ( t) == sgn sin t .
(VII.43)
Note that the functio n sgn fo then satisfies the requirements made on the preceding paragraph . Hence, we have
j l fo (t ) l dt 00
j fo ( t ) sgn fo (t ) dt =  j fo ( t ) sg 00
in
00
 oo
 oo
g
n
fo ( t )d t
 oo
 j f (t ) sgn fo ( t )dt < j l f (t ) i dt 00
00
 oo
 oo
for every f E C. ( Use (VIL42) with x == 0 and see the remark following it. ) Thus c 1 == li fo 11 1 , where fo is any function in C satisfying (VII.43) . We will now exhibit such a function. We have remarked earlier that it is natural to obtain fo as the inverse Fourier transform of a continuous odd function cp such that cp ( s) == ! for l s i > 1 . First we must find a good sufficient condition on cp so that its inverse Fourier tr an sfo r m cj! (t) =
� 2
j cp ( s) sin ts ds (..'()

oo
( which, by definition, is in L2 ) is in L1 . Suppose cp is differentiable on
the whole real line and its derivative
integration
by
parts shows
cp'
is of bounded variation. Then, an
j cp ( s) sin ts ds = � j cos ts cp'(s)ds. 00
00

CXJ
218
VI I .
Pert urbation o f Spectral S ubspaces o f No rmal Matrices
Another integration by parts, this time for RiemannStieltjes integrals, shows that 00 CXJ
J tp ( s ) sin ts ds = t� J sin ts dtp' (s ) .

CXJ
CXJ
As t + oo this decays as t� . So <j; ( t ) is integrable at oo . Since
cpo ( s ) ==

for l si > 1 for 0 < j s j < 1 for s == 0.
1/s
1  7r2 cot 2 s 0 7f
s
(V I I . 44 )
From the familiar series expansion
CXJ 2z  cot  z ==  + L 2 2 z n = l z 2  4n 2 1
7f
7f
( see L.V. Ahlfors, Complex Analysis, 2nd ed. , p. 188) one sees that 00
cp a ( s ) == L
n= l
2s for 0 < s < 1 . 4n 2  s 2
This shows that cp 0 is a convex function in 0 < s < 1, and hence cp � is of bounded variation in this domain. On the rest of the positive halfline too cp� is of bounded variation. So cp 0 does meet the conditions that are sufficient to ensure that fo == 0,
2 fo ( t) 2fo ( t )  2 fo ( t + ) 1r
fo ( t )  fo ( t + 2 1r )
1
1
j cot ; 0
sin t
t +
fo ( t )
sin ts ds,
sin ( t + 1r )
Ut  t �
t + 7r 1r
The quantity inside the brackets is positive t � oo , we can write 00
s
,
+ 2( t
for
all
� 21r) ] sin t. t.
Since f0 (t )
L [ !o (t + 2 n7r)  fo (t + ( 2 n + l)1r)] sin t h ( t ) sin t ,
n =O
+
0
as
VI I . 6 Append ix: Evaluating t he ( Fo urier) const ants
219
whe re h(t) > 0. This shows that fo satisfi es the condition (VII.43) . Finally, II fo 1! 1 can be evaluated from the available data. We can easily see that 4 1 1  (sin t +  sin 3t +  sin 5t + n 5 3
sgn sin t
2.
nz
L n odd
Hence , using (VII.42 ) we obtain 00
j l fo (t ) l dt

()()
2:_ n
·
e int .
00
!  oo
·
·
)
fo (t) sgn fo ( t)dt 1
n
2 � nL n odd
2
n
We have shown that c 1 == ; . This result , and its proof given above, are due to B . Sz . Nagy. The twovariable problem, through which c2 is defined, is more com plicated. The exact value of c2 is not known. We will show that c2 is fi nite by showing that there does exist a function f in L1 (JR 2 ) such that j (s 1 , s 2 ) == 81_; i s 2 when sf + s� > 1. We will then sketch an argument that leads to an estimate of c2 , skipping the technical details. It is convenient to identify a point ( x , y) in JR? with the complex variable z = x + i y. The differential operator fz = ; anni hilates every +i complex holomorphic function. It is a well known fact (see, e.g. , W. Rudin, Functional Analysis, p. 205) that the Fourier transform of the tempered distribution 1z is  2 zrri . (The normalisations we have chosen are different from those of Rudin.) Let 'P be a c oo function on JR2 that vanishes in a neighbourhood of the origin, and is 1 outside another neighbourhood of the origin. Let '¢( z) == z 1 r,o( ) . We will show that the inverse Fourier transform if; is in L . Note that
( %x
%Y )
z
This
'TJ
(
z
) .· ==
!!._,,,( ) == � d'fJ(z) z z dz . dz o/
is a c oo function with compact support. Hence , 'fJ is in the Schwartz space S. Let i] E S be its inverse Fourier transform. Then (ignoring constant factors) 1/;(z) == i] (z )/ z . Since i] is integrable at oo , so is 7/J. At the origin, ! is integrable and fJ( z) bounded. Hence if; is integrable at the origin. This shows that c2 < oo . Consider the tempered distribution fo (z) == 2� . We know that f� ( �) == 1 1 � . However, f0 ¢ L . To fix it up we seek an element p in the space of tempered d i st r i but io ns S' such that 7r 7. Z
VI I .
220
E
(i) p
Pert urbation o f S p ectral S ubspaces o f Normal Matrices
£ 1 and supp p is contained in the unit disk D , L1 .
(ii) if f == fo + p , t he n f is in
Note that c2 == inf l l f lh over su ch
Writing
z ==
rei0 , one sees that
p.
(X)
=
7f
l fl l 1 2� j rdr j I�  iei0 21rp (z ) j dO.  7!"
0
Let
=
F(r) Then
7f
J
 7!"
7f
j iei6p ( z ) de .
(VII.45 )
1
1
1   ie1· o 21rp( z ) j dB > 2 7r j   F(r) l , r r
and there is equality here i f ei0p( r ei0 ) is independent of e . Hence, we can restrict attention to only those p that satisfy the additional condition
( iii)
zp z )
(
is a radial function.
Putting
G( r)
we see that c2
=
==
inf
1  rF(r) ,
(VII.46)
J IG ( r)ldr,
(VIL47)
(X)
0
where is defined via (VII.45) and (VII.46) for all p that satis fy the conditions (i) , (ii) , and ( iii) above. The twovariable minimisation p roblem is thus reduced to a onevariable problem. Using the conditions on p, one can characterise the functions that enter here. This involves a little m ore intricate analysis, which we will skip. The con c l us ion is that the functions G that enter in (VII.4 7 ) are all £ 1 fun ct i ons of the form G == g , where g is a continuous even function
G
G
supported in [ 1 , 1] such that
c2
=
inf {
(X)
j l.§ ( t ) l dt : 0
g
J g(t) dt = 1. In other words, 1
1
even, supp g
=
g be the function g(t)
If we choose to This gives the estimate c2
<
1r .
[ 1 , 1 ] , ==
1
j
g =
1 , g E £1 } . (VI I . 4 8)
 l t l, then g(t )
==
sin2 ( ; ) ( � ) 2 .
221
VI I . 7 P roblen1s
A better estimate is obtained from the function g (t)
7r
7r == 4
for ! t l < 1 .
cos 2 t
Then g(t) ==
7r
2
A little computation shows that
4 7r 2  t 2
cos t
.
sin dt < 2 . 9090 1 . J i§(t ) idt ; J CXJ
7r
=
0
t
t
0
Thus c2 < 2.9 1. The interested reader may find the details in the paper A n extremal problem in Fourier analysis with applications to operator theory, by R. Bhatia, C. Davis , and P. Koosis, J. Functional Analysis, 82 ( 1989) 138 1 50.
VII . 7
P roblems
Problem VII. 6 . 1 . Let £ be any subspace of e n . For any vector
8(x, £) Then 8(x, £) is equal to
==
yE£
min l l x
II ( /  E) x l i ·
X
let
 Yll 
If £, F are two subspaces of e n ' let
p (£, F) == max{ max 8 ( x , F) , max 8 ( y , F) } . xEE
II X II = 1
Let dim £ Show that
==
where E and
dim F, and let F
8
y E :F
II y II = 1
be the angle operator between £ and F.
p (£, :F ) == II sin 8 11 == l i E  F ll , are the orthogonal projections onto the spaces £ and F.
Problem VII.6.2. Let A, B be operators whose spectra are disjoint from each other. Show that the operator ( � � ) is similar to ( � � ) for every C. Problem VII.6. 3. Let A, B be operators whose spectra are disjoint from each other. Show that if C commutes with A + B and with AB , t hen C commutes with both A and B. Problem VII. 6.4. The equation AX + XA * == I is called the Lyapunov equation. Show that if o(A) is contained in the open left halfplane, then the Lyapunov equation has a unique solution X , and this solution is positive and invertible. 
222
VI I .
Perturbation o f Spectral S ubspaces of Normal Mat rices
Problem VII. 6 . 5 . Let A and B be any two matrices. Suppose that all singular values of A are at a distance greater than 8 from any singular value of B . Show that for every X ,
Problem VII. 6 . 6 . Let A , B be normal operators. Let S1 and S2 be two subsets of the complex plane separated by a strip of width 8. Let E PA ( S1 ) , F == Ps (S2 ) . Suppose E (A  B) E == 0. If T ( t ) is the function T(t) == t/ vl  t2 , show that ==
I I T ( I EF ! ) I I <
� I I E (A  B) II 
Prove that this inequality is also valid for all unitarily invariant norms. This is called the tanG theorem . Problem VII.6. 7. Show that the inequality (VII.28) cannot be true if c2 < ; . ( Hint: Choose the trace norm and find suitable unitary matrices A , B. ) Problem VII. 6 . 8 . Show that the conclusion of Theorem VII.2.8 cannot be true for any Schatten pnorm if p =/= 2. Problem VII.6.9. Let A , B be unitary matrices, and let dist (a (A) , a( B) ) == b == J2. If some eigenvalue of A is at distance greater than J2 from a ( B) , then a( A) and a(B) can be separated by a strip of width J2. In this case, the solution of AX  X B == Y can be obtained from Theorem VIL2.3. Assume that all points of a ( A) are at distance J2 from all points of O"(B) . Show that the solution one obtains using Theorem VII.2. 7 in this case is If O" ( A) == { 1 ,  1 } and O" ( B) == { i  i } , this reduces to ,
X == 1 /2 (AY  YB) . Problem VII.6. 1 0 . A reformulation of the Sylvester equation in terms of tensor products is outlined below. Let
cp ( I ® A)
AEij , EijAr ,
VI I. 8 Notes
and
References
223
where AT is the transpose of A. Thus the multiplication operator A ( X ) == AX on .C ( 7t ) can be identified with the operator A ® I on 1t ® 1t * , and the operator B(X ) == X B can be identified with I ® B T . The operator T == A  B then corresponds to
A ® !  I ® B T.
Use this to give another proof of Theorem VII. 2 . 1 . Sometimes it is more convenient to identify .C ( 7t ) with 1t ® 1t instead of 1l ® 7t * . In this case, we have a bijection
VII . 8
Notes and References
Angles between two subspaces of IRn were studied in detail by C. Jordan, Essai sur la geometrie a n dimensions, Bull. Soc. Math. France, 3 ( 1875) 103 1 74. These ideas were reinvented, developed, and used by several math ematicians and statisticians. The detailed analysis of these in connection with perturbation of eigenvectors was taken up by C. Davis, Separation of two linear subspaces, Acta Sci. Math. (Szeged) , 19 ( 1958) 172 1 87. This work was continued by him in The rotation of eigenvectors by a pertur bation, J. Math. Anal. Appl. , 6 ( 1963) 159 173, and eventually led to the famous paper by C. Davis and W.M. Kahan, The rotation of eigenvectors by a perturbation III, SIAM J. Numer. Anal. 7( 1 970) 146. The CS decomposi tion in its explicit form as given in Theorem VII. 1 . 6 is due to G.W. Stewart,
On the perturbation of pseudoinverses, projections, and linear least squares problems, SIAM Rev. , 1 9 ( 1 977) 634662. It occurs implicitly in the paper
by Davis and Kahan cited above as well as in another important paper, A. Bjorck and G H. Golub, Numerical methods for computing angles be tween linear subspaces, Math. Comp. 27( 1973) 579594. Further references may be found in an excellent survey article, C . C. Paige and M. Wei, His tory and generality of the CS Decomposition. Linear Algebra Appl. , 208/209 ( 1994) 303326. Many of the proofs in Section VII. 1 are slight modifications of those given by Stewart and Sun in Matrix Perturbation Theory. The importance of the Sylvester equation AX  X B == Y in connec tion with perturbation of eigenvectors was recognised and emphasized by Davis and Kahan, and in the subsequent paper, G.W. Stewart, Error and .
perturbation bounds for subspaces associated with certain eigenvalue prob lems , SIAM Rev. , 15 ( 1973) 727 764. Almost all results we have derived for
this equation are true also in infinitedimensional Hilbert spaces. More on this equation and its applications, and an extensive bibliography, may be found in R. B hatia and P. Rosenthal, How and why to solve the equation AX  X B == Y , BulL London Math. Soc. , 29( 1997) to appear. Theorem VII. 2 . 1 was proved by J.H. Sylvester, Sur f 'equation
224
VI I .
Pert urbation of Sp ectral S ubspaces o f Normal M atrices
C . R. Acad. Sci . P ari s 99( 1884) 6771 and 1 15 1 16 . The proof given here i s due to G . Lumer and M . Rosenblum, Linear op erator equations, Proc. Amer. Math. Soc. , 10 ( 1959) 3241 . The solution (VII. 1 8) is due to E . Heinz , Beitriige zur Storungstheorie der Spektralzer legung, Math. Ann. , 123 ( 19 5 1 ) 415438. The general solution (VII. l9) is due to M. Rosenblum, On the operator equation B X  XA = Q, Duke Math. J . , 23 ( 1 956) 263270. Much of t h e rest of Section VII.2 is based on the paper by R. Bhatia, C. Davis and A. Mc intosh , Perturbation of spectral subspaces and solution of linear operator equations, Linear Algebra Appl. , 52/53 ( 1983) 4567. For Hermitian operators, Theorem VI L 3. 1 was proved in the Davis Kahan paper cited above. This is very well known among numerical an alysts as the sinO theorem and has been used frequently by them. This paper also contains a tane theorem ( see Problem VII.6.6 ) as well as sin28 and tan2B theoren1s , all for Hermitian operators. Theorem VII.3.4 (for Her mitian operators ) is also proved there. The rest of Section VIL3 is based on the paper by Bhatia, Davis, and Mcintosh cited above, as is Section VII.4. Theorem VII. 5 . 1 is due to R.C.Li, New perturbation bounds for the uni tary polar factor, SIAM J . Matrix Anal. Appl. , 16 ( 1995) 327332. Lemmas VII.5.4, VII.5.5 and Corollary VII. 5.6 were proved in F. Kittaneh, On Lip schitz functions of norrnal operators, Proc. Amer. Math. Soc. , 94 ( 1985) 416418. Theorem VII. 5 . 7 is also due to F. Kittaneh , Inequalities for the Schatten pnorrn IV, Commun. Math. Phys . , 106 ( 1 986) 58 1585. The in equality (VII. 39) was proved earlier by H. Araki and S. Yamagami, An inequality for the Hilbert Schmidt norm, Commun. Math. Phys. , 81 ( 1 981) 8998. Theorem VII. 5.9 was proved in R. Bhatia and F. Kittaneh, On some perturbation inequalities for operators, Linear Algebra Appl. , 106 ( 1988) 271279. See also P.A. 
Wedin , Perturbation bounds in connection with sin gular value decomposition, BIT, 12( 1972) 99 1 1 1 . The beautiful result c 1 == ; , and its proof in the Appendix, are taken from B . Sz.Nagy, Uber die Ungleichung von H. Bohr, Math. Nachr. , 9 ( 1 953) 255259. Problems such as this, are called minimal extrapolation pro blems for the Fourier transform. See H.S. Shapiro, Topics in Approximation The ory, Sprin ge r Lecture Notes in Mathematics Vol. 187 ( 1971 ) . In Theorem VIL2.5 , all that is required of f is that ](s) = ; when 2 points if we are s E CT( A )  O"(B) . This fixes the value of j only at n dealing with n x n matrices. For each n , let b ( n ) be the smallest constant, for which we have I [ X [[ < I [ A X  X B II , en m atrice s px
== x q ,
b�)
whenever A, B are n x n Hermitian matrices such that dist (o(A) , 8. R. McEachin has shown that b(2)
v'6 2
�
1 . 22474 ( see Example VII.2. 10)
a
( B)) ==
b( 3) and that
8 + s VIO
18
VI I . 8 Notes and References �
1 . 32285
b == nlim b(n ) + oo
22 5
==
Thus the inequality (VII.26) is sharp with
7r
 .
2
c1
==
; . See, R. McEachin ,
A sharp estimate in an operator inequality, Pro c. A mer. l\1ath. Soc. , 1 1 5
( 1 99 2) 161165 and Analyzing specific cases of an operator inequality, Linear Alge bra Appl. , 208/20 9 ( 1994) 343365. The quantity p(£, F) defined in Problem VII.6 . 1 is sometimes called the gap between £ and F. This and related measures of the distance between two subspaces of a Banach space are used extensively by T . Kato, Pertur bation Theory for Linear Operators, Chapter 4.
VIII S p ect ral Variation of Nonnormal Mat rices
In Chapter 6 we saw that if A and B are both Hermitian or both unitary, then the optimal matching distance d(o(A) , o(B)) is bounded by I I A  B I I . We also saw that for arbitrary normal matrices A , E this need not always be true ( Example VI.3 . 13) . However, in this case, we do have a slightly weaker inequality d( a (A) , o(B) ) < 3 I I A  E l l ( Theorem VII.4. 1 ) . If one of the matrices A, E is Hermitian and the other is arbitrary, then we can only have an inequality of the form d(o(A) , o(B) ) < c( n ) I I A  E ll , where c( n ) is a constant that grows like log n ( Problems VI.8.8 and VI.8.9) . A more striking change of behaviour takes place if no restriction is placed on either A or B. Let A be the n x n nilpotent upper Jordan matrix; i.e. , the matrix that has all entries 1 on its first diagonal above the main diagonal and all other entries 0. Let B be the matrix obtained from A by adding an entry £ in the bottom left corner. Then the eigenvalues of B are the nth roots of £. So d(o(A) , o(B) ) == £1/ n , whereas I I A  E l l == £. When £ is small, the quantity c l / n is much larger. No inequality like d(o(A) , o(B) ) < c ( n ) II A  B II can be true in this case. In this chapter we will obtain bounds for d(o(A) , o( B ) ) , where A, B are arbitrary matrices. These bounds are much weaker than the ones for normal matrices. We will also obtain stronger results for matrices that are not normal but have some other special properties.
VI I I . I G eneral S pectral Variation Bounds
VIII . l
227
General S pectral Variation Bounds
Throughout this section, A and B will be two n x n matrices with eigenval ues a 1 , . . . , an , and fJ1 , . . , f3n , respectively. In Section VI. 3 , \Ve introduced the notation ( VII I l ) s ( a (B) , o( A ) ) == m?X �nlai  ,Bj 1 .
.
J
'l
A bound for this number is given in the following theorem. Theorem VIII. l . l Let A, B be n
x n
matrices. Then
(VIII.2) Proof. Let j be the index for which the maximum in the definition (VIII. I ) is attained. Choose an orthonormal basis e 1 , . , en such that Be 1 f3j e1 . Then .
.
==
[m�nl ai  (3j l ] n 'l n
<
IJ l a i  ,Bj I
==
l det (A  (31 1 ) 1
i= l
by Hadamard's inequality (Exercise !. 1 . 3) . The first factor on the right hand side of the above inequality can be written as I I (A  B) e 1 J I and is, therefore, bounded by I I A  B I I  The remaining n 1 factors can be bounded 
as
•
This is adequate to derive (VIII.2) . Example VIII. 1.2 Let A equal.
==
 B == I .
Then the two sides of (VII/. 2} are
Compare this theorem with Theorem VI. 3.3. Since the righthand side of (VIII.2) is symmetric in A and B, we have a bound for the Hausdorff distance as well: (VIII.3) Exercise VIII. 1 . 3 A bound for the optimal matching distance d( o(A) , o( B)) can be derived from Theorem VII/. 1 . 1 . The argument is similar to the one used in Pro blem VI. 8. 6 and is outlined below. (i) Fix A, and for any B let
228
VI I I .
S p ectral Variation o f Nonnormal Matrices
where lvf max( I I A I I , I I B I I ) . Let a 1 , . . . , a n be the eigenvalues of A. Let D ( a 2 , c( B )) be the closed disk with radius c ( B ) and centre a i . Then, The orem VII/. 1 . 1 says that CJ(B) is contained in the set D o btained by taking the union of these disks. (ii) Let A( t ) ( 1  t ) A + tB , 0 < t < 1 . Then A(O) == A, A(1) == B , and c(A(t) ) < c( B ) for all t . Thus, for each 0 < t < 1 , o ( A ( t )) is contained in D. {iii} Since the n eigenvalues of A ( t) are continuous functions of t , each connected component of D contains as many eigenvalues of B as of A. {iv) Use the Matching Theorem t o show that this implies ==
==
d ( a(A) , a ( B ))
<
(2n  1) (2M) l  l/ n ii A  B l l 1 / n .
(VIII.4)
{v) Interchange the roles of A and B and use the result of Problem 1!. 5. 1 0
to o btain the stronger inequality
(VIII.5) The example given in the introduction shows that the exponent 1/n oc curring on the righthand side of (VIII.5) is necessary. But then homogene ity considerations require the insertion of another factor like (2 M ) l  l/n. However, the first factor n on the righthand side of (VIII. 5) can be replaced by a much smaller constant factor. This is shown in the next theorem. We will use a classical result of Chebyshev used frequently in approximation theory: if p is any monic polynomial of degree n , then
1 max jp(t) I > 2 l . O
(VIII.6)
(This can be found in standard texts such as P. Henrici, Elements of Nu merical Analysis, Wiley, 1964, p. 194; T.J. Rivlin, A n Introduction to the Approximation of Functions, Dover, 1981 , p. 3 1 . ) The following lemma is a generalisation of this inequality. Lemma VIII. 1.4 Let r be a continuous curve in the complex plane with endpoints a and b. If p is any monic polynomial of degree n, then (VIII. 7) Proof. Let L be the straight line through a and b and L between a and b:
S the segment of
L
{z : z == a + t(b  a) , t E �}
S
{ z : z == a + t (b  a ) , O < t < 1 } .
For every point z in C, let z' denote its orthogonal projection onto L. Then l z  w l > l z'  w' l for all z and w .

VI I I . l General Spectral Vari ation Bo unds
 Let A t�, i == 1 , . . . , n , be the roots of p . Let .:\� == a + ti ( b t i E JR, and let z == a + t ( b  a ) be any point on L . Then n IT i
i= l
�
z  A I
==
n IT
i= l
I ( t  ti ) ( b  a ) I
==
lb  a!n
n
Ti
i= l

l t  ti l ·
From this and the inequality (VIII.6) applied to the polynomial we can conclude that there exists a point z0 on S for which n
IT i
i=l
'
zo  .A i l
> jb  a!n 22n  l
a
),
II n
229
i= l
\vhere
( t  tz ) ,
·
Since r is a continuous curve joining a and b , z0 == .:\� for some A o E r . Since l Ao  A i l > I A �  A � l , we have shown that there exists a point .:\ 0 on
r
n IT
l b ;:,� I n s uch that l p ( .A o ) l == . I Ao  A i > 1 2 i= l l
Theorem VIII. 1 . 5 Let A and B be two n
x
•
n matrices. Then (VIII. 8 )
Proof. Let A ( t ) == ( 1  t)A + tB, 0 < t < 1 . The eigenvalues of A(t) trace n continuous curves in the plane as t changes from 0 to 1. The initial points of these curves are the eigenvalues of A , and their final points are those of B . So, to prove (VIII. 8) it suffices to show that if r is one of these curves and a and b are the endpoints of r, then I a  b l is bounded by the righthand side of (VIII.8) . Assume that I I A l l < II B I I without any loss of generality. By Lemma VIII. l.4, there exists a point .:\0 on r such that I det (A  A o l ) l >
jb  a!n 2 _1 . 2 n
Choose 0 < t0 < 1 such that .:\ 0 is an eigenvalue of ( 1  t0 ) A + t0 B . In the proof of Theorem VIII. l . l we have seen that if X , Y are any two n x n matrices and if .A is an eigenvalue of Y, then J
I det( X  AI ) I < I I X  Y I ! ( I I X I I + I Y I I ) n  1 ·
Choose X == A and Y == ( 1 t0 ) A + t0 B . This gives 1;��� � < I det (A Aol ) l < II A  B li ( II A I I + I I B I I ) n l _ Taking nth roots, we obtain the desired • conclusion. 
Note that we have, in fact, shown that the factor 4 in the inequality (VIII.8) can be replaced by the smaller number 4 x 2  1 /n . A further im provement is possible; see the Notes at the end of the chapter. However, the best possible inequality of this type is not yet known.
VII I .
230
Spectral Variat ion of Nonnormal Matrices
Perturbation of Ro ots of Polynomials
VIII . 2
The ideas used above also lead to bounds for the distance between the roots of two polynomials. This is discussed below. Lemma VIII. 2. 1 Let f(z) nomial. Let
==
zn
+
a 1 z n l + · · · + an
be any monic poly
Then all the roots of f are bounded (in absolute value) by
Proof.
If l z l
> Jt ,
Jt .
(VIII.9 )
then
f(z)
a l + · · · + an 1+zn z a 2  · · ·  an al  > 1 z zn z2 1 1 1 1      · · ·  2 22 2n 0. > 
>
•
Such z cannot, therefore, be a root of f.
Let a 1 , , O n be the roots of a monic polynomial f. We will denote by Root f the unordered ntuple { a 1 , . , a n } as well as the subset of the plane whose elements are the roots of f. We wish to find bounds for the optimal matching distance d (Root f, Root g) in terms of the distance between the coefficients of two monic polynomials f and g . Let .
.
.
.
.
f(z ) g(z)
(VIII. 10)
be two polynomials. Let
(VIII. 1 1) (VIII.12) The bounds given below are in terms of these quantities. Theorem VIII. 2.2 Let f, g be two monic polynomials Then
where
Jt
n s (Ro ot f, Root g ) < L ! a k  bk !JL n  k , k= l is given by { VIII. 9} .
as
in (VII/. 1 0} .
(VIII. l3)
VIII.2
Proof.
Perturbation of Ro ots of Poly nomials
23 1
We have
n f(z)  g( z ) == L (a k  bk )z n  k . k= l So, if a is any root of j, then, by Lemma VIII.2. 1 ,
l g (a) l
n
< <
If the roots of g are {3 1 , . . .
L l a k  b k l lnl n  k k= l n L ! ak  b k l /1n  k_ k =l
, f3n , this says that n n IT Ia  f3j I < L lak  b k ! 11n  k · j= l k=l
So , •
This proves the theorem.
Corollary VIII.2 . 3 The Hausdorff distance between the roots of f and g is bounded as
h(Rootf, Rootg) < G ( f, g ).
(VIII. 14)
Theorem VIII.2 . 4 The optimal matching distance be tween the roots of f and g is bounded as d(Root j, Root g) < 4 8 ( / , g ) .
(VIII. 15)
Proof. The argument is similar to that used in proving Theorem VIII. l .5. Let ft ( l  t ) f + tg , 0 < t < 1 . If A is a root of ft , then by Lemma VIII.2. 1 , I A I < 1 and we have ==
l t (f( .A )  g (A) ) l < l f( A )  g ( A ) I n < Li a k  bk l l.A i n  k < [G ( j , g ) ] n . k= l The roots of ft trace continuous curves t changes from 0 to 1 . The initial points of these curves are the roots of f , and the final points are the I f( A ) I
n
as
roots of g. Let r be any one of these curves, and let a, b be its endpoints. Then, by Lemma VIII. l .4, there exists a point A on r such that
232
VI I I .
Spectral Variat ion o f No nnormal l\1 atrices
This shows that ja 
bj < 4
X
2l/n G (f, g ) ,
•
and that is enough for proving the theorem. h =
( f + g) /2 . Then any convex combination of f can also be expressed as h + t ( f  g) for some t with It! < � . Use
Exercise VIII. 2.5 Let and g this to show that
d ( Root j, Root g ) < 4 1  1 /n G ( f g ) . ,
(VIII. 16)
The first factor in the above inequality can be reduced further; see the Notes at the end of the chapter. However, it is not known what the optimal value of this factor is. It is known that no constant smaller than 2 can replace this factor if the inequality is to be valid for all degrees n. Note that the only property of 1 used in the proof is that it is an upper bound for all the roots of the polynomial ( 1  t) f + tg. Any other constant with this property could be used instead. Exercise VIII. 2.6 In Pro blem I. 6. 1 1 , a bound for the distance between the coefficients of the characteristic po lynomials of two matrices was ob tained. Use that and the combinatorial identity
tkG) k =O
=
n2 n  l
n matrices A, B d ( O" (A) , O" (B )) < n l/ n (8 M) l  1 /n ii A  B ll l/ n , ( VI I I . 1 7 ) where M == max( II A II , I I B II ) . This is weaker than the bound obtained in to show that for any two
n x
Theorem VIII. l . 5.
VII I . 3
Diagonalisable Matrices
A matrix A is said to be diagonalisable if it is similar to a diagonal matrix; i.e. , if there exists an invertible matrix S and a diagonal matrix D such that A == svs  1 . This is equivalent to saying that there are n linearly independent vectors in e n that are eigenvectors for A. If S is unitary ( or the eigenvectors of A orthonormal) , A is normal. In this section we will derive some perturbation bounds for diagonalisable matrices. These are natural generalisations of some results obtained for normal matrices in Chapter 6. The condition number of an invertible matrix S is defined as cond ( S)
==
II SI I II S  1 11 
Note that cond ( S) > 1 , and cond ( S) == 1 if and only if S is a scalar multiple of a unitary matrix. Our first theorem is a generalisation of Theorem VI. 3.3.
VI I I . 3 D iago nalis able M atrices
233
Theorem VIII . 3 . 1 Let A == SDS 1 , where D is a diagonal matrix and S an invertible matrix. Then, for any matrix B , s(o(B) , o(A) ) < cond( S) I I A  E l l 
(VIII. 18)
Proof. The proof of Theorem Vl. 3.3 can be modified to give a proof of this. Let E. = cond(S) I I A  B ll . We want to show that if (3 is any eigenvalue of B, then (3 is within a distance c of some eigenvalue a.1 of A. By applying a translation, we may assume that {3 = 0. If none of the O'.j is within a distance c of this , then A is invertible and
1 So,
J I A  1 ( B  A) l l < I I A  1 II II E  A l l < 1 .
Hence, I + A  1 (E  A) is invertible, and so is B = A(I + A 1 (B  A) ) . • But then B could not have had a zero eigenvalue. Note that the properties of the operator norm used above are (i) I + A is invertible if I I A l l < 1 ; (ii) I I AB II < I I A I I I I B II for all A, B; (iii) l i D I I == max l di l if D = diag( d 1 , . . . , dn ) . There are several other norms that satisfy these properties. For example, norms induced by the pnorms on e n , 1 < p < oo , all have these three properties. So, the inequality (VIII. 18) is true for a large class of norms. Exercise VI�I.3 . 2 Using continuity arguments and the Matching Theo rem show that if A and B are as in Theorem VIII. 3. 1 , then d(o(A) , o(B) ) < (2n  l)cond(S) I I A  E ll . If B is also diagonalisable and B
== T D'T  1 , then
d(o(A) , o(B) ) < n cond ( S)cond(T) I I A  E l l An inequality stronger than this will be proved below by other means.
Theorem VIII. 3 . 1 also follows from the following theorem. Both of these are called the BauerFike Theorems. Theorem VIII.3.3 Let S be an invertible matrix. If (3 is an eigenvalue of B but not of A, then (VIIL 1 9)
VI I I .
234
Proof.
Spect ral Vari ation of Nonnormal Matrices
We can write
S(B  {3/) S  1
S[(A  {31) + B  A] S  1 S(A  {3/) S  1 {I + S(A  {31 )  1 s  1 S(B  A) S  1 } . ·
Note that the matrix on the lefthand side of this equation is not invertible. Since A  {3! is invertible, the matrix inside the braces on the righthand side is not invertible. Hence,
1 < li S( A  f3I)  1 S  1 . S(B  A)S  1 11 < 11 s (A  f3 I)  l s 1 11 11 S (B  A )s  1 11 .
•
This proves the theorem.
We now obtain, for diagonalisable matrices , analogues of some of the major perturbation bounds derived in earlier chapters for Hermitian ma trices and for normal matrices. Some auxiliary theorems about norms of commutators are proved first. Theorem VIII.3.4 Let A, B be Hermitian operators, and let r be a pos itive operator whose smallest eigenvalue is
1
{i. e. ,
r > 1 I > 0) .
III Ar  r B i l l > 1 III A  B ill
The n
(VIII. 20 )
for every unitarily invariant norm.
Proof.
Let T == Ar  r B, and let Y
Y
==
==
T + T * . Then
(A  B)r + r(A  B).
This is the Sylvester equation that we studied in Chapter 7. From Theorem VII . 2.12, we get
21 III A  B ill < IllY I ll < 2 I II TI I I
==
2 III Ar  r B il l 
This proves the theorem.
•
Corollary VIII.3.5 Let A, B be any two operators, and let r be a positive opemtor, r > 7! > 0. Then
III ( Ar  r B) EB (A* r  r B * ) lll > 'Y IIl (A  B ) EB (A  B) I l l
( VIII . 21)
for every unitarily invariant norm.
Proof.
( J. � )
This follows from (VIII.20) applied to the Hermitian operators • and ( �* � ) , and the positive operator ( � � ) .
VI I I . 3 Diagonalisable M atrices
235
Corollary VIII.3 . 6 · Let A . and B be unitary operators, and let r be a positive operator, r > 1I > 0 . Then, for every unitarily invariant norm, ( VIII. 22 )
III Ar  r B i ll > 1 II I A  B i l l Proof.
If A and B are unitary, then
sj ( Ar  rB)
==
sj ( rB*  A* r )
==
sj (A*r  rB* ) .
Thus the operator (Ar  r B) EB (A * r  r B * ) has the same singular values as those of Ar  r B, each counted twice. From this we see that ( VIII. 2 1 ) implies ( VIII.22 ) for all Ky Fan norms, and hence for all unitarily invariant • norms. Corollary VIII.3. 7 Let A, B be normal operators, and let r be a positive operator, r > rl > 0. Then
( VIII.23 ) Proof. Suppose that A is normal and its eigenvalues are a 1 , . . . , a n · Then (choosing an orthonormal basis in which A is diagonal) one sees that for every X
z ,J
If A, B are normal� then, applying this to ( � in place of X , one obtains
�)
in place of A and ( g
I I AX  X B l l 2 == II A * X  X B * 1 1 2 
�)
( VIII. 24 )
Using this, the inequality ( VIII.23 ) can be derived from ( VIII.2 1 ) .
•
A famous theorem (called the FugledePutnam Theorem, valid in Hilbert spaces of finite or infinite dimensions) says that if A and B are normal, then for any operator X, AX == XB if and only if A* X == X B* . The equality ( VIII.24 ) says much more than this.
normal A, B, the inequality (VIII. 23} is no t al wa ys true if the HilbertSchmidt norm is replaced by the operator norm. A numerical example i l lust t ing this is given below. Let
Example VIII.3.8 For r ==
ra
diag(0.6384 , 0. 6384, 1 .0000 ) ,
 0. 5205  0. 1642i 0. 1 299 + 0 . 1 709i 0. 2850  0. 1808i
0. 1042  0.3618i 0.42 1 8 + 0.4685i 0. 3850  0. 4257i
 0. 1326  0.0260i  0. 5692  0 .3178 i
0. 2973  0. 1 71 5i
VI I I .
236
B
==
S p ectral Vari ation of Nonnormal Matrices
 0 .6040 + 0 . 1 760i 0 . 0582 + 0. 2850i 0 .408 1  0. 3333i
0. 5 1 28  0 . 2865i 0.0 1 54 + 0.4497i
 0.072 1  0. 2545i
0 . 1 306 + 0.0 1 54i  0 . 500 1  0 . 2833i  0 . 2686 + 0.0247i
Then
l i Ar  r B I I = 0.8763. ! II A  E l l Theorem VIII. 3.4 and Corollary VIII.3. 7 are used in the proofs below.
Alternate proofs of both these results are sketched in the problems. These proofs do not draw on the results in Chapter 7. Theorem VIII.3.9 Let A, B be any two matrices such that A == S D1 S 1 , B == T D 2 T 1 , where S, T are invertible matrices and D 1 , D 2 are real diag onal matrices. Then
1 2 IIIEig 1 ( A)  Eig1 ( B) I I I < [cond ( S)cond (T)] 1 III A  B ill
(VIII.25)
for every unitarily invariant norm.
Proof. When A, B are Hermitian, this has already been proved; see (IV.62) . This special case will be used to prove the general result. We can write
Hence,
We could also write
and get
1 IIIT  1 SD 1  D2 T  S IIl
I I T  1 I I IIIA  B III I I S I I . Let s 1 T have the singular value decomposition s 1 T == Ur V. Then
<
!II D 1 ur v  urv D 2 III III U * n 1 ur  r v D2 V * 1 1 1 == IIIA ' r  r B ' lll ,
V D 2 V * are Hermitian matrices. Note that where A' U* D 1 U and B' r  1 s == V * r 1 U* . So, by the same argument , ==
==
We have, thus, two inequalities
n iii A  B il l > I I I A r  r B ' III , '
VI I I . 3 D iagonalisable M atrices
237
and
/311J A  B ill > III A ' r l  r 1 B ' lll , where a == II S 1 JI I I T I I , (3 IIT  1 I I II SI I  Combining these two inequalities and using the triangle inequality, we have
==
[ 1/2 (  ) 1 / 2 ] 2 > 0 implies that � The operator inequality ( � ) r!3
+ rf3 >
1
(a{3� 1; 2 I. Hence, by Theorem VIII.3.4, 2 lii A  B i ll >
1
III A '  B ' lll � 1 2 1 (n
But A' and B ' are Hermitian matrices with the same eigenvalues as those of A and B , respectively. Hence, by the result for Hermitian matrices that was mentioned at the beginning, Ili A '  B ' lll > III Eig 1 (A)  Eig 1 ( B ) III Combining the three inequalities above leads to (VIII. 25) .
•
Theorem VIII. 3 . 10 Let A, B be any two matrices such that A == SD 1 S 1 , B T D2T 1 , where S, T are invertible matrices and D 1 , D 2
==
are diagonal matrices. Then
1 2 d2 ( o( A ) , o (B ) ) < [cond(S)cond(T ) ] 1 I I A  B i b 
(VIII. 26)
Proof. When A, B are normal, this is j ust the HoffmanWielandt in equality; see (VI. 34) . The general case can be obtained from this using the inequality (VIII. 23) . The argument is the same as in the proof of the • preceding theorem. Theorems VIII.3.9 and VIII. 3 . 1 0 do reduce to the ones proved earlier for Hermitian and normal matrices. However, neither of them gives tight bounds. Even in the favourable case when A and B commute, the lefthand side of ( VIII. 24) is generally smaller than I l i A B i l l , and this is aggravated further by introducing the condition number coefficients.

Exercise VIII. 3 . 1 1 Let A and B be as in Theorem VIII. 3. 10. Suppose that all eigenvalues of A and B have modulus 1 . Show that (VIII. 27) d lll 111 ( a( A ) , a( B )) < [cond ( S)cond ( T )] 1 1 2 II I A B il l
;

for all unitarily invariant norms. For the special case of the operator norm, the factor ; above can be replaced by 1. [Hint: Use Corollary VIII. 3. 6 and the theorems on unitary matrices in Chapte r 6.}
238
VI I I .
VII I . 4
Spectral Variat io n of Nonnormal M atrices
Matrices with Real Eigenvalues
In this section we will consider a collection R of matrices that has two special properties: R is a real vector space and every element of R has only real eigenvalues. The set of all Hermitian matrices is an example of such a collection. Another example is given below. Such families of matrices arise in the study of vectorial hyperbolic differential equations. The behaviour of the eigenvalues of such a family has some similarities to that of Hermitian matrices. This is studied below. Example VIII.4. 1 Fix a block decomposition of matrices in which all di agonal blocks are square. Let R be the set of all matrices that are block upper triangular in this decomposition and whose diagonal blocks are Her mitian. Then R is a real vector space (of real dimension n2 ) and every element of R has real eigenvalues.
In this book we have called a matrix positive if it is Hermitian and all its eigenvalues are nonnegative. A matrix A will be called laxly positive if all eigenvalues of A are nonnegative. This will be written symbolically as 0 ≤_L A. If all eigenvalues of A are positive, we will say A is strictly laxly positive. We say A ≤_L B if B − A is laxly positive. We will see below that if R is a real vector space of matrices each of which has only real eigenvalues, then the laxly positive elements form a convex cone in R. So, the order ≤_L defines a partial order on R. Given two matrices A and B, we say that λ is an eigenvalue of A with respect to B if there exists a nonzero vector x such that Ax = λBx. Thus, eigenvalues of A with respect to B are the n roots of the equation det(A − λB) = 0. These are also called generalised eigenvalues.

Lemma VIII.4.2 Let A, B be two matrices such that every real linear combination of A and B has real eigenvalues. Suppose B is strictly laxly positive. Then for every real λ, A + λI has real eigenvalues with respect to B.
Proof. We have to show that for any real λ the equation

    det(A + λI − μB) = 0    (VIII.28)

is satisfied by n real μ. Let μ be any given real number. Then, by hypothesis, there exist n real λ that satisfy (VIII.28), namely the eigenvalues of μB − A. Denote these as φ_k(μ) and arrange them so that φ₁(μ) ≥ φ₂(μ) ≥ ··· ≥ φ_n(μ). We have

    det(A + λI − μB) = ∏_{k=1}^{n} (λ − φ_k(μ)).    (VIII.29)

By the results of Section VI.1, each φ_k(μ) is continuous as a function of μ. For large |μ|, (1/μ)(μB − A) is close to B. So, (1/μ)φ_k(μ) approaches λ_k^↓(B) as
μ → ∞, and λ_k^↑(B) as μ → −∞. Since B is strictly laxly positive, this implies that φ_k(μ) → ±∞ as μ → ±∞. So, every λ in ℝ is in the range of φ_k for each k = 1, 2, . . . , n. Thus, for each λ, there exist n real μ that satisfy (VIII.28). ∎
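The lemma is easy to test in its simplest instance, a Hermitian pencil. The sketch below is an editorial illustration, not from the book; it assumes NumPy. With A Hermitian and B positive definite (hence strictly laxly positive), the eigenvalues of A + λI with respect to B are the ordinary eigenvalues of B⁻¹(A + λI), and they should all be real.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X = rng.standard_normal((n, n)); A = (X + X.T) / 2          # Hermitian
Y = rng.standard_normal((n, n)); B = Y @ Y.T + np.eye(n)    # positive definite

worst = 0.0
for lam in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    # eigenvalues of A + lam*I with respect to B:
    # the roots of det(A + lam*I - mu*B) = 0, here via B^{-1}(A + lam*I)
    mus = np.linalg.eigvals(np.linalg.solve(B, A + lam * np.eye(n)))
    worst = max(worst, np.max(np.abs(mus.imag)))
print(worst)
```

Since B⁻¹(A + λI) is similar to the Hermitian matrix B^{−1/2}(A + λI)B^{−1/2}, the imaginary parts are zero up to rounding.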
Proposition VIII.4.3 Let A, B be two matrices such that every real linear combination of A and B has real eigenvalues. Suppose A is (strictly) laxly negative. Then every eigenvalue of A + iB has (strictly) negative real part.

Proof. Let μ = μ₁ + iμ₂ be an eigenvalue of A + iB. Then det(A + iB − μ₁I − iμ₂I) = 0. Multiply this by iⁿ to get

    det[(−B + μ₂I) + i(A − μ₁I)] = 0.

So the matrix −B + μ₂I has an eigenvalue −i with respect to the matrix A − μ₁I, and it has an eigenvalue i with respect to the matrix −(A − μ₁I). By hypothesis, every real linear combination of A − μ₁I and −B + μ₂I has real eigenvalues. Hence, by Lemma VIII.4.2, A − μ₁I cannot be either strictly laxly positive or strictly laxly negative. In other words,

    λ_n^↓(A) ≤ μ₁ ≤ λ_1^↓(A).

This proves the proposition. ∎

Exercise VIII.4.4 With notations as in the above proof, show that λ_n^↓(B) ≤ μ₂ ≤ λ_1^↓(B).
Theorem VIII.4.5 Let R be a real vector space whose elements are matrices with real eigenvalues. Let A, B ∈ R and let A ≤_L B. Then λ_k^↓(A) ≤ λ_k^↓(B) for k = 1, 2, . . . , n.

Proof. We will prove a more general statement: if A, B ∈ R and 0 ≤_L B, then λ_k^↓(A + μB) is a monotonically increasing function of the real variable μ. It is enough to prove this when 0 <_L B; the general case follows by continuity. In the notation of the proof of Lemma VIII.4.2 (applied with −A in place of A), λ_k^↓(A + μB) = φ_k(μ). Suppose φ_k(μ) decreases in some interval. Then we can choose a real number λ such that λ − φ_k(μ) increases from a negative to a positive value in this interval. Since φ_k(μ) → ±∞ as μ → ±∞, λ − φ_k(μ) vanishes for at least three values of μ. So, in the representation (VIII.29) this factor contributes at least three zeroes. The remaining factors contribute at least one zero each. So, for this λ, the equation (VIII.28) has at least n + 2 roots μ. This is impossible. ∎

Theorem VIII.4.6 Let R be a real vector space whose elements are matrices with real eigenvalues. Let A, B ∈ R. Then

    λ_k^↓(A) + λ_n^↓(B) ≤ λ_k^↓(A + B) ≤ λ_k^↓(A) + λ_1^↓(B)    (VIII.30)
for k = 1, 2, . . . , n.
Proof. The matrix B − λ_n^↓(B)I is laxly positive. So, by the argument in the proof of the preceding theorem, λ_k^↓(A + μB) − μλ_n^↓(B) is a monotonically increasing function of μ. Choose μ = 0, 1 to get the first inequality in (VIII.30). The same argument shows that λ_k^↓(A + μB) − μλ_1^↓(B) is a monotonically decreasing function of μ. This leads to the second inequality. ∎
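The monotonicity used in these two proofs can be observed directly. The sketch below is an editorial illustration, not from the book; it assumes NumPy and uses the Hermitian family, the simplest instance of such a space R. For Hermitian A and positive semidefinite C (so 0 ≤_L C), each curve μ ↦ λ_k^↓(A + μC) should be nondecreasing.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
X = rng.standard_normal((n, n)); A = (X + X.T) / 2   # Hermitian
Y = rng.standard_normal((n, n)); C = Y @ Y.T         # positive semidefinite

mus = np.linspace(-3.0, 3.0, 25)
# Row j holds the sorted eigenvalues of A + mus[j]*C.
curves = np.array([np.linalg.eigvalsh(A + mu * C) for mu in mus])

# Each eigenvalue curve should be nondecreasing in mu.
diffs = np.diff(curves, axis=0)
print(diffs.min())
```

A negative value here (beyond rounding error) would contradict the theorem.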
Corollary VIII.4.7 On the vector space R, the function λ_1^↓(A) is convex and the function λ_n^↓(A) is concave in the argument A.

Theorem VIII.4.8 Let A and B be two matrices such that all real linear combinations of A and B have real eigenvalues. Then

    max_{1≤k≤n} |λ_k^↓(A) − λ_k^↓(B)| ≤ spr(A − B) ≤ ‖A − B‖.    (VIII.31)
Proof. Let R be the real vector space generated by A and B. By Theorem VIII.4.6,

    λ_k^↓(A) + λ_n^↓(B − A) ≤ λ_k^↓(B) ≤ λ_k^↓(A) + λ_1^↓(B − A).

So,

    |λ_k^↓(B) − λ_k^↓(A)| ≤ max(|λ_1^↓(B − A)|, |λ_n^↓(B − A)|) = spr(A − B) ≤ ‖A − B‖. ∎

Note that Weyl's Perturbation Theorem is included in this as a special case.
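The Hermitian special case (Weyl's Perturbation Theorem) is easily checked. The sketch below is an editorial illustration, not from the book; it assumes NumPy. It compares the three quantities in (VIII.31) for a random Hermitian pair.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
X = rng.standard_normal((n, n)); A = (X + X.T) / 2
Y = rng.standard_normal((n, n)); B = (Y + Y.T) / 2

lamA = np.sort(np.linalg.eigvalsh(A))[::-1]   # decreasing enumeration
lamB = np.sort(np.linalg.eigvalsh(B))[::-1]

lhs = np.max(np.abs(lamA - lamB))
spr = np.max(np.abs(np.linalg.eigvals(A - B)))  # spectral radius of A - B
opnorm = np.linalg.norm(A - B, 2)               # operator norm

print(lhs, spr, opnorm)
```

For Hermitian matrices the spectral radius and the operator norm of A − B coincide, so the middle and right quantities agree up to rounding.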
Exercise VIII.4.9 Show that if only A, B, and A + B are assumed to have real eigenvalues, then the inequality (VIII.31) might not be true.
VIII.5  Eigenvalues with Symmetries

We have remarked earlier that the exponent 1/n occurring in the bound (VIII.8) is unavoidable. However, if A and B are restricted to some special classes, this can be improved. In this section we identify some useful classes of matrices where this exponent can be improved (though not eliminated altogether). These are matrices whose eigenvalues appear as pairs ±λ or, more generally, as tuples {λ, ωλ, . . . , ω^{p−1}λ}, where ω is a pth root of unity. We will give interesting examples of large classes of such matrices, and then show how this symmetric distribution of their eigenvalues can be exploited to get better bounds.
Example VIII.5.1 Let Aᵀ denote the transpose of a matrix A. A complex matrix is called symmetric if Aᵀ = A and skew-symmetric if Aᵀ = −A. If A is a skew-symmetric matrix, then λ is an eigenvalue of A if and only if −λ is. The class of all such matrices forms a Lie algebra. This is the Lie algebra associated with the complex orthogonal group.

Example VIII.5.2 If Aᵀ is similar to −A, then, clearly, λ is an eigenvalue of A if and only if −λ is. The Lie algebra corresponding to the symplectic Lie group contains matrices that have this property. Let n be an even number, n = 2r. Let

    J = ( 0  I )
        ( I  0 ) ,

where I is the identity matrix of order r. Let A be an n × n matrix such that Aᵀ = −JAJ⁻¹. It is easy to see that we can then write

    A = ( A₁   A₂   )
        ( A₃  −A₁ᵀ ) ,

where A₁, A₂, A₃ are r × r matrices of which A₂ and A₃ are skew-symmetric. The collection of all such matrices is the Lie algebra associated with the symplectic group.
Example VIII.5.3 Let X be a matrix of order n = pr having the special form

    X = ( 0    A₁   0    ···   0       )
        ( 0    0    A₂   ···   0       )
        ( ⋮              ⋱    ⋮       )
        ( 0    0    0    ···   A_{p−1} )
        ( A_p  0    0    ···   0       ) ,

where A₁, . . . , A_p are matrices of order r. Let Y = diag(I_r, ωI_r, . . . , ω^{p−1}I_r), where ω is the primitive pth root of unity. Then Y⁻¹XY = ωX. So, if λ is an eigenvalue of X, then so are ωλ, ω²λ, . . . , ω^{p−1}λ.
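The similarity Y⁻¹XY = ωX and the resulting symmetry of the spectrum can be verified numerically. The sketch below is an editorial illustration, not from the book; it assumes NumPy, with p = 3 and r = 2 chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(5)
p, r = 3, 2
n = p * r
w = np.exp(2j * np.pi / p)   # primitive p-th root of unity

A_blocks = [rng.standard_normal((r, r)) for _ in range(p)]
X = np.zeros((n, n), dtype=complex)
for i in range(p - 1):                        # A_1, ..., A_{p-1} on the superdiagonal
    X[i*r:(i+1)*r, (i+1)*r:(i+2)*r] = A_blocks[i]
X[(p-1)*r:, :r] = A_blocks[p-1]               # A_p in the bottom-left corner

Y = np.diag(np.repeat(w ** np.arange(p), r))
sim_ok = np.allclose(np.linalg.solve(Y, X @ Y), w * X)   # Y^{-1} X Y = w X

eigs = np.linalg.eigvals(X)
# The spectrum should be invariant under multiplication by w.
gap = max(min(abs(w * lam - eigs)) for lam in eigs)
print(sim_ok, gap)
```

`gap` measures how far ω·σ(X) is from σ(X); it should vanish up to rounding.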
Exercise VIII.5.4 Let

    Z = ( R    A₁ )
        ( A₂  −R ) ,

and suppose R commutes with A₁. Show that tr Z^k = 0 if k is odd. Use this to show that λ is an eigenvalue of Z if and only if −λ is.
Exercise VIII.5.5 Let ω be the primitive pth root of unity. If X, Y are two matrices such that XY = ωYX, then (X + Y)^p = X^p + Y^p.

Exercise VIII.5.6 Let Z be a matrix of order n = pr having the special form

    Z = ( R     A₁    0     ···   0        )
        ( 0     ωR    A₂    ···   0        )
        ( ⋮                 ⋱    ⋮        )
        ( 0     0     ···  ω^{p−2}R  A_{p−1} )
        ( A_p   0     ···   0    ω^{p−1}R  ) ,

where R commutes with A₁, A₂, . . . , A_p. Use the result of the preceding exercise to show that tr Z^k = 0 if k is not an integral multiple of p. Use this to show that if λ is an eigenvalue of Z, then so are ωλ, ω²λ, . . . , ω^{p−1}λ. (This is true even when R commutes with A₁, . . . , A_{p−1} only.)

For brevity, an n-tuple will be called p-Carrollian if n = pr and the elements of the tuple can be enumerated as

    (a₁, . . . , a_r, ωa₁, . . . , ωa_r, . . . , ω^{p−1}a₁, . . . , ω^{p−1}a_r),    (VIII.32)

where ω is the primitive pth root of unity. We have seen above several examples of matrices whose eigenvalues are p-Carrollian.
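The identity of Exercise VIII.5.5 can be tested on a standard pair satisfying XY = ωYX, the clock and shift matrices. The sketch below is an editorial illustration, not from the book; it assumes NumPy.

```python
import numpy as np

p = 5
w = np.exp(2j * np.pi / p)

# Clock and shift matrices: a standard pair with X Y = w Y X.
X = np.diag(w ** np.arange(p))        # clock: diag(1, w, ..., w^{p-1})
Y = np.roll(np.eye(p), 1, axis=0)     # shift: Y e_j = e_{j+1 mod p}

rel_ok = np.allclose(X @ Y, w * (Y @ X))

lhs = np.linalg.matrix_power(X + Y, p)
rhs = np.linalg.matrix_power(X, p) + np.linalg.matrix_power(Y, p)
print(rel_ok, np.abs(lhs - rhs).max())
```

Here X^p = Y^p = I, so the exercise predicts (X + Y)^p = 2I; the printed residual should vanish up to rounding.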
Exercise VIII.5.7 Let s_k, 1 ≤ k ≤ n, denote the elementary symmetric polynomials in n variables. If (α₁, . . . , α_n) is a Carrollian n-tuple written in the form (VIII.32), show that, modulo a sign factor, we have

    s_k(α₁, . . . , α_n) = 0 if k ≠ jp,   s_k(α₁, . . . , α_n) = s_j(α₁^p, . . . , α_r^p) if k = jp.

Use this to show that if α₁, . . . , α_n are roots of the polynomial

    f(z) = zⁿ + a₁z^{n−1} + ··· + a_n,

then α₁^p, . . . , α_r^p are roots of the polynomial

    F(z) = z^r + a_p z^{r−1} + a_{2p} z^{r−2} + ··· + a_{rp}.
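Both claims of this exercise can be confirmed numerically. The sketch below is an editorial illustration, not from the book; it assumes NumPy, with p = 3 and r = 2 chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(6)
p, r = 3, 2
n = p * r
w = np.exp(2j * np.pi / p)

alpha = rng.standard_normal(r) + 1j * rng.standard_normal(r)   # alpha_1, ..., alpha_r
roots = np.concatenate([w**j * alpha for j in range(p)])       # a p-Carrollian n-tuple

f = np.poly(roots)   # coefficients [1, a_1, ..., a_n] of prod (z - root)

# Coefficients a_k vanish unless p divides k ...
coef_ok = all(abs(f[k]) < 1e-8 for k in range(1, n + 1) if k % p)

# ... and F(z) = z^r + a_p z^{r-1} + ... + a_{rp} has roots alpha_i^p.
F = f[::p]                     # picks out 1, a_p, a_2p, ..., a_rp
resid = np.abs(np.polyval(F, alpha**p)).max()
print(coef_ok, resid)
```

Indeed f(z) = ∏ᵢ(z^p − αᵢ^p), so f is a polynomial in z^p, which is what the two checks exhibit.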
Proposition VIII.5.8 Let f, g be monic polynomials of degree n as in (VIII.10). Suppose n = pr and the roots of f and g both are p-Carrollian. Let γ be as in (VIII.11). Then the roots of f and g can be labelled as α₁, . . . , α_n and β₁, . . . , β_n in such a way that

    max_j |α_j^p − β_j^p| ≤ 4 [ Σ_{k=1}^{r} |a_{kp} − b_{kp}| γ^{p(r−k)} ]^{1/r}.    (VIII.33)

Proof. Use Theorem VIII.2.4 and Exercise VIII.5.7. ∎

Theorem VIII.5.9 Let n = pr and let A, B be two n × n matrices whose eigenvalues are p-Carrollian. Then

    d(σ(A^p), σ(B^p)) ≤ 4 c_{r,p} M^{p−1/r} ‖A − B‖^{1/r},    (VIII.34)

where M = max(‖A‖, ‖B‖) and

    c_{r,p} = [ Σ_{k=1}^{r} kp C(pr, kp) ]^{1/r},    (VIII.35)

C(pr, kp) denoting a binomial coefficient.
Proof. See Exercise VIII.2.6 and use the preceding proposition. ∎
The two results above give bounds not on the distance between the roots themselves but on that between their pth powers. If all of them are outside a neighbourhood of zero, a bound on the distance between the roots can be obtained from this. This needs the following lemma.
Lemma VIII.5.10 Let x, y be complex numbers such that |x| ≥ ρ, |y| ≥ ρ, and |x^p − y^p| ≤ C. Then, for some k, 0 ≤ k ≤ p − 1,

    |x − ω^k y| ≤ C/ρ^{p−1},    (VIII.36)

where ω is the primitive pth root of unity.
Proof. Compare the coefficients of t in the identity

    ∏_{k=0}^{p−1} [t − (x − ω^k y)] = (−1)^p [(x − t)^p − y^p]

to see that

    s_{p−1}(x − y, x − ωy, . . . , x − ω^{p−1}y) = (−1)^{p−1} p x^{p−1}.

The right-hand side has modulus larger than pρ^{p−1}, and the left-hand side is a sum of p terms. Hence, at least one of them should have modulus larger than ρ^{p−1}. So, there exists k, 0 ≤ k ≤ p − 1, such that

    ∏_{j≠k} |x − ω^j y| ≥ ρ^{p−1}.

But

    ∏_{j=0}^{p−1} |x − ω^j y| = |x^p − y^p| ≤ C.

This proves the lemma. ∎

When p = 2, the inequality (VIII.36) can be strengthened. To see this note that

    |x − y|² + |x + y|² = 2(|x|² + |y|²) ≥ 4ρ².

So, either |x − y| or |x + y| must be larger than 2^{1/2}ρ. Consequently, one of them must be smaller than C/2^{1/2}ρ. Thus if the eigenvalues of A and B are p-Carrollian and all have modulus larger than ρ, then d(σ(A), σ(B)) is bounded by C/ρ^{p−1}, where C is the quantity on the right-hand side of (VIII.34). When p = 2, this bound can be improved further to C/2^{1/2}ρ. The major improvement over the bounds obtained in Section 2 is that now the bounds involve ‖A − B‖^{1/r} instead of ‖A − B‖^{1/n}. For low values of p, the factors c_{r,p} can be evaluated explicitly.
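Lemma VIII.5.10 can be stress-tested by random sampling. The sketch below is an editorial illustration, not from the book; it assumes NumPy. For each sampled pair (x, y) with |x|, |y| ≥ ρ it checks that some kth rotation ω^k y lies within C/ρ^{p−1} of x, where C = |x^p − y^p|.

```python
import numpy as np

rng = np.random.default_rng(7)
p = 4
w = np.exp(2j * np.pi / p)
rho = 0.5

worst_ratio = 0.0
for _ in range(200):
    mx, my = rng.uniform(rho, 3.0, size=2)          # moduli at least rho
    tx, ty = rng.uniform(0.0, 2.0 * np.pi, size=2)  # random phases
    x = mx * np.exp(1j * tx)
    y = my * np.exp(1j * ty)
    C = abs(x**p - y**p)
    best = min(abs(x - w**k * y) for k in range(p))
    worst_ratio = max(worst_ratio, best / (C / rho**(p - 1)))
print(worst_ratio)
```

The lemma asserts that the printed ratio never exceeds 1.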
VIII.6  Problems

Problem VIII.6.1. Let f(z) = zⁿ + a₁z^{n−1} + ··· + a_n be a monic polynomial. Let μ₁, . . . , μ_n be the numbers |a_k|^{1/k}, 1 ≤ k ≤ n, rearranged in decreasing order. Show that all the roots of f are bounded (in absolute value) by μ₁ + μ₂. This is an improvement on the result of Lemma VIII.2.1.
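This root bound is easy to probe numerically. The sketch below is an editorial illustration, not from the book; it assumes NumPy. It samples random monic polynomials and records by how much the largest root modulus exceeds μ₁ + μ₂ (it never should).

```python
import numpy as np

rng = np.random.default_rng(8)
excess = []
for _ in range(100):
    n = int(rng.integers(2, 8))
    a = rng.standard_normal(n) * float(rng.uniform(0.0, 3.0))
    coeffs = np.concatenate([[1.0], a])   # f(z) = z^n + a_1 z^{n-1} + ... + a_n
    roots = np.roots(coeffs)
    mu = np.sort(np.abs(a) ** (1.0 / np.arange(1, n + 1)))[::-1]
    excess.append(np.abs(roots).max() - (mu[0] + mu[1]))
worst = max(excess)
print(worst)   # nonpositive up to rounding
```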
Problem VIII.6.2. Fill in the details in the following alternate proof of Theorem VIII.3.1. Let β be an eigenvalue of B but not of A. If Bx = βx, then

    x = (βI − A)⁻¹(B − A)x = S(βI − D)⁻¹S⁻¹(B − A)x.

Hence,

    ‖x‖ ≤ cond(S) ‖B − A‖ ‖(βI − D)⁻¹‖ ‖x‖.

From this it follows that

    min_j |β − α_j| ≤ cond(S) ‖B − A‖.

Notice that this proof too relies only on those properties of ‖·‖ that are shared by many other norms (like the ones induced by the p-norms on ℂⁿ). See the remark following Theorem VIII.3.1.
Problem VIII.6.3. Let B be any matrix with entries b_ij. The disks

    D_i = { z : |z − b_ii| ≤ Σ_{j≠i} |b_ij| },   1 ≤ i ≤ n,

are called the Gersgorin disks of B. The Gersgorin Disk Theorem says that

    σ(B) ⊂ ∪_{i=1}^{n} D_i,

and that any connected component of the set ∪_i D_i contains as many eigenvalues of B as the number of disks that form this component. The proof of this is outlined below.

Consider the vector norm ‖x‖_∞ = max_i |x_i| on ℂⁿ. The norm it induces on operators is

    ‖A‖_{∞→∞} = max_i Σ_{j=1}^{n} |a_ij|.

Let D be the diagonal of B, and let H = B − D. Let β be an eigenvalue of B but not of D. Then

    βI − B = βI − H − D = (βI − D)[I − (βI − D)⁻¹H].

Since βI − B is not invertible, neither is the matrix in the square brackets. Hence,

    1 ≤ ‖(βI − D)⁻¹H‖_{∞→∞}.

From this, the first part of the theorem follows. The second part follows from the continuity argument we have used often. Let B(t) = D + tH, 0 ≤ t ≤ 1. Then B(0) = D, B(1) = B; the eigenvalues of B(t) trace continuous curves that join the eigenvalues of D to those of B. Note that the proof of the first part is very similar to that of Theorem VIII.3.3; in fact, it is a special case of the earlier one.
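The first part of the theorem is straightforward to verify in code. The sketch below is an editorial illustration, not from the book; it assumes NumPy. It forms the Gersgorin disks of a random complex matrix and checks that every eigenvalue lies in at least one of them.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 6
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

centers = np.diag(B)
radii = np.abs(B).sum(axis=1) - np.abs(centers)   # off-diagonal row sums

eigs = np.linalg.eigvals(B)
# Every eigenvalue must lie in at least one Gersgorin disk.
in_some_disk = [np.any(np.abs(z - centers) <= radii + 1e-10) for z in eigs]
print(in_some_disk)
```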
Problem VIII.6.4. Given any matrix A, we can find a unitary U such that

    U*AU = T = D + N,

where T is upper triangular, D is diagonal, and N is strictly upper triangular and, hence, nilpotent. Such a reduction is not unique. The measure of nonnormality of A is defined as

    Δ(A) = inf ‖N‖,

where the infimum is taken over all N that occur in the possible triangular forms of A given above. Now let B be any other matrix, and let β be an eigenvalue of B but not of A. From (VIII.19) we have

    ‖(D − βI + N)⁻¹‖⁻¹ ≤ ‖A − B‖.

Show that

    (D − βI + N)⁻¹ = [I + (D − βI)⁻¹N]⁻¹ (D − βI)⁻¹
                   = [I − (D − βI)⁻¹N + {(D − βI)⁻¹N}² + ··· + (−1)^{n−1}{(D − βI)⁻¹N}^{n−1}] (D − βI)⁻¹.

Let δ = dist(β, σ(A)). From this equation and the inequality before it conclude that

    δ ≤ ‖A − B‖ { 1 + Δ(A)/δ + (Δ(A)/δ)² + ··· + (Δ(A)/δ)^{n−1} }.

Now show that

    δⁿ / (δ^{n−1} + δ^{n−2}Δ(A) + ··· + Δ(A)^{n−1}) ≤ ‖A − B‖.

This is Henrici's Theorem. Let f(t) = tⁿ/(1 + t + ··· + t^{n−1}). Then f(t) is close to tⁿ for small values of t, and to t for large values of t. Thus, when Δ(A) is close to 0, i.e., when A is close to being normal, the above bound leads to the asymptotic inequality

    s(σ(B), σ(A)) ≲ ‖A − B‖.

In Theorem VI.3.3 we saw that if A is normal, then s(σ(B), σ(A)) ≤ ‖A − B‖.
Problem VIII.6.5. Let ν be any norm on the space of matrices. The ν-measure of nonnormality of A is defined as

    Δ_ν(A) = inf ν(N),

where N is as in Problem VIII.6.4. Suppose that the norm ν is such that ‖A‖ ≤ ν(A) for all A. Show that Δ(A) in Henrici's Theorem can be replaced by Δ_ν(A).

Problem VIII.6.6. For the Hilbert–Schmidt norm ‖·‖₂, the measure of nonnormality satisfies the inequality

    Δ₂²(A) ≤ [(n³ − n)/12]^{1/2} ‖A*A − AA*‖₂

for every n × n matrix A. (The proof is a little intricate.)

Problem VIII.6.7. Let A have the Jordan canonical form J = SAS⁻¹. Let m be the size of the largest Jordan block in J. Let B be any other matrix. Show that for every eigenvalue β of B there is an eigenvalue α of A such that
Problem VIII.6.8. Let A, B, Γ be as in Theorem VIII.3.4. Let (A − B)x_j = λ_j x_j, where the vectors x_j are orthonormal and the eigenvalues λ_j are indexed in such a way that s_j := s_j(A − B) = |λ_j|. Let y_j be the orthonormal vectors that satisfy the relations (A − B)x_j = s_j y_j. Note that y_j = ±x_j. Note also that the difference of AΓ − ΓB and (A − B)Γ is skew-Hermitian. Use this to show that, for 1 ≤ k ≤ n,

    Re Σ_{j=1}^{k} ⟨x_j, (AΓ − ΓB)y_j⟩ = Re Σ_{j=1}^{k} ⟨x_j, (A − B)Γ y_j⟩
                                       = Σ_{j=1}^{k} s_j ⟨y_j, Γ y_j⟩ ≥ Σ_{j=1}^{k} s_j.

Use this to give an alternate proof of Theorem VIII.3.4. (See Problem III.6.6.)
Problem VIII.6.9. Fill in the details in the following proof of Corollary VIII.3.7. Let D = Γ − ½I. Then

    ‖AΓ − ΓB‖₂² = ‖(AD − DB) + ½(A − B)‖₂²
                = ‖AD − DB‖₂² + ¼‖A − B‖₂² + Re tr (AD − DB)*(A − B).

So, it suffices to show that the last term is positive. This can be seen by writing

    2 Re tr (AD − DB)*(A − B) = tr {(AD − DB)*(A − B) + (A − B)*(AD − DB)}

and then using cyclicity of the trace to reduce this to

    tr D[(A − B)*(A − B) + (A − B)(A − B)*].
Problem VIII.6.10. (i) Let X be a contractive matrix; i.e., let ‖X‖ ≤ 1. Show that there exist unitaries U and V such that X = ½(U + V). Use this to show that if D₁ and D₂ are real diagonal matrices, then

    |||D₁X − XD₂||| ≤ |||D₁^↓ − D₂^↑|||

for every unitarily invariant norm, where D^↓ (respectively D^↑) denotes the diagonal matrix whose entries are those of D rearranged in decreasing (increasing) order. [See (IV.62).]

(ii) Let A = SD₁S⁻¹, B = TD₂T⁻¹, where S and T are invertible matrices and D₁, D₂ are real diagonal matrices. Show that

    |||A − B||| ≤ cond(S) cond(T) |||Eig^↓(A) − Eig^↑(B)|||.
Problem VIII.6.11. Let A and B be any two diagonalisable matrices with eigenvalues λ₁, . . . , λ_n and μ₁, . . . , μ_n, respectively. Let A = SD₁S⁻¹, B = TD₂T⁻¹, where S and T are invertible matrices and D₁, D₂ are diagonal matrices. Show that

    ‖A − B‖₂ ≤ cond(S) cond(T) max_π ( Σ_i |λ_i − μ_{π(i)}|² )^{1/2},

where π varies over all permutations on n symbols. [See Theorem VI.4.1.]
Problem VIII.6.12. Let A be a Hermitian matrix with eigenvalues α₁, . . . , α_n. Let B be any other matrix. For 1 ≤ j ≤ n, let

    D_j = { z : |z − α_j| ≤ ‖A − B‖, |Im z| ≤ ‖Im(A − B)‖ }.

The regions D_j are disks flattened on the top and bottom by horizontal lines. Show that the eigenvalues of B are contained in ∪_j D_j, and that each connected component of this set contains as many eigenvalues of A as of B.
Problem VIII.6.13. Let R be a real vector space whose elements are matrices with real eigenvalues. Show that the function Σ_{j=1}^{k} λ_j^↓(A) is a convex function of A on this space for 1 ≤ k ≤ n. Show that the function Σ_{j=1}^{k} λ_j^↑(A) is concave on R.
Problem VIII.6.14. If R₁ is invertible, then

    ( R₁  A₁ ) ( I  −R₁⁻¹A₁ )   =   ( R₁  0              )
    ( A₂  R₂ ) ( 0   I      )       ( A₂  R₂ − A₂R₁⁻¹A₁ ).

Use this to show that if

    Z = ( R    A₁ )
        ( A₂  −R )

and R commutes with A₁, then Z and −Z have the same eigenvalues. (Show that they have the same characteristic polynomials.) This gives another proof of the statement at the end of Exercise VIII.5.6, for p = 2. The same method works for p > 2. For instance, the case p = 3 is dealt with as follows. If R₁, R₂ are invertible, then

    ( R₁  A₁  0  ) ( I  −R₁⁻¹A₁   R₁⁻¹A₁R₂⁻¹A₂ )   =   ( R₁    0            0                    )
    ( 0   R₂  A₂ ) ( 0   I        −R₂⁻¹A₂      )       ( 0     R₂           0                    )
    ( A₃  0   R₃ ) ( 0   0         I           )       ( A₃   −A₃R₁⁻¹A₁    R₃ + A₃R₁⁻¹A₁R₂⁻¹A₂ ).

Derive similar factorisations for p > 3, and use this to prove the statement at the end of Exercise VIII.5.6.
VIII.7  Notes and References
Many of the topics in this chapter have been presented earlier in R. Bhatia, Perturbation Bounds for Matrix Eigenvalues, Longman, 1987, and in G.W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, 1990. Some results that were proved after the publication of these books have, of course, been included here.

The first major results on perturbation of roots of polynomials were proved by A. Ostrowski, Recherches sur la méthode de Graeffe et les zéros des polynômes et des séries de Laurent, Acta Math., 72 (1940) 99–257. See also Appendices A and B of his book Solution of Equations and Systems of Equations, Academic Press, 1960. Theorem VIII.2.2 is due to Ostrowski. Using this he proved an inequality weaker than (VIII.15); this had a factor (2n − 1) instead of 4. The argument used by him is the one followed in Exercise VIII.1.3. Ostrowski was also the first to derive perturbation bounds for eigenvalues of arbitrary matrices in his paper Über die Stetigkeit von charakteristischen Wurzeln in Abhängigkeit von den Matrizenelementen, Jber. Deut. Math.-Verein., 60 (1957) 40–42. See also Appendix K of his book cited above. The inequality he proved involved the matrix norm ‖A‖_L = Σ_{i,j} |a_ij|, which is easy to compute but is not unitarily invariant. With this norm, his inequality is like the one in (VIII.4).

An inequality for d(σ(A), σ(B)) in terms of the unitarily invariant Hilbert–Schmidt norm was proved by R. Bhatia and K.K. Mukherjea, On the rate of change of spectra of operators, Linear Algebra Appl., 27 (1979) 147–157. They followed the approach in Exercise VIII.2.6 and, after a little tidying up, their result looks like (VIII.4) but with the larger norm ‖·‖₂ instead of ‖·‖. This approach was followed, to a greater success, in R. Bhatia and S. Friedland, Variation of Grassmann powers and spectra, Linear Algebra Appl., 40 (1981) 1–18. In this paper, the norm ‖·‖ was used and an inequality slightly weaker than (VIII.4) was proved. An improvement of these inequalities in which (2n − 1) is replaced by n was made by L. Elsner, On the variation of the spectra of matrices, Linear Algebra Appl., 47 (1982) 127–138. The major insightful observation was that the Matching Theorem does not exploit the symmetry between the polynomials f and g, nor the matrices A and B, under consideration. Theorem VIII.1.1 is also due to L. Elsner, An optimal bound for the spectral variation of two matrices, Linear Algebra Appl., 71 (1985) 77–80. The argument using Chebyshev polynomials, that we have employed in Sections VIII.1 and VIII.2, seems to have been first used by A. Schönhage, Quasi-GCD computations, J. Complexity, 1 (1985) 118–137. (See Theorem 2.7 of this paper.) It was discovered independently by D. Phillips, Improving spectral variation bounds with Chebyshev polynomials, Linear Algebra Appl., 133 (1990) 165–173. Phillips proved a weaker inequality than (VIII.8)
with a factor 8 instead of 4. This argument was somewhat simplified and used again by R. Bhatia, L. Elsner, and G. Krause, Bounds for the variation of the roots of a polynomial and the eigenvalues of a matrix, Linear Algebra Appl., 142 (1990) 195–209. Theorems VIII.1.5 and VIII.2.4 (and their proofs) have been taken from this paper. Using finer results from Chebyshev approximation, G. Krause has shown that the factor 4 occurring in these inequalities can be replaced by 3.08. See his paper Bounds for the variation of matrix eigenvalues and polynomial roots, Linear Algebra Appl., 208/209 (1994) 73–82. It was shown by Bhatia, Elsner, and Krause in the paper cited above that, in the inequality (VIII.15), the factor 4 cannot be replaced by anything smaller than 2.

Theorems VIII.3.1 and VIII.3.3 were proved in the very influential paper, F.L. Bauer and C.T. Fike, Norms and exclusion theorems, Numer. Math., 2 (1960) 137–141. See the discussion in Stewart and Sun, p. 177. The basic idea behind results in Section VIII.3 from Theorem VIII.3.4 onwards is due to W. Kahan, Inclusion theorems for clusters of eigenvalues of Hermitian matrices, Technical Report, Computer Science Department, University of Toronto, 1967. Theorem VIII.3.4 for the special case of the operator norm is proved in this report. The inequality (VIII.23) is due to J.-G. Sun, On the perturbation of the eigenvalues of a normal matrix, Math. Numer. Sinica, 6 (1984) 334–336. The ideas of Kahan's and Sun's proofs are outlined in Problems VIII.6.8 and VIII.6.9. Theorem VIII.3.4, in its generality, was proved in R. Bhatia, C. Davis, and F. Kittaneh, Some inequalities for commutators and an application to spectral variation, Aequationes Math., 41 (1991) 70–78. The three corollaries were also proved there. These authors then used their commutator inequalities to derive weaker versions of Theorems VIII.3.9 and VIII.3.10; in all these, the square root in the inequalities (VIII.25) and (VIII.26) is missing. For the operator norm alone, the inequality (VIII.25) was proved by T.-X. Lu, Perturbation bounds for eigenvalues of symmetrizable matrices, Numerical Mathematics: a Journal of Chinese Universities, 16 (1994) 177–185 (in Chinese). The inequalities (VIII.25)–(VIII.27) have been proved recently by R. Bhatia, F. Kittaneh, and R.-C. Li, Some inequalities for commutators and an application to spectral variation II, Linear and Multilinear Algebra, to appear. The inequality in Problem VIII.6.10 was proved in R. Bhatia, L. Elsner, and G. Krause, Spectral variation bounds for diagonalisable matrices, Preprint 94-098, SFB 343, University of Bielefeld. Example VIII.3.8 (and another example illustrating the same phenomenon for the trace norm) was constructed in this paper. The inequality in Problem VIII.6.11 was found by L. Elsner and S. Friedland, Singular values, doubly stochastic matrices and applications, Linear Algebra Appl., 220 (1995) 161–169.

The results of Section VIII.4 were discovered by P.D. Lax, Differential equations, difference equations and matrix theory, Comm. Pure Appl. Math., 11 (1958) 175–194. Lax was motivated by the theory of linear partial
differential equations of hyperbolic type, and his proofs used techniques from this theory. The paper of Lax was followed by one by H.F. Weinberger, Remarks on the preceding paper of Lax, Comm. Pure Appl. Math., 11 (1958) 195–196. He gave simple matrix-theoretic proofs of these theorems, which we have reproduced here. L. Gårding later pointed out that these are special cases of his results for hyperbolic polynomials that appeared in his papers Linear hyperbolic partial differential equations with constant coefficients, Acta Math., 84 (1951) 1–62, and An inequality for hyperbolic polynomials, J. Math. Mech., 8 (1959) 957–966. A characterisation of the kind of spaces R discussed in Section VIII.4 was given by H. Wielandt, Lineare Scharen von Matrizen mit reellen Eigenwerten, Math. Z., 53 (1950) 219–225.

It was observed by R. Bhatia, On the rate of change of spectra of operators II, Linear Algebra Appl., 36 (1981) 25–32, that better perturbation bounds can be obtained for matrices whose eigenvalues occur in pairs ±λ. This was carried further in the paper Symmetries and variation of spectra, Canadian J. Math., 44 (1992) 1155–1166, by R. Bhatia and L. Elsner, who considered matrices whose eigenvalues are p-Carrollian. See also the paper by R. Bhatia and L. Elsner, The q-binomial theorem and spectral symmetry, Indag. Math., N.S., 4 (1993) 11–16. The material in Section VIII.5 is taken from these three papers.

The bound in Problem VIII.6.1 is due to Lagrange. There are several interesting and useful bounds known for the roots of a polynomial. Since the roots of a polynomial are the eigenvalues of its companion matrix, some of these bounds can be proved by using bounds for eigenvalues. An interesting discussion may be found in Horn and Johnson, Matrix Analysis, pages 316–319.

The Gersgorin Disk Theorem was proved in S.A. Gersgorin, Über die Abgrenzung der Eigenwerte einer Matrix, Izv. Akad. Nauk SSSR, Ser. Fiz.-Mat., 6 (1931) 749–754. A matrix is called diagonally dominant if |a_ii| > Σ_{j≠i} |a_ij|, 1 ≤ i ≤ n. Every diagonally dominant matrix is nonsingular. Gersgorin's Theorem is a corollary. This theorem is applied to the study of several perturbation problems in J.H. Wilkinson, The Algebraic Eigenvalue Problem. A comprehensive discussion is also given in Horn and Johnson, Matrix Analysis. The results of Problems VIII.6.4, VIII.6.5, and VIII.6.6 are due to P. Henrici, Bounds for iterates, inverses, spectral variation and fields of values of nonnormal matrices, Numer. Math., 4 (1962) 24–39. Several other very interesting results that involve the measure of nonnormality are proved in this paper. For example, we know that the numerical range W(A) of a matrix A contains the convex hull H(A) of the eigenvalues of A, and that the two sets are equal if A is normal. Henrici gives a bound for the distance between the boundaries of H(A) and W(A) in terms of the measure of nonnormality of A.
There are several different ways to measure the nonnormality of a matrix. Problem VIII.6.6 relates two such measures by an inequality. The relations between several different measures of nonnormality are discussed in L. Elsner and M.H.C. Paardekooper, On measures of nonnormality of matrices, Linear Algebra Appl., 92 (1987) 107–124. Is a nearly normal matrix near to an (exactly) normal matrix? More precisely, for every ε > 0, does there exist a δ > 0 such that if ‖A*A − AA*‖ < δ then there exists a normal B such that ‖A − B‖ < ε? The existence of such a δ for each fixed dimension n was shown by C. Pearcy and A. Shields, Almost commuting matrices, J. Funct. Anal., 33 (1979) 332–338. The problem of finding a δ depending only on ε but not on the dimension n is linked to several important questions in the theory of operator algebras. This has been shown to have an affirmative solution in a recent paper: H. Lin, Almost commuting selfadjoint matrices and applications, preprint, 1995. No explicit formula for δ is given in this paper. In an infinite-dimensional Hilbert space, the answer to this question is in the negative because of index obstructions. The inequality in Problem VIII.6.7 was proved in W. Kahan, B.N. Parlett, and E. Jiang, Residual bounds on approximate eigensystems of nonnormal matrices, SIAM J. Numer. Anal., 19 (1982) 470–484. The inequality in Problem VIII.6.12 was proved by W. Kahan, Spectra of nearly Hermitian matrices, Proc. Amer. Math. Soc., 48 (1975) 11–17. For 2 × 2 block-matrices, the idea of the argument in Problem VIII.6.14 is due to M.D. Choi, Almost commuting matrices need not be nearly commuting, Proc. Amer. Math. Soc., 102 (1988) 529–533. This was extended to higher order block-matrices by R. Bhatia and L. Elsner, Symmetries and variation of spectra, cited above.
IX
A Selection of Matrix Inequalities

In this chapter we will prove several inequalities for matrices. From the vast collection of such inequalities, we have selected a few that are simple and widely useful. Though they are of different kinds, their proofs have common ingredients already familiar to us from earlier chapters.

IX.1  Some Basic Lemmas
If A and B are any two matrices, then AB and BA have the same eigenvalues. (See Exercise I.3.7.) Hence, if f(A) is any function on the space of matrices that depends only on the eigenvalues of A, then f(AB) = f(BA). Examples of such functions are the spectral radius, the trace, and the determinant. If A is normal, then the spectral radius spr(A) is equal to ‖A‖. Using this, we can prove the following two useful propositions.

Proposition IX.1.1 Let A, B be any two matrices such that the product AB is normal. Then, for every unitarily invariant norm, we have

    |||AB||| ≤ |||BA|||.    (IX.1)

Proof. For the operator norm this is an easy consequence of the two facts mentioned above; we have

    ‖AB‖ = spr(AB) = spr(BA) ≤ ‖BA‖.

The general case needs more argument. Since AB is normal, s_j(AB) = |λ_j(AB)|, where |λ₁(AB)| ≥ ··· ≥ |λ_n(AB)| are the eigenvalues of AB
arranged in decreasing order of magnitude. But |λ_j(AB)| = |λ_j(BA)|. By Weyl's Majorant Theorem (Theorem II.3.6), the vector |λ(BA)| is weakly majorised by the vector s(BA). Hence we have the weak majorisation s(AB) ≺_w s(BA). From this the inequality (IX.1) follows. ∎
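Proposition IX.1.1 can be exercised numerically. The sketch below is an editorial illustration, not from the book; it assumes NumPy. To obtain a pair with AB normal, it takes an invertible A, a normal N, and sets B = A⁻¹N, so that AB = N while BA = A⁻¹NA; the operator norms are then compared.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 5

A = rng.standard_normal((n, n)) + n * np.eye(n)   # shift keeps A comfortably invertible
q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
N = q @ np.diag(rng.standard_normal(n) + 1j * rng.standard_normal(n)) @ q.conj().T
B = np.linalg.solve(A, N)                         # B = A^{-1} N

AB, BA = A @ B, B @ A
normal_ok = np.allclose(AB @ AB.conj().T, AB.conj().T @ AB)   # AB is normal

nab = np.linalg.norm(AB, 2)   # operator norm = largest singular value
nba = np.linalg.norm(BA, 2)
print(normal_ok, nab, nba)
```

As the proof shows, ‖AB‖ = spr(N) = spr(BA) ≤ ‖BA‖, so the first norm never exceeds the second.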
Proposition IX.1.2 Let A, B be any two matrices such that the product AB is Hermitian. Then, for every unitarily invariant norm, we have

    |||AB||| ≤ |||Re(BA)|||.    (IX.2)

Proof. The eigenvalues of BA, being the same as the eigenvalues of the Hermitian matrix AB, are all real. So, by Proposition III.5.3, we have the majorisation λ(BA) ≺ λ(Re BA). From this we have the weak majorisation |λ(BA)| ≺_w |λ(Re BA)|. (See Examples II.3.5.) The rest of the argument is the same as in the proof of the preceding proposition. ∎
Some of the inequalities proved in this chapter involve the matrix exponential. An extremely useful device in proving such results is the following theorem.
Theorem IX.1.3 (The Lie Product Formula) For any two matrices $A, B$,
$$\lim_{m \to \infty} \left( \exp\frac{A}{m}\, \exp\frac{B}{m} \right)^m = \exp(A + B). \tag{IX.3}$$

Proof. For any two matrices $X, Y$, and for $m = 1, 2, \ldots$, we have
$$X^m - Y^m = \sum_{j=0}^{m-1} X^{m-1-j} (X - Y) Y^j.$$
Using this we obtain
$$\|X^m - Y^m\| \le m\, M^{m-1} \|X - Y\|, \tag{IX.4}$$
where $M = \max(\|X\|, \|Y\|)$. Now let
$$X_m = \exp\frac{A+B}{m}, \qquad Y_m = \exp\frac{A}{m}\,\exp\frac{B}{m}, \qquad m = 1, 2, \ldots .$$
Then $\|X_m\|$ and $\|Y_m\|$ both are bounded above by $\exp\left(\frac{\|A\| + \|B\|}{m}\right)$. From the power series expansion for the exponential function, we see that
$$X_m - Y_m = \left[ I + \frac{A+B}{m} + \frac{1}{2}\Big(\frac{A+B}{m}\Big)^2 + \cdots \right] - \left[ I + \frac{A}{m} + \frac{1}{2}\Big(\frac{A}{m}\Big)^2 + \cdots \right]\left[ I + \frac{B}{m} + \frac{1}{2}\Big(\frac{B}{m}\Big)^2 + \cdots \right] = O\!\left(\frac{1}{m^2}\right)$$
for large $m$.
Hence, using the inequality (IX.4), we see that
$$\|X_m^m - Y_m^m\| \le m \exp(\|A\| + \|B\|)\, O\!\left(\frac{1}{m^2}\right).$$
This goes to zero as $m \to \infty$. But $X_m^m = \exp(A+B)$ for all $m$. Hence, $\lim_{m \to \infty} Y_m^m = \exp(A+B)$. This proves the theorem. ■
The reader should compare the inequality (IX.4) with the inequality in Problem I.6.11.
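The convergence asserted by the Lie Product Formula is easy to observe numerically. The following sketch (Python with NumPy; the series-based `expm` helper is our own illustrative choice, not from the text) compares $(e^{A/m}e^{B/m})^m$ with $e^{A+B}$ for increasing $m$:

```python
import numpy as np

def expm(X, terms=80):
    """Matrix exponential via its power series (adequate for moderate ||X||)."""
    E = np.eye(X.shape[0], dtype=complex)
    T = np.eye(X.shape[0], dtype=complex)
    for k in range(1, terms):
        T = T @ X / k
        E += T
    return E

rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((4, 4))
B = 0.5 * rng.standard_normal((4, 4))

target = expm(A + B)
errors = []
for m in (1, 10, 100, 1000):
    approx = np.linalg.matrix_power(expm(A / m) @ expm(B / m), m)
    errors.append(np.linalg.norm(approx - target, 2))
# The error behaves like O(1/m), as the proof above predicts.
```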
Exercise IX.1.4 Show that for any two matrices $A, B$,
$$\lim_{m \to \infty} \left( \exp\frac{B}{2m}\, \exp\frac{A}{m}\, \exp\frac{B}{2m} \right)^m = \exp(A + B).$$
Exercise IX.1.5 Show that for any two matrices $A, B$,
$$\lim_{t \to 0} \left( \exp\frac{tB}{2}\, \exp tA\, \exp\frac{tB}{2} \right)^{1/t} = \exp(A + B).$$

IX.2 Products of Positive Matrices
In this section we prove some inequalities for the norm, the spectral radius, and the eigenvalues of the product of two positive matrices.
Theorem IX.2.1 Let $A, B$ be positive matrices. Then, for $0 \le s \le 1$,
$$\|A^s B^s\| \le \|AB\|^s. \tag{IX.5}$$

Proof. Let
$$D = \{\, s \in [0,1] : \|A^s B^s\| \le \|AB\|^s \,\}.$$
Then $D$ is a closed subset of $[0,1]$ and contains the points 0 and 1. So, to prove the theorem, it suffices to prove that if $s$ and $t$ are in $D$, then so is $\frac{s+t}{2}$. We have
$$\big\|A^{\frac{s+t}{2}} B^{\frac{s+t}{2}}\big\|^2 = \big\|B^{\frac{s+t}{2}} A^{s+t} B^{\frac{s+t}{2}}\big\| = \mathrm{spr}\big(B^{\frac{s+t}{2}} A^{s+t} B^{\frac{s+t}{2}}\big) = \mathrm{spr}(B^s A^{s+t} B^t) \le \|B^s A^{s+t} B^t\| \le \|B^s A^s\|\, \|A^t B^t\| = \|A^s B^s\|\, \|A^t B^t\|.$$
At the first step we used the relation $\|T\|^2 = \|T^*T\|$, and at the last step the relation $\|T^*\| = \|T\|$ for all $T$. If $s, t$ are in $D$, this shows that
$$\big\|A^{\frac{s+t}{2}} B^{\frac{s+t}{2}}\big\| \le \|AB\|^{(s+t)/2},$$
and this proves the theorem. ■
An equivalent formulation of the above theorem is given below, with another proof that is illuminating.
Theorem IX.2.2 If $A, B$ are positive matrices with $\|AB\| \le 1$, then $\|A^s B^s\| \le 1$ for $0 \le s \le 1$.

Proof. We can assume that $A > 0$. The general case follows from this by a continuity argument. We then have the chain of implications
$$\|AB\| \le 1 \Rightarrow \|AB^2A\| \le 1 \Rightarrow AB^2A \le I \Rightarrow B^2 \le A^{-2} \ \text{(by Lemma V.1.5)} \Rightarrow B^{2s} \le A^{-2s} \ \text{(by Theorem V.1.9)} \Rightarrow A^s B^{2s} A^s \le I \ \text{(by Lemma V.1.5)} \Rightarrow \|A^s B^{2s} A^s\| \le 1 \Rightarrow \|A^s B^s\| \le 1. \ \blacksquare$$
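Theorem IX.2.1 (equivalently, Theorem IX.2.2) can be spot-checked numerically. The sketch below, with helper names of our own, draws random positive matrices and tests $\|A^sB^s\| \le \|AB\|^s$:

```python
import numpy as np

def rand_psd(n, rng):
    """A random positive semidefinite matrix G G^T (our helper)."""
    G = rng.standard_normal((n, n))
    return G @ G.T

def psd_power(A, s):
    """A^s for positive semidefinite A, via the spectral decomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0, None)**s) @ V.T

rng = np.random.default_rng(1)
violations = 0
for _ in range(100):
    A, B = rand_psd(4, rng), rand_psd(4, rng)
    nAB = np.linalg.norm(A @ B, 2)         # operator norm ||AB||
    for s in (0.1, 0.3, 0.5, 0.7, 0.9):
        lhs = np.linalg.norm(psd_power(A, s) @ psd_power(B, s), 2)
        if lhs > nAB**s * (1 + 1e-10):     # small relative tolerance
            violations += 1
```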
Another equivalent formulation is the following theorem.
Theorem IX.2.3 Let $A, B$ be positive matrices. Then, for $t \ge 1$,
$$\|AB\|^t \le \|A^t B^t\|. \tag{IX.6}$$

Proof. From Theorem IX.2.1, we have $\|A^{1/t} B^{1/t}\| \le \|AB\|^{1/t}$ for $t \ge 1$. Replace $A, B$ by $A^t, B^t$, respectively. ■

Exercise IX.2.4 Let $A, B$ be positive matrices. Then
(i) $\|A^{1/t} B^{1/t}\|^t$ is a monotonically decreasing function of $t$ on $(0, \infty)$.
(ii) $\|A^t B^t\|^{1/t}$ is a monotonically increasing function of $t$ on $(0, \infty)$.
In Section 5 we will see that the inequalities (IX.5) and (IX.6) are, in fact, valid for all unitarily invariant norms. Results akin to the ones above can be proved for the spectral radius in place of the norm. This is done below. If $A$ and $B$ are positive, the eigenvalues of $AB$ are positive. (They are the same as the eigenvalues of the positive matrix $A^{1/2} B A^{1/2}$.) If $T$ is any matrix with positive eigenvalues, we will enumerate its eigenvalues as $\lambda_1(T) \ge \lambda_2(T) \ge \cdots \ge \lambda_n(T) > 0$. Thus $\lambda_1(T)$ is equal to the spectral radius $\mathrm{spr}(T)$.

Theorem IX.2.5 If $A, B$ are positive matrices with $\lambda_1(AB) \le 1$, then $\lambda_1(A^s B^s) \le 1$ for $0 \le s \le 1$.

Proof. As in the proof of Theorem IX.2.2, we can assume that $A > 0$. We then have the chain of implications
$$\lambda_1(AB) \le 1 \Rightarrow \lambda_1(A^{1/2} B A^{1/2}) \le 1 \Rightarrow A^{1/2} B A^{1/2} \le I \Rightarrow B \le A^{-1} \Rightarrow B^s \le A^{-s} \Rightarrow A^{s/2} B^s A^{s/2} \le I \Rightarrow \lambda_1(A^{s/2} B^s A^{s/2}) \le 1 \Rightarrow \lambda_1(A^s B^s) \le 1.$$
This proves the theorem. ■
It should be noted that all implications in this proof and that of Theorem IX.2.2 are reversible with one exception: if $A \ge B \ge 0$, then $A^s \ge B^s$ for $0 \le s \le 1$, but the converse is not true.
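The failure of the converse can be exhibited concretely. In the sketch below (our own example, not one from the text), $A^{1/2} \ge B^{1/2}$ holds while $A \ge B$ fails:

```python
import numpy as np

# Take X >= Y >= 0 whose squares are NOT comparable, and set A = X^2, B = Y^2.
X = np.array([[2.0, 1.0], [1.0, 1.0]])   # positive definite
Y = np.array([[1.0, 0.0], [0.0, 0.0]])   # positive semidefinite, X - Y >= 0
A, B = X @ X, Y @ Y                      # so A^(1/2) = X, B^(1/2) = Y

def is_psd(M):
    """Check positive semidefiniteness up to round-off."""
    return np.linalg.eigvalsh(M).min() >= -1e-12

root_order = is_psd(X - Y)   # A^(1/2) >= B^(1/2) holds
full_order = is_psd(A - B)   # but A >= B fails: det(A - B) = -1 < 0
```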
Theorem IX.2.6 Let $A, B$ be positive matrices. Then, for $0 \le s \le 1$,
$$\lambda_1(A^s B^s) \le \lambda_1^s(AB). \tag{IX.7}$$

Proof. Let $\lambda_1(AB) = \alpha^2$. If $\alpha \ne 0$, we have $\lambda_1\left(\frac{A}{\alpha}\,\frac{B}{\alpha}\right) = 1$. So, by Theorem IX.2.5, $\lambda_1(A^s B^s) \le \alpha^{2s} = \lambda_1^s(AB)$. If $\alpha = 0$, we have $\lambda_1(A^{1/2} B A^{1/2}) = 0$, and hence $A^{1/2} B A^{1/2} = 0$. From this it follows that the range of $A$ is contained in the kernel of $B$. But then $A^{s/2} B^s A^{s/2} = 0$, and hence $\lambda_1(A^s B^s) = 0$. ■
Exercise IX.2.7 Let $A, B$ be positive matrices. Show that, for $t \ge 1$,
$$\lambda_1^t(AB) \le \lambda_1(A^t B^t). \tag{IX.8}$$

Exercise IX.2.8 Let $A, B$ be positive matrices. Show that
(i) $[\lambda_1(A^{1/t} B^{1/t})]^t$ is a monotonically decreasing function of $t$ on $(0, \infty)$.
(ii) $[\lambda_1(A^t B^t)]^{1/t}$ is a monotonically increasing function of $t$ on $(0, \infty)$.
Using familiar arguments involving antisymmetric tensor products, we can now obtain stronger results.
Theorem IX.2.9 Let $A, B$ be positive matrices. Then, for $0 < t \le u < \infty$, we have the weak majorisation
$$\lambda^{1/t}(A^t B^t) \prec_w \lambda^{1/u}(A^u B^u). \tag{IX.9}$$

Proof. For $k = 1, 2, \ldots, n$, consider the operators $\wedge^k A$ and $\wedge^k B$. The result of Exercise IX.2.8(ii) applied to these operators in place of $A, B$ yields the inequalities
$$\prod_{j=1}^{k} \lambda_j^{1/t}(A^t B^t) \le \prod_{j=1}^{k} \lambda_j^{1/u}(A^u B^u) \tag{IX.10}$$
for $k = 1, 2, \ldots, n$. The assertion of II.3.5(vii) now leads to the majorisation (IX.9). ■
Theorem IX.2.10 Let $A, B$ be positive matrices. Then for every unitarily invariant norm we have
$$|||B^t A^t B^t||| \le |||(BAB)^t|||, \quad \text{for } 0 \le t \le 1, \tag{IX.11}$$
$$|||(BAB)^t||| \le |||B^t A^t B^t|||, \quad \text{for } t \ge 1. \tag{IX.12}$$

Proof. We have
$$\|A^{t/2} B^t\| \le \|A^{1/2} B\|^t$$
for $0 \le t \le 1$, by Theorem IX.2.1. So
$$\|B^t A^t B^t\| = \|A^{t/2} B^t\|^2 \le \|A^{1/2} B\|^{2t} = \|BAB\|^t.$$
This is the same as saying that
$$s_1(B^t A^t B^t) \le s_1((BAB)^t).$$
Replacing $A$ and $B$ by their antisymmetric tensor powers, we obtain, for $1 \le k \le n$,
$$\prod_{j=1}^{k} s_j(B^t A^t B^t) \le \prod_{j=1}^{k} s_j^t(BAB).$$
By the argument used in the preceding theorem, this gives the majorisation
$$s(B^t A^t B^t) \prec_w s((BAB)^t),$$
which gives the inequality (IX.11). The inequality (IX.12) is proved in exactly the same way. ■
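Taking the trace norm in (IX.11) and (IX.12) (on positive matrices the trace norm is just the trace), the theorem can be checked numerically; the helpers below are our own:

```python
import numpy as np

def rand_psd(n, rng):
    """A random positive semidefinite matrix G G^T (our helper)."""
    G = rng.standard_normal((n, n))
    return G @ G.T

def psd_power(M, t):
    """M^t for a positive semidefinite M, via the spectral decomposition."""
    M = (M + M.T) / 2                      # symmetrise against round-off
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0, None)**t) @ V.T

rng = np.random.default_rng(2)
ok = True
for _ in range(50):
    A, B = rand_psd(3, rng), rand_psd(3, rng)
    for t in (0.25, 0.5, 0.75):            # (IX.11): tr(B^t A^t B^t) <= tr((BAB)^t)
        lhs = np.trace(psd_power(B, t) @ psd_power(A, t) @ psd_power(B, t))
        ok &= lhs <= np.trace(psd_power(B @ A @ B, t)) * (1 + 1e-9)
    for t in (1.5, 2.0, 3.0):              # (IX.12): tr((BAB)^t) <= tr(B^t A^t B^t)
        lhs = np.trace(psd_power(B @ A @ B, t))
        rhs = np.trace(psd_power(B, t) @ psd_power(A, t) @ psd_power(B, t))
        ok &= lhs <= rhs * (1 + 1e-9)
```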
Exercise IX.2.11 Derive (as a special case of the above theorem) the following inequality of Araki–Lieb–Thirring. Let $A, B$ be positive matrices, and let $s, t$ be positive real numbers with $t \ge 1$. Then
$$|||(B^s A^s B^s)^t||| \le |||B^{st} A^{st} B^{st}|||. \tag{IX.13}$$

IX.3 Inequalities for the Exponential Function

For every complex number $z$, we have $|e^z| = |e^{\mathrm{Re}\,z}|$. Our first theorem is a matrix version of this.
Theorem IX.3.1 Let $A$ be any matrix. Then
$$|||e^A||| \le |||e^{\mathrm{Re}\,A}||| \tag{IX.14}$$
for every unitarily invariant norm.
Proof. For each positive integer $m$, we have $\|A^m\| \le \|A\|^m$. This is the same as saying that $s_1(A^m) \le s_1^m(A)$, or $s_1(A^{*m} A^m) \le s_1^m(A^* A)$. Replacing $A$ by $\wedge^k A$, we obtain for $1 \le k \le n$
$$\prod_{j=1}^{k} s_j(A^{*m} A^m) \le \prod_{j=1}^{k} s_j([A^* A]^m).$$
Now, if we replace $A$ by $e^{A/m}$, we obtain
$$\prod_{j=1}^{k} s_j(e^{A^*} e^{A}) \le \prod_{j=1}^{k} s_j\big([e^{A^*/m} e^{A/m}]^m\big).$$
Letting $m \to \infty$, and using the Lie Product Formula, we obtain
$$\prod_{j=1}^{k} s_j(e^{A^*} e^{A}) \le \prod_{j=1}^{k} s_j(e^{A^* + A}).$$
Taking square roots, we get
$$\prod_{j=1}^{k} s_j(e^A) \le \prod_{j=1}^{k} s_j(e^{\mathrm{Re}\,A}).$$
This gives the majorisation
$$s(e^A) \prec_w s(e^{\mathrm{Re}\,A})$$
(see II.3.5(vii)), and hence the inequality (IX.14). ■
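The proof actually establishes the weak majorisation $s(e^A) \prec_w s(e^{\mathrm{Re}\,A})$, which can be tested directly on the partial sums of singular values; a sketch, with a series-based exponential of our own:

```python
import numpy as np

def expm_series(X, terms=80):
    """Matrix exponential via its power series (fine for moderate ||X||)."""
    E = np.eye(X.shape[0], dtype=complex)
    T = np.eye(X.shape[0], dtype=complex)
    for k in range(1, terms):
        T = T @ X / k
        E += T
    return E

rng = np.random.default_rng(3)
ok = True
for _ in range(50):
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    sA = np.linalg.svd(expm_series(A), compute_uv=False)                 # s(e^A)
    sR = np.linalg.svd(expm_series((A + A.conj().T) / 2), compute_uv=False)  # s(e^{Re A})
    # weak majorisation: every partial sum on the left is dominated
    ok &= np.all(np.cumsum(sA) <= np.cumsum(sR) * (1 + 1e-9))
```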
It is easy to construct an example of a $2 \times 2$ matrix $A$ for which $|||e^A|||$ and $|||e^{\mathrm{Re}\,A}|||$ are not equal. Our next theorem is valid for a large class of functions. It will be convenient to give this class a name.
Definition IX.3.2 A continuous complex-valued function $f$ on the space of matrices will be said to belong to the class $\mathcal{T}$ if it satisfies the following two properties:
(i) $f(XY) = f(YX)$ for all $X, Y$.
(ii) $|f(X^{2m})| \le f([XX^*]^m)$ for all $X$, and for $m = 1, 2, \ldots$.

Exercise IX.3.3 (i) The functions trace and determinant are in $\mathcal{T}$.
(ii) For every $k$, $1 \le k \le n$, the function $\varphi_k(X) = \mathrm{tr} \wedge^k X$ is in $\mathcal{T}$. (These are the coefficients in the characteristic polynomial of $X$.)
(iii) Let $\lambda_j(X)$ denote the eigenvalues of $X$ arranged so that $|\lambda_1(X)| \ge |\lambda_2(X)| \ge \cdots \ge |\lambda_n(X)|$. Then, for $1 \le k \le n$, the function $f_k(X) = \prod_{j=1}^{k} \lambda_j(X)$ is in $\mathcal{T}$, and so is the function $|f_k(X)|$. [Hint: Use Theorem II.3.6.]
(iv) For $1 \le k \le n$, the function $g_k(X) = \sum_{j=1}^{k} |\lambda_j(X)|$ is in $\mathcal{T}$.
(v) Every symmetric gauge function of the numbers $|\lambda_1(X)|, \ldots, |\lambda_n(X)|$ is in $\mathcal{T}$.

Exercise IX.3.4 (i) If $f$ is any complex-valued function on the space of matrices that satisfies the condition (ii) in Definition IX.3.2, then $f(A) \ge 0$ if $A \ge 0$. In particular, $f(e^A) \ge 0$ for every Hermitian matrix $A$.
(ii) If $f$ satisfies both conditions (i) and (ii) in Definition IX.3.2, then $f(AB) \ge 0$ if $A$ and $B$ are both positive. In particular, $f(e^A e^B) \ge 0$ if $A$ and $B$ are Hermitian.

The principal result about the class $\mathcal{T}$ is the following.
Theorem IX.3.5 Let $f$ be a function in the class $\mathcal{T}$. Then for all matrices $A, B$, we have
$$|f(e^{A+B})| \le f(e^{\mathrm{Re}\,A}\, e^{\mathrm{Re}\,B}). \tag{IX.15}$$

Proof. For each positive integer $m$, we have for all $X, Y$
$$|f([XY]^{2^m})| \le f([XY Y^* X^*]^{2^{m-1}}) = f([X^*X\, YY^*]^{2^{m-1}}).$$
Here, the inequality at the first step is a consequence of the property (ii), and the equality at the last step is a consequence of the property (i) of functions in $\mathcal{T}$. Repeat this argument to obtain
$$|f([XY]^{2^m})| \le f([(X^*X)^2 (YY^*)^2]^{2^{m-2}}) \le \cdots \le f([X^*X]^{2^{m-1}} [YY^*]^{2^{m-1}}).$$
Now let $A, B$ be any two matrices. Put $X = e^{A/2^m}$ and $Y = e^{B/2^m}$ in the above inequality to obtain
$$|f([e^{A/2^m} e^{B/2^m}]^{2^m})| \le f([e^{A^*/2^m} e^{A/2^m}]^{2^{m-1}} [e^{B/2^m} e^{B^*/2^m}]^{2^{m-1}}).$$
Now let $m \to \infty$. Then, by the continuity of $f$ and the Lie Product Formula, we can conclude from the above inequality that
$$|f(e^{A+B})| \le f(e^{(A+A^*)/2}\, e^{(B+B^*)/2}) = f(e^{\mathrm{Re}\,A}\, e^{\mathrm{Re}\,B}). \ \blacksquare$$

Corollary IX.3.6 Let $f$ be a function in the class $\mathcal{T}$. Then
$$|f(e^A)| \le f(e^{\mathrm{Re}\,A}) \tag{IX.16}$$
for every matrix $A$, and
$$f(e^{A+B}) \le f(e^A e^B) \tag{IX.17}$$
for all Hermitian matrices $A, B$.

Particularly noteworthy is the following special consequence.
Theorem IX.3.7 Let $A, B$ be any two Hermitian matrices. Then
$$|||e^{A+B}||| \le |||e^A e^B||| \tag{IX.18}$$
for every unitarily invariant norm.

Proof. Use (IX.17) for the special functions in Exercise IX.3.3(iv). This gives the majorisation
$$\lambda(e^{A+B}) \prec_w |\lambda(e^A e^B)|,$$
and hence, by Weyl's Majorant Theorem, the weak majorisation $s(e^{A+B}) \prec_w s(e^A e^B)$. This proves the theorem. ■

Choosing $f(X) = \mathrm{tr}\,X$, we get from (IX.17) the famous Golden–Thompson inequality: for Hermitian $A, B$ we have
$$\mathrm{tr}\, e^{A+B} \le \mathrm{tr}(e^A e^B). \tag{IX.19}$$
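The Golden–Thompson inequality (IX.19) is easy to verify numerically for random Hermitian matrices; a sketch, with an eigendecomposition-based exponential of our own:

```python
import numpy as np

def expm_h(H):
    """exp of a Hermitian (here real symmetric) matrix via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

rng = np.random.default_rng(4)
gaps = []
for _ in range(50):
    A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
    B = rng.standard_normal((4, 4)); B = (B + B.T) / 2
    lhs = np.trace(expm_h(A + B)).real        # tr e^{A+B}
    rhs = np.trace(expm_h(A) @ expm_h(B)).real  # tr(e^A e^B)
    gaps.append(rhs - lhs)                    # should never be (significantly) negative
```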
Exercise IX.3.8 Let $A, B$ be Hermitian matrices. Show that for every unitarily invariant norm
$$\left|\left|\left| \left( \exp\frac{tB}{2}\, \exp tA\, \exp\frac{tB}{2} \right)^{1/t} \right|\right|\right|$$
decreases to $|||\exp(A+B)|||$ as $t \downarrow 0$. As a special consequence of this we have a stronger version of the Golden–Thompson inequality:
$$\mathrm{tr}\exp(A+B) \le \mathrm{tr}\left( \exp\frac{tB}{2}\, \exp tA\, \exp\frac{tB}{2} \right)^{1/t} \quad \text{for all } t > 0.$$
[Use Theorem IX.2.10 and Exercise IX.1.5.]

IX.4 Arithmetic-Geometric Mean Inequalities

The classical arithmetic-geometric mean inequality for numbers says that $\sqrt{ab} \le \frac{1}{2}(a+b)$ for all positive numbers $a, b$. From this we see that for complex numbers $a, b$ we have $|ab| \le \frac{1}{2}(|a|^2 + |b|^2)$. In this section we obtain some matrix versions of this inequality. Several corollaries of these inequalities are derived in this and later sections.
Lemma IX.4.1 Let $Y_1, Y_2$ be any two positive matrices, and let $Y = Y_1 - Y_2$. Let $Y = Y^+ - Y^-$ be the Jordan decomposition of the Hermitian matrix $Y$. Then, for $j = 1, 2, \ldots, n$,
$$\lambda_j(Y^+) \le \lambda_j(Y_1) \quad \text{and} \quad \lambda_j(Y^-) \le \lambda_j(Y_2).$$
(See Section IV.3 for the definition of the Jordan decomposition.)

Proof. Suppose $\lambda_j(Y)$ is nonnegative for $j = 1, \ldots, p$ and negative for $j = p+1, \ldots, n$. Then $\lambda_j(Y^+)$ is equal to $\lambda_j(Y)$ if $j = 1, \ldots, p$, and is zero for $j = p+1, \ldots, n$. Since $Y_1 = Y + Y_2 \ge Y$, we have $\lambda_j(Y_1) \ge \lambda_j(Y)$ for all $j$, by Weyl's Monotonicity Principle. Hence, $\lambda_j(Y_1) \ge \lambda_j(Y^+)$ for all $j$. Since $Y_2 = Y_1 - Y \ge -Y$, we have $\lambda_j(Y_2) \ge \lambda_j(-Y)$ for all $j$. But $\lambda_j(-Y) = \lambda_j(Y^-)$ for $j = 1, \ldots, n-p$, and $\lambda_j(Y^-) = 0 \le \lambda_j(Y_2)$ for $j > n-p$. Hence, $\lambda_j(Y_2) \ge \lambda_j(Y^-)$ for all $j$. ■
Theorem IX.4.2 Let $A, B$ be any two matrices. Then
$$s_j(A^* B) \le \tfrac{1}{2}\, \lambda_j(AA^* + BB^*) \tag{IX.20}$$
for $1 \le j \le n$.

Proof. Let $X$ be the $2n \times 2n$ matrix $X = \begin{pmatrix} A & B \\ 0 & 0 \end{pmatrix}$. Then
$$XX^* = \begin{pmatrix} AA^* + BB^* & 0 \\ 0 & 0 \end{pmatrix}, \qquad X^*X = \begin{pmatrix} A^*A & A^*B \\ B^*A & B^*B \end{pmatrix}.$$
The off-diagonal part of $X^*X$ can be written as
$$Y = \begin{pmatrix} 0 & A^*B \\ B^*A & 0 \end{pmatrix} = \frac{1}{2}\left\{ X^*X - U(X^*X)U^* \right\},$$
where $U$ is the unitary matrix $\begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}$. Note that both of the matrices in the braces above are positive. Hence, by the preceding lemma,
$$\lambda_j(Y^{\pm}) \le \tfrac{1}{2}\, \lambda_j(X^*X).$$
But $X^*X$ and $XX^*$ have the same eigenvalues. Hence, both $\lambda_j(Y^+)$ and $\lambda_j(Y^-)$ are bounded above by $\frac{1}{2}\lambda_j(AA^* + BB^*)$. Now note that, by Exercise II.1.15, the eigenvalues of $Y$ are the singular values of $A^*B$ together with their negatives. Hence we have the inequality (IX.20). ■

Corollary IX.4.3 Let $A, B$ be any two matrices. Then there exists a unitary matrix $U$ such that
$$|A^*B| \le \tfrac{1}{2}\, U(AA^* + BB^*)U^*. \tag{IX.21}$$

Corollary IX.4.4 Let $A, B$ be any two matrices. Then
$$|||A^*B||| \le \tfrac{1}{2}\, |||AA^* + BB^*||| \tag{IX.22}$$
for every unitarily invariant norm.

The particular position of the stars in (IX.20), (IX.21), and (IX.22) is not an accident. If we have
$$A = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix},$$
then $s_1(AB) = \sqrt{2}$, but $\frac{1}{2}s_1(AA^* + BB^*) = 1$. The presence of the unitary $U$ in (IX.21) is also essential: it cannot be replaced by the identity matrix even when $A, B$ are Hermitian. This is illustrated, for instance, by the pair $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$. A considerable strengthening of the inequality (IX.22) is given in the theorem below.
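Inequality (IX.20) — with the stars placed as stated — can be checked numerically; a sketch (real matrices, so $A^* = A^T$):

```python
import numpy as np

rng = np.random.default_rng(5)
ok = True
for _ in range(100):
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    s = np.linalg.svd(A.T @ B, compute_uv=False)        # s_j(A* B), decreasing
    lam = np.linalg.eigvalsh(A @ A.T + B @ B.T)[::-1]   # eigenvalues, decreasing
    ok &= np.all(s <= lam / 2 * (1 + 1e-10))            # s_j <= (1/2) lambda_j
```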
Theorem IX.4.5 For any three matrices $A, B, X$, we have
$$|||A^* X B||| \le \tfrac{1}{2}\, |||AA^*X + XBB^*||| \tag{IX.23}$$
for every unitarily invariant norm.

Proof. First consider the special case when $A = B$, and $A, X$ are Hermitian. Then $AXA$ is also Hermitian. So, by Proposition IX.1.2,
$$|||AXA||| \le |||\mathrm{Re}(A^2 X)||| = \tfrac{1}{2}\, |||A^2 X + XA^2|||, \tag{IX.24}$$
which is just the desired inequality in this special case.
Next consider the more general situation, when $A$ and $B$ are Hermitian and $X$ is any matrix. Let
$$T = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & X \\ X^* & 0 \end{pmatrix}.$$
Then, by the special case considered above,
$$|||TYT||| \le \tfrac{1}{2}\, |||T^2 Y + Y T^2|||. \tag{IX.25}$$
Multiplying out the block-matrices, one sees that
$$TYT = \begin{pmatrix} 0 & AXB \\ BX^*A & 0 \end{pmatrix}, \qquad T^2 Y + Y T^2 = \begin{pmatrix} 0 & A^2X + XB^2 \\ B^2X^* + X^*A^2 & 0 \end{pmatrix}.$$
Hence, we obtain from (IX.25) the inequality
$$|||AXB||| \le \tfrac{1}{2}\, |||A^2 X + X B^2|||. \tag{IX.26}$$
Finally, let $A, B, X$ be any matrices. Let $A = A_1 U$, $B = B_1 V$ be polar decompositions of $A$ and $B$. Then
$$AA^*X + XBB^* = A_1^2 X + X B_1^2,$$
while
$$|||A^*XB||| = |||U^* A_1 X B_1 V||| = |||A_1 X B_1|||.$$
So, the theorem follows from the inequality (IX.26). ■
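Theorem IX.4.5 holds for every unitarily invariant norm; the sketch below tests (IX.23) in the spectral, Frobenius, and trace norms simultaneously:

```python
import numpy as np

rng = np.random.default_rng(6)
ok = True
for _ in range(100):
    A, B, X = (rng.standard_normal((4, 4)) for _ in range(3))
    L = A.T @ X @ B                         # A* X B (real case)
    R = (A @ A.T @ X + X @ B @ B.T) / 2     # (1/2)(AA*X + XBB*)
    for ord_ in (2, 'fro', 'nuc'):          # three unitarily invariant norms
        ok &= np.linalg.norm(L, ord_) <= np.linalg.norm(R, ord_) * (1 + 1e-10)
```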
Exercise IX.4.6 Another proof of the theorem can be obtained as follows. First prove the inequality (IX.24) for Hermitian $A$ and $X$. Then, for arbitrary $A, B$, and $X$, let $T$ and $Y$ be the matrices
$$T = \begin{pmatrix} 0 & 0 & A & 0 \\ 0 & 0 & 0 & B \\ A^* & 0 & 0 & 0 \\ 0 & B^* & 0 & 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & X & 0 & 0 \\ X^* & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$
and apply the special case to them.

Exercise IX.4.7 Construct an example to show that the inequality (IX.20) cannot be strengthened in the way that (IX.23) strengthens (IX.22).

When $A, B$ are both positive, we can prove a result stronger than (IX.26). This is the next theorem.
Theorem IX.4.8 Let $A, B$ be positive matrices and let $X$ be any matrix. Then, for each unitarily invariant norm, the function
$$f(t) = |||A^{1+t} X B^{1-t} + A^{1-t} X B^{1+t}||| \tag{IX.27}$$
is convex on the interval $[-1, 1]$ and attains its minimum at $t = 0$.

Proof. Without loss of generality, we may assume that $A > 0$ and $B > 0$. Since $f$ is continuous and $f(t) = f(-t)$, both conclusions will follow if we show that $f(t) \le \frac{1}{2}[f(t+s) + f(t-s)]$ whenever $t \pm s$ are in $[-1, 1]$. For each $t$, let $\mathcal{M}_t$ be the mapping
$$\mathcal{M}_t(Y) = \frac{1}{2}(A^t Y B^{-t} + A^{-t} Y B^t).$$
For each $Y$, we have
$$|||Y||| = |||A^t (A^{-t} Y B^{-t}) B^t||| \le \tfrac{1}{2}\, |||A^t Y B^{-t} + A^{-t} Y B^t|||$$
by Theorem IX.4.5. Thus $|||Y||| \le |||\mathcal{M}_t(Y)|||$. From this it follows that
$$|||\mathcal{M}_t(AXB)||| \le |||\mathcal{M}_s \mathcal{M}_t(AXB)|||$$
for all $s, t$. But
$$\mathcal{M}_s \mathcal{M}_t = \tfrac{1}{2}(\mathcal{M}_{t+s} + \mathcal{M}_{t-s}).$$
So we have
$$|||\mathcal{M}_t(AXB)||| \le \tfrac{1}{2}\left\{ |||\mathcal{M}_{t+s}(AXB)||| + |||\mathcal{M}_{t-s}(AXB)||| \right\}.$$
Since $|||\mathcal{M}_t(AXB)||| = \frac{1}{2} f(t)$, this shows that
$$f(t) \le \tfrac{1}{2}\,[f(t+s) + f(t-s)].$$
This proves the theorem. ■

Corollary IX.4.9 Let $A, B$ be positive matrices and $X$ any matrix. Then, for each unitarily invariant norm, the function
$$g(\nu) = |||A^{\nu} X B^{1-\nu} + A^{1-\nu} X B^{\nu}||| \tag{IX.28}$$
is convex on $[0, 1]$.

Proof. Replace $A, B$ by $A^{1/2}, B^{1/2}$ in (IX.27). Then put $\nu = \frac{1-t}{2}$. ■

Corollary IX.4.10 Let $A, B$ be positive matrices and let $X$ be any matrix. Then, for $0 \le \nu \le 1$ and for every unitarily invariant norm,
$$|||A^{\nu} X B^{1-\nu} + A^{1-\nu} X B^{\nu}||| \le |||AX + XB|||. \tag{IX.29}$$

Proof. Let $g(\nu)$ be the function defined in (IX.28). Note that $g(1) = g(0) = |||AX + XB|||$. So the assertion follows from the convexity of $g$. ■

IX.5 Schwarz Inequalities
In this section we shall prove some inequalities that can be considered to be matrix versions of the Cauchy–Schwarz inequality. Some inequalities of this kind have already been proved in Chapter 4. Let $A, B$ be any two matrices and $r$ any positive real number. Then, we saw that
$$|||\, |A^*B|^r \,|||^2 \le |||(AA^*)^r|||\; |||(BB^*)^r||| \tag{IX.30}$$
for every unitarily invariant norm. The choice $r = \frac{1}{2}$ gives the inequality
$$|||\, |A^*B|^{1/2} \,|||^2 \le |||(AA^*)^{1/2}|||\; |||(BB^*)^{1/2}|||, \tag{IX.31}$$
while the choice $r = 1$ gives
$$|||A^*B|||^2 \le |||AA^*|||\; |||BB^*|||. \tag{IX.32}$$
See Exercise IV.2.7 and Problem IV.5.7. It was noted there that the inequality (IX.32) is included in (IX.31). We will now obtain more general versions of these in the same spirit as that of Theorem IX.4.5. The generalisation of (IX.32) is proved easily and is given first, even though it is subsumed in the theorem that follows it.
Theorem IX.5.1 Let $A, B, X$ be any three matrices. Then, for every unitarily invariant norm,
$$|||A^*XB|||^2 \le |||AA^*X|||\; |||XBB^*|||. \tag{IX.33}$$

Proof. First assume that $X$ is a positive matrix. Then
$$|||A^*XB|||^2 = |||A^*X^{1/2} \cdot X^{1/2}B|||^2 = |||(X^{1/2}A)^*(X^{1/2}B)|||^2 \le |||X^{1/2}AA^*X^{1/2}|||\; |||X^{1/2}BB^*X^{1/2}|||,$$
using the inequality (IX.32). Now use Proposition IX.1.1 to conclude that
$$|||A^*XB|||^2 \le |||AA^*X|||\; |||XBB^*|||.$$
This proves the theorem in this special case. Now let $X$ be any matrix, and let $X = UP$ be its polar decomposition. Then, by unitary invariance,
$$|||A^*XB||| = |||U^*A^*UPB|||, \qquad |||AA^*X||| = |||AA^*UP||| = |||U^*AA^*UP|||, \qquad |||XBB^*||| = |||UPBB^*||| = |||PBB^*|||.$$
So, the general theorem follows by applying the special case to the triple $U^*AU, B, P$. ■
Theorem IX.5.2 Let $A, B, X$ be any three matrices. Then, for every positive real number $r$, and for every unitarily invariant norm, we have
$$|||\, |A^*XB|^r \,|||^2 \le |||\, |AA^*X|^r \,|||\; |||\, |XBB^*|^r \,|||. \tag{IX.34}$$

Proof. Let $X = UP$ be the polar decomposition of $X$. Then
$$A^*XB = A^*UPB = (P^{1/2}U^*A)^* P^{1/2}B.$$
So, from the inequality (IX.30), we have
$$|||\, |A^*XB|^r \,|||^2 \le |||(P^{1/2}U^*AA^*UP^{1/2})^r|||\; |||(P^{1/2}BB^*P^{1/2})^r|||. \tag{IX.35}$$
Now note that
$$\lambda^r(P^{1/2}U^*AA^*UP^{1/2}) = \lambda^r(AA^*\, UPU^*).$$
Using Theorem IX.2.9, we have
$$\lambda^r(AA^*\, UPU^*) \prec_w \lambda^{r/2}([AA^*]^2 [UPU^*]^2).$$
But $(UPU^*)^2 = UP^2U^* = XX^*$. Hence,
$$\lambda^{r/2}([AA^*]^2 [UPU^*]^2) = s^r(AA^*X).$$
Thus
$$\lambda^r(P^{1/2}U^*AA^*UP^{1/2}) \prec_w s^r(AA^*X),$$
and hence
$$|||(P^{1/2}U^*AA^*UP^{1/2})^r||| \le |||\, |AA^*X|^r \,|||. \tag{IX.36}$$
In the same way, we have
$$\lambda^r(PBB^*) \prec_w \lambda^{r/2}(P^2[BB^*]^2) = s^r(PBB^*) = s^r(XBB^*),$$
and hence
$$|||(P^{1/2}BB^*P^{1/2})^r||| \le |||\, |XBB^*|^r \,|||. \tag{IX.37}$$
Combining the inequalities (IX.35), (IX.36), and (IX.37), we get (IX.34). ■
The following corollary of Theorem IX.5.1 should be compared with (IX.29).

Corollary IX.5.3 Let $A, B$ be positive matrices and let $X$ be any matrix. Then, for $0 \le \nu \le 1$, and for every unitarily invariant norm,
$$|||A^{\nu} X B^{1-\nu}||| \le |||AX|||^{\nu}\; |||XB|||^{1-\nu}. \tag{IX.38}$$

Proof. For $\nu = 0, 1$, the inequality (IX.38) is a trivial statement. For $\nu = \frac{1}{2}$, it reduces to the inequality (IX.33). We will prove it, by induction, for all indices $\nu = k/2^n$, $k = 0, 1, \ldots, 2^n$. The general case then follows by continuity.
Let $\nu = \frac{2k+1}{2^n}$ be any dyadic rational. Then $\nu = \mu + \rho$, where $\mu = \frac{2k}{2^n}$, $\rho = \frac{1}{2^n}$. Suppose that the inequality (IX.38) is valid for all dyadic rationals with denominator $2^{n-1}$. Two such rationals are $\mu$ and $\lambda = \mu + 2\rho = \nu + \rho$. Then, using the inequality (IX.33) and this induction hypothesis, we have
$$|||A^{\nu} X B^{1-\nu}||| = |||A^{\rho}(A^{\mu} X B^{1-\lambda})B^{\rho}||| \le |||A^{2\rho} A^{\mu} X B^{1-\lambda}|||^{1/2}\; |||A^{\mu} X B^{1-\lambda} B^{2\rho}|||^{1/2}$$
$$= |||A^{\lambda} X B^{1-\lambda}|||^{1/2}\; |||A^{\mu} X B^{1-\mu}|||^{1/2} \le |||AX|||^{\lambda/2} |||XB|||^{(1-\lambda)/2}\; |||AX|||^{\mu/2} |||XB|||^{(1-\mu)/2}$$
$$= |||AX|||^{(\lambda+\mu)/2}\; |||XB|||^{1-(\lambda+\mu)/2} = |||AX|||^{\nu}\; |||XB|||^{1-\nu}.$$
This proves that the desired inequality holds for all dyadic rationals. ■
Corollary IX.5.4 Let $A, B$ be positive matrices and let $X$ be any matrix. Then, for $0 \le \nu \le 1$, and for every unitarily invariant norm,
$$|||A^{\nu} X B^{\nu}||| \le |||X|||^{1-\nu}\; |||AXB|||^{\nu}. \tag{IX.39}$$

Proof. Assume without loss of generality that $A$ is invertible; the general case follows from this by continuity. We have, using (IX.38),
$$|||A^{\nu} X B^{\nu}||| = |||(A^{-1})^{1-\nu}(AX)B^{1-(1-\nu)}||| \le |||A^{-1} \cdot AX|||^{1-\nu}\; |||AXB|||^{\nu} = |||X|||^{1-\nu}\; |||AXB|||^{\nu}. \ \blacksquare$$
Note that the inequality (IX.5) is a very special case of (IX.39).
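Inequality (IX.39) can be tested the same way; the helper names below are our own:

```python
import numpy as np

def rand_psd(n, rng):
    """A random positive semidefinite matrix G G^T."""
    G = rng.standard_normal((n, n))
    return G @ G.T

def psd_power(A, s):
    """A^s for positive semidefinite A, via the spectral decomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0, None)**s) @ V.T

rng = np.random.default_rng(12)
ok = True
for _ in range(50):
    A, B = rand_psd(4, rng), rand_psd(4, rng)
    X = rng.standard_normal((4, 4))
    for v in (0.2, 0.5, 0.8):
        for ord_ in (2, 'fro', 'nuc'):
            lhs = np.linalg.norm(psd_power(A, v) @ X @ psd_power(B, v), ord_)
            rhs = (np.linalg.norm(X, ord_) ** (1 - v)
                   * np.linalg.norm(A @ X @ B, ord_) ** v)
            ok &= lhs <= rhs * (1 + 1e-10)
```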
Exercise IX.5.5 Since $|||AA^*||| = |||A^*A|||$, the stars in the inequality (IX.32) could have been placed differently. Much less freedom is allowed for the generalisation (IX.33). Find a $2 \times 2$ example in which $|||A^*XB|||^2$ is larger than $|||A^*AX|||\; |||XBB^*|||$.
Apart from norms, there are other interesting functions for which Schwarz-like inequalities can be obtained. This is done below. It is convenient to have a name for the class of functions we shall study.

Definition IX.5.6 A continuous complex-valued function $f$ on the space of matrices will be said to belong to the class $\mathcal{L}$ if it satisfies the following two conditions:
(i) $f(B) \ge f(A) \ge 0$ if $B \ge A \ge 0$.
(ii) $|f(A^*B)|^2 \le f(A^*A) f(B^*B)$ for all $A, B$.

We have seen above that every unitarily invariant norm is a function in the class $\mathcal{L}$. Other examples are given below.

Exercise IX.5.7 (i) The functions trace and determinant are in $\mathcal{L}$.
(ii) The function spectral radius is in $\mathcal{L}$.
are the singular values of A , then for each
n,
1 < k < n the function
fk (A )
== II k
S j (A) is in .C .
j=l
{vi) If Aj (A) denote the eigenvalues of A arranged as I A 1 (A) j >
I A n (A) I , then for 1 < k <
n the function fk ( A )
k
== II Aj (A)
·
·
·
>
is in £ .
j= l
Exercise IX.5.8 Another class of functions $\mathcal{T}$ was introduced in IX.3.2. The two classes $\mathcal{T}$ and $\mathcal{L}$ have several elements in common. Find examples to show that neither of them is contained in the other.

A different characterisation of the class $\mathcal{L}$ is obtained below. For this we need the following theorem, which is also useful in other contexts.
Theorem IX.5.9 Let $A, B$ be positive operators on $\mathcal{H}$, and let $C$ be any operator on $\mathcal{H}$. Then the operator $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ on $\mathcal{H} \oplus \mathcal{H}$ is positive if and only if there exists a contraction $K$ on $\mathcal{H}$ such that $C = B^{1/2} K A^{1/2}$.

Proof. By Proposition I.3.5, $K$ is a contraction if and only if $\begin{pmatrix} I & K^* \\ K & I \end{pmatrix}$ is positive. The positivity of this matrix implies the positivity of the matrix
$$\begin{pmatrix} A & A^{1/2}K^*B^{1/2} \\ B^{1/2}KA^{1/2} & B \end{pmatrix}.$$
(See Lemma V.1.5.) This proves one of the asserted implications. To prove the converse, first note that if $A$ and $B$ are invertible, then the argument can be reversed; and then note that the general case can be obtained from this by continuity. ■
Theorem IX.5.10 A (continuous) function $f$ is in the class $\mathcal{L}$ if and only if it satisfies the following two conditions:
(a) $f(A) \ge 0$ for all $A \ge 0$.
(b) $|f(C)|^2 \le f(A) f(B)$ for all $A, B, C$ such that $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ is positive.

Proof. If $f$ satisfies condition (i) in Definition IX.5.6, then it certainly satisfies the condition (a) above. Further, if $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ is positive, then, by the preceding theorem, $C = B^{1/2} K A^{1/2}$, where $K$ is a contraction. So, if $f$ satisfies condition (ii) in IX.5.6, then
$$|f(C)|^2 \le f(A^{1/2}K^*KA^{1/2}) f(B).$$
Since $A^{1/2}K^*KA^{1/2} \le A$, we also have $f(A^{1/2}K^*KA^{1/2}) \le f(A)$ from the condition (i) in IX.5.6.
Now suppose $f$ satisfies conditions (a) and (b). Let $B \ge A \ge 0$. Write
$$\begin{pmatrix} B & A \\ A & B \end{pmatrix} = \begin{pmatrix} B-A & 0 \\ 0 & B-A \end{pmatrix} + \begin{pmatrix} A & A \\ A & A \end{pmatrix}.$$
The first matrix in this sum is obviously positive; the second is also positive by Corollary I.3.3. Thus the sum is also positive. So it follows from (a) and (b) that $f(A) \le f(B)$. Next note that we can write, for any $A$ and $B$,
$$\begin{pmatrix} A^*A & A^*B \\ B^*A & B^*B \end{pmatrix} = \begin{pmatrix} A^* & 0 \\ B^* & 0 \end{pmatrix} \begin{pmatrix} A & B \\ 0 & 0 \end{pmatrix}.$$
Since the two matrices on the right-hand side are adjoints of each other, their product is positive. Hence we have
$$|f(A^*B)|^2 \le f(A^*A) f(B^*B).$$
This shows that $f$ is in $\mathcal{L}$. ■

This characterisation leads to an easy proof of the following theorem of E.H. Lieb.
Theorem IX.5.11 Let $A_1, \ldots, A_m$ and $B_1, \ldots, B_m$ be any matrices. Then, for every function $f$ in the class $\mathcal{L}$,
$$\left| f\Big( \sum_i A_i^* B_i \Big) \right|^2 \le f\Big( \sum_i A_i^* A_i \Big)\, f\Big( \sum_i B_i^* B_i \Big), \tag{IX.40}$$
$$\left| f\Big( \sum_i A_i \Big) \right|^2 \le f\Big( \sum_i |A_i| \Big)\, f\Big( \sum_i |A_i^*| \Big). \tag{IX.41}$$
Proof. For each $i = 1, \ldots, m$, the matrix
$$\begin{pmatrix} A_i^*A_i & A_i^*B_i \\ B_i^*A_i & B_i^*B_i \end{pmatrix}$$
is positive, being the product of $\begin{pmatrix} A_i & B_i \\ 0 & 0 \end{pmatrix}$ and its adjoint. The sum of these matrices is, therefore, also positive. So the inequality (IX.40) follows from Theorem IX.5.10. Each of the matrices $\begin{pmatrix} |A_i| & A_i^* \\ A_i & |A_i^*| \end{pmatrix}$ is also positive; see Corollary I.3.4. So the inequality (IX.41) follows by the same argument. ■
Exercise IX.5.12 For any two matrices $A, B$, we have $\mathrm{tr}(|A+B|) \le \mathrm{tr}(|A| + |B|)$. Show by an example that this inequality is not true if tr is replaced by det. Show that we have
$$[\det(|A+B|)]^2 \le \det(|A| + |B|)\, \det(|A^*| + |B^*|). \tag{IX.42}$$
A similar inequality holds for every function in $\mathcal{L}$.

IX.6 The Lieb Concavity Theorem
Let $f(A, B)$ be a real valued function of two matrix variables. Then $f$ is called jointly concave if
$$f(\alpha A_1 + (1-\alpha)A_2,\; \alpha B_1 + (1-\alpha)B_2) \ge \alpha f(A_1, B_1) + (1-\alpha) f(A_2, B_2)$$
for all $0 \le \alpha \le 1$ and for all $A_1, A_2, B_1, B_2$. In this section we will prove the following theorem due to E.H. Lieb. The importance of the theorem, and its consequences, are explained later.

Theorem IX.6.1 (Lieb) For each matrix $X$ and each real number $0 \le t \le 1$, the function
$$f(A, B) = \mathrm{tr}\, X^* A^t X B^{1-t}$$
is jointly concave on pairs of positive matrices.

Note that $f(A, B)$ is positive if $A, B$ are positive. To prove this theorem we need the following lemma.
Lemma IX.6.2 Let $R_1, R_2, S_1, S_2, T_1, T_2$ be positive operators on a Hilbert space. Suppose $R_1$ commutes with $R_2$, $S_1$ commutes with $S_2$, and $T_1$ commutes with $T_2$, and
$$R_1 \ge S_1 + T_1, \qquad R_2 \ge S_2 + T_2. \tag{IX.43}$$
Then, for $0 \le t \le 1$,
$$R_1^t R_2^{1-t} \ge S_1^t S_2^{1-t} + T_1^t T_2^{1-t}. \tag{IX.44}$$

Proof. Let $E$ be the set of all $t$ in $[0,1]$ for which the inequality (IX.44) is true. It is clear that $E$ is a closed set and it contains 0 and 1. We will first show that $E$ contains the point $\frac{1}{2}$, then use this to show that $E$ is a convex set. This would prove the lemma.
Let $x, y$ be any two vectors. Then
$$|\langle x, (S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) y \rangle| \le |\langle x, S_1^{1/2}S_2^{1/2} y \rangle| + |\langle x, T_1^{1/2}T_2^{1/2} y \rangle| \le \|S_1^{1/2}x\| \|S_2^{1/2}y\| + \|T_1^{1/2}x\| \|T_2^{1/2}y\| \le \left[ \|S_1^{1/2}x\|^2 + \|T_1^{1/2}x\|^2 \right]^{1/2} \left[ \|S_2^{1/2}y\|^2 + \|T_2^{1/2}y\|^2 \right]^{1/2}$$
by the Cauchy–Schwarz inequality. This last expression can be rewritten as
$$\left[ \langle x, (S_1 + T_1)x \rangle \right]^{1/2} \left[ \langle y, (S_2 + T_2)y \rangle \right]^{1/2}.$$
Hence, by the hypothesis, we have
$$|\langle x, (S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) y \rangle| \le \left[ \langle x, R_1 x \rangle \langle y, R_2 y \rangle \right]^{1/2}.$$
Using this, we see that, for all unit vectors $u$ and $v$,
$$|\langle u, R_1^{-1/2}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/2} v \rangle| = |\langle R_1^{-1/2}u, (S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/2} v \rangle| \le \left[ \langle R_1^{-1/2}u, R_1 R_1^{-1/2}u \rangle \langle R_2^{-1/2}v, R_2 R_2^{-1/2}v \rangle \right]^{1/2} = 1.$$
This shows that
$$\|R_1^{-1/2}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/2}\| \le 1.$$
Using Proposition IX.1.1, the commutativity of $R_1$ and $R_2$, and the inequality above, we see that
$$\|R_1^{-1/4}R_2^{-1/4}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/4}R_1^{-1/4}\| \le 1.$$
This is equivalent to the operator inequality
$$R_1^{-1/4}R_2^{-1/4}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/4}R_1^{-1/4} \le I.$$
From this, it follows that
$$S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2} \le R_1^{1/2}R_2^{1/2}.$$
(See Lemma V.1.5.) This shows that the set $E$ contains the point $1/2$.
Now suppose that $\mu$ and $\nu$ are in $E$. Then
$$R_1^{\mu} R_2^{1-\mu} \ge S_1^{\mu} S_2^{1-\mu} + T_1^{\mu} T_2^{1-\mu},$$
$$R_1^{\nu} R_2^{1-\nu} \ge S_1^{\nu} S_2^{1-\nu} + T_1^{\nu} T_2^{1-\nu}.$$
These two inequalities are exactly of the form (IX.43). Hence, using the special case of the lemma that has been proved above, we have
$$(R_1^{\mu} R_2^{1-\mu})^{1/2} (R_1^{\nu} R_2^{1-\nu})^{1/2} \ge (S_1^{\mu} S_2^{1-\mu})^{1/2} (S_1^{\nu} S_2^{1-\nu})^{1/2} + (T_1^{\mu} T_2^{1-\mu})^{1/2} (T_1^{\nu} T_2^{1-\nu})^{1/2}.$$
Using the hypothesis about commutativity, we see from this inequality that $\frac{1}{2}(\mu + \nu)$ is in $E$. This shows that $E$ is convex. ■
Proof of Theorem IX.6.1: Let $A, B$ be positive operators on $\mathcal{H}$. Let $\mathcal{A}$ and $\mathcal{B}$ be the left and right multiplication operators on the space $\mathcal{L}(\mathcal{H})$ induced by $A$ and $B$; i.e., $\mathcal{A}(X) = AX$ and $\mathcal{B}(X) = XB$. Using the results of Exercise I.4.4 and Problem VII.6.10, one sees that $\mathcal{A}$ and $\mathcal{B}$ are positive operators on (the Hilbert space) $\mathcal{L}(\mathcal{H})$.
Now, suppose $A_1, A_2, B_1, B_2$ are positive operators on $\mathcal{H}$. Let $A = A_1 + A_2$, $B = B_1 + B_2$. Let $\mathcal{A}_1, \mathcal{A}_2, \mathcal{A}$ denote the left multiplication operators on $\mathcal{L}(\mathcal{H})$ induced by $A_1, A_2$, and $A$, respectively, and $\mathcal{B}_1, \mathcal{B}_2, \mathcal{B}$ the right multiplication operators induced by $B_1, B_2$, and $B$, respectively. Then $\mathcal{A} = \mathcal{A}_1 + \mathcal{A}_2$, $\mathcal{B} = \mathcal{B}_1 + \mathcal{B}_2$. Hence, by Lemma IX.6.2,
$$\mathcal{A}^t \mathcal{B}^{1-t} \ge \mathcal{A}_1^t \mathcal{B}_1^{1-t} + \mathcal{A}_2^t \mathcal{B}_2^{1-t}$$
for $0 \le t \le 1$. This is the same as saying that for every $X$ in $\mathcal{L}(\mathcal{H})$
$$\langle X, \mathcal{A}^t \mathcal{B}^{1-t} X \rangle \ge \langle X, \mathcal{A}_1^t \mathcal{B}_1^{1-t} X \rangle + \langle X, \mathcal{A}_2^t \mathcal{B}_2^{1-t} X \rangle,$$
or that
$$\mathrm{tr}\, X^* A^t X B^{1-t} \ge \mathrm{tr}\, X^* A_1^t X B_1^{1-t} + \mathrm{tr}\, X^* A_2^t X B_2^{1-t}.$$
From this, it follows that
$$f(\alpha A_1 + (1-\alpha)A_2,\; \alpha B_1 + (1-\alpha)B_2) \ge \alpha f(A_1, B_1) + (1-\alpha) f(A_2, B_2).$$
This shows that $f$ is concave. ■

Another proof of this theorem is outlined in Problem IX.8.17. Using the identification of $\mathcal{L}(\mathcal{H})$ with $\mathcal{H} \otimes \mathcal{H}$, Lieb's Theorem can be seen to be equivalent to the following theorem.

Theorem IX.6.3 (T. Ando) For each $0 \le t \le 1$, the map $(A, B) \mapsto A^t \otimes B^{1-t}$ is jointly concave on pairs of positive operators on $\mathcal{H}$.
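Joint concavity in Lieb's Theorem can be probed numerically at random midpoints; the sketch below (our own helpers, with a small multiple of the identity added to keep the matrices positive definite) tests the defining inequality at $\alpha = 1/2$:

```python
import numpy as np

def rand_pd(n, rng):
    """A random positive definite matrix (our helper)."""
    G = rng.standard_normal((n, n))
    return G @ G.T + 0.1 * np.eye(n)

def psd_power(A, s):
    """A^s for positive definite A, via the spectral decomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 1e-12, None)**s) @ V.T

def lieb_f(A, B, X, t):
    """f(A, B) = tr X* A^t X B^{1-t} (real case, so X* = X^T)."""
    return np.trace(X.T @ psd_power(A, t) @ X @ psd_power(B, 1 - t))

rng = np.random.default_rng(10)
X = rng.standard_normal((3, 3))
ok = True
for t in (0.3, 0.5, 0.8):
    for _ in range(50):
        A1, A2 = rand_pd(3, rng), rand_pd(3, rng)
        B1, B2 = rand_pd(3, rng), rand_pd(3, rng)
        mid = lieb_f(0.5 * (A1 + A2), 0.5 * (B1 + B2), X, t)
        ok &= mid >= 0.5 * lieb_f(A1, B1, X, t) + 0.5 * lieb_f(A2, B2, X, t) - 1e-8
```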
Exercise IX. 6 .4 Let t1 , t 2 be two positive numbers such that t 1 + t 2 < 1 . At1 ® B t 2 is jointly concave on pairs of positive Show that the map (A, B ) operators on 1l . [Hint: The map A As is monotone and concave on positive operators for 0 < s < 1 . See Chapter 5.} t
t
Lieb proved his theorem in connection with problems connected with entropy in quantum mechanics. This is explained below (with some simplifications). The function S(A) = −tr A log A, where A is a positive matrix, is called the entropy function. It is a concave function on the set of positive operators. In fact, we have seen that the function f(t) = −t log t is operator concave on (0, ∞). Let K be a given Hermitian operator. The entropy of A relative to K is defined as

S(A, K) = ½ tr [A^{1/2}, K]²,

where [X, Y] stands for the Lie bracket (or the commutator) XY − YX. This concept was introduced by Wigner and Yanase, and extended by Dyson, who considered the functions

S_t(A, K) = ½ tr [A^t, K][A^{1−t}, K],   0 < t < 1.   (IX.45)

The Wigner-Yanase-Dyson conjecture said that S_t(A, K) is concave in A on the set of positive matrices. Lieb's Theorem implies that this is true. To see this, note that

S_t(A, K) = tr KA^t KA^{1−t} − tr K²A.   (IX.46)

Since the function g(A) = −tr K²A is linear in A, it is also concave. Hence, the concavity of S_t(A, K) follows from that of tr KA^t KA^{1−t}. But that is a special case of Lieb's Theorem.

Given any operator X, define

I_t(A, X) = tr X*A^t XA^{1−t} − tr X*XA,   0 < t < 1.   (IX.47)

Note that I₀(A, X) = 0. When X is Hermitian, this reduces to the function S_t defined earlier. Lieb's Theorem implies that I_t(A, X) is a concave function of A. Hence, the function I(A, X) defined as

I(A, X) = (d/dt)|_{t=0} I_t(A, X) = tr (X*(log A)XA − X*X(log A)A)   (IX.48)

is also concave. Let A, B be positive matrices. The relative entropy of A and B is defined as

S(A|B) = tr (A(log A − log B)).   (IX.49)
This notion was introduced by Umegaki, and generalised by Araki to the von Neumann algebra setting.
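The commutator identity (IX.46) behind the Wigner-Yanase-Dyson reduction is easy to verify numerically. The following check (an added illustration, not from the text, assuming NumPy) compares the two sides for a random positive A and Hermitian K.

```python
# Check (IX.46): (1/2) tr [A^t, K][A^{1-t}, K] = tr K A^t K A^{1-t} - tr K^2 A.
import numpy as np

rng = np.random.default_rng(1)
n, t = 4, 0.4

M = rng.standard_normal((n, n))
A = M @ M.T + 0.1 * np.eye(n)                 # positive definite
K = rng.standard_normal((n, n)); K = (K + K.T) / 2   # Hermitian

w, V = np.linalg.eigh(A)
At  = (V * w**t) @ V.T                        # A^t
A1t = (V * w**(1 - t)) @ V.T                  # A^(1-t)

comm = lambda X, Y: X @ Y - Y @ X
lhs = 0.5 * np.trace(comm(At, K) @ comm(A1t, K))
rhs = np.trace(K @ At @ K @ A1t) - np.trace(K @ K @ A)
print(abs(lhs - rhs) < 1e-10)
```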
Theorem IX.6.5 (Lindblad) The function S(A|B) defined above is jointly convex in A, B.

Proof. Consider the block-matrices

T = (A  0; 0  B),   X = (0  0; I  0).

Then note that

S(A|B) = −I(T, X).

We noted earlier that I(T, X) is concave in the argument T. ■

Exercise IX.6.6 (i) Show that for every pinching 𝒞, S(𝒞(A)|𝒞(B)) ≤ S(A|B).
(ii) Let τ be the normalised trace function on n × n matrices; i.e., τ(A) = (1/n) tr A. Show that for all positive matrices A, B,

τ(A)(log τ(A) − log τ(B)) ≤ τ(A(log A − log B)).   (IX.50)

This is called the Peierls-Bogoliubov Inequality. (There are other inequalities that go by the same name.)
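The inequality (IX.50) can be tested numerically. Below is a small sketch (an illustration added here, not part of the text, assuming NumPy); the matrix logarithm of a positive definite matrix is computed through its spectral decomposition.

```python
# Check (IX.50): tau(A)(log tau(A) - log tau(B)) <= tau(A (log A - log B)).
import numpy as np

rng = np.random.default_rng(2)
n = 4

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

def logm(A):
    """log of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

A, B = rand_pd(), rand_pd()
tau = lambda X: float(np.trace(X)) / n       # normalised trace

lhs = tau(A) * (np.log(tau(A)) - np.log(tau(B)))
rhs = tau(A @ (logm(A) - logm(B)))
print(lhs <= rhs + 1e-10)
```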
IX.7 Operator Approximation
An operator approximation problem consists of finding, for a given operator A, the element nearest to it from a special class. Some problems of this type are studied in this section. In formulating and interpreting these results, it is helpful to have an analogy: if arbitrary operators are thought of as complex numbers, then Hermitian operators should be thought of as real numbers, unitary operators as complex numbers of modulus one, and positive operators as positive real numbers. Of course, this analogy has its limitations, since multiplication of complex numbers is commutative and that of operators is not. The first theorem below is easy to prove and sets the stage for later results.
Theorem IX.7.1 Let A be any operator and let Re A = ½(A + A*). Then, for every Hermitian operator H and for every unitarily invariant norm,

|||A − Re A||| ≤ |||A − H|||.   (IX.51)

Proof. Recall that |||T||| = |||T*||| for every T. Using this fact and the triangle inequality, we have

|||A − ½(A + A*)||| = ½|||A − A*||| = ½|||A − H + H − A*|||
≤ ½(|||A − H||| + |||(A − H)*|||) = |||A − H|||.

This proves the theorem. ■
The inequality (IX.51) is sometimes stated in words as: in every unitarily invariant norm, Re A is a Hermitian approximant to A. The next theorem says that a unitary approximant to A is any unitary that occurs in its polar decomposition.
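For the Frobenius norm (which is unitarily invariant), the inequality (IX.51) can be spot-checked against random Hermitian competitors. This is an added illustration (not from the text), assuming NumPy.

```python
# Re A is a Hermitian approximant to A in the Frobenius norm.
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
ReA = (A + A.conj().T) / 2

dist_re = np.linalg.norm(A - ReA)   # Frobenius norm distance
ok = True
for _ in range(100):
    H = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (H + H.conj().T) / 2        # random Hermitian competitor
    ok = ok and dist_re <= np.linalg.norm(A - H) + 1e-12
print(ok)
```

For the Frobenius norm this is just the statement that Re A is the orthogonal projection of A onto the real subspace of Hermitian matrices.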
Theorem IX.7.2 If A = UP, where U is unitary and P positive, then

|||A − U||| ≤ |||A − W||| ≤ |||A + U|||   (IX.52)

for every unitary W and for every unitarily invariant norm.

Proof. By unitary invariance, the inequality (IX.52) is equivalent to

|||P − I||| ≤ |||P − U*W||| ≤ |||P + I|||.

So the assertion of the theorem is equivalent to the following: for every positive operator P and unitary operator V,

|||P − I||| ≤ |||P − V||| ≤ |||P + I|||.   (IX.53)

This will be proved using the spectral perturbation inequality (IV.62). Let

P̃ = (0  P; P  0),   Ṽ = (0  V; V*  0).

Then P̃ and Ṽ are Hermitian. The eigenvalues of P̃ are the singular values of P together with their negatives. (See Exercise II.1.15.) The same is true for Ṽ, which means that it has eigenvalues 1 and −1, each with multiplicity n. We thus have

Eig↓(P̃) − Eig↓(Ṽ) = [Eig↓(P) − I] ⊕ [−Eig↑(P) + I],
Eig↓(P̃) − Eig↑(Ṽ) = [Eig↓(P) + I] ⊕ [−Eig↑(P) − I].

So, from (IV.62), we have

|||[Eig↓(P) − I] ⊕ [Eig↑(P) − I]||| ≤ |||(P − V) ⊕ (P − V)*||| ≤ |||[Eig↓(P) + I] ⊕ [Eig↑(P) + I]|||.

This is equivalent to the pair of inequalities (IX.53). ■
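The unitary polar factor can be computed from the singular value decomposition, and the two-sided bound (IX.52) tested against random unitaries. A sketch (added illustration, not from the text, assuming NumPy), in the Frobenius norm:

```python
# A = U P with U = W Vh from the SVD A = W diag(s) Vh; check (IX.52).
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

W_, s, Vh = np.linalg.svd(A)
U = W_ @ Vh                       # unitary factor of the polar decomposition

lower = np.linalg.norm(A - U)     # |||A - U|||
upper = np.linalg.norm(A + U)     # |||A + U|||
ok = True
for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    d = np.linalg.norm(A - Q)     # distance to a random unitary
    ok = ok and lower <= d + 1e-12 and d <= upper + 1e-12
print(ok)
```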
The two approximation problems solved above are subsumed in a more general question. Let Φ be a closed subset of the complex plane, and let 𝒩(Φ) be the collection of all normal operators whose spectrum is contained in Φ. The problem is to find an element of 𝒩(Φ) nearest to a given operator A. Theorem IX.7.1 solves this problem when Φ is the real line, and Theorem IX.7.2 when Φ is the unit circle. When Φ is the whole plane or the positive half-line, the problem becomes much harder, and the full solution is not known. Note that in the first case (Φ = ℂ) we are asking for a normal approximant to A, and in the second case (Φ = ℝ₊) for a positive approximant to A. Some results on this problem, which are easy to describe and also are directly related to other parts of this book, are given below. We have already come across a special case of this problem in Chapter 6. Let F be a retraction of the plane onto the subset Φ; i.e., F is a map of ℂ onto Φ such that |z − F(z)| ≤ |z − w| for all z ∈ ℂ and w ∈ Φ. Such a map always exists; it is unique if (and only if) Φ is convex. We have the following theorem.
Theorem IX.7.3 Let F be a retraction of the plane onto the closed set Φ. Suppose Φ is convex. Then, for every normal operator A, we have

|||A − F(A)||| ≤ |||A − N|||   (IX.54)

for all N ∈ 𝒩(Φ) and for all unitarily invariant norms. If the set Φ is not convex, the inequality (IX.54) may not be true for all unitarily invariant norms, but is still true for all Q-norms. (See Theorem VI.6.2 and Problem VI.8.13.)
Exercise IX.7.4 Let A be a Hermitian operator, and let A = A₊ − A₋ be its Jordan decomposition. (Both A₊ and A₋ are positive operators.) Use the above theorem to show that, if P is any positive operator, then

|||A − A₊||| ≤ |||A − P|||   (IX.55)

for every unitarily invariant norm. If A is normal, then for every positive operator P

|||A − (Re A)₊||| ≤ |||A − P|||.   (IX.56)
Theorem IX.7.5 Let A be any operator. Then for every positive operator P,

||A − (Re A)₊||₂ ≤ ||A − P||₂.   (IX.57)

Proof. Recall that ||A||₂² = ||Re A||₂² + ||Im A||₂². Hence,

||A − (Re A)₊||₂² = ||Re A − (Re A)₊||₂² + ||Im A||₂²,
||A − P||₂² = ||Re A − P||₂² + ||Im A||₂².

From (IX.55), we see that ||Re A − (Re A)₊||₂² is bounded by ||Re A − P||₂². This leads to the inequality (IX.57). ■

The problem of finding positive approximants to an arbitrary operator A is much more complex for other norms. See the Notes at the end of the chapter. For normal approximants, we have a solution in all unitarily invariant norms only in the 2 × 2 case.
Theorem IX.7.6 Let A be an upper triangular matrix

A = (λ₁  b; 0  λ₂).   (IX.58)

Let θ = arg(λ₁ − λ₂), and let

N₀ = (λ₁  b/2; (b̄/2)e^{2iθ}  λ₂).   (IX.59)

Then N₀ is normal, and for any normal matrix N we have

|||A − N₀||| ≤ |||A − N|||   (IX.60)

for every unitarily invariant norm.
Proof. It is easy to check that N₀ is normal. Let N = (x₁  x₂; x₃  x₄) be any normal matrix. We must have |x₂| = |x₃| in that case.

Now note that, if T = (t₁  t₂; t₃  t₄) is any matrix, we can write its off-diagonal part (0  t₂; t₃  0) as ½(T − UTU*), where U is the diagonal matrix with diagonal entries 1 and −1. Hence, for every unitarily invariant norm, we have

|||T||| ≥ |||(0  t₂; t₃  0)|||.

Using this, we see that

|||A − N||| ≥ |||(0  b − x₂; −x₃  0)||| = ||| diag(b − x₂, −x₃) |||.

But

|b| ≤ |b − x₂| + |x₂| = |b − x₂| + |x₃|.

Thus the vector ½(|b|, |b|) is weakly majorised by the vector (|b − x₂|, |x₃|), which, in turn, is weakly majorised by the vector (s₁(A − N), s₂(A − N)), as seen above. Since A − N₀ has singular values (½|b|, ½|b|), this proves the inequality (IX.60). ■

Since every 2 × 2 matrix is unitarily equivalent to an upper triangular matrix of the form (IX.58), this theorem tells us how to find a normal matrix closest to it.
Exercise IX.7.7 The measure of nonnormality of a matrix, with respect to any norm, was defined in Problems VIII.6.4 and VIII.6.5. Theorem IX.7.6, on the other hand, gives for 2 × 2 matrices a formula for the distance to the set of all normal matrices. What is the relation between these two numbers for a given unitarily invariant norm?
IX.8 Problems
Problem IX.8.1. Let A, B be positive matrices, and let m, k be positive integers with m ≥ k. Use the inequality (IX.13) to show that

tr (AB)^m ≤ tr (A^{m/k} B^{m/k})^k.   (IX.61)

The special case,

tr (AB)^m ≤ tr A^m B^m,   (IX.62)

is called the Lieb-Thirring inequality.
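The Lieb-Thirring inequality (IX.62) can be checked on random positive matrices. An added illustration (not from the text), assuming NumPy:

```python
# Check tr (AB)^m <= tr A^m B^m for positive A, B.
import numpy as np

rng = np.random.default_rng(5)
n, m = 4, 3

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

A, B = rand_pd(), rand_pd()
mp = np.linalg.matrix_power
lhs = np.trace(mp(A @ B, m))        # tr (AB)^m, real and nonnegative
rhs = np.trace(mp(A, m) @ mp(B, m)) # tr A^m B^m
print(lhs <= rhs + 1e-8)
```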
Problem IX.8.2. Let A, B be Hermitian matrices. Show that for every positive integer m

(i) |tr (AB)^{2m}| ≤ tr A^{2m}B^{2m},
(ii) |tr (A^m B^m)²| ≤ tr A^{2m}B^{2m},
(iii) |tr (AB)^{4m}| ≤ tr (A^{2m}B^{2m})².

(Hint: By the Weyl Majorant Theorem, |tr X^m| ≤ tr |X|^m for every matrix X.) Note that if

A = (1  1; 1  −1),   B = (0  1; 1  1),

then |tr (AB)³| = 5, |tr A³B³| = 4, tr (AB)⁶ = 9, and tr (A³B³)² = 0. This shows the failure of possible extensions of the inequalities (i) and (iii) above.
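The entries of A and B are partly illegible in this copy; the pair used below is a reconstruction chosen to reproduce all four values stated in the text, and the computation (an added illustration, assuming NumPy) confirms them.

```python
# Verify the counterexample values 5, 4, 9, 0 from Problem IX.8.2.
import numpy as np

A = np.array([[1, 1], [1, -1]])
B = np.array([[0, 1], [1, 1]])
mp = np.linalg.matrix_power

v1 = abs(np.trace(mp(A @ B, 3)))            # |tr (AB)^3|
v2 = abs(np.trace(mp(A, 3) @ mp(B, 3)))     # |tr A^3 B^3|
v3 = np.trace(mp(A @ B, 6))                 # tr (AB)^6
v4 = np.trace(mp(mp(A, 3) @ mp(B, 3), 2))   # tr (A^3 B^3)^2
print(v1, v2, v3, v4)                       # 5 4 9 0
```

So 5 = |tr (AB)³| > |tr A³B³| = 4 and 9 = tr (AB)⁶ > tr (A³B³)² = 0, refuting the candidate extensions.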
Problem IX.8.3. Are there any natural generalisations of the above inequalities when three matrices A, B, C are involved? Take, for instance, the inequality (IX.62). A product of three positive matrices need not have positive eigenvalues. One still might wonder whether |tr (ABC)²| ≤ |tr A²B²C²|. Construct an example to show that this need not be true.
Problem IX.8.4. A possible generalisation of the Golden-Thompson inequality (IX.19) would have been tr e^{A+B+C} ≤ |tr (e^A e^B e^C)| for any three Hermitian matrices A, B, C. This is false. To see this, let S₁, S₂, S₃ be the Pauli spin matrices

S₁ = (0  1; 1  0),   S₂ = (0  −i; i  0),   S₃ = (1  0; 0  −1).

If a₁, a₂, a₃ are any real numbers and a = (a₁² + a₂² + a₃²)^{1/2}, show that

exp(a₁S₁ + a₂S₂ + a₃S₃) = (cosh a) I + (sinh a)/a (a₁S₁ + a₂S₂ + a₃S₃).
Let A = tS₁, B = tS₂, C = tS₃. Show that

tr e^{A+B+C} = 2 cosh(√3 t),
|tr (e^A e^B e^C)| = 2 cosh(√3 t)[1 − (1/12)t⁴ + O(t⁶)].

For small t, the first quantity is bigger than the second.
Problem IX.8.5. Show that the Lie Product Formula has a generalisation: for any k matrices A₁, A₂, …, A_k,

lim_{m→∞} (exp(A₁/m) exp(A₂/m) ⋯ exp(A_k/m))^m = exp(A₁ + A₂ + ⋯ + A_k).
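The convergence can be observed directly: the error decays roughly like 1/m. A sketch for k = 3 Hermitian matrices (an added illustration, not from the text, assuming NumPy):

```python
# (exp(A1/m) exp(A2/m) exp(A3/m))^m -> exp(A1 + A2 + A3) as m grows.
import numpy as np

def expm_h(H):
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

rng = np.random.default_rng(6)
n = 3
def sym(M): return (M + M.T) / 2
A1, A2, A3 = (sym(rng.standard_normal((n, n))) for _ in range(3))
target = expm_h(A1 + A2 + A3)

def trotter(m):
    step = expm_h(A1 / m) @ expm_h(A2 / m) @ expm_h(A3 / m)
    return np.linalg.matrix_power(step, m)

err8 = np.linalg.norm(trotter(8) - target)
err64 = np.linalg.norm(trotter(64) - target)
print(err64 < err8)   # the approximation improves as m increases
```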
Problem IX.8.6. Show that for any two matrices A, B we have

|||A*B + B*A||| ≤ |||AA* + BB*|||

and

|||A*B + B*A||| ≤ |||A*A + B*B|||

for every unitarily invariant norm.
Problem IX.8.7. Let X, Y be positive. Show that for every unitarily invariant norm

|||X − Y||| ≤ ||| (X  0; 0  Y) |||.

From this, it follows that, for every A,

||A*A − AA*|| ≤ ||A||².

Problem IX.8.8. Let A, B be positive matrices and let X be any matrix. Show that for all unitarily invariant norms, and for 0 < v < 1,

|||A^v X B^{1−v} + A^{1−v} X B^v||| ≤ |||AX + XB|||.
Problem IX.8.9. Let A, B be positive operators and let T be any operator such that ||T*x|| ≤ ||Ax|| and ||Tx|| ≤ ||Bx|| for all x. Show that, for all x, y and for 0 < v < 1,

|⟨y, Tx⟩| ≤ ||A^{1−v} y|| ||B^v x||.

[Hint: From the hypotheses, it follows that A⁻¹T and TB⁻¹ are contractions. The inequality (IX.38) then implies that (A⁻¹)^{1−v} T (B⁻¹)^v is a contraction.]

Problem IX.8.10. Use the result of the above problem to prove the following. For all operators T, vectors x, y, and for 0 < v < 1,

|⟨y, Tx⟩| ≤ || |T*|^{1−v} y ||  || |T|^v x ||.

This inequality is called the Mixed Schwarz Inequality.
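The Mixed Schwarz Inequality can be spot-checked with random data. In the sketch below (an added illustration, not from the text, assuming NumPy), |T| = (T*T)^{1/2} and |T*| = (TT*)^{1/2} and their fractional powers are computed spectrally.

```python
# Check |<y, Tx>| <= || |T*|^(1-v) y || * || |T|^v x ||.
import numpy as np

rng = np.random.default_rng(7)
n, v = 4, 0.3
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def psd_pow(A, p):
    """A^p for positive semidefinite A (tiny negative eigenvalues clipped)."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0, None)**p) @ V.conj().T

absT_v    = psd_pow(T.conj().T @ T, v / 2)        # |T|^v
absTs_1mv = psd_pow(T @ T.conj().T, (1 - v) / 2)  # |T*|^(1-v)

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

lhs = abs(y.conj() @ (T @ x))
rhs = np.linalg.norm(absTs_1mv @ y) * np.linalg.norm(absT_v @ x)
print(lhs <= rhs + 1e-9)
```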
Problem IX.8.11. Show that if A, B are positive matrices, then

det(I + A + B) ≤ det(I + A) det(I + B).

Then use this and Theorem IX.5.11 to show that, for any two matrices A, B,

|det(I + A + B)| ≤ det(I + |A|) det(I + |B|).

(See Problem IV.5.9 for another proof of this.)
Problem IX.8.12. Show that for all positive matrices A, B

tr (A(log A − log B)) ≥ tr (A − B).   (IX.63)

A suitably chosen pair of 2 × 2 positive matrices A, B shows that we may not have the operator inequality A(log A − log B) ≥ A − B.
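The trace inequality (IX.63) is easy to test numerically. An added illustration (not from the text), assuming NumPy:

```python
# Check (IX.63): tr(A (log A - log B)) >= tr(A - B) for positive A, B.
import numpy as np

rng = np.random.default_rng(8)
n = 4

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

def logm(A):
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

A, B = rand_pd(), rand_pd()
lhs = np.trace(A @ (logm(A) - logm(B)))
rhs = np.trace(A - B)
print(lhs >= rhs - 1e-10)
```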
Problem IX.8.13. Let f be a convex function on an interval I. Let A, B be two Hermitian matrices whose spectra are contained in I. Show that

tr [f(A) − f(B)] ≥ tr [(A − B) f′(B)].   (IX.64)

The special choice f(t) = t log t gives the inequality (IX.63).
Problem IX.8.14. Let A be a Hermitian matrix and f any convex function. Then for every unit vector x,

f(⟨x, Ax⟩) ≤ ⟨x, f(A)x⟩.

This implies that, for any orthonormal basis x₁, …, x_n,

Σ_{j=1}^n f(⟨x_j, Ax_j⟩) ≤ tr f(A).

The name Peierls-Bogoliubov inequality is sometimes used for this inequality, or for its special cases f(t) = e^t, f(t) = e^{−t}, etc.
Problem IX.8.15. The concavity assertion in Exercise IX.6.4 can be generalised to several variables. Let t₁, t₂, t₃ be positive numbers such that t₁ + t₂ + t₃ < 1. Let A₁, A₂, A₃ be positive operators. Note that

A₁^{t₁} ⊗ A₂^{t₂} ⊗ A₃^{t₃} = (A₁^{t₁} ⊗ A₂^{t₂} ⊗ I)(I ⊗ I ⊗ A₃^{t₃}).

Use the concavity of the first factor above (which has been proved in Exercise IX.6.4) and the integral representation (V.4) for the second factor to prove that the map (A₁, A₂, A₃) → A₁^{t₁} ⊗ A₂^{t₂} ⊗ A₃^{t₃} is jointly concave on triples of positive operators. More generally, prove that for positive numbers t₁, …, t_k with t₁ + ⋯ + t_k < 1, the map that takes a k-tuple of positive operators (A₁, …, A_k) to the operator A₁^{t₁} ⊗ ⋯ ⊗ A_k^{t_k} is jointly concave.
Problem IX.8.16. A special consequence of the above is that the map A → ⊗^k A^{1/k} is concave on positive operators for all k = 1, 2, …. Use this to prove the following inequalities for n × n positive matrices A, B:

(i) ⊗^k (A + B)^{1/k} ≥ ⊗^k A^{1/k} + ⊗^k B^{1/k},
(ii) ∧^k (A + B)^{1/k} ≥ ∧^k A^{1/k} + ∧^k B^{1/k},
(iii) ∨^k (A + B)^{1/k} ≥ ∨^k A^{1/k} + ∨^k B^{1/k},
(iv) det(A + B)^{1/n} ≥ det A^{1/n} + det B^{1/n},
(v) per(A + B)^{1/n} ≥ per A^{1/n} + per B^{1/n},
(vi) c_k((A + B)^{1/k}) ≥ c_k(A^{1/k}) + c_k(B^{1/k}) for 1 ≤ k ≤ n,

where c_k(A) = tr ∧^k (A). The inequality (iv) above is called the Minkowski Determinant Theorem and has been proved earlier (Corollary II.3.21).
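Inequality (iv), the Minkowski Determinant Theorem, is the easiest of the list to verify numerically. An added illustration (not from the text), assuming NumPy:

```python
# Check det(A+B)^(1/n) >= det(A)^(1/n) + det(B)^(1/n) for positive A, B.
import numpy as np

rng = np.random.default_rng(9)
n = 4

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

A, B = rand_pd(), rand_pd()
lhs = np.linalg.det(A + B) ** (1 / n)
rhs = np.linalg.det(A) ** (1 / n) + np.linalg.det(B) ** (1 / n)
print(lhs >= rhs - 1e-10)
```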
Problem IX.8.17. Outlined below is another proof of the Lieb Concavity Theorem which uses results on operator concave functions proved in Chapter 5.

(i) Consider the space 𝓛(ℋ) ⊕ 𝓛(ℋ) with the inner product

⟨(X₁, X₂), (Y₁, Y₂)⟩ = tr (X₁*Y₁ + X₂*Y₂).

(ii) Let A₁, A₂ be invertible positive operators on ℋ and let A = ½(A₁ + A₂). Let

Δ(R) = ARA⁻¹,   Δ₁₂(R, S) = (A₁RA₁⁻¹, A₂SA₂⁻¹).

Then Δ is a positive operator on the Hilbert space 𝓛(ℋ) and Δ₁₂ is a positive operator on the Hilbert space 𝓛(ℋ) ⊕ 𝓛(ℋ).

(iii) Note that for any X in 𝓛(ℋ)

tr X*A^t XA^{1−t} = ⟨XA^{1/2}, Δ^t(XA^{1/2})⟩

and

tr (X*A₁^t XA₁^{1−t} + X*A₂^t XA₂^{1−t}) = ⟨(XA₁^{1/2}, XA₂^{1/2}), Δ₁₂^t(XA₁^{1/2}, XA₂^{1/2})⟩.

(iv) Let V be the map from 𝓛(ℋ) into 𝓛(ℋ) ⊕ 𝓛(ℋ) defined as

V(XA^{1/2}) = (1/√2)(XA₁^{1/2}, XA₂^{1/2}).

Show that V is an isometry. Show that V*Δ₁₂V = Δ.

(v) Since the function f(s) = s^t on [0, ∞) is operator concave for 0 < t < 1, using Exercise V.2.4, we obtain

Δ^t = (V*Δ₁₂V)^t ≥ V*Δ₁₂^t V.

(vi) This shows that

tr X*A^t XA^{1−t} ≥ ½ tr (X*A₁^t XA₁^{1−t} + X*A₂^t XA₂^{1−t})

when A₁ and A₂ are invertible. By continuity, this is true for all positive operators A₁ and A₂. In other words, for all 0 < t < 1, the function

f(A) = tr X*A^t XA^{1−t}

is concave.

(vii) Use 2 × 2 operator matrices (A  0; 0  B) and (0  0; X  0) to complete the proof of Lieb's concavity theorem.
Problem IX.8.18. Theorem IX.7.1 can be generalised as follows. Let φ be a mapping of the space of n × n matrices into itself that satisfies three conditions:

(i) φ² is the identity map; i.e., φ(φ(A)) = A for all A.
(ii) φ is real linear; i.e., φ(αA + βB) = αφ(A) + βφ(B) for all A, B and all real α, β.
(iii) A and φ(A) have the same singular values for all A.

Then the set 𝓘(φ) = {A : φ(A) = A} is a real linear subspace of the space of matrices. For each A, the matrix ½(A + φ(A)) is in 𝓘(φ), and for all unitarily invariant norms

|||A − ½(A + φ(A))||| ≤ |||A − B||| for all B ∈ 𝓘(φ).
Examples of such maps are φ(A) = ±A*, φ(A) = ±Aᵀ, and φ(A) = ±Ā, where Aᵀ denotes the transpose of A and Ā denotes the matrix obtained by taking the complex conjugate of each entry of A.
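For the example φ(A) = Aᵀ, the set 𝓘(φ) is the symmetric matrices and ½(A + φ(A)) is the symmetric part of A. The sketch below (an added illustration, not from the text, assuming NumPy) checks the minimality claim in the Frobenius norm against random symmetric competitors.

```python
# Nearest fixed point of phi(A) = A^T is the symmetric part (A + A^T)/2.
import numpy as np

rng = np.random.default_rng(10)
n = 5
A = rng.standard_normal((n, n))
half = (A + A.T) / 2              # (1/2)(A + phi(A))

ok = True
for _ in range(100):
    S = rng.standard_normal((n, n)); S = (S + S.T) / 2   # element of I(phi)
    ok = ok and np.linalg.norm(A - half) <= np.linalg.norm(A - S) + 1e-12
print(ok)
```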
Problem IX.8.19. The Cayley transform of a Hermitian matrix A is the unitary matrix C(A) defined as

C(A) = (A − iI)(A + iI)⁻¹.

If A, B are two Hermitian matrices, we have

(1/2i)[C(A) − C(B)] = (B + iI)⁻¹ (A − B) (A + iI)⁻¹.

Use this to show that for all j,

(1/2) s_j(C(A) − C(B)) ≤ s_j(A − B).

[Note that ||(A + iI)⁻¹|| ≤ 1 and ||(B + iI)⁻¹|| ≤ 1.] In particular, this gives

(1/2) |||C(A) − C(B)||| ≤ |||A − B|||

for every unitarily invariant norm.
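The singular value bound can be checked directly. An added illustration (not from the text), assuming NumPy:

```python
# Check (1/2) s_j(C(A) - C(B)) <= s_j(A - B) for Hermitian A, B.
import numpy as np

rng = np.random.default_rng(11)
n = 5

def rand_herm():
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

A, B = rand_herm(), rand_herm()
I = np.eye(n)
cayley = lambda H: (H - 1j * I) @ np.linalg.inv(H + 1j * I)

s_C = np.linalg.svd(cayley(A) - cayley(B), compute_uv=False)   # decreasing
s_AB = np.linalg.svd(A - B, compute_uv=False)                  # decreasing
print(bool(np.all(s_C / 2 <= s_AB + 1e-10)))
```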
Problem IX.8.20. A 2 × 2 block matrix A = (A₁₁  A₁₂; A₂₁  A₂₂), in which the four matrices A_ij are normal and commute with each other, is called binormal. Show that such a matrix is unitarily equivalent to a matrix A = (A₁  B; 0  A₂), in which A₁, A₂, B are diagonal matrices and B is positive. Let

N₀ = (A₁  ½B; ½U²B  A₂),

where U is the unitary operator such that A₁ − A₂ = U|A₁ − A₂|. Show that in every unitarily invariant norm we have

|||A − N₀||| ≤ |||A − N|||

for all 2n × 2n normal matrices N.
Problem IX.8.21. An alternate proof of the inequality (IX.55) is outlined below. Choose an orthonormal basis in which A is diagonal and A = (A₊  0; 0  −A₋). In this basis, let P have the block decomposition P = (P₁₁  P₁₂; P₂₁  P₂₂). By the pinching inequality,

|||A − P||| ≥ ||| (A₊ − P₁₁) ⊕ (−A₋ − P₂₂) |||.

Since both A₋ and P₂₂ are positive, |||A₋||| ≤ |||A₋ + P₂₂|||. Use this to prove the inequality (IX.55). This argument can be modified to give another proof of (IX.56) also. For this we need the following fact. Let T and S be operators such that 0 ≤ Re T ≤ Re S and Im T = Im S. If T is normal, then |||T||| ≤ |||S||| for every unitarily invariant norm. Prove this using the Fan Dominance Theorem (Theorem IV.2.2) and the result of Problem III.6.6.
IX.9 Notes and References
Matrix inequalities of the kind studied in this chapter can be found in several books. The reader should see, particularly, the books by Horn and Johnson, Marshall and Olkin, and Marcus and Minc, mentioned in Chapters 1 and 2; and, in addition, M.L. Mehta, Matrix Theory, second edition, Hindustan Publishing Co., 1989, and B. Simon, Trace Ideals and Their Applications, Cambridge University Press, 1979. Proposition IX.1.1 is proved in B. Simon, Trace Ideals, p. 95. Proposition IX.1.2, for the operator norm, is proved in A. McIntosh, Heinz inequalities and perturbation of spectral families, Macquarie Math. Reports, 1979; and for all unitarily invariant norms in F. Kittaneh, A note on the arithmetic-geometric mean inequality for matrices, Linear Algebra Appl., 171 (1992) 1-8. The Lie product formula has been extended to semigroups of operators in Banach spaces by H.F. Trotter. Our treatment of the material between Theorem IX.2.1 and Exercise IX.2.8 is based on T. Furuta, Norm inequalities equivalent to Löwner-Heinz theorem, Reviews in Math. Phys., 1 (1989) 135-137. The inequality (IX.5) can also be found in H.O. Cordes, Spectral Theory of Linear Differential Operators and Comparison Algebras, Cambridge University Press, 1987. Theorem IX.2.9 is taken from B. Wang and M. Gong, Some eigenvalue inequalities for positive semidefinite matrix power products, Linear Algebra Appl., 184 (1993) 249-260. The inequality (IX.13) was proved by H. Araki, On an inequality of Lieb and Thirring, Letters in Math. Phys., 19 (1990) 167-170. Theorem IX.2.10 is a rephrasing of some other results proved in this paper. The Golden-Thompson inequality is important in statistical mechanics. See S. Golden, Lower bounds for the Helmholtz function, Phys. Rev. B, 137 (1965) 1127-1128, and C.J. Thompson, Inequality with applications in statistical mechanics, J. Math. Phys., 6 (1965) 1812-1813. It was generalised by A. Lenard, Generalization of the Golden-Thompson inequality, Indiana Univ. Math. J., 21 (1971) 457-468, and further by C.J. Thompson, Inequalities and partial orders on matrix spaces, Indiana Univ. Math. J., 21 (1971) 469-480. These ideas have been developed further in the much more general setting of Lie groups by B. Kostant, On convexity, the Weyl group and the Iwasawa decomposition, Ann. Sci. E.N.S., 6 (1973) 413-455, and subsequently by others. Inequalities complementary to the Golden-Thompson inequality and its stronger version in Exercise IX.3.8 have been proved by F. Hiai and D. Petz, The Golden-Thompson trace inequality is complemented, Linear Algebra Appl., 181 (1993) 153-185, and by T. Ando and F. Hiai, Log majorization and complementary Golden-Thompson type inequalities, Linear Algebra Appl., 197/198 (1994) 113-131. Theorem IX.3.1 is a rephrasing of some results in J.E. Cohen, Inequalities for matrix exponentials, Linear Algebra Appl., 111 (1988) 25-28. Further results on such inequalities may be found in J.E. Cohen, S. Friedland, T. Kato, and F.P.
Kelly, Eigenvalue inequalities for products of matrix exponentials, Linear Algebra Appl., 45 (1982) 55-95, in D. Petz, A survey of certain trace inequalities, Banach Centre Publications, 30 (1994) 287-298, and in Chapter 6 of Horn and Johnson, Topics in Matrix Analysis. Theorem IX.4.2 was proved in R. Bhatia and F. Kittaneh, On the singular values of a product of operators, SIAM J. Matrix Analysis, 11 (1990) 272-277. The generalisation given in Theorem IX.4.5 is due to R. Bhatia and C. Davis, More matrix forms of the arithmetic-geometric mean inequality, SIAM J. Matrix Analysis, 14 (1993) 132-136. Many of the other results in Section IX.4 are from these two papers. The proof outlined in Exercise IX.4.6 is due to F. Kittaneh, A note on the arithmetic-geometric mean inequality for matrices, Linear Algebra Appl., 171 (1992) 1-8. A generalisation of the inequality (IX.21) has been proved by T. Ando, Matrix Young inequalities, Operator Theory: Advances and Applications, 75 (1995) 33-38. If p, q > 1 and 1/p + 1/q = 1, then the operator inequality |AB*| ≤ U((1/p)|A|^p + (1/q)|B|^q)U* is valid for some unitary U. Theorems IX.5.1 and IX.5.2 were proved in R. Bhatia and C. Davis, A Cauchy-Schwarz inequality for operators with applications, Linear Algebra Appl., 223 (1995) 119-129. For the case of the operator norm, the inequality (IX.38) is due to E. Heinz, as are the inequality (IX.29) and the one in Problem IX.8.8. See E. Heinz, Beiträge zur Störungstheorie der Spektralzerlegung, Math. Ann., 123 (1951) 415-438. Our approach to these inequalities follows the one in the paper by A. McIntosh cited above. The inequality in Problem IX.8.9 is also due to E. Heinz. The Mixed Schwarz inequality in Problem IX.8.10 was proved by T. Kato, Notes on some inequalities for linear operators, Math. Ann., 125 (1952) 208-212. (The papers by Heinz, Kato, and McIntosh do much of this for unbounded operators in infinite-dimensional spaces.) The class ℒ in Definition IX.5.6 was introduced by E.H. Lieb, Inequalities for some operator and matrix functions, Advances in Math., 20 (1976) 174-178. Theorem IX.5.11 was proved in this paper. These functions are also studied in R. Merris and J.A. Dias da Silva, Generalized Schur functions, J. Algebra, 35 (1975) 442-448. B. Simon (Trace Ideals, p. 99) calls them Liebian functions. The characterisation in Theorem IX.5.10 has not appeared before; it simplifies the proof of Theorem IX.5.11 considerably. The Lieb Concavity Theorem was proved by E.H. Lieb, Convex trace functions and the Wigner-Yanase-Dyson conjecture, Advances in Math., 11 (1973) 267-288. The proof given here is taken from B. Simon, Trace Ideals. T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl., 26 (1979) 203-241, takes a different approach. Using the concept of operator means, he first proves Theorem IX.6.3 (and its generalisation in Problem IX.8.15) and then deduces Lieb's Theorem from it. The proof given in Problem IX.8.17 is taken from D. Petz, Quasi-entropies for finite quantum systems, Rep.
Math. Phys., 21 (1986) 57-65. Theorem IX.6.5 was proved by G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys., 33 (1973) 305-322. Our proof is taken from A. Connes and E. Størmer, Entropy for automorphisms of II₁ von Neumann algebras, Acta Math., 134 (1975) 289-306. The reader would have guessed from the titles of these papers that these inequalities are useful in physics. The book Quantum Entropy and Its Use by M. Ohya and D. Petz, Springer-Verlag, 1993, contains a very detailed study of such inequalities. Another pertinent reference is D. Ruelle, Statistical Mechanics, Benjamin, 1969. The inequalities in Problem IX.8.16 are taken from T. Ando, Inequalities for permanents, Hokkaido Math. J., 10 (1981) 18-36, and R. Bhatia and C. Davis, Concavity of certain functions of matrices, Linear and Multilinear Algebra, 17 (1985) 155-164. Theorems IX.7.1 and IX.7.2 were proved in K. Fan and A.J. Hoffman, Some metric inequalities in the space of matrices, Proc. Amer. Math. Soc., 6 (1955) 111-116. The inequalities in Problem IX.8.19 were also proved in this paper. The result in Problem IX.8.18 is due to C.K. Li and N.K. Tsing, On the unitarily invariant norms and some related results, Linear and Multilinear Algebra, 20 (1987) 107-119. Two papers by P.R. Halmos, Positive approximants of operators, Indiana Univ. Math. J., 21 (1972) 951-960, and Spectral approximants of normal operators, Proc. Edinburgh Math. Soc., 19 (1974) 51-58, made the problem of operator approximation popular among operator theorists. The results in Theorem IX.7.3, Exercise IX.7.4, and in Problem IX.8.21 were proved in these papers for the special case of the operator norm (but more generally for Hilbert space operators). The first paper of Halmos also tackles the problem of finding a positive approximant to an arbitrary operator, in the operator norm. The solution is different from the one for the Hilbert-Schmidt norm given in Theorem IX.7.5, and the problem is much more complicated. The problem of finding the closest normal matrix has been solved completely only in the 2 × 2 case. Some properties of the normal approximant and algorithms for finding it are given in A. Ruhe, Closest normal matrix finally found!, BIT, 27 (1987) 585-598. The result in Problem IX.8.20 was proved, in the special case of the operator norm, by J. Phillips, Nearest normal approximation for certain normal operators, Proc. Amer. Math. Soc., 67 (1977) 236-240. The general result was proved in R. Bhatia, R. Horn, and F. Kittaneh, Normal approximants to binormal operators, Linear Algebra Appl., 147 (1991) 169-179. An excellent survey of matrix approximation problems, with many references and applications, can be found in N.J. Higham, Matrix nearness problems and applications, in the collection Applications of Matrix Theory, Oxford University Press, 1989. A particularly striking application of Theorem IX.7.2 has been found in quantum chemistry. Given n linearly independent unit vectors e₁, …, e_n in an n-dimensional Hilbert space, what is the orthonormal basis f₁, …, f_n that is closest to the e_j, in the sense that Σ ||e_j − f_j||² is minimal? The Gram-Schmidt procedure does not lead to such an orthonormal basis. The chemist P.O. Löwdin, On the
non-orthogonality problem connected with the use of atomic wave functions in the theory of molecules and crystals, J. Chem. Phys., 18 (1950) 365-374, found a procedure to obtain such a basis. The problem is clearly equivalent to that of finding a unitary matrix closest to an invertible matrix, in the Hilbert-Schmidt norm. Theorem IX.7.2 solves the problem for all unitarily invariant norms. The importance of such results is explained in J.A. Goldstein and M. Levy, Linear algebra and quantum chemistry, American Math. Monthly, 98 (1991) 710-718.
X  Perturbation of Matrix Functions
In earlier chapters we derived several inequalities that describe the variation of eigenvalues, eigenvectors, determinants, permanents , and tensor powers of a matrix. Similar problems for some other matrix functions are studied in this chapter.
X.1 Operator Monotone Functions
If a, b are positive real numbers, then it is easy to see that |a^r − b^r| ≥ |a − b|^r if r ≥ 1, and |a^r − b^r| ≤ |a − b|^r if 0 ≤ r ≤ 1. The inequalities in this section are extensions of these elementary inequalities to positive operators A, B. Instead of the power functions f(t) = t^r, 0 ≤ r ≤ 1, we shall consider the more general class of operator monotone functions.
Theorem X.1.1 Let f be an operator monotone function on [0, ∞) such that f(0) = 0. Then for all positive operators A, B,

||f(A) − f(B)|| ≤ f(||A − B||).   (X.1)

Proof. Since f is concave (Theorem V.2.5) and f(0) = 0, we have f(a + b) ≤ f(a) + f(b) for all nonnegative numbers a, b. Let a = ||A − B||. Then A − B ≤ aI. Hence, A ≤ B + aI and f(A) ≤ f(B + aI). By the subadditivity property of f mentioned above, therefore, f(A) ≤ f(B) + f(a)I. Thus f(A) − f(B) ≤ f(a)I and, by symmetry, f(B) − f(A) ≤ f(a)I. This implies that |f(A) − f(B)| ≤ f(a)I. Hence,

||f(A) − f(B)|| ≤ f(a) = f(||A − B||). ■
Note the special consequence of the above theorem:

||A^r − B^r|| ≤ ||A − B||^r,   0 < r < 1,   (X.2)

for any two positive operators A, B. Note also that the argument in the above proof shows that

|||f(A) − f(B)||| ≤ f(||A − B||) |||I|||

for every unitarily invariant norm.
Exercise X.1.2 Show that, for 2 × 2 positive matrices, the inequality ||A^{1/2} − B^{1/2}||₂ ≤ ||A − B||₂^{1/2} is not always valid. (It is false even when B = 0.)
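One concrete choice (an added illustration; the exercise itself leaves the example to the reader) is A = I and B = 0: then ||A^{1/2} − B^{1/2}||₂ = √2 while ||A − B||₂^{1/2} = 2^{1/4}.

```python
# Counterexample for the Frobenius norm version of (X.2) with r = 1/2.
import numpy as np

A, B = np.eye(2), np.zeros((2, 2))
sqA, sqB = A, B                        # A^{1/2} = I and B^{1/2} = 0 here
lhs = np.linalg.norm(sqA - sqB)        # sqrt(2) ~ 1.414
rhs = np.linalg.norm(A - B) ** 0.5     # 2**0.25 ~ 1.189
print(lhs > rhs)                       # the claimed inequality fails
```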
The inequality (X.2) can be rewritten in another form:

||A^r − B^r|| ≤ || |A − B|^r ||,   0 < r < 1.   (X.3)
This has a generalisation to all unitarily invariant norms. Once again, for this generalisation, it is convenient to consider the more general class of operator monotone functions. Recall that every operator monotone function f on [0, ∞) has an integral representation

f(t) = γ + βt + ∫₀^∞ (λt)/(λ + t) dw(λ),   (X.4)

where γ = f(0), β ≥ 0, and w is a positive measure such that ∫₀^∞ λ/(1 + λ) dw(λ) < ∞. (See (V.53).)
Theorem X.1.3 Let f be an operator monotone function on [0, ∞) such that f(0) = 0. Then for all positive operators A, B and for all unitarily invariant norms,

|||f(A) − f(B)||| ≤ |||f(|A − B|)|||.   (X.5)
In the proof of the theorem, we will use the following lemma.

Lemma X.1.4 Let X, Y be positive operators. Then

|||(X + I)⁻¹ − (X + Y + I)⁻¹||| ≤ |||I − (Y + I)⁻¹|||

for every unitarily invariant norm.

Proof. Since (X + I)⁻¹ ≤ I, by Lemma V.1.5 we have Y^{1/2}(X + I)⁻¹Y^{1/2} ≤ Y. Hence,

[Y^{1/2}(X + I)⁻¹Y^{1/2} + I]⁻¹ ≥ (Y + I)⁻¹.

Therefore, by the Weyl Monotonicity Principle,

λ_j(I − [Y^{1/2}(X + I)⁻¹Y^{1/2} + I]⁻¹) ≤ λ_j(I − (Y + I)⁻¹)

for all j. Note that Y^{1/2}(X + I)⁻¹Y^{1/2} has the same eigenvalues as those of (X + I)^{−1/2}Y(X + I)^{−1/2}. So, the above inequality can also be written as

λ_j(I − [(X + I)^{−1/2}Y(X + I)^{−1/2} + I]⁻¹) ≤ λ_j(I − (Y + I)⁻¹).

From the identity

(X + I)⁻¹ − (X + Y + I)⁻¹ = (X + I)^{−1/2}{I − [(X + I)^{−1/2}Y(X + I)^{−1/2} + I]⁻¹}(X + I)^{−1/2}

and the fact that ||(X + I)^{−1/2}|| ≤ 1, we see that

λ_j((X + I)⁻¹ − (X + Y + I)⁻¹) ≤ λ_j(I − [(X + I)^{−1/2}Y(X + I)^{−1/2} + I]⁻¹).

Thus, for all j,

λ_j((X + I)⁻¹ − (X + Y + I)⁻¹) ≤ λ_j(I − (Y + I)⁻¹).

This is more than what we need to prove the lemma. ■
Proof of Theorem X.1.3: By the Fan Dominance Property (Theorem IV.2.2) it is sufficient to prove the inequality (X.5) for the special class of norms ‖·‖_(k), k = 1, 2, …, n. We will first consider the case when A − B is positive. Let C = A − B. We want to prove that
$$\| f(B + C) - f(B) \|_{(k)} \le \| f(C) \|_{(k)}. \tag{X.6}$$
Let σ_j = s_j(C), j = 1, 2, …, n. Since the σ_j are the eigenvalues of the positive operator C, we have
$$s_j(h(C)) = h(\sigma_j), \quad j = 1, \dots, n,$$
for every monotonically increasing nonnegative function h(t) on [0, ∞). Thus
$$\| h(C) \|_{(k)} = \sum_{j=1}^{k} h(\sigma_j), \quad k = 1, \dots, n,$$
for all such functions h.
X. Perturbation of Matrix Functions
Now, f has the representation (X.4) with γ = 0. The functions βt and λt/(λ + t) are nonnegative, monotonically increasing functions of t. Hence,
$$\| f(C) \|_{(k)} = \beta \| C \|_{(k)} + \int_0^\infty \lambda\, \| C (C + \lambda I)^{-1} \|_{(k)}\, dw(\lambda).$$
In the same way, we can obtain from the integral representation of f
$$\| f(B+C) - f(B) \|_{(k)} \le \beta \| C \|_{(k)} + \int_0^\infty \lambda\, \| (B+C)(B+C+\lambda I)^{-1} - B(B+\lambda I)^{-1} \|_{(k)}\, dw(\lambda).$$
Thus, our assertion (X.6) will be proved if we show that for each λ > 0
$$\| (B+C)(B+C+\lambda I)^{-1} - B(B+\lambda I)^{-1} \|_{(k)} \le \| C(C+\lambda I)^{-1} \|_{(k)}.$$
Now note that we can write
$$X(X + \lambda I)^{-1} = I - \Big( \frac{X}{\lambda} + I \Big)^{-1}.$$
So, the above inequality follows from Lemma X.1.4. This proves the theorem in the special case when A − B is positive.

To prove the general case we will use the special case proved above and two simple facts. First, if X, Y are Hermitian with positive parts X⁺, Y⁺ in their respective Jordan decompositions, then the inequality X ≤ Y implies that ‖X⁺‖_(k) ≤ ‖Y⁺‖_(k) for all k. This is an immediate consequence of Weyl's Monotonicity Principle. Second, if X₁, X₂, Y₁, Y₂ are positive operators such that X₁X₂ = 0, Y₁Y₂ = 0, ‖X₁‖_(k) ≤ ‖Y₁‖_(k), and ‖X₂‖_(k) ≤ ‖Y₂‖_(k) for all k, then we have ‖X₁ + X₂‖_(k) ≤ ‖Y₁ + Y₂‖_(k) for all k. This can be easily seen using the fact that since X₁ and X₂ commute, they can be simultaneously diagonalised, and so can Y₁, Y₂.

Now let A, B be any two positive operators. Since A − B ≤ (A − B)⁺, we have A ≤ B + (A − B)⁺, and hence f(A) ≤ f(B + (A − B)⁺). From this we have
$$f(A) - f(B) \le f(B + (A-B)^+) - f(B),$$
and, therefore, by the first observation above,
$$\| [\, f(A) - f(B) \,]^+ \|_{(k)} \le \| f(B + (A-B)^+) - f(B) \|_{(k)}$$
for all k. Then, using the special case of the theorem proved above, we can conclude that
$$\| [\, f(A) - f(B) \,]^+ \|_{(k)} \le \| f((A-B)^+) \|_{(k)}$$
for all k. Interchanging A and B, we have
$$\| [\, f(B) - f(A) \,]^+ \|_{(k)} \le \| f((B-A)^+) \|_{(k)}$$
for all k. Now note that
$$f([A-B]^+)\, f([B-A]^+) = 0, \qquad f([A-B]^+) + f([B-A]^+) = f(|A-B|),$$
$$[\, f(A)-f(B) \,]^+ [\, f(B)-f(A) \,]^+ = 0, \qquad [\, f(A)-f(B) \,]^+ + [\, f(B)-f(A) \,]^+ = |f(A)-f(B)|.$$
Thus, by the second observation above, the two inequalities imply that
$$\| f(A) - f(B) \|_{(k)} \le \| f(|A-B|) \|_{(k)}$$
for all k. This proves the theorem.
•
Exercise X.1.5 Show that the conclusion of Theorem X.1.3 is valid for all nonnegative operator monotone functions on [0, ∞); i.e., we can replace the condition f(0) = 0 by the condition f(0) ≥ 0.

One should note two special corollaries of the theorem: for all positive operators A, B and for all unitarily invariant norms, we have
$$|||\, A^r - B^r \,||| \le |||\, |A-B|^r \,|||, \quad 0 < r \le 1, \tag{X.7}$$
$$|||\, \log(I+A) - \log(I+B) \,||| \le |||\, \log(I + |A-B|) \,|||. \tag{X.8}$$
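Both corollaries are easy to test numerically; a sketch in the trace norm (the helpers psd, mfun, and trnorm are ours, not the book's):

```python
import numpy as np

def psd(n, rng):
    X = rng.standard_normal((n, n))
    return X @ X.T

def mfun(S, f):
    # apply a scalar function to a positive matrix via eigendecomposition
    w, V = np.linalg.eigh(S)
    return (V * f(np.clip(w, 0, None))) @ V.T

def trnorm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(1)
n, ok = 5, True
for _ in range(50):
    A, B = psd(n, rng), psd(n, rng)
    absD = mfun((A - B) @ (A - B), np.sqrt)   # |A - B|
    for r in (0.3, 0.5, 0.9):                 # inequality (X.7)
        lhs = trnorm(mfun(A, lambda t: t**r) - mfun(B, lambda t: t**r))
        ok &= lhs <= trnorm(mfun(absD, lambda t: t**r)) + 1e-8
    I = np.eye(n)                             # inequality (X.8)
    lhs = trnorm(mfun(A + I, np.log) - mfun(B + I, np.log))
    ok &= lhs <= trnorm(mfun(absD + I, np.log)) + 1e-8
print(ok)
```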
Theorem X.1.6 Let g be a continuous strictly increasing map of [0, ∞) onto itself. Suppose that the inverse map g⁻¹ is operator monotone. Then for all positive operators A, B and for all unitarily invariant norms, we have
$$|||\, g(A) - g(B) \,||| \ge |||\, g(|A-B|) \,|||. \tag{X.9}$$
Proof. Let f = g⁻¹. Since f is operator monotone, it is concave by Theorem V.2.5. Hence g is convex. From Theorem X.1.3, with g(A) and g(B) in place of A and B, we have
$$|||\, A - B \,||| \le |||\, f(|g(A) - g(B)|) \,|||.$$
This is equivalent to the weak majorisation
$$\{ s_j(A-B) \} \prec_w \{ s_j( f(|g(A) - g(B)|) ) \}.$$
Since f is monotone,
$$s_j( f(|g(A) - g(B)|) ) = f( s_j( g(A) - g(B) ) )$$
for each j. So, we have
$$\{ s_j(A-B) \} \prec_w \{ f( s_j( g(A) - g(B) ) ) \}.$$
Since g is convex and monotone, by Corollary II.3.4 we have from this
$$\{ g( s_j(A-B) ) \} \prec_w \{ s_j( g(A) - g(B) ) \}.$$
Since g is monotone, this is the same as saying that
$$\{ s_j( g(|A-B|) ) \} \prec_w \{ s_j( g(A) - g(B) ) \},$$
and this, in turn, implies the inequality (X.9).
•
Two special corollaries that complement the inequalities (X.7) and (X.8) are worthy of note. For all positive operators A, B and for all unitarily invariant norms,
$$|||\, A^r - B^r \,||| \ge |||\, |A-B|^r \,|||, \quad \text{if } r \ge 1, \tag{X.10}$$
$$|||\, \exp A - \exp B \,||| \ge |||\, \exp(|A-B|) - I \,|||. \tag{X.11}$$
Exercise X.1.7 Derive the inequality (X.10) from (X.7) using Exercise IV.2.8.
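The reversed inequalities (X.10) and (X.11) can be checked numerically in the same way as before (a sketch; the helper names are ours):

```python
import numpy as np

def psd(n, rng):
    X = rng.standard_normal((n, n))
    return X @ X.T / n            # keep eigenvalues modest for exp

def mfun(S, f):
    w, V = np.linalg.eigh(S)
    return (V * f(np.clip(w, 0, None))) @ V.T

def trnorm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(2)
n, ok = 5, True
for _ in range(50):
    A, B = psd(n, rng), psd(n, rng)
    absD = mfun((A - B) @ (A - B), np.sqrt)
    for r in (1.5, 2.0, 3.0):     # inequality (X.10)
        lhs = trnorm(mfun(A, lambda t: t**r) - mfun(B, lambda t: t**r))
        ok &= lhs >= trnorm(mfun(absD, lambda t: t**r)) - 1e-8
    # inequality (X.11)
    lhs = trnorm(mfun(A, np.exp) - mfun(B, np.exp))
    ok &= lhs >= trnorm(mfun(absD, np.exp) - np.eye(n)) - 1e-8
print(ok)
```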
Is there an inequality like (X.10) for Hermitian operators A, B? First note that if A is Hermitian, only positive integral powers of A are meaningfully defined. So, the question is whether |||(A − B)^m||| can be bounded above by |||A^m − B^m|||. No such bound is possible if m is an even integer; for the choice B = −A we have A^m − B^m = 0. For odd integers m, we do have a satisfactory answer.

Theorem X.1.8 Let A, B be Hermitian operators. Then for all unitarily invariant norms, and for m = 1, 2, …,
$$|||\, (A - B)^{2m+1} \,||| \le 4^m\, |||\, A^{2m+1} - B^{2m+1} \,|||. \tag{X.12}$$
For the proof we need the following lemma.

Lemma X.1.9 Let A, B be Hermitian operators, let X be any operator, and let m be any positive integer. Then, for j = 1, 2, …, m,
$$|||\, A^{m+j} X B^{m-j+1} - A^{m-j+1} X B^{m+j} \,||| \le |||\, A^{m+j+1} X B^{m-j} - A^{m-j} X B^{m+j+1} \,|||. \tag{X.13}$$
Proof. By the arithmetic-geometric mean inequality proved in Theorem IX.4.5, we have
$$|||\, A X B \,||| \le \tfrac{1}{2}\, |||\, A^2 X + X B^2 \,|||$$
for all operators X and Hermitian A, B. We will use this to prove the inequality (X.13). First consider the case j = 1. We have
$$|||\, A^{m+1} X B^m - A^m X B^{m+1} \,||| = |||\, A (A^m X B^{m-1} - A^{m-1} X B^m) B \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^2 (A^m X B^{m-1} - A^{m-1} X B^m) + (A^m X B^{m-1} - A^{m-1} X B^m) B^2 \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^{m+2} X B^{m-1} - A^{m-1} X B^{m+2} \,||| + \tfrac{1}{2}\, |||\, A^{m+1} X B^m - A^m X B^{m+1} \,|||.$$
Hence
$$|||\, A^{m+1} X B^m - A^m X B^{m+1} \,||| \le |||\, A^{m+2} X B^{m-1} - A^{m-1} X B^{m+2} \,|||.$$
This shows that the inequality (X.13) is true for j = 1. The general case is proved by induction. Suppose that the inequality (X.13) has been proved for j − 1 in place of j. Then, using the arithmetic-geometric mean inequality, the triangle inequality, and this induction hypothesis, we have
$$|||\, A^{m+j} X B^{m-j+1} - A^{m-j+1} X B^{m+j} \,||| = |||\, A (A^{m+j-1} X B^{m-j} - A^{m-j} X B^{m+j-1}) B \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^2 (A^{m+j-1} X B^{m-j} - A^{m-j} X B^{m+j-1}) + (A^{m+j-1} X B^{m-j} - A^{m-j} X B^{m+j-1}) B^2 \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^{m+j+1} X B^{m-j} - A^{m-j} X B^{m+j+1} \,||| + \tfrac{1}{2}\, |||\, A^{m+j-1} X B^{m-(j-1)+1} - A^{m-(j-1)+1} X B^{m+j-1} \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^{m+j+1} X B^{m-j} - A^{m-j} X B^{m+j+1} \,||| + \tfrac{1}{2}\, |||\, A^{m+j} X B^{m-j+1} - A^{m-j+1} X B^{m+j} \,|||.$$
This proves the desired inequality.
•
Proof of Theorem X.1.8: Using the triangle inequality and a very special case of Lemma X.1.9 (the case X = I, j = m), we have
$$|||\, A^{2m}(A-B) + (A-B)B^{2m} \,||| \le |||\, A^{2m+1} - B^{2m+1} \,||| + |||\, A^{2m} B - A B^{2m} \,||| \le 2\, |||\, A^{2m+1} - B^{2m+1} \,|||. \tag{X.14}$$
Let C = A − B, and choose an orthonormal basis x_j such that C x_j = λ_j x_j, where |λ_j| = s_j(C) for j = 1, 2, …, n. Then, by the extremal representation of the Ky Fan norms given in Problem III.6.6,
$$\| A^{2m} C + C B^{2m} \|_{(k)} \ge \sum_{j=1}^{k} | \langle x_j, (A^{2m} C + C B^{2m}) x_j \rangle | = \sum_{j=1}^{k} |\lambda_j| \big\{ \langle x_j, A^{2m} x_j \rangle + \langle x_j, B^{2m} x_j \rangle \big\}.$$
Now use the observation in Problem IX.8.14 and the convexity of the function f(t) = t^{2m} to see that
$$\langle x_j, A^{2m} x_j \rangle + \langle x_j, B^{2m} x_j \rangle \ge \| A x_j \|^{2m} + \| B x_j \|^{2m} \ge 2^{1-2m} \big( \| A x_j \| + \| B x_j \| \big)^{2m} \ge 2^{1-2m} \| A x_j - B x_j \|^{2m} = 2^{1-2m} |\lambda_j|^{2m}.$$
We thus have
$$\| A^{2m} C + C B^{2m} \|_{(k)} \ge \sum_{j=1}^{k} 2^{1-2m} |\lambda_j|^{2m+1}.$$
Since this is true for all k, we have
$$|||\, (A-B)^{2m+1} \,||| \le 2^{2m-1}\, |||\, A^{2m} C + C B^{2m} \,|||$$
for all unitarily invariant norms. Combining this with (X.14), we obtain the inequality (X.12). Observe that when B = −A, the two sides of (X.12) are equal. •
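A numerical check of (X.12) in the trace norm, including the equality case B = −A (a sketch; the helper names are ours):

```python
import numpy as np

def herm(n, rng):
    X = rng.standard_normal((n, n))
    return (X + X.T) / 2

def trnorm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(3)
n, ok = 5, True
for _ in range(100):
    A, B = herm(n, rng), herm(n, rng)
    C = A - B
    for m in (1, 2):
        p = 2 * m + 1
        lhs = trnorm(np.linalg.matrix_power(C, p))
        rhs = 4**m * trnorm(np.linalg.matrix_power(A, p) - np.linalg.matrix_power(B, p))
        ok &= lhs <= rhs + 1e-8

# equality when B = -A (m = 1)
p = 3
lhs = trnorm(np.linalg.matrix_power(A - (-A), p))
rhs = 4 * trnorm(np.linalg.matrix_power(A, p) - np.linalg.matrix_power(-A, p))
ok &= abs(lhs - rhs) <= 1e-8 * rhs
print(ok)
```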
X.2 The Absolute Value

In Section VII.5 we obtained bounds for ||| |A| − |B| ||| in terms of |||A − B|||. More such bounds are obtained in this section. Since |A| = (A*A)^{1/2}, results for the square root function obtained in the preceding section are useful here.
Theorem X.2.1 Let A, B be any two operators. Then, for every unitarily invariant norm,
$$|||\, |A| - |B| \,||| \le \sqrt{2}\, \big( |||A+B|||\; |||A-B||| \big)^{1/2}. \tag{X.15}$$
Proof. From the inequality (X.7) we have
$$|||\, |A| - |B| \,||| \le |||\, |A^*A - B^*B|^{1/2} \,|||. \tag{X.16}$$
Note that
$$A^*A - B^*B = \tfrac{1}{2} \big\{ (A+B)^*(A-B) + (A-B)^*(A+B) \big\}. \tag{X.17}$$
Hence, by Theorem III.5.6, we can find unitaries U and V such that
$$|A^*A - B^*B| \le \tfrac{1}{2} \big\{ U |(A+B)^*(A-B)| U^* + V |(A-B)^*(A+B)| V^* \big\}.$$
Since the square root function is operator monotone, this operator inequality is preserved if we take square roots on both sides. Since every unitarily invariant norm is monotone, this shows that
$$|||\, |A^*A - B^*B|^{1/2} \,|||^2 \le \tfrac{1}{2}\, |||\, \big[ U |(A+B)^*(A-B)| U^* + V |(A-B)^*(A+B)| V^* \big]^{1/2} \,|||^2.$$
By the result of Problem IV.5.6, we have
$$|||\, |X + Y|^{1/2} \,|||^2 \le 2 \big( |||\, |X|^{1/2} \,|||^2 + |||\, |Y|^{1/2} \,|||^2 \big)$$
for all X, Y. Hence,
$$|||\, |A^*A - B^*B|^{1/2} \,|||^2 \le |||\, |(A+B)^*(A-B)|^{1/2} \,|||^2 + |||\, |(A-B)^*(A+B)|^{1/2} \,|||^2.$$
By the Cauchy-Schwarz inequality (IV.44), the right-hand side is bounded above by 2 |||A+B||| |||A−B|||. Thus the inequality (X.15) now follows from (X.16). •
Example X.2.2 Let
$$A = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad B = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix}.$$
Then
$$\| \, |A| - |B| \, \|_1 = 2, \qquad \| A + B \|_1 = \| A - B \|_1 = \sqrt{2}.$$
So, for the trace norm the inequality (X.15) is sharp. An improvement of this inequality is possible for special norms.
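One concrete pair attaining equality in (X.15) for the trace norm (the pair we use here for a direct numerical check; any pair with the stated norm values serves the same purpose) can be verified in a few lines:

```python
import numpy as np

def trnorm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

def absval(X):
    # |X| = (X*X)^{1/2}
    w, V = np.linalg.eigh(X.conj().T @ X)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

A = np.array([[1.0, 1.0], [0.0, 0.0]]) / np.sqrt(2)
B = np.array([[1.0, -1.0], [0.0, 0.0]]) / np.sqrt(2)

lhs = trnorm(absval(A) - absval(B))                        # left side of (X.15)
gap = np.sqrt(2) * np.sqrt(trnorm(A + B) * trnorm(A - B))  # right side of (X.15)
print(round(lhs, 6), round(gap, 6))   # -> 2.0 2.0
```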
Theorem X.2.3 Let A, B be any two operators. Then for every Q-norm (and thus, in particular, for every Schatten p-norm with p ≥ 2) we have
$$\| \, |A| - |B| \, \|_Q \le \big( \| A+B \|_Q\, \| A-B \|_Q \big)^{1/2}. \tag{X.18}$$
Proof. By the definition of Q-norms, the inequality (X.18) is equivalent to the assertion that
$$|||\, (|A| - |B|)^2 \,||| \le \big( |||\, |A+B|^2 \,|||\; |||\, |A-B|^2 \,||| \big)^{1/2} \tag{X.19}$$
for all unitarily invariant norms. From the inequality (X.10) we have
$$|||\, (|A| - |B|)^2 \,||| \le |||\, A^*A - B^*B \,|||.$$
Using the identity (X.17), we see that
$$|||\, A^*A - B^*B \,||| \le \tfrac{1}{2} \big\{ |||\, (A+B)^*(A-B) \,||| + |||\, (A-B)^*(A+B) \,||| \big\}.$$
Now, using the Cauchy-Schwarz inequality given in Problem IV.5.7, we see that each of the two terms in the brackets above is dominated by
$$\big( |||\, |A+B|^2 \,|||\; |||\, |A-B|^2 \,||| \big)^{1/2}.$$
This proves the inequality (X.19). •
Theorem X.2.4 Let A, B be any two operators. Then, for all Schatten p-norms with 1 ≤ p ≤ 2,
$$\| \, |A| - |B| \, \|_p \le 2^{1/p - 1/2} \big( \| A+B \|_p\, \| A-B \|_p \big)^{1/2}. \tag{X.20}$$
Proof. Let
$$\| X \|_p := \Big( \sum_j s_j(X)^p \Big)^{1/p} \quad \text{for all } p > 0.$$
When p ≥ 1, these are the Schatten p-norms. When 0 < p < 1, this defines a quasi-norm. Instead of the triangle inequality, we have
$$\| X + Y \|_p \le 2^{1/p - 1} \big( \| X \|_p + \| Y \|_p \big), \quad 0 < p \le 1. \tag{X.21}$$
(See Problems IV.5.1 and IV.5.6.) Note that for all positive real numbers r and p, we have
$$\| \, |X|^r \, \|_p = \| X \|_{pr}^{\,r}. \tag{X.22}$$
Thus, the inequality (X.7), restricted to the p-norms, gives for all positive operators A, B
$$\| A^{1/2} - B^{1/2} \|_p \le \| A - B \|_{p/2}^{1/2}. \tag{X.23}$$
Hence, for any two operators A, B,
$$\| \, |A| - |B| \, \|_p \le \| A^*A - B^*B \|_{p/2}^{1/2}, \quad 1 \le p \le \infty.$$
Now use the identity (X.17) and the property (X.21) to see that, for 1 ≤ p ≤ 2,
$$\| A^*A - B^*B \|_{p/2} \le 2^{2/p - 2} \big\{ \| (A+B)^*(A-B) \|_{p/2} + \| (A-B)^*(A+B) \|_{p/2} \big\}.$$
From the relation (X.22) and the Cauchy-Schwarz inequality (IV.44), it follows that each of the two terms in the brackets is dominated by ‖A+B‖_p ‖A−B‖_p. Hence
$$\| A^*A - B^*B \|_{p/2} \le 2^{2/p - 1}\, \| A+B \|_p\, \| A-B \|_p$$
for 1 ≤ p ≤ 2. This proves the theorem. •

The example given in X.2.2 shows that, for each 1 ≤ p ≤ 2, the inequality (X.20) is sharp. In Section VII.5 we saw that
$$\| \, |A| - |B| \, \|_2 \le \sqrt{2}\, \| A - B \|_2 \tag{X.24}$$
for any two operators A, B. Further, if both A and B are normal, the factor √2 can be replaced by 1. Can one prove a similar inequality for the operator norm instead of the Hilbert-Schmidt norm? Of course, we have from (X.24)
$$\| \, |A| - |B| \, \| \le \sqrt{2n}\, \| A - B \| \tag{X.25}$$
for all operators A, B on an n-dimensional Hilbert space H. It is known that the factor √(2n) in the above inequality can be replaced by a factor c_n = O(log n); and even when both A, B are Hermitian, such a factor is necessary. (See the Notes at the end of the chapter.) In a slightly different vein, we have the following theorem.
Theorem X.2.5 (T. Kato) For any two operators A, B we have
$$\| \, |A| - |B| \, \| \le \frac{2}{\pi}\, \| A - B \| \Big( 2 + \log \frac{\| A \| + \| B \|}{\| A - B \|} \Big). \tag{X.26}$$
Proof. The square root function has an integral representation (V.4); this says that
$$t^{1/2} = \frac{1}{\pi} \int_0^\infty \lambda^{-1/2}\, \frac{t}{\lambda + t}\, d\lambda.$$
We can rewrite this as
$$t^{1/2} = \frac{1}{\pi} \int_0^\infty \big[ \lambda^{-1/2} - \lambda^{1/2} (\lambda + t)^{-1} \big]\, d\lambda.$$
Using this, we have
$$|A| - |B| = \frac{1}{\pi} \int_0^\infty \lambda^{1/2} \big[ (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \big]\, d\lambda. \tag{X.27}$$
We will estimate the norm of this integral by splitting it into three parts. Let
$$\alpha = \| A - B \|^2, \qquad \beta = \big( \| A \| + \| B \| \big)^2.$$
Now note that if X, Y are two positive operators, then −Y ≤ X − Y ≤ X, and hence ‖X − Y‖ ≤ max(‖X‖, ‖Y‖). Using this we see that
$$\Big\| \int_0^\alpha \lambda^{1/2} \big[ (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \big]\, d\lambda \Big\| \le \int_0^\alpha \lambda^{-1/2}\, d\lambda = 2\alpha^{1/2} = 2\, \| A - B \|. \tag{X.28}$$
From the identity
$$(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} = (|B|^2 + \lambda)^{-1} (A^*A - B^*B) (|A|^2 + \lambda)^{-1} \tag{X.29}$$
and the identity (X.17), we see that
$$\| (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \| \le \lambda^{-2}\, \| A + B \|\, \| A - B \| \le \lambda^{-2}\, \beta^{1/2}\, \| A - B \|.$$
Hence,
$$\Big\| \int_\beta^\infty \lambda^{1/2} \big[ (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \big]\, d\lambda \Big\| \le \beta^{1/2}\, \| A - B \| \int_\beta^\infty \lambda^{-3/2}\, d\lambda = 2\, \| A - B \|. \tag{X.30}$$
Since A*A − B*B = B*(A − B) + (A* − B*)A, from (X.29) we have
$$(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} = (|B|^2 + \lambda)^{-1} B^* (A - B) (|A|^2 + \lambda)^{-1} + (|B|^2 + \lambda)^{-1} (A^* - B^*) A (|A|^2 + \lambda)^{-1}. \tag{X.31}$$
Note that
$$\| (|B|^2 + \lambda)^{-1} B^* \| = \| B (|B|^2 + \lambda)^{-1} \| = \| \, |B|\, (|B|^2 + \lambda)^{-1} \| \le \frac{1}{2 \lambda^{1/2}},$$
since the maximum value of the function f(t) = t/(t² + λ) is 1/(2λ^{1/2}). By the same argument,
$$\| A (|A|^2 + \lambda)^{-1} \| \le \frac{1}{2 \lambda^{1/2}}.$$
So, from (X.31) we obtain
$$\| (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \| \le \lambda^{-3/2}\, \| A - B \|.$$
Hence
$$\Big\| \int_\alpha^\beta \lambda^{1/2} \big[ (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \big]\, d\lambda \Big\| \le \| A - B \| \int_\alpha^\beta \lambda^{-1}\, d\lambda = \| A - B \| \log \frac{\beta}{\alpha} = 2\, \| A - B \| \log \frac{\| A \| + \| B \|}{\| A - B \|}. \tag{X.32}$$
Combining (X.27), (X.28), (X.30), and (X.32), we obtain the inequality (X.26). •
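Kato's inequality (X.26) can be tested numerically for both nearby and well-separated pairs (a sketch; the helpers opnorm and absval are ours):

```python
import numpy as np

def opnorm(X):
    return np.linalg.norm(X, 2)

def absval(X):
    w, V = np.linalg.eigh(X.conj().T @ X)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

rng = np.random.default_rng(4)
n, ok = 6, True
for _ in range(100):
    A = rng.standard_normal((n, n))
    for B in (A + 1e-3 * rng.standard_normal((n, n)),   # a nearby pair
              rng.standard_normal((n, n))):             # an unrelated pair
        d = opnorm(A - B)
        bound = (2 / np.pi) * d * (2 + np.log((opnorm(A) + opnorm(B)) / d))
        ok &= opnorm(absval(A) - absval(B)) <= bound + 1e-10
print(ok)
```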
X.3 Local Perturbation Bounds

Inequalities obtained above are global, in the sense that they are valid for all pairs of operators A, B. Some special results can be obtained if B is restricted to be close to A, or when both are restricted to be away from 0. It is possible to derive many interesting inequalities by using only elementary calculus on normed spaces. A quick review of the basic concepts of the Fréchet differential calculus that are used below is given in the Appendix.

Let f be any continuously differentiable map on an open interval I. Then the map that f induces on the set of Hermitian matrices whose spectrum is contained in I is Fréchet differentiable. This has been proved in Theorem V.3.3, and an explicit formula for the derivative is also given there. For each A, the derivative Df(A) is a linear operator on the space of all Hermitian matrices. The norm of this operator is defined as
$$\| Df(A) \| = \sup_{\| B \| = 1} \| Df(A)(B) \|. \tag{X.33}$$
More generally, any unitarily invariant norm on Hermitian matrices leads to a corresponding norm for the linear operator Df(A); we denote this as
$$||| Df(A) ||| = \sup_{||| B ||| = 1} ||| Df(A)(B) |||. \tag{X.34}$$
For some special functions f, we will find upper bounds for these quantities. Among the functions we consider are operator monotone functions on (0, ∞). The square root function f(t) = t^{1/2} is easier to handle, and since it is especially important, it is worthwhile to deal with it separately.
Theorem X.3.1 Let f(t) = t^{1/2}, 0 < t < ∞. Then for every positive operator A, and for every unitarily invariant norm,
$$||| Df(A) ||| \le \tfrac{1}{2}\, \| A^{-1} \|^{1/2}. \tag{X.35}$$
Proof. The function g(t) = t², 0 < t < ∞, is the inverse of f. So, by the chain rule of differentiation, Df(A) = [Dg(f(A))]⁻¹ for every positive operator A. Note that Dg(A)(X) = AX + XA for every X. So
$$[ Dg(f(A)) ](X) = A^{1/2} X + X A^{1/2}.$$
If A has eigenvalues a₁ ≥ ⋯ ≥ a_n > 0, then dist(σ(A^{1/2}), σ(−A^{1/2})) = 2 a_n^{1/2} = 2 \| A^{-1} \|^{-1/2}. Hence, by Theorem VII.2.12,
$$||| [ Dg(f(A)) ]^{-1} ||| \le \tfrac{1}{2}\, \| A^{-1} \|^{1/2}.$$
This proves the theorem. •
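Since Df(A) inverts the map X ↦ A^{1/2}X + XA^{1/2}, it can be computed explicitly in the eigenbasis of A, where it acts entrywise as division by a_i^{1/2} + a_j^{1/2}. The sketch below (helper names ours) checks the bound (X.35) in the operator norm and verifies that the computed X really solves the equation:

```python
import numpy as np

rng = np.random.default_rng(5)
n, ok = 6, True
for _ in range(100):
    X0 = rng.standard_normal((n, n))
    A = X0 @ X0.T + 0.1 * np.eye(n)       # positive definite
    w, V = np.linalg.eigh(A)
    B = rng.standard_normal((n, n))
    B = (B + B.T) / 2
    B /= np.linalg.norm(B, 2)             # ||B|| = 1
    # Df(A)(B): solve A^{1/2} X + X A^{1/2} = B in the eigenbasis of A
    Bt = V.T @ B @ V
    Xt = Bt / (np.sqrt(w)[:, None] + np.sqrt(w)[None, :])
    X = V @ Xt @ V.T
    sqA = (V * np.sqrt(w)) @ V.T
    ok &= np.linalg.norm(sqA @ X + X @ sqA - B, 2) <= 1e-8   # residual check
    bound = 0.5 * np.sqrt(np.linalg.norm(np.linalg.inv(A), 2))
    ok &= np.linalg.norm(X, 2) <= bound + 1e-10              # the bound (X.35)
print(ok)
```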
Exercise X.3.2 Let f ∈ C¹(I) and let f′ be the derivative of f. Show that
$$\| f'(A) \| = \| Df(A)(I) \| \le \| Df(A) \|. \tag{X.36}$$
Thus, for the function f(t) = t^{1/2} on (0, ∞),
$$\| Df(A) \| = \| f'(A) \| \tag{X.37}$$
for all positive operators A.
Theorem X.3.3 Let φ be the map that takes an invertible operator A to its absolute value |A|. Then, for every unitarily invariant norm,
$$||| D\varphi(A) ||| \le \mathrm{cond}(A) = \| A^{-1} \|\, \| A \|. \tag{X.38}$$
Proof. Let g(A) = A*A. Then Dg(A)(B) = A*B + B*A. Hence |||Dg(A)||| ≤ 2‖A‖. The map φ is the composite f ∘ g, where f(A) = A^{1/2}. So, by the chain rule, Dφ(A) = Df(g(A)) · Dg(A) = Df(A*A) · Dg(A). Hence
$$||| D\varphi(A) ||| \le ||| Df(A^*A) |||\; ||| Dg(A) |||.$$
The first term on the right is bounded by ½‖A⁻¹‖ by Theorem X.3.1, and the second by 2‖A‖. This proves the theorem. •

The following theorem generalises Theorem X.3.1.
Theorem X.3.4 Let f be an operator monotone function on (0, ∞). Then, for every unitarily invariant norm,
$$||| Df(A) ||| \le \| f'(A) \| \tag{X.39}$$
for all positive operators A.

Proof. Use the integral representation (V.49) to write
$$f(t) = \alpha + \beta t + \int_0^\infty \Big( \frac{\lambda}{\lambda^2 + 1} - \frac{1}{\lambda + t} \Big)\, d\mu(\lambda),$$
where α, β are real numbers, β ≥ 0, and μ is a positive measure. Thus
$$f(A) = \alpha I + \beta A + \int_0^\infty \Big[ \frac{\lambda}{\lambda^2 + 1}\, I - (\lambda + A)^{-1} \Big]\, d\mu(\lambda).$$
Using the fact that, for the function g(A) = A⁻¹, we have Dg(A)(B) = −A⁻¹BA⁻¹, we obtain from the above expression
$$Df(A)(B) = \beta B + \int_0^\infty (\lambda + A)^{-1} B (\lambda + A)^{-1}\, d\mu(\lambda).$$
Hence
$$||| Df(A) ||| \le \beta + \int_0^\infty \| (\lambda + A)^{-1} \|^2\, d\mu(\lambda). \tag{X.40}$$
From the integral representation we also have
$$f'(t) = \beta + \int_0^\infty \frac{1}{(\lambda + t)^2}\, d\mu(\lambda).$$
Hence
$$\| f'(A) \| = \Big\| \beta I + \int_0^\infty (\lambda + A)^{-2}\, d\mu(\lambda) \Big\|. \tag{X.41}$$
If A has eigenvalues a₁ ≥ ⋯ ≥ a_n, then since β ≥ 0, the right-hand sides of both (X.40) and (X.41) are equal to
$$\beta + \int_0^\infty (\lambda + a_n)^{-2}\, d\mu(\lambda).$$
This proves the theorem. •
Exercise X.3.5 Let f be an operator monotone function on (0, ∞). Show that
$$\| Df(A) \| = \| Df(A)(I) \| = \| f'(A) \|.$$
Once we have estimates for the derivative Df(A), we can obtain bounds for |||f(A) − f(B)||| when B is close to A. These bounds are obtained using Taylor's Theorem and the mean value theorem. Using Taylor's Theorem, we obtain from Theorems X.3.3 and X.3.4 above the following.

Theorem X.3.6 Let A be an invertible operator. Then for every unitarily invariant norm
$$|||\, |A| - |B| \,||| \le \mathrm{cond}(A)\, ||| A - B ||| + O\big( ||| A - B |||^2 \big) \tag{X.42}$$
for all B close to A.

Theorem X.3.7 Let f be an operator monotone function on (0, ∞). Let A be any positive operator. Then for every unitarily invariant norm
$$||| f(A) - f(B) ||| \le \| f'(A) \|\, ||| A - B ||| + O\big( ||| A - B |||^2 \big) \tag{X.43}$$
for all positive operators B close to A. For the functions f(t) = t^r, 0 < r < 1, we have from this
$$||| A^r - B^r ||| \le r\, \| A^{r-1} \|\, ||| A - B ||| + O\big( ||| A - B |||^2 \big). \tag{X.44}$$
The use of the mean value theorem is illustrated in the proof of the following theorem.
Theorem X.3.8 Let f be an operator monotone function on (0, ∞), and let A, B be two positive operators that are bounded below by a; i.e., A ≥ aI and B ≥ aI for the positive number a. Then for every unitarily invariant norm
$$||| f(A) - f(B) ||| \le f'(a)\, ||| A - B |||. \tag{X.45}$$
Proof. Use the integral representation of f as in the proof of Theorem X.3.4. We have
$$f'(A) = \beta I + \int_0^\infty (\lambda + A)^{-2}\, d\mu(\lambda).$$
If A ≥ aI, then
$$f'(A) \le \beta I + \Big[ \int_0^\infty (\lambda + a)^{-2}\, d\mu(\lambda) \Big] I = f'(a)\, I.$$
Let A(t) = (1 − t)A + tB, 0 ≤ t ≤ 1. If A and B are bounded below by a, then so is A(t). Hence, using the mean value theorem, the inequality (X.39), and the above observation, we have
$$||| f(A) - f(B) ||| \le \max_{0 \le t \le 1} ||| Df(A(t))(A'(t)) ||| \le \max_{0 \le t \le 1} \| f'(A(t)) \|\, ||| A'(t) ||| \le f'(a)\, ||| A - B |||.$$
•
A special corollary is the inequality
$$||| A^r - B^r ||| \le r\, a^{r-1}\, ||| A - B |||, \quad 0 < r \le 1, \tag{X.46}$$
valid for operators A, B such that A ≥ aI and B ≥ aI for some positive number a. Other inequalities of this type are discussed in the Problems section.

Let A = UP be the polar decomposition of an invertible matrix A. Then P = |A| and U = AP⁻¹. Using standard rules of differentiation, one can obtain from this expressions for the derivative of the map A ↦ U, and then obtain perturbation bounds for this map in the same way as was done for the map A ↦ |A|. There is, however, a more effective and simpler way of doing this. The advantage of this new method, explained below, is that it also works for other decompositions like the QR decomposition, where explicit formulae for the two factors are not known. For this added power there is a small cost to be paid: the slightly more sophisticated notion of differentiation on a manifold of matrices has to be used. We have already used similar ideas in Chapter 6.

In the space M(n) of n × n matrices, let GL(n) be the set of all invertible matrices, U(n) the set of all unitary matrices, and P(n) the set of all positive (definite) matrices. All three are differentiable manifolds. The set GL(n) is an open subset of M(n), and hence the tangent space to GL(n) at each of its points is the space M(n). The tangent space to U(n) at the point I, written as T_I U(n), is the space K(n) of all skew-Hermitian matrices. This has been explained in Section VI.4. The tangent space at any other point U of U(n) is T_U U(n) = U·K(n) = { US : S ∈ K(n) }. Let H(n) be the space of all Hermitian matrices. Both H(n) and K(n) are real vector spaces, and H(n) = iK(n). The set P(n) is an open subset of H(n), and hence the tangent space to P(n) at each of its points is H(n).

The polar decomposition gives a differentiable map Φ from GL(n) onto U(n) × P(n). This is the map Φ(A) = (Φ₁(A), Φ₂(A)) = (U, P), where the invertible matrix A has the polar decomposition A = UP. Earlier in this section we called Φ₂ the map φ. Let Ψ be the inverse of the map Φ; i.e., Ψ(U, P) = UP. This is a much simpler object to handle, since it is just a product map. We can calculate the derivative of this map and then use the inverse function theorem to get the derivative of the map Φ.

Theorem X.3.9 Let Φ₁ be the map from GL(n) into U(n) that takes an invertible matrix to the unitary part in its polar decomposition.
Then the derivative of Φ₁ at A = UP, evaluated at the point UX, is given by the formula
$$[ D\Phi_1(UP) ](UX) = 2U \int_0^\infty e^{-tP}\, (i\, \mathrm{Im}\, X)\, e^{-tP}\, dt. \tag{X.47}$$
Proof. The domain of the linear map DΨ(U, P) is the tangent space to the manifold U(n) × P(n) at the point (U, P). This space is (U·K(n), H(n)). The range of DΨ(U, P) is the tangent space to GL(n) at UP. This space is M(n). We will use the decomposition M(n) = U·K(n) + U·H(n) that arises from the Cartesian decomposition. By the definition of the derivative, we have
$$[ D\Psi(U, P) ](US, H) = \frac{d}{dt}\Big|_{t=0} \Psi( U e^{tS}, P + tH ) = \frac{d}{dt}\Big|_{t=0} U e^{tS} (P + tH) = USP + UH$$
for all S ∈ K(n) and H ∈ H(n). The derivative DΦ(UP) is the inverse of this map. Suppose
$$[ D\Phi(UP) ](UX) = (UM, N),$$
where M ∈ K(n) and N ∈ H(n). Since Φ and Ψ are inverse to each other, from the two equations above we see that
$$UX = [ D\Psi(U, P) ](UM, N) = UMP + UN,$$
and hence
$$X = MP + N.$$
Our task now is to find M from this equation. Note that M is skew-Hermitian and N Hermitian. Hence, from the above equation, we obtain
$$MP + PM = X - X^* = 2i\, \mathrm{Im}\, X.$$
This equation was studied in Chapter 7. From Theorem VII.2.3 we have its solution
$$M = 2 \int_0^\infty e^{-tP}\, (i\, \mathrm{Im}\, X)\, e^{-tP}\, dt.$$
This gives us the expression (X.47).
•
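Formula (X.47) can be checked against a central difference of the unitary polar factor. Instead of evaluating the integral, the sketch below solves MP + PM = 2i Im X in the eigenbasis of P, which is equivalent by Theorem VII.2.3 (the helper polar_u and all other names are ours):

```python
import numpy as np

def polar_u(A):
    # unitary factor of the polar decomposition A = UP, via the SVD
    W, s, Vh = np.linalg.svd(A)
    return W @ Vh

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
Y = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

U = polar_u(A)
P = U.conj().T @ A                      # positive factor
X = U.conj().T @ Y                      # write the direction as Y = U X
w, V = np.linalg.eigh((P + P.conj().T) / 2)
C = V.conj().T @ (X - X.conj().T) @ V   # 2i Im X in the eigenbasis of P
M = V @ (C / (w[:, None] + w[None, :])) @ V.conj().T   # solves MP + PM = 2i Im X
deriv = U @ M                           # the right-hand side of (X.47)

h = 1e-6
fd = (polar_u(A + h * Y) - polar_u(A - h * Y)) / (2 * h)
ok = np.linalg.norm(fd - deriv) / np.linalg.norm(deriv) < 1e-5
print(ok)
```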
Corollary X.3.10 For every unitarily invariant norm we have
$$||| D\Phi_1(UP) ||| = \| P^{-1} \|. \tag{X.48}$$
Proof. Using (X.47) and properties of unitarily invariant norms, we see that
$$||| D\Phi_1(UP)(UX) ||| \le 2 \int_0^\infty \| e^{-tP} \|\, ||| X |||\, \| e^{-tP} \|\, dt.$$
If P has eigenvalues a₁ ≥ ⋯ ≥ a_n, then ‖e^{−tP}‖ = e^{−t a_n}. So,
$$||| D\Phi_1(UP)(UX) ||| \le 2 \int_0^\infty e^{-2t a_n}\, ||| X |||\, dt = \frac{1}{a_n}\, ||| X ||| = \| P^{-1} \|\, ||| X |||.$$
Hence,
$$||| D\Phi_1(UP) ||| = \sup_{||| X ||| = 1} ||| D\Phi_1(UP)(UX) ||| \le \| P^{-1} \|.$$
The choice X = i vv*/|||vv*|||, where v is an eigenvector of P belonging to the eigenvalue a_n, shows that the last inequality is in fact an equality. •

Two corollaries follow; the first one is obtained using the mean value theorem and the second one using Taylor's Theorem.
Corollary X.3.11 Let A₀, A₁ be two elements of GL(n), and let U₀, U₁ be the unitary factors in their polar decompositions. Suppose that the line segment A(t) = (1 − t)A₀ + tA₁, 0 ≤ t ≤ 1, lies inside GL(n). Then, for every unitarily invariant norm,
$$||| U_0 - U_1 ||| \le \max_{0 \le t \le 1} \| A(t)^{-1} \| \cdot ||| A_0 - A_1 |||. \tag{X.49}$$
Corollary X.3.12 Let A₀ be an invertible matrix with polar decomposition A₀ = U₀P₀. Then, for a matrix A = UP in a neighbourhood of A₀, we have
$$||| U_0 - U ||| \le \| A_0^{-1} \|\, ||| A_0 - A ||| + O\big( ||| A_0 - A |||^2 \big). \tag{X.50}$$
Exercise X.3.13 From the proof of Theorem X.3.9 one can also extract a bound for the derivative of the map A ↦ |A|. What does this give? Compare it with the result of Theorem X.3.3.
Let us see now how this method works for a perturbation analysis of the QR decomposition. Let Δ⁺(n) be the set of all upper triangular matrices with positive diagonal entries. Each element A of GL(n) has a unique factoring A = QR, where Q ∈ U(n) and R ∈ Δ⁺(n). Thus the QR decomposition gives rise to an invertible map Φ from GL(n) onto U(n) × Δ⁺(n). Let Δ_re(n) be the set of all upper triangular matrices with real diagonal entries. This is a real vector space, and Δ⁺(n) is an open set in it. Thus the tangent space to Δ⁺(n), at any of its points, is Δ_re(n). For each A = QR in GL(n),
the derivative DΦ(A) is a linear map from the vector space M(n) onto the vector space (Q·K(n), Δ_re(n)). We want to calculate the norm of this. First note that the spaces K(n) and Δ_re(n) are complementary to each other in M(n). We have a vector space decomposition
$$M(n) = K(n) + \Delta_{re}(n). \tag{X.51}$$
Every matrix X splits as X = K + T in this decomposition; the entries of X, K and T are related as follows:
$$\begin{aligned} k_{jj} &= i\, \mathrm{Im}\, x_{jj} && \text{for all } j, \\ k_{ij} &= -\bar{x}_{ji} && \text{for } j > i, \\ k_{ij} &= x_{ij} && \text{for } i > j, \\ t_{jj} &= \mathrm{Re}\, x_{jj} && \text{for all } j, \\ t_{ij} &= x_{ij} + \bar{x}_{ji} && \text{for } j > i, \\ t_{ij} &= 0 && \text{for } i > j. \end{aligned} \tag{X.52}$$
Exercise X.3.14 Let P₁ and P₂ be the complementary projection operators in M(n) corresponding to the decomposition (X.51). Show that
$$\| P_1 \|_2 = \| P_2 \|_2 = \sqrt{2},$$
where
$$\| P_1 \|_2 = \sup_{\| X \|_2 = 1} \| P_1 X \|_2,$$
and ‖·‖₂ stands for the Frobenius (Hilbert-Schmidt) norm.
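The splitting (X.52) and the bound √2 in the exercise are easy to confirm numerically (a sketch; the helper split is ours):

```python
import numpy as np

def split(X):
    # X = K + T per (X.52): K skew-Hermitian, T upper triangular, real diagonal
    n = X.shape[0]
    K = np.zeros_like(X)
    iL = np.tril_indices(n, -1)
    K[iL] = X[iL]                   # below the diagonal, K agrees with X
    K -= K.conj().T                 # above the diagonal: k_ij = -conj(x_ji)
    K += 1j * np.diag(np.imag(np.diag(X)))   # k_jj = i Im x_jj
    return K, X - K

rng = np.random.default_rng(7)
n, ok = 6, True
for _ in range(200):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    K, T = split(X)
    ok &= np.allclose(K, -K.conj().T)            # K is skew-Hermitian
    ok &= np.allclose(np.tril(T, -1), 0)         # T is upper triangular
    ok &= np.allclose(np.imag(np.diag(T)), 0)    # with real diagonal
    ok &= np.linalg.norm(K) <= np.sqrt(2) * np.linalg.norm(X) + 1e-12
    ok &= np.linalg.norm(T) <= np.sqrt(2) * np.linalg.norm(X) + 1e-12
print(ok)
```

A strictly lower triangular X shows that the constant √2 is attained for P₁.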
Now let Ψ be the map from U(n) × Δ⁺(n) onto GL(n) defined as Ψ(Q, R) = QR. Then Ψ and Φ are inverse to each other. The derivative DΨ(Q, R) is a linear map whose domain is the tangent space to the manifold U(n) × Δ⁺(n) at the point (Q, R). This space is (Q·K(n), Δ_re(n)). Its range is the space M(n) = Q·K(n) + Q·Δ_re(n). By the definition of the derivative, we have
$$[ D\Psi(Q, R) ](QK, T) = \frac{d}{dt}\Big|_{t=0} \Psi( Q e^{tK}, R + tT ) = \frac{d}{dt}\Big|_{t=0} Q e^{tK} (R + tT) = QKR + QT$$
for all K ∈ K(n) and T ∈ Δ_re(n). The derivative DΦ(QR) is a linear map from M(n) onto (Q·K(n), Δ_re(n)). Suppose
$$[ D\Phi(QR) ](QX) = (QM, N),$$
where M ∈ K(n) and N ∈ Δ_re(n). Then we must have
$$QX = [ D\Phi(QR) ]^{-1}(QM, N) = [ D\Psi(Q, R) ](QM, N) = QMR + QN.$$
Hence
$$X = MR + N.$$
So, we have the same kind of equation as we had in the analysis of the polar decomposition. There is one vital difference, however. There, the matrices M, N were skew-Hermitian and Hermitian, respectively, and instead of the upper triangular factor R we had the positive factor P. So, taking adjoints, we could eliminate N and get another equation that we could solve explicitly. We cannot do that here. But there is another way out. We have from the above equation
$$X R^{-1} = M + N R^{-1}. \tag{X.53}$$
Here M ∈ K(n); and both N and R⁻¹ are in Δ_re(n), and hence so is their product NR⁻¹. Thus the equation (X.53) is nothing but the decomposition of XR⁻¹ with respect to the vector space decomposition (X.51). In this way, we now know M and N explicitly. We thus have the following theorem.
Theorem X.3.15 Let Φ₁, Φ₂ be the maps from GL(n) into U(n) and Δ⁺(n) that take an invertible matrix to the unitary and the upper triangular factors in its QR decomposition. Then for each X ∈ M(n), the derivatives DΦ₁(QR) and DΦ₂(QR), evaluated at the point QX, are given by the formulae
$$[ D\Phi_1(QR) ](QX) = Q\, P_1( X R^{-1} ), \qquad [ D\Phi_2(QR) ](QX) = P_2( X R^{-1} )\, R,$$
where P₁ and P₂ are the complementary projection operators in M(n) corresponding to the decomposition (X.51).

Using the result of Exercise X.3.14, we obtain the first corollary below. The next two corollaries are then obtained using the mean value theorem and Taylor's Theorem.
Corollary X.3.16 Let Φ₁, Φ₂ be the maps that take an invertible matrix A to the Q and R factors in its QR decomposition. Then
$$\| D\Phi_1(A) \|_2 \le \sqrt{2}\, \| A^{-1} \|, \qquad \| D\Phi_2(A) \|_2 \le \sqrt{2}\, \mathrm{cond}(A) = \sqrt{2}\, \| A \|\, \| A^{-1} \|.$$
Corollary X.3.17 Let A₀, A₁ be two elements of GL(n) with their respective QR decompositions A₀ = Q₀R₀ and A₁ = Q₁R₁. Suppose that the line segment A(t) = (1 − t)A₀ + tA₁, 0 ≤ t ≤ 1, lies in GL(n). Then
$$\| Q_0 - Q_1 \|_2 \le \sqrt{2} \max_{0 \le t \le 1} \| A(t)^{-1} \|\; \| A_0 - A_1 \|_2,$$
$$\| R_0 - R_1 \|_2 \le \sqrt{2} \max_{0 \le t \le 1} \mathrm{cond}(A(t))\; \| A_0 - A_1 \|_2.$$
Corollary X.3.18 Let A₀ = Q₀R₀ be an invertible matrix. Then for every matrix A = QR close to A₀,
$$\| Q_0 - Q \|_2 \le \sqrt{2}\, \| A_0^{-1} \|\, \| A_0 - A \|_2 + O\big( \| A_0 - A \|_2^2 \big),$$
$$\| R_0 - R \|_2 \le \sqrt{2}\, \mathrm{cond}(A_0)\, \| A_0 - A \|_2 + O\big( \| A_0 - A \|_2^2 \big).$$
For most other unitarily invariant norms, the norms of the projections P₁ and P₂ onto the two summands in (X.51) are not as easy to calculate. Thus this method does not lead to attractive bounds for those norms in the case of the QR decomposition.
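The formula of Theorem X.3.15 for DΦ₁ can be checked against a central difference of the Q factor. The sketch below works with real matrices, so K(n) consists of the skew-symmetric matrices; qr_pos renormalizes numpy's QR factorization so that R has positive diagonal, and p1 is the projection P₁ of (X.51) (all names are ours):

```python
import numpy as np

def qr_pos(A):
    # QR factorization normalized so that R has positive diagonal
    Q, R = np.linalg.qr(A)
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0
    return Q * s, (R.T * s).T

def p1(X):
    # projection onto skew-symmetric matrices along real upper triangular ones
    K = np.tril(X, -1)
    return K - K.T

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 5))
Y = rng.standard_normal((5, 5))

Q, R = qr_pos(A)
# [D Phi_1(QR)](Y) with Y = Q X, i.e. X = Q^T Y
deriv = Q @ p1(Q.T @ Y @ np.linalg.inv(R))

h = 1e-6
Qp, _ = qr_pos(A + h * Y)
Qm, _ = qr_pos(A - h * Y)
fd = (Qp - Qm) / (2 * h)
ok = np.linalg.norm(fd - deriv) / np.linalg.norm(deriv) < 1e-5
print(ok)
```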
X.4 Appendix: Differential Calculus

We will review very quickly some basic concepts of the Fréchet differential calculus, with special emphasis on matrix analysis. No proofs are given. Let X, Y be real Banach spaces, and let L(X, Y) be the space of bounded linear operators from X to Y. Let U be an open subset of X. A continuous map f from U to Y is said to be differentiable at a point u of U if there exists T ∈ L(X, Y) such that
$$\lim_{v \to 0} \frac{ \| f(u + v) - f(u) - Tv \| }{ \| v \| } = 0. \tag{X.54}$$
It is easy to see that such a T, if it exists, is unique. If f is differentiable at u, the operator T above is called the derivative of f at u. We will use for it the notation Df(u). This is sometimes called the Fréchet derivative. If f is differentiable at every point of U, we say that it is differentiable on U. One can see that, if f is differentiable at u, then for every v ∈ X,
$$Df(u)(v) = \frac{d}{dt}\Big|_{t=0} f(u + tv). \tag{X.55}$$
This is also called the directional derivative of f at u in the direction v. The reader will recall from elementary calculus of functions of two variables that the existence of directional derivatives in all directions does not ensure differentiability. Some illustrative examples are given below.

Example X.4.1 (i) The constant function f(x) = c is differentiable at all points, and Df(x) = 0 for all x.
(ii) Every linear operator T is differentiable at all points, and is its own derivative; i.e., DT(u)(v) = Tv for all u, v in X.
(iii) Let X, Y, Z be real Banach spaces, and let B : X × Y → Z be a bounded bilinear map. Then B is differentiable at every point, and its derivative DB(u, v) is given as
$$DB(u, v)(x, y) = B(x, v) + B(u, y).$$
(iv) Let X be a real Hilbert space with inner product ⟨·,·⟩, and let f(u) = ‖u‖² = ⟨u, u⟩. Then f is differentiable at every point, and
$$Df(u)(v) = 2 \langle u, v \rangle.$$
The next set of examples is especially important for us.
Example X.4.2 In these examples X = Y = L(H).
(i) Let f(A) = A^2. Then

    Df(A)(B) = AB + BA.

(ii) More generally, let f(A) = A^n, n ≥ 2. From the binomial expansion of (A + B)^n one can see that

    Df(A)(B) = Σ_{j+k=n−1, j,k≥0} A^j B A^k.

(iii) Let f(A) = A^{-1} for each invertible A. Then Df(A)(B) = −A^{-1} B A^{-1}.
(iv) Let f(A) = A*A. Then Df(A)(B) = A*B + B*A.
(v) Let f(A) = e^A. Use the formula

    e^{A+B} − e^A = ∫_0^1 e^{(1−t)A} B e^{t(A+B)} dt

(called the Dyson expansion) to show that

    Df(A)(B) = ∫_0^1 e^{(1−t)A} B e^{tA} dt.
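These formulas lend themselves to numerical verification. A sketch (numpy and scipy assumed available; the random test matrices, the crude trapezoidal rule for the Dyson integral, and the tolerances are all ad hoc choices):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = 0.5 * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

def dirder(f, A, B, t=1e-6):
    # central-difference approximation to Df(A)(B), via formula (X.55)
    return (f(A + t * B) - f(A - t * B)) / (2 * t)

# (i) f(A) = A^2 : Df(A)(B) = AB + BA
err_sq = np.linalg.norm(dirder(lambda X: X @ X, A, B) - (A @ B + B @ A))

# (iii) f(A) = A^{-1} : Df(A)(B) = -A^{-1} B A^{-1}
C = A + 5 * np.eye(4)                      # comfortably invertible
Cinv = np.linalg.inv(C)
err_inv = np.linalg.norm(dirder(np.linalg.inv, C, B) - (-Cinv @ B @ Cinv))

# (v) f(A) = e^A : Df(A)(B) = integral_0^1 e^{(1-t)A} B e^{tA} dt (Dyson)
m = 200                                    # trapezoidal rule on [0, 1]
dyson = np.zeros_like(A)
for k in range(m + 1):
    s = k / m
    w = 0.5 / m if k in (0, m) else 1.0 / m
    dyson += w * expm((1 - s) * A) @ B @ expm(s * A)
err_exp = np.linalg.norm(dirder(expm, A, B) - dyson)
```

All three errors should be small; only the last carries the quadrature error of the trapezoidal rule.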
The usual rules of differentiation are valid: if f_1, f_2 are two differentiable maps, then f_1 + f_2 is differentiable and

    D(f_1 + f_2)(u) = Df_1(u) + Df_2(u).

The composite of two differentiable maps f and g is differentiable and we have the chain rule

    D(g ∘ f)(u) = Dg(f(u)) · Df(u).

In the special situation when g is linear, this reduces to

    D(g ∘ f)(u) = g · Df(u).
One important rule of differentiation for real functions is the product rule: (fg)′ = f′g + fg′. If f and g are two maps with values in a Banach space, their product is not defined, unless the range is an algebra as well. Still, a general product rule can be established. Let f, g be two differentiable maps from X into Y_1, Y_2, respectively. Let B be a continuous bilinear map from Y_1 × Y_2 into Z. Let φ be the map from X to Z defined as φ(x) = B(f(x), g(x)). Then for all u, v in X

    Dφ(u)(v) = B(Df(u)(v), g(u)) + B(f(u), Dg(u)(v)).

This is the product rule for differentiation. A special case of this arises when Y_1 = Y_2 = L(Y), the algebra of bounded operators in a Banach space Y. Now φ(x) = f(x)g(x) is the usual product of two operators. The product rule then is

    Dφ(u)(v) = [Df(u)(v)] · g(u) + f(u) · [Dg(u)(v)].

Exercise X.4.3
(i) Let f be the map A ↦ A^{-1} on GL(n). Use the product rule to show that

    Df(A)(B) = −A^{-1} B A^{-1}.

This can also be proved directly.
(ii) Let f(A) = A^{-2}. Show that

    Df(A)(B) = −A^{-1} B A^{-2} − A^{-2} B A^{-1}.

(iii) Obtain a formula for the derivative of the map f(A) = A^{-n}, n = 3, 4, ....
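The product rule can be sanity-checked on Exercise X.4.3(i): taking f(A) = A^2 and g(A) = A^{-1}, the product φ(A) = A^2 A^{-1} = A, so the rule must produce Dφ(A)(B) = B. A sketch (numpy assumed; the shift by 4I is only there to keep the random A safely invertible):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # comfortably invertible
B = rng.standard_normal((4, 4))
Ainv = np.linalg.inv(A)

Df = A @ B + B @ A            # derivative of f(A) = A^2 at A, direction B
Dg = -Ainv @ B @ Ainv         # derivative of g(A) = A^{-1} (Exercise X.4.3(i))

# product rule for phi(A) = f(A) g(A) = A; hence Dphi(A)(B) must equal B
Dphi = Df @ Ainv + (A @ A) @ Dg
err = np.linalg.norm(Dphi - B)
```

Algebraically, (AB + BA)A^{-1} − A^2 A^{-1} B A^{-1} = B, so `err` is at rounding level.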
Perhaps the most useful theorem of calculus is the Mean Value Theorem.
Theorem X.4.4 (The Mean Value Theorem) Let f be a differentiable map from an interval I of the real line into a Banach space X. Then for each closed interval [a, b] contained in I,

    ||f(b) − f(a)|| ≤ |b − a| · sup_{a≤t≤b} ||Df(t)||.

This is the version we have used often in the book, with [a, b] = [0, 1]. There is a more general statement:
Theorem X.4.5 (The Mean Value Theorem) Let f be a differentiable map from a convex subset U of a Banach space X into the Banach space Y. Let a, b ∈ U and let L be the line segment joining them. Then

    ||f(b) − f(a)|| ≤ ||b − a|| · sup_{u∈L} ||Df(u)||.
(Note that there are three different norms that occur in the above inequality. These are the norms of the spaces Y, X, and L(X, Y), respectively.)

Higher order Fréchet derivatives can be identified with multilinear maps. This is explained below.

Let f be a differentiable map from X to Y. At each point u, the derivative Df(u) is an element of the Banach space L(X, Y). Thus we have a map Df from X into L(X, Y), defined as Df : u ↦ Df(u). If this map is differentiable at a point u, we say that f is twice differentiable at u. The derivative of the map Df at the point u is called the second derivative of f at u. It is denoted as D^2 f(u). This is an element of the space L(X, L(X, Y)). This space is isomorphic to another Banach space, which is easier to handle.

Let L_2(X, Y) be the space of bounded bilinear maps from X × X into Y. The elements of this space are maps f from X × X into Y that are linear in both variables, and for which there exists a constant c such that

    ||f(x_1, x_2)|| ≤ c ||x_1|| ||x_2||    for all x_1, x_2 ∈ X.

The infimum of all such c is called ||f||. This is a norm on the space L_2(X, Y), and the space is a Banach space with this norm. If φ is an element of L(X, L(X, Y)), let

    φ̃(x_1, x_2) = [φ(x_1)](x_2)    for x_1, x_2 ∈ X.

Then φ̃ ∈ L_2(X, Y). It is easy to see that the map φ ↦ φ̃ is an isometric isomorphism. Thus the second derivative of a twice differentiable map f from X to Y can be thought of as a bilinear map from X × X to Y. It is easy to see that this map is symmetric in the two variables; i.e.,

    D^2 f(u)(v_1, v_2) = D^2 f(u)(v_2, v_1)
for all u, v_1, v_2. (This symmetry property is extremely helpful in guessing the expression for the second derivative of a given map.) Some examples on the space of matrices are given below.

Example X.4.6 Let X = M(n) and let f(A) = A^2, A ∈ M(n). We have seen that Df(A)(B) = AB + BA for all A, B. Note that Df(A) = L_A + R_A, where L_A and R_A are linear operators on M(n), the first one left multiplication by A and the second one right multiplication by A. The map Df : A ↦ Df(A) is a linear map from M(n) into L(M(n)). So the derivative of this map, at each point, is the map itself. Thus for each A, D^2 f(A) = Df. In other words,

    [D^2 f(A)](B) = Df(B) = L_B + R_B.

If we think of D^2 f(A) as a linear map from M(n) into L(M(n)), we have

    [D^2 f(A)(B_1)](B_2) = (L_{B_1} + R_{B_1})(B_2) = B_1 B_2 + B_2 B_1

for all B_1, B_2. If we think of it as a bilinear map, we have

    [D^2 f(A)](B_1, B_2) = B_1 B_2 + B_2 B_1.

Note that the right-hand side is independent of A. So the map A ↦ D^2 f(A) is a constant map. These are noncommutative analogues of the facts that if f(x) = x^2, then f′(x) = 2x and f″(x) = 2.
Example X.4.7 Let f(A) = A^3. We have seen that

    Df(A)(B) = A^2 B + ABA + BA^2.

This is the noncommutative analogue of the fact that if f(x) = x^3, then f′(x) = 3x^2. What is the second derivative? From the formula f″(x) = 6x, and the fact that D^2 f(A) is a symmetric bilinear map, we can guess that

    [D^2 f(A)](B_1, B_2) = A B_1 B_2 + A B_2 B_1 + B_1 A B_2 + B_2 A B_1 + B_1 B_2 A + B_2 B_1 A.

Prove that this indeed is the right formula for D^2 f(A). Note that the map A ↦ D^2 f(A) is linear.
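One way to gain confidence in the guessed formula (short of the requested proof) is a numerical check with a symmetric mixed difference; since f(A) = A^3 is a cubic polynomial, the difference quotient below is exact up to rounding. (A sketch with numpy; the matrices are random choices.)

```python
import numpy as np

rng = np.random.default_rng(3)
A, B1, B2 = (rng.standard_normal((4, 4)) for _ in range(3))
f = lambda X: X @ X @ X

# the guessed symmetric bilinear expression for [D^2 f(A)](B1, B2)
guess = (A @ B1 @ B2 + A @ B2 @ B1 + B1 @ A @ B2
         + B2 @ A @ B1 + B1 @ B2 @ A + B2 @ B1 @ A)

# symmetric mixed second difference approximating [D^2 f(A)](B1, B2)
t = 1e-3
num = (f(A + t*B1 + t*B2) - f(A + t*B1 - t*B2)
       - f(A - t*B1 + t*B2) + f(A - t*B1 - t*B2)) / (4 * t**2)

err = np.linalg.norm(num - guess)
```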
Example X.4.8 More generally, let f(A) = A^n. From the binomial theorem one can see that

    [D^2 f(A)](B_1, B_2) = Σ_{j+k+ℓ=n−2, j,k,ℓ≥0} [A^j B_1 A^k B_2 A^ℓ + A^j B_2 A^k B_1 A^ℓ].
Example X.4.9 Let f(A) = A^{-1}, A ∈ GL(n). We know that Df(A)(B) = −A^{-1} B A^{-1} for all B ∈ M(n). This is the noncommutative analogue of the formula (x^{-1})′ = −x^{-2}. The analogue of the formula (x^{-1})″ = 2x^{-3} is the following:

    [D^2 f(A)](B_1, B_2) = A^{-1} B_1 A^{-1} B_2 A^{-1} + A^{-1} B_2 A^{-1} B_1 A^{-1}.

This can be guessed from the bilinearity and symmetry properties that D^2 f(A) must have. It can be proved formally by the rules of differentiation.
Example X.4.10 Let f(A) = A^{-2}, A ∈ GL(n). We know that Df(A)(B) = −A^{-1} B A^{-2} − A^{-2} B A^{-1}. Show that

    [D^2 f(A)](B_1, B_2) = A^{-2} B_1 A^{-1} B_2 A^{-1} + A^{-2} B_2 A^{-1} B_1 A^{-1}
        + A^{-1} B_1 A^{-2} B_2 A^{-1} + A^{-1} B_2 A^{-2} B_1 A^{-1}
        + A^{-1} B_1 A^{-1} B_2 A^{-2} + A^{-1} B_2 A^{-1} B_1 A^{-2}.

This is the analogue of the formula (x^{-2})″ = 6x^{-4}.
Example X.4.11 Let f(A) = A*A. We have seen that Df(A)(B) = A*B + B*A. Show that

    D^2 f(A)(B_1, B_2) = B_1* B_2 + B_2* B_1.

Note that this expression does not involve A. So the map A ↦ D^2 f(A) is a constant map.
Derivatives of higher order can be defined by repeating the above procedure. The pth derivative of a map f from X to Y can be identified with a p-linear map from the space X × X × ⋯ × X (p copies) into Y. A convenient method of calculating the pth derivative of f is provided by the formula

    [D^p f(u)](v_1, ..., v_p) = (∂^p/∂t_1 ⋯ ∂t_p)|_{t_1=⋯=t_p=0} f(u + t_1 v_1 + ⋯ + t_p v_p).

Compare this with the formula (X.55) for the first derivative.
Example X.4.12 Let f(A) = A^n, A ∈ L(H). Then for p = 1, 2, ..., n,

    [D^p f(A)](B_1, ..., B_p) = Σ_{σ∈S_p} Σ_{j_1+⋯+j_{p+1}=n−p, j_k≥0} A^{j_1} B_{σ(1)} A^{j_2} B_{σ(2)} ⋯ A^{j_p} B_{σ(p)} A^{j_{p+1}},

where S_p is the set of all permutations on p symbols. There are n!/(n−p)! terms in the above double sum. These are all words of length n in which n − p of the letters are A and the remaining letters are B_1, ..., B_p, each occurring exactly once. Notice that this expression is linear and symmetric in each of the variables B_1, ..., B_p. When dim H = 1, this reduces to the formula for the pth derivative of the function f(x) = x^n:

    f^{(p)}(x) = n(n−1)⋯(n−p+1) x^{n−p} = (n!/(n−p)!) x^{n−p}.
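The double sum can be checked by brute force for small n and p: enumerate every word of length n containing n − p letters A and each B_i exactly once, count them, and compare their sum against a mixed difference quotient. A sketch (numpy assumed; n = 4, p = 2):

```python
import numpy as np
from itertools import permutations
from math import factorial

rng = np.random.default_rng(4)
n, p = 4, 2
A, B1, B2 = (rng.standard_normal((3, 3)) for _ in range(3))
Bs = [B1, B2]

# all words of length n with n-p letters A and B1,...,Bp each once
terms = []
for pos in permutations(range(n), p):     # ordered positions for B1,...,Bp
    letters = [A] * n
    for b, i in zip(Bs, pos):
        letters[i] = b
    word = np.eye(3)
    for M in letters:
        word = word @ M
    terms.append(word)

count = len(terms)                        # should equal n!/(n-p)!
deriv = sum(terms)                        # [D^p f(A)](B1, B2) for f(A) = A^n

# compare with a symmetric mixed difference of f(A) = A^n
f = lambda X: np.linalg.matrix_power(X, n)
t = 1e-3
num = (f(A + t*B1 + t*B2) - f(A + t*B1 - t*B2)
       - f(A - t*B1 + t*B2) + f(A - t*B1 - t*B2)) / (4 * t**2)
err = np.linalg.norm(num - deriv)
```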
The reader should work out some more simple examples to see the expressions for higher derivatives.

Another important theorem of calculus, Taylor's Theorem, has an analogue in the Fréchet calculus. Of the different versions possible, the one that is most useful for us is given below. Let f be a (p+1)-times differentiable map from a Banach space X into a Banach space Y. For h ∈ X, write [h]^m to mean the m-tuple (h, h, ..., h). Then, for all x ∈ X and for small h,

    || f(x + h) − f(x) − Σ_{m=1}^{p} (1/m!) D^m f(x)([h]^m) || = O(||h||^{p+1}).

From this we get

    || f(x + h) − f(x) || ≤ Σ_{m=1}^{p} (1/m!) ||D^m f(x)|| ||h||^m + O(||h||^{p+1}).
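For a concrete illustration, take f(A) = A^3 and p = 2; then the Taylor remainder is exactly h^3, so shrinking ||h|| by a factor of 10 should shrink the remainder by a factor of 10^3. A sketch (numpy assumed, random matrices):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
H = rng.standard_normal((4, 4))

def remainder(s):
    # || f(A+h) - f(A) - Df(A)(h) - (1/2) D^2 f(A)([h]^2) ||  for f(A) = A^3
    h = s * H
    d1 = A @ A @ h + A @ h @ A + h @ A @ A
    d2_half = A @ h @ h + h @ A @ h + h @ h @ A
    fAh = np.linalg.matrix_power(A + h, 3)
    return np.linalg.norm(fAh - np.linalg.matrix_power(A, 3) - d1 - d2_half)

r1, r2 = remainder(1e-2), remainder(1e-3)
ratio = r1 / r2      # close to 10^3, confirming O(||h||^{p+1}) with p = 2
```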
Finally, let us write down formulae for higher order derivatives of the composite of two maps. These are already quite intricate for real functions. If φ(x) = f(g(x)), then we have

    φ′(x) = f′(g(x)) g′(x),
    φ″(x) = f″(g(x)) [g′(x)]^2 + f′(g(x)) g″(x),
    φ‴(x) = f‴(g(x)) [g′(x)]^3 + 3 f″(g(x)) g′(x) g″(x) + f′(g(x)) g‴(x).
If X, Y, Z are Banach spaces, g is a map from X to Y, and f is a map from Y to Z, then for the derivatives of the composite map φ = f ∘ g we have the following formulae. By the chain rule,

    Dφ(x) = Df(g(x)) Dg(x).

The second and the third derivatives are bilinear and trilinear maps, respectively. For them we have the formulae:

    [D^2 φ(x)](x_1, x_2) = [D^2 f(g(x))](Dg(x)(x_1), Dg(x)(x_2)) + Df(g(x))([D^2 g(x)](x_1, x_2)),

    [D^3 φ(x)](x_1, x_2, x_3) = [D^3 f(g(x))](Dg(x)(x_1), Dg(x)(x_2), Dg(x)(x_3))
        + [D^2 f(g(x))](Dg(x)(x_1), [D^2 g(x)](x_2, x_3))
        + [D^2 f(g(x))](Dg(x)(x_2), [D^2 g(x)](x_1, x_3))
        + [D^2 f(g(x))](Dg(x)(x_3), [D^2 g(x)](x_1, x_2))
        + Df(g(x))([D^3 g(x)](x_1, x_2, x_3)).

The reader should convince himself that considerations of domains and ranges of the maps involved, symmetry in the variables, and the demand that in the case of real functions we should recover the old formulae lead to these general formulae. He can then try proving them.

We have also used the notion of the derivative of a map between manifolds. If X and Y are differentiable manifolds in finite-dimensional vector spaces, and f is a differentiable map from X to Y, then at a point u of X the derivative Df(u) is a linear map from the linear space T_u X into the linear space T_{f(u)} Y. These are the tangent spaces to X and Y at u and f(u), respectively. All manifolds we considered are subsets of M(n). Of these, GL(n), P(n), and Δ_+(n) are open subsets of vector subspaces of M(n). So these vector spaces are the tangent spaces for the manifolds. The only closed manifold we considered is U(n). It is easy to find the tangent space at any point of this manifold. This was done in Chapter 6. Most of the results of Fréchet calculus can be restated in this setup with appropriate modifications.
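The second-derivative formula can be exercised on a case where everything is computable by hand: take g(X) = X^2 and f(Y) = Y^2, so that φ(X) = X^4. A sketch (numpy assumed; the symmetric mixed difference is accurate enough here to compare against):

```python
import numpy as np

rng = np.random.default_rng(6)
X, B1, B2 = (rng.standard_normal((3, 3)) for _ in range(3))

Dsq = lambda Y, V: Y @ V + V @ Y      # derivative of the squaring map at Y
D2sq = lambda V, W: V @ W + W @ V     # its second derivative (constant in Y)

g = X @ X                              # phi = f o g with f, g both squaring
formula = D2sq(Dsq(X, B1), Dsq(X, B2)) + Dsq(g, D2sq(B1, B2))

phi = lambda Y: np.linalg.matrix_power(Y, 4)
t = 1e-3
num = (phi(X + t*B1 + t*B2) - phi(X + t*B1 - t*B2)
       - phi(X - t*B1 + t*B2) + phi(X - t*B1 - t*B2)) / (4 * t**2)
err = np.linalg.norm(num - formula)
```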
X.5 Problems
Problem X.5.1. Let f be a nonnegative operator monotone function on (0, ∞).

(i) Show that if A is positive and U unitary, then

    |||f(A)U − Uf(A)||| ≤ |||f(|AU − UA|)|||.

(ii) Let A be positive and X Hermitian. Let U be the Cayley transform of X. We have

    U = (X − i)(X + i)^{-1},    X = i(I + U)(I − U)^{-1} = 2i(I − U)^{-1} − iI.

Show that

    |||f(A)X − Xf(A)||| ≤ 2 ||(I − U)^{-1}||^2 |||f(|AU − UA|)|||.

Use the relation between U and X again to estimate the last factor, and show that

    |||f(A)X − Xf(A)||| ≤ ((1 + ||X||^2)/2) |||f(2|AX − XA|)|||.

(iii) Let A, B be positive and X arbitrary. Use the above inequality to show that

    |||f(A)X − Xf(B)||| ≤ ((1 + ||X||^2)/2) |||f(2|AX − XB|)|||.

[Hint: Use 2 × 2 block matrices.] When X = I, this reduces to the inequality (X.5).
Problem X.5.2. Let f be a nonnegative operator monotone function. Let A, B be positive matrices and let X be any contraction. Show that

    |||f(A)X − Xf(B)||| ≤ (5/4) |||f(|AX − XB|)|||.

[Hint: Use the result of the preceding problem, replacing X there by (1/2)X.]
Problem X.5.3. From the above inequality it follows that if A, B are positive and X is any matrix, then for 0 < r < 1,

    ||A^r X − X B^r|| ≤ (5/4) ||X||^{1−r} ||AX − XB||^r.

Show that we have under these conditions

    |||A^r X − X B^r||| ≤ (5/4) ||X||^{1−r} ||| |AX − XB|^r |||.

[Hint: Reduce the general case to the special case A = B. Use Hölder's inequality.]
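A numerical spot check of the first inequality for r = 1/2 (scipy assumed for the fractional matrix power; one random instance, so this illustrates rather than proves anything):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

rng = np.random.default_rng(7)
n, r = 4, 0.5
M1 = rng.standard_normal((n, n))
M2 = rng.standard_normal((n, n))
A = M1 @ M1.T + np.eye(n)          # random positive definite matrices
B = M2 @ M2.T + np.eye(n)
X = rng.standard_normal((n, n))

Ar = fractional_matrix_power(A, r).real
Br = fractional_matrix_power(B, r).real

lhs = np.linalg.norm(Ar @ X - X @ Br, 2)   # operator (spectral) norm
rhs = 1.25 * np.linalg.norm(X, 2)**(1 - r) * np.linalg.norm(A @ X - X @ B, 2)**r
ok = bool(lhs <= rhs)
```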
Problem X.5.4. Let A, B be any two operators. Show that

Problem X.5.5. Let A be a positive operator such that A ≥ aI > 0. Show that for every X,

    2a |||X||| ≤ |||AX + XA|||.

[Use the results in Section VII.2.] Use this to show that if A, B are positive operators such that A^{1/2} + B^{1/2} ≥ aI > 0, then

    |||A^{1/2} − B^{1/2}||| ≤ (1/a) |||A − B|||.

[Hint: Consider the operators A^{1/2} + B^{1/2} and A^{1/2} − B^{1/2}.]
Problem X.5.6. Let A and B be positive operators such that A ≥ aI > 0 and B ≥ bI > 0. Show that for every nonnegative operator monotone function f on (0, ∞),

    |||f(A) − f(B)||| ≤ C(a, b) |||A − B|||,

where C(a, b) = (f(a) − f(b))/(a − b) if a ≠ b, and C(a, b) = f′(a) if a = b.
Problem X.5.7. Let f be a real function on (0, ∞), and let f^{(n)} be its nth derivative. Let f also denote the map induced by f on positive operators. Let D^n f(A) be the nth order Fréchet derivative of this map at the point A. Let

    D^{(n)} = { f : ||D^n f(A)|| = ||f^{(n)}(A)|| for all positive A }.

We have seen that every operator monotone function is in the class D^{(1)}. Show that it is in D^{(n)} for all n = 1, 2, ....
Problem X.5.8. Several examples of functions that are not operator monotone but are in D^{(1)} are given below.

(i) Show that for each integer n, the function f(t) = t^n on (0, ∞) is in the class D^{(1)}.

(ii) Show that the function f(t) = a_0 + a_1 t + ⋯ + a_n t^n on (0, ∞), where n is any positive integer and the coefficients a_j are nonnegative, is in the class D^{(1)}.

(iii) Any function on (0, ∞) that has a power series expansion with nonnegative coefficients is in the class D^{(1)}.

(iv) Use the Dyson expansion to show that the exponential function is in the class D^{(1)}.

(v) Let

    f(t) = ∫_0^∞ e^{−λt} dμ(λ),

where μ is a positive measure on (0, ∞). Show that f ∈ D^{(1)}. [Use part (iv).]

(vi) From Euler's integral for the gamma function, we can write, for r > 0,

    t^{−r} = (1/Γ(r)) ∫_0^∞ e^{−λt} λ^{r−1} dλ.

Use this to show that for each r > 0, the function f(t) = t^{−r} on (0, ∞) is in D^{(1)}.
Problem X.5.9. The Cholesky decomposition of a positive definite matrix A is the (unique) factoring A = R*R, where R is an upper triangular matrix with positive diagonal entries. This gives an invertible map from the space P(n) onto the space Δ_+(n). Show that

for every A. Use this to write local perturbation bounds for this map.
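A sketch of the map in code (numpy assumed; numpy's `cholesky` returns the lower factor L with A = LL*, so R = Lᵀ recovers the book's A = R*R convention), together with a finite-difference look at its local sensitivity in one direction:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # positive definite

def chol_R(A):
    # the map A -> R, with A = R* R, R upper triangular, diag(R) > 0
    return np.linalg.cholesky(A).T

R = chol_R(A)
ok_factor = np.allclose(R.T @ R, A)
ok_shape = np.allclose(R, np.triu(R)) and bool(np.all(np.diag(R) > 0))

# finite-difference estimate of the derivative in one Hermitian direction E
E = rng.standard_normal((n, n)); E = (E + E.T) / 2
t = 1e-6
dR = (chol_R(A + t * E) - chol_R(A - t * E)) / (2 * t)
growth = np.linalg.norm(dR) / np.linalg.norm(E)   # one sample of the local sensitivity
```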
Problem X.5.10. A matrix is called strongly nonsingular if all of its leading principal minors are nonzero. Such matrices form a dense open set in the space M(n). Every strongly nonsingular matrix A can be factored as A = LR, where L is a lower triangular matrix and R an upper triangular matrix. Further, L can be chosen to have all of its diagonal entries equal to 1. With this restriction the factoring is unique. This is the LR decomposition familiar in linear algebra and numerical analysis. Let S be the set of strongly nonsingular matrices, Δ_1 the set of lower triangular matrices with unit diagonal, and Δ_ns the set of nonsingular upper triangular matrices. Let Λ_1 : S → Δ_1 and Λ_2 : S → Δ_ns be the maps defined by Λ_1(A) = L and Λ_2(A) = R. Follow the approach in Section X.3 to obtain the bounds

    ||DΛ_1(A)|| ≤ cond(L) ||R^{-1}||,    ||DΛ_2(A)|| ≤ cond(R) ||L^{-1}||.

Use these to obtain local perturbation bounds for the LR decomposition.
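A minimal Doolittle-style sketch of the LR factoring (numpy assumed; no pivoting, so it relies on strong nonsingularity exactly as the problem does):

```python
import numpy as np

def lr_decomposition(A):
    """A = L R with L unit lower triangular, R upper triangular.
    Plain Gaussian elimination without pivoting; valid when all
    leading principal minors of A are nonzero."""
    n = A.shape[0]
    L = np.eye(n)
    R = A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = R[i, k] / R[k, k]      # nonzero pivot by strong nonsingularity
            L[i, k] = m
            R[i, k:] -= m * R[k, k:]
    return L, np.triu(R)

rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # safely strongly nonsingular
L, R = lr_decomposition(A)
ok = (np.allclose(L @ R, A)
      and np.allclose(L, np.tril(L))
      and np.allclose(np.diag(L), 1.0))
```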
X.6 Notes and References
Most of the results in this chapter can be proved for infinite-dimensional Hilbert space operators. Many of them are valid for operator algebras as well.

Let f be a continuous real function on an interval that contains the spectra of two Hermitian operators A, B (on a Hilbert space H). The problem of finding bounds for ||f(A) − f(B)|| in terms of ||A − B|| has been investigated in great detail by many authors. Many deep results on this were obtained by the Russian school of Birman, which includes Farforovskaya, Naboko, Solomyak, and others. When f is differentiable and f′ is bounded, one would expect to find inequalities of the form ||f(A) − f(B)|| ≤ C ||f′||_∞ ||A − B||. Counterexamples to show that such inequalities are not true, in general, were constructed by Yu.B. Farforovskaya, An estimate of the norm ||f(B) − f(A)|| for selfadjoint operators A and B, Zap. Nauch. Sem. LOMI, 56 (1976) 143–162. (English translation: J. Soviet Math. 14, No. 2 (1980).) It was shown by M.Sh. Birman and M.Z. Solomyak that such inequalities can be found under stronger smoothness assumptions. The reader should see their paper titled Double Stieltjes operator integrals, English translation, in Topics in Mathematical Physics, Volume 1, Consultant Bureau, New York, 1967.

Theorem X.1.1 is taken from F. Kittaneh and H. Kosaki, Inequalities for the Schatten p-norm V, Publ. Res. Inst. Math. Sci., 23 (1987) 433–443. The inequality (X.3) was proved by M.Sh. Birman, L.S. Koplienko, and M.Z. Solomyak, Estimates of the spectrum of the difference between fractional powers of selfadjoint operators, Izvestiya Vysshikh Uchebnykh Zavedenii Mat., 19 (1975) 3–10. Its generalisation in Theorem X.1.3 is due to T. Ando, Comparison of norms |||f(A) − f(B)||| and |||f(|A − B|)|||, Math. Z., 197 (1988) 403–409. Our discussion of the material between Theorem X.1.3 and Exercise X.1.7 is taken from this paper.
For p-norms, the inequality (X.10) has another formulation: if A, B are positive, then

    ||A^{1/t} − B^{1/t}||_{tp} ≤ ||A − B||_p^{1/t}    for p ≥ 1, t ≥ 1.
The special case t = 2 of this was proved by R.T. Powers and E. Størmer, Free states of the canonical anticommutation relations, Commun. Math. Phys., 16 (1970) 1–33. The point of this formulation is that if A, B are positive Hilbert space operators and their difference A − B is in the Schatten class I_p, then A^{1/t} − B^{1/t} is in the class I_{tp}, and the above inequality is valid.

Theorem X.1.8 is due to D. Jocić and F. Kittaneh, Some perturbation inequalities for selfadjoint operators, J. Operator Theory, 31 (1994) 3–10. The proof of Lemma X.1.9 given here is due to R. Bhatia, A simple proof of an operator inequality of Jocić and Kittaneh, J. Operator Theory, 31 (1994) 21–22. As in the preceding paragraph, for the Schatten p-norms, the inequality (X.12) can be written as

    ||A − B||_{(2m+1)p} ≤ 2^{2m/(2m+1)} ||A^{2m+1} − B^{2m+1}||_p^{1/(2m+1)}

for m = 1, 2, ..., p ≥ 1, and Hermitian A, B. The result is valid in infinite-dimensional Hilbert spaces. A corollary of this is the statement that if the difference A^{2m+1} − B^{2m+1} is in the Schatten class I_p, then A − B is in the class I_{(2m+1)p}.
The first inequality in Problem X.5.3 was proved by G.K. Pedersen, A commutator inequality (unpublished note). The generalisation in Problem X.5.2, the inequalities in Problem X.5.1, and the second inequality in Problem X.5.3 are due to R. Bhatia and F. Kittaneh, Some inequalities for norms of commutators, SIAM J. Matrix Anal., 18 (1997) to appear. The motivation for Pedersen was a result of W.B. Arveson, Notes on extensions of C*-algebras, Duke Math. J., 44 (1977) 329–355. Let f be a continuous function on [0,1] with f(0) = 0, and let ε > 0. Arveson showed that there exists a δ > 0 such that if A and X are elements in the unit ball of a C*-algebra and A ≥ 0, then ||AX − XA|| < δ implies ||f(A)X − Xf(A)|| < ε. The inequality in Problem X.5.3 is a quantitative version of this for the special class of functions f(t) = t^r, 0 < r < 1. Weaker results proved earlier and their applications may be found in C.L. Olsen and G.K. Pedersen, Corona C*-algebras and their applications to lifting problems, Math. Scand., 64 (1989) 63–86. It is conjectured that the factor 5/4 occurring in these inequalities can be replaced by 1.

The inequality (X.20) for p = 1 was proved by H. Kosaki, On the continuity of the map φ ↦ |φ| from the predual of a W*-algebra, J. Funct. Anal., 59 (1984) 123–131. For the Schatten p-norms, p ≥ 2, the inequality (X.18) was proved by F. Kittaneh and H. Kosaki in their paper cited above. The other parts of Theorems X.2.3 and X.2.4, and Theorem X.2.1, were proved by R. Bhatia, Perturbation inequalities for the absolute value map in norm ideals of operators, J. Operator Theory, 19 (1988) 129–136.

The constant √n in (X.25) can be replaced by a factor c_n ≈ log n. This has been known for some time and is related to other important problems in operator theory. See two papers by A. McIntosh, Counterexample to a question on commutators, Proc. Amer. Math. Soc., 29 (1971) 337–340, and
Functions and derivations of C*-algebras, J. Funct. Anal., 30 (1978) 264–275. It is also known that such a factor is indeed necessary, both for the operator norm and for the corresponding inequality for the trace norm ||·||_1. This implies that, if H is infinite-dimensional, then the map A ↦ |A| on L(H) is not Lipschitz continuous. Nor is it Lipschitz continuous on the Schatten ideal I_1. The inequality (X.24) due to Araki and Yamagami, on the other hand, shows that on the Hilbert-Schmidt ideal I_2 this map is continuous. For other values of p, 1 < p < ∞, E.B. Davies, Lipschitz continuity of functions of operators in the Schatten classes, J. London Math. Soc., 37 (1988) 148–157, showed that there exists a constant γ_p that depends on p, but not on the dimension n, such that

    || |A| − |B| ||_p ≤ γ_p ||A − B||_p

for all A, B. Theorem X.2.5 was proved in T. Kato, Continuity of the map S ↦ |S| for linear operators, Proc. Japan Acad., 49 (1973) 157–160, and interpreted to mean that the map A ↦ |A| is "almost Lipschitz". Results close to this were obtained by Yu.B. Farforovskaya in the papers cited above.

Bounds like the ones in Section X.3 have been of interest to numerical analysts and physicists. References to much of this work may be found in R. Bhatia, Matrix factorizations and their perturbations, Linear Algebra Appl., 197/198 (1994) 245–276. Theorem X.3.1, and the proof given here, are due to C.J. Kenney and A.J. Laub, Condition estimates for matrix functions, SIAM J. Matrix Analysis, 10 (1989) 191–209. Theorem X.3.4 was proved in R. Bhatia, First and second order perturbation bounds for the operator absolute value, Linear Algebra Appl., 208/209 (1994) 367–376. Theorems X.3.3, X.3.6, X.3.7, and X.3.8 are also proved in this paper. The inequality in Problem X.5.5 is taken from J.L. van Hemmen and T. Ando, An inequality for trace ideals, Commun. Math. Phys., 76 (1980) 143–148. This paper has references to physics literature, where such inequalities are used. The inequality in Problem X.5.6 is proved in the paper by F. Kittaneh and H. Kosaki cited earlier. Most of the results after Theorem X.3.9 in Section X.3 were proved by R. Bhatia and K. Mukherjea, Variation of the unitary part of a matrix, SIAM J. Matrix Analysis, 15 (1994) 1007–1014. The full potential of this method was exploited in the paper cited at the beginning of this paragraph, where several other matrix decompositions of interest in numerical analysis are studied. The results of Problems X.5.9 and X.5.10 are obtained in this paper. (Some of these were proved earlier, using different methods, by A. Barrlund, R. Mathias, G.W. Stewart, and J.-G. Sun.) Bounds for the second derivative of the map A ↦ |A| are obtained in R.
Bhatia, First and second order perturbation bounds for the operator absolute value, Linear Algebra Appl., 208/209 (1994) 367–376; and for derivatives of higher orders in R. Bhatia, Perturbation bounds for the operator absolute value, Linear Algebra Appl., 226 (1995) 639–645. The reader may try to
prove such inequalities using the methods explained in Section X.4. Since this map is the composite of two maps, A ↦ A*A ↦ (A*A)^{1/2}, its analysis can be broken into two parts. No good bounds of higher order are known for other matrix decompositions.

Results in Parts (v) and (vi) of Problem X.5.8 are taken from R. Bhatia and K.B. Sinha, Variation of real powers of positive operators, Indiana Univ. Math. J., 43 (1994) 913–925. In this paper it is also shown that the functions f(t) = t^r on (0, ∞) belong to the class D^{(1)} if r ≥ 2, but not if 1 < r < √2. We have already seen that these functions are in D^{(1)} for all real numbers r ≤ 1.

In Section X.4 we have given a bare outline of differential calculus. More on this may be found in J. Dieudonné, Foundations of Modern Analysis, Academic Press, 1960, and in A. Ambrosetti and G. Prodi, A Primer of Nonlinear Analysis, Cambridge University Press, 1993. For calculus on manifolds, the reader could see S. Lang, Introduction to Differentiable Manifolds, John Wiley, 1962. In our exposition we have included several examples of matrix functions and formulae for higher derivatives of composite maps that are not easily found in other sources.
References
C. Akemann, J. Anderson and G. Pedersen, Triangle inequalities in operator algebras, Linear and Multilinear Algebra, 11 (1982) 167–178.
A. Ambrosetti and G. Prodi, A Primer of Nonlinear Analysis, Cambridge University Press, 1993.
A.R. Amir-Moez, Extreme Properties of Linear Transformations and Geometry in Unitary Spaces, Texas Tech. University, 1968.
W.N. Anderson and G.E. Trapp, A class of monotone operator functions related to electrical network theory, Linear Algebra Appl., 15 (1975) 53–67.
T. Ando, Topics on operator inequalities, Hokkaido University, Sapporo, 1978.
T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl., 26 (1979) 203–241.
T. Ando, Inequalities for permanents, Hokkaido Math. J., 10 (1981) 18–36.
T. Ando, Comparison of norms |||f(A) − f(B)||| and |||f(|A − B|)|||, Math. Z., 197 (1988) 403–409.
T. Ando, Majorization, doubly stochastic matrices and comparison of eigenvalues, Linear Algebra Appl., 118 (1989) 163–248.
T. Ando, Matrix Young inequalities, Operator Theory: Advances and Applications, 75 (1995) 33–38.
T. Ando, Bounds for the antidistance, J. Convex Analysis, 2 (1996) 1–3.
T. Ando and R. Bhatia, Eigenvalue inequalities associated with the Cartesian decomposition, Linear and Multilinear Algebra, 22 (1987) 133–147.
T. Ando and F. Hiai, Log majorization and complementary Golden-Thompson type inequalities, Linear Algebra Appl., 197/198 (1994) 113–131.
H. Araki, On an inequality of Lieb and Thirring, Letters in Math. Phys., 19 (1990) 167–170.
H. Araki and S. Yamagami, An inequality for the Hilbert-Schmidt norm, Commun. Math. Phys., 81 (1981) 89–98.
N. Aronszajn, Rayleigh-Ritz and A. Weinstein methods for approximation of eigenvalues. I. Operators in a Hilbert space, Proc. Nat. Acad. Sci. U.S.A., 34 (1948) 474–480.
W.B. Arveson, Notes on extensions of C*-algebras, Duke Math. J., 44 (1977) 329–355.
J.S. Aujla and H.L. Vasudeva, Convex and monotone operator functions, Ann. Polonici Math., 62 (1995) 1–11.
F.L. Bauer and C.T. Fike, Norms and exclusion theorems, Numer. Math., 2 (1960) 137–141.
H. Baumgärtel, Analytic Perturbation Theory for Matrices and Operators, Birkhäuser, 1984.
N. Bebiano, New developments on the Marcus-Oliveira conjecture, Linear Algebra Appl., 197/198 (1994) 793–802.
G.R. Belitskii and Y.I. Lyubich, Matrix Norms and Their Applications, Birkhäuser, 1988.
J. Bendat and S. Sherman, Monotone and convex operator functions, Trans. Amer. Math. Soc., 79 (1955) 58–71.
F. Berezin and I.M. Gel'fand, Some remarks on the theory of spherical functions on symmetric Riemannian manifolds, Trudi Moscow Math. Ob., 5 (1956) 311–351.
R. Bhatia, On the rate of change of spectra of operators II, Linear Algebra Appl., 36 (1981) 25–32.
R. Bhatia, Analysis of spectral variation and some inequalities, Trans. Amer. Math. Soc., 272 (1982) 323–332.
R. Bhatia, Some inequalities for norm ideals, Commun. Math. Phys., 111 (1987) 33–39.
R. Bhatia, Perturbation Bounds for Matrix Eigenvalues, Longman, 1987.
R. Bhatia, Perturbation inequalities for the absolute value map in norm ideals of operators, J. Operator Theory, 19 (1988) 129–136.
R. Bhatia, On residual bounds for eigenvalues, Indian J. Pure Appl. Math., 23 (1992) 865–866.
R. Bhatia, A simple proof of an operator inequality of Jocić and Kittaneh, J. Operator Theory, 31 (1994) 21–22.
R. Bhatia, Matrix factorizations and their perturbations, Linear Algebra Appl., 197/198 (1994) 245–276.
R. Bhatia, First and second order perturbation bounds for the operator absolute value, Linear Algebra Appl., 208/209 (1994) 367–376.
R. Bhatia, Perturbation bounds for the operator absolute value, Linear Algebra Appl., 226 (1995) 639–645.
R. Bhatia and C. Davis, A bound for the spectral variation of a unitary operator, Linear and Multilinear Algebra, 15 (1984) 71–76.
R. Bhatia and C. Davis, Concavity of certain functions of matrices, Linear and Multilinear Algebra, 17 (1985) 155–164.
R. Bhatia and C. Davis, More matrix forms of the arithmetic-geometric mean inequality, SIAM J. Matrix Analysis, 14 (1993) 132–136.
R. Bhatia and C. Davis, Relations of linking and duality between symmetric gauge functions, Operator Theory: Advances and Applications, 73 (1994) 127–137.
R. Bhatia and C. Davis, A Cauchy-Schwarz inequality for operators with applications, Linear Algebra Appl., 223 (1995) 119–129.
R. Bhatia, C. Davis, and F. Kittaneh, Some inequalities for commutators and an application to spectral variation, Aequationes Math., 41 (1991) 70–78.
R. Bhatia, C. Davis and A. McIntosh, Perturbation of spectral subspaces and solution of linear operator equations, Linear Algebra Appl., 52/53 (1983) 45–67.
R. Bhatia and L. Elsner, On the variation of permanents, Linear and Multilinear Algebra, 27 (1990) 105–110.
R. Bhatia and L. Elsner, Symmetries and variation of spectra, Canadian J. Math., 44 (1992) 1155–1166.
R. Bhatia and L. Elsner, The q-binomial theorem and spectral symmetry, Indag. Math., N.S., 4 (1993) 11–16.
R. Bhatia, L. Elsner, and G. Krause, Bounds for the variation of the roots of a polynomial and the eigenvalues of a matrix, Linear Algebra Appl., 142 (1990) 195–209.
R. Bhatia, L. Elsner, and G. Krause, Spectral variation bounds for diagonalisable matrices, Preprint 94-098, SFB 343, University of Bielefeld, Aequationes Math., to appear.
R. Bhatia and S. Friedland, Variation of Grassmann powers and spectra, Linear Algebra Appl., 40 (1981) 1–18.
R. Bhatia and J.A.R. Holbrook, Short normal paths and spectral variation, Proc. Amer. Math. Soc., 94 (1985) 377–382.
R. Bhatia and J.A.R. Holbrook, Unitary invariance and spectral variation, Linear Algebra Appl., 95 (1987) 43–68.
R. Bhatia and J.A.R. Holbrook, A softer, stronger Lidskii theorem, Proc. Indian Acad. Sci. (Math. Sci.), 99 (1989) 75–83.
R. Bhatia, R. Horn, and F. Kittaneh, Normal approximants to binormal operators, Linear Algebra Appl., 147 (1991) 169-179.
R. Bhatia and F. Kittaneh, On some perturbation inequalities for operators, Linear Algebra Appl., 106 (1988) 271-279.
R. Bhatia and F. Kittaneh, On the singular values of a product of operators, SIAM J. Matrix Analysis, 11 (1990) 272-277.
R. Bhatia and F. Kittaneh, Some inequalities for norms of commutators, SIAM J. Matrix Analysis, 18 (1997), to appear.
R. Bhatia, F. Kittaneh, and R.-C. Li, Some inequalities for commutators and an application to spectral variation II, Linear and Multilinear Algebra, to appear.
R. Bhatia and K.K. Mukherjea, On the rate of change of spectra of operators, Linear Algebra Appl., 27 (1979) 147-157.
R. Bhatia and K.K. Mukherjea, The space of unordered tuples of complex numbers, Linear Algebra Appl., 52/53 (1983) 765-768.
R. Bhatia and K. Mukherjea, Variation of the unitary part of a matrix, SIAM J. Matrix Analysis, 15 (1994) 1007-1014.
R. Bhatia and P. Rosenthal, How and why to solve the equation AX - XB = Y, Bull. London Math. Soc., 29 (1997), to appear.
R. Bhatia and K.B. Sinha, Variation of real powers of positive operators, Indiana Univ. Math. J., 43 (1994) 913-925.
M.Sh. Birman, L.S. Koplienko, and M.Z. Solomyak, Estimates of the spectrum of the difference between fractional powers of selfadjoint operators, Izvestiya Vysshikh Uchebnykh Zavedenii Mat., 19 (1975) 3-10.
M.Sh. Birman and M.Z. Solomyak, Double Stieltjes operator integrals, English translation in Topics in Mathematical Physics, Volume 1, Consultants Bureau, New York, 1967.
A. Björck and G.H. Golub, Numerical methods for computing angles between linear subspaces, Math. Comp., 27 (1973) 579-594.
R. Bouldin, Best approximation of a normal operator in the Schatten p-norm, Proc. Amer. Math. Soc., 80 (1980) 277-282.
A.L. Cauchy, Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes (1829), Oeuvres Complètes (IIe Série), Volume 9, Gauthier-Villars.
F. Chatelin, Spectral Approximation of Linear Operators, Academic Press, 1983.
M.D. Choi, Almost commuting matrices need not be nearly commuting, Proc. Amer. Math. Soc., 102 (1988) 529-533.
J.E. Cohen, Inequalities for matrix exponentials, Linear Algebra Appl., 111 (1988) 25-28.
J.E. Cohen, S. Friedland, T. Kato, and F.P. Kelly, Eigenvalue inequalities for products of matrix exponentials, Linear Algebra Appl., 45 (1982) 55-95.
A. Connes and E. Størmer, Entropy for automorphisms of II₁ von Neumann algebras, Acta Math., 134 (1975) 289-306.
H.O. Cordes, Spectral Theory of Linear Differential Operators and Comparison Algebras, Cambridge University Press, 1987.
R. Courant, Über die Eigenwerte bei den Differentialgleichungen der mathematischen Physik, Math. Z., 7 (1920) 1-57.
R. Courant and D. Hilbert, Methods of Mathematical Physics, Wiley, 1953.
Ju.L. Daleckii and S.G. Krein, Formulas of differentiation according to a parameter of functions of Hermitian operators, Dokl. Akad. Nauk SSSR, 76 (1951) 13-16.
E.B. Davies, Lipschitz continuity of functions of operators in the Schatten classes, J. London Math. Soc., 37 (1988) 148-157.
C. Davis, Separation of two linear subspaces, Acta Sci. Math. (Szeged), 19 (1958) 172-187.
C. Davis, Notions generalizing convexity for functions defined on spaces of matrices, in Convexity: Proceedings of Symposia in Pure Mathematics, American Mathematical Society, (1963) 187-201.
C. Davis, The rotation of eigenvectors by a perturbation, J. Math. Anal. Appl., 6 (1963) 159-173.
C. Davis, The Toeplitz-Hausdorff theorem explained, Canad. Math. Bull., 14 (1971) 245-246.
C. Davis and W.M. Kahan, The rotation of eigenvectors by a perturbation III, SIAM J. Numer. Anal., 7 (1970) 1-46.
J. Dieudonné, Foundations of Modern Analysis, Academic Press, 1960.
W.F. Donoghue, Monotone Matrix Functions and Analytic Continuation, Springer-Verlag, 1974.
L. Elsner, On the variation of the spectra of matrices, Linear Algebra Appl., 47 (1982) 127-138.
L. Elsner, An optimal bound for the spectral variation of two matrices, Linear Algebra Appl., 71 (1985) 77-80.
L. Elsner and S. Friedland, Singular values, doubly stochastic matrices and applications, Linear Algebra Appl., 220 (1995) 161-169.
L. Elsner and C. He, Perturbation and interlace theorems for the unitary eigenvalue problem, Linear Algebra Appl., 188/189 (1993) 207-229.
L. Elsner, C. Johnson, J. Ross, and J. Schönheim, On a generalised matching problem arising in estimating the eigenvalue variation of two matrices, European J. Combinatorics, 4 (1983) 133-136.
L. Elsner and M.H.C. Paardekooper, On measures of nonnormality of matrices, Linear Algebra Appl., 92 (1987) 107-124.
Ky Fan, On a theorem of Weyl concerning eigenvalues of linear transformations I, Proc. Nat. Acad. Sci. U.S.A., 35 (1949) 652-655.
Ky Fan, On a theorem of Weyl concerning eigenvalues of linear transformations II, Proc. Nat. Acad. Sci. U.S.A., 36 (1950) 31-35.
Ky Fan, A minimum property of the eigenvalues of a Hermitian transformation, Amer. Math. Monthly, 60 (1953) 48-50.
Ky Fan and A.J. Hoffman, Some metric inequalities in the space of matrices, Proc. Amer. Math. Soc., 6 (1955) 111-116.
Yu.B. Farforovskaya, An estimate of the norm ||f(B) - f(A)|| for selfadjoint operators A and B, Zap. Nauch. Sem. LOMI, 56 (1976) 143-162. (English translation: J. Soviet Math., 14, No. 2 (1980).)
M. Fiedler, Bounds for the determinant of the sum of Hermitian matrices, Proc. Amer. Math. Soc., 30 (1971) 27-31.
E. Fischer, Über quadratische Formen mit reellen Koeffizienten, Monatsh. Math. Phys., 16 (1905) 234-249.
C.K. Fong and J.A.R. Holbrook, Unitarily invariant operator norms, Canad. J. Math., 35 (1983) 274-299.
C.K. Fong, H. Radjavi, and P. Rosenthal, Norms for matrices and operators, J. Operator Theory, 18 (1987) 99-113.
M. Fujii and T. Furuta, Löwner-Heinz, Cordes and Heinz-Kato inequalities, Math. Japonica, 38 (1993) 73-78.
T. Furuta, Norm inequalities equivalent to Löwner-Heinz theorem, Reviews in Math. Phys., 1 (1989) 135-137.
F.R. Gantmacher, Matrix Theory, 2 volumes, Chelsea, 1959.
L. Gårding, Linear hyperbolic partial differential equations with constant coefficients, Acta Math., 84 (1951) 1-62.
L. Gårding, An inequality for hyperbolic polynomials, J. Math. Mech., 8 (1959) 957-966.
I.M. Gel'fand and M. Naimark, The relation between the unitary representations of the complex unimodular group and its unitary subgroup, Izv. Akad. Nauk SSSR Ser. Mat., 14 (1950) 239-260.
S.A. Geršgorin, Über die Abgrenzung der Eigenwerte einer Matrix, Izv. Akad. Nauk SSSR, Ser. Fiz.-Mat., 6 (1931) 749-754.
I.C. Gohberg and M.G. Krein, Introduction to the Theory of Linear Nonselfadjoint Operators, American Math. Society, 1969.
S. Golden, Lower bounds for the Helmholtz function, Phys. Rev. B, 137 (1965) 1127-1128.
J.A. Goldstein and M. Levy, Linear algebra and quantum chemistry, American Math. Monthly, 78 (1991) 710-718.
G.H. Golub and C.F. Van Loan, Matrix Computations, 2nd ed., The Johns Hopkins University Press, 1989.
W. Greub, Multilinear Algebra, 2nd ed., Springer-Verlag, 1978.
P.R. Halmos, Finite-Dimensional Vector Spaces, Van Nostrand, 1958.
P.R. Halmos, Positive approximants of operators, Indiana Univ. Math. J., 21 (1972) 951-960.
P.R. Halmos, Spectral approximants of normal operators, Proc. Edinburgh Math. Soc., 19 (1974) 51-58.
P.R. Halmos, A Hilbert Space Problem Book, 2nd ed., Springer-Verlag, 1982.
F. Hansen and G.K. Pedersen, Jensen's inequality for operators and Löwner's theorem, Math. Ann., 258 (1982) 229-241.
G.H. Hardy, J.E. Littlewood, and G. Pólya, Inequalities, Cambridge University Press, 1934.
E. Heinz, Beiträge zur Störungstheorie der Spektralzerlegung, Math. Ann., 123 (1951) 415-438.
R.H. Herman and A. Ocneanu, Spectral analysis for automorphisms of UHF C*-algebras, J. Funct. Anal., 66 (1986) 1-10.
P. Henrici, Bounds for iterates, inverses, spectral variation and fields of values of non-normal matrices, Numer. Math., 4 (1962) 24-39.
F. Hiai and Y. Nakamura, Majorisation for generalised s-numbers in semifinite von Neumann algebras, Math. Z., 195 (1987) 17-27.
F. Hiai and D. Petz, The Golden-Thompson trace inequality is complemented, Linear Algebra Appl., 181 (1993) 153-185.
N.J. Higham, Matrix nearness problems and applications, in Applications of Matrix Theory, Oxford University Press, 1989.
A.J. Hoffman and H.W. Wielandt, The variation of the spectrum of a normal matrix, Duke Math. J., 20 (1953) 37-39.
K. Hoffman and R. Kunze, Linear Algebra, 2nd ed., Prentice Hall, 1971.
J.A.R. Holbrook, Spectral variation of normal matrices, Linear Algebra Appl., 174 (1992) 131-144.
A. Horn, Eigenvalues of sums of Hermitian matrices, Pacific J. Math., 12 (1962) 225-242.
A. Horn and R. Steinberg, Eigenvalues of the unitary part of a matrix, Pacific J. Math., 9 (1959) 541-550.
R.A. Horn, The Hadamard product, in C.R. Johnson, ed., Matrix Theory and Applications, American Mathematical Society, 1990.
R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge University Press, 1985.
R.A. Horn and C.R. Johnson, Topics in Matrix Analysis, Cambridge University Press, 1990.
R.A. Horn and R. Mathias, An analog of the Cauchy-Schwarz inequality for Hadamard products and unitarily invariant norms, SIAM J. Matrix Analysis, 11 (1990) 481-498.
R.A. Horn and R. Mathias, Cauchy-Schwarz inequalities associated with positive semidefinite matrices, Linear Algebra Appl., 142 (1990) 63-82.
N.M. Hugenholtz and R.V. Kadison, Automorphisms and quasi-free states of the CAR algebra, Commun. Math. Phys., 43 (1975) 181-197.
C. Jordan, Essai sur la géométrie à n dimensions, Bull. Soc. Math. France, 3 (1875) 103-174.
W. Kahan, Numerical linear algebra, Canadian Math. Bull., 9 (1966) 757-801.
W. Kahan, Inclusion theorems for clusters of eigenvalues of Hermitian matrices, Technical Report, Computer Science Department, University of Toronto, 1967.
W. Kahan, Every n x n matrix Z with real spectrum satisfies ||Z - Z*|| ≤ ||Z + Z*|| (log₂ n + 0.038), Proc. Amer. Math. Soc., 39 (1973) 235-241.
W. Kahan, Spectra of nearly Hermitian matrices, Proc. Amer. Math. Soc., 48 (1975) 11-17.
W. Kahan, B.N. Parlett, and E. Jiang, Residual bounds on approximate eigensystems of nonnormal matrices, SIAM J. Numer. Anal., 19 (1982) 470-484.
T. Kato, Notes on some inequalities for linear operators, Math. Ann., 125 (1952) 208-212.
T. Kato, Perturbation Theory for Linear Operators, Springer-Verlag, 1966.
T. Kato, Continuity of the map S → |S| for linear operators, Proc. Japan Acad., 49 (1973) 157-160.
C.J. Kenney and A.J. Laub, Condition estimates for matrix functions, SIAM J. Matrix Analysis, 10 (1989) 191-209.
F. Kittaneh, On Lipschitz functions of normal operators, Proc. Amer. Math. Soc., 94 (1985) 416-418.
F. Kittaneh, Inequalities for the Schatten p-norm IV, Commun. Math. Phys., 106 (1986) 581-585.
F. Kittaneh, A note on the arithmetic-geometric mean inequality for matrices, Linear Algebra Appl., 171 (1992) 1-8.
F. Kittaneh and D. Jocic, Some perturbation inequalities for selfadjoint operators, J. Operator Theory, 31 (1994) 3-10.
F. Kittaneh and H. Kosaki, Inequalities for the Schatten p-norm V, Publ. Res. Inst. Math. Sci., 23 (1987) 433-443.
A. Korányi, On a theorem of Löwner and its connections with resolvents of selfadjoint transformations, Acta Sci. Math. Szeged, 17 (1956) 63-70.
H. Kosaki, On the continuity of the map φ → |φ| from the predual of a W*-algebra, J. Funct. Anal., 59 (1984) 123-131.
B. Kostant, On convexity, the Weyl group and the Iwasawa decomposition, Ann. Sci. E.N.S., 6 (1973) 413-455.
F. Kraus, Über konvexe Matrixfunktionen, Math. Z., 41 (1936) 18-42.
G. Krause, Bounds for the variation of matrix eigenvalues and polynomial roots, Linear Algebra Appl., 208/209 (1994) 73-82.
F. Kubo and T. Ando, Means of positive linear operators, Math. Ann., 249 (1980) 205-224.
S. Lang, Introduction to Differentiable Manifolds, John Wiley, 1962.
P.D. Lax, Differential equations, difference equations and matrix theory, Comm. Pure Appl. Math., 11 (1958) 175-194.
A. Lenard, Generalization of the Golden-Thompson inequality, Indiana Univ. Math. J., 21 (1971) 457-468.
C.K. Li, Some aspects of the theory of norms, Linear Algebra Appl., 212/213 (1994) 71-100.
C.K. Li and N.K. Tsing, On the unitarily invariant norms and some related results, Linear and Multilinear Algebra, 20 (1987) 107-119.
C.K. Li and N.K. Tsing, Norms that are invariant under unitary similarities and the C-numerical radii, Linear and Multilinear Algebra, 24 (1989) 209-222.
R.-C. Li, New perturbation bounds for the unitary polar factor, SIAM J. Matrix Analysis, 16 (1995) 327-332.
R.-C. Li, Norms of certain matrices with applications to variations of the spectra of matrices and matrix pencils, Linear Algebra Appl., 182 (1993) 199-234.
B.V. Lidskii, Spectral polyhedron of a sum of two Hermitian matrices, Functional Analysis and Appl., 10 (1982) 76-77.
V.B. Lidskii, On the proper values of a sum and product of symmetric matrices, Dokl. Akad. Nauk SSSR, 75 (1950) 769-772.
E.H. Lieb, Convex trace functions and the Wigner-Yanase-Dyson conjecture, Advances in Math., 11 (1973) 267-288.
E.H. Lieb, Inequalities for some operator and matrix functions, Advances in Math., 20 (1976) 174-178.
H. Lin, Almost commuting selfadjoint matrices and applications, preprint, 1995.
G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys., 33 (1973) 305-322.
J.H. van Lint, The van der Waerden conjecture: two proofs in one year, Math. Intelligencer, 4 (1982) 72-77.
P.O. Löwdin, On the non-orthogonality problem connected with the use of atomic wave functions in the theory of molecules and crystals, J. Chem. Phys., 18 (1950) 365-374.
K. Löwner, Über monotone Matrixfunktionen, Math. Z., 38 (1934) 177-216.
T.X. Lu, Perturbation bounds for eigenvalues of symmetrizable matrices, Numerical Mathematics: A Journal of Chinese Universities, 16 (1994) 177-185 (in Chinese).
G. Lumer and M. Rosenblum, Linear operator equations, Proc. Amer. Math. Soc., 10 (1959) 32-41.
M. Marcus, Finite-Dimensional Multilinear Algebra, 2 volumes, Marcel Dekker, 1973 and 1975.
M. Marcus and L. Lopes, Inequalities for symmetric functions and Hermitian matrices, Canad. J. Math., 9 (1957) 305-312.
M. Marcus and H. Minc, A Survey of Matrix Theory and Matrix Inequalities, Prindle, Weber and Schmidt, 1964; reprinted by Dover in 1992.
M. Marcus and M. Newman, Inequalities for the permanent function, Ann. of Math., 75 (1962) 47-62.
A.S. Markus, The eigen- and singular values of the sum and product of linear operators, Russian Math. Surveys, 19 (1964) 92-120.
A.W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications, Academic Press, 1979.
R. Mathias, The Hadamard operator norm of a circulant and applications, SIAM J. Matrix Analysis, 14 (1993) 1152-1167.
R. McEachin, A sharp estimate in an operator inequality, Proc. Amer. Math. Soc., 115 (1992) 161-165.
R. McEachin, Analyzing specific cases of an operator inequality, Linear Algebra Appl., 208/209 (1994) 343-365.
A. McIntosh, Counterexample to a question on commutators, Proc. Amer. Math. Soc., 29 (1971) 337-340.
A. McIntosh, Functions and derivations of C*-algebras, J. Funct. Anal., 30 (1978) 264-275.
A. McIntosh, Heinz inequalities and perturbation of spectral families, Macquarie Math. Reports, 1979.
M.L. Mehta, Matrix Theory, 2nd ed., Hindustan Publishing Co., 1989.
R. Merris and J.A. Dias da Silva, Generalized Schur functions, J. Algebra, 35 (1975) 442-448.
H. Minc, Permanents, Addison-Wesley, 1978.
B.A. Mirman, Numerical range and norm of a linear operator, Trudy Seminara po Funkcional'nomu Analizu, No. 10 (1968), 51-55.
L. Mirsky, Matrices with prescribed characteristic roots and diagonal elements, J. London Math. Soc., 33 (1958) 14-21.
L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. Oxford Ser. (2), 11 (1960) 50-59.
Y. Nakamura, Numerical range and norm, Math. Japonica, 27 (1982) 149-150.
A. Nudel'man and P. Svarcman, The spectrum of a product of unitary matrices, Uspehi Mat. Nauk, 13 (1958) 111-117.
M. Ohya and D. Petz, Quantum Entropy and Its Use, Springer-Verlag, 1993.
K. Okubo, Hölder-type norm inequalities for Schur products of matrices, Linear Algebra Appl., 91 (1987) 13-28.
C.L. Olsen and G.K. Pedersen, Corona C*-algebras and their applications to lifting problems, Math. Scand., 64 (1989) 63-86.
M. Omladic and P. Semrl, On the distance between normal matrices, Proc. Amer. Math. Soc., 110 (1990) 591-596.
A. Ostrowski, Recherches sur la méthode de Gräffe et les zéros des polynômes et des séries de Laurent, Acta Math., 72 (1940) 99-257.
A. Ostrowski, Über die Stetigkeit von charakteristischen Wurzeln in Abhängigkeit von den Matrizenelementen, Jber. Deut. Mat.-Verein., 60 (1957) 40-42.
A. Ostrowski, Solution of Equations and Systems of Equations, Academic Press, 1960.
C.C. Paige and M. Wei, History and generality of the CS Decomposition, Linear Algebra Appl., 208/209 (1994) 303-326.
B.N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, 1980.
K.R. Parthasarathy, Eigenvalues of matrix-valued analytic maps, J. Austral. Math. Soc. (Ser. A), 26 (1978) 179-197.
C. Pearcy and A. Shields, Almost commuting matrices, J. Funct. Anal., 33 (1979) 332-338.
G.K. Pedersen, A commutator inequality (unpublished note).
D. Petz, Quasi-entropies for finite quantum systems, Rep. Math. Phys., 21 (1986) 57-65.
D. Petz, A survey of certain trace inequalities, Banach Centre Publications, 30 (1994) 287-298.
D. Phillips, Improving spectral variation bounds with Chebyshev polynomials, Linear Algebra Appl., 133 (1990) 165-173.
J. Phillips, Nearest normal approximation for certain normal operators, Proc. Amer. Math. Soc., 67 (1977) 236-240.
A. Pokrzywa, Spectra of operators with fixed imaginary parts, Proc. Amer. Math. Soc., 81 (1981) 359-364.
I.R. Porteous, Topological Geometry, Cambridge University Press, 1981.
R.T. Powers and E. Størmer, Free states of the canonical anticommutation relations, Commun. Math. Phys., 16 (1970) 1-33.
J.F. Queiró and A.L. Duarte, On the Cartesian decomposition of a matrix, Linear and Multilinear Algebra, 18 (1985) 77-85.
Lord Rayleigh, The Theory of Sound, reprinted by Dover, 1945.
M. Reed and B. Simon, Methods of Modern Mathematical Physics, Volume 4, Academic Press, 1978.
R.C. Riddell, Minimax problems on Grassmann manifolds, Advances in Math., 54 (1984) 107-199.
M. Rosenblum, On the operator equation BX - XA = Q, Duke Math. J., 23 (1956) 263-270.
S.Ju. Rotfel'd, The singular values of a sum of completely continuous operators, in Topics in Mathematical Physics, Vol. 3, Consultants Bureau, 1969, 73-78.
A. Ruhe, Closest normal matrix finally found!, BIT, 27 (1987) 585-598.
D. Ruelle, Statistical Mechanics, Benjamin, 1969.
R. Schatten, Norm Ideals of Completely Continuous Operators, Springer-Verlag, 1960.
A. Schönhage, Quasi-GCD computations, J. Complexity, 1 (1985) 118-137.
J.-P. Serre, Linear Representations of Finite Groups, Springer-Verlag, 1977.
H.S. Shapiro, Topics in Approximation Theory, Springer Lecture Notes in Mathematics, Vol. 187, 1971.
B. Simon, Trace Ideals and Their Applications, Cambridge University Press, 1979.
S. Smale, The fundamental theorem of algebra and complexity theory, Bull. Amer. Math. Soc. (New Series), 4 (1981) 1-36.
M.F. Smiley, Inequalities related to Lidskii's, Proc. Amer. Math. Soc., 19 (1968) 1029-1034.
G. Sparr, A new proof of Löwner's theorem on monotone matrix functions, Math. Scand., 47 (1980) 266-274.
G.W. Stewart, Error and perturbation bounds for subspaces associated with certain eigenvalue problems, SIAM Rev., 15 (1973) 727-764.
G.W. Stewart, On the perturbation of pseudo-inverses, projections, and linear least squares problems, SIAM Rev., 19 (1977) 634-662.
G.W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, 1990.
J.-G. Sun, On the perturbation of the eigenvalues of a normal matrix, Math. Numer. Sinica, 6 (1984) 334-336.
J.-G. Sun, On the variation of the spectrum of a normal matrix, Linear Algebra Appl., 246 (1996) 215-222.
V.S. Sunder, Distance between normal operators, Proc. Amer. Math. Soc., 84 (1982) 483-484.
V.S. Sunder, On permutations, convex hulls and normal operators, Linear Algebra Appl., 48 (1982) 403-411.
J.J. Sylvester, Sur l'équation en matrices px = xq, C.R. Acad. Sci. Paris, 99 (1884) 67-71 and 115-116.
B. Sz.-Nagy, Über die Ungleichung von H. Bohr, Math. Nachr., 9 (1953) 255-259.
P. Tarazaga, Eigenvalue estimates for symmetric matrices, Linear Algebra Appl., 135 (1990) 171-179.
C.J. Thompson, Inequality with applications in statistical mechanics, J. Math. Phys., 6 (1965) 1812-1813.
C.J. Thompson, Inequalities and partial orders on matrix spaces, Indiana Univ. Math. J., 21 (1971) 469-480.
R.C. Thompson, Principal submatrices II, Linear Algebra Appl., 1 (1968) 211-243.
R.C. Thompson, Principal submatrices IX, Linear Algebra Appl., 5 (1972) 1-12.
R.C. Thompson, On the eigenvalues of a product of unitary matrices, Linear and Multilinear Algebra, 2 (1974) 13-24.
J.L. van Hemmen and T. Ando, An inequality for trace ideals, Commun. Math. Phys., 76 (1980) 143-148.
J. von Neumann, Some matrix inequalities and metrization of matric space, Tomsk. Univ. Rev., 1 (1937) 286-300; reprinted in Collected Works, Pergamon Press, 1962.
B. Wang and M. Gong, Some eigenvalue inequalities for positive semidefinite matrix power products, Linear Algebra Appl., 184 (1993) 249-260.
P.A. Wedin, Perturbation bounds in connection with singular value decomposition, BIT, 13, 217-232.
H.F. Weinberger, Remarks on the preceding paper of Lax, Comm. Pure Appl. Math., 11 (1958) 195-196.
H. Weyl, Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen, Math. Ann., 71 (1911) 441-469.
H. Whitney, Complex Analytic Varieties, Addison-Wesley, 1972.
H. Wielandt, Lineare Scharen von Matrizen mit reellen Eigenwerten, Math. Z., 53 (1950) 219-225.
H.W. Wielandt, An extremum property of sums of eigenvalues, Proc. Amer. Math. Soc., 6 (1955) 106-110.
H.W. Wielandt, Topics in the Analytic Theory of Matrices, mimeographed lecture notes, University of Wisconsin, 1967; reprinted in Collected Works, Vol. 2, W. de Gruyter, 1996.
E. Wigner and J. von Neumann, Significance of Löwner's theorem in the quantum theory of collisions, Ann. of Math., 59 (1954) 418-433.
J.H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, 1965.
Index
absolute value
  of a matrix, 296
  of a vector, 30
  perturbation of, 296
adjoint, 4
analytic continuation, 134
analytic functions in a half-plane, 134
angles between subspaces, 201
angle operator, 221
annihilation operator, 21
antisymmetric tensor power, 16
antisymmetric tensor product, 16
arithmetic-geometric mean inequality, 87, 262, 295
averaging, 40, 117
Ando's Concavity Theorem, 273
Araki-Lieb-Thirring inequality, 258
Aronszajn's Inequality, 64
Bauer-Fike Theorem, 233
Bernstein's Theorem, 148
bilinear map, 12, 310, 313
bilinear functional, 12
bilinear functional, elementary, 12
binormal operator, 284
biorthogonal, 2
Birkhoff's Theorem, 37, 180
block-matrix, 9, 195
Borel measure, 139
bound norm, 7
C-numerical radius, 102, 106
canonical angles, 201
  cosines of, 201
Caratheodory's Theorem, 38
Cartesian decomposition, 6, 25, 73, 156, 183
Carrollian n-tuple, 242
Cauchy's integral formula, 205
Cauchy's Interlacing Theorem, 59
Cauchy-Riemann equations, 135
Cauchy-Schwarz inequality
  for matrices, 266, 286, 297, 298
  for symmetric gauge functions, 88
  for unitarily invariant norms, 95, 108
Cayley transform, 284
centraliser, 167
chain rule, 311
Chebyshev's Theorem, 228
Cholesky decomposition, 5, 319
class L, 268
class T, 259
coefficients in the characteristic polynomial, 269
compatible matching, 36
commutant, 167
commutator, 167, 234
complete symmetric polynomial, 18
completely monotone function, 148
compression, 59, 61, 119
concave function, 41, 53, 158, 240, 248, 289
condition number, 232
conformally equivalent, 137
contraction, 7, 119
contractive, 7, 10
convex function, 40, 41, 45, 87, 117, 157, 218, 240, 248, 265, 281
  monotone, 40
  smoothness properties, 145
convolution, 146
creation operator, 21
CS Decomposition, 196, 223
cyclic order, 184, 191
Davis-Kahan sinΘ Theorem, 212
derivative, 124, 310
determinant, 16, 253, 269
  inequality, 3, 19, 21, 47, 51, 108, 181, 182, 183, 271, 281
  perturbation of, 22
diagonal of A, 37
diagonalisable matrix, 232
diagonally dominant, 251
differentiable curve, 166
differentiable manifold, 305
differentiable map, 124, 310
differentiation, rules of, 311
Dilation Theory, 26
distance between subspaces, 202
direct sum, 9
directional derivative, 310
distribution function, 139
doubly stochastic matrix, 23, 32, 165
doubly substochastic matrix, 38, 39
dual norm, 89, 96
dual space, 14
Dyson's expansion, 311, 319
eigenvalues, 24
  continuity of, 152, 168
  continuous parametrization of, 154
  generalised, 238
  of A with respect to B, 238
  of Hermitian matrices, 24, 35, 62, 101
  of product of unitary matrices, 82
elementary symmetric polynomials, 18, 46, 51
entropy, 44, 274, 287
  in quantum mechanics, 274
  relative, 274
Euclidean norm, 86
exponential, 8, 254, 311
extremal representation for eigenvalues, 24, 58, 67, 77
extremal representation for singular values, 75, 76
Fan Dominance Theorem, 93, 284, 291
Fan-Hoffman theorem, 73
field of values, 8
first divided difference, 123
Fischer's inequality, 51
Fourier analysis, 135
Fourier transform, 206, 207, 216
  inverse, 216
  minimal extrapolation problem, 224
Frechet derivative, 310
Frechet differential calculus, 301, 310
Frechet differentiable map, 124
Frobenius norm, 7, 25, 92, 214
Fubini's Theorem, 140
Fuglede-Putnam Theorem, 235
function of bounded variation, 217
gamma function, 319
gap, 225
Gel'fand-Naimark Theorem, 71
general linear group, 7
Gersgorin disks, 244
Gersgorin Disk Theorem, 244
Golden-Thompson inequality, 261, 279, 285
Gram determinant, 20
Gram matrix, 20
Gram-Schmidt procedure, 2, 287
Grassmann power, 18
Hadamard's inequality, 3, 227
Hadamard product, 23
Hadamard Determinant Theorem, 24, 46, 51
Hall's Theorem, 52
Hall's Marriage Theorem, 36
harmonic conjugates, 137
harmonic function, 135
harmonic function, mean value property, 136
Hausdorff distance, 160
Heinz inequalities, 285
Henrici's Theorem, 246
Herglotz Theorem, 136
Hermitian approximant, 276
Hermitian matrix, 4, 8, 57, 155, 254
Hilbert-Schmidt norm, 7, 92, 97, 299
Hoffman-Wielandt inequality, 165, 237
Hoffman-Wielandt Theorem, 165
Holder inequality, 88
  for symmetric gauge functions, 88
  for unitarily invariant norms, 95
hyperbolic PDE, 251
hyperbolic polynomials, 251
inner product, 1
inner product, on matrices, 92
imaginary part of a matrix, 6
invariant subspace, 10
interlacing theorem, 60
  converse, 61
  for singular values, 81
inverse, 293
isometry, 119
isotone, 41
jointly concave, 271, 273
Jordan canonical form, 246
Jordan decomposition, 99, 262, 277, 292
Kato-Temple inequality, 77
König-Frobenius Theorem, 37
Krein-Milman Theorem, 133
Ky Fan k-norms, 35, 92
  main use, 93
Ky Fan's Maximum Principle, 24, 35, 65, 69, 71, 174
lp-norms, 84, 89
lattice superadditive, 48, 53
laxly positive, 238
Lebesgue Dominated Convergence Theorem, 139
length of a curve, 169
lexicographic ordering, 14
Lidskii's Theorem, 69, 179
  second proof, 70
  multiplicative, 73
  third proof, 98
  for normal matrices, 181
Lie algebra, 241
Lie bracket, 167
Lie Product Formula, 254, 280
Liebian function, 286
Lieb's Concavity Theorem, 271
Lieb's Theorem, 270
Lieb-Thirring inequality, 279
Lindblad's Theorem, 275
Lipschitz continuous, 322
Lipschitz constant, 215
Lipschitz continuous function, 214
logarithm, 145
  principal branch, 143
logarithmic majorisation, 71
Loewner's Theorems, 131, 149
Loewner-Heinz inequality, 150
Löwner-Heinz Theorem, 285
Löwdin orthogonalisation, 287
LR decomposition, 319
  perturbation of, 320
Lyapunov equation, 221
majorisation, 28
  weak, 30
  of complex vectors, 179
  soft, 180
manifold, 167
Marcus-de Oliveira Conjecture, 184
Marriage Problem, 36
Marriage Theorem, 162, 213
Matching Problem, 36
Matching Theorem, 185
matrix convex, 113
matrix convex of order n, 113
matrix monotone, 112
matrix monotone of order n, 112
matrix triangle inequality, 81
Matrix Young inequalities, 286
mean value theorem, 303, 307, 312
measure of nonnormality, 245, 251, 278
metrically flat, 170
midpoint operator convex, 113
minimax principle, 58
  of Courant, Fischer and Weyl, 58
  for singular values, 75
Minkowski determinant inequality, 56
Minkowski Determinant Theorem, 47, 282
Minkowski inequality, for symmetric gauge functions, 89
Mirman's Theorem, 25
Mixed Schwarz inequality, 281, 286
mollifiers, 146
monotone, 45, 48
monotone decreasing, 41
monotone increasing, 41
multiindices, 16
multilinear functional, 12
multilinear map, 12
multiplication operator, 223
multiplication operator, left, 273
multiplication operator, right, 273
nearly normal matrix, 252
Neumann Series, 7
Nevanlinna's Theorem, 138
norm, 1, 6
  absolute, 85
  bound, 91
  gauge invariant, 85
  Frobenius, 92
  Hilbert-Schmidt, 92
  Ky Fan, 92
  monotone, 85
  operator, 91
  permutation invariant, 85
  Schatten, 92
  submultiplicative, 94
  symmetric, 85
  trace, 92
  unitarily invariant, 91
  weakly unitarily invariant, 102
normal approximant, 277
normal curve, 169
normal matrices, distance between eigenvalues, 212
normal matrices, path connected, 169
normal matrix, 4, 8, 160, 161, 168, 172, 177, 180, 253
  function of, 5
  spectral resolution, 161
normal path, 169, 177
normal path inequality, 189
numerical radius, 8, 102
numerical range, 8, 25
ν-measure of nonnormality, 246
operator approximation, 192, 275, 287
operator concave function, 113, 121
operator convex function, 113, 130
  integral representation, 134
operator monotone function, 112, 121, 126, 127, 130, 289, 302, 303, 304, 317
  canonical representation, 145
  infinitely differentiable, 134, 290
  integral representation, 134
  inverse of, 293
operator norm, 7
optimal matching, 159
optimal matching distance, 52, 153, 160, 21
orbit, 189
orthostochastic matrix, 35, 180
p-Carrollian, 242
p-norms, 84
pth derivative, 315
partitioned matrix, 64, 188
Peierls-Bogoliubov Inequality, 275, 281
permanent, 17, 19
  inequality, 19, 21, 23
  perturbation of, 22
permutations, 165
permutation invariant, 43
permutation matrix, 32, 37, 165
  complex, 85
permutation orbit, 166
pinching, 50, 97, 118, 275
pinching inequality, 97
  for wui norms, 107
Pick functions, 135, 139
  Nevanlinna's Theorem, 135
  integral representation, 138
Poincaré's Inequality, 58
Poisson kernel, 136
polar decomposition, 6, 213, 267, 276, 305
  perturbation of, 305
positive approximant, 277
positive definite, 4
positive matrices, product of, 255
positive matrix, 4
positive part, 6, 213
positive semidefinite, 4
positivity-preserving, 32
power functions, 123, 145, 289
  operator monotonicity of, 123
  operator convexity of, 123
  principal branch, 143
probability measure, 133
probability measures, weak* compact, 136
principal angles, 202
product rule for differentiation, 312
Pythagorean Theorem, 21
Q-norm, 89, 95, 174, 175, 277
Q'-norm, 90, 97
QR decomposition, 3, 195, 307
  perturbation of, 307
  rank revealing, 196
quasi-norms, 107
quantum chemistry, 287
R factor, 195
real eigenvalues, 238
real part of a matrix, 6
real spectrum, 193
rectifiable normal path, 169
rectifiable path, 169
reducing subspace, 10
regularisation, 146
residual bounds, 193
retraction, 173, 277
roots of polynomials, 230
  continuity of, 154
  perturbation of, 230
Rotfel'd's Theorem, 98
Rouché's Theorem, 153
S-convex, 40
Schatten class, 321
Schatten 2-norm, 7
Schatten p-norm, 92, 297, 298
Schur basis, 5
Schur-concave, 44, 46, 53
Schur-convex, 40, 41, 44
Schur-convexity, and convexity, 46
Schur-convexity, for differentiable functions, 45
Schur product, 23, 124
Schur's Theorem, 23, 35, 47, 51, 74
  converse of, 55
Schur Triangular Form, 5
Schwartz space, 219
second derivative, 313 second divided difference, 128 selfadjoint , 4 sesquilinear functional , 12 signature of a permutation, 16 similarity orbit, 189 similarity transformations, 1 02 singular value decomposition, 6 singular values, 5 inequalities, 94 majorisation, 157 of products, 71 perturbation of, 78 singular vectors, 6 perturbation of, 2 1 5 sinO theorem, 224 skewHermitian, 4, 1 55 skewsymmetric, 24 1 smooth approximate identities, 146 spectral radius, 9, 102, 253, 256, 269 spectral radius formula, 204 spectral resolution, 57 Spectral Theorem, 5 square root, 297, 30 1 of a positive matrix, 5 principal branch, 143 Stieltjes inversion formula, 1 39 strictly isotone, 4 1 strictly positive, 4 strongly isotone, 41 strongly nonsingular, 319 subadditive, 53 subspaces, 20 1 Sylvester equation, 194, 203, 222, 223, 234 condition for a unique solution, 203 norm of the solution, 208 solution of, 204, 205, 206, 207, 208 symmetry classes of tensors , 17 symmetric gauge function, 44, 52, 86 , 90, 260
  quadratic, 89
symmetric matrix, 241
symmetric norm, 94
symmetric tensor power, 16, 18
symmetric tensor product, 16
symplectic Lie algebra, 241
symplectic Lie group, 241
T-transform, 33
tangent space, 167, 305
tangent vector, 166
tanΘ theorem, 222
Taylor's Theorem, 303, 307, 315
tempered distribution, 219
tensor product, 222
  construction, 12
  inner product on, 14
  of operators, 14
  of spaces, 13
  orthonormal basis for, 14
Toeplitz-Hausdorff Theorem, 8, 20
trace, 25, 253, 269
  of a vector, 29
trace inequality, 258, 261, 279, 281
trace norm, 92, 173
trace-preserving, 32
triangle inequality for the matrix absolute value, 74
tridiagonal matrix, 60
twice differentiable map, 313
Tychonoff's Theorem, 132
T-length of a path, 175
T-optimal matching distance, 173
unital, 32
unitary approximant, 276
unitary conjugation, 102, 166
unitary factors, 307
unitary group, 7
unitary invariance, 7
unitary matrix, 4, 162, 178
unitary orbit, 166
unitary part, 6, 82, 213, 305
unitary-stochastic, 35
unitarily equivalent, 5
unitarily invariant function norm, 104
unitarily invariant norm, 91, 93
unitarily similar, 5
unordered n-tuples, 153
  metric on, 153
  quotient topology on, 153
van der Waerden conjecture, 27
variance, 44
weak submajorisation, 30
weak supermajorisation, 30
weakly unitarily invariant norm, 102, 109
Weyl's inequalities, 62, 64
Weyl's Majorant Theorem, 42, 73, 254, 279
  converse of, 55
Weyl's Monotonicity Theorem, 63
Weyl's Monotonicity Principle, 100, 291, 292
Weyl's Perturbation Theorem, 63, 71, 99, 152, 240
Wielandt's Minimax Principle, 67
Wigner-Yanase-Dyson conjecture, 274
wui norm, 102, 173, 177, 190
wui seminorm, 102

Notations
a ∨ b, 30
a ∧ b, 30
A*, 4
A ≥ 0, 4
A ≥ B, 4
A^{1/2}, 5
A ⊗ B, 14
A^{[k]}, 19
A ∘ B, 23
A ≤ B, 112
A ≤_L B, 238
A^T, 241
𝒜, 204
ℬ, 204
C(A), 50
C(S), 104
c_sym, 52
C · U, 170
cond(S), 232
det A, 3
d(λ, μ), 52
Df(A), 124
D, 135
d(σ(A), σ(B)), 160
d_T(σ(A), σ(B)), 173
d(Root f, Root g), 230
Δ(A), 245
Δ_v(A), 246
Df(A), 301
‖Df(A)‖, 301
|||Df(A)|||, 301
a_+(n), 307
a_re(n), 307
Df(u), 310
D²f(u), 313
diag(A), 35
E_u, 16
e, 29
e_1, 29
Eig A, 63
Eig^↓(A), 63
Eig^↑(A), 63
Eig_a(A), 158
Eig^{|↓|}(A), 158
Eig^{|↑|}(A), 158
f(x), 40
f^[1], 123
f^[2], 128
f̂, 206
φ, 44
φ(x), 44
φ^(k)(x), 45
Φ^(p)(x), 89
Φ^(p²), 89
Φ^(p)_{1/2}, 89
h(σ(A), σ(B)), 160
h(Root f, Root g), 231
Im A, 6
ℋ*, 14
ℋ(n), 167
ℒ, 269
ℒ(V, W), 3
ℒ(ℋ), 4
ℒ₂(X, Y), 313
Λ(A), 50
λ₁(A), 57
λ_j(A), 58
λ_j^↓(A), 58
λ_j^↑(A), 58
λ₁(T), 256
ℓ_T(·), 175
m_A(λ), 162
M(n), 91
𝒩, 169
N(·), 16
span{v₁, ..., v_k}, 65
s(L, M), 160
σ(A), 160
s(σ(A), σ(B)), 160
sgn x, 217
S_n, 165
S^⊥, 167
T⁺, 99
T⁻, 99
T(A), 102
T_A 𝒰_A, 167
T_A 𝒪_A, 189
Θ(ℰ, ℱ), 201
Θ(f, g), 230
T_U U(n), 305
T, 259
tr, 29
u*v, 2
U(n), 7
U_B, 166
v + w, 65
v − w, 65
w(A), 8
W(A), 8
x ⊗ y, 12
x₁ ∧ ··· ∧ x_k, 16
x₁ ∨ ··· ∨ x_k, 16
x^↓, 28
x^↑, 28
x ≺ y, 28
x ≺_w y, 30
x ≺^w y, 30
x ≺_s y, 180
x ∨ y, 30
x ∧ y, 30
x⁺, 30
⟨u, v⟩, 2
Z(A), 167
[K, A], 167
⊕_j ℋ_j, 11
⊗^k ℋ, 14
⊗^k A, 15
∧^k ℋ, 16
∨^k ℋ, 16
∧^k A, 18
∨^k A, 18
|A|, 5
| · |, 29
|x|, 30
‖x‖_p, 84
‖x‖_∞, 84
‖x‖_1, 86
‖x‖_(k), 86
‖A‖, 6
‖A‖₂, 7
|||A|||, 91
‖A‖_(k), 35
‖A‖_p, 92
‖A‖_∞, 92
‖A‖₁, 92
||| · |||_Φ, 95
||| · |||′, 96