Rajendra Bhatia

Matrix Analysis

Graduate Texts in Mathematics 169

Editors: S. Axler, F.W. Gehring, P.R. Halmos

Springer
New York Berlin Heidelberg Barcelona Budapest Hong Kong London Milan Paris Santa Clara Singapore Tokyo
Rajendra Bhatia
Indian Statistical Institute
New Delhi 110 016
India
Editorial Board

S. Axler
Department of Mathematics
Michigan State University
East Lansing, MI 48824
USA

F.W. Gehring
Department of Mathematics
University of Michigan
Ann Arbor, MI 48109
USA

P.R. Halmos
Department of Mathematics
Santa Clara University
Santa Clara, CA 95053
USA
Mathematics Subject Classification (1991): 15-01, 15A16, 15A45, 47A55, 65F15

Library of Congress Cataloging-in-Publication Data

Bhatia, Rajendra, 1952-
  Matrix analysis / Rajendra Bhatia.
    p. cm. -- (Graduate texts in mathematics; 169)
  Includes bibliographical references and index.
  ISBN 0-387-94846-5 (alk. paper)
  1. Matrices. I. Title. II. Series.
  QA188.B485 1996
  512.9'434--dc20          96-32217

Printed on acid-free paper.
© 1997 Springer-Verlag New York, Inc.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Production managed by Victoria Evarretta; manufacturing supervised by Jeffrey Taub.
Photocomposed pages prepared from the author's LaTeX files.
Printed and bound by Maple-Vail Book Manufacturing Group, York, PA.
Printed in the United States of America.

9 8 7 6 5 4 3 2 1

ISBN 0-387-94846-5 Springer-Verlag New York Berlin Heidelberg
SPIN 10524488
Preface
A good part of matrix theory is functional analytic in spirit. This statement can be turned around. There are many problems in operator theory, where most of the complexities and subtleties are present in the finite-dimensional case. My purpose in writing this book is to present a systematic treatment of methods that are useful in the study of such problems.

This book is intended for use as a text for upper division and graduate courses. Courses based on parts of the material have been given by me at the Indian Statistical Institute and at the University of Toronto (in collaboration with Chandler Davis). The book should also be useful as a reference for research workers in linear algebra, operator theory, mathematical physics and numerical analysis.

A possible subtitle of this book could be Matrix Inequalities. A reader who works through the book should expect to become proficient in the art of deriving such inequalities. Other authors have compared this art to that of cutting diamonds. One first has to acquire hard tools and then learn how to use them delicately.

The reader is expected to be very thoroughly familiar with basic linear algebra. The standard texts Finite-Dimensional Vector Spaces by P.R. Halmos and Linear Algebra by K. Hoffman and R. Kunze provide adequate preparation for this. In addition, a basic knowledge of functional analysis, complex analysis and differential geometry is necessary. The usual first courses in these subjects cover all that is used in this book.

The book is divided, conceptually, into three parts. The first five chapters contain topics that are basic to much of the subject. (Of these, Chapter 5 is more advanced and also more special.) Chapters 6 to 8 are devoted to
perturbation of spectra, a topic of much importance in numerical analysis, physics and engineering. The last two chapters contain inequalities and perturbation bounds for other matrix functions. These too have been of broad interest in several areas.

In Chapter 1, I have given a very brief and rapid review of some basic topics. The aim is not to provide a crash course but to remind the reader of some important ideas and theorems and to set up the notations that are used in the rest of the book. The emphasis, the viewpoint, and some proofs may be different from what the reader has seen earlier. Special attention is given to multilinear algebra; and inequalities for matrices and matrix functions are introduced rather early. After the first chapter, the exposition proceeds at a much more leisurely pace. The contents of each chapter have been summarised in its first paragraph.

The book can be used for a variety of graduate courses. Chapters 1 to 4 should be included in any course on Matrix Analysis. After this, if perturbation theory of spectra is to be emphasized, the instructor can go on to Chapters 6, 7 and 8. With a judicious choice of topics from these chapters, she can design a one-semester course. For example, Chapters 7 and 8 are independent of each other, as are the different sections in Chapter 8. Alternately, a one-semester course could include much of Chapters 1 to 5, Chapter 9, and the first part of Chapter 10. All topics could be covered comfortably in a two-semester course. The book can also be used to supplement courses on operator theory, operator algebras and numerical linear algebra.

The book has several exercises scattered in the text and a section called Problems at the end of each chapter. An exercise is placed at a particular spot with the idea that the reader should do it at that stage of his reading and then proceed further. Problems, on the other hand, are designed to serve different purposes. Some of them are supplementary exercises, while others are about themes that are related to the main development in the text. Some are quite easy while others are hard enough to be contents of research papers. From Chapter 6 onwards, I have also used the problems for another purpose. There are results, or proofs, which are a bit too special to be placed in the main text. At the same time they are interesting enough to merit the attention of anyone working, or planning to work, in this area. I have stated such results as parts of the Problems section, often with hints about their solutions. This should enhance the value of the book as a reference, and provide topics for a seminar course as well. The reader should not be discouraged if he finds some of these problems difficult. At a few places I have drawn attention to some unsolved research problems. At some others, the existence of such problems can be inferred from the text. I hope the book will encourage some readers to solve these problems too.

While most of the notations used are the standard ones, some need a little explanation:

Almost all functional analysis books written by mathematicians adopt the convention that an inner product (u, v) is linear in the variable u and
conjugate-linear in the variable v. Physicists and numerical analysts adopt the opposite convention, and different notations as well. There would be no special reason to prefer one over the other, except that certain calculations and manipulations become much simpler in the latter notation. If u and v are column vectors, then u*v is the product of a row vector and a column vector, hence a number. This is the inner product of u and v. Combined with the usual rules of matrix multiplication, this facilitates computations. For this reason, I have chosen the second convention about inner products, with the belief that the initial discomfort this causes some readers will be offset by the eventual advantages. (Dirac's bra and ket notation, used by physicists, is different typographically but has the same idea behind it.)

The k-fold tensor power of an operator is represented in this book as ⊗^k A, the antisymmetric and the symmetric tensor powers as ∧^k A and ∨^k A, respectively. This helps in thinking of these objects as maps, A ↦ ⊗^k A, etc. We often study the variational behaviour of, and perturbation bounds for, functions of operators. In such contexts, this notation is natural.

Very often we have to compare two n-tuples of numbers after rearranging them. For this I have used a pictorial notation that makes it easy to remember the order that has been chosen. If x = (x_1, ..., x_n) is a vector with real coordinates, then x↓ and x↑ are vectors whose coordinates are obtained by rearranging the numbers x_j in decreasing order and in increasing order, respectively. We write x↓ = (x↓_1, ..., x↓_n) and x↑ = (x↑_1, ..., x↑_n), where x↓_1 ≥ ··· ≥ x↓_n and x↑_1 ≤ ··· ≤ x↑_n.

The symbol ||| · ||| stands for a unitarily invariant norm on matrices: one that satisfies the equality |||UAV||| = |||A||| for all A and for all unitary U, V. A statement like |||A||| ≤ |||B||| means that, for the matrices A and B, this inequality is true simultaneously for all unitarily invariant norms. The supremum norm of A, as an operator on the space C^n, is always written as ||A||. Other norms carry special subscripts. For example, the Frobenius norm, or the Hilbert-Schmidt norm, is written as ||A||_2. (This should be noted by numerical analysts, who often use the symbol ||A||_2 for what we call ||A||.)

A few symbols have different meanings in different contexts. The reader's attention is drawn to three such symbols. If x is a complex number, |x| denotes the absolute value of x. If x is an n-vector with coordinates (x_1, ..., x_n), then |x| is the vector (|x_1|, ..., |x_n|). For a matrix A, the symbol |A| stands for the positive semidefinite matrix (A*A)^{1/2}. If J is a finite set, |J| denotes the number of elements of J. A permutation on n indices is often denoted by the symbol σ. In this case, σ(j) is the image of the index j under the map σ. For a matrix A, σ(A) represents the spectrum of A.

The trace of a matrix A is written as tr A. In analogy, if x = (x_1, ..., x_n) is a vector, we write tr x for the sum Σ x_j.

The words matrix and operator are used interchangeably in the book. When a statement about an operator is purely finite-dimensional in content,
I use the word matrix. If a statement is true also in infinite-dimensional spaces, possibly with a small modification, I use either the word matrix or the word operator. Many of the theorems in this book have extensions to infinite-dimensional spaces.

Several colleagues have contributed to this book, directly and indirectly. I am thankful to all of them. T. Ando, J.S. Aujla, R.B. Bapat, A. Ben-Israel, I. Ionascu, A.K. Lal, R.-C. Li, S.K. Narayan, D. Petz and P. Rosenthal read parts of the manuscript and brought several errors to my attention. Fumio Hiai read the whole book with his characteristic meticulous attention and helped me eliminate many mistakes and obscurities. Long-time friends and co-workers M.D. Choi, L. Elsner, J.A.R. Holbrook, R. Horn, F. Kittaneh, A. McIntosh, K. Mukherjea, K.R. Parthasarathy, P. Rosenthal and K.B. Sinha have generously shared with me their ideas and insights. These ideas, collected over the years, have influenced my writing. I owe a special debt to T. Ando. I first learnt some of the topics presented here from his Hokkaido University lecture notes. I have also learnt much from discussions and correspondence with him. I have taken a lot from his notes while writing this book.

The idea of writing this book came from Chandler Davis in 1986. Various logistic difficulties forced us to abandon our original plans of writing it together. The book is certainly the poorer for it. Chandler, however, has contributed so much to my mathematics, to my life, and to this project, that this is as much his book as it is mine.

I am thankful to the Indian Statistical Institute, whose facilities have made it possible to write this book. I am also thankful to the Department of Mathematics of the University of Toronto and to NSERC Canada, for several visits that helped this project take shape. It is a pleasure to thank V.P. Sharma for his LaTeX typing, done with competence and with good cheer, and the staff at Springer-Verlag for their help and support. My most valuable resource while writing has been the unstinting and ungrudging support from my son Gautam and wife Irpinder. Without that, this project might have been postponed indefinitely.

Rajendra Bhatia
Contents

Preface

I     A Review of Linear Algebra
      I.1   Vector Spaces and Inner Product Spaces
      I.2   Linear Operators and Matrices
      I.3   Direct Sums
      I.4   Tensor Products
      I.5   Symmetry Classes
      I.6   Problems
      I.7   Notes and References

II    Majorisation and Doubly Stochastic Matrices
      II.1  Basic Notions
      II.2  Birkhoff's Theorem
      II.3  Convex and Monotone Functions
      II.4  Binary Algebraic Operations and Majorisation
      II.5  Problems
      II.6  Notes and References

III   Variational Principles for Eigenvalues
      III.1 The Minimax Principle for Eigenvalues
      III.2 Weyl's Inequalities
      III.3 Wielandt's Minimax Principle
      III.4 Lidskii's Theorems
      III.5 Eigenvalues of Real Parts and Singular Values
      III.6 Problems
      III.7 Notes and References

IV    Symmetric Norms
      IV.1  Norms on C^n
      IV.2  Unitarily Invariant Norms on Operators on C^n
      IV.3  Lidskii's Theorem (Third Proof)
      IV.4  Weakly Unitarily Invariant Norms
      IV.5  Problems
      IV.6  Notes and References

V     Operator Monotone and Operator Convex Functions
      V.1   Definitions and Simple Examples
      V.2   Some Characterisations
      V.3   Smoothness Properties
      V.4   Loewner's Theorems
      V.5   Problems
      V.6   Notes and References

VI    Spectral Variation of Normal Matrices
      VI.1  Continuity of Roots of Polynomials
      VI.2  Hermitian and Skew-Hermitian Matrices
      VI.3  Estimates in the Operator Norm
      VI.4  Estimates in the Frobenius Norm
      VI.5  Geometry and Spectral Variation: the Operator Norm
      VI.6  Geometry and Spectral Variation: wui Norms
      VI.7  Some Inequalities for the Determinant
      VI.8  Problems
      VI.9  Notes and References

VII   Perturbation of Spectral Subspaces of Normal Matrices
      VII.1 Pairs of Subspaces
      VII.2 The Equation AX - XB = Y
      VII.3 Perturbation of Eigenspaces
      VII.4 A Perturbation Bound for Eigenvalues
      VII.5 Perturbation of the Polar Factors
      VII.6 Appendix: Evaluating the (Fourier) Constants
      VII.7 Problems
      VII.8 Notes and References

VIII  Spectral Variation of Nonnormal Matrices
      VIII.1 General Spectral Variation Bounds
      VIII.4 Matrices with Real Eigenvalues
      VIII.5 Eigenvalues with Symmetries
      VIII.6 Problems
      VIII.7 Notes and References

IX    A Selection of Matrix Inequalities
      IX.1  Some Basic Lemmas
      IX.2  Products of Positive Matrices
      IX.3  Inequalities for the Exponential Function
      IX.4  Arithmetic-Geometric Mean Inequalities
      IX.5  Schwarz Inequalities
      IX.6  The Lieb Concavity Theorem
      IX.7  Operator Approximation
      IX.8  Problems
      IX.9  Notes and References

X     Perturbation of Matrix Functions
      X.1   Operator Monotone Functions
      X.2   The Absolute Value
      X.3   Local Perturbation Bounds
      X.4   Appendix: Differential Calculus
      X.5   Problems
      X.6   Notes and References

References

Index
I
A Review of Linear Algebra

In this chapter we review, at a brisk pace, the basic concepts of linear and multilinear algebra. Most of the material will be familiar to a reader who has had a standard Linear Algebra course, so it is presented quickly with no proofs. Some topics, like tensor products, might be less familiar. These are treated here in somewhat greater detail. A few of the topics are quite advanced and their presentation is new.

I.1 Vector Spaces and Inner Product Spaces
Throughout this book we will consider finite-dimensional vector spaces over the field C of complex numbers. Such spaces will be denoted by symbols V, W, V_1, V_2, etc. Vectors will, most often, be represented by symbols u, v, w, x, etc., and scalars by a, b, s, t, etc. The symbol n, when not explained, will always mean the dimension of the vector space under consideration.

Most often, our vector space will be an inner product space. The inner product between the vectors u, v will be denoted by (u, v). We will adopt the convention that this is conjugate-linear in the first variable u and linear in the second variable v. We will always assume that the inner product is definite; i.e., (u, u) = 0 if and only if u = 0. A vector space with such an inner product is then a finite-dimensional Hilbert space. Spaces of this type will be denoted by symbols H, K, etc. The norm arising from the inner product will be denoted by ||u||; i.e., ||u|| = (u, u)^{1/2}.

As usual, it will sometimes be convenient to deal with the standard Hilbert space C^n. Elements of this vector space are column vectors with n coordinates. In this case, the inner product (u, v) is the matrix product u*v obtained by multiplying the column vector v on the left by the row vector u*. The symbol * denotes the conjugate transpose for matrices of any size. The notation u*v for the inner product is sometimes convenient even when the Hilbert space is not C^n.

The distinction between column vectors and row vectors is important in manipulations involving products. For example, if we write elements of C^n as column vectors, then u*v is a number, but uv* is an n × n matrix (sometimes called the "outer product" of u and v). However, it is typographically inconvenient to write column vectors. So, when the context does not demand this distinction, we may write a vector x with scalar coordinates x_1, ..., x_n simply as (x_1, ..., x_n). This will often be done in later chapters. For the present, however, we will maintain the distinction between row and column vectors.

Occasionally our Hilbert spaces will be real, but we will use the same notation for them as for the complex ones. Many of our results will be true for infinite-dimensional Hilbert spaces, with appropriate modifications at times. We will mention this only in passing.

Let X = (x_1, ..., x_k) be a k-tuple of vectors. If these are column vectors, then X is an n × k matrix. This notation suggests matrix manipulations with X that are helpful even in the general case. For example, let X = (x_1, ..., x_k) be a linearly independent k-tuple. We say that a k-tuple Y = (y_1, ..., y_k) is biorthogonal to X if (y_i, x_j) = δ_ij. This condition is expressed in matrix terms as Y*X = I_k, the k × k identity matrix.
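The biorthogonality condition Y*X = I_k can be checked numerically. The sketch below is my own illustration, not from the text: for k = n and invertible X, the condition Y*X = I forces Y = (X^{-1})*. The example uses a real 2×2 basis, so the conjugate transpose is just the transpose.

```python
# Hypothetical check: Y = (X^{-1})* is biorthogonal to the basis X.
X = [[1.0, 1.0],
     [0.0, 1.0]]          # columns x_1 = (1, 0), x_2 = (1, 1)

# Inverse of the 2x2 matrix X.
det = X[0][0]*X[1][1] - X[0][1]*X[1][0]
Xinv = [[ X[1][1]/det, -X[0][1]/det],
        [-X[1][0]/det,  X[0][0]/det]]

# Y = (X^{-1})^T, so that Y^T X = X^{-1} X = I.
Y = [[Xinv[j][i] for j in range(2)] for i in range(2)]

# Biorthogonality: (y_i, x_j) = delta_ij for columns y_i of Y, x_j of X.
for i in range(2):
    for j in range(2):
        ip = sum(Y[r][i] * X[r][j] for r in range(2))
        assert abs(ip - (1.0 if i == j else 0.0)) < 1e-12
print("Y is biorthogonal to X")
```

Here y_i is the i-th row of X^{-1} written as a column, and (y_i, x_j) is exactly the (i, j) entry of X^{-1}X.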
Exercise I.1.1 Given any k-tuple of linearly independent vectors X as above, there exists a k-tuple Y biorthogonal to it. If k = n, this Y is unique.

The Gram-Schmidt procedure, in this notation, can be interpreted as a matrix factoring theorem. Given an n-tuple X = (x_1, ..., x_n) of linearly independent vectors, the procedure gives another n-tuple Q = (q_1, ..., q_n) whose entries are orthonormal vectors. For each k = 1, 2, ..., n, the vectors {x_1, ..., x_k} and {q_1, ..., q_k} have the same linear span. In matrix notation this can be expressed as an equation, X = QR, where R is an upper triangular matrix. The matrix R may be chosen so that all its diagonal entries are positive. With this restriction the factors Q and R are both unique.

If the vectors x_j are not linearly independent, this procedure can be modified. If the vector x_k is linearly dependent on x_1, ..., x_{k-1}, set q_k = 0; otherwise proceed as in the Gram-Schmidt process. If the kth column of the matrix Q so constructed is zero, put the kth row of R to be zero. Now we have a factorisation X = QR, where R is upper triangular and Q has orthogonal columns, some of which are zero. Take the nonzero columns of Q and extend this set to an orthonormal basis. Then, replace the zero columns of Q by these additional basis vectors. The new matrix Q now has orthonormal columns, and we still have X = QR, because the new columns of Q are matched with zero rows of R. This is called the QR decomposition.

Similarly, a change of orthogonal bases can be conveniently expressed in these notations as follows. Let X = (x_1, ..., x_k) be any k-tuple of vectors and E = (e_1, ..., e_n) any orthonormal basis. Then, the columns of the matrix E*X are the representations of the vectors comprising X, relative to the basis E. When k = n and X is an orthonormal basis, then E*X is a unitary matrix. Furthermore, this is the matrix by which we pass between coordinates of any vector relative to the basis E and those relative to the basis X. Indeed, if
u = Ea = Xb,

then we have

    a_j = e_j* u,   b_j = x_j* u;   i.e.,   a = E*u,   b = X*u.

Hence, a = E*Xb and b = X*Ea.
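The Gram-Schmidt construction of the QR factorisation described above can be sketched numerically. The helper below is my own illustration (plain classical Gram-Schmidt for linearly independent columns, with the positive-diagonal normalisation), not code from the text.

```python
import math

def qr_gram_schmidt(cols):
    """cols: list of linearly independent column vectors (lists of floats).
    Returns (q_cols, R) with orthonormal q_cols and upper triangular R,
    R having positive diagonal, so that X = QR column by column."""
    n, k = len(cols[0]), len(cols)
    q_cols, R = [], [[0.0] * k for _ in range(k)]
    for j in range(k):
        v = list(cols[j])
        for i in range(j):
            R[i][j] = sum(q_cols[i][r] * cols[j][r] for r in range(n))  # (q_i, x_j)
            v = [v[r] - R[i][j] * q_cols[i][r] for r in range(n)]
        R[j][j] = math.sqrt(sum(t * t for t in v))   # positive for independent cols
        q_cols.append([t / R[j][j] for t in v])
    return q_cols, R

# x_1 = (1, 1, 0), x_2 = (1, 0, 1): two independent columns in R^3.
q_cols, R = qr_gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
print(R)   # upper triangular with positive diagonal
```

Each q_j is the j-th column orthogonalised against the earlier q_i and normalised; the inner products (q_i, x_j) land in R exactly as in the matrix equation X = QR.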
Exercise I.1.2 Let X be any basis of H and let Y be the basis biorthogonal to it. Using matrix multiplication, X gives a linear transformation from C^n to H. The inverse of this is given by Y*. In the special case when X is orthonormal (so that Y = X), this transformation is inner-product-preserving if the standard inner product is used on C^n.

Exercise I.1.3 Use the QR decomposition to prove Hadamard's inequality: if X = (x_1, ..., x_n), then

    |det X| ≤ ∏_{j=1}^{n} ||x_j||.

Equality holds here if and only if either the x_j are mutually orthogonal or some x_j is zero.
I.2 Linear Operators and Matrices

Let L(V, W) be the space of all linear operators from a vector space V to a vector space W. If bases for V, W are fixed, each such operator has a unique matrix associated with it. As usual, we will talk of operators and matrices interchangeably.

For operators between Hilbert spaces, the matrix representations are especially nice if the bases chosen are orthonormal. Let A ∈ L(H, K), and let E = (e_1, ..., e_n) be an orthonormal basis of H and F = (f_1, ..., f_m) an orthonormal basis of K. Then, the (i, j)-entry of the matrix of A relative to these bases is a_ij = f_i* A e_j = (f_i, A e_j). This suggests that we may say that the matrix of A relative to these bases is F*AE.

In this notation, composition of linear operators can be identified with matrix multiplication as follows. Let M be a third Hilbert space with orthonormal basis G = (g_1, ..., g_p). Let B ∈ L(K, M). Then
    matrix of (B · A) = G*(B · A)E = G*B FF* AE = (G*BF)(F*AE) = (matrix of B)(matrix of A).
The second step in the above chain is justified by Exercise I.1.2.

The adjoint of an operator A ∈ L(H, K) is the unique operator A* in L(K, H) that satisfies the relation

    (z, Ax)_K = (A*z, x)_H   for all x ∈ H and z ∈ K.
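With the convention (u, v) = u*v, the defining relation of the adjoint can be verified directly for a concrete matrix. The complex 2×2 example below is my own.

```python
# Check (z, Ax) = (A*z, x) with (u, v) = u* v (conjugate-linear in the first slot).
def inner(u, v):
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

A = [[1 + 2j, 3j],
     [2,      1 - 1j]]
# A* is the conjugate transpose of A.
A_star = [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

x, z = [1j, 2], [3, 1 - 1j]
lhs = inner(z, matvec(A, x))
rhs = inner(matvec(A_star, z), x)
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)
```

The same computation, with the roles of matrix and conjugate transpose swapped, confirms Exercise I.2.1 below: for fixed orthonormal bases, the matrix of A* is the conjugate transpose of the matrix of A.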
Exercise I.2.1 For fixed bases in H and K, the matrix of A* is the conjugate transpose of the matrix of A.

For the space L(H, H) we use the more compact notation L(H). In the rest of this section, and elsewhere in the book, if no qualification is made, an operator would mean an element of L(H).

An operator A is called self-adjoint or Hermitian if A = A*, skew-Hermitian if A = -A*, unitary if AA* = I = A*A, and normal if AA* = A*A.

A Hermitian operator A is said to be positive or positive semidefinite if (x, Ax) ≥ 0 for all x ∈ H. The notation A ≥ 0 will be used to express the fact that A is a positive operator. If (x, Ax) > 0 for all nonzero x, we will say A is positive definite, or strictly positive. We will then write A > 0. A positive operator is strictly positive if and only if it is invertible. If A and B are Hermitian, then we say A ≥ B if A - B ≥ 0.

Given any operator A we can find an orthonormal basis y_1, ..., y_n such that for each k = 1, 2, ..., n, the vector Ay_k is a linear combination of y_1, ..., y_k. This can be proved by induction on the dimension n of H. Let λ_1 be any eigenvalue of A, y_1 an eigenvector corresponding to λ_1, and M the 1-dimensional subspace spanned by it. Let N be the orthogonal complement of M, and let P_N denote the orthogonal projection on N. For y ∈ N, let A_N y = P_N Ay. Then, A_N is a linear operator on the (n - 1)-dimensional space N. So, by the induction hypothesis, there exists an orthonormal basis y_2, ..., y_n of N such that for k = 2, ..., n the vector A_N y_k is a linear combination of y_2, ..., y_k. Now y_1, ..., y_n is an orthonormal basis for H, and each Ay_k is a linear combination of y_1, ..., y_k for k = 1, 2, ..., n. Thus, the matrix of A with respect to this basis is upper triangular. In other words,
every matrix A is unitarily equivalent (or unitarily similar) to an upper triangular matrix T; i.e., A = QTQ*, where Q is unitary and T is upper triangular. This triangular matrix is called a Schur Triangular Form for A. An orthonormal basis with respect to which A is upper triangular is called a Schur basis for A.

If A is normal, then T is diagonal, and we have Q*AQ = D, where D is a diagonal matrix whose diagonal entries are the eigenvalues of A. This is the Spectral Theorem for normal matrices.

The Spectral Theorem makes it easy to define functions of normal matrices. If f is any complex function, and if D is a diagonal matrix with λ_1, ..., λ_n on its diagonal, then f(D) is the diagonal matrix with f(λ_1), ..., f(λ_n) on its diagonal. If A = QDQ*, then f(A) = Qf(D)Q*. A special consequence, used very often, is the fact that every positive operator A has a unique positive square root. This square root will be written as A^{1/2}.
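The recipe f(A) = Qf(D)Q* can be traced in a small computation. The sketch below is my own 2×2 construction (not from the text): a real symmetric positive matrix is diagonalised by a closed-form rotation, f(t) = √t is applied to the eigenvalues, and we verify that the result squares back to A.

```python
import math

A = [[2.0, 1.0],
     [1.0, 2.0]]   # symmetric, positive (eigenvalues 3 and 1)

# Diagonalise the symmetric 2x2 matrix by a rotation Q through angle theta.
theta = 0.5 * math.atan2(2.0 * A[0][1], A[0][0] - A[1][1])
c, s = math.cos(theta), math.sin(theta)
Q = [[c, -s], [s, c]]
# Eigenvalues: the diagonal of Q^T A Q.
lam1 = c*c*A[0][0] + 2*c*s*A[0][1] + s*s*A[1][1]
lam2 = s*s*A[0][0] - 2*c*s*A[0][1] + c*c*A[1][1]

# f(A) = Q diag(sqrt(lam1), sqrt(lam2)) Q^T  -- the positive square root A^(1/2).
f1, f2 = math.sqrt(lam1), math.sqrt(lam2)
root = [[f1*Q[i][0]*Q[j][0] + f2*Q[i][1]*Q[j][1] for j in range(2)] for i in range(2)]

# Verify (A^(1/2))^2 = A.
sq = [[sum(root[i][k]*root[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(sq)
```

Because the square root of each eigenvalue is taken with the positive sign, the result is itself a positive matrix, illustrating the uniqueness statement for A^{1/2}.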
Exercise I.2.2 Show that the following statements are equivalent:
(i) A is positive.
(ii) A = B*B for some B.
(iii) A = T*T for some upper triangular T.
(iv) A = T*T for some upper triangular T with nonnegative diagonal entries.

If A is positive definite, then the factorisation in (iv) is unique. This is called the Cholesky Decomposition of A.
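The factorisation in (iv) can be computed by the standard elimination recurrence. The sketch below is my own illustration for a real positive definite matrix (so T* is simply the transpose); it builds the upper triangular T row by row.

```python
import math

def cholesky_upper(A):
    """Return upper triangular T with T^T T = A and positive diagonal,
    for a real positive definite matrix A."""
    n = len(A)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = sum(T[k][i] * T[k][j] for k in range(i))
            if i == j:
                T[i][i] = math.sqrt(A[i][i] - s)   # > 0 for positive definite A
            else:
                T[i][j] = (A[i][j] - s) / T[i][i]
    return T

A = [[4.0, 2.0],
     [2.0, 3.0]]   # positive definite (my own example)
T = cholesky_upper(A)
print(T)   # [[2.0, 1.0], [0.0, sqrt(2)]]
```

Uniqueness is visible in the recurrence: each entry of T is forced once the earlier rows are known, provided the diagonal is taken positive.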
Exercise I.2.3 (i) Let {A_α} be a family of mutually commuting operators. Then, there is a common Schur basis for {A_α}. In other words, there exists a unitary Q such that Q*A_αQ is upper triangular for all α.
(ii) Let {A_α} be a family of mutually commuting normal operators. Then, there exists a unitary Q such that Q*A_αQ is diagonal for all α.
·
·
·
Q* (A* A)Q =
( �i �).
I.
6
A Review of
Note that
Linear Algebra
Qi ( *A )A Ql
S�,
==
; Q (*A
A)Q2
==
0
.
The second of these relations implies that Q2 A 0. From the first one we can conclude that if WI AQlS+ 1' then WiWl Ik · Choose w2 so that W (WI, 2 W ) is unitary. Then, we have ==
==
==
==
W *AQ
==
W AQl w � i ( w; AQl w:
A2 Q 2 AQ 2
) ( ==
s0+
0o
).
This is the Singular Value Decomposition: for every matrix A there exist unitaries W and Q such that

\[ W^*AQ = S, \]

where S is the diagonal matrix whose diagonal entries are the singular values of A. Note that in the above representation the columns of Q are eigenvectors of A*A and the columns of W are eigenvectors of AA* corresponding to the eigenvalues s_j^2(A), 1 ≤ j ≤ n. These eigenvectors are called the right and left singular vectors of A, respectively.

Exercise I.2.4 (i) The Singular Value Decomposition leads to the Polar Decomposition: every operator A can be written as A = UP, where U is unitary and P is positive. In this decomposition the positive part P is unique, P = |A|. The unitary part U is unique if A is invertible.
(ii) An operator A is normal if and only if the factors U and P in the polar decomposition of A commute.
(iii) We have derived the Polar Decomposition from the Singular Value Decomposition. Show that it is possible to derive the latter from the former.
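The passage from the Singular Value Decomposition to the Polar Decomposition can be sketched numerically. Writing A = WSQ* gives U = WQ* and P = QSQ* (a NumPy illustration under that assumption, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# SVD: A = W S Q*. Then U = W Q* is unitary and P = Q S Q* is positive,
# giving the polar decomposition A = U P with P = |A|.
W, s, Qh = np.linalg.svd(A)
U = W @ Qh
P = Qh.conj().T @ np.diag(s) @ Qh

assert np.allclose(U @ P, A)                      # A = U P
assert np.allclose(U.conj().T @ U, np.eye(3))     # U is unitary
assert np.all(np.linalg.eigvalsh(P) >= -1e-12)    # P is positive
```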
Every operator A can be decomposed as a sum

\[ A = \mathrm{Re}\,A + i\,\mathrm{Im}\,A, \]

where Re A = (A + A*)/2 and Im A = (A − A*)/2i. This is called the Cartesian Decomposition of A into its "real" and "imaginary" parts. The operators Re A and Im A are both Hermitian.

The norm of an operator A is defined as
\[ \|A\| = \sup_{\|x\|=1} \|Ax\|. \]

We also have

\[ \|A\| = \sup_{\|x\|=\|y\|=1} |(y, Ax)|. \]

When A is Hermitian we have

\[ \|A\| = \sup_{\|x\|=1} |(x, Ax)|. \]
When A is normal we have

\[ \|A\| = \max\{|\lambda_j| : \lambda_j \text{ is an eigenvalue of } A\}. \]

An operator A is said to be a contraction if ‖A‖ ≤ 1. We also use the adjective contractive for such an operator. A positive operator A is contractive if and only if A ≤ I.

To distinguish it from other norms that we consider later, the norm ‖A‖ will be called the operator norm or the bound norm of A. Another useful norm is the norm
\[ \|A\|_2 = \Big( \sum_{j=1}^n s_j^2(A) \Big)^{1/2} = (\mathrm{tr}\, A^*A)^{1/2}, \]

where tr stands for the trace of an operator. If a_{ij} are the entries of a matrix representation of A relative to an orthonormal basis of H, then

\[ \|A\|_2 = \Big( \sum_{i,j} |a_{ij}|^2 \Big)^{1/2}. \]

This makes this norm useful in calculations with matrices. It is called the Frobenius norm or the Schatten 2-norm or the Hilbert–Schmidt norm. Both ‖A‖ and ‖A‖_2 have an important invariance property called unitary invariance: we have ‖A‖ = ‖UAV‖ and ‖A‖_2 = ‖UAV‖_2 for all unitary U, V.

Any two norms on a finite-dimensional space are equivalent. For the norms ‖A‖ and ‖A‖_2 it follows from the properties listed above that
\[ \|A\| \le \|A\|_2 \le n^{1/2}\,\|A\| \quad \text{for every } A. \]
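The two characterisations of ‖A‖_2 and the comparison with the operator norm can be checked directly (a NumPy sketch, which is an assumption of this illustration and not part of the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))

op = np.linalg.norm(A, 2)        # operator norm = largest singular value
fro = np.linalg.norm(A, 'fro')   # Frobenius norm = (sum |a_ij|^2)^(1/2)

# ||A||_2 agrees with (tr A*A)^(1/2) and with the singular value formula
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(fro, np.sqrt(np.trace(A.conj().T @ A)))
assert np.isclose(fro, np.sqrt(np.sum(s**2)))

# ||A|| <= ||A||_2 <= n^(1/2) ||A||
assert op <= fro + 1e-12 and fro <= np.sqrt(n) * op + 1e-12
```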
Exercise I.2.5 Show that matrices with distinct eigenvalues are dense in the space of all n × n matrices. (Use the Schur Triangularisation.)

Exercise I.2.6 If ‖A‖ < 1, then I − A is invertible, and

\[ (I - A)^{-1} = I + A + A^2 + \cdots, \]

a convergent power series. This is called the Neumann Series.
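The convergence of the Neumann Series is easy to watch numerically (a NumPy sketch under that assumption, not part of the text): the partial sums of I + A + A² + ··· approach (I − A)^{-1} when ‖A‖ < 1.

```python
import numpy as np

A = np.array([[0.2, 0.3], [0.1, 0.4]])
assert np.linalg.norm(A, 2) < 1   # ||A|| < 1, so the series converges

# Partial sums I + A + A^2 + ... approach (I - A)^{-1}
S, term = np.eye(2), np.eye(2)
for _ in range(100):
    term = term @ A
    S += term

assert np.allclose(S, np.linalg.inv(np.eye(2) - A))
```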
Exercise I.2.7 The set of all invertible matrices is a dense open subset of the set of all n × n matrices. The set of all unitary matrices is a compact subset of the set of all n × n matrices. These two sets are also groups under multiplication. They are called the general linear group GL(n) and the unitary group U(n), respectively.
Exercise I.2.8 For any matrix A the series

\[ \exp A = \sum_{k=0}^{\infty} \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \cdots \]

converges. This is called the exponential of A. The matrix exp A is always invertible and (exp A)^{-1} = exp(−A). Conversely, every invertible matrix can be expressed as the exponential of some matrix. Every unitary matrix can be expressed as the exponential of a skew-Hermitian matrix.
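The assertions of the exercise can be tested with the partial sums of the series (a NumPy sketch, assumed here and not part of the text): for a skew-Hermitian A, the computed exp A satisfies (exp A)^{-1} = exp(−A) and is unitary.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-Hermitian

def expm_series(M, terms=60):
    """Partial sum of the series sum_k M^k / k!."""
    S, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        S += term
    return S

E = expm_series(A)
# (exp A)^{-1} = exp(-A), and exp of a skew-Hermitian matrix is unitary
assert np.allclose(E @ expm_series(-A), np.eye(2))
assert np.allclose(E.conj().T @ E, np.eye(2))
```

(SciPy users would reach for scipy.linalg.expm instead; the truncated series keeps the sketch self-contained.)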
The numerical range or the field of values of an operator A is the subset W(A) of the complex plane defined as

\[ W(A) = \{(x, Ax) : \|x\| = 1\}. \]

Note that

W(UAU*) = W(A) for all U ∈ U(n),
W(aA + bI) = aW(A) + b for all a, b ∈ C.
It is clear that if λ is an eigenvalue of A, then λ is in W(A). It is also clear that W(A) is a closed set. An important property of W(A) is that it is a convex set. This is called the Toeplitz–Hausdorff Theorem; an outline of its proof is given in Problem I.6.2.

Exercise I.2.9 (i) When A is normal, the set W(A) is the convex hull of the eigenvalues of A. For nonnormal matrices, W(A) may be bigger than the convex hull of its eigenvalues. For Hermitian operators, the first statement says that W(A) is the closed interval whose endpoints are the smallest and the largest eigenvalues of A.
(ii) If a unit vector x belongs to the linear span of the eigenspaces corresponding to eigenvalues λ_1, ..., λ_k of a normal operator A, then (x, Ax) lies in the convex hull of λ_1, ..., λ_k. (This fact will be used frequently in Chapter III.)

The number w(A) defined as

\[ w(A) = \sup_{\|x\|=1} |(x, Ax)| \]

is called the numerical radius of A.

Exercise I.2.10 (i) The numerical radius defines a norm on L(H).
(ii) w(UAU*) = w(A) for all U ∈ U(n).
(iii) w(A) ≤ ‖A‖ ≤ 2w(A) for all A.
(iv) w(A) = ‖A‖ if (but not only if) A is normal.
The spectral radius of an operator A is defined as

\[ \mathrm{spr}(A) = \max\{|\lambda| : \lambda \text{ is an eigenvalue of } A\}. \]

We have noted that spr(A) ≤ w(A) ≤ ‖A‖, and that the three are equal if (but not only if) the operator A is normal.
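The chain spr(A) ≤ w(A) ≤ ‖A‖ ≤ 2w(A), and its strictness for nonnormal operators, can be probed numerically. The sketch below (NumPy assumed, not part of the text) estimates w(A) by sampling unit vectors; for the nilpotent matrix used here spr = 0, w = 1/2 and ‖A‖ = 1.

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # a nonnormal nilpotent matrix

spr = max(abs(np.linalg.eigvals(A)))     # spectral radius (= 0 here)
opn = np.linalg.norm(A, 2)               # operator norm (= 1 here)

# crude Monte Carlo estimate of the numerical radius w(A)
rng = np.random.default_rng(2)
w = 0.0
for _ in range(2000):
    x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    x /= np.linalg.norm(x)
    w = max(w, abs(x.conj() @ A @ x))

# spr(A) <= w(A) <= ||A|| <= 2 w(A)  (small slack for the sampling error)
assert spr <= w + 1e-9 and w <= opn + 1e-9 and opn <= 2 * w + 0.05
```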
I.3 Direct Sums
If U, V are vector spaces, their direct sum is the space of columns (u, v)ᵀ with u ∈ U and v ∈ V. This is a vector space with vector operations naturally defined coordinatewise. If H, K are Hilbert spaces, their direct sum is a Hilbert space with inner product defined as

\[ \left( \begin{pmatrix} h \\ k \end{pmatrix}, \begin{pmatrix} h' \\ k' \end{pmatrix} \right) = (h, h')_{\mathcal H} + (k, k')_{\mathcal K}. \]

We will always denote this direct sum as H ⊕ K.

If M and N are orthogonally complementary subspaces of H, then the fact that every vector x in H has a unique representation x = u + v with u ∈ M and v ∈ N implies that H is isomorphic to M ⊕ N. This isomorphism is given by a natural, fixed map. So we say that H = M ⊕ N. When a distinction is necessary we call this an internal direct sum. If M, N are subspaces of H complementary in the algebraic but not in the orthogonal sense, i.e., if M and N are disjoint and their linear span is H, then every vector x in H has a unique decomposition x = u + v as before, but not with orthogonal summands. In this case we write H = M + N and say H is the algebraic direct sum of M and N.

If H = M ⊕ N is an internal direct sum, we may define the injection of M into H as the operator I_M ∈ L(M, H) such that I_M(u) = u for all u ∈ M. Then I_M^* is an element of L(H, M) defined as I_M^* x = Px for all x ∈ H, where P is the orthoprojector onto M. Here one should note that I_M^* is not the same as P because they map into different spaces; that is why their adjoints can be different. Similarly define I_N. Then (I_M, I_N) is an isometry from the ordinary ("external") direct sum M ⊕ N onto H. If H = M ⊕ N and A ∈ L(H), then using this isomorphism we can write A as a block-matrix

\[ A = \begin{pmatrix} B & C \\ D & E \end{pmatrix}, \]

where B ∈ L(M), C ∈ L(N, M), etc. Here, for example, C = I_M^* A I_N. The usual rules of matrix operations hold for block matrices. Adjoints are obtained by taking "conjugate transposes" formally.
If the subspace M is invariant under A, i.e., Ax ∈ M whenever x ∈ M, then in the above block-matrix representation of A we must have D = 0. Indeed, this condition is equivalent to M being invariant. If both M and its orthogonal complement N are invariant under A, we say that M reduces A. In this case, both C and D are 0. We then say that the operator A is the direct sum of B and E and write A = B ⊕ E.
Exercise I.3.1 Let A = A_1 ⊕ A_2. Show that
(i) W(A) is the convex hull of W(A_1) and W(A_2), i.e., the smallest convex set containing W(A_1) ∪ W(A_2).
(ii) ‖A‖ = max(‖A_1‖, ‖A_2‖), spr(A) = max(spr(A_1), spr(A_2)), w(A) = max(w(A_1), w(A_2)).
Direct sums in which each summand H_j is the same space H arise often in practice. Very often, some properties of an operator A on H are reflected in those of some other operators on H ⊕ H. This is illustrated in the following propositions.

Lemma I.3.2 Let A ∈ L(H). Then, the operators

\[ \begin{pmatrix} A & A \\ A & A \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 2A & 0 \\ 0 & 0 \end{pmatrix} \]

are unitarily equivalent in L(H ⊕ H).

Proof. The equivalence is implemented by the unitary operator

\[ U = \frac{1}{\sqrt{2}} \begin{pmatrix} I & I \\ I & -I \end{pmatrix}. \]

Corollary I.3.3 An operator A on H is positive if and only if the operator \(\begin{pmatrix} A & A \\ A & A \end{pmatrix}\) on H ⊕ H is positive.

This can also be seen by writing

\[ \begin{pmatrix} A & A \\ A & A \end{pmatrix} = \begin{pmatrix} A^{1/2} & 0 \\ A^{1/2} & 0 \end{pmatrix} \begin{pmatrix} A^{1/2} & A^{1/2} \\ 0 & 0 \end{pmatrix} \]

and using Exercise I.2.2.

Corollary I.3.4 For every A ∈ L(H) the operator

\[ \begin{pmatrix} |A| & A^* \\ A & |A^*| \end{pmatrix} \]

is positive.

Proof. Let A = UP be the polar decomposition of A. Then,

\[ \begin{pmatrix} |A| & A^* \\ A & |A^*| \end{pmatrix} = \begin{pmatrix} P & PU^* \\ UP & UPU^* \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & U \end{pmatrix} \begin{pmatrix} P & P \\ P & P \end{pmatrix} \begin{pmatrix} I & 0 \\ 0 & U^* \end{pmatrix}. \]

Note that \(\begin{pmatrix} I & 0 \\ 0 & U \end{pmatrix}\) is a unitary operator on H ⊕ H.

Proposition I.3.5 An operator A on H is contractive if and only if the operator

\[ \begin{pmatrix} I & A \\ A^* & I \end{pmatrix} \]

on H ⊕ H is positive.
Proof. If A has the singular value decomposition A = USV*, then

\[ \begin{pmatrix} I & A \\ A^* & I \end{pmatrix} = \begin{pmatrix} U & 0 \\ 0 & V \end{pmatrix} \begin{pmatrix} I & S \\ S & I \end{pmatrix} \begin{pmatrix} U^* & 0 \\ 0 & V^* \end{pmatrix}. \]

Hence \(\begin{pmatrix} I & A \\ A^* & I \end{pmatrix}\) is positive if and only if \(\begin{pmatrix} I & S \\ S & I \end{pmatrix}\) is positive. Also, ‖A‖ = ‖S‖. So we may assume, without loss of generality, that A = S. Now let W be the unitary operator on H ⊕ H that sends the orthonormal basis {e_1, e_2, ..., e_{2n}} to the basis {e_1, e_{n+1}, e_2, e_{n+2}, ..., e_n, e_{2n}}. Then, the unitary conjugation by W transforms the matrix \(\begin{pmatrix} I & S \\ S & I \end{pmatrix}\) to a direct sum of n two-by-two matrices

\[ \begin{pmatrix} 1 & s_j \\ s_j & 1 \end{pmatrix}, \quad 1 \le j \le n. \]

This is positive if and only if each of the summands is positive, which happens if and only if s_j ≤ 1 for all j, i.e., S is a contraction.
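Proposition I.3.5 gives a practical test for contractivity via the positivity of a block matrix. A NumPy sketch (assumed, not part of the text):

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """Check positivity of a Hermitian matrix via its smallest eigenvalue."""
    return np.linalg.eigvalsh(M).min() >= -tol

def block(A):
    """The operator (I  A; A*  I) on H + H."""
    n = A.shape[0]
    I = np.eye(n)
    return np.block([[I, A], [A.conj().T, I]])

contraction = np.array([[0.3, 0.4], [0.0, 0.5]])   # ||A|| <= 1
expansion   = np.array([[2.0, 0.0], [0.0, 0.1]])   # ||A|| > 1

assert np.linalg.norm(contraction, 2) <= 1 and is_psd(block(contraction))
assert np.linalg.norm(expansion, 2) > 1 and not is_psd(block(expansion))
```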
Exercise I.3.6 If A is a contraction, show that

\[ A^*(I - AA^*)^{1/2} = (I - A^*A)^{1/2} A^*. \]

Use this to show that if A is a contraction on H, then the operators

\[ U = \begin{pmatrix} A & (I - AA^*)^{1/2} \\ (I - A^*A)^{1/2} & -A^* \end{pmatrix}, \qquad V = \begin{pmatrix} A & -(I - AA^*)^{1/2} \\ (I - A^*A)^{1/2} & A^* \end{pmatrix} \]

are unitary operators on H ⊕ H.
Exercise I.3.7 For every matrix A, the matrix \(\begin{pmatrix} I & A \\ 0 & I \end{pmatrix}\) is invertible and its inverse is \(\begin{pmatrix} I & -A \\ 0 & I \end{pmatrix}\). Use this to show that if A, B are any two n × n matrices, then

\[ \begin{pmatrix} I & -A \\ 0 & I \end{pmatrix} \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}. \]

This implies that AB and BA have the same eigenvalues. (This last fact can be proved in another way as follows. If B is invertible, then AB = B^{-1}(BA)B. So, AB and BA have the same eigenvalues. Since invertible matrices are dense in the space of all matrices, and a general known fact in complex analysis is that the roots of a polynomial vary continuously with the coefficients, the above conclusion also holds in general.)

Direct sums with more than two summands are defined in the same way. We will denote the direct sum of spaces H_1, ..., H_k as ⊕_{j=1}^k H_j, or simply as ⊕_j H_j.
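The fact that AB and BA have the same eigenvalues is easy to observe numerically (a NumPy sketch, assumed here and not part of the text):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

ev_AB = np.sort_complex(np.linalg.eigvals(A @ B))
ev_BA = np.sort_complex(np.linalg.eigvals(B @ A))

# AB and BA have the same eigenvalues
assert np.allclose(ev_AB, ev_BA)
```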
I.4 Tensor Products
Let V_j, 1 ≤ j ≤ k, be vector spaces. A map F from the Cartesian product V_1 × ··· × V_k to another vector space W is called multilinear if it depends linearly on each of the arguments. When W = C, such maps are called multilinear functionals. When k = 2, the word multilinear is replaced by bilinear. Bilinear maps, thus, are maps F : V_1 × V_2 → W that satisfy the conditions

\[ F(u, av_1 + bv_2) = aF(u, v_1) + bF(u, v_2), \]
\[ F(au_1 + bu_2, v) = aF(u_1, v) + bF(u_2, v), \]

for all a, b ∈ C; u, u_1, u_2 ∈ V_1 and v, v_1, v_2 ∈ V_2. We will be looking most often at the special situation when each V_j is the same vector space. As a special example, consider a Hilbert space H and fix two vectors x, y in it. Then,

\[ F(u, v) = (x, u)(y, v) \]

is a bilinear functional on H. We see from this example that it is equally natural to consider conjugate-multilinear functionals as well. Even more generally, we could study functions that are linear in some variables and conjugate-linear in others. As an example, let A ∈ L(H, K), and for u ∈ K and v ∈ H, let F(u, v) = (u, Av)_K. Then F depends linearly on v and conjugate-linearly on u. Such functionals are called sesquilinear; an inner product is a functional of this sort. The example given above is the "most general" example of a sesquilinear functional: if F(u, v) is any sesquilinear functional on K × H, then there exists a unique operator A ∈ L(H, K) such that F(u, v) = (u, Av). In this sense our first example is not the most general example of a bilinear functional. Bilinear functionals F(u, v) on H that can be expressed as F(u, v) = (x, u)(y, v) for some fixed x, y ∈ H are called elementary. They are special, as the following exercise will show.
Exercise I.4.1 Let x, y, z be linearly independent vectors in H. Find a necessary and sufficient condition that a vector w must satisfy in order that the bilinear functional

\[ F(u, v) = (x, u)(y, v) + (z, u)(w, v) \]

be elementary.

The set of all bilinear functionals is a vector space. The result of this exercise shows that the subset consisting of elementary functionals is not closed under addition. We will soon see that a convenient basis for this vector space can be constructed with elementary functionals as its members. The procedure, called the tensor product construction, starts by taking formal linear combinations of symbols x ⊗ y with x ∈ H, y ∈ K; then
reducing this space modulo suitable equivalence relations; then identifying the resulting space with the space of bilinear functionals.

More precisely, consider all finite sums of the type Σ c_i (x_i ⊗ y_i), c_i ∈ C, x_i ∈ H, y_i ∈ K, and manipulate them formally as linear combinations. In this space the expressions

\[ a(x \otimes y) - (ax \otimes y), \qquad a(x \otimes y) - (x \otimes ay), \]
\[ (x_1 + x_2) \otimes y - (x_1 \otimes y + x_2 \otimes y), \qquad x \otimes (y_1 + y_2) - (x \otimes y_1 + x \otimes y_2) \]

are next defined to be equivalent to 0, for all a ∈ C; x, x_1, x_2 ∈ H and y, y_1, y_2 ∈ K. The set of all linear combinations of expressions x ⊗ y for x ∈ H, y ∈ K, after reduction modulo these equivalences, is called the tensor product of H and K and is denoted as H ⊗ K. Each term c(x ⊗ y) determines a conjugate-bilinear functional F*(u, v) on H × K by the natural rule

\[ F^*(u, v) = c\,(u, x)(v, y). \]

This can be extended to sums of such terms, and the equivalences were chosen in such a way that equivalent expressions (i.e., expressions giving the same element of H ⊗ K) give the same functional. The complex conjugate of each such functional gives a bilinear functional. These ideas can be extended directly to k-linear functionals, including those that are linear in some of the arguments and conjugate-linear in others.
Theorem I.4.2 The space of all bilinear functionals on H is linearly spanned by the elementary ones. If (e_1, ..., e_n) is a fixed orthonormal basis of H, then to every bilinear functional F there correspond unique vectors x_1, ..., x_n such that

\[ F^* = \sum_j e_j \otimes x_j. \]

Every sequence x_j, 1 ≤ j ≤ n, leads to a bilinear functional in this way.

Proof. Let F be a bilinear functional on H. For each j, F*(e_j, v) is a conjugate-linear function of v. Hence there exists a unique vector x_j such that F*(e_j, v) = (v, x_j) for all v. Now, if u = Σ a_j e_j is any vector in H, then F(u, v) = Σ a_j F(e_j, v) = Σ (e_j, u)(x_j, v). In other words, F* = Σ e_j ⊗ x_j as asserted.
A more symmetric form of the above statement is the following:

Corollary I.4.3 If (e_1, ..., e_n) and (f_1, ..., f_n) are two fixed orthonormal bases of H, then every bilinear functional F on H has a unique representation

\[ F = \sum_{i,j} a_{ij} (e_i \otimes f_j)^*. \]

(Most often, the choice (e_1, ..., e_n) = (f_1, ..., f_n) is the convenient one for using the above representations.)

Thus, it is natural to denote the space of conjugate-bilinear functionals on H by H ⊗ H. This is an n²-dimensional vector space. The inner product on this space is defined by putting

\[ (x_1 \otimes y_1, x_2 \otimes y_2) = (x_1, x_2)(y_1, y_2) \]

and then extending this definition to all of H ⊗ H in a natural way. It is easy to verify that this definition is consistent with the equivalences used in defining the tensor product. If (e_1, ..., e_n) and (f_1, ..., f_n) are orthonormal bases in H, then e_i ⊗ f_j, 1 ≤ i, j ≤ n, form an orthonormal basis in H ⊗ H. For the purposes of computation it is useful to order this basis lexicographically: we say that e_i ⊗ f_j precedes e_k ⊗ f_l if and only if either i < k, or i = k and j < l.

Tensor products such as H ⊗ K or K* ⊗ H can be defined by imitating the above procedure. Here the space K* is the space of all conjugate-linear functionals on K. This space is called the dual space of K. There is a natural identification between K and K* via a conjugate-linear, norm-preserving bijection.
Exercise I.4.4 (i) There is a natural isomorphism between the spaces K ⊗ H* and L(H, K) in which the elementary tensor k ⊗ h* corresponds to the linear map that takes a vector u of H to (h, u)k. This linear transformation has rank one, and all rank one transformations can be obtained in this way.
(ii) Give an explicit construction of this isomorphism.
(iii) The space L(H, K) is a Hilbert space with inner product (A, B) = tr A*B. The set E_{ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ n, is an orthonormal basis for this space. Show that the map in (i) carries an orthonormal basis of K ⊗ H* onto this basis.
Corresponding facts about multilinear functionals and tensor products of several spaces are proved in the same way. We will use the notation ⊗^k H for the k-fold tensor product H ⊗ H ⊗ ··· ⊗ H.

Tensor products of linear operators are defined as follows. We first define A ⊗ B on elementary tensors by putting (A ⊗ B)(x ⊗ y) = Ax ⊗ By. We then extend this definition linearly to all linear combinations of elementary tensors, i.e., to all of H ⊗ H. This extension involves no inconsistency.
It is obvious that (A ⊗ B)(C ⊗ D) = AC ⊗ BD, that the identity on H ⊗ H is given by I ⊗ I, and that if A and B are invertible, then so is A ⊗ B and (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}. A one-line verification shows that (A ⊗ B)* = A* ⊗ B*. It follows that A ⊗ B is Hermitian if (but not only if) A and B are Hermitian; A ⊗ B is unitary if (but not only if) A and B are unitary; A ⊗ B is normal if (and only if) A and B are normal. (The trivial cases A = 0 or B = 0 must be excluded for the last assertion to be valid.)

Exercise I.4.5 Suppose it is known that M is an invariant subspace for A. What invariant subspaces for A ⊗ A can be obtained from this information alone?

For operators A, B on different spaces H and K, the tensor product can be defined in the same way as above. This gives an operator A ⊗ B on H ⊗ K. Many of the assertions made earlier for the case H = K remain true in this situation.

Exercise I.4.6 Let A and B be two matrices (not necessarily of the same size). Relative to the lexicographically ordered basis on the space of tensors, the matrix for A ⊗ B can be written in block form as follows: if A = (a_{ij}), then

\[ A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{n1}B & \cdots & a_{nn}B \end{pmatrix}. \]

Especially important are the operators A ⊗ A ⊗ ··· ⊗ A, which are k-fold tensor products of an operator A ∈ L(H). Such a product will be written more briefly as A^{⊗k} or ⊗^k A. This is an operator on the n^k-dimensional space ⊗^k H. Some of the easily proved and frequently used properties of these products are summarised below:

1. (⊗^k A)(⊗^k B) = ⊗^k (AB).

2. (⊗^k A)^{-1} = ⊗^k A^{-1} when either inverse exists.

3. (⊗^k A)* = ⊗^k A*.
4. If A is Hermitian, unitary, normal or positive, then so is ⊗^k A.

5. If a_1, ..., a_k (not necessarily distinct) are eigenvalues of A with eigenvectors u_1, ..., u_k, respectively, then a_1 ··· a_k is an eigenvalue of ⊗^k A and u_1 ⊗ ··· ⊗ u_k is an eigenvector for it.

6. If s_{i_1}, ..., s_{i_k} (not necessarily distinct) are singular values of A, then s_{i_1} ··· s_{i_k} is a singular value of ⊗^k A.

The reader should formulate and prove analogous statements for tensor products A_1 ⊗ A_2 ⊗ ··· ⊗ A_k of different operators.
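The block form of Exercise I.4.6 is exactly what numpy.kron computes, and properties 1 and 5 above can be checked on it (a NumPy sketch, assumed here and not part of the text):

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0]])
B = np.array([[2.0, 0.0], [1.0, 1.0]])

K = np.kron(A, B)   # the block matrix (a_ij B)

# multiplicativity: (A x B)(A x B) = A^2 x B^2,
# a special case of (A x B)(C x D) = AC x BD
assert np.allclose(K @ K, np.kron(A @ A, B @ B))

# eigenvalues of A x B are the products of eigenvalues of A and of B
# (here A has eigenvalues 1, 3 and B has eigenvalues 2, 1)
ev = np.sort(np.linalg.eigvals(K).real)
prods = np.sort([a * b for a in np.linalg.eigvals(A).real
                       for b in np.linalg.eigvals(B).real])
assert np.allclose(ev, prods)
```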
I.5 Symmetry Classes
In the space ⊗^k H there are two especially important subspaces (for nontrivial cases, k > 1 and n > 1).

The antisymmetric tensor product of vectors x_1, ..., x_k in H is defined as

\[ x_1 \wedge \cdots \wedge x_k = (k!)^{-1/2} \sum_{\sigma} \varepsilon_\sigma\, x_{\sigma(1)} \otimes \cdots \otimes x_{\sigma(k)}, \]

where σ runs over all permutations of the k indices and ε_σ is ±1, depending on whether σ is an even or an odd permutation. (ε_σ is called the signature of σ.) The factor (k!)^{-1/2} is chosen so that if the x_j are orthonormal, then x_1 ∧ ··· ∧ x_k is a unit vector. The antisymmetry of this product means that

\[ x_1 \wedge \cdots \wedge x_i \wedge \cdots \wedge x_j \wedge \cdots \wedge x_k = -\, x_1 \wedge \cdots \wedge x_j \wedge \cdots \wedge x_i \wedge \cdots \wedge x_k, \]

i.e., interchanging the position of any two of the factors in the product amounts to a change of sign. In particular, x_1 ∧ ··· ∧ x_k = 0 if any two of the factors are equal.

The span of all antisymmetric tensors x_1 ∧ ··· ∧ x_k in ⊗^k H is denoted by ∧^k H. This is called the kth antisymmetric tensor product (or tensor power) of H.

Given an orthonormal basis (e_1, ..., e_n) in H, there is a standard way of constructing an orthonormal basis in ∧^k H. Let Q_{k,n} denote the set of all strictly increasing k-tuples chosen from {1, 2, ..., n}; i.e., I ∈ Q_{k,n} if and only if I = (i_1, i_2, ..., i_k), where 1 ≤ i_1 < i_2 < ··· < i_k ≤ n. For such an I let e_I = e_{i_1} ∧ ··· ∧ e_{i_k}. Then, {e_I : I ∈ Q_{k,n}} gives an orthonormal basis of ∧^k H. Such I are sometimes called multiindices. It is conventional to order them lexicographically. Note that the cardinality of Q_{k,n}, and hence the dimensionality of ∧^k H, is \(\binom{n}{k}\). If, in particular, k = n, the space ∧^k H is 1-dimensional. This plays a special role later on. When k > n the space ∧^k H is {0}.
Exercise I.5.1 Show that the inner product (x_1 ∧ ··· ∧ x_k, y_1 ∧ ··· ∧ y_k) is equal to the determinant of the k × k matrix ((x_i, y_j)).

The symmetric tensor product of x_1, ..., x_k is defined as

\[ x_1 \vee \cdots \vee x_k = (k!)^{-1/2} \sum_{\sigma} x_{\sigma(1)} \otimes \cdots \otimes x_{\sigma(k)}, \]

where σ, as before, runs over all permutations of the k indices. The linear span of all these vectors comprises the subspace ∨^k H of ⊗^k H. This is called the kth symmetric tensor power of H.

Let G_{k,n} denote the set of all nondecreasing k-tuples chosen from {1, 2, ..., n}; i.e., I ∈ G_{k,n} if and only if I = (i_1, ..., i_k), where 1 ≤ i_1 ≤ ··· ≤ i_k ≤ n. If such an I consists of l distinct indices i_1, i_2, ..., i_l with multiplicities m_1, ..., m_l, respectively, put m(I) = m_1! m_2! ··· m_l!. Given
The elementary tensors x @ · · · @ x, with all factors equal, are all in the subspace v k 1t. Do they span it? Exercise 1.5.3 Let M be a pdimensional subspace of 1t and N its or thogonal complement. Choosing j vectors from M and k  j vectors from N and form ing the linear span of the antisymmetric tensor products of all suc h vectors, we get different sub spaces of 1\ k 'H; for example, one of those is 1\ kM . Determine all the subspaces thus obtained and their dimensionalk ities. Do the same for v ?t. Exercise 1.5.4 If dim 7t == 3, then dim ® 3 7t == 27, dim /\ 3 7l == 1 and dim V 3 7t = 1 0. In terms of an orthonormal basis of 1l, write an element of (A 3 7t EB v3 7t ) j_ . The permanent of a matrix A == ( aij ) is defined as per A == La lu ( l ) . . . anu (n) . Exercise 1.5 . 2
0"
where a varies over all permutations on n symbols. Note t h at in contrast to the determinant , the permanent is not invariant under similarities. Thus, matrices of the same operator relative to different bases may have different permanents. ,
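The definition of the permanent, and its failure to be similarity-invariant, can be illustrated directly (a NumPy sketch, assumed here and not part of the text; the brute-force sum over permutations is only practical for small n):

```python
import numpy as np
from itertools import permutations

def per(A):
    """Permanent: like the determinant but without the signs."""
    n = A.shape[0]
    return sum(np.prod([A[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

A = np.array([[1.0, 2.0], [3.0, 4.0]])
assert np.isclose(per(A), 1 * 4 + 2 * 3)   # ad + bc, versus det = ad - bc

# the permanent is NOT invariant under similarities
S = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.linalg.inv(S) @ A @ S
assert not np.isclose(per(A), per(B))
assert np.isclose(np.linalg.det(A), np.linalg.det(B))   # the determinant is
```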
Exercise I.5.5 Show that the inner product (x_1 ∨ ··· ∨ x_k, y_1 ∨ ··· ∨ y_k) is equal to the permanent of the k × k matrix ((x_i, y_j)).

The spaces ∧^k H and ∨^k H are also referred to as "symmetry classes" of tensors; there are other such classes in ⊗^k H. Another way to look at them is as the ranges of the respective symmetry operators. Define P_∧ and P_∨ as linear operators on ⊗^k H by first defining them on the elementary tensors as

\[ P_\wedge (x_1 \otimes \cdots \otimes x_k) = (k!)^{-1/2}\, x_1 \wedge \cdots \wedge x_k, \]
\[ P_\vee (x_1 \otimes \cdots \otimes x_k) = (k!)^{-1/2}\, x_1 \vee \cdots \vee x_k, \]

and extending them by linearity to the whole space. (Again, it should be verified that this can be done consistently.) The constant factor in the above definitions has been chosen so that both these operators are idempotent. They are also Hermitian. The ranges of these orthoprojectors are ∧^k H and ∨^k H, respectively.
If A ∈ L(H), then Ax_1 ∧ ··· ∧ Ax_k lies in ∧^k H for all x_1, ..., x_k in H. Using this, one sees that the space ∧^k H is invariant under the operator ⊗^k A. The restriction of ⊗^k A to this invariant subspace is denoted by ∧^k A or A^{∧k}. This is called the kth antisymmetric tensor power or the kth Grassmann power of A. We could have also defined it by first defining it on the elementary antisymmetric tensors as

\[ (\wedge^k A)(x_1 \wedge \cdots \wedge x_k) = Ax_1 \wedge \cdots \wedge Ax_k, \]

and then extending it linearly to the span ∧^k H of these tensors.

Exercise I.5.6 Let A be a nilpotent operator. Show how to obtain, from a Jordan basis for A, a Jordan basis for ∧²A.
The space ∨^k H is also invariant under the operator ⊗^k A. The restriction of ⊗^k A to this invariant subspace is written as ∨^k A or A^{∨k} and called the kth symmetric tensor power of A. Some essential and simple properties of these operators are summarised below:

1. (∧^k A)(∧^k B) = ∧^k (AB), (∨^k A)(∨^k B) = ∨^k (AB).

2. (∧^k A)* = ∧^k A*, (∨^k A)* = ∨^k A*.

3. (∧^k A)^{-1} = ∧^k A^{-1}, (∨^k A)^{-1} = ∨^k A^{-1}.

4. If A is Hermitian, unitary, normal or positive, then so are ∧^k A and ∨^k A.

5. If a_1, ..., a_k are eigenvalues of A (not necessarily distinct) belonging to eigenvectors u_1, ..., u_k, respectively, then a_1 ··· a_k is an eigenvalue of ∨^k A belonging to eigenvector u_1 ∨ ··· ∨ u_k; if in addition the vectors u_j are linearly independent, then a_1 ··· a_k is an eigenvalue of ∧^k A belonging to eigenvector u_1 ∧ ··· ∧ u_k.

6. If s_1, ..., s_n are the singular values of A, then the singular values of ∧^k A are s_{i_1} ··· s_{i_k}, where (i_1, ..., i_k) vary over Q_{k,n}; the singular values of ∨^k A are s_{i_1} ··· s_{i_k}, where (i_1, ..., i_k) vary over G_{k,n}.

7. tr ∧^k A is the kth elementary symmetric polynomial in the eigenvalues of A; tr ∨^k A is the kth complete symmetric polynomial in the eigenvalues of A. (These polynomials are defined as follows. Given any n-tuple (a_1, ..., a_n) of numbers or other commuting objects, the kth elementary symmetric polynomial in them is the sum of all terms a_{i_1} a_{i_2} ··· a_{i_k} for (i_1, i_2, ..., i_k) in Q_{k,n}; the kth complete symmetric polynomial is the sum of all terms a_{i_1} a_{i_2} ··· a_{i_k} for (i_1, i_2, ..., i_k) in G_{k,n}.)

For A ∈ L(H), consider the operator A ⊗ I ⊗ ··· ⊗ I + I ⊗ A ⊗ I ⊗ ··· ⊗ I + ··· + I ⊗ I ⊗ ··· ⊗ A. (There are k summands, each of which is a product
of k factors.) The eigenvalues of this operator on ⊗^k H are sums of eigenvalues of A. Both the spaces ∧^k H and ∨^k H are invariant under this operator. One pleasant way to see this is to regard this operator as the t-derivative at t = 0 of ⊗^k (I + tA). The restriction of this operator to the space ∧^k H will be of particular interest to us; we will write this restriction as A^{[k]}. If u_1, ..., u_k are linearly independent eigenvectors of A belonging to eigenvalues a_1, ..., a_k, then u_1 ∧ ··· ∧ u_k is an eigenvector of A^{[k]} belonging to eigenvalue a_1 + ··· + a_k.

Now, fixing an orthonormal basis (e_1, ..., e_n) of H, identify A with its matrix (a_{ij}). We want to find the matrix representations of ∧^k A and ∨^k A relative to the standard bases constructed earlier. The basis of ∧^k H we are using is e_I, I ∈ Q_{k,n}. The (I, J)-entry of ∧^k A is (e_I, (∧^k A)e_J). One may verify that this is equal to a subdeterminant of A. Namely, let A[I|J] denote the k × k matrix obtained from A by expunging all its entries a_{ij} except those for which i ∈ I and j ∈ J. Then, the (I, J)-entry of ∧^k A is equal to det A[I|J].

The special case k = n leads to the 1-dimensional space ∧ⁿH. The operator ∧ⁿA on this space is just the operator of multiplication by the number det A. We can thus think of det A as being equal to ∧ⁿA.

The basis of ∨^k H we are using is m(I)^{-1/2} e_I, I ∈ G_{k,n}. The (I, J)-entry of the matrix ∨^k A can be computed as before, and the result is somewhat similar to that for ∧^k A. For I = (i_1, ..., i_k) and J = (j_1, ..., j_k) in G_{k,n}, let A[I|J] now denote the k × k matrix whose (r, s)-entry is the (i_r, j_s)-entry of A. Since repetitions of indices are allowed in I and J, this is not a submatrix of A this time. One verifies that the (I, J)-entry of ∨^k A is (m(I)m(J))^{-1/2} per A[I|J]. In particular, per A is one of the diagonal entries of ∨ⁿA: the (I, I)-entry for I = (1, 2, ..., n).
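The description of the entries of ∧^k A as subdeterminants det A[I|J] can be implemented directly; the result is classically called the kth compound matrix. A NumPy sketch (assumed, not part of the text) builds it and verifies properties 5 and 7 above on a diagonal example:

```python
import numpy as np
from itertools import combinations

def compound(A, k):
    """k-th antisymmetric power: entries are det A[I|J] for I, J in Q_{k,n}."""
    n = A.shape[0]
    idx = list(combinations(range(n), k))   # Q_{k,n}, lexicographic
    return np.array([[np.linalg.det(A[np.ix_(I, J)]) for J in idx]
                     for I in idx])

A = np.diag([1.0, 2.0, 3.0])
C2 = compound(A, 2)

# eigenvalues of /\^2 A are products of pairs of eigenvalues: 1*2, 1*3, 2*3
assert np.allclose(np.sort(np.diag(C2)), [2.0, 3.0, 6.0])

# tr /\^2 A is the 2nd elementary symmetric polynomial in the eigenvalues
assert np.isclose(np.trace(C2), 1*2 + 1*3 + 2*3)

# /\^n A is multiplication by det A
assert np.isclose(compound(A, 3)[0, 0], np.linalg.det(A))
```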
Exercise I.5.7 Prove that for any vectors u_1, ..., u_k, v_1, ..., v_k we have

\[ |\det((u_i, v_j))|^2 \le \det((u_i, u_j)) \, \det((v_i, v_j)), \]
\[ |\mathrm{per}((u_i, v_j))|^2 \le \mathrm{per}((u_i, u_j)) \, \mathrm{per}((v_i, v_j)). \]

Exercise I.5.8 Prove that for any two matrices A, B we have

\[ |\mathrm{per}(AB)|^2 \le \mathrm{per}(AA^*) \, \mathrm{per}(B^*B). \]

(The corresponding relation for determinants is an easy equality.)
Exercise I.5.9 (Schur's Theorem) If A is positive, then per A ≥ det A. (Hint: Using Exercise I.2.2 write A = T*T for an upper triangular T. Then use the preceding exercise cleverly.)
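Schur's Theorem can be spot-checked on random positive matrices (a NumPy sketch, assumed here and not part of the text; per is the brute-force permanent, practical only for small n):

```python
import numpy as np
from itertools import permutations

def per(A):
    """Brute-force permanent of a square matrix."""
    n = A.shape[0]
    return sum(np.prod([A[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

# a positive matrix, built in the form B*B of Exercise I.2.2
rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
A = B.T @ B

# Schur's Theorem: per A >= det A for positive A
assert per(A) >= np.linalg.det(A) - 1e-9
```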
We have observed earlier that for any vectors x_1, ..., x_k in H we have

\[ \|x_1 \wedge \cdots \wedge x_k\|^2 = \det((x_i, x_j)). \]

When H = Rⁿ, this determinant is also the square of the k-dimensional volume of the parallelepiped having x_1, ..., x_k as its sides. To see this, note that neither the determinant nor the volume in question is altered if we add to any of these vectors a linear combination of the others. Performing such operations successively, we can reach an orthogonal set of vectors, some of which might be zero. In this case it is obvious that the determinant is equal to the square of the volume; hence that was true initially too.

Given any k-tuple X = (x_1, ..., x_k), the matrix ((x_i, x_j)) = X*X is called the Gram matrix of the vectors x_j; its determinant is called their Gram determinant.
1.6
Problems
, u11 ) , not necessarily ort honor Given a basis U = ( u 1 , vn ) ? mal, in 'H, how would you compute the biorthogonal basis ( v1 1 Find a formula that expresses ( vj , x) for each x E 1t and j = 1 , 2, . . . , k in terms of Gram matrices .
P roblem 1 .6. 1 .
, • • •
• • •
Problem I.6.2. A proof of the Toeplitz–Hausdorff Theorem is outlined below. Fill in the details.

Note that W(A) = {(x, Ax) : ‖x‖ = 1} = {tr Axx* : x*x = 1}. It is enough to consider the special case dim H = 2. In higher dimensions, this special case can be used to show that if x, y are any two vectors, then any point on the line segment joining (x, Ax) and (y, Ay) can be represented as (z, Az), where z is a vector in the linear span of x and y.

Now, on the space of 2 × 2 Hermitian matrices, consider the linear map Φ(T) = tr AT. This is a real linear map from a space of 4 real dimensions (the 2 × 2 Hermitian matrices) to a space of 2 real dimensions (the complex plane). We want to prove that the image under Φ of the set of rank-one projections xx*, x*x = 1, is convex. Every such projection can be written as

\[ \begin{pmatrix} \cos t \\ e^{i\omega} \sin t \end{pmatrix} \begin{pmatrix} \cos t & e^{-i\omega} \sin t \end{pmatrix} = \frac{1}{2} I + \frac{1}{2} \begin{pmatrix} \cos 2t & e^{-i\omega} \sin 2t \\ e^{i\omega} \sin 2t & -\cos 2t \end{pmatrix}. \]

As t and ω vary, this set is a 2-sphere centred at ½I and having radius 1/√2 in the Frobenius norm. The image of a 2-sphere under a linear map with range in R² must be either an ellipse with interior, or a line segment, or a point; in any case, a convex set.
Problem I.6.3. By the remarks in Section 5, vectors x_1, ..., x_k are linearly dependent if and only if x_1 ∧ ··· ∧ x_k = 0. This relationship between linear dependence and the antisymmetric tensor product goes further. Two sets {x_1, ..., x_k} and {y_1, ..., y_k} of linearly independent vectors have the same linear span if and only if x_1 ∧ ··· ∧ x_k = c y_1 ∧ ··· ∧ y_k for some constant c. Thus, there is a one-to-one correspondence between k-dimensional subspaces of a vector space W and 1-dimensional subspaces of ∧ᵏW generated by elementary tensors x_1 ∧ ··· ∧ x_k.
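The correspondence in Problem I.6.3 can be checked in coordinates: for k = 2 in ℝ³, the components of x_1 ∧ x_2 in the standard basis of ∧²ℝ³ are the three 2 × 2 minors of the matrix with rows x_1, x_2, and two independent pairs span the same plane exactly when these component vectors are proportional. A small sketch under those assumptions (the helper name is ours):

```python
# Components of x ^ y in the basis e1^e2, e1^e3, e2^e3 of /\^2 R^3:
# these are the 2x2 minors of the 2x3 matrix with rows x and y.
def wedge2(x, y):
    return (x[0]*y[1] - x[1]*y[0],
            x[0]*y[2] - x[2]*y[0],
            x[1]*y[2] - x[2]*y[1])

x1, x2 = (1, 0, 1), (0, 1, 1)
y1, y2 = (1, 1, 2), (1, -1, 0)    # y1 = x1 + x2, y2 = x1 - x2: same span
w, v = wedge2(x1, x2), wedge2(y1, y2)

# w and v are proportional (here v = -2 w), as the problem predicts
assert all(w[i] * v[j] == w[j] * v[i] for i in range(3) for j in range(3))
```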
Problem I.6.4. How large must dim W be in order that there exist some element of ∧²W which is not elementary?

Problem I.6.5. Every vector w of W induces a linear operator T_w from ∧ᵏW to ∧ᵏ⁺¹W as follows. T_w is defined on elementary tensors as T_w(v_1 ∧ ··· ∧ v_k) = v_1 ∧ ··· ∧ v_k ∧ w, and then extended linearly to all of ∧ᵏW. It is, then, natural to write T_w(x) = x ∧ w for any x ∈ ∧ᵏW. Show that a nonzero vector x in ∧ᵏW is elementary if and only if the space {w ∈ W : x ∧ w = 0} is k-dimensional. (When W is a Hilbert space, the operators T_w are called creation operators and their adjoints are called annihilation operators in the physics literature.)
Problem I.6.6. (The n-dimensional Pythagorean Theorem) Let x_1, ..., x_n be orthogonal vectors in ℝⁿ. Consider the n-dimensional simplex S with vertices 0, x_1, ..., x_n. Think of the (n − 1)-dimensional simplex with vertices x_1, ..., x_n as the "hypotenuse" of S and the remaining (n − 1)-dimensional faces of S as its "legs". By the remarks in Section 5, the k-dimensional volume of the simplex formed by any k points y_1, ..., y_k together with the origin is (k!)⁻¹ ‖y_1 ∧ ··· ∧ y_k‖. The volume of a simplex not having 0 as a vertex can be found by translating it. Use this to prove that the square of the volume of the hypotenuse of S is the sum of the squares of the volumes of the n legs.
Problem I.6.7. (i) Let Q_∧ be the inclusion map from ∧ᵏ𝓗 into ⊗ᵏ𝓗 (so that Q_∧* equals the projection P_∧ defined earlier) and let Q_∨ be the inclusion map from ∨ᵏ𝓗 into ⊗ᵏ𝓗. Then, for any A ∈ ℒ(𝓗),

\[ \wedge^k A = P_{\wedge} (\otimes^k A) Q_{\wedge}, \qquad \vee^k A = P_{\vee} (\otimes^k A) Q_{\vee}. \]

(ii) ‖∧ᵏA‖ ≤ ‖A‖ᵏ and ‖∨ᵏA‖ ≤ ‖A‖ᵏ.

(iii) |det A| ≤ ‖A‖ⁿ and |per A| ≤ ‖A‖ⁿ.
I. A Review of Linear Algebra
Problem I.6.8. For an invertible operator A obtain a relationship between A⁻¹, ∧ⁿA, and ∧ⁿ⁻¹A.

Problem I.6.9. (i) Let {e_1, ..., e_n} and {f_1, ..., f_n} be two orthonormal bases in 𝓗. Show that

(ii) Let P and Q be orthogonal projections in 𝓗, each of rank n − 1. Let x, y be unit vectors such that Px = Qy = 0. Show that
Problem I.6.10. If the characteristic polynomial of A is written as

\[ t^n + a_1 t^{n-1} + \cdots + a_n, \]

then the coefficient a_k is equal to (−1)ᵏ tr ∧ᵏA; up to this sign, it is the sum of all k × k principal minors of A.

Problem I.6.11. (i) For any A, B ∈ ℒ(𝓗) we have

\[ \otimes^k A - \otimes^k B = \sum_{j=1}^{k} C_j, \]

where C_j = A ⊗ ··· ⊗ A ⊗ (A − B) ⊗ B ⊗ ··· ⊗ B, the factor A − B occurring in the jth position. Hence,

\[ \| \otimes^k A - \otimes^k B \| \le k M^{k-1} \| A - B \|, \quad \text{where } M = \max( \|A\|, \|B\| ). \]

(ii) The norms of ∧ᵏA − ∧ᵏB and ∨ᵏA − ∨ᵏB are therefore also bounded by kM^{k−1}‖A − B‖.

(iii) For n × n matrices A, B,

\[ |\det A - \det B| \le n M^{n-1} \|A - B\|, \qquad |\operatorname{per} A - \operatorname{per} B| \le n M^{n-1} \|A - B\|. \]

(iv) The example A = aI, B = (a + ε)I for small ε shows that these inequalities are sometimes sharp. When ‖A‖ and ‖B‖ are far apart, find a simple improvement on them.

(v) If A, B are n × n matrices with characteristic polynomials

\[ t^n + a_1 t^{n-1} + \cdots + a_n \quad \text{and} \quad t^n + b_1 t^{n-1} + \cdots + b_n, \]

respectively, then

\[ |a_k - b_k| \le k \binom{n}{k} M^{k-1} \|A - B\|, \quad \text{where } M = \max( \|A\|, \|B\| ). \]

Problem I.6.12. Let A, B be positive operators with A ≥ B (i.e., A − B is positive). Show that

\[ \otimes^k A \ge \otimes^k B, \quad \wedge^k A \ge \wedge^k B, \quad \vee^k A \ge \vee^k B, \quad \det A \ge \det B, \quad \operatorname{per} A \ge \operatorname{per} B. \]
Problem I.6.13. The Schur product or the Hadamard product of two matrices A and B is defined to be the matrix A ∘ B whose (i, j)-entry is a_ij b_ij. Show that this is a principal submatrix of A ⊗ B and derive from this fact two significant properties:

(i) ‖A ∘ B‖ ≤ ‖A‖ ‖B‖ for all A, B.

(ii) If A, B are positive, then so is A ∘ B. (This is called Schur's Theorem.)

Problem I.6.14. (i) Let A = (a_ij) be an n × n positive matrix. Let

\[ r_i = \sum_{j=1}^{n} a_{ij}, \quad 1 \le i \le n, \qquad s = \sum_{i,j} a_{ij}. \]

Show that

\[ s^n \operatorname{per} A \ge n! \prod_{i=1}^{n} |r_i|^2 . \]

[Hint: Represent A as the Gram matrix of some vectors x_1, ..., x_n as in Exercise I.5.10. Let u = s^{−1/2}(x_1 + ··· + x_n). Consider the vectors u ∨ u ∨ ··· ∨ u and x_1 ∨ ··· ∨ x_n, and use the Cauchy–Schwarz inequality.]

(ii) Show that equality holds in the above inequality if and only if either A has rank 1 or A has a row of zeroes.

(iii) If, in addition, all a_ij are nonnegative and all r_i = 1 (so that the matrix A is doubly stochastic as well as positive semidefinite), then

\[ \operatorname{per} A \ge \frac{n!}{n^n} . \]
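The inequality of Problem I.6.14 (i), and the doubly stochastic case (iii), can be tested numerically on small matrices; the permanent is computed here by brute force over permutations (an illustrative sketch, not the book's code):

```python
import math
from itertools import permutations

def per(A):
    # permanent by brute force; fine for small n
    n = len(A)
    total = 0.0
    for sigma in permutations(range(n)):
        p = 1.0
        for i in range(n):
            p *= A[i][sigma[i]]
        total += p
    return total

A = [[2.0, 1.0], [1.0, 2.0]]                 # a positive matrix
n = len(A)
r = [sum(row) for row in A]                  # row sums r_i
s = sum(r)                                   # s = sum of all entries
assert s**n * per(A) >= math.factorial(n) * math.prod(ri**2 for ri in r)

# part (iii): doubly stochastic and positive semidefinite, per A >= n!/n^n
D = [[0.5, 0.5], [0.5, 0.5]]
assert per(D) >= math.factorial(2) / 2**2 - 1e-12
```

For this D the bound is attained with equality, matching the equality case a_ij = 1/n discussed after part (iii).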
Here equality holds if and only if a_ij = 1/n for all i, j.

Problem I.6.15. Let A be Hermitian with eigenvalues α_1 ≥ α_2 ≥ ··· ≥ α_n. In Exercise I.2.7 we noted that

\[ \alpha_1 = \max \{ \langle x, Ax \rangle : \|x\| = 1 \}, \qquad \alpha_n = \min \{ \langle x, Ax \rangle : \|x\| = 1 \}. \]

Using these relations and tensor products, we can deduce some other extremal representations:

(i) For every k = 1, 2, ..., n,

\[ \sum_{j=1}^{k} \alpha_j = \max \sum_{j=1}^{k} \langle x_j, A x_j \rangle, \qquad \sum_{j=n-k+1}^{n} \alpha_j = \min \sum_{j=1}^{k} \langle x_j, A x_j \rangle, \]

where the maximum and the minimum are taken over all choices of orthonormal k-tuples (x_1, ..., x_k) in 𝓗. The first statement is referred to as Ky Fan's Maximum Principle. It will reappear in Chapter II (with a different proof) and subsequently.

(ii) If A is positive, then for every k = 1, 2, ..., n,

\[ \prod_{j=n-k+1}^{n} \alpha_j = \min \prod_{j=1}^{k} \langle x_j, A x_j \rangle, \]

where the minimum is taken over all choices of orthonormal k-tuples (x_1, ..., x_k) in 𝓗. [Hint: You may need to use the Hadamard Determinant Theorem, which says that the determinant of a positive matrix is bounded above by the product of its diagonal entries. This is also proved in Chapter II.]

(iii) If A is positive, then for every I ∈ Q_{k,n},

\[ \prod_{j=n-k+1}^{n} \alpha_j \le \det A[I] \le \prod_{j=1}^{k} \alpha_j . \]
Problem I.6.16. Let A be any n × n matrix with eigenvalues α_1, ..., α_n. Show that

\[ \left| \alpha_j - \frac{\operatorname{tr} A}{n} \right| \le \left( \frac{n-1}{n} \right)^{1/2} \left( \|A\|_2^2 - \frac{|\operatorname{tr} A|^2}{n} \right)^{1/2} \]

for all j = 1, 2, ..., n. (Results such as this are interesting because they give some information about the location of the eigenvalues of a matrix in terms of more easily computable functions like the Frobenius norm ‖A‖₂ and the trace. We will see several such statements later.)

[Hint: First prove that if x = (x_1, ..., x_n) is a vector with x_1 + ··· + x_n = 0, then

\[ \max_j |x_j| \le \left( \frac{n-1}{n} \right)^{1/2} \|x\| . ] \]
Problem I.6.17. (i) Let z_1, z_2, z_3 be three points on the unit circle. Then the numerical range of an operator A is contained in the triangle with vertices z_1, z_2, z_3 if and only if A can be expressed as A = z_1A_1 + z_2A_2 + z_3A_3, where A_1, A_2, A_3 are positive operators with A_1 + A_2 + A_3 = I.

[Hint: It is easy to see that if A is a sum of this form, then W(A) is contained in the given triangle. The converse needs some work to prove. Let z be any point in the given triangle. Then one can find a_1, a_2, a_3 such that a_j ≥ 0, a_1 + a_2 + a_3 = 1, and z = a_1z_1 + a_2z_2 + a_3z_3. These are the "barycentric coordinates" of z and can be obtained as follows. Let γ = Im(z̄_1z_2 + z̄_2z_3 + z̄_3z_1). Then, for j = 1, 2, 3,

\[ a_j = \frac{1}{\gamma} \operatorname{Im} \Bigl( \overline{(z - z_{j+1})} \, (z_{j+2} - z_{j+1}) \Bigr), \]

where the subscript indices are counted modulo 3. Put

\[ A_j = \frac{1}{\gamma} \operatorname{Im} \Bigl( (A - z_{j+1} I)^* (z_{j+2} - z_{j+1}) \Bigr). \]

Then the A_j have the required properties.]

(ii) Let W(A) be contained in a triangle with vertices z_1, z_2, z_3 lying on the unit circle. Then, choosing A_1, A_2, A_3 as above, write

\[ I - A^* A = \sum_{j=1}^{3} (I - \bar{z}_j A)^* A_j (I - \bar{z}_j A). \]

This, being a sum of three positive matrices, is positive. Hence, by Proposition I.3.5, A is a contraction.

(iii) If W(A) is contained in a triangle with vertices z_1, z_2, z_3, then ‖A‖ ≤ max_j |z_j|. This is Mirman's Theorem.

Problem I.6.18. If an operator T has the Cartesian decomposition T = A + iB with A and B positive, then

\[ \|T\|^2 \le \|A\|^2 + \|B\|^2 . \]

Show that, if A or B is not positive, then this need not be true. [Hint: To prove the above inequality note that W(T) is contained in a rectangle in the first quadrant. Find a suitable triangle that contains it and use Mirman's Theorem.]
I.7 Notes and References
Standard references on linear algebra and matrix theory include P.R. Halmos, Finite-Dimensional Vector Spaces, Van Nostrand, 1958; F.R. Gantmacher, Matrix Theory, 2 volumes, Chelsea, 1959; and K. Hoffman and R. Kunze, Linear Algebra, 2nd ed., Prentice Hall, 1971. A recent work is R.A. Horn and C.R. Johnson, in two volumes, Matrix Analysis and Topics in Matrix Analysis, Cambridge University Press, 1985 and 1990. For more of multilinear algebra, see W. Greub, Multilinear Algebra, 2nd ed., Springer-Verlag, 1978, and M. Marcus, Finite-Dimensional Multilinear Algebra, 2 volumes, Marcel Dekker, 1973 and 1975. A brief treatment that covers all the basic results may be found in M. Marcus and H. Minc, A Survey of Matrix Theory and Matrix Inequalities, Prindle, Weber and Schmidt, 1964, reprinted by Dover in 1992.

Though not as important as the determinant, the permanent of a matrix is an interesting object with many uses in combinatorics, geometry, and physics. A book devoted entirely to it is H. Minc, Permanents, Addison-Wesley, 1978.

Apart from the symmetric and the antisymmetric tensors, there are other symmetry classes of tensors. Their study is related to the glorious subject of representations of finite groups. See J.-P. Serre, Linear Representations of Finite Groups, Springer-Verlag, 1977.

The result in Exercise I.3.6 is due to P.R. Halmos, and is the beginning of a subject called Dilation Theory. See Chapter 23 of P.R. Halmos, A Hilbert Space Problem Book, 2nd ed., Springer-Verlag, 1982.

The proof of the Toeplitz–Hausdorff Theorem in Problem I.6.2 is taken from C. Davis, The Toeplitz–Hausdorff theorem explained, Canad. Math. Bull., 14 (1971) 245–246. For a different proof, see P.R. Halmos, A Hilbert Space Problem Book.

For relations between Grassmann spaces and geometry, as indicated in Problem I.6.3, see, for example, I.R. Porteous, Topological Geometry, Cambridge University Press, 1981.

The simple proof of the Pythagorean Theorem in Problem I.6.6 is due to S. Ramanan. Among the several papers in quantum physics where ideas very close to those in Problems I.6.3 and I.6.5 are used effectively is one by N.M. Hugenholtz and R.V. Kadison, Automorphisms and quasi-free states of the CAR algebra, Commun. Math. Phys., 43 (1975) 181–197.

Inequalities like the ones in Problem I.6.11 were first discovered in connection with perturbation theory of eigenvalues. This is summarised in R. Bhatia, Perturbation Bounds for Matrix Eigenvalues, Longman, 1987. The simple identity at the beginning of Problem I.6.11 was first used in this context in R. Bhatia and L. Elsner, On the variation of permanents, Linear and Multilinear Algebra, 27 (1990) 105–110.

The results and the ideas of Problem I.6.14 are from M. Marcus and M. Newman, Inequalities for the permanent function, Ann. of Math., 75 (1962)
47–62. In 1926, B.L. van der Waerden had conjectured that the inequality in part (iii) of Problem I.6.14 would hold for all doubly stochastic matrices. This conjecture was proved, in two separate papers in 1981, by G.P. Egorychev and D. Falikman. An expository account is given in J.H. van Lint, The van der Waerden conjecture: two proofs in one year, Math. Intelligencer, 4 (1982) 72–77.

The results of Problem I.6.15 are all due to Ky Fan, On a theorem of Weyl concerning eigenvalues of linear transformations I, II, Proc. Nat. Acad. Sci. U.S.A., 35 (1949) 652–655, 36 (1950) 31–35, and A minimum property of the eigenvalues of a Hermitian transformation, Amer. Math. Monthly, 60 (1953) 48–50.

A special case of the inequality of Problem I.6.16 occurs in P. Tarazaga, Eigenvalue estimates for symmetric matrices, Linear Algebra and Appl., 135 (1990) 171–179.

Mirman's Theorem is proved in B.A. Mirman, Numerical range and norm of a linear operator, Trudy Seminara po Funkcional'nomu Analizu, No. 10 (1968), pp. 51–55. The inequality of Problem I.6.18 is also noted there as a corollary. Our proof of Mirman's Theorem is taken from Y. Nakamura, Numerical range and norm, Math. Japonica, 27 (1982) 149–150.
II Majorisation and Doubly Stochastic Matrices

Comparison of two vector quantities often leads to interesting inequalities that can be expressed succinctly as majorisation relations. There is an intimate relation between majorisation and doubly stochastic matrices. These topics are studied in detail here. We place special emphasis on majorisation relations between the eigenvalue n-tuples of two matrices. This will be a recurrent theme in the book.

II.1 Basic Notions
Let x = (x_1, ..., x_n) be an element of ℝⁿ. Let x↓ and x↑ be the vectors obtained by rearranging the coordinates of x in the decreasing and the increasing orders, respectively. Thus, if x↓ = (x_1↓, ..., x_n↓), then x_1↓ ≥ ··· ≥ x_n↓. Similarly, if x↑ = (x_1↑, ..., x_n↑), then x_1↑ ≤ ··· ≤ x_n↑. Note that

\[ x_j^{\uparrow} = x_{n-j+1}^{\downarrow}, \quad 1 \le j \le n. \tag{II.1} \]

Let x, y ∈ ℝⁿ. We say that x is majorised by y, in symbols x ≺ y, if

\[ \sum_{j=1}^{k} x_j^{\downarrow} \le \sum_{j=1}^{k} y_j^{\downarrow}, \quad 1 \le k \le n, \tag{II.2} \]

and

\[ \sum_{j=1}^{n} x_j = \sum_{j=1}^{n} y_j . \tag{II.3} \]
Example: If x_i ≥ 0 and Σx_i = 1, then

\[ \Bigl( \tfrac{1}{n}, \ldots, \tfrac{1}{n} \Bigr) \prec (x_1, \ldots, x_n) \prec (1, 0, \ldots, 0). \]
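The two relations in this example are easy to verify mechanically from conditions (II.2) and (II.3); a short sketch (the helper function is ours, not the book's):

```python
def majorised(x, y, tol=1e-12):
    # x majorised by y: compare partial sums of decreasingly sorted vectors
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    px = py = 0.0
    for a, b in zip(xs, ys):
        px, py = px + a, py + b
        if px > py + tol:
            return False
    return abs(px - py) < tol          # the totals must agree

x = (0.2, 0.3, 0.5)                    # x_i >= 0 with sum 1
n = len(x)
assert majorised([1.0 / n] * n, x)
assert majorised(x, [1.0, 0.0, 0.0])
assert not majorised([1.0, 0.0, 0.0], x)
```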
The notion of majorisation occurs naturally in various contexts. For example, in physics, the relation x ≺ y is interpreted to mean that the vector x describes a "more chaotic" state than y. (Think of x_i as the probability of a system being found in state i.) Another example occurs in economics. If x_1, ..., x_n and y_1, ..., y_n denote incomes of individuals 1, 2, ..., n, then x ≺ y would mean that there is a more equal distribution of incomes in the state x than in y. The above example illustrates this.

From (II.1) we have

\[ \sum_{j=1}^{k} x_j^{\uparrow} = \sum_{j=1}^{n} x_j - \sum_{j=1}^{n-k} x_j^{\downarrow} . \]

Hence x ≺ y if and only if

\[ \sum_{j=1}^{k} x_j^{\uparrow} \ge \sum_{j=1}^{k} y_j^{\uparrow}, \quad 1 \le k \le n, \tag{II.4} \]

and

\[ \sum_{j=1}^{n} x_j = \sum_{j=1}^{n} y_j . \tag{II.5} \]
Let e denote the vector (1, 1, ..., 1), and for any subset I of {1, 2, ..., n} let e_I denote the vector whose jth component is 1 if j ∈ I and 0 if j ∉ I. Given a vector x ∈ ℝⁿ, let

\[ \operatorname{tr} x = \sum_{j=1}^{n} x_j = \langle x, e \rangle, \]

where ⟨·, ·⟩ denotes the inner product in ℝⁿ. Note that

\[ \sum_{j=1}^{k} x_j^{\downarrow} = \max_{|I| = k} \langle x, e_I \rangle, \]

where |I| stands for the number of elements in the set I. So, x ≺ y if and only if for each subset I of {1, 2, ..., n} there exists a subset J with |I| = |J| such that

\[ \langle x, e_I \rangle \le \langle y, e_J \rangle \tag{II.6} \]

and

\[ \operatorname{tr} x = \operatorname{tr} y . \tag{II.7} \]
We say that x is (weakly) submajorised by y, in symbols x ≺_w y, if condition (II.2) is fulfilled. Note that in the absence of (II.3), the conditions (II.2) and (II.4) are not equivalent. We say that x is (weakly) supermajorised by y, in symbols x ≺^w y, if condition (II.4) is fulfilled.

Exercise II.1.1
(i) x ≺ y ⟺ x ≺_w y and x ≺^w y.
(ii) If a is a positive real number, then x ≺_w y ⟹ ax ≺_w ay, and x ≺^w y ⟹ ax ≺^w ay.
(iii) x ≺_w y ⟺ −x ≺^w −y.
(iv) For any real number a, x ≺ y ⟹ ax ≺ ay.
Remark II.1.2 The relations ≺, ≺_w, and ≺^w are all reflexive and transitive. None of them, however, is a partial order. For example, if x ≺ y and y ≺ x, we can only conclude that x = Py, where P is a permutation matrix. If we say that x ∼ y whenever x = Py for some permutation matrix P, then ∼ defines an equivalence relation on ℝⁿ. If we denote by ℝⁿ_sym the resulting quotient space, then ≺ defines a partial order on this space. This relation is also a partial order on the set {x ∈ ℝⁿ : x_1 ≥ ··· ≥ x_n}. These statements are true for the relations ≺_w and ≺^w as well.

For a, b ∈ ℝ, let a ∨ b = max(a, b) and a ∧ b = min(a, b). For x, y ∈ ℝⁿ, define

\[ x \vee y = (x_1 \vee y_1, \ldots, x_n \vee y_n), \qquad x \wedge y = (x_1 \wedge y_1, \ldots, x_n \wedge y_n). \]

Let

\[ x^+ = x \vee 0, \qquad |x| = x \vee (-x). \]

In other words, x⁺ is the vector obtained from x by replacing the negative coordinates by zeroes, and |x| is the vector obtained by taking the absolute values of all coordinates.

With these notations we can prove the following characterisation of majorisation that does not involve rearrangements:

Theorem II.1.3 Let x, y ∈ ℝⁿ. Then,

(i) x ≺_w y if and only if for all t ∈ ℝ

\[ \sum_{j=1}^{n} (x_j - t)^+ \le \sum_{j=1}^{n} (y_j - t)^+ . \tag{II.8} \]

(ii) x ≺^w y if and only if for all t ∈ ℝ

\[ \sum_{j=1}^{n} (t - x_j)^+ \le \sum_{j=1}^{n} (t - y_j)^+ . \tag{II.9} \]

(iii) x ≺ y if and only if for all t ∈ ℝ

\[ \sum_{j=1}^{n} |x_j - t| \le \sum_{j=1}^{n} |y_j - t| . \tag{II.10} \]
Proof. Let x ≺_w y. If t ≥ x_1↓, then (x_j − t)⁺ = 0 for all j, and hence (II.8) holds. Let x_{k+1}↓ ≤ t ≤ x_k↓ for some 1 ≤ k ≤ n, where, for convenience, x_{n+1}↓ = −∞. Then

\[ \sum_{j=1}^{n} (x_j - t)^+ = \sum_{j=1}^{k} (x_j^{\downarrow} - t) = \sum_{j=1}^{k} x_j^{\downarrow} - kt \le \sum_{j=1}^{k} y_j^{\downarrow} - kt \le \sum_{j=1}^{n} (y_j - t)^+, \]

and, hence, (II.8) holds. To prove the converse, note that if t = y_k↓, then

\[ \sum_{j=1}^{n} (y_j - t)^+ = \sum_{j=1}^{k} (y_j^{\downarrow} - t) = \sum_{j=1}^{k} y_j^{\downarrow} - kt . \]

But

\[ \sum_{j=1}^{k} x_j^{\downarrow} - kt = \sum_{j=1}^{k} (x_j^{\downarrow} - t) \le \sum_{j=1}^{n} (x_j - t)^+ . \]

So, if (II.8) holds, then we must have

\[ \sum_{j=1}^{k} x_j^{\downarrow} \le \sum_{j=1}^{k} y_j^{\downarrow}, \]

i.e., x ≺_w y. This proves (i). The statements (ii) and (iii) have similar proofs. ∎
Corollary II.1.4 If x ≺ y in ℝⁿ and u ≺ w in ℝᵐ, then (x, u) ≺ (y, w) in ℝⁿ⁺ᵐ. In particular, x ≺ y if and only if (x, u) ≺ (y, u) for all u.
An n × n matrix A = (a_ij) is called doubly stochastic if

\[ a_{ij} \ge 0 \quad \text{for all } i, j, \tag{II.11} \]

\[ \sum_{i=1}^{n} a_{ij} = 1 \quad \text{for all } j, \tag{II.12} \]

\[ \sum_{j=1}^{n} a_{ij} = 1 \quad \text{for all } i. \tag{II.13} \]
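Conditions (II.11)–(II.13) translate directly into a checker; a small sketch (the function name is ours):

```python
# Sketch: test conditions (II.11)-(II.13) entrywise.
def is_doubly_stochastic(A, tol=1e-12):
    n = len(A)
    if any(a < -tol for row in A for a in row):
        return False
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in A)
    cols_ok = all(abs(sum(A[i][j] for i in range(n)) - 1.0) < tol
                  for j in range(n))
    return rows_ok and cols_ok

P = [[0.0, 1.0], [1.0, 0.0]]            # a permutation matrix
M = [[0.5, 0.5], [0.5, 0.5]]
assert is_doubly_stochastic(P) and is_doubly_stochastic(M)
assert not is_doubly_stochastic([[1.0, 0.5], [0.0, 0.5]])
```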
Exercise II.1.5 A linear map A on ℂⁿ is called positivity-preserving if it carries vectors with nonnegative coordinates to vectors with nonnegative coordinates. It is called trace-preserving if tr Ax = tr x for all x. It is called unital if Ae = e. Show that a matrix A is doubly stochastic if and only if the linear operator A is positivity-preserving, trace-preserving, and unital. Show that A is trace-preserving if and only if its adjoint A* is unital.

Exercise II.1.6 (i) The class of n × n doubly stochastic matrices is a convex set and is closed under multiplication and the adjoint operation. It is, however, not a group. (ii) Every permutation matrix is doubly stochastic and is an extreme point of the convex set of all doubly stochastic matrices. (Later we will prove Birkhoff's Theorem, which says that all extreme points of this convex set are permutation matrices.)

Exercise II.1.7 Let A be a doubly stochastic matrix. Show that all eigenvalues of A have modulus less than or equal to 1, that 1 is an eigenvalue of A, and that ‖A‖ = 1.

Exercise II.1.8 If A is doubly stochastic, then

\[ |Ax| \le A(|x|), \]

where, as usual, |x| = (|x_1|, ..., |x_n|) and we say that x ≤ y if x_j ≤ y_j for all j.
There is a close relationship between majorisation and doubly stochastic matrices. This is brought out in the next few theorems.

Theorem II.1.9 A matrix A is doubly stochastic if and only if Ax ≺ x for all vectors x.

Proof. Let Ax ≺ x for all x. First choosing x to be e and then e_i = (0, 0, ..., 1, 0, ..., 0), 1 ≤ i ≤ n, one can easily see that A is doubly stochastic. Conversely, let A be doubly stochastic. Let y = Ax. To prove y ≺ x we may assume, without loss of generality, that the coordinates of both x and y are in decreasing order. (See Remark II.1.2 and Exercise II.1.6.) Now note that for any k, 1 ≤ k ≤ n, we have

\[ \sum_{j=1}^{k} y_j = \sum_{j=1}^{k} \sum_{i=1}^{n} a_{ij} x_i . \]

If we put t_i = Σ_{j=1}^{k} a_ij, then 0 ≤ t_i ≤ 1 and Σ_{i=1}^{n} t_i = k. We have

\[
\sum_{j=1}^{k} y_j - \sum_{j=1}^{k} x_j
= \sum_{i=1}^{n} t_i x_i - \sum_{i=1}^{k} x_i + \Bigl( k - \sum_{i=1}^{n} t_i \Bigr) x_k
= \sum_{i=1}^{k} (t_i - 1)(x_i - x_k) + \sum_{i=k+1}^{n} t_i (x_i - x_k)
\le 0 .
\]
Further, when k = n we must have equality here simply because A is doubly stochastic. Thus, y ≺ x. ∎

Note that if x, y ∈ ℝ² and x ≺ y, then

\[ (x_1, x_2) = \bigl( t y_1 + (1 - t) y_2, \; (1 - t) y_1 + t y_2 \bigr) \quad \text{for some } 0 \le t \le 1. \]

Note also that if x, y ∈ ℝⁿ and x is obtained by averaging any two coordinates of y in the above sense while keeping the rest of the coordinates fixed, then x ≺ y. More precisely, call a linear map T on ℝⁿ a T-transform if there exist 0 ≤ t ≤ 1 and indices j, k such that

\[ Ty = \bigl( y_1, \ldots, y_{j-1}, \; t y_j + (1 - t) y_k, \; y_{j+1}, \ldots, y_{k-1}, \; (1 - t) y_j + t y_k, \; y_{k+1}, \ldots, y_n \bigr). \]

Then Ty ≺ y for all y.
Theorem II.1.10 For x, y ∈ ℝⁿ, the following statements are equivalent:

(i) x ≺ y.

(ii) x is obtained from y by a finite number of T-transforms.

(iii) x is in the convex hull of all vectors obtained by permuting the coordinates of y.

(iv) x = Ay for some doubly stochastic matrix A.
Proof. When n = 2, we have already seen that (i) ⟹ (ii). We will prove this implication for a general n by induction. Assume that we have it for dimensions up to n − 1. Let x, y ∈ ℝⁿ. Since x↓ and y↓ can be obtained from x and y by permutations, and each permutation is a product of transpositions, which are surely T-transforms, we can assume without loss of generality that x_1 ≥ x_2 ≥ ··· ≥ x_n and y_1 ≥ y_2 ≥ ··· ≥ y_n. Now, if x ≺ y, then y_n ≤ x_1 ≤ y_1. Choose k such that y_k ≤ x_1 ≤ y_{k−1}. Then x_1 = ty_1 + (1 − t)y_k for some 0 ≤ t ≤ 1. Let

\[ T_1 z = \bigl( t z_1 + (1 - t) z_k, \; z_2, \ldots, z_{k-1}, \; (1 - t) z_1 + t z_k, \; z_{k+1}, \ldots, z_n \bigr) \]

for all z ∈ ℝⁿ. Then note that the first coordinate of T_1 y is x_1. Let

\[ x' = (x_2, \ldots, x_n), \qquad y' = \bigl( y_2, \ldots, y_{k-1}, \; (1 - t) y_1 + t y_k, \; y_{k+1}, \ldots, y_n \bigr). \]

We will show that x′ ≺ y′. Since y_1 ≥ ··· ≥ y_{k−1} ≥ x_1 ≥ x_2 ≥ ··· ≥ x_n, we have for 2 ≤ m ≤ k − 1

\[ \sum_{j=2}^{m} x_j \le \sum_{j=2}^{m} y_j . \]

For k ≤ m ≤ n,

\[
\sum_{j=2}^{k-1} y_j + \bigl[ (1 - t) y_1 + t y_k \bigr] + \sum_{j=k+1}^{m} y_j
= \sum_{j=1}^{m} y_j - t y_1 - (1 - t) y_k
= \sum_{j=1}^{m} y_j - x_1
\ge \sum_{j=1}^{m} x_j - x_1 = \sum_{j=2}^{m} x_j .
\]

The last inequality is an equality when m = n since x ≺ y. Thus x′ ≺ y′. So by the induction hypothesis there exist a finite number of T-transforms T_2, ..., T_r on ℝⁿ⁻¹ such that x′ = (T_r ··· T_2)y′. We can regard each of them as a T-transform on ℝⁿ if we prohibit them from touching the first coordinate of any vector. We then have

\[ x = (T_r \cdots T_2 T_1) y, \]

and that is what we wanted to prove. Now note that a T-transform is a convex combination of the identity map and some permutation. So a product of such maps is a convex combination of permutations. Hence (ii) ⟹ (iii). The implication (iii) ⟹ (iv) is obvious, and (iv) ⟹ (i) is a consequence of Theorem II.1.9. ∎

A consequence of the above theorem is that the set {x : x ≺ y} is the convex hull of all points obtained from y by permuting its coordinates.
Exercise II.1.11 If U = (u_ij) is a unitary matrix, then the matrix (|u_ij|²) is doubly stochastic. Such a doubly stochastic matrix is called unitary-stochastic; it is called orthostochastic if U is real orthogonal. Show that if x = Ay for some doubly stochastic matrix A, then there exists an orthostochastic matrix B such that x = By. (Use induction.)

Exercise II.1.12 Let A be an n × n Hermitian matrix. Let diag(A) denote the vector whose coordinates are the diagonal entries of A and λ(A) the vector whose coordinates are the eigenvalues of A specified in any order. Show that

\[ \operatorname{diag}(A) \prec \lambda(A). \tag{II.14} \]

This is sometimes referred to as Schur's Theorem.
Exercise II.1.13 Use the majorisation (II.14) to prove that if λ_j↓(A) denote the eigenvalues of an n × n Hermitian matrix arranged in decreasing order, then for all k = 1, 2, ..., n

\[ \sum_{j=1}^{k} \lambda_j^{\downarrow}(A) = \max \sum_{j=1}^{k} \langle x_j, A x_j \rangle, \tag{II.15} \]

where the maximum is taken over all orthonormal k-tuples of vectors {x_1, ..., x_k} in ℂⁿ. This is Ky Fan's maximum principle. (See Problem I.6.15 also.) Show that the majorisation (II.14) can be derived from (II.15). The two statements are, thus, equivalent.
Exercise II.1.14 Let A, B be Hermitian matrices. Then for all k = 1, 2, ..., n,

\[ \sum_{j=1}^{k} \lambda_j^{\downarrow}(A + B) \le \sum_{j=1}^{k} \lambda_j^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B). \tag{II.16} \]

Exercise II.1.15 For any matrix A, let Ã be the Hermitian matrix

\[ \tilde{A} = \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix} . \tag{II.17} \]

Then the eigenvalues of Ã are the singular values of A together with their negatives. Denote the singular values of A arranged in decreasing order by s_1(A), ..., s_n(A). Show that for any two n × n matrices A, B and for any k = 1, 2, ..., n

\[ \sum_{j=1}^{k} s_j(A + B) \le \sum_{j=1}^{k} s_j(A) + \sum_{j=1}^{k} s_j(B). \tag{II.18} \]

When k = 1, this is just the triangle inequality for the operator norm ‖A‖. For each 1 ≤ k ≤ n, define ‖A‖_(k) = Σ_{j=1}^{k} s_j(A). From (II.18) it follows that ‖A‖_(k) defines a norm. These norms are called the Ky Fan k-norms.
II.2 Birkhoff's Theorem
We start with a combinatorial problem known as the Matching Problem. Let B = {b_1, ..., b_n} and G = {g_1, ..., g_n} be two sets of n elements each, and let R be a subset of B × G. When does there exist a bijection f from B to G whose graph is contained in R? This is called the Matching Problem or the Marriage Problem for the following reason. Think of B as a set of boys, G as a set of girls, and (b_i, g_j) ∈ R as saying that the boy b_i knows the girl g_j. Then the above question can be phrased as: when can one arrange a monogamous marriage in which each boy gets married to a girl he knows? We will call such a matching a compatible matching.

For each i let G_i = {g_j : (b_i, g_j) ∈ R}. This represents the set of girls whom the boy b_i knows. For each k-tuple of indices 1 ≤ i_1 < ··· < i_k ≤ n, let G_{i_1···i_k} = ∪_{r=1}^{k} G_{i_r}. This represents the set of girls each of whom is known to one of the boys b_{i_1}, ..., b_{i_k}. Clearly a necessary condition for a compatible matching to be possible is that |G_{i_1···i_k}| ≥ k for all k = 1, 2, ..., n. Hall's Marriage Theorem says that this condition is sufficient as well.

Theorem II.2.1 (Hall) A compatible matching between B and G can be found if and only if

\[ | G_{i_1 \cdots i_k} | \ge k \quad \text{for all } 1 \le i_1 < \cdots < i_k \le n, \; k = 1, 2, \ldots, n. \tag{II.19} \]

Proof. Only the sufficiency of the condition needs to be proved. This is done by induction on n. Obviously, the Theorem is true when n = 1. First assume that |G_{i_1···i_k}| ≥ k + 1 for all 1 ≤ i_1 < ··· < i_k ≤ n, 1 ≤ k < n. In other words, if 1 ≤ k < n, then every set of k boys together knows at least k + 1 girls. Pick any boy and marry him to one of the girls he knows. This leaves n − 1 boys and n − 1 girls; condition (II.19) still holds, and hence the remaining boys and girls can be compatibly matched.

If the above assumption is not met, then there exist k indices i_1, ..., i_k, k < n, for which |G_{i_1···i_k}| = k. In other words, there exist k boys who together know exactly k girls. By the induction hypothesis these k boys and girls can be compatibly matched. Now we are left with n − k unmarried boys and as many unmarried girls. If some set of h of these boys knew fewer than h of these remaining girls, then together with the earlier k these h + k boys would have known fewer than h + k girls. (The earlier k boys did not know any of the present n − k maidens.)
So, condition (II.19) is satisfied for the remaining n − k boys and girls, who can now be compatibly married by the induction hypothesis. ∎
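For small instances, Hall's condition and the existence of a compatible matching can both be checked by brute force, which makes the equivalence in Theorem II.2.1 easy to experiment with (a sketch; the dictionaries encode the sets G_i):

```python
from itertools import combinations, permutations

def hall(knows):
    # Hall's condition (II.19): every set of k boys knows >= k girls
    n = len(knows)
    return all(len(set().union(*(knows[b] for b in S))) >= k
               for k in range(1, n + 1)
               for S in combinations(range(n), k))

def matchable(knows):
    # brute force: does some bijection boys -> girls respect "knows"?
    n = len(knows)
    return any(all(s[b] in knows[b] for b in range(n))
               for s in permutations(range(n)))

good = {0: {0, 1}, 1: {1}, 2: {1, 2}}
bad = {0: {1}, 1: {1}, 2: {0, 1, 2}}    # boys 0 and 1 know only girl 1
assert hall(good) and matchable(good)
assert not hall(bad) and not matchable(bad)
```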
Exercise II.2.2 (The König–Frobenius Theorem) Let A = (a_ij) be an n × n matrix. If σ is a permutation on n symbols, the set {a_{1σ(1)}, a_{2σ(2)}, ..., a_{nσ(n)}} is called a diagonal of A. Each diagonal contains exactly one element from each row and from each column of A. Show that the following two statements are equivalent:

(i) every diagonal of A contains a zero element;

(ii) A has a k × ℓ submatrix with all entries zero for some k, ℓ such that k + ℓ > n.
One can see that the statement of the König–Frobenius Theorem is equivalent to that of Hall's Theorem.

Theorem II.2.3 (Birkhoff's Theorem) The set of n × n doubly stochastic matrices is a convex set whose extreme points are the permutation matrices.

Proof. We have already made a note of the easy part of this theorem in Exercise II.1.6. The harder part is showing that every extreme point is a permutation matrix. For this we need to show that each doubly stochastic matrix is a convex combination of permutation matrices.

This is proved by induction on the number of positive entries of the matrix. Note that if A is doubly stochastic, then it has at least n positive entries. If the number of positive entries is exactly n, then A is a permutation matrix.

We first show that if A is doubly stochastic, then A has at least one diagonal with no zero entry. Choose any k × ℓ submatrix of zeroes that A might have. We can find permutation matrices P_1, P_2 such that P_1AP_2 has the form

\[ P_1 A P_2 = \begin{pmatrix} 0 & B \\ C & D \end{pmatrix}, \]

where 0 is a k × ℓ matrix with all entries zero. Since P_1AP_2 is again doubly stochastic, the rows of B and the columns of C each add up to 1. Hence k + ℓ ≤ n. So at least one diagonal of A must have all its entries positive, by the König–Frobenius Theorem.

Choose any such positive diagonal and let a be the smallest of the elements of this diagonal. If A is not a permutation matrix, then a < 1. Let P be the permutation matrix obtained by putting ones on this diagonal and let

\[ B = \frac{A - aP}{1 - a} . \]

Then B is doubly stochastic and has at least one more zero entry than A has. So by the induction hypothesis B is a convex combination of permutation matrices. Hence so is A, since A = (1 − a)B + aP. ∎
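The proof is constructive, and for small matrices it can be run directly: repeatedly pick a diagonal with positive entries, subtract the largest feasible multiple of the corresponding permutation matrix, and stop when nothing is left. A brute-force sketch (it searches all diagonals, which the proof does not need to do):

```python
from itertools import permutations

def birkhoff(A, tol=1e-12):
    """Return [(weight, sigma), ...] with A = sum of weight * P_sigma."""
    A = [row[:] for row in A]
    n = len(A)
    out = []
    while True:
        # diagonal whose smallest entry is largest (hence positive if any is)
        best = max(permutations(range(n)),
                   key=lambda s: min(A[i][s[i]] for i in range(n)))
        a = min(A[i][best[i]] for i in range(n))
        if a < tol:
            break
        out.append((a, best))
        for i in range(n):
            A[i][best[i]] -= a
    return out

A = [[0.5, 0.5], [0.5, 0.5]]
decomp = birkhoff(A)
assert abs(sum(w for w, _ in decomp) - 1.0) < 1e-9
```

Here `decomp` expresses A as half the identity permutation plus half the transposition, in line with the theorem.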
Remark. There are n! permutation matrices of size n. Birkhoff's Theorem tells us that every n × n doubly stochastic matrix is a convex combination of these n! matrices. This number can be reduced as a consequence of a general theorem of Carathéodory. This says that if X is a subset of an m-dimensional linear variety in ℝᴺ, then any point in the convex hull of X can be expressed as a convex combination of at most m + 1 points of X. Using this theorem one sees that every n × n doubly stochastic matrix can be expressed as a convex combination of at most n² − 2n + 2 permutation matrices.

Doubly substochastic matrices defined below are related to weak majorisation in the same way as doubly stochastic matrices are related to majorisation. A matrix B = (b_ij) is called doubly substochastic if

\[ b_{ij} \ge 0 \quad \text{for all } i, j, \]

\[ \sum_{i=1}^{n} b_{ij} \le 1 \quad \text{for all } j, \]

\[ \sum_{j=1}^{n} b_{ij} \le 1 \quad \text{for all } i. \]
Exercise II.2.4 B is doubly substochastic if and only if it is positivity-preserving, Be ≤ e, and B*e ≤ e.

Exercise II.2.5 Every square submatrix of a doubly stochastic matrix is doubly substochastic. Conversely, every doubly substochastic matrix B can be dilated to a doubly stochastic matrix A. Moreover, if B is an n × n matrix, then this dilation A can be chosen to have size at most 2n × 2n. Indeed, if R and C are the diagonal matrices whose jth diagonal entries are the sums of the jth rows and the jth columns of B, respectively, then

\[ A = \begin{pmatrix} B & I - R \\ I - C & B^* \end{pmatrix} \]

is a doubly stochastic matrix.
Exercise II.2.6 The set of all n x n doubly substochastic matrices is con vex; its extreme points are matrices having at most one entry 1 in each row and each column and all other entries zero. Exercise II. 2 . 7 A matrix B with nonnegative entries is doubly sub stochas tic if and only if there exists a doubly stochastic matrix A such that b ij < a ij for all i , j == 1 , 2, . . , n .
.
Our next theorem connects doubly substochastic matrices to weak majorisation.
II.2 Birkhoff's Theorem
Theorem II.2.8 (i) Let x, y be two vectors with nonnegative coordinates. Then x ≺_w y if and only if x = By for some doubly substochastic matrix B.
(ii) Let x, y ∈ ℝⁿ. Then x ≺_w y if and only if there exists a vector u such that x ≤ u and u ≺ y.

Proof. If x, u ∈ ℝⁿ and x ≤ u, then clearly x ≺_w u. So, if in addition u ≺ y, then x ≺_w y.

Now suppose that x, y are nonnegative vectors and x = By for some doubly substochastic matrix B. By Exercise II.2.7 we can find a doubly stochastic matrix A such that b_ij ≤ a_ij for all i, j. Then x = By ≤ Ay. Hence, x ≺_w y.

Conversely, let x, y be nonnegative vectors such that x ≺_w y. We want to prove that there exists a doubly substochastic matrix B for which x = By. If x = 0, we can choose B = 0, and if x ≺ y, we can even choose B to be doubly stochastic by Theorem II.1.10. So, assume that neither of these is the case. Let r be the smallest of the positive coordinates of x, and let s = Σyⱼ − Σxⱼ. By assumption s > 0. Choose a positive integer m such that r > s/m. Dilate both vectors x and y to (n + m)-dimensional vectors x′, y′ defined as

x′ = (x₁, …, xₙ, s/m, …, s/m),
y′ = (y₁, …, yₙ, 0, …, 0).

Then x′ ≺ y′. Hence x′ = Ay′ for some doubly stochastic matrix A of size n + m. Let B be the n × n submatrix of A sitting in the top left corner. Then B is doubly substochastic and x = By. This proves (i).

Finally, let x, y ∈ ℝⁿ and x ≺_w y. Choose a positive number t so that x + te and y + te are both nonnegative, where e = (1, 1, …, 1). We still have x + te ≺_w y + te. So, by (i) there exists a doubly substochastic matrix B such that x + te = B(y + te). By Exercise II.2.7 we can find a doubly stochastic matrix A such that b_ij ≤ a_ij for all i, j. But then x + te ≤ A(y + te) = Ay + te. Hence, if u = Ay, then x ≤ u and u ≺ y. ∎

Exercise II.2.9 A matrix A is doubly substochastic if and only if for every x ≥ 0 we have Ax ≥ 0 and Ax ≺_w x. (Compare with Theorem II.1.9.)

Exercise II.2.10 Let x, y ∈ ℝⁿ and let x ≥ 0, y ≥ 0. Then x ≺_w y if and only if x is in the convex hull of the 2ⁿn! points obtained from y by permutations and sign changes of its coordinates (i.e., vectors of the form (±y_σ(1), ±y_σ(2), …, ±y_σ(n)), where σ is a permutation).
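Part (i) of Theorem II.2.8 can be checked numerically: applying any doubly substochastic matrix to a nonnegative vector produces a weakly majorised one. A small sketch (the random data and the rescaling trick are illustrative choices):

```python
import numpy as np

def weakly_majorised(x, y, tol=1e-12):
    """x <_w y : compare partial sums of the decreasingly sorted vectors."""
    return np.all(np.cumsum(np.sort(x)[::-1]) <= np.cumsum(np.sort(y)[::-1]) + tol)

rng = np.random.default_rng(0)
y = rng.uniform(0.0, 1.0, size=4)               # nonnegative vector
B = rng.uniform(0.0, 1.0, size=(4, 4))
B /= max(B.sum(axis=0).max(), B.sum(axis=1).max())   # force all line sums <= 1
x = B @ y                                        # then x <_w y by the theorem
```

The scaling makes every row and column sum of B at most 1, so B is doubly substochastic and the theorem applies.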
II.3 Convex and Monotone Functions

In this section we will study maps from ℝⁿ to ℝᵐ that preserve various orders. Let f : ℝ → ℝ be any function. We will denote the map induced by f on ℝⁿ also by f; i.e., f(x) = (f(x₁), …, f(xₙ)) for x ∈ ℝⁿ. An elementary and useful characterisation of majorisation is the following.
Theorem II.3.1 Let x, y ∈ ℝⁿ. Then the following two conditions are equivalent:
(i) x ≺ y.
(ii) tr φ(x) ≤ tr φ(y) for all convex functions φ from ℝ to ℝ.

Proof. Let x ≺ y. Then x = Ay for some doubly stochastic matrix A. So xᵢ = Σ_{j=1}^n a_ij yⱼ, where a_ij ≥ 0 and Σ_{j=1}^n a_ij = 1. Hence for every convex function φ, φ(xᵢ) ≤ Σ_{j=1}^n a_ij φ(yⱼ). Summing over i gives Σᵢ φ(xᵢ) ≤ Σⱼ φ(yⱼ); i.e., tr φ(x) ≤ tr φ(y). Conversely, the functions t, −t, and (t − a)₊, for each real a, are all convex. Now apply Theorem II.1.3 (iii). ∎

Exercise II.3.2 For x, y ∈ ℝⁿ the following two conditions are equivalent:
(i) x ≺_w y.
(ii) tr φ(x) ≤ tr φ(y) for all monotonically increasing convex functions φ from ℝ to ℝ.
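A quick numerical illustration of Theorem II.3.1 (the particular vectors and the two convex test functions t² and |t| are illustrative choices):

```python
import numpy as np

y = np.array([5.0, 3.0, -2.0, 0.0])
A = 0.5 * np.eye(4) + 0.5 * np.full((4, 4), 0.25)   # a doubly stochastic matrix
x = A @ y                                            # an averaging of y, so x < y

# tr phi(x) <= tr phi(y) for convex phi:
gap_sq = np.square(y).sum() - np.square(x).sum()     # phi(t) = t^2
gap_abs = np.abs(y).sum() - np.abs(x).sum()          # phi(t) = |t|
```

Both gaps come out nonnegative, as the theorem requires for any convex φ.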
Note that in the two statements above it suffices to consider only continuous functions.

A real valued function φ on ℝⁿ is called Schur-convex or S-convex if

x ≺ y ⟹ φ(x) ≤ φ(y). (II.20)

(This terminology might seem somewhat inappropriate because the condition (II.20) expresses preservation of order rather than convexity. However, the above two propositions do show that ordinary convex functions are related to this notion. Also, if x ≺ y, then x is obtained from y by an averaging procedure. The condition (II.20) says that the value of φ is diminished when such a procedure is applied to its argument. Later on, we will come across other notions of averaging, and corresponding notions of convexity.) We will study more general maps that include Schur-convex maps.
Consider maps Φ : ℝⁿ → ℝᵐ. The domain of Φ will be either all of ℝⁿ or some convex set invariant under coordinate permutations of its elements. Such a map Φ will be called

monotone increasing if x ≤ y ⟹ Φ(x) ≤ Φ(y),
monotone decreasing if −Φ is monotone increasing,
convex if Φ(tx + (1 − t)y) ≤ tΦ(x) + (1 − t)Φ(y), 0 ≤ t ≤ 1,
concave if −Φ is convex,
isotone if x ≺ y ⟹ Φ(x) ≺_w Φ(y),
strongly isotone if x ≺_w y ⟹ Φ(x) ≺_w Φ(y),
and strictly isotone if x ≺ y ⟹ Φ(x) ≺ Φ(y).

Note that when m = 1 isotone maps are precisely the Schur-convex maps. The next few propositions provide examples of such maps. We will denote by Sₙ the group of n × n permutation matrices.
Theorem II.3.3 Let Φ : ℝⁿ → ℝᵐ be a convex map. Suppose that for any P ∈ Sₙ there exists P′ ∈ Sₘ such that

Φ(Px) = P′Φ(x) (II.21)

for all x ∈ ℝⁿ. Then Φ is isotone. If, in addition, Φ is monotone increasing, then Φ is strongly isotone.

Proof. Let x ≺ y in ℝⁿ. By Theorem II.1.10 there exist P₁, …, P_N in Sₙ and positive real numbers t₁, …, t_N with Σtⱼ = 1 such that x = Σtⱼ Pⱼ y. So, by the convexity of Φ,

Φ(x) ≤ Σtⱼ Φ(Pⱼ y) = Σtⱼ Pⱼ′ Φ(y) = z, say.

Then z ≺ Φ(y). So Φ(x) ≤ z and z ≺ Φ(y), and hence Φ(x) ≺_w Φ(y). Thus Φ is isotone.

Now suppose Φ is also monotone increasing. Let u ≺_w y. Then by Theorem II.2.8 there exists x such that u ≤ x and x ≺ y. Hence Φ(u) ≤ Φ(x) ≺_w Φ(y). So Φ is strongly isotone. ∎
Corollary II.3.4 If φ : ℝ → ℝ is a convex function, then the induced map φ : ℝⁿ → ℝⁿ is isotone. If, in addition, φ is monotone increasing, then this induced map is strongly isotone.

Proof. The induced map is convex and satisfies the condition (II.21) with P′ = P, so Theorem II.3.3 yields the above corollary. ∎

Example II.3.5 From the above results we can conclude that
(i) x ≺ y in ℝⁿ ⟹ |x| ≺_w |y|.
(ii) x ≺ y in ℝⁿ ⟹ x² ≺_w y².
(iii) x ≺_w y in ℝⁿ₊ ⟹ xᵖ ≺_w yᵖ for p ≥ 1.
(iv) x ≺_w y in ℝⁿ ⟹ x₊ ≺_w y₊.
(v) If φ is any function such that φ(eᵗ) is convex and monotone increasing in t, then log x ≺_w log y in ℝⁿ₊ ⟹ φ(x) ≺_w φ(y).
(vi) log x ≺_w log y in ℝⁿ₊ ⟹ x ≺_w y.
(vii) For x, y ∈ ℝⁿ₊ with coordinates arranged in decreasing order,

Π_{j=1}^k xⱼ ≤ Π_{j=1}^k yⱼ, 1 ≤ k ≤ n ⟹ Σ_{j=1}^k xⱼ ≤ Σ_{j=1}^k yⱼ, 1 ≤ k ≤ n.

Here ℝⁿ₊ stands for the collection of vectors x ≥ 0 (or, at places, x > 0). All functions are understood in the coordinatewise sense. Thus, e.g., |x| = (|x₁|, …, |xₙ|).
As an application we have the following very useful theorem.

Theorem II.3.6 (Weyl's Majorant Theorem) Let A be an n × n matrix with singular values s₁ ≥ ⋯ ≥ sₙ and eigenvalues λ₁, …, λₙ arranged in such a way that |λ₁| ≥ ⋯ ≥ |λₙ|. Then for every function φ : ℝ₊ → ℝ₊ such that φ(eᵗ) is convex and monotone increasing in t, we have

(φ(|λ₁|), …, φ(|λₙ|)) ≺_w (φ(s₁), …, φ(sₙ)). (II.22)

In particular, we have

(|λ₁|ᵖ, …, |λₙ|ᵖ) ≺_w (s₁ᵖ, …, sₙᵖ) (II.23)

for all p > 0.
Proof. The spectral radius of a matrix is bounded by its operator norm. Hence, |λ₁| ≤ s₁. Apply this argument to the antisymmetric tensor powers ⋀ᵏA. This gives

Π_{j=1}^k |λⱼ| ≤ Π_{j=1}^k sⱼ, 1 ≤ k ≤ n. (II.24)

Now use the assertion of II.3.5 (vii). ∎

Note that we have

Π_{j=1}^n |λⱼ| = Π_{j=1}^n sⱼ, (II.25)
both the expressions being equal to (det A*A)^{1/2}.

Remark II.3.7 Returning to Theorem II.3.3, we note that when m = 1 the condition (II.21) just says that

Φ(Px) = Φ(x) (II.26)

for all x ∈ ℝⁿ and P ∈ Sₙ. So, in this case Theorem II.3.3 says that if a function Φ : ℝⁿ → ℝ is convex and permutation invariant, then it is isotone (i.e., Schur-convex).

Also note that every isotone function Φ from ℝⁿ to ℝ has to be permutation invariant, because Px and x majorise each other, and hence isotony of Φ implies equality of Φ(Px) and Φ(x).

Exercise II.3.8 Let ψ : ℝⁿ → ℝ be a convex function and let Φ(x) = max_{P ∈ Sₙ} ψ(Px). Then Φ is isotone. If, in addition, ψ is monotone increasing, then Φ is strongly isotone.

Exercise II.3.9 Let φ : ℝ → ℝ be convex. For each k = 1, 2, …, n, define functions φ^(k) : ℝⁿ → ℝ by

φ^(k)(x) = max_σ Σ_{j=1}^k φ(x_σ(j)),

where σ runs over all permutations on n symbols. Then φ^(k) is isotone. If, in addition, φ is monotone increasing, then φ^(k) is strongly isotone. Note that this applies, in particular, to

φ^(n)(x) = Σ_{j=1}^n φ(xⱼ) = tr φ(x).

Compare this with Theorem II.3.1. The special choice φ(t) = t gives

φ^(k)(x) = Σ_{j=1}^k xⱼ↓.
Example II.3.10 For x ∈ ℝⁿ let x̄ = (1/n) Σⱼ xⱼ. Let

V(x) = (1/n) Σⱼ (xⱼ − x̄)².

This is called the variance function. Since the maps x ↦ (xⱼ − x̄)² are convex, V(x) is isotone (i.e., Schur-convex).

Example II.3.11 For x ∈ ℝⁿ₊ let

H(x) = −Σⱼ xⱼ log xⱼ,

where by convention we put t log t = 0 if t = 0. Then H is called the entropy function. Since the function f(t) = t log t is convex for t ≥ 0, we see that −H(x) is isotone. (This is sometimes expressed by saying that the entropy function is anti-isotone or Schur-concave on ℝⁿ₊.) In particular, if xⱼ ≥ 0 and Σxⱼ = 1 we have

H(1, 0, …, 0) ≤ H(x₁, …, xₙ) ≤ H(1/n, …, 1/n),

which is a basic fact about entropy.

Example II.3.12 For p ≥ 1 the function
Φ(x) = Σ_{j=1}^n xⱼᵖ

is isotone on ℝⁿ₊. In particular, if xⱼ ≥ 0 and Σxⱼ = 1, we have

(n + 1)ᵖ / nᵖ⁻¹ ≤ Σ_{j=1}^n (xⱼ + 1)ᵖ.

Example II.3.13 A function Φ : ℝⁿ → ℝ₊ is called a symmetric gauge function if
(i) Φ is a norm on ℝⁿ,
(ii) Φ(Px) = Φ(x) for all x ∈ ℝⁿ, P ∈ Sₙ,
(iii) Φ(ε₁x₁, …, εₙxₙ) = Φ(x₁, …, xₙ) if εⱼ = ±1,
(iv) Φ(1, 0, …, 0) = 1.
(The last condition is an inessential normalisation.) Examples of symmetric gauge functions are

Φ_p(x) = (Σ_{j=1}^n |xⱼ|ᵖ)^{1/p}, 1 ≤ p < ∞,
Φ_∞(x) = max_{1 ≤ j ≤ n} |xⱼ|.
These norms are commonly used in functional analysis. If the coordinates of x are arranged so as to have |x₁| ≥ ⋯ ≥ |xₙ|, then

Φ_(k)(x) = Σ_{j=1}^k |xⱼ|, 1 ≤ k ≤ n,

is also a symmetric gauge function. This is a consequence of the majorisations (II.29) and (i) in Example II.3.5. Every symmetric gauge function is convex on ℝⁿ and is monotone on ℝⁿ₊ (Problem II.5.11). Hence by Theorem II.3.3 it is strongly isotone; i.e.,

x ≺_w y in ℝⁿ₊ ⟹ Φ(x) ≤ Φ(y).
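The strong-isotonicity statement can be sanity-checked numerically for the Φ_p family (the two test vectors below are an illustrative choice satisfying x ≺_w y):

```python
import numpy as np

def phi_p(x, p):
    """The symmetric gauge function Phi_p (l_p norm)."""
    x = np.abs(np.asarray(x, dtype=float))
    return x.max() if np.isinf(p) else (x ** p).sum() ** (1.0 / p)

x = np.array([2.0, 1.0, 1.0])   # partial sums: 2, 3, 4
y = np.array([3.0, 2.0, 0.5])   # partial sums: 3, 5, 5.5  =>  x <_w y
checks = [phi_p(x, p) <= phi_p(y, p) for p in (1, 1.5, 2, np.inf)]
```

Every Φ_p value of x is dominated by that of y, as strong isotonicity predicts.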
For differentiable functions there are necessary and sufficient conditions characterising Schur-convexity:

Theorem II.3.14 A differentiable function φ : ℝⁿ → ℝ is isotone if and only if
(i) φ is permutation invariant, and
(ii) for each x ∈ ℝⁿ and for all i, j,

(xᵢ − xⱼ) (∂φ/∂xᵢ (x) − ∂φ/∂xⱼ (x)) ≥ 0.

Proof. We have already observed that every isotone function is permutation invariant. To see that it also satisfies (ii), let i = 1, j = 2, without any loss of generality. For 0 ≤ t ≤ 1 let

x(t) = ((1 − t)x₁ + tx₂, tx₁ + (1 − t)x₂, x₃, …, xₙ). (II.27)

Then x(t) ≺ x = x(0). Hence φ(x(t)) ≤ φ(x(0)), and therefore

0 ≥ [d/dt φ(x(t))]_{t=0} = −(x₁ − x₂) (∂φ/∂x₁ (x) − ∂φ/∂x₂ (x)).

This proves (ii).

Conversely, suppose φ satisfies (i) and (ii). We want to prove that φ(u) ≤ φ(x) whenever u ≺ x. Using (i), we may assume that

u = ((1 − s)x₁ + sx₂, sx₁ + (1 − s)x₂, x₃, …, xₙ)

for some 0 ≤ s ≤ 1/2. Let x(t) be as in (II.27). Then

φ(u) − φ(x) = ∫₀ˢ d/dt φ(x(t)) dt
= −∫₀ˢ (x₁ − x₂) [∂φ/∂x₁ (x(t)) − ∂φ/∂x₂ (x(t))] dt
= −∫₀ˢ [(x(t)₁ − x(t)₂)/(1 − 2t)] [∂φ/∂x₁ (x(t)) − ∂φ/∂x₂ (x(t))] dt
≤ 0,

because of (ii) and the condition 0 ≤ s ≤ 1/2. ∎
Example II.3.15 (A Schur-convex function that is not convex) Let φ : I² → ℝ, where I = (0, 1), be the function

φ(x₁, x₂) = log(1/x₁ − 1) + log(1/x₂ − 1).

Using Theorem II.3.14 one can check that φ is Schur-convex on the set {(x₁, x₂) ∈ I² : x₁ + x₂ ≤ 1}. However, the function log(1/x − 1) is convex on (0, 1/2] but not on [1/2, 1).
Using Theorem 11. 3. 14 one can check that is Schurconvex on the set However, the function log( �  1 ) is convex on (0, � ] but not on [� , 1 ) . Example II.3.16 {The elementary symmetric polynomials) For each k == 1 , 2, · · · , n, let Sk IRn � IR be the functions :
Sk (x)
==
L
1 < i1
X i 1 X i2
•
•
Xi k .
•
These are called the elementary symmetric polynomials of the variables x 1 , . . . , X n . These are invariant under permutations. We have the identities n
and
Sk (X I , · · · , Xi , . . . , x n )  Sk ( XI , · · · , Xj , · · · , X n )
== ( xi  x i ) Sk l ( xl , . , x i , . . , xi , . .
.
.
.
.
, xn ) ,
where the circumflex indicates that the term below it has been omitted. l.lsing these one finds via Theorem 11. 3. 14 that each Sk is Schurconcave; i. e. ,  Sk is isotone, on IR� . The special case k
n
==
n says that if x , y E IR� and x < y, then II Xj j= l
n
>
II Yi ·
j= l
Theorem II.3.17 (The Hadamard Determinant Theorem) If A is an n × n positive matrix, then

det A ≤ Π_{j=1}^n aⱼⱼ.

Proof. Use Schur's Theorem (Exercise II.1.12) and the above statement about the Schur-concavity of the function f(x) = Πⱼ xⱼ on ℝⁿ₊. ∎
More generally, if λ₁, …, λₙ are the eigenvalues of a positive matrix A, we have for k = 1, 2, …, n

Sₖ(λ₁, …, λₙ) ≤ Sₖ(a₁₁, …, aₙₙ). (II.28)

Exercise II.3.18 If A is an m × n complex matrix, then

det(AA*) ≤ Π_{i=1}^m Σ_{j=1}^n |a_ij|².

(See Exercise I.1.3.)

Exercise II.3.19 Show that the ratio Sₖ(x)/Sₖ₋₁(x) is Schur-concave on the set of positive vectors for k = 2, …, n. Hence, if A is a positive matrix, then

Sₙ(a₁₁, …, aₙₙ)/Sₙ₋₁(a₁₁, …, aₙₙ) ≥ Sₙ(λ₁, …, λₙ)/Sₙ₋₁(λ₁, …, λₙ), …,
S₂(a₁₁, …, aₙₙ)/S₁(a₁₁, …, aₙₙ) ≥ S₂(λ₁, …, λₙ)/S₁(λ₁, …, λₙ),

while S₁(a₁₁, …, aₙₙ) = tr A = S₁(λ₁, …, λₙ).

Proposition II.3.20 If A is an n × n positive definite matrix, then

(det A)^{1/n} = min { (1/n) tr AB : B is positive and det B = 1 }.

If A is positive semidefinite, then the same relation holds with min replaced by inf.

Proof. It suffices to prove the statement about positive definite matrices; the semidefinite case follows by a continuity argument. Using the spectral theorem and the cyclicity of the trace, the general case of the proposition can be reduced to the special case when A is diagonal. So, let A be diagonal with diagonal entries λ₁, …, λₙ. Then, using the arithmetic-geometric mean inequality and Theorem II.3.17, we have

(1/n) tr AB = (1/n) Σⱼ λⱼ bⱼⱼ ≥ (Πⱼ λⱼ)^{1/n} (Πⱼ bⱼⱼ)^{1/n} ≥ (det A)^{1/n} (det B)^{1/n},

for every positive matrix B. Hence, (1/n) tr AB ≥ (det A)^{1/n} if det B = 1. When B = (det A)^{1/n} A⁻¹ this becomes an equality. ∎
Corollary II.3.21 (The Minkowski Determinant Theorem) If A, B are n × n positive matrices, then

(det(A + B))^{1/n} ≥ (det A)^{1/n} + (det B)^{1/n}.
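A numerical check of the Minkowski Determinant Theorem on random positive definite pairs (again an illustrative construction, not from the book):

```python
import numpy as np

rng = np.random.default_rng(3)
n, ok = 4, True
for _ in range(10):
    X, Y = rng.normal(size=(n, n)), rng.normal(size=(n, n))
    A, B = X @ X.T + np.eye(n), Y @ Y.T + np.eye(n)   # positive definite
    lhs = np.linalg.det(A + B) ** (1 / n)
    rhs = np.linalg.det(A) ** (1 / n) + np.linalg.det(B) ** (1 / n)
    ok = ok and lhs >= rhs * (1 - 1e-9)
```

In every trial the n-th root of det(A + B) dominates the sum of the n-th roots.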
II.4 Binary Algebraic Operations and Majorisation
For x ∈ ℝⁿ we have seen in Section II.1 that

Σ_{j=1}^k xⱼ↓ = max_{|I| = k} Σ_{j ∈ I} xⱼ.

It follows that if x, y ∈ ℝⁿ, then

x + y ≺ x↓ + y↓. (II.29)

In this section we will study majorisation relations of this form for sums, products, and other functions of two vectors.

A map φ : ℝ² → ℝ is called lattice superadditive if

φ(s₁ ∨ s₂, t₁ ∨ t₂) + φ(s₁ ∧ s₂, t₁ ∧ t₂) ≥ φ(s₁, t₁) + φ(s₂, t₂). (II.30)

We will call a map φ monotone if it is either monotonically increasing or monotonically decreasing in each of its arguments. In this section we will adopt the following notation. Given φ : ℝ² → ℝ, we will denote by Φ the map from ℝⁿ × ℝⁿ to ℝⁿ defined as

Φ(x, y) = (φ(x₁, y₁), …, φ(xₙ, yₙ)). (II.31)

Example II.4.1 (i) φ(s, t) = s + t is a monotone and lattice superadditive function on ℝ².
(ii) φ(s, t) = st is a monotone and lattice superadditive function on ℝ²₊.
For (i) above we have Φ(x, y) = (x₁ + y₁, …, xₙ + yₙ) = x + y for x, y ∈ ℝⁿ, and for (ii) we have Φ(x, y) = (x₁y₁, …, xₙyₙ) = x · y.

Theorem II.4.2
If φ is a monotone and lattice superadditive function on ℝ², then

Φ(x↓, y↑) ≺_w Φ(x, y) ≺_w Φ(x↓, y↓) (II.32)

for all x, y ∈ ℝⁿ.

Proof. Note that if we apply a coordinate permutation simultaneously to x and y, then Φ(x, y) is subjected to the same permutation. So we may assume that x = x↓; i.e., x₁ ≥ x₂ ≥ ⋯ ≥ xₙ. Next note that we can find a finite sequence of vectors u(0), u(1), …, u(N) such that y↓ = u(0), y↑ = u(N), y = u(j) for some 1 ≤ j ≤ N, and each u(k+1) is obtained from u(k) by interchanging two components in such a way as to move from the arrangement y↓ to y↑; i.e., we pick up two indices i < j such that u(k)ᵢ ≥ u(k)ⱼ and interchange these two components to obtain the vector u(k+1). So, to prove (II.32) it suffices to prove

Φ(x, u(k+1)) ≺_w Φ(x, u(k)) (II.33)

for k = 0, 1, …, N − 1. Since we have already assumed x₁ ≥ x₂ ≥ ⋯ ≥ xₙ, to prove (II.33) we need to prove the two-dimensional majorisation

(φ(s₁, t₂), φ(s₂, t₁)) ≺_w (φ(s₁, t₁), φ(s₂, t₂)) (II.34)

if s₁ ≥ s₂ and t₁ ≥ t₂. Now, by the definition of weak majorisation, this is equivalent to the two inequalities

φ(s₁, t₂) ∨ φ(s₂, t₁) ≤ φ(s₁, t₁) ∨ φ(s₂, t₂),
φ(s₁, t₂) + φ(s₂, t₁) ≤ φ(s₁, t₁) + φ(s₂, t₂),

for s₁ ≥ s₂ and t₁ ≥ t₂. The first of these follows from the monotony of φ and the second from the lattice superadditivity. ∎
E
IR+.
where X · y ( x 1Y 1 , ==
Corollary II.4.4
E
�n
xl + y T
<
x+y
<
xl . y T
<w
X y
<w
(x , y)
<
•
, X nYn ) · For x, y E IR. n .
.
x l + yl .
(11.35)
xl . y l '
(11. 36 )
(x l , y l ) .
(II.37)
.
(x l , y T )
<
Proof. If x > 0 and y > 0 , this follows from (II.36) . In the general case , choose t large enough so that x + t e > 0 and y + t e > 0 and apply the • special result.
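The rearrangement inequality (II.37) is easy to check on random data (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
ok = True
for _ in range(20):
    x, y = rng.normal(size=6), rng.normal(size=6)
    xd = np.sort(x)[::-1]            # x sorted decreasingly (x down-arrow)
    yd = np.sort(y)[::-1]            # y sorted decreasingly
    yu = np.sort(y)                  # y sorted increasingly (y up-arrow)
    ok = ok and (xd @ yu - 1e-12 <= x @ y <= xd @ yd + 1e-12)
```

Pairing the sorted sequences similarly maximises the inner product; pairing them oppositely minimises it.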
The inequality (II.37) has a "mechanical" interpretation when x ≥ 0 and y ≥ 0. On a rod fixed at the origin, hang weights yᵢ at the points at distances xᵢ from the origin. The inequality (II.37) then says that the maximum moment is obtained if the heaviest weights are the farthest from the origin.
Exercise II.4.5 The function φ : ℝ² → ℝ defined as φ(s, t) = s ∧ t = min(s, t) is monotone and lattice superadditive on ℝ². Hence, for x, y ∈ ℝⁿ,

x↓ ∧ y↑ ≺_w x ∧ y ≺_w x↓ ∧ y↓.

II.5 Problems

Problem II.5.1. If a doubly stochastic matrix A is invertible and A⁻¹ is also doubly stochastic, then A is a permutation matrix (each of its entries is either 0 or 1).

Problem II.5.2. Let y ∈ ℝⁿ. The set C(y) = {x ∈ ℝⁿ : x ≺ y} is the convex hull of the points (y_σ(1), …, y_σ(n)), where σ varies over permutations. Let a ∈ ℝⁿ₊. The set {x ∈ ℝⁿ₊ : x ≺_w a} is the convex hull of the points (ε₁a_σ(1), …, εₙa_σ(n)), where σ varies over permutations and each εⱼ is either 0 or 1.

Problem II.5.3. Let y ∈ ℝⁿ. The set {x ∈ ℝⁿ : |x| ≺_w |y|} is the convex hull of points of the form (ε₁y_σ(1), …, εₙy_σ(n)), where σ varies over permutations and each εⱼ = ±1.

Problem II.5.4. Let A = [A₁₁ A₁₂; A₂₁ A₂₂] be a 2 × 2 block matrix, and let C(A) = [A₁₁ 0; 0 A₂₂]. If U = [I 0; 0 −I], then we can write

C(A) = ½ (A + UAU*).

Let λ(A) and s(A) denote the n-vectors whose coordinates are the eigenvalues and the singular values of A, respectively. Use (II.18) to show that

s(C(A)) ≺_w s(A).

If A is Hermitian, use (II.16) to show that

λ(C(A)) ≺ λ(A).

Problem II.5.5. More generally, let P₁, …, P_r be a family of mutually orthogonal projections in ℂⁿ such that ΣPⱼ = I. Then the operation

C(A) = Σⱼ Pⱼ A Pⱼ

is called a pinching of A. In an appropriate choice of basis this means that if A = (A_ij), 1 ≤ i, j ≤ r, is written in block form, then C(A) = diag(A₁₁, A₂₂, …, A_rr), the block-diagonal matrix obtained by replacing the off-diagonal blocks of A by zeros. Each such pinching is a product of r − 1 pinchings of the 2 × 2 type introduced in Problem II.5.4. Show that for every pinching C,

s(C(A)) ≺_w s(A) (II.38)

for all matrices A, and

λ(C(A)) ≺ λ(A) (II.39)

for all Hermitian matrices A. When P₁, …, Pₙ are the projections onto the coordinate axes, we get as a special case of (II.38) above

|tr A| ≤ Σ_{j=1}^n sⱼ(A) = ||A||₁. (II.40)

From (II.39) we get as a special case Schur's Theorem

diag(A) ≺ λ(A),
which we saw before in Exercise II.1.12.

Problem II.5.6. Let A be positive. Then

det A ≤ det C(A) (II.41)

for every pinching C. This is called Fischer's inequality and includes the Hadamard Determinant Theorem as a special case.

Problem II.5.7. For each k = 1, 2, …, n and for each pinching C show that for positive definite A

Sₖ(λ(A)) ≤ Sₖ(λ(C(A))), (II.42)

where Sₖ(λ(A)) denotes the kth elementary symmetric polynomial of the eigenvalues of A. This inequality, due to Ostrowski, includes (II.28) as a special case. It also includes (II.41) as a special case.

Problem II.5.8. If ⋀ᵏA denotes the kth antisymmetric tensor power of A, then the above inequality can be written as

tr ⋀ᵏA ≤ tr ⋀ᵏ(C(A)). (II.43)

The operator inequality

⋀ᵏA ≤ ⋀ᵏ(C(A))

is not always true. This is shown by the following example. Let

A = ( 2 0 0 1 )        P = ( 1 0 0 0 )
    ( 0 1 1 0 )            ( 0 1 0 0 )
    ( 0 1 2 0 )            ( 0 0 0 0 )
    ( 1 0 0 1 ),           ( 0 0 0 0 ),

and let C be the pinching induced by the pair of projections P and I − P. (The space ⋀²ℂ⁴ is 6-dimensional.)
Problem II.5.9. Let λ = {λ₁, …, λₙ} and μ = {μ₁, …, μₙ} be two n-tuples of complex numbers. Let

d(λ, μ) = min_σ max_{1 ≤ j ≤ n} |λⱼ − μ_σ(j)|,

where the minimum is taken over all permutations σ on n symbols. This is called the optimal matching distance between the unordered n-tuples λ and μ. It defines a metric on the space ℂⁿ_sym of such n-tuples. Show that we also have

d(λ, μ) = max_{I, J ⊆ {1, 2, …, n}, |I| + |J| = n + 1} min_{i ∈ I, j ∈ J} |λᵢ − μⱼ|.
Problem II.5.10. This problem gives a refinement of Hall's Theorem under an additional assumption that is often fulfilled in matching problems. In the notations introduced at the beginning of Section II.2, define

Bᵢ = {b : the boy b knows the girl gᵢ}, 1 ≤ i ≤ n.

This is the set of boys known to the girl gᵢ. Let

B_{i₁⋯iₖ} = ∪_{r=1}^k B_{iᵣ}, 1 ≤ i₁ < ⋯ < iₖ ≤ n.

Suppose that for each k = 1, 2, …, ⌈n/2⌉ and for every choice of indices 1 ≤ i₁ < ⋯ < iₖ ≤ n,

|B_{i₁⋯iₖ}| ≥ min(2k, n).

Show that then |B_{i₁⋯iₖ}| ≥ k for every k = 1, 2, …, n and every choice of indices. Hence a compatible matching between B and G exists.

Problem II.5.11. (i) Show that every symmetric gauge function is continuous.
(ii) Show that if Φ is a symmetric gauge function, then

Φ_∞(x) ≤ Φ(x) ≤ Φ₁(x) for all x ∈ ℝⁿ.

(iii) If 0 ≤ tⱼ ≤ 1, then

Φ(t₁x₁, …, tₙxₙ) ≤ Φ(x₁, …, xₙ).

(iv) Every symmetric gauge function is monotone on ℝⁿ₊.
(v) If x, y ∈ ℝⁿ and |x| ≤ |y|, then Φ(x) ≤ Φ(y) for every symmetric gauge function Φ.
(vi) If x, y ∈ ℝⁿ₊, then x ≺_w y if and only if Φ(x) ≤ Φ(y) for every symmetric gauge function Φ.

Problem II.5.12. Let f : ℝ₊ → ℝ₊ be a concave function such that f(0) = 0.
(i) Show that f is subadditive: f(a + b) ≤ f(a) + f(b) for all a, b ∈ ℝ₊.
(ii) Let Φ be defined as

Φ(x) = Σⱼ f(xⱼ).

Then Φ is Schur-concave (on ℝᵐ₊ for every m).
(iii) Note that for x, y ∈ ℝⁿ₊, (x, y) ≺ (x + y, 0) in ℝ²ⁿ₊.
(iv) From (ii) and (iii) conclude that the function

F(x) = Σ_{j=1}^n f(|xⱼ|)

is subadditive on ℝⁿ.
(v) Special examples lead to the following inequalities for vectors x, y ∈ ℝⁿ:

Σ_{j=1}^n |xⱼ + yⱼ|ᵖ ≤ Σ_{j=1}^n |xⱼ|ᵖ + Σ_{j=1}^n |yⱼ|ᵖ, 0 < p ≤ 1,

Σ_{j=1}^n log(1 + |xⱼ + yⱼ|) ≤ Σ_{j=1}^n log(1 + |xⱼ|) + Σ_{j=1}^n log(1 + |yⱼ|).
Problem II.5.13. Show that a map φ : ℝ² → ℝ is lattice superadditive if and only if

φ(x₁ + δ₁, x₂ − δ₂) + φ(x₁ − δ₁, x₂ + δ₂) ≤ φ(x₁ + δ₁, x₂ + δ₂) + φ(x₁ − δ₁, x₂ − δ₂)

for all (x₁, x₂) and for all δ₁, δ₂ ≥ 0. If φ is twice differentiable, this is equivalent to the condition

∂²φ/∂s∂t ≥ 0.
Problem II.5.14. Let φ : ℝ² → ℝ be a monotone increasing lattice superadditive function, and let f be a monotone increasing and convex function from ℝ to ℝ. Show that if φ and f are twice differentiable, then the composition f ∘ φ is monotone and lattice superadditive. When φ(s, t) = s + t, show that this is also true if f is monotone decreasing. These statements are also true without any differentiability assumptions.

Problem II.5.15. For x, y ∈ ℝⁿ₊,

−log(x↓ + y↑) ≺_w −log(x + y) ≺_w −log(x↓ + y↓),
log(x↓ · y↑) ≺_w log(x · y) ≺_w log(x↓ · y↓).

From the first of these relations it follows that

Π_{j=1}^n (xⱼ↓ + yⱼ↓) ≤ Π_{j=1}^n (xⱼ + yⱼ) ≤ Π_{j=1}^n (xⱼ↓ + yⱼ↑).
Problem II.5.16. Let x, y, u be vectors in ℝⁿ all having their coordinates in decreasing order. Show that
(i) ⟨x, u⟩ ≤ ⟨y, u⟩ if x ≺ y;
(ii) ⟨x, u⟩ ≤ ⟨y, u⟩ if x ≺_w y and u ∈ ℝⁿ₊.
In particular, this means that if x, y ∈ ℝⁿ, x ≺_w y, and u ∈ ℝⁿ₊, then

⟨x↓, u↓⟩ ≤ ⟨y↓, u↓⟩.

[Use Theorem II.3.14 or the telescopic summation identity

Σ_{j=1}^k aⱼbⱼ = Σ_{j=1}^k (aⱼ − aⱼ₊₁)(b₁ + ⋯ + bⱼ),

where aⱼ, bⱼ, 1 ≤ j ≤ k, are any numbers and aₖ₊₁ = 0.]
II.6 Notes and References
Many of the results of this chapter can be found in the classic Inequalities by G.H. Hardy, J.E. Littlewood, and G. Polya, Cambridge University Press, 1934, which gave the first systematic treatment of this theme. The more recent treatise Inequalities: Theory of Majorization and Its Applications by A.W. Marshall and I. Olkin, Academic Press, 1979, is a much more detailed and exhaustive text devoted entirely to the study of majorisation.
It is an invaluable resource on this topic. For the reader who wants a quicker introduction to the essentials of majorisation and its applications in linear algebra, the survey article Majorization, doubly stochastic matrices and comparison of eigenvalues by T. Ando, Linear Algebra and Its Applications, 118 (1989) 163-248, is undoubtedly the ideal course. Our presentation is strongly influenced by this article, from which we have freely borrowed. The distance d(λ, μ) introduced in Problem II.5.9 is commonly employed in the study of variation of roots of polynomials and eigenvalues of matrices, since these are known with no preferred ordering. See Chapter 6. The result of Problem II.5.10 is due to L. Elsner, C. Johnson, J. Ross, and J. Schonheim, On a generalised matching problem arising in estimating the eigenvalue variation of two matrices, European J. Combinatorics, 4 (1983) 133-136.
Several of the theorems in this chapter have converses. For illustration we mention two of these. Schur's Theorem (II.14) has a converse; it says that if d and λ are real vectors with d ≺ λ, then there exists a Hermitian matrix A whose diagonal entries are the components of d and whose eigenvalues are the components of λ. Weyl's Majorant Theorem (II.3.6) has a converse; it says that if λ₁, …, λₙ are complex numbers and s₁, …, sₙ are positive real numbers ordered as |λ₁| ≥ ⋯ ≥ |λₙ| and s₁ ≥ ⋯ ≥ sₙ, and if

Π_{j=1}^k |λⱼ| ≤ Π_{j=1}^k sⱼ for 1 ≤ k ≤ n,
Π_{j=1}^n |λⱼ| = Π_{j=1}^n sⱼ,

then there exists an n × n matrix A whose eigenvalues are λ₁, …, λₙ and singular values s₁, …, sₙ. For more such theorems, see the book by Marshall and Olkin cited above.

Two results very close to those in II.3.16-II.3.21 and II.5.6-II.5.8 are given below. M. Marcus and L. Lopes, Inequalities for symmetric functions and Hermitian matrices, Canad. J. Math., 9 (1957) 305-312, showed that the map Φ : ℝⁿ₊ → ℝ given by Φ(x) = (Sₖ(x))^{1/k} is Schur-concave for 1 ≤ k ≤ n. Using this they showed that for positive matrices A, B

(Sₖ(λ(A + B)))^{1/k} ≥ (Sₖ(λ(A)))^{1/k} + (Sₖ(λ(B)))^{1/k}. (II.44)

This can also be expressed by saying that the map A ↦ (tr ⋀ᵏA)^{1/k} is concave on the set of positive matrices. For k = n, this reduces to the statement

[det(A + B)]^{1/n} ≥ [det A]^{1/n} + [det B]^{1/n},

which is the Minkowski determinant inequality.
56
II.
Majorisation and Doubly Stochastic Matrices
E . H. Lieb, Convex trace functions and the Wigner YanaseDyson conjec ture, Advances in Math. , 1 1 ( 1973) 267288, proved some striking operator inequalities in connection with the W.Y.D. conj ecture on the concavity of entropy in quantum mechanics. These were proved by different techniques and extended in other directions by T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl. , 26 ( 1979) 203241 . One consequence of these results is the inequality (11.45) for all positive matrices A, B and for all implies that
k == 1 , 2,
. . . , n.
In particular, this
When k == n , this reduces to the Minkowski determinant inequality. Some of these inequalities are proved in Chapter 9.
III Variational Principles for Eigenvalues
In this chapter we will study inequalities that are used for localising the spectrum of a Hermitian operator. Such results are motivated by several interrelated considerations. It is not always easy to calculate the eigenvalues of an operator. However, in many scientific problems it is enough to know that the eigenvalues lie in some specified intervals. Such information is provided by the inequalities derived here. While the functional dependence of the eigenvalues on an operator is quite complicated, several interesting relationships between the eigenvalues of two operators A, B and those of their sum A + B are known. These relations are consequences of variational principles. When the operator B is small in comparison to A, then A + B is considered as a perturbation of A or an approximation to A. The inequalities of this chapter then lead to perturbation bounds or error bounds. Many of the results of this chapter lead to generalisations, or analogues, or open problems in other settings discussed in later chapters.

III.1 The Minimax Principle for Eigenvalues

The following notation will be used throughout this chapter. If A, B are Hermitian operators, we will write their spectral resolutions as Auⱼ = αⱼuⱼ, Bvⱼ = βⱼvⱼ, 1 ≤ j ≤ n, always assuming that the eigenvectors uⱼ and the eigenvectors vⱼ are orthonormal and that α₁ ≥ α₂ ≥ ⋯ ≥ αₙ and β₁ ≥ β₂ ≥ ⋯ ≥ βₙ. When the dependence of the eigenvalues on the operator is to be emphasized, we will write λ↓(A) for the vector with components λ₁↓(A), …, λₙ↓(A), where λⱼ↓(A) are arranged in decreasing order; i.e., λⱼ↓(A) = αⱼ. Similarly, λ↑(A) will denote the vector with components λⱼ↑(A), where λⱼ↑(A) = αₙ₋ⱼ₊₁, 1 ≤ j ≤ n.
.
Theorem 111. 1 . 1 {Poincare's Inequality) Let A be a Hermitian operator on 1t and let M be any kdimensional subspace of 1t. Then there exist unit vectors x, y in M such that (x, Ax ) < Ak (A) and (y, A y ) > � k (A) . Proof. Let N be the subspace spanned by the eigenvectors A corresponding to the eigenvalues Ai (A), . . . , A� (A). Theil:
dim M
+
dim N == n +
U k , . . . , Un
of
1,
and hence the intersection of M and N is nontrivial. Pick up a unit vector x in M n N. Then we can write x
==
n
n
j=k
j= k
L�j Uj , where L 1�i l 2 = 1 . Hence,
n
n
j= k
j=k
This proves the first statement. The second can be obtained by applying this to the operator A instead of A. Equally well, one can repeat the argument, applying it to the given kdimensional space M and the ( n • k + I)dimensional space spanned by u 1 , u2 , . . . , Un  k + l · Corollary III. 1 . 2 {The Minimax Principle) Let A be a Hermitian opera tor on 7t. Then
max
M C 'H. d i m M =k
min
dim
min (x, Ax) EM
x l lx l l=l
M C 'H. M=n  k+l
max (x , A x ) .
xEM l l x l l=l
Proof. By Poincare's inequality, if M is any kdimensional subspace of 1t, then min(x, Ax) < A k ( A ) , where x varies over unit vectors in M . But if M is the span of { u 1 , . , uk } , then this last inequality becomes an equality. That proves the first statement. The second can be obtained from the first • by applying it to A instead of A. X
.
.
This minimax principle is sometimes called the CourantFischerWeyl minimax principle. Exercise 111. 1.3 In the proof of the minimax principle we made a par ticular choice of M . This choice is not always unique. For example, if A k (A ) == Ak + l (A) , there would be a whole 1parameter family of such subspaces obtained by choosing different eigenvectors of A belonging to A.i ( A) .
This is not surprising. More surprising, perhaps even shocking, is the fact that we could have λₖ↓(A) = min{⟨x, Ax⟩ : x ∈ M, ||x|| = 1} even for a k-dimensional subspace M that is not spanned by eigenvectors of A. Find an example where this happens. (There is a simple example.)

Exercise III.1.4 In the proof of Theorem III.1.1 we used a basic principle of linear algebra:

dim(M₁ ∩ M₂) ≥ dim M₁ + dim M₂ − dim(M₁ + M₂) ≥ dim M₁ + dim M₂ − n,

for any two subspaces M₁ and M₂ of an n-dimensional vector space. Derive the corresponding inequality for an intersection of three subspaces.
An equivalent formulation of the Poincaré inequality is in terms of compressions. Recall that if $V$ is an isometry of a Hilbert space $\mathcal{M}$ into $\mathcal{H}$, then the compression of $A$ by $V$ is defined to be the operator $B = V^*AV$. Usually we suppose that $\mathcal{M}$ is a subspace of $\mathcal{H}$ and $V$ is the injection map. Then $A$ has a block-matrix representation in which $B$ is the northwest corner entry:
$$A = \begin{pmatrix} B & * \\ * & * \end{pmatrix}.$$
We say that $B$ is the compression of $A$ to the subspace $\mathcal{M}$.

Corollary III.1.5 (Cauchy's Interlacing Theorem) Let $A$ be a Hermitian operator on $\mathcal{H}$, and let $B$ be its compression to an $(n-k)$-dimensional subspace $\mathcal{N}$. Then for $j = 1, 2, \ldots, n-k$,
$$\lambda_j^{\downarrow}(A) \;\ge\; \lambda_j^{\downarrow}(B) \;\ge\; \lambda_{j+k}^{\downarrow}(A). \tag{III.1}$$

Proof. For any $j$, let $\mathcal{M}$ be the span of the eigenvectors $v_1, \ldots, v_j$ of $B$ corresponding to its eigenvalues $\lambda_1^{\downarrow}(B), \ldots, \lambda_j^{\downarrow}(B)$. Then $(x, Bx) = (x, Ax)$ for all $x \in \mathcal{M}$. Hence,
$$\lambda_j^{\downarrow}(B) \;=\; \min_{\substack{x\in\mathcal{M} \\ \|x\|=1}} (x, Bx) \;=\; \min_{\substack{x\in\mathcal{M} \\ \|x\|=1}} (x, Ax) \;\le\; \lambda_j^{\downarrow}(A).$$
This proves the first assertion in (III.1). Now apply this to $-A$ and its compression $-B$ to the given subspace $\mathcal{N}$. Note that
$$\lambda_j^{\downarrow}(-A) = -\lambda_{n-j+1}^{\downarrow}(A), \qquad \lambda_j^{\downarrow}(-B) = -\lambda_{(n-k)-j+1}^{\downarrow}(B) \quad \text{for all } 1 \le j \le n-k.$$
Replacing $j$ by $(n-k)-j+1$, the first inequality then yields $\lambda_j^{\downarrow}(B) \ge \lambda_{j+k}^{\downarrow}(A)$, which is the second inequality in (III.1). ∎
The above inequalities look especially nice when $B$ is the compression of $A$ to an $(n-1)$-dimensional subspace: then they say that
$$\lambda_1^{\downarrow}(A) \ge \lambda_1^{\downarrow}(B) \ge \lambda_2^{\downarrow}(A) \ge \lambda_2^{\downarrow}(B) \ge \cdots \ge \lambda_{n-1}^{\downarrow}(B) \ge \lambda_n^{\downarrow}(A). \tag{III.2}$$
This explains why this is called an interlacing theorem.
Exercise III.1.6 The Poincaré inequality, the minimax principle, and the interlacing theorem can be derived from each other. Find an independent proof for each of them using Exercise III.1.4. (This "dimension-counting" for intersections of subspaces will be used in later sections too.)

Exercise III.1.7 Let $B$ be the compression of a Hermitian operator $A$ to an $(n-1)$-dimensional space $\mathcal{M}$, and write $\alpha_j = \lambda_j^{\downarrow}(A)$, $\beta_j = \lambda_j^{\downarrow}(B)$. If, for some $k$, the space $\mathcal{M}$ contains the vectors $u_1, \ldots, u_k$, then $\beta_j = \alpha_j$ for $1 \le j \le k$. If $\mathcal{M}$ contains $u_k, \ldots, u_n$, then $\alpha_j = \beta_{j-1}$ for $k < j \le n$.
Exercise III.1.8 (i) Let $A_n$ be the $n \times n$ tridiagonal matrix with entries $a_{ii} = 2\cos\theta$ for all $i$, $a_{ij} = 1$ if $|i-j| = 1$, and $a_{ij} = 0$ otherwise. Show that the determinant of $A_n$ is $\sin(n+1)\theta / \sin\theta$.

(ii) Show that the eigenvalues of $A_n$ are given by $2\bigl(\cos\theta + \cos\frac{j\pi}{n+1}\bigr)$, $1 \le j \le n$.

(iii) The special case when $a_{ii} = 2$ for all $i$ arises in Rayleigh's finite-dimensional approximation to the differential equation of a vibrating string. In this case the eigenvalues of $A_n$ are
$$\lambda_j^{\uparrow}(A_n) = 4\sin^2 \frac{j\pi}{2(n+1)}, \qquad 1 \le j \le n.$$

(iv) Note that, for each $k < n$, the matrix $A_{n-k}$ is a compression of $A_n$. This example provides a striking illustration of Cauchy's interlacing theorem.
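The tridiagonal family of Exercise III.1.8 is convenient for checking these statements numerically. A sketch, assuming NumPy (the size and the value of $\theta$ are arbitrary choices for illustration):

```python
import numpy as np

def A_n(n, theta=0.0):
    """Tridiagonal matrix of Exercise III.1.8: 2*cos(theta) on the
    diagonal, 1 on the first off-diagonals."""
    return (2 * np.cos(theta) * np.eye(n)
            + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1))

n, theta = 8, 0.7

# (i) det A_n = sin((n+1) theta) / sin(theta)
assert np.isclose(np.linalg.det(A_n(n, theta)),
                  np.sin((n + 1) * theta) / np.sin(theta))

# (iii) with 2 on the diagonal, the spectrum is {4 sin^2(j pi / (2(n+1)))}
spec = np.sort(np.linalg.eigvalsh(A_n(n)))
formula = np.sort(4 * np.sin(np.arange(1, n + 1) * np.pi / (2 * (n + 1))) ** 2)
assert np.allclose(spec, formula)

# (iv) A_{n-1} is a compression of A_n: the interlacing (III.2)
big = np.sort(np.linalg.eigvalsh(A_n(n)))[::-1]      # decreasing order
small = np.sort(np.linalg.eigvalsh(A_n(n - 1)))[::-1]
assert all(big[j] >= small[j] >= big[j + 1] for j in range(n - 1))
```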
It is illuminating to think of the variational characterisation of eigenvalues as the solution of an extremal problem in analysis. If $A$ is a Hermitian operator on $\mathbb{R}^n$, the search for the top eigenvalue of $A$ is just the problem of maximising the function $F(x) = x^*Ax$ subject to the constraint that the function $G(x) = x^*x$ has the fixed value 1. The extremum must occur at a critical point, and by the method of Lagrange multipliers the condition for a point $x$ to be critical is $\nabla F(x) = \lambda \nabla G(x)$, which becomes $Ax = \lambda x$. Our earlier arguments got to the extremum problem from the algebraic eigenvalue problem, and this argument has gone the other way.

If additional constraints are imposed, the maximum can only decrease. Confining $x$ to an $(n-k)$-dimensional subspace is equivalent to imposing
$k$ linearly independent linear constraints on it. These can be expressed as $H_j(x) = 0$, where $H_j(x) = w_j^* x$ and the vectors $w_j$, $1 \le j \le k$, are linearly independent. Introducing additional Lagrange multipliers $\mu_j$, the condition for a critical point is now $\nabla F(x) = \lambda \nabla G(x) + \sum_j \mu_j \nabla H_j(x)$; i.e., $Ax - \lambda x$ is no longer required to be 0 but merely to be a linear combination of the $w_j$.

Look at this in block-matrix terms. Our space has been decomposed into a direct sum of a space $\mathcal{N}$ and its orthogonal complement, which is spanned by $\{w_1, \ldots, w_k\}$. Relative to this direct sum decomposition we can write
$$A = \begin{pmatrix} B & C \\ C^* & D \end{pmatrix}.$$
Our vector $x$ is now constrained to lie in $\mathcal{N}$, and the requirement for it to be a critical point is that $(A - \lambda I)\binom{x}{0}$ lies in $\mathcal{N}^{\perp}$. This is exactly the requirement that $x$ be an eigenvector of the compression $B$.

If two interlacing sets of real numbers are given, they can be realised as the eigenvalues of a Hermitian matrix and one of its compressions. This is a converse to one of the theorems proved above:

Theorem III.1.9 Let $\alpha_j$, $1 \le j \le n$, and $\beta_i$, $1 \le i \le n-1$, be real numbers such that
$$\alpha_1 \ge \beta_1 \ge \alpha_2 \ge \beta_2 \ge \cdots \ge \beta_{n-1} \ge \alpha_n.$$
Then there exists a compression of the diagonal matrix $A = \operatorname{diag}(\alpha_1, \ldots, \alpha_n)$ having $\beta_i$, $1 \le i \le n-1$, as its eigenvalues.
Proof. Let $Au_j = \alpha_j u_j$; then $\{u_j\}$ constitute the standard orthonormal basis in $\mathbb{C}^n$. There is a one-to-one correspondence between $(n-1)$-dimensional orthogonal projection operators $P$ and unit vectors $z$ given by $P = I - zz^*$. Each unit vector, in turn, is completely characterised by its coordinates $\zeta_j$ with respect to the basis $u_j$. We have $z = \sum_j \zeta_j u_j$, where $\zeta_j = (u_j, z)$ and $\sum_j |\zeta_j|^2 = 1$. We will find conditions on the numbers $\zeta_j$ so that, for the corresponding orthoprojector $P = I - zz^*$, the compression of $A$ to the range of $P$ has eigenvalues $\beta_i$. Since $PAP$ is a Hermitian operator of rank $n-1$, we must have
$$\prod_{i=1}^{n-1} (\lambda - \beta_i) \;=\; \operatorname{tr} \wedge^{n-1} \bigl[P(\lambda I - A)P\bigr].$$
If $E_j$ are the projectors defined as $E_j = I - u_j u_j^*$, then
$$\wedge^{n-1}(\lambda I - A) \;=\; \sum_{j=1}^{n} \prod_{k \ne j} (\lambda - \alpha_k) \, \wedge^{n-1} E_j.$$
Using the result of Problem I.6.9 one sees that
$$\wedge^{n-1} P \cdot \wedge^{n-1} E_j \cdot \wedge^{n-1} P \;=\; |\zeta_j|^2 \, \wedge^{n-1} P.$$
Since $\operatorname{rank} \wedge^{n-1} P = 1$, the above three relations give
$$\prod_{i=1}^{n-1} (\lambda - \beta_i) \;=\; \sum_{j=1}^{n} |\zeta_j|^2 \prod_{k \ne j} (\lambda - \alpha_k), \tag{III.3}$$
an identity between polynomials of degree $n-1$, which the $\zeta_j$ must satisfy if $B$ has spectrum $\{\beta_i\}$. We will show that the interlacing inequalities between the $\alpha_j$ and $\beta_i$ ensure that we can find $\zeta_j$ satisfying (III.3) and $\sum_{j=1}^{n} |\zeta_j|^2 = 1$. We may assume, without loss of generality, that the $\alpha_j$ are distinct. Put
$$\gamma_j \;=\; \frac{\prod_{i=1}^{n-1} (\alpha_j - \beta_i)}{\prod_{k \ne j} (\alpha_j - \alpha_k)}, \qquad 1 \le j \le n. \tag{III.4}$$
The interlacing property ensures that all $\gamma_j$ are nonnegative. Now choose $\zeta_j$ to be any complex numbers with $|\zeta_j|^2 = \gamma_j$. Then the equation (III.3) is satisfied for the values $\lambda = \alpha_j$, $1 \le j \le n$, and hence it is satisfied for all $\lambda$. Comparing the leading coefficients of the two sides of (III.3), we see that $\sum_j |\zeta_j|^2 = 1$. This completes the proof. ∎
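The proof of Theorem III.1.9 is constructive, and can be traced numerically. The sketch below (NumPy assumed; the numbers $\alpha$, $\beta$ are an arbitrary strictly interlacing example) builds the weights $\gamma_j$ of (III.4), forms the unit vector $z$ with $|\zeta_j|^2 = \gamma_j$, and checks that the compression of $\operatorname{diag}(\alpha)$ to the orthocomplement of $z$ has eigenvalues $\beta$.

```python
import numpy as np

alpha = np.array([5.0, 3.0, 1.0, -2.0])      # alpha_1 > ... > alpha_n
beta = np.array([4.0, 2.0, 0.0])             # strictly interlaces alpha
n = len(alpha)

# Weights gamma_j from (III.4); interlacing makes them nonnegative, and
# comparing leading coefficients in (III.3) forces them to sum to 1.
gamma = np.array([np.prod(alpha[j] - beta)
                  / np.prod(alpha[j] - np.delete(alpha, j))
                  for j in range(n)])
assert np.all(gamma >= 0) and np.isclose(gamma.sum(), 1.0)

z = np.sqrt(gamma)                           # unit vector, |zeta_j|^2 = gamma_j

# Orthonormal basis of the range of P = I - z z*: complete z to a basis.
Q, _ = np.linalg.qr(np.column_stack([z, np.eye(n)[:, :n - 1]]))
V = Q[:, 1:]                                 # n x (n-1), spans the z-perp space

B = V.T @ np.diag(alpha) @ V                 # compression of diag(alpha)
assert np.allclose(np.sort(np.linalg.eigvalsh(B))[::-1], beta)
```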
III.2 Weyl's Inequalities

Several relations between the eigenvalues of Hermitian matrices $A$, $B$, and $A+B$ can be obtained using the ideas of the previous section. Most of these results were first proved by H. Weyl.
Theorem III.2.1 Let $A$, $B$ be $n \times n$ Hermitian matrices. Then
$$\lambda_j^{\downarrow}(A+B) \;\le\; \lambda_i^{\downarrow}(A) + \lambda_{j-i+1}^{\downarrow}(B) \quad \text{for } i \le j, \tag{III.5}$$
$$\lambda_j^{\downarrow}(A+B) \;\ge\; \lambda_i^{\downarrow}(A) + \lambda_{j-i+n}^{\downarrow}(B) \quad \text{for } i \ge j. \tag{III.6}$$

Proof. Let $u_j$, $v_j$, and $w_j$ denote the eigenvectors of $A$, $B$, and $A+B$, respectively, corresponding to their eigenvalues in decreasing order. Let $i \le j$. Consider the three subspaces spanned by $\{w_1, \ldots, w_j\}$, $\{u_i, \ldots, u_n\}$, and $\{v_{j-i+1}, \ldots, v_n\}$, respectively. These have dimensions $j$, $n-i+1$, and $n-j+i$; since these add up to $2n+1$, by Exercise III.1.4 the three subspaces have a nontrivial intersection. Let $x$ be a unit vector in their intersection. Then
$$\lambda_j^{\downarrow}(A+B) \;\le\; (x, (A+B)x) \;=\; (x, Ax) + (x, Bx) \;\le\; \lambda_i^{\downarrow}(A) + \lambda_{j-i+1}^{\downarrow}(B).$$
This proves (III.5). If $A$ and $B$ in this inequality are replaced by $-A$ and $-B$, we get (III.6). ∎

Corollary III.2.2 For each $j = 1, 2, \ldots, n$,
$$\lambda_j^{\downarrow}(A) + \lambda_n^{\downarrow}(B) \;\le\; \lambda_j^{\downarrow}(A+B) \;\le\; \lambda_j^{\downarrow}(A) + \lambda_1^{\downarrow}(B). \tag{III.7}$$

Proof. Put $i = j$ in the above inequalities. ∎
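Weyl's inequalities lend themselves to a brute-force numerical check over all admissible index pairs. A sketch assuming NumPy (with 0-based indices, so each subscript shifts down by one):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

def rand_herm(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

def eig_desc(M):
    """Eigenvalues of a Hermitian matrix in decreasing order."""
    return np.sort(np.linalg.eigvalsh(M))[::-1]

A, B = rand_herm(n), rand_herm(n)
a, b, c = eig_desc(A), eig_desc(B), eig_desc(A + B)
tol = 1e-10

for j in range(n):
    for i in range(j + 1):          # i <= j: (III.5), lambda_{j-i+1} -> index j-i
        assert c[j] <= a[i] + b[j - i] + tol
    for i in range(j, n):           # i >= j: (III.6), lambda_{j-i+n} -> index j-i+n-1
        assert c[j] >= a[i] + b[j - i + n - 1] - tol

# (III.7) is the special case i = j.
assert np.all(c <= a + b[0] + tol) and np.all(c >= a + b[-1] - tol)
```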
It is customary to state these and related results as perturbation theorems, whereby $B$ is a perturbation of $A$; that is, $B = A + H$. In many of the applications $H$ is small and the object is to give bounds for the distance of $\lambda(B)$ from $\lambda(A)$ in terms of $H = B - A$.

Corollary III.2.3 (Weyl's Monotonicity Theorem) If $H$ is positive, then
$$\lambda_j^{\downarrow}(A+H) \;\ge\; \lambda_j^{\downarrow}(A) \quad \text{for all } j.$$

Proof. By the preceding corollary, $\lambda_j^{\downarrow}(A+H) \ge \lambda_j^{\downarrow}(A) + \lambda_n^{\downarrow}(H)$, and all the eigenvalues of $H$ are nonnegative. Alternately, note that $(x, (A+H)x) \ge (x, Ax)$ for all $x$ and use the minimax principle. ∎

Exercise III.2.4 If $H$ is positive and has rank $k$, then
$$\lambda_j^{\downarrow}(A+H) \;\ge\; \lambda_j^{\downarrow}(A) \;\ge\; \lambda_{j+k}^{\downarrow}(A+H) \quad \text{for } j = 1, 2, \ldots, n-k.$$
This is analogous to Cauchy's interlacing theorem.
Exercise III.2.5 Let $H$ be any Hermitian matrix. Then
$$\lambda_j^{\downarrow}(A) - \|H\| \;\le\; \lambda_j^{\downarrow}(A+H) \;\le\; \lambda_j^{\downarrow}(A) + \|H\|.$$
This can be restated as:

Corollary III.2.6 (Weyl's Perturbation Theorem) Let $A$ and $B$ be Hermitian matrices. Then
$$\max_j |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \;\le\; \|A - B\|.$$
Exercise III.2.7 For Hermitian matrices $A$, $B$, we have
$$\|A - B\| \;\le\; \max_j |\lambda_j^{\downarrow}(A) - \lambda_j^{\uparrow}(B)|.$$
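Both perturbation bounds are easy to confirm numerically. The sketch below (NumPy assumed; the random symmetric pair is illustrative) checks Corollary III.2.6 and the companion upper bound of Exercise III.2.7, with $\|\cdot\|$ the operator norm.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
X = rng.standard_normal((n, n)); A = (X + X.T) / 2
Y = rng.standard_normal((n, n)); B = (Y + Y.T) / 2

a_desc = np.sort(np.linalg.eigvalsh(A))[::-1]   # lambda^down(A)
b_desc = np.sort(np.linalg.eigvalsh(B))[::-1]   # lambda^down(B)
b_incr = b_desc[::-1]                           # lambda^up(B)
opnorm = np.linalg.norm(A - B, 2)               # operator norm = top singular value

tol = 1e-10
assert np.max(np.abs(a_desc - b_desc)) <= opnorm + tol   # Corollary III.2.6
assert opnorm <= np.max(np.abs(a_desc - b_incr)) + tol   # Exercise III.2.7
```

Read together, the two assertions are exactly Theorem III.2.8 below for the operator norm.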
It is useful to have another formulation of the above two inequalities, which will be in conformity with more general results proved later. We will denote by $\operatorname{Eig} A$ a diagonal matrix whose diagonal entries are the eigenvalues of $A$. If these are arranged in decreasing order, we write this matrix as $\operatorname{Eig}^{\downarrow}(A)$; if in increasing order, as $\operatorname{Eig}^{\uparrow}(A)$. The results of Corollary III.2.6 and Exercise III.2.7 can then be stated as:

Theorem III.2.8 For any two Hermitian matrices $A$, $B$,
$$\|\operatorname{Eig}^{\downarrow}(A) - \operatorname{Eig}^{\downarrow}(B)\| \;\le\; \|A - B\| \;\le\; \|\operatorname{Eig}^{\downarrow}(A) - \operatorname{Eig}^{\uparrow}(B)\|.$$

Weyl's inequality (III.5) is equivalent to an inequality due to Aronszajn connecting the eigenvalues of a Hermitian matrix to those of any two complementary principal submatrices. For this let us rewrite (III.5) as
$$\lambda_{i+j-1}^{\downarrow}(A+B) \;\le\; \lambda_i^{\downarrow}(A) + \lambda_j^{\downarrow}(B), \tag{III.8}$$
for all indices $i$, $j$ such that $i + j - 1 \le n$.

Theorem III.2.9 (Aronszajn's Inequality) Let $C$ be an $n \times n$ Hermitian matrix partitioned as
$$C = \begin{pmatrix} A & X \\ X^* & B \end{pmatrix},$$
where $A$ is a $k \times k$ matrix. Let the eigenvalues of $A$, $B$, and $C$ be $\alpha_1 \ge \cdots \ge \alpha_k$, $\beta_1 \ge \cdots \ge \beta_{n-k}$, and $\gamma_1 \ge \cdots \ge \gamma_n$, respectively. Then
$$\gamma_{i+j-1} + \gamma_n \;\le\; \alpha_i + \beta_j \quad \text{for all } i, j \text{ with } i + j - 1 \le n. \tag{III.9}$$
Proof. First assume that $\gamma_n = 0$. Then $C$ is a positive matrix. Hence $C = D^*D$ for some matrix $D$. Partition $D$ as $D = (D_1 \ D_2)$, where $D_1$ has $k$ columns. Then
$$C = \begin{pmatrix} D_1^* D_1 & D_1^* D_2 \\ D_2^* D_1 & D_2^* D_2 \end{pmatrix}, \quad \text{so that } A = D_1^* D_1 \text{ and } B = D_2^* D_2.$$
Note that $DD^* = D_1 D_1^* + D_2 D_2^*$. Now the nonzero eigenvalues of the matrix $C = D^*D$ are the same as those of $DD^*$. The same is true for the matrices $A = D_1^* D_1$ and $D_1 D_1^*$, and also for the matrices $B = D_2^* D_2$ and $D_2 D_2^*$. Hence, using Weyl's inequality (III.8), we get (III.9) in this special case.

If $\gamma_n \ne 0$, subtract $\gamma_n I$ from $C$. Then all the eigenvalues of $A$, $B$, and $C$ are translated by $\gamma_n$. By the special case considered above we have
$$\gamma_{i+j-1} - \gamma_n \;\le\; (\alpha_i - \gamma_n) + (\beta_j - \gamma_n),$$
which is the same as (III.9). ∎
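A numerical spot-check of Aronszajn's inequality (III.9), assuming NumPy; the matrix and the partition size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 2
Z = rng.standard_normal((n, n))
C = (Z + Z.T) / 2                                    # Hermitian (real symmetric)

a = np.sort(np.linalg.eigvalsh(C[:k, :k]))[::-1]     # alpha: k x k corner
b = np.sort(np.linalg.eigvalsh(C[k:, k:]))[::-1]     # beta: complementary corner
g = np.sort(np.linalg.eigvalsh(C))[::-1]             # gamma: whole matrix

tol = 1e-10
for i in range(1, k + 1):                # 1-based indices as in (III.9)
    for j in range(1, n - k + 1):
        assert g[i + j - 2] + g[n - 1] <= a[i - 1] + b[j - 1] + tol
```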
We have derived Aronszajn's inequality from Weyl's inequality. But the argument above can be reversed. Let $A$, $B$ be $n \times n$ Hermitian matrices and let $C = A + B$. Let the eigenvalues of these matrices be $\alpha_1 \ge \cdots \ge \alpha_n$, $\beta_1 \ge \cdots \ge \beta_n$, and $\gamma_1 \ge \cdots \ge \gamma_n$, respectively. We want to prove that $\gamma_{i+j-1} \le \alpha_i + \beta_j$. This is the same as
$$\gamma_{i+j-1} - (\alpha_n + \beta_n) \;\le\; (\alpha_i - \alpha_n) + (\beta_j - \beta_n).$$
Hence, we can assume, without loss of generality, that both $A$ and $B$ are positive. Then $A = D_1^* D_1$ and $B = D_2^* D_2$ for some matrices $D_1$, $D_2$. Hence,
$$C = D_1^* D_1 + D_2^* D_2 = (D_1^* \ D_2^*) \binom{D_1}{D_2}.$$
Consider the $2n \times 2n$ matrix
$$E = \binom{D_1}{D_2} (D_1^* \ D_2^*) = \begin{pmatrix} D_1 D_1^* & D_1 D_2^* \\ D_2 D_1^* & D_2 D_2^* \end{pmatrix}.$$
Then the eigenvalues of $E$ are the eigenvalues of $C$ together with $n$ zeroes. Aronszajn's inequality for the partitioned matrix $E$ then gives Weyl's inequality (III.8). By this procedure, several linear inequalities for the eigenvalues of a sum of Hermitian matrices can be transformed into those for the eigenvalues of block Hermitian matrices, and vice versa.

III.3 Wielandt's Minimax Principle
The minimax principle (Corollary III.1.2) gives an extremal characterisation for each eigenvalue $\lambda_j^{\downarrow}$ of a Hermitian matrix $A$. Ky Fan's maximum principle (Problem I.6.15 and Exercise II.1.13) provides an extremal characterisation for the sum $\lambda_1^{\downarrow} + \cdots + \lambda_k^{\downarrow}$ of the top $k$ eigenvalues of $A$. In this section we will prove a deeper result due to Wielandt that subsumes both these principles by providing an extremal representation of any sum $\lambda_{i_1}^{\downarrow} + \cdots + \lambda_{i_k}^{\downarrow}$. The proof involves a more elaborate dimension-counting for intersections of subspaces than was needed earlier.

We will denote by $\mathcal{V} + \mathcal{W}$ the vector sum of two vector spaces $\mathcal{V}$ and $\mathcal{W}$, by $\mathcal{V} - \mathcal{W}$ any linear complement of a space $\mathcal{W}$ in $\mathcal{V}$, and by $\operatorname{span}\{v_1, \ldots, v_k\}$ the linear span of the vectors $v_1, \ldots, v_k$.
Lemma III.3.1 Let $\mathcal{W}_1 \supset \mathcal{W}_2 \supset \cdots \supset \mathcal{W}_k$ be a decreasing chain of vector spaces with $\dim \mathcal{W}_j \ge k - j + 1$. Let $w_j$, $1 \le j \le k-1$, be linearly independent vectors such that $w_j \in \mathcal{W}_j$, and let $\mathcal{U}$ be their linear span. Then there exists a nonzero vector $u$ in $\mathcal{W}_1 - \mathcal{U}$ such that the space $\mathcal{U} + \operatorname{span}\{u\}$ has a basis $v_1, \ldots, v_k$ with $v_j \in \mathcal{W}_j$, $1 \le j \le k$.

Proof. This will be proved by induction on $k$. The statement is easily verified when $k = 2$. Assume that it is true for a chain consisting of $k-1$ spaces. Let $w_1, \ldots, w_{k-1}$ be the given vectors and $\mathcal{U}$ their linear span. Let $\mathcal{S}$ be the linear span of $w_2, \ldots, w_{k-1}$. Apply the induction hypothesis to the chain $\mathcal{W}_2 \supset \cdots \supset \mathcal{W}_k$ to pick up a vector $v$ in $\mathcal{W}_2 - \mathcal{S}$ such that the space $\mathcal{S} + \operatorname{span}\{v\}$ is equal to $\operatorname{span}\{v_2, \ldots, v_k\}$ for some linearly independent vectors $v_j \in \mathcal{W}_j$, $j = 2, \ldots, k$. This vector $v$ may or may not be in the space $\mathcal{U}$. We will consider the two possibilities.

Suppose $v \in \mathcal{U}$. Then $\mathcal{U} = \mathcal{S} + \operatorname{span}\{v\}$, because $\mathcal{U}$ is $(k-1)$-dimensional and $\mathcal{S}$ is $(k-2)$-dimensional. Since $\dim \mathcal{W}_1 \ge k$, there exists a nonzero vector $u$ in $\mathcal{W}_1 - \mathcal{U}$. Then $u, v_2, \ldots, v_k$ form a basis for $\mathcal{U} + \operatorname{span}\{u\}$. Put $v_1 = u$. All requirements are now met.

Suppose $v \notin \mathcal{U}$. Then $w_1 \notin \mathcal{S} + \operatorname{span}\{v\}$, for if $w_1$ were a linear combination of $w_2, \ldots, w_{k-1}$ and $v$, then $v$ would be a linear combination of $w_1, w_2, \ldots, w_{k-1}$, and hence an element of $\mathcal{U}$. So, $\operatorname{span}\{w_1, v_2, \ldots, v_k\}$ is a $k$-dimensional space that must, therefore, be $\mathcal{U} + \operatorname{span}\{v\}$. Now $w_1 \in \mathcal{W}_1$ and $v_j \in \mathcal{W}_j$, $j = 2, \ldots, k$. Again all requirements are met. ∎
Theorem III.3.2 Let $\mathcal{V}_1 \subset \mathcal{V}_2 \subset \cdots \subset \mathcal{V}_k$ be linear subspaces of an $n$-dimensional vector space $\mathcal{V}$, with $\dim \mathcal{V}_j = i_j$, where $1 \le i_1 < i_2 < \cdots < i_k \le n$. Let $\mathcal{W}_1 \supset \mathcal{W}_2 \supset \cdots \supset \mathcal{W}_k$ be subspaces of $\mathcal{V}$, with $\dim \mathcal{W}_j = n - i_j + 1 = \operatorname{codim} \mathcal{V}_j + 1$. Then there exist linearly independent vectors $v_j \in \mathcal{V}_j$, $1 \le j \le k$, and linearly independent vectors $w_j \in \mathcal{W}_j$, $1 \le j \le k$, such that
$$\operatorname{span}\{v_1, \ldots, v_k\} = \operatorname{span}\{w_1, \ldots, w_k\}.$$

Proof. When $k = 1$ the statement is obviously true. (We have used this repeatedly in the earlier sections.) The general case will be proved by induction on $k$. So, let us assume that the theorem has been proved for $k-1$ pairs of subspaces.

By the induction hypothesis, choose $v_j \in \mathcal{V}_j$ and $w_j \in \mathcal{W}_j$, $1 \le j \le k-1$, two sets of linearly independent vectors having the same linear span $\mathcal{U}$. Note that $\mathcal{U}$ is a subspace of $\mathcal{V}_k$. For $j = 1, \ldots, k$, let $\mathcal{S}_j = \mathcal{W}_j \cap \mathcal{V}_k$. Then note that
$$n \;\ge\; \dim \mathcal{W}_j + \dim \mathcal{V}_k - \dim \mathcal{S}_j \;=\; (n - i_j + 1) + i_k - \dim \mathcal{S}_j.$$
Hence,
$$\dim \mathcal{S}_j \;\ge\; i_k - i_j + 1 \;\ge\; k - j + 1.$$
Note that $\mathcal{S}_1 \supset \mathcal{S}_2 \supset \cdots \supset \mathcal{S}_k$ are subspaces of $\mathcal{V}_k$ and $w_j \in \mathcal{S}_j$ for $j = 1, 2, \ldots, k-1$. Hence, by Lemma III.3.1 there exists a vector $u$ in $\mathcal{S}_1 - \mathcal{U}$ such that the space $\mathcal{U} + \operatorname{span}\{u\}$ has a basis $u_1, \ldots, u_k$, where $u_j \in \mathcal{S}_j \subset \mathcal{W}_j$, $j = 1, 2, \ldots, k$. But $\mathcal{U} + \operatorname{span}\{u\}$ is also the linear span of $v_1, \ldots, v_{k-1}$ and $u$. Put $v_k = u$. Then $v_j \in \mathcal{V}_j$, $j = 1, 2, \ldots, k$, and they span the same space as the $u_j$. ∎
Exercise III.3.3 If $\mathcal{V}$ is a Hilbert space, the vectors $v_j$ and $w_j$ in the statement of the above theorem can be chosen to be orthonormal.

Proposition III.3.4 Let $A$ be a Hermitian operator on $\mathcal{H}$ with eigenvectors $u_j$ belonging to the eigenvalues $\lambda_j^{\downarrow}(A)$, $j = 1, 2, \ldots, n$.

(i) Let $\mathcal{V}_j = \operatorname{span}\{u_1, \ldots, u_j\}$, $1 \le j \le n$. Given indices $1 \le i_1 < \cdots < i_k \le n$, choose orthonormal vectors $x_{i_j}$ from the spaces $\mathcal{V}_{i_j}$, $j = 1, \ldots, k$. Let $\mathcal{V}$ be the span of these vectors, and let $A_{\mathcal{V}}$ be the compression of $A$ to the space $\mathcal{V}$. Then
$$\lambda_j^{\downarrow}(A_{\mathcal{V}}) \;\ge\; \lambda_{i_j}^{\downarrow}(A) \quad \text{for } j = 1, \ldots, k.$$

(ii) Let $\mathcal{W}_j = \operatorname{span}\{u_j, \ldots, u_n\}$, $1 \le j \le n$. Choose orthonormal vectors $x_{i_j}$ from the spaces $\mathcal{W}_{i_j}$, $j = 1, \ldots, k$. Let $\mathcal{W}$ be the span of these vectors and $A_{\mathcal{W}}$ the compression of $A$ to $\mathcal{W}$. Then
$$\lambda_j^{\downarrow}(A_{\mathcal{W}}) \;\le\; \lambda_{i_j}^{\downarrow}(A) \quad \text{for } j = 1, \ldots, k.$$

Proof. Let $y_1, \ldots, y_k$ be the eigenvectors of $A_{\mathcal{V}}$ belonging to its eigenvalues $\lambda_1^{\downarrow}(A_{\mathcal{V}}), \ldots, \lambda_k^{\downarrow}(A_{\mathcal{V}})$. Fix $j$, $1 \le j \le k$, and in the space $\mathcal{V}$ consider the spaces spanned by $\{x_{i_1}, \ldots, x_{i_j}\}$ and $\{y_j, \ldots, y_k\}$, respectively. The dimensions of these two spaces add up to $k+1$, while the space $\mathcal{V}$ is $k$-dimensional. Hence there exists a unit vector $u$ in the intersection of these two spaces. For this vector we have
$$\lambda_j^{\downarrow}(A_{\mathcal{V}}) \;\ge\; (u, A_{\mathcal{V}} u) \;=\; (u, Au) \;\ge\; \lambda_{i_j}^{\downarrow}(A).$$
This proves (i). The statement (ii) has exactly the same proof. ∎
Theorem III.3.5 (Wielandt's Minimax Principle) Let $A$ be a Hermitian operator on an $n$-dimensional space $\mathcal{H}$. Then for any indices $1 \le i_1 < \cdots < i_k \le n$ we have
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) \;=\; \max_{\substack{\mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k \\ \dim \mathcal{M}_j = i_j}} \; \min_{\substack{x_j \in \mathcal{M}_j \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, Ax_j)$$
$$=\; \min_{\substack{\mathcal{N}_1 \supset \cdots \supset \mathcal{N}_k \\ \dim \mathcal{N}_j = n - i_j + 1}} \; \max_{\substack{x_j \in \mathcal{N}_j \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, Ax_j).$$
Proof. We will prove the first statement; the second has a similar proof. Let $\mathcal{V}_{i_j} = \operatorname{span}\{u_1, \ldots, u_{i_j}\}$, where, as before, the $u_j$ are eigenvectors of $A$ corresponding to $\lambda_j^{\downarrow}(A)$. For any unit vector $x$ in $\mathcal{V}_{i_j}$, $(x, Ax) \ge \lambda_{i_j}^{\downarrow}(A)$. So, if $x_j \in \mathcal{V}_{i_j}$ are orthonormal vectors, then
$$\sum_{j=1}^{k} (x_j, Ax_j) \;\ge\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$
Since the $x_j$ were quite arbitrary, we have
$$\min_{\substack{x_j \in \mathcal{V}_{i_j} \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, Ax_j) \;\ge\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$
Hence, the desired result will be achieved if we prove that, given any subspaces $\mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k$ with $\dim \mathcal{M}_j = i_j$, we can find orthonormal vectors $x_j \in \mathcal{M}_j$ such that
$$\sum_{j=1}^{k} (x_j, Ax_j) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$
Let $\mathcal{N}_j = \mathcal{W}_{i_j} = \operatorname{span}\{u_{i_j}, \ldots, u_n\}$, $j = 1, 2, \ldots, k$. These spaces were considered in Proposition III.3.4(ii). We have $\mathcal{N}_1 \supset \mathcal{N}_2 \supset \cdots \supset \mathcal{N}_k$ and $\dim \mathcal{N}_j = n - i_j + 1$. Hence, by Theorem III.3.2 and Exercise III.3.3 there exist orthonormal vectors $x_j \in \mathcal{M}_j$ and orthonormal vectors $y_j \in \mathcal{N}_j$ such that
$$\operatorname{span}\{x_1, \ldots, x_k\} = \operatorname{span}\{y_1, \ldots, y_k\} = \mathcal{W}, \ \text{say}.$$
By Proposition III.3.4(ii),
$$\lambda_j^{\downarrow}(A_{\mathcal{W}}) \;\le\; \lambda_{i_j}^{\downarrow}(A) \quad \text{for } j = 1, 2, \ldots, k.$$
Hence,
$$\sum_{j=1}^{k} (x_j, Ax_j) \;=\; \sum_{j=1}^{k} (x_j, A_{\mathcal{W}} x_j) \;=\; \operatorname{tr} A_{\mathcal{W}} \;=\; \sum_{j=1}^{k} \lambda_j^{\downarrow}(A_{\mathcal{W}}) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$
This is what we wanted to prove. ∎

Exercise III.3.6 Note that
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) \;=\; \sum_{j=1}^{k} (u_{i_j}, A u_{i_j}).$$
We have seen that the maximum in the first assertion of Theorem III.3.5 is attained when $\mathcal{M}_j = \mathcal{V}_{i_j} = \operatorname{span}\{u_1, \ldots, u_{i_j}\}$, $j = 1, \ldots, k$, and with this choice the minimum is attained for $x_j = u_{i_j}$, $j = 1, \ldots, k$. Are there other choices of subspaces and vectors for which these extrema are attained? (See Exercise III.1.3.)
Exercise III.3.7 Let $[a, b]$ be an interval containing all the eigenvalues of $A$, and let $\Phi(t_1, \ldots, t_k)$ be any real-valued function on $[a,b] \times \cdots \times [a,b]$ that is monotone in each variable and permutation-invariant. Show that for each choice of indices $1 \le i_1 < \cdots < i_k \le n$,
$$\Phi\bigl(\lambda_{i_1}^{\downarrow}(A), \ldots, \lambda_{i_k}^{\downarrow}(A)\bigr) \;=\; \max_{\substack{\mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k \\ \dim \mathcal{M}_j = i_j}} \; \min_{\substack{x_j \in \mathcal{M}_j \text{ orthonormal} \\ \mathcal{W} = \operatorname{span}\{x_1, \ldots, x_k\}}} \Phi\bigl(\lambda_1^{\downarrow}(A_{\mathcal{W}}), \ldots, \lambda_k^{\downarrow}(A_{\mathcal{W}})\bigr),$$
where $A_{\mathcal{W}}$ is the compression of $A$ to the space $\mathcal{W}$. In Theorem III.3.5 we have proved the special case of this with $\Phi(t_1, \ldots, t_k) = t_1 + \cdots + t_k$.
III.4 Lidskii's Theorems

One important application of Wielandt's minimax principle is in proving a theorem of Lidskii giving a relationship between the eigenvalues of Hermitian
matrices $A$, $B$, and $A+B$. This is quite like our derivation of some of the results in Section III.2 from those in Section III.1.

Theorem III.4.1 Let $A$, $B$ be Hermitian matrices. Then for any choice of indices $1 \le i_1 < \cdots < i_k \le n$,
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A+B) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B). \tag{III.10}$$
Proof. By Theorem III.3.5 there exist subspaces $\mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k$, with $\dim \mathcal{M}_j = i_j$, such that
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A+B) \;=\; \min_{\substack{x_j \in \mathcal{M}_j \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, (A+B)x_j).$$
By Ky Fan's maximum principle,
$$\sum_{j=1}^{k} (x_j, Bx_j) \;\le\; \sum_{j=1}^{k} \lambda_j^{\downarrow}(B)$$
for any choice of orthonormal vectors $x_1, \ldots, x_k$. The above two relations imply that
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A+B) \;\le\; \min_{\substack{x_j \in \mathcal{M}_j \\ \text{orthonormal}}} \; \sum_{j=1}^{k} (x_j, Ax_j) \;+\; \sum_{j=1}^{k} \lambda_j^{\downarrow}(B).$$
Now, using Theorem III.3.5 once again, it can be concluded that the first
term on the right-hand side of the above inequality is dominated by $\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A)$. ∎

Corollary III.4.2 If $A$, $B$ are Hermitian matrices, then the eigenvalues of $A$, $B$, and $A+B$ satisfy the following majorisation relation:
$$\lambda^{\downarrow}(A+B) - \lambda^{\downarrow}(A) \;\prec\; \lambda^{\downarrow}(B). \tag{III.11}$$

Exercise III.4.3 (Lidskii's Theorem) The vector $\lambda^{\downarrow}(A+B)$ is in the convex hull of the vectors $\lambda^{\downarrow}(A) + P\lambda^{\downarrow}(B)$, where $P$ varies over all permutation matrices. [This statement and those of Theorem III.4.1 and Corollary III.4.2 are, in fact, equivalent to each other.]
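The majorisation (III.11) can be checked directly on random examples. A sketch assuming NumPy (recall that $x \prec y$ means the decreasingly sorted partial sums of $x$ never exceed those of $y$, with equal totals):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6

def rand_herm(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

def eig_desc(M):
    return np.sort(np.linalg.eigvalsh(M))[::-1]

A, B = rand_herm(n), rand_herm(n)
d = eig_desc(A + B) - eig_desc(A)        # the vector in (III.11)
b = eig_desc(B)

# d is majorised by lambda^down(B): partial sums of the sorted d are
# dominated, and both totals equal tr B.
partial_d = np.cumsum(np.sort(d)[::-1])
partial_b = np.cumsum(b)
assert np.all(partial_d <= partial_b + 1e-10)
assert np.isclose(partial_d[-1], partial_b[-1])
```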
Lidskii 's Theorem can be proved without calling upon the more intricate Wielandt's principle. We will see several other proofs in this book, each highlighting a different viewpoint. The second proof given below is in the spirit of other results of this chapter.
Lidskii's Theorem (second proof). We will prove Theorem III.4.1 by induction on the dimension $n$. Its statement is trivial when $n = 1$. Assume it is true up to dimension $n-1$. When $k = n$, the inequality (III.10) needs no proof. So we may assume that $k < n$. Let $u_j$, $v_j$, and $w_j$ be the eigenvectors of $A$, $B$, and $A+B$ corresponding to their eigenvalues $\lambda_j^{\downarrow}(A)$, $\lambda_j^{\downarrow}(B)$, and $\lambda_j^{\downarrow}(A+B)$. We will consider three cases separately.

Case 1. $i_k < n$. Let $\mathcal{M} = \operatorname{span}\{w_1, \ldots, w_{n-1}\}$, and let $A_{\mathcal{M}}$, $B_{\mathcal{M}}$ be the compressions of $A$, $B$ to the space $\mathcal{M}$. Then, by the induction hypothesis,
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A_{\mathcal{M}} + B_{\mathcal{M}}) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A_{\mathcal{M}}) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B_{\mathcal{M}}).$$
The inequality (III.10) follows from this by using the interlacing principle (III.2) and Exercise III.1.7.

Case 2. $1 < i_1$. Let $\mathcal{M} = \operatorname{span}\{u_2, \ldots, u_n\}$. By the induction hypothesis,
$$\sum_{j=1}^{k} \lambda_{i_j - 1}^{\downarrow}(A_{\mathcal{M}} + B_{\mathcal{M}}) \;\le\; \sum_{j=1}^{k} \lambda_{i_j - 1}^{\downarrow}(A_{\mathcal{M}}) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B_{\mathcal{M}}).$$
Once again, the inequality (III.10) follows from this by using the interlacing principle and Exercise III.1.7.

Case 3. $i_1 = 1$. Given the indices $1 = i_1 < i_2 < \cdots < i_k \le n$, pick up the indices $1 \le \ell_1 < \ell_2 < \cdots < \ell_{n-k} \le n$ such that the set $\{i_j : 1 \le j \le k\}$ is the complement of the set $\{n - \ell_j + 1 : 1 \le j \le n-k\}$ in the set $\{1, 2, \ldots, n\}$. These new indices now come under Case 1. Use (III.10) for this set of indices, but for the matrices $-A$ and $-B$ in place of $A$, $B$. Then note that $\lambda_j^{\downarrow}(-A) = -\lambda_{n-j+1}^{\downarrow}(A)$ for all $1 \le j \le n$. This gives
$$\sum_{j=1}^{n-k} -\lambda_{n-\ell_j+1}^{\downarrow}(A+B) \;\le\; \sum_{j=1}^{n-k} -\lambda_{n-\ell_j+1}^{\downarrow}(A) + \sum_{j=1}^{n-k} -\lambda_{n-j+1}^{\downarrow}(B).$$
Now add $\operatorname{tr}(A+B) = \operatorname{tr} A + \operatorname{tr} B$ to both sides of the above inequality to get
$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A+B) \;\le\; \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B).$$
This proves the theorem. ∎
As in Section III.2, it is useful to interpret the above results as perturbation theorems. The following statement for Hermitian matrices $A$, $B$ can be derived from (III.11) by changing variables:
$$\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B) \;\prec\; \lambda(A - B). \tag{III.12}$$
This can also be written as
$$\lambda^{\downarrow}(A) + \lambda^{\uparrow}(B) \;\prec\; \lambda(A+B) \;\prec\; \lambda^{\downarrow}(A) + \lambda^{\downarrow}(B). \tag{III.13}$$
In fact, the two right-hand majorisations are consequences of the weaker maximum principle of Ky Fan. As a consequence of (III.12) we have:

Theorem III.4.4 Let $A$, $B$ be Hermitian matrices and let $\Phi$ be any symmetric gauge function on $\mathbb{R}^n$. Then
$$\Phi\bigl(\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B)\bigr) \;\le\; \Phi\bigl(\lambda(A - B)\bigr) \;\le\; \Phi\bigl(\lambda^{\downarrow}(A) - \lambda^{\uparrow}(B)\bigr).$$
Note that Weyl's perturbation theorem (Corollary III.2.6) and the inequality in Exercise III.2.7 are very special cases of this theorem. The majorisations in (III.13) are significant generalisations of those in (II.35), which follow from these by restricting $A$, $B$ to be diagonal matrices. Such "noncommutative" extensions exist for some other results; they are harder to prove. Some are given in this section; many more will occur later.

It is convenient to adopt the following notational shorthand. If $x$, $y$, $z$ are $n$-vectors with nonnegative coordinates, we will write
$$\log x \prec_w \log y \quad \text{if} \quad \prod_{j=1}^{k} x_j \le \prod_{j=1}^{k} y_j \quad \text{for } k = 1, \ldots, n; \tag{III.14}$$
$$\log x \prec \log y \quad \text{if} \quad \log x \prec_w \log y \quad \text{and} \quad \prod_{j=1}^{n} x_j = \prod_{j=1}^{n} y_j; \tag{III.15}$$
and
$$\log x - \log z \prec_w \log y \quad \text{if} \quad \prod_{j=1}^{k} x_{i_j} \le \prod_{j=1}^{k} y_j \prod_{j=1}^{k} z_{i_j} \tag{III.16}$$
for all indices $1 \le i_1 < \cdots < i_k \le n$. Note that we are allowing the possibility of zero coordinates in this notation.
Theorem III.4.5 (Gel'fand–Naimark) Let $A$, $B$ be any two operators on $\mathcal{H}$. Then the singular values of $A$, $B$, and $AB$ satisfy the majorisation
$$\log s(AB) - \log s(B) \;\prec_w\; \log s(A). \tag{III.17}$$

Proof. We will use the result of Exercise III.3.7. Fix any index $k$, $1 \le k \le n$. Choose any $k$ orthonormal vectors $x_1, \ldots, x_k$, and let $\mathcal{W}$ be their linear span. Let $\Phi(t_1, \ldots, t_k) = t_1 t_2 \cdots t_k$. Express $AB$ in its polar form $AB = UP$. Then, denoting by $T_{\mathcal{W}}$ the compression of an operator $T$ to the subspace $\mathcal{W}$, we have
$$|\det P_{\mathcal{W}}|^2 \;=\; |\det((x_i, P_{\mathcal{W}} x_j))|^2 \;=\; |\det((x_i, P x_j))|^2 \;=\; |\det((A^* U x_i, B x_j))|^2.$$
Using Exercise I.5.7 we see that this is dominated by
$$\det\bigl((A^* U x_i, A^* U x_j)\bigr) \cdot \det\bigl((B x_i, B x_j)\bigr).$$
The second of these determinants is equal to $\det (B^*B)_{\mathcal{W}}$; the first is equal to $\det (AA^*)_{U\mathcal{W}}$, and by Corollary III.1.5 is dominated by $\prod_{j=1}^{k} s_j^2(A)$. Hence, we have
$$|\det P_{\mathcal{W}}|^2 \;\le\; \prod_{j=1}^{k} s_j^2(A) \, \det (B^*B)_{\mathcal{W}}.$$
Now, using Exercise III.3.7, we can conclude that
$$\Bigl(\prod_{j=1}^{k} \lambda_{i_j}^{\downarrow}(P)\Bigr)^2 \;\le\; \prod_{j=1}^{k} \lambda_{i_j}^{\downarrow}(|B|^2) \prod_{j=1}^{k} s_j^2(A),$$
i.e.,
$$\prod_{j=1}^{k} s_{i_j}(AB) \;\le\; \prod_{j=1}^{k} s_{i_j}(B) \prod_{j=1}^{k} s_j(A) \tag{III.18}$$
for all $1 \le i_1 < \cdots < i_k \le n$. This, by definition, is what (III.17) says. ∎
Remark. The statement
$$\prod_{j=1}^{k} s_j(AB) \;\le\; \prod_{j=1}^{k} s_j(A) \prod_{j=1}^{k} s_j(B), \tag{III.19}$$
which is a special case of (III.18), is easier to prove. It is just the statement $\|\wedge^k(AB)\| \le \|\wedge^k A\| \, \|\wedge^k B\|$. If we temporarily introduce the notation $s^{\downarrow}(A)$ and $s^{\uparrow}(A)$ for the vectors whose coordinates are the singular values of $A$ arranged in decreasing order and in increasing order, respectively, then the inequalities (III.18) and (III.19) can be combined to yield
$$\log s^{\downarrow}(A) + \log s^{\uparrow}(B) \;\prec\; \log s(AB) \;\prec\; \log s^{\downarrow}(A) + \log s^{\downarrow}(B) \tag{III.20}$$
for any two matrices $A$, $B$. In conformity with our notation, this is a symbolic representation of the inequalities
$$\prod_{j=1}^{k} s_{i_j}(A) \prod_{j=1}^{k} s_{n-i_j+1}(B) \;\le\; \prod_{j=1}^{k} s_j(AB) \;\le\; \prod_{j=1}^{k} s_j(A) \prod_{j=1}^{k} s_j(B)$$
for all $1 \le i_1 < \cdots < i_k \le n$. It is illuminating to compare this with the statement (III.13) for eigenvalues of Hermitian matrices.
Corollary III.4.6 (Lidskii) Let $A$, $B$ be two positive matrices. Then all the eigenvalues of $AB$ are nonnegative, and
$$\log \lambda^{\downarrow}(A) + \log \lambda^{\uparrow}(B) \;\prec\; \log \lambda(AB) \;\prec\; \log \lambda^{\downarrow}(A) + \log \lambda^{\downarrow}(B). \tag{III.21}$$

Proof. It is enough to prove this when $B$ is invertible, since every positive matrix is a limit of such matrices. For invertible $B$ we can write
$$AB \;=\; B^{-1/2} \bigl(B^{1/2} A B^{1/2}\bigr) B^{1/2}.$$
Now $B^{1/2} A B^{1/2}$ is positive; hence the matrix $AB$, which is similar to it, has nonnegative eigenvalues. Now, from (III.20) we obtain
$$\log \lambda^{\downarrow}(A^{1/2}) + \log \lambda^{\uparrow}(B^{1/2}) \;\prec\; \log s(A^{1/2} B^{1/2}) \;\prec\; \log \lambda^{\downarrow}(A^{1/2}) + \log \lambda^{\downarrow}(B^{1/2}). \tag{III.22}$$
But $s^2(A^{1/2} B^{1/2}) = \lambda^{\downarrow}(B^{1/2} A B^{1/2}) = \lambda^{\downarrow}(AB)$. So, the majorisations (III.21) follow from (III.22). ∎
III.5 Eigenvalues of Real Parts and Singular Values

The Cartesian decomposition $A = \operatorname{Re} A + i \operatorname{Im} A$ of a matrix $A$ associates with it two Hermitian matrices $\operatorname{Re} A = \frac{A + A^*}{2}$ and $\operatorname{Im} A = \frac{A - A^*}{2i}$. It is of interest to know relationships between the eigenvalues of these matrices, those of $A$, and the singular values of $A$. Weyl's majorant theorem (Theorem II.3.6) provides one such relationship: $\log |\lambda(A)| \prec \log s(A)$. Some others, whose proofs are in the same spirit as others in this chapter, are given below.

Proposition III.5.1 (Fan–Hoffman) For every matrix $A$,
$$\lambda_j^{\downarrow}(\operatorname{Re} A) \;\le\; s_j(A) \quad \text{for all } j = 1, \ldots, n.$$

Proof. Let $x_j$ be eigenvectors of $\operatorname{Re} A$ belonging to its eigenvalues $\lambda_j^{\downarrow}(\operatorname{Re} A)$, and $y_j$ eigenvectors of $|A|$ belonging to its eigenvalues $s_j(A)$, $1 \le j \le n$. For each $j$ consider the spaces $\operatorname{span}\{x_1, \ldots, x_j\}$ and $\operatorname{span}\{y_j, \ldots, y_n\}$. Their dimensions add up to $n+1$, so they have a nonzero intersection. If $x$ is a unit vector in their intersection, then
$$\lambda_j^{\downarrow}(\operatorname{Re} A) \;\le\; (x, (\operatorname{Re} A)x) \;=\; \operatorname{Re}(x, Ax) \;\le\; |(x, Ax)| \;\le\; \|Ax\| \;=\; (x, A^*Ax)^{1/2} \;\le\; s_j(A). \qquad \blacksquare$$
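The Fan–Hoffman inequality is immediate to verify numerically. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Eigenvalues of Re A in decreasing order, and singular values of A
# (numpy's svd already returns them in decreasing order).
re_eigs = np.sort(np.linalg.eigvalsh((A + A.conj().T) / 2))[::-1]
svals = np.linalg.svd(A, compute_uv=False)

assert np.all(re_eigs <= svals + 1e-10)     # lambda_j(Re A) <= s_j(A) for all j
```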
Exercise III.5.2 (i) Let $A$ be the $2 \times 2$ matrix $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$. Then $s_2(A) = 0$, but $\operatorname{Re} A$ has two nonzero eigenvalues. Hence the vector $|\lambda(\operatorname{Re} A)|^{\downarrow}$ is not dominated by the vector $s(A)$.

(ii) However, note that $|\lambda(\operatorname{Re} A)| \prec_w s(A)$ for every matrix $A$. (Use the triangle inequality for Ky Fan norms.)

Proposition III.5.3 (Ky Fan) For every matrix $A$ we have
$$\operatorname{Re} \lambda(A) \;\prec\; \lambda(\operatorname{Re} A).$$

Proof. Arrange the eigenvalues $\lambda_j(A)$ in such a way that $\operatorname{Re}\lambda_1(A) \ge \cdots \ge \operatorname{Re}\lambda_n(A)$. Let $x_1, \ldots, x_n$ be an orthonormal Schur basis for $A$ such that $\lambda_j(A) = (x_j, Ax_j)$. Then $\operatorname{Re}\lambda_j(A) = (x_j, (\operatorname{Re} A)x_j)$. Let $\mathcal{W} = \operatorname{span}\{x_1, \ldots, x_k\}$. Then
$$\sum_{j=1}^{k} \operatorname{Re}\lambda_j(A) \;=\; \sum_{j=1}^{k} (x_j, (\operatorname{Re} A)x_j) \;=\; \operatorname{tr}(\operatorname{Re} A)_{\mathcal{W}} \;=\; \sum_{j=1}^{k} \lambda_j\bigl((\operatorname{Re} A)_{\mathcal{W}}\bigr) \;\le\; \sum_{j=1}^{k} \lambda_j^{\downarrow}(\operatorname{Re} A).$$
Since $\sum_{j=1}^{n} \operatorname{Re}\lambda_j(A) = \operatorname{tr} \operatorname{Re} A = \sum_{j=1}^{n} \lambda_j(\operatorname{Re} A)$, the majorisation follows. ∎
Exercise III.5.4 Give another proof of Proposition III.5.3 using Schur's theorem (given in Exercise II.1.12).

Exercise III.5.5 (i) Let $X$, $Y$ be Hermitian matrices. Suppose that their eigenvalues can be indexed as $\lambda_j(X)$ and $\lambda_j(Y)$, $1 \le j \le n$, in such a way that $\lambda_j(X) \le \lambda_j(Y)$ for all $j$. Then there exists a unitary $U$ such that $X \le U^* Y U$.

(ii) For every matrix $A$ there exists a unitary matrix $U$ such that $\operatorname{Re} A \le U^* |A| U$.

An interesting consequence of Proposition III.5.1 is the following version of the triangle inequality for the matrix absolute value:
Theorem III.5.6 (R. C. Thompson) Let $A$, $B$ be any two matrices. Then there exist unitary matrices $U$, $V$ such that
$$|A + B| \;\le\; U|A|U^* + V|B|V^*.$$

Proof. Let $A + B = W|A+B|$ be a polar decomposition of $A+B$. Then we can write
$$|A+B| \;=\; W^*(A+B) \;=\; \operatorname{Re} W^*(A+B) \;=\; \operatorname{Re} W^*A + \operatorname{Re} W^*B.$$
Now use Exercise III.5.5(ii). ∎
Exercise III.5.7 (i) Find $2 \times 2$ matrices $A$, $B$ such that the inequality $|A+B| \le |A| + |B|$ is false for them.

(ii) Find $2 \times 2$ matrices $A$, $B$ for which there does not exist any unitary matrix $U$ such that $|A+B| \le U(|A| + |B|)U^*$.

III.6 Problems
Problem III.6.1 (The minimax principle for singular values). For any operator $A$ on $\mathcal{H}$ we have
$$s_j(A) \;=\; \max_{\mathcal{M} : \dim \mathcal{M} = j} \; \min_{\substack{x \in \mathcal{M} \\ \|x\|=1}} \|Ax\| \;=\; \min_{\mathcal{N} : \dim \mathcal{N} = n-j+1} \; \max_{\substack{x \in \mathcal{N} \\ \|x\|=1}} \|Ax\|$$
for $1 \le j \le n$.
Problem III.6.2 Let $A$, $B$ be any two operators. Then
$$s_j(AB) \le \|B\| \, s_j(A), \qquad s_j(AB) \le \|A\| \, s_j(B) \qquad \text{for } 1 \le j \le n.$$
Problem III.6.3 For $j = 0, 1, \ldots, n$, let
$$\mathcal{R}_j = \{T \in \mathcal{L}(\mathcal{H}) : \operatorname{rank} T \le j\}.$$
Show that for $j = 1, 2, \ldots, n$,
$$s_j(A) \;=\; \min_{T \in \mathcal{R}_{j-1}} \|A - T\|.$$

Problem III.6.4 Show that if $A$ is any operator and $H$ is any operator of rank $k$, then
$$s_j(A) \;\ge\; s_{j+k}(A + H), \qquad j = 1, 2, \ldots, n-k.$$

Problem III.6.5 For any two operators $A$, $B$ and any two indices $i$, $j$ such that $i + j \le n + 1$, we have
$$s_{i+j-1}(A+B) \;\le\; s_i(A) + s_j(B),$$
$$s_{i+j-1}(AB) \;\le\; s_i(A) \, s_j(B).$$
Problem III.6.6 Show that for every operator $A$ and for each $k = 1, 2, \ldots, n$, we have
$$\sum_{j=1}^{k} s_j(A) \;=\; \max \sum_{j=1}^{k} |(y_j, Ax_j)|,$$
where the maximum is over all choices of orthonormal $k$-tuples $x_1, \ldots, x_k$ and $y_1, \ldots, y_k$. This can also be written as
$$\sum_{j=1}^{k} s_j(A) \;=\; \max \sum_{j=1}^{k} (x_j, UAx_j),$$
where the maximum is taken over all choices of unitary operators $U$ and orthonormal $k$-tuples $x_1, \ldots, x_k$. Note that for $k = 1$ this reduces to the statement
$$\|A\| \;=\; \max_{\|x\| = \|y\| = 1} |(y, Ax)|.$$
For k == 1 , 2, . . . the above extremal representations can be used to give k another proof of the fact that the expressions I I A I I c k ) == L sj ( A) are norms . j=l ( See Exercise II. l. l 5 . ) ( aij ) be a Hermitian matrix. For each i Problem 111.6.7. Let A 1, . . . let 1/ 2 ri == L 1 aij l 2 j::f:. i Show that each interval [a ii  ri, aii + ri ] contains at least one eigenvalue of A . Problem 111.6.8. Let a 1 > a 2 > > a n be the eigenvalues of a Her mitian matrix A. We have seen that the n  1 eigenvalues of any principal submatrix of A interlace with these numbers. If 81 > 82 > · · · > 8n  1 are the roots of the polynomial that is the derivative of the characteristic polynomial of A, then we have by Rolle's Theorem , n,
, n,
·
·
·
Show that for each j there exists a principal submatrix B of A for which a1 > AJ ( B) > 8j and another principal submatrix C for which 8j >
AJ (C) > ai + l ·
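The localisation statement of Problem III.6.7 is easy to test numerically. The sketch below is not from the book; it assumes NumPy and a randomly generated complex Hermitian matrix, and checks that each interval [a_ii − r_i, a_ii + r_i] captures at least one eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2          # a random Hermitian matrix

eigs = np.linalg.eigvalsh(A)      # real eigenvalues
for i in range(n):
    # r_i is the Euclidean norm of the off-diagonal part of row i.
    r_i = np.sqrt(sum(abs(A[i, j]) ** 2 for j in range(n) if j != i))
    # Some eigenvalue lies within r_i of the diagonal entry a_ii.
    assert np.min(np.abs(eigs - A[i, i].real)) <= r_i + 1e-12
```

This is a refinement of the Gershgorin picture: the radius here is the l_2 norm of the off-diagonal row, not its l_1 norm.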
Problem III.6.9. Most of the results in this chapter gave descriptions of eigenvalues of a Hermitian operator A in terms of the numbers (x, Ax) when x varies over unit vectors. Sometimes in computational problems an "approximate" eigenvalue λ and an "approximate" eigenvector x are already known. The number (x, Ax) can then be used to further refine this information. For a given unit vector x, let

ρ = (x, Ax),   ε = ||(A − ρ)x||.

(i) Let (a, b) be an open interval that contains ρ but does not contain any eigenvalue of A. Show that

(b − ρ)(ρ − a) ≤ ε².

(ii) Show that there exists an eigenvalue α of A such that

|α − ρ| ≤ ε.
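Part (ii) of Problem III.6.9 is the basic residual bound: the Rayleigh quotient of any unit vector lies within the residual norm of some true eigenvalue. The following sketch is an illustration added here (not from the original text), assuming NumPy and a random symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                  # random real symmetric matrix

x = rng.standard_normal(n)
x /= np.linalg.norm(x)             # an arbitrary unit "approximate eigenvector"

rho = x @ A @ x                    # Rayleigh quotient rho = (x, Ax)
eps = np.linalg.norm(A @ x - rho * x)

eigs = np.linalg.eigvalsh(A)
# Part (ii): some eigenvalue alpha satisfies |alpha - rho| <= eps.
assert np.min(np.abs(eigs - rho)) <= eps + 1e-12
```

The better x approximates an eigenvector, the smaller ε becomes, and the bound tightens accordingly.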
Problem III.6.10. Let ρ and ε be defined as in the above problem. Let (a, b) be an open interval that contains ρ and only one eigenvalue α of A. Then

ρ − ε²/(b − ρ) ≤ α ≤ ρ + ε²/(ρ − a).

This is called the Kato–Temple inequality. Note that if ρ − a and b − ρ are much larger than ε, then this improves the inequality in part (ii) of Problem III.6.9.

Problem III.6.11. Show that for every Hermitian matrix A

∑_{j=1}^k λ_j^↓(A) = max tr UAU*,
∑_{j=1}^k λ_j^↑(A) = min tr UAU*,

for 1 ≤ k ≤ n, where the extrema are taken over k × n matrices U that satisfy UU* = I_k, I_k being the k × k identity matrix. Show that if A is positive, then

∏_{j=1}^k λ_j^↓(A) = max det UAU*,
∏_{j=1}^k λ_j^↑(A) = min det UAU*.

(See Problem I.6.15.)

Problem III.6.12. Let A, B be any matrices. Then

∑_{j=1}^n s_j(A) s_j(B) = sup_{U,V} |tr UAVB| = sup_{U,V} Re tr UAVB,
where U, V vary over all unitary matrices.

Problem III.6.13. (Perturbation theorem for singular values) Let A, B be any n × n matrices and let Φ be any symmetric gauge function. Then

Φ(s(A) − s(B)) ≤ Φ(s(A − B)).

In particular,

max_j |s_j(A) − s_j(B)| ≤ ||A − B||.

[Hint: See Theorem III.4.4 and Exercise II.1.15.]

Problem III.6.14. For positive matrices A, B, and then for Hermitian matrices A, B, show that

⟨λ^↓(A), λ^↑(B)⟩ ≤ tr AB ≤ ⟨λ^↓(A), λ^↓(B)⟩.

(Compare these with (II.36) and (II.37).)

Problem III.6.15. Let A, B be Hermitian matrices. Use the second part of Problem III.6.14 to show that

||λ^↓(A) − λ^↓(B)||₂ ≤ ||A − B||₂ ≤ ||λ^↓(A) − λ^↑(B)||₂.

Note the analogy between this and Theorem III.2.8. (In Chapter IV we will see that both these results are true for a whole family of norms called unitarily invariant norms. This more general result is a consequence of Theorem III.4.4.)

III.7 Notes and References
As pointed out in Exercise III.1.6, many of the results in Sections III.1 and III.2 could be derived from each other. Hence, it seems fair to say that the variational principles for eigenvalues originated with A.L. Cauchy's interlacing theorem. A pertinent reference is Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes, 1829, in A.L. Cauchy, Oeuvres Complètes (IIe Série), Volume 9, Gauthier-Villars.

The minimax principle was first stated by E. Fischer, Über quadratische Formen mit reellen Koeffizienten, Monatsh. Math. Phys., 16 (1905) 234-249. The monotonicity principle and many of the results of Section III.2 were proved by H. Weyl in Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen, Math. Ann., 71 (1911) 441-469. In a series of papers beginning with Über die Eigenwerte bei den Differentialgleichungen der mathematischen Physik, Math. Z., 7 (1920) 1-57,
R. Courant exploited the full power of the minimax principle. Thus the principle is often described as the Courant-Fischer-Weyl principle.

As the titles of these papers suggest, the variational principles for eigenvalues were discovered in connection with problems of physics. One famous work where many of these were used is The Theory of Sound by Lord Rayleigh, reprinted by Dover in 1945. The modern applied mathematics classic Methods of Mathematical Physics by R. Courant and D. Hilbert, Wiley, 1953, is replete with applications of variational principles. For a still more recent source, see M. Reed and B. Simon, Methods of Modern Mathematical Physics, Volume 4, Academic Press, 1978. Of course, here most of the interest is in infinite-dimensional problems, and consequently the results are much more complicated. The numerical analyst could turn to B.N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, 1980, and to G.W. Stewart and J.G. Sun, Matrix Perturbation Theory, Academic Press, 1990.

The converse to the interlacing theorem given in Theorem III.1.9 was first proved in L. Mirsky, Matrices with prescribed characteristic roots and diagonal elements, J. London Math. Soc., 33 (1958) 14-21. We do not know whether the similar question for higher dimensional compressions has been answered. More precisely, let α_1 ≥ ··· ≥ α_n and β_1 ≥ ··· ≥ β_n be real numbers such that Σα_i = Σβ_i. What conditions must these numbers satisfy so that there exists an orthogonal projection P of rank k such that the matrix A = diag(α_1, ..., α_n), when compressed to range P, has eigenvalues β_1, ..., β_k, and when compressed to (range P)^⊥ has eigenvalues β_{k+1}, ..., β_n? (Theorem III.1.9 is the case k = n − 1.)

Aronszajn's inequality appeared in N. Aronszajn, Rayleigh-Ritz and A. Weinstein methods for approximation of eigenvalues. I. Operators in a Hilbert space, Proc. Nat. Acad. Sci. U.S.A., 34 (1948) 474-480. The elegant proof of its equivalence to Weyl's inequality is due to H.W. Wielandt, Topics in the Analytic Theory of Matrices, mimeographed lecture notes, University of Wisconsin, 1967.

Theorem III.3.5 was proved in H.W. Wielandt, An extremum property of sums of eigenvalues, Proc. Amer. Math. Soc., 6 (1955) 106-110. The motivation for Wielandt was that he "did not succeed in completing the interesting sketch of a proof given by Lidskii" of the statement given in Exercise III.4.3. He noted that this is equivalent to what we have stated as Theorem III.4.1, and derived it from his new minimax principle. Interestingly, now several different proofs of Lidskii's Theorem are known. The second proof given in Section III.4 is due to M.F. Smiley, Inequalities related to Lidskii's, Proc. Amer. Math. Soc., 19 (1968) 1029-1034. We will see some other proofs later. However, Theorem III.3.5 is more general, has several other applications, and has led to a lot of research. An account of the earlier work on these questions may be found in A.R. Amir-Moez, Extreme Properties of Linear Transformations and Geometry in Unitary Spaces, Texas Tech University, 1968, from which our treatment of Section III.3 has been adapted. An attempt to extend these ideas to infinite
dimensions was made in R.C. Riddell, Minimax problems on Grassmann manifolds, Advances in Math., 54 (1984) 107-199, where connections with differential geometry and some problems in quantum physics are also developed. The tower of subspaces occurring in Theorem III.3.5 suggests a connection with Schubert calculus in algebraic geometry. This connection is yet to be fully understood.

Lidskii's Theorem has an interesting history. It appeared first in V.B. Lidskii, On the proper values of a sum and product of symmetric matrices, Dokl. Akad. Nauk SSSR, 75 (1950) 769-772. It seems that Lidskii provided an elementary (matrix analytic) proof of the result which F. Berezin and I.M. Gel'fand had proved by more advanced (Lie theoretic) techniques in connection with their work that appeared later in Some remarks on the theory of spherical functions on symmetric Riemannian manifolds, Trudi Moscow Math. Ob., 5 (1956) 311-351. As mentioned above, difficulties with this "elementary" proof led Wielandt to the discovery of his minimax principle.

Among the several directions this work opened up, one led to the following question. What relations must three n-tuples of real numbers satisfy in order to be the eigenvalues of some Hermitian matrices A, B, and A + B? Necessary conditions are given by Theorem III.4.1. Many more were discovered by others. A. Horn, Eigenvalues of sums of Hermitian matrices, Pacific J. Math., 12 (1962) 225-242, derived necessary and sufficient conditions in the above problem for the case n = 4, and wrote down a set of conditions which he conjectured would be necessary and sufficient for n > 4. In a short paper, Spectral polyhedron of a sum of two Hermitian matrices, Functional Analysis and Appl., 10 (1982) 76-77, B.V. Lidskii has sketched a "proof" establishing Horn's conjecture. This proof, however, needs a lot of details to be filled in; these have not yet been published by B.V. Lidskii (or anyone else). When should a theorem be considered to be proved? For an interesting discussion of this question, see S. Smale, The fundamental theorem of algebra and complexity theory, Bull. Amer. Math. Soc. (New Series), 4 (1981) 1-36.

Theorem III.4.5 was proved in I.M. Gel'fand and M. Naimark, The relation between the unitary representations of the complex unimodular group and its unitary subgroup, Izv. Akad. Nauk SSSR Ser. Mat., 14 (1950) 239-260. Many of the questions concerning eigenvalues and singular values of sums and products were first framed in this paper. An excellent summary of these results can be found in A.S. Markus, The eigen- and singular values of the sum and product of linear operators, Russian Math. Surveys, 19 (1964) 92-120.

The structure of inequalities like (III.10) and (III.18) was carefully analysed in several papers by R.C. Thompson and his students. The asymmetric way in which A and B enter (III.10) is remedied by one of their inequalities,
which says

∑_{j=1}^k λ^↓_{i_j + p_j − j}(A + B) ≤ ∑_{j=1}^k λ^↓_{i_j}(A) + ∑_{j=1}^k λ^↓_{p_j}(B)

for any indices 1 ≤ i_1 < ··· < i_k ≤ n and 1 ≤ p_1 < ··· < p_k ≤ n such that i_k + p_k − k ≤ n. A similar generalisation of (III.18) has also been proved. References to this work may be found in the book by Marshall and Olkin cited in Chapter II.

Proposition III.5.1 is proved in K. Fan and A.J. Hoffman, Some metric inequalities in the space of matrices, Proc. Amer. Math. Soc., 6 (1955) 111-116. Results of Proposition III.5.3, Problems III.6.5, III.6.6, III.6.11, and III.6.12 were first proved by Ky Fan in several papers. References to these may be found in I.C. Gohberg and M.G. Krein, Introduction to the Theory of Linear Nonselfadjoint Operators, American Math. Society, 1969, and in the Marshall-Olkin book cited earlier.

The matrix triangle inequality (Theorem III.5.6) was proved in R.C. Thompson, Convex and concave functions of singular values of matrix sums, Pacific J. Math., 66 (1976) 285-290. An extension to infinite dimensions was attempted in C. Akemann, J. Anderson, and G. Pedersen, Triangle inequalities in operator algebras, Linear and Multilinear Algebra, 11 (1982) 167-178. For operators A, B on an infinite-dimensional Hilbert space there exist isometries U, V such that

|A + B| ≤ U|A|U* + V|B|V*.

Also, for each ε > 0 there exist unitaries U, V such that

|A + B| ≤ U|A|U* + V|B|V* + εI.

It is not known whether the εI part in the last statement is necessary.

Refinements of the interlacing principle such as the one in Problem III.6.8 have been obtained by several authors, including R.C. Thompson. See, for example, his paper Principal submatrices II, Linear Algebra Appl., 1 (1968) 211-243.

One may wonder whether there are interlacing theorems for singular values. There are, although they are a little different from the ones for eigenvalues. This is best understood if we extend the definition of singular values to rectangular matrices. Let A be an m × n matrix. Let r = min(m, n). The r numbers that are the common eigenvalues of (A*A)^{1/2} and (AA*)^{1/2} are called the singular values of A. (Sometimes a sequence of zeroes is added to make max(m, n) singular values in all.) Many of the results for singular values that we have proved can be carried over to this setting. See, e.g., the books by Horn and Johnson cited in Chapter I.
Let A be a rectangular matrix and let B be a matrix obtained by deleting any row or any column of A. Then the minimax principle can be used to prove that the singular values of A and B interlace. The reader should work this out, and see that when A is an n × n matrix and B a principal submatrix of order n − 1, then this gives
s_1(A) ≥ s_1(B) ≥ s_3(A),
s_2(A) ≥ s_2(B) ≥ s_4(A),
. . . . . . . . . . . . . . .
s_{n−2}(A) ≥ s_{n−2}(B) ≥ s_n(A),
s_{n−1}(A) ≥ s_{n−1}(B) ≥ 0.
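The interlacing pattern above can be verified numerically. The sketch below is an added illustration (not from the book), assuming NumPy and a random square matrix whose first row and column are deleted to form the principal submatrix.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 7
A = rng.standard_normal((n, n))
B = A[1:, 1:]                     # a principal submatrix of order n - 1

sA = np.linalg.svd(A, compute_uv=False)   # decreasing singular values
sB = np.linalg.svd(B, compute_uv=False)

for j in range(n - 1):
    assert sA[j] >= sB[j] - 1e-12         # s_j(A) >= s_j(B)
    if j + 2 < n:
        assert sB[j] >= sA[j + 2] - 1e-12 # s_j(B) >= s_{j+2}(A)
```

Note the shift by two: deleting a row and a column can push a singular value of B below s_{j+1}(A), but never below s_{j+2}(A).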
For more such results, see R.C. Thompson, Principal submatrices IX, Linear Algebra and Appl., 5 (1972) 1-12.

Inequalities like the ones in Problems III.6.9 and III.6.10 are called "residual bounds" in the numerical analysis literature. For more such results, see the book by Parlett cited above, and F. Chatelin, Spectral Approximation of Linear Operators, Academic Press, 1983. Several refinements, extensions, and applications of these results in atomic physics are described in the book by Reed and Simon cited above.

The results of Theorem III.4.4 and Problem III.6.13 were noted by L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. Oxford Ser. (2), 11 (1960) 50-59. This paper contains a lucid survey of several related problems and has stimulated a lot of research. The inequalities in Problem III.6.15 were first stated in K. Löwner, Über monotone Matrixfunktionen, Math. Z., 38 (1934) 177-216.

Let A = UP be a polar decomposition of A. Weyl's majorant theorem gives a relationship between the eigenvalues of A and those of P (the singular values of A). A relation between the eigenvalues of A and those of U was proved by A. Horn and R. Steinberg, Eigenvalues of the unitary part of a matrix, Pacific J. Math., 9 (1959) 541-550. This is in the form of a majorisation between the arguments of the eigenvalues:

arg λ(A) ≺ arg λ(U).

A theorem very much like Theorems III.4.1 and III.4.5 was proved by A. Nudel'man and P. Svarcman, The spectrum of a product of unitary matrices, Uspehi Mat. Nauk, 13 (1958) 111-117. Let A, B be unitary matrices. Label the eigenvalues of A, B, and AB as e^{iα_1}, ..., e^{iα_n}; e^{iβ_1}, ..., e^{iβ_n}; and e^{iγ_1}, ..., e^{iγ_n}, respectively, in such a way that

2π > α_1 ≥ ··· ≥ α_n ≥ 0,
2π > β_1 ≥ ··· ≥ β_n ≥ 0,
2π > γ_1 ≥ ··· ≥ γ_n ≥ 0.
If α_1 + β_1 < 2π, then for any choice of indices 1 ≤ i_1 < ··· < i_k ≤ n we have

∑_{j=1}^k γ_{i_j} ≤ ∑_{j=1}^k α_{i_j} + ∑_{j=1}^k β_j.

These inequalities can also be written in the form of a majorisation between n-vectors:

γ − α ≺ β.

For a generalisation in the same spirit as the one of inequalities (III.10) and (III.18) mentioned earlier, see R.C. Thompson, On the eigenvalues of a product of unitary matrices, Linear and Multilinear Algebra, 2 (1974) 13-24.
IV. Symmetric Norms
In this chapter we study norms on the space of matrices that are invariant under multiplication by unitaries. Their properties are closely linked to those of symmetric gauge functions on R^n. We also study norms that are invariant under unitary conjugations. Some of the inequalities proved in earlier chapters lead to inequalities involving these norms.

IV.1 Norms on C^n
Let us begin by considering the familiar p-norms frequently used in analysis. For a vector x = (x_1, ..., x_n) we define

||x||_p = (∑_{i=1}^n |x_i|^p)^{1/p},   1 ≤ p < ∞,   (IV.1)
||x||_∞ = max_{1≤i≤n} |x_i|.   (IV.2)

For each 1 ≤ p ≤ ∞, ||x||_p defines a norm on C^n. These are called the p-norms or the l_p-norms. The notation (IV.2) is justified because of the fact that

||x||_∞ = lim_{p→∞} ||x||_p.   (IV.3)

Some of the pleasant properties of this family of norms are

||x||_p = || |x| ||_p,   (IV.4)
||x||_p ≤ ||y||_p if |x| ≤ |y|,   (IV.5)
||x||_p = ||Px||_p for all x ∈ C^n, P ∈ S_n.   (IV.6)

(Recall the notations: |x| = (|x_1|, ..., |x_n|), and |x| ≤ |y| if |x_j| ≤ |y_j| for 1 ≤ j ≤ n. S_n is the set of permutation matrices.) A norm on C^n is called gauge invariant or absolute if it satisfies the condition (IV.4), monotone if it satisfies (IV.5), and permutation invariant or symmetric if it satisfies (IV.6). The first two of these conditions turn out to be equivalent:

Proposition IV.1.1 A norm on C^n is gauge invariant if and only if it is monotone.
Proof. Monotonicity clearly implies gauge invariance. Conversely, if a norm ||·|| is gauge invariant, then to show that it is monotone it is enough to show that ||x|| ≤ ||y|| whenever x_j = t_j y_j for some real numbers 0 ≤ t_j ≤ 1, j = 1, 2, ..., n. Further, it suffices to consider the special case when all t_j except one are equal to 1. But then

||(y_1, ..., t y_k, ..., y_n)||
= || (1+t)/2 (y_1, ..., y_k, ..., y_n) + (1−t)/2 (y_1, ..., −y_k, ..., y_n) ||
≤ (1+t)/2 ||(y_1, ..., y_n)|| + (1−t)/2 ||(y_1, ..., −y_k, ..., y_n)||
= ||(y_1, ..., y_n)||.
Example IV.1.2 Consider the following norms on R²:

(i) ||x|| = |x_1| + |x_2| + |x_1 − x_2|.
(ii) ||x|| = |x_1| + |x_1 − x_2|.
(iii) ||x|| = 2|x_1| + |x_2|.

The first of these is symmetric but not gauge invariant, the second is neither symmetric nor gauge invariant, while the third is not symmetric but is gauge invariant.
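The three claims in Example IV.1.2 can be checked on a single test vector. This small sketch is an added illustration (not part of the book's text), assuming NumPy; a symmetry failure is witnessed by swapping coordinates, a gauge-invariance failure by flipping one sign.

```python
import numpy as np

norm1 = lambda x: abs(x[0]) + abs(x[1]) + abs(x[0] - x[1])   # (i)
norm2 = lambda x: abs(x[0]) + abs(x[0] - x[1])               # (ii)
norm3 = lambda x: 2 * abs(x[0]) + abs(x[1])                  # (iii)

x = np.array([1.0, -2.0])
swap = x[::-1]                    # permuted coordinates
flip = np.array([x[0], -x[1]])    # one sign changed, same |x|

# (i) is symmetric but not gauge invariant:
assert norm1(x) == norm1(swap) and norm1(x) != norm1(flip)
# (ii) is neither symmetric nor gauge invariant:
assert norm2(x) != norm2(swap) and norm2(x) != norm2(flip)
# (iii) is gauge invariant but not symmetric:
assert norm3(x) == norm3(flip) and norm3(x) != norm3(swap)
```

A single counterexample vector suffices to refute each invariance property, while the equalities here only illustrate (rather than prove) the properties that do hold.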
Norms that are both symmetric and gauge invariant are especially interesting. Before studying more examples and properties of such norms, let us make a few remarks. Let T be the circle group; i.e., the multiplicative group of all complex numbers of modulus 1. Let S_n ∘ T be the semidirect product of S_n and T. In other words, this is the group of all n × n matrices that have exactly one nonzero entry on each row and each column, and this nonzero entry has modulus 1. We will call such matrices complex permutation matrices. Then a norm ||·|| on C^n is symmetric and gauge invariant if and only if

||x|| = ||Tx|| for all complex permutations T.   (IV.7)
In other words, the group of (linear) isometries for ||·|| contains S_n ∘ T as a subgroup. (Linear isometries for a norm ||·|| are those linear transformations on C^n that preserve ||·||.)

Exercise IV.1.3 For the Euclidean norm ||x||_2 = (∑ |x_i|²)^{1/2}, the group of isometries is the group of all unitary matrices, which is much larger than the complex permutation group. Show that for each of the norms ||x||_1 and ||x||_∞ the group of isometries is the complex permutation group.

Note that gauge invariant norms on C^n are determined by those on R^n. Symmetric gauge invariant norms on R^n are called symmetric gauge functions. We have come across them earlier (Example II.3.13). To repeat, a map Φ : R^n → R_+ is called a symmetric gauge function if

(i) Φ is a norm,
(ii) Φ(Px) = Φ(x) for all x ∈ R^n and P ∈ S_n,
(iii) Φ(ε_1 x_1, ..., ε_n x_n) = Φ(x_1, ..., x_n) if ε_j = ±1.

In addition, we will always assume that Φ is normalised, so that

(iv) Φ(1, 0, ..., 0) = 1.
The conditions (ii) and (iii) can be expressed together by saying that Φ is invariant under the group S_n ∘ Z_2 consisting of permutations and sign changes of the coordinates. Notice also that a symmetric gauge function is completely determined by its values on R^n_+.

Example IV.1.4 If the coordinates of x are arranged so that |x_1| ≥ |x_2| ≥ ··· ≥ |x_n|, then for each k = 1, 2, ..., n, the function

Φ_(k)(x) = ∑_{j=1}^k |x_j|   (IV.8)

is a symmetric gauge function. We will also use the notation ||x||_(k) for these. The parentheses are used to distinguish these norms from the p-norms defined earlier. Indeed, note that ||x||_(1) = ||x||_∞ and ||x||_(n) = ||x||_1.

We have observed in Problem II.5.11 that these norms play a very distinguished role: if Φ_(k)(x) ≤ Φ_(k)(y) for all k = 1, 2, ..., n, then Φ(x) ≤ Φ(y) for every symmetric gauge function Φ. Thus an infinite family of norm inequalities follows from a finite one.

Proposition IV.1.5 For each k = 1, 2, ..., n,

Φ_(k)(x) = min {Φ_(n)(u) + k Φ_(1)(v) : x = u + v}.   (IV.9)
Proof. We may assume, without loss of generality, that x ∈ R^n_+ and that its coordinates are arranged in decreasing order. If x = u + v, then

Φ_(k)(x) ≤ Φ_(k)(u) + Φ_(k)(v) ≤ Φ_(n)(u) + k Φ_(1)(v).

If we choose

u = (x_1 − x_k, x_2 − x_k, ..., x_k − x_k, 0, ..., 0),
v = (x_k, ..., x_k, x_{k+1}, ..., x_n),

then u + v = x,

Φ_(n)(u) = Φ_(k)(x) − k x_k,   Φ_(1)(v) = x_k,

and the proposition follows.
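The decomposition constructed in the proof of Proposition IV.1.5 is easy to exhibit concretely. The following sketch is an added numerical check (not from the book), assuming NumPy and a small decreasing positive vector.

```python
import numpy as np

def phi_k(x, k):
    """Ky Fan gauge function: sum of the k largest |x_j|."""
    return np.sort(np.abs(x))[::-1][:k].sum()

x = np.array([5.0, 3.0, 2.0, 1.0])   # positive, decreasing
k = 2

# The optimal split from the proof: u carries the excess over x_k
# in the first k coordinates, v is capped at x_k there.
u = np.array([x[0] - x[k - 1], x[1] - x[k - 1], 0.0, 0.0])
v = x - u                             # (x_k, ..., x_k, x_{k+1}, ..., x_n)

assert np.allclose(u + v, x)
# Phi_(n)(u) + k * Phi_(1)(v) attains Phi_(k)(x):
assert np.isclose(np.abs(u).sum() + k * np.abs(v).max(), phi_k(x, k))
```

Here Φ_(2)(x) = 5 + 3 = 8, and the split gives Φ_(n)(u) + 2 Φ_(1)(v) = 2 + 2·3 = 8.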
We now derive some basic inequalities. If f is a convex function on an interval I and if α_i, i = 1, 2, ..., n, are nonnegative real numbers such that ∑_{i=1}^n α_i = 1, then

f(∑_{i=1}^n α_i t_i) ≤ ∑_{i=1}^n α_i f(t_i) for all t_i ∈ I.

Applying this to the function f(t) = −log t on the interval (0, ∞), one obtains the fundamental inequality

a_1^{α_1} a_2^{α_2} ··· a_n^{α_n} ≤ α_1 a_1 + α_2 a_2 + ··· + α_n a_n   (IV.10)

for nonnegative real numbers a_i. This is called the (weighted) arithmetic-geometric mean inequality. The special choice α_1 = α_2 = ··· = α_n = 1/n gives the usual arithmetic-geometric mean inequality

(a_1 a_2 ··· a_n)^{1/n} ≤ (a_1 + a_2 + ··· + a_n)/n.   (IV.11)
(IV. 1 1 ) Theorem IV. 1 . 6 Let p, q be real numbers with p > 1 and x , y E lRn . Then for every symmetric gauge function
!+�
==
1 . Let
(IV. 12) Pro of.
From the inequality (IV. 10) one obtains q x iP IYi l , lx Yl < p + .
q
88 · .
IV.
Symmetric Norms
and hence
( IV. 13)
For t > 0, if we replace x , y by t x and t  1 y , then the lefthand side of ( IV. 13) does not change. Hence, (IV. l4) But, if cp(t)
1 == tP a + b qtq , p
where t , a , b > 0 ,
then plain differentiation shows that min cp(t) a 1 1Pb 1 f q . So, (IV. l2) follows from ( IV. 14) . ==
•
When ( n ) , ( IV. l2) reduces to the familiar Holder inequality n n n L l x i Yi l < (2= lx i i P ) 1 1P( L !Yi l q ) l f q _ i=l i= l i= l We will refer to ( IV. 12) as the Holder inequality for symmetric gauge functions. The special case p 2 will be called the CauchySchwarz ==
==
inequality for symmetric gauge functions.
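The point of Theorem IV.1.6 is that Hölder's inequality holds for every symmetric gauge function, not just Φ_(n). Here is a quick check (an added illustration, not from the book) using the Ky Fan gauge functions Φ_(k) and random vectors, assuming NumPy.

```python
import numpy as np

def phi_k(x, k):
    """Ky Fan gauge function: sum of the k largest moduli."""
    return np.sort(np.abs(x))[::-1][:k].sum()

rng = np.random.default_rng(4)
x = rng.standard_normal(6)
y = rng.standard_normal(6)
p, q = 3.0, 1.5                     # conjugate exponents: 1/p + 1/q = 1

for k in range(1, 7):
    lhs = phi_k(x * y, k)
    rhs = phi_k(np.abs(x) ** p, k) ** (1 / p) * phi_k(np.abs(y) ** q, k) ** (1 / q)
    assert lhs <= rhs + 1e-12       # (IV.12) with Phi = Phi_(k)
```

For k = n this is the classical Hölder inequality; for k = 1 it reads ||xy||_∞ ≤ || |x|^p ||_∞^{1/p} || |y|^q ||_∞^{1/q}.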
Exercise IV.1.7 Let p, q, r be positive real numbers with 1/p + 1/q = 1/r. Show that for every symmetric gauge function Φ we have

[Φ(|xy|^r)]^{1/r} ≤ [Φ(|x|^p)]^{1/p} [Φ(|y|^q)]^{1/q}.   (IV.15)

Theorem IV.1.8 Let Φ be any symmetric gauge function and let p ≥ 1. Then for all x, y ∈ R^n,

[Φ(|x + y|^p)]^{1/p} ≤ [Φ(|x|^p)]^{1/p} + [Φ(|y|^p)]^{1/p}.   (IV.16)

Proof. When p = 1, the inequality (IV.16) is a consequence of the triangle inequalities for the absolute value on R^n and for the norm Φ. Let p > 1 and let q = p/(p − 1). It is enough to consider the case x ≥ 0, y ≥ 0. Make this assumption and write

(x + y)^p = x · (x + y)^{p−1} + y · (x + y)^{p−1}.

Now, using the triangle inequality for Φ and then the Hölder inequality (IV.12), we get

Φ((x + y)^p) ≤ Φ(x · (x + y)^{p−1}) + Φ(y · (x + y)^{p−1})
≤ {[Φ(x^p)]^{1/p} + [Φ(y^p)]^{1/p}} [Φ((x + y)^{q(p−1)})]^{1/q}
= {[Φ(x^p)]^{1/p} + [Φ(y^p)]^{1/p}} [Φ((x + y)^p)]^{1/q},

since q(p − 1) = p. If we divide both sides of the above inequality by [Φ((x + y)^p)]^{1/q}, we get (IV.16).

Once again, when Φ = Φ_(n) the inequality (IV.16) reduces to the familiar Minkowski inequality. So, we will call (IV.16) the Minkowski inequality for symmetric gauge functions.
Exercise IV.1.9 Let Φ be a symmetric gauge function and let p ≥ 1. Let

Φ^{(p)}(x) = [Φ(|x|^p)]^{1/p}.   (IV.17)

Show that Φ^{(p)} is also a symmetric gauge function. Note that, if Φ_p is the family of l_p-norms, then

Φ_{p_1}^{(p_2)} = Φ_{p_1 p_2} for all p_1, p_2 ≥ 1,   (IV.18)

and, if Φ_(k) is the norm defined by (IV.8), then

Φ_(k)^{(p)}(x) = (∑_{j=1}^k |x_j|^p)^{1/p},   (IV.19)

where the coordinates of x are arranged as |x_1| ≥ |x_2| ≥ ··· ≥ |x_n|.
Just as the Euclidean norm has especially interesting properties among the l_p-norms, the norms Φ^{(2)}, where Φ is any symmetric gauge function, have some special interest. We will give these norms a name:

Definition IV.1.10 Ψ is called a quadratic symmetric gauge function, or a Q-norm, if Ψ = Φ^{(2)} for some symmetric gauge function Φ. In other words,

Ψ(x) = [Φ(|x|²)]^{1/2}.   (IV.20)

Exercise IV.1.11 (i) Show that an l_p-norm is a Q-norm if and only if p ≥ 2. (ii) More generally, show that for each k = 1, 2, ..., n, Φ_(k)^{(p)} is a Q-norm if and only if p ≥ 2.

Exercise IV.1.12 We saw earlier that if Φ_(k)(x) ≤ Φ_(k)(y) for all k = 1, 2, ..., n, then Φ(x) ≤ Φ(y) for all symmetric gauge functions Φ. Show that if Φ_(k)^{(2)}(x) ≤ Φ_(k)^{(2)}(y) for all k = 1, 2, ..., n, then Φ^{(2)}(x) ≤ Φ^{(2)}(y) for all symmetric gauge functions Φ; i.e., Ψ(x) ≤ Ψ(y) for all quadratic symmetric gauge functions Ψ.
If Φ is a norm on C^n, the dual of Φ is defined as

Φ′(x) = sup_{Φ(y)=1} |⟨x, y⟩|.   (IV.21)

(It is easy to see that Φ′ is a norm even when Φ is a function on C^n that does not necessarily satisfy the triangle inequality but meets the other requirements of a norm.)
Exercise IV.1.13 If Φ is a symmetric gauge function, then so is Φ′.

Exercise IV.1.14 Show that for any norm Φ,

|⟨x, y⟩| ≤ Φ(x) Φ′(y) for all x, y.   (IV.22)

Exercise IV.1.15 Let 1/p + 1/q = 1. Then

Φ′_p = Φ_q.   (IV.23)

Exercise IV.1.16 Let Φ and Ψ be two norms such that Φ(x) ≤ c Ψ(x) for all x, where c > 0. Show that

Φ′(x) ≥ c^{−1} Ψ′(x) for all x.

We shall call a symmetric gauge function a Q′-norm if it is the dual of a Q-norm. The l_p-norms for 1 ≤ p ≤ 2 are examples of Q′-norms.

Exercise IV.1.17 (i) Let Φ be a norm such that Φ = Φ′. Show that Φ must be the Euclidean norm. (ii) Let Φ be a Q-norm that is also a Q′-norm. Show that Φ must be the Euclidean norm. (Use Exercise IV.1.16 and the fact that every symmetric gauge function is bounded by the l_1-norm.)

Exercise IV.1.18 For each k = 1, 2, ..., n, the dual of the norm Φ_(k) is given by

Φ′_(k)(x) = max {Φ_(1)(x), (1/k) Φ_(n)(x)}.   (IV.24)

Prove this using Proposition IV.1.5 and Exercise IV.1.16.

Some ways of generating symmetric gauge functions are described in the following exercises.

Exercise IV.1.19 Let 1 = a_1 ≥ a_2 ≥ ··· ≥ a_n ≥ 0. Given a symmetric gauge function Φ, define

Ψ(x) = max_{σ ∈ S_n} Φ(a_1 x_{σ(1)}, ..., a_n x_{σ(n)}).

Then Ψ is a symmetric gauge function.

Exercise IV.1.20 (i) Let Φ be a symmetric gauge function on R^n. Let m < n. If x ∈ R^m, let x̃ = (x_1, ..., x_m, 0, 0, ..., 0), and define Ψ(x) = Φ(x̃). Then Ψ is a symmetric gauge function on R^m. (ii) Conversely, let Φ be a symmetric gauge function on R^m. For x ∈ R^n we may define Ψ(x) = Φ(x_1^↓, ..., x_m^↓), where x_1^↓ ≥ ··· ≥ x_n^↓ is the decreasing rearrangement of (|x_1|, ..., |x_n|). Then Ψ is a symmetric gauge function on R^n.
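The duality formula (IV.24) for the Ky Fan gauge functions can be probed numerically through the inequality (IV.22). The following sketch is an added illustration (not from the book), assuming NumPy and random test vectors.

```python
import numpy as np

def phi_k(x, k):
    """Ky Fan gauge function: sum of the k largest moduli."""
    return np.sort(np.abs(x))[::-1][:k].sum()

def phi_k_dual(y, k):
    # The dual in (IV.24): max of the sup-norm and (1/k) times the l_1 norm.
    return max(np.abs(y).max(), np.abs(y).sum() / k)

rng = np.random.default_rng(5)
for _ in range(100):
    x = rng.standard_normal(5)
    y = rng.standard_normal(5)
    for k in range(1, 6):
        # Duality inequality (IV.22): |<x, y>| <= Phi(x) * Phi'(y).
        assert abs(x @ y) <= phi_k(x, k) * phi_k_dual(y, k) + 1e-12
```

For k = 1 the formula gives the l_1 norm as the dual of the sup-norm, and for k = n it gives back the sup-norm as the dual of the l_1 norm.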
IV.2 Unitarily Invariant Norms on Operators on C^n
In this section, C^n will always stand for the Hilbert space C^n with inner product ⟨·, ·⟩ and the associated norm ||·||. (No subscript will be attached to this "standard" norm, as was done in the previous section.) If A is a linear operator on C^n, we will denote by ||A|| the operator (bound) norm of A defined as

||A|| = sup_{||x||=1} ||Ax||.   (IV.25)

As before, we denote by |A| the positive operator (A*A)^{1/2} and by s(A) the vector whose coordinates are the singular values of A, arranged as s_1(A) ≥ s_2(A) ≥ ··· ≥ s_n(A). We have

||A|| = || |A| || = s_1(A).   (IV.26)

Now, if U, V are unitary operators on C^n, then |UAV| = V*|A|V, and hence

||A|| = ||UAV||   (IV.27)

for all unitary operators U, V. This last property is called unitary invariance. Several other norms have this property. These are frequently useful in analysis, and we will study them in some detail. We will use the symbol |||·||| to mean a norm on n × n matrices that satisfies

|||UAV||| = |||A|||   (IV.28)

for all A and for unitary U, V. We will call such a norm a unitarily invariant norm on the space M(n) of n × n matrices. We will normalise such norms so that they all take the value 1 on the matrix diag(1, 0, ..., 0).

There is an intimate connection between these norms and symmetric gauge functions on R^n; the link is provided by singular values.
Theorem IV.2.1 Given a symmetric gauge function Φ on R^n, define a function on M(n) by

|||A|||_Φ = Φ(s(A)).   (IV.29)

Then this defines a unitarily invariant norm on M(n). Conversely, given any unitarily invariant norm |||·||| on M(n), define a function on R^n by

Φ_{|||·|||}(x) = |||diag(x)|||,   (IV.30)

where diag(x) is the diagonal matrix with entries x_1, ..., x_n on its diagonal. Then this defines a symmetric gauge function on R^n.
Proof. Since s(UAV) = s(A) for all unitary U, V, |||·|||_Φ is unitarily invariant. We will prove that it obeys the triangle inequality; the other conditions for it to be a norm are easy to verify. For this, recall the majorisation (II.18)

s(A + B) ≺_w s(A) + s(B) for all A, B ∈ M(n),

and then use the fact that Φ is monotone (Proposition IV.1.1), so that

Φ(s(A + B)) ≤ Φ(s(A) + s(B)) ≤ Φ(s(A)) + Φ(s(B)).

The converse statement is verified directly from the definitions.

Two families of unitarily invariant norms deserve special mention. The first is the family of Schatten p-norms

||A||_p = Φ_p(s(A)) = [∑_{j=1}^n (s_j(A))^p]^{1/p}, 1 ≤ p < ∞,   (IV.31)
||A||_∞ = Φ_∞(s(A)) = s_1(A) = ||A||.   (IV.32)

The second is the class of Ky Fan k-norms defined as

||A||_(k) = ∑_{j=1}^k s_j(A), 1 ≤ k ≤ n.   (IV.33)

Among the p-norms, the ones for the values p = 1, 2, ∞ are used most often. As we have noted, ||A||_∞ is the same as the operator norm ||A|| and the Ky Fan norm ||A||_(1). The norm ||A||_1 is the same as ||A||_(n). This is equal to tr(|A|) and hence is called the trace norm, and is sometimes written as ||A||_tr. The norm

||A||_2 = [∑_{j=1}^n (s_j(A))²]^{1/2}   (IV.34)

is also called the Hilbert-Schmidt norm or the Frobenius norm (and is sometimes written as ||A||_F for that reason). It will play a basic role in our analysis. For A, B ∈ M(n) let

⟨A, B⟩ = tr A*B.   (IV.35)

This defines an inner product on M(n), and the norm associated with this inner product is ||A||_2; i.e.,

||A||_2 = ⟨A, A⟩^{1/2} = (tr A*A)^{1/2}.   (IV.36)
If the matrix A has entries a_ij, then

||A||_2 = (∑_{i,j} |a_ij|²)^{1/2}.   (IV.37)

Thus the norm ||A||_2 is the Euclidean norm of the matrix A when it is thought of as an element of C^{n²}. This fact makes this norm easily computable and geometrically tractable. The main importance of the Ky Fan norms lies in the following:
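The three descriptions of ||A||_2 — via singular values (IV.34), via the trace inner product (IV.36), and entrywise (IV.37) — agree, and this is easy to confirm numerically. The sketch below is an added illustration (not from the book), assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))

s = np.linalg.svd(A, compute_uv=False)
fro_from_svals = np.sqrt((s ** 2).sum())          # (IV.34)
fro_from_trace = np.sqrt(np.trace(A.T @ A))       # (IV.36), real case: A* = A^T
fro_entrywise = np.sqrt((np.abs(A) ** 2).sum())   # (IV.37)

assert np.isclose(fro_from_svals, fro_from_trace)
assert np.isclose(fro_from_trace, fro_entrywise)
```

The entrywise form is the cheapest to compute, which is one reason the Frobenius norm is so convenient in practice.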
Theorem IV.2.2 (Fan Dominance Theorem) Let A, B be two n × n matrices. If

||A||_(k) ≤ ||B||_(k) for k = 1, 2, ..., n,

then

|||A||| ≤ |||B||| for all unitarily invariant norms.

Proof. This is a consequence of the corresponding assertion about symmetric gauge functions. (See Example IV.1.4.)
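One cannot, of course, test "all unitarily invariant norms" numerically, but the Fan Dominance Theorem can be spot-checked on the Schatten family. The following sketch is an added illustration (not from the book): it builds A, B with prescribed singular values whose partial sums are ordered, assuming NumPy and random orthogonal factors obtained by QR.

```python
import numpy as np

rng = np.random.default_rng(7)

def rand_orth(n):
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

# Prescribed singular values with ||A||_(k) <= ||B||_(k) for every k.
sA, sB = np.array([3.0, 2.0, 1.0]), np.array([4.0, 2.5, 1.0])
assert all(sA[:k].sum() <= sB[:k].sum() for k in (1, 2, 3))

A = rand_orth(3) @ np.diag(sA) @ rand_orth(3)
B = rand_orth(3) @ np.diag(sB) @ rand_orth(3)

# Fan dominance then gives |||A||| <= |||B||| for every unitarily
# invariant norm; spot-check the Schatten p-norms.
for p in (1.0, 1.5, 2.0, 3.0, np.inf):
    sa = np.linalg.svd(A, compute_uv=False)
    sb = np.linalg.svd(B, compute_uv=False)
    norm = lambda s: s.max() if p == np.inf else (s ** p).sum() ** (1 / p)
    assert norm(sa) <= norm(sb) + 1e-12
```

The theorem reduces an infinite family of norm comparisons to the n Ky Fan comparisons, which is what makes it so useful.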
Since

Φ_(1)(x) ≤ Φ(x) ≤ Φ_(n)(x) for all x ∈ R^n and for all (normalised) symmetric gauge functions Φ,

we have

||A|| ≤ |||A||| ≤ ||A||_(n) = ||A||_1   (IV.38)

for all A ∈ M(n) and for all unitarily invariant norms |||·|||.

Analogous to Proposition IV.1.5 we have:

Proposition IV.2.3 For each k = 1, 2, ..., n,

||A||_(k) = min {||B||_(n) + k||C|| : A = B + C}.   (IV.39)

Proof. If A = B + C, then

||A||_(k) ≤ ||B||_(k) + ||C||_(k) ≤ ||B||_(n) + k||C||.

Now let s(A) = (s_1, ..., s_n) and choose unitary U, V so that A = U[diag(s_1, ..., s_n)]V. Let

B = U[diag(s_1 − s_k, s_2 − s_k, ..., s_k − s_k, 0, ..., 0)]V,
C = U[diag(s_k, s_k, ..., s_k, s_{k+1}, ..., s_n)]V.

Then A = B + C,

||B||_(n) = ∑_{j=1}^k s_j − k s_k = ||A||_(k) − k s_k,

and ||C|| = s_k, so that ||B||_(n) + k||C|| = ||A||_(k).

A norm ν on M(n) is called symmetric if for A, B, C in M(n)

ν(BAC) ≤ ||B|| ν(A) ||C||.   (IV.40)

Proposition IV.2.4 A norm on M(n) is symmetric if and only if it is unitarily invariant.

Proof. If ν is a symmetric norm, then for unitary U, V we have ν(UAV) ≤ ν(A) and ν(A) = ν(U^{−1}(UAV)V^{−1}) ≤ ν(UAV). So, ν is unitarily invariant. Conversely, by Problem III.6.2, s_j(BAC) ≤ ||B|| ||C|| s_j(A) for all j = 1, 2, ..., n. So, if ν is unitarily invariant and Φ is the corresponding symmetric gauge function, then ν(BAC) = Φ(s(BAC)) ≤ ||B|| ||C|| Φ(s(A)) = ||B|| ||C|| ν(A).

In particular, this implies that every unitarily invariant norm is submultiplicative:

|||AB||| ≤ |||A||| |||B||| for all A, B.
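The symmetric-norm property (IV.40) and the resulting submultiplicativity can be illustrated with the trace norm. The sketch below is an added numerical check (not from the book), assuming NumPy and random matrices.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

trace_norm = lambda M: np.linalg.svd(M, compute_uv=False).sum()  # ||M||_(n)
op = lambda M: np.linalg.norm(M, ord=2)                          # operator norm

# (IV.40) with nu the trace norm, specialised to a product of two factors:
assert trace_norm(A @ B) <= op(A) * trace_norm(B) + 1e-10
assert trace_norm(A @ B) <= trace_norm(A) * op(B) + 1e-10
# Submultiplicativity, since the operator norm never exceeds the trace norm:
assert trace_norm(A @ B) <= trace_norm(A) * trace_norm(B) + 1e-10
```

Submultiplicativity follows from (IV.40) together with the lower bound ||A|| ≤ |||A||| of (IV.38), exactly as in the text.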
Inequalities for sums and products of singular values of matrices, when combined with the inequalities for symmetric gauge functions proved in Section IV.1, lead to interesting statements about unitarily invariant norms. This is illustrated below.

Theorem IV.2.5 If A, B are n × n matrices, then for every symmetric gauge function Φ and every r > 0,

Φ(s^r(AB)) ≤ Φ(s^r(A) s^r(B)).   (IV.41)

(Here s^r(A) denotes the vector with coordinates (s_j(A))^r, and the product of two vectors is taken coordinatewise.)

Proof. If ∧^k A is the kth antisymmetric tensor product of A, then

||∧^k A|| = s_1(∧^k A) = ∏_{j=1}^k s_j(A),   1 ≤ k ≤ n.

Hence, since ∧^k(AB) = (∧^k A)(∧^k B),

∏_{j=1}^k s_j(AB) ≤ ∏_{j=1}^k s_j(A) s_j(B),   1 ≤ k ≤ n.

Now use the statement II.3.5(vii).
Corollary IV.2.6 (Hölder's Inequality for Unitarily Invariant Norms) For every unitarily invariant norm, for all $A, B \in M(n)$, and for all $p, q$ with $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$,
$$|||AB||| \le |||\,|A|^p\,|||^{1/p}\;|||\,|B|^q\,|||^{1/q}. \tag{IV.42}$$

Proof. Use the special case of (IV.41) for $r = 1$ to get
$$\Phi(s(AB)) \le \Phi(s(A)\,s(B))$$
for every symmetric gauge function. Now use Theorem IV.1.6 and the fact that $(s(A))^p = s(|A|^p)$. $\blacksquare$
Exercise IV.2.7 Let $p, q, r$ be positive real numbers with $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$. Then for every unitarily invariant norm
$$|||\,|AB|^r\,|||^{1/r} \le |||\,|A|^p\,|||^{1/p}\;|||\,|B|^q\,|||^{1/q}. \tag{IV.43}$$
Choosing $p = q = 2$ (so that $r = 1$), one gets from this
$$|||AB|||^2 \le |||A^*A|||\;|||B^*B|||. \tag{IV.44}$$
This is the Cauchy-Schwarz inequality for unitarily invariant norms.
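A quick numerical sanity check of (IV.44) for a few unitarily invariant norms (a sketch only; the test matrices are arbitrary choices, and `schatten` computes Schatten $p$-norms from singular values, with $p = \infty$ giving the operator norm):

```python
import numpy as np

def schatten(M, p):
    """Schatten p-norm: the l_p norm of the singular value vector."""
    return np.linalg.norm(np.linalg.svd(M, compute_uv=False), p)

A = np.array([[1.0, 2.0], [0.0, 1.0]])
B = np.array([[0.5, -1.0], [2.0, 0.0]])

checks = []
for p in (1, 2, np.inf):   # trace norm, Frobenius norm, operator norm
    lhs = schatten(A @ B, p) ** 2
    rhs = schatten(A.conj().T @ A, p) * schatten(B.conj().T @ B, p)
    checks.append(lhs <= rhs + 1e-10)
assert all(checks)
```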
Exercise IV.2.8 Given a unitarily invariant norm $|||\cdot|||$ on $M(n)$, define, for $p \ge 1$,
$$|||A|||^{(p)} = |||\,|A|^p\,|||^{1/p}. \tag{IV.45}$$
Show that this is a unitarily invariant norm. Note that if $\|\cdot\|_\Phi$ is the norm corresponding to the symmetric gauge function $\Phi$, then
$$\|A\|_\Phi^{(p)} = \Phi^{(p)}(s(A)), \tag{IV.46}$$
and
$$\|A\|_{(k)}^{(p)} = \Bigl(\sum_{j=1}^{k} s_j^p(A)\Bigr)^{1/p} \quad \text{for } p \ge 1,\ 1 \le k \le n. \tag{IV.47}$$
Definition IV.2.9 A unitarily invariant norm on $M(n)$ is called a Q-norm if it corresponds to a quadratic symmetric gauge function; i.e., $|||\cdot|||$ is a Q-norm if and only if there exists a unitarily invariant norm $|||\cdot|||''$ such that
$$|||A|||^2 = |||A^*A|||''. \tag{IV.48}$$
Note that the norm $\|\cdot\|_p$ is a Q-norm if and only if $p \ge 2$, because
$$\|A\|_p^2 = \|A^*A\|_{p/2}. \tag{IV.49}$$
The norms defined in (IV.47) are Q-norms if and only if $p \ge 2$.
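The identity (IV.49) behind the Q-norm property can be checked directly (an illustration with an arbitrary matrix; for $p \ge 2$ the right-hand side is a genuine Schatten norm, which is exactly what makes $\|\cdot\|_p$ a Q-norm):

```python
import numpy as np

def schatten(M, p):
    """Schatten p-norm computed from singular values (valid for any p > 0)."""
    s = np.linalg.svd(M, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

A = np.array([[1.0, 1.0], [0.0, 2.0]])
for p in (2.0, 3.0, 4.0):
    assert abs(schatten(A, p) ** 2 - schatten(A.conj().T @ A, p / 2)) < 1e-10
```

The identity holds algebraically for every $p > 0$, since $s_j(A^*A) = s_j(A)^2$; the restriction $p \ge 2$ is only what makes $\|\cdot\|_{p/2}$ a norm.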
Exercise IV.2.10 Let $\|\cdot\|_Q$ denote a Q-norm. Observe that the following conditions are equivalent:
(i) $\|A\|_Q \le \|B\|_Q$ for all Q-norms.
(ii) $|||A^*A||| \le |||B^*B|||$ for all unitarily invariant norms.
(iii) $\|A\|_{(k)}^{(2)} \le \|B\|_{(k)}^{(2)}$ for $k = 1, 2, \ldots, n$.
(iv) $(s(A))^2 \prec_w (s(B))^2$.

Duality in the space of unitarily invariant norms is defined via the inner product (IV.35). If $|||\cdot|||$ is a unitarily invariant norm, define $|||\cdot|||'$ as
$$|||A|||' = \sup_{|||B||| = 1} |\langle A, B\rangle| = \sup_{|||B||| = 1} |\operatorname{tr} A^*B|. \tag{IV.50}$$
It is easy to see that this defines a norm that is unitarily invariant.

Proposition IV.2.11 Let $\Phi$ be a symmetric gauge function on $\mathbb{R}^n$ and let $\|\cdot\|_\Phi$ be the corresponding unitarily invariant norm on $M(n)$. Then $\|\cdot\|_\Phi' = \|\cdot\|_{\Phi'}$.

Proof. We have from (II.40) and (IV.41)
$$|\operatorname{tr} A^*B| \le \operatorname{tr}|A^*B| = \sum_{j=1}^{n} s_j(A^*B) \le \sum_{j=1}^{n} s_j(A)\,s_j(B).$$
It follows that $\|A\|_\Phi' \le \Phi'(s(A)) = \|A\|_{\Phi'}$. Conversely,
$$\Phi'(s(A)) = \sup\Bigl\{\sum_{j=1}^{n} s_j(A)\,y_j : y \in \mathbb{R}_+^n,\ \Phi(y) = 1\Bigr\} = \sup\{\operatorname{tr}[\operatorname{diag}(s(A))\operatorname{diag}(y)] : \|\operatorname{diag}(y)\|_\Phi = 1\} \le \|\operatorname{diag}(s(A))\|_\Phi' = \|A\|_\Phi'. \qquad \blacksquare$$
Exercise IV.2.12 From statements about duals proved in Section IV.1, we can now conclude that
(i) $|\operatorname{tr} A^*B| \le |||A||| \cdot |||B|||'$ for every unitarily invariant norm.
(ii) $\|A\|_p' = \|A\|_q$ for $1 \le p \le \infty$, $\frac{1}{p} + \frac{1}{q} = 1$.
(iii) $\|A\|_{(k)}' = \max\bigl\{\|A\|_{(1)},\ \tfrac{1}{k}\|A\|_{(n)}\bigr\}$, $1 \le k \le n$.
(iv) The only unitarily invariant norm that is its own dual is the Hilbert-Schmidt norm $\|\cdot\|_2$.
(v) The only norm that is a Q-norm and is also the dual of a Q-norm is the norm $\|\cdot\|_2$.

Duals of Q-norms will be called Q'-norms. These include the norms $\|\cdot\|_p$, $1 \le p \le 2$.

An important property of all unitarily invariant norms is that they are all reduced by pinchings. If $P_1, \ldots, P_k$ are mutually orthogonal projections such that $P_1 \oplus P_2 \oplus \cdots \oplus P_k = I$, then the operator on $M(n)$ defined as
$$\mathcal{C}(A) = \sum_{j=1}^{k} P_j A P_j \tag{IV.51}$$
is called a pinching operator. It is easy to see that
$$|||\mathcal{C}(A)||| \le |||A||| \tag{IV.52}$$
for every unitarily invariant norm. (See Problem II.5.5.) We will call this the pinching inequality. Let us illustrate one use of this inequality.
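The pinching inequality can be checked numerically on the Ky Fan norms, which by Fan dominance control all unitarily invariant norms. A sketch (the matrix and the block partition are arbitrary choices):

```python
import numpy as np

def ky_fan(M, k):
    return np.linalg.svd(M, compute_uv=False)[:k].sum()

A = np.array([[2.0, 1.0, 0.5],
              [1.0, 1.0, -1.0],
              [0.5, 3.0, 0.0]])
# Pinching onto the coordinate blocks {0} and {1, 2}.
P1 = np.diag([1.0, 0.0, 0.0])
P2 = np.diag([0.0, 1.0, 1.0])
CA = P1 @ A @ P1 + P2 @ A @ P2   # block-diagonal part of A

ok = all(ky_fan(CA, k) <= ky_fan(A, k) + 1e-10 for k in (1, 2, 3))
assert ok
```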
Theorem IV.2.13 Let $A, B \in M(n)$. Then for every unitarily invariant norm on $M(2n)$
$$\frac{1}{2}\left|\left|\left|\begin{pmatrix} A+B & 0\\ 0 & A+B\end{pmatrix}\right|\right|\right| \le \left|\left|\left|\begin{pmatrix} A & 0\\ 0 & B\end{pmatrix}\right|\right|\right| \le \left|\left|\left|\begin{pmatrix} |A|+|B| & 0\\ 0 & 0\end{pmatrix}\right|\right|\right|. \tag{IV.53}$$

Proof. The first inequality follows easily from the observation that $\begin{pmatrix} A & 0\\ 0 & B\end{pmatrix}$ and $\begin{pmatrix} B & 0\\ 0 & A\end{pmatrix}$ are unitarily equivalent. If we prove the second inequality in the special case when $A, B$ are positive, the general case follows easily. So, assume $A, B$ are positive. Then
$$\begin{pmatrix} A+B & 0\\ 0 & 0\end{pmatrix} = \begin{pmatrix} A^{1/2} & B^{1/2}\\ 0 & 0\end{pmatrix}\begin{pmatrix} A^{1/2} & 0\\ B^{1/2} & 0\end{pmatrix},$$
where $A^{1/2}, B^{1/2}$ are the positive square roots of $A, B$. Since $T^*T$ and $TT^*$ are unitarily equivalent for every $T$, the matrix $\begin{pmatrix} A+B & 0\\ 0 & 0\end{pmatrix}$ is unitarily equivalent to
$$\begin{pmatrix} A^{1/2} & 0\\ B^{1/2} & 0\end{pmatrix}\begin{pmatrix} A^{1/2} & B^{1/2}\\ 0 & 0\end{pmatrix} = \begin{pmatrix} A & A^{1/2}B^{1/2}\\ B^{1/2}A^{1/2} & B\end{pmatrix}.$$
But $\begin{pmatrix} A & 0\\ 0 & B\end{pmatrix}$ is a pinching of this last matrix. $\blacksquare$
As a corollary we have:

Theorem IV.2.14 (Rotfel'd) Let $f : \mathbb{R}_+ \to \mathbb{R}_+$ be a concave function with $f(0) = 0$. Then the function $F$ on $M(n)$ defined by
$$F(A) = \sum_{j=1}^{n} f(s_j(A)) \tag{IV.54}$$
is subadditive.

Proof. The second inequality in (IV.53) can be written as a majorisation
$$(s(A), s(B)) \prec_w (s(|A| + |B|),\ 0)$$
for all $A, B \in M(n)$. We also know that $s(|A| + |B|) \prec_w s(A) + s(B)$. Hence
$$(s(A), s(B)) \prec_w (s(A) + s(B),\ 0).$$
Now proceed as in Problem II.5.12. $\blacksquare$
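Rotfel'd's theorem can be illustrated with the concave function $f(t) = \sqrt{t}$, $f(0) = 0$; the matrices below are arbitrary test data, not from the text:

```python
import numpy as np

def F(M):
    """F(M) = sum of f(s_j(M)) with f = sqrt, as in (IV.54)."""
    return np.sqrt(np.linalg.svd(M, compute_uv=False)).sum()

A = np.array([[1.0, 2.0], [3.0, -1.0]])
B = np.array([[0.0, 1.0], [-2.0, 4.0]])
assert F(A + B) <= F(A) + F(B) + 1e-10   # subadditivity
```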
Exercise IV.2.15 Let $|||\cdot|||$ be a unitarily invariant norm on $M(n)$. For $m < n$ and $A \in M(m)$, define
$$|||A|||^{\sim} = \left|\left|\left|\begin{pmatrix} A & 0\\ 0 & 0\end{pmatrix}\right|\right|\right|.$$
Show that $|||\cdot|||^{\sim}$ defines a unitarily invariant norm on $M(m)$. We will use this idea of "dilating" $A$ and of going from $M(n)$ to $M(2n)$ in later chapters.

Procedures given in Exercises IV.1.19 and IV.1.20 can be adapted to matrices to generate unitarily invariant norms.
IV.3  Lidskii's Theorem (Third Proof)

Let $\lambda^{\downarrow}(A)$ denote the $n$-vector whose coordinates are the eigenvalues of a Hermitian matrix $A$ arranged in decreasing order. Lidskii's Theorem, for which we gave two proofs in Section III.4, says that if $A, B$ are Hermitian matrices, then we have the majorisation
$$\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B) \prec \lambda^{\downarrow}(A - B). \tag{IV.55}$$
We will give another proof of this theorem now, using the easier ideas of Sections III.1 and III.2.
Exercise IV.3.1 One corollary of Lidskii's Theorem is that, if $A$ and $B$ are any two matrices, then
$$|s(A) - s(B)| \prec_w s(A - B). \tag{IV.56}$$
See Problem III.6.13. Conversely, show that if (IV.56) is known to be true for all matrices $A, B$, then we can derive from it the statement (IV.55). [Hint: Choose real numbers $\alpha, \beta$ such that $A + \alpha I \ge B + \beta I \ge 0$.]

We will prove (IV.56) by a different argument. To prove this we need to prove that for each of the Ky Fan symmetric gauge functions $\Phi_{(k)}$, $1 \le k \le n$, we have the inequality
$$\Phi_{(k)}(|s(A) - s(B)|) \le \Phi_{(k)}(s(A - B)). \tag{IV.57}$$
We will prove this for $\Phi_{(1)}$ and $\Phi_{(n)}$, and then use the interpolation formulas (IV.9) and (IV.39). For Hermitian matrices $A, B$ we have
$$|\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|, \quad 1 \le j \le n.$$
This can be proved easily by another argument also. For any $j$ consider the subspaces spanned by $\{u_1, \ldots, u_j\}$ and $\{v_j, \ldots, v_n\}$, where $u_i, v_i$, $1 \le i \le n$, are eigenvectors of $A$ and $B$ corresponding to their eigenvalues $\lambda_i^{\downarrow}(A)$ and $\lambda_i^{\downarrow}(B)$, respectively. Since the dimensions of these two spaces add up to $n + 1$, they have a nonzero intersection. For a unit vector $x$ in this intersection we have $\langle x, Ax\rangle \ge \lambda_j^{\downarrow}(A)$ and $\langle x, Bx\rangle \le \lambda_j^{\downarrow}(B)$. Hence, we have
$$\|A - B\| \ge |\langle x, (A - B)x\rangle| \ge \lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B).$$
So, by symmetry,
$$|\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|, \quad 1 \le j \le n.$$
From this, as before, we can get $\max_j |s_j(A) - s_j(B)| \le \|A - B\|$ for any two matrices $A$ and $B$. This is the same as saying
$$\Phi_{(1)}(|s(A) - s(B)|) \le \Phi_{(1)}(s(A - B)). \tag{IV.58}$$
Let $T$ be a Hermitian matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge \lambda_{p+1} \ge \cdots \ge \lambda_n$, where $\lambda_p \ge 0 > \lambda_{p+1}$. Choose a unitary matrix $U$ such that $T = UDU^*$, where $D$ is the diagonal matrix $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Let $D^+ = \operatorname{diag}(\lambda_1, \ldots, \lambda_p, 0, \ldots, 0)$ and $D^- = \operatorname{diag}(0, \ldots, 0, -\lambda_{p+1}, \ldots, -\lambda_n)$. Let $T^+ = UD^+U^*$, $T^- = UD^-U^*$. Then both $T^+$ and $T^-$ are positive matrices and
$$T = T^+ - T^-. \tag{IV.59}$$
This is called the Jordan decomposition of $T$.
Lemma IV.3.2 If $A, B$ are $n \times n$ Hermitian matrices, then
$$\sum_{j=1}^{n} |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|_{(n)}. \tag{IV.60}$$

Proof. Using the Jordan decomposition of $A - B$ we can write
$$\|A - B\|_{(n)} = \operatorname{tr}(A - B)^+ + \operatorname{tr}(A - B)^-.$$
If we put
$$C = A + (A - B)^- = B + (A - B)^+,$$
then $C \ge A$ and $C \ge B$. Hence, by Weyl's monotonicity principle, $\lambda_j^{\downarrow}(C) \ge \lambda_j^{\downarrow}(A)$ and $\lambda_j^{\downarrow}(C) \ge \lambda_j^{\downarrow}(B)$ for all $j$. From these inequalities it follows that
$$|\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le 2\lambda_j^{\downarrow}(C) - \lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B).$$
Hence,
$$\sum_{j=1}^{n} |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \operatorname{tr}(2C - A - B) = \|A - B\|_{(n)}. \qquad \blacksquare$$
Corollary IV.3.3 For any two $n \times n$ matrices $A, B$ we have
$$\Phi_{(n)}(s(A) - s(B)) = \sum_{j=1}^{n} |s_j(A) - s_j(B)| \le \|A - B\|_{(n)}. \tag{IV.61}$$

Theorem IV.3.4 For $n \times n$ matrices $A, B$ we have the majorisation
$$|s(A) - s(B)| \prec_w s(A - B).$$
Proof. Choose any index $k = 1, 2, \ldots, n$ and fix it. By Proposition IV.2.3, there exist $X, Y \in M(n)$ such that
$$A - B = X + Y \quad \text{and} \quad \|A - B\|_{(k)} = \|X\|_{(n)} + k\|Y\|.$$
Define vectors $\alpha, \beta$ as
$$\alpha = s(X + B) - s(B), \qquad \beta = s(A) - s(X + B).$$
Then
$$s(A) - s(B) = \alpha + \beta.$$
Hence, by Proposition IV.1.5 (or Proposition IV.2.3 restricted to diagonal matrices) and by (IV.58) and (IV.61), we have
$$\Phi_{(k)}(s(A) - s(B)) \le \Phi_{(n)}(\alpha) + k\,\Phi_{(1)}(\beta) \le \|X\|_{(n)} + k\|Y\| = \|A - B\|_{(k)},$$
since $\Phi_{(n)}(\alpha) = \Phi_{(n)}(s(X + B) - s(B)) \le \|X\|_{(n)}$ by (IV.61), and $\Phi_{(1)}(\beta) = \Phi_{(1)}(s(A) - s(X + B)) \le \|Y\|$ by (IV.58). This proves the theorem. $\blacksquare$
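The weak majorisation of Theorem IV.3.4 can be checked by comparing partial sums after sorting in decreasing order (a sketch with arbitrary test matrices):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, -1.0]])
B = np.array([[2.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 2.0, 1.0]])

# |s(A) - s(B)| rearranged decreasingly, versus s(A - B).
d = np.sort(np.abs(np.linalg.svd(A, compute_uv=False)
                   - np.linalg.svd(B, compute_uv=False)))[::-1]
e = np.linalg.svd(A - B, compute_uv=False)   # already decreasing
ok = all(d[:k].sum() <= e[:k].sum() + 1e-10 for k in range(1, 4))
assert ok
```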
As we observed in Exercise IV.3.1, this theorem is equivalent to Lidskii's Theorem.

In Section III.2 we introduced the notation $\operatorname{Eig} A$ for a diagonal matrix whose diagonal entries are the eigenvalues of a matrix $A$. The majorisations for the eigenvalues of Hermitian matrices lead to norm inequalities
$$|||\operatorname{Eig}^{\downarrow}(A) - \operatorname{Eig}^{\downarrow}(B)||| \le |||A - B||| \tag{IV.62}$$
for all unitarily invariant norms. This is just another way of expressing Theorem III.4.4. The inequalities of Theorem III.2.8 and Problem III.6.15 are special cases of this. We will see several generalisations of this inequality and still other proofs of it.

Exercise IV.3.5 If $\operatorname{Sing}^{\downarrow}(A)$ denotes the diagonal matrix whose diagonal entries are $s_1(A), \ldots, s_n(A)$, then it follows from Theorem IV.3.4 that for any two matrices $A, B$
$$|||\operatorname{Sing}^{\downarrow}(A) - \operatorname{Sing}^{\downarrow}(B)||| \le |||A - B|||$$
for every unitarily invariant norm. Show that in this case the "opposite inequality" $|||A - B||| \le |||\operatorname{Sing}^{\downarrow}(A) - \operatorname{Sing}^{\downarrow}(B)|||$ is not always true.
IV.4  Weakly Unitarily Invariant Norms
Consider the following numbers associated with an $n \times n$ matrix:
(i) $|\operatorname{tr} A|$, the absolute value of the trace;
(ii) $\operatorname{spr} A = \max_{1 \le j \le n} |\lambda_j(A)|$, the spectral radius of $A$;
(iii) $w(A) = \max_{\|x\| = 1} |\langle x, Ax\rangle|$, the numerical radius of $A$.

Of these, the first one is a seminorm but not a norm on $M(n)$, the second one is not a seminorm, and the third one is a norm. (See Exercise I.2.10.)

All three functions of a matrix described above have an important invariance property: they do not change under unitary conjugations; i.e., the transformations $A \to UAU^*$, $U$ unitary, do not change these functions. Indeed, the first two are invariant under the larger class of similarity transformations $A \to SAS^{-1}$, $S$ invertible. The third one is not invariant under all such transformations.

Exercise IV.4.1 Show that no norm on $M(n)$ can be invariant under all similarity transformations.

Unlike the norms that were studied in Section IV.2, none of the three functions mentioned above is invariant under all transformations $A \to UAV$, where $U, V$ vary over the unitary group $\mathrm{U}(n)$. We will call a norm $\tau$ on $M(n)$ weakly unitarily invariant (wui, for short) if
$$\tau(A) = \tau(UAU^*) \quad \text{for all } A \in M(n),\ U \in \mathrm{U}(n). \tag{IV.63}$$
Examples of such norms include the unitarily invariant norms and the numerical radius. Some more will be constructed now.

Exercise IV.4.2 Let $E_{11}$ be the diagonal matrix with its top left entry 1 and all other entries zero. Then
$$w(A) = \max_{U \in \mathrm{U}(n)} |\operatorname{tr} E_{11}UAU^*|. \tag{IV.64}$$
Equivalently,
$$w(A) = \max\{|\operatorname{tr} AP| : P \text{ is an orthogonal projection of rank } 1\}.$$

Given a matrix $C$, let
$$w_C(A) = \max_{U \in \mathrm{U}(n)} |\operatorname{tr} CUAU^*|, \quad A \in M(n). \tag{IV.65}$$
This is called the C-numerical radius of $A$.

Exercise IV.4.3 For every $C \in M(n)$, the $C$-numerical radius $w_C$ is a wui seminorm on $M(n)$.

Proposition IV.4.4 The $C$-numerical radius $w_C$ is a norm on $M(n)$ if and only if (i) $C$ is not a scalar multiple of $I$, and (ii) $\operatorname{tr} C \ne 0$.
Proof. If $C = \lambda I$ for some $\lambda \in \mathbb{C}$, then $w_C(A) = |\lambda|\,|\operatorname{tr} A|$, and this is zero if $\operatorname{tr} A = 0$. So $w_C$ cannot be a norm. If $\operatorname{tr} C = 0$, then $w_C(I) = |\operatorname{tr} C| = 0$. Again $w_C$ is not a norm. Thus (i) and (ii) are necessary conditions for $w_C$ to be a norm.

Conversely, suppose $w_C(A) = 0$. If $A$ were a scalar multiple of $I$, this would mean that $\operatorname{tr} C = 0$. So, if $\operatorname{tr} C \ne 0$, then $A$ is not a scalar multiple of $I$. Hence $A$ has an eigenspace $\mathcal{M}$ of dimension $m$, for some $0 < m < n$. Since $e^{tK}$ is a unitary matrix for all real $t$ and skew-Hermitian $K$, the condition $w_C(A) = 0$ implies in particular that
$$\operatorname{tr} Ce^{tK}Ae^{-tK} = 0 \quad \text{if } t \in \mathbb{R},\ K = -K^*.$$
Differentiating this relation at $t = 0$, one gets
$$\operatorname{tr}(AC - CA)K = 0 \quad \text{if } K = -K^*.$$
Hence, we also have $\operatorname{tr}(AC - CA)X = 0$ for all $X \in M(n)$. Hence $AC = CA$. (Recall that $\langle S, T\rangle = \operatorname{tr} S^*T$ is an inner product on $M(n)$.) Since $C$ commutes with $A$, it leaves invariant the $m$-dimensional eigenspace $\mathcal{M}$ of $A$ we mentioned earlier. Now, note that since $w_C(A) = w_C(UAU^*)$, $C$ also commutes with $UAU^*$ for every $U \in \mathrm{U}(n)$. But $UAU^*$ has the space $U\mathcal{M}$ as an eigenspace. So, $C$ also leaves $U\mathcal{M}$ invariant for all $U \in \mathrm{U}(n)$. But this would mean that $C$ leaves all $m$-dimensional subspaces invariant, which in turn would mean $C$ leaves all one-dimensional subspaces invariant, which is possible only if $C$ is a scalar multiple of $I$. $\blacksquare$

More examples of wui norms are given in the following exercise.

Exercise IV.4.5 (i) $\tau(A) = \|A\| + |\operatorname{tr} A|$ is a wui norm. More generally, the sum of any wui norm and a wui seminorm is a wui norm.
(ii) $\tau(A) = \max(\|A\|, |\operatorname{tr} A|)$ is a wui norm. More generally, the maximum of any wui norm and a wui seminorm is a wui norm.
(iii) Let $W(A)$ be the numerical range of $A$. Then its diameter $\operatorname{diam} W(A)$ is a wui seminorm on $M(n)$. It can be used to generate wui norms as in (i) and (ii). Of particular interest would be the norm $\tau(A) = w(A) + \operatorname{diam} W(A)$.
(iv) Let $m(A)$ be any norm on $M(n)$. Then
$$\tau(A) = \max_{U \in \mathrm{U}(n)} m(UAU^*)$$
is a wui norm.
(v) Let $m(A)$ be any norm on $M(n)$. Then
$$\tau(A) = \int_{\mathrm{U}(n)} m(UAU^*)\,dU,$$
where the integral is with respect to the (normalised) Haar measure on $\mathrm{U}(n)$, is a wui norm.
(vi) Let
$$\tau(A) = \max_{e_1, \ldots, e_n}\ \max_{i,j} |\langle e_i, Ae_j\rangle|,$$
where $e_1, \ldots, e_n$ varies over all orthonormal bases. Then $\tau$ is a wui norm. How is this related to (ii) and (iv) above?
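The maximum in (IV.65) ranges over the whole unitary group; sampling Haar-random unitaries therefore yields only a lower bound on $w_C(A)$. The following sketch illustrates this for $C = E_{11}$, where $w_C$ is the numerical radius (the sample count is an arbitrary choice; the nilpotent test matrix has $w(A) = 1/2$, a standard fact for the $2 \times 2$ Jordan block):

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(n):
    """Haar-distributed unitary via QR with phase correction."""
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, R = np.linalg.qr(Z)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

def wc_lower_bound(C, A, samples=2000):
    """Monte Carlo lower bound for w_C(A) = max_U |tr C U A U*|."""
    best = abs(np.trace(C @ A))          # U = I is one admissible choice
    for _ in range(samples):
        U = haar_unitary(A.shape[0])
        best = max(best, abs(np.trace(C @ U @ A @ U.conj().T)))
    return best

C = np.diag([1.0, 0.0])                  # C = E_11, so w_C is the numerical radius
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # w(A) = 1/2
est = wc_lower_bound(C, A)
assert est <= 0.5 + 1e-9                 # a lower bound never exceeds w_C(A)
assert est > 0.4                         # and gets close with enough samples
```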
Let $S$ be the unit sphere in $\mathbb{C}^n$,
$$S = \{x \in \mathbb{C}^n : \|x\| = 1\},$$
and let $C(S)$ be the space of all complex valued continuous functions on $S$. Let $dx$ denote the normalised Lebesgue measure on $S$. Consider the familiar $L_p$-norms on $C(S)$ defined as
$$\|f\|_p = \Bigl(\int_S |f(x)|^p\,dx\Bigr)^{1/p},\quad 1 \le p < \infty, \qquad \|f\|_\infty = \max_{x \in S} |f(x)|. \tag{IV.66}$$
Since the measure $dx$ is invariant under rotations, the above norms satisfy the invariance property
$$N_p(f \circ U) = N_p(f) \quad \text{for all } f \in C(S),\ U \in \mathrm{U}(n).$$
We will call a norm $N$ on $C(S)$ a unitarily invariant function norm if
$$N(f \circ U) = N(f) \quad \text{for all } f \in C(S),\ U \in \mathrm{U}(n). \tag{IV.67}$$
The $L_p$-norms are important examples of such norms.

Now, every $A \in M(n)$ induces, naturally, a function $f_A$ on $S$ by its quadratic form:
$$f_A(x) = \langle x, Ax\rangle. \tag{IV.68}$$
The correspondence $A \to f_A$ is a linear map from $M(n)$ into $C(S)$, which is one-to-one. So, given a norm $N$ on $C(S)$, if we define a function $N'$ on $M(n)$ as
$$N'(A) = N(f_A), \tag{IV.69}$$
then $N'$ is a norm on $M(n)$. Further,
$$N'(UAU^*) = N(f_{UAU^*}) = N(f_A \circ U^*).$$
So, if $N$ is a unitarily invariant function norm on $C(S)$, then $N'$ is a wui norm on $M(n)$. The next theorem says that all wui norms arise in this way:
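For $N = \|\cdot\|_\infty$ the induced norm $N'$ is the numerical radius, which is a supremum over the sphere; sampling unit vectors gives a lower bound on it. A sketch (the Hermitian test matrix and sample count are arbitrary choices; for Hermitian $A$, $w(A)$ equals the spectral radius, here 3):

```python
import numpy as np

rng = np.random.default_rng(1)

def f_A(A, x):
    """The quadratic function f_A(x) = <x, Ax> of (IV.68)."""
    return np.vdot(x, A @ x)

A = np.array([[1.0, 0.0], [0.0, -3.0]])   # Hermitian, w(A) = spr(A) = 3
best = 0.0
for _ in range(4000):
    z = rng.normal(size=2) + 1j * rng.normal(size=2)
    x = z / np.linalg.norm(z)             # approximately uniform on the sphere
    best = max(best, abs(f_A(A, x)))
assert best <= 3.0 + 1e-9                 # each sample is at most the sup norm
assert best > 2.8                         # and the sampled maximum gets close
```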
Theorem IV.4.6 A norm $\tau$ on $M(n)$ is weakly unitarily invariant if and only if there exists a unitarily invariant function norm $N$ on $C(S)$ such that $\tau = N'$, where the map $N \to N'$ is defined by relations (IV.68) and (IV.69).

Proof. We need to prove that every wui norm $\tau$ on $M(n)$ is of the form $N'$ for some unitarily invariant function norm $N$. Let $F = \{f_A : A \in M(n)\}$. This is a finite-dimensional linear subspace of $C(S)$. Given a wui norm $\tau$, define $N_0$ on $F$ by
$$N_0(f_A) = \tau(A). \tag{IV.70}$$
Then $N_0$ defines a norm on $F$, and further, $N_0(f \circ U) = N_0(f)$ for all $f \in F$. We will extend $N_0$ from $F$ to all of $C(S)$ to obtain a norm $N$ that is unitarily invariant. Clearly, then $\tau = N'$.

This extension is obtained by an application of the Hahn-Banach Theorem. The space $C(S)$ is a Banach space with the supremum norm $\|f\|_\infty$. The finite-dimensional subspace $F$ has two norms, $N_0$ and $\|\cdot\|_\infty$. These must be equivalent: there exist constants $0 < \alpha \le \beta < \infty$ such that
$$\alpha\|f\|_\infty \le N_0(f) \le \beta\|f\|_\infty \quad \text{for all } f \in F.$$
Let $G$ be the set of all linear functionals on $F$ that have norm less than or equal to 1 with respect to the norm $N_0$; i.e., the linear functional $g$ is in $G$ if and only if $|g(f)| \le N_0(f)$ for all $f \in F$. By duality then
$$N_0(f) = \sup_{g \in G} |g(f)| \quad \text{for every } f \in F.$$
Now $|g(f)| \le \beta\|f\|_\infty$ for $g \in G$ and $f \in F$. Hence, by the Hahn-Banach Theorem, each $g$ can be extended to a linear functional $\tilde{g}$ on $C(S)$ such that $|\tilde{g}(f)| \le \beta\|f\|_\infty$ for all $f \in C(S)$. Now define
$$\theta(f) = \sup_{g \in G} |\tilde{g}(f)| \quad \text{for all } f \in C(S).$$
Then $\theta$ is a seminorm on $C(S)$ that coincides with $N_0$ on $F$. Let
$$\mu(f) = \max\{\theta(f),\ \alpha\|f\|_\infty\}, \quad f \in C(S).$$
Then $\mu$ is a norm on $C(S)$, and $\mu$ coincides with $N_0$ on $F$. Now define
$$N(f) = \sup_{U \in \mathrm{U}(n)} \mu(f \circ U), \quad f \in C(S).$$
Then $N$ is a unitarily invariant function norm on $C(S)$ that coincides with $N_0$ on $F$. The proof is complete. $\blacksquare$

When $N = \|\cdot\|_\infty$, the norm $N'$ induced by the above procedure is the numerical radius $w$. Another example is discussed in the Notes. The $C$-numerical radii play a useful role in proving inequalities for wui norms:
Theorem IV.4.7 For $A, B \in M(n)$ the following statements are equivalent:
(i) $\tau(A) \le \tau(B)$ for all wui norms $\tau$.
(ii) $w_C(A) \le w_C(B)$ for all upper triangular matrices $C$ that are not scalars and have nonzero trace.
(iii) $w_C(A) \le w_C(B)$ for all $C \in M(n)$.
(iv) $A$ can be expressed as a finite sum $A = \sum z_k U_k B U_k^*$, where $U_k \in \mathrm{U}(n)$ and $z_k$ are complex numbers with $\sum |z_k| \le 1$.

Proof. By Proposition IV.4.4, when $C$ is not a scalar and $\operatorname{tr} C \ne 0$, each $w_C$ is a wui norm. So (i) $\Rightarrow$ (ii).

Note that $w_C(A) = w_A(C)$ for all pairs of matrices $A, C$. So, if (ii) is true, then $w_A(C) \le w_B(C)$ for all upper triangular nonscalar matrices $C$ with nonzero trace. Since $w_A$ and $w_B$ are wui, and since every matrix is unitarily equivalent to an upper triangular matrix, this implies that $w_A(C) \le w_B(C)$ for all nonscalar matrices $C$ with nonzero trace. But such $C$ are dense in the space $M(n)$. So $w_A(C) \le w_B(C)$ for all $C \in M(n)$. Hence (iii) is true.

Let $\mathcal{K}$ be the convex hull of all matrices $e^{i\theta}UBU^*$, $\theta \in \mathbb{R}$, $U \in \mathrm{U}(n)$. Then $\mathcal{K}$ is a compact convex set in $M(n)$. The statement (iv) is equivalent to saying that $A \in \mathcal{K}$. If $A \notin \mathcal{K}$, then by the Separating Hyperplane Theorem there exists a linear functional $f$ on $M(n)$ such that $\operatorname{Re} f(A) > \operatorname{Re} f(X)$ for all $X \in \mathcal{K}$. For this linear functional $f$ there exists a matrix $C$ such that $f(Y) = \operatorname{tr} CY$ for all $Y \in M(n)$. (Problem IV.5.8.) For these $f$ and $C$ we have
$$w_C(A) = \max_{U \in \mathrm{U}(n)} |\operatorname{tr} CUAU^*| \ge |\operatorname{tr} CA| = |f(A)| \ge \operatorname{Re} f(A) > \max_{X \in \mathcal{K}} \operatorname{Re} f(X) = \max_{\theta, U} \operatorname{Re} \operatorname{tr} Ce^{i\theta}UBU^* = \max_{U} |\operatorname{tr} CUBU^*| = w_C(B).$$
So, if (iii) were true, then (iv) cannot be false. Clearly (iv) $\Rightarrow$ (i). $\blacksquare$
The family $w_C$ of $C$-numerical radii, where $C$ is not a scalar and has nonzero trace, thus plays a role analogous to that of the Ky Fan norms in the family of unitarily invariant norms. However, unlike the Ky Fan family on $M(n)$, this family is infinite. It turns out that no finite subfamily of wui norms can play this role.
More precisely, there does not exist any finite family $\tau_1, \ldots, \tau_m$ of wui norms on $M(n)$ that would lead to the inequalities $\tau(A) \le \tau(B)$ for all wui norms $\tau$ whenever $\tau_j(A) \le \tau_j(B)$, $1 \le j \le m$. For if such a family existed, then we would have
$$\{X : \tau(X) \le \tau(I) \text{ for all wui norms } \tau\} = \bigcap_{j=1}^{m}\{X : \tau_j(X) \le \tau_j(I)\}. \tag{IV.71}$$
Now each of the sets in this intersection contains 0 as an interior point (with respect to some fixed topology on $M(n)$). Hence the intersection also contains 0 as an interior point. However, by Theorem IV.4.7, the set on the left-hand side of (IV.71) reduces to the set $\{zI : z \in \mathbb{C},\ |z| \le 1\}$, and this set has an empty interior in $M(n)$.

Finally, note an important property of all wui norms:
$$\tau(\mathcal{C}(A)) \le \tau(A) \tag{IV.72}$$
for all $A \in M(n)$ and all pinchings $\mathcal{C}$ on $M(n)$. In Chapter 6 we will prove a generalisation of Lidskii's inequality (IV.62), extending it to all wui norms.

IV.5  Problems
Problem IV.5.1. When $0 < p < 1$, the function $\Phi_p(x) = (\sum |x_i|^p)^{1/p}$ does not define a norm. Show that in lieu of the triangle inequality we have
$$\Phi_p(x + y) \le 2^{\frac{1}{p}-1}\,[\Phi_p(x) + \Phi_p(y)].$$
(Use the fact that $f(t) = t^p$ on $\mathbb{R}_+$ is subadditive when $0 < p \le 1$ and convex when $p \ge 1$.) Positive homogeneous functions that do not satisfy the triangle inequality but a weaker inequality $\varphi(x + y) \le c[\varphi(x) + \varphi(y)]$, $c > 1$, are sometimes called quasinorms.

Problem IV.5.2. More generally, show that for any symmetric gauge function $\Phi$ and $0 < p < 1$, if we define $\Phi^{(p)}$ as in (IV.17), then
$$\Phi^{(p)}(x + y) \le 2^{\frac{1}{p}-1}\,[\Phi^{(p)}(x) + \Phi^{(p)}(y)].$$

Problem IV.5.3. All norms on $\mathbb{C}^n$ are equivalent in the sense that if $\Phi$ and $\Psi$ are two norms, then there exists a constant $K$ such that $\Phi(x) \le K\Psi(x)$ for all $x \in \mathbb{C}^n$. Let
$$K_{\Phi,\Psi} = \inf\{K : \Phi(x) \le K\Psi(x) \text{ for all } x \in \mathbb{C}^n\}.$$
Find $K_{\Phi,\Psi}$ when $\Phi, \Psi$ are both members of the family $\Phi_p$.

Problem IV.5.4. Show that for every norm $\Phi$, $\Phi'' = \Phi$; i.e., the dual of the dual of a norm is the norm itself. Find the constants $K_{\Phi,\Psi}$ when $\Psi = \Phi'$.

Problem IV.5.5. Find the duals of the norms $\Phi^{(p)}_{(k)}$ defined by (IV.19). (These are somewhat complicated.)

Problem IV.5.6. For $0 < p < 1$ and a unitarily invariant norm $|||\cdot|||$ on $M(n)$, let $|||A|||^{(p)} = |||\,|A|^p\,|||^{1/p}$. Show that
$$|||A + B|||^{(p)} \le 2^{\frac{1}{p}-1}\,\bigl[|||A|||^{(p)} + |||B|||^{(p)}\bigr].$$
Problem IV.5.7. Choosing $p = q = 2$ in (IV.43) or (IV.42), one obtains
$$|||\,|A^*B|^{1/2}\,|||^2 \le |||A|||\;|||B|||.$$
This, like the inequality (IV.44), is also a form of the Cauchy-Schwarz inequality for unitarily invariant norms. Show that this is just the inequality (IV.44) restricted to Q-norms.

Problem IV.5.8. Let $f$ be any linear functional on $M(n)$. Show that there exists a unique matrix $X$ such that $f(A) = \operatorname{tr} XA$ for all $A \in M(n)$.

Problem IV.5.9. Use Theorem IV.2.14 to show that for all $A, B \in M(n)$
$$\det(I + |A + B|) \le \det(I + |A|)\,\det(I + |B|).$$

Problem IV.5.10. More generally, show that for $0 < p \le 1$ and $\mu > 0$
$$\det(I + \mu|A + B|^p) \le \det(I + \mu|A|^p)\,\det(I + \mu|B|^p).$$

Problem IV.5.11. Let $\ell_p$ denote the space $\mathbb{C}^n$ with the $p$-norm defined in (IV.1) and (IV.2), $1 \le p \le \infty$. For a matrix $A$, let $\|A\|_{p \to p'}$ denote the norm of $A$ as a linear operator from $\ell_p$ to $\ell_{p'}$; i.e.,
$$\|A\|_{p \to p'} = \sup_{\|x\|_p = 1} \|Ax\|_{p'}.$$
Show that
$$\|A\|_{1 \to 1} = \max_j \sum_i |a_{ij}|, \qquad \|A\|_{\infty \to \infty} = \max_i \sum_j |a_{ij}|, \qquad \|A\|_{1 \to \infty} = \max_{i,j} |a_{ij}|.$$
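The three formulas of Problem IV.5.11 can be verified by brute force on a small real example, using the fact that the supremum over the $\ell_1$ unit ball is attained at a basis vector, and over the $\ell_\infty$ unit ball at a vector of signs (the matrix is an arbitrary choice):

```python
import numpy as np

A = np.array([[1.0, -2.0, 0.5],
              [3.0, 0.0, -1.0],
              [-0.5, 2.0, 2.0]])

col_sums = np.abs(A).sum(axis=0)   # candidates for ||A||_{1->1}
row_sums = np.abs(A).sum(axis=1)   # candidates for ||A||_{inf->inf}

# ||A||_{1->1}: extreme points of the l1 ball are +/- basis vectors.
norm_1_1 = max(np.linalg.norm(A[:, j], 1) for j in range(3))
assert abs(norm_1_1 - col_sums.max()) < 1e-12

# ||A||_{1->inf}: also attained at a basis vector.
norm_1_inf = max(np.linalg.norm(A[:, j], np.inf) for j in range(3))
assert abs(norm_1_inf - np.abs(A).max()) < 1e-12

# ||A||_{inf->inf}: extreme points of the l_inf ball are sign vectors.
signs = [np.array([a, b, c]) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
norm_inf_inf = max(np.linalg.norm(A @ x, np.inf) for x in signs)
assert abs(norm_inf_inf - row_sums.max()) < 1e-12
```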
None of these norms is weakly unitarily invariant.

Problem IV.5.12. Show that there exists a weakly unitarily invariant norm $\tau$ such that $\tau(A) \ne \tau(A^*)$ for some $A \in M(n)$.

Problem IV.5.13. Show that there exists a weakly unitarily invariant norm $\tau$ such that $\tau(A) > \tau(B)$ for some positive matrices $A, B$ with $A \le B$.

Problem IV.5.14. Let $\tau$ be a wui norm on $M(n)$. Define $\nu$ on $M(n)$ as $\nu(A) = \tau(|A|)$. Then $\nu$ is a unitarily invariant norm if and only if $\tau(A) \le \tau(B)$ whenever $0 \le A \le B$.

Problem IV.5.15. Show that for every wui norm $\tau$
$$\tau(\operatorname{Eig} A) = \inf\{\tau(SAS^{-1}) : S \in \mathrm{GL}(n)\}.$$
When is the infimum attained?

Problem IV.5.16. Let $\tau$ be a wui norm on $M(n)$. Show that for every $A$
$$\tau(A) \ge \frac{|\operatorname{tr} A|}{n}\,\tau(I).$$
Use this to show that
$$\min\{\tau(A - B) : \operatorname{tr} B = 0\} = \frac{|\operatorname{tr} A|}{n}\,\tau(I).$$

IV.6  Notes and References
The first major paper on the theory of unitarily invariant norms and symmetric gauge functions was by J. von Neumann, Some matrix inequalities and metrization of matric space, Tomsk. Univ. Rev., 1 (1937) 286-300, reprinted in his Collected Works, Pergamon Press, 1962. A famous book devoted to the study of such norms (for compact operators in a Hilbert space) is R. Schatten, Norm Ideals of Completely Continuous Operators, Springer-Verlag, 1960. Other excellent sources of information are the books by I.C. Gohberg and M.G. Krein cited in Chapter III, by A. Marshall and I. Olkin cited in Chapter II, and by R. Horn and C.R. Johnson cited in Chapter I. A succinct but complete summary can be found in L. Mirsky's paper cited in Chapter III. Much more on matrix norms (not necessarily invariant ones) can be found in the book Matrix Norms and Their Applications, by G.R. Belitskii and Y.I. Lyubich, Birkhäuser, 1988.
The notion of Q-norms is mentioned explicitly in R. Bhatia, Some inequalities for norm ideals, Commun. Math. Phys., 111 (1987) 33-39. (The possible usefulness of the idea was suggested by C. Davis.) The Cauchy-Schwarz inequality (IV.44) is proved in R. Bhatia, Perturbation inequalities
for the absolute value map in norm ideals of operators, J. Operator Theory, 19 (1988) 129-136. This, and a whole family of inequalities including the one in Problem IV.5.7, are studied in detail by R.A. Horn and R. Mathias in two papers, An analog of the Cauchy-Schwarz inequality for Hadamard products and unitarily invariant norms, SIAM J. Matrix Anal. Appl., 11 (1990) 481-498, and Cauchy-Schwarz inequalities associated with positive semidefinite matrices, Linear Algebra Appl., 142 (1990) 63-82. Many of the other inequalities in this section occur in K. Okubo, Hölder-type norm inequalities for Schur products of matrices, Linear Algebra Appl., 91 (1987) 13-28. A general study of these and related inequalities is made in R. Bhatia and C. Davis, Relations of linking and duality between symmetric gauge functions, Operator Theory: Advances and Applications, 73 (1994) 127-137.
Theorems IV.2.13 and IV.2.14 were proved by S. Ju. Rotfel'd, The singular values of a sum of completely continuous operators, in Topics in Mathematical Physics, Consultants Bureau, 1969, Vol. 3, pp. 73-78. See also R.C. Thompson, Convex and concave functions of singular values of matrix sums, Pacific J. Math., 66 (1976) 285-290. The results of Problems IV.5.9 and IV.5.10 are also due to Rotfel'd.
The proof of Lidskii's Theorem given in Section IV.3 is adapted from F. Hiai and Y. Nakamura, Majorisations for generalised s-numbers in semifinite von Neumann algebras, Math. Z., 195 (1987) 17-27.
The theory of weakly unitarily invariant norms was developed in R. Bhatia and J.A.R. Holbrook, Unitary invariance and spectral variation, Linear Algebra Appl., 95 (1987) 43-68. Theorem IV.4.6 is proved in this paper. More on C-numerical radii can be found in C.K. Li and N.K. Tsing, Norms that are invariant under unitary similarities and the C-numerical radii, Linear and Multilinear Algebra, 24 (1989) 209-222. Theorem IV.4.7 is taken from this paper. A part of this theorem (the equivalence of conditions (i) and (iv)) was proved in R. Bhatia and J.A.R. Holbrook, A softer, stronger Lidskii theorem, Proc. Indian Acad. Sciences (Math. Sciences), 99 (1989) 75-83. Two papers dealing with wui norms for infinite-dimensional operators are C.K. Fong and J.A.R. Holbrook, Unitarily invariant operator norms, Canad. J. Math., 35 (1983) 274-299, and C.K. Fong, H. Radjavi, and P. Rosenthal, Norms for matrices and operators, J. Operator Theory, 18 (1987) 99-113.
The theory of wui norms is not developed as fully as that of unitarily invariant norms. Theorem IV.4.6 would be useful if one could make the correspondence between $\tau$ and $N$ more explicit. As things stand, this has not been done even for some well-known and much-used norms like the $L_p$-norms. When $N$ is the $L_\infty$ function norm, we have noted that $N'(A) = w(A)$. When $N$ is the $L_2$ function norm, then it is shown in the Bhatia-Holbrook (1987) paper cited above that
$$N'(A) = \left(\frac{\|A\|_2^2 + |\operatorname{tr} A|^2}{n + n^2}\right)^{1/2}.$$
For other values of $p$, the correspondence has not been worked out. For a recent survey of several results on invariant norms see C.K. Li, Some aspects of the theory of norms, Linear Algebra Appl., 212/213 (1994) 71-100.
V  Operator Monotone and Operator Convex Functions

In this chapter we study an important and useful class of functions called operator monotone functions. These are real functions whose extensions to Hermitian matrices preserve order. Such functions have several special properties, some of which are studied in this chapter. They are closely related to properties of operator convex functions. We shall study both of these together.
V.1  Definitions and Simple Examples
Let $f$ be a real function defined on an interval $I$. If $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix whose diagonal entries $\lambda_j$ are in $I$, we define $f(D) = \operatorname{diag}(f(\lambda_1), \ldots, f(\lambda_n))$. If $A$ is a Hermitian matrix whose eigenvalues $\lambda_j$ are in $I$, we choose a unitary $U$ such that $A = UDU^*$, where $D$ is diagonal, and then define $f(A) = Uf(D)U^*$. In this way we can define $f(A)$ for all Hermitian matrices (of any order) whose eigenvalues are in $I$. In the rest of this chapter, it will always be assumed that our functions are real functions defined on an interval (finite or infinite, closed or open) and are extended to Hermitian matrices in this way.

We will use the notation $A \le B$ to mean $A$ and $B$ are Hermitian and $B - A$ is positive. The relation $\le$ is a partial order on Hermitian matrices.

A function $f$ is said to be matrix monotone of order $n$ if it is monotone with respect to this order on $n \times n$ Hermitian matrices, i.e., if $A \le B$ implies $f(A) \le f(B)$. If $f$ is matrix monotone of order $n$ for all $n$, we say $f$ is matrix monotone or operator monotone.
A function is said to be matrix convex of order n if for all Hermitian matrices A and B and for all real numbers 0 < A < 1 ,
f
n x n
f ( ( 1  A) A + AB) < ( 1  A) f (A ) + A f ( B ) . If f is matrix convex of all orders, we say that
operator convex.
(V. 1 )
f is matrix convex or
(No te that if the eigenvalues of A and B are all in an interval /, then the eigenvalues of any convex combination of A , B are also in /. This is an easy consequence of results in Chapter III . ) We will consider continuous functions only. In this case, the condition (V. l ) can be replaced by the more special condition (V.2)
( Functions satisfying (V.2) are called midpoint operator convex, and if they are continuous, then they are convex. ) A function f is called operator concave if the function f is operator convex. It is clear that the set of operator monotone functions and the set of operator convex functions are both closed under positive linear combina tions and also under (pointwise) limits. In other words, if , g are operator monotone, and if a, {3 are positive real numbers, then a + f3g is also oper are operator monotone, and if ( x ) ( x ) , then ator monotone. If is also operator monotone. The same is true for operator convex functions.
f
f f f f f Example V. l . l The function f(t) == a + (3t is operator monotone {on every interval} for every a E IR and {3 > 0. It is operator convex for all n
n
7
a, {J E �
The first surprise is in the following example. Example V. 1.2 f(t) == t 2 [0 , oo)
The function on is not operator mono tone. In other words, there exist positive matrices A , B such that B  A is positive but B2  A2 is not. To see this, take Example V. 1.3
The function f(t) t2 is operator convex on every in terval. To see this, note that for any Hermitian matrices A, B , 2 2 2A2 + B2  ( A + B ) 2 A B) == _!_ A AB BA) B  0. ( ( 4 4 2 2 This{3 shows that the function f ( t) + (3t + "(t2 is operator convex for all IR, > 0 . ==
=
a,
E
!
1
+
== a
>
1 14
V.
O perator Mo notone and O perator Co nvex Functions
Example V.1.4 The function f(t) = t^3 on [0, ∞) is not operator convex. To see this, let

A = (1 0; 0 0),   B = (2 1; 1 1).

Then

(A^3 + B^3)/2 − ((A + B)/2)^3 = (1/4)(11 9; 9 7),

and this is not positive, since its determinant is negative.
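The midpoint-convexity gap above can also be checked numerically (an illustrative numpy sketch, not part of the original argument):

```python
import numpy as np

# The matrices from Example V.1.4: both positive semidefinite.
A = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])

cube = lambda X: X @ X @ X
# If t^3 were operator convex, this gap would be positive semidefinite.
gap = (cube(A) + cube(B)) / 2 - cube((A + B) / 2)

print(gap)                      # equals (1/4) * [[11, 9], [9, 7]]
print(np.linalg.eigvalsh(gap))  # one eigenvalue is negative
```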
Examples V.1.2 and V.1.4 show that very simple functions which are monotone (convex) as real functions need not be operator monotone (operator convex). A complete description of operator monotone and operator convex functions will be given in later sections. It is instructive to study a few more examples first. The operator monotonicity or convexity of some functions can be proved by special arguments that are useful in other contexts as well.

We will repeatedly use two simple facts. If A is positive, then A ≤ I if and only if spr(A) ≤ 1. An operator A is a contraction (‖A‖ ≤ 1) if and only if A*A ≤ I; this is also equivalent to the condition AA* ≤ I. The following elementary lemma is also used often.

Lemma V.1.5 If B ≥ A, then for every operator X we have X*BX ≥ X*AX.

Proof. For every vector u we have

⟨u, X*BXu⟩ = ⟨Xu, BXu⟩ ≥ ⟨Xu, AXu⟩ = ⟨u, X*AXu⟩.

This proves the lemma. An equally brief proof goes as follows. Let C be the positive square root of the positive operator B − A. Then

X*(B − A)X = X*CCX = (CX)*CX ≥ 0.   •

Proposition V.1.6 The function f(t) = −1/t is operator monotone on (0, ∞).

Proof. Let B ≥ A > 0. Then, by Lemma V.1.5, I ≥ B^{−1/2} A B^{−1/2}. Since the map T → T^{−1} is order-reversing on commuting positive operators, we have I ≤ B^{1/2} A^{−1} B^{1/2}. Again using Lemma V.1.5, we get from this B^{−1} ≤ A^{−1}.   •
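The conclusion of Proposition V.1.6 (inversion reverses the order on positive operators) is easy to test on random matrices; a quick numpy sketch, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))
A = X @ X.T + np.eye(n)        # strictly positive
Y = rng.standard_normal((n, n))
B = A + Y @ Y.T                # B >= A > 0

# B >= A > 0 implies A^{-1} >= B^{-1}.
gap = np.linalg.inv(A) - np.linalg.inv(B)
print(np.linalg.eigvalsh(gap).min())   # nonnegative, up to rounding
```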
Lemma V.1.7 If B ≥ A ≥ 0 and B is invertible, then ‖A^{1/2} B^{−1/2}‖ ≤ 1.

Proof. If B ≥ A ≥ 0, then

I ≥ B^{−1/2} A B^{−1/2} = (A^{1/2} B^{−1/2})* A^{1/2} B^{−1/2},

and hence ‖A^{1/2} B^{−1/2}‖ ≤ 1.   •
V.1 Definitions and Simple Examples
Proposition V.1.8 The function f(t) = t^{1/2} is operator monotone on [0, ∞).

Proof. Let B ≥ A ≥ 0. Suppose B is invertible. Then, by Lemma V.1.7,

spr(B^{−1/4} A^{1/2} B^{−1/4}) = spr(A^{1/2} B^{−1/2}) ≤ ‖A^{1/2} B^{−1/2}‖ ≤ 1.

Since B^{−1/4} A^{1/2} B^{−1/4} is positive, this implies that I ≥ B^{−1/4} A^{1/2} B^{−1/4}. Hence, by Lemma V.1.5, B^{1/2} ≥ A^{1/2}. This proves the proposition under the assumption that B is invertible. If B is not strictly positive, then for every ε > 0, B + εI is strictly positive. So (B + εI)^{1/2} ≥ A^{1/2}. Let ε → 0. This shows that B^{1/2} ≥ A^{1/2}.   •
Theorem V.1.9 The function f(t) = t^r is operator monotone on [0, ∞) for 0 ≤ r ≤ 1.
Proof. Let r be a dyadic rational, i.e., a number of the form r = m/2^n, where n is any positive integer and 1 ≤ m ≤ 2^n. We will first prove the assertion for such r. This is done by induction on n. Proposition V.1.8 shows that the assertion of the theorem is true when n = 1. Suppose it is also true for all dyadic rationals j/2^{n−1}, in which 1 ≤ j ≤ 2^{n−1}. Let B ≥ A and let r = m/2^n. Suppose m ≤ 2^{n−1}. Then, by the induction hypothesis, B^{m/2^{n−1}} ≥ A^{m/2^{n−1}}. Hence, by Proposition V.1.8, B^{m/2^n} ≥ A^{m/2^n}.

Suppose m > 2^{n−1}. If B ≥ A > 0, then A^{−1} ≥ B^{−1}. Using Lemma V.1.5, we have

B^{m/2^n} A^{−1} B^{m/2^n} ≥ B^{m/2^n} B^{−1} B^{m/2^n} = B^{m/2^{n−1} − 1}.

By the same argument,

A^{−1/2} B^{m/2^n} A^{−1} B^{m/2^n} A^{−1/2} ≥ A^{−1/2} B^{m/2^{n−1} − 1} A^{−1/2} ≥ A^{−1/2} A^{m/2^{n−1} − 1} A^{−1/2}   (by the induction hypothesis).

This can also be written as

(A^{−1/2} B^{m/2^n} A^{−1/2})^2 ≥ (A^{m/2^n − 1})^2.

So, by the operator monotonicity of the square root,

A^{−1/2} B^{m/2^n} A^{−1/2} ≥ A^{m/2^n − 1}.

Hence, B^{m/2^n} ≥ A^{m/2^n}. We have shown that B ≥ A ≥ 0 implies B^r ≥ A^r for all dyadic rationals r in [0, 1]. Such r are dense in [0, 1]. So we have B^r ≥ A^r for all r in [0, 1]. By continuity, this is true even when A is only positive semidefinite.   •

Exercise V.1.10 Another proof of Theorem V.1.9 is outlined below. Fill in the details.
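Theorem V.1.9 is easy to observe numerically for random positive pairs; a sketch (illustrative only) with matrix powers computed through the spectral decomposition:

```python
import numpy as np

def mpow(X, r):
    """X^r for positive semidefinite X, via the spectral decomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.clip(w, 0.0, None) ** r) @ V.T

rng = np.random.default_rng(1)
n = 5
G = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))
A = G @ G.T              # A >= 0
B = A + H @ H.T          # B >= A

for r in (0.25, 0.5, 0.75, 1.0):
    print(r, np.linalg.eigvalsh(mpow(B, r) - mpow(A, r)).min())  # all >= 0, up to rounding
```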
(i) The composition of two operator monotone functions is operator monotone. Use this and Proposition V.1.6 to prove that the function f(t) = t/(1 + t) is operator monotone on (0, ∞).

(ii) For each λ > 0, the function f(t) = t/(λ + t) is operator monotone on (0, ∞).

(iii) One of the integrals calculated by contour integration in Complex Analysis is

∫_0^∞ λ^{r−1}/(1 + λ) dλ = π cosec rπ,   0 < r < 1.   (V.3)

By a change of variables, obtain from this the formula

t^r = (sin rπ / π) ∫_0^∞ (λ^{r−1} t)/(λ + t) dλ,   (V.4)

valid for all t ≥ 0 and 0 < r < 1.

(iv) Thus, we can write

t^r = ∫_0^∞ (t/(λ + t)) dμ(λ),   0 < r < 1,   (V.5)

where μ is a positive measure on (0, ∞). Now use (ii) to conclude that the function f(t) = t^r is operator monotone on (0, ∞) for 0 < r < 1.
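Formula (V.4) can be spot-checked by numerical quadrature; the sketch below (an illustration, not part of the text) uses the substitution λ = e^u, which turns the integral into one over the real line:

```python
import numpy as np

r, t = 0.5, 2.0

# Substituting lambda = e^u in (V.4), the integrand becomes e^{ru} * t / (e^u + t).
u = np.linspace(-40.0, 40.0, 400001)
g = np.exp(r * u) * t / (np.exp(u) + t)
integral = np.sum((g[:-1] + g[1:]) / 2) * (u[1] - u[0])   # trapezoid rule

approx = np.sin(r * np.pi) / np.pi * integral
print(approx, t ** r)    # both close to sqrt(2) = 1.41421...
```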
Example V.1.11 The function f(t) = |t| is not operator convex on any interval that contains 0. To see this, take

A = (1 1; 1 1),   B = (−2 0; 0 0).

Then

|A| = (1 1; 1 1),   |B| = (2 0; 0 0),   |A| + |B| = (3 1; 1 1).

But |A + B| = √2 I. So |A| + |B| − |A + B| is not positive. (See also Exercise III.5.7.)
Example V.1.12 The function f(t) = t ∨ 0 is not operator convex on any interval that contains 0. To see this, take A, B as in Example V.1.11. Since the eigenvalues of A are 2 and 0, f(A) = A; since the eigenvalues of B are −2 and 0, f(B) = 0. So (1/2)(f(A) + f(B)) = (1/2)(1 1; 1 1). Any positive matrix dominated by this must have (1, −1) as an eigenvector with 0 as the corresponding eigenvalue. Since (1/2)(A + B) does not have (1, −1) as an eigenvector, neither does f((A + B)/2). So f((A + B)/2) is not dominated by (1/2)(f(A) + f(B)).
Exercise V.1.13 Let I be any interval. For a ∈ I, let f(t) = (t − a) ∨ 0. Then f is called an "angle function" angled at a. If I is a finite interval, then every convex function on I is a limit of positive linear combinations of linear functions and angle functions. Use this to show that angle functions are not operator convex.

Exercise V.1.14 Show that the function f(t) = t ∨ 0 is not operator monotone on any interval that contains 0.
Exercise V.1.15 Let A, B be positive. Show that

(A^{−1} + B^{−1})/2 − ((A + B)/2)^{−1} = (1/2)(A^{−1} − B^{−1})(A^{−1} + B^{−1})^{−1}(A^{−1} − B^{−1}) ≥ 0.

Therefore, the function f(t) = 1/t is operator convex on (0, ∞).
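The identity in Exercise V.1.15 can be confirmed numerically (an illustrative numpy sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X, Y = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A = X @ X.T + np.eye(n)        # strictly positive
B = Y @ Y.T + np.eye(n)

Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
lhs = (Ai + Bi) / 2 - np.linalg.inv((A + B) / 2)
rhs = (Ai - Bi) @ np.linalg.inv(Ai + Bi) @ (Ai - Bi) / 2

print(np.max(np.abs(lhs - rhs)))        # ~ 0 : the two sides agree
print(np.linalg.eigvalsh(lhs).min())    # >= 0 : midpoint convexity of 1/t
```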
V.2  Some Characterisations
There are several different notions of averaging in the space of operators. In this section we study the relationship between some of these operations and operator convex functions. This leads to some characterisations of operator convex and operator monotone functions and to the interrelations between them.

In the proofs that follow, we will frequently use properties of operators on the direct sum H ⊕ H to draw conclusions about operators on H. This technique was outlined briefly in Section I.3.

Let K be a contraction on H. Let L = (I − KK*)^{1/2}, M = (I − K*K)^{1/2}. Then the operators U, V defined as

U = (K L; M −K*),   V = (K −L; M K*)   (V.6)

are unitary operators on H ⊕ H. (See Exercise I.3.6.) More specially, for each 0 ≤ λ ≤ 1, the operator

W = (λ^{1/2} I   −(1 − λ)^{1/2} I ; (1 − λ)^{1/2} I   λ^{1/2} I)   (V.7)

is a unitary operator on H ⊕ H.

Theorem V.2.1 Let f be a real function on an interval I. Then the following two statements are equivalent:

(i) f is operator convex on I.

(ii) f(C(A)) ≤ C(f(A)) for every Hermitian operator A (on a Hilbert space H) whose spectrum is contained in I and for every pinching C (in the space H).
Proof. (i) ⇒ (ii): Every pinching is a product of pinchings by two complementary projections. (See Problems II.5.4 and II.5.5.) So we need to prove this implication only for pinchings C of the form

C(X) = (X + U*XU)/2,   where U = (I 0; 0 −I).

For such a C,

f(C(A)) = f((A + U*AU)/2) ≤ (f(A) + f(U*AU))/2 = (f(A) + U*f(A)U)/2 = C(f(A)).

(ii) ⇒ (i): Let A, B be Hermitian operators on H, both having their spectrum in I. Consider the operator T = (A 0; 0 B) on H ⊕ H. If W is the unitary operator defined in (V.7), then the diagonal entries of W*TW are λA + (1 − λ)B and (1 − λ)A + λB. So if C is the pinching on H ⊕ H induced by the projections onto the two summands, then

C(W*TW) = (λA + (1 − λ)B   0 ; 0   (1 − λ)A + λB).

By the same argument,

C(f(W*TW)) = C(W* f(T) W) = (λ f(A) + (1 − λ) f(B)   0 ; 0   (1 − λ) f(A) + λ f(B)).

So the condition f(C(W*TW)) ≤ C(f(W*TW)) implies that

f(λA + (1 − λ)B) ≤ λ f(A) + (1 − λ) f(B).   •
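Theorem V.2.1 can be illustrated numerically with the operator convex function f(t) = t^2 of Example V.1.3; a sketch (my illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                        # Hermitian

def pinch(M, k=3):
    """Pinching induced by the projections onto the first k and last n-k coordinates."""
    C = np.zeros_like(M)
    C[:k, :k], C[k:, k:] = M[:k, :k], M[k:, k:]
    return C

# f(t) = t^2 is operator convex, so C(f(A)) - f(C(A)) should be positive.
gap = pinch(A @ A) - pinch(A) @ pinch(A)
print(np.linalg.eigvalsh(gap).min())     # >= 0, up to rounding
```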
Exercise V.2.2 The following conditions are equivalent:

(i) f is operator convex on I.

(ii) f(A_M) ≤ (f(A))_M for every Hermitian operator A with its spectrum in I, and for every compression T → T_M.

(iii) f(V*AV) ≤ V* f(A) V for every Hermitian operator A (on H) with its spectrum in I, and for every isometry V from any Hilbert space into H.

(See Section III.1 for the definition of a compression.)
Theorem V.2.3 Let I be an interval containing 0, and let f be a real function on I. Then the following conditions are equivalent:

(i) f is operator convex on I and f(0) ≤ 0.

(ii) f(K*AK) ≤ K* f(A) K for every contraction K and every Hermitian operator A with spectrum in I.

(iii) f(K_1* A K_1 + K_2* B K_2) ≤ K_1* f(A) K_1 + K_2* f(B) K_2 for all operators K_1, K_2 such that K_1* K_1 + K_2* K_2 ≤ I and for all Hermitian A, B with spectrum in I.

(iv) f(PAP) ≤ P f(A) P for all projections P and Hermitian operators A with spectrum in I.
Proof. (i) ⇒ (ii): Let T = (A 0; 0 0) and let U, V be the unitary operators defined in (V.6). Then

U*TU = (K*AK   K*AL ; LAK   LAL),   V*TV = (K*AK   −K*AL ; −LAK   LAL).

Hence,

(K*AK   0 ; 0   LAL) = (U*TU + V*TV)/2.

So,

(f(K*AK)   0 ; 0   f(LAL)) = f((U*TU + V*TV)/2)
  ≤ (f(U*TU) + f(V*TV))/2
  = (U* f(T) U + V* f(T) V)/2
  = (1/2) { U* (f(A) 0; 0 f(0)) U + V* (f(A) 0; 0 f(0)) V }
  ≤ (1/2) { U* (f(A) 0; 0 0) U + V* (f(A) 0; 0 0) V }
  = (K* f(A) K   0 ; 0   L f(A) L),

where the last inequality uses f(0) ≤ 0. Hence, f(K*AK) ≤ K* f(A) K.

(ii) ⇒ (iii): Let T = (A 0; 0 B) and K = (K_1 0; K_2 0). Then K is a contraction. Note that

K*TK = (K_1* A K_1 + K_2* B K_2   0 ; 0   0).
Hence,

f(K*TK) ≤ K* f(T) K = (K_1* f(A) K_1 + K_2* f(B) K_2   0 ; 0   0),

and comparing the top left corners gives (iii).
(iii) ⇒ (iv) obviously.

(iv) ⇒ (i): Let A, B be Hermitian operators with spectrum in I and let 0 ≤ λ ≤ 1. Let T = (A 0; 0 B), P = (I 0; 0 0), and let W be the unitary operator defined by (V.7). Then

P W*TW P = (λA + (1 − λ)B   0 ; 0   0).

So,

(f(λA + (1 − λ)B)   0 ; 0   f(0)) = f(P W*TW P) ≤ P f(W*TW) P = P W* f(T) W P = (λ f(A) + (1 − λ) f(B)   0 ; 0   0).

Hence, f is operator convex and f(0) ≤ 0.   •
Exercise V.2.4 (i) Let λ_1, λ_2 be positive real numbers such that λ_1 λ_2 ≥ ‖C‖^2. Then the operator

(λ_1 I   C* ; C   λ_2 I)

is positive. (Use Proposition I.3.5.)

(ii) Let (A   C* ; C   B) be a Hermitian operator. Then for every ε > 0, there exists λ > 0 such that

(A   C* ; C   B) ≤ (A + εI   0 ; 0   λI).
The next two theorems are among the several results that describe the connections between operator convexity and operator monotonicity.

Theorem V.2.5 Let f be a (continuous) function mapping the positive half-line [0, ∞) into itself. Then f is operator monotone if and only if it is operator concave.
Proof. Suppose f is operator monotone. If we show that f(K*AK) ≥ K* f(A) K for every positive operator A and every contraction K, then it will follow from Theorem V.2.3 that f is operator concave. Let T = (A 0; 0 0), and let U be the unitary operator defined in (V.6). Then

U*TU = (K*AK   K*AL ; LAK   LAL).

By the assertion in Exercise V.2.4(ii), given any ε > 0, there exists λ > 0 such that

U*TU ≤ (K*AK + εI   0 ; 0   λI).

Hence, by the operator monotonicity of f,

U* f(T) U = f(U*TU) ≤ (f(K*AK + εI)   0 ; 0   f(λ) I).

Since f(0) ≥ 0, the top left corner of U* f(T) U dominates K* f(A) K. Comparing top left corners above, this shows K* f(A) K ≤ f(K*AK + εI) for every ε > 0. Hence K* f(A) K ≤ f(K*AK).

Conversely, suppose f is operator concave. Let 0 ≤ A ≤ B. Then for any 0 < λ < 1 we can write

λB = λA + (1 − λ) (λ/(1 − λ))(B − A).

Since f is operator concave, this gives

f(λB) ≥ λ f(A) + (1 − λ) f((λ/(1 − λ))(B − A)).

Since f(X) is positive for every positive X, it follows that f(λB) ≥ λ f(A). Now let λ → 1. This shows f(B) ≥ f(A). So f is operator monotone.   •
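By Theorems V.1.9 and V.2.5, the square root is operator concave; a numerical spot-check (illustrative numpy sketch, not part of the text):

```python
import numpy as np

def sqrtm_psd(X):
    """Square root of a positive semidefinite matrix via its spectral decomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

rng = np.random.default_rng(4)
n = 5
A = (lambda G: G @ G.T)(rng.standard_normal((n, n)))
B = (lambda G: G @ G.T)(rng.standard_normal((n, n)))

# Midpoint concavity: sqrt((A+B)/2) >= (sqrt(A) + sqrt(B))/2.
gap = sqrtm_psd((A + B) / 2) - (sqrtm_psd(A) + sqrtm_psd(B)) / 2
print(np.linalg.eigvalsh(gap).min())   # >= 0, up to rounding
```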
Corollary V.2.6 Let f be a continuous function from (0, ∞) into itself. If f is operator monotone, then the function g(t) = 1/f(t) is operator convex.

Proof. Let A, B be positive operators. Since f is operator concave,

f((A + B)/2) ≥ (f(A) + f(B))/2.

Since the map X → X^{−1} is order-reversing and convex on positive operators (see Proposition V.1.6 and Exercise V.1.15), this gives

[f((A + B)/2)]^{−1} ≤ [(f(A) + f(B))/2]^{−1} ≤ (f(A)^{−1} + f(B)^{−1})/2.

This is the same as saying g is operator convex.   •
Exercise V.2.7 Let I be an interval containing 0, and let f be a real function on I with f(0) ≤ 0. Show that for every Hermitian operator A with spectrum in I and for every projection P,

f(PAP) ≤ P f(PAP) = P f(PAP) P.
Exercise V.2.8 Let f be a continuous real function on [0, ∞). Then for all positive operators A and projections P,

f(A^{1/2} P A^{1/2}) A^{1/2} P = A^{1/2} P f(PAP).

(Prove this first, by induction, for f(t) = t^n. Then use the Weierstrass approximation theorem to show that this is true for all f.)

Theorem V.2.9 Let f be a (continuous) real function on the interval [0, α). Then the following two conditions are equivalent:

(i) f is operator convex and f(0) ≤ 0.

(ii) The function g(t) = f(t)/t is operator monotone on (0, α).

Proof. (i) ⇒ (ii): Let 0 < A ≤ B. Then 0 < A^{1/2} ≤ B^{1/2}. Hence, B^{−1/2} A^{1/2} is a contraction by Lemma V.1.7. Therefore, using Theorem V.2.3, we see that

f(A) = f(A^{1/2} B^{−1/2} B B^{−1/2} A^{1/2}) ≤ A^{1/2} B^{−1/2} f(B) B^{−1/2} A^{1/2}.

From this one obtains, using Lemma V.1.5,

A^{−1/2} f(A) A^{−1/2} ≤ B^{−1/2} f(B) B^{−1/2}.

Since all functions of an operator commute with each other, this shows that A^{−1} f(A) ≤ B^{−1} f(B). Thus g is operator monotone.

(ii) ⇒ (i): If f(t)/t is monotone on (0, α), we must have f(0) ≤ 0. We will show that f satisfies the condition (iv) of Theorem V.2.3. Let P be any projection and let A be any positive operator with spectrum in (0, α). Then there exists an ε > 0 such that (1 + ε)A has its spectrum in (0, α). Since P + εI ≤ (1 + ε)I, we have A^{1/2}(P + εI)A^{1/2} ≤ (1 + ε)A. So, by the operator monotonicity of g, we have

A^{−1/2} (P + εI)^{−1} A^{−1/2} f(A^{1/2}(P + εI)A^{1/2}) ≤ (1 + ε)^{−1} A^{−1} f((1 + ε)A).

Multiply both sides on the right by A^{1/2}(P + εI) and on the left by its conjugate (P + εI)A^{1/2}. This gives

A^{−1/2} f(A^{1/2}(P + εI)A^{1/2}) A^{1/2}(P + εI) ≤ (1 + ε)^{−1} (P + εI) f((1 + ε)A) (P + εI).

Let ε → 0. This gives

A^{−1/2} f(A^{1/2} P A^{1/2}) A^{1/2} P ≤ P f(A) P.

Use the identity in Exercise V.2.8 to reduce this to P f(PAP) ≤ P f(A) P, and then use the inequality in Exercise V.2.7 to conclude that f(PAP) ≤ P f(A) P, as desired.   •
As corollaries to the above results, we deduce the following statements about the power functions .
Theorem V.2.10 On the positive half-line (0, ∞), the functions f(t) = t^r, where r is a real number, are operator monotone if and only if 0 ≤ r ≤ 1.

Proof. If 0 ≤ r ≤ 1, we know that f(t) = t^r is operator monotone by Theorem V.1.9. If r is not in [0, 1], then the function f(t) = t^r is not concave on (0, ∞). Therefore, it cannot be operator monotone, by Theorem V.2.5.   •

Exercise V.2.11 Consider the functions f(t) = t^r on (0, ∞). Use Theorems V.2.9 and V.2.10 to show that if r > 0, then f(t) is operator convex if and only if 1 ≤ r ≤ 2. Use Corollary V.2.6 to show that f(t) is operator convex if −1 ≤ r < 0. (We will see later that f(t) is not operator convex for any other value of r.)

Exercise V.2.12 A function f from (0, ∞) into itself is both operator monotone and operator convex if and only if it is of the form f(t) = α + βt, with α, β ≥ 0.
Exercise V.2.13 Show that the function f(t) = −t log t is operator concave on (0, ∞).

V.3  Smoothness Properties
Let I be the open interval (−1, 1). Let f be a continuously differentiable function on I. Then we denote by f^[1] the function on I × I defined as

f^[1](λ, μ) = (f(λ) − f(μ))/(λ − μ)  if λ ≠ μ,   f^[1](λ, λ) = f'(λ).

The expression f^[1](λ, μ) is called the first divided difference of f at (λ, μ). If A is a diagonal matrix with diagonal entries λ_1, ..., λ_n, all of which are in I, we denote by f^[1](A) the n × n symmetric matrix whose (i, j)-entry is f^[1](λ_i, λ_j). If A is Hermitian and A = UΛU*, let f^[1](A) = U f^[1](Λ) U*.

Now consider the induced map f on the set of Hermitian matrices with eigenvalues in I. Such matrices form an open set in the real vector space of all Hermitian matrices. The map f is called (Fréchet) differentiable at A if there exists a linear transformation Df(A) on the space of Hermitian matrices such that for all H,

‖f(A + H) − f(A) − Df(A)(H)‖ = o(‖H‖).   (V.8)

The linear operator Df(A) is then called the derivative of f at A. Basic rules of the Fréchet differential calculus are summarised in Chapter X. If
f is differentiable at A, then

Df(A)(H) = (d/dt)|_{t=0} f(A + tH).   (V.9)

There is an interesting relationship between the derivative Df(A) and the matrix f^[1](A). This is explored in the next few paragraphs.
Lemma V.3.1 Let f be a polynomial function. Then for every diagonal matrix A and for every Hermitian matrix H,

Df(A)(H) = f^[1](A) ∘ H,   (V.10)

where ∘ stands for the Schur product of two matrices.

Proof. Both sides of (V.10) are linear in f. Therefore, it suffices to prove this for the powers f(t) = t^p, p = 1, 2, 3, .... For such f, using (V.9), one gets

Df(A)(H) = Σ_{k=1}^{p} A^{k−1} H A^{p−k}.

This is a matrix whose (i, j)-entry is (Σ_{k=1}^{p} λ_i^{k−1} λ_j^{p−k}) h_{ij}. On the other hand, f^[1](λ_i, λ_j) = Σ_{k=1}^{p} λ_i^{k−1} λ_j^{p−k}. So this is the (i, j)-entry of f^[1](A) ∘ H.   •

Corollary V.3.2 If A = UΛU* and f is a polynomial function, then

Df(A)(H) = U [f^[1](Λ) ∘ (U*HU)] U*.   (V.11)

Proof. Note that

(d/dt)|_{t=0} f(UΛU* + tH) = U [(d/dt)|_{t=0} f(Λ + tU*HU)] U*,

and use (V.10).   •

Theorem V.3.3 Let f ∈ C^1(I), and let A be a Hermitian matrix with all its eigenvalues in I. Then

Df(A)(H) = f^[1](A) ∘ H,   (V.12)

where ∘ denotes the Schur product in a basis in which A is diagonal.

Proof. Let A = UΛU*, where Λ is diagonal. We want to prove that

Df(A)(H) = U [f^[1](Λ) ∘ (U*HU)] U*.   (V.13)
This has been proved for all polynomials f. We will extend its validity to all f ∈ C^1 by a continuity argument. Denote the right-hand side of (V.13) by ∇f(A)(H). For each f in C^1, ∇f(A) is a linear map on Hermitian matrices. We have

‖∇f(A)(H)‖_2 = ‖f^[1](Λ) ∘ (U*HU)‖_2.

All entries of the matrix f^[1](Λ) are bounded by max_{|t| ≤ ‖A‖} |f'(t)|. (Use the mean value theorem.) Hence

‖∇f(A)(H)‖_2 ≤ max_{|t| ≤ ‖A‖} |f'(t)| ‖H‖_2.   (V.14)

Let H be a Hermitian matrix with norm so small that the eigenvalues of A + H are in I. Let [a, b] be a closed interval in I containing the eigenvalues of both A and A + H. Choose a sequence of polynomials f_n such that f_n → f and f_n' → f' uniformly on [a, b]. Let L be the line segment joining A and A + H in the space of Hermitian matrices. Then, by the mean value theorem (for Fréchet derivatives), we have

‖f_m(A + H) − f_n(A + H) − (f_m(A) − f_n(A))‖ ≤ ‖H‖ sup_{X ∈ L} ‖Df_m(X) − Df_n(X)‖ = ‖H‖ sup_{X ∈ L} ‖∇f_m(X) − ∇f_n(X)‖.   (V.15)

This is so because we have already shown that Df_n = ∇f_n for the polynomial functions f_n. Let ε be any positive real number. The inequality (V.14) ensures that there exists a positive integer n_0 such that for m, n ≥ n_0 we have

sup_{X ∈ L} ‖∇f_m(X) − ∇f_n(X)‖ < ε/3   (V.16)

and

‖∇f_n(A) − ∇f(A)‖ < ε/3.   (V.17)

Let m → ∞ and use (V.15) and (V.16) to conclude that

‖f(A + H) − f(A) − (f_n(A + H) − f_n(A))‖ ≤ (ε/3)‖H‖.   (V.18)

If ‖H‖ is sufficiently small, then by the definition of the Fréchet derivative, we have

‖f_n(A + H) − f_n(A) − ∇f_n(A)(H)‖ ≤ (ε/3)‖H‖.   (V.19)

Now we can write, using the triangle inequality,

‖f(A + H) − f(A) − ∇f(A)(H)‖ ≤ ‖f(A + H) − f(A) − (f_n(A + H) − f_n(A))‖ + ‖f_n(A + H) − f_n(A) − ∇f_n(A)(H)‖ + ‖(∇f(A) − ∇f_n(A))(H)‖,

and then use (V.17), (V.18), and (V.19) to conclude that, for ‖H‖ sufficiently small, we have

‖f(A + H) − f(A) − ∇f(A)(H)‖ ≤ ε‖H‖.

But this says that Df(A) = ∇f(A).   •
Let t → A(t) be a C^1 map from the interval [0, 1] into the space of Hermitian matrices that have all their eigenvalues in I. Let f ∈ C^1(I), and let F(t) = f(A(t)). Then, by the chain rule, F'(t) = Df(A(t))(A'(t)). Therefore, by the theorem above, we have

F(1) − F(0) = ∫_0^1 f^[1](A(t)) ∘ A'(t) dt,   (V.20)

where for each t the Schur product is taken in a basis that diagonalises A(t).
Theorem V.3.4 Let f ∈ C^1(I). Then f is operator monotone on I if and only if, for every Hermitian matrix A whose eigenvalues are in I, the matrix f^[1](A) is positive.

Proof. Let f be operator monotone, and let A be a Hermitian matrix whose eigenvalues are in I. Let H be the matrix all whose entries are 1. Then H is positive. So, A + tH ≥ A if t ≥ 0. Hence, f(A + tH) − f(A) is positive for small positive t. This implies that Df(A)(H) ≥ 0. So, by Theorem V.3.3, f^[1](A) ∘ H ≥ 0. But, for this special choice of H, this just says that f^[1](A) ≥ 0.

To prove the converse, let A, B be Hermitian matrices whose eigenvalues are in I, and let B ≥ A. Let A(t) = (1 − t)A + tB, 0 ≤ t ≤ 1. Then A(t) also has all its eigenvalues in I. So, by the hypothesis, f^[1](A(t)) ≥ 0 for all t. Note that A'(t) = B − A ≥ 0 for all t. Since the Schur product of two positive matrices is positive, f^[1](A(t)) ∘ A'(t) is positive for all t. So, by (V.20), f(B) − f(A) ≥ 0.   •
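For f(t) = √t (operator monotone by Proposition V.1.8) the matrix f^[1](A) of Theorem V.3.4 has entries 1/(√λ_i + √λ_j), a Cauchy-type matrix, and its positivity can be observed directly; a sketch (my illustration, not from the text):

```python
import numpy as np

# Loewner matrix of f(t) = sqrt(t): (sqrt(a) - sqrt(b)) / (a - b) = 1/(sqrt(a) + sqrt(b)),
# which also gives the correct diagonal value f'(l) = 1/(2 sqrt(l)).
rng = np.random.default_rng(6)
lam = rng.uniform(0.1, 10.0, size=7)
s = np.sqrt(lam)
L = 1.0 / (s[:, None] + s[None, :])

print(np.linalg.eigvalsh(L).min())   # positive, up to rounding
```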
Lemma V.3.5 If f is continuous and operator monotone on (−1, 1), then for each −1 ≤ λ ≤ 1 the function g_λ(t) = (t + λ) f(t) is operator convex.

Proof. We will prove this using Theorem V.2.9. First assume that f is continuous and operator monotone on [−1, 1]. Then the function f(t − 1) is operator monotone on [0, 2). Let g(t) = t f(t − 1). Then g(0) = 0 and the function g(t)/t is operator monotone on (0, 2). Hence, by Theorem V.2.9, g(t) is operator convex on (0, 2). This implies that the function h_1(t) = g(t + 1) = (t + 1) f(t) is operator convex on [−1, 1). Instead of f(t), if the same argument is applied to the function −f(−t), which is also operator
monotone on (−1, 1], we see that the function h_2(t) = −(t + 1) f(−t) is operator convex on [−1, 1). Changing t to −t preserves convexity. So the function h_3(t) = h_2(−t) = (t − 1) f(t) is also operator convex. But for |λ| ≤ 1,

g_λ(t) = ((1 + λ)/2) h_1(t) + ((1 − λ)/2) h_3(t)

is a convex combination of h_1 and h_3. So g_λ is also operator convex.

Now, given f continuous and operator monotone on (−1, 1), the function f((1 − ε)t) is continuous and operator monotone on [−1, 1] for each ε > 0. Hence, by the special case considered above, the function (t + λ) f((1 − ε)t) is operator convex. Let ε → 0, and conclude that the function (t + λ) f(t) is operator convex.   •

The next theorem says that every operator monotone function on I is in the class C^1. Later on, we will see that it is actually in the class C^∞. (This is so even if we do not assume that it is continuous to begin with.) In the proof we make use of some differentiability properties of convex functions and smoothing techniques. For the reader's convenience, these are summarised in Appendices 1 and 2 at the end of the chapter.

Theorem V.3.6 Every operator monotone function f on I is continuously differentiable.
Proof. Let 0 < ε < 1, and let f_ε be a regularisation of f of order ε. (See Appendix 2.) Then f_ε is a C^∞ function on (−1 + ε, 1 − ε). It is also operator monotone. Let f̄(t) = lim_{ε→0} f_ε(t). Then f̄(t) = (1/2)[f(t+) + f(t−)].

Let g_ε(t) = (t + 1) f_ε(t). Then, by Lemma V.3.5, g_ε is operator convex. Let g(t) = lim_{ε→0} g_ε(t). Then g(t) is operator convex. But every convex function (on an open interval) is continuous. So g(t) is continuous. Since g(t) = (t + 1) f̄(t) and t + 1 > 0 on I, this means that f̄(t) is continuous. Hence f(t) = f̄(t). We thus have shown that f is continuous.

Let g(t) = (t + 1) f(t). Then g is a convex function on I. So g is left and right differentiable, and the one-sided derivatives satisfy the properties

g_−'(t) ≤ g_+'(t),   lim_{s↓t} g_+'(s) = g_+'(t),   lim_{s↑t} g_+'(s) = g_−'(t).   (V.21)

But g_±'(t) = f(t) + (t + 1) f_±'(t). Since t + 1 > 0, the derivatives f_±'(t) also satisfy relations like (V.21).

Now let A = (s 0; 0 t), s, t ∈ (−1, 1). If ε is sufficiently small, s and t are in (−1 + ε, 1 − ε). Since f_ε is operator monotone on this interval, by Theorem V.3.4, the matrix f_ε^[1](A) is positive. This implies that

((f_ε(s) − f_ε(t))/(s − t))^2 ≤ f_ε'(s) f_ε'(t).
1 28
V.
Operator Monotone and Operator Convex Functions
above inequality gives, in the limit , the inequality
Now let s l t, and use the fact that the derivatives of f satisfy relations like ( V.21 ) . This gives
< ! u� (t) + t� (t) ] [J� (t) + �� ( t) J , which implies that f� ( t) == f!_ ( t) . Hence f is differentiable. [ !� (t W
The relations • (V. 2 1 ) , which are satisfied by f too , show that f ' is continuous.
Just as monotonicity of functions can be studied via first divided differ ences, convexity requires second divided differences. These are defined as follows. Let f be twice continuously differentiable on the interval I._ Then j [2] is a function defined on I x I x I as follows. If A1 , A2 , A3 are distinct
! [ .1\} ' ' .1\, 2 , .1\, 3 ) 2J (
== j [ 1 l ( A l , A 2 )  j [1l ( A l , A3 ) .1\2 '  .1\' 3
•
For other values of A1 , A 2 , A 3 , j( 2] is defined by continuity; e.g. , / [2 1 ( A , A, A) == � ! ( A ) . 2 "
Show that if A1 , A 2 , A3 are distinct, then j [21 (Al , A 2 , A3 ) is the quotient of the two determinants
Exercise V.3.7 Show that if λ_1, λ_2, λ_3 are distinct, then f^[2](λ_1, λ_2, λ_3) is the quotient of the two determinants

det (f(λ_1) f(λ_2) f(λ_3) ; λ_1 λ_2 λ_3 ; 1 1 1)   and   det (λ_1^2 λ_2^2 λ_3^2 ; λ_1 λ_2 λ_3 ; 1 1 1).

Hence the function f^[2] is symmetric in its three arguments.

Exercise V.3.8 If f(t) = t^m, m = 2, 3, ..., show that

f^[2](λ_1, λ_2, λ_3) = Σ_{p+q+r=m−2, p,q,r≥0} λ_1^p λ_2^q λ_3^r.
Exercise V.3.9 (i) Let f(t) = t^m, m ≥ 2. Let A be an n × n diagonal matrix, A = Σ_{i=1}^{n} λ_i P_i, where P_i are the projections onto the coordinate axes. Show that for every H,

(d^2/dt^2)|_{t=0} f(A + tH) = 2 Σ_{p+q+r=m−2} A^p H A^q H A^r = 2 Σ_{p+q+r=m−2} Σ_{1≤i,j,k≤n} λ_i^p λ_j^q λ_k^r P_i H P_j H P_k,

and hence

(1/2)(d^2/dt^2)|_{t=0} f(A + tH) = Σ_{i,j,k} f^[2](λ_i, λ_j, λ_k) P_i H P_j H P_k.   (V.22)

(ii) Use a continuity argument, like the one used in the proof of Theorem V.3.3, to show that this last formula is valid for all C^2 functions f.

Theorem V.3.10 If f ∈ C^2(I) and f is operator convex, then for each μ ∈ I the function g(λ) = f^[1](μ, λ) is operator monotone.

Proof. Since f is in the class C^2, g is in the class C^1. So, by Theorem
V.3.4, it suffices to prove that, for each n, the n × n matrix with entries g^[1](λ_i, λ_j) is positive for all λ_1, ..., λ_n in I.

Fix n and choose any λ_1, ..., λ_{n+1} in I. Let A be the diagonal matrix with entries λ_1, ..., λ_{n+1}. Since f is operator convex and is twice differentiable, for every Hermitian matrix H the matrix (d^2/dt^2)|_{t=0} f(A + tH) must be positive. If we write P_1, ..., P_{n+1} for the projections onto the coordinate axes, we have an explicit expression for this second derivative in (V.22). Choose H to be the Hermitian matrix whose (n+1)th column is (ξ_1, ..., ξ_n, 0), whose (n+1)th row is its conjugate, and whose remaining entries are all zero, where ξ_1, ..., ξ_n are any complex numbers. Let x be the (n+1)-vector (1, 1, ..., 1, 0). Then

⟨x, P_i H P_j H P_k x⟩ = δ_{j,n+1} ξ_i ξ̄_k   (V.23)

for 1 ≤ i, j, k ≤ n + 1, where δ_{j,n+1} is equal to 1 if j = n + 1, and is equal to 0 otherwise, and we set ξ_{n+1} = 0. So, using the positivity of the matrix (V.22) and then (V.23), we have

0 ≤ Σ_{1≤i,j,k≤n+1} f^[2](λ_i, λ_j, λ_k) ⟨x, P_i H P_j H P_k x⟩ = Σ_{1≤i,k≤n} f^[2](λ_i, λ_{n+1}, λ_k) ξ_i ξ̄_k.

But

f^[2](λ_i, λ_{n+1}, λ_k) = (f^[1](λ_{n+1}, λ_i) − f^[1](λ_{n+1}, λ_k)) / (λ_i − λ_k) = g^[1](λ_i, λ_k)

(putting λ_{n+1} = μ in the definition of g). So we have

0 ≤ Σ_{1≤i,k≤n} g^[1](λ_i, λ_k) ξ_i ξ̄_k.

Since the ξ_i are arbitrary complex numbers, this is equivalent to saying that the n × n matrix [g^[1](λ_i, λ_k)] is positive.   •
Corollary V.3.11 If f ∈ C^2(I), f(0) = 0, and f is operator convex, then the function g(t) = f(t)/t is operator monotone.

Proof. By the theorem above, the function f^[1](0, t) is operator monotone. But this is just the function f(t)/t in this case.   •
Corollary V.3.12 If f is operator monotone on I and f(0) = 0, then the function g(t) = ((t + λ)/t) f(t) is operator monotone for |λ| ≤ 1.

Proof. First assume that f ∈ C^2(I). By Lemma V.3.5, the function g_λ(t) = (t + λ) f(t) is operator convex. By Corollary V.3.11, therefore, g_λ(t)/t is operator monotone. If f is not in the class C^2, consider its regularisations f_ε. These are in C^2. Apply the special case of the above paragraph to the functions f_ε(t) − f_ε(0), and then let ε → 0.   •
Corollary V.3.13 If f is operator monotone on I and f(0) = 0, then f is twice differentiable at 0.

Proof. By Corollary V.3.12, the function g(t) = (1 + 1/t) f(t) is operator monotone, and by Theorem V.3.6, it is continuously differentiable. So the function h defined as h(t) = (1/t) f(t), h(0) = f'(0), is continuously differentiable. This implies that f is twice differentiable at 0.   •

Exercise V.3.14 Let f be a continuous operator monotone function on I. Then the function F(t) = ∫_0^t f(s) ds is operator convex.
Exercise V.3.15 Let f ∈ C^1(I). Then f is operator convex if and only if for all Hermitian matrices A, B with eigenvalues in I we have

f(A) − f(B) ≥ f^[1](B) ∘ (A − B),

where ∘ denotes the Schur product in a basis in which B is diagonal.
V.4  Loewner's Theorems

Consider all functions f on the interval (−1, 1) that are operator monotone and satisfy the conditions

f(0) = 0,   f'(0) = 1.   (V.24)

Let K be the collection of all such functions. Clearly, K is a convex set. We will show that this set is compact in the topology of pointwise convergence and will find its extreme points. This will enable us to write an integral representation for functions in K.

Lemma V.4.1 If f ∈ K, then

f(t) ≤ t/(1 − t)  for 0 ≤ t < 1,
f(t) ≥ t/(1 + t)  for −1 < t ≤ 0,
|f''(0)| ≤ 2.
Proof. Let A = (t 0; 0 0). By Theorem V.3.4, the matrix

f^[1](A) = (f'(t)   f(t)/t ; f(t)/t   1)

is positive. Hence,

(f(t)/t)^2 ≤ f'(t).   (V.25)
Let g_±(t) = (t ± 1) f(t). By Lemma V.3.5, both functions g_± are convex. Hence their derivatives are monotonically increasing functions. Since g_±'(t) = f(t) + (t ± 1) f'(t) and g_±'(0) = ±1, this implies that

f(t) + (t − 1) f'(t) ≥ −1   for t ≥ 0,   (V.26)

and

f(t) + (t + 1) f'(t) ≤ 1   for t ≤ 0.   (V.27)

From (V.25) and (V.26) we obtain

f(t) + 1 ≥ (1 − t) f(t)^2 / t^2   for t ≥ 0.   (V.28)

Now suppose that for some 0 < t < 1 we have f(t) > t/(1 − t). Then f(t)(1 − t)/t > 1, and hence f(t)^2 (1 − t)/t^2 > f(t)/t. So, from (V.28), we get f(t) + 1 > f(t)/t. But this gives the inequality f(t) < t/(1 − t), which contradicts our assumption. This shows that f(t) ≤ t/(1 − t) for 0 ≤ t < 1. The second inequality of the lemma is obtained by the same argument, using (V.27) instead of (V.26).
We have seen in the proof of Corollary V.3.13 that

f'(0) + (1/2) f''(0) = lim_{t→0} [(1 + 1/t) f(t) − f'(0)] / t.

Let t ↓ 0 and use the first inequality of the lemma to conclude that this limit is smaller than 2. Let t ↑ 0, and use the second inequality to conclude that it is bigger than 0. Together, these two imply that |f''(0)| ≤ 2.   •
convergence.
in
The set K is compact
the topology of pointwise
Proof. Let { fi } be any net in K. By the lemma above, the set { fi (t ) } is bounded for each t . So, by Tychonoff ' s Theorem, there exists a subnet { fi } that converges pointwise to a bounded function f . The limit function f is operator monotone, and f (O ) == 0 . If we show that f ' (0) == 1 , we would have shown that f E K, and hence that K is compact. By Corollary V.3 . 12, each of the functions ( 1 + ! ) fi (t) is monotone 1 on (  1 , 1 ) . Since for all i , lim ( l +  ) fi (t) == !I (0) == 1 , we see that t+0 t ( 1 + l )fi (t ) > 1 if t > 0 and is < 1 if t < 0. Hence, if t > 0, we have ( 1 + ) J (t ) > 1 ; and if t < 0, we have the opposite inequality. Since f is continuously differentiable, this shows that f ' ( 0) == 1. •
!
Proposition V.4.3. All extreme points of the set K have the form

    f(t) = \frac{t}{1 - \alpha t},  where \alpha = \frac{1}{2} f''(0).

Proof. Let f \in K. For each \lambda, -1 < \lambda < 1, let

    g_\lambda(t) = (1 + \frac{\lambda}{t}) f(t) - \lambda.

By Corollary V.3.12, g_\lambda is operator monotone. Note that g_\lambda(0) = 0, since f(0) = 0 and f'(0) = 1. Also, g_\lambda'(0) = 1 + \frac{1}{2} \lambda f''(0). So the function h_\lambda defined as

    h_\lambda(t) = \frac{1}{1 + \frac{1}{2} \lambda f''(0)} \left[ (1 + \frac{\lambda}{t}) f(t) - \lambda \right]

is in K. Since |f''(0)| \le 2, we see that |\frac{1}{2} \lambda f''(0)| < 1. We can write

    f = \frac{1}{2} (1 + \frac{1}{2} \lambda f''(0)) h_\lambda + \frac{1}{2} (1 - \frac{1}{2} \lambda f''(0)) h_{-\lambda}.

So, if f is an extreme point of K, we must have f = h_\lambda. This says that

    (1 + \frac{1}{2} \lambda f''(0)) f(t) = (1 + \frac{\lambda}{t}) f(t) - \lambda,

from which we can conclude that

    f(t) = \frac{t}{1 - \frac{1}{2} f''(0) t}.  •

V.4 Loewner's Theorems

Theorem V.4.4. For each f in K there exists a unique probability measure \mu on [-1, 1] such that

    f(t) = \int_{-1}^{1} \frac{t}{1 - \lambda t} \, d\mu(\lambda).    (V.29)
Proof. For -1 < \lambda < 1, consider the functions h_\lambda(t) = \frac{t}{1 - \lambda t}. By Proposition V.4.3, the extreme points of K are included in the family \{h_\lambda\}. Since K is compact and convex, it must be the closed convex hull of its extreme points. (This is the Krein-Milman Theorem.) Finite convex combinations of elements of the family \{h_\lambda : -1 < \lambda < 1\} can also be written as \int h_\lambda \, d\nu(\lambda), where \nu is a probability measure on [-1, 1] with finite support. Since f is in the closure of these combinations, there exists a net \{\nu_i\} of finitely supported probability measures on [-1, 1] such that the net f_i(t) = \int h_\lambda(t) \, d\nu_i(\lambda) converges to f(t). Since the space of probability measures is weak* compact, the net \nu_i has an accumulation point \mu. In other words, a subnet of \int h_\lambda \, d\nu_i(\lambda) converges to \int h_\lambda \, d\mu(\lambda). So

    f(t) = \int h_\lambda(t) \, d\mu(\lambda) = \int \frac{t}{1 - \lambda t} \, d\mu(\lambda).

Now suppose that there are two measures \mu_1 and \mu_2 for which the representation (V.29) is valid. Expand the integrand as a power series

    \frac{t}{1 - \lambda t} = \sum_{n=0}^{\infty} t^{n+1} \lambda^n,

convergent uniformly in |\lambda| \le 1 for every fixed t with |t| < 1. This shows that

    \sum_{n=0}^{\infty} t^{n+1} \int_{-1}^{1} \lambda^n \, d\mu_1(\lambda) = \sum_{n=0}^{\infty} t^{n+1} \int_{-1}^{1} \lambda^n \, d\mu_2(\lambda)

for all |t| < 1. The identity theorem for power series now shows that

    \int_{-1}^{1} \lambda^n \, d\mu_1(\lambda) = \int_{-1}^{1} \lambda^n \, d\mu_2(\lambda),   n = 0, 1, 2, \ldots.

But this is possible if and only if \mu_1 = \mu_2.  •
One consequence of the uniqueness of the measure \mu in the representation (V.29) is that every function h_{\lambda_0} is an extreme point of K (because it can be represented as an integral like this with \mu concentrated at \lambda_0). The normalisations (V.24) were required to make the set K compact. They can now be removed. We have the following result.
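As an illustration of (V.29) (a numerical sketch of mine, not from the text): for f(t) = log(1+t), direct integration shows that the representing measure \mu is Lebesgue measure on [-1, 0], since \int_{-1}^{0} \frac{t}{1 - \lambda t} \, d\lambda = \log(1+t). The quadrature below checks this.

```python
import math

def rep_integral(t, n=200_000):
    # midpoint rule for the integral in (V.29) with d(mu)(lambda) = d(lambda)
    # restricted to [-1, 0] (the measure representing log(1 + t))
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        lam = -1.0 + (k + 0.5) * h
        total += t / (1.0 - lam * t)
    return total * h

for t in (0.3, 0.9, -0.4):
    assert abs(rep_integral(t) - math.log(1.0 + t)) < 1e-6
```

The grid size is an arbitrary choice; the same check works for any t in (-1, 1).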
Corollary V.4.5. Let f be a nonconstant operator monotone function on (-1, 1). Then there exists a unique probability measure \mu on [-1, 1] such that

    f(t) = f(0) + f'(0) \int_{-1}^{1} \frac{t}{1 - \lambda t} \, d\mu(\lambda).    (V.30)

Proof. Since f is monotone and is not a constant, f'(0) \ne 0. Now note that the function \frac{f(t) - f(0)}{f'(0)} is in K.  •

It is clear from the representation (V.30) that every operator monotone function on (-1, 1) is infinitely differentiable. Hence, by the results of earlier sections, every operator convex function is also infinitely differentiable.
Theorem V.4.6. Let f be a nonlinear operator convex function on (-1, 1). Then there exists a unique probability measure \mu on [-1, 1] such that

    f(t) = f(0) + f'(0) t + \frac{1}{2} f''(0) \int_{-1}^{1} \frac{t^2}{1 - \lambda t} \, d\mu(\lambda).    (V.31)

Proof. Assume, without loss of generality, that f(0) = 0 and f'(0) = 0. Let g(t) = f(t)/t. Then g is operator monotone by Corollary V.3.11, g(0) = 0, and g'(0) = \frac{1}{2} f''(0). So g has a representation like (V.30), from which the representation (V.31) for f follows.  •

We have noted that the integral representation (V.30) implies that every operator monotone function on (-1, 1) is infinitely differentiable. In fact, we can conclude more. This representation shows that f has an analytic continuation
    f(z) = f(0) + f'(0) \int_{-1}^{1} \frac{z}{1 - \lambda z} \, d\mu(\lambda)    (V.32)

defined everywhere on the complex plane except on (-\infty, -1] \cup [1, \infty). Note that

    \operatorname{Im} \frac{z}{1 - \lambda z} = \frac{\operatorname{Im} z}{|1 - \lambda z|^2}.

So f defined above maps the upper half-plane H_+ = \{z : \operatorname{Im} z > 0\} into itself. It also maps the lower half-plane H_- into itself. Further, f(\bar z) = \overline{f(z)}. In other words, the function f on H_- is an analytic continuation of f on H_+ across the interval (-1, 1) obtained by reflection. This is a very important observation, because there is a very rich theory of analytic functions in a half-plane that we can exploit now. Before doing so, let us now do away with the special interval (-1, 1). Note that a function f is operator monotone on an interval (a, b) if and only if the function
f(\frac{b-a}{2} t + \frac{b+a}{2}) is operator monotone on (-1, 1). So, all results obtained for operator monotone functions on (-1, 1) can be extended to functions on (a, b). We have proved the following.

Theorem V.4.7. If f is an operator monotone function on (a, b), then f has an analytic continuation to the upper half-plane H_+ that maps H_+ into itself. It also has an analytic continuation to the lower half-plane H_-, obtained by reflection across (a, b). The converse of this is also true: if a real function f on (a, b) has an analytic continuation to H_+ mapping H_+ into itself, then f is operator monotone on (a, b). This is proved below.
Let P be the class of all complex analytic functions defined on H_+ with their ranges in the closed upper half-plane \{z : \operatorname{Im} z \ge 0\}. This is called the class of Pick functions. Since every nonconstant analytic function is an open map, if f is a nonconstant Pick function, then the range of f is contained in H_+. It is obvious that P is a convex cone, and the composition of two nonconstant functions in P is again in P.
Exercise V.4.8

(i) For 0 < r < 1, the function f(z) = z^r is in P.
(ii) The function f(z) = \log z is in P.
(iii) The function f(z) = \tan z is in P.
(iv) The function f(z) = -\frac{1}{z} is in P.
(v) If f is in P, then so is the function -\frac{1}{f}.

Given any open interval (a, b), let P(a, b) be the class of Pick functions that admit an analytic continuation across (a, b) into the lower half-plane and the continuation is by reflection. In particular, such functions take only real values on (a, b), and if they are nonconstant, they assume real values only on (a, b). The set P(a, b) is a convex cone.

Let f \in P(a, b) and write f(z) = u(z) + iv(z), where as usual u(z) and v(z) denote the real and imaginary parts of f. Since v(x) = 0 for a < x < b, we have v(x + iy) - v(x) \ge 0 if y > 0. This implies that the partial derivative v_y(x) \ge 0 and hence, by the Cauchy-Riemann equations, u_x(x) \ge 0. Thus, on the interval (a, b), f(x) = u(x) is monotone. In fact, we will soon see that f is operator monotone on (a, b). This is a consequence of a theorem of Nevanlinna that gives an integral representation of Pick functions. We will give a proof of this now using some elementary results from Fourier analysis. The idea is to use the conformal equivalence between H_+ and the unit disk D to transfer the problem to D, and then study the real part u of f. This is a harmonic function on D, so we can use standard facts from Fourier analysis.
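The membership claims in Exercise V.4.8 can be spot-checked numerically: each candidate function should map sampled points of H_+ into the closed upper half-plane. This is only an illustrative sketch of mine (the sampling region is arbitrary), not a proof.

```python
import cmath
import random

def is_pick_on_samples(func, trials=500, seed=0):
    # sample points z with Im z > 0 and test that Im func(z) >= 0 there
    rng = random.Random(seed)
    for _ in range(trials):
        z = complex(rng.uniform(-5.0, 5.0), rng.uniform(1e-3, 5.0))
        if func(z).imag < -1e-12:
            return False
    return True

candidates = [
    lambda z: cmath.sqrt(z),     # z^r with r = 1/2 (principal branch)
    cmath.log,                   # principal logarithm
    cmath.tan,
    lambda z: -1.0 / z,
]
assert all(is_pick_on_samples(f) for f in candidates)
```

Note that `cmath` uses exactly the principal branches assumed in the text, with branch cuts on the negative real axis.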
Theorem V.4.9. Let u be a nonnegative harmonic function on the unit disk D = \{z : |z| < 1\}. Then there exists a finite measure m on [0, 2\pi] such that

    u(re^{i\theta}) = \int_0^{2\pi} \frac{1 - r^2}{1 + r^2 - 2r \cos(\theta - t)} \, dm(t).    (V.33)

Conversely, any function of this form is positive and harmonic on the unit disk D.

Proof. Let u be any continuous real function defined on the closed unit disk that is harmonic in D. Then, by a well-known and elementary theorem in analysis,

    u(re^{i\theta}) = \frac{1}{2\pi} \int_0^{2\pi} \frac{1 - r^2}{1 + r^2 - 2r \cos(\theta - t)} u(e^{it}) \, dt
                    = \frac{1}{2\pi} \int_0^{2\pi} P_r(\theta - t) u(e^{it}) \, dt,    (V.34)

where P_r(\theta) is the Poisson kernel (defined by the above equation) for 0 \le r < 1, 0 \le \theta \le 2\pi. If u is nonnegative, put dm(t) = \frac{1}{2\pi} u(e^{it}) \, dt. Then m is a positive measure on [0, 2\pi]. By the mean value property of harmonic functions, the total mass of this measure is

    \frac{1}{2\pi} \int_0^{2\pi} u(e^{it}) \, dt = u(0).    (V.35)

So we do have a representation of the form (V.33) under the additional hypothesis that u is continuous on the closed unit disk. The general case is a consequence of this. Let u be positive and harmonic in D. Then, for \varepsilon > 0, the function u_\varepsilon(z) = u(\frac{z}{1+\varepsilon}) is positive and harmonic in the disk |z| < 1 + \varepsilon. Therefore, it can be represented in the form (V.33) with a measure m_\varepsilon(t) of finite total mass u_\varepsilon(0) = u(0). As \varepsilon \to 0, u_\varepsilon converges to u uniformly on compact subsets of D. Since the measures m_\varepsilon all have the same mass, using the weak* compactness of the space of probability measures, we conclude that there exists a positive measure m such that

    u(re^{i\theta}) = \lim_{\varepsilon \to 0} u_\varepsilon(re^{i\theta}) = \int_0^{2\pi} \frac{1 - r^2}{1 + r^2 - 2r \cos(\theta - t)} \, dm(t).

Conversely, since the Poisson kernel P_r is nonnegative, any function represented by (V.33) is nonnegative.  •
Theorem V.4.9 is often called the Herglotz Theorem. It says that every nonnegative harmonic function on the unit disk is the Poisson integral of a positive measure.
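The Poisson representation (V.34) is easy to test numerically. In the sketch below (my illustration; the test function and grid are arbitrary choices), u(z) = Re z^2 + 2 is harmonic and positive on the closed disk, and its boundary values reproduce it through the Poisson kernel; at r = 0 this also checks the mean value property (V.35).

```python
import math

def u(r, theta):
    # harmonic on the plane: the real part of z^2, shifted to be positive
    return r * r * math.cos(2.0 * theta) + 2.0

def poisson_integral(r, theta, n=20_000):
    # (1/2pi) * integral of P_r(theta - t) * u(e^{it}) dt, midpoint rule;
    # the periodic trapezoid/midpoint rule converges very fast here
    total = 0.0
    for k in range(n):
        t = 2.0 * math.pi * (k + 0.5) / n
        kernel = (1.0 - r * r) / (1.0 + r * r - 2.0 * r * math.cos(theta - t))
        total += kernel * u(1.0, t)
    return total / n

for (r, theta) in [(0.0, 0.0), (0.5, 1.0), (0.9, 2.5)]:
    assert abs(poisson_integral(r, theta) - u(r, theta)) < 1e-8
```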
Recall that two harmonic functions u, v are called harmonic conjugates if the function f(z) = u(z) + iv(z) is analytic. Every harmonic function u has a harmonic conjugate that is uniquely determined up to an additive constant.

Theorem V.4.10. Let f(z) = u(z) + iv(z) be analytic on the unit disk D. If u(z) \ge 0, then there exists a finite positive measure m on [0, 2\pi] such that

    f(z) = \int_0^{2\pi} \frac{e^{it} + z}{e^{it} - z} \, dm(t) + iv(0).    (V.36)
Conversely, every function of this form is analytic on D and has a positive real part.

Proof. By Theorem V.4.9, the function u can be written as in (V.33). The Poisson kernel P_r, 0 \le r < 1, can be written as

    P_r(\theta) = \frac{1 - r^2}{1 + r^2 - 2r \cos\theta} = \operatorname{Re} \frac{1 + re^{i\theta}}{1 - re^{i\theta}} = \operatorname{Re} \sum_{n=-\infty}^{\infty} r^{|n|} e^{in\theta}.

Hence,

    P_r(\theta - t) = \operatorname{Re} \frac{e^{it} + re^{i\theta}}{e^{it} - re^{i\theta}},

and

    u(z) = \operatorname{Re} \int_0^{2\pi} \frac{e^{it} + z}{e^{it} - z} \, dm(t).

So, f(z) differs from this last integral only by an imaginary constant. Putting z = 0, one sees that this constant is iv(0). The converse statement is easy to prove.  •
Next, note that the disk D and the half-plane H_+ are conformally equivalent, i.e., there exists an analytic isomorphism between these two spaces. For z \in D, let

    \zeta(z) = i \, \frac{1 + z}{1 - z}.    (V.37)

Then \zeta \in H_+. The inverse of this map is given by

    z(\zeta) = \frac{\zeta - i}{\zeta + i}.    (V.38)

Using these transformations, we can establish an equivalence between the class P and the class of analytic functions on D with positive real part. If f is a function in the latter class, let

    \varphi(\zeta) = i f(z(\zeta)).    (V.39)
Then \varphi \in P. The inverse of this transformation is

    f(z) = -i \varphi(\zeta(z)).    (V.40)

Using these ideas we can prove the following theorem, called Nevanlinna's Theorem.
Theorem V.4.11. A function \varphi is in the Pick class if and only if it has a representation

    \varphi(\zeta) = \alpha + \beta\zeta + \int_{-\infty}^{\infty} \frac{1 + \lambda\zeta}{\lambda - \zeta} \, d\nu(\lambda),    (V.41)

where \alpha is a real number, \beta \ge 0, and \nu is a positive finite measure on the real line.
Proof. Let f be the function on D associated with \varphi via the transformation (V.40). By Theorem V.4.10, there exists a finite positive measure m on [0, 2\pi] such that

    f(z) = \int_0^{2\pi} \frac{e^{it} + z}{e^{it} - z} \, dm(t) - i\alpha.

If f(z) = u(z) + iv(z), then \alpha = -v(0), and the total mass of m is u(0). If the measure m has a positive mass at the singleton \{0\}, let this mass be \beta. Then the expression above reduces to

    f(z) = \int_{(0, 2\pi)} \frac{e^{it} + z}{e^{it} - z} \, dm(t) + \beta \, \frac{1 + z}{1 - z} - i\alpha.

Using the transformations (V.38) and (V.39), we get from this

    \varphi(\zeta) = \alpha + \beta\zeta + i \int_{(0, 2\pi)} \frac{e^{it}(\zeta + i) + (\zeta - i)}{e^{it}(\zeta + i) - (\zeta - i)} \, dm(t).

The last term above is equal to

    \int_{(0, 2\pi)} \frac{\zeta \cos\frac{t}{2} - \sin\frac{t}{2}}{\zeta \sin\frac{t}{2} + \cos\frac{t}{2}} \, dm(t).

Now, introduce a change of variables \lambda = -\cot\frac{t}{2}. This maps (0, 2\pi) onto (-\infty, \infty). The measure m is transformed by the above map to a finite measure \nu on (-\infty, \infty), and the above integral is transformed to

    \int_{-\infty}^{\infty} \frac{1 + \lambda\zeta}{\lambda - \zeta} \, d\nu(\lambda).

This shows that \varphi can be represented in the form (V.41). It is easy to see that every function of this form is a Pick function.  •
There is another form in which it is convenient to represent Pick functions. Note that

    \frac{1 + \lambda\zeta}{\lambda - \zeta} = (\lambda^2 + 1) \left[ \frac{1}{\lambda - \zeta} - \frac{\lambda}{\lambda^2 + 1} \right].

So, if we write d\mu(\lambda) = (\lambda^2 + 1) \, d\nu(\lambda), then we obtain from (V.41) the representation

    \varphi(\zeta) = \alpha + \beta\zeta + \int_{-\infty}^{\infty} \left[ \frac{1}{\lambda - \zeta} - \frac{\lambda}{\lambda^2 + 1} \right] d\mu(\lambda),    (V.42)

where \mu is a positive Borel measure on \mathbb{R} for which \int \frac{1}{\lambda^2 + 1} \, d\mu(\lambda) is finite. (A Borel measure on \mathbb{R} is a measure defined on Borel sets that puts finite mass on bounded sets.)

Now we turn to the question of uniqueness of the above representations. It is easy to see from (V.41) that

    \alpha = \operatorname{Re} \varphi(i).    (V.43)

Therefore, \alpha is uniquely determined by \varphi. Now let \eta be any positive real number. From (V.41) we see that

    \frac{\varphi(i\eta)}{i\eta} = \frac{\alpha}{i\eta} + \beta + \int_{-\infty}^{\infty} \frac{1 + i\lambda\eta}{i\eta(\lambda - i\eta)} \, d\nu(\lambda).
As \eta \to \infty, the integrand converges to 0 for each \lambda. The real and imaginary parts of the integrand are uniformly bounded by 1 when \eta \ge 1. So by the Lebesgue Dominated Convergence Theorem, the integral converges to 0 as \eta \to \infty. Thus,

    \beta = \lim_{\eta \to \infty} \varphi(i\eta)/(i\eta),    (V.44)

and thus \beta is uniquely determined by \varphi. Now we will prove that the measure \mu in (V.42) is uniquely determined by \varphi. Denote by \mu also the unique right continuous monotonically increasing function on \mathbb{R} satisfying \mu(0) = 0 and \mu((a, b]) = \mu(b) - \mu(a) for every interval (a, b]. (This is called the distribution function associated with d\mu.) We will prove the following result, called the Stieltjes inversion formula, from which it follows that \mu is unique.
Theorem V.4.12. If the Pick function \varphi is represented by (V.42), then for any a, b that are points of continuity of the distribution function \mu, we have

    \mu(b) - \mu(a) = \lim_{\eta \to 0} \frac{1}{\pi} \int_a^b \operatorname{Im} \varphi(x + i\eta) \, dx.    (V.45)
Proof. From (V.42) we see that

    \frac{1}{\pi} \int_a^b \operatorname{Im} \varphi(x + i\eta) \, dx = \frac{1}{\pi} \int_a^b \left[ \beta\eta + \int_{-\infty}^{\infty} \frac{\eta}{(\lambda - x)^2 + \eta^2} \, d\mu(\lambda) \right] dx,

the interchange of integrals being permissible by Fubini's Theorem. As \eta \to 0, the first term in the square brackets above goes to 0. The inner integral can be calculated by the change of variables u = \frac{x - \lambda}{\eta}. This gives

    \int_a^b \frac{\eta \, dx}{(x - \lambda)^2 + \eta^2} = \int_{(a-\lambda)/\eta}^{(b-\lambda)/\eta} \frac{du}{u^2 + 1} = \arctan\frac{b - \lambda}{\eta} - \arctan\frac{a - \lambda}{\eta}.

So to prove (V.45), we have to show that

    \mu(b) - \mu(a) = \lim_{\eta \to 0} \frac{1}{\pi} \int_{-\infty}^{\infty} \left[ \arctan\frac{b - \lambda}{\eta} - \arctan\frac{a - \lambda}{\eta} \right] d\mu(\lambda).
We will use the following properties of the function arctan. This is a monotonically increasing odd function on (-\infty, \infty) whose range is (-\frac{\pi}{2}, \frac{\pi}{2}). So,

    0 \le \arctan\frac{b - \lambda}{\eta} - \arctan\frac{a - \lambda}{\eta} < \pi.

If (b - \lambda) and (a - \lambda) have the same sign, then by the addition law for arctan we have

    \arctan\frac{b - \lambda}{\eta} - \arctan\frac{a - \lambda}{\eta} = \arctan\frac{\eta(b - a)}{\eta^2 + (b - \lambda)(a - \lambda)}.

If x is positive, then

    \arctan x = \int_0^x \frac{dt}{1 + t^2} \le \int_0^x dt = x.

Now, let \varepsilon be any given positive number. Since a and b are points of continuity of \mu, we can choose \delta such that

    \mu(a + \delta) - \mu(a - \delta) < \varepsilon/5,
    \mu(b + \delta) - \mu(b - \delta) < \varepsilon/5.
We then have

    \left| \mu(b) - \mu(a) - \frac{1}{\pi} \int_{-\infty}^{\infty} \left[ \arctan\frac{b-\lambda}{\eta} - \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda) \right|

    \le \frac{1}{\pi} \int_a^b \left[ \pi - \arctan\frac{b-\lambda}{\eta} + \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda)
      + \frac{1}{\pi} \int_{-\infty}^a \left[ \arctan\frac{b-\lambda}{\eta} - \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda)
      + \frac{1}{\pi} \int_b^{\infty} \left[ \arctan\frac{b-\lambda}{\eta} - \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda)

    \le \frac{2\varepsilon}{5}
      + \frac{1}{\pi} \int_{-\infty}^{a-\delta} \arctan\frac{\eta(b-a)}{\eta^2 + (b-\lambda)(a-\lambda)} \, d\mu(\lambda)
      + \frac{1}{\pi} \int_{b+\delta}^{\infty} \arctan\frac{\eta(b-a)}{\eta^2 + (b-\lambda)(a-\lambda)} \, d\mu(\lambda)
      + \frac{1}{\pi} \int_{a+\delta}^{b-\delta} \left[ \pi - \arctan\frac{b-\lambda}{\eta} + \arctan\frac{a-\lambda}{\eta} \right] d\mu(\lambda).

Note that in the two integrals with infinite limits, the arguments of arctan are positive. In the middle integral the variable \lambda runs between a + \delta and b - \delta. For such \lambda, \frac{b-\lambda}{\eta} \ge \frac{\delta}{\eta} and \frac{a-\lambda}{\eta} \le -\frac{\delta}{\eta}. So the right-hand side of the above inequality is dominated by

    \frac{2\varepsilon}{5}
      + \frac{\eta(b-a)}{\pi} \int_{-\infty}^{a-\delta} \frac{d\mu(\lambda)}{\eta^2 + (b-\lambda)(a-\lambda)}
      + \frac{\eta(b-a)}{\pi} \int_{b+\delta}^{\infty} \frac{d\mu(\lambda)}{\eta^2 + (b-\lambda)(a-\lambda)}
      + \frac{1}{\pi} \int_{a+\delta}^{b-\delta} \left[ \pi - 2\arctan\frac{\delta}{\eta} \right] d\mu(\lambda).

The first two integrals are finite (because of the properties of d\mu). The third term is dominated by 2(\frac{\pi}{2} - \arctan\frac{\delta}{\eta})[\mu(b) - \mu(a)]. So we can choose \eta small enough to make each of the last three terms smaller than \varepsilon/5. This proves the theorem.  •
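A numerical sketch of the inversion formula (V.45) (my own illustration, not from the text): take \varphi(\zeta) = Log \zeta, the principal branch. For x < 0, Im Log(x + i\eta) tends to \pi as \eta \downarrow 0, so the associated measure \mu has density 1 on (-\infty, 0) and \mu(b) - \mu(a) = b - a for a < b < 0.

```python
import cmath
import math

def stieltjes_inversion(a, b, eta, n=20_000):
    # (1/pi) * integral over [a, b] of Im(Log(x + i*eta)) dx, midpoint rule
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += cmath.log(complex(x, eta)).imag
    return total * h / math.pi

# mu(b) - mu(a) should be b - a for negative a < b
assert abs(stieltjes_inversion(-2.0, -1.0, eta=1e-4) - 1.0) < 1e-3
```

The cutoff eta and grid size are arbitrary; shrinking eta further sharpens the agreement, as the theorem predicts.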
We have shown above that all the terms occurring in the representation (V.42) are uniquely determined by the relations (V.43), (V.44), and (V.45).

Exercise V.4.13. We have proved the relations (V.33), (V.36), (V.41), and (V.42) in that order. Show that all these are, in fact, equivalent. Hence, each of these representations is unique.
Proposition V.4.14. A Pick function \varphi is in the class P(a, b) if and only if the measure \mu associated with it in the representation (V.42) has zero mass on (a, b).

Proof. Let \varphi(x + i\eta) = u(x + i\eta) + iv(x + i\eta), where u, v are the real and imaginary parts of \varphi. If \varphi can be continued across (a, b), then as \eta \downarrow 0, on any closed subinterval [c, d] of (a, b), v(x + i\eta) converges uniformly to a bounded continuous function v(x) on [c, d]. Hence,

    \mu(d) - \mu(c) = \frac{1}{\pi} \int_c^d v(x) \, dx,

i.e., d\mu(x) = \frac{1}{\pi} v(x) \, dx. If the analytic continuation to the lower half-plane is by reflection across (a, b), then v is identically zero on [c, d], and hence so is \mu.

Conversely, if \mu has no mass on (a, b), then for \zeta in (a, b) the integral in (V.42) is convergent, and is real valued. This shows that the function \varphi can be continued from H_+ to H_- across (a, b) by reflection.  •

The reader should note that the above proposition shows that the converse of Theorem V.4.7 is also true. It should be pointed out that the formula (V.42) defines two analytic functions, one on H_+ and the other on H_-. If these are denoted by \varphi and \psi, then \varphi(\bar\zeta) = \overline{\psi(\zeta)}. So \varphi and \psi are reflections of each other. But they need not be analytic continuations of each other. For this to be the case, the measure \mu should be zero on an interval (a, b) across which the function can be continued analytically.
Exercise V.4.15. If a function f is operator monotone on the whole real line, then f must be of the form f(t) = \alpha + \beta t, \alpha \in \mathbb{R}, \beta \ge 0.
Let us now look at a few simple examples.
Example V.4.16. The function \varphi(\zeta) = -\frac{1}{\zeta} is a Pick function. For this function, we see from (V.43) and (V.44) that \alpha = \beta = 0. Since \varphi is analytic everywhere in the plane except at 0, Proposition V.4.14 tells us that the measure \mu is concentrated at the single point 0.
Example V.4. 1 7
cp( i ) = Re e in:/4 = �From (V. 44) we see that {3 == 0 . If ( == A + iry is any complex number, then 1( I 2 = ( 1 ( 1 + A ) 112 + . sgn (1 ( 1  A ) 1 12 , 2 2 where sgn is the sign of defined to be 1 if > 0 and if < 0 . 1 12 = ( > 0 , we have Im cp( ) ('<1; >.. ) . As ! 0 , 1 ( 1 comes closer to So if j A I. So, Im cp(A + iry ) converges to 0 if A > 0 and to IAI1/ 2 if A < 0. Since a = Re
TJ
t
TJ
rJ ,
rJ
1
rJ
TJ
cp
is positive on the right halfaxis, the measure J.L has no mass at measure can now be determined from {V. 4 5). We have, then
0(
rJ
0.
The
1 / 2 A ) IAI (V.46) dA. + J A  ( A2 + 1 y'2 CXJ Example V .4. 18 Let cp( ( ) == Log (, where Log is the principal branch of the logarithm, defined everywhere except on ( 0] by the formula Log ( == l n l (I + i Arg ( . The function Arg ( is the principal branch of ( 1 / 2 ==
1 _
1

1r

 oo ,
the argument, taking values in (  1r, 1r] . We then have Re(Log i) == 0 a iry Lo ( ) == 0. � lim {3 T] +
CXJ
'l rJ
As rJ l 0, Im (Log( A + i'T])) converges to 1r if A < 0 and to 0 if A > 0. So from (V. 4 5) we see that, the measure J.L is just the restriction of the Lebesgue measure to (oo, 0] . Thus, (V.47) Exercise V.4. 19 For 0 function cp( ( ) == (r . Show
r<
1,
let (r denote the principal branch of the
0(
(r == cos r21f + sin r7r J A  (  A2 A+ 1 ) 1Ai r d.A. CXJ 1r
This includes
(V. 4 6) as a

special case.
1
(V .48 )
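The representation (V.47) can be verified by quadrature. In the sketch below (an illustration; the substitution and grid size are my choices), the change of variables \lambda = -u/(1-u) maps (0, 1) onto (-\infty, 0) and makes the transformed integrand smooth and bounded.

```python
import cmath

def log_via_V47(zeta, n=200_000):
    # midpoint rule for (V.47) after the substitution lambda = -u/(1-u),
    # with Jacobian d(lambda)/du = -1/(1-u)**2 (sign absorbed by the limits)
    h = 1.0 / n
    total = 0.0 + 0.0j
    for k in range(n):
        u = (k + 0.5) * h
        lam = -u / (1.0 - u)
        jac = 1.0 / (1.0 - u) ** 2
        total += (1.0 / (lam - zeta) - lam / (lam * lam + 1.0)) * jac
    return total * h

for zeta in (2.0, 1.0 + 1.0j):
    assert abs(log_via_V47(zeta) - cmath.log(zeta)) < 1e-4
```

The same scheme, with the extra weight |lambda|**r, would check (V.48).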
Let now f be any operator monotone function on (0, \infty). We have seen above that f must have the form

    f(t) = \alpha + \beta t + \int_{-\infty}^{0} \left[ \frac{1}{\lambda - t} - \frac{\lambda}{\lambda^2 + 1} \right] d\mu(\lambda).

By a change of variables we can write this as

    f(t) = \alpha + \beta t + \int_0^{\infty} \left[ \frac{\lambda}{\lambda^2 + 1} - \frac{1}{\lambda + t} \right] d\mu(\lambda),    (V.49)

where \alpha \in \mathbb{R}, \beta \ge 0, and \mu is a positive measure on (0, \infty) such that

    \int_0^{\infty} \frac{d\mu(\lambda)}{\lambda^2 + 1} < \infty.    (V.50)

Suppose f is such that

    f(0) := \lim_{t \downarrow 0} f(t) > -\infty.    (V.51)

Then, it follows from (V.49) that \mu must also satisfy the condition

    \int_0^1 \frac{1}{\lambda} \, d\mu(\lambda) < \infty.    (V.52)

We have from (V.49)

    f(t) - f(0) = \beta t + \int_0^{\infty} \left[ \frac{1}{\lambda} - \frac{1}{\lambda + t} \right] d\mu(\lambda)
                = \beta t + \int_0^{\infty} \frac{\lambda t}{\lambda + t} \cdot \frac{1}{\lambda^2} \, d\mu(\lambda).

Hence, we can write f in the form

    f(t) = \gamma + \beta t + \int_0^{\infty} \frac{\lambda t}{\lambda + t} \, d\omega(\lambda),    (V.53)

where \gamma = f(0) and d\omega(\lambda) = \frac{1}{\lambda^2} \, d\mu(\lambda). From (V.50) and (V.52), we see that the measure \omega satisfies the conditions

    \int_0^1 \lambda \, d\omega(\lambda) < \infty   and   \int_1^{\infty} d\omega(\lambda) < \infty.    (V.54)

These two conditions can, equivalently, be expressed as a single condition

    \int_0^{\infty} \frac{\lambda}{1 + \lambda} \, d\omega(\lambda) < \infty.    (V.55)
We have thus shown that an operator monotone function on (0, \infty) satisfying the condition (V.51) has a canonical representation (V.53), where \gamma \in \mathbb{R}, \beta \ge 0, and \omega is a positive measure satisfying (V.55).

The representation (V.53) is often useful for studying operator monotone functions on the positive half-line (0, \infty). Suppose that we are given a function f as in (V.53). If \omega satisfies the conditions (V.54), then

    \int_0^{\infty} \left[ \frac{1}{\lambda} - \frac{\lambda}{\lambda^2 + 1} \right] \lambda^2 \, d\omega(\lambda) < \infty,

and we can write

    f(t) = \left\{ \gamma + \int_0^{\infty} \left[ \frac{1}{\lambda} - \frac{\lambda}{\lambda^2 + 1} \right] \lambda^2 \, d\omega(\lambda) \right\} + \beta t + \int_0^{\infty} \left[ \frac{\lambda}{\lambda^2 + 1} - \frac{1}{\lambda + t} \right] \lambda^2 \, d\omega(\lambda).

So, if we put the number in braces above equal to \alpha, and d\mu(\lambda) = \lambda^2 \, d\omega(\lambda), then we have a representation of f in the form (V.49).
Exercise V.4.20. Use the considerations in the preceding paragraphs to show that, for 0 < r < 1 and t > 0, we have

    t^r = \frac{\sin r\pi}{\pi} \int_0^{\infty} \frac{\lambda^{r-1} t}{\lambda + t} \, d\lambda.    (V.56)

(See Exercise V.1.10 also.)

Exercise V.4.21. For t > 0, show that

    \log(1 + t) = \int_1^{\infty} \frac{\lambda t}{\lambda + t} \cdot \frac{1}{\lambda^2} \, d\lambda.    (V.57)
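Both exercises can be spot-checked numerically; the sketch below is my own illustration, and the substitutions are arbitrary choices. For (V.56) with r = 1/2, putting \lambda = (u/(1-u))^2 turns the integrand into the smooth function 2t/(u^2 + t(1-u)^2) on (0, 1); for (V.57), \lambda = 1/u gives the smooth integrand t/(1 + ut).

```python
import math

def rhs_V56_half(t, n=200_000):
    # (sin(pi/2)/pi) times the integral in (V.56) with r = 1/2;
    # after lambda = (u/(1-u))**2 the integrand is 2*t/(u**2 + t*(1-u)**2)
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += 2.0 * t / (u * u + t * (1.0 - u) ** 2)
    return total * h / math.pi

def rhs_V57(t, n=200_000):
    # lambda = 1/u maps (0, 1] onto [1, inf); the integrand becomes t/(1 + u*t)
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += t / (1.0 + u * t)
    return total * h

for t in (0.5, 2.0):
    assert abs(rhs_V56_half(t) - math.sqrt(t)) < 1e-6
    assert abs(rhs_V57(t) - math.log(1.0 + t)) < 1e-9
```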
Appendix 1. Differentiability of Convex Functions

Let f be a real valued convex function defined on an interval I. Then f has some smoothness properties, which are listed below.

The function f is Lipschitz on any closed interval [a, b] contained in I^0, the interior of I. So f is continuous on I^0.
At every point x in I^0, the right and left derivatives of f exist. These are defined, respectively, as

    f'_+(x) := \lim_{y \downarrow x} \frac{f(y) - f(x)}{y - x},

    f'_-(x) := \lim_{y \uparrow x} \frac{f(y) - f(x)}{y - x}.

Both these functions are monotonically increasing on I^0. Further,

    \lim_{x \downarrow w} f'_+(x) = f'_+(w),   \lim_{x \uparrow w} f'_+(x) = f'_-(w).

The function f is differentiable except on a countable set E in I^0, i.e., at every point x in I^0 \setminus E the left and right derivatives of f are equal. Further, the derivative f' is continuous on I^0 \setminus E.

If a sequence of convex functions converges at every point of I, then the limit function is convex. The convergence is uniform on any closed interval [a, b] contained in I^0.

Appendix 2. Regularisation of Functions

The convolution of two functions leads to a new function that inherits the stronger of the smoothness properties of the two original functions. This is the idea behind "regularisation" of functions.

Let \varphi be a real function of class C^\infty with the following properties:

    \varphi \ge 0,   \operatorname{supp} \varphi = [-1, 1],   \int \varphi(x) \, dx = 1.

For \varepsilon > 0, let \varphi_\varepsilon(x) = \frac{1}{\varepsilon} \varphi(\frac{x}{\varepsilon}). Then \operatorname{supp} \varphi_\varepsilon = [-\varepsilon, \varepsilon] and \varphi_\varepsilon has all the other properties of \varphi listed above. The functions \varphi_\varepsilon are called mollifiers or smooth approximate identities. If f is a locally integrable function, we define its regularisation of order \varepsilon as the function

    f_\varepsilon(x) = (f * \varphi_\varepsilon)(x) = \int f(x - y) \varphi_\varepsilon(y) \, dy = \int f(x - \varepsilon t) \varphi(t) \, dt.
The family f_\varepsilon has the following properties.

1. Each f_\varepsilon is a C^\infty function.

2. If the support of f is contained in a compact set K, then the support of f_\varepsilon is contained in an \varepsilon-neighbourhood of K.

3. If f is continuous at x_0, then \lim_{\varepsilon \downarrow 0} f_\varepsilon(x_0) = f(x_0).

4. If f has a discontinuity of the first kind at x_0, then \lim_{\varepsilon \downarrow 0} f_\varepsilon(x_0) = \frac{1}{2}[f(x_0+) + f(x_0-)]. (A point x_0 is a point of discontinuity of the first kind if the left and right limits of f at x_0 exist; these limits are denoted as f(x_0-) and f(x_0+), respectively.)

5. If f is continuous, then f_\varepsilon(x) converges to f(x) as \varepsilon \to 0. The convergence is uniform on every compact set.

6. If f is differentiable, then, for every \varepsilon > 0, (f_\varepsilon)' = (f')_\varepsilon.

7. If f is monotone, then, as \varepsilon \to 0, f'_\varepsilon(x) converges to f'(x) at all points x where f'(x) exists. (Recall that a monotone function can have discontinuities of the first kind only and is differentiable almost everywhere.)
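The sketch below (an illustration with my own choices of \varphi and f) builds the standard bump mollifier, normalises it numerically, and checks properties 3 and 4 on a unit step function.

```python
import math

def bump(x):
    # the standard C-infinity bump supported on [-1, 1], not yet normalised
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

_N = 20_000
_H = 2.0 / _N
_NODES = [-1.0 + (k + 0.5) * _H for k in range(_N)]
_Z = sum(bump(t) for t in _NODES) * _H       # normalising constant

def step(x):
    # unit step: a discontinuity of the first kind at 0
    return 1.0 if x > 0 else 0.0

def regularise(f, x, eps):
    # f_eps(x) = integral of f(x - eps*t) * phi(t) dt, with phi = bump/_Z
    total = sum(f(x - eps * t) * bump(t) for t in _NODES) * _H
    return total / _Z

assert abs(regularise(step, 1.0, 0.1) - 1.0) < 1e-12   # property 3 (continuity point)
assert abs(regularise(step, 0.0, 0.1) - 0.5) < 1e-6    # property 4 (jump midpoint)
```

The midpoint value 1/2 at the jump falls out of the symmetry of the bump, exactly as property 4 predicts.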
V.5 Problems
Problem V.5.1. Show that the function f(t) = \exp t is neither operator monotone nor operator convex on any interval.

Problem V.5.2. Let f(t) = \frac{at + b}{ct + d}, where a, b, c, d are real numbers such that ad - bc > 0. Show that f is operator monotone on every interval that does not contain the point -\frac{d}{c}.
Problem V.5.3. Show that the derivative of an operator convex function need not be operator monotone.

Problem V.5.4. Show that for r < -1, the function f(t) = t^r on (0, \infty) is not operator convex. (Hint: The function f^{[1]}(1, t) cannot be continued analytically to a Pick function.) Together with the assertion in Exercise V.2.11, this shows that on the half-line (0, \infty) the function f(t) = t^r is operator convex if -1 \le r \le 0 or if 1 \le r \le 2; and it is not operator convex for any other real r.
Problem V.5.5. A function g on [0, \infty) is operator convex if and only if it is of the form

    g(t) = \alpha + \beta t + \gamma t^2 + \int_0^{\infty} \frac{\lambda t^2}{\lambda + t} \, d\mu(\lambda),

where \alpha, \beta are real numbers, \gamma \ge 0, and \mu is a positive finite measure.
Problem V.5.6. Let f be an operator monotone function on (0, \infty). Then (-1)^{n-1} f^{(n)}(t) \ge 0 for n = 1, 2, \ldots. [A function g on (0, \infty) is said to be completely monotone if for all n \ge 0, (-1)^n g^{(n)}(t) \ge 0. There is a theorem of S.N. Bernstein that says that a function g is completely monotone if and only if there exists a positive measure \mu such that

    g(t) = \int_0^{\infty} e^{-\lambda t} \, d\mu(\lambda).]

The result of this problem says that the derivative of an operator monotone function on (0, \infty) is completely monotone. Thus, f has a Taylor expansion f(t) = \sum_{n=0}^{\infty} a_n (t - 1)^n, in which the coefficients a_n are positive for all odd n and negative for all even n \ge 2.
Problem V.5.7. Let f be a function mapping (0, \infty) into itself. Let g(t) = [f(t^{-1})]^{-1}. Show that if f is operator monotone, then g is also operator monotone. If f is operator convex and f(0) = 0, then g is operator convex.

Problem V.5.8. Show that the function \varphi(\zeta) = -\cot\zeta is a Pick function. Show that in its canonical representation (V.42), \alpha = \beta = 0 and the measure \mu is atomic with mass 1 at the points n\pi for every integer n. Thus, we have the familiar series expansion

    -\cot\zeta = \sum_{n=-\infty}^{\infty} \left[ \frac{1}{n\pi - \zeta} - \frac{n\pi}{n^2\pi^2 + 1} \right].
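The series in Problem V.5.8 converges absolutely once the terms for n and -n are paired, since each pair is O(1/n^2). The sketch below (the cutoff N and test point are my choices) compares a partial sum with -cot \zeta.

```python
import cmath
import math

def neg_cot_partial(zeta, N=100_000):
    # partial sum of the expansion in Problem V.5.8, pairing the terms n and -n
    total = -1.0 / zeta          # the n = 0 term
    for n in range(1, N + 1):
        for m in (n, -n):
            total += 1.0 / (m * math.pi - zeta) \
                     - (m * math.pi) / (m * m * math.pi * math.pi + 1.0)
    return total

zeta = 0.5 + 0.5j
exact = -cmath.cos(zeta) / cmath.sin(zeta)
assert abs(neg_cot_partial(zeta) - exact) < 1e-3
```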
Problem V.5.9. The aim of this problem is to show that if a Pick function \varphi satisfies the growth restriction

    \sup_{\eta > 0} |\eta \varphi(i\eta)| < \infty,    (V.58)

then its representation (V.42) takes the simple form

    \varphi(\zeta) = \int_{-\infty}^{\infty} \frac{1}{\lambda - \zeta} \, d\mu(\lambda),    (V.59)

where \mu is a finite measure. To see this, start with the representation (V.41). The condition (V.58) implies the existence of a constant M that bounds, for all \eta > 0, the quantity |\eta \varphi(i\eta)|, and hence also its real and imaginary parts. This gives two inequalities:

    \left| \alpha\eta + \int_{-\infty}^{\infty} \frac{\eta(1 - \eta^2)\lambda}{\lambda^2 + \eta^2} \, d\nu(\lambda) \right| \le M,

    \left| \beta\eta^2 + \int_{-\infty}^{\infty} \frac{\eta^2(1 + \lambda^2)}{\lambda^2 + \eta^2} \, d\nu(\lambda) \right| \le M.

From the first, conclude that

    \alpha = \lim_{\eta \to \infty} \int_{-\infty}^{\infty} \frac{(\eta^2 - 1)\lambda}{\lambda^2 + \eta^2} \, d\nu(\lambda) = \int_{-\infty}^{\infty} \lambda \, d\nu(\lambda).

From the second, conclude that \beta = 0 and

    \int_{-\infty}^{\infty} \frac{\eta^2(1 + \lambda^2)}{\lambda^2 + \eta^2} \, d\nu(\lambda) \le M.

Taking limits as \eta \to \infty, this gives

    \int_{-\infty}^{\infty} (1 + \lambda^2) \, d\nu(\lambda) = \int_{-\infty}^{\infty} d\mu(\lambda) \le M.

Thus, \mu is a finite measure. From (V.41), we get

    \varphi(\zeta) = \int_{-\infty}^{\infty} \lambda \, d\nu(\lambda) + \int_{-\infty}^{\infty} \frac{1 + \lambda\zeta}{\lambda - \zeta} \, d\nu(\lambda) = \int_{-\infty}^{\infty} \frac{\lambda^2 + 1}{\lambda - \zeta} \, d\nu(\lambda).

This is the same as (V.59). Conversely, observe that if \varphi has a representation like (V.59), then it must satisfy the condition (V.58).
I .A � t dJ.L (A) , CXJ
f (t ) == Q + {3t 
0
where a E �' {3 > 0 and J.L is a positive measure such that J l dJ.L ( A ) < oo . Then f is operator monotone. Find operator monotone functions that can not be expressed in this form.
V.6 Notes and References
Operator monotone functions were first studied in detail by K. Löwner (C. Loewner) in a seminal paper, Über monotone Matrixfunktionen, Math. Z., 38 (1934) 177-216. In this paper, he established the connection between operator monotonicity, the positivity of the matrix of divided differences (Theorem V.3.4), and Pick functions. He also noted that the functions f(t) = t^r, 0 \le r \le 1, and f(t) = \log t are operator monotone on (0, \infty).
Operator convex functions were studied, soon afterwards, by F. Kraus, Über konvexe Matrixfunktionen, Math. Z., 41 (1936) 18-42. In another well-known paper, Beiträge zur Störungstheorie der Spektralzerlegung, Math. Ann., 123 (1951) 415-438, E. Heinz used the theory of operator monotone functions to study several problems of perturbation theory for bounded and unbounded operators. The integral representation (V.41) in this context seems to have been first used by him. The operator monotonicity of the map A \mapsto A^r for 0 \le r \le 1 is sometimes called the "Loewner-Heinz inequality", although it was discovered by Loewner.

J. Bendat and S. Sherman, Monotone and convex operator functions, Trans. Amer. Math. Soc., 79 (1955) 58-71, provided a new perspective on the theorems of Loewner and Kraus. Theorem V.4.4 was first proved by them, and used to give a proof of Loewner's theorems.

A completely different and extremely elegant proof of Loewner's Theorem, based on the spectral theorem for (unbounded) selfadjoint operators, was given by A. Koranyi, On a theorem of Löwner and its connections with resolvents of selfadjoint transformations, Acta Sci. Math. Szeged, 17 (1956) 63-70.

Formulas like (V.13) and (V.22) were proved by Ju. L. Daleckii and S.G. Krein, Formulas of differentiation according to a parameter of functions of Hermitian operators, Dokl. Akad. Nauk SSSR, 76 (1951) 13-16. It was pointed out by M.G. Krein that the resulting Taylor formula could be used to derive conditions for operator monotonicity.

A concise presentation of the main ideas of operator monotonicity and convexity, including the approach of Daleckii and Krein, was given by C. Davis, Notions generalizing convexity for functions defined on spaces of matrices, in Convexity: Proceedings of Symposia in Pure Mathematics, American Mathematical Society, 1963, pp. 187-201. This paper also discussed other notions of convexity, examples and counterexamples, and was very influential.
A full book devoted to this topic is Monotone Matrix Functions and Analytic Continuation, by W.F. Donoghue, Springer-Verlag, 1974. Several ramifications of the theory and its connections with classical real and complex analysis are discussed there.

In a set of mimeographed lecture notes, Topics on Operator Inequalities, Hokkaido University, Sapporo, 1978, T. Ando provided a very concise modern survey of operator monotone and operator convex functions. Anyone who wishes to learn the Koranyi method mentioned above should certainly read these notes.

A short proof of Löwner's Theorem appeared in G. Sparr, A new proof of Löwner's theorem on monotone matrix functions, Math. Scand., 47 (1980) 266-274. In another brief and attractive paper, Jensen's inequality for operators and Löwner's theorem, Math. Ann., 258 (1982) 229-241, F. Hansen and G.K. Pedersen provided another approach.
V.6 Notes and References
Much of Sections 2, 3, and 4 is based on this paper of Hansen and Pedersen. For the latter parts of Section 4 we have followed Donoghue. We have also borrowed freely from Ando and from Davis. Our proof of Theorem V.1.9 is taken from M. Fujii and T. Furuta, Löwner-Heinz, Cordes and Heinz-Kato inequalities, Math. Japonica, 38 (1993) 73-78. Characterisations of operator convexity like the one in Exercise V.3.15 may be found in J.S. Aujla and H.L. Vasudeva, Convex and monotone operator functions, Ann. Polonici Math., 62 (1995) 1-11. Operator monotone and operator convex functions are studied in R.A. Horn and C.R. Johnson, Topics in Matrix Analysis, Chapter 6. See also the interesting paper R.A. Horn, The Hadamard product, in C.R. Johnson, ed., Matrix Theory and Applications, American Mathematical Society, 1990. A short, but interesting, section of the Marshall-Olkin book (cited in Chapter 2) is devoted to this topic. Especially interesting are some of the examples and connections with statistics that they give. Among several applications of these ideas, there are two that we should mention here. Operator monotone functions arise often in the study of electrical networks. See, e.g., W.N. Anderson and G.E. Trapp, A class of monotone operator functions related to electrical network theory, Linear Algebra Appl., 15 (1975) 53-67. They also occur in problems related to elementary particles. See, e.g., E. Wigner and J. von Neumann, Significance of Löwner's theorem in the quantum theory of collisions, Ann. of Math., 59 (1954) 418-433. There are important notions of means of operators that are useful in the analysis of electrical networks and in quantum physics. An axiomatic approach to the study of these means was introduced by F. Kubo and T. Ando, Means of positive linear operators, Math. Ann., 249 (1980) 205-224.
They establish a one-to-one correspondence between the class of operator monotone functions $f$ on $(0, \infty)$ with $f(1) = 1$ and the class of operator means.
VI Spectral Variation of Normal Matrices
Let $A$ be an $n \times n$ Hermitian matrix, and let $\lambda_1^\downarrow(A) \ge \lambda_2^\downarrow(A) \ge \cdots \ge \lambda_n^\downarrow(A)$ be the eigenvalues of $A$ arranged in decreasing order. In Chapter III we saw that $\lambda_j^\downarrow(A)$, $1 \le j \le n$, are continuous functions on the space of Hermitian matrices. This is a very special consequence of Weyl's Perturbation Theorem: if $A, B$ are two Hermitian matrices, then
$$\max_j |\lambda_j^\downarrow(A) - \lambda_j^\downarrow(B)| \le \|A - B\|.$$
In turn, this inequality is a special case of the inequality (IV.62), which says that if $\mathrm{Eig}^\downarrow(A)$ denotes the diagonal matrix with entries $\lambda_j^\downarrow(A)$ down its diagonal, then we have
$$|||\mathrm{Eig}^\downarrow(A) - \mathrm{Eig}^\downarrow(B)||| \le |||A - B|||$$
for all Hermitian matrices A, B and for all unitarily invariant norms. In this chapter we explore how far these results can be carried over to normal matrices. The first difficulty we face is that , if the matrices are not Hermitian, there is no natural way to order their eigenvalues. So, the problem has to be formulated in terms of optimal matchings. Even after this has been done, analogues of the inequalities above turn out to be a little more complicated. Though several good results are known, many await discovery.
VI.1 Continuity of Roots of Polynomials
Every polynomial of degree $n$ with complex coefficients has $n$ complex roots. These are unique, except for an ordering. It is thus natural to think of them as an unordered $n$-tuple of complex numbers. The space of such $n$-tuples is denoted by $\mathbb{C}^n_{sym}$. This is the quotient space obtained from the space $\mathbb{C}^n$ via the equivalence relation that identifies two $n$-tuples if their coordinates are permutations of each other. The space $\mathbb{C}^n_{sym}$ thus inherits a natural quotient topology from $\mathbb{C}^n$. It also has a natural metric: if $\lambda = \{\lambda_1, \ldots, \lambda_n\}$ and $\mu = \{\mu_1, \ldots, \mu_n\}$ are two points in $\mathbb{C}^n_{sym}$, then
$$d(\lambda, \mu) = \min_\sigma \max_j |\lambda_j - \mu_{\sigma(j)}|,$$
where the minimum is taken over all permutations $\sigma$. See Problem II.5.9. This metric is called the optimal matching distance between $\lambda$ and $\mu$.

Exercise VI.1.1 Show that the quotient topology on $\mathbb{C}^n_{sym}$ and the metric topology generated by the optimal matching distance are identical.
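The optimal matching distance is easy to compute by brute force for small $n$, minimising over all $n!$ permutations. The following Python sketch is our illustration (the function name is ours, not the book's):

```python
import itertools

def optimal_matching_distance(lam, mu):
    # d(lambda, mu) = min over permutations sigma of max_j |lam_j - mu_sigma(j)|
    return min(
        max(abs(l - m) for l, m in zip(lam, perm))
        for perm in itertools.permutations(mu)
    )

# Two unordered triples of complex numbers.
lam = [0j, 1 + 0j, 1j]
mu = [0.1 + 0j, 1.05 + 0j, 0.9j]
d = optimal_matching_distance(lam, mu)
```

Here the identity pairing already matches each point to its nearest partner, so $d = 0.1$.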
Recall that, if
$$f(z) = z^n - a_1 z^{n-1} + a_2 z^{n-2} + \cdots + (-1)^n a_n \qquad \text{(VI.1)}$$
is a monic polynomial with roots $\alpha_1, \ldots, \alpha_n$, then the coefficients $a_j$ are elementary symmetric polynomials in the variables $\alpha_1, \ldots, \alpha_n$, i.e.,
$$a_j = \sum_{1 \le i_1 < \cdots < i_j \le n} \alpha_{i_1} \cdots \alpha_{i_j}. \qquad \text{(VI.2)}$$
By the Fundamental Theorem of Algebra, we have a bijection $S : \mathbb{C}^n_{sym} \to \mathbb{C}^n$ defined as
$$S(\{\alpha_1, \ldots, \alpha_n\}) = (a_1, \ldots, a_n). \qquad \text{(VI.3)}$$
Clearly $S$ is continuous, by the definition of the quotient topology. We will show that $S^{-1}$ is also continuous. For this we have to show that for every $\varepsilon > 0$, there exists $\delta > 0$ such that if $|a_j - b_j| < \delta$ for all $j$, then the optimal matching distance between the roots of the monic polynomials that have $a_j$ and $b_j$ as their coefficients is smaller than $\varepsilon$. Let $\zeta_1, \ldots, \zeta_k$ be the distinct roots of the monic polynomial $f$ that has coefficients $a_j$. Given $\varepsilon > 0$, we can choose circles $\Gamma_j$, $1 \le j \le k$, centred at $\zeta_j$, each having radius smaller than $\varepsilon$ and such that none of them intersects any other. Let $\Gamma$ be the union of the boundaries of all these circles. Let $\eta = \inf_{z \in \Gamma} |f(z)|$. Then $\eta > 0$. Since $\Gamma$ is a compact set, there exists a positive number $\delta$ such that if $g$ is any monic polynomial with coefficients $b_j$, and $|a_j - b_j| < \delta$ for all $j$, then $|f(z) - g(z)| < \eta$ for all $z \in \Gamma$. So, by Rouché's Theorem $f$ and $g$ have the same number of zeroes inside each $\Gamma_j$, where the zeroes are counted with multiplicities. Thus we can pair each root of $f$ with a root of $g$ in
such a way that the distance between any two pairs is smaller than $\varepsilon$. In other words, the optimal matching distance between the roots of $f$ and $g$ is smaller than $\varepsilon$. We have thus proved the following.
Theorem VI.1.2 The map $S$ is a homeomorphism between $\mathbb{C}^n_{sym}$ and $\mathbb{C}^n$.

The continuity of $S^{-1}$ means that the roots of a polynomial vary continuously with the coefficients. Since the coefficients of its characteristic polynomial change continuously with a matrix, it follows that the eigenvalues of a matrix also vary continuously. More precisely, the map $M(n) \to \mathbb{C}^n_{sym}$ that takes a matrix to the unordered tuple of its eigenvalues is continuous. A different kind of continuity question is the following. If $z \mapsto A(z)$ is a continuous map from a domain $G$ in the complex plane into $M(n)$, then do there exist $n$ continuous functions $\lambda_1(z), \ldots, \lambda_n(z)$ on $G$ such that for each $z$ they are the eigenvalues of the matrix $A(z)$? The example below shows that this is not always the case.
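This continuity can be observed numerically: perturbing the coefficients of a polynomial slightly moves its roots, as an unordered tuple, only slightly. A NumPy sketch (ours; `np.roots` uses a different sign convention for the coefficients than (VI.1), which does not affect the distance):

```python
import itertools
import numpy as np

def matching_distance(a, b):
    return min(max(abs(x - y) for x, y in zip(a, p))
               for p in itertools.permutations(b))

# f(z) = z^3 - 6z^2 + 11z - 6 has roots 1, 2, 3.
f = [1.0, -6.0, 11.0, -6.0]
# Perturb each coefficient by about 1e-6.
g = [1.0, -6.0 + 1e-6, 11.0 - 1e-6, -6.0 + 1e-6]
d = matching_distance(np.roots(f), np.roots(g))
```

Since the roots 1, 2, 3 are well separated, the optimal matching distance here is of the same order as the perturbation of the coefficients.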
Example VI.1.3 Let $A(z) = \begin{pmatrix} 0 & z \\ 1 & 0 \end{pmatrix}$. The eigenvalues of $A(z)$ are $\pm z^{1/2}$. These cannot be represented by two single-valued continuous functions on any domain $G$ that contains zero.
In two special situations, the answer to the question raised above is in the affirmative. If either the eigenvalues of $A(z)$ are all real, or if $G$ is an interval on the real line, a continuous parametrisation of the eigenvalues of $A(z)$ is possible. This is shown below. Consider the map from $\mathbb{R}^n_{sym}$ to $\mathbb{R}^n$ that rearranges an unordered $n$-tuple $\{\lambda_1, \ldots, \lambda_n\}$ in decreasing order as $(\lambda_1^\downarrow, \ldots, \lambda_n^\downarrow)$. From the majorisation relation (II.35) it follows that this map reduces distances, i.e.,
$$\max_j |\lambda_j^\downarrow - \mu_j^\downarrow| \le \min_\sigma \max_j |\lambda_j - \mu_{\sigma(j)}|.$$
Hence, in particular, this is a continuous map. So, if all the eigenvalues of $A(z)$ are real, enumerating them as $\lambda_1^\downarrow(z) \ge \cdots \ge \lambda_n^\downarrow(z)$ gives a continuous parametrisation for them. We should remark that while this is the most natural way of ordering real $n$-tuples, it is not always the most convenient. It could destroy the differentiability of these functions, which some other ordering might confer on them. For example, on any interval containing 0 the two functions $\pm t$ are differentiable. But rearrangement in the way above leads to the functions $\pm|t|$, which are not differentiable at 0. For maps from an interval we have the following.
Theorem VI.1.4 Let $\lambda$ be a continuous map from an interval $I$ into the space $\mathbb{C}^n_{sym}$. Then there exist $n$ continuous complex functions $\lambda_j(t)$ on $I$ such that $\lambda(t) = \{\lambda_1(t), \ldots, \lambda_n(t)\}$ for each $t \in I$.
Proof. For brevity we will call $n$ functions whose existence is asserted by the theorem a continuous selection for $\lambda$. Suppose a continuous selection $\lambda_j^{(1)}(t)$ exists on a subinterval $I_1$ and another continuous selection $\lambda_j^{(2)}(t)$ exists on a subinterval $I_2$. If $I_1$ and $I_2$ have a common point $t_0$, then $\{\lambda_j^{(1)}(t_0)\}$ and $\{\lambda_j^{(2)}(t_0)\}$ are identical up to a permutation. So a continuous selection exists on $I_1 \cup I_2$. It follows that, if $J$ is a subinterval of $I$ such that each point of $J$ has a neighbourhood on which a continuous selection exists, then a continuous selection exists on the entire interval $J$. Now we can prove the theorem by induction on $n$. The statement is obviously true for $n = 1$. Suppose it is true for dimensions smaller than $n$. Let $K$ be the set of all $t \in I$ for which all the $n$ elements of $\lambda(t)$ are equal. Then $K$ is a closed subset of $I$. Let $L = I \setminus K$. Let $t_0 \in L$. Then $\lambda(t_0)$ has at least two distinct elements. Collect all the copies of one of these elements. If these are $k$ in number (i.e., $k$ is the multiplicity of the chosen element), then the $n$ elements of $\lambda(t_0)$ are now divided into two groups with $k$ and $n - k$ elements, respectively. These two groups have no element in common. Since $\lambda(t)$ is continuous, for $t$ sufficiently close to $t_0$ the elements of $\lambda(t)$ also split into two groups of $k$ and $n - k$ elements, each of which is continuous in $t$. By the induction hypothesis, each of these groups has a continuous selection in a neighbourhood of $t_0$. Taken together, they provide a continuous selection for $\lambda$ in this neighbourhood. So, a continuous selection exists on each component of $L$. On its complement $K$, $\lambda(t)$ consists of just one element $\lambda(t)$ repeated $n$ times. Putting these together we obtain a continuous selection for $\lambda(t)$ on all of $I$. •
Corollary VI.1.5 Let $a_j(t)$, $1 \le j \le n$, be continuous complex valued functions defined on an interval $I$. Then there exist continuous functions $\alpha_1(t), \ldots, \alpha_n(t)$ that, for each $t \in I$, constitute the roots of the monic polynomial $z^n - a_1(t) z^{n-1} + \cdots + (-1)^n a_n(t)$.
Corollary VI.1.6 Let $t \mapsto A(t)$ be a continuous map from an interval $I$ into the space of $n \times n$ matrices. Then there exist continuous functions $\lambda_1(t), \ldots, \lambda_n(t)$ that, for each $t \in I$, are the eigenvalues of $A(t)$.
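Corollary VI.1.6 can be illustrated numerically by tracking the eigenvalues along a grid of the interval and, at each step, matching the new eigenvalues greedily to the previous ones. This is a sketch of one practical way to produce approximately continuous selections (our code, not the construction used in the proof):

```python
import numpy as np

def eigenvalue_paths(A_of_t, ts):
    # Greedy continuous selection: match each new eigenvalue to the
    # nearest previous one, so that each path moves little per step.
    paths = [np.sort_complex(np.linalg.eigvals(A_of_t(ts[0])))]
    for t in ts[1:]:
        ev = list(np.linalg.eigvals(A_of_t(t)))
        nxt = []
        for p in paths[-1]:
            j = min(range(len(ev)), key=lambda k: abs(ev[k] - p))
            nxt.append(ev.pop(j))
        paths.append(np.array(nxt))
    return np.array(paths)

A = lambda t: np.array([[0.0, t], [1.0, 0.0]])   # eigenvalues +-sqrt(t)
ts = np.linspace(-1.0, 1.0, 201)
paths = eigenvalue_paths(A, ts)
max_step = np.max(np.abs(np.diff(paths, axis=0)))
```

For this matrix (the one from Example VI.1.3) the two paths pass through 0 at $t = 0$; each step stays small even though no single-valued analytic labelling exists on a complex neighbourhood of 0.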
VI.2 Hermitian and Skew-Hermitian Matrices
In this section we derive some bounds for the distance between the eigenvalues of a Hermitian matrix $A$ and those of a skew-Hermitian matrix $B$. This will reveal several new facets of the general problem that are quite different from the case when both $A, B$ are Hermitian. Let us recall here, once again, the theorem that is the prototype of the results we seek.
Theorem VI.2.1 (Weyl's Perturbation Theorem) Let $A, B$ be Hermitian matrices with eigenvalues $\lambda_1^\downarrow(A) \ge \cdots \ge \lambda_n^\downarrow(A)$ and $\lambda_1^\downarrow(B) \ge \cdots \ge \lambda_n^\downarrow(B)$, respectively. Then
$$\max_j |\lambda_j^\downarrow(A) - \lambda_j^\downarrow(B)| \le \|A - B\|. \qquad \text{(VI.4)}$$
We have seen two different proofs of this, one in Section III.2 and the other in Section IV.3. It is the latter idea which, in modified forms, will be used often in the following paragraphs.
Theorem VI.2.2 Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix. Let their eigenvalues $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ be arranged in such a way that
$$|\alpha_1| \ge \cdots \ge |\alpha_n|, \qquad |\beta_1| \ge \cdots \ge |\beta_n|. \qquad \text{(VI.5)}$$
Then
$$\max_j |\alpha_j - \beta_{n-j+1}| \le \|A - B\|. \qquad \text{(VI.6)}$$
Proof. For a fixed index $j$, consider the eigenspaces of $A$ and $B$ corresponding to their eigenvalues $\{\alpha_1, \ldots, \alpha_j\}$ and $\{\beta_1, \ldots, \beta_{n-j+1}\}$, respectively. Let $x$ be a unit vector in their intersection. Then
$$\|A - B\|^2 = \tfrac{1}{2}(\|A - B\|^2 + \|A + B\|^2) \ge \tfrac{1}{2}(\|(A - B)x\|^2 + \|(A + B)x\|^2) = \|Ax\|^2 + \|Bx\|^2 \ge |\alpha_j|^2 + |\beta_{n-j+1}|^2 = |\alpha_j - \beta_{n-j+1}|^2.$$
At the first step above, we used the equality $\|T\| = \|T^*\|$, valid for all $T$; at the third step we used the parallelogram law; and at the last step the fact that $\alpha_j$ is real and $\beta_{n-j+1}$ is imaginary. •
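The bound of Theorem VI.2.2 can be spot-checked with random matrices. In this NumPy sketch (ours, not the book's), `np.linalg.norm(·, 2)` is the operator norm:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2          # Hermitian
B = (Y - Y.conj().T) / 2          # skew-Hermitian

# Order both spectra by decreasing modulus, as in (VI.5).
alpha = sorted(np.linalg.eigvalsh(A), key=abs, reverse=True)
beta = sorted(np.linalg.eigvals(B), key=abs, reverse=True)

lhs = max(abs(alpha[j] - beta[n - 1 - j]) for j in range(n))
rhs = np.linalg.norm(A - B, 2)
```

The quantity `lhs` is the left side of (VI.6); it never exceeds `rhs`.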
For Hermitian pairs $A, B$ we have seen analogues of the inequality (VI.4) for other unitarily invariant norms. It is, therefore, natural to ask for similar kinds of results when $A$ is Hermitian and $B$ skew-Hermitian. It is convenient to do this in the following setup. Let $T$ be any matrix and let $T = A + iB$ be its Cartesian decomposition into real and imaginary parts, $A = \frac{T + T^*}{2}$ and $B = \frac{T - T^*}{2i}$. The theorem below gives majorisation relations between the eigenvalues of $A$ and $B$, and the singular values of $T$. From these several inequalities can be obtained. We will use the notation $\{x_j\}_j$ to mean an $n$-vector whose $j$th coordinate is $x_j$.
Theorem VI.2.3 Let $A, B$ be Hermitian matrices with eigenvalues $\alpha_j$ and $\beta_j$, respectively, ordered so that
$$|\alpha_1| \ge \cdots \ge |\alpha_n|, \qquad |\beta_1| \ge \cdots \ge |\beta_n|.$$
Let $T = A + iB$, and let $s_j$ be the singular values of $T$. Then the following majorisation relations are satisfied:
$$\{|\alpha_j + i\beta_{n-j+1}|^2\}_j \prec_w \{s_j^2\}_j, \qquad \text{(VI.7)}$$
$$\{\tfrac{1}{2}(s_j^2 + s_{n-j+1}^2)\}_j \prec_w \{|\alpha_j + i\beta_j|^2\}_j. \qquad \text{(VI.8)}$$

Proof.
For any two Hermitian matrices $X, Y$ we have the majorisations (III.13):
$$\lambda^\downarrow(X) + \lambda^\uparrow(Y) \prec \lambda(X + Y) \prec \lambda^\downarrow(X) + \lambda^\downarrow(Y).$$
Choosing $X = A^2$, $Y = B^2$, this gives
$$\{\alpha_j^2 + \beta_{n-j+1}^2\}_j \prec \lambda(A^2 + B^2) \prec \{\alpha_j^2 + \beta_j^2\}_j. \qquad \text{(VI.9)}$$
Now note that
$$T^*T + TT^* = 2(A^2 + B^2),$$
and
$$\lambda_j(T^*T) = \lambda_j(TT^*) = s_j^2.$$
So, choosing $X = T^*T$ and $Y = TT^*$ in the first majorisation above gives
$$\{\tfrac{1}{2}(s_j^2 + s_{n-j+1}^2)\}_j \prec \lambda(A^2 + B^2) \prec \{s_j^2\}_j. \qquad \text{(VI.10)}$$
Since majorisation is a transitive relation, the two assertions (VI.7) and (VI.8) follow from (VI.9) and (VI.10). •
For each $p \ge 2$, the function $\varphi(t) = t^{p/2}$ is convex and monotone on $[0, \infty)$. Hence, the weak majorisations (VI.7) and (VI.8) lead to
$$\{|\alpha_j + i\beta_{n-j+1}|^p\}_j \prec_w \{s_j^p\}_j, \qquad \text{(VI.11)}$$
$$\{2^{-p/2}(s_j^2 + s_{n-j+1}^2)^{p/2}\}_j \prec_w \{|\alpha_j + i\beta_j|^p\}_j. \qquad \text{(VI.12)}$$
These two relations include the inequalities
$$\sum_{j=1}^n |\alpha_j + i\beta_{n-j+1}|^p \le \sum_{j=1}^n s_j^p, \qquad \text{(VI.13)}$$
$$2^{-p/2} \sum_{j=1}^n (s_j^2 + s_{n-j+1}^2)^{p/2} \le \sum_{j=1}^n |\alpha_j + i\beta_j|^p \qquad \text{(VI.14)}$$
for $p \ge 2$. If $a_1$ and $a_2$ are any two nonnegative real numbers, then the function $g(t) = (a_1^t + a_2^t)^{1/t}$ is monotonically decreasing on $0 < t < \infty$. So if $p \ge 2$, then
$$a_1^p + a_2^p \le (a_1^2 + a_2^2)^{p/2}. \qquad \text{(VI.15)}$$
Using this we get from (VI.14) the inequality
$$2^{1-p/2} \sum_{j=1}^n s_j^p \le \sum_{j=1}^n |\alpha_j + i\beta_j|^p \qquad \text{(VI.16)}$$
for $p \ge 2$.
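The inequalities (VI.13) and (VI.16) can be verified numerically for random Hermitian pairs; the sketch below (ours) uses $p = 3$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 3.0                                     # any p >= 2
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2                          # Hermitian
B = (Y + Y.conj().T) / 2                          # Hermitian
T = A + 1j * B
s = np.linalg.svd(T, compute_uv=False)            # s_1 >= ... >= s_n
alpha = sorted(np.linalg.eigvalsh(A), key=abs, reverse=True)
beta = sorted(np.linalg.eigvalsh(B), key=abs, reverse=True)

# (VI.13): sum |alpha_j + i beta_{n-j+1}|^p <= sum s_j^p
lhs13 = sum(abs(alpha[j] + 1j * beta[n - 1 - j]) ** p for j in range(n))
rhs13 = sum(s ** p)
# (VI.16): 2^(1-p/2) sum s_j^p <= sum |alpha_j + i beta_j|^p
lhs16 = 2 ** (1 - p / 2) * sum(s ** p)
rhs16 = sum(abs(alpha[j] + 1j * beta[j]) ** p for j in range(n))
```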
Exercise VI.2.4 For $0 < p \le 2$, the function $\varphi(t) = t^{p/2}$ is concave on $[0, \infty)$. Use this to show that for these values of $p$, the weak majorisations (VI.11) and (VI.12) are valid with $\prec_w$ replaced by $\prec^w$. All the four inequalities (VI.13)-(VI.16) now go in the opposite direction.
Let $A$ be any matrix with eigenvalues $\alpha_1, \ldots, \alpha_n$, counted in any order. We have used the notation $\mathrm{Eig}\,A$ to mean a diagonal matrix that has entries $\alpha_j$ down its diagonal. If $\sigma$ is a permutation, we will use the notation $\mathrm{Eig}_\sigma(A)$ for the diagonal matrix with entries $\alpha_{\sigma(1)}, \ldots, \alpha_{\sigma(n)}$ down its diagonal. The symbol $\mathrm{Eig}^{|\downarrow|}(A)$ will mean the diagonal matrix whose diagonal entries are the eigenvalues of $A$ in decreasing order of magnitude, i.e., the $\alpha_j$ arranged so that $|\alpha_1| \ge \cdots \ge |\alpha_n|$. In the same way, $\mathrm{Eig}^{|\uparrow|}(A)$ will stand for the diagonal matrix whose diagonal entries are the eigenvalues of $A$ arranged in increasing order of magnitude, i.e., the $\alpha_j$ rearranged so that $|\alpha_1| \le |\alpha_2| \le \cdots \le |\alpha_n|$. With these notations, we have the following theorem for the distance between the eigenvalues of a Hermitian and a skew-Hermitian matrix, in the Schatten $p$-norms.
Theorem VI.2.5 Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix. Then,
(i) for $2 \le p \le \infty$, we have
$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p \le \|A - B\|_p, \qquad \text{(VI.17)}$$
$$\|A - B\|_p \le 2^{\frac{1}{2} - \frac{1}{p}} \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p; \qquad \text{(VI.18)}$$
(ii) for $1 \le p \le 2$, we have
$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p \le 2^{\frac{1}{p} - \frac{1}{2}} \|A - B\|_p, \qquad \text{(VI.19)}$$
$$\|A - B\|_p \le \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p. \qquad \text{(VI.20)}$$
All the inequalities above are sharp. Further,
(iii) for $2 \le p \le \infty$, we have
$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p \le \|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_p \le \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p \qquad \text{(VI.21)}$$
for all permutations $\sigma$;
(iv) for $1 \le p \le 2$, we have
$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p \le \|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_p \le \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p \qquad \text{(VI.22)}$$
for all permutations $\sigma$.
Proof. For $p \ge 2$, the inequalities (VI.17) and (VI.18) follow immediately from (VI.13) and (VI.16). For $p = \infty$, the same inequalities remain valid by a limiting argument. The next two inequalities of the theorem follow from the fact that, for $1 \le p \le 2$, both of the inequalities (VI.13) and (VI.16) are reversed. The special case of statements (i) and (ii) in which $A$ and $B$ commute is adequate for proving (iii) and (iv). The sharpness of all the inequalities can be seen from the $2 \times 2$ example:
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \qquad \text{(VI.23)}$$
Here $\|A - B\|_p = 2$ for all $1 \le p \le \infty$. The eigenvalues of $A$ are $\pm 1$, those of $B$ are $\pm i$. Hence, for every permutation $\sigma$,
$$\frac{\|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_p}{\|A - B\|_p} = 2^{\frac{1}{p} - \frac{1}{2}}$$
for all $1 \le p \le \infty$. •
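The ratio $2^{1/p - 1/2}$ attained by a pair of this kind (a Hermitian $A$ with eigenvalues $\pm 1$, a skew-Hermitian $B$ with eigenvalues $\pm i$, and $\|A - B\|_p = 2$) can be confirmed numerically; the matrices below are one such choice (our sketch):

```python
import numpy as np

A = np.array([[0.0, 1.0], [1.0, 0.0]])    # Hermitian, eigenvalues +1, -1
B = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-Hermitian, eigenvalues +i, -i

def schatten(M, p):
    # Schatten p-norm: the l_p norm of the singular values.
    return float(sum(np.linalg.svd(M, compute_uv=False) ** p) ** (1 / p))

p = 4.0
norm_AB = schatten(A - B, p)                  # equals 2 for every p >= 1
# Every pairing of {1, -1} with {i, -i} puts sqrt(2) in each diagonal slot.
eig_dist = (2 * np.sqrt(2) ** p) ** (1 / p)   # ||Eig(A) - Eig_sigma(B)||_p
ratio = eig_dist / norm_AB                    # = 2^(1/p - 1/2)
```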
Note that the inequality (VI.6) is included in (VI.17). There are several features of these inequalities that are different from the corresponding inequality (IV.62) for a pair of Hermitian matrices $A, B$. First, the inequalities (VI.18) and (VI.19) involve a constant term on the right that is bigger than 1. Second, the best choice of this term depends on the norm $\|\cdot\|_p$. Third, the optimal matching of the eigenvalues of $A$ with those of $B$ (the one that will minimise the distance between them) changes with the norm. In fact, the best pairing for the norms $2 \le p \le \infty$ is the worst one for the norms $1 \le p \le 2$, and vice versa. All these new features reveal that the spectral variation problem for pairs of normal matrices $A, B$ is far more intricate than the one for Hermitian pairs.
Exercise VI.2.6 Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix. Show that for every unitarily invariant norm we have
$$|||\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)||| \le 2\, |||A - B|||, \qquad \text{(VI.24)}$$
$$|||A - B||| \le \sqrt{2}\, |||\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)|||. \qquad \text{(VI.25)}$$
The term $\sqrt{2}$ in the second inequality cannot be replaced by anything smaller.

VI.3 Estimates in the Operator Norm

In this section we will obtain estimates of the distance between the eigenvalues of two normal matrices $A$ and $B$ in terms of $\|A - B\|$. Apart from
the optimal matching distance, which has already been introduced, we will consider other distances. If $L, M$ are two closed subsets of the complex plane $\mathbb{C}$, let
$$s(L, M) = \sup_{\lambda \in L} \mathrm{dist}(\lambda, M) = \sup_{\lambda \in L} \inf_{\mu \in M} |\lambda - \mu|. \qquad \text{(VI.26)}$$
The Hausdorff distance between $L$ and $M$ is defined as
$$h(L, M) = \max(s(L, M), s(M, L)). \qquad \text{(VI.27)}$$
Exercise VI.3.1 Show that $s(L, M) = 0$ if and only if $L$ is a subset of $M$. Show that the Hausdorff distance defines a metric on the collection of all closed subsets of $\mathbb{C}$.
Note that $s(L, M)$ is the smallest number $\delta$ such that every element of $L$ is within a distance $\delta$ of some element of $M$; and $h(L, M)$ is the smallest number $\delta$ for which this, as well as the symmetric assertion with $L$ and $M$ interchanged, is true. Let $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$ be two unordered $n$-tuples of complex numbers. Let $L$ and $M$ be the subsets of $\mathbb{C}$ whose elements are the entries of these two tuples. If some entry among $\{\lambda_j\}$ or $\{\mu_j\}$ has multiplicity bigger than 1, then the cardinality of $L$ or $M$ is smaller than $n$.
Exercise VI.3.2
(i) The Hausdorff distance $h(L, M)$ is always less than or equal to the optimal matching distance $d(\{\lambda_1, \ldots, \lambda_n\}, \{\mu_1, \ldots, \mu_n\})$.
(ii) When $n = 2$, the two distances are equal.
(iii) The triples $\{0, m - \varepsilon, m + \varepsilon\}$ and $\{-\varepsilon, \varepsilon, m\}$ provide an example in which $h(L, M) = \varepsilon$ and the optimal matching distance is $m - 2\varepsilon$. Thus, for $n \ge 3$, the second distance can be arbitrarily larger than the first.
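The three distances of this section are short to code, and triples of the kind in part (iii) exhibit the gap between the Hausdorff and matching distances concretely. Our sketch, with $m = 10$ and $\varepsilon = 1/2$:

```python
import itertools

def s_dist(L, M):
    # s(L, M): the farthest any point of L sits from the set M.
    return max(min(abs(l - m) for m in M) for l in L)

def hausdorff(L, M):
    return max(s_dist(L, M), s_dist(M, L))

def matching_distance(L, M):
    return min(max(abs(l - m) for l, m in zip(L, p))
               for p in itertools.permutations(M))

m, eps = 10.0, 0.5
L = [0.0, m - eps, m + eps]
M = [-eps, eps, m]
h = hausdorff(L, M)           # = eps
d = matching_distance(L, M)   # = m - 2*eps
```

As sets the two triples nearly coincide, but any bijective pairing must send one point across the gap of length about $m$, up to the two $\varepsilon$-shifts.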
If $A$ is an $n \times n$ matrix, we will use the notation $\sigma(A)$ for both the subset of the complex plane that consists of all the eigenvalues of $A$, and for the unordered $n$-tuple whose entries are the eigenvalues of $A$ counted with multiplicity. Since we will be talking of the distances $s(\sigma(A), \sigma(B))$, $h(\sigma(A), \sigma(B))$, and $d(\sigma(A), \sigma(B))$, it will be clear which of the two objects is being represented by $\sigma(A)$. Note that the inequalities (VI.4) and (VI.6) say that
$$d(\sigma(A), \sigma(B)) \le \|A - B\|, \qquad \text{(VI.28)}$$
if either $A$ and $B$ are both Hermitian, or one is Hermitian and the other skew-Hermitian.

Theorem VI.3.3 Let $A$ be a normal and $B$ an arbitrary matrix. Then
$$s(\sigma(B), \sigma(A)) \le \|A - B\|. \qquad \text{(VI.29)}$$
Proof. Let $\varepsilon = \|A - B\|$. We have to show that if $\beta$ is any eigenvalue of $B$, then $\beta$ is within a distance $\varepsilon$ of some eigenvalue $\alpha_j$ of $A$. By applying a translation, we may assume that $\beta = 0$. If none of the $\alpha_j$ is within a distance $\varepsilon$ of this, then $A$ is invertible. Since $A$ is normal, we have
$$\|A^{-1}\| = \frac{1}{\min_j |\alpha_j|} < \frac{1}{\varepsilon}.$$
Hence,
$$\|A^{-1}(B - A)\| \le \|A^{-1}\| \, \|B - A\| < 1.$$
Since $B = A(I + A^{-1}(B - A))$, this shows that $B$ is invertible. But then $B$ could not have had a zero eigenvalue. •

Another proof of this theorem goes as follows. Let $A$ have the spectral resolution $A = \sum_j \alpha_j u_j u_j^*$, and let $v$ be a unit vector such that $Bv = \beta v$. Then
$$\|A - B\|^2 \ge \|(A - B)v\|^2 = \sum_j |\alpha_j - \beta|^2 |u_j^* v|^2.$$
Since the $u_j$ form an orthonormal basis, $\sum_j |u_j^* v|^2 = 1$. Hence, the above inequality can be satisfied only if $|\alpha_j - \beta|^2 \le \|A - B\|^2$ for at least one index $j$.

Corollary VI.3.4 If $A$ and $B$ are $n \times n$ normal matrices, then
$$h(\sigma(A), \sigma(B)) \le \|A - B\|. \qquad \text{(VI.30)}$$
For $n = 2$, we have
$$d(\sigma(A), \sigma(B)) \le \|A - B\|. \qquad \text{(VI.31)}$$
This corollary also follows from the proposition below. We will use the notation D ( a , p) for the open disk of radius p centred at a, and D ( a, p) for the closure of this disk.
Proposition Vl.3.5 Let A and B be normal matrices, II A  E ll . If any disk D ( a , p) contains k eigenvalues of A, D ( a , p + £) contains at least k eigenvalues of B .
and let £ == then the disk
Without loss of generality, we may assume that a 0. Suppose D ( O, p) contains k eigenvalues of A but D (O, p + £ ) contains less than k eigenvalues of B . Choose a unit vector x in the intersection of the eigenspace of A corresponding to its eigenvalues lying inside D ( O , p) and the eigenspace of B corresponding to its eigenvalues lying outside D (O, p +£ ) . We then have f l Ax II < p and I I B x j j > p + £. We also have l ! Bx ll  II Ax il < I I ( B  A ) x ll < • H B  A II == £. This is a contradiction. Proof.
==
162
VI .
S pect ral Variation o f Normal Matrices
Exercise VI.3.6 Use the special case $\rho = 0$ of Proposition VI.3.5 to prove Corollary VI.3.4.
Given a subset $X$ of the complex plane and a matrix $A$, let $m_A(X)$ denote the number of eigenvalues of $A$ inside $X$.
Exercise VI.3.7 Let $A, B$ be two $n \times n$ normal matrices. Let $K_1, K_2$ be two convex sets such that $m_A(K_1) \le k$ and $m_B(K_2) \ge k + 1$. Then $\mathrm{dist}(K_1, K_2) \le \|A - B\|$. [Hint: Let $\rho \to \infty$ in Proposition VI.3.5.]

Exercise VI.3.8 Use this to give another proof of Theorem VI.2.1.

Exercise VI.3.9 Let $A, B$ be two $n \times n$ unitary matrices whose eigenvalues lie in a semicircle of the unit circle. Label both the sets of eigenvalues in the counterclockwise order. Then
$$\max_j |\lambda_j(A) - \lambda_j(B)| \le \|A - B\|.$$
Hence,
$$d(\sigma(A), \sigma(B)) \le \|A - B\|.$$

Exercise VI.3.10 Let $T$ be the unit circle, $I$ any closed arc in $T$, and for $\varepsilon > 0$ let $I_\varepsilon$ be the arc $\{z \in T : |z - w| \le \varepsilon \text{ for some } w \in I\}$. Let $A, B$ be unitary matrices with $\|A - B\| = \varepsilon$. Show that $m_A(I) \le m_B(I_\varepsilon)$.

Theorem VI.3.11 For any two $n \times n$ unitary matrices,
$$d(\sigma(A), \sigma(B)) \le \|A - B\|.$$

Proof. The proof will use the Marriage Theorem (Theorem II.2.1) and the exercise above. Let $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$ be the eigenvalues of $A$ and $B$, respectively. Let $\Lambda$ be any subset of $\{\lambda_1, \ldots, \lambda_n\}$. Let $\mu(\Lambda) = \{\mu_j : |\mu_j - \lambda_i| \le \varepsilon \text{ for some } \lambda_i \in \Lambda\}$, where $\varepsilon = \|A - B\|$. By the Marriage Theorem, the assertion would be proved if we show that $|\mu(\Lambda)| \ge |\Lambda|$. Let $I(\Lambda)$ be the set of all points on the unit circle $T$ that are within distance $\varepsilon$ of some point of $\Lambda$. Then $\mu(\Lambda)$ contains exactly those $\mu_i$ that lie in $I(\Lambda)$. Let $I(\Lambda)$ be written as a disjoint union of arcs $I_1, \ldots, I_r$. For each $1 \le k \le r$, let $J_k$ be the arc contained in $I_k$ all whose points are at distance $\ge \varepsilon$ from the boundary of $I_k$. Then $I_k = (J_k)_\varepsilon$. From Exercise VI.3.10 we have
$$\sum_{k=1}^r m_A(J_k) \le \sum_{k=1}^r m_B(I_k).$$
But all the elements of $\Lambda$ are in some $J_k$. This shows that $|\Lambda| \le |\mu(\Lambda)|$. •
There is one difference between Theorem VI.3.11 and most of our earlier results of this type. Now nothing is said about the order in which the
eigenvalues of $A$ and $B$ are arranged for the optimal matching. No canonical order can be prescribed in general. In Problem VI.8.3, we outline another proof of Theorem VI.3.11 which says, in effect, that for optimal matching the eigenvalues of $A$ and $B$ can be counted in the cyclic order on the circle provided the initial point is chosen properly. The catch is that this initial point depends on $A$ and $B$ and we do not know how to find it.
Exercise VI.3.12 Let $A = c_1 U_1$, $B = c_2 U_2$, where $U_1, U_2$ are unitary matrices and $c_1, c_2$ are complex numbers. Show that $d(\sigma(A), \sigma(B)) \le \|A - B\|$.
By now we have seen that the inequality (VI.28) is valid in the following situations:
(i) $A$ and $B$ both Hermitian;
(ii) $A$ Hermitian and $B$ skew-Hermitian;
(iii) $A$ and $B$ both unitary (or both scalar multiples of unitaries);
(iv) $A$ and $B$ both $2 \times 2$ normal matrices.
The example below shows that this inequality breaks down for arbitrary normal matrices $A, B$ when $n \ge 3$.
Example VI.3.13 Let $A$ be the $3 \times 3$ diagonal matrix with diagonal entries $\lambda_1 = 1$, $\lambda_2 = \frac{-1 + \sqrt{3}\,i}{2}$, $\lambda_3 = \frac{-1 - \sqrt{3}\,i}{2}$, the cube roots of unity. Let $v^T = \left(\sqrt{3/8},\ 1/2,\ \sqrt{3/8}\right)$, and let $U = I - 2vv^T$. Then $U$ is a unitary matrix. Let $B = -U^* A U$. Then $B$ is a normal matrix with eigenvalues $\mu_j = -\lambda_j$, $j = 1, 2, 3$. One can check that
$$d(\sigma(A), \sigma(B)) = 1, \qquad \|A - B\| = \sqrt{27/28}.$$
So,
$$\frac{d(\sigma(A), \sigma(B))}{\|A - B\|} = \sqrt{28/27} = 1.0183+.$$
In the next chapter we will show that there exists a constant $c < 2.91$ such that for any two $n \times n$ normal matrices $A, B$
$$d(\sigma(A), \sigma(B)) \le c\,\|A - B\|.$$
For Hermitian matrices $A, B$ we have a reverse inequality:
$$\|A - B\| \le \max_{1 \le j \le n} |\lambda_j^\downarrow(A) - \lambda_j^\uparrow(B)|.$$
The quantity on the right is the distance between the eigenvalues of $A$ and those of $B$ when the "worst" pairing is made. An analogous result for normal matrices is proved below.
Theorem VI.3.14 Let $A$ and $B$ be normal matrices with eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$, respectively. Then, there exists a permutation $\sigma$ such that
$$\max_j |\lambda_j - \mu_{\sigma(j)}| \ge \frac{1}{\sqrt{2}}\, \|A - B\|. \qquad \text{(VI.32)}$$
Proof. The matrices $A \otimes I$ and $I \otimes B$ are both normal and commute with each other. Hence $A \otimes I - I \otimes B$ is normal. The eigenvalues of this matrix are all the differences $\lambda_i - \mu_j$, $1 \le i, j \le n$. Hence
$$\|A \otimes I - I \otimes B\| = \max_{i,j} |\lambda_i - \mu_j|.$$
So, the inequality (VI.32) is equivalent to
$$\|A - B\| \le \sqrt{2}\, \|A \otimes I - I \otimes B^T\|.$$
This is, in fact, true for all $A, B$ and is proved below. •

Theorem VI.3.15 For all matrices $A, B$,
$$\|A - B\| \le \sqrt{2}\, \|A \otimes I - I \otimes B^T\|. \qquad \text{(VI.33)}$$
Proof. We have to prove that for all $x, y$ in $\mathbb{C}^n$,
$$|\langle x, (A - B)y \rangle| \le \sqrt{2}\, \|A \otimes I - I \otimes B^T\|\, \|x\|\, \|y\|.$$
We have
$$|\langle x, (A - B)y \rangle| = |x^* A y - x^* B y| = |\mathrm{tr}(Ayx^* - yx^*B)| \le \|Ayx^* - yx^*B\|_1.$$
The matrix $Ayx^* - yx^*B$ has rank at most 2. So,
$$\|Ayx^* - yx^*B\|_1 \le \sqrt{2}\, \|Ayx^* - yx^*B\|_2.$$
Let $\bar{x}$ be the vector whose components are the complex conjugates of the components of $x$. Then with respect to the standard basis $e_i \otimes e_j$ of $\mathbb{C}^n \otimes \mathbb{C}^n$, the $(i, j)$-coordinate of the vector $(A \otimes I)(y \otimes \bar{x})$ is $\sum_k a_{ik} y_k \bar{x}_j$. This is also the $(i, j)$-entry of the matrix $Ayx^*$. In the same way, the $(i, j)$-entry of $yx^*B$ is the $(i, j)$-coordinate of the vector $(I \otimes B^T)(y \otimes \bar{x})$. Thus, we have
$$\|Ayx^* - yx^*B\|_2 = \|(A \otimes I - I \otimes B^T)(y \otimes \bar{x})\| \le \|A \otimes I - I \otimes B^T\|\, \|y \otimes \bar{x}\| = \|A \otimes I - I \otimes B^T\|\, \|x\|\, \|y\|.$$
This proves the theorem. •
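The inequality (VI.33) holds for all matrices and is simple to test with `np.kron` (our sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
I = np.eye(n)

lhs = np.linalg.norm(A - B, 2)
rhs = np.sqrt(2) * np.linalg.norm(np.kron(A, I) - np.kron(I, B.T), 2)
```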
The example (VI.23) shows that the inequality (VI.32) is sharp. Note that in this example $A$ and $B$ are both unitary. Also, $A$ is Hermitian and $B$ is skew-Hermitian. In contrast, the factor $\sqrt{2}$ in (VI.32) can be replaced by 1 if $A, B$ are both Hermitian.
VI.4 Estimates in the Frobenius Norm
We will use the symbol $S_n$ to mean the set of permutations on $n$ symbols, as well as the set of $n \times n$ permutation matrices. (To every permutation $\sigma$ there corresponds a unique matrix $P$ that has entries 1 in the $(i, j)$ place if and only if $j = \sigma(i)$, and all whose remaining entries are zero.) Let $\Omega_n$ be the set of all $n \times n$ doubly stochastic matrices. This is a convex polytope, and by Birkhoff's Theorem (Theorem II.2.3) its extreme points are permutation matrices.
Theorem VI.4.1 (Hoffman-Wielandt) Let $A$ and $B$ be normal matrices with eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$, respectively. Then
$$\min_{\sigma \in S_n} \Big( \sum_{i=1}^n |\lambda_i - \mu_{\sigma(i)}|^2 \Big)^{1/2} \le \|A - B\|_2 \le \max_{\sigma \in S_n} \Big( \sum_{i=1}^n |\lambda_i - \mu_{\sigma(i)}|^2 \Big)^{1/2}. \qquad \text{(VI.34)}$$
Proof. Choose unitary matrices U, V such that U A U * == D 1 , V BV * == D2 , where D 1 == diag ( A I , . . . , An) and D2 == diag(J.LI , . , J.Ln ) · Then, by uni tary invariance of the Frobenius norm, II A  El l� == I I U * D 1 U  V* D 2 V II � == 2 I I D 1 W  W D 2 ll , where W == UV * , another unitary matrix. If the matrix W has entries Wij , this can be written as
..
'L
,J
'L ,J
is an affine function on the set O n of doubly stochastic matrices. So it attains its minimum at one of the extreme points of O n . Thus, there exists a permutation matrix (Pij ) such that
'l. ,J
If this matrix corresponds to the permutation I I A  El l � >
·7 .
this says that
�I A i  J.la(i) l 2 'L
This proves the first inequality in (VI.34) . The same argument for the maximum instead of the minimum gives the other inequality. • Note that for Hermitian matrices, the inequality (VI.34) was proved ear lie r in Problem III.6. 15. In this case, we also proved that the same inequality is true for all unitarily invariant norms.
166
VI .
Spectral Variation o f Normal Matrices
In  general , there is no prescript ion for finding the permutation O" that minimises the Euclidean distance between the eigenvalues of A and those of B . However, if A is Hermitian with eigenvalues enume rated as ) q > A2 > > A n , then an enumeration o f J.Li in which Re /L l > Re J.L 2 > > Re fL n is the best one. To see this, j ust note that if _\ 1 > .:\ 2 and Re J.L 1 > Re JL2 , then ·
·
·
·
·
·
The same argument shows t h at an enumeration for whi ch the maximum < Re f..L n · (What distance is attained is one for which Re f..L l < Re J.L 2 < does this say when B is skewHermitian? ) Using the notations introduced in Sect io n VI.2, t he inequality (VI.34 ) can be rewritten as ·
B) min a ii E ig (A)  E ig a ( I I 2
<
IIA  BII 2
<
·
·
max i! Eig(A)  Eig a ( B) Ib  ( V I . 35) a
There is another way of looking at this. Since the eigenvalues of a no rmal matrix completely determine the matr ix up to a unitary conjugation , the inequality ( V I. 35) is equivalent to saying that for any two diagonal matrices A, B min jj A  P BP* 1 ! 2 < I I A  U BU * lb p
<
max i! A  PB P* 1 !2 , p
(VI.36)
where U is any unitary matrix and P varies over al l permutation matrices. Given any matrix B, let Us be the s et Us == { UBU* : U E U ( n ) } ,
where U(n) is the group consisting of unitary matrices. Then Us is a compact set called the unitary orbit of B. For a fixed di agon al matrix A, consider the function f(X) == II A  X ll 2  The inequal ity (VI.36) then says that if B is a no ther diagonal matrix, then on the compact se t UB b ot h the minimum and the maximum of f are attained at diago nal matrices (j ust some permutations of B) . In ot he r words, the minimum and the maximum on the unitary orbit are b ot h contained in the permutation orbit. This is an interesting fact from the point of view of calculus and geom etry. We will see below that if A, B are real diagonal matrices, a stronger statement can be p roved using calculus. This will also serve to introduce some e l ement ary ideas of differential geometry used in later sections. A differentiable function U (t) , where t is real and U ( t) is unitary, is called a differentiable curve thro ugh I if U (O) == I D i ffe rentiat i ng the equation U(t)U(t)* == I at t == 0 shows that for such a curve U' (O) is skewHe r mit i an . The matrix U' (O) is called the tangent vector to U(t) at /. If K == U' ( O ) , then etK is anothe r differentiable curve through I with t angent vector K at I. Thus , the curves U(t) and e t K have the same t angent vector and so .
represent the same curve locally, i.e., they are equal to the first degree of approximation. The tangent space to the manifold $U(n)$ at the point $I$ is the linear space that consists of all these tangent vectors. We have seen that this is the real vector space $K(n)$ consisting of all skew-Hermitian matrices.

If $\mathcal{U}_A$ is the unitary orbit of a matrix $A$, then every differentiable curve through $A$ can be represented locally as $e^{tK}Ae^{-tK}$ for some skew-Hermitian $K$. The derivative of this curve at $t = 0$ is $KA - AK$. This is usually written as $[K, A]$ and called a Lie bracket or a commutator. Thus the tangent space to the manifold $\mathcal{U}_A$ at the point $A$ is the space
$$T_A\mathcal{U}_A = \{[A, K] : K \in K(n)\}. \tag{VI.37}$$
Note that this implies that $T_A\mathcal{U}_A$ is contained in $K(n)$ if $A \in K(n)$. The sesquilinear form $\langle A, B\rangle = \operatorname{tr} A^{*}B$ is an inner product on the space $M(n)$. The symbol $S^{\perp}$ will mean the orthogonal complement of a space $S$ with respect to this inner product.
Lemma VI.4.2 For every $A \in K(n)$, the orthogonal complement of $T_A\mathcal{U}_A$ in $K(n)$ is the set of all $Y$ that commute with $A$.

Proof. Let $Y \in K(n)$. Then $Y \in (T_A\mathcal{U}_A)^{\perp}$ if and only if for every $K$ in $K(n)$ we have
$$0 = \langle Y, [A, K]\rangle = \operatorname{tr} Y^{*}(AK - KA) = -\operatorname{tr}(YAK - YKA) = \operatorname{tr}[A, Y]K.$$
This is possible if and only if $[A, Y] = 0$. •
The set of all matrices $Y$ that commute with $A$ is called the commutant or the centraliser of $A$, and is denoted by $Z(A)$. The lemma above says that in the space $K(n)$, $(T_A\mathcal{U}_A)^{\perp} = Z(A)$ for every $A$.
Theorem VI.4.3 Let $A \in K(n)$ and let $f(X) = \|A - X\|_2$. Let $B$ be any other element of $K(n)$. Then $B_0$ is an extreme point for the function $f$ on the unitary orbit $\mathcal{U}_B$ if and only if $B_0$ commutes with $A$.

Proof. A point $B_0$ is an extreme point if and only if the straight line joining $A$ and $B_0$ is perpendicular to $\mathcal{U}_B$ at $B_0$. By Lemma VI.4.2 this is so if and only if $A - B_0$ commutes with $B_0$, i.e., if and only if $A$ commutes with $B_0$. •
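The statement of inequality (VI.36) lends itself to a quick numerical check. The sketch below (an illustration, not part of the original text; it assumes NumPy, and the diagonal matrices are arbitrary test choices) conjugates a diagonal $B$ by random unitaries and confirms that the Frobenius distance to a fixed diagonal $A$ never leaves the interval spanned by the permutation orbit:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

a = np.array([0.0, 1.0, 3.0])          # eigenvalues of the diagonal matrix A
b = np.array([2.0, -1.0, 0.5])         # eigenvalues of the diagonal matrix B
A = np.diag(a)

def frob(X):
    return np.linalg.norm(X, "fro")

# Frobenius distances from A to the permutation orbit of B.
perm_dists = [frob(A - np.diag(b[list(p)]))
              for p in itertools.permutations(range(3))]
lo, hi = min(perm_dists), max(perm_dists)

# Distances from A to random points of the unitary orbit of B.
B = np.diag(b).astype(complex)
for _ in range(200):
    Z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    Q, _ = np.linalg.qr(Z)             # Q is unitary
    d = frob(A - Q @ B @ Q.conj().T)
    assert lo - 1e-9 <= d <= hi + 1e-9  # inequality (VI.36)

print(lo, hi)
```

For these particular eigenvalues the permutation-orbit extremes are 1.5 and 4.5, and every random unitary conjugate lands between them.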
For skew-Hermitian (or Hermitian) matrices $A, B$, this gives another proof of Theorem VI.4.1. However, in this case Theorem VI.4.3 says much more. From the first theorem we can conclude that if $A$ and $B$ are normal, then the global minimum and maximum of the (Frobenius) distance from $A$ to $\mathcal{U}_B$ are attained among matrices that commute with $A$. The second theorem says that when $A$ and $B$ are both Hermitian, this is true for all local extrema as well.

This last statement is not true when $A$ is Hermitian and $B$ is skew-Hermitian. For in this case,
$$\|A - UBU^{*}\|_2^2 = \|A\|_2^2 + \|UBU^{*}\|_2^2 = \|A\|_2^2 + \|B\|_2^2$$
for all $U$. Thus the entire orbit $\mathcal{U}_B$ is at a constant distance from $A$. Hence, every point on $\mathcal{U}_B$ is an extremal point. However, not every point on $\mathcal{U}_B$ need commute with $A$.
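This constant-distance phenomenon is easy to verify numerically. A minimal sketch (assuming NumPy; the random $4\times 4$ matrices are an arbitrary choice), exploiting that a Hermitian and a skew-Hermitian matrix are orthogonal in the trace inner product:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (X + X.conj().T) / 2               # Hermitian
Y = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = (Y - Y.conj().T) / 2               # skew-Hermitian

target = np.sqrt(np.linalg.norm(A, "fro")**2 + np.linalg.norm(B, "fro")**2)

for _ in range(50):
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(Z)             # random unitary
    d = np.linalg.norm(A - Q @ B @ Q.conj().T, "fro")
    assert abs(d - target) < 1e-9      # the whole orbit is equidistant from A
```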
VI.5 Geometry and Spectral Variation: the Operator Norm
The first theorem below says that if $A$ is a normal matrix and $B$ is any matrix close to $A$, then the optimal matching distance $d(\sigma(A), \sigma(B))$ is bounded by $\|A - B\|$. This is a local phenomenon; global versions of this are what we seek in the next paragraphs.
Theorem VI.5.1 Let $A$ be a normal matrix, and let $B$ be any matrix such that $\|A - B\|$ is smaller than half the distance between any two distinct eigenvalues of $A$. Then $d(\sigma(A), \sigma(B)) \le \|A - B\|$.
Proof. Let $\alpha_1, \ldots, \alpha_k$ be all the distinct eigenvalues of $A$. Let $\epsilon = \|A - B\|$. By Theorem VI.3.3, all the eigenvalues of $B$ lie in the union of the disks $D(\alpha_j, \epsilon)$. By the hypothesis, these disks are mutually disjoint. We claim that if the eigenvalue $\alpha_j$ has multiplicity $m_j$, then the disk $D(\alpha_j, \epsilon)$ contains exactly $m_j$ eigenvalues of $B$, counted with their respective multiplicities. Once this is established, the statement of the theorem follows easily.

Let $A(t) = (1-t)A + tB$, $0 \le t \le 1$. This is a continuous map from $[0,1]$ into the space of matrices, and we have $A(0) = A$, $A(1) = B$. Note that $\|A - A(t)\| = t\epsilon$, and so all the eigenvalues of $A(t)$ also lie in the disks $D(\alpha_j, \epsilon)$ for each $0 \le t \le 1$. By Corollary VI.1.6, as $t$ moves from 0 to 1, the eigenvalues of $A(t)$ trace continuous curves that join the eigenvalues of $A$ to those of $B$. None of these curves can jump from one of the disks $D(\alpha_j, \epsilon)$ to another. So, if we start off with $m_j$ such curves in the disk $D(\alpha_j, \epsilon)$, we must end up with exactly as many. •
Example VI.3.13 shows that if no condition is imposed on $B$, then the conclusion of the theorem above is no longer valid, even when $B$ is normal. However, this does suggest a new approach to the problem. Let $A, B$ be normal matrices, and let $\gamma(t)$ be a curve joining $A$ and $B$, such that each $\gamma(t)$ is a normal matrix. Then in a small neighbourhood of $\gamma(t)$ the spectral variation inequality of Theorem VI.5.1 holds. So, the (total) spectral variation between the endpoints of the curve must be bounded by the length of this curve. This idea is made precise below.

Let $N$ denote the set of normal matrices of a fixed size $n$. If $A$ is an element of $N$, then so is $tA$ for all real $t$. Thus the set $N$ is path-connected. However, $N$ is not an affine set. A continuous map $\gamma$ from any interval $[a, b]$ into $N$ will be called a normal path or a normal curve. If $\gamma(a) = A$ and $\gamma(b) = B$, we say that $\gamma$ is a path joining $A$ and $B$; $A$ and $B$ are then the endpoints of $\gamma$. The length of $\gamma$, with respect to the norm $\|\cdot\|$, is defined as
$$\ell_{\|\cdot\|}(\gamma) = \sup \sum_{k=0}^{m-1}\|\gamma(t_{k+1}) - \gamma(t_k)\|, \tag{VI.38}$$
where the supremum is taken over all partitions of $[a, b]$ as $a = t_0 < t_1 < \cdots < t_m = b$. If this length is finite, the path $\gamma$ is said to be rectifiable. If the function $\gamma$ is a piecewise $C^1$ function, then
$$\ell_{\|\cdot\|}(\gamma) = \int_a^b \|\gamma'(t)\|\,dt. \tag{VI.39}$$
Theorem VI.5.2 Let $A$ and $B$ be normal matrices, and let $\gamma$ be a rectifiable normal path joining them. Then
$$d(\sigma(A), \sigma(B)) \le \ell_{\|\cdot\|}(\gamma). \tag{VI.40}$$
Proof. For convenience, let us choose the parameter $t$ to vary in $[0, 1]$. For $0 \le r \le 1$, let $\gamma_r$ be that part of the curve which is parametrised by $[0, r]$. Let
$$G = \{r \in (0, 1] : d(\sigma(A), \sigma(\gamma(r))) \le \ell_{\|\cdot\|}(\gamma_r)\}.$$
The theorem will be proved if we show that the point 1 is in $G$. Since the function $\gamma$, the arclength, and the distance $d$ are all continuous in their arguments, the set $G$ is closed. So it contains the point $g = \sup G$. We have to show that $g = 1$. Suppose $g < 1$. Let $S = \gamma(g)$. Using Theorem VI.5.1, we can find a point $t$ in $(g, 1]$ such that, if $T = \gamma(t)$, then $d(\sigma(S), \sigma(T)) \le \|S - T\|$. But then
$$d(\sigma(A), \sigma(\gamma(t))) \le d(\sigma(A), \sigma(S)) + d(\sigma(S), \sigma(T)) \le \ell_{\|\cdot\|}(\gamma_g) + \|S - T\| \le \ell_{\|\cdot\|}(\gamma_t).$$
By the definition of $g$, this is not possible. •
An effective estimate of $d(\sigma(A), \sigma(B))$ can thus be obtained if one could find the length of the shortest normal path joining $A$ and $B$. This is a difficult problem, since the geometry of the set $N$ is poorly understood. However, the theorem above does have several interesting consequences.
Exercise VI.5.3 Let $A, B \in N$. Then the line segment joining $A$ and $B$ lies in $N$ if and only if $A - B$ is in $N$.

Theorem VI.5.4 If $A, B$ are normal matrices such that $A - B$ is also normal, then $d(\sigma(A), \sigma(B)) \le \|A - B\|$.

Proof. The path $\gamma$ consisting of the line segment joining $A, B$ is a normal path by Exercise VI.5.3. Its length is $\|A - B\|$. •
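For Hermitian matrices the theorem just proved reduces to Weyl's perturbation bound, which can be checked numerically. A sketch (assuming NumPy; for Hermitian matrices the operator-norm optimal matching pairs eigenvalues in sorted order):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
X = rng.normal(size=(n, n)); A = (X + X.T) / 2   # Hermitian
Y = rng.normal(size=(n, n)); B = (Y + Y.T) / 2   # Hermitian, so A - B is too

# Optimal matching distance: sorted eigenvalue lists, sup metric.
d = np.max(np.abs(np.sort(np.linalg.eigvalsh(A)) - np.sort(np.linalg.eigvalsh(B))))
assert d <= np.linalg.norm(A - B, 2) + 1e-12      # Weyl's perturbation theorem
```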
For Hermitian matrices $A, B$, the condition of the theorem above is satisfied. So this theorem includes Weyl's perturbation theorem as a special case.

A more substantial application of Theorem VI.5.2 is obtained as follows. It turns out that there exist normal matrices $A, B$ for which $A - B$ is not normal, but there exists a normal path that joins them and has length $\|A - B\|$. Note that this path cannot be the line segment joining $A$ and $B$; however, it has the same length as the line segment. What makes this possible is the fact that the metric under consideration is not Euclidean, and so geodesics need not always be straight lines. (Of course, by the definition of the arclength and the triangle inequality, no path joining $A, B$ could have length smaller than $\|A - B\|$.)

Let $S$ be any subset of $M(n)$. We will say that $S$ is metrically flat in the metric induced by the norm $\|\cdot\|$ if any two points $A, B$ of $S$ can be joined by a path that lies entirely within $S$ and has length $\|A - B\|$. To emphasize the dependence on the norm $\|\cdot\|$, we will also call such a set $\|\cdot\|$-flat. Of course, every affine set is metrically flat. A nontrivial example of a $\|\cdot\|$-flat set is given by the theorem below. Let $\mathcal{U}$ be the set of $n \times n$ unitary matrices and $\mathbb{C}\cdot\mathcal{U}$ the set of all constant multiples of unitary matrices.
Theorem VI.5.5 The set $\mathbb{C}\cdot\mathcal{U}$ is $\|\cdot\|$-flat.
Proof. First note that $\mathbb{C}\cdot\mathcal{U}$ consists of just nonnegative real multiples of unitary matrices. Let $A_0 = r_0U_0$ and $A_1 = r_1U_1$ be any two elements of this set, where $r_0, r_1 \ge 0$. Choose an orthonormal basis in which the unitary matrix $U_1U_0^{-1}$ is diagonal:
$$U_1U_0^{-1} = \operatorname{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n}), \qquad -\pi < \theta_j \le \pi.$$
Reduction to such a form can be achieved by a unitary conjugation. Such a process changes neither eigenvalues nor norms. So, we may assume that
all matrices are written with respect to the above orthonormal basis. Let
$$K = \operatorname{diag}(i\theta_1, \ldots, i\theta_n).$$
Then $K$ is a skew-Hermitian matrix whose eigenvalues are in the interval $(-i\pi, i\pi]$. We have
$$\|r_0I - r_1U_1U_0^{-1}\| = \max_j |r_0 - r_1\exp(i\theta_j)|;$$
we may assume that this maximum is attained at $j = 1$.
This last quantity is the length of the straight line joining the points $r_0$ and $r_1\exp(i\theta_1)$ in the complex plane. Parametrise this line segment as $r(t)\exp(it\theta_1)$, $0 \le t \le 1$. This can be done except when $|\theta_1| = \pi$, an exceptional case to which we will return later. The equation above can then be written as
$$\|A_0 - A_1\| = \int_0^1 \big|[r(t)\exp(it\theta_1)]'\big|\,dt = \int_0^1 |r'(t) + ir(t)\theta_1|\,dt.$$
Now let $A(t) = r(t)\exp(tK)U_0$, $0 \le t \le 1$. This is a smooth curve in $\mathbb{C}\cdot\mathcal{U}$ with endpoints $A_0$ and $A_1$. The length of this curve is
$$\int_0^1 \|A'(t)\|\,dt = \int_0^1 \|r'(t)\exp(tK)U_0 + r(t)K\exp(tK)U_0\|\,dt = \int_0^1 \|r'(t)I + r(t)K\|\,dt,$$
since $\exp(tK)U_0$ is a unitary matrix. But
$$\|r'(t)I + r(t)K\| = \max_j |r'(t) + ir(t)\theta_j| = |r'(t) + ir(t)\theta_1|.$$
Putting the last three equations together, we see that the path $A(t)$ joining $A_0$ and $A_1$ has length $\|A_0 - A_1\|$.

The exceptional case $|\theta_1| = \pi$ is much simpler. The piecewise linear path that joins $A_0$ to 0 and then to $A_1$ has length $r_0 + r_1$. This is equal to $|r_0 - r_1\exp(i\theta_1)|$ and hence to $\|A_0 - A_1\|$. •

Using Theorems VI.5.2 and VI.5.5, we obtain another proof of the result of Exercise VI.3.12.
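The construction in the proof can be traced numerically. The sketch below (assuming NumPy; the angles and radii are arbitrary test values, with the largest $|\theta_j|$ placed first so the maximum in the proof is attained at $j=1$) builds the path $A(t) = r(t)\exp(tK)U_0$ with $U_0 = I$ and checks that its discretised operator-norm length equals $\|A_0 - A_1\|$:

```python
import numpy as np

theta = np.array([1.5, 0.7, -0.3])     # eigen-angles of U1; max |theta| first
r0, r1 = 1.0, 2.0
A0 = r0 * np.eye(3, dtype=complex)
A1 = r1 * np.diag(np.exp(1j * theta))

op = lambda X: np.linalg.norm(X, 2)    # operator norm
chord = op(A0 - A1)                    # = max_j |r0 - r1 e^{i theta_j}|

# Straight segment from r0 to r1*e^{i theta_1} in C, written in polar form;
# its argument, divided by theta_1, recovers the path parameter t.
s = np.linspace(0.0, 1.0, 4001)
z = (1 - s) * r0 + s * r1 * np.exp(1j * theta[0])
t = np.angle(z) / theta[0]

length, prev = 0.0, A0
for k in range(1, len(s)):
    cur = np.abs(z[k]) * np.diag(np.exp(1j * t[k] * theta))
    length += op(cur - prev)
    prev = cur

assert abs(length - chord) < 1e-9      # the path is a geodesic: length = chord
```

The path stays inside $\mathbb{C}\cdot\mathcal{U}$ at every step (a positive multiple of a diagonal unitary), yet it is not the straight segment from $A_0$ to $A_1$.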
Exercise VI.5.6 Let $A, B$ be normal matrices whose eigenvalues lie on concentric circles $C(A)$ and $C(B)$, respectively. Show that $d(\sigma(A), \sigma(B)) \le \|A - B\|$.
Theorem VI.5.7 The set $N$ consisting of $n \times n$ normal matrices is $\|\cdot\|$-flat if and only if $n \le 2$.
Proof. Let $A, B$ be $2 \times 2$ normal matrices. If the eigenvalues of $A$ and those of $B$ lie on two parallel lines, we may assume that these lines are parallel to the real axis. Then the skew-Hermitian part of $A - B$ is a scalar, and hence $A - B$ is normal. The straight line joining $A$ and $B$ then lies in $N$. If the eigenvalues do not lie on parallel lines, then they lie on two concentric circles. If $a$ is the common centre of these circles, then $A$ and $B$ are in the set $a + \mathbb{C}\cdot\mathcal{U}$. This set is $\|\cdot\|$-flat. Thus, in either case, $A$ and $B$ can be joined by a normal path of length $\|A - B\|$. If $n \ge 3$, then $N$ cannot be $\|\cdot\|$-flat, because of Theorem VI.5.2 and Example VI.3.13. •
Example VI.5.8 Here is an example of a Hermitian matrix $A$ and a skew-Hermitian matrix $B$ that cannot be joined by a normal path of length $\|A - B\|$. Let
$$A = \begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 & 0\\ -1 & 0 & 1\\ 0 & -1 & 0 \end{pmatrix}.$$
Then $\|A - B\| = 2$. If there were a normal path of length 2 joining $A, B$, then the midpoint of this path would be a normal matrix $C$ such that $\|A - C\| = \|B - C\| = 1$. Since each entry of a matrix is dominated by its norm, this implies that $|c_{21} - 1| \le 1$ and $|c_{21} + 1| \le 1$. Hence $c_{21} = 0$. By the same argument, $c_{32} = 0$. So
$$A - C = \begin{pmatrix} * & * & *\\ 1 & * & *\\ * & 1 & * \end{pmatrix},$$
where $*$ represents an entry whose value is not yet known. But if $\|A - C\| = 1$, we must have
$$A - C = \begin{pmatrix} 0 & 0 & *\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{pmatrix}. \qquad \text{Hence } C = \begin{pmatrix} 0 & 1 & *\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix}.$$
But then $C$ could not have been normal. •
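The assertions of this example can be checked directly. A sketch (assuming NumPy, and using the matrices as read above, with the candidate midpoint $C$ taken with its free entry set to zero):

```python
import numpy as np

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=complex)    # Hermitian
B = np.array([[0, 1, 0], [-1, 0, 1], [0, -1, 0]], dtype=complex)  # skew-Hermitian
C = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=complex)    # forced midpoint

op = lambda X: np.linalg.norm(X, 2)
assert abs(op(A - B) - 2) < 1e-12                 # ||A - B|| = 2
assert abs(op(A - C) - 1) < 1e-12                 # C is equidistant ...
assert abs(op(B - C) - 1) < 1e-12                 # ... from A and B
# ... but C fails to be normal:
assert op(C @ C.conj().T - C.conj().T @ C) > 0.9
```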
VI.6 Geometry and Spectral Variation: wui Norms
In this section we consider the possibility of extending to all (weakly) unitarily invariant norms results obtained in the previous section for the operator norm. Given a wui norm $\tau$, the $\tau$-optimal matching distance between the eigenvalues of two (normal) matrices $A, B$ is defined as
$$d_{\tau}(\sigma(A), \sigma(B)) = \min_{P}\,\tau\big(\operatorname{Eig}A - P(\operatorname{Eig}B)P^{-1}\big), \tag{VI.41}$$
where, as before, $\operatorname{Eig}A$ is a diagonal matrix with the eigenvalues of $A$ down its diagonal (in any order) and where $P$ varies over all permutation matrices. We want to compare this with the distance $\tau(A - B)$. The main result in this section is an extension of the path inequality of Theorem VI.5.2 to all wui norms. From this several interesting conclusions can be drawn. Let us begin with an example illustrating that not all results for the operator norm have straightforward extensions.

Example VI.6.1 For $0 \le t \le \pi$, let
$$U(t) = \begin{pmatrix} 0 & e^{it}\\ 1 & 0 \end{pmatrix}.$$
Then, for every unitarily invariant norm,
$$|||U(t) - U(0)||| = |1 - e^{it}| = 2\sin\frac{t}{2}.$$
In the trace norm (the Schatten 1-norm), we have
$$d_1\big(\sigma(U(t)), \sigma(U(0))\big) = 2\,|1 - e^{it/2}| = 4\sin\frac{t}{4}.$$
So,
$$\frac{d_1\big(\sigma(U(t)), \sigma(U(0))\big)}{\|U(t) - U(0)\|_1} = \sec\frac{t}{4} > 1 \quad \text{for } t \ne 0.$$
Thus, we might have $d_1(\sigma(A), \sigma(B)) > \|A - B\|_1$ even for arbitrarily close normal matrices $A, B$. Compare this with Theorems VI.5.1 and VI.4.1.
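The numbers in this example can be reproduced with a short computation (pure standard-library Python; $t = 1$ is an arbitrary test value, and the matrix form of $U(t)$ is as reconstructed above):

```python
import cmath
import math

t = 1.0
# Eigenvalues of U(t) = [[0, e^{it}], [1, 0]] are the square roots of e^{it}.
evs_t = [cmath.exp(1j * t / 2), -cmath.exp(1j * t / 2)]
evs_0 = [1, -1]                                  # eigenvalues of U(0)

# U(t) - U(0) has rank one, with single singular value |e^{it} - 1|.
trace_norm = abs(cmath.exp(1j * t) - 1)

# Trace-norm optimal matching: the better of the two possible pairings.
d1 = min(abs(evs_t[0] - evs_0[0]) + abs(evs_t[1] - evs_0[1]),
         abs(evs_t[0] - evs_0[1]) + abs(evs_t[1] - evs_0[0]))

assert abs(trace_norm - 2 * math.sin(t / 2)) < 1e-12
assert abs(d1 - 4 * math.sin(t / 4)) < 1e-12
assert d1 / trace_norm > 1                       # ratio is sec(t/4) > 1
```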
The Q-norms are special in this respect, as we will see below. Let $\Phi$ be any finite subset of $\mathbb{C}$. A map $F : \mathbb{C} \to \Phi$ is called a retraction onto $\Phi$ if $|z - F(z)| = \operatorname{dist}(z, \Phi)$, i.e., $F$ maps every point in $\mathbb{C}$ to one of the points in $\Phi$ that is at the least distance from it. Such an $F$ is not unique if $\Phi$ has more than one element. Let $\Phi$ be a subset of $\mathbb{C}$ that has at most $n$ elements, and let $N(\Phi)$ be the set of all $n \times n$ normal matrices $A$ such that $\sigma(A) \subset \Phi$. If $F$ is a retraction onto $\Phi$, then for every normal matrix $B$ with eigenvalues $\beta_1, \ldots, \beta_n$ and for every $A$ in $N(\Phi)$ we have
$$\|B - F(B)\| = \max_j |\beta_j - F(\beta_j)| = \max_j \operatorname{dist}(\beta_j, \Phi) \le s(\sigma(B), \sigma(A)) \le \|B - A\| \tag{VI.42}$$
by Theorem VI.3.3. Note that the normality of $B$ was required at the first step and that of $A$ at the last. This inequality has a generalisation.

Theorem VI.6.2 Let $\Phi$ be a finite set of cardinality at most $n$. Let $F$ be a retraction onto $\Phi$. Then for every normal matrix $B$ and for every $A \in N(\Phi)$ we have
$$\|B - F(B)\|_Q \le \|B - A\|_Q \tag{VI.43}$$
for every Q-norm.
Proof. By Exercise IV.2.10, the inequality (VI.43) is equivalent to the weak majorisation
$$\big[s\big(B - F(B)\big)\big]^2 \prec_w \big[s(A - B)\big]^2.$$
If $\beta_1, \ldots, \beta_n$ are the eigenvalues of $B$, this is equivalent to saying that for all $1 \le k \le n$ we have
$$\sum_{j=1}^{k} |\beta_{i_j} - F(\beta_{i_j})|^2 \le \sum_{j=1}^{k} s_j^2(A - B)$$
for every choice of indices $i_1, \ldots, i_k$. By Ky Fan's maximum principle (Exercise II.1.13),
$$\sum_{j=1}^{k} s_j^2(A - B) = \max \sum_{j=1}^{k} \|(A - B)v_j\|^2,$$
where the maximum is taken over all orthonormal $k$-tuples $v_1, \ldots, v_k$. In particular, if $e_j$ are unit vectors such that $Be_j = \beta_j e_j$, then
$$\sum_{j=1}^{k} s_j^2(A - B) \ge \sum_{j=1}^{k} \|(A - \beta_{i_j})e_{i_j}\|^2.$$
But if $\beta$ is any complex number and $e$ any unit vector, then $\|(A - \beta)e\| \ge \operatorname{dist}(\beta, \sigma(A))$. (See the second proof of Theorem VI.3.3.) Hence, we have
$$\sum_{j=1}^{k} s_j^2(A - B) \ge \sum_{j=1}^{k} |\beta_{i_j} - F(\beta_{i_j})|^2,$$
and this completes the proof. •
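Theorem VI.6.2 can be tested numerically for the Frobenius norm, which is a Q-norm. A sketch (assuming NumPy; the spectra are random choices), building the retraction $F$ on the eigenvalues of a normal $B$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5

def random_unitary():
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

phi = rng.normal(size=n) + 1j * rng.normal(size=n)   # the finite set Phi
QA = random_unitary()
A = QA @ np.diag(phi) @ QA.conj().T                  # a member of N(Phi)

beta = rng.normal(size=n) + 1j * rng.normal(size=n)
QB = random_unitary()
B = QB @ np.diag(beta) @ QB.conj().T                 # an arbitrary normal B

# Retraction: send each eigenvalue of B to a nearest point of Phi.
F_beta = np.array([phi[np.argmin(np.abs(phi - b))] for b in beta])
FB = QB @ np.diag(F_beta) @ QB.conj().T

fro = lambda X: np.linalg.norm(X, "fro")
assert fro(B - FB) <= fro(B - A) + 1e-9              # inequality (VI.43)
```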
Exercise VI.6.3 Show that the assertion of Theorem VI.6.2 is not true for the Schatten $p$-norms, $1 \le p < 2$. (See the example in (VI.23).)
Corollary VI.6.4 Let $A$ be a normal matrix, and let $B$ be another normal matrix such that $\|A - B\|$ is smaller than half the distance between the distinct eigenvalues of $A$. Then
$$d_Q(\sigma(A), \sigma(B)) \le \|A - B\|_Q$$
for every Q-norm. (The quantity on the left is the Q-norm optimal matching distance.)

Proof. Let $\epsilon = \|A - B\|$. In the proof of Theorem VI.5.1 we saw that all the eigenvalues of $B$ lie within the region comprising the disks of radius $\epsilon$ around the eigenvalues of $A$. Further, each such disk contains as many eigenvalues of $A$ as of $B$ (multiplicities counted). The retraction $F$ of Theorem VI.6.2 then achieves a one-to-one pairing of the eigenvalues of $A$ and those of $B$. •
Replacing the operator norm by any other norm $\tau$ in (VI.38), we can define the $\tau$-length of a path $\gamma$ by the same formula. Denote this by $\ell_{\tau}(\gamma)$.
Exercise VI.6.5 Let $A$ and $B$ be normal matrices, and let $\gamma$ be a normal path joining them. Then for every Q-norm we have
$$d_Q(\sigma(A), \sigma(B)) \le \ell_Q(\gamma).$$
This includes Theorem VI.5.2 as a special case. We will now extend this inequality to its broadest context.
Proposition VI.6.6 Let $A$ be a normal matrix and let $\delta$ be half the minimum distance between the distinct eigenvalues of $A$. Then there exists a positive number $M$ (depending on $\delta$ and the dimension $n$) such that any normal matrix $B$ with $\|A - B\| < \delta$ has a representation $B = UB'U^{*}$, where $B'$ commutes with $A$ and $U$ is a unitary matrix with $\|I - U\| \le M\|A - B\|$.
Proof. Let $\alpha_j$, $1 \le j \le r$, be the distinct eigenvalues of $A$, and let $m_j$ be the multiplicity of $\alpha_j$. Choose an orthonormal basis in which $A = \oplus_j \alpha_j I_j$, where $I_j$, $1 \le j \le r$, are identity submatrices of dimensions $m_j$. By the argument used in the proof of Theorem VI.5.1, the eigenvalues of $B$ can be grouped into diagonal blocks $D_j$, where $D_j$ has dimension $m_j$ and every eigenvalue of $D_j$ is within distance $\delta$ of $\alpha_j$. This implies that
$$\|(\alpha_jI_k - D_k)^{-1}\| \le \frac{1}{\delta} \quad \text{if } j \ne k.$$
If $D = \oplus_j D_j$, then there exists a unitary matrix $W$ such that $B = WDW^{*}$. With respect to the above splitting, let $W$ have the block decomposition $W = [W_{jk}]$, $1 \le j, k \le r$. Then
$$\|A - B\| = \|A - WDW^{*}\| = \|AW - WD\| = \big\|[W_{jk}(\alpha_jI_k - D_k)]\big\|.$$
Hence, for $j \ne k$, $\|W_{jk}\| \le \frac{1}{\delta}\|A - B\|$. Hence, there exists a constant $K$ that depends only on $\delta$ and $n$ such that $\|W - X\| \le K\|A - B\|$, where $X = \oplus_j W_{jj}$ is the diagonal part in the block decomposition of $W$. Note that $\|X\| \le 1$ by the pinching inequality. Let $W_{jj} = V_jP_j$ be the polar decomposition of $W_{jj}$, with $V_j$ unitary and $P_j$ positive. Then
$$\|W_{jj} - V_j\| = \|P_j - I_j\| \le \|P_j^2 - I_j\| = \|W_{jj}^{*}W_{jj} - I_j\|,$$
since $P_j$ is a contraction. Let $V = \oplus_j V_j$. Then $V$ is unitary, and from the above inequality we see that
$$\|X - V\| \le \|X^{*}X - I\| = \|X^{*}X - W^{*}W\|.$$
Hence,
$$\|W - V\| \le \|W - X\| + \|X - V\| \le \|W - X\| + \|X^{*}X - W^{*}W\| \le \|W - X\| + \|(X^{*} - W^{*})X\| + \|W^{*}(X - W)\| \le 3\|W - X\| \le 3K\|A - B\|.$$
If we put $U = WV^{*}$ and $M = 3K$, we have $\|I - U\| \le M\|A - B\|$ and $B = WDW^{*} = UVDV^{*}U^{*} = UB'U^{*}$, where $B' = VDV^{*}$. Since $B'$ is block-diagonal with diagonal blocks of size $m_j$, it commutes with $A$. This completes the proof. •
Proposition VI.6.7 Given a normal matrix $A$, a wui norm $\tau$, and an $\epsilon > 0$, there exists a small neighbourhood of $A$ such that for any normal matrix $B$ in this neighbourhood we have
$$d_{\tau}(\sigma(A), \sigma(B)) \le (1 + \epsilon)\,\tau(A - B).$$

Proof. Choose $B$ so close to $A$ that the conditions of Proposition VI.6.6 are satisfied. Let $U, B', M$ be as in that proposition. Let $S = U - I$, so that $U = I + S$ and $U^{*} = I - S + S^2U^{*}$. Then
$$A - B = A - B' + [B', S] + UB'SU^{*} - B'S.$$
Hence,
$$\tau(A - B' + [B', S]) \le \tau(A - B) + \tau(UB'SU^{*} - B'S).$$
Since $A$ and $B'$ are commuting normal matrices, they can be diagonalised simultaneously by a unitary conjugation. In this basis the diagonal of $[B', S]$ will be zero. So, by the pinching inequality,
$$\tau(A - B') \le \tau(A - B' + [B', S]).$$
The two inequalities above give
$$d_{\tau}(\sigma(A), \sigma(B)) \le \tau(A - B) + \tau(UB'SU^{*} - B'S).$$
Now choose $k$ such that
$$\frac{1}{k} \le \frac{\tau(X)}{\|X\|} \le k \quad \text{for all } X \ne 0.$$
Then, using Proposition VI.6.6, we get
$$\tau(UB'SU^{*} - B'S) \le 2\,\tau(B'S) \le 2kM\|B\|\,\|A - B\| \le 2k^2M\|B\|\,\tau(A - B).$$
Now, if $B$ is so close to $A$ that we have $2k^2M\|B\| \le \epsilon$, then the inequality of the proposition is valid. •
Theorem VI.6.8 Let $A, B$ be normal matrices, and let $\gamma$ be any normal path joining them. Then there exists a permutation matrix $P$ such that for every wui norm $\tau$ we have
$$\tau\big(\operatorname{Eig}A - P(\operatorname{Eig}B)P^{-1}\big) \le \ell_{\tau}(\gamma). \tag{VI.44}$$
Proof. For convenience, let $\gamma(t)$ be parametrised on the interval $[0, 1]$. Let $\gamma(0) = A$, $\gamma(1) = B$. By Theorem VI.1.4, there exist continuous functions $\lambda_1(t), \ldots, \lambda_n(t)$ that represent the eigenvalues of the matrix $\gamma(t)$ for each $t$. Let $D(t)$ be the diagonal matrix with diagonal entries $\lambda_j(t)$. We will show that
$$\tau\big(D(0) - D(1)\big) \le \ell_{\tau}(\gamma). \tag{VI.45}$$
Let $\tau$ be any wui norm, and let $\epsilon$ be any positive number. Let $\gamma[s, t]$ denote the part of the path $\gamma(\cdot)$ that is defined on $[s, t]$. Let
$$G = \{t : \tau(D(0) - D(t)) \le (1 + \epsilon)\,\ell_{\tau}(\gamma[0, t])\}. \tag{VI.46}$$
Because of continuity, $G$ is a closed set, and hence it includes its supremum $g$. We will show that $g = 1$. If this is not the case, then we can choose $g' > g$ so close to $g$ that Proposition VI.6.7 guarantees
$$\tau\big(D(g) - PD(g')P^{-1}\big) \le (1 + \epsilon)\,\tau\big(\gamma(g) - \gamma(g')\big) \tag{VI.47}$$
for some permutation matrix $P$. Now note that
$$\tau\big(D(g) - P^{-1}D(g)P\big) \le \tau\big(D(g') - D(g)\big) + \tau\big(D(g) - PD(g')P^{-1}\big),$$
and hence, if $g'$ is sufficiently close to $g$, we will have $\tau(D(g) - P^{-1}D(g)P)$ small relative to the minimum distance between the distinct eigenvalues of $D(g)$. We thus have $D(g) = P^{-1}D(g)P$. Hence
$$\tau\big(D(g) - D(g')\big) = \tau\big(P^{-1}D(g)P - D(g')\big) = \tau\big(D(g) - PD(g')P^{-1}\big).$$
So, from (VI.47),
$$\tau\big(D(g) - D(g')\big) \le (1 + \epsilon)\,\tau\big(\gamma(g) - \gamma(g')\big).$$
From the definition of $g$ as the supremum of the set $G$ in (VI.46), we have
$$\tau\big(D(0) - D(g)\big) \le (1 + \epsilon)\,\ell_{\tau}(\gamma[0, g]).$$
Combining the two inequalities above, we get
$$\tau\big(D(0) - D(g')\big) \le (1 + \epsilon)\,\ell_{\tau}(\gamma[0, g']).$$
This contradicts the definition of $g$. So $g = 1$. •
The inequality (VI.45) tells us not only that for all normal $A, B$ and for all wui norms $\tau$ we have
$$d_{\tau}(\sigma(A), \sigma(B)) \le \ell_{\tau}(\gamma), \tag{VI.48}$$
but also that a matching of $\sigma(A)$ with $\sigma(B)$ can be chosen which makes this work simultaneously for all $\tau$. Further, this matching is the natural one obtained by following the curves $\lambda_j(t)$ that describe the eigenvalues of the family $\gamma(t)$. Several corollaries can be obtained now.
Let A , B be unitary matrices, and let K be any skew Hermitian matrix such that BA 1 e p K . Then, for every unitarily in variant norm I l l Il l , we have,
Theorem VI.6.9
==
x
·
d lll · l l l ( o ( A ) , o ( B) ) < Il l K il l 
Proof. Let 1 (t) == (exp tK)A, 0 < t < 1 . Then t , !(0) == A, ! ( 1 ) == B. So, by Theorem VL 6.8, d lll · lll ( cr ( A ) , cr ( B) )
<
(VL49)
1 (t)
j ll h/ (t) l l ! dt .
==
K ( exp t K) A So l ll ! ' ( t ) l ll == III K III 
Theorem VI.6. 10
invariant norm
unitary for all
I
0
But 1' (t)
is
.
•
Let A, B be unitary matrices. Then for every unitarily d lll · l l l ( cr ( A ) , cr ( B ) ) <
; lil A  B i ll 
(VI. 50)
Proof. In view of Theorem VI.6.9, we need to show that
$$\inf\{|||K||| : BA^{-1} = \exp K\} \le \frac{\pi}{2}\,|||A - B|||.$$
Choose a $K$ whose eigenvalues are contained in the interval $(-i\pi, i\pi]$. By applying a unitary conjugation, we may assume that $K = \operatorname{diag}(i\theta_1, \ldots, i\theta_n)$. Then
$$|||A - B||| = |||I - BA^{-1}||| = |||\operatorname{diag}(1 - e^{i\theta_1}, \ldots, 1 - e^{i\theta_n})|||.$$
But if $-\pi < \theta \le \pi$, then $|\theta| \le \frac{\pi}{2}|1 - e^{i\theta}|$. Hence, $|||K||| \le \frac{\pi}{2}\,|||A - B|||$ for every unitarily invariant norm. •
We now give an example to show that the factor $\pi/2$ in the inequality (VI.50) cannot be reduced if the inequality is to hold for all unitarily invariant norms and all dimensions. Recall that for the operator norm and for the Frobenius norm we have the stronger inequality with 1 instead of $\pi/2$ (Theorem VI.3.11 and Theorem VI.4.1).
e. '
Let A + and A _ be the unitary matrices obtained by in the bottom left corner to an upper Jordan matrix, 0
A± ==
0
0
±1
1 0
0 1
0 0
0 0
0 0
1 0
Then for the trace norrn we have I I A+  A_ ll 1 2 . The eigenvalues of A ± are the n roots of ± l . One can see that the J l · l h optimal matching distance between these two ntuples approaches as n oo . ==
�
n
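The quantities in this example are easy to compute for a moderate $n$ (a sketch assuming NumPy; $n = 6$ keeps the brute-force matching cheap):

```python
import itertools
import numpy as np

n = 6

def corner_jordan(sign):
    M = np.diag(np.ones(n - 1), 1)   # upper Jordan block
    M[-1, 0] = sign                  # bottom-left corner entry
    return M

Ap, Am = corner_jordan(1.0), corner_jordan(-1.0)

# The difference has a single nonzero entry of size 2: trace norm 2.
s = np.linalg.svd(Ap - Am, compute_uv=False)
assert abs(s.sum() - 2.0) < 1e-12

# Eigenvalues: n-th roots of +1 and of -1; brute-force the l1 matching.
ev_p = np.exp(2j * np.pi * np.arange(n) / n)
ev_m = np.exp(1j * np.pi * (2 * np.arange(n) + 1) / n)
d1 = min(sum(abs(ev_p[i] - ev_m[p[i]]) for i in range(n))
         for p in itertools.permutations(range(n)))

# The optimal matching rotates each root to its neighbour: 2n sin(pi/2n) -> pi.
assert abs(d1 - 2 * n * np.sin(np.pi / (2 * n))) < 1e-9
assert d1 > 2.0                      # already exceeds ||A+ - A-||_1
```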
The next theorem is a generalisation of, and can be proved using the same idea as, Theorem VI.5.4.
Theorem VI.6.12 If $A, B$ are normal matrices such that $A - B$ is also normal, then for every wui norm $\tau$,
$$d_{\tau}(\sigma(A), \sigma(B)) \le \tau(A - B). \tag{VI.51}$$
This inequality, or rather just its special case when $\tau$ is restricted to unitarily invariant norms and $A, B$ are Hermitian, can be used to get yet another proof of Lidskii's Theorem. We have seen this argument earlier in Chapter IV. The stronger result we now have at our disposal gives a stronger version of Lidskii's Theorem. This is shown below.

Let $x, y$ be elements of $\mathbb{C}^n$. We will say that $x$ is majorised by $y$, in symbols $x \prec y$, if $x$ is a convex combination of vectors obtained from $y$ by permuting its coordinates, i.e., $x = \sum a_{\sigma}y_{\sigma}$, a finite sum in which each $y_{\sigma}$ is
a vector whose coordinates are obtained by applying the permutation $\sigma$ to the coordinates of $y$, and the $a_{\sigma}$ are positive numbers with $\sum a_{\sigma} = 1$. When $x, y$ are real vectors, this is already familiar. We will say $x$ is softly majorised by $y$, in symbols $x \prec_s y$, if we can write $x$ as a finite sum $x = \sum z_{\sigma}y_{\sigma}$ in which the $z_{\sigma}$ are complex numbers such that $\sum|z_{\sigma}| \le 1$.

Exercise VI.6.13 Let $x, y$ be two vectors in $\mathbb{C}^n$ such that $\sum_{j=1}^{n}x_j$ and $\sum_{j=1}^{n}y_j$ are not zero. If $x \prec_s y$ and $\sum_{j=1}^{n}x_j = \sum_{j=1}^{n}y_j$, then $x \prec y$.

Proposition VI.6.14 Let $A, B$ be $n \times n$ normal matrices, and let $\lambda(A), \lambda(B)$ be two $n$-vectors whose coordinates are the eigenvalues of $A, B$, respectively. Then $\tau(A) \le \tau(B)$ for all wui norms $\tau$ if and only if $\lambda(A) \prec_s \lambda(B)$.
Proof. Suppose T ( A ) < T(B) for all wui norms T. Then, using The orem IV.4 . 7, we can write the diagonal matrix Eig(A) as a finite sum Eig(A) == L:zk UkEig(B) Uk: , in which Uk are unitary matrices and E l zk l < 1 . This shows that A. ( A ) == :Ezk Sk (A.(B) ) , where each Sk is an orthostochas tic matrix. (An orthostochastic matrix S is a doubly stochastic matrix such that S ij == l u ij 1 2 , where Uij are the entries of a unitary matrix . ) By Birkhoff ' s Theorem each Sk is a convex combination of permutation ma trices . Hence, A.(A) <s A.(B) . The converse follows by the same argument without recourse to Birkhoff's Theorem. •
Let A , B be normal matrices such that A  B is also normal. Then the eigenvalues of A and B can be arranged in such a way that if >.. ( A) and >.. ( B) are the nvectors with these eigenvalues as their coordinates, then
Theorem Vl. 6 . 1 5
>.. (A)  A.(B) < .A(A  B) .
( VI. 52 )
Proof. Use Theorem VL6.8 and the observation in Theorem VL 6. 12 to conclude that we can arrange the eigenvalues in such a way that T ( E ig A  Eig B) < T (A  B)
for every wui norm T. By Proposition VL6. 14, this is equivalent to saying A.(A)  A. ( B) < s A.(A  B) ,
where .A ( A) is the vector whose entries are the diagonal entries of the diag onal matrix Eig A. By a small perturbation, if necessary, we may assume that trA # tr B. Since the components of the vectors A. ( A)  A. ( B) and >.. (A  B) must have the same (nonzero) sum, we have in fact majorisation • rather than just the soft majorisation proved above.
V I . 7 Some Inequalities for the D eterminant
181
We::cc an call this the Lidskii Theorem for normal matrices . It in clu des the classical Lidskii Theorem as a special case. Exercise Vl. 6 . 1 6 Let A, B be normal matrices such that A B is also normal. Let T be any wui norm. Show that there exists a permutation matrix 
P such that
VI . 7
T ( A  B ) < T (Eig A  P(Eig B ) P 1 ) .
(VI. 53)
S ome Inequalities for t he Determinant
The determinant of the sum A + B of two matrices has no simple relation with the determinants of A and B. Some interesting inequalities can be derived using ideas introduced in this chapter. These are proved below.
Let A and B be Hermitian matrices with eigenvalues and fJ1 , . . . , f3n , respectively. Then
Theorem VI. 7.1 a 1 , . . . , an
n
min a IT (ai + f3a (i) ) < det (A + B ) < max
a
i=l
where O" varies over all permutations. Proof.
n
IT (a i + f3a(i) ) ,
i= l
(VI. 54)
I f A and B commute, they can be diagonalised simultaneously, n
and hence det (A + B)
II (a i + f3a ( i ) )
==
for some
O" .
So, the inequality
i =l (VL5 4) is trivial in this case. Next note that the two extreme sides of (VI. 54) are invariant under the transformation B U BU* for every unitary U. Hence, it suffices to prove that for a fixed Hermitian matrix A the function f ( H) == det ( A + H ) on the unitary orbit Us of another Hermitian matrix B attains its minimum and maximum at points that commute with A. Let B0 be any extreme point of f on Us . Then, we must have �
d dt
t =O
for every skewHermitian
det (A + e t K B0 e  t K )
K.
==
0'
Now,
det ( A + etK B0 e  t K ) = det(A + Bo + t [ K, Bo] ) + O ( t 2 ) . Note that , if X , Y are any two matrices and X is invertible , then det(X + tY)
== det X ( l + t tr Yx  1 ) + O (t2 ) .
So, if A + B0 is invertible, the condition (VI. 55) reduces to tr [K, Bo] (A + B0 )  1
==
0.
(VI. 55)
182
VI .
Spectral Variation
of
Normal Mat rices
This is equivalent to saying
tr K ( Bo (A + Bo )  1  ( A + Bo )  1 B0 )
==
0.
If this is to be true for all skewHermitian K , we must have Thus B0 commutes with ( A + B0 )  1 , hence with A + B0 , and hence with A. This proves the theorem under the assumption that A + B0 is invertible. The general case follows from this by a limiting argument. • Exercise VI. 7.2
0, then n
Let A and B be Hermitian matrices. If .x; ( A ) + .X � ( B ) > n
(VI. 56)
j=l j =l This is true, in particular, when A and B are positive matrices. Theorem VI. 7.3 Let A, B be Hermitian matrices with eigenvalues O'.j and {3j , respectively, ordered so that
Let T == A + i B . Then I det T l
<
n II lnj + i f3n  j + l l ·
j= 1
(VI. 57)
Proof. The function f (t) == � log t is concave on the positive halfline. Hence, using the majorisation (VI. 7) and Corollary II.3.4, we have n
L j= l Hen ce,
+ if3n  j + I I > L j= l
n
n
j= l
j= 1
Proposition VI. 7.4
tian. Then
I det T l
log laj
n
==
log sj .
•
Let T A + iB, where A is positive and B Hermi ==
n det A II [ 1 + Sj (A  1 1 2 B A  1 12 ) 2 ] 1 /2 . j =l
( VI. 58)
VI . 7 Some Inequalities for the Determinant
P roof.
Since T == A 1 1 2 ( 1 �. i A  l /2 BA  11 2 ) A 1 12 , we have det T det A · det (J + iA  1 1 2 BA  1 1 2 ) . ==
183
(VI. 5 9)
Note that
I det (J + i A  l /2 BA  1 1 2 ) ! 2 det [ (/ + i A  1 1 2 BA  1 1 2 ) ( 1  i A  1 12 B A  1 1 2 )] det [/ + (A  1 /2 BA  1 / 2 ) 2 ] n IJ [ l + Sj (A  1 /2 BA 1 / 2 ) 2 ] . j=1
So, (VL58) follows from (VI. 59) and (VI. 60) .
(VI.60) •
If the matrix A in the Cartesian decomposition T then I det Tj > det A . Theorem VI. 7. 6 Let T A + i B, where A and B are positive matrices with eigenvalues a 1 > · · > a n and fJ1 > · · · > f3n , respectively. Then,
Corollary VI. 7. 5 A + iB is positive,
==
==
·
n l det T j > IJ i aJ + if3J I · j= l
(VI.61)
Proof. We may assume, without loss of generality, that both $A, B$ are positive definite. Because of the relations (VI.59) and (VI.60), the theorem will be proved if we show
$$\prod_{j=1}^{n}\left[1 + s_j(A^{-1/2}BA^{-1/2})^2\right] \ \ge\ \prod_{j=1}^{n}\left(1 + \alpha_j^{-2}\beta_j^2\right).$$
Note that $s_j(A^{-1/2}BA^{-1/2}) = s_j(A^{-1/2}B^{1/2})^2$. From (III.20) we have
$$\left\{\log s_{n-j+1}(A^{-1/2}) + \log s_j(B^{1/2})\right\}_j \ \prec\ \left\{\log s_j(A^{-1/2}B^{1/2})\right\}_j.$$
This is the same as saying
$$\left\{\log\left(\alpha_j^{-1}\beta_j\right)^{1/2}\right\}_j \ \prec\ \left\{\log s_j(A^{-1/2}B^{1/2})\right\}_j.$$
Since the function $\log(1 + e^{4t})$ is convex in $t$, using Corollary II.3.4 we obtain from the last majorisation
$$\sum_{j=1}^{n} \log\left(1 + \alpha_j^{-2}\beta_j^2\right) \ \le\ \sum_{j=1}^{n} \log\left(1 + s_j(A^{-1/2}B^{1/2})^4\right) = \sum_{j=1}^{n} \log\left(1 + s_j(A^{-1/2}BA^{-1/2})^2\right).$$
This gives the desired inequality. ∎
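The two determinant bounds (VI.57) and (VI.61) sandwich $|\det T|$ when $A$ and $B$ are both positive: the eigenvalues paired in opposite order give the upper bound, and paired in the same order the lower bound. A numerical illustration (random positive matrices as arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

def random_positive(n, rng):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return X @ X.conj().T + 0.1 * np.eye(n)

A = random_positive(n, rng)
B = random_positive(n, rng)
T = A + 1j * B

alpha = np.sort(np.linalg.eigvalsh(A))[::-1]   # alpha_1 >= ... >= alpha_n
beta = np.sort(np.linalg.eigvalsh(B))[::-1]    # beta_1 >= ... >= beta_n

det_T = abs(np.linalg.det(T))
upper = np.prod(np.abs(alpha + 1j * beta[::-1]))   # (VI.57): opposite pairing
lower = np.prod(np.abs(alpha + 1j * beta))         # (VI.61): same pairing
assert lower <= det_T <= upper
```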
VI. Spectral Variation of Normal Matrices
Exercise VI.7.7 Show that if only one of the two matrices $A$ and $B$ is positive, then the inequality (VI.61) is not necessarily true.

A very natural generalisation of Theorem VI.7.1 would be the following statement. If $A$ and $B$ are normal, then $\det(A + B)$ lies in the convex hull of the products $\prod_{i=1}^{n}(\alpha_i + \beta_{\sigma(i)})$. This is called the Marcus–de Oliveira Conjecture and is a well-known open problem in matrix theory.
VI.8 Problems
Problem VI.8.1. Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix. Show that

for every Q-norm.

Problem VI.8.2. Let $T = A + iB$, where $A$ and $B$ are Hermitian. Show that, for $2 \le p \le \infty$,
$$\max_{\sigma}\left\|\operatorname{Eig} A + \operatorname{Eig}_{\sigma}(iB)\right\|_p \ \le\ 2^{1/2 - 1/p}\,\|T\|_p,$$
and that, for $1 \le p \le 2$,
Problem VI.8.3. A different proof of Theorem VI.3.11 is outlined below. Fill in the details. Let $n \ge 3$ and let $A, B$ be $n \times n$ unitary matrices. Assume that the eigenvalues $\alpha_i$ and $\beta_i$ are distinct and the distances $|\alpha_i - \beta_j|$ are also distinct. If $\gamma_1, \gamma_2$ are two points on the unit circle, we write $\gamma_1 < \gamma_2$ if the minor arc from $\gamma_1$ to $\gamma_2$ goes counterclockwise. We write $(\alpha\beta\gamma)$ if the points $\alpha, \beta, \gamma$ on the unit circle are in counterclockwise cyclic order. Number the indices modulo $n$; e.g., $\alpha_{n+1} = \alpha_1$. Label the eigenvalues of $A$ so as to have the order $(\alpha_1 \alpha_2 \cdots \alpha_n)$. Let $\delta = d(\sigma(A), \sigma(B))$. Assume that $\delta < 2$; otherwise, there is nothing to prove. Label the eigenvalues of $B$ as $\beta_1, \ldots, \beta_n$ in such a way that for any subset $J$ of $\{1, 2, \ldots, n\}$ and for any permutation $\sigma$
$$\max_{i \in J}|\alpha_i - \beta_i| \ \le\ \max_{i \in J}|\alpha_i - \beta_{\sigma(i)}|.$$
Then $\delta = \max_i |\alpha_i - \beta_i|$. Assume, without loss of generality, that this maximum is attained at $i = 1$ and that $\alpha_1 < \beta_1$. Check the following.

(i) If $\beta_i < \alpha_i$, then neither $(\alpha_1\beta_i\beta_1)$ nor $(\alpha_1\alpha_i\beta_1)$ is possible.

(ii) There exists $j$ such that $|\alpha_{j+1} - \beta_j| \ge \delta$. Choose and fix one such $j$.

(iii) We have $(\alpha_1\beta_1\beta_j\alpha_{j+1})$.

(iv) For $1 \le i \le j$ we have $(\beta_1\beta_i\beta_j)$.

Let $K_A$ be the arc from $\alpha_{j+1}$ positively to $\alpha_1$ and $K_B$ the arc from $\beta_1$ positively to $\beta_j$. Then there are $n - j + 1$ of the $\alpha_i$ in $K_A$ and $j$ of the $\beta_i$ in $K_B$. Use Proposition VI.3.5 now.

Problem VI.8.4. Let $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ be any complex numbers. Show that there is a number $\gamma$ such that
$$\max_i |\alpha_i - \gamma| + \max_j |\beta_j - \gamma| \ \le\ \sqrt{2}\,\max_{i,j}|\alpha_i - \beta_j|.$$
(The proof might be long but is not too difficult.) Use this to get another proof of Theorem VI.3.14.
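The assertion of Problem VI.8.4 can be probed numerically. The sketch below (with hypothetical random data) uses a crude grid search to exhibit a witness $\gamma$; any grid point with small enough cost already certifies the existence claim.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(5) + 1j * rng.standard_normal(5)   # the alpha_i
b = rng.standard_normal(5) + 1j * rng.standard_normal(5)   # the beta_j

def cost(g):
    return np.abs(a - g).max() + np.abs(b - g).max()

# Crude grid search over a box containing all the points.
pts = np.concatenate([a, b])
xs = np.linspace(pts.real.min() - 1, pts.real.max() + 1, 201)
ys = np.linspace(pts.imag.min() - 1, pts.imag.max() + 1, 201)
best = min(cost(x + 1j * y) for x in xs for y in ys)

bound = np.sqrt(2) * np.abs(a[:, None] - b[None, :]).max()
assert best <= bound   # a grid point witnesses the claimed gamma
```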
Problem VI.8.5. Let $A$ be a Hermitian and $B$ a normal matrix. If the eigenvalues $\alpha_j$ of $A$ are enumerated as $\alpha_1 \ge \cdots \ge \alpha_n$, and if the eigenvalues $\beta_j$ of $B$ are enumerated so that $\operatorname{Re}\beta_1 \ge \cdots \ge \operatorname{Re}\beta_n$, then
$$\max_{1 \le j \le n} |\alpha_j - \beta_j| \ \le\ \sqrt{2}\,\|A - B\|.$$
Problem VI.8.6. Let $A$ be a normal matrix with eigenvalues $\alpha_1, \ldots, \alpha_n$. Let $B$ be any other matrix and let $\varepsilon = \|A - B\|$. By Theorem VI.3.3, all the eigenvalues of $B$ are contained in the set $D = \bigcup_j D(\alpha_j, \varepsilon)$. Use the argument in the proof of Theorem VI.5.1 to show that each connected component of $D$ contains as many eigenvalues of $B$ as of $A$. Use this and the Matching Theorem (Theorem II.2.1) to show that
$$d(\sigma(A), \sigma(B)) \ \le\ (2n - 1)\,\|A - B\|.$$
[If $A$ and $B$ are both normal, this argument together with the result of Problem II.5.10 shows that $d(\sigma(A), \sigma(B)) \le n\|A - B\|$. However, in this case, the Hoffman–Wielandt inequality gives a stronger result: $d(\sigma(A), \sigma(B)) \le \sqrt{n}\,\|A - B\|$. We will see in the next chapter that, in fact, $d(\sigma(A), \sigma(B)) \le 3\|A - B\|$ in this case.]

Problem VI.8.7. Let $A$ be a Hermitian matrix with eigenvalues $\alpha_1 \ge \cdots \ge \alpha_n$, and let $B$ be any matrix with eigenvalues $\beta_j$ arranged so that $\operatorname{Re}\beta_1 \ge \cdots \ge \operatorname{Re}\beta_n$. Let $\operatorname{Re}\beta_j = \mu_j$ and $\operatorname{Im}\beta_j = \nu_j$. Choose an orthonormal basis in which $B$ is upper triangular and
$$B = M + iN + iR,$$
where $M = \operatorname{diag}(\mu_1, \ldots, \mu_n)$, $N = \operatorname{diag}(\nu_1, \ldots, \nu_n)$, and $R$ is strictly upper triangular. Show that
$$\|\operatorname{Im}(A - B)\|_2^2 = \|N\|_2^2 + \tfrac{1}{2}\|R\|_2^2.$$
Hence,

Show that

Combine the inequalities above to obtain
Compare this with the result of Problem VI.8.5; note that there $B$ was assumed to be normal.

Problem VI.8.8. It follows from the result of the above problem that if $A$ is Hermitian and $B$ an arbitrary matrix, then $d(\sigma(A), \sigma(B)) \le c_n\|A - B\|$, where the factor $c_n$ grows like $\sqrt{n}$. This factor can be replaced by another that grows only like $\log n$. For this one needs the following fact, which we state without proof. (See the discussion in the Notes at the end of the chapter.) Let $Z$ be an $n \times n$ matrix whose eigenvalues are all real. Then
$$\|Z - Z^*\| \ \le\ \gamma_n \|Z + Z^*\|,$$
where
$$\gamma_n = \frac{2}{n}\sum_{j=1}^{[n/2]} \cot\frac{(2j-1)\pi}{2n}.$$
The constant $\gamma_n$ is the smallest one for which the above norm inequality is true. Approximating the sum by integrals, it is easy to see that $\gamma_n/\log n$ approaches $2/\pi$ as $n \to \infty$.
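The constant $\gamma_n$ and its logarithmic growth are easy to compute; a small sketch (the function name `gamma` is ours):

```python
import numpy as np

def gamma(n):
    # gamma_n = (2/n) * sum_{j=1}^{[n/2]} cot((2j - 1) * pi / (2n))
    j = np.arange(1, n // 2 + 1)
    return (2.0 / n) * np.sum(1.0 / np.tan((2 * j - 1) * np.pi / (2 * n)))

for n in (10, 100, 1000, 10000):
    print(n, gamma(n), gamma(n) / np.log(n))
```

The printed ratio $\gamma_n/\log n$ creeps towards $2/\pi \approx 0.637$ quite slowly, since $\gamma_n \approx (2/\pi)\log n$ plus a bounded term.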
Using the notations of Problem 7, show that
$$\max_j |\nu_j| \ \le\ \|\operatorname{Im} B\| = \|\operatorname{Im}(A - B)\|,$$
$$\max_j |\alpha_j - \mu_j| \ \le\ \|A - M\| \ \le\ \|\operatorname{Re}(A - B)\| + \tfrac{1}{2}\|R - R^*\|.$$
Let $Z = N + R$. Then $Z$ has only real eigenvalues, $Z - Z^* = R - R^*$, and $Z + Z^* = 2\operatorname{Im}(A - B)$. Hence,
$$\max_j |\alpha_j - \beta_j| \ \le\ \|\operatorname{Re}(A - B)\| + (\gamma_n + 1)\|\operatorname{Im}(A - B)\|.$$
This shows that
$$d(\sigma(A), \sigma(B)) \ \le\ (\gamma_n + 2)\,\|A - B\|.$$
Problem VI.8.9. Let $A$ be the Hermitian matrix with entries
$$a_{ij} = \frac{1}{|i - j|} \quad\text{if } i \ne j, \qquad a_{ii} = 0 \quad\text{for all } i.$$
Let $B = A + C$, where $C$ is the skew-Hermitian matrix with entries
$$c_{ij} = \frac{1}{i - j} \quad\text{if } i \ne j, \qquad c_{ii} = 0 \quad\text{for all } i.$$
Then $B$ is strictly lower triangular, hence all its eigenvalues are $0$. Show that $\|A - B\| \le \pi$ for all $n$, and $\|A\| = O(\log n)$. (This needs some work.) Since $A$ is Hermitian, this means that its spectral radius is $O(\log n)$. Thus, in this example, $d(\sigma(A), \sigma(B)) = O(\log n)$ and $\|A - B\| \le \pi$. So, the bound obtained in Problem 8 is not too loose.

Problem VI.8.10. For any matrix $A$, let $A_D$ denote its diagonal part and $A_L$, $A_U$ its parts below and above the diagonal. Thus $A = A_L + A_D + A_U$. Show that if $A$ is an $n \times n$ normal matrix, then

The example in VI.6.11 shows that this inequality is sharp.

Problem VI.8.11. Let $A$ be a normal and $B$ an arbitrary $n \times n$ matrix. Choose an orthonormal basis in which $B$ is upper triangular. In this basis write $A = A_L + A_D + A_U$, $B = B_D + B_U$. By the Hoffman–Wielandt Theorem

Note that
$$A - B + B_U = (A - B)_L + (A - B)_D + A_U.$$
Use the result of Problem VI.8.10 to show that

Hence, we have, for $A$ normal and $B$ arbitrary,
188
VI.
S pectral Variation of Normal Matrices
From this, we get for the Schatten $p$-norms

Show that for $1 \le p \le 2$ these inequalities are sharp. (See Example VI.6.11.) For $p = \infty$, this gives
$$d(\sigma(A), \sigma(B)) \ \le\ n\,\|A - B\|,$$
which is an improvement on the result of Problem 6 above. If $A$ is Hermitian, then $\|A_U\|_2 = \|A_L\|_2$. Using this, one obtains a slightly different proof of the last inequality in Problem VI.8.7.

Problem VI.8.12. Let $A$ be an $n \times n$ Hermitian matrix partitioned as
$$A = \begin{pmatrix} M & R^* \\ R & H \end{pmatrix},$$
where $M$ is a $k \times k$ matrix. Let the eigenvalues of $A$ be $\lambda_1 \ge \cdots \ge \lambda_n$, those of $M$ be $\mu_1 \ge \cdots \ge \mu_k$, and let the singular values of $R$ be $\rho_1 \ge \rho_2 \ge \cdots$. Show that there exist indices $1 \le i_1 < \cdots < i_k \le n$ such that for every symmetric gauge function we have

In other words, for every unitarily invariant norm we have

In particular, we have

and

Use an argument similar to the one in the proof of Theorem VI.4.1 to show that the factor $\sqrt{2}$ in the last inequality can be replaced by $1$. This raises the question whether we have

for all unitarily invariant norms. This is not so, as can be seen from the example
$$A = \begin{pmatrix} 0 & 1 & \sqrt{3} \\ 1 & 0 & 0 \\ \sqrt{3} & 0 & 0 \end{pmatrix}.$$
Problem VI.8.13. Let $S$ be a closed subset of $\mathbb{C}$, and let $F$ be a retraction onto $S$. Let $N(S)$ be the set of all normal matrices whose spectrum is contained in $S$. Show that if $S$ is a convex set, then for every unitarily invariant norm
$$|||B - F(B)||| \ \le\ |||B - A|||,$$
whenever $B$ is a normal matrix and $A \in N(S)$. If $S$ is any closed set, then the above inequality is true for all Q-norms.

Problem VI.8.14. The aim of this problem, and the next, is to outline an alternative approach to the normal path inequality (VI.44). This uses slightly more sophisticated notions of differential geometry. Let $A$ be any $n \times n$ matrix, and let $\mathcal{O}_A$ be the orbit of $A$ under the action of the group $GL(n)$; i.e.,
$$\mathcal{O}_A = \{gAg^{-1} : g \in GL(n)\}.$$
This is called the similarity orbit of $A$; it is the set of all matrices similar to $A$. Every differentiable curve in $\mathcal{O}_A$ passing through $A$ can be parametrised locally as $e^{tX}Ae^{-tX}$, $X \in M(n)$. By the same argument as in Section VI.4, the tangent space to $\mathcal{O}_A$ at the point $A$ can be characterised as
$$T_A\mathcal{O}_A = \{[A, X] : X \in M(n)\}.$$
The orthogonal complement of this space in $M(n)$ can be calculated as in Lemma VI.4.2. Show that it is $Z(A^*)$, the set of all matrices commuting with $A^*$.

Now, a matrix $A$ is normal if and only if $Z(A^*) = Z(A)$. So, for a normal matrix we have a direct sum decomposition
$$M(n) = T_A\mathcal{O}_A \oplus Z(A).$$
Now, if $B \in \mathcal{O}_A$, then $B$ and $A$ have the same set of eigenvalues, and hence

If $B \in Z(A)$, then there is an orthonormal basis in which $A$ and $B$ are both upper triangular. Hence, for such a $B$,

Now, let $\gamma(t)$, $0 \le t \le 1$, be a $C^1$ curve in the space of normal matrices. Let $\gamma(0) = A_0$, $\gamma(1) = A_1$. Let $\varphi(A) = d_2(\sigma(A_0), \sigma(A))$. At each point $\gamma(t)$ consider the decomposition
$$M(n) = T_{\gamma(t)}\mathcal{O}_{\gamma(t)} \oplus Z(\gamma(t))$$
obtained above. Then, as we move along $\gamma(t)$, the rate of change of the function $\varphi$ is zero in the first direction in this decomposition, and in the second, it is bounded by the rate of change of the argument. Hence we should have
$$\varphi(\gamma(1)) \ \le\ \int_0^1 \|\gamma'(t)\|_2\, dt.$$
Prove this. Note that this says that
$$d_2(\sigma(A_0), \sigma(A_1)) \ \le\ \int_0^1 \|\gamma'(t)\|_2\, dt$$
if $\gamma$ is a $C^1$ curve passing through normal matrices and joining $A_0$ to $A_1$.

Problem VI.8.15. The two crucial properties of the Frobenius norm used above were its invariance under the conjugations $A \mapsto UAU^*$ and the pinching inequality. The first made it possible to change to any orthonormal basis, and the second was used to conclude that the diagonal of a matrix has norm smaller than the whole matrix. Both these properties are enjoyed by all wui norms. So, the method outlined above can be adapted to work for all wui norms to give the same result. (Some conditions on the path are necessary to ensure differentiability of the functions involved.)

Problem VI.8.16. Fill in the details in the following outline of a proof of the statement: every complex matrix with trace $0$ is a commutator of two matrices. Let $A$ be a matrix such that $\operatorname{tr} A = 0$. Assume that $A$ is upper triangular. Let $B$ be the nilpotent upper Jordan matrix (i.e., $B$ has all entries $0$ except the ones on the first superdiagonal, which are all $1$). Then $Z(B^*)$ contains only polynomials in $B^*$. (This is a general fact: $Z(X)$ contains only polynomials in $X$ if and only if in the Jordan form of $X$ there is just one block for each different eigenvalue.) Thus $Z(B^*)$ consists of lower triangular matrices with constant diagonals. Show that $A$ is orthogonal to all such matrices. Hence $A$ is in the space $T_B\mathcal{O}_B$, and so $A = [B, C]$ for some $C$.
VI.9 Notes and References
Perturbation theory for eigenvalues is of interest to mathematicians, physicists, engineers, and numerical analysts. Among the several books that deal with this topic are the venerable classics, T. Kato, Perturbation Theory for Linear Operators, Springer-Verlag, 1966, and J.H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, 1965. The first is addressed to the problems of quantum physics, the second to those of numerical analysis. Matrix Computations by G.H. Golub and C.F. Van Loan, The Johns Hopkins University Press, 1983, has enough of interest for the theorist and for the designer of algorithms. Much closer to the spirit of our book (but a lot more appealing to the numerical analyst) is Matrix Perturbation Theory by G.W. Stewart and J.-G. Sun, Academic Press, 1990. Much of the material in this chapter has appeared before in R. Bhatia, Perturbation Bounds for Matrix Eigenvalues, Longman, 1987.

In Section VI.1, we have done the bare minimum of qualitative analysis required for the later sections. The questions of differentiability and analyticity of eigenvalues and eigenvectors, asymptotic expansions and their convergence, and other related matters are discussed at great length by T. Kato, and by H. Baumgärtel, Analytic Perturbation Theory for Matrices and Operators, Birkhäuser, 1984. See also M. Reed and B. Simon, Analysis of Operators, Academic Press, 1978. The proof of Theorem VI.1.2 given here is adopted from H. Whitney, Complex Analytic Varieties, Addison-Wesley, 1972. Other proofs may be found in R. Bhatia and K.K. Mukherjea, The space of unordered tuples of complex numbers, Linear Algebra Appl., 52/53 (1983) 765–768. The proof of Theorem VI.1.4 is taken from T. Kato.

Theorem VI.4.1 was proved in A.J. Hoffman and H.W. Wielandt, The variation of the spectrum of a normal matrix, Duke Math. J., 20 (1953) 37–39. This was believed to be the next best thing to having an analogue of Weyl's Perturbation Theorem for normal matrices. The conjecture, that for normal A, B the optimal matching distance in each unitarily invariant norm is bounded by the corresponding distance between A and B, seems to have been raised first in L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quarterly J. Math., 11 (1960) 50–59. The problem, of course, turned out to be much more subtle than imagined at that time.

Theorem VI.2.2 is due to V.S. Sunder, Distance between normal operators, Proc. Amer. Math. Soc., 84 (1982) 483–484. The rest of Section 2 is based on T. Ando and R. Bhatia, Eigenvalue inequalities associated with the Cartesian decomposition, Linear and Multilinear Algebra, 22 (1987) 133–147.
Theorem VI.3.3 is one of the several results proved in the paper Norms and exclusion theorems, Numer. Math., 2 (1960) 137–141, by F.L. Bauer and C.T. Fike. Theorem VI.3.11 was first proved by R. Bhatia and C. Davis, A bound for the spectral variation of a unitary operator, Linear and Multilinear Algebra, 15 (1984) 71–76. Their proof is summarised in Problem VI.8.3. This approach of ordering eigenvalues in the cyclic order is elaborated further by L. Elsner and C. He, Perturbation and interlace theorems for the unitary eigenvalue problem, Linear Algebra Appl., 188/189 (1993) 207–229, where many related results are proved. The proof given in Section 3 is adapted from R.H. Herman and A. Ocneanu, Spectral analysis for automorphisms of UHF C*-algebras, J. Funct. Anal., 66 (1986) 1–10. The first example of 3 × 3 normal matrices A, B for which d(σ(A), σ(B)) > ‖A − B‖ was constructed by J.A.R. Holbrook, Spectral variation of normal matrices, Linear Algebra Appl., 174 (1992) 131–144. This was done using a directed computer search. The bare-hands example VI.3.13 was shown to us by G.M. Krause. The inequality (VI.33) appears in T. Ando, Bounds for the antidistance, J. Convex Analysis, 2 (1996) 1–3. It had been observed earlier by R. Bhatia that this would lead to Theorem VI.3.14. A different proof of this theorem was found independently by M. Omladič and P. Šemrl, On the distance between normal matrices, Proc. Amer. Math. Soc., 110 (1990) 591–596. The idea of the simpler proof in Problem VI.8.4 is due to L. Elsner.

The geometric ideas of Sections VI.5 and VI.6 were introduced in R. Bhatia, Analysis of spectral variation and some inequalities, Trans. Amer. Math. Soc., 272 (1982) 323–332. They were developed further in three papers by R. Bhatia and J.A.R. Holbrook: Short normal paths and spectral variation, Proc. Amer. Math. Soc., 94 (1985) 377–382; Unitary invariance and spectral variation, Linear Algebra Appl., 95 (1987) 43–68; and A softer, stronger Lidskii theorem, Proc. Indian Acad. Sci. (Math. Sci.), 99 (1989) 75–83. Much of Sections 5 and 6 is based on these papers. Example VI.5.8 is due to M.D. Choi. The special operator norm case of the inequality (VI.49) was proved by K.R. Parthasarathy, Eigenvalues of matrix-valued analytic maps, J. Austral. Math. Soc. (Ser. A), 26 (1978) 179–197. Theorem VI.6.10 and Example VI.6.11 are taken from R. Bhatia, C. Davis and A. McIntosh, Perturbation of spectral subspaces and solution of linear operator equations, Linear Algebra Appl., 52/53 (1983) 45–67. The inequality (VI.53) for (strongly) unitarily invariant norms was first proved by V.S. Sunder, On permutations, convex hulls and normal operators, Linear Algebra Appl., 48 (1982) 403–411.

P.R. Halmos, Spectral approximants of normal operators, Proc. Edinburgh Math. Soc., 19 (1974) 51–58, initiated the study of operator approximation problems of the following kind. Given a normal operator A, find the operator closest to A from the class of normal operators that have their spectrum in a given closed set. The result of Theorem VI.6.2 for infinite-dimensional Hilbert spaces (and for closed sets) was proved, for the Schatten p-norms with p ≥ 2, by R. Bouldin, Best approximation of a normal operator in the Schatten p-norm, Proc. Amer. Math. Soc., 80 (1980) 277–282. The result for Q-norms, as well as the special one for all unitarily invariant norms given in Problem VI.8.11, was proved by R. Bhatia, Some inequalities for norm ideals, Commun. Math. Phys., 111 (1987) 33–39.

Theorem VI.7.1 is due to M. Fiedler, Bounds for the determinant of the sum of Hermitian matrices, Proc. Amer. Math. Soc., 30 (1971) 27–31. The inequality (VI.56) is also proved in this paper. Theorem VI.7.3 is due to J.F. Queiró and A.L. Duarte, On the Cartesian decomposition of a matrix, Linear and Multilinear Algebra, 18 (1985) 77–85, while Theorem VI.7.6 is due to N. Bebiano. The proofs given here are taken from the paper by T. Ando and R. Bhatia cited above. There are several papers related to the Marcus–de Oliveira conjecture. For a recent survey, see N. Bebiano, New developments on the Marcus–Oliveira Conjecture, Linear Algebra Appl., 197/198 (1994) 793–802.
The results in Problem VI.8.7 are due to W. Kahan, Spectra of nearly Hermitian matrices, Proc. Amer. Math. Soc., 48 (1975) 11–17, as are the ones in Problem VI.8.8 (essentially) and Problem VI.8.9. In an earlier paper, Every n × n matrix Z with real spectrum satisfies ‖Z − Z*‖ ≤ ‖Z + Z*‖(log₂ n + 0.038), Proc. Amer. Math. Soc., 39 (1973) 235–241, Kahan proved the inequality in the title of the paper and used it to obtain the perturbation bound in the later paper. The exact value of γₙ in this inequality was obtained by A. Pokrzywa, Spectra of operators with fixed imaginary parts, Proc. Amer. Math. Soc., 81 (1981) 359–364. The appearance of a constant growing like log n in this inequality is related to another important problem. Let T be the linear operator on M(n) that takes every matrix to its upper triangular part. Then sup ‖T(X)‖/‖X‖ is known to be O(log n). From this one can see (on reducing Z to an upper triangular form) that the constant γₙ also must have this order. Related results may be found in R. Mathias, The Hadamard operator norm of a circulant and applications, SIAM J. Matrix Anal. Appl., 14 (1993) 1152–1167. The results in Problems VI.8.10 and VI.8.11 are taken from J.-G. Sun, On the variation of the spectrum of a normal matrix, Linear Algebra Appl., 246 (1996) 215–223.

Bounds like the one given in Problem VI.8.12 are called residual bounds in numerical analysis. See W. Kahan, Numerical linear algebra, Canadian Math. Bull., 9 (1966) 757–801, where the special improvement for the Frobenius norm is obtained. The general result for all unitarily invariant norms is proved in the book by Stewart and Sun. The example at the end of this problem was given in R. Bhatia, On residual bounds for eigenvalues, Indian J. Pure Appl. Math., 23 (1992) 865–866. The result in Problem VI.8.16 is a well-known theorem; the proof we have outlined here was shown to us by V.S. Sunder.
VII Perturbation of Spectral Subspaces of Normal Matrices
In Chapter 6 we saw that the eigenvalues of a (normal) matrix change continuously with the matrix. The behaviour of eigenvectors is more complicated. The following simple example is instructive. Let
$$A = \begin{pmatrix} 1 + \varepsilon & 0 \\ 0 & 1 - \varepsilon \end{pmatrix} \oplus H, \qquad
B = \begin{pmatrix} 1 & \varepsilon \\ \varepsilon & 1 \end{pmatrix} \oplus H,$$
where $H$ is Hermitian. The eigenvalues of the first $2 \times 2$ block of $A$ are $1 + \varepsilon$, $1 - \varepsilon$. The same is true for $B$. The corresponding normalised eigenvectors are $(1, 0)$ and $(0, 1)$ for $A$, and $\frac{1}{\sqrt{2}}(1, 1)$ and $\frac{1}{\sqrt{2}}(1, -1)$ for $B$. As $\varepsilon \to 0$, $B$ and $A$ approach each other, but their eigenvectors remain stubbornly apart. Note, however, that the eigenspaces that these two eigenvectors of $A$ and $B$ span are identical. In this chapter we will see that interesting and useful perturbation bounds may be obtained for eigenspaces corresponding to closely bunched eigenvalues of normal matrices. Before we do this, it is necessary to introduce notions of distance between two subspaces. Also, it turns out that this perturbation problem is closely related to the solution of the matrix equation $AX - XB = Y$. This equation, called the Sylvester Equation, arises in several other contexts. So, we will study it in some detail before applying the results to the perturbation problem at hand.
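The phenomenon in this example is easy to reproduce numerically. The sketch below pads the $2 \times 2$ blocks with the arbitrary choice $H = (5)$; it is an illustration, not part of the text.

```python
import numpy as np

eps = 1e-6
# The 2x2 blocks of the example, padded with an arbitrary choice H = (5).
A = np.diag([1 + eps, 1 - eps, 5.0])
B = np.array([[1.0, eps, 0.0],
              [eps, 1.0, 0.0],
              [0.0, 0.0, 5.0]])

wa, Va = np.linalg.eigh(A)   # eigenvalues in ascending order
wb, Vb = np.linalg.eigh(B)

# The eigenvectors for the eigenvalue 1 - eps stay far apart as eps -> 0:
v_a, v_b = Va[:, 0], Vb[:, 0]
print(abs(v_a @ v_b))        # about 0.707, not close to 1

# But the eigenspace of the cluster {1 - eps, 1 + eps} is the same:
Pa = Va[:, :2] @ Va[:, :2].T
Pb = Vb[:, :2] @ Vb[:, :2].T
assert np.allclose(Pa, Pb)
```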
VII.1 Pairs of Subspaces
We will be dealing with block decompositions of matrices . To keep track of dimensions, we will find it convenient to write £
k
A == for a blockmatrix in which k and f are the number of columns, and m and p are the number of rows in the blocks indicated. Theorem VII. l . l {The QR Decomposition) Let m > n . Then there is an m x m unitary matrix Q
( R0 )
A be an such that
m x n
matrix,
n
Q * A ==
n m  n
(VII. l )
'
where R is upper triangular with nonnegative real diagonal entries. Proof. For a square matrix A , this was proved in Chapter 1. The same proof also works here. (In essence this is just the GramSchmidt pro • cess. ) The matrix R above is called the R factor of A.
Let A be an matrix with rank A Then the decomposition of A has positive diagonal elements and is uniquely determined. (See Exercise I. 2. 2.) If we write
Exercise VII. 1 . 2 R factor in the QR
m x n
n
==
n.
m  n
then we have Thus Q 1 is uniquely determined by A. However, Q 2 need not be unique. Note the range of A is the range of Q 1 , and its orthogonal complement is the range of Q2 . Exercise VII. 1 . 3 Let A be an m x n matrix with < Then there exists an unitary matrix W such that m
n.
n x n
L 0
m
n  m
AW == ( ) m, where L is lower triangular and has nonnegative real diagonal entries.
1 96
VII. Perturbation of Spectral
S ubspaces
of Normal
Matrices
We remark here that it is only for convenience that we choose R (and L ) to have nonnegative diagonal entries. By modifying Q (and W) , we can also make R (and L) have non positive real diagonal entries.
Let A be an matrix with rank A == r. Then there permutation matrix P and an m m 'L£nitary matrix Q such
Exercise VII. 1 . 4
exists an that
n x n
m x n
x
Q*AP =
(�
1
where R1 1 is an r r upper triangular matrix with positive diagonal entries. This is called a rank revealing QR decomposition . Exercise VII. 1 . 5 Let A be an m matrix with rank A == r. Then there exists an m m unitary matrix Q and an unitary matrix W such that Q*AW � x
x n
x
n x n
=
( �),
where T is an r r triangular matrix with positive diagonal entries. Theorem VII. 1 . 6 {The CS Decomposition} Let W be an unitary matrix partitioned as x
n x n
m
f
W ==
)!
(VIL2)
where .e < m. Then there exist unitary matrices U == diag(U1 1 , U22 ) and v == diag(Vl l ' v2 2 ) ' where ull ' vl l are .e f matrices, such that X
f
U*WV == where
C 0 < c1 <
C s 0
f
S c 0
ml
0
0 I
.e
f ml
(VII.3)
and S are nonnegative diag n al matrice5, with diagonal < ce < 1 and 1 > s 1 > · · · > se > 0, respectively, and o
·
·
entries
·
Proof. For the sake of brevity, let us call a map X U * XV on the space of n x n matrices a Utransform, if U, V are block diagonal unitary matrices with top left blocks of size f x f. The product of two Utransforrns is again a Utransform. We will prove the theorem by showing that one can change the matrix W in (VIL2) to the form (VIL3) by a succession of Utransforms. 7
VI I . l Pairs of S ubspaces
Let ul l ' vl l be .e
X
197
f unitary matrices such that k
pk 0 I
where C1 is a diagonal matrix with diagonal entries 0 < c 1 < c2 < · · · < Ck < 1 . This is j ust the singular value decomposition: since W is unitary, all singular values of W1 1 are bounded by 1 . Then the Utransform in which U == diag (U1 1 , I) and V == diag ( V1 1 , I) reduces W to the form k Pk m
k c1 0 ?
Pk 0 I
m
? .
?.
? .
? .
(VIL4)
where the structures of the three blocks whose entries are indicated by ? are yet to be determined. Let W2 1 denote now the bottom left corner of the matrix (VIL4) . By the QR Decomposition Theorem we can find an m x m unitary matrix Q 22 such that p m 
p '
(VIL 5 )
where R is upper triangular with nonnegative diagonal entries. The U transform in which U == diag (I, Q22 ) and V == diag (I, I) leaves the top left corner of (VII . 4) unchanged and changes the bottom left corner to the form (VII. 5) . Assume that this transform has been carried out . Using the fact that the columns of a unitary matrix are orthonormal, one sees that the last f  k columns of R must be zero . Now examine the remaining columns, proceeding from left to right. Since C 1 is diagonal with nonnega tive diagonal entries, all of which are strictly smaller than 1 , one sees that R is also diagonal. Thus the matrix W is now reduced to the form fk 0 I
m
k Pk
k c1 0
k Pk mf
s1 0 0
0 0 0
? .
?. ? . ?
(VIL6)
?
in which 51 is diagonal with Cr + Sr == I , and hence 0 < S1 < I. The structures of the two blocks on the right are yet to be determined. Now , by
1 98
VII.
Pert urbat ion o f Spect ral S ubspaces o f Normal Matrices
Exercise VI I . 1 . 3 and the remark followi n g it , we can find e1n m x m unitary matrix V22 \v h i ch on right multiplication converts the top right block of (VII. 6 ) to the form f
f (L
mf 0 ),
(VII. 7)
in which L is lower triangular with nonpositive diago n al entries. The U transform in which U == diag ( I, I) and V == diag (I, V22 ) leaves the two left blocks in (VIL6) unchanged and converts the top right block to the form (VII . 7) . Again, orthonormality of the rows of a unitary matrix and a repetition of the argument in the preceding step show that after this Utransform the matrix W is reduced to the form
k e k k
Ek me
k c1 0 s1 0 0
fk 0
I
0 0 0
I I
I I I
k  81 0
fk 0 0
mf 0 0
X33 x43 X53
X34 X44 X54
X3 s X4s Xss
(VIL8)
Now, we determine the form of the bottom right corner. Since the rows of a unitary matrix are mutually orthogonal, we must have cl s l == slx33 · But C1 and 51 are diagonal and 8 is invertible. Hence, we must have 1 X33 == C1 . But then the blocks X34 , X3 5 , X43 , X53 must all be 0, since the matrix (VIL8) is unitary. So, this matrix has the form
Let X
=
k Ek
k c1 0
fk 0
I
k  81 0
fk 0 0
mf 0 0
k fk mf
s1 0 0
0 0 0
c1 0 0
0 x44 Xs 4
0 X45 Xs 5
(VI L9)
( i:: ;:: ) . Then X is a un tary matrix of size m  k. Let i
U == diag ( Jp_ , Ik , X ) , where lp_ and Ik are the identity operators of sizes f and k , respectively. Then, multiplying ( VII.9) on the left by U*  another
VI I. l Pairs of Subspaces
1 99
Utransform  we reduce i t to the form k
fk 0 I
k fk
s1
0
0 0
mf
0
0
k fk
fk 0 0
mf 0 0
c1 0
0 I
0 0
0
0
I
( VII. lO )
)
If we now put C ==
and S ==
k 1
(�
fk 0 I
k 1
fk 0 0
(�
)
k p_  k
)
k p_  k ,
•
then the matrix (VII. lO ) is in the desired form (VII.3) .
Let W be as in ( VII. 2) but with f > m . Show that there exist unitary matrices U = diag ( U11 , U22) and V diag ( V1 1 , V22 ) , where U1 1 , V1 1 are f f matrices, such that
Exercise VII. I . 7
==
X
nf
U* WV =
c 0
s
2£  n 0 I
0
nf S 0
c
where
nf 2£  n nf
C and S are nonnegative diagonal matrices with 0 < c1 < < Cne < 1 and 1 > s 1 > > s n  i > 0, C2 + S2 1. ·
·
·
·
·
·
(VIL l i )
diagonal entries respectively, and
==
The form of the matrices C and S in the above decompositions suggests an obvious interpretation in terms of angles. There exist (acute) angles oj , ; > e l > ()2 > . . . > 0, such that Cj == cos Oj and Sj = sin Oj . One of the major applications of the CS decomposition is the facility it provides for analysing the relative position of two subspaces of
en .
Let X 1 , Y1 be n f matrices with orthonormal columns. Then there exist f_ X f unitary matrices ul and vl and an n X n unitary matrix Q with the following properties.
Theorem VII. 1 . 8
x
200
VI I .
(i) If 2P <
Pert urbat ion of Spectral S ubspaces of Normal Tv1 atrices
n,
then p
Q X 1 U1 ==
I 0 0
f
P n  2P
f
c S 0
Q Y1 V1 =
(VIL 12)
p
P n  2P
(VII . 13)
where C, S are diagonal matrices with diagonal entries2 0 < c 1 < 1 and 1 > sl > . . . > S f_ > 0 , respectively, and C 2 + S == I {ii) If 2P > n, then
·
·
·
< ce <
.
nP I 0 0
2P  n 0
nP
c 0
s
I
(VIL 14)
0
nP 2£  n n£
2P  n 0 I 0
n£ 2£  n
(VIL 1 5)
nf
C, S are diagonal matrices with diagonal entries 0 < c 1 < 2 2 C n i < 1 and 1 > s 1 > · · · > S n i > 0 , respectively, and C + 5 = I .
where
·
·
·
<
Proof. Let 2P < n. Choose n x (n  f) matrices X2 and Y2 such that X == (X1 X2 ) and Y == ( Y1 Y2 ) are unitary. Let
By Theorem VII. l . 6 we can find block diagonal unitary matrices U == diag(Ul , U2 ) and v == diag(Vl , V2 ) , in which ul and vl are p X £ unitaries, such that C s 0
S c 0
0 0 I
Let Q == (XU) * == (X1 U1 X2 U2 ) * . Then from the first columns of the two sides of the above equation we obtain the equation ( VIL 13) . For this Q the equation (VII. 1 2) is also true.
VI I . l Pairs of S ubs paces
201
When 2£ > n , the assertion of the theorem follows, in the same manner, • fro m the decompos ition (VII. l l ) . This theorem can be interpreted as follows. Let £ and F be £dimensional subspaces of e n . Choose orthonormal bases XI ' . . . ' X.e and Yl ' . . . ' Y.e for these spaces. Let x l == (x l X 2 . . . Xt) , yl == ( Y l Y2 . . . Y.e ) . Premultiply ing X 1 , Y1 by Q corresponds to a unitary transformation of the whole space e n , while postmultiplying X 1 by U1 and Y1 by V1 corresponds to a change of bases within the spaces £ and F, respectively. Thus , the theorem says that, if 2£ < n , then there exists a unitary transformation Q of e n such that the columns of the matrices on the righthand sides in (VII. 12) and (VII. 13) form orthonormal bases for Q£ and Q:F, respectively. The span of those columns in the second matrix, for which s1 == 1 , is the orthogonal complement of Q£ in Q:F. When 2£ > n , the columns of the matrices on the righthand sides of (VII. 14) and (VII. 15) form orthonormal bases for Q£ and QF, respectively. The last 2£  n columns are orthonormal vectors in the intersection of these two spaces. The space spanned by those columns of the second matrix, in which s1 1 , is the orthogonal complement of Q£ in Q:F. The reader might find it helpful to see what the above theorem says when £ and F are lines or planes in JR3 . Using the notation above, we set ==
8(£, F) == arcsin S. This is called the angle operator between the subspaces £ and F. It is a diagonal matrix, and its diagonal entries are called the canonical angles between the subspaces £ and F . If the columns of a matrix X are orthonormal and span the subspace £, then the orthogonal projection onto £ is given by the matrix E == X X * . This fact is used repeatedly below.
Exercise VII.1.9  Let 𝓔 and 𝓕 be subspaces of ℂⁿ. Let X and Y be matrices with orthonormal columns that span 𝓔 and 𝓕, respectively. Let E, F be the orthogonal projections with ranges 𝓔, 𝓕. Then the nonzero singular values of EF are the same as the nonzero singular values of X*Y.

Exercise VII.1.10  Let 𝓔, 𝓕 be subspaces of ℂⁿ of the same dimension, and let E, F be the orthogonal projections with ranges 𝓔, 𝓕. Then the singular values of EF are the cosines of the canonical angles between 𝓔 and 𝓕, and the nonzero singular values of E^⊥F are the sines of the nonzero canonical angles between 𝓔 and 𝓕.

Exercise VII.1.11  Let 𝓔, 𝓕 and E, F be as above. Then the nonzero singular values of E − F are the nonzero singular values of E^⊥F, each counted twice; i.e., these are the numbers s₁, s₁, s₂, s₂, ....
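The content of Exercises VII.1.9–VII.1.11 is easy to check numerically: the singular values of X*Y (equivalently, the nonzero singular values of EF) are the cosines of the canonical angles, and the nonzero singular values of E^⊥F are their sines. A small sketch in Python with NumPy, for an illustrative pair of 2-dimensional subspaces of ℝ⁴ (the choice of bases here is ours, not the text's):

```python
import numpy as np

# Orthonormal bases for two 2-dimensional subspaces of R^4 (illustrative choice).
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]]) / np.sqrt(2)   # spans the subspace E
Y = np.array([[1., 0.], [0., 0.], [0., 1.], [0., 0.]])        # spans the subspace F

E = X @ X.T                  # orthogonal projection onto span(X)
F = Y @ Y.T                  # orthogonal projection onto span(Y)
P = np.eye(4) - E            # the projection E-perp

cosines = np.linalg.svd(X.T @ Y, compute_uv=False)        # cosines of canonical angles
angles = np.arccos(np.clip(cosines, 0.0, 1.0))

s_EF = np.linalg.svd(E @ F, compute_uv=False)             # should match the cosines
s_PF = np.linalg.svd(P @ F, compute_uv=False)             # should match the sines
```

Here both canonical angles come out equal to π/4, and the nonzero singular values of EF and E^⊥F are the corresponding cosines and sines.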
VII.  Perturbation of Spectral Subspaces of Normal Matrices
Note that, by Exercise VII.1.10, the angle operator Θ(𝓔, 𝓕) does not depend on the choice of any particular bases in 𝓔 and 𝓕. Further, Θ(𝓔, 𝓕) = Θ(𝓕, 𝓔). It is natural to define the distance between the spaces 𝓔 and 𝓕 as ‖E − F‖. In view of Exercise VII.1.11, this is also the number ‖E^⊥F‖. More generally, we might consider |||E^⊥F|||, for every unitarily invariant norm, to define a distance between the spaces 𝓔 and 𝓕. In this case,

    |||E − F||| = |||E^⊥F ⊕ EF^⊥|||.
We could use the numbers |||E^⊥F||| to measure the separation of 𝓔 and 𝓕, even when they have different dimensions. Even the principal angles can be defined in this case:
Exercise VII.1.12  Let x, y be any two vectors in ℂⁿ. The angle between x and y is defined to be the number ∠(x, y) in [0, π/2] such that

    ∠(x, y) = cos⁻¹ ( |y*x| / (‖x‖ ‖y‖) ).

Let 𝓔 and 𝓕 be subspaces of ℂⁿ, and let dim 𝓔 = ℓ ≥ dim 𝓕 = m. Define angles θ₁, ..., θ_m recursively as

    θ_k = max_{x ∈ 𝓔, x ⊥ [x₁,...,x_{k−1}]}   min_{y ∈ 𝓕, y ⊥ [y₁,...,y_{k−1}]}   ∠(x, y),

where x_k, y_k denote vectors at which this extremum is attained. Then π/2 ≥ θ₁ ≥ ··· ≥ θ_m ≥ 0. The numbers θ_k are called the principal angles between 𝓔 and 𝓕. Show that when dim 𝓔 = dim 𝓕, this coincides with the earlier definition of principal angles.
Exercise VII.1.13  Show that for any two orthogonal projections E, F we have ‖E − F‖ ≤ 1.
Proposition VII.1.14  Let E, F be two orthogonal projections such that ‖E − F‖ < 1. Then the ranges of E and F have the same dimension.

Proof.  Let 𝓔, 𝓕 be the ranges of E and F. Suppose dim 𝓔 > dim 𝓕. We will show that 𝓔 ∩ 𝓕^⊥ contains a nonzero vector x; since Ex = x and Fx = 0 for such a vector, this will show that ‖E − F‖ = 1. Let 𝓖 = E𝓕. Then 𝓖 ⊂ 𝓔, and dim 𝓖 ≤ dim 𝓕 < dim 𝓔. Hence 𝓔 ∩ 𝓖^⊥ contains a nonzero vector x. It is easy to see that 𝓔 ∩ 𝓖^⊥ ⊂ 𝓕^⊥. Hence x ∈ 𝓕^⊥.  ∎

In most situations in perturbation theory we will be interested in comparing two projections E, F such that ‖E − F‖ is small. The above proposition shows that in this case dim 𝓔 = dim 𝓕.
Example VII.1.15  Let

    X₁ = (1/√2) [ 1  0          Y₁ = [ 1  0
                  1  0                 0  0
                  0  1                 0  1
                  0  1 ] ,             0  0 ] .

The columns of X₁ and of Y₁ are orthonormal vectors. If we choose the unitary matrices

    U₁ = V₁ = (1/√2) [ 1   1
                       1  −1 ]

and

    Q = (1/2) [ 1   1   1   1
                1   1  −1  −1
                1  −1   1  −1
                1  −1  −1   1 ] ,

then we see that

    Q X₁ U₁ = [ 1  0          Q Y₁ V₁ = (1/√2) [ 1  0
                0  1                             0  1
                0  0                             1  0
                0  0 ] ,                         0  1 ] .

Thus in the space ℝ⁴ (or ℂ⁴), the canonical angles between the 2-dimensional subspaces spanned by the columns of X₁ and Y₁, respectively, are π/4, π/4.

VII.2  The Equation AX − XB = Y
We study in some detail the Sylvester equation

    AX − XB = Y.   (VII.16)

Here A is an operator on a Hilbert space 𝓗, B is an operator on a Hilbert space 𝓚, and X, Y are operators from 𝓚 into 𝓗. Most of the time we will be interested in the situation where 𝓚 = 𝓗 = ℂⁿ, and we will state and prove our results for this special case. The extension to the more general situation is straightforward. We are given A and B, and we ask the following questions about the above equation. When is there a unique solution X for every Y? What is the form of the solution? Can we estimate ‖X‖ in terms of ‖Y‖?
Theorem VII.2.1  Let A, B be operators with spectra σ(A) and σ(B), respectively. If σ(A) and σ(B) are disjoint, then the equation (VII.16) has a unique solution X for every Y.
Proof.  Let 𝒯 be the linear operator on the space of operators defined by 𝒯(X) = AX − XB. The conclusion of the theorem can be rephrased as: 𝒯 is invertible if σ(A) and σ(B) are disjoint. Let 𝒜(X) = AX and ℬ(X) = XB. Then 𝒯 = 𝒜 − ℬ, and 𝒜 and ℬ commute (regardless of whether A and B do). Hence σ(𝒯) ⊂ σ(𝒜) − σ(ℬ). If x is an eigenvector of A with eigenvalue α, then the matrix X, one of whose columns is x and the rest of whose columns are zero, is an eigenvector of 𝒜 with eigenvalue α. Thus the eigenvalues of 𝒜 are just the eigenvalues of A, each counted n times as often. So σ(𝒜) = σ(A). In the same way, σ(ℬ) = σ(B). Hence σ(𝒯) ⊂ σ(A) − σ(B). So, if σ(A) and σ(B) are disjoint, then 0 ∉ σ(𝒯). Thus 𝒯 is invertible.  ∎
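For matrices, the theorem can be exercised numerically with SciPy, whose `solve_sylvester` routine solves AX + XB = Q; to solve AX − XB = Y one passes −B. A sketch with illustrative matrices whose spectra are disjoint:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)

# sigma(A) near {3, 4, 5} and sigma(B) near {-1, 0}: disjoint spectra.
A = np.diag([3.0, 4.0, 5.0]) + 0.1 * rng.standard_normal((3, 3))
B = np.diag([-1.0, 0.0]) + 0.1 * rng.standard_normal((2, 2))
Y = rng.standard_normal((3, 2))

# SciPy solves AX + XB = Q, so rewrite AX - XB = Y as solve_sylvester(A, -B, Y).
X = solve_sylvester(A, -B, Y)
```

The residual AX − XB − Y vanishes to machine precision, confirming the unique solvability in this generic case.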
It is instructive to note that the scalar equation ax − xb = y has a unique solution x for every y if a − b ≠ 0. The condition 0 ∉ σ(A) − σ(B) can be interpreted as a generalisation of this to the matrix case. This analogy will be helpful in the discussion that follows.

Consider the scalar equation ax − xb = y. Exclude the trivial cases in which a = b and in which either a or b is zero. The solution of this equation can be written as

    x = a⁻¹ (1 − b/a)⁻¹ y.

If |b| < |a|, the middle factor on the right can be expanded as a convergent power series, and we can write

    x = a⁻¹ Σ_{n=0}^{∞} (b/a)ⁿ y = Σ_{n=0}^{∞} a^{−n−1} y bⁿ.

This is surely a complicated way of writing x = y/(a − b). However, it suggests, in the operator case, the form of the solution given in the theorem below. For the proof of the theorem we will need the spectral radius formula. This says that the spectral radius of any operator A is given by the formula

    spr(A) = lim_{n→∞} ‖Aⁿ‖^{1/n}.
Theorem VII.2.2  Let A, B be operators such that σ(B) ⊂ {z : |z| < ρ} and σ(A) ⊂ {z : |z| > ρ} for some ρ > 0. Then the solution of the equation AX − XB = Y is

    X = Σ_{n=0}^{∞} A^{−n−1} Y Bⁿ.   (VII.17)
Proof.  We will prove that the series converges; it is then easy to see that the X so defined is a solution of the equation. Choose ρ₁ < ρ < ρ₂ such that σ(B) is contained in the disk {z : |z| < ρ₁} and σ(A) is outside the disk {z : |z| < ρ₂}. Then σ(A⁻¹) is inside the disk {z : |z| < ρ₂⁻¹}. By the spectral radius formula, there exists a positive integer N such that for n ≥ N we have ‖Bⁿ‖ < ρ₁ⁿ and ‖A⁻ⁿ‖ < ρ₂⁻ⁿ. Hence, for n ≥ N,

    ‖A^{−n−1} Y Bⁿ‖ < (ρ₁/ρ₂)ⁿ ‖A⁻¹Y‖.

Thus the series in (VII.17) is convergent.  ∎
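The series (VII.17) is easy to test numerically: with σ(B) inside and σ(A) outside a common circle, a few dozen partial sums already solve the equation to machine precision. A sketch with illustrative matrices:

```python
import numpy as np

A = np.array([[3.0, 1.0], [0.0, 4.0]])    # eigenvalues 3, 4 (outside |z| = 1)
B = np.array([[0.5, 0.2], [0.0, -0.4]])   # eigenvalues 0.5, -0.4 (inside |z| = 1)
Y = np.array([[1.0, 2.0], [3.0, 4.0]])

# Partial sums of X = sum_{n >= 0} A^{-n-1} Y B^n.
Ainv = np.linalg.inv(A)
X = np.zeros((2, 2))
left, right = Ainv, np.eye(2)             # A^{-n-1} and B^n for n = 0
for _ in range(60):
    X = X + left @ Y @ right
    left, right = left @ Ainv, right @ B
```

The terms decay geometrically (roughly like (ρ₁/ρ₂)ⁿ, here about (1/6)ⁿ), so truncation after 60 terms is far below machine precision.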
Another solution of (VII.16) is obtained from the following considerations. If Re(b − a) < 0, the integral ∫₀^{∞} e^{t(b−a)} dt is convergent and has the value (a − b)⁻¹. Thus, in this case, the solution of the equation ax − xb = y can be expressed as

    x = ∫₀^{∞} e^{−ta} y e^{tb} dt.

This is the motivation for the following theorem.
Theorem VII.2.3  Let A and B be operators whose spectra are contained in the open right half-plane and the open left half-plane, respectively. Then the solution of the equation AX − XB = Y can be expressed as

    X = ∫₀^{∞} e^{−tA} Y e^{tB} dt.   (VII.18)
Proof.  It is easy to see that the hypotheses ensure that the integral given above is convergent. If X is the operator defined by this integral, then

    AX − XB = ∫₀^{∞} (A e^{−tA} Y e^{tB} − e^{−tA} Y e^{tB} B) dt = [−e^{−tA} Y e^{tB}]₀^{∞} = Y.

So X is indeed the solution of the equation.  ∎
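Formula (VII.18) can be checked by direct quadrature of the integrand, built from matrix exponentials. The matrices below are illustrative, and the trapezoid rule on a truncated interval is a deliberately crude but adequate integrator here, since the integrand decays exponentially:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0], [0.0, 3.0]])     # spectrum {2, 3}: right half-plane
B = np.array([[-1.0, 0.5], [0.0, -2.0]])   # spectrum {-1, -2}: left half-plane
Y = np.array([[1.0, 0.0], [2.0, -1.0]])

# Trapezoid rule for X = integral_0^inf e^{-tA} Y e^{tB} dt; the integrand
# decays like e^{-3t}, so truncating at t = 20 loses essentially nothing.
ts = np.linspace(0.0, 20.0, 20001)
vals = np.array([expm(-t * A) @ Y @ expm(t * B) for t in ts])
dt = ts[1] - ts[0]
X = (vals[0] + vals[-1]) / 2 * dt + vals[1:-1].sum(axis=0) * dt
```

The residual AX − XB − Y is then small, limited only by the quadrature step.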
Notice that in both the theorems above we made a special assumption about the way o ( A) and o ( B ) are separated. No such assumption is made in the theorem below. Once again, it is helpful to consider the scalar case first. Note that
(a 
1 () ( b  ( )
==
(
1
a
1 ( b(
)
1 b
a·
So, if r is any closed contour in the complex plane with winding numbers 1 around a and 0 around b, then by Cauchy's integral formula we have
27f i . 1 d( == b J ( a  () ( b  () a 
r
Thus the solution of the equation x ==
ax  x b == y can be expressed as
y 1 d(. 27fi J ( a  () ( b  ( ) r
VI I .
206
Perturbat ion of Spectral Subspaces o f Normal Matrices
The appropriate generalisation for operators is the following.
Let A and B be operators whose spectra are disjoint from each other. Let r be any closed contour in the complex plane with winding numbers 1 around o(A) and 0 around o(B ) . Then the solution of the equation AX  X B == Y can be expressed as Theorem Vll.2.4
( VII. 1 9) Proof. If A X  X B Y, then for every complex number ( , (A  () X  X ( B  () == Y . If A  ( and B  ( are invertible, this gives
X ( B  ( )  1  ( A  ( )  1 X == ( A  ( )  1 Y ( B  ( )  1 .
Integrate both sides over the given contour r and note that J ( B  ()  1 d( 0 and  J (A  ()  1 d( r
==
21ril. This proves the theorem.
r
==
•
Our principal interest is in the case when A and B in the equation (VII. 16) are both normal or, even more specially, Hermitian or unitary. In these cases more special forms of the solution can be obtained. Let A and B be both Hermitian. Then iA and iB are skewHermitian, and hence their spectra lie on the imaginary line. This is j ust the opposite of the situation that Theorem VII. 2.3 was addressed to. If we were to imitate 00 that solution, we would try out the integral J e  i tA y e itB dt . This, however, 0
does not converge . This can be remedied by inserting a convergence factor: a function f in L 1 (�) . If we set
X=
00
j 
oo
it it e  A ye B
f ( t ) dt ,
then this is a welldefined operator for each f E £ 1 (IR) , since for each t the exponentials occurring above are unitary operators. Of course, such an X need not be a solution of the equation (VII. 16) . Can a special choice of f make it so? Once again, it is instructive to first examine the scalar case. In this case, the above expression reduces to
,..
,.. x == yf( a  b ) ,
where f is the Fourier transform of f, defined as
}( s) =
CXJ
J

oo
e
 its
j ( t ) dt.
(VII.20)
V I I . 2 The Equat ion AX

XB
1 b , we do have .. So, if we choose an f such that j(a  b ) == The following theorem generalises this to operators. a
=
Y
207
ax  xb ==
y.
Let A , B be Hermitian operators whose spectra are dis joint from each other. Let f be any function in L 1 (IR) such that j ( s) == � whenever s E o(A)  o (B) . Then the solution of the equation AX  X B == Y can be expressed as CXJ e itA y e i t B f(t) dt. X (VII.2 1 ) Theorem VII.2 . 5
=
j
 CXJ
Proof. Let a and (3 be eigenvalues of A and B with eigenvectors u and i A is unitary and its adjoint is v , respectively. Then, using the fact that e t e  it A , we see that
( e i tA A u, y e i t B v ) e it ( /3 cx ) a ( u, Yv) .
A similar consideration shows that
( u , e  itA y eitB Bv) == e i t ( f3  cx ) f3 ( u , Y v ) .
Hence, if X is given by (VII.2 1 ) , we have
( u , ( A X  X B)v)
( a  ,B) ( u , Yv)
J e i t (f3<> ) f(t ) dt CXJ
(a  (J) (u , Yv) f (a  {3) (u , Yv) . ()()
"
Since eigenvectors of
AX  X B == Y.
A and B both span the whole space, this shows that
•
The two theorems below can be proved using the same argument as above. For a function f in L 1 ( IR2 ) we will use the notation j for its Fourier transform, defined as
]( s 1 , sz ) =
J J ei (t , s, +t2s2 ) f ( t1, tz ) dt 1 dtz. CXJ CXJ
 CXJ  ()()
Let A and B be normal operators whose spectra are disjoint from each other. Let A == A1 + i A2 , B == B1 + i B2 where A 1 and A2 are commuting Hermitian operators and so are B1 and B2 . Let f be any function in L1 (IR2 ) such that ] (s 1 , s2 ) == 1 � . whenever s 1 + is2 E s
Theorem VII.2.6
,
2S2
208
Perturbat ion o f S p ectral S ubspaces of Nor mal Matrices
VI I .
o(A)  o ( B )
expressed as
.
of
the equation
AX  X B
J j e  i (t l A l +t2 A2 ) yei (t 1 B 1 +t2 B2 ) f (t 1 , t 2 )dt 1 dt2 . CX)
X=
Then the solution
CX)
==
Y
can be
CX)
(VII. 22)
 CX)
Theorem VII.2. 7 
Let A and B be unitary operators whose spectra are disjoint from each other. Let { a n } CX) be any sequence in £ 1 such that CX)
CX)
L
n =  CXJ
in an e e ==
1
1  e2· e
whenever
Then the solution of the equation AX  X B == Y can be expressed as CX)
X == L a n A n  l yB n .
(VIL2 3)
n= CXJ
The different formulae obtained above lead to estimates for I I I X III when A and B are normal. These estimates involve I II Y I I I and the separation 8 between o(A) and o(B) , where
8 == d ist ( o(A) , a (B ) ) == min { I A  Jt I : A E o (A) ,
JL
E o(B) } .
The special case of the Frobenius norm is the simplest. Theorem Vll.2 . 8 Let A and B be normal matrices, and let 8 == dist (o(A) , o( B)) > 0 . Then the solution X of the equation AX  X B ==
Y
satisfies the inequality
(VII. 24) Proof. If A and B are both diagonal with diagonal entries A 1 , . , A n and f..L l , . . . , f..L n , respectively, then the entries of X and Y are related by the equation Xij == Yij / ( A i  Jtj )  From this (VII. 24) follows immediately. If A, B are any normal matrices, we can find unitary matrices U, V and diagonal matrices A ' , B' such that A == U A' U * and B == V B'V * . The equation AX  X B == Y can be rewritten as .
.
UA' U * X  X VB' V * = Y and then as
A' (U* X V )  ( U * X V ) B' == U* Y V.
So, we now have the same type of equation but with diagonal A ' , B ' . Hence,
I I U * X V [[2 <
� [ [ U* YV [[ 2 ·
By the unitary invariance of the Frobenius norm this is the same as the • inequality (VII.24) .
Examp le
V I I . 2 The Equation AX  X B
=
Y
209
If A or B is not normal, no inequality like ( 1llf. 24) is true in general. For example, if A == Y == I and B == ( � � ) , then the equation AX  X B == Y has the solution X == ( � � ) . Here 8 == 1 , I I Y I I 2 == )2, but I I X I J 2 can be made arbitrarily large by choosing t large. Thus we ca nnot even have a bound like I I X I I 2 < � I I Y I I 2 for any cons tant c in this case. Exa mpl e VII. 2 . 10 In this example all matrices involved are Hermitian: VII. 2 . 9
X=
Then
� IIY II 
A, B .
( Vf51
Vf5 3
),
Y=
6 ( 2Vf5
2Vf5 6
).
Here 8 == 2 . But, for the operator norm, I I X I I > Thu s, the inequ ality I I X I I < i I I Y I I need not hold even for Hermitian
AX

X B == Y .
In the next theorems we will see that we do have ! I I X I I I < � I l l Y I l l for a small constant c when A and B are normal. When the spectra of A and B are separated in a special way, we can choose c == 1 .
Let A and B be normal operators such that the spec trum of B is contained in a disk D( a, p ) and the spectrum of A lies outside a concentric disk D( a, p+8) . Then, the solution of the equation AX  X B == Y satisfies the inequality
Theorem Vll.2. 1 1
I ll X Il l
�
< i ll Y I ll
(VIL 25)
for every unitarily invariant norrn. Proof. Applying a translation , we can assume that a == 0. Then the solution X can be expressed as the infinite series (VII. I 7) . From this we get
I ll X I ll
00
<
<
n L I J A l Ji + l J II Y III I I B J i n n =O 00 II I Y I J I L (p + 8)  n  l Pn n =O
1 b iii Y I I I 
•
Either by taking a limit p � oo in the above argument or by using the form of the solution (VII. 18) , we can prove the following.
VI I .
210
Pert urbation o f Spectral Subspaces
o f Normal
Matrices
Let A and B be normal operators with ()(A ) and () ( B ) lying in halfplanes separated by strip of width 8 . Then the solution of the equation AX  XB == Y satisfies the inequality {VI/. 25). Exercise VII.2 . 13 Here is an alternate proof of Theorem VI/. 2. 1 1 . As sume, without loss of generality, that a == 0. Then A is invertible. Write X == A  1 (Y + X B) and obtain the inequality (VI/. 25} directly from this . Exercise VII. 2. 14 Choose unit vectors u and v such that X v == I I X I ! u and X * u == I I X I ! v ; i. e., u and v are left and right singular vectors of X corresponding to its largest singular value. Then ( u, ( AX X B)v) == I I X II ( (u, A u)  (v , Bv) ) . Use this to prove Theorem VI/. 2. 1 1 in the special case of the operator norm. Theorem VII.2. 1 5 Let A and B be Hermitian operators with d ist (()(A) , ()(B) ) == 8 > 0 . Then, the solution of the equation A X  X B == Y satisfies the inequality Theorem VII.2 . 12
a

III X III <
� 11 1¥111
(VII.2 6)
for every unitarily invariant norm, where c 1 is a positive real number de fined as c 1 == i nf{ l l f i i Ll f :
E
1
L ( IR) , f ( s ) "
=
1
when lsi > s

1}.
(VII. 27)
Proof. Let f6 be any function in £ 1 ( R) such that Jo ( s ) == ! whenever l s i > 8. By Theorem VII.2.5 we have
j
CXJ
X=
ei
tA y itB e fo ( t )dt .
 oo
Hence ,
I l l X III < 111¥111
00
j l fo (t ) ldt = � 111¥111 j l f (t) l dt, CXJ
 oo

oo
! whenever l si > 1. Any where f ( t ) == fo ( t /8) . Note that ](s ) this property satisfies the above inequality. ==
f with •
Exactly the same argument , using Theorem VII.2.6 now leads to the following. Theorem VII.2 . 16 Let dist (()(A) , ()(B) ) == 8 > 0 . Y satisfies the inequality
A and B be normal operators with Then the solution of the equation A X  XB ==
I ll X III <
� 111¥111
(VII.28 )
VII.3
Pertur bation of Eigenspaces
211
for every unitarily invariant norm, where (VII.29) The exact evaluation of the constants c 1 and c2 is an intricate problem in Fourier analysis. This is discussed in the Appendix at the end of this chapter. It is known that and 1r
C2 < 2 Further, with this value of VII . 3
c1 ,
j sin t dt < 2. 9 1 . 7r
0
t
the inequality (VII.26) is sharp.
Perturbation of Eigenspaces
Given a normal operator A and a subset S of C, we will write PA ( S ) for the orthogonal projection onto the subspace spanned by the eigenvectors of A corresponding to those of its eigenvalues that lie in S. If sl and s2 are two disjoint sets, and if E == PA ( SI ) and F == PA ( S2 ) , then E and F are mutually orthogonal. If A and B are two normal opera tors, and if E == PA ( S1 ) and F == Ps ( S2 ) , then we might expect that if B is close to A and S1 and S2 are far apart, then E is nearly orthogonal to F. This is made precise in the theorems below.
Let A , B be normal operators. Let S1 and S2 be two subsets of the complex plane that are separated by either an annulus of width 8 or a strip of width 8. Let E PA ( SI ) , F Ps ( S2 )  Then, for every unitarily invariant norm,
Theorem VII.3 . 1
==
==
III EF III < � III E (A  B )F III < Proof. Since E commutes with A and (VII.30) can be written as
�
� I l i A  B i ll ·
F with B,
III EF III < III AE F  E FB ll l 
(VII.30)
the first inequality in
Now let EF == X . This is an operator from the space ran F to the space ran E. Restricted to these spaces, the operators B and A have their spec tra inside S2 and S1 , respectively. Thus the above inequality follows from Theorem VII. 2 . 1 1 when S1 and S2 are separated by an annulus, and from Theorem VII.2. 12 when they are separated by a strip.
212
VI I .
Pert urb ation o f S p ect ral S ubspaces o f Normal M atrices
The second inequality in
( VII.30)
is true because I I E I I == I I F I I == 1 .
•
The special case of this theorem when A and B are Hernlitian , S1 is an interval [a, b] and S2 the complement in IR of the interval (a  8 , b + 8 ) , is known as the DavisKahan sin8 Theorem. ( We saw in Section 1 that l i EF I I is the sine of the angle between ran E and ran F ..i . ) With no special assumption on the way S1 and S2 are separated, we can derive the following two theorems from Theorems VII. 2 . 1 5 and VII. 2 . 1 6 by the argument used above.
Let A and B be Hermitian operators, and let S1 , S2 be any two subsets of JR. such that dist (S1 , S2 ) == 8 > 0 . Let E == PA ( S1 ) , F == Ps ( S2 ) . Then, for every unitarily invariant norm, c E (A  B) AB (VII.31)
Theorem VII. 3 . 2
III EF III <
; III
F II I <
� lil
l ll ,
where c1 is the constant defined by (VII. 27). {We know that c 1 == ; . ) Theorem VII. 3 . 3 Let A and B be normal operators, and let S1 , S2 be any two subsets of the complex plane such that dist ( S1 , S2 ) == 8 > 0 . Let E == p ( sl ) ' F == Ps ( s2 ) . Then, for every unitarily invariant norm, A
III EF III <
� III E (A  B) F III < � lil A  B ill ,
where c2 is the constant defined by (VII. 29) . (We know that c2
(VII.32) < 2. 9 1 . )
Finally, note that for the Frobenius norm alone, we have a stronger result as a consequence of Theorem VII. 2.8.
Let A and B be normal operators and let S1 , S2 be any two subsets of the complex plane such that dist ( S1 , S2 ) == 8 > 0. Let E == PA (S1 ) , F == Ps ( S2 )  Then
Theorem Vll. 3 . 4
1 1 I I E F I I 2 < 8 1 1 E ( A  B) F I I 2 < 8 1 J A  B l l 2 · VII . 4
A Perturbation Bound for Eigenvalues
An important corollary of Theorem VII.3.3 is the following bound for the distance between the eigenvalues of two normal matrices.
There exists a constant 1 < c < 3, such that the optimal matching distance d(a (A) , a (B) ) between the eigenvalues of any two normal matrices A and B is bounded as
Theorem VII.4. 1
c,
d ( a (A) , a (B) ) < e l l A  B ll 
(VII.33)
VI L 5 Perturbation of the Polar Fac tors
213
Proof. � will show that the inequality (VII . 33 ) is true if c == c 2 , the constant in the inequality (VII.32 ) . Let rJ == c2 l 1 A  E ll and suppose d ( a ( A ) , a ( B )) > 'TJ · Then we can find a 8 > rJ such that d ( a ( A) , a ( B )) > 8. By the Marriage Theorem, this is possible if and only if there exists a set 51 consisting of k eigenvalues of A , 1 < k < n , such that the 8neighbourhood { z : dist (z, 51 ) < 8} contains less than k eigenvalues of B. Let S2 be the set of all eigenvalues of B outside this neighbourhood. Then dist (S 1 , S2 ) > 8. Let E == PA (S1 ) , F == Ps ( S2 ) · Then the dimension of the range of E is k, and that of the range of F is at least n  k + 1 . Hence I I E F I I == 1. On the other hand, the inequality (VII.32) implies that I I EF I I <
� II A  E ll = � < 1 .
This is a contradiction. S o the inequality (VII.33) is valid if we choose c == c2 ( < 2 . 9 1 ) . Example VI. 3. 13 shows that any constant c for which the inequality • (VII.33) is valid for all normal matrices A , B must be larger than 1 . 0 18. We should remark that, for Hermitian matrices, this reasoning using Theorem VII.3.2 will give the inequality d ( a ( A ) , a ( B )) < ; I I A BJI . However, in this case, we have the stronger inequality d ( a ( A ) , a ( B )) < I I A  B I I  So, this may not be the best method of deriving spectral variation bounds. However, for normal matrices, nothing more effective has been found yet. 
VII . 5
Perturbation of the Polar Factors
Let A == UP be the polar decomposition of A. The positive part P in this decomposition is P == ! A I == (A* A) 1 1 2 and is always unique. The unitary part U is unique if A is invertible. Then U == A P 1 . It is of interest to know how a change in A affects its polar factors U and P. Some results on this are proved below. Let A and B be invertible operators with polar decompositions A == UP and B == VQ, respectively, where U and V are unitary, and P and Q are positive. Then, 
l i l A  B ill == III Up  VQ III ==
III P  U * V Q II I
for every unitarily invariant norm. By symmetry,
lil A  B ill == III Q  V* u P ill · Let Y == P  U* VQ , Z = Q  V * UP.
VI I.
214
Perturbation of Spectral Subspaces of Normal Mat rices
Then
Y + Z*
==
P(I  U * V) +
( /  U* V ) Q .
(VII.34)
This equation is of the form studied in Section VII.2. Note that () ( P) is a subset of the real line bounded below by s n (A) I I A  1 11  1 and O"(Q) is a subset of the real line bounded below by s n (B) == I I B 1 I I  1 . Hence, dist (a(P), (j(Q) ) == s n (A) + sn (B) . Hence, by Theorem VII. 2. 1 1 , ==
Ill I  U * V lll
<
s n (A)
� sn (B) lil Y + Z * Ill 
Since III Y III == III Z III == li l A  B ill and I ll !  U * V III == I I I U  V III , this gives the following theorem.
Let A and B be invertible operators, and let the unitary factors in their polar decompositions. Then
Theorem VII. 5 . 1
U, V
be
(VII.35)
for every unitarily invariant norm I ll I l l Exercise Vll. 5 . 2 Find matrices A , B for which (V/!.35} is an equality. Exercise VII. 5.3 Let A , B be invertible operators. Show that ·
II I I A I  I B I I II
where
m
< (1
+
== min( II A I I , II B J I ) .
II A  l ii  1
:1 1 B  l l !  l )
2
Ili A  B i ll ,
(VII.36)
For the Frobenius norm alone, a simpler inequality can be obtained as shown below. Lemma VII.5.4
the inequality
Let f be a Lipschitz continuous function on C satisfying
Then, for all matrices X
< klz  wj,
for all z , w E C. and all normal matrices A, we have
l f( z )  f ( w) l
l l f (A) X  X f (A) ll 2
<
k l l AX  X A I I 2 ·
Proof. Assume, without loss of generality, that A == diag ( A 1 , . . . , An ) Then, if X is any matrix with entries X ij , we have l l f ( A ) X  X f ( A) II �
't
,J
z ,J
•
VI I . 5 Perturbation of the Po lar Factors
Lemma VII. 5 . 5 VII. 5.4. Let A , B
215
Let f be functi on satisfying the conditions of Lemma be any two normal matrices. Then, for every matrix X , a.
I I J ( A ) X  X f ( B) \ 1 2
Proof. Let T == ( � � ) , Y == ( g by T and Y, respectively. Corollary VII. 5.6
< k l l AX  XB \ 1 2 ·
� ) . Replace A and X in Lemma VII. 5.4 •
If A and B are normal matrices, then (VII.37)
Theorem VII. 5 . 7
Let A , B be any two matrices. Then
1 \ I A I  I B \ 1 1 � + 1 1 \ A* I  I B* I I\ �
< 2 \\ A  E l l � 
(VII. 38)
Proof. Let T == ( J. � ) , S == ( ;. � ) . Then T and S are Hermitian. Note • that I T I == ( I �* I 1 � 1 ) . So, the inequality (VII.38 ) follows from (VII.37) . It follows from (VII . 38) that (VII.39 ) The next example shows that the Lipschitz constant V2 in the above in equality cannot be replaced by a smaller number. Example VII. 5.8
Let A=
Then l A \ == A and As
E
� 0,
( � � ), B= ( � � )·
I B I ==
1
v1 + £ 2
(
1 £
£ 2 E
1 B I Il2 approaches v2. the ratio II IAJ II A  B JI2
).
We will continue the study of perturbation of the function I A I in later chapters. A useful consequence of Theorem VII.5. 7 is the following p erturbation bound for singular vectors. Theorem VII. 5.9 Let S1 , S2 be that dist(S1 , 52 ) == 8 > 0 . Let A
two subsets of the positive half line such and B be any two matrices. Let E and E' be the orthogonal projections onto the subspaces spanned by the right and the left singular vectors of A corresponding to its singular values in 81 . Let F and F' be the projections associated with B in the same way, corresponding to its singular values in S2 • Then (VII.40 )
216
Proof.
\TI L
Perturbation of Spectral S ubspaces of Normal Matrices
By Theorem VII.3.4 we have < <
1
8 1 1 I A I  I B I II 2 ,
� II [ A * [  I B * I I I 2 ·
These inequalities, together with (VII. 38) , lead to the inequality (VII.40) .
VII. 6
Appendix : Evaluating t he const ants
•
( Fourier )
The analysis in Section VII. 2 has led to some extremal problems in Fourier analysis. Here we indicate how the constants c 1 and c2 defined by (VII. 27) and (VII.29) may be evaluated. The symbol 1 1 ! 1 1 1 will denote the norm in the space £ 1 for functions defined on JR. or on IR. 2 . We are required to find a function f in L 1 , with minimal norm, such that ] (s) == ! when lsi > 1 . Since j must be continuous, we might begin by taking a continuous function that coincides with ! for l s i > 1 and then taking f to be its inverse Fourier transform. The difficulty is that the function 1. is not in £ 1 , and hence its inverse Fourier transform may not be defined. Note, however, that the function ! is square integrable at oo. So it is the Fourier transform of an L 2 function. We will show that under suitable conditions its inverse Fourier transform is in L 1 , and find one that has the least norm. Since the function 1 is an odd function, it would seem economical to extend it inside the domain (  1 , 1 ) , so that the extended function is an odd function on JR. This is indeed so. Let f E L 1 ( IR ) and suppose ] (s ) == ! when l si > 1 . Let !odd be the odd part of J , !o dd ( t) == f ( t ) J ( t ) Then f:dd (s) == ! when l s i > 1 and ll !odd ii i < ll f ll 1 · Thus the cons tant c 1 is also the infimum of II f II 1 over all odd functions in L 1 for which J ( s) == ! when lsi > 1. Now note that if j is odd, then s
s

00
] (s )
J j (t) e itsdt
 oo
i
=
i

.
00
J J (t ) sin ts dt
 oo
J Re j(t) sin ts dt + J Im j(t) sin ts dt . 00

oo
00

00
If this is to be equal to � when l s i > 1 , the Fourier transform of Ref should have its support in (  1 , 1 ) . Thus, it is enough to consider purely imaginary
VII . 6 Appendix : Evaluating the
( Fo urier )
const ant s
217
functions f in our extremal problem. Equivalently, let C be the class of all odd real functions in L 1 (IR) such that j ( s ) == 215 when I s I > 1 . Then c1
== inf{ ll f lh :
f E C}.
(VIL 4 1 ) 2 rr
Now, let g be any bounded fun12tion with period 21r having a Fourier
L#O an ei nt. Th s last condition means that J0 g ( t)dt == 0 . n
series expansi on
i
Then, for any f in
C,
00
J
_
f ( t )g (x  t) d t ==
00
L
n#O
�n einx _
(VII .42)
'l n
Note that this expression does not depend on the choice of f . For a real number x , let sgnx be defined as  1 i f x is negative and 1 if x is nonnegative. Let fo be an element o f C such that sgn fo ( t) == sgn sin t .
(VII.43)
Note that the functio n sgn fo then satisfies the requirements made on the preceding paragraph . Hence, we have
j l fo (t ) l dt 00
j fo ( t ) sgn fo (t ) dt =  j fo ( t ) sg 00
in
00
 oo
 oo
g
n
fo ( t )d t
 oo
 j f (t ) sgn fo ( t )dt < j l f (t ) i dt 00
00
 oo
 oo
for every f E C. ( Use (VIL42) with x == 0 and see the remark following it. ) Thus c 1 == li fo 11 1 , where fo is any function in C satisfying (VII.43) . We will now exhibit such a function. We have remarked earlier that it is natural to obtain fo as the inverse Fourier transform of a continuous odd function cp such that cp ( s) == ! for l s i > 1 . First we must find a good sufficient condition on cp so that its inverse Fourier tr an sfo r m cj! (t) =
� 2
j cp ( s) sin ts ds (..'()

oo
( which, by definition, is in L2 ) is in L1 . Suppose cp is differentiable on
the whole real line and its derivative
integration
by
parts shows
cp'
is of bounded variation. Then, an
j cp ( s) sin ts ds = � j cos ts cp'(s)ds. 00
00

CXJ
218
VI I .
Pert urbation o f Spectral S ubspaces o f No rmal Matrices
Another integration by parts, this time for RiemannStieltjes integrals, shows that 00 CXJ
J tp ( s ) sin ts ds = t� J sin ts dtp' (s ) .

CXJ
CXJ
As t + oo this decays as t� . So <j; ( t ) is integrable at oo . Since
cpo ( s ) ==

for l si > 1 for 0 < j s j < 1 for s == 0.
1/s
1  7r2 cot 2 s 0 7f
s
(V I I . 44 )
From the familiar series expansion
CXJ 2z  cot  z ==  + L 2 2 z n = l z 2  4n 2 1
7f
7f
( see L.V. Ahlfors, Complex Analysis, 2nd ed. , p. 188) one sees that 00
cp a ( s ) == L
n= l
2s for 0 < s < 1 . 4n 2  s 2
This shows that cp 0 is a convex function in 0 < s < 1, and hence cp � is of bounded variation in this domain. On the rest of the positive halfline too cp� is of bounded variation. So cp 0 does meet the conditions that are sufficient to ensure that fo == 0,
2 fo ( t) 2fo ( t )  2 fo ( t + ) 1r
fo ( t )  fo ( t + 2 1r )
1
1
j cot ; 0
sin t
t +
fo ( t )
sin ts ds,
sin ( t + 1r )
Ut  t �
t + 7r 1r
The quantity inside the brackets is positive t � oo , we can write 00
s
,
+ 2( t
for
all
� 21r) ] sin t. t.
Since f0 (t )
L [ !o (t + 2 n7r)  fo (t + ( 2 n + l)1r)] sin t h ( t ) sin t ,
n =O
+
0
as
VI I . 6 Append ix: Evaluating t he ( Fo urier) const ants
219
whe re h(t) > 0. This shows that fo satisfi es the condition (VII.43) . Finally, II fo 1! 1 can be evaluated from the available data. We can easily see that 4 1 1  (sin t +  sin 3t +  sin 5t + n 5 3
sgn sin t
2.
nz
L n odd
Hence , using (VII.42 ) we obtain 00
j l fo (t ) l dt

()()
2:_ n
·
e int .
00
!  oo
·
·
)
fo (t) sgn fo ( t)dt 1
n
2 � nL n odd
2
n
We have shown that c 1 == ; . This result , and its proof given above, are due to B . Sz . Nagy. The twovariable problem, through which c2 is defined, is more com plicated. The exact value of c2 is not known. We will show that c2 is fi nite by showing that there does exist a function f in L1 (JR 2 ) such that j (s 1 , s 2 ) == 81_; i s 2 when sf + s� > 1. We will then sketch an argument that leads to an estimate of c2 , skipping the technical details. It is convenient to identify a point ( x , y) in JR? with the complex variable z = x + i y. The differential operator fz = ; anni hilates every +i complex holomorphic function. It is a well known fact (see, e.g. , W. Rudin, Functional Analysis, p. 205) that the Fourier transform of the tempered distribution 1z is  2 zrri . (The normalisations we have chosen are different from those of Rudin.) Let 'P be a c oo function on JR2 that vanishes in a neighbourhood of the origin, and is 1 outside another neighbourhood of the origin. Let '¢( z) == z 1 r,o( ) . We will show that the inverse Fourier transform if; is in L . Note that
( %x
%Y )
z
This
'TJ
(
z
) .· ==
!!._,,,( ) == � d'fJ(z) z z dz . dz o/
is a c oo function with compact support. Hence , 'fJ is in the Schwartz space S. Let i] E S be its inverse Fourier transform. Then (ignoring constant factors) 1/;(z) == i] (z )/ z . Since i] is integrable at oo , so is 7/J. At the origin, ! is integrable and fJ( z) bounded. Hence if; is integrable at the origin. This shows that c2 < oo . Consider the tempered distribution fo (z) == 2� . We know that f� ( �) == 1 1 � . However, f0 ¢ L . To fix it up we seek an element p in the space of tempered d i st r i but io ns S' such that 7r 7. Z
VI I .
220
E
(i) p
Pert urbation o f S p ectral S ubspaces o f Normal Matrices
£ 1 and supp p is contained in the unit disk D , L1 .
(ii) if f == fo + p , t he n f is in
Note that c2 == inf l l f lh over su ch
Writing
z ==
rei0 , one sees that
p.
(X)
=
7f
l fl l 1 2� j rdr j I�  iei0 21rp (z ) j dO.  7!"
0
Let
=
F(r) Then
7f
J
 7!"
7f
j iei6p ( z ) de .
(VII.45 )
1
1
1   ie1· o 21rp( z ) j dB > 2 7r j   F(r) l , r r
and there is equality here i f ei0p( r ei0 ) is independent of e . Hence, we can restrict attention to only those p that satisfy the additional condition
( iii)
zp z )
(
is a radial function.
Putting
G( r)
we see that c2
=
==
inf
1  rF(r) ,
(VII.46)
J IG ( r)ldr,
(VIL47)
(X)
0
where is defined via (VII.45) and (VII.46) for all p that satis fy the conditions (i) , (ii) , and ( iii) above. The twovariable minimisation p roblem is thus reduced to a onevariable problem. Using the conditions on p, one can characterise the functions that enter here. This involves a little m ore intricate analysis, which we will skip. The con c l us ion is that the functions G that enter in (VII.4 7 ) are all £ 1 fun ct i ons of the form G == g , where g is a continuous even function
G
G
supported in [ 1 , 1] such that
c2
=
inf {
(X)
j l.§ ( t ) l dt : 0
g
J g(t) dt = 1. In other words, 1
1
even, supp g
=
g be the function g(t)
If we choose to This gives the estimate c2
<
1r .
[ 1 , 1 ] , ==
1
j
g =
1 , g E £1 } . (VI I . 4 8)
 l t l, then g(t )
==
sin2 ( ; ) ( � ) 2 .
221
VI I . 7 P roblen1s
A better estimate is obtained from the function g (t)
7r
7r == 4
for ! t l < 1 .
cos 2 t
Then g(t) ==
7r
2
A little computation shows that
4 7r 2  t 2
cos t
.
sin dt < 2 . 9090 1 . J i§(t ) idt ; J CXJ
7r
=
0
t
t
0
Thus c2 < 2.9 1. The interested reader may find the details in the paper A n extremal problem in Fourier analysis with applications to operator theory, by R. Bhatia, C. Davis , and P. Koosis, J. Functional Analysis, 82 ( 1989) 138 1 50.
VII . 7
P roblems
Problem VII. 6 . 1 . Let £ be any subspace of e n . For any vector
8(x, £) Then 8(x, £) is equal to
==
yE£
min l l x
II ( /  E) x l i ·
X
let
 Yll 
If £, F are two subspaces of e n ' let
p (£, F) == max{ max 8 ( x , F) , max 8 ( y , F) } . xEE
II X II = 1
Let dim £ Show that
==
where E and
dim F, and let F
8
y E :F
II y II = 1
be the angle operator between £ and F.
p (£, :F ) == II sin 8 11 == l i E  F ll , are the orthogonal projections onto the spaces £ and F.
Problem VII.6.2. Let A, B be operators whose spectra are disjoint from each other. Show that the operator ( � � ) is similar to ( � � ) for every C. Problem VII.6. 3. Let A, B be operators whose spectra are disjoint from each other. Show that if C commutes with A + B and with AB , t hen C commutes with both A and B. Problem VII. 6.4. The equation AX + XA * == I is called the Lyapunov equation. Show that if o(A) is contained in the open left halfplane, then the Lyapunov equation has a unique solution X , and this solution is positive and invertible. 
222
VI I .
Perturbation o f Spectral S ubspaces of Normal Mat rices
Problem VII. 6 . 5 . Let A and B be any two matrices. Suppose that all singular values of A are at a distance greater than 8 from any singular value of B . Show that for every X ,
Problem VII. 6 . 6 . Let A , B be normal operators. Let S1 and S2 be two subsets of the complex plane separated by a strip of width 8. Let E PA ( S1 ) , F == Ps (S2 ) . Suppose E (A  B) E == 0. If T ( t ) is the function T(t) == t/ vl  t2 , show that ==
I I T ( I EF ! ) I I <
� I I E (A  B) II 
Prove that this inequality is also valid for all unitarily invariant norms. This is called the tanG theorem . Problem VII.6. 7. Show that the inequality (VII.28) cannot be true if c2 < ; . ( Hint: Choose the trace norm and find suitable unitary matrices A , B. ) Problem VII. 6 . 8 . Show that the conclusion of Theorem VII.2.8 cannot be true for any Schatten pnorm if p =/= 2. Problem VII.6.9. Let A , B be unitary matrices, and let dist (a (A) , a( B) ) == b == J2. If some eigenvalue of A is at distance greater than J2 from a ( B) , then a( A) and a(B) can be separated by a strip of width J2. In this case, the solution of AX  X B == Y can be obtained from Theorem VIL2.3. Assume that all points of a ( A) are at distance J2 from all points of O"(B) . Show that the solution one obtains using Theorem VII.2. 7 in this case is If O" ( A) == { 1 ,  1 } and O" ( B) == { i  i } , this reduces to ,
X == 1 /2 (AY  YB) . Problem VII.6. 1 0 . A reformulation of the Sylvester equation in terms of tensor products is outlined below. Let
cp ( I ® A)
AEij , EijAr ,
VI I. 8 Notes
and
References
223
where AT is the transpose of A. Thus the multiplication operator A ( X ) == AX on .C ( 7t ) can be identified with the operator A ® I on 1t ® 1t * , and the operator B(X ) == X B can be identified with I ® B T . The operator T == A  B then corresponds to
A ® !  I ® B T.
Use this to give another proof of Theorem VII. 2 . 1 . Sometimes it is more convenient to identify .C ( 7t ) with 1t ® 1t instead of 1l ® 7t * . In this case, we have a bijection
VII . 8
Notes and References
Angles between two subspaces of IRn were studied in detail by C. Jordan, Essai sur la geometrie a n dimensions, Bull. Soc. Math. France, 3 ( 1875) 103 1 74. These ideas were reinvented, developed, and used by several math ematicians and statisticians. The detailed analysis of these in connection with perturbation of eigenvectors was taken up by C. Davis, Separation of two linear subspaces, Acta Sci. Math. (Szeged) , 19 ( 1958) 172 1 87. This work was continued by him in The rotation of eigenvectors by a pertur bation, J. Math. Anal. Appl. , 6 ( 1963) 159 173, and eventually led to the famous paper by C. Davis and W.M. Kahan, The rotation of eigenvectors by a perturbation III, SIAM J. Numer. Anal. 7( 1 970) 146. The CS decomposi tion in its explicit form as given in Theorem VII. 1 . 6 is due to G.W. Stewart,
On the perturbation of pseudoinverses, projections, and linear least squares problems, SIAM Rev. , 1 9 ( 1 977) 634662. It occurs implicitly in the paper
by Davis and Kahan cited above as well as in another important paper, A. Bjorck and G H. Golub, Numerical methods for computing angles be tween linear subspaces, Math. Comp. 27( 1973) 579594. Further references may be found in an excellent survey article, C . C. Paige and M. Wei, His tory and generality of the CS Decomposition. Linear Algebra Appl. , 208/209 ( 1994) 303326. Many of the proofs in Section VII. 1 are slight modifications of those given by Stewart and Sun in Matrix Perturbation Theory. The importance of the Sylvester equation AX  X B == Y in connec tion with perturbation of eigenvectors was recognised and emphasized by Davis and Kahan, and in the subsequent paper, G.W. Stewart, Error and .
perturbation bounds for subspaces associated with certain eigenvalue prob lems , SIAM Rev. , 15 ( 1973) 727 764. Almost all results we have derived for
this equation are true also in infinitedimensional Hilbert spaces. More on this equation and its applications, and an extensive bibliography, may be found in R. B hatia and P. Rosenthal, How and why to solve the equation AX  X B == Y , BulL London Math. Soc. , 29( 1997) to appear. Theorem VII. 2 . 1 was proved by J.H. Sylvester, Sur f 'equation
224
VI I .
Pert urbation of Sp ectral S ubspaces o f Normal M atrices
C . R. Acad. Sci . P ari s 99( 1884) 6771 and 1 15 1 16 . The proof given here i s due to G . Lumer and M . Rosenblum, Linear op erator equations, Proc. Amer. Math. Soc. , 10 ( 1959) 3241 . The solution (VII. 1 8) is due to E . Heinz , Beitriige zur Storungstheorie der Spektralzer legung, Math. Ann. , 123 ( 19 5 1 ) 415438. The general solution (VII. l9) is due to M. Rosenblum, On the operator equation B X  XA = Q, Duke Math. J . , 23 ( 1 956) 263270. Much of t h e rest of Section VII.2 is based on the paper by R. Bhatia, C. Davis and A. Mc intosh , Perturbation of spectral subspaces and solution of linear operator equations, Linear Algebra Appl. , 52/53 ( 1983) 4567. For Hermitian operators, Theorem VI L 3. 1 was proved in the Davis Kahan paper cited above. This is very well known among numerical an alysts as the sinO theorem and has been used frequently by them. This paper also contains a tane theorem ( see Problem VII.6.6 ) as well as sin28 and tan2B theoren1s , all for Hermitian operators. Theorem VII.3.4 (for Her mitian operators ) is also proved there. The rest of Section VIL3 is based on the paper by Bhatia, Davis, and Mcintosh cited above, as is Section VII.4. Theorem VII. 5 . 1 is due to R.C.Li, New perturbation bounds for the uni tary polar factor, SIAM J . Matrix Anal. Appl. , 16 ( 1995) 327332. Lemmas VII.5.4, VII.5.5 and Corollary VII. 5.6 were proved in F. Kittaneh, On Lip schitz functions of norrnal operators, Proc. Amer. Math. Soc. , 94 ( 1985) 416418. Theorem VII. 5 . 7 is also due to F. Kittaneh , Inequalities for the Schatten pnorrn IV, Commun. Math. Phys . , 106 ( 1 986) 58 1585. The in equality (VII. 39) was proved earlier by H. Araki and S. Yamagami, An inequality for the Hilbert Schmidt norm, Commun. Math. Phys. , 81 ( 1 981) 8998. Theorem VII. 5.9 was proved in R. Bhatia and F. Kittaneh, On some perturbation inequalities for operators, Linear Algebra Appl. , 106 ( 1988) 271279. See also P.A. 
Wedin , Perturbation bounds in connection with sin gular value decomposition, BIT, 12( 1972) 99 1 1 1 . The beautiful result c 1 == ; , and its proof in the Appendix, are taken from B . Sz.Nagy, Uber die Ungleichung von H. Bohr, Math. Nachr. , 9 ( 1 953) 255259. Problems such as this, are called minimal extrapolation pro blems for the Fourier transform. See H.S. Shapiro, Topics in Approximation The ory, Sprin ge r Lecture Notes in Mathematics Vol. 187 ( 1971 ) . In Theorem VIL2.5 , all that is required of f is that ](s) = ; when 2 points if we are s E CT( A )  O"(B) . This fixes the value of j only at n dealing with n x n matrices. For each n , let b ( n ) be the smallest constant, for which we have I [ X [[ < I [ A X  X B II , en m atrice s px
== x q ,
b�)
whenever A, B are n x n Hermitian matrices such that dist (o(A) , 8. R. McEachin has shown that b(2)
v'6 2
�
1 . 22474 ( see Example VII.2. 10)
a
( B)) ==
b( 3) and that
8 + s VIO
18
VI I . 8 Notes and References �
1 . 32285
b == nlim b(n ) + oo
22 5
==
Thus the inequality (VII.26) is sharp with
7r
 .
2
c1
==
; . See, R. McEachin ,
A sharp estimate in an operator inequality, Pro c. A mer. l\1ath. Soc. , 1 1 5
( 1 99 2) 161165 and Analyzing specific cases of an operator inequality, Linear Alge bra Appl. , 208/20 9 ( 1994) 343365. The quantity p(£, F) defined in Problem VII.6 . 1 is sometimes called the gap between £ and F. This and related measures of the distance between two subspaces of a Banach space are used extensively by T . Kato, Pertur bation Theory for Linear Operators, Chapter 4.
VIII S p ect ral Variation of Nonnormal Mat rices
In Chapter 6 we saw that if A and B are both Hermitian or both unitary, then the optimal matching distance d(o(A) , o(B)) is bounded by I I A  B I I . We also saw that for arbitrary normal matrices A , E this need not always be true ( Example VI.3 . 13) . However, in this case, we do have a slightly weaker inequality d( a (A) , o(B) ) < 3 I I A  E l l ( Theorem VII.4. 1 ) . If one of the matrices A, E is Hermitian and the other is arbitrary, then we can only have an inequality of the form d(o(A) , o(B) ) < c( n ) I I A  E ll , where c( n ) is a constant that grows like log n ( Problems VI.8.8 and VI.8.9) . A more striking change of behaviour takes place if no restriction is placed on either A or B. Let A be the n x n nilpotent upper Jordan matrix; i.e. , the matrix that has all entries 1 on its first diagonal above the main diagonal and all other entries 0. Let B be the matrix obtained from A by adding an entry £ in the bottom left corner. Then the eigenvalues of B are the nth roots of £. So d(o(A) , o(B) ) == £1/ n , whereas I I A  E l l == £. When £ is small, the quantity c l / n is much larger. No inequality like d(o(A) , o(B) ) < c ( n ) II A  B II can be true in this case. In this chapter we will obtain bounds for d(o(A) , o( B ) ) , where A, B are arbitrary matrices. These bounds are much weaker than the ones for normal matrices. We will also obtain stronger results for matrices that are not normal but have some other special properties.
VI I I . I G eneral S pectral Variation Bounds
VIII . l
227
General S pectral Variation Bounds
Throughout this section, A and B will be two n x n matrices with eigenval ues a 1 , . . . , an , and fJ1 , . . , f3n , respectively. In Section VI. 3 , \Ve introduced the notation ( VII I l ) s ( a (B) , o( A ) ) == m?X �nlai  ,Bj 1 .
.
J
'l
A bound for this number is given in the following theorem. Theorem VIII. l . l Let A, B be n
x n
matrices. Then
(VIII.2) Proof. Let j be the index for which the maximum in the definition (VIII. I ) is attained. Choose an orthonormal basis e 1 , . , en such that Be 1 f3j e1 . Then .
.
==
[m�nl ai  (3j l ] n 'l n
<
IJ l a i  ,Bj I
==
l det (A  (31 1 ) 1
i= l
by Hadamard's inequality (Exercise !. 1 . 3) . The first factor on the right hand side of the above inequality can be written as I I (A  B) e 1 J I and is, therefore, bounded by I I A  B I I  The remaining n 1 factors can be bounded 
as
•
This is adequate to derive (VIII.2) . Example VIII. 1.2 Let A equal.
==
 B == I .
Then the two sides of (VII/. 2} are
Compare this theorem with Theorem VI. 3.3. Since the righthand side of (VIII.2) is symmetric in A and B, we have a bound for the Hausdorff distance as well: (VIII.3) Exercise VIII. 1 . 3 A bound for the optimal matching distance d( o(A) , o( B)) can be derived from Theorem VII/. 1 . 1 . The argument is similar to the one used in Pro blem VI. 8. 6 and is outlined below. (i) Fix A, and for any B let
228
VI I I .
S p ectral Variation o f Nonnormal Matrices
where lvf max( I I A I I , I I B I I ) . Let a 1 , . . . , a n be the eigenvalues of A. Let D ( a 2 , c( B )) be the closed disk with radius c ( B ) and centre a i . Then, The orem VII/. 1 . 1 says that CJ(B) is contained in the set D o btained by taking the union of these disks. (ii) Let A( t ) ( 1  t ) A + tB , 0 < t < 1 . Then A(O) == A, A(1) == B , and c(A(t) ) < c( B ) for all t . Thus, for each 0 < t < 1 , o ( A ( t )) is contained in D. {iii} Since the n eigenvalues of A ( t) are continuous functions of t , each connected component of D contains as many eigenvalues of B as of A. {iv) Use the Matching Theorem t o show that this implies ==
==
d ( a(A) , a ( B ))
<
(2n  1) (2M) l  l/ n ii A  B l l 1 / n .
(VIII.4)
{v) Interchange the roles of A and B and use the result of Problem 1!. 5. 1 0
to o btain the stronger inequality
(VIII.5) The example given in the introduction shows that the exponent 1/n oc curring on the righthand side of (VIII.5) is necessary. But then homogene ity considerations require the insertion of another factor like (2 M ) l  l/n. However, the first factor n on the righthand side of (VIII. 5) can be replaced by a much smaller constant factor. This is shown in the next theorem. We will use a classical result of Chebyshev used frequently in approximation theory: if p is any monic polynomial of degree n , then
1 max jp(t) I > 2 l . O
(VIII.6)
(This can be found in standard texts such as P. Henrici, Elements of Nu merical Analysis, Wiley, 1964, p. 194; T.J. Rivlin, A n Introduction to the Approximation of Functions, Dover, 1981 , p. 3 1 . ) The following lemma is a generalisation of this inequality. Lemma VIII. 1.4 Let r be a continuous curve in the complex plane with endpoints a and b. If p is any monic polynomial of degree n, then (VIII. 7) Proof. Let L be the straight line through a and b and L between a and b:
S the segment of
L
{z : z == a + t(b  a) , t E �}
S
{ z : z == a + t (b  a ) , O < t < 1 } .
For every point z in C, let z' denote its orthogonal projection onto L. Then l z  w l > l z'  w' l for all z and w .

VI I I . l General Spectral Vari ation Bo unds
 Let A t�, i == 1 , . . . , n , be the roots of p . Let .:\� == a + ti ( b t i E JR, and let z == a + t ( b  a ) be any point on L . Then n IT i
i= l
�
z  A I
==
n IT
i= l
I ( t  ti ) ( b  a ) I
==
lb  a!n
n
Ti
i= l

l t  ti l ·
From this and the inequality (VIII.6) applied to the polynomial we can conclude that there exists a point z0 on S for which n
IT i
i=l
'
zo  .A i l
> jb  a!n 22n  l
a
),
II n
229
i= l
\vhere
( t  tz ) ,
·
Since r is a continuous curve joining a and b , z0 == .:\� for some A o E r . Since l Ao  A i l > I A �  A � l , we have shown that there exists a point .:\ 0 on
r
n IT
l b ;:,� I n s uch that l p ( .A o ) l == . I Ao  A i > 1 2 i= l l
Theorem VIII. 1 . 5 Let A and B be two n
x
•
n matrices. Then (VIII. 8 )
Proof. Let A ( t ) == ( 1  t)A + tB, 0 < t < 1 . The eigenvalues of A(t) trace n continuous curves in the plane as t changes from 0 to 1. The initial points of these curves are the eigenvalues of A , and their final points are those of B . So, to prove (VIII. 8) it suffices to show that if r is one of these curves and a and b are the endpoints of r, then I a  b l is bounded by the righthand side of (VIII.8) . Assume that I I A l l < II B I I without any loss of generality. By Lemma VIII. l.4, there exists a point .:\0 on r such that I det (A  A o l ) l >
jb  a!n 2 _1 . 2 n
Choose 0 < t0 < 1 such that .:\ 0 is an eigenvalue of ( 1  t0 ) A + t0 B . In the proof of Theorem VIII. l . l we have seen that if X , Y are any two n x n matrices and if .A is an eigenvalue of Y, then J
I det( X  AI ) I < I I X  Y I ! ( I I X I I + I Y I I ) n  1 ·
Choose X == A and Y == ( 1 t0 ) A + t0 B . This gives 1;��� � < I det (A Aol ) l < II A  B li ( II A I I + I I B I I ) n l _ Taking nth roots, we obtain the desired • conclusion. 
Note that we have, in fact, shown that the factor 4 in the inequality (VIII.8) can be replaced by the smaller number 4 x 2  1 /n . A further im provement is possible; see the Notes at the end of the chapter. However, the best possible inequality of this type is not yet known.
VII I .
230
Spectral Variat ion of Nonnormal Matrices
Perturbation of Ro ots of Polynomials
VIII . 2
The ideas used above also lead to bounds for the distance between the roots of two polynomials. This is discussed below. Lemma VIII. 2. 1 Let f(z) nomial. Let
==
zn
+
a 1 z n l + · · · + an
be any monic poly
Then all the roots of f are bounded (in absolute value) by
Proof.
If l z l
> Jt ,
Jt .
(VIII.9 )
then
f(z)
a l + · · · + an 1+zn z a 2  · · ·  an al  > 1 z zn z2 1 1 1 1      · · ·  2 22 2n 0. > 
>
•
Such z cannot, therefore, be a root of f.
Let a 1 , , O n be the roots of a monic polynomial f. We will denote by Root f the unordered ntuple { a 1 , . , a n } as well as the subset of the plane whose elements are the roots of f. We wish to find bounds for the optimal matching distance d (Root f, Root g) in terms of the distance between the coefficients of two monic polynomials f and g . Let .
.
.
.
.
f(z ) g(z)
(VIII. 10)
be two polynomials. Let
(VIII. 1 1) (VIII.12) The bounds given below are in terms of these quantities. Theorem VIII. 2.2 Let f, g be two monic polynomials Then
where
Jt
n s (Ro ot f, Root g ) < L ! a k  bk !JL n  k , k= l is given by { VIII. 9} .
as
in (VII/. 1 0} .
(VIII. l3)
VIII.2
Proof.
Perturbation of Ro ots of Poly nomials
23 1
We have
n f(z)  g( z ) == L (a k  bk )z n  k . k= l So, if a is any root of j, then, by Lemma VIII.2. 1 ,
l g (a) l
n
< <
If the roots of g are {3 1 , . . .
L l a k  b k l lnl n  k k= l n L ! ak  b k l /1n  k_ k =l
, f3n , this says that n n IT Ia  f3j I < L lak  b k ! 11n  k · j= l k=l
So , •
This proves the theorem.
Corollary VIII.2 . 3 The Hausdorff distance between the roots of f and g is bounded as
h(Rootf, Rootg) < G ( f, g ).
(VIII. 14)
Theorem VIII.2 . 4 The optimal matching distance be tween the roots of f and g is bounded as d(Root j, Root g) < 4 8 ( / , g ) .
(VIII. 15)
Proof. The argument is similar to that used in proving Theorem VIII. l .5. Let ft ( l  t ) f + tg , 0 < t < 1 . If A is a root of ft , then by Lemma VIII.2. 1 , I A I < 1 and we have ==
l t (f( .A )  g (A) ) l < l f( A )  g ( A ) I n < Li a k  bk l l.A i n  k < [G ( j , g ) ] n . k= l The roots of ft trace continuous curves t changes from 0 to 1 . The initial points of these curves are the roots of f , and the final points are the I f( A ) I
n
as
roots of g. Let r be any one of these curves, and let a, b be its endpoints. Then, by Lemma VIII. l .4, there exists a point A on r such that
232
VI I I .
Spectral Variat ion o f No nnormal l\1 atrices
This shows that ja 
bj < 4
X
2l/n G (f, g ) ,
•
and that is enough for proving the theorem. h =
( f + g) /2 . Then any convex combination of f can also be expressed as h + t ( f  g) for some t with It! < � . Use
Exercise VIII. 2.5 Let and g this to show that
d ( Root j, Root g ) < 4 1  1 /n G ( f g ) . ,
(VIII. 16)
The first factor in the above inequality can be reduced further; see the Notes at the end of the chapter. However, it is not known what the optimal value of this factor is. It is known that no constant smaller than 2 can replace this factor if the inequality is to be valid for all degrees n. Note that the only property of 1 used in the proof is that it is an upper bound for all the roots of the polynomial ( 1  t) f + tg. Any other constant with this property could be used instead. Exercise VIII. 2.6 In Pro blem I. 6. 1 1 , a bound for the distance between the coefficients of the characteristic po lynomials of two matrices was ob tained. Use that and the combinatorial identity
tkG) k =O
=
n2 n  l
n matrices A, B d ( O" (A) , O" (B )) < n l/ n (8 M) l  1 /n ii A  B ll l/ n , ( VI I I . 1 7 ) where M == max( II A II , I I B II ) . This is weaker than the bound obtained in to show that for any two
n x
Theorem VIII. l . 5.
VII I . 3
Diagonalisable Matrices
A matrix A is said to be diagonalisable if it is similar to a diagonal matrix; i.e. , if there exists an invertible matrix S and a diagonal matrix D such that A == svs  1 . This is equivalent to saying that there are n linearly independent vectors in e n that are eigenvectors for A. If S is unitary ( or the eigenvectors of A orthonormal) , A is normal. In this section we will derive some perturbation bounds for diagonalisable matrices. These are natural generalisations of some results obtained for normal matrices in Chapter 6. The condition number of an invertible matrix S is defined as cond ( S)
==
II SI I II S  1 11 
Note that cond ( S) > 1 , and cond ( S) == 1 if and only if S is a scalar multiple of a unitary matrix. Our first theorem is a generalisation of Theorem VI. 3.3.
VI I I . 3 D iago nalis able M atrices
233
Theorem VIII . 3 . 1 Let A == SDS 1 , where D is a diagonal matrix and S an invertible matrix. Then, for any matrix B , s(o(B) , o(A) ) < cond( S) I I A  E l l 
(VIII. 18)
Proof. The proof of Theorem Vl. 3.3 can be modified to give a proof of this. Let E. = cond(S) I I A  B ll . We want to show that if (3 is any eigenvalue of B, then (3 is within a distance c of some eigenvalue a.1 of A. By applying a translation, we may assume that {3 = 0. If none of the O'.j is within a distance c of this , then A is invertible and
1 So,
J I A  1 ( B  A) l l < I I A  1 II II E  A l l < 1 .
Hence, I + A  1 (E  A) is invertible, and so is B = A(I + A 1 (B  A) ) . • But then B could not have had a zero eigenvalue. Note that the properties of the operator norm used above are (i) I + A is invertible if I I A l l < 1 ; (ii) I I AB II < I I A I I I I B II for all A, B; (iii) l i D I I == max l di l if D = diag( d 1 , . . . , dn ) . There are several other norms that satisfy these properties. For example, norms induced by the pnorms on e n , 1 < p < oo , all have these three properties. So, the inequality (VIII. 18) is true for a large class of norms. Exercise VI�I.3 . 2 Using continuity arguments and the Matching Theo rem show that if A and B are as in Theorem VIII. 3. 1 , then d(o(A) , o(B) ) < (2n  l)cond(S) I I A  E ll . If B is also diagonalisable and B
== T D'T  1 , then
d(o(A) , o(B) ) < n cond ( S)cond(T) I I A  E l l An inequality stronger than this will be proved below by other means.
Theorem VIII. 3 . 1 also follows from the following theorem. Both of these are called the BauerFike Theorems. Theorem VIII.3.3 Let S be an invertible matrix. If (3 is an eigenvalue of B but not of A, then (VIIL 1 9)
VI I I .
234
Proof.
Spect ral Vari ation of Nonnormal Matrices
We can write
S(B  {3/) S  1
S[(A  {31) + B  A] S  1 S(A  {3/) S  1 {I + S(A  {31 )  1 s  1 S(B  A) S  1 } . ·
Note that the matrix on the lefthand side of this equation is not invertible. Since A  {3! is invertible, the matrix inside the braces on the righthand side is not invertible. Hence,
1 < li S( A  f3I)  1 S  1 . S(B  A)S  1 11 < 11 s (A  f3 I)  l s 1 11 11 S (B  A )s  1 11 .
•
This proves the theorem.
We now obtain, for diagonalisable matrices , analogues of some of the major perturbation bounds derived in earlier chapters for Hermitian ma trices and for normal matrices. Some auxiliary theorems about norms of commutators are proved first. Theorem VIII.3.4 Let A, B be Hermitian operators, and let r be a pos itive operator whose smallest eigenvalue is
1
{i. e. ,
r > 1 I > 0) .
III Ar  r B i l l > 1 III A  B ill
The n
(VIII. 20 )
for every unitarily invariant norm.
Proof.
Let T == Ar  r B, and let Y
Y
==
==
T + T * . Then
(A  B)r + r(A  B).
This is the Sylvester equation that we studied in Chapter 7. From Theorem VII . 2.12, we get
21 III A  B ill < IllY I ll < 2 I II TI I I
==
2 III Ar  r B il l 
This proves the theorem.
•
Corollary VIII.3.5 Let A, B be any two operators, and let r be a positive opemtor, r > 7! > 0. Then
III ( Ar  r B) EB (A* r  r B * ) lll > 'Y IIl (A  B ) EB (A  B) I l l
( VIII . 21)
for every unitarily invariant norm.
Proof.
( J. � )
This follows from (VIII.20) applied to the Hermitian operators • and ( �* � ) , and the positive operator ( � � ) .
VI I I . 3 Diagonalisable M atrices
235
Corollary VIII.3 . 6 · Let A . and B be unitary operators, and let r be a positive operator, r > 1I > 0 . Then, for every unitarily invariant norm, ( VIII. 22 )
III Ar  r B i ll > 1 II I A  B i l l Proof.
If A and B are unitary, then
sj ( Ar  rB)
==
sj ( rB*  A* r )
==
sj (A*r  rB* ) .
Thus the operator (Ar  r B) EB (A * r  r B * ) has the same singular values as those of Ar  r B, each counted twice. From this we see that ( VIII. 2 1 ) implies ( VIII.22 ) for all Ky Fan norms, and hence for all unitarily invariant • norms. Corollary VIII.3. 7 Let A, B be normal operators, and let r be a positive operator, r > rl > 0. Then
( VIII.23 ) Proof. Suppose that A is normal and its eigenvalues are a 1 , . . . , a n · Then (choosing an orthonormal basis in which A is diagonal) one sees that for every X
z ,J
If A, B are normal� then, applying this to ( � in place of X , one obtains
�)
in place of A and ( g
I I AX  X B l l 2 == II A * X  X B * 1 1 2 
�)
( VIII. 24 )
Using this, the inequality ( VIII.23 ) can be derived from ( VIII.2 1 ) .
•
A famous theorem (called the FugledePutnam Theorem, valid in Hilbert spaces of finite or infinite dimensions) says that if A and B are normal, then for any operator X, AX == XB if and only if A* X == X B* . The equality ( VIII.24 ) says much more than this.
normal A, B, the inequality (VIII. 23} is no t al wa ys true if the HilbertSchmidt norm is replaced by the operator norm. A numerical example i l lust t ing this is given below. Let
Example VIII.3.8 For r ==
ra
diag(0.6384 , 0. 6384, 1 .0000 ) ,
 0. 5205  0. 1642i 0. 1 299 + 0 . 1 709i 0. 2850  0. 1808i
0. 1042  0.3618i 0.42 1 8 + 0.4685i 0. 3850  0. 4257i
 0. 1326  0.0260i  0. 5692  0 .3178 i
0. 2973  0. 1 71 5i
VI I I .
236
B
==
S p ectral Vari ation of Nonnormal Matrices
 0 .6040 + 0 . 1 760i 0 . 0582 + 0. 2850i 0 .408 1  0. 3333i
0. 5 1 28  0 . 2865i 0.0 1 54 + 0.4497i
 0.072 1  0. 2545i
0 . 1 306 + 0.0 1 54i  0 . 500 1  0 . 2833i  0 . 2686 + 0.0247i
Then
l i Ar  r B I I = 0.8763. ! II A  E l l Theorem VIII. 3.4 and Corollary VIII.3. 7 are used in the proofs below.
Alternate proofs of both these results are sketched in the problems. These proofs do not draw on the results in Chapter 7. Theorem VIII.3.9 Let A, B be any two matrices such that A == S D1 S 1 , B == T D 2 T 1 , where S, T are invertible matrices and D 1 , D 2 are real diag onal matrices. Then
1 2 IIIEig 1 ( A)  Eig1 ( B) I I I < [cond ( S)cond (T)] 1 III A  B ill
(VIII.25)
for every unitarily invariant norm.
Proof. When A, B are Hermitian, this has already been proved; see (IV.62) . This special case will be used to prove the general result. We can write
Hence,
We could also write
and get
1 IIIT  1 SD 1  D2 T  S IIl
I I T  1 I I IIIA  B III I I S I I . Let s 1 T have the singular value decomposition s 1 T == Ur V. Then
<
!II D 1 ur v  urv D 2 III III U * n 1 ur  r v D2 V * 1 1 1 == IIIA ' r  r B ' lll ,
V D 2 V * are Hermitian matrices. Note that where A' U* D 1 U and B' r  1 s == V * r 1 U* . So, by the same argument , ==
==
We have, thus, two inequalities
n iii A  B il l > I I I A r  r B ' III , '
VI I I . 3 D iagonalisable M atrices
237
and
/311J A  B ill > III A ' r l  r 1 B ' lll , where a == II S 1 JI I I T I I , (3 IIT  1 I I II SI I  Combining these two inequalities and using the triangle inequality, we have
==
[ 1/2 (  ) 1 / 2 ] 2 > 0 implies that � The operator inequality ( � ) r!3
+ rf3 >
1
(a{3� 1; 2 I. Hence, by Theorem VIII.3.4, 2 lii A  B i ll >
1
III A '  B ' lll � 1 2 1 (n
But A' and B ' are Hermitian matrices with the same eigenvalues as those of A and B , respectively. Hence, by the result for Hermitian matrices that was mentioned at the beginning, Ili A '  B ' lll > III Eig 1 (A)  Eig 1 ( B ) III Combining the three inequalities above leads to (VIII. 25) .
•
Theorem VIII. 3 . 10 Let A, B be any two matrices such that A == SD 1 S 1 , B T D2T 1 , where S, T are invertible matrices and D 1 , D 2
==
are diagonal matrices. Then
1 2 d2 ( o( A ) , o (B ) ) < [cond(S)cond(T ) ] 1 I I A  B i b 
(VIII. 26)
Proof. When A, B are normal, this is j ust the HoffmanWielandt in equality; see (VI. 34) . The general case can be obtained from this using the inequality (VIII. 23) . The argument is the same as in the proof of the • preceding theorem. Theorems VIII.3.9 and VIII. 3 . 1 0 do reduce to the ones proved earlier for Hermitian and normal matrices. However, neither of them gives tight bounds. Even in the favourable case when A and B commute, the lefthand side of ( VIII. 24) is generally smaller than I l i A B i l l , and this is aggravated further by introducing the condition number coefficients.

Exercise VIII. 3 . 1 1 Let A and B be as in Theorem VIII. 3. 10. Suppose that all eigenvalues of A and B have modulus 1 . Show that (VIII. 27) d lll 111 ( a( A ) , a( B )) < [cond ( S)cond ( T )] 1 1 2 II I A B il l
;

for all unitarily invariant norms. For the special case of the operator norm, the factor ; above can be replaced by 1. [Hint: Use Corollary VIII. 3. 6 and the theorems on unitary matrices in Chapte r 6.}
238
VI I I .
VII I . 4
Spectral Variat io n of Nonnormal M atrices
Matrices with Real Eigenvalues
In this section we will consider a collection R of matrices that has two special properties: R is a real vector space and every element of R has only real eigenvalues. The set of all Hermitian matrices is an example of such a collection. Another example is given below. Such families of matrices arise in the study of vectorial hyperbolic differential equations. The behaviour of the eigenvalues of such a family has some similarities to that of Hermitian matrices. This is studied below. Example VIII.4. 1 Fix a block decomposition of matrices in which all di agonal blocks are square. Let R be the set of all matrices that are block upper triangular in this decomposition and whose diagonal blocks are Her mitian. Then R is a real vector space (of real dimension n2 ) and every element of R has real eigenvalues.
In this book we have called a matrix positive if it is Hermitian and all its eigenvalues are nonnegative. A matrix A will be called laxly positive if all eigenvalues of A are nonnegative. This will be written symbolically as 0 ≤_L A. If all eigenvalues of A are positive, we will say A is strictly laxly positive. We say A ≤_L B if B − A is laxly positive. We will see below that if R is a real vector space of matrices each of which has only real eigenvalues, then the laxly positive elements form a convex cone in R. So, the order ≤_L defines a partial order on R. Given two matrices A and B, we say that λ is an eigenvalue of A with respect to B if there exists a nonzero vector x such that Ax = λBx. Thus, eigenvalues of A with respect to B are the n roots of the equation det(A − λB) = 0. These are also called generalised eigenvalues.

Lemma VIII.4.2 Let A, B be two matrices such that every real linear combination of A and B has real eigenvalues. Suppose B is strictly laxly positive. Then for every real λ, A + λI has real eigenvalues with respect to B.
Proof. We have to show that for any real λ the equation

    det(A + λI − μB) = 0    (VIII.28)

is satisfied by n real μ. Let μ be any given real number. Then, by hypothesis, there exist n real λ that satisfy (VIII.28), namely the eigenvalues of μB − A. Denote these as φ_k(μ) and arrange them so that φ₁(μ) ≥ φ₂(μ) ≥ ··· ≥ φ_n(μ). We have

    det(A + λI − μB) = ∏_{k=1}^{n} (λ − φ_k(μ)).    (VIII.29)

By the results of Section VI.1, each φ_k(μ) is continuous as a function of μ. For large |μ|, (1/μ)(μB − A) is close to B. So, (1/μ)φ_k(μ) approaches λ_k^↓(B) as
μ → ∞, and λ_k^↑(B) as μ → −∞. Since B is strictly laxly positive, this implies that φ_k(μ) → ±∞ as μ → ±∞. So, every λ in ℝ is in the range of φ_k for each k = 1, 2, . . . , n. Thus, for each λ, there exist n real μ that satisfy (VIII.28). ∎
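The lemma is easy to test in its simplest instance, a Hermitian pencil. The sketch below is an editorial illustration, not from the book; it assumes NumPy. With A Hermitian and B positive definite (hence strictly laxly positive), the eigenvalues of A + λI with respect to B are the ordinary eigenvalues of B⁻¹(A + λI), and they should all be real.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X = rng.standard_normal((n, n)); A = (X + X.T) / 2          # Hermitian
Y = rng.standard_normal((n, n)); B = Y @ Y.T + np.eye(n)    # positive definite

worst = 0.0
for lam in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    # eigenvalues of A + lam*I with respect to B:
    # the roots of det(A + lam*I - mu*B) = 0, here via B^{-1}(A + lam*I)
    mus = np.linalg.eigvals(np.linalg.solve(B, A + lam * np.eye(n)))
    worst = max(worst, np.max(np.abs(mus.imag)))
print(worst)
```

Since B⁻¹(A + λI) is similar to the Hermitian matrix B^{−1/2}(A + λI)B^{−1/2}, the imaginary parts are zero up to rounding.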
Proposition VIII.4.3 Let A, B be two matrices such that every real linear combination of A and B has real eigenvalues. Suppose A is (strictly) laxly negative. Then every eigenvalue of A + iB has (strictly) negative real part.

Proof. Let μ = μ₁ + iμ₂ be an eigenvalue of A + iB. Then det(A + iB − μ₁I − iμ₂I) = 0. Multiply this by iⁿ to get

    det[(−B + μ₂I) + i(A − μ₁I)] = 0.

So the matrix −B + μ₂I has an eigenvalue −i with respect to the matrix A − μ₁I, and it has an eigenvalue i with respect to the matrix −(A − μ₁I). By hypothesis, every real linear combination of A − μ₁I and −B + μ₂I has real eigenvalues. Hence, by Lemma VIII.4.2, A − μ₁I cannot be either strictly laxly positive or strictly laxly negative. In other words,

    λ_n^↓(A) ≤ μ₁ ≤ λ_1^↓(A).

This proves the proposition. ∎

Exercise VIII.4.4 With notations as in the above proof, show that λ_n^↓(B) ≤ μ₂ ≤ λ_1^↓(B).
Theorem VIII.4.5 Let R be a real vector space whose elements are matrices with real eigenvalues. Let A, B ∈ R and let A ≤_L B. Then λ_k^↓(A) ≤ λ_k^↓(B) for k = 1, 2, . . . , n.

Proof. We will prove a more general statement: if A, B ∈ R and 0 ≤_L B, then λ_k^↓(A + μB) is a monotonically increasing function of the real variable μ. It is enough to prove this when 0 <_L B; the general case follows by continuity. In the notation of the proof of Lemma VIII.4.2 (applied with −A in place of A), λ_k^↓(A + μB) = φ_k(μ). Suppose φ_k(μ) decreases in some interval. Then we can choose a real number λ such that λ − φ_k(μ) increases from a negative to a positive value in this interval. Since φ_k(μ) → ±∞ as μ → ±∞, λ − φ_k(μ) vanishes for at least three values of μ. So, in the representation (VIII.29) this factor contributes at least three zeroes. The remaining factors contribute at least one zero each. So, for this λ, the equation (VIII.28) has at least n + 2 roots μ. This is impossible. ∎

Theorem VIII.4.6 Let R be a real vector space whose elements are matrices with real eigenvalues. Let A, B ∈ R. Then

    λ_k^↓(A) + λ_n^↓(B) ≤ λ_k^↓(A + B) ≤ λ_k^↓(A) + λ_1^↓(B)    (VIII.30)
for k = 1, 2, . . . , n.
Proof. The matrix B − λ_n^↓(B)I is laxly positive. So, by the argument in the proof of the preceding theorem, λ_k^↓(A + μB) − μλ_n^↓(B) is a monotonically increasing function of μ. Choose μ = 0, 1 to get the first inequality in (VIII.30). The same argument shows that λ_k^↓(A + μB) − μλ_1^↓(B) is a monotonically decreasing function of μ. This leads to the second inequality. ∎
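The monotonicity used in these two proofs can be observed directly. The sketch below is an editorial illustration, not from the book; it assumes NumPy and uses the Hermitian family, the simplest instance of such a space R. For Hermitian A and positive semidefinite C (so 0 ≤_L C), each curve μ ↦ λ_k^↓(A + μC) should be nondecreasing.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
X = rng.standard_normal((n, n)); A = (X + X.T) / 2   # Hermitian
Y = rng.standard_normal((n, n)); C = Y @ Y.T         # positive semidefinite

mus = np.linspace(-3.0, 3.0, 25)
# Row j holds the sorted eigenvalues of A + mus[j]*C.
curves = np.array([np.linalg.eigvalsh(A + mu * C) for mu in mus])

# Each eigenvalue curve should be nondecreasing in mu.
diffs = np.diff(curves, axis=0)
print(diffs.min())
```

A negative value here (beyond rounding error) would contradict the theorem.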
Corollary VIII.4.7 On the vector space R, the function λ_1^↓(A) is convex and the function λ_n^↓(A) is concave in the argument A.

Theorem VIII.4.8 Let A and B be two matrices such that all real linear combinations of A and B have real eigenvalues. Then

    max_{1≤k≤n} |λ_k^↓(A) − λ_k^↓(B)| ≤ spr(A − B) ≤ ‖A − B‖.    (VIII.31)
Proof. Let R be the real vector space generated by A and B. By Theorem VIII.4.6,

    λ_k^↓(A) + λ_n^↓(B − A) ≤ λ_k^↓(B) ≤ λ_k^↓(A) + λ_1^↓(B − A).

So,

    |λ_k^↓(B) − λ_k^↓(A)| ≤ max(|λ_1^↓(B − A)|, |λ_n^↓(B − A)|) = spr(A − B) ≤ ‖A − B‖. ∎

Note that Weyl's Perturbation Theorem is included in this as a special case.
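The Hermitian special case (Weyl's Perturbation Theorem) is easily checked. The sketch below is an editorial illustration, not from the book; it assumes NumPy. It compares the three quantities in (VIII.31) for a random Hermitian pair.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
X = rng.standard_normal((n, n)); A = (X + X.T) / 2
Y = rng.standard_normal((n, n)); B = (Y + Y.T) / 2

lamA = np.sort(np.linalg.eigvalsh(A))[::-1]   # decreasing enumeration
lamB = np.sort(np.linalg.eigvalsh(B))[::-1]

lhs = np.max(np.abs(lamA - lamB))
spr = np.max(np.abs(np.linalg.eigvals(A - B)))  # spectral radius of A - B
opnorm = np.linalg.norm(A - B, 2)               # operator norm

print(lhs, spr, opnorm)
```

For Hermitian matrices the spectral radius and the operator norm of A − B coincide, so the middle and right quantities agree up to rounding.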
Exercise VIII.4.9 Show that if only A, B, and A + B are assumed to have real eigenvalues, then the inequality (VIII.31) might not be true.
VIII.5  Eigenvalues with Symmetries

We have remarked earlier that the exponent 1/n occurring in the bound (VIII.8) is unavoidable. However, if A and B are restricted to some special classes, this can be improved. In this section we identify some useful classes of matrices where this exponent can be improved (though not eliminated altogether). These are matrices whose eigenvalues appear as pairs ±λ or, more generally, as tuples {λ, ωλ, . . . , ω^{p−1}λ}, where ω is a pth root of unity. We will give interesting examples of large classes of such matrices, and then show how this symmetric distribution of their eigenvalues can be exploited to get better bounds.
Example VIII.5.1 Let Aᵀ denote the transpose of a matrix A. A complex matrix is called symmetric if Aᵀ = A and skew-symmetric if Aᵀ = −A. If A is a skew-symmetric matrix, then λ is an eigenvalue of A if and only if −λ is. The class of all such matrices forms a Lie algebra. This is the Lie algebra associated with the complex orthogonal group.

Example VIII.5.2 If Aᵀ is similar to −A, then, clearly, λ is an eigenvalue of A if and only if −λ is. The Lie algebra corresponding to the symplectic Lie group contains matrices that have this property. Let n be an even number, n = 2r. Let

    J = ( 0  I )
        ( I  0 ) ,

where I is the identity matrix of order r. Let A be an n × n matrix such that Aᵀ = −JAJ⁻¹. It is easy to see that we can then write

    A = ( A₁   A₂   )
        ( A₃  −A₁ᵀ ) ,

where A₁, A₂, A₃ are r × r matrices of which A₂ and A₃ are skew-symmetric. The collection of all such matrices is the Lie algebra associated with the symplectic group.
Example VIII.5.3 Let X be a matrix of order n = pr having the special form

    X = ( 0    A₁   0    ···   0       )
        ( 0    0    A₂   ···   0       )
        ( ⋮              ⋱    ⋮       )
        ( 0    0    0    ···   A_{p−1} )
        ( A_p  0    0    ···   0       ) ,

where A₁, . . . , A_p are matrices of order r. Let Y = diag(I_r, ωI_r, . . . , ω^{p−1}I_r), where ω is the primitive pth root of unity. Then Y⁻¹XY = ωX. So, if λ is an eigenvalue of X, then so are ωλ, ω²λ, . . . , ω^{p−1}λ.
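The similarity Y⁻¹XY = ωX and the resulting symmetry of the spectrum can be verified numerically. The sketch below is an editorial illustration, not from the book; it assumes NumPy, with p = 3 and r = 2 chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(5)
p, r = 3, 2
n = p * r
w = np.exp(2j * np.pi / p)   # primitive p-th root of unity

A_blocks = [rng.standard_normal((r, r)) for _ in range(p)]
X = np.zeros((n, n), dtype=complex)
for i in range(p - 1):                        # A_1, ..., A_{p-1} on the superdiagonal
    X[i*r:(i+1)*r, (i+1)*r:(i+2)*r] = A_blocks[i]
X[(p-1)*r:, :r] = A_blocks[p-1]               # A_p in the bottom-left corner

Y = np.diag(np.repeat(w ** np.arange(p), r))
sim_ok = np.allclose(np.linalg.solve(Y, X @ Y), w * X)   # Y^{-1} X Y = w X

eigs = np.linalg.eigvals(X)
# The spectrum should be invariant under multiplication by w.
gap = max(min(abs(w * lam - eigs)) for lam in eigs)
print(sim_ok, gap)
```

`gap` measures how far ω·σ(X) is from σ(X); it should vanish up to rounding.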
Exercise VIII.5.4 Let

    Z = ( R    A₁ )
        ( A₂  −R ) ,

and suppose R commutes with A₁. Show that tr Z^k = 0 if k is odd. Use this to show that λ is an eigenvalue of Z if and only if −λ is.
Exercise VIII.5.5 Let ω be the primitive pth root of unity. If X, Y are two matrices such that XY = ωYX, then (X + Y)^p = X^p + Y^p.

Exercise VIII.5.6 Let Z be a matrix of order n = pr having the special form

    Z = ( R     A₁    0     ···   0        )
        ( 0     ωR    A₂    ···   0        )
        ( ⋮                 ⋱    ⋮        )
        ( 0     0     ···  ω^{p−2}R  A_{p−1} )
        ( A_p   0     ···   0    ω^{p−1}R  ) ,

where R commutes with A₁, A₂, . . . , A_p. Use the result of the preceding exercise to show that tr Z^k = 0 if k is not an integral multiple of p. Use this to show that if λ is an eigenvalue of Z, then so are ωλ, ω²λ, . . . , ω^{p−1}λ. (This is true even when R commutes with A₁, . . . , A_{p−1} only.)

For brevity, an n-tuple will be called p-Carrollian if n = pr and the elements of the tuple can be enumerated as

    (a₁, . . . , a_r, ωa₁, . . . , ωa_r, . . . , ω^{p−1}a₁, . . . , ω^{p−1}a_r),    (VIII.32)

where ω is the primitive pth root of unity. We have seen above several examples of matrices whose eigenvalues are p-Carrollian.
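The identity of Exercise VIII.5.5 can be tested on a standard pair satisfying XY = ωYX, the clock and shift matrices. The sketch below is an editorial illustration, not from the book; it assumes NumPy.

```python
import numpy as np

p = 5
w = np.exp(2j * np.pi / p)

# Clock and shift matrices: a standard pair with X Y = w Y X.
X = np.diag(w ** np.arange(p))        # clock: diag(1, w, ..., w^{p-1})
Y = np.roll(np.eye(p), 1, axis=0)     # shift: Y e_j = e_{j+1 mod p}

rel_ok = np.allclose(X @ Y, w * (Y @ X))

lhs = np.linalg.matrix_power(X + Y, p)
rhs = np.linalg.matrix_power(X, p) + np.linalg.matrix_power(Y, p)
print(rel_ok, np.abs(lhs - rhs).max())
```

Here X^p = Y^p = I, so the exercise predicts (X + Y)^p = 2I; the printed residual should vanish up to rounding.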
Exercise VIII.5.7 Let s_k, 1 ≤ k ≤ n, denote the elementary symmetric polynomials in n variables. If (α₁, . . . , α_n) is a Carrollian n-tuple written in the form (VIII.32), show that, modulo a sign factor, we have

    s_k(α₁, . . . , α_n) = 0 if k ≠ jp,   s_k(α₁, . . . , α_n) = s_j(α₁^p, . . . , α_r^p) if k = jp.

Use this to show that if α₁, . . . , α_n are roots of the polynomial

    f(z) = zⁿ + a₁z^{n−1} + ··· + a_n,

then α₁^p, . . . , α_r^p are roots of the polynomial

    F(z) = z^r + a_p z^{r−1} + a_{2p} z^{r−2} + ··· + a_{rp}.
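Both claims of this exercise can be confirmed numerically. The sketch below is an editorial illustration, not from the book; it assumes NumPy, with p = 3 and r = 2 chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(6)
p, r = 3, 2
n = p * r
w = np.exp(2j * np.pi / p)

alpha = rng.standard_normal(r) + 1j * rng.standard_normal(r)   # alpha_1, ..., alpha_r
roots = np.concatenate([w**j * alpha for j in range(p)])       # a p-Carrollian n-tuple

f = np.poly(roots)   # coefficients [1, a_1, ..., a_n] of prod (z - root)

# Coefficients a_k vanish unless p divides k ...
coef_ok = all(abs(f[k]) < 1e-8 for k in range(1, n + 1) if k % p)

# ... and F(z) = z^r + a_p z^{r-1} + ... + a_{rp} has roots alpha_i^p.
F = f[::p]                     # picks out 1, a_p, a_2p, ..., a_rp
resid = np.abs(np.polyval(F, alpha**p)).max()
print(coef_ok, resid)
```

Indeed f(z) = ∏ᵢ(z^p − αᵢ^p), so f is a polynomial in z^p, which is what the two checks exhibit.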
Proposition VIII.5.8 Let f, g be monic polynomials of degree n as in (VIII.10). Suppose n = pr and the roots of f and g both are p-Carrollian. Let γ be as in (VIII.11). Then the roots of f and g can be labelled as α₁, . . . , α_n and β₁, . . . , β_n in such a way that

    max_j |α_j^p − β_j^p| ≤ 4 [ Σ_{k=1}^{r} |a_{kp} − b_{kp}| γ^{p(r−k)} ]^{1/r}.    (VIII.33)

Proof. Use Theorem VIII.2.4 and Exercise VIII.5.7. ∎

Theorem VIII.5.9 Let n = pr and let A, B be two n × n matrices whose eigenvalues are p-Carrollian. Then

    d(σ(A^p), σ(B^p)) ≤ 4 c_{r,p} M^{p−1/r} ‖A − B‖^{1/r},    (VIII.34)

where M = max(‖A‖, ‖B‖) and

    c_{r,p} = [ Σ_{k=1}^{r} kp C(pr, kp) ]^{1/r},    (VIII.35)

C(pr, kp) denoting a binomial coefficient.
Proof. See Exercise VIII.2.6 and use the preceding proposition. ∎
The two results above give bounds not on the distance between the roots themselves but on that between their pth powers. If all of them are outside a neighbourhood of zero, a bound on the distance between the roots can be obtained from this. This needs the following lemma.
Lemma VIII.5.10 Let x, y be complex numbers such that |x| ≥ ρ, |y| ≥ ρ, and |x^p − y^p| ≤ C. Then, for some k, 0 ≤ k ≤ p − 1,

    |x − ω^k y| ≤ C/ρ^{p−1},    (VIII.36)

where ω is the primitive pth root of unity.
Proof. Compare the coefficients of t in the identity

    ∏_{k=0}^{p−1} [t − (x − ω^k y)] = (−1)^p [(x − t)^p − y^p]

to see that

    s_{p−1}(x − y, x − ωy, . . . , x − ω^{p−1}y) = (−1)^{p−1} p x^{p−1}.

The right-hand side has modulus larger than pρ^{p−1}, and the left-hand side is a sum of p terms. Hence, at least one of them should have modulus larger than ρ^{p−1}. So, there exists k, 0 ≤ k ≤ p − 1, such that

    ∏_{j≠k} |x − ω^j y| ≥ ρ^{p−1}.

But

    ∏_{j=0}^{p−1} |x − ω^j y| = |x^p − y^p| ≤ C.

This proves the lemma. ∎

When p = 2, the inequality (VIII.36) can be strengthened. To see this note that

    |x − y|² + |x + y|² = 2(|x|² + |y|²) ≥ 4ρ².

So, either |x − y| or |x + y| must be larger than 2^{1/2}ρ. Consequently, one of them must be smaller than C/2^{1/2}ρ. Thus if the eigenvalues of A and B are p-Carrollian and all have modulus larger than ρ, then d(σ(A), σ(B)) is bounded by C/ρ^{p−1}, where C is the quantity on the right-hand side of (VIII.34). When p = 2, this bound can be improved further to C/2^{1/2}ρ. The major improvement over the bounds obtained in Section 2 is that now the bounds involve ‖A − B‖^{1/r} instead of ‖A − B‖^{1/n}. For low values of p, the factors c_{r,p} can be evaluated explicitly.
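Lemma VIII.5.10 can be stress-tested by random sampling. The sketch below is an editorial illustration, not from the book; it assumes NumPy. For each sampled pair (x, y) with |x|, |y| ≥ ρ it checks that some kth rotation ω^k y lies within C/ρ^{p−1} of x, where C = |x^p − y^p|.

```python
import numpy as np

rng = np.random.default_rng(7)
p = 4
w = np.exp(2j * np.pi / p)
rho = 0.5

worst_ratio = 0.0
for _ in range(200):
    mx, my = rng.uniform(rho, 3.0, size=2)          # moduli at least rho
    tx, ty = rng.uniform(0.0, 2.0 * np.pi, size=2)  # random phases
    x = mx * np.exp(1j * tx)
    y = my * np.exp(1j * ty)
    C = abs(x**p - y**p)
    best = min(abs(x - w**k * y) for k in range(p))
    worst_ratio = max(worst_ratio, best / (C / rho**(p - 1)))
print(worst_ratio)
```

The lemma asserts that the printed ratio never exceeds 1.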
VIII.6  Problems

Problem VIII.6.1. Let f(z) = zⁿ + a₁z^{n−1} + ··· + a_n be a monic polynomial. Let μ₁, . . . , μ_n be the numbers |a_k|^{1/k}, 1 ≤ k ≤ n, rearranged in decreasing order. Show that all the roots of f are bounded (in absolute value) by μ₁ + μ₂. This is an improvement on the result of Lemma VIII.2.1.
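This root bound is easy to probe numerically. The sketch below is an editorial illustration, not from the book; it assumes NumPy. It samples random monic polynomials and records by how much the largest root modulus exceeds μ₁ + μ₂ (it never should).

```python
import numpy as np

rng = np.random.default_rng(8)
excess = []
for _ in range(100):
    n = int(rng.integers(2, 8))
    a = rng.standard_normal(n) * float(rng.uniform(0.0, 3.0))
    coeffs = np.concatenate([[1.0], a])   # f(z) = z^n + a_1 z^{n-1} + ... + a_n
    roots = np.roots(coeffs)
    mu = np.sort(np.abs(a) ** (1.0 / np.arange(1, n + 1)))[::-1]
    excess.append(np.abs(roots).max() - (mu[0] + mu[1]))
worst = max(excess)
print(worst)   # nonpositive up to rounding
```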
Problem VIII.6.2. Fill in the details in the following alternate proof of Theorem VIII.3.1. Let β be an eigenvalue of B but not of A. If Bx = βx, then

    x = (βI − A)⁻¹(B − A)x = S(βI − D)⁻¹S⁻¹(B − A)x.

Hence,

    ‖x‖ ≤ cond(S) ‖B − A‖ ‖(βI − D)⁻¹‖ ‖x‖.

From this it follows that

    min_j |β − α_j| ≤ cond(S) ‖B − A‖.

Notice that this proof too relies only on those properties of ‖·‖ that are shared by many other norms (like the ones induced by the p-norms on ℂⁿ). See the remark following Theorem VIII.3.1.
Problem VIII.6.3. Let B be any matrix with entries b_ij. The disks

    D_i = { z : |z − b_ii| ≤ Σ_{j≠i} |b_ij| },   1 ≤ i ≤ n,

are called the Gersgorin disks of B. The Gersgorin Disk Theorem says that

    σ(B) ⊂ ∪_{i=1}^{n} D_i,

and that any connected component of the set ∪_i D_i contains as many eigenvalues of B as the number of disks that form this component. The proof of this is outlined below.

Consider the vector norm ‖x‖_∞ = max_i |x_i| on ℂⁿ. The norm it induces on operators is

    ‖A‖_{∞→∞} = max_i Σ_{j=1}^{n} |a_ij|.

Let D be the diagonal of B, and let H = B − D. Let β be an eigenvalue of B but not of D. Then

    βI − B = βI − H − D = (βI − D)[I − (βI − D)⁻¹H].

Since βI − B is not invertible, neither is the matrix in the square brackets. Hence,

    1 ≤ ‖(βI − D)⁻¹H‖_{∞→∞}.

From this, the first part of the theorem follows. The second part follows from the continuity argument we have used often. Let B(t) = D + tH, 0 ≤ t ≤ 1. Then B(0) = D, B(1) = B; the eigenvalues of B(t) trace continuous curves that join the eigenvalues of D to those of B. Note that the proof of the first part is very similar to that of Theorem VIII.3.3; in fact, it is a special case of the earlier one.
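The first part of the theorem is straightforward to verify in code. The sketch below is an editorial illustration, not from the book; it assumes NumPy. It forms the Gersgorin disks of a random complex matrix and checks that every eigenvalue lies in at least one of them.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 6
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

centers = np.diag(B)
radii = np.abs(B).sum(axis=1) - np.abs(centers)   # off-diagonal row sums

eigs = np.linalg.eigvals(B)
# Every eigenvalue must lie in at least one Gersgorin disk.
in_some_disk = [np.any(np.abs(z - centers) <= radii + 1e-10) for z in eigs]
print(in_some_disk)
```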
Problem VIII.6.4. Given any matrix A, we can find a unitary U such that

    U*AU = T = D + N,

where T is upper triangular, D is diagonal, and N is strictly upper triangular and, hence, nilpotent. Such a reduction is not unique. The measure of nonnormality of A is defined as

    Δ(A) = inf ‖N‖,

where the infimum is taken over all N that occur in the possible triangular forms of A given above. Now let B be any other matrix, and let β be an eigenvalue of B but not of A. From (VIII.19) we have

    ‖(D − βI + N)⁻¹‖⁻¹ ≤ ‖A − B‖.

Show that

    (D − βI + N)⁻¹ = [I + (D − βI)⁻¹N]⁻¹ (D − βI)⁻¹
                   = [I − (D − βI)⁻¹N + {(D − βI)⁻¹N}² + ··· + (−1)^{n−1}{(D − βI)⁻¹N}^{n−1}] (D − βI)⁻¹.

Let δ = dist(β, σ(A)). From this equation and the inequality before it conclude that

    δ ≤ ‖A − B‖ { 1 + Δ(A)/δ + (Δ(A)/δ)² + ··· + (Δ(A)/δ)^{n−1} }.

Now show that

    δⁿ / (δ^{n−1} + δ^{n−2}Δ(A) + ··· + Δ(A)^{n−1}) ≤ ‖A − B‖.

This is Henrici's Theorem. Let f(t) = tⁿ/(1 + t + ··· + t^{n−1}). Then f(t) is close to tⁿ for small values of t, and to t for large values of t. Thus, when Δ(A) is close to 0, i.e., when A is close to being normal, the above bound leads to the asymptotic inequality

    s(σ(B), σ(A)) ≲ ‖A − B‖.

In Theorem VI.3.3 we saw that if A is normal, then s(σ(B), σ(A)) ≤ ‖A − B‖.
Problem VIII.6.5. Let ν be any norm on the space of matrices. The ν-measure of nonnormality of A is defined as

    Δ_ν(A) = inf ν(N),

where N is as in Problem VIII.6.4. Suppose that the norm ν is such that ‖A‖ ≤ ν(A) for all A. Show that Δ(A) in Henrici's Theorem can be replaced by Δ_ν(A).

Problem VIII.6.6. For the Hilbert–Schmidt norm ‖·‖₂, the measure of nonnormality satisfies the inequality

    Δ₂²(A) ≤ [(n³ − n)/12]^{1/2} ‖A*A − AA*‖₂

for every n × n matrix A. (The proof is a little intricate.)

Problem VIII.6.7. Let A have the Jordan canonical form J = SAS⁻¹. Let m be the size of the largest Jordan block in J. Let B be any other matrix. Show that for every eigenvalue β of B there is an eigenvalue α of A such that
Problem VIII.6.8. Let A, B, Γ be as in Theorem VIII.3.4. Let (A − B)x_j = λ_j x_j, where the vectors x_j are orthonormal and the eigenvalues λ_j are indexed in such a way that s_j := s_j(A − B) = |λ_j|. Let y_j be the orthonormal vectors that satisfy the relations (A − B)x_j = s_j y_j. Note that y_j = ±x_j. Note also that the difference of AΓ − ΓB and (A − B)Γ is skew-Hermitian. Use this to show that, for 1 ≤ k ≤ n,

    Re Σ_{j=1}^{k} ⟨x_j, (AΓ − ΓB)y_j⟩ = Re Σ_{j=1}^{k} ⟨x_j, (A − B)Γ y_j⟩
                                       = Σ_{j=1}^{k} s_j ⟨y_j, Γ y_j⟩ ≥ Σ_{j=1}^{k} s_j.

Use this to give an alternate proof of Theorem VIII.3.4. (See Problem III.6.6.)
Problem VIII.6.9. Fill in the details in the following proof of Corollary VIII.3.7. Let D = Γ − ½I. Then

    ‖AΓ − ΓB‖₂² = ‖(AD − DB) + ½(A − B)‖₂²
                = ‖AD − DB‖₂² + ¼‖A − B‖₂² + Re tr (AD − DB)*(A − B).

So, it suffices to show that the last term is positive. This can be seen by writing

    2 Re tr (AD − DB)*(A − B) = tr {(AD − DB)*(A − B) + (A − B)*(AD − DB)}

and then using cyclicity of the trace to reduce this to

    tr D[(A − B)*(A − B) + (A − B)(A − B)*].
Problem VIII.6.10. (i) Let X be a contractive matrix; i.e., let ‖X‖ ≤ 1. Show that there exist unitaries U and V such that X = ½(U + V). Use this to show that if D₁ and D₂ are real diagonal matrices, then

    |||D₁X − XD₂||| ≤ |||D₁^↓ − D₂^↑|||

for every unitarily invariant norm, where D^↓ (respectively D^↑) denotes the diagonal matrix whose entries are those of D rearranged in decreasing (increasing) order. [See (IV.62).]

(ii) Let A = SD₁S⁻¹, B = TD₂T⁻¹, where S and T are invertible matrices and D₁, D₂ are real diagonal matrices. Show that

    |||A − B||| ≤ cond(S) cond(T) |||Eig^↓(A) − Eig^↑(B)|||.
Problem VIII.6.11. Let A and B be any two diagonalisable matrices with eigenvalues λ₁, . . . , λ_n and μ₁, . . . , μ_n, respectively. Let A = SD₁S⁻¹, B = TD₂T⁻¹, where S and T are invertible matrices and D₁, D₂ are diagonal matrices. Show that

    ‖A − B‖₂ ≤ cond(S) cond(T) max_π ( Σ_i |λ_i − μ_{π(i)}|² )^{1/2},

where π varies over all permutations on n symbols. [See Theorem VI.4.1.]
Problem VIII.6.12. Let A be a Hermitian matrix with eigenvalues α₁, . . . , α_n. Let B be any other matrix. For 1 ≤ j ≤ n, let

    D_j = { z : |z − α_j| ≤ ‖A − B‖, |Im z| ≤ ‖Im(A − B)‖ }.

The regions D_j are disks flattened on the top and bottom by horizontal lines. Show that the eigenvalues of B are contained in ∪_j D_j, and that each connected component of this set contains as many eigenvalues of A as of B.
Problem VIII.6.13. Let R be a real vector space whose elements are matrices with real eigenvalues. Show that the function Σ_{j=1}^{k} λ_j^↓(A) is a convex function of A on this space for 1 ≤ k ≤ n. Show that the function Σ_{j=1}^{k} λ_j^↑(A) is concave on R.
Problem VIII.6.14. If R₁ is invertible, then

    ( R₁  A₁ ) ( I  −R₁⁻¹A₁ )   =   ( R₁  0              )
    ( A₂  R₂ ) ( 0   I      )       ( A₂  R₂ − A₂R₁⁻¹A₁ ).

Use this to show that if

    Z = ( R    A₁ )
        ( A₂  −R )

and R commutes with A₁, then Z and −Z have the same eigenvalues. (Show that they have the same characteristic polynomials.) This gives another proof of the statement at the end of Exercise VIII.5.6, for p = 2. The same method works for p > 2. For instance, the case p = 3 is dealt with as follows. If R₁, R₂ are invertible, then

    ( R₁  A₁  0  ) ( I  −R₁⁻¹A₁   R₁⁻¹A₁R₂⁻¹A₂ )   =   ( R₁    0            0                    )
    ( 0   R₂  A₂ ) ( 0   I        −R₂⁻¹A₂      )       ( 0     R₂           0                    )
    ( A₃  0   R₃ ) ( 0   0         I           )       ( A₃   −A₃R₁⁻¹A₁    R₃ + A₃R₁⁻¹A₁R₂⁻¹A₂ ).

Derive similar factorisations for p > 3, and use this to prove the statement at the end of Exercise VIII.5.6.
VIII.7  Notes and References
Many of the topics in this chapter have been presented earlier in R. Bhatia, Perturbation Bounds for Matrix Eigenvalues, Longman, 1987, and in G.W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, 1990. Some results that were proved after the publication of these books have, of course, been included here.

The first major results on perturbation of roots of polynomials were proved by A. Ostrowski, Recherches sur la méthode de Graeffe et les zéros des polynômes et des séries de Laurent, Acta Math., 72 (1940) 99–257. See also Appendices A and B of his book Solution of Equations and Systems of Equations, Academic Press, 1960. Theorem VIII.2.2 is due to Ostrowski. Using this he proved an inequality weaker than (VIII.15); this had a factor (2n − 1) instead of 4. The argument used by him is the one followed in Exercise VIII.1.3. Ostrowski was also the first to derive perturbation bounds for eigenvalues of arbitrary matrices in his paper Über die Stetigkeit von charakteristischen Wurzeln in Abhängigkeit von den Matrizenelementen, Jber. Deut. Math.-Verein., 60 (1957) 40–42. See also Appendix K of his book cited above. The inequality he proved involved the matrix norm ‖A‖_L = Σ_{i,j} |a_ij|, which is easy to compute but is not unitarily invariant. With this norm, his inequality is like the one in (VIII.4).

An inequality for d(σ(A), σ(B)) in terms of the unitarily invariant Hilbert–Schmidt norm was proved by R. Bhatia and K.K. Mukherjea, On the rate of change of spectra of operators, Linear Algebra Appl., 27 (1979) 147–157. They followed the approach in Exercise VIII.2.6 and, after a little tidying up, their result looks like (VIII.4) but with the larger norm ‖·‖₂ instead of ‖·‖. This approach was followed, to a greater success, in R. Bhatia and S. Friedland, Variation of Grassmann powers and spectra, Linear Algebra Appl., 40 (1981) 1–18. In this paper, the norm ‖·‖ was used and an inequality slightly weaker than (VIII.4) was proved. An improvement of these inequalities in which (2n − 1) is replaced by n was made by L. Elsner, On the variation of the spectra of matrices, Linear Algebra Appl., 47 (1982) 127–138. The major insightful observation was that the Matching Theorem does not exploit the symmetry between the polynomials f and g, nor the matrices A and B, under consideration. Theorem VIII.1.1 is also due to L. Elsner, An optimal bound for the spectral variation of two matrices, Linear Algebra Appl., 71 (1985) 77–80. The argument using Chebyshev polynomials, that we have employed in Sections VIII.1 and VIII.2, seems to have been first used by A. Schönhage, Quasi-GCD computations, J. Complexity, 1 (1985) 118–137. (See Theorem 2.7 of this paper.) It was discovered independently by D. Phillips, Improving spectral variation bounds with Chebyshev polynomials, Linear Algebra Appl., 133 (1990) 165–173. Phillips proved a weaker inequality than (VIII.8)
with a factor 8 instead of 4. This argument was somewhat simplified and used again by R. Bhatia, L. Elsner, and G. Krause, Bounds for the variation of the roots of a polynomial and the eigenvalues of a matrix, Linear Algebra Appl., 142 (1990) 195–209. Theorems VIII.1.5 and VIII.2.4 (and their proofs) have been taken from this paper. Using finer results from Chebyshev approximation, G. Krause has shown that the factor 4 occurring in these inequalities can be replaced by 3.08. See his paper Bounds for the variation of matrix eigenvalues and polynomial roots, Linear Algebra Appl., 208/209 (1994) 73–82. It was shown by Bhatia, Elsner, and Krause in the paper cited above that, in the inequality (VIII.15), the factor 4 cannot be replaced by anything smaller than 2.

Theorems VIII.3.1 and VIII.3.3 were proved in the very influential paper, F.L. Bauer and C.T. Fike, Norms and exclusion theorems, Numer. Math., 2 (1960) 137–141. See the discussion in Stewart and Sun, p. 177. The basic idea behind results in Section VIII.3 from Theorem VIII.3.4 onwards is due to W. Kahan, Inclusion theorems for clusters of eigenvalues of Hermitian matrices, Technical Report, Computer Science Department, University of Toronto, 1967. Theorem VIII.3.4 for the special case of the operator norm is proved in this report. The inequality (VIII.23) is due to J.-G. Sun, On the perturbation of the eigenvalues of a normal matrix, Math. Numer. Sinica, 6 (1984) 334–336. The ideas of Kahan's and Sun's proofs are outlined in Problems VIII.6.8 and VIII.6.9. Theorem VIII.3.4, in its generality, was proved in R. Bhatia, C. Davis, and F. Kittaneh, Some inequalities for commutators and an application to spectral variation, Aequationes Math., 41 (1991) 70–78. The three corollaries were also proved there. These authors then used their commutator inequalities to derive weaker versions of Theorems VIII.3.9 and VIII.3.10; in all these, the square root in the inequalities (VIII.25) and (VIII.26) is missing. For the operator norm alone, the inequality (VIII.25) was proved by T.-X. Lu, Perturbation bounds for eigenvalues of symmetrizable matrices, Numerical Mathematics: a Journal of Chinese Universities, 16 (1994) 177–185 (in Chinese). The inequalities (VIII.25)–(VIII.27) have been proved recently by R. Bhatia, F. Kittaneh, and R.-C. Li, Some inequalities for commutators and an application to spectral variation II, Linear and Multilinear Algebra, to appear. The inequality in Problem VIII.6.10 was proved in R. Bhatia, L. Elsner, and G. Krause, Spectral variation bounds for diagonalisable matrices, Preprint 94-098, SFB 343, University of Bielefeld. Example VIII.3.8 (and another example illustrating the same phenomenon for the trace norm) was constructed in this paper. The inequality in Problem VIII.6.11 was found by L. Elsner and S. Friedland, Singular values, doubly stochastic matrices and applications, Linear Algebra Appl., 220 (1995) 161–169.

The results of Section VIII.4 were discovered by P.D. Lax, Differential equations, difference equations and matrix theory, Comm. Pure Appl. Math., 11 (1958) 175–194. Lax was motivated by the theory of linear partial
differential equations of hyperbolic type, and his proofs used techniques from this theory. The paper of Lax was followed by one by H.F. Weinberger, Remarks on the preceding paper of Lax, Comm. Pure Appl. Math., 11 (1958) 195–196. He gave simple matrix-theoretic proofs of these theorems, which we have reproduced here. L. Gårding later pointed out that these are special cases of his results for hyperbolic polynomials that appeared in his papers Linear hyperbolic partial differential equations with constant coefficients, Acta Math., 84 (1951) 1–62, and An inequality for hyperbolic polynomials, J. Math. Mech., 8 (1959) 957–966. A characterisation of the kind of spaces R discussed in Section VIII.4 was given by H. Wielandt, Lineare Scharen von Matrizen mit reellen Eigenwerten, Math. Z., 53 (1950) 219–225.

It was observed by R. Bhatia, On the rate of change of spectra of operators II, Linear Algebra Appl., 36 (1981) 25–32, that better perturbation bounds can be obtained for matrices whose eigenvalues occur in pairs ±λ. This was carried further in the paper Symmetries and variation of spectra, Canadian J. Math., 44 (1992) 1155–1166, by R. Bhatia and L. Elsner, who considered matrices whose eigenvalues are p-Carrollian. See also the paper by R. Bhatia and L. Elsner, The q-binomial theorem and spectral symmetry, Indag. Math., N.S., 4 (1993) 11–16. The material in Section VIII.5 is taken from these three papers.

The bound in Problem VIII.6.1 is due to Lagrange. There are several interesting and useful bounds known for the roots of a polynomial. Since the roots of a polynomial are the eigenvalues of its companion matrix, some of these bounds can be proved by using bounds for eigenvalues. An interesting discussion may be found in Horn and Johnson, Matrix Analysis, pages 316–319.

The Gersgorin Disk Theorem was proved in S.A. Gersgorin, Über die Abgrenzung der Eigenwerte einer Matrix, Izv. Akad. Nauk SSSR, Ser. Fiz.-Mat., 6 (1931) 749–754. A matrix is called diagonally dominant if |a_ii| > Σ_{j≠i} |a_ij|, 1 ≤ i ≤ n. Every diagonally dominant matrix is nonsingular. Gersgorin's Theorem is a corollary. This theorem is applied to the study of several perturbation problems in J.H. Wilkinson, The Algebraic Eigenvalue Problem. A comprehensive discussion is also given in Horn and Johnson, Matrix Analysis. The results of Problems VIII.6.4, VIII.6.5, and VIII.6.6 are due to P. Henrici, Bounds for iterates, inverses, spectral variation and fields of values of nonnormal matrices, Numer. Math., 4 (1962) 24–39. Several other very interesting results that involve the measure of nonnormality are proved in this paper. For example, we know that the numerical range W(A) of a matrix A contains the convex hull H(A) of the eigenvalues of A, and that the two sets are equal if A is normal. Henrici gives a bound for the distance between the boundaries of H(A) and W(A) in terms of the measure of nonnormality of A.
There are several different ways to measure the nonnormality of a matrix. Problem VIII.6.6 relates two such measures by an inequality. The relations between several different measures of nonnormality are discussed in L. Elsner and M.H.C. Paardekooper, On measures of nonnormality of matrices, Linear Algebra Appl., 92 (1987) 107–124. Is a nearly normal matrix near to an (exactly) normal matrix? More precisely, for every ε > 0, does there exist a δ > 0 such that if ‖A*A − AA*‖ < δ then there exists a normal B such that ‖A − B‖ < ε? The existence of such a δ for each fixed dimension n was shown by C. Pearcy and A. Shields, Almost commuting matrices, J. Funct. Anal., 33 (1979) 332–338. The problem of finding a δ depending only on ε but not on the dimension n is linked to several important questions in the theory of operator algebras. This has been shown to have an affirmative solution in a recent paper: H. Lin, Almost commuting selfadjoint matrices and applications, preprint, 1995. No explicit formula for δ is given in this paper. In an infinite-dimensional Hilbert space, the answer to this question is in the negative because of index obstructions. The inequality in Problem VIII.6.7 was proved in W. Kahan, B.N. Parlett, and E. Jiang, Residual bounds on approximate eigensystems of nonnormal matrices, SIAM J. Numer. Anal., 19 (1982) 470–484. The inequality in Problem VIII.6.12 was proved by W. Kahan, Spectra of nearly Hermitian matrices, Proc. Amer. Math. Soc., 48 (1975) 11–17. For 2 × 2 block-matrices, the idea of the argument in Problem VIII.6.14 is due to M.D. Choi, Almost commuting matrices need not be nearly commuting, Proc. Amer. Math. Soc., 102 (1988) 529–533. This was extended to higher order block-matrices by R. Bhatia and L. Elsner, Symmetries and variation of spectra, cited above.
IX
A Selection of Matrix Inequalities

In this chapter we will prove several inequalities for matrices. From the vast collection of such inequalities, we have selected a few that are simple and widely useful. Though they are of different kinds, their proofs have common ingredients already familiar to us from earlier chapters.

IX.1  Some Basic Lemmas
If A and B are any two matrices, then AB and BA have the same eigenvalues. (See Exercise I.3.7.) Hence, if f(A) is any function on the space of matrices that depends only on the eigenvalues of A, then f(AB) = f(BA). Examples of such functions are the spectral radius, the trace, and the determinant. If A is normal, then the spectral radius spr(A) is equal to ‖A‖. Using this, we can prove the following two useful propositions.

Proposition IX.1.1 Let A, B be any two matrices such that the product AB is normal. Then, for every unitarily invariant norm, we have

    |||AB||| ≤ |||BA|||.    (IX.1)

Proof. For the operator norm this is an easy consequence of the two facts mentioned above; we have

    ‖AB‖ = spr(AB) = spr(BA) ≤ ‖BA‖.

The general case needs more argument. Since AB is normal, s_j(AB) = |λ_j(AB)|, where |λ₁(AB)| ≥ ··· ≥ |λ_n(AB)| are the eigenvalues of AB
arranged in decreasing order of magnitude. But |λ_j(AB)| = |λ_j(BA)|. By Weyl's Majorant Theorem (Theorem II.3.6), the vector |λ(BA)| is weakly majorised by the vector s(BA). Hence we have the weak majorisation s(AB) ≺_w s(BA). From this the inequality (IX.1) follows. ∎
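Proposition IX.1.1 can be exercised numerically. The sketch below is an editorial illustration, not from the book; it assumes NumPy. To obtain a pair with AB normal, it takes an invertible A, a normal N, and sets B = A⁻¹N, so that AB = N while BA = A⁻¹NA; the operator norms are then compared.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 5

A = rng.standard_normal((n, n)) + n * np.eye(n)   # shift keeps A comfortably invertible
q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
N = q @ np.diag(rng.standard_normal(n) + 1j * rng.standard_normal(n)) @ q.conj().T
B = np.linalg.solve(A, N)                         # B = A^{-1} N

AB, BA = A @ B, B @ A
normal_ok = np.allclose(AB @ AB.conj().T, AB.conj().T @ AB)   # AB is normal

nab = np.linalg.norm(AB, 2)   # operator norm = largest singular value
nba = np.linalg.norm(BA, 2)
print(normal_ok, nab, nba)
```

As the proof shows, ‖AB‖ = spr(N) = spr(BA) ≤ ‖BA‖, so the first norm never exceeds the second.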
Proposition IX.1.2 Let A, B be any two matrices such that the product AB is Hermitian. Then, for every unitarily invariant norm, we have

    |||AB||| ≤ |||Re(BA)|||.    (IX.2)

Proof. The eigenvalues of BA, being the same as the eigenvalues of the Hermitian matrix AB, are all real. So, by Proposition III.5.3, we have the majorisation λ(BA) ≺ λ(Re BA). From this we have the weak majorisation |λ(BA)| ≺_w |λ(Re BA)|. (See Examples II.3.5.) The rest of the argument is the same as in the proof of the preceding proposition. ∎
Some of the inequalities proved in this chapter involve the matrix exponential. An extremely useful device in proving such results is the following theorem.
Theorem IX.1.3 (The Lie Product Formula) For any two matrices $A, B$,
$$\lim_{m \to \infty} \left( \exp\frac{A}{m}\, \exp\frac{B}{m} \right)^m = \exp(A + B). \tag{IX.3}$$

Proof. For any two matrices $X, Y$, and for $m = 1, 2, \ldots$, we have
$$X^m - Y^m = \sum_{j=0}^{m-1} X^{m-1-j} (X - Y) Y^j.$$
Using this we obtain
$$\|X^m - Y^m\| \le m\, M^{m-1} \|X - Y\|, \tag{IX.4}$$
where $M = \max(\|X\|, \|Y\|)$. Now let
$$X_m = \exp\frac{A+B}{m}, \qquad Y_m = \exp\frac{A}{m}\,\exp\frac{B}{m}, \qquad m = 1, 2, \ldots .$$
Then $\|X_m\|$ and $\|Y_m\|$ both are bounded above by $\exp\left(\frac{\|A\| + \|B\|}{m}\right)$. From the power series expansion for the exponential function, we see that
$$X_m - Y_m = \left[ I + \frac{A+B}{m} + \frac{1}{2}\Big(\frac{A+B}{m}\Big)^2 + \cdots \right] - \left[ I + \frac{A}{m} + \frac{1}{2}\Big(\frac{A}{m}\Big)^2 + \cdots \right]\left[ I + \frac{B}{m} + \frac{1}{2}\Big(\frac{B}{m}\Big)^2 + \cdots \right] = O\!\left(\frac{1}{m^2}\right)$$
for large $m$.
Hence, using the inequality (IX.4), we see that
$$\|X_m^m - Y_m^m\| \le m \exp(\|A\| + \|B\|)\, O\!\left(\frac{1}{m^2}\right).$$
This goes to zero as $m \to \infty$. But $X_m^m = \exp(A+B)$ for all $m$. Hence, $\lim_{m \to \infty} Y_m^m = \exp(A+B)$. This proves the theorem. ■
The reader should compare the inequality (IX.4) with the inequality in Problem I.6.11.
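The convergence asserted by the Lie Product Formula is easy to observe numerically. The following sketch (Python with NumPy; the series-based `expm` helper is our own illustrative choice, not from the text) compares $(e^{A/m}e^{B/m})^m$ with $e^{A+B}$ for increasing $m$:

```python
import numpy as np

def expm(X, terms=80):
    """Matrix exponential via its power series (adequate for moderate ||X||)."""
    E = np.eye(X.shape[0], dtype=complex)
    T = np.eye(X.shape[0], dtype=complex)
    for k in range(1, terms):
        T = T @ X / k
        E += T
    return E

rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((4, 4))
B = 0.5 * rng.standard_normal((4, 4))

target = expm(A + B)
errors = []
for m in (1, 10, 100, 1000):
    approx = np.linalg.matrix_power(expm(A / m) @ expm(B / m), m)
    errors.append(np.linalg.norm(approx - target, 2))
# The error behaves like O(1/m), as the proof above predicts.
```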
Exercise IX.1.4 Show that for any two matrices $A, B$,
$$\lim_{m \to \infty} \left( \exp\frac{B}{2m}\, \exp\frac{A}{m}\, \exp\frac{B}{2m} \right)^m = \exp(A + B).$$
Exercise IX.1.5 Show that for any two matrices $A, B$,
$$\lim_{t \to 0} \left( \exp\frac{tB}{2}\, \exp tA\, \exp\frac{tB}{2} \right)^{1/t} = \exp(A + B).$$

IX.2 Products of Positive Matrices
In this section we prove some inequalities for the norm, the spectral radius, and the eigenvalues of the product of two positive matrices.
Theorem IX.2.1 Let $A, B$ be positive matrices. Then, for $0 \le s \le 1$,
$$\|A^s B^s\| \le \|AB\|^s. \tag{IX.5}$$

Proof. Let
$$D = \{\, s \in [0,1] : \|A^s B^s\| \le \|AB\|^s \,\}.$$
Then $D$ is a closed subset of $[0,1]$ and contains the points 0 and 1. So, to prove the theorem, it suffices to prove that if $s$ and $t$ are in $D$, then so is $\frac{s+t}{2}$. We have
$$\big\|A^{\frac{s+t}{2}} B^{\frac{s+t}{2}}\big\|^2 = \big\|B^{\frac{s+t}{2}} A^{s+t} B^{\frac{s+t}{2}}\big\| = \mathrm{spr}\big(B^{\frac{s+t}{2}} A^{s+t} B^{\frac{s+t}{2}}\big) = \mathrm{spr}(B^s A^{s+t} B^t) \le \|B^s A^{s+t} B^t\| \le \|B^s A^s\|\, \|A^t B^t\| = \|A^s B^s\|\, \|A^t B^t\|.$$
At the first step we used the relation $\|T\|^2 = \|T^*T\|$, and at the last step the relation $\|T^*\| = \|T\|$ for all $T$. If $s, t$ are in $D$, this shows that
$$\big\|A^{\frac{s+t}{2}} B^{\frac{s+t}{2}}\big\| \le \|AB\|^{(s+t)/2},$$
and this proves the theorem. ■
An equivalent formulation of the above theorem is given below, with another proof that is illuminating.
Theorem IX.2.2 If $A, B$ are positive matrices with $\|AB\| \le 1$, then $\|A^s B^s\| \le 1$ for $0 \le s \le 1$.

Proof. We can assume that $A > 0$. The general case follows from this by a continuity argument. We then have the chain of implications
$$\|AB\| \le 1 \Rightarrow \|AB^2A\| \le 1 \Rightarrow AB^2A \le I \Rightarrow B^2 \le A^{-2} \ \text{(by Lemma V.1.5)} \Rightarrow B^{2s} \le A^{-2s} \ \text{(by Theorem V.1.9)} \Rightarrow A^s B^{2s} A^s \le I \ \text{(by Lemma V.1.5)} \Rightarrow \|A^s B^{2s} A^s\| \le 1 \Rightarrow \|A^s B^s\| \le 1. \ \blacksquare$$
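Theorem IX.2.1 (equivalently, Theorem IX.2.2) can be spot-checked numerically. The sketch below, with helper names of our own, draws random positive matrices and tests $\|A^sB^s\| \le \|AB\|^s$:

```python
import numpy as np

def rand_psd(n, rng):
    """A random positive semidefinite matrix G G^T (our helper)."""
    G = rng.standard_normal((n, n))
    return G @ G.T

def psd_power(A, s):
    """A^s for positive semidefinite A, via the spectral decomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0, None)**s) @ V.T

rng = np.random.default_rng(1)
violations = 0
for _ in range(100):
    A, B = rand_psd(4, rng), rand_psd(4, rng)
    nAB = np.linalg.norm(A @ B, 2)         # operator norm ||AB||
    for s in (0.1, 0.3, 0.5, 0.7, 0.9):
        lhs = np.linalg.norm(psd_power(A, s) @ psd_power(B, s), 2)
        if lhs > nAB**s * (1 + 1e-10):     # small relative tolerance
            violations += 1
```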
Another equivalent formulation is the following theorem.
Theorem IX.2.3 Let $A, B$ be positive matrices. Then, for $t \ge 1$,
$$\|AB\|^t \le \|A^t B^t\|. \tag{IX.6}$$

Proof. From Theorem IX.2.1, we have $\|A^{1/t} B^{1/t}\| \le \|AB\|^{1/t}$ for $t \ge 1$. Replace $A, B$ by $A^t, B^t$, respectively. ■

Exercise IX.2.4 Let $A, B$ be positive matrices. Then
(i) $\|A^{1/t} B^{1/t}\|^t$ is a monotonically decreasing function of $t$ on $(0, \infty)$.
(ii) $\|A^t B^t\|^{1/t}$ is a monotonically increasing function of $t$ on $(0, \infty)$.
In Section 5 we will see that the inequalities (IX.5) and (IX.6) are, in fact, valid for all unitarily invariant norms. Results akin to the ones above can be proved for the spectral radius in place of the norm. This is done below. If $A$ and $B$ are positive, the eigenvalues of $AB$ are positive. (They are the same as the eigenvalues of the positive matrix $A^{1/2} B A^{1/2}$.) If $T$ is any matrix with positive eigenvalues, we will enumerate its eigenvalues as $\lambda_1(T) \ge \lambda_2(T) \ge \cdots \ge \lambda_n(T) > 0$. Thus $\lambda_1(T)$ is equal to the spectral radius $\mathrm{spr}(T)$.

Theorem IX.2.5 If $A, B$ are positive matrices with $\lambda_1(AB) \le 1$, then $\lambda_1(A^s B^s) \le 1$ for $0 \le s \le 1$.

Proof. As in the proof of Theorem IX.2.2, we can assume that $A > 0$. We then have the chain of implications
$$\lambda_1(AB) \le 1 \Rightarrow \lambda_1(A^{1/2} B A^{1/2}) \le 1 \Rightarrow A^{1/2} B A^{1/2} \le I \Rightarrow B \le A^{-1} \Rightarrow B^s \le A^{-s} \Rightarrow A^{s/2} B^s A^{s/2} \le I \Rightarrow \lambda_1(A^{s/2} B^s A^{s/2}) \le 1 \Rightarrow \lambda_1(A^s B^s) \le 1.$$
This proves the theorem. ■
It should be noted that all implications in this proof and that of Theorem IX.2.2 are reversible with one exception: if $A \ge B \ge 0$, then $A^s \ge B^s$ for $0 \le s \le 1$, but the converse is not true.
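The failure of the converse can be exhibited concretely. In the sketch below (our own example, not one from the text), $A^{1/2} \ge B^{1/2}$ holds while $A \ge B$ fails:

```python
import numpy as np

# Take X >= Y >= 0 whose squares are NOT comparable, and set A = X^2, B = Y^2.
X = np.array([[2.0, 1.0], [1.0, 1.0]])   # positive definite
Y = np.array([[1.0, 0.0], [0.0, 0.0]])   # positive semidefinite, X - Y >= 0
A, B = X @ X, Y @ Y                      # so A^(1/2) = X, B^(1/2) = Y

def is_psd(M):
    """Check positive semidefiniteness up to round-off."""
    return np.linalg.eigvalsh(M).min() >= -1e-12

root_order = is_psd(X - Y)   # A^(1/2) >= B^(1/2) holds
full_order = is_psd(A - B)   # but A >= B fails: det(A - B) = -1 < 0
```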
Theorem IX.2.6 Let $A, B$ be positive matrices. Then, for $0 \le s \le 1$,
$$\lambda_1(A^s B^s) \le \lambda_1^s(AB). \tag{IX.7}$$

Proof. Let $\lambda_1(AB) = \alpha^2$. If $\alpha \ne 0$, we have $\lambda_1\left(\frac{A}{\alpha}\,\frac{B}{\alpha}\right) = 1$. So, by Theorem IX.2.5, $\lambda_1(A^s B^s) \le \alpha^{2s} = \lambda_1^s(AB)$. If $\alpha = 0$, we have $\lambda_1(A^{1/2} B A^{1/2}) = 0$, and hence $A^{1/2} B A^{1/2} = 0$. From this it follows that the range of $A$ is contained in the kernel of $B$. But then $A^{s/2} B^s A^{s/2} = 0$, and hence $\lambda_1(A^s B^s) = 0$. ■
Exercise IX.2.7 Let $A, B$ be positive matrices. Show that, for $t \ge 1$,
$$\lambda_1^t(AB) \le \lambda_1(A^t B^t). \tag{IX.8}$$

Exercise IX.2.8 Let $A, B$ be positive matrices. Show that
(i) $[\lambda_1(A^{1/t} B^{1/t})]^t$ is a monotonically decreasing function of $t$ on $(0, \infty)$.
(ii) $[\lambda_1(A^t B^t)]^{1/t}$ is a monotonically increasing function of $t$ on $(0, \infty)$.
Using familiar arguments involving antisymmetric tensor products, we can now obtain stronger results.
Theorem IX.2.9 Let $A, B$ be positive matrices. Then, for $0 < t \le u < \infty$, we have the weak majorisation
$$\lambda^{1/t}(A^t B^t) \prec_w \lambda^{1/u}(A^u B^u). \tag{IX.9}$$

Proof. For $k = 1, 2, \ldots, n$, consider the operators $\wedge^k A$ and $\wedge^k B$. The result of Exercise IX.2.8(ii) applied to these operators in place of $A, B$ yields the inequalities
$$\prod_{j=1}^{k} \lambda_j^{1/t}(A^t B^t) \le \prod_{j=1}^{k} \lambda_j^{1/u}(A^u B^u) \tag{IX.10}$$
for $k = 1, 2, \ldots, n$. The assertion of II.3.5(vii) now leads to the majorisation (IX.9). ■
Theorem IX.2.10 Let $A, B$ be positive matrices. Then for every unitarily invariant norm we have
$$|||B^t A^t B^t||| \le |||(BAB)^t|||, \quad \text{for } 0 \le t \le 1, \tag{IX.11}$$
$$|||(BAB)^t||| \le |||B^t A^t B^t|||, \quad \text{for } t \ge 1. \tag{IX.12}$$

Proof. We have
$$\|A^{t/2} B^t\| \le \|A^{1/2} B\|^t$$
for $0 \le t \le 1$, by Theorem IX.2.1. So
$$\|B^t A^t B^t\| = \|A^{t/2} B^t\|^2 \le \|A^{1/2} B\|^{2t} = \|BAB\|^t.$$
This is the same as saying that
$$s_1(B^t A^t B^t) \le s_1((BAB)^t).$$
Replacing $A$ and $B$ by their antisymmetric tensor powers, we obtain, for $1 \le k \le n$,
$$\prod_{j=1}^{k} s_j(B^t A^t B^t) \le \prod_{j=1}^{k} s_j^t(BAB).$$
By the argument used in the preceding theorem, this gives the majorisation
$$s(B^t A^t B^t) \prec_w s((BAB)^t),$$
which gives the inequality (IX.11). The inequality (IX.12) is proved in exactly the same way. ■
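Taking the trace norm in (IX.11) and (IX.12) (on positive matrices the trace norm is just the trace), the theorem can be checked numerically; the helpers below are our own:

```python
import numpy as np

def rand_psd(n, rng):
    """A random positive semidefinite matrix G G^T (our helper)."""
    G = rng.standard_normal((n, n))
    return G @ G.T

def psd_power(M, t):
    """M^t for a positive semidefinite M, via the spectral decomposition."""
    M = (M + M.T) / 2                      # symmetrise against round-off
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0, None)**t) @ V.T

rng = np.random.default_rng(2)
ok = True
for _ in range(50):
    A, B = rand_psd(3, rng), rand_psd(3, rng)
    for t in (0.25, 0.5, 0.75):            # (IX.11): tr(B^t A^t B^t) <= tr((BAB)^t)
        lhs = np.trace(psd_power(B, t) @ psd_power(A, t) @ psd_power(B, t))
        ok &= lhs <= np.trace(psd_power(B @ A @ B, t)) * (1 + 1e-9)
    for t in (1.5, 2.0, 3.0):              # (IX.12): tr((BAB)^t) <= tr(B^t A^t B^t)
        lhs = np.trace(psd_power(B @ A @ B, t))
        rhs = np.trace(psd_power(B, t) @ psd_power(A, t) @ psd_power(B, t))
        ok &= lhs <= rhs * (1 + 1e-9)
```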
Exercise IX.2.11 Derive (as a special case of the above theorem) the following inequality of Araki–Lieb–Thirring. Let $A, B$ be positive matrices, and let $s, t$ be positive real numbers with $t \ge 1$. Then
$$|||(B^s A^s B^s)^t||| \le |||B^{st} A^{st} B^{st}|||. \tag{IX.13}$$

IX.3 Inequalities for the Exponential Function

For every complex number $z$, we have $|e^z| = |e^{\mathrm{Re}\,z}|$. Our first theorem is a matrix version of this.
Theorem IX.3.1 Let $A$ be any matrix. Then
$$|||e^A||| \le |||e^{\mathrm{Re}\,A}||| \tag{IX.14}$$
for every unitarily invariant norm.
Proof. For each positive integer $m$, we have $\|A^m\| \le \|A\|^m$. This is the same as saying that $s_1(A^m) \le s_1^m(A)$, or $s_1(A^{*m} A^m) \le s_1^m(A^* A)$. Replacing $A$ by $\wedge^k A$, we obtain for $1 \le k \le n$
$$\prod_{j=1}^{k} s_j(A^{*m} A^m) \le \prod_{j=1}^{k} s_j([A^* A]^m).$$
Now, if we replace $A$ by $e^{A/m}$, we obtain
$$\prod_{j=1}^{k} s_j(e^{A^*} e^{A}) \le \prod_{j=1}^{k} s_j\big([e^{A^*/m} e^{A/m}]^m\big).$$
Letting $m \to \infty$, and using the Lie Product Formula, we obtain
$$\prod_{j=1}^{k} s_j(e^{A^*} e^{A}) \le \prod_{j=1}^{k} s_j(e^{A^* + A}).$$
Taking square roots, we get
$$\prod_{j=1}^{k} s_j(e^A) \le \prod_{j=1}^{k} s_j(e^{\mathrm{Re}\,A}).$$
This gives the majorisation
$$s(e^A) \prec_w s(e^{\mathrm{Re}\,A})$$
(see II.3.5(vii)), and hence the inequality (IX.14). ■
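The proof actually establishes the weak majorisation $s(e^A) \prec_w s(e^{\mathrm{Re}\,A})$, which can be tested directly on the partial sums of singular values; a sketch, with a series-based exponential of our own:

```python
import numpy as np

def expm_series(X, terms=80):
    """Matrix exponential via its power series (fine for moderate ||X||)."""
    E = np.eye(X.shape[0], dtype=complex)
    T = np.eye(X.shape[0], dtype=complex)
    for k in range(1, terms):
        T = T @ X / k
        E += T
    return E

rng = np.random.default_rng(3)
ok = True
for _ in range(50):
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    sA = np.linalg.svd(expm_series(A), compute_uv=False)                 # s(e^A)
    sR = np.linalg.svd(expm_series((A + A.conj().T) / 2), compute_uv=False)  # s(e^{Re A})
    # weak majorisation: every partial sum on the left is dominated
    ok &= np.all(np.cumsum(sA) <= np.cumsum(sR) * (1 + 1e-9))
```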
It is easy to construct an example of a $2 \times 2$ matrix $A$ for which $|||e^A|||$ and $|||e^{\mathrm{Re}\,A}|||$ are not equal. Our next theorem is valid for a large class of functions. It will be convenient to give this class a name.
Definition IX.3.2 A continuous complex-valued function $f$ on the space of matrices will be said to belong to the class $\mathcal{T}$ if it satisfies the following two properties:
(i) $f(XY) = f(YX)$ for all $X, Y$.
(ii) $|f(X^{2m})| \le f([XX^*]^m)$ for all $X$, and for $m = 1, 2, \ldots$.

Exercise IX.3.3 (i) The functions trace and determinant are in $\mathcal{T}$.
(ii) For every $k$, $1 \le k \le n$, the function $\varphi_k(X) = \mathrm{tr} \wedge^k X$ is in $\mathcal{T}$. (These are the coefficients in the characteristic polynomial of $X$.)
(iii) Let $\lambda_j(X)$ denote the eigenvalues of $X$ arranged so that $|\lambda_1(X)| \ge |\lambda_2(X)| \ge \cdots \ge |\lambda_n(X)|$. Then, for $1 \le k \le n$, the function $f_k(X) = \prod_{j=1}^{k} \lambda_j(X)$ is in $\mathcal{T}$, and so is the function $|f_k(X)|$. [Hint: Use Theorem II.3.6.]
(iv) For $1 \le k \le n$, the function $g_k(X) = \sum_{j=1}^{k} |\lambda_j(X)|$ is in $\mathcal{T}$.
(v) Every symmetric gauge function of the numbers $|\lambda_1(X)|, \ldots, |\lambda_n(X)|$ is in $\mathcal{T}$.

Exercise IX.3.4 (i) If $f$ is any complex-valued function on the space of matrices that satisfies the condition (ii) in Definition IX.3.2, then $f(A) \ge 0$ if $A \ge 0$. In particular, $f(e^A) \ge 0$ for every Hermitian matrix $A$.
(ii) If $f$ satisfies both conditions (i) and (ii) in Definition IX.3.2, then $f(AB) \ge 0$ if $A$ and $B$ are both positive. In particular, $f(e^A e^B) \ge 0$ if $A$ and $B$ are Hermitian.

The principal result about the class $\mathcal{T}$ is the following.
Theorem IX.3.5 Let $f$ be a function in the class $\mathcal{T}$. Then for all matrices $A, B$, we have
$$|f(e^{A+B})| \le f(e^{\mathrm{Re}\,A}\, e^{\mathrm{Re}\,B}). \tag{IX.15}$$

Proof. For each positive integer $m$, we have for all $X, Y$
$$|f([XY]^{2^m})| \le f([XY Y^* X^*]^{2^{m-1}}) = f([X^*X\, YY^*]^{2^{m-1}}).$$
Here, the inequality at the first step is a consequence of the property (ii), and the equality at the last step is a consequence of the property (i) of functions in $\mathcal{T}$. Repeat this argument to obtain
$$|f([XY]^{2^m})| \le f([(X^*X)^2 (YY^*)^2]^{2^{m-2}}) \le \cdots \le f([X^*X]^{2^{m-1}} [YY^*]^{2^{m-1}}).$$
Now let $A, B$ be any two matrices. Put $X = e^{A/2^m}$ and $Y = e^{B/2^m}$ in the above inequality to obtain
$$|f([e^{A/2^m} e^{B/2^m}]^{2^m})| \le f([e^{A^*/2^m} e^{A/2^m}]^{2^{m-1}} [e^{B/2^m} e^{B^*/2^m}]^{2^{m-1}}).$$
Now let $m \to \infty$. Then, by the continuity of $f$ and the Lie Product Formula, we can conclude from the above inequality that
$$|f(e^{A+B})| \le f(e^{(A+A^*)/2}\, e^{(B+B^*)/2}) = f(e^{\mathrm{Re}\,A}\, e^{\mathrm{Re}\,B}). \ \blacksquare$$

Corollary IX.3.6 Let $f$ be a function in the class $\mathcal{T}$. Then
$$|f(e^A)| \le f(e^{\mathrm{Re}\,A}) \tag{IX.16}$$
for every matrix $A$, and
$$f(e^{A+B}) \le f(e^A e^B) \tag{IX.17}$$
for all Hermitian matrices $A, B$.

Particularly noteworthy is the following special consequence.
Theorem IX.3.7 Let $A, B$ be any two Hermitian matrices. Then
$$|||e^{A+B}||| \le |||e^A e^B||| \tag{IX.18}$$
for every unitarily invariant norm.

Proof. Use (IX.17) for the special functions in Exercise IX.3.3(iv). This gives the majorisation
$$\lambda(e^{A+B}) \prec_w |\lambda(e^A e^B)|,$$
and hence, by Weyl's Majorant Theorem, the weak majorisation $s(e^{A+B}) \prec_w s(e^A e^B)$. This proves the theorem. ■

Choosing $f(X) = \mathrm{tr}\,X$, we get from (IX.17) the famous Golden–Thompson inequality: for Hermitian $A, B$ we have
$$\mathrm{tr}\, e^{A+B} \le \mathrm{tr}(e^A e^B). \tag{IX.19}$$
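The Golden–Thompson inequality (IX.19) is easy to verify numerically for random Hermitian matrices; a sketch, with an eigendecomposition-based exponential of our own:

```python
import numpy as np

def expm_h(H):
    """exp of a Hermitian (here real symmetric) matrix via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

rng = np.random.default_rng(4)
gaps = []
for _ in range(50):
    A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
    B = rng.standard_normal((4, 4)); B = (B + B.T) / 2
    lhs = np.trace(expm_h(A + B)).real        # tr e^{A+B}
    rhs = np.trace(expm_h(A) @ expm_h(B)).real  # tr(e^A e^B)
    gaps.append(rhs - lhs)                    # should never be (significantly) negative
```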
Exercise IX.3.8 Let $A, B$ be Hermitian matrices. Show that for every unitarily invariant norm
$$\left|\left|\left| \left( \exp\frac{tB}{2}\, \exp tA\, \exp\frac{tB}{2} \right)^{1/t} \right|\right|\right|$$
decreases to $|||\exp(A+B)|||$ as $t \downarrow 0$. As a special consequence of this we have a stronger version of the Golden–Thompson inequality:
$$\mathrm{tr}\exp(A+B) \le \mathrm{tr}\left( \exp\frac{tB}{2}\, \exp tA\, \exp\frac{tB}{2} \right)^{1/t} \quad \text{for all } t > 0.$$
[Use Theorem IX.2.10 and Exercise IX.1.5.]

IX.4 Arithmetic-Geometric Mean Inequalities

The classical arithmetic-geometric mean inequality for numbers says that $\sqrt{ab} \le \frac{1}{2}(a+b)$ for all positive numbers $a, b$. From this we see that for complex numbers $a, b$ we have $|ab| \le \frac{1}{2}(|a|^2 + |b|^2)$. In this section we obtain some matrix versions of this inequality. Several corollaries of these inequalities are derived in this and later sections.
Lemma IX.4.1 Let $Y_1, Y_2$ be any two positive matrices, and let $Y = Y_1 - Y_2$. Let $Y = Y^+ - Y^-$ be the Jordan decomposition of the Hermitian matrix $Y$. Then, for $j = 1, 2, \ldots, n$,
$$\lambda_j(Y^+) \le \lambda_j(Y_1) \quad \text{and} \quad \lambda_j(Y^-) \le \lambda_j(Y_2).$$
(See Section IV.3 for the definition of the Jordan decomposition.)

Proof. Suppose $\lambda_j(Y)$ is nonnegative for $j = 1, \ldots, p$ and negative for $j = p+1, \ldots, n$. Then $\lambda_j(Y^+)$ is equal to $\lambda_j(Y)$ if $j = 1, \ldots, p$, and is zero for $j = p+1, \ldots, n$. Since $Y_1 = Y + Y_2 \ge Y$, we have $\lambda_j(Y_1) \ge \lambda_j(Y)$ for all $j$, by Weyl's Monotonicity Principle. Hence, $\lambda_j(Y_1) \ge \lambda_j(Y^+)$ for all $j$. Since $Y_2 = Y_1 - Y \ge -Y$, we have $\lambda_j(Y_2) \ge \lambda_j(-Y)$ for all $j$. But $\lambda_j(-Y) = \lambda_j(Y^-)$ for $j = 1, \ldots, n-p$, and $\lambda_j(Y^-) = 0 \le \lambda_j(Y_2)$ for $j > n-p$. Hence, $\lambda_j(Y_2) \ge \lambda_j(Y^-)$ for all $j$. ■
Theorem IX.4.2 Let $A, B$ be any two matrices. Then
$$s_j(A^* B) \le \tfrac{1}{2}\, \lambda_j(AA^* + BB^*) \tag{IX.20}$$
for $1 \le j \le n$.

Proof. Let $X$ be the $2n \times 2n$ matrix $X = \begin{pmatrix} A & B \\ 0 & 0 \end{pmatrix}$. Then
$$XX^* = \begin{pmatrix} AA^* + BB^* & 0 \\ 0 & 0 \end{pmatrix}, \qquad X^*X = \begin{pmatrix} A^*A & A^*B \\ B^*A & B^*B \end{pmatrix}.$$
The off-diagonal part of $X^*X$ can be written as
$$Y = \begin{pmatrix} 0 & A^*B \\ B^*A & 0 \end{pmatrix} = \frac{1}{2}\left\{ X^*X - U(X^*X)U^* \right\},$$
where $U$ is the unitary matrix $\begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}$. Note that both of the matrices in the braces above are positive. Hence, by the preceding lemma,
$$\lambda_j(Y^{\pm}) \le \tfrac{1}{2}\, \lambda_j(X^*X).$$
But $X^*X$ and $XX^*$ have the same eigenvalues. Hence, both $\lambda_j(Y^+)$ and $\lambda_j(Y^-)$ are bounded above by $\frac{1}{2}\lambda_j(AA^* + BB^*)$. Now note that, by Exercise II.1.15, the eigenvalues of $Y$ are the singular values of $A^*B$ together with their negatives. Hence we have the inequality (IX.20). ■

Corollary IX.4.3 Let $A, B$ be any two matrices. Then there exists a unitary matrix $U$ such that
$$|A^*B| \le \tfrac{1}{2}\, U(AA^* + BB^*)U^*. \tag{IX.21}$$

Corollary IX.4.4 Let $A, B$ be any two matrices. Then
$$|||A^*B||| \le \tfrac{1}{2}\, |||AA^* + BB^*||| \tag{IX.22}$$
for every unitarily invariant norm.

The particular position of the stars in (IX.20), (IX.21), and (IX.22) is not an accident. If we have
$$A = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix},$$
then $s_1(AB) = \sqrt{2}$, but $\frac{1}{2}s_1(AA^* + BB^*) = 1$. The presence of the unitary $U$ in (IX.21) is also essential: it cannot be replaced by the identity matrix even when $A, B$ are Hermitian. This is illustrated, for instance, by the pair $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$. A considerable strengthening of the inequality (IX.22) is given in the theorem below.
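Inequality (IX.20) — with the stars placed as stated — can be checked numerically; a sketch (real matrices, so $A^* = A^T$):

```python
import numpy as np

rng = np.random.default_rng(5)
ok = True
for _ in range(100):
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    s = np.linalg.svd(A.T @ B, compute_uv=False)        # s_j(A* B), decreasing
    lam = np.linalg.eigvalsh(A @ A.T + B @ B.T)[::-1]   # eigenvalues, decreasing
    ok &= np.all(s <= lam / 2 * (1 + 1e-10))            # s_j <= (1/2) lambda_j
```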
Theorem IX.4.5 For any three matrices $A, B, X$, we have
$$|||A^* X B||| \le \tfrac{1}{2}\, |||AA^*X + XBB^*||| \tag{IX.23}$$
for every unitarily invariant norm.

Proof. First consider the special case when $A = B$, and $A, X$ are Hermitian. Then $AXA$ is also Hermitian. So, by Proposition IX.1.2,
$$|||AXA||| \le |||\mathrm{Re}(A^2 X)||| = \tfrac{1}{2}\, |||A^2 X + XA^2|||, \tag{IX.24}$$
which is just the desired inequality in this special case.
Next consider the more general situation, when $A$ and $B$ are Hermitian and $X$ is any matrix. Let
$$T = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & X \\ X^* & 0 \end{pmatrix}.$$
Then, by the special case considered above,
$$|||TYT||| \le \tfrac{1}{2}\, |||T^2 Y + Y T^2|||. \tag{IX.25}$$
Multiplying out the block-matrices, one sees that
$$TYT = \begin{pmatrix} 0 & AXB \\ BX^*A & 0 \end{pmatrix}, \qquad T^2 Y + Y T^2 = \begin{pmatrix} 0 & A^2X + XB^2 \\ B^2X^* + X^*A^2 & 0 \end{pmatrix}.$$
Hence, we obtain from (IX.25) the inequality
$$|||AXB||| \le \tfrac{1}{2}\, |||A^2 X + X B^2|||. \tag{IX.26}$$
Finally, let $A, B, X$ be any matrices. Let $A = A_1 U$, $B = B_1 V$ be polar decompositions of $A$ and $B$. Then
$$AA^*X + XBB^* = A_1^2 X + X B_1^2,$$
while
$$|||A^*XB||| = |||U^* A_1 X B_1 V||| = |||A_1 X B_1|||.$$
So, the theorem follows from the inequality (IX.26). ■
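Theorem IX.4.5 holds for every unitarily invariant norm; the sketch below tests (IX.23) in the spectral, Frobenius, and trace norms simultaneously:

```python
import numpy as np

rng = np.random.default_rng(6)
ok = True
for _ in range(100):
    A, B, X = (rng.standard_normal((4, 4)) for _ in range(3))
    L = A.T @ X @ B                         # A* X B (real case)
    R = (A @ A.T @ X + X @ B @ B.T) / 2     # (1/2)(AA*X + XBB*)
    for ord_ in (2, 'fro', 'nuc'):          # three unitarily invariant norms
        ok &= np.linalg.norm(L, ord_) <= np.linalg.norm(R, ord_) * (1 + 1e-10)
```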
Exercise IX.4.6 Another proof of the theorem can be obtained as follows. First prove the inequality (IX.24) for Hermitian $A$ and $X$. Then, for arbitrary $A, B$, and $X$, let $T$ and $Y$ be the matrices
$$T = \begin{pmatrix} 0 & 0 & A & 0 \\ 0 & 0 & 0 & B \\ A^* & 0 & 0 & 0 \\ 0 & B^* & 0 & 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & X & 0 & 0 \\ X^* & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$
and apply the special case to them.

Exercise IX.4.7 Construct an example to show that the inequality (IX.20) cannot be strengthened in the way that (IX.23) strengthens (IX.22).

When $A, B$ are both positive, we can prove a result stronger than (IX.26). This is the next theorem.
Theorem IX.4.8 Let $A, B$ be positive matrices and let $X$ be any matrix. Then, for each unitarily invariant norm, the function
$$f(t) = |||A^{1+t} X B^{1-t} + A^{1-t} X B^{1+t}||| \tag{IX.27}$$
is convex on the interval $[-1, 1]$ and attains its minimum at $t = 0$.

Proof. Without loss of generality, we may assume that $A > 0$ and $B > 0$. Since $f$ is continuous and $f(t) = f(-t)$, both conclusions will follow if we show that $f(t) \le \frac{1}{2}[f(t+s) + f(t-s)]$ whenever $t \pm s$ are in $[-1, 1]$. For each $t$, let $\mathcal{M}_t$ be the mapping
$$\mathcal{M}_t(Y) = \frac{1}{2}(A^t Y B^{-t} + A^{-t} Y B^t).$$
For each $Y$, we have
$$|||Y||| = |||A^t (A^{-t} Y B^{-t}) B^t||| \le \tfrac{1}{2}\, |||A^t Y B^{-t} + A^{-t} Y B^t|||$$
by Theorem IX.4.5. Thus $|||Y||| \le |||\mathcal{M}_t(Y)|||$. From this it follows that
$$|||\mathcal{M}_t(AXB)||| \le |||\mathcal{M}_s \mathcal{M}_t(AXB)|||$$
for all $s, t$. But
$$\mathcal{M}_s \mathcal{M}_t = \tfrac{1}{2}(\mathcal{M}_{t+s} + \mathcal{M}_{t-s}).$$
So we have
$$|||\mathcal{M}_t(AXB)||| \le \tfrac{1}{2}\left\{ |||\mathcal{M}_{t+s}(AXB)||| + |||\mathcal{M}_{t-s}(AXB)||| \right\}.$$
Since $|||\mathcal{M}_t(AXB)||| = \frac{1}{2} f(t)$, this shows that
$$f(t) \le \tfrac{1}{2}\,[f(t+s) + f(t-s)].$$
This proves the theorem. ■

Corollary IX.4.9 Let $A, B$ be positive matrices and $X$ any matrix. Then, for each unitarily invariant norm, the function
$$g(\nu) = |||A^{\nu} X B^{1-\nu} + A^{1-\nu} X B^{\nu}||| \tag{IX.28}$$
is convex on $[0, 1]$.

Proof. Replace $A, B$ by $A^{1/2}, B^{1/2}$ in (IX.27). Then put $\nu = \frac{1-t}{2}$. ■

Corollary IX.4.10 Let $A, B$ be positive matrices and let $X$ be any matrix. Then, for $0 \le \nu \le 1$ and for every unitarily invariant norm,
$$|||A^{\nu} X B^{1-\nu} + A^{1-\nu} X B^{\nu}||| \le |||AX + XB|||. \tag{IX.29}$$

Proof. Let $g(\nu)$ be the function defined in (IX.28). Note that $g(1) = g(0) = |||AX + XB|||$. So the assertion follows from the convexity of $g$. ■

IX.5 Schwarz Inequalities
In this section we shall prove some inequalities that can be considered to be matrix versions of the Cauchy–Schwarz inequality. Some inequalities of this kind have already been proved in Chapter 4. Let $A, B$ be any two matrices and $r$ any positive real number. Then, we saw that
$$|||\, |A^*B|^r \,|||^2 \le |||(AA^*)^r|||\; |||(BB^*)^r||| \tag{IX.30}$$
for every unitarily invariant norm. The choice $r = \frac{1}{2}$ gives the inequality
$$|||\, |A^*B|^{1/2} \,|||^2 \le |||(AA^*)^{1/2}|||\; |||(BB^*)^{1/2}|||, \tag{IX.31}$$
while the choice $r = 1$ gives
$$|||A^*B|||^2 \le |||AA^*|||\; |||BB^*|||. \tag{IX.32}$$
See Exercise IV.2.7 and Problem IV.5.7. It was noted there that the inequality (IX.32) is included in (IX.31). We will now obtain more general versions of these in the same spirit as that of Theorem IX.4.5. The generalisation of (IX.32) is proved easily and is given first, even though it is subsumed in the theorem that follows it.
Theorem IX.5.1 Let $A, B, X$ be any three matrices. Then, for every unitarily invariant norm,
$$|||A^*XB|||^2 \le |||AA^*X|||\; |||XBB^*|||. \tag{IX.33}$$

Proof. First assume that $X$ is a positive matrix. Then
$$|||A^*XB|||^2 = |||A^*X^{1/2} \cdot X^{1/2}B|||^2 = |||(X^{1/2}A)^*(X^{1/2}B)|||^2 \le |||X^{1/2}AA^*X^{1/2}|||\; |||X^{1/2}BB^*X^{1/2}|||,$$
using the inequality (IX.32). Now use Proposition IX.1.1 to conclude that
$$|||A^*XB|||^2 \le |||AA^*X|||\; |||XBB^*|||.$$
This proves the theorem in this special case. Now let $X$ be any matrix, and let $X = UP$ be its polar decomposition. Then, by unitary invariance,
$$|||A^*XB||| = |||U^*A^*UPB|||, \qquad |||AA^*X||| = |||AA^*UP||| = |||U^*AA^*UP|||, \qquad |||XBB^*||| = |||UPBB^*||| = |||PBB^*|||.$$
So, the general theorem follows by applying the special case to the triple $U^*AU, B, P$. ■
Theorem IX.5.2 Let $A, B, X$ be any three matrices. Then, for every positive real number $r$, and for every unitarily invariant norm, we have
$$|||\, |A^*XB|^r \,|||^2 \le |||\, |AA^*X|^r \,|||\; |||\, |XBB^*|^r \,|||. \tag{IX.34}$$

Proof. Let $X = UP$ be the polar decomposition of $X$. Then
$$A^*XB = A^*UPB = (P^{1/2}U^*A)^* P^{1/2}B.$$
So, from the inequality (IX.30), we have
$$|||\, |A^*XB|^r \,|||^2 \le |||(P^{1/2}U^*AA^*UP^{1/2})^r|||\; |||(P^{1/2}BB^*P^{1/2})^r|||. \tag{IX.35}$$
Now note that
$$\lambda^r(P^{1/2}U^*AA^*UP^{1/2}) = \lambda^r(AA^*\, UPU^*).$$
Using Theorem IX.2.9, we have
$$\lambda^r(AA^*\, UPU^*) \prec_w \lambda^{r/2}([AA^*]^2 [UPU^*]^2).$$
But $(UPU^*)^2 = UP^2U^* = XX^*$. Hence,
$$\lambda^{r/2}([AA^*]^2 [UPU^*]^2) = s^r(AA^*X).$$
Thus
$$\lambda^r(P^{1/2}U^*AA^*UP^{1/2}) \prec_w s^r(AA^*X),$$
and hence
$$|||(P^{1/2}U^*AA^*UP^{1/2})^r||| \le |||\, |AA^*X|^r \,|||. \tag{IX.36}$$
In the same way, we have
$$\lambda^r(PBB^*) \prec_w \lambda^{r/2}(P^2[BB^*]^2) = s^r(PBB^*) = s^r(XBB^*),$$
and hence
$$|||(P^{1/2}BB^*P^{1/2})^r||| \le |||\, |XBB^*|^r \,|||. \tag{IX.37}$$
Combining the inequalities (IX.35), (IX.36), and (IX.37), we get (IX.34). ■
The following corollary of Theorem IX.5.1 should be compared with (IX.29).

Corollary IX.5.3 Let $A, B$ be positive matrices and let $X$ be any matrix. Then, for $0 \le \nu \le 1$, and for every unitarily invariant norm,
$$|||A^{\nu} X B^{1-\nu}||| \le |||AX|||^{\nu}\; |||XB|||^{1-\nu}. \tag{IX.38}$$

Proof. For $\nu = 0, 1$, the inequality (IX.38) is a trivial statement. For $\nu = \frac{1}{2}$, it reduces to the inequality (IX.33). We will prove it, by induction, for all indices $\nu = k/2^n$, $k = 0, 1, \ldots, 2^n$. The general case then follows by continuity.
Let $\nu = \frac{2k+1}{2^n}$ be any dyadic rational. Then $\nu = \mu + \rho$, where $\mu = \frac{2k}{2^n}$, $\rho = \frac{1}{2^n}$. Suppose that the inequality (IX.38) is valid for all dyadic rationals with denominator $2^{n-1}$. Two such rationals are $\mu$ and $\lambda = \mu + 2\rho = \nu + \rho$. Then, using the inequality (IX.33) and this induction hypothesis, we have
$$|||A^{\nu} X B^{1-\nu}||| = |||A^{\rho}(A^{\mu} X B^{1-\lambda})B^{\rho}||| \le |||A^{2\rho} A^{\mu} X B^{1-\lambda}|||^{1/2}\; |||A^{\mu} X B^{1-\lambda} B^{2\rho}|||^{1/2}$$
$$= |||A^{\lambda} X B^{1-\lambda}|||^{1/2}\; |||A^{\mu} X B^{1-\mu}|||^{1/2} \le |||AX|||^{\lambda/2} |||XB|||^{(1-\lambda)/2}\; |||AX|||^{\mu/2} |||XB|||^{(1-\mu)/2}$$
$$= |||AX|||^{(\lambda+\mu)/2}\; |||XB|||^{1-(\lambda+\mu)/2} = |||AX|||^{\nu}\; |||XB|||^{1-\nu}.$$
This proves that the desired inequality holds for all dyadic rationals. ■
Corollary IX.5.4 Let $A, B$ be positive matrices and let $X$ be any matrix. Then, for $0 \le \nu \le 1$, and for every unitarily invariant norm,
$$|||A^{\nu} X B^{\nu}||| \le |||X|||^{1-\nu}\; |||AXB|||^{\nu}. \tag{IX.39}$$

Proof. Assume without loss of generality that $A$ is invertible; the general case follows from this by continuity. We have, using (IX.38),
$$|||A^{\nu} X B^{\nu}||| = |||(A^{-1})^{1-\nu}(AX)B^{1-(1-\nu)}||| \le |||A^{-1} \cdot AX|||^{1-\nu}\; |||AXB|||^{\nu} = |||X|||^{1-\nu}\; |||AXB|||^{\nu}. \ \blacksquare$$
Note that the inequality (IX.5) is a very special case of (IX.39).
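Inequality (IX.39) can be tested the same way; the helper names below are our own:

```python
import numpy as np

def rand_psd(n, rng):
    """A random positive semidefinite matrix G G^T."""
    G = rng.standard_normal((n, n))
    return G @ G.T

def psd_power(A, s):
    """A^s for positive semidefinite A, via the spectral decomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0, None)**s) @ V.T

rng = np.random.default_rng(12)
ok = True
for _ in range(50):
    A, B = rand_psd(4, rng), rand_psd(4, rng)
    X = rng.standard_normal((4, 4))
    for v in (0.2, 0.5, 0.8):
        for ord_ in (2, 'fro', 'nuc'):
            lhs = np.linalg.norm(psd_power(A, v) @ X @ psd_power(B, v), ord_)
            rhs = (np.linalg.norm(X, ord_) ** (1 - v)
                   * np.linalg.norm(A @ X @ B, ord_) ** v)
            ok &= lhs <= rhs * (1 + 1e-10)
```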
Exercise IX.5.5 Since $|||AA^*||| = |||A^*A|||$, the stars in the inequality (IX.32) could have been placed differently. Much less freedom is allowed for the generalisation (IX.33). Find a $2 \times 2$ example in which $|||A^*XB|||^2$ is larger than $|||A^*AX|||\; |||XBB^*|||$.
Apart from norms, there are other interesting functions for which Schwarz-like inequalities can be obtained. This is done below. It is convenient to have a name for the class of functions we shall study.

Definition IX.5.6 A continuous complex-valued function $f$ on the space of matrices will be said to belong to the class $\mathcal{L}$ if it satisfies the following two conditions:
(i) $f(B) \ge f(A) \ge 0$ if $B \ge A \ge 0$.
(ii) $|f(A^*B)|^2 \le f(A^*A) f(B^*B)$ for all $A, B$.

We have seen above that every unitarily invariant norm is a function in the class $\mathcal{L}$. Other examples are given below.

Exercise IX.5.7 (i) The functions trace and determinant are in $\mathcal{L}$.
(ii) The function spectral radius is in $\mathcal{L}$.
are the singular values of A , then for each
n,
1 < k < n the function
fk (A )
== II k
S j (A) is in .C .
j=l
{vi) If Aj (A) denote the eigenvalues of A arranged as I A 1 (A) j >
I A n (A) I , then for 1 < k <
n the function fk ( A )
k
== II Aj (A)
·
·
·
>
is in £ .
j= l
Exercise IX.5.8 Another class of functions $\mathcal{T}$ was introduced in IX.3.2. The two classes $\mathcal{T}$ and $\mathcal{L}$ have several elements in common. Find examples to show that neither of them is contained in the other.

A different characterisation of the class $\mathcal{L}$ is obtained below. For this we need the following theorem, which is also useful in other contexts.
Theorem IX.5.9 Let $A, B$ be positive operators on $\mathcal{H}$, and let $C$ be any operator on $\mathcal{H}$. Then the operator $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ on $\mathcal{H} \oplus \mathcal{H}$ is positive if and only if there exists a contraction $K$ on $\mathcal{H}$ such that $C = B^{1/2} K A^{1/2}$.

Proof. By Proposition I.3.5, $K$ is a contraction if and only if $\begin{pmatrix} I & K^* \\ K & I \end{pmatrix}$ is positive. The positivity of this matrix implies the positivity of the matrix
$$\begin{pmatrix} A & A^{1/2}K^*B^{1/2} \\ B^{1/2}KA^{1/2} & B \end{pmatrix}.$$
(See Lemma V.1.5.) This proves one of the asserted implications. To prove the converse, first note that if $A$ and $B$ are invertible, then the argument can be reversed; and then note that the general case can be obtained from this by continuity. ■
Theorem IX.5.10 A (continuous) function $f$ is in the class $\mathcal{L}$ if and only if it satisfies the following two conditions:
(a) $f(A) \ge 0$ for all $A \ge 0$.
(b) $|f(C)|^2 \le f(A) f(B)$ for all $A, B, C$ such that $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ is positive.

Proof. If $f$ satisfies condition (i) in Definition IX.5.6, then it certainly satisfies the condition (a) above. Further, if $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ is positive, then, by the preceding theorem, $C = B^{1/2} K A^{1/2}$, where $K$ is a contraction. So, if $f$ satisfies condition (ii) in IX.5.6, then
$$|f(C)|^2 \le f(A^{1/2}K^*KA^{1/2}) f(B).$$
Since $A^{1/2}K^*KA^{1/2} \le A$, we also have $f(A^{1/2}K^*KA^{1/2}) \le f(A)$ from the condition (i) in IX.5.6.
Now suppose $f$ satisfies conditions (a) and (b). Let $B \ge A \ge 0$. Write
$$\begin{pmatrix} B & A \\ A & B \end{pmatrix} = \begin{pmatrix} B-A & 0 \\ 0 & B-A \end{pmatrix} + \begin{pmatrix} A & A \\ A & A \end{pmatrix}.$$
The first matrix in this sum is obviously positive; the second is also positive by Corollary I.3.3. Thus the sum is also positive. So it follows from (a) and (b) that $f(A) \le f(B)$. Next note that we can write, for any $A$ and $B$,
$$\begin{pmatrix} A^*A & A^*B \\ B^*A & B^*B \end{pmatrix} = \begin{pmatrix} A^* & 0 \\ B^* & 0 \end{pmatrix} \begin{pmatrix} A & B \\ 0 & 0 \end{pmatrix}.$$
Since the two matrices on the right-hand side are adjoints of each other, their product is positive. Hence we have
$$|f(A^*B)|^2 \le f(A^*A) f(B^*B).$$
This shows that $f$ is in $\mathcal{L}$. ■

This characterisation leads to an easy proof of the following theorem of E.H. Lieb.
Theorem IX.5.11 Let $A_1, \ldots, A_m$ and $B_1, \ldots, B_m$ be any matrices. Then, for every function $f$ in the class $\mathcal{L}$,
$$\left| f\Big( \sum_i A_i^* B_i \Big) \right|^2 \le f\Big( \sum_i A_i^* A_i \Big)\, f\Big( \sum_i B_i^* B_i \Big), \tag{IX.40}$$
$$\left| f\Big( \sum_i A_i \Big) \right|^2 \le f\Big( \sum_i |A_i| \Big)\, f\Big( \sum_i |A_i^*| \Big). \tag{IX.41}$$
Proof. For each $i = 1, \ldots, m$, the matrix
$$\begin{pmatrix} A_i^*A_i & A_i^*B_i \\ B_i^*A_i & B_i^*B_i \end{pmatrix}$$
is positive, being the product of $\begin{pmatrix} A_i & B_i \\ 0 & 0 \end{pmatrix}$ and its adjoint. The sum of these matrices is, therefore, also positive. So the inequality (IX.40) follows from Theorem IX.5.10. Each of the matrices $\begin{pmatrix} |A_i| & A_i^* \\ A_i & |A_i^*| \end{pmatrix}$ is also positive; see Corollary I.3.4. So the inequality (IX.41) follows by the same argument. ■
Exercise IX.5.12 For any two matrices $A, B$, we have $\mathrm{tr}(|A+B|) \le \mathrm{tr}(|A| + |B|)$. Show by an example that this inequality is not true if tr is replaced by det. Show that we have
$$[\det(|A+B|)]^2 \le \det(|A| + |B|)\, \det(|A^*| + |B^*|). \tag{IX.42}$$
A similar inequality holds for every function in $\mathcal{L}$.

IX.6 The Lieb Concavity Theorem
Let $f(A, B)$ be a real valued function of two matrix variables. Then $f$ is called jointly concave if
$$f(\alpha A_1 + (1-\alpha)A_2,\; \alpha B_1 + (1-\alpha)B_2) \ge \alpha f(A_1, B_1) + (1-\alpha) f(A_2, B_2)$$
for all $0 \le \alpha \le 1$ and for all $A_1, A_2, B_1, B_2$. In this section we will prove the following theorem due to E.H. Lieb. The importance of the theorem, and its consequences, are explained later.

Theorem IX.6.1 (Lieb) For each matrix $X$ and each real number $0 \le t \le 1$, the function
$$f(A, B) = \mathrm{tr}\, X^* A^t X B^{1-t}$$
is jointly concave on pairs of positive matrices.

Note that $f(A, B)$ is positive if $A, B$ are positive. To prove this theorem we need the following lemma.
Lemma IX.6.2 Let $R_1, R_2, S_1, S_2, T_1, T_2$ be positive operators on a Hilbert space. Suppose $R_1$ commutes with $R_2$, $S_1$ commutes with $S_2$, and $T_1$ commutes with $T_2$, and
$$R_1 \ge S_1 + T_1, \qquad R_2 \ge S_2 + T_2. \tag{IX.43}$$
Then, for $0 \le t \le 1$,
$$R_1^t R_2^{1-t} \ge S_1^t S_2^{1-t} + T_1^t T_2^{1-t}. \tag{IX.44}$$

Proof. Let $E$ be the set of all $t$ in $[0,1]$ for which the inequality (IX.44) is true. It is clear that $E$ is a closed set and it contains 0 and 1. We will first show that $E$ contains the point $\frac{1}{2}$, then use this to show that $E$ is a convex set. This would prove the lemma.
Let $x, y$ be any two vectors. Then
$$|\langle x, (S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) y \rangle| \le |\langle x, S_1^{1/2}S_2^{1/2} y \rangle| + |\langle x, T_1^{1/2}T_2^{1/2} y \rangle| \le \|S_1^{1/2}x\| \|S_2^{1/2}y\| + \|T_1^{1/2}x\| \|T_2^{1/2}y\| \le \left[ \|S_1^{1/2}x\|^2 + \|T_1^{1/2}x\|^2 \right]^{1/2} \left[ \|S_2^{1/2}y\|^2 + \|T_2^{1/2}y\|^2 \right]^{1/2}$$
by the Cauchy–Schwarz inequality. This last expression can be rewritten as
$$\left[ \langle x, (S_1 + T_1)x \rangle \right]^{1/2} \left[ \langle y, (S_2 + T_2)y \rangle \right]^{1/2}.$$
Hence, by the hypothesis, we have
$$|\langle x, (S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) y \rangle| \le \left[ \langle x, R_1 x \rangle \langle y, R_2 y \rangle \right]^{1/2}.$$
Using this, we see that, for all unit vectors $u$ and $v$,
$$|\langle u, R_1^{-1/2}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/2} v \rangle| = |\langle R_1^{-1/2}u, (S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/2} v \rangle| \le \left[ \langle R_1^{-1/2}u, R_1 R_1^{-1/2}u \rangle \langle R_2^{-1/2}v, R_2 R_2^{-1/2}v \rangle \right]^{1/2} = 1.$$
This shows that
$$\|R_1^{-1/2}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/2}\| \le 1.$$
Using Proposition IX.1.1, the commutativity of $R_1$ and $R_2$, and the inequality above, we see that
$$\|R_1^{-1/4}R_2^{-1/4}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/4}R_1^{-1/4}\| \le 1.$$
This is equivalent to the operator inequality
$$R_1^{-1/4}R_2^{-1/4}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}) R_2^{-1/4}R_1^{-1/4} \le I.$$
From this, it follows that
$$S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2} \le R_1^{1/2}R_2^{1/2}.$$
(See Lemma V.1.5.) This shows that the set $E$ contains the point $1/2$.
Now suppose that $\mu$ and $\nu$ are in $E$. Then
$$R_1^{\mu} R_2^{1-\mu} \ge S_1^{\mu} S_2^{1-\mu} + T_1^{\mu} T_2^{1-\mu},$$
$$R_1^{\nu} R_2^{1-\nu} \ge S_1^{\nu} S_2^{1-\nu} + T_1^{\nu} T_2^{1-\nu}.$$
These two inequalities are exactly of the form (IX.43). Hence, using the special case of the lemma that has been proved above, we have
$$(R_1^{\mu} R_2^{1-\mu})^{1/2} (R_1^{\nu} R_2^{1-\nu})^{1/2} \ge (S_1^{\mu} S_2^{1-\mu})^{1/2} (S_1^{\nu} S_2^{1-\nu})^{1/2} + (T_1^{\mu} T_2^{1-\mu})^{1/2} (T_1^{\nu} T_2^{1-\nu})^{1/2}.$$
Using the hypothesis about commutativity, we see from this inequality that $\frac{1}{2}(\mu + \nu)$ is in $E$. This shows that $E$ is convex. ■
Proof of Theorem IX.6.1: Let $A, B$ be positive operators on $\mathcal{H}$. Let $\mathcal{A}$ and $\mathcal{B}$ be the left and right multiplication operators on the space $\mathcal{L}(\mathcal{H})$ induced by $A$ and $B$; i.e., $\mathcal{A}(X) = AX$ and $\mathcal{B}(X) = XB$. Using the results of Exercise I.4.4 and Problem VII.6.10, one sees that $\mathcal{A}$ and $\mathcal{B}$ are positive operators on (the Hilbert space) $\mathcal{L}(\mathcal{H})$.
Now, suppose $A_1, A_2, B_1, B_2$ are positive operators on $\mathcal{H}$. Let $A = A_1 + A_2$, $B = B_1 + B_2$. Let $\mathcal{A}_1, \mathcal{A}_2, \mathcal{A}$ denote the left multiplication operators on $\mathcal{L}(\mathcal{H})$ induced by $A_1, A_2$, and $A$, respectively, and $\mathcal{B}_1, \mathcal{B}_2, \mathcal{B}$ the right multiplication operators induced by $B_1, B_2$, and $B$, respectively. Then $\mathcal{A} = \mathcal{A}_1 + \mathcal{A}_2$, $\mathcal{B} = \mathcal{B}_1 + \mathcal{B}_2$. Hence, by Lemma IX.6.2,
$$\mathcal{A}^t \mathcal{B}^{1-t} \ge \mathcal{A}_1^t \mathcal{B}_1^{1-t} + \mathcal{A}_2^t \mathcal{B}_2^{1-t}$$
for $0 \le t \le 1$. This is the same as saying that for every $X$ in $\mathcal{L}(\mathcal{H})$
$$\langle X, \mathcal{A}^t \mathcal{B}^{1-t} X \rangle \ge \langle X, \mathcal{A}_1^t \mathcal{B}_1^{1-t} X \rangle + \langle X, \mathcal{A}_2^t \mathcal{B}_2^{1-t} X \rangle,$$
or that
$$\mathrm{tr}\, X^* A^t X B^{1-t} \ge \mathrm{tr}\, X^* A_1^t X B_1^{1-t} + \mathrm{tr}\, X^* A_2^t X B_2^{1-t}.$$
From this, it follows that
$$f(\alpha A_1 + (1-\alpha)A_2,\; \alpha B_1 + (1-\alpha)B_2) \ge \alpha f(A_1, B_1) + (1-\alpha) f(A_2, B_2).$$
This shows that $f$ is concave. ■

Another proof of this theorem is outlined in Problem IX.8.17. Using the identification of $\mathcal{L}(\mathcal{H})$ with $\mathcal{H} \otimes \mathcal{H}$, Lieb's Theorem can be seen to be equivalent to the following theorem.

Theorem IX.6.3 (T. Ando) For each $0 \le t \le 1$, the map $(A, B) \mapsto A^t \otimes B^{1-t}$ is jointly concave on pairs of positive operators on $\mathcal{H}$.
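Joint concavity in Lieb's Theorem can be probed numerically at random midpoints; the sketch below (our own helpers, with a small multiple of the identity added to keep the matrices positive definite) tests the defining inequality at $\alpha = 1/2$:

```python
import numpy as np

def rand_pd(n, rng):
    """A random positive definite matrix (our helper)."""
    G = rng.standard_normal((n, n))
    return G @ G.T + 0.1 * np.eye(n)

def psd_power(A, s):
    """A^s for positive definite A, via the spectral decomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 1e-12, None)**s) @ V.T

def lieb_f(A, B, X, t):
    """f(A, B) = tr X* A^t X B^{1-t} (real case, so X* = X^T)."""
    return np.trace(X.T @ psd_power(A, t) @ X @ psd_power(B, 1 - t))

rng = np.random.default_rng(10)
X = rng.standard_normal((3, 3))
ok = True
for t in (0.3, 0.5, 0.8):
    for _ in range(50):
        A1, A2 = rand_pd(3, rng), rand_pd(3, rng)
        B1, B2 = rand_pd(3, rng), rand_pd(3, rng)
        mid = lieb_f(0.5 * (A1 + A2), 0.5 * (B1 + B2), X, t)
        ok &= mid >= 0.5 * lieb_f(A1, B1, X, t) + 0.5 * lieb_f(A2, B2, X, t) - 1e-8
```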
Exercise IX. 6 .4 Let t1 , t 2 be two positive numbers such that t 1 + t 2 < 1 . At1 ® B t 2 is jointly concave on pairs of positive Show that the map (A, B ) operators on 1l . [Hint: The map A As is monotone and concave on positive operators for 0 < s < 1 . See Chapter 5.} t
t
Lieb proved his theorem in connection with problems connected with entropy in quantum mechanics. This is explained below (with some simplifications). The function S(A) = −tr A log A, where A is a positive matrix, is called the entropy function. It is a concave function on the set of positive operators. In fact, we have seen that the function f(t) = −t log t is operator concave on (0, ∞). Let K be a given Hermitian operator. The entropy of A relative to K is defined as

S(A, K) = ½ tr [A^{1/2}, K]²,

where [X, Y] stands for the Lie bracket (or the commutator) XY − YX. This concept was introduced by Wigner and Yanase, and extended by Dyson, who considered the functions

S_t(A, K) = ½ tr [A^t, K][A^{1−t}, K],   0 < t < 1.   (IX.45)

The Wigner-Yanase-Dyson conjecture said that S_t(A, K) is concave in A on the set of positive matrices. Lieb's Theorem implies that this is true. To see this, note that

S_t(A, K) = tr KA^t KA^{1−t} − tr K²A.   (IX.46)

Since the function g(A) = −tr K²A is linear in A, it is also concave. Hence, the concavity of S_t(A, K) follows from that of tr KA^t KA^{1−t}. But that is a special case of Lieb's Theorem.

Given any operator X, define

I_t(A, X) = tr X*A^t XA^{1−t} − tr X*XA,   0 < t < 1.   (IX.47)

Note that I₀(A, X) = 0. When X is Hermitian, this reduces to the function S_t defined earlier. Lieb's Theorem implies that I_t(A, X) is a concave function of A. Hence, the function I(A, X) defined as

I(A, X) = (d/dt)|_{t=0} I_t(A, X) = tr (X*(log A)XA − X*X(log A)A)   (IX.48)

is also concave. Let A, B be positive matrices. The relative entropy of A and B is defined as

S(A|B) = tr (A(log A − log B)).   (IX.49)
This notion was introduced by Umegaki, and generalised by Araki to the von Neumann algebra setting.
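The commutator identity (IX.46) behind the Wigner-Yanase-Dyson reduction is easy to verify numerically. The following check (an added illustration, not from the text, assuming NumPy) compares the two sides for a random positive A and Hermitian K.

```python
# Check (IX.46): (1/2) tr [A^t, K][A^{1-t}, K] = tr K A^t K A^{1-t} - tr K^2 A.
import numpy as np

rng = np.random.default_rng(1)
n, t = 4, 0.4

M = rng.standard_normal((n, n))
A = M @ M.T + 0.1 * np.eye(n)                 # positive definite
K = rng.standard_normal((n, n)); K = (K + K.T) / 2   # Hermitian

w, V = np.linalg.eigh(A)
At  = (V * w**t) @ V.T                        # A^t
A1t = (V * w**(1 - t)) @ V.T                  # A^(1-t)

comm = lambda X, Y: X @ Y - Y @ X
lhs = 0.5 * np.trace(comm(At, K) @ comm(A1t, K))
rhs = np.trace(K @ At @ K @ A1t) - np.trace(K @ K @ A)
print(abs(lhs - rhs) < 1e-10)
```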
Theorem IX.6.5 (Lindblad) The function S(A|B) defined above is jointly convex in A, B.

Proof. Consider the block-matrices

T = (A  0; 0  B),   X = (0  0; I  0).

Then note that

S(A|B) = −I(T, X).

We noted earlier that I(T, X) is concave in the argument T. ■

Exercise IX.6.6 (i) Show that for every pinching 𝒞, S(𝒞(A)|𝒞(B)) ≤ S(A|B).
(ii) Let τ be the normalised trace function on n × n matrices; i.e., τ(A) = (1/n) tr A. Show that for all positive matrices A, B,

τ(A)(log τ(A) − log τ(B)) ≤ τ(A(log A − log B)).   (IX.50)

This is called the Peierls-Bogoliubov Inequality. (There are other inequalities that go by the same name.)
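The inequality (IX.50) can be tested numerically. Below is a small sketch (an illustration added here, not part of the text, assuming NumPy); the matrix logarithm of a positive definite matrix is computed through its spectral decomposition.

```python
# Check (IX.50): tau(A)(log tau(A) - log tau(B)) <= tau(A (log A - log B)).
import numpy as np

rng = np.random.default_rng(2)
n = 4

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

def logm(A):
    """log of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

A, B = rand_pd(), rand_pd()
tau = lambda X: float(np.trace(X)) / n       # normalised trace

lhs = tau(A) * (np.log(tau(A)) - np.log(tau(B)))
rhs = tau(A @ (logm(A) - logm(B)))
print(lhs <= rhs + 1e-10)
```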
IX.7 Operator Approximation
An operator approximation problem consists of finding, for a given operator A, the element nearest to it from a special class. Some problems of this type are studied in this section. In formulating and interpreting these results, it is helpful to have an analogy: if arbitrary operators are thought of as complex numbers, then Hermitian operators should be thought of as real numbers, unitary operators as complex numbers of modulus one, and positive operators as positive real numbers. Of course, this analogy has its limitations, since multiplication of complex numbers is commutative and that of operators is not. The first theorem below is easy to prove and sets the stage for later results.
Theorem IX.7.1 Let A be any operator and let Re A = ½(A + A*). Then, for every Hermitian operator H and for every unitarily invariant norm,

|||A − Re A||| ≤ |||A − H|||.   (IX.51)

Proof. Recall that |||T||| = |||T*||| for every T. Using this fact and the triangle inequality, we have

|||A − ½(A + A*)||| = ½|||A − A*||| = ½|||A − H + H − A*|||
≤ ½(|||A − H||| + |||(A − H)*|||) = |||A − H|||.

This proves the theorem. ■
The inequality (IX.51) is sometimes stated in words as: in every unitarily invariant norm, Re A is a Hermitian approximant to A. The next theorem says that a unitary approximant to A is any unitary that occurs in its polar decomposition.
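For the Frobenius norm (which is unitarily invariant), the inequality (IX.51) can be spot-checked against random Hermitian competitors. This is an added illustration (not from the text), assuming NumPy.

```python
# Re A is a Hermitian approximant to A in the Frobenius norm.
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
ReA = (A + A.conj().T) / 2

dist_re = np.linalg.norm(A - ReA)   # Frobenius norm distance
ok = True
for _ in range(100):
    H = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (H + H.conj().T) / 2        # random Hermitian competitor
    ok = ok and dist_re <= np.linalg.norm(A - H) + 1e-12
print(ok)
```

For the Frobenius norm this is just the statement that Re A is the orthogonal projection of A onto the real subspace of Hermitian matrices.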
Theorem IX.7.2 If A = UP, where U is unitary and P positive, then

|||A − U||| ≤ |||A − W||| ≤ |||A + U|||   (IX.52)

for every unitary W and for every unitarily invariant norm.

Proof. By unitary invariance, the inequality (IX.52) is equivalent to

|||P − I||| ≤ |||P − U*W||| ≤ |||P + I|||.

So the assertion of the theorem is equivalent to the following: for every positive operator P and unitary operator V,

|||P − I||| ≤ |||P − V||| ≤ |||P + I|||.   (IX.53)

This will be proved using the spectral perturbation inequality (IV.62). Let

P̃ = (0  P; P  0),   Ṽ = (0  V; V*  0).

Then P̃ and Ṽ are Hermitian. The eigenvalues of P̃ are the singular values of P together with their negatives. (See Exercise II.1.15.) The same is true for Ṽ, which means that it has eigenvalues 1 and −1, each with multiplicity n. We thus have

Eig↓(P̃) − Eig↓(Ṽ) = [Eig↓(P) − I] ⊕ [−Eig↑(P) + I],
Eig↓(P̃) − Eig↑(Ṽ) = [Eig↓(P) + I] ⊕ [−Eig↑(P) − I].

So, from (IV.62), we have

|||[Eig↓(P) − I] ⊕ [Eig↑(P) − I]||| ≤ |||(P − V) ⊕ (P − V)*||| ≤ |||[Eig↓(P) + I] ⊕ [Eig↑(P) + I]|||.

This is equivalent to the pair of inequalities (IX.53). ■
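The unitary polar factor can be computed from the singular value decomposition, and the two-sided bound (IX.52) tested against random unitaries. A sketch (added illustration, not from the text, assuming NumPy), in the Frobenius norm:

```python
# A = U P with U = W Vh from the SVD A = W diag(s) Vh; check (IX.52).
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

W_, s, Vh = np.linalg.svd(A)
U = W_ @ Vh                       # unitary factor of the polar decomposition

lower = np.linalg.norm(A - U)     # |||A - U|||
upper = np.linalg.norm(A + U)     # |||A + U|||
ok = True
for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    d = np.linalg.norm(A - Q)     # distance to a random unitary
    ok = ok and lower <= d + 1e-12 and d <= upper + 1e-12
print(ok)
```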
The two approximation problems solved above are subsumed in a more general question. Let Φ be a closed subset of the complex plane, and let 𝒩(Φ) be the collection of all normal operators whose spectrum is contained in Φ. The problem is to find an element of 𝒩(Φ) nearest to a given operator A. Theorem IX.7.1 solves this problem when Φ is the real line, and Theorem IX.7.2 when Φ is the unit circle. When Φ is the whole plane or the positive half-line, the problem becomes much harder, and the full solution is not known. Note that in the first case (Φ = ℂ) we are asking for a normal approximant to A, and in the second case (Φ = ℝ₊) for a positive approximant to A. Some results on this problem, which are easy to describe and also are directly related to other parts of this book, are given below. We have already come across a special case of this problem in Chapter 6. Let F be a retraction of the plane onto the subset Φ; i.e., F is a map of ℂ onto Φ such that |z − F(z)| ≤ |z − w| for all z ∈ ℂ and w ∈ Φ. Such a map always exists; it is unique if (and only if) Φ is convex. We have the following theorem.
Theorem IX.7.3 Let F be a retraction of the plane onto the closed set Φ. Suppose Φ is convex. Then, for every normal operator A, we have

|||A − F(A)||| ≤ |||A − N|||   (IX.54)

for all N ∈ 𝒩(Φ) and for all unitarily invariant norms. If the set Φ is not convex, the inequality (IX.54) may not be true for all unitarily invariant norms, but is still true for all Q-norms. (See Theorem VI.6.2 and Problem VI.8.13.)
Exercise IX.7.4 Let A be a Hermitian operator, and let A = A₊ − A₋ be its Jordan decomposition. (Both A₊ and A₋ are positive operators.) Use the above theorem to show that, if P is any positive operator, then

|||A − A₊||| ≤ |||A − P|||   (IX.55)

for every unitarily invariant norm. If A is normal, then for every positive operator P

|||A − (Re A)₊||| ≤ |||A − P|||.   (IX.56)
Theorem IX.7.5 Let A be any operator. Then for every positive operator P,

||A − (Re A)₊||₂ ≤ ||A − P||₂.   (IX.57)

Proof. Recall that ||A||₂² = ||Re A||₂² + ||Im A||₂². Hence,

||A − (Re A)₊||₂² = ||Re A − (Re A)₊||₂² + ||Im A||₂²,
||A − P||₂² = ||Re A − P||₂² + ||Im A||₂².

From (IX.55), we see that ||Re A − (Re A)₊||₂² is bounded by ||Re A − P||₂². This leads to the inequality (IX.57). ■

The problem of finding positive approximants to an arbitrary operator A is much more complex for other norms. See the Notes at the end of the chapter. For normal approximants, we have a solution in all unitarily invariant norms only in the 2 × 2 case.
Theorem IX.7.6 Let A be an upper triangular matrix

A = (λ₁  b; 0  λ₂).   (IX.58)

Let θ = arg(λ₁ − λ₂), and let

N₀ = (λ₁  b/2; (b̄/2)e^{2iθ}  λ₂).   (IX.59)

Then N₀ is normal, and for any normal matrix N we have

|||A − N₀||| ≤ |||A − N|||   (IX.60)

for every unitarily invariant norm.
Proof. It is easy to check that N₀ is normal. Let N = (x₁  x₂; x₃  x₄) be any normal matrix. We must have |x₂| = |x₃| in that case.

Now note that, if T = (t₁  t₂; t₃  t₄) is any matrix, we can write its off-diagonal part (0  t₂; t₃  0) as ½(T − UTU*), where U is the diagonal matrix with diagonal entries 1 and −1. Hence, for every unitarily invariant norm, we have

|||T||| ≥ |||(0  t₂; t₃  0)|||.

Using this, we see that

|||A − N||| ≥ |||(0  b − x₂; −x₃  0)||| = ||| diag(b − x₂, −x₃) |||.

But

|b| ≤ |b − x₂| + |x₂| = |b − x₂| + |x₃|.

Thus the vector ½(|b|, |b|) is weakly majorised by the vector (|b − x₂|, |x₃|), which, in turn, is weakly majorised by the vector (s₁(A − N), s₂(A − N)), as seen above. Since A − N₀ has singular values (½|b|, ½|b|), this proves the inequality (IX.60). ■

Since every 2 × 2 matrix is unitarily equivalent to an upper triangular matrix of the form (IX.58), this theorem tells us how to find a normal matrix closest to it.
Exercise IX.7.7 The measure of nonnormality of a matrix, with respect to any norm, was defined in Problems VIII.6.4 and VIII.6.5. Theorem IX.7.6, on the other hand, gives for 2 × 2 matrices a formula for the distance to the set of all normal matrices. What is the relation between these two numbers for a given unitarily invariant norm?
IX.8 Problems
Problem IX.8.1. Let A, B be positive matrices, and let m, k be positive integers with m ≥ k. Use the inequality (IX.13) to show that

tr (AB)^m ≤ tr (A^{m/k} B^{m/k})^k.   (IX.61)

The special case,

tr (AB)^m ≤ tr A^m B^m,   (IX.62)

is called the Lieb-Thirring inequality.
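The Lieb-Thirring inequality (IX.62) can be checked on random positive matrices. An added illustration (not from the text), assuming NumPy:

```python
# Check tr (AB)^m <= tr A^m B^m for positive A, B.
import numpy as np

rng = np.random.default_rng(5)
n, m = 4, 3

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

A, B = rand_pd(), rand_pd()
mp = np.linalg.matrix_power
lhs = np.trace(mp(A @ B, m))        # tr (AB)^m, real and nonnegative
rhs = np.trace(mp(A, m) @ mp(B, m)) # tr A^m B^m
print(lhs <= rhs + 1e-8)
```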
Problem IX.8.2. Let A, B be Hermitian matrices. Show that for every positive integer m

(i) |tr (AB)^{2m}| ≤ tr A^{2m}B^{2m},
(ii) |tr (A^m B^m)²| ≤ tr A^{2m}B^{2m},
(iii) |tr (AB)^{4m}| ≤ tr (A^{2m}B^{2m})².

(Hint: By the Weyl Majorant Theorem, |tr X^m| ≤ tr |X|^m for every matrix X.) Note that if

A = (1  1; 1  −1),   B = (0  1; 1  1),

then |tr (AB)³| = 5, |tr A³B³| = 4, tr (AB)⁶ = 9, and tr (A³B³)² = 0. This shows the failure of possible extensions of the inequalities (i) and (iii) above.
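The entries of A and B are partly illegible in this copy; the pair used below is a reconstruction chosen to reproduce all four values stated in the text, and the computation (an added illustration, assuming NumPy) confirms them.

```python
# Verify the counterexample values 5, 4, 9, 0 from Problem IX.8.2.
import numpy as np

A = np.array([[1, 1], [1, -1]])
B = np.array([[0, 1], [1, 1]])
mp = np.linalg.matrix_power

v1 = abs(np.trace(mp(A @ B, 3)))            # |tr (AB)^3|
v2 = abs(np.trace(mp(A, 3) @ mp(B, 3)))     # |tr A^3 B^3|
v3 = np.trace(mp(A @ B, 6))                 # tr (AB)^6
v4 = np.trace(mp(mp(A, 3) @ mp(B, 3), 2))   # tr (A^3 B^3)^2
print(v1, v2, v3, v4)                       # 5 4 9 0
```

So 5 = |tr (AB)³| > |tr A³B³| = 4 and 9 = tr (AB)⁶ > tr (A³B³)² = 0, refuting the candidate extensions.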
Problem IX.8.3. Are there any natural generalisations of the above inequalities when three matrices A, B, C are involved? Take, for instance, the inequality (IX.62). A product of three positive matrices need not have positive eigenvalues. One still might wonder whether |tr (ABC)²| ≤ |tr A²B²C²|. Construct an example to show that this need not be true.
Problem IX.8.4. A possible generalisation of the Golden-Thompson inequality (IX.19) would have been tr e^{A+B+C} ≤ |tr (e^A e^B e^C)| for any three Hermitian matrices A, B, C. This is false. To see this, let S₁, S₂, S₃ be the Pauli spin matrices

S₁ = (0  1; 1  0),   S₂ = (0  −i; i  0),   S₃ = (1  0; 0  −1).

If a₁, a₂, a₃ are any real numbers and a = (a₁² + a₂² + a₃²)^{1/2}, show that

exp(a₁S₁ + a₂S₂ + a₃S₃) = (cosh a) I + (sinh a)/a (a₁S₁ + a₂S₂ + a₃S₃).
Let A = tS₁, B = tS₂, C = tS₃. Show that

tr e^{A+B+C} = 2 cosh(√3 t),
|tr (e^A e^B e^C)| = 2 cosh(√3 t)[1 − (1/12)t⁴ + O(t⁶)].

For small t, the first quantity is bigger than the second.
Problem IX.8.5. Show that the Lie Product Formula has a generalisation: for any k matrices A₁, A₂, …, A_k,

lim_{m→∞} (exp(A₁/m) exp(A₂/m) ⋯ exp(A_k/m))^m = exp(A₁ + A₂ + ⋯ + A_k).
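The convergence can be observed directly: the error decays roughly like 1/m. A sketch for k = 3 Hermitian matrices (an added illustration, not from the text, assuming NumPy):

```python
# (exp(A1/m) exp(A2/m) exp(A3/m))^m -> exp(A1 + A2 + A3) as m grows.
import numpy as np

def expm_h(H):
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

rng = np.random.default_rng(6)
n = 3
def sym(M): return (M + M.T) / 2
A1, A2, A3 = (sym(rng.standard_normal((n, n))) for _ in range(3))
target = expm_h(A1 + A2 + A3)

def trotter(m):
    step = expm_h(A1 / m) @ expm_h(A2 / m) @ expm_h(A3 / m)
    return np.linalg.matrix_power(step, m)

err8 = np.linalg.norm(trotter(8) - target)
err64 = np.linalg.norm(trotter(64) - target)
print(err64 < err8)   # the approximation improves as m increases
```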
Problem IX.8.6. Show that for any two matrices A, B we have

|||A*B + B*A||| ≤ |||AA* + BB*|||

and

|||A*B + B*A||| ≤ |||A*A + B*B|||

for every unitarily invariant norm.
Problem IX.8.7. Let X, Y be positive. Show that for every unitarily invariant norm

|||X − Y||| ≤ ||| (X  0; 0  Y) |||.

From this, it follows that, for every A,

||A*A − AA*|| ≤ ||A||².

Problem IX.8.8. Let A, B be positive matrices and let X be any matrix. Show that for all unitarily invariant norms, and for 0 < v < 1,

|||A^v X B^{1−v} + A^{1−v} X B^v||| ≤ |||AX + XB|||.
Problem IX.8.9. Let A, B be positive operators and let T be any operator such that ||T*x|| ≤ ||Ax|| and ||Tx|| ≤ ||Bx|| for all x. Show that, for all x, y and for 0 < v < 1,

|⟨y, Tx⟩| ≤ ||A^{1−v} y|| ||B^v x||.

[Hint: From the hypotheses, it follows that A⁻¹T and TB⁻¹ are contractions. The inequality (IX.38) then implies that (A⁻¹)^{1−v} T (B⁻¹)^v is a contraction.]

Problem IX.8.10. Use the result of the above problem to prove the following. For all operators T, vectors x, y, and for 0 < v < 1,

|⟨y, Tx⟩| ≤ || |T*|^{1−v} y ||  || |T|^v x ||.

This inequality is called the Mixed Schwarz Inequality.
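The Mixed Schwarz Inequality can be spot-checked with random data. In the sketch below (an added illustration, not from the text, assuming NumPy), |T| = (T*T)^{1/2} and |T*| = (TT*)^{1/2} and their fractional powers are computed spectrally.

```python
# Check |<y, Tx>| <= || |T*|^(1-v) y || * || |T|^v x ||.
import numpy as np

rng = np.random.default_rng(7)
n, v = 4, 0.3
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def psd_pow(A, p):
    """A^p for positive semidefinite A (tiny negative eigenvalues clipped)."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0, None)**p) @ V.conj().T

absT_v    = psd_pow(T.conj().T @ T, v / 2)        # |T|^v
absTs_1mv = psd_pow(T @ T.conj().T, (1 - v) / 2)  # |T*|^(1-v)

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

lhs = abs(y.conj() @ (T @ x))
rhs = np.linalg.norm(absTs_1mv @ y) * np.linalg.norm(absT_v @ x)
print(lhs <= rhs + 1e-9)
```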
Problem IX.8.11. Show that if A, B are positive matrices, then

det(I + A + B) ≤ det(I + A) det(I + B).

Then use this and Theorem IX.5.11 to show that, for any two matrices A, B,

|det(I + A + B)| ≤ det(I + |A|) det(I + |B|).

(See Problem IV.5.9 for another proof of this.)
Problem IX.8.12. Show that for all positive matrices A, B

tr (A(log A − log B)) ≥ tr (A − B).   (IX.63)

A suitably chosen pair of 2 × 2 positive matrices A, B shows that we may not have the operator inequality A(log A − log B) ≥ A − B.
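The trace inequality (IX.63) is easy to test numerically. An added illustration (not from the text), assuming NumPy:

```python
# Check (IX.63): tr(A (log A - log B)) >= tr(A - B) for positive A, B.
import numpy as np

rng = np.random.default_rng(8)
n = 4

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

def logm(A):
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

A, B = rand_pd(), rand_pd()
lhs = np.trace(A @ (logm(A) - logm(B)))
rhs = np.trace(A - B)
print(lhs >= rhs - 1e-10)
```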
Problem IX.8.13. Let f be a convex function on an interval I. Let A, B be two Hermitian matrices whose spectra are contained in I. Show that

tr [f(A) − f(B)] ≥ tr [(A − B) f′(B)].   (IX.64)

The special choice f(t) = t log t gives the inequality (IX.63).
Problem IX.8.14. Let A be a Hermitian matrix and f any convex function. Then for every unit vector x,

f(⟨x, Ax⟩) ≤ ⟨x, f(A)x⟩.

This implies that, for any orthonormal basis x₁, …, x_n,

Σ_{j=1}^n f(⟨x_j, Ax_j⟩) ≤ tr f(A).

The name Peierls-Bogoliubov inequality is sometimes used for this inequality, or for its special cases f(t) = e^t, f(t) = e^{−t}, etc.
Problem IX.8.15. The concavity assertion in Exercise IX.6.4 can be generalised to several variables. Let t₁, t₂, t₃ be positive numbers such that t₁ + t₂ + t₃ < 1. Let A₁, A₂, A₃ be positive operators. Note that

A₁^{t₁} ⊗ A₂^{t₂} ⊗ A₃^{t₃} = (A₁^{t₁} ⊗ A₂^{t₂} ⊗ I)(I ⊗ I ⊗ A₃^{t₃}).

Use the concavity of the first factor above (which has been proved in Exercise IX.6.4) and the integral representation (V.4) for the second factor to prove that the map (A₁, A₂, A₃) → A₁^{t₁} ⊗ A₂^{t₂} ⊗ A₃^{t₃} is jointly concave on triples of positive operators. More generally, prove that for positive numbers t₁, …, t_k with t₁ + ⋯ + t_k < 1, the map that takes a k-tuple of positive operators (A₁, …, A_k) to the operator A₁^{t₁} ⊗ ⋯ ⊗ A_k^{t_k} is jointly concave.
Problem IX.8.16. A special consequence of the above is that the map A → ⊗^k A^{1/k} is concave on positive operators for all k = 1, 2, …. Use this to prove the following inequalities for n × n positive matrices A, B:

(i) ⊗^k (A + B)^{1/k} ≥ ⊗^k A^{1/k} + ⊗^k B^{1/k},
(ii) ∧^k (A + B)^{1/k} ≥ ∧^k A^{1/k} + ∧^k B^{1/k},
(iii) ∨^k (A + B)^{1/k} ≥ ∨^k A^{1/k} + ∨^k B^{1/k},
(iv) det(A + B)^{1/n} ≥ det A^{1/n} + det B^{1/n},
(v) per(A + B)^{1/n} ≥ per A^{1/n} + per B^{1/n},
(vi) c_k((A + B)^{1/k}) ≥ c_k(A^{1/k}) + c_k(B^{1/k}) for 1 ≤ k ≤ n,

where c_k(A) = tr ∧^k (A). The inequality (iv) above is called the Minkowski Determinant Theorem and has been proved earlier (Corollary II.3.21).
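Inequality (iv), the Minkowski Determinant Theorem, is the easiest of the list to verify numerically. An added illustration (not from the text), assuming NumPy:

```python
# Check det(A+B)^(1/n) >= det(A)^(1/n) + det(B)^(1/n) for positive A, B.
import numpy as np

rng = np.random.default_rng(9)
n = 4

def rand_pd():
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

A, B = rand_pd(), rand_pd()
lhs = np.linalg.det(A + B) ** (1 / n)
rhs = np.linalg.det(A) ** (1 / n) + np.linalg.det(B) ** (1 / n)
print(lhs >= rhs - 1e-10)
```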
Problem IX.8.17. Outlined below is another proof of the Lieb Concavity Theorem which uses results on operator concave functions proved in Chapter 5.

(i) Consider the space 𝓛(ℋ) ⊕ 𝓛(ℋ) with the inner product

⟨(X₁, X₂), (Y₁, Y₂)⟩ = tr (X₁*Y₁ + X₂*Y₂).

(ii) Let A₁, A₂ be invertible positive operators on ℋ and let A = ½(A₁ + A₂). Let

Δ(R) = ARA⁻¹,   Δ₁₂(R, S) = (A₁RA₁⁻¹, A₂SA₂⁻¹).

Then Δ is a positive operator on the Hilbert space 𝓛(ℋ) and Δ₁₂ is a positive operator on the Hilbert space 𝓛(ℋ) ⊕ 𝓛(ℋ).

(iii) Note that for any X in 𝓛(ℋ)

tr X*A^t XA^{1−t} = ⟨XA^{1/2}, Δ^t(XA^{1/2})⟩

and

tr (X*A₁^t XA₁^{1−t} + X*A₂^t XA₂^{1−t}) = ⟨(XA₁^{1/2}, XA₂^{1/2}), Δ₁₂^t(XA₁^{1/2}, XA₂^{1/2})⟩.

(iv) Let V be the map from 𝓛(ℋ) into 𝓛(ℋ) ⊕ 𝓛(ℋ) defined as

V(XA^{1/2}) = (1/√2)(XA₁^{1/2}, XA₂^{1/2}).

Show that V is an isometry. Show that V*Δ₁₂V = Δ.

(v) Since the function f(s) = s^t on [0, ∞) is operator concave for 0 < t < 1, using Exercise V.2.4, we obtain

Δ^t = (V*Δ₁₂V)^t ≥ V*Δ₁₂^t V.

(vi) This shows that

tr X*A^t XA^{1−t} ≥ ½ tr (X*A₁^t XA₁^{1−t} + X*A₂^t XA₂^{1−t})

when A₁ and A₂ are invertible. By continuity, this is true for all positive operators A₁ and A₂. In other words, for all 0 < t < 1, the function

f(A) = tr X*A^t XA^{1−t}

is concave.

(vii) Use 2 × 2 operator matrices (A  0; 0  B) and (0  0; X  0) to complete the proof of Lieb's concavity theorem.
Problem IX.8.18. Theorem IX.7.1 can be generalised as follows. Let φ be a mapping of the space of n × n matrices into itself that satisfies three conditions:

(i) φ² is the identity map; i.e., φ(φ(A)) = A for all A.
(ii) φ is real linear; i.e., φ(αA + βB) = αφ(A) + βφ(B) for all A, B and all real α, β.
(iii) A and φ(A) have the same singular values for all A.

Then the set 𝓘(φ) = {A : φ(A) = A} is a real linear subspace of the space of matrices. For each A, the matrix ½(A + φ(A)) is in 𝓘(φ), and for all unitarily invariant norms

|||A − ½(A + φ(A))||| ≤ |||A − B||| for all B ∈ 𝓘(φ).
Examples of such maps are φ(A) = ±A*, φ(A) = ±Aᵀ, and φ(A) = ±Ā, where Aᵀ denotes the transpose of A and Ā denotes the matrix obtained by taking the complex conjugate of each entry of A.
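For the example φ(A) = Aᵀ, the set 𝓘(φ) is the symmetric matrices and ½(A + φ(A)) is the symmetric part of A. The sketch below (an added illustration, not from the text, assuming NumPy) checks the minimality claim in the Frobenius norm against random symmetric competitors.

```python
# Nearest fixed point of phi(A) = A^T is the symmetric part (A + A^T)/2.
import numpy as np

rng = np.random.default_rng(10)
n = 5
A = rng.standard_normal((n, n))
half = (A + A.T) / 2              # (1/2)(A + phi(A))

ok = True
for _ in range(100):
    S = rng.standard_normal((n, n)); S = (S + S.T) / 2   # element of I(phi)
    ok = ok and np.linalg.norm(A - half) <= np.linalg.norm(A - S) + 1e-12
print(ok)
```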
Problem IX.8.19. The Cayley transform of a Hermitian matrix A is the unitary matrix C(A) defined as

C(A) = (A − iI)(A + iI)⁻¹.

If A, B are two Hermitian matrices, we have

(1/2i)[C(A) − C(B)] = (B + iI)⁻¹ (A − B) (A + iI)⁻¹.

Use this to show that for all j,

(1/2) s_j(C(A) − C(B)) ≤ s_j(A − B).

[Note that ||(A + iI)⁻¹|| ≤ 1 and ||(B + iI)⁻¹|| ≤ 1.] In particular, this gives

(1/2) |||C(A) − C(B)||| ≤ |||A − B|||

for every unitarily invariant norm.
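The singular value bound can be checked directly. An added illustration (not from the text), assuming NumPy:

```python
# Check (1/2) s_j(C(A) - C(B)) <= s_j(A - B) for Hermitian A, B.
import numpy as np

rng = np.random.default_rng(11)
n = 5

def rand_herm():
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

A, B = rand_herm(), rand_herm()
I = np.eye(n)
cayley = lambda H: (H - 1j * I) @ np.linalg.inv(H + 1j * I)

s_C = np.linalg.svd(cayley(A) - cayley(B), compute_uv=False)   # decreasing
s_AB = np.linalg.svd(A - B, compute_uv=False)                  # decreasing
print(bool(np.all(s_C / 2 <= s_AB + 1e-10)))
```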
Problem IX.8.20. A 2 × 2 block matrix A = (A₁₁  A₁₂; A₂₁  A₂₂), in which the four matrices A_ij are normal and commute with each other, is called binormal. Show that such a matrix is unitarily equivalent to a matrix A = (A₁  B; 0  A₂), in which A₁, A₂, B are diagonal matrices and B is positive. Let

N₀ = (A₁  ½B; ½U²B  A₂),

where U is the unitary operator such that A₁ − A₂ = U|A₁ − A₂|. Show that in every unitarily invariant norm we have

|||A − N₀||| ≤ |||A − N|||

for all 2n × 2n normal matrices N.
Problem IX.8.21. An alternate proof of the inequality (IX.55) is outlined below. Choose an orthonormal basis in which A is diagonal and A = (A₊  0; 0  −A₋). In this basis, let P have the block decomposition P = (P₁₁  P₁₂; P₂₁  P₂₂). By the pinching inequality,

|||A − P||| ≥ ||| (A₊ − P₁₁) ⊕ (−A₋ − P₂₂) |||.

Since both A₋ and P₂₂ are positive, |||A₋||| ≤ |||A₋ + P₂₂|||. Use this to prove the inequality (IX.55). This argument can be modified to give another proof of (IX.56) also. For this we need the following fact. Let T and S be operators such that 0 ≤ Re T ≤ Re S and Im T = Im S. If T is normal, then |||T||| ≤ |||S||| for every unitarily invariant norm. Prove this using the Fan Dominance Theorem (Theorem IV.2.2) and the result of Problem III.6.6.
IX.9 Notes and References
Matrix inequalities of the kind studied in this chapter can be found in several books. The reader should see, particularly, the books by Horn and Johnson, Marshall and Olkin, and Marcus and Minc, mentioned in Chapters 1 and 2; and, in addition, M.L. Mehta, Matrix Theory, second edition, Hindustan Publishing Co., 1989, and B. Simon, Trace Ideals and Their Applications, Cambridge University Press, 1979. Proposition IX.1.1 is proved in B. Simon, Trace Ideals, p. 95. Proposition IX.1.2, for the operator norm, is proved in A. McIntosh, Heinz inequalities and perturbation of spectral families, Macquarie Math. Reports, 1979; and for all unitarily invariant norms in F. Kittaneh, A note on the arithmetic-geometric mean inequality for matrices, Linear Algebra Appl., 171 (1992) 1-8. The Lie product formula has been extended to semigroups of operators in Banach spaces by H.F. Trotter. Our treatment of the material between Theorem IX.2.1 and Exercise IX.2.8 is based on T. Furuta, Norm inequalities equivalent to Löwner-Heinz theorem, Reviews in Math. Phys., 1 (1989) 135-137. The inequality (IX.5) can also be found in H.O. Cordes, Spectral Theory of Linear Differential Operators and Comparison Algebras, Cambridge University Press, 1987. Theorem IX.2.9 is taken from B. Wang and M. Gong, Some eigenvalue inequalities for positive semidefinite matrix power products, Linear Algebra Appl., 184 (1993) 249-260. The inequality (IX.13) was proved by H. Araki, On an inequality of Lieb and Thirring, Letters in Math. Phys., 19 (1990) 167-170. Theorem IX.2.10 is a rephrasing of some other results proved in this paper. The Golden-Thompson inequality is important in statistical mechanics. See S. Golden, Lower bounds for the Helmholtz function, Phys. Rev. B, 137 (1965) 1127-1128, and C.J. Thompson, Inequality with applications in statistical mechanics, J. Math. Phys., 6 (1965) 1812-1813. It was generalised by A. Lenard, Generalization of the Golden-Thompson inequality, Indiana Univ. Math. J., 21 (1971) 457-468, and further by C.J. Thompson, Inequalities and partial orders on matrix spaces, Indiana Univ. Math. J., 21 (1971) 469-480. These ideas have been developed further in the much more general setting of Lie groups by B. Kostant, On convexity, the Weyl group and the Iwasawa decomposition, Ann. Sci. E.N.S., 6 (1973) 413-455, and subsequently by others. Inequalities complementary to the Golden-Thompson inequality and its stronger version in Exercise IX.3.8 have been proved by F. Hiai and D. Petz, The Golden-Thompson trace inequality is complemented, Linear Algebra Appl., 181 (1993) 153-185, and by T. Ando and F. Hiai, Log majorization and complementary Golden-Thompson type inequalities, Linear Algebra Appl., 197/198 (1994) 113-131. Theorem IX.3.1 is a rephrasing of some results in J.E. Cohen, Inequalities for matrix exponentials, Linear Algebra Appl., 111 (1988) 25-28. Further results on such inequalities may be found in J.E. Cohen, S. Friedland, T. Kato, and F.P.
Kelly, Eigenvalue inequalities for products of matrix exponentials, Linear Algebra Appl., 45 (1982) 55-95, in D. Petz, A survey of certain trace inequalities, Banach Centre Publications, 30 (1994) 287-298, and in Chapter 6 of Horn and Johnson, Topics in Matrix Analysis. Theorem IX.4.2 was proved in R. Bhatia and F. Kittaneh, On the singular values of a product of operators, SIAM J. Matrix Analysis, 11 (1990) 272-277. The generalisation given in Theorem IX.4.5 is due to R. Bhatia and C. Davis, More matrix forms of the arithmetic-geometric mean inequality, SIAM J. Matrix Analysis, 14 (1993) 132-136. Many of the other results in Section IX.4 are from these two papers. The proof outlined in Exercise IX.4.6 is due to F. Kittaneh, A note on the arithmetic-geometric mean inequality for matrices, Linear Algebra Appl., 171 (1992) 1-8. A generalisation of the inequality (IX.21) has been proved by T. Ando, Matrix Young inequalities, Operator Theory: Advances and Applications, 75 (1995) 33-38. If p, q > 1 and 1/p + 1/q = 1, then the operator inequality |AB*| ≤ U((1/p)|A|^p + (1/q)|B|^q)U* is valid for some unitary U. Theorems IX.5.1 and IX.5.2 were proved in R. Bhatia and C. Davis, A Cauchy-Schwarz inequality for operators with applications, Linear Algebra Appl., 223 (1995) 119-129. For the case of the operator norm, the inequality (IX.38) is due to E. Heinz, as are the inequality (IX.29) and the one in Problem IX.8.8. See E. Heinz, Beiträge zur Störungstheorie der Spektralzerlegung, Math. Ann., 123 (1951) 415-438. Our approach to these inequalities follows the one in the paper by A. McIntosh cited above. The inequality in Problem IX.8.9 is also due to E. Heinz. The Mixed Schwarz inequality in Problem IX.8.10 was proved by T. Kato, Notes on some inequalities for linear operators, Math. Ann., 125 (1952) 208-212. (The papers by Heinz, Kato, and McIntosh do much of this for unbounded operators in infinite-dimensional spaces.) The class ℒ in Definition IX.5.6 was introduced by E.H. Lieb, Inequalities for some operator and matrix functions, Advances in Math., 20 (1976) 174-178. Theorem IX.5.11 was proved in this paper. These functions are also studied in R. Merris and J.A. Dias da Silva, Generalized Schur functions, J. Algebra, 35 (1975) 442-448. B. Simon (Trace Ideals, p. 99) calls them Liebian functions. The characterisation in Theorem IX.5.10 has not appeared before; it simplifies the proof of Theorem IX.5.11 considerably. The Lieb Concavity Theorem was proved by E.H. Lieb, Convex trace functions and the Wigner-Yanase-Dyson conjecture, Advances in Math., 11 (1973) 267-288. The proof given here is taken from B. Simon, Trace Ideals. T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl., 26 (1979) 203-241, takes a different approach. Using the concept of operator means, he first proves Theorem IX.6.3 (and its generalisation in Problem IX.8.15) and then deduces Lieb's Theorem from it. The proof given in Problem IX.8.17 is taken from D. Petz, Quasi-entropies for finite quantum systems, Rep.
Math. Phys., 21 (1986) 57-65. Theorem IX.6.5 was proved by G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys., 33 (1973) 305-322. Our proof is taken from A. Connes and E. Størmer, Entropy for automorphisms of II₁ von Neumann algebras, Acta Math., 134 (1975) 289-306. The reader would have guessed from the titles of these papers that these inequalities are useful in physics. The book Quantum Entropy and Its Use by M. Ohya and D. Petz, Springer-Verlag, 1993, contains a very detailed study of such inequalities. Another pertinent reference is D. Ruelle, Statistical Mechanics, Benjamin, 1969. The inequalities in Problem IX.8.16 are taken from T. Ando, Inequalities for permanents, Hokkaido Math. J., 10 (1981) 18-36, and R. Bhatia and C. Davis, Concavity of certain functions of matrices, Linear and Multilinear Algebra, 17 (1985) 155-164. Theorems IX.7.1 and IX.7.2 were proved in K. Fan and A.J. Hoffman, Some metric inequalities in the space of matrices, Proc. Amer. Math. Soc., 6 (1955) 111-116. The inequalities in Problem IX.8.19 were also proved in this paper. The result in Problem IX.8.18 is due to C.K. Li and N.K. Tsing, On the unitarily invariant norms and some related results, Linear and Multilinear Algebra, 20 (1987) 107-119. Two papers by P.R. Halmos, Positive approximants of operators, Indiana Univ. Math. J., 21 (1972) 951-960, and Spectral approximants of normal operators, Proc. Edinburgh Math. Soc., 19 (1974) 51-58, made the problem of operator approximation popular among operator theorists. The results in Theorem IX.7.3, Exercise IX.7.4, and in Problem IX.8.21 were proved in these papers for the special case of the operator norm (but more generally for Hilbert space operators). The first paper of Halmos also tackles the problem of finding a positive approximant to an arbitrary operator, in the operator norm. The solution is different from the one for the Hilbert-Schmidt norm given in Theorem IX.7.5, and the problem is much more complicated. The problem of finding the closest normal matrix has been solved completely only in the 2 × 2 case. Some properties of the normal approximant and algorithms for finding it are given in A. Ruhe, Closest normal matrix finally found!, BIT, 27 (1987) 585-598. The result in Problem IX.8.20 was proved, in the special case of the operator norm, by J. Phillips, Nearest normal approximation for certain normal operators, Proc. Amer. Math. Soc., 67 (1977) 236-240. The general result was proved in R. Bhatia, R. Horn, and F. Kittaneh, Normal approximants to binormal operators, Linear Algebra Appl., 147 (1991) 169-179. An excellent survey of matrix approximation problems, with many references and applications, can be found in N.J. Higham, Matrix nearness problems and applications, in the collection Applications of Matrix Theory, Oxford University Press, 1989. A particularly striking application of Theorem IX.7.2 has been found in quantum chemistry. Given n linearly independent unit vectors e₁, …, e_n in an n-dimensional Hilbert space, what is the orthonormal basis f₁, …, f_n that is closest to the e_j, in the sense that Σ ||e_j − f_j||² is minimal? The Gram-Schmidt procedure does not lead to such an orthonormal basis. The chemist P.O. Löwdin, On the
non-orthogonality problem connected with the use of atomic wave functions in the theory of molecules and crystals, J. Chem. Phys., 18 (1950) 365-374, found a procedure to obtain such a basis. The problem is clearly equivalent to that of finding a unitary matrix closest to an invertible matrix, in the Hilbert-Schmidt norm. Theorem IX.7.2 solves the problem for all unitarily invariant norms. The importance of such results is explained in J.A. Goldstein and M. Levy, Linear algebra and quantum chemistry, American Math. Monthly, 98 (1991) 710-718.
X  Perturbation of Matrix Functions
In earlier chapters we derived several inequalities that describe the variation of eigenvalues, eigenvectors, determinants, permanents , and tensor powers of a matrix. Similar problems for some other matrix functions are studied in this chapter.
X.1 Operator Monotone Functions
If a, b are positive real numbers, then it is easy to see that |a^r − b^r| ≥ |a − b|^r if r ≥ 1, and |a^r − b^r| ≤ |a − b|^r if 0 ≤ r ≤ 1. The inequalities in this section are extensions of these elementary inequalities to positive operators A, B. Instead of the power functions f(t) = t^r, 0 ≤ r ≤ 1, we shall consider the more general class of operator monotone functions.
Theorem X.1.1 Let f be an operator monotone function on [0, ∞) such that f(0) = 0. Then for all positive operators A, B,

||f(A) − f(B)|| ≤ f(||A − B||).   (X.1)

Proof. Since f is concave (Theorem V.2.5) and f(0) = 0, we have f(a + b) ≤ f(a) + f(b) for all nonnegative numbers a, b. Let a = ||A − B||. Then A − B ≤ aI. Hence, A ≤ B + aI and f(A) ≤ f(B + aI). By the subadditivity property of f mentioned above, therefore, f(A) ≤ f(B) + f(a)I. Thus f(A) − f(B) ≤ f(a)I and, by symmetry, f(B) − f(A) ≤ f(a)I. This implies that |f(A) − f(B)| ≤ f(a)I. Hence,

||f(A) − f(B)|| ≤ f(a) = f(||A − B||). ■
Note the special consequence of the above theorem:

||A^r − B^r|| ≤ ||A − B||^r,   0 < r < 1,   (X.2)

for any two positive operators A, B. Note also that the argument in the above proof shows that

|||f(A) − f(B)||| ≤ f(||A − B||) |||I|||

for every unitarily invariant norm.
Exercise X.1.2 Show that, for 2 × 2 positive matrices, the inequality ||A^{1/2} − B^{1/2}||₂ ≤ ||A − B||₂^{1/2} is not always valid. (It is false even when B = 0.)
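One concrete choice (an added illustration; the exercise itself leaves the example to the reader) is A = I and B = 0: then ||A^{1/2} − B^{1/2}||₂ = √2 while ||A − B||₂^{1/2} = 2^{1/4}.

```python
# Counterexample for the Frobenius norm version of (X.2) with r = 1/2.
import numpy as np

A, B = np.eye(2), np.zeros((2, 2))
sqA, sqB = A, B                        # A^{1/2} = I and B^{1/2} = 0 here
lhs = np.linalg.norm(sqA - sqB)        # sqrt(2) ~ 1.414
rhs = np.linalg.norm(A - B) ** 0.5     # 2**0.25 ~ 1.189
print(lhs > rhs)                       # the claimed inequality fails
```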
The inequality (X.2) can be rewritten in another form:

||A^r − B^r|| ≤ || |A − B|^r ||,   0 < r < 1.   (X.3)
This has a generalisation to all unitarily invariant norms. Once again, for this generalisation, it is convenient to consider the more general class of operator monotone functions. Recall that every operator monotone function f on [0, ∞) has an integral representation

f(t) = γ + βt + ∫₀^∞ (λt)/(λ + t) dw(λ),   (X.4)

where γ = f(0), β ≥ 0, and w is a positive measure such that ∫₀^∞ λ/(1 + λ) dw(λ) < ∞. (See (V.53).)
Theorem X.1.3 Let f be an operator monotone function on [0, ∞) such that f(0) = 0. Then for all positive operators A, B and for all unitarily invariant norms,

|||f(A) − f(B)||| ≤ |||f(|A − B|)|||.   (X.5)
In the proof of the theorem, we will use the following lemma.

Lemma X.1.4 Let X, Y be positive operators. Then

|||(X + I)⁻¹ − (X + Y + I)⁻¹||| ≤ |||I − (Y + I)⁻¹|||

for every unitarily invariant norm.

Proof. Since (X + I)⁻¹ ≤ I, by Lemma V.1.5 we have Y^{1/2}(X + I)⁻¹Y^{1/2} ≤ Y. Hence,

[Y^{1/2}(X + I)⁻¹Y^{1/2} + I]⁻¹ ≥ (Y + I)⁻¹.

Therefore, by the Weyl Monotonicity Principle,

λ_j(I − [Y^{1/2}(X + I)⁻¹Y^{1/2} + I]⁻¹) ≤ λ_j(I − (Y + I)⁻¹)

for all j. Note that Y^{1/2}(X + I)⁻¹Y^{1/2} has the same eigenvalues as those of (X + I)^{−1/2}Y(X + I)^{−1/2}. So, the above inequality can also be written as

λ_j(I − [(X + I)^{−1/2}Y(X + I)^{−1/2} + I]⁻¹) ≤ λ_j(I − (Y + I)⁻¹).

From the identity

(X + I)⁻¹ − (X + Y + I)⁻¹ = (X + I)^{−1/2}{I − [(X + I)^{−1/2}Y(X + I)^{−1/2} + I]⁻¹}(X + I)^{−1/2}

and the fact that ||(X + I)^{−1/2}|| ≤ 1, we see that

λ_j((X + I)⁻¹ − (X + Y + I)⁻¹) ≤ λ_j(I − [(X + I)^{−1/2}Y(X + I)^{−1/2} + I]⁻¹).

Thus, for all j,

λ_j((X + I)⁻¹ − (X + Y + I)⁻¹) ≤ λ_j(I − (Y + I)⁻¹).

This is more than what we need to prove the lemma. ■
Proof of Theorem X.1.3: By the Fan Dominance Property (Theorem IV.2.2) it is sufficient to prove the inequality (X.5) for the special class of norms ‖·‖_(k), k = 1, 2, …, n. We will first consider the case when A − B is positive. Let C = A − B. We want to prove that
$$\| f(B + C) - f(B) \|_{(k)} \le \| f(C) \|_{(k)}. \tag{X.6}$$
Let σ_j = s_j(C), j = 1, 2, …, n. Since the σ_j are the eigenvalues of the positive operator C, we have
$$s_j(h(C)) = h(\sigma_j), \quad j = 1, \dots, n,$$
for every monotonically increasing nonnegative function h(t) on [0, ∞). Thus
$$\| h(C) \|_{(k)} = \sum_{j=1}^{k} h(\sigma_j), \quad k = 1, \dots, n,$$
for all such functions h.
X. Perturbation of Matrix Functions
Now, f has the representation (X.4) with γ = 0. The functions βt and λt/(λ + t) are nonnegative, monotonically increasing functions of t. Hence,
$$\| f(C) \|_{(k)} = \beta \| C \|_{(k)} + \int_0^\infty \lambda\, \| C (C + \lambda I)^{-1} \|_{(k)}\, dw(\lambda).$$
In the same way, we can obtain from the integral representation of f
$$\| f(B+C) - f(B) \|_{(k)} \le \beta \| C \|_{(k)} + \int_0^\infty \lambda\, \| (B+C)(B+C+\lambda I)^{-1} - B(B+\lambda I)^{-1} \|_{(k)}\, dw(\lambda).$$
Thus, our assertion (X.6) will be proved if we show that for each λ > 0
$$\| (B+C)(B+C+\lambda I)^{-1} - B(B+\lambda I)^{-1} \|_{(k)} \le \| C(C+\lambda I)^{-1} \|_{(k)}.$$
Now note that we can write
$$X(X + \lambda I)^{-1} = I - \Big( \frac{X}{\lambda} + I \Big)^{-1}.$$
So, the above inequality follows from Lemma X.1.4. This proves the theorem in the special case when A − B is positive.

To prove the general case we will use the special case proved above and two simple facts. First, if X, Y are Hermitian with positive parts X⁺, Y⁺ in their respective Jordan decompositions, then the inequality X ≤ Y implies that ‖X⁺‖_(k) ≤ ‖Y⁺‖_(k) for all k. This is an immediate consequence of Weyl's Monotonicity Principle. Second, if X₁, X₂, Y₁, Y₂ are positive operators such that X₁X₂ = 0, Y₁Y₂ = 0, ‖X₁‖_(k) ≤ ‖Y₁‖_(k), and ‖X₂‖_(k) ≤ ‖Y₂‖_(k) for all k, then we have ‖X₁ + X₂‖_(k) ≤ ‖Y₁ + Y₂‖_(k) for all k. This can be easily seen using the fact that since X₁ and X₂ commute, they can be simultaneously diagonalised, and so can Y₁, Y₂.

Now let A, B be any two positive operators. Since A − B ≤ (A − B)⁺, we have A ≤ B + (A − B)⁺, and hence f(A) ≤ f(B + (A − B)⁺). From this we have
$$f(A) - f(B) \le f(B + (A-B)^+) - f(B),$$
and, therefore, by the first observation above,
$$\| [\, f(A) - f(B) \,]^+ \|_{(k)} \le \| f(B + (A-B)^+) - f(B) \|_{(k)}$$
for all k. Then, using the special case of the theorem proved above, we can conclude that
$$\| [\, f(A) - f(B) \,]^+ \|_{(k)} \le \| f((A-B)^+) \|_{(k)}$$
for all k. Interchanging A and B, we have
$$\| [\, f(B) - f(A) \,]^+ \|_{(k)} \le \| f((B-A)^+) \|_{(k)}$$
for all k. Now note that
$$f([A-B]^+)\, f([B-A]^+) = 0, \qquad f([A-B]^+) + f([B-A]^+) = f(|A-B|),$$
$$[\, f(A)-f(B) \,]^+ [\, f(B)-f(A) \,]^+ = 0, \qquad [\, f(A)-f(B) \,]^+ + [\, f(B)-f(A) \,]^+ = |f(A)-f(B)|.$$
Thus, by the second observation above, the two inequalities imply that
$$\| f(A) - f(B) \|_{(k)} \le \| f(|A-B|) \|_{(k)}$$
for all k. This proves the theorem.
•
Exercise X.1.5 Show that the conclusion of Theorem X.1.3 is valid for all nonnegative operator monotone functions on [0, ∞); i.e., we can replace the condition f(0) = 0 by the condition f(0) ≥ 0.

One should note two special corollaries of the theorem: for all positive operators A, B and for all unitarily invariant norms, we have
$$|||\, A^r - B^r \,||| \le |||\, |A-B|^r \,|||, \quad 0 < r \le 1, \tag{X.7}$$
$$|||\, \log(I+A) - \log(I+B) \,||| \le |||\, \log(I + |A-B|) \,|||. \tag{X.8}$$
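Both corollaries are easy to test numerically; a sketch in the trace norm (the helpers psd, mfun, and trnorm are ours, not the book's):

```python
import numpy as np

def psd(n, rng):
    X = rng.standard_normal((n, n))
    return X @ X.T

def mfun(S, f):
    # apply a scalar function to a positive matrix via eigendecomposition
    w, V = np.linalg.eigh(S)
    return (V * f(np.clip(w, 0, None))) @ V.T

def trnorm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(1)
n, ok = 5, True
for _ in range(50):
    A, B = psd(n, rng), psd(n, rng)
    absD = mfun((A - B) @ (A - B), np.sqrt)   # |A - B|
    for r in (0.3, 0.5, 0.9):                 # inequality (X.7)
        lhs = trnorm(mfun(A, lambda t: t**r) - mfun(B, lambda t: t**r))
        ok &= lhs <= trnorm(mfun(absD, lambda t: t**r)) + 1e-8
    I = np.eye(n)                             # inequality (X.8)
    lhs = trnorm(mfun(A + I, np.log) - mfun(B + I, np.log))
    ok &= lhs <= trnorm(mfun(absD + I, np.log)) + 1e-8
print(ok)
```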
Theorem X.1.6 Let g be a continuous strictly increasing map of [0, ∞) onto itself. Suppose that the inverse map g⁻¹ is operator monotone. Then for all positive operators A, B and for all unitarily invariant norms, we have
$$|||\, g(A) - g(B) \,||| \ge |||\, g(|A-B|) \,|||. \tag{X.9}$$
Proof. Let f = g⁻¹. Since f is operator monotone, it is concave by Theorem V.2.5. Hence g is convex. From Theorem X.1.3, with g(A) and g(B) in place of A and B, we have
$$|||\, A - B \,||| \le |||\, f(|g(A) - g(B)|) \,|||.$$
This is equivalent to the weak majorisation
$$\{ s_j(A-B) \} \prec_w \{ s_j( f(|g(A) - g(B)|) ) \}.$$
Since f is monotone,
$$s_j( f(|g(A) - g(B)|) ) = f( s_j( g(A) - g(B) ) )$$
for each j. So, we have
$$\{ s_j(A-B) \} \prec_w \{ f( s_j( g(A) - g(B) ) ) \}.$$
Since g is convex and monotone, by Corollary II.3.4 we have from this
$$\{ g( s_j(A-B) ) \} \prec_w \{ s_j( g(A) - g(B) ) \}.$$
Since g is monotone, this is the same as saying that
$$\{ s_j( g(|A-B|) ) \} \prec_w \{ s_j( g(A) - g(B) ) \},$$
and this, in turn, implies the inequality (X.9).
•
Two special corollaries that complement the inequalities (X.7) and (X.8) are worthy of note. For all positive operators A, B and for all unitarily invariant norms,
$$|||\, A^r - B^r \,||| \ge |||\, |A-B|^r \,|||, \quad \text{if } r \ge 1, \tag{X.10}$$
$$|||\, \exp A - \exp B \,||| \ge |||\, \exp(|A-B|) - I \,|||. \tag{X.11}$$
Exercise X.1.7 Derive the inequality (X.10) from (X.7) using Exercise IV.2.8.
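The reversed inequalities (X.10) and (X.11) can be checked numerically in the same way as before (a sketch; the helper names are ours):

```python
import numpy as np

def psd(n, rng):
    X = rng.standard_normal((n, n))
    return X @ X.T / n            # keep eigenvalues modest for exp

def mfun(S, f):
    w, V = np.linalg.eigh(S)
    return (V * f(np.clip(w, 0, None))) @ V.T

def trnorm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(2)
n, ok = 5, True
for _ in range(50):
    A, B = psd(n, rng), psd(n, rng)
    absD = mfun((A - B) @ (A - B), np.sqrt)
    for r in (1.5, 2.0, 3.0):     # inequality (X.10)
        lhs = trnorm(mfun(A, lambda t: t**r) - mfun(B, lambda t: t**r))
        ok &= lhs >= trnorm(mfun(absD, lambda t: t**r)) - 1e-8
    # inequality (X.11)
    lhs = trnorm(mfun(A, np.exp) - mfun(B, np.exp))
    ok &= lhs >= trnorm(mfun(absD, np.exp) - np.eye(n)) - 1e-8
print(ok)
```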
Is there an inequality like (X.10) for Hermitian operators A, B? First note that if A is Hermitian, only positive integral powers of A are meaningfully defined. So, the question is whether |||(A − B)^m||| can be bounded above by |||A^m − B^m|||. No such bound is possible if m is an even integer; for the choice B = −A we have A^m − B^m = 0. For odd integers m, we do have a satisfactory answer.

Theorem X.1.8 Let A, B be Hermitian operators. Then for all unitarily invariant norms, and for m = 1, 2, …,
$$|||\, (A - B)^{2m+1} \,||| \le 4^m\, |||\, A^{2m+1} - B^{2m+1} \,|||. \tag{X.12}$$
For the proof we need the following lemma.

Lemma X.1.9 Let A, B be Hermitian operators, let X be any operator, and let m be any positive integer. Then, for j = 1, 2, …, m,
$$|||\, A^{m+j} X B^{m-j+1} - A^{m-j+1} X B^{m+j} \,||| \le |||\, A^{m+j+1} X B^{m-j} - A^{m-j} X B^{m+j+1} \,|||. \tag{X.13}$$
Proof. By the arithmetic-geometric mean inequality proved in Theorem IX.4.5, we have
$$|||\, A X B \,||| \le \tfrac{1}{2}\, |||\, A^2 X + X B^2 \,|||$$
for all operators X and Hermitian A, B. We will use this to prove the inequality (X.13). First consider the case j = 1. We have
$$|||\, A^{m+1} X B^m - A^m X B^{m+1} \,||| = |||\, A (A^m X B^{m-1} - A^{m-1} X B^m) B \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^2 (A^m X B^{m-1} - A^{m-1} X B^m) + (A^m X B^{m-1} - A^{m-1} X B^m) B^2 \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^{m+2} X B^{m-1} - A^{m-1} X B^{m+2} \,||| + \tfrac{1}{2}\, |||\, A^{m+1} X B^m - A^m X B^{m+1} \,|||.$$
Hence
$$|||\, A^{m+1} X B^m - A^m X B^{m+1} \,||| \le |||\, A^{m+2} X B^{m-1} - A^{m-1} X B^{m+2} \,|||.$$
This shows that the inequality (X.13) is true for j = 1. The general case is proved by induction. Suppose that the inequality (X.13) has been proved for j − 1 in place of j. Then, using the arithmetic-geometric mean inequality, the triangle inequality, and this induction hypothesis, we have
$$|||\, A^{m+j} X B^{m-j+1} - A^{m-j+1} X B^{m+j} \,||| = |||\, A (A^{m+j-1} X B^{m-j} - A^{m-j} X B^{m+j-1}) B \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^2 (A^{m+j-1} X B^{m-j} - A^{m-j} X B^{m+j-1}) + (A^{m+j-1} X B^{m-j} - A^{m-j} X B^{m+j-1}) B^2 \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^{m+j+1} X B^{m-j} - A^{m-j} X B^{m+j+1} \,||| + \tfrac{1}{2}\, |||\, A^{m+j-1} X B^{m-(j-1)+1} - A^{m-(j-1)+1} X B^{m+j-1} \,|||$$
$$\le \tfrac{1}{2}\, |||\, A^{m+j+1} X B^{m-j} - A^{m-j} X B^{m+j+1} \,||| + \tfrac{1}{2}\, |||\, A^{m+j} X B^{m-j+1} - A^{m-j+1} X B^{m+j} \,|||.$$
This proves the desired inequality.
•
Proof of Theorem X.1.8: Using the triangle inequality and a very special case of Lemma X.1.9 (the case X = I, j = m), we have
$$|||\, A^{2m}(A-B) + (A-B)B^{2m} \,||| \le |||\, A^{2m+1} - B^{2m+1} \,||| + |||\, A^{2m} B - A B^{2m} \,||| \le 2\, |||\, A^{2m+1} - B^{2m+1} \,|||. \tag{X.14}$$
Let C = A − B, and choose an orthonormal basis x_j such that C x_j = λ_j x_j, where |λ_j| = s_j(C) for j = 1, 2, …, n. Then, by the extremal representation of the Ky Fan norms given in Problem III.6.6,
$$\| A^{2m} C + C B^{2m} \|_{(k)} \ge \sum_{j=1}^{k} | \langle x_j, (A^{2m} C + C B^{2m}) x_j \rangle | = \sum_{j=1}^{k} |\lambda_j| \big\{ \langle x_j, A^{2m} x_j \rangle + \langle x_j, B^{2m} x_j \rangle \big\}.$$
Now use the observation in Problem IX.8.14 and the convexity of the function f(t) = t^{2m} to see that
$$\langle x_j, A^{2m} x_j \rangle + \langle x_j, B^{2m} x_j \rangle \ge \| A x_j \|^{2m} + \| B x_j \|^{2m} \ge 2^{1-2m} \big( \| A x_j \| + \| B x_j \| \big)^{2m} \ge 2^{1-2m} \| A x_j - B x_j \|^{2m} = 2^{1-2m} |\lambda_j|^{2m}.$$
We thus have
$$\| A^{2m} C + C B^{2m} \|_{(k)} \ge \sum_{j=1}^{k} 2^{1-2m} |\lambda_j|^{2m+1}.$$
Since this is true for all k, we have
$$|||\, (A-B)^{2m+1} \,||| \le 2^{2m-1}\, |||\, A^{2m} C + C B^{2m} \,|||$$
for all unitarily invariant norms. Combining this with (X.14), we obtain the inequality (X.12). Observe that when B = −A, the two sides of (X.12) are equal. •
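A numerical check of (X.12) in the trace norm, including the equality case B = −A (a sketch; the helper names are ours):

```python
import numpy as np

def herm(n, rng):
    X = rng.standard_normal((n, n))
    return (X + X.T) / 2

def trnorm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(3)
n, ok = 5, True
for _ in range(100):
    A, B = herm(n, rng), herm(n, rng)
    C = A - B
    for m in (1, 2):
        p = 2 * m + 1
        lhs = trnorm(np.linalg.matrix_power(C, p))
        rhs = 4**m * trnorm(np.linalg.matrix_power(A, p) - np.linalg.matrix_power(B, p))
        ok &= lhs <= rhs + 1e-8

# equality when B = -A (m = 1)
p = 3
lhs = trnorm(np.linalg.matrix_power(A - (-A), p))
rhs = 4 * trnorm(np.linalg.matrix_power(A, p) - np.linalg.matrix_power(-A, p))
ok &= abs(lhs - rhs) <= 1e-8 * rhs
print(ok)
```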
X.2 The Absolute Value

In Section VII.5 we obtained bounds for ||| |A| − |B| ||| in terms of |||A − B|||. More such bounds are obtained in this section. Since |A| = (A*A)^{1/2}, results for the square root function obtained in the preceding section are useful here.
Theorem X.2.1 Let A, B be any two operators. Then, for every unitarily invariant norm,
$$|||\, |A| - |B| \,||| \le \sqrt{2}\, \big( |||A+B|||\; |||A-B||| \big)^{1/2}. \tag{X.15}$$
Proof. From the inequality (X.7) we have
$$|||\, |A| - |B| \,||| \le |||\, |A^*A - B^*B|^{1/2} \,|||. \tag{X.16}$$
Note that
$$A^*A - B^*B = \tfrac{1}{2} \big\{ (A+B)^*(A-B) + (A-B)^*(A+B) \big\}. \tag{X.17}$$
Hence, by Theorem III.5.6, we can find unitaries U and V such that
$$|A^*A - B^*B| \le \tfrac{1}{2} \big\{ U |(A+B)^*(A-B)| U^* + V |(A-B)^*(A+B)| V^* \big\}.$$
Since the square root function is operator monotone, this operator inequality is preserved if we take square roots on both sides. Since every unitarily invariant norm is monotone, this shows that
$$|||\, |A^*A - B^*B|^{1/2} \,|||^2 \le \tfrac{1}{2}\, |||\, \big[ U |(A+B)^*(A-B)| U^* + V |(A-B)^*(A+B)| V^* \big]^{1/2} \,|||^2.$$
By the result of Problem IV.5.6, we have
$$|||\, |X + Y|^{1/2} \,|||^2 \le 2 \big( |||\, |X|^{1/2} \,|||^2 + |||\, |Y|^{1/2} \,|||^2 \big)$$
for all X, Y. Hence,
$$|||\, |A^*A - B^*B|^{1/2} \,|||^2 \le |||\, |(A+B)^*(A-B)|^{1/2} \,|||^2 + |||\, |(A-B)^*(A+B)|^{1/2} \,|||^2.$$
By the Cauchy-Schwarz inequality (IV.44), the right-hand side is bounded above by 2 |||A+B||| |||A−B|||. Thus the inequality (X.15) now follows from (X.16). •
Example X.2.2 Let
$$A = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad B = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix}.$$
Then
$$\| \, |A| - |B| \, \|_1 = 2, \qquad \| A + B \|_1 = \| A - B \|_1 = \sqrt{2}.$$
So, for the trace norm the inequality (X.15) is sharp. An improvement of this inequality is possible for special norms.
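One concrete pair attaining equality in (X.15) for the trace norm (the pair we use here for a direct numerical check; any pair with the stated norm values serves the same purpose) can be verified in a few lines:

```python
import numpy as np

def trnorm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

def absval(X):
    # |X| = (X*X)^{1/2}
    w, V = np.linalg.eigh(X.conj().T @ X)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

A = np.array([[1.0, 1.0], [0.0, 0.0]]) / np.sqrt(2)
B = np.array([[1.0, -1.0], [0.0, 0.0]]) / np.sqrt(2)

lhs = trnorm(absval(A) - absval(B))                        # left side of (X.15)
gap = np.sqrt(2) * np.sqrt(trnorm(A + B) * trnorm(A - B))  # right side of (X.15)
print(round(lhs, 6), round(gap, 6))   # -> 2.0 2.0
```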
Theorem X.2.3 Let A, B be any two operators. Then for every Q-norm (and thus, in particular, for every Schatten p-norm with p ≥ 2) we have
$$\| \, |A| - |B| \, \|_Q \le \big( \| A+B \|_Q\, \| A-B \|_Q \big)^{1/2}. \tag{X.18}$$
Proof. By the definition of Q-norms, the inequality (X.18) is equivalent to the assertion that
$$|||\, (|A| - |B|)^2 \,||| \le \big( |||\, |A+B|^2 \,|||\; |||\, |A-B|^2 \,||| \big)^{1/2} \tag{X.19}$$
for all unitarily invariant norms. From the inequality (X.10) we have
$$|||\, (|A| - |B|)^2 \,||| \le |||\, A^*A - B^*B \,|||.$$
Using the identity (X.17), we see that
$$|||\, A^*A - B^*B \,||| \le \tfrac{1}{2} \big\{ |||\, (A+B)^*(A-B) \,||| + |||\, (A-B)^*(A+B) \,||| \big\}.$$
Now, using the Cauchy-Schwarz inequality given in Problem IV.5.7, we see that each of the two terms in the brackets above is dominated by
$$\big( |||\, |A+B|^2 \,|||\; |||\, |A-B|^2 \,||| \big)^{1/2}.$$
This proves the inequality (X.19). •
Theorem X.2.4 Let A, B be any two operators. Then, for all Schatten p-norms with 1 ≤ p ≤ 2,
$$\| \, |A| - |B| \, \|_p \le 2^{1/p - 1/2} \big( \| A+B \|_p\, \| A-B \|_p \big)^{1/2}. \tag{X.20}$$
Proof. Let
$$\| X \|_p := \Big( \sum_j s_j(X)^p \Big)^{1/p} \quad \text{for all } p > 0.$$
When p ≥ 1, these are the Schatten p-norms. When 0 < p < 1, this defines a quasi-norm. Instead of the triangle inequality, we have
$$\| X + Y \|_p \le 2^{1/p - 1} \big( \| X \|_p + \| Y \|_p \big), \quad 0 < p \le 1. \tag{X.21}$$
(See Problems IV.5.1 and IV.5.6.) Note that for all positive real numbers r and p, we have
$$\| \, |X|^r \, \|_p = \| X \|_{pr}^{\,r}. \tag{X.22}$$
Thus, the inequality (X.7), restricted to the p-norms, gives for all positive operators A, B
$$\| A^{1/2} - B^{1/2} \|_p \le \| A - B \|_{p/2}^{1/2}. \tag{X.23}$$
Hence, for any two operators A, B,
$$\| \, |A| - |B| \, \|_p \le \| A^*A - B^*B \|_{p/2}^{1/2}, \quad 1 \le p \le \infty.$$
Now use the identity (X.17) and the property (X.21) to see that, for 1 ≤ p ≤ 2,
$$\| A^*A - B^*B \|_{p/2} \le 2^{2/p - 2} \big\{ \| (A+B)^*(A-B) \|_{p/2} + \| (A-B)^*(A+B) \|_{p/2} \big\}.$$
From the relation (X.22) and the Cauchy-Schwarz inequality (IV.44), it follows that each of the two terms in the brackets is dominated by ‖A+B‖_p ‖A−B‖_p. Hence
$$\| A^*A - B^*B \|_{p/2} \le 2^{2/p - 1}\, \| A+B \|_p\, \| A-B \|_p$$
for 1 ≤ p ≤ 2. This proves the theorem. •

The example given in X.2.2 shows that, for each 1 ≤ p ≤ 2, the inequality (X.20) is sharp. In Section VII.5 we saw that
$$\| \, |A| - |B| \, \|_2 \le \sqrt{2}\, \| A - B \|_2 \tag{X.24}$$
for any two operators A, B. Further, if both A and B are normal, the factor √2 can be replaced by 1. Can one prove a similar inequality for the operator norm instead of the Hilbert-Schmidt norm? Of course, we have from (X.24)
$$\| \, |A| - |B| \, \| \le \sqrt{2n}\, \| A - B \| \tag{X.25}$$
for all operators A, B on an n-dimensional Hilbert space H. It is known that the factor √(2n) in the above inequality can be replaced by a factor c_n = O(log n); and even when both A, B are Hermitian, such a factor is necessary. (See the Notes at the end of the chapter.) In a slightly different vein, we have the following theorem.
Theorem X.2.5 (T. Kato) For any two operators A, B we have
$$\| \, |A| - |B| \, \| \le \frac{2}{\pi}\, \| A - B \| \Big( 2 + \log \frac{\| A \| + \| B \|}{\| A - B \|} \Big). \tag{X.26}$$
Proof. The square root function has an integral representation (V.4); this says that
$$t^{1/2} = \frac{1}{\pi} \int_0^\infty \lambda^{-1/2}\, \frac{t}{\lambda + t}\, d\lambda.$$
We can rewrite this as
$$t^{1/2} = \frac{1}{\pi} \int_0^\infty \big[ \lambda^{-1/2} - \lambda^{1/2} (\lambda + t)^{-1} \big]\, d\lambda.$$
Using this, we have
$$|A| - |B| = \frac{1}{\pi} \int_0^\infty \lambda^{1/2} \big[ (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \big]\, d\lambda. \tag{X.27}$$
We will estimate the norm of this integral by splitting it into three parts. Let
$$\alpha = \| A - B \|^2, \qquad \beta = \big( \| A \| + \| B \| \big)^2.$$
Now note that if X, Y are two positive operators, then −Y ≤ X − Y ≤ X, and hence ‖X − Y‖ ≤ max(‖X‖, ‖Y‖). Using this we see that
$$\Big\| \int_0^\alpha \lambda^{1/2} \big[ (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \big]\, d\lambda \Big\| \le \int_0^\alpha \lambda^{-1/2}\, d\lambda = 2\alpha^{1/2} = 2\, \| A - B \|. \tag{X.28}$$
From the identity
$$(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} = (|B|^2 + \lambda)^{-1} (A^*A - B^*B) (|A|^2 + \lambda)^{-1} \tag{X.29}$$
and the identity (X.17), we see that
$$\| (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \| \le \lambda^{-2}\, \| A + B \|\, \| A - B \| \le \lambda^{-2}\, \beta^{1/2}\, \| A - B \|.$$
Hence,
$$\Big\| \int_\beta^\infty \lambda^{1/2} \big[ (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \big]\, d\lambda \Big\| \le \beta^{1/2}\, \| A - B \| \int_\beta^\infty \lambda^{-3/2}\, d\lambda = 2\, \| A - B \|. \tag{X.30}$$
Since A*A − B*B = B*(A − B) + (A* − B*)A, from (X.29) we have
$$(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} = (|B|^2 + \lambda)^{-1} B^* (A - B) (|A|^2 + \lambda)^{-1} + (|B|^2 + \lambda)^{-1} (A^* - B^*) A (|A|^2 + \lambda)^{-1}. \tag{X.31}$$
Note that
$$\| (|B|^2 + \lambda)^{-1} B^* \| = \| B (|B|^2 + \lambda)^{-1} \| = \| \, |B|\, (|B|^2 + \lambda)^{-1} \| \le \frac{1}{2 \lambda^{1/2}},$$
since the maximum value of the function f(t) = t/(t² + λ) is 1/(2λ^{1/2}). By the same argument,
$$\| A (|A|^2 + \lambda)^{-1} \| \le \frac{1}{2 \lambda^{1/2}}.$$
So, from (X.31) we obtain
$$\| (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \| \le \lambda^{-3/2}\, \| A - B \|.$$
Hence
$$\Big\| \int_\alpha^\beta \lambda^{1/2} \big[ (|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} \big]\, d\lambda \Big\| \le \| A - B \| \int_\alpha^\beta \lambda^{-1}\, d\lambda = \| A - B \| \log \frac{\beta}{\alpha} = 2\, \| A - B \| \log \frac{\| A \| + \| B \|}{\| A - B \|}. \tag{X.32}$$
Combining (X.27), (X.28), (X.30), and (X.32), we obtain the inequality (X.26). •
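Kato's inequality (X.26) can be tested numerically for both nearby and well-separated pairs (a sketch; the helpers opnorm and absval are ours):

```python
import numpy as np

def opnorm(X):
    return np.linalg.norm(X, 2)

def absval(X):
    w, V = np.linalg.eigh(X.conj().T @ X)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

rng = np.random.default_rng(4)
n, ok = 6, True
for _ in range(100):
    A = rng.standard_normal((n, n))
    for B in (A + 1e-3 * rng.standard_normal((n, n)),   # a nearby pair
              rng.standard_normal((n, n))):             # an unrelated pair
        d = opnorm(A - B)
        bound = (2 / np.pi) * d * (2 + np.log((opnorm(A) + opnorm(B)) / d))
        ok &= opnorm(absval(A) - absval(B)) <= bound + 1e-10
print(ok)
```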
X.3 Local Perturbation Bounds

Inequalities obtained above are global, in the sense that they are valid for all pairs of operators A, B. Some special results can be obtained if B is restricted to be close to A, or when both are restricted to be away from 0. It is possible to derive many interesting inequalities by using only elementary calculus on normed spaces. A quick review of the basic concepts of the Fréchet differential calculus that are used below is given in the Appendix.

Let f be any continuously differentiable map on an open interval I. Then the map that f induces on the set of Hermitian matrices whose spectrum is contained in I is Fréchet differentiable. This has been proved in Theorem V.3.3, and an explicit formula for the derivative is also given there. For each A, the derivative Df(A) is a linear operator on the space of all Hermitian matrices. The norm of this operator is defined as
$$\| Df(A) \| = \sup_{\| B \| = 1} \| Df(A)(B) \|. \tag{X.33}$$
More generally, any unitarily invariant norm on Hermitian matrices leads to a corresponding norm for the linear operator Df(A); we denote this as
$$||| Df(A) ||| = \sup_{||| B ||| = 1} ||| Df(A)(B) |||. \tag{X.34}$$
For some special functions f, we will find upper bounds for these quantities. Among the functions we consider are operator monotone functions on (0, ∞). The square root function f(t) = t^{1/2} is easier to handle, and since it is especially important, it is worthwhile to deal with it separately.
Theorem X.3.1 Let f(t) = t^{1/2}, 0 < t < ∞. Then for every positive operator A, and for every unitarily invariant norm,
$$||| Df(A) ||| \le \tfrac{1}{2}\, \| A^{-1} \|^{1/2}. \tag{X.35}$$
Proof. The function g(t) = t², 0 < t < ∞, is the inverse of f. So, by the chain rule of differentiation, Df(A) = [Dg(f(A))]⁻¹ for every positive operator A. Note that Dg(A)(X) = AX + XA for every X. So
$$[ Dg(f(A)) ](X) = A^{1/2} X + X A^{1/2}.$$
If A has eigenvalues a₁ ≥ ⋯ ≥ a_n > 0, then dist(σ(A^{1/2}), σ(−A^{1/2})) = 2 a_n^{1/2} = 2 \| A^{-1} \|^{-1/2}. Hence, by Theorem VII.2.12,
$$||| [ Dg(f(A)) ]^{-1} ||| \le \tfrac{1}{2}\, \| A^{-1} \|^{1/2}.$$
This proves the theorem. •
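Since Df(A) inverts the map X ↦ A^{1/2}X + XA^{1/2}, it can be computed explicitly in the eigenbasis of A, where it acts entrywise as division by a_i^{1/2} + a_j^{1/2}. The sketch below (helper names ours) checks the bound (X.35) in the operator norm and verifies that the computed X really solves the equation:

```python
import numpy as np

rng = np.random.default_rng(5)
n, ok = 6, True
for _ in range(100):
    X0 = rng.standard_normal((n, n))
    A = X0 @ X0.T + 0.1 * np.eye(n)       # positive definite
    w, V = np.linalg.eigh(A)
    B = rng.standard_normal((n, n))
    B = (B + B.T) / 2
    B /= np.linalg.norm(B, 2)             # ||B|| = 1
    # Df(A)(B): solve A^{1/2} X + X A^{1/2} = B in the eigenbasis of A
    Bt = V.T @ B @ V
    Xt = Bt / (np.sqrt(w)[:, None] + np.sqrt(w)[None, :])
    X = V @ Xt @ V.T
    sqA = (V * np.sqrt(w)) @ V.T
    ok &= np.linalg.norm(sqA @ X + X @ sqA - B, 2) <= 1e-8   # residual check
    bound = 0.5 * np.sqrt(np.linalg.norm(np.linalg.inv(A), 2))
    ok &= np.linalg.norm(X, 2) <= bound + 1e-10              # the bound (X.35)
print(ok)
```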
Exercise X.3.2 Let f ∈ C¹(I) and let f′ be the derivative of f. Show that
$$\| f'(A) \| = \| Df(A)(I) \| \le \| Df(A) \|. \tag{X.36}$$
Thus, for the function f(t) = t^{1/2} on (0, ∞),
$$\| Df(A) \| = \| f'(A) \| \tag{X.37}$$
for all positive operators A.
Theorem X.3.3 Let φ be the map that takes an invertible operator A to its absolute value |A|. Then, for every unitarily invariant norm,
$$||| D\varphi(A) ||| \le \mathrm{cond}(A) = \| A^{-1} \|\, \| A \|. \tag{X.38}$$
Proof. Let g(A) = A*A. Then Dg(A)(B) = A*B + B*A. Hence |||Dg(A)||| ≤ 2‖A‖. The map φ is the composite f ∘ g, where f(A) = A^{1/2}. So, by the chain rule, Dφ(A) = Df(g(A)) · Dg(A) = Df(A*A) · Dg(A). Hence
$$||| D\varphi(A) ||| \le ||| Df(A^*A) |||\; ||| Dg(A) |||.$$
The first term on the right is bounded by ½‖A⁻¹‖ by Theorem X.3.1, and the second by 2‖A‖. This proves the theorem. •

The following theorem generalises Theorem X.3.1.
Theorem X.3.4 Let f be an operator monotone function on (0, ∞). Then, for every unitarily invariant norm,
$$||| Df(A) ||| \le \| f'(A) \| \tag{X.39}$$
for all positive operators A.

Proof. Use the integral representation (V.49) to write
$$f(t) = \alpha + \beta t + \int_0^\infty \Big( \frac{\lambda}{\lambda^2 + 1} - \frac{1}{\lambda + t} \Big)\, d\mu(\lambda),$$
where α, β are real numbers, β ≥ 0, and μ is a positive measure. Thus
$$f(A) = \alpha I + \beta A + \int_0^\infty \Big[ \frac{\lambda}{\lambda^2 + 1}\, I - (\lambda + A)^{-1} \Big]\, d\mu(\lambda).$$
Using the fact that, for the function g(A) = A⁻¹, we have Dg(A)(B) = −A⁻¹BA⁻¹, we obtain from the above expression
$$Df(A)(B) = \beta B + \int_0^\infty (\lambda + A)^{-1} B (\lambda + A)^{-1}\, d\mu(\lambda).$$
Hence
$$||| Df(A) ||| \le \beta + \int_0^\infty \| (\lambda + A)^{-1} \|^2\, d\mu(\lambda). \tag{X.40}$$
From the integral representation we also have
$$f'(t) = \beta + \int_0^\infty \frac{1}{(\lambda + t)^2}\, d\mu(\lambda).$$
Hence
$$\| f'(A) \| = \Big\| \beta I + \int_0^\infty (\lambda + A)^{-2}\, d\mu(\lambda) \Big\|. \tag{X.41}$$
If A has eigenvalues a₁ ≥ ⋯ ≥ a_n, then since β ≥ 0, the right-hand sides of both (X.40) and (X.41) are equal to
$$\beta + \int_0^\infty (\lambda + a_n)^{-2}\, d\mu(\lambda).$$
This proves the theorem. •
Exercise X.3.5 Let f be an operator monotone function on (0, ∞). Show that
$$\| Df(A) \| = \| Df(A)(I) \| = \| f'(A) \|.$$
Once we have estimates for the derivative Df(A), we can obtain bounds for |||f(A) − f(B)||| when B is close to A. These bounds are obtained using Taylor's Theorem and the mean value theorem. Using Taylor's Theorem, we obtain from Theorems X.3.3 and X.3.4 above the following.

Theorem X.3.6 Let A be an invertible operator. Then for every unitarily invariant norm
$$|||\, |A| - |B| \,||| \le \mathrm{cond}(A)\, ||| A - B ||| + O\big( ||| A - B |||^2 \big) \tag{X.42}$$
for all B close to A.

Theorem X.3.7 Let f be an operator monotone function on (0, ∞). Let A be any positive operator. Then for every unitarily invariant norm
$$||| f(A) - f(B) ||| \le \| f'(A) \|\, ||| A - B ||| + O\big( ||| A - B |||^2 \big) \tag{X.43}$$
for all positive operators B close to A. For the functions f(t) = t^r, 0 < r < 1, we have from this
$$||| A^r - B^r ||| \le r\, \| A^{r-1} \|\, ||| A - B ||| + O\big( ||| A - B |||^2 \big). \tag{X.44}$$
The use of the mean value theorem is illustrated in the proof of the following theorem.
Theorem X.3.8 Let f be an operator monotone function on (0, ∞), and let A, B be two positive operators that are bounded below by a; i.e., A ≥ aI and B ≥ aI for the positive number a. Then for every unitarily invariant norm
$$||| f(A) - f(B) ||| \le f'(a)\, ||| A - B |||. \tag{X.45}$$
Proof. Use the integral representation of f as in the proof of Theorem X.3.4. We have
$$f'(A) = \beta I + \int_0^\infty (\lambda + A)^{-2}\, d\mu(\lambda).$$
If A ≥ aI, then
$$f'(A) \le \beta I + \Big[ \int_0^\infty (\lambda + a)^{-2}\, d\mu(\lambda) \Big] I = f'(a)\, I.$$
Let A(t) = (1 − t)A + tB, 0 ≤ t ≤ 1. If A and B are bounded below by a, then so is A(t). Hence, using the mean value theorem, the inequality (X.39), and the above observation, we have
$$||| f(A) - f(B) ||| \le \max_{0 \le t \le 1} ||| Df(A(t))(A'(t)) ||| \le \max_{0 \le t \le 1} \| f'(A(t)) \|\, ||| A'(t) ||| \le f'(a)\, ||| A - B |||.$$
•
A special corollary is the inequality
$$||| A^r - B^r ||| \le r\, a^{r-1}\, ||| A - B |||, \quad 0 < r \le 1, \tag{X.46}$$
valid for operators A, B such that A ≥ aI and B ≥ aI for some positive number a. Other inequalities of this type are discussed in the Problems section.

Let A = UP be the polar decomposition of an invertible matrix A. Then P = |A| and U = AP⁻¹. Using standard rules of differentiation, one can obtain from this expressions for the derivative of the map A ↦ U, and then obtain perturbation bounds for this map in the same way as was done for the map A ↦ |A|. There is, however, a more effective and simpler way of doing this. The advantage of this new method, explained below, is that it also works for other decompositions like the QR decomposition, where explicit formulae for the two factors are not known. For this added power there is a small cost to be paid: the slightly more sophisticated notion of differentiation on a manifold of matrices has to be used. We have already used similar ideas in Chapter 6.

In the space M(n) of n × n matrices, let GL(n) be the set of all invertible matrices, U(n) the set of all unitary matrices, and P(n) the set of all positive (definite) matrices. All three are differentiable manifolds. The set GL(n) is an open subset of M(n), and hence the tangent space to GL(n) at each of its points is the space M(n). The tangent space to U(n) at the point I, written as T_I U(n), is the space K(n) of all skew-Hermitian matrices. This has been explained in Section VI.4. The tangent space at any other point U of U(n) is T_U U(n) = U·K(n) = { US : S ∈ K(n) }. Let H(n) be the space of all Hermitian matrices. Both H(n) and K(n) are real vector spaces, and H(n) = iK(n). The set P(n) is an open subset of H(n), and hence the tangent space to P(n) at each of its points is H(n).

The polar decomposition gives a differentiable map Φ from GL(n) onto U(n) × P(n). This is the map Φ(A) = (Φ₁(A), Φ₂(A)) = (U, P), where the invertible matrix A has the polar decomposition A = UP. Earlier in this section we called Φ₂ the map φ. Let Ψ be the inverse of the map Φ; i.e., Ψ(U, P) = UP. This is a much simpler object to handle, since it is just a product map. We can calculate the derivative of this map and then use the inverse function theorem to get the derivative of the map Φ.

Theorem X.3.9 Let Φ₁ be the map from GL(n) into U(n) that takes an invertible matrix to the unitary part in its polar decomposition.
Then the derivative of Φ₁ at A = UP, evaluated at the point UX, is given by the formula
$$[ D\Phi_1(UP) ](UX) = 2U \int_0^\infty e^{-tP}\, (i\, \mathrm{Im}\, X)\, e^{-tP}\, dt. \tag{X.47}$$
Proof. The domain of the linear map DΨ(U, P) is the tangent space to the manifold U(n) × P(n) at the point (U, P). This space is (U·K(n), H(n)). The range of DΨ(U, P) is the tangent space to GL(n) at UP. This space is M(n). We will use the decomposition M(n) = U·K(n) + U·H(n) that arises from the Cartesian decomposition. By the definition of the derivative, we have
$$[ D\Psi(U, P) ](US, H) = \frac{d}{dt}\Big|_{t=0} \Psi( U e^{tS}, P + tH ) = \frac{d}{dt}\Big|_{t=0} U e^{tS} (P + tH) = USP + UH$$
for all S ∈ K(n) and H ∈ H(n). The derivative DΦ(UP) is the inverse of this map. Suppose
$$[ D\Phi(UP) ](UX) = (UM, N),$$
where M ∈ K(n) and N ∈ H(n). Since Φ and Ψ are inverse to each other, from the two equations above we see that
$$UX = [ D\Psi(U, P) ](UM, N) = UMP + UN,$$
and hence
$$X = MP + N.$$
Our task now is to find M from this equation. Note that M is skew-Hermitian and N Hermitian. Hence, from the above equation, we obtain
$$MP + PM = X - X^* = 2i\, \mathrm{Im}\, X.$$
This equation was studied in Chapter 7. From Theorem VII.2.3 we have its solution
$$M = 2 \int_0^\infty e^{-tP}\, (i\, \mathrm{Im}\, X)\, e^{-tP}\, dt.$$
This gives us the expression (X.47).
•
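Formula (X.47) can be checked against a central difference of the unitary polar factor. Instead of evaluating the integral, the sketch below solves MP + PM = 2i Im X in the eigenbasis of P, which is equivalent by Theorem VII.2.3 (the helper polar_u and all other names are ours):

```python
import numpy as np

def polar_u(A):
    # unitary factor of the polar decomposition A = UP, via the SVD
    W, s, Vh = np.linalg.svd(A)
    return W @ Vh

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
Y = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

U = polar_u(A)
P = U.conj().T @ A                      # positive factor
X = U.conj().T @ Y                      # write the direction as Y = U X
w, V = np.linalg.eigh((P + P.conj().T) / 2)
C = V.conj().T @ (X - X.conj().T) @ V   # 2i Im X in the eigenbasis of P
M = V @ (C / (w[:, None] + w[None, :])) @ V.conj().T   # solves MP + PM = 2i Im X
deriv = U @ M                           # the right-hand side of (X.47)

h = 1e-6
fd = (polar_u(A + h * Y) - polar_u(A - h * Y)) / (2 * h)
ok = np.linalg.norm(fd - deriv) / np.linalg.norm(deriv) < 1e-5
print(ok)
```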
Corollary X.3.10 For every unitarily invariant norm we have
$$||| D\Phi_1(UP) ||| = \| P^{-1} \|. \tag{X.48}$$
Proof. Using (X.47) and properties of unitarily invariant norms, we see that
$$||| D\Phi_1(UP)(UX) ||| \le 2 \int_0^\infty \| e^{-tP} \|\, ||| X |||\, \| e^{-tP} \|\, dt.$$
If P has eigenvalues a₁ ≥ ⋯ ≥ a_n, then ‖e^{−tP}‖ = e^{−t a_n}. So,
$$||| D\Phi_1(UP)(UX) ||| \le 2 \int_0^\infty e^{-2t a_n}\, ||| X |||\, dt = \frac{1}{a_n}\, ||| X ||| = \| P^{-1} \|\, ||| X |||.$$
Hence,
$$||| D\Phi_1(UP) ||| = \sup_{||| X ||| = 1} ||| D\Phi_1(UP)(UX) ||| \le \| P^{-1} \|.$$
The choice X = i vv*/|||vv*|||, where v is an eigenvector of P belonging to the eigenvalue a_n, shows that the last inequality is in fact an equality. •

Two corollaries follow; the first one is obtained using the mean value theorem and the second one using Taylor's Theorem.
Corollary X.3.11 Let A₀, A₁ be two elements of GL(n), and let U₀, U₁ be the unitary factors in their polar decompositions. Suppose that the line segment A(t) = (1 − t)A₀ + tA₁, 0 ≤ t ≤ 1, lies inside GL(n). Then, for every unitarily invariant norm,
$$||| U_0 - U_1 ||| \le \max_{0 \le t \le 1} \| A(t)^{-1} \| \cdot ||| A_0 - A_1 |||. \tag{X.49}$$
Corollary X.3.12 Let A₀ be an invertible matrix with polar decomposition A₀ = U₀P₀. Then, for a matrix A = UP in a neighbourhood of A₀, we have
$$||| U_0 - U ||| \le \| A_0^{-1} \|\, ||| A_0 - A ||| + O\big( ||| A_0 - A |||^2 \big). \tag{X.50}$$
Exercise X.3.13 From the proof of Theorem X.3.9 one can also extract a bound for the derivative of the map A ↦ |A|. What does this give? Compare it with the result of Theorem X.3.3.
Let us see now how this method works for a perturbation analysis of the QR decomposition. Let Δ⁺(n) be the set of all upper triangular matrices with positive diagonal entries. Each element A of GL(n) has a unique factoring A = QR, where Q ∈ U(n) and R ∈ Δ⁺(n). Thus the QR decomposition gives rise to an invertible map Φ from GL(n) onto U(n) × Δ⁺(n). Let Δ_re(n) be the set of all upper triangular matrices with real diagonal entries. This is a real vector space, and Δ⁺(n) is an open set in it. Thus the tangent space to Δ⁺(n), at any of its points, is Δ_re(n). For each A = QR in GL(n),
the derivative DΦ(A) is a linear map from the vector space M(n) onto the vector space (Q·K(n), Δ_re(n)). We want to calculate the norm of this. First note that the spaces K(n) and Δ_re(n) are complementary to each other in M(n). We have a vector space decomposition
$$M(n) = K(n) + \Delta_{re}(n). \tag{X.51}$$
Every matrix X splits as X = K + T in this decomposition; the entries of X, K and T are related as follows:
$$\begin{aligned} k_{jj} &= i\, \mathrm{Im}\, x_{jj} && \text{for all } j, \\ k_{ij} &= -\bar{x}_{ji} && \text{for } j > i, \\ k_{ij} &= x_{ij} && \text{for } i > j, \\ t_{jj} &= \mathrm{Re}\, x_{jj} && \text{for all } j, \\ t_{ij} &= x_{ij} + \bar{x}_{ji} && \text{for } j > i, \\ t_{ij} &= 0 && \text{for } i > j. \end{aligned} \tag{X.52}$$
Exercise X.3.14 Let P₁ and P₂ be the complementary projection operators in M(n) corresponding to the decomposition (X.51). Show that
$$\| P_1 \|_2 = \| P_2 \|_2 = \sqrt{2},$$
where
$$\| P_1 \|_2 = \sup_{\| X \|_2 = 1} \| P_1 X \|_2,$$
and ‖·‖₂ stands for the Frobenius (Hilbert-Schmidt) norm.
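The splitting (X.52) and the bound √2 in the exercise are easy to confirm numerically (a sketch; the helper split is ours):

```python
import numpy as np

def split(X):
    # X = K + T per (X.52): K skew-Hermitian, T upper triangular, real diagonal
    n = X.shape[0]
    K = np.zeros_like(X)
    iL = np.tril_indices(n, -1)
    K[iL] = X[iL]                   # below the diagonal, K agrees with X
    K -= K.conj().T                 # above the diagonal: k_ij = -conj(x_ji)
    K += 1j * np.diag(np.imag(np.diag(X)))   # k_jj = i Im x_jj
    return K, X - K

rng = np.random.default_rng(7)
n, ok = 6, True
for _ in range(200):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    K, T = split(X)
    ok &= np.allclose(K, -K.conj().T)            # K is skew-Hermitian
    ok &= np.allclose(np.tril(T, -1), 0)         # T is upper triangular
    ok &= np.allclose(np.imag(np.diag(T)), 0)    # with real diagonal
    ok &= np.linalg.norm(K) <= np.sqrt(2) * np.linalg.norm(X) + 1e-12
    ok &= np.linalg.norm(T) <= np.sqrt(2) * np.linalg.norm(X) + 1e-12
print(ok)
```

A strictly lower triangular X shows that the constant √2 is attained for P₁.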
Now let Ψ be the map from U(n) × Δ⁺(n) onto GL(n) defined as Ψ(Q, R) = QR. Then Ψ and Φ are inverse to each other. The derivative DΨ(Q, R) is a linear map whose domain is the tangent space to the manifold U(n) × Δ⁺(n) at the point (Q, R). This space is (Q·K(n), Δ_re(n)). Its range is the space M(n) = Q·K(n) + Q·Δ_re(n). By the definition of the derivative, we have
$$[ D\Psi(Q, R) ](QK, T) = \frac{d}{dt}\Big|_{t=0} \Psi( Q e^{tK}, R + tT ) = \frac{d}{dt}\Big|_{t=0} Q e^{tK} (R + tT) = QKR + QT$$
for all K ∈ K(n) and T ∈ Δ_re(n). The derivative DΦ(QR) is a linear map from M(n) onto (Q·K(n), Δ_re(n)). Suppose
$$[ D\Phi(QR) ](QX) = (QM, N),$$
where M ∈ K(n) and N ∈ Δ_re(n). Then we must have
$$QX = [ D\Phi(QR) ]^{-1}(QM, N) = [ D\Psi(Q, R) ](QM, N) = QMR + QN.$$
Hence
$$X = MR + N.$$
So, we have the same kind of equation as we had in the analysis of the polar decomposition. There is one vital difference, however. There, the matrices M, N were skew-Hermitian and Hermitian, respectively, and instead of the upper triangular factor R we had the positive factor P. So, taking adjoints, we could eliminate N and get another equation that we could solve explicitly. We cannot do that here. But there is another way out. We have from the above equation
$$X R^{-1} = M + N R^{-1}. \tag{X.53}$$
Here M ∈ K(n); and both N and R⁻¹ are in Δ_re(n), and hence so is their product NR⁻¹. Thus the equation (X.53) is nothing but the decomposition of XR⁻¹ with respect to the vector space decomposition (X.51). In this way, we now know M and N explicitly. We thus have the following theorem.
Theorem X.3.15 Let Φ₁, Φ₂ be the maps from GL(n) into U(n) and Δ⁺(n) that take an invertible matrix to the unitary and the upper triangular factors in its QR decomposition. Then for each X ∈ M(n), the derivatives DΦ₁(QR) and DΦ₂(QR), evaluated at the point QX, are given by the formulae
$$[ D\Phi_1(QR) ](QX) = Q\, P_1( X R^{-1} ), \qquad [ D\Phi_2(QR) ](QX) = P_2( X R^{-1} )\, R,$$
where P₁ and P₂ are the complementary projection operators in M(n) corresponding to the decomposition (X.51).

Using the result of Exercise X.3.14, we obtain the first corollary below. The next two corollaries are then obtained using the mean value theorem and Taylor's Theorem.
Corollary X.3.16 Let Φ₁, Φ₂ be the maps that take an invertible matrix A to the Q and R factors in its QR decomposition. Then
$$\| D\Phi_1(A) \|_2 \le \sqrt{2}\, \| A^{-1} \|, \qquad \| D\Phi_2(A) \|_2 \le \sqrt{2}\, \mathrm{cond}(A) = \sqrt{2}\, \| A \|\, \| A^{-1} \|.$$
Corollary X.3.17 Let A₀, A₁ be two elements of GL(n) with their respective QR decompositions A₀ = Q₀R₀ and A₁ = Q₁R₁. Suppose that the line segment A(t) = (1 − t)A₀ + tA₁, 0 ≤ t ≤ 1, lies in GL(n). Then
$$\| Q_0 - Q_1 \|_2 \le \sqrt{2} \max_{0 \le t \le 1} \| A(t)^{-1} \|\; \| A_0 - A_1 \|_2,$$
$$\| R_0 - R_1 \|_2 \le \sqrt{2} \max_{0 \le t \le 1} \mathrm{cond}(A(t))\; \| A_0 - A_1 \|_2.$$
Corollary X.3.18 Let A₀ = Q₀R₀ be an invertible matrix. Then for every matrix A = QR close to A₀,
$$\| Q_0 - Q \|_2 \le \sqrt{2}\, \| A_0^{-1} \|\, \| A_0 - A \|_2 + O\big( \| A_0 - A \|_2^2 \big),$$
$$\| R_0 - R \|_2 \le \sqrt{2}\, \mathrm{cond}(A_0)\, \| A_0 - A \|_2 + O\big( \| A_0 - A \|_2^2 \big).$$
For most other unitarily invariant norms, the norms of the projections P₁ and P₂ onto the two summands in (X.51) are not as easy to calculate. Thus this method does not lead to attractive bounds for those norms in the case of the QR decomposition.
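The formula of Theorem X.3.15 for DΦ₁ can be checked against a central difference of the Q factor. The sketch below works with real matrices, so K(n) consists of the skew-symmetric matrices; qr_pos renormalizes numpy's QR factorization so that R has positive diagonal, and p1 is the projection P₁ of (X.51) (all names are ours):

```python
import numpy as np

def qr_pos(A):
    # QR factorization normalized so that R has positive diagonal
    Q, R = np.linalg.qr(A)
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0
    return Q * s, (R.T * s).T

def p1(X):
    # projection onto skew-symmetric matrices along real upper triangular ones
    K = np.tril(X, -1)
    return K - K.T

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 5))
Y = rng.standard_normal((5, 5))

Q, R = qr_pos(A)
# [D Phi_1(QR)](Y) with Y = Q X, i.e. X = Q^T Y
deriv = Q @ p1(Q.T @ Y @ np.linalg.inv(R))

h = 1e-6
Qp, _ = qr_pos(A + h * Y)
Qm, _ = qr_pos(A - h * Y)
fd = (Qp - Qm) / (2 * h)
ok = np.linalg.norm(fd - deriv) / np.linalg.norm(deriv) < 1e-5
print(ok)
```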
X.4 Appendix: Differential Calculus

We will review very quickly some basic concepts of the Fréchet differential calculus, with special emphasis on matrix analysis. No proofs are given. Let X, Y be real Banach spaces, and let L(X, Y) be the space of bounded linear operators from X to Y. Let U be an open subset of X. A continuous map f from U to Y is said to be differentiable at a point u of U if there exists T ∈ L(X, Y) such that
$$\lim_{v \to 0} \frac{ \| f(u + v) - f(u) - Tv \| }{ \| v \| } = 0. \tag{X.54}$$
It is easy to see that such a T, if it exists, is unique. If f is differentiable at u, the operator T above is called the derivative of f at u. We will use for it the notation Df(u). This is sometimes called the Fréchet derivative. If f is differentiable at every point of U, we say that it is differentiable on U. One can see that, if f is differentiable at u, then for every v ∈ X,
$$Df(u)(v) = \frac{d}{dt}\Big|_{t=0} f(u + tv). \tag{X.55}$$
This is also called the directional derivative of f at u in the direction v. The reader will recall from elementary calculus of functions of two variables that the existence of directional derivatives in all directions does not ensure differentiability. Some illustrative examples are given below.

Example X.4.1 (i) The constant function f(x) = c is differentiable at all points, and Df(x) = 0 for all x.
(ii) Every linear operator T is differentiable at all points, and is its own derivative; i.e., DT(u)(v) = Tv for all u, v in X.
(iii) Let X, Y, Z be real Banach spaces, and let B : X × Y → Z be a bounded bilinear map. Then B is differentiable at every point, and its derivative DB(u, v) is given as
$$DB(u, v)(x, y) = B(x, v) + B(u, y).$$
(iv) Let X be a real Hilbert space with inner product ⟨·,·⟩, and let f(u) = ‖u‖² = ⟨u, u⟩. Then f is differentiable at every point, and
$$Df(u)(v) = 2 \langle u, v \rangle.$$
The next set of examples is especially important for us.
Example X.4.2 In these examples X = Y = L(H).
(i) Let f(A) = A^2. Then

    Df(A)(B) = AB + BA.

(ii) More generally, let f(A) = A^n, n ≥ 2. From the binomial expansion of (A + B)^n one can see that

    Df(A)(B) = Σ_{j+k=n−1, j,k≥0} A^j B A^k.

(iii) Let f(A) = A^{-1} for each invertible A. Then Df(A)(B) = −A^{-1} B A^{-1}.
(iv) Let f(A) = A*A. Then Df(A)(B) = A*B + B*A.
(v) Let f(A) = e^A. Use the formula

    e^{A+B} − e^A = ∫_0^1 e^{(1−t)A} B e^{t(A+B)} dt

(called the Dyson expansion) to show that

    Df(A)(B) = ∫_0^1 e^{(1−t)A} B e^{tA} dt.
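These formulas lend themselves to numerical verification. A sketch (numpy and scipy assumed available; the random test matrices, the crude trapezoidal rule for the Dyson integral, and the tolerances are all ad hoc choices):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = 0.5 * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

def dirder(f, A, B, t=1e-6):
    # central-difference approximation to Df(A)(B), via formula (X.55)
    return (f(A + t * B) - f(A - t * B)) / (2 * t)

# (i) f(A) = A^2 : Df(A)(B) = AB + BA
err_sq = np.linalg.norm(dirder(lambda X: X @ X, A, B) - (A @ B + B @ A))

# (iii) f(A) = A^{-1} : Df(A)(B) = -A^{-1} B A^{-1}
C = A + 5 * np.eye(4)                      # comfortably invertible
Cinv = np.linalg.inv(C)
err_inv = np.linalg.norm(dirder(np.linalg.inv, C, B) - (-Cinv @ B @ Cinv))

# (v) f(A) = e^A : Df(A)(B) = integral_0^1 e^{(1-t)A} B e^{tA} dt (Dyson)
m = 200                                    # trapezoidal rule on [0, 1]
dyson = np.zeros_like(A)
for k in range(m + 1):
    s = k / m
    w = 0.5 / m if k in (0, m) else 1.0 / m
    dyson += w * expm((1 - s) * A) @ B @ expm(s * A)
err_exp = np.linalg.norm(dirder(expm, A, B) - dyson)
```

All three errors should be small; only the last carries the quadrature error of the trapezoidal rule.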
The usual rules of differentiation are valid: if f_1, f_2 are two differentiable maps, then f_1 + f_2 is differentiable and

    D(f_1 + f_2)(u) = Df_1(u) + Df_2(u).

The composite of two differentiable maps f and g is differentiable and we have the chain rule

    D(g ∘ f)(u) = Dg(f(u)) · Df(u).

In the special situation when g is linear, this reduces to

    D(g ∘ f)(u) = g · Df(u).
One important rule of differentiation for real functions is the product rule: (fg)′ = f′g + fg′. If f and g are two maps with values in a Banach space, their product is not defined, unless the range is an algebra as well. Still, a general product rule can be established. Let f, g be two differentiable maps from X into Y_1, Y_2, respectively. Let B be a continuous bilinear map from Y_1 × Y_2 into Z. Let φ be the map from X to Z defined as φ(x) = B(f(x), g(x)). Then for all u, v in X

    Dφ(u)(v) = B(Df(u)(v), g(u)) + B(f(u), Dg(u)(v)).

This is the product rule for differentiation. A special case of this arises when Y_1 = Y_2 = L(Y), the algebra of bounded operators in a Banach space Y. Now φ(x) = f(x)g(x) is the usual product of two operators. The product rule then is

    Dφ(u)(v) = [Df(u)(v)] · g(u) + f(u) · [Dg(u)(v)].

Exercise X.4.3
(i) Let f be the map A ↦ A^{-1} on GL(n). Use the product rule to show that

    Df(A)(B) = −A^{-1} B A^{-1}.

This can also be proved directly.
(ii) Let f(A) = A^{-2}. Show that

    Df(A)(B) = −A^{-1} B A^{-2} − A^{-2} B A^{-1}.

(iii) Obtain a formula for the derivative of the map f(A) = A^{-n}, n = 3, 4, ....
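The product rule can be sanity-checked on Exercise X.4.3(i): taking f(A) = A^2 and g(A) = A^{-1}, the product φ(A) = A^2 A^{-1} = A, so the rule must produce Dφ(A)(B) = B. A sketch (numpy assumed; the shift by 4I is only there to keep the random A safely invertible):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # comfortably invertible
B = rng.standard_normal((4, 4))
Ainv = np.linalg.inv(A)

Df = A @ B + B @ A            # derivative of f(A) = A^2 at A, direction B
Dg = -Ainv @ B @ Ainv         # derivative of g(A) = A^{-1} (Exercise X.4.3(i))

# product rule for phi(A) = f(A) g(A) = A; hence Dphi(A)(B) must equal B
Dphi = Df @ Ainv + (A @ A) @ Dg
err = np.linalg.norm(Dphi - B)
```

Algebraically, (AB + BA)A^{-1} − A^2 A^{-1} B A^{-1} = B, so `err` is at rounding level.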
Perhaps the most useful theorem of calculus is the Mean Value Theorem.
Theorem X.4.4 (The Mean Value Theorem) Let f be a differentiable map from an interval I of the real line into a Banach space X. Then for each closed interval [a, b] contained in I,

    ||f(b) − f(a)|| ≤ |b − a| · sup_{a≤t≤b} ||Df(t)||.

This is the version we have used often in the book, with [a, b] = [0, 1]. There is a more general statement:
Theorem X.4.5 (The Mean Value Theorem) Let f be a differentiable map from a convex subset U of a Banach space X into the Banach space Y. Let a, b ∈ U and let L be the line segment joining them. Then

    ||f(b) − f(a)|| ≤ ||b − a|| · sup_{u∈L} ||Df(u)||.
(Note that there are three different norms that occur in the above inequality. These are the norms of the spaces Y, X, and L(X, Y), respectively.)

Higher order Fréchet derivatives can be identified with multilinear maps. This is explained below.

Let f be a differentiable map from X to Y. At each point u, the derivative Df(u) is an element of the Banach space L(X, Y). Thus we have a map Df from X into L(X, Y), defined as Df : u ↦ Df(u). If this map is differentiable at a point u, we say that f is twice differentiable at u. The derivative of the map Df at the point u is called the second derivative of f at u. It is denoted as D^2 f(u). This is an element of the space L(X, L(X, Y)). This space is isomorphic to another Banach space, which is easier to handle.

Let L_2(X, Y) be the space of bounded bilinear maps from X × X into Y. The elements of this space are maps f from X × X into Y that are linear in both variables, and for which there exists a constant c such that

    ||f(x_1, x_2)|| ≤ c ||x_1|| ||x_2||    for all x_1, x_2 ∈ X.

The infimum of all such c is called ||f||. This is a norm on the space L_2(X, Y), and the space is a Banach space with this norm. If φ is an element of L(X, L(X, Y)), let

    φ̃(x_1, x_2) = [φ(x_1)](x_2)    for x_1, x_2 ∈ X.

Then φ̃ ∈ L_2(X, Y). It is easy to see that the map φ ↦ φ̃ is an isometric isomorphism. Thus the second derivative of a twice differentiable map f from X to Y can be thought of as a bilinear map from X × X to Y. It is easy to see that this map is symmetric in the two variables; i.e.,

    D^2 f(u)(v_1, v_2) = D^2 f(u)(v_2, v_1)
for all u, v_1, v_2. (This symmetry property is extremely helpful in guessing the expression for the second derivative of a given map.) Some examples on the space of matrices are given below.

Example X.4.6 Let X = M(n) and let f(A) = A^2, A ∈ M(n). We have seen that Df(A)(B) = AB + BA for all A, B. Note that Df(A) = L_A + R_A, where L_A and R_A are linear operators on M(n), the first one left multiplication by A and the second one right multiplication by A. The map Df : A ↦ Df(A) is a linear map from M(n) into L(M(n)). So the derivative of this map, at each point, is the map itself. Thus for each A, D^2 f(A) = Df. In other words,

    [D^2 f(A)](B) = Df(B) = L_B + R_B.

If we think of D^2 f(A) as a linear map from M(n) into L(M(n)), we have

    [D^2 f(A)(B_1)](B_2) = (L_{B_1} + R_{B_1})(B_2) = B_1 B_2 + B_2 B_1

for all B_1, B_2. If we think of it as a bilinear map, we have

    [D^2 f(A)](B_1, B_2) = B_1 B_2 + B_2 B_1.

Note that the right-hand side is independent of A. So the map A ↦ D^2 f(A) is a constant map. These are noncommutative analogues of the facts that if f(x) = x^2, then f′(x) = 2x and f″(x) = 2.
Example X.4.7 Let f(A) = A^3. We have seen that

    Df(A)(B) = A^2 B + ABA + BA^2.

This is the noncommutative analogue of the fact that if f(x) = x^3, then f′(x) = 3x^2. What is the second derivative? From the formula f″(x) = 6x, and the fact that D^2 f(A) is a symmetric bilinear map, we can guess that

    [D^2 f(A)](B_1, B_2) = A B_1 B_2 + A B_2 B_1 + B_1 A B_2 + B_2 A B_1 + B_1 B_2 A + B_2 B_1 A.

Prove that this indeed is the right formula for D^2 f(A). Note that the map A ↦ D^2 f(A) is linear.
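One way to gain confidence in the guessed formula (short of the requested proof) is a numerical check with a symmetric mixed difference; since f(A) = A^3 is a cubic polynomial, the difference quotient below is exact up to rounding. (A sketch with numpy; the matrices are random choices.)

```python
import numpy as np

rng = np.random.default_rng(3)
A, B1, B2 = (rng.standard_normal((4, 4)) for _ in range(3))
f = lambda X: X @ X @ X

# the guessed symmetric bilinear expression for [D^2 f(A)](B1, B2)
guess = (A @ B1 @ B2 + A @ B2 @ B1 + B1 @ A @ B2
         + B2 @ A @ B1 + B1 @ B2 @ A + B2 @ B1 @ A)

# symmetric mixed second difference approximating [D^2 f(A)](B1, B2)
t = 1e-3
num = (f(A + t*B1 + t*B2) - f(A + t*B1 - t*B2)
       - f(A - t*B1 + t*B2) + f(A - t*B1 - t*B2)) / (4 * t**2)

err = np.linalg.norm(num - guess)
```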
Example X.4.8 More generally, let f(A) = A^n. From the binomial theorem one can see that

    [D^2 f(A)](B_1, B_2) = Σ_{j+k+ℓ=n−2, j,k,ℓ≥0} [A^j B_1 A^k B_2 A^ℓ + A^j B_2 A^k B_1 A^ℓ].
Example X.4.9 Let f(A) = A^{-1}, A ∈ GL(n). We know that Df(A)(B) = −A^{-1} B A^{-1} for all B ∈ M(n). This is the noncommutative analogue of the formula (x^{-1})′ = −x^{-2}. The analogue of the formula (x^{-1})″ = 2x^{-3} is the following:

    [D^2 f(A)](B_1, B_2) = A^{-1} B_1 A^{-1} B_2 A^{-1} + A^{-1} B_2 A^{-1} B_1 A^{-1}.

This can be guessed from the bilinearity and symmetry properties that D^2 f(A) must have. It can be proved formally by the rules of differentiation.
Example X.4.10 Let f(A) = A^{-2}, A ∈ GL(n). We know that Df(A)(B) = −A^{-1} B A^{-2} − A^{-2} B A^{-1}. Show that

    [D^2 f(A)](B_1, B_2) = A^{-2} B_1 A^{-1} B_2 A^{-1} + A^{-2} B_2 A^{-1} B_1 A^{-1}
        + A^{-1} B_1 A^{-2} B_2 A^{-1} + A^{-1} B_2 A^{-2} B_1 A^{-1}
        + A^{-1} B_1 A^{-1} B_2 A^{-2} + A^{-1} B_2 A^{-1} B_1 A^{-2}.

This is the analogue of the formula (x^{-2})″ = 6x^{-4}.
Example X.4.11 Let f(A) = A*A. We have seen that Df(A)(B) = A*B + B*A. Show that

    D^2 f(A)(B_1, B_2) = B_1* B_2 + B_2* B_1.

Note that this expression does not involve A. So the map A ↦ D^2 f(A) is a constant map.
Derivatives of higher order can be defined by repeating the above procedure. The pth derivative of a map f from X to Y can be identified with a p-linear map from the space X × X × ⋯ × X (p copies) into Y. A convenient method of calculating the pth derivative of f is provided by the formula

    [D^p f(u)](v_1, ..., v_p) = (∂^p/∂t_1 ⋯ ∂t_p)|_{t_1=⋯=t_p=0} f(u + t_1 v_1 + ⋯ + t_p v_p).

Compare this with the formula (X.55) for the first derivative.
Example X.4.12 Let f(A) = A^n, A ∈ L(H). Then for p = 1, 2, ..., n,

    [D^p f(A)](B_1, ..., B_p) = Σ_{σ∈S_p} Σ_{j_1+⋯+j_{p+1}=n−p, j_k≥0} A^{j_1} B_{σ(1)} A^{j_2} B_{σ(2)} ⋯ A^{j_p} B_{σ(p)} A^{j_{p+1}},

where S_p is the set of all permutations on p symbols. There are n!/(n−p)! terms in the above double sum. These are all words of length n in which n − p of the letters are A and the remaining letters are B_1, ..., B_p, each occurring exactly once. Notice that this expression is linear and symmetric in each of the variables B_1, ..., B_p. When dim H = 1, this reduces to the formula for the pth derivative of the function f(x) = x^n:

    f^{(p)}(x) = n(n−1)⋯(n−p+1) x^{n−p} = (n!/(n−p)!) x^{n−p}.
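The double sum can be checked by brute force for small n and p: enumerate every word of length n containing n − p letters A and each B_i exactly once, count them, and compare their sum against a mixed difference quotient. A sketch (numpy assumed; n = 4, p = 2):

```python
import numpy as np
from itertools import permutations
from math import factorial

rng = np.random.default_rng(4)
n, p = 4, 2
A, B1, B2 = (rng.standard_normal((3, 3)) for _ in range(3))
Bs = [B1, B2]

# all words of length n with n-p letters A and B1,...,Bp each once
terms = []
for pos in permutations(range(n), p):     # ordered positions for B1,...,Bp
    letters = [A] * n
    for b, i in zip(Bs, pos):
        letters[i] = b
    word = np.eye(3)
    for M in letters:
        word = word @ M
    terms.append(word)

count = len(terms)                        # should equal n!/(n-p)!
deriv = sum(terms)                        # [D^p f(A)](B1, B2) for f(A) = A^n

# compare with a symmetric mixed difference of f(A) = A^n
f = lambda X: np.linalg.matrix_power(X, n)
t = 1e-3
num = (f(A + t*B1 + t*B2) - f(A + t*B1 - t*B2)
       - f(A - t*B1 + t*B2) + f(A - t*B1 - t*B2)) / (4 * t**2)
err = np.linalg.norm(num - deriv)
```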
The reader should work out some more simple examples to see the expressions for higher derivatives.

Another important theorem of calculus, Taylor's Theorem, has an analogue in the Fréchet calculus. Of the different versions possible, the one that is most useful for us is given below. Let f be a (p+1)-times differentiable map from a Banach space X into a Banach space Y. For h ∈ X, write [h]^m to mean the m-tuple (h, h, ..., h). Then, for all x ∈ X and for small h,

    || f(x + h) − f(x) − Σ_{m=1}^{p} (1/m!) D^m f(x)([h]^m) || = O(||h||^{p+1}).

From this we get

    || f(x + h) − f(x) || ≤ Σ_{m=1}^{p} (1/m!) ||D^m f(x)|| ||h||^m + O(||h||^{p+1}).
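For a concrete illustration, take f(A) = A^3 and p = 2; then the Taylor remainder is exactly h^3, so shrinking ||h|| by a factor of 10 should shrink the remainder by a factor of 10^3. A sketch (numpy assumed, random matrices):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
H = rng.standard_normal((4, 4))

def remainder(s):
    # || f(A+h) - f(A) - Df(A)(h) - (1/2) D^2 f(A)([h]^2) ||  for f(A) = A^3
    h = s * H
    d1 = A @ A @ h + A @ h @ A + h @ A @ A
    d2_half = A @ h @ h + h @ A @ h + h @ h @ A
    fAh = np.linalg.matrix_power(A + h, 3)
    return np.linalg.norm(fAh - np.linalg.matrix_power(A, 3) - d1 - d2_half)

r1, r2 = remainder(1e-2), remainder(1e-3)
ratio = r1 / r2      # close to 10^3, confirming O(||h||^{p+1}) with p = 2
```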
Finally, let us write down formulae for higher order derivatives of the composite of two maps. These are already quite intricate for real functions. If φ(x) = f(g(x)), then we have

    φ′(x) = f′(g(x)) g′(x),
    φ″(x) = f″(g(x)) [g′(x)]^2 + f′(g(x)) g″(x),
    φ‴(x) = f‴(g(x)) [g′(x)]^3 + 3 f″(g(x)) g′(x) g″(x) + f′(g(x)) g‴(x).
If X, Y, Z are Banach spaces, g is a map from X to Y, and f is a map from Y to Z, then for the derivatives of the composite map φ = f ∘ g we have the following formulae. By the chain rule,

    Dφ(x) = Df(g(x)) Dg(x).

The second and the third derivatives are bilinear and trilinear maps, respectively. For them we have the formulae:

    [D^2 φ(x)](x_1, x_2) = [D^2 f(g(x))](Dg(x)(x_1), Dg(x)(x_2)) + Df(g(x))([D^2 g(x)](x_1, x_2)),

    [D^3 φ(x)](x_1, x_2, x_3) = [D^3 f(g(x))](Dg(x)(x_1), Dg(x)(x_2), Dg(x)(x_3))
        + [D^2 f(g(x))](Dg(x)(x_1), [D^2 g(x)](x_2, x_3))
        + [D^2 f(g(x))](Dg(x)(x_2), [D^2 g(x)](x_1, x_3))
        + [D^2 f(g(x))](Dg(x)(x_3), [D^2 g(x)](x_1, x_2))
        + Df(g(x))([D^3 g(x)](x_1, x_2, x_3)).

The reader should convince himself that considerations of domains and ranges of the maps involved, symmetry in the variables, and the demand that in the case of real functions we should recover the old formulae lead to these general formulae. He can then try proving them.

We have also used the notion of the derivative of a map between manifolds. If X and Y are differentiable manifolds in finite-dimensional vector spaces, and f is a differentiable map from X to Y, then at a point u of X the derivative Df(u) is a linear map from the linear space T_u X into the linear space T_{f(u)} Y. These are the tangent spaces to X and Y at u and f(u), respectively. All manifolds we considered are subsets of M(n). Of these, GL(n), P(n), and Δ_+(n) are open subsets of vector subspaces of M(n). So these vector spaces are the tangent spaces for the manifolds. The only closed manifold we considered is U(n). It is easy to find the tangent space at any point of this manifold. This was done in Chapter 6. Most of the results of Fréchet calculus can be restated in this setup with appropriate modifications.
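The second-derivative formula can be exercised on a case where everything is computable by hand: take g(X) = X^2 and f(Y) = Y^2, so that φ(X) = X^4. A sketch (numpy assumed; the symmetric mixed difference is accurate enough here to compare against):

```python
import numpy as np

rng = np.random.default_rng(6)
X, B1, B2 = (rng.standard_normal((3, 3)) for _ in range(3))

Dsq = lambda Y, V: Y @ V + V @ Y      # derivative of the squaring map at Y
D2sq = lambda V, W: V @ W + W @ V     # its second derivative (constant in Y)

g = X @ X                              # phi = f o g with f, g both squaring
formula = D2sq(Dsq(X, B1), Dsq(X, B2)) + Dsq(g, D2sq(B1, B2))

phi = lambda Y: np.linalg.matrix_power(Y, 4)
t = 1e-3
num = (phi(X + t*B1 + t*B2) - phi(X + t*B1 - t*B2)
       - phi(X - t*B1 + t*B2) + phi(X - t*B1 - t*B2)) / (4 * t**2)
err = np.linalg.norm(num - formula)
```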
X.5 Problems
Problem X.5.1. Let f be a nonnegative operator monotone function on (0, ∞).

(i) Show that if A is positive and U unitary, then

    |||f(A)U − Uf(A)||| ≤ |||f(|AU − UA|)|||.

(ii) Let A be positive and X Hermitian. Let U be the Cayley transform of X. We have

    U = (X − i)(X + i)^{-1},    X = i(I + U)(I − U)^{-1} = 2i(I − U)^{-1} − iI.

Show that

    |||f(A)X − Xf(A)||| ≤ 2 ||(I − U)^{-1}||^2 |||f(|AU − UA|)|||.

Use the relation between U and X again to estimate the last factor, and show that

    |||f(A)X − Xf(A)||| ≤ ((1 + ||X||^2)/2) |||f(2|AX − XA|)|||.

(iii) Let A, B be positive and X arbitrary. Use the above inequality to show that

    |||f(A)X − Xf(B)||| ≤ ((1 + ||X||^2)/2) |||f(2|AX − XB|)|||.

[Hint: Use 2 × 2 block matrices.] When X = I, this reduces to the inequality (X.5).
Problem X.5.2. Let f be a nonnegative operator monotone function. Let A, B be positive matrices and let X be any contraction. Show that

    |||f(A)X − Xf(B)||| ≤ (5/4) |||f(|AX − XB|)|||.

[Hint: Use the result of the preceding problem, replacing X there by (1/2)X.]
Problem X.5.3. From the above inequality it follows that if A, B are positive and X is any matrix, then for 0 < r < 1,

    ||A^r X − X B^r|| ≤ (5/4) ||X||^{1−r} ||AX − XB||^r.

Show that we have under these conditions

    |||A^r X − X B^r||| ≤ (5/4) ||X||^{1−r} ||| |AX − XB|^r |||.

[Hint: Reduce the general case to the special case A = B. Use Hölder's inequality.]
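A numerical spot check of the first inequality for r = 1/2 (scipy assumed for the fractional matrix power; one random instance, so this illustrates rather than proves anything):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

rng = np.random.default_rng(7)
n, r = 4, 0.5
M1 = rng.standard_normal((n, n))
M2 = rng.standard_normal((n, n))
A = M1 @ M1.T + np.eye(n)          # random positive definite matrices
B = M2 @ M2.T + np.eye(n)
X = rng.standard_normal((n, n))

Ar = fractional_matrix_power(A, r).real
Br = fractional_matrix_power(B, r).real

lhs = np.linalg.norm(Ar @ X - X @ Br, 2)   # operator (spectral) norm
rhs = 1.25 * np.linalg.norm(X, 2)**(1 - r) * np.linalg.norm(A @ X - X @ B, 2)**r
ok = bool(lhs <= rhs)
```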
Problem X.5.4. Let A, B be any two operators. Show that

Problem X.5.5. Let A be a positive operator such that A ≥ aI > 0. Show that for every X,

    2a |||X||| ≤ |||AX + XA|||.

[Use the results in Section VII.2.] Use this to show that if A, B are positive operators such that A^{1/2} + B^{1/2} ≥ aI > 0, then

    |||A^{1/2} − B^{1/2}||| ≤ (1/a) |||A − B|||.

[Hint: Consider the operators A^{1/2} + B^{1/2} and A^{1/2} − B^{1/2}.]
Problem X.5.6. Let A and B be positive operators such that A ≥ aI > 0 and B ≥ bI > 0. Show that for every nonnegative operator monotone function f on (0, ∞),

    |||f(A) − f(B)||| ≤ C(a, b) |||A − B|||,

where C(a, b) = (f(a) − f(b))/(a − b) if a ≠ b, and C(a, b) = f′(a) if a = b.
Problem X.5.7. Let f be a real function on (0, ∞), and let f^{(n)} be its nth derivative. Let f also denote the map induced by f on positive operators. Let D^n f(A) be the nth order Fréchet derivative of this map at the point A. Let

    D^{(n)} = { f : ||D^n f(A)|| = ||f^{(n)}(A)|| for all positive A }.

We have seen that every operator monotone function is in the class D^{(1)}. Show that it is in D^{(n)} for all n = 1, 2, ....
Problem X.5.8. Several examples of functions that are not operator monotone but are in D^{(1)} are given below.

(i) Show that for each integer n, the function f(t) = t^n on (0, ∞) is in the class D^{(1)}.

(ii) Show that the function f(t) = a_0 + a_1 t + ⋯ + a_n t^n on (0, ∞), where n is any positive integer and the coefficients a_j are nonnegative, is in the class D^{(1)}.

(iii) Any function on (0, ∞) that has a power series expansion with nonnegative coefficients is in the class D^{(1)}.

(iv) Use the Dyson expansion to show that the exponential function is in the class D^{(1)}.

(v) Let

    f(t) = ∫_0^∞ e^{−λt} dμ(λ),

where μ is a positive measure on (0, ∞). Show that f ∈ D^{(1)}. [Use part (iv).]

(vi) From Euler's integral for the gamma function, we can write, for r > 0,

    t^{−r} = (1/Γ(r)) ∫_0^∞ e^{−λt} λ^{r−1} dλ.

Use this to show that for each r > 0, the function f(t) = t^{−r} on (0, ∞) is in D^{(1)}.
Problem X.5.9. The Cholesky decomposition of a positive definite matrix A is the (unique) factoring A = R*R, where R is an upper triangular matrix with positive diagonal entries. This gives an invertible map from the space P(n) onto the space Δ_+(n). Show that

for every A. Use this to write local perturbation bounds for this map.
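A sketch of the map in code (numpy assumed; numpy's `cholesky` returns the lower factor L with A = LL*, so R = Lᵀ recovers the book's A = R*R convention), together with a finite-difference look at its local sensitivity in one direction:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # positive definite

def chol_R(A):
    # the map A -> R, with A = R* R, R upper triangular, diag(R) > 0
    return np.linalg.cholesky(A).T

R = chol_R(A)
ok_factor = np.allclose(R.T @ R, A)
ok_shape = np.allclose(R, np.triu(R)) and bool(np.all(np.diag(R) > 0))

# finite-difference estimate of the derivative in one Hermitian direction E
E = rng.standard_normal((n, n)); E = (E + E.T) / 2
t = 1e-6
dR = (chol_R(A + t * E) - chol_R(A - t * E)) / (2 * t)
growth = np.linalg.norm(dR) / np.linalg.norm(E)   # one sample of the local sensitivity
```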
Problem X.5.10. A matrix is called strongly nonsingular if all of its leading principal minors are nonzero. Such matrices form a dense open set in the space M(n). Every strongly nonsingular matrix A can be factored as A = LR, where L is a lower triangular matrix and R an upper triangular matrix. Further, L can be chosen to have all of its diagonal entries equal to 1. With this restriction the factoring is unique. This is the LR decomposition familiar in linear algebra and numerical analysis. Let S be the set of strongly nonsingular matrices, Δ_1 the set of lower triangular matrices with unit diagonal, and Δ_ns the set of nonsingular upper triangular matrices. Let Λ_1 : S → Δ_1 and Λ_2 : S → Δ_ns be the maps defined by Λ_1(A) = L and Λ_2(A) = R. Follow the approach in Section X.3 to obtain the bounds

    ||DΛ_1(A)|| ≤ cond(L) ||R^{-1}||,    ||DΛ_2(A)|| ≤ cond(R) ||L^{-1}||.

Use these to obtain local perturbation bounds for the LR decomposition.
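A minimal Doolittle-style sketch of the LR factoring (numpy assumed; no pivoting, so it relies on strong nonsingularity exactly as the problem does):

```python
import numpy as np

def lr_decomposition(A):
    """A = L R with L unit lower triangular, R upper triangular.
    Plain Gaussian elimination without pivoting; valid when all
    leading principal minors of A are nonzero."""
    n = A.shape[0]
    L = np.eye(n)
    R = A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = R[i, k] / R[k, k]      # nonzero pivot by strong nonsingularity
            L[i, k] = m
            R[i, k:] -= m * R[k, k:]
    return L, np.triu(R)

rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # safely strongly nonsingular
L, R = lr_decomposition(A)
ok = (np.allclose(L @ R, A)
      and np.allclose(L, np.tril(L))
      and np.allclose(np.diag(L), 1.0))
```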
X.6 Notes and References
Most of the results in this chapter can be proved for infinite-dimensional Hilbert space operators. Many of them are valid for operator algebras as well.

Let f be a continuous real function on an interval that contains the spectra of two Hermitian operators A, B (on a Hilbert space H). The problem of finding bounds for ||f(A) − f(B)|| in terms of ||A − B|| has been investigated in great detail by many authors. Many deep results on this were obtained by the Russian school of Birman, which includes Farforovskaya, Naboko, Solomyak, and others. When f is differentiable and f′ is bounded, one would expect to find inequalities of the form ||f(A) − f(B)|| ≤ C ||f′||_∞ ||A − B||. Counterexamples to show that such inequalities are not true, in general, were constructed by Yu.B. Farforovskaya, An estimate of the norm ||f(B) − f(A)|| for selfadjoint operators A and B, Zap. Nauch. Sem. LOMI, 56 (1976) 143–162. (English translation: J. Soviet Math. 14, No. 2 (1980).) It was shown by M.Sh. Birman and M.Z. Solomyak that such inequalities can be found under stronger smoothness assumptions. The reader should see their paper titled Double Stieltjes operator integrals, English translation, in Topics in Mathematical Physics, Volume 1, Consultant Bureau, New York, 1967.

Theorem X.1.1 is taken from F. Kittaneh and H. Kosaki, Inequalities for the Schatten p-norm V, Publ. Res. Inst. Math. Sci., 23 (1987) 433–443. The inequality (X.3) was proved by M.Sh. Birman, L.S. Koplienko, and M.Z. Solomyak, Estimates of the spectrum of the difference between fractional powers of selfadjoint operators, Izvestiya Vysshikh Uchebnykh Zavedenii Mat., 19 (1975) 3–10. Its generalisation in Theorem X.1.3 is due to T. Ando, Comparison of norms |||f(A) − f(B)||| and |||f(|A − B|)|||, Math. Z., 197 (1988) 403–409. Our discussion of the material between Theorem X.1.3 and Exercise X.1.7 is taken from this paper.
For p-norms, the inequality (X.10) has another formulation: if A, B are positive, then

    ||A^{1/t} − B^{1/t}||_{tp} ≤ ||A − B||_p^{1/t}    for p ≥ 1, t ≥ 1.
The special case t = 2 of this was proved by R.T. Powers and E. Størmer, Free states of the canonical anticommutation relations, Commun. Math. Phys., 16 (1970) 1–33. The point of this formulation is that if A, B are positive Hilbert space operators and their difference A − B is in the Schatten class I_p, then A^{1/t} − B^{1/t} is in the class I_{tp}, and the above inequality is valid.

Theorem X.1.8 is due to D. Jocić and F. Kittaneh, Some perturbation inequalities for selfadjoint operators, J. Operator Theory, 31 (1994) 3–10. The proof of Lemma X.1.9 given here is due to R. Bhatia, A simple proof of an operator inequality of Jocić and Kittaneh, J. Operator Theory, 31 (1994) 21–22. As in the preceding paragraph, for the Schatten p-norms, the inequality (X.12) can be written as

    ||A − B||_{(2m+1)p} ≤ 2^{2m/(2m+1)} ||A^{2m+1} − B^{2m+1}||_p^{1/(2m+1)}

for m = 1, 2, ..., p ≥ 1, and Hermitian A, B. The result is valid in infinite-dimensional Hilbert spaces. A corollary of this is the statement that if the difference A^{2m+1} − B^{2m+1} is in the Schatten class I_p, then A − B is in the class I_{(2m+1)p}.
The first inequality in Problem X.5.3 was proved by G.K. Pedersen, A commutator inequality (unpublished note). The generalisation in Problem X.5.2, the inequalities in Problem X.5.1, and the second inequality in Problem X.5.3 are due to R. Bhatia and F. Kittaneh, Some inequalities for norms of commutators, SIAM J. Matrix Anal., 18 (1997) to appear. The motivation for Pedersen was a result of W.B. Arveson, Notes on extensions of C*-algebras, Duke Math. J., 44 (1977) 329–355. Let f be a continuous function on [0,1] with f(0) = 0, and let ε > 0. Arveson showed that there exists a δ > 0 such that if A and X are elements in the unit ball of a C*-algebra and A ≥ 0, then ||AX − XA|| < δ implies ||f(A)X − Xf(A)|| < ε. The inequality in Problem X.5.3 is a quantitative version of this for the special class of functions f(t) = t^r, 0 < r < 1. Weaker results proved earlier and their applications may be found in C.L. Olsen and G.K. Pedersen, Corona C*-algebras and their applications to lifting problems, Math. Scand., 64 (1989) 63–86. It is conjectured that the factor 5/4 occurring in these inequalities can be replaced by 1.

The inequality (X.20) for p = 1 was proved by H. Kosaki, On the continuity of the map φ ↦ |φ| from the predual of a W*-algebra, J. Funct. Anal., 59 (1984) 123–131. For the Schatten p-norms, p ≥ 2, the inequality (X.18) was proved by F. Kittaneh and H. Kosaki in their paper cited above. The other parts of Theorems X.2.3 and X.2.4, and Theorem X.2.1, were proved by R. Bhatia, Perturbation inequalities for the absolute value map in norm ideals of operators, J. Operator Theory, 19 (1988) 129–136.

The constant √n in (X.25) can be replaced by a factor c_n ≈ log n. This has been known for some time and is related to other important problems in operator theory. See two papers by A. McIntosh, Counterexample to a question on commutators, Proc. Amer. Math. Soc., 29 (1971) 337–340, and
Functions and derivations of C*-algebras, J. Funct. Anal., 30 (1978) 264–275. It is also known that such a factor is indeed necessary, both for the operator norm and for the corresponding inequality for the trace norm ||·||_1. This implies that, if H is infinite-dimensional, then the map A ↦ |A| on L(H) is not Lipschitz continuous. Nor is it Lipschitz continuous on the Schatten ideal I_1. The inequality (X.24) due to Araki and Yamagami, on the other hand, shows that on the Hilbert-Schmidt ideal I_2 this map is continuous. For other values of p, 1 < p < ∞, E.B. Davies, Lipschitz continuity of functions of operators in the Schatten classes, J. London Math. Soc., 37 (1988) 148–157, showed that there exists a constant γ_p that depends on p, but not on the dimension n, such that

    || |A| − |B| ||_p ≤ γ_p ||A − B||_p

for all A, B. Theorem X.2.5 was proved in T. Kato, Continuity of the map S ↦ |S| for linear operators, Proc. Japan Acad., 49 (1973) 157–160, and interpreted to mean that the map A ↦ |A| is "almost Lipschitz". Results close to this were obtained by Yu.B. Farforovskaya in the papers cited above.

Bounds like the ones in Section X.3 have been of interest to numerical analysts and physicists. References to much of this work may be found in R. Bhatia, Matrix factorizations and their perturbations, Linear Algebra Appl., 197/198 (1994) 245–276. Theorem X.3.1, and the proof given here, are due to C.J. Kenney and A.J. Laub, Condition estimates for matrix functions, SIAM J. Matrix Analysis, 10 (1989) 191–209. Theorem X.3.4 was proved in R. Bhatia, First and second order perturbation bounds for the operator absolute value, Linear Algebra Appl., 208/209 (1994) 367–376. Theorems X.3.3, X.3.6, X.3.7, and X.3.8 are also proved in this paper. The inequality in Problem X.5.5 is taken from J.L. van Hemmen and T. Ando, An inequality for trace ideals, Commun. Math. Phys., 76 (1980) 143–148. This paper has references to physics literature, where such inequalities are used. The inequality in Problem X.5.6 is proved in the paper by F. Kittaneh and H. Kosaki cited earlier. Most of the results after Theorem X.3.9 in Section X.3 were proved by R. Bhatia and K. Mukherjea, Variation of the unitary part of a matrix, SIAM J. Matrix Analysis, 15 (1994) 1007–1014. The full potential of this method was exploited in the paper cited at the beginning of this paragraph, where several other matrix decompositions of interest in numerical analysis are studied. The results of Problems X.5.9 and X.5.10 are obtained in this paper. (Some of these were proved earlier, using different methods, by A. Barrlund, R. Mathias, G.W. Stewart, and J.-G. Sun.) Bounds for the second derivative of the map A ↦ |A| are obtained in R.
Bhatia, First and second order perturbation bounds for the operator absolute value, Linear Algebra Appl., 208/209 (1994) 367–376; and for derivatives of higher orders in R. Bhatia, Perturbation bounds for the operator absolute value, Linear Algebra Appl., 226 (1995) 639–645. The reader may try to
prove such inequalities using the methods explained in Section X.4. Since this map is the composite of two maps, A ↦ A*A ↦ (A*A)^{1/2}, its analysis can be broken into two parts. No good bounds of higher order are known for other matrix decompositions.

Results in Parts (v) and (vi) of Problem X.5.8 are taken from R. Bhatia and K.B. Sinha, Variation of real powers of positive operators, Indiana Univ. Math. J., 43 (1994) 913–925. In this paper it is also shown that the functions f(t) = t^r on (0, ∞) belong to the class D^{(1)} if r ≥ 2, but not if 1 < r < √2. We have already seen that these functions are in D^{(1)} for all real numbers r ≤ 1.

In Section X.4 we have given a bare outline of differential calculus. More on this may be found in J. Dieudonné, Foundations of Modern Analysis, Academic Press, 1960, and in A. Ambrosetti and G. Prodi, A Primer of Nonlinear Analysis, Cambridge University Press, 1993. For calculus on manifolds, the reader could see S. Lang, Introduction to Differentiable Manifolds, John Wiley, 1962. In our exposition we have included several examples of matrix functions and formulae for higher derivatives of composite maps that are not easily found in other sources.
References
C. Akemann, J. Anderson and G. Pedersen, Triangle inequalities in operator algebras, Linear and Multilinear Algebra, 11 (1982) 167–178.
A. Ambrosetti and G. Prodi, A Primer of Nonlinear Analysis, Cambridge University Press, 1993.
A.R. Amir-Moez, Extreme Properties of Linear Transformations and Geometry in Unitary Spaces, Texas Tech. University, 1968.
W.N. Anderson and G.E. Trapp, A class of monotone operator functions related to electrical network theory, Linear Algebra Appl., 15 (1975) 53–67.
T. Ando, Topics on operator inequalities, Hokkaido University, Sapporo, 1978.
T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl., 26 (1979) 203–241.
T. Ando, Inequalities for permanents, Hokkaido Math. J., 10 (1981) 18–36.
T. Ando, Comparison of norms |||f(A) − f(B)||| and |||f(|A − B|)|||, Math. Z., 197 (1988) 403–409.
T. Ando, Majorization, doubly stochastic matrices and comparison of eigenvalues, Linear Algebra Appl., 118 (1989) 163–248.
T. Ando, Matrix Young inequalities, Operator Theory: Advances and Applications, 75 (1995) 33–38.
T. Ando, Bounds for the antidistance, J. Convex Analysis, 2 (1996) 1–3.
T. Ando and R. Bhatia, Eigenvalue inequalities associated with the Cartesian decomposition, Linear and Multilinear Algebra, 22 (1987) 133–147.
T. Ando and F. Hiai, Log majorization and complementary Golden-Thompson type inequalities, Linear Algebra Appl., 197/198 (1994) 113–131.
H. Araki, On an inequality of Lieb and Thirring, Letters in Math. Phys., 19 (1990) 167–170.
H. Araki and S. Yamagami, An inequality for the Hilbert-Schmidt norm, Commun. Math. Phys., 81 (1981) 89–98.
N. Aronszajn, Rayleigh-Ritz and A. Weinstein methods for approximation of eigenvalues. I. Operators in a Hilbert space, Proc. Nat. Acad. Sci. U.S.A., 34 (1948) 474–480.
W.B. Arveson, Notes on extensions of C*-algebras, Duke Math. J., 44 (1977) 329–355.
J.S. Aujla and H.L. Vasudeva, Convex and monotone operator functions, Ann. Polonici Math., 62 (1995) 1–11.
F.L. Bauer and C.T. Fike, Norms and exclusion theorems, Numer. Math., 2 (1960) 137–141.
H. Baumgärtel, Analytic Perturbation Theory for Matrices and Operators, Birkhäuser, 1984.
N. Bebiano, New developments on the Marcus-Oliveira conjecture, Linear Algebra Appl., 197/198 (1994) 793–802.
G.R. Belitskii and Y.I. Lyubich, Matrix Norms and Their Applications, Birkhäuser, 1988.
J. Bendat and S. Sherman, Monotone and convex operator functions, Trans. Amer. Math. Soc., 79 (1955) 58–71.
F. Berezin and I.M. Gel'fand, Some remarks on the theory of spherical functions on symmetric Riemannian manifolds, Trudi Moscow Math. Ob., 5 (1956) 311–351.
R. Bhatia, On the rate of change of spectra of operators II, Linear Algebra Appl., 36 (1981) 25–32.
R. Bhatia, Analysis of spectral variation and some inequalities, Trans. Amer. Math. Soc., 272 (1982) 323–332.
R. Bhatia, Some inequalities for norm ideals, Commun. Math. Phys., 111 (1987) 33–39.
R. Bhatia, Perturbation Bounds for Matrix Eigenvalues, Longman, 1987.
R. Bhatia, Perturbation inequalities for the absolute value map in norm ideals of operators, J. Operator Theory, 19 (1988) 129–136.
R. Bhatia, On residual bounds for eigenvalues, Indian J. Pure Appl. Math., 23 (1992) 865–866.
R. Bhatia, A simple proof of an operator inequality of Jocić and Kittaneh, J. Operator Theory, 31 (1994) 21–22.
R. Bhatia, Matrix factorizations and their perturbations, Linear Algebra Appl., 197/198 (1994) 245–276.
R. Bhatia, First and second order perturbation bounds for the operator absolute value, Linear Algebra Appl., 208/209 (1994) 367–376.
R. Bhatia, Perturbation bounds for the operator absolute value, Linear Algebra Appl., 226 (1995) 639–645.
R. Bhatia and C. Davis, A bound for the spectral variation of a unitary operator, Linear and Multilinear Algebra, 15 (1984) 71–76.
R. Bhatia and C. Davis, Concavity of certain functions of matrices, Linear and Multilinear Algebra, 17 (1985) 155–164.
R. Bhatia and C. Davis, More matrix forms of the arithmetic-geometric mean inequality, SIAM J. Matrix Analysis, 14 (1993) 132–136.
R. Bhatia and C. Davis, Relations of linking and duality between symmetric gauge functions, Operator Theory: Advances and Applications, 73 (1994) 127–137.
R. Bhatia and C. Davis, A Cauchy-Schwarz inequality for operators with applications, Linear Algebra Appl., 223 (1995) 119–129.
R. Bhatia, C. Davis, and F. Kittaneh, Some inequalities for commutators and an application to spectral variation, Aequationes Math., 41 (1991) 70–78.
R. Bhatia, C. Davis and A. McIntosh, Perturbation of spectral subspaces and solution of linear operator equations, Linear Algebra Appl., 52/53 (1983) 45–67.
R. Bhatia and L. Elsner, On the variation of permanents, Linear and Multilinear Algebra, 27 (1990) 105–110.
R. Bhatia and L. Elsner, Symmetries and variation of spectra, Canadian J. Math., 44 (1992) 1155–1166.
R. Bhatia and L. Elsner, The q-binomial theorem and spectral symmetry, Indag. Math., N.S., 4 (1993) 11–16.
R. Bhatia, L. Elsner, and G. Krause, Bounds for the variation of the roots of a polynomial and the eigenvalues of a matrix, Linear Algebra Appl., 142 (1990) 195–209.
R. Bhatia, L. Elsner, and G. Krause, Spectral variation bounds for diagonalisable matrices, Preprint 94-098, SFB 343, University of Bielefeld, Aequationes Math., to appear.
R. Bhatia and S. Friedland, Variation of Grassmann powers and spectra, Linear Algebra Appl., 40 (1981) 1–18.
R. Bhatia and J.A.R. Holbrook, Short normal paths and spectral variation, Proc. Amer. Math. Soc., 94 (1985) 377–382.
R. Bhatia and J.A.R. Holbrook, Unitary invariance and spectral variation, Linear Algebra Appl., 95 (1987) 43–68.
R. Bhatia and J.A.R. Holbrook, A softer, stronger Lidskii theorem, Proc. Indian Acad. Sci. (Math. Sci.), 99 (1989) 75–83.
R. Bhatia, R. Horn, and F. Kittaneh, Normal approximants to binormal operators, Linear Algebra Appl., 147 (1991) 169-179.
R. Bhatia and F. Kittaneh, On some perturbation inequalities for operators, Linear Algebra Appl., 106 (1988) 271-279.
R. Bhatia and F. Kittaneh, On the singular values of a product of operators, SIAM J. Matrix Analysis, 11 (1990) 272-277.
R. Bhatia and F. Kittaneh, Some inequalities for norms of commutators, SIAM J. Matrix Analysis, 18 (1997), to appear.
R. Bhatia, F. Kittaneh, and R.-C. Li, Some inequalities for commutators and an application to spectral variation II, Linear and Multilinear Algebra, to appear.
R. Bhatia and K.K. Mukherjea, On the rate of change of spectra of operators, Linear Algebra Appl., 27 (1979) 147-157.
R. Bhatia and K.K. Mukherjea, The space of unordered tuples of complex numbers, Linear Algebra Appl., 52/53 (1983) 765-768.
R. Bhatia and K. Mukherjea, Variation of the unitary part of a matrix, SIAM J. Matrix Analysis, 15 (1994) 1007-1014.
R. Bhatia and P. Rosenthal, How and why to solve the equation AX - XB = Y, Bull. London Math. Soc., 29 (1997), to appear.
R. Bhatia and K.B. Sinha, Variation of real powers of positive operators, Indiana Univ. Math. J., 43 (1994) 913-925.
M.Sh. Birman, L.S. Koplienko, and M.Z. Solomyak, Estimates of the spectrum of the difference between fractional powers of selfadjoint operators, Izvestiya Vysshikh Uchebnykh Zavedenii Mat., 19 (1975) 3-10.
M.Sh. Birman and M.Z. Solomyak, Double Stieltjes operator integrals, English translation in Topics in Mathematical Physics, Volume 1, Consultants Bureau, New York, 1967.
A. Björck and G.H. Golub, Numerical methods for computing angles between linear subspaces, Math. Comp., 27 (1973) 579-594.
R. Bouldin, Best approximation of a normal operator in the Schatten p-norm, Proc. Amer. Math. Soc., 80 (1980) 277-282.
A.L. Cauchy, Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes (1829), Oeuvres Complètes (IIe Série), Volume 9, Gauthier-Villars.
F. Chatelin, Spectral Approximation of Linear Operators, Academic Press, 1983.
M.D. Choi, Almost commuting matrices need not be nearly commuting, Proc. Amer. Math. Soc., 102 (1988) 529-533.
J.E. Cohen, Inequalities for matrix exponentials, Linear Algebra Appl., 111 (1988) 25-28.
J.E. Cohen, S. Friedland, T. Kato, and F.P. Kelly, Eigenvalue inequalities for products of matrix exponentials, Linear Algebra Appl., 45 (1982) 55-95.
A. Connes and E. Størmer, Entropy for automorphisms of II₁ von Neumann algebras, Acta Math., 134 (1975) 289-306.
H.O. Cordes, Spectral Theory of Linear Differential Operators and Comparison Algebras, Cambridge University Press, 1987.
R. Courant, Über die Eigenwerte bei den Differentialgleichungen der mathematischen Physik, Math. Z., 7 (1920) 1-57.
R. Courant and D. Hilbert, Methods of Mathematical Physics, Wiley, 1953.
Ju.L. Daleckii and S.G. Krein, Formulas of differentiation according to a parameter of functions of Hermitian operators, Dokl. Akad. Nauk SSSR, 76 (1951) 13-16.
E.B. Davies, Lipschitz continuity of functions of operators in the Schatten classes, J. London Math. Soc., 37 (1988) 148-157.
C. Davis, Separation of two linear subspaces, Acta Sci. Math. (Szeged), 19 (1958) 172-187.
C. Davis, Notions generalizing convexity for functions defined on spaces of matrices, in Convexity: Proceedings of Symposia in Pure Mathematics, American Mathematical Society, (1963) 187-201.
C. Davis, The rotation of eigenvectors by a perturbation, J. Math. Anal. Appl., 6 (1963) 159-173.
C. Davis, The Toeplitz-Hausdorff theorem explained, Canad. Math. Bull., 14 (1971) 245-246.
C. Davis and W.M. Kahan, The rotation of eigenvectors by a perturbation III, SIAM J. Numer. Anal., 7 (1970) 1-46.
J. Dieudonné, Foundations of Modern Analysis, Academic Press, 1960.
W.F. Donoghue, Monotone Matrix Functions and Analytic Continuation, Springer-Verlag, 1974.
L. Elsner, On the variation of the spectra of matrices, Linear Algebra Appl., 47 (1982) 127-138.
L. Elsner, An optimal bound for the spectral variation of two matrices, Linear Algebra Appl., 71 (1985) 77-80.
L. Elsner and S. Friedland, Singular values, doubly stochastic matrices and applications, Linear Algebra Appl., 220 (1995) 161-169.
L. Elsner and C. He, Perturbation and interlace theorems for the unitary eigenvalue problem, Linear Algebra Appl., 188/189 (1993) 207-229.
L. Elsner, C. Johnson, J. Ross, and J. Schönheim, On a generalised matching problem arising in estimating the eigenvalue variation of two matrices, European J. Combinatorics, 4 (1983) 133-136.
L. Elsner and M.H.C. Paardekooper, On measures of nonnormality of matrices, Linear Algebra Appl., 92 (1987) 107-124.
Ky Fan, On a theorem of Weyl concerning eigenvalues of linear transformations I, Proc. Nat. Acad. Sci. U.S.A., 35 (1949) 652-655.
Ky Fan, On a theorem of Weyl concerning eigenvalues of linear transformations II, Proc. Nat. Acad. Sci. U.S.A., 36 (1950) 31-35.
Ky Fan, A minimum property of the eigenvalues of a Hermitian transformation, Amer. Math. Monthly, 60 (1953) 48-50.
Ky Fan and A.J. Hoffman, Some metric inequalities in the space of matrices, Proc. Amer. Math. Soc., 6 (1955) 111-116.
Yu.B. Farforovskaya, An estimate of the norm ||f(B) - f(A)|| for selfadjoint operators A and B, Zap. Nauch. Sem. LOMI, 56 (1976) 143-162. (English translation: J. Soviet Math., 14, No. 2 (1980).)
M. Fiedler, Bounds for the determinant of the sum of Hermitian matrices, Proc. Amer. Math. Soc., 30 (1971) 27-31.
E. Fischer, Über quadratische Formen mit reellen Koeffizienten, Monatsh. Math. Phys., 16 (1905) 234-249.
C.K. Fong and J.A.R. Holbrook, Unitarily invariant operator norms, Canad. J. Math., 35 (1983) 274-299.
C.K. Fong, H. Radjavi, and P. Rosenthal, Norms for matrices and operators, J. Operator Theory, 18 (1987) 99-113.
M. Fujii and T. Furuta, Löwner-Heinz, Cordes and Heinz-Kato inequalities, Math. Japonica, 38 (1993) 73-78.
T. Furuta, Norm inequalities equivalent to Löwner-Heinz theorem, Reviews in Math. Phys., 1 (1989) 135-137.
F.R. Gantmacher, Matrix Theory, 2 volumes, Chelsea, 1959.
L. Gårding, Linear hyperbolic partial differential equations with constant coefficients, Acta Math., 84 (1951) 1-62.
L. Gårding, An inequality for hyperbolic polynomials, J. Math. Mech., 8 (1959) 957-966.
I.M. Gel'fand and M. Naimark, The relation between the unitary representations of the complex unimodular group and its unitary subgroup, Izv. Akad. Nauk SSSR Ser. Mat., 14 (1950) 239-260.
S.A. Geršgorin, Über die Abgrenzung der Eigenwerte einer Matrix, Izv. Akad. Nauk SSSR, Ser. Fiz.-Mat., 6 (1931) 749-754.
I.C. Gohberg and M.G. Krein, Introduction to the Theory of Linear Nonselfadjoint Operators, American Math. Society, 1969.
S. Golden, Lower bounds for the Helmholtz function, Phys. Rev. B, 137 (1965) 1127-1128.
J.A. Goldstein and M. Levy, Linear algebra and quantum chemistry, American Math. Monthly, 78 (1991) 710-718.
G.H. Golub and C.F. Van Loan, Matrix Computations, 2nd ed., The Johns Hopkins University Press, 1989.
W. Greub, Multilinear Algebra, 2nd ed., Springer-Verlag, 1978.
P.R. Halmos, Finite-Dimensional Vector Spaces, Van Nostrand, 1958.
P.R. Halmos, Positive approximants of operators, Indiana Univ. Math. J., 21 (1972) 951-960.
P.R. Halmos, Spectral approximants of normal operators, Proc. Edinburgh Math. Soc., 19 (1974) 51-58.
P.R. Halmos, A Hilbert Space Problem Book, 2nd ed., Springer-Verlag, 1982.
F. Hansen and G.K. Pedersen, Jensen's inequality for operators and Löwner's theorem, Math. Ann., 258 (1982) 229-241.
G.H. Hardy, J.E. Littlewood, and G. Pólya, Inequalities, Cambridge University Press, 1934.
E. Heinz, Beiträge zur Störungstheorie der Spektralzerlegung, Math. Ann., 123 (1951) 415-438.
R.H. Herman and A. Ocneanu, Spectral analysis for automorphisms of UHF C*-algebras, J. Funct. Anal., 66 (1986) 1-10.
P. Henrici, Bounds for iterates, inverses, spectral variation and fields of values of non-normal matrices, Numer. Math., 4 (1962) 24-39.
F. Hiai and Y. Nakamura, Majorisation for generalised s-numbers in semifinite von Neumann algebras, Math. Z., 195 (1987) 17-27.
F. Hiai and D. Petz, The Golden-Thompson trace inequality is complemented, Linear Algebra Appl., 181 (1993) 153-185.
N.J. Higham, Matrix nearness problems and applications, in Applications of Matrix Theory, Oxford University Press, 1989.
A.J. Hoffman and H.W. Wielandt, The variation of the spectrum of a normal matrix, Duke Math. J., 20 (1953) 37-39.
K. Hoffman and R. Kunze, Linear Algebra, 2nd ed., Prentice Hall, 1971.
J.A.R. Holbrook, Spectral variation of normal matrices, Linear Algebra Appl., 174 (1992) 131-144.
A. Horn, Eigenvalues of sums of Hermitian matrices, Pacific J. Math., 12 (1962) 225-242.
A. Horn and R. Steinberg, Eigenvalues of the unitary part of a matrix, Pacific J. Math., 9 (1959) 541-550.
R.A. Horn, The Hadamard product, in C.R. Johnson, ed., Matrix Theory and Applications, American Mathematical Society, 1990.
R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge University Press, 1985.
R.A. Horn and C.R. Johnson, Topics in Matrix Analysis, Cambridge University Press, 1990.
R.A. Horn and R. Mathias, An analog of the Cauchy-Schwarz inequality for Hadamard products and unitarily invariant norms, SIAM J. Matrix Analysis, 11 (1990) 481-498.
R.A. Horn and R. Mathias, Cauchy-Schwarz inequalities associated with positive semidefinite matrices, Linear Algebra Appl., 142 (1990) 63-82.
N.M. Hugenholtz and R.V. Kadison, Automorphisms and quasi-free states of the CAR algebra, Commun. Math. Phys., 43 (1975) 181-197.
C. Jordan, Essai sur la géométrie à n dimensions, Bull. Soc. Math. France, 3 (1875) 103-174.
W. Kahan, Numerical linear algebra, Canadian Math. Bull., 9 (1966) 757-801.
W. Kahan, Inclusion theorems for clusters of eigenvalues of Hermitian matrices, Technical Report, Computer Science Department, University of Toronto, 1967.
W. Kahan, Every n x n matrix Z with real spectrum satisfies ||Z - Z*|| ≤ ||Z + Z*|| (log₂ n + 0.038), Proc. Amer. Math. Soc., 39 (1973) 235-241.
W. Kahan, Spectra of nearly Hermitian matrices, Proc. Amer. Math. Soc., 48 (1975) 11-17.
W. Kahan, B.N. Parlett, and E. Jiang, Residual bounds on approximate eigensystems of nonnormal matrices, SIAM J. Numer. Anal., 19 (1982) 470-484.
T. Kato, Notes on some inequalities for linear operators, Math. Ann., 125 (1952) 208-212.
T. Kato, Perturbation Theory for Linear Operators, Springer-Verlag, 1966.
T. Kato, Continuity of the map S → |S| for linear operators, Proc. Japan Acad., 49 (1973) 157-160.
C.J. Kenney and A.J. Laub, Condition estimates for matrix functions, SIAM J. Matrix Analysis, 10 (1989) 191-209.
F. Kittaneh, On Lipschitz functions of normal operators, Proc. Amer. Math. Soc., 94 (1985) 416-418.
F. Kittaneh, Inequalities for the Schatten p-norm IV, Commun. Math. Phys., 106 (1986) 581-585.
F. Kittaneh, A note on the arithmetic-geometric mean inequality for matrices, Linear Algebra Appl., 171 (1992) 1-8.
F. Kittaneh and D. Jocic, Some perturbation inequalities for selfadjoint operators, J. Operator Theory, 31 (1994) 3-10.
F. Kittaneh and H. Kosaki, Inequalities for the Schatten p-norm V, Publ. Res. Inst. Math. Sci., 23 (1987) 433-443.
A. Korányi, On a theorem of Löwner and its connections with resolvents of selfadjoint transformations, Acta Sci. Math. Szeged, 17 (1956) 63-70.
H. Kosaki, On the continuity of the map φ → |φ| from the predual of a W*-algebra, J. Funct. Anal., 59 (1984) 123-131.
B. Kostant, On convexity, the Weyl group and the Iwasawa decomposition, Ann. Sci. E.N.S., 6 (1973) 413-455.
F. Kraus, Über konvexe Matrixfunktionen, Math. Z., 41 (1936) 18-42.
G. Krause, Bounds for the variation of matrix eigenvalues and polynomial roots, Linear Algebra Appl., 208/209 (1994) 73-82.
F. Kubo and T. Ando, Means of positive linear operators, Math. Ann., 249 (1980) 205-224.
S. Lang, Introduction to Differentiable Manifolds, John Wiley, 1962.
P.D. Lax, Differential equations, difference equations and matrix theory, Comm. Pure Appl. Math., 11 (1958) 175-194.
A. Lenard, Generalization of the Golden-Thompson inequality, Indiana Univ. Math. J., 21 (1971) 457-468.
C.K. Li, Some aspects of the theory of norms, Linear Algebra Appl., 212/213 (1994) 71-100.
C.K. Li and N.K. Tsing, On the unitarily invariant norms and some related results, Linear and Multilinear Algebra, 20 (1987) 107-119.
C.K. Li and N.K. Tsing, Norms that are invariant under unitary similarities and the C-numerical radii, Linear and Multilinear Algebra, 24 (1989) 209-222.
R.-C. Li, New perturbation bounds for the unitary polar factor, SIAM J. Matrix Analysis, 16 (1995) 327-332.
R.-C. Li, Norms of certain matrices with applications to variations of the spectra of matrices and matrix pencils, Linear Algebra Appl., 182 (1993) 199-234.
B.V. Lidskii, Spectral polyhedron of a sum of two Hermitian matrices, Functional Analysis and Appl., 10 (1982) 76-77.
V.B. Lidskii, On the proper values of a sum and product of symmetric matrices, Dokl. Akad. Nauk SSSR, 75 (1950) 769-772.
E.H. Lieb, Convex trace functions and the Wigner-Yanase-Dyson conjecture, Advances in Math., 11 (1973) 267-288.
E.H. Lieb, Inequalities for some operator and matrix functions, Advances in Math., 20 (1976) 174-178.
H. Lin, Almost commuting selfadjoint matrices and applications, preprint, 1995.
G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys., 33 (1973) 305-322.
J.H. van Lint, The van der Waerden conjecture: two proofs in one year, Math. Intelligencer, 4 (1982) 72-77.
P.O. Löwdin, On the non-orthogonality problem connected with the use of atomic wave functions in the theory of molecules and crystals, J. Chem. Phys., 18 (1950) 365-374.
K. Löwner, Über monotone Matrixfunktionen, Math. Z., 38 (1934) 177-216.
T.X. Lu, Perturbation bounds for eigenvalues of symmetrizable matrices, Numerical Mathematics: A Journal of Chinese Universities, 16 (1994) 177-185 (in Chinese).
G. Lumer and M. Rosenblum, Linear operator equations, Proc. Amer. Math. Soc., 10 (1959) 32-41.
M. Marcus, Finite-Dimensional Multilinear Algebra, 2 volumes, Marcel Dekker, 1973 and 1975.
M. Marcus and L. Lopes, Inequalities for symmetric functions and Hermitian matrices, Canad. J. Math., 9 (1957) 305-312.
M. Marcus and H. Minc, A Survey of Matrix Theory and Matrix Inequalities, Prindle, Weber and Schmidt, 1964; reprinted by Dover in 1992.
M. Marcus and M. Newman, Inequalities for the permanent function, Ann. of Math., 75 (1962) 47-62.
A.S. Markus, The eigen- and singular values of the sum and product of linear operators, Russian Math. Surveys, 19 (1964) 92-120.
A.W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications, Academic Press, 1979.
R. Mathias, The Hadamard operator norm of a circulant and applications, SIAM J. Matrix Analysis, 14 (1993) 1152-1167.
R. McEachin, A sharp estimate in an operator inequality, Proc. Amer. Math. Soc., 115 (1992) 161-165.
R. McEachin, Analyzing specific cases of an operator inequality, Linear Algebra Appl., 208/209 (1994) 343-365.
A. McIntosh, Counterexample to a question on commutators, Proc. Amer. Math. Soc., 29 (1971) 337-340.
A. McIntosh, Functions and derivations of C*-algebras, J. Funct. Anal., 30 (1978) 264-275.
A. McIntosh, Heinz inequalities and perturbation of spectral families, Macquarie Math. Reports, 1979.
M.L. Mehta, Matrix Theory, 2nd ed., Hindustan Publishing Co., 1989.
R. Merris and J.A. Dias da Silva, Generalized Schur functions, J. Algebra, 35 (1975) 442-448.
H. Minc, Permanents, Addison-Wesley, 1978.
B.A. Mirman, Numerical range and norm of a linear operator, Trudy Seminara po Funkcional'nomu Analizu, No. 10 (1968), 51-55.
L. Mirsky, Matrices with prescribed characteristic roots and diagonal elements, J. London Math. Soc., 33 (1958) 14-21.
L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. Oxford Ser. (2), 11 (1960) 50-59.
Y. Nakamura, Numerical range and norm, Math. Japonica, 27 (1982) 149-150.
A. Nudel'man and P. Svarcman, The spectrum of a product of unitary matrices, Uspehi Mat. Nauk, 13 (1958) 111-117.
M. Ohya and D. Petz, Quantum Entropy and Its Use, Springer-Verlag, 1993.
K. Okubo, Hölder-type norm inequalities for Schur products of matrices, Linear Algebra Appl., 91 (1987) 13-28.
C.L. Olsen and G.K. Pedersen, Corona C*-algebras and their applications to lifting problems, Math. Scand., 64 (1989) 63-86.
M. Omladic and P. Semrl, On the distance between normal matrices, Proc. Amer. Math. Soc., 110 (1990) 591-596.
A. Ostrowski, Recherches sur la méthode de Gräffe et les zéros des polynômes et des séries de Laurent, Acta Math., 72 (1940) 99-257.
A. Ostrowski, Über die Stetigkeit von charakteristischen Wurzeln in Abhängigkeit von den Matrizenelementen, Jber. Deut. Mat.-Verein., 60 (1957) 40-42.
A. Ostrowski, Solution of Equations and Systems of Equations, Academic Press, 1960.
C.C. Paige and M. Wei, History and generality of the CS Decomposition, Linear Algebra Appl., 208/209 (1994) 303-326.
B.N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, 1980.
K.R. Parthasarathy, Eigenvalues of matrix-valued analytic maps, J. Austral. Math. Soc. (Ser. A), 26 (1978) 179-197.
C. Pearcy and A. Shields, Almost commuting matrices, J. Funct. Anal., 33 (1979) 332-338.
G.K. Pedersen, A commutator inequality (unpublished note).
D. Petz, Quasi-entropies for finite quantum systems, Rep. Math. Phys., 21 (1986) 57-65.
D. Petz, A survey of certain trace inequalities, Banach Centre Publications, 30 (1994) 287-298.
D. Phillips, Improving spectral variation bounds with Chebyshev polynomials, Linear Algebra Appl., 133 (1990) 165-173.
J. Phillips, Nearest normal approximation for certain normal operators, Proc. Amer. Math. Soc., 67 (1977) 236-240.
A. Pokrzywa, Spectra of operators with fixed imaginary parts, Proc. Amer. Math. Soc., 81 (1981) 359-364.
I.R. Porteous, Topological Geometry, Cambridge University Press, 1981.
R.T. Powers and E. Størmer, Free states of the canonical anticommutation relations, Commun. Math. Phys., 16 (1970) 1-33.
J.F. Queiró and A.L. Duarte, On the Cartesian decomposition of a matrix, Linear and Multilinear Algebra, 18 (1985) 77-85.
Lord Rayleigh, The Theory of Sound, reprinted by Dover, 1945.
M. Reed and B. Simon, Methods of Modern Mathematical Physics, Volume 4, Academic Press, 1978.
R.C. Riddell, Minimax problems on Grassmann manifolds, Advances in Math., 54 (1984) 107-199.
M. Rosenblum, On the operator equation BX - XA = Q, Duke Math. J., 23 (1956) 263-270.
S.Ju. Rotfel'd, The singular values of a sum of completely continuous operators, in Topics in Mathematical Physics, Vol. 3, Consultants Bureau, 1969, 73-78.
A. Ruhe, Closest normal matrix finally found!, BIT, 27 (1987) 585-598.
D. Ruelle, Statistical Mechanics, Benjamin, 1969.
R. Schatten, Norm Ideals of Completely Continuous Operators, Springer-Verlag, 1960.
A. Schönhage, Quasi-GCD computations, J. Complexity, 1 (1985) 118-137.
J.-P. Serre, Linear Representations of Finite Groups, Springer-Verlag, 1977.
H.S. Shapiro, Topics in Approximation Theory, Springer Lecture Notes in Mathematics, Vol. 187, 1971.
B. Simon, Trace Ideals and Their Applications, Cambridge University Press, 1979.
S. Smale, The fundamental theorem of algebra and complexity theory, Bull. Amer. Math. Soc. (New Series), 4 (1981) 1-36.
M.F. Smiley, Inequalities related to Lidskii's, Proc. Amer. Math. Soc., 19 (1968) 1029-1034.
G. Sparr, A new proof of Löwner's theorem on monotone matrix functions, Math. Scand., 47 (1980) 266-274.
G.W. Stewart, Error and perturbation bounds for subspaces associated with certain eigenvalue problems, SIAM Rev., 15 (1973) 727-764.
G.W. Stewart, On the perturbation of pseudo-inverses, projections, and linear least squares problems, SIAM Rev., 19 (1977) 634-662.
G.W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, 1990.
J.-G. Sun, On the perturbation of the eigenvalues of a normal matrix, Math. Numer. Sinica, 6 (1984) 334-336.
J.-G. Sun, On the variation of the spectrum of a normal matrix, Linear Algebra Appl., 246 (1996) 215-222.
V.S. Sunder, Distance between normal operators, Proc. Amer. Math. Soc., 84 (1982) 483-484.
V.S. Sunder, On permutations, convex hulls and normal operators, Linear Algebra Appl., 48 (1982) 403-411.
J.J. Sylvester, Sur l'équation en matrices px = xq, C.R. Acad. Sci. Paris, 99 (1884) 67-71 and 115-116.
B. Sz.-Nagy, Über die Ungleichung von H. Bohr, Math. Nachr., 9 (1953) 255-259.
P. Tarazaga, Eigenvalue estimates for symmetric matrices, Linear Algebra Appl., 135 (1990) 171-179.
C.J. Thompson, Inequality with applications in statistical mechanics, J. Math. Phys., 6 (1965) 1812-1813.
C.J. Thompson, Inequalities and partial orders on matrix spaces, Indiana Univ. Math. J., 21 (1971) 469-480.
R.C. Thompson, Principal submatrices II, Linear Algebra Appl., 1 (1968) 211-243.
R.C. Thompson, Principal submatrices IX, Linear Algebra Appl., 5 (1972) 1-12.
R.C. Thompson, On the eigenvalues of a product of unitary matrices, Linear and Multilinear Algebra, 2 (1974) 13-24.
J.L. van Hemmen and T. Ando, An inequality for trace ideals, Commun. Math. Phys., 76 (1980) 143-148.
J. von Neumann, Some matrix inequalities and metrization of matric space, Tomsk. Univ. Rev., 1 (1937) 286-300; reprinted in Collected Works, Pergamon Press, 1962.
B. Wang and M. Gong, Some eigenvalue inequalities for positive semidefinite matrix power products, Linear Algebra Appl., 184 (1993) 249-260.
P.A. Wedin, Perturbation bounds in connection with singular value decomposition, BIT, 13, 217-232.
H.F. Weinberger, Remarks on the preceding paper of Lax, Comm. Pure Appl. Math., 11 (1958) 195-196.
H. Weyl, Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen, Math. Ann., 71 (1911) 441-469.
H. Whitney, Complex Analytic Varieties, Addison-Wesley, 1972.
H. Wielandt, Lineare Scharen von Matrizen mit reellen Eigenwerten, Math. Z., 53 (1950) 219-225.
H.W. Wielandt, An extremum property of sums of eigenvalues, Proc. Amer. Math. Soc., 6 (1955) 106-110.
H.W. Wielandt, Topics in the Analytic Theory of Matrices, mimeographed lecture notes, University of Wisconsin, 1967; reprinted in Collected Works, Vol. 2, W. de Gruyter, 1996.
E. Wigner and J. von Neumann, Significance of Löwner's theorem in the quantum theory of collisions, Ann. of Math., 59 (1954) 418-433.
J.H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, 1965.
Index
absolute value
  of a matrix, 296
  of a vector, 30
  perturbation of, 296
adjoint, 4
analytic continuation, 134
analytic functions in a half-plane, 134
angles between subspaces, 201
angle operator, 221
annihilation operator, 21
antisymmetric tensor power, 16
antisymmetric tensor product, 16
arithmetic-geometric mean inequality, 87, 262, 295
averaging, 40, 117
Ando's Concavity Theorem, 273
Araki-Lieb-Thirring inequality, 258
Aronszajn's Inequality, 64
Bauer-Fike Theorem, 233
Bernstein's Theorem, 148
bilinear map, 12, 310, 313
bilinear functional, 12
bilinear functional, elementary, 12
binormal operator, 284
biorthogonal, 2
Birkhoff's Theorem, 37, 180
block-matrix, 9, 195
Borel measure, 139
bound norm, 7
C-numerical radius, 102, 106
canonical angles, 201
  cosines of, 201
Caratheodory's Theorem, 38
Cartesian decomposition, 6, 25, 73, 156, 183
Carrollian n-tuple, 242
Cauchy's integral formula, 205
Cauchy's Interlacing Theorem, 59
Cauchy-Riemann equations, 135
Cauchy-Schwarz inequality
  for matrices, 266, 286, 297, 298
  for symmetric gauge functions, 88
  for unitarily invariant norms, 95, 108
Cayley transform, 284
centraliser, 167
chain rule, 311
Chebyshev's Theorem, 228
Cholesky decomposition, 5, 319
class L, 268
class T, 259
coefficients in the characteristic polynomial, 269
compatible matching, 36
commutant, 167
commutator, 167, 234
complete symmetric polynomial, 18
completely monotone function, 148
compression, 59, 61, 119
concave function, 41, 53, 158, 240, 248, 289
condition number, 232
conformally equivalent, 137
contraction, 7, 119
contractive, 7, 10
convex function, 40, 41, 45, 87, 117, 157, 218, 240, 248, 265, 281
  monotone, 40
  smoothness properties, 145
convolution, 146
creation operator, 21
CS Decomposition, 196, 223
cyclic order, 184, 191
Davis-Kahan sinΘ Theorem, 212
derivative, 124, 310
determinant, 16, 253, 269
  inequality, 3, 19, 21, 47, 51, 108, 181, 182, 183, 271, 281
  perturbation of, 22
diagonal of A, 37
diagonalisable matrix, 232
diagonally dominant, 251
differentiable curve, 166
differentiable manifold, 305
differentiable map, 124, 310
differentiation, rules of, 311
Dilation Theory, 26
distance between subspaces, 202
direct sum, 9
directional derivative, 310
distribution function, 139
doubly stochastic matrix, 23, 32, 165
doubly substochastic matrix, 38, 39
dual norm, 89, 96
dual space, 14
Dyson's expansion, 311, 319
eigenvalues, 24
  continuity of, 152, 168
  continuous parametrization of, 154
  generalised, 238
  of A with respect to B, 238
  of Hermitian matrices, 24, 35, 62, 101
  of product of unitary matrices, 82
elementary symmetric polynomials, 18, 46, 51
entropy, 44, 274, 287
  in quantum mechanics, 274
  relative, 274
Euclidean norm, 86
exponential, 8, 254, 311
extremal representation for eigenvalues, 24, 58, 67, 77
extremal representation for singular values, 75, 76
Fan Dominance Theorem, 93, 284, 291
Fan-Hoffman theorem, 73
field of values, 8
first divided difference, 123
Fischer's inequality, 51
Fourier analysis, 135
Fourier transform, 206, 207, 216
  inverse, 216
  minimal extrapolation problem, 224
Frechet derivative, 310
Frechet differential calculus, 301, 310
Frechet differentiable map, 124
Frobenius norm, 7, 25, 92, 214
Fubini's Theorem, 140
Fuglede-Putnam Theorem, 235
function of bounded variation, 217
gamma function, 319
gap, 225
Gel'fand-Naimark Theorem, 71
general linear group, 7
Gersgorin disks, 244
Gersgorin Disk Theorem, 244
Golden-Thompson inequality, 261, 279, 285
Gram determinant, 20
Gram matrix, 20
Gram-Schmidt procedure, 2, 287
Grassmann power, 18
Hadamard's inequality, 3, 227
Hadamard product, 23
Hadamard Determinant Theorem, 24, 46, 51
Hall's Theorem, 52
Hall's Marriage Theorem, 36
harmonic conjugates, 137
harmonic function, 135
harmonic function, mean value property, 136
Hausdorff distance, 160
Heinz inequalities, 285
Henrici's Theorem, 246
Herglotz Theorem, 136
Hermitian approximant, 276
Hermitian matrix, 4, 8, 57, 155, 254
Hilbert-Schmidt norm, 7, 92, 97, 299
Hoffman-Wielandt inequality, 165, 237
Hoffman-Wielandt Theorem, 165
Holder inequality, 88
  for symmetric gauge functions, 88
  for unitarily invariant norms, 95
hyperbolic PDE, 251
hyperbolic polynomials, 251
inner product, 1
inner product, on matrices, 92
imaginary part of a matrix, 6
invariant subspace, 10
interlacing theorem, 60
  converse, 61
  for singular values, 81
inverse, 293
isometry, 119
isotone, 41
jointly concave, 271, 273
Jordan canonical form, 246
Jordan decomposition, 99, 262, 277, 292
Kato-Temple inequality, 77
König-Frobenius Theorem, 37
Krein-Milman Theorem, 133
Ky Fan k-norms, 35, 92
  main use, 93
Ky Fan's Maximum Principle, 24, 35, 65, 69, 71, 174
lp-norms, 84, 89
lattice superadditive, 48, 53
laxly positive, 238
Lebesgue Dominated Convergence Theorem, 139
length of a curve, 169
lexicographic ordering, 14
Lidskii's Theorem, 69, 179
  second proof, 70
  multiplicative, 73
  third proof, 98
  for normal matrices, 181
Lie algebra, 241
Lie bracket, 167
Lie Product Formula, 254, 280
Liebian function, 286
Lieb's Concavity Theorem, 271
Lieb's Theorem, 270
Lieb-Thirring inequality, 279
Lindblad's Theorem, 275
Lipschitz continuous, 322
Lipschitz constant, 215
Lipschitz continuous function, 214
logarithm, 145
  principal branch, 143
logarithmic majorisation, 71
Loewner's Theorems, 131, 149
Loewner-Heinz inequality, 150
Löwner-Heinz Theorem, 285
Löwdin orthogonalisation, 287
LR decomposition, 319
  perturbation of, 320
Lyapunov equation, 221
majorisation, 28
  weak, 30
  of complex vectors, 179
  soft, 180
manifold, 167
Marcus-de Oliveira Conjecture, 184
Marriage Problem, 36
Marriage Theorem, 162, 213
Matching Problem, 36
Matching Theorem, 185
matrix convex, 113
matrix convex of order n, 113
matrix monotone, 112
matrix monotone of order n, 112
matrix triangle inequality, 81
Matrix Young inequalities, 286
mean value theorem, 303, 307, 312
measure of nonnormality, 245, 251, 278
metrically flat, 170
midpoint operator convex, 113
minimax principle, 58
  of Courant, Fischer and Weyl, 58
  for singular values, 75
Minkowski determinant inequality, 56
Minkowski Determinant Theorem, 47, 282
Minkowski inequality, for symmetric gauge functions, 89
Mirman's Theorem, 25
Mixed Schwarz inequality, 281, 286
mollifiers, 146
monotone, 45, 48
monotone decreasing, 41
monotone increasing, 41
multiindices, 16
multilinear functional, 12
multilinear map, 12
multiplication operator, 223
multiplication operator, left, 273
multiplication operator, right, 273
nearly normal matrix, 252
Neumann Series, 7
Nevanlinna's Theorem, 138
norm, 1, 6
  absolute, 85
  bound, 91
  gauge invariant, 85
  Frobenius, 92
  Hilbert-Schmidt, 92
  Ky Fan, 92
  monotone, 85
  operator, 91
  permutation invariant, 85
  Schatten, 92
  submultiplicative, 94
  symmetric, 85
  trace, 92
  unitarily invariant, 91
  weakly unitarily invariant, 102
normal approximant, 277
normal curve, 169
normal matrices, distance between eigenvalues, 212
normal matrices, path connected, 169
normal matrix, 4, 8, 160, 161, 168, 172, 177, 180, 253
  function of, 5
  spectral resolution, 161
normal path, 169, 177
normal path inequality, 189
numerical radius, 8, 102
numerical range, 8, 25
ν-measure of nonnormality, 246
operator approximation, 192, 275, 287
operator concave function, 113, 121
operator convex function, 113, 130
  integral representation, 134
operator monotone function, 112, 121, 126, 127, 130, 289, 302, 303, 304, 317
  canonical representation, 145
  infinitely differentiable, 134, 290
  integral representation, 134
  inverse of, 293
operator norm, 7
optimal matching, 159
optimal matching distance, 52, 153, 160, 21
orbit, 189
orthostochastic matrix, 35, 180
p-Carrollian, 242
p-norms, 84
pth derivative, 315
partitioned matrix, 64, 188
Peierls-Bogoliubov Inequality, 275, 281
permanent, 17, 19
  inequality, 19, 21, 23
  perturbation of, 22
permutations, 165
permutation invariant, 43
permutation matrix, 32, 37, 165
  complex, 85
permutation orbit, 166
pinching, 50, 97, 118, 275
pinching inequality, 97
  for wui norms, 107
Pick functions, 135, 139
  Nevanlinna's Theorem, 135
  integral representation, 138
Poincaré's Inequality, 58
Poisson kernel, 136
polar decomposition, 6, 213, 267, 276, 305
  perturbation of, 305
positive approximant, 277
positive definite, 4
positive matrices, product of, 255
positive matrix, 4
positive part, 6, 213
positive semidefinite, 4
positivity-preserving, 32
power functions, 123, 145, 289
  operator monotonicity of, 123
  operator convexity of, 123
  principal branch, 143
probability measure, 133
probability measures, weak* compact, 136
principal angles, 202
product rule for differentiation, 312
Pythagorean Theorem, 21
Q-norm, 89, 95, 174, 175, 277
Q'-norm, 90, 97
QR decomposition, 3, 195, 307
  perturbation of, 307
  rank revealing, 196
quasi-norms, 107
quantum chemistry, 287
R factor, 195
real eigenvalues, 238
real part of a matrix, 6
real spectrum, 193
rectifiable normal path, 169
rectifiable path, 169
reducing subspace, 10
regularisation, 146
residual bounds, 193
retraction, 173, 277
roots of polynomials, 230
  continuity of, 154
  perturbation of, 230
Rotfel'd's Theorem, 98
Rouché's Theorem, 153
S-convex, 40
Schatten class, 321
Schatten 2-norm, 7
Schatten p-norm, 92, 297, 298
Schur basis, 5
Schur-concave, 44, 46, 53
Schur-convex, 40, 41, 44
Schur-convexity, and convexity, 46
Schur-convexity, for differentiable functions, 45
Schur product, 23, 124
Schur's Theorem, 23, 35, 47, 51, 74
  converse of, 55
Schur Triangular Form, 5
Schwartz space, 219
second derivative, 313 second divided difference, 128 selfadjoint , 4 sesquilinear functional , 12 signature of a permutation, 16 similarity orbit, 189 similarity transformations, 1 02 singular value decomposition, 6 singular values, 5 inequalities, 94 majorisation, 157 of products, 71 perturbation of, 78 singular vectors, 6 perturbation of, 2 1 5 sinO theorem, 224 skewHermitian, 4, 1 55 skewsymmetric, 24 1 smooth approximate identities, 146 spectral radius, 9, 102, 253, 256, 269 spectral radius formula, 204 spectral resolution, 57 Spectral Theorem, 5 square root, 297, 30 1 of a positive matrix, 5 principal branch, 143 Stieltjes inversion formula, 1 39 strictly isotone, 4 1 strictly positive, 4 strongly isotone, 41 strongly nonsingular, 319 subadditive, 53 subspaces, 20 1 Sylvester equation, 194, 203, 222, 223, 234 condition for a unique solution, 203 norm of the solution, 208 solution of, 204, 205, 206, 207, 208 symmetry classes of tensors , 17 symmetric gauge function, 44, 52, 86 , 90, 260
  quadratic, 89
symmetric matrix, 241
symmetric norm, 94
symmetric tensor power, 16, 18
symmetric tensor product, 16
symplectic Lie algebra, 241
symplectic Lie group, 241
T-transform, 33
tangent space, 167, 305
tangent vector, 166
tanΘ theorem, 222
Taylor's Theorem, 303, 307, 315
tempered distribution, 219
tensor product, 222
  construction, 12
  inner product on, 14
  of operators, 14
  of spaces, 13
  orthonormal basis for, 14
Toeplitz-Hausdorff Theorem, 8, 20
trace, 25, 253, 269
  of a vector, 29
trace inequality, 258, 261, 279, 281
trace norm, 92, 173
trace-preserving, 32
triangle inequality for the matrix absolute value, 74
tridiagonal matrix, 60
twice differentiable map, 313
Tychonoff's Theorem, 132
T-length of a path, 175
T-optimal matching distance, 173
unital, 32
unitary approximant, 276
unitary conjugation, 102, 166
unitary factors, 307
unitary group, 7
unitary invariance, 7
unitary matrix, 4, 162, 178
unitary orbit, 166
unitary part, 6, 82, 213, 305
unitary-stochastic, 35
unitarily equivalent, 5
unitarily invariant function norm, 104
unitarily invariant norm, 91, 93
unitarily similar, 5
unordered n-tuples, 153
  metric on, 153
  quotient topology on, 153
van der Waerden conjecture, 27
variance, 44
weak submajorisation, 30
weak supermajorisation, 30
weakly unitarily invariant norm, 102, 109
Weyl's inequalities, 62, 64
Weyl's Majorant Theorem, 42, 73, 254, 279
  converse of, 55
Weyl's Monotonicity Theorem, 63
Weyl's Monotonicity Principle, 100, 291, 292
Weyl's Perturbation Theorem, 63, 71, 99, 152, 240
Wielandt's Minimax Principle, 67
Wigner-Yanase-Dyson conjecture, 274
wui norm, 102, 173, 177, 190
wui seminorm, 102

Notations
a ∨ b, 30
a ∧ b, 30
A*, 4
A ≥ 0, 4
A ≥ B, 4
A^{1/2}, 5
A ⊗ B, 14
A^{[k]}, 19
A ∘ B, 23
A ≤ B, 112
A ≤_L B, 238
A^T, 241
𝒜, 204
ℬ, 204
C(A), 50
C(S), 104
c_sym, 52
C · U, 170
cond(S), 232
det A, 3
d(λ, μ), 52
Df(A), 124
D, 135
d(σ(A), σ(B)), 160
d_T(σ(A), σ(B)), 173
d(Root f, Root g), 230
Δ(A), 245
Δ_v(A), 246
Df(A), 301
‖Df(A)‖, 301
|||Df(A)|||, 301
a_+(n), 307
a_re(n), 307
Df(u), 310
D²f(u), 313
diag(A), 35
E_u, 16
e, 29
e_1, 29
Eig A, 63
Eig^↓(A), 63
Eig^↑(A), 63
Eig_a(A), 158
Eig^{|↓|}(A), 158
Eig^{|↑|}(A), 158
f(x), 40
f^[1], 123
f^[2], 128
f̂, 206
φ, 44
φ(x), 44
φ^(k)(x), 45
Φ^(p)(x), 89
Φ^(p²), 89
Φ^(p)_{1/2}, 89
h(σ(A), σ(B)), 160
h(Root f, Root g), 231
Im A, 6
ℋ*, 14
ℋ(n), 167
ℒ, 269
ℒ(V, W), 3
ℒ(ℋ), 4
ℒ₂(X, Y), 313
Λ(A), 50
λ₁(A), 57
λ_j(A), 58
λ_j^↓(A), 58
λ_j^↑(A), 58
λ₁(T), 256
ℓ_T(·), 175
m_A(λ), 162
M(n), 91
𝒩, 169
N(·), 16
span{v₁, ..., v_k}, 65
s(L, M), 160
σ(A), 160
s(σ(A), σ(B)), 160
sgn x, 217
S_n, 165
S^⊥, 167
T⁺, 99
T⁻, 99
T(A), 102
T_A 𝒰_A, 167
T_A 𝒪_A, 189
Θ(ℰ, ℱ), 201
Θ(f, g), 230
T_U U(n), 305
T, 259
tr, 29
u*v, 2
U(n), 7
U_B, 166
v + w, 65
v − w, 65
w(A), 8
W(A), 8
x ⊗ y, 12
x₁ ∧ ··· ∧ x_k, 16
x₁ ∨ ··· ∨ x_k, 16
x^↓, 28
x^↑, 28
x ≺ y, 28
x ≺_w y, 30
x ≺^w y, 30
x ≺_s y, 180
x ∨ y, 30
x ∧ y, 30
x⁺, 30
⟨u, v⟩, 2
Z(A), 167
[K, A], 167
⊕_j ℋ_j, 11
⊗^k ℋ, 14
⊗^k A, 15
∧^k ℋ, 16
∨^k ℋ, 16
∧^k A, 18
∨^k A, 18
|A|, 5
| · |, 29
|x|, 30
‖x‖_p, 84
‖x‖_∞, 84
‖x‖_1, 86
‖x‖_(k), 86
‖A‖, 6
‖A‖₂, 7
|||A|||, 91
‖A‖_(k), 35
‖A‖_p, 92
‖A‖_∞, 92
‖A‖₁, 92
||| · |||_Φ, 95
||| · |||′, 96