Matrix Algebra and Its Applications to Statistics and Econometrics
C. Radhakrishna Rao, Pennsylvania State University, USA
M. Bhaskara Rao North Dakota State University, USA
World Scientific Singapore· New Jersey· London· Hong Kong
Published by
World Scientific Publishing Co. Pte. Ltd. P O Box 128, Farrer Road, Singapore 912805 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Cataloging-in-Publication Data
Rao, C. Radhakrishna (Calyampudi Radhakrishna), 1920-
Matrix algebra and its applications to statistics and econometrics / C. Radhakrishna Rao and M. Bhaskara Rao.
p. cm. Includes bibliographical references and index.
ISBN 9810232683 (alk. paper)
1. Matrices. 2. Statistics. 3. Econometrics. I. Bhaskara Rao, M.
QA188.R36 1998  512.9'434--dc21  98-5596 CIP
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
First published 1998 Reprinted 2001, 2004
Copyright © 1998 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
Printed in Singapore by Utopia Press Pte Ltd
To our wives
BHARGAVI (Mrs. C.R. Rao) and JAYASRI (Mrs. M.B. Rao)
PREFACE
Matrix algebra and matrix computations have become essential prerequisites for study and research in many branches of science and technology. It is also of interest to know that statistical applications motivated new lines of research in matrix algebra, some examples of which are generalized inverses of matrices, matrix approximations, generalizations of Chebychev and Kantorovich type inequalities, stochastic matrices, generalized projectors, Petrie matrices and limits of eigenvalues of random matrices. The impact of linear algebra on statistics and econometrics has been so substantial, in fact, that a number of books devoted entirely to matrix algebra oriented towards applications in these two subjects are now available. It has also become a common practice to devote one chapter or a large appendix to matrix calculus in books on mathematical statistics and econometrics.

Although there is a large number of books devoted to matrix algebra and matrix computations, most of them are somewhat specialized in character. Some of them deal with purely mathematical aspects and do not give any applications. Others discuss applications using limited matrix theory. We have attempted to bridge the gap between the two types. We provide a rigorous treatment of matrix theory and discuss a variety of applications, especially in statistics and econometrics.

The book is aimed at different categories of readers: graduate students in mathematics who wish to study matrix calculus and get acquainted with applications in other disciplines, graduate students in statistics, psychology, economics and engineering who wish to concentrate on applications, and research workers who wish to know the current developments in matrix theory for possible applications in other areas.

This book provides a self-contained, updated and unified treatment of the theory and applications of matrix methods in statistics and econometrics. All the standard results and the current developments, such as the generalized inverse of matrices, matrix approximations, matrix
differential calculus and matrix decompositions, are brought together to produce a most comprehensive treatise to serve both as a text in graduate courses and a reference volume for research students and consultants. It has a large number of examples from different applied areas and numerous results as complements to illustrate the ubiquity of matrix algebra in scientific and technological investigations. It has 16 chapters with the following contents.

Chapter 1 introduces the concept of vector spaces in a very general setup. All the mathematical ideas involved are explained and numerous examples are given. Of special interest is the construction of orthogonal latin squares using concepts of vector spaces. Chapter 2 specializes to unitary and Euclidean spaces, which are vector spaces in which distances and angles between vectors are defined. They play a special role in applications. Chapter 3 discusses linear transformations and matrices. The notion of a transformation from one vector space to another is introduced and the operational role of matrices for this purpose is explained. Thus matrices are introduced in a natural way and the relationship between transformations and matrices is emphasized throughout the rest of the book.

Chapters 4, 5, 6 and 7 cover all aspects of matrix calculus. Special mention may be made of theorems on rank of matrices, factorization of matrices, eigenvalues and eigenvectors, matrix derivatives and projection operators. Chapter 8 is devoted to generalized inverses of matrices, a new area in matrix algebra which has been found to be a valuable tool in developing a unified theory of linear models in statistics and econometrics. Chapters 9, 10 and 11 discuss special topics in matrix theory which are useful in solving optimization problems. Of special interest are inequalities on singular values of matrices and norms of matrices, which have applications in almost all areas of science and technology.

Chapters 12 and 13 are devoted to the use of matrix methods in the estimation of parameters in univariate and multivariate linear models. Concepts of quadratic subspaces and new strategies of solving linear equations are introduced to provide a unified theory and computational techniques for the estimation of parameters. Some modern developments in regression theory such as total least squares, estimation of parameters in mixed linear models and minimum norm quadratic estimation are discussed in detail using matrix methods. Chapter 14
deals with inequalities which are useful in solving problems in statistics and econometrics. Chapter 15 is devoted to non-negative matrices and the Perron-Frobenius theorem, which are essential for study and research in econometrics, game theory, decision theory and genetics. Some miscellaneous results not covered in the main themes of previous chapters are put together in Chapter 16.

It is a pleasure to thank Marina Tempelman for her patience in typing numerous revisions of the book.
March 1998 C.R. Rao M.B. Rao
NOTATION
The following symbols are used throughout the text to indicate certain elements and the operations based on them.
Scalars
R                   real numbers
C                   complex numbers
F                   general field of elements
x = x1 + ix2        a complex number
x̄ = x1 − ix2        conjugate of x
|x| = (x1² + x2²)^(1/2)   modulus of x

General

{an}                a sequence of elements
A, B, ...           sets of elements
A ⊂ B               set A is contained in set B
x ∈ A               x is an element of set A
A + B               {x1 + x2 : x1 ∈ A, x2 ∈ B}
A ∪ B               {x : x ∈ A and/or x ∈ B}
A ∩ B               {x : x ∈ A and x ∈ B}

Vector Spaces

(V, F)              vector space over field F
dim V               dimension of V
a1, a2, ...         vectors in V
Sp(a1, ... , ak)    the set {α1 a1 + ... + αk ak : α1, ... , αk ∈ F}
F^n                 n-dimensional coordinate (Euclidean) space
R^n                 same as F^n with F = R
C^n                 same as F^n with F = C
V ⊕ W               direct sum, {x + y : x ∈ V, y ∈ W; V ∩ W = {0}}
< ., . >            inner product
(., .)              semi-inner product

Transformations

T : V → W           transformation from space V to space W
R(T)                the range of T, i.e., the set {Tx : x ∈ V}
K(T)                the kernel of T, i.e., the set {x ∈ V : Tx = 0}
ν(T)                nullity (dimension of K(T))

Matrices

A, B, C, ...        general matrices or linear transformations
A (m × n)           m × n order matrix
M_m,n               the class of matrices with m rows and n columns
M_m,n(·)            m × n order matrices with specified property (·)
M_n                 the class of matrices with n rows and n columns
A = [a_ij]          a_ij is the (i, j)-th entry of A (i-th row and j-th column)
A ∈ M_m,n           A is a matrix with m rows and n columns
Sp(A)               the vector space spanned by the column vectors of A, also
                    indicated by R(A), considering A as a transformation
Ā                   ā_ij is the complex conjugate of a_ij
A'                  obtained from A by interchanging rows and columns, i.e.,
                    if A = (a_ij) then A' = (a_ji)
A* = Ā'             conjugate transpose of A
A* = A              Hermitian or self-adjoint
A*A = AA* = I       unitary
A*A = AA*           normal
A#                  adjoint (A ∈ M_m,n, < Ax, z >_m = < x, A#z >_n)
A^(−1)              inverse of A ∈ M_n such that AA^(−1) = A^(−1)A = I
A^−                 generalized or g-inverse of A ∈ M_m,n (AA^−A = A)
A^+                 Moore-Penrose inverse
A_LMN               Rao-Yanai (LMN) inverse
I_n                 identity matrix of order n, with all diagonal elements
                    unity and the rest zero
I                   identity matrix when the order is implicit
0                   zero scalar, vector or matrix
ρ(A)                rank of matrix A
ρ_σ(A)              spectral radius of A
vec A               vector of order mn formed by writing the columns of
                    A ∈ M_m,n one below the other
(a1 | ... | an)     matrix partitioned by column vectors a1, ... , an
[A1 | A2]           matrix partitioned by two matrices A1 and A2
tr A                trace of A, the sum of diagonal elements of A ∈ M_n
|A| or det A        determinant of A
A · B               Hadamard-Schur product
A ⊗ B               Kronecker product
A ⊙ B               Khatri-Rao product
A ∘ B               matrix with < b_i, a_j > as the (j, i)-th entry, where
                    A = (a1 | ... | an), B = (b1 | ... | bn)
‖x‖                 norm of vector x
‖x‖_e               semi-norm of vector x
‖A‖                 norm or matrix norm of A
‖A‖_F               Frobenius norm of A = [tr(A*A)]^(1/2)
‖A‖_in              induced matrix norm: max ‖Ax‖ for ‖x‖ = 1
‖A‖_s               spectral norm of A
‖A‖_ui              unitarily invariant norm, ‖U*AV‖ = ‖A‖ for all unitary
                    U and V, A ∈ M_m,n
‖A‖_wui             weakly unitarily invariant norm, ‖U*AU‖ = ‖A‖ for all
                    unitary U, A ∈ M_n
‖A‖_MNi             M, N invariant norm
m(A)                matrix obtained from A = (a_ij) by replacing a_ij by
                    |a_ij|, the modulus of the number a_ij ∈ C
pd                  positive definite matrix (x*Ax > 0 for x ≠ 0)
nnd                 non-negative definite matrix (x*Ax ≥ 0)
s.v.d.              singular value decomposition
B ≤_L A             or simply B ≤ A, to indicate that A − B is nnd
                    (Löwner partial order)
x ≤_e y             x_i ≤ y_i, i = 1, ... , n, where x' = (x1, ... , xn) and
                    y' = (y1, ... , yn)
B ≤_e A             entry-wise inequality b_ij ≤ a_ij, A = (a_ij), B = (b_ij)
A ≥_e 0             non-negative matrix (all elements are non-negative)
A >_e 0             positive matrix (all elements are positive)
y ≺ x               vector x majorizes vector y
y ≺_w x             vector x weakly majorizes vector y
y ≺_s x             vector x soft majorizes vector y
{λ_i(A)}            eigenvalues of A ∈ M_n, [λ1(A) ≥ ... ≥ λn(A)]
{σ_i(A)}            singular values of A ∈ M_m,n, [σ1(A) ≥ ... ≥ σ_r(A)],
                    r = min{m, n}
CONTENTS
Preface
Notation
CHAPTER 1. VECTOR SPACES
1.1 Rings and Fields
1.2 Mappings
1.3 Vector Spaces
1.4 Linear Independence and Basis of a Vector Space
1.5 Subspaces
1.6 Linear Equations
1.7 Dual Space
1.8 Quotient Space
1.9 Projective Geometry

CHAPTER 2. UNITARY AND EUCLIDEAN SPACES
2.1 Inner Product
2.2 Orthogonality
2.3 Linear Equations
2.4 Linear Functionals
2.5 Semi-inner Product
2.6 Spectral Theory
2.7 Conjugate Bilinear Functionals and Singular Value Decomposition

CHAPTER 3. LINEAR TRANSFORMATIONS AND MATRICES
3.1 Preliminaries
3.2 Algebra of Transformations
3.3 Inverse Transformations
3.4 Matrices

CHAPTER 4. CHARACTERISTICS OF MATRICES
4.1 Rank and Nullity of a Matrix
4.2 Rank and Product of Matrices
4.3 Rank Factorization and Further Results
4.4 Determinants
4.5 Determinants and Minors

CHAPTER 5. FACTORIZATION OF MATRICES
5.1 Elementary Matrices
5.2 Reduction of General Matrices
5.3 Factorization of Matrices with Complex Entries
5.4 Eigenvalues and Eigenvectors
5.5 Simultaneous Reduction of Two Matrices
5.6 A Review of Matrix Factorizations

CHAPTER 6. OPERATIONS ON MATRICES
6.1 Kronecker Product
6.2 The Vec Operation
6.3 The Hadamard-Schur Product
6.4 Khatri-Rao Product
6.5 Matrix Derivatives

CHAPTER 7. PROJECTORS AND IDEMPOTENT OPERATORS
7.1 Projectors
7.2 Invariance and Reducibility
7.3 Orthogonal Projection
7.4 Idempotent Matrices
7.5 Matrix Representation of Projectors

CHAPTER 8. GENERALIZED INVERSES
8.1 Right and Left Inverses
8.2 Generalized Inverse (g-inverse)
8.3 Geometric Approach: LMN-inverse
8.4 Minimum Norm Solution
8.5 Least Squares Solution
8.6 Minimum Norm Least Squares Solution
8.7 Various Types of g-inverses
8.8 G-inverses Through Matrix Approximations
8.9 Gauss-Markov Theorem

CHAPTER 9. MAJORIZATION
9.1 Majorization
9.2 A Gallery of Functions
9.3 Basic Results

CHAPTER 10. INEQUALITIES FOR EIGENVALUES
10.1 Monotonicity Theorem
10.2 Interlace Theorems
10.3 Courant-Fischer Theorem
10.4 Poincaré Separation Theorem
10.5 Singular Values and Eigenvalues
10.6 Products of Matrices, Singular Values, and Horn's Theorem
10.7 Von Neumann's Theorem

CHAPTER 11. MATRIX APPROXIMATIONS
11.1 Norm on a Vector Space
11.2 Norm on Spaces of Matrices
11.3 Unitarily Invariant Norms
11.4 Some Matrix Optimization Problems
11.5 Matrix Approximations
11.6 M, N-invariant Norm and Matrix Approximations
11.7 Fitting a Hyperplane to a Set of Points

CHAPTER 12. OPTIMIZATION PROBLEMS IN STATISTICS AND ECONOMETRICS
12.1 Linear Models
12.2 Some Useful Lemmas
12.3 Estimation in a Linear Model
12.4 A Trace Minimization Problem
12.5 Estimation of Variance
12.6 The Method of MINQUE: A Prologue
12.7 Variance Components Models and Unbiased Estimation
12.8 Normality Assumption and Invariant Estimators
12.9 The Method of MINQUE
12.10 Optimal Unbiased Estimation
12.11 Total Least Squares

CHAPTER 13. QUADRATIC SUBSPACES
13.1 Basic Ideas
13.2 The Structure of Quadratic Subspaces
13.3 Commutators of Quadratic Subspaces
13.4 Estimation of Variance Components

CHAPTER 14. INEQUALITIES WITH APPLICATIONS IN STATISTICS
14.1 Some Results on nnd and pd Matrices
14.2 Cauchy-Schwartz and Related Inequalities
14.3 Hadamard Inequality
14.4 Hölder's Inequality
14.5 Inequalities in Information Theory
14.6 Convex Functions and Jensen's Inequality
14.7 Inequalities Involving Moments
14.8 Kantorovich Inequality and Extensions

CHAPTER 15. NON-NEGATIVE MATRICES
15.1 Perron-Frobenius Theorem
15.2 Leontief Models in Economics
15.3 Markov Chains
15.4 Genetic Models
15.5 Population Growth Models

CHAPTER 16. MISCELLANEOUS COMPLEMENTS
16.1 Simultaneous Decomposition of Matrices
16.2 More on Inequalities
16.3 Miscellaneous Results on Matrices
16.4 Toeplitz Matrices
16.5 Restricted Eigenvalue Problem
16.6 Product of Two Rayleigh Quotients
16.7 Matrix Orderings and Projection
16.8 Soft Majorization
16.9 Circulants
16.10 Hadamard Matrices
16.11 Miscellaneous Exercises
REFERENCES
INDEX
CHAPTER 1

VECTOR SPACES

The use of matrix theory is now widespread in both physical and social sciences. The theory of vector spaces and transformations (of which matrices are a special case) have not, however, found a prominent place, although they are more fundamental and offer a better understanding of applied problems. The concept of a vector space is essential in the discussion of topics such as the theory of games, economic behavior, prediction in time series, and the modern treatment of univariate and multivariate statistical methods.
1.1. Rings and Fields

Before defining a vector space, we briefly recall the concepts of groups, rings and fields. Consider a set G of elements with one binary operation defined on them. We call this operation multiplication. If α and β are two elements of G, the binary operation gives an element of G denoted by αβ. The set G is called a group if the following hold:

(g1) α(βγ) = (αβ)γ for every α, β and γ in G (associative law).
(g2) The equations αy = β and yα = β have unique solutions for y for all α and β in G.

From these axioms, the following propositions (P) follow. (We use the symbol P for any property, proposition or theorem. The first two digits after P denote the section number.)
P 1.1.1 There exists a unique element, which we denote by 1 (the unit element of G), such that α1 = α and 1α = α for every α in G.
P 1.1.2 For every α in G, there exists a unique element, which we denote by α⁻¹ (multiplicative inverse of α, or simply, the inverse of α), such that αα⁻¹ = α⁻¹α = 1.
A group G is said to be commutative if αβ = βα for every α and β in G. If the group is commutative, it is customary to call the binary operation addition and use the addition symbol + for the binary operation on the elements of G. The unit element of G is then called the zero element of G and is denoted by the symbol 0. The inverse of any element α in G is denoted by −α. A commutative group is also called an abelian group. A simple example of an abelian group is the set of all real numbers with the binary operation being the usual addition of real numbers. Another example of an abelian group is the set G = (0, ∞), the set of all positive numbers, with the binary operation being the usual multiplication of real numbers. We will present more examples later.

A subgroup of a group G is any subset H of G with the property that αβ ∈ H whenever α, β ∈ H. A subgroup is a group in its own right under the binary operation of G restricted to H. If H is a subgroup of a group G and x ∈ G, then xH = {xy : y ∈ H} is called a left coset of H. If x ∈ H, then xH = H. If x1H and x2H are two left cosets, then either x1H = x2H or x1H ∩ x2H = ∅. A right coset Hx is also defined analogously. A subgroup H of a group G is said to be invariant if xH = Hx for all x ∈ G. Let H be an invariant subgroup of a group G. Let G/H be the collection of all distinct cosets of H. One can introduce multiplication between elements of G/H. If H1 and H2 are two cosets, define H1H2 = {αβ : α ∈ H1 and β ∈ H2}. Under this binary operation, G/H is a group. Its unit element is H. The group G/H is called the quotient group of G modulo H. It can also be shown that the union of all cosets is G. More concretely, the cosets of H form a partition of G.

There is a nice connection between finite groups and latin squares. Let us give a formal definition of a latin square.

DEFINITION 1.1.3. Let T be a set of n elements.
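To make the coset construction concrete, here is a small illustrative sketch of ours (not part of the original text), using the additive group Z_12 and the subgroup H = {0, 4, 8}; the names coset and the choice of group are our own.

    # Illustration: the distinct left cosets of a subgroup partition the group.
    G = set(range(12))                 # elements of the additive group Z_12
    H = {0, 4, 8}                      # a subgroup under addition mod 12

    def coset(x, H, n=12):
        """Left coset x + H in the additive group Z_n."""
        return frozenset((x + h) % n for h in H)

    cosets = {coset(x, H) for x in G}  # keep only the distinct cosets
    print(sorted(sorted(c) for c in cosets))
    # [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]

    # The cosets are pairwise disjoint and their union is G, so they form a
    # partition of G; the quotient group G/H has 12/3 = 4 elements here.
    assert set().union(*cosets) == G
    assert sum(len(c) for c in cosets) == len(G)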
A latin square of order n based on T is a square grid L = (t_ij), 1 ≤ i ≤ n, 1 ≤ j ≤ n, of n² elements arranged in n rows and n columns such that

(1) t_ij ∈ T for every i and j,
(2) each element of T appears exactly once in each row,
(3) each element of T appears exactly once in each column.
In a statistical context, T is usually the set of treatments which we wish to compare for their effects over a certain population of experimental units. We select n 2 experimental units arranged in n rows and n columns. The next crucial step is the allocation of treatments to experimental units. The latin square arrangement of treatments is one way of allocating the treatments to experimental units. This arrangement will enable us to compare the effects of any pair of treatments, rows, and columns. Latin squares are quite common in parlor games. One of the problems is to arrange the kings (K), queens (Q), jacks (J) and aces (A) of a pack of cards in the form of a 4 X 4 grid so that each row and each column contains one from each rank and each suit. If we denote spades by S, hearts by H, diamonds by D and clubs by C, the following is one such arrangement.
SA CQ DJ HK
DK HJ SQ CA
HQ DA CK SJ
CJ SK HA DQ
The above arrangement is a superimposition of two latin squares. The suits and ranks each form a latin square of order 4. We now spell out the connection between finite groups and latin squares.

P 1.1.4 Let G be any group with finitely many elements. Then the table of the group operation on the elements of G constitutes a latin square of order n on G.

PROOF. Assume that G has n elements. Let G = {α1, α2, ... , αn}. Assume, without loss of generality, that the group is commutative with the group operation denoted by +. Let us consider a square grid of size n × n, where the rows and columns are each indexed by α1, α2, ... , αn and the entry located in the i-th row and j-th column is given by αi + αj. This is precisely the table of the group operation. We claim that no two elements in each row are identical. Suppose not. If αi + αj = αi + αk for some 1 ≤ i, j, k ≤ n and j ≠ k, then αj = αk. This is a contradiction. Similarly, one can show that no two elements in each column are identical.
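As a quick computational check of P 1.1.4, the following sketch (ours, not the book's) builds the addition table of Z_n and verifies the latin square property; the helper names group_table and is_latin_square are our own.

    # Sketch: the group table of (Z_n, +) is a latin square of order n (P 1.1.4).
    def group_table(n):
        """Addition table of Z_n: entry (i, j) is (i + j) mod n."""
        return [[(i + j) % n for j in range(n)] for i in range(n)]

    def is_latin_square(square):
        """Each symbol occurs exactly once in every row and every column."""
        n = len(square)
        symbols = set(range(n))
        rows_ok = all(set(row) == symbols for row in square)
        cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
        return rows_ok and cols_ok

    for n in range(2, 8):
        assert is_latin_square(group_table(n))
    print("group tables of Z_2, ..., Z_7 are latin squares")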
It is not difficult to construct latin squares on any n symbols. But it is nice to know that the group table of any finite group gives a latin square. However, it is not true that every latin square arises from a group table. We will talk more about latin squares when we discuss fields later.

We now turn our attention to rings. Let K be a set equipped with two binary operations, which we call addition and multiplication. The set K is said to be a ring if the following hold:

(1) With respect to addition, K is an abelian group.
(2) With respect to multiplication, the associative law holds, i.e., α(βγ) = (αβ)γ for every α, β and γ in K.
(3) The multiplication is distributive with respect to addition, i.e., α(β + γ) = αβ + αγ for every α, β and γ in K.

If the multiplication operation in the ring K is commutative, then K is called a commutative ring. As a simple example, let K = {0, 1, 2, 3, 4, 5, 6}. The addition and multiplication on K are the usual addition and multiplication of real numbers but modulo 7. Then K is a commutative ring. Let Z be the set of all integers with the usual operations of addition and multiplication. Then Z is a commutative ring.

Finally, we come to the definition of a field. Let F be a set with the operations of addition and multiplication (two binary operations) satisfying the following:

(1) With respect to the addition, F is an abelian group.
(2) With respect to the multiplication, F − {0} is an abelian group.
(3) Multiplication is distributive with respect to addition, i.e., α(β + γ) = αβ + αγ for every α, β and γ in F.

The members of a field F are called scalars. Let Q be the set of all rational numbers, R the set of all real numbers, and C the set of all complex numbers. The sets Q, R and C are standard examples of a field. The reader may verify the following from the properties of a field.

P 1.1.5 If α + β = α + γ for α, β and γ in F, then β = γ.

P 1.1.6 (−1)α = −α for any α in F.

P 1.1.7 0α = 0 for any α in F.

P 1.1.8 If α ≠ 0 and β are any two scalars, then there exists a unique scalar x such that αx = β. In fact, x = α⁻¹β, which we may also write as β/α.

P 1.1.9 If αβ = 0 for some α and β in F, then at least one of α and β is zero.

Another way of characterizing a field is that it is a commutative ring in which there is a unit element with respect to multiplication and any non-zero element has a multiplicative inverse. In the commutative ring K = {0, 1, 2, 3} with addition and multiplication modulo 4, there are elements α and β, neither of which is zero, and yet αβ = 0. In a field, αβ = 0 implies that at least one of α and β is zero.
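The contrast between the ring Z_4 and a field can be checked directly; the short sketch below is our own illustration (the function names are ours) and simply searches for zero divisors and multiplicative inverses.

    # Sketch: Z_4 has zero divisors (2 * 2 = 0 mod 4), so it is not a field,
    # while every non-zero element of Z_5 has a multiplicative inverse.
    def zero_divisors(n):
        return sorted({a for a in range(1, n) for b in range(1, n) if (a * b) % n == 0})

    def inverses(n):
        inv = {}
        for a in range(1, n):
            for b in range(1, n):
                if (a * b) % n == 1:
                    inv[a] = b
                    break
        return inv

    print(zero_divisors(4))   # [2]
    print(inverses(4))        # {1: 1, 3: 3}  -- the element 2 has no inverse
    print(zero_divisors(5))   # []
    print(inverses(5))        # {1: 1, 2: 3, 3: 2, 4: 4}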
EXAMPLE 1.1.10. Let p be any positive integer. Let F = {0, 1, 2, ... , p − 1}. Define addition in F by α + β = α + β (modulo p) for α and β in F. Define multiplication in F by αβ = αβ (modulo p) for α and β in F. More precisely, define addition and multiplication in F by

α + β = α + β          if α + β ≤ p − 1,
α + β = α + β − p      if α + β > p − 1;
αβ = αβ                if αβ ≤ p − 1,
αβ = γ                 if αβ = rp + γ for some integers r ≥ 1 and 0 ≤ γ ≤ p − 1.

If p is a prime number, then F is a field.

EXAMPLE 1.1.11. Let F = {0, 1, α, β}, and let addition and multiplication on the elements of F be as in the following tables.
Addition table

 +  |  0  1  α  β
----+------------
 0  |  0  1  α  β
 1  |  1  0  β  α
 α  |  α  β  0  1
 β  |  β  α  1  0

Multiplication table

 ×  |  0  1  α  β
----+------------
 0  |  0  0  0  0
 1  |  0  1  α  β
 α  |  0  α  β  1
 β  |  0  β  1  α
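For readers who want to check such tables mechanically, here is a brute-force sketch of ours (not from the book) that verifies the defining field properties of the two tables above; we write a and b in the code for the elements α and β.

    # Sketch: check that the addition and multiplication tables of F = {0, 1, a, b}
    # given above define a field (this is the Galois field with 4 elements).
    F = ['0', '1', 'a', 'b']
    add = {('0','0'):'0', ('0','1'):'1', ('0','a'):'a', ('0','b'):'b',
           ('1','0'):'1', ('1','1'):'0', ('1','a'):'b', ('1','b'):'a',
           ('a','0'):'a', ('a','1'):'b', ('a','a'):'0', ('a','b'):'1',
           ('b','0'):'b', ('b','1'):'a', ('b','a'):'1', ('b','b'):'0'}
    mul = {('0','0'):'0', ('0','1'):'0', ('0','a'):'0', ('0','b'):'0',
           ('1','0'):'0', ('1','1'):'1', ('1','a'):'a', ('1','b'):'b',
           ('a','0'):'0', ('a','1'):'a', ('a','a'):'b', ('a','b'):'1',
           ('b','0'):'0', ('b','1'):'b', ('b','a'):'1', ('b','b'):'a'}

    # commutativity, associativity and distributivity, checked exhaustively
    for x in F:
        for y in F:
            assert add[x, y] == add[y, x] and mul[x, y] == mul[y, x]
            for z in F:
                assert add[add[x, y], z] == add[x, add[y, z]]
                assert mul[mul[x, y], z] == mul[x, mul[y, z]]
                assert mul[x, add[y, z]] == add[mul[x, y], mul[x, z]]

    # additive inverses for all elements, multiplicative inverses for non-zero ones
    assert all(any(add[x, y] == '0' for y in F) for x in F)
    assert all(any(mul[x, y] == '1' for y in F) for x in F if x != '0')
    print("the tables define a field with 4 elements")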
The binary operations so defined above on F make F a field. Finite fields, i.e., fields consisting of a finite number of elements, are called Galois fields. One of the remarkable results on Galois fields is that the number of elements in any Galois field is p^m for some prime number p and positive integer m. Example 1.1.10 is a description of the Galois field GF(p), where p is a prime number. Example 1.1.11 is a description of the Galois field GF(2²). As one can see, the description of GF(p) with p being a prime number is easy to provide. But when it comes to describing GF(p^m) with p being prime and m ≥ 2, additional work is needed. Some methods for construction of such fields are developed in papers by Bose, Chowla, and Rao (1944, 1945a, 1945b). See also Mann (1949) for the use of GF(p^m) in the construction of designs.

Construction of orthogonal latin squares and magic squares are two of the benefits that accrue from a study of finite fields. Let us start with some definitions.

DEFINITION 1.1.12. Let L1 and L2 be two latin squares, each on a set of n symbols. They are said to be orthogonal if, when we superimpose one latin square upon the other, every ordered pair of symbols occurs exactly once in the composite square. The following are two latin squares, one on the set S1 = {S, H, D, C} and the other on the set S2 = {K, Q, J, A}.
L1 :   S  D  H  C        L2 :   Q  J  K  A
       C  H  D  S               J  Q  A  K
       D  S  C  H               K  A  Q  J
       H  C  S  D               A  K  J  Q
The latin squares L1 and L2 are orthogonal. Way back in 1779, Leonhard Euler posed the following famous problem. There are 36 officers of six different ranks with six officers from each rank. They also come from six different regiments, with each regiment contributing six officers. Euler conjectured that it is impossible to arrange these officers in a 6 × 6 grid so that each row and each column contains one officer from each regiment and one from each rank. In terms of the notation introduced above, can one build a latin square L1 on the set of regiments and a latin square L2 on the set of ranks such that L1 and L2 are orthogonal? By an exhaustive enumeration, it has been found that Euler was right. But if n > 6, one can always find a pair of orthogonal latin squares, as shown
by Bose, Shrikhande and Parker (1960). In the example presented after Definition 1.1.3, the suits are the regiments, the kings, queens, jacks and aces are the ranks, and n = 4. The problem of finding pairs of orthogonal latin squares has some statistical relevance. Suppose we want to compare the effect of some m dose levels of a drug, Drug A say, in combination with some m levels of another drug, Drug B say. Suppose we have m² experimental units classified according to two attributes C and D, each at m levels. The attribute C, for example, might refer to m different age groups of experimental units and the attribute D might refer to m different social groups. The basic problem is how to assign the n = m² drug combinations to the experimental units in such a way that the drug combinations and the cross-classified experimental units constitute a pair of orthogonal latin squares. If such an arrangement is possible, it is called a graeco-latin square. As an illustration, consider the following example. Suppose Drug A is to be applied at two levels: High (A1) and Low (A2), and Drug B at two levels: High (B1) and Low (B2). The four drug combinations constitute the first set S1 of symbols, i.e.,

S1 = {A1B1, A1B2, A2B1, A2B2},
for which a latin square L1 is sought with n = 4. Suppose the attribute C has two age groups: C1 (≤ 40 years old) and C2 (> 40 years old), and D has two groups: D1 (White) and D2 (Black). The second latin square L2 is to be built on the set

S2 = {C1D1, C1D2, C2D1, C2D2}.
Choosing L1 and L2 to be orthogonal confers a distinct statistical advantage. Comparisons can be made between the levels of each drug and attribute. The concept of orthogonality between a pair of latin squares can be extended to any finite number of latin squares.

DEFINITION 1.1.13. Let L1, L2, ... , Lm be a set of latin squares, each of order n. The set is said to be mutually orthogonal if Li and Lj are orthogonal for every i ≠ j.
The construction of a set of mutually orthogonal latin squares is of statistical importance. Galois fields provide some help in this connection. Let GF(s) be a Galois field of order s. Using the Galois field, one can construct a set of s − 1 mutually orthogonal latin squares. Let GF(s) = {α0, α1, ... , α_{s−1}} with the understanding that α0 = 0.
for 1 ::; r ::; s - 1. Then L}, L2, ... ,Ls- 1 is a set of mutually orthogonal latin squares. PROOF. First, we show that each Lr is a latin square. We claim that any two entries in any row are distinct. Consider the i-th row, and p-th and q-th elements in it with p =F q. Look at
Consequently, no two entries in any row are identical. Consider now the j-th column, and p-th and q-th entries in it with p =F q. Look at
in view of the fact that r 2: 1 and Or =f:. O. Now, we need to show that Lr and Ls are orthogonal for any r =f:. sand r, s = 1,2, ... ,s - 1. Superimpose Lr upon L 8 • Suppose (oij(r),Oij(S)) = (opq(r),opq(s)) for some 0 ::; i, j ::; s-1 and 0 ::; p, q ::; s-1. Then 0rOi +OJ = orop+Oq and 0sOi + OJ = osop + Oq. By subtracting, we obtain
or, equivalently, (Or -
Os)(Oi - op) = O.
Since r =F s, we have 0i - op = 0, or i = p. We see immediately that j = q. This shows that Lr and La are orthogonal. This completes the proof.
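For a prime s, the construction in P 1.1.14 can be carried out with ordinary arithmetic modulo s; the sketch below is our own illustration (the function names mols and orthogonal are assumptions of ours) and confirms mutual orthogonality for s = 5.

    # Sketch of P 1.1.14 for prime s: with GF(s) = {0, 1, ..., s-1} and arithmetic
    # mod s, the square L_r has (i, j) entry  a_r*a_i + a_j = r*i + j (mod s).
    def mols(s):
        """Return the s-1 squares L_1, ..., L_{s-1} of order s (s prime)."""
        return [[[(r * i + j) % s for j in range(s)] for i in range(s)]
                for r in range(1, s)]

    def orthogonal(L, M):
        """Superimposing L on M gives every ordered pair exactly once."""
        n = len(L)
        pairs = {(L[i][j], M[i][j]) for i in range(n) for j in range(n)}
        return len(pairs) == n * n

    squares = mols(5)
    assert all(orthogonal(squares[r], squares[t])
               for r in range(len(squares)) for t in range(r + 1, len(squares)))
    print("for s = 5 the 4 squares are mutually orthogonal latin squares")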
Pairs of orthogonal latin squares are useful in drawing up schedules for competitions between teams. Suppose Teams A and B, each consisting of 4 players, want to organize chess matches between members of the teams. The following are to be fulfilled.

(1) Every member of Team A plays every member of Team B.
(2) All the sixteen matches should be scheduled over a span of four days with four matches per day.
(3) Each player plays only one match on any day.
(4) On every day, each team plays an equal number of games with white and black pieces.
(5) Each player plays an equal number of games with white and black pieces.

Drawing up a 16-match schedule spread over four days fulfilling Conditions 1, 2, and 3 is not difficult. One could use a latin square on the set of days the games are to be played. The tricky part is to have the schedule fulfilling Conditions 4 and 5. A pair of orthogonal latin squares can be used to draw up a schedule of matches. Let Di stand for Day i, i = 1, 2, 3, 4. Let L1 and L2 be the pair of orthogonal latin squares on the sets
S1 = {D1, D2, D3, D4}   and   S2 = {1, 2, 3, 4},

respectively, given by

L1 :   D1  D2  D3  D4        L2 :   1  2  3  4
       D4  D3  D2  D1               3  4  1  2
       D2  D1  D4  D3               4  3  2  1
       D3  D4  D1  D2               2  1  4  3
Replace even numbers in L2 by white (W), odd numbers by black (B) and then superimpose the latin squares. The resultant composition is given by
Team A / Team B       1          2          3          4

       1          (D1, B)    (D2, W)    (D3, B)    (D4, W)
       2          (D4, B)    (D3, W)    (D2, B)    (D1, W)
       3          (D2, W)    (D1, B)    (D4, W)    (D3, B)
       4          (D3, W)    (D4, B)    (D1, W)    (D2, B)
The schedule of matches can be drawn up using the composite square.

Day    Team A players vs Team B players     Colour of pieces by Team A players

D1     1 vs 1,  2 vs 4,  3 vs 2,  4 vs 3    B  W  B  W
D2     1 vs 2,  2 vs 3,  3 vs 1,  4 vs 4    W  B  W  B
D3     1 vs 3,  2 vs 2,  3 vs 4,  4 vs 1    B  W  B  W
D4     1 vs 4,  2 vs 1,  3 vs 3,  4 vs 2    W  B  W  B
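As a check on the composite square above, the following sketch (our own illustration; Team A players index the rows, Team B players the columns, and the colour is that of Team A's pieces) regenerates the schedule and tests requirements (1)-(5).

    # Sketch: verify the chess schedule encoded by the composite square above.
    composite = [
        [("D1","B"), ("D2","W"), ("D3","B"), ("D4","W")],
        [("D4","B"), ("D3","W"), ("D2","B"), ("D1","W")],
        [("D2","W"), ("D1","B"), ("D4","W"), ("D3","B")],
        [("D3","W"), ("D4","B"), ("D1","W"), ("D2","B")],
    ]
    days = ["D1", "D2", "D3", "D4"]

    # (1)-(3): every pair meets once, four matches per day, one match per player per day
    for d in days:
        cells = [(i, j) for i in range(4) for j in range(4) if composite[i][j][0] == d]
        assert len(cells) == 4
        assert len({i for i, _ in cells}) == 4 and len({j for _, j in cells}) == 4

    # (4): on every day each team has two games with white and two with black
    for d in days:
        colours = [c for row in composite for day, c in row if day == d]
        assert colours.count("W") == 2 and colours.count("B") == 2

    # (5): each player has two whites and two blacks overall
    for i in range(4):
        row_colours = [c for _, c in composite[i]]            # Team A player i+1
        col_colours = [composite[k][i][1] for k in range(4)]  # Team B player i+1
        assert row_colours.count("W") == 2 and col_colours.count("W") == 2
    print("all five requirements are satisfied")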
This schedule fulfills all the five requirements 1 to 5 stipulated above. A pair of orthogonal latin squares can also be used to build magic squares. Let us define formally what a magic square is.
DEFINITION 1.1.15. A magic square of order n is an n × n square grid consisting of the numbers 1, 2, ... , n² such that the entries in each row, each column and each of the two main diagonals sum up to the same number.

We can determine what each row in a magic square sums up to. The sum of all integers from 1 to n² is n²(n² + 1)/2. Then each row in the magic square sums up to (1/n) n²(n² + 1)/2 = n(n² + 1)/2. The following are magic squares of orders 3 and 4:

2  9  4          16   3   2  13
7  5  3           5  10  11   8
6  1  8           9   6   7  12
                  4  15  14   1
(from an engraving of Albrecht Dürer entitled "Melancholia" (1514)). Many methods are available for the construction of magic squares. What we intend to do here is to show how a pair of orthogonal latin squares can be put to use to pull out a magic square. Let L1 = (ℓ¹_ij) and L2 = (ℓ²_ij) be two orthogonal latin squares on the set {0, 1, 2, ... , n − 1}. Let M = (m_ij) be an n × n square grid in which the entry in the i-th row and j-th column is given by

m_ij = n ℓ¹_ij + ℓ²_ij

for i, j = 1, 2, ... , n. What can we say about the numbers m_ij? Since L1 and L2 are orthogonal, every ordered pair (i, j), i, j = 0, 1, 2, ... , n − 1, occurs exactly once when we superimpose L1 upon L2. Consequently, each of the numbers 0, 1, 2, ... , n² − 1 will appear somewhere in the square grid M. We are almost there. Define a new grid M' = (m'_ij) of order n × n with m'_ij = m_ij + 1. Now each of the numbers 1, 2, ... , n² appears somewhere in the grid M'.

P 1.1.16 In the grid M', each row and each column sums up to the same number.
PROOF. Since L1 and L2 are latin squares, for any i = 1, 2, ... , n,

m_i1 + m_i2 + ... + m_in = sum of all entries in the i-th row of M
                         = n (ℓ¹_i1 + ... + ℓ¹_in) + (ℓ²_i1 + ... + ℓ²_in)
                         = n (0 + 1 + ... + (n − 1)) + (0 + 1 + ... + (n − 1))
                         = n(n − 1)n/2 + (n − 1)n/2 = n(n² − 1)/2,

which is independent of i. In a similar vein, one can show that each column of M sums up to the same number n(n² − 1)/2. Thus M' has the desired properties stipulated above.

The grid M' we have obtained above is not quite a magic square. The diagonals of M' may not sum up to the same number. We need to select the latin squares L1 and L2 carefully.
P 1.1.17 Let L1 and L2 be two orthogonal latin squares of order n, each on the same set {0, 1, 2, ... , n − 1}. Suppose that each of the two main diagonals of each of the latin squares L1 and L2 adds up to the same number (n − 1)n/2. Then the grid M' constructed above is a magic square.

PROOF. It is not hard to show that each of the diagonals of M sums up to n(n² − 1)/2. We now have M' truly a magic square.
EXAMPLE 1.1.18. In the following, L1 and L2 are two latin squares of order 5 on the set {0, 1, 2, 3, 4}. These latin squares satisfy all the conditions stipulated in Proposition 1.1.17. We follow the procedure outlined above.

L1 :   0 1 2 3 4        L2 :   0 1 2 3 4
       2 3 4 0 1               3 4 0 1 2
       4 0 1 2 3               1 2 3 4 0
       1 2 3 4 0               4 0 1 2 3
       3 4 0 1 2               2 3 4 0 1

M :    0   6  12  18  24        M' :    1   7  13  19  25
      13  19  20   1   7               14  20  21   2   8
      21   2   8  14  15               22   3   9  15  16
       9  10  16  22   3               10  11  17  23   4
      17  23   4   5  11               18  24   5   6  12
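The construction m'_ij = n ℓ¹_ij + ℓ²_ij + 1 can be replayed mechanically; the sketch below is our own illustration, using the observation that the two order-5 squares above are generated by the congruences (2i + j) mod 5 and (3i + j) mod 5, and it checks the magic square property of M'.

    # Sketch: rebuild M' from the two orthogonal latin squares of Example 1.1.18
    # via m'_ij = n * L1[i][j] + L2[i][j] + 1, and verify that M' is magic.
    n = 5
    L1 = [[(2 * i + j) % n for j in range(n)] for i in range(n)]  # rows shifted by 2i
    L2 = [[(3 * i + j) % n for j in range(n)] for i in range(n)]  # rows shifted by 3i

    M_prime = [[n * L1[i][j] + L2[i][j] + 1 for j in range(n)] for i in range(n)]
    target = n * (n * n + 1) // 2  # 65 for n = 5

    assert all(sum(row) == target for row in M_prime)                              # rows
    assert all(sum(M_prime[i][j] for i in range(n)) == target for j in range(n))   # columns
    assert sum(M_prime[i][i] for i in range(n)) == target                          # main diagonal
    assert sum(M_prime[i][n - 1 - i] for i in range(n)) == target                  # anti-diagonal
    assert sorted(x for row in M_prime for x in row) == list(range(1, n * n + 1))
    print(M_prime)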
Note that M' is a magic square of order 5.

We will wind up this section by talking about sub-fields. A subset F1 of a field F is said to be a sub-field of F if F1 is a field in its own right under the same operations of addition and multiplication of F restricted to F1. For example, the set Q of all rational numbers is a sub-field of the set R of all real numbers. A field F is said to be algebraically closed if every polynomial equation with coefficients belonging to F has at least one root belonging to the field. For example, the set C of all complex numbers is algebraically closed, whereas the set R of all real numbers is not algebraically closed.
Complements

As has been pointed out in P 1.1.4, the multiplication table of a finite group provides a latin square. We do not need the full force of a group to generate a latin square. A weaker structure would do. Let G be a finite set with a binary operation. The set G is said to be a quasigroup if each of the equations αy = β and yα = β has a unique solution in y for every α, β in G.
1.1.1 Show that the multiplication table of a quasigroup with n elements is a latin square of order n.
1.1.2 Show that every latin square of order n gives rise to a quasigroup. (If we look at the definition of a group, it is clear that if the binary operation of a quasigroup G is associative, then G is a group.)
1.1.3 Let G = {0, 1, 2} be a set with the following multiplication table.
    |  0  1  2
----+---------
  0 |  1  2  0
  1 |  0  1  2
  2 |  2  0  1
Show that G is a quasigroup but not a group.
1.1.4 Let n be an integer ≥ 2 and G = {0, 1, 2, ... , n − 1}. Define a binary operation * on G by

α * β = aα + bβ + c (modulo n)

for all α and β in G, where a and b are prime to n. Show that G is a quasigroup.
1.1.5 If L1, L2, ... , Lm is a set of mutually orthogonal latin squares of order n, show that m ≤ n − 1.

Let X = {1, 2, ... , n}, say, be a finite set. Let G be the collection of all subsets of X. Define a binary operation on G by αβ = α Δ β, α, β ∈ G, where Δ is the set-theoretic operation of symmetric difference, i.e., α Δ β = (α − β) ∪ (β − α), where α − β = {x ∈ X : x ∈ α, x ∉ β}.
1.1.6 How many elements are there in G?
1.1.7 Show that G is a group.
1.1.8 Set out the multiplication table of the group G when n = 3.
1.1.9 Let F = {a + b√2 : a, b rational}. The addition and multiplication of elements in F are defined in the usual way. Show that F is a field.
1.1.10 Show that the set of all integers under the usual operations of addition and multiplication of numbers is not a field.

1.2. Mappings
In the subsequent discussion of vector spaces and matrices, we will be considering transformations or mappings from one set to another. We give some basic ideas for later reference.

Let S and T be two sets. A map, a mapping, or a function f from S to T is a rule which associates to each element of S a unique element of T. If s is any element of S, its associate in T is denoted by f(s). The set S is called the domain of f and the set of all associates in T of elements of S is called the range of f. The range is denoted by f(S). The map f is usually denoted by f : S → T.

Consider a map f : S → T. The map f is said to be surjective or onto if f(S) = T, i.e., given any t ∈ T, there exists s ∈ S such that f(s) = t. The map f is said to be injective or one-to-one if any two distinct elements of S have distinct associates in T, i.e., s1, s2 ∈ S and f(s1) = f(s2) imply that s1 = s2. The map f is said to be bijective if f
is one-to-one and onto, or surjective and injective. If f is bijective, one can define the inverse map, which we denote by f⁻¹ : T → S; for t in T, f⁻¹(t) = s, where s is such that f(s) = t. The map f⁻¹ is called the inverse of f.

DEFINITION 1.2.1. Let f be a mapping from a group G1 to a group G2. Then f is said to be a homomorphism if

f(αβ) = f(α)f(β)   for every α and β in G1.

If f is bijective, f is said to be an isomorphism and G1 and G2 isomorphic.
DEFINITION 1.2.2. Let
Then
f
f(o + (3) = f(o) + f({3) , f(o{3) = f(o)f({3) for every 0 and {3 in Fl. If f is bijective, then f is called an isomorphism and the fields Fl and F2 are called isomorphic.
Complements
1.2.1 Let Sand T be two finite sets consisting of the same number of elements. Let f : S ---+ T be a map. If f is surjective, show that f is bijective. 1.2.2 Let S = {1, 2, 3, 4} and G be the collection of all bijective maps from S to S. For any two maps f and 9 in G, define the composite map fog by (J 0 g)(x) = f(g(x)),x E S. Show that under the binary operation of composition of maps, G is a group. Let H be the collection of all maps f in G such that f(1) = 1. Show that H is a subgroup but not invariant. Identify all distinct left cosets of H. Is this a group under the usual multiplication of cosets?
16
MATRIX ALGEBRA THEORY AND APPLICATIONS
1.3. Vector Spaces
The concept of a vector space is central in any discussion of multivariate methods. A set of elements (called vectors) is said to be a vector space or a linear space over a field of scalars F if the following axioms are satisfied. (We denote the set of elements by V(F) to indicate its dependence on the underlying field F of scalars. Sometimes, we denote the vector space simply by V if the underlying field of scalars is unambiguously clear. We denote the elements of the set V(F) by Roman letters and the elements of F by Greek letters.) (1) To every pair of vectors x and y, there corresponds a vector x + y in such a way that under the binary operation +, V(F) is an abelian group. (2) To every vector x and a scalar a, there corresponds a vector ax, called the scalar product of a and x, in such a way that a} (a2x) = (a}a2)x for every aJ, a2 in F and x in V(F), and Ix = x for every x in V(F), where 1 is the unit element of F. (3) The distributive laws hold for vectors as well as scalars, i.e., a(x + y) = ax + ay for every a in F and x, y in V(F), and (a} + a2)x = a}x + a2x for every a}, a2 in F and x in V(F). We now give some examples. The first example plays an important role in many applications. EXAMPLE 1.3.1. Let F be a field of scalars and k 2: 1 an integer. Consider the following collection of ordered tuples:
Define addition and scalar multiplication in Fk by
for every 8 in F and (a}, a2, ... ,ak) in Fk. It can be verified that Fk is a vector space over the field F with (0,0, ... ,0) as the zero-vector. We call Fk a k-dimensional coordinate space. Strictly speaking, we should
Vector Spaces
17
write the vector space Fk as Fk (F). We will omit the symbol in the parentheses, which will not cause any confusion. Special cases of Fk are Rk and C k , i.e., when F is the field R of real numbers and C of complex numbers, respectively. They are also called real and complex arithmetic spaces. EXAMPLE 1.3.2. Let n ~ 1. The collection of all polynomials of degree less than n with coefficients from a field F with the usual addition and scalar multiplication of polynomials is a vector space. Symbolically, we denote this collection by
Pn(F)(t)
= {ao + a1t + a2t2 + ... + an_1tn-1 : ai E F, i=O,1,2, ... ,n-I},
which is a vector space over the field F. The entity ao + a1t + a2t2 + ... + a n -1 t n - 1 is called a polynomial in t with coefficients from the field
F. EXAMPLE 1.3.3. Let V be the collection of all real valued functions of a real variable which are differentiable. If we take F = R, and define sum of two functions in V and scalar multiplication in the usual way, then V is a vector space over the field R of real numbers. EXAMPLE 1.3.4. Let V = {(a,,B) : a > 0 and f3 addition and scalar multiplication in V as follows.
(1) (aI, f3.)
+ (a2' (32) =
> O}.
Define vector
(a1 a 2, f31(32) for every (aI, (31) and (a2, (32)
inV.
(2) 8(a,f3) = (a 6,f36) for every 8 in Rand (a,f3) in V. Then V is a vector space over the field R of real numbers. EXAMPLE 1.3.5. Let p be an odd integer. Let V = {(a, (3) : a and f3 real}. Define vector addition and scalar multiplication in Vas below: (1) (aI, f3.)+(a2, (32) = ((af+a~)l/p,(f3f+f3~)l/P) for every (a1,f3.) and (a2' f32) in V. (2) 8(a, (3) = (8 1/ Pa, 81/ p(3) for every 8 in R and (a, (3) in V.
Then V is a vector space over the field R of real numbers. This statement is not correct if p is an even integer.
18
MATRIX ALGEBRA THEORY AND APPLICATIONS
EXAMPLE 1.3.6. Let F = {a, 1, 2}. With addition and multiplication modulo 3, F is a field. See Example 1.1.10. Observe that the vector space Fk has only 3 k elements, while R k has an uncountable number of elements. The notion of isomorphic vector spaces will be introduced now. Let V I and V 2 be two vector spaces over the same field F of scalars. The spaces V I and V 2 are said to be isomorphic to each other if there exists a bijection h : V I ---+ V 2 such that
h(x + y) = h(x)
+ h(y)
h(ax) = ah(x)
for all a E F and x E VI .
for all x, y E VI,
Complements
1.3.1 Examine which of the following are vector spaces over the field C of complex numbers. Explain why or why not? (1) V Addition:
= {(a,;3); a E R, f3 E C}.
Scalar multiplication:
8(a,f3) = (8a,8f3), 8 E C,(a,f3) E V.
(2) V = {(a,f3): a +f3 = O,a,f3 E C}. Addition:
Scalar multiplication:
8(a, f3) = (8a, 8f3) , 8 E C, (a, f3)
E V.
1.3.2 Let V I = (0,00) and F = R. The addition in VIis the usual operation of multiplication of real numbers. The scalar multiplication is defined by ax = xO:, a E R, x E VI.
19
Vector Spaces
Show that VIis a vector space over R . Identify the zero vector of VI. 1.3.3 Show that VI of Complement 1.3.2 and the vector space V 2 = R over the field R of real numbers are isomorphic. Exhibit an explicit isomorphism between VIand V 2. 1.3.4 Let V(F) be a vector space over a field F. Let, for any fixed positive integer n,
Define addition in Vn(F) by
for (X},X2,'" ,xn ), (Y},Y2,'" tion in Vn(F) by 0(XI,X2' •••
,Yn)
E vn(F). Define scalar multiplica-
,xn ) = (OXl,OX2, ... ,oxn ), 0 E F (X},X2,'"
and
,xn ) E Vn(F).
Show that Vn(F) is a vector space over the field F. 1.4. Linear Independence and Basis of a Vector Space
Through out this section, we assume that we have a vector space V over a field F of scalars. The notions of linear independence, linear dependence and basis form the core in the development of vector spaces. DEFINITION 1.4.1. A finite set X}, X2, ... ,Xk of vectors is said to
be linearly dependent if there exist scalars O}, 02, ... ,Ok, not all zeros, such that 0IX} + 02X2 + ... + 0kXk = O. Otherwise, it is said to be linearly independent. P 1.4.2 The set consisting of only one vector, which is the zero vector 0, is linearly dependent. P 1.4.3 The set consisting of only one vector, which is a non-zero vector, is linearly independent. P 1.4.4 dependent.
Any set of vectors containing the zero vector is linearly
MATRIX ALGEBRA THEORY AND APPLICATIONS
20
P 1.4.5 A set Xl, X2, .. • ,Xk of non-zero vectors is linearly dependent if and only if there exists 2 ~ i ~ k such that
for some scalars f3l, f32, ... ,f3i-I, i.e., there is a member in the set which can be expressed as a linear combination of its predecessors. Let i E {I, 2, ... ,k} be the smallest integer such that the set of vectors Xl, X2, ••. ,Xi is linearly dependent. Obviously, 2 ~ i ~ k. There exist scalars Ql, Q2, ... ,Qi, not all zero, such that Ql Xl + Q2 X 2 + ... + QiXi = o. By the very choice of i, Qi =1= o. Thus we can write PROOF.
P 1.4.6 Let A and B be two finite sets of vectors such that A c B. If A is linearly dependent, so is B. If B is linearly independent, so is A. DEFlNlTlON 1.4.7. Let B be any subset (finite or infinite) of V. The set B is said to be linearly independent if every finite subset of B is linearly independent. DEFlNlTlON 1.4.8. (Basis of a vector space) A linearly independent set B of vectors is said to be a (Hamel) basis of V if every vector of V is a linear combination of the vectors in B. The vector space V is said to be finite dimensional if there exists a Hamel basis B consisting of finitely many vectors.
It is not clear at the outset whether a vector space possesses a basis. Using Zorn's lemma, one can demonstrate the existence of a maximal linearly independent system of vectors in any vector space. (A discussion of this particular feature is beyond the scope of the book.) Any maximal set is indeed a basis of the vector space. From now on, we will be concerned with finite dimensional vector spaces only. Occasionally, infinite dimensional vector spaces will be presented as examples to highlight some special features of finite dimensional vector spaces. The following results play an important role. P 1.4.9 If XI, X2, ••• ,Xk and Yl, Y2,· the vector space V, then k = s.
•• ,Ys
are two sets of bases for
Vector Spaces
21
PROOF. Suppose k =1= s. Let s > k. It is obvious that the set Yl! Xl! X2, ••• , Xk is linearly dependent. By P 1.4.5, there is a vector Xi which is a linear combination of its predecessors in the above set. Consequently, every vector in V is a linear combination of the vectors y},X}'X2, ••• ,Xi-},Xi+J, ••. ,Xk. ObservenowthatthesetY2,YI,XI,X2, ••• , Xi-I, Xi+!, ••• , Xk is linearly dependent. Again by P 1.4.5, there exists a j E {I, 2, ... , i-I, i + 1, ... , k} such that Xj is a linear combination of its predecessors. (Why?) Assume, without loss of generality, that i < j. It is clear that every vector in V is a linear combination ofthevectorsY2,y),x),x2,.·· ,Xi-),Xi+l, ... ,Xj_I,Xj+), ... ,Xk. Continuing this process, we will eventually obtain the set Yk, Yk-l, .. . , Y2, YI such that every vector in V is a linear combination of members of this set. This is a contradiction to the assumption that s > k. Even if we assume that s < k, we end up with a contradiction. Hence s = k. In finite dimensional vector spaces, one can now introduce the notion of the dimension of a vector space. It is precisely the cardinality of any Hamel basis of the vector space. We use the symbol dim(V) for the dimension of the vector space V.
P 1.4.10 Any given set Xl, X2, ... tors can be enlarged to a basis of V.
, Xr
of linearly independent vec-
Let YI , Y2, • •• , Y k be a basis of V, and consider the set X I, YI , Y2, ... , Yk of vectors, which is linearly dependent. Using the same method as enunciated in the proof of P 1.4.9, we drop one of y/s and then add one of xi's until we get a set X r ,Xr -1!.·· ,Xl'Y(I)'Y(2)'···' Y(k-r), which is a basis for V, where Y(i)'S are selections from Y1! Y2,· .. , Yk. This completes the proof. PROOF.
P 1.4.11 Every vector x in V has a unique representation in terms of any given basis of V.
PROOF. Let x_1, x_2, ..., x_k be a basis for V. Let
x = α_1 x_1 + α_2 x_2 + ... + α_k x_k,
and also
x = β_1 x_1 + β_2 x_2 + ... + β_k x_k,
for some scalars α_i's and β_i's. Then
(α_1 - β_1) x_1 + (α_2 - β_2) x_2 + ... + (α_k - β_k) x_k = 0,
from which it follows that α_i - β_i = 0 for every i, in view of the fact that the basis is linearly independent.
In view of the unique representation presented above, one can define a map from V to F^k. Let x_1, x_2, ..., x_k be a basis for V. Let x ∈ V, and x = α_1 x_1 + α_2 x_2 + ... + α_k x_k be the unique representation of x in terms of the vectors of the basis. The ordered tuple (α_1, α_2, ..., α_k) is called the set of coordinates of x with respect to the given basis. Define
φ(x) = (α_1, α_2, ..., α_k),
which one can verify to be a bijective map from V to F^k. Further, φ(·) is a homomorphism from the vector space V to the vector space F^k. Consequently, the vector spaces V and F^k are isomorphic. We record this fact as a special property below.
P 1.4.12 Any vector space V(F) of dimension k is isomorphic to the vector space F^k.
The above result also implies that any two vector spaces over the same field of scalars and of the same dimension are isomorphic to each other. It is time to take stock of the complete meaning and significance of P 1.4.12. If a vector space V over a field F is isomorphic to the vector space F^k for some k ≥ 1, why bother to study vector spaces in the generality in which they are introduced? The vector space F^k is simple to visualize and one could restrict oneself to the vector space F^k in subsequent dealings. There are two main reasons against pursuing such a seemingly simple trajectory. First, the isomorphism that is built between the vector spaces V(F) and F^k is based on a given basis of the vector space V(F); in the process of transformation, the intrinsic structural beauty of the space V(F) is usually lost in its metamorphosis. Second, suppose we establish a certain property of the vector space F^k. If we would like to examine how this property comports itself in the space V(F), we could use any one of the isomorphisms operational between V(F) and F^k, and translate this property into the space V(F). The isomorphism used is heavily laced with the underlying basis, and an understanding of the property devoid of the external trappings provided by the isomorphism would then become a herculean task.
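To make the coordinate map φ concrete, here is a small numerical sketch (an illustration added here, not part of the text; the basis and the vector are arbitrary choices): for V = R^3 and a chosen basis, the coordinates of x are obtained by solving a linear system, and applying the basis to the coordinates recovers x.

```python
import numpy as np

B = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])     # columns = an arbitrarily chosen basis x_1, x_2, x_3 of R^3
x = np.array([2.0, 3.0, 5.0])

coords = np.linalg.solve(B, x)      # phi(x) = (a_1, a_2, a_3)
print(coords)
print(np.allclose(B @ coords, x))   # applying the basis to the coordinates recovers x
```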
As a case in point, take F = R and V = P_k, the set of all polynomials with real coefficients of degree < k. The vector space P_k is isomorphic to R^k. Linear functionals on vector spaces are introduced in Section 1.7. One could introduce a linear functional f on P_k as follows. Let μ be a measure on the Borel σ-field of [a, b], a non-degenerate interval. For x ∈ P_k, let
f(x) = ∫_a^b x(t) μ(dt).
One can verify that
f(x + y) = f(x) + f(y),  x, y ∈ P_k,
and
f(αx) = α f(x),  α ∈ R, x ∈ P_k.
Two distinct measures μ_1 and μ_2 on [a, b] might produce the same linear functional. For example, if
∫_a^b t^m μ_1(dt) = ∫_a^b t^m μ_2(dt)
for m = 0, 1, 2, ..., k - 1, then
f_1(x) = ∫_a^b x(t) μ_1(dt) = ∫_a^b x(t) μ_2(dt) = f_2(x)
for all x ∈ P_k. A discussion of features such as this in P_k is not possible in R^k. The vector space P_k has a number of facets allied to it which would be lost if we were to work only with R^k using some isomorphism between P_k and R^k. We will work with vector spaces as they come and ignore P 1.4.12.
Complements.
1.4.1 Let V = C, the set of all complex numbers. Then V is a vector space over the field C of complex numbers with the usual addition and multiplication of complex numbers. What is the dimension of the vector space V?
1.4.2 Let V = C, the set of all complex numbers. Then V is a vector space over the field R of real numbers with the usual addition of complex numbers. The scalar multiplication in V is the usual multiplication of a complex number by a real number. What is the dimension of V? How does this example differ from the one in Complement 1.4.1?
1.4.3 Let V = R, the set of all real numbers. Then V is a vector space over the field Q of rational numbers. The addition in V is the usual addition of real numbers. The scalar multiplication in V is the multiplication of a real number by a rational number. What is the dimension of V?
1.4.4 Let R be the vector space over the field Q of rational numbers. See Complement 1.4.3. Show that √2 and √3 are linearly independent.
1.4.5 Determine the dimension of the vector space introduced in Example 1.3.4. Identify a basis of this vector space.
1.4.6 Let F = {0, 1, 2, 3, 4} be the field in which addition and multiplication are carried out in the usual way but modulo 5. How many points are there in the vector space F^3?
1.5. Subspaces
In any set with a mathematical structure on it, subsets which exhibit all the features of the original mathematical structure deserve special scrutiny. A study of such subsets aids a good understanding of the mathematical structure itself.
DEFINITION 1.5.1. A subset S of a vector space V is said to be a subspace of V if αx + βy ∈ S whenever x, y ∈ S and α, β ∈ F.
P 1.5.2 A subspace S of a vector space V is a vector space over the same field F of scalars under the same definition of addition of vectors and scalar multiplication operational in V. Further, dim(S) ≤ dim(V).
PROOF. It is clear that S is a vector space in its own right. In order to show that dim(S) ≤ dim(V), it suffices to show that the vector space S admits a basis. For then, any basis of S is a linearly independent set in V which can be extended to a basis of V. It is known that every vector space admits a basis.
If S consists of only the zero-vector, then S is a zero-dimensional subspace of V. If every vector in S is of the form αx for some fixed non-zero vector x and for some α in F, then S is a one-dimensional subspace of V. If every vector in S is of the form αx_1 + βx_2 for some fixed set of linearly independent vectors x_1 and x_2 and for some α and β in F, then S is a two-dimensional subspace of V. The schematic way we have described above is the way one generally obtains subspaces of various dimensions. The sets {0} and V are extreme examples of subspaces of V.
P 1.5.3 The intersection of any family of subspaces of V is a subspace of V.
P 1.5.4 Given an r-dimensional subspace S of V, we can find a basis x_1, x_2, ..., x_r, x_{r+1}, x_{r+2}, ..., x_k of V such that x_1, x_2, ..., x_r is a basis of S.
The result of P 1.5.4 can also be restated as follows: given a basis x_1, x_2, ..., x_r of S, it can be completed to a basis of V. The subspaces spanned by a finite set of vectors need special attention. If x_1, x_2, ..., x_r is a finite collection of vectors from a vector space V(F), then the set
{α_1 x_1 + α_2 x_2 + ... + α_r x_r : α_1, α_2, ..., α_r ∈ F}
is a subspace of V(F). This subspace is called the span of x_1, x_2, ..., x_r and is denoted by Sp(x_1, x_2, ..., x_r). Of course, any subspace of V(F) arises this way. The concept of spanning plays a crucial role in the following properties.
P 1.5.5 Given a subspace S of V, we can find a subspace S^c of V such that S ∩ S^c = {0}, dim(S) + dim(S^c) = dim(V), and
V = S ⊕ S^c = {x + y : x ∈ S, y ∈ S^c}.
Further, any vector x in V has a unique decomposition x = x_1 + x_2 with x_1 ∈ S and x_2 ∈ S^c.
PROOF. Let x_1, x_2, ..., x_r, x_{r+1}, ..., x_k constitute a basis for the vector space V such that x_1, x_2, ..., x_r is a basis for S. Let S^c be the subspace of V spanned by x_{r+1}, x_{r+2}, ..., x_k. The subspace S^c meets all the properties mentioned above.
We have introduced a special symbol ⊕ above. The mathematical operation S ⊕ S^c is read as the direct sum of the subspaces S and S^c.
The above result states that the vector space V is the direct sum of two disjoint subspaces of V. We use the phrase that the subspaces S and S^c are disjoint even though they have the zero vector in common! We would like to emphasize that the subspace S^c is not unique. Suppose V = R^2 and S = {(x, 0) : x ∈ R}. One can take S^c = {(x, x) : x ∈ R} or S^c = {(x, 2x) : x ∈ R}. We will introduce a special phraseology to describe the subspace S^c: S^c is a complement of S. More formally, two subspaces S_1 and S_2 are complements of each other if S_1 ∩ S_2 = {0} and {x + y : x ∈ S_1, y ∈ S_2} = V.
where the intersection is taken over all subspaces 8 v of V containing K. Let 8 1 and 8 2 be two subspaces of a vector space V. Let
The operation + defined between subspaces of V is analogous to the operation of direct sum 61. We reserve the symbol 61 for subspaces 8 1 and 8 2 which are disjoint, i.e., 8 1 n 8 2 = {a}. The following results give some properties of the operation + defined for subspaces. P 1.5.7 Let 8 1 and 8 2 be two subspaces of a vector space V. Let 8 be the smallest subspace of V containing both 8 1 and 8 2 . Then (1) S = 8 1 + 82, (2) dim(S) = dim(8 1 )
+ dim(S2) that 8 1 + 8 2 S
dim(Sl
n 8 2),
PROOF. It is clear 8. Note that 8 1 + 8 2 is a subspace of V containing both 8 1 and 8 2. Consequently, 8 S 8 1 + S2. This establishes (1). To prove (2), let Xl,X2, ..• ,X r be a basis for 8 1 n 8 2, where r = dim(SI n 8 2). Let Xl, X2, .•. ,Xr , X r +l, X r +2, .. . ,Xm be the completion of the basis of 8 1 n S2 to 8 1 , where dim(8 l ) = m. Refer to P 1.5.4. Let x}, X2, ' " ,XT) Yr+l> Yr+2,. " ,Yn be the completion of the basis of 8 1 n 8 2 to S2, where dim(8 2) = n. It now
27
Vector Spaces
follows that a basis of 8 1 + 8 2 is given by XI, x2, .• • X m , Yr+lt Yr+2,'" ,Yn' (Why?) Consequently, dim(8 l
,X r , X r + 1, X r +2, ... ,
+ 82) = r + (m - r) + (n - r) =m+n-r = dim(8d + dim(8 2) - dim(8 l n 8 2).
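The dimension formula of P 1.5.7 is easy to check numerically. The following sketch (an added illustration, not from the text) builds two subspaces of R^6 whose intersection is known by construction and compares ranks computed with numpy.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c, d, e = rng.normal(size=(5, 6))      # five vectors in R^6, almost surely independent
S1 = np.column_stack([a, b, c])              # dim(S1) = 3
S2 = np.column_stack([b, c, d, e])           # dim(S2) = 4, and S1 ∩ S2 = Sp(b, c) has dimension 2
dim_sum = np.linalg.matrix_rank(np.column_stack([S1, S2]))
print(dim_sum)                               # 5 = 3 + 4 - 2, as P 1.5.7 predicts
```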
P 1.5.8 Let S_1 and S_2 be two subspaces of V. Then the following statements are equivalent.
(1) Every vector x in V has a unique representation x = x_1 + x_2 with x_1 ∈ S_1 and x_2 ∈ S_2.
(2) S_1 ∩ S_2 = {0}.
(3) dim(S_1) + dim(S_2) = dim(V).
Complements.
1.5.1 Let x, y and z be three vectors in a vector space V satisfying x + y + z = 0. Show that the subspaces of V spanned by x and y and by x and z are identical.
1.5.2 Show that the subspace S = {0} of a vector space V has a unique complement.
1.5.3 Consider the vector space R^3. The vectors (1,0,0), (0,1,0) generate a subspace of R^3, say S. Show that Sp{(0,0,1)} and Sp{(1,1,1)} are two possible complementary one-dimensional subspaces of S. Show that, in general, the choice of a complementary subspace S^c of S ⊂ V is not unique.
1.5.4 Let S_1 and S_2 be the subspaces of the vector space R^3 spanned by {(1,0,0), (0,0,1)} and {(0,1,1), (1,2,3)}, respectively. Find a basis for each of the subspaces S_1 ∩ S_2 and S_1 + S_2.
1.5.5 Let F = {0, 1, 2} with addition and multiplication defined modulo 3. Let S be the subspace of F^3 spanned by (0,1,2) and (1,1,2). Identify a complement of S.
1.5.6 Let F = {0, 1, 2} with addition and multiplication modulo 3. Make a complete list of all subspaces of the vector space F^3. Count how many subspaces there are for each of the dimensions 1, 2, and 3.
1.5.7 Show that the dimension of the subspace of R^6 spanned by the following row vectors is 4.
1 1 1 1 1 1
0 0 0 1 1 1
1 1 1 0 0 0
0 1 0 0 1 0
1 0 0 1 0 0
0 0 1 0 0 1
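As a quick numerical check of this claim (an added illustration, not part of the text), the rank of the matrix formed by these six row vectors can be computed with numpy:

```python
import numpy as np

rows = np.array([[1, 1, 1, 1, 1, 1],
                 [0, 0, 0, 1, 1, 1],
                 [1, 1, 1, 0, 0, 0],
                 [0, 1, 0, 0, 1, 0],
                 [1, 0, 0, 1, 0, 0],
                 [0, 0, 1, 0, 0, 1]])
print(np.linalg.matrix_rank(rows))   # 4
```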
1.5.8 Consider pq row vectors each consisting of p + q + 1 entries arranged in q blocks of p rows each in the following way. The last p columns in each block have the same structure with ones in the diagonal and zeros elsewhere.
(Schematically: Block 1 consists of rows 1 to p, Block 2 of rows p + 1 to 2p, ..., Block q of rows (q - 1)p + 1 to qp; within each block the last p columns exhibit the identity pattern described above, with ones on the diagonal and zeros elsewhere.)
Show that the subspace of R^{p+q+1} spanned by the row vectors is of dimension p + q - 1.
1.5.9 If pq numbers a_{ij}, i = 1, 2, ..., p; j = 1, 2, ..., q are such that the tetra difference
a_{ij} - a_{is} - a_{rj} + a_{rs} = 0
for all i, j, r, and s, show that
a_{ij} = a_i + b_j
for all i and j for some suitably chosen numbers a_1, a_2, ..., a_p and b_1, b_2, ..., b_q. (Complements 1.5.7-1.5.9 are applied in the analysis of variance of two-way-classified data in statistics.)
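A numerical check of the dimension claim in Complement 1.5.8 is easy to set up. The sketch below (an added illustration, not from the text) assumes the natural reading of the description above: the row for the i-th unit of block j has a 1 in the first column, a 1 in the j-th of the next q columns, and a 1 in the i-th of the last p columns; under that assumption the span has dimension p + q - 1.

```python
import numpy as np

def two_way_rows(p, q):
    """Rows of length p + q + 1: a leading 1, a block indicator (q columns),
    and a within-block indicator (p columns). This is an assumed reading of
    the description in Complement 1.5.8, not a reproduction of the book's array."""
    rows = []
    for j in range(q):           # block j
        for i in range(p):       # i-th row within block j
            v = np.zeros(1 + q + p, dtype=int)
            v[0] = 1
            v[1 + j] = 1
            v[1 + q + i] = 1
            rows.append(v)
    return np.array(rows)

for p, q in [(2, 3), (3, 4), (4, 5)]:
    M = two_way_rows(p, q)
    print(p, q, np.linalg.matrix_rank(M) == p + q - 1)   # True in each case
```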
1.6. Linear Equations
Let x_1, x_2, ..., x_m be fixed vectors in any vector space V(F). Consider the following homogeneous linear equation,
β_1 x_1 + β_2 x_2 + ... + β_m x_m = 0,   (1.6.1)
with β_i's in F. The word "homogeneous" refers to the vector 0 that appears on the right hand side of the equality (1.6.1). If we have a non-zero vector, the equation is called non-homogeneous. The basic goal in this section is to determine β_i's satisfying equation (1.6.1). Let b = (β_1, β_2, ..., β_m) be a generic symbol for a solution of (1.6.1). The entity b can be regarded as a vector in the vector space F^m. Let S be the collection of all such vectors b satisfying equation (1.6.1). We will establish some properties of the set S. Some comments are in order before we spell out the properties of S. The vector (0, 0, ..., 0) is always a member of S. The equation (1.6.1) is intimately related to the notion of linear dependence or independence of the vectors x_1, x_2, ..., x_m in V(F). If x_1, x_2, ..., x_m are linearly independent, β_1 = 0, β_2 = 0, ..., β_m = 0 is the only solution of (1.6.1). The set S has only one vector. If x_1, x_2, ..., x_m are linearly dependent, the set S has more than one vector of F^m. The objective is to explore the nature of the set S. Another point of inquiry is why one is confined to only one equation in (1.6.1). The case of more than one equation can be handled in an analogous manner. Suppose x_1, x_2, ..., x_m and y_1, y_2, ..., y_m are two sets of vectors in V(F). Suppose we are interested in solving the equations
β_1 x_1 + β_2 x_2 + ... + β_m x_m = 0,
β_1 y_1 + β_2 y_2 + ... + β_m y_m = 0
in unknowns β_1, β_2, ..., β_m in F. These two equations can be rewritten as a single equation
β_1 (x_1, y_1) + β_2 (x_2, y_2) + ... + β_m (x_m, y_m) = (0, 0),
with (x_1, y_1), (x_2, y_2), ..., (x_m, y_m) ∈ V^2(F). The treatment can now proceed in exactly the same way as for the equation (1.6.1).
P 1.6.1 S is a subspace of F^m.
P 1.6.2 Let V_1 be the vector subspace of V spanned by x_1, x_2, ..., x_m. Then dim(S) = m - dim(V_1).
PROOF. If each x_i = 0, then it is obvious that S = F^m, dim(S) = m, and dim(V_1) = 0. Consequently, dim(S) = m - dim(V_1). Assume that there exists at least one x_i ≠ 0. Assume, without loss of generality, that x_1, x_2, ..., x_r are linearly independent and each of x_{r+1}, x_{r+2}, ..., x_m is a linear combination of x_1, x_2, ..., x_r. This implies that dim(V_1) = r. Accordingly, we can write
x_j = β_{j,1} x_1 + β_{j,2} x_2 + ... + β_{j,r} x_r   (1.6.2)
for each j = r+1, r+2, ..., m and for some β_{j,s}'s in F. Then the vectors
b_1 = (β_{r+1,1}, β_{r+1,2}, ..., β_{r+1,r}, -1, 0, ..., 0),
b_2 = (β_{r+2,1}, β_{r+2,2}, ..., β_{r+2,r}, 0, -1, ..., 0),
...
b_{m-r} = (β_{m,1}, β_{m,2}, ..., β_{m,r}, 0, 0, ..., -1)   (1.6.3)
are all linearly independent (why?) and satisfy equation (1.6.1). If we can show that the collection of vectors in (1.6.3) spans all solutions, then it follows that they form a basis for the vector space S, and consequently,
dim(S) = m - r = m - dim(V_1).
If b = (β_1, β_2, ..., β_m) is any solution of (1.6.1), one can verify that
b = -β_{r+1} b_1 - β_{r+2} b_2 - ... - β_m b_{m-r},
i.e., b is a linear combination of b_1, b_2, ..., b_{m-r}. Use the fact that x_1, x_2, ..., x_r are linearly independent and equation (1.6.2). This completes the proof.
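For V_1 ⊆ F^k, the conclusion of P 1.6.2 can be checked numerically: writing x_1, ..., x_m as the columns of a k × m array, dim(S) is the nullity m - rank. The snippet below is an added illustration (the matrix is an arbitrary example), not part of the text.

```python
import numpy as np

# Columns of A are x_1, ..., x_m in F^k (here an arbitrary example with k = 3, m = 4).
A = np.array([[1.0, 2.0, 3.0, 5.0],
              [0.0, 1.0, 1.0, 2.0],
              [1.0, 3.0, 4.0, 7.0]])
m = A.shape[1]
r = np.linalg.matrix_rank(A)              # r = dim(V_1), the span of the columns
print(m - r)                              # dim(S), the solution space of (1.6.1); here 4 - 2 = 2
```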
A companion to the linear homogeneous equation (1.6.1) is the so-called non-homogeneous equation,
β_1 x_1 + β_2 x_2 + ... + β_m x_m = y,
for some known vector y ≠ 0. Note that while a homogeneous equation (1.6.1) always has a solution, namely, the null vector in F^m, a non-homogeneous equation may not have a solution. Such an equation is said to be inconsistent. For example, let x_1 = (1, 1, 1), x_2 = (1, 0, 1) and x_3 = (2, 1, 2) be three vectors in the vector space R^3(R). Then the non-homogeneous equation
β_1 x_1 + β_2 x_2 + β_3 x_3 = y
has no solution whenever the first and third components of y differ.
P 1.6.3 The non-homogeneous equation
β_1 x_1 + β_2 x_2 + ... + β_m x_m = y   (1.6.4)
admits a solution if and only if y is dependent on x_1, x_2, ..., x_m.
The property mentioned above is a reformulation of the notion of dependence of vectors. We now identify the set of solutions of (1.6.4) if it admits at least one solution. If (1.6.4) admits a solution, we will use the phrase that (1.6.4) is consistent.
P 1.6.4 Assume that equation (1.6.4) has a solution. Let b_0 = (β_1, β_2, ..., β_m) be any particular solution of (1.6.4). Let S_1 be the set of all solutions of (1.6.4). Then
S_1 = {b_0 + b : b ∈ S},   (1.6.5)
where S is the set of all solutions of the homogeneous equation (1.6.1).
PROOF. It is clear that for any b ∈ S, b_0 + b is a solution of (1.6.4). Conversely, if c is a solution of (1.6.4), we can write c = b_0 + (c - b_0). Note that c - b_0 ∈ S.
Note that the consistent non-homogeneous equation (1.6.4) admits a unique solution if and only if the subspace S contains only one vector, namely, the zero vector. Equivalent conditions are that dim(S) = 0 = m - dim(V_1), or that x_1, x_2, ..., x_m are linearly independent. A special and important case of the linear equation (1.6.4) arises when x_1, x_2, ..., x_m belong to the vector space V(F) = F^k, for some k ≥ 1. If we write x_i = (x_{1i}, x_{2i}, ..., x_{ki}) for i = 1, 2, ..., m, with each x_{ji} ∈ F, and y = (y_1, y_2, ..., y_k) with each y_i ∈ F, then the linear equation (1.6.4) can be rewritten in the form,
x_{11} β_1 + x_{12} β_2 + ... + x_{1m} β_m = y_1,
x_{21} β_1 + x_{22} β_2 + ... + x_{2m} β_m = y_2,
...
x_{k1} β_1 + x_{k2} β_2 + ... + x_{km} β_m = y_k,   (1.6.6)
which is a system of k simultaneous linear equations in m unknowns β_1, β_2, ..., β_m. Associated with the system (1.6.6), we introduce the following vectors:
u_i = (x_{i1}, x_{i2}, ..., x_{im}), i = 1, 2, ..., k,
v_i = (x_{i1}, x_{i2}, ..., x_{im}, y_i), i = 1, 2, ..., k.
For reasons that will be clear when we take up the subject of matrices, we call x_1, x_2, ..., x_m and y column vectors, and u_1, u_2, ..., u_k, v_1, v_2, ..., v_k row vectors. The following results have special bearing on the system (1.6.6) of equations.
P 1.6.5 The maximal number, g, of linearly independent column vectors among x_1, x_2, ..., x_m is the same as the maximal number, s, of linearly independent row vectors among u_1, u_2, ..., u_k.
PROOF. The vector y has no bearing on the property enunciated above. Assume that each y_i = 0. If we arrange mk elements from F in the form of a rectangular grid consisting of k rows and m columns, each row can be viewed as a vector in the vector space F^m and each column can be viewed as a vector in the vector space F^k. The property under discussion is concerned with the maximal number of linearly independent rows and of linearly independent columns. We proceed with the proof as follows. The case that every u_i = 0 can be handled easily. Assume that there is at least one vector u_i ≠ 0. Assume, without loss of generality, that u_1, u_2, ..., u_s are linearly independent and each u_j, for j = s+1, s+2, ..., k, is a linear combination of u_1, u_2, ..., u_s. Consider the subsystem of equations (1.6.6), with the y_i's taken as zeros, consisting of the first s equations
x_{i1} β_1 + x_{i2} β_2 + ... + x_{im} β_m = 0,  i = 1, ..., s.   (1.6.7)
Let S be the collection of all solutions of (1.6.6) and S* that of (1.6.7). It is clear that S = S*. Let V_1 be the vector space spanned by x_1, x_2, ..., x_m. Let dim(V_1) = g. By P 1.6.2, dim(S) = m - dim(V_1) = m - g. The reduced system of equations (1.6.7) can be rewritten in the format of (1.6.1) as
β_1 x_1* + β_2 x_2* + ... + β_m x_m* = 0,
with x_1*, x_2*, ..., x_m*, now, in F^s. Let V_1* be the subspace of F^s spanned by x_1*, x_2*, ..., x_m*. (Observe that the components of each x_i* are precisely the first s components of x_i.) Consequently, dim(V_1*) ≤ dim(F^s) = s. By P 1.6.2, dim(S*) = m - dim(V_1*) ≥ m - s, which implies that m - g ≥ m - s, or, g ≤ s. By interchanging the roles of rows and columns, we would obtain the inequality s ≤ g. Hence s = g.
The above result can be paraphrased from an abstract point of view. Let the components of x_1, x_2, ..., x_m be arranged in the form of a rectangular grid consisting of k rows and m columns so that the entries in the i-th column are precisely the entries of x_i. We have labelled the rows of the rectangular grid by u_1, u_2, ..., u_k. The above result establishes that the maximal number of linearly independent vectors among x_1, x_2, ..., x_m is precisely the maximal number of linearly independent vectors among u_1, u_2, ..., u_k. We can stretch this analogy a little further. The type of relationship that exists between x_1, x_2, ..., x_m and u_1, u_2, ..., u_k is precisely the same as that which exists between x_1, x_2, ..., x_m, y and v_1, v_2, ..., v_k. Consequently, the maximal number of linearly independent vectors among x_1, x_2, ..., x_m, y is the same as the maximal number of linearly independent vectors among v_1, v_2, ..., v_k. This provides a useful characterization of consistency of a system of non-homogeneous linear equations.
P 1.6.6 A necessary and sufficient condition that the non-homogeneous system (1.6.6) of equations has a solution is that the maximal number, g, of linearly independent vectors among u_1, u_2, ..., u_k is the same as the maximal number, h, of linearly independent vectors among the augmented vectors v_1, v_2, ..., v_k.
PROOF. By P 1.6.3, equations (1.6.6) admit a solution if and only if the maximal number of linearly independent vectors among x_1, x_2, ..., x_m is the same as the maximal number of linearly independent vectors among x_1, x_2, ..., x_m, y. But the maximal number of linearly independent vectors among x_1, x_2, ..., x_m, y is the same as the maximal number of linearly independent vectors among v_1, v_2, ..., v_k. Consequently, a solution exists for (1.6.6) if and only if g = s = h.
The systems of equations described in (1.6.6) arise in many areas of scientific pursuit. One of the pressing needs is to devise a criterion whose verification guarantees a solution to the system. One might argue that P 1.6.5 and P 1.6.6 do provide criteria for the consistency of the system. But these criteria are hard to verify. The following proposition provides a necessary and sufficient condition for the consistency of the system (1.6.6). At first glance, the condition may look very artificial. But time and again, this is the condition that becomes easily verifiable to check on the consistency of the system (1.6.6).
P 1.6.7 The system (1.6.6) of non-homogeneous linear equations admits a solution if and only if
c_1 y_1 + c_2 y_2 + ... + c_k y_k = 0 whenever c_1 u_1 + c_2 u_2 + ... + c_k u_k = 0.   (1.6.8)
PROOF. Suppose the system (1.6.6) admits a solution. Suppose c_1 u_1 + c_2 u_2 + ... + c_k u_k = 0 for some c_1, c_2, ..., c_k in F. Multiply the i-th equation of (1.6.6) by c_i and then sum over i. It now follows that c_1 y_1 + c_2 y_2 + ... + c_k y_k = 0. Conversely, view
c_1 u_1 + c_2 u_2 + ... + c_k u_k = 0
as a system of homogeneous linear equations in k unknowns c_1, c_2, ..., c_k. Consider the system of homogeneous linear equations
c_1 v_1 + c_2 v_2 + ... + c_k v_k = 0
in k unknowns c_1, c_2, ..., c_k. By (1.6.8), these two systems of equations have the same set of solutions. The dimensions of the spaces of solutions are k - s and k - h, respectively. Thus we have k - s = k - h, or s = h. By P 1.6.6, the system (1.6.6) has a solution.
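In matrix language, P 1.6.6 and P 1.6.7 amount to comparing the rank of the coefficient array (rows u_i) with the rank of the augmented array (rows v_i). The following sketch (an added illustration, not from the text) applies this to the vectors x_1 = (1,1,1), x_2 = (1,0,1), x_3 = (2,1,2) used in the example preceding P 1.6.3.

```python
import numpy as np

# Columns of A are x_1 = (1,1,1), x_2 = (1,0,1), x_3 = (2,1,2); rows of A are u_1, u_2, u_3.
A = np.array([[1.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 2.0]])
for y in (np.array([1.0, 1.0, 1.0]), np.array([1.0, 0.0, 0.0])):
    aug = np.column_stack([A, y])          # rows of aug are the augmented vectors v_1, v_2, v_3
    ok = np.linalg.matrix_rank(A) == np.linalg.matrix_rank(aug)
    print(y, "consistent" if ok else "inconsistent")
# y = (1, 1, 1) is consistent (take beta = (1, 0, 0)); y = (1, 0, 0) is not,
# since its first and third components differ.
```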
Complements
1.6.1 Let Q be the field of rational numbers. Consider the system of equations
2β_1 + β_3 - β_4 = 0,
β_2 - 2β_3 - 3β_4 = 0
in unknowns β_1, β_2, β_3, β_4 ∈ Q. Determine the dimension of the solution subspace S of Q^4. Show that
2β_1 + β_3 - β_4 = y_1,
β_2 - 2β_3 - 3β_4 = y_2
admit a solution for every y_1 and y_2 in Q.
1.6.2 Consider the system (1.6.6) of equations with y_1 = y_2 = ... = y_k = 0. Show that the system has a non-trivial solution if k < m.
1.7. Dual Space
One way to understand the intricate structure of a vector space is to pursue the linear functionals defined on the vector space. The duality that reigns between the vector space and its space of linear functionals aids and reveals what lies inside a vector space.
DEFINITION 1.7.1. A function f defined on a vector space V(F) taking values in F is said to be a linear functional if
f(α_1 x_1 + α_2 x_2) = α_1 f(x_1) + α_2 f(x_2)
for every x_1, x_2 in V(F) and α_1, α_2 in F.
One can view the field F as a vector space over the field F itself. Under this scenario, a linear functional is simply a homomorphism from the vector space V(F) to the vector space F(F).
EXAMPLE 1.7.2. Consider the vector space R^n. Let a_1, a_2, ..., a_n be fixed real numbers. For x = (ξ_1, ξ_2, ..., ξ_n) ∈ R^n, let
f(x) = a_1 ξ_1 + a_2 ξ_2 + ... + a_n ξ_n.
The map f is a linear functional. If a_i = 1 and a_j = 0 for j ≠ i for some fixed 1 ≤ i ≤ n, then the map f is called the i-th co-ordinate functional.
EXAMPLE 1.7.3. Let P_n be the collection of all polynomials x(·) of degree < n with coefficients in the field C of complex numbers. We have seen that P_n is a vector space over the field C. Let α(·) be any complex-valued integrable function defined on a finite interval [a, b]. Then for x(·) in P_n, let
f(x) = ∫_a^b α(t) x(t) dt.
Then f is a linear functional on P_n.
It is time to introduce the notion of a dual space. Later, we will also determine the structure of a linear functional on a finite dimensional vector space.
DEFINITION 1.7.4. Let V(F) be any vector space and V' the space of all linear functionals defined on V(F). Let us denote by 0 the linear functional which assigns the value zero of F to every element in V(F). The set V' is called the dual space of V(F).
We will now equip the space V' with a structure so that it becomes a vector space over the field F. Let f_1, f_2 ∈ V' and α_1, α_2 ∈ F. Then the function f defined by
f(x) = α_1 f_1(x) + α_2 f_2(x),  x ∈ V(F),
is clearly a linear functional on V(F). We denote the functional f by α_1 f_1 + α_2 f_2. This basic operation includes, in its wake, the binary operations of addition and scalar multiplication on V' by the elements of the field F. Under these operations of addition and scalar multiplication, V' becomes a vector space over the field F.
P 1.7.5 Let x_1, x_2, ..., x_k be a basis of a finite dimensional vector space V(F). Let α_1, α_2, ..., α_k be a given set of scalars from F. Then there exists one and only one linear functional f on V(F) such that f(x_i) = α_i, i = 1, 2, ..., k.
PROOF. Any vector x in V(F) has a unique representation x = ξ_1 x_1 + ξ_2 x_2 + ... + ξ_k x_k for some scalars ξ_1, ξ_2, ..., ξ_k in F. If f is any linear functional on V(F), then
f(x) = ξ_1 f(x_1) + ξ_2 f(x_2) + ... + ξ_k f(x_k),
which means that the value f(x) is uniquely determined by the values of f at x_1, x_2, ..., x_k. The function f defined by
f(x) = ξ_1 α_1 + ξ_2 α_2 + ... + ξ_k α_k
for x = ξ_1 x_1 + ξ_2 x_2 + ... + ξ_k x_k ∈ V(F) is clearly a linear functional satisfying f(x_i) = α_i for each i. Thus the existence and uniqueness follow.
P 1.7.6 Let x_1, x_2, ..., x_k be a basis of a finite dimensional vector space V. Then there exists a unique set f_1, f_2, ..., f_k of linear functionals in V' such that
f_i(x_j) = 1 if i = j, and f_i(x_j) = 0 if i ≠ j,   (1.7.1)
and these functionals form a basis for the vector space V'. Consequently, dim(V) = dim(V').
PROOF. From P 1.7.5, the existence of k linear functionals satisfying (1.7.1) is established. We need to demonstrate that these linear functionals are linearly independent and form a basis for the vector space V'. Let f be any linear functional in V'. Let f(x_i) = α_i, i = 1, 2, ..., k. Note that f = α_1 f_1 + α_2 f_2 + ... + α_k f_k. The linear functionals f_1, f_2, ..., f_k do indeed span the vector space V'. As for their linear independence, suppose β_1 f_1 + β_2 f_2 + ... + β_k f_k = 0 for some scalars β_1, β_2, ..., β_k in F. Observe that 0 = (β_1 f_1 + β_2 f_2 + ... + β_k f_k)(x_i) = β_i
for each i = 1, 2, ..., k. Hence the linear independence of these functionals follows. The result that the dimensions of the vector space V and its dual space are identical is obvious now.
The basis f_1, f_2, ..., f_k so arrived at above is called the dual basis of x_1, x_2, ..., x_k. Now we are ready to prove the separation theorem.
P 1.7.7 Let u and v be two distinct vectors in a vector space V. Then there exists a linear functional f in V' such that f(u) ≠ f(v). Equivalently, for any non-zero vector x in V, there exists a linear functional f in V' such that f(x) ≠ 0.
PROOF. Let x_1, x_2, ..., x_k be a basis of V and f_1, f_2, ..., f_k its dual basis. Write x = ξ_1 x_1 + ξ_2 x_2 + ... + ξ_k x_k for some scalars ξ_1, ξ_2, ..., ξ_k in F. If x is non-zero, there exists 1 ≤ i ≤ k such that ξ_i is non-zero. Note that f_i(x) = ξ_i ≠ 0. The first statement of P 1.7.7 follows if we take x = u - v.
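For V = R^k the dual basis of P 1.7.6 has a concrete matrix description: if the basis vectors are the columns of an invertible matrix B, then the rows of B^{-1} represent f_1, ..., f_k, since B^{-1}B reproduces the relations (1.7.1). The sketch below is an added illustration (the basis is an arbitrary choice), not part of the text.

```python
import numpy as np

# Columns of B form an (arbitrarily chosen) basis x_1, x_2, x_3 of R^3.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
F = np.linalg.inv(B)                  # row i of F represents the functional f_i

print(np.allclose(F @ B, np.eye(3)))  # the defining relations (1.7.1): f_i(x_j) = 1 iff i = j
x = np.array([2.0, -1.0, 3.0])
print(F @ x)                          # (f_1(x), f_2(x), f_3(x)) = coordinates of x in the basis
```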
Since V' is a vector space, we can define its dual vector space V'' as the space of all linear functionals defined on V'. From P 1.7.6, we have dim(V) = dim(V') = dim(V''). Consequently, all these three vector spaces are isomorphic. But there is a natural isomorphic map from V to V'', which we would like to identify explicitly.
P 1.7.8 For every linear functional z_0 in V'', there exists a unique x_0 in V such that
z_0(f) = f(x_0) for every f in V'.
The correspondence z_0 ⇔ x_0 is an isomorphism between V'' and V.
PROOF. Let f_1, f_2, ..., f_k be a basis of V'. Given x_0 in V, there exists a unique z_0 in V'' such that
z_0(f_i) = f_i(x_0), i = 1, 2, ..., k.
We refer to P 1.7.5. Consequently, z_0(f) = f(x_0) for all f in V'. If x_1 and x_2 are two distinct vectors in V, then the corresponding vectors z_1 and z_2 in V'' must be distinct. If not, (z_1 - z_2)(f) = 0 = f(x_1) - f(x_2) = f(x_1 - x_2) for all f in V'. But this is impossible in view of P 1.7.7. Thus we observe that the correspondence x_0 ⇔ z_0
enunciated above is an injection. It is also clear that this association is a homomorphism. The isomorphism of this map now follows from the fact that dim(V) = dim(V'').
Now that we have dual spaces in circulation, we can introduce the notion of the annihilator of any subset (subspace or not) of a vector space.
DEFINITION 1.7.9. The annihilator of a subset S of a vector space V is the set S^a of linear functionals given by S^a = {f ∈ V' : f(x) = 0 for every x in S}.
It is clear that the annihilator S^a is a subspace of the vector space V' regardless of whether S is a subspace or not. If S contains only the null vector, then S^a = V'. If S = V, then S^a = {0}. If S contains a non-zero vector, then S^a ≠ V' in view of P 1.7.7.
P 1.7.10 If S is a subspace of a vector space V, then dim(S^a) = dim(V) - dim(S).
PROOF. Let x_1, x_2, ..., x_r be a basis of the subspace S which can be extended to a full basis x_1, x_2, ..., x_r, x_{r+1}, x_{r+2}, ..., x_k of V. Let f_1, f_2, ..., f_k be the dual basis of V'. Let f ∈ S^a. We can write f = α_1 f_1 + α_2 f_2 + ... + α_k f_k for some scalars α_1, α_2, ..., α_k in F. Observe that for each 1 ≤ i ≤ r, 0 = f(x_i) = α_i. Consequently, f is a linear combination of f_{r+1}, f_{r+2}, ..., f_k only, i.e., f = α_{r+1} f_{r+1} + α_{r+2} f_{r+2} + ... + α_k f_k. This implies that S^a is a subspace of the span Sp(f_{r+1}, f_{r+2}, ..., f_k) of f_{r+1}, f_{r+2}, ..., f_k. By the very construction of the dual basis, f_i(x_j) = 0 for every 1 ≤ j ≤ r and r + 1 ≤ i ≤ k. Consequently, each f_i, r + 1 ≤ i ≤ k, belongs to S^a. Thus we observe that Sp(f_{r+1}, f_{r+2}, ..., f_k) ⊆ S^a. We have now identified precisely what S^a is, i.e., S^a = Sp(f_{r+1}, f_{r+2}, ..., f_k). From this it follows that dim(S^a) = k - r = dim(V) - dim(S).
The operation of annihilation can be extended. We start with a subspace S of a vector space V, and arrive at its annihilator S^a which is a subspace of V'. Now we can look at the annihilator S^aa of the subspace S^a. Of course, S^aa would be a subspace of V''. This chain could go on forever.
P 1.7.11 If S is a subspace of a vector space V, then S^aa is isomorphic to S.
PROOF. Consider the bijection identified between V and V'' in P 1.7.8. For every x_0 in V there exists a unique z_0 in V'' such that z_0(f) = f(x_0) for every f in V'. If z_0 ∈ S^aa, then z_0(f) = f(x_0) = 0 for every f in S^a. Since S is a subspace, this implies that x_0 ∈ S. In a similar vein, one can show that if x_0 ∈ S, then z_0 ∈ S^aa. The isomorphism that has been developed between V and V'' in P 1.7.8, when restricted to the subspace S, is an isomorphism between S and S^aa.
P 1.7.12 If S_1 and S_2 are subspaces of a vector space V, then
(S_1 + S_2)^a = S_1^a ∩ S_2^a and (S_1 ∩ S_2)^a = S_1^a + S_2^a.
These identities follow from the definition of annihilator.
Complements
1.7.1 If f is a non-zero linear functional from a vector space V(F) to a field F, show that f is a surjection (onto map).
1.7.2 If f_1 and f_2 are two linear functionals on a vector space V(F) satisfying f_1(x) = 0 whenever f_2(x) = 0 for x in V(F), show that f_1 = α f_2 for some α in F.
1.7.3 Let F be a field and P_n(t) the vector space of all polynomials of degree less than n with coefficients from the field F. For any x = Σ_{i=0}^{n-1} d_i t^i in P_n with d_i's in F, define
f(x) = Σ_{i=0}^{n-1} d_i β_i
for any fixed choice β_0, β_1, β_2, ..., β_{n-1} of scalars from F. Show that f is a linear functional. Show that any linear functional on P_n(t) arises this way.
1.7.4 Let F = {0, 1, 2} with addition and multiplication modulo 3. Spell out all the linear functionals explicitly on F^3.
1.7.5 The vectors (1,1,1,1), (1,1,-1,-1), (1,-1,1,-1), and (1,-1,-1,1) form a basis of the vector space R^4. Let f_1, f_2, f_3, and f_4 be the dual basis. Evaluate each of these linear functionals at x = (1,2,3,4).
1.7.6 Let f be a linear functional on a vector space V(F) and
S = {x ∈ V(F) : f(x) = 0}.
Show that S is a subspace of V(F). Comment on the possible values of the dimension of S.
1.7.7 If S is any subset of a vector space V, show that S^aa is isomorphic with the subspace spanned by S.
1.7.8 If S_1 and S_2 are two subsets of a vector space V such that S_1 ⊂ S_2, show that S_2^a ⊂ S_1^a.
1.8. Quotient Space
There are many ways of generating new vector spaces from a given vector space. Subspaces are one lot. Dual spaces are another. In this section, we will introduce quotient spaces.
DEFINITION 1.8.1. Let S be a subspace of a vector space V. Let x be an element of V. Then S_x = x + S is said to be a coset of S.
We have seen what cosets are in the context of groups. The idea is exactly the same. The group structure under focus here is the addition of vectors of a vector space. We define the following operations on the cosets of S.
Addition: For x, y in V, let
S_x + S_y = {u + v : u ∈ S_x, v ∈ S_y}.
Scalar multiplication: For α in F and x in V, let
αS_x = S_{αx} if α ≠ 0, and αS_x = S if α = 0.
The operation of addition defined above is nothing new. We have introduced this operation in the context of complementary subspaces. The following properties of these operations can be verified easily.
(1) S_x + S_y = S_{x+y} for any x, y in V. This means that S_x + S_y is also a coset of S.
(2) S_x + S = S_x for all x in V.
(3) S_x + S_{-x} = S for all x in V.
In addition to the above properties, the operation of addition satisfies the commutative and associative laws. The set of all distinct cosets of S thus becomes a commutative group. The zero element of the group is S. The negative of S_x is S_{-x}. The scalar multiplication introduced above on the cosets satisfies all the rules of a vector space. Consequently, the set of all cosets forms a vector space, which is called the quotient space associated with the subspace S and is denoted by V/S. The following result identifies what the quotient space is like.
P 1.8.2 The quotient space is isomorphic to every complement of the subspace S of V.
PROOF. Let S^c be a complement of S. Let f : S^c → V/S be defined by f(x) = S_x, x ∈ S^c. We show that f is an isomorphism. Let x_1 and x_2 be two distinct points of S^c. Then S_{x_1} ≠ S_{x_2}. If not, for any given z_1 in S there exists z_2 in S such that x_1 + z_1 = x_2 + z_2 = x, say. What this means is that x has two distinct decompositions, which is not possible. Consequently, f is an injection. Let K be any coset of S. Then K = x + S for some x in V. Since x admits a unique decomposition, we can write x = x_0 + x_1 with x_0 ∈ S and x_1 ∈ S^c. Consequently, K = x + S = x_0 + x_1 + S = x_1 + (x_0 + S) = x_1 + S. Thus K is of the form x_1 + S for some x_1 in S^c. This shows that f is a surjection. It can be verified that f is a linear map. Hence f is an isomorphism.
P 1.8.3 For any subspace S of a vector space V, dim(V/S) = dim(V) - dim(S).
This result follows from the fact that V/S is isomorphic to a complement of S. If one is searching for a complement of S, V/S is a natural candidate!
Complements
1.8.1 Let F = {0, 1, 2} with addition and multiplication modulo 3. Let S be the subspace of F^3 spanned by (1,0,0) and (1,0,2). Construct the quotient space F^3/S. Exhibit a complement of the subspace S different from the quotient space F^3/S.
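A small computational sketch for Complement 1.8.1 (an added illustration, not part of the text): enumerating the subspace S and the cosets x + S directly confirms that there are 27/9 = 3 distinct cosets, in agreement with dim(F^3/S) = 3 - 2 = 1 from P 1.8.3.

```python
from itertools import product

# Subspace S of F^3 (F = {0, 1, 2}, arithmetic mod 3) spanned by (1, 0, 0) and (1, 0, 2).
g1, g2 = (1, 0, 0), (1, 0, 2)
S = {tuple((a * u + b * v) % 3 for u, v in zip(g1, g2))
     for a in range(3) for b in range(3)}

# Each coset x + S is the set {x + s : s in S}; collect the distinct ones.
cosets = {frozenset(tuple((xi + si) % 3 for xi, si in zip(x, s)) for s in S)
          for x in product(range(3), repeat=3)}

print(len(S), len(cosets))   # 9 3  -> dim(S) = 2 and dim(F^3/S) = 1
```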
1.9. Projective Geometry
It is time to enjoy the fruits of the labor expended so far. We will present some rudiments of projective geometry just sufficient for our needs. Some applications of projective geometry include the construction of orthogonal latin squares and balanced incomplete block designs.
DEFINITION 1.9.1. Let S be a set of elements and 𝒮 a collection of subsets of S. The pair (S, 𝒮) is said to be a projective geometry if (1) given any two distinct elements in S there is a unique set in 𝒮 containing these two points, and (2) any two distinct sets in 𝒮 have only one member of S in common.
In the picturesque language of geometry, the members of S are called points and the sets in 𝒮 lines. The condition (1) translates into the dictum that there is one and only one line passing through any two given distinct points. The condition (2) aligns with the statement that any two distinct lines meet at one and only one point. If the set S is finite, the associated geometry is called a finite projective geometry. In this section, we show that such geometries can be constructed from vector spaces.
Consider a three-dimensional vector space V(F) over a finite field F consisting of s elements, say. If x_1, x_2, and x_3 are linearly independent vectors in V(F), we can identify V(F) as
V(F) = {α_1 x_1 + α_2 x_2 + α_3 x_3 : α_1, α_2, α_3 ∈ F}.
Since each α_i can be chosen in s different ways, the number of distinct vectors in V(F) is s^3. We now build a finite projective geometry out of V(F). Let S be the collection of all one-dimensional subspaces (points) of V(F). Consider any two-dimensional subspace of V(F). Any such subspace can be written as the union of all its one-dimensional subspaces. Instead of taking the union, we identify the two-dimensional subspace with the set of all its one-dimensional subspaces. With this understanding, let 𝒮 be the collection of all two-dimensional subspaces (lines) of V(F). We provide examples in the later part of the section. The important point that emerges out of the discussion carried out so far is that the pair (S, 𝒮) is a projective geometry.
Since each 0i can be chosen in different ways, the number of distinct vectors in V(F) is s3. We now build a finite projective geometry out of V(F). Let S be the collection of all one-dimensional subspaces (points) of V(F). Consider any two-dimensional subspace of V(F). Any such subspace can be written as the union of all its one-dimensional subspaces. mstead of taking the union, we identify the two-dimensional subspace by the set of all its one-dimensional subspaces. With this understanding, let S be the collection of all two-dimensional subspaces (lines) of V(F). We provide examples in the later part of the section. The important point that emerges out of the discussion carried out so far is that the pair (S, S) is a projective geometry. P 1.9.2
The pair (S, S) is a finite projective geometry. Moreover:
(1) The number of points, i.e., the cardinality of S is s2 + s + 1. (2) The number of lines, i.e., the cardinality of S is s2 + S + 1. (3) The number of points on each line is s + 1.
PROOF. A one-dimensional subspace of V(F) is spanned by a non-zero vector of V(F). For each non-zero vector x in V(F), let M(x) be the one-dimensional subspace spanned by x. There are s^3 - 1 non-zero vectors in V(F). But the one-dimensional subspaces spanned by each of these vectors are not necessarily distinct in view of the fact that M(x) = M(αx) for each non-zero α in F and non-zero x in V(F). There are s - 1 vectors giving rise to the same one-dimensional subspace. Consequently, the total number of one-dimensional subspaces is (s^3 - 1)/(s - 1) = s^2 + s + 1. Thus the cardinality of the set S is s^2 + s + 1. This proves (1). Any two-dimensional subspace of V(F) is spanned by two linearly independent vectors of V(F). For any two linearly independent vectors x_1, x_2 of V(F), let M(x_1, x_2) be the two-dimensional subspace of V(F) spanned by x_1, x_2. The total number of pairs of linearly independent vectors is (s^3 - 1)(s^3 - s)/2. (Why?) The total number of non-zero vectors in any two-dimensional subspace is s^2 - 1. The subspace M(x_1, x_2) can also be spanned by any two linearly independent vectors in M(x_1, x_2). The total number of pairs of linearly independent vectors in M(x_1, x_2) is (s^2 - 1)(s^2 - s)/2. (Why?) Consequently, the total number of different two-dimensional subspaces is [(s^3 - 1)(s^3 - s)/2]/[(s^2 - 1)(s^2 - s)/2] = s^2 + s + 1. This proves (2). Using an argument similar to the one used in establishing (1), the total number of distinct one-dimensional subspaces of a two-dimensional subspace is (s^2 - 1)/(s - 1) = s + 1. This proves (3).
It remains to be shown that (S, 𝒮) is a projective geometry. Let M(x_1) and M(x_2) be two distinct one-dimensional subspaces of V(F), i.e., distinct points of S. Let M(x_3, x_4) be any two-dimensional subspace containing both M(x_1) and M(x_2). Since x_1, x_2 ∈ M(x_3, x_4), it follows that M(x_1, x_2) = M(x_3, x_4). Consequently, there is one and only one line containing any two points. Consider now two different two-dimensional subspaces M(x_1, x_2) and M(x_3, x_4) of V(F). The vectors x_1, x_2, x_3 and x_4 are linearly dependent as our vector space is only three dimensional. There exist scalars α_1, α_2, not both zero, and α_3, α_4, not both zero, in F such that
y = α_1 x_1 + α_2 x_2 = α_3 x_3 + α_4 x_4.
(Why?) Obviously, y is non-zero. Clearly, the point y belongs to both the lines M(x_1, x_2) and M(x_3, x_4). This means that there is at least one point common to any two distinct lines. Suppose M(y_1) and M(y_2) are two distinct one-dimensional subspaces common to both M(x_1, x_2) and M(x_3, x_4). Then it follows that
M(y_1, y_2) = M(x_1, x_2) = M(x_3, x_4). This is a contradiction. This shows that any two distinct lines intersect at one and only one point. The proof is complete.
The projective geometry described above is denoted by PG(2, s). The relevance of the number s in the notation is clear. The number two is precisely the dimension of the vector space F^3 less one. The number s could not be any integer. Since the cardinality of the Galois field F is s, s = p^m for some prime number p and integer m ≥ 1. See Section 1.1. A concrete construction of the projective geometry PG(2, s) is not hard. For PG(2, s), what we need is a Galois field F with s elements, which we have described in Example 1.1.10 in a special case. Let the s elements of F be denoted by α_0 = 0, α_1 = 1, α_2, ..., α_{s-1}. The underlying three-dimensional vector space V(F) over the field F can be taken to be F^3. We now identify explicitly the one- and two-dimensional subspaces of F^3.
(a) One-dimensional subspaces
Any one-dimensional subspace of F^3 is of one of the following types:
(1) Span{(1, α_i, α_j)}, i, j = 0, 1, 2, ..., s-1.
(2) Span{(0, 1, α_i)}, i = 0, 1, 2, ..., s-1.
(3) Span{(0, 0, 1)}.
One way to see this is to observe, first, that the vectors (1, α_i, α_j), (0, 1, α_k) and (0, 0, 1) are always (why?) linearly independent for any i, j, k = 0, 1, 2, ..., s-1. Secondly, the totality of all one-dimensional subspaces listed above is exactly s^2 + s + 1. In our projective geometry PG(2, s), each of these one-dimensional subspaces constitutes a point of the set S. For ease of identification, we codify these one-dimensional subspaces.

One-dimensional subspace                              Code assigned
Span{(1, α_i, α_j)}, i, j = 0, 1, 2, ..., s-1         s^2 + is + j
Span{(0, 1, α_i)}, i = 0, 1, 2, ..., s-1              s + i
Span{(0, 0, 1)}                                       1
The integer codes for the points of S are all different, although they may not be successive numbers. We now try to identify the two-dimensional subspaces of F^3.

Two-dimensional and associated one-dimensional subspaces

Type  Two-dimensional subspaces            Constituent one-dimensional subspaces
1     {(0,0,1), (0,1,0)}                   {(0,1,0)}; {(0, α_j, 1)}, j = 0, 1, 2, ..., s-1
2     {(0,0,1), (1, α_i, 0)},              {(0,0,1)}; {(1, α_i, α_j)}, j = 0, 1, 2, ..., s-1
      i = 0, 1, 2, ..., s-1
3     {(0,1,0), (1, 0, α_i)},              {(0,1,0)}; {(1, α_j, α_i)}, j = 0, 1, 2, ..., s-1
      i = 0, 1, 2, ..., s-1
4     {(0,1,α_i), (1, α_k, 0)},            {(0,1,α_i)}; {(1, α_k + α_j, α_i α_j)}, j = 0, 1, 2, ..., s-1
      i = 1, 2, ..., s-1,
      k = 0, 1, 2, ..., s-1
(For typographical convenience, the qualifying phrase "span" is omitted for the subspaces indicated.) The above table needs some justification. When we wrote down all one-dimensional subspaces of F^3 in a systematic fashion, the following vectors in F^3, arranged in three distinct groups, played a pivotal role.

Group 1: (0,0,1)
Group 2: (0,1,0); (0,1,1); ...; (0,1,α_{s-1})
Group 3: (1,0,0), (1,0,1), (1,0,α_2), ..., (1,0,α_{s-1});
         (1,1,0), (1,1,1), (1,1,α_2), ..., (1,1,α_{s-1});
         (1,α_2,0), (1,α_2,1), (1,α_2,α_2), ..., (1,α_2,α_{s-1});
         ...;
         (1,α_{s-1},0), (1,α_{s-1},1), (1,α_{s-1},α_2), ..., (1,α_{s-1},α_{s-1})

Take any two vectors from anywhere in the above pool. Their span will give a two-dimensional subspace of F^3. But there will be duplications. We need to select carefully pairs of vectors from the above
so as to avoid duplications. Let us begin with the vector from Group 1 and find partners for this vector to generate a two-dimensional subspace. To start with, take the vector (0,1,0) from Group 2 as a partner. Their span would give us a two-dimensional subspace of F^3. The one-dimensional subspaces of Span{(0,0,1), (0,1,0)} can be identified in terms of the notation of the one-dimensional subspaces we have employed earlier. The one-dimensional subspaces are given by Span{(0,1,0)}, Span{α_j(0,1,0) + (0,0,1)}, j = 0, 1, 2, ..., s-1. These one-dimensional subspaces can be rewritten in succinct form as: Span{(0,1,0)}, Span{(0, α_j, 1)}, j = 0, 1, 2, ..., s-1, which are s + 1 in number, as expected. This particular two-dimensional subspace is categorized as of Type 1. Let us consider the span of (0,0,1) and any one of the remaining vectors in Group 2. We will not get anything new.
Let us now consider the span of the vector in Group 1 and any vector from Group 3. Consider, in particular, the span of (0,0,1) and any vector in the first column of Group 3. Any such two-dimensional subspace is categorized as of Type 2. You might ask why. Before we answer this question, observe that there are s such two-dimensional subspaces and they are all distinct. If we consider the span of (0,0,1) with any vector in any of the remaining columns of vectors in Group 3, it would coincide with one of the two-dimensional subspaces we have stored under Type 2. The operation of finding mates for the vector (0,0,1) ends here.
Let us work with the vector (0,1,0) from Group 2. The vector space spanned by (0,1,0) and any one of the remaining vectors in Group 2 would coincide with the one we have already got under Type 1. Consider the vector space spanned by (0,1,0) and any one of the vectors in the first row of Group 3. Each of these two-dimensional subspaces is categorized as of Type 3. There are s many of these two-dimensional spaces. These are all distinct among themselves and are also distinct from what we have got under Types 1 and 2. Further, the vector space spanned by (0,1,0) and any one of the vectors from the remaining rows of Group 3 would coincide with one of those we have already got. This completes the search of mates for the vector (0,1,0).
Consider any vector (0,1,α_i), i = 1, 2, ..., s-1, from Group 2. The vector space spanned by (0,1,α_i) and any one of the vectors from Group 3 is categorized as of Type 4. All these two-dimensional spaces are distinct and are also distinct from what we have already got. There are s(s-1) vector subspaces in Type 4. So far, we have got s(s-1) + s + s + 1 = s^2 + s + 1 distinct two-dimensional subspaces. We have no more! The identification of the one-dimensional subspaces in any two-dimensional subspace listed above is similar to the one we have explained for the two-dimensional subspace listed under Type 1. It would be quite instructive to use the integer codes for the one-dimensional subspaces listed under each two-dimensional subspace delineated above.
EXAMPLE 1.9.3. Let us look at a very specific example. Take s = 3. The Galois field F can be identified as {0, 1, 2} with the usual operations of addition and multiplication modulo 3. The total number of one-dimensional subspaces (points) in PG(2, 3) is 13 and the total number of two-dimensional subspaces (lines) is 13. They are identified explicitly in the accompanying table along with the integer codes of the one-dimensional subspaces.
The numbers involved in the integer code are: 1, 3, 4, 5, 9, 10, 11, 12, 13, 14, 15, 16 and 17. For an inkling of what is to come, suppose we want to compare the performance of 13 treatments on some experimental units. Suppose that the treatments are numbered as 1, 3, 4, 5, 9, 10, 11, 12, 13, 14, 15, 16 and 17. In the above description of the projective geometry PG(2,3), each line can be viewed as a block and each point as a treatment. Corresponding to each line, which has four points (integer codes) on it, we create a block with four experimental units and assign to them the treatments associated with the integer codes. We thus have 13 blocks corresponding to 13 lines, with four treatments in each block. We then have what is called a balanced incomplete block design! Each treatment is replicated exactly four times and every pair of treatments appears exactly in one block. We will now describe formally what a balanced incomplete block design is and explain how projective geometry provides such a design.
One of the primary goals in design of experiments is to compare the effect of some ν treatments over a certain population of experimental units. The experimental units can be arranged in blocks in such a way that the units in each block are homogeneous in all perceivable aspects. Ideally, we would like to have blocks each containing ν experimental units, i.e., of size ν, so that each treatment can be tried once in each block. In practice, blocks of size k with k < ν may only be available. Suppose we have b blocks each of size k. A design is simply an allocation of treatments to experimental units in the blocks. One of the basic
problems is to allocate the treatments to units in a judicious manner so that we can compare the performance of any two treatments statistically with the same precision. A design is said to be a balanced incomplete block design if the following conditions are met: (1) Each treatment appears in r blocks, i.e., each treatment is replicated r times. (2) Every pair of treatments appears together in λ blocks. Such a design is denoted by BIBD with parameters b, k, ν, r and λ. The basic question is how to construct such a design. Projective geometry is a vehicle to realize our goal. If the number of treatments ν = s^2 + s + 1 for some s of the form p^m, where p is a prime number and m is a positive integer, and the number of blocks available is b = s^2 + s + 1 each of size s + 1, then the projective geometry PG(2, s) will deliver the goods. Identify the points of the projective geometry with treatments and the lines with blocks. We have a balanced incomplete block design with parameters b = s^2 + s + 1, k = s + 1, ν = s^2 + s + 1, r = s + 1 and λ = 1. The example presented in the accompanying Table is a BIBD with b = 13, k = 4, ν = 13, r = 4 and λ = 1.
After having gone through the gamut outlined above, one gets the uncomfortable feeling that the technique of projective geometries in the construction of balanced incomplete block designs is of limited scope. In practice, the number of blocks available, the number of treatments to be tested and the block size may not conform to the above specifications. (At this juncture, we would like to point out that a BIBD may not be available for any given b, k and ν.) In order to have more flexibility, we need to extend the ideas germane to the projective geometry PG(2, s). Start with a Galois field F consisting of s elements. Consider the vector space F^{m+1}. The projective geometry PG(m, s) consists of the set S of all one-dimensional subspaces of the vector space F^{m+1} and the set 𝒮 of all k-dimensional subspaces of F^{m+1} for some k ≥ 2. The elements of S are called points and the elements of 𝒮 are called k-planes. One could treat each point as a treatment and each k-plane as a block. This is a more general way of developing a BIBD. We will not pursue in detail the general construction. We only want to provide a rudimentary indication of how vector spaces and their ilk can solve a variety of statistical problems. For a discussion of finite projective geometries of dimensions more than two, the reader is referred to Rao (1945c, 1946a) and to the references to other papers given there.
TABLE: BIBD DESIGN

Type  Two-dimensional subspace   One-dimensional subspaces (points)                 Integer codes
1     {(0,0,1),(0,1,0)}          {(0,0,1)}; {(0,1,0)}; {(0,1,1)}; {(0,1,2)}          1, 3, 4, 5
2     {(0,0,1),(1,0,0)}          {(0,0,1)}; {(1,0,0)}; {(1,0,1)}; {(1,0,2)}          1, 9, 10, 11
      {(0,0,1),(1,1,0)}          {(0,0,1)}; {(1,1,0)}; {(1,1,1)}; {(1,1,2)}          1, 12, 13, 14
      {(0,0,1),(1,2,0)}          {(0,0,1)}; {(1,2,0)}; {(1,2,1)}; {(1,2,2)}          1, 15, 16, 17
3     {(0,1,0),(1,0,0)}          {(0,1,0)}; {(1,0,0)}; {(1,1,0)}; {(1,2,0)}          3, 9, 12, 15
      {(0,1,0),(1,0,1)}          {(0,1,0)}; {(1,0,1)}; {(1,1,1)}; {(1,2,1)}          3, 10, 13, 16
      {(0,1,0),(1,0,2)}          {(0,1,0)}; {(1,0,2)}; {(1,1,2)}; {(1,2,2)}          3, 11, 14, 17
4     {(0,1,1),(1,0,0)}          {(0,1,1)}; {(1,0,0)}; {(1,1,1)}; {(1,2,2)}          4, 9, 13, 17
      {(0,1,1),(1,1,0)}          {(0,1,1)}; {(1,1,0)}; {(1,2,1)}; {(1,0,2)}          4, 11, 12, 16
      {(0,1,1),(1,2,0)}          {(0,1,1)}; {(1,2,0)}; {(1,0,1)}; {(1,1,2)}          4, 10, 14, 15
      {(0,1,2),(1,0,0)}          {(0,1,2)}; {(1,0,0)}; {(1,1,2)}; {(1,2,1)}          5, 9, 14, 16
      {(0,1,2),(1,1,0)}          {(0,1,2)}; {(1,1,0)}; {(1,2,2)}; {(1,0,1)}          5, 10, 12, 17
      {(0,1,2),(1,2,0)}          {(0,1,2)}; {(1,2,0)}; {(1,0,2)}; {(1,1,1)}          5, 11, 13, 15
Note: Some references to material covered in this Chapter, where further details can be obtained, are Bose, Shrikhande and Parker (1960), Halmos (1958), Raghava Rao (1971) and Rao (1947, 1949).
CHAPTER 2
UNITARY AND EUCLIDEAN SPACES
So far we have studied the relationship between the elements of a vector space through the notion of independence. It would be useful to consider other concepts such as distance and angle between vectors as in the case of two and three dimensional Euclidean spaces. It appears that these concepts can easily be extended to vector spaces over the field of complex or real numbers by defining a function called the inner product of two vectors.
2.1. Inner Product
However abstract a vector space may be, when it comes to practicality, we would like to relate the vectors either with real numbers or complex numbers. One useful way of relating vectors with real numbers is to associate a norm, which is a non-negative real number, with every vector. We will see more of this later. Another way is to relate pairs of vectors with complex numbers, leading to the notion of inner product between vectors. We will present rudiments of these ideas now.
DEFINITION 2.1.1. Let V be a vector space over a field F, where F is either the field C of complex numbers or R of real numbers. A map <·, ·> from V × V to F is called an inner product if the following properties hold for all x, y, z in V and α, β in F.
(1) <x, y> equals the complex conjugate of <y, x> if F = C (anti-symmetry), and <x, y> = <y, x> if F = R (symmetry).
(2) <x, x> > 0 if x ≠ 0, and <x, x> = 0 if x = 0. (positivity)
(3) <αx + βy, z> = α<x, z> + β<y, z>. (linearity in the first argument)
Conjugation in (1) above is the usual operation of conjugation on complex numbers. A vector space furnished with an inner product is called an inner product space. It is customary to call such a space unitary when F = C, and Euclidean when F = R. We have the following proposition as a consequence of the conditions (1), (2) and (3) of Definition 2.1.1. In the sequel, most of the results are couched with reference to the field C of complex numbers. Only minor modifications are needed when the underlying field is R.
P 2.1.2 For any x, y, z in V(C) and α, β in C, the following hold.
(a) <x, αy + βz> = ᾱ<x, y> + β̄<x, z>.
(b) <x, 0> = <0, x> = 0.
(c) <αx, βy> = α<x, βy> = αβ̄<x, y>.
Some examples of inner product spaces are provided below.
EXAMPLE 2.1.3. Consider the vector space R^k for some k ≥ 1. For any two vectors x = (α_1, α_2, ..., α_k) and y = (β_1, β_2, ..., β_k) in R^k, define
<x, y> = α_1 β_1 + α_2 β_2 + ... + α_k β_k,
which can be shown to be an inner product on the vector space Rk. This is the standard inner product of the space R k • 2.1.4. Consider the vector space C k for some k ~ 1. Let 81 ,82 , ••• ,15k be fixed positive numbers. For any two vectors x = (at, a2, .. · ,ak) and y = (/31, /32, ... ,/3k) in Ck, define EXAMPLE
which can be shown to be an inner product on C k • If 81 = 82 = ... = 15k = 1, the resultant inner product is the so-called standard inner product on the space Ck. One might wonder about the significance of the way the inner product is defined above on C k • If one defines
then one of the conditions (which one?) of Definition 2.1.1 is violated.
EXAMPLE 2.1.5. Let P_n be the space of all polynomials of degree less than n with coefficients from the field C of complex numbers. For any two polynomials x(·) and y(·) in P_n, define
<x, y> = ∫_0^1 x(t) ȳ(t) dt,
which can be shown to be an inner product on P_n.
EXAMPLE 2.1.6. Let x_1, x_2, ..., x_k be a basis of a vector space V(C). For any two vectors x and y in V, we will have unique representations
x = α_1 x_1 + α_2 x_2 + ... + α_k x_k  and  y = β_1 x_1 + β_2 x_2 + ... + β_k x_k
in terms of the basis vectors. Let δ_1, δ_2, ..., δ_k be some fixed positive numbers. Define
<x, y> = δ_1 α_1 β̄_1 + δ_2 α_2 β̄_2 + ... + δ_k α_k β̄_k,
which can be shown to be an inner product on the vector space V. Note that an inner product on a vector space can be defined in many ways. The choice of a particular inner product depends on its usefulness in solving a given problem. We will see instances of several inner products in subsequent sections and chapters. Every inner product gives rise to what is known as a norm.
DEFINITION 2.1.7. (Norm) Let <·, ·> be an inner product on a vector space V. The positive square root of <x, x> for any x in V is called the norm of x and is denoted by ||x||.
There is a more general definition of a norm on a vector space. The norm we have introduced above arises from an inner product. The more general version of a norm will be considered in a later chapter. As a consequence of Definitions 2.1.1 and 2.1.7, the following inequality follows.
P 2.1.8 (Cauchy-Schwartz Inequality). Let (V, < ·, · >) be an inner product space with the associated norm || · || in V. Then for any two vectors x and y in V, the inequality

| < x, y > | ≤ ||x|| ||y||     (2.1.1)

holds. Moreover, equality holds in the above if and only if γx + δy = 0 for some γ and δ, not both zero, in C.
PROOF. Let β = < x, x > and α = − < y, x >. Observe that ᾱ = − < x, y >. We are required to establish that

|α|² = | < x, y > |² ≤ β < y, y >.

By the definition of an inner product,

0 ≤ < αx + βy, αx + βy >
  = α < x, αx + βy > + β < y, αx + βy >
  = αᾱ < x, x > + αβ̄ < x, y > + βᾱ < y, x > + ββ̄ < y, y >
  = |α|²β − |α|²β − |α|²β + β² < y, y >
  = −|α|²β + β² < y, y >,     (2.1.2)

from which (2.1.1) follows (divide by β when β > 0; when x = 0 both sides of (2.1.1) vanish). If γx + δy = 0 for some γ and δ, not both zero, in C, it is clear that equality holds in (2.1.1). On the other hand, if equality holds in (2.1.1), equality must hold in (2.1.2) throughout. This implies that αx + βy = 0. If x = 0, take γ to be any non-zero scalar and δ = 0. If x ≠ 0, take γ = α and δ = β. This completes the proof.
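As a quick numerical companion to P 2.1.8 (numpy assumed; not part of the text), one can verify the inequality on random complex vectors and observe that equality occurs when one vector is a scalar multiple of the other.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=5) + 1j * rng.normal(size=5)
    y = rng.normal(size=5) + 1j * rng.normal(size=5)

    lhs = abs(np.vdot(y, x))                  # |< x, y >| for the standard inner product
    rhs = np.linalg.norm(x) * np.linalg.norm(y)
    print(lhs <= rhs + 1e-12)                 # True

    # equality when y is a multiple of x
    print(np.isclose(abs(np.vdot(x, 2 * x)),
                     np.linalg.norm(x) * np.linalg.norm(2 * x)))   # True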
P 2.1.9 For any two vectors x and y in a vector space V equipped with an inner product < ·, · >, the following inequality holds:

< x, y > + < y, x > ≤ 2 ||x|| ||y||.

PROOF. Observe that < x, y > + < y, x > = 2 Re < x, y > ≤ 2 | < x, y > |. The result now follows from the Cauchy-Schwartz inequality.

We now establish some properties of a norm. For any two vectors x and y, ||x − y|| can be regarded as the distance between x and y.
P 2.1.10 Let x and y be any two vectors in an inner product space V with inner product < ·, · > and norm || · ||. Then the following hold.

(1) ||x + y|| ≤ ||x|| + ||y||.
(2) ||x − y|| + ||y|| ≥ ||x|| (triangle inequality of distance).
(3) ||x + y||² + ||x − y||² = 2||x||² + 2||y||² (parallelogram law).
(4) ||x + y||² = ||x||² + ||y||² if < x, y > = 0 (Pythagoras theorem).
PROOF. By the definition of the norm,

||x + y||² = ||x||² + ||y||² + < x, y > + < y, x >
          ≤ ||x||² + ||y||² + 2||x|| ||y||   (by the Cauchy-Schwartz inequality)
          = (||x|| + ||y||)²,

from which (1) follows. In (1), if we replace x by y and y by x − y, we obtain (2). The relations stated in (3) and (4) can be established in an analogous fashion.

We now formally define the distance and angle between any two vectors in any inner product space.

DEFINITION 2.1.11. Let x and y be any two vectors in a vector space V equipped with an inner product < ·, · > and the associated norm || · ||. The distance δ(x, y) between x and y is defined by δ(x, y) = ||x − y||.
P 2.1.12 The distance function δ(·, ·) defined above has the following properties.

(1) δ(x, y) = δ(y, x) for any x and y in V.
(2) δ(x, y) ≥ 0 for any x and y in V, and δ(x, y) = 0 if and only if x = y.
(3) δ(x, y) ≤ δ(x, z) + δ(y, z) for any x, y and z in V (triangle inequality).

PROOF. The properties (1) and (2) follow from the very definition of the distance function. If we replace x by x − y and y by x − z in P 2.1.10 (2), we obtain the triangle inequality (3) above.

DEFINITION 2.1.13. Let V be a Euclidean space equipped with an inner product < ·, · > and the associated norm || · ||. For any two non-zero vectors x and y in V, the angle θ between x and y is defined by

cos θ = < x, y > / (||x|| ||y||).
Observe that, in view of the Cauchy-Schwartz inequality, cos θ always lies in the interval [−1, 1]. This definition does not make sense in unitary spaces because < x, y > could be a complex number. The notion of angle between two non-zero vectors of a Euclidean vector space is consonant with the usual perception of angle in the two-dimensional Euclidean space. Let x = (x1, x2) and y = (y1, y2) be two non-zero vectors in the first quadrant of the two-dimensional Euclidean space R2. Let L1 be the line joining the vectors 0 = (0, 0) and x = (x1, x2), and L2 the line joining 0 = (0, 0) and y = (y1, y2). Let θ1 and θ2 be the angles the lines L1 and L2 make with the x-axis, respectively. Then the angle θ between the lines L1 and L2 at the origin is given by θ = θ1 − θ2. Further,

cos θ = cos(θ1 − θ2) = cos θ1 cos θ2 + sin θ1 sin θ2
      = (x1/||x||)(y1/||y||) + (x2/||x||)(y2/||y||)
      = < x, y > / (||x|| ||y||).

Complements
2.1.1 Let V be a real inner product space and α, β be two positive real numbers. Show that the angle between two non-zero vectors x and y of V is the same as the angle between the vectors αx and βy.
2.1.2 Compute the angle between the vectors x = (3, −1, 1, 0) and y = (2, 1, −1, 1) in R4 with respect to the standard inner product of the space R4.
2.1.3 Let α, β, γ and δ be four complex numbers. For x = (ξ1, ξ2) and y = (η1, η2) in C2, define < x, y > = αξ1η̄1 + βξ2η̄1 + γξ1η̄2 + δξ2η̄2. Under what conditions on α, β, γ, and δ is < ·, · > an inner product on C2?
2.1.4 Suppose ||x + y||² = ||x||² + ||y||² for some two vectors x and y in a unitary space V. Is it true that < x, y > = 0? What happens when V is a Euclidean space?
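As a numerical companion to Complement 2.1.2 (numpy assumed; not part of the text), the following sketch computes the angle of Definition 2.1.13 for the two vectors given there.

    import numpy as np

    def angle(x, y):
        # cos(theta) = < x, y > / (||x|| ||y||), Definition 2.1.13
        c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
        return np.arccos(np.clip(c, -1.0, 1.0))

    x = np.array([3.0, -1.0, 1.0, 0.0])
    y = np.array([2.0, 1.0, -1.0, 1.0])
    print(angle(x, y))          # about 1.097 radians (roughly 63 degrees)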
2.2. Orthogonality

Let us, for a moment, look at two points x = (x1, x2) and y = (y1, y2) in the two-dimensional Euclidean space R2. Draw a line joining the points 0 = (0, 0) and x and another line joining 0 and y. We would like to enquire under what circumstances the angle between these lines is 90°. Equivalently, we ask under what conditions the triangle Δ0xy formed by the points 0, x, and y is a right-angled triangle with the angle ∠x0y at the origin equal to 90°. It turns out that the condition is < x, y > = x1y1 + x2y2 = 0. (Draw a picture.) This is the idea that we would like to pursue in inner product spaces.
DEFINITION 2.2.1. Two vectors x and y in an inner product space V are said to be orthogonal if the inner product between x and y is zero, i.e., < x, y > = 0. In the case of a Euclidean space, orthogonality of x and y implies that the angle between x and y is 90°. Trivially, if x = 0, then x is orthogonal to every vector in V. The notion of orthogonality can be extended to any finite set of vectors.
DEFINITION 2.2.2. A collection, x1, x2, ..., xr, of vectors in an inner product space V is said to be orthonormal if

< xi, xj > = 0 if i ≠ j, and < xi, xj > = 1 if i = j.

If a vector x is such that < x, x > = ||x||² = 1, then x is said to be of unit length. If we drop the condition that each vector above be of unit length, then the vectors are said to be an orthogonal set of vectors.

P 2.2.3 Let x1, x2, ..., xr be an orthogonal set of non-zero vectors in an inner product space V. Then x1, x2, ..., xr are necessarily linearly independent.

PROOF. Suppose y = α1x1 + α2x2 + ... + αrxr = 0. Then for each 1 ≤ i ≤ r, 0 = < y, xi > = αi < xi, xi >. Since xi is a non-zero vector, αi = 0. This shows that the orthogonal set under discussion is linearly independent.

One of the most useful techniques in the area of orthogonality is the well-known Gram-Schmidt orthogonalization process. The process transforms a given collection of vectors in an inner product space into an orthogonal set.
P 2.2.4 (Gram-Schmidt Orthogonalization Process). Given a linearly independent set x1, x2, ..., xr of vectors in an inner product space, it is possible to construct an orthonormal set z1, z2, ..., zr of vectors such that Sp(x1, ..., xi) = Sp(z1, ..., zi) for every i = 1, ..., r. [Note the definition: Sp(a1, ..., ak) = {α1a1 + ... + αkak : α1, ..., αk ∈ C}.]

PROOF. Define vectors y1, y2, ..., yr in the following way:

y1 = x1,
y2 = x2 − α2,1 y1,
...
yr = xr − αr,r−1 yr−1 − αr,r−2 yr−2 − ... − αr,1 y1,

for some scalars αi,j. We will choose the αi,j carefully so that the new vectors y1, y2, ..., yr form an orthogonal set of non-zero vectors. The determination of the αi,j is done sequentially. Choose α2,1 so that y1 and y2 are orthogonal. Setting

0 = < y2, y1 > = < x2, x1 > − α2,1 < y1, y1 >,

we obtain α2,1 = < x2, x1 > / < y1, y1 >. (Note that < y1, y1 > > 0.) Thus y2 is determined. Further, the vector y2 is non-zero since x1 and x2 are linearly independent. Choose α3,2 and α3,1 so that y1, y2 and y3 are pairwise orthogonal. Set

0 = < y3, y2 > = < x3, y2 > − α3,2 < y2, y2 >,
0 = < y3, y1 > = < x3, y1 > − α3,1 < y1, y1 >.

From these two equations, we can determine α3,1 and α3,2, which meet our requirements. Thus y3 is determined. Note that the vector y3 is a linear combination of the vectors x1, x2, and x3. Consequently, y3 ≠ 0. (Why?) Continuing this process down the line successively, we obtain a set y1, y2, ..., yr of orthogonal non-zero vectors. The computation of the coefficients αi,j is very simple. For the desired orthonormal set, set zi = yi/||yi||, i = 1, 2, ..., r. From the above construction, it is clear that (1) each yi is a linear combination of x1, x2, ..., xi, and (2) each xi is a linear combination of y1, y2, ..., yi, from which we have Sp(x1, ..., xi) = Sp(z1, ..., zi) for every i = 1, ..., r.
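The construction in the proof of P 2.2.4 is easy to carry out numerically. The sketch below (numpy assumed; not part of the text) is a compact variant in which each projection is taken against the already normalized vectors, which is equivalent to the scheme above.

    import numpy as np

    def gram_schmidt(X):
        """Orthonormalize the rows of X (assumed linearly independent)."""
        Z = []
        for x in X:
            y = x.astype(complex)
            for z in Z:
                # subtract the component of x along each earlier orthonormal vector
                y = y - np.vdot(z, y) * z
            Z.append(y / np.linalg.norm(y))
        return np.array(Z)

    X = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    Z = gram_schmidt(X)
    print(np.round(Z @ Z.conj().T, 10))   # identity matrix: the rows are orthonormal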
EXAMPLE 2.2.5. Let P4 be the vector space of all real polynomials of degree less than 4. The polynomials 1, x, x², x³ form a linearly independent set of vectors in P4. For p(·) and q(·) in P4, let

< p(·), q(·) > = ∫₋₁⁺¹ p(x) q(x) dx.

Observe that < ·, · > is an inner product on P4. The vectors 1, x, x², x³ are not orthogonal under the above inner product. We can invoke the Gram-Schmidt orthogonalization process on these vectors to obtain an orthogonal set. The process gives

p1(x) = 1,  p2(x) = x,  p3(x) = x² − 1/3,  p4(x) = x³ − (3/5)x.
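As a numerical aside (numpy assumed; not part of the text), one can confirm that these four polynomials are mutually orthogonal under the integral inner product, approximating the integral by Gauss-Legendre quadrature, which is exact for the polynomial degrees involved.

    import numpy as np

    # nodes and weights that integrate polynomials up to degree 9 exactly on [-1, 1]
    t, w = np.polynomial.legendre.leggauss(5)

    p = [lambda x: np.ones_like(x),
         lambda x: x,
         lambda x: x**2 - 1/3,
         lambda x: x**3 - (3/5) * x]

    inner = lambda f, g: np.sum(w * f(t) * g(t))   # approximates the integral over [-1, 1]
    print(np.round([[inner(f, g) for g in p] for f in p], 10))   # off-diagonal entries are 0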
This process can be continued forever. The sequence p1, p2, ... of polynomials so obtained is the well-known sequence of Legendre orthogonal polynomials. We can obtain an orthonormal set by dividing each polynomial by its norm. We can create other sequences of polynomials from 1, x, x², ... by defining inner products of the type

< p(·), q(·) > = ∫ p(x) q(x) f(x) dx,

choosing a suitable function f(x).

We know that every vector space has a basis. If the vector space comes with an inner product, it is natural to enquire whether it has a basis consisting of orthonormal vectors, i.e., an orthonormal basis. The Gram-Schmidt orthogonalization process provides the sought-after basis. We will record this in the form of a proposition.

P 2.2.6 Every inner product space has an orthonormal basis.

If we have a basis x1, x2, ..., xk for an inner product space V, we can write every vector x in V as a linear combination of x1, x2, ..., xk,

x = α1x1 + α2x2 + ... + αkxk,

for some α1, α2, ..., αk in C. Determining these coefficients αi is a hard problem. If x1, x2, ..., xk happen to be orthonormal, then these coefficients can be calculated in a simple way. More precisely, we have αi = < x, xi > for each i and

x = < x, x1 > x1 + < x, x2 > x2 + ... + < x, xk > xk.
This is not hard to see. There are other advantages that accrue if we happen to have an orthonormal basis. The inner product between any two vectors x and y in V can be computed in a straightforward manner. More precisely,

< x, y > = < x, x1 > < x1, y > + < x, x2 > < x2, y > + ... + < x, xk > < xk, y >.     (2.2.1)

The above is the well-known Parseval Identity. Once we know the coefficients αi in the representation of x in terms of x1, x2, ..., xk and the coefficients βi in the representation of y in terms of x1, x2, ..., xk, we can immediately write down the inner product of x and y, courtesy of the Parseval identity, as

< x, y > = α1β̄1 + α2β̄2 + ... + αkβ̄k.

One consequence of (2.2.1) is that the norm of x can be written down explicitly in terms of these coefficients. More precisely,

||x||² = | < x, x1 > |² + | < x, x2 > |² + ... + | < x, xk > |² = |α1|² + |α2|² + ... + |αk|².     (2.2.2)
In this connection, it is worth bringing into focus the Bessel Inequality. The statement reads as follows: if x1, x2, ..., xr is a set of orthonormal vectors in an inner product space, then for any vector x in the vector space, the following inequality holds:

||x||² ≥ | < x, x1 > |² + | < x, x2 > |² + ... + | < x, xr > |².
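The Parseval identity and the Bessel inequality are easy to check numerically. The sketch below (numpy assumed; not part of the text) builds an orthonormal basis of C^4 from a QR factorization and compares the two sides.

    import numpy as np

    rng = np.random.default_rng(1)
    # an orthonormal basis of C^4: the columns of a unitary matrix obtained via QR
    Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
    basis = Q.T                                            # rows x_1, ..., x_4

    x = rng.normal(size=4) + 1j * rng.normal(size=4)
    coef = np.array([np.vdot(xi, x) for xi in basis])      # alpha_i = < x, x_i >

    # Parseval / identity (2.2.2): ||x||^2 equals the sum of |alpha_i|^2
    print(np.isclose(np.linalg.norm(x)**2, np.sum(np.abs(coef)**2)))       # True

    # Bessel: with only two of the basis vectors the sum cannot exceed ||x||^2
    print(np.sum(np.abs(coef[:2])**2) <= np.linalg.norm(x)**2 + 1e-12)     # True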
The identity (2.2.1) is not hard to establish. What is interesting is that property (2.2.1) actually characterizes orthonormal bases. We record this phenomenon in the following proposition.
P 2.2.7 Let V be an inner product space of dimension k. Let x1, x2, ..., xk be some k vectors in V having the property that for any two vectors x and y in V,

< x, y > = < x, x1 > < x1, y > + < x, x2 > < x2, y > + ... + < x, xk > < xk, y >.     (2.2.3)
Then x1, x2, ..., xk is an orthonormal basis for V.
PROOF. Let us see what we can get out of the property (2.2.3) enunciated above. By plugging x = x1 and y = x1 in (2.2.3), we observe that

||x1||² = ||x1||⁴ + | < x1, x2 > |² + ... + | < x1, xk > |².     (2.2.4)

Since all the terms involved are non-negative, the only way that the above equality could hold is that ||x1|| ≤ 1. As a matter of fact, we have ||xi|| ≤ 1 for every i. Let u1, u2, ..., uk be an orthonormal basis of the vector space V. Each xi has a unique representation in terms of the given orthonormal basis. By (2.2.2),

||xi||² = | < xi, u1 > |² + | < xi, u2 > |² + ... + | < xi, uk > |²

for each i. By plugging x = y = ui in (2.2.3), we observe that

1 = < ui, ui > = | < x1, ui > |² + | < x2, ui > |² + ... + | < xk, ui > |²     (2.2.5)

for each i. Summing (2.2.5) over i, we obtain

k = Σᵢ Σⱼ | < xj, ui > |² = Σⱼ Σᵢ | < xj, ui > |² = Σⱼ ||xj||²,     (2.2.6)

where all the sums run over 1, 2, ..., k. We have seen earlier that each ||xi|| ≤ 1. This can coexist with (2.2.6) only when each ||xi|| = 1. In that case, if we look at (2.2.4) and related identities, it follows that < xi, xj > = 0 for all i ≠ j. This completes the proof.

If we look at the proof, one wonders whether the assumption that the vector space has dimension k can be dropped. It cannot. Try to prove the above proposition by dropping the assumption on the dimension!

In Chapter 1, we talked about complements of subspaces of a vector space. We have also seen that the complement need not be unique. If we
have, additionally, an inner product on the vector space, the whole idea of seeking a complement for the given subspace has to be reexamined under the newer circumstances. This is what we propose to do now.

P 2.2.8 Let V be a vector space equipped with an inner product < ·, · > and S a subspace of V. Then there exists a subspace S⊥ of V with the following properties.

(1) < x, y > = 0 whenever x ∈ S and y ∈ S⊥.
(2) S ∩ S⊥ = {0} and V = S ⊕ S⊥.
(3) dim(S) + dim(S⊥) = dim(V).

PROOF. Let x1, x2, ..., xr be a basis of the subspace S, and extend it to a full basis x1, x2, ..., xr, xr+1, ..., xk of V. Let z1, z2, ..., zr, zr+1, ..., zk be the orthonormal basis of V obtained by the Gram-Schmidt orthogonalization process carried out on the xi's. We now have a natural candidate to fit the bill. Let S⊥ be the vector subspace spanned by zr+1, zr+2, ..., zk. Trivially, (1) follows. To prove (2), note that every vector x in V has a unique representation,

x = (α1z1 + α2z2 + ... + αrzr) + (αr+1zr+1 + ... + αkzk) = y1 + y2, say,

for some scalars αi in C, where y1 = α1z1 + α2z2 + ... + αrzr and y2 = x − y1. It is clear that y1 ∈ S and y2 ∈ S⊥. By the very construction, we have S ∩ S⊥ = {0} and dim(S) + dim(S⊥) = dim(V).

We have talked about complements of a subspace in Chapter 1. The subspace S⊥ is a complement of the subspace S after all. But the subspace S⊥ is special. It has the additional property (1) listed above. In order to distinguish it from the plethora of complements available, let us call the subspace S⊥ an orthogonal complement of the subspace S. When we say that S⊥ is an orthogonal complement, we sound as though it is not unique. There could be other subspaces exhibiting the properties (1), (2) and (3) listed above. The proof given above is not of much help in settling the question of uniqueness. The subspace S⊥ is indeed unique and can be characterized in the following way.

P 2.2.9 Let S be a subspace of an inner product space V. Then any subspace S⊥ having the properties (1) and (2) of P 2.2.8 can be characterized as

S⊥ = {x ∈ V : < x, y > = 0 for every y ∈ S}.
PROOF. Let S* be a subspace of V having the properties (1) and (2) of P 2.2.8. Let S** = {x ∈ V : < x, y > = 0 for every y ∈ S}. We will show that S* = S**. As in the proof of P 2.2.8, let z1, z2, ..., zr, zr+1, ..., zk be an orthonormal basis for V such that z1, z2, ..., zr is a basis for the subspace S. Then for x in V,

x = < x, z1 > z1 + < x, z2 > z2 + ... + < x, zr > zr + < x, zr+1 > zr+1 + ... + < x, zk > zk,

with < x, z1 > z1 + < x, z2 > z2 + ... + < x, zr > zr ∈ S. If x ∈ S*, then, by (1), < x, zi > = 0 for every i = 1, 2, ..., r. Consequently, x ∈ S**. Conversely, if x ∈ S**, then < x, zi > = 0 for every i = 1, 2, ..., r, in particular. Hence x ∈ S*. This completes the proof.

To stretch matters beyond what was outlined in P 2.2.8, one could talk about the orthogonal complement (S⊥)⊥ of the subspace S⊥. If we look at the conditions, especially (1), that the orthogonal complement S⊥ of a given subspace S should meet in P 2.2.8, we perceive some symmetry in the way condition (1) is arranged. P 2.2.9 provides a strong motivation for the following result.
P 2.2.10 For any subspace S of an inner product space V, the relation (S⊥)⊥ = S holds true.

PROOF. Does this really require a proof? Well, let us try one. By P 2.2.8 (1) and (2), it follows that (S⊥)⊥ ⊂ S. Since dim(S) + dim(S⊥) = dim((S⊥)⊥) + dim(S⊥) = dim(V), we have dim((S⊥)⊥) = dim(S). This together with (S⊥)⊥ ⊂ S implies that (S⊥)⊥ = S.

In the absence of an inner product on a vector space, we could talk about complements of subspaces of the vector space. One could also talk about the complement of a given complement of a given subspace of a vector space. There is no guarantee that the second generation complement will be identical with the given subspace. Starting with a given subspace, one can keep on taking complements no two of which are alike!

We are now in a position to introduce Orthogonal Projections. These projections have some bearing on certain optimization problems. A more general definition of a projection will be provided later. First, we start with a definition.
DEFINITION 2.2.11. (Orthogonal Projection) Let S be a subspace of an inner product space V and S⊥ its orthogonal complement. Let x be any vector in V. Then the vector x admits a unique decomposition, x = y + z, with y ∈ S and z ∈ S⊥. Define a map Ps from V to S by Ps(x) = y. The map Ps is called an orthogonal projection from the space V to the space S.

The orthogonal projection is really a nice map. It is a homomorphism from the vector space V onto the subspace S. It is idempotent. These facts are enshrined in the following proposition.

P 2.2.12 Let Ps be the orthogonal projection from the inner product space V to its subspace S. Then it has the following properties.

(1) The map Ps is a linear map from V onto S.
(2) The map Ps is idempotent, i.e., Ps(Ps(x)) = Ps(x) for every x in the vector space V.

PROOF. The uniqueness of the decomposition of any vector x as a sum of two vectors y and z, with y in S and z in S⊥, is the key ingredient for the map Ps to have such nice properties as (1) and (2). For the record, observe that the map Ps is an identity map when restricted to the subspace S.

In order to define a projection map we do not need an inner product on the vector space. A projection map can always be defined on V with respect to some fixed complement Sᶜ of the subspace S of V. Such a map will have properties (1) and (2) of P 2.2.12.

The orthogonal projection described above arises in a certain optimization problem. Given any vector x in an inner product space V and any subspace S of V, we would like to compute explicitly the distance between the point x and the subspace S. The notion of distance between any two vectors of an inner product space can be extended to cover vectors and subsets of the vector space. More precisely, if x is a vector in V and A a subset of V, the distance between x and A can be defined as the infimum of ||x − y|| over y ∈ A. Geometrically speaking, this number is the shortest distance between x and points of A. Generally, this distance is hard to compute and may not be attained. If the subset happens to be a subspace of V, the computation is simple, and in fact, the distance is attained at some point of the subspace.
P 2.2.13 Let x be any vector in an inner product space V and S a subspace of V. Then the distance δ(x, S) between x and S is given by

δ(x, S) = ||x − Ps(x)||.

PROOF. Since V = S ⊕ S⊥, we can write x = x1 + x2 with x1 ∈ S and x2 ∈ S⊥. Of course, x1 = Ps(x). For any vector y in S, observe that

||x − y||² = ||(x1 − y) + x2||² = ||x1 − y||² + ||x2||².

The last equality requires justification. First, the vector x1 − y belongs to S, and of course, x2 ∈ S⊥. Consequently, < x1 − y, x2 > = 0. The Pythagoras theorem now justifies the last equality above. See P 2.1.10 (4). After having split ||x − y||² into two parts, we minimize ||x1 − y||² over all y ∈ S. Since x1 belongs to S, the minimum occurs at y = x1 = Ps(x). This completes the proof.

The Pythagoras theorem and the decomposition of an inner product space into two subspaces which are orthogonal to each other are two sides of the same coin. If S and S⊥ are complementary subspaces of an inner product space V, and x ∈ S, y ∈ S⊥, then ||x + y||² = ||x||² + ||y||² ≥ ||y||². The above inequality can be paraphrased as follows. For any fixed y ∈ S⊥, the inequality ||x + y|| ≥ ||y|| holds for every x ∈ S. Does this property characterize membership of y in S⊥? Yes, it does.
P 2.2.14 Let y be any vector in an inner product space V and S a subspace of V. Then y ∈ S⊥ if and only if

||x + y|| ≥ ||y|| for every x in S.     (2.2.7)

PROOF. We have already checked the "only if" part of the above statement. To prove the "if" part, let y = y1 + y2 be the orthogonal decomposition of y with y1 ∈ S and y2 ∈ S⊥. It suffices to show that y1 = 0. Observe that

||y||² ≤ ||(−y1) + y||² = ||y2||² = ||y||² − ||y1||² ≤ ||y||².

A word of explanation is in order. In the above chain of equalities and inequalities, the Pythagoras theorem is used as well as (2.2.7). Observe also that (−y1) ∈ S. Thus equality must prevail everywhere in the above chain and hence y1 = 0.
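P 2.2.13 can be illustrated numerically for a subspace of R^4 given as a column space. The sketch below (numpy assumed; not part of the text) computes the orthogonal projection and checks that no other point of the subspace is closer.

    import numpy as np

    # S = column space of A, a two-dimensional subspace of R^4
    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 1.0],
                  [1.0, 2.0]])
    Q, _ = np.linalg.qr(A)            # orthonormal basis of S in the columns of Q
    P = Q @ Q.T                       # matrix of the orthogonal projection onto S

    x = np.array([1.0, 2.0, 3.0, 4.0])
    px = P @ x                        # Ps(x)

    # the distance ||x - Ps(x)|| is not beaten by other points of S
    d = np.linalg.norm(x - px)
    for c in np.random.default_rng(2).normal(size=(5, 2)):
        print(d <= np.linalg.norm(x - A @ c) + 1e-12)   # True each time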
Complements

2.2.1 Show that two vectors x and y are orthogonal if and only if ||αx + βy||² = ||αx||² + ||βy||² for all pairs of scalars α and β. Show that two vectors x and y in a real inner product space are orthogonal if and only if ||x + y||² = ||x||² + ||y||².
2.2.2 If x and y are vectors of unit length in a Euclidean space, show that x + y and x − y are orthogonal.
2.2.3 Let x1, x2, ..., xk be an orthonormal basis of an inner product space and y1 = x1, y2 = x1 + x2, ..., yk = x1 + ... + xk. Apply the Gram-Schmidt orthogonalization process to y1, y2, ..., yk.
2.2.4 Let S1 and S2 be two subspaces of an inner product space. Prove the following.
(1) If S1 ⊂ S2, then S2⊥ ⊂ S1⊥.
(2) (S1 ∩ S2)⊥ = S1⊥ + S2⊥.
(3) (S1 + S2)⊥ = S1⊥ ∩ S2⊥.
2.2.5 Let S be a subspace of V and consider the set of points H = {x0 + x : x ∈ S} for fixed x0. Find min ||y − z|| for given y with respect to z ∈ H.

2.3. Linear Equations

In Section 1.6, we considered a linear equation, homogeneous or non-homogeneous, in the environment of vector spaces involving unknown scalars belonging to the underlying field. A special case of such a linear equation we have considered is the one in which the vector space was Fk, for some k ≥ 1. The linear equation gave rise to a system of linear equations (1.6.6), which, upon close scrutiny, gives the feeling that there is some kind of inner product operation involved. In this section, we will indeed consider inner product spaces and equations involving
the underlying inner products. Let a1, a2, ..., am be given vectors in an inner product space V. Let α1, α2, ..., αm be given scalars in the underlying field of the vector space. Consider the system of equations,

< x, ai > = αi,  i = 1, 2, ..., m,     (2.3.1)

in unknown x ∈ V. If V = Ck or Rk, and < ·, · > is the usual inner product on V, then the above system of equations (2.3.1) identifies with the system (1.6.6). The above system is, in a way, more general than the system (1.6.6). Of course, in (1.6.6), the underlying field F is quite arbitrary. We now need to explore some methods of solving equation (2.3.1).

P 2.3.1 The system (2.3.1) of equations has a solution (i.e., the equations (2.3.1) are consistent) if and only if

β̄1α1 + β̄2α2 + ... + β̄mαm = 0 whenever β1a1 + β2a2 + ... + βmam = 0     (2.3.2)

for any scalars β1, β2, ..., βm in C.
PROOF. Suppose the system (2.3.1) admits a solution x, say. Suppose for some scalars β1, β2, ..., βm in C, β1a1 + β2a2 + ... + βmam = 0. Then

0 = < x, β1a1 + β2a2 + ... + βmam > = β̄1 < x, a1 > + ... + β̄m < x, am > = β̄1α1 + ... + β̄mαm.
For the converse, consider the following system of linear equations:

< a1, a1 > γ1 + < a2, a1 > γ2 + ... + < am, a1 > γm = α1,
< a1, a2 > γ1 + < a2, a2 > γ2 + ... + < am, a2 > γm = α2,
    ...
< a1, am > γ1 + < a2, am > γ2 + ... + < am, am > γm = αm,     (2.3.3)

in unknown scalars γ1, γ2, ..., γm in C. Our immediate concern is whether the system (2.3.3) admits a solution. We are back into the fold of the system of linear equations (1.6.6). We would like to use
P 1.6.7, which provides a necessary and sufficient condition for the system (1.6.6) to have a solution. Let ui = (< a1, ai >, < a2, ai >, ..., < am, ai >), i = 1, 2, ..., m. We need to verify that the condition

β1α1 + β2α2 + ... + βmαm = 0 whenever β1u1 + β2u2 + ... + βmum = 0,     (2.3.4)

for any β1, β2, ..., βm ∈ C, is satisfied, which guarantees a solution to the system (2.3.3). Note that β1u1 + β2u2 + ... + βmum = 0 is equivalent to β1 < ai, a1 > + β2 < ai, a2 > + ... + βm < ai, am > = 0 for each i = 1, 2, ..., m, which, in turn, is equivalent to < ai, Σⱼ β̄jaj > = 0 for each i = 1, 2, ..., m, the sum running over j = 1, ..., m. So suppose β1u1 + β2u2 + ... + βmum = 0 for some scalars β1, β2, ..., βm. By what we have just discussed, < ai, Σⱼ β̄jaj > = 0 for each i = 1, 2, ..., m. This then implies that

0 = Σᵢ β̄i < ai, Σⱼ β̄jaj > = < Σᵢ β̄iai, Σⱼ β̄jaj >.

Consequently, Σᵢ β̄iai = 0. By (2.3.2), β1α1 + β2α2 + ... + βmαm = 0. Thus (2.3.4) is verified, and the system (2.3.3) admits a solution. Denote by, with an apology for an abuse of notation, γ1, γ2, ..., γm a solution to the system (2.3.3) of equations. Let

x0 = γ1a1 + γ2a2 + ... + γmam.
One can verify that x0 is a solution of the system (2.3.1) of equations. The verification process merely coincides with the validity of the system (2.3.3) of equations. As has been commented earlier, there is an uncanny resemblance between the systems (2.3.1) and (1.6.6). In view of P 1.6.7, the above result is not surprising.

Suppose the system (2.3.1) of equations is consistent, i.e., the system admits a solution. An immediate concern is the identification of a solution. If we scrutinize the proof of P 2.3.1 carefully, it will certainly
provide an idea of how to obtain a solution to the system. This solution is built upon the solution of the system (2.3.3) of equations operating in the realm of the field of complex numbers. Solving the system (2.3.3) is practical since we are dealing with complex numbers only. There may be more than one solution. We need to determine the structure of the set of all solutions of (2.3.1). The following proposition is concerned with this aspect of the problem.
P 2.3.2 Let S1 be the collection of all solutions to the system (2.3.1) of equations, assumed to be consistent. Let S be the collection of all solutions to the system of equations

< x, ai > = 0,  i = 1, 2, ..., m.

Let x0 be any particular solution of (2.3.1). Then (1) S is a subspace of V, and (2) S1 = x0 + S = {x0 + y : y ∈ S}.

The above proposition is modeled on P 1.6.4. The same kind of humdrum argument carries through. Among the solutions available to the system (2.3.1), we would like to pick that solution x for which ||x|| is minimum. We could label such a solution a minimum norm solution. The nicest thing about the solution offered in the proof of P 2.3.1 is that it is indeed a minimum norm solution. Let us solemnize this fact in the following proposition.
P 2.3.3 The unique minimum norm solution of (2.3.1), when it is consistent, is given by

x0 = γ1a1 + γ2a2 + ... + γmam,

where γ1, γ2, ..., γm is any solution to the system (2.3.3) of equations. Further, ||x0||² = γ̄1α1 + γ̄2α2 + ... + γ̄mαm.

PROOF. We have already shown that x0 is a solution to the system (2.3.1) in P 2.3.1. Any general solution to the system (2.3.1) is of the form x0 + y, where y satisfies the conditions < y, ai > = 0 for each i = 1, 2, ..., m. See P 2.3.2. This y is orthogonal to x0! (Why?) Consequently, by the Pythagoras theorem,
||x0 + y||² = ||x0||² + ||y||² ≥ ||x0||²,

which shows that x0 is a minimum norm solution. Further, in the above the equality is attained only when y = 0. This shows that the solution x0 is unique with respect to the property of minimum norm. As for the norm of x0, we note that

||x0||² = < x0, x0 > = < x0, Σᵢ γiai > = Σᵢ γ̄i < x0, ai > = Σᵢ γ̄iαi,

the sums running over i = 1, ..., m. This completes the proof.
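For V = C^k with the standard inner product, the recipe of P 2.3.1 and P 2.3.3 can be carried out directly. The sketch below (numpy assumed; not part of the text) solves the Gram system (2.3.3), forms x0, and cross-checks it against the pseudoinverse solution.

    import numpy as np

    rng = np.random.default_rng(3)
    m, k = 2, 4
    A = rng.normal(size=(m, k)) + 1j * rng.normal(size=(m, k))   # rows a_1, ..., a_m
    alpha = np.array([1.0 + 0j, 2.0 - 1j])

    # with the standard inner product on C^k, < x, a_i > = alpha_i reads conj(A) x = alpha
    G = A.conj() @ A.T                 # Gram matrix with (i, j) entry < a_j, a_i >
    gamma = np.linalg.solve(G, alpha)  # system (2.3.3)
    x0 = A.T @ gamma                   # x0 = gamma_1 a_1 + ... + gamma_m a_m

    print(np.allclose(A.conj() @ x0, alpha))                     # x0 solves (2.3.1)
    print(np.allclose(x0, np.linalg.pinv(A.conj()) @ alpha))     # and has minimum norm
    print(np.isclose(np.linalg.norm(x0)**2, np.vdot(gamma, alpha).real))   # P 2.3.3 norm formula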
Complements
2.3.1 Let V be an inner product space and x ∈ V. Let S be a subspace of V.
(1) Show that ||x − y|| is minimized over all y ∈ S at any y = ŷ ∈ S for which (x − ŷ) is orthogonal to S, i.e., < x − ŷ, y > = 0 for all y ∈ S. (This is an alternative formulation of P 2.2.13.)
(2) Suppose S is spanned by the vectors y1, y2, ..., yr. Show that the problem of determining ŷ ∈ S such that x − ŷ is orthogonal to S is equivalent to the problem of determining scalars β1, β2, ..., βr such that x − (β1y1 + β2y2 + ... + βryr) is orthogonal to S, which, in turn, is equivalent to solving the equations

< y1, y1 > β1 + < y2, y1 > β2 + ... + < yr, y1 > βr = < x, y1 >,
< y1, y2 > β1 + < y2, y2 > β2 + ... + < yr, y2 > βr = < x, y2 >,
    ...
< y1, yr > β1 + < y2, yr > β2 + ... + < yr, yr > βr = < x, yr >,

in unknown scalars β1, β2, ..., βr.
(3) Show that the system of equations is solvable. (The method outlined in (2) is a practical way of evaluating Ps(x); a numerical sketch follows below.)
(4) Let V = Rn with its standard inner product. Let x = (x1, x2, ..., xn) and yi = (yi1, yi2, ..., yin), i = 1, 2, ..., r. Show that the steps involved in (1), (2), and (3) above lead to the least squares theory of approximating the vector x by a vector from the vector space spanned by y1, y2, ..., yr.
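The normal equations of Complement 2.3.1 (2) are easy to set up numerically. The sketch below (numpy assumed; not part of the text) computes Ps(x) this way and compares it with a standard least squares routine.

    import numpy as np

    rng = np.random.default_rng(4)
    n, r = 6, 3
    Y = rng.normal(size=(r, n))             # rows y_1, ..., y_r spanning S in R^n
    x = rng.normal(size=n)

    # normal equations: G beta = c with G[i, j] = < y_j, y_i > and c[i] = < x, y_i >
    G = Y @ Y.T
    c = Y @ x
    beta = np.linalg.solve(G, c)
    proj = Y.T @ beta                        # Ps(x) = beta_1 y_1 + ... + beta_r y_r

    # same answer as the standard least squares routine
    print(np.allclose(proj, Y.T @ np.linalg.lstsq(Y.T, x, rcond=None)[0]))   # True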
2.3.2 Let P4 be the vector space of all polynomials of degree less than 4 with real coefficients. The inner product in P4 is defined by

< y1(x), y2(x) > = ∫₋₁⁺¹ y1(x) y2(x) dx

for y1(x), y2(x) ∈ P4. Let S be the vector space spanned by the polynomials y1(x) = 1, y2(x) = x, and y3(x) = x². Determine the best approximation of the polynomial 2x + 3x² − 4x³ by a polynomial from S.

2.4. Linear Functionals
In Section 1.7, we presented some discussion on linear functionals of vector spaces and dual spaces. Now that we have an additional structure on our vector spaces, namely, inner products, we need to reexamine the concept of a linear functional in the new environment. The definitions of a linear functional and of the dual space remain the same. If we use the inner product available on the underlying vector space, we see that linear functionals have a nice structural form and we get a clearer understanding of the duality between the vector space and its dual space. First, we deal with linear functionals.

P 2.4.1 (Representation Theorem of a Linear Functional) Let V be an inner product space and f a linear functional on V. Then there exists a unique vector z in V such that

f(x) = < x, z > for every x in V.     (2.4.1)

PROOF. If f(x) = 0 for every x in V, take z = 0. Otherwise, let S⊥ be the orthogonal complement of the subspace S = {x ∈ V : f(x) = 0} of V. The subspace S⊥ has at least one non-zero vector. (Why?) Choose any vector u in S⊥ such that < u, u > = 1, and set z = αu, where α is the complex conjugate of f(u). The vector z is the required candidate. Since u ∉ S, f(u) ≠ 0 and hence z ≠ 0 and z ∈ S⊥. Let us see whether the vector z does the job. To begin with, we verify that

f(z) = αf(u) = |f(u)|² = < z, z >.

Let x ∈ V. Set

x1 = x − [f(x)/< z, z >] z.     (2.4.2)
We show that x = x1 + x2 is the orthogonal decomposition of x with respect to the subspaces S and S⊥, where x2 = [f(x)/< z, z >] z. It is clear that x2 ∈ S⊥. Further, f(x1) = 0. Consequently, x1 ∈ S. From (2.4.2),

0 = < x1, z > = < x, z > − [f(x)/< z, z >] f(z) = < x, z > − f(x).

Hence f(x) = < x, z >. The uniqueness of the vector z is easy to establish. If z1 is another vector such that < x, z1 > = < x, z > for all x in V, then < x, z1 − z > = 0 for all x in V. Hence, we must have z1 = z. This completes the proof.

Thus with every linear functional f on the inner product space V, we have a unique vector z in V satisfying (2.4.1). This correspondence is an isomorphism between the vector space V and its dual space V'. This is not hard to establish. We can reap some benefits out of the representation theorem presented above. We recast P 1.7.6 in our new environment.
P 2.4.2 Let x1, x2, ..., xk constitute a basis for an inner product space V. Then we can find another basis z1, z2, ..., zk for V such that

< xi, zj > = 1 if i = j, and < xi, zj > = 0 if i ≠ j,

for all i and j. Further, for any x in V, we can write

x = < x, z1 > x1 + < x, z2 > x2 + ... + < x, zk > xk.

Also, zi = xi for every i if and only if x1, x2, ..., xk is an orthonormal basis for V.
PROOF. There are several ways of establishing the veracity of the above proposition. One way is to use the result of P 1.7.6 to obtain a dual basis of linear functionals f1, f2, ..., fk for V' and then use P 2.4.1 above to obtain the associated vectors z1, z2, ..., zk. We would like to describe another way (really?) which is more illuminating. Let x be any vector in V and (ξ1, ξ2, ..., ξk) its co-ordinates with respect to the basis x1, x2, ..., xk, i.e.,

x = ξ1x1 + ξ2x2 + ... + ξkxk.

For each 1 ≤ i ≤ k, define fi : V → F by fi(x) = ξi. One can verify that each fi is a linear functional. By P 2.4.1, there exists a unique vector zi in V such that fi(x) = < x, zi > for every x in V, and for each 1 ≤ i ≤ k. One can verify that z1, z2, ..., zk constitute a basis for the vector space V. Since the co-ordinates of xi are (0, 0, ..., 0, 1, 0, ..., 0) with 1 in the i-th position, we have

< xi, zj > = 1 if j = i, and < xi, zj > = 0 if j ≠ i.

The other statements of the proposition follow in a simple manner.

As an application of the above ideas, let us consider a statistical prediction problem. Let (Ω, A, P) be a probability space and x1, x2, ..., xk be square integrable real random variables (defined on the probability space). Then the collection of all random variables of the form α1x1 + α2x2 + ... + αkxk for all real scalars α1, α2, ..., αk is a vector space V over the field R of real numbers with the usual operations of addition and scalar multiplication of random variables. We introduce the following inner product on the vector space V. For any x, y in V,
< x, y > = E(xy),

where E stands for the expectation operator. The above expectation is evaluated with respect to the joint distribution of x and y. In statistical parlance, E(xy) is called the product moment of x and y. Assume that x1, x2, ..., xk are linearly independent. What this means, in our context, is that if α1x1 + α2x2 + ... + αkxk = 0 almost surely for some scalars α1, α2, ..., αk, then each αi must be equal to zero. This implies that none of the random variables is degenerate almost surely and dim(V) = k. For applications, it is convenient to adjoin the random variable x0, which is equal to the constant 1 almost surely, to our collection x1, x2, ..., xk if it is not already there. Let V* be the vector space spanned by x0, x1, ..., xk. Let p be any positive integer less than k. Let S be the vector space spanned by x0, x1, ..., xp. Let y be any random variable in V*. Now we come to the prediction problem. Suppose we are able to observe x1, x2, ..., xp. We would like to predict the value of the random variable y. Mathematically, we want to propose a linear predictor β0 + β1x1 + ... + βpxp as our prediction of the random
variable y. Practically, what this means is that whenever we observe x1, x2, ..., xp, we plug the observed values into the predictor and declare that the resultant number is our predicted value of y. Now the question arises as to the choice of the scalars β0, β1, ..., βp. Of course, we all feel that we must choose the scalars optimally, optimal in some sense. One natural optimality criterion can be developed in the following way. For any choice of the scalars β0, β1, ..., βp,

y − β0 − β1x1 − β2x2 − ... − βpxp

can be regarded as the prediction error. We need to minimize the prediction error in some way. One way of doing this is to choose the scalars β0, β1, ..., βp in such a way that

||y − β0 − β1x1 − ... − βpxp||² = E(y − β0 − β1x1 − ... − βpxp)²     (2.4.3)

is a minimum. This kind of scenario arises in a variety of contexts. In Econometrics, xi could denote the price of a particular stock at time period i, i = 1, 2, ..., k. After having observed x1, x2, ..., xp at p successive time points, we would like to predict the price of the stock at the time point p + 1. In such a case, we take y = xp+1. In the spatial prediction problem, the objective is to predict a response variable y at a new site given observations x1, x2, ..., xp at p existing sites. The spatial prediction problem is known as kriging in the geostatistics literature.

Let us get back to the problem of choosing the scalars β0, β1, ..., βp in (2.4.3). Observe that for any choice of scalars β0, β1, ..., βp, the vector β0 + β1x1 + ... + βpxp belongs to the subspace S. The problem now reduces to finding a vector x in S such that ||y − x|| is a minimum. We have already solved this problem. See the result of P 2.2.13. The solution is given by x = Ps(y), the orthogonal projection of y onto the subspace S. Let us try to compute explicitly the orthogonal projection of y onto the subspace S. Observe that x must be of the form β0 + β1x1 + ... + βpxp for some scalars β0, β1, ..., βp. Write the orthogonal decomposition of y with respect to the subspaces S and S⊥ as
y = x + (y − x) = Ps(y) + (y − x).

Observe that (y − x) ∈ S⊥ if and only if < y − x, xi > = 0 for every i = 0, 1, 2, ..., p. But < y − x, xi > = 0 means that E((y − x)xi) =
0 = E((y − β0 − β1x1 − ... − βpxp)xi). Expanding the expectation, we obtain the following equations in β0, β1, ..., βp:

β0 + β1E(x1) + β2E(x2) + ... + βpE(xp) = E(y),
β0E(x1) + β1E(x1²) + β2E(x1x2) + ... + βpE(x1xp) = E(x1y),
β0E(x2) + β1E(x2x1) + β2E(x2²) + ... + βpE(x2xp) = E(x2y),
    ...
We need to solve these equations in order to build the required predictor. These linear equations can be simplified further. We can eliminate β0 by using the first equation above, i.e.,

β0 = E(y) − β1E(x1) − β2E(x2) − ... − βpE(xp),

from each of the remaining equations. We will then have p equations in p unknowns β1, β2, ..., βp:

s11β1 + s12β2 + ... + s1pβp = s01,
s21β1 + s22β2 + ... + s2pβp = s02,
    ...
sp1β1 + sp2β2 + ... + sppβp = s0p,

where sij = E(xixj) − E(xi)E(xj) = covariance between xi and xj, 1 ≤ i, j ≤ p, and s0i = E(yxi) − E(y)E(xi) = covariance between y and xi, i = 1, 2, ..., p. The problem of determining the optimal predictor of y reduces to the problem of solving the above p linear equations in p unknowns!
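The prediction recipe above can be sketched numerically on simulated data (numpy assumed; neither the data nor the code is from the text): estimate the covariances sij and s0i, solve the p equations for β1, ..., βp, and recover β0 from the first equation.

    import numpy as np

    rng = np.random.default_rng(5)
    n, p = 1000, 3
    X = rng.normal(size=(n, p))
    y = 1.0 + X @ np.array([0.5, -2.0, 1.5]) + rng.normal(scale=0.1, size=n)

    S = np.cov(X, rowvar=False)              # estimates of s_ij, 1 <= i, j <= p
    s0 = np.array([np.cov(X[:, i], y)[0, 1] for i in range(p)])   # estimates of s_0i
    beta = np.linalg.solve(S, s0)            # beta_1, ..., beta_p
    beta0 = y.mean() - X.mean(axis=0) @ beta # beta_0 from the first equation

    print(np.round(beta, 2), round(beta0, 2))   # close to [0.5, -2.0, 1.5] and 1.0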
Complements
2.4.1 Let R3 be the three-dimensional vector space equipped with the standard inner product. Let f : R3 → R be the linear functional defined by f(x) = f(x1, x2, x3) = 2x1 + x2 − x3 for x = (x1, x2, x3) ∈ R3. Determine the vector z ∈ R3 such that

f(x) = < x, z >, x ∈ R3.
2.4.2 Let V be a real vector space with an inner product and T a linear transformation from V to V. Define a map f : V → R by

f(x) = < Tx, y >, x ∈ V,

for some fixed vector y ∈ V. Show that f is a linear functional on V. Determine z ∈ V such that

f(x) = < x, z > for all x ∈ V.
2.5. Semi-inner Product

We have seen what inner products are in Section 2.1. We do come across some maps on the product space V x V of a vector space which are almost like an inner product. A semi-inner product, which is the focus of attention in this section, is one such map, relaxing one of the conditions of an inner product. In this section, we outline some strategies for handling semi-inner product spaces. All the definitions and results presented in this section are designed for vector spaces over the field C of complex numbers. The modifications should be obvious if the underlying field is that of real numbers.

DEFINITION 2.5.1. Let V be a vector space. A complex valued function (·, ·) defined over the product space V x V is said to be a semi-inner product if it meets the following conditions:

(1) (x, y) is the complex conjugate of (y, x) for all x and y in V.
(2) (x, x) ≥ 0 for all x in V.
(3) (α1x1 + α2x2, y) = α1(x1, y) + α2(x2, y) for all x1, x2 and y in V, and α1, α2 in C.

These conditions are the same as those for an inner product except for (2), which admits the possibility of (x, x) vanishing for x ≠ 0. We use the notation (·, ·) for a semi-inner product, and < ·, · > for a regular inner product, for which < x, x > = 0 only when x = 0. In the same vein, we define the positive square root of (x, x) as the semi-norm of x and denote it by ||x||se. Note that ||x||se could be zero for a non-zero vector. The vector space V equipped with a semi-inner product is called a semi-inner product space.
Most of the results that are valid for an inner product space are also valid for a semi-inner product space. There are, however, some essential differences. In the following proposition, we highlight some of the salient features of a semi-inner product space.

P 2.5.2 Let (·, ·) be a semi-inner product on a vector space V. Then the following are valid.

(1) (0, 0) = 0.
(2) (x, y) = 0 if either ||x||se = 0 or ||y||se = 0.
(3) (Cauchy-Schwartz Inequality) |(x, y)| ≤ ||x||se ||y||se for all x and y in V.
(4) (Triangle Inequality) ||x + y||se ≤ ||x||se + ||y||se for all x and y in V.
(5) The set N = {x ∈ V : ||x||se = 0} is a subspace of V.
PROOF. Before attempting a proof, it will be instructive to scan some parts of Section 2.1 to get a feeling about where the differences lie. To prove (1), choose a1 = a2 = 0 and Y = O. For (2), one could use the Cauchy-Schwartz inequality stated in (3). But this is not the right thing to do. In the proof of (3), we make use of the fact that the assertion of (2) is true. Suppose IIYlIse = O. Then for any complex number a,
o ~ (x + ay, x + ay) =
IIxll~e
+ a(y, x) + a(x, y) + aa(y, y)
= IIxll~e + a(y, x) + a(y, x). If a
= 'Y + i8 and (x, y) = ~ + i"1 for real numbers 'Y, 8, ~ and "1, IIxll~e + a(y, x) + a(y, x) = Ilxll~e + 2'Y~ + 28"1 ~ 0
for all real numbers 'Y and 8. Set 'Y
= O.
then
(2.5.1)
Then
for all real numbers 8. This is possible only if "1 (2.5.1). Then
=
O. Set 8
=
0 in
for all real numbers 'Y. This is possible only if ~ = O. Consequently, (x,y)=~+i"1=O.
78
MATRIX ALGEBRA THEORY AND APPLICATIONS
To prove (3), we follow the same route as the one outlined in the proof of (2). The statement of (3) is valid when lIyllse = 0 by virtue of (2). Assume that lIyllse =f:. o. For any complex number 0:, observe that
o$ Set
0:
o
(x+o:y,x+o:y) = IIxll~e+o:(y,x)+a(x,y)+ao:llyll~e.
= -(x, Y)/lIyll~e. $
Then
IIxll~e -I(x, yW /lIyll~e -I(x, y)12 /lIyll~e + I(x, y)12 /lIyll~e,
from which the Cauchy-Schwartz inequality follows. In the above, we have used the fact that (x,y)(y,x) = l(x,y)j2. This proof is essentially the same as the one provided for the inequality in inner product spaces. Observe the role played by the statement (2) in the proof. The proof of (4) hinges on an application of the Cauchy-Schwartz inequality and is analogous to the one provided in the inner product case. Finally, we tackle the statement (5). To show that N is a subspace, let x and y belong to N, and 0: and {3 be complex numbers. We need to show that
(o:x + {3y, o:x + {3y) = O. But (o:x + {3y, o:x + {3y) = o:allxll~e + {3t1llyll~e +o:t1(x, y) +a{3(y, x) = 0, in view of (2). This completes the proof. We would like to bring into focus certain differences between inner product spaces and semi-inner product spaces. We look at an example. 2.5.3. Consider the k-dimensional Euclidean space Rk for some k 2: 2. Let 1 $ r < k be an integer. For x = (~1,6, ... ,~k) and y = (7]b7]2, ... ,7]k) in Rk, define EXAMPLE
(x,y) = 67]1 +67]2 + ... +~r7]r. The map (.,.) is a semi-inner product on the vector space norm of x works out to be
Rk.
The
The subspace N = {x E Rk : IIxli se = O} consists of vectors of the form (0, 0, . .. '~r+ 1, . .. '~k). Consequently. dim(N) = k - r.
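A tiny numerical check of Example 2.5.3 (numpy assumed; not part of the text): the semi-norm ignores the last k − r coordinates, so it vanishes on non-zero vectors of N, and such vectors have zero semi-inner product with everything, as P 2.5.2 (2) asserts.

    import numpy as np

    k, r = 5, 3

    def semi_inner(x, y):
        # (x, y) = xi_1 eta_1 + ... + xi_r eta_r, as in Example 2.5.3
        return np.dot(x[:r], y[:r])

    x = np.array([0.0, 0.0, 0.0, 4.0, -1.0])    # lies in N
    y = np.array([1.0, 2.0, 3.0, 0.0, 0.0])
    print(np.sqrt(semi_inner(x, x)))            # 0.0 although x != 0
    print(semi_inner(x, y))                     # 0.0, consistent with P 2.5.2 (2)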
Unitary and Euclidean Spaces
79
The dimensions of orthogonal vector spaces under a semi-inner product may not be additive. To illustrate this point, let Xl = (1,0,0, ... ,0) and X2 = (0, ... ,0,1,0, . . . ,0) where 1 in X2 appears at the (r + 1)th position. Let < ., . > be any inner product on Rk and (., .) the semi-inner product on Rk introduced above. Let S be the vector space spanned by Xl and X2. Obviously, the dimension of the subspace S of Rk is two. Let U = {x E Rk : (x,xt) = (X,X2) = O}. Every vector X in U is of the form (0'~2,6, .. . '~k) for some real numbers ~2' 6, ... '~k. Observe that the dimension of the subspace U of R k is k - 1. The subspace U can be regarded as the orthogonal complement of the subspace S with respect to the semi-inner product (., .). But dimeS) + dim(U) =-I k. Also, S n U =-I {O}. On the other hand, if we define U = {x E Rk : < X,XI > = < X,X2 >= O}, then the subspace U is the orthogonal complement of the subspace S with respect to the inner product < ., . >. There are two ways of manufacturing an inner product from a semiinner product. Let V be a semi-inner product space with semi-inner product (., l The critical ideas are based on the subspace N defined above. Take any complement NC of N. Look up P 1.5.5. The restriction of the semi-inner product (-, .) to the vector space N C is an inner product! This is rather easy to see. Another way is to use the idea of a quotient space. Let W = V IN be the quotient space of V with respect to the subspace N . Look up Section 1.8. The space W is the collection of all cosets of N of the form X + N,x E V . We define a map < ., > on the product space W X W by
<x+N,y+N>= (x,y) for any two distinct cosets of N. It is not hard to show that the map < .,. > is an inner product on the quotient space W. We will record this fact in the form of a proposition for future reference. P 2.5.4 The map < -, . > defined above on the quotient space W is an inner product on W. It is time to define, formally, the orthogonal complement of a subspace S of a vector space V equipped with a semi-inner product (-, .). Following the procedure in the case of inner product spaces, define S; = {x E V : (x,y) = 0 for every yES} as the orthogonal complement of S with respect to the semi-inner product (. , .). As we have
80
MATRIX ALGEBRA THEORY AND APPLICATIONS
seen earlier , S n S1. s may contain non-zero vectors. It is clear that S; is a subspace of V, and then that S n S; is a subspace. The dimension of the space S n S; could be more than one. But one could always decompose any given vector x in V into a sum Xo + Xoo with Xo E S and Xoo E S;. Let us put this down as a proposition.
P 2.5.5 For any given vector x in a semi-inner product space V, there exists Xo in Sand Xoo in S; such that x = Xo
+ Xoo·
Further, the subspaces Sand S; together span the space V. PROOF. Let Xl, X2, ... ,X r be a basis of the vector space S. The basic problem is to determine scalars 0:1,0:2, ... ,O:r such that X-O:lXl0:2X2 - ... - O:rXr belongs to S;. If we succeed in this mission, we simply let Xo = O:lXl + 0:2X2 + ... + O:rXr and Xoo = x - Xo. We will then have the desired decomposition. The scalars could be found. The condition that x - O:lXl - 0:2X2 - ... - O:rXr belongs to S; is equivalent to (x - O:lXl - 0:2X2 - ... - O:rXr, Xi) =
0
for each
i = 1,2, ... ,r.
These equations can be rewritten in an illuminating way. (Xl, Xl)O:l
(Xl, X2)0:1
+ (X2' Xd0:2 + .. . + (Xr, xdO:r = (x,xd, + (X2' X2)0:2 + .. . + (Xr, X2)O:r = (X,X2),
We have r linear equations in r unknowns 0:1, 0:2, ... ,O:r. This system of equations is analogous to the one presented in (1.6.6). We can invoke P 1.6.7 to check whether this system admits a solution. The case Iixlise = o is trivial. The decomposition is: x = 0 + x. Assume that Iixlise =1= o. Let Ui = «XI, Xi), (X2,Xi), ... ,(xr,xi)),i = 1,2, ... ,r. Suppose elUl + e2 U 2 + ... + erUr = 0 for some scalars el, e2, ... ,er. This is equivalent to r
(Xj, L€iXd i=l
=0
for each
j
=
1,2, ... ,r.
Unitary and Euclidean Spares r
This implies that
(L
81
r
€jXj,
j=l
r
L
€iXi)s = 0, from which we have
i=l
II L
€ixill
i=l
= o. Following the line of thought outlined in P 1.6.7, we need only to verify that
But r
Ci (y, Xl)
+ c2(y, X2) + ... + cr(y, Xr) = (y, I:: €iXi) = 0, i=l r
in view of the fact that
II L
€iXili = 0 and the result of P 2.5.2 (2).
i=l
Consequently, the above system of equations is consistent. Finally, the mere fact that the decomposition is possible is good enough to conclude that the spaces Sand S; together span V. This completes the proof. As has been pointed out earlier, the dimensions of Sand S; need not add up to the dimension of V. Further, we would like to point out that the decomposition is not unique. This makes it difficult to define the projection of V onto the subspace S. We need not worry about it. We can get around this difficulty. Let us consider an optimization problem in the context of a semiinner product space similar to the one considered in Section 2.2. Let V be a vector space equipped with a semi-inner product (., .) and S a subspace of V. Let X be any vector in V. We raise the question whether the minimum of IIx - zllse is attained over all z E S, or in other words, whether there exists a vector Xo E S such that inf
zES
Ilx -
zllse
=
IIx - xollse.
This has a solution and it is not hard to guess the vector. The experience we have had with inner product spaces should come handy. In fact, the same kind of proof as in the case of regular inner products works. In P 2.5.5 we showed that any given vector X E V admits a decomposition
X = Xo +xoo with Xo E Sand Xoo E S;. The decomposition is not unique as stated earlier. But any Xo and Xoo with the stated inclusion properties will do.
82
MATRIX ALGEBRA THEORY AND APPLICATIONS
The vector Xo is indeed the solution to the optimization problem above. Note that for any vector z in S,
IIx -
zll~e
= II (x - xo) + (xo - z)ll;e
+ (xo - z), (x - xo) + (xo - z)) = lI(x - xo)ll~e + II(xo - z)ll~e + (x - Xo, Xo + (xo - z, x - xo) = II (x - xo)ll~e + lI(xo - z)ll~e
= ((x - xo)
z)
(2.5.2) This inequality establishes that the vector Xo is a desired solution to our optimization problem. The semi-inner products that appear above vanish in view of the facts that x - Xo E S; and Xo - z E S . But we must admit that the solution vector Xo need not to be unique. The solution vector Xo can be characterized in the following way. P 2.5.6 Let x be any vector in any semi-inner product space V. Let S be a subspace of V. The following two statements are equivalent. (1) There exists a vector Xo in S such that inf
zES
(2)
IIx - zllse = IIx -
xoll se .
There exists a vector Xo in S such that x - Xo E S;.
PROOF. The statement (2) implies the statement (1) from the in-
equality established in (2.5.2). Suppose (1) is true. Then for any complex number 0: and for any vector z in S.
IIx -
Xo - o:zll~e = ~
IIx - xoll~e + o:allzll~e IIx - xoll;e,
a(x - xo, z) - o:(z,x - xo)
since Xo + o:z E S. Since the above inequality is true for any scalar 0:, it follows that (x - Xo, z) = O. But this equality is true for any z in S. Hence x - Xo E S;. We consider another problem of minimization. Let S be a subspace of a semi-inner product space. Let H be a coset of S. We want to minimize IIxli se over all x in H. A solution to this problem can be characterized in the following way.
Unitary and Euclidean Spaces
83
P 2.5.7 Let H be a coset of the subspace S of a semi-inner product space V. Suppose that there exists Xo in H such that inf
xEH
IIxlise = IIxollse.
Then Xo E S;-. PROOF. Suppose Xo is a solution to the minimization problem. Then for any scalar a and any vector z in S,
The above inequality follows if we observe that Xo + az E H. Since the above inequality is valid for any scalar a, it follows that (xo, z) = O. But this equality is valid for any z in S. Hence Xo E S;-.
Complements
2.5.1 Let (.,.) be a semi-inner product on a vector space V and N = {x E V : IIxli se = O}. Show that N.L = V. 2.5.2 Let (.,.) be a semi-inner product on a vector space V and N = {x E V : IIxlise = O}. If x + N = Y + N for x, y E V, show that IIxli se = IIYlIse. Show that for the coset x + N of N, Inf
zEx+N
IIzlise = IIxlise.
2.6. Spectral Theory The spectral theory of conjugate bilinear functionals, to be introduced shortly, can be regarded as a crowning achievement of the theory of vector spaces. We will see numerous instances of the pivotal role that the spectral theory plays in a variety of problems. Let us begin with a definition. DEFINITION 2.6.1. Let V be a vector space over the field C of complex numbers. A map K(-'·) from VxV into C is said to be a Hermitian conjugate bilinear functional if it has the following properties. (1) K(x, y) = K(y, x) for all x and y in V. (Hermitian property) (2) K(alxl + a2x2,y) = alK(xl,Y) + a2K(x2,Y) for all vectors Xl, X2, Y in V and scalars al and a2. (Conjugate bilinearity)
84
MATRIX ALGEBRA THEORY AND APPLICATIONS
A few words of explanation are needed about the terminology used. If we look at the conditions (1) and (2) carefully, they are part of the ones that define an inner product or a semi-inner product. The only defining condition of a semi-inner product that is missing from the list above is that K(x, x) need not be non-negative. The goal of spectral theory is to express any given Hermitian conjugate bilinear functional as a linear combination of semi-inner products by breaking up the vector space V into orthogonal subspaces with respect to a specified inner product on V. Another point worth noting is the following. By combining (1) and (2), one can show that
for all vectors x, Yl, Y2 in V and scalars aI, a2- The map K(·,·) is not quite bilinear! We see that the phrase "conjugate bilinearity" is very apt to describe the property (2). The final remark is that the Hermitian property gives us immediately that K(x, x) is real for all vectors x. We now develop a body of results eventually culminating in the spectral theorem for a Hermitian conjugate bilinear functional. We always assume that the vector space V comes with an inner product < ', . >. Whenever we talk about orthogonality it is always with respect to the underlying inner product on the vector space.
P 2.6.2 Let K(·,·) be a Hermitian conjugate bilinear functional and < .,. > an inner product on a vector space. Then the following are valid. (1) The supremum of K(x, x)/ < x, x > over all non-zero vectors x in V is attained at some vector Xl in V. (2) K(y, xd = 0 for Y E (Sp(Xl)).L, where Xl is the vector under focus in (1) and Sp(xd is the vector space spanned by Xl. PROOF. Let Zl, Z2, ... ,Zk be an orthogonal basis of the vector space V and X = "YlZl + "Y2 Z2 + ... + "YkZk an arbitrary vector in V in its usual representation in terms of the given basis. Let us compute K(x, x) and < x, x >. Note that
k
k
K(x, x) = "L"L"Yi"'YjK(Zi,Zj), i=l i=l
Unitary and Euclidean Spaces
85
and k
< X,X >
=
L 'Yi'Yi· i=l
We want to simplify the problem. Observe that for any non-zero scalar a, and non-zero x E V, K(x,x)/ < x,x >= K(ax,ax)/ < aX,ax >. Consequently,
K(x,x)/<x,x>=
sup xEV,x;.W
sup
K(x,x).
xEV,<x,x>=l
Let kij = K (Zi' Zj) for all i and j. The maximization problem can now be recast as follows. Maximize the objective function k
k
LL 'Yiijkij i=l j=l
over all complex numbers "11, "12, . . . ,'Yk subject to the condition k
Lbil 2 =
1.
i=l
The set D = {hl,'Y2, ... ,'Yk) E C k : bll 2 + b212 + ... + bkl2 I} is a compact subset of Ck. further, the objective function is a continuous function on D. By a standard argument in Mathematical Analysis, the supremum of the objective function is attained at some vector hi, "12,· .. of D. Let Xl = "Ii Zl + 'Y2Z2 + ... + 'YZZk. It is clear that the supremum of K(x, x)/ < x, x> over all non-zero x in V is attained at Xl. This completes the proof of Part (1). For Part (2), let a be any complex number and y any vector in (Sp(xd)l.. If y = 0, (2) is trivially true. Assume that y # O. Then for any complex number a, aXl + y is non-zero. In view of the optimality of the vector Xl, we have
,"In
Let us expand the ratio that appears on the left hand side above and then perform cross- multiplication On writing a = al +ia2 and K (Xl, y)
86
MATRIX ALGEBRA THEORY AND APPLICATIONS
= 61
+ i62
for some real numbers
aI, a2,
61 , and 62 , and observing that
< Xl,Y > = 0, we have
More usefully, we have the following inequality:
This inequality is true for all real numbers al and a2. But the number that appears on the right hand side of the inequality is fixed. Consequently, we must have 61 = 62 = O. Hence K(Xl, Y) = 61 + i62 = o. Some comments are in order on the above result. The attainment of the supremum as established in Part (1) is purely a topological property. Even though the Hermitian bilinear conjugate functional is not an inner product, it inherits some properties of the inner product. If Xl E Sp(Xl) and Y E (Sp(xt})1., it is clear that < Xl, Y > = O. Part (2) says that the same property holds for K(-, .), i.e., K(Xl, y) = o. But more generally, we would like to know whether K(x, y) = 0 whenever X E 8 and Y E 81., where 8 and 81. are a pair of orthogonal complementary subspaces ofV. P 2.6.2 is just a beginning in response to this query. In the following proposition, we do a better job. P 2.6.3 Let K(·,·) be a Hermitian conjugate bilinear functional on a vector space V equipped with an inner product < .,. >. Then there exists a basis x}, X2, .. . , Xk for the vector space V such that
< Xi, Xj > = K(Xi, Xj) = 0 for all i i- j.
(2.6.1)
PROOF. Choose Xl as in P 2.6.2 and apply the result to the vector space (Sp(xt})1. with K(-' ·) and <, . > restricted to the subspace (Sp(Xl))1.. There exists a vector X2 E (Sp(Xl))1. such that
(1)
sup
K(x, x)/ < x, X> = K(X2, X2)/ < X2, X2 >,
xE(Sp(xI).L ,x;i:O
(2) K(Xl1X2) = < Xl,X2 > = 0, (3) K(u, v) = < u, v > = 0 whenever u E Sp(xt, X2) and v E (Sp(Xl,X2))1., where, as usual, Sp(Xl, X2) is the vector space spanned by the vectors Xl and X2. Reflect a little on (3) and see why it holds. Now the focus
Unitary and Euclidean Spaces
87
of attention is the vector space (Sp(XI' X2)).L. A repeated application of P 2.6.2 yields the desired basis. It is possible that the Hermitian conjugate bilinear functional could be an inner product or a semi-inner product in its own right. What P 2.6.3 is trying to convey to us is that we can find a cormnon orthononnal basis under both the inner products K(·,·) and < .,. >. Once we have obtained a basis XI, X2, . .. ,Xk havi ng the property (2.6.1), it is a simple job to normalize them, i.e., have them satisfy < Xi, Xi > = 1 for every i. Assume now that XI, X2, . .. ,Xk is an orthononnal basis under the inner product <,. > satisfying (2.6.1). Let K(Xi' Xi) = Ai, i = 1,2, ... ,k. Assume, without loss of generality, that >'1 2: A2 2: ... 2: Ak. The numbers Ai'S are called the eigenvalues of K(·,·) with respect to the inner product < .,. >, and the corresponding vectors Xl, X2, .•• ,Xk, the eigenvectors of K(-,·) corresponding to the eigenvalues All A2, .• . ,Ak. There is no reason to believe that all Ai'S to be distinct. Let A(l)' A(2)' .•. ,A(s) be the distinct eigenvalues with multiplicities rl, r2, .. . ,rs , respectively. We tabulate the eigenvalues, the corresponding eigenvectors, and the subspaces spanned by the eigenvectors in a systematic fashion.
Eigenvalues Al
= A2 = ... = Arl = A(l) = Arl +2 = ... = Arl +r2 = A(2)
Arl +1
Corresponding eigenvectors Xl, X2,··· 'X rl
X rl +1, X r1 +2, ... ,Xrj +r2
The subspace spanned by the i-th set of vectors is denoted by E i , i = 1, ... ,s. We want to introduce another phrase. The subspace Ei is called the eigenspace of K ( ., .) correspondi ng to the eigenvalue A( i). From the way these eigenspaces are constructed, it is clear that the eigenspaces E I , E 2 , ... ,Es are mutually orthogonal. What this means is that if x E Ei and y E Ej, then < x, y > = 0 fQr any two distinct i and j. Moreover, the vector space V can be realized as the direct sum of the subspaces E I , E2, ... ,Es. More precisely, given any vector y in V, we can find Yi in Ei for each i such that Y = YI + Y2 + ... + Ys. This
88
MATRIX ALGEBRA THEORY AND APPLICATIONS
decomposition is unique. Symbolically, we can write V = El E!1 E2 E!1" . EI1 Ea· Some more properties of eigenvalues, eigenvectors and eigenspaces are recorded in the following proposition. P 2.6.4 Let K(·, .), < following are valid.
',' >, .Vs and E/s be as defined above. The
(1) K(x, y) = < x, Y > = 0 for every x in Ei and Y in E j for any two distinct i and j.
(2) K(x, x)/ < x, x> =
A(i) for every x in Ei and for every i. (3) If X,Y E Ei for any i and < X,Y > = 0, then K(x,y) = O. (4) If x, y E Ei for any i, then K(x, y) = A(i) < x, y > . (5) If Yil, Yi2, ... ,Yir, is an orthonormal basis for the subspace Ei, i = 1,2, ... ,8, then the k vectors, Yll,Y12,··· ,Ylrl;Y2bY22, ..• ,Y2r2;'" ; Ysl, Ys2,'" ,Ysr., constitute an orthonormal basis for the vector space V.
One can establish the above assertion by a repeated application of (2.6.1) and the defining properties of K( ·,·) and < ',' >. The property (5) above has an interesting connotation. The property (2.6.1) is very critical in understanding the structure of any Hermitian conjugate bilinear functional. Once we obtain the subspaces Ei 's, one could generate a variety of orthonormal bases for the vector space V satisfying (2.6.1) by piecing together a variety of orthonormal bases for each subspace Ei. If the eigenvalues are all distinct, or equivalently, each subspace Ei is one-dimensional, we do not have such a kind of freedom. In this case, the normalized vectors Xl,X2, ..• ,Xk satisfying (2.6.1) are unique. Of course, we need to demonstrate that any orthonormal basis of the vector space V satisfying (2.6.1) arises in the way Part (5) of the above proposition outlines. Let us put that down as a proposition. Let K( ·, .), <, . >, xi's and Ei's be as defined above. Let be an orthonormal basis having the property (2.6.1). Then each Zi must belong to some subspace E j . Equivalently, every orthonormal basis of V satisfying (2.6.1) is generated as outlined in Part (5) of P 2.6.4.
P 2.6.5
Z}, Z2, '"
,Zk
PROOF. Since V = El EI1 E2 EI1 ... EI1 E s , each vector Zi in the given
orthonormal basis has a unique decomposition Zi
= Z{l + Zi2 + ... + Zis,
Zij
E Ej , j
= 1,2, '"
,8.
Unitary and Euclidean Spaces
Let us work with the vector we have
Zl.
Since for every )
89
i= 1, < Zl, Zj
< Zll, Zjl > + < Z12, Zj2 > + .. . + < Zls, Zjs > = Since for every)
i= 1, K(ZI' Zj) =
> = 0,
o.
0, we have
This implies, from Part (4) of P 2.6.4,
Consequently,
As this is true for every) i= 1, and Zt, Z2, • .. ,Zs is an orthonormal basis, the vector A(I)Zll + A(2)ZI2 + ... + A(s)Zls must be a multiple of the vector Zl. Since
for some scalar a, we have
(A(l) - a)zll
+ (A(2)
- a)zl2
+ ... + (A(s)
- a)zls
=
o.
Now, we claim that Zlj i= 0 for exactly one) E {1, 2, ... ,s}. Suppose not. Then there are distinct indices )1,)2, ... ,)r E {1, 2, ... ,s} with r ? 2 such that Zlj. i= 0 for i = 1,2, ... ,r and Zlj = 0 for every ) E {1, 2, ... ,s} - {jl, )2, ... ,)r}. Since Zljl' ZIi2, ... ,Zljr are linearly independent, we must have A(M - a = 0 for every i = 1,2, ... ,r. But the A(i) 's are all distinct. This contradiction establishes the claim. In view of the claim, we have that ZI = Zlj. Hence ZI E E j . The same story can be repeated for the other members of the given basis. One important consequence of the above results is that the eigenvalues A(i) 's and the eigenspaces E/s are uniquely defined. If we look at the process how the normalized vectors Xl, X2, ••• ,Xk are chosen satisfying (2.6.1), we had some degree of freedom in the selection of these
90
MATRIX ALGEBRA THEORY AND APPLICATIONS
vectors at every stage of optimization. In the final analysis, it does not matter how the vectors are selected. They lead to the same eigenvalues A(i)'S and eigenspaces Ei's. We need the following terminology. The rank of semi-inner product (', .) on V with respect to an inner product < ',' > on V is defined to be the dimension of the subspace N.L, where N = {x E V : (x, x) = O}. The orthogonal complement N.L is worked out with respect to the inner product < ',' >. Now we are ready to state and prove the spectral theorem for Hermitian conjugate bilinear functionals. The main substance of the spectral theorem is that every Hermitian conjugate bilinear functional is a linear combination of semi-inner products. More precisely, we want to write any given Hermitian conjugate bilinear functional K(·,·) in the following form: (2.6.2) having the following features. (In the background, we have an inner product < " > on the vector space V.) (1) The numbers 81 , 82, . .. ,8m are strictly decreasing. (2) The semi-inner products (-, 'h, (', ·h, ... ,(', ')m are all of nonzero ranks. (3) The subspaces F 1 , F2, ... ,F m are pairwise orthogonal, where Fi = Nt and Ni = {x E V : (X,X)i = O}. (The orthogonality is with respect to the inner product < .,. > .) (4) For any pair of vectors x and y in V, we have
< X,y >= (x,yh + (x,yh + ... + (x,y)m' In abstract terms, when we say we have a spectral form for a Hermitian conjugate bilinear functional with respect to a given inner product, we mean a form of the type (2.6.2) exhibiting all the features (1), (2), (3), and (4) listed above. For such a form, we demonstrate that the vector space V is the direct sum of the subspaces F 1 , F2, ... ,F m. Suppose Fm+l is the subspace of V orthogonal to each of F 1 ,F 2, ... ,Fm. We show that the subspace F m+l is zerc~·dimensional. Observe that the vector space V is the direct sum of the subspaces F},F 2, ... ,Fm+I' Any vector x in V has a unique decomposition x=
Ul
+ U2 + ... + U m +l,
Unitary and Euclidean Spaces
91
with Ui E Fi. By (4) above,
< x,x > = (x,xh + (x,xh + ... + (x,x)m = =
(Ut,Udl
< Ul, Ul
+ (U2,U2h + ... + (um,um)m > + < U2, U2 > + ... + < Um , Um > .
On the other hand,
Consequently, < Um+l, Um+l >= O. Since x is arbitrary, it follows that the subspace F m+l is zero-dimensional. In the following result, we identify explicitly the spectral form of a Hermitian conjugate bilinear functional. P 2.6.6 (Spectral Theorem). Let K(·,·) be a Hermitian conjugate bilinear functional and < " . > an inner product on a vector space V. Then there exist semi-inner products (', ')i, i = 1,2, ... ,s of nonzero ranks, and distinct real scalars A(i)' i = 1,2, ... ,s such that the following hold.
(1) The subspaces F i , i = 1,2, ... ,s of V are pairwise orthogonal, where Fi = Nt and Ni = {x E V: (X,X)i = O}. (2) For every x, y E V,
< X,y > = (x,yh + (x,yh + ... + (x,y)s' (3) K(x, y) = A(1)(X, yh
+ A(2)(X, yh + .,. + A(s) (x, y)s.
PROOF. The spade work we have carried out so far should come in handy. It is not hard to guess the identity of the scalars A(i) 'So We need to identify precisely the subspaces F/s before we proceed further. Let the distinct eigenvalues A(i) 's and the eigenspaces E/s be those as outlined above. We prove that Fi = Ei for every i. We define first the semi-inner products. Let x, y E V. Since the vector space V is the direct sum of the vector spaces E 1 , E2,'" ,E s , we can write
x= y=
+ U2 + ... + Us, Ul + U2 + .. . + v s , Ul
92
MATRIX ALGEBRA THEORY AND APPLICATIONS
with Ui and Vi in E i , in a unique way. For each 1 :S i :S s, define
One can check that ("')i is a semi-inner product on the vector space V. Next we show that (X,X)i = 0 for x in V if and only if x E Ef-. If (x, X)i = 0, then < Ui, Ui >= 0 which implies that Ui = O. Thus we observe that x =
UI
+ U2 + ... + Ui-l + Ui+l + ... + Us·
Consequently, x E Et-. (Why?) The converse follows in the same way if we retrace the steps involved above. Thus we identify the null space N i of the semi-inner product (-, .) i as Et-. Hence Nt- = E i . In view of this identification, (1) follows. By the very definition of the semi-inner products, we have for any x, y E V,
< x,y > = < U}'VI > + < U2,V2 > + ... + < us,Vs >
=
(x,yh
+ (x,yh + ... + (x,Y)s.
This establishes (2). Finally, by P 2.6.4 (4), we have
+ K(U2, V2) + ... + K(u s , VS) = A(I) < UI,VI > +A(2) < U2,V2 > + ... + A(s) < Us,Vs > = A(I)(X, yh + A(2)(X, yh + ... + A(s) (x, Y)s,
K(x, y) = K(uI, VI)
from which (3) follows. It is clear that each semi-inner product introduced above is of non-zero rank. The set P(I), A(2), ... , A(s)} of eigenvalues is called the spectrum of K(·, .). In the following result, we show that the representation given above is unique.
P 2.6.7 Let K(·,·) be a Hermitian conjugate bilinear functional and < .,. > an inne~ product on a vector space V. Let A{1) > A(2) > ... > A(s) be the eIgenvalues and E 1 , E 2 , ... , Es the corresponding eigenspaces of K ( ., .). Suppose
Unitary and Euclidean Spaces
93
is a spectral form of K C,, .) for some real numbers 81 > 82 > .. . > 8m and semi-inner products (., -fl, C" .)~, . . . , (., embodying the three features (1), (2), and (3) of the spectral theorem P 2.6.6 outlined above. Then m = s, 8i = A(i), and (., .): = (., ')i for every i , where the semiinner product C,,·) is the same as the one defined in P 2.6.6.
·rm
PROOF. The ideas are essentially contained in the discussion preced-
ing P 2.6.6. Let Pi = {x E V : (X,Xfi = O} and G i = Pr for each 1 ::; i ::; m. By hypothesis (1), G l , G 2 , . . . , G m are pairwise orthogonal. First, we show that the vector space V is the direct sum of the subspaces G l , G 2 , •• • , G m . Let G m + l be a subspace of V orthogonal to G l , G 2 , •• • , G m so that
Let x E V. We can write x =
Xl
+ 1.
with Xi E G i , 1 ::; i ::; m < x,X > -
=
+ X2 + ... + Xm + Xm+l (x,x)~
+ ... + (x,xfm (Xl,XSl + (X2,X2f2 + ... + (xm,xmfm < X}, Xl > + < X2 , X2 > + ... + < Xm,Xm > . CX,Xfl
+
By (2),
But <X,X>= <Xl,Xl>+<X2,X2>+'"
+ < Xm,Xm > + < Xm+l,Xm+l >, which implies that < Xm+l, Xm+l > = O. Since x is arbitrary, it follows that G m + l is zero-dimensional. If Yil, Yi2, ... , Yiri is an orthonormal basis of G i , i = 1,2, ... , m , then Yl},Y12, • ..
,Ylrl,Y2},Y22,·· · ,Y2r2" " ,Yml,Ym2,··· ,Ymr",
is an orthonormal basis of V. Also < Yij,Yrt >
=
0
MATRIX ALGEBRA THEORY AND APPLICATIONS
94
for every (i, j) =f (r, t). By P 2.6.5, each Yij belongs to some E r . This immediately leads to the verification of the result. There is an alternative way of writing down the spectral form of a Hermitian conjugate bilinear functional. In this form, the functional is written explicitly in terms of the underlying inner product. P 2.6.8 Let K(·,·) be a Hermitian conjugate bilinear functional and < ., - > an inner product on a vector space V. Then there exist orthonormal vectors Xl, X2, ... , Xr in V and real numbers >'1 ~ >'2 ~ ... ~ >'r such that for any pair X and Y of vectors in V, we have K(x,y)
=
>'1 < X,Xl >< +>'r
where r
~
> + >'2 < X,X2 >< X2,Y > + ... < X, Xr >< Xr, Y >, (2.6.3) Xl,Y
dim(V).
PROOF. Choose orthonormal vectors Xl, X2, . .. , Xk in V satisfying (2.6.1), where k = dim(V). Let >'i = K(Xi, Xi) for each i. For given vectors X and Y in V, write the decompositions of X and Y as X
=
< X, Xl > Xl + < X, X2 > X2 + ... + < X, Xk > Xk
Y
=
< Y,Xl > Xl + < Y,X2 > X2 + ... + < Y,Xk > Xk.
and
Consequently, k
K(x,y)
= K ( L < X,Xi > Xi, i=l
k
L
< y,Xi > Xi)
j=l
k
= L>'i < X,Xi >< Xi,y >. i=l
In the above representation, we omit those >'i'S which are zero. Thus we have the desired representation (2.6.3). The statement of P 2.6.8 can be reworded in the following way.
P 2.6.9 Let K(·,·) be a Hermitian conjugate bilinear functional and < .,. > an inner product on a vector space V. Then there exist vectors Xl,X2,··· ,Xk in V and real numbers >'1,>'2, .. . ,>'k such that
Unitary and Euclidean Spaces
95
for any pair x and Y of vectors in V, we have
K(x,y) = Al <
>< XbY > +A2 < X , X2 >< X2,Y > + .. . +Ak < X,Xk >< Xk,Y >
X,Xl
and
< X,Y >
=
< X,Xl >< Xl,Y > + < X , X2 >< X2 , Y > + (2.6.4) + < x, Xk >< Xk , Y > .
The properties (2.6.1), (2.6.3) and (2.6.4) are all equivalent. The second part of Property (2.6.4) is equivalent to the fact that Xl, X2, . .. ,Xk should constitute an orthonormal basis of the vector space V . See P 2.2.7. Consequently, the properties (2.6.3) and (2.6.4) are equivalent. It is clear that (2.6.1) and (2.6.4) are equivalent. It is time to take stock of what has been accomplished. One crucial point we need to discuss is that what happens to the spectral form when the Hermitian conjugate bilinear functional K(·, ·) is itself a semi-inner product or, more restrictively, an inner product. In that case, we will get an additional bonus. If the Hermitian conjugate bilinear functional is a semi-inner product, then all its eigenvalues are non-negative. (Why?) IT the Hermitian conjugate bilinear functional is an inner product , then all its eigenvalues are positive. (Again, why?) The spectral representation of a Hermitian conjugate bilinear functional K(·,·) in (2.6.2) naturally depends on the inner product < ., - > chosen. It is, therefore, of some interest to examine how the representations differ for different choices of the inner product. The following theorems shows that in any representation the number of positive, negative and zero eigenvalues are the same, while the actual eigenvalues and the corresponding eigenvectors may not be the same. We give a representation of K( ·,·) in terms of some basic linear functionals which brings into focus the stated facts above. P 2.6.10 Let K(· ,·) be a Hermitian conjugate bilinear functional on a vector space V. Then there exist p + q (s dim(V)) linearly independent linear functionals L l , L2 , . . . ,Lp +q defined on V such that for every pair X and Y of vectors in V , we have p
K(x, y)
= I: Li(X)Li(Y) i=l
q
I: Lp+j(x)Lp+j(Y)· j=l
(2.6.5)
96
MATRIX ALGEBRA THEORY AND APPLICATIONS
Moreover, the numbers p and q in (2.6.5) are unique for a given K( ·, .), while the choice of linear functionals is not unique. PROOF. Consider the representation of K( ·,·) given in (2.6.4). Assume, without loss of generality, that AI, A2, ... ,Ap are positive Ap+l' Ap+2, . . . ,Ap+q are negative, and Ap+q+b Ap+q+2, . .. ,Ak are zeros. Let J-Lp+j = -Ap+j,j = 1,2, ... ,q. Then for any pair x and Y of vectors in V, p
K(x,y) = LAi < X,Xi >< Xi,Y > i=l q
- LJ-LP+j < x,xp+j >< xp+j,Y > j=l p
= L < x, (Ai)1/2xi >< (Ad 1 / 2x i' Y > i=l q
< x, (J-LP+j)1/2 xp+j >< (J-Lp+j )1/2xp+j , Y >
- L j=l p
q
= LLi(X)Li(Y) - L Lp+j(x)Lp+j(Y)' j=l
i=l
where Li(x) =< x,(Ad1/2xi >, i = 1,2, ... ,p and, similarly we have Lp+j(x) = < x, (J-Lp+j)I/2 xp+j >, j = 1,2, ... ,q. Observe that for each i, LiO is a linear functional on the vector space V . Since Xi 'S are orthononnal, the linear functionals Li (.), i = 1,2, ... ,p + q are all linearly independent. Now let us settle the question of uniqueness. Suppose we have two representations of K(·, ·) given by p
K(x, y)
= L Li(X)Li(Y) - L Lp+j (X) Lp+j (y) i=l
and
q
r
j=1 s
K(x,y) = LMi(X)Mi(Y) - LMr+j(x)Mr+j(y) i=l
j=l
Unitary and Euclidean Spaces
97
for all x and y in V, and for some sets {Li} and {Mi} of linearly independent linear functionals on V. Set x = y. Then for any x in V,
K(x, x) = =
Suppose
l'
p
q
i=1
j=1
L Li(X)Li(X) - L Lp+j(x)Lp+j(x) r
s
i=1
j=1
L Mi(X)Mi(X) - L Mr+j(x)Mr+j(x).
(2.6.6)
(2.6.7)
> p. Consider the following linear equations, Li(X) =0, i= 1,2, ... ,p, Mr+j(x) = 0, j = 1,2, ... ,s,
(2.6.8)
in x. For any x satisfying (2.6.8), K(x, x) is non-positive as per the representation (2.6.6) and is non-negative as per the representation (2.6.7). This apparent anomaly will not arise if
Mi(X) = 0, i = 1,2, ... ,T, Lp+j(x) = 0, j = 1,2, ... ,q,
(2.6.9)
whenever the vector x satisfies equations (2.6.8). What this means is that each of the linear functionals M 1, M 2, ... ,Mr is a linear combination of the functionals £1, £2,"" L p, Mr+ll M r+2, ... ,Mr+s' See Complement 1.7.2. As a consequence, each of the linear functionals MI, M 2, ... ,Mr , M r+1, ... ,Mr+8 is a linear combination of the funetionals L 1, L2, .. . ,Lp, Mr+l' M r+2, .. . ,Mr+s . This is not possible since M 1, M 21 ··· I Mn M r+11 ··· I Mr+s are linearly independent and p < T. Thus we must have 1" ::; p. By similar argument, we can show that p ::; T. Hence p = T. In a similar vein, it follows that q = s. This completes the proof. The numbers p + q and p - q are called the rank and signature of the Hermitian conjugate bilinear functional K(-, .), respectively. If we recall how we obtained the first eigenvalue >'1 of the Hermitian conjugate bilinear functional, we identify >'1 as the largest value of the ratio K(x, x)/ < x, x> as x varies over all non-zero vectors of V. The remaining eigenvalues also do have such optimality properties. In the
98
MATRIX ALGEBRA THEORY AND APPLICATIONS
following theorem, we characterize the intermediate eigenvalues. Before we present the theorem, we would like to rei terate the basic framework under which we operate. Let Al 2: A2 2: ... 2: Ak be the eigenvalues of the Hermitian conjugate bilinear functional K( ·, ·) with respect to an inner product < -, - >, and Xl, X2, .. • ,Xk the corresponding eigenvectors which form an orthonormal basis for the vector space V . All these facts have been handed. down to us from P 2.6.2 and the discussion that ensued..
P 2.6.11 (Minimax Theorems) Let Ms be the vector space spanned by the first s eigenvectors of K(-'·) and M- s the vector space spanned by the last s eigenvectors for each 1 :::; s :::; k. The following hold: (1) inf K(x, x)/ < X, X > = As and the infimum is attained xEM.,x,eO
at
= XS. sup K(x, x)/
X
>=
Ak-s+1
and the supremum is
attained at x = Xk-s+1. (3) inf sup K(x, x)/ < x, x > =
Ak- s +1
where the infimum is
(2)
<
X, X
xEM_.,x,eO
S xES,x,eO
taken over all subspaces S of V with dim(S) 2: s, and the infimum is attained at S = M- s • (4) sup inf K(x, x)/ < x, x > = As where the supremum is S
xES ,x,eO
taken over all subspaces S of V with dim(S) 2: s, and the supremum is attained at S = Ms. PROOF. We begin by proving (1). Let x be any non-zero vector in the vector subspace Ms. Write down its representation in terms of the given orthonormal basis of Ms:
x=
0lXl + 02X2 + ... + OsXs,
for some scalars o}, 02, ...
K(x, x)
< x, x> =
_
,Os .
_
We compute
_
_
[01 0 1 A 1 +02 0 2 A2+ · · .+OsOsAsl![0101 +020:2+ .. . +osO:s].
The above computation indicates that we are taking a weighted average of the numbers Al 2: A2 2: ... 2: As with the non-negative weights 010:}' 020:2,·· . , O:S0:8· It is clear that the weighted average is always
Unitary and Euclidean Spaces
99
~ >'s. It is also clear that when x = x s, the weighted average is exactly equal to >'s. This proves (1). A similar argument establishes the truth of (2). Let us now prove (3). Let S be any subspace with dimeS) 2: s. Observe that
K(x, x) > :eES,:e:FO < x, X > sup
> >
sup xESnM k -0+1 ,X:F O
inf
xESnMk -0+1 ,x:F O xE M
inf
k_ .+l,X:F 0
K(x,x)/ < x,x > K(x,x)/ < x,x >
K(x,x)/ < x,x > =
(2.6.10) Ak-s+l
by (1).
The first inequality above holds as long as S n M k - s + 1 =/:. {O}. This is so, in view of the fact that
+ dim(M k- s+1) - dimeS + Mk-s+d 2: (s) + (k - s + 1) - k = 1.
dimeS n Mk-s+t} = dimeS)
Taking the infimum of (2.6.10) over all subspaces S of dimensions 2: s, we obtain inf sup K(x,x)/ < x,x > 2: Ak-s+1' S xES,x:FO
The subspace S = M- s is one such subspace with dimension equal to s and for which, by (2), sup
K(x, x)/ < x, x> =
Ak-s+l.
:eES,X:FO
Consequently, the infimum above is attained at the subspace S = M- s • This completes the proof of (3). In order to prove (4), repeat the above argument used in the proof of (3) with only one modification: in the chain of inequalities, use S n M-(k-s+1) instead of S n M k - s+ 1 ' There is another line of thinking when it comes to analyzing a Hermitian conjugate functional K (', .) defined on a vector space V. Suppose Q is a subspace of V. One can restrict the given bilinear functional K(·, ·) to the Cartesian product Q x Q. The restriction still remains a Hermitian conjugate bilinear functional on the vector space Q. Now the question arises as to the type of relationship that prevails between the eigenvalues of K ( " .) as defined on V x V and the eigenvalues of K(·, .) as defined on Q x Q. In the following proposition, we establish some inequalities.
100
MATRIX ALGEBRA THEORY AND APPLICATIONS
P 2.6.12 Let K(-'·) be a Hermitian conjugate bilinear functional on a vector space V of dimension k. Let >'1 2 >'2 2 ... 2 >'k be its eigenvalues. Let Q be a subspace of V of dimension t. Let ILl 2 IL2 2 ... 2 ILt be the eigenvalues of K(-'·) restricted to Q x Q. Then the following inequalities hold: (1) >'s 2 j.Ls for s = 1,2, ... ,t; (2) j.Lt-s+1 2 >'k-s+1 for s = 1,2, ... ,t. PROOF. This result follows by an application of minimax theorems established above. By Part (3) of P 2.6.11 (Minimax Theorems), ILt-s+1
=
inf
sup
seQ xES,x,eO
K(x, x)! < x, x >,
where the infimum is taken over all subspaces S of the vector space Q with dim(S) 2 s . But this infimum, 'nf ) seQ
K(x,x).
sup xES,x,eO
< x, X >
2 )nf
sup
K(x,x)
sev xES,x,eO
< x, X >
= >'k-s+1,
where the infimum on the right hand side of the above inequality is taken over all subspaces S of the vector space V with dim(S) 2 s. This is true for any s = 1,2, ... ,t. This proves Part (2) above. Using Part (4) of P 2.6.11, one can establish Part (1) above. The following is a simple consequence of the above result. This result is usually known as the interlace theorem for eigenvalues. P 2.6.13 (Interlace Theorem) Let K(·, ·) be a Hermitian conjugate bilinear functional on a vector space V of dimension k . Let >'1 2 >'2 2 ... 2 >'k be the eigenvalues of K(·, .). Let Q be a subspace of V of dimension (k - 1). Let ILl 2 IL2 2 ... 2 ILk-1 be the eigenvalues of K(·,·) restricted to the subspace Q. Then
>'1 2
ILl
2 >'2 2
IL2
2 ... 2
ILk-1
2
>'k .
COlnplemellts
2.6.1 Let V = C 3 and consider the following function on the product space V x V:
+ (1 + i)6172 + (1 - i)6173 + (1 - i)6171 + 6173 + (1 + i)6171 + ~3T]-}. + 6173
K(x, y) = 6171
Unitary and Euclidean Spaces
101
for x = (~1,6,6) and Y = (171,"72,173) E V.
(1) Show that K(·,·) is a Hermitian conjugate bilinear functional onV. (2) Obtain the spectral form of K(·, .).
2.7. Conjugate Bilinear Functionals and Singular Value Decomposition It may be puzzling to the reader that the phrase "Conjugate bilinear functional" is cropping up again in a new section. Are we not done with it in Section 2.6? The functionals we are entertaining in this section are defined in a more general framework than hitherto considered. You will see the difference when we introduce the definition. DEFINITION 2.7.1. Let VI and V 2 be two vector spaces both over the field C of complex numbers. A map B( ·, ·) from VI x V 2, the Cartesian product of Viand V 2, into C is said to be a conjugate bilinear functional if
(1) B(OIXI + 02X2, y) = oIB(XI' y) + o2B(X2' Y), (2) B(x, /3IYl + /32Y2) = i3 I B(x, yt} + i32 B (X, Y2) hold for every X,X},X2 in VI; Y,Yl,Y2 in V 2; and O},02,/3I,/32 in C. Some remarks are in order. In the above framework, the vector spaces V 1 and V 2 need not be identical. If V I = V 2 and B (., .) is a Hermitian conjugate bilinear functional on VI, then B(· ,·) is a conjugate bilinear functional on VI x VI in the sense portrayed above. If VI = V2 and B(·,·) is a conjugate bilinear functional on VI x V 2, it is not necessary that B(·,·) be Hermitian. Try an example. In what follows, we choose and fix an inner product < .,. >1 on the vector space Viand an inner product < .,. >2 on V 2. We establish what is called the Singular Value Decomposition of the conjugate bilinear functional B(·, ·) with respect to the inner products < ., . >1 and < ., . >2. This decomposition is in the same spirit as the spectral fonn of a Hermitian conjugate bilinear functional on a vector space. In the proof of the singular value decomposition, we make use of the spectral form of a certain Hermitian conjugate bilinear functional on a vector space. Even though the singular value decomposition sounds
102
MATRIX ALGEBRA THEORY AND APPLICATIONS
more general than the spectral form but it is a derivative of the spectral form. P 2.7.2 (Singular Value Decomposition) Let Be, .), < .,. >1 and < .,. >2 be as defined above. Then there exist positive real numbers 0"1 ~ 0"2 ~ ... ~ O"r with r ~ min{dim(Vd, dim(V2)}, orthonormal vectors Xl, X2, ... ,Xr in VIand orthonormal vectors Yl, Y2, ... ,Yr in V 2 such that for any two vectors X in VIand Y in V 2,
B(x,y) = 0"1 <X,Xl>12 +0"2 <X,X2>I2
+ ...
+ O"r < X,X r >1< Yr,Y >2· PROOF. For every fixed Y in V 2, B (., y) is a linear functional on the vector space VI. Therefore, there exists a unique vector TJ(Y) in VI such that B(x, y) = < x, TJ(Y) >1 for every X in VI. See P 2.4.1. For every fixed vector X in VI, the map B(x,·) is a linear functional on the vector space V2. Consequently, there exists a unique vector ~(x) in V 2 such that B(x, y) =< ~(x), Y >2 for all Y in V 2. We now define a function K(·,·) on the Cartesian product space VI x VI as follows. For any two vectors x and u in VI, let
K(x, u) =< ~(x), ~(u) >2 . Let us put together the maps vectors x in VI and yin V 2,
~(.)
and TJ(·) in the following way, for any
B(x, y) = < x, TJ(Y) >1 = < ~(x), Y >2 .
(2.7.1)
Since the map K(·,·) is defined through an inner product, K(·,·) is a semi-inner product on the vector space VI! We would like to appeal to the spectral form as exemplified by (2.6.4). Let r be the rank of the spectral form of the semi-inner product K(-, .) with respect to the inner product < .,. >1. Obviously, r ~ min{dim(Vt}, dim(V2)}. It is clear that the eigenvalues of K (., .) are non-negative. For reasons that will be clear later, let O"~ ~ O"~ ... ~ 0"; be the positive eigenvalues of K(·, .). As per (2.6.4), there exist orthonormal vectors Xl, X2, .. . ,Xr in VI such that for any pair of vectors X and u in VI, K(x,u) = O"~ < X,Xl >< Xl,U >1
+ O"~
< X,X2 >1< X2,U >1
+ 0"12 <X,Xr >I<Xr ,U>I·
+ ... (2.7.2)
Unitary and Euclidean Spaces
103
FUrther, we note that for any i,j E {I, 2, ... ,r} K(x.x.)= t, J
u~ t { 0
if i = j, if i =1= j.
Extend the set Xl, X2, ... , Xr of orthonormal vectors to an orthonormal basis X},X2,'" ,Xn Xr+l, ... ,Xk in VI. For the basis, we do have u~
K(x"x;) =
{
~.
if i = j, (i and j E {1,2, ... ,r}), if i = j, (i and j E {r + 1, ... , k}),
(2.7.3)
if i =1= j.
FUrther, for any vector y in V 2, we have the usual expansion
< X, ",(y) >1
=
< X, Xl >1 < XI, ",(y) >1 + < X, X2 >1 < X2, ",(y) >1 + ... + < X,Xk >1< Xk,"'(Y) >1,
which, with the help of (2.7.1), becomes
< x,,,,(y) >1
=
< X,Xl >1< ~(xd,Y >2 + < X,X2 >1< ~(X2)'Y >2 + .,. + < X, Xk >1 < ~(Xk)' Y >2 . (2.7.4)
We need to procure an orthonormal set of vectors for the vector space V 2. A natural candidate is:
Note that by (2.7.3), for i,j E {1,2, ... ,r}, if '{,
=),
if i =1= j. Thus Zl! Z2,' .. ,Zr are orthogonal. Let Ui be the positive square root of ut and Yi = (Ui)-1 Zi, i = 1,2, ... ,r. Then Y}, Y2, .. . , Yr is a set of orthonormal vectors in V 2 • FUrther, Zi = 0 if i = l' + 1, r + 2, ... ,k.
104
MATRIX ALGEBRA THEORY AND APPLICATIONS
Using (2.7.1) and (2.7.4), the puzzle is solved. For any vector x in VI and yin V2, we have
B(x, y)
= < x, TJ(Y) >1 = < X,X1 >1< Z},Y >2 + < X,X2 >1< Z2,Y >2 + ... + < X,Xk >1< Zk,Y >2 - CT1 < X,X} >1< YI,Y >2 + CT2 < X,X2 >1 < Y2,Y >2 + ... + CTr < x, Xr >1 < Yn Y >2 .
This completes the proof. In the development of a spectral form for a Hermitian conjugate bilinear functional defined on a vector space, the critical point was to demonstrate the existence of an orthonormal basis for the vector space satisfying (2.6.4). In the context of a conjugate bilinear functional defined on the Cartesian product of two vector spaces, we can exhibit two orthonormal bases one for each of the underlying vector spaces satisfying a property similar to (2.6.4). Such bases are called canonical bases. Let us state formally what was discussed above.
P 2.7.3 (Canonical Bases of Two Vector Spaces) Let (VI, < -,' >d and (V2' < -,' >2) be two inner product spaces with dimensions k and m, say, respectively. Let B( ·,·) be a conjugate bilinear functional defined on V I X V 2. Then there exists an orthonormal basis x}, X2," . ,Xk for the vector space VI and an orthonormal basis YI, Y2,'" ,Ym for V 2 such that B(Xi,Yj)
=/=j,
= 0
for all i
=/= 0
for j = i.
(2.7.5)
PROOF. Most of the spade work needed for this result was already done in the proof of P 2.7.2. We have already obtained orthonormal vectors Xl, X2, ... ,Xr in V I and orthonormal vectors YI, Y2, ... ,Yr in V 2 such that (2.7.5) is satisfied for all i,j E {1,2, . .. ,r}. Extend Xl, X2, ... ,X r to an orthonormal basis Xl, X2, ... ,Xk of V I, and extend the same courtesy to the set YI, Y2, ... ,Yr' The property (2.7.5) covers all the basis vectors in view of the singular value decomposition of the bilinear form Be, .).
105
Unitary and Euclidean Spaces
It is time we name the numbers 0"1 ~ 0"2 ~ ... ~ O"r > O. These numbers are called the singular values of the conjugate bilinear form B(-, .). Further, the vectors Xi in VI and Yi in V2 are called canonical vectors associated with the singular value O"i, i = 1,2, ... ,T. In the context of a Hermitian conjugate bilinear functional defined on a vector space, the eigenvalues were obtained as a solution to a certain optimization problem. The singular values also have a similar optimality property. P 2.7.4 The largest singular value 0"1 of a conjugate bilinear functional B(-,·) defined on the Cartesian product VI x V 2 of two vector spaces has the following property:
0"1 =
IB(x, Y)I/[IIxlh liyli2j.
sup xEV l ,x#O,yEV 2 ,y#O
Moreover, the supremum above is attained at X = Xl and Y = Yl, where Xl and Yl are a set of canonical vectors associated with the singular value 0"1PROOF. This result is a simple consequence of the singular value
decomposition of B(-, -). For any vectors
X
in VI and Y in V 2, we have
B(x,y) = 0"1 < X,Xl >1< Yl,Y >2 +0"2 < X,X2 >1< Y2,Y >2
+ ...
+O"r < X,X r >1< Yr,Y >2-
Expand each of X and Y with respect to their respective orthonormal bases stemming from the singular value decomposition:
+ < X, X2 >1 X2 + ... + < X, Xk >1 Xk = nlxl + n2x2 + ... + nkxk, say, = < y, Yl >2 Yl + < y, Y2 >2 Y2 + _.. + < y, Ym >2 Ym = f3lYl + f32Y2 + ... + f3mYm, say.
X = < X, Xl >1 Xl Y
By the Cauchy-Schwartz inequality, it now follows that r
i=l r
r
k
m
:::; 0"1 (L IniI2)1/2(L lf3iI 2)1/2 i=l i=l :::; 0"1 (L IniI2)1/2(L lf3jI2)1/2 = 0"1 Ii x lil liyli2i=l
j=l
106
MATRIX ALGEBRA THEORY AND APPLICATIONS
Consequently, the supremum under discussion is ::; B(Xl' Yl) = 0"1. This completes the proof.
0"1.
It is clear that
Complements
2.7.1 Let Vt defined by
= R2 and V2 = R3.
Let B be the functional on VI XV2
for x = (6,6) E VIand Y = ("11, 'TJ2, '(/3) E V 2· (1) Show that B is a conjugate bilinear functional. (2) Obtain the singular value decomposition of B.
Note: The material covered in this Chapter is based on Halmos (1958), Rao and Mitra (1968a, 1971a, 1971b) and Rao (1973c).
CHAPTER 3 LINEAR TRANSFORMATIONS AND MATRICES In Chapters 1 and 2, we have looked at entities called linear funetionals. They were maps from a vector space into the associated field of the vector space. The notion of a linear functional can be extended to a more general setting. In this chapter, we study linear transformations from one vector space into another and their representation in matrix form.
3.1. Preliminaries Let V and W be two arbitrary vector spaces over the same field F. A map T from V to W, written as T : V ---+ W, is said to be a linear transformation if T(ax
+ f3y)
= aT(x)
+ f3T(y)
for every a and f3 in F, and x and y in V. There are other names used for the type of map introduced above: a linear operator, a homomorphism, or a linear mapping. In the sequel, any linear transformation is simply called a transformation, and a general transformation will be referred to as a map. The vector space V is called the domain of the transformation T. If S is a subspace of V, we can restrict the map T to S, and the restriction is usually denoted by TIS. For the restricted map, the domain is obviously the space S. With T : V ---+ W, one can associate two sets R(T)
= {T(x)
E W : x E V}, K(T)
= {x
E V : T(x)
= O}.
The set R(T) is a subset of the vector space Wand is called the range of the transformation T . FUrther, one can show that R(T) is a subspace of 107
108
MATRIX ALGEBRA THEORY AND APPLICATIONS
w.
The set K(T) is a subset of the space V and is called the kernel of the transfonnation T. One can show that K(T) is a subspace of V. To round up the discussion on the range and kernel of a transformation, let us introduce two more notions. The dimension of the subspace R(T) is called the rank of the transformation T and is usually denoted by peT). The dimension of the subspace K(T) is called the nullity of the transformation T and is usually denoted by v(T). If the transformation T is such that R(T) = W, then the transformation is labeled as onto. Otherwise, it is labeled as an into transformation. T is said to be an isomorphism if T is one-to-one and onto. If T is an isomorphism, we declare that the vector spaces V and Ware isomorphic. Let us state a few facts surrounding these notions.
P 3.1.1 Let T: V -+ Wand KC be a complement of the subspace K(T) in V. Then the following hold. (1) The transformation TIKc, i.e., T restricted to KC, has the range R(T), i.e., R(TIKC) = R(T). (2) The transformation TIKc : KC -+ R(T) is one-to-one and onto. (3) dim(KC) = peT) = dim(V) - veT). PROOF. It is obvious that R(TIKC) C R(T). Let y E R(T). There exists a vector x in V such that T(x) = y. Since K(T) EB KC = V, we can write x = Xl + X2 with Xl E K(T) and X2 E K C. As y = T(x) = T(XI)+T(X2) = T(X2) andx2 E KC, we have y E R(TIKC). This proves (1). For (2), we need to show that the map TIKc is one-to-one. Let u and v be two vectors in KC such that T(u) = T(v). As T(u - v) = 0, we have u - v E K(T). The vector u - v also belongs to K C. Hence u - v = 0. This shows that TIKc is one-to-one. We have already seen that dim(V) = dim(K(T)) + dim(KC). (See P 1.5.8.) Since K C and R(T) are isomorphic, (3) follows. The above result can be rephrased as follows . Given any transformation from one vector space V to another W, we have two subspaces K(T) and R(T) with K(T) being a subspace of V and R(T) a subspace of W. The dimensions of these subspaces match the dimension of the vector space V, i.e., dim(K(T)) + dim(R(T)) = dim(V) . We look at some examples. EXAMPLE 3.1.2. Let F be any field. Define a map T from F3 to F2 by T(';1 ,6,6) = (6,6) for (';b6,6) in F3. Note that F3 and F2 are
109
Linear 'l'ransformations and Matrices
three-dimensional and two-dimensional vector spaces, respectively, over the same field F. Further, T is a linear transformation. EXAMPLE 3.1.3. Consider the vector space P n of all polynomials of degree less than n with complex numbers as coefficients. Then the map (Differential operator) T defined by, n-l
n-l
n-l
T(L~iXi) = (d/dx)(L~iXi) = (Li~iXi-l) i=O
i=O
i=l
n-l
is a linear transformation from P n to P n-}, where mial in x of degree less than n with
~o, 6,
...
'~n-l
I: ~ixi is a polyno-
i=O
E C.
EXAMPLE 3.1.4 . Consider the vector space P n of all polynomials of degree less than n with complex coefficients. Then the map (Integral operator) S defined as
n-l n-l S(L ~ixi) = (L(~d(i + l)))xi+1 i=O
i=O
is a linear transformation from P n to P n+l. EXAMPLE 3.1.5. Let V and W be arbitrary vector spaces over the same field. Let WI, W2,'" ,Wr be any r vectors in Wand YI, Y2,··· ,Yr any r linear functionals on V. Then the map T defined by,
is a transformation from V to W. 3.1.6. Let S be a subspace of a vector space V. Let SC be a complement of Sin V. Then any vector x in V can be written as x = Xl + X2 with Xl E Sand X2 E SC in a unique way. Define a map T from V to S by, T(x) = Xl' Then T is a transformation from V onto S. Such an operator is called a projection onto S along SC. We have come across this notion when we were discussing inner product spaces and orthogonal complements of subspaces. We do not have to have an inner product on the vector space in order to have the notion of projection feasible. EXAMPLE
110
MATRIX ALGEBRA THEORY AND APPLICATIONS
Finally, we state some results which connect the notions of range, kernel and isomorphism. P 3.1.7 Let T : V ---+ W . Then the following are valid. (1) The transformation T is an isomorphism if and only if dim(K(T)) = 0, i.e., K(T) contains only the zero vector, and R(T) = W. (2) If the vector spaces V and W have the same dimension, then K(T) = {O} if and only if R(T) = W. (3) If T is an isomorphism from V to W, and S is a subspace of V, then TIS is an isomorphism from S onto R(TIS). PROOF. Part (1) is easy to establish. For part (2), one can use the dimensional identity, dim(V) = dim(K(T)) + dim(R(T)). The proof of part (3) is trivial.
Complements 3.1.1 Let V be the set of all complex numbers regarded as a vector space over the field of real numbers. Let T be the map from V to V defined by T(x + iy) = x - iy, the conjugate of the complex number x isomorphism of V.
+ iy
E V. Show that T is an
3.2. Algebra of Transformations In this section, we look at the collection of all transformations from one vector space to another. This collection can be endowed with a structure so as to make it a vector space. Of course, the underlying structure of the vector space involved plays a crucial role in passing its features to the collection. In the sequel, assume that all vector spaces are over the same field F. DEFINITION 3.2.1. Let T : V ---+ Wand S : V ---+ W . Define a map T+S by, (T + S)(x) = T(x) + S(x) , x E V. For each a in F, define a map aT by,
(aT)(x)
= aT(x), x
E V.
Linear
nunsfo~ations
111
and Matrices
It is clear that the maps T + S and aT are transformations. Thus addition and scalar multiplication of transformations are naturally available on the collection of all transformations. The following proposition clearly spells what these operations mean from a structural point of view. P 3.2.2 Let L(V, W) be the collection of all transformations from the vector space V to the vector space W. With the operations of addition and scalar multiplication of transformations defined above, L(V, W) is a vector space over the same field F. One can define one more operation on transformations, namely, composition, subject to some compatibility conditions. DEFINITION 3.2.3. Let
T :V
~
Wand S : W
~
U. Define a map
ST:V~Uby
(ST)(x) = S(T(x)), x E V. It is clear that the map ST is a transformation. The transformation is, sometimes, called the product of the transformations Sand T. The product TS may not be defined. If V = W = U, one can always define the product ST as well as T S, and they need not be the same. After having defined the space L(V, W), the next item on the agenda is to determine the magnitude of its dimension. The following proposition addresses this problem. P 3.2.4 If dim (V) mn.
=m
and dim(W)
= n,
then dim(L(V, W))
=
PROOF. Let Xl, X2, • •• ,Xm be a basis of the vector space V and
a basis ofW. Let T be any transformation from L(V, W). First, we observe that the value of T(x), for any X E V, is determined by the set T(xt}, T(X2), ... ,T(xm ) of vectors in W. For, we can write
WI, W2, ••• ,Wn
for some aI, a2,." ,am in F in a unique fashion. It follows that
Once a basis for the vector space V is fixed, knowing the vector X is equivalent to knowing its co-ordinates ai, a2, ... ,am' Once we know
112
MATRIX ALGEBRA THEORY AND APPLICATIONS
the values of the transfonnation T at Xt,X2, .•. ,Xm , we can immediately write down the value of T at x. This innocuous observation has deep implications. One can build a transfonnation from V to W demanding it to take certain values in the vector space W at certain vectors in V! With this in view, for every fixed i E {I, 2, ... ,n} and j E {I, 2, . .. ,m}, let tij be a transformation from V to W satisfying TijXj = Wi, TijXk = 0 for all k i= j. In other words, we want a transfonnation Tij such that TijXI = 0, TijX2 = 0, ... ,Tijxj-l = 0, TijXj = Wi, TijXj+1 = 0, ... ,Tijxm = O. Note that Tij is the only transformation that takes the value 0 at Xl, X2, ... ,Xj-l' Xj+I, . .. ,Xm and the value Wi at Xj. (Why?) We claim that the set T ij , 1 :::; i :::; n, 1 :::; j :::; m of transformations constitutes a linearly independent set in L(V, W). Suppose n
m
LL
Ciij7ij
=0
i=l j=l
for some scalars 1:::; k:::; m, n
0=
Ciij,l :::;
i :::; n, 1 :::; j :::; m. In particular, for each
m
(L L
Ciij7ij) (Xk)
=
CilkWI
+ Ci2kW2 + ... + CinkWn·
i=l j=l
Since WI, W2,· .. , Wn are linearly independent, Ciik = 0 for every i 1,2, ... ,n. Since k is arbitrary, it follows that Ciij = 0 for every i and j. This establishes the linear independence of the transformations. Next, we show that any transformation T in L(V, W) is a linear combination of the transformations Til'S. Let T(Xi) = Yi, 1 :::; i :::; m. Expand each Yi in terms of the basis of W, i.e.,
for some scalars
(3ij'S.
One simply verifies that m
T =
n
""(3~~
-tJ-TJt·
i=l j=l
It now follows that dim(L(V, W)) = mn.
Linear TransjofTnations and Matrices
113
The notion of a linear transformation is not that much different from the notion of a lin~ar functional. By piecing together linear functionals in some suitable manner we can construct a linear transformation from one vector space to another. An inkling of this phenomenon has already been provided in Example 3.1.5. A fuller version of this example is the substance of the following proposition. P 3.2.5 Let V and W be two vector spaces. Let WI, W2, ... ,Wn be a basis for W. Let T be a map from V to W. Then T is a transformation if and only if there exist linear functionals Yl, Y2, ... ,Yn on V such that
for every x in V. PROOF. If T = Yl WI + Y2 W2 + ... + Yn Wn for some linear functionals Yl, Y2, . .. ,Yn on the vector space V, it is clear that T is a transformation. Conversely, let T be a transformation from V to W. Let xl, X2,." ,Xm be a basis for the vector space V. Let Yi = T(xd, i = 1,2, ... ,m. Write for each i,
for some scalars f3ij'S. Define for each j = 1,2, ...
,n,
m
Yj(x) = L oif3ij, X E V, i=1
where x = 01 Xl + 02X2 + ... + OmXm is the unique representation of x in terms of the basis of V. It is clear that each Yi(-) is a linear functional on V. Finally, for any x in V, m
T(x)
m
Tn
= T(LoiXi) = LOiT(Xi) = LOiYi = i=1 n
m
i=1
i=1
m
n
i=l
j=1
LOi Lf3ijWj
n
= L ( L o if3ij)Wj = LYj(x)Wj. j=1 i=1
j=1
Thus we are able to write T as a combination of linear functionals! This completes the proof.
114
MATRIX ALGEBRA THEORY AND APPLICATIONS
Linear transformations from a vector space V into itself are of special interest. For any two transformations T and S from V to V, one can define the product TS as the composition of the maps T and S taken in this order. Moreover, one can define the identity transformation I from V to V by I(x) = x for every x in V. It is clear that for any transformation T from V to V, T I = IT = T. Thus the space L(V, V) has an additional structure which arises from the operation of product of transformations defined above. The following proposition provides the necessary details what this additional structure entails. P 3.2.6 The space L(V, V) of all transformations from a vector space V into itself is an algebra with an identity. We have already seen that the space L(V, V) is a vector space. The additional structure on L(V, V) comes from the binary operation of product or composition of transformations. We need to identify the identity of this new binary operation. The obvious candidate is the identity transformation I. In order to show that the space L(V, V) is an algebra, we need to verify the following. (1) For any transformation T,OT = TO = 0 holds, where 0 is the transformation which maps every vector of V into the zero vector. The map 0 is the additive identity of the additive operation of the vector space L(V, V) . (2) For every transformation T, IT = T I = T holds. (3) (Associative law). For any three transformations T, Sand U, (TS)U = T(SU) holds. ( 4) (Distributive laws). For any three transformations T, Sand U, (T+ S)U = TU +SU and T(S + U) = TS +TU hold. These properties are easy to establish. As for the distributive law and associative law, they are valid in a more general framework. For example, suppose T is a transformation from a vector space V I into a vector space V 2, and Sand U are transformations from the vector space V 2 into a vector space V 3, the transformations (S + U)T, ST and UT are all well-defined from the vector space VI to V 3 • Moreover, we have the following distributive law: PROOF.
(S
+ U)T =
ST + UTe
One could write down any number of identities of the above type. As an example of another variety, suppose T is a transformation from a vector
Linear Transformations and Matrices
115
space VItO a vector space V 2, S is a transformation from the vector space V 2 to a vector space V 3 , and U a transformation from the vector space V 3 to a vector space V 4. Then the transformations U (ST) and (U S)T make sense and they are indeed transformations from the vector space VI to V4. Further, they are identical, i.e., U(ST) = (US)T. It is customary to denote this transformation by U ST. Complements 3.2.1 Let Q be the field of all rational numbers. Let V = Q2 and T a transformation from V to V. The only clues we have about T are that T(1,0) = (2, -3), T(O, 1) = (3,1).
Determine T(x, y) for any (x, y) E V. Is T an isomorphism? Justify your answer. 3.2.2 Let V = Q2, W = Q3, and T a transformation from V to W. The only clues we have about T are that T(1,0)
= (2,3, -2),
T(1, 1)
= (4, -7,8) .
Determine T(x, y) for any (x, y) E V. 3.2.3 Let V = R 3 , W = R4, and T a transformation from V to W. The only clues we have about T are that T(1,0,0) = (1,-1,2,1), T(0,1,0) = (0,1,1,0), T(0,0,1) = (2,0,0,0).
Determine the range and kernel of the transformation T along with their dimensions. 3.2.4 Let P be the collection of all polynomials viewed as a vector space over the field of real numbers. (Note that P is an infinitedimensional vector space.) Let T be the differential operator on P defined by n-l
n-l
T(L~iXi) = (Li~iXi-l), i=O
i=l
and S the transformation on P defined by n-l
n-l
S(L ~ixi) = L(~i/(i + 1))xi+l, i=O
~o,6,
...
,~n real, n
i=O
2: 1. Compute TS and ST. Show that ST i= TS.
116
MATRIX ALGEBRA THEORY AND APPLICATIONS
3.3. Inverse Transformations Let T : V --+ W. We raise the question whether given any vector y in W, is it possible to recover the vector x in V through a linear transformation, related to T in some way. The answer to the question depends on the nature of the transformation T. The following are two crucial properties that the answer depends on.
(1) The map is injective or one-t~one: recall that the map T is injective if Xl and X2 are any two distinct vectors in V, then the vectors T(xI) and T(X2) are distinct. (2) The map is surjective or onto: recall that the map T is surjective if for every vector y in W, there exists at least one vector x in V such that T(x) = y, i.e., R(T) = W. P 3.3.1
Let T : V
--+
W.
(1) If the map T is injective, then there exists a linear transformation S : W --+ V such that ST = I, the identity transformation from V to V. (Such a transformation S is called the left inverse of T and is denoted by Ti I.) (2) If the map T is surjective, then there exists a linear transformation S : W --+ V such that TS = I, the identity transformation from W to W. (Such a transformation S is called the right inverse of T and is denoted by Tii. I.) (3) If the map T is bijective, i.e., T is injective and surjective, then there exists a transformation S : W --+ V such that ST = I and T S = I with the identity transformation I operating on the appropriate vector space. The transformation S is unique and is called the inverse of T. It is denoted by T-I. A bijective map is also called invertible. (4) There always exists a transformation S : W --+ V such that TST = T. Such a transformation S is called a generalized inverse (g-inverse) of T and is denoted by T-. PROOF. Before we embark on a proof, let us keep in line the entities we need. Let R(T) be the range of the transformation T. Choose and fix a complement RC of R(T) in W. See P 1.5.5 for details. Let K(T) be the kernel of the transformation T. Choose and fix a complement KC of K(T) in V. We now proceed to prove every part of the above proposition. We basically make use of the above subspaces,
Linear 'Pransformations and Matrices
117
their complements, and the associated projections. Let y be any vector in W. We can write y = Yo + YI uniquely with Yo E R(T) and YI E R C • Since T is injective, there exists a unique xo in V such that T(xo) = Yo. Define Til(y) = Xo. Thus Til is a well-defined map from W to V. It is clear that TilT = I, the identity transformation from V to V. It remains to show that the map Til is a transformation. This essentially follows from the property that any projection is a linear operation. Let a and j3 be two scalars and u and W be any two vectors in W. If we decompose u = Uo + UI and W = Wo + WI with uo, Wo E R(T) and UI, WI E R c, then we can identify the decomposition of au + j3w as
au + j3w = (auo
+ j3wo) + (aul + j3wt)
with auo + j3wo E R(T) and aUI + j3wl E RC. Let Xo and vo be the unique vectors in V such that T(xo) = Uo and T(vo) = WOo It is now clear that
This shows that the map Ti 1 is a transformation. This proves (1). We now work with K(T) and a complement K C of K(T) . Let y be any vector in W. Since T is surjective, there exists a vector x in V such that T(x) = y. But there could be more than one vector x satisfying T(x) = y. But there is one and only one vector Xo in KC such that T(xo) = y. This is not hard to see. Define
Til(y) = Xo · By the very nature of the definition of the map Til, TTii. 1 = I, the identity map from W to W. It remains to show that Tii. l is a transformation. In order to show this, one can craft an argument similar to the one used in Part (1) above, which proves (2). If the map is bijective, both the definitions of the maps Til and Til coincide. Let the common map be denoted by T- I . Of course, we have TT-I = I and T-IT = I. As for uniqueness, suppose S is a map such that TS = I, then
S
= IS = T- I T S = T- 1 I = T- 1 •
118
MATRIX ALGEBRA THEORY AND APPLICATIONS
Another interesting feature is that if Sand U are two maps from W to I V satisfying T S = I and UT = I, then we must have S = U = T- . This proves (3). Let yEW. Write uniquely y = Yo +YI with Yo E R(T) and YI E RC. Determine x E V such that T(x) = Yo. Decompose uniquely x = XO+XI with Xo E K C and Xl E K(T). Define T-(y) = Xo. T- is well-defined and indeed is a transformation from W to V. It is clear that rr-T = T, which proves (4). The transformations Til, Till and T- are not, in general, unique. Different possible choices arise from different choices of the complements R C and KC. We will exploit this fact when we explore the world of ginverses in one of the subsequent chapters. We record a few facts about inverse transformations for future reference. These results can easily be verified.
P 3.3.2 Let T be a bijective transformation (isomorphism) from a vector space V to a vector space W, and S a bijective transformation from the vector space W to a vector space U. Let a be a non-zero scalar. Then the following are valid.
(1) The transformation ST is a bijective transformation from V to U and (ST)-l = T-IS- I . (2) The transformation aT is a bijective transformation from V to Wand (aT)-1 = a-IT-I. (3) The transformation T-I is a bijective transformation from W to V and (T- I = T.
tl
Let us specialize about transformations that operate from a vector space V into itself. If T is such a transformation, T-I exists if and only if p(T) = dim(V). This follows from the identity v(T)+p(T) = dim(V). See P 3.1.7. Complements 3.3.1 Let V be the set of all complex numbers viewed as a vector space over the field of real numbers. Let T be the transformation from V to V defined as T(x + iy) = x - iy, the complex conjugate of the complex number x T-I.
+ iy
E V. Determine
Linear Transformations and Matrices
3.3.2
119
Let T be the transfonnation on the vector space C 2 defined by T{XI,X2} = {axi +,8x2,,),XI +8x2},
for {Xl, X2} E C 2 and a,,8, ,)" and 8 E C. Show that T is an isomorphism if a8 - ,8')' =1= o. In such an event, determine T- I . If a8 - ,8')' = 0, determine a g-inverse T- of T. 3.3.3 Let P be the vector space of all polynomials with real coefficients, which is viewed as an infinite-dimensional vector space over the field of real numbers. Let T be the differential operator on P defined by n-l
n-l
T{L aixi} = L i=O
iaixi-l,
i=l
real and n ~ 1. Show that T is surjective. Let S be the transformation on P defined by aO, aI, a2, ... ,an-l
n-l
n-l
S{L aixi} = L i=O
i=O
~xi+l. 'I,
+1
Show that S is a right inverse of T. Show that there is no left inverse for T. 3.3.4 If T 1 , T2, ... ,Tr are invertible transfonnations from a vector space V into V itself, show that T 1 T 2 ... Tr is also invertible. 3.3.5 Let T be a transfonnation from V to W. Show that for the existence of a left inverse of T, it is necessary that T is injective. If T is injective, show that any left inverse of T is surjective. 3.3.6 Let T be a transfonnation from V to W. Show that for the existence of a right inverse of T, it is necessary that T is surjective. If T is surjective, show that any right inverse of T is injective. 3.3.7 If T and S are two transformations from a finite-dimensional vector space V into itself such that TS = I, show that T is invertible and S = T-I. {Why do we need the finite-dimensionality condition on the vector space?} 3.3.8 Let T be a transformation from a finite-dimensional vector space V into itself enjoying the property that T{~J), T{X2}, . . . ,T{xr } are linearly independent in W whenever XI, X2, ... ,Xr are linearly independent in V for any r ~ 1. Show that T is invertible. Is the finitedimensionality condition needed?
120
MATRIX ALGEBRA THEORY AND APPLICATIONS
3.4. Matrices Let V and W be two finite-dimensional vector spaces over the same field F. Let T be a transformation from V to W. Let Xl. X2,··· ,Xm be a basis of the vector space V and Y1, Y2, . . . ,Yn that of W. For each i = 1,2, ... ,m, note that T(Xi) E W, and consequently, we can write (3.4.1) for some scalars Oij'S in F. These scalars and the transformation T can be regarded as two sides of the same coin. Knowing the transformation T is equivalent to knowing the bunch of scalars Oij'S. We explain why. The transformation T provides a rule or a formula which associates every vector in V with a vector in W . This rule can be captured by the set of scalars Oij'S. First of all, let us organize the scalars in the form of an n X m grid AT, which we call a matrix.
Let x be any vector in V. We can write
for some scalars 131, 132, . .. ,13m in F. In view of the uniqueness of the representation, the vector x can be identified with the m-tuple (131,{32 , ••• ,13m) in Fm. The transformed value T(x) of x under T can be written as
T(x) = I1Y1
+ 12Y2 + ... + InYn
(3.4.2)
for .som~ scala~s ,1,,2, ... "n in F. The transformed vector T(x) can be IdentIfied wIth the n-tuple (,1,,2, ... "n) in Fn. Thus the transformation T can be identified with a transformation from Fm to Fn. Once we know the vector x, or equivalently, its m-tuple (131,132, ... ,13m), the coordinates (,1,,2, ... "n) of the transformed vector T( x) can be obtained as m
Ii =
L i=l
O ij13j,
i = 1,2, ... ,no
(3.4.3)
Linear Transformations and Matrices
121
For, we note that
T(x) = T(f31Xl + f32X2 + ... + f3m Xm) = f31 T (Xt) + f32T(X2) + ... + f3mT(xm) n
n
n
= f31(L OilYi) + f32(L Oi2Yi) + ... + f3m(L OimYi) i=1
i=1
m
i=1
m
= (L Oljf3j)Yl
j=1
m
+ (L 0 2jf3j)Y2 + .. . + (L Onjf3j)Yn. j=1
j=1
(3.4.4)
using (3.4.1) in step 3 above. Then identifying (3.4.4) with (3.4.2), we obtain (3.4.3), which transforms f3i's to I/S. Equations (3.4.3) can be written symbolically as:
[0
11
012
aIm]
021
022
02m
°nl
°n2
°nm
[::] [:] -
or, in short ATb = c,
(3.4.5)
where b is the column vector consisting of entries f3i 's and c is the column vector consisting of entries I i'S. The symbolic representation etched above can be made algebraically meaningful. Operationally, the symbolic representation given above can be implemented as follows . Start with any vector x in V. Determine its coordinates b with respect to the given basis of V. Combine the entries of the matrix AT and the vector b as per the arithmetic set out in the equation (3.4.3) in order to obtain the coordinates c of the transformed vector T(x). Once we know the coordinates of the vector T(x), we can write down the vector T(x) using the given basis of the vector space W. There are two ways of giving an algebraic meaning to the symbolic representation (3.4.5) of the transformation T . In the representation, we seem to multiply a matrix of order n x m (i.e., a matrix consisting nm entries from the underlying field arranged in n rows and m columns) and a matrix of order m x 1 resulting in a matrix of order n x 1. One could spell out the rules of multiplication in such a scenario by invoking the
122
MATRIX ALGEBRA THEORY AND APPLICATIONS
equations (3.4.3). Another way is to define formally the multiplication of two matrices and then exclaim that in the symbolic representation (3.4.5) we are actually carrying out the multiplication. We will now spend some time on matrices and some basic operations on matrices. 3.4.1. Let F be a field. A matrix A of order m x n is an array of mn scalars from F arranged in m rows and n columns. The array is presented in the following form: DEFINITION
with the entries aij 's coming from the field F. Frequently, we abbreviate the matrix A in the form A = (aij), where aij is the generic entry located in the matrix at the junction of the i-th row and j-th column. The scalar aij is also called the (i,j)-th entry of the matrix A. The origin of the word "matrix" makes an interesting reading. In Latin, the word "matrix" means - womb, pregnant animal. The use of the word "matrix" in Linear Algebra, perhaps, refers to the way a matrix is depicted - it is a womb containing objects in an orderly fashion. The Indo-European root of the word "matrix" is ma which means mother. It is time to introduce some operations on matrices. In what follows, we assume that all matrices have scalars from one fixed field. DEFINITION
3.4.2.
(1) Addition. Let A = (aij) and B = (f3ij) be two matrices of the same order m x n. The matrix C = (/ij) of order m x n is defined by the rule that the (i,j)-th entry lij of C is given by lij = aij + f3ij for all 1 ~ i ~ m and 1 ~ j ~ n. The matrix C is called the sum of A and B, and is denoted by A + B. (2) Scalar Multiplication. Let A = (aij) be a matrix of order m x n and a a scalar. The matrix D = (8ij ) of order m x n is defined by the rule that the (i,j)-th entry 8ij of D is given by 8ij = aaij for all i and j. The matrix D is called a scalar multiple of A and is denoted by aA. If a = -1, the matrix aA is denoted by
-A.
(3) Multiplication. Let A = (aij) and B = (f3ij) be two matrices of order m x nand p x q, respectively. Say that A and Bare
123
Linear TIunsformations and Matrices
conformable for multiplication in the order they are written if the number of columns of A is the same as the number of rows of B, i.e., n = p. Suppose A and B are conformable for multiplication. The matrix E = (eij) of order m x q is defined by the rule that n
the (i,j)-th entry
eij
of E is given by
eij
=
:L (Xikf3kj
for all
k=l
1 :::; i :::; m and 1 :::; j :::; q. The matrix E is called the product of A and B, and is denoted by AB. Of all the operations defined on matrices defined above, the multiplication seems to be the most baffling. The operation of multiplication occurs in a natural way when we consider a pair of transformations and their composition. If two matrices A and B are conformable for multiplication, it is not true that B and A should be conformable for multiplication. Even if B and A are conformable for multiplication with the orders of AB and B A being identical, it is not true that we must have AB = BA! We introduce two special matrices. The matrix of order m x n in which every entry is zero is called the zero matrix and is denoted by Omxn. If the order of the matrix is clear from the context, we will denote the zero matrix simply by o. The matrix of order n x n in which every diagonal entry is equal to 1 and every off-diagonal entry is 0 is called the identity matrix of order n x n and is denoted by In. We now record some of the properties of the operations we have introduced above.
P 3.4.3
(1)
Let A, Band C be matrices of the same order. Then
(A+B)+C=A+(B+C), A+B = B+A, A+O=A, A+(-A)=O. (2) Let A, Band C be three matrices of orders m x n, n x p, and p X q, respectively. Then A(BC) = (AB)C of order m x q, ImA= Aln = Orxm A = AOnxs =
A, A, Orxn, Omxs.
MATRIX ALGEBRA THEORY AND APPLICATIONS
124
(3) Let A and B be two matrices of the same order m x n, C a matrix of order n x p and D a matrix of order q X m. Then
(A+B)C= AC+BC D(A+B) = DA+ DB. The above properties are not hard to establish. Some useful pointers emerge from the above deliberations. If Mm,n denotes the collection of all matrices of order m X n, then it is a vector space over the field F. If m = n, then Mm,m is an algebra. The operations are addition, scalar multiplication and multiplication of matrices. This algebra has a certain peculiarity. For two matrices A and B in Mm,m, it is possible that AB = 0 without each of A and B being the zero matrix. Construct an example yourself. We introduce two other operations on an m X n matrix A. One is called transpose, which is obtained by writing the columns of A as rows (i-th column as the i-th row, i = 1, ... ,n) and denoted by A' . It is seen that the order of A' is n X m. The following is an example of A and A'. 2
A=C
3
The following results concerning the transpose operation are easily established. P 3.4.4
(1) (AB)' = B'A', (ABC)' = C'B'A'. (2) (A + B)' = A' + B'. (3) (A2)' = (A')2. (4) If A = (aij) is a square symmetric matrix of order n
n
1
1
LL
aijXiXj
n,
then
= x' Ax
where x' = (Xl, ... ,x n ). Note that X is a column vector. (5) (A-I), = (A')-l if A is invertible, i.e., AA-l = A-I A = I holds.
Linear Transformations and Matrices
125
Another is called the conjugate tmnspose, applicable when the elements of A are from the field of complex numbers, which is obtained by first writing the columns of A as rows as in the transpose and replacing each element by its complex conjugate, and denoted by A*. Thus if A = Al +iA2, where Al and A2 are real, then A* = A~ -iA~. The following results concerning the conjugate transpose are easily established. P 3.4.5 (1) (AB)* = B* A*, (ABC)* = C* B* A*. (2) (A+B)* = A* +B*. (3) (aA)* = oA*, (0 is the complex conjugate of a). (4) (A-I)* = (A*)-I if A is invertible. We showed that associated with a transformation T: V ~ W, there exists an n x m matrix AT which provides a transformation from Fffl to Fn through the operation of matrix multiplication ATx, x E Fffl. As observed earlier, the transformations T and AT are isomorphic and AT does the same job as T. We quote some results which are easy to establish. P 3.4.6 Let T : V ~ Wand S : V -- W be two transformations with the associated matrices AT and As. Then
Aa:T+,BS = aAT + f3As. Let T: V ~ Wand S : W ~ U. Then AST = AsAT, which justifies the matrix multiplication as introduced in Definition 3.4.2 (3).
Complements 3.4.1 Let A = (aij) be a matrix of order n x n with entries from a field. Define n Trace A = Tr A = 2:aii i=I
i.e., the sum of diagonal elements. The following results are easily established: (1) Tr(A+B)=TrA+TrB (2) TraA = a Tr A, for any scalar a. (3) Tr(AB) = Tr(BA) for A of order m x nand B of order n x m. (4) Let x be an n-vector and A be n x n matrix. Then x' Ax = Tr(Axx ' ).
126
MATRIX ALGEBRA THEORY AND APPLICATIONS
Let us recall that in Section 3.1, we have introduced the range R(T) and the nullity K(T) of a transformation T and defined dimR(T) = p(T) as the rank of T and dim K(T) = v(T) as the nullity of T, satisfying the condition dim V = v(T) + p(T). The matrix AT associated with T, for chosen complements of R(T) and K(T), is a transformation on its own right from Fm to F n as represented in (3.4.5) with rule of multiplication given in (3.4.3). So we have V(AT) and p(AT) associated with AT. It is easy to establish the following proposition. P 3.4.7 Let T: V -+ Wand AT: Fm with T be as described above. Then
-+
Fn, the matrix associated
(1) p(T) = p(AT) (2) v(T) = V(AT) = dimSp(AT) = dimSp(A~) + V(AT) = m + v(A~) = n where Sp(AT) is the vector space generated by the column vectors of AT and Sp(A~) in the vector space generated the row vectors of AT, i.e., the column vectors of A~. In the rest of the chapters, we develop the algebra of matrices as a set of elements with the operations of addition, scalar multiplication, matrix multiplication, transpose and conjugate transpose as defined in this section. The results of Chapter two on spectral theory in the context bilinear forms are proved with special reference to matrices for applications to problems in statistics and econometrics.
(3) p(AT) (4) p(AT) (5) p(A~)
Complements 3.4.2 Let A E M n , i.e., a square matrix of order nand p(a) = ao + ala + ... + akak be a polynomial in a of degree k. Define P(A) = a01 + alA + ... + akAk E Mn. Show that if p(a) + q(a) = h(a) and p(a)q(a) = t(a) for scalar polynomials, then p(A) + q(A) = h(A) and p(A)q(A) = t(A). 3.4.3 Let p(,X) = ('x - 1)('x + 2) and consider the matrix equation P(A) = o. Show that A = 1 and -21 are roots of P(A) = o. Construct an example to show that there can be roots other than 1 and -21.
CHAPTER 4 CHARACTERISTICS OF MATRICES In Chapter 3, linear transformations and their associated matrices have been at the center of attraction. The associated matrix itself can be viewed as a linear transformation in its own right. To bring matters into proper perspective, let us recapitulate certain features of Chapter 3. Let V and W be two vector spaces of dimensions nand m, respectively, over the same field F. Let T be a linear transformation from V to W. Let AT be the matrix of order m x n associated with the transformation T. See Section 3.4. The entries of the matrix are all scalars belonging to the underlying field F. Once we arrive at the matrix AT, the flesh and blood of the transformation T, we can ignore the underlying transformation T for what goes on, and concentrate solely on the matrix AT. The matrix AT can now be viewed as a linear transformation from the vector space F n to Fffl by the following operational device: b -+ ATb, b E F n ,
where, as usual, we identify members of the vector spaces Fffl and Fn by column vectors. The subscript T attached to the associated matrix now becomes superficial and we drop it. In this chapter, we are solely concerned with matrices of order m x n with entries belonging to a field F of scalars. Let us recall that a matrix A of order m x n is an arrangement of mn elements in m rows and n columns with the (i,j)-th element indicated by aij' Addition and multiplication operations are as in Definition 3.4.2. Such matrices can be regarded as linear transformations from the vector space F n into the vector space Fffl. A variety of decomposition theorems for matrices will be presented along with their usefulness. In the initial part of this chapter, we will rehash certain notions introduced in the environment of linear transformations for matrices. 127
128
MATRIX ALGEBRA THEORY AND APPLICATIONS
4.1. Rank and Nullity of a Matrix Let A be a matrix of order m x n. Unless otherwise specified, the entries of the matrices are always from a fixed field F of scalars. The range space R(A) of A is defined by,
R(A) = {Ax: x E Fn}.
(4.1.1)
The set (4.1.1) or the subspace (4.1.1) has the equivalent expression
R(A)
= {alaI + ... +ana n : al, .. . ,an E F}
where aI, ... ,an are the n column vectors of A. It is also called the span of the column vectors of A, and alternatively written as Sp(A). It is time to move on to other entities. The rank p(A) of the matrix A is defined by,
p(A)
= dim[R(A)] = dim[R(A')].
( 4.1.2)
The number p(A) can be interpreted as the maximal number of linearly independent column vectors of A, or equivalently, as the maximal number of linearly independent row vectors of A. Another object of interest is the kernel K(A) of the matrix, which is defined by, K(A) = {x E F n : Ax = O}, (4.1.3) which can be verified to be a subspace of the vector space Fn, and the nullity v(A) of the matrix A by,
v(A) = dim[K(A)].
(4.1.4)
The kernel is also called by a different name: null space of A. From P 3.1.1, dim[R(A)] + dim[K(A)] = n = dim[Fn], (4.1.5) or equivalently,
p(A)
+ v(A) = n.
(4.1.6)
If we look at the identity (4.1.5), there is something odd about it. The range space R(A) is a subspace of the vector space F1n whereas the null space K( A) is a subspace of the vector space Fn, and the dimensions of
Chamcteristic8 of Matrices
129
these two subspaces add up to n! Let us rewrite the identity (4.1.5) in a different form: dim[R(A')]
+ dim[K(A)]
= n.
(4.1.7)
Now, both R(A') and K(A) are subspaces of the vector space Fn. The identity (4.1.7) reads better. Then one is immediately led to the question whether R(A') n K(A) = {O}? (4.1.8) The answer to this question depends on the make-up of the field F. If F is the field of real numbers, (4.1.8) certainly holds. There are fields for which (4.1.8) is not valid. For an example, let F = {O, I} and
A=[~ ~]. Note that dim[K(A)] = 1 and dim[R(A')] = 1. Further, K(A) n R(A') is a one-dimensional subspace of F2. Even the field F = C of complex numbers is an oddity. We can have (4.1.8) not satisfied for a matrix A. The following is a simple example. Fix two real numbers a and b, both not zero. Let
A
=
[aa+'tb + ~b
bb-
iaia] .
It is clear that the subspace R(A') of F2 is one-dimensional. Let x' = (a+ib, b-ia). One can check that Ax = O. By (4.1.7), dim[K(A)] = 1. But K(A) n R(A') is a one-dimensional subspace providing a negative response to (4.1.8). Does it puzzle you why the answer to the question in (4.1.8) is in the affirmative when F = R but not when F = C or {O, I}? It is time for some introspection. Let us search for a necessary and sufficient condition on a matrix A for which K(A) n R(A') = {O}. Let A be a matrix of order n x k and x E R(A'). Let aI, a2, ... , an be the columns of A'. We can write (4.1.9)
130
MATRIX ALGEBRA THEORY AND APPLICATIONS
for some scalars at, a2, ... , an in F. Suppose x E K(A). This is equivalent to:
0= Ax = (at,a2, ... ,an)'x, a~x
= 0
for
i = 1,2, ... , n.
(4.1.10)
Combining (4.1.9) and (4.1.10), we obtain
Let us rewrite the above equations in a succinct matrix form: AA' a = 0, where a' = (at, a2, ... , an). Note that (4.1.9) can be rewritten as A'a = x. The deliberations carried out so far make us conclude that K(A) n R(A') = {O} if and only if AA'a = 0 for any vector a implies that A' a = o. We can enshrine this result in the form of a proposition.
P 4.1.1 Let A be any matrix of order n x k. Then K(A) n R(A') = {O} if and only if
AA' a
=0
for any
a ~ A' a
= o.
(4.1.11)
The condition (4.1.11) is the one that sorts out the fields. If F is the field of real numbers, then (4.1.11) is always true for any matrix A. To see this, suppose AA'a = 0 for some a. Then a' AA'a = O. Let Y = A'a. We can rewrite aAA'a = 0 as y'y = O. But y'y is a sum of squares of real numbers, which implies that y = A' a = O. If F = {O, 1}, there are matrices for which (4.1.11) is not true. Take
A
= [~
~]
and
a'
= (1,0).
H F is the field of complex numbers, there are matrices A for which (4.1.11) is not true. It all boils down to the following query. Suppose Yt, Y2, ... , Yn are complex numbers such that yi + y~ + ... + y~ = O. Does this mean that each Yi = O?
Characteristics of Matrices
131
Complements 4.1.1 Let A and B be matrices of orders m x nand s x m, respectively. Show that R{A) = K{B) if and only if
R{B') = K{A'). 4.1.2 Show that a subset S of the vector space Fn is a subspace of Fn if and only if S is the null space of a matrix. 4.2. Rank and Product of Matrices Let us extend the range of ruscussion carried above to products of matrices. Suppose A and B are two matrices such that the product AB is meaningful. Observe that every column of AB is a linear combination of the columns of A. In a similar vein, every row of AB is a linear combination of the rows of B . This simple observation leads to the fruitful inclusion relations:
R{AB) C R{A), R{B' A') C R{B').
<=> p{AB)
~
p{A), p{AB)
~
p{B).
(4.2.1) (4.2.2)
We can combine both the inequalities into a proposition. P 4.2.1
p{AB} ~ min{p(A}, p(B)}.
( 4.2.3)
We are led to another inquiry: when does equality hold in (4.2.3)? We have a precise answer for this question. For this, we need to take a detour on inverses of one kind or the other. In Section 3.3, we spoke about inverse transformations of various types. We need to refurbish these notions in the environment of matrices. There are two ways to do this. One way is to view each matrix as a linear transformation from one vector space to another, work out a relevant inverse transformation and then obtain its associated matrix. This process is a little tortuous. Another way is to define inverses of various kinds for a given matrix directly. We pursue the second approach.
132
MATRIX ALGEBRA THEORY AND APPLICATIONS
DEFINITION 4.2.2. Let
A be a matrix of order m x n .
(1) A left inverse of A is a matrix B of order nxm such that BA = I, where I is the identity matrix of order n x n, i.e., I is a diagonal matrix in which every diagonal entry is equal to the unit element 1 of the field F. (If B exists, it is usually denoted by A'Ll.) (2) A right inverse of A is a matrix C of order n x m such that AC = I, where I is the identity matrix of order m x m. (If C exists, it is usually denoted by A RI.) (3) A regular inverse of A of order n x n is a matrix C such that AC = I. IT C exists, it is denoted by A-I. In such a case AA-I = 1= A-IA. We know precisely the conditions under which a transformation admits a left inverse, a right inverse, or an inverse. These conditions, in the realm of matrices, have a nice interpretation. The following results are easily established. p 4.2.3
Let A be a matrix of order m x n.
(1) A admits a left inverse if and only if p(A) = n. (2) A admits a right inverse if and only if p(A) = m. (3) A admits a regular inverse if and only if m = nand p(A)
= n.
P 4.2.4 Let A and B be two matrices of orders m x nand n x s, respectively. Then:
(1) p(AB)
= p(B)
if A'Ll exists, i.e., (2) p(AB) = p(A) if Bi/ exists, i.e.,
p(A) = n. p(B) = n.
PROOF. Suppose that the left inverse of A exists. Note that B = (A'Ll A)B. Also, by (4.2.3),
p(AB) ~ p(B) = p(A'LI AB) ~ p(AB), from which the professed equality follows. One can establish (2) in a similar vein. The above proposition has very useful applications in a variety of contexts. A standard scenario can be described as follows. Suppose B is any arbitrary matrix. If we pre-multiply or post-multiply B by any non-singular matrix A, the rank remains unaltered, i.e., p(AB) ~ p(B) or p(BA) = p(B), as the case may be .
Charncteristics of Matrices
133
Once equality in (4.2.3) holds, it sets a chain reaction, as the following proposition exemplifies. P 4.2.5 Let A and B be two matrices of order m x nand n x s, respectively. (I) If p(AB} = peA}, then p(CAB} = p(CA} for any matrix C for which the multiplication makes sense. (2) If p(AB} = pCB}, then p(ABD} = p(BD} for any matrix D for which the multiplication makes sense. PROOF . (I) Each column of the matrix AB is a linear combination of the columns of A. The condition p(AB} = peA} implies that the subspaces spanned by the columns of AB and of A individually are identical. What this means is that every column of A is a linear combination of the columns of AB. (Why?) Consequently, we can write A = ABT for some suitable matrix T. Now, for any arbitrary matrix C for which the multiplication involved in the following makes sense,
p(CA} = p(CABT)
~
p(CAB)
~
p(CA),
from which the desired equality follows. The result (2) follows in a similar vein. The condition given in P 4.2.5 for the equality of ranks in (4.2.3) is sufficient but not necessary. Let F be the field of real numbers. Further let
A=B= Then p(AB) = pCB) but peA} =1= 3. We are still in search of a necessary and sufficient condition for the equality in (4.2.3). Let us pinpoint the exact relationship between p(AB) and pCB). P 4.2.6 Let A and B be two matrices of order m x nand n x s , respectively. Then
+ dim[K(A} n R(B)], p(AB} + dim[R(A'} n K(B'}].
(1)
pCB} = p(AB)
(2)
peA) =
suffices to prove the first part. The second part is a consequence of the first part. (Why?) To prove (1) , we observe that PROOF. It
{Bx E F n
:
ABx = O} = K(A} n R(B).
134
MATRIX ALGEBRA THEORY AND APPLICATIONS
We will now find the dimension of the subspace {Bx E Fn : ABx = O} of Fn. Note that {Bx E F n
:
ABx = O} = {Bx E F n
:
x E K{AB)}.
Let V = K{ AB) and W = Fn. The matrix B can be viewed as a transformation from the vector space V to the vector space W. The set {Bx E Fn : x E K{AB)} then becomes the range of this transfonnation. By P 3.1.1 (3), dim{{Bx E F n
:
x E K{AB)}) +dim{{x E K{AB): Bx
= O})
= dim{K{AB)), which gives
dim{{Bx E F n : x E K{AB)})
= =
dim{K{AB)) - dim{{x E K(AB) : Bx
= O})
s - dim{R{AB)) - [8 - dim{R{B))] = p{B) - p{AB).
This completes the proof. One can officially close the search for a necessary and sufficient condition if one is satisfied with the following result. P 4.2.7 Let A and B be two matrices of orders m x nand n x respectively. Then
= p(B) p(AB) = p(A)
(1)
p{AB)
(2)
8,
= {O}; R(A') n K(B') = {O}.
if and only if K(A) n R(B) if and only if
PROOF. (1) This is an immediate consequence of P 4.2.5. The second statement (2) is a consequence of (1). We now spend some time on a certain celebrated inequality on ranks, namely, Frobenius inequality.
P 4.2.8 (Frobenius Inequality) Let A, Band C be three matrices such that the products AC and C B are defined. Then p(ACB) PROOF.
+ p(C)
~ p(AC)
+ p(CB).
( 4.2.4)
Consider the product of partitioned matrices,
0 AC] [ I 01 = C -B I
I -A] [ [o I CB
[-ACB 0
0] C·
Characteristics of Matrices
135
Since the matrices
are non-singular, it follows that
[°
p CB
°
AC]_ [-ACB C - P
But p
CO]
p(ACB)
+ p(C).
[C; Ag] ~ p [dB J + P[Acf] -
p(CB)
+ p(AC),
(why?)
from which the desired inequality follows. The following inequality is a special case of Frobenius inequality. P 4.2.9 (Sylvester's Inequality) Let A and B be two matrices of orders m x nand n x s, respectively. Then p(AB) ;::: p(A)
+ p(B) -
n,
with equality if and only if K(A) C R(B). In the Frobenius inequality, take C = In. The inequality follows. From P 4.2.6 (1), p(B) = p(AB)+dim[K(A)nR(B)]. If equality holds in the Sylvester's inequality, then we have dim[K(A) n R(B)] = n - p(A) = n - dim[R(A)] = dim[K(A)]. See (4.1.5). Consequently, equality holds in Sylvester's inequality if and only if K(A) c R(B). PROOF.
Complements 4.2.1 Reprove P 4.2.3 using results of Sections 3.3 and 3.4. 4.2.2 Let A and B be square matrices of the same order. Is p(AB) = p(BA)? 4.2.3 Let A and B be two matrices of orders m x nand m x s, respectively. Let (AlB) be the augmented matrix of order m x (n + s). Show that p(AIB) = p(A)
if and only if B = AC for some matrix C.
136
MATRIX ALGEBRA THEORY AND APPLICATIONS
4.3. Rank Factorization and Further Results In a variety of applications in engineering and statistics, we need to express a matrix as a product of simple and elegant matrices. In thls section , we consider one of the simplest such factorizations of matrices. . P 4.3.1 (Rank Factorization Theorem) Let A be a matrIX of order m x n with rank a. Then A can be factorized as A=RF,
where R is of order m x a, F is of order a x n, Sp(A) Sp( F'), and p( R) = p( F) = a . Alternatively,
(4.3.1)
= Sp(R),
A=SDG,
Sp(A') =
( 4.3.2)
where Sand G are non-singular matrices of order m x m and n x n, respectively, and D is a block matrix of the following structure: _ D -
[
a Ia xa
a x (n0 - a)
0
0 (m - a) x (n - a)
(m - a) x a
1 .
PROOF. The main goal in the above factorization is to write the matrix A of rank a as a product of two matrices one of them has a full column rank and the other full row rank. This is like removing the chaff from the grain so as to get to the kernel. A proof can be built by imitating the winnowing process. Take any basis of the vector space Sp(A). The column vectors of the basis are taken as the column vectors of the matrix R of order m x a. Every column vector of A is now a linear combination of the columns of R . Consequently, we can write A = RF for some matrix F. Once R is chosen, there is only one matrix F whlch fulfills the above rank factorization. (Why?) It is obvious that p(R) = a. It is also true that p(F) = a. Thls can be seen as follows. Note that
a = p(A) = p(RF) ~ p(F) ~ min{a,n} ~ a. The fact that Sp( A') = Sp( F') follows from the facts that every row vector of A is a linear combination of the rows of F and p( F) = a.
Chamcteristic8 of Matrices
137
To obtain the form (4.3.2), one simply needs to augment matrices in (4.3.1) to get non-singular matrices. For example, let S = (R : Ro), where the columns of the matrix Ro is any basis of a complementary subspace of Sp(R) in Fm. The matrix G is obtained by adjoining F with (n - a) rows so that G becomes non-singular. Take any basis of the complementary subspace of the vector space spanned by the rows of F. The basis vectors are adjoined to F to form the matrix G. A direct multiplication of (4.3.2) gives (4.3.1), which justifies the validity of (4.3.2). That is all there is in rank factorization. Such a simple factorization is useful to answer some pertinent questions on ranks of matrices. One of the perennial questions is to know under what condition does the equality p(A + B) = p(A) + p(B) holds for two given matrices A and B. The following result goes some way to answer this question. P 4.3.2
Let A and B be two matrices of the same order with ranks
a and b, respectively. Then the following statements are equivalent.
(1)
p(A + B) = p(A)
(2)
p(AIB)
(3)
+ p(B).
(4.3.3)
= p [~ ] = p(A) + p(B). Sp(A) n Sp(B) = {O} and Sp(A') n Sp(B') = {O}.
(4.3.4) (4.3.5)
(4)
The matrices A and B can be factorized in the following style:
A= S
Ia
0 0] Gj
(4.3.6)
0 0 0 [000
where the zeros are matrices of appropriate order, and
lSI # 0, IGI # O.
We start with rank factorization of A and B. Write A = RtFt and B = R2F2 with Sp(A) = Sp(Rt), Sp(A') = Sp(FD, Sp(B) = Sp(R2) and Sp(B') = Sp(FD. It is obvious that PROOF.
Sp(AIB) = Sp(RtIR2)' Sp(A'IB') = Sp(F{lF~), from which it follows that
p(AIB) = p(R t IR2),
p(~) = p(~:).
138
MATRIX ALGEBRA THEORY AND APPLICATIONS
We are now ready to prove the equivalence of the statements. Let us assume (1) is true. Then
p(A)
+ p(B) = p(A + B) = P(RI FI + R2F2) = p((RIIR2) (~:)) ::; P(RIIR2) = p(AIB) ::; p(Rd + p(R2) = p(A)
+ p(B),
from which p(AIB) = p(A) + p(B) follows. The equality p(~) = p(A) + p(B) follows in a similar vein. Thus (1) => (2). Since p(AIB) = dim[Sp(AIB)] = the maximal number of linearly independent columns of the augmented matrix (AlB) = p(A) + p(B) = dim[Sp(A)]+dim[Sp(B)] = the maximal number oflinearly independent columns of A+ maximal number of linearly independent columns of B , it follows that Sp(A) n Sp(B) = {O}. It also follows in a similar vein that Sp(A') n Sp(B') = {O} . Thus (3) follows from (2). Let us augment the matrix (R 1 IR2) of order m x (a + b) to a nonsingular matrix of order m x m. Since Sp(A) = Sp(R 1 ) , Sp(B) = Sp(R2)' Sp(RdnSp(R2) = {O} (by hypothesis), P(Rl) = a, and p(R2) = b, all the columns of the matrix (R 1 IR2) are linearly independent. Consequently, we can find a matrix Ro of order m x (m - (a + b)) such that the augmented matrix S = (RIIR2IRo) is nonsingular. Following the same line of reasoning, we can find a matrix Fo of order (n - (a + b)) x n such that
is non-singular. The professed equalities of expressions in (4) follow now routinely. Finally, (4) => (1) obviously. The equality p(A + B) = p(A) + p(B) is very hard to fulfill. What the above result indicates is that if no column vector of A is linearly dependent on the columns of B , no column vector of B is linearly dependent on the columns of A, no row vector of A is linearly dependent on the rows of B, and no row vector of B is linearly dependent on the rows of A, only then the purported equality in (1) will hold. P 4.3.2 can be generalized. Looking at the proof a little critically, it seems that what is good for two matrices should be good for any finite number of matrices.
Chamcteristics of Matrices
139
P 4.3.3 Let AI, A2, ... , Ak be any finite number of matrices each of order mxn with ranks aI, a2, . .. , ak, respectively. Then the following statements are equivalent.
(1) P(AI + A2 + ... + A k) = p(At} + p(A2) + ... + p(Ak). (2) P(AIIA21 .. ·IAk) = p(A~IA21 .. ·IA~) = P(Al) + ... + p(Ak). (3) Sp(Ai)nSp(Aj) = {O} and Sp(ADnSp(Aj) = {O} for all i =/=j. (4) There exist nonsingular matrices Sand G of orders m x m and n x n, respectively, such that Ai
= SDiG, i =
1,2, ... ,k,
where Di is a block matrix of order m x n with the following structure. Row Block No.
Column Block No.
1 2
z
2
o o
0 0
o o
i
o
0
k+1
o
0
1
... (k
+ 1) o
o o
o
o
The zeros appearing in the above are zero matrices of appropriate order. We now consider the problem of computing the rank of a matrix utilizing the notion of a Schur complement. Let A be a matrix of order (n + s) x (n + t) partitioned in the following style.
A --
[nFin nf
.
sxt
Assume that E is nonsingular. Define AjE=H-GE-lF.
(4.3.7)
140
MATRIX ALGEBRA THEORY AND APPLICATIONS
The matrix AlE of order s x t is called the Schur complement of A with respect to the non-singular submatrix E of A. The Schur complement occurs in a wide variety of contexts in mathematics and statistics. Suppose that A is a square matrix, Le., s = t. The determinant IAI of A can be expressed in terms of the determinant of any Schur complement of A. More precisely, IAI = (lEI) (IAI EI) · We are jumping ahead a little talking about determinants. A detailed discussion of determinants begins in Section 4.4 (see Complement 4.4.7 in this connection). p 4.3.4 Let A be a matrix of order (n + s) x (n + t) partitioned in the style of (4.3.7) with E non-singular. Then
p(A) = p(E) PROOF.
+ p(AI E).
Consider the product .
In F] [-GE- IsOJ [EG HF] [In0 -E-l It l
=
[E0 AlE 0] '
( 4.3.8)
where the zeros are zero matrices of appropriate order. The matrix that appears on the left of A is of order (n + s) x (n + s) and non-singular. The matrix that appears next to A is of order (n + t) x (n + t) and non-si ngular. Consequently,
p(A)
= p [~
A~ E]
= p(E) + p(AI E).
Why?
The above result has a practical significance. The computation of the rank of a matrix is painfully and computationally laborious. The above result breaks up the computation into manageable pieces of computation. As an example, consider the following matrix.
A=
1 1 1 1 1 1
1 1 1
0 0 0
0 0 0
1 1 1
1
0
0 0
1
1
0 0
0 0
0 1 0 0 1 0 0 1
Characteristics of Matrices
141
Partition the matrix A in the form
3EJ3 A=
C
[ C
3x3
The matrix H is the identity matrix of order 3 whose inverse is itself. Therefore,
p(A) = p(H) + p(A/ H), where
A/H=E-FH-1C=E-C=
[~ ~ =~] o
1
-1
It is clear that p(A/ H) = 1. Consequently, p(A) = 3 + 1 = 4. In a variety of engineering applications, one comes across large dimensional matrices as big as the order of 5,000 x 5,000. An algorithm can be developed to determine the rank of such matrices based on the result of P 4.3.4.
Complements 4.3.1
For any two matrices A and B of the same order, show that
p(A) + p(B) - c - d :S p(A + B) :S p(A)
+ p(B) -
max {c, d},
where c = dim[Sp(A) n Sp(B)] and d = dim[Sp(A') n Sp(B')]. Show that these inequalities are the best possible. See Marsaglia (1967) and Marsaglia and Styan (1974). 4.3.2 Let A be a matrix of order m x n. Show that p(A) = 1 if and only if A = xy' for some non-zero vectors x and y of orders m x 1 and n x 1, respectively. 4.3.3 Obtain the rank factorization of the matrix
A
=
-1 2 4] [o 2
-1
2
3
10
.
4.3.4 Let A be a matrix of order m x m and of rank m - 1. Show that A can be made non-singular by changing just one element of A. 4.3.5 Let A be a non-singular matrix of order m x m. Show that A can be made singular by changing just one element of A. What can you say about the rank of the resultant matrix?
142
MATRIX ALGEBRA THEORY AND APPLICATIONS
4.4. Determinants We came across determinants in earlier chapters. We now introduce the notion of a determinant from an abstract angle. Let Mn denote the collection of all matrices of order n x n with entries coming from a fixed field F of scalars. The integer n ~ 1 is fixed. A matrix A E Mn can be identified in three equivalent ways.
(1) A =
(Oij)
The (i,j)-th entry of A, i.e., the entry located at the junction of the i-th row and j-th column of A is Oij' (2) A = (al,a2, ... ,an), where ai is the i-th column vector of A. Let In denote the identity matrix of order n x n in which every diagonal entry is 1, the unit element of the field F, and every off-diagonal entry is zero. In this section, every column vector is taken to be of order nxl. DEFINITION
by det A or properties. (1)
4.4.1. The determinant of a matrix A E Mn, denoted having the following
IAI, is a map from Mn into the field F
Multilinearity
For any 1 ~ i ~ nand (n + 1) column vectors at, a2, ... , ai-I, ai+t, ... , an, bl, and b2, and scalars 0 and (3, det(al,a2, ... ,ai-I, ObI +(3b2,ai+I, ... ,an) =odet(at, a 2, ... ,ai-l,b l ,ai+l, ... ,an) +(3 det(aI, a2, ... , ai-I, b2, ai+l, ... , an).
If i = 1, the above equality is taken to be det(obl =
0
+ (3b 2, a2···
, an)
det(bl , a2, ... , an)
+ (3 det(~, a2, ...
, an).
If i = n, the above equality is taken to be det(aI, a2, ... , an-I, ObI
=
0
+ (3b 2)
det(al, a2,··. , an-I, bl ) + (3 det(at, a2, . .. , an-I, b2 ).
Charncteristics of Matrices
(2)
143
Alternating
For any n column vectors a1, a2, ... ,an, det(a1, a2, . .. ,an) = 0 if any two colurrms are identical.
(3)
det(In) = 1.
The common notation that is used for the determinant of a matrix IAI. We will use this notation as well as "det" depending upon which one makes a convenient reading. The following properties of the determinant map follow solely from the definition of the map.
A is
P 4.4.2
Let A = (a1' a2, ... ,an) be any matrix.
(1) If a1 is dependent on a2, a3, ... ,an, i.e., a1 is a linear combination of a2, a3, . .. ,an, then IAI = O. (2) Let B be the matrix obtained from A by interchanging two colurrms of A. Then IBI = -IAI. (3) Let A = (O:ij). Let IT be the collection of all permutations of {I, 2, ... ,n}. Then
IAI =
L
sign(1l') 0:111"(1)0:211"(2)' " O:n1l"(n) ,
1I"En
where sign(1l') =
{
+1
if
-1
if 7T' is an odd premutation.
7T'
is an even permutation,
(4) Let A be any matrix. Then IAI = IA'I. (Recall that A' is the transpose of the matrix A.) (5) Let A and B be any two matrices. Then IABI = IAIIBI. The computation of the determinant of a matrix is important in many applications. The formula given in (3) of P 4.4.2 is very labor-intensive. A simple recurrence formula should come handy. Some recurrence formulas will be presented in Section 4.5. We will now present a simple formula of expanding the determinant of a matrix in terms of lower order determinants. Let A = (O:ij) be a matrix of order n x n. For each 1 ~ i, j ~ n, let Aij be the submatrix of A obtained from A by deleting its i-th row and j-th column. The determinant of A ij , IAij I is called the
144
MATRIX ALGEBRA THEORY AND APPLICATIONS
(i, j)-th minor of A. The (i, j)-th co-factor C ij is defined by the formula: Cij = (-I)i+ j IAijl. The following results are easily established. P 4.4.3 (1) Let A be a diagonal matrix of order n x n, i.e., every off-diagonal entry is zero. Then the determinant of A is the product of its diagonal entries. (2) Let A be an upper triangular matrix of order n x n , i.e., every entry below the main diagonal of A is zero. Then the determinant of A is the product of its diagonal entries. (3) Let A be a lower triangular matrix of order n x n, i.e., every entry above the main diagonal of A is zero. Then the determinant of A is the product of its diagonal entries.
(4)
Let A = (ai j ) be any matrix of order n x n. Then
(a)
IAI =
arlCrl
+ ar2Cr2 + ... + arnCrn ,
for every 1 :S r:S n
(expansion of the determinant by the r-th row of A),
(b)
IAI = al i Cli + a2iC2i + ... + aniCni,
for every 1 :S i :S n
(expansion of the determinant by the i-th column of A). (5)
Let A
= (aij)
be any matrix of order n x n. Then for any r
=I s,
(6) Let A be a square matrix such that IAI =I O. Let B = IAI- 1 (Cij )', i.e., the (i, j)-th entry of B is CjdlAI. Then AB = BA = In. (7)
A square matrix A is invertible or non-singular if and only if
(8)
Let A be a square matrix and
(9)
Let A be any square matrix.
IAI =I o.
a a scalar. Then laAI = Then IA21 = (IAI)2.
anlAI.
Finally, we close this section with a couple of definitions. A square matrix A is said to be symmetric if A' = A, i.e., the (i,j)-th and (j, i)-th entries of A are identical for every i and j. The adjugate of a square matrix A is defined to be the matrix (Cij )', where we recall that C ij is
Characteristics of Matrices
145
the (i,j)-th ~factor of A. The adjugate of A is denoted by adj(A).
Complements 4.4.1 A matrix A = (aij) of order n x n is said to be skew-symmetric if aii = 0 for all i and aij = -aji for all i and j. Show that: if n is odd, IAI = { 0, a perfect square, if n is even (2) p(A) is always an even number. (3) Every square matrix can be written as a sum of a symmetric matrix and a skew-symmetric matrix.
(1)
4.4.2 Let 01,02, ... ,On be n scalars. The Vandermond determinant based on the given scalars is defined to be the determinant of the matrix 1
1
1
A=
Show that IAI = lli>j(Oi - OJ). 4.4.3 Let A be a matrix of order n x n and x a column vector of order n x 1. Let 0 be a scalar. Show that:
(1)
= olAI- x'(adj(A))x,
(2)
=
-IA + xx'i =
= -IAI(l
-IAI- x'(adj(A))x,
+ x' A-Ix)
if IAI
i= o.
4.4.4 Let A be a matrix of order n x n, U a matrix of order n x n every entry of which is equal to 1, V a column vector of order n x 1 in which every entry is equal to 1, and 0 a scalar. Show that:
146
(1) (2) (3)
MATRIX ALGEBRA THEORY AND APPLICATIONS
IA + aUI = IAI + aV'(adj(A))Vj V'adj(A + aU) = V'(adj(A))j (adj(A + aU))V = (adj(A))V.
4.4.5 Let A and B be two matrices of the same order n x n. Determine a necessary and sufficient condition in order that IA + BI = IAI + IBI· 4.4.6 Let A be a matrix of order n x n. Show that adj(adj(A)) =
IAln-2 A. 4.4.7
Let A be a matrix of order (n+s) x (n+s) partitioned as follows.
[n~n
A=
sxn
Assume that E is non-singular. Show that (using P 4.3.4)
IAI = (IEI)(IH - GE- t Fl). 4.5. Determinants and Minors Another important problem is the computation of the determinant of a square matrix. If the matrix is big, the computation is tedious. One needs to find a simple way of computing determinants. We present one such method. For this we need to introduce a cogent notation for minors of a matrix. Let A be a square matrix of order m X m. Let 1 ~ p ~ m and 1 ~ it < i2 < ... < ip ~ m and 1 ~ jt < j2 < ... < jp ~ m. Let a = (it, i 2, ... , ip) and (3 = (jt, h, ... ,jp). Consider the submatrix of A obtained by retaining only the it-th, i2-th, ... , ip-th rows and jt-th, j2-th, ... , jp-th columns of A and deleting the rest from A. Suppose
A
=
1 2 3 4]
-3 4 6 -5 2 2 1 -1 1 -1 2 2
[
'
a = (1,3) and (3 = (2,4). The submatrix that corresponds to the choice of a and (3 is given by
Characteristics of Matrices
147
Let us get back to the generalities. For a given choice of a and {3 described above, let Aa.8 be the submatrix of A associated with a and {3. The determinant IAa.81 of matrix Aa.8 is called a minor of order p. (Note that the minor Aij introduced in Section 4.4 is the same as the minor Aa.8 of order m - 1 with a = (1, 2, . .. , i - I , i + 1, . .. , m) and {3=(1,2, ... ,j-l,j+l, ... ,m).) How many such minors can one work out? If p = m, there is only one minor of order m which is the determinant of A. If p = 1, there are m 2 minors of order 1. The basic computation of the number reduces to how many a's and {3's; one can find. There are each of such a's and {3's. Let M = which is the number of combinations of selecting p objects out of m. Thus there are M2 minors of order p. We would like to build a matrix based on these M2 minors. Before we do this, we need to arrange the p-tuples a's and {3's individually in some order. But it is customary to arrange a's as well as {3's in lexicographic order. Let
C;;)
e;:),
Ap
= {a = (i t ,i2, ... ,ip): 1::; it < i2 < ... < ip::; m}.
For distinct a = (it, i 2 , ... ,ip) and a* = (it, i 2, ... ,i;) in Ap, say that a < a* lexicographically if it < it, or it = ii' but i2 < i 2, or it = it, i2 = i2 but i3 < i 3, ... , or it = it, i2 = i 2, .. · ,ip_t = i;_t but ip < The lexicographic order is also called dictionary order and it is indeed a linear order, i.e., in addition to being a partial order, any two members of Ap are comparable. For example, suppose m = 4 and p = 2. The cardinality of the set A2 is six and its members are laid out in the dictionary order as follows.
i;.
(I,2)
< (1,3) < (1,4) < (2,3) < (2,4) < (3,4).
Now we are ready to define what is called a compound matrix of A. 4.5.1. Let A be a matrix of order m x m and 1 ::; p ::; m. The compound matrix of A of order p is a matrix A[pJ of order M x M whose (a, {3)-th entry is given by IAa.8l, a, {3 E A p , where M = C;:)· The order in which a's and {3's are written down is the lexicographical order enunciated above. We look at a numerical illustration. Suppose m = 4 and p = 2. Let 1 2 _ -3 4 6 -5 A2 2 1 -1 . [ DEFINITION
3 4]
1
-1
2
2
148
MATRIX ALGEBRA THEORY AND APPLICATIONS
The compound matrix of order 2 is given by:
10 -2 -3
A(2)
= -14 -1 -4
15 7 0 -26 -39 -9 -5 -4 -10 -7 8 -2 7 -1 -2 6 -1 -15 -7 -8 3 22 -12 -1 14 4 5 5 3 3
Let us go back to the general case. Note that in the extreme cases, A[l) = A and A[m) = (IAI). There are quite a number of properties enjoyed by the operation of compounding. We relegate these properties to the Complements Section. We need to broaden the definition of a minor in two directions. First, we need not confine our attention to square matrices alone. Let A be any matrix of order m x n. A p-th order minor of A makes sense with 1 S P S min{m,n}. We retain some prows, p columns of A, discard the rest and then form the determinant. Even if p exceeds min{m, n}, conventionally, we can define the p-th order minor of A to be equal to zero. Secondly, the definition of Ap seems to be too restrictive. It is not necessary to have the components of any vector a in Ap to be distinct and strictly increasing. Let us enlarge the set Ap to
For a = (it, i2, ... ,ip) and (3 trix Acl:f.3 of A to be
= (j), h, ... ,jp) in A;, define the subma-
ai2il
ai2i2
~,j, a· .
ai"il
ai"h
a· .
[ aid. Aaf.3=
ai 1i2
'"},,
1
'"},,
The determinant IAaf.3 I of the matrix Aaf.3 is called a minor of order p. If two components of a are identical or two components of {3 are identical, it is clear that IAaf.31 = o. Further, if a and a* belong to A *, and a is a permutation of a*, then the minors IAaf.31 and IAa*f.3 1differ~nly in sign. There are several advantages in extending the definition of a minor as we will see in some of the proofs we present below.
149
Chamcteristics of Matrices
We now spend some time on the Cauchy-Binet formula. Let A, B and C be three matrices of order m x m such that A = BC. We have already seen that the determinant of A has a simple relationship with the determinants of Band C. More precisely, we have IAI = (IBI)(ICI). In the Cauchy-Binet formula this simple relationship is extended to cover minors of A. Recall the definition of Ap = {o: = (i},i 2, ... ,ip ) : 1 ~ i1 < i2 < ... < ip ~ m}, where 1 ~ P ~ m.
P 4.5.2 (Cauchy-Binet Formula). Let A = (aij),B = (bij) and C = (Cij) be three matrices each of order m x m such that A = BC. Let 1 ~ p ~ m. Let 0: = (il, i2, ... , ip) and f3 = (jI,j2,'" ,jp) E Ap. Then the p-th order minor
IAO:.81
IAo:.81 =
is given by
L
IBo:"(l IC"(.8I·
"(EAp
We omit the proof which follows from definitions. Let A be a matrix of order m x m. The compound matrix A[m-I) of order m x m is of special interest. This is related to the adjugate matrix adj(A) of A. See Section 4.4. The (i,j)-th entry a ij of adj(A) is given by, ij a = (-l)i+jIA( 1, 2 . 1,J+ . 1 ,0" ,m,) (12 ,1.'-1 ,t'+1 ,m )1· ,to • •
to •• , ] -
t . ..
The members of the set A(m-l) can be arranged in the lexicographic order as follows:
where0:1=(1,2, ... ,m-1)j 0:2=(1,2, ... ,m-2,m)j 0:3=(1,2, ... , m - 3, m - 1, m)j ... j O:m-2 = (1,2,4, ... , m)j O:m-I = (1,3,4, ... , m)j and O:m = (2,3, ... , m). The (i,j)-th entry of A[m-I) is given by IAO:iO:j I, which is equal to (_1)i+jam+1-j,m+1-i. More transparently, the (1, 1)-th entry of adj(A) is the (m, m)-th entry of A[m-I)' Let us look at a simple example. Let aI2 a22 a32
aI3] a23 . a33
MATRIX ALGEBRA THEORY AND APPLICATIONS
150
Then adj (A) 12 -det [a a32
[all a31 -det [all a31 det
and A[2J
=
One can obtain A[2J from adj(A) in a simple manner. First, ignore all the negative signs present in front of the determinants. Keeping the (1,3)-th position in adj(A) as fixed, move the first row 90 0 anticlockwise. Keeping the (2,2)-th position as fixed, move the second row 900 anticlockwise. Finally, keeping the (3,1 )-th position as fixed, move the third row 900 anticlockwise. We have A[2J. These operations can be executed purely in a mathematical fashion. Let
E= and
[~
0 -1 0
~l
[~ ~l 0
F=
1
0
One can check that A[2J = FE[(adj(A))']EF = FE(adj(A'))EF. The multiplication by the matrix E neutralizes the negative signs in adj(A). The multiplication by the matrix F moves the entries of (adj(A))' the way outlined above. (Try it.) The apparent discrepancy in the location
Chamcteristics of Matrices
151
of the entries between the matrices A[m-I] and adj(A) stems from the fact we have agreed upon the way to write down the members of A m - I in lexicographical order. We now concentrate on minors of submatrices of compound matrices. Let A be a matrix of order m x m. Let p be any positive integer with p + 1 :::; m. For each i and j E {p + 1, p + 2, ... ,m}, let bij = IA(1,2, .. . ,p,i),(1,2, ... ,p,j) I·
Let B be the matrix of order (m - p) x (m - p) whose (i,j)-th entry is given by bij , i, j E {p + 1, P + 2, ... ,m}. We can immediately recognize that the matrix B is a submatrix of the compound matrix A[p+I]' We would like to compute minors of various orders for the matrix B. Let us make an assault. Let p + 1 :::; iI < i2 < ... < iq :::; m,
and p
+ 1 :::; jI < i2 < ... < jq :::; m
be two choices of integers for some 1 :::; q :::; m - p. The following result known as Sylvester's determinantal identity provides a formula for the computation of q-th order minors of B in terms of minors of A. P 4.5.3 (Sylvester's Determinantal Identity) With the notation set as above, we have 1B(i}'i2,' "
,i q ),(jt,j2 , ... ,jq) 1 =
xIA(1,2, ... ,p,i 1 ,i2,'"
[lA(1,2, ... ,p),(1,2, ... ,p) Ilq-I
,i q ),(1,2, ... ,p,jl,h, ... ,jq)l·
PROOF. The case q = 1 is clear. The result follows from the definition of the entries of B. It is not difficult to establish the identity for q = 2. The general case q > 2 follows in the foot-steps of the case for
q=2. The Sylvester's identity can be generalized in several directions. We will describe the generalization in general terms without proof. Choose and fix two sets of p integers: 1 :::; J.LI < J.L2 < . .. < J.Lp :::; m and 1 :::; VI < v2 < ... < vp :::; m, for some 1 :::; p < m. Let i E
152
MATRIX ALGEBRA THEORY AND APPLICATIONS
{I, 2, ... , m} - {J.LI , J.L2, .•.
and j E {I, 2, ... , m} - {VI, V2,
, J.Lp}
•• . ,
v p }.
Define where il < i2 < ... < iP+I is a permutation of i, J.LI, J.L2, ··· , J.Lp and jI < j2 < ... < jp+I is a permutation of j, vI, V2, ... ,1/p' Let B = (b ij ) be the resultant matrix of order (m-p) x (m-p). The entries bij's appear in the matrix B in the natural order of i's and j's consistent with the lexicographical order in which the minors of order p + 1 are arranged in the compound matrix A[P+I]' It is clear that the matrix B is a submatrix of A(p+I]' As an illustration, let m = 5, p = 2, J.LI = 2, J.L2 = 4, VI = 1, and V2 = 3. The matrix B is of order 3 x 3 given by,
B=
=
[bb3212
bI4 b34
bbl5
b52
b54
b55
[I AI 1,2,') ,11,2,3)
35
j
1
IA(2,3,4),(I,2,3) I IA(2,4,5),(I,2,3) I
IA(1,2,4),(I,3,4) I IA(2,3,4) ,(I,3,4) I IA(2,4,5),(I ,3,4) I
IAI I ,2"),II,3,5) I]
IA(2,3,4) ,(I,3,5)1 . IA(2,4,5),(I,3,5) I
(Check the natural order of the subscripts of the entries of B and the matching lexicographical order of the minors of A.) Let us go back to the general case. Let 1 ~ q ~ m - p. We want a formula for the q-th order minor of B. Let kI < k2 < ... < kg and il < i2 < ... < ig be two choices of integers with k I , k 2 , •.. , kg E {1,2, ... ,m} - {J.LI,ll2, ... ,J.Lp} and i I ,i2, •.• ,ig E {1,2, .. . ,m}{VI, V2,··· ,Vp}. We are now ready to state the identity.
P 4.5.4 (Sylvester's Determinantal Identity for the q-th Order Minor of B) IB(k 1 ,k2, ... ,k q ),(ll,l2, ... ,lq)1
=
[lA(Pl,P 2, ... ,P,,),(Vl,V2, ...
,v,,)lI g -
I ..
x IA(Ol,02, .. . ,o,,+q) ,({31,/h, ... ,{3,,+q) I, where al < a2 < ... < a p+ g is a permutation of J.LI,J.L2, ••• ,J.Lp,k I ,k2, ••• ,kg and {31 < {32 < ... < {3p+g is a permutation of VI, V2, ••• ,vp , iI, i2, ... ,ig.
Charncteristics of Matrices
153
Consider the example presented above with m = 5. Let q = 2, k1 = 2, k2 = 3, i1 = 2 and i2 = 4. Then the second-order minor of B is given by
IB(1,3),(2 ,4) 1 =
I~~~
14
1_
bb - IA (2,4),(1 ,3) I·IA (1 ,2,3,4),(1,2,3,4) I. 34
We now need to explore the relationship between minors of a matrix and its rank. We report some results without proof.
P 4.5.5 Let A be a matrix of order m x n. Let 1 ::; r ::; min{m, n}. Then the following statements are equivalent.
(1) p(A) = r. (2) There exists a non-zero minor of A of order r and every minor of order (r + 1) is zero. (We are adopting the convention that if 7' + 1 exceeds min{m, n}, minors of order r + 1 are set equal to zero.) P 4.5.6
Let A be a matrix of order m x m with m
~
2. Then
(1) p(adj(A)) = m if and only if p(A) = m; (2) p(adj(A)) = 1 if and only if p(A) = m - 1; (3) p(adj(A)) = 0 if and only if p(A) ::; m - 2. P 4.5.7 (Laplace Expansion) Let A be a matrix of order m x m. Choose and fix 1 ::; P ::; m. I
AI... +ip+jl+i2+ .. ·+jpIA (tl,t2, ·· ... ,t.p ) ,(JIoJ2 . . , ... ,Jp) . 1 - '"'(_1)il+i2+ L...J · IA(0!1,0!2, .. . ,0!-m-p) ,({3I ,{32, ... ,(3-m - p) I,
where the summation is taken over all p-tuples 1 ::; i t < i2 < ... < ip ::; m, and 1 ::; j1 < h < ... < jp ::; m, 1 ::; a1 < a2 < ... < a m - p ::; m is a permutation of the members of the set {1, 2, .. . ,m} {i., i 2 , •.• ,ip}, and 1 ::; f31 < f32 < ... < f3m-p ::; m is a permutation of the members of the set {1, 2, ... ,m} - {jt , j2,'" ,jp}. (If P = m, the second determinant in the above summation is taken equal to unity.)
Complements 4.5.1 Suppose A, Band e are three matrices each of order m x m such that A = Be. Show that for any 1 ::; P ::; m ,
154
MATRIX ALGEBRA THEORY AND APPLICATIONS
(This is essentially the Cauchy-Binet formula.) 4.5.2 Suppose A is a matrix of order m x m and a E F. Show that for any 1 ~ P ~ m, (aA)[p) = aP(A(p)).
4.5.3 1~ p
~
Suppose A is a matrix of order m x m. m, (A')(p) = (A(p))'.
Show that for any
4.5.4 Suppose A is a non-singular matrix. Show that for any 1 m, A(p) is non-singular, and
~
P~
4.5.5 There is no particular reason that the operation of compounding should be confined to square matrices. The notion makes sense for any matrix of any order. Reformulate Complements 4.5.1-4.5.3 above in the context of matrices of any order and solve them. 4.5.6 Let A, B, and G be three matrices of orders Tn x n, m x r, and r x n, respectively, such that A = BG. Let 1 ~ P ~ min{m, r, n}. We need to take into account possible differences in m and n in the computation of minors of A. Let A;' = {a = (iI, i2, . .. , ip) : 1 ~ i 1 < i2 < ... < ip ~ m}. Let a E A;' and f3 E A~. Show that the p-th order minor of A is given by
IA 0:13 I =
L
'YEA;
IBeq,1 IG'YI3I·
4.5.7 Let A be a matrix of order m x m. Write the relationship between the compound matrix A[m-I) and adj(A) in mathematical terms. Establish this relationship. 4.5.8 Let 2 5 8 Evaluate adj(A) and A[2)' Show that adj(A) is symmetric. (Does this surprise you?)
Charocteristics of Matrices
155
4.5.9 Let T : V ---+ Wand AT be the associated matrix. Further let K(T) = {x E V : T(x) = O} and R(T) = {Tx : x E V} be the kernel and range of T, respectively. Define peT) = dimR(T) and veT) = dim K(T). Then show that veT) = V(AT) and peT) = p(AT). 4.5.10 Let T : V ---+ V, dim V = n and AT be the associated matrix. Then show that there exists a matrix B of order n x n such that ATB = BAT = In, if and only if p(AT) = n. 4.5.11 Let T : V ---+ W. Define a map T' from the dual space V' of V to the dual space W' of W as follows. Let I be a linear functional on Wand f'(x) = I(T(x)),x E V. Then I' is a linear functional on V, i.e., f' E V' . Note that T'(f) = I', lEW'. The map T' is called the transpose of T. Show that T'(a/I
+ f3h)
= aT'(ft}
+ f3T'(h)
i.e., T' is a transformation. If T and S are two transformations from V ---+ W, then show that (T + S)' = T' + S', (aT)' = aT', (ST)' = T'S'. 4.5.12 Let T' be as defined in Complement 4.5.11. Then show that
AT' = (AT)' where AT and AT' are the matrices associated with T and T' respectively. 4.5.13 Let V be a vector space over the field C of complex numbers equipped with an inner product < ', ' >. Consider for each fixed y E V, the linear functionally: V ---+ C defined by
Iy(x)
=< T(x), y >,
x E V.
Then by P 2.4.1, there exists a unique vector z E V such that
Iy(x)
=< x,z >,x E V.
Let us denote z by T*(y). Thus we have a map T* : V ---+ V which we call the adjoint of T. Show that T* is a transformation and the matrix AT- is the conjugate transpose of AT. 4.5.14 Let T : V ---+ Wand S : W ---+ U. Show that
peST) ::; min{p(S) , peT)}.
156
MATRIX ALGEBRA THEORY AND APPLICATIONS
4.5.15
Let A be an m x n matrix of rank k. Show the following:
(1) There is a k x k submatrix of A with nonzero determinant, but all (k + 1) x (k + 1) submatrices of A have determinant zero. (2) There is a set k, but not more than k, of linearly independent vectors b such that the linear equation Ax = b is consistent. 4.5.16 Let A be an m x n matrix. Then p(A* A) = p(A). 4.5.17 Let A and B be m x n matrices. Then p(A) = p(B) if and only if there exist nonsingular matrices X of order m and Y of order n such that B = X AY. 4.5.18 Let A be a square Hermitian matrix of order n, i.e., A = A*. Show that p(A) ~ (tr A)2 Itr A2, with equality if and only if there exists an n X r matrix U = (UI' ... ,ur ) with orthonormal columns and some real number a such that A = aUU*. 4.5.19 Let A = (aij) = (all···la n ) be a square matrix of order n obtained by writing n column vectors aI, ... ,an one after the other to form a square grid. Show that n
p(A) ~
L laiil Iliaill~, 2
i=l
CHAPTER 5 FACTORIZATION OF MATRICES
In Chapter 4, we have demonstrated how a matrix A of order m x n with rank a has the factorization A = RF, where R is an m X a matrix of rank a and F a matrix of order a x n with rank a. In this chapter, we consider a number of other factorizations of A in terms of special matrices. These factorizations are useful in theoretical investigations and practical applications. 5.1. Elementary Matrices Consider the following operations on a matrix A of order m x n. Operations on the rows of the matrix {Rt} Multiply each of the entries in the r-th row by a scalar 0 =1= O. (R2) Replace the r-th row by "r-th row +,B{s-th row)," where ,B is a scalar. (R3) Interchange two rows. Operations on the columns of the matrix {Ct} Multiply each of the entries in the r-th column by a scalar o =1= o. (C2 ) Replace the r-th column by "r-th column +,B{s-th column)," where,B is a scalar. (C3) Interchange two columns. We show that all the above row and column operations on A are equivalent to pre and post-multiplying A respectively, with what we call appropriate elementary matrices. Let Er{o) = (c5ij ) be the matrix of order m x m, where
c5ij = {
0,
ifi=j=r,
1,
if i
0,
ifi=l=j. 157
=j
=1= r,
158
MATRIX ALGEBRA THEORY AND APPLICATIONS
Let B be the matrix obtained from A after performing the operation (Rt) on A. One can show that B = Er(a)A. If C is the matrix obtained from A after the performance of the operation (C1 ), then this operation is equivalent to postmultiplying A by Er(a), i.e., C = AEr(a), with the matrix Er (a) being of order n x n. As per operation (R 2 ), asswne that r =f s. If r = s, the operation (R2) is similar to the operation (Rt). Let Ers(j3) = (-rij) be the matrix of order m x m, where
"Iij =
{
1,
if i = j,
f3,
if i = r, j = s,
0,
otherwise.
Note that Ers(f3) is either an upper-triangular matrix or a lower-triangular matrix depending upon the relationship between rand s. Let D be the matrix obtained from A after performing the operation (R2) on A. One can check that D = Ers (f3)A. Let us look at the column operation (C2 ). Let F be the matrix obtained from A after performing the operation (C2 ) on A. Then one can check that F = AEsr(f3). But now the matrix Ear(f3) is of order n x n. (Do you see the difference in how the matrices Ers(f3)'s work out to be in the operations (R 2 ) and
(C2 )?) One can look at the row operation (Rl) as a special case of the operation (R2 ). One can take r = s in the operation (R 2 ), and define Err(f3) as E r {1 + f3). Suppose we interchange the r-th and s-th rows at A. Let F be the resultant matrix. Let E rs = (Vij) be the matrix of order m x m, where if i = j, i if i
= r,
j
=f r, i =f s, = s,
if i = s, j = r, otherwise. One can check that F = ErsA. It is instructive to observe that the matrix Ers can be obtained as a product of matrices of the type Ers(f3). More precisely,
Factorization of Matrices
159
This follows from the observation:
As mentioned earlier, one can take, conventionally, that Es( -1) = E".( -2). Let G be the matrix obtained from A after performing the operation (C3 ) on A. Then G = AErs with the understanding that E rs is of order n x n. The matrices Er(a), Ers(f3), and E rs of whatever order are called elementary matrices. It is obvious that they are all nonsingular. Consequently, the matrix obtained after perfonning any of the row operations will have the same rank as that of A. The elementary matrices can be obtained by a different route. Start with the identity matrix I'm' Perform the operation (Rt) on I'm' The resultant matrix is Er(a). If we perform the operation (R2) on I'm' we will obtain the matrix E rs (f3). IT we perform the operation (R3) on I'm' we will obtain E rs . Upper-triangular and lower-triangular matrices have nice properties. Some of these will be touched upon in the complements. There are other special forms of matrices called echelon matrices which play an important role in the factorization of matrices. DEFINITION 5.1.1. A matrix A of order m x n is said to be in echelon form if each row of A has one of the following properties:
(1) All the entries in the row are zeros. (2) IT the row is non-zero, then all the entries in the column below the first non-zero entry of the row are zeros. The following matrices are all in echelon form:
1]
1] , [0 1 1 2], 1 [0 0112 o 2 0 1 3 2 3 . 0 00 [ 000200002 o 3 DEFINITION 5.1.2. Let
A be a matrix of order m x n . For each
1 ~ i ~ m, let ai be the number of zeros preceding the first non-zero element of the i-th row. The matrix A is said to be in an upper echelon form if al < a2 < ... < a'm' Let bi be the number of zeros above the
160
MATRIX ALGEBRA THEORY AND APPLICATIONS
first non-zero element of the i-th column, 1 ~ i ~ n . The matrix A is said to be in a lower echelon form if bi < ~ < . . . < bn · The following matrices are in upper echelon form:
o o o
2 1
0
Complements 5.1.1 Let A and B be two upper-triangular matrices of the same order. Show that AB is upper-triangular. 5.1.2 Let A be a non-singular upper-triangular matrix. Show that its inverse is also upper-triangular. 5.1.3 Determine the inverse of each of the elementary matrices Er(O:), E rs (j3), and E rs . 5.1.4 Let
A=
[~
-3
1
o
o 3
-1
~] .
-2
Which row operation on A will render (2,4)-th element of A zero?
5.2. Reduction of General Matrices First, we consider matrices with elements belonging to any field of scalars and derive a number of factorizations of a matrix obtained by pre- and post-multiplying it by elementary matrices. Before this, we would like to introduce the notion of a unit lower-triangular matrix. A square matrix A is said to be a unit lower-triangular matrix if it is lower triangular and each of its diagonal entries is equal to unity. IT 7· > S, the elementary matrix Ers(j3) is a unit lower-triangular matrix. The notion of a unit upper-triangular matrix is analogous. IT r < s, the elementary matrix Ers(j3) is a unit upper-triangular matrix. IT A and B are two unit upper-triangular matrices of the same order, it can be checked that the product AB is also a unit upper-triangular matrix. A similar assertion is valid for unit lower-triangular matrices. P 5.2.1 Let A = (O:ij) be a matrix of order m x n . Then there exists a unit lower-triangular matrix B of order m x m such that BA is in echelon form.
Factorization of Matrices
161
PROOF. Start wjth the first non-zero row of the matrix A. Assume, without loss of generality, that the first row of A is non-zero. Let ali be the first non-zero entry in the first row. Take any 2 ~ 8 ~ m. If asi = 0, we do nothing. If asi =I 0, we multiply the first row by -asda1i and add this to the s-th row. This operation makes the (8, i)-th entry zero. This operation is equivalent to pre-multiplying A by the elementary matrix Est ( -asd a1 i) which is a unit lower-triangular matrix. Thus we have used the (1, i)-th entry, namely ali, as a pivot to liquidate or sweep out all the other entries in the i-th column. The resultant matrix or reduced matrix is mathematically obtainable by pre-multiplying A successively by a finite number of unit lower-triangular matrices whose product is again a unit lower-triangular matrix. Now start with the reduced matrix. Look at the second row. If all of its entries are equal to zero, move on to the third row. If we are unable to find any non-zero vectors among the second, third, ... ,m-th rows, the process stops. The reduced matrix is clearly in echelon form. Otherwise, locate the first non-zero vector among the m - 1 rows of the reduced matrix starting from the second. Repeat the process of sweeping out all the entries below the first non-zero entry (pivot) of the chosen non-zero vector. Repeat this process until we could find no more non-zero vectors in the reduced matrix. The reduced matrix is clearly in echelon form. The promised matrix B is simply the product of all pre-multiplying unit lower-triangular matrices employed in the sweep-out process. Clearly, B is a unit lower-triangular matrix. This completes the proof. COROLLARY 5.2.2. Let A be a matrix of order m x n. Let B be the unit lower-triangular matrix obtained in the proof of P 5.2.1 such that B A is in echelon form. Then the rank of A is equal to the total number of non-zero row vectors in BA, or equivalently to the total number of pivots. We have already established the rank factorization of A in P 4.3.1. This result can be obtained as a corollary of P 5.2.1. Let BA = C. Write A = B- 1 C. Let p(A) = a. The matrix C will have exactly a non-zero rows. Eliminate all zero rows. Let F denote the resultant matrix. The matrix F will be of order aXn. Eliminate the corresponding columns from B- 1 • Let R be the resultant matrix which will be of order m x a. It is clear that p(R) = p(F) = a. Further, A = B-1C = RF. We will jot this result as a corollary.
162
MATRIX ALGEBRA THEORY AND APPLICATIONS
COROLLARY 5.2.3. (Rank Factorization) Any given matrix A of order m x n and of rank a -# 0 admits a rank factorization
A=RF
(5.2.1)
where R is of order m x a with rank a and F is of order a x n with rank a. We can polish the echelon form a little. For the given matrix A, determine a lower-triangular matrix B such that BA = G is in echelon form. By interchanging the columns of G, one can produce a matrix in which all the entries below the leading diagonal are zeros, For example, if
G=
01 10 21 02 31] , [o 0 2 0 1
interchange columns 1 and 2 to produce the matrix
1o 01 21 02 31] , [o 0 2 0 0 in which we notice that all the entries below the leading diagonal are zeros. In some cases, one may have to do more than one interchange of columns. Any interchange of columns in a matrix is equivalent to post-multiplying the matrix with an elementary matrix of the type Era. When the matrix G in echelon form is changed into another matrix D in echelon form with the additional property that all entries below the leading diagonal of D are zeros, one can write D = GG, where G is a product of matrices of the type Era. In the final analysis, we will have BAG = GG = D. Let A = AG. Observe that A is obtainable from A by some interchanges of columns of A. Let us put down all these deliberations in the form of a corollary. 5.2.4. Let A be a given matrix and B a unit lowertriangular matrix such that B A = G is in echelon form. Then one can construct a matrix 11 by interchanging columns of A such that BA is in echelon form with the additional property that all entries below the leading diagonal are zeros. COROLLARY
Factorization of Matrices
163
5.2.5. (Upper Echelon Form) For any given matrix A there exists a non-singular matrix B such that B A is in upper echelon form. COROLLARY
PROOF. If A is the zero matrix, it is already in upper echelon form. Asswne that A is a non-zero matrix. Identify the i-th colwnn such that it is a non-zero vector and all the first (i - 1) columns are zero vectors. Identify the first entry aji in the i-th colwnn which is nonzero. Add j-th row to the first row. In the resulting matrix, the (1, i)th element is non-zero. Using this element as a pivot, sweep out the rest of the entries in the i-th colwnn. Disregard the first row and the first i colwnns of the reduced matrix. Repeat the above operation on the submatrix created out of the reduced matrix. Continue this process until no non-zero colwnn is left. All these matrices may not be unit lower-triangular matrices. The addition of a row at a lower level of the matrix to a row at a higher level of the matrix cannot be done by pre-multiplying the matrix with a unit lower-triangular matrix. In any case, the final reduced matrix is in upper echelon form and the product of all elementary matrices involved is the desired non-singular matrix
B. For the next result, we need the notion of a principal minor of a square matrix. Let A be a square matrix of order m x m. Let 1 ~ i l < i2 < ... < ip ~ m. The determinant of the submatrix of A obtained from A by deleting the il-th, i 2-th, ... , ip-th rows and i l th, i 2-th, ... , ip-th colwnns from A is called a principal minor of A. The submatrix itself is called a principal submatrix. If iI, i2, ... ,ip are consecutive integers and ip = m, the associated principal minor is called a leading principal minor of A. The corresponding submatrix of A is called a leading principal submatrix. P 5.2.6 (LU Triangular Factorization) Let A be an m x m matrix such that all its leading principal minors are non-zero. Then A can be factorized as (5.2.2) A=LU, where L is a unit lower-triangular matrix and U is a non-singular uppertriangular matrix each of order m x m. PROOF. Let A = (aij). Since all =1= 0, it can be used as a pivot to sweep out all the other elements in the first colwnn. Let us identify the
164
MATRIX ALGEBRA THEORY AND APPLICATIONS
(2,2)-th element a~V in the reduced matrix. Observe that
= The determinant on the left-hand side of the above expression is a leading principal minor of A and by hypothesis, it is non-zero. The second determinant above is the result of the sweep-out process and this does not change the value of the minor. Now it follows that a~;) =1= O. Further, all the leading principal minors of the reduced matrix are non-zero. (Why?) Using as a pivot, sweep out all the entries below the second row in the second column of the reduced matrix. Continue this process until we end up with an upper triangular matrix U. All the operations involved are equivalent to pre-multiplying the matrix A with a series of unit lower-triangular matrices whose product B is clearly a unit lower-triangular matrix. Thus we have B A = U, from which we have A = B- I U = LU, say. Observe that B- 1 is a unit lower-triangular matrix. It is clear that U is non-singular. This completes the proof.
aW
COROLLARY 5.2.7. Let A be a matrix of order m x m for which every leading principal minor is non-zero. Then it can be factorized as A=LDU,
(5.2.3)
where L is unit lower-triangular, U is unit upper-triangular, and D is a non-singular diagonal matrix. PROOF. First, we write A = LUI following (5.2.2). Divide each row of U l by its diagonal element. This operation results in a matrix U which is unit upper-triangular. This operation is also equivalent to premultiplying Ut by a non-singular diagonal matrix D- t . Thus we can write U = D-t Ut . The result now follows. The assumption that the matrix A should have leading principal minors non-zero looks a little odd. Of course, every such matrix is non-singular. Consider the following matrix:
A=[~ ~] .
Factorization of Matrices
165
The matrix A is non-singular but the assumption of P 5.2.6 on leading principal minors is not met. There is no way we could write A = LU with L unit lower-triangular and U non-singular upper-triangular. (Try.) In the following corollary we record that this assumption is inviolable. COROLLARY 5.2.8. Let A be a non-singular matrix. Then A admits a factorization of the form (5.2.3) if and only if all the leading principal minors of A are non-zero.
The critical observation is that if A can be factorized in the form (5.2.3), then any leading principal submatrix of A can be factorized in the form (5.2.3). Now the result becomes transparent. The notion of leading minors and leading submatrices can be defined for any general matrix A not necessarily square. A result analogous to P 5.2.6 should work out for rectangular matrices too. PROOF.
5.2.9. Let A be a matrix of order m x n for which all its leading principal minors are non-zero. Then we have the factorization COROLLARY
A=LU,
(5.2.4)
where L is unit lower-triangular of order m x m and U is in upper echelon form. The hypothesis that all leading principal minors are non-zero can be relaxed, but the conclusion will have to be diluted. COROLLARY 5.2.10. Let A be matrix of order m x m such that its il-th, i 2-th, ... , ir-th columns are dependent on the previous columns for some 1 < i l < i2 < ... < ir ~ m. Let B be the matrix obtained from A by deleting the il-th, i2-th, ... , ir-th rows and il-th, i2- th , ... , ir-th columns of A. Suppose all the leading principal minors of B are non-zero. Then we have the factorization
A=LU,
(5.2.5)
where L is unit lower-triangular and U is upper-triangular (with some diagonal elements possibly zero). PROOF. The factorization (5.2.5) can be established by following the same argument that is used in the proof of P 5.2.6. If we encounter a
166
MATRIX ALGEBRA THEORY AND APPLICATIONS
zero leading element at any stage of the reduction process, skip the row containing the leading element and then move on to the next row. The hypothesis of the corollary ensures that when a leading element is zero at any stage, all the elements below it in the same coltunn are zero. When the matrix involved is special, the factorization becomes special too. In the following, we look at symmetric matrices. COROLLARY 5.2.11. Let A be a symmetric matrix of order m x m with all its leading principal minors to be non-zero. Then we have the factorization (5.2.6) A= L6L',
where L is unit lower-triangular and 6 is diagonal with non-zero diagonal entries. PROOF. First, obtain the factorization A = LU following P 5.2.6, where L is unit lower-triangular and U non-singular and upper-triangular. Since A is symmetric,
A = LU = A' = U' L',
from which we note that U = show that 6 is diagonal, then result follows. The diagonality upper-triangular and L-lU' is
L-lU'L'. Let 6 = L-lU'. If we can we can write A = LU = L6L' and the of 6 follows from the fact that U L- 1 is lower-triangular. (Check.)
5.3. Factorization of Matrices with Complex Entries In addition to the factorization results given in the last section, some special results are available when the elements of the matrices belong to the field of complex numbers. Let A be a square matrix with complex numbers as entries. The complex conjugate of A is defined to be that matrix A * obtained from A by replacing the entries of A by their complex conjugates and then taking the transpose of the matrix. If the entries of A are real, the complex conjugate of A is merely the transpose A' of A. A matrix A is said to be Hermitian if A* = A. If A has real entries, recall that A is said to be symmetric if A' = A. Two coltunn vectors al and a2 of the same order with complex entries are said to be orthogonal if aia2 = a2al = O. They are said to be orthonormal if, in addition, aia1 = 1 = a2a2. These notions are not different from what
Factorization of Matrices
167
we already know about orthogonal and orthonormal vectors in an inner product space. The relevant inner product space in this connection is equipped with the standard inner product. A square matrix A is said to be tmitary if A* A = AA* = I. A square matrix A with real entries is said to be orthogonal if A' A = AA' = I. We now introduce formally Householder matrices.
en
DEFINITION 5.3.1. Let w be a column vector of order n x 1 with complex entries satisfying w*w = 1. Let
E(w) = In - 2ww*. The matrix E(w) is called a Householder matrix. The following are some of the properties of Householder matrices. P 5.3.2 (1) Every Householder matrix is Hermitian and tmitary. (2) Let the column vectors a and b be such that a*a = b*b, a*b = b*a (which is automatically true if the vectors are real), and distinct. Then there exists a vector w of unit length such that E(w)a = b. (3) Let a and b be two distinct column vectors of the same order n x 1 with the following structure: a' = (a~, a2) and b' = (ai, b2), where al is a column vector of order t x 1. (What this means is that the first t entries of a and b are identical.) Suppose that b2b2 = a2a2 and b2a2 = a2b2. Then there exists a vector w of unit length such that E(w}a = band E( w)( c', 0') = (c', 0') for any column vector c of order t x 1. PROOF. The verification of (1) is easy. For (2), take w = r(a - b), where r > 0 is such that the length of w is tmity. In fact l/r 2 = (a - b)*(a - b). The given conditions on the vectors a and b imply that the vectors (a - b) and (a + b) are orthogonal, i.e., (a - b)*(a + b) = o. We are now ready to prove the assertion:
E(w)a = (In - 2ww*)a -
2 * a+b [In - 2r (a - b)(a - b) ][-2-
- -a+b + -a-b 2 2.
a-b + -2-1
(a- b) = b.
As for (3), take w = r(a - b), where r > 0 is such that the vector w has unit length. This would do the trick. The following is one of the most useful factorizations of a matrix. It is similar to the rank factorization of a matrix presented earlier.
168
MATRIX ALGEBRA THEORY AND APPLICATIONS
P 5.3.3 (QR Factorization) Let A be a matrix of order m x n with complex entries. Let the rank of A be a. Then A can be factorized as (5.3.1) A=QR, where Q is an m x a matrix of rank a such that Q*Q a X n matrix in upper echelon form.
= Ia
and R is an
PROOF. If A = 0, the avowed factorization of A can be set down easily. Assume that A is non-zero. Identify all non-zero coluIIlllS of A. Let 1 :::; it < i2 < ... < ir :::; n be such that ail' ai2' ... ,air are all those columns of A which are non-zero. Let d be the first entry in the column ail' If d is real, let bt be the column vector given by b~ = (Jail ail ' 0, ... ,0). If d is complex, let b~ = (~Jail ail' 0, .., 0). Note that bibt = ail ail and biail = ail bl . By P 5.3.2 (2), there exists a vector Wt of unit length such that E(wdail = bt . Let E(wdA = At. Let us examine closely the structure of AI . The first (it - 1) colUIIlllS of Al are zero vectors. The it-th column of At is bl . Let us work on AI. Identify all non-zero columns of AI. It turns out that il-th, i2-th, . . . , ir-th columns of At are the only ones which are non-zero. Let us work with i 2-th column of At. Let us denote this column by c. Partition c' = (cIlc2' C3,'" ,c..n) = (Ctlc'), say. Let b2 = (cIlvc*c, 0, ... ,0) if C2 is real. If C2 is complex, b2 is modified in the same way as in bt . The vectors C and b2 satisfy the conditions of P 5.3.2(3) with t = 1. There exists a vector W2 of length one such that E(W2)C = b2 . Also, E(W2)b t = bl . Let A2 = E(W2)A t . Let us look at the structure of A2. The it-th, i 2-th, ... ,ir-th columns are the only vectors of A2 which are non-zero. In the it-th column, all entries below the first element are zeros. In the i2-th column, all entries below the first two elements are zeros. The trend and strategy should be clear by now. We now have to work with the matrix A2 and its i3-th column. Continuing this way, we will have Householder matrices E(wt) , E(W2), ... ,E(wr ) such that
is in upper echelon form. Let QI = E(wr)E(wr_l)'" E(W2)E(Wl)' It is clear that Qt is unitary and A = QiRl . Since p(A) = a, exactly a rows of Rl will be non-zero. Delete these rows from Rl and let R be the resultant matrix. Delete
Factorization of Matrices
169
the correspondingly numbered columns from Qi and let the resultant matrix be Q. Note that A = QR and thls is the desired factorization. The QR factorization of a matrix is computationally very important. It can be used to determine the rank of the matrix, a g-inverse of the matrix, and a host of other features of the matrix. The method outlined in the proof of P 5.3.3 is called Householder method and used in many computational routines. There is another method of obtaining QR factorization of a matrix based on the Gram-Schmidt orthogonalization process which was outlined in Chapter 2.
Gram-Schmidt Method: Let us denote the column vectors of A by aI, a2, ... ,an. The Gram-Schmidt process is designed to provide vectors bl , ~, ... ,bn such that b;b s = 0 for every r i= s, and Sp(aI, a2, ... ,ai) = Sp(b 1, b2 , ... ,bi )
for each i = 1,2, ... , n, where Sp(at, ... ,ai) denotes the subspace spaIUled by al, ... ,ai. The relationship between aI, a2, ... ,an and bI, b2 , •• . ,bn can be written in the form at
= bt
a3 = rl3 bt
+ b2 + r23 b2 + b3
an = rtnb1
+ r 2n b2 + ... + r n-l,n bn-1 + bn ,
a2 = rt2 bt
(5.3.2)
where rij's and bi' S are determined so as to satisfy bibj = 0 for all i i= j. One computational procedure for obtaining bt , b2, ... ,bn is as follows. Suppose bt , ~, ... ,bi - 1 are determined. We will determine r1 i, r2i, ... , Ti-1,i, and bi. Consider the equation ai = 1"lib1
Since b;b s = 0 for every
1"
+ T2i b2 + ...
i= s, r
,1"i-1,i bi-t
+ bi·
E {I, 2, ... ,i -I},
(5.3.3) If b;br = 0, set 1"ri = O. If b;br Finally, bi is obtained from
i= 0, set rri =
b;adb;br ,
1"
= 1, .. . ,i-l.
(5.3.4)
170
MATRIX ALGEBRA THEORY AND APPLICATIONS
Now that Tli, T2i,'" ,Ti-1,i, and bi are determined, we can proce.ed to the next step of determining T1 ,i+b T2,i+b ... ,Ti,i+1, and bi+l III an analogous fashion. The process is continued until all bb b2, . .. ,bn are determined. Equations (5.3.2) can, indeed, be written in a succinct form. Let A = (at, a2,'" ,an), B = (bt, b2,'" ,bn ), and
C= [
1
T12
0
1
o
0
o
Note that A = BC and C is upper triangular. If some of the bi's are zero, we can omit these columns from B and the correspondingly numbered rows from C resulting in matrices Q1 and R 1, respectively. Thus we will have A = Q1Rt, where the columns of Ql are orthogonal and R1 is in upper echelon form. We can normalize the column vectors of Q1 so that the resultant vectors are orthonormal. Let Q be the matrix so obtained from Q1 by the process of normalization. The matrix RI is also modified to absorb the constants involved in the normalization process with the resultant matrix denoted by R. Thus we have the desired QR decomposition of A. Some computational caution is in order in implementing the GramSchmidt method. From the nature of the formulae (5.3.3) and (5.3.4), some problems may arise if bi bi = 0 or close to zero indicating that ai depends on or closely related to a1, a2, ... ,ai-I. In such a case it may be advisable to shift ai to the last position and consider ai+1 after ai-I' Such a rearrangement of ai decided upon at different stages of the Gram-Schmidt process does not alter the nature of the problem. We can always restore the order by shifting the columns of A and the corresponding rows of R under these circumstances, the resulting matrix Q will still be orthonormal and R will be in echelon form. Modified Gram-Schmidt Method: This is a slightly different way of transforming a given set at, a2, ... ,an of vectors into an orthogonal set of vectors. The procedure is carried out in n stages as detailed in the following.
Factorization of Matrices
171
Stage 1 Set bl =
al'
Compute
Stage 2 Set
b2
= a~l). Compute
b*a(l) l'
24 -
2 4
bb
22
and
and
Stage
en - 1)
Set bn-l =
(n-2) an_I'
Tn-l,n
=
Compute
b*n-I an(n-2) b* b n-l n-l
d an
(n-l) _ (n-2) b an - an - Tn -l,n n-l'
Stage n Set
bn
= a~n-l).
If, at any stage, b i = 0, set Ti,i+l = Ti,i+2 = ... = Ti,n = O. One can check that bi bj = 0 for all i i= j. The set bl , b2, ... , bn of vectors and the coefficients Tik'S can be used to set out the QR decomposition of the matrix A = (aI, a2, ... , an).
172
MATRIX ALGEBRA THEORY AND APPLICATIONS
The singular value decomposition (SVD) of a matrix of order m x n is a basic result in matrix algebra. A matrix A of order ~ x n with rank a can be
P 5.3.4 (SVD) factorized as
A = PL:::.Q*
(5.3.5)
where P is of order m x a such that P* P = la, Q is of order n x a such that Q*Q = Ia and L:::. is a diagonal matrix of order a x a with all positive entries. PROOF. A simple proof of (5.3.5) depends on what is called SD (spectral decomposition) of a Hermitian matrix derived later in (5.3.9). Note that AA* is Hermitian of order m x m and using (5.3.9), we have
\2 * + ... + AaPaP a [ = PIPI*+ ... + PaPa*+ Pa+IPa+1 * + ... + PmP m *
\2 * AA * = AIPIPI
where PI, ... ,Pm are orthonormal and \2 1'f 2. = J,..2 = 1, ... , a Pi*AA* Pj = Ai =Oifi#j,i=1, ... ,m,j=1, ... ,m As a consequence of the above two equations, we have
= 0, i = a + 1, ... ,m and with q; = \-lp;A,i = 1, ... ,a P; A
q;qj = 1 ifi =j = 0 if i
# j.
Now consider A = (PIP~
+ ... + Pmp~)A
+ Pap~A + Pa+1P~+1A + ... + Pmp:nA AIPlqi + ... + AaPaq~
= PIPiA + ... =
= PL:::.Q*
where P = (PI: ... : Pa), Q = (ql : ... : ql) . This completes the proof. If A is m x m Hermitian, i.e., A = A*, and non-negative definite (nnd), i.e., x* Ax 2: 0 or positive definite (pd), i.e., x* Ax > 0 for all complex vectors x, then the decomposition takes a finer hue as established in P 5.3.5 and P 5.3.9.
Factorization of Matrices
173
P 5.3.5 (Cbolesky Decomposition) Let A be a non-negative definite matrix of order m x m. Then A can be factorized as
(5.3.6)
A=KK*,
where K is a lower-triangular matrix with non-negative entries in the diagonal. If A is positive definite, then all the entries in the diagonal of K are positive and the decomposition is unique. PROOF. The result is obviously true when m = 1. We will establish the result by induction. Assume that the result is true for any nonnegative definite matrix of order (m - 1) x (m - 1) . Let A be a nonnegative definite matrix of order m x m. Partition A as
A =
[a
2
aa
aa*]
Al
'
where a is a real scalar, a is a column vector of order (m - 1) x 1 and the matrix Al is of order (m - 1) x (m - I). (Check that it is possible to partition A the way we did.) Suppose a i= 0. Let x be any vector of order m x 1 partitioned as x' = (xlix'), where Xl is a scalar and x is a column vector of order (m - 1) x 1. Since A is non-negative definite,
°: ; =
X* Ax
= a21xl12 + aXlx*a + aXla*x + x* Alx
-*(A I X +X IaXI + a *-12
'- aa *)-x.
Since this inequality is valid for all complex vectors x, it follows that At - aa* is non-negative definite. By the induction hypothesis, there exists a lower-triangular matrix L of order (m - 1) x (m - 1) with nonnegative diagonal entries such that Al - aa* = LL *. Observe that
[~ ~] [~
a* ] =
L*
[a
2
aa
Take
If a
aa*
K=
[~
K=
[~
= 0, choose
aa*
+ LL*
] _
-
[a
2
aa
aa*]_
Al
- A.
174
MATRIX ALGEBRA THEORY AND APPLICATIONS
where Al = M M* with M being a lower-triangular matrix with nonnegative diagonal elements. If A is positive definite, IKI 1= 0 which means that every diagonal entry of A is positive. In this case, suppose H H* is an alternative factorization of A. By directly comparing the elements of Hand K, one can show that H = K. P 5.3.6 (A General Decomposition Theorem) Let A be a square matrix of order m x m. Then A can be factorized as
A = prp*, where P is unitary and
r
(5 .3.7)
is upper-triangular.
We prove this result by induction. The case m = 1 is clear. Assume that the result is true for all matrices of order (m-l) x (m-1). Let A be a matrix of order m x m. Consider the equation Ax = AX in unknown A, a scalar, and x, a vector. Choose a A satisfying the determinantal equation IA - AIm I = 0, which is a polynomial in A of m-th degree. Let Al be one of the roots of the polynomial. Let Xl be a vector of unit length satisfying AXI = AlXI. Choose a matrix X of order m x (m - 1) such that the partitioned matrix (XliX) is unitary. One can verify that PROOF.
XjAX] X*AX . Note that X* AX is of order (m - 1) x (m - 1). hypothesis, we can write
By the induction
X* AX = QrlQ* with Q*Q
= Im -
l
and
rl
upper-triangular. Note that
[~l i~~]=[~ ~][~l Xj~~Q][~ Consequently,
XjAX] X*AX
[Xi] X* XjAXQ]
rl
[1
0
Factorization of Matrices
Let
175
~]
and
xtAXQ] rl Clearly, P is unitary and
r
.
is upper-triangular.
COROLLARY 5.3.7. Let A be of order m x m. In the factorization A = prp* in (5.3.7), the diagonal entries of r are the roots of the polynomial IA - >.Iml = 0 in >.. The diagonal entries of r in Corollary 5.3.7 have a special name. They are the eigenvalues of A. See Section 5.4 that follows. P 5.3.6 is also called Schur's Decomposition Theorem. We will discuss the roots of the polynomial IA - >'11 = 0 in >. in the next section. Some inequalities concerning these roots will be presented in a later chapter. We want to present two more decompositions of matrices before we close this section. First, we need a definition. A square matrix A is said to be normal if A* A = AA*. P 5.3.8 (Decomposition of a Normal Matrix) A normal matrix A can be factorized as
A
where
r
= prp*
(5.3.8)
is diagonal and P unitary.
PROOF. Let us use the general decomposition result (5.3.7) on A. Write A = pr P*, where r is upper-triangular and P unitary. Note that
prr* P* A* A = pr* P* prp* = pr*rp*.
AA* = prp* pr* P* = =
Consequently, rr* = diagonal. (Why?)
r*r.
If
r
is upper-triangular, then
r
has to be
P 5.3.9 (Decomposition of a Hermitian Matrix) A Hermitian matrix A can be factorized as A = prp*,
(5.3.9)
176
where
MATRIX ALGEBRA THEORY AND APPLICATIONS
r
is diagonal with real entries and P unitary.
PROOF. Let us use the general decomposition result (5.3.7) again. Write A = pr P*, where r is upper-triangular and P unitary. Note that A" = pr* P* = A = Pf p... Consequently, r = r* and hence r is diagonal with real entries.
Complements 5.3.1 Let E(w) be the Householder matrix based on the vector w of unit length. Show that: E(w)a = a, if a and ware orthogonal, i.e., a'"w = 0, and E(w)w = -w. 5.3.2 Let o 2 o 4 o 6 Obtain QR factorization of A following the method outlined in the proof of P 5.3.3. Obtain also QR factorization of A following the GramSchmidt as well as the modified Gram-Schmidt process. Comment on the computational stability of the two methods. 5.3.3 Let
A=
[all a12 J a21 a22
be non-negative definite. Spell out explicitly a Cholesky decomposition of A. If A is non-negative definite but not positive definite, explore the source of non-uniqueness in the Cholesky decomposition of A. 5.3.4 The QR factorization is very useful in solving systems of linear equations. Suppose Ax = b is a system of linear equations in unknown x of order n xl, where A of order m x nand b of order m x 1 are known. Suppose a QR decomposition of A, i.e., A = QR is available. Rewriting the equations as Rx = Q*b, demonstrate a simple way of solving the linear equations. 5.3.5 (SQ Factorization) If A is an m x n matrix, m ~ Tt, show that A can be factorized as A = SQ where S is m x m lower triangular and Q is m x n matrix with orthonormal rows. 5.3.6 Work out the SVD (singular value decomposition) of the matrix A in the example 5.3.2 above.
Factorization of Matrices
177
5.4. Eigenvalues and Eigenvectors In many of the factorization results on matrices, the main objective is to reduce a given matrix to a diagonal matrix. The question is whether one can attach some meaning to the numbers that appear in the diagonal matrix. In this section, we embark on such an investigation. DEFINITION 5.4.1. Let A be a square matrix of order m x m with complex entries. A complex number A is said to be an eigenvector of A if there exists a non-zero vector x such that
Ax
= AX.
(5.4.1)
We have come across the word "eigenvalue" before in a different context in Chapter 2. There is a connection between what was presented in Chapter 2 in the name of "eigenvalue" and what we are discussing now. This connection is explored in the complements at the end of the section. Rewriting (5.4.1) as (5.4.2) we see that any A for which A - AIm is singular would produce a nonzero solution to the system (A - AIm)x = 0 of homogeneous linear equations in x. The matrix A - AIm being singular would imply that the determinant IA - AIml = O. But IA - AIml is a polynomial in A of degree m. Consequently, one can conclude that every matrix of order m x m has m eigenvalues which are the roots of the polynomial equation IA - AIml = 0 in A. Suppose A is an eigenvalue of a matrix A. Let x be an eigenvector of A corresponding to A. If a is a non-zero scalar, ax is also an eigenvector corresponding to the eigenvalue A. For some special matrices eigenvalues are real.
P 5.4.2 real.
Let A be a Hermitian matrix. Then all its eigenvalues are
Let A be an eigenvalue and x a corresponding eigenvector (aij), i.e., Ax = AX. Take x to be of unit length. This implies
PROOF.
of A that
=
m
x· Ax
m
= L: L: aij XiXj = AX·X = A, i=l j=l
178
MATRIX ALGEBRA THEORY AND APPLICATIONS
where x* = {x}, X2,'" ,xm). We show that x" Ax is real. It is clear that aiixixi is real for all i. If i 1= j,
is real. Hence A is real. In Section 5.3, we presented a decomposition of a Hermitian matrix. The entries in the diagonal matrix involved have a special meaning. p 5.4.3 Let A be a Hermitian matrix and A = prP* its decomposition with P unitary and r = Diag{A}, A2,'" ,Am}. Then A}, A2, .. . ,Am are the eigenvalues of A. PROOF. Look at the determinantal equation IA-AIml = 0 for eigenvalues of A. Note that
IA - AIml = Iprp* - AIml = Iprp* - APP*I -
IP{r - AIm)P*1 = !PI jr - AIm I IP*I
= Ir -
AIml
=
{AI - A){A2 - A)'" (Am - A).
The proof is complete. It can also be verified that the i-th column vector of P is an eigenvector corresponding to the eigenvalue Ai of A. We now spend some time on eigenvectors. Let A and J.L be two distinct eigenvalues of a Hermitian matrix A and x and y the corresponding eigenvectors. Then x and yare orthogonal. Observe that Ax
= AX ~ y* Ax = AY*x,
Ay = J.Ly ~ x* Ay
= J.Lx*y.
Since x* Ay = y* Ax and x*y = y*x, we have (A - J.L)Y*x = 0,
from which we conclude that y*x = 0, i.e., x and yare orthogonal. If A is any matrix and A and J.L are two distinct eigenvalues of A with corresponding eigenvectors x and y, the good thing we can say about x and y is that they are linearly independent. Suppose they are linearly dependent. Then there exist two scalars a and f3 such that
ax + f3y = O.
(5.4.3)
Factorization of Matrices
Since the vectors x and yare non-zero, both zero. Note that
Q
179
and f3 have to be non-
0= A(QX + f3y) = QAx + f3Ay = QAX + f3J.LY.
(5.4.4)
Multiplying (5.4.3) by J.L and then subtracting it from (5.4.4), we note that QAX - QJ.LX = 0 <=> Q(A - J.L)x = O.
Since Q =1= 0, A =1= J.L, and x =1= 0, we have a contradiction to the assumption of linear dependence of x and y. We will now discuss the notion of multiplicity in the context of eigenvalues and eigenvectors. The mUltiplicity aD of a root AD of the equation IA - All = 0 is called its algebraic multiplicity. The number 90 of linearly independent solutions of the system (A - AOI)X = 0 of equations is called the geometric multiplicity of the root AD. If A is Hermitian, we show that 90 = aD. This result follows from the decomposition theorem for Hermitian matrices reported in P 5.3.8. From this theorem, we have a unitary matrix P = (XI,X2"" ,xm ) such that A = Pt:::,.P*, where t:::,. = Diag{A}, A2,'" ,Am}, Xi is the i-th column of P, and Ai'S are the eigenvalues of A. Let A(I),A(2),'" ,A(8) be the distinct values among AI, A2, ... ,Am with multiplicities aI, a2, ... ,as respectively. Clearly, al + a2 + ... + a 8 = m. Assume, without loss of generality, that the first al numbers among AI, A2, ... ,Am are each equal to A(I)' the next a2 numbers are each equal to A(2) and so on. Since
and Xi, i = 1,2, ... ,al are orthogonal, it follows that al ~ 91' A similar argument yields that ai ~ 9i for i = 2, 3, ... ,s. Consequently,
m=
8
8
i=1
i=1
L ai :::; L 9i :::; m.
(We cannot have more than m linearly independent vectors.) Hence ai = 9i for all i. This result is not true for any matrix. As an example, let
A=
[~
~].
180
MATRIX ALGEBRA THEORY AND APPLICATIONS
Zero is an eigenvalue of A of multiplicity 2. The corresponding eigenvector is of the form with x i= 0. Consequently, the geometric multiplicity of the zero eigenvalue is one. We will now present what is called the spectral decomposition of a Hermitian matrix A. This is essentially a rehash of P 5.3.8. From the decomposition A = P 6.P* we can write
A = >'IXIXr + >'2X2XZ + ... + where P
>'mxmx~
= (Xl, X2, ... ,xm ), Xi is the i-th column of P, 6. = Diag{>.l, >'2,.·. ,>'m}.
(5.4.5) and
Without loss of generality, assume that >'i i= 0, i = 1,2, ... ,r and >'i = 0, i = r+ 1,1'+2, ... ,m. Let >'(1),>'(2), ... ,>'(5) be the distinct values among >'1, ... ,>'m. We can rewrite (5.4.5) as
A = >'(1) EI + >'(2) E2 + ... + >'(5) E s ,
(5.4.6)
where Ei is the sum of all the matrices XjX; associated with the same eigenvalue >'i. Note that E; = E i , Ei = Ei for all i, and EiEj = for all i i= j. FUrther, p(Et} is the multiplicity of the root >'(1). The form (5.4.6) is the spectral decomposition of A. The spectral decomposition (5.4.6) is unique. We will demonstrate its uniqueness as follows. Suppose
°
(5.4.7) is another decomposition of A with the properties F? = Fi for all i and FiFj = for all i i= j. Subtracting (5.4.7) from (5.4.6), we note that
°
>'(1)(E1
-
FI ) + >'(2)(E2 - F2) + ... + >'(s)(Es - F 5 )
= 0.
(5.4.8)
MUltiplying (5.4.8) by Ei on the left and Fj on the right, for i i= j, we have EiFj = 0. Multiplying (5.4.8) by Ei on the left, we note that Ei = EiFi. Using a similar argument, we can show that EiFi = Ei. Thus we have Ei = Fi for all i. Now we take up the case of an important class of matrices, namely non-negative definite (also called positive senli-definite) matrices.
Factorization of Matrices
181
DEFINITION 5.4.4. A Hermitian matrix A is said to be non-negative definite (nnd) if x· Ax ~ 0 for all column vectors x E em. Non-negative definite matrices are like non-negative numbers. For example, we can take the square root of an nnd matrix which is also nnd. We shall now identify a special subclass of nnd matrices.
5.4.5. A Hermitian matrix A is said to be positive definite (abbreviated as pd) if A is nnd and x· Ax = 0 if and only if x = O. What we need is a tangible criterion to check whether or not a given Hermitian matrix is nnd. In the following result, we address this problem. DEFINITION
p 5.4.6
(1) (2) (3) (4)
Let A be a Hermitian matrix. The matrix A is nnd if and only if all its eigenvalues are nonnegative. For the matrix A to be nnd, it is necessary (not sufficient) that all its leading principal minors are non-negative. The matrix A is positive definite if and only if all its eigenvalues are positive. The matrix A is positive definite if and only if all its leading principal minors are positive.
(1) Let A be an eigenvalue of A and x a corresponding eigenvector, i.e., Ax = AX. Note that 0 ::; x· Ax = AX·X from which we have A ~ O. Conversely, suppose every eigenvalue of A is nonnegative. By P 5.3.8, there exists a unitary P such that P* AP = !:::,. = Diag{AI, A2, ... , Am}, where AI, A2, ... , Am are the eigenvalues of A. Let x E em be any given vector. Let y = p. x = (Yll . .. , Ym)'. Then PROOF.
x* Ax = y* P* APy = y. !:::,.y m
=
L AilYil
2
~ O.
i=l
(2) Suppose A is nnd. Then the determinant IAI of A is nonnegative. This follows from P 5.3.8. As a matter of fact,
Observe that any principal submatrix of A is also non-negative definite. Consequently, the leading principal minors of A are non-negative.
182
MATRIX ALGEBRA THEORY AND APPLICATIONS
(3) A proof of this assertion can be built based on the proof of (I). (4) If A is positive definite, it is clear that every leading principal minor of A is positive. Conversely, suppose every leading principal minor of A is positive. We want to show that A is positive definite. Let A = (aij). By hypothesis, au > o. Using au as a pivot, sweep out the first colunm and first row of A. Let B be the resultant matrix. Write
We note the following. 1. Every principal minor of A and the corresponding principal minor of B which includes the first row are equal. 2. Any leading principal minor of Bl of order k x k is equal to (a!l) times the leading principal minor of Al of order (k + 1) x (k + 1). These facts are useful in proving the result. We use induction. The result is obviously true for 1 x 1 matrices. Assume that the assertion holds for any matrix of order (m - 1) x (m - 1). The given matrix A is assumed to be of order m x m and for which all the leading principal minors are positive. Let B be the matrix obtained from A as above. It now follows that every leading principal minor of BI is positive. By the induction hypothesis, BI is positive definite. Consequently, B is positive definite. Hence A is positive definite. (Why?) This completes the proof. Note the difference between the assertions (2) and (4) of P 5.4.6. If all the principal leading minors of A are non-negative, it does not follow that A is non-negative definite. For an example, look at
However, if all the principal minors of the Hermitian matrix are nonnegative, then A is non-negative definite. Complements
5.4.1 Let AI, A2, ... ,Am be the eigenvalues of a matrix A of order m x m. Show that m
Tr A
= LAi i=l
and
IAI = n~lAi.
Factorization of Matrices
183
If A is an eigenvalue of A, show that A2 is an eigenvalue of A2. 5.4.3 Let A be a Hermitian matrix of order m X m with complex entries. For row vectors x, y E em, let K(x, y) = xAy*. Show that K(·,·) is a Hermitian conjugate bilinear functional on the vector space
5.4.2
em.
5.4.4 Let K(-'·) be as defined in Complement 5.4.3. Let < ',' > be the usual inner product on em. Let Al 2:: A2 2:: ••. 2:: Am be the eigenvalues of K(·, .). Show that AI, A2, ... ,Am are the eigenvalues of A. 5.4.5 Let A be a Hermitian matrix of order m x m and Ai the i-th leading principal minor of A, i = 1,2, ... ,m. If Al > 0, ... ,Am-I> and Am 2:: 0, show that A is nnd.
° =°
5.4.6 Let A = (aij) be an nnd matrix. If aii = 0, show that aij for all j. 5.4.7 Let A be an nnd matrix. Show that there exists an nnd matrix B such that B2 = A. Show also that B is unique. (It is customary to denote B by Al/2.)
5.4.8 If A is nnd, show that A 2 is nnd. 5.4.9 If A is pd, show that A-I is pd. 5.4.10 If A is positive definite, show that these exists a non-singular matrix C such that C* AC = I. 5.4.11 If A = (aij) is nnd, show that IAI ::; rr~Iaii. 5.4.12 For any matrix B of order m X n, show that BB* is nnd. What is the relationship between eigenvalues of B B* and singular values of B? 5.4.13 Let B = (b ij ) be a matrix of order m x m with real entries. Show that
Hint. Look at A = B' B. 5.4.14 Let A be an n X n nonsingular matrix with complex entries. Show that (A -1)' = A-I if A' = A, where A' is the transpose of A. 5.4.15 Show that a complex symmetric matrix need not be diagonalizable. 5.4.16
Show that every square matrix is similar to its transpose.
184
MATRIX ALGEBRA THEORY AND APPLICATIONS
5.4.17
Let A be a matrix of order m x m given by
A=
[ ~ i : : : :i]. p
p
p
p
1 p
... .. .
Show that A is positive definite if and only if - m~l < p < l. 5.4.18 Let A and B be two Hermitian matrices of the same order. Say A ~ B if A - B is nnd. Prove the following. (1) (2) (3) (4)
c.
If A ~ Band B ~ C, then A ~ If A ~ Band B ~ A, then A = B . If A and Bare nnd and A ~ B, then IAI ~ IBI· If A and B are positive definite, A ~ B, and IAI = IBI, then
A=B. 5.4.19 A Hermitian matrix A is said to be negative semi-definite if x· Ax ~ 0 for all column vectors x E em. A Hermitian matrix A is said to be negative definite if A is negative semi-definite and x'" Ax = 0 only if x = o. Formulate P 5.4.6 for these matrices. 5.5. Simultaneous Reduction of Two Matrices The principal goal of this section is to investigate under what circumstances two given matrices can be factorized in such a way that some factors are common. First, we tackle Hermitian matrices. Before this, we need a result which is useful in our quest. P 5.5.1 Let A be a square matrix of order n x n with complex entries. Let x be a non-zero column vector of order n x 1. Then there exists an eigenvector y of A belonging to the span of x, Ax, A 2x, .... PROOF. The vectors x, Ax, A2 x , ... cannot all be linearly independent. Let k be the smallest integer such that
(5.5.1) for some scalars bo, bl , ... ,b k of the polynomial,
1.
(Why?) Let ILl, IL2,
. .. ,ILk
be the roots
Factorization of Matrices
185
of degree k in z, Le., we can write
z k + bk-lZ k-l + bk-2Z k-2 + ... + blZ+ b0 = (z - JLd(z - JL2) ... (z - JLk). Consequently,
0= Akx + bk_lAk-ix + bk_ 2A k- 2x + ... + blAx + box = (A - JLlI)(A - JL2I) ... (A - JLkI)X. Let Y = (A - JL2I)(A - JL3I) ... (A - JLkI)X. It is clear that Y =1= 0 (Why?) and Y is an eigenvector of A. Further, Y belongs to the span of the vectors x, Ax, A2x, ... , Ak-lx. This completes the proof. p 5.5.2 Let A and B be two Hermitian matrices of the same order n X n. Then a necessary and sufficient condition that A and B have factorizations, A = P /::"1 P* and B = P /::"2P*, with P unitary and /::"1 and /::"2 diagonal matrices is that A and B commute, Le., AB = BA. If A and B have the stipulated factorizations, it is clear that AB = BA. On the other hand, let A and B commute. Let Yl be an eigenvector of B. For any positive integer 1·, we show that ArYl is also an eigenvector of B provided that it is non-zero. Since BYI = AYI for some scalar A, we have B(Ar yl ) = Ar BYI = AAr yl , from which the avowed assertion follows. We now look at the sequence Yl, AYl, A2 yI , ... of vectors. There is a vector PI in the span of the sequence which is an eigenvector of A. The vector PI is obviously an eigenvector of B. Thus we are able to find a common eigenvector of both A and B. We can take PI to be of unit length. Let Y2 be an eigenvector of B orthogonal to Pl. We claim that PI is orthogonal to every vector in the span of Y2, AY2, A2y2 , .. .. For any r 2: 1, PROOF.
* (A r)* Y2 PI = Y2*Ar PI = Y2*( a rPI) = a r Y2PI = 0,
where a is the eigenvalue of A associated with the eigenvector PI of A. Following the argument used earlier in the proof, we can find a vector P2 in the span of Y2, AY2, A 2Y2, ... which is a common eigenvector of both A and B. It is clear that P2 and PI are orthogonal. Take P2 to be of unit length. Continuing this way, we obtain orthonormal eigenvectors
186
MATRIX ALGEBRA THEORY AND APPLICATIONS
Pt.P2,'" ,Pn common to both A and B. Let P = (PI,P2,'" ,Pn)' Note that P is unitary and both P* AP and P* B P are diagonal. This completes the proof. Thus the above result clearly identifies the situation in which two Hermitian matrices are diagonalizable by the same unitary matrix. This result can be extended for any number of matrices. The diagonal matrices involved in the decomposition consist of eigenvalues of their respective matrices in the diagonals.
COROLLARY 5.5.3. Let AI, A 2 , .•. ,Ak be a finite number of Hermitian matrices of the same order. Then there exists a unitary matrix P such that P* AiP is diagonal for every i if and only if the matrices commute pairwise, i.e., AiAj = AjAi for all i =1= j. Finally, we present a result similar to the one presented in P 5.5.2 for special matrices. p 5.5.4 Let A and B be two Hermitian matrices at least one of which is positive definite. Then there exists a non-singular matrix C such that C* AC and C* BC are both diagonal matrices. PROOF. Assume that A is positive definite. Then there exists a noosingular matrix D such that D* AD = I. See Complements 5.4.10. Since D* BD is Hermitian, there exists a unitary matrix V such that V*(D* BD)V is diagonal. Take C = DV. Then C* AC = V* D* ADV = V" IV = I, which is obviously diagonal. This completes the proof. It will be instructive to explore the nature of entries in the diagonal matrix C* BC in P 5.5.4. Let C" BC = 6. = diag{ aI, a2, ... ,am}. The equation C* AC = I implies that C" = C-I A -I. Consequently,
6. = C* BC = C- I A-I BC from which we have A-IBC = C6..
(5.5.2)
Let Xi be the i-th column vector of C which is obviously non-zero. From (5.5.2), we have
A - I Bx't --
...
""X ' \.At t,
which means that ai is an eigenvalue of A - I B and Xi the corresponding eigenvector for every i. It remains to be seen that all the eigenvalues of A-I are accounted by al,a2, ... ,am, i.e., al,a2, ... ,am are the roots
Factorization of Matrices
187
of the detenninantal equation IA-l B - all = O. The linear independence of Xi'S settles this question. The Xi'S have an additional nice property: xi AXj = 0 for all i j. The detenninantal equation IA-I B - all = 0 makes an interesting reading. This equation is equivalent to the equation IB - aAI = O. The roots of this equation can be called the eigenvalues of B with respect to the positive definite matrix A! The usual eigenvalues of B as we know them traditionally can now be called the eigenvalues of B with respect to the positive definite matrix I.
t=
Complements 5.5.1 If A is positive definite and B is Hermitian, show that the eigenvalues of B with respect to A are real. 5.5.2 Let AI, A 2 , .•. ,Ak be k Hermitian matrices with Al positive definite. Show that there exists a non-singular matrix C such that C* AiC is diagonal for every i if and only if AiAII Aj = AjAII Ai for all i and j. 5.5.3 Let
A=
[: :-:I 3
-1
and B =
13
Determine a non-singular matrix which diagonalizes A and B simultaneously. 5.5.4 Suppose A and B are Hermitian and commute. Let >q, A2, ... ,Am be the eigenvalues of A and J.Ll, J.L2, ••. ,J.Lm those of B. Show that the eigenvalues of A + Bare
for some permutation iI, i 2, ... ,im of 1,2, ... ,m. 5.5.5 (Polar Decomposition) Let A be an n xn nOllsingular matrix. Then there exist a positive definite matrix C and an orthogonal matrix H such that A = CH. (Hint: Consider the positive definite matrix AA' and take C = (AA')t the positive definite square root of AA'. Then take H = C-I A. Verify that H is orthogonal.)
188
MATRIX ALGEBRA THEORY AND APPLICATIONS
5.6. A Review of Matrix Factorizations Because of the importance of matrix factorization theorems in applications, all major results of Chapter 5 and some additional propositions not proved in the chapter are recorded in this section for ready reference. The reader may consult the references given to books and papers for further details. We use the following notations. The class of m x n matrices is represented by Mm,n and the matrices of order n x n by Mn. Triangular Matrices. A matrix A = (aij) E Mn is called upper triangular if aij = 0 whenever j < i and lower triangular if aij = 0 whenever j > i. A unit triangular matrix is a triangular matrix which has unities in the diagonal. Permutation Matrices. A matrix P E Mn is called a permutation matrix if exactly one entry in each row and column is equal to 1, and all other entries are zero. Premultiplication by such a matrix interchanges the rows and postmultiplication, the columns. Hessenberg Matrices. A matrix A E Mn is said to be an upper Hessenberg matrix if aij = 0 for i > j + 1, and its transpose is called a lower Hessenberg matrix. if
Tridiagonal Matrices. A E Mn is said to be a tridiagonal matrix = 0, whenever Ii - jl > 1. For example
aij
a12
a13
a12
0
a22
a23
a22
a23
a32
a33
a32
a33
o
a43
o
a43
are upper Hessenberg and tridiagonal matrices respectively. Givens Matrices. A matrix A(l, m; c, s) E Mn is said to be a Givens matrix if aii alm
= =
= a mm = c and aij = 0 elsewhere.
1, i =1= l, i =1= m; -S, aml
= s,
all
We may choose c = cosO and s = sinO. Geometrically, the Givens matrix A = (l, m; c, s) rotates the l-th and m-th coordinate axes in the (l, m)-th plane through an angle O.
Factorization of Matrices
189
Other matrices. A E Mn is said to be Hennitian if A = A*, positive (negative) definite if x* Ax > 0« 0) for all nonzero x E en and non-negative (nonpositive) definite if x* Ax ;::: O(x* Ax ::; 0) for all x E en. The abbreviations pd is used for positive definite and nnd for non-negative definite. An alternative term for non-negative definite used in books on algebra is positive semi-definite abbreviated as psd. P 5.6.1 (Rank Factorization) Let A E Mm,n and rank p(A) = k. Then there exist matrices R E Mm,k, FE Mk,n and p(R) = p(F) = k such that A = RF. In the following propositions A E Mn represents a general matrix, L E M n , a lower triangular matrix and U E M n , an upper triangular matrix, all with complex entries unless otherwise stated. P 5.6.2
(LU Factorization Theorems)
(1) If all the leading principal minors of A are nonzero, then A can be factorized as A = LU, where the diagonal entries of L can all be chosen as unities. (2) If A is nonsingular, there exists a permutation matrix P E Mn such that P A = LU, where the diagonal entries of L can all be chosen as unities. (3) In any case there exist permutation matrices P1 , P2 E Mn such that A = P1 LU P2 . If A is nonsingular, it may be written as A = P1 LU. P 5.6.3
(Schur'S Triangulation Theorems)
(1) Let At, ... ,An be eigenvalues of A in any prescribed order. Then there exists a unitary matrix Q E Mn such that Q* AQ = U or A = QUQ*, where U is upper triangular with the eigenvalues A1, ... ,An as diagonal entries. That is, every square matrix is unitarily equivalent to a triangular matrix whose diagonal entries are in a prescribed order. If A is real and if all its eigenvalues are real, then U may be chosen to be real orthogonal. (2) Given a real A E Mn with k real eigenvalues, A1, ... ,Ak, and Xj + iYj as complex eigenvalues for j > k, there exists a real orthogonal matrix Q E Mn such that A = QRQ' where R is a quasi diagonal n x n matrix
190
MATRIX ALGEBRA THEORY AND APPLICATIONS
R=
and m = (n + k)/2, with blocks
R OJ·· of size
[
= [
xj -Cj
j b ], Jbjcj Xj
k,j ~ k
if i
1x2 2x 1
ifi~k,j>k
> k,j ~ k if i > k, j > k
if i
2x2 Zj
~
1x 1
=
Yj,
for j > kj
bj
2:
Cj, b j Cj
> O.
Note that ~i = Ai, i = 1, ... ,k. For an application of this result to probability theory see Edelman (1997). P 5.6.4 (QR Factorization) . Let A E Mm,n. Then A can be factorized as A = QR, where Q E Mm is unitary and
R= {
[~]
[R1 : 8 11
if m > n, if m ~ n,
where Ro E Mn and R1 E Mm are upper triangular and 8 1 E Mm ,n-m' P 5.6.5 (Upper Hessenberg Reduction) For any A, there exists a unitary matrix Q E Mn such that QAQ* = Hu (upper Hessenberg). P 5.6.6 (Tridiagonal Reduction). If A is Hermitian, there exists a unitary matrix Q E Mn such that QAQ* = HT (Tridiagonal). P 5.6.7 (Normal Matrix Decomposition) If A is normal, i.e., AA* = A* A, then there exists a unitary matrix Q E Mn such that A = QAQ*, where A E Mn is a diagonal matrix with the eigenvalues of A as diagonal elements. (Spectral Decomposition) Let A be Hermitian. Then A = QAQ*, where Q E Mn is unitary and A E Mn is diagonal with real entries, which are the eigenvalues of A. If A is real symmetric, then P 5.6.8
Factorization of Matrices
A
= QAQ' where Q
191
is orthogonal.
P 5.6.9 (Singular Value Decomposition) For A E Mm,n, we have A = V D. W*, where V E Mm and W E Mn are unitary matrices and D. E Mm,n has non-negative elements in the main diagonal and zeros elsewhere. If p(A) = k, then A = VoD.oWO', where Vo E Mm,k and Wo E Mn,k are such that Vo*Vo = h, WO'Wo = hand D.o E Mk is a diagonal matrix with positive elements in the main diagonal. P 5.6.10 (Hermitian Matrix Decomposition) If A is Hermitian, we have the factorization A = SD.S*, where S E Mn is nonsingular and D. E Mn is diagonal with +1,-1 or 0 as diagonal entries. The number of +l's and -l's are same as the number of positive and negative eigenvalues of A and the number of zeros is n - p(A). P 5.6.11 (Symmetric Matrix Decomposition) If A is real symmetric, then A has the factorization A = SD.S', where S E Mn is nonsingular and D. is diagonal with +1 or 0 as diagonal entries and p(D.) = p(A). P 5.6.12 (Cholesky Decomposition) If A is Hermitian and nnd, then it can be factorized as A = LL *, where L E Mn is lower triangular with non-negative diagonal entries. The factorization is unique if A is nonsingular. P 5.6.13 (General Matrix Decomposition) Any A can be factorized as A = SQ'L,Q*S-I, where S E Mn is nonsingular, Q E Mn is unitary and 'L, is diagonal with non-negative entries. P 5.6.14 (Polar Decomposition) Any A with rank k can be factorized as A = SQ, where S E Mn is nnd with rank k and Q E Mn is unitary. If A is nonsingular, then A can be factorized as A = CQ, where QQ' = I and C = C'. P 5.6.15 (Jordan Canonical Form) Let A be a given complex matrix. Then there is a nonsingular matrix S E Mn such that A = SJS-l where J is a block diagonal matrix with the r-th diagonal block as Jnr(Ar) E M nr , r = 1, ... , k and nl + .. .+nk = n. The Ai'S are eigenvalues of A which are not necessarily distinct. The matrix Jnr(Ar) = (aij) is defined as follows (see Horn and Johnson (1985) for a detailed proof):
a_ii = λ_r, i = 1, ..., n_r;   a_{i,i+1} = 1, i = 1, ..., n_r − 1;   a_ij = 0 elsewhere.
P 5.6.16 (Takagi Factorization) If A ∈ Mn is symmetric (i.e., A = A'), then there exists a unitary Q ∈ Mn and a real non-negative diagonal matrix Σ such that A = QΣQ'. The columns of Q are an orthogonal set of eigenvectors of AĀ, and the corresponding diagonal entries of Σ are the non-negative square roots of the corresponding eigenvalues of AĀ. [If A = (a_ij), then Ā = (ā_ij), where ā_ij is the complex conjugate of a_ij.] P 5.6.17 Let A ∈ Mn be given. There exists a unitary U ∈ Mn and an upper triangular Δ ∈ Mn such that A = UΔU' if and only if all the eigenvalues of AĀ are real and non-negative. Under this condition all the main diagonal entries of Δ may be chosen to be non-negative. P 5.6.18 (Complete Orthogonal Theorem) Given A ∈ Mm,n with ρ(A) = k, there exist unitary matrices Q ∈ Mm and W ∈ Mn such that
Q*AW = [ U  0 ; 0  0 ],
where U ∈ Mk is upper triangular.
P 5.6.19 (Similarity of Matrices) Every A E Mn is similar to a symmetric matrix. [A is similar to B if there exists a nonsingular S E Mn such that B = S-1 AS.]
P 5.6.20 (Simultaneous Singular Value Decomposition) Let A, B ∈ Mm,n. Then there exist unitary matrices P ∈ Mm and Q ∈ Mn such that A = PΣ_1Q* and B = PΣ_2Q*, with both Σ_1 and Σ_2 ∈ Mm,n and diagonal, if and only if AB* and B*A are both normal [G is said to be normal if GG* = G*G].
P 5.6.21 For a set F = {A_i, i ∈ I} ⊂ Mm,n, there exist unitary matrices P and Q such that A_i = PΛ_iQ* for all i ∈ I with the Λ_i all diagonal if and only if each A_i*A_j ∈ Mn is normal and G = {A_iA_j* : i, j ∈ I} ⊂ Mm is a commuting family. P 5.6.22 Let A = A*, B = B* and AB = BA. Then there exists a unitary matrix U such that UAU* and UBU* are both diagonal. P 5.6.23 The Hermitian matrices A_1, A_2, ... are simultaneously diagonalizable by the same unitary matrix U if they commute pairwise.
Note: The main references for this Chapter are: Bhatia (1991), Datta (1995), Golub and van Loan (1989), Horn and Johnson (1985, 1990), and Rao (1973c).
CHAPTER 6
OPERATIONS ON MATRICES
Matrix multiplication is at the core of a substantial number of developments in Matrix Algebra. In this chapter, we look at other multiplicative operations on matrices and their applications.
6.1. Kronecker Product
Let A = (a_ij) and B = (b_ij) be two matrices of order m × n and p × q, respectively. The Kronecker product of A and B is denoted by A ⊗ B and defined as the block matrix
A ⊗ B = (a_ij B) = [ a_11 B  a_12 B  ...  a_1n B ; a_21 B  a_22 B  ...  a_2n B ; ... ; a_m1 B  a_m2 B  ...  a_mn B ].
For a specific example, take m = 3, n = 2, p = 2, and q = 3. The Kronecker product of the 3 × 2 matrix A and the 2 × 3 matrix B spreads out to be a 6 × 6 matrix:

A ⊗ B = [ a11b11  a11b12  a11b13 | a12b11  a12b12  a12b13
          a11b21  a11b22  a11b23 | a12b21  a12b22  a12b23
          a21b11  a21b12  a21b13 | a22b11  a22b12  a22b13
          a21b21  a21b22  a21b23 | a22b21  a22b22  a22b23
          a31b11  a31b12  a31b13 | a32b11  a32b12  a32b13
          a31b21  a31b22  a31b23 | a32b21  a32b22  a32b23 ].
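The block structure above is easy to try out numerically. The following is a small sketch (not part of the text), using NumPy's np.kron for the Kronecker product:

```python
import numpy as np

# A is 3 x 2 and B is 2 x 3, as in the example above.
A = np.arange(1, 7).reshape(3, 2)      # [[1, 2], [3, 4], [5, 6]]
B = np.arange(1, 7).reshape(2, 3)      # [[1, 2, 3], [4, 5, 6]]

K = np.kron(A, B)                      # the 6 x 6 Kronecker product
print(K.shape)                         # (6, 6)

# The (i, j)-th 2 x 3 block of A ⊗ B equals a_ij * B.
assert np.allclose(K[0:2, 0:3], A[0, 0] * B)
assert np.allclose(K[4:6, 3:6], A[2, 1] * B)
```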
In the general case, the Kronecker product, A ⊗ B, of A and B is of order mp × nq. From the very definition of the Kronecker product, it is
clear that there are no restrictions on the numbers m, n, p, and q for the product to be meaningful. Notice also that the equality A ⊗ B = B ⊗ A rarely holds, just as is the case for the usual multiplication of matrices. However, the Kronecker product has one distinctive feature: the matrix B ⊗ A can be obtained from A ⊗ B by interchanging rows and columns of A ⊗ B. This feature is absent in the usual multiplication of matrices. We list some of the salient properties of this operation in the following proposition. Most of these properties stem directly from the definition of the product. P 6.1.1 (1) The operation of performing the Kronecker product on matrices is associative. More precisely, if A, B, and C are any three matrices, then (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).
It is customary to denote (A ⊗ B) ⊗ C by A ⊗ B ⊗ C. (2) If A, B, and C are three matrices with B and C being of the same order, then
A ⊗ (B + C) = A ⊗ B + A ⊗ C.
(3) If α is a scalar and A is any matrix, then
α ⊗ A = αA = A ⊗ α.
(In the Kronecker product operation, view α as a matrix of order 1 × 1.) (4) If A, B, C, and D are four matrices such that each pair A and C and B and D is conformable for the usual multiplication, then
(A ⊗ B)(C ⊗ D) = AC ⊗ BD.
(5) If A and B are any two matrices, then
(A ⊗ B)' = A' ⊗ B'.
(6) If A and B are any two matrices, then
(A ⊗ B)* = A* ⊗ B*.
(7) If A and B are square matrices not necessarily of the same order, then
tr(A ⊗ B) = [tr(A)][tr(B)].
(8) If A and B are non-singular matrices not necessarily of the same order, then
(A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}.
One of the most important questions concerning Kronecker products is about the relationship between the eigenvalues of the Kronecker product of two matrices and the eigenvalues of the constituent matrices of the product. We will address this question now. P 6.1.2 Let A and B be two square matrices with eigenvalues λ1, λ2, ..., λm and μ1, μ2, ..., μn, respectively. Then the λ_iμ_j, i = 1, 2, ..., m and j = 1, 2, ..., n, are the eigenvalues of A ⊗ B.
PROOF. If λ is an eigenvalue of A with a corresponding eigenvector x, and μ is an eigenvalue of B with a corresponding eigenvector y, it is easy to show that λμ is an eigenvalue of A ⊗ B with a corresponding eigenvector x ⊗ y. As a matter of fact, note that
(A ⊗ B)(x ⊗ y) = (Ax) ⊗ (By) = (λx) ⊗ (μy) = λμ(x ⊗ y),
which settles the avowed assertion. This does not prove that the eigenvalues of A ⊗ B are precisely λ_iμ_j, i ∈ {1, 2, ..., m} and j ∈ {1, 2, ..., n}. (Why?) Let us prove the assertion invoking the General Decomposition Theorem (P 5.3.6). There exist unitary matrices U of order m × m and V of order n × n such that UAU* = Δ_1 and VBV* = Δ_2, where Δ_1 and Δ_2 are upper-triangular, the diagonal entries of Δ_1 are the eigenvalues of A, and the diagonal entries of Δ_2 are the eigenvalues of B. Note that
(U ⊗ V)(A ⊗ B)(U ⊗ V)* = (U ⊗ V)(A ⊗ B)(U* ⊗ V*) = (UAU*) ⊗ (VBV*) = Δ_1 ⊗ Δ_2.
We also note that U ⊗ V is unitary and Δ_1 ⊗ Δ_2 upper-triangular. The diagonal entries of Δ_1 ⊗ Δ_2 should exhaust all the eigenvalues of A ⊗ B. (Why?) This completes the proof.
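As a quick numerical sanity check of P 6.1.2 (a sketch, not part of the text), one can compare the eigenvalues of A ⊗ B with the pairwise products of the eigenvalues of A and B:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))

lam = np.linalg.eigvals(A)
mu = np.linalg.eigvals(B)

# All products λ_i μ_j, sorted, should match the eigenvalues of A ⊗ B.
products = np.sort_complex(np.outer(lam, mu).ravel())
kron_eigs = np.sort_complex(np.linalg.eigvals(np.kron(A, B)))
print(np.allclose(products, kron_eigs))   # True (up to ordering and rounding)
```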
We now look at the status of eigenvectors. A proof of the following result has already been included in the proof of P 6.1.2.
P 6.1.3 Let x be an eigenvector corresponding to some eigenvalue of a matrix A and y an eigenvector corresponding to some eigenvalue of a matrix B. Then x ⊗ y is an eigenvector of A ⊗ B. A word of warning: not all eigenvectors of A ⊗ B arise in the way described in P 6.1.3. (See the contrast in the statements of P 6.1.2 and P 6.1.3.) As a counterexample, look at the following 2 × 2 matrices:
A = B = [ 0  1 ; 0  0 ].
Let x' = (1, 0). The eigenvalues of A are λ1 = λ2 = 0 and the eigenvalues of B are μ1 = μ2 = 0. Only non-zero multiples of x are the eigenvectors of A. But A ⊗ B has only three linearly independent eigenvectors, written as rows,
u' = (1, 0, 0, 0),   v' = (0, 1, 0, 0)   and   w' = (0, 0, 1, 0).
There is no way we can write u, v and w in the way expostulated in P 6.1.3! P 6.1.2 has a number of interesting implications. We chronicle some of these in the following proposition. P 6.1.4 (1) If A and B are non-negative definite matrices, so is
A ⊗ B. (2) If A and B are positive definite, so is A ⊗ B. (3) If A and B are matrices of order m × m and n × n, respectively, then (using the notation |·| for determinant),
|A ⊗ B| = |A|^n |B|^m.
(4) If A and B are two matrices not necessarily square, then
rank(A ⊗ B) = [rank(A)][rank(B)].
Hint: Look at (AA*) ⊗ (BB*). It is time to look at the usefulness of Kronecker products. Consider the linear equations
AX = B,    (6.1.1)
where A and B are known matrices of orders m × n and m × p, respectively, and X is of order n × p and unknown. As an example, look at
[ a11  a12 ; a21  a22 ] [ x1  x3 ; x2  x4 ] = [ b11  b12 ; b21  b22 ].    (6.1.2)
There are two ways we can write these equations in the format we are familiar with. One way is:
[ a11  a12  0  0 ; a21  a22  0  0 ; 0  0  a11  a12 ; 0  0  a21  a22 ] [ x1 ; x2 ; x3 ; x4 ] = [ b11 ; b21 ; b12 ; b22 ].    (6.1.3)
Let x' = (x1, x2, x3, x4) and b' = (b11, b21, b12, b22). The system (6.1.3) can be rewritten as
(I_2 ⊗ A)x = b.    (6.1.4)
Another way is:
[ a11  0  a12  0 ; 0  a11  0  a12 ; a21  0  a22  0 ; 0  a21  0  a22 ] [ x1 ; x3 ; x2 ; x4 ] = [ b11 ; b12 ; b21 ; b22 ].    (6.1.5)
Let y' = (x1, x3, x2, x4) and c' = (b11, b12, b21, b22). The system (6.1.5) can be rewritten as
(A ⊗ I_2)y = c.    (6.1.6)
In the general case of (6.1.1), let x be the column vector of order np × 1 obtained from X = (x_ij) by stacking the rows of X one by one, i.e.,
x' = (x_11, ..., x_1p, x_21, ..., x_2p, ..., x_n1, ..., x_np).
This vector has a special name. We will introduce this concept in the next section. Let b be the column vector of order mp × 1 obtained from B = (b_ij) in the same way x was obtained from X. The system of equations (6.1.1) can be rewritten as
(A ⊗ I_p)x = b.    (6.1.7)
Suppose m = n, i.e., A is a square matrix. Then the system (6.1.7) has a unique solution if A is non-singular. In the general case, a discussion of the consistency of the system AX = B of matrix equations now becomes easy in (6.1.7), courtesy of Kronecker products! Another matrix equation of importance is given by
AX + XB = C,    (6.1.8)
where A is of order m × m, B of order n × n, C of order m × n, and X of order m × n. The matrices A, B, and C are known and X is unknown. Let x be the column vector of order mn × 1 obtained from X by stacking the rows of X, and c is obtained analogously from C. It can be shown that the system (6.1.8) is equivalent to
(A ⊗ I_n + I_m ⊗ B')x = c.    (6.1.9)
Now we can say that the system (6.1.8) admits a unique solution if and only if the matrix
D = A ⊗ I_n + I_m ⊗ B'    (6.1.10)
of order mn × mn is non-singular. The matrix D is very special. It will certainly be of interest to know when it is non-singular. First, we would like to say something about the eigenvalues of D. P 6.1.5 Let D be as specified in (6.1.10), λ1, λ2, ..., λm be the eigenvalues of A, and μ1, μ2, ..., μn be those of B. Then the eigenvalues of D are λ_i + μ_j, i = 1, 2, ..., m and j = 1, 2, ..., n. PROOF.
Let ε > 0 be any number. Look at the product
(I_m + εA) ⊗ (I_n + εB') = I_m ⊗ I_n + ε(A ⊗ I_n + I_m ⊗ B') + ε²(A ⊗ B') = I_m ⊗ I_n + εD + ε²(A ⊗ B').
The eigenvalues of I_m + εA are 1 + ελ_1, 1 + ελ_2, ..., 1 + ελ_m and those of I_n + εB' are 1 + εμ_1, 1 + εμ_2, ..., 1 + εμ_n. Consequently, the eigenvalues of (I_m + εA) ⊗ (I_n + εB') are all given by
(1 + ελ_i)(1 + εμ_j) = 1 + ε(λ_i + μ_j) + ε²λ_iμ_j
for i = 1, 2, ..., m and j = 1, 2, ..., n. Since ε is arbitrary, it now follows that the eigenvalues of D are all given by (λ_i + μ_j) for i = 1, 2, ..., m and j = 1, 2, ..., n. (Why?) We can reap some benefits out of P 6.1.5. The non-singularity of D can be settled. COROLLARY 6.1.6. Let D be as defined in (6.1.10) and λ_i's be the eigenvalues of A and μ_j's those of B. Then D is non-singular if and only if λ_i + μ_j ≠ 0 for all i and j.
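The passage from (6.1.8) to the linear system (6.1.9) is easy to try out numerically. The following sketch (assumed NumPy code, not part of the text) builds D as in (6.1.10), stacks the rows of C, and recovers X:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 4
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, n))

# D = A ⊗ I_n + I_m ⊗ B', as in (6.1.10); x and c stack rows (row-major flatten).
D = np.kron(A, np.eye(n)) + np.kron(np.eye(m), B.T)
c = C.flatten()
x = np.linalg.solve(D, c)        # unique solution when no sum λ_i + μ_j equals 0
X = x.reshape(m, n)

print(np.allclose(A @ X + X @ B, C))   # True
```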
Complements
6.1.1 The matrices A and B are of the same order and so are the matrices C and D. Show that
(A + B) ⊗ (C + D) = A ⊗ C + A ⊗ D + B ⊗ C + B ⊗ D.
6.1.2 Let x and y be two column vectors not necessarily of the same order. Show that x' ⊗ y = yx' = y ⊗ x'. 6.1.3 Let A and B be matrices of orders m × n and n × p, respectively. Let x be a column vector of order q × 1. Show that (A ⊗ x)B = (AB) ⊗ x.
6.1.4 Let A be a matrix of order m × n. Define the Kronecker powers of A by A^[2] = A ⊗ A, A^[3] = A ⊗ A^[2], etc. Let A and C be two matrices of orders m × n and n × p, respectively. Show that (AC)^[2] = A^[2]C^[2]. Hence show that (AC)^[k] = A^[k]C^[k] for all positive integers k. 6.1.5 Let AX = B be a matrix equation. Provide a criterion for the existence of a solution to the equation. If the matrix equation is consistent, describe the set of all solutions of the equation. 6.1.6 Obtain a solution to the matrix equation AX = B, where A =
4
2
5
3
[ 3 -1
-!-1]
and B =
[-!1!3] .
6.1.7 By looking at the eigenvalues of A and B, show that the following matrix equation has a unique solution:
[~
-1]
2 X+X
[-3
1
Determine the unique solution. 6.1.8 By looking at the eigenvalues of A and B, show that the following matrix equation has more than one solution:
[01 -1]
2 X+X
[-3
0
Determine all solutions of the equation. 6.1.9 Show that the matrix equation AX − XB = C has a unique solution if and only if A and B have no common eigenvalues. 6.1.10 For what values of μ does the matrix equation
AX − XA = μX
have a non-trivial solution in X? If μ = −2, obtain a non-trivial solution to the equation.
6.2. The Vec Operation
One of the most common problems that occurs in many fields of scientific endeavor is solving a system of linear equations. Typically, a system of linear equations can be written in the form Ax = b, where the matrices A and b of orders m × n and m × 1, respectively, are known and x of order n × 1 is unknown. Another problem of a similar nature is solving matrix equations of the form AX = B or AX + XB = C, where A, B, and C are known matrices and X is unknown. These matrix equations can be recast in the traditional linear equations format as explained in Section 6.1. The vec operation and Kronecker products play a key role. Let A = (a_ij) be a matrix of order m × n. One can create a single column vector comprising all the entries of A. This can be done in two ways. One way is to stack the entries of all the rows of A one after another starting from the first row. Another way, which is more popular, is to stack the columns of A one underneath the other. Let us follow the popular way. Let a_i be the i-th column of A, i = 1, 2, ..., n.
Formally, we define the vec of A as the column vector of order mn × 1 given by
vec(A) = [ a_1 ; a_2 ; ... ; a_n ].
The notation vec A is an abbreviation of the operation of creating a single column vector comprising all the entries of the matrix A in a systematic way as outlined above. In the new notation, the matrix equation AX = B, where A is of order m × n, X of order n × p, and B of order m × p, can be rewritten as
(I_p ⊗ A)vec X = vec B.
This is just a system of linear equations. If the system is consistent, i.e., admits a solution, one can write down all solutions to the system. We will now examine some properties of the vec operation. One thing we would like to emphasize is that the vec operation can be defined for any matrix, not necessarily square. Another point to note is that if vec(A) = vec(B), it does not mean that A = B. The matrices A and B could be of different orders and yet vec(A) = vec(B) is possible. P 6.2.1 (1) If x and y are two column vectors not necessarily of the same order, then vec(xy') = y ⊗ x. (2)
If A and B are matrices of the same order, then
tr(A'B) = [vec(A)]'vec(B).
(3) If A and B are matrices of the same order, then
vec(A + B) = vec(A) + vec(B).
(4) If A, B, and C are three matrices such that the product ABC makes sense, then
vec(ABC) = (C' ⊗ A)vec B.
(5) If A and B are two matrices of orders m × n and n × p, respectively, then
vec(AB) = (B' ⊗ I_m)vec A = (I_p ⊗ A)vec B.
PROOF. The assertions (1), (2), and (3) are easy to verify. We tackle (4). Let A = (a_ij), B = (b_ij), and C = (c_ij) be of orders m × n, n × p, and p × q, respectively. Let b_1, b_2, ..., b_p be the columns of B. Let e_1, e_2, ..., e_p be the columns of the identity matrix I_p. We can write
B = B I_p = (b_1, b_2, ..., b_p)(e_1, e_2, ..., e_p)' = Σ_{j=1}^p b_j e_j'.
Consequently, by (3) and (1),
vec(ABC) = vec(A(Σ_{j=1}^p b_j e_j')C) = Σ_{j=1}^p vec(A b_j e_j' C) = Σ_{j=1}^p vec((A b_j)(C' e_j)')
         = Σ_{j=1}^p (C' e_j) ⊗ (A b_j) = Σ_{j=1}^p (C' ⊗ A)(e_j ⊗ b_j)
         = (C' ⊗ A) Σ_{j=1}^p (e_j ⊗ b_j) = (C' ⊗ A) Σ_{j=1}^p vec(b_j e_j')
         = (C' ⊗ A) vec(Σ_{j=1}^p b_j e_j') = (C' ⊗ A) vec B.
The assertion (5) follows from (4) directly if we note that the matrix AB can be written in two ways, namely, AB = I_m A B = A B I_p.
Complements
6.2.1 Let A, B, C, and D be four matrices such that ABC D is square. Show that tr(ABCD) = (vecD')'(C' ® A)vecB = (vec(D'))'(A ® C')vecB'. 6.2.2 Give a necessary and sufficient condition for the existence of a solution to the matrix equation AX B = C, where A, B, and C are all matrices of the same order m x m. Hint: Use P 6.2.1(4).
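The identities in P 6.2.1 are easy to check numerically. Here is a small sketch (not part of the text); note that vec stacks columns, which corresponds to NumPy's order='F':

```python
import numpy as np

def vec(M):
    # Column-stacking vec operator: stack the columns of M one under the other.
    return M.flatten(order='F')

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))   # m x n
B = rng.standard_normal((4, 5))   # n x p
C = rng.standard_normal((5, 2))   # p x q

# P 6.2.1 (4): vec(ABC) = (C' ⊗ A) vec(B)
print(np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B)))                    # True

# P 6.2.1 (5): vec(AB) = (I_p ⊗ A) vec(B)
print(np.allclose(vec(A @ B), np.kron(np.eye(B.shape[1]), A) @ vec(B)))         # True
```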
6.3. The Hadamard-Schur Product
Let A = (a_ij) and B = (b_ij) be two matrices of the same order. The Hadamard-Schur (hereafter abbreviated as HS) product of A and B is again a matrix of the same order whose (i, j)-th entry is given by a_ij b_ij. Symbolically, the product is denoted by A · B = (a_ij b_ij). The HS product is precisely the entry-wise product of A and B. Let Mm,n be the collection of all matrices of order m × n with complex entries. We already know that Mm,n is a vector space with respect to the operations of addition of matrices and scalar multiplication of matrices. (If m = n, we denote Mm,n by Mn.) The HS product is associative and distributive over matrix addition. The identity element with respect to HS multiplication is the matrix J in which every entry is equal to 1. In short, Mm,n is a commutative algebra, i.e., a commutative ring with a multiplicative identity. If m and n are different, the usual matrix multiplication does not make sense. However, if m = n, the usual matrix multiplication is operational in Mn, and Mn is indeed a non-commutative algebra. In this section, we study some of the properties of HS multiplication and present some statistical applications. Some of the properties mentioned earlier are chronicled below.
P 6.3.1
(1) If A and B are matrices of the same order, then A · B = B · A.
(2) If A, B, and C are three matrices of the same order, then A · (B · C) = (A · B) · C. (Now the brackets in multiplication involving three or more matrices can be deleted.)
(3) If A, B, C, and D are four matrices of the same order, then (A + B) · (C + D) = A · C + A · D + B · C + B · D.
(4) If A is any matrix and 0 is the zero matrix of the same order, then A · 0 = 0.
(5) If A is any matrix and J is the matrix of the same order each entry of which is 1, then A · J = A.
(6) If m = n, A = (a_ij) is any matrix, and I_m is the identity matrix, then A · I_m = diag(a_11, a_22, ..., a_mm).
(7) If A and B are any two matrices of the same order, then (A · B)' = A' · B'.
(8) If A = (a_ij) is any matrix with the property that each a_ij ≠ 0, and B = (1/a_ij), then A · B = J. (The matrix B is the (HS) multiplicative inverse of A.)
There seems to be no universal agreement on an appropriate name for the entry-wise product of matrices of the same order. In some research papers and books, the product is called the Schur product. In 1911, Schur conducted a systematic study of what we call HS multiplication. In 1899, Hadamard studied properties of the three power series f(z) = Σ a_n z^n, g(z) = Σ b_n z^n, and h(z) = Σ a_n b_n z^n, and obtained some remarkable results. Even though he never mentioned entry-wise multiplication of matrices in his study, the idea was implicit when he undertook the study of coefficient-wise multiplication of two power series. The following is one of the celebrated results of Schur. It can be proved in a number of different ways. We will concentrate just on the one which is statistical! P 6.3.2 (Schur's Theorem) If A and B are two non-negative definite matrices of the same order, then A · B is also non-negative definite. If A and B are both positive definite, then so is A · B. PROOF. Let X and Y be two independent random vectors with mean vector 0 and dispersion matrices A and B, respectively. The random vector X · Y has mean vector 0 and dispersion matrix A · B. It is clear that every dispersion matrix is non-negative definite. The non-statistical proof is as follows, using the Kronecker product A ⊗ B. The HS product A · B of two matrices can be regarded as a submatrix of A ⊗ B. Let A and B be two square matrices of the same order m. Consider the submatrix of A ⊗ B obtained by retaining the rows numbered 1, m + 2, 2m + 3, ..., (m − 1)m + m = m² and the columns numbered 1, m + 2, 2m + 3, ..., (m − 1)m + m = m², and chucking out the rest. This submatrix is precisely A · B and, moreover, it is indeed a principal submatrix of A ⊗ B. If A and B are non-negative definite,
then so is A ⊗ B. Consequently, any principal submatrix of A ⊗ B is also non-negative definite. This is another proof of P 6.3.2. In the general case when m and n are different, the HS product A · B is still a submatrix of the Kronecker product A ⊗ B, and the same proof holds. Schur's theorem has a converse! If A is a non-negative definite matrix, is it possible to write A as an HS product of two non-negative definite matrices? The answer is, trivially, yes. Write A = A · J. Recall that J is the matrix in which every entry is equal to 1. If A is positive definite, is it possible to write A as an HS product of two positive definite matrices? The answer is yes. See the complements at the end of the section. Rank and the HS product are the next items to be considered jointly. The following result provides an inequality. P 6.3.3 Let A and B be two matrices of the same order m × n. Then rank(A · B) ≤ [rank(A)][rank(B)]. PROOF. Let us use the rank factorization of matrices. Let A and B have ranks a and b, respectively. Then there exist matrices X =
(x_1, x_2, ..., x_a) of order m × a, Y = (y_1, y_2, ..., y_a) of order n × a, Z = (z_1, z_2, ..., z_b) of order m × b, and U = (u_1, u_2, ..., u_b) of order n × b such that
A = XY' = Σ_{i=1}^a x_i y_i'   and   B = ZU' = Σ_{i=1}^b z_i u_i'.
The matrices X and Y each have rank a, and Z and U each have rank b. Note that
A · B = (Σ_{i=1}^a x_i y_i') · (Σ_{j=1}^b z_j u_j') = Σ_{i=1}^a Σ_{j=1}^b (x_i y_i') · (z_j u_j') = Σ_{i=1}^a Σ_{j=1}^b (x_i · z_j)(y_i · u_j)'.
Consequently, ρ(A · B) ≤ ab = [ρ(A)][ρ(B)]. See Complement 6.3.1 at the end of this section. The stipulated inequality follows if we observe that each matrix within the summation symbols is of rank 1 at most. The inequality stated in P 6.3.3 seems to be very crude. On one hand, the rank of A · B cannot exceed min{m, n} and, on the other hand, [rank(A)][rank(B)] could be [min{m, n}]². However, equality in P 6.3.3
is possible. Let
A = [ 1  1  0  0 ; 1  1  0  0 ; 0  0  1  1 ; 0  0  1  1 ]   and   B = [ 1  0  1  0 ; 0  1  0  1 ; 1  0  1  0 ; 0  1  0  1 ].
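A quick numerical check of this example (a sketch, not from the text, using the matrices as reconstructed above) confirms that two rank-2 matrices can have an HS product of full rank:

```python
import numpy as np

A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
B = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]])

print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))   # 2 2
print(np.linalg.matrix_rank(A * B))                         # 4 (A * B is the identity here)
```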
Note that rank(A) = rank(B) = 2 and rank(A · B) = 4. Next we concentrate on obtaining some bounds for the eigenvalues of HS products of matrices. For any Hermitian matrix A of order m × m, let λ_1(A) ≥ λ_2(A) ≥ ... ≥ λ_m(A) be the eigenvalues of A arranged in decreasing order. P 6.3.4 Let A and B be two non-negative definite matrices of order m × m. Let b_1 and b_m be the largest and smallest entries, respectively, among the diagonal entries of B. Then
b_m[λ_m(A)] ≤ λ_m(A · B)   and   λ_1(A · B) ≤ b_1[λ_1(A)].
PROOF. This is virtually a consequence of the variational characterization of the largest and smallest eigenvalues of a Hermitian matrix. We will see more of this in a later chapter. We need to note right away that for any Hermitian matrix A and vector x,
λ_m(A)(x*x) ≤ x*Ax ≤ λ_1(A)(x*x).    (6.3.1)
One can establish this inequality by appealing to P 5.3.8. Note that
A · B = B · (A − λ_m(A)I_m) + [λ_m(A)]B · I_m.
Next we note that A − λ_m(A)I_m is non-negative definite. Use (6.3.1). By P 6.3.2, both B · (A − λ_m(A)I_m) and [λ_m(A)]B · I_m are non-negative definite. If x is a vector of unit length, then
x*(A · B)x ≥ [λ_m(A)]x*(B · I_m)x ≥ b_m[λ_m(A)].
(Why?) Since x is arbitrary, it follows that λ_m(A · B) ≥ b_m[λ_m(A)]. In a similar vein, the inequality λ_1(A · B) ≤ b_1[λ_1(A)] follows. This completes the proof.
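The bounds of P 6.3.4, as stated above, can be verified numerically; the following is a small sketch (not part of the text):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 5
G = rng.standard_normal((m, m)); A = G @ G.T        # non-negative definite A
H = rng.standard_normal((m, m)); B = H @ H.T        # non-negative definite B

b1, bm = B.diagonal().max(), B.diagonal().min()
lam_A = np.linalg.eigvalsh(A)                       # ascending order
lam_AB = np.linalg.eigvalsh(A * B)                  # A * B is the HS (entry-wise) product

print(bm * lam_A[0] <= lam_AB[0] + 1e-10)           # True:  b_m λ_m(A) <= λ_m(A·B)
print(lam_AB[-1] <= b1 * lam_A[-1] + 1e-10)         # True:  λ_1(A·B) <= b_1 λ_1(A)
```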
Let us see what we can get out of this inequality. If B = (b_ij) is Hermitian, then λ_m(B) ≤ b_ii ≤ λ_1(B) for all i. This follows from (6.3.1). Take x to be the unit vector with the i-th component being 1 and all other components being zeros. This simple observation yields the following result. COROLLARY 6.3.5. If A and B are two non-negative definite matrices of the same order m, then for all j
[λ_m(A)][λ_m(B)] ≤ λ_j(A · B) ≤ [λ_1(A)][λ_1(B)].
This corollary resembles the result on the rank of an HS product (see P 6.3.3). For the next result, we need the notion of a correlation matrix. A non-negative definite matrix R = (ρ_ij) is said to be a correlation matrix if ρ_ii = 1 for all i, i.e., every diagonal entry of R is equal to unity. If R is a correlation matrix, then |ρ_ij| ≤ 1 for all i and j. (Why?) If the correlation matrix R is non-singular, then |ρ_ij| < 1 for all i ≠ j. Correlation matrices arise naturally in Multivariate Analysis. Let X be a random vector with dispersion matrix Σ = (σ_ij). Define ρ_ij = correlation between X_i and X_j = σ_ij/[σ_ii σ_jj]^{1/2} = [covariance between X_i and X_j]/[(standard deviation of X_i)(standard deviation of X_j)], where X_i is the i-th component of X. Let R = (ρ_ij). Then R is a correlation matrix. The manner in which we arrived at the correlation matrix begs an apology: what happens when one of the variances σ_ii is zero? If σ_ii is zero, the entire i-th row of Σ is zero. In such an event, we could define ρ_ij = 1 for all j. With this convention, we can proclaim that every correlation matrix arises this way. The following is a trivial corollary of P 6.3.4. COROLLARY 6.3.6. Let A be a non-negative definite matrix of order m × m and R any correlation matrix. Then for all j
λ_m(A) ≤ λ_j(A · R) ≤ λ_1(A).
Let us examine some implications of this corollary. If we take R = I_m, we get the result that every diagonal entry of A is sandwiched between the smallest and largest eigenvalues of A. Let Σ = (σ_ij) be any non-negative definite matrix. It can be regarded as a dispersion matrix of some random vector X with components X_1, X_2, ..., X_m.
Let R = (ρ_ij) be the correlation matrix associated with Σ. Write σ_ii = variance of X_i = a_i² for all i. Observe that σ_ij = ρ_ij a_i a_j for all i and j. Define A = aa', where the vector a' = (a_1, ..., a_m). Note that A is of rank 1. Further, Σ = A · R. Corollary 6.3.6 provides bounds on the eigenvalues of Σ in terms of the eigenvalues of A. Since A is of rank 1, (m − 1) eigenvalues of A are all equal to zero. The other eigenvalue is a_1² + a_2² + ... + a_m². (Why?) Corollary 6.3.6 offers the inequality 0 ≤ λ_j(Σ) ≤ a_1² + a_2² + ... + a_m². Corollary 6.3.6 also offers a good insight as to the magnitude of the eigenvalues when the correlations associated with a dispersion matrix are modified. Suppose we have a dispersion matrix Σ = (σ_ij) with the associated correlation matrix R = (ρ_ij). Suppose we reduce the correlations in R in absolute value by a systematic factor, keeping the variances the same. The question is how the eigenvalues of the modified dispersion matrix are affected. Let us make this a little more concrete. Let
R_0 = [ 1  ρ  ρ  ...  ρ ; ρ  1  ρ  ...  ρ ; ... ; ρ  ρ  ρ  ...  1 ],
where −1/(m − 1) ≤ ρ ≤ 1 is fixed. Clearly, R_0 is a correlation matrix. Let Σ_0 = Σ · R_0. It is clear that Σ_0 is non-negative definite and can be regarded as a dispersion matrix of some random vector. The variances in Σ and Σ_0 are identical. The correlations associated with the dispersion matrix Σ_0 are a constant multiple of the correlations ρ_ij associated with Σ. The gist of Corollary 6.3.6 is that the eigenvalues of the modified dispersion matrix Σ_0 are sandwiched between the smallest and largest eigenvalues of the dispersion matrix Σ. Let us now deal with some determinantal inequalities. The following result is very useful in this connection. P 6.3.7 Let A = (a_ij) be a non-negative definite matrix of order m × m and A_1 the submatrix of A obtained from A by deleting the first row and first column of A. Let e' = (1, 0, 0, ..., 0); α = 0 if |A| = 0; α = |A|/|A_1| if |A| ≠ 0; and A_2 = A − αee'. Then the following are valid.
(1) The matrix A_2 is non-negative definite. (2) If A is positive definite, A^{-1} satisfies A_2 A^{-1} A_2 = A_2. (In the nomenclature of a later chapter, A^{-1} is a g-inverse of A_2.) (3) If A is positive definite and A^{-1} = (a^{ij}), then a_11 a^{11} ≥ 1. (4) |A| ≤ a_11 a_22 ··· a_mm.
PROOF. (1) and (2). If IAI = 0, A2 = A and hence A2 is nonnegative definite. Suppose IAI i= o. Let us determine what 0 precisely is. Write
A = [all
a~
al Al
] ,
where (all, a~) is the first row of A. Observe that
Check the material on Schur complements. Consequently, 0 = all a~ All al. There is another way to identify o . Recall how the inverse of a matrix is computed using its minors. The determinant of Al is the cofactor of all. Therefore, the (1, I}-th entry all in A-I is given by all = IAII/IAI = 1/0 = e' A-Ie. Note that
A 2A- I A2
= (A - oee')A-I(A - oee') = A + 02ee' A-lee' - oee' = A + oee' - oee' - oee' =
oee' A2 .
It is clear that A2 is symmetric and hence A 2A-I A2 is non-negative definite. This means that A2 is non-negative definite. With one stroke, we are able to establish both (1) and (2). (3) Since A2 is non-negative definite, its (l,l)-th element must be non-negative, i.e., all - 0 ~ o. But 0 = I/a ll . Consequently, alla ll ~ 1. As a matter of fact, aiiaii ~ 1 for all i. (4) The inequality alla ll ~ 1 can be rewritten as IAI ~ auiAII. Let B be the submatrix of A obtained by deleting the first two rows and first two columns of A. In an analogous way, we find that we have IAII ~ a221BI . If we keep pushing this inequality to its utmost capacity, we have the inequality that IAI ~ au a22 .. . ammo This inequality goes under the name of HS determinantal inequality.
If A is positive definite, it is not necessary that A2 is positive definite. For a counterexample, take A = 12. The correlation matrix is an important landmark in Multivariate Analysis. The above result provides a good understanding on the makeup of a correlation matrix. COROLLARY 6.3.8. If R = (Pij) is a non-singular correlation matrix then the diagonal entries of R- 1 = (pij) satisfy the inequality pii ~ l. Further, IRI ~ 1. The case when IRI = 1 is of interest. If R = 1m , then IRI = 1. In fact, this is the only situation we have determinant equal to 1. This can be shown as follows. Let AI, A2, . . . ,Am be the eigenvalues of R. By the Arithmetic-Geometric mean inequality,
(λ_1 λ_2 ··· λ_m)^{1/m} ≤ (λ_1 + λ_2 + ... + λ_m)/m.
The equality holds if and only if all λ_i's are equal. In our case, 1 = |R| = λ_1 λ_2 ··· λ_m and λ_1 + λ_2 + ... + λ_m = tr(R) = m. Thus equality holds in the
Arithmetic-Geometric mean inequality. Hence all Ai'S are equal and in fact, they are all equal to unity. Hence R = 1m. (Why?) Now we come to an interesting phase of HS multiplication. The following result involves detenninants. P 6.3.9 Let A = (aij) and B = (b ij ) be two non-negative definite matrices of the same order m x m. Then
|A · B| ≥ |A| |B|.    (6.3.2)
PROOF.
First, we establish the following inequality:
|A · B| ≥ |A| b_11 b_22 ··· b_mm.    (6.3.3)
If A is singular or one of the diagonal entries of B is zero, the inequality (6.3.3) is crystal clear. Ass1lllle that A is non-singular and none of the diagonal entries of B is zero. Let R be the correlation matrix associated with the dispersion matrix B. Now observe that
|A · B| = b_11 b_22 ··· b_mm |A · R|.
(Why?) In order to establish (6.3.3), it suffices to prove that
|A · R| ≥ |A|    (6.3.4)
for any correlation matrix R = (ρ_ij). Let A_2 = A − αee', where α and e are as defined in P 6.3.7. Let A^{-1} = (a^{ij}). Observe that
0 ≤ |A_2 · R| = |(A − αee') · R| = |A · R − (1/a^{11})ee'|.
The computation of the last determinant requires some tact. We need to borrow a trick or two from the theory of determinants. We note that the determinant IA . R - alII ee'l = IA· RI- allllA 1 . RII where Al is the submatrix of A obtained by deleting the first row and first column of A and RI is created analogously. From these deliberations, we obtain the inequality,
This cess. rows from
is an interesting inequality begging for a continuation of the pr~ Let B be the submatrix obtained from A by deleting the first two and first two columns. Let R2 be the correlation matrix obtained R in a similar fashion. Continuing the work, we have
Pushing the chain of inequalities to the end, we obtain IA·RI ~ IAI. This establishes (6.3.3). Now from (6.3.3), it is clear that IA . BI ~ IAI IBI· Use P 6.3.7 (4). Let us examine what HS multiplication means in certain quarters of Statistics. Suppose X(l), X(2), ... is a sequence of independent identically distributed random vectors with common mean vector 0 and dispersion matrix E. Assume that E is non-singular. Let R be the correlation matrix associated with E. Let, for each n ~ 1, y(n)
= X(l) . X(2) .•... x(n),
i.e., y(n) is the HS product of the random vectors X(1), X(2), ... ,x(n). The correlation matrix R{n) of the random vector y(n) is precisely the HS product of R with itself n times. Let us denote this product by
R(n) = (p~j») . If IRI = 1, the components of y(n) are clearly uncorrelatOO. If 0 < IRI < 1, the components of y(n) are nearly uncorrelatOO if n is large. (Why?) The determinantal inequality referred to in P 6.3.9, namely IR(n)1 :2: (IRl)n, is not informative. The determinant IR(n)1 is nearly equal to one if n is large, whereas the quantity (IRl)n is nearly equal to zero. One can easily improve the lower bound provided by P 6.3.9 on the determinant of the HS product of two non-negative definite matrices. This is what we do next. P 6.3.10 If A = (aij) and B = (bij) are two non-negative definite matrices of order m X m, then (6.3.5) PROOF . Note that the determinantal inequality (6.3.2) is a special case of (6.3.5). The inequality (6.3.5) leads to
from which (6.3.2) follows . As for the validity of (6.3.5), if A or B is singular, (6.3.5) is essentially the inequality (6.3.3). Assume that both A and B are non-singular. Let Q and R be the correlation matrices associated with A and B, respectively. It suffices to prove that
IQ· RI + IQI IRI :2: IQI + IRI· To prove this inequality, we can employ the trick we have used in the proof of P 6.3.9 by looking at the relationship between the minors of a matrix. Let Qi be the submatrix of Q obtained by deleting the first i rows and i columns of Q, i = 0, 1,2, ... ,m - 1. Let Ri stand for the submatrix of R likewise. Let
Our objective is to show that £1 :2: O. We shall scrutinize £1 and £2 a little closely. We need to put in some additional work before the scrutinization. Let R-l = (pij). Recall the vector e' = (1,0,0, . . . ,0)
we have used in the proof of P 6.3.9. By P 6.3.7, R- = R- (II pll )ee' is non-negative definite and hence Q. Ii is non-negative definite. Further, by (6.3.3),
IQI (1 - 1I p11) ~ IQ . RI = IQ . RI (1 - I~~ ~~~11 ),
(6.3.6)
from which we have (6.3.7) (The equality in (6.3.6) is fashioned after (6.3.6).) By (6.3.7),
£1 -
IQ . RI - IQ1 . RIll pll + IQI IRI -IQ11 IR111 pll + IQ111pll -IQI + IR111pll -IRI Z IQI-IQII pll + IQI IRI-IQ11 IR111 pll + IQ111pll -IQI + IR111pll -IRI = (Ilpll)(IQ11-IQI) + IQI IRI-IQ11 IRI + IRI-IRI = (II pll -IRI)(IQ11-IQI).
£21 pll =
In these deliberations, we have used the fact that pll = IRII/IRI. Observe that (Ilpll) -Ipl = (1 -lp11)lpll Z 0 as the determinant of a correlation matrix is ~ 1. Note also that IQ11 - IQI = qlllQI - IQI = (qll _ I)IQI 0, where Q-1 = (qii). Consequently, we observe that £1 - £21 pll z 0, which implies that £1 Z £2(IRI/IR11). This inequality sets a chain reaction. It now follows that £2 Z £3(IR11/IR21). Proceeding inductively, we achieve that £1 Z o. This completes the proof. Let us spend some time on HS multiplication and ranks. For any two matrices A and B, we have seen that rank(A · B) ~ [rank(A)][rank(B)J. If the matrices are non-negative definite, we can do a better job.
z
P 6.3.11
If A is positive definite and B is non-negative definite with r non-zero diagonal entries, then rank(A . B) = r. PROOF . Observe that A . B is non-negative definite and has r nonzero diagonal entries. Consequently. rank(A . B) ~ 1·. Consider the principal submatrix of A . B of order r x r whose diagonal entries are
precisely these non-zero numbers. By (6.3.3), the determinant o~ t~s submatrix is non-zero. Thus we now have a minor of order r whIch IS non-zero. Hence rank(A . B) 2: r. This completes the proof. From P 6.3.11, the following interesting results emerge. (1) If A is pd and B is nnd with diagonal entries nonzero, then A· B is nnd even if B is singular. (2) If p(A) = p(B) = 1, then p(A . B) = 1. (3) If A is pd and B is nd, then A· B is nd. (4) It is feasible for A . B to have full rank even if A and B are not of full rank. For instance, each of the matrices
A
~
[:
:
nand ~ B
[:
i ~1
has rank 2, but A . B has rank 3. P 6.3.12 (Fejer's Theorem) . Let A = (aij) be an n X n matrix. Then A is nnd if and only if tr(A . B) 2: for all nnd matrices B of order n x n.
°
PROOF . To establish the only if part, let A and B be both nnd and consider a vector x E en, with all its components unity. Then A . B is nnd and x"(A· B)x 2: 0, i.e., tr(A · B) 2: 0. Conversely let tr(A · B) 2: for all nnd B. Choose B = (bij) = (XiXj) for' any x E en. Then B is nnd and tr{A . B) = x" Ax 2: which implies that A is nnd. As a corollary to Schur's product theorem we have the following.
°
°
COROLLARY 6.3.13. Let A be an n X n nnd matrix. Then: (1) The matrix A . A ..... A, with any number of terms is nnd. (2) If J{z) = ao + alz + a2z2 + ... is an analytic function with non-negative coefficients and radius of convergence R > 0, then the matrix (f{aij)) is nnd if alllaiji < R.
Complements 6.3.1 Let A and B be matrices of the same order m x n and the same unit rank, i.e., A = xy' and B = uv' for some non-zero column vectors
x and u each of order m x 1 and non-zero column vectors y and veach of order n x 1. Show that
A· B = (x· u)(y· v)'. Show also that A . B is at most of rank 1. 6.3.2 Let A = (aij) be a square matrix of order m x m. Show that
6.3.3 (An alternative proof of Schur's Theorem P 6.3.2) . Since B is non-negative definite, write B = TT*. Let tk be the k-th column of T = (tij), k = 1,2, ... ,m. Let x' = (Xl, X2 , '" ,xm ) be any vector with complex entries. Then m
x*(A· B)x
=L
m
LaijbijXiXj
i=I j=I
m
=L
m
m
Laij(Ltikljk)XiXj
i=I j=I
k=I
m
= L(x, tk)* A(x· tk) ;::: 0,
as A is non-negative definite.
k=I
6.3.4 Let A and B be two matrices of the same order m x m. Let 1m be a column vector of order m x 1 in which each entry is equal to 1. Show that tr(AB) = l~(A· B')l m • 6.3.5 Show that every positive definite matrix is the HS product of two positive definite matrices. Explore the uniqueness of the factorization. 6.3.6 If A and B are positive definite, show that A . B = AB if and only if A and B are both diagonal matrices. Hint: Use P 6.3.7 (4) and (6.3.3). Is it necessary that A and B have to be positive definite? 6.3.7 Let A be a symmetric non-singular matrix. Show that 1 is an eigenvalue of A . A-I with corresponding eigenvector e, where e' =
(1,1, ... ,1). Hint: Observe that each row sum of A . A-I is unity.
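Before leaving this section, here is a small numerical illustration (a sketch, not part of the text) of Schur's theorem (P 6.3.2) and of the "only if" direction of Fejer's theorem (P 6.3.12):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
G = rng.standard_normal((n, n)); A = G @ G.T     # non-negative definite A
H = rng.standard_normal((n, n)); B = H @ H.T     # non-negative definite B

# Schur's theorem: the HS product of nnd matrices is nnd.
print(np.all(np.linalg.eigvalsh(A * B) >= -1e-10))   # True

# Fejer's theorem ("only if" part): tr(A · B) >= 0 for nnd A and B.
print(np.trace(A * B) >= 0)                           # True
```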
6.4. Khatri-Rao Product In this section, we introduce another product of matrices known as Khatri-Rao product and examine some of its ramifications. Let A and B be two matrices of orders p x nand m x n, respectively. Let ai be the i-th column of A and f3i, be the i-th column of B, i = 1,2, ... ,n. The Khatri-Rao product A 0 B of A and B is the partitioned matrix of order pm x n given by
(6.4.1) We will establish some useful results stemming out of this type of matrix. P 6.4.1 Let A, B, C , and D be four matrices of orders p x n, m x n, m x p, and n x m, respectively. Then (C ® D)(A 0 B) = (CA) 0 (DB), PROOF. Let ai be the i-th column of A and f3i, the i-th column of B, i = 1,2, .. . ,n. Then the i-th column of CA is Cai and that of DB is Df3i. Consequently, the i-th column of (CA) 0 (DB) is Cai ® Df3i = (C® D)(ai ®f3i), which is precisely the i-th column of (C® D)(A0B) . In the next result, we rope HS multiplication into the process of Khatri-Rao product.
P 6.4.2 Let A and B be two non-negative definite matrices each of order n x n. Let A = r'r and B = 11'11 be the Gram-matrix representations of A and B, respectively, for some matrices r of order r x n and 11 of order s x n. Then the HS product of A and B is related to the Khatri-Rao product by
A · B = (r 0 11)'(r 0 11). PROOF. Let ai, a2, .. . ,an be the columns of rand f31, f32, ... , f3n be those of 11. If A = (aij) and B = (b ij ), note that aij = aiaj and bij = f3:f3j for all i and j. The (i, j)-th entry of (r 0 11)' (r 011) is given by (ai ® f3i)'(aj ® f3j) = aiaj ® f3:f3j = (aiaj)(f3ff3j) = aijb ij , which is the (i,j)-th entry of the HS product A . B. From P 6.4.2, we can draw the important conclusion that the HS product A . B is also non-negative definite. We have arrived at this conclusion in Section 6.3 by a different route.
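The two Khatri-Rao identities just proved are easy to check numerically. The following sketch (not part of the text) builds the column-wise Kronecker product directly from its definition:

```python
import numpy as np

def khatri_rao(A, B):
    # Khatri-Rao product: the i-th column is A[:, i] ⊗ B[:, i].
    return np.column_stack([np.kron(A[:, i], B[:, i]) for i in range(A.shape[1])])

rng = np.random.default_rng(4)
p, n, m = 3, 4, 2
A = rng.standard_normal((p, n)); B = rng.standard_normal((m, n))
C = rng.standard_normal((m, p)); D = rng.standard_normal((n, m))

# P 6.4.1: (C ⊗ D)(A ⊙ B) = (CA) ⊙ (DB)
print(np.allclose(np.kron(C, D) @ khatri_rao(A, B), khatri_rao(C @ A, D @ B)))   # True

# P 6.4.2: if A0 = Γ'Γ and B0 = Ω'Ω, then A0 · B0 = (Γ ⊙ Ω)'(Γ ⊙ Ω)
G = rng.standard_normal((3, n)); W = rng.standard_normal((5, n))
print(np.allclose((G.T @ G) * (W.T @ W), khatri_rao(G, W).T @ khatri_rao(G, W)))  # True
```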
P 6.4.3 Let A and B be two non-negative definite matrices of the same order n x n. Let A = rlf and B = 0'0 be the Gram-matrix representations of A and B, respectively, for some matrices r of order r X nand 0 of order s x n. If the HS product A . B is not of full rank, then there exists a non-null diagonal matrix 6. such that r 6.0' = O. PROOF. By P 6.4.2, we can write A ·B = (r00)'(r00). Since A ·B is not of full rank, there exists a non-null vector x such that (A· B)x = O. This implies that (r 0 O)x = O. (Why?) Let r = (,ij), 0 = (Wij), and x' = (x}, X2,'" , x n ). Let 6. = diag(xI' X2, ••• ,xn ). The statement that (r 0 O)x = 0 implies that
for 1 ~ i ~ 7' and 1 ~ j ~ s. This is equivalent to r 6.0' = O. This completes the proof. Let us explore the world of estimation of heteroscedastic variances in linear models as an application of the results presented in this section. Suppose Y1 , Y2,'" ,Yn(n > 2) are pairwise uncorrelated random varirespecables all with the same mean IL but with variances lT~, lT~, ... tively. The objective is to estimate each of the variances using the data
,IT;
n
-
Y1 , Y2, ... ,Yn . Let Y =
n1 L
2
Yi and S =
1 n-l
i=l
n
-2
.
L(Yi - Y) . The estli=l
mati on problem arises when one wants to ascertain precisions of some n instruments. All instruments measure the same quantitative phenomenon IL but with error characterized by the variances lT~, lTi, .. . Based on the measurements Y1 , Y2," . ,Yn made one on each instrument, the objective is to estimate the precisions lT~, lTi, ... of the instruments. Suppose we want an unbiased estimator of lT~. It is natural to seek a quadratic function of the data as an unbiased estimator of
,IT;.
,IT;
n
lT~. Any quadratic function of the data can be written as L
n
L aij Yi Yj
i=l j=l
for some constants aij's. Setting n
lT~ =
n
n
n
2
ELLaijYiYj =1L L L aij i=l j=l
i=l j=l
n
+ LaiilTT, i=l
we note that the coefficients aij's have to satisfy the conditions: n
i=2,3, ...
all=l,aii=O,
,nj and
n
L::L::aij=O. i=l j=l
As an example y2 - Y1Y2 is an unbiased estimator of O'r· One can jot down any numbe: of estimators of O'r. We would like to focus on one particular estimator of O'r and demonstrate that it .is optimal in a sense to be made precise later. First, we note the followmg: E(Y1
-
Yi = E((Y1 -
(Y -
JL) -
= E[( - 1 )(Y1
-
n
JL))2
~(Y2 -
JL) -
n
JL) - . .. -
~(Yn -
JL)]2
n
n
= ( - 1 )20'r + (~)2 2: O'~j n
n
i=2
1
n
ES2
n
2:
= (_l_)(n - 1)2 L::O'~ + (_l_)(n - 1)( _)2 O'~ n- 1 n i=l n- 1 n i=l 1
n
2
=-L::O'i. n i=l Now it is simple to verify that
n - 2 1 2 Tl = --(Yl - Y) - - - S
n-2
n-2
is an unbiased estimator of O'i. What is special about this estimator? One thing is that it uses all the data. One could provide other estimators of O'r which make use of all the data. The estimator Tl enjoys a certain invariance property. More precisely, if each Yi is replaced by Yi + c for some fixed constant c, the value of the statistic Tl computed using the new data Y1 + c, Y2 + C, • •• , Yn + c is the same as the value of the statistic Tl computed using the data Y1 , Y2, . . . , Yn . This invariance property is a desirable property for an estimator to possess because the parameters O'['S enjoy such a property. One could provide different invariant estimators of O'i. Let us try a different track. The estimator Tl is a quadratic form in Y1 , Y2, ... , Yn . More precisely, T1=Y'AY.
where Y'
= (Yt, 1'2, ... , Yn ), all
= 1, ail =
ali
ai2 aij
and A =
= (aij)
-(n -
where
1)-1, i = 2, ...
,n
= ... = ann = 0,
= (n -1)-I(n - 2)-1 for i =f j, i,j = 2, . .. ,n.
(6.4.2)
(1r
Let Q be the collection of all quadratic unbiased estimators of which are invariant under translations of the data. Any quadratic estimator is of the form Y' BY for some symmetric matrix B = (b ij ) of order n x n. The estimator Y' BY is invariant means that
(Y
+ c1)' B(Y + c1) =
y' BY
for all real numbers c and for all realizations of the data vector Y, where 1 is a column vector of order n x 1 in which every entry is equal to one. The data vector Y + c1 is simply a translation of the data vector Y. The demand that the estimator be invariant is equivalent to the condition that c2 1' B1 + 2c1' BY = 0 for all real numbers and for all realizations of the data vector Y. This is equivalent to the condition that
B1 =0.
(6.4.3)
This condition can be rephrased as the condition that every row sum of B is equal zero. In the presence of invariance condition, the estimator Y' BY is unbiased for if
(1r
bll = 1 and
bii = 0, i = 2,3, ... ,n.
(6.4.4)
(Why?) The matrix A spelled out in (6.4.2) meets all these requirements (6.4.3) and (6.4.4). The matrix A has the optimum property that among n
all matrices B
=
(b ij ) satisfying (6.4.3) and (6.4.4),
n
IIBI12 = L: L: b~j i=1 j=1
is a minimum when B = A. One can easily verify this using the Lagrangian multipliers method. There is yet another way of deriving the
estimator T J • The random vector Y falls into the mould of a linear model, i.e., (6.4.5) with E(E) = 0 and Disp(E) = Diag(O"r,O"~, ... ,0";), where f.' = (EJ,E2, ... , En), the unobservable error random vector associated with the da!a vector Y, and X' = (1,1, ... ,1). The least squares estimator of J.L is Y. The projection matrix F associated with the linear model is given by
= (mij) = (In
F
- X ( X , X ) -J X ')
=
1I , In - n n
where I n is the matrix of order n X n in which every entry is equal to one. The vector € of residuals is given by
€' Let
a2' =
= (FY)' = (YJ
(ar, a~,
... , a;).
-
Y, Y2 - Y, ... , Yn
-
Y).
Look at the system of linear equations (6.4.6)
in the unknown a2 , where the symbol . stands for HS multiplication. The matrix F· F is non-singular and its inverse is given by
b
...
~ ~
:::
b
...
a
(F . F) -J =
[
b
where b = [-l/{n - l)(n - 2)J and a - b = n/(n - 2). The solution to the linear equations (6.4.6) is given by a2 = (F. F)-J (€ . E). After a round of routine simplification, we note that
~2
O"i
n
(
-)2
S2
.
= --2 Yi - Y - --2' z = 1,2, ... nn-
a?
,no
For i = 1, is precisely equal to the statistic T} we have been harboring and nurturing all along! There are a couple of things to be sorted out before we can conclude this section. The first is about the significance of the linear equations (6.4.6) we have jotted down. The
221
Operations on Matrices
second is the significance of the optimization problem in arriving at the estimator TI = Y' AY. Let us explain why TI is optimal. When we think about optimality of a certain estimator, we would like to phrase the optimality of the estimator in terms of variance. Declare an unbiased estimator of a parameter to be optimal if its variance is the least among all unbiased estimators of the same parameter. If the variance were to be the criterion of optimality, we need to assume some structure on the fourth moments of the random variables Y 1 , Y2 , ••• ,Yn . If we do not want to assume beyond what we imposed on the data, namely pairwise uncorrelatedness and finite second moments, variance criterion is beyond our reach. We need to seek other optimality criteria intuitively justifiable and acceptable. Suppose J1 is known. A reasonable estimator of (7~ is (Y1 - J1)2. If J1 is known, the residuals EI, E2, ... ,En in the linear model (6.4.5) are observable. In terms of the residuals, the reasonable estimator can be rewritten as E'GE, where G = diag(l, 0, 0, ... ,0). Can we do better than this? Is it possible to find an invariant unbiased estimator Y' BY of (7~ which is close to the reasonable estimator? The conditions on the estimator, especially the invariance property, imply that Y' BY = (Y - J11)' B(Y - J11) = E' BE. The problem is to determine the matrix B such that E'GE - E' BE is as small as possible. This is tantamount to choosing the matrix B with all the constraints so that IIG - BII is minimum. This is equivalent to minimizing IIBII subject to the constraints of invariance and unbiasedness. This is the story behind the estimator TI = Y' AY. One can also justify the estimator TI on the ground that the variation exhibited by the invariant unbiased estimator Y' BY be reduced as much as possible. We can achieve this by choosing B as small as possible. This is the s~called Minimum Norm Quadratic Unbiased Estimation, with the acronym MINQUE principle of, C.R. Rao (1972a). Now we come to the significance of the linear equations (6.4.6). Let
be a general linear model with Disp(E)
= Diag«(7i,(7~, ...
,(7~),
222
MATRIX ALGEBRA THEORY AND APPLICATIONS
where the matrix X of order n x m is known and the parameter vector {3 and variances are unknown. We need good estimators of the variances based on the data vector Y. Assume that the variances are all distinct and the rank of X is m . (These assumptions can be relaxed.) When we say that the variances are all distinct we mean that the vector «(j2)' = «(jr, (j~ , . .. ,(j;) of variances has the parameter space (0, (0) x (0, (0) x .. . x (0, (0). Let P1(jr + P2(j~ + ... + Pn(j~ = p'(j2 be a linear function of the variances, where the vector P' = (P1, P2, ... ,Pn) is known. As per the MINQUE principle, we seek a quadratic estimator Y' BY of P' (j2 such that B = (b ij ) satisfies the conditions
(jr
BX=O, n
L bii(j; = L Pi(j;, i=1 and
(6.4.7)
n
(6.4.8)
i=1 n
n
IIBII2 = LLb~j i=1 j=1 is a minimum. The condition (6.4.7) implies that the estimator Y' BY is invariant, i.e.,
Y' BY
= (Y - X{3o)' B(Y - X{3o)
for all vectors {3o, and condition (6.4.8) implies that the estimator Y' BY is unbiased for P' (j2 . A solution to this problem has been discussed in C.R. Rao (1972). Let F = (In - X(X' X)-1 X') be the projection matrix and £ = (In - X(X' X)-1 X')Y be the vector of residuals. Let (&2)' = (&r, &~, . . . ,&~). Consider the system (F . F) &2 -= €. € of linear equations in the unknown vector &2. If F · F is non-singular, the MINQ UE of (j2 is given by &2 = (F · F) -1 (£ . €). This is the story behind (6.4.6). The next line of inquiry is to understand when the HS product F · F is non-singular. Hartley, Rao, and Kiefer (1969) and Rao (1972a) throw some light on this problem. Complements 6.4.1 Let Y1, Y2, . .. ,Yn be n pairwise uncorrelated random variables with the same mean J-L. Let kll k 2 , ... ,kr be positive integers such that
223
Operations on Matrices
kl
+ k2 + ... + kr = n. Suppose Var(Yi) = Var(Y2) = ... = Var(Yk1 ) = ur, Var(Yk1+I) = Var(Yk1+2) = ... = Var(Ykl+k2) = u~, Var(Ykl+ ... +kr_1+d
= Var(Ykl+ ... +kr_l+2) = ... = Var(Yn ) = u;.
The mean and variances are all unknown. Develop MINQUE's of the variances. 6.4.2 Let }ij, i = 1,2, ... ,p and j = 1,2, ... ,q be pairwise uncorrelated random variables with the following structure. E(}ij) = Var(}ij) =
ai
+ f3j
for all i and j,
u;, j = 1,2, ... ,q
and for all i.
The variances, ai's and f3/s are all unknown. Show that the residuals are given by fij =}ij -
where Yi . =
iii. -
)q
-E
q j=1
Yj
+ Y., i = 1,2, ...
-)p
}ij, Yj
=- E
P i=1
,p -
}ij, and Y . =
the MINQUE of u~ is given by
and
j
= 1,2, ...
IPq
- E E
,q,
}ij. Show that
pq i=1 j=1
t.
a(t. <;;) +b(t <;;), where a
= [(p -
1)(q - 2)]-1 and b = -[(p - 1)(q - 1)(q - 2)]-1 .
6.5. Matrix Derivatives Suppose / is a real valued function of mn variables Xij, i = 1, 2, ... ,m and j = 1, 2, ... ,n. Suppose these variables are arranged in the form of a matrix X = (Xij) of order m x n. Assume that the partial derivatives of / exist with respect to each of its variables. The matrix derivative a/ / ax of / with respect to X is a matrix of order m x n given by
a/ _ (~)
ax -
aXij ,
i.e., the (i,j)-th element of 8118X is 8118Xij. If n = 1, X is a column vector and it is denoted by x with components Xl, X2 , .. . , Xm · The corresponding derivative 811 8x is called the vector derivative of I with respect to x. More generally, suppose F = (Jij) is a matrix function of a matrix variable X . What we mean by this is that each entry lij of the matrix F is a real valued function of the variables in X. Let F be of order p x q and X of order m x n. Assume that each of the entries of F has partial derivatives with respect to all the variables in X. The matrix derivative 8FI8X of F with respect to X is defined to be the matrix (6.5.1) of order pm x qn broken up into pq partitions or compartments strung along p rows and q columns. Each partition of the matrix derivative is of order m x n. As an illustration, suppose F is of order 2 x 4 and X is of order 3 x 2. The matrix (6.5.1) comports itself as
!!1.u. ~ 8"'12 "'11 !!1.u. !!1.u. 8X21
8F = 8X
8X22
!!1.u. !!1.u.
1
8ha 8"'12
8[14 8"'11
8"'12
8[13
8ba
8X21
8X22
8b4 8X21
8X22
1
8[14 8Xa1
8Xa2
1
8124 8X11
8124 8X12
8124 8X21
8124 8X22
8124 8 X31
8124 8X32
8h2
8b2
8"'11
8"'12
8ha 8"'1l
8b2 8X21
8b2 8X22
8b2 8X31
8X32
8ba 8Xa1
8ha 8Xa2
8[22
!!.fu.
8ha
8ha
8X11
8X12
8[12 1
8Xa1
8X32
~ 8X11
8121 8X12
gh1 X21
8121 8X22
8122 8X21
8122 8X22
8ha
8ha
8X21
8X22
8121
8121
8Xa1
8Xa2
8[22 8Xal
8122 8X32
8ha 8"'al
8X32
- - -- --I - - -- --I - - -- --I 1
8X11
8X12
1
8ha
-
8h4
8[14 8h4 --
--
There is some criticism mooted against the way the partial derivatives are strung out in 8F18X. Suppose the matrix function is the identity function, i.e., F(X) = X, or equivalently, lij(X) = Xij for all i and j. If we want to use the matrix of partial derivatives to build the Jacobian of the transformation, the entity 8FI8X is in for a disappointment. Suppose X is of order 2 x 3 and F(X) = X. Then 8118X = (( vecI2) ® h)', which is of order 4 x 9. It is clear that the rank of the matrix 8FI8X is one. The Jacobian of the transformation F(X) = X is 10 . The derivative 8FI8X is nowhere near the Jacobian. Even the order
225
Operations on Matrices
of the matrix 8F/8X is wrong for the Jacobian. To ameliorate the standard definition of the matrix derivative (6.5.1) to meet the needs of the Jacobian, one could define the matrix derivative *8F/8X of F of order p X q with respect to X of order m x n as *8F 8X
8vecF 8(vecX}"
(6.5.2)
which is of order pq X mn. In order to work out the new matrix of partial derivatives, to begin with, one has to stack the variables in X column by column in one long vector, takes it transpose, stack the entries of F column by column into one long column vector, and then take the partial derivatives of each and every entry with respect to (vecX)'. Note that the order of vecF is pq x 1 and that of (vecX)' = 1 x mn. Consequently, the order of the matrix (6.5.2) is pq x mn. For example, the case of F with order 2 x 4 and X of order 3 x 2 gives rise to the following matrix derivative in its new incarnation:
!lill.
!lill.
!lill.
£ill.
8X31
8XI2
£ill.
8X21
8121 8Xll
8121 8X21
8121 8 X31
8121 8XI2
8121
8121
8X22
8X32
8b2
8b2 8x21
8b2 8x31
~
8Xll
8b2 8X22
8b2 8X32
8122 8X22
8122 8X32
8xII
*8F 8X
8XI2
8122
8122
8122
8122
8XII
8X21
8 X 31
8XI2
8X22
£ill. 8X32
8b3
8h3
8h3
8h3
8Xll
8X21
8X31
8XI2
8b3 8 X22
8h3 8X32
8/23 8XI2
8/23
8/23
8X22
8 X32
8123
8123
8/23
8Xll
8X21
8X31
8b4
8b4
8b4
8b4
8[14
8Xll
8 x 21
8X31
8XI2
8X22
8b4 8 X32
8124 8X22
8X32
8124
8124
8124
8124
8XII
8X21
8X31
8XI2
8/24
As one can see, the partial derivatives in *8F/8X are set out in the style of evaluating the Jacobian of a transformation. The entries of *8F/8X are simply a rearrangement of the entries of 8F/8X. More precisely, in the special case, first we form transposes of vecs of each partition of
226
MATRIX ALGEBRA THEORY AND APPLICATIONS
aFlax as
(vec88~2 )' (vec~~3 )'
(vec 8lF )'
(veC~~3 )'
and then treating each vec as a single entity arrange them in vec form in order to obtain *aFlax. For a more precise relationship, see Complement 6.5.3. There is a minor conflict between the standard practice of writing the vector derivative and the version (6.5.2) in the case of a scalar valued function 1 of a vector variable x. On one hand, 8118x is a column vector and on the other hand, * 811 ax = all ax' = (all ax)" which is a row vector. When we provide a list of some derivatives of some standard functions, we follow the standard form of arranging the partial derivatives. The formulas for the modified form can be jotted down in a simple manner. A critical result which is useful in deriving some formulas for matrix derivatives is the following. Let 1 be scalar valued function of a matrix variable X of order m x n. Let Y be a constant matrix of order m x n. Assume that 1 is differentiable, i.e., all its partial derivatives exist and are continuous. Then the directional derivative of 1 in the direction of Y as defined by (6.5.3) exists and lim I(X
+ tY) t
t-O
I(X) =
tr
(Y' 81 ) ax'
(6.5.3)
In some problems, it may be relatively easy to evaluate the limit on the left-hand side of (6.5.3). Once we know what it is, ∂f/∂X can be figured out from (6.5.3). As an example, consider the following problem. Let A be a matrix of order m × m and, for x in R^m, let f(x) = x'Ax. Observe that for any constant vector y,

lim_{t→0} [f(x + ty) − f(x)]/t = lim_{t→0} [x'Ax + t²y'Ay + ty'Ax + tx'Ay − x'Ax]/t
= y'Ax + x'Ay = y'(A + A')x = tr(y' ∂f/∂x).
But tr(y'(∂f/∂x)) = y'(∂f/∂x). Hence ∂f/∂x = (A + A')x. This can also be obtained by a straightforward evaluation of the vector derivative. The formula (6.5.3) is also useful in deriving some identities involving matrix derivatives. Some of them are jotted down below.

P 6.5.1 Let f and g be two differentiable real valued functions of a matrix variable X. Then the following are valid:
(1) ∂(fg)/∂X = f ∂g/∂X + g ∂f/∂X.
(2) ∂(f/g)/∂X = (1/g) ∂f/∂X − (f/g²) ∂g/∂X, provided g is not zero.
(3) For a scalar valued function f of a matrix valued function H = (h_ij) of a matrix variable X,

∂f(H)/∂X = Σ_i Σ_j (∂f/∂h_ij)(∂h_ij/∂X).
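The identity (6.5.3) itself is easy to check numerically. The following sketch (Python with NumPy; the sizes, the random matrices and the choice f(X) = tr(AX) are illustrative assumptions, not taken from the text) compares the difference quotient on the left-hand side of (6.5.3) with tr(Y' ∂f/∂X), using ∂f/∂X = A' for f(X) = tr(AX):

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((n, m))        # constant n x m matrix, so tr(AX) is defined for X of order m x n
f = lambda X: np.trace(A @ X)

X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, n))        # direction of differentiation
t = 1e-6
lhs = (f(X + t * Y) - f(X)) / t        # difference quotient approximating the directional derivative
dfdX = A.T                             # matrix of partial derivatives of tr(AX) with respect to X
rhs = np.trace(Y.T @ dfdX)             # tr(Y' df/dX), the right-hand side of (6.5.3)
print(np.isclose(lhs, rhs))

Since f is linear here, the agreement is exact up to rounding; for a nonlinear f the same comparison holds in the limit t → 0.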
We now focus on vector derivatives. All our functions f are real valued functions defined on the vector space R^m.

P 6.5.2 (1) If f(x) = a'x for some constant vector a ∈ R^m, then ∂f/∂x = a. (2) If f(x) = x'x, then ∂f/∂x = 2x. (3) If f(x) = x'Ax for some constant matrix A of order m × m with real entries, then ∂f/∂x = (A + A')x. (4) If f(x) = x'Ax for some constant symmetric matrix A of order m × m with real entries, then ∂f/∂x = 2Ax.

We now present two important applications of these results. Let A be a nonnegative definite matrix of order m × m, B a matrix of order r × m, and p a column vector of order r × 1, all with constant real entries. The objective is to minimize the function f given by f(x) = x'Ax, x ∈ R^m, subject to the restriction that Bx = p. Introduce the vector λ of Lagrange multipliers of order r × 1 and consider the function
L(x, λ) = x'Ax + 2λ'(Bx − p), x ∈ R^m, λ ∈ R^r. The stationary values of the function L are obtained by setting separately the vector derivatives of L with respect to x and λ equal to zero. Using P 6.5.2, we have

∂L/∂x = 2Ax + 2B'λ = 0,    ∂L/∂λ = 2(Bx − p) = 0.
These equations, which are linear in x and λ, can be rewritten as

[ A  B' ] [ x ]   [ 0 ]
[ B  0  ] [ λ ] = [ p ].

Solving these equations is another story. If rank(B) = r and A is positive definite, then the system of equations admits a unique solution. From the equations, the following series of computations follows:

x = −A^{-1}B'λ,    Bx = −BA^{-1}B'λ = p;
λ = −(BA^{-1}B')^{-1}p,    x = A^{-1}B'(BA^{-1}B')^{-1}p.
This type of optimization problem arises in Linear Models. Suppose Y is a random vector of m components whose distribution could be any one of the distributions indexed by a finite dimensional parameter θ ∈ R^r. Suppose under each θ ∈ R^r, Y has the same dispersion matrix A but the expected value is given by E_θ Y = B'θ for some known matrix B of order r × m. (The expected value of each component of Y is a known linear combination of the components of θ.) One of the important problems in Linear Models is to estimate a linear function p'θ of the parameter vector unbiasedly with minimum variance, where the vector p of order r × 1 is known. In order to make the estimation problem simple, we seek only linear functions of the data Y which are unbiased estimators of p'θ, and in this collection of estimators we search for one with minimum variance. One can show that a linear function x'Y of the data Y is unbiased for p'θ if Bx = p. For such x, the variance of x'Y is x'Ax. Now the objective is to minimize x'Ax over all x subject to the condition Bx = p. If B is of rank r and A is of full rank, then the linear unbiased estimator of p'θ with minimum variance (Best Linear Unbiased Estimator, with the acronym BLUE) is given by

x'Y with x = A^{-1}B'(BA^{-1}B')^{-1}p, i.e., the BLUE of p'θ is p'(BA^{-1}B')^{-1}BA^{-1}Y.
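The constrained minimization above is easy to reproduce numerically. The sketch below (Python with NumPy; the particular A, B and p are randomly generated assumptions chosen only for illustration) solves the stationarity equations Ax + B'λ = 0, Bx = p as one linear system and compares the result with the closed form x = A^{-1}B'(BA^{-1}B')^{-1}p obtained above:

import numpy as np

rng = np.random.default_rng(1)
m, r = 5, 2
M = rng.standard_normal((m, m))
A = M @ M.T + m * np.eye(m)            # positive definite m x m
B = rng.standard_normal((r, m))        # r x m, full row rank with probability one
p = rng.standard_normal(r)

Ainv = np.linalg.inv(A)
x_closed = Ainv @ B.T @ np.linalg.solve(B @ Ainv @ B.T, p)   # closed-form minimizer

K = np.block([[A, B.T], [B, np.zeros((r, r))]])              # stationarity equations in block form
sol = np.linalg.solve(K, np.concatenate([np.zeros(m), p]))
x_kkt = sol[:m]

print(np.allclose(x_closed, x_kkt), np.allclose(B @ x_closed, p))

With Y observed, the corresponding BLUE of p'θ would then be computed as x_closed @ Y.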
Let us look at another optimization problem. Let A be a symmetric and B a positive definite matrix with real entries. Let f(x) = x'Ax and g(x) = x'Bx, x ∈ R^m. We would like to determine the stationary values of the function f(x)/g(x), x ∈ R^m − {0}. We equate the vector derivative of this ratio with respect to x to zero. Using P 6.5.2, we have

∂(f/g)/∂x = (2/(x'Bx)) Ax − (2x'Ax/(x'Bx)²) Bx = 0.

This equation leads to the equation

(A − λB)x = 0, where λ = x'Ax/x'Bx.

Thus a stationary value x in R^m − {0} of the ratio of quadratic forms has to satisfy the equation (A − λB)x = 0 for some λ. (But λ will be automatically equal to x'Ax/x'Bx. Why?) A non-zero solution to the equation exists if the determinant |A − λB| = 0. This determinantal equation has exactly m roots. Thus the stationary values of the ratio of the quadratic forms of interest are at most m in number.

We now focus on matrix derivatives. The function f is a real valued function of a matrix variable X of order m × m. The domain of definition of f need not be the space of all matrices. For the determinant function, we will consider the collection M_m(ns) of all non-singular matrices of order m × m with real entries. This set is an open subset of the collection of all matrices of order m × m in its usual topology. The set {X ∈ M_m(ns) : |X| > 0} is also an open set and we will consider functions having this set as their domain. Differentiability of the determinant function |X| of X in its domain should pose no problems.

P 6.5.3 (1) If f(X) = |X|, X ∈ M_m(ns), then ∂f/∂X = |X|(X^{-1})'.
(2) If f(X) = log |X|, |X| > 0, then ∂f/∂X = (X^{-1})'.
(3) If f(X) = |X|^r, |X| > 0, r fixed, then ∂f/∂X = r|X|^r (X^{-1})'.
PROOF. (1) We use (6.5.3). Let X = (x_ij) ∈ M_m(ns). Let Y = (y_ij) be any arbitrary matrix of order m × m. For small values of t, X + tY will be non-singular. Let us embark on finding the determinant of X + tY. Let X^c = (x^{ij}) be the matrix of cofactors of X. After expanding |X + tY| and omitting terms of the order t², we have

|X + tY| = |X| + t Σ_{i=1}^m Σ_{j=1}^m y_ij x^{ij} = |X| + t tr(Y'X^c).

Consequently,

lim_{t→0} [|X + tY| − |X|]/t = tr(Y'X^c) = tr(Y' ∂f/∂X),

i.e.,

∂f/∂X = X^c = |X|(X^{-1})'.
This completes the proof. The proofs of (2) and (3) are now trivial.

The case of symmetric matrices requires some caution. The space M_m(s) of all symmetric matrices of order m × m is no longer an m²-dimensional vector space. In fact, it is an m(m + 1)/2-dimensional vector space. Now we consider the subset M_m(s, ns) of all non-singular matrices in M_m(s). [The letters ns stand for non-singular.] This subset is an open set in M_m(s) in its usual topology. The determinant function on this subset is under focus. As a simple example, consider the case of m = 2. Any matrix X in M_m(s, ns) is of the form

X = [ x11  x12 ]
    [ x12  x22 ]

with the determinant x11x22 − x12² ≠ 0. Observe that, with f(X) = |X| and x11, x12, x22 treated as the independent variables,

∂f/∂X = 2 [  x22  −x12 ]  −  [ x22   0  ]  =  |X|[2X^{-1} − diag(X^{-1})].
          [ −x12   x11 ]     [  0   x11 ]
This formula holds in general too. Before we jot it down, let us discuss the problem of taking derivatives of functions whose domain of definition is the set of all symmetric matrices. Let f be a scalar valued function of a matrix variable X. It is clear that ∂f/∂X' = (∂f/∂X)'. Let f be a scalar valued function of a matrix variable X, where X is symmetric. What we need is a formula analogous to (6.5.3) operational in the case of a symmetric argument. We do have a direct formula which, in conjunction with (6.5.3), can be used to solve the symmetric problem. The formula for X symmetric is

∂f/∂X = {∂f(Y)/∂Y + ∂f(Y)/∂Y' − diag(∂f(Y)/∂Y)}|_{Y=X}.    (6.5.4)
In working out the derivative ∂f(Y)/∂Y, the function f(·) is pretended to have been defined on the class of all matrices Y, i.e., all the components of Y are regarded as independent variables, and then the derivative is formed. Let us illustrate the mechanics of this formula with a simple example. Let f(X) = |X|, where X is of order 2 × 2, |X| ≠ 0, and X symmetric. Regard f(·) as a function of Y = (y_ij), where Y is of order 2 × 2 and |Y| ≠ 0. More precisely, f(Y) = |Y| = y11y22 − y12y21. Note that

∂f(Y)/∂Y = [ ∂|Y|/∂y11  ∂|Y|/∂y12 ]  =  [  y22  −y21 ]
           [ ∂|Y|/∂y21  ∂|Y|/∂y22 ]     [ −y12   y11 ],

∂f(Y)/∂Y' = [  y22  −y12 ]
            [ −y21   y11 ],

diag(∂f(Y)/∂Y) = [ y22   0  ]
                 [  0   y11 ],
and for X symmetric,

∂f/∂X = {∂f(Y)/∂Y + ∂f(Y)/∂Y' − diag(∂f(Y)/∂Y)}|_{Y=X}
      = [  x22  −2x12 ]  =  |X|[2X^{-1} − diag(X^{-1})].
        [ −2x12   x11 ]

P 6.5.4 (1) If f(X) = |X|, X ∈ M_m(s, ns), then

∂f/∂X = |X|[2X^{-1} − diag(X^{-1})].

(2) If f(X) = log |X|, X ∈ M_m(s, ns), |X| > 0, then

∂f/∂X = 2X^{-1} − diag(X^{-1}).

(3) If f(X) = |X|^r, X ∈ M_m(s, ns), |X| > 0, then

∂f/∂X = r|X|^r [2X^{-1} − diag(X^{-1})].
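The symmetric-case formulas of P 6.5.4 can be verified numerically by perturbing only the independent entries x_ij, i ≤ j, of a symmetric matrix. The sketch below (Python with NumPy; the randomly generated positive definite X is an assumption made for illustration) does this for f(X) = log |X| and compares the finite-difference derivatives with 2X^{-1} − diag(X^{-1}):

import numpy as np

rng = np.random.default_rng(2)
m = 4
M = rng.standard_normal((m, m))
X = M @ M.T + m * np.eye(m)                  # symmetric positive definite, so |X| > 0
f = lambda Z: np.log(np.linalg.det(Z))

Xinv = np.linalg.inv(X)
formula = 2 * Xinv - np.diag(np.diag(Xinv))  # claimed derivative over the symmetric domain

h = 1e-6
numeric = np.zeros((m, m))
for i in range(m):
    for j in range(i, m):
        E = np.zeros((m, m))
        E[i, j] = E[j, i] = 1.0              # x_ij and x_ji move together: one independent variable
        d = (f(X + h * E) - f(X - h * E)) / (2 * h)
        numeric[i, j] = numeric[j, i] = d
print(np.allclose(numeric, formula, atol=1e-5))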
We will now outline some useful formulas on matrix derivatives. Let U and V be two matrix functions of a matrix variable X, where U = (u_ij) and V = (v_ij) are of orders p × q and q × p, respectively, and X is of order m × n. Applying P 6.5.1 (1) to each term u_ij(X)v_ji(X), we deduce

∂/∂X tr(U(X)V(X)) = ∂/∂X tr(U(X)V(Y))|_{Y=X} + ∂/∂X tr(U(Y)V(X))|_{Y=X}.    (6.5.5)

Instead of the trace function dealt with in (6.5.5), we could deal with any scalar valued function f of U(X) and V(X). Accordingly, we have

∂/∂X f(U(X), V(X)) = [∂/∂X f(U(X), V(Y))]|_{Y=X} + [∂/∂X f(U(Y), V(X))]|_{Y=X}.    (6.5.6)
Using (6.5.5) or (6.5.6), one can establish the validity of the following proposition. P 6.5.5 (1) Let U(X) be a matrix valued function of a matrix variable X, where U(X) is of order p x p, non-singular, and X is of order m x n. Then
∂/∂X tr(U^{-1}(X)) = −∂/∂X tr(U^{-2}(Y)U(X))|_{Y=X}.
(2) Let A be a constant matrix of order p x p and U(X) a matrix valued function of a matrix argument X, where U(X) is of order p x p, non-singular, and X is of order m x n . Then
∂/∂X tr(U^{-1}(X)A) = −∂/∂X tr(U^{-1}(Y)AU^{-1}(Y)U(X))|_{Y=X}.
(3) Let A and B be constant matrices each of order m × m and f(X) = tr(AX^{-1}B), X ∈ M_m(ns). Then

∂f/∂X = −(X^{-1}BAX^{-1})'.
(4) Let U(X) be a matrix valued function of a matrix variable X, where U(X) is of order p x p, non-singular, and X is of order m x n. Then
∂/∂X |U(X)| = |U(X)| ∂/∂X tr(U^{-1}(Y)U(X))|_{Y=X}.
(5) Let A be a constant matrix of order m × m and f(X) = |AX|, where X is of order m × m and AX is non-singular. Then
∂f/∂X = |AX| ∂/∂X tr((AY)^{-1}AX)|_{Y=X} = |AX|((AX)^{-1}A)'.
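Formulas such as (3) above lend themselves to quick numerical checks. The sketch below (Python with NumPy; A, B and X are randomly generated assumptions) compares entrywise central differences of tr(AX^{-1}B) with −(X^{-1}BAX^{-1})':

import numpy as np

rng = np.random.default_rng(3)
m = 4
A = rng.standard_normal((m, m))
B = rng.standard_normal((m, m))
X = rng.standard_normal((m, m)) + m * np.eye(m)   # comfortably non-singular
f = lambda Z: np.trace(A @ np.linalg.inv(Z) @ B)

Xinv = np.linalg.inv(X)
formula = -(Xinv @ B @ A @ Xinv).T                # -(X^{-1} B A X^{-1})'

h = 1e-6
numeric = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        E = np.zeros((m, m))
        E[i, j] = 1.0
        numeric[i, j] = (f(X + h * E) - f(X - h * E)) / (2 * h)
print(np.allclose(numeric, formula, atol=1e-4))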
At the beginning of this section, we toyed with another idea of writing the matrix of partial derivatives. More precisely, let F(X) be a matrix valued function of a matrix variable X. We defined

*∂F/∂X = ∂vec(F)/∂(vec X)'.

Even though the entries of *∂F/∂X are simply a rearrangement of the entries of ∂F/∂X, it is useful to compile *∂F/∂X for some standard functions F of X. This is what we do in the following proposition. All these results can be derived from first principles.

P 6.5.6 (1) Let F(X) = AX, where A is a constant matrix of order p × m and X of order m × n. Then

*∂F/∂X = I_n ⊗ A.
(2) Let F(X) = X B, where B is a constant matrix of order n x q and X of order m x n. Then
*∂F/∂X = B' ⊗ I_m.
(3) Let F(X) = AXB, where A and B are constant matrices of orders p x m and n x q, respectively, and X of order m x n. Then
*∂F/∂X = B' ⊗ A.

(4) Let F(X) = AX'B, where A and B are constant matrices of orders p × n and m × q, respectively, and X of order m × n. Then

*∂F/∂X = (B' ⊗ A)P,

where P is the permutation matrix which transforms the vector vec(X) into vec(X'), i.e., vec(X') = P vec(X). (5) Let U(X) and V(X) be matrix valued functions of a matrix variable X, where U(X) is of order p × q, V(X) of order q × r, and X of order m × n. Then
*∂/∂X [U(X)V(X)] = (V(X) ⊗ I_p)' *∂U(X)/∂X + (I_r ⊗ U(X)) *∂V(X)/∂X.

(6) Let F(X) = X'AX, where A is a constant matrix of order m × m and X of order m × n. Then

*∂F/∂X = (X'A' ⊗ I_n)P + (I_n ⊗ X'A).
(7) Let F(X) = AX^{-1}B, where A and B are constant matrices of orders p × m and m × q, respectively, and X of order m × m and non-singular. Then

*∂F/∂X = −(X^{-1}B)' ⊗ (AX^{-1}).

(8) Let U(X) and Z(X) be two matrix valued functions of a matrix variable X, where U(·) is of order p × q, Z(·) of order 1 × 1, and X of order m × n. Let f(X) = Z(X)U(X). Then

*∂f/∂X = vec(U(X)) *∂Z(X)/∂X + Z(X) *∂U(X)/∂X.
(9) Let U(X) be a matrix valued function of a matrix variable X, where U(·) is of order p × p and non-singular, and X is of order m × n. Let F(X) = [U(X)]^{-1}. Then

*∂F/∂X = −((U^{-1}(X))' ⊗ U^{-1}(X)) *∂U(X)/∂X.
(10) Let Y(X) be a matrix valued function of a matrix variable X, where Y(·) is of order p × q and X of order m × n. Let Z(V) be a matrix valued function of a matrix variable V, where Z(·) is of order r × s and V of order p × q. Let F(X) = Z(Y(X)), X ∈ M_{m,n}. Then

*∂F/∂X = (*∂Z(V)/∂V)|_{V=Y(X)} (*∂Y(X)/∂X).
(11) Let Z(X) and Y(X) be two matrix valued functions of a matrix variable X, where Z(X) and Y(X) are of the same order p × q and X of order m × n. Let F(X) = Z(X) · Y(X), X ∈ M_{m,n}, where the symbol · denotes HS multiplication. Then

*∂F/∂X = D(Z(X)) *∂Y(X)/∂X + D(Y(X)) *∂Z(X)/∂X,

where, for any matrix Z = (z_ij) of order p × q, D(Z) = diag(z11, z21, ..., zp1, z12, z22, ..., zp2, ..., z1q, z2q, ..., zpq), the diagonal matrix whose diagonal is vec(Z). (12) Let Z(X) be a matrix valued function of a matrix variable X, where Z(X) is of order p × q and X of order m × n. Let B be a constant matrix of order p × q and F(X) = Z(X) · B, X ∈ M_{m,n}. Then

*∂F/∂X = D(B) *∂Z(X)/∂X.
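The vec-based derivative *∂F/∂X can also be assembled numerically, one column per entry of vec X. The sketch below (Python with NumPy; the sizes and the random A, B, X are illustrative assumptions) does this for F(X) = AXB and compares the result with B' ⊗ A from (3) of P 6.5.6; note that vec stacks columns, which is what reshape with order='F' produces:

import numpy as np

rng = np.random.default_rng(4)
p, m, n, q = 2, 3, 4, 2
A = rng.standard_normal((p, m))
B = rng.standard_normal((n, q))
X = rng.standard_normal((m, n))
vec = lambda M: M.reshape(-1, order="F")      # stack the columns of M into one long vector

F = lambda Z: A @ Z @ B
h = 1e-6
cols = []
for j in range(n):                            # vec(X) order: x_11, ..., x_m1, x_12, ..., x_mn
    for i in range(m):
        E = np.zeros((m, n))
        E[i, j] = 1.0
        cols.append((vec(F(X + h * E)) - vec(F(X))) / h)
star_dFdX = np.column_stack(cols)             # numerical *dF/dX of order pq x mn
print(np.allclose(star_dFdX, np.kron(B.T, A), atol=1e-6))

Because F is linear in X, the finite differences are exact up to rounding here; the same loop works for a nonlinear F with a small enough step.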
As has been indicated earlier, the matrix derivative defined as *∂F/∂X is very useful in evaluating the Jacobian of a transformation. Suppose F(X) is a matrix valued function of a matrix variable X, where both X and F(X) are of the same order m × n. The Jacobian J of the transformation F(·) is given by

J = | *∂F/∂X |_+ ,
where the suffix + indicates the absolute value of the determinant of the matrix *∂F/∂X of order mn × mn. Suppose F(X) = AXB, where A and B are constant non-singular matrices of orders m × m and n × n, respectively, and X ∈ M_{m,n}. The Jacobian of the transformation F(·) is given by

J = |B' ⊗ A|_+ = |A|_+^n |B|_+^m.
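The Jacobian claim for F(X) = AXB can be seen concretely in a short numerical sketch (Python with NumPy; the small random non-singular A and B are assumptions for illustration), which compares |det(B' ⊗ A)| with |det A|^n |det B|^m:

import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 2
A = rng.standard_normal((m, m)) + m * np.eye(m)   # non-singular m x m
B = rng.standard_normal((n, n)) + n * np.eye(n)   # non-singular n x n

J_direct = abs(np.linalg.det(np.kron(B.T, A)))    # |det *dF/dX| for F(X) = AXB
J_formula = abs(np.linalg.det(A)) ** n * abs(np.linalg.det(B)) ** m
print(np.isclose(J_direct, J_formula))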
Complements
6.5.1 Let F(X) = X be the identity transformation of the matrix variable X of order m × n. Show that ∂F/∂X = (vec(I_m)) ⊗ (vec(I_n))'.
6.5.2 Let F(X) = X be the identity transformation of the matrix variable X of order m × n. Show that *∂F/∂X = I.
6.5.3 Let F be a matrix valued function of order p × q of a matrix variable X = (x_ij) of order m × n. Show that

*∂F/∂X = Σ_{i=1}^m Σ_{j=1}^n (vec ∂F/∂x_ij)(vec E_ij)',
where E_ij is a matrix of order m × n whose (i,j)-th entry is unity and the rest of its entries zeros.
6.5.4 Let A be a constant matrix of order m × n and f(X) = tr(AX), X ∈ M_{n,m}, the vector space of all matrices X of order n × m. Show that ∂f/∂X = A'. If m = n and the domain of definition of f is the collection M_m(s) of all m × m symmetric matrices, show that ∂f/∂X = A + A' − diag(A).
6.5.5 Let f(X) = tr(X²), X ∈ M_m, the space of all m × m matrices. Show that ∂f/∂X = 2X'. If the domain of definition of f is the collection of all symmetric matrices, how does the matrix of partial derivatives change?
6.5.6 Let A and B be two constant matrices of orders m × m and n × n, respectively. Let f(X) = tr(X'AXB), X ∈ M_{m,n}, the space of all m × n matrices. Show that ∂f/∂X = AXB + A'XB'. If m = n and the domain of definition of f is the collection of all m × m symmetric matrices, show that

∂f/∂X = AXB + A'XB' + BXA + B'XA' − diag(AXB + A'XB').
6.5.7 Let A and B be two constant matrices of the same order m × m and f(X) = tr(XAXB), X ∈ M_m. Show that

∂f/∂X = A'X'B' + B'X'A'.

If the domain of definition of f is the space of all symmetric matrices, show that

∂f/∂X = A'XB' + B'XA' + AXB + BXA − diag(A'XB' + B'XA').
6.5.8 Let A be a constant matrix of order m × m and f(X) = tr(X'AX), X ∈ M_{m,n}. Show that ∂f/∂X = (A + A')X. If m = n and the domain of definition of f is the collection of all symmetric matrices, show that

∂f/∂X = (A + A')X + X(A + A') − diag((A + A')X).
6.5.9 Let f(X) = tr(X^n), X ∈ M_m, n ≥ 1. Show that ∂f/∂X = n(X^{n-1})'. If the domain of definition of f is the space of all symmetric matrices, show that

∂f/∂X = 2nX^{n-1} − n diag(X^{n-1}).
6.5.10 Let x and y be two fixed column vectors of orders m × 1 and n × 1, and f(X) = x'Xy, X ∈ M_{m,n}. Show that ∂f/∂X = xy'. If m = n and the domain of definition of f is the set of all symmetric matrices, show that ∂f/∂X = xy' + yx' − diag(xy').
6.5.11 Let A be a constant matrix of order m × m and f(X) = tr(AX^{-1}), X ∈ M_m(ns), the set of all non-singular matrices of order m × m. Show that

∂f/∂X = −(X^{-1}AX^{-1})'.

If the domain of definition of f is confined to the collection of all non-singular symmetric matrices, show that

∂f/∂X = −[X^{-1}(A + A')X^{-1} − diag(X^{-1}AX^{-1})].
6.5.12 Let f(X) = |XX'|, X ∈ M_{m,n} and rank(X) = m. Show that

∂f/∂X = 2|XX'|(XX')^{-1}X.

6.5.13 Let a and b be two constant column vectors of orders m × 1 and n × 1, respectively. Determine the matrix derivative of each of the scalar valued functions f1(X) = a'Xb and f2(X) = a'XX'a, X ∈ M_{m,n}, the collection of all matrices of order m × n, with respect to X.
6.5.14 Let a be a constant column vector of order m × 1 and f(X) = a'X^{-1}a, X ∈ M_m(ns), the collection of all non-singular matrices of order m × m. Determine the matrix derivative of the scalar valued function f with respect to X.
6.5.15 Let p be any positive integer and f(X) = X^p, X ∈ M_m. Show that

*∂f/∂X = Σ_{j=1}^p (X')^{p-j} ⊗ X^{j-1}.

6.5.16 Find the Jacobian of each of the following transformations, where A and B are constant matrices of order m × m, and X ∈ M_m.
(1) f(X) = AX^{-1}B, X non-singular. (2) f(X) = XAX'. (3) f(X) = X'AX. (4) f(X) = XAX. (5) f(X) = X'AX'.

Notes: The following papers and books have been consulted for developing the material in this chapter: Hartley, Rao, and Kiefer (1969), Rao and Mitra (1971b), Rao (1973c), Srivastava and Khatri (1979), Rao and Kleffe (1980), Graham (1981), Barnett (1990), Rao (1985a), Rao and Kleffe (1988), Magnus and Neudecker (1991), Liu (1995), among others.
CHAPTER 7 PROJECTORS AND IDEMPOTENT OPERATORS
The notion of an orthogonal projection has been introduced in Section 2.2 in the context of inner product spaces. Look up Definition 2.2.11 and the ensuing discussion. In this chapter, we will introduce projectors in the general context of vector spaces. Under a particular mixture of circumstances, an orthogonal projection is seen to be a special kind of projector. We round up the chapter with some examples and complements. 7.1. Projectors DEFINITION 7.1.1. Let a vector space V be the direct sum of two
subspaces V1 and V2, V1 ∩ V2 = {0}, i.e., V = V1 ⊕ V2. (See P 1.5.5 and the discussion preceding P 1.5.7.) Then any vector x in V has a unique decomposition x = x1 + x2 with x1 ∈ V1 and x2 ∈ V2. The transformation x ↦ x1 is called the projection of x on V1 along V2. The operator or map P defined on the vector space V by Px = x1 is called a projector from the vector space V to the subspace V1 along the subspace V2. The first thing we would like to point out is that the map P is well-defined. Further, the map P is an onto map from V to V1. It is also transparent that the projector P restricted to the subspace V1 is the identity transformation from V1 to V1, i.e., Px = x if x ∈ V1. If V is an inner product space and x ∈ V1, y ∈ V2 implies that x and y are orthogonal, i.e., V2 is the orthogonal complement of V1, the map P is precisely the orthogonal projection from the space V to the space V1 as enunciated in Definition 2.2.11. Suppose V1 is a subspace of V. There could be any number of subspaces V2 of V such that V1 ⊕ V2 = V. Each such subspace V2 gives a projector P from V onto V1 along V2.
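A tiny numerical illustration of the definition may help fix ideas (Python with NumPy; the particular subspaces of R² below are assumptions chosen for concreteness). Both matrices project R² onto V1 = span{(1, 0)}, but along two different complements V2, and they are different operators:

import numpy as np

# P1 projects onto span{(1,0)} along span{(0,1)}; P2 projects onto the same V1 along span{(1,-1)}
P1 = np.array([[1.0, 0.0],
               [0.0, 0.0]])
P2 = np.array([[1.0, 1.0],
               [0.0, 0.0]])

for P in (P1, P2):
    assert np.allclose(P @ P, P)     # both are idempotent, hence projectors (cf. P 7.1.3 below)
x = np.array([2.0, 3.0])
print(P1 @ x, P2 @ x)                # both images lie in V1, but they are different vectors

P1 is symmetric while P2 is not, which foreshadows the distinction between orthogonal projections and general projectors made later in this chapter.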
No two such projectors are the same! In the following proposition, we record a simple property of projectors. P 7.1.2
A projector P is a linear transformation.
PROOF. Let x = x1 + x2 and y = y1 + y2 be the unique decompositions of two vectors x and y in V, respectively, with respect to the subspaces V1 and V2 of V. The decomposition of the vector x + y works out to be

x + y = (x1 + y1) + (x2 + y2)

with (x1 + y1) ∈ V1 and (x2 + y2) ∈ V2. By definition,

P(x + y) = x1 + y1 = Px + Py.

This shows that the map P is additive. For a scalar a, ax = ax1 + ax2 with ax1 ∈ V1 and ax2 ∈ V2. Consequently, P(ax) = ax1 = a(Px). Hence P is linear.

The definition of a projector involves two subspaces with only the zero vector common between them. In the following proposition, we characterize projectors abstractly without alluding to the underlying subspaces.

P 7.1.3 A linear transformation P from a vector space V into itself is a projector from V onto some subspace of V along some complementary subspace of V if and only if it is idempotent, i.e., P² = P.

PROOF. Let P be a projector from V onto a subspace V1 along a subspace V2 of V. Let x = x1 + x2 be the unique decomposition of any vector x in V with x1 in V1 and x2 in V2. The unique decomposition of x1 in V is given by x1 = x1 + 0 with x1 ∈ V1 and 0 ∈ V2. Consequently,
P²x = P(Px) = Px1 = x1 = Px.

Hence it follows that P is idempotent. Conversely, let P² = P and

V1 = {x ∈ V : Px = x},    V2 = {x ∈ V : Px = 0}.

Since P is a linear transformation, V1 and V2 are subspaces of V. Further, V1 ∩ V2 = {0}. For any given x ∈ V, write x = Px + (I −
P)x, where I is the identity transformation from V to V. Since P is idempotent, P(I − P)x = Px − P²x = 0, from which we have (I − P)x ∈ V2. Thus x = x1 + x2, where x1 = Px and x2 = (I − P)x ∈ V2 (and Px ∈ V1 since P(Px) = Px), is the unique decomposition of x with respect to the subspaces V1 and V2. Hence P is a projector from V onto V1 along V2. In view of P 7.1.3, we can safely omit mentioning the subspaces that define a projector. Once we recognize the projector as an idempotent operator, the associated subspaces V1 and V2 of the projector can be recovered via the formulas presented in the proof of P 7.1.3. These subspaces are explicitly identified in P 7.1.4 below. In order to show that a particular linear map is a projector, in many cases it is easier to show that it is an idempotent operator. We now jot down several equivalent characterizations of projectors. Let P be a linear transformation from a vector space V to V. Let R(P) and K(P) be the range and kernel of the transformation P, respectively. See Section 3.1.
P 7.1.4 The following statements are equivalent.
(1) The map P is a projector.
(2) The map (I − P) is a projector.
(3) The range R(P) of P is given by R(P) = {x ∈ V : Px = x}.
(4) R(P) = K(I − P).
(5) R(I − P) = K(P).
(6) R(P) ∩ R(I − P) = {0}.
(7) K(P) ∩ K(I − P) = {0}.
(1) P = PI + P2 + ... + Pk is a projector. (2) R(Pi) nR(Pj} = {O} for all i =1= j and R(P)
= R(Pt}61R(P2} $
... $ R(Pk }.
(I) It is easy to establish that P is idempotent. (2) Let i =1= j and z E R(Pi} nR(Pj). Then z = PiX = PjY for some vectors X and yin V. Observe that PiX = plx = Pi(Pi X} = Pi(PjY} = PROOF.
242
MATRIX ALGEBRA THEORY AND APPLICATIONS
~PjY = o. Hence z = O. This proves that R(~) nR(Fj) = {O}. By the very definition of the projector P, R(P) C R(Pt}EB .. . Ej1 R(Pk). On the other hand, note that R(~) C R(P) for each i. For , if x E R(Pi ), then x = PiY for some y in V and P PiY = ply = ~y = x , from which it follows that x E R(P). Consequently, R(Pt) EBR(P2) EB.· .EBR(Pk) C R( P) . This completes the proof. The following result is complimentary to P 7.1.5. A given projector can be written as a sum of projectors under the right mixture of circumstances. P 7.1.6 Let P be a projector defined on a vector space V onto a subspace VI along a subspace V 2 of V. Suppose the subspace VI is a direct sum of subspaces, Le., VI = V ll EB V 12 EB ... EB VIr for some subspaces Vlj's of V. Then there exist unique projectors ~ from V onto V Ii along an appropriate subspace of V such that P = PI + P2 + ... + Pr and PiPj = 0 for all i =1= j. One can always bring into existence a projector as long as we have two subspaces whose direct sum is the underlying vector space. In order to identify Pi we need two subspaces. We already have one, namely, VIi. In order to avoid an appropriate subspace complementary to the subspace VIi, let us define the map ~ directly. Let x E V. We can write x = Xu + X12 + . .. + Xlr + y with Xli E VI i and y E V 2 . Define PiX = Xli. The map Pi is obviously a linear transformation and idempotent. Consequently, Pi is a projector. (Can you identify the subspace V 2i such that F1 is a projector from V onto VIi along V 2i?) It is clear that P = PI + P 2 + ... + P r and PiPj = 0 for all i =1= j. To prove uniqueness, let P = Ql + Q2 + .. . + Qr be an alternative representation of P as a sum of projectors. Then for any X in V, 0 = Px - Px = (PI - Ql)X + (P2 - Q2)X + ... + (Pr - Qr)x. This implies that (Pi - Qi)X = 0 for each i in view of the fact that (Pi - Qi)X E VIi. If (Pi - Qi)X = 0 for every x, then Pi = Qi. This proves P 7.1.6. We now look at a familiar problem that crops up in Statistics. Suppose YI, Y2, ... , Ym are m real random variables whose joint distribution depends on a vector parameter ()' = ((}l, (}2, ... ,(}k) E R k, with m > k. Suppose PROOF .
Eo"Yi = Xil(}l + Xi2(}2 + ... + Xik(}k, i = 1, 2, . .. ,m , where Xij'S are known. In the language of linear models, the random
Projectors and Idempotent Operators
243
variables Y1 , Y2, ... , Ym constitute a linear model. These models lie at the heart of multiple regression analysis and design of experiments pro~ lems. Let V be the collection of all linear functions of Yi, Y2, ... , Ym . It is clear that V is a real space of dimension m. As a matter of fact, we can identify the vector space V with R m in the obvious manner. Let V be the collection of all linear unbiased estimators of zero. A linear unbiased estimator of zero is any linear function i I YI +i2 Y2 +... +im Ym of YI , Y2, ... ,Ym such that E9(i IY I + i 2Y 2 + ... + imYm) = 0 for all () in R k. Such linear functions constitute a subspace V I of V. The space VI can be identified explicitly. Let X = (Xij). The matrix X of order m x k is called the design matrix of the linear model. One can check that VI = {i' = (i},i 2, ... ,im ) E R m : i'x = O}. Every vector i' in V I is orthogonal to every colUlnn vector of the matrix X. Let V 2 be the collection of all vectors in Rm each of which is a linear combination of the columns of X. The space V 2 is clearly a subspace of V. Further, V = VI a1 V2. As a matter of fact, each vector in VI and each vector in V 2 are orthogonal. The next target is to identify explicitly the projector from the vector space V onto VI along V2. To simplify the argument, assume that the rank of the matrix X is k. This ensures the matrix X' X to be non-singular. Let
A
= X(X'X)-IX'.
It is clear that the matrix A is of order m x m. One can also check that it is symmetric and idempotent, Le., A2 = A. Let i' be any vector in V = Rm. Observe that i = (Irn - A)i + Ai, where 1m is the identity matrix of order m x m. We claim that the vector ((Im - A)i)' = i'(Im - A) belongs to VI. For i'(Im - A)X = i'(X - X) = O. FUrther, it is clear that Ai = X(X'X)-IX'i is a linear combination of the colmnns of X. Thus we have demonstrated practically how the vector space V is the direct sum of the subspaces VI and V 2 . Let P be the projector from V onto V I along V 2. The explicit formula for the computation of P is given by (7.1.1) Pi = (Im - X(X'X)-IX')i. IT V = Rm is equipped with its usual inner product, the projector P is indeed an orthogonal projection.
244
MATRIX ALGEBRA THEORY AND APPLICATIONS
There is one benefit that accrues from the explicit formula (7.1.1) of the projector P. Suppose Y l , Y2, . .. ,Ym are pairwise uncorrelated with common variance (1"2 > o. Let i l Y l + i2 Y2 + ... + im Ym = i'Y be a linear function of Y l , Y2,.·· ,Ym , where i' = (fl, i2, ... ,im ) and Y' = (YI , Y2, . .. ,Ym ). Let i = i(l) + i(2) be the decomposition of l with respect to the subspaces VIand V 2. One can verify that under each () E Rk,
i.e., l(l) Y and i(2) Yare uncorrelated. The celebrated Gauss-Markov theorem unfolds in a simple way in this environment. Let fO be a linear parametric function, i.e., f(()) = PI()l +P2()2 + ... +Pk()k for some known numbers PI, P2, ... ,Pk, () E Rk. We now seek the best linear unbiased estimator (BLUE) of fO. The estimator should be of the form l I Y I + i2 Y2 + ... + im Ym, unbiased, and has minimum variance among all linear unbiased estimators of fO. To begin with, cook up any linear unbiased estimator i l Yl + l2 Y2 + ... + lmYm = i'Y of fO. Obtain the decomposition i = l(l) + i(2), with respect to the subs paces VI and V 2. Then i(2)Y is the desired BLUE of fO. To see this, let s'Y be any linear unbiased estimator of f( ·). Write s'y = (s - l(2) )'Y + l(2) Y . Note that s - l(2) E VI. (How?) Consequently, for each () E R k, Varianceo(s'Y)
= Varianceo((s -
=> Varianceo(s'Y)
i(2»)'Y)
+ Varianceo(i(2) Y),
~ Variance(i(2) Y).
Complements 7.1.1 If P is a projector defined on a vector space V onto a subspace VI of V along a subspace V2 of V, identify the subspaces V h and V 2• such that the operator 1- P is a projector from V onto V h along V 2•. 7.1.2 Let V = R2, VI = {(Xl,O) E R2 : Xl real}, and V 2 = {(Xl, X2) E R2 : Xl + X2 = O}. Identify the projector PI from V onto VI along V 2. Under the usual inner product on the real vector space V, is PI an orthogonal projector? Explain fully.
Projectors and Idempotent Operators
245
7.1.3 Let V = R2, VI = {(Xl, O) E R2 : Xl real}, and V 2 = {(Xl,X2) E R2 : 2XI + X2 = O}. Identify the projector P 2 from V onto VI along V2. 7.1.4 Let P = PI +P2, where PI is the projector identified in Complement 7.1.2 and P2 the projector in Complement 7.1.3. Is P a projector? Explain fully. 7.1.5 Let Pt, P2, ... ,Pk be projectors defined on a vector space V such that PiPj = 0 for all i =I j. Identify the subspaces V I and V 2 such that P = PI + P2 + ... + Pk is projector from V onto VI along V2. 7.1.6 Let F = {O, I} be the two-element field and V = F2, a vector space over the field F. Let V I = {(O, O), (1, O)} and V 2 = {(O, O), (0, I)}. Show that V = VI EB V 2. Let PI be the projector from V onto VI along V 2 • Show that PI + PI = 0, the map which maps every element of V to the vector (O,O). Let P2 be the projector from V onto V along the subspace {(O,O)}. Show that PI + P2 is a projector but PI P2 =I O. Comment on P 7.1.5 in this connection. 7.1.7 Let V be a vector space over a field F. Suppose that the field F has the property that 1 + 1 =I 0 in F. Let PI and P2 be two projectors defined on V. Show that P = PI + P2 is a projector if and only if PI P2 = P2P I = O. If P is a projector, show that P is a projector from V onto R(PI } EB R(P2} along K(Pt} K(P2}. Hint: First, show that P I P2+P2PI = 0 and then PI P2P I +PIP2PI =
n
O. 7.1.8 Let PI and P2 be two projectors. Show that P I -P2 is a projector if and only if PI P2 = P2P I = P2, in which case PI - P 2 is a projector from V onto R(PI } K(P2} along K(PI }EBR(P2 }. The condition on the underlying field F stipulated in Complement 7.1.7 is still operational. 7.1.9 Let PI and P2 be two projectors such that PI P2 = P2PI. Show that P = P I P2 is a projector. Identify the subspaces VI and V2 so that P is a projector from V onto VI along V 2.
n
7.2. Invariance and Reducibility In this section we explore the world of invariant subspaces. We begin with some basic definitions and notions. Let V represent a generic symbol for a vector space in the following deliberations. DEFINITION 7.2.1. A subspace W of V is said to be invariant under a linear transformation T from V if Tx E W whenever X E W.
246
MATRIX ALGEBRA THEORY AND APPLICATIONS
In other words, what this definition indicates is that if the map T is restricted to the space W, then it is a linear map from W to W. The notion of invariance can be extended to cover two subspaces as in the following definition. DEFINITION 7.2.2. A linear transformation T from V to V is said to be reduced by a pair of subspaces VI and V 2 if VI and V2 are both invariant under T and V = V I EI1 V 2. It will be instructive to examine the notion of invariance in the realm of projectors. Suppose P is a projector from V onto a subspace VI along a subspace V 2. It is clear that V I is invariant under the linear transformation P- It not only maps elements of V I into V I but also all the elements of V. It is also clear that V 2 is also invariant under P. As a matter of fact, every element of V 2 is mapped into O. We will now determine the structure of the matrix associated with a linear transformation with respect to a basis in the context of invariance.
P 7.2.3 Let W be a subspace of V which is invariant under a given transformation T from V to V. Then there exists a basis of V with respect to which the matrix A of the transformation T can be written in the triangular form
A = mXm
(ic\ 0
(m-k) xk
where m
kX~2_k») A3 '
(7.2.1)
(m-k) x(m-k)
= dim(V) and k = dim(W).
For an exposition on matrices that are associated with linear transformations, see Section 3.4. Let Xl, X2, ... ,Xm be a basis of the vector space V so that XI, X2, ••. ,Xk form a basis for the subspace W. Let A = (aij) be the matrix associated with the linear transformation T with respect to this basis. As a matter of fact, PROOF.
m
AXi = LajiXj, i= 1,2, ... ,m. j=1
Since W is an invariant subspace under T, we must have k
AXi= LajiXj, i= 1,2, ... ,k. j=1
Projectors and Idempotent Operators
247
This implies that Qji = 0 for j = k + 1, k + 2, .. . ,m and i = 1,2, .. . , k. Hence the matrix A must be of the fonn (7.2.1). IT the linear transformation T is reduced by a pair of subspaces, then the matrix associated with T is more elegant as we demonstrate in the following proposition.
P 7.2.4 Suppose a linear transformation T is reduced by a pair of subspaces VI and V 2 of V. Then there exists a basis of V such that the matrix A of the transformation T with respect to the basis is of the fonn AI
A mxm -
where m
= dim(V),
k
kxk
(
0
(m-k)xk
= dim(Vd,
o
k x (m-k)
A3
)
'
(7.2.2)
(m-k) x (m-k)
and m - k
= dim(V2) '
PROOF. Let XI, X2, •.• ,Xm be a basis of V such that XI, X2,'" ,Xk fonn a basis for VIand Xk+I, Xk+2, •.. ,Xm form a basis for V 2. Following the argument presented in the proof of P 7.2.3, we can discern that A must be of the form (7.2.2). Projectors onto an invariant subspace of some linear transformation have an intimate relationship with the transformation. In the following propositions we bring out the connection.
P 7.2.5 If a subspace W of V is invariant under a linear transformation T from V to V, then PTP = TP for every projector P from V onto W. Conversely, if PTP = TP for some projector P from V onto W, then W is invariant under T. Let P be a projector from V onto W . Then for every X in P)x with Px E W. IT W is invariant under T, then TPx = Py for some y in V. Here we use the fact that W = R(P). Further, PTPx = p2 y = Py = TPx. Consequently, PTP = TP. Conversely, let PT P = T P for some projector P from V onto W. For every x in V, the statement that PT Px = T Px implies that T Px E R(P) = W. If yEW = R(P), then y = Px for some x in V . Consequently, Ty = T Px E W. This shows that W is invariant under PROOF.
V,
T.
X
= Px + (1 -
248
MATRIX ALGEBRA THEORY AND APPLICATIONS
P 7.2.6 A linear transformation T from V to V is reduced. by a pair subspaces VI and V 2 if and only if PT = TP, where P is the projector from V onto V I along V 2·
TP = PT. If x E VI, then Px = x . Note that which implies that Tx E R{P) = VI' This shows that VI is invariant under T. If y E V2, PTy = TPy = TO = O. This shows that Ty E V 2. Hence V 2 is invariant under T. Conversely, suppose that T is reduced by V I and V 2. Since V I is invariant under T, we have PTP = TP by P 7.2.5. Since V2 is invariant under T and (I - P) is a projector from V onto V 2 along VI, we have (I P)T{I - P) = T{I - P) by the same proposition. This simplifies to T + PTP - PT - TP = T - TP, from which we have PT P = PT. The result now follows. PROOF. Suppose
PTx
= TPx = Tx
Complements Develop a result analogous to P 7.2.3 for projectors. Develop a result analogous to P 7.2.4 for projectors. Let V = R2, VI = {(XI,O) E V : Xl real}, and V2 = {(X},X2) : Xl + X2 = O}. Let P be the projector from V onto VI along V2. Let (1,0) and (0,1) constitute a basis for the vector space V. Construct the matrix of the linear transformation P with respect to the given basis. Let (1,0) and (1,-1) constitute another basis for the vector space. Construct the matrix of the linear transformation P with respect to the new basis. 7.2.4 Let W be a subspace of a vector space V. Let dim(V) = m and dim{W) = k. Let T be a linear transformation from V to V and W be invariant under T. Choose a basis Xl, X2, .•. , Xm such that x}, X2, .•• , Xk form a basis for W. Let P be any projector from V onto W. Let AT and Ap be the matrices of the transformations T and P, respectively, with respect to the given basis. Show that ApATAp = ATAp directly. Place this assertion vis-a-vis with P 7.2.5.
7.2.1 7.2.2 7.2.3
7.3. Orthogonal Projection
In Section 2.2 we touched upon the orthogonal projection briefly. See Definition 2.2.12 and the ensuing discussion. In this section, we will spend some time with the orthogonal projection and learn some more. We will be working in the environment of inner product spaces. Let
Projectors and Idempotent Operators
249
V be an inner product space equipped with an inner product < -, ' >. Let W be a subspace of V and W.l its orthogonal complement. Recall that the projector P from V onto W along W.l is called the orthogonal projection on W. Before we proceed with some characteristic properties of orthogonal projections, we need to brush up our knowledge on the adjoint of a transformation. If T is a linear transformation from an inner product space V into itself, then there exists a linear transformation T* from V to V such that
< x,Ty > = < T*x,y >
for all x and yin V.
The transformation T* is called the adjoint of T. If T* = T , T is called self-adjoint.
P 7.3.1 A linear map P from V to V is an orthogonal projection if and only if P is idempotent and self-adjoint, i.e.,
p2 = P
and P*
=
P.
PROOF. Suppose P is an orthogonal projection. Since it is a projector, it is idempotent. See P 6.1.3. We can identify the relevant subspaces involved. The map P is a projector from V onto R{P}, the range of P, along R(I - P}. Since P is an orthogonal projection, the subspaces R{P} and R(I - P} must be orthogonal, i.e.,
¢:}
'*
< u, v> = 0 for all u E R{P} and v E R(I < (I - P}x, Py >= 0 for all x and y in V < P*(I - P}x, y >= 0 for all x and y in V.
P}
Consequently, P* (I - P)
=0
or P* P
= P* .
(Why?) Observe that P* = P* P = {P* P}* = {P*} * = P. See the complements under Section 3.6. Conversely, suppose that p2 = P and P* = p _ It is clear that P is a projector from R{P} along R(I - P}. See P 6.1.3. We need to show that the subspaces R{P} and R(I - P} are orthogonal. For x and y in V, observe that
< (1 -P}x, Py >=<
P*{I -P}x, y
>=< P(I -P}x, Y >=< Ox, Y >= O.
Hence R{ P) and R( I - P} are orthogonal.
250
MATRIX ALGEBRA THEORY AND APPLICATIONS
7.3.2. Let V = em, a complex vector space of dimension m. The standard inner product on V is given by EXAMPLE
m
< X,y >
=
L Xi1Ji, i=1
where x = (xt, X2, ... ,xm ) and y = (yt, Y2,··· ,Ym) E V. Let P = (Pij) be a matrix of order m X m with complex entries. The matrix P can be regarded as a linear transformation from V to V. The complex conjugate of P is the matrix p. = (qij) (abuse of notation?), where % = fiji for all i and j. Recall that P is Hermitian if p. = P - One can verify that for any two vectors x and Y in V,
< x,py > = < p·x,y >, with the understanding that when we write Py we view y as a column vec~or and then carry out matrix multiplication of P and y. The matrix p. is after all the adjoint of P when they are viewed as transformations. The matrix P viewed as a linear transformation is a projector if and only if p2 = P and P is Hermitian. Sums of orthogonal projections are easy to handle. The following proposition handles this situation, which is easy to prove. P 7.3.3 Let Vt, V 2 , ... ,Vr be pairwise orthogonal subspaces of V. Let Vo = VI $ V 2 Ea ... $ Yr. Let Pi be the orthogonal projection on Vi, i = 1,2, ... ,T. Then P = PI + P2 + ... + Pr is an orthogonal projection on Vo.
7.4. Idempotent Matrices Every linear transformation T from a finite-dimensional vector space V to V has a matrix associated with it under a given basis of V. See Section 3.4. In particular, the matrices associated with projectors are of special interest. In this section, we focus on matrices with entries as complex numbers. A square matrix A is said to be idempotent if A2 = A. This definition is analogous to the one we introduced for linear transformations. In brief, transformations and matrices associated with them are hardly distinguishable and using the same word "idempotent" in both the contexts should not cause confusion.
Projectors and Idempotent Operators
251
The definition of idempotent matrix is also operational when the entries of the matrix come from any field. Some of the results stated below make sense in the general framework of matrices with entries coming from any field. P 7.4.1 Let A be an idempotent matrix of order m x m. The following are valid. (1) The eigenvalues of A can only be zeros and ones. (2) The matrix A admits a factorization A = QR* with Q and R being of order m x k and R*Q = I k , where k = p(A), the rank of A. (3) The matrix A is diagonalizable, i.e., there exists a non-singular matrix L and a diagonal matrix D. such that A = LAL -1, the diagonal entries of A being zeros and ones. (4) p( A) = Tr( A). (The trace operation is discussed in Complement 3.4.7.) (5) There exists a positive definite matrix C such that A = C-I A*C. (6) A is a projection matrix, i.e., there exist two subspaces VI and V2 on em such that VI nV2 = {O}, em = VI EB V2, and if x = Xl + X2 with Xl E VI and X2 E V2, then Ax = Xl. (If we view A as a transformation from em to em, then A is a projector from em onto VI along V 2 , in the usual jargon. As usual, members of em are viewed as column vectors.) PROOF. (1) Let A be an eigenvalue of A with an associated eigenvector x. Then Ax = AX implies that AX = Ax = A2x = A(Ax) = .x(Ax) = .x(.xx) = .x2x. Since X =1= O,.x = or 1. (2) By the Singular Value Decomposition Theorem (P 5.3.4), we can write A = QD.P*, where Q is of order m x k with the property that Q*Q = Ik, P is of order m x k with the property that P* P = h, and D. is a diagonal matrix of order k x k with positive entries in the diagonal. Since A2 = A, we have QD.P*QD.P* = QD.P*, from which we have D.P*QD. = D. or D.P*Q = I k • Take R* = D.P*. Thus we have A = QR* with R*Q = I k • (3) Choose a matrix S of order m x (m - k) so that the augmented matrix L = (QIS) is non-singular and R*S = 0, where Q and Rare the matrices that appear in the representation (2) of A above. (How?) Now choose a matrix U of order m x (m - k) such that U*S = Im-k and U*Q = 0. (How?) One can verify that L- I = (RIU)*. (Verify that
°
252
MATRIX ALGEBRA THEORY AND APPLICATIONS
L- I L = 1m.) Observe that
A
= (QIS)
[~ ~]
(RIU)*
= LAL-I,
(7.4.1)
where A is the diagonal matrix given by
A=
[~ ~] .
(4) From (7.4.1), Tr(A) = Tr(LAL-I) = Tr(AL -1 L) = Tr(A) = k = p(A) . See Complement 3.4.7. (5) Note that 1m - A is also idempotent. Consequently, p(lm A) = m - k. Consider the rank factorizations of A and 1m - A . See Corollary 5.2.3. Write A = DI EI and 1m - A = D2E2, where DI is order m x k with rank k, EI of order k x m with rank k, D2 of order m x (m - k) with rank (m - k), and E2 of order (m - k) x m with rank m-k. Let FI = (DIID2) and F2 = [~~]. Then FIF2 = DIEI +D2E2 = A + (1m - A) = 1m. It now follows that FI = F2- I . Let C = F2 F2. It is clear that C is non-singular and Hermitian. Further, C is positive definite. Note that FI = (DIID2) = F2- I = C-I F2 = C-I(EjIE2'), from which we have DI = C-I Ej or equivalently, CD I = Ej. Finally,
This completes the proof. (6) Take VI = R(A), the range of the matrix A , and V 2 = R(lmA). It is clear that VI nV2 = {o}. For every x in em, note that x = Ax + (1m - A)x, Ax E VI = R(A), and (1m - A)x E V 2 = R(Im - A). This implies that VI EI1 V 2 = em and the projector P from em onto V 1 along V 2 is precisely equal to A. COROLLARY 7.4.2 . If A is idempotent and Hermitian, one can write
A
= TT*
with
T*T
= h,
where k = p(A). PROOF. We use P 5.4.3. Since A is Hermitian, there exists a unitary matrix P such that A = pr P*, where r is a diagonal matrix with
Projectors and Idempotent Operators
253
diagonal entries constituting the eigenvalues of A. Since A is idempotent, each of the eigenvalues of A is either zero or one. Assume, without loss of generality, that r is of the form
r=[~ ~J. Write P = (TIS), where T is of order m x k . We can now check that
A=TT* andT*T=h. The idempotency of a matrix can be characterized solely based on the additive property of the ranks of two matrices A and 1m - A. The following proposition is concerned with this feature. P 7.4.3 Let A be a matrix of order m x m . Then A is idempotent if and only if p(A) + p(Im - A) = m . PROOF. We have already seen that if A is idempotent then p(A) + p(Im - A) = m . The argument is buried somewhere in the proof of P 7.4.1. Suppose p(A) + p(Im - A) = m . Observe that
m = p(Im) = p(A+(Im-A)) = p(A)+p(Im-A)-dim(R(A)nR(Im- A )). This identity requires some explanation. It is clear that c m = R(A) + R(Im - A). (Why?) A look at P 1.5.7 might help the reader to understand the meaning of the symbol + in the orientation defined above. We do not know that C m = R(A) EB R{Im - A) . Consequently, m
= dim(Cm) =
dim(R(A)
= dim(R(A))
+ dim(R{Im
= p(A)
+ p{Im - A) -
= m - dim(R(A)
+ R{Im -
A))
- A)) - dim(R(A) n R{Im - A))
dim(R(A) n R{Im - A))
n R{Im - A)).
This implies that dim(R(A) n R{Im - A))
= 0 from which we have
R(A) n R{Im - A) = {O}. Thus c m = R(A) (1) R{Im - A) indeed. We claim that A(Im - A) = o. Suppose not. Then there exist non-zero vectors x and y in C m such that A{Im - A)x = y. This implies that y E R(A). Note that A(Im - A) =
MATRIX ALGEBRA THEORY AND APPLICATIONS
254
(Im - A}A. It is true that (Im - A}Ax = y. This implies that y E R(Im - A). This is a contradiction. Hence A(Im - A) = 0 or A2 = A. This completes the proof. The following result is analogous to the one stated in Complement 7.1.8. The statement is couched in terms of matrices with complex entries. P 7.4.4 Let At and A2 be two square matrices of the same order and A = At + A2. Then the following statements are equivalent.
(1) A2 = A and p(A} = p(At} + p(A2). = At, A~ = A2, and AtA2 = A2At =
(2) A~
o.
Suppose (2) is true. It is obvious that A2 = A. Since A, At, and A2 are idempotent, p(A) = Tr(A t } + Tr(A2) = p(At) + p(A2}. Suppose (1) is true. By P 7.4.3, m = p(A}+p(Im-A) = p(A t )+p(A2)+ p(Im - A) ~ p(At} + p(A2 + 1m - A} = p(At} + p(lm - At} ~ p(At + 1m - At} = p(lm} = m. Consequently, p(At) + p(Im - A) = m. Again, by P 7.4.3, At is idempotent. In a similar vein, one can show that A2 is idempotent. The fact that A, At, and A2 are idempotent implies that AtA2 + A2At = O. The information that p(A) = P(Al) + p(A2) implies that R(At} n R(A2) = {O}. This coupled with AIA2 = -A 2A t gives AtA2 = o. Follow the argument crafted in the proof of P 7.4.3. A generalization of P 7.4.4 is in order involving more than two matrices. PROOF.
P 7.4.5 Let At, A2, ... ,Ak be any k square matrices of the same order and A = At + A2 + ... + A k. Consider the following statements. (1) Each Ai is idempotent. (2) AiAj = 0 for all i i= j and p(An = p(Ai) for all i. (3) A is idempotent.
(4) p(A} = p(At} + p(A2} + ... + p(A k }. Then the validity of any two of the statements (1), (2), and (3) imply the validity of all the four statements. Further, the validity of statements (3) and (4) imply the validity of the rest of the statements. PROOF. Suppose (I) and (2) are true. It is clear that (3) is true. Since A and At, A2, ... ,Ak are all idempotent, p(A} = Tr(A} = Tr(At} + Tr(A 2} + ... + Tr(Ak) = p(At} + p(A2} + ... + p(A k}. Thus (4) is true.
Projectors and Idempotent Operators
255
Suppose (2) and (3) are true. A computation of A2 yields A2 = k
E At·
Fix 1 ::; i ::; k. We show that Ai is idempotent. Note that
i=l
AAi = AiA = At and A2 Ai = AiA2 = Ar Since A is idempotent, we have At = A~, which implies that Ar(Im - Ai) = O. The condition p(Ai) = p(At) is equivalent to the statement that dim(R(Ai)) = dim(R(A;)). Since R(A;) C R(A i ), we must have R(Ai) = R(At). Consequently, there exists a nonsingular matrix D such that Ai = DAr. Hence A;(Im - A) = 0 implies that Ai(Im - A) = 0 from which we conclude that Ai is idempotent. Thus (1) is true. Now (4) follows. j. Let B = Ai + Aj and Suppose (3) and (4) are valid. Fix i C = A-B. By (4),
t=
k
k
LP(Ar)
= p(A) = p(B + C)
::; p(B)
+ p(C)
::; LP(Ar).
r=l
r=l
From this, we have p(A) (Why?) Observe that
=
p(B)
+ p(C)
and p(B)
=
p(Ai)
+ p(A j ).
+ 1m - B) ::; p(B) + p(Im - B) p(B) + p(lm - A + C) = p(B) + p(lm - A) + p(C) p(A) + p(Im - A) = m.
m = p(lm) = p(B
= =
Hence p{B) + p{Im - B) = m. By P 7.4.3, B is idempotent. Thus we have Ai + Aj idempotent and p(B) = p(Ai) + p(Aj ). By P 7.4.4, AiAj = 0 and Ai and Aj idempotent. Thus (1) and (2) follow in one stroke. Suppose (1) and (2) are valid. It is obvious that (4) follows exploiting the connection between rank and trace for idempotent matrices. Since we have (4) valid, (2) follows now from what we have established above. This completes the proof. The condition in (2) of P 7.4.5 that p(Ai) = p(A;) is somewhat p(B2) for a matrix B. As an intriguing. It could happen that p(B) example, try
t=
B=
[~
This will not happen if B is Hermitian or nonsingular.
256
MATRIX ALGEBRA THEORY AND APPLICATIONS
Complements 7.4.1 Let Yb Y2, ... , Ym be m real random variables whose joint distribution depends on a vector parameter ()' = (()I, ()2,··· ,()k) E Rk. Suppose E(J~ = Xil()1 +Xi2()2 + ... +Xik()k, i = 1,2, ... ,m, where Xi/ S are known. Let X = (Xij). Assume that p(X) = k. Let g(()) = (), () E Rk. The random variables constitute a linear model and one can rewrite the expectations as E(JY = X(),() E Rk, where Y' = (YI , Y2, ... ,Ym ). The least squares estimator 9 of g( .) is given by 9 = (X' X)-I X'Y. The residual sum of squares (RS S) is given by
RSS = (Y - Xg)'(Y - Xg). Show that
(1) (2) (3) (4)
X(X' X)-I X' and (Im - X(X' X)-I X') are idempotent; p(X(X' X)-I X') = k; E(Jg = () for every () E Rk; E(JRSS = (m - k)(J'2 by assuming that YI , Y2, . .. ,Ym are pairwise uncorrelated with common variance
(J'2
> 0;
(5) 9 and RSS are independently distributed by assuming that YI , ... , Ym have a multivariate normal distribution with variance covariance matrix (J'21m. (Hint: A linear form LY and a quadratic form Y' AY are independently distributed if LA = 0.)
7.4.2 7.4.3
If A is idempotent and non-singular, show that A = I. Let
A=[~
g],
where Band D are square matrices. Show that A is idempotent if and only if Band D are idempotent, BCD = 0, and (I - B)C(I - D) = o.
7.5. Matrix Representation of Projectors In this section, we consider a finite dimensional vector space V over the field of complex numbers equipped with an inner product < .,. >. Let be an ordered collection of vectors from V. We define some algebraic operations on the space of all ordered sets of the form S .
Projectors and Idempotent Operators
257
Addition of ordered sets Let SI = (aI, a2, . .. ,an) and S2 = (bl, b2, ... ,bn ) be two ordered collections of vectors from V. Define the sum of SI and S2 by (7.5.1) Multiplication of ordered sets Let Sl = (aI, a2, ... ,am) and S2 = (bt,~, ... ,bn ) be two ordered collections of vectors from V. Define the product of SI and S2 to be the matrix of order m X n (7.5.2)
Multiplication of an ordered set and a matrix Let S = (aI, a2, . .. ,an) be an ordered set of vectors from V and (mij) a matrix of order n x k with complex entries. Define the product of Sand M to be the ordered set
M =
n
S x M =
(2: j=1
n
mjl a j,
2:
n
mj2 a j, ...
j=1
,2:
mjkaj),
(7.5.3)
j=1
which is an ordered set of k vectors. IT M is a column vector with entries mt, m2, .. ' ,mn , then S x M is simply the linear combination ml al + m2a2 + ... + mnan of the vectors at, a2,· .. ,an, The operation 0 in (7.5.2) is a little fascinating. Let us examine what this means when the vectors ai's and bi's come from the coordinate vector space Ck. The set SI turns out to be a matrix of order k x m with complex entries and S2 is of order k x n with complex entries. It can be verified that SI 0 S2 = S;S2. A special case of this is when SI
= S2 = S,
SoS = S*S.
say, in which case
258
MATRIX ALGEBRA THEORY AND APPLICATIONS
The operation x simplifies the usual matrix multiplication in coordinate vector spaces. It turns out that S x M = SM, the usual matrix multiplication of Sand M. We record some of the properties of these operations in the following proposition which can be verified easily. In the following the symbol S with subscripts stand for ordered sets of vectors and M with SUbscripts stand for matrices. P 7.5.1
The following are valid.
(1) If S = (aI, a2,' " ,am), M1 = (mD») and M2 = (m~~») each of
order m x k, then S x (M1 + M 2) = (S x Md + (S x M2) ' (2) If Sl = (aI,a2,'" ,am ),S2 = (b 1,b2, ... ,bm ), and M = (mij) of order m x k, then (Sl + S2) x M = (Sl X M) + (S2 X M) . (3) If Sl = (ab a2,·· . ,am), S2 = (bb b2 , ... , bn ), and M = (mij) of order n x k, then Sl 0 (S2 X M) = (Sl 0 S2)M. (4) If Sl = (aI, a2, . .. ,am), M = (mij) of order m x k, and S2 = (b1'~" " ,bn ), then (Sl X M) 0 S2 = M*(Sl 0 S2). (5) If Sl = (al, a2 , ··. ,am), S2 = (bb~, .. . ,bn ) , and S3 = (C1, C2 , ... ,cr ), then Sl 0 (S2 + S3) = Sl 0 S2 +Sl 0 S3. Having an explicitly workable expression for the projector from a vector space V onto a subspace VI along a subspace V 2 seems to be difficult. Using the new operations developed at the beginning of this section, we will attempt to provide an explicit formula. P 7.5.2 Let S = (aI, a2, . .. ,am) be an ordered collection of vectors from a vector space. Let R(S) be the vector space spanned by the vectors of S. Then the orthogonal projection P from V onto R(S) has the representation, P=Sx M(So .) (7.5.4) so that the orthogonal projection of a vector x in V is given by Px = S x M(Sox),
(7.5.5)
where M is any matrix of order mXm satisfying (SoS)M(SoS) = SoS. PROOF. Let bI,~, .. . ,b k be a basis to the R(S).l , the orthogonal complement of R(S). Take Sl = (bb~, ... ,bk ). It is clear that
So Sl = O.
Projectors and Idempotent Operators
259
X E V. Since R(8) ffi R(8)l. = V, we can write x = Xl + X2 with E R(8) and X2 E R(8)l.. The vector Xl is the one we are going after, i.e., Px = Xl. The vector Xl is a linear combination of the vectors in 8 and X2 is a linear combination of the vectors in 8 1 . Consequently, there are vectors y and z of orders m X 1 and k xl, respectively, such that, in terms of our new algebraic operations,
Let Xl
Xl
= 8 x y
and
X2
= 81
IT we know y, we will know the projection X
= 8 x y
Xl
X
z.
of x. Thus we have
+ 81 X z
(7.5.6)
for some column vectors of complex munbers. Premultiply (7.5.6) by 8 with respect to the operation 0
80 X
= 80 (8 x y) + 8 0 (81 X z) = (80 8)y + (80 8t}z = (80 8)y.
We can view (8 0 8)y = 8 0 X as a linear equation in unknown y. Let M be a generalized inverse of the matrix (808), i.e., M satisfies the equation, (80 8)M(8 08) = (808). A solution of the linear equation is given by y = M(8 0 x). We are jumping the gun again! We will see in Chapter 8 why this is so. Thus
Px
= Xl = 8
xy
=8
x M(8o x).
This completes the proof. We will specialize this result for coordinate spaces. Suppose V = en. Then 8 is an n x m matrix. Take the inner product < -, - > to be the standard inner product in V, i.e., for X and y in V, < X, y >= y*x. Note that 808 = 8*8 and 80 X = 8*x. Finally,
S × M(S ∘ x) = S × MS*x = SMS*x, where M is a generalized inverse of the Hermitian matrix S*S. Let us enshrine this result in the form of a corollary.
COROLLARY 7.5.3. Let V = C^n and S a matrix of order n × m with complex entries. Let R(S) be the vector space spanned by the columns of S. Then the projection operator P from V into R(S) has the representation
P = SMS*,   (7.5.7)
where M is any matrix satisfying (S*S)M(S*S) = S*S.
The expression (7.5.7) for the projection operator, first given by Rao (1967), was useful in the discussion of linear models under a very general setup.
Now we take up the general case of projectors. Let S1 and S2 be two ordered sets of vectors from an inner product space V. Suppose R(S1) and R(S2), the vector spaces spanned by S1 and S2, respectively, satisfy the conditions that R(S1) ∩ R(S2) = {0} and R(S1) ⊕ R(S2) = V. The spaces R(S1) and R(S2) need not be orthogonal. Let P be the projector from V onto R(S1) along R(S2). We need an explicit representation of P. Let S3 be an ordered set of vectors from V such that R(S2) and R(S3) are orthogonal and V = R(S2) ⊕ R(S3). (How will you find S3?) In particular, we will have S3 ∘ S2 = 0.
P 7.5.4 In the framework described above, the projector P has the representation
Px = S1 × M(S3 ∘ x)
for any x in V, where M is a generalized inverse of the matrix S3 ∘ S1, i.e., M satisfies (S3 ∘ S1)M(S3 ∘ S1) = S3 ∘ S1.
PROOF. Let x ∈ V. Write x = x1 + x2 with x1 ∈ R(S1) and x2 ∈ R(S2). Since x1 is a linear combination of the vectors in S1, we can write x1 = S1 × y for some column vector y of complex entries. In a similar vein, we can write x2 = S2 × z for some column vector z. Premultiplying x = S1 × y + S2 × z by S3 under the operation ∘, we have
S3 ∘ x = S3 ∘ (S1 × y) + S3 ∘ (S2 × z) = (S3 ∘ S1)y + (S3 ∘ S2)z = (S3 ∘ S1)y.
A solution to the system of linear equations (S3 ∘ S1)y = S3 ∘ x in the unknown y is given by y = M(S3 ∘ x). Thus
Px = S1 × y = S1 × M(S3 ∘ x).
This completes the proof.
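The coordinate-space formula of Corollary 7.5.3 is easy to check numerically. The sketch below assumes NumPy and uses np.linalg.pinv only as one convenient choice of generalized inverse of S*S; the matrix S and the test vector are illustrative data, not anything prescribed in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# S: n x m with linearly dependent columns (rank 2), so S*S is singular
S = rng.standard_normal((5, 3))
S[:, 2] = S[:, 0] + S[:, 1]

M = np.linalg.pinv(S.conj().T @ S)      # one choice of g-inverse of S*S
P = S @ M @ S.conj().T                  # P = S M S*, as in Corollary 7.5.3

assert np.allclose(P @ P, P)            # idempotent
assert np.allclose(P, P.conj().T)       # Hermitian, so an orthogonal projector
assert np.allclose(P @ S, S)            # P acts as the identity on R(S)

x = rng.standard_normal(5)
r = x - P @ x                           # the residual is orthogonal to R(S)
assert np.allclose(S.conj().T @ r, 0)
```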
In the above proof, we roped the ordered set S3 into the representation of the projector. We could avoid this.
P 7.5.5 Let the framework of P 7.5.4 be operational here. Let G be any generalized inverse of the matrix
[ S1 ∘ S1   S1 ∘ S2 ]
[ S2 ∘ S1   S2 ∘ S2 ],
partitioned as
G = [ C1   C2 ]
    [ C3   C4 ],
where the order of the matrix C1 is the same as the order of S1 ∘ S1 and the order of C4 is the same as the order of S2 ∘ S2. Then for any x in V,
Px = S1 × [C1(S1 ∘ x) + C2(S2 ∘ x)].
PROOF. As in the proof of P 7.5.4, write
x = S1 × y + S2 × z   (7.5.8)
for some column vectors y and z with complex entries. Premultiplying (7.5.8) by S1 under the operation ∘, we will have
S1 ∘ x = (S1 ∘ S1)y + (S1 ∘ S2)z.   (7.5.9)
Premultiplying (7.5.8) by S2 under the operation ∘, we have
S2 ∘ x = (S2 ∘ S1)y + (S2 ∘ S2)z.   (7.5.10)
Equations (7.5.9) and (7.5.10) can be written as
[ S1 ∘ S1   S1 ∘ S2 ] [ y ]   [ S1 ∘ x ]
[ S2 ∘ S1   S2 ∘ S2 ] [ z ] = [ S2 ∘ x ].
This is a system of linear equations in the unknowns y and z. A solution is given by
[ y ]     [ S1 ∘ x ]
[ z ] = G [ S2 ∘ x ].
Consequently, y = C1(S1 ∘ x) + C2(S2 ∘ x) and z = C3(S1 ∘ x) + C4(S2 ∘ x). Finally, Px = S1 × y = S1 × [C1(S1 ∘ x) + C2(S2 ∘ x)]. This completes the proof.
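P 7.5.5 can be exercised in the same spirit. The sketch below again assumes NumPy, with np.linalg.pinv standing in for an arbitrary generalized inverse of the partitioned matrix; it builds the oblique projector onto R(S1) along R(S2) for V = C^n, where Si ∘ x reduces to Si*x.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 6, 4, 2
S1 = rng.standard_normal((n, p))
S2 = rng.standard_normal((n, q))         # together the columns span the whole space

B = np.block([[S1.T @ S1, S1.T @ S2],
              [S2.T @ S1, S2.T @ S2]])   # the partitioned matrix of P 7.5.5
G = np.linalg.pinv(B)                    # one generalized inverse of B
C1, C2 = G[:p, :p], G[:p, p:]

def P(x):
    # Px = S1 x [C1 (S1 o x) + C2 (S2 o x)], with Si o x = Si' x here
    return S1 @ (C1 @ (S1.T @ x) + C2 @ (S2.T @ x))

v = S1 @ rng.standard_normal(p)          # v in R(S1): fixed by P
w = S2 @ rng.standard_normal(q)          # w in R(S2): annihilated by P
assert np.allclose(P(v), v)
assert np.allclose(P(w), 0)
assert np.allclose(P(v + w), v)          # projection onto R(S1) along R(S2)
```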
Complements
7.5.1 For vectors x and y in a vector space C^n, define the inner product by <x, y> = y*Ex, where E is a positive definite matrix. Then P is an orthogonal projector if and only if (1) P² = P, and (2) EP is Hermitian.
7.5.2 (Rao (1967)) Let C1 be a subspace of C^n spanned by the columns of an n × k matrix A. Show that the orthogonal projector onto C1 is
P = A(A*EA)- A*E,
where (A*EA)- is a generalized inverse of (A*EA), i.e., any matrix B satisfying the property (A*EA)B(A*EA) = A*EA. The expression for P is unique for any choice of the generalized inverse. (For a discussion of generalized inverses, see Chapter 8.)
7.5.3 Let A be n × p and B be n × q real matrices and denote their Kronecker product by A ⊗ B. Denote by PA, PB and P_{A⊗B} the orthogonal projectors on R(A), R(B) and R(A ⊗ B), respectively. Then
(1) P_{A⊗B} = PA ⊗ PB,
(2) P_{A⊗I} = PA ⊗ I,
(3) Q_{A⊗B} = QA ⊗ QB + QA ⊗ PB + PA ⊗ QB,
where QA = I − PA and QB = I − PB.
Note: The references for this Chapter are: Rao (1967), Rao and Mitra (1971b), Rao (1974), Rao and Mitra (1974), Rao (1978c), Rao and Yanai (1979) and standard books on Linear Algebra.
CHAPTER 8 GENERALIZED INVERSES
In Section 3.3, we explored the concept of an inverse of a linear transformation T from a vector space V to a vector space W . We found that the nature of the inverse depends upon what kind of properties T has. A summary of the discussion that had been carried out earlier is given below. (1) Suppose T is bijective, i.e., T is injective (one-to-one) and surjective (onto) . Then there exists a unique transformation S from W to V such that ST = I and T S = I with the identity transformation acting on the appropriate vector space. (2) Suppose T is surjective. Then there exist a transformation S (called a right inverse of T) from W to V such that TS = I. (3) Suppose T is injective. Then there exists a transformation S (called a left inverse of T) from W to V such that ST = I . (4) There always exists a transformation S from W to V such that TST = T . Such a transformation S is called a g-inverse of T.
In this chapter, we focus on matrices. We reenact the entire scenario detailing inverses of transformation in the realm of matrices. Special attention will be paid to finding simple criteria for the existence of every type of inverse. The source material for this Chapter is Rao and Mitra (1971). Before we proceed with the details, we need to set up the notation. Let Mm ,n denote the collection of all matrices A of order m x n with entries coming from the field of real or complex numbers. The symbol Mm,n (r) denotes the collection of all matrices A in Mm,n with rank r. The rank of a matrix A is denoted by p{A). The vector space spanned by the columns of a matrix A is denoted by Sp{A) . An equivalent 263
notation is R(A), the range of A when A is viewed as a transformation. See Section 4.1. [Sp(A) is more suggestive when A is a matrix, as the vector space generated by the column vectors of A.]
8.1. Right and Left Inverses
In this section we characterize right and left inverses of matrices. In addition, the structure of a right inverse as well as a left inverse of a matrix is described.
P 8.1.1 Let A ∈ Mm,n. There exists a matrix G ∈ Mn,m such that AG = Im if and only if p(A) = m. In such a case a choice of G is given by
G = A*(AA*)^{-1}.   (8.1.1)
A general solution for G is given by
G = VA*(AVA*)^{-1},   (8.1.2)
where V is any arbitrary matrix satisfying p(A) = p(AVA*).
PROOF. Suppose p(A) = m. Then m = p(A) = p(AA*). The matrix AA* is of order m × m and has rank m. Consequently, AA* is nonsingular. The matrix G = A*(AA*)^{-1} indeed satisfies AG = AA*(AA*)^{-1} = Im. Conversely, suppose there exists a matrix G ∈ Mn,m such that AG = Im. Note that m = p(Im) = p(AG) ≤ p(A) ≤ m. Hence p(A) = m. As for the general structure of G, if V is any matrix satisfying p(AVA*) = p(A), then G = VA*(AVA*)^{-1} certainly satisfies the condition AG = Im. On the other hand, if G is any matrix satisfying AG = Im, it can be put in the form G = VA*(AVA*)^{-1} for some suitable choice of V. Take V = GG*. (How?)
The matrix G that figures in P 8.1.1 can rightly be called a right inverse of A. One can also say that a right inverse of A exists if the rows of A are linearly independent. Incidentally, p(G) = m. A similar result can be crafted for left inverses of A.
P 8.1.2 Let A ∈ Mm,n. Then there exists a matrix G ∈ Mn,m such that GA = In if and only if p(A) = n. In such a case one choice of G is given by
G = (A*A)^{-1}A*.   (8.1.3)
A general solution of G satisfying GA = In is given by
G = (A*VA)^{-1}A*V   (8.1.4)
for any matrix V satisfying p(A) = p(A*V A). The matrix G that figures in P 8.1.2 can be rightly called a left inverse of A . The existence of a left inverse of A is guaranteed if the columns of A are linearly independent. Incidentally, p( G) = n. The right and left inverses have some bearing on solving linear equations. Suppose Ax = y is a consistent system of linear equations in an unknown vector x, where A E Mm,n and y E M m ,1 are known. Consistency means that the system admits a solution in x. Suppose p(A) = n. Let G be any left inverse of A . Then x = Gy is a solution to the linear equations. This can be seen as follows . Since Ax = y is consistent, y must be a linear combination of the column of A. In other words, we can write y = Aa for some column vector a. We now proceed to verify that Gy is indeed a solution to the system Ax = y of equations. Note that A(Gy) = AGAa = A1na = Aa = y. Let us look at the other possibility where we have a consistent system Ax = y of linear equations with p(A) = m. Let G be a right inverse of A. Then Gy is a solution to the system Ax = y . This can be verified directly. Incidentally, note that if p(A) = m, Ax = y is always consistent whatever may be the nature of the vector y!
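The formulas (8.1.1) and (8.1.3) are easy to exercise numerically. The following sketch assumes NumPy; the matrices are random illustrative data, not examples from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

A = rng.standard_normal((3, 5))              # p(A) = 3 = m: full row rank
G_right = A.T @ np.linalg.inv(A @ A.T)       # G = A*(AA*)^{-1}, a right inverse, (8.1.1)
assert np.allclose(A @ G_right, np.eye(3))

y = rng.standard_normal(3)                   # Ax = y is consistent for every y here
x = G_right @ y
assert np.allclose(A @ x, y)

B = rng.standard_normal((5, 3))              # p(B) = 3 = n: full column rank
G_left = np.linalg.inv(B.T @ B) @ B.T        # G = (A*A)^{-1}A*, a left inverse, (8.1.3)
assert np.allclose(G_left @ B, np.eye(3))
```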
Complements
8.1.1 Let A be a matrix of order m × 1 with at least one non-zero entry. Exhibit a left inverse of A. Obtain a general form of left inverses of A.
8.1.2 Let A be a matrix of order 1 × n with at least one non-zero entry. Exhibit a right inverse of A. Obtain a general form of right inverses of A. Show that Ax = β is consistent for any number β.
8.1.3 Let A ∈ Mm,n. Show that Ax = y is consistent for all y ∈ Mm,1 if and only if p(A) = m.
8.1.4 Let
A = [ ~  2  3 ].
Obtain a right inverse of A.
8.2. Generalized Inverse (g-inverse)
One of the basic problems in Linear Algebra is to determine solutions to a system Ax = y of consistent linear equations, where A ∈ Mm,n.
If the matrix is of full rank, i.e., p(A) = m or n, we have seen in the last section how the right and left inverses of A, as the case may be, help to obtain a solution. It is time to make some progress in the case
when A is not of full rank. Generalized Inverses or g-inverses of A are the matrices needed to solve consistent linear equations. They can be introduced in a variety of ways. We follow the linear equations angle. DEFINITION 8.2.1. Let A E Mm,n be of arbitrary rank. A matrix G E Mn,m is said to be a generalized inverse (g-inverse) of A if x = Gy is a solution of Ax = y for any y for which the equation is consistent. This is not a neat definition. It is a goal-oriented definition. Later, we will provide some characterizations of g-inverses, one of which could give us a crisp mathematical definition. The customary notation for a g-inverse of A is A -, if it exists. First, we settle the question of existence.
P 8.2.2
For any matrix A E Mm,n, A- E Mn,m exists.
PROOF. If A = 0, take G = 0. Assume that A ≠ 0. Let us make use of the rank factorization of A. Write A = RF, where R is of order m × a with rank a and F of order a × n with rank a, where a = p(A). See Corollary 5.2.3. Let B be a left inverse of R, i.e., BR = Ia, and C a right inverse of F, i.e., FC = Ia. Let A⁻ = CB. We show that A⁻ is a g-inverse of A. Let y ∈ Sp(A), the vector space spanned by the columns of A. Then the system Ax = y is consistent. Also, y = Aα for some column vector α. We show that A⁻y is a solution of Ax = y:
AA⁻y = (RF)(CB)y = R(FC)By = RBy = RBAα = (RB)(RF)α = R(BR)Fα = RFα = Aα = y,
which shows that x = A⁻y satisfies the equation Ax = y.
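The construction in the proof of P 8.2.2 can be traced step by step numerically. A minimal sketch, assuming NumPy and building the rank factorization directly rather than computing it from A:

```python
import numpy as np

rng = np.random.default_rng(3)

# A 4 x 5 matrix of rank 2, built as a rank factorization A = R F
R = rng.standard_normal((4, 2))
F = rng.standard_normal((2, 5))
A = R @ F

B = np.linalg.inv(R.T @ R) @ R.T    # a left inverse of R  (B R = I_2)
C = F.T @ np.linalg.inv(F @ F.T)    # a right inverse of F (F C = I_2)
G = C @ B                           # the g-inverse constructed in P 8.2.2

assert np.allclose(A @ G @ A, A)    # AGA = A

# G solves any consistent system Ax = y with y in Sp(A)
y = A @ rng.standard_normal(5)
assert np.allclose(A @ (G @ y), y)
```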
P 8.2.3 Let A E Mm,n and G E Mn,m. The following statements are equivalent. (1) G is a g-inverse of A. (2) AG is an identity on Sp(A), i.e., AGA = A. (3) AG is idempotent and p(A) = p(AG).
PROOF. We show that (1) => (2) => (3) => (1). Suppose (1) is true. Let aI, a2, . .. ,an be the columns of A . It is clear that Ax = ai is a consistent system of equations for each i. Since G is a g-inverse of A, Gai is a solution to the system Ax = ai, i.e., AGai = ai· This is true for each i . Combining all these equations, we obtain AGA = A. The statement that AG is an identity on Sp(A) is a restatement of the fact that AGy = y for y E Sp(A). Thus (2) follows. Suppose (2) is true. Post-multiply the equation AGA = A by G. Thus we have AGAG = (AG)2 = AG, which means that AG is idempotent. Note that p(A) = p(AGA) S; p(AG) S; p(A). Hence p(A) = p(AG) . Thus (3) follows. Suppose (3) is true. Let Ax = y be a consistent system of equations. Consistency means that y E Sp(A), i.e., y = Aa for some column vector a. Since p(A) = p(AG), Sp(A) = Sp(AG) . Consequently, y = AG{3 for some column vector {3. We will show that Gy is a solution of Ax = y. Note that AGy = AGAG{3 = AG{3 = y, since AG is idempotent. Thus (1) follows. The characteristic property that a g- inverse G of A satisfies AG A = A can be taken as a definition of g-inverse. The fact that for any g-inverse G of A, AG is idempotent puts us right into the ambit of projectors. In fact, AG is a projector from em onto Sp(A)(= Sp(AG)) along Sp(ImAG). (Why?) It is interesting to note that GA is also idempotent. It is also interesting to note that G E Mn,m is a g-inverse of A if and only if p(A) + p(In - GA) = n . This is reminiscent of the result P 7.4.3. Let G E Mn ,m be a g-inverse of A E Mm,n' The matrix AG behaves almost like an identity matrix. What we mean by this is (AG)A = A, i.e., AG behaves like an identity matrix when multiplying A from the left. In the following proposition we examine under what circumstances AGB=B. P 8.2.4 (1) For a matrix B of order m x k, (AG)B = B if and only if Sp(B) c Sp(A), i.e., B = AD for some matrix D. (2) For a matrix B of order k xn, B(GA) = B if and only if Sp(B') Sp(A'), i.e. B = DA for some matrix D.
c
PROOF. (1) We have already seen that AG is an identity on Sp(A). Consequently, AG is an identity on any subspace of Sp(A) . Thus if B = AD, then AGB = B. Note that AG is a projector from em onto
Sp(A) along Sp(Im - AG) . Consequently, if y is a non-zero vector with (AG)y = y, then y better be a linear combination of the columns of A, i.e., y E Sp(A). Hence, if (AG)B = B, then Sp(B) c Sp(A).
(2) This is similar to (1). A number of corollaries can be deduced from P 8.2.4. COROLLARY 8.2 .5. Let A E Mm,n and G E Mn ,m a g-inverse of A. If a is a column vector consisting of n entries such that a E Sp(A') and f3 is a column vector consisting of m entries such that f3 E Sp(A), then a'Gf3 is invariant (i.e., a constant) for any choice of G. PROOF. The conditions on a and f3 mean that a = A', for some column vector, and f3 = Ab" for some column vector 15. Consequently, a ' Gf3 = " AGAf3 = ,'Ab", a constant independent of the choice of G. Corollary 8.2.5 can be generalized. COROLLARY 8 .2.6 . Let G E Mn,m stand as a generic symbol for a g-inverse of A. Suppose Band G are matrices of orders p X nand m x q, respectively, such that Sp(B') c Sp(A') and Sp(C) c Sp(A). Then BGG is invariant for any choice of G. PROOF. A proof can be crafted along the lines of the proof of Corollary 8.2.5. COROLLARY 8.2.7. Let A E Mm,n ' Let (A* A)- E Mn stand for aginverse of A* A. Then A(A* A)-(A* A) = A and (A* A)(A* A)- A* = A*. PROOF. First, we observe that Sp(A*) = Sp(A* A). Consequently, A* = (A* A)D for some matrix D. Therefore, A = D* A* A and A(A* A)(A* A) = D* A* A(A* A)-(A* A) = D* A* A = A . As for the second identity, note that (A* A)(A* A)- A* = (A* A)(A* A)- A* AD = A* AD = A* . Corollary 8.2.7 can be strengthened. Let V be any matrix such that p(A*V A) = p(A). If V is positive definite, this rank condition is definitelysatisfied. ThenA(A*VA)-(A*VA) = A and (A*VA)-(A*VA)A* = A*. The matrix A(A* A)- A* for any A plays a crucial role in Linear Models. Linear Models provide a very general framework embodying Multiple Regression Models and Analysis of Variance Models. In the following proposition, we demonstrate that this matrix has some special properties.
P 8.2.8 Let A E Mm,n ' Then A(A* A)- A* is Hermitian, idempotent, and invariant whatever may be the choice of (A* A)- . PROOF. Idempotency is easy to settle: (A(A* A)- A*)A(A* A)- A* = (A(A* A)- A* A)((A* A)- A*) = A(A* A)- A*, by Corollary 8.2.7. Let us look at the invariance property. Since Sp(A*) = Sp(A* A), we can write A* = (A* A)D for some matrix D. Consequently,
A(A* A)- A* = D* A* A(A* A)- A* AD = D* A* AD, which is a constant whatever g-inverse of A * A we use. Incidentally, we note that D* A * AD is Hermitian. This completes the proof. P 8.2.8 can be strengthened. Let V be any positive definite matrix such that p(A*V A) = p(A) . Then A(A*V A)- A* is invariant for any choice of (A*V A)-. Further, if A*V A is Hermitian, so is A(A*V A)- A*. We now focus on a consistant system Ax = y of linear equations. Using a single g-inverse of A, we will demonstrate how all the solutions of the system of equations can be generated. (See Rao (1962) .) P 8.2.9 Let A E Mm,n and G E Mn ,m be a fixed g-inverse of A. (1) A general solution of the homogeneous system Ax = 0 of equations is given by x = (In - GA)z, where z is an arbitrary vector. (2) A general solution to the system Ax = y of consistent equations is given by x = Gy + (In - GA)z where z is an arbitrary vector. (3) Let q be any column vector consisting of n components. Then q' x is a constant for all solutions x of the consistent system Ax = y of equations if and only if q'(In - GA) = 0 or equivalently, q E Sp(A'). (4) A necessary and sufficient condition that the system Ax = y is consistent is that AGy = y. PROOF. (1) First we note that for any vector z, (In - GA)z is a solution of the system Ax = 0 of equations. On the other hand, let x be any solution of the system Ax = O. Then
x
= GAx + (In -
GA)x
= 0 + (In -
GA)x
= (In
- GA)z,
with z = x. Thus we are able to write x in the form (In - GA)z for some vector z.
(2) This result follows since a general solution of Ax = y is the sum of a particular solution of Ax = y and a general solution of Ax = O. Note that Gy can be taken as a particular solution. (3) Suppose q'(In - GA) = o. Any solution of Ax = y is of the form Gy+ (In - GA) z for some vector z . We compute q'(Gy+ (In - GA)z) = q'Gy. Since Ax = y is consistent, y E Sp(A). Since q'(In -GA) = O,q E Sp(GA) = Sp(A') . (Why?) By Corollary 8.2.5, q'Gy remains the same regardless of the choice of G. Conversely, suppose q'x remains the same for all solutions x of Ax = y. This means that q'(Gy + (In - GA)z) = constant for all z in en . This implies that q'(In - GA)z = 0 for all z E en . (Why?) Hence q'(In - GA) = O. (4) If Ax = y is consistent, we have seen that x = Gy is a solution. Then AGy = y. The converse is obvious. If A is a nonsingular matrix, g-inverse of A is unique and is given by A-l. This means that AA-l = A-lA = I. If A is a singular square matrix or rectangular matrix, AA - and A - A are, in general, not identity matrices. We shall investigate how exactly AA - and A-A differ from identity matrices and what is common to different g-inverses of A. The answer is contained in the following result.
p 8.2.10 Let A E Mm ,n and G E Mn,m be any two matrices. Partition A and G as
A=
[p~:l ' qXn
G = (G l : G 2 ) nxp nXq
with p + q = m. Then Sp(A~)
n Sp(A;)
=
{O} and G is a g-inverse of A
if and only if
PROOF. Suppose G is a g-inverse of A. The equation AGA = A yields
These equations can be rewritten as
Taking transposes, we obtain
If Sp(A~) n Sp(A2) = {O}, the above equations cannot hold unless each expression is zero. Hence
Thus (8.2.1) follows. Conversely, suppose (8 .2.1) holds and (8.2.2)
°
for some vectors x and y. We will show that A~x = A 2y = which would then imply that Sp(A~) n Sp(A2) = {O} . Pre-multiply (8.2.2) by A~ G~. Note that A~ G~A~x = A~ G~A2Y = 0, which implies that A~ x = 0, since A~ G~ A2 = 0. It is simple to check that G is a g-inverse of A when (8.2.1) holds. This completes the proof. This result has a cousin. The matrices could be partitioned in a different way.
P 8.2.11
Let A E Mm ,n and G E Mn ,m be partitioned as
A
= (AI
mxp
: A2
),
G
mxq
=
p Xm
GI [ G2
1
(8.2.3)
qXm
with p + q = n . Then Sp(Ad if and only if
n Sp(A 2 )
= {O} and G is a g-inverse of A
These are useful results. Under some conditions, a g-inverse of a submatrix of A can be picked up from the corresponding submatrix of a
272
MATRIX ALGEBRA THEORY AND APPLICATIONS
g-inverse G of A . Note that the condition that Sp(A I )nSp(A2) = {O} is equivalent to the condition that p(A) = p(Ad + p(A2)' There is another way to look at the result. Pick up some g-inverses G I and G 2 of Al and A 2 , respectively. Build the matrix (8.2.3). We could wish that G be a g-inverse of A. In order to have our wish realized, we must have as a pre-condition that the ranks be additive, i.e., p(A) = p(Al) + P(A2)' We will now derive a number of corollaries from P 8.2.10. COROLLARY 8.2.12. Let matrices. Partition
A=
A
E
All [A2 pxn
Mm ,n
and G E
Mn,m
be any two
, G = ( G I IG 2 ) nXp n x q
qxn
with p + q = m . Then Sp(A~) n and P(AI) = P if and only if
Sp(A~) =
{O}, G is a g-inverse of A,
PROOF. Suppose Sp(AD n Sp(A~) = {O}, G is a g-inverse of A, and p(Ad = p. Then AIGIAI = AI, A2GIAI = 0, A 2G 2A 2 = A 2, and A I G 2A2 = O. See P 8.2.10. If P(AI) = p, we can cancel Al on the right from both sides of the equations AIGIAI = Al and A2GIAl = O. (How?) The converse is straightforward. Look at the product under the stipulated conditions of Corollary 8.2.12.
The matrices AG and 1m have the same first p columns. COROLLARY 8 .2.13. Let A E Mm ,n and G E Mn ,m be any g-inverse of A. A necessary and sufficient condition that i-th column vector of AG is the same as the i-th column vector of 1m is that the i-th row vector of A is non-zero, and cannot be written as a linear combination of the remaining row vectors of A.
Generalized Inverses
273
PROOF. One simply checks the condition of Corollary 8.2.12. One of the interesting implications of Corollary 8.2.13 is that if all the row vectors of A are linearly independent and G is a g-inverse of A, then AG = In. In other words, if all row vectors of A are linearly independent, then p{A) = m and any g-inverse G of A is indeed a right inverse of A confirming the results of Section 8.1. Corollary 8.2.11 can be rehashed in a different way. COROLLARY 8.2.14. Let A E Mm,n be partitioned as A = (AI: A 2 ) mXp
with p Then
+q
mxq
= n. Let G E Mn,m be a matrix partitioned as in (8.2.3).
n Sp(A2) = {O}, G is a g-inverse of A, and p(Ad = p if and only if G1AI = I p , G 1A2 = 0, A 2G 2A 2 = A 2, and A 2G 2A 1 = O. Sp(AI)
The implication of this result is that the matrices G A and In have the same first p rows. Corollary 8.2.13 has a mate. COROLLARY 8 .2.15. Let A E Mm,n and G E Mn,m be any g-inverse of A. Then a necessary and sufficient condition that the i-th row vector of GA is the same as the i-th row vector of In is that the i-th column vector of A is non-zero, and cannot be written as a linear combination of the remaining columns of A. One of the implications of Corollary 8.2.15 is that if all the column vectors of A are linearly independent, i.e., p(A) = n, then GA = In. This means that G is a left inverse of A. The notions of generalized inverse and left inverse coincide in this special setting. We can rephrase these comments in another way. Suppose A E Mm,n and p(A) = m. We have seen in Section 8.1 that A admits a right inverse G. From the very definition of right inverse, it is transparent that G is indeed a g-inverse of A. Corollary 8.2.13 implies that every g-inverse of A is indeed a right inverse of A. Similar remarks apply when p{A) = n. COROLLARY 8.2.16. Let A E Mm,n be partitioned as
A=
[1J~ A21 sxq
1J;] A22 8Xr
with p + s = m and q + r = n. Let G E Mn,m be any g-inverse of A. Partition G accordingly, i.e. ,
G11
G=
qx s G12]
qx p
21 [G rxp
G 22 r Xs
Suppose each of the first q columns of A is non-zero and is not a linear combination of all other columns of A and each of the first p rows of A is non-zero and is not a linear combination of all other rows of A. Then G 11 remains the same regardless of the choice of G. PROOF. What the conditions of the proposition mean are
Sp{{A 11 IA 12 )') Sp
n Sp{{A 21 IA 22 )') = {O}
[~~~J nsp [~~~]
= {O}
and p
and
P{AllIA12) = p,
[~~~]
= q.
(8.2.4)
Under (8.2.4), by Corollary 8.2.12,
These equations are equivalent to (8.2.5) Suppose F is another g-inverse of A partitioned in the same style as of G. Let Fij's stand for the blocks of F. Thus we must have (8.2 .6) By subtracting (8.2.6) from (8 .2.5), we observe that
All{Gll-Fll)+A12{G21-F2d
= 0,
We can rewrite these equations as
A21{Gll-Fu)+A22{G21-F2d
= O.
This equation means that some linear combinations of the first q columns of A are the same as some linear combinations of the last n - q columns of A. In view of (8.2.4), we must have
Since p
[Au] = q, A21
G u - Fu = O. (Why?) This completes the proof. The following is a special case of Corollary 8.2.16. COROLLARY 8.2.17. If the i-th row of a matrix A E M m •n is nonzero and is not a linear combination of the remaining rows of A and the j-th column of A is non-zero and is not a linear combination of the remaining columns of A, then the (j, i)-th element of a generalized inverse G of A is a constant regardless of the choice of G. We now focus on non-negative definite (nnd) matrices. COROLLARY 8.2.18. Let A E Mm be an nnd matrix partitioned as
Au A= [A21 pxp qxp
with p + q = m. Let G E Mm be any g-inverse of A. Partition G accordingly, i.e.,
GU
G=
pXp
pxq G12]
[ G 21 Gn qXp
qxq
Suppose each of the first p rows of A is non-zero and is not a linear combination of the remaining rows of A. Then
for any g-inverse of A22 of A 22 .
PROOF. The conditions of Corollary 8.2.12 are met. Consequently, (8.2.7) and (8.2.8) Choose and fix a g-inverse A22 of A 22 .
Pre-multiplying (8.2.8) by
A 12 A 22 , we obtain
(8.2.9) Since A is nnd, A12A22A22 = A 12 . (Why?) (Since A is nnd, Sp(A21 ) C Sp(A22)') Thus we have from (8.2.9), (8.2.10) Subtracting (8.2.10) from (8.2.7), we obtain (Au - A12A;A.21)G U = Ip.
Hence G u = (Au - A12A22A21)-1. (Note that G u is unique. See Corollary 8.2.16.) COROLLARY 8.2.19. Let A E Mm,n be partitioned as A = (Al
I A2 )
mxp mxq .
with p + q = n. Let G E Mn,m be any g-inverse of A. Let G be partitioned accordingly as in (8.2.3). If Sp(Ad n Sp(A 2) = {O} and Sp(Ad ED Sp(A2) = em, then p = Al G 1 is a projector from em onto Sp(Ad along Sp(A2)' Suppose A = (A1IA2) is a partitioned matrix with Al being of order m x p and A2 of order m x q. Suppose we have g-inverses G 1 and G 2 of Al and A 2, respectively, available. We string G 1 and G 2 as in (8.2.3) . Under what conditions G is a g-inverse of A? P 8.2.11 provides an answer. In the following, we provide a sufficient condition. COROLLARY 8.2 .20. Let Al and A2 be two matrices of orders m x p and m x q, respectively. Let A = (A1IA2) and G 1 and G 2 be g-inverses
of Al and A 2, respectively. If AlGI F of A, then G
=
[g~]
+ A 2G 2 =
AF for some g-inverse
is a g-inverse of A.
PROOF. It suffices to show that A I G I A2 = 0, and A 2G 2A I = 0. Post-multiply both sides of the equation AlGI + A 2G 2 = AF by A = (AIA2). This operation leads to
(AI GI
+ A 2G 2) (AIIA2) = AF A = A = (AIIA2) = (AIGIAI + A2G2AIIAIGIA2 + A 2G 2A 2 ),
°
which gives Al GIAI = and A 2 G 2 A I = 0, and the result is proved. In P 8.2.9, we have seen how a single g-inverse of a matrix A generates all solutions of a consistent system Ax = y of linear equations. In the following result, we demonstrate how a single generalized inverse of A generates all g-inverses of A .
P 8.2.21 Let G be a fixed g-inverse of a given matrix A. Any g-inverse G I of A has one of the following forms. (1) G I = G + U - GAU AG for some matrix U. (2) G I = G + V(I - AG) + (I - GA)W for some matrices V and W . PROOF . The first thing we can check is that AGIA = A if G I has anyone of the forms (1) and (2). Conversely, let G I be a given g- inverse of A. If we wish to write G I in the form (1), take U = G I - G. If we wish to write G I in the form (2), take V = G I - G and W = GIAG. We now introduce a special notation. For any matrix A , let {A-} denote the collection of all g-inverses of A. In the following result we demonstrate that A is essentially determined by the class {A -} of all its g-inverses.
P 8.2.22 Let A and B be two matrices of the same order m x n such that {A-} = {B-}. Then A = B . PROOF. What the condition of the theorem means is that if G is a g-inverse of A then it is also a g-inverse of B and vice versa. Let G be a g- inverse of A and
G I = G + (In - A*(AA* )- A)B* .
We note that G I is also a g-inverse of A. For, AGIA = AGA + A(In - A*(AA*)- A)B* A = A
+ (A -
(AA*)(AA*)- A)B* A = A
as A = AA*(AA*)- A by Corollary 8.2.7 using A* in place of A. By the hypothesis of the proposition, G I is also a g-inverse of B. Thus B
= BGIB = BGB + B(In - A*(AA*)- A)B* B = B + B(In - A*(AA*)- A)(In - A*(AA*)- A)* B* B,
since (In - A*(AA*)- A) is Hermitian and idempotent. See P 8.2.8. This implies that B(In - A*(AA*)- A)(In - A*(AA*)- A)* B* B
= o.
Pre-multiplying the above by B*, we have B* B(In - A*(AA*)- A)(In - A*(AA*)- A)* B* B =
o.
Consequently, B* B(In -A*(AA*)- A) = O. (Why?) From this, we have B(In - A*(AA*)- A) = 0, or equivalently, B = BA*(AA*)- A. (Why?) Thus we are able to write B = CA for some matrix C. Following the same line of reasoning, we can write A = DB. By focusing now on a variation of G, given above by G 2 = G + B*(Im - A(A* A)- A*),
one can show that B = AE for some matrix E. In a similar vein, one can show that A = BF for some matrix F. (Try.) Now, B
= BGB = CAGAE = CAE = BE,
which implies DB = DBE = AE = B. Hence B = DB = A. This completes the proof. Suppose G is a g-inverse of A, i.e., AGA = A. It need not imply that A is a g-inverse of G, i.e., GAG = G. We will now introduce a special terminology.
DEFINITION 8.2.23. Let A be a matrix of order m x n. A matrix G of order n x m is said to be a reflexive g-inverse of A if AGA = A and GAG = G. We use the notation A; for a reflexive g-inverse of A. We now demonstrate the existence of reflexive g-inverses.
P 8.2.24 exists.
For any matrix A E Mm,n, a reflexive g-inverse of A
PROOF. Let p(A) = a. By the Rank Factorization Theorem, write A = RF with R of order m x a with rank a and F of order a x n with rank a. Let C and D be the right and left inverses of F and R, respectively, i.e., FC = Ia and DR = Ia. Choose G = CD . Note that AGA = RFCDRF = RlalaF = RF = A. On the other hand, GAG = CDRFCD = ClalaD = CD = G. If G is a g-inverse of A, one can show that p(A) ::; p(G). The equality p(A) = p(G) does indeed characterize reflexive g-inverses of A. We demonstrate this in the following proposition.
P 8.2.25 The following statements are equivalent for any two matrices A E Mm ,n and G E Mn ,m. (1) AGA = A and GAG = G. (2) AGA = A and p(A) = p(G). PROOF . (1) :::} (2). The statement AGA = A implies that p(A) ::; p(G) and the statement GAG = G implies that p(G) ::; p(A). Then we must have p(A) = p(G). To prove (2):::}(1), note that p(G) = p(A) (by (2))
::; p(GA) ::; p(G) ,
which implies that p(GA) = p(G). The matrix GA is idempotent. Then by P 8.2.3 , A is a g-inverse of G. The computation of a reflexive g-inverse of A can be done in several ways. Making use of the rank factorization of A is one possibility. Suppose we already have a g-inverse G of A. Let G 1 = GAG. One can check that G 1 is a reflexive g-inverse of A. Every reflexive gTinverse of A can be written in the form GAG for some g-inverse of G of A. Try. It is clear that if G is a g-inverse of A, then p(G) ~ p(A). For a given integer s satisfying p(A) ::; s ::; min{ m, n}, is it possible to find a
g-inverse G of A such that p( G) = s? In the following result we answer this question.
P 8.2.26 Let A E Mm,n have the decomposition given by
where P and Q are nonsingular matrices of orders m x m and n x n, respectively, ~ isa diagonal matrix of order a x a with rank a, where a = p(A). Then: (1) For any three matrices E 1 , E 2 , and E3 of appropriate orders,
is a g-inverse of Ai (2) For any two matrices El and E2 of appropriate orders,
is a reflexive generalized inverse of A. PROOF. One simply verifies that they are doing their intended jobs. Note that the matrix G of (1) has the property that Ll-l p(G) = p ( [ E2
Given any integer s such that p(A) ~ s ~ min{m,n}, one can choose E 1 , E 2 , and E3 carefully so that p( G) = s. You can experiment with these matrices. One can directly verify that p(G r ) = a = p(A) .
Complements 8.2.1
Cancellation Laws. Prove the following.
(1) If AB = AC, Sp(B) c Sp(A*), and Sp(C) C Sp(A*), then B=C. (Hint: The matrices Band C can be written as B = A* D and C = A * E for some matrices D and E. Therefore, AA * D =
(2) (3) (4) (5)
AA * E or AA * (D - E) = 0 or (D - E) * AA * (D - E) = 0 or A*(D - E) = 0.) (Thus A can be cancelled in AB = AC.) If A E Mm ,n, p(A) = n , and AB = AC, then B = C. If A E Mm ,n, p(A) = m , and BA = CA, then B = C. If ABC = ABD and p(AB) = p(B), then BC = BD. If CAB = DAB and p(AB) = p(A), then CA = DA.
8.2.2 If A = 0, determine the structure of g-inverses of A. 8.2.3 Let I n be a matrix of order n x n in which every entry is equal to unity. Let a i= b and a - b + n = O. Show that (a - b)-l In is a g-inverse of (a - b)In + I n . 8.2.4 Let A be a matrix of order m x n and a and (3 column vectors of orders m x 1 and n x 1, respectively. Let G be any g-inverse of A. If either a E Sp(A) or (3 E Sp(A') , show that
= G _ ((Ga)((3'G))
G 1
(1
+ (3'Ga)
is a g-inverse of A + a(3', provided 1 + (3'Ga i= O. 8.2.5 Let A be a matrix of order n x m with rank rand B a matrix of order s x m with rank m - r. Suppose Sp(A*) n Sp(B *) = {O} . Then (1) A* A + B* B is nonsingular j (2) (A* A + B* B)-l is a g-inverse of A* Aj
(3)
A ~*]
[A~
is nonsingular provided that s = m - r .
8.2.6 Show that a Hermitian matrix has a Hermitian g-inverse. 8.2.7 Show that a non-negative definite matrix has a non-negative g-inverse. 8.2.8 Show that a Hermitian matrix A has a non-negative definite g-inverse if and only if A is non-negative definite. 8.2.9 If G 1 and G 2 are two g-inverses of A, show that aG l + (l-a)G 2 is a g-inverse of A for any a. 8.2.10 If G is a g-inverse of a square matrix A , is G 2 a g-inverse of A2? Explain fully. 8.2.11 Let T = A + XU X' where A is nnd, U is symmetric such that Sp(A) C Sp(T) and Sp(X) C Sp(T). Then Sp(X'T- X) = Sp(X') for any g-inverse T- .
8.2.12 Let A X = B and X C = D be two consistent system of matrix equations in the unknown matrix X. Show that they have a common solution in X if and only if A D = B C . If this condition holds, show that X = A-B+DC- -ADCis a common solution.
8.3. Geometric Approach: LMN-inverse In defining a g-inverse G of A 6 M,,, in Section 8.2, emphasis has been laid on its use in providing a solution to a consistent system Ax = y of linear equations in the form x = Gy. A necessary and s a c i e n t condition for this is that the operator AG is an identity on the subspace Sp(A) c Cm. Nothing has been specified about the values of AGy or Gy for y 6 Sp(A). The basic theme of this section is that we want to determine a g-inverse G E M,,, such that Sp(G) is a specified subspace in Cn and the kernel of G is a specified subspace in Cm. The concept of g-inverse introduced in Section 8.2 makes perfect sense in the realm of general vector spaces and transformations. Let V and W be two vector spaces and A a transformation from V and W. (We retain the matrix symbol A for transformation too.) A transformation G from W to V is said to be a generalized inverse of A if AGA = A. The entire body of results of Section 8.2 carries over to this general framework. In this section, it is in this general framework we work on.
+ V be a g-inverse of a transformation A + W , i.e., satisfies the equation AGA = A. Define P 8.3.1 Let G : W
V
:
where R denotes the range and K the kernel of a transformation. Then: (1) A N = 0, N A = 0. (2) M n K(A) = (0) and M $ K(A) = V. (3) L n R(A) = (0) and L $ R(A) = W.
PROOF.(1) is clearly satisfied. To prove (2) let a: E V , belong to both M and K(A). Then, obviously Ax = 0, and x = (G - N ) y for some y. Then A(G - N ) y = AGAGy = AGy = 0. This implies that 0 = GAGy = (G - N ) y = x . Hence M n K(A) = (0). Let
x E V. Obviously X = GAx + (1 - GA)x, where (1 - GA)x E K(A) since A(1 - GA)x = 0, and GAx = (G - N)Ax E R(G - N). Thus M $ K(A) = V. (3) is proved in an analogous manner. Now we raise the following question. Let MeV be any given complement of K(A), LeW be any given complement of R(A), and N be a given transformation such that N A = 0, AN = o. Does there exist a g-inverse G of A such that the conditions (8.3.1), i.e., N = G - GAG, M = R(G - N), L = K(G - N) hold? The answer is yes and the following propositions give a construction of such an inverse and also establish its uniqueness. We call this inverse the LMN-inverse, which was proposed in Rao and Yanai (1985).
P 8.3.2 Let A be a transformation from a vector space V into a vector space W. Let L be a subspace of W such that L n R(A) = {O} and L $ R(A) = W. Let M be a subspace of V such that M n K(A) = {O} and M61K(A) = V. Let N be a transformation from W to V such that AN = 0 and N A = O. Then there exists an LMN-inverse. PROOF.
Let H be any g-inverse of A. Then we show that G
= PM .K(A)HPR(A).L + N
(8.3.2)
is the desired LMN-inverse, where PM oK(A) (abbreviated as PM in the proof) is the projection operator on M along K(A) and PR(A) oL (abbreviated as P A) is the projection operator on R(A) along L. Consider x(E V) = xdE M) + x2(E K(A)). Then APMX
= Ax! = Ax for every x
E V ::} APM
= A.
(8.3.3)
Obviously, (8.3.4) It is easy to verify using (8.3.3) and (8.3.4) that AGA = A, where G is as in (8.3.2), i.e., G is a g-inverse of A. Let x E M and write HAx = x + Xo. Then AHAx = Ax + Axo ::} Axo = 0 since AHAx = Ax, i.e., Xo E K, and PMHAx = PMX + PMXO = x . Then using (8.3.4) (8.3.5)
so that any x E M belongs to R{PMHP A) = R{G - N). Obviously any vector in R{PMHP A) belongs to M, which shows that (8.3.6)
R{G-N) = M.
Let l{E W) = ll{E R{A)) + 12{E L) = Am + 12 for some m . Then (G - N)l = PMHP A{Am + 12) = PMHAm = 0 ~ APMHAm = O. From (8.3.3) , APM = A so that APMH Am = AH Am = Am which is zero. Hence l = 12 E L . Further, for any l E L, (G - N)l = O. Hence
= L.
(8.3.7)
N=G-GAG
(8.3.8)
K{G - N) It is easy to verify that
so that all the conditions for an LMN-inverse are satisfied, and P 8.3.2 is proved. Now we settle the uniqueness problem. P 8.3.3 Let A be a transformation from V to W. The LMNinverse of A, which is a transformation from W to V is unique. Let G 1 and G 2 be two transformations satisfying the conditions (8.3.1). Note that R{G i - N) = M ~ (G i - N)y E M for all yEW and i = 1,2. Taking the difference for i = 1 and 2, (8.3.9)
Similarly K(G i - N) = L ~ {G i - N)y Taking the difference for i = 1 and 2 {G 1
-
G 2 )y = {G 1
A(G 1
-
G 2 )y = A{G 1
= (G i
-
N)P AY for yEW.
G 2 )P AY
-
G 2 )P AY = 0 since PAY = Az for some z
-
~ (G 1
-
G 2 )y E K{A).
Since M n K{A) = 0, (8.3.9) and (8.3.10) ~ {G 1 yEW. Hence G 1 = G 2 .
(8.3.1O) -
G 2 )y
=
0 for all
Complements 8.3.1 Let L, M, N be as defined in P 8.3.2. Then the LMN-inverse G has the alternative definitions:
(1) AG = PR(A) .L, GA = PM ' K(A), G - GAG = N. (8.3.11) (2) AGA = A, R(G - N) = M, K(G - N) = L. (8.3.12) 8.3.2 Let G be any g-inverse of A and M is any complement of K(A) in V. Then the transformation AIM: M -+ R(A) is bijective, and the true inverse (AIM)-l : R(A) -+ M exists.
FIGURE. LMN-inverse A:V-+W AIM
GIR(A) M=R(G-N)
R(A)
"
.- '
K(A) N
G:W-+V The figure given above shows the range and kernel spaces of the given transformation A and any associated g-inverse G for given subspaces L, M and a transformation N. P 8.3.4 shows that all generalized inverses can be generated by different choices of L, M and N .
P 8.3.4 Let A be a transformation from a vector space V to a vector space W. Let F be the collection of all entities (L, M, N) satisfying the following: (1) LisasubspaceofWsatisfyingLnR(A) = {O} andLEElR(A) = W. (2) M is a subspace of V satisfying M n K(A) = {O} and M EEl K(A) = V . (3) N is a transformation from W to V such that N A = 0 and AN=O. Let G be the collection of all g-inverses of A. Then there exists a bijection from F to G. Given L, M, and N satisfying (1) , (2), and (3), let G LMN be the unique LMN-inverse of A. The map 1(L, M, N) = GLMN is a bijection from F to G. This proves P 8.3.4. The case N = 0 is of special interest. The conditions (8.3.11) and (8.3.12) reduce to PROOF.
AG =
G = GAG
(8.3.13)
AGA = A, R(G) = M, K(G) = L.
(8.3.14)
PR(A).L,
GA =
PM.K(A),
and In other words, when N = 0, G is a reflexive g-inverse of A with range M and kernel L. This special case deserves a mention. COROLLARY 8.3.5. Let A be a transformation from a vector space V to a vector space W. Let Lr be the collection of all entities (L, M) with the following properties: (1) L is a subspace ofW such that LnR(A) = {O} and LEElR(A) = W. (2) M is a subspace of V such that M n K(A) = {O} and M EEl K(A) = V. Let G r be the collection of all reflexive g-inverses of A. Then there exists a bijection from Lr to Gr. A special case of this is of interest. Assume that the vector spaces V and Ware equipped with inner products. We take L to be the orthogonal complement of R(A) and M to be the orthogonal complement of K(A). Let PM be the orthogonal projection from V onto M and
the orthogonal projection from W onto R(A) . Take N = O. The conditions (8.3.13) and (8.3.14) reduce to PR(A)
AG
= PR(A) , GA = PM, GAG = G
(8.3.15)
= A, R(G) = M, K(G) = L,
(8.3.16)
and AGA
respectively. If the projection operators are self-adjoint, the equivalent sets (8.3.15) and (8.3.16) of conditions reduce to AGA
= A, GAG = G, AG = (AG)*, GA = (GA)*,
(8.3.17)
where (AG)* is the adjoint of the transformation AG in the inner product space V and (GA)* is the adjoint of the transformation GA in the inner product space W. We now introduce a definition. DEFINITION 8.3.6. Let A be a transformation from an inner product space V to an inner product space W. A transformation G from W to V is said to be a Moore-Penrose inverse of A if (8.3.17) holds. The existence of Moore-Penrose inverse is guaranteed and it is unique. It is after all an LMN-inverse for some suitable choice of L, M, and N. Thus the concept of LMN-inverse is more general than the MoorePenrose inverse. Another special case of interest is when we work with matrices. Let A E Mmn , and N E Mnm , be such that AN = 0 and NA = O. Take V = en and W = em. We now specify some subspaces of V and W. Let B be a matrix of order m x p such that Sp(B) n Sp(A) = {O} and Sp(B) EB Sp(A) = W . Let C be a matrix of order q x n such that K(C)nK(A) = {O} and K(C)EBK(A) = V. The subspaces under focus are Sp(B) and K(C). In the terminology of Land M, M = K(C) and L = Sp(B). We now seek a matrix G E Mn,m such that
AGA
= A, Sp(G - N) = K(C), K(G - N) = Sp(B) .
An explicit form of G is then given by the construction (8.3.2), with M = K(C), L = Sp(B), and H any g-inverse of A.
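For coordinate spaces, the conditions (8.3.17) are the four equations characterizing the Moore-Penrose inverse, and np.linalg.pinv (assuming NumPy) returns the matrix satisfying them. A minimal numerical check on illustrative data:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank-deficient 5 x 4
G = np.linalg.pinv(A)                                           # Moore-Penrose inverse of A

# The four conditions of (8.3.17) / Definition 8.3.6 (real case, so * is transpose)
assert np.allclose(A @ G @ A, A)          # AGA = A
assert np.allclose(G @ A @ G, G)          # GAG = G
assert np.allclose(A @ G, (A @ G).T)      # AG self-adjoint: orthogonal projector onto R(A)
assert np.allclose(G @ A, (G @ A).T)      # GA self-adjoint: orthogonal projector onto R(A*)
```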
8.4. Minimum Norm Solution
In this section, we take up again the problem of solving a system Ax = y of consistent linear equations in the unknown x . The entity A
is a transformation from an inner product space V to a vector space W. For the equation to be consistent, we ought to have y E R(A). Let < ., . > be the inner product and II· II the associated norm on V. The problem we focus on is: minimize
IIxll
over the set {x E V : Ax = y}.
(8.4.1)
The following proposition solves this problem. P 8.4.1 Let PK(A) be the orthogonal projection from V onto K(A), the null space of A. Let G be a transformation from W to V such that GA = J - PK(A) .
(8.4.2)
(We will settle the existence of G a little later.) Then x(n) = Gy solves the minimization problem (8.4.1) . (We use the superscript (n) to indicate that the vector x(n) is a minimum norm solution.) PROOF. Since y E R(A), y = Ab for some bE V. Then
x(n)
= Gy = GAb = (I -
PK(A»)b,
(8.4.3)
which means that x(n) E [K(A)J.l, the orthogonal complement of K(A) . Note that G is a g-inverse of A, i.e., AGA = A. Let x E V, and write x = Xl +X2 with Xl E K(A) and X2 E [K(A)J.l. Then Ax = Ax! +AX2 = AX2. Further, AGAx = A(J -PK(A»)X = A(X-PK(A)X) = A(x-xr) = AX2 = Ax. Hence AGA = A. Since G is a g-inverse of A , Gy = x(n) is a solution to the system of equations. Now let x be any solution to the system of equations. It is clear that x - x(n) E K(A) . Consequently,
<x
- x(n) ,x(n)
>=
o.
(8.4.4)
Observe that
IIxl1 2=< x, x > =< x =< =
x(n)
+ x(n), x _
x - x(n), x - x(n)
x(n)
+ x(n) >
>+<x
_ x(n), x(n)
+ < x(n) ,x - x(n) > + < x(n) ,x(n) > Ilx - x(n) 112 + Ilx(n) 112 2: Ilx(n) 112
>
using (8.4.4) in the last but one step. This completes the proof. There is another way to look at the transformation I - PK(A). Let A # be the adjoint of :A. Then I - PK(A) is the orthogonal projection from V onto R(A#). In our suggestive notation, 1- PK(A) = PR(A#) . The existence of G is not a problem. Look back at (8.3.5). The operator PM is precisely our I - PK(A) here. There is a G satisfying (8.3.5). Put any inner product on the vector space W. We could call G a minimum norm inverse. A minimum norm inverse could be defined in general vector spaces not necessarily equipped with inner products. Let A be a transformation from a vector space V to a vector space W. Let M be any subspace of V which is a complement ofK(A), i.e., MnK(A) = {O} and MEBK(A) = V. Let PM.K(A) be a projector from V onto M along K(A). One could define a transformation G from W to Vasa minimum norm inverse if GA = PM .K(A) . Such a G is necessarily a g-inverse of A. (Prove this.)
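A minimum norm solution is easy to exhibit numerically. The sketch below assumes NumPy; for the standard inner product, np.linalg.pinv supplies a G with GA = I − P_K(A), so Gy is the solution of smallest norm.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 6))                # underdetermined, full row rank
y = rng.standard_normal(3)                     # Ax = y is consistent since p(A) = 3

G = np.linalg.pinv(A)
x_mn = G @ y                                   # x^(n) = Gy, the minimum norm solution
assert np.allclose(A @ x_mn, y)

# Every solution has the form x_mn + (I - GA)z, and none has smaller norm
for _ in range(3):
    z = rng.standard_normal(6)
    x = x_mn + (np.eye(6) - G @ A) @ z
    assert np.allclose(A @ x, y)
    assert np.linalg.norm(x) >= np.linalg.norm(x_mn) - 1e-10
```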
8.5. Least Squares Solution Let A be a transformation from a vector space V into an inner product space W. Let < .,. > be the inner product on Wand II . II the corresponding norm. Let yEW and Ax = y be a system of equations possibly inconsistent. We focus on the problem: minimize Ily - Axil over all x E V.
(8.5.1)
If the system Ax = y is consistent, any solution of the system, obviously, solves the problem (8.5.1). It is inconsistency that poses a challenge.
P 8.5.1 Let PR(A) be the orthogonal projection from W onto R(A). Let G be any transformation from W to V such that AG =
(8.5.2)
PR(A) .
(Is the existence of G a problem?) Then x(l) = Gy solves the problem (8.5.1). (We use the suggestive superscript l to indicate that x(l) is a least squares solution.) PROOF. Note that y - Ax(l) =
Y -
AGy = (I -
PR(A))Y E
[R(A)J.l.
Since A(x U') - x) E R(A) for any x E V, we have
< A(x(l)
- x) , y - Ax(l)
>= O.
(8.5.3)
For any x E V,
Ily -
Axl12
=< Y - Ax,y - Ax> =< (y - Ax(l») + A(x(l) - x), (y =< Y _ AX(l) , y - Ax(l) > + < y -
Ax(l»)
+ A(x(l)
- x) >
AX(l) , A(x(l) - x) >
+ < (x(l) - x), y - Ax(l) > + < A(x(l) - x), A(x(l) = lIy - AX(l) 112 + IIA(x(l) - x)1I2 ~ lIy - AX(l) 112.
- x) >
This completes the proof. The label that x(l) is a least squares solution stems legitimately. Suppose A = (aij) is a matrix of order m x n complex entries. Let y' = (Yl, Y2, ... , Ym) E em. Let the vector space em be equipped with the standard inner product. Then for x' = (Xl,X2, '" ,xm ) E em, m
i=l
This expression appears routinely in many optimization problems. In the context of Linear Models in Statistics, Yi is a realization of a random variable Vi in the model and (ailxl + ai2X2 + ... + ainXn) its expected value. The difference (Yi - (ailxl + ai2x2 + ... + ainXn)) is labelled as the error. The objective is to minimize the sum of squared errors with respect to Xi'S. Any solution to this problem is called a least squares solution. The transformation G satisfying AG = PR(A) could be labelled as a least squares solution. Such a G is necessarily a g-inverse of A . This notion can be defined in general vector spaces without any reference to inner products. Let A be a transformation from a vector space V to a vector space W. Let L be any subspace of W which is a complement of R(A), i.e., R(A)
nL
= {O} and R(A) E9 L = W.
Let PR(A) "L be a projector from W onto R(A) along L . We say that a transformation G from W to V a least squares inverse if AG = PR(A) .L '
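A least squares solution can likewise be checked numerically. The sketch assumes NumPy; for the standard inner product both np.linalg.lstsq and the Moore-Penrose inverse furnish a least squares inverse, and the data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((8, 3))          # overdetermined: Ax = y is, in general, inconsistent
y = rng.standard_normal(8)

x_ls = np.linalg.pinv(A) @ y             # x^(l) = Gy with AG = P_R(A)
x_np, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_ls, x_np)

# The residual y - A x^(l) is orthogonal to R(A), so no other x does better
assert np.allclose(A.T @ (y - A @ x_ls), 0)
x_other = x_ls + rng.standard_normal(3)
assert np.linalg.norm(y - A @ x_other) >= np.linalg.norm(y - A @ x_ls)
```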
8.6. Minimum Norm Least Squares Solution Let A be a transformation from an inner product space {V, < -, - >d to an inner product space (W, < .,. >2) with the corresponding norms II . III and 11·112. Let YEW. We focus on the problem: minimize {x(l) E V :
Ily -
Ilx(i)lIl
over the set
Ax(l) 112 = min Ily xEV
-
AxIl2}.
(8.6.1)
First, we gather all the least squares solutions x(l) each of which minimizes lIy - Axl12 over V. Among these solutions, we seek one x(i) for which IIx(l) III is minimum. p 8.6.1 Let PR(A) be the orthogonal projection from W onto R{A) and PK(A) the orthogonal projection from V onto K{A). Let G be a transformation from W to V such that AG
= PR(A),
GA
= (I -
PK(A»),
GAG
= G.
(8.6.2)
(Is the existence of G a problem? Look up (8.3.6) .) Then x(nl) = Gy solves the problem (8.6.1). PROOF.
Let x(i) be any solution to the problem: minimize
Ily -
Axlll over all x E V.
(8.6.3)
From the proof of P 8.5.1, it is clear that x(l) satisfies the equation AX(l) = PR(A)Y.
(8.6.4)
The conditions (8.6.2) subsume the conditions (8.5.2). Consequently, x(nl) is also a solution to the minimization problem (8.6.3). By (8.6.4), AX(l) - Ax(nl)
= PR(A)Y -
PR(A)Y
=0
from which we have x(i) - x(nl) E K{A). Observe that x(ni)
= Gy = GAGy = GA{Gy)
E R{GA)
= R{J -
PK(A»).
Since [ - PK(A) is an orthogonal projection from V onto [K(A)j1·, we have x(nl) E [K(A)J-L, from which it follows that
< x(l) Thus for any solution
x(l)
_ x(nl) ,x(nl)
>= O.
of the minimization problem (8.6.3),
Ilx(l) II~ =< x(l), x(l) >=< x(nl) + (x(l) _ x(nl)), x(nl) + (x(l) _ x(nl)) > = Ilx(nl) II~+ < x(nl), (x(l) _ x(nl)) > + < (x(l) _ x(nl)), x(nl) > + Ilx(l) - x(nl) II~ = Ilx(nl) II~ + IIx(l) - x(nl) II~ ~ Ilx(nl) II~. This completes the proof The transformation G satisfying (8.6.2) can be called as a minimum norm least squares solution. This notion can be introduced in general vector spaces without involving inner products. Let A be a transformation from a vector space V to W. Let M be a complement of the subspace K(A) in V and L a complement of R(A) in W. Let PM .K(A) and PR(A) .L stand for the projectors in the usual connotation. A transformation G from W to V is said to be a minimum norm least squares inverse if AG = PR(A).L, AG = PM .K(A), GAG = G. 8.7. Various Types of g-inverses
We have come across a variety of g-inverses in our sojourn. It is time to take stock of what has been achieved and then provide a summary of the deliberations. In the following V and W stand for vector spaces. We look at three possible scenarios. Scenario 1 (1) V and Ware general vector spaces. (2) A stands for a transformation from V to W. (3) G stands for a transformation from W to V. (4) R(A) and K(A) stand for the range and kernel of the transformation A, respectively. (5) M stands for any subspace of V which is a complement of the subspace K(A) of V, i.e., M n K(A) = {O} and M E9 K(A) = V.
Generalized Inverses
293
(6) PM .K(A) stands for the projector from V onto M along K(A). (7) L stands for any subspace of W which is a complement of the subspace R(A). (8) PR(A) .L stands for the projector from W onto R(A) along L.
Scenario 2 (1) V and W stand for general inner product spaces. (2), (3) and (4) are the same as those under Scenario l. (5) A # stands for the adjoint of the transformation A . (The map A # is a transformation from W to V.) (6) PR(A) stands for the orthogonal projection from W onto R(A). (7) P[K(A)J-l stands for the orthogonal projection from V onto [K(A)].l. (8) PR(A#) stands for the orthogonal projection from V onto R(A#) . Scenario 3 (1) V and Ware unitary spaces, i.e., V = cn and W = C m , equipped with their usual inner products. (2) A is a matrix of order m x n. (3) G is a matrix of order n x m . (4) R(A) is the vector space spanned by the columns of A and K(A) the null space of A. (5) A* is the conjugate of A. (6) PA' = A* MiA, where Mi is any matrix satisfying (AA*)Mi(AA*) = AA*. (7) PA = AM2A*, where M2 is any matrix satisfying (A* A)M2(A* A) = A*A.
Complements 8.7.1 Under scenario 1, we have defined the LMN-inv~rse in Section 8.3. (1) Show that the LMN-inverse G can also be characterized as follows GIR(A) = (AIM)-i and GIL = NIL .
[Note that AIM: M -+ R(A) is bijective so that (AIM)-i R(A) -+ M is well defined and unique.] (2) The LMN-inverse can also be defined as Gy = (AIM)-lYi
where Y(E W)
+ NY2
= ydE R(A)) + Y2(E L) .
(3) If instead of N, suppose we are given R(G - GAG), an M and an L. Show that G is not unique in such a case. In the following table, we provide a summary of the properties that various types of g-inverses should satisfy under each of the scenarios. TABLE. A catalogue of different types of g-inverses Description of G
Scenario 1
Scenario 2
Scenario 3
g-inverse
AGA=A
AGA=A
AGA=A
r-inverse
GAG=G
GAG=G
GAG=G
r , g-inverse
AGA=A GAG=G
AGA=A GAG=G
AGA=A GAG=G
min norm g-inverse
GA= PM -K(A)
GA = PR(A#)
GA=PA·
min norm r, g-inverse
GA= PM -K(A) GAG=G
GA= PR(A# ) GAG=G
GA = PA. GAG=G
least squares g-inverse
AG = PR(A) -L
AG=PA
AG=PA
least squares r, g- inverse
AG = PR(A) -L GAG=G
AG=PA GAG=G
AG=PA GAG=G
pre min norm least squares g-inverse
AG = PR(A) -L GA=PM -K(A)
AG=PA GA == PA#
AG=PA GA = PA.
min norm least squares g-inverse
AG = PR(A) -L GA= PM -K(A) GAG=G
AG=PA GA=Pc
AG=PA GA=PG
(1)
(2)
g-inverse=generalized inverse, r-inverse=reflexive inverse _ Equivalent conditions for (1) are
AGA=A, GAG=G , (AG)# = AG, (GA)# = GA Equivalent conditions for (2) are
AGA = A, GAG = G , (AGr = AG, (GAr = GA
Some comments are in order on the above table. (1) If G is a minimum norm inverse of A, then it is also a g-inverse of A . (2) If G is a least squares inverse of A, then it is also a g-inverse of A. (3) Suppose A is a matrix of order m x n and rank a. In Section 8.2, we presented the structure of a g-inverse G of A. Let
be singular value decomposition of A , where P and Q are unitary matrices of orders m x m and n x n, respectively, and ~ is a diagonal matrix of order a x a with p(A) = a. Then a g-inverse G of A is of the form
for any arbitrary matrices E 1 ,E2 , and E 3 . A reflexive g-inverse G of A isOf the form
for any matrices El and E 2 . A matrix G of order n x m is a minimum norm g-inverse of A if G is of the form
for any matrices El and E 3 . A matrix G of order n x m is a minimum norm and reflexive g-inverse of A if G is of the form
for any matrix E 1 . A matrix G of order n x m is a least squares g- inverse of A if G is of the form
for any matrices E2 and E 3 . A matrix G of order n x m is a least squares and reflexive inverse of A if G is of the form
for any matrix E 2 • A matrix G of order n x m is a pre-minimum-normleast-squares-inverse of A if G is of the form b. -1
G-Q [0
for any matrix E 3 . A matrix G of order n x m is a minimum norm least squares inverse of A, i.e., Moore-Penrose inverse of A if G is of the form
B.B. G-inverses Through Matrix Approximations
Given a matrix A of order m x n with complex entries, there may not exist a matrix G of order n x m such that GA = In . In such an event, we may try to find a matrix G such that AG and G A are close to 1m and In , respectively. Such a G may be called an approximate inverse of A. The underlying theme of this section is to pursue this idea of determining approximate inverses of A. It turns out that the g-inverses introduced and examined above are after all approximate inverses in some sense. (See Rao (1980) .) In order to develop the theory of approximate inverses, we need a general criterion to decide which one of the two given matrices is closer to the null matrix. DEFINITION 8.8.1. Let Sand R be two matrices of the same order m x n. Assume that R is closer to a null matrix than S if SS* ~ RR* or S" S 2: R* R. Assume that R is strongly closer to a null matrix tlian S if S S" 2: RR* and S* S 2: R* R. Some comments are in order on the definition.
(1) The notation C ~L D (or C ~ D for convenience of notation) for two matrices C and D means that C - D is non-negative definite. (The subscript stands for Lowner ordering.)
(2) If SS* ~ RR*, it does not imply that S* S ~ R* R. (3) Let O"dS) ~ 0"2(S) ~ ... ~ O"t(S) ~ 0 and O"dR) ~ 0"2(R) ~ .. . 2: O"t(R) ~ 0 be the singular values of Sand R, respectively, where t = min{m,n}. If O"i(S) ~ O"i(R) for all i, it does not follow that SS* ~ RR* or S* S ~ R* R. But the reverse is true. (4) If SS* ~ RR* or S* S ~ R* R, then it follows that IISII ~ IIRII for any unitarily invariant norm II . II on the space of all matrices of order m x n. We are jumping the gun again. We will not come across unitarily invariant norms until Chapter 11. The converse is not true. Thus the concept of closeness in the sense of Definition 8.6.1 is stronger than having a smaller norm. We are now ready to establish results on matrix approximations via g-inverses. P 8.8.2 Let A be a matrix of order m x nand G be any g-inverse of A, and L a least squares inverse of A . Then
PROOF. Let PA be the orthogonal projection from W = em onto R(A). We have seen that AL = PA. This means that AL is idempotent and Hermitian. Further, A* AL = A*. (Why?) We check that
(AL - AG)*(Im - AL) = AL(Im - AL) - G* A*(Im - AL) =O-G*(A* -A*AL) =0. Finally,
(Im = (Im = (Im + (Im
AG)*(Im - AG) AL + AL - AG)* (Im - AL + AL - AG) AL)*(Im - AL) + (AL - AG)*(lm - AL) - AL)*(AL - AG) + (AL - AG)*(AL - AG)
= (Im - AL)*(Im - AL) ~
(1m - AL)*(Im - AL).
+ 0 + 0 + (AL -
AG)*(AL - AG)
Why?
This completes the proof. We discuss some implications of this result. The matrix 1m - AL is closest to the null matrix among all matrices 1m - AG, with G being
a g-inverse of A. Further, for any unitarily invariant norm || · || on the space of all m × m matrices,
min_G ||I_m − AG|| = ||I_m − AL||,
where the minimum is taken over all g-inverses G of A. In particular, specializing to the Euclidean norm on the space of all matrices of order m × m, we have
min_G Tr[(I_m − AG)*(I_m − AG)] = Tr[(I_m − AL)*(I_m − AL)] = Tr(I_m − AL). (Why?)
Note that for any matrix C of order m × m its Euclidean norm is defined by [Tr(C*C)]^{1/2}. The following results can be established analogously.
P 8.8.3 For a given matrix A of order m × n, let M be any minimum norm inverse of A. Then for any g-inverse G of A,
(I_n − GA)*(I_n − GA) ≥ (I_n − MA)*(I_n − MA),
so that I_n − MA is the closest to the null matrix among all matrices I_n − GA with G being a g-inverse of A.
P 8.8.4 For a given matrix A of order m × n, let Q be any pre-minimum-norm-least-squares inverse of A. Then for any g-inverse G of A,
(I_m − AG)*(I_m − AG) ≥ (I_m − AQ)*(I_m − AQ) and
(I_n − GA)*(I_n − GA) ≥ (I_n − QA)*(I_n − QA).
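The minimization property in P 8.8.2 is easy to probe numerically. The NumPy sketch below is our own illustration, not part of the text: the parametrization G = A⁺ + W − A⁺AWAA⁺ of g-inverses and the random matrices are assumptions. It compares the Euclidean norm of I_m − AG for an arbitrary g-inverse G with that of I_m − AL, where L is taken to be the Moore-Penrose inverse, which is in particular a least squares inverse.

```python
# A small numerical check of P 8.8.2 (sketch, not from the book).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # order 5 x 4, rank <= 3
Aplus = np.linalg.pinv(A)        # Moore-Penrose inverse; in particular a least squares inverse L

W = rng.standard_normal((4, 5))
G = Aplus + W - Aplus @ A @ W @ A @ Aplus   # another g-inverse of A
assert np.allclose(A @ G @ A, A)            # indeed AGA = A

I5 = np.eye(5)
# Euclidean (Frobenius) norms of I_m - AG and I_m - AL
print(np.linalg.norm(I5 - A @ G), ">=", np.linalg.norm(I5 - A @ Aplus))
```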
We now focus on the stronger notion of closeness. Recall that a matrix R is strongly closer to a null matrix than a matrix S if SS* ≥ RR* and S*S ≥ R*R.
P 8.8.5
Let A be a matrix of order m x n. The following hold.
(1) Let L be any least squares inverse as well as a reflexive inverse of A. Then for any g-inverse G of A,
(I_m − AG)*(I_m − AG) ≥ (I_m − AL)*(I_m − AL),
(I_m − AG)(I_m − AG)* ≥ (I_m − AL)(I_m − AL)*,
i.e., I_m − AL is strongly closest to a null matrix among all matrices I_m − AG with G being a g-inverse of A.
(2) Let M be any minimum norm inverse as well as a reflexive inverse of A. Then for any g-inverse G of A,
(I_n − GA)*(I_n − GA) ≥ (I_n − MA)*(I_n − MA),
(I_n − GA)(I_n − GA)* ≥ (I_n − MA)(I_n − MA)*,
i.e., I_n − MA is strongly closest to a null matrix among all matrices I_n − GA with G being a g-inverse of A.
(3) Let Q be a Moore-Penrose inverse of A. Then for any g-inverse G of A,
(I_m − AG)*(I_m − AG) ≥ (I_m − AQ)*(I_m − AQ),
(I_m − AG)(I_m − AG)* ≥ (I_m − AQ)(I_m − AQ)*,
(I_n − GA)*(I_n − GA) ≥ (I_n − QA)*(I_n − QA),
(I_n − GA)(I_n − GA)* ≥ (I_n − QA)(I_n − QA)*,
i.e., both I_m − AQ and I_n − QA are strongly closest to a null matrix among all matrices I_m − AG and I_n − GA, respectively, with G being a g-inverse of A.
PROOF. A proof can be hammered out by imitating the theme in the proof of P 8.8.2. All these results can be restated in the framework of unitarily invariant norms in the manner presented right after the proof of P 8.8.2.
Complements 8.8.1
Let A=
[~ o1 ~]
and B =
[~ o1
Show that AA′ ≥ BB′ holds but A′A ≥ B′B does not. Compare the singular values of A and B.
8.8.2 Construct two matrices A and B such that the i-th singular value of A is greater than or equal to the i-th singular value of B for every i, but neither AA* ≥ BB* nor A*A ≥ B*B.
8.9. Gauss-Markov Theorem
The focus in this section is on a random vector Y consisting of m real components with some joint distribution which, among other things, depends on a parameter vector β ∈ R^n and a scalar σ ∈ R+ = (0, ∞) in the following way,
E_{β,σ}(Y) = Xβ,  D_{β,σ}(Y) = E_{β,σ}(Y − Xβ)(Y − Xβ)′ = σ²I_m,   (8.9.1)
where E stands for expectation and D for dispersion matrix, and X is a matrix of order m × n with known entries. The model (8.9.1), specifying the mean E(Y) and the dispersion (variance-covariance) matrix D(Y), is called the Gauss-Markov model and is represented by the triplet (Y, Xβ, σ²I_m). This model is widely used in statistics, and the associated methodology is called regression theory. The problems usually considered are the estimation of the unknown parameters β and σ and tests of hypotheses concerning them. We have touched upon this model earlier, but now we show how the concepts of projection and g-inverses are useful in solving these problems. We begin the proceedings with a definition.
DEFINITION 8.9.1. Let f(β) = Qβ, β ∈ R^n, where Q is a given matrix of order k × n. A statistic CY with C of order k × m is said to be a Linear Unbiased Minimum Dispersion (LUMD) estimator of f(·) if
E_{β,σ}(CY) = f(β),  D_{β,σ}(FY) ≥ D_{β,σ}(CY)
for all β ∈ R^n and σ ∈ R+, where FY is any statistic satisfying E_{β,σ}(FY) = f(β) for all β ∈ R^n and σ ∈ R+.
The parametric function Qβ is a set of k linear functions of the components of β. The statistic CY is a set of k linear functions of the components of Y. The estimator CY is LUMD if it is an unbiased estimator of the parametric function and has the least dispersion matrix among all linear unbiased estimators of the parametric function, i.e., D_{β,σ}(CY) is closest to a null matrix among all matrices D_{β,σ}(FY) for all β ∈ R^n and σ ∈ R+, where FY is an unbiased estimator of f(·). There is no guarantee that we will have at least one statistic FY which is an unbiased estimator of f(·). The following result answers some of the questions one confronts in the realm of Gauss-Markov models.
P 8.9.2 Let P be the orthogonal projector from R^m onto R(X) under the inner product <u, v> = u′v for u, v ∈ R^m. The following statements hold.
(1) There exists an unbiased estimator of f(·), i.e., f(·) is estimable, if and only if Sp(Q′) ⊂ Sp(X′), i.e., Q = AX for some matrix A.
(2) If f(·) is estimable, i.e., Q = AX for some matrix A, then APY is the LUMD estimator of f(·). Further,
D_{β,σ}(APY) = σ²APA′ for all β ∈ R^n and σ ∈ R+.   (8.9.2)
(3) Let g(σ) = σ², σ ∈ R+. If ρ(X) = r, an unbiased estimator of g(·) is given by
[1/(m − r)] Y′(I − P)Y.   (8.9.3)
PROOF. (1) Suppose AY is an unbiased estimator of f(·). This means that for every β ∈ R^n and σ ∈ R+,
Qβ = E_{β,σ}(AY) = AXβ,
which implies that Q = AX. Conversely, if Q = AX for some matrix A, then AY is an unbiased estimator of f(·).
(2) Suppose f(·) is estimable. Then Q = AX for some matrix A. Further,
E_{β,σ}(APY) = APXβ = AXβ = Qβ
for all β and σ. This demonstrates that APY is an unbiased estimator of f(·). Let FY be an alternative unbiased estimator of f(·). This means that FX = Q = AX, which implies that FP = AP. Now for all β and σ,
D_{β,σ}(FY) − D_{β,σ}(APY) = σ²FF′ − σ²APP′A′
= σ²(FF′ − FPP′F′) = σ²(FF′ − FPF′), since PP′ = P,
= σ²F(I − P)F′ = σ²F(I − P)(I − P)′F′,
which is clearly non-negative definite. Also, D_{β,σ}(APY) = σ²APA′.
(3) Note that for all β and σ,
E_{β,σ}[Y′(I − P)Y] = E_{β,σ}[(Y − Xβ)′(I − P)(Y − Xβ)]
= E_{β,σ}[Tr((I − P)(Y − Xβ)(Y − Xβ)′)]
= Tr[(I − P)E_{β,σ}(Y − Xβ)(Y − Xβ)′]
= σ²Tr(I − P) = σ²(m − Tr(P)) = σ²(m − ρ(P)) = σ²(m − r),
from which the result (3) follows. This completes the proof. (What about uniqueness?)
We state some of the consequences of the main result.
COROLLARY 8.9.3. If Q = X, then PY is the LUMD estimator of Xβ.
COROLLARY 8.9.4. If Q = X, the least squares estimator of Xβ is PY.
COROLLARY 8.9.5. Suppose f(·) is estimable. The LUMD estimator of f(·) is given by Qβ̂, where β̂ = GX′Y and G is a g-inverse of X′X. Further, D_{β,σ}(Qβ̂) = σ²QGQ′ for all β and σ.
PROOF. To establish the result, we make use of the explicit representation of the projection operator P, namely, P = XGX′, where G is any g-inverse of X′X. Now the LUMD estimator of f(·) is given by
APY = AXGX′Y = QGX′Y = Qβ̂
in our new terminology. Further, for all β and σ,
D_{β,σ}(Qβ̂) = σ²APA′ = σ²AXGX′A′ = σ²QGQ′.
The expression β̂ is not unique when X′X is singular and depends on the particular choice of the g-inverse G used. However, Qβ̂ and QGQ′ are unique for any choice of g-inverse G provided f(·) is estimable.
COROLLARY 8.9.6. The unbiased estimator of g(·) given in P 8.9.2 (3) can be rewritten as
(m − r)^{-1}Y′(I − P)Y = (m − r)^{-1}(Y′Y − β̂′X′Y).
Note: The material for this chapter is drawn from the references: Rao (1945a, 1945b, 1946b, 1951, 1955, 1968, 1971, 1972b, 1973a, 1973b, 1973c, 1975, 1976a, 1976b, 1978a, 1978b, 1979a, 1980, 1981, 1985b), Rao and Mitra (1971a, 1971b, 1973, 1975), Rao, Bhimasankaram and Mitra (1972) and Rao and Yanai (1985a, 1985b).
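Corollaries 8.9.5 and 8.9.6 translate directly into a computation. The NumPy sketch below is our own illustration (the design matrix, the estimable function and the use of numpy.linalg.pinv as one particular g-inverse of X′X are assumptions, not material from the text): it checks that Qβ̂ does not depend on the chosen g-inverse when Qβ is estimable, and evaluates the unbiased estimator of σ² in (8.9.3).

```python
# Sketch: rank-deficient Gauss-Markov model (Y, X beta, sigma^2 I).
import numpy as np

rng = np.random.default_rng(1)
m, r = 12, 2
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, 4))   # order m x 4, rank r = 2
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
sigma = 0.7
Y = X @ beta_true + sigma * rng.standard_normal(m)

G1 = np.linalg.pinv(X.T @ X)                        # one g-inverse of X'X (the Moore-Penrose one)
W = rng.standard_normal((4, 4))
G2 = G1 + W - G1 @ (X.T @ X) @ W @ (X.T @ X) @ G1   # another g-inverse of X'X

beta1, beta2 = G1 @ X.T @ Y, G2 @ X.T @ Y           # two versions of beta-hat
Q = X[:3, :]                                        # rows of X give estimable functions (Q = AX)
print(np.allclose(Q @ beta1, Q @ beta2))            # Q beta-hat is the same for both g-inverses

P = X @ G1 @ X.T                                    # orthogonal projector onto R(X)
rank = np.linalg.matrix_rank(X)
sigma2_hat = Y @ (np.eye(m) - P) @ Y / (m - rank)   # unbiased estimator of sigma^2, as in (8.9.3)
print(sigma2_hat)
```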
CHAPTER 9 MAJORIZATION
In this chapter we introduce the notion of majorization and examine some of its ramifications. We also focus on the role of doubly stochastic matrices in characterizing majorization. The chain of ideas presented here will be made use of in understanding the relationship between the eigenvalues and singular values of a square matrix.
9.1. Majorization
For any n given numbers x1, x2, ..., xn, let x(1) ≥ x(2) ≥ ... ≥ x(n) be the arrangement of x1, x2, ..., xn in decreasing order of magnitude. Let x = (x1, x2, ..., xn)′ and y = (y1, y2, ..., yn)′ be two vectors in R^n. Keeping up with the statistical tradition, we identify members of R^n as column vectors.
DEFINITION 9.1.1. We say that the vector x is majorized by the vector y (or y majorizes x), and use the symbol x « y, if
x(1) + ... + x(i) ≤ y(1) + ... + y(i),  i = 1, ..., n − 1,   (9.1.1)
x1 + x2 + ... + xn = y1 + y2 + ... + yn.   (9.1.2)
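For later reference, Definition 9.1.1 (and the weak version in Definition 9.1.2 below) can be coded in a few lines. The Python sketch below is our own illustration, not part of the text; the function names and the tolerance are assumptions.

```python
# Sketch: test whether x << y (majorization) or x <<_w y (weak majorization).
import numpy as np

def weakly_majorized(x, y, tol=1e-12):
    """True if the partial sums of the decreasing rearrangement of x
    never exceed those of y (condition of Definition 9.1.2)."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    return bool(np.all(np.cumsum(xs) <= np.cumsum(ys) + tol))

def majorized(x, y, tol=1e-12):
    """True if x << y: weak majorization plus equality of the total sums,
    i.e., conditions (9.1.1) and (9.1.2)."""
    return weakly_majorized(x, y, tol) and abs(np.sum(x) - np.sum(y)) <= tol

print(majorized([1/3, 1/3, 1/3], [1, 0, 0]))   # True: (1/n,...,1/n) << p for p in the simplex
print(majorized([1, 0, 0], [1/3, 1/3, 1/3]))   # False
```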
REMARKS. A. The relationship « defined above has the following properties.
(1) x « x for every x in R^n.
(2) If x « y and y « z, then x « z.
(3) The properties stated in (1) and (2) above make the relation « only a pre-order. The crucial element missing for the relation « to be a partial order on R^n is that x « y and y « x together do not imply that x = y. However, one can show
that the components of y are a permutation of the components of x. See (4) below for a precise statement.
(4) Let π be a permutation on {1, 2, ..., n}, i.e., π is a one-to-one map from {1, 2, ..., n} onto {1, 2, ..., n}. For x in R^n, let x_π = (x_{π(1)}, x_{π(2)}, ..., x_{π(n)})′. The components of x_π are just a permutation of the components of x. For any x in R^n, x « x_π for every permutation π of {1, 2, ..., n}. The above property can be rephrased as follows. With every permutation π of {1, 2, ..., n}, one can associate a permutation matrix P_π = (p_{π,ij}) of order n × n defined as follows for 1 ≤ i, j ≤ n: p_{π,ij} = 1 if π(i) = j, and p_{π,ij} = 0 otherwise. Only one entry of every row and column of P_π is unity and the remaining (n − 1) entries consist of zeros. One can verify that for any x in R^n and any permutation π of {1, 2, ..., n}, x_π = P_π x. Thus we have x « P_π x for every permutation π of {1, 2, ..., n}. We will elaborate the significance of this observation in the next section. Suppose x « y and y « x. Then one can verify that y = x_π for some permutation π of {1, 2, ..., n}.
B. The notion of majorization introduced above can be described in a slightly different fashion. For any x = (x1, x2, ..., xn)′ in R^n, let x[1] ≤ x[2] ≤ ... ≤ x[n] be the arrangement of x1, x2, ..., xn in increasing order of magnitude. One can show that x « y if and only if
x[1] + ... + x[i] ≥ y[1] + ... + y[i],  i = 1, ..., n − 1,
x1 + x2 + ... + xn = y1 + y2 + ... + yn.
To prove this assertion, one could use the fact that x[i] = x(n−i+1) for i = 1, 2, ..., n.
C. The notion of majorization can be described in yet another form. For x and y in R^n, let <x, y> = Σ_{i=1}^{n} x_i y_i. For any subset I of {1, 2, ..., n}, let ε_I be the column vector in R^n whose i-th entry is 1 if i ∈ I, and 0 otherwise. For example, if I = {1, 2}, then ε_I = (1, 1, 0, 0, ..., 0)′. Let #I denote the cardinality of I. For any x in R^n, it can be verified that
Σ_{i=1}^{k} x(i) = max{<x, ε_I> : #I = k},  k = 1, 2, ..., n.
The following characterization of majorization is valid. For x and Y in R n , x«y if and only if the following two conditions hold.
(1) For any I ⊂ {1, 2, ..., n} with #I ≤ n − 1, there exists J ⊂ {1, 2, ..., n} such that #I = #J and <x, ε_I> ≤ <y, ε_J>.
(2) x1 + ... + xn = y1 + ... + yn.
n
n
n
L{Xi - a)+ ~ L(Yi - a)+ for every a E R and LXi = LYi. i=l i=l i=l i=l EXAMPLES.
1. Let Xl, ... ,Xn be n random variables with some joint distribution
function. Let X(1) 2: ... 2: X(n) be the order statistics of Xt, ... ,Xn arranged in decreasing order of magnitude. Assume that E!Xi ! < 00 for every i. Let X = (Xl, X2, ... , Xn)', X = (EXt, EX2, ... , EXn )' = (X},X2, ... ,xn)', say, and Y = (EX(I), EX(2), ... , EX(n»)'. Then x«y. This can be proved as follows: for any 1 ~ k ~ n, k
LX(i) i=l
= max{ < x,£] >: #1 = k} = max{E{ < X,£] >: #1 = k)} ~ E(max{ < X,£] k
= E(LX(i») = i=l
>: #1 = k}) k
:LEX(i). i=l
In view of Remark C, this essentially completes the proof of the above assertion.
2. Let A1, A2, ..., An be n events in a probability space with Pr(Ai) = ai, i = 1, 2, ..., n. For each 1 ≤ j ≤ n, let Bj denote the event that at least j of A1, A2, ..., An occur. Equivalently, Bj = ∪(A_{i1} ∩ A_{i2} ∩ ... ∩ A_{ij}), where the union is taken over all 1 ≤ i1 < i2 < ... < ij ≤ n. Let Pr(Bj) = bj, j = 1, 2, ..., n. Let x = (a1, a2, ..., an) and y = (b1, b2, ..., bn). Then x « y. This can be shown as follows. Let Xi = I(Ai), i = 1, 2, ..., n, where I(A) is the indicator function of the event A. It is clear that EXi = ai for every i. Let X(j) be the j-th order statistic of the Xi's. Observe that for each 1 ≤ j ≤ n, X(j) = 1 if and only if at least j of X1, ..., Xn are each equal to unity, and X(j) = 0 otherwise. Consequently, I(Bj) = X(j). The required assertion follows from Example 1 above.
3. Let S_{n−1} be the simplex in R^n, i.e., S_{n−1} = {(p1, p2, ..., pn) : each pi ≥ 0 and Σ_{i=1}^{n} pi = 1}. There is a unique smallest element in S_{n−1}
according to the pre-order « on S_{n−1}, given by (1/n, 1/n, ..., 1/n), i.e., (1/n, 1/n, ..., 1/n) « p for every p in S_{n−1}. There are n largest elements in S_{n−1} in the pre-order « on S_{n−1}. One of them is given by (1, 0, 0, ..., 0), which majorizes p for every p in S_{n−1}.
In the following definition we introduce an idea whose defining condition is slightly weaker than that of majorization. This notion plays a useful role in the formulation of some inequalities for eigenvalues.
DEFINITION 9.1.2. We say that a vector x = (x1, x2, ..., xn)′ is weakly majorized by a vector y = (y1, y2, ..., yn)′, and denote the relationship by x «_w y, if
Σ_{j=1}^{i} x(j) ≤ Σ_{j=1}^{i} y(j),  i = 1, 2, ..., n.
Complements
9.1.1 If x and y are any two vectors of the same order and z any vector, show that (x′, z′)′ « (y′, z′)′ if and only if x « y.
9.1.2 If x « z, y « z, and 0 ≤ θ ≤ 1, show that θx + (1 − θ)y « z.
9.1.3 If x « y, x « z, and 0 ≤ θ ≤ 1, show that x « θy + (1 − θ)z.
9.2. A Gallery of Functions
The notion of a function f from R^m to R^n preserving the pre-order « is part of a natural progression of ideas which advance the majorization concept. In this section, we will introduce this notion more formally and study some examples.
DEFINITION 9.2.1. Let f be a map from R^m to R^n. It is said to be Schur-convex if
x, y ∈ R^m and x « y ⇒ f(x) «_w f(y).
There are variations of Schur-convexity worth reporting. The function f is said to be strongly Schur-convex if
x, y ∈ R^m and x «_w y ⇒ f(x) «_w f(y).
The function f is said to be strictly Schur-convex if
x, y ∈ R^m and x « y ⇒ f(x) « f(y).
There are a host of other ways one could classify functions. We will now review some of these and examine their connection to Schur-convexity. Let ≤ be the usual partial order on R^n. More precisely, let x = (x1, x2, ..., xn)′ and y = (y1, y2, ..., yn)′ be any two members of R^n. Say that x ≤_e y iff xi ≤ yi for all i. (The suffix e stands for entrywise.)
DEFINITION 9.2.2. Let f be a map from R^m to R^n. We define f to be monotonically increasing if
x, y ∈ R^m and x ≤_e y ⇒ f(x) ≤_e f(y),
f to be monotonically decreasing if −f is monotonically increasing, and f to be monotone if f is either monotonically increasing or monotonically decreasing. The notion that f is monotonically increasing is equivalent to the idea that f is coordinatewise increasing, i.e., if ξ and η are real numbers such that ξ ≤ η, then f(x1, x2, ..., x_{i−1}, ξ, x_{i+1}, ..., x_m) ≤_e f(x1, x2, ..., x_{i−1}, η, x_{i+1}, ..., x_m) for 1 ≤ i ≤ m and x1, x2, ..., x_{i−1}, x_{i+1}, ..., x_m in R.
Another notion of the same type is "convexity." This notion also uses the usual partial order ≤ on R^n.
DEFINITION 9.2.3. Let f be a map from R^m to R^n. We define f to be convex if f(px + (1 − p)y) ≤_e pf(x) + (1 − p)f(y) for every 0 ≤ p ≤ 1 and x, y in R^m, and f to be concave if −f is convex.
The notion of "symmetry" can also be introduced for functions. The notion of a real-valued symmetric function is easy to guess. It is simply a function which is symmetric in its arguments. With some innovation, this notion can be extended to multi-valued functions.
DEFINITION 9.2.4. Let f be a map from R^m to R^n. Say that f is symmetric if for every permutation π of {1, 2, ..., m} there exists a permutation π′ of {1, 2, ..., n} such that
f(x_π) = (f(x))_{π′} for all x ∈ R^m.
(See Remark A(4) above for the definition of x_π.) In the case when n = 1, f is symmetric if and only if f(x_π) = f(x) for all permutations π of {1, 2, ..., m} and x ∈ R^m. A simple example of a real-valued symmetric function is given by f(x1, x2, ..., xm) = Σ_{i=1}^{m} xi, x = (x1, x2, ..., xm)′ ∈ R^m.
One of the pressing needs now is to identify Schur-convex functions among other classes of functions introduced above. We need some more machinery to move in this direction. We will take up this issue in the next section.
9.3. Basic Results We will now discuss inter-relations between various entities introduced in Section 9.2. At the core of the discussion lies doubly stochastic matrices and permutation matrices. We start with doubly stochastic matrices. DEFINITION 9.3.1. A matrix P = (Pij) of order n x n is said to be a doubly stochastic matrix if (1) Pij ~ 0 for all i and j, n
(2)
L i=l
n
Pij = 1 for all j, and (3)
L
Pij = 1 for all i.
j=l
Every permutation matrix of order nXn is a doubly stochastic matrix. See Remark A(4) of Section 9.2.
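A quick computational check of Definition 9.3.1 is handy when experimenting with the results of this section. The sketch below is our own illustration (the function name and tolerance are assumptions).

```python
# Sketch: test whether a square matrix is doubly stochastic (Definition 9.3.1).
import numpy as np

def is_doubly_stochastic(P, tol=1e-12):
    P = np.asarray(P, dtype=float)
    return (P.shape[0] == P.shape[1]
            and np.all(P >= -tol)                           # (1) non-negative entries
            and np.allclose(P.sum(axis=0), 1.0, atol=tol)   # (2) column sums equal 1
            and np.allclose(P.sum(axis=1), 1.0, atol=tol))  # (3) row sums equal 1

perm = np.eye(4)[[2, 0, 3, 1]]           # a permutation matrix
print(is_doubly_stochastic(perm))        # True, as noted in the text
print(is_doubly_stochastic(0.5 * perm))  # False
```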
We will now discuss the structure of doubly stochastic matrices. Let D_n be the collection of all doubly stochastic matrices of order n × n. The set D_n can be viewed as a subset of R^{n²} and shown to be a compact convex subset of R^{n²}. The convexity of the set is obvious: if P and Q are members of D_n and 0 ≤ p ≤ 1, then it is clear that pP + (1 − p)Q ∈ D_n. In the context of convex sets, the notion of an extreme point plays a useful role. If A is a convex subset of some vector space, then a member x of A is an extreme point of A if x cannot be written as a strict convex combination of two distinct members of A, i.e., if x = py + (1 − p)z for some 0 < p < 1 and y, z ∈ A, then y = z. It is natural to enquire about the extreme points of D_n. Let P_n be the collection of all permutation matrices of order n × n. See Remark A(4) above. It is obvious that P_n ⊂ D_n. The following result characterizes the extreme points of D_n.
P 9.3.2
The set of all extreme points of Dn is precisely P n'
PROOF. Let P be a permutation matrix from P_n and suppose that P = pD1 + (1 − p)D2 for some 0 < p < 1 and D1, D2 ∈ D_n. Let P = (p_ij), D1 = (d_ij(1)) and D2 = (d_ij(2)). Look at the first row of P. Note that p_1j = 1 for exactly one j ∈ {1, 2, ..., n} and the rest of the entries in the first row are all equal to zero. Let p_{1j1} = 1 for some j1 ∈ {1, 2, ..., n}. Since p_{1j1} = p d_{1j1}(1) + (1 − p) d_{1j1}(2) and 0 < p < 1, it follows that d_{1j1}(1) = d_{1j1}(2) = 1. For j ≠ j1, 0 = p_1j = p d_1j(1) + (1 − p) d_1j(2), which implies that d_1j(1) = d_1j(2) = 0. Consequently, the first rows of P, D1 and D2 are identical. A similar argument can be used to show that the i-th rows of P, D1 and D2 are identical for any i = 2, 3, ..., n. Hence P = D1 = D2. Thus P is an extreme point of D_n.
Conversely, let D = (d_ij) be an extreme point of D_n. Suppose D is not equal to any of the permutation matrices in P_n. Then there are some rows in D such that in each of these rows there are at least two positive entries. Start with one such row, the i1-th, say. Then 0 < d_{i1j1} < 1 and 0 < d_{i1j2} < 1 for some j1 < j2. Look at the j2-th column. There must be another entry d_{i2j2} such that 0 < d_{i2j2} < 1. There must be an entry d_{i2j3} in the i2-th row such that 0 < d_{i2j3} < 1. If we continue this way, we will obtain an infinite sequence (this process never ends!)
in D such that each entry in the sequence is positive and less than unity. But we have only a finite number of subscripts (i, j) for the d's with i, j ∈ {1, 2, ..., n}. Consequently, a subscript either of the form (i_r, j_r) or of the form (i_r, j_{r+1}) must repeat itself. Assume, without loss of generality, that (i1, j1) repeats itself. We now look at the following segment of the above sequence:
d_{i1j1}, d_{i1j2}, d_{i2j2}, d_{i2j3}, ..., d_{isjs}, d_{isj_{s+1}},   (9.3.1)
with j_{s+1} = j1. A possible scenario with s = 4 could look like the one depicted below, with the entries forming a loop.
Entries of the matrix D forming the loop (s = 4):
d_{i1j1} → d_{i1j2} → d_{i2j2} → d_{i2j3} → d_{i3j3} → d_{i3j4} → d_{i4j4} → d_{i4j5} → d_{i1j1} (since j5 = j1).
We now form two distinct matrices. Let δ be the minimum of all the entries in the loop (9.3.1). One matrix C1 is created from D by retaining all the entries of D in C1 with the exception of those in the loop (9.3.1). The entries in the loop are replaced by
d_{i1j1} − δ, d_{i1j2} + δ, d_{i2j2} − δ, d_{i2j3} + δ, ..., d_{isjs} − δ, d_{isj_{s+1}} + δ,
respectively, and inserted into C1. The other matrix C2 is created in a similar way, the only change being the replacement of δ by −δ. We can now verify that C1 and C2 are distinct members of D_n and
D = (1/2)C1 + (1/2)C2. This contradicts the supposition that D is an extreme point of D_n. This completes the proof.
A complete knowledge of the extreme points of a convex set is helpful in recreating the set. Every member of the set can be written as a mixture of extreme points. This result is true in a more general framework but, for us, proving it in the environment of doubly stochastic matrices is rewarding and fulfilling. Before we establish this celebrated result, due to G. Birkhoff, we explore the notion of the permanent of a matrix.
DEFINITION 9.3.3. Let A = (a_ij) be a square matrix of order n × n with real entries. The permanent of A, per(A), is defined by
per(A) = Σ_π Π_{i=1}^{n} a_{iπ(i)},   (9.3.2)
where the summation is taken over all permutations π of {1, 2, ..., n}.
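Formula (9.3.2) can be evaluated directly for small matrices. The sketch below is our own illustration (a brute-force enumeration over all n! permutations, fine for the small examples of this chapter but not intended for large n).

```python
# Sketch: the permanent of a square matrix by direct use of (9.3.2).
from itertools import permutations
import numpy as np

def per(A):
    A = np.asarray(A)
    n = A.shape[0]
    # sum over all permutations pi of the products a_{1,pi(1)} * ... * a_{n,pi(n)}
    return sum(np.prod([A[i, p[i]] for i in range(n)]) for p in permutations(range(n)))

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(per(A), np.linalg.det(A))   # 10.0 versus -2.0: only the signs of the terms differ
```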
There is one essential difference between per(A) and det(A). In the definition of the determinant, every product of the type Π_{i=1}^{n} a_{iπ(i)} is multiplied by either +1 or −1 depending on whether π is an even or an odd permutation. As an example, let
A = [
~ -3 -~ -~l4
-2
Check that per(A) = 1 and det(A) = −1. Another critical property of the permanent is that it is permutation-invariant: if we permute the rows and/or columns of A, the value of the permanent remains the same. We will introduce a special notation for submatrices. Let A = (a_ij) be a square matrix of order n × n. Let I, J ⊂ {1, 2, ..., n}. The submatrix A(I, J) of A is a matrix of order #I × #J whose rows are indexed by the members of I and columns by J, with entries plucked from A correspondingly. (Recall that #I stands for the cardinality of the set I.) Symbolically, one writes the submatrix as A(I, J) = (a_ij)_{i∈I, j∈J}.
For example,
A = [  1   0  −1
       2  −2   1
      −2  −3   4 ],   I = {2, 1},  and  J = {2, 3},
A(I, J) = [ −2   1
             0  −1 ].
The submatrix A(I, J) is technically different from the one given by I′ = {1, 2} and J′ = {2, 3}, i.e.,
A(I′, J′) = [  0  −1
              −2   1 ].
The permanents of these two submatrices are the same. Since we are predominantly dealing with permanents in this sojourn, we are not concerned about how the members of I and J are arranged. If I and J do not have the same cardinality, per A(I, J) does not make sense. Let us record a simple result on permanents which follows directly from the definition of the permanent.
P 9.3.4 If all the entries of a square matrix A of order n × n are non-negative, and I, J are subsets of {1, 2, ..., n} having the property that #I + #J = n, then
per(A) ≥ per(A(I, Jᶜ)) × per(A(Iᶜ, J)).   (9.3.3)
It is worth noting that, in P 9.3.4, I and Jᶜ, the complement of the set J, have the same cardinality. The following result plays a crucial role in many of the subsequent results.
P 9.3.5 Let A = (a_ij) be a square matrix of order n × n with non-negative entries. Then per(A) = 0 if and only if there are subsets I, J of {1, 2, ..., n} such that
#I + #J ≥ n + 1 and A(I, J) = 0.   (9.3.4)
PROOF. Suppose (9.3.4) holds. Since the permanent is permutation-invariant, we can take I = {1, 2, ..., k} and J = {k, k + 1, ..., n} for
some k. Note that #I + #Iᶜ = n and A(I, Iᶜ) = 0. Further, the last column of A(I, I) is zero. Consequently,
per(A) = per(A(I, I)) × per(A(Iᶜ, Iᶜ)) = 0.
See Complement 9.3.4. The converse can be established using induction. If n = 1, the validity of the converse is transparent. Suppose that the converse is true for all non-negative matrices of order m × m with m ≤ n, n fixed. Let A = (a_ij) be a matrix of order (n + 1) × (n + 1) with non-negative entries and per(A) = 0. If all the entries of A are zeros, the proof ends right there. Assume that a_ij is positive for some i and j. Without loss of generality, one can take a_{n+1,n+1} > 0. (Why? Remember that we are dealing with permanents.) Take I0 = {1, 2, ..., n} and J0 = {n + 1}. By (9.3.3),
0 = per(A) ≥ per(A(I0, (J0)ᶜ)) × per(A((I0)ᶜ, J0)) = per(A(I0, (J0)ᶜ)) × a_{n+1,n+1} ≥ 0.
This implies that per(A(I0, (J0)ᶜ)) = 0. Let B = A(I0, (J0)ᶜ). Thus B is a matrix of order n × n with permanent equal to zero. The rows and columns of B are indexed by the same set. By the induction hypothesis, there are subsets I1 and J1 of {1, 2, ..., n} such that
#I1 + #J1 ≥ n + 1 and B(I1, J1) = A(I1, J1) = 0.
If #I1 + #J1 ≥ n + 2, the proof ends: take I = I1 and J = J1 in (9.3.4). Suppose #I1 + #J1 = n + 1. By (9.3.3), in the realm of matrices of order (n + 1) × (n + 1),
0 = per(A) ≥ per(A(I1, (J1)ᶜ)) × per(A((I1)ᶜ, J1)) ≥ 0,
which implies that either per(A(I1, (J1)ᶜ)) = 0 or per(A((I1)ᶜ, J1)) = 0. Assume, without loss of generality, that per(A(I1, (J1)ᶜ)) = 0. By the induction hypothesis, there are sets I2 ⊂ I1 and J2 ⊂ (J1)ᶜ such that #I2 + #J2 ≥ #I1 + 1 and A(I2, J2) = 0. Note that
#I2 + #(J1 ∪ J2) = #I2 + #J1 + #J2 ≥ (#I1 + 1) + ((n + 1) − #I1) = n + 2.
314
MATRIX ALGEBRA THEORY AND APPLICATIONS
Further, the fact that A(I2, J 1 U J 2) = 0 follows from A(I2, h) = 0 and A{12, J 1 ) is a submatrix of A{lt, Jt} which is equal to zero. Take 1 = 12 and J = J 1 U h in (9.3.4). A useful consequence of this result is the following. P 9.3.6 If A then per{A) > O.
= (Aij)
is a doubly stochastic matrix of order n x n,
Suppose per{A) = O. By P 9.3.5, there are subsets 1 and J of {I, 2, ... ,n} such that #1 + #J ~ n + 1 and A{1, J) = o. After a suitable permutation of its rows and columns, the matrix A can be brought to the form PROOF .
where 0 is of order #1 x #J and the transformed matrix is also doubly stochastic. Note that n = sum of all entries in A* ~
sum of all entries in C
= #1
+
#J
~
+ sum of all entries in B
n+ 1,
which is a contradiction.
We are now in a position to describe the structure of the set of all doubly stochastic matrices of the same order.
P 9.3.7 Every matrix in D_n is a convex combination of members of P_n, i.e., given any D in D_n, there exist non-negative numbers λ_P, P ∈ P_n, such that Σ_{P∈P_n} λ_P = 1 and D = Σ_{P∈P_n} λ_P P.
Pep,..
PROOF. We furnish a proof using the induction argument. For any matrix D, let n{D) be the total number of non-zero entries in D. It is a well-defined map on On. Obviously, n{D) ~ n. If n{D) = n, it is clear that D ought to be a permutation matrix. The induction method is used on the number n{D). If n(D) = n, the proposition is evidently true. Assume that the proposition is true for all doubly stochastic matrices B for which n(B) < n{D). We will show that the result is also valid
for the doubly stochastic matrix D = (d_ij). Since per(D) is positive, there exists a permutation π on {1, 2, ..., n} such that d_{iπ(i)} > 0 for all i. (Why?) Let P0 be the permutation matrix in P_n which corresponds to π. Let θ = min_{1≤i≤n} d_{iπ(i)}. Two cases arise. If θ = 1, the matrix D is precisely equal to the permutation matrix P0, and the result is true. If θ < 1, let B =
1
(1 _ (})(D - (}Po).
The following features emerge from the definition of the matrix B. (1) B is doubly stochastic. (2) n(B) < n(D). (3) D = (1 - (})B + (}Po. By the induction hypothesis, B is a convex combination of permutation matrices. Hence D is a convex combination of permutation matrices. This completes the proof. REMARK. It is clear that the cardinality of the set P n of all permutation matrices of order n x n is n! Given D in D n , a natural question one can ask is whether one requires all members of P n in the representation of D in terms of members of P n in the form given above. The answer is no. In fact, a crude upper bound for the number of members of P n required in the representation of D above is n 2 • (Why? Each number of P n can be viewed as a member of Rn2.) But a good upper bound is given by n 2 - 2n + 2.
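The constructive proof of P 9.3.7 is itself an algorithm: repeatedly find a permutation with positive entries (guaranteed to exist because per(D) > 0), peel off the largest admissible multiple of the corresponding permutation matrix, and recurse on the residual. The sketch below is our own translation of that idea into NumPy/SciPy; the use of scipy.optimize.linear_sum_assignment to locate a permutation supported on positive entries is an implementation choice, not something used in the text.

```python
# Sketch: write a doubly stochastic matrix D as a convex combination of
# permutation matrices, following the constructive proof of P 9.3.7.
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_decomposition(D, tol=1e-12):
    R = np.asarray(D, dtype=float).copy()   # residual; row/column sums shrink together
    n = R.shape[0]
    weights, perms = [], []
    while R.sum() / n > tol:                # remaining total weight per row
        # a zero-cost assignment selects a permutation with all entries positive in R
        rows, cols = linear_sum_assignment(np.where(R > tol, 0.0, 1.0))
        theta = R[rows, cols].min()         # largest multiple of this permutation we can remove
        P = np.zeros((n, n))
        P[rows, cols] = 1.0
        weights.append(theta)
        perms.append(P)
        R = R - theta * P                   # at least one more entry of the residual becomes zero
    return weights, perms

D = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
w, Ps = birkhoff_decomposition(D)
print(sum(w), np.allclose(sum(wi * Pi for wi, Pi in zip(w, Ps)), D))   # 1.0 True
```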
Majorization and doubly stochastic matrices are intimately related. One interesting feature of majorization is that if a vector y is a permutation of a vector x, then one can traverse from x to y through a series of intermediate vectors x
= v(O) «
V(l)
« ... «
v(m)
= y,
where any two consecutive vectors differ only in two coordinates. One merely flips two coordinates of a vector to move on to the next vector in moving from x to y. In the following result we demonstrate that this can be achieved in the environment of majorization. P 9.3.8
Let x, y E Rn. The following are equivalent.
(1) x«y.
316
MATRIX ALGEBRA THEORY AND APPLICATIONS
(2) There is a finite number of vectors u(O), U(l), .. . , u(m) in Rn such that x = u(m) « u(m-l) « .. . « u(O) = y, and for all k , u(k) and U(k+l) differ only in two coordinates. (3) x = Dy for some doubly stochastic matrix. We show that (1) => (2). We use the induction argwnent on the dimension n. If n = 1, there is nothing to show. Assume that the implication is true for all vectors of order m x 1 with m less than n. Let x, Y E Rn. We can assume, without loss of generality, that the components of each of the vectors x = (Xl> X2, ... ,Xn )' and Y = (YI, Y2, . .. , Yn)' are arranged in the decreasing order of magnitude. (Why?) We move from Y to x successively the way outlined in (2). We produce, first, u(l) . Note that Yn :::; Xl :::; YI . One can find k such that Yk-l :::; Xl :::; Yk . Write Xl = tYI + (1 - t)Yk for some 0 :::; t :::; 1. Let PROOF.
u(1)
= (Xl> Y2, ...
, Yk-l,
(1 -
t)YI
+ tYk, Yk+l, .. . ,Yn)'.
The vectors U(l) and u(O) could possibly differ in the first and k-th coordinates. We will show that u(1) « u(O). Towards this end, we will show, first, that (Xl, (1 - t)YI + tYk)' is equal to (9.3.5) It is clear that Xl +(l-t)YI +tYk = YI +Yk and both Xl and (l-t)YI +tYk are less than or equal to YI. Consequently, (9.3.5) follows. (Why?) If we adjoin the vector (Xl> (1 - t)YI + tYk) to (Y2, Y3, ... ,Yk-l, Yk+l, ... ,Yn) and the vector (Yl> Yk) to (Y2 , Y3, ... ,Yk-l> Yk+l, .. . ,Yn), it now follows that u(l) « u(O). See Complement 9.1.1. The next critical step is to demonstrate that
p
«
= (X2' X3, . . .
(Y2, Y3, . .. ,Yk-l,
(1 - t)YI
,X n )'
+ tYk, Yk+l, . ..
,Yn)'
= q. (9.3.6)
By design, we have
(1) Y2 ~ Y3 ~ ... ~ Yk-l ~ Xl ~ X2 ~ . . . , (2) (1 - t)YI + tYk ~ Yk+l ~ Yk+2 · .. ~ Yn . The number (1 - t)YI + tYk is intermediate to the numbers Y2 and Yk+l. The following inequalities are clear.
(1)
X2
+ X3 + ... + Xr
:::; Y2
+ Y3 + ... + Yr
for any 2 :::; r :::; k - 1.
Majorization
317
(2) (Y2 +Y3 + ... +Yk-d + (1- t)Yl +tYk + (Yk+l +Yk+2 + ... +Yr) = (Yl + Y2 + ... + Yr) - Xl ?: (Xl + X2 + .. . + Xr ) - Xl = X2
+ X3 + ... + Xr for any
k ::; r ::; n.
In (2), equality occurs when r = n. No matter where the number (1 - t)Yl + tYk occurs in the chain Y2 ?: Y3 ?: ... Yk-l ?: Yk+t, (9.3.6) follows. (Why?) Let us invoke the induction hypothesis. There are vectors w(l), w(2), ... ,w(m) in Rn-l such that p
= w(m) «
w(m-l)
« ... «
w(2)
«
w(l)
= q.
Let u(k) = (~1») for k = 1,2, ... , m and u(O) = y. These vectors meet all the requirements stipulated in (2). We now proceed with the implication (2) ::} (3). In the chain of vectors, we note that u(k+l) is obtainable from u(k) by pre-multiplying u(k) by a suitable doubly stochastic matrix. More precisely, u(k+I) = (tJ +(I-t)Pk)u(k), where I is the identity matrix and Pk is some suitable permutation matrix which swaps two coordinates only. Stringing all these doubly stochastic matrices in a multiplicative way, we obtain a doubly stochastic matrix D such that X = Dy. Finally, we tackle the implication (3) ::} (1). Let X = Dy for some doubly stochastic matrix D = (d ij ). Assume, without loss of generality, that the components of X = (Xl, X2, ... , Xn)' are in decreasing order of magnitude, Le., Xl ?: X2 ?: ... ?: Xn . Likewise, assume that the components of Y = (yt, Y2, .. . , Yn) are in decreasing order of magnitude. (If not, let PI and P2 be the permutation matrices so that the components of each of the vectors PIX and P2Y are in decreasing order of magnitude. The equation X = Dy can be rewritten as PIX = (Pl )D(P2 )-1(P2Y). Note that (Pt}D(P2)-1 is doubly stochastic.) Note that for any 1 ::; k ::; n, k
k
k
n
k
LXi - LYi = LLdijYj - LYi i=l
i=l
i=l j=l n
i=l
= LtjYj - LYi j=l
n
k
+ Yk(k -
Ltj)
i=l
j=l n
k
= L(Yj - Yk)(tj - 1) j=1
+
L j=k+l
tj(Yj - Yk) ::; 0,
318
MATRIX ALGEBRA THEORY AND APPLICATIONS
where 0 ~
tj
k
L d ij
=
~
1. These numbers are non-negative and sum
i=l
up to k. This completes the proof. We would like to identify Schur-convex functions among convex functions. The following result flows in this direction. P 9.3.9 If a function J from Rm to R n is convex and symmetric, then it is Schur-convex. If, in addition, J is monotonically increasing, then J is strongly Schur-convex. PROOF. Let x, y E R m be such that x «y. By P 9.3.8, there exists
a doubly stochastic matrix D such that x = Dy. By P 9.3.7, we can write D = LApP, a convex combination of permutation matrices PEP",
with 0 ~ Ap and
L
AP = 1. Since J is convex,
PEP",
J(x) = J(
L
APPy) ~
PEP",
L
ApJ(Py) .
PEP",
J is symmetric, for each PEP m, there exists QpEPn such that J(Py) = QpJ(y) for all y E Rm. Consequently,
Since
J(x) ~
L
ApQpJ(y) .
PEP",
Let Do
= L
PEP",
ApQp . Clearly, Do is a doubly stochastic matrix of
order n x n. From the proof of P 9.3.8, it is clear that J(x) ~e DoJ(y) implying that J(x) «w J(y). Suppose, in addition, J is monotonically increasing. Let x, y E Rm be such that x «w y. There exists z in Rm such that x ~ z« y . See Complement 9.2.2. By what we have assumed about J, we have
J(x) ~e J(z) «w J(y), ~ J(x) «w J(z) «w J(y) and J(x) «w J(y). This shows that J is strongly Schur-convex. One can jot down a number of consequences of P 9.3.9.
Majorization COROLLARY
319
9.3.10. Let I be a real-valued function of a real variable
and
If I is convex, then Is is Schur-convex. If, in addition, I is monotonically increasing, then Is is strongly Schur-convex. PROOF. It is clear that Is is convex. As for the symmetry of Is, note that for any permutation II of {I, 2, ... ,n}, I(xn) = (J(x))n for all xERn.
Let us introduce some additional operations on vectors. For any vector x ERn, let u define x+ = (max{xt, O}, max{x2' O}, ... ,max{xn, O})" and Ixl = (lxII, IX21,··· ,Ixn !). 9.3.1l. (a) If x «y, then Ixl «w Iyl. (b) If x« y, then x+ «w y+. (c) If x« y, then (xi,x~, ... ,x~) «w (Yi,y~, ... ,y;).
COROLLARY
For (a), take I(t) = Itl, t E R, in Corollary 9.3.10. For (b), take I(t) = max{t,O}, t E R, in Corollary 9.3.10. For (c), take PROOF.
I(t) = t 2 , t E R. Complements
9.3.1
Let A = (aij) be an orthogonal matrix with real entries, i.e.,
A' A = I. Show that D = (a~j) is a doubly stochastic matrix.
Let x «y. Show that there exists an orthogonal matrix A = (aij) such that x = Dy, where D = (a~j)' 9.3.3 Let A = (aij) be a Hermitian matrix, x = (AI, A2, ... ,An)', the vector of eigenvalues of A, y = (all, a22, ... ,ann), and A = U Diag {AI, A2,'" ,An}U*, the spectral decomposition of A, i.e., U = (Uij) is a unitary matrix. Let D = (luijI2). Show that y = Dx. Hence, or otherwise, show that y « X.
9.3.2
n
9.3.4
Let l(xttX2, ... ,Xn) = LXilogxi; XttX2, ... ,Xn > O. Show i=l
that I is Schur-convex on the set (R+)n. 9.3.5 Let A be a square matrix of order n X n and I, J subsets of {I, 2, ... ,n}. If #1 + #J = nand A(I, J) = 0, show that pereA) = per(A(I, JC)) x per(A(Ic, J)).
320
MATRIX ALGEBRA THEORY AND APPLICATIONS
9.3.6
Let I(XI, X2, .. . ,xn )
=
~
n
L (Xi-x)2 = variance of the numbers i=l
Xl!
X2, ... ,Xn , X
=
(XI, X2, ... ,xn ) ERn, where x
n
= *- L
Xi· Show
i=l
that I is Schur-convex. 9.3.7 Let I be a real-valued. function defined. on Rn whose first-order partial derivatives exist. Let I(i) = 8~i I, i = 1,2, ... ,n. Show that I is Schur-convex if and only if the following hold.
(1) I is symmetric. (2) (Xi - Xj)(J(i) (x) - l(j)(x))
~
0 for all X and i,j. 9.3.8 The k-th elementary symmetric polynomial, for any fixed. 1 k ~ n, is defined by
~
where the summation is taken over alII ~ i l < i2 < ... < i k ~ n. Show that -Sk is Schur-convex over (R+)n. 9.3.9 Let the components of the vectors x' = (Xl! X2, ... ,xn ) and y' = (yI, Y2,· .. ,Yn) be non-negative. If X « y, show that
n Xi n Yi· n
i=l
9.3.10
n
~
i=l
Let A = (aij) be a positive definite matrix of order nXn. Show n
that det(A) ~
n aii. (Use Complements 9.3.3 and 9.3.9.) i=l
Note: For a historical and comprehensive treatment of majorization, Marshall and Olkin (1979) is a good source. A substantial portion of this chapter is inspired. by the work of Ando (1982).
CHAPTER 10 INEQUALITIES FOR EIGENVALUES Eigenvalues and singular values of a matrix playa dominant role in a variety of problems in Statistics, especially in Multivariate Analysis. In this chapter we will present a nucleus of results on the eigenvalues of a square matrix. Corresponding results for the singular values will also be presented. To begin with, we make some general remarks. Let A be a matrix of order n x n with complex numbers as entries. Let AI! A2,· .. ,An be the eigenvalues of A. In order to indicate the dependence of the eigenvalues on the underlying matrix A, the eigenvalues of A are usually denoted by Al (A), A2(A), ... ,An(A). The eigenvalues of A can be obtained as the roots of the determinantal polynomial equation,
IA -
AInl = 0,
in A of degree n. Let O"~ 2 O"~ 2 . . . 2 0"; (2 0) be the eigenvalues of the Hermitian matrix A * A, or equivalently of AA *. The numbers 0"1 2 0'2 2 ... 2 O'n(2 0) are called the singular values of A. More precisely, the singular values are denoted by 0'1 (A) 2 0'2(A) ~ ... ~ O"n(A). Now, let A be a matrix of order m x n. If m i= n, the concept of eigenvalues of A makes no sense. However, singular values of A can be defined in a legitimate manner as the square roots of the eigenvalues of AA * or of A * A. We usually write the singular values of a matrix in decreasing order of magnitude. For a square matrix, some of the eigenvalues could be complex. But if the matrix is Hermitian, the eigenvalues are always real. In such a case, the eigenvalues are written in decreasing order of magnitude. Thus two sets of numbers, namely the set of eigenvalues and the set of singular values, can be defined for a square matrix. The relationship between these two sets will be pinpointed later in this chapter. 321
322
MATRIX ALGEBRA THEORY AND APPLICATIONS
All the results in this chapter are cast in the environment of the vector space en with its usual inner product < ., . > given by n
< x,Y >= LXdii = y*x, i=l
for x' = (Xl, X2, ... ,xn ) and y' = (Yl, Y2, ... ,Yn) E en. Two vectors X and yare orthogonal (x.ly in notation) if < x, Y > = 0 or y*x = o. 10.1. Monotonicity Theorem Let A and B be two Hermitian matrices of the same order n x n, and C = A+B. In this section, some inequalities connecting the eigenvalues of the three matrices A, B, and C will be presented. P 10.1.1
(Monotonicity Theorem for Eigenvalues) Let
be the eigenvalues of the matrices A, B, and C, respectively. Then
(1)
>
1'1
1'2
(2)
}
>
(3)
}
> ?,
01
(n)
+ f3n
02
:.~n-l
On
+ f3l
}
>
1'n
>
323
Inequalities for Eigenvalues
PROOF. The inequalities on the left quoted above can be written succinctly as follows: for each i = 1,2, . . . , n, OJ +f3i-j+l ~ Ii for j
= 1,2, . ..
,i; i
= 1,2, ... ,no
(10.1.1)
The inequalities quoted above on the right can be written as
+ f3n-j+i, j = i,i + 1, ... ,n; i = 1,2, ... ,no
Ii ~ OJ
(10.1.2)
Using a very simple argument, we will show that the inequalities (10.1.2) follow from (10.1.1). Let Ul,U2, ... ,Un; Vl,V2,· .. ,Vn; Wl,W2,.·. ,Wn
be the corresponding orthonormal eigenvectors of A, B, and C, respectively. Fix 1 :S: i :S: nand 1 :S: j :S: i. Let SI = span{Uj,Uj+l, ... ,un}, S2 = span{ Vi-j+l, Vi-j+2, .. . , v n }, S3 = span{ WIt W2, ... , Wi}. Note that dim(St} = n - j + 1, dim(S2) = n - i Using the dimensional identity (see P 1.5.7),
+ j,
and dim(S3)
= i.
+ dim(S2 n S3) - dim(St + (S2 n S3)) = dim(Sl) + dim(S2) + dim(S3) - dim(S2 + S3) - dim(Sl + (S2 n S3)) ~ dim(Sl) + dim(S2) + dim(S3) - n - n
dim(SI n S2 n S3) = dim(St}
=n-j+l+n-i+j+i-n-n=1. Consequently, there exists a vector x in Sl n S2 n S3 such that x*x = 1. Since x E SI, we can write x = ajuj + ... + an Un for some complex numbers aj,aj+It· . . ,an . The property that x*x = 1 implies that E;=j lar l2 = 1. Further, n
x* Ax
=
n
n
n
n
(2: ilru;)A(L arur ) = (2: ilru;)(L arorur ) = L r=j
r=j
r=j
r=j
r=j
lar l2 or .
324
MATRIX ALGEBRA THEORY AND APPLICATIONS
We have used. the fact that ur's are orthonormal eigenvectors. Since the eigenvalues ai's are written in decreasing order of magnitude, it now follows that aj 2: x* Ax 2: an. In our argument we need only the inequality aj 2: x* Ax. In a similar vein, one can show that f3i-j+1 2: x* Bx 2: f3n, II 2: x*Cx 2: Ii, ~ aj
+ f3i-j+1 2: x* Ax + x* Bx =
x*Cx 2: Ii.
Thus (10.1.1) follows. The validity of the inequalities on the right hand side of the Monotonicity Theorem can be effected as follows. Note that
(-A)
+ (-B) = -C,
and the eigenvalues of (-A), (-B) and (-C) are respectively
By what we have proved earlier, it follows that for every i = 1,2, ... , n and j = i, i + 1, ... , n,
from which we have Ii
2: aj + f3n-U-i).
(The map T: {1,2, ... ,n} ---? {n,n-l, ... ,I} involved here is given by T(r) = n - r + 1, r E {I, 2, ... , n}.) This completes the proof. The plethora of inequalities stated in P 10.1.1 can be recast utilizing the majorization concept. Let ,x(C) = (,1" ... "n);,x(A) = (a1, ... ,an), and ,x(B) = (f31, ... ,f3n).
It follows from P 10.1.1 that ,x(C) « (,x(A) + ,x(B)). We would like to venture into the realm of singular values. The question we pop is what is the relationship between the singular values of A, B, and the sum A + B. We need a trick. Let A E Mm nand q = min{m,n}. Let 0"1 2: 0"22: ••• 2: O"q{2: 0) be the singular vaiues of A. Construct a new matrix by
Inequalities for Eigenvalues
325
A*] o . Note that the matrix Hermitian.
P 10.1.2
A is
of order (m
1m - nl
x (m
+ n).
The eigenvalues of the Hermitian matrix
0"1 2: 0"2 2: ... 2: O"q 2: 0 = (with
+ n)
... = 0 2:
Further,
A is
A are:
-lTq 2: -lTq-l 2: ... 2: -lTl
zeros in the middle).
PROOF. Using a singular value decomposition of A, we will obtain a spectral decomposition of A. Assume, for simplicity, that m ~ n. Let A = Pt:::.Q be a singular value decomposition of A, where P and Q are unitary matrices of appropriate order and t:::. = (DIO) with D = diag{lTb lT2, ... ,lTm} and 0 is the zero matrix of order m x (n - m). Partition Q as: Q = (~~), where Ql is of order mXn and Q2 of order (n-m) Xn. The fact that Q*Q = In implies that QiQl + Q 2Q2 = In. The singular value decomposition of A can be rewritten as: A = P DQl. Construct a matrix
/J2]
P* P*/J2
o
.
Note that U is a matrix of order (m + n) x (m + n). More interestingly, U is indeed a unitary matrix. Further,
o -D
o
Thus we have a spectral decomposition of the Hermitian matrix A. The diagonal entries of the matrix ~n the middle of the representation above are indeed the eigenvalues of A. This completes the proof. The above result will be instrumental in establishing a monotonicity theorem for singular values.
326
MATRIX ALGEBRA THEORY AND APPLICATIONS
P 10.1.3 (Monotonicity Theorem for Singular Values). Let
be the singular values of the matrices A, B, and C, respectively, where A and B are each of order m x n, C = A + B, and q = min {m, n}. Then we have the following results:
(1)
Iq
~
!"'+iJ,
a2.~./3q-l a q + /31
( "I +!i,-I (2)
Iq-l ~
a2 + /3q-2 a q-l
+ /31
( "I +!i,-2 Iq-2~
(3)
a2 + /3q-3 aq_~
(q)
II
<
al
+ /31
+ /31
PROOF. A proof of this result can be obtained by a combination of
P 10.1.1 and P 10.1.2. For each of the matrices A, Band C, construct A, Band C as outlined in P 10.1.2. Observe that C: = ...1+ B. The eigenvalues of A, B, and C written in decreasing order of magnitude are: al 2: ... 2: a q 2: 0 =
... = 0 2: -aq 2: ... 2:
-a1,
/31 2: ... 2:/3q 2: 0 = ... = 02: -/3q 2: ... 2: -/31, II 2: ... 2: Iq 2: 0 = ... = 0 2: -,q 2: ... 2: -,1, respectively. Working carefully with the first q eigenvalues of using P 10.1.1, one can establish the inequalities
C and
327
Inequalities for Eigenvalues
'Yi ::; OJ + {3i-j+t. j
= 1,2, ...
, i; i
= 1,2, ...
, q,
on the right-hand side of the list presented in the theorem. The contents of P 10.1.3 can be recast in terms of weak majorization idea. By P 10.1.2, we have, working with the matrices A, B, and C, ("'11, "'12, ... , 'Yq, 0, 0, ... ,0, -'Yq, -'Yq-l, ... , -'Yd
«(01,02, ... ,Oq,O,O, ... ,O,-Oq,-Oq_l ....
,-od
+({3t.{32, ... ,{3q,O,O, ... ,O,-{3q,-{3q-l, ... ,-{31)' From this, we obtain
The proof that we have presented for the Monotonicity Theorem for the eigenvalues of Hermitian matrices is due to Ikebe, Inagaki, and Miyamoto (1987). Complements 10.1.1 Let A, Band C be three matrices of the same order m x n with C = A + B. Note that O"i(A + B) ::; O"i(A) + O"i(B) for each i is not true. Hint: Look at the example:
A=
[~
~]
and
B=
[~
10.1.2 Let A and B be Hermitian matrices of the same order n x n of which B is non-negative definite. Show that Ai (A) ::; Ai(A + B), i = 1,2, ... , n. If A and B are Hermitian matrices, then we have Weyl's Tbeorem
and for each j
Aj(A + B) ::; Ai(A)
for i ::; j,
Aj(A + B)
for i 2: j
= 1, . ..
,n
+ Aj-i+l(B) 2: Ai(A) + Aj-i+n(B)
328
MATRIX ALGEBRA THEORY AND APPLICATIONS
10.2. Interlace Theorems The focus in this section is to spell out the connection between the eigenvalues of a matrix and those of its submatrices. Let A be a Hermitian matrix of order m x m partitioned as
where B is of order n x n for some n < m. It is clear that B is also a Hermitian matrix. P 10.2.1 (Interlace Theorem for Eigenvalues) Let
be the eigenvalues of A and B, respectively. Then
In particular, if m
= n + 1,
then
PROOF. Let U1,U2, ... ,Urn and Vl,V2, ..• ,Vn, be the corresponding orthonormal eigenvectors of A and B, respectively. First, we show that Ok 2': {3k for each k = 1, 2, ... , n. For each i = 1, 2, ... , n, introduce the augmented vector
of order m X 1. Fix 1 :s; k :s; n. Let 8 1 = span{ uk, Uk+1, .. . ,Urn} and 8 2 = span{ WI, W2, •• · , wd. Both are subspaces of Rrn. Note that dim(8 1 ) = m - k + 1 and dim(8 2) = k. Further,
+ dim(82) - dim(8 1 + 82) k + 1) + k - m = 1.
dim(8 1 n 82) = dim(8J)
2': (m -
Consequently, there exists a vector x in 8 1 n8 2 such that x"'x = 1. Since x E 8}, it follows that Ok 2': x'" Ax. We have used this kind of argument
Inequ.alities for Eigenvalu.es
329
in Section 10.1. Since x E S2, we can write x = al WI some unique scalars aI, a2, ... ,ak. FUrther, we have k
1 = x*x =
for
k
L
lail 2vi v i =
i=l
L lail 2, i=l
k
ak
+ ... + akWk,
k
~x*Ax= ~laiI2(viIO)(g. ~)(~) = ~lail2viBVi k
~.8k
L lail 2 = .8k. i=l
To establish the other bunch of inequalities in (10.2.1), look at the matrix
-A =
[=~*
-C]
-D .
The eigenvalues of - A and - Bare
-am
~
-am-l 2: ...
~
-al and
-.8n 2:
-.8n-l 2: ...
~
-.81,
respectively. By what we have established above, we have now
from which the desired inequalities follow. This completes the proof. If one looks at the inequalities (10.2.1) a little closely, the following pattern emerges. The eigenvalues a's that figure on the left-hand side of the inequalities are the first n eigenvalues of A and the eigenvalues a's that figure on the right-hand side of the inequalities are the last n eigenvalues of A. Our next goal is to obtain an analogue of the Interlace Theorem for the singular values of a matrix. The trick employed in P 10.1.2 is also useful here.
P 10.2.2 (Interlace Theorem for Singular Values) Let A E Mm,n be partitioned as
MATRIX ALGEBRA THEORY AND APPLICATIONS
330
where B is a matrix of order p x q for some p ::; m and q ::; n. Let ITt
~
1T2
~
•.•
~ ITrk 0) and
~ 1'2 ~ ... ~ rs(~ 0)
rt
be the singular values of A and B, respectively, where r = min{ m, n} and s = min{p, q}. Then ri ::; lTi, PROOF.
i
= 1,2, ...
,s.
Observe that s ::; r. Let
- [0A
A*]
A=
0
-
and B =
Note that B is a Hermitian submatrix of in P 10.1.2, the eigenvalues of A are:
[0B
A.
oB* ] .
(Why?) As has been noted
and that of Bare r}
~
...
~
rs
~
0
= ... = 0 ~
-rs
~
...
~
-rIo
By P 10.2.1, by comparing the first s eigenvalues of A and the desired inequalities.
B, we obtain
The contents of P 10.2.2 are disappointing. The result does not have as much pep as the one presented in P 10.2.1. It is the fault of the technique employed in the proof of P 10.2.2. We will take up the problem of obtaining a good analogue of Theorem 10.2.1 a little later in the complements for singular values.
Complements
10.2.1 Let A E Mm,n' Assume m ~ n. Let B be the matrix obtained from A by deleting one of the columns of A. Let
be the singular values of A and B, respectively. Show that
Inequalities for Eigenvalues
331
What happens if B is obtained from A by deleting a row of A? Hint: Assume, without loss of generality, that it is the last column that is deleted from A. Note that A = (Bib), where b is the last column of A. Look at A* A and B* B.
10.2.2 Let A E Mm ,n. Assume m < n. Let B be the matrix obtained from A by deleting one column of A. Let
be the singular values of A and B, respectively. Show that
A similar result holds if a row of A is deleted. Some useful pointers emerge from Complements 10.2.1 and 10.2.2 for further generalizations. Let us recast the conclusions of the complements in a uniform manner. Before this, let us adopt the following terminology and conventions. For any matrix A E Mm,n, let
be the singular values of A. Let us extend the range of the subscripts of the singular values of A by defining
O"j(A) = 0 for j > T. The recasting is carried out as follows. Let A E Mm,n be given. Let Al be the submatrix obtained from A by deleting either one row of A or one column of A. Then
Suppose A2 is a matrix obtained from A by deleting either two rows of A or two columns of A or one row and one column of A. We would like to explore the connection between the singular values of A and A 2 . Suppose A2 is obtained from A by deleting two columns, say jl-th and
332
MATRIX ALGEBRA THEORY AND APPLICATIONS
h-th, of A. Let Al be the matrix obtained from A by deleting the jrth column of A. It is clear that the matrix A2 can be obtained from Al by deleting an appropriate column of AI. The relationship between the singular values of Al and those of A2 is clear. More precisely,
O"i(A I ) 2: O"i(A 2) 2:
O"i+I (Ad,
i
= 1,2, ...
,min{ m, n - I}.
These inequalities can be stretched a little further. We can say that
O"i(A I ) 2: O"i(A 2) 2: O"i+I(Ad, i = 1,2, ... ,{m, n}. We do have a good grip on the connection between the singular values of Al and those of A. Finally, we can jot down that
The argument can be pushed further. We have now an analogue of Theorem 10.2.1 for singular values. 10.2.3 Let A E Mm,n' Let Ar be any matrix obtained from A by deleting a total of r rows and columns. Show that
10.3. Courant-Fischer Theorem The Monotonicity Theorem examines relationship between the singular values of a sum of two matrices and those of its constituents. The Interlace Theorem explores the connection between the singular values of a matrix and those of a submatrix. It is time to be a little introspective. The Courant-Fischer Theorem characterizes the singular values via optimization.
P 10.3.1 (Courant-Fischer Theorem for Eigenvalues) Let A E Mn be a Hermitian matrix with eigenvalues
Then for each 1 :::; k :::; n, o'k
=
=
max
min
see" ,di m (S }=k xES ,x' x= I
min
(x* Ax)
max
Tee" ,dim(T}=n-k+I xET,x' x=l
(x* Ax).
(10.3.1) (10.3.2)
Inequalities for Eigenvalues
333
In particular, 0t
= xECn,x'x=l max x" Ax,
On
= xECn,x'x=l min x " Ax.
(The entities Sand Tare subspaces of en.) PROOF . Let U1, U2, . .. ,un be the orthonormal eigenvectors corresponding to the eigenvalues, 01, ... ,On, of A. Fix 1 ::; k ::; n. Let
Clearly, dim(Sl) = n - k Note that
en.
+ 1.
Let S be any k-dimensional subspace of
dim(S n St} = dim(S)
+ dim(Sl) -
dim(S
+ Sl)
2:k+n-k+l-n=1. Choose any y in sns 1 such that y*y = 1. Following by now the familiar argument, since y E Sl, we have Ok 2: y" Ay. Consequently, since YES, Ok
2:
min
xES ,x' x=l
x" Ax.
This inequality is valid for any k-dimensional subspace S of en. Therefore, Ok 2: max min (x" Ax). secn ,dirn(S)=k
xES ,x' x=l
We need to show now the reverse inequality. Let S=span{Ut,U2 , . ..
,ud.
Note that S is a k-dimensional subspace of x .. x = 1, then
en.
Further, if xES and
(Why?) It is clear that max
secn ,dim(S)=k
min xES ,x' x=l
(x* Ax) >
min xES ,x' x=l
x* Ax >
Ok .
334
MATRIX ALGEBRA THEORY AND APPLICATIONS
Thus the identity (10.3.1) follows. As for the second identity, let
It is clear that dim(Sl) = k. Let T be any (n - k + I)-dimensional subspace of en. It is clear that dim(SI n T) 2': 1. Choose any vector y in SI n T such that y*y = 1. Since y E Sl, y* Ay 2': Cik. Since yET, max
xET,x·x=l
(x* Ax)
(10.3.3)
2': Cik.
Let us recast this observation in the following way: for every (n - k + 1)dimensional subspace T of en, (10.3.3) holds. Consequently, min
TCCn,dim(T)=n-k+1
(x* Ax) 2':
max
xET,x· x=l
Cik .
To establish the reverse inequality, take
FUrther, if x E Cik
2':
and x*x = 1, then x" Ax ~
'1',
!llax
xET ,x.x=l
(x* Ax)
2':
Cik .
Hence
min
max
TCCn,dim(T)=n-k+1 xET,x·x=l
(x* Ax).
This establishes the identity (10.3.2). This completes the proof. There is another angle at which one can look at the Fischer-Courant theorem. Let A E Mn be a Hermitian matrix. Let Ci1 2': Ci2 2': ... 2': Ci n be the eigenvalues of A with the corresponding orthogonal eigenvectors Ul, U2, ... ,Un. Define a function p : en - {O} ~ R by u*Au p(u) = - - , u E u*u
en,
U:f: O.
The function p(.) is called the Raleigh quotient and has the following properties. (1)
(2)
Ci1
2': p(u) 2': min
uEC",u;tO
Cin,U E en,U:f:
p(u)
= Ci n
o.
with the minimum attaining at u
= u n·
Inequalities for Eigenvalues
(3)
max
uEC",u;i:O
p(u)
= 01
335
with the maximum attaining at
U
= Ul .
(4) The function p(.) is stationary at, and only at, the eigenvectors of A, i.e., for example,
8p(u)
~ IU=Ui = 0
for each i.
(5) Let 1 ~ k ~ n be fixed. Let Sk = span{uk, Uk+b
..• ,
un}. Then
and the maximum is attained at U = Uk . (6) LetTk =span{u1,u2, ... ,ud. Then
and the minimum is attained at
= Uk .
U
Look at the Courant-Fischer theorem after going through the above properties of the Raleigh quotient. The properties of the Raleigh quotient involve both the eigenvalues and the chosen eigenvectors. The Courant-Fischer theorem characterizes the eigenvalues of A without involving eigenvectors. An analogue of P 10.3.1 for the singular values of a matrix is not hard to obtain. In the formulation of the relevant result, an expression like x* Ax does not make sense for non-square matrices. We need to find an appropriate analogue for such an expression.
P 10.3.2 (Courant-Fischer Theorem for Singular Values) Let A E Mrn ,n and be its singular values. Then for each k
O'k(A) =
min
~
max
1,
(x*(A* A)X)1/2
(10.3.4)
SCC",dim{S)=n-k+1 xES,x' x=1
=
max
min
TCC" ,dim{T)=k xET,x' x=1
(x*(A* A)x)1/2.
(10.3.5)
336
MATRIX ALGEBRA THEORY AND APPLICATIONS
PROOF. If a subspace is not available with a specified dimension, we take the min-max and max-min numbers of (10.3.4) and (10.3.5) to be equal to zero. The result follows from the corresponding CourantFischer theorem for the eigenvalues of a matrix if we keep in mind the fact that the eigenvalues of A * A are:
(7~(A) ~ (7~(A) ~ ...
One can obtain the Interlace Theorem from the Courant-Fischer Theorem. First, we deal with the eigenvalues of a square matrix. COROLLARY 10.3.3. (Interlace Theorem) Let A E Mn be Hermitian and 01 ~ 02 ~ . .. ~ On its eigenvalues. Let B be a submatrix of A obtained by deleting some n - r rows of A and the corresponding columns of A. Let 131 ~ 132 ~ ... ~ f3r be the eigenvalues of B. Then Ok ~ 13k ~ 0k+n-r,
k
= 1,2, ...
,r.
PROOF. Note that $B$ is a Hermitian matrix of order $r \times r$. Assume, without loss of generality, that $B$ was obtained from $A$ by deleting the last $n-r$ rows and last $n-r$ columns of $A$. For $1 \leq k \leq r$, by the Courant-Fischer theorem,
$$\alpha_k = \min_{T \subset \mathbb{C}^n,\, \dim(T) = n-k+1}\; \max_{x \in T,\, x^*x = 1} (x^*Ax)
\;\geq\; \min_{T \subset \mathbb{C}^n,\, \dim(T) = n-k+1}\; \max_{z \in T,\, z^*z = 1} (z^*Az)
\;\geq\; \min_{T \subset \mathbb{C}^r,\, \dim(T) = r-k+1}\; \max_{y \in T,\, y^*y = 1} (y^*By) = \beta_k,$$
where $z$ in the middle expression is a vector with the last $n-r$ components equal to zero. The last step requires some deft handling; it is left as an exercise to the reader. On the other hand, for each $1 \leq k \leq r$, by the Courant-Fischer theorem,
$$\alpha_{k+n-r} = \max_{S \subset \mathbb{C}^n,\, \dim(S) = k+n-r}\; \min_{x \in S,\, x^*x = 1} (x^*Ax)
\;\leq\; \max_{S \subset \mathbb{C}^n,\, \dim(S) = k+n-r}\; \min_{z \in S,\, z^*z = 1} (z^*Az)
\;\leq\; \max_{S \subset \mathbb{C}^r,\, \dim(S) = k}\; \min_{y \in S,\, y^*y = 1} (y^*By) = \beta_k.$$
This completes the proof.
Complements
10.3.1 Let $A, B \in M_n$ be such that $A$ is Hermitian and $B$ is non-negative definite. Let
$$\alpha_1 \geq \alpha_2 \geq \cdots \geq \alpha_n \quad \text{and} \quad \beta_1 \geq \beta_2 \geq \cdots \geq \beta_n$$
be the eigenvalues of $A$ and $A + B$, respectively. Using the Courant-Fischer theorem, show that $\alpha_k \leq \beta_k$, $k = 1, 2, \ldots, n$. (These inequalities can also be obtained using the Monotonicity Theorem.)
10.3.2 Obtain the Interlace Theorem for the singular values of a matrix from the corresponding Courant-Fischer theorem.
10.3.3 (Sturmian Separation Theorem) Let $A_r$ be the submatrix obtained by deleting the last $n-r$ rows and columns of a Hermitian matrix $A$, $r = 1, \ldots, n$. Then $\lambda_{k+1}(A_{i+1}) \leq \lambda_k(A_i) \leq \lambda_k(A_{i+1})$.
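As a quick numerical sanity check of Corollary 10.3.3 (my own illustration, assuming NumPy; not part of the text), the sketch below generates a random Hermitian matrix, deletes its last rows and columns, and verifies the interlacing inequalities $\alpha_k \geq \beta_k \geq \alpha_{k+n-r}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 8, 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2             # Hermitian A of order n x n
B = A[:r, :r]                        # delete the last n - r rows and columns

alpha = np.sort(np.linalg.eigvalsh(A))[::-1]   # alpha_1 >= ... >= alpha_n
beta = np.sort(np.linalg.eigvalsh(B))[::-1]    # beta_1 >= ... >= beta_r

# Interlace Theorem: alpha_k >= beta_k >= alpha_{k+n-r}, k = 1, ..., r.
for k in range(1, r + 1):
    assert alpha[k - 1] + 1e-10 >= beta[k - 1] >= alpha[k + n - r - 1] - 1e-10
```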
10.4. Poincare Separation Theorem
The Monotonicity Theorem, Interlace Theorem and Courant-Fischer Theorem form a triumvirate of results on the eigenvalues and the singular values of matrices. The Monotonicity Theorem compares the eigenvalues of two Hermitian matrices $A$ and $B$ with those of their sum. The Interlace Theorem compares the eigenvalues of a Hermitian matrix and its principal submatrices. The Courant-Fischer Theorem characterizes the eigenvalues of a Hermitian matrix. Any one of these results is deducible from any one of the other results. Some of these implications have already been alluded to earlier. The Poincare Separation Theorem, which is the subject of discussion in this section, also falls into the same genre.
P 10.4.1 (Poincare Separation Theorem for Eigenvalues) Let $A \in M_n$ be a Hermitian matrix with eigenvalues $\alpha_1 \geq \alpha_2 \geq \cdots \geq \alpha_n$. Let $B$ be any matrix of order $n \times k$ such that $B^*B = I_k$, i.e., the columns of $B$ constitute a set of orthonormal vectors. Let $\beta_1 \geq \beta_2 \geq \cdots \geq \beta_k$ be the eigenvalues of the matrix $B^*AB$. Then $\alpha_i \geq \beta_i \geq \alpha_{i+n-k}$, $i = 1, 2, \ldots, k$.
PROOF. Note that $B^*AB$ is Hermitian. This result can be deduced from the Interlace Theorem. Let $B = (u_1, u_2, \ldots, u_k)$, where $u_i$ is the
$i$-th column of $B$. Determine orthonormal vectors $u_{k+1}, u_{k+2}, \ldots, u_n$ such that $V = (u_1, u_2, \ldots, u_n)$ is unitary. Observe that the matrices $V^*AV$ and $A$ have the same set of eigenvalues. Further, $B^*AB$ is a principal submatrix of $V^*AV$ obtained by deleting the last $n-k$ rows and columns. Now the Interlace Theorem takes over. The result follows.
Now the question arises as to when $\alpha_i = \beta_i$ for $i = 1, \ldots, k$ in the Poincare Separation Theorem. Let $u_1, u_2, \ldots, u_n$ be the corresponding orthonormal eigenvectors of $A$. Let $V = (u_1, \ldots, u_k)$ and take $B = VT$ for some unitary matrix $T$ of order $k \times k$. Then $\alpha_i = \beta_i$, $i = 1, 2, \ldots, k$. Let us see what happens to $B^*AB$. Since $V^*AV = \mathrm{Diag}(\alpha_1, \ldots, \alpha_k)$,
$$B^*AB = T^*V^*AVT = T^*\,\mathrm{Diag}(\alpha_1, \alpha_2, \ldots, \alpha_k)\,T.$$
Thus the eigenvalues of $B^*AB$ are precisely $\alpha_1, \alpha_2, \ldots, \alpha_k$. This establishes that $\alpha_i = \beta_i$ for $i = 1, 2, \ldots, k$. One might ask in a similar vein when $\beta_i = \alpha_{i+n-k}$ holds for $i = 1, 2, \ldots, k$. This is left to the reader as an exercise. We now need an analogue of the Poincare separation theorem for the singular values of a matrix. The following result, which is easy to prove, covers such a contingency.
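Before stating that analogue, here is a small numerical illustration of P 10.4.1 (my own sketch, assuming NumPy; not part of the text): a random $n \times k$ matrix $B$ with orthonormal columns is obtained from a QR factorization, and the eigenvalues of $B^*AB$ are checked to separate those of $A$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 7, 3
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2

# B with orthonormal columns: B*B = I_k (QR of a random n x k matrix).
B, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))

alpha = np.sort(np.linalg.eigvalsh(A))[::-1]                   # alpha_1 >= ... >= alpha_n
beta = np.sort(np.linalg.eigvalsh(B.conj().T @ A @ B))[::-1]   # beta_1 >= ... >= beta_k

# Poincare separation: alpha_i >= beta_i >= alpha_{i+n-k}, i = 1, ..., k.
for i in range(1, k + 1):
    assert alpha[i - 1] + 1e-10 >= beta[i - 1] >= alpha[i + n - k - 1] - 1e-10
```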
P 10.4.2 (Poincare Separation Theorem for Singular Values) Let $A \in M_{m,n}$ with singular values
$$\sigma_1(A) \geq \sigma_2(A) \geq \cdots.$$
Let $U$ and $V$ be two matrices of order $m \times p$ and $n \times q$, respectively, such that $U^*U = I_p$ and $V^*V = I_q$. Let $B = U^*AV$ with singular values
$$\sigma_1(B) \geq \sigma_2(B) \geq \cdots.$$
Then, with $r = (m-p) + (n-q)$,
$$\sigma_i(A) \geq \sigma_i(B) \geq \sigma_{i+r}(A), \quad i = 1, 2, \ldots, \min\{m, n\}.$$
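A numerical illustration of P 10.4.2 (my own sketch, assuming NumPy; not part of the text): with $U$ and $V$ having orthonormal columns, the singular values of $B = U^*AV$ are separated by those of $A$, with singular values beyond the available range read as zero.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, p, q = 6, 7, 4, 5
A = rng.standard_normal((m, n))
U, _ = np.linalg.qr(rng.standard_normal((m, p)))   # U*U = I_p
V, _ = np.linalg.qr(rng.standard_normal((n, q)))   # V*V = I_q
B = U.T @ A @ V
r = (m - p) + (n - q)

sA = np.linalg.svd(A, compute_uv=False)            # length min(m, n)
sB = np.linalg.svd(B, compute_uv=False)            # length min(p, q)

# sigma_i(A) >= sigma_i(B) >= sigma_{i+r}(A).
for i in range(1, min(p, q) + 1):
    upper = sA[i - 1]
    lower = sA[i + r - 1] if i + r <= min(m, n) else 0.0
    assert upper + 1e-10 >= sB[i - 1] >= lower - 1e-10
```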
Complements
10.4.1 Let $A, B \in M_{m,n}$, $E = B - A$ and $q = \min\{m, n\}$. If $\sigma_1 \geq \cdots \geq \sigma_q$ are the singular values of $A$, $\tau_1 \geq \cdots \geq \tau_q$ are the singular values of $B$, and $\delta$ is the spectral norm of $E$, then (1) $|\sigma_i - \tau_i| \leq \delta$ for all $i = 1, \ldots, q$; and (2) $(\sigma_1 - \tau_1)^2 + \cdots + (\sigma_q - \tau_q)^2 \leq \|E\|_F^2$.
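These two perturbation bounds (of Weyl and Mirsky type) are easy to probe numerically. The sketch below is my own illustration, assuming NumPy; it is not part of the text.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 7
A = rng.standard_normal((m, n))
B = A + 0.1 * rng.standard_normal((m, n))
E = B - A

sigma = np.linalg.svd(A, compute_uv=False)   # sigma_1 >= ... >= sigma_q
tau = np.linalg.svd(B, compute_uv=False)     # tau_1 >= ... >= tau_q
delta = np.linalg.norm(E, 2)                 # spectral norm of E

# (1) |sigma_i - tau_i| <= delta for all i.
assert np.all(np.abs(sigma - tau) <= delta + 1e-12)

# (2) sum_i (sigma_i - tau_i)^2 <= ||E||_F^2.
assert np.sum((sigma - tau) ** 2) <= np.linalg.norm(E, 'fro') ** 2 + 1e-12
```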
Inequalities for Eigenvalues
339
10.5. Singular Values and Eigenvalues
For a square matrix, one can determine its singular values as well as its eigenvalues. If the matrix is Hermitian, a precise relationship exists between its singular values and its eigenvalues. In this section, we establish a set of inequalities connecting the singular values and eigenvalues of a general matrix.
P 10.5.1 Let $A \in M_n$. Let $\alpha_1, \alpha_2, \ldots, \alpha_n$ be the eigenvalues of $A$ arranged in such a way that
$$|\alpha_1| \geq |\alpha_2| \geq \cdots \geq |\alpha_n|.$$
Let
$$\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n$$
be the singular values of $A$. Then
$$\prod_{j=1}^{i} |\alpha_j| \leq \prod_{j=1}^{i} \sigma_j, \quad i = 1, 2, \ldots, n,$$
with equality for $i = n$.
PROOF. By the Schur triangularization theorem, there exist unitary matrices $U$ and $V$, each of order $n \times n$, such that $U^*AV = \Delta$, where $\Delta$ is an upper triangular matrix with diagonal entries $\alpha_1, \alpha_2, \ldots, \alpha_n$. Fix $1 \leq i \leq n$. Write $U = (U_1 | U_2)$ and $V = (V_1 | V_2)$, where $U_1$ and $V_1$ are of order $n \times i$. Note that
$$U^*AV = \begin{pmatrix} U_1^* \\ U_2^* \end{pmatrix} A\,(V_1 \mid V_2) = \begin{pmatrix} U_1^*AV_1 & U_1^*AV_2 \\ U_2^*AV_1 & U_2^*AV_2 \end{pmatrix} = \Delta = \begin{pmatrix} \Delta_i & * \\ 0 & * \end{pmatrix}, \text{ say.}$$
It is now clear that $U_1^*AV_1 = \Delta_i$ is upper triangular with diagonal entries $\alpha_1, \alpha_2, \ldots, \alpha_i$. It is indeed a submatrix of $U^*AV$. By the Interlace Theorem, $\sigma_j(U_1^*AV_1) \leq \sigma_j(U^*AV)$, $j = 1, 2, \ldots, i$.
Let us compute the singular values of $U^*AV$. For this we need to find the eigenvalues of $(U^*AV)^*(U^*AV) = V^*A^*AV$. The eigenvalues of $A^*A$ and $V^*A^*AV$ are identical. Consequently, $\sigma_j(U^*AV) = \sigma_j(A) = \sigma_j$, $j = 1, 2, \ldots, n$. Finally,
$$|\det(U_1^*AV_1)| = |\det(\Delta_i)| = \Big|\prod_{j=1}^{i} \alpha_j\Big| = \prod_{j=1}^{i} \sigma_j(U_1^*AV_1) \leq \prod_{j=1}^{i} \sigma_j.$$
When $i = n$, equality holds. (See Complement 10.5.1.) This completes the proof.
The multiplicative inequalities presented in P 10.5.1 for the eigenvalues and singular values of a square matrix have an additive analogue. More precisely, if none of the eigenvalues and singular values is zero, then
$$(\ln|\alpha_1|, \ln|\alpha_2|, \ldots, \ln|\alpha_n|) \prec (\ln\sigma_1, \ln\sigma_2, \ldots, \ln\sigma_n),$$
where $\ln$ denotes the natural logarithm.
Complements
10.5.1 Let $A \in M_n$ and $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n\ (\geq 0)$ its singular values. Show that $|\det(A)| = \sigma_1\sigma_2\cdots\sigma_n$.
10.6. Products of Matrices, Singular Values and Horn's Theorem
One important operation on matrices is the operation of multiplication. In this section, we will present some results on the connection between the singular values of the product of two matrices and the singular values of its constituent matrices.
P 10.6.1 (Horn's Theorem) Let $A \in M_{m,n}$ and $B \in M_{n,p}$. Let $q = \min\{m,n,p\}$, $r = \min\{m,n\}$, $s = \min\{n,p\}$ and $t = \min\{m,p\}$. Let
$$\sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq \sigma_r(A)\ (\geq 0),$$
$$\sigma_1(B) \geq \sigma_2(B) \geq \cdots \geq \sigma_s(B)\ (\geq 0),$$
$$\sigma_1(AB) \geq \sigma_2(AB) \geq \cdots \geq \sigma_t(AB)\ (\geq 0)$$
be the singular values of $A$, $B$, and $AB$ respectively. Then
$$\prod_{j=1}^{i} \sigma_j(AB) \leq \prod_{j=1}^{i} \sigma_j(A)\sigma_j(B), \quad i = 1, 2, \ldots, q. \tag{10.6.1}$$
If A and B are square matrices of the same order, i.e., m = n = p, then equality holds in (10.6.1) for i = n. PROOF.
The proof is outlined in a series of steps.
1. Let $AB = P\Delta Q^*$ be a singular value decomposition of $AB$ for some unitary matrices $P$ and $Q$ of orders $m \times m$ and $p \times p$, respectively. The $(k,k)$-th entry of the matrix $\Delta$ is equal to $\sigma_k(AB)$, $k = 1, 2, \ldots, t$, and the rest of the entries of $\Delta$ are zeros.
2. Fix $1 \leq i \leq q$. Write $P = (P_1 | P_2)$ and $Q = (Q_1 | Q_2)$, where $P_1$ and $Q_1$ are of orders $m \times i$ and $p \times i$, respectively. Observe that $P_1^*(AB)Q_1$ is an $i \times i$ principal submatrix of $P^*(AB)Q$. Therefore, $P_1^*(AB)Q_1 = \mathrm{diag}\{\sigma_1(AB), \sigma_2(AB), \ldots, \sigma_i(AB)\}$, and
$$\det(P_1^* AB Q_1) = \prod_{j=1}^{i} \sigma_j(AB).$$
3. We want to focus on the matrix $BQ_1$, which is of order $n \times i$. By the polar decomposition theorem, we can find two matrices $X$ of order $n \times i$ and $W$ of order $i \times i$ such that $X$ has orthonormal columns, $W$ is non-negative definite, and $BQ_1 = XW$. Note that $W^2 = (BQ_1)^*(BQ_1) = Q_1^* B^* B Q_1$. Hence $\det(W^2)$ = the product of the squares of the eigenvalues of $W$ = the product of the eigenvalues of $Q_1^* B^* B Q_1$ = $\prod_{j=1}^{i} \sigma_j(Q_1^* B^* B Q_1)$.
(The singular values and eigenvalues are the same for a positive semidefinite matrix.) By the Poincare separation theorem, $\sigma_j(Q_1^* B^* B Q_1) \leq \sigma_j(B^*B) = \sigma_j^2(B)$, so that
$$\det(W) \leq \prod_{j=1}^{i} \sigma_j(B).$$
4. Let us focus on the matrix $P_1^* A X$. This is a square matrix of order $i \times i$. By Complement 10.5.1 and the Poincare Separation Theorem,
$$|\det(P_1^* A X)| = \prod_{j=1}^{i} \sigma_j(P_1^* A X) \leq \prod_{j=1}^{i} \sigma_j(A).$$
5. Combining all the steps we have carried out so far, we have
$$\prod_{j=1}^{i} \sigma_j(AB) = \det(P_1^* AB Q_1) = |\det(P_1^* AB Q_1)| = |\det((P_1^* A X)W)| = |\det(P_1^* A X)|\,|\det(W)| \leq \prod_{j=1}^{i} \sigma_j(A)\sigma_j(B).$$
6. If $m = n = p$, then
$$\prod_{j=1}^{n} \sigma_j(AB) = |\det(AB)| = |\det(A)|\,|\det(B)| = \prod_{j=1}^{n} \sigma_j(A)\sigma_j(B).$$
This completes the proof.
The multiplicative inequalities presented in P 10.6.1 have an additive analogue too. Within the same scenario, we have
$$\sum_{j=1}^{i} \ln \sigma_j(AB) \leq \sum_{j=1}^{i} \ln\big(\sigma_j(A)\sigma_j(B)\big), \quad i = 1, 2, \ldots, q.$$
If $A$ and $B$ are square matrices of the same order $n \times n$, then
$$(\ln\sigma_1(AB), \ln\sigma_2(AB), \ldots, \ln\sigma_n(AB)) \prec \big(\ln(\sigma_1(A)\sigma_1(B)), \ln(\sigma_2(A)\sigma_2(B)), \ldots, \ln(\sigma_n(A)\sigma_n(B))\big).$$
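A quick numerical check of Horn's inequalities (my own sketch, assuming NumPy; not part of the text): the partial products of $\sigma_j(AB)$ are dominated by the partial products of $\sigma_j(A)\sigma_j(B)$.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, p = 5, 6, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))
q = min(m, n, p)

sA = np.linalg.svd(A, compute_uv=False)[:q]
sB = np.linalg.svd(B, compute_uv=False)[:q]
sAB = np.linalg.svd(A @ B, compute_uv=False)[:q]

# Horn's theorem: prod_{j<=i} sigma_j(AB) <= prod_{j<=i} sigma_j(A) sigma_j(B).
assert np.all(np.cumprod(sAB) <= np.cumprod(sA * sB) * (1 + 1e-10))
```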
10.7. Von Neumann's Theorem
For a square matrix, the trace of the matrix and the sum of all eigenvalues of the matrix are the same. Suppose we have two matrices $A$ and $B$ such that $AB$ is a square matrix. The trace of the matrix $AB$ is easy to compute. How is the trace related to the singular values of the individual matrices $A$ and $B$? Von Neumann's Theorem is an answer to this question. But first, we need to prepare the reader for von Neumann's Theorem (von Neumann (1937)).
P 10.7.1 Let $A \in M_{m,n}$. Then $\mathrm{tr}(AX) = 0$ for every matrix $X$ of order $n \times m$ if and only if $A = 0$.
A stronger result than P 10.7.1 is as follows.
P 10.7.2 Let $A \in M_n$. Then $\mathrm{tr}(AX) = 0$ for all Hermitian matrices $X$ if and only if $A = 0$.
PROOF.
Note that any matrix $X$ of order $m \times m$ can be written as
$$X = X_1 + iX_2$$
with both $X_1$ and $X_2$ Hermitian. More precisely, one can take
$$X_1 = \tfrac{1}{2}(X + X^*) \quad \text{and} \quad X_2 = \tfrac{1}{2i}(X - X^*).$$
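A two-line numerical confirmation of this Cartesian decomposition (my own sketch, assuming NumPy; not part of the text):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
X1 = (X + X.conj().T) / 2           # Hermitian part
X2 = (X - X.conj().T) / (2j)        # also Hermitian
assert np.allclose(X1, X1.conj().T) and np.allclose(X2, X2.conj().T)
assert np.allclose(X, X1 + 1j * X2)
```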
P 10.7.3 Let A E Mn. Then tr(AX) is real for all Hermitian matrices X if and only if A is Hermitian.
PROOF. Suppose $A = (a_{ij})$ and $X = (x_{ij})$ are Hermitian. Observe that
$$\mathrm{tr}(AX) = \sum_{i=1}^{m} \sum_{j=1}^{m} a_{ij} x_{ji}.$$
Since $A$ and $X$ are Hermitian, $a_{ii}$ and $x_{ii}$ are real for each $i$. Consequently, $a_{ii}x_{ii}$ is real for each $i$. Let $i \neq j$. Write $a_{ij} = a + ib$ and $x_{ji} = c + id$, where $a, b, c$ and $d$ are real numbers. Then
$$a_{ij}x_{ji} + a_{ji}x_{ij} = a_{ij}x_{ji} + \overline{a_{ij}}\,\overline{x_{ji}} = (a+ib)(c+id) + (a-ib)(c-id) = 2(ac-bd) + i(ad+bc) - i(ad+bc) = 2(ac - bd),$$
which is clearly real. Consequently, $\mathrm{tr}(AX)$ is real. Conversely, suppose $\mathrm{tr}(AX)$ is real for all Hermitian matrices $X$. Then $\mathrm{tr}(AX) = \overline{\mathrm{tr}(AX)} = \mathrm{tr}((AX)^*) = \mathrm{tr}(X^*A^*) = \mathrm{tr}(XA^*) = \mathrm{tr}(A^*X)$
$\Rightarrow \mathrm{tr}((A - A^*)X) = 0$ for all Hermitian matrices $X$. By P 10.7.2, we have $A - A^* = 0$, i.e., $A$ is Hermitian.
P 10.7.4 Let A E Mm be Hermitian. If tr(A) 2: Retr(AU) for all unitary matrices U, then A is non-negative definite. (The symbol Re stands as an abbreviation for "the real of part of.") PROOF. Let us perform a spectral decomposition on
A. Write
where At, A2, ..• ,Am are the eigenvalues of A and U1, U2,'" ,Um the corresponding orthonormal eigenvectors of A. Our goal is to prove that each Ai 2: O. Suppose not. Some eigenvalues are negative. Assume, without loss of generality, that
for some 1
~
r
~
n. Let
+ A2U2U2 + ... + ArUrU;,
B =
A1u1Uj
C=
-Ar+1Ur+1U;+1 - Ar+2Ur+2U;+2 - ... - AmUmU~,
U = UIU1..
+ U2U2.. + .. , + UrU.r -
Ur+l Ur*+l - Ur+2 Ur* +2 - ... - u m u·m'
Check that Band C are non-negative definjte, C =1= 0, A = B - C, U is Hermitian, and U is unitary. Further, AU = B + C. Observe that, by hypothesis,
tr(A) = tr(B) - tr(C) ~ tr(AU) = tr(B)
+ tr(C).
This inequality is possible only if tr( C) = O. Since C is non-negative definite, this is possible only if C = O. This is a contradiction. This completes the proof. Now we take up a certain optimization problem. This has an important bearing on the proof of von Neumann's Theorem. Before we present the optimization result, let us take a detour. Let B E Mm. The problem is to investigate under what condition the series
1m +B+B2
+ ...
converges absolutely, and if it converges, identify the limit. A full discussion was undertaken in Chapter 11. But we are looking for a simple, sufficient, and easily verifiable condition. Let M = maXl::;i,i::;m Ibijl. Observe that
(1) every entry in B is::; M in absolute value; (2) every entry in B2 is ::; mM2 in absolute value; (3) every entry in B3 is ::; m 2 M3 in absolute value, and so on. Consequently, the series 1m + B + B2 + ... converges absolutely if Ek>1 m k - 1Mk converges, or equivalently, if Ek>1 (mM)k converges. The geometric series converges if mM <-1. To answer the second question, if the series converges then the sum is equal to (1m - B)-I. Let us make use of the above discussion. Let X = (Xii) be an arbitrary but fixed Hermitian matrix of order m x m. Let M = max IXiil. Then we can find EO > 0 such that both
1m
+ ifX
and
l::;i,i::;m
1m - iEX
are invertible for every -EO < E < EO. To see this, take B = iEX = (bij) with E real. Note that m~x Ibijl = IEIM. The series 1::;1,J::;m
1m +B+B2 + . . . converges absolutely to (1m - B)-1 = (1m -i€X)-1 if 1€lmM < 1. One can take €o = l/mM. In a similar vein, one can show that (1m + i€X) is invertible if lEI < Eo. Fix -EO < E < EO. Then it transpires that
and further, that V is unitary. Let us calculate
(1m
+ iEX)(lm -
+ iEX)(lm + iEX + (iEX)2 + ... ) 1m + 2iEX + 2(iEX)2 + 2(iEX)3 + .. .
iEX)-1 = (1m =
= 2(1m - iEX)-1 - 1m.
(10.7.1)
Likewise,
Observe that
which shows that V is unitary. We are now ready to establish the desired optimization result. P 10.7.5 Let A E Mm. Let U m be the collection of all unitary matrices of order m x m. Then sup Re tr(AU) UEU",
is attained at some matrix Uo E U m non-negative definite.
.
Further, AUo turns out to be
PROOF. The fact that the supremum is attained at some matrix in U m is not difficult to prove. This is a topological result. But the battle to show that AUo is non-negative definite is hard. We will present the proof in a series of steps.
1. Observe that the set U m is compact when it is viewed as a subset of an appropriate unitary space with the usual topology. Further, the real valued maps tr(AU) and Re tr(AU) as functions of U E U m are continuous. By a standard result in topology that the supremum of a continuous function on a compact is attained, it follows that the desired supremum is attained at some Uo E U m • 2. The next objective is to show that AUo is indeed Hermitian. By P 10.7.3, it suffices to show that tr(AUoX) is real for every Hermitian matrix X. Start with any Hermitian matrix X. Look up the discussion that followed P 10.7.4. There exists to > 0, such that 1m + itX
and
1m - itX
are both nonsingular for every -to < t < to. Further, the matrix
is unitary, where f(t, X) is a series of matrices involving t and X . See (10.7.1). Note that Re tr(AUoV) = Re tr(AUo) + 2tRe[i tr(AUoX)] + t 2 Re tr(AUof(t, X)) .::; Re tr(AUo), by the very definition of Uo, and V is unitary. This implies that
2tRe[itr(AUoX)] for every -to
< t < to.
If 0
+ t 2 Re tr(AUof(t, X))
< t < to,
.::; 0
we indeed have the inequality
2Re[itr(AUoX)] +tRe tr(AUof(t,X))'::;
o.
Taking the limit as t ! 0, we observe that Re[i tr(AUoX)] .::; o. Arguing in a similar vein for -to < t < 0, we can conclude that Re[i tr(AUoX)] ;::: O. Hence Re[i tr(AUoX)] = 0, from which it follows that tr(AUoX) is real. (Why?) This shows that AUo is Hermitian. 3. The final step is to show that AUo is non-negative definite. Note that by the very definition of Uo. Re tr(AUo) ~ Re tr(AUoV) for every
Inequalities for Eigenvalues
347
Hermitian matrix V . By P 10.7.4, AUa is non-negative definite. This completes the proof. This result is a little unsatisfactory. It uses a topological result to show the existence of the optimal unitary matrix Ua. But we do not know what exactly it is. Secondly, we do not know what the supremum of Re tr(AU), U E U fn actually is. We will ameliorate the deficiencies now. P 10.7.6
Let A E Mfn with the singular values
and singular value decomposition A = Pb.Q, where P and Q are unitary matrices and b. = diag{O"l (A), 0"2(A), ... ,00m(A)}. Let U m be the collection of all unitary matrices of order m x m. Then max Re tr(AU) = 2:ai(A),
UEU",
i=l
and the maximum is attained at Ua = Q* P*. (Ua need not be unique.) PROOF. Let U E U fn . Let us compute m
Re tr(AU) = Re tr(Pb.QU) = Re tr(b.QU P) = Re
2: ai(A)[QU P]ii, i=l
where [QU P]ii is the i-th diagonal entry of the unitary matrix QU P. Being a unitary matrix, I[QU P] iil 1 for each i. Consequently,
: =;
IRe tr(AU)1
m
m
i=l
i=l
::=;2: O"i(A)I[QU P]iil ::=;2: O"i(A) .
Let us compute specifically Re tr(AUa)
= Re tr(Pb.QQ* P*) = Re tr(b.QQ* P* P) = Re tr(b.) =
2: O"i(A). i=l
348
MATRIX ALGEBRA THEORY AND APPLICATIONS
Incidentally, AUo = Pjj.P*, which is clearly non-negative definite. The proof is complete. P 10.7.5 and P 10.7.6 solve the same optimization problem. The proof of P 10.7.5 is originally due to von Neumann (1937). His proof is non-constructive but the methods used are fascinating. The proof provided under P 10.7.6 is constructive. Another nugget emerges from P 10.7.5 and P 10.7.6. Suppose VI is a unitary matrix such that AVt is non-negative definite. Then VI maximizes Re tr(AV) over all V E Um! This can be seen as follows. Since AUt is non-negative definite, its eigenvalues and singular values are identical. Consequently, m
tr(AVt} =
L CTi(AVt ). i=t
The singular values of AVt are precisely the positive square roots of the eigenvalues of (AVI)*(AVI) = Ui A* AUt. The eigenvalues of Vi A* AVt are identical to the eigenvalues of A * A. The singular vales of A are the positive square roots of the eigenvalues of A* A. Hence m
m
i=t
i=t
The avowed assertion now follows from P 10.7.6. We now come to the main result of this section. P 10.7.7 (Von Neumann's Theorem) Let A E Mm nand B E Mn,m be such that AB and BA are non-negative definit~. Let p = minim, n}, q = maxim, n}, and
be the singular values of A and B, respectively. Set
CTp+I(A) = CTp+2(A) = . .. = CTq(A) = 0 CTp+1(B) = CTp+2(B) = ... = CTq(B) = O.
Inequalities for Eigenvalues
Then there exists a permutation
T
349
of {I, 2, ... ,q} such that q
tr(AB) = tr(BA) = LCTi(A)CTr(i)(B). i=l
PROOF. The proof is carried out in several steps.
1. We work out a special case. Assume that m = n, A and B are nonnegative definite, and AB = BA. This implies that AB is non-negative definite. (Why?) We can diagonalize both A and B simultaneously. There exists a unitary matrix U such that
A ~1
where that
Qi'S
= U~lU*
= diag{ Q}, Q2,· ..
= U~2U*, ~2 = diag{.B}' .B2,'"
and B
,Qn},
,.Bn},
and .Bi's are the eigenvalues of A and B respectively. Note
n
= tr(~1~2) =
L
Qi.Bi.
i=l
Since the eigenvalues and singular values are identical for a non-negative definite matrix, we can write n
n
LQi.Bi = L CTi(A)CTr(i) (B), i=l
i=l
for some permutation T of {I, 2, ... ,n}. The statement of the theorem is valid for this special case. 2. The strategy for the general case is as follows . Assume, without loss of generality, that m ~ n. Let A and B be given as stipulated in the theorem. We will construct two matrices Ao and Bo each of order n x n with the following properties. a. Ao and Bo are non-negative definite. b. Ao and Bo commute. c. The eigenvalues of Ao are CT1 (A), CT2(A), . .. ,CTm(A), 0, 0, ... ,0 (n - m zeros) and the eigenvalues of Bo are CT1(B), CT2(B), ... , CTm(B), 0, 0, ... ,0 (n - m zeros). d. tr(AB) = tr(AoBo).
350
MATRIX ALGEBRA THEORY AND APPLICATIONS
Now the conclusion of the theorem will follow from Step 1. 3. We want to simplify the hypothesis of the theorem. We would like to assume that m = n and that A and B commute. Let us see how this can be done. Let AI, A2, . .. ,Am be the eigenvalues of the m x m positive definite matrix AB. We claim that the eigenvalues of the n x n matrix BA are: AI, A2, .. ' ,Am, 0, 0, ... ,0 (n - m zeros). Let us prove a general result. Let A be a non-zero eigenvalue of the non-negative definite matrix AB with multiplicity t with the corresponding linearly independent eigenvectors Ul, U2, ... ,Ut. Then A is also an eigenvalue of BA with multiplicity at least t and BUl, BU2, ... ,But are the corresponding linearly independent eigenvectors. In fact, it will come out later that the multiplicity is exactly equal to t. First, check that
We now check that BUl, BU2, ... ,But are linearly independent. Suppose that (h (Bul) + ... + Ot(But) = for some scalars 01 , O2, ... ,Ot. Then we have
°
i=l
t
t
t
= l::0i(ABui)
= l::0i(AUi) = A l::0iUi,
i=l
i=l
i=l
which implies n
l::0iUi
= 0,
i=l
as A =1= 0. Since Ut, U2, ... ,Ut are linearly independent, it follows that 01 = O2 = ... = Ot = 0, from which we have the desired linear independence of But, BU2, ... ,But. In order to establish the claim , let PI, P2, ... ,Pr be the distinct eigenvalues of AB among AI, A2, ... ,Am. If no Pi = 0, the argument will be a little easier. We will take up the more difficult case: one of the eigenvalues of AB is zero. Assume that Pr = 0. We will produce two tables. The first table summarizes all the information about the eigenvalues of AB and the second table deals with the matrix BA, where m = tl + ... + t r .
Inequalities for Eigenvalues
Eigenvalues of AB
Multiplicity
351
Linearly independent eigenvectors
J..LI J..L2
(r-I)
(r-I)
(r-I)
J..Lr-I
t r- I
J..Lr = 0
tr
Eigenvalues of B A
Multiplicity
J..LI
tl
(I) (1) (1) BU I , BU 2 , .•• , BU t1
J..L2
t2
(2) (2) (2) BU I , BU 2 , • .. , BU t2
ul
" " , u tr _ 1
Immaterial Linearly independent eigenvectors
B U (r-I) ,u2 B (r-I) , • . ., B U (r-I) _
J..Lr-1 J..Lr = 0
,u2
I
tr
+ (n -
m)
tr
1
Immaterial
Since Rank(AB)=Rank(BA)=number of non-zero eigenvalues of AB= number of non-zero eigenvalues of BA, the second table follows. Consequently, the eigenvalues of BA are AI, A2, ... ,Am, 0, 0, ... ,0 (n - m zeros). There may be more zeros among A}, A2, . .. ,Am. Since AB is Hermitian, AB = U t::..U*, where t::.. = diag{>.I' A2,'" ,Am} and U is unitary. Also, there exists a unitary matrix V such that BA=V
[~
~]
V*.
Partition V = (VI !V2) where VI is of order n x m. Note that VI has orthonormal columns, i.e., "\.t;*VI = 1m.
352
MATRIX ALGEBRA THEORY AND APPLICATIONS
Now
~
= U* ABU, which gives
BA
= VI~Vt = VIU* ABUVt·
Now define Al = Y* A and A2 = BY with Y Let us list the properties of Al and A2.
= UVt·
a. Each of Al and A2 is of order n x n. b. Al and A2 commute. As a matter of fact,
AIA2
= Y* ABY = BA
and
A2AI
= BYY* A = BA.
c. A}A 2 and A2Al are non-negative definite. d. The singular values of Al are
° °
Thus we are justified in assuming that m = n and that A and B commute in the statement of the theorem. Let us restate the conditions of the theorem unequivocally. Let A and B E Mm. Assume that A and B commute and that AB is non-negative definite. Step 2 needs to be reasserted. We want to find two matrices Ao and Bo each of order m x m with the following properties. a. Ao and Bo are non-negative definite. b. Ao and Bo commute. c. The singular values of Ao are
adding up to m. The information is summarized in the following table.
Inequalities for Eigenvalues
Eigenvalues of AB
Mllltiplicity
353
Orthonormal eigenvectors
Space
spaIilled
J..Ll
J..Lr The subspaces spanned have the following properties. a. Ml ffi M2 ffi ... ffi Mr = em. b. Mi..LMj for every i =f. j. c. If u E Mk for some 1 ::; k ::; r, then Au E Mk. For this, it suffices to show that Au~k) E Mk for every 1 ::; i ::; tk' Note that (AB)(Au~k») = A(ABu~k») = A(l-lkU~k») = J..Lk(Au~k») , implying that Au~k) is an eigenvector of AB corresponding to the eigenvalue J..Lk. Thus Au~k) E Mk. d. If u E Mk for some 1 ::; k ::; 1', then Bu E M k . Let U consist of all the eigenvectors compiled in the table above:
U=(uP), ... ,u~~),u~2), ... ,u~;), ... ,u~r), ... ,ut»). Note that U is a unitary matrix. Let Observe that: Q.
The matrix
A = U* AU
and
f:J
= U* BU.
A is block-diagonal with i-th diagonal block matrix
Ai being of order ti x ti. j3. The matrix f:J is block-diagonal with i-th diagonal block matrix Bi being of order ti x ti. "y. AB = U* AUU* BU = U* ABU is block diagonal with the i-th diagonal matrix ILift., where ft. is a unit matrix of order ti. The properties Q, j3, and , follow from the properties enunciated under a, b, c, and d above. 8. A and f:J commute.
354
MATRIX ALGEBRA THEORY AND APPLICATIONS
AB is non-negative definite.
f.
tp.
Ai and Bi commute for each i. More precisely, AiBi = J-LiIti for
each i. ~.
AiBi is non-negative definite. '17. The singular values of A and A are identical. The singular values of Band B are identical. (. The singular values of A are the same as the singular values of At, A 2 , • .• ,Ar put together. The singular values of B are the same as the singular values of B 1 , B 2 , ••. ,Br put together. For each 1 ::; i ::; such that
T,
if only we can find a permutation 'Ti of {I, 2, . . . ,ti}
ti
tr(AiBi)
= L O"j(Ai)O"Ti(j)(Bi), j=l
our goal will be accomplished. In line with Steps 1 and 2, if we can find two matrices AiO and BiO with the following properties, a. AiO and BiO are non-negative definite, b. AiO and BiO commute, c. the singular values of Ai and AiO are identical, the singular values of Bi and BiO are identical, d. tr(AiBi) = tr(AiOB iO ), our mission will be accomplished. 5. In view of Steps 1,2,3 and 4, the assumptions of the theorem can be simplified as follows. a. m=n. b. A and B commute. c. AB = AIm for some A ~ O.
The goal is now to find two matrices Ao and Bo with the following properties.
Ao and Bo are non-negative definite. Ao and Bo commute. 'Y. The singular values of A and Ao are identical. The singular values of Band Bo are identical. 6. tr(AB) = tr(AoBo). Q.
f3.
Inequalities for Eigenvalues
355
Case 1. >. > O. By P 10.7.5, there exists a unitary matrix U such that AU is non-negative definite. Define Ao = AU and Bo = U" B . The matrices Ao and Bo have the following properties. a.
Ao and Bo are non-negative definite. The case of Ao is clear.
Note that Bo = U"B = U"C>'A-I) = >'U"A-l which it follows that Bo is non-negative definite.
= >'(AU)-l,
from
{3. Ao and Bo commute. Note that AoBo = AB = >.Im = BoAo. "Y. The singular values of A and Ao are identical. The singular values of Band Bo are identical. 6. trCAB) = tr(AoBo). The goal is achieved.
Case 2. >. = O. We can proceed in exactly the same way as in Case 1. We will have the matrices Ao and Bo with the properties a, {3, "Y, and /j but with one exception. We do not know whether or not Bo is non-negative definite. We need to do some more work. Let J.Ll! J.L2, ... ,J.Lr be the distinct eigenvalues of Ao with multiplicities tl! t2, . " ,tr . If none of the eigenvalues of Ao is zero, then Ao is nonsingular and Bo = 0, which is non-negative definite. The desired objective is achieved. Assume that one of the eigenvalues of Ao is zero. Take J.Lr = 0, for example. Let us summarize some of this information in the following table, where m = tl + ... + t r . Eigenvalues of A
J.Lr
Multiplicity
Orthonormal eigenvectors
Space spanned
(r) (r) (r) u 1 ,u2 ,..., u tr
Let u be any eigenvector of A that corresponds to a non-zero eigenvalue J.L of A. We claim that Bu = O. Note that
0= ABu = B(Au)
= B(JLU) =
J.L(Bu).
356
MATRIX ALGEBRA THEORY AND APPLICATIONS
Since J-L i= 0, it follows that Bu = O. Let W be the matrix composed of the orthonormal eigenvectors of A listed above in the order they were written down as columns. It is clear that W is a unitary matrix. Note that by what we have noted down about Bu for eigenvectors u of A that correspond to non-zero eigenvalues of A, W* Ao W is a block diagonal matrix with J-L i It. in the i-th diagonal block, i = 1, . .. ,r, and
for some matrix Eo of order tr x tr and appropriate dimensions for 0 matrices. The matrices W* Ao Wand W* Bo W would have met the requirements of Q, {3, 'Y, and {j under Case 1 but for the matrix Eo, which need not be non-negative definite. By P 10.7.5, there exists a unitary matrix Wo of order tr x tr such that Eo Wo is non-negative definite. Define
Aoo = W* AoW, and
0
v=[r Note that V is unitary. Let
Aoo
It2
0 0
0 0
0
I t"_ l
~J
= W* AoWV and Boo = W* BoWV.
The following properties of Aoo and Boo flow easily now.
Aoo and Boo are non-negative definite. {3. Aoo and Boo commute. In fact, AooBoo = O. 'Y. The singular values of Aoo and Ao are identical. The singular values of Ao and A are identical. The singular values of Boo and Bo are identical. The singular values of Bo and B are identical. Q.
Our mission is successful.
Inequalities for Eigenvalues
357
As a consequence of von Neumann's theorem, one can solve another optimization problem. The details follow. Let A E Mrn,n and B E Mn,rn.
P 10.7.8
(1)
Then sup
Re tr(AU BV)
UEUn,VEU",
is attained at some unitary matrices U (2)
= Uo
E Un and V
= Vo
E Urn.
further, p
max
UEUn,VEU",
Re tr(AU BV) =
L O"i(A)O"i(B), i=l
where p
= min{m, n}.
PROOF. One can offer a topological proof of (1). One needs to observe that the spaces of unitary matrices are all compact. We will prove (2). Let Uo and Vo be a solution to the problem in (1). Since
Re tr«AUoB)Vo) ~ Re tr«AUoB)V) for all unitary matrices V, it follows that AUoBVo is non-negative definite. See P 10.7.5. In a similar vein, since Re tr(AUoBVo) = Re tr«BVoA)Uo) ~ Re tr«BVoA)U) for all unitary matrices U, it follows that BVoAUo is non-negative definite. Let us focus on the matrices AUo and BVo. The two matrices (AUo)(BVo) and (BVo)(AUo) are non-negative definite. By von Neumann's theorem, there exists a permutation T of {I, 2, ... ,p} such that max
UEUn,VEU",
Re tr(AU BV)
= Re tr(AUoBVo) = tr(AUoBVo) p
=
L O"i(AUo)O"T(i) (BVo) i=l p
=
L O"i(A)O"T(i)(B). i=l
The proof will be complete if we can show that the identity permutation.
T
can be taken to be
358
MATRIX ALGEBRA THEORY AND APPLICATIONS
Assume, without loss of generality, that m = n. In Step 2 of the proof of von Neumann's theorem, for the matrices AVo and BUo we can find two matrices Ao and Bo each of order m X m with the following properties.
(1) (2) (3) values
Ao and Bo are non-negative definite. Ao and Bo commute. The singular values of A and Ao are identical. The singular of Band Bo are identical.
Since Ao and Bo commute, we can find a unitary matrix W such that
W" AoW = diag{O'I (A), 0'2(A), ... 'O'rrI(A)}, W" BoW = diag{O'1'(I) (B), 0'1'(2) (B), ... 'O'1'(m)(B)}, for some permutation matrix T of {1,2, ... ,m}. Let Wi be the i-th column vector of W. Each Wi is a common eigenvector for both Ao and Bo for the eigenvalues O'i(Ao) and O'1'(i) (Bo), respectively. Let Wo be a unitary matrix with the following properties.
(Exercise: Construct Wo explicitly.) Let us determine the eigenvalues of AoWO' BoWo. Observe that
AoWO'BoWOWI
= AoWO'Bow1'-l(I) = AoWO' (0'1 (BO)W1'-l(I)) = 0'1 (Bo)AoWI = 0'1 (BO)'O'I(Ao)WI'
Thus O'I(Ao)O'I(Bo) is an eigenvalue of AoBo with an eigenvector WI. Likewise, it follows that the eigenvalues of AoBo are: O'i(Ao)O'i(Bo), i = 1,2, ... ,m. Thus m
L O'i(Ao)O'i(Bo) = tr(AoWO' BoWo) i=I
~
=
max
UEU""VEU",
Re tr(AoV BoV)
m
m
i=I
i=I
L O'i(Ao)O'1'(i) (Bo) ~ L O'i(Ao)ai(Bo).
Inequalities for Eigenvalues
359
Consequently,
This completes the proof.
Complements 10.7.1 Let A E Mn have singular values O'n ::; ... ::; 0'1. Denote by ri the Euclidean norm of the i-th row of A, i = 1, ... , n, and by Rl ::; R2 ::; .. . ::; Rn the ordered values of rio Show that k
k
i=l
i=l
L O'~-i+l ::; L R; for k = ~,2, ... , n. Write down a similar upper bound for norms of columns. 10.7.2 (Majorization Theorem) Let A E Mn be Hermitian. The vector of diagonal entries of A majorizes the vector of eigenvalues of A. [Hint: The theorem can be proved by induction by assuming that the result is valid for Hermitian matrices of dimension k for all k ::; n - 1 and using the Interlace Theorem (P 10.2.1). 10.7.3 Let A and B E Mn be Hermitian matrices with ordered eigenvalues >\} 2': ... 2': An and /-Ll 2': ... 2': /-Ln· Further let VI 2': ... 2': Vn be the eigenvalues of A-B. Show that the vector (AI - ILl, ... , An - /-Ln) is majorized by the vector (VI, ... , vn ). 10.7.4 (Weyl's Theorem) Let A, BE Mn be Hermitian and pCB), the rank of B, be less than or equal to 1' . Let the eigenvalues Ai(A), Ai (B) and Ai(A + B) be arranged in decreasing order. Then (1) Ai(A + B) ::; Ai-r(A) ::; Ai-2r(A + B), i = 2r + 1, ... , n. (2) Ai(A) ::; Ai-r(A + B) ::; Ai-2r(A), i = 21' + 1, ... , n . (3) If A = UAU· with U = (ull· . . Iu n ) E Mn unitary and A = diag(Ab'" , An) with Al 2': ... 2': An and if
360
MATRIX ALGEBRA THEORY AND APPLICATIONS
10.7.5 (Analogue of Weyl's result of Complement 10.1.2 for singular values). Let A,B E Mm,n, q = min{m,n} and ai denote singular values. Show that, using 10.7.4,
In particular,
+ B) ::; al (A) + al (B) aq(A + B) ::; min{aq(A) + al(B),
al (A
10.7.6
al(A)
+ aq(B)}.
Let A, BE Mm,n and q = min{m, n} . Show that
10.7.7 Let A E Mn be a Hermitian matrix with non-negative eigenvalues >'1 ~ ... ~ >'n ~ O. Show that for each r = 1, ... ,n, the product >'n>'n-t ... >'n- r+ t is less than or equal to the product of the r smallest main diagonal entries of A.
Note: References to material in this Chapter are: Horn and Johnson (1985, 1991), Rao (1973c) and von Neumann (1937).
CHAPTER 11 MATRIX APPROXIMATIONS
The basic line of inquiry in this chapter proceeds along the following lines. Let A be a given collection of matrices each of order m x n. Let B be a given matrix of order m x n. Determine a matrix A in A such that A is closest to B. The notion of one matrix being too close to another can be sanctified with the introduction of the notion of a norm on the spaces of matrices. In this chapter, we will introduce the notion of a norm on a general vector space and establish a variety of matrix approximation theorems.
11.1. Norm on a Vector Space Let V be a vector space either over the field C of complex numbers or the field R of real numbers. The notion of a norm on the vector space V is at the heart of many a development in this chapter. In what follows, we will assume that the underlying field is the field C of complex numbers. If R is the underlying field of the vector space, we simply replace C by R in all the deliberations with obvious modifications. In Chapter 2, we have already introduced the concept of a norm arising out of an inner product. The definition of norm introduced here is more general which need not be generated by an inner product. DEFINITION 11.1.1 . A map II . II from V to R is said to be a vector norm on V if it has the following properties.
(1) IIxll 2: 0 for all x in V and IIxll = 0 if and only if x = O. (2) II ax II = lalllxll for all x in V and a E C. (3) IIx + yll ::; IIxll + Ilyll for all x and yin V. The pair (V, 11 · 11) is called a normed vector space. Using the norm II . II on the vector space V, one can define a distance function d(·, .) 361
362
MATRIX ALGEBRA THEORY AND APPLICATIONS
between any two vectors of V. More precisely,
d(x,y) = I\x -
ylI,
X,y E V.
In view of the definition of the distance d(·, .), one can interpret IIxli as the distance between the vectors x and 0, or simply, the length of x. Property (3) is often called the triangle inequality satisfied by the norm and it implies that d(x, y) :S d(x, z) + d(z, y) for any three vectors x, y, and z in the vector space V. Some examples are in order. EXAMPLE 11.1.2. Take V = en. Let x = (XI, ... following maps are all norms on the vector space V:
(1) (2) (3) (4)
,xn) E V. The
IIxll oo = max1
The norms presented above are all geometrically different. Let us amplify this statement. Let 11·11 be any given norm on a vector space V over the field e of complex numbers. The unit ball 0 in V is defined by
O={xEV:llxll:Sl}. The following exercise is designed to make one realize how the norms presented above are all geometrically different. EXERCISE 11.1.3. Take V =
R2. View V as a vector space over
R. Graph unit balls in each of the normed vector spaces: (V, II ·1100),
(V, II· 111)' (V, II ·112), (V, 11·111.5),
and
(V, II· Ib)·
There are many more norms one can introduce on the vector space V = en. For example, let A = (aij) be any positive definite matrix of order n X n with complex entries. Let x' = (Xl, ... ,xn ) E V. Define n
n
IIxliA = (L:L:aijXiXj)1/2 = (x* AX)1/2.
i=l j=l
Then II . II A is a norm on the vector space conjugate of the complex number x.)
V.
(Recall that
x
is the
Matrix Approximations
363
There are other ways to generate new norms. One can add two norms to get a new norm. One can multiply a norm by a fixed positive number to get a new norm. The examples of norms given for the vector space en can be transferred to any vector space V over the field e of complex numbers. Let { VI, V2, . .. ,vn } be a basis of the vector space V. Let V E V. One can write v = Xl VI + X2 V2 + ... + XnV n ' with XI! X2, . .. ,Xn in e. Further, the above representation is unique. Let II· II be any norm on en. Define (abuse of notation) IIvll = IIxlI, where x' = (x}, X2, .. . ,x n ) E en. Complements 11.1.1 Take V Why is the map
=
II
en. Fix 0 < p < 1. Let x' = (X},X2, .. . ,x n ) E V. . lip defined below not a norm on V n
IIxlip =
(2: IXiIP)I/P. i=l
Let V = R2. Graph the unit ball in the real normed space . 1100 + II . lid·
11.1.2
(V, II
11.2. Norm on Spaces of Matrices Let m and n be two fixed positive integers. Let Mm,n be the collection of all matrices A = (aij) of order m x n with complex entries ai/so It is clear that Mm,n is a vector space with the usual operations of addition of matrices and scalar multiplication of matrices over the field e of complex numbers. As a matter of fact, the space Mm,n can be identified with the vector space e mn by arranging the entries of each matrix A in Mm,n as an mn-tuple in some order. The following are some of the natural norms one can introduce on the vector space Mm,n analogous to those introduced in Example 11.1.2. EXAMPLE
11.2.1. Let A
= (aij)
E
Mm,n.
II A 1100 = maxl,:Si,:Sm,l,:Sj,:Sn laijl (Loo - norm); (2) IIAIIF = (2:~1 2:j=llaijI2)1/2 (Frobenius norm); (3) IIAlip = (2:~1 2:j=I laijIP)I/p (Lp - norm with p ~ 1 fixed). Let us now concentrate on the vector space Mn. (We take m = n.) (For simplicity, we abbreviate Mn,n as M n .) The vector space Mn (1)
364
MATRIX ALGEBRA THEORY AND APPLICATIONS
has an additional algebraic property: if A, B E M n , then the product AB E Mn. It is only natural to expect a norm on Mn to respect the product operation of matrices.
Mn is said to be a matrix norm for all A and B in Mn-
DEFINITION 11.2.2. A norm 11·11 on
on Mn if IIABII :::;
IIAIIIIBIl
From the definition of a matrix norm, the following are transparent: (1) IIfnll ~ 1 (In is the identity matrix of order n x n). (2) IIAkll :::; (IIAII)k for any A in Mn and positive integer k. (3) IIA-III ~ (IIAII)-I for any nonsingular matrix A in Mn. Why do we need to introduce the matrix norm? Is the plain norm not good enough to measure the magnitude of matrices? The clue lies in Property (2) spelled out above. The limiting behavior of powers of a matrix, say A, is the subject of many an inquiry in applied mathematics and statistics. Suppose for some suitable matrix norm II . II we discover that IIAII < 1. By property (2), it transpires that Ak converges to zero as k ~ 00. Are there not other ways of measuring the magnitude of matrices? Surely, eigenvalues must have a say on the magnitude of matrices. Let A}, A2, .. . ,An be the eigenvalues of a matrix A. DEFINITION 11.2.3. The spectral radius of
A is defined to be the
quantity
One could propose the spectral radius as a measure of the magnitude of matrices. But it does not satisfy any of the properties of a norm.
(1) Ps(A) could be equal to zero without A being equal to zero. For example, look at the matrix
(2) It is quite possible that Ps(A + B) > Ps(A) + Ps(B) (failure of the triangle inequality). For example, look at the matrices ,
A
=
[~ ~]
and B
=
[~ ~].
Matrix Approximations
(3) It is quite possible that Ps(AB) look at the matrices,
365
> Ps(A)Ps(B).
For example,
May be the eigenvalues are not a good choice to develop a norm. The singular values are the ones we need to use to define the norm or magnitude of matrices. We will pursue this task a little later. However, we will not abandon the spectral radius. It has some uses. There is a wonderful relationship between the spectral radius of a matrix and its matrix norm. This is stated below. P 11.2.4
Let A E Mn and
11·11
any matrix norm on Mn. Then
PROOF. By the very definition of the spectral radius, there exists a scalar >. and a non-zero column vector x such that Ax = >.x and p.!(A) = 1>.1. Let B be the matrix of order n x n such that every column of B is the same vector x. Observe that AB = >'B. Since II . II is a matrix norm,
IIAIIIIBIl from which it follows
IIABII = II>.BII = I>.IIIBII, that Ps(A) ~ IIAII. ~
A matrix norm is useful in investigating the existence of an inverse of a matrix. First, we need to examine the precise relationship between the individual entries of a matrix and its matrix norm. Then we need a criterion for the convergence of a matrix series. Let Iij be the matrix of order n x n such that the (i,j)-th entry of Iij is unity and the rest of the entries are all zeros. Let II ·11 be a matrix norm on Mn. Let Let A = (aij) E Mn. Then one can verify that IijA1ij andj. P 11.2.5
laijl ~ BIIAII
Let A E Mn and for all i and j.
II . II
= aijlij
for all i
a matrix norm on Mn. Then
366 PROOF.
MATRIX ALGEBRA THEORY AND APPLICATIONS
Note that
laijlllIijll = IlaijIijll = IIIijAIijll ::; II I ijIl2I1AII, => laijl ::; II I ijllllAIl ::; OIlAII· P 11.2.6 Let A E M n , 11·11 a matrix nonn on M n , and {ad, k ~ 0 a sequence of scalars. Then the series Lk>O akAk converges if the series Lk~O laklllAlik of real numbers converge;' (By convention, AO = In.) PROOF.
Let Ak = (a~~») for each k ~
o.
By P 11.2.5,
foralliandj,andk~ 1. SincellAkll::; IIAllk,theseriesLk~olaklla~7)1 of real numbers converges. Hence Lk~O aka~7) converges. Now we can settle the existence of inverse of a matrix.
P 11.2.7 Let A E Mn and 11·11 a matrix norm on Mn. If IIIn -All < 1, then A-I exists and is given by the series
A-I = :l)In - A)k. k~O
PROOF. From P 11.2.6, note that the series Lk>o(In - A)k converges. Let N be any positive integer. Then -
N
N
A(2:(In - A)k) = (In - (In - A)) 2:(In - A)k k=O k=O = In - (In - A)N+I. Since IIIn - All < 1, (In - A)N+I converges to 0 as N quently, A(2:(In - A)k) = In, k~O
from which the desired result follows.
--+ 00.
Conse-
Matrix Approximations
367
Not all norms are matrix norms. It will be instructive to check how many of the norms introduced in Example 11.2.1 pass muster. See Complements 11.2.1 to 11.2.3. We need more examples of matrix norms. Instead of plunking down some examples, it would be useful to develop a method of generating a variety of norms on Mn using norms on the vector space en. Start with any norm II . II on en. For each A E M n, define
IIAllin =
IIAxl1 -II -II . xECn,x#O X
(11.2.1)
sup
As usual, the vectors x E en are regarded as column vectors so that matrix multiplication Ax makes sense. If we can show that IIAllin is finite, we can say that (11.2.2) for every vector x E en. The eventual goal is to demonstrate that the map II . lIin is a matrix norm on Mn. If this is the case, one can call 1I·lIin, the induced matrix norm in Mn induced by the norm 11·11 on en. The letters "in" in the norm are an abbreviation of the phrase "induced norm". It is clear that IIAllin is non-negative. Observe that
IIAllin
sup
=
xEcn,ilxil=l
IIAxll·
Using topological arguments, one can show that there exists a vector Xo E en (depending on A) such that
IIxoll = 1
and
IIAllin = IIAxoll·
(11 .2.3)
This demonstrates that IIAllin is finite. (The topological arguments use the facts that the map II . II is a continuous function from en to Rand that the set {x E en : Ilxll = 1} is a compact subset of en in the usual topology of en.) If A = 0, then IIAllin = O. Conversely, suppose IIAllin = O. This implies that Ax = 0 for every vector x E en with IIxll = 1. Hence A = O. (Why?) If a is any complex number, then
lIaAllin
=
sup xEcn,ilxil=l
lIaAxl1
=
lal
sup xEcn,ilxil=l
IIAxl1 = lalllAII·
368
MATRIX ALGEBRA THEORY AND APPLICATIONS
We now set upon the triangle inequality. Let A and B E Mn· Note that for each vector x E en, II(A + B)xlI ::; IIAxll + IIBxlI. From this inequality, it follows that IIA + Bllin ::; IIAIlin + IIBllin. Finally, we need to show that IIABllin ::; (IiAllin)(IIBllin). By what we have pointed out in (11.2.3), there exists a vector Xo (of course, depending on AB) such that Ilxoll = 1 and
= IIABxoll = IIA(Bxo)1I ::; IIAllinllBxoll ::; IIAllinllBllinlixoll = IIAllinllBllin.
IIABllin
Thus we have shown that II· II in is indeed a matrix norm. The definition of the induced norm is something one introduces routinely in functional analysis. The matrix A from Mn can be viewed as a linear operator from the normed linear space (en, 11·11) to the normed linear space (en, II . II). The definition of the induced norm of A is precisely the operator norm of A. There is no dearth of matrix norms. Every norm on en induces a matrix norm on Mn. Some examples are included below.
P 11.2.8
For A = (aij) E M n, define n
IIAlloo,in
= l~l!fn L
- - j=l
laijl·
(First we form the row sums of absolute values of entries of A and then take the maximum of the row sums to compute II Alloo,in') Then 1I'lIoo,in is a matrix norm on Mn. PROOF. The main idea of the proof is to show that II . lloo,in is the induced matrix norm on Mn induced by the Loo-norm, II . 1100 on en. For any given matrix A E M n, let us compute IIAxiloo on en. Let x' = (Xl, X2, ... ,Xn) E en be such that Ilxli oo = maxl~i~n IXil = l. Then n
IIAxii oo
n
= l~tfn I L aijxjl ::; l~~~ L - -
j=l
- - j=l
laijllxjl
Matrix Approximations
369
(11.2.4)
Let
n
n
m~ "!aij! = "!akj! for some L.J L.J - - j=1 j=1
l
1
~ k ~ n.
If we can show that there exists a vector 0:' = (O:t, 0:2, .. . ,O:n) in en such that 110:1100 = 1 and IIAo:lloo ~ maxl~i~n 2:::j=1 !aij!, it would then imply that IIAlloo,in ~ maxl~i~n 2:::j=1 !aij! and the assertion of the proposition follows by using (11.2.4). Take for each j = 1,2, ... ,n if akj =1= 0, if akj =
o.
Then the k-th entry in the column vector of Ao: is given by 2:::j=1 !akj!. Consequently,
The norm given in P 1I.2.8 is helpful in answering affirmatively the invertibility of diagonally dominant matrices. First, we need a definition. DEFINITION 11.2.9. A matrix A = (aij) of order n x n is said to be diagonally dominant if n
!aii! >
L
!aijl, i = 1, ... ,n.
j=I,ji.i As an example, let a and b be two scalars such that !a! > (n - 1)!bl. Take aij = a if i = j and aij = b if i =1= j. Every diagonal entry of A is equal to a and every off-diagonal entry is equal to b. Then A is a diagonally dominant matrix. COROLLARY 11.2.10. Every diagonally dominant matrix is invertible.
370
MATRIX ALGEBRA THEORY AND APPLICATIONS
PROOF. Let A = (aij) be diagonally dominant. It then follows that every diagonal entry of A is non-zero. Let D = diag{all' a22,··· ,ann}. The matrix D is invertible. Let B = In - D- I A = (bij). Every diagonal entry of B is zero. Let us compute n
n
By P 11.2.7, In - B is invertible. But In - B = D- I A . Hence A is invertible. Now we focus on the matrix norm 1I·III,in on Mn induced by the norm II . Ih on en. The following result focuses on this particular induced norm. This norm is analogous to the norm 1I·lIoo,in. The only difference is that column sums are involved in the norm instead of row sums.
P 11.2.11
For each matrix A = (aij) in M n , we have n
(First, the column sums of the absolute values of the entries of A are formed and then the maximum among the column sums is determined.) PROOF. Let A = (al,a2, ... ,an), where ai is the i-th column of A. Let x' = (Xl, X2, ... ,Xn ) E en be such that IIxlh = 1. Observe that
IIAxlh
=
IIxlal
+ X2 a2 + ... + xnanlh
n
n
n
j=l
j=l
i=l
n
Consequently, IIAlkin ::; maXI~j~n
L
laijl· Let 6j be the i-th unit i=l vector, i.e., a column vector in which the j-th entry is unity and the rest of the entries are zeros, j = 1,2, ... ,n. Note that 116j 111 = ,I and n
IIA6j ll l
=L i=l
laijl ::; IIAIII,in
for each j.
Matri:z: Approximations
371
n
Hence maxI:5i:5n
L
laiil ~ IIAIh,in. This completes the proof.
i=1
There is another important induced matrix norm induced by the Euclidean norm on C n . This is such an important norm, let us devote some space to it. This norm is called the spectral norm on Mn. DEFINITION
11.2.12. For each A E M n , the spectral norm of A is
defined by IIAlis =
sup xEc n ,lIxIl2=1
IIAxI12.
We need to compute precisely the spectral norm for any given matrix. The singular values of A playa pivotal role in the computation.
P 11.2.13 Let A E Mn and values of A. Then
Al
~
A2
~
... ~ An
~
0 be the singular
i.e., the spectral norm of A is the largest singular value of A. PROOF. Note that Ai, A~, ... ,A~ are the eigenvalues of A* A. Let x(1), X(2)' ... ,X(n) be a set of orthonormal eigenvectors associated with the eigenvalues Ai, A~, ... ,A~. Let x E C n be such that IIxII2 = 1. Write x = aIX(I) + a2X(2) + ... + anx(n) for some scalars a}, a2,'" ,an
in C. The condition that
IIxI12 =
n
1 implies that
L lail 2= 1.
Observe
i=1
that n
IIAxlI~
=
IIA(LaiX(i»)II~
n =
i=1
n
II LaiAiX(i)ll~ = Llail2A~ ~ Ai· i=1
i=1
Consequently, IIAlis ~ AI' But
Earlier , we have introduced the notion of the spectral radius of a matrix. If A is a symmetric matrix, we can conclude that Ps(A) = IIAlis. For any matrix A, we can only assert that Ps(A) ~ IIAlis. We would like to spend some more time on the spectral radius. The spectral radius provides a nice criterion under which convergence of powers of a matrix
372
MATRIX ALGEBRA THEORY AND APPLICATIONS
is guaranteed. But first, we need the following results. The first result is easy to prove.
P 11.2.14 Let 11·11 be any matrix norm on Mn and T a nonsingular matrix in Mn. Define the map II· liT on Mn by
IIAIIT = IIT- I ATI!' Then
II · lIT is a matrix
A E Mn·
norm on Mn.
P 11.2.15 Given A E Mn and E > 0, there exists a matrix norm such that Ps{A) ~ IIAII < Ps{A) + E.
II . II
PROOF. The inequality Ps{A) ~ IIAII was already given in P 11.2.4. Let >'1, >'2, .. . ,>'n be the eigenvalues of A. By Schur triangularization theorem (See Chapter 5), there exists a unitary matrix U and an upper triangular matrix D = (d ij ) such that A = U DU* and dii = >'i for all i. For t > 0, let G t = diag{t, t 2, . .. ,tn}. Observe that GtDG t l is upper triangular and in fact, given by
GtDG t l
dl2/t
0
0
>'3
. .. / d2n /t n- 2 . . . d 3n /t n- 3
o
0
0
...
o >'2
= [
d 13/ t2 d23 /t
>'1
.
.
.
...
dIn/tn_II
.
.
>'n
Choose t > 0 such that the sum of all off-diagonal entries of GtDG t l is < E. Define a new matrix norm II . II on Mn by
Recall the structure of the norm 1I-III,in from P 11.2.11. By P 11.2.14, II . II is indeed a matrix norm on Mn. Let us compute
IIAII = =
l l II{UG t )-1 A(UG t ) IIt. in
= IIGtU*AUGtllh ,in
IIGtDGtllh,in < l~a.2'n I>'jl + E = _J_
This completes the proof.
Ps(A)
+ E.
Matrix Approximations
373
We are now ready to provide the connection between the spectral radius of a matrix and the limiting behavior of powers of the matrix. P 11.2.16
Let A E Mn. Then Ak
-+
0 as k
-+ 00
if and only if
Ps(A) < 1. PROOF. Suppose Ps(A) < 1. By P 11.2.15, there exists a matrix norm II . II on Mn such that IIAII < 1. (Why?) Consequently, Ak converges to zero as k -+ 00. See the discussion following Definition 11.2.2. Conversely, suppose Ak converges to 0 as k -+ 00. There is nothing to prove if Ps(A) = o. Assume that Ps(A) > o. Let A be any non-zero eigenvalue of A. Then there exists a non-zero vector x in such that Ax = AX. Consequently, AkX = Akx. Since Akx converges to o as k -+ 00, and x i= 0, it follows that Ak converges to 0 as k -+ 00. But this is possible only when IAI < 1. Hence Ps(A) < 1. The spectral radius of a matrix A has a close connection with the asymptotic behavior of IIAkll for matrix norms II . II. The following result spells out the precise connection.
en
P 11.2.17
Let
II . II
be any matrix norm on Mn. Then for any
A2, ... ,An are the eigenvalues of A, then At, A~, ... ,A~ are the eigenvalues of Ak for any positive integer k. Consequently, Ps(Ak) = [Ps(A)]k. By P 11.2.4, Ps(Ak) ~ IIAkll. Hence Ps(A) ~ (IIAkll)l/k. Let c > 0 and B = (Ps(A) + c)-l A. The matrix B has spectral radius Ps(B) < 1. By P 11.2.16, IIBkl1 -+ 0 as k -+ 00. We can find m ~ 1 such that IIBkll < 1 for all k ~ m. Equivalently, IIAkll < (Ps(A) + cl for all k ~ m. This means that (IIAkll)l/k ~ Ps(A) + c for all k ~ m. Thus we have Ps(A) ~ (IIAkll)l/k ~ Ps(A) + c for all k ~ m. PROOF. If AI,
Since c
> 0 is arbitrary, the desired conclusion follows.
Complements 11.2.1 Show that the Leo-norm on Mn is not a matrix norm (n ~ 2). However, show that the norm defined by
374
MATRIX. ALGEBRA THEORY AND APPLICATIONS
for A = (aij) E Mn is a matrix norm on Mn. See Example 11.2.1. 11.2.2 Show that the F'robenius norm on Mn is a matrix norm on Mn. See Example 11.2.1. 11.2.3 Show that the Lp-norm on Mn is a matrix norm on Mn if and only if 1 ~ P ~ 2. (An outline of the proof. The objective of this exercise is to examine which norms in Example 11.2.1 are matrix norms. If p = 1 or 2, it is easy to demonstrate that the Lp-norm is indeed a matrix norm. Let 1 < p < 2. Determine q such that 1P + 1q = 1. Note that 0 < p - 1 = ~ < 1. Let A = (aij) and B = (b ij ) E Mn. Then
n
IIABII~ =
n
n
L L IL airbrJIP i=1 j=1 r=1
$
ttl (t.lai.IP)! (~?';Iq) l tt.lai.lpt(t(lb';IP)!)'
$
IIAII~
$
(by HOlder's inequality)
(t t Ib,;IP)
$
(IIAII~) (II BII~ )
The inequality (a 8 + b8) ~ (a + b)8 for any a ~ 0, b ~ 0, and () ~ 1 is at the heart of the last step above.) 11.2.4 For any norm 11 · 11 on en, show that IIInllin = 1. 11.2.5 For any matrix A = (aij) in Mn, show that IIAllt.in = IIA* lloo,in, where A * is the adjoint of A. 11.2.6 If A is symmetric, show that Ak converges to 0 if and only if Ps(A) < 1 using the spectral decomposition of A.
11.3. Unitarily Invariant Norms Let m and n be two positive integers. Let Mm,n be the collection of all matrices of order m x n with complex entries. In many statistical applications, we will be concerned with data matrices with m being the sample size and n the number of variables. Generally, m will be much
Matrix Approximations
375
larger than n. We do not have the convenience of matrix multiplication being operational in the space Mm,n' Consequently, the idea of a matrix norm does not make sense in such a general space. In this section, we will look at a particular class of norms on the space Mm,n and find a way to determine the structure of such norms. First, we start with a definition. 11.3.1. A real valued function II . II on the vector space M m •n is said to be a unitarily invariant norm and denoted by 1I·lIui, if it has the following properties. DEFINITION
(1) IIAII ~ 0 for all A E Mm ,n' (2) IIAII = 0 if and only if A = O. (3) lIoAIl = 101 II All for every 0 E C and A E Mm,n . (4) IIA + BII ~ IIAII + IIBII for all A and Bin Mm ,n' (5) IIU AVII = IIAII for all A E Mm,n and unitary matrices U and V of orders m x m and n x n, respectively. The first four properties are the usual properties of a norm. The fifth property is the one which adds spice to the theme of this section. IT M m •n is a real vector space we use the term orthogonally in the place of unitarily invariant norm. We will discuss in an informal way how such an invariant norm looks like. Let A E Mm,n ' Let CTl(A) ~ CT2(A) ~ . . . ~ CTr(A) be the singular values of A, where l' = min{m, n}. By the singular value decomposition theorem, there exist two unitary matrices P and Q of orders m x m and n x n, respectively, such that A = PDQ , where if l' = m, if r = n, with Dl = diag{CTl(A), CT2(A), ... ,CTr(A)} and O's are the appropriate zero matrices. Then IIAllui = IIPDQllui = IIP* PDQQ*lIui = IIDllui . Note that IIDII is purely a function of the singular values of A. Let us denote this function by cp, i.e.,
IIDllui = cp(CTl (A), CT2(A), ... ,CTr(A)). The question then arises as what kind of properties the function cp( .) should possess. One thing is pretty clear. The function cpO must be a
376
MATRIX ALGEBRA THEORY AND APPLICATIONS
symmetric function of its arguments. We will make these notions more precise shortly. Before determining the structure of unitarily invariant norms, let us look at some examples. EXAMPLE 11.3.2. The F'robenius norm on the vector space Mm,n is unitarily invariant. Let A = (aij) and 0"1 2: 0"2 2: . . . 2: 0" r 2: 0 be singular values of A, where r = min{m,n}. Then m
IIAIIF
n
= (L L i=1
laijI2)1/2
= (Tr(A* A))1/2
j=1 r
= (Sum of all the eigenvalues of A* A)I/2 = (LO"~)1/2.
i=1
If U and V are unitary matrices of order m x m and n x n, respectively, then
IIU AVIIF
= [tr((U AV)*(U AV)W/2 = [tr(V* A*U*U AVW/2 = [tr(V* A* AVW/ 2 = [tr(A* AW/2.
The eigenvalues of A* A and V* A* AV are the same for any unitary matrix V. (Why?) Thus the F'robenius norm on Mm,n is seen to be unitarily invariant. EXAMPLE 11.3.3. The spectral norm on Mm,n is also unitarily invariant. In Section 11.2, we have defined the spectral norm on the vector space Mn. It can be defined in two equivalent ways on the space Mm,n too. Let A E Mm,n. One way is:
    ||A||_s = sup_{x ∈ C^n, x ≠ 0} ||Ax||_2 / ||x||_2.
Another way is: ||A||_s = σ_1 = max{σ_1, σ_2, ..., σ_r} = (ρ_s(A*A))^{1/2}. One can check that both approaches lead to the same answer. Note that ρ_s(A*A) is the spectral radius of the matrix A*A. One can also check that the spectral norm is unitarily invariant.
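As an illustration of these two examples, the following short numerical sketch (ours, not part of the original text; it assumes NumPy, and the matrix sizes are arbitrary) checks that the Frobenius and spectral norms of a random complex matrix are unchanged by multiplication with random unitary matrices, and that both norms can be recovered from the singular values alone.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 7, 4

# A random complex m x n matrix.
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Random unitary matrices from QR factorizations of complex Gaussian matrices.
U, _ = np.linalg.qr(rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

sigma = np.linalg.svd(A, compute_uv=False)      # singular values of A

# Frobenius norm: square root of the sum of squared singular values, invariant under U A V.
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(sigma**2)))
assert np.isclose(np.linalg.norm(U @ A @ V, 'fro'), np.linalg.norm(A, 'fro'))

# Spectral norm: largest singular value, also invariant under U A V.
assert np.isclose(np.linalg.norm(A, 2), sigma[0])
assert np.isclose(np.linalg.norm(U @ A @ V, 2), np.linalg.norm(A, 2))
```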
Now we take up the case of determining the structure of unitarily invariant norms. Let P_n be the group of all permutations of the set {1, 2, ..., n}. Every member π of P_n is a one-to-one map from {1, 2, ..., n} onto {1, 2, ..., n} and is called a permutation. For each x' = (x_1, x_2, ..., x_n) ∈ R^n and π ∈ P_n, let x'_π = (x_{π(1)}, x_{π(2)}, ..., x_{π(n)}). The vector x_π is the n-tuple obtained by permuting the components of x. We want to introduce another entity. Let J_n denote the collection of all n × n diagonal matrices with each diagonal entry equal to either +1 or -1. We are now ready to introduce a special class of functions.
DEFINITION 11.3.4. A real valued function φ from R^n to R is said to be a symmetric gauge function if it has the following properties.
(1) φ(x) > 0 for all x ∈ R^n with x ≠ 0.
(2) φ(αx) = |α|φ(x) for all x ∈ R^n and α ∈ R.
(3) φ(x + y) ≤ φ(x) + φ(y) for all x and y in R^n.
(4) φ(x_π) = φ(x) for all x in R^n and π ∈ P_n.
(5) φ(Jx) = φ(x) for all x in R^n and J ∈ J_n.
The first three properties merely stipulate that the function φ be a norm on the real vector space R^n. The fourth property stipulates that φ be symmetric in its arguments (permutation invariance). The fifth property requires φ to remain the same if the signs of any number of its arguments are changed. The usual L_p-norms on the vector space R^n are good examples of symmetric gauge functions. The sum of any two symmetric gauge functions is again one of the family, and a positive multiple of a symmetric gauge function retains all the properties of a symmetric gauge function. Here are some more examples.
EXAMPLE 11.3.5. Let 1 ≤ k ≤ n be fixed. For x' = (x_1, x_2, ..., x_n) ∈ R^n, let φ_k(x) be the sum of the k largest values among |x_1|, |x_2|, ..., |x_n|. One can check that φ_k(·) is indeed a symmetric gauge function. If k = 1, then φ_k(·) is the usual l_∞-norm on R^n. If k = n, then φ_k(·) is the usual l_1-norm on R^n.
We establish some simple properties of symmetric gauge functions.
P 11.3.6 Let φ be a symmetric gauge function.
(1) If x' = (x_1, x_2, ..., x_n) ∈ R^n and 0 ≤ p_1, p_2, ..., p_n ≤ 1, then
    φ(p_1 x_1, p_2 x_2, ..., p_n x_n) ≤ φ(x_1, x_2, ..., x_n).
(2) If x_i ≥ 0 for i = 1, ..., n and y_1 ≥ x_1, y_2 ≥ x_2, ..., y_n ≥ x_n, then φ(x_1, x_2, ..., x_n) ≤ φ(y_1, y_2, ..., y_n).
(3) There exists a constant k > 0 such that for all (x_1, x_2, ..., x_n)' ∈ R^n,
    k max_i |x_i| ≤ φ(x_1, x_2, ..., x_n) ≤ k Σ_{i=1}^n |x_i|.
As a matter of fact, k = φ(1, 0, 0, ..., 0).
(4) The function φ is continuous.
PROOF. (1) Assume that 0 ≤ p_i < 1 for exactly one i and p_j = 1 for the rest of the j's. For simplicity, take i = 1 and write p_1 = p. We will show that
    φ(p x_1, x_2, ..., x_n) ≤ φ(x_1, x_2, ..., x_n).
The general result would then follow in an obvious way. Let
    u = ((1 + p)/2)(x_1, x_2, ..., x_n) and v = ((1 - p)/2)(-x_1, x_2, ..., x_n).
Note that u + v = (p x_1, x_2, x_3, ..., x_n). By Properties (2), (3), and (5) of a symmetric gauge function,
    φ(p x_1, x_2, ..., x_n) = φ(u + v) ≤ φ(u) + φ(v) = ((1 + p)/2)φ(x) + ((1 - p)/2)φ(x) = φ(x).
(2) Since 0 ≤ x_i ≤ y_i, we can write x_i = p_i y_i for some 0 ≤ p_i ≤ 1. Now (2) follows from (1).
(3) Observe that
    φ(x_1, x_2, ..., x_n) = φ(x_1 + 0, 0 + x_2, 0 + x_3, ..., 0 + x_n)
        ≤ φ(x_1, 0, 0, ..., 0) + φ(0, x_2, x_3, ..., x_n)
        ≤ |x_1| φ(1, 0, 0, ..., 0) + φ(0, x_2, 0, ..., 0) + φ(0, 0, x_3, x_4, ..., x_n)
        ≤ ... ≤ k (Σ_{i=1}^n |x_i|).
Also, since for each i, (0, 0, ..., 0, |x_i|, 0, ..., 0) ≤ (|x_1|, |x_2|, ..., |x_n|) coordinate-wise, by (2),
    |x_i| φ(0, 0, ..., 0, 1, 0, ..., 0) = φ(0, 0, ..., 0, |x_i|, 0, ..., 0) ≤ φ(|x_1|, |x_2|, ..., |x_n|) = φ(x_1, x_2, ..., x_n),
which implies that φ(x_1, x_2, ..., x_n) ≥ k |x_i| for each i. Hence
    φ(x_1, x_2, ..., x_n) ≥ k max_{1 ≤ i ≤ n} |x_i|.
(4) Since φ is a norm,
    |φ(x_1, x_2, ..., x_n) - φ(y_1, y_2, ..., y_n)| ≤ φ(x_1 - y_1, x_2 - y_2, ..., x_n - y_n) ≤ k (Σ_{i=1}^n |x_i - y_i|).
The continuity of φ now follows.
We have already indicated that a unitarily invariant norm of a matrix is basically a function of the singular values of the matrix. We will show that any such norm is generated by a symmetric gauge function.
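Before stating the correspondence formally, here is a small numerical sketch of the idea (ours, not part of the original text; it assumes NumPy, and the helper names phi_k and induced_norm are ours). It implements the gauge function of Example 11.3.5, the sum of the k largest absolute entries, and applies it to the singular values of a matrix, which by the next proposition yields a unitarily invariant norm.

```python
import numpy as np

def phi_k(x, k):
    """Symmetric gauge function of Example 11.3.5: sum of the k largest |x_i|."""
    return np.sort(np.abs(x))[::-1][:k].sum()

def induced_norm(A, k):
    """Matrix norm generated by phi_k: apply the gauge to the singular values."""
    return phi_k(np.linalg.svd(A, compute_uv=False), k)

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
J = np.diag(rng.choice([-1.0, 1.0], size=5))   # a sign-change matrix from J_n
perm = rng.permutation(5)

# Permutation and sign-change invariance of the gauge function.
assert np.isclose(phi_k(x[perm], 3), phi_k(x, 3))
assert np.isclose(phi_k(J @ x, 3), phi_k(x, 3))

# Unitary (here orthogonal) invariance of the induced matrix norm.
A = rng.standard_normal((6, 5))
U, _ = np.linalg.qr(rng.standard_normal((6, 6)))
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))
assert np.isclose(induced_norm(U @ A @ V, 3), induced_norm(A, 3))
```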
P 11.3.7 Let ||·||_ui be a unitarily invariant norm on the vector space M_{m,n}. Assume, for simplicity, m ≤ n. For each (x_1, x_2, ..., x_m) ∈ R^m, let
    X = (D : 0), of order m × n, with D = diag{x_1, x_2, ..., x_m} and 0 of order m × (n - m),
and
    φ(x_1, x_2, ..., x_m) = ||X||_ui.
Then φ is a symmetric gauge function on R^m. Conversely, if φ is a symmetric gauge function on R^m, then the map ||·|| defined by
    ||A|| = φ(σ_1(A), σ_2(A), ..., σ_m(A)), A ∈ M_{m,n},
is a unitarily invariant norm on the vector space M_{m,n}.
PROOF. Note that the eigenvalues of XX* are |x_1|², |x_2|², ..., |x_m|². Consequently, the singular values of X are |x_1|, |x_2|, ..., |x_m|. It is a routine job to check that φ is a symmetric gauge function. Conversely, let φ be a given symmetric gauge function. The singular values of a matrix can be written in any order. Since φ is symmetric, it is clear that the map ||·|| induced by φ is well defined and has the following properties.
(1) ||A|| ≥ 0 for all A ∈ M_{m,n}.
(2) ||A|| = 0 if and only if A = 0.
(3) ||αA|| = |α| ||A|| for all α ∈ C and A ∈ M_{m,n}. (The singular values of αA are |α|σ_1(A), ..., |α|σ_m(A).)
(4) If U and V are unitary matrices of order m × m and n × n, respectively, then ||UAV|| = ||A||.
The critical step is to show that the map ||·|| satisfies the triangle inequality. Let A and B ∈ M_{m,n} with singular values σ_1(A) ≥ ... ≥ σ_m(A) and σ_1(B) ≥ ... ≥ σ_m(B), respectively. Define
    σ(A) = (σ_1(A), σ_2(A), ..., σ_m(A))',
    σ(B) = (σ_1(B), σ_2(B), ..., σ_m(B))',
    σ(A + B) = (σ_1(A + B), σ_2(A + B), ..., σ_m(A + B))'.
By P 10.1.3, the vector σ(A + B) is weakly majorized by the vector σ(A) + σ(B). By P 9.3.8 and Complement 9.3.2, there exists a doubly stochastic matrix S of order m × m such that
    σ(A + B) ≤ S(σ(A) + σ(B))
coordinate-wise. Every doubly stochastic matrix can be written as a convex combination of permutation matrices, i.e.,
    S = θ_1 P_1 + θ_2 P_2 + ... + θ_r P_r,
where the θ_i's are non-negative, Σ_{i=1}^r θ_i = 1, and the P_i's are permutation matrices. Observe that
    ||A + B|| = φ(σ(A + B))
              ≤ φ(S(σ(A) + σ(B)))                                     (by P 11.3.6 (2))
              = φ(Σ_{i=1}^r θ_i P_i (σ(A) + σ(B)))
              ≤ Σ_{i=1}^r θ_i φ(P_i (σ(A) + σ(B)))                    (by the triangle inequality)
              ≤ Σ_{i=1}^r θ_i φ(P_i σ(A)) + Σ_{i=1}^r θ_i φ(P_i σ(B)) (by the triangle inequality)
              = Σ_{i=1}^r θ_i φ(σ(A)) + Σ_{i=1}^r θ_i φ(σ(B))         (by the permutation invariance property of φ)
              = φ(σ(A)) + φ(σ(B)) = ||A|| + ||B||.
Complements
11.3.1 (Ky Fan (1951)) Let σ_1 ≥ ... ≥ σ_r ≥ 0 and σ'_1 ≥ ... ≥ σ'_r ≥ 0 be two sets of values. Show that for any symmetric gauge function φ,
    φ(σ'_1, ..., σ'_r) ≤ φ(σ_1, ..., σ_r)
if and only if σ_1 + ... + σ_i ≥ σ'_1 + ... + σ'_i, i = 1, ..., r. This result is useful in solving optimization problems as given in the next example.
11.3.2 Let σ_i(A_1), σ_i(A_2), i = 1, ..., r = min(m, n), be the singular values of A_1 and A_2 ∈ M_{m,n}. Show that ||A_1|| ≥ ||A_2|| for any unitarily invariant norm ||·|| on M_{m,n} if and only if
    σ_1(A_1) + ... + σ_i(A_1) ≥ σ_1(A_2) + ... + σ_i(A_2), i = 1, ..., r.
This result will be used in Section 11.5 in some optimization problems.
11.3.3 Let G ∈ M_m be non-negative definite and P ∈ M_m be a symmetric idempotent matrix of rank ρ(P) = k ≤ m. Show that
    λ_i(G) ≥ λ_i(PGP) ≥ λ_{m-k+i}(G), i = 1, ..., k.
11.3.4 (Ky Fan and Hoffman (1955)) Let A ∈ M_n. Then λ_i(A + A*) ≤ 2σ_i(A).
11.3.5 Let A ∈ M_{m,n} with ρ(A) = r and B ∈ M_{m,n} with ρ(B) = k. Then
(1) σ_i(A - B) ≥ σ_{i+k}(A), i + k ≤ r; σ_i(A - B) ≥ 0, i + k > r.
(2) The equalities in (1) are attained if and only if k ≤ r and B = σ_1(A)P_1Q_1* + ... + σ_k(A)P_kQ_k*, while the singular value decomposition of A is A = σ_1(A)P_1Q_1* + ... + σ_r(A)P_rQ_r*.
PROOF. Since ρ(B) = k, it has a rank factorization B = CD, such that C ∈ M_{m,k}, C*C = I_k and D ∈ M_{k,n}. Then
    (A - CD)*(A - CD) ≥ A*(I - CC*)A.
Hence
    σ_i²(A - B) = λ_i[(A - B)*(A - B)] ≥ λ_i[A*(I - CC*)A] ≥ λ_{k+i}(AA*) = σ²_{k+i}(A)
by the result of Complement 11.3.3, noting that (I - CC*) is idempotent and has rank equal to (m - k). Obviously σ_i(A - B) ≥ 0 for i + k > r. This proves (1). The sufficiency part of (2) is trivial, but the proof of necessity is a bit involved; for details the reader is referred to Rao (1980).
11.3.6 If A, B and A - B are Hermitian and non-negative definite matrices of order m and if B is of rank at most k, then
(1) λ_i(A - B) ≥ λ_{k+i}(A).
(2) A necessary and sufficient condition for equality in (1) to hold for all i is that B = λ_1(A)V_1V_1* + ... + λ_k(A)V_kV_k*, where V_1, ..., V_k are the first k eigenvectors of A.
11.4. Some Matrix Optimization Problems
In this section we consider some matrix optimization problems which are useful in solving the matrix approximations considered in the next section. For simplicity of notation, we will consider matrices with real entries only. The results readily extend to complex matrices if we replace the phrase "transpose" by "conjugate transpose", and the phrase "symmetric" by "Hermitian."
P 11.4.1 Let A ∈ M_n be a symmetric matrix with eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_n and x_1, ..., x_n be some orthonormal vectors. (Note that we are writing the eigenvalues of A in decreasing order of magnitude.) Then
(1) Σ_{i=1}^k x_i'Ax_i ≤ Σ_{i=1}^k λ_i, k = 1, ..., n - 1,
(2) Σ_{i=1}^n x_i'Ax_i = Σ_{i=1}^n λ_i.
PROOF. Let X_k = (x_1 | ... | x_k) and A = PΛP' be the spectral decomposition of A, where Λ = diag(λ_1, ..., λ_n) and P an orthogonal matrix. Then for any 1 ≤ k ≤ n,
    Σ_{i=1}^k x_i'Ax_i = tr(X_k'AX_k) = tr(X_k'PΛP'X_k) = tr(ΛP'X_kX_k'P) = tr(Λ(QQ')) = Σ_{i=1}^n λ_i q_ii,
where Q = P'X_k so that Q'Q = I (of order k × k) and q_ii is the i-th diagonal element of the idempotent matrix QQ'. When k = n, QQ' = I (of order n × n), which proves (2). When k ≤ n - 1, we have
    Σ_{i=1}^n λ_i q_ii ≤ Σ_{i=1}^k λ_i q_ii + (k - Σ_{i=1}^k q_ii) λ_{k+1}
                     = Σ_{i=1}^k (λ_i - λ_{k+1}) q_ii + k λ_{k+1}
                     ≤ Σ_{i=1}^k (λ_i - λ_{k+1}) + k λ_{k+1}
                     = Σ_{i=1}^k λ_i,
since Σ_{i=1}^n q_ii = tr(QQ') = tr(Q'Q) = k and q_ii ≤ 1.
In view of results (1) and (2) of P 11.4.1, we may say that the eigenvalues λ_i majorize x_i'Ax_i and hence, in particular, that the λ_i majorize the diagonal elements of the symmetric matrix A. More formally, let x' = (x_1'Ax_1, x_2'Ax_2, ..., x_n'Ax_n), y' = (λ_1, λ_2, ..., λ_n), and z' = (a_11, a_22, ..., a_nn), where a_ii is the i-th diagonal entry of A. Then
    x ≺ y and z ≺ y.
The matrix A need not be non-negative definite. The equality in (1) of P 11.4.1 is attained for a given k when x_1, ..., x_k are the eigenvectors of A corresponding to the eigenvalues λ_1, ..., λ_k. The result of P 11.4.1 readily extends to the maximization of a bilinear form as stated in the next proposition.
P 11.4.2 Let z_i' = (x_i' : y_i') be mutually orthonormal vectors, i = 1, ..., p, where x_i is an m-vector and y_i is an n-vector. Further, let A ∈ M_{m,n} have singular values α_1 ≥ α_2 ≥ ... ≥ α_r > α_{r+1} = α_{r+2} = ... = α_p = 0, where p = min(m, n) and A has rank r. Then
    Σ_{i=1}^k 2x_i'Ay_i ≤ Σ_{i=1}^k α_i, k = 1, ..., p.
The maximum is attained when x_i and y_i are the singular vectors of A associated with the singular value α_i, i = 1, ..., k.
PROOF.
Let
    M = | 0   A |
        | A'  0 |,
which is symmetric of order (m + n) × (m + n). Note that α_1, ..., α_p are the p largest eigenvalues of the symmetric matrix M. Then for any 1 ≤ k ≤ p,
    Σ_{i=1}^k 2x_i'Ay_i = Σ_{i=1}^k z_i'Mz_i ≤ Σ_{i=1}^k α_i,
by P 11.4.1, which proves the first part of the result. The second part is left to the reader as an exercise. (It will be instructive to write down all the eigenvalues of M.)
P 11.4.3 Let A = (a_ij) ∈ M_{m,n} with singular values α_1 ≥ α_2 ≥ ... ≥ α_r > α_{r+1} = α_{r+2} = ... = α_p = 0, where p = min{m, n}. Then
(1) Σ_{i=1}^k a_ii² ≤ Σ_{i=1}^k α_i², k = 1, ..., p,
(2) Σ_{i=1}^k a_ii ≤ Σ_{i=1}^k α_i, k = 1, ..., p.
Equality in (1) holds if and only if the leading k × k principal submatrix of A is diagonal and |a_ii| = α_i, i = 1, ..., k. Equality in (2) holds when equality holds in (1) and a_ii ≥ 0, i = 1, ..., k.
PROOF.
We have
    Σ_{i=1}^k a_ii² ≤ Σ_{i=1}^k (Σ_{j=1}^n a_ij²) = Σ_{i=1}^k e_i'AA'e_i ≤ Σ_{i=1}^k α_i²,
where the second inequality follows from P 11.4.1 and e_i is the i-th column of I_m. The inequality (1) is proved. The inequality (2) follows similarly from P 11.4.2, setting x_i = 2^{-1/2} e_i and y_i equal to 2^{-1/2} (i-th column of I_n).
P 11.4.4 (Abel's identity). Let a_1, ..., a_n and b_1, ..., b_n be two finite sequences of scalars. Then
    Σ_{i=1}^n a_i b_i = Σ_{i=1}^{n-1} (a_i - a_{i+1})(b_1 + ... + b_i) + a_n (b_1 + ... + b_n).
PROOF.
The right-hand side of the above identity may be written as
    Σ_{i=1}^{n-1} (a_i Σ_{j=1}^i b_j) - Σ_{i=1}^{n-1} (a_{i+1} Σ_{j=1}^i b_j) + a_n Σ_{j=1}^n b_j
    = Σ_{i=1}^{n-1} (a_i Σ_{j=1}^i b_j) - Σ_{i=2}^{n} (a_i Σ_{j=1}^{i-1} b_j) + a_n Σ_{j=1}^n b_j
    = Σ_{i=2}^{n-1} a_i (Σ_{j=1}^i b_j - Σ_{j=1}^{i-1} b_j) + a_1 b_1 - a_n Σ_{j=1}^{n-1} b_j + a_n Σ_{j=1}^n b_j
    = Σ_{i=2}^{n-1} a_i b_i + a_1 b_1 + a_n b_n = Σ_{i=1}^n a_i b_i.
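As a quick numerical sanity check of the identity (ours, not part of the original text; it assumes NumPy and uses arbitrary random sequences):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
a = rng.standard_normal(n)
b = rng.standard_normal(n)

# Left-hand side of Abel's identity.
lhs = np.sum(a * b)

# Right-hand side: sum_{i<n} (a_i - a_{i+1}) * (b_1 + ... + b_i) + a_n * (b_1 + ... + b_n).
B = np.cumsum(b)                      # partial sums B_i = b_1 + ... + b_i
rhs = np.sum((a[:-1] - a[1:]) * B[:-1]) + a[-1] * B[-1]

assert np.isclose(lhs, rhs)
```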
A more general version of the following result has been established in Chapter 10. In the special case reported here, the proof is simple and instructive.
P 11.4.5 (von Neumann (1937)). Let A and B ∈ M_n be both symmetric with eigenvalues λ_1 ≥ ... ≥ λ_n and μ_1 ≥ ... ≥ μ_n. Then
    Σ_{i=1}^n λ_i μ_{n-i+1} ≤ tr(AB) ≤ Σ_{i=1}^n λ_i μ_i.
Equality holds on the right when B = Σ_{i=1}^n μ_i P_i P_i' and equality on the left when B = Σ_{i=1}^n μ_{n-i+1} P_i P_i', where P_i is an eigenvector of A for the eigenvalue λ_i, i = 1, ..., n.
PROOF. Let A = PΛP' be the spectral decomposition of A with Λ = diag(λ_1, ..., λ_n) and P orthogonal. Then
tr(AB) = tr(PΛP'B) = tr(ΛP'BP) = Σ_{i=1}^n λ_i b_ii, where b_ii is the i-th diagonal element of P'BP. Using Abel's identity (P 11.4.4), we obtain
    Σ_{i=1}^n λ_i b_ii = Σ_{i=1}^{n-1} (λ_i - λ_{i+1})(b_11 + ... + b_ii) + λ_n (b_11 + ... + b_nn)
                      ≤ Σ_{i=1}^{n-1} (λ_i - λ_{i+1})(μ_1 + ... + μ_i) + λ_n (μ_1 + ... + μ_n) = Σ_{i=1}^n λ_i μ_i,
where the inequality follows from P 11.4.1, since P'BP and B have the same eigenvalues. The inequality on the left-hand side follows by replacing B by -B in the inequality on the right-hand side.
P 11.4.6 Let A ∈ M_{m,n} and B ∈ M_{n,m}, with singular values α_1 ≥ ... ≥ α_p and β_1 ≥ β_2 ≥ ... ≥ β_p, respectively, where p = min{m, n}. Then
    - Σ_{i=1}^p α_i β_i ≤ tr(AB) ≤ Σ_{i=1}^p α_i β_i.    (11.4.1)
Equality holds on the right when B = Σ_{i=1}^p β_i Q_i P_i' and equality on the left when B = Σ_{i=1}^p β_i (-Q_i) P_i', where P_i and Q_i are singular vectors of A for the singular value α_i, i = 1, ..., p.
PROOF.
[~, ~]
°
°;: :
are 01 ;:::: ... ;:::: Op ;:::: = ... = -op ;:::: ... > number of O's is 1m - nl. By P 11.4.5, we have
-01,
where the
[0 °A] [0B °B']
2tr{AB} = tr A' p
~ LOi{3i
p
+ L{-Oi)(-{3i) =
p
2L o i{3i.
{11.4.2}
1
The rest of the results are easily established.
Complements 11.4.1 Let A E Mm,n and B E Mm,n with singular value decompositions P 6. 1 Q' and R6. 2 S', singular values 01 ;:::: ... ;:::: Or and {31 ;:::: ... ;:::: {3rJ respectively, where r = min{m, n}. Show that
where U and V are any unitary matrices. Further, show that the upper bound is attained when U = QR' and V = SP'. Show also that, if
388
MATRIX ALGEBRA THEORY AND APPLICATIONS
m = n, tr(AU) ~ 01 + ... + Or for any unitary matrix U with the upper bound attaining when U = QPl. (The results follow from von Neumann's propositions P 11.4.5 and P 11.4.6.) 11.4.2 (Ky Fan and Hoffman (1955)) Let A be a square matrix. Show that ,X; (A + A') ~ 20"i(A) = 2[Ai(A' AW/ 2 where Ai(A + A') is the i-th largest eigenvalue of (A + A') and O"i(A) is the i-th singular value of A. 11.4.3 Let A E Mm,n and q = min{m, n}, and Nk(A) = 0"1 (A) + ... + O"k(A) denote the sum of the k largest singular values of A. Show that NkO is a norm on Mm,n for k = 1, ... ,q, and that when m = n, NkO is a matrix norm on Mn for k = 1, ... ,n. The function Nk(A) is called Ky Fan norm. 11.4.4 Let Al and A2 E Mn symmetric with Al - A2 being nonnegative definite. Show that Ai(Ad ~ Ai(A 2), i = 1, ... ,n. 11.5. Matrix Approximations In this section, we will look at some problems of approximating a given matrix by a matrix with specific structural properties. Let A E Mn be a given matrix. First we consider the problem of finding a symmetric matrix B E M n , which is closest to A with respect to a given norm. In the following we provide an explicit solution to this problem. P 11.5.1 Let A E Mn and B = (A+A')/2. Then B is a symmetric matrix closest to A with respect to any orthogonally invariant norm II·lI. PROOF.
Let C be any symmetric matrix in Mn. Note that
IIA - BII = II(A - A')/211 = (1/2)II(A - C) - (A' - C) II = (1/2)II(A - C) - (A - C),II
+ II (A - c)'lIl C) II + II(A - C) III =
~ (1/2)[11 (A - C) II
= (1/2)[II(A -
II(A - C)II.
A matrix B of order n X n with real entries is said to be skewsymmetric if B = - B'. We can find explicitly a skew-symmetric matrix closest to a given matrix.
Matrix Approximations
389
P 11.5.2 Let A E Mn and B = (A - A')/2. Then B is a skewsymmetric matrix closest to A with respect to any orthogonally invariant norm 11·11 . These results can be formulated for matrices with complex entries. If A is a given matrix of order n X n with complex entries, the objective is to find a hermitian matrix B closest to A. Likewise, one can find an anti-hermitian (A matrix B is anti-hermitian if B = -B*.) matrix closest to A. Our next objective is to find an orthogonal matrix closest to A. In Complements 11.5.1 and 11.5.3, we address this problem. In the following, we prove very general results useful in solving statistical problems. P 11.5.3 Let A and B E Mm,n with singular value decompositions A = P 6 1 Q' and B = R 62 S', respectively. Then
where Ok is the class of all k X k orthogonal matrices , U* = RP' and T* = SQ', and II . IIF is the Frobenius norm. PROOF. Consider
IIU A - BTI\} = tr[(U A - BT)'(U A - BT)] = tr(A' A) + tr(B' B) - 2tr(A'U' BT). We have to find the minimum of the last term in the above expression, when U and T roam over orthogonal matrices. Using von Neumann's result stated in the Complement 11.4.1, we obtain the desired result. The result may not hold for all unitarily invariant norms. A number of results follow from P 11.5.3. These consequences are listed as Complements 11.5.1 to 11.5.6. Next in line is the non-negative definiteness property. For a given matrix A of order n X n, we seek a non-negative definite matrix B closest to A under the Frobenius norm. Let A be a matrix of order n x n with real entries. Let Let B = QH be a polar decomposition of B with Q being orthogonal and H non-negative definite. Then C = (B + H)/2 is
P 11.5.4
B
= (A + A')/2.
390
MATRIX ALGEBRA THEORY AND APPLICATIONS
non-negative definite and closest to A under the Frobenius norm Moreover, C is unique.
II . II·
PROOF. First, we show that any non-negative definite matrix closest to A is also closest to B, the symmetric part of A . For this, we note a special property of the Frobenius norm. Let D = (d ij ) and E = (eij) be symmetric and skew-symmetric matrices, i.e., D = D' and E = - E', respectively. Then nn
nn
nn
n
liD + EII~ = LL(dij + eij)2 = LLd~j + Le~j +2 LLdijeij i=l j=l n
=
n
i=l j=l n
i=l
i=l j=l
n
LLd~j + LLe~j +0= IIDII~ + IIEII~ · i=l j=l
i=l j=l
Let X be any non-negative definite matrix of order n x n. Then
IIA - XII~ = II(A + A')/2 - X
+ (A -
A')/211~
= liB - XII~ + II(A - A')/211~, in view of the facts that B - X is symmetric and (A - A')/2 is skewsymmetric. Thus minimizing IIA - XIIF over all non-negative definite matrices X is equivalent to minimizing liB - XIIF over all non-negative definite matrices X. Let us work with the symmetric matrix B. Let B = ZAZ' be the spectral decomposition of B , where Z is an orthogonal matrix, A = diag{Al' A2, .. . ,An}, and AI, A2 , . .. , An are the eigenvalues of B. Let us replace each negative eigenvalue by zero in the spectral decomposition of B. More precisely, let for each 1 ::; i ::; n , if Ai 2:: 0, if Ai < 0, and C = ZDZ' , where D = diag{d l ,d2 , ... ,dn }. It is clear that Cis non-negative definite. Let us see whether C is closest to B . Let X be any non-negative definite matrix. Note that
liB - XII~ = IIZAZ' - ZZ'XZZ'II~ =
IIA -
Z'XZII~ .
Matrix Approximations
391
The last equality follows from the fact that the F'robenius norm is orthogonally invariant. Let Y = Z'XZ = (Yij). Then n
2
liB - XII = LYTj
n
+ L(Ai -
ii:j
Yii)2
~ L(Ai - Yii)2 ~ LAT.
i= 1
i= 1
Ai <0
The last inequality follows from the fact that Y is non-negative definite and Yii ~ O. On the other hand,
liB - CII}
= IIZAZ' -
ZDz'll}
L
= IIA - DII} =
A~.
(Ai >0)
The matrix C is the right choice. Two facts emerge from these deliberations. The first one is that the matrix C which minimizes IIA - XII 2 over all non-negative definite matrices X is unique. The second one is that this minimum can be explicitly spelled out, i.e., min IIA
- XII}
=
L AT + II(A - A')/211
2
.
Ai
The final step consists of tying up the matrix C with the polar decomposition of B. The polar decomposition of B is derivable directly from the spectral decomposition of B. Let for each 1 ::; i ::; n, ei
= {
+1
-1
if Ai
~
0,
if Ai < 0,
and E = diag{ ell e2, . .. ,en}. Note that E is an orthogonal matrix and
where Q = ZEZ' and H = Z diag{IAII, IA21, ... ,IAnl}Z'. It is clear that Q is orthogonal and H is non-negative definite. This is the polar decomposition of B. Note that
(B
+ H)/2 = (ZAZ' + Z diag{IAll, IA21, ...
,IAnl}Z')/2
= ZDZ' = C.
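A minimal computational sketch of P 11.5.4 (ours, not part of the original text; it assumes NumPy): symmetrize A, clip the negative eigenvalues of the symmetric part to zero, and compare the result with the (B + H)/2 form obtained from the polar decomposition of B.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
A = rng.standard_normal((n, n))

B = (A + A.T) / 2                       # symmetric part of A
lam, Z = np.linalg.eigh(B)              # spectral decomposition B = Z diag(lam) Z'

# C = Z D Z' with the negative eigenvalues replaced by zero.
C = (Z * np.clip(lam, 0.0, None)) @ Z.T

# Polar decomposition of B: B = QH with H = Z diag(|lam|) Z'.
H = (Z * np.abs(lam)) @ Z.T
assert np.allclose(C, (B + H) / 2)      # agrees with the (B + H)/2 characterization

# C is non-negative definite and at least as close to A as a random nnd competitor.
assert np.all(np.linalg.eigvalsh(C) >= -1e-10)
W = rng.standard_normal((n, n))
X = W @ W.T                             # an arbitrary non-negative definite competitor
assert np.linalg.norm(A - C, 'fro') <= np.linalg.norm(A - X, 'fro') + 1e-10
```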
It is worthwhile to note that the result of P 11.5.4 has been proved to be operational under the F'robenius norm. No concrete results are
392
MATRIX ALGEBRA THEORY AND APPLICATIONS
available for orthogonally invariant norms. However, for the spectral norm II . lis, Halmos (1972) has obtained the following result. Let A be a matrix of order nXn with real entries. Let B1 = (A+A')/2 and B2 = (A - A')/2 be the symmetric and skew-symmetric parts of A, respectively. Let C = B1 + [82 In + BiP/2, min{r ~ 0 : 1. 2 In + B~ is non-negative definite and B1 + (r2 In + Bnt is non-negative definite}. Then C is closest to A under the spectral norm. Now we concentrate on the problem of approximating a given matrix A E Mm,n of rank l' by a matrix B of lower rank k < l' such that IIA-BII is a minimum under an orthogonality invariant norm II . II. Such an optimization problem occurs in the representation of high dimensional data in lower dimensions. The symbol Mm,n(r) stands for the collection of all matrices of order m x n with rank ~ r. where 8
=
P 11.5.5 (Mirsky (1960), Eckart and Young (1936)). Let A ∈ M_{m,n}(r) with the singular value decomposition A = σ_1(A)P_1Q_1' + ... + σ_r(A)P_rQ_r', and define B_* = σ_1(A)P_1Q_1' + ... + σ_k(A)P_kQ_k', k ≤ r. Then
    min_{B ∈ M_{m,n}(k)} ||A - B|| = ||A - B_*||
for any orthogonally invariant norm 11·11. PROOF. Note that the above result provides the best approximation of A of rank r by a matrix of lower rank k. The proof consists of showing that lTi(A - B*) ~ lTi(A - B), i = 1, ... ,p, P = min{m, n}. Then the result follows by using the Complement 11.3.2. Let B E Mm,n of rank k. Since p(B) = k, it has a rank factorization B = CD where C E Mm,k(k) such that C'C = I and D E Mk,n(k). Then
(A - B)'(A - B)
= (A - CC'A + CC'A - CD)'(A - CC' A + CC'A - CD) = A'(I - CC')A
+ (C' A -
D)'(C' A - D) ~ A'(I - CC')A
(This inequality can also be established by regressing A on C in the model A = CD + c. The least squares solution in D is obtained by
Matrix Approximations
393
minimizing (A - CD)'(A - CD) with respect to D. The solution is given by D = (C'C)-IC' A.) Then using the Complement 11.4.3, since I - CC' is idempotent, u?[(A - B)] = Ad(A - B)'(A - B)]
~ AiA'[(I - CC')A] ~ uT+k(A), i
+k
:::; r,
which gives that ui(A - B) ~ ui+k(A). But uT(A - B.)
= UT(Uk+l (A)Pk+1 Q~+l + ... + oAA)PrQ~) = UT+k(A),
so that σ_i(A - B_*) ≤ σ_i(A - B), as required.
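Here is a short numerical sketch of P 11.5.5 (ours, not part of the original text; it assumes NumPy): the truncated SVD gives the best rank-k approximation, and its error is never larger than that of an arbitrary rank-k competitor.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, k = 8, 6, 2
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
B_star = (U[:, :k] * s[:k]) @ Vt[:k, :]           # sigma_1 P_1 Q_1' + ... + sigma_k P_k Q_k'

# Frobenius error of the truncated SVD: (sigma_{k+1}^2 + ... + sigma_r^2)^{1/2}.
assert np.isclose(np.linalg.norm(A - B_star, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))

# Any other rank-k matrix does at least as badly (random low-rank competitor).
B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
assert np.linalg.norm(A - B_star, 'fro') <= np.linalg.norm(A - B, 'fro') + 1e-10

# The same holds for the spectral norm, as expected for orthogonally invariant norms.
assert np.linalg.norm(A - B_star, 2) <= np.linalg.norm(A - B, 2) + 1e-10
```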
What is the closest approximation by a normal matrix X to a given matrix A ∈ M_n? For this we have to solve the optimization problem:
    Minimize φ(X) ≡ ||A - X||²_F subject to g(X) ≡ X'X - XX' = 0.
There is no closed form solution to the problem. A numerical algorithm is given by Bao and Rokne (1987). Complements
In all the following problems, A and B stand as generic symbols for matrices in Mm,n. The symbol Ok stands for the collection of all k x k orthogonal matrices. 11.5.1 Show that min{IIU A - BTIIF : T E Mm,n and U E On} is attained at BT = PBU A and U = RP', where AA' = P /\1 p' and PB = R /\2 R' are the spectral decompositions of AA' and PB, the orthogonal projection operator on the space spanned by the columns of B, respectively. 11.5.2 Let A E Mn. Show that min{IIA-TIlF : T EOn} = IIA-T.IIF with T. = PQ', where P6.Q' is the singular value decomposition of A. 11.5.3 Show that the result of Complement 11.5.2 holds for any unitarily invariant norm. (Ky Fan and Hoffman (1955).) 11.5.4 Extend the result of Complement 11.5.3 to the case where A and Tare m x n matrices and the columns of T are orthogonal.
394
MATRIX ALGEBRA THEORY AND APPLICATIONS
11.5.5 Let A E Mm n. Show that the minimum of IIA - BTII, under Frobenius norm, over'T E On is attained at T = QP', where P /\ Q' is the s.v.d. of A'B. [Use the result that for any orthogonal matrix U, Itr CUI ~ tr~, where ~ is the diagonal matrix of the singular values of
C.] 11.5.6 Let E E Mn(r) be non-negative definite with the spectral decomposition E = AIPIP{ + ... + ArPrP:, where Mn(r) is the class of all n x n matrices of rank ~ r. Show that for any unitarily invariant norm 11·11, the minimum of liE - LII when L runs through idempotent matrices of rank k ~ r is attained at
11.6. M, N-invariant Norm and Matrix Approximations
Rao (1979b, 1980, 1985d) extended the concept of unitarily invariant norm to a more general M, N-invariant norm which is useful in what are known as dimensionality reduction problems in statistical multivariate analysis. We define an M, N-invariant norm, prove some results in matrix approximations and indicate some applications.
DEFINITION 11.6.1. A matrix norm on the space M_{m,n} (i.e., matrices of order m × n) is said to be an M, N-invariant norm if, in addition to conditions (1), (2) and (3) in Definition 11.1.1, the following condition is satisfied:
(4) ||VXU|| = ||X|| for every X ∈ M_{m,n} and any V ∈ M_m and U ∈ M_n such that, with respect to given positive definite matrices M ∈ M_m and N ∈ M_n, V*MV = M and U*NU = N.
(Denote such a norm by ||X||_{MNi}.) The following proposition provides the connection between M, N-invariant and unitarily invariant norms.
P 11.6.2 Let M E Mm and N E Mn be positive definite matrices and M I / 2 and N I / 2 be Gramian square roots of M and N respectively, and M- I / 2 and N- I / 2 be those of M- I and N-I. The following results hold. (1) If IIXI!. is a unitarily invariant norm of X, then IIMI/2 X NI/2111 is an M, N-invariant norm of X.
Matrix Approximations
395
(2) If IIXII2 is an M, N-invariant norm of X, then IIM-l/2 X N-l/2112 is a unitarily invariant norm of X. PROOF. Let U E Mm and V E Mn be such that U* MU = M and V*NV = N. Then
    ||M^{1/2}UXVN^{1/2}||_1 = ||M^{1/2}UM^{-1/2} · M^{1/2}XN^{1/2} · N^{-1/2}VN^{1/2}||_1 = ||M^{1/2}XN^{1/2}||_1
since M^{1/2}UM^{-1/2} and N^{-1/2}VN^{1/2} are unitary matrices. This proves (1) of P 11.6.2. Part (2) is proved in a similar manner.
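As a small numerical sketch of P 11.6.2 (ours, not part of the original text; it assumes NumPy), take the Frobenius norm as the unitarily invariant norm, so that ||M^{1/2}XN^{1/2}||_F plays the role of the M, N-invariant norm. The matrices V and U below are constructed directly so that M^{1/2}VM^{-1/2} and N^{-1/2}UN^{1/2} are orthogonal, which is exactly the property the proof uses.

```python
import numpy as np

def spd_sqrt(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    lam, Q = np.linalg.eigh(S)
    return (Q * np.sqrt(lam)) @ Q.T

rng = np.random.default_rng(6)
m, n = 4, 3

W1 = rng.standard_normal((m, m)); M = np.eye(m) + W1 @ W1.T     # positive definite M
W2 = rng.standard_normal((n, n)); N = np.eye(n) + W2 @ W2.T     # positive definite N
X = rng.standard_normal((m, n))

Mh, Nh = spd_sqrt(M), spd_sqrt(N)
Mhi, Nhi = np.linalg.inv(Mh), np.linalg.inv(Nh)

def mn_norm(X):
    """Norm of P 11.6.2 (1): Frobenius norm of M^(1/2) X N^(1/2)."""
    return np.linalg.norm(Mh @ X @ Nh, 'fro')

# Transformations built so that Mh V Mhi and Nhi U Nh are orthogonal.
O1, _ = np.linalg.qr(rng.standard_normal((m, m)))
O2, _ = np.linalg.qr(rng.standard_normal((n, n)))
V = Mhi @ O1 @ Mh            # then V' M V = M
U = Nh @ O2 @ Nhi            # then U N U' = N

assert np.allclose(V.T @ M @ V, M)
assert np.allclose(U @ N @ U.T, N)
assert np.isclose(mn_norm(V @ X @ U), mn_norm(X))    # the induced norm is invariant
```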
P 11.6.3 Let A E M m .n , and M E Mm and N E Mn be positive definite matrices. Let B E Mm.r and C E M n •k be such that B* M B = Ir and C* NC = h. Then (11.6.1)
where q = min{r,k}. The upper bound in (11.6.1) is attained when B consists of the first r columns of M- 1 / 2P and C consists of the first k columns of N-l/2Q, where P and Q are unitary matrices in the s.v.d. of M- 1 / 2 AN- 1 / 2 • PROOF. The result (11.6.1) follows from Poincare's Separation Theorem P 10.4.2 for singular values.
P 11.6.4 Let A E M m .n , X E M m .a and Y E Mn.b be given matrices. Let C E M a.n , R E Mb.n, and G E M m .n with rank p(G) = k $ r = p(A). Denote orthogonal projection matrices on the column spaces of X and Y by
P
= X(X' M- 1 X)- X' M-I,
Q
= y(y' N- 1 Y)-Y' N- 1
where M and N are given positive definite matrices. Then:
(1)
rninIlA-XCIl = IIA-PAII
(11.6.2)
G
for any M, N-invariant norm. (Choose any C such that XC = PA.)
(2)
rninIiA-RY'1i = IIA-AQ'II
(11.6.3)
R
for any M, N-invariant norm. (Choose any R such that RY' = AQ'.)
MATRIX ALGEBRA THEORY AND APPLICATIONS
396
(3)
min IIA - XC - RY'II C,R
= IIA - PA - AQ' + PAQ'II
(11.6.4)
= PA,
P)AQ'.)
for any M, N-invariant norm. (Choose XC (4)
min
p(G)=k
IIA-GII =
RY'
= (I -
IIA-Goll
(11.6.5)
for any M, N-invariant norm, where
M - 1 / 2G oN-l/2 = 0"1 U1 lr' vI and
O"i,
U i and
+ ... + O"k Ukv,'k'
(11.6.6)
Vi are as in the s.v.d.
M- 1/ 2AN- 1/ 2 = 0"lU 1V{
+ ... + O"kUk V~ + O"k+lUkV~ + ... + O"rUrV;. (11.6.7)
The results (11.6.2) and (11.6.3) follow from the inequalities on singular values PROOF.
O"i(M- 1 / 2(A - XC)N- 1/ 2) 2 O"i(M- 1 / 2(A - PA)N- 1/ 2), i = 1,2, .. . O"i(M- 1 / 2(A - RY')N- 1 / 2) 2 O"i(M- 1/ 2(A - AQ)N- 1 / 2), i = 1,2, .. . by using Ky Fan's Theorem (Complements 11.3.1 and 11.3.2). It is interesting to note that the result (11.6.2) is independent of N and (11.6.3) is independent of M. The solutions (11.6.2) and (11.6.3) are useful in studying regression problems in multivariate analysis. Result (3) is proved as follows. For any given R
N-l/2(A - XC - RY')' M(A - XC - RY')N- 1 / 2
2 N-l/2(A - RY')'(I - P)' M(I - P)(A - RY')N- 1 / 2 (11.6.8) from which it follows, using the notation O"i{-) for the i-th singular value,
O"i[N- 1 / 2(A - XC - RY')M- 1 / 2] 2 O";[N- 1/ 2(A - RY')'(I - P)' M- 1 / 2 ]
Matrix Approximation"
397
= O"i[M- 1 / 2(I - P)(A - RY')N- 1 / 2] ~ O"i[M- 1 / 2(I - P)(A - AQ')N- 1 / 2]
= O"i[M- 1/ 2(A - PA - AQ'
+ PAQ')N- 1/ 2].
(11.6.9)
Result (3) follows from (11.6.9) by using Ky Fan's Theorem (Complement 11.3.2). Result (4) is a direct consequence of the Complement 11.3.5. P 11.6.5 Let A, X, Y, C, Rand G be as defined in P 11.6.4 with the additional conditions p(X) = a, p(Y) = band p(G) = k ~ min(ma, n - b). Let P and Q be projection operators as in P 11.6.4 and
0"1 U1V{
+ ... + O"rUr V:
be the s. v .d. of Then infliA -XC- RY' -Gil taken over C, Rand G, for any M, N-invariant norm, is attained at
= PA, RY' = (1 - P)AQ' G = Ml/2[0"IU 1V{ + ... + O"kUkV£]N1/2.
(11.6.10)
XC
PROOF.
(11.6.11)
Observe that
O"i[N- 1/ 2(A - XC - RY' - G)M-l/2] ~ O"i[N- 1/ 2(A - RY' - G)'(1 - P)' M- 1/ 2] = O"dM-I/2(1 - P)(A - RY' - G)N-l/2]
~ O"i[M- 1 / 2(I - P)(A - G)(I - Q)' N- 1/ 2]
~ O"i+k[M- 1/ 2(1 - P)A(1 - Q)' N- 1 / 2], (i
+ k)
~r
which shows, using Complement 11.3.5, that
IIA -
XC - RY' - Gil ~ 11(1 - P)A(I - Q')II
for any M, N-invariant norm. It can be easily verified that the equality is attained when C, Rand G are chosen as in (11.6.10) and (11.6.11). Note that the solution for optimum C and R may not be unique.
398
MATRIX ALGEBRA THEORY AND APPLICATIONS
11.7. Fitting a Hyperplane to a Set of Points
In discussing vector spaces, we introduced the concept of a subspace of a vector space V, as a subset of vectors closed under the operations of addition and scalar multiplication. We now define subsets of vectors which are obtained by translating a subspace. They are called hyperplanes. DEFINITION 11.7.1. Let V be a vector space of dimension m, and a a vector in V, and B a subspace of dimension k ::; m. A k dimensional hyperplane specified by a and B is defined to be the set of vectors
Hk(a, B) = {a +x: x E B}.
(11.7.1)
In the special case when V = R m , the elements of V are m-vectors and the set (11.7.1) can be represented as
Hk(b,A)
= {x: Ax = b}
(11.7.2)
where A E Mm-k,m with p(A) = m - k and b E Mk,l. A problem of great interest in statistics and econometrics is the following. We have a given set of n points (vectors), Xl, ... ,X n , in a vector space V of dimension m. We would like to fit a k dimensional hyperplane to the n points, i.e., determine a k dimensional hyperplane to which the given set of points are closest in some sense. We will develop some criteria of closeness and discuss how to fit such hyperplanes. This problem was raised by Karl Pearson (1901) and solved in particular cases. Pearson's solution is the forerunner of principal component and related analyses which are currently used in statistical multivariate analysis. First we formulate the problem in general terms and consider some special cases. Let us consider a Hk(a, B) for given k, a and B, and let Zi be such that (11.7.3) where 11 · 11 is a chosen vector norm defined on V. By (11.7.3) we have associated with the given point Xi a point Zi on Hk(a, B). We now define a compound measure of closeness by (11.7.4)
Matrix Approximations
399
using suitable weights Wt, ... , W n . Then L is a function of 0: and B for given k. We now determine 0: and B by rrunimizing L with respect to 0: and B . We have used only the concept of a vector norm in the above formulation of the problem and not any particular structure of V. Let us now consider V = Rm, in which case Xt, . .. , Xn are column vectors which can be represented as a matrix X = (Xtl· . ·Ix n ) E Mm,n. The subspace B is represented by a matrix B = (btl·· ·Ibk ) where bi E R m and bt , ... , bk constitute a basis of B. In such a case a point on Hk(O:, B) can be written as 0: + By, where y E Mk,l . Let us associate with Xi a point o:+BZi in Hk(O:, B). Then the set of points on Hk(O:, B) associated with Xt, ... , Xn can be written in a matrix form 0:1'
+ BZ,
Z
= (zll ·· ·Izn)
E Mk,n.
Now the problem can be formulated as that of minimizing a matrix norm (11.7.5) IIX - 0:1' - BZII with respect to 0:, Band Z. A general solution to this problem for a wide class of norms is given in the following proposition. P 11.7.2 Let Q = 1(l'N- t 1)- t 1'N-t, where 1 is the vector with all components as unity, and F = M- t / 2 X(1-Q')N- 1/ 2 with the s.v.d. F = atUI V{
+ ... + arUrV:.
Then a set of 0:, Band Z which minimize (11.7.5) for any M, N-invariant .norm is given by 0:
= AN- 1 1(l' N- t 1)-t
B = Mt/2(U t l·· ·IUk) Z' = Nt/2(at VIi· . · Iak Vk). PROOF. The results of P 11.7.2 follow from the general theorem proved in P 11.6.5 by choosing the matrices involved appropriately. COROLLARY 11.7.3. The solution for 0:, Band Z which minimize (11.7.5) for any unitarily invariant norm is 0:
B
= =
1
-Xl = x (say) n (Uti · . ·IUk ), Z'
=
(at VIi· . ·Iak Vk)
MATRIX ALGEBRA THEORY AND APPLICATIONS
400
where
Ui, Ui
and Vi are as in the s.v.d.
Note 1. If the norm chosen is Frobenius norm, the solution is the same as in Corollary 11.7.3. We can then compute the minimum value min
Ot,B,Z
IIX - aI' - BZII} = U~+l + ... + u;.
In statistics, the closeness of fit of k dimensional hyperplane to given points is measured by the index
U~+l
+ ... + U:
22'
Ul
+ "' +Ur
Note 2. Let V{ = (ViI, ... ,Vin). Then from the Corollary 11.7.3, we find the best representation of Xi on H k is
so that by choosing U l , .•. ,Uk as coordinate axes and x as the origin, the m vector Xi can be represented as the point (UlVli, ... ,UkVki) in a k dimensional space. The coordinates UlVli, ... ,UkVki are called the first k principal components of Xi.
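The computation behind these notes can be sketched numerically as follows (ours, not part of the original text; it assumes NumPy and takes the Frobenius-norm case of Corollary 11.7.3 with equal weights). The fitted origin is the mean vector, the subspace basis comes from the left singular vectors of the column-centered data matrix, and the first k principal components are the corresponding scores.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, k = 4, 50, 2                     # m variables, n points, k-dimensional hyperplane

X = rng.standard_normal((m, n)) + np.array([[1.0], [2.0], [0.0], [-1.0]])
xbar = X.mean(axis=1, keepdims=True)   # the fitted origin (alpha = mean vector)

U, s, Vt = np.linalg.svd(X - xbar, full_matrices=False)
B = U[:, :k]                           # orthonormal basis U_1, ..., U_k of the fitted subspace
scores = s[:k, None] * Vt[:k, :]       # first k principal components of each x_i

# Best fit of the points on the k-dimensional hyperplane and the residual
# sigma_{k+1}^2 + ... + sigma_r^2 used to judge closeness of fit.
fit = xbar + B @ scores
residual = np.linalg.norm(X - fit, 'fro') ** 2
assert np.isclose(residual, np.sum(s[k:] ** 2))
```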
Note 3.
Let us get back to the compound measure of closeness (11.7.4) and consider some special vector norms. The results are given in the following propositions. P 11.7.4 Let X E Mm,l and z = a + Be where a E Mm,l , B E Mm,k, p(B) = k, and e E Mk,l' Let E be a positive definite matrix. Then min(x - z)'E-l(x - z) c
(11.7.6)
is attained at and the minimum value is (11.7.7)
Matrix Approximations
401
where PB = B(B'E-IB)-IB'E-I. PROOF. The results are easily established by differentiating the qua-
dratic expression in (11.7.6) with respect to the variable c and solving for c (see Section 6.5 for vector and matrix derivatives). Now we consider the problem of minimizing the expression (11.7.4) with norm as in (11.7.6)
where Zi = a+Bci. First we minimize each term of (11.7.8) with respect to its c value and compute the minimum value using P 11.7.4, n
L Wi(Xi - a)'(E-
1
E- 1 PB)(Xi - a).
-
(11.7.9)
i=l
Minimizing (11.7.9) with respect to a, we find the minimum to be n
i=l n
=
n
L Wi(Xi - x)'E-1(Xi - x) - L Wi (Xi - x)'E- PB(Xi - x). 1
i=l
(11.7.10)
i=l
To further minimize with respect to B, we have to maximize n
L Wi(Xi - x)'E- PB(Xi - x) = tr(E- PBS) 1
1
i=l
where
n
S=
L Wi(Xi -
X)(Xi - x)'.
i=l
P 11.7.5 The maximum of tr(E- 1 PBS) over B is attained at B = EI/2Q., where Q. is the matrix of first k eigenvectors of the matrix E- 1/ 2 SE- 1/ 2 .
402
MATRIX ALGEBRA THEORY AND APPLICATIONS
PROOF. Note that tr{E- 1 PBS) = tr{E- 1 / 2E- 1 / 2PBE 1 / 2E- 1 / 2S) = tr(E-I/2PBEI/2E-I/2SE-I/2)
= tr{PT)
where T = E- 1 / 2SE- I / 2 and P = E- I / 2PBE 1 / 2 is an idempotent matrix of rank k. Then P can be expressed as
P
= QQ',
Q E
Mm ,k
and Q'Q
= Ik .
Now
tr{PT)
= tr{Q'TQ).
Using P 11.4.1, the optimum Q which maximizes tr{PT} is the matrix of the first k eigenvectors of T, which may be denoted by Q* . Then
(11.7.11) It is easy to see that the choice B = E-I/2Q* satisfies equation (II. 7.11) . This proves P 11.7.5.
Note: The vectors CI, ... ,Cn associated with the optimum B provide the best representation in a k dimensional space of points Xl, . • • ,Xn in the m{> k) dimensional space. They are called canonical coordinates and used in graphical representation of multivariate data as discussed in Rao (1948).
CHAPTER 12 OPTIMIZATION PROBLEMS IN STATISTICS AND ECONOMETRICS
12.1. Linear Models In this chapter, we consider some general optimization problems which are useful in the statistical analysis of linear models. A linear model is of the form y=X
nxl
nXm
/3+E
mxl
nxl
(12.1.1)
where Y is an n-vector random variable, X is an n x m matrix of rank p{X) = r ::; m, /3 is an m-vector of unknown parameters and E is an n-vector of error variables with E{ E) = 0 and covariance matrix D{E) = 0-2V an n x n non-negative definite matrix. The problems of interest are the estimation of the fixed parameters /3 and 0- 2 and the estimation or the prediction of the random component E. A mixed linear model is of the form (12.1.2) where Y,X,{3 and E are as in (12.1.1), Ui is an n x Pi matrix, ~i is a Pi-vector random variable such that E{~i) = 0 and
COV{~i) = 0-7Ipi' COV{~i' ~j) = 0, i
=1=
j, COV{~i' E) = 0
(12.1.3)
o-r,
for i = 1, ... ,k. The problems of interest are the estimation of /3, 0- 2 , ... ,o-~ and the prediction of the random components 6, ... '~k and
E.
12.2. Some Useful Lemmas In this section, we consider some general results used in optimization problems in statistics. The following notation and assumptions are used throughout this chapter. 403
404
MATRIX ALGEBRA THEORY AND APPLICATIONS
(1) Sp(A) and p(A) stand for the space generated by the columns of the matrix A and the rank of A, respectively. Al. is a matrix of maximum rank such that A' A l. = o. (2) V is n x n non-negative definite matrix, X is n x m matrix and W = (VIX). Note that p(W) = p(V + XX') and Sp(VIX) = Sp(V +XX').
P 12.2.1 The linear equations in matrices L of order n x k and A of order m x k, VL+XA= G1 X'L
(12.2.1)
admit solutions for any G 1 of order n x k and G2 of order m x k such that Sp(G1 ) C Sp(W) = Sp(VIX) and Sp(G2 ) C Sp(X'), respectively. PROOF. We only have to show that
Le. , if there exists a vector (a' : b') such that (12.2.2) then
[a'WI
[g~] = o.
(12.2.3)
Equation (12.2.2) is equivalent to
+ b'X' = 0, a'X = 0 => a'Va = 0, Va = 0, Xb = 0, X' a = 0 => a'W = 0, b' X' = o. a'V
=>
Then a'G' = 0 since Sp(Gt} C Sp(W) and b'G2 Sp(X'), which proves Lemma 12.2.l.
=0
since Sp(G2 ) C
The next lemma due to Rao (1989) is concerned with the minimization of a matrix function in the sense of LOwner.
Optimization Problems in Statistics and Econometrics
405
P 12.2.2 Let Lo and Ao be a solution of the matrix equations in L of order n X k and A of order m X k given by
VL+XA= F, X'L =P, where Sp(F)
c
(12.2.4)
Sp(V\X) and Sp(P) C Sp(X'). Then
L'V L - L' F - F'L '?:.L -F'Lo - p' Ao = -L~F - A~P
(12.2.5)
for all L such that X'L = P. PROOF. A general solution of X'L = P is L = Lo + X 1. A, where A is an arbitrary matrix. Substituting in the expression on the left hand side of (12.2.5), we find (using the equation V Lo = F - X Ao)
L'V L - L' F - F'L = L~ V Lo - L~F - F'Lo
+ (Xl. A)'V(Xl. A)
?:.L L~ V Lo - L~F - F'Lo = - F' Lo - p' Ao since (X 1. A)'V(X 1. A) is non-negative definite.
Complements 12.2.1 (A special case of Lemma 12.2.1) Consider the nonhomogeneous quadratic form Q(f) = f'Vf - 2f'w where f is an n-vector and w is an n-vector. Let p be an m-vector such that p E R(X'). Then min Q(f) =
X'f.=p
-00 {
-fow - p' >'0
if w rJ. R(V\X) if wE R(V\X)
where fo, >'0 is a solution of Vf
12.2.2
+ X>. = w,
X' f
= p.
Consider the matrix A and a g-inverse A-of A partitioned as
A-- [VX'
1 A- = [ClC
X 0'
3
MATRIX ALGEBRA THEORY AND APPLICATIONS
406
where V E Mn and nnd; and X E Mm ,n' Show that: 1a. lb. 2a. 2b. 2c. 2d. 2e. 3a.
peA) ~ m + n; peA) = m + n if p(V) = nand p(X) = m. peA) = p(VIX) + p(X). XC~X = X, XC3 X = X. XC4 X' = XC~X' = VC~X' = XC3 V = VC2X' = XC~V. X'CIX, X'CI V, VCIX are null matrices. VCI VCI V = VCI V = VC~VCI V = VC~V. TrVCI = p(VIX) - p(X) = TrVC~. If Sp(X) c Sp(V), then one choice of C ll C 2, C 3 , C 4 is
3b.
In general
where T = (X'(V + X'V X)- X)-, w = (V + X'V X)- and V is any matrix such that Sp(X) C Sp(V + XV X') and Sp(V) C Sp(V + XV X').
12.3. Estimation in a Linear Model We apply the result of P 12.2.1 to estimate X (3, E and characterizing the linear model
(72,
the triplet
(12.3.1) where we use the symbol D to denote the dispersion (variance covariance) matrix. The reason for choosing X (3 as a more natural parameter for estimation rather than (3 will be made clear once we define estimability of a linear parametric function. DEFINITION 12.3.1. A set of
k parametric functions P'(3 where Pis
an m x k matrix, is said to be unbiasedly estimable by linear functions of Y if there exists a k x n matrix L such that E( L'Y) = P' (3, a sufficient condition for which is Sp(P) C Sp(X'). Note that the entire m-vector parameter (3 is estimable only if Sp(I) = Sp(X'), i.e., when p(X') = m. The function X (3 is always estimable.
Optimization Problems in Statistics and Econometrics
407
LoY is said to. be the minimum dispersion unbiased estimator (MOUE) of an estimable function P' (3 if LoX = P' and DEFINITION 12.3.2. A linear function
D(L~Y)
= (j2 L~V Lo ~L (j2 L'V L = D(L'Y)
for all L such that L' X = P'. The following theorem provides the main result for computing the MOUE of estimable parametric functions of (3.
P 12.3.1
Let Lo and Ao be a solution of the equations
VL+XA=O X'L
=X'.
(12.3.2)
Then LoY is the MOUE of X{3 with the dispersion matrix, D(LoY) = -(j2XA o. PROOF. The covariance matrix of L'Y is (j2 L'V L. Using the result of P 12.2.1, the minimum of L'V L with L subject to X'L = X' is attained as stated in the theorem. The following results are consequences of P 12.3.1. (1) If ClI C 2, C3, -C4 are the partitions of any g-inverse of the matrix of the equations in (12.3.2), then Lo = C2X' and Ao = -C4X' and the estimate of X (3 is
and
D(X{3) A
xj3 = L~Y = XC2Y,
(12.3.3)
= (j 2 (-XAo) = (j 2 XC4 X' .
(12.3.4)
(2) Let P be an m x k matrix such that Sp(P) C Sp(X') in which case the function P' {3 is estimable and the MOUE of P' {3 is P' j3 where j3 = C 2Y with the covariance matrix (j2 P'C4 PTo establish this result, we note the following. (1) There exists an n x k matrix A such that P = X' A. (2) If T is MOUE of a vector parameter (), then A'T is MOUE of A'(} (Why?).
408
MATRIX ALGEBRA THEORY AND APPLICATIONS
Now we choose T = x/3 = XC2Y (Le., defining /3 = C 2Y) which is the MOUE of X {3. Then using (2), the MOUE of A' X (3( = P'{3} is A'X/3 = A'XC2Y = P'C2Y = P'/3. Further
D(P'/3) = D(A'X/3)
= A'D(X/3}A = a 2 A' XC4 X' A
using (12.3.3)
= a 2 P'C4 P.
(12.3.5)
The above results show that we may consider /3 as formally an estimate of {3 and a 2 C 4 as the dispersion of /3, in the sense that the MOUE of an estimable function P' (3 is P' /3 and D(P' /3) = a 2 P'Cov(/3)P = a 2 P'C4 P, although {3 as such may not be estimable. As we have estimated X {3 by X /3, it would appear natural to estimate 10, the error vector, by the residual Y - x/3. Is there a direct way of predicting 10 by minimizing a suitable criterion function? We attempt to do this by considering linear estimators of the form L'Y, where L is an n x n matrix.
P 12.3.2 The linear predictor L'Y of 10 has a bounded mean dispersion error (MOE) for all {3 only if L is such that X'L = O. Let (Le, Ve) be any solution of the equation
VL+XA= V, X'L
=0.
(12.3.6)
Then L~Y is the unbiased predictor of 10 with the minimum mean dispersion error of a 2 X Ae. PROOF.
The MOE of predicting 10 by L'Y is
E[(L'Y - E}(L'Y - 10)'] = E[(L' X{3 + (L' - I)E)(L' X{3 + (L' - I)E)'] = L' X{3{3' X' L+a 2(L-I)'V(L-I). (12.3.7) For any given L, the elements of (12.3.7) tend to infinity as the components of {3 tend to infinity. For boundedness, it is therefore necessary that X'L = O. Then the MOE is
(12(L - I)'V(L - I) = (12(L'V L - L'V - V L'
+ V)
~L E[(L~Y - E}(L~Y - E)']
Optimization Problems in Statistics and Econometrics
409
using the result of P 12.2.2, establishing that MMDE of E is L~Y with the dispersion matrix (apart from the multiplier 0"2) (L~VLe - L~V - VLe
+ V) =
(L~(V - XAe) - L~V - V
+ XAe + V)
= (I - Le)'XA e = 0"2 XA e,
where Le and Ae are any solutions of (12.3.6), and P 12.3.2 is proved. From P 12.3.1 and P 12.3.2, the estimates of Xj3 and E are L~Y and L~ Y, respectively. Do they add up to Y? Let us consider
+ L~Y - Y) 0"2(Lo + Le - I)'V(L o + Le - I) 0"2(Lo + Le - I)'(V Lo + V Le - V) 0"2(Lo + Le - I)' X( -Ae - Ao) 0"2(L~X + L~X - X)( -Ae - Ao) =
D(L~Y
= = = =
(12.3.8) 0
(12.3.9)
using the equations (12.3.2) and (12.3.6) and substituting for V Lo and V Le in (12.3.8) and for X'L o and X'L e in (12.3.9). The result shows
that L~Y + L~Y = Y with probability one. This result can also be established by using the explicit solutions to Lo and Le in terms of the Ci matrices.
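To illustrate the two sets of equations numerically, here is a small sketch (ours, not part of the original text; it assumes NumPy, with V taken positive definite and X of full column rank so that the bordered matrix is nonsingular). It solves (12.3.2) and (12.3.6), checks that L_o'Y reproduces the generalized least squares fit X(X'V^{-1}X)^{-1}X'V^{-1}Y, and verifies L_o'Y + L_e'Y = Y.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 10, 3

X = rng.standard_normal((n, m))
W = rng.standard_normal((n, n))
V = np.eye(n) + W @ W.T                     # positive definite dispersion (up to sigma^2)
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

K = np.block([[V, X], [X.T, np.zeros((m, m))]])   # bordered matrix of (12.3.2)/(12.3.6)

# Equations (12.3.2): V L + X Lam = 0, X'L = X'   (estimation of X beta).
sol = np.linalg.solve(K, np.vstack([np.zeros((n, n)), X.T]))
L0 = sol[:n]

# Equations (12.3.6): V L + X Lam = V, X'L = 0    (prediction of the error vector).
sol_e = np.linalg.solve(K, np.vstack([V, np.zeros((m, n))]))
Le = sol_e[:n]

Vi = np.linalg.inv(V)
gls_fit = X @ np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ Y)

assert np.allclose(L0.T @ Y, gls_fit)            # MDUE of X beta is the GLS fit
assert np.allclose(L0.T @ Y + Le.T @ Y, Y)       # fitted value plus predicted error recovers Y
```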
12.4. A Trace Minimization Problem Let C be the collection of all matrices C of order m x n with real entries, X a given matrix of order n x r, Vb V2, ... , Vk given matrices each of order m x n, and PI, P2, ... ,Pk given real scalars. Let
C1 = {C E C : CX = 0, Tr(C\ti') = Pi for each i}.
(12.4.1)
The objective of this section is to minimize Tr( CC') over all C E CI . If the collection CI is empty, we have no case to answer. Assume that C1 is non-empty. Let Px be the projection operator from Rn onto the subspace Sp(X), which has an explicit representation Px
= X(X' X)- X',
(12.4.2)
410
MATRIX ALGEBRA THEORY AND APPLICATIONS
where (X' X)- is any g-inverse of X' X. Let Q = I - Px and note that 0 =} C = DQ for some matrix D of order m x n. We are now ready to solve the problem. Consider the following system of linear equations
ex =
k
2)tr(ViQVj)]Ai i=1
= Pj, j = 1,2, ...
,k
(12.4.3)
in unknown A1, A2, ... ,Ak and check on their consistency. Let F be the symmetric matrix of order k x k given by F = (tr(ViQVj)),
(12.4.4)
i.e., the (i,j)-th entry of F is given by tr(ViQVj). The strategy is the following standard one. Let a be any vector of order k x 1 orthogonal to the columns of F. Then show that it is also orthogonal to the vector p, where P' = (P1,P2,'" ,Pk)' This would then imply that P E Sp(F) and hence the system (12.4.3) is solvable. Let a' = (a1' a2, ... ,ak)' If a is orthogonal to the columns of F, then we have k
L aitr(ViQVj) = 0 for each
= 1,2, ...
j
,k.
(12.4.5)
i=1 The operation trace is linear, so that from (12.4.5), we obtain k
=0
tr«L ai Vi)QVj)
for each
j
= 1,2, ... , k.
(12.4.6)
i=1 Multiplying (12.4.6) byaj and summing over j, we obtain k
k
tr«L ai Vi)Q(L ai Vi)') = O.
i=1
Since the matrix involved in the trace operation of (12.4.7) negative definite, we have k
k
«L ai Vi)Q(L ai Vi)') = O.
i=1
(12.4.7)
i=1
i=1
IS
non-
Optimization Problems in Statistics and Econometrics
411
Since Q is non-negative definite, k
k
Q(LaNi)' = 0 = LaiQ~'. i=l
We now show that a'p =
o.
k
Let C E C1 • Observe that k
a'p = L aiPi
k
aitr(C~')
=L
i=l
(12.4.8)
i=l
=L
i=l
aitr(DQ~')
i=l
k
= tr(D(L aiQ~')) = 0
(from (12.4.8)).
i=I
Thus the solvability of the system (12.4.3) of equations is assured. We are now ready to state the result which will solve the optimization problem.
P 12.4.1 The minimum of tr(CC') over all C E Cl is attained at any matrix C. given by k
C. = LAiViQ,
(12.4.9)
i=l
where AI, A2, ... ,Ak is a solution to the following system of linear equations: k
L[tr(ViQVj)]Ai
= Pi, j = 1,2, ...
,k.
i=l
PROOF. First, let us check whether C. belongs to Cl. We need to
verify, first, that C*X = O. This follows from the fact that
QX
= (I -Px)X = X
-X(X'X)-X'X
=X
-X
= O.
Second, we need to verify that tr(C* Vj) = Pi for each j. From the stipulation that Ai'S satisfy the linear equations above, it follows that k
tr(C* Vj)
= L[tr(ViQVj)]Ai = Pi· i=l
412
MATRIX ALGEBRA THEORY AND APPLICATIONS
Thus C* E CI . The next objective is to show that C* is optimal. Let C be any matrix in CI . One can always write C = C* + G, for a suitable matrix G . The fact that C E CI implies, at the outset, that 0= CX = (C* +G)X = C*X +GX = GX,
or more relevantly, that (12.4.10)
GX=O. This implies that GPx = O. (Why?) Further,
GQ
= G(I - Px) = G.
(12.4.11)
Additionally,
Pi = tr(CVj) = tr((C*
+ G)Vj) =
tr(C* Vj)
+ tr(GVj) =
Pi
+ tr(GVj)
for each j implies that tr( CVj) = 0
for each j.
(12.4.12)
Crucially, from (12.4.11) and (12.4.12), we observe that k
tr(C*G')
k
= tr((LAiViQ)C') = tr(LAiViQG') i=I
i=I
k
k
= tr(L Ai ViC') = L i=I
Aitr(ViG') = O.
(12.4.13)
i=I
Finally, we observe that
+ G)(C* + G)'] tr(C*C~) + tr(GG') + tr(C*G') + tr(GC~) tr(C*C~) + tr(GC') ~ tr(C*C~).
tr(CC') = tr[(C* = =
This completes the proof. We now focus on another problem of the same type as the one proposed at the beginning of this section. We assume that m = n. Let VI, V2 , ••• , Vk be a finite collection of known symmetric matrices of order m x m. Let X be a given matrix of order m x T. Let PI, P2, ... , Pk be real scalars. Let C2 = {C : C = C' of order m x m, CX = 0, tr (CVj) = Pi, 'lij}.
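A small numerical sketch of P 12.4.1 (ours, not part of the original text; it assumes NumPy, writes primes as transposes, and reads the coefficients of (12.4.3) as F_{ij} = tr(V_i Q V_j') so that the traces are defined for rectangular V_i). It builds Q, solves the linear system for the lambda_i, forms C_* and verifies the constraints defining C_1.

```python
import numpy as np

rng = np.random.default_rng(9)
m, n, r, k = 4, 6, 2, 3

X = rng.standard_normal((n, r))
V = [rng.standard_normal((m, n)) for _ in range(k)]
p = rng.standard_normal(k)

Px = X @ np.linalg.solve(X.T @ X, X.T)     # projection operator onto Sp(X)
Q = np.eye(n) - Px

# Coefficient matrix F with F[i, j] = tr(V_i Q V_j') and the system F lam = p.
F = np.array([[np.trace(V[i] @ Q @ V[j].T) for j in range(k)] for i in range(k)])
lam = np.linalg.lstsq(F, p, rcond=None)[0]

C_star = sum(lam[i] * V[i] @ Q for i in range(k))

# C_star satisfies the constraints of C_1: C X = 0 and tr(C V_j') = p_j.
assert np.allclose(C_star @ X, 0)
assert np.allclose([np.trace(C_star @ V[j].T) for j in range(k)], p)
```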
The objective is to rrunimize tr(C2 ) over all C E C2. The following result provides an explicit solution to the problem.
Optimization Problems in Statistics and Econometrics
P 12.4.2 given by
413
The minimum of tr( C2) is attained at any matrix C* k
C*
=L
(12.4.14)
AiQViQ,
i=l
where AI, A2, ... ,Ak is a solution to the system of linear equations: k
L[tr(QViQ\tj)]Ai
= Pj, j = 1,2, ...
,k.
(12.4.15)
i=l
We will not give a proof of this result. One can imitate the proof of P 12.4.1 to establish this result. One can notice right away that C* is symmetric and that C* belongs to C2. The consistency of the system of equations (12.4.15) follows in a similar vein. (Try it.) The solution to the optimization problems posed above may not be unique. Nonuniqueness arises if the matrices involved in equations (12.4.3) and (12.4.15) are nonsingular. In practice, we are not unduly concerned with the uniqueness problem.
12.5. Estimation of Variance Let us get back to the basic model: Y = X{3 + € with E€ = 0 and O(€) = (72V where (72 > 0 is unknown. In this section, we assume that V = I. The main objective of this section is the estimation of (72. Let Y'CY be a quadratic form in the data Y, for some symmetric matrix C, which we would like to propose as an estimator of (72. The basic requirement is that it be unbiased. Let us record a very general result on the expected value of a quadratic form.
P 12.5.1
Let Y be a random vector of order n x 1 with mean vector
IL and dispersion matrix (72V. Let C be any symmetric matrix of order n x n. Then (12.5.1) E(Y' CY) = (72 tr(CV) + IL' C IL. PROOF.
Observe that V
= E(Y -
IL)(Y - IL)'
= EYY' -
ILIt'. Also,
E(Y'CY) = Etr(Y'CY) = Etr(CYY') = tr(E(CYY'))
= tr(CE(YY'))
= tr(C«(72V + ILIL'}} = (72 tr(CV) + IL'CIL·
414
MATRIX ALGEBRA THEORY AND APPLICATIONS
In our model, we identify that V = I and J.L = X (3. Consequently,
E(Y'CY)
= 0'2tr(C) + {3X'CX{3.
Since we would like to have the quadratic form to be unbiased, we need to nullify the nuisance element present in its expected value, namely (3X'CX{3. We impose the following conditions on the matrix C: trC
=1
and
CX
= o.
(12.5.2)
Next, we would like the unbiased estimator Y'CY to have minimum variance. But we have to venture into the realm of fourth order moments of the random variables Yi's. The variance of Y'CY can be computed but it is complicated. One way to get out of this rigmarole is to assume that the random variables involved in the linear model have a multivariate normal distribution. Under this assumption Var(Y'CY) = 20'2tr(C2). In order to find a minimum variance estimator Y'CY of 0'2, we need to minimize trC2 subject to the conditions: trC = 1 and CX = O. This problem falls into the orbit of P 12.4.2. We identify that k = 1; Vl = I; and Pl = 1. The optimal matrix C .. is given by C .. = AlQVlQ = AlQ2 = AlQ, where Al satisfies the equation tr(QVlQVt}Al = Pl = 1, from which we obtain, Al = l/tr(Q). Consequently, the minimum variance quadratic unbiased estimator of 0'2 is given by
(trQ)-lY'QY = (trQ)-lY'(I - X(X'X)- X')Y.
(12.5.3)
Let us make a few comments on the above derivation. There is a standard approach available to solve this problem. One uses the method of least squares to estimate {3. As a matter of fact, a solution to the resultant normal equations in fj is given by fj = (X'X)- X'Y. Then one shows that the residual sum of squares
(Y - xfj)'(Y - xfj) = Y'(I - X(X'X)- X')Y has expectation (n - p(X))0'2, from which one obtains an unbiased estimator of 0'2. This estimator is precisely what we have obtained in (12.5.3). Our approach has given us something more. We showed that the estimator (12.5.3) has minimum variance under the normality assumption. The next goal is to dispense away with the normality assumption. This is the objective in the next section.
12.6. The Method of MINQUE: A Prologue The acronym in the title of this section stands for minimum norm quadratic unbiased estimation. Let us work in the realm of the linear model: Y = X f3 + c with Ec = 0 and D(c) = (J2 I. In the last section, under the pretext of multivariate normality, we obtained the minimum variance quadratic unbiased estimator of (J2. In this section, we will make an attempt to obtain an optimal quadratic estimator of (J2 without invoking the normality assumption. If the error vector c' = (cl' C2, ... ,cn) is observable, then a natural estimator of (J2 is n
I
(lin) LC; = (l/n)c'c = c'(-I)c. i=l
n
But only the data vector Y is observable. We need to build an estimator of (J2 based on Y. As professed earlier, we will entertain only quadratic estimators of (J2. Let C be a symmetric matrix of order n x n. We have seen, in the last section, that Y'CY is an unbiased estimator of (J2 if CX = 0 and tr(C) = 1. Under these restrictions, the quadratic form Y'CY simplifies to
+ c)'C(Xf3 + c) f3'X'CXf3 + c'Cc + f3'X'Cc + c'CXf3 =
Y'CY = (Xf3 =
c'Cc.
Consequently, the difference between what is contrived and what is ideal works out to be: Y'CY - c'(n- 1 I)c = c'(C - n- 1 I)c.
The critical idea behind the method of MINQUE is to choose C such that it is close to (1/n)I. Choosing the Frobenius norm (IIBIIF = tr(BB')), the objective is to minimize IIC - n- 1 IIiF subject to unbiasedness conditions that CX = 0 and tr (C) = 1. Observe that
Consequently, the above problem reduces to minimizing tr(C2) subject to the conditions that CX = 0 and tr(C) = 1. This is precisely what we
416
MATRIX ALGEBRA THEORY AND APPLICATIONS
have done in Section 12.5. The solution was identified and we obtained the estimator of (12 the usual one based on the residuals. There are two aspects of this method worth recording. One is that the usual estimator of (12 is optimal based on the underlying criterion of the method of MINQUE. The other aspect is that the method is amenable to generalization. This is what we are proposing to carry out in subsequent sections.
12.7. Variance Components Models and Unbiased Estimation Variance components models (random and mixed effects models) are routinely postulated for a variety of datasets. They also play a useful role in analyzing repeated measurements data. Any such model can be described by a linear model, Y = X f3 + c, with Ec = 0 and (12.7.1) where VI, V2 , •• • ,Vk are known non-negative definite matrices, O}, O2 , ... ,Ok are unknown real numbers and 0' = (O}, O2 , . .. ,Ok)' There may be additional conditions imposed on Oi'S. For example, Oi'S are bound by the restriction that V(O) be non-negative definite. For the time being, let us not put too much emphasis on the range of values for the vector 0, except for the minor technical condition that the range of values for the vector 0 has a non-empty interior, which is a topological condition. The quantities 0 1 , O2 , .. • ,Ok are called the variance components of the model. There may be a certain element of ambiguity in the specification of the matrices V}, V2 , .• • ,Vk . We can rewrite, for example,
The matrices 2- 1 VI, 2- 1 V2 , ••• ,2- 1 Vk are known non-negative definite matrices. Shall we call 201,202, ... ,20k as the variance components of the model? If we fix the matrices V}, V2 , ••• , Vk , variance components are probably uniquely determined. Another complication could arise. It may be possible that there are two distinct vectors (0 1 ,02 , ... ,Ok) and (Oi, O2, ... ,O'iJ such that 0 1VI + ... + Ok Vk = 0iVl + ... + OZVk. In such a contingency, estimation of all the variance components is not possible. We now assume that the matrices VI, V2 , ••. , Vk are fixed and the variance components are unambiguously defined. The main objective of
Optimization Problems in Statistic8 and Econometrics
417
this section is to find some good estimators of the variance components based on the data Y. More generally, let (12.7.2) be a linear combination of the variance components, where PI, P2, ... ,Pk are known. As usual, we will entertain only quadratic estimators Y'CY for the linear combination of the variance components. We need to impose conditions on the matrix of the quadratic form in order that it be unbiased. Note that k
PIOI + P2 02 + ... + PkOk = E(Y'CY) =
L Oitr(CV;) + f3' X'CXf3. i=1
A set of necessary and sufficient conditions for Y'CY to be an unbiased estimator of the given linear combination of the variance components are: X'CX = 0 and tr(CV;) = Pi, i = 1,2, ... ,k. (12.7.3) Will there be a symmetric matrix C at all satisfying (12.7.3)? We need a criterion for the existence of a solution to (12.7.3) in terms of the matrices VI, V2, ... , Vk and X. Construct the following symmetric matrix of order k x k, H = (tr(V;Vj - PV;PVj)),
(12.7.4)
where P denotes, as usual, the projection operator X(X'X)- X'. The matrix H is computable once we know V;'s and X. We will show that there is a matrix C satisfying (12.7.3) if and only if the coefficient vector P consisting of the components PI, P2, ... , Pk belongs to Sp( H}. We will now prove this assertion. A general solution to the equation X'CX = 0 in unknown C is given by B - P B P for some symmetric matrix B. (How?) Suppose there exists a matrix C satisfying (12.7.:3). Then C = B - PBP for some symmetric matrix B. Note that for each i,
Pi = tr(CV;) = tr(BV; - PBPV;) = tr(BV; - BPV;P) = tr(B(V; - PV;P)) = (vec(Vi - PV;P))'(vec(B)). Let us take a critical look at the last step. We use the vec operation to express the trace of a product of two symmetric matrices. Recall
418
MATRIX ALGEBRA THEORY AND APPLICATIONS
the vec operation from Section 6.2. The vec operation performed on a matrix stacks all the entries of the matrix column by column into a single column vector. IT E = (eij) and F = (Iij) are two symmetric matrices of the same order, then tr(EF} = Li Lj eij!ij = (vec(E))'(vec(F)). (Try it.) Note that
[VeC(Vl - PVIP)]']
[vec(V2 - ~~2P)]'
=
[
[vec(B)] .
(12.7.5)
[vec(Vk - PVkP)]'
Let T be the matrix of order k 2 x k given by
Then (12.7.5) can be rewritten as
p = T'vec(B). What this means is that the vector p is a linear combination of the columns ofT'. Consequently, pE Sp(T'), the range space of the matrix T'. But Sp(T') = Sp(T'T). Let us compute T'T. The (i,j)-th entry of T'T is given by, in view of what we have observed about the relationship between the vec operation and trace, [vec(Vi - PViP)]'[vec(Vj - PVjP)] = tr«Vi - PViP)(Vj - PVjP))
= tr(ViVj - ViPVjP - PViPVj + PViPPVjP) = tr(ViVj - PViPVj), since the projection operator is idempotent. This entry is precisely the (i,j}-th entry of H. Consequently, T'T = H, and hence p E Sp(H). Conversely, suppose p E Sp(H} = Sp(T'}. Then there exists a column vector l of order k 2 x 1 such that p = T'l. Now undo the vec operation. Let B be the matrix of order k x k such that vec(B} = l. (How?) The matrix B may not be symmetric. Let
c=
(1/2}[B
+ B' -
P(B + B'}P].
(12.7.6)
Optimization Problems in Statistics and Econometrics
419
This matrix G satisfies all the constraints of (12.7.3) . The symmetric matrix G defined in (12.7.6) satisfies X'GX = O. In order to establish this, one merely uses the fact that PX = X and X'P = X'. Using (12.7.6) and after some simplification, we get tr(GVi) = (1/2)tr[(B
+ B')Vi -
P(B
+ B')PVi]
= [vec(Vi - PViP)]'[vec(B)] .
(l2.7.7)
A critical look at (12.7.7) yields that tr(GVi) = Pi for every i. The above presentation clearly spells out when one can estimate a given linear combination of the variance components unbiasedly using quadratic estimators. Let us enshrine this in the following proposition. P 12.7.1 Let PI(h + ... + PkOk be a linear combination of the variance components. Then there exists a quadratic unbiased estimator of the combination if and only if the vector of coefficients P E Sp(H). 12.8. Normality Assumption and Invariant Estimators Let us get back to the main track. In what follows, we assume that there are quadratic unbiased estimators available to estimate a given linear combination of the variance components. We need to venture beyond unbiasedness. One approach would be to seek a symmetric matrix G such that Y'GY is unbiased and has minimum variance in the class of all unbiased quadratic estimators of the given linear combination of the variance components. But the computation of the variance of the quadratic form involves third and fourth order moments of the error random vector and it is complicated. However, the variance of an unbiased quadratic estimator has a simple form of expression provided we assume normality for the distribution of the error random vector. Under normality, var(Y'GY) = 2tr((GV)2) + 4{3' X'GVGX{3. But the minimization of var(Y' GY) over all quadratic unbiased estimators is fraught with difficulties. The variance involves not only the dispersion matrix V but also the parameter vector {3. The supposition that Y'GY is an unbiased estimator of p'O imposes the condition that X'GX = O. This will not make the term involving {3 to go away. One way to get rid of this term is to impose the stronger condition that GX = O. We not only want our estimators to be unbiased but have a little extra to guarantee that G X = O. Is there a statistical or mathematical interpretation of
420
MATRIX ALGEBRA THEORY AND APPLICATIONS
the condition CX = o? We will provide one interpretation of the condition. Let /30 be an arbitrary but fixed vector of order m x 1. Consider the new data vector Y* = Y + X /30, which is observable as soon as Y is observed. Let /3* = /3 + /30. Since /3 is an unknown parameter vector in our original linear model, /3* is also unknown. Consider the linear model: Y* = X /3* + c. Structurally, this model is no different from our original model. Either of them could be used to estimate the parameter vector /3 and the variance components. Any estimator one provides to estimate the variance components whether one uses the data Y or Y* should lead to the same value. This is the idea behind in the following definition. DEFINITION 12.8.1. A quadratic form Y'CY in the data is said to be an invariant unbiased estimator of a linear combination p'e of variance components if it is unbiased and Y'CY = (Y + X/3o)'C(Y + /30) for all /30 E Rffl . There is nothing new in the idea of invariance. In a substantial number of decision theoretic procedures, invariance property plays a crucial role in deriving some optimal procedures. The invariance property introduced in Definition 12.8.1 is akin to shift invariance. The collection of shift transformations, Y ---. Y + X /30, is indexed by /30, where /30 roams freely over the Euclidean space R ffl • The condition of unbiasedness imposes restrictions on the matrix C. We have already seen that we must have that X'CX = 0 and tr(C\Ii} = Pi for all i. The additional property of invariance imposes also a restriction on C. The following,
+ X/3o}'C(Y + X/3o) = Y'CY + 2Y'CX/30 + /3hX'CX/3o ,
Y'CY = (Y
(12.8.1)
is valid for all /30 if and only if CX = O. Let us paraphrase the above in the following proposition. P 12.8.2. A quadratic form Y'CY in the data is an invariant unbiased estimator of a linear combination p'e of the variance components if and only if
CX = 0
and tr(C\Ii} = Pi, i = 1,2, . . . ,k.
(12.8.2)
Optimization Problems in Statistics and Econometrics
421
Understandably, the conditions (12.8.2) are stronger than the conditions (12.7.3). The next task is to mount an inquiry under what conditions there exists a symmetric matrix C meeting the requirements of (12.8.2)? Let us provide an outline how this can be done. The equations tr(CVi) = Pi, i = 1,2, ... ,k, can be viewed as a bunch of linear equations in the entries of C using the vec operation. But C needs to satisfy an additional restriction that CX = o. In order to accommodate this restriction, determine the general form of C satisfying CX = 0, and accommodate this in the other set of constraints which can be rewritten as tr(CVi) = [vec(Vi)]'[vec(C)] = Pi, i = 1,2, .. . , k. This is the technique we followed in Section 12.7 towards the buildup of P 12.7.1. We will not pursue this natural approach. We will try to exploit P 12.7.1 to determine precise conditions under which we can find an invariant unbiased quadratic estimator of p'(}. The key idea is to build another linear model in the framework of P 12.7.1. Any symmetric matrix C satisfying C X = 0 must be of the form C = BAB' for some symmetric matrix A and for some matrix B of maximal rank satisfying B' B = I, B' X = 0 and BB' = Q = I - P, where P is the projection operator X(X'X)- X'. (Show this.) Introduce a transformation Y. = B'Y. Observe that EY. = B' X f3 = 0 and D(Y.) = (}l VI- + ... + (}k Vk ., where Vi* = B'ViB. Thus we have a linear model: Y. = c. with the design matrix X. = 0, Ec. = 0 and D(c*) = D(Y*). Further, the new<}uadratic form
Y:AY. = y'BAB'y = y'CY. We use the new linear model to discuss unbiased estimation of p'(}. By P 12.7.1, Y:AY. is an unbiased estimator of p'(} if and only if P E Sp(H.), where with P. as the projection operator associated with the design matrix X. of the model. But X * = o. Consequently, P* = O. Let us simplify the matrix H •. Observe that tr(Vi.Vj. -P*Vi.P.Vj.) = tr(B'ViBB'VjB) = tr(B'ViQVjB) = tr(BB'ViQVj) = tr(QViQVj). Consequently, H. = (tr(QViQVj )). Let us rewrite the matrix H. involving the projection
422
MATRIX ALGEBRA THEORY AND APPLICATIONS
operator directly. Observe that
tr{QViQVj) = tr{(I - P)Vi(I - P)Vj ) = tr{ViVj) + tr{PViPVj) - tr{PViVj) - tr{PVjVi). One might get the impression that the invariance property is just introduced to accommodate the normality assumption, viz., to make the variance of the unbiased estimator Y'CY of p'(} look elegant. This is partly true. But the concept of invarian(:e is intuitively appealing and it is not unreasonable to demand that the estimator be invariant, in addition.
12.9_. The Method of MINQUE We have broached unbiased estimation and invariant unbiased estimation of variance components in Sections 12.7 and 12.8. From now on, we will focus on developing an optimality criterion to select an o~ timal invariant unbiased estimator and optimal unbiased estimator of a variance component. In many variance components models, the error random vector has a structural representation, e = UI 6 + U26 + ... + Ukf,k, where Ui is a known matrix of order ni x ni, f,i is a random vector of order ni X 1 with mean vector 0 and dispersion matrix u'f In., i = 1,2, ... ,k, and In. is the identity matrix of order ni X ni. Further, 6,6, ... ,f,k are assumed to be independent. Under this canopy of assumptions, we have
D{e) = urUIU;
+ U~U2U~ + ... + UZUkU~.
In conformity with (12.7.1), we identify Vi with UiUi for each i. If f,i is observable, then a natural estimator of u7 is {1/ni)f,~f,i. Consequently, a natural estimator of PI u~ + P2U~ + ... + PkU~ is k
k
L(pdni)f,:f,i = L(pdni)f,:In.f,i i=I
i=I
= (f,~, f,~, . .. ,f,U D (f,~ , f,~, . .. ,f,~) I ,
where D is the block-diagonal matrix given by
(12.9.1)
Optimization Problems in Statistics and Econometrics
423
There is a reason in rewriting the natural estimator in the way it was written in (12.9.1). We will see shortly. But we need to develop an estimator of the given linear combination of variance components in terms of the data Y. Let us focus on invariant unbiased estimators of p'e. We have already noted down the necessary and sufficient conditions for the quadratic estimator Y'CY to be an invariant unbiased estimator of p'e. See P 12.8.2. Let us rewrite the estimator in terms of the structural representation of c. Note that, by (12.9.11),
Y'CY
= {3' X'CX{3 + +2(U16 + U26 + ... + Uk~k)'CX{3 + (Ul~l + U26 + ... + Uk~k)'C(Ul~l + U26 + ... + Uk~k) = (Ul~l + U26 + ... + Uk~k)'C(U16 + U26 + ... + Uk~k) = (~~,~~, ... ,~~)D.(~~,~~, ... ,~~)',
(12.9.2)
where D. is the block matrix whose (i,j)-th block is given by UfCUj . The critical idea behind the method of MINQUE is to get the proposed invariant unbiased estimator of p'e involving data close to the natural estimator. This is tantamount to getting D. close to D in some norm. The Frobenius norm of (D. - D) is given by
liD. - DIIF = tr«D. -
D)(D. - D)')
= treeD. -
D)2) k k k = L tr«U:CUi - (pi/n i)In J2) + L Ltr(U:CUjUjCUi) ~l
~l~l
i:f.j k
k
= L
tr(U:CUiU:CUi)
+ L(pi/ n i)2 tr (In,) i=l
i=l
k k k - 2 L(pi/ni)tr(U:CUi) + L Ltr(CUjUjCUiUI) i=l k
i=l j=l
i¢j
k
= Ltr(CUiU:CUiUI) i=l
+ LPUni i=l
k - 2 L(pi/ni)tr(CUiUI) i=l
k
+L
k Ltr(C"i CVi )
i=l j=l
i¢j
424
MATRIX ALGEBRA THEORY AND APPLICATIONS k
k
= Ltr(C\liC\Ii)
k
+ LpUni i=1
i=1 k
2 LpUni i=1
k
+ LLtr(CVjC\Ii) i=1 j=1
k k = tr((CL\Ii)2) - LPUni i=1
=
k tr(( CVo)2) - LPUni, i=1
i=1
where Vo = VI + .. .+Vk. The objective now is to deterrrune a symmetric matrix C which satisfies CX = 0 and tr (C\Ii) = Pi, i = 1,2, ... , k and rrunimizes (12.9.3) The optimization problem seems to fall within the purview of P 12.4.2. Not quite. We need to work a little bit more to bring the above problem to the format of P 12.4.2. Since Vo is non-negative definite, we can find a non-negative definite matrix B such that B2 = Vo. The matrix B can be termed as a square root of B. For convenience, one may write B as Vo1/2 . Note that
tr((CVo)2)
= tr((CVoCVo)) = tr((CB 2CB2)) = tr((BCB)(BCB)) .
Note that BCB is symmetric. In many applications, it will turn out that Vo will be positive definite. We will now assume that Vo is positive definite. Consequently, B will be positive definite. In such a case, knowing C is equivalent to knowing BCB. Let A = BCB. We need to transform the constraints into a set of constraints involving A. Note that CX = 0 is equivalent to B- 1 AB-I X = 0, which is equivalent to A(B- I X) = O. The condition that tr(C\Ii) = Pi is equivalent to tr(B-IAB-I\Ii) = Pi, which is equivalent to tr(A(B-1\liB-l)) = Pi. Let
X .. =B-1X
and
\Ii .. =B- 1\liB- I ,i=1,2, ... ,k.
(12.9.4)
The optimization problem translates into the following problem. Deterrrune a symmetric matrix A such that
AX.
=0
and tr(A\Ii .. ) = Pi, i
= 1,2, . . . , k
(12.9.5)
Optimization Problems in Statistics and Econometrics
425
and it minimizes tr(A2). This problem fits nicely into the format of P 12.4.2. An optimal solution A. is given by k
A. = L/\Q."i.Q.,
(12.9.6)
i=I
where AI, A2, ... , Ak satisfy the linear equations k
L[tr(Q. "i.Q* l'J.)]Ai
= Pi, i =
1,2, ... , k,
and
j=I
Once we compute A. , the optimal invariant unbiased estimator Y'C. Y can be computed with C. = B- 1 A.B- I • We label Y'C. Y as an optimal estimator according to the criterion of MINQUE.
12.10. Optimal Unbiased Estimation We are still in the operational mode of a linear model with error random vector containing several variance components: Y = X (3 + e, where Ee = 0, D(e) = O"fVI + ... + O"~Vk' VI, ... , Vk are known nonnegative definite matrices and O"f, ... , O"~ are unknown. Let PIO"r + ... + PkO"~ = p'O be a linear combination of the variance components O"r, ... ,O"~, where P' = (PI, . . . ,Pk) is known and () = (O"r, ... ,O"~). We are seeking an optimal unbiased estimator Y'CY of p'(). Unbiasedness means that we are seeking a symmetric matrix C satisfying
X'CX
=0
and tr(C"i)
= Pi
for all i.
(12.10.1)
Optimality means that we are seeking a symmetric matrix C such that (12.10.2) is a minimum. The entire process boils down to a mathematical problem of minimizing (12.10.2) with respect to the symmetric matrix C subject to (12.10.1). We will use the following result in our optimization problem.
426
MATRIX ALGEBRA THEORY AND APPLICATIONS
P 12.10.1 The minimum of tr{A2) over all symmetric matrices A subject to the conditions X' AX = 0 and tr{AVi) = Pi for all i, is attained at k
(12.1O.3)
A. = L"\{Vi - PViP), i=l
where P = X{X' X)- X' is the projection operator associated with the design matrix X, and AI, A2, ... ,Ak satisfy the linear equations: k
2:[tr(ViVj - PViPVj)]Ai
= Pj, j
= 1,2, ... ,k.
(12.1O.4)
i=l
PROOF. In order to prove that the given A. in (12.10.3) is optimal, we must ascertain that A. is admissible, i.e., A. satisfies the given constraints. This poses no problem. Next, we must determine the general structure of a symmetric matrix A satisfying X' AX = O. A symmetric matrix A satisfies X' AX = 0 if and only if A = T - PT P, for some symmetric matrix T. In a similar vein, there exists a symmetric matrix T. such that A. = T* - PT.P. We can rewrite any general solution A of the constraints as
A
= A. + (A - A.) = A. + [(T - T.) - P{T - T.)P] = A.
+ (S -
P S P),
where S = T - T •. Since
tr{{T - PTP)Vi) = Pi tr{{T. - PT.P)Vi)
= Pi
for all for all
~,
~,
it follows that we must have
tr«S - PSP)Vi) = 0 for all
~.
If we can show that tr(A 2) ~ tr«A.)2), the battle is won. Observe that
tr(A 2) = tr«A. =
+ (S - PSP))(A. + (S - PSP))) tr«A.)2) + tr«S - PSP)(S - PSP))
Optimization Problems in Statistics and Econometrics
=
+ 2tr(A.(S - PSP)) tr((A.)2) + tr((S - PSP)(S -
427
PSP))
k
+ 2 LAi(Vi -
PViP)(S - PSP)
i=l
= tr((A.)2)
+ tr((S -
PSP)2)
k
+ 2 LAi(Vi -
PViP)S
(Why?)
i=l
This completes the proof. This result is not applicable as it is to our optimization problem. We need to do some more work. In what follows we will assume that Vo is non-singular. Let B be a positive definite square root of Vo, i.e., B2 = Vo. Note that we can write
Let A = BGB, or equivalently, G = B- 1AB- 1. Knowing A is equivalent to knowing G. Let X. = X' B-1, Vi. = B-1 ViB-1, i = 1,2, ... ,k. The constraints can be rewritten involving A. More precisely, 0 = X'GX = X ' B-1AN-1X = X~AX. and Pi = tr(GVi) = tr((B- 1ViB-1)) = tr(AVi.). The optimization problem we originally started with is equivalent to maximizing tr(A2) over all symmetric matrices A subject to the constraints, X~AX. = 0 and tr(AVi.) = Pi for all i. This problem fits admirably into the framework of P 12.10.1. Let p. be the projection operator associated with the matrix X., i.e., P* = X.(X~X.)- X~. An optimal solution is given by k
A* = LAi(Vi* - P*Vi*P.), i=l
where AI, A2, . . . ,Ak satisfy the following linear equations: k
L[tr(Vi* Vj• - p. Vi.p. Vj.)]Ai = Pj, j = 1,2, ... ,k. i=l
428
MATRIX ALGEBRA THEORY AND APPLICATIONS
Once we obtain A., we can compute the required optimal matrix C. by, C. = B- 1 A. B- 1 • As a final flourish, we can lay down an optimal unbiased estimator of p'(} as Y'C.Y .
12.11. Total Least Squares We have seen that a linear predictor {3'x of a random variable y, where x E R m is a vector of predictor variables, is estimated by /3'x, where /3 is the solution to the optimization problem (12.11.1) where (Yi,xd,i = 1, ... ,n are observed data (see Sections 12.1-12.3). The expression (12.11.1) can be written as min (Y - X(3)'(Y - X(3),
(12.11.2)
t3ER'"
where Y' = (Y1, ... , Yn) and X' = (XII .. . Ixn) is an m x n matrix. The optimum {3 is a solution of the equation
x/3 = Px Y '* /3 = (X'X)-l X'Y,
(12.11.3)
where Px is the orthogonal projection operator on Sp(X). We may also formulate the problem as one of finding the optimum 0 for 2 (12.11.4) min II(YIX) - ((}IX)11 IiESp(X)
for a suitably chosen nonn, and then solving the consistent equation {j = X /3 to obtain /3. We now seek a solution to another problem of estimating a structural relationship () = ,'{3 between the latent variables () E Rand, E Rm, which are not directly observable. However, surrogate variables Y and X for () and, are available, which bear the stochastic relationships
Y = () + EI, V(Ed = x
=, + E2,
D(E2)
(]"2,
= E(E2E~) = V,
(12 .11.5)
Optimization Problems in Statistics and Econometrics
429
where V (.) stands for variance and D(·) for the dispersion (variancecovariance) matrix. We suggest a method of estimating 13 on the basis of independent observations (Yi, Xi), i = 1, ... ,n. The matrix of observations and the corresponding latent variables are respectively
[ ~I
(12.11.6)
Yn
using the structural relation () =
"1'13.
YI - "I~f3
We determine
13 by
minimizing
(XI - "11)' (12.11.7)
Yn - "I~f3
(xn - "In)'
for a suitably chosen norm. From statistical considerations, a suitable norm is n
n
i=1
i=1
(12.11.8) where a = l/u 2. We minimize (12.11.8) with respect to 13 and "II, ... ,"In in two stages. The minimum value of (12.11.8) with respect to "II,·· . ,"In
P 12.11.1 is n
1 + aaf3'Vf3l)Yi -f3'Xi)2 = 1 + aaf3 'Vf3(y- X f3)'(Y -Xf3), (12.11.9) 1=1
where Y'
= (YI, ...
PROOF.
,Yn) and X'
= (xII·· .Ixn).
Differentiating (12.11.8) with respect to "Ii, we have (12.11.10)
which gives =
af3'V f3Yi + x~f3 1 + af3'Vf3 Yi - xif3 1 + af3'Vf3·
(12.11.11)
430
MATRIX ALGEBRA THEORY AND APPLICATIONS
Also from (12.11.10) (12.11.12) Substituting (12.11.11) and (12.11.12) in (12.11.8) we have the expression (12.11.9). It is interesting to note that the expression (12.11.9) to be minimized is same as that for least squares except for the multiplying factor which is also a function of {3. The method of estimating {3 by minimizing (12.11.9) is called total least squares.
P 12.11.2
The vector {3 which satisfies the equation (12.11.13)
for the smallest value of A minimizes the expression (12.11.9). PROOF.
Differentiating (12.11.9) with respect to {3 and equating to
zero we get 2
ex V{3 ()'( (1 + ex{3'V(3) 2 Y -X{3 Y - X(3)
ex,
+ 1 +ex{3'V{3X (Y -X(3) = O.
(12.11.14) Denoting the expression (12.11.9) by A, equation (12.11.14) reduces to
X'X{3 - X'y = AV{3.
(12.11.15)
Now
A= =
ex
(Y - X(3)'(Y - X(3)
ex
[Y'(Y _ X(3) - (3' X'(Y - X (3)]
1 + ex{3'V{3 1 + ex{3'V{3
ex (Y' , , . = 1+ex{3'V{3 Y-Y X{3+A{3V(3), usmg(12.11.15}
Le.,
U
2
A=Y'Y-Y'X{3.
(12.11.16)
The two expressions (12.11.15) and (12.11.16) give equation (12.11.13).
Optimization Problems in Statistics and Econometrics
431
The method of obtaining f3 is now clear. Let A be the smallest eigenvalue of
Y'Y [ X'Y
Y'X] X' X wi th respect to
[u0
2
t]
(12.11.17)
for which the corresponding eigenvector has its first element not zero. Let Ak be such an eigenvalue and be not repeated. Further let Pkl, ... , Pk,(m+1) be the eigenvector corresponding to Ak. Then the unique estimate of f3 is /3' = -(Pk.)-l (Pk2," . ,Pk,(m+l»)' If Ak is repeated, then any combination of the corresponding eigenvectors will provide a minimizing solution. The solution we obtained is more general than the total least squares estimate of f3 computed on the assumption that V is a diagonal matrix with each diagonal element equal to the same u 2 • If we have some knowledge of u 2 and V, then a better estimate can be obtained. Now we consider another situation where some of the surrogate x variables are latent variables observed without error. In such a case we write the structural equation as () = x;f31 +,~f32' where for () and 12 we have surrogate variables Y and X2. Let us assume, for simplicity, that the errors in y and the components of X2 are independent and have the same variance. In such a case, the total least squares estimate of f31 and f32 may be obtained by minimizing n
n
L(Yi - x~if31 - 1~dh)2 ~1
+ L(X2i -
12i)'(X2i - 12i)
(12.11.18)
~1
with respect to ,2i, i = 1, ... ,nj f31 and f32' As in P 12.11.1, it can be shown that the minimum value of (12.11.18), when minimized with respect to 12i, i = 1, ... ,n is n
1
"(
,
')2
1 + {3' f3 ~ Yi - xlif31 - x2if32 2 2 i=l =
1+
~, f3
2 2
(Y - X 1f31 - X 2(32)'(Y - X 1f31 - X 2(32)(12.11.19)
with the usual notation for Y, XI and X 2. A further minimization with respect to f31 gives estimate of f31 given f32 as /31 = (X~X.)-l Xi
(Y - X2f32).
432
MATRIX ALGEBRA THEORY AND APPLICATIONS
Substituting this estimate in (12.11.18), the expression to be minimized with respect to /32 comes out to be (12.11.20) where PX1 is the orthogonal operator on Sp(X1 ). The rest of minimization with respect to /32 is done on the same lines as in P 12.11.2. Complements
12.11.1
Consider the matrix (12.11.7), which may be written as,
(Y
I X) - (r/3 I r)
= (Y -
r/3 I X - r)
and choose the squared norm tr[(Y -
r/3 I X
- r)W-1(y -
r/3 I X
- r)/J
(12.11.24)
where W is a posi tive definite matrix of order (m + 1) x (m + 1). Try to minimize the above expression with respect to /3 and r. The problem so formulated is the most general version of total least squares. Note: The material of the first 9 sections of this chapter is based on the papers by Rao (1959, 1965, 1972a, 1972c, 1976b, 1984, 1987, 1989), Rao and Mitra (1968b, 1969), Rao and Kleft'e (1988) and Rao and Toutenburg (1995). The model (12.11.5) is called errors-in-variables model in the statistical literature. Some discussion of these models is carried out in Fuller (1987), Anderson (1976, 1980, 1984) and Anderson and Sawa (1982). Ammann and Van Ness (1988, 1989) have conducted some simulation studies to assess some statistical properties of least squares and total least squares estimators in the context of a simple linear regression problem. The multivariate errors-in-variables regression model has been considered by GIeser (1981). The total least squares methods is also called orthogonal regression. Finally, one cannot fail to mention the timely contributions of Van Huft'el and Vandewalle (1991) and Van Huffel and Zha (1993) to the total least squares criterion.
CHAPTER 13 QUADRATIC SUBS PACES 13.1. Basic Ideas The basic object of investigation in this chapter is the collection A of all n x n symmetric matrices with real entries and certain subsets of A, where n is any positive fixed integer. It is clear that the collection A is a real vector space with the usual operations of addition and scalar multiplication of matrices. More precisely, let A = (aij) and B = (b ij ) be any two members of A. The sum of A and B is defined to be the n x n matrix C = (Cij) with Cij = aij +bij for all i and j E {I, 2, ... ,n}. Symbolically, one can write C = A+B. Further, if 0: is any real number, the scalar multiple of the matrix A by the scalar 0: is the n X n matrix D = (dij ) with d ij = o:aij for all i and j. Symbolically, one can write D = o:A. One can introduce, in a perfectly natural manner, an inner product < ., . > on the vector space A. For any two matrices A = (aij) and B = (b ij ) in A, let n
< A, B > =
n
L L aijbij = tr(BA). i=l j=l
The following properties of the inner product are transparent. n
n
L L
a;j ~ o. i=l j=l (2) For any A in A, < A, A > = 0 if and only if A = O. (3) For any A and B in A, < A, B > = < B, A >. (4) The inner product < -, . > on the product space A x A is bilinear, i.e., for any A, Band C in A and 0: and {3 real numbers,
(1) For any A in A, < A, A > =
< o:A +{3B,C > < A,o:B + {3C > -
0: 0:
< A,C > + {3 < B,C >, < A, B > + {3 < A, C > . 433
434
MATRIX ALGEBRA THEORY AND APPLICATIONS
In view of the properties enunciated above, the phrase "inner product" is apt for the entity < .,. >. It is time to introduce a special subset of the vector space A . DEFINITION 13.1.1. A subset B of A is called a quadratic subspace of A if the following hold.
(1) B is a subspace of A. (2) If B E B, then B2 E B. Some examples are in order. EXAMPLE
13.1.2 .
(1) At one end of the spectrum, the whole collection A is a quadratic subspace of A. At the other end, B = {O}, the set consisting of only the zero matrix, is a quadratic subspace of A. (2) Let A be a fixed symmetric idempotent matrix of order n x n. Let B = {aA : a real} . Then B is a quadratic subspace of A. (3) The idea behind Example (2) can be extended. Let A and B be two symmetric idempotent matrices of order n x n satisfying AB = o. (The condition AB = 0 implies that B A = o. Why?) Let B = {aA + {3B: a and {3 real}. Then B is a quadratic subspace of A. It is clear that B is a subspace of A. Observe that
(4) Example (3) can be further generalized to handle more than two idempotent matrices. This is one way of providing many examples of quadratic subspaces. These quadratic subspaces have a certain additional property which make them stand out as a special breed. This will be apparent when the notion of commutative quadratic subspaces is introduced. (5) Let us look at the case n = 2. The vector space A is threedimensional. Let A be any matrix in A. By the spectral decomposition theorem, one can write A = CDC' for some orthogonal
Quadratic Sub spaces
435
matrix C of order 2 x 2 and a diagonal matrix D Every orthogonal matrix C is of the form
C = [cos(
= diag{ a, b}.
-Sin(
cos(
for some angle <po The matrix A can now be written informativelyas A = [cos(
- sin(
cos(
[a
0
0] [cos(
- sin(
sin(
cos(
for some scalars a and b, and angle <po The matrix A is idempotent if and only if a = 0 or 1 and b = 0 or 1. The only three-dimensional quadratic subspace of A is A itself. The zerodimensional quadratic subspace of A is {O}. , A one-dimensional subspace of A is precisely the vector space spanned by a single idempotent matrix. There is a lot of flexibility in the construction of two-dimensional quadratic subspaces of A . Choose an angle <po Let
A = [cos(
- sin(
B = [cos(
- sin(
cos(
0
0]o [-cos(
0]1 [-cos(
sin(
Let B be the vector space spanned by A and B. Then B is a twodimensional quadratic subspace of A. Note that A and Bare idempotent and AB = o. There is another way of constructing a two-dimensional quadratic subspace of A. Let A = 12 , the identity matrix of order 2 x 2, and B is any other idempotent matrix such that A and B are linearly independent. (It will be instructive to construct an example of B.) Let B be the vector space spanned by A and B . Then B is a two-dimensional quadratic subspace of A. (Why? P 13.1.3 might be helpful.) In this example, note that AB i= o. It is time to jot down a few simple properties of quadratic subspaces. P 13.1.3 Let B be a subspace of A. Then the following statements are equivalent. (1) B is a quadratic subspace of A.
436
MATRIX ALGEBRA THEORY AND APPLICATIONS
(2) If A, B E B, then (A + B)2 E B . (3) If A, B E B , then AB + BA E B. (4) If A E B, then Ak E B, k = 1,2, .... PROOF. It is clear that Statement (4) implies Statement (1). Suppose (1) is true. One thing is clear. If A E B, then A2, A4, AS, At6, ... E A. It is not immediately obvious that A3 also belongs to B . Since B is a vector space, A + A2 E B . By the presumption of (1), (A + A2)2 E B. But (A + A2)2 = A2 + 2A3 + A4. Since B is a vector space, 2A 3 E B. Hence A3 E B. In a similar vein, one can show that every integral power of A belongs to B. Try it yourself. Thus (4) follows. It is clear that Statements (1) and (2) are equivalent . To show that (1)=>(3), note that (A + B)2 = A2 + B2 + (AB + BA). It now follows that AB + BA E B. It is clear that (3)=>(1).
P 13.1.4 Let B be a quadratic subspace of A. (1) If A, BE B, then ABA E B. (2) Let A E B be fixed. Let C = {ABA: B E B}. Then C is a quadratic subspace of B . (3) If A, B,C E B, then ABC + CBA E B. PROOF.
(A
(1) Let A, BE B. Observe that
+ AB + BA)2 = =
+ (AB + BA)2 + A(AB + BA) + (AB + BA)A A2 + (AB + BA)2 + (A2 B + BA2) + 2ABA. A2
Since B is a quadratic subspace of A, (A + AB + BA)2,A2,(AB + BA)2,A2B+BA2 E B . Hence ABA E B. (2) Let Bl and B2 E B, and a and (3 real numbers. Then
aABlA + (3 AB2A = A(aBt
+ (3B2) A E B .
This shows that C is a vector space. Let B E B. Then (ABA)2 = ABAABA = A(BA2 B)A. Since B is a quadratic subspace, A2 E B and hence BA2 BE B, by (1). Hence (ABA)2 E C. (3) From the deliberations carried out so far, it is clear that
A(BC + CB)
+ (BC + CB)A = B(AC + CA) + (AC + CA)B = C(AB + BA) + (AB + BA)C =
ABC + ACB + BCA + CBA E B, BAC + BCA + ACB
+ CAB E B, CAB + CBA + ABC + BAC E B.
437
Quadratic Sub spaces
A simple arithmetic,
[A(BC + CB) + (BC + CB)A]- [B(AC + CA) + (AC + CA)B] +[C(AB + BA) + (AB + BA)C] = 2(ABC + CBA), yields that ABC + CBA E B. The next item on the agenda is how to recognize a quadratic subspace when one sees one. Checking whether or not a particular subset B of A is a subspace is basically a simple act. Assume that one has a subspace B of A in hand. Let BI = {Bt, B2, ... ,Bk} be a basis of the vector space B. By checking a few things about B I , one can verify that B is indeed a quadratic subspace of A. The following result is in this direction. P 13.1.5 Let B be a subspace of A. The following statements are equivalent. (1) A E B t :} A2 E B. (B is a quadratic subspace of A.) (2) A,B E B I : } (A+B)2 E B. (3) A, BE B t :} AB + BA E B. PROOF. It is clear that Statement (1) implies Statement (2). Consider the implication (2):}(3). Let A, B E B t . Observe that (A + B)2 = A2+AB+BA+B2 E B, by (2). Since B is a vector space, AB+BA E B. Finally, consider the implication (3):}(1). Let A E B. If A E Bt, by taking B = A, one can observe that AB + BA = 2A2 E B. Hence A2 E B. If A ~ B I , one can write A = olB I + ... + OkBk for some scalars ai, a2, ... ,ak. Note that k
A2 = Lo~Bl
+ LOiOj(BiBj + BjBi). i<j
i=l
By just what was observed, each B'f E B. By (3), BiBj Consequently, A2 E B. This completes the proof. Complements 13.1.1
Show that the idempotent matrices A = 12 and
B=
[cos
2
.cp cos cp sm cp
Cos.cpsincp] sm2 cp
for any angle cp are linearly independent.
+ BjB i
E
B.
438
13.1.2
MATRIX ALGEBRA THEORY AND APPLICATIONS
For what values of
Acos
B -
. 2
[
sm
cos
linearly independent? 13.1.3 For what values of
cos
2 cos
cos
B _ [
linearly independent? 13.1.4 For n = 2, show that the matrices
A
=h B=
[t :. J '
and C
=
[~ ~ ]
form a basis for the vector space A. 13.1.5 Construct a basis of n(n + 1)/2 idempotent matrices for the vector space A, the collection of all n x n symmetric matrices. Prove a similar result for complex matrices.
13.2. The Structure of Quadratic Subspaces The spectral decomposition of a symmetric matrix is the focal point of this section. Recall that the spectral decomposition of a symmetric matrix provides a representation of the matrix as a linear combination of idempotent matrices. More precisely, let A be a symmetric matrix of order n x n, and AI, A2,'" ,An be its eigenvalues. Then we can write
for some idempotent matrices QI, Q2, ... ,Qn each of rank one. Further, these idempotent matrices have the property that QiQj = 0 for all i i= j. Some of the eigenvalues could be zero. Some eigenvalues could be repetitive. Let us tighten the representation in the following way.
Quadratic Sub spaces
439
DEFINITION 13.2.1. A symmetric matrix A is said to have a sparse spectral representation if it can be written as
for some distinct non-zero scalars J-LI, J-L2, ... ,J-Lk, and non-zero idempotent matrices PI, P2, ... ,Pk with the property that ~Pj = 0 for all i i= j. A sparse spectral representation is always possible for any symmetric non-zero matrix. Start with the usual spectral decomposition of the matrix. Throwaway all those idempotent matrices in the representation that correspond to zero eigenvalues. If a non-zero eigenvalue repeats, add up all the idempotent matrices that correspond to this eigenvalue. A sparse representation of the matrix entails. The following matrix is useful in what follows . This matrix is a relative of the well-known Vandermonde matrix. Let J-LI, IL2, . .. ,J-Lk be non-zero distinct real numbers and
J-LI
~=
[
~2 J-Lk
The matrix ~ of order k x k is non-singular. In fact, the determinant of ~ is given by I~I = (IIf=lJ-Ld[IIi>j(J-Li - J-Lj)]. If one looks at the development of ideas so far, the idempotent matrices seem to play a substantial role in the formation of quadratic subspaces. The trend continues with the following critical result from which something important emerges about the idempotent matrices in the sparse spectral decomposition of matrices in a quadratic subspace.
P 13.2.2 Let B be a quadratic subspace of A and A E B. Let A = J-LIPI + J-L2P2 + ... + J-LkPk be a sparse spectral decomposition of A. Then each Pi E B. PROOF. The critical idea in the proof is to demonstrate that each ~ is a linear combination of A, A2, ... ,Ak. Since B is a quadratic
subspace of A, it would then follow that each Pi E B. To accomplish this, note that
440
MATRIX ALGEBRA THEORY AND APPLICATIONS
A2
= ILl PI + IL2 P2 + ... + ILkPk, = ILiPI + IL~P2 + ... + IL%Pk,
Ak
= IL~ PI + IL~P2 + ... + ILZPk.
A
To begin with, let us show that PI is a linear combination of A, A2, ... , Ak. Consider the system of linear equations ~{3 = , in unknown vector {3' = ({3}, {32,.·· ,(3k), where " = (1,0,0, ... ,0) is a vector of order 1 X k. Since ~ is non-singular, the system of equations has a unique solution (3, i.e., satisfying the equations
+ ILiJ32 + ... + IL~{3k = 1, I L2{31 + IL~J32 + ... + IL~{3k = 0, ILI{31
ILk{31
+ ILZ{32 + ... + ILZ{3k =
0.
For this solution, observe that k k k k k i L{3iA = L{3i(LIL~Pj) = L(LIL~{3dPj = Pl. j=l i=l i=I j=l i=l The general argument should be clear by now. This completes the proof.
P 13.2.2 is helpful in many ways. Some of the benefits are chronicled in the following result. The existence of a basis for a quadratic subspace consisting of idempotent matrices is the centerpiece of this section. COROLLARY 13.2.3. Let B be a quadratic subspace of A. (1) If A E B, then the Moore-Penrose inverse A+ E B. (2) If A E B, then AA+ E B. (3) There exists a basis of B consisting of idempotent matrices. PROOF. (1) Let A E B and A = ILl PI +IL2P2 + ... +ILkPk be a sparse spectral decomposition of A. Note that A+ = (I/ILI)PI
+ (I/IL2)P2 + ... + (I/ILk)Pk •
By P 13.2.2, A+ E B . (2) Let A E B. Continuing the argument set out in (1), note that AA+ = PI + P2 + ... + Pk . Hence AA+ E B.
Quadratic Sub spaces
m
445
m
y'(2: 0i(0)Vi)Y - 2j3' X'(2: 0i(O)Vi)X(X' X)-l X'y i=l
i=l m
+ j3' X'(2: 0i(O)Vi)Xj3. i=l
In all these simplifications only the middle tenn is worked upon. The density can now be written as m
f(y; j3, 0) = c*(j3, 0) exp{ -(1/2)[2: 0i(O)y'Viy i=l m
- 2j3' X'(2: O;(O)Vi)X(X' X)-l X'yn i=l m
= c*(j3, 0) exp{2: CPi(O)y'Viy + A(j3, O)X'y}, i=l
where c*(j3, 0) = c(O) exp{ -(1/2)j3' X'(E::1 Oi(O)Vi)X j3}, and
CPi(O)
= -(1/2)Oi(0), i = 1,2, ... ,m, m
A(j3,O) = j3'X'(2:0i(O)Vi)X(X'X)-l. i=l
The family f(· ;j3,0), j3 E RP, 0 E n of densities constitute an exponential family and from the general theory of such families, it follows that statistic (X'Y, Y'V1Y, Y'V2 Y, ... , Y'VmY) is complete and sufficient. Once we identify a complete and sufficient statistic for a family of distributions, it becomes a simple task to identify Uniformly Minimum Variance Unbiased Estimators (UMVUE) of parameters of interest. In the following result, we follow this cue. The basic problem is to estimate linear functions of the components of j3 unbiasedly with minimum variance. This leads to the question as to what linear functions of the components of j3 can be estimated unbiasedly. This question has been tackled earlier. It suffices to estimate unbiasedly X j3. (Why?) The next
442
MATRIX ALGEBRA THEORY AND APPLICATIONS
for the subspace B such that C 1 C 2 = O. (Why?) Consider the following matrices
C 1 C3, C 2C3, C 1 -C1 C 3, C 2 -C2C3, C 3 -(C1 C 3 +C2C3), C 4 ,· ·· ,Ck • These matrices have the following properties: (1) Each one of them is idempotent. (2) The product of any two of the first five matrices is zero. (3) The matrices span B. We can then select a basis D 1 , D 2, . .. ,Dk for the subspace such that DID2 = DID3 = D2D3 = O. This process can be continued until we get a basis of B possessing the properties stipulated by the theorem. Complements 13.2.1 Let A be the collection of all 2 X 2 symmetric matrices. Are there quadratic subspaces of A which are not commutative? 13.2.2 Let A be the collection of all 3 X 3 symmetric matrices. Exhibit a quadratic subspace of A which is not commutative. 13.3. Commutators of Quadratic Subspaces In all the examples presented so far, one particular feature can be identified. In each subspace there is one element II such that it commutes with every member of the space. One can call II as a commutator of the subspace. In this section, we will prove the existence and uniqueness of a commutator for any given quadratic subspace. We need some preliminary results which will help to achieve the objective. P 13.3.1 If A and B are any two members of B, a quadratic subspace of A, then there exists T E B such that Sp(A) + Sp(B) = Sp(T). PROOF. Recall that Sp( A) is the space spanned by the column vectors of A. Let A = >'1Pl + A2P2 + ... + ,ArPr be a sparse spectral decomposition of A. Let P = PI + P2 + .. . + Pr . It is clear that P E B and Sp(A) = Sp(P). Let
T = P
+ (I -
P)B 2 (I - P).
Note that T E B. (Why?) Further, P[(I - P)B] = O. Consequently,
Sp(T) = Sp(P + (I - P)B 2 (I - P))
= Sp(P) El1 Sp(B(I -
P))
= Sp(P) El1 Sp((I -
= Sp(A) + Sp(B).
P)B 2 (I - P))
Quadratic Su.bspaces
443
P 13.3.2 Let B be a quadratic subspace of A. Then there exists a matrix II E B such that IIB = B for all B E B with the following additional properties. (1) II is unique. (2) IIB = BII for all B E B. (3) A E Band peA) = p(II) =? Sp(A) = Sp(II). (4) A E Band peA) = p(II) =? B = {ABA: B E B}. PROOF. Let the matrices B l , B 2 , ... ,Bk span the subspace B. By P 13.3.1, there exists Ti E B such that
for i = 2,3, ... ,k. Let II = TmT;t;. By Corollary 13.2.3, II E B. The matrix II will do the trick.
13.4. Estimation of Variance Components Consider a linear model Y = X/3 + e, where: (1) Y is a random vector of order n x 1. (2) X is a deterministic known matrix of order n x p. (3) /3 is an unknown parameter vector of order p x 1 III the pdimensional Euclidean space RP. (4) e is of order n x 1 with Ee = 0 and D(e) = EOi Vi. (5) 0 = (Ob O2 , • •• ,Om) is an unknown vector belonging to the parameter space
(6) The matrices VI, V2 , •.• , Vm are known linearly independent symmetric matrices with Vm = In. (7) The subset n of Rm has a non-empty interior. (8) The random vector e has a multivariate normal distribution. We will present later an example of a situation in which such a model known as a variance component model arises. If we can find a complete sufficient statistic for /3 E RP, 0 E n in the m
family Nn(X /3,
l: Oi Vi) i=l
of distributions, we can find good estimators
444
MATRIX ALGEBRA THEORY AND APPLICATIONS
of {3 and O. The symbol N n stands for n-variate normal distribution. Let B be the vector space spanned by the matrices VI, V2,··· , V1n . The following result is instrumental in paving a way for a good estimation of the variance components and the parameter vector {3. 1n P 13.4.1 Let N n (X{3, L: (JiVi), {3 E RP, 0 E n be a family of ni=l
variate normal distributions with n having a non-empty interior. Suppose that the following two conditions are met: (1) The space B spanned by the matrices Vb V2, ... , V1n is a quadratic subspace of A. (2) Sp(ViX) c Sp(X) for each i. (Recall that Sp(X) is the vector space spanned by the columns of X.) Then the vector statistic (X'Y, Y'V1 Y, Y'V2Y, ... , Y'V1n Y) is a complete sufficient statistic for the family. PROOF. If A E B, we have already seen that its Moore-Penrose inverse A+ also belongs to B. Let 0' = (0 1 , O2, ... , 01n) En. Thus we have (OiVi + ... + 01n V1n )-l E B. But we can write 1n 1n i=l
i=l
for some functions 0iO, 02(·), ... ,0;'0 on n, since Vb V2, ... , V1n form a basis for the vector space B. Let us analyze the second condition of the theorem. The condition Sp(ViX) C Sp(X) implies that ViX = XCi for some matrix C i . Assume, without loss of generality, that Rank(X) = p, i.e., the rank of X is full. (Why?) Let f('j {3, 0) be the joint density function of the components of the random vector Y. Then
m f(Yj{3,O) = c(O)exp{-(1/2)(y - X{3)'(2:0iVi)-I(y - Xf3)}, Y ERn, i=l
where c( 0) is the normalizing factor depending only on O. The expression in the exponent after excluding the factor - (1/2) simplifies to 1n 1n 1n y'(2: 0i(O)Vi)y - 2f3' X' 0i(O)Vi)y + f3' X' 0i(O)Vi)X {3 i=l
(2:
(2:
i=l
i=l
which after some further simplification reduces to
Quadratic Sub spaces
m
445
m
y'(2: 0i(0)Vi)Y - 2j3' X'(2: 0i(O)Vi)X(X' X)-l X'y i=l
i=l m
+ j3' X'(2: 0i(O)Vi)Xj3. i=l
In all these simplifications only the middle tenn is worked upon. The density can now be written as m
f(y; j3, 0) = c*(j3, 0) exp{ -(1/2)[2: 0i(O)y'Viy i=l m
- 2j3' X'(2: O;(O)Vi)X(X' X)-l X'yn i=l m
= c*(j3, 0) exp{2: CPi(O)y'Viy + A(j3, O)X'y}, i=l
where c*(j3, 0) = c(O) exp{ -(1/2)j3' X'(E::1 Oi(O)Vi)X j3}, and
CPi(O)
= -(1/2)Oi(0), i = 1,2, ... ,m, m
A(j3,O) = j3'X'(2:0i(O)Vi)X(X'X)-l. i=l
The family f(· ;j3,0), j3 E RP, 0 E n of densities constitute an exponential family and from the general theory of such families, it follows that statistic (X'Y, Y'V1Y, Y'V2 Y, ... , Y'VmY) is complete and sufficient. Once we identify a complete and sufficient statistic for a family of distributions, it becomes a simple task to identify Uniformly Minimum Variance Unbiased Estimators (UMVUE) of parameters of interest. In the following result, we follow this cue. The basic problem is to estimate linear functions of the components of j3 unbiasedly with minimum variance. This leads to the question as to what linear functions of the components of j3 can be estimated unbiasedly. This question has been tackled earlier. It suffices to estimate unbiasedly X j3. (Why?) The next
446
MATRIX ALGEBRA THEORY AND APPLICATIONS
problem is to estimate linear functions of the variance components. Let
>" = (>'t. >'2, ... ,>'m) be a given vector. Let P = X(X' X)-l X', and S = tr(~(In - P)Vj). The matrix: S is of order m x m with (i,j)-th element given by tr(~(In - P)Yj). We have seen that there exists an lll1biased quadratic estimator of >.'B if >. E Sp(S), i.e., there exists a vector a such that>. = Sa. (See Sections 12.7 - 12.10.) P 13.4.2 Let Y = X{3 + € be a variance components linear model satisfying all the conditions stipulated in P 13.4.1. Let >.'B be an estimable linear function of the variance components, i.e., >. = Sa for some vector a' = (a1' a2, . .. ,am)' Then: (1) The UMVUE of X{3 is X(X'X)-1 X'Y. m
(2) The UMVUE of >.'B is
2: aiY'(~ -
PViP)Y.
i=1
(1) It is enough to show that X(X'X)-1 X'Y is an lll1biased estimator of X {3. (This is one of the benefits of complete sufficiency of the vector statistic of P 13.4.1.) Note that E(X(X' X)-1 X'Y) = X{3. (2) In a similar vein, it suffices to show that the estimator proposed in (2) is an lll1biased estimator of >.'B. Before proceeding with the task, we do a simple computation to show that PVjP = YjP for each j. Since Sp(VjX) c Sp(X), there exists a matrix C such that YjX = XC. Hence PROOF.
PYjP = X(X' X)-1 X'YjX(X' X)-1 X' = X(X'X)-lC'X'
= X(X' X)-1C' X' X(X' X)-l X'
= X(X'X)-lX'Yj = PYj.
For the evaluation of the desired expectation, we need to recall the formula for the expectation of quadratic form of a random vector. Note that m
ELaiY'(Vi - PViP)Y i=1 m
m
= Laitr((Vi - PViP)Disp(Y))
+L
i=1 m
m
= Laitr((Vi - ViP) LBjYj) i=l
a i{3'X'(Vi - PViP)X{3
i=l
j=1
m
+L i=l
a i{3'(X'ViX - X'PViPX){3
Qu.adratic Su.bspaces
=
m
m
j=1
i=1
m
m
L OJ L
= LOj j=1
447
m
°i
LOi
tr((\Ii - \liP)Vj)
+L
0i
,B'(X'ViX - X'\liPX),B
i=1 m
tr(\Ii(In - P)Vj)
i=1
+0 =
LOjAj, j=1
in view of the fact that A = So. This completes the proof. We would like to indicate the use of P 13.4.2 in the context of design of experiments. We will focus specifically on Balanced Incomplete Block Designs (BIBD). Suppose we wish to compare the performance of some 1/ treatments, T 1 , T2 , ••• ,Til. We select b blocks each containing k experimental units. The treatments are assigned to the experimental units subject to the following conditions:
(1) All treatments are replicated the same number of times
T.
(2) Every pair of treatments occur the same number of times A together in the blocks. Such a design is called a BIBD. The word "incomplete" in BIBD stems from the fact that k < 1/. Consider one particular BIBD adopted for the purported problem of comparing the performance of the treatments. Let Yij be the response from the j-th unit of the i-th block and i(j) the subscript of the treatment that has been assigned to the j-th unit of the i-th block, j = 1,2, ... ,k and i = 1,2, ... ,b. The following is a model with the parameters defined in a natural way that would explain the variation one observes in the responses:
Yij = JL + 0i
+ ,Bi(j) + €ij,
i = 1,2, ... ,b; j = 1,2, ... ,k,
(1) JL is the general effect, (2)
(3) (4)
(5) (6)
(7) (8)
is the effect of the i-th block, ,Bj is the effect of the j-th treatment, €ij is the random error, It and ,Bj's are deterministic unknown parameters, Oi'S are independent identically distributed normal random variables with common mean zero and variance €ij 's are independent identically distributed normal random variables with common mean zero and variance a 2 , Oi'S and €i;'S are mutually independent. 0i
a;,
448
MATRIX ALGEBRA THEORY AND APPLICATIONS
The model is an example of a mixed linear model. The goal now is to put the mixed linear model in the format of Y = Xf3 + c. Let Y be the column vector of all the responses stacked block by block. It is of order n x 1, where n = bk. Let f3' = (iL, f3b f32, .. · ,f3v), a' = (0.1,0.2, ... ,ab),
where lk is a column vector of order k x 1 in which each entry is unity. The matrix X is a block matrix of order n X (1/ + I) and each Xi is of order k x 1/. The first row of Xi has 1 in the i(I}-th column and zeros elsewhere. Remember that the first unit of the i-th block received the treatment Ti(l) and Yil = iL + ai + f3i(l) + Cil. The secon
Y = Xf3 + Ua
+ InTI.
The reader is strongly urged to get a grip on this particular form of writing the mixed linear model. One can identify the vector C with Ua + InTI. The dispersion matrix of C works out to be D(c} = (1';UU' + (1'2 In. The matrix UU' is block diagonal n x n matrix with the i-th diagonal block as J k , a matrix of order k x k in which every entry is equal to unity. Thus we are back in the mould of a variance components model with m = 2, VI = UU', V2 = In, 01 = (1';, and O2 = (1'2. Observe that vl = kV1 •
Note: A seminal paper on quadratic subspaces is by Seely (1971). For further nourishment, pursue Seely (1971), Seely and Zyskind (1977), and Rao and Kleffe (1988).
CHAPTER 14 INEQUALITIES WITH APPLICATIONS IN STATISTICS In this chapter we provide a number of inequalities involving vectors and matrices, which have applications in statistics. Proofs are given for some propositions, while for others references to original sources, where proofs can be found, are given. Applications to statistical problems are indicated.
14.1. Some Results on nnd and pd Matrices Let us recall that an n X n Hennitian matrix A (Le., A = A*) is said to be non-negative definite (nnd) if x* Ax ~ 0 for every x E cn and positive definite (pd) if x* Ax > 0 for every non-zero x E cn. The following propositions characterize nnd and pd matrices.
P 14.1.1 A is nnd if and only if there exists a matrix U such that A = U*U, where U is such that Rank(A) = Rank(U). In fact, we can choose U such that U = U* and write A = U 2 = UU . A is pd if Rank(U) = n. The proposition follows easily from the spectral decomposition of A. In fact U can be chosen to be nnd. In that event we say that U is a square root of A. An nnd matrix is also called Gramian. A square root of A is computed as follows. Write the spectral decomposition of A in the style A = .x~ PiP; + .x~P2P2 + ... + .x~PkPk, where k = Rank(A), .xi, .x~, ... ,.x% are positive eigenvalues of A and Pi, P2, ... , Pk the corresponding orthonormal eigenvectors of A. An nnd (also called a Gramian) square root U of A can be taken as (14.1.1) 449
450
MATRIX ALGEBRA THEORY AND APPLICATIONS
P 14.1.2 Let Y be n x r matrix and denote by y.L, the matrix whose columns are orthogonal to those of Y and such that Rank(Y) + Rank(Y.L) = n. Then Py
= y(y*y)-y*
and 1- Py ,
(14.1.2)
are nnd, where (y*y)- is any g-inverse. PROOF. Let a E en be arbitrary. Write a = Ya1 vectors a1 and a2. (How?) Note that
+ y.La2
for some
a* Pya = a*Y(Y*Y)-Y*a = a~Y*Y(Y*Y)-Y*Yal = a~(Y*Y)al = (a~Y*)(Yad ~
o.
Consequently, Py is nnd. In a similar manner,
from which we conclude that 1- Py is nnd. We have seen the matrix Py before. It is the orthogonal projector from en onto Sp(Y). See Chapter 7. P 14.1.3
Let A be nnd and consider a partition
(14.1.3) where Au and A22 are square matrices. Then Au - A12A22A21 is nnd and unique for any g-inverse A22 of A 22 . PROOF. By P 14.1.1, one can write A = U*U for some matrix U. Partition U = (XIY) , where the number of columns of X is the same as the number of columns of Au . Note that A=U*U= [X*] [XIY]= [x*x x*y] = [All A 12 J y* y* X y*y A21 A22 ' Au - A12A22A21 = X* X - X*Y(Y*Y)-Y* X = X*(I - Py)X.
(14.1.4)
Since l - Py is nnd, Au - A12A22A21 is nnd, and unique (Why?).
Inequalities with Applications in Statistics
451
The matrix All - A12A2"2A21 is called. the Schur complement of A with respect to A 22 . P 14.1.4
Let A be pd and partition A-I as
A-I = [All A2l
A12]-1 _ [All A22 A2l
A12] A22
(14.1.5)
where All is a square matrix. Then
(1) (2) PROOF. From
All = (All - A12A2"l A2d-t, All - All is und.
(14.1.6) (14.1.7)
(14.1.5), we have the equations
+ A12A2l = I, All A12 + A12 A22 = o. AllAll
Eliminating A12 we have an equation in All which gives (14.1.6). To prove (14.1.7), we note that All - (All)-l = A12A2"2lA2l' which is und. This shows that All ~ (All)-l which implies (All)-l ::; All. In particular, from (14.1.7), we have that if A = (aij) and A-I = (a ij ), then all ~ l/all. Complements
14.1.1 If A = B+C, B is pd and C is skew symmetric, then IAI ~ IBI. 14.1.2 If A and B are real pd and of order n x n, show that (1) IAA + (1 - A)BI ~ IAIAIBll-A for 0 ::; A ::; 1; (2) IA + Bil/n ~ IAll/n + IBll/n (Minkowski's inequality). 14.1.3 If B is pd and A - B is und, then (1) IA - ABI = 0 has all its roots A ~ 1; (2) IAI ~ IBI; (3) (B-1 - A-I) is nnd; and (4) if Ar is a principal minor of A of order rand Br is the corn~ sponding minor of B, then Ar - Br is und. 14.1.4 Let x be a column vector of order k x 1 and A be a square symmetric matrix of order k x k, with all its elements non-negative. Then for any integer n,
452
MATRIX ALGEBRA THEORY AND APPLICATIONS
with equality if and only if x is an eigenvector of A . This inequality occurs in genetical theory (Mulholland and Smith (1959)). 14.1.5 If II Ax II ::; IIAx + Byll for all x and y, show that A* B = 0, where II . II is the Euclidean norm. 14.1.6 If A is nnd and a i i = 0, show that a i j = 0 for all j. 14.1.7 Let A and B be two pd matrices with eigenvalues contained in the interval [m, MJ, where M ~ m > O. Let 0 ::; A ::; 1. Show that
This is the reverse of the following convex matrix inequality
14.1.8
Let A
= (ai j) be an n
Xn
complex matrix and U
= (Uij ) be an
n x n tulitary matrix. Denote the Hadamard-Schur product of A and U by A . U =
(ai jUij)
and the largest singular value by
(T max'
Show that
14.1.9 For any square complex matrices X and Y of the same size, show that I det{X
14.1.10
A
= TT',
+ Y)1 2
::;
det{I + X X*) det(I + Y*Y).
Show that an nnd matrix A has a unique decomposition, where T is a triangular matrix with diagonal elements non-
negative. 14.1.11 Let H = A + i B be a Hermitian pd, where A and B are real matrices. Show that IAI > IBI (Robertson inequality) and IHI ::; IAI (Taussky inequality). FUrther, if ai's are the eigenvalues of A and fNs of -iB, both in decreasing order, show that ai > f3i. 14.1.12 If A and Bare nnd matrices, show that AB has only nonnegative eigenvalues. 14.1.13 Schur's majorization theorem: If A is a symmetric n x n matrix, show that the eigenvalues of A majorizes its diagonal elements.
Inequalities with Applications in Statistics
14.1.14
453
Let A E Mn and pd be partitioned as
where An and A22 are square matrices. Show that
which is Fischer's inequality. 14.1.15 If A is nnd partitioned as in complement 14.1.14, show that
is nnd. 14.1.16 Let A be n x n positive definite matrix partitioned as in complement 14.1.14. Show that
where A 22 is the partition conformal to A22 in the reciprocal matrix A-I. As a consequence of the above result, show that Al(A · B) ~ Al(AB) when A and Bare n x n nnd matrices, where Al is a generic symbol for the largest eigenvalue. 14.1.17 Let A and B be non-negative definite Hermitian matrices of order n. Show that 1. Amin(A . B) ~ Amin(AB'), 2. Amin(A . B) ~ Amin(AB). 14.1.18 Oppenheim's inequality: Let A and B be n x n nnd matrices. Show that IA . BI ~ IBI an ... ann. 14.1.19 Fiedler's inequality: If A is pd, show that A· A-I - I is nnd. 14.1.20 Let A and B be n x n positive definite matrices. Show that IAIIBI ::; IA· BI ::; (an a22··· ann )(b ll b22 · ·· bnn ). 14.1.21 Let A and B be nxri symmetric matrices and let the elements of B be non-negative. Show that (Al(A · B), .. . , An(A · B))' majorizes
454
MATRIX ALGEBRA THEORY AND APPLICATIONS
(AI (A), ... ,An(A))'. If A also has all non-negative entries, show that IIf=I Ai(A . B) 2: II~l Ai(A). 14.1.22 Consider the projection operator Py = y(y*y)-y* on the space Sp(Y). Let H be a matrix whose columns form an orthonormal basis of Sp(Y). Show that Py = PH = H H*. Get an explicit form of H by using the s.v.d. of Y. 14.1.23 Let B = (BIIB2) be such that B~B2 = O. Show that PB-PB1 is nnd. 14.1.24 (Ostrowski-Taussky Theorem). Let A E M n , and H = 2- 1 (A+ A*) be pd. Show that IH(A)I ::; IAI. 14.2. Cauchy-Schwartz and Related Inequalities Cauchy-Schwartz inequality has numerous applications in statistics. Some well-known applications are in showing that the correlation coefficient lies between -1 and 1 and in deriving a lower bound to the variance of an unbiased estimator such as the Cramer-Rao lower bound. P 14.2.1
Let x and y be two vectors of real elements. Then

(x'y)² ≤ (x'x)(y'y)   (14.2.1)

with equality if and only if y = ax for some scalar a or x = ay for some scalar a.

PROOF. Let x ≠ 0. A simple demonstration of (14.2.1) is as follows:

0 ≤ Σ( y_i − (Σx_iy_i / Σx_i²) x_i )² = Σy_i² − (Σx_iy_i)² / Σx_i².   (14.2.2)

For equality in (14.2.2), we must have y_i = [(Σx_iy_i)/Σx_i²] x_i for all i, i.e., (Σx_iy_i)/(Σx_i²) is constant, say a. If p_1, ..., p_n are non-negative numbers, it follows from (14.2.1) that the weighted version

(Σ p_i x_i y_i)² ≤ (Σ p_i x_i²)(Σ p_i y_i²)   (14.2.3)

is also true.

P 14.2.2 Let x and y be two real vectors and A be an nnd matrix. Then

(1) (x'Ay)² ≤ (x'Ax)(y'Ay),   (14.2.4)
(2) (x'y)² ≤ (x'Ax)(y'A⁻¹y), if A⁻¹ exists,   (14.2.5)
(3) (x'x)² ≤ (x'Ax)(x'A⁻¹x), if A⁻¹ exists.   (14.2.6)
P 14.2.3 Let x and y be two complex vectors and A be a Hermitian nnd matrix. Then

(1) |x*y|² ≤ (x*x)(y*y),   (14.2.7)
(2) |x*Ay|² ≤ (x*Ax)(y*Ay),   (14.2.8)
(3) |x*y|² ≤ (x*Ax)(y*A⁻¹y), if A⁻¹ exists,   (14.2.9)

where |·| represents the absolute value.

P 14.2.4 Constrained Cauchy-Schwartz inequality. Let x and y be two vectors of order n × 1 and B be a matrix of order n × n. Then
(1) (x'y)² ≤ (x'P_B x)(y'y) if y ∈ Sp(B), where P_B is the projection operator on Sp(B),
(2) (x'y)² ≤ (y'B⁻y)(x'Bx) if B is nnd, y ∈ Sp(B), and B⁻ is any g-inverse of B.

P 14.2.5 The integral version of the Cauchy-Schwartz inequality is

( ∫_A fg dν )² ≤ ( ∫_A f² dν )( ∫_A g² dν ),

where f and g are real functions defined on a set A, and f and g are square integrable with respect to a measure ν.

P 14.2.6 The Cauchy-Schwartz inequality for non-null vectors x and y can be stated in the form

x*x − (x*y)(y*y)⁻¹(y*x) ≥ 0.

A more general version of the inequality is obtained when x and y are replaced by matrices X and Y, yielding

X*X − (X*Y)(Y*Y)⁻(Y*X) ≥_L 0,   (14.2.10)

i.e., the matrix in (14.2.10) is nnd. Using Complement 14.1.3, we also have the weaker version in terms of determinants that

|X*X| ≥ |X*Y(Y*Y)⁻Y*X|.   (14.2.11)
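As a quick numerical illustration (added here, not part of the text), the sketch below checks (14.2.10) and (14.2.11) for randomly generated real matrices, using the Moore-Penrose inverse as one admissible choice of g-inverse; the sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
Y = rng.standard_normal((6, 2))

# Matrix Cauchy-Schwartz residual X'X - X'Y (Y'Y)^- Y'X, cf. (14.2.10)
G = np.linalg.pinv(Y.T @ Y)            # Moore-Penrose inverse used as a g-inverse
M = X.T @ X - X.T @ Y @ G @ Y.T @ X

eigvals = np.linalg.eigvalsh(M)        # M is symmetric, so eigvalsh is appropriate
print("smallest eigenvalue of M (>= 0 up to rounding):", eigvals.min())

# Determinant version (14.2.11)
lhs = np.linalg.det(X.T @ X)
rhs = np.linalg.det(X.T @ Y @ G @ Y.T @ X)
print("det(X'X) =", lhs, ">= det(X'Y(Y'Y)^-Y'X) =", rhs)
```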
The Cauchy-Schwartz inequality gives an upper bound to x'y. We mention without proof some complementary inequalities which give a lower bound to x'y = (x_1y_1 + ⋯ + x_ny_n).

P 14.2.7 Let 0 < m_1 ≤ x_i ≤ M_1, 0 < m_2 ≤ y_i ≤ M_2, and 0 < m ≤ γ_i ≤ M for every i = 1, 2, ..., n. Further, let ξ_i (i = 1, ..., n) be real numbers. Then the following inequalities hold.

(1) (Σγ_i)(Σγ_i⁻¹) ≤ ((M + m)²/(4Mm)) n².   (Schweitzer)

(2) (Σx_i²)(Σy_i²) ≤ ((M_1M_2 + m_1m_2)²/(4M_1M_2m_1m_2)) (Σx_iy_i)².   (Polya-Szego)

(3) (Σγ_iξ_i²)(Σγ_i⁻¹ξ_i²) ≤ ((M + m)²/(4Mm)) (Σξ_i²)².   (Kantorovich)

(4) (Σx_i²ξ_i²)(Σy_i²ξ_i²) ≤ ((M_1M_2 + m_1m_2)²/(4M_1M_2m_1m_2)) (Σx_iy_iξ_i²)².   (Greub-Rheinboldt)

The inequality (4) can be written in a form that includes both the lower and upper bounds to the inner product. (See Diaz and Metcalf (1964).)

1 ≤ (Σx_i²ξ_i²)(Σy_i²ξ_i²) / (Σx_iy_iξ_i²)² ≤ (M_1M_2 + m_1m_2)²/(4M_1M_2m_1m_2).   (14.2.12)
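The following small check (an illustration added here, not part of the original text) verifies the Polya-Szego and Kantorovich forms numerically on made-up data whose bounds m, M, m_1, M_1, m_2, M_2 hold by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(1.0, 4.0, n)          # so m1 = 1, M1 = 4
y = rng.uniform(0.5, 2.0, n)          # so m2 = 0.5, M2 = 2
m1, M1, m2, M2 = 1.0, 4.0, 0.5, 2.0

ratio = (x @ x) * (y @ y) / (x @ y) ** 2
bound = (M1 * M2 + m1 * m2) ** 2 / (4 * M1 * M2 * m1 * m2)
print(f"Polya-Szego ratio {ratio:.4f} <= bound {bound:.4f}:", ratio <= bound)

# Kantorovich form (3) with gamma_i in [m, M] = [2, 10] and weights xi_i^2
g = rng.uniform(2.0, 10.0, n)
xi2 = rng.uniform(0.0, 1.0, n)
lhs = (g * xi2).sum() * (xi2 / g).sum()
rhs = (10 + 2) ** 2 / (4 * 10 * 2) * xi2.sum() ** 2
print("Kantorovich form holds:", lhs <= rhs)
```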
14.3. Hadamard Inequality

P 14.3.1 For a nonsingular real n × n matrix B = (b_ij),

|B|² ≤ ∏_{i=1}^n ( Σ_{k=1}^n b_ik² ).   (14.3.1)

As special cases, we have

(1) |B| ≤ 1, if Σ_{j=1}^n b_ij² = 1, i = 1, ..., n,   (14.3.2)
(2) |B| ≤ M^n n^{n/2}, if |b_ij| ≤ M for all i and j.   (14.3.3)

PROOF. To prove (14.3.1), we consider the pd matrix A = BB' and apply the inequality |A| ≤ a_11 ⋯ a_nn, the product of the diagonal elements of A, which is proved as follows. Consider the expansion

|A| = a_11 |(a_ij)_{2≤i,j≤n}| + |Ã|,

where Ã is the matrix A with its (1,1) entry replaced by 0. Since A is pd, the matrix (a_ij)_{2≤i≤n, 2≤j≤n} is also pd, and the second term is non-positive (Why?). Hence

|A| ≤ a_11 |(a_ij)_{2≤i,j≤n}|.

The result |A| ≤ a_11 a_22 ⋯ a_nn follows by induction. The result (14.3.1) follows by observing that a_ii = b_i1² + ⋯ + b_in².
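A short numerical sketch (illustrative only, with a randomly generated matrix) of (14.3.1) and the special case (14.3.3):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
B = rng.standard_normal((n, n))

lhs = np.linalg.det(B) ** 2
rhs = np.prod((B ** 2).sum(axis=1))   # product of the squared row lengths
print("Hadamard (14.3.1):", lhs <= rhs)

# Special case (14.3.3): |B| <= M^n n^(n/2) when |b_ij| <= M
M = np.abs(B).max()
print("Bound (14.3.3):", abs(np.linalg.det(B)) <= M ** n * n ** (n / 2))
```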
14.4. Holder's Inequality

P 14.4.1 Let x_i, y_i ≥ 0, i = 1, ..., n and (1/p) + (1/q) = 1 with p > 1. Then (using Σ to indicate summation over i = 1, ..., n),

Σ x_i y_i ≤ (Σ x_i^p)^{1/p} (Σ y_i^q)^{1/q},   (14.4.1)

with equality if and only if the y_i's are proportional to the x_i^{p−1}'s. The integral version of Holder's inequality for functions f, g ≥ 0 is given by

∫ fg dν ≤ ( ∫ f^p dν )^{1/p} ( ∫ g^q dν )^{1/q}.   (14.4.2)

PROOF. First, we establish by differentiation or otherwise

min_{x ≥ 0} { t(x) = x^p/p + x^{−q}/q } = 1,

and the minimum is attained at x = 1. Substituting x = u^{1/q} v^{−1/p}, u, v ≥ 0, in t(x) ≥ 1, we find

uv ≤ u^p/p + v^q/q   (14.4.3)

with equality when v = u^{p−1}. Now let

u_k = x_k / (Σ x_i^p)^{1/p},   v_k = y_k / (Σ y_i^q)^{1/q}.

Substituting in (14.4.3) and summing over k, we have

Σ u_k v_k ≤ (Σ u_k^p)/p + (Σ v_k^q)/q = 1/p + 1/q = 1,

which gives the desired inequality. The integral version is proved in a similar manner.
P 14.4.2 Let x_i^{(r)}, i = 1, 2, ..., n be a finite sequence of non-negative numbers for each r = 1, 2, ..., m. Let p_1, p_2, ..., p_m be m numbers each > 1 such that

1/p_1 + 1/p_2 + ⋯ + 1/p_m ≤ 1.

Then

Σ_{i=1}^n ∏_{j=1}^m x_i^{(j)} ≤ ∏_{j=1}^m [ Σ_{i=1}^n (x_i^{(j)})^{p_j} ]^{1/p_j}.   (14.4.4)

The result is easily established. From Holder's inequality, it is easy to deduce that if x_i, y_i ≥ 0 and p ≥ 1, then

[ Σ (x_i + y_i)^p ]^{1/p} ≤ [ Σ x_i^p ]^{1/p} + [ Σ y_i^p ]^{1/p},   (14.4.5)

which is Minkowski's inequality, where all summations are from 1 to n.
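For illustration (not part of the text), Holder's and Minkowski's inequalities can be checked numerically on random non-negative data; p = 3 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 5.0, 100)
y = rng.uniform(0.0, 5.0, 100)
p = 3.0
q = p / (p - 1.0)               # conjugate exponent, 1/p + 1/q = 1

holder_lhs = (x * y).sum()
holder_rhs = (x ** p).sum() ** (1 / p) * (y ** q).sum() ** (1 / q)
print("Holder (14.4.1):", holder_lhs <= holder_rhs)

mink_lhs = ((x + y) ** p).sum() ** (1 / p)
mink_rhs = (x ** p).sum() ** (1 / p) + (y ** p).sum() ** (1 / p)
print("Minkowski (14.4.5):", mink_lhs <= mink_rhs)
```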
14.5. Inequalities in Information Theory
(1) Let Σa_i and Σb_i be convergent sums of positive numbers such that Σa_i ≥ Σb_i. Then

Σ a_i log(b_i/a_i) ≤ 0   (14.5.1)

with equality being attained if and only if a_i = b_i. Further, if a_i ≤ 1 and b_i ≤ 1 for all i, then

2 Σ a_i log(a_i/b_i) ≥ Σ a_i (a_i − b_i)².   (14.5.2)

To prove the inequalities, note that for x > 0, the expansion of log x at x = 1 yields

log x = (x − 1) − (x − 1)²(2y²)⁻¹ with y ∈ (1, x).   (14.5.3)

Using the expansion (14.5.3) for each term in (14.5.1), we have

Σ a_i log(b_i/a_i) = (Σb_i − Σa_i) − Σ a_i (b_i − a_i)²(2a_i²y_i²)⁻¹ ≤ 0,   (14.5.4)
thus proving (14.5.1). In (14.5.4), a_i²y_i² ∈ (a_i², b_i²), and if a_i ≤ 1, b_i ≤ 1, the maximum value of a_i²y_i² is not greater than unity. Hence

Σ a_i (b_i − a_i)²(2a_i²y_i²)⁻¹ ≥ (1/2) Σ a_i (a_i − b_i)².   (14.5.5)

Combining (14.5.4) and (14.5.5), we obtain (14.5.2). The results (14.5.1) and (14.5.2) are true if a_i and b_i are non-negative and the summations are extended over values of i for which a_i > 0. In other words, we admit the possibility of some of the b_i being zero (but not the a_i).

(2) Let f and g be non-negative and integrable functions with respect to a measure μ and S be the region in which f > 0. Then

∫_S (f − g) dμ ≥ 0  ⟹  ∫_S f log(f/g) dμ ≥ 0,   (14.5.6)

with equality only when f = g a.e. [μ]. The proof is the same as that of (14.5.1) with summations replaced by integrals. The reader may directly deduce the inequalities (14.5.1) and (14.5.6) by applying Jensen's inequality (14.6.6), given in the next section.

14.6. Convex Functions and Jensen's Inequality
A function f(x) is said to be convex if, for α, β > 0 with α + β = 1,

f(αx + βy) ≤ αf(x) + βf(y) for all x, y.   (14.6.1)

For such a function, we shall first show that at any point x_0, f'₊(x_0) and f'₋(x_0), the right and left derivatives, respectively, exist. Let x_0 < x_1 < x_2. Choosing α = (x_2 − x_1)/(x_2 − x_0), β = (x_1 − x_0)/(x_2 − x_0), x = x_0, and y = x_2 so that αx + βy = x_1, and using (14.6.1), we see that

(x_2 − x_0) f(x_1) ≤ (x_2 − x_1) f(x_0) + (x_1 − x_0) f(x_2)   (14.6.2)

after multiplication by (x_2 − x_0). Rearranging the terms in (14.6.2), we have

[f(x_1) − f(x_0)]/(x_1 − x_0) ≤ [f(x_2) − f(x_0)]/(x_2 − x_0),   (14.6.3)

which shows that [f(x) − f(x_0)]/(x − x_0) decreases as x → x_0. By rearranging the terms in (14.6.2) in another way, we have

[f(x_0) − f(x_1)]/(x_0 − x_1) ≤ [f(x_2) − f(x_1)]/(x_2 − x_1).   (14.6.4)

Equation (14.6.4) in terms of x_{−1} < x_0 < x_1 becomes

[f(x_{−1}) − f(x_0)]/(x_{−1} − x_0) ≤ [f(x_1) − f(x_0)]/(x_1 − x_0),

which shows that [f(x) − f(x_0)]/(x − x_0) is bounded from below, and since it decreases as x → x_0, the right-hand derivative f'₊(x_0) exists. Similarly, f'₋(x_0) exists and obviously f'₋(x_0) ≤ f'₊(x_0). Let L be such that f'₋(x_0) ≤ L ≤ f'₊(x_0). Then, for all x,
f(x) ≥ f(x_0) + L(x − x_0),   (14.6.5)

for if x_1 > x_0, then [f(x_1) − f(x_0)]/(x_1 − x_0) ≥ f'₊(x_0) ≥ L, and the reverse relation is true when x_1 < x_0. The inequality (14.6.5) has important applications; it leads to Jensen's inequality.

P 14.6.1 (Jensen's Inequality). If X is a random variable such that E(X) = μ and f(·) is a convex function, then

E[f(X)] ≥ f[E(X)]   (14.6.6)

with equality if and only if X has a degenerate distribution at μ or f is a linear function.

PROOF. Consider the inequality (14.6.5) with μ for x_0,

f(X) ≥ f(μ) + L(X − μ),   (14.6.7)

and take expectations of both sides. The expectation of the second term on the right-hand side of (14.6.7) is zero, yielding the inequality (14.6.6).
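As an illustrative check (not from the text), Jensen's inequality can be verified by Monte Carlo for a convex function such as exp and an arbitrary distribution with a finite mean:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, 100_000)   # any random variable with a finite mean will do

lhs = np.exp(X).mean()    # Monte Carlo estimate of E[exp(X)]
rhs = np.exp(X.mean())    # exp evaluated at the sample mean, approximating exp(E[X])
print(f"E[f(X)] ~ {lhs:.4f} >= f(E[X]) ~ {rhs:.4f}:", lhs >= rhs)
```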
Let f(x) be a convex function of a vector x ∈ Rⁿ. Then a result analogous to (14.6.7) is

f(x) ≥ f(μ) + L'(x − μ),   (14.6.8)

where L ∈ Rⁿ. Using (14.6.8), we find that the result (14.6.6) is true for a convex function of a vector variable.

P 14.6.2 Let x_1, ..., x_n be positive real numbers and μ_1, ..., μ_n be arbitrary real numbers. Then

( Σ_{i=1}^n x_i⁻¹ )⁻¹ ≤ ( Σ_{i=1}^n x_i μ_i⁻² )( Σ_{i=1}^n μ_i⁻¹ )⁻².   (14.6.9)

PROOF. By the Cauchy-Schwartz inequality (14.2.1),

( Σ_{i=1}^n x_i⁻¹ )( Σ_{i=1}^n x_i μ_i⁻² ) ≥ ( Σ_{i=1}^n μ_i⁻¹ )²,

from which (14.6.9) follows.

P 14.6.3 (Inequality on Harmonic Mean) Let X_1, ..., X_n be positive random variables. Then the expected value of their harmonic mean is not greater than the harmonic mean of their expected values.

PROOF. Let E(X_i) = μ_i, i = 1, ..., n. Considering the inequality (14.6.9) after multiplying both sides by n and taking expectations, we get

E[ n ( Σ_{i=1}^n X_i⁻¹ )⁻¹ ] ≤ n ( Σ_{i=1}^n μ_i⁻¹ )⁻¹,   (14.6.10)
which is the desired result. See Rao (1996).
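A small simulation of P 14.6.3 (illustrative only; the Gamma population below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 4, 200_000
X = rng.gamma(shape=3.0, scale=1.0, size=(reps, n))   # positive random variables

harmonic = n / (1.0 / X).sum(axis=1)                  # harmonic mean of each sample
lhs = harmonic.mean()                                 # estimate of E[harmonic mean]
mu = X.mean(axis=0)                                   # estimates of E(X_i)
rhs = n / (1.0 / mu).sum()                            # harmonic mean of the expectations
print(f"E[HM(X)] ~ {lhs:.4f} <= HM(E[X]) ~ {rhs:.4f}:", lhs <= rhs)
```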
14.7. Inequalities Involving Moments

Let X be a random variable such that E(X) = μ. Assume that the moments E(X − μ)^r = μ_r, r = 1, ..., 2N,
exist. The quadratic form

Σ_{i=1}^{N+1} Σ_{j=1}^{N+1} μ_{i+j−2} y_i y_j  in y_1, ..., y_{N+1}

equals E[(y_1 + y_2(X − μ) + ⋯ + y_{N+1}(X − μ)^N)²] and is therefore non-negative.
Hence the matrix whose (i,j)-th element is μ_{i+j−2} is nnd. This, however, is not a sufficient condition for μ_r to be a moment sequence of a random variable. The condition is necessary whether the moments are of a discrete or a continuous distribution, or calculated from a given set of observations. In particular, for N = 2,
the matrix

[ 1    0    μ_2 ]
[ 0    μ_2  μ_3 ]
[ μ_2  μ_3  μ_4 ]

is nnd, so that its determinant is non-negative, i.e., μ_2μ_4 − μ_3² − μ_2³ ≥ 0, or

μ_4/μ_2² − μ_3²/μ_2³ ≥ 1,   (14.7.1)

i.e., β_2 ≥ 1 + β_1, where β_1 = μ_3²/μ_2³ and β_2 = μ_4/μ_2² are measures of skewness and kurtosis of a distribution.
14.8. Kantorovich Inequality and Extensions In this section, we focus on Kantorovich inequality and some of its variants. Application to some statistical problems is also broached.
P 14.8.1 If A is n × n Hermitian pd with eigenvalues λ_1 ≥ ⋯ ≥ λ_n, then

1 ≤ (x*Ax/x*x)(x*A⁻¹x/x*x) ≤ (λ_1 + λ_n)²/(4λ_1λ_n)   (14.8.1)

for all nonzero vectors x.

PROOF. The left-hand side inequality follows from the Cauchy-Schwartz inequality (14.2.6). To prove the right-hand side, we first observe that A and A⁻¹ have the spectral decompositions A = PΛP*, A⁻¹ = PΛ⁻¹P*, for some unitary matrix P and Λ = Diag{λ_1, λ_2, ..., λ_n}, so that the middle term in (14.8.1) can be written as

(y*Λy/y*y)(y*Λ⁻¹y/y*y)   (14.8.2)
463
with y = P*x. Thus we need only to find the upper bound of (14.8.2) involving diagonal matrices A and A-I instead of general matrices A and A-I . Taking the logarithm of (14.8.2), finding the derivative with respect to y, and setting the derivative to zero, we have the equation (14.8.3) where 6 = y*y/y*Ay and written as
6AiYi
6 =
+ ~: Yi =
y*y/y*A-1y.
The equation can be
2Yi, i = 1, ... ,n.
(14.8.4)
The equations (14.8.4) are soluble if two y/s corresponding to two different values of Aj are nonzero. Let Yi and Yj be nonzero. Then the equations for 6 and 6 are
giving ~-1
_
1
-
Ai
+ Aj 2
'
~-l
_
2
-
Ai + Aj 2AiAj
and the value of (14.8.2) is then
(Ai + Aj)2 6~2 = 4AiAj 1
which attains the maximum value, when Ai
= Al
(AI + An)2 4AIAn
and Aj
= An, (14.8.5)
The right-hand side of (14.8.1) is proved. Application: Consider the statistical regression problem with one concomitant variable
Yi
= {3xi + ti, i = 1, ... ,n.
IT the covariance matrix of ti'S is (12 A, then the weighted least squares estimate of {3 has the variance (x' A-1x)-t, where x, = (Xl, ... ,Xn ). IT
464
MATRIX ALGEBRA THEORY AND APPLICATIONS
we estimate {3 by the ordinary least squares method, then its variance is x' Ax / (x' x)2. The inefficiency of the latter estimate is
(X'X)2(X' A- 1x)-1 x'Ax using the inequality (14.8.1). Thus the worst possible value for inefficiency is 4AIAn/(Al + An)2. If we consider the general linear model Y=X{3+f.,
(14.8.6)
where X is an n x m matrix and cov(f.) = (72 A, then the covariance matrix of the weighted least squares estimator of {3 is (X' A-I X)-1 and that of the ordinary least squares estimator is (X' X)-I(X' AX)(X' X)-I. We may consider various measures of inefficiency based on the roots of the determinantal equation (14.8.7) One measure is the product of the roots (14.8.8) which is similar to the middle expression in (14.8.1) with x replaced by a matrix X. Bloomfield and Watson (1975) and Knot (1975) showed that (14.8.9) where s = min{m,n - m}. Another measure is E2 = 01 + ... + Om. Khatri and Rao (1981, 1982) showed that
(14.8.10) where t = 0 if s = m and t = 2m - n if s = n - m.
Inequalities with Applications in Statistics
465
Another measure of inefficiency is
Roo (1985c) showed that when X' X
=
I,
s
o ~ E3 ~ 2)~ -
J An-i+d 2 ,
(14.8.11)
i=l
where s = mine m, n - m). The inequality when m = 1 was earlier given by Styan (1983). A fourth measure of inefficiency is
E4 = tr(PA 2 P - (PAP)(PAP)), where P = X(X' X)-l X' is the projection operator. Bloomfield and Watson (1975) showed that 1
s
o ~ E4 ~ 4 2)Ai -
An_i+l)2.
(14.8.12)
i=l The expressions (14.8.9) - (14.8.12) are generalizations of the Kantorovich inequality (14.8.1). The proofs are given in the references cited. Complements
14.8.1 (Strang (1960).) Let A be an n x n nonsingular matrix with singular values 81 ;::: 82 ;::: ... ;::: 8n > 0 and define Wi
Then
= (c5i + c5n_i+d 2/4c5ic5n_i+l' (x' Ay)(y' A-Ix) (x'x)(y'y) -
"":"'--:--':""';-:---:----'- < wI· 14.8.2 (Khatri and Rao (1981).) Let A and 14.8.1. Consider
Wi
be as in Complement
= IX' APy A-I XI/IX' XI I{X, Y) = tr(PxAPyA-1), g{X, Y)
466
MATRIX ALGEBRA THEORY AND APPLICATIONS
where X and Yare nxk and nxs matrices ofranks k and s, respectively, with s ~ k, Px = X(X'X)-IX' and Py = y(y'y)-ly'. Then min(k,n-k)
II
g(X, Y) :::;
i=1
Wi,
k
I(X, Y) :::; LWi if n ~ 2k, i=1
n-k
: :; L Wi + 2k -
rt
if n < 2k.
i=1
14.8.3 (Greub and Rheinboldt (1959).) Let G and H be pd commuting matrices with eigenvalues >'1 ~ ... ~ >'n > 0 and J-Ll ~ ••• ~ J-Ln > 0 respectively. Then
14.8.4 IT >'1
~
>'2
~
...
~ >'n
> 0, show that
14.8.5 (Khatri and Rao (1982)) A measure of inefficiency alternative to (14.8.8) is
14.8.6 If B(X, Y) = X'A-IY(Y'A-Iy)-lY'A-IX and A(X) = X'A-IX - X'X(X'AX)-IX'X, then
sup IB(X, Y)I = IA(X)I. y
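Before leaving this chapter, here is a brief numerical sketch (added for illustration, not part of the text) of the Kantorovich inequality (14.8.1) and of the least squares efficiency bound discussed after P 14.8.1; the matrix A and the vector x are randomly generated assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 8
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.sort(rng.uniform(0.5, 5.0, n))[::-1]      # eigenvalues lambda_1 >= ... >= lambda_n
A = Q @ np.diag(lam) @ Q.T                          # real symmetric pd matrix

x = rng.standard_normal(n)
middle = (x @ A @ x) * (x @ np.linalg.solve(A, x)) / (x @ x) ** 2
upper = (lam[0] + lam[-1]) ** 2 / (4 * lam[0] * lam[-1])
print("Kantorovich: 1 <=", round(middle, 4), "<=", round(upper, 4))

# Efficiency of OLS relative to weighted LS in y_i = beta*x_i + e_i with cov(e) = A
eff = (x @ x) ** 2 / ((x @ A @ x) * (x @ np.linalg.solve(A, x)))
worst = 4 * lam[0] * lam[-1] / (lam[0] + lam[-1]) ** 2
print("efficiency", round(eff, 4), ">= worst case", round(worst, 4))
```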
CHAPTER 15

NON-NEGATIVE MATRICES

In this chapter, we will examine some of the features of the world of non-negative matrices. Non-negative matrices occur naturally in several areas of application. From the statistical side, non-negative matrices figure prominently in Markov chains. Some models in genetics are based on non-negative matrices. Leontief models in economics derive sustenance from non-negative matrices. We will touch upon these applications. One of the most prominent results in the area of non-negative matrices is the Perron-Frobenius Theorem. In the next section, we will dwell upon this remarkable result.
15.1. Perron-Frobenius Theorem We need to set some definitions in place to pave the way for an enunciation of the Perron-Frobenius Theorem. The concept of irreducible matrix is central to the development of this section. DEFINITION 15.1.1.
Let A = (aij) E Mn.
(1) The matrix A is said to be non-negative if aij 2': 0 for all i and j. (If A is non-negative, we use the symbol A 2':e 0 or 0 ~e A, the suffix e denotes entry wise.) In the general theory of matrices, the symbol A 2':L 0, alternatively A 2': 0, is used to indicate that A is non-negative definite. (2) The matrix A is said to be positive if aij > 0 for all i and j. (The symbol that is used in this context is A >e 0.) The concepts of nonnegativity and positivity perfectly make sense for matrices not necessarily square. If A and B are two matrices of the same order; we say that A 2':e B or A - B 2':e 0 to mean that if A = (aij) and B = (bij), then aij 2': bij for all i and j . 467
DEFINITION 15.1.2. A non-negative matrix A E Mn(n 2: 2) is said to be reducible if there exists a permutation matrix P E Mn such that PAP' is of the form
PAP' =
[~ ~],
(15.1.1)
where B E Mr and D E M n- r , and 0 E Mr,n-r is the null matrix, and
r> 1. What reducibility means is that if we can find some rows of A such that these rows and the corresponding columns of A are permuted, the resultant matrix has a structure stipulated in (15.1.1). The next question is how to identify reducible matrices. In the following proposition, we focus on this problem. We take n 2: 2 in all the propositions. P 15.1.3 Let A E Mn be a non-negative matrix. The matrix A is reducible if and only if there exists a nonempty proper subset [ of {I, 2, ... , n} such that aij = 0 for every i ~ I and j =f:. [.
Sufficiency. Let [ = {il < i2 < ... < id and Ie = {I, 2, ... , n} - I = {il < j2 < ... < jn-k}. Let u be the permutation map from {1,2, ... ,n} to {1,2, ... ,n} defined by u(t) = it, for t = 1,2, ... , k, and u(t) = jt-k, for t = k + 1, k + 2, .. . , n. Let P be the permutation matrix associated with the permutation map u . One can verify that PAP' is of the form (15.1.1). The necessity is clear. One of the characteristic features of reducible matrices is the following. Suppose A is a matrix already in the reduced form (15.1.1). Then Ak, for any positive integer k, is also reducible. More generally, if A is reducible then Ak is reducible for any positive integer k. (Why?) The notion of reducibility can be defined for matrices not necessarily nonnegative. We do not need the definition in generality. The negation of reducibility is one of the key concepts in this section. PROOF.
DEFINITION 15.1.4. A non-negative matrix A E Mn, n 2: 2, is said to be irreducible if it is not reducible. A trivial example of an irreducible matrix is any positive matrix. Another example is a 2 x 2 matrix with main diagonal elements zero and off diagonal elements unity. Matrices of order 1 x 1 are summarily excluded from discussion in this chapter. For us, n is always ~ 2. The following is a characterization of irreducible matrices.
P 15.1.5 Let A = (aij) E Mn be a non-negative matrix. The following statements are equivalent.
(1) A is irreducible. (2) (I + A)^{n−1} is positive. (3) For any i and j with 1 ≤ i ≤ n and 1 ≤ j ≤ n, there exists a positive integer k = k(i, j) such that k ≤ n and a_{ij}^{(k)} > 0, where A^k = (a_{ij}^{(k)}).
elements? Suppose it is possible. By rearranging the elements of y, if necessary, we can write y' = (u', 0) with u > O. Perforce, the vector z partitions as, z' = (v', 0) with u and v being of the same order and v > O. Partition A accordingly, i.e.,
Z=[V]=[U]+[All o 0 A21
A12] [u]. A22 0
This implies that 0= 0 + A 21 U. Since U > 0, we have A21 = O. Consequently, A is reducible. (Why?) This contradiction shows that z has more non-zero elements than y. Repeat this argument by taking z in the place of y. We find a vector s such that s = (I + A)z = (I + A)2y has more nonzero entries than z. Repeating this argument at most (n - 1) times, we find that (/ + A)n-l >e O. This is true for every non-zero vector y ~ O. Hence (/ + A)n-l >e O. (Why?) (2) ~ (3). Since (I + A)n-l > 0 and A ~e 0, we have
Consequently, for any (i, j), the (i, j) th entry of A, A2 , •• , positive.
,
or An is
470
MATRIX ALGEBRA THEORY AND APPLICATIONS
(3) ~ (1). Suppose A is reducible. There exists a permutation matrix P such that
for every k ~ 1. (Why?) Consequently, we can find i =1= j such that the (i,j)th-entry, (PA k P')ij = 0 for all k ~ 1. Let P = (Pij). Thus we have n n L
LPira~~)pjs = 0 for all k ~ 1.
r=ls=l
For some rand s, Pir = 1 = Pjs. Therefore, a~~) = 0 for all k ~ 1. This is a contradiction to (3). This completes the proof. There is another concept closely related to irreducibility. For primitive matrices, one could obtain a stronger version of Perron-Frobenius Theorem. DEFINITION 15.1.6. A non-negative matrix A is said to be primitive if Ak is positive for some positive integer k. It is clear that every primitive matrix is irreducible. The converse is not true. Look at the case of 2 x 2 matrix with diagonal elements zero and off diagonal elements unity. We now introduce the notion of the modulus of a matrix. Let A = (aij) be a matrix of any size. We define m(A) = (Iaijl). The following properties of the operation of modulus are easy to establish.
P 15.1.7 (1) If A and B are two matrices such that AB is defined, then m(AB) ~e [m(A)][m(B)]. (2) If A is a square matrix, then m(Ak) ~e [m(A)]k for all positive integers k. (3) If A and B are square matrices of the same order such that m(A) ~e m(B), then Ilm(A)IIF ~ IIm(B)IIF, where II·IIF stands for the Frobenius norm. We now focus on the spectral radii of matrices. Recall that the spectral radius Ps(A) of a square matrix A is the maximum of the absolute values of the eigenvalues of A.
Non-negative Matrices
471
P 15.1.8 If A, B E Mn and m(A) ~ B, then Ps(A) ~ ps[m(A)] ~ Ps[(B)]. (In other words, the spectral radius Ps(-) is monotonically increasing on the set of all non-negative matrices in Mn.) PROOF. (1) Note that for any positive integer k,
m(Ak)
~e
[m(A)]k
~e Bk.
It now follows that (IIm(Ak)IIF)I/k ~ (IIm(Ak)II)I/k ~ (IIBkll)l/k. By taking the limit as k ---. 00 now, we note that Ps(A) ~ ps[m(A)] ~ Ps(B). See P 11.2.17.
P 15.1.9 Let A = (aij) E Mn be a non-negative matrix, and B a principal submatrix of A. Then Ps(B) ~ Ps(A). In particular, max aii ~ Ps(A). l$i$n PROOF. Define a matrix C E Mn as follows. Place the entries of B in C in exactly the same position wherever they come from A. The remaining entries of C are zeros. We note that Ps(B) = Ps(C) (Why?) and C ~e A. [We use the result 0 ~e Al ~e A2 ~ Ps(A 1 ) ~ Ps(A2)']
P 15.1.10 Let A = (aij) be a non-negative matrix such that all the row sums of A are equal to the same number 0:. Then Ps(A) = 0:. If all the column sums of A are equal to the same number /3, then Ps(A) = /3. PROOF. Recall the form of the induced leo-norm on
(bij ), IIBlleo,in = l~t'tn I:j=l
Ibijl·
Mn. For B =
See P 11.2.8. This norm is a matrix
norm. Further, r;;:ll that Ps(B) ~ IIBII for any matrix norm II . II. Thus we observe that IIAlleo,in = 0: ~ Ps(A). On the other hand, note that 0: is an eigenvalue of A with eigenvector (1,1, ... ,1)'. Therefore, 0: ~ Ps(A). This proves the first part of the proposition. For the second part, use the matrix norm, 1I·lkin. See P 11.2.11.
P 15.1.11 Let A = (aij) E Mn be a non-negative matrix with row TI, T2, . .. , Tn and column sums ClI C2, . . . ,en. Then
sums
min Ti ~ Ps(A) ~ max Ti,
(1)
~n q ~
(2) PROOF. (1) Let
0:
l~i$n
l$i$n l$~$n
=
~n
l<~
Ps(A)
~ max q. l$~$n
Ti. We show that
0:
~
Ps(A). If
0:
= 0,
this inequality is trivially t~~. Suppose 0: > O. Let B = (b ij ) be defined
472
MATRIX ALGEBRA THEORY AND APPLICATIONS
by bij = a;.i j. Clearly, 0 Se B Se A and every row sum of B is equal to a. Con~quently, a = Pa(B) S Ps(A). The other inequalities can be established in a similar vein. P 15.1.12 Let A = (aij) E Mn be a non-negative matrix. Let x' = (xt, X2, ... ,xn ) be positive. Then 1 n 1 n min - '"' aijXj S Ps(A) S m!tx aijXj, (1) l
L
n
(2)
n
L
min Xj '"' aij S Ps(A) S m!tx Xj aij. I<J'
FUrther, if a, {3 :2: 0 and ax Se Ax Se {3x, then a S Ps(A) S {3. If ax < Ax, then a < Ps(A); if Ax <e {3x, then Ps(A) < {3. PROOF. First, we realize that the matrices S-1 AS and A have the same set of eigenvalues for any non-singular matrix S. This implies that Ps(A) = Ps(S-1 AS). Let S = diag{xl' X2, ... ,xn}. It is clear that S-1 AS :2:e O. Identify the product S-1 AS = (aijxilxj). P 15.1.11 as applied to the matrix S-1 AS establishes the inequalities (1) and (2). For the second part, suppose ax Se Ax. This means that aXi S 2:7=1 aijXj n
for every i. Consequently, a
S 1$.$n ~n ; 2: aijXj S Ps(A). • j=1
If ax
< Ax,
we can find ao > a such that aox Se Ax. (Why?) This gives us a < ao S Ps(A). The other inequalities are established analogously. We need now some facts about simple roots of polynomials and spectral radii of matrices. Let p(x) be a polynomial of degree n in x. Let A be a root of the polynomial. LEMMA 15.1.13. The root A is simple (of multiplicity one) if and only if p'(A) = ddxp(x) IX=A -=1= O.
PROOF. It
is left to the reader.
Let us discuss some aspects of the determinantal equation p(x) = IxI - AI = 0 for any given matrix A. Let Aii be the principal submatrix of A obtained from A by deleting the i-th column and i-th row of A. One can verify that d
n
L
-d p(x) = IxI - Aiil, x i=1
473
Non-negative Matrices
where I stands for the identity matrix of appropriate order in the above.
P 15.1.14 (The Perron-Frobenius Theorem for Positive Matrices.) Let A = (aij) E Mn be a positive matrix. Then the following statements are valid.
(1) Ps(A) > o. (2) Ps(A) is an eigenvalue of A. (3) There exists a positive eigenvector of A corresponding to the eigenvalue Ps(A). (4) If J.L is any other eigenvalue of A, then IJ.LI < Ps(A). (What this means, in particular, is that we cannot find an eigenvalue J.L of A such that IJ.LI = Ps(A).) (5) The multiplicity of the eigenvalue Ps(A) is one, i.e., Ps(A) is not a repeated eigenvalue. PROOF.
(1) Since A is positive, it follows that Ps(A)
> 0, by P
15.1.11. To prove (2) and (3), let J.L be an eigenvalue of A such that IJ.LI = Ps(A). Let x be a corresponding eigenvector and Ixl = m(x). Then
Ps(A)lxl = IJ.LIIXI = IJ.LXI = IAxl ~e IAllxl = Alxl· Let y = Alxl - Ps(A)lxl. It is clear that y ~e o. If y = 0, then Ps(A) is an eigenvalue of A with a corresponding eigenvector Ixl. Since Ixl ~e 0, Ixl # 0, and A is positive, Alxl >e o. FUrther, Ixl = [Ps(A)]-1 Alxl >e
O. This establishes (2) and (3) in case y = O. Suppose y # O. Set u = Alxl which is, obviously, positive. Note that, since A is positive,
o <e Ay = A(Alxl- Ps(A)lxi) = Az - Ps(A)z, which means that Az >e Ps(A)z. By P 15.1.12, Ps(A) > Ps(A), which is not possible. Hence y = o. Thus (2) and (3) are established. (4) Let J.L be an eigenvalue of A such that J.L # Ps(A). By the very definition of spectral radius, IJ.LI ~ Ps(A). We claim that IJ.LI < Ps(A). ~uppose IJ.LI = Ps(A). Let x be an eigenvector of A corresponding to the eigenvalue J.L. Following the argument presented in the proof of (2) and (3) above, it follows that Ixl >e 0 and Ixl is an eigenvector of A corr~ponding to the eigenvalue Ps(A). Let Xi be the i-th component of x. The equation Ax = J.LX implies that n
Ps(A)lxil =
n
IJ.L II Xi I = IJ.LXil = I"LaijXil ~ "Laijlxjl = Ps(A)lxiJ, j=1
j=1
474
MATRIX ALGEBRA THEORY AND APPLICATIONS
for each i. Thus equality must prevail throughout in the above. This means that the complex numbers aijXj, j = 1,2, ... ,n must lie on the same ray in the complex plane. Let () be their common argument. Then e-i8aijXj > 0 for all j. Since aij > 0, we have w = e-ilix >e O. The vector w is also an eigenvector of A corresponding to the eigenvalue J-L of A, i.e., Aw = J-LW. Since W >e 0, J-L ~ O. (Why?) Trivially, J.LW :Se Aw :Se J.LW. By P 6.1.12, J-L :S Ps(A) :S J-L. This contradiction establishes the claim. (5) First, we establish the following result. If 0 <e C :Se D and Ps(C) = Ps(D), then C = D. We now know that there exist positive eigenvectors x and y such that Cx = Ps(C)x and Dy = Ps(D)y. We would like to show that x is also an eigenvector of D corresponding to the eigenvalue Ps(C) of D. Note that 0 <e Ps(C)x = Ps(D)x = Cx :Se Dx. Let z = Dx - Ps(D)x. It is clear that z ~e O. We claim that z = O. Suppose z i= o. Let y = Dx. Note that 0 <e Dz = D(y - Ps(d)x) = Dy - Ps(D)y leading to the inequality Dy >e Ps(D)y. By P 15.1.12, Ps(D) < Ps(D). This contradiction establishes that z = O. Thus the vector x is a common eigenvector of C and D corresponding to the same eigenvalue Ps(C), i.e., Cx = Dx. Since x >e 0 and C :Se D, we have C= D. (Why?) Let p(x) = IxI -AI. Let Aii be the principal submatrix of A obtained from A by deleting the i-th row and i-th column of A. Write
where J-LI! J-L2, ••• ,J-Ln-l are the eigenvalues of A ii . Since Ps(Aii) :S Ps(A), IJ-Lil :S Ps(A) for every i. See P 15.1.9. We can say something stronger. Since A is positive, Ps(Aii) < Ps(A), by what we have seen at the beginning of the proof of (5). Equivalently, IJ-Lil < Ps(A) for every i. Consequently, IxI - Aiil > 0 if x ~ Ps(A). In particular, IPs(A)I - Aiil > 0 for all i. Observe that
This shows that the eigenvalue Ps(A) is simple. This completes the proof.
Non-negative Matrices
475
The spectral radius Ps(A) of a positive matrix is called the Perron root of A. The associated positive eigenvector x' = (Xl, X2, ••. ,x n ), i.e.,
Ax
= [Ps(A)]x,
X
>e 0, with
n
L: Xi = 1 is called the right Perron vector i=l
of A. Note that A' is also positive. The spectral radius remains the same. The right Perron vector y of A' is called the left Perron vector of
A. P 15.1.14 is usually called Perron's theorem. A similar statement has been established by Frobenius in the environment of irreducible matrices. We now focus on irreducible matrices. The extension of P 15.1.14 revolves around comparing the eigenvalues of A and those of I + A. Some results in this connection are worth noting. P 15.1.15 Let A E Mn with eigenvalues )'1, A2, ... ,An. Then the eigenvalues of I + A are 1 + All 1 + A2, ... ,1 + An. Further, Ps(I + A) :::; 1 + Ps(A). If A is non-negative, then Ps(I + A) = 1 + Ps(A).
The first part of the result follows easily. Note that Ps(I + m!lX 11 + Ail:::; 1 + max IAil = 1 + Ps(A). If A ~e 0, then
PROOF.
A)
=
l~t~n
l~.~n
1 + Ps(A) is an eigenvalue of I
+ A.
P 15.1.16 Let A be a non-negative matrix such that Ak is positive for some positive integer k, i.e., A is primitive. Then the assertions of P 15.1.14 hold. [This is easy to establish.] P 15.1.17 (Perron-Frobenius Theorem) Let A E Mn be a nonnegative irreducible matrix. Then:
(1) Ps(A) > 0. (2) Ps(A) is an eigenvalue of A. (3) There exists a positive eigenvector eigenvalue Ps(A) of A. (4) The eigenvalue Ps(A) is simple.
X
of A corresponding to the
PROOF. Since A is irreducible, (I + A)n-l is positive. P 15.1.14 becomes operational for the matrix (I + A)n-l. Now, (1), (2), (3), and (4) follow. Use P 15.1.15 and P 15.1.16. There is one crucial difference between P 15.1.14 and P 15.1.17. If the matrix A is irreducible, it is possible that there is eigenvalue A of A such that IAI = Ps(A).
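As an illustration of the theorem (added here, not part of the text), the Perron root and right Perron vector of a positive matrix can be approximated by simple power iteration; the matrix, tolerance and iteration cap below are arbitrary choices.

```python
import numpy as np

def perron(A, tol=1e-12, max_iter=10_000):
    """Approximate the Perron root and right Perron vector of a positive matrix A."""
    x = np.ones(A.shape[0])
    rho = 0.0
    for _ in range(max_iter):
        y = A @ x
        rho_new = y.sum() / x.sum()     # converges to the spectral radius
        x = y / y.sum()                 # normalise so the entries sum to one
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return rho_new, x

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])          # a positive matrix
rho, x = perron(A)
print("Perron root:", rho)
print("right Perron vector:", x)         # positive, entries summing to one
```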
Complements 15.1.1 Let A = (aij) be a matrix of order 3 x 3 with exactly one entry of A equal to zero. Characterize the position of this single zero in the matrix so that A becomes reducible. 15.1.2 Let A = (aij) be a matrix of order 3 x 3 with exactly two entries of A equal to zero. Characterize the position of these zeros in the matrix so that A becomes reducible. 15.1.3 Let A = (aij) be a non-negative matrix such that aii > 0 for all i. Show that A is primitive. 15.1.4 If A is irreducible, show that A' is irreducible. 15.1.5 Let A >e O. Suppose x and yare eigenvectors of A corresponding to the eigenvalue Ps(A) of A. Show that x = ay for some number a. (The eigenvalue Ps(A) is of geometric multiplicity one.) 15.1.6 Let A
=
I-a [
a
f3 f3 ] ,0 < a, f3 < 1.
1_
Determine the Perron root and Perron right vector of A. Examine the asymptotic behavior of A k as k --+ 00. 15.1.7 Let A be a positive matrix. Assess the asymptotic behavior of Ak as k --+ 00. More precisely, show that [AI Ps(A)]k converges as k --+ 00. Show that the limit matrix L is given by L = xy', where Ax = [Ps(A)]x, x >e 0, y' A = [Ps(A)]y', y >e 0, and x'y = 1. (The proof of P 15.3.2 can be adapted in this case by taking L in the place of Q.) Show that exactly the same conclusion is valid for a primitive matrix A. 15.1.8 Let A be a positive matrix and x the right Perron vector of A. Show that Ps(A) = ~~aijXj, the summation being over i,j = 1, ... ,n. 15.1.9 If A is a positive non-singular matrix, demonstrate that A-I cannot be non-negative. 15.1.10 Establish a statement analogous to P 15.1.14 for primitive matrices. Prove this statement using P 15.1.14. 15.1.11 Let In, n 2: 1 be the Fibonacci sequence, i.e., It = h = 1, In = In-I + In-2, n 2: 3. One can show that lim ~f n = I-2,ft the n~oo n-l
'
golden ratio. For any odd positive integer n = 2m + 1, define a matrix
Non-negative Matrices
477
An = (a~j») of order n x n by
a~~) lj
= {
~ o
if
Ii - jl =
1,
ifi =j = m+ 1, otherwise.
Show that Ps(An) :::; J5. Hint: Use P 15.1.12. 15.1.12 Let A be a non-negative matrix. Show that Ps(A) is an eigenvalue of A. Show that there exists a vector x ~ 0 such that Ax = (Ps(A)Jx. 15.1.13 For the following matrices, examine which of the properties (1), (2), (3), (4), and (5) of Theorem 15.1.14 are violated:
15.2. Leontief Models in Economics
We begin with a description of Leontief's Model for an economic system involving inputs and outputs of the industries comprising the economy. Suppose an economy has n industries and each industry produces (output) only one commodity. Each industry requires commodities (inputs) from all the industries, including itself, of the economy for the production of its commodity. No input from outside the economy under focus is needed. This is an example of a closed economy. The problem is to determine suitable ''prices'' to be charged for these commodities so that to each industry total expenditure equals total income. Such a price structure represents equilibrium for the economy. Let us fix: the notation and formulate the problem. Let aij
= the fraction of the total output of the j-th industry purchased by the i-th industry, i, j = 1,2, ... , n.
It is clear that aij ~ O. Further, alj + ... + anj = 1,j = 1, ... ,n. Let A = (aij). Thus A is a non-negative matrix and each column of A sums up to unity. Consequently, the spectral radius Ps(A) of A is unity. (Why?) \Ve assume that A is known. The matrix A is called
478
MATRIX ALGEBRA THEORY AND APPLICATIONS
the input-output matrix of the economy. Let Pi = price for the i-th industry for its total output, i = 1,2, .. . ,n. The equilibrium condition can be stated as follows. Total expenditure incurred by the i-th industry is equal to the total income of the i-th industry, i.e., ailPl
+ ... + ainPn = Pi, i = 1, ...
,n.
These equations can be rewritten as Ap = p, where pi = (pt, P2, ... ,Pn)' The objective is to determine the price vector p. We are back in the realm of non-negative matrices. Since Ps(A) = 1, we are looking for the eigenvector of A corresponding to the eigenvalue Ps(A) of A. This problem falls into the realm of the Perron-Frobenius theorem. In practice, one looks for a positive solution (viable solution) of Ap = p. If A is irreducible, we know that P exists and is positive. As an example, suppose an economy has four industries: a steel plant, an electricity generating plant, a coal mine, and an iron ore mining facility. Twenty percent of the output of the steel plant is used by itself, 30 percent of the output of the steel plant is used by the electricity generating plant, 15 percent by the coal mine, and 35 percent by the iron ore facility. Twenty percent of the electricity generated is used by the steel plant, 25 percent by itself, 25 percent by the coal mine, and 30 percent by the iron ore facility. Thirty percent of the coal produced by the coal mine is used by the steel plant, 30 percent by the electricity generating plant, 20 percent by itself, and 20 percent by the iron ore facility. Finally, 80 percent of iron ore produced by the iron ore mining facility is used by the steel plant, 10 percent by the electricity generating plant, 10 percent by the coal mine, and nothing for itself. The corresponding input-output matrix works out to be A=
[ 0.20  0.20  0.30  0.80 ]
[ 0.30  0.25  0.30  0.10 ]
[ 0.15  0.25  0.20  0.10 ]
[ 0.35  0.30  0.20  0.00 ].
(15.2.1)
The basic problem is to determine how much the total output of each industry is to be priced so that total expenditure equals total income for each industry. Note that the matrix A is irreducible. There exists a positive vector p satisfying Ap = p. As a matter of fact, any multiple of p constitutes a solution to the equilibrium equation.
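A computational sketch (illustrative, not part of the text) that recovers such a price vector p for the matrix A of (15.2.1) as an eigenvector of A belonging to the eigenvalue 1:

```python
import numpy as np

A = np.array([[0.20, 0.20, 0.30, 0.80],
              [0.30, 0.25, 0.30, 0.10],
              [0.15, 0.25, 0.20, 0.10],
              [0.35, 0.30, 0.20, 0.00]])   # input-output matrix (15.2.1); columns sum to 1

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmin(np.abs(eigvals - 1.0))       # the eigenvalue 1 is the spectral radius
p = np.real(eigvecs[:, k])
p = p / p.sum()                            # any positive multiple is also an equilibrium
print("relative equilibrium prices:", np.round(p, 4))
print("check Ap = p:", np.allclose(A @ p, p))
```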
Now we begin with a description of an open economy. Suppose there are n industries in an economy each producing only one type of commodity. Portions of these outputs are to be used in the industries within the economy but there is also some demand for these commodities outside the economy. The prices of units of these commodities are fixed and known. Let us introduce some notation. Let d i = monetary value of the output of the i-th industry demanded by sources outside the economy, i = 1,2, ... ,n. Let d' = (d 1 , d2 , ••• ,dn ). Since the prices are known, the sources outside the economy can compute how much is the monetary value of the commodities they are seeking from the economy. The vector d is known. Denote by Cij, the monetary value of the output of the i-th industry needed by the j-th industry in order to produce one unit of monetary value of its output, i,j = 1,2, ... ,n. Let C = (Cij). Clearly, C is non-negative. The matrix C is called the consumption matrix of n
the economy. Normally,
E Cij
~
1 for each i. If the sum is equal to 1,
i=1
the industry is not profitable. Finally, let Xi be the monetary value of the total output of the i-th industry, i = 1,2, ... ,n. The objective is to determine the values of Xi'S so that the needs of the industries within the economy and the demands from sources outside the economy are n
exactly met. Let x' = (x}, X2,'" ,xn ). Note that
E CijXj is the monej=1
tary value of the output of the i-th industry needed by all the industries inside the economy. Consequently, x - Cx represents monetary values of excess outputs of the industries. We set x - Cx = d to match the excess output with the demand. The objective is to determine x 2: 0 so that (15.2.2) (I - C)x = d is satisfied. If (I - C) is nonsingular, there is a unique solution to the system (15.2.2) of equations. The solution may not be non-negative. If (1 - C)-l is non-negative, we will then have a unique non-negative solution x satisfying (15.2.2) for any given demand vector d 2: O. The following results throw some light on solutions to the system (15.2.2).
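As an illustration (the consumption matrix C and demand vector d below are made up, not taken from the text), the open model reduces to solving the linear system (15.2.2):

```python
import numpy as np

# Hypothetical consumption matrix C and outside demand d, for illustration only.
C = np.array([[0.10, 0.25, 0.20],
              [0.30, 0.05, 0.10],
              [0.20, 0.30, 0.15]])
d = np.array([50.0, 25.0, 40.0])

x = np.linalg.solve(np.eye(3) - C, d)      # production levels solving (I - C)x = d
print("required outputs x:", np.round(x, 2))
print("non-negative solution:", bool((x >= 0).all()))
```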
P 15.2.1 Let C E Mn be a non-negative matrix. Then (I - C)-1 exists and non-negative if and only if there exists a non-negative vector x such that x > Cx. (The condition x > Cx means that there is some production schedule x such that each industry produces more than it
480
MATRIX ALGEBRA THEORY AND APPLICATIONS
consumes.)
If C = 0, the result trivially holds. Assume that C i= 0. Suppose (1 _C)-1 exists and non-negative. There exists a non-negative non-zero vector x such that (I - C)-Ix = dx, where d = Ps[(I - C)-I]. See Complement 15.1.12. We will show that d > 1. The equation PROOF.
(I -C)-Ix
= dx implies that x = ( d~1 )cx.
Since x
~ 0, x i= 0, C ~ 0,
it follows that d > 1. Moreover, [d/(d-1)] > 1. Hence x> Cx. Suppose for some vector x ~ 0, x > Cx. It means that x better be positive. If this is the case, we can find < .x < 1 such that Cx < .xx. (Why?) This implies that Ckx < .xkx for all k ~ 1. Consequently, lim C k = o.
°
k ..... oo
Since (1 - C)(1 + C + C2 + ... + cm) = 1 - cm+l, which converges to as m _ 00, it follows that the series 1 + C + C 2 + ... is convergent and is equal to (1 - C)-I. Thus I - C is invertible. It is clear that (I - C)-1 ~ 0. This completes the proof.
°
The following results are consequences of P 15.2.1. COROLLARY 15.2.2. Let C be a non-negative matrix such that each of its row sums is less than one. Then (I - C)-1 exists and is nonnegative.
15.2.3. Let C be a non-negative matrix such that each of its column sums is less than one. Then (I - C)-1 exists and is nonnegative. The essential difference between the closed model and open model are the following. (1) In the closed model, the outputs of the industries are distributed among themselves. In the open model, an attempt is made to satisfy an outside demand for the outputs. (2) In the closed model, the outputs are fixed and the problem is to detenmne a price structure for the outputs so that the total expenditure and total income for each industry are equal. In the open model, the prices are fixed and the problem is to determine a production schedule meeting the internal and external demands. Complements 15.2.1 For the input-output matrix A of (15.2.1), examine equilibrium solutions and interpret them. COROLLARY
15.2.2 Three neighbors have backyard vegetable gardens. Neighbor A grows tomatoes, neighbor B grows corn, and neighbor C grows lettuce. They agree to divide their crops among themselves. Neighbor A keeps half of the tomatoes he produces, gives a quarter of his tomatoes to neighbor B, and a quarter to neighbor C. Neighbor B shares his crop equally among the three of them. Neighbor C gives one-sixth of his crop to neighbor A, one-sixth to neighbor B, and the rest he keeps himself. What prices should the neighbors assign to their respective crops if the equilibrium condition for a closed economy is to be satisfied, and if the lowest-priced crop is to realize $75?
482
MATRIX ALGEBRA THEORY AND APPLICATIONS
A physical evolution of a stochastic process can be described as follows. Suppose a particle moves over the states at times 0,1,2, .. . in a random manner. Let Xn be the state in which the particle is at time n, n ~ O. The joint distribution of the process X n , n ~ 0 describes the random movement of the particle over time. Let us introduce the notion of a Markov chain. ~
DEFINITION 15.3.1. The process X n , n if the conditional probability
Pr{Xn+1
0 is called a Markov chain
= jlXo = io, Xl = i l , . . . ,Xn- l = in-I, Xn = i} Pr{Xo = io, Xl = iI, ... ,Xn- l = in-I, Xn = i, X n+ l = j} = Pr{Xo = i o, Xl = iI, ... ,Xn- l = in-I. Xn = i} .
.
= Pr{Xn+1 = JIXn = z} =
Pr{Xn = i, X n +1 Pr{X = i} n
= j}
= Pij,
(say)
for all io, iI. " . ,in-I. i, and j E {I, 2, ... ,k} and n ~ O. The definition means several things: the conditional probability that X n+1 = j given the past {Xo = io, Xl = i l , . . . ,Xn - l = in-I, Xn = i} depends only on the immediate past {Xn = i}; the conditional probability does not depend on n. The numbers Pij'S are called one-step transition probabilities. The number Pij is the conditional probability of moving from the state i to state j in one step from time n to time
(n + 1). Let P = (pij). The matrix P is called the transition probability k
L
matrix. It has the special property that
Pij = 1 for every i. The
j=1
matrix P is an example of what is called a stochastic matrix. In the case of a Markov chain, the knowledge of the initial distribution P and the transition matrix P is enough to determine the joint distribution of any finite subset of the random variables X n , n ~ O. For example, the distribution of Xn is (p')pn. The joint distribution of X2 and X3 is given by Pr{X2 = i, X3
= j} = Pr{X2 = i}
Pr{X3
= jlX2 = i}
= Pr{X2 = i}Pij, for any i and j in the state space. The entry Pr{X2 component of (p')P2.
=
i} is the i-th
Non-negative Matrices
483
One of the basic problems in Markov chain is to assess the asymptotic behavior of the process. One determines lim Pr{ Xn = i} for every state n~oo
i. Asswne that P is irreducible. Irreducibility in the context of Markov chains has a nice physical interpretation. With positive probability one can move from any state to any state in a finite number of steps, Le., p~j) > 0 for some n ~ 1, where pn = (p~j»). See P 15.1.5. Note that the spectral radius P8(P) = 1. The conclusion of the main result of this section, i.e., P 15.3.2 is not valid for irreducible transition matrices. Let us assume that P is primitive. Look up Complement 15.1.10. Let q' = (ql. q2.··· • qk) be the left Perron vector of P, i.e., q> 0, q' P = q', k
and
l: q8 = 1.
The right Perron vector of A is x'
= (1,1, ...• 1),
i.e.,
8=1
Px = x. We want to show that the limiting distribution of Xn is q'. P 15.3.2 If the transition matrix P is primitive, then lim pn = Q, n~oo
where all the rows of Q are identical and equal to q'. Consequently, the limiting distribution of Xn is q', i.e., lim p' pn = q'. n~oo
PROOF.
Observe the following properties of the matrices P and Q.
(1) Q is idempotent. (2) PQ
= Qpm = Q for all m
~
(3) Q(P - Q) = O. (4) (P - Q)m = pm - Q for all M
1.
~ 1. (5) Every non-zero eigenvalue of P - Q is also an eigenvalue of P. This can be proved as follows. Let A be a non-zero eigenvalue of P - Q. Let w be a non-zero vector such that (P - Q)w = AW. Then Q(P - Q)w = AQW = o. This implies that Qw = 0 and Pw = AW. (6) P8(P) = 1 is not an eigenvalue of P - Q, i.e., I - (P - Q) is invertible. This can be proved as follows. Suppose 1 is an eigenvalues of P - Q. Then there exists a non-zero vector W such that (P - Q)w = w. This implies that Qw = 0 and w is an eigenvector of P corresponding to the eigenvalue 1 of A. Since the algebraic multiplicity of the eigenvalue 1 is one, w = ax for
484
MATRIX ALGEBRA THEORY AND APPLICATIONS
some non-zero a, where x is the right Perron vector of P. Since Qw = 0, we must have Qx = o. This is not possible. (7) Let AIA2, ... ,Ak-l' 1 be the eigenvalues of P. Assume that JAIl ::; IA21 ::; ... ::; IAk-11 < 1. Then Ps(P - Q) ::; IAk-11 < 1. From Property 5 above, Ps(P - Q) = IAsl for some s, or = 1, or = O. From Property 6 above, we must have Ps(P - Q) ::; IAk-ll. (8) pm = Q + (P - Q)m for all m ;::: 1. Since Ps(P - Q) < 1, lim (P - Q)m = o. Consequently, lim pm = Q. m--+oo
m~oo
The last property proves the professed assertion. P 15.3.2 is the fundamental theorem of Markov chains. It asserts that whatever the initial distribution p may be, the limiting distribution of Xn is q', the left Perron vector of P. For primitive stochastic matrices, obtaining the left Perron vector of P is tantamount to solving the equations q' P = q' in unknown q.
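A short sketch (illustrative; the transition matrix P below is made up, not one of the examples in the text) that computes q by solving q'P = q' with Σ q_s = 1 and compares it with the rows of P^m for large m:

```python
import numpy as np

# A made-up primitive transition matrix (rows sum to one), used only for illustration.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])

# Solve q'P = q' together with sum(q) = 1, i.e. find the left Perron vector of P.
k = P.shape[0]
M = np.vstack([P.T - np.eye(k), np.ones(k)])
rhs = np.zeros(k + 1); rhs[-1] = 1.0
q, *_ = np.linalg.lstsq(M, rhs, rcond=None)
print("stationary distribution q:", np.round(q, 4))

# The rows of P^m approach q as m grows (P 15.3.2).
print(np.round(np.linalg.matrix_power(P, 50), 4))
```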
Complements 15.3.1
Let P =
[~
~ J.
Show that for the transition matrix P,
the limit of pm as m - 00 does not exist. 15.3.2 If P is a stochastic matrix, show that pm is also stochastic for any positive integer m. 15.3.3 An urn contains a black and b red balls. At time n, draw a ball at random from the urn, note its color, put the ball back into the urn, and add c > 0 balls of the same color to the urn. Let Xn be the color of the ball drawn at n-th time, n ;::: 1. Obtain the joint distribution of Xl, X2, and X 3 • Evaluate the conditional probabilities, Pr{X3 = black IXt=black, X 2=black}, Pr{X3 = black IXI=red, X 2=black}, Pr{X3 = black IX2=black}. Show that X n , n ;::: 1 is not a Markov chain. 15.3.4 Show that the transition matrix
0.0 0.5 [ 0.5
0.5 0.5] 0.5 0.0 0.0 0.5
is primitive. Obtain the limiting distribution of the Markov chain driven by the above transition matrix. 15.3.5 A country is divided into three demographic regions. It is found that each year 5% of the residents of Region 1 move to Region 2 and 5% to Region 3. Of the residents of Region 2, 15% move to Region 2 and 10% to Region 3. Finally, of the residents of Region 3, 10% move to Region 1 and 5% to Region 2. Obtain the limiting distribution of the underlying Markov chain. (Source: Rorres and Anton (1984).) 15.4. Genetic Models Gregor Mendel is generally credited with the formulation of laws of inheritance of traits from parents to their offspring. One of the basic problems in genetics is to examine the propagation of traits over several generations of a population. Each inherited trait such as eye color, hair color, is usually governed by a set of two genes, designated by the generic symbols A and a. Plants and animals are composed of cells. Each cell has a collection of chromosomes. Chromosomes carry hereditary genes. Each hwnan being has roughly 100,000 pairs of genes. Each individual in the population has one of the pairings AA, Aa, or aa. These pairings are called genotypes. If the genes correspond to color of eyes in hwnan beings, the human being with genotype AA or Aa will have brown eyes, and the one with aa will have blue eyes. In such a case, we say that the gene A dominates a, or equivalently, the gene a is recessive. In what is called autosomal inheritance, an offspring will receive one gene from each parent. If the father is of the genotype AA, the offspring will receive the gene A from the father. If the father is of the genotype Aa, the offspring will receive either A or a from the father with equal probability. Similar considerations do apply to mothers. If the father is of the genotype AA and the mother is of the genotype Aa, the offspring will receive the gene A from the father and either gene A or gene a from the mother with equal probability. Consequently the genotype of the offspring is either AA with probability 1/2 or Aa with probability 1/2. If the genes correspond to eye color, the offspring will have brown eyes with probability one. (Why?) If both father and mother are of the same genotype Aa, the offspring is of genotype AA, Aa, or aa with probabilities 1/4, 1/2, or 1/4, respectively, If the genes correspond to
486
MATRIX ALGEBRA THEORY AND APPLICATIONS
eye color, the offspring will have brown eyes with probability 3/4 or blue eyes with probability 1/4. In the following table, we list the possible genotype of offspring along with their probabilities. Genotmes of Qarents AA&AA AA&Aa AA&aa Aa &Aa Aa & aa aa & aa
GenotYQe of offsQring aa AA Aa 1 1/2 0 1/4 0 0
0 1/2 1 1/2 1/2 0
0 0 0 1/4 1/2 1
No distinction is made of (father, mother) pairings of genotypes (AA, Aa) and (Aa, AA). Let us look at a simple inbreeding model. Suppose in the O-th generation, the population consists of a proportion of Po individuals of genotype AA, qo of genotype Aa, and TO of genotype aa. Clearly, Po + qo + TO = 1. Suppose mating takes place between individuals of the same genotype only. Let in the population, Pn = proportion of genotype AA in the n-th generation, qn = proportion of genotype Aa in the n-th generation, and Tn = proportion of genotype aa in the n-th generation. We would like to determine the limiting behavior of these proportions. In the first generation, we will have PI = Po + (1/4)qo, ql = (1/2)qo, 1'1 = TO + (1/4)qo. These equations can be rewritten as 1/4 1/2 1/4 Let x~ = (Pn, qn, Tn), n ~ 0 and A the 3 X 3 matrix that appears in the above linear equations. It is clear that the vectors Xn's are governed by the equation,
Note that A is non-negative matrix. Let us determine the limiting behavior of the sequence An, n ~ O. The eigenvalues of A are Al =
Non-negative Matrices
1, >'2
487
= 1, and >'3 = 1/2 with corresponding eigenvectors chosen to be
Let
P=
[~ ~ o
- ; ] , and p-l = (1/2)
1
1
[~ ~ ~]. 0
-1
0
Note that A = P!:1p- l , where!:1 = diag{l, 1, 1/2}. It now transpires that An = P!:1 n p- l for all n ~ O. Consequently, for any n ~ 0,
An =
1 0
1/2 - (1/2)n+1
(1/2)n
[ o 1/2 - (1/2)n+1
More explicitly,
Pn
1 - (l)n+l] = Po + [2 2 qo, qn = (l)n 2 qo, Tn = TO + [12 - (l)n+l] 2 qo·
Consequently, lim Pn = Po
n-..oo
+ (~)qO, lim qn = 0, lim Tn = TO + (-2 )qo. 2 n--+oo n--+oo 1
In the long fUll, individuals of genotype Aa will disappear! Only pure genotypes AA and aa will remain in the population. Let us look at another model called selective breeding model. In the O-th generation, a population has a proportion po of individuals of genotype AA, a proportion qo of genotype Aa, and a proportion TO of genotype aa. In a special breeding program, all individuals are mated with individuals of genotype AA only. Let in the population Pn = proportion of genotype AA in the n-th generation, qn = proportion of genotype Aa in the n-th generation, and Tn = proportion of genotype aa in the n-th generation. We would like to examine the limiting behaviQr of these proportions. In the first generation, we have Pl
= Po
1
+ (2)qO,
ql
= TO
1
+ (2)qO,
Tl
= O.
MATRIX ALGEBRA THEORY AND APPLICATIONS
488
These equations can be rewritten as 1/2 1/2
o
Let X~ = (Pn, qn, Tn), n ;::: 0 and A the 3 x 3 matrix that appears in the above linear equations. It is clear that the vectors Xn's are governed by the equation, Xn = An xo , n;::: o. Note that A is non-negative matrix. Let us determine the limiting behavior of the sequence An, n;::: o. The eigenvalues of A are >q = 1, >'2 = ~, and >'3 = 0 with corresponding eigenvectors chosen to be
m'[-i] ,
and
[-~]
Let
Note that A = PD.p-l, where D. = diag{1, 1/2, O}. It now transpires that An = pD.n p-l for all n ;::: o. Consequently, for any n ;::: 1,
An~ [~
1 - {1/2)n
{1/2)n
o
More explicitly,
Pn
(1)n-l TO, = 1 - ( :21)n qo -:2
Consequently, lim Pn n~~
= 1,
lim qn n~~
qn
(1)n_l TO, = (1)n :2 qo +:2
= 0,
lim Tn n~~
Tn
= O.
= O.
In the long run, individuals of genotype Aa and aa will disappear! Only the pure genotypes AA will remain in the population. As a matter of fact, individuals of genotype aa will disappear in the fist generation itself.
Non-negative Matrices
489
Let us look at a simple restrictive breeding model. In the O-th generation, a proportion Po of individuals is of genotype AA, a proportion qo of genotype Aa, and a proportion TO of genotype aa. Suppose only (AA, AA) and (AA, Aa) matings are allowed. This model is feasible if one wishes to eliminate certain genetic diseases from the population. In many genetic diseases such as cystic fibrosis (predominant among Caucasians), sickle cell anemia (predominant among Blacks), and Tay-Sachs disease (predominant among East European Jews), the relevant normal gene A dominates the recessive gene a. If an individual is of genotype Aa, he or she will be normal but a carrier of the disease. If an individual is of genotype aa, he or she will have the disease and the offspring will be at least a carrier of the disease. One would like to see the effect of the policy of preventing the sufferers of the disease to mate. Let Po be the proportion of the population of genotype AA and qo the proportion of the population of genotype Aa. Since the mating is restricted, the population is taken to be those individuals of genotype AA or Aa. Consequently, Po + qo = 1. Let in the population Pn = proportion of genotype AA in the n-th generation, and qn = proportion of genotype Aa in the n-th generation. One can check that for every n 2: 0,
[::] = An [ :~ ] ,with A =
[~
1/2 ] 1/2 .
A direct computation of powers of A yields
Consequently, lim Pn n--+oo
= 1 and n--+oo lim qn = O.
In the long run, the carriers
of the disease will disappear! 15.5. Population Growth Models One of the important areas in Demography is a critical examination of how population grows over a period of interest. The s~called "Leslie Model" describes the growth of the female portion of a human or animal population. In this section, we will describe the mechanics of this model, in which a non-negative matrix appears. We will outline the limiting
MATRIX ALGEBRA THEORY AND APPLICATIONS
490
behavior of the powers of this matrix in order to shed light on the longterm growth of the population. Suppose the maximum age attained by any female in the population is M years. We classify the female population into some k age classes of equal length. Say, the age classes are: [0, M/k), [M/k, 2M/k), ... , [(k - l)M/k, MJ. When the study is initiated, note down
p~O) = number of females in the population falling into the age group [(i - l)M/k, iM/k), i = 1,2, ... ,k. The vector p(O}1 = (piO) , p~O} , ... ,p~O}) is called the initial age distribution vector. The main objective is to examine how the age distribution changes over time. We would like to examine the age distribution of the female population at times to = 0, tl = M / k, t2 = 2M/ k, and so on. As time progresses, the composition of the classes varies because of three biological processes: birth, death, and aging. These biological processes may be described by the following demographic parameters.
Let = the average number of daughters born to a single female during the time she is in the i-th age class, i = 1,2, ... ,k, bi = the proportion of females in the i-th class expected to survive and pass into the next class, i = 1, ... ,k.
ai
It is clear that ai ~ 0 for every i. Assume that 0 < bi ::; 1 for every i = 1,2, ... ,k - 1. Assume that at least one ai is positive. Let p(m}1 = i·b · f h C I . ( PI(m) ,P2(m) , ... , Pk(m)) b e th e age dstn utlOn 0 t e lema es at tIme t m , where p~m} is the number of females in the i-th age class, i = 1,2, ... ,k. The vectors p(m} 's satisfy the following recurrent relation: (m) PI
(m-l)
= alPI
+ a2P2(m-l) + ... + akPk(m-l)
(m) _ b. (m-l) . _ - t-IPi-1 ,Z -
Pi
2,3, ... ,k.
These equations can be put in a succinct form: - 1 2 3 P(m) -- L P(m-l) , m - " , ...
,
Non-negative Matrices
491
where
L= 000 the so-called Leslie matrix. Clearly, L is a non-negative matrix. It now follows that p(m) = Lmp(O). In order to facilitate the computation of powers of L, finding eigenvalues and eigenvectors of L is helpful. The eigenvalues of L are the roots of the determinantal equation
0= p(A) =
IL -
All
= Ak - alA k-
1
-
a2bl Ak - 2 - a3blb2Ak-3 - ... - akbl ... bk- 1 •
Since at least one ai is positive, there is at least one non-zero root of the polynomial equation. Consequently, the spectral radius Ps(L) > o. We record, without proof, some facts concerning the Leslie matrix. For some details, the reader may refer to Rorres and Anton (1984). (1) The eigenvalue Ps(L) is simple. (2) The vector x, given below, is positive and is an eigenvector corresponding to the eigenvalue Ps(L):
x' = (1, btl Ps(L), b1 b2/[Ps(L)f, ... , b1 b2 ... bk_tl(Ps(L)]k-l). (3) If two successive entries ai and ai+l are positive, then IAI < Ps(L) for any eigenvalue A of L different from Ps(L). Assume that ai and ai+1 are positive for some i. It now follows that (Ps(L)]-m Lmp(O) converges to a constant vector q which depends only on p(O). If m is large, p(m) ~ [ps(L)]mq ~ (Ps(L)]p(m-l). What this means is that the age distribution is a scalar multiple of the preceding age distribution. Complements
15.5.1
Comment on the eigenvalues of the following Leslie matrix:
4 2]
00. 1/8
0
492
MATRIX ALGEBRA THEORY AND APPLICATIONS
= 15 and = (1000, 900,
Let M p(O)
15.5.2
k = 3. If the initial age distribution is given by 800), examine the age distribution after 15 years. Comment on the eigenvalues of the following Leslie matrix:
o 02] . 0 1/3 0 15.5.3 (Fibonacci numbers) In 1202 Leonardo of Pisa, also called Fibonacci, posed the following problem. A pair of rabbits do not produce any offspring during their first month of life. However, starting with the second month, each pair of rabbits produces one pair of offsprings per month. Suppose we start with one pair of rabbits and none of the rabbits produced from this pair die. How many pairs of rabbits will be there at tpe beginning of each month. Let Un be the pair of rabbits at the beginning of the n-th month. Establish the recurrence relation Un = U n - l + U n -2 and show that
where Al and A2 are eigenvalues of the matrix
Notes: The books by Horn and Johnson (1985, 1991) provide a comprehensive treatment on matrices. Some sections of this chapter are inspired by these works. The book by Rorres and Anton (1984) gives a good account of many applications of matrices to real world problems. Their influence is discernible in some of the examples presented here.
CHAPTER 16 MISCELLANEOUS COMPLEMENTS
Some topics, not covered under the different themes of the earlier chapters, which have applications in statistics and econometrics are assembled in this chapter. The proofs are omitted in most cases, but references to original papers and books where proofs and other details can be found are given.
16.1. Sixnultaxleous Decolnpositioxl of Matrices In Section 5.5 we have given a number of results on simultaneous diagonalization of matrices. We consider more general results in this section.
D E F I N ~ T16.1.1. ~ ~ N Two matrices A, B E M, are said to be simultaneously diagonalizable if there exists a nonsingular transformation T such that T*AT and T*BT are both diagonal. D E F I N I T I ~16.1.2. N Two matrices A, B E M, are said to be diagonalizable by contragredient transformations if there exists a nonsingular transformation T such that T*BT and T-'A(T-')* are diagonal.
A typical example where both the definitions hold is the case of two Hermitian commuting matrices which are simultaneously diagonalizable by a unitary transformation. We quote here a number of theorems on simultaneous diagonalizability of two matrices under various conditions given in Rao and Mitra (1971b).
P 16.1.3 Let A E M, be a Hermitian matrix and B E M, be an nnd matrix with rank r 5 n and N E M,,,-, with p(N) = n - r be such that N*B = 0. Then the following hold.
494
MATRIX ALGEBRA THEORY AND APPLICATIONS
(1) There exists a matrix L E M n ,. such that L*BL = J(E Mr) and L* AL = ~(E Mr) diagonal. (2) A necessary and sufficient condition that there exists a ?onsin~ lar transformation T such that T* AT and T* BT are both dIagonal IS (using the notation p(B) = rank of B) p(N* A) = p(N* AN).
(3) A necessary and sufficient condition that there exists a nonsingular transformation such that T* BT and T- 1A(T-1)* are diagonal (i.e., A and B are reducible by contragredient transformations) is p(BA)
= p(BAB) .
(4) If in addition A is nnd, then there exists a nonsingular transformation T such that T* AT and T* BT are both diagonal. (5) If in addition A is nnd, then there exists a nonsingular transformation T such that T*BT and T- 1A(T- 1)* are diagonal.
P 16.1.4 Let A and B be Hermitian and B nonsingular. Then, there exists a nonsingular transformation T such that T* AT and T* BT are both diagonal if and only if there exists a matrix L such that LAB- 1 L -1 is diagonal with real diagonal elements (i.e., AB- I is semisimpIe or similar to a diagonal matrix). For details regarding the above theorems and further results, the reader is referred to Chapter 6 in Rao and Mitra (1971b). 16.2. More on Inequalities In Chapters 10 and 14 we have discussed a number of inequalities which are useful in solving optimization problems and in establishing boWlds to certain functions. We quote here some results from a recent thesis by Liu (1995). P 16.2.1 (Matrix-trace versions of Cauchy-Schwartz inequality) Let B E Mn be nnd, A E Mn,m be such that Sp(A) c Sp(B), Z E Mn,m be arbitrary and B+ be the Moore-Penrose inverse of B. Then (1) (trZ' A)2 ~ (trZ' BZ)(trA' B+ A) with equality if and only if B Z and A are proportional.
495
Miscellaneous Complements
(2) tr[(Z' A)2] ~ tr[Z' BZA' B+ A] with equality if and only if BZA' is symmetric (3) A+ B+(A+), ;::: (A' BA)+ (in Lowner sense) where A+ and B+ are Moore-Penrose inverses of A and B respectively. As special cases of (1) we have
(vecA)(vecA)' ~ [(vecA),(I ® B+)vecA](I ® B) ee'
~
(e'B-e)B for e
E
Sp(B).
For an application of the last result in statistics and econometrics see Toutenburg (1992, pp.286-287).
P 16.2.2 Let BE Mn be nnd, A E Mn symmetric and T E Mn,k such that peT) = k, Sp(T) c Sp(B) and T' BT = h. Further let B+ be the Moore-Penrose inverse of B and Al ;::: . .. ;::: An be the eigenvalues of B+ A. Then the following maxima or minima with respect to T hold.
+ ... + Ak min(trT' AT) = An-k+l + ... + An max [tr(T' AT)2] = Ai + ... + A~ min[tr(T' AT)2] = A~-k+l + ... + A~ max [tr(T' AT)-I] = A;-2k+l + ... + A;l for A > 0, min[tr(T' AT)-l] = All + ... + A;I for A > O.
(1) max(trT' AT) = Al (2) (3) (4) (5) (6)
r = pCB)
The optimum values are reached when T = (tIl·· · Itk), where Bl/2ti are orthonormal eigenvectors of (B+)1/2 A(B+)1/2 associated with the eigenvalues Ai(B+ A), i = 1, ... ,k.
P 16.2.3 For A E Mm,n, Bl E Mm and B2 E Mn are nnd, and Mm,k such that Sp(T) c Sp(Bd and T' BIT = hand W E Mn,k such that SpeW) c Sp(B2) and W'B 2W = I k , we have
T
E
max tr(T' AW) = Al T,W
max tr(T' AW)2 = T,W
+ ... + Ak
Ai + ... + A~
496
MATRIX ALGEBRA THEORY AND APPLICATIONS
where A~ are the eigenvalues of Bi ABt A'. This theorem is useful in the study of canonical correlations in multivariate analysis. (See Lin (1990) and Yanai and Takane (1992).) P 16.2.4 (Matrix Kalltorovich-type inequalities) Let B E Mn and nnd with p(B) = b and A E Mn,r such that Sp(A) c Sp(B) and p(A) = a :::; min(b, r). Further let Al 2:: ... 2: Ab > 0 be the eigenvalues of B. Then:
(1) A+ B+(A+), :::; (Al~~:)2 (A' BA)+ with equality if and only if A = 0 or A'BA = AI+AbA'A and A'B+A = AI+AbA'A. 2 2Al Ab (2) A+ B(A+)' - (A' B+ A)+ :::; (~ - v'Xb)2(A' A)+ with equality if and only if A = 0 or Al = Ab or A' BA = (AI +Ab-v'AIAb)A' A and A' B+ A = (AIAb)-1/2 A' A.
(3) A'B2A < -
(Al+ Ab)2 4AI Ab
A'BAA+BA with equality if and only if A = 0 or A' BA = ~ AI+Ab A' A and A' B2 A = A1 Ab A' A.
(4) A'B 2A-A'BAA+BA:::; HAl -Ab)2A'A with equality if and only if A = 0 or Al = Ab or ,2 2 A'BA = AltAbA'A and A'B2A = "1~AbA'A. Liu (1995) established the following inequalities from the above inequalities, where C . D is the Hadamard-Schur product. P 16.2.5 Let Al and Ab the maximum and minimum of eigenvalues of C ® D where C and Dare pd matrices. Then
( 1) (C· D)-1 -< C-l . D- 1 < -
(AI+Ab)2(C. 4AI Ab
(2) C · D - (C- 1 . D- 1)-1 :::;
(~_
(3) (C· D)2 < -
C2 . D2
< -
yIXb)2[
(AI +Ab)2 (C . 4AI Ab
D)2
(4) (C· D)2 - c2. D2:::; HAl - Ab)2[
(5) (C · D) :::; (C2 . D2)1/2 :::; 2A.)t:!b C· D (6) (C2.
D2
r/
2 - C · D
< -
(AI-Ab) [ 4(AI +Ab)
D)-1
497
Miscellaneous Complements
P 16.2.6 (Schopf (1960)) Let A E Mn be pd with eigenvalues Al 2: ... 2: An and let for y =1= 0, J.Lt = y* Aty. Then
16.3. Miscellaneous Results on Matrices P 16.3.1
(Toeplitz-Hausdorff Theorem) The set
W(A)
= {< x,Ax >:
IIxll
= 1}
is a closed convex set. Note that W(U AU*) = W(A) for any unitary U and W(aA + bI) aW(A) + bW(I) for all a, bE C. Also A(A) E W(A), for any A(A). P 16.3.2
=
For any matrix A, the series
exp A = I
1
2
1
2
1
n
+ A + 2'. A + ... + ,A + ... + ,A + ... n. n.
converges. This is called the exponential of A. Then exp A is invertible and (expA)-I = exp(-A). Conversely every invertible matrix can be expressed as the exponential of a matrix. COROLLARY 16.3.3. Every unitary matrix can be expressed as the exponential of a skew-symmetric matrix. P 16.3.4
Let w(A)
= sup I < x, Ax>
lover IIxll
= 1.
Then
(1) w(A) defines a matrix norm, (2) w(UAU*) = w(A) for all unitary U, and (3) w(A) :S IIAII :S 2w(A) for all A, where IIAII = sup IIAxll· P 16.3.5 (Weyl's Majorant Theorem) Let A E Mn with singular value (TI 2: ., . 2: (Tn and eigenvalues AI, ... ,An arranged in such a way that IA 11 2: ... 2: IAn I. Then for every function ¢ : R+ ---+ R+ such that ¢(et ) is convex and monotonic increasing in t, we have
498
MATRIX ALGEBRA THEORY AND APPLICATIONS
In particular
for all p 2: O. P 16.3.6 (Converse ofWeyl's Majorant Theorem) If )'1,'" ,>'n are complex numbers and Ut, ... ,Un are positive real numbers ordered as 1>'11 2: ... 2: I>'nl and Ul 2: ... 2: Un and if
1>'11 .. ·I>'kl:::; SI·· 'Sk 1>'11·· ·I>'nl = SI··· Sn
for 1:::; k:::; n
then there exists a matrix A E Mn with eigenvalues gular values U}, .. . ,Un'
>'1 '"
>'n and sin-
P 16.3.7 (Fischer's Inequality) Let PI, ... ,Pr be a family of mutually orthogonal projectors in en such that PI El1 ... El1 Pr = In. Then for A 2: 0
[The matrix PIAPI
+ ... + PrAPr
is called the pinching of A.I
P 16.3.8 (Aronszaju's Inequality) Let A E Mn be Hermitian matrix partitioned as
A=
(~*~)
where B E M k • Let the eigenvalues of B, C and A be 131 > 13k; II 2: ... 2: In-k; and al 2: ... 2: an respectively. Then
ai+j-I
+ an :::; 13i + Ij for all
i,j with i
+j
- 1 :::; n.
P 16.3.9 Let A and B be pd matrices. Then the following hold. (1) IIA8 B 811 :::; IIABII8 for 0 :::; S :::; 1. (2) IIABW:::; IIAt Btll for t 2: 1. (3)
>'1 (A8 B8)
:::; >.f(AB) for 0 :::;
(4) [>'I(AB)l t :::;
>'1 (AtBt)
S :::;
for t 2: 1.
l.
>
MiscellaneoU8 Complements
499
(6) (Araki-Lieb-Thirring Inequality)
where A ~ 0, B ~ 0 and sand t are positive real numbers with
t
~
1.
(7) (Lieb-Thirring Inequality) Let A and B be nnd matrices and m, k be positive integers with m ~ k. Then
In particular
P 16.3.10 (n-dimensional Pythagorean Theorem) Let Xl,." , be orthogonal vectors in Rn, and 0 denote the origin. Let the volume of the simplex (Xl, ••. ,xn ) be V and that of the simplex (0, Xl,'" ,Xi, Xi+b' .. ,xn ) be Vi. Then Xn
A formal proof is given by S. Ramanan and K.R. Parthasarathy. For proofs of the propositions in this section, and further results, . reference may be made to Bhatia (1991). P 16.3.11 Then
Let AI, ... ,An be pairwise commuting and pd matrices. A(A) ~ Q(A) ~ 1t(A)
where A(A) = (AI + ... + An)/n, Q(A) = (AI ... An)l/n and 1t(A) = n(A1I + ... +A~l )-1. This is a generalization of the classical inequality connecting arithmetic, geometric and harmonic means. (Maher (1994).)
500
MATRIX ALGEBRA THEORY AND APPLICATIONS
P 16.3.12 (Gersgorill Theorem) Let A E Mn and aij E C be the elements of A, i, j = 1, ... ,n and
Then every eigenvalue of A lies in at least one of the disks {z: Iz - aiil :S ~d, i
= 1, ...
,n
in the complex z-plane. FUrthermore, a set of m disks having no point in common with the remaining (n - m) disks, contains m and only m eigenvalues of A. P 16.3.13
Let A E Mn as in P 16.3.12 and
Then the eigenvalues of A lie in the disk
{z E C : Izl :S min(v, ()}. FUrthermore I det AI :S min(1}n, (n). P 16.3.14 Let A E Mn as in P 16.3.12, dj = lajjl - ~j, j 1, ... ,n and d = min{d}, ... ,dn } > O. Then IAil ~ d, i = 1, ... where AI, ... ,An are the eigenvalues of A. Hence I det AI ~ ~.
= ,n,
P 16.3.15 (Schur Theorem) Let A E M n , II· II denote the Euclidean matrix norm and AI,'" ,An, the eigenvalues of A. Then
+ ... + IAnl2 :S (ReA I)2 + ... + (ReA n )2 :S ( ImA d2 + ... + (ImAn)2 :S IAll2
IIAII2 IIBII2 IICll2
where Re and 1m are real and imaginary parts, B = (A + A*)j2 and C = (A-A*)j2. Equality in anyone of the above three relations implies equality in all three and occurs if and only if A is normal.
501
Miscellaneous Complements
16.4. Toeplitz Matrices Matrices whose entries are constant along each diagonal arise in many applications and are called Toeplitz matrices. Formally T E Mn is Toeplitz if there exist scalars c-p+l, ... ,Co, ... ,Cp-l such that tij, the {i,j}-th element of Tis Cj-i. Thus
co
T=
[ C-I
C-2
Cl
C2
CO
Cl
C-l
CO
CP-'I Cp -2 Cp-3
C-~+l C-~+2 C-~+3
:::
{16.4.1}
CO
is Toeplitz. The special case of (16.4.1) when C-i = Ci and the matrix is positive definite arises in a linear prediction problem using an autoregressive model in time series, (16.4.2) where, considering {16.4.2} as a stationary process, we have
E{td = 0,
E{Xt}
E{XtXt_j}
= 0,
=
°
= E{XtXt+j} = Cj.
Multiplying both sides of (16.4.2) by have Cj
E{Xttt}
Xt-j
and taking expectations, we
+ alcli-ll + ... + apclj_pl = 0, j = 1, ...
,p
which can be written using a special case of (16.4.1),
(16.4.3)
If Ci are known, we can estimate ai by solving equation (16.4.:3) and use the estimates formula
al, ... ,aj
in predicting
Xt
given
Xt-l, .. . , X t-p
by the {16.4.4}
502
MATRIX ALGEBRA THEORY AND APPLICATIONS
If <;'s are not known, we can estimate them from the available observations in the time series, Xl, ... ,X n , by the formula (16.4.5) where nr in (16.4.5) = n - r, the number of terms in the summation. The estimates of Ci are used in equation (16.4.3) to solve for ai. Because of the special structure of the Toeplitz matrix in (16.4.3), the computations involved in estimating ai are somewhat simple. Efficient algorithms have been developed by Durbin (1960) for solving (16.4.3), by Levinson (1947) when the vector on the right in (16.4.3) is arbitrary and by Trench (1964) to find the inverse of the Toeplitz matrix on the left-hand side of (16.4.3). Excellent accounts of these algorithms can be found in Golub and Loan (1989). We illustrate only the Durbin's algorithm to show how the Toeplitz form of the matrix of equations (16.4.3) provides a simple approach. First we introduce what are called persymmetric matrices. DEFINITION 16.4.1. A matrix B = (b ij ) E Mn is called persymmetric if b ij = bn-j+l,n-i+l for all i and j. [What this concept means is that the matrix is symmetric around the diagonal running from the southwest corner to the northeast corner.] .An example of a 3 x 3 persymmetric matrix is as follows.
Let En E Mn be defined by
1]
o 1
0
o
0
which is, indeed, a permutation matrix with the special name of exchange matrix. If x' = (Xl,'" ,xn ), note that (Enx), = {x n , .. . ,xt}. Further E;;l = En. The following results are easy to establish.
503
Miscellaneous Complements
P 16.4.2 Let B E Mn. Then: (1) B is persymmetric if and only if B = EnB' En. (2) If B is persymmetric and nonsingular, then B- 1 is persymmetric. (3) If T E Mn is Toeplitz (of the form (16.4.1)), then T is persymmetric. The converse is not true. We revert to the problem of solving equation (16.4.3). First by dividing each row by Co, we can rewrite equation (16.4.3) as (16.4.6) where
1·1
Tp=
P-1j T Tp-2
Tp-2
1
[ Tp-1
Ti
1
;1
= ci/Co, a~ =
(a1, •.. ,ap ),
P~ =
(T1, ... ,Tp).
Suppose that we are able to obtain a solution of TmYm
(16.4.7)
=Pm
in Ym for a given mE [I,p]. Then, we can obtain a solution of (16.4.8) provided that m
+ 1 :::; p.
Rewrite (16.4.8) as
EmPm] 1
where
Z:n+1
[u] V
= [ Pm J
(16.4.9)
T m +1
= (u', v) and u is an m-vector. These equations give
Tm u
+ EmPm v =
p'mEmu + V
pm
= T m+1
from which we have u =
T;;.l(Pm - EmPmv)
= Ym - T;;.l EmPmv.
(16.4.10)
504
MATRIX ALGEBRA THEORY AND APPLICATIONS
= Ym
u
- vEmYm.
Once we have Ym from (16.4.7), the computation of u from (16.4.10) is simple provided that we know v. Note that
v
= rm.H = rm+1 = rm+1
- P'mEmu - p'mEm(Ym - vEmY) - p'mEmYm - vp'.,..EmYm.
Consequently
v(l
+ p'mYm) = rm+1
- p'mEmYm
v= rm+1 - p'mEmYm 1 + P'mYm
--~--~~~~
Thus, z'm+1 = (u /, v) depends only on Ym. We continue the recursive process starting with any m, until we reach the value p. This is called Durbin's algorithm. Let us look at the eigenvalue decomposition of a Toeplitz matrix. The eigenvalues of Toeplitz matrices do not have an explicit form. However, some asymptotic results on the behavior of eigenvalues are available. We look at a special form of Toeplitz matrices. Let a_ m , a_(m-I), ... ,a-I, ao, aI, .. . , am-I, am be a set of 2m + 1 numbers. Define at = 0 for all ItI > m. For each n 2 1, let Tn = (t~;») be a matrix of order n x n with
t~;)
= aj-i.
For example, if m
r
lao 2 a_I
T5~
= 2 and n = 5, al ao a_t a_2 0
a2 al ao a_I a_2
0 a2 at ao a_I
~21
al ao
If n is large compared to m, a substantial number of entries in the upper right-hand corner and lower right-hand corner of the matrix Tn are zeros. Let an,]' a n ,2,··· ,an,n be the eigenvalues of Tn. Let 10 be the Fourier
Miscellaneous Complements
transform of the sequence a- m , a_(m-l),' i.e.,
505
" ,a-l , ao, al,'"
,am-l,
am,
m
f(>..) =
L
ak eik >..,
0:::; >.. :::; 27r.
k=-m
It can be shown that for any possible integer s, 1 n 1 lim - '"'" o:~ k = -2 n-oo n ~, 7r
10
2 11"
[J(>..W d>.. .
0
k=l
IT Tn is Hermitian for each n, then for any continuous appropriate interval,
lim n-oo
t
~ F(O:n,k) = 2~ r21l" n k=l Jo
FO
on the
F[f(>..)] d>...
/I
The appropriate interval for F is [min,X f(>"), max,X f(>..)]. In this case, one can show that the eigenvalues of Tn lie in this appropriate interval. For further exposition in this connection, the reader can refer to Gray (1972) and Grenander and Szego (1958).
Complements 16.4.1 Let V be the collection of all n x n Toeplitz matrices. Show that V is a vector space and its dimension is 2n - 1. Obtain a basis of
V. 16.4.2
Determine the inverse of the following Toeplitz matrix of order
nx n.
T=
1 0 0
-1 1 0
0 -1 1
0 0 0
0 0 0
0 0
0 0
0 0
1 0
-1 1
For this matrix, ao = 1, al = -1, and all other ai = O.
506
MATRIX ALGEBRA THEORY AND APPLICATIONS
16.4.3 nx n.
Determine the inverse of the following Toeplitz matrix of order
T=
1 -2 1
0 0
-2
0 0 1
0 0 0
0 0 0
0 0 0
0 0
0 0
-2 1
1 -2
0
0 1
1
In this example, ao = 1, a_I = -2, a_2 = 1, and all other ai = O. Hint: T- I is lower triangular and Toeplitz. 16.4.4 Determine the inverse of the following Toeplitz matrix T of order n x n.
T=
2
0 0 0
0 0 0
0 0 0
0 0
-1 0
2 -1
-1 2
0
2 -1 0
-1 2 -1
-1
0 0
0 0
In this example, ao = 2, al = a_I = -1, and all other ai matrix T is an example of a tri-diagonal matrix. Hint: Study the pattern of inverses for n = 1,2,3, and 4.
= O.
The
16.5. Restricted Eigenvalue Problem
In statistical applications, it is sometimes necessary to find the optimum values of a quadratic form x' Ax subject to the conditions x' Bx = 1, where B is pd, and G'x = t (see Rao (1964b)) . A simple solution exists when t = O. P 16.5.1 The stationary values of x' Ax when x is restricted to x' Bx = 1 and G'x = 0 is attained at the eigenvectors of (I - P)A with respect to B , where P is the projection operator
PROOF.
Introducing Lagrangian multipliers>. and jL, we consider the
expression
x' Ax - >.(x' Bx - 1) - 2x'GjL
Miscellaneous Complements
507
and equate its derivative to zero Ax - )"Bx - CJ.L = 0 C'x=O x'Bx
= 1.
(16.5.1)
Multiplying the first equation in (16.5.1) by I -P, we have the equation
(I - P}Ax = )"Bx
(16.5.2)
which is required to be proved. In the special case A = aa' where a is a vector, x' Ax has the maximum value when x ex B- 1 (I - P}a which is an impOi"tant result in problems of genetic selection. Another problem of interest in this connection is to find the maximum of x' Ax when x is restricted by the inequality condition C'x ~e 0 in addition to x' Bx = 1. This leads to a quadratic programming problem as shown in Rao (1964a). In the general case of the condition C' x = t, a solution is given by Gander, Golub and Matt (1989). A more general eigenvalue problem which occurs in statistical problems is to find the stationary values of
subject to the condition x*x = 1, where A is a Hermitian matrix and b is a given vector. For computational aspects of this problem reference may be made to Forsyth and Golub (1965).
16.6. Product of Two Raleigh Quotients We consider the problem of finding the stationary values of x'Cx (x' Ax)1/2(x' Bx)1/2
(16.6.1)
where A and Bare pd matrices and C is a symmetric matrix, all of the same order. The square of {16.6.1} is the product of two Raleigh coefficients (x'Cx/x' Ax) and (x'Cx/x' Bx). The special case of {16.6.1}
508
MATRIX ALGEBRA THEORY AND APPLICATIONS
with C = I and A is of order 2 x 2 originally arose in attempts to design control systems with rillnimum norm feedback matrices (Kouvaritakis and Cameron (1980) and Cameron and Kouvaritakis (1980)) and also in the study of the stability of multivariate nonlinear feedback systems (Cameron (1983)). The general case of (16.6.1) occurs in the analysis of familial data when multiple homologous measurements are available on father and son (say), and the objective is to deterrillne a linear combination of measurements which has the maximum parentoffspring correlation (Rao,and Rao (1987)). P 16.6.1
The stationary values of (16.6.1) are
where Ai and IIi are solutions of the equations
2Cx = A(A + IIB)x, II = x' Ax/x' Bx. PROOF.
(16.6.2)
Differentiating (16.6.1) with respect to x, we obtain
x'Cx x'Cx -'-A x x Ax + -'-B x x Bx Writing A = x'Cx/x' Ax and All equations (16.6.2).
=
= 2Cx.
(16.6.3)
x'Cx/x' Bx, we get the desired
A computational algorithm for solving equations (16.6.2) and some worked out examples are given in Rao and Rao (1987). An investigation into an efficient algorithm for solving (16.6.3) will be a useful contribution. 16.7. Matrix Orderings and Projection Let us consider the real vector space R n endowed with the ordinary inner product < x, y > = x'y, where x, y ERn. A matrix P E Mn is called a projector if p2 = P (i.e., P is idempotent). Let us introduce the following conventions of ordering two matrices A and B. (See Trenkler (1994) for more detailed discussion.) Lowner ordering (Lowner (1934)). Let A, B E Mn. A '5:L B if and only if B - A = CC' for some matrix C (i.e., B - A is nnd matrix).
Miscellaneous Complements
509
Star ordering (Drazen (1978)). Let A, B E Mm,n. A ~* B if and only if A'A = A'B and AA' = BA'. Rank subtractive ordering (Hartwig (1980)). Let A, B E Mm,n. A ~r8 B if and only if p(B - A) = p(B) - p(A). The following proposition characterizes projection operators in terms of the above matrix orderings. P 16.7.1 (Hartwig and Styan (1987)). P is a projector if and only if one of the following conditions is satisfied.
(1) P
~rs
I
(2) P ~* Q for some projector Q (3) P ~rs Q for some projector Q ( 4) P ~ L Q for some projector Q such that PQ eigenvalues of Pare 0 and l.
= QP and all
P 16.7.2
(Baksalary and Mitra (1991)). P is an orthogonal projector (i.e., P is idempotent and symmetric) if and only if one of the following conditions is satisfied.
(I) (2) (3) (4)
P ~* I
P P P
~rs
pip Q for some orthogonal projector Q ~L Q for some orthogonal projector Q and all eigenvalues of
~*
Pare 0 and 1 (5) 0 ~L P ~L Q and (Q - p}2 = (Q - P) for some orthogonal projector Q. 16.8. Soft Majorization
In Chapter 9, we discussed majorization of one real vector by another. We introduce here some alternative definitions applicable to real or complex vectors and mention some inequalities on eigenvalues. DEFINITION 16.8.1. For real or complex vectors v, w( E en}, w is said to majorize v, denoted by v « w, if and only if v is a convex combination of the permutations W 7r of w (W7r denotes a vector obtained from w by permuting its components according to the permutation 7r of {I, ... ,n}}, i.e., v = Ea7r w 7r , Ea7r = 1, with a 7r as real numbers. [Note that v « w defined as above implies v) + ... + Vn = w) + ... + w n , where Vi = (VI. ••. ,vn ) and Wi = (w), ... ,wn ).]
510
MATRIX ALGEBRA THEORY AND APPLICATIONS
DEFINITION 16.8.2. Withv,w(E en) andw7r as in Definition 16.8.1, w is said to softly majorize v, denoted by v «:8 w, if and only if v = Ez7r w7r (finite sum), where Z7r are complex numbers such that EIZ7r1 = l. We use the notation Eig A to denote the n-vector of eigenvalues of A E M n , including multiplicities and ordered arbitrarily. A matrix norm is called unitarily invariant if IIU* AVII = IIAII for every A E Mm,n and unitary U E Mm and V E M n , in which case we denote the norm of A by IIAilui. A matrix norm is called weakly unitarily invariant norm if IIU* AUII = IIAII for every A E Mm and unitary U E M m, in which case we denote the norm of A by II A II wui·
P 16.8.3 (Lidskii Theorem) Let A and B be Hermitian matrices with a' = (0.1, ... ,an) and {3' = ({31, ...' ,13n) as corresponding vectors of eigenvalues, where the components are arranged in decreasing order of magnitude. Then a - (3 «: Eig(A - B) which implies lldiag{ad - diag{{3i}IIui ::; IIA - Bllui.
P 16.8.4 (Bhatia and Holbrook (1989)) For any A, B E Mn and normal such that A - B is also normal, there is an ordering of Eig A and Eig B such that Eig A - Eig B
«: Eig(A -
B).
P 16.8.5 (Sunder (1982)) If A, B and (A - B) are normal as in P 16.8.4, then for each" . IIwui , there is an ordering of Eig A and Eig B, which may depend on the norm, such that IIA - Bllwui ::; lldiag{Eig A} - diag {Eig B}IIwui.
P 16.8.6 (Bhatia and Holbrook (1989)) For A and B E Mn and normal, the following statements are equivalent.
(1) IIAllwui::; IIBllwui for every such norm. (2) Eig A
«:8
Eig (B).
Miscellaneous Complements
511
16.9. Circulauts Let Co, C., C 2 , ••• ,Cn- 1 be n numbers. The circulant C of order n x n based on these numbers is defined by
C=
Co C n- 1 C n- 2
C1 Co C n- 1
C2 C1 Co
C n- 2 C n- 1 Cn- 3 C n- 2 C n- 4 C n- 3
C2 C1
C3 C2
C4 C3
Co C n- 1
C1 Co
The elements of each row (column) of C are identical to those of the previous row (column), but are moved one position to the right (down) and wrapped around. One can recognize from the structure of C that C is a Toeplitz matrix. The entries of C can be described using the modulo operation. Let C = (Cij ) be a matrix of order n x n. The matrix C is a circulant if qj = Cj-i(modn) for alII ~ i,j ~ n for some numbers Co, C l , ... , C n- 1 • The circulant C based on the numbers Co, C}, ... , C n- 1 can be denoted symbolically as C(Co, CJ, ... , Cn-d. There is another way to look at circulants. Let 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 Il= 0 1 0
0
0 0
0 0
1 0
The matrix Il is a permutation matrix of order n x n and can be identified as the circulant C(O, 1,0, ... ,0,0). The matrix Il is called the forward-shift permutation matrix. This is the permutation matrix associated with the permutation map a : {O, 1,2, ... , n - I} ~ {O, 1,2, ... ,n -I} given by a(O) = 1, a(l) = 2, .. . ,a(n - 2) = n - 1, and a(n - 1) = o. One can show that a matrix C of order n x n is a circulant if and only if Cil = IlC. Circulants occur as natural models of dispersion matrices in a certain statistical context. The variances and covariances of say 8 measurements can be modeled in the following way. (1) var(Xi ) = ao, i = 1,2, ... ,8.
512
MATRIX ALGEBRA THEORY AND APPLICATIONS
(2) COV(Xi' Xj) = lTli_jl, i =I j. The dispersion matrix L; of the measurements has the following structure lTD
L;=
lTl
lT2
lT3
lT4
lT5
lT6
lT7
lTl
lTD
lTl
lT2
lT3
lT4
lT5
lT6
lT2
lTl
lTD
lTl
lT2
lT3
lT4
lT5
lT3
lT2
IT}
lTD
lTl
lT2
lT3
lT4
lT4
lT3
lT2
lTl
lTD
lTl
lT2
lT3
lT5
lT4
lT3
lT2
lTl
lTD
lTl
lT2
lTl
lTD
lTl
lT2
lTl
lTD
lT6
lT5
lT4
lT3
lT2
lT7
lT6
lT5
lT4
lT3
Note that L; is a circulant. We now proceed to obtain spectral decomposition of a circulant. Ideas from group theory play a prominent role in the decomposition. Let C = {a, 1,2, ... ,n - I}. The set G is a group under the binary operation of addition modulo n. Let Co, C 1 , C 2 , .. . ,Cn - 1 be a set of numbers. These numbers can be viewed as a function defined on the group C. The Fourier transform C(-) of the function C(-) is given by n-l
Cj = L
e27rijk/nCk,
k=O
where i = .J=T. The function transform C(-) . More precisely,
C( .)
can be recovered from its Fourier
n-}
Cj =
~L
e-27rijk/nCk .
k=O
Define the Fourier matrix
One can verify that F is a unitary matrix. Let C be the circulant based on Co, C ll ... ,Cn - 1 • It also follows that
Miscellaneou.s Complements
513
This is a spectral decomposition of C. The eigenvalues of C are given by Co, C_I, C-2, .. ·0' C-(n-l). We will close this section with a description of cyclic designs. Suppose we want to compare the performance of some seven treatments on some experimental units. The experimental units are arranged in blocks of size 3. The units in blocks are homogeneous but there may be differences in blocks. The problem is to allocate the treatments to units in such a way that each treatment is replicated exactly the same number of times. The following is one such arrangement. Block
Treatments
1
1
2 3 4 5
2 3 4 5
6
6
7
7
2 3 4 5
6 7 1
4 5
6 7 1 2 3
The starting block has treatments 1,2, and 4. The treatments in the following block are taken to be those by repeatedly adding 1 (mod 7) to the treatments in the preceding block. This is an example of a cyclic design. Note that this is indeed a Balanced Incomplete Block Design. Every treatment is replicated three times. Every pair of treatments appears exactly in one block. Let N = (nij) be the incidence matrix of the design, i.e., nij = { 1
0
Note that
if treatment j appears in block i, otherwise. 1
N=
0 0 0 1
0 1
0
0 1 0 1 0 0 1 1 0 1 0 0 1 1 0 0 0 0 1 1 1 0 0 0 1 0 1 0 0 0 1 1
0
1
0 0 0 1
0 1 1
514
MATRIX ALGEBRA THEORY AND APPLICATIONS
Observe that N is a circulant. An excellent source for material on circulants is the book by Davis {1979}.
Complements 16.9.1 Show that the collection of all circulants of order n x n is a vector space. 16.9.2 Show that only two circulants commute. 16.9.3 IT C is a non-singular circulant, show that C- I is a circulant. Using the spectral composition of C, identify the inverse of C. 16.9.4 Show that all circulants are simultaneously diagonalizable. 16.9.5 Explore the possibility of using a circulant for developing a magic square in which all rows sums, column sums, principal diagonal sums are identical. 16.9.6 Let Yi, i = 0, ±1, ±2, . .. be a doubly infinite sequence of numbers. Let . _ Yi-I + Yi + Yi+ 1 Z, 3 ' z. = 0 , ± 1, ±2 , .... Identify the infinite matrix which transforms the sequence {Yi} to {Zi} . Show that C is a circulant.
16.10. Hadamard Matrices A matrix of order n x n whose entries are +1 or -1 is called a Hadamard matrix of order n if HH' = nln' which implies H' H = n1n and H is an orthogonal matrix. The following are some examples of Hadamard matrices:
and
[j
1 1 -1 -1
1 -1 1 -1
-~l -1
.
1
IT H is a Hadamard matrix of order n ~ 4, then necessarily n = 4t for some positive integer t. It has been conjectured that a Hadamard matrix of order 4t exists for every t ~ 1. The conjecture is not yet resolved. However, a number of results on the existence of Hadamard matrices of specific orders are available in the literature. For example, there exists a Hadamard matrix of order 2k for every k ~ 1. For the use of Hadamard matrices in design of experiments, a good reference is Hedayat and Wallis (1976).
Miscellaneous Complements
515
16.11. Miscellaneous Exercises 16.11.1
Verify that detA, where A E Mn is the Fibonacci matrix,
11 is exactly the nAh term of the Fibonacci sequence 1,2,3,5,8,13, ... where an = a n -l + a n-2 (n;::: 3). 16.11.2 Let I n denote the tridiagonal Jacobi matrix
o
0
Show that IJn! = a n !Jn- 1 !- bn-lCn-l!Jn-l!, n;::: 3. 16.11.3 (Fredholm Theorem) The equation Ax = b, (A E Mm,n, with elements in C, b E em) is solvable if and only if b is orthogonal to all solutions of the homogeneous equation A*y = O. 16.11.4 Let A E Mn and Hermitian with eigenvalues Al ;::: ... ;::: An. Let J-Ll ;::: ... ;::: J-Ln-r be the eigenvalues of A subject to 1· linear constraints as in Section 16.5. Then show that
Ai ;::: J-Li ;::: Ai+r, i
= 1, . . . ,n -
r.
In particular, if there is only one constraint, then
16.11.5
Define
a
= max!a rs !, T,S
b = max!brs !, T,S
C
= max!crs ! T,S
516
MATRIX ALGEBRA THEORY AND APPLICATIONS
where A = (aij) E M n , B = (b ij ) = 2-1(A+A)*, C A*). Then, if A is any eigenvalue of A, show that
IAI
~ na,
IRe AI
~ nb,
11m AI
= (Cij)
= 2-1(A-
~ nco
This is a corollary to the Schur theorem (P 16.3.15). 16.11.6 Prove that a matrix G is the Moore-Penrose inverse of A if and only if GAA* = A* and G = BA* for some matrix B . 16.11.7 Let A E Mm,r and B E Mr,n each of rank r ~ {min(m, n)}. Show that (AB)+ = B+ A+. 16.11.8 Let A E Mn and A = HU = U1H1 be polar decomposition of A. Show that A+ = U* H+ = Hi Ui are those of A + . 16.11.9 (Bahadur's expansion of a probability density function). The basic idea behind Bahadur's work is the following. Let 10 be a function defined on a finite set of elements (16.11.1) and denote the vector (16.11.2) where I(Oi) is the value of 1 at ai, i = 1, ... ,k. Further let II,· .. ,Ik be k functions defined on A such that they are not linearly dependent. Then, it is clear from our knowledge of vector spaces that any function 1 defined on A belongs to the vector space V k = Sp(II, ... , h), i.e., 1 can be written as (16.11.3) for some scalars b1 , ••. ,bk • Let us introduce an inner product < .,. > in V k and choose II, ... ,Ik such that they are orthonormal with respect to the inner product. In such a case, the coefficients in (16.11.3) can be obtained in a simple way as bi
=< l.fi >, i = 1, ...
,k .
(16.11.4)
Miscellaneous Complements
517
Bahadur (1961) used this idea in obtaining an expansion for the joint density function of m random variables
where each Xi can take the value 0 or 1. There are 21Tt combinations (Xl,'" ,xm ) which are the elements of the set A as in (16.11.1). Define
p(Xi} = PT. (Xi E(X i } = 7I"i,
= Xi),
(16.11.5) (16.11.6)
V(Xi} = E(Xi -7I"i}2 = (1T, E[(Xi\ - 7I"iJ (Xi2 - 7I"i2) ... (XiJ - 7I"dJ
=
(IT (1i
(16.11.7)
j )
Pi\ ... is •
(16.11.8)
J=1
The numbers Pij, Pijk, ... represent correlations of various orders. Let us consider the functions
=
Xi - 7I"i . , 1. = 1, ... (1i
,Tn,
(1~'
= (Xi . 7I"i) (Xj (1-J,7I"j), i =1= j = 1, ...
= (Xl ~ 71"1 ) ... ( X (1~7I"1Tt ), 1Tt
,m,
(16.11.9)
which are
in number. Define the inner product of two functions Pr and Ps as 1Tt
Pr(XI, ... ,x1Tt }Ps(Xl,'" ,x1Tt } IIp(xi} (X\, .. , ,x"..)EA
i=l
(16.11.10)
518
MATRIX ALGEBRA THEORY AND APPLICATIONS
where p(Xi), i = 1, .. . ,m are fixed marginal densities arising out of p(XI. ... ,xm ) as in (16.11.5). It can be easily checked that with respect to the inner product (16.11.10), the functions (16.11.9) are orthonormal. Then, we have the expansion
p(XI. .. . ,xm ) P(Xl) ... p(xm) = bol
+ f b i (Xi ;
i=l
.
·
7ri) + L"tbij (Xi ; . 7ri) (Xj ; .7r
j
i#j
•
- 7rl) • • • (xm O"m - 7rm) + .. . + b12 ... k ( Xl 0"1
)
J
(16.11.11)
Computing the coefficients using the formula (16.11.4) we find bo = 1, bl bij
= Pij, ...
= .. . = bn = 0 , b12... k
= P12 ... k
(16.11.12)
where Pij,Pijk, ... are as defined in (16.11.8). Equation (16.11.11) provides an expansion of P(Xl, . . . , x m ). In statistical applications, we can estimate 0" i and Pij, Pijk, ... from the sample observations on (X I. ... ,Xm) . In practical applications higher order correlations may be negligible in which case p(XI, ... , xm) can be approximated by the first few terms in (16.11.11).
7ri,
Note: Some references to material covered in this Chapter, besides those mentioned in the text, are Lancaster and Tismenetsky (1985), Bhatia (1991) and Bapat (1993) . Books by Dhrymes (1978), Muirhead (1982) and Amemiya (1985) contain numerous examples of applications of matrix algebra to econometrics and statistics.
REFERENCES
Amemiya, T. (1985). Advanced Econometrics, Basil Blackwell, Oxford. Ammann, L.P. and J.W. Van Ness (1988). A routine for converting regression algorithm into corresponding orthogonal regression algorithms, ACM Trans . Math. Software, 14, 7~87. Ammann, L.P. and J.W. Van Ness (1989). Standard and robust orthogonal regression, Communication in Statistics Simulation Comput., 18, 145-162. Anderson, T.W. (1976). Estimation of linear functional relationships: Approximate distributions and connections with simultaneous equations in econometrics (with discussion), J. Roy. Statist. Soc. B, 38, 1-36. Anderson, T.W. (1980). Estimation of linear statistical relationships, Ann. Statist., 12, 1-45. Anderson, T.W. and Sawa Takamitsu (1982). Exact and approximate distributions of the maxinmm likelihood estimator of a slope coefficient, J. Roy. Statist. Soc. B, 44, 52-62. Anderson, T.W. (1984). An Introduction to Multivariate Statistical Analysis, Second edition, Wiley, New York. Ando, T. (1982). Majorization, doubly stochastic matrices and comparison of eigenvalues, Research Institute of Applied Electricity, Hokkaido University, Sapporo, Japan. Bahadur, R.R. (1961). On classification based on responses to n dichotomous items, In Studies in Item Analysis (ed. H. Solomon), Stanford Univ. Press, Stanford, 158-168. Baksalary, J.K. and S.K. Mitra (1991). Left-star and right-star partial orderings, Linear Algebra and its Applications, 149, 73-98. Baksalary, J.K. and S. Puntanen (1991) . Generalized matrix versions of the Cauchy-Schwarz and Kantorovich inequalities, Aequationes Math., 41, 103-110. 519
520
MATRIX ALGEBRA THEORY AND APPLICATIONS
Bapat, RB. (1993) . Linear Algebra and Linear Models, Hindustan Book Agency, New Delhi . Barnett, S. (1990). Matrix, Methods and Applications, Clarendon Press, New York. Baa, P.e. and J. Rokne (1987). Closest normal matrix bound, BII, 27, 585-598. Bhatia, Rajendra (1991). Matrix Analysis, Springer-Verlag, New York. Bhatia, Rand A.R Holbrook (1989). A softer, stronger Lidskii theorem, Proc. Indian A·cad Sci. (Math. Sci'), 99, 75-83. Bloomfield, P. and G.S. Watson (1975). The inefficiency ofleast squares, Biometrika, 62, 121-128. Bose, R.C., S. Chowla and C.R Rao (1944). On the integral order mod p of quadratics x 2 + ax + b with applications to the construction of minimum functions for GF{p2) and to some number theory results, Bull. Cal. Math. Soc., 15, 153-174. Bose, R.C., S. Chowla and C.R. Rao (1945a). On the roots of a wellknown congruence, Proc. Nat. Acad. Sci., 14, 193. Bose, RC., S. Chowla and C.R Rao (1945b). Minimum functions in Galois fields , Proc. Nat. Acad. Sci., 14, 191. Bose, R.e., S.S. Shrikhande and E.T. Parker (1960) . Further results on the construction of mutually orthogonal latin squares and the falsity of Euler's conjecture, Canad. J. Math., 12, 189-203. Cameron, R (1983). Minimizing the product of two Raleigh quotients, Linear and Multivariate Algebro, 13, 177-178. Cameron, Rand B. Kouvaritakis (1980). Minimizing the norm of output feedback controllers used in pole placement: a dyadic approach, Int. J. Control, 32, 759-770. Datta Biswa Nath (1995). Numerical Linear Algebro and Applications, Brooks/Cole. Davis, P.J. (1979). Circulant Matrices, Wiley, New York. Diaz, J.B. and F.T. Metcalf (1964) . Inequalities complementary to Cauchy's inequality for sums of real numbers, J. Math. Anal., 9, 59-74. Dhrymes, P.J. (1978). Introductory Econometrics, Springer-Verlag, New York. Drazen, M.B. {1978}. Natural structures on semi groups with involution, Bull. Amer. Math. Soc. , 84, 139-141.
References
521
Durbin, J. {1960}. The fitting of time series models, Rev Inst. Int. Stat., 28, 233-243. Eckart, C. and G. Young {1936}. The approximation of one matrix by another of lower rank, Psychometrika, 1, 211-218. Edelman, A. {1997}. The probability that a random real Gaussian matrix has k real eigenvalues, related distributions and the circular law, J. Multivariate Analysis, 60, 203-232. Forsythe, G.E. and G.H. Golub {1965}. On the stationary values of a second degree polynomial on the unit sphere, SIAM J . Appl. Math. Soc., 94, 1-23. Fuller, W.A. {1987}. Measurement Error Models, Wiley, New York. Gander, W., G.H. Golub and V. von Matt (1989). A constrained eigenvalue problem, Linear Algebra and its Applications, 114/115, 815839. GIeser, L.J. {1981}. Estimation in a multivariate "errors in variables" regression model: Large sample results, Ann. Statist., 9, 24-44. Golub, G.H. and C.F. Van Loan {1983}. Matrix Computations, North Oxford Academic, Oxford. Graham, A. {1981}. Kronecker Products and Matrix Calculus with Applications, Ellis Horwood Limited, Chichester, England. Gray, R.M. {1972}. On the asymptotic eigenvalue distribution of Toeplitz matrices, IEEE Trans. Information Theory, 18, 725-730. Grenander, Ulf and Gabor Szego {1958}. Toeplitz Forms and Their Applications, Univ. of California Press. Greub, W. and W. Rheinboldt {1959}. On a generalization of an inequality of L.V. Kantorovich, Proc. Amer. Math. Soc., 10,407-415. Halmos, P.R. {1958}. Finite-Dimensional Vector Spaces, Van Nostrand, New York. Halmos, P.R. {1972}. Positive approximations of operators, Indiana Univ. Math. J., 21, 951-960. Hartley, H.O., J.N.K. Rao and G. Kiefer {1969}. Variance estimation with one unit per stratum, J. Am. Statist. Assoc., 68, 189-192. Hartwig, R.E. (1980). How to partially order regular elements, Mathematica Japonica, 25, 1-13. Hartwig, R.E. and G.P.H. Styan (1987). Partially ordered idempotent matrices. In Proc. Second Int. Tampere Conference in Statistics, (eds. T. Pukkila and S. Puntanen), 361-383.
522
MATRIX ALGEBRA THEORY AND APPLICATIONS
Hedayat, A. and W.O. Wallis (1978). Hadamard matrices and their applications, Ann. Statist., 6, 1184-1238. Horn, R. and C.A. Johnson (1985). Matrix Analysis, Cambridge University Press, Cambridge, UK. Horn, R. and C.A. Johnson (1991). Topics in Matrix Analysis, Cambridge University Press, Cambridge, UK. !kobe, Y., T. Inagaki and S. Miyamoto (1987). The monotonicity the<>rem, Cauchy's interlace theorem, and the Courant-Fischer theorem, Amer. Math. Monthly, 94, 352-354. Khatri, C.G. and C.R. Rao (1981). Some extensions of the Kantorovich inequality and statistical applications, J. Multivariate Analysis, 11, 498-505. Khatri, C.G. and C.R. Rao (1982). Some generalizations of Kantorovich inequality, Sankhya A, 44, 91-102. Knott, M. (1975). On the minimum efficiency ofleast squares, Biometrika 62, 129-132. Kouvaritakis, B. and R. Cameron (1980). Pole placement with minimized norm controllers, Pmc. IEEE, 127,32-36. Ky Fan (1951). Maximum properties and inequalities for eigenvalues of completely continuous operators, Pmc. Nat. Acad. Sci., 37, 760-766. Ky Fan and A.J. Hoffman (1955). Some matrix inequalities in the space of matrices, Pmc. Amer. Math. Soc., 6, 111-116. Levinson, N. (1947). The Weiner RMS error criterion in filter design and prediction, J. Math. Phys., 25, 261-278. Lin, C.T. (1990). Extremes of determinants and optimality of canonical variables, Commun. Statist. Simul. Comp., 19, 141~1430. Liu, S. (1995). Contributions to Matrix Calculus and Applications in Econometrics, Book No.106 of the Tinbergen Institute Research Series, Ph.D. Thesis. LOwner, K. (1934). Uber monotone Matrixfunktionen, Mathematishe ZeitschriJt, 38, 177-216. Magnus, J.R. and H. Neudecker {1991}. Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley, Chichester. Maher, P.J. {1994}. Means for matrices, Int. J. Math. Educ. Sci. Technol., 25, 591-623. Mann, H.B. (1949). Analysis and Design of Experiments, New York, Dover Publication.
References
523
Marsaglia, G. {1967}. BoWlds on the ranks of sums of matrices. Trans. Fourth Prague Conference on Information Theory, Statistical Decision FUnctions and Random Processes, Czech Acad, Sciences, 455462. Marsaglia, G. and G.P.H. Styan {1974}. Equalities and inequalities for ranks of matrices, Linear and Multilinear Algebra, 2, 269-292. Marshall, A.W. and I. Olkin {1979}. Inequalities: Theory of Majorization and its Applications, Academic Press, New York. Marshall, A.W. and I. Olkin {1990}. Matrix versions of the Cauchy and Kantorovich inequalities, Aequationes Math., 40, 89-93. Mirsky, L. {1960}. Symmetric gauge fWlctions and unitarily invariant norms, Quarterly J. of Mathematics, Oxford, Second series, 11, 5059. Mirsky, L. {1990}. An Introduction to Linear Algebra, Dover Publication. Muirhead, R.J. {1982}. Aspects of Multivariate Statistical Theory, Wiley, New York. Mullholand, H.P. and C.A.B. Smith (1959). An inequality arising in genetical theory, Amer. Math. Monthly, 66, 673-683. Pearson, K. {1901}. On lines and planes of closest fit to systems of points in space, Philosophical Magazine, 2 {sixth series}, 559-572. Raghavarao, D. {1971}. Constructions and Combinatorial Problems in Design of Experiments, John Wiley, New York. Rao, C.R. {1945a}. Generalization of Markoff's theorem and tests of linear hypotheses, Sankhyii, 7, 9-16. Rao, C.R. {1945b}. Markoff's theorem with linear restrictions on parameters, Sankhyii, 7, 16-19. Rao, C.R. {1945c}. Finite geometries and certain derived results in theory of numbers, Proc. Nat. Inst. Sci., 11, 136-149. Rao, C.R. {1946a}. Difference sets and combinatorial arrangements derivable from finite geometrics, Proc. Nat. Inst. Sci., 12, 123-135. Rao, C.R. {1946b}. On the linear combination of observations and the general theory of least squares, Sankhyii, 7, 237-256. Rao, c.R. {1947}. Factorial experiments derivable from combinatorial arrangements of arrays, J. Roy. Statist. Soc., 9, 128-140. Rao, c.R. {1949}. On a class of arrangements, Edin. Math. Proc., 8, 199-225. Rao, C.R. {1951}. A theorem in least squares, Sankhyii, 11, 9-12.
524
MATRIX ALGEBRA THEORY AND APPLICATIONS
Rao, C.R. (1955). Analysis of dispersion for multiply classified. data with unequal numbers of cells, Sankhya, 15, 253-280. Rao, C.R. (1959). Some problems involving linear hypotheses in multivariate analysis, Biometrika, 46, 49-58. Rao, C.R. (1962). A note of generalized. inverse of a matrix with applications to problems in mathematical statistics, J. Roy. Statist. Soc. B, 24, 152-158. Rao, C.R. (1964a). Problems of selection involving programming techniques. In Proc. IBM Scientific Computing Symposium on Statistics, 29-51. Rao, C.R. (1964b). The use and interpretation of principal component analysis in applied. research, Sankhya A, 26, 329-358. Rao, C.R. (1965). The theory of least squares when the parameters are stochastic and its applications to the analysis of growth curves, Biometrika, 52, 447-458. Rao, C.R. (1967) . Calculus of generalized inverses of matrices: Part r general theory, Sankhya A, 29, 317-350. Rao, C.R. (1968). A note on a previous lemma in the theory of least squares and some further results, Sankhya A, 30, 259-266. Rao, C.R. and S.K. Mitra (1968a). Simultaneous reduction of a pair of quadratic forms, Sankhya A, 30, 313-322. Rao, C.R. and S.K. Mitra (1968b). Some results in estimation and tests of linear hypotheses under the Gauss-Markov model, Sankhya A, 30, 281-290. Rao, C.R. and S.K. Mitra (1969). Conditions for optimality and validity of least squares theory, A nn. Math. Statist., 40, 1716-1724. Rao, C.R. (1971). Unified theory of linear estimation. Sankhya A, 33, 371-394. Rao, C.R. and S.K. Mitra (1971a). FUrther contributions to the theory of generalized inverse of matrices and its applications, Sankhya A, 33, 289-300. Rao, C.R. and S.K. Mitra (1971b). Generalized Inverse of Matrices and its Applications, Wiley, New York (Japanese Translation, Tokyo, 1973). Rao, C.R. (1972a). Estimation of variance components in linear models. J. Am. Statist. Assoc., 67, 112-115. Rao, C.R. (1972b). A note on rPM method in the unified theory of linear estimation, Sankhya A, 34, 285-288.
References
525
Rao, C.R., P. Bhimasankaram and S.K. Mitra (1972). Determination of a matrix by its subclasses of g-inverse, Sankhya A, 24, 5-8. Rao, C.R. (1973a). Unified theory of least squares, Communications in Statistics, 1, 1-8. Rao, C.R. (1973b). Representation of best unbiased estimators in the Gauss-Markov model with a singular dispersion matrix, J. Multivariate Analysis, 3, 276-292. Rao, C.R. {1973c}. Linear Statistical Inference and its Applications, Wiley, New York. Rao, C.R. and S.K. Mitra (1973). Theory and application of constrained inverse of matrices, SIAM J. Appl. Math., 24, 473-488. Rao, C.R. (1974). Projectors, generalized inverses and the BLUE's, J. Roy. Statist. Soc., 35, 442-448. Rao, C.R. and S.K. Mitra (1974). Projections under semi-norms and generalized inverse of matrices, Linear Algebra and Appl., 9, 155-167. Rao, C.R. (1975). Theory of estimation of parameters in the general Gauss-Markov model, In a Survey of Statistical Design and Linear Models, (ed. J.N. Srivastava), North Holland, 475-487. Rao, C.R. and S.K. Mitra (1975). Extension of a duality theorem concerning g-inverse of matrices. Sankhya A, 37, 439-445. Rao, C.R. (1976a). Estimation of parameters in a linear model Wald Lecture 1, Ann. Statist., 4, 1023-1037, with a correction in Vol. 7, 696. Rao, c.R. (1976b). On a unified theory of linear estimation in linear models - A review of recent results, In Perspectives in Probability, Papers in honor of M.S. Bartlett, (ed. J. Gani), Academic Press, New York, 89-104. Rao, C.R. (1978a). Least squares theory for possibly singular models, The Canadian J. Statist., 6, 19-23. Rao, C.R. (1978b). A note on the unified theory ofleast squares, Commun. Statist. Theor. Meth. A, 7(5}, 409-411. Rao, C.R. (1978c). Choice of best linear estimators in the Gauss-Markov model with a singular dispersion matrix, Commun. Statist. Theor. Meth. A, 7(13}, 1199-1208. Rao, C.R. and H. Yanai {1979}. General definition and decomposition of projectors and some applications to statistical problems, J. Statist. Planning and Inference, 3, North-Holland, 1-17.
526
MATRIX ALGEBRA THEORY AND APPLICATIONS
Rao, C.R. (1979a). Estimation of parameters in the singular GaussMarkov model, Commun. Statist. Theor. Meth. A, 8(14), 13531358. Rao, C.R. (1979b). Separation theorems for singular values of matrices and their applications in multivariate analysis, J. Multivariate Analysis, 9, 362-377. Rao, c.R. (1980). Matrix approximations and reduction of dimensionality in multivariate statistical analysis, In Multivariate Analysis V, (ed. P.R. Krishnaiah), North-Holland, 3-22. Rao, C.R. and Jurgen Kleffe (1980). Estimation of variance components, Handbook of Statistics, (ed. P.R. Krishnaiah), North-Holland, 1, 1-40. Rao, C.R. (1981). A lemma on g-inverse of a matrix and a computation of correlation coefficients in the singular case, Commun. Statist. Theor. Meth. A, 10, 1-10. Rao, C.R. (1984). Optimization of functions of matrices with applications to statistical problems, In W. G. Cochran's Impact on Statistics, (ed. Poduri, S.R.S. Rao), John Wiley, New York, 191-202. Rao, C.R. (1985a). Matrix derivatives: Applications in statistics, In Encyclopedia of Statistical Sciences Vo1.5, (eds. Kotz-Johnson), John Wiley, New York, 320-325. Rao, C.R. (1985b). A unified approach to inference from linear models, In Pmc. First International Tampere Seminar on Linear Statistical Models and Their Applications, (eds.T. Pukkila and S. Puntanen), University of Tampere, Finland, 9-36. Rao, C.R. (1985c). The inefficiency of least squares: Extension of Kantorovich inequality, Linear Algebra and its Applications, 70, 249-255. Rao, C.R. (1985d). Tests for dimensionality and interactions of mean vectors under general and reducible covariance structures, J. Multivariate Analysis, 16, 173-184. Rao, C.R. and H. Yanai (1985a). Generalized inverse of linear transformations: A geometric approach, Linear Algebra and its Applications, 66,87-98. Rao, C.R. and H. Yanai (1985b). Generalized inverses of partitioned matrices useful in statistical applications, Linear Algebra and its Applications, 70, 105-113. Rao, C.R. (1987) . Estimation in linear models with mixed effects: A unified theory, In Proc. Second International Tampere Conference in
References
527
Statistics, (eds. T. Pukkila and S. Puntanen), University of Tampere, Finland, 73-98. Rao, c.R. and C.V. Rao (1987). Stationary values of the product of two Raleigh coefficients: homologous canonical variates, Sankhya B, 49, 113-125. Rao, C.R. and J. Kleffe (1988). Estimation of Variance Components and Applications, North-Holland. Rao, C.R. (1989). A lemma on optimization of a matrix function and a review of the unified theory of linear estimation, In Statistical Data Analysis and Inference, (ed. Y. Dodge), Elsevier Science Publishers B.V., 397-418. Rao, C.R. (1995). A review of canonical coordinates and an alternative to correspondence analysis using Hellinger distance, Questiio, 19, 15-63. Rao, C.R. and H. Toutenburg (1995). Linear Models: Least Squares and Alternative Methods, Springer-Verlag. Rao, C.R. (1996). Seven inequalities in statistical estimation theory, Student, 1, 149-158. Rorres, C. and H. Anton (1984). Applications of Linear Algebra, 3rd edition, John Wiley, New York. Schopf, A.H. (1960). On the Kantorovich inequality, Numer. Math., 2, 344-346. Seely, J. (1971). Quadratic subspaces and completeness, Ann. Math. Stat., 42, 710-721. Seely, J. and G. Zyskind (1971). Linear spaces and minimum variance unbiased estimation, Ann. Math. Stat., 42, 691-703. Srivastava, M.S. and C.G. Khatri (1979). An Introduction to Multivariate Statistics, North-Holland, New York. Strang, W.G. (1960). On the Kantorovich inequality, Proc. Amer. Math. Soc., 11, 468. Styan, G.P.H. (1983). On some inequalities associated with ordinary least squares and the Kantorovich inequality. In Festschrift for Eino Haikala on his Seventieth Birthday, Univ. of Tampere, 158-166. Sunder, V.S. (1982). On permutations, convex hulls and normal operators, Linear Algebra and Appl., 48, 403-411. Trenkler, G. (1994). Characterizations of oblique and orthogonal projectors, Proc. Int. Conference on Linear Statistical Inference, LINSTAT'93 (eds. T. Calinski and R. Kala), 255-270.
528
MATRIX ALGEBRA THEORY AND APPLICATIONS
Toutenburg, H. (1992). Lineare Modelle, Physica-Verlag, Heidelberg. Trench, W.F. (1964). An algorithm for the inversion of finite Toeplitz matrices, J. ACM, 16, 592-601. Van Huffel, S. and J. Vandewalle (1991). The Total Least Squares Problem: Computational Aspects and A nalysis, Frontiers in Applied Mathematics Ser., Vol. 9 , SIAM, Philadelphia. Van Huffel, S. and H. Zha (1993). The total least squares. In Handbook of Statistics 9, Computational Statistics, Elsevier Science Publishers B.V., 377-408. Von Neumann, J. (1937), Some matrix inequalities and metrization of metric spaces, Tomsk Univ. Rev., 1, 286-299. Yanai, H. and Y. Takane (1992). Canonical correlation analysis with linear constraints, Linear Algebra Appl., 176, 75-89.
INDEX
[Items are generally listed under broad headings such as inequalities, factorization of matrices, matrix types, matrix reduction, generalized inverses and so on.]
Abel's identity, 385
Adjoint, 155
Adjugate matrix, 144, 149
Analysis of variance, 29
Autoregressive model, 501
Bahadur expansion, 516
Balanced incomplete block design, 48, 49, 447, 513
Cauchy-Binet formula, 149, 154
Cauchy-Schwartz inequality, 53, 494
  constrained version, 455
  integral version, 455
  matrix version, 494
Cholesky decomposition, 173, 191
Circulant matrices, 511
Conjugate bilinear functional
  definition, 84
  eigenvalues, 87
  eigenvectors, 87
  minimax theorems, 98
  rank, 97
  signature, 97
  singular value decomposition, 101-104
  spectral theory, 83-98
Contragredient transformation, 493, 494
Convex function, 308
Convexity
  extreme points, 309
  Schur, 307, 318
  strictly Schur, 307
  strong Schur, 307, 319
Courant-Fischer theorem
  eigenvalues, 332
  singular values, 335
Cramer-Rao bound, 454
Derivatives
  matrix, 223-238
  vector, 224
Determinant
  adjugate matrix, 144, 149
  Cauchy-Binet formula, 149, 154
  co-factor, 144
  definition, 142
  Laplace expansion, 153
  minor, 147, 163
  permanent, 311
  Vandermonde, 145
  Sylvester's identity, 151, 152
Drazin ordering of matrix, 509
Durbin's algorithm, 502, 504
Eigenvalue, 177-180, 229, 321, 332
Eigenvector, 177-180, 229, 323
Factorization of matrices
  Cholesky, 173, 191
  eigenvalue, 174
  general, 174, 191
  Hermitian, 175
  Jordan canonical, 191
  LU, 163, 189
  normal, 175, 190
  polar, 187, 191
  QR, 168, 176, 190
  real symmetric, 191
  SQ, 176
  rank, 162, 189
  Schur, 175
  Schur triangulation, 189
  singular value, 101, 102, 172, 191
  sparse spectral, 439
  spectral, 190
  Takagi, 192
  tridiagonal, 190
  upper Hessenberg, 190
Fejér's theorem, 214
Fibonacci
  matrix, 515
  numbers, 476, 492
Field of numbers, 1-13
Fredholm theorem, 515
Fitting hyperplanes, 398
Galois field, 6, 8
Gauss-Markov theorem, 244, 300
Generalized inverse of a matrix
  general inverse, 265
  least squares, 289, 291
  left inverse, 116, 132, 264
  matrix approximations, 296-299
  minimum norm, 288, 289
  minimum norm least squares, 291
  Moore-Penrose, 287, 440
  partitioned matrix, 27~277
  Rao-Yanai (LMN), 282-292
  reflexive, 279
  right inverse, 116, 132, 264
  various other types, 292, 294
Geršgorin theorem, 500
Gramian, 449
Gram-Schmidt orthogonalization, 57, 62, 169
  modified form, 170
Group, 1
Hadamard-Schur product, 203
  eigenvalues, 206
  non-negative definiteness, 204
  rank, 205
Horn's theorem, 340
Hyperplane, 398
Idempotent matrices, 64, 250
Inequalities
  AM-GM-HM matrix type, 499
  Araki-Lieb-Thirring, 499
  Aronszajn, 498
  Bessel, 60
  Cauchy-Schwartz, 53, 454, 494
  Fiedler, 453
  Fischer, 453, 498
  Frobenius, 134
  Hadamard, 456
  Hadamard-Schur, 209
  Hölder, 457
  Horn's theorem, 340
  information theory, 458, 459
  interlace theorem (eigenvalues), 100, 328, 336
  interlace theorem (singular values), 329
  Jensen, 460
  Kantorovich, 462-466
  Ky Fan, 381
  Ky Fan and Hoffman, 381
  Lieb-Thirring, 499
  Minkowski, 451, 458
  moment, 461, 462
  monotonicity theorem (eigenvalues), 322
  monotonicity theorem (singular values), 326
  Oppenheim, 453
  Poincaré separation theorem (eigenvalues), 337
  Poincaré separation theorem (singular values), 338
  Rao, 382
  Robertson, 452
  Schopf, 497
  Sturmian separation theorem (eigenvalues), 332
  Sylvester, 135
  Taussky, 452
  von Neumann's theorem, 342-359
  von Neumann, 386
  Weyl's theorem (eigenvalues), 359
  Weyl's theorem (singular values), 360
Inner product, 51
  semi, 76
Interlace theorem (eigenvalues), 100, 328, 336
Interlace theorem (singular values), 329
Kantorovich type inequalities, 456, 462, 463
  Bloomfield, Watson, Knott, 464
  Greub-Rheinboldt, 456, 466
  Kantorovich, 462
  Khatri-Rao, 464-466
  matrix version, 496
  Pólya-Szegő, 456
  Rao, 465
  Schopf, 497
  Schweitzer, 456
  Strang, 465
  Styan, 465
Latin squares, 2, 3
  Graeco-Latin, 7
  mutually orthogonal, 7-10
  orthogonal, 6
Least squares method, 70
Leontieff model, 477
Leslie model, 491
Lidskii's theorem, 510
Linear equations, 29, 31, 66, 196, 198, 515
  consistency, 31, 34, 67, 269, 288
  Fredholm theorem, 515
  least squares solution, 70-75
  minimum norm solution, 69
  solution, 269
Linear functional, 23, 35, 71
  representation theorem, 71
Linear independence, 19
Linear model, 217, 228, 242, 244, 256, 300, 403, 406, 443
  estimation, 407, 408
Linear transformation, 107
  algebra of, 110
  inverse, 116
  kernel, 108
  range, 108
Magic squares, 6, 10, 11
Majorization
  complex vectors, 509
  real vectors, 303
  soft majorization, 510
  weak majorization, 306
  Weyl's majorant theorem, 497
Markov chains, 481-484
Matrix approximations, 388-393
  Eckart, Young and Mirsky, 392
  fitting a hyperplane, 398-402
  g-inverse, 296-299
  Halmos, 392
  Rao, 394-397
Matrix derivatives, 223-228
Matrix operations
  conjugate transpose, 125
  determinant, 142
  kernel, 128
  nullity, 128
  permanent, 311
  pinching, 498
  range, 128
  rank, 128, 131
  Schur complement, 140
  similarity, 191
  spectral decomposition, 449
  trace, 125
  transpose, 124
  vec, 200
Matrix orderings
  Drazin, 509
  Löwner, 508
  rank subtractive, 509
  star, 509
Matrix products
  Hadamard-Schur, 203-206, 496
  Khatri-Rao, 216
  Kronecker, 193, 195, 196
Matrix reduction
  Cholesky decomposition, 173, 191
  general theorem, 174
  Gram-Schmidt method, 169-171
  Hermitian, 175
  Jordan canonical, 191
  LU factorization, 163, 189
  polar decomposition, 187, 191
  QR factorization, 168, 190
  rank factorization, 162, 189
  SQ factorization, 176
  Schur decomposition, 175
  Schur triangulation, 189
  singular value decomposition, 101, 102, 172, 191
  spectral decomposition, 190
  Takagi factorization, 192
  tridiagonal, 190
  upper Hessenberg, 190
Matrix types
  circulant, 511
  diagonally dominant, 369
  echelon form, 159, 163
  elementary, 157-159
  Fibonacci, 151
  Givens, 188
  Hadamard, 514
  Hermitian, 166
  Hessenberg, 188
  Householder, 167
  idempotent, 64
  identity, 123
  irreducible, 467
  lower triangular, 160
  non-negative, 467
  non-negative definite, 180, 449
  null or zero, 123
  persymmetric, 502
  positive, 467
  positive definite, 180, 449
  symmetric, 166
  triangular, 188
  tridiagonal, 188
  Toeplitz, 501
  upper triangular, 159
Minimax theorems
  eigenvalues, 98, 332
MINQUE, 221, 415
Monotonicity theorems
  eigenvalues, 322
  singular values, 326
Non-negative matrices
  reducibility, 468
  primitive, 470
  Perron-Frobenius theorem, 467, 473, 475
Norm
  distance based on, 361, 362
  Frobenius, 362, 374, 376
  induced norm, 367
  Ky Fan, 388
  L1, L2, Lp, L∞ norms, 362
  M, N-invariant, 394-397
  matrix, 363, 364
  spectral, 371-376
  spectral radius, 364
  symmetric gauge function, 377
  unitarily invariant, 297, 374, 375, 379, 510
  vector, 53, 361, 362
  weakly unitarily invariant, 510
Ostrowski-Taussky theorem, 454
Parseval identity, 60
Perron-Frobenius theorem, 467, 475
Perron left root, 475
Perron right root, 475
Poincaré separation theorem
  eigenvalues, 337, 398
  singular values, 338
Population growth model, 489
Prediction, 73
Prediction in time series, 501, 502
Primitive matrix, 470
Product of matrices
  Hadamard-Schur, 203-206, 496
  Khatri-Rao, 216
  Kronecker, 193-196
Product of Raleigh quotients, 507
Projective geometry, 42
Projection operator, 239, 283, 450
  matrix representation, 256-262
  orthogonal projection, 243, 248, 301
  Rao's form, 262
Pythagoras theorem, 65, 499
Quadratic subspace, 443
  commutators of, 442
  sparse spectral representation, 439
  spectral decomposition, 439
  structure of, 438
Raleigh quotient, 334
Raleigh quotients
  product of, 507
Rank of matrix
  Frobenius inequality, 134
  product of matrices, 131
  rank subtractivity, 509
  Sylvester's inequality, 135
Regression, 74, 75
Restricted eigenvalue problem, 506
Schur complement, 140
Schur theorem, 500
Semi inner product, 76
Simultaneous reduction of matrices, 184, 186, 493
Simultaneous s.v.d., 192
Singular value decomposition, 101
Spectral norm, 371-376
Spectral radius, 364, 471
Spectral theory, 83
Spectrum, 92
Stochastic matrix, 482
  doubly stochastic, 308, 316
Sturmian separation theorem, 337
Sufficient statistics, 445
Sylvester's identity, 151, 152
Symmetric function, 308
Symmetric gauge function, 380
Toeplitz matrices, 501
  eigenvalues of, 506
Total least squares, 428
Transformation
  adjoint, 155
  algebra of, 110, 114
  bijective, 14, 116, 263
  homomorphism, 15, 107
  injective, 14, 116
  invariance, 245
  inverse, 15, 116, 263
  isomorphism, 15, 108
  kernel, 108
  nullity, 108
  range, 107
  rank, 108
  surjective, 14, 116, 263
Variance components, 217, 218, 221, 415-422, 443
Vec operation, 200
Vector derivative, 224
Vector spaces
  angle, 55
  annihilator, 39
  basis, 19, 20
  definition, 16
  dimension, 21
  direct sum, 25
  distance, 55
  dual space, 35
  Euclidean space, 51, 52
  Hamel basis, 20
  hyperplane, 398
  inner product space, 52
  isomorphic, 18
  norm, 53
  orthogonal basis, 59
  orthogonal complement, 62
  orthogonal projection, 64
  projective geometry, 42
  quotient space, 41
  semi-inner product space, 76
  subspace, 24
  unitary space, 51, 52
von Neumann's theorem, 342-359
Weyl's majorant theorem, 497
  converse of, 498
Weyl's theorem (eigenvalues), 327, 359