Linear Algebra
Richard Kaye and Robert Wilson
School of Mathematics and Statistics, The University of Birmingham
This book has been printed digitally and produced in a standard specification in order to ensure its continuing availability
OXFORD UNIVERSITY PRESS
Great Clarendon Street, Oxford OX2 6DP Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in Oxford New York
Auckland Bangkok Buenos Aires Cape Town Chennai Dar es Salaam Delhi Hong Kong Istanbul Karachi Kolkata Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi Sao Paulo Shanghai Taipei Tokyo Toronto Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries Published in the United States by Oxford University Press Inc., New York ©Richard Kaye and Robert Wilson, 1998 The moral rights of the authors have been asserted Database right Oxford University Press (maker) Reprinted 2003 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above You must not circulate this book in any other binding or cover and you must impose this same condition on any acquirer ISBN 0-19-850237 0
Preface

This book constitutes a second course in linear algebra, and is based on second year courses given first by RW and then by RK in Birmingham over the last five years. The objectives of such a course are as follows. Firstly, the student must learn a whole host of algebraic methods associated with bilinear forms, inner products, eigenvectors, and diagonalization of matrices, and be confident in carrying out calculations with these in all areas of mathematics. Secondly, this course is likely to be one of the first places where the student meets the axiomatic method of abstract algebra, and as such serves as an introduction to abstract algebra in general.

We believe that these two requirements can be successfully married into a single course. By this stage in a student's career, vectors and matrices should be sufficiently familiar that the jump to the full rigour of the axiomatic treatment of vector spaces is not such a great one. Our approach throughout is to show how certain key examples suggest axioms, and then to prove 'structure theorems' showing that the abstract objects satisfying the axioms are isomorphic to one of the 'canonical' examples for finite dimensional vector spaces over $\mathbb{R}$ or $\mathbb{C}$ or other fields. More advanced theorems can then be proved in a 'concrete way' using matrices and column vectors. This approach has the advantages that matrices (with which students are generally quite comfortable) are never very far away, and that proofs coincide very closely with the calculations that the students will be required to do, so algorithms or methods can be obtained from studying proofs (a useful lesson to learn in general). Obviously, we give plenty of examples in the text, and in doing so we are able to point out exactly where the proofs of the corresponding theorems suggest how to do the calculation in question.
We have assumed a certain amount of familiarity with matrices and basic operations on them (addition, multiplication, transpose, determinants, and calculating inverses), and the student should be able to solve simultaneous linear equations of the form $Ax = b$, obtaining the full solution set. This much at least will be included in a first-year programme of study, and this material is summarized in Chapter 1. We also assume some basic knowledge of the complex numbers, but we do not assume the student has encountered vector spaces over $\mathbb{C}$ before. Properties of polynomials that are required for understanding of the Cayley-Hamilton Theorem and the minimum polynomial of a linear transformation are set out in a suitable place in Part III.

Typically, a student at this level will have met the concepts of vector space over $\mathbb{R}$ and dimension before, but full familiarity with these ideas is not necessary since this material is revised fully in Chapter 2. Clearly the extent to which this chapter needs to be revised or expanded upon is at the discretion of the lecturer or the student. We have chosen to include all of the basic material on linear independence, bases and dimension here, even though for most students this will be revision. Almost all of the time our vector spaces are finite dimensional, but since students at this level will occasionally meet applications (such as Fourier series) which require infinite dimensional vector spaces we have included some material on these too; in most cases we prove results for finite dimensional vector spaces only, indicating afterward whether or not the theorem is also true for infinite dimensional spaces.

We have also included a brief introduction to fields, and this may well be new material to a student at this level. The reasons for including this material are clear: many examples and applications of vector spaces that a typical undergraduate will see will involve fields other than $\mathbb{R}$ or $\mathbb{C}$, and the field axioms provide an important illustration of the axiomatic approach. We do not allow ourselves to dwell on the subject, though, giving for example just a few examples of finite fields rather than a complete classification. In any case the main emphasis of the book is on spaces over $\mathbb{R}$ or $\mathbb{C}$, and this section is optional.

Part II goes on to discuss inner product spaces in general, and also bilinear forms and quadratic forms on real vector spaces, culminating in their full description via the diagonal form given by the Gram-Schmidt orthonormalization (for inner products) and by 'Sylvester's law of inertia' in the more general case of symmetric bilinear forms. In the case of spaces over the complex numbers, conjugate symmetric forms are also considered and the corresponding laws are derived by the same methods. To keep things reasonably straightforward, Part II is concerned mostly with spaces over $\mathbb{R}$ or $\mathbb{C}$.
Part III contains a full discussion of linear transformations from a finite dimensional vector space $V$ to itself, their eigenvalues, eigenvectors, and diagonalization. The algorithms performing these computations are emphasized throughout. Determinants are used as an aid to computations (the characteristic polynomial) but are not required for a full understanding of the theory. The book ends with two chapters that emphasize applications of the material presented in the whole book: one on self-adjoint transformations on inner product spaces and the final chapter on Jordan normal form. In Part III, our vector spaces are over an arbitrary field $F$ with the only condition on $F$ being that the minimum polynomial of the linear transformation $f$ in question splits over $F$. The student may prefer to continue to read the text as if $F$ is $\mathbb{R}$ or $\mathbb{C}$.

Of course, there is too much material for a single course here, and it is up to the lecturer to decide on the course content and emphasis. With the material in Part I taken as understood, it would certainly be possible to cover all of the topics here as 'algorithms' or 'methods' in a single course, leaving the brighter students to follow up the sections explaining why some of the more difficult ideas (such as Sylvester's law of inertia, or the Jordan normal form) really do work.
Alternatively, some sections and/or chapters can be omitted altogether without interrupting the flow of the text. For example, Sections 2.6, 4.4, 5.4, 6.2, 7.4 and 7.5 may be omitted and Chapters 13 and 14 are independent of each other, so one or both of these could be omitted (though it should be mentioned that these last two chapters are in some ways the most important of all for applications).

Birmingham, October 1997
RK, RW
Contents

PART I MATRICES AND VECTOR SPACES

1 Matrices
  1.1 Matrices
  1.2 Addition and multiplication of matrices
  1.3 The inverse of a matrix
  1.4 The transpose of a matrix
  1.5 Row and column operations
  1.6 Determinant and trace
  1.7 Minors and cofactors

2 Vector spaces
  2.1 Examples and axioms
  2.2 Subspaces
  2.3 Linear independence
  2.4 Bases
  2.5 Coordinates
  2.6 Vector spaces over other fields

PART II BILINEAR AND SESQUILINEAR FORMS

3 Inner product spaces
  3.1 The standard inner product
  3.2 Inner products
  3.3 Inner products over C

4 Bilinear and sesquilinear forms
  4.1 Bilinear forms
  4.2 Representation by matrices
  4.3 The base-change formula
  4.4 Sesquilinear forms over C

5 Orthogonal bases
  5.1 Orthonormal bases
  5.2 The Gram-Schmidt process
  5.3 Properties of orthonormal bases
  5.4 Orthogonal complements

6 When is a form definite?
  6.1 The Gram-Schmidt process revisited
  6.2 The leading minor test

7 Quadratic forms and Sylvester's law of inertia
  7.1 Quadratic forms
  7.2 Sylvester's law of inertia
  7.3 Examples
  7.4 Applications to surfaces
  7.5 Sesquilinear and Hermitian forms

PART III LINEAR TRANSFORMATIONS

8 Linear transformations
  8.1 Basics
  8.2 Arithmetic operations on linear transformations
  8.3 Representation by matrices

9 Polynomials
  9.1 Polynomials
  9.2 Evaluating polynomials
  9.3 Roots of polynomials over

10 Eigenvalues and eigenvectors
  10.1 An example
  10.2 Eigenvalues and eigenvectors
  10.3 Upper triangular matrices

11 The minimum polynomial
  11.1 The minimum polynomial
  11.2 The characteristic polynomial
  11.3 The Cayley-Hamilton theorem

12 Diagonalization
  12.1 Diagonal matrices
  12.2 A criterion for diagonalizability
  12.3 Examples

13 Self-adjoint transformations
  13.1 Orthogonal and unitary transformations
  13.2 From forms to transformations
  13.3 Eigenvalues and diagonalization
  13.4 Applications

14 The Jordan normal form
  14.1 Jordan normal form
  14.2 Obtaining the Jordan normal form
  14.3 Applications
  14.4 Proof of the primary decomposition theorem

Appendix A A theorem of analysis
Appendix B Applications to quantum mechanics
Index
Part I Matrices and vector spaces
1 Matrices

Column vectors and matrices will appear in many ways in this book. On the one hand, they provide particularly important examples of vector spaces (the subject of the next chapter) and various operations on vector spaces (the subject of Parts II and III). On the other hand, they provide elegant and convenient mathematical tools for performing calculations in all sorts of types of problems, including problems which at first sight have nothing to do with matrices. You should already have some familiarity with matrices and some of the more common matrix operations. The purpose of this chapter is to review the required material that will be needed later. This will be done in a somewhat more 'advanced' way than when you perhaps learnt this material. The same basic ideas concerning matrices appear in several different contexts, and can be usefully presented in a rather 'unified' way.

1.1 Matrices

An $n \times m$ matrix is an array of numbers of the form
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
a_{31} & a_{32} & \cdots & a_{3m} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{pmatrix}.
\]
Sometimes, this matrix will be denoted by $(a_{ij})$, where $a_{ij}$ is a 'general' element of the array and the brackets $()$ indicate that the whole array is to be considered. Note that the first coordinate, $i$ in $a_{ij}$, refers to the row of the matrix, and the second, $j$, refers to the column. Similarly, an $n \times m$ matrix has $n$ rows and $m$ columns. In the special case when $n = m$, i.e. the matrix has the same number of rows as columns, the matrix will be said to be square. We will denote the set of $n \times m$ matrices with entries from $\mathbb{R}$ (the real numbers) by $M_{n,m}(\mathbb{R})$, and $n \times m$ matrices with entries from
this they present a very useful mathematical shorthand. Moreover, matrices can be added, multiplied, transposed, and there are other important operations too.

1.2 Addition and multiplication of matrices
When $A = (a_{ij})$ and $B = (b_{ij})$ are both $n \times m$ matrices, then they can be added and the result, $A + B$, is the matrix whose entries are the sums of the corresponding entries in $A$ and $B$, i.e. the $n \times m$ matrix with $(i, j)$th entry $a_{ij} + b_{ij}$. If $A$ and $B$ are of different sizes, then the sum $A + B$ is not defined. The $n \times m$ zero matrix, $0_{n,m}$, is the matrix all of whose entries are zero. When the size is clear from context, we omit the $n, m$ from the notation, writing simply $0$. Note the following laws for matrix addition.

1. (Associativity.) $(A + B) + C = A + (B + C)$.
2. (Commutativity.) $A + B = B + A$.
3. (Zero.) $0 + A = A + 0 = A$.

These apply whenever $A$, $B$, $C$ are the correct size for the addition operations here to be defined. Given an $n \times m$ matrix $A = (a_{ij})$, and a number $\lambda$ from $\mathbb{R}$ or $\mathbb{C}$, we define $\lambda A$ to be the $n \times m$ matrix whose entries are $\lambda a_{ij}$. The operation of multiplying a number $\lambda$ by a matrix $A$ is called scalar multiplication. The matrix $-A$ is defined to be $(-1)A$. We have the following laws for scalar multiplication.

4. $\lambda(\mu A) = (\lambda\mu)A$.
5. $(\lambda + \mu)A = \lambda A + \mu A$.
6. $0A = 0$.
7. $A + (-A) = 0$.

Two matrices $A = (a_{ij})$ and $B = (b_{ij})$ can sometimes be multiplied together, but only if their sizes agree in a special way. The rule is as follows: the matrix product $AB$ only exists when $A$ is $n \times m$ and $B$ is $m \times k$ for some $n, m, k$. When this is the case, $AB$ is the $n \times k$ matrix with $(i, j)$th entry
\[
\sum_{r=1}^{m} a_{ir} b_{rj} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{im} b_{mj}.
\]
The $n \times n$ identity matrix $I_n$ is the matrix $(a_{ij})$ with diagonal entries $a_{ii} = 1$ and all other entries $0$. When $n$ is clear from context, it is just denoted $I$. Note that matrix multiplication is not in general commutative, and it is not hard to find examples, such as
\[
\begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}
\neq
\begin{pmatrix} 2 & 2 \\ -1 & 0 \end{pmatrix}
=
\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix}.
\]
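The product rule can be sketched in a few lines of Python. This is only an illustration, not part of the text's development: the function name `mat_mul` and the list-of-rows representation are assumptions made here, and the two $2 \times 2$ matrices are simply one non-commuting pair.

```python
def mat_mul(A, B):
    """Product of an n x m matrix A and an m x k matrix B (lists of rows)."""
    n, m, k = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("AB is only defined when A is n x m and B is m x k")
    # The (i, j)th entry is the sum over r of a_ir * b_rj.
    return [[sum(A[i][r] * B[r][j] for r in range(m)) for j in range(k)]
            for i in range(n)]


# A pair of 2 x 2 matrices chosen for illustration.
A = [[1, 2], [-1, 0]]
B = [[1, -1], [0, 1]]
AB = mat_mul(A, B)
BA = mat_mul(B, A)
commute = (AB == BA)  # False for this pair
```

Representing a matrix as a list of rows keeps the index $i$ for rows and $j$ for columns, matching the convention for $a_{ij}$.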
Even worse, it may be that $AB$ is defined but $BA$ is not. However, provided the following matrices are all of the correct size for the expressions to be defined, we have the following laws.

8. (Associativity.) $(AB)C = A(BC)$.
9. (Zero.) $0A = A0 = 0$.
10. (Identity.) $IA = AI = A$.
11. (Distributivity.) $A(B + C) = AB + AC$.
12. (Distributivity.) $(A + B)C = AC + BC$.

Square matrices will be particularly important in this book. We note here that if $A$, $B$ are $n \times n$ matrices, then $A + B$, $AB$ and $\lambda A$, $\lambda B$ are all $n \times n$ matrices; thus all of the twelve laws just given hold for $n \times n$ matrices $A$, $B$, $C$.

1.3 The inverse of a matrix
If $A$ is a square matrix, it may be that there is a matrix $B$ such that $AB = I$. When this happens, $B$ is called a right inverse of $A$. Similarly, it may be that there is a left inverse $C$ of $A$, satisfying $CA = I$. Not all matrices have inverses in this way, but it is an interesting (and not entirely straightforward) fact that the left and right inverses of a square matrix $A$, if they exist, are the same.

Fact 1.1 Let $A$ be an $n \times n$ matrix and suppose $B$ is either a left or a right inverse of $A$. Then $B$ is a two-sided inverse, i.e. $BA = AB = I$, and this inverse is unique, so if $C$ satisfies either $CA = I$ or $AC = I$, then $C = B$.

(See Exercise 1.15 for a way to prove this.) We denote this unique inverse of a square matrix $A$, when it exists, by $A^{-1}$. If $A^{-1}$ exists we say $A$ is invertible.

Proposition 1.2 If $A$ and $B$ are invertible square matrices, then $AB$ is also invertible and $(AB)^{-1}$ equals $B^{-1}A^{-1}$.

Proof Using the associativity law,
\[
(AB)(B^{-1}A^{-1}) = A(B(B^{-1}A^{-1})) = A((BB^{-1})A^{-1}) = A(IA^{-1}) = AA^{-1} = I.
\]
Hence $B^{-1}A^{-1}$ is a right inverse of $AB$, and therefore is the unique inverse of $AB$. $\square$

1.4 The transpose of a matrix
The transpose operation (which is notated with a $T$ sign) converts an $n \times m$ matrix to an $m \times n$ matrix as follows:
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
a_{31} & a_{32} & \cdots & a_{3m} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{pmatrix}^{T}
=
\begin{pmatrix}
a_{11} & a_{21} & a_{31} & \cdots & a_{n1} \\
a_{12} & a_{22} & a_{32} & \cdots & a_{n2} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{1m} & a_{2m} & a_{3m} & \cdots & a_{nm}
\end{pmatrix}.
\]

Proposition 1.3 For $n \times n$ matrices $A$, $B$:
(a) $(AB)^T = B^T A^T$;
(b) if $A$ is invertible, then so is $A^T$, and $(A^T)^{-1} = (A^{-1})^T$.

Proof (a) A typical entry $c_{ji}$ of $C = AB$ is $\sum_{k=1}^{n} a_{jk} b_{ki}$, and this is the $(i, j)$th entry of $(AB)^T$. On the other hand, the $(i, j)$th entry of $B^T A^T$ is $\sum_{k=1}^{n} b_{ki} a_{jk}$, which is precisely $c_{ji}$.
(b) We use the fact that the inverse $B^{-1}$ of a matrix $B$, when it exists, is both a left and right inverse (i.e. $B^{-1}B = BB^{-1} = I$) and is unique. By part (a) we have $(A^{-1})^T A^T = (AA^{-1})^T = I^T = I$ so $(A^{-1})^T$ is an inverse of $A^T$. By uniqueness $(A^{-1})^T = (A^T)^{-1}$. $\square$
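As a small numerical illustration of part (a), the following Python sketch checks $(AB)^T = B^T A^T$ on one example. The helper names `transpose` and `mat_mul` and the sample matrices are assumptions made for this illustration; a single check is of course not a proof.

```python
def transpose(A):
    """Rows of the result are the columns of A."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Two arbitrary 2 x 2 matrices, chosen only for illustration.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

lhs = transpose(mat_mul(A, B))             # (AB)^T
rhs = mat_mul(transpose(B), transpose(A))  # B^T A^T
```

Note the reversal of order on the right-hand side: `mat_mul(transpose(A), transpose(B))` would in general give a different matrix.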
1.5 Row and column operations
An elementary row operation is a basic type of operation on matrices; for example, 'add row $i$ to row $j$' is an elementary row operation. The three kinds of elementary row operations are:

1. $\rho_j := \rho_j + \lambda\rho_i$; 'add $\lambda$ times row $i$ to row $j$', for any number $\lambda$ and any rows $i$, $j$;
2. $\rho_i := \lambda\rho_i$; 'multiply row $i$ by $\lambda$', for any nonzero number $\lambda$ and any row $i$;
3. swap$(\rho_i, \rho_j)$; 'swap rows $i$ and $j$', for any two rows $i$, $j$.

(Note: the Greek letter $\rho$ ('rho') sounds almost the same as the English word 'row'!) Each of these operations corresponds to multiplying on the left by a certain matrix. For example, for row operations on $3 \times n$ matrices,

1. $\rho_3 := \rho_3 + 2\rho_2$ corresponds to left-multiplying by $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{pmatrix}$;
2. $\rho_2 := \lambda\rho_2$ corresponds to left-multiplying by $\begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & 1 \end{pmatrix}$; and
3. swap$(\rho_1, \rho_2)$ corresponds to left-multiplying by $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$.

A row operation is a combination of elementary row operations, performed consecutively, and so, by associativity, is equivalent to multiplication on the left by a product of matrices of the forms above.
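The correspondence between elementary row operations and left-multiplication can be tried out with a short Python sketch. The function names (`add_multiple`, `scale`, `swap`) and the test matrix `M` are hypothetical choices for this illustration; rows are numbered from 1, as in the text.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def add_multiple(n, i, j, lam):
    """Matrix of rho_j := rho_j + lam * rho_i (i and j different)."""
    if i == j:
        raise ValueError("rows i and j must be different")
    E = identity(n)
    E[j - 1][i - 1] = lam
    return E


def scale(n, i, lam):
    """Matrix of rho_i := lam * rho_i, for nonzero lam."""
    if lam == 0:
        raise ValueError("lam must be nonzero")
    E = identity(n)
    E[i - 1][i - 1] = lam
    return E


def swap(n, i, j):
    """Matrix of swap(rho_i, rho_j)."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E


M = [[1, 2], [3, 4], [5, 6]]                  # an arbitrary 3 x 2 matrix
added = mat_mul(add_multiple(3, 2, 3, 2), M)  # rho_3 := rho_3 + 2 rho_2
scaled = mat_mul(scale(3, 2, 7), M)           # rho_2 := 7 rho_2
swapped = mat_mul(swap(3, 1, 2), M)           # swap(rho_1, rho_2)
```

Each elementary matrix is obtained by applying the row operation in question to the identity matrix, which is one convenient way to remember its shape.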
Exercise 1 . 1 Check your understanding by calculating AB directly for B
=
(;
2 3 -1 1 -2 3 1
and for each matrix A in 1-3 above.
By combining elementary row operations, we can obtain row operations to perform several useful transformations of matrices.
Clearing the first column. Our first basic technique using row operations converts any matrix $A$ to another matrix $B$ of the form
\[
\begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1m} \\
0 & b_{22} & \cdots & b_{2m} \\
\vdots & \vdots & & \vdots \\
0 & b_{n2} & \cdots & b_{nm}
\end{pmatrix}
\]
so all but possibly the first entry in the first column is zero. We can furthermore arrange that $b_{11}$ is either 1 or 0.

1. If all entries in the first column are already zero there is nothing to do.
2. Else use the swap operation to arrange that the top left entry $a_{11}$ in the matrix is nonzero.
3. Optionally, use $\rho_1 := \lambda\rho_1$ where $\lambda = 1/a_{11}$ to ensure that the top left entry is one.
4. Now use operation $\rho_j := \rho_j + \mu\rho_1$ for each $j \geq 2$ and for suitable values of $\mu$ to ensure the rest of the first column is zero.
(
25 9)
) 2
Exercise 1 . 2 Apply this method to convert the following. (b)
1 0 -1 0 -1 1
1
10 11 . 1
1 1
Echelon form. A matrix is in echelon form if it contains no adjacent rows of the form
\[
\begin{matrix}
0 & 0 & \cdots & 0 & x_1 & x_2 & \cdots \\
0 & 0 & \cdots & 0 & y_1 & y_2 & \cdots
\end{matrix}
\]
with $y_1 \neq 0$ (irrespective of what $x_1$ is). In other words, for a matrix in echelon form, each row starts with a sequence of zeros, and the number of zeros in this initial sequence increases as you go from one row to the next row beneath it until we get to the very last row, or until all other rows are entirely zero.
(
2 24)
)
Exercise 1.3 Decide which of the following are in echelon form. Give reasons. (a)
0 1 0 0 0 0
(b)
(d)
(�
G
0 1 0 0 1 0 0 0
�
(c)
�
1
3
(� 2 2 2 1
1
1
0 0 3
1
Converting to echelon form. Any matrix can be converted to echelon form by row operations. The procedure is as follows.

1. Apply 'clearing the first column' (ignoring any initial columns of zeros) until the matrix is of the form
\[
\begin{pmatrix}
0 & \cdots & 0 & b_{11} & b_{12} & \cdots & b_{1m} \\
0 & \cdots & 0 & 0 & b_{22} & \cdots & b_{2m} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & b_{n2} & \cdots & b_{nm}
\end{pmatrix}.
\]
2. Now put the matrix
\[
\begin{pmatrix}
b_{22} & \cdots & b_{2m} \\
\vdots & & \vdots \\
b_{n2} & \cdots & b_{nm}
\end{pmatrix}
\]
into echelon form by 'clearing the first column' of this matrix (again, ignoring any initial columns of zeros). Note that row operations on this matrix correspond to row operations on the original. Continue in this way until the whole matrix is in echelon form.

If the 'optional' step in 'clearing the first column' above is applied each time, this method gives echelon form in which the first nonzero entry in each row is 1.
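The procedure can be written out as a short program. The following Python sketch is one possible rendering, under the assumptions that matrices are lists of rows and that exact arithmetic with `fractions.Fraction` is wanted; the function name `echelon` is a choice made here. It always applies the 'optional' scaling step, so each nonzero row of the result starts with a 1.

```python
from fractions import Fraction


def echelon(rows):
    """Return an echelon form of the matrix, using elementary row operations."""
    M = [[Fraction(x) for x in row] for row in rows]
    n = len(M)
    m = len(M[0]) if M else 0
    top = 0  # first row of the block that still has to be processed
    for col in range(m):
        # Look for a row at or below `top` with a nonzero entry in this column.
        pivot = next((r for r in range(top, n) if M[r][col] != 0), None)
        if pivot is None:
            continue  # a column of zeros in the remaining block: ignore it
        M[top], M[pivot] = M[pivot], M[top]      # swap
        lead = M[top][col]
        M[top] = [x / lead for x in M[top]]      # optional scaling step
        for r in range(top + 1, n):
            mu = -M[r][col]
            M[r] = [x + mu * y for x, y in zip(M[r], M[top])]
        top += 1
        if top == n:
            break
    return M


E = echelon([[0, 2, 4], [1, 1, 1], [2, 4, 8]])
```

Working with `Fraction` avoids rounding errors, so a zero entry really is zero; with floating point numbers the test `M[r][col] != 0` would need a tolerance.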
Exercise 1 . 4 Use row operations to put the following into echelon form . ( a)
G �) G 5 i) (! D c� 1 1 1
( d)
(
1 0 0
b)
1 1 3 4 4
2 1 3
(c)
-1
( 0 1
c)
1
2
G
2 3
- 2 -3 -4
2 3 2 3 0
1
-5�)
D
Rank. If $A$ can be converted to the echelon form matrix $B$ using row operations, and $B$ has exactly $k$ nonzero rows, then the rank of $A$, $\operatorname{rk} A$ (sometimes called the row-rank), is $k$.

Exercise 1.5 Say what the ranks of the matrices in Exercise 1.4 are. For further practice, calculate the ranks of the matrices in Exercises 1.2 and 1.3 also.

It is not at all obvious that this notion of rank is well-defined. In other words, it is not obvious whether or not there can be two different sequences of row operations converting $A$ into echelon forms $B$ and $C$ respectively, where $B$ has a different number of nonzero rows than $C$. In fact this is impossible, but we will defer a proper discussion of this until Chapter 8.
Converting to the identity matrix. If a matrix $A$ is $n \times k$ (i.e. has $n$ rows and $k$ columns) where $n \leq k$, then the leftmost $n \times n$ block of $A$ can sometimes be converted to the identity matrix using row operations. In fact, this can always be done if $A$ has echelon form
\[
\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & a_{1\,n+1} & \cdots & a_{1k} \\
0 & a_{22} & a_{23} & \cdots & a_{2n} & a_{2\,n+1} & \cdots & a_{2k} \\
0 & 0 & a_{33} & \cdots & a_{3n} & a_{3\,n+1} & \cdots & a_{3k} \\
\vdots & \vdots & & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & a_{nn} & a_{n\,n+1} & \cdots & a_{nk}
\end{pmatrix}
\]
with each of the diagonals $a_{jj} \neq 0$, or in other words, if $A$ has rank $n$.

1. By performing the row operation $\rho_i := (1/a_{ii})\rho_i$ for $i = 1, \ldots, n$ if necessary, convert the above echelon form to a similar one where the diagonal entries $a_{ii}$ are all 1.
2. Now, starting at column $n$ and then working backwards to column $n - 1$, $n - 2$, up to column 1, clear all nondiagonal entries in this column by carrying out row operations as follows:
   (a) for column $n$, use operations $\rho_i := \rho_i + \mu\rho_n$ for $i = 1, 2, \ldots, n - 1$, and suitable values of $\mu$ in each case;
   (b) for column $n - 1$, use operations $\rho_i := \rho_i + \mu\rho_{n-1}$ for $i = 1, 2, \ldots, n - 2$;
   (c) and so on.

This gives a form
\[
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & b_{1\,n+1} & \cdots & b_{1k} \\
0 & 1 & 0 & \cdots & 0 & b_{2\,n+1} & \cdots & b_{2k} \\
0 & 0 & 1 & \cdots & 0 & b_{3\,n+1} & \cdots & b_{3k} \\
\vdots & \vdots & & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 1 & b_{n\,n+1} & \cdots & b_{nk}
\end{pmatrix}
\]
for the matrix.

Exercise 1.6 Where applicable, apply this method to the matrices in Exercises 1.2, 1.3, and 1.4.
Calculating the inverse. If $A$ is an $n \times m$ matrix and $B$ is an $n \times k$ matrix, the augmented matrix $(A \mid B)$ is the $n \times (m + k)$ matrix you get by writing down the entries of $A$ and $B$ next to one another in the obvious way. To compute the inverse of an $n \times n$ square matrix $A$, apply row operations to the augmented matrix $(A \mid I)$, where $I$ is the $n \times n$ identity matrix, to get $(I \mid B)$ for some matrix $B$, as described in the last section. This is not always possible, but as we have already seen it will be possible if $A$ can be converted to some echelon form with $n$ nonzero rows, i.e. if $A$ has rank $n$. Then $A^{-1} = B$.
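For example, starting with
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & -1 \\ -1 & 0 & 1 \end{pmatrix}
\]
and following the above procedure we can get row operations converting
\[
\left(\begin{array}{ccc|ccc}
1 & 2 & 3 & 1 & 0 & 0 \\
1 & 1 & -1 & 0 & 1 & 0 \\
-1 & 0 & 1 & 0 & 0 & 1
\end{array}\right)
\quad\text{to}\quad
\left(\begin{array}{ccc|ccc}
1 & 0 & 0 & 1/4 & -1/2 & -5/4 \\
0 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 1 & 1/4 & -1/2 & -1/4
\end{array}\right).
\]
(Try it!) Thus
\[
\begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & -1 \\ -1 & 0 & 1 \end{pmatrix}^{-1}
=
\begin{pmatrix} 1/4 & -1/2 & -5/4 \\ 0 & 1 & 1 \\ 1/4 & -1/2 & -1/4 \end{pmatrix}.
\]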
To see why the method works, recall that each elementary row operation corresponds to multiplying on the left by an elementary row operation matrix. So applying several row operations to $(A \mid I)$ corresponds to multiplying on the left by a product $R = R_1 R_2 \cdots R_k$ of row operation matrices. If the result of these operations is $(I \mid B)$ then by associativity of matrix multiplication $R(A \mid I) = (I \mid B)$, i.e. $RA = I$ and $RI = B$. In other words, $R = B$ and $B$ is a left inverse of $A$. We saw that this method will always find the inverse of an $n \times n$ matrix $A$ if $A$ has maximum possible rank $n$. It turns out $A^{-1}$ exists if and only if $\operatorname{rk} A = n$, so this method will always succeed whenever $A$ is invertible. The ideas here can be used to prove Fact 1.1. See Exercise 1.15.
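For readers who want to experiment, here is a hedged Python sketch of the augmented-matrix method. It is not the exact two-stage procedure of the text: as a simplifying assumption it clears each column above and below the diagonal in a single pass (often called Gauss-Jordan elimination), which uses the same elementary row operations and reaches the same $(I \mid B)$. The function name `inverse` and the $3 \times 3$ test matrix are choices made for this illustration.

```python
from fractions import Fraction


def inverse(A):
    """Invert a square matrix by row-reducing the augmented matrix (A | I)."""
    n = len(A)
    # Build (A | I) with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix has rank less than n, so it has no inverse")
        M[col], M[pivot] = M[pivot], M[col]           # swap
        lead = M[col][col]
        M[col] = [x / lead for x in M[col]]           # make the diagonal entry 1
        for r in range(n):
            if r != col:
                mu = -M[r][col]
                M[r] = [x + mu * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]                     # the right-hand block B


A = [[1, 2, 3], [1, 1, -1], [-1, 0, 1]]
A_inv = inverse(A)
```

The `ValueError` branch corresponds to the case where the echelon form has fewer than $n$ nonzero rows, so that no inverse exists.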
Exercise 1. 7 Use this method to calculate the inverses of the following matrices. ( a)
(� I) 1 0 1 1 0 1 0 0
(b )
H -�) 0 1 -1
( c)
u �) 2 -1 2
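Solving linear equations. Row operations are commonly used to solve simultaneous linear equations, and we may illustrate the method here with an example. To solve
\[
\begin{aligned}
x + y + 2z &= -1 \\
-x + z &= -1 \\
-x + y + 4z &= -3
\end{aligned}
\]
first put the equations in matrix form
\[
\begin{pmatrix} 1 & 1 & 2 \\ -1 & 0 & 1 \\ -1 & 1 & 4 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} -1 \\ -1 \\ -3 \end{pmatrix},
\]
and then put the augmented matrix formed from the matrix on the left with the column vector on the right into echelon form:
\[
\left(\begin{array}{ccc|c} 1 & 1 & 2 & -1 \\ -1 & 0 & 1 & -1 \\ -1 & 1 & 4 & -3 \end{array}\right)
\to
\left(\begin{array}{ccc|c} 1 & 1 & 2 & -1 \\ 0 & 1 & 3 & -2 \\ 0 & 2 & 6 & -4 \end{array}\right)
\to
\left(\begin{array}{ccc|c} 1 & 1 & 2 & -1 \\ 0 & 1 & 3 & -2 \\ 0 & 0 & 0 & 0 \end{array}\right).
\]
The full solution can now be read off directly:
\[
\begin{aligned}
x + y + 2z &= -1 \\
y + 3z &= -2 \\
0 &= 0,
\end{aligned}
\]
so $z$ may be anything, $y = -2 - 3z$, and $x = -1 - 2z - y = z + 1$. This method works for any number of simultaneous equations in any number of unknowns, and always gives the most general solution.

Of course, a system of simultaneous linear equations may not have any solution at all. The following is useful in this regard.

Fact 1.4 The equation
\[
A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = b
\]
where $A$ is a $k \times n$ matrix and $b$ is a $k \times 1$ column vector has at least one solution if and only if $\operatorname{rk}(A) = \operatorname{rk}(A \mid b)$. It has exactly one solution if and only if $\operatorname{rk}(A) = \operatorname{rk}(A \mid b) = n$.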
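A general solution found by row operations is easy to check by substituting it back into the equations. The Python sketch below does this for a three-equation example; the coefficient matrix `A`, the right-hand side `b`, and the function names are assumptions stated in the code itself, taken to be the system $x + y + 2z = -1$, $-x + z = -1$, $-x + y + 4z = -3$.

```python
# Assumed example system: x + y + 2z = -1, -x + z = -1, -x + y + 4z = -3.
A = [[1, 1, 2], [-1, 0, 1], [-1, 1, 4]]
b = [-1, -1, -3]


def general_solution(z):
    """The solution read off from the echelon form, with z as a free parameter."""
    y = -2 - 3 * z
    x = -1 - 2 * z - y   # simplifies to z + 1
    return (x, y, z)


def satisfies(v):
    """True if the vector v satisfies every equation of the system."""
    return all(sum(a * t for a, t in zip(row, v)) == rhs
               for row, rhs in zip(A, b))


checks = [satisfies(general_solution(z)) for z in (-2, 0, 1, 5)]
```

Here $\operatorname{rk}(A) = \operatorname{rk}(A \mid b) = 2 < 3$, so solutions exist but are not unique, which is why a free parameter appears.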
Column operations. These are analogous to row operations except that they operate on columns instead of rows. In this book, the elementary column operations are denoted by $\kappa_j := \kappa_j + \lambda\kappa_i$, $\kappa_i := \lambda\kappa_i$ (for $\lambda \neq 0$), and swap$(\kappa_i, \kappa_j)$, in exact analogy with the notation for the row operations. Here, $\kappa_i$ is used to denote the $i$th column of a matrix, just as $\rho_i$ denoted the $i$th row. Column operations correspond to multiplying on the right by special column operation matrices, just as row operations correspond to left-multiplication by row operation matrices. In fact, column operations are not used as much as row operations in practice, and they will appear here only occasionally.

1.6 Determinant and trace
The determinant and trace operations take a square matrix $A$ and return a number, $\det A$ or $\operatorname{tr} A$. The trace operation is the simplest to calculate, as it is just the sum of the diagonal entries of the matrix. Thus if $A = (a_{ij})$ is an $n \times n$ matrix, the trace of $A$ is defined to be
\[
\operatorname{tr} A = \operatorname{tr}
\begin{pmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{pmatrix}
= a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii}.
\]
The significance of this operation is not at all clear from this definition. However, note that it has the obvious property that for $n \times n$ matrices $A$ and $B$, $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$. One much less obvious property that will play an important role later is that for an invertible $n \times n$ matrix $P$ and any $n \times n$ matrix $A$, $\operatorname{tr}(P^{-1}AP) = \operatorname{tr} A$.

At this stage at least, the determinant of $A$ will seem just as mysterious. Its definition is an inductive one. For $1 \times 1$ matrices $A = (a_{11})$, we define
\[
\det(a_{11}) = a_{11}, \qquad (1)
\]
i.e. the number which is the only entry of the matrix $A$. For an $n \times n$ matrix $A = (a_{ij})$, we denote $\det A$ by the matrix $A$ with vertical lines round it, and define
\[
\det A =
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
= a_{11}
\begin{vmatrix}
a_{22} & \cdots & a_{2n} \\
\vdots & & \vdots \\
a_{n2} & \cdots & a_{nn}
\end{vmatrix}
- a_{12}
\begin{vmatrix}
a_{21} & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n3} & \cdots & a_{nn}
\end{vmatrix}
+ \cdots + (-1)^{n+1} a_{1n}
\begin{vmatrix}
a_{21} & \cdots & a_{2\,n-1} \\
\vdots & & \vdots \\
a_{n1} & \cdots & a_{n\,n-1}
\end{vmatrix}.
\]
(This is sometimes called 'expansion by the first row'.) The rule is to take the entries in the top row in turn, with alternating signs, and multiply them by the determinants formed by deleting the row and column of the entry in question. This gives an expression for the $n \times n$ determinant in terms of $(n - 1) \times (n - 1)$ determinants, and the $(n - 1) \times (n - 1)$ determinants are evaluated by the same rule repeatedly until we get down to $1 \times 1$ determinants which are evaluated using (1). For example, applying this to the $2 \times 2$ case we have
\[
\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc,
\]
which should be memorized. In the $3 \times 3$ case we have
\[
\begin{vmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{vmatrix}
= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}).
\]
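The inductive definition translates directly into a recursive function. The Python sketch below is a literal rendering of 'expansion by the first row'; the name `det` and the test matrices are assumptions made for the illustration, and the method is intended for small matrices only, since the number of terms grows like $n!$.

```python
def det(A):
    """Determinant by expansion along the first row (matrices as lists of rows)."""
    n = len(A)
    if n == 1:
        return A[0][0]                     # the 1 x 1 case, equation (1)
    total = 0
    for j in range(n):
        # Delete the first row and the column of the entry A[0][j].
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # Signs alternate +, -, +, ... along the first row.
        total += (-1) ** j * A[0][j] * det(minor)
    return total


d2 = det([[1, 2], [3, 4]])                       # ad - bc = 1*4 - 2*3
d3 = det([[1, 2, 3], [1, 1, -1], [-1, 0, 1]])
```

Python's column index `j` starts at 0, so the sign `(-1) ** j` here plays the role of $(-1)^{1+j}$ in the usual numbering from 1.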
Determinants often have a physical interpretation as areas or volumes. For example, the determinant
\[
\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc
\]
has magnitude equal to the area of the parallelogram with corners given by position vectors $(0, 0)$, $(a, b)$, $(c, d)$ and $(a + c, b + d)$.
The first point of significance of determinants for matrix operations is the following.

Fact 1.5 For an $n \times n$ matrix $A$, the following are equivalent:
(a) $A^{-1}$ exists;
(b) $\det A \neq 0$;
(c) $\operatorname{rk} A = n$.

There are various useful rules to help calculate determinants. Firstly, for some matrices it is particularly easy to compute determinants. We say a matrix $A = (a_{ij})$ is upper triangular if $a_{ij} = 0$ for $i > j$; in other words the nonzero entries are all on or above the principal diagonal of the matrix so that
\[
A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & a_{22} & a_{23} & \cdots & a_{2n} \\
0 & 0 & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{nn}
\end{pmatrix}.
\]

Fact 1.6 The determinant of an upper triangular matrix $A$ is equal to the product of its diagonal entries.

We also have

Fact 1.7 Let $A$, $B$ be $n \times n$ matrices and $I$ the $n \times n$ identity matrix. Then:
(a) $\det A = \det(A^T)$;
(b) $\det(AB) = \det A \det B$;
(c) if $A$ is invertible, $\det(A^{-1}) = (\det A)^{-1}$;
(d) $\det I = 1$;
(e) if $A$ has a row or column which is entirely zero, then $\det A = 0$.

Fact 1.7, part (b), is the central fact concerning determinants, and the key to proving all the results concerning determinants mentioned here. A proof of it is outlined in Exercise 1.20 below. It also suggests an alternative way to calculate determinants: instead of using the definition directly, we can apply row operations to get our matrix in echelon form, and then compute the determinant of this (using Fact 1.6) and also the determinants of the row operation matrices used to get to this form. In fact, the determinants of the basic row operation matrices are easy to calculate.

Fact 1.8
(a) If $S_{i,j}$ is the matrix for the row operation swap$(\rho_i, \rho_j)$ where $i \neq j$, then $\det S_{i,j} = -1$.
(b) If $T_{i,\lambda}$ is the matrix for the row operation $\rho_i := \lambda\rho_i$ where $\lambda \neq 0$, then $\det T_{i,\lambda} = \lambda$.
(c) If $A_{i,j,\lambda}$ is the matrix for the row operation $\rho_i := \rho_i + \lambda\rho_j$ where $i \neq j$, then $\det A_{i,j,\lambda} = 1$.

From the last two basic facts, all the usual rules for evaluating determinants can be deduced.
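The alternative way of calculating determinants suggested by these facts can be sketched as follows. This is an illustrative Python version under stated assumptions: it uses only swaps (each contributing a factor $-1$) and operations of the form $\rho_i := \rho_i + \lambda\rho_j$ (which leave the determinant unchanged), then multiplies the diagonal entries of the resulting upper triangular matrix. The name `det_by_row_ops` is a choice made here.

```python
from fractions import Fraction


def det_by_row_ops(A):
    """Determinant via reduction to upper triangular form by row operations."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)    # a zero ends up on the diagonal, so det = 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign          # a swap multiplies the determinant by -1
        for r in range(col + 1, n):
            mu = -M[r][col] / M[col][col]
            # Adding a multiple of another row leaves the determinant unchanged.
            M[r] = [x + mu * y for x, y in zip(M[r], M[col])]
    product = Fraction(sign)
    for i in range(n):
        product *= M[i][i]        # product of the diagonal entries
    return product


d = det_by_row_ops([[1, 2, 3], [1, 1, -1], [-1, 0, 1]])
```

Because no row is ever rescaled, the only correction needed at the end is the sign coming from the swaps.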
Proposition 1 .9 If w eswap any two rows in a d et erminant
an a1 2 a2 1 a22
a1 n a2 n
th e d et erminant chang es sign (i. e. is multipli ed by - 1). In particular, if th e matrix A = ( a ij ) has two id en tical rows th en det A = 0 .
Proof Use Fact 1.7 part ( b ) and Fact 1 .8 part (a) . I f A has two identical rows, then swapping them does not change A, so det A = - det A and hence 0 det A = 0. Proposition 1.10 Multiplying any singl erow of a d et erminant by >..multipli es th ed et erminant by >...
Proof Fact 1 . 7 part (b) and Fact 1 .8 part (b).
0
Be careful here: only one row of the matrix is multiplied by >.. to multiply the determinant by >.. . This is in contrast with scalar multiplication of matrices, >.. A , where every row i s multiplied by >.. . I n fact, i f A i s an n x n matrix, det(>.. A) = An det A .
Proposition 1.1 1 Applying any row op eration P i matrix A l eav es det A unchang ed.
·-
P i + Apj (i :j:. j) t o a
Proof Fact 1 .7 part (b) and Fact 1 . 8 part (c).
0
Proposition 1.12 A determinant can be expanded by any row, provided you remember that the sign associated with an entry $a_{ij}$ is $(-1)^{i+j}$.

Proof This can be derived from the definition, Fact 1.7 part (b) and Fact 1.8 part (a). □
Proposition 1.13 Any of the above rules for evaluating determinants for rows and row operations applies equally to columns and column operations.

Proof Fact 1.7 part (a). □

For example, a determinant with two identical columns is zero, just as one with two identical rows.

1.7 Minors and cofactors
Determinants are also used to transform an $n \times n$ square matrix $A = (a_{ij})$ to another matrix as follows. Given
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix},
\]
define $b_{ij}$ to be the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting the $i$th row and the $j$th column of $A$. Then the matrix $B = (b_{ij})$ is called the matrix of minors of $A$. Now define $c_{ij} = (-1)^{i+j} b_{ij}$. In other words, $c_{ij}$ is the determinant of the $(n-1) \times (n-1)$ submatrix of $A$ with the sign correction you would use when evaluating $\det A$ by the $i$th row (or by the $j$th column). The number $c_{ij}$ is called the $(i,j)$th cofactor of $A$, and the matrix $C = (c_{ij})$ is called the matrix of cofactors of $A$. The transpose of this, $C^T$, is called the adjugate matrix of $A$, written $\operatorname{adj} A$. This matrix $C^T = \operatorname{adj} A$ has the useful property that
\[
(\operatorname{adj} A)A = A(\operatorname{adj} A) = \det(A)\, I. \qquad (2)
\]
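As an illustration of the definitions of minors, cofactors and the adjugate, here is a minimal Python sketch that builds $\operatorname{adj} A$ for a small integer matrix and checks property (2). The function names (`submatrix`, `det`, `adjugate`, `matmul`) and the sample matrix are assumptions made for this example, and the determinant is computed by naive expansion along the first row, which is only sensible for small matrices.

```python
def submatrix(a, i, j):
    """Delete row i and column j (0-based) from the square matrix a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(submatrix(a, 0, j))
               for j in range(len(a)))


def adjugate(a):
    """Transpose of the matrix of cofactors (for n >= 2)."""
    n = len(a)
    cof = [[(-1) ** (i + j) * det(submatrix(a, i, j)) for j in range(n)]
           for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Ordinary matrix product of two compatible matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


A = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]
d = det(A)
scaled_identity = [[d if i == j else 0 for j in range(3)] for i in range(3)]
# Property (2): (adj A) A = A (adj A) = det(A) I.
assert matmul(adjugate(A), A) == scaled_identity
assert matmul(A, adjugate(A)) == scaled_identity
```

Integer arithmetic is used throughout, so the check of (2) is exact.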
Thus, in principle at least, determinants give another way of calculating inverses: if $\det A \neq 0$ then
\[
A^{-1} = \frac{1}{\det A} \operatorname{adj} A.
\]
In practice, this method is only used for particularly simple matrices, including all $2 \times 2$ matrices, where we have
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1}
= \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
\]
In most other cases, it is usually simpler to calculate inverses using row operations.

Exercises
Exercise 1.8 To show you understand the definition of 'echelon form', give an example of an upper triangular matrix which is not in echelon form.

Exercise 1.9 Let $A$ be an $m \times n$ matrix and let $e_i$ be the $n \times 1$ column matrix with $i$th entry equal to 1 and all other entries 0. Show that $Ae_i$ is equal to the $i$th column of $A$.

Exercise 1.10 Using the definitions of the matrix operations directly, verify laws 1-3 for matrix addition, 1-4 for scalar multiplication, and 1-5 for matrix multiplication.

Exercise 1.11 In each case, determine if the system of equations has a solution, and if so give the most general solution. (a) x + 2y + z = 2 -1 -x + 2y 2y + 2z = 7 z = 1 X y 1 2x - y 2x + 2z = 1
5x (b)
(c)
X + 2y + Z X+ y+Z +z -X
1 1 1
Exercise 1 . 1 2 Do the same as the last exercise for the following. x + 2y - z + w = 1 (a) 2x + y + z = 2 (b)
X+y- Z y + z x + 2z x - y + 5z
(c)
1 2 1 1
x + 2y + 3z + -X + y + Zx + 5y + 7z +
w W
w
-1 2 1
Exercise 1.13 Calculate the adjugate of each of the matrices of Exercise 1.7 and in each case verify that equation (2) of Section 1.7 holds.

Exercise 1.14 Calculate the adjugate of each of the following matrices A. (b)
(-�
0
. - �)
2 1 0 1 2 -1 -1 1 1
Exercise 1.15 This exercise proves Fact 1.1 using the idea of row operations, and is for ambitious students only.
(a) For each $n \times n$ row operation matrix $R$, show directly that $R$ has an inverse, i.e. some $S$ with $SR = RS = I$.
(b) Show that if $R = R_1 R_2 \cdots R_k$ is a product of elementary row operation matrices then $R$ does not have a row that is entirely zero. [Hint: use (a) and associativity of matrix multiplication.]
(c) Suppose $A$, $B$ are $n \times n$ matrices with $AB = I$. Prove that there is some $C$ with $CA = I$. [Hint: let $R = R_1 R_2 \cdots R_k$ be row operation matrices so that $RA$ is in echelon form. By multiplying on the right by $B$, show that $A$ has rank $n$.]
(d) Considering transposes and part (c), show that if $A$, $B$ are square matrices with $BA = I$ then there is a matrix $C$ with $AC = I$.
(e) Show that if $CA = AB = I$ then $C = B$. [Hint: what is $CAB$?] Similarly, show that if $CA = BA = I$ then $C = B$.
Exercise 1.16 Show that any square matrix $A$ is the product $R_1 R_2 \cdots R_k$ of 'generalized' elementary row operation matrices, where 'generalized' means that the $\lambda$ in each of the row operations $\rho_i := \lambda\rho_i$ and $\rho_i := \rho_i + \lambda\rho_j$ is now allowed to be zero. [Hint: modify the 'converting to the identity matrix' method above to find ordinary elementary row operations $R_i$ and generalized elementary row operations $S_j$ such that $R_1 \cdots R_k A = S_1 \cdots S_l I$.]
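Exercise 1.17 Show directly from the definition of determinant that
\[
\begin{vmatrix}
a_{11} + \lambda b_{11} & \cdots & a_{1n} + \lambda b_{1n} \\
a_{21} & \cdots & a_{2n} \\
\vdots & & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{vmatrix}
=
\begin{vmatrix}
a_{11} & \cdots & a_{1n} \\
a_{21} & \cdots & a_{2n} \\
\vdots & & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{vmatrix}
+ \lambda
\begin{vmatrix}
b_{11} & \cdots & b_{1n} \\
a_{21} & \cdots & a_{2n} \\
\vdots & & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{vmatrix}.
\]
Deduce that the determinant of a matrix $A$ whose top two rows are equal is zero.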
Exercise 1.18 Show that the matrix of any generalized elementary row operation (in the sense of Exercise 1.16) can always be written as one of $S_i T_j R T_j S_i$, $S_i R S_i$, $T_j R T_j$, or $R$, where $R$ is the matrix of a generalized row operation acting on the first two rows only, and $S_i$, $T_j$ are the matrices of 'swap' operations $\mathrm{swap}(\rho_1, \rho_i)$ and $\mathrm{swap}(\rho_2, \rho_j)$ respectively.
Exercise 1.19 Prove by induction on $n$ that if $A$, $B$ are $n \times n$ matrices with $B$ obtained from $A$ by the operation $\mathrm{swap}(\rho_i, \rho_j)$, where $1 \le i < j \le n$, then $\det B = -\det A$. [Hint: if $i = 1$ and $j = 2$, expand $\det B$ twice, using the definition of determinant. Otherwise, use the induction hypothesis.]
Exercise 1.20 Using Exercises 1.16, 1.17, 1.18, and 1.19, show that $\det(AB) = \det A \det B$ for all $n \times n$ matrices $A$, $B$.
Exercise 1.21 Using Exercises 1.20 and 1.16, or otherwise, prove the following for an $n \times n$ matrix $A$.
(a) $\operatorname{rk} A = n$ if and only if $\det A \neq 0$.
(b) $\det A^T = \det A$.
(c) If $A$ is upper triangular then $\det A$ is the product of its diagonal entries.
Exercise 1.22 A permutation $\sigma$ of $\{1, 2, \ldots, n\}$ is a bijection
\[
\sigma : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}.
\]
The set of all such permutations is denoted $S_n$. The matrix $M_\sigma$ of the permutation $\sigma$ is the $n \times n$ matrix with 1 in the $(\sigma(i), i)$th position for each $i = 1, 2, \ldots, n$, and all other entries equal to 0. The sign of the permutation $\sigma$, $\operatorname{sgn}(\sigma)$, is defined to be $\det M_\sigma$.
(a) Show that $M_\sigma e_i = e_{\sigma(i)}$ for all $i$, where $e_i$ is the $n \times 1$ column matrix of Exercise 1.9. Hence show that $M_\sigma M_\pi = M_{\sigma \circ \pi}$, where $\sigma \circ \pi \in S_n$ is the permutation defined by $\sigma \circ \pi(i) = \sigma(\pi(i))$.
(b) Show that $\operatorname{sgn}(\sigma) \in \{1, -1\}$ for all $\sigma \in S_n$.
(c) Calculate the signs of the permutations $\sigma, \pi \in S_3$ given by
\[
\sigma(1) = 2,\ \sigma(2) = 1,\ \sigma(3) = 3, \qquad \pi(1) = 2,\ \pi(2) = 3,\ \pi(3) = 1.
\]
(d) Verify the formula
\[
\begin{vmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{vmatrix}
= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32}
= \sum_{\sigma} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} a_{3\sigma(3)}
\]
where the summation is over all six permutations $\sigma$ of $\{1, 2, 3\}$.

Exercise 1.23 Show by induction on $n$ that the formula
\[
\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}
\]
holds for all $n \in \mathbb{N}$.
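The permutation formula of Exercises 1.22 and 1.23 can be tried out numerically with a short Python sketch. This is only an illustration and proves nothing: the names `sign` and `leibniz_det` are hypothetical, permutations are 0-based tuples from `itertools.permutations`, and the sign is computed from the parity of the number of inversions, which is a standard equivalent of the definition $\operatorname{sgn}(\sigma) = \det M_\sigma$ rather than the definition used in the exercise.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images.

    Assumption: computed as (-1)^(number of inversions), a standard
    equivalent of det(M_sigma), not the definition used in the text.
    """
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def leibniz_det(a):
    """Sum over all permutations of sgn(sigma) * a[0][s(0)] * ... * a[n-1][s(n-1)]."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total


print(leibniz_det([[1, 2, 0], [0, 1, 3], [2, 0, 1]]))  # prints 13
```

The sum has $n!$ terms, so this is useful only for checking small cases by hand.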
2 Vector spaces

In this chapter we will review material concerning vector spaces. Some of this (but possibly not all of it) may already be familiar to you. This chapter is very important, however, because it sets out the style in which we shall study linear algebra, how we shall organize the text, and why it is particularly useful to do so in this way.

2.1 Examples and axioms
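We start our study of vector spaces with some examples.

Example 2.1 Let $\mathbb{R}^3$ be the set of all column vectors $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ with entries $x, y, z$ from the real numbers $\mathbb{R}$. We can add two such vectors by the rule
\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
= \begin{pmatrix} x + x' \\ y + y' \\ z + z' \end{pmatrix}.
\]
It follows that when we add $\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$ to a vector v we get v again, just like $0 + r = r$ for all real numbers $r$. So $\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$ behaves just like 0 and is called the zero vector, denoted 0. We can also multiply a vector v with a real number $r$ by the rule
\[
r \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} rx \\ ry \\ rz \end{pmatrix}.
\]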
The operations of addition and multiplication by real numbers r are related. For example, it turns out that 2v is equal to v + v for all vectors v. More generally, the distributivity law, (r + s)v = (rv) + ( sv) , holds, since
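\[
(r + s) \begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} (r+s)x \\ (r+s)y \\ (r+s)z \end{pmatrix}
= \begin{pmatrix} rx + sx \\ ry + sy \\ rz + sz \end{pmatrix}
= \begin{pmatrix} rx \\ ry \\ rz \end{pmatrix} + \begin{pmatrix} sx \\ sy \\ sz \end{pmatrix}
= r \begin{pmatrix} x \\ y \\ z \end{pmatrix} + s \begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]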
Various other laws like this one hold too as you will see in a moment.
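Example 2.2 Now consider $M_{3,3}(\mathbb{R})$, the set of $3 \times 3$ matrices with entries from $\mathbb{R}$. As for vectors, matrices can be added by the rule
\[
\begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix}
+ \begin{pmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \\ y_{31} & y_{32} & y_{33} \end{pmatrix}
= \begin{pmatrix} x_{11} + y_{11} & x_{12} + y_{12} & x_{13} + y_{13} \\ x_{21} + y_{21} & x_{22} + y_{22} & x_{23} + y_{23} \\ x_{31} + y_{31} & x_{32} + y_{32} & x_{33} + y_{33} \end{pmatrix}
\]
and a matrix can be multiplied by a real number $r$ by
\[
r \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix}
= \begin{pmatrix} rx_{11} & rx_{12} & rx_{13} \\ rx_{21} & rx_{22} & rx_{23} \\ rx_{31} & rx_{32} & rx_{33} \end{pmatrix}.
\]
Similar laws hold for such matrices. For example, the zero matrix (with all entries equal to 0) can be added to any other matrix without changing it, and the commutative and associative laws of addition, and the distributivity law, hold too, as we saw in Section 1.2.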
Example 2.3 Consider now $\mathbb{R}[X]$, the set of polynomials in $X$ with coefficients from $\mathbb{R}$. A typical polynomial is
\[
f(X) = a_n X^n + \cdots + a_2 X^2 + a_1 X^1 + a_0
\]
where the $a_i$ are real numbers, and possibly 0. Two such polynomials can be added by
\[
(a_n X^n + \cdots + a_1 X^1 + a_0) + (b_n X^n + \cdots + b_1 X^1 + b_0) = (a_n + b_n) X^n + \cdots + (a_1 + b_1) X^1 + (a_0 + b_0),
\]
and multiplied by a real number $r$ by
\[
r(a_n X^n + \cdots + a_1 X^1 + a_0) = (r a_n) X^n + \cdots + (r a_1) X^1 + (r a_0).
\]
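To make the two polynomial operations concrete, here is a small Python sketch. The representation is an assumption made for this example: a polynomial $a_n X^n + \cdots + a_1 X + a_0$ is stored as the coefficient list `[a0, a1, ..., an]`, and the names `poly_add` and `poly_scale` are hypothetical. Padding the shorter list with zeros reflects the remark that coefficients may be 0.

```python
from itertools import zip_longest


def poly_add(f, g):
    """Add polynomials given as coefficient lists [a0, a1, ..., an]."""
    return [a + b for a, b in zip_longest(f, g, fillvalue=0)]


def poly_scale(r, f):
    """Multiply every coefficient of f by the real number r."""
    return [r * a for a in f]


f = [1, 0, 3]  # 3X^2 + 1
g = [2, 5]     # 5X + 2
print(poly_add(f, g))    # [3, 5, 3]
print(poly_scale(2, f))  # [2, 0, 6]
```

Note that nothing here multiplies two polynomials together; only addition and multiplication by a real number are involved.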
Real vector spaces. As will be clear, these examples (and many others like them) have several features in common. When mathematicians want to concentrate on certain features of several examples (and possibly ignore other features of the examples, such as the rule for multiplying two matrices together, or the very different rule for multiplying two polynomials together) they write down axioms for the common features. The three examples above are all examples of real vector spaces and the next definition gives the axioms for real vector spaces.
Definition 2.4 A real vector space or vector space over $\mathbb{R}$ is a set $V$ containing a special zero vector 0, together with operations of addition of two vectors, giving $u + v$, and multiplication of a vector $v$ with a real number $\lambda$, giving $\lambda v$, satisfying the following laws for all $u, v, w \in V$ and $\lambda, \mu \in \mathbb{R}$.

$(u + v) + w = u + (v + w)$ (1)
$u + v = v + u$ (2)
$u + 0 = u$ (3)
$v + (-1)v = 0$ (4)
$\lambda(\mu v) = (\lambda\mu)v$ (5)
$\lambda(u + v) = \lambda u + \lambda v$ (6)
$(\lambda + \mu)u = \lambda u + \mu u$ (7)

The vector $(-1)v$ is defined to be the result of multiplying $-1$ and $v$, but is more usually written $-v$; similarly, $u + (-v)$ is written $u - v$. The elements of $V$ are called the vectors of the space. Real numbers are often referred to as the scalars of the space.

Some of the axioms above have special names: (1) is called the associativity of addition; (2) is the commutativity of addition; and (6) and (7) are the distributivity laws.

Several advantages of the axiomatic approach are already apparent. We now have a way of dealing with examples like the three mentioned above (and many others like them) in one go, rather than proving theorems for each example individually, so this approach saves time and energy. It also helps in understanding the examples, since features which are irrelevant for some problems (matrix multiplication perhaps) are not mentioned in the axioms. Thirdly, it helps considerably in checking that proofs are correct, since for example a proof of a theorem about vector spaces can only use the axioms listed in the definition, and no other properties of any special example the reader or author might have in mind.

Of course, to apply any theorems we might prove about real vector spaces to a given space $V$, one must first prove that the axioms for real vector spaces are true of $V$. This is usually straightforward.
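As a sanity check, and not as a substitute for proof, the seven axioms can be spot-checked on sample vectors. The Python sketch below does this for column vectors with three entries, using rational entries so that the arithmetic is exact. The helper names (`add`, `scale`, `axioms_hold`) and the sample data are assumptions made for this illustration; a check on finitely many samples shows only that no counterexample was found among them.

```python
from fractions import Fraction
from itertools import product

ZERO = (0, 0, 0)


def add(u, v):
    """Componentwise addition of two vectors with three entries."""
    return tuple(x + y for x, y in zip(u, v))


def scale(lam, v):
    """Multiply every entry of v by the scalar lam."""
    return tuple(lam * x for x in v)


def axioms_hold(u, v, w, lam, mu):
    """Check laws (1)-(7) for one choice of vectors and scalars."""
    return all([
        add(add(u, v), w) == add(u, add(v, w)),                       # (1)
        add(u, v) == add(v, u),                                       # (2)
        add(u, ZERO) == u,                                            # (3)
        add(v, scale(-1, v)) == ZERO,                                 # (4)
        scale(lam, scale(mu, v)) == scale(lam * mu, v),               # (5)
        scale(lam, add(u, v)) == add(scale(lam, u), scale(lam, v)),   # (6)
        scale(lam + mu, u) == add(scale(lam, u), scale(mu, u)),       # (7)
    ])


vectors = [(1, 2, 0), (Fraction(1, 2), -3, 4), (0, 0, 0)]
scalars = [Fraction(-2), Fraction(1, 3), Fraction(0)]
checked = all(axioms_hold(u, v, w, lam, mu)
              for u, v, w in product(vectors, repeat=3)
              for lam, mu in product(scalars, repeat=2))
print(checked)  # True
```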
Example 2.5 The set of complex numbers $\mathbb{C}$ forms a real vector space with the usual addition $z_1 + z_2$ of complex numbers, and scalar multiplication,
\[
\lambda(x + iy) = (\lambda x) + i(\lambda y),
\]
where $x, y, \lambda$ are all real.

Proof Most of the vector space axioms express well-known facts about $\mathbb{C}$. For example, $(z_1 + z_2) + z_3 = z_1 + (z_2 + z_3)$ is just associativity of addition of complex numbers, and $z_1 + z_2 = z_2 + z_1$ is commutativity. The zero vector 0 here is just the complex number $0 = 0 + i0$, so $z + 0 = z$ and $z + (-1)z = 0$ are clear. Also $\lambda(\mu z) = (\lambda\mu)z$, $\lambda(z_1 + z_2) = \lambda z_1 + \lambda z_2$, and $(\lambda + \mu)z = \lambda z + \mu z$ are consequences of associativity and distributivity of complex multiplication. □
Note how a vector space is defined. It is necessary to say what the set of vectors $V$ is, and also to say what the rules for addition of vectors and scalar multiplication are too.

The next example is very important, despite its simplicity. In fact, this example describes the simplest kind of vector space of all.

Example 2.6 The zero space over $\mathbb{R}$ is the vector space $V = \{0\}$, where (obviously) 0 denotes the zero vector. Addition and scalar multiplication are given by the rules $0 + 0 = 0$ and $\lambda 0 = 0$ for all $\lambda \in \mathbb{R}$. The vector space axioms are all true for this $V$ for a very simple reason: since every meaningful expression involving scalars and the vector 0 gives the vector 0, any two such expressions are equal, and hence equations (1)-(7) are all true in this special zero space.

The next proposition gives the first example of the use of the axioms to prove statements about vector spaces. It shows that many of the laws which you might have expected to be axioms in fact follow from the axioms already given.

Proposition 2.7 In a real vector space $V$, (a) $v = 1v$, (b) $0v = 0$, and (c) $\lambda 0 = 0$ for all $v \in V$ and all scalars $\lambda$.

Proof (a) Given $v \in V$ we have
\[
\begin{aligned}
1v &= (-1)((-1)v) && \text{by (5)} \\
&= (-1)((-1)v) + 0 && \text{by (3)} \\
&= (-1)((-1)v) + (v + (-1)v) && \text{by (4)} \\
&= (-1)((-1)v) + ((-1)v + v) && \text{by (2)} \\
&= ((-1)((-1)v) + (-1)v) + v && \text{by (1)} \\
&= 0 + v && \text{by (4) and (2)} \\
&= v && \text{by (3) and (2).}
\end{aligned}
\]
(b) Again,
\[
\begin{aligned}
0v &= (1 + (-1))v \\
&= 1v + (-1)v && \text{by (7)} \\
&= v + (-1)v && \text{by (a)} \\
&= 0 && \text{by (4).}
\end{aligned}
\]
For (c),
\[
\begin{aligned}
\lambda 0 &= \lambda(0 + (-1)0) && \text{by (4)} \\
&= \lambda 0 + \lambda((-1)0) && \text{by (6)} \\
&= \lambda 0 + (-1)(\lambda 0) && \text{by (5)} \\
&= 0 && \text{by (4),}
\end{aligned}
\]
as required. □
Because of the axioms and because of propositions like this last one, we will adopt a slightly more relaxed approach to notation, writing for example $u + v + w$ for the sum of three vectors $u, v, w$ in a vector space $V$ without specifying the order in which they are to be added. (This order does not matter, since by (1) and (2) $(u + v) + w$, $u + (v + w)$, $u + (w + v)$, $w + (v + u)$, $(w + u) + v$, and so on, are all the same.) Similarly, by (5), $(-2\lambda)v = (-2)(\lambda v) = -(2\lambda)v$, etc., and this vector will be denoted more simply by $-2\lambda v$. We will use the axioms (1)-(7) and Proposition 2.7 all the time, usually without explicit mention.
Complex vector spaces. The definition given above of a vector space over $\mathbb{R}$ can easily be modified to give the notion of a vector space over the complex numbers $\mathbb{C}$, by just replacing $\mathbb{R}$ by $\mathbb{C}$ in Definition 2.4 and letting $\lambda$ and $\mu$ range over all complex numbers in the laws (1)-(7) for scalar multiplication. For example, the set $\mathbb{C}^3$ of column vectors of height 3 and with entries from $\mathbb{C}$ can be regarded as a complex vector space, in just the same way as $\mathbb{R}^3$ can be regarded as a real vector space.

However, care is needed. For example, the set of complex numbers $\mathbb{C}$ can be regarded as a real vector space as we have already seen, or as a complex vector space with usual complex-number addition $z_1 + z_2$ and for scalar multiplication the complex-number multiplication $\lambda z$. Similarly, $\mathbb{C}^3$ can be regarded as either a real vector space or a complex vector space. The properties of these spaces (e.g. $\mathbb{C}^3$ as a real vector space, and $\mathbb{C}^3$ as a complex vector space) are quite different, as we shall see. The distinction of what scalars you allow is a very important one.

Canonical examples. This book is concerned with vector spaces in general, but we will constantly refer back to the more familiar vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ with addition defined by
\[
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
= \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix} \qquad (8)
\]
and scalar multiplication defined by
\[
\lambda \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \lambda x_1 \\ \lambda x_2 \\ \vdots \\ \lambda x_n \end{pmatrix} \qquad (9)
\]
for scalars $\lambda$. Unless otherwise specified, $\mathbb{C}^n$ will be regarded as a complex vector space (so the scalars $\lambda$ in (9) will be complex numbers, and multiplication, $\lambda x_1$ etc., is ordinary multiplication of complex numbers). $\mathbb{R}^n$ is always considered as a real vector space. The bold face notation for vectors, as in $\mathbf{v}, \mathbf{w}, \mathbf{z}, \ldots$, will be reserved for the vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ only, and we will use light face letters $x, y, u, v, w, \ldots$ for vectors (i.e. elements of $V$) in the general case. Scalars will be denoted with Greek letters $\lambda, \mu, \nu, \ldots$ to distinguish them from vectors.
We will also refer to the spaces $\mathbb{R}^0$ and $\mathbb{C}^0$. If $\mathbb{R}^n$ is the set of column vectors with $n$ entries, $\mathbb{R}^0$ should be the set of column vectors with no entries. There is only one such vector, and it might be written ( ), just as the empty set is sometimes written { }. The only possible way to define addition and scalar multiplication is by defining ( ) + ( ) = ( ) and $\lambda$( ) = ( ). In other words $\mathbb{R}^0$ is just the zero space over $\mathbb{R}$, as in Example 2.6, where ( ) is thought of as 0. Similarly, $\mathbb{C}^0$ is the complex zero vector space; it too has a single vector ( ) thought of as 0, but this time the scalars are the complex numbers.

The main disadvantage with column notation for vectors in $\mathbb{R}^n$ and $\mathbb{C}^n$ is the amount of paper it requires. For this reason, you may see people use so-called row vectors, but changing between row and column vectors is sometimes confusing. In this book, we will use column vectors throughout, but will sometimes notate a column vector as a row with a transpose sign $T$ as a reminder. Thus we often write the vector
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\]
as $(x_1, x_2, x_3)^T$.

In our examples of vector spaces over the real numbers $\mathbb{R}$ (or the complex numbers $\mathbb{C}$), $\mathbb{R}$ (or $\mathbb{C}$) is called the field of scalars. To simplify terminology, we shall often talk about a vector space over $F$, or an $F$-vector space, where $F$ is $\mathbb{R}$ or $\mathbb{C}$. The optional Section 2.6 below takes this terminology a bit further and gives axioms and further examples of fields.

2.2 Subspaces
Almost invariably in pure mathematics, whenever you see a definition for an 'object' of some kind, you will also have a definition of a 'subobject'.

Definition 2.8 Given a vector space $V$ over $\mathbb{R}$ or $\mathbb{C}$, a subspace of $V$ is a subset $W \subseteq V$ which contains the zero vector of $V$ and is closed under the operations of addition and scalar multiplication. That is, for each $u, v \in W$ and each scalar $\lambda$, each of $u + v$, $\lambda u$ (and $\lambda v$) must be in $W$.

Since $-1$ is a scalar, every subspace $W$ contains $-v$ for each $v$ in $W$, so $W$ is also closed under subtraction of vectors, $v - u$, too. You should be able to check easily that $\{0\}$ and $V$ itself are both subspaces of the vector space $V$. The following lemma gives a 'minimal' condition for a subset $W$ of $V$ to be a subspace of $V$.

Lemma 2.9 Let $W \subseteq V$ be nonempty, where $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$. Then $W$ is a subspace of $V$ if and only if $v + \lambda w \in W$ for each $v, w \in W$ and each scalar $\lambda$.

Proof Suppose the condition holds. Given $v, w \in W$ and $\lambda \in F$, we have $0 = v + (-1)v \in W$, $v + w = v + 1w \in W$, and $\lambda w = 0 + \lambda w \in W$. This proves the 'if' part. For the 'only if' part, suppose $W$ is a subspace of $V$, $v, w \in W$, and $\lambda$ is a scalar. Then $\lambda w \in W$ so $v + \lambda w \in W$. □
The laws (1)-(7) clearly hold in any subspace, so a subspace $W$ of a vector space $V$ is a vector space in its own right.

Subspaces will turn out to be very important indeed. We would like some way of specifying a subspace accurately and succinctly. For example, suppose we knew the vectors $a_1, \ldots, a_n$ were in a subspace $W$ of $V$. Does this determine $W$? Or, if not, is there some 'special' or 'best' subspace of $V$ containing $a_1, \ldots, a_n$? With this in mind we make the tentative definition,

If $V$ is a vector space over $F = \mathbb{R}$ or $\mathbb{C}$, and $A \subseteq V$ is any subset of vectors from $V$, then the smallest subspace $W$ of $V$ that contains $A$ is called the subspace spanned by $A$.

The problem with this is that it is not immediately obvious that this subspace $W$ exists for all $A$. Certainly, $V$ itself is a subspace of $V$ containing $A$, so some subspace of $V$ containing $A$ exists, but why is there a smallest subspace containing $A$? However, one can rescue this idea as follows. If $a_1, a_2, \ldots, a_n \in W$ and $W$ is a subspace of $V$ then it follows from the fact that $W$ is closed under addition and scalar multiplication that
\[
\lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n \in W \qquad (10)
\]
for any scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$. (An expression like (10) is called a linear combination of the vectors $a_1, \ldots, a_n$.) What is more, by the associativity and distributivity laws in the vector space,
\[
(\lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n) + (\mu_1 a_1 + \mu_2 a_2 + \cdots + \mu_n a_n) = (\lambda_1 + \mu_1) a_1 + (\lambda_2 + \mu_2) a_2 + \cdots + (\lambda_n + \mu_n) a_n
\]
and so the sum of any two linear combinations of $a_1, \ldots, a_n$, or any scalar multiple of a linear combination of $a_1, \ldots, a_n$, is again a linear combination. In other words the set of linear combinations of $a_1, \ldots, a_n$ is a subspace of $V$, so we make the following definition.

Definition 2.10 Given a vector space $V$ over $F = \mathbb{R}$ or $\mathbb{C}$, and given a subset $A = \{a_1, a_2, \ldots, a_n\}$ of $V$,
\[
W = \{ \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n : \lambda_1, \ldots, \lambda_n \in F \}
\]
is the subspace of $V$ spanned by $A$. The elements $\lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n$ of $W$ are called linear combinations of vectors from $A$. This subspace $W$ is denoted $\operatorname{span} A$ or $\operatorname{span}(a_1, a_2, \ldots, a_n)$.
We note again that $\operatorname{span} A$ is a subspace of $V$, and any subspace containing all the vectors from $A$ must contain every linear combination of vectors from $A$, i.e. must contain each element of $W$, so $W$ is indeed the smallest subspace containing $A$. In particular, the zero vector 0 is always a linear combination of $a_1, a_2, \ldots, a_n$ since
\[
0 = 0a_1 + 0a_2 + \cdots + 0a_n.
\]

Example 2.11 Let $V = \mathbb{R}^3$ with the usual addition and scalar multiplication, and consider vectors $a = (1, 2, 0)^T$ and $b = (0, 1, -1)^T$. Then a typical linear combination of $a$, $b$ is
\[
\lambda a + \mu b = (\lambda, 2\lambda + \mu, -\mu)^T.
\]
If we write this as $(x, y, z)^T$ we easily see that $2x - y - z = 0$ since $x = \lambda$, $y = 2\lambda + \mu$, and $z = -\mu$. So every vector $(x, y, z)^T$ in $\operatorname{span}(a, b)$ satisfies $2x - y - z = 0$. On the other hand, given a vector $(x, y, z)^T$ such that $2x - y - z = 0$, we have
\[
(x, y, z)^T = (x, 2x - z, z)^T = x(1, 2, 0)^T - z(0, 1, -1)^T = xa - zb,
\]
so $(x, y, z)^T$ is in $\operatorname{span}(a, b)$. Thus we have proved that
\[
\operatorname{span}(a, b) = \{ (x, y, z)^T : 2x - y - z = 0 \}.
\]
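The two inclusions in Example 2.11 can be spot-checked numerically. The Python sketch below is an illustration only: the helper names (`combo`, `on_plane`, `coefficients`) and the sample grid of rational values are assumptions, and checking finitely many samples does not replace the argument in the example.

```python
from fractions import Fraction
from itertools import product

a = (1, 2, 0)
b = (0, 1, -1)


def combo(lam, mu):
    """Return the linear combination lam*a + mu*b as a tuple."""
    return tuple(lam * x + mu * y for x, y in zip(a, b))


def on_plane(v):
    """Test whether v = (x, y, z) satisfies 2x - y - z = 0."""
    x, y, z = v
    return 2 * x - y - z == 0


def coefficients(v):
    """For v on the plane, return (lam, mu) with v = lam*a + mu*b."""
    x, _, z = v
    return x, -z


samples = [Fraction(n, 2) for n in range(-4, 5)]
# Every sampled linear combination lies on the plane 2x - y - z = 0.
assert all(on_plane(combo(lam, mu)) for lam, mu in product(samples, repeat=2))
# Every sampled point of the plane is recovered as x*a - z*b.
for x, z in product(samples, repeat=2):
    v = (x, 2 * x - z, z)
    assert combo(*coefficients(v)) == v
```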
If $A$ is the empty set, we define $\operatorname{span} A$ to be $\{0\}$. This is just a convention; if you like, you can think of 0 as the sum of an empty sequence of terms of the form $\lambda_i a_i$, but if this doesn't appeal, just learn the convention.

Example 2.12 If $V = \mathbb{R}^4$ with the usual addition and scalar multiplication, and
\[
W = \{ (x, y, z, w)^T : 3x + y = 0,\ x + y + z = w \}
\]
we can easily check that $W$ is a subspace of $V$: if $x, y, z, w, x', y', z', w' \in \mathbb{R}$ with $3x + y = 0$, $x + y + z = w$, $3x' + y' = 0$, $x' + y' + z' = w'$, then $3(x + \lambda x') + (y + \lambda y') = 0$ and $(x + \lambda x') + (y + \lambda y') + (z + \lambda z') = (w + \lambda w')$, so $W$ is a subspace by Lemma 2.9. For any vector $(x, y, z, w)^T \in W$, $y = -3x$ and $w = x + y + z = z - 2x$ so every vector of $W$ is of the form $(x, -3x, z, z - 2x)^T$ for some $x, z \in \mathbb{R}$. In particular the vectors $a = (1, -3, 0, -2)^T$ and $b = (0, 0, 1, 1)^T$ are in $W$, as you can check. In fact $\operatorname{span}(a, b) = W$ since
\[
\begin{pmatrix} x \\ -3x \\ z \\ z - 2x \end{pmatrix}
= x \begin{pmatrix} 1 \\ -3 \\ 0 \\ -2 \end{pmatrix} + z \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}.
\]
It is sometimes useful to be able to define $\operatorname{span} A$ when $A$ is infinite. In this case we do not have any way to form an infinite sum of terms of the form $\lambda_i a_i$, so instead we are guided by our principle that $\operatorname{span} A$ should be the smallest vector subspace of $V$ containing $A$.

Definition 2.13 If $A$ is an infinite subset of $V$, where $V$ is a vector space over a field $F$, we define $\operatorname{span} A$, the subspace spanned by $A$, to be the set of all linear combinations of finite subsets of $A$. Thus $\operatorname{span} A$ is the union of subspaces of $V$ of the form $\operatorname{span} B$ where $B \subseteq A$ is finite. In symbols,
\[
\operatorname{span} A = \bigcup_{B \subseteq A,\ B \text{ finite}} \operatorname{span} B.
\]

You can check that this definition makes $\operatorname{span} A$ into a subspace of $V$, the smallest subspace of $V$ to contain $A$. This is so important it is worth noting as a separate proposition.

Proposition 2.14 If $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$, $B \subseteq V$, and $a_1, a_2, \ldots, a_k$ are vectors in $\operatorname{span} B$ then $\operatorname{span}(a_1, a_2, \ldots, a_k) \subseteq \operatorname{span} B$.

If $A$ is infinite, a linear combination of vectors from $A$ is just an element of $\operatorname{span} A$; that is, a linear combination of a finite number of elements of $A$. We repeat that, in general, there is no way to combine infinitely many elements into a single linear combination.

2.3 Linear independence
Suppose that $A = \{a_1, a_2, \ldots, a_n\} \subseteq V$ where $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$. We have seen that the zero vector is always a linear combination of vectors from $A$, and we can ask if the expression
\[
0 = 0a_1 + 0a_2 + \cdots + 0a_n
\]
for 0 is unique. The set of vectors $A$ is said to be linearly independent if the expression above is the only linear combination of vectors from $A$ that gives 0, and $A$ is linearly dependent otherwise. Generalizing slightly to include the case when $A$ may be infinite we have,
Definition 2.15 A set $A \subseteq V$ of vectors in a vector space $V$ over $F = \mathbb{R}$ or $\mathbb{C}$ is linearly dependent if there is $n \in \mathbb{N}$, vectors $a_1, a_2, \ldots, a_n \in A$, and scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$ not all zero such that
\[
\lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n = 0.
\]
Otherwise, $A$ is linearly independent.

So a finite set $A = \{a_1, a_2, \ldots, a_n\}$ is linearly independent if and only if for all scalars $\lambda_1, \lambda_2, \ldots, \lambda_n \in F$
\[
\lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n = 0 \quad \text{implies} \quad \lambda_1 = \lambda_2 = \cdots = \lambda_n = 0.
\]
Also, if $A$ is infinite, it is linearly independent if and only if every finite subset of $A$ is linearly independent. By convention, the empty set containing no vectors is linearly independent.

Example 2.16 In the real vector space $\mathbb{R}^3$ the vectors $a = (1, 2, 0)^T$, $b = (1, 1, 1)^T$, and $c = (0, 0, 1)^T$ form a linearly independent set, for if
\[
\lambda a + \mu b + \nu c = 0
\]
for some scalars $\lambda, \mu, \nu$ then the following system of equations is satisfied:
\[
\begin{aligned}
\lambda + \mu &= 0 \\
2\lambda + \mu &= 0 \\
\mu + \nu &= 0.
\end{aligned}
\]
But this system has $\lambda = \mu = \nu = 0$ as its only solution. On the other hand, the vectors $a = (1, 2, 0)^T$, $b = (1, 1, 1)^T$, and $d = (1, -1, 3)^T$ form a linearly dependent set since $2a - 3b + d = 0$.

Example 2.17 Let $V$ be $\mathbb{C}$ considered as a real vector space with addition
\[
(x_1 + iy_1) + (x_2 + iy_2) = (x_1 + x_2) + i(y_1 + y_2) \qquad (11)
\]
and scalar multiplication
\[
\lambda(x + iy) = (\lambda x) + i(\lambda y). \qquad (12)
\]
Then $\{1, i\}$ is linearly independent, since if $\lambda, \mu$ are real numbers with $0 = \lambda \cdot 1 + \mu \cdot i = \lambda + i\mu$ then the real part, $\lambda$, and the imaginary part, $\mu$, of $\lambda + i\mu$ are both zero.

Now consider $V = \mathbb{C}$ as a complex vector space with operations as in (11) and (12) except now $\lambda$ may be a complex number. This time $\{1, i\}$ is linearly dependent, since
\[
1 \cdot 1 + i \cdot i = 0
\]
so for $\lambda = 1$ and $\mu = i$, $\lambda \cdot 1 + \mu \cdot i = 0$.
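For vectors in $\mathbb{R}^n$ with rational entries, the test for linear independence used in Example 2.16 can be mechanized. The sketch below is one possible approach, not the text's method: it row-reduces the vectors using exact `Fraction` arithmetic and declares the set independent when the number of pivots equals the number of vectors. The names `rank` and `is_independent` are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Number of pivots found by Gauss-Jordan elimination on the given rows."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # index of the next pivot row
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def is_independent(vectors):
    """A finite list of vectors is independent iff its rank equals its length."""
    return rank(vectors) == len(vectors)


a, b, c, d = (1, 2, 0), (1, 1, 1), (0, 0, 1), (1, -1, 3)
print(is_independent([a, b, c]))  # True
print(is_independent([a, b, d]))  # False, since 2a - 3b + d = 0
```

This treats the scalars as rational (hence real) numbers; it says nothing about the complex case of Example 2.17, where the choice of scalars changes the answer.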
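In $\mathbb{R}^3$, a one-element set $\{a\}$ is linearly independent just in case $a \neq 0$. (Note in particular that $\{0\}$ is linearly dependent since $1 \cdot 0 = 0$, and the scalar 1 used here is nonzero.) Also, $\{a, b\}$ is linearly independent if and only if $a$ and $b$ do not lie on a single line through 0, and $\{a, b, c\}$ is linearly independent if and only if $a$, $b$, and $c$ do not lie on a single plane through 0.

We started talking about linear independence via the uniqueness of the linear combination $0 = 0a_1 + 0a_2 + \cdots + 0a_n$ for the zero vector. However, if $A$ is linearly independent and $v$ is in the subspace spanned by $A$ then the linear combination for $v$ is also unique, as the following useful proposition shows.

Proposition 2.18 Suppose $A = \{a_1, \ldots, a_n\} \subseteq V$ is linearly independent, where $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$. Suppose also that $v \in V$ and there are scalars $\lambda_1, \ldots, \lambda_n$ and $\mu_1, \ldots, \mu_n$ such that
\[
v = \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n
\]
and
\[
v = \mu_1 a_1 + \mu_2 a_2 + \cdots + \mu_n a_n.
\]
Then $\lambda_i = \mu_i$ for all $i$.

Proof We have
\[
\lambda_1 a_1 + \cdots + \lambda_n a_n = \mu_1 a_1 + \cdots + \mu_n a_n
\]
so
\[
(\lambda_1 a_1 + \cdots + \lambda_n a_n) - (\mu_1 a_1 + \cdots + \mu_n a_n) = 0
\]
giving
\[
(\lambda_1 - \mu_1) a_1 + \cdots + (\lambda_n - \mu_n) a_n = 0
\]
and hence
\[
\lambda_1 - \mu_1 = \lambda_2 - \mu_2 = \cdots = \lambda_n - \mu_n = 0
\]
since $\{a_1, a_2, \ldots, a_n\}$ is linearly independent. So $\lambda_1 = \mu_1$, $\lambda_2 = \mu_2$, ..., $\lambda_n = \mu_n$ as required. □

2.4 Bases

Definition 2.19 A basis of a vector space $V$ is a linearly independent set $B \subseteq V$ which spans $V$.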
Example 2.20 The real vector space $\mathbb{R}^3$ has basis $\{e_1, e_2, e_3\}$, where $e_1 = (1, 0, 0)^T$, $e_2 = (0, 1, 0)^T$, and $e_3 = (0, 0, 1)^T$. To prove this you need to check the set is linearly independent and spans the vector space in question. For the first, if
\[
\lambda_1 e_1 + \lambda_2 e_2 + \lambda_3 e_3 = 0
\]
then
\[
(\lambda_1, \lambda_2, \lambda_3)^T = (0, 0, 0)^T
\]
so $\lambda_1 = \lambda_2 = \lambda_3 = 0$, as required. For the second, an arbitrary vector in $\mathbb{R}^3$ is $(x, y, z)^T$ where $x, y, z \in \mathbb{R}$. But
\[
(x, y, z)^T = xe_1 + ye_2 + ze_3
\]
so $(x, y, z)^T$ is a linear combination of $e_1, e_2, e_3$.

Similarly, $\mathbb{R}^n$ has basis $\{e_1, e_2, \ldots, e_n\}$, where $e_i$ is the $n \times 1$ column vector with $i$th entry equal to 1 and all other entries zero. This basis is used so often that it is called the usual basis or standard basis of $\mathbb{R}^n$.

The following theorem is particularly important.
Theorem 2.21 Let $V$ be a vector space over $\mathbb{R}$ or $\mathbb{C}$, and let $B \subseteq V$ be linearly independent. Then there is a basis $B'$ of $V$ with $B \subseteq B'$.

Proof Although the theorem is true generally, we give a proof in the case when $B$ is finite and there are $a_1, a_2, \ldots, a_k \in V$ such that $V = \operatorname{span}(a_1, a_2, \ldots, a_k)$. Suppose $B = \{b_1, b_2, \ldots, b_n\}$ is given and is linearly independent. Then: either $a_i \in \operatorname{span} B$ for all $i$, so $V = \operatorname{span}(a_1, a_2, \ldots, a_k) \subseteq \operatorname{span} B$ and hence $\operatorname{span} B = V$ so $B$ itself is a basis and we may take $B' = B$; or some $a_i$ is not in $\operatorname{span} B$. By reordering $a_1, a_2, \ldots, a_k$ if necessary we may assume that $a_1 \notin \operatorname{span} B$. We show that $B \cup \{a_1\}$ is linearly independent. If
\[
\lambda_1 b_1 + \lambda_2 b_2 + \cdots + \lambda_n b_n + \lambda_{n+1} a_1 = 0
\]
then $\lambda_{n+1} = 0$, for else
\[
a_1 = -\lambda_{n+1}^{-1} (\lambda_1 b_1 + \lambda_2 b_2 + \cdots + \lambda_n b_n) \in \operatorname{span} B.
\]
Therefore
\[
\lambda_1 b_1 + \lambda_2 b_2 + \cdots + \lambda_n b_n = 0
\]
giving $\lambda_1 = \lambda_2 = \cdots = \lambda_n = 0$ by the linear independence of $B$. Continuing this process at most $k$ times we obtain a linearly independent set $B' \subseteq B \cup \{a_1, \ldots, a_k\}$ for which $a_i \in \operatorname{span} B'$ for all $i$, i.e. $B'$ spans $V$ and hence is a basis. □

Note that the above argument proves the following fact in the particular case when $A$, $B$ are finite.

Theorem 2.22 Suppose $\operatorname{span} A = V$ and $B \subseteq V$ is linearly independent. Then there is a basis $B'$ of $V$ with $B \subseteq B' \subseteq A \cup B$.

Again, this theorem is true generally even in the infinite case, but requires more sophisticated set theory to prove.
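In the finite case, the argument in the proof of Theorem 2.21 amounts to a procedure: run through a spanning list and adjoin each vector that is not already in the span of what has been collected. A minimal Python sketch for vectors with rational entries follows. The names `rank` and `extend_to_basis` are hypothetical, membership in a span is tested by comparing ranks (an implementation choice, not the text's argument), and the sample data uses the vectors $a = (1, 1, 0, 0)^T$, $b = (1, 1, 1, 1)^T$ and the standard basis of $\mathbb{R}^4$.

```python
from fractions import Fraction


def rank(vectors):
    """Number of pivots found by Gauss-Jordan elimination on the given rows."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # index of the next pivot row
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def extend_to_basis(independent, spanning):
    """Extend a linearly independent list using vectors from a spanning list."""
    basis = list(independent)
    for v in spanning:
        # v lies outside span(basis) exactly when adjoining it raises the rank.
        if rank(basis + [v]) != rank(basis):
            basis.append(v)
    return basis


a, b = (1, 1, 0, 0), (1, 1, 1, 1)
standard = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
print(extend_to_basis([a, b], standard))
# [(1, 1, 0, 0), (1, 1, 1, 1), (1, 0, 0, 0), (0, 0, 1, 0)]
```

The result depends on the order in which the spanning vectors are tried, which reflects the fact that the extended basis is not unique.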
Example 2.23 As for most proofs in this book, the proof of Theorem 2.21 also provides a method for calculating a suitable basis. For example, suppose $V$ is the real vector space $\mathbb{R}^4$ and $a = (1, 1, 0, 0)^T$, $b = (1, 1, 1, 1)^T$. Then the set $B = \{a, b\}$ is linearly independent so can be extended to a basis. To find such a basis, start with the usual basis vectors
\[
e_1 = (1, 0, 0, 0)^T,\quad e_2 = (0, 1, 0, 0)^T,\quad e_3 = (0, 0, 1, 0)^T,\quad e_4 = (0, 0, 0, 1)^T
\]
of $\mathbb{R}^4$. It is easy to check that $e_1 \notin \operatorname{span}(a, b)$, so $\{a, b, e_1\}$ is linearly independent by the argument in the proof of the theorem. We can therefore add $e_1$ to the basis we are constructing. We now look at $e_2$, and this time find that $e_2 \in \operatorname{span}(a, b, e_1)$, since $e_2 = a - e_1$. However, $\{a, b, e_1, e_3\}$ is linearly independent, as you can check, so we adjoin $e_3$ to our basis. This gives a basis $B' = \{a, b, e_1, e_3\}$ of $\mathbb{R}^4$, for $e_4 = -a + b - e_3$ and therefore $B'$ spans $V$ since $\operatorname{span} B'$ contains the spanning set $\{e_1, e_2, e_3, e_4\}$.

Note that the basis $B'$ extending $B$ is by no means unique. For example, $\{a, b, e_2, e_4\}$ is another basis of $\mathbb{R}^4$, as is $\{a, b, (1, -1, 0, 0)^T, (0, 0, 1, -1)^T\}$.

Bases are used to define the notion of the dimension of a vector space $V$. The key to getting this to work is the following simple lemma.
Lemma 2.24 (The exchange lemma) Suppose $a_1, a_2, \ldots, a_n, b$ are vectors in a vector space $V$, and suppose that
\[
b \in \operatorname{span}(a_1, \ldots, a_{n-1}, a_n)
\]
but
\[
b \notin \operatorname{span}(a_1, \ldots, a_{n-1}).
\]
Then $a_n \in \operatorname{span}(a_1, \ldots, a_{n-1}, b)$. If, in addition, $\{a_1, \ldots, a_{n-1}, a_n\}$ is linearly independent, then so is $\{a_1, \ldots, a_{n-1}, b\}$.

Proof Since $b \in \operatorname{span}(a_1, \ldots, a_{n-1}, a_n)$, there are scalars $\lambda_i$ such that
\[
b = \lambda_1 a_1 + \cdots + \lambda_{n-1} a_{n-1} + \lambda_n a_n.
\]
Now, if $\lambda_n = 0$ then $b = \lambda_1 a_1 + \cdots + \lambda_{n-1} a_{n-1}$ so $b \in \operatorname{span}(a_1, \ldots, a_{n-1})$, which is false. So $\lambda_n \neq 0$ and
\[
a_n = \lambda_n^{-1} (b - \lambda_1 a_1 - \cdots - \lambda_{n-1} a_{n-1})
= \lambda_n^{-1} b - \lambda_n^{-1} \lambda_1 a_1 - \cdots - \lambda_n^{-1} \lambda_{n-1} a_{n-1}.
\]
Hence $a_n \in \operatorname{span}(a_1, \ldots, a_{n-1}, b)$, as required.

For the additional part, we are given that $\{a_1, \ldots, a_{n-1}, a_n\}$ is linearly independent; suppose scalars $\mu_i$ are given with
\[
\mu_1 a_1 + \cdots + \mu_{n-1} a_{n-1} + \mu_n b = 0.
\]
Substituting $b = \lambda_1 a_1 + \cdots + \lambda_{n-1} a_{n-1} + \lambda_n a_n$ into this we get
\[
(\mu_1 + \mu_n \lambda_1) a_1 + \cdots + (\mu_{n-1} + \mu_n \lambda_{n-1}) a_{n-1} + \mu_n \lambda_n a_n = 0.
\]
Now $\{a_1, \ldots, a_{n-1}, a_n\}$ is linearly independent so all these coefficients are zero. In particular, $\lambda_n \mu_n = 0$ so $\mu_n = 0$ since $\lambda_n \neq 0$. But this gives $\mu_1 a_1 + \cdots + \mu_{n-1} a_{n-1} = 0$ so $\mu_1 = \mu_2 = \cdots = \mu_{n-1} = 0$ by the linear independence of $\{a_1, a_2, \ldots, a_n\}$, as required. □

Theorem 2.25 Suppose $A$, $B$ are both bases of a vector space $V$ over $\mathbb{R}$ or $\mathbb{C}$. Then $A$, $B$ have the same number of elements.
Again, the theorem is true generally, but we will prove it here in the special case when one of the two sets $A$, $B$ is finite.
Proof Suppose $A$ has at least as many elements as $B$, and $B$ is finite. List all the elements of $B$ as $b_1, b_2, \ldots, b_n$, and let $a_1, a_2, \ldots, a_n$ be distinct elements of $A$. Our task is to show that this in fact lists all the elements of $A$. Now $a_1 \in V = \operatorname{span}(b_1, b_2, \ldots, b_n)$ so
$$a_1 = \lambda_1 b_1 + \lambda_2 b_2 + \cdots + \lambda_n b_n$$
for some scalars $\lambda_i$. Certainly not all the $\lambda_i$ are zero, for otherwise $a_1 = 0 \in A$ so $A$ would not be linearly independent. By reordering the $b_i$ if necessary we may assume $\lambda_1 \ne 0$. Then $a_1 \notin \operatorname{span}(b_2, \ldots, b_n)$, for else there would be scalars $\mu_i$ with
$$a_1 = 0 b_1 + \mu_2 b_2 + \cdots + \mu_n b_n = \lambda_1 b_1 + \lambda_2 b_2 + \cdots + \lambda_n b_n$$
and $0 \ne \lambda_1$, contradicting the uniqueness of the coefficients in linear combinations of linearly independent sets (Proposition 2.18). So by the exchange lemma, $a_1, b_2, b_3, \ldots, b_n$ is a basis of $V$.
Now consider $a_2$. Again, $a_2 = \lambda_1 a_1 + \lambda_2 b_2 + \cdots + \lambda_n b_n$ for some scalars $\lambda_i$. Not all of $\lambda_2, \lambda_3, \ldots, \lambda_n$ are zero, else
$$\lambda_1 a_1 - a_2 = 0,$$
contradicting the linear independence of $A$. By reordering if necessary, we may assume $\lambda_2 \ne 0$. So $a_2 \in \operatorname{span}(a_1, b_2, b_3, \ldots, b_n)$. But $a_2 \notin \operatorname{span}(a_1, b_3, \ldots, b_n)$, for else
$$\lambda_1 a_1 + \lambda_2 b_2 + \lambda_3 b_3 + \cdots + \lambda_n b_n = \mu_1 a_1 + 0 b_2 + \mu_3 b_3 + \cdots + \mu_n b_n$$
for some scalars $\mu_i$, with $\lambda_2 \ne 0$, contradicting Proposition 2.18. Therefore, by the exchange lemma $a_1, a_2, b_3, \ldots, b_n$ is a basis of $V$.
Continuing in this way, we eventually get that $a_1, a_2, \ldots, a_n$ is a basis of $V$. Now if $A \ne \{a_1, a_2, \ldots, a_n\}$ take $a \in A$ not equal to any $a_i$. Since $\{a_1, a_2, \ldots, a_n\}$ spans $V$ there are scalars $\nu_i$ with
$$a = \nu_1 a_1 + \nu_2 a_2 + \cdots + \nu_n a_n,$$
so $\{a_1, a_2, \ldots, a_n, a\}$ is not linearly independent, a contradiction. Therefore $A = \{a_1, a_2, \ldots, a_n\}$, and $A$ and $B$ have the same number of elements. $\square$
Definition 2.26 The number of elements of a basis of $V$ (which depends only on $V$, and not on the choice of basis) is called the dimension of $V$. The dimension of $V$ is denoted $\dim V$.
The usual examples turn out to have the dimension you would expect. For example, $\mathbb{R}^3$ has dimension 3 since $e_1 = (1, 0, 0)^T$, $e_2 = (0, 1, 0)^T$, $e_3 = (0, 0, 1)^T$ forms a basis of size 3. Similarly, $\mathbb{R}^n$ has dimension $n$. The complex vector space $\mathbb{C}^n$ also has dimension $n$, since the usual basis $\{e_1, e_2, \ldots, e_n\}$ is a basis for $\mathbb{C}^n$ too (but see also Example 2.17 and Exercise 2.5 for the dimension of $\mathbb{C}^n$ as a real vector space).
It is wise not to forget the case of dimension 0. This is when a vector space is spanned by the empty linearly independent set, $\emptyset$. But what vectors are a linear combination of vectors in $\emptyset$? The zero vector (by convention) is one such, and in fact it is the only one. So a vector space $V$ of dimension 0 is the zero space, i.e. $V = \{0\}$.
Corollary 2.27 If $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$ and $U \subseteq V$ is a subspace of $V$ then $\dim U \le \dim V$. If, additionally, $\dim V$ is finite and $U \ne V$ then $\dim U < \dim V$.
Proof Let $B \subseteq V$ be a basis of $U$. Then by Theorem 2.21 $B$ extends to a basis $B' \supseteq B$ of $V$. Clearly as $B \subseteq B'$, $B'$ has at least as many elements as $B$. If $\dim V$ is finite and $U \ne V$, then $B'$ is finite and hence $B$ is also finite, so $U$ has finite dimension. But $U = \operatorname{span} B \ne V = \operatorname{span} B'$, so $B' \ne B$ and hence $B'$ has strictly more elements than $B$. $\square$
The second part of this very useful corollary can be stated in an alternative form as follows. Note too that the finiteness assumption is essential here (unlike some of the results here which were proved only in the finite case but are nevertheless true in the infinite case too). See Exercise 2.10.
Corollary 2.28 Suppose that $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$, $\dim V$ is finite, and $U \subseteq V$ is a subspace of $V$ with $\dim U = \dim V$. Then $U = V$.
Often, a vector space V has finite dimension, in which case all bases of V are finite, but this may not be the case. In this book we are mostly concerned with finite dimensional vector spaces, but occasionally infinite dimensional spaces are required.
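For subspaces of $\mathbb{R}^n$ spanned by finitely many column vectors, the dimension can be computed mechanically. The following is a minimal sketch, not taken from the text: it assumes vectors are given as Python lists, uses exact rational arithmetic, and counts pivots in Gaussian elimination. The function name `rank` and the sample vectors are illustrative choices only.

```python
from fractions import Fraction


def rank(vectors):
    """Return the dimension of the span of the given vectors (lists of numbers)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    pivots = 0  # number of pivot rows found so far
    for col in range(n_cols):
        # Look for a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        # Clear this column in all rows below the pivot row.
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


# Hypothetical example: U is spanned by two vectors of R^4, so U is a proper subspace.
u_spanning = [[1, 1, 0, 0], [1, 1, 1, 1]]
standard = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

print(rank(u_spanning))             # dim U = 2
print(rank(u_spanning + standard))  # dim R^4 = 4
```

In line with Corollary 2.27, the proper subspace has strictly smaller dimension than the whole space.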
Example 2.29 Let $V$ be the set of all functions $f, g, \ldots$ from the natural numbers $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$ to $\mathbb{R}$, with addition $f + g$ defined by
$$(f + g)(n) = f(n) + g(n) \quad \text{for all } n \in \mathbb{N}$$
and scalar multiplication $\lambda f$ by
$$(\lambda f)(n) = \lambda \cdot f(n) \quad \text{for all } n \in \mathbb{N}.$$
Then $V$ is a real vector space with infinite dimension.
Proof We leave the verification of the vector space axioms as a straightforward exercise. To show that $V$ has infinite dimension, let $e_i \in V$ be the function defined by $e_i(n) = 0$ for $i \ne n$ and $e_i(i) = 1$. We show that $\{e_i : i \in \mathbb{N}\}$ is linearly independent, and hence by Theorem 2.21 can be extended to a (necessarily infinite) basis. Let
$$f = \lambda_0 e_0 + \lambda_1 e_1 + \cdots + \lambda_n e_n$$
be an arbitrary linear combination of the vectors $e_i$, and suppose $f = 0$. We must show $\lambda_0 = \lambda_1 = \cdots = \lambda_n = 0$. The vector $f \in V$ is of course a function $\mathbb{N} \to \mathbb{R}$, and checking the definitions of $+$ and scalar multiplication we see that
$$f(i) = \begin{cases} \lambda_i & \text{if } i \le n \\ 0 & \text{otherwise.} \end{cases}$$
But if $f = 0$ this means that $f(0) = f(1) = \cdots = f(n) = 0$, in other words $\lambda_0 = \lambda_1 = \cdots = \lambda_n = 0$ as required. $\square$
2.5 Coordinates
If $V$ is a finite dimensional vector space over $\mathbb{R}$ or $\mathbb{C}$ then it has a finite basis $B \subseteq V$. Since $B$ spans $V$, every vector $v$ from $V$ can be written as a linear combination of elements of $B$. Thus if $B = \{v_1, v_2, \ldots, v_n\}$, each $v \in V$ can be written as
$$v = \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n$$
for some scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$. By Proposition 2.18, these scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$ are unique, so providing the ordering of $B$ as $v_1, v_2, \ldots, v_n$ is understood the column vector $(\lambda_1, \lambda_2, \ldots, \lambda_n)^T$ from $\mathbb{R}^n$ or $\mathbb{C}^n$ determines $v$ uniquely; the $\lambda_i$ are called the coordinates of $v$ with respect to the ordered basis $v_1, v_2, \ldots, v_n$ of $V$.
Example 2.30 For the real vector space $\mathbb{R}^3$ we may take ordered basis $v_1, v_2, v_3$
where
The coordinates of the vector v can be found by solving
=
(1, 0, 1)T with respect to this ordered basis 1/2,
1,
as three simultaneous equations in .A1 , .A2 , .-\3 . This gives .-\1 .-\2 = .A 3 so v has coordinates with respect t o the ordered basis v1 , v2 , v3 . Similarly, it has coordinates with respect to the ordered basis v1 , v3 , v 2 -changing the ordering of the basis changes the order of the coordin ates. Finally, v has coordinates with respect to the usual basis e1 , e2 , e3 0, of IR3 , just as you would expect.
1)T (1/2, 1/2,(1/2, 1, 1/2)T (1, 1)T
=
=
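Finding coordinates with respect to an ordered basis amounts to solving simultaneous linear equations, and this can be sketched in code. The basis and vector below are hypothetical illustrations (they are not the ones used in Example 2.30), and the helper `coordinates` is an assumed name; it performs Gauss-Jordan elimination with exact fractions and assumes the given vectors really do form a basis of $\mathbb{R}^n$.

```python
from fractions import Fraction


def coordinates(basis, v):
    """Solve l1*b1 + ... + ln*bn = v for (l1, ..., ln).

    `basis` is an ordered list of n vectors of length n, assumed to be a basis.
    """
    n = len(basis)
    # Augmented matrix: column j holds basis vector j, the last column holds v.
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])] for i in range(n)]
    for col in range(n):
        # A nonzero pivot exists because the columns are linearly independent.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [x / pivot_value for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


# Hypothetical ordered basis of R^3 and a vector to express in it.
v1, v2, v3 = [1, 1, 0], [0, 1, 1], [1, 0, 1]
v = [2, 1, 1]

print([str(c) for c in coordinates([v1, v2, v3], v)])  # ['1', '0', '1']
print([str(c) for c in coordinates([v1, v3, v2], v)])  # ['1', '1', '0']
```

Swapping the last two basis vectors swaps the last two coordinates, which is why the ordering of the basis has to be fixed before coordinates make sense.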
The convention we shall use in this book is that when the ordering of a basis is important we shall say so, and also omit the curly brackets, as in the phrase 'the ordered basis $v_1, v_2, \ldots, v_n$'. If the ordering is unimportant (so the basis can be just thought of as a set) we use curly brackets, as in 'the basis $\{v_1, v_2, \ldots, v_n\}$'.
Ordered bases and coordinates are used to show that two vector spaces $V$, $W$ of the same dimension over the same scalar field $\mathbb{R}$ or $\mathbb{C}$ are isomorphic, or in other words look the same. Two isomorphic spaces $V$ and $W$ might not have exactly the same vectors, but vectors can be paired off, one from $V$ with one from $W$, so that the operations of addition and scalar multiplication in $W$ do the same to the paired vectors as the operations of addition and scalar multiplication in $V$ do to the original vectors.
Example 2.31 The real vector space $\mathbb{R}^2$ of column vectors (with the usual addition and scalar multiplication) looks rather similar to the complex numbers, $\mathbb{C}$, regarded as a real vector space. We can represent both diagrammatically as a plane (with $x$, $y$ coordinates in the case of $\mathbb{R}^2$, the Argand diagram in the case of $\mathbb{C}$). What's more, the column vector $(x, y)^T$ gives precisely the coordinates of the complex number $x + iy \in \mathbb{C}$ with respect to the ordered basis $1, i$ of $\mathbb{C}$. The idea is to pair these two vectors off with each other. This done, there are certain obvious similarities between the vector space operations. Compare
$$\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} x + x' \\ y + y' \end{pmatrix}$$
with
$$(x + iy) + (x' + iy') = (x + x') + i(y + y'),$$
and
$$\lambda \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \lambda x \\ \lambda y \end{pmatrix}$$
with
$$\lambda(x + iy) = (\lambda x) + i(\lambda y).$$
Note that we ignore here the fact that in the complex numbers we have a multiplication operation $w \cdot z$ combining $z, w \in \mathbb{C}$ whereas there is no such multiplication of two vectors giving another vector defined on $\mathbb{R}^2$, since we are looking at the two spaces purely as vector spaces.
A 'pairing off' of vectors in $V$ and $W$ is called a one-to-one correspondence or a bijection, and is really a function $f : V \to W$ such that $f$ is injective, i.e.
$$v \ne w \text{ implies } f(v) \ne f(w),$$
and surjective, i.e.
$$w \in W \text{ implies } w = f(v) \text{ for some } v \in V.$$
Bijections are used to give the complete definition of isomorphisms of vector spaces.
Definition 2.32 Two vector spaces $V$, $W$, both over $\mathbb{R}$ or both over $\mathbb{C}$, are isomorphic if there is a bijection $f : V \to W$ such that
$$f(u + v) = f(u) + f(v) \qquad (13)$$
$$f(\lambda v) = \lambda f(v) \qquad (14)$$
for all $u, v \in V$ and all scalars $\lambda$. The bijection $f$ is said to be an isomorphism from $V$ to $W$. We write $V \cong W$ or $f : V \cong W$.
It is particularly important to realise that there are two different addition operations here, and two different scalar multiplication operations. So it might be better to write
$$f(u + v) = f(u) \oplus f(v)$$
instead of (13), where $+$ is addition of vectors in $V$ and $\oplus$ is addition of vectors in $W$. Similarly,
$$f(\lambda \cdot v) = \lambda \odot f(v)$$
would be a more accurate representation of (14), where $\cdot$ is scalar multiplication in $V$ and $\odot$ is scalar multiplication in $W$. Similarly, we should distinguish between
the zero vector $0_V$ of $V$ and the zero vector $0_W$ of $W$ as these really are different vectors. Note that if $f : V \cong W$, then $f(0_V) = 0_W$ (or as we shall usually write, $f(0) = 0$, ignoring the fact that these two zero vectors are actually different). Indeed, $f(0 \cdot v) = 0 \cdot f(v)$ for any vector $v \in V$. But $0 \cdot v$ is the zero vector of $V$, $f(v)$ is a vector in $W$, and any vector in $W$ multiplied by 0 (according to scalar multiplication in $W$) is the zero vector of $W$. Note also that (13) and (14) together imply that
$$f(\lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_k a_k) = \lambda_1 f(a_1) + \lambda_2 f(a_2) + \cdots + \lambda_k f(a_k)$$
for all $a_1, a_2, \ldots, a_k \in V$ and all scalars $\lambda_1, \lambda_2, \ldots, \lambda_k$, so an isomorphism $f$ takes linear combinations of $a_1, a_2, \ldots, a_k \in V$ to linear combinations of $f(a_1), f(a_2), \ldots, f(a_k)$.
The following theorem, when carefully formulated, is actually true for vector spaces of infinite dimension too, but we restrict our discussion here to the finite case to avoid the more difficult issues in set theory that would otherwise be needed.
Theorem 2.33 Suppose $V$ is a vector space over $\mathbb{R}$ with finite dimension $n \ge 0$. Then $V \cong \mathbb{R}^n$ as real vector spaces. Similarly, if $V$ is a vector space over $\mathbb{C}$ with finite dimension $n$, then $V \cong \mathbb{C}^n$ as complex vector spaces.
Proof We have already seen that if $n = 0$ then $V$ is the zero space $\{0\}$, so is obviously isomorphic to the zero space $\mathbb{R}^0$. So assume $n > 0$. By the definition of 'dimension' there is an ordered basis $v_1, v_2, \ldots, v_n$ of $V$ of size $n$. The idea is to define $f : V \to \mathbb{R}^n$ by taking each $v \in V$ to its coordinate form with respect to the ordered basis $v_1, v_2, \ldots, v_n$. Specifically, we define $f$ by
$$f(\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n) = (\lambda_1, \lambda_2, \ldots, \lambda_n)^T,$$
noting that this definition is valid since every $v \in V$ has precisely one expression of the form $\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n$. We just have to check that $f$ is an isomorphism.
The function $f$ is injective, because if $v = \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n$, $w = \mu_1 v_1 + \mu_2 v_2 + \cdots + \mu_n v_n$, and $f(v) = f(w)$, then
$$(\lambda_1, \lambda_2, \ldots, \lambda_n)^T = (\mu_1, \mu_2, \ldots, \mu_n)^T$$
so $\lambda_i = \mu_i$ for all $i$, so $v = w$. Also, $f$ is surjective, since if $r = (r_1, r_2, \ldots, r_n)^T \in \mathbb{R}^n$, we have $f(r_1 v_1 + r_2 v_2 + \cdots + r_n v_n) = r$. Furthermore, if $v = \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n$ and $w = \mu_1 v_1 + \mu_2 v_2 + \cdots + \mu_n v_n$, then
$$v + w = (\lambda_1 + \mu_1) v_1 + (\lambda_2 + \mu_2) v_2 + \cdots + (\lambda_n + \mu_n) v_n$$
so
$$f(v + w) = (\lambda_1 + \mu_1, \lambda_2 + \mu_2, \ldots, \lambda_n + \mu_n)^T = (\lambda_1, \lambda_2, \ldots, \lambda_n)^T + (\mu_1, \mu_2, \ldots, \mu_n)^T = f(v) + f(w),$$
and if $\nu$ is a scalar then
$$\nu v = \nu\lambda_1 v_1 + \nu\lambda_2 v_2 + \cdots + \nu\lambda_n v_n$$
so
$$f(\nu v) = (\nu\lambda_1, \nu\lambda_2, \ldots, \nu\lambda_n)^T = \nu(\lambda_1, \lambda_2, \ldots, \lambda_n)^T = \nu f(v)$$
as required. For the case of a vector space over $\mathbb{C}$, the argument is the same, but use scalars from $\mathbb{C}$ instead. $\square$
It follows that any two real vector spaces $V$, $W$ of dimension $n$ are isomorphic, as are any two complex vector spaces $V$, $W$ of dimension $n$.
2.6 Vector spaces over other fields
The observant reader might have noticed that the two kinds of vector spaces we have been considering (over the reals and over the complexes) have much in common, and he or she may wonder whether the notion of a vector space makes sense over any number system other than $\mathbb{R}$ or $\mathbb{C}$. The answer is yes. All we need is a number system in which we can perform the usual operations of addition, subtraction, multiplication, and division, subject to the usual rules, such as $a + 0 = a$, and $a \cdot a^{-1} = 1$, and so on.
Definition 2.34 A field is a set $F$ containing distinct elements 0 and 1, with two binary operations $+$ and $\cdot$, which satisfy the following axioms.
(15) $a + b = b + a$
(16) $(a + b) + c = a + (b + c)$
(17) $a + 0 = a$
(18) for all $a$ there exists $-a$ such that $a + (-a) = 0$
(19) $a \cdot b = b \cdot a$
(20) $(a \cdot b) \cdot c = a \cdot (b \cdot c)$
(21) $a \cdot 1 = a$
(22) for all $a \ne 0$ there exists $a^{-1}$ such that $a \cdot a^{-1} = 1$
(23) $a \cdot (b + c) = a \cdot b + a \cdot c$.
If a field $F$ is finite, its order is the number of elements in $F$.
The axioms for fields can be thought of as forming three groups: (A) the rules for addition, (15)-(18); (B) the rules for multiplication, (19)-(22); and (C) the distributivity law, (23). Plenty of examples are furnished by arithmetic modulo some prime $p$.
Example 2.35 If $p$ is any prime number, let $\mathbb{F}_p$ denote the set $\{0, 1, \ldots, p - 1\}$, and define operations $+$ and $\cdot$ on $\mathbb{F}_p$ as follows. First use ordinary integer arithmetic, and then 'reduce modulo $p$'; in other words, subtract whatever multiple of $p$ is necessary to bring the answer into the range 0 to $p - 1$.
Example 2.36 The field $\mathbb{F}_2$ has two elements, 0 and 1, subject to all the ordinary rules of arithmetic except that $1 + 1 = 0$.
Example 2.37 The field $\mathbb{F}_5$ of order 5 can be constructed as follows. First take ordinary arithmetic on the set $\{0, 1, 2, 3, 4\}$:
+ | 0  1  2  3  4        · | 0  1  2  3  4
0 | 0  1  2  3  4        0 | 0  0  0  0  0
1 | 1  2  3  4  5        1 | 0  1  2  3  4
2 | 2  3  4  5  6        2 | 0  2  4  6  8
3 | 3  4  5  6  7        3 | 0  3  6  9  12
4 | 4  5  6  7  8        4 | 0  4  8  12 16
which on reduction modulo 5 gives
+ | 0  1  2  3  4        · | 0  1  2  3  4
0 | 0  1  2  3  4        0 | 0  0  0  0  0
1 | 1  2  3  4  0        1 | 0  1  2  3  4
2 | 2  3  4  0  1        2 | 0  2  4  1  3
3 | 3  4  0  1  2        3 | 0  3  1  4  2
4 | 4  0  1  2  3        4 | 0  4  3  2  1
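The 'compute, then reduce modulo $p$' recipe of Example 2.35 is easy to mechanize. The sketch below is only an illustration; the function name `tables_mod` is an assumption of this example, and Python's `%` operator plays the role of reduction modulo $p$.

```python
def tables_mod(p):
    """Return (addition table, multiplication table) for arithmetic modulo p.

    Entry [a][b] of each table is a + b (respectively a * b) reduced modulo p.
    """
    add = [[(a + b) % p for b in range(p)] for a in range(p)]
    mul = [[(a * b) % p for b in range(p)] for a in range(p)]
    return add, mul


add5, mul5 = tables_mod(5)

print("addition modulo 5")
for row in add5:
    print(*row)

print("multiplication modulo 5")
for row in mul5:
    print(*row)
```

Each nonzero row of the multiplication table contains a 1, which is exactly the statement that every nonzero element has a multiplicative inverse.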
Example 2.38 The integers modulo 4 do not form a field. One way to see this is to observe that the element 2 does not have a multiplicative inverse, so that you cannot divide by 2. This is because for $x = 0, 1, 2, 3$, we have $2 \cdot x = 0, 2, 0, 2$ respectively, so there is no element $x$ with $2 \cdot x = 1$.
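The check in Example 2.38 is a finite search, so it can be automated for any modulus. This is a small sketch with an assumed helper name, `elements_without_inverse`; it simply tries every candidate inverse.

```python
def elements_without_inverse(n):
    """List the nonzero residues modulo n that have no multiplicative inverse."""
    return [a for a in range(1, n)
            if all((a * x) % n != 1 for x in range(n))]


print(elements_without_inverse(4))  # [2]
print(elements_without_inverse(5))  # []
```

An empty list means axiom (22) holds for that modulus; a nonempty list shows the integers modulo $n$ cannot be a field.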
In fact there is a field of order 4, but it cannot be obtained in this simple way.
Example 2.39 Let $\mathbb{F}_4$ denote the set $\{0, 1, a, a + 1\}$ with addition and multiplication defined by the following tables.
+   | 0    1    a    a+1       ·   | 0    1    a    a+1
0   | 0    1    a    a+1       0   | 0    0    0    0
1   | 1    0    a+1  a         1   | 0    1    a    a+1
a   | a    a+1  0    1         a   | 0    a    a+1  1
a+1 | a+1  a    1    0         a+1 | 0    a+1  1    a
Then it can be verified that this is a field with four elements.
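The verification that these tables define a field can be done by brute force. The sketch below is one possible encoding, not the book's: it assumes each element $c_0 + c_1 a$ is stored as the pair `(c0, c1)` with entries in $\{0, 1\}$, adds pairs componentwise modulo 2, and multiplies using the rule $a \cdot a = a + 1$ read off from the table.

```python
from itertools import product

ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]          # 0, 1, a, a+1
NAMES = {(0, 0): "0", (1, 0): "1", (0, 1): "a", (1, 1): "a+1"}
ZERO, ONE = (0, 0), (1, 0)


def add(x, y):
    """Componentwise addition modulo 2."""
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)


def mul(x, y):
    """Multiply c0 + c1*a by d0 + d1*a, replacing a*a by a + 1."""
    c0, c1 = x
    d0, d1 = y
    return ((c0 * d0 + c1 * d1) % 2, (c0 * d1 + c1 * d0 + c1 * d1) % 2)


def is_field():
    """Check axioms (15)-(23) by trying every combination of elements."""
    for x, y, z in product(ELEMENTS, repeat=3):
        if add(x, y) != add(y, x) or mul(x, y) != mul(y, x):
            return False
        if add(add(x, y), z) != add(x, add(y, z)):
            return False
        if mul(mul(x, y), z) != mul(x, mul(y, z)):
            return False
        if mul(x, add(y, z)) != add(mul(x, y), mul(x, z)):
            return False
    for x in ELEMENTS:
        if add(x, ZERO) != x or mul(x, ONE) != x:
            return False
        if not any(add(x, y) == ZERO for y in ELEMENTS):
            return False
        if x != ZERO and not any(mul(x, y) == ONE for y in ELEMENTS):
            return False
    return True


print(is_field())
print(NAMES[mul((0, 1), (0, 1))])  # a * a
```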
Notice that in the last example, the multiplication table tells us that $a \cdot a = a + 1$, which we can think of as a polynomial equation, $a^2 = a + 1$ or $a^2 + a + 1 = 0$. In fact all finite fields can be defined in a similar way, starting with a field of prime order (in this case $\mathbb{F}_2$), and adjoining some element $a$ satisfying a suitable polynomial equation. We shall see more of this in Chapter 9. For the moment, we merely state the following important theorem without proof.
Theorem 2.40 For each prime $p$ and each positive integer $n$, there is a unique field of order $p^n$. Moreover, every finite field is of this form.
Example 2.41 The field of order 9 can be defined by adjoining a 'square root of $-1$' to the field $\mathbb{F}_3$ of order 3, in the same way that we obtain $\mathbb{C}$ from $\mathbb{R}$.
Vector spaces over finite fields. The whole of this chapter can be generalized to vector spaces over an arbitrary field $F$. Simply replace $\mathbb{R}$ or $\mathbb{C}$ in the definitions and theorems by the field $F$. All the theorems remain true in this more general context.
Example 2.42 Let $F = \mathbb{F}_2 = \{0, 1\}$, the field of order 2. Then
$$F^8 = \{(x_1, \ldots, x_8)^T : x_i \in F\}$$
is a vector space of dimension 8 over $F$, with a basis
$$\{(1, 0, 0, 0, 0, 0, 0, 0)^T, (0, 1, 0, 0, 0, 0, 0, 0)^T, \ldots, (0, 0, 0, 0, 0, 0, 0, 1)^T\}.$$
These vectors are very important in computer science, where they are called 'bytes'. The number of such vectors is $2^8$, as there are two possibilities for $x_1$, two possibilities for $x_2$, and so on. More generally, if $F$ is a field of order $q$, then $F^n$ contains exactly $q^n$ vectors.
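The counting argument can be checked directly by listing every vector. In this sketch the labels $0, 1, \ldots, q - 1$ merely stand in for the $q$ field elements (only the number of elements matters for the count), and `all_vectors` is an illustrative name.

```python
from itertools import product


def all_vectors(q, n):
    """All length-n vectors with entries drawn from q symbols, as tuples."""
    return list(product(range(q), repeat=n))


bytes_over_f2 = all_vectors(2, 8)
print(len(bytes_over_f2))        # 2**8 = 256
print(len(all_vectors(3, 4)))    # 3**4 = 81
```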
The generalization of the main theorem, Theorem 2.33, states that any vector space of dimension $n$ over a field $F$ is isomorphic to $F^n$. As a corollary we have the following.
Corollary 2.43 Any vector space of dimension $n$ over a field of order $q$ has exactly $q^n$ vectors.
In fact, we may use this result to prove the second part of Theorem 2.40; that is, that every finite field has order equal to some prime power.
Lemma 2.44 Let $F$ be a finite field, and let $F_0$ be the subset
$$F_0 = \{0, 1, 1 + 1, 1 + 1 + 1, \ldots\}$$
of $F$. Then $F_0$ is a subfield of $F$ (i.e. is closed under addition and multiplication, and is a field in its own right), and the order of $F_0$ is a prime number, $p$.
Proof $F_0$ is clearly closed under addition, by the associativity of addition. To show it is closed under multiplication, we use distributivity to show that
$$\underbrace{(1 + 1 + \cdots + 1)}_{a} \cdot \underbrace{(1 + 1 + \cdots + 1)}_{b} = \underbrace{1 + 1 + \cdots + 1}_{ab}. \qquad (24)$$
Now $F$ is finite, so the elements $1, 1 + 1, 1 + 1 + 1, \ldots$ cannot be all distinct, which implies that
$$\underbrace{1 + 1 + \cdots + 1}_{r} = \underbrace{1 + 1 + \cdots + 1}_{s}$$
for some positive integers $r > s$. Subtracting 1 from both sides of this equation $s$ times gives
$$\underbrace{1 + 1 + \cdots + 1}_{r - s} = 0.$$
Let $p$ be the smallest positive integer such that
$$\underbrace{1 + 1 + \cdots + 1}_{p} = 0,$$
and note that the argument just given shows that for no $0 \le s < r < p$ does
$$\underbrace{1 + 1 + \cdots + 1}_{r} = \underbrace{1 + 1 + \cdots + 1}_{s},$$
for else $0 < r - s < p$ and
$$\underbrace{1 + 1 + \cdots + 1}_{r - s} = 0.$$
We now prove that $p$ is prime. If not, there are integers $a$, $b$ with $1 < a < p$, $1 < b < p$, and $p = ab$. Then by (24) we have
$$0 = \underbrace{1 + 1 + \cdots + 1}_{p} = \underbrace{(1 + 1 + \cdots + 1)}_{a} \cdot \underbrace{(1 + 1 + \cdots + 1)}_{b},$$
which implies that both of
$$\underbrace{1 + 1 + \cdots + 1}_{a} \quad \text{and} \quad \underbrace{1 + 1 + \cdots + 1}_{b}$$
are nonzero elements of $F$ without multiplicative inverses. This is a contradiction, so $p$ is prime. It follows that
$$F_0 = \{0, 1, 1 + 1, \ldots, \underbrace{1 + 1 + \cdots + 1}_{p - 1}\}$$
and $F_0$ has $p$ elements. In fact, it is easy to check that since $p$ is prime, $F_0$ is isomorphic to the field $\mathbb{F}_p$ of Example 2.35, so $F_0$ is a subfield of $F$. $\square$
The subfield constructed in the previous lemma is clearly the smallest subfield of $F$, and because of this it is given a special name.
Definition 2.45 If $F$ is a finite field, the subfield
$$F_0 = \{0, 1, 1 + 1, 1 + 1 + 1, \ldots\}$$
of $F$ is called the prime subfield of $F$, and the order of $F_0$ is called the characteristic of $F$.
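The characteristic can be found exactly as in the proof of Lemma 2.44, by adding 1 to itself until 0 appears. The sketch below assumes a finite field is described by its elements `one` and `zero` and an addition function `add`; these parameter names and the two encodings used in the demonstration are illustrative assumptions.

```python
def characteristic(one, zero, add):
    """Smallest p >= 1 such that one + one + ... + one (p terms) equals zero.

    Terminates for any finite field, since the repeated sums must eventually repeat.
    """
    total, p = one, 1
    while total != zero:
        total = add(total, one)
        p += 1
    return p


# F_5, modelled as the integers modulo 5 (Example 2.35).
print(characteristic(1, 0, lambda x, y: (x + y) % 5))      # 5


# F_4, with c0 + c1*a stored as the pair (c0, c1) and added componentwise modulo 2.
def add4(x, y):
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)


print(characteristic((1, 0), (0, 0), add4))                # 2
```

The field of order 4 has characteristic 2, so its prime subfield is a copy of $\mathbb{F}_2$.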
Theorem 2.46 Every finite field $F$ is a vector space over its prime subfield.
Proof Observe that the vector space axioms are special cases of the field axioms for $F$. $\square$
Corollary 2.47 Every finite field $F$ has order $p^n$ for some prime $p$ (the characteristic of $F$) and some integer $n \ge 1$.
Proof Immediate from Theorem 2.46 and Corollary 2.43. $\square$
Summary
To sum up this chapter, we started with examples of vector spaces and then gave the axioms for vector spaces which in some sense described 'common features' of the examples we had in mind. We then defined the important concept of dimension of a vector space. (The verification that this is well-defined required the exchange lemma.) We concluded by showing that (for finite dimensional spaces at least) the dimension of the vector space characterizes it completely up to isomorphism. In other words any two finite dimensional real vector spaces $V_1$ and $V_2$ are isomorphic if and only if they have the same dimension, in which case both are isomorphic to the space of column vectors $\mathbb{R}^n$ for $n = \dim V_1 = \dim V_2$. Similarly for complex spaces and the spaces $\mathbb{C}^n$.
In the (optional) Section 2.6, the level of abstraction was taken one step further; the axioms for a field were given and the notion of a vector space $V$ over a field $F$ introduced. A particularly simple example of a field $F$ is the set of numbers $\{0, 1, 2, \ldots, p - 1\}$ for a prime $p$, with addition and multiplication being taken modulo $p$. Similar results for vector spaces over $F$ hold: for example, the typical examples of vector spaces over $F$ are the spaces of column vectors $F^n$ with entries from $F$, and every finite dimensional vector space over $F$ is isomorphic to $F^n$ for some $n$.
Exercises
Exercise 2.1 If $f : U \to V$ and $g : V \to W$ are isomorphisms of vector spaces $U$, $V$, $W$, show that $f^{-1} : V \to U$ is an isomorphism from $V$ to $U$, and that the composition $g \circ f : U \to W$ is an isomorphism from $U$ to $W$.
Exercise 2.2 Show that if $n \ne m$ are natural numbers, then the real vector spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ are not isomorphic. [Hint: if $f : \mathbb{R}^n \to \mathbb{R}^m$ is an isomorphism and $e_1, \ldots, e_n$ is the usual basis of $\mathbb{R}^n$, show that $\mathbb{R}^m$ is spanned by $\{f(e_1), \ldots, f(e_n)\}$ (since $f$ is surjective) and furthermore $\{f(e_1), \ldots, f(e_n)\}$ is linearly independent. Deduce $n = m$.]
Exercise 2.3 Find bases for the following subspaces of $\mathbb{R}^4$:
(a) $\{(x, y, z, t)^T : x + y + 2t = 0,\ y - 3z = 0\}$;
(b) $\{(x, y, z, t)^T : x - 2y + 3z - 4t = 0\}$;
(c) $\{(x, y, z, t)^T : 2x - y = 3y - z = 4z - t = 0\}$.
Exercise 2.4 For each of the following subsets $A$ of $\mathbb{R}^4$, find a basis for $\operatorname{span} A$, and express $\operatorname{span} A$ in the form $\{(x, y, z, t)^T : \ldots\}$:
(a) $A = \{(1, 0, 2, -1)^T, (0, 1, -1, 2)^T\}$;
(b) $A = \{(1, -1, 0, 0)^T, (0, 1, -1, 0)^T, (0, 0, 1, -1)^T, (-1, 0, 0, 1)^T\}$;
(c) $A = \{(x, y, 0, 0)^T : x > y > 0\}$.
Exercise 2.5 What is the dimension (0, 1, 2, 3, ..., or infinite) of the following vector spaces? (Give reasons.)
(a) $\mathbb{C}^5$, as a complex vector space.
(b) $\mathbb{C}^5$, as a real vector space.
(c) The set of polynomials $p(X)$ of degree at most 7 with coefficients from $\mathbb{R}$, as a real vector space.
(d) The set of polynomials $p(X)$ of arbitrary degree with coefficients from $\mathbb{R}$, as a real vector space.
Exercise 2.6 For each integer $n \ge 0$, let $f_n : \mathbb{R} \to \mathbb{R}$ be the function defined by $f_n(x) = x^n$. Show that the set $B = \{f_n : n \ge 0\}$ is a basis for the vector space $\mathbb{R}[x]$ of real polynomial functions. Show that the map $\phi$ defined on $\mathbb{R}[x]$ by
is injective but not surjective. Deduce that $\mathbb{R}[x]$ is isomorphic (as a vector space) to a proper subspace of itself.
Exercise 2.7 Write out the addition and multiplication tables for $\mathbb{F}_3$.
Exercise 2.8 (a) Write out the addition and multiplication tables for $\mathbb{Z}_6$ (the integers modulo 6), and show that $\mathbb{Z}_6$ is not a field.
(b) More generally, show that if $a$ and $b$ are any two integers bigger than 1, then $\mathbb{Z}_{ab}$ is not a field.
Exercise 2.9 Let $p$ be a prime number, and $\mathbb{Z}_p$ be the set of integers modulo $p$. For any nonzero element $a \in \mathbb{Z}_p$, we define the map 'multiplication by $a$' by
$$m_a : b \mapsto ab \pmod{p}.$$
(a) Use uniqueness of prime factorization to show that $m_a$ is an injection, and hence a bijection.
(b) Let $b$ be the element such that $m_a(b) = 1 \pmod{p}$. Show that $b$ is an inverse to $a$, and deduce that $\mathbb{Z}_p$ is a field.
Exercise 2.10 Let $V$ be the set of all functions $f : \mathbb{N} \to \mathbb{R}$, and let 0 be the function defined by $0(n) = 0$ (see Example 2.29).
(a) Prove that each of the axioms holds, showing that $V$ is a real vector space.
(b) Let $B = \{e_i : i \in \mathbb{N}\}$ where $e_i$ is as in Example 2.29. Show that $B$ does not span $V$. (Hint: consider $f$ such that $f(n) = 1$ for all $n$.)
(c) Show that $W = \operatorname{span} B$ and the real vector space $\mathbb{R}[X]$ of polynomials with coefficients from $\mathbb{R}$ are isomorphic to each other.
(d) Show that $V$ and $W$ are not isomorphic to each other, i.e. there is no isomorphism $f : V \cong W$. (This is tricky.) [Hint for (d): given such an $f$, use induction on $n$ to define a function $g : \mathbb{N} \to \mathbb{R}$ so that for each $n$ there is $k \ge n$ such that
in $\mathbb{R}^{k+1}$.]
Part II Bilinear and sesquilinear forms
3 Inner product spaces
In this chapter we consider ways of 'multiplying' two vectors in a vector space $V$ to give a scalar. Such a product is called an inner product, or scalar product, and the theory is based initially on a familiar example (sometimes called the dot product) on three-dimensional vectors in $\mathbb{R}^3$. To begin with, we consider real vector spaces, but we will consider complex vector spaces later on.
3.1 The standard inner product
The scalar product, inner product, or dot product, of two vectors in $\mathbb{R}^2$ or $\mathbb{R}^3$ is rather well known. We start this chapter with its definition and description of its main properties. Throughout this section, $\|v\|$ will denote the length of the vector $v$ in $\mathbb{R}^2$ or $\mathbb{R}^3$.
The standard inner product in $\mathbb{R}^2$. Let $r, s \in \mathbb{R}^2$ be nonzero vectors, and let $\theta$ be the angle from vector $r$ to $s$. The angle $r$ makes with the $x$-axis will be denoted $\alpha$. (See Figure 3.1.) We suppose also that $r = (r_1, r_2)^T$ and $s = (s_1, s_2)^T$ in coordinate form, so
$$\|r\| = \sqrt{r_1^2 + r_2^2} \quad \text{and} \quad \|s\| = \sqrt{s_1^2 + s_2^2}.$$
Fig. 3.1 Two vectors and the angle between them
Now, the matrix for rotation about the origin by an angle $\alpha$ in the clockwise (or negative) direction is the basis-changing matrix
$$\begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} = \frac{1}{\|r\|} \begin{pmatrix} r_1 & r_2 \\ -r_2 & r_1 \end{pmatrix}.$$
In other words, multiplication of a position vector $(x, y)^T$ on the left by this matrix gives the corresponding position after rotation by the angle $\alpha$ as indicated. Clearly, this matrix moves $s$ to the vector $\|s\|(\cos\theta, \sin\theta)^T$, since $\theta$ is the angle between $r$ and $s$, so
$$\|s\| \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} = \frac{1}{\|r\|} \begin{pmatrix} r_1 & r_2 \\ -r_2 & r_1 \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix},$$
from which we deduce that
$$r_1 s_1 + r_2 s_2 = \|r\|\,\|s\| \cos\theta \qquad (1)$$
and
$$r_1 s_2 - r_2 s_1 = \|r\|\,\|s\| \sin\theta. \qquad (2)$$
Fig. 3.2 The cosine rule
This chapter and the next are concerned with expressions like these. In particular, $\|r\|\,\|s\| \cos\theta$ is called the standard inner product of $r$ and $s$, and will be denoted $r \cdot s$ or $\langle r \mid s \rangle$. Note that as $\cos\theta \le 1$ we have $r \cdot s \le \|r\|\,\|s\|$.
The cosine rule. Before we look at inner products in three or more dimensions, we will deduce the familiar identity $a^2 = b^2 + c^2 - 2bc \cos\theta$ for a triangle with sides of length $a$, $b$, $c$ and angle $\theta$ opposite the side $a$. Let $r$, $s$ be the vectors in $\mathbb{R}^2$ with angle between them $\theta$ and lengths $\|r\| = b$ and $\|s\| = c$, and $\|s - r\| = a$, as in Figure 3.2; suppose that $r = (r_1, r_2)^T$, $s = (s_1, s_2)^T$. Then
$$\begin{aligned} \|s - r\|^2 &= (s_1 - r_1)^2 + (s_2 - r_2)^2 \\ &= s_1^2 - 2 s_1 r_1 + r_1^2 + s_2^2 - 2 s_2 r_2 + r_2^2 \\ &= s_1^2 + s_2^2 + r_1^2 + r_2^2 - 2(s_1 r_1 + s_2 r_2) \\ &= \|s\|^2 + \|r\|^2 - 2\|r\|\,\|s\| \cos\theta. \end{aligned}$$
So
$$a^2 = b^2 + c^2 - 2bc \cos\theta.$$
The standard inner product in $\mathbb{R}^3$. As in $\mathbb{R}^2$, the standard inner product $r \cdot s$ or $\langle r \mid s \rangle$ is defined to be $\|r\|\,\|s\| \cos\theta$ where $\theta$ is the angle between the two vectors. Suppose $r = (r_1, r_2, r_3)^T$ and $s = (s_1, s_2, s_3)^T$; then by the cosine rule (referring to Figure 3.2 if necessary) we have
$$\begin{aligned} 2\|r\|\,\|s\| \cos\theta &= \|r\|^2 + \|s\|^2 - \|s - r\|^2 \\ &= r_1^2 + r_2^2 + r_3^2 + s_1^2 + s_2^2 + s_3^2 - \left((s_1 - r_1)^2 + (s_2 - r_2)^2 + (s_3 - r_3)^2\right) \\ &= 2(r_1 s_1 + r_2 s_2 + r_3 s_3), \end{aligned}$$
by cancelling terms. Therefore we have an expression very similar to that obtained in two dimensions for our inner product, i.e.
$$r \cdot s = r_1 s_1 + r_2 s_2 + r_3 s_3. \qquad (3)$$
3.2 Inner products
Although it is more difficult to interpret the idea of angle in four or more dimensions, the expression in (3) suggests we define $\langle v \mid w \rangle$ in $\mathbb{R}^n$ by
$$\langle v \mid w \rangle = \sum_{i=1}^{n} v_i w_i$$
for two vectors $v = (v_1, v_2, \ldots, v_n)^T$ and $w = (w_1, w_2, \ldots, w_n)^T$ in $\mathbb{R}^n$. This is called the standard inner product on $\mathbb{R}^n$, but there are many other possible inner products, as we shall see.
What are the essential properties of an inner product? We obviously have
$$\langle w \mid v \rangle = \sum_{i=1}^{n} w_i v_i = \sum_{i=1}^{n} v_i w_i = \langle v \mid w \rangle,$$
$$\langle v \mid \lambda w \rangle = \sum_{i=1}^{n} v_i \lambda w_i = \lambda \sum_{i=1}^{n} v_i w_i = \lambda \langle v \mid w \rangle,$$
and
$$\langle u \mid v + w \rangle = \sum_{i=1}^{n} u_i (v_i + w_i) = \sum_{i=1}^{n} u_i v_i + \sum_{i=1}^{n} u_i w_i = \langle u \mid v \rangle + \langle u \mid w \rangle.$$
A slightly less obvious property which is nevertheless important is that
$$\langle v \mid v \rangle = \sum_{i=1}^{n} v_i^2 \ge 0,$$
and also that if $\langle v \mid v \rangle = 0$ then $\sum_{i=1}^{n} v_i^2 = 0$ and hence $v_i = 0$ for all $i$, which implies $v = 0$. It is these properties which we (somewhat arbitrarily) choose as the defining properties of an inner product.
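These properties can be spot-checked numerically. The following sketch uses exact fractions so that the equalities hold without rounding error; the function name `inner` and the sample vectors are arbitrary illustrative choices. The last lines compare the coordinate formula with $\|r\|\,\|s\| \cos\theta$ in $\mathbb{R}^2$, using floating point and a tolerance.

```python
import math
from fractions import Fraction


def inner(v, w):
    """Standard inner product on R^n: the sum of the products v_i * w_i."""
    return sum(x * y for x, y in zip(v, w))


u = [Fraction(1), Fraction(-2), Fraction(3), Fraction(0)]
v = [Fraction(2), Fraction(1), Fraction(0), Fraction(-1)]
w = [Fraction(1, 2), Fraction(4), Fraction(-1), Fraction(5)]
lam = Fraction(-3, 7)

symmetry = inner(v, w) == inner(w, v)
scalars = inner(v, [lam * x for x in w]) == lam * inner(v, w)
additivity = inner(u, [a + b for a, b in zip(v, w)]) == inner(u, v) + inner(u, w)
nonnegative = inner(v, v) >= 0
print(symmetry, scalars, additivity, nonnegative)

# In R^2 the same sum agrees with |r| |s| cos(theta), theta being the angle from r to s.
r, s = (3.0, 4.0), (-1.0, 2.0)
theta = math.atan2(s[1], s[0]) - math.atan2(r[1], r[0])
angle_formula = math.isclose(inner(r, s),
                             math.hypot(*r) * math.hypot(*s) * math.cos(theta))
print(angle_formula)
```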
Definition 3.1 If $V$ is a vector space over $\mathbb{R}$, then an inner product on $V$ is a map (written $\langle\ \mid\ \rangle$) from $V \times V$ to $\mathbb{R}$ (taking a pair of vectors $(v, w)$ to a real number $\langle v \mid w \rangle$) with the following properties.
(a) (Symmetry) $\langle v \mid w \rangle = \langle w \mid v \rangle$ for all vectors $v$, $w$ in $V$.
(b) (Linearity) $\langle u \mid \lambda v + \mu w \rangle = \lambda \langle u \mid v \rangle + \mu \langle u \mid w \rangle$, for all vectors $u$, $v$, $w$ in $V$ and all scalars $\lambda$, $\mu$.
(c) (Positive definiteness) For all vectors $v \in V$,
i. $\langle v \mid v \rangle \ge 0$, and
ii. if $\langle v \mid v \rangle = 0$ then $v = 0$ (equivalently, if $v \ne 0$, then $\langle v \mid v \rangle \ne 0$).
Notice that because it is symmetric, the linearity in the second variable implies linearity in the first variable. That is,
$$\langle \lambda u + \mu v \mid w \rangle = \lambda \langle u \mid w \rangle + \mu \langle v \mid w \rangle.$$
Thus an inner product is linear in both variables, so we call it bilinear.
Definition 3.2 A finite dimensional vector space over $\mathbb{R}$ with an inner product defined on it is called a Euclidean space.
We can define all sorts of different inner products on a vector space, not just the standard ones given above.
Example 3.3 In $\mathbb{R}^2$ we could define
$$\langle (a, b)^T \mid (c, d)^T \rangle = ac + bc + ad + 3bd.$$
This satisfies the above three properties, so is an inner product. The first two are easy to verify. To check the third one, observe first that
$$\langle (a, b)^T \mid (a, b)^T \rangle = a^2 + 2ab + 3b^2 = (a + b)^2 + 2b^2 \ge 0.$$
Moreover, if $\langle (a, b)^T \mid (a, b)^T \rangle = 0$, then $(a + b)^2 + 2b^2 = 0$, so $a + b = 0$ and $b = 0$, and therefore $(a, b)^T = (0, 0)^T$.
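The three properties for this non-standard inner product can also be tested on a grid of sample vectors. This is a sanity check rather than a proof: the helper name `inner33` and the choice of grid are assumptions made for illustration, and exact fractions are used so that every comparison is exact.

```python
from fractions import Fraction
from itertools import product


def inner33(v, w):
    """The inner product of Example 3.3 on R^2: ac + bc + ad + 3bd."""
    (a, b), (c, d) = v, w
    return a * c + b * c + a * d + 3 * b * d


grid = [Fraction(k, 2) for k in range(-4, 5)]   # sample values -2, -3/2, ..., 2
vectors = list(product(grid, repeat=2))

symmetric = all(inner33(v, w) == inner33(w, v) for v in vectors for w in vectors)
# <v|v> should equal (a + b)^2 + 2 b^2, which vanishes only when a = b = 0.
square_form = all(inner33(v, v) == (v[0] + v[1]) ** 2 + 2 * v[1] ** 2 for v in vectors)
positive = all(inner33(v, v) > 0 for v in vectors if v != (0, 0))

print(symmetric, square_form, positive)
```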
Example 3.4 Another example is given by the vector space $\mathcal{C}[a, b]$ of all continuous functions¹ from the closed interval $[a, b]$ to $\mathbb{R}$. Here, addition and scalar multiplication are the usual pointwise addition and scalar multiplication of functions (similar to that in Example 2.29), and the zero element is the function which takes the value 0 everywhere. If $f$ and $g$ are two functions in $\mathcal{C}[a, b]$, we can define their inner product to be
$$\langle f \mid g \rangle = \int_a^b f(x) g(x)\, dx.$$
To prove that this is an inner product according to Definition 3.1, we first need the facts that
$$\int_a^b f(x) g(x)\, dx = \int_a^b g(x) f(x)\, dx,$$
$$\int_a^b f(x)(\alpha g(x) + \beta h(x))\, dx = \alpha \int_a^b f(x) g(x)\, dx + \beta \int_a^b f(x) h(x)\, dx,$$
and
$$\int_a^b (f(x))^2\, dx \ge 0,$$
which immediately give properties (a), (b), and (c i). To prove (c ii), we need the (somewhat more difficult) result that if $g : [a, b] \to \mathbb{R}$ is a continuous function which is nonnegative and not identically 0, then $\int_a^b g(x)\, dx \ne 0$. This result is proved in Appendix A, as Lemma A.2. Now suppose that $f$ is not identically 0, so that $(f(x))^2$ defines a continuous function which is everywhere nonnegative and not identically zero. Then
$$\langle f \mid f \rangle = \int_a^b (f(x))^2\, dx \ne 0,$$
giving (c ii), so we have now proved that this is an inner product.
¹ We shall not give a completely rigorous account of 'continuous functions' and integration. Instead, the student is directed to any standard textbook in analysis. For now, the basic properties needed here can be taken on trust. See also Appendix A.
Example 3.5 The vector space $\mathcal{C}[a, b]$ in Example 3.4 is a very large infinite dimensional space, but we can construct a similar finite dimensional example, by taking the space $\mathbb{R}_n[x]$ of all polynomials in $x$ of degree less than $n$, and defining an inner product by
$$\langle f \mid g \rangle = \int_0^1 f(x) g(x)\, dx.$$
The ordinary scalar product on $\mathbb{R}^2$ and $\mathbb{R}^3$ is related to concepts of length and distance in a way that can easily be generalized to arbitrary Euclidean spaces. If we take $v = (x, y, z)^T \in \mathbb{R}^3$ then for the standard inner product, $v \cdot v = x^2 + y^2 + z^2$, which is the square of the length of the vector $v$. The distance between two vectors $v$ and $w$ is then naturally defined as the length of $v - w$. In general we define:
Definition 3.6 The norm (or length) of a vector $v$ is written $\|v\|$ and defined by $\|v\| = \sqrt{\langle v|v\rangle}$, the positive square root of the inner product of $v$ with itself. The distance between two vectors $v$ and $w$ is written $d(v,w)$ and defined by $d(v,w) = \|v - w\|$.
As a consequence, we note that $\|{-v}\| = \|v\|$; the 'length' of the vector $-v$ equals the length of $v$. More generally, if you multiply a vector by a scalar, then its length is multiplied by the absolute value of the same scalar.
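Proposition 3.7 For all vectors $v$ in a Euclidean space $V$, and for all $\lambda \in \mathbb{R}$, we have $\|\lambda v\| = |\lambda|\cdot\|v\|$.
Proof $\|\lambda v\| = \sqrt{\langle \lambda v|\lambda v\rangle} = \sqrt{\lambda^2\langle v|v\rangle} = |\lambda|\cdot\sqrt{\langle v|v\rangle} = |\lambda|\cdot\|v\|$. □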
If you add two vectors together, the relationship between the lengths is not so simple, but we still get the familiar triangle inequality, i.e. $\|v + w\| \le \|v\| + \|w\|$. To prove this in general requires the following very important result.
Proposition 3.8 (The Cauchy-Schwarz inequality) For all vectors $v, w$ in a Euclidean space $V$,
\[ |\langle v|w\rangle| \le \|v\|\cdot\|w\|. \]
Proof For every real value of $\lambda$ we have
\[ 0 \le \|v + \lambda w\|^2 = \langle v + \lambda w|v + \lambda w\rangle = \langle v|v\rangle + \lambda\langle v|w\rangle + \lambda\langle w|v\rangle + \lambda^2\langle w|w\rangle = \lambda^2\|w\|^2 + 2\lambda\langle v|w\rangle + \|v\|^2. \]
Now regard the right-hand side as a quadratic polynomial in the variable $\lambda$. This polynomial is always nonnegative, so it has at most one real root. Therefore the discriminant ('$b^2 - 4ac$' for the polynomial $ax^2 + bx + c$) is nonpositive. In symbols,
\[ \bigl(2\langle v|w\rangle\bigr)^2 - 4\|w\|^2\|v\|^2 \le 0, \]
hence
\[ \langle v|w\rangle^2 \le \|v\|^2\|w\|^2. \]
Now take the positive square root of both sides to obtain the result. □
A slightly different proof is obtained by saying that the first inequality in the above proof, namely
\[ 0 \le \|v + \lambda w\|^2 = \lambda^2\|w\|^2 + 2\lambda\langle v|w\rangle + \|v\|^2, \]
holds for all values of $\lambda$; in particular, if $w \neq 0$ then it holds for
\[ \lambda = -\frac{\langle v|w\rangle}{\|w\|^2}. \]
Substituting in and simplifying yields the Cauchy-Schwarz inequality. Of course, if $w = 0$ then $\langle v|w\rangle = 0$ so both sides of the inequality are zero and so the inequality is true here too. (Compare this also with the proof of the complex version given in Proposition 3.21 below.)
The Cauchy-Schwarz inequality has many forms, as it can be applied to all sorts of spaces. For example, applying it to the standard inner product on $\mathbb{R}^n$ we obtain the following.
Corollary 3.9 If $x_1, \dots, x_n, y_1, \dots, y_n$ are any real numbers, then
\[ \Bigl(\sum_{i=1}^n x_iy_i\Bigr)^2 \le \Bigl(\sum_{i=1}^n x_i^2\Bigr)\Bigl(\sum_{i=1}^n y_i^2\Bigr). \]
Similarly, we can apply it to Example 3.4 to obtain:
Corollary 3.10 If $f$ and $g$ are continuous real-valued functions on the closed interval $[a,b]$, then
\[ \Bigl(\int_a^b f(x)g(x)\,dx\Bigr)^2 \le \int_a^b (f(x))^2\,dx \int_a^b (g(x))^2\,dx. \]
Proposition 3.11 (The triangle inequality) For any vectors $v, w$ in a Euclidean space $V$,
\[ \|v + w\| \le \|v\| + \|w\|. \]
Proof Expanding directly,
\[ \|v + w\|^2 = \langle v + w|v + w\rangle = \langle v|v\rangle + 2\langle v|w\rangle + \langle w|w\rangle \le \|v\|^2 + 2|\langle v|w\rangle| + \|w\|^2 \le \|v\|^2 + 2\|v\|\,\|w\| + \|w\|^2 = (\|v\| + \|w\|)^2, \]
where the last inequality holds by Proposition 3.8. Now take the positive square root of each side. □
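Both inequalities are easy to test on concrete vectors. The sketch below assumes the standard inner product on $\mathbb{R}^n$ and uses randomly generated vectors; the helper names `dot`, `norm`, and `check`, and the rounding tolerance, are illustrative assumptions rather than anything defined in the text.

```python
import math
import random

def dot(v, w):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    """Norm induced by the standard inner product."""
    return math.sqrt(dot(v, v))

def check(v, w, tol=1e-9):
    """Return (Cauchy-Schwarz holds, triangle inequality holds), up to rounding tolerance."""
    cauchy_schwarz = abs(dot(v, w)) <= norm(v) * norm(w) + tol
    total = [a + b for a, b in zip(v, w)]
    triangle = norm(total) <= norm(v) + norm(w) + tol
    return cauchy_schwarz, triangle

random.seed(0)  # fixed seed so the experiment is repeatable
results = []
for _ in range(1000):
    v = [random.uniform(-5, 5) for _ in range(4)]
    w = [random.uniform(-5, 5) for _ in range(4)]
    results.append(check(v, w))

all_ok = all(cs and tri for cs, tri in results)
print(all_ok)
```

Equality in the Cauchy-Schwarz inequality occurs when one vector is a scalar multiple of the other, for example $v = (1, 2)^T$ and $w = (2, 4)^T$, where both sides equal 10.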
We have just seen how to generalize the concept of distance from ordinary three-dimensional Euclidean space to arbitrary real inner product spaces, in such a way that the basic theorems like the triangle inequality still hold in this more general context. Another concept that can be easily generalized is that of angle. Recall that two vectors in $\mathbb{R}^3$ are perpendicular (at right angles, or orthogonal) if their inner product is zero. More generally, if $v$ and $w$ are two nonzero vectors in $\mathbb{R}^3$ then the angle $\theta$ between them is given by $v\cdot w = \|v\|\,\|w\|\cos\theta$. These properties can be used as definitions in arbitrary spaces with inner products defined on them.
Definition 3.12 If $V$ is a Euclidean space, and $v$ and $w$ are elements of $V$, then $v$ and $w$ are said to be orthogonal if $\langle v|w\rangle = 0$. If both $v$ and $w$ are nonzero, the angle between $v$ and $w$ is defined to be $\theta$ where $0 \le \theta \le \pi$ and
\[ \cos\theta = \frac{\langle v|w\rangle}{\|v\|\cdot\|w\|}. \]
Note that the Cauchy-Schwarz inequality (Proposition 3.8) implies that
\[ -1 \le \frac{\langle v|w\rangle}{\|v\|\cdot\|w\|} \le 1 \]
and so this definition of $\theta$ makes sense.
Example 3.13 In $\mathbb{R}^3$ with the standard inner product, these definitions coincide with the ordinary geometrical definitions.
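Definition 3.12 translates directly into a computation. The following sketch, for the standard inner product on $\mathbb{R}^n$, uses the hypothetical helper name `angle`; the clamping step is an assumption added only to guard against floating-point rounding pushing the cosine slightly outside $[-1, 1]$.

```python
import math

def dot(v, w):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    return math.sqrt(dot(v, v))

def angle(v, w):
    """Angle theta in [0, pi] between nonzero vectors v and w (Definition 3.12)."""
    cos_theta = dot(v, w) / (norm(v) * norm(w))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.acos(cos_theta)

# Perpendicular coordinate axes meet at a right angle.
right_angle = angle([1, 0, 0], [0, 1, 0])
# The diagonal of the unit square makes an angle of pi/4 with the first axis.
diagonal = angle([1, 0, 0], [1, 1, 0])
print(right_angle, diagonal)
```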
Example 3.14 In $\mathcal{C}[-\pi, \pi]$ with the inner product
\[ \langle f|g\rangle = \int_{-\pi}^{\pi} f(x)g(x)\,dx, \]
define functions $f_k$ and $g_k$ by $f_k(x) = \cos(kx)$ for integers $k \ge 0$ and $g_k(x) = \sin(kx)$ for integers $k \ge 1$. Then you can check that any two of these functions are orthogonal to each other. For example,
\[ \langle f_k|g_m\rangle = \int_{-\pi}^{\pi} \cos(kx)\sin(mx)\,dx = 0, \]
since the integrand is an odd function of $x$, i.e. a function $f(x)$ such that $f(-x) = -f(x)$ for all $x \in \mathbb{R}$. Also, provided $k \neq m$,
\[ \langle f_k|f_m\rangle = \int_{-\pi}^{\pi} \cos(kx)\cos(mx)\,dx = \frac{1}{2}\int_{-\pi}^{\pi} \bigl(\cos(kx + mx) + \cos(kx - mx)\bigr)\,dx \]
\[ = \frac{1}{2}\left[\frac{\sin(k\pi + m\pi)}{k + m} - \frac{\sin(-(k\pi + m\pi))}{k + m} + \frac{\sin(k\pi - m\pi)}{k - m} - \frac{\sin(-(k\pi - m\pi))}{k - m}\right] = 0. \]
Similarly, if $k \neq m$,
\[ \langle g_k|g_m\rangle = \int_{-\pi}^{\pi} \sin(kx)\sin(mx)\,dx = \frac{1}{2}\int_{-\pi}^{\pi} \bigl(\cos(kx - mx) - \cos(kx + mx)\bigr)\,dx = 0. \]
(If $k = m \ge 1$ then $\cos(kx - mx) = 1$ and both integrals come to $\pi$.) This is an important example which you will most likely meet elsewhere as well. It is the foundation of the theory of Fourier series.
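The orthogonality relations of Example 3.14 can be confirmed numerically. This is a rough sketch, assuming the integral over $[-\pi, \pi]$ is approximated by the composite midpoint rule; the names `integrate`, `ip`, `f_k`, and `g_k` are illustrative.

```python
import math

def integrate(func, a, b, n=2000):
    """Composite midpoint rule approximation to the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))

def ip(f, g):
    """Approximate inner product on C[-pi, pi]."""
    return integrate(lambda x: f(x) * g(x), -math.pi, math.pi)

def f_k(k):
    return lambda x: math.cos(k * x)

def g_k(k):
    return lambda x: math.sin(k * x)

# Distinct functions of the family are orthogonal (values close to 0).
print(ip(f_k(1), f_k(2)), ip(g_k(1), g_k(3)), ip(f_k(2), g_k(2)))
# For k = m >= 1 the inner product is close to pi.
print(ip(g_k(3), g_k(3)), math.pi)
```

The printed values are approximations, so the comparison with 0 and $\pi$ should be read up to a small numerical error.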
3.3 Inner products over $\mathbb{C}$

So far in this chapter we have been working with real vector spaces. However, the whole theory goes through for complex vector spaces with very little change. In what follows we write $\bar{z}$ for the complex conjugate of $z$, and $|z|$ for the absolute value of $z$, so that $|z|^2 = z\bar{z}$. We shall also use $\operatorname{Re}(z)$ for the real part of $z$, and $\operatorname{Im}(z)$ for the imaginary part of $z$, so
\[ \frac{z + \bar{z}}{2} = \operatorname{Re}(z) \quad\text{and}\quad \frac{z - \bar{z}}{2i} = \operatorname{Im}(z). \]
Example 3.15 Let $V = \mathbb{C}$, regarded as a one-dimensional complex vector space. It would be nice if our definition of an inner product gave rise to $\|z - w\|$ being the distance from $z$ to $w$ in the Argand diagram, i.e. $|z - w|$, which would mean that $\|z\|^2 = |z|^2 = \bar{z}z$. This suggests we should define the inner product by $\langle z|w\rangle = \bar{z}w$. This has slightly different properties from the real inner products we have seen so far. For example, $\langle z|w\rangle = \bar{z}w = \overline{\bar{w}z} = \overline{\langle w|z\rangle}$. We still have $\langle z|\lambda w\rangle = \bar{z}\lambda w = \lambda\bar{z}w = \lambda\langle z|w\rangle$, but now $\langle \lambda z|w\rangle = \overline{\lambda z}\,w = \bar{\lambda}\bar{z}w = \bar{\lambda}\langle z|w\rangle$. Based on this example, we give the following formal definition of a complex inner product.
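Definition 3.16 If $V$ is a vector space over $\mathbb{C}$, then a map $\langle\;|\;\rangle$ from $V \times V$ to $\mathbb{C}$ (taking $(v, w)$ to $\langle v|w\rangle$) is an inner product if the following are true for all $u, v, w \in V$ and $\lambda, \mu \in \mathbb{C}$.
(a) (Conjugate-symmetry) $\langle v|w\rangle = \overline{\langle w|v\rangle}$.
(b) (Linearity) $\langle u|\lambda v + \mu w\rangle = \lambda\langle u|v\rangle + \mu\langle u|w\rangle$.
(c) (Positive definiteness)
  i. $\langle v|v\rangle \in \mathbb{R}$ with $\langle v|v\rangle \ge 0$, and
  ii. if $\langle v|v\rangle = 0$ then $v = 0$ (equivalently, if $v \neq 0$ then $\langle v|v\rangle \neq 0$).
Notice that (a) and (b) imply that
\[ \langle \lambda u + \mu v|w\rangle = \overline{\langle w|\lambda u + \mu v\rangle} = \overline{\lambda\langle w|u\rangle + \mu\langle w|v\rangle} = \bar{\lambda}\,\overline{\langle w|u\rangle} + \bar{\mu}\,\overline{\langle w|v\rangle} = \bar{\lambda}\langle u|w\rangle + \bar{\mu}\langle v|w\rangle. \]
Note that $\langle ru + sv|w\rangle = r\langle u|w\rangle + s\langle v|w\rangle$ for any real $r, s$; in particular, putting $r = s = 1$ we have $\langle u + v|w\rangle = \langle u|w\rangle + \langle v|w\rangle$. Although this type of inner product is not bilinear it is nearly so. It is sometimes called sesquilinear, i.e. 'one-and-a-half-linear', because of this and part (b) of Definition 3.16, and it is also called conjugate-symmetric in view of part (a) of Definition 3.16.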
Definition 3.17 A finite dimensional vector space over $\mathbb{C}$ with an inner product defined on it is called a unitary space.
We will also need to be able to refer to vector spaces with an inner product without specifying whether these spaces are over $\mathbb{R}$ or over $\mathbb{C}$, and without any assumption that they have finite dimension. Such spaces will be referred to in the rest of this book as inner product spaces.
Example 3.18 Let $V = \mathbb{C}^n$ and let $v$ and $w$ be any two vectors in $V$, say $v = (v_1, \dots, v_n)^T$ and $w = (w_1, \dots, w_n)^T$. Then define
\[ \langle v|w\rangle = \sum_{i=1}^n \bar{v}_i w_i. \]
Then $\langle\;|\;\rangle$ is an inner product (as you should check), called the standard inner product on $\mathbb{C}^n$.
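The properties in Definition 3.16 can be tried out on the standard inner product of Example 3.18 using Python's built-in complex numbers. This is only an illustrative sketch: the function name `inner` and the sample vectors and scalar are assumptions, chosen with small integer entries so that the arithmetic is exact.

```python
def inner(v, w):
    """Standard inner product on C^n: conjugate the entries of the first argument."""
    return sum(a.conjugate() * b for a, b in zip(v, w))

v = [1 + 2j, 3 - 1j]
w = [2 - 1j, 1j]
lam = 2 - 3j

# Conjugate-symmetry: <v|w> is the complex conjugate of <w|v>.
print(inner(v, w), inner(w, v))

# Linear in the second argument, conjugate-linear in the first.
second = inner(v, [lam * b for b in w])   # equals lam * <v|w>
first = inner([lam * a for a in v], w)    # equals conj(lam) * <v|w>
print(second, first)

# <v|v> is real and nonnegative: here |1+2i|^2 + |3-i|^2 = 5 + 10.
print(inner(v, v))
```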
Most of the results for real inner product spaces also work for complex inner product spaces, with a sprinkling of complex conjugates and absolute values inserted.
Definition 3.19 If $V$ is a unitary space and $v \in V$, we define the norm of $v$ to be $\|v\| = \sqrt{\langle v|v\rangle}$, and the distance between two vectors $v$ and $w$ to be $d(v, w) = \|v - w\|$.
(The concept of angle is not so easy to interpret in the complex case, so we will not try to do so. )
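Proposition 3.20 For any vector $v$ in a complex inner product space and any $\lambda \in \mathbb{C}$, we have $\|\lambda v\| = |\lambda|\cdot\|v\|$.
Proof $\|\lambda v\| = \sqrt{\langle \lambda v|\lambda v\rangle} = \sqrt{\bar{\lambda}\lambda\langle v|v\rangle} = |\lambda|\cdot\sqrt{\langle v|v\rangle} = |\lambda|\cdot\|v\|$. □
Proposition 3.21 (The Cauchy-Schwarz inequality) For any vectors $v, w$ in a complex inner product space $V$,
\[ |\langle v|w\rangle| \le \|v\|\cdot\|w\|. \]
Proof For variety, we give here a slightly different proof from the one we gave for the real case. Note carefully the places in the proof where we have to insert an absolute value, complex conjugate, or real part. For every complex value of $\lambda$ we have
\[ 0 \le \|v + \lambda w\|^2 = \langle v + \lambda w|v + \lambda w\rangle = \langle v|v + \lambda w\rangle + \bar{\lambda}\langle w|v + \lambda w\rangle \]
\[ = \langle v|v\rangle + \lambda\langle v|w\rangle + \bar{\lambda}\langle w|v\rangle + \bar{\lambda}\lambda\langle w|w\rangle = \langle v|v\rangle + \lambda\langle v|w\rangle + \overline{\lambda\langle v|w\rangle} + \bar{\lambda}\lambda\langle w|w\rangle \]
\[ = \|v\|^2 + 2\operatorname{Re}\bigl(\lambda\langle v|w\rangle\bigr) + |\lambda|^2\|w\|^2. \]
Now this is true for all $\lambda$; in particular, if $w \neq 0$ it is true for $\lambda = -\langle w|v\rangle/\|w\|^2$. Since $\langle v|w\rangle\langle w|v\rangle = \langle v|w\rangle\overline{\langle v|w\rangle} = |\langle v|w\rangle|^2$, we have
\[ 0 \le \|v\|^2 + 2\operatorname{Re}\left(-\frac{|\langle v|w\rangle|^2}{\|w\|^2}\right) + \frac{|\langle v|w\rangle|^2\,\|w\|^2}{\|w\|^4} = -\frac{|\langle v|w\rangle|^2}{\|w\|^2} + \|v\|^2. \]
Therefore
\[ |\langle v|w\rangle|^2 \le \|v\|^2\|w\|^2, \]
from which the result for $w \neq 0$ follows by taking the positive square root of both sides. For $w = 0$ the inequality is trivial as $\langle v|w\rangle = \|w\| = 0$. □
Proposition 3.22 (The triangle inequality) For any $v, w$ in a complex inner product space $V$,
\[ \|v + w\| \le \|v\| + \|w\|. \]
Proof As before,
\[ \|v + w\|^2 = \langle v + w|v + w\rangle = \langle v|v\rangle + \langle v|w\rangle + \langle w|v\rangle + \langle w|w\rangle = \langle v|v\rangle + 2\operatorname{Re}\langle v|w\rangle + \langle w|w\rangle \]
\[ \le \|v\|^2 + 2|\langle v|w\rangle| + \|w\|^2 \le \|v\|^2 + 2\|v\|\cdot\|w\| + \|w\|^2 = (\|v\| + \|w\|)^2, \]
where the last inequality holds by Proposition 3.21. Now take the positive square root of each side. □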
Example 3.23 For $a < b$ from $\mathbb{R}$, the space $\mathcal{C}_{\mathbb{C}}[a,b]$ is the complex vector space of all continuous functions (see Appendix A) $f\colon [a,b] \to \mathbb{C}$ with pointwise addition and scalar multiplication, analogous to the real case (Example 3.4). As in the real case, $\mathcal{C}_{\mathbb{C}}[a,b]$ can be given an inner product,
\[ \langle f|g\rangle = \int_a^b \overline{f(x)}\,g(x)\,dx. \]
The axioms of linearity in the second argument and conjugate-symmetry are straightforward to verify; positive definiteness is proved just as in the real case, using $\overline{f(x)}f(x) = |f(x)|^2 \in \mathbb{R}$, and the fact that $|f(x)|^2$ defines a continuous real-valued function on $[a,b]$. The Cauchy-Schwarz inequality for this space says that whenever $f, g \in \mathcal{C}_{\mathbb{C}}[a,b]$ then $|\langle f|g\rangle| \le \|f\|\,\|g\|$, or
\[ \left|\int_a^b \overline{f(x)}\,g(x)\,dx\right| \le \left(\int_a^b |f(x)|^2\,dx\right)^{1/2}\left(\int_a^b |g(x)|^2\,dx\right)^{1/2}. \]
Example 3.24 Let $V$ be the set of all continuous functions $f\colon \mathbb{R} \to \mathbb{C}$ such that the integral
\[ \int_{-\infty}^{\infty} |f(x)|^2\,dx = \lim_{a\to\infty,\; b\to\infty} \int_{-a}^{b} |f(x)|^2\,dx \]
exists in $\mathbb{R}$. We show that $V$ becomes a vector space over $\mathbb{C}$ when addition and scalar multiplication are defined pointwise, as in the previous example. The most difficult part is in showing that $V$ is closed under addition, i.e. that if $f, g \in V$ then $f + g \in V$, in particular that the integral
\[ \int_{-\infty}^{\infty} |f(x) + g(x)|^2\,dx \]
is finite. For this, let $a, b > 0$ and note that
\[ |f(x) + g(x)|^2 = \overline{(f(x) + g(x))}\,(f(x) + g(x)) = \overline{f(x)}f(x) + \overline{f(x)}g(x) + \overline{g(x)}f(x) + \overline{g(x)}g(x) = |f(x)|^2 + 2\operatorname{Re}\bigl(\overline{f(x)}g(x)\bigr) + |g(x)|^2. \]
So $\int_{-a}^{b} |f(x) + g(x)|^2\,dx$ equals
\[ \int_{-a}^{b} |f(x)|^2\,dx + 2\int_{-a}^{b} \operatorname{Re}\bigl(\overline{f(x)}g(x)\bigr)\,dx + \int_{-a}^{b} |g(x)|^2\,dx \le \langle f|f\rangle_{[-a,b]} + 2\bigl|\langle f|g\rangle_{[-a,b]}\bigr| + \langle g|g\rangle_{[-a,b]}, \]
where $\langle\;|\;\rangle_{[-a,b]}$ denotes the inner product in $\mathcal{C}_{\mathbb{C}}[-a,b]$ of the previous example. Now put $r = \int_{-\infty}^{\infty} |f(x)|^2\,dx$ and $s = \int_{-\infty}^{\infty} |g(x)|^2\,dx$; note also that
\[ \bigl|\langle f|g\rangle_{[-a,b]}\bigr| \le \bigl|\langle f|f\rangle_{[-a,b]}\bigr|^{1/2}\,\bigl|\langle g|g\rangle_{[-a,b]}\bigr|^{1/2} \]
by Cauchy-Schwarz in $\mathcal{C}_{\mathbb{C}}[-a,b]$. Hence $\int_{-a}^{b} |f(x) + g(x)|^2\,dx$ is at most
\[ \langle f|f\rangle_{[-a,b]} + 2\bigl|\langle f|f\rangle_{[-a,b]}\bigr|^{1/2}\,\bigl|\langle g|g\rangle_{[-a,b]}\bigr|^{1/2} + \langle g|g\rangle_{[-a,b]} \le r + 2\sqrt{rs} + s. \]
This gives an upper bound on $I(a,b) = \int_{-a}^{b} |f(x) + g(x)|^2\,dx$ which is independent of $a, b$. Since $I(a,b)$ is nondecreasing as $a, b \to \infty$, this shows that $\lim_{a,b\to\infty} I(a,b) = \int_{-\infty}^{\infty} |f(x) + g(x)|^2\,dx$ exists, as required. To prove closure of $V$ under scalar multiplication, just note that for $\lambda \in \mathbb{C}$,
\[ \int_{-\infty}^{\infty} |\lambda f(x)|^2\,dx = |\lambda|^2\int_{-\infty}^{\infty} |f(x)|^2\,dx. \]
The vector space axioms are then verified in the usual way and all are now straightforward.
In fact,
\[ \langle f|g\rangle = \int_{-\infty}^{\infty} \overline{f(x)}\,g(x)\,dx \]
defines an inner product on $V$. Once again, the only difficult part here is to show that if $f, g \in V$ then the integral $\int_{-\infty}^{\infty} \overline{f(x)}\,g(x)\,dx$ exists in $\mathbb{C}$. This can be done by considering the real and imaginary parts of this integral separately, and using the Cauchy-Schwarz inequality in $\mathcal{C}_{\mathbb{C}}[-a,b]$, as before. The axioms for an inner product can be verified as for $\mathcal{C}_{\mathbb{C}}[a,b]$.
Exercises
Exercise 3.1 For each of the following pairs of vectors in $\mathbb{R}^3$, calculate the angle between them.
(a) $(1/\sqrt{2}, 1/\sqrt{2}, 0)^T$, $(0, 1, 0)^T$
(b) $(1, 1, 0)^T$, $(1, -1, 0)^T$
(c) $(1, 0, 1)^T$, $(-1, 1, 0)^T$.

Exercise 3.2 In each of the following cases compute the projection of $s$ in the direction of $r$. Then write down a nonzero vector $t$ in the space spanned by $s$ and $r$ orthogonal to $r$. Verify your answer by computing $\langle t|r\rangle$.
(a) $s = (5, 2, 1)^T$, $r = (-1, 2, 3)^T$
(b) $s = (1, 1, 1)^T$, $r = (-1, 0, 1)^T$.

Exercise 3.3 From equation (2), the value $r_1s_2 - r_2s_1 = \|r\|\,\|s\|\sin\theta$ is the two-dimensional determinant
\[ \begin{vmatrix} r_1 & r_2 \\ s_1 & s_2 \end{vmatrix}. \]
Give a geometrical interpretation of this determinant in terms of the area of the parallelogram drawn out by $r$ and $s$. Explain the significance of its sign in terms of the direction travelled as you go from $r$ to $s$.

Exercise 3.4 The triangle inequality is often used in one of the following alternative forms:
\[ \|v - w\| \le \|v\| + \|w\|, \]
\[ \|v + w\| \ge \|v\| - \|w\|, \]
\[ \|v - w\| \ge \|v\| - \|w\|. \]
Show that these forms follow from Proposition 3.11 by simple substitutions. (Remember that $\|{-w}\| = \|w\|$.)

Exercise 3.5 By expanding $\langle v + w|v + w\rangle$ and $\langle v - w|v - w\rangle$, prove that, for any vectors $v, w$ in a real inner product space,
(a) $\langle v|w\rangle = \frac{1}{2}\bigl(\|v + w\|^2 - \|v\|^2 - \|w\|^2\bigr)$,
(b) $\|v + w\|^2 + \|v - w\|^2 = 2\bigl(\|v\|^2 + \|w\|^2\bigr)$.

Exercise 3.6 Prove that in any complex inner product space
(a) $\langle v|w\rangle = \frac{1}{2}\bigl(\|v + w\|^2 + i\|v - iw\|^2 - (1 + i)(\|v\|^2 + \|w\|^2)\bigr)$,
(b) $\langle v|w\rangle = \frac{1}{4}\bigl(\|v + w\|^2 - \|v - w\|^2 + i\|v - iw\|^2 - i\|v + iw\|^2\bigr)$.

Exercise 3.7 In each of (a), (b), (c) below, determine whether $\langle\;|\;\rangle$ is an inner product on $V$. Justify your answers. In every case which is an inner product, write down the appropriate form of the Cauchy-Schwarz inequality.
(a) $V = \mathbb{R}^2$, $\langle v|w\rangle = (v_1 - w_1)^2 + (v_2 - w_2)^2$ where $v = (v_1, v_2)^T$ and $w = (w_1, w_2)^T$.
(b) $V = \mathbb{R}^3$, $\langle x|y\rangle = 2x_1y_1 + 3x_2y_2 + 2x_3y_3 + x_1y_3 + x_3y_1 - 2x_2y_3 - 2x_3y_2$, where $x = (x_1, x_2, x_3)^T$ and $y = (y_1, y_2, y_3)^T$.
(c) $V = \mathbb{R}^3$, $\langle x|y\rangle = x_1y_1 + 2x_2y_2 - 2x_1y_3 - 2x_3y_1 - x_2y_3 - x_3y_2$, where $x = (x_1, x_2, x_3)^T$ and $y = (y_1, y_2, y_3)^T$.

Exercise 3.8 Do the same as the last question for the following.
(a) $V$ is the set of all $n \times n$ matrices with entries from $\mathbb{R}$, and $\langle A|B\rangle = \operatorname{tr}(AB)$. (Recall: if $C$ is the matrix $(c_{ij})$, then $\operatorname{tr}(C)$ denotes the trace of $C$, defined by $\operatorname{tr}(C) = \sum_{i=1}^n c_{ii}$.)
(b) $V$ is the set of all $n \times n$ matrices with entries from $\mathbb{R}$, and $\langle A|B\rangle = \operatorname{tr}(A^TB)$.
(For one of them, try to write down an expression for $\langle A|A\rangle$ where $A = (a_{ij})$ in terms of the $a_{ij}$. It should remind you of $\mathbb{R}^{n^2}$. For the other, try to show that $\langle A|B\rangle$ is not positive definite.)

Exercise 3.9 Complete the proof that
\[ \langle f|g\rangle = \int_{-a}^{a} \overline{f(x)}\,g(x)\,dx \]
is an inner product on the complex vector space $\mathcal{C}_{\mathbb{C}}[-a,a]$ of Example 3.24.

Exercise 3.10 Let $V$ be the real vector space $\mathbb{R}^3$ with the standard inner product. Describe the set of vectors in $V$ which are orthogonal to $(-1, 0, 2)^T$. In particular, show that this is a subspace of $V$, find a basis for it, and hence deduce its dimension.
4 Bilinear and sesquilinear forms

In the last chapter we introduced the idea of an inner product on a vector space. Our main example was the standard inner product on $\mathbb{R}^3$ (and, more generally, on $\mathbb{R}^n$). This inner product has a particularly elegant geometric meaning, but we saw other important examples (involving vector spaces of continuous functions and integration, for example) where the geometric interpretation isn't so clear. It turns out that there are many other interesting forms defined like inner products which are not positive definite, some of which, like Minkowski space (Example 4.1), have significant physical interpretations. This chapter starts off the study of these forms, and in particular shows how they may be represented by matrices.
4.1 Bilinear forms
Example 4.1 (Minkowski space) This example is very important in the theory of special relativity. Take $V = \mathbb{R}^4$, with three 'space' coordinates $x, y, z$ and one 'time' coordinate $t$, and define a function $F\colon V \times V \to \mathbb{R}$ by
\[ F\bigl((x_1, y_1, z_1, t_1)^T, (x_2, y_2, z_2, t_2)^T\bigr) = x_1x_2 + y_1y_2 + z_1z_2 - c^2t_1t_2, \]
where $c$ is the speed of light. Then $F$ is symmetric,
\[ F\bigl((x_1, y_1, z_1, t_1)^T, (x_2, y_2, z_2, t_2)^T\bigr) = x_1x_2 + y_1y_2 + z_1z_2 - c^2t_1t_2 = F\bigl((x_2, y_2, z_2, t_2)^T, (x_1, y_1, z_1, t_1)^T\bigr), \]
and linear in both arguments, e.g.
\[ F\bigl(\lambda(x_1, y_1, z_1, t_1)^T + \mu(x_2, y_2, z_2, t_2)^T, (x_3, y_3, z_3, t_3)^T\bigr) \]
\[ = (\lambda x_1 + \mu x_2)x_3 + (\lambda y_1 + \mu y_2)y_3 + (\lambda z_1 + \mu z_2)z_3 - c^2(\lambda t_1 + \mu t_2)t_3 \]
\[ = \lambda(x_1x_3 + y_1y_3 + z_1z_3 - c^2t_1t_3) + \mu(x_2x_3 + y_2y_3 + z_2z_3 - c^2t_2t_3) \]
\[ = \lambda F\bigl((x_1, y_1, z_1, t_1)^T, (x_3, y_3, z_3, t_3)^T\bigr) + \mu F\bigl((x_2, y_2, z_2, t_2)^T, (x_3, y_3, z_3, t_3)^T\bigr), \]
but if we try to define a 'norm' by $\|v\|^2 = F(v, v)$ and a 'distance' by $d(v, w) = \|v - w\|$, then we run into problems since $F$ is not positive definite. In fact $F(v - w, v - w)$ can be positive, giving a so-called space-like distance
\[ d(v, w) = \sqrt{F(v - w, v - w)}, \]
negative, giving a time-like distance
\[ d(v, w) = \pm\sqrt{-F(v - w, v - w)}, \]
or even zero, as is for example $d\bigl((0, 0, 0, 0)^T, (c, 0, 0, 1)^T\bigr)$.
In this chapter we will generalize the idea of inner product by dropping the condition of being positive definite, and sometimes also the symmetry condition.
Definition 4.2 A bilinear form on a real vector space $V$ is a map $F$ from $V \times V$ to $\mathbb{R}$ which for all $u, v, w \in V$ and $\alpha, \beta \in \mathbb{R}$ satisfies
(a) $F(\alpha u + \beta v, w) = \alpha F(u, w) + \beta F(v, w)$, and
(b) $F(u, \alpha v + \beta w) = \alpha F(u, v) + \beta F(u, w)$.
Definition 4.3 A bilinear form on $V$ is symmetric if also
(c) $F(u, v) = F(v, u)$ for all $u, v \in V$.
Example 4.4 (The Lorentz plane) Take $V = \mathbb{R}^2$ and define the map $F$ from $V \times V$ to $\mathbb{R}$ by $F\bigl((a, b)^T, (c, d)^T\bigr) = ac - bd$. Then $F$ is a symmetric bilinear form on $V$ (as you should check), but is not an inner product since $F\bigl((0, 1)^T, (0, 1)^T\bigr) = -1$, and therefore property (c i) in Definition 3.1 of an inner product fails. We also have $F\bigl((1, 1)^T, (1, 1)^T\bigr) = 0$, so there is a nonzero vector whose 'norm' is zero, and property (c ii) fails also.
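The failure of positive definiteness in the Lorentz plane of Example 4.4 is quick to see in code. The sketch below uses the hypothetical function name `lorentz`; the vectors are the ones quoted in the example, plus two arbitrary ones for the symmetry check.

```python
def lorentz(v, w):
    """F((a, b)^T, (c, d)^T) = ac - bd, as in Example 4.4."""
    (a, b), (c, d) = v, w
    return a * c - b * d

# Symmetric: swapping the arguments does not change the value.
print(lorentz((2, 5), (3, -1)), lorentz((3, -1), (2, 5)))

# Not positive definite: F(v, v) can be negative ...
print(lorentz((0, 1), (0, 1)))
# ... or zero for a nonzero vector.
print(lorentz((1, 1), (1, 1)))
```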
The next example is really a whole family of examples. It is particularly important since it will turn out that all bilinear forms on finite dimensional real vector spaces can be regarded as essentially just one of these examples, in the same way that every finite dimensional real vector space is isomorphic to $\mathbb{R}^n$ for some $n$.
Example 4.5 Let $V$ be the real vector space $\mathbb{R}^n$, and suppose $A$ is an $n \times n$ matrix with entries from $\mathbb{R}$. Define an operation $F$ on $V \times V$ by
\[ F(x, y) = x^TAy. \]
Since $x^T$ is a $1 \times n$ matrix, $A$ is $n \times n$, and $y$ is $n \times 1$, this is well-defined matrix multiplication and also $F(x, y)$ is a $1 \times 1$ matrix, which we can consider the same as the real number which is its only entry. Thus $F\colon V \times V \to \mathbb{R}$ and it turns out that $F$ is a bilinear form on $V$. Furthermore, if $A$ is symmetric, i.e. if $A^T = A$, then $F$ is symmetric.
Proof To prove all these claims, we must check the two axioms for a bilinear form. We have
\[ F(\alpha x + \beta y, z) = (\alpha x + \beta y)^TAz = (\alpha x^T + \beta y^T)Az = \alpha x^TAz + \beta y^TAz = \alpha F(x, z) + \beta F(y, z), \]
and similarly,
\[ F(x, \alpha y + \beta z) = x^TA(\alpha y + \beta z) = x^TA(\alpha y) + x^TA(\beta z) = \alpha x^TAy + \beta x^TAz = \alpha F(x, y) + \beta F(x, z). \]
Now suppose $A$ is symmetric. Then since the transpose of a $1 \times 1$ matrix is itself, $F(x, y) = F(x, y)^T$. Using the fact that the transpose of a product of matrices is equal to the product of the transposes taken in the opposite order (Proposition 1.3), we have
\[ F(x, y) = F(x, y)^T = (x^TAy)^T = y^TA^T(x^T)^T = y^TAx = F(y, x), \]
since $A^T = A$ and $(x^T)^T = x$. □
Example 4.6 If $V = \mathbb{R}^n$ and $A$ is the $n \times n$ identity matrix then
\[ F\bigl((x_1, \dots, x_n)^T, (y_1, \dots, y_n)^T\bigr) = (x_1, \dots, x_n)A(y_1, \dots, y_n)^T = (x_1, \dots, x_n)(y_1, \dots, y_n)^T = x_1y_1 + \dots + x_ny_n, \]
so $F$ is the standard inner product on $V$.
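The family of forms $F(x, y) = x^TAy$ from Example 4.5 can be sketched with plain Python lists. The helper name `bilinear_form` and the particular matrices below are illustrative assumptions: the identity matrix recovers the standard inner product of Example 4.6, the diagonal matrix with entries $1, -1$ gives the Lorentz form of Example 4.4, and a non-symmetric matrix gives a non-symmetric form.

```python
def bilinear_form(A):
    """Return the form F(x, y) = x^T A y for a square matrix A given as a list of rows."""
    def F(x, y):
        return sum(x[i] * A[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    return F

identity = bilinear_form([[1, 0], [0, 1]])      # standard inner product on R^2
lorentz = bilinear_form([[1, 0], [0, -1]])      # F((a,b)^T, (c,d)^T) = ac - bd
lopsided = bilinear_form([[0, 1], [0, 0]])      # F(x, y) = x_1 * y_2, not symmetric

x, y, z = [1, 2], [3, -4], [5, 6]
alpha, beta = 2, -3

# Bilinearity in the first argument for the Lorentz form.
combo = [alpha * a + beta * b for a, b in zip(x, y)]
print(lorentz(combo, z), alpha * lorentz(x, z) + beta * lorentz(y, z))

# A non-symmetric matrix gives F(e1, e2) != F(e2, e1).
print(lopsided([1, 0], [0, 1]), lopsided([0, 1], [1, 0]))
```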
4.2 Representation by matrices

Suppose we have a real vector space $V$ with an ordered basis $e_1, \dots, e_n$, and a bilinear form $F$ defined on $V$. Given two vectors $v$ and $w$ we can write them as linear combinations of the basis vectors, say $v = \sum_{i=1}^n v_ie_i$ and $w = \sum_{j=1}^n w_je_j$. Then, using bilinearity,
\[ F(v, w) = F\Bigl(\sum_{i=1}^n v_ie_i, \sum_{j=1}^n w_je_j\Bigr) = \sum_{i=1}^n v_i F\Bigl(e_i, \sum_{j=1}^n w_je_j\Bigr) = \sum_{i=1}^n\sum_{j=1}^n v_iw_jF(e_i, e_j). \]
Thus if we know the values $F(e_i, e_j)$ the form $F$ takes on the basis elements, for all $i, j$, then we can work out the form on any pair of vectors. For convenience, we may put the values $F(e_i, e_j)$ on the basis vectors into a matrix.
Definition 4.7 The matrix $A = (a_{ij})$ defined by $a_{ij} = F(e_i, e_j)$ is called the matrix of the bilinear form $F$ with respect to the ordered basis $e_1, \dots, e_n$ of $V$.
Notice that if $F$ is a symmetric bilinear form, then $F(e_i, e_j) = F(e_j, e_i)$ for all $i, j$, so the matrix $A$ is symmetric in the sense that $a_{ij} = a_{ji}$ for all $i, j$, or in other words $A = A^T$.
Now, in $V$, any vector $v$ is a (unique) linear combination
\[ v = v_1e_1 + v_2e_2 + \dots + v_ne_n \]
of basis vectors. Thus with respect to the basis $e_1, e_2, \dots, e_n$, the vector $v$ is determined by its coordinate form, the column vector $(v_1, v_2, \dots, v_n)^T$. This representation can be used to give an elegant formula for the form $F$, just as in Example 4.5.
Proposition 4.8 Suppose $V$ is a real vector space with ordered basis $e_1, \dots, e_n$, and $F$ is a bilinear form defined on $V$, with matrix $A$ with respect to this basis. Then for any vectors $v, w \in V$ and their corresponding coordinate forms $\mathbf{v} = (v_1, \dots, v_n)^T$, $\mathbf{w} = (w_1, \dots, w_n)^T$ with respect to the same basis $e_1, e_2, \dots, e_n$ we have
\[ F(v, w) = \mathbf{v}^TA\mathbf{w}. \]
Proof We have
\[ \mathbf{v}^TA\mathbf{w} = \sum_{i=1}^n\sum_{j=1}^n v_iF(e_i, e_j)w_j = F(v, w) \]
by bilinearity. □
The conclusion is that every bilinear form on a finite dimensional real vector space is 'the same as' (more precisely, isomorphic to) one of the examples described in Example 4.5. Thus matrices can be applied to calculations in examples of inner products on vector spaces other than $\mathbb{R}^n$, as in the next example.
Example 4.9 Let $V$ be the three-dimensional vector space $\mathbb{R}_3[x]$ of all polynomials of degree less than 3, with the inner product
\[ \langle f|g\rangle = \int_0^1 f(x)g(x)\,dx \]
defined in Example 3.5. Then $1, x, x^2$ is an ordered basis of $V$, and the corresponding matrix of inner products is
\[ A = \begin{pmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{pmatrix}. \]
This matrix can be used to find $\langle x - 1|2x^2 + x\rangle$, for example. We write $x - 1 = (-1)\cdot 1 + 1\cdot x$ and $2x^2 + x = 1\cdot x + 2\cdot x^2$ in coordinate form with respect to this basis as $(-1, 1, 0)^T$ and $(0, 1, 2)^T$, and work out
\[ (-1, 1, 0)\,A\,(0, 1, 2)^T = -1/3. \]
If you like, you can check this by direct integration, as follows.
\[ \int_0^1 (x - 1)(2x^2 + x)\,dx = \int_0^1 (2x^3 - x^2 - x)\,dx = \frac{1}{2} - \frac{1}{3} - \frac{1}{2} = -\frac{1}{3}. \]
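The calculation in Example 4.9 can be reproduced exactly with rational arithmetic. This is a small sketch: the helper name `form` is an assumption, and the matrix entries use the fact that $\int_0^1 x^{i+j}\,dx = 1/(i + j + 1)$ for the basis $1, x, x^2$ (indexed from 0).

```python
from fractions import Fraction

# Matrix of the inner product on the basis 1, x, x^2:
# <x^i | x^j> = integral of x^(i+j) over [0, 1] = 1 / (i + j + 1).
A = [[Fraction(1, i + j + 1) for j in range(3)] for i in range(3)]

def form(v, A, w):
    """Compute v^T A w for coordinate lists v, w and a matrix A given as rows."""
    return sum(v[i] * A[i][j] * w[j]
               for i in range(len(v)) for j in range(len(w)))

p = [-1, 1, 0]   # coordinate form of x - 1
q = [0, 1, 2]    # coordinate form of 2x^2 + x
value = form(p, A, q)
print(value)     # agrees with the value -1/3 found in the example
```

Using `Fraction` rather than floating point keeps the result exact, which makes the comparison with the hand calculation unambiguous.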
Being able to represent bilinear forms as matrices is not the end of the story, however, as the next two examples show.
Example 4.10 If we take the standard inner product on $\mathbb{R}^2$, then with respect to the standard basis $(1, 0)^T, (0, 1)^T$ the matrix of this inner product is the identity matrix $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. If we choose a different basis, however, then we get a different matrix representing the inner product. For example, we calculate the matrix of the standard inner product with respect to the ordered basis $(1, 2)^T, (-1, 0)^T$. Here, $\langle (1, 2)^T|(1, 2)^T\rangle = 1^2 + 2^2 = 5$, $\langle (1, 2)^T|(-1, 0)^T\rangle = -1 + 0 = -1$, $\langle (-1, 0)^T|(1, 2)^T\rangle = -1$ by symmetry of $\langle\;|\;\rangle$, and $\langle (-1, 0)^T|(-1, 0)^T\rangle = 1$. So the representing matrix is $\begin{pmatrix} 5 & -1 \\ -1 & 1 \end{pmatrix}$.
Example 4.11 Take $V = \mathbb{R}^2$ and $F(x, y) = x_1y_1 + x_1y_2 + x_2y_1$ where $x = (x_1, x_2)^T$ and $y = (y_1, y_2)^T$. With respect to the ordered basis $e_1, e_2$ given by
we get the matrix B
=
(� i) , while with respect to the basis e� , e� , where _
( _3 -2) . 2 1 Notice that the form F and hence both the matrices B and B' are symmetric.
we get B' =
4.3 The base-change formula
Since we can choose all sorts of different bases, it is important to know what happens to the matrix of an inner product if we change the basis. Let $V$ be a real vector space with basis $e_1, \dots, e_n$ and bilinear form $F$. Suppose we choose a new ordered basis $f_1, \dots, f_n$ and write our new basis vectors in terms of the old ones, as
\[ f_i = \sum_{k=1}^n p_{ki}e_k, \]
say. Write $P$ for the matrix $(p_{ij})$, the base-change matrix.
Example 4.12 Let $V = \mathbb{R}^3$ and take the standard basis $e_1 = (1, 0, 0)^T$, $e_2 = (0, 1, 0)^T$, $e_3 = (0, 0, 1)^T$. If we now choose for a new basis $f_1 = (1, 0, 1)^T$, $f_2 = (2, -1, 0)^T$, $f_3 = (0, 1, -1)^T$ we have
\[ f_1 = 1e_1 + 0e_2 + 1e_3, \]
\[ f_2 = 2e_1 + (-1)e_2 + 0e_3, \]
\[ f_3 = 0e_1 + 1e_2 + (-1)e_3, \]
so the base-change matrix from basis $e_1, e_2, e_3$ to $f_1, f_2, f_3$ is
\[ P = \begin{pmatrix} 1 & 2 & 0 \\ 0 & -1 & 1 \\ 1 & 0 & -1 \end{pmatrix}. \]
In this example, the base-change matrix is the matrix formed from columns equal to the coordinate forms of the vectors $f_1, f_2, f_3$ with respect to the old basis $e_1, e_2, e_3$.
In general:
The base-change matrix is the matrix $P$ formed from columns giving the coordinate forms of the new ordered basis $f_1, f_2, \dots, f_n$ with respect to the old ordered basis $e_1, e_2, \dots, e_n$.
To understand the importance of this matrix in calculations, consider a vector $v \in V$ written in coordinate form as $(v_1, v_2, \dots, v_n)^T$ with respect to the new basis $f_1, f_2, \dots, f_n$. We have $v = \sum_{i=1}^n v_if_i$, so
\[ P(v_1, v_2, \dots, v_n)^T = \sum_{i=1}^n v_i\,P(0, \dots, 0, 1, 0, \dots, 0)^T. \]
Now $P(0, \dots, 0, 1, 0, \dots, 0)^T$ (where the 1 is in the $i$th position) is the column vector formed from the $i$th column of $P$, i.e. the coordinate form of $f_i$ with respect to the old basis $e_1, e_2, \dots, e_n$. Therefore $P(v_1, v_2, \dots, v_n)^T$ is the coordinate form of $v$ with respect to $e_1, e_2, \dots, e_n$. In other words:
Multiplication of a column vector by the base-change matrix $P$ converts coordinate forms from the new ordered basis $f_1, f_2, \dots, f_n$ to the old ordered basis $e_1, e_2, \dots, e_n$.
Also, it turns out that the inverse $P^{-1}$ of the base-change matrix always exists, and:
Multiplication of a column vector by the matrix $P^{-1}$ converts coordinate forms from the old ordered basis $e_1, e_2, \dots, e_n$ to the new ordered basis $f_1, f_2, \dots, f_n$.
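The two conversion rules can be tried on the bases of Example 4.12. The following sketch builds $P$ from the new basis vectors and converts coordinates in both directions; the helper names `base_change_matrix`, `mat_vec`, and `solve` are assumptions, and the old-to-new conversion is done by solving $Px = b$ with Gauss-Jordan elimination over the rationals rather than by forming $P^{-1}$ explicitly.

```python
from fractions import Fraction

def base_change_matrix(new_basis):
    """Matrix whose columns are the coordinate forms of the new basis vectors."""
    n = len(new_basis)
    return [[new_basis[j][i] for j in range(n)] for i in range(n)]

def mat_vec(M, v):
    """Matrix times column vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination (M is assumed invertible)."""
    n = len(M)
    aug = [[Fraction(M[i][j]) for j in range(n)] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [entry / pivot_value for entry in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

f1, f2, f3 = (1, 0, 1), (2, -1, 0), (0, 1, -1)
P = base_change_matrix([f1, f2, f3])

new_coords = [1, 1, 1]               # the vector f1 + f2 + f3 in the new basis
old_coords = mat_vec(P, new_coords)  # new -> old: multiply by P
back = solve(P, old_coords)          # old -> new: apply P^{-1}
print(P, old_coords, back)
```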
Example 4 . 1 3 I n the complex vector space C2 , take ordered bases a1, a2 and b1, b2 where Then the base-change matrices from the usual basis
and from the usual basis (1,
O)T, (0, 1)T to b1, b2 is = ( -ii 1 ) Q
1
.
( 1 , O)r,
(0, 1)T to a1, a2 is
a1, a2 b1, b2 a1 , a2 , b 1 , b2 , ) - 1 (� -1 ) .
This can be used to find the base-change matrix from as follows. to If is the coordinate form of a vector v with respect to then is the coordinate form of v with respect to the usual basis. Then is the coordinate form of v with respect to so the base Q l change matrix from to is
, T P (v1 , v(v12 ) Tv2 ) P (v1 , v2 ) T -
q- I p
a 1 , a2 b1, b2 = ( i _ 11 ) - l (� = 1 (i1 -- 1i -i 1) = 21 ( -1 1 -i i -z
2i
z
2i -
+
-. 1 z
2+i
z
The base-change matrix also enables us to convert between the matrices for bilinear forms. Given a bilinear form $F$ with matrix $A$ with respect to $e_1, \dots, e_n$,
\[ F(f_i, f_j) = F\Bigl(\sum_{k=1}^n p_{ki}e_k, \sum_{l=1}^n p_{lj}e_l\Bigr) = \sum_{k=1}^n\sum_{l=1}^n p_{ki}p_{lj}F(e_k, e_l) = \sum_{k=1}^n p_{ki}\sum_{l=1}^n F(e_k, e_l)p_{lj}. \]
Now, $p_{ki}$ is the $(i, k)$th entry of $P^T$, so this expression is equal to the $(i, j)$th entry of $P^TAP$. We have proved the following.
Proposition 4.14 (The base-change formula) Given two ordered bases of a Euclidean space $V$, $e_1, \dots, e_n$ and $f_1, \dots, f_n$, related by the base-change matrix $P$ from basis $e_1, \dots, e_n$ to $f_1, \dots, f_n$, suppose $A$ and $B$ are the matrices of the inner product with respect to $e_1, \dots, e_n$ and $f_1, \dots, f_n$. Then
\[ B = P^TAP. \]
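As a quick check of the base-change formula, the sketch below revisits Example 4.10: the standard inner product on $\mathbb{R}^2$ has matrix $A = I$ in the standard basis, and the new basis is $(1, 2)^T, (-1, 0)^T$. The helper names `transpose`, `mat_mul`, and `dot` are illustrative assumptions. The matrix $P^TAP$ is compared with the matrix obtained by computing the inner products of the new basis vectors directly.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def mat_mul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

A = [[1, 0], [0, 1]]                      # standard inner product, standard basis
f1, f2 = (1, 2), (-1, 0)                  # new ordered basis from Example 4.10
P = [[f1[0], f2[0]], [f1[1], f2[1]]]      # columns are f1 and f2

B = mat_mul(mat_mul(transpose(P), A), P)  # base-change formula B = P^T A P
B_direct = [[dot(fi, fj) for fj in (f1, f2)] for fi in (f1, f2)]
print(B, B_direct)
```

Both computations give the representing matrix found in Example 4.10.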
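Example 4.15 Take the standard inner product on $\mathbb{R}^2$, with respect to the two bases $e_1, e_2$ and $f_1, f_2$ where
\[ e_1 = (2, 1)^T, \quad e_2 = (1, -1)^T, \quad f_1 = (1, -2)^T, \quad f_2 = (0, 1)^T. \]
Then with the notation of Proposition 4.14 we have
\[ A = \begin{pmatrix} 5 & 1 \\ 1 & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 5 & -2 \\ -2 & 1 \end{pmatrix}. \]
The base-change matrix $P$ can be calculated as
\[ P = \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix}^{-1}\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} -1 & 1 \\ 5 & -2 \end{pmatrix}. \]
(Alternatively, we can calculate $P$ from first principles, by writing $P = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, so $f_1 = ae_1 + ce_2$, giving $2a + c = 1$ and $a - c = -2$, which can be solved to give $a = -\frac{1}{3}$ and hence $c = \frac{5}{3}$. Similarly we find $b = \frac{1}{3}$ and $d = -\frac{2}{3}$.)
We can check that
\[ P^TAP = \frac{1}{9}\begin{pmatrix} -1 & 5 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} 5 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} -1 & 1 \\ 5 & -2 \end{pmatrix} = \frac{1}{9}\begin{pmatrix} -1 & 5 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} 0 & 3 \\ 9 & -3 \end{pmatrix} = \frac{1}{9}\begin{pmatrix} 45 & -18 \\ -18 & 9 \end{pmatrix} = \begin{pmatrix} 5 & -2 \\ -2 & 1 \end{pmatrix} = B. \]

4.4 Sesquilinear forms over $\mathbb{C}$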
So far in this chapter, everything has been for vector spaces over the field of real numbers $\mathbb{R}$. If we replaced $\mathbb{R}$ throughout by any other field $F$, e.g. the field of complex numbers, everything would remain valid, and we would have the beginnings of the theory of bilinear forms over the field $F$. In particular, a bilinear form on a finite dimensional vector space over $F$ is represented by a matrix $A$ with entries from $F$ and is isomorphic to one of the canonical examples $F(v, w) = v^TAw$ on $F^n$. But we are also interested in inner products over $\mathbb{C}$, and these forms are not bilinear, but are instead sesquilinear, so the previous sections do not apply. This last section concerns matrix representations of such forms over $\mathbb{C}$, and generalizes the notion of bilinear form in a different way, using the complex conjugate operation in $\mathbb{C}$.
Definition 4.16 A sesquilinear form on a complex vector space $V$ is a map $F$ from $V \times V$ to $\mathbb{C}$ which for all $u, v, w \in V$ and $\alpha, \beta \in \mathbb{C}$ satisfies
(a) $F(\alpha u + \beta v, w) = \bar{\alpha}F(u, w) + \bar{\beta}F(v, w)$, and
(b) $F(u, \alpha v + \beta w) = \alpha F(u, v) + \beta F(u, w)$.
Definition 4.17 A sesquilinear form $F\colon V \times V \to \mathbb{C}$ is conjugate-symmetric if also
(c) $F(u, v) = \overline{F(v, u)}$ for all $u, v \in V$.
As examples, note that any complex inner product (as in the last chapter) is a conjugate-symmetric sesquilinear form. Everything we have done here for bilinear forms on vector spaces over $\mathbb{R}$ applies equally well to sesquilinear forms on vector spaces over $\mathbb{C}$, provided complex-conjugate signs are added in the appropriate place. First, we have a family of 'canonical examples' exactly analogous to Example 4.5.
Example 4.18 Let $V$ be the complex vector space $\mathbb{C}^n$, and suppose $A$ is an $n \times n$ matrix with entries from $\mathbb{C}$. Define $F$ on $V \times V$ by
\[ F(x, y) = \bar{x}^TAy, \]
where $\bar{x}$ denotes the complex conjugate of the column vector $x$, formed by taking the complex conjugate of each entry. As before, $F(x, y)$ is a $1 \times 1$ matrix, which we consider as the same as the complex number which is its only entry. Thus $F\colon V \times V \to \mathbb{C}$ and $F$ is a sesquilinear form on $V$. Also, if $A$ is conjugate-symmetric, i.e. if $\bar{A}^T = A$, then $F$ is conjugate-symmetric.
Proof This is just the same as before with a few complex conjugates added.
\[ F(\alpha x + \beta y, z) = \overline{(\alpha x + \beta y)}^TAz = (\bar{\alpha}\bar{x}^T + \bar{\beta}\bar{y}^T)Az = \bar{\alpha}\bar{x}^TAz + \bar{\beta}\bar{y}^TAz = \bar{\alpha}F(x, z) + \bar{\beta}F(y, z), \]
and
\[ F(x, \alpha y + \beta z) = \bar{x}^TA(\alpha y + \beta z) = \bar{x}^TA(\alpha y) + \bar{x}^TA(\beta z) = \alpha\bar{x}^TAy + \beta\bar{x}^TAz = \alpha F(x, y) + \beta F(x, z). \]
Now suppose $A$ is conjugate-symmetric. Then $F(x, y) = F(x, y)^T$ and we have
\[ F(x, y) = (\bar{x}^TAy)^T = y^TA^T\bar{x} = \overline{\bar{y}^T\bar{A}^Tx} = \overline{\bar{y}^TAx} = \overline{F(y, x)}, \]
similar to the real case. □
Definition 4.19 Given an ordered basis $e_1, \dots, e_n$ of a complex inner product space $V$, and a sesquilinear form $F$ on $V$, the matrix of the form $F$ with respect to this ordered basis is the matrix $B$ whose $(i, j)$th entry is $b_{ij} = F(e_i, e_j)$.
If $F$ is conjugate-symmetric, $b_{ji} = F(e_j, e_i) = \overline{F(e_i, e_j)} = \overline{b_{ij}}$, so $B$ is conjugate-symmetric. Notice that a diagonal entry $b_{ii}$ of a conjugate-symmetric matrix $B$ is always real, since $b_{ii} = \overline{b_{ii}}$.
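The canonical sesquilinear form $F(x, y) = \bar{x}^TAy$ of Example 4.18 can be sketched with Python's complex numbers. The helper name `sesquilinear_form` and the particular conjugate-symmetric matrix and vectors below are illustrative assumptions, chosen with small integer parts so that the arithmetic is exact.

```python
def sesquilinear_form(A):
    """Return F(x, y) = conj(x)^T A y for a square complex matrix A (list of rows)."""
    def F(x, y):
        return sum(x[i].conjugate() * A[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    return F

# A conjugate-symmetric matrix: equal to its own conjugate transpose,
# so in particular its diagonal entries are real.
A = [[2, 1 + 1j], [1 - 1j, 3]]
F = sesquilinear_form(A)

x = [1j, 1]
y = [1, 1 - 1j]
lam = 2 + 1j

# Conjugate-symmetry: F(x, y) is the conjugate of F(y, x).
print(F(x, y), F(y, x))
# Conjugate-linear in the first argument, linear in the second.
print(F([lam * a for a in x], y), lam.conjugate() * F(x, y))
print(F(x, [lam * b for b in y]), lam * F(x, y))
# F(x, x) is real when A is conjugate-symmetric.
print(F(x, x))
```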
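Proposition 4.20 With the same notation, if
\[ v = \sum_{i=1}^n v_i e_i \quad\text{and}\quad w = \sum_{j=1}^n w_j e_j, \]
then
\[ F(v, w) = \sum_{i=1}^n \sum_{j=1}^n \overline{v_i}\, F(e_i, e_j)\, w_j, \]
which we can interpret in matrix terms as $\overline{v}^T B w$, where $v$ and $w$ now stand for the column vectors of coordinates.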
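The matrix description of a sesquilinear form can be tried out numerically. The following is a minimal sketch only, assuming the convention that the form is conjugate-linear in its first argument (as in the Fourier expansion formulas of Chapter 5); the function names and the particular matrix and vectors are hypothetical choices made for illustration.

```python
def form(A, x, y):
    """Sesquilinear form F(x, y) = conj(x)^T A y (conjugate-linear in x)."""
    n = len(A)
    return sum(x[i].conjugate() * A[i][j] * y[j]
               for i in range(n) for j in range(n))


def is_conjugate_symmetric(A):
    """True if A equals its own conjugate transpose."""
    n = len(A)
    return all(A[i][j] == A[j][i].conjugate()
               for i in range(n) for j in range(n))


# Hypothetical conjugate-symmetric matrix and test vectors.
A = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]
x = [1 + 2j, -1j]
y = [3 + 0j, 1 + 1j]
alpha = 2 + 1j

F_xy = form(A, x, y)
F_yx = form(A, y, x)
F_xx = form(A, x, x)
F_ax_y = form(A, [alpha * c for c in x], y)

print(is_conjugate_symmetric(A))            # A is conjugate-symmetric
print(F_xy == F_yx.conjugate())             # so F(x, y) = conj(F(y, x))
print(F_xx.imag == 0)                       # and F(x, x) is real
print(F_ax_y == alpha.conjugate() * F_xy)   # conjugate-linearity in x
```

All entries here are small Gaussian integers, so the floating-point comparisons are exact; with general complex entries a tolerance would be needed.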
Proposition 4.21 (The base-change formula) With the same notation, if we change to the new basis $f_1, \dots, f_n$, given in terms of the old basis $e_1, \dots, e_n$ by $f_i = \sum_{k=1}^n p_{ki} e_k$, then
\[ F(f_i, f_j) = \sum_{k=1}^n \sum_{l=1}^n \overline{p_{ki}}\, F(e_k, e_l)\, p_{lj}, \]
which is the $(i, j)$th entry of the matrix $\overline{P}^T B P$.
Exercises
Exercise 4.1 Calculate the base-change matrix $P$ for change of basis from $e_1, e_2$ to $e_1', e_2'$ in Example 4.11, and verify the formula $B' = P^T B P$ in this case.

Exercise 4.2 Which of the functions $F((x_1, x_2, x_3)^T, (y_1, y_2, y_3)^T)$ here are inner products on $\mathbb{R}^3$?
(a) $x_1 + y_1 + x_2 y_2 + x_3 y_3$. [Hint: what could the matrix representation be?]
(b) $3x_1 y_1 + 5x_2 y_2 + 4x_3 y_3 + 2x_1 y_2 + 2x_2 y_1 + 3x_2 y_3 + 3y_2 x_3 - x_1 y_3 - x_3 y_1$. [Consider $2(x_1 + x_2)^2 + 3(x_2 + x_3)^2 + (x_1 - x_3)^2$.]
(c) $3x_1 y_1 + 5x_2 y_2 + 4x_3 y_3 + 2x_1 y_2 + 2x_2 y_1 - 3x_2 y_3 - 3y_2 x_3 - x_1 y_3 - x_3 y_1$. [Consider $2(x_1 + x_2)^2 + 3(x_2 - x_3)^2 + (x_1 - x_3)^2$.]

Exercise 4.3 An inner product,
JR2 er ( 1 , O)T, e 2 (0, l)T. (b) Calculate the matrix representing ( I ) with respect to the basis fr (1/2, o)r , f2 = ( - 1/2, 1) r . (c) Calculate matrices representing the bilinear form
is defined on (a) Calculate the matrix A representing ( I ) with respect to the standard basis •
=
=
=
with respect to e1 , e 2 and
Exercise 4.4 Let
f1 , f2 . Is F an inner product?
Perform row operations on A of the form Pi >.. E IR, to write A in the form
:=
Pi
+ >..pi ,
1
:s; j < i :s;
3 and
( ;) G ;) " b d
0 0
Compute a matrix P with
0
PA �
b d 0
Why is P invertible? Now calculate PAPT . What does this matrix represent? Exercise 4.5 A sesquilinear form on C3 is defined by F ((x, y, z ) ' , (x' , y ' , z') T ) ) � (X, ji, Z)
(-I;
2i
- 1 - 2i 0 -z
Compute the matrix of F with respect to: (a) the basis (i, l , O) T , ( 1 , 1 + i, l ) T , (O, O, i )T; (b) the basis ( - 1 - 3i , 2 + i, - 1 + 2i )r , (2 + i, 1 - 2i, 2 + i)r, (0, 0 , -i)T. Is F an inner product? Exercise 4.6 Let F be a sesquilinear form on the complex vector space C3 and e 1 , e 2 , e3 is the usual ordered basis oH:::3 . Suppose that F (f1 , f1 ) 1 , F ( f2 , f2 ) = 2, F(f3 , f3 ) = 2, and F(fi , fj ) = 0 for i ::/=- j , where fl = ( 1 , i, O) r , f2 = (0, i , 1 , fl = (0 , 0, 1 + i) r .
f
=
Calculate the matrix of F with respect to the basis e1 , e 2 , e3 . Is F an inner product?
Exercise 4.7 Write down the matrices $A$, $B$, $C$ (respectively) of the standard inner product on $\mathbb{R}^3$ with respect to:
(a) the standard basis;
(b) the ordered basis $(2, 1, 1)^T$, $(1, 2, 0)^T$, $(0, 1, -1)^T$;
(c) the ordered basis $(-1, 1, 1)^T$, $(3, 0, 0)^T$, $(1, 2, -1)^T$.
Also write down base-change matrices $P$ and $Q$ (respectively) for:
(d) base change from (a) to (b);
(e) base change from (b) to (c).
Verify that $B = P^T A P$ and $C = Q^T B Q$ both hold.

Exercise 4.8 An alternating form $F$ is a bilinear form on a vector space $V$ satisfying $F(v, v) = 0$ for all $v \in V$.
(a) Show by expanding $F(v + w, v + w)$ that if $F$ is an alternating form on a vector space $V$ then $F$ is skew-symmetric, i.e. $F(v, w) = -F(w, v)$ for all $v$ and $w$ in $V$.
(b) Clearly the zero function is an alternating form; give another example. [Hint: find a suitable matrix $A$.]
5 Orthogonal bases

One important use of the standard inner product of $\mathbb{R}^3$ is that it allows us to distinguish between 'nice' bases such as the usual one where the basis vectors are orthogonal to each other and each has length 1, and others such as $(1, 0, 0)^T$, $(1, 1, 0)^T$, $(1, 1, 1)^T$ where this is not true. The main objective of this chapter is to show how to find such nice (or, as we shall call them, orthonormal) bases for a given finite dimensional vector space with an arbitrary inner product, and to study the properties of these orthonormal bases. This chapter concerns inner products on vector spaces, and all our vector spaces are over $\mathbb{R}$ or $\mathbb{C}$.

5.1 Orthonormal bases

In a vector space $V$ over $\mathbb{R}$, any two bases look the 'same'. Indeed, by Theorem 2.33 there is an isomorphism of real vector spaces taking one basis onto the other. In inner product spaces this is no longer true; certain bases turn out to be easier to calculate with than others, and isomorphisms of vector spaces do not necessarily preserve inner products.
Example 5.1 Let $V = \mathbb{R}^3$ with the standard inner product $((x_1, x_2, x_3)^T \mid (y_1, y_2, y_3)^T) = x_1 y_1 + x_2 y_2 + x_3 y_3$ and let $e_1 = (1, 0, 0)^T$, $e_2 = (0, 1, 0)^T$, $e_3 = (0, 0, 1)^T$ be the vectors of the standard basis. These basis vectors have important properties not true for all bases. For example,
\[ (e_1 \mid e_2) = 0, \quad (e_1 \mid e_3) = 0, \quad (e_2 \mid e_3) = 0. \]
So $(e_i \mid e_j) = 0$ whenever $i \neq j$. This property is referred to by saying that $\{e_1, e_2, e_3\}$ is an orthogonal set. Not all bases are orthogonal; for example, $f_1 = (1, 0, 0)^T$, $f_2 = (1, 1, 0)^T$, $f_3 = (1, 1, 1)^T$ form a basis which is not orthogonal, since $(f_1 \mid f_2) = (f_1 \mid f_3) = 1$, and $(f_2 \mid f_3) = 2$. Moreover,
\[ (e_1 \mid e_1) = (e_2 \mid e_2) = (e_3 \mid e_3) = 1, \]
i.e. $(e_i \mid e_i) = 1$ for all $i$. Orthogonal bases with this extra property are said to be orthonormal. Again, not all bases have this property. For example, $g_1 = (1, 1, 1)^T$, $g_2 = (1, -1, 0)^T$, and $g_3 = (1/2, 1/2, -1)^T$ define an orthogonal basis which is not orthonormal, as $(g_1 \mid g_1) = 3$, $(g_2 \mid g_2) = 2$, and $(g_3 \mid g_3) = 3/2$.

One familiar fact about the standard basis $e_1, e_2, e_3$ is that any vector $v = (v_1, v_2, v_3)^T \in \mathbb{R}^3$ is equal to the sum of its projections along the axes. That is,
\[ v = (e_1 \mid v)e_1 + (e_2 \mid v)e_2 + (e_3 \mid v)e_3, \]
and the scalar coefficients $(e_i \mid v)$ are easy to compute. Compare this with the much harder task of calculating the (unique!) scalars $\lambda_1$, $\lambda_2$, and $\lambda_3$ such that
\[ v = \lambda_1 f_1 + \lambda_2 f_2 + \lambda_3 f_3, \]
i.e. of computing the coordinates of $v$ with respect to the non-orthogonal basis $f_1, f_2, f_3$.

Our object here is to show how such orthonormal bases can be found in an arbitrary inner product space, and to describe some of their properties and applications.
Definition 5.2 Two vectors $v$ and $w$ in an inner product space $V$ (over $\mathbb{R}$ or $\mathbb{C}$) are said to be orthogonal if $(v \mid w) = 0$. The set of vectors $\{v_1, v_2, \dots, v_i, \dots\}$ is said to be orthogonal, and the individual vectors $v_1, v_2, \dots, v_i, \dots$ in the set are said to be mutually orthogonal, if each pair of distinct vectors $v_i, v_j$ ($i \neq j$) in the set is an orthogonal pair.

A consequence of this definition is that if one of $v$ or $w$ is zero then $v, w$ are orthogonal: if $v$ is the zero vector then $0v = 0 = v$, so $(v \mid w) = (0v \mid w) = 0(v \mid w) = 0$, and similarly for $w$. This special case is somewhat uninteresting, and we shall usually specifically exclude it in the statements of our theorems.

One very important fact about orthogonal vectors is that provided they are all nonzero then they are linearly independent. This is true for any number of orthogonal vectors.
Proposition 5.3 If $V$ is an inner product space over $\mathbb{R}$ or $\mathbb{C}$, $v_1, \dots, v_n \in V$, $v_i \neq 0$ for all $i$, and the $v_i$ are mutually orthogonal, then $\{v_1, \dots, v_n\}$ is a linearly independent set.

Proof Suppose $v_1, \dots, v_n$ satisfy $\sum_{i=1}^n \lambda_i v_i = 0$; we need to show that each $\lambda_i$ is zero. We can take the inner product with each vector $v_j$ in turn to get
\[ 0 = (v_j \mid 0) = \Bigl(v_j \Bigm| \sum_{i=1}^n \lambda_i v_i\Bigr) = \sum_{i=1}^n \lambda_i (v_j \mid v_i) = \lambda_j (v_j \mid v_j), \]
since $(v_i \mid v_j) = 0$ whenever $i \neq j$. But $v_j \neq 0$, so $(v_j \mid v_j) \neq 0$ since the inner product is positive definite, and so $\lambda_j = 0$ for each $j$. Therefore $\{v_1, \dots, v_n\}$ is linearly independent. □
Given a set $\{v_1, \dots, v_n\}$ of mutually orthogonal nonzero vectors, we can normalize them so that they all have norm 1 (in other words, unit length), by defining
\[ w_i = \frac{v_i}{\|v_i\|}. \tag{1} \]
The resulting vectors $w_1, \dots, w_n$ are then orthonormal, according to the next definition.

Definition 5.4 A set $\{w_1, \dots, w_n\}$ of vectors in an inner product space $V$ is said to be orthonormal if $(w_i \mid w_j) = 0$ whenever $i \neq j$ and also $(w_i \mid w_i) = 1$ for each $i$. If an orthonormal set is also a basis of $V$, it is called an orthonormal basis of $V$.

5.2 The Gram-Schmidt process
Another way of thinking of orthonormal bases is to consider those bases of a vector space $V$ for which the matrix representing the inner product is as simple as possible. If we have an orthogonal basis $\{v_1, v_2, \dots, v_n\}$ for an inner product space $V$, then the matrix of the inner product with respect to this basis is diagonal, since the entries off the diagonal are $(v_i \mid v_j) = 0$ for $i \neq j$. If we have an orthonormal basis, then the matrix for the inner product is the identity matrix since the diagonal entries are $(v_i \mid v_i) = 1$.

In fact, for any finite dimensional inner product space, we can always find an orthonormal basis. It is usually easiest to describe (and carry out) the method in two steps: first we find an orthogonal basis, and then we normalize the basis vectors to get an orthonormal basis as in (1). The key step in the method is the following lemma, which is illustrated in Figure 5.1 for the case $V = \mathbb{R}^2$ with the standard inner product. The idea of this lemma is just as for $\mathbb{R}^2$: given two vectors $u, v$, form $w$ by taking away the 'projection' of $u$ onto the direction of $v$.

Fig. 5.1 Orthogonal projections of a vector: $x = \dfrac{v \cdot u}{v \cdot v}\, v$ is the projection of $u$ onto $v$, and $w = u - x$ is orthogonal to $v$.
Lemma 5.5 If $u, v$ are any two vectors in an inner product space $V$ with $v \neq 0$, then the vector
\[ w = u - \frac{(v \mid u)}{(v \mid v)}\, v \]
is orthogonal to $v$.

Proof First, $(v \mid v) \neq 0$ since $v \neq 0$, so $u - \bigl((v \mid u)/(v \mid v)\bigr)v$ is defined, and
\[ (v \mid w) = \Bigl(v \Bigm| u - \frac{(v \mid u)}{(v \mid v)}\, v\Bigr) = (v \mid u) - \frac{(v \mid u)}{(v \mid v)}\,(v \mid v) = 0 \]
as required. □
More generally, we can make the vector $u$ orthogonal to as many vectors as we like, provided these vectors are orthogonal to each other.

Lemma 5.6 If $V$ is an inner product space, $u, v_1, \dots, v_k \in V$, and $v_1, \dots, v_k$ are mutually orthogonal nonzero vectors, then the vector
\[ w = u - \sum_{i=1}^k \frac{(v_i \mid u)}{(v_i \mid v_i)}\, v_i \]
is orthogonal to $v_1, \dots, v_k$.

Proof Note that we must have the condition $v_i \neq 0$ in order to be able to divide by $(v_i \mid v_i)$ in the formula for $w$. For each $j$ we have
\[ (v_j \mid w) = (v_j \mid u) - \sum_{i=1}^k \frac{(v_i \mid u)}{(v_i \mid v_i)}\,(v_j \mid v_i) = (v_j \mid u) - \frac{(v_j \mid u)}{(v_j \mid v_j)}\,(v_j \mid v_j) = 0, \]
as claimed. □
To see how the process of producing an orthogonal basis works, let us first consider a small example. We take the standard inner product on $\mathbb{R}^3$, and pretend we do not already know an orthonormal basis.

Example 5.7 Let $V$ be the vector space $\mathbb{R}^3$ with the standard inner product, and take the ordered basis $v_1, v_2, v_3$ where $v_1 = (0, 1, 1)^T$, $v_2 = (0, 2, 0)^T$, and $v_3 = (-1, 1, 0)^T$. Take our first vector of the new basis to be $w_1 = v_1 = (0, 1, 1)^T$. The second vector $w_2$ has to be orthogonal to $w_1 = v_1$, so using Lemma 5.5 we can take
\[ w_2 = v_2 - \frac{(w_1 \mid v_2)}{(w_1 \mid w_1)}\, w_1 = (0, 2, 0)^T - \tfrac{2}{2}(0, 1, 1)^T = (0, 1, -1)^T. \]
The third vector $w_3$ has to be orthogonal to both $w_1$ and $w_2$. Using Lemma 5.6 we take
\[ w_3 = v_3 - \frac{(w_1 \mid v_3)}{(w_1 \mid w_1)}\, w_1 - \frac{(w_2 \mid v_3)}{(w_2 \mid w_2)}\, w_2 = (-1, 1, 0)^T - \tfrac{1}{2}(0, 1, 1)^T - \tfrac{1}{2}(0, 1, -1)^T = (-1, 0, 0)^T. \]
Thus $\{w_1, w_2, w_3\}$ is an orthogonal basis of $V$ (as you should check). We can normalize the vectors to get an orthonormal basis
\[ \left\{ \begin{pmatrix} 0 \\ 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix},\ \begin{pmatrix} 0 \\ 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix},\ \begin{pmatrix} -1 \\ 0 \\ 0 \end{pmatrix} \right\}. \]

In fact what we have just done is a general method, called Gram-Schmidt orthonormalization. It can be stated either as an algorithm, or as a theorem. What's more, it works for inner product spaces over either the real or the complex numbers, with exactly the same proof.
Theorem 5.8 (The Gram-Schmidt process) If $\{v_1, \dots, v_n\}$ is a basis of a finite dimensional inner product space $V$, then $\{w_1, \dots, w_n\}$ obtained by
\[ w_1 = v_1, \qquad w_{k+1} = v_{k+1} - \sum_{i=1}^{k} \frac{(w_i \mid v_{k+1})}{(w_i \mid w_i)}\, w_i \quad (1 \leq k < n) \]
is an orthogonal basis of $V$.

Proof We prove this theorem by induction, using the following induction hypothesis $\mathcal{H}(k)$: $\{w_1, \dots, w_k\}$ is an orthogonal set of nonzero vectors in the space spanned by $\{v_1, \dots, v_k\}$.

For $k = 1$, $w_1 = v_1 \neq 0$, so $\mathcal{H}(1)$ is clear. Now suppose $n > k \geq 1$ and $\mathcal{H}(k)$ holds. Then by induction the vectors $w_1, \dots, w_k$ are all nonzero and mutually orthogonal, so, by Lemma 5.6, $w_{k+1}$ is orthogonal to all the vectors $w_1, \dots, w_k$. Moreover, $w_{k+1}$ is nonzero, for otherwise
\[ w_{k+1} = v_{k+1} - \sum_{i=1}^k \frac{(w_i \mid v_{k+1})}{(w_i \mid w_i)}\, w_i = 0 \]
gives
\[ v_{k+1} = \sum_{i=1}^k \lambda_i w_i, \]
where $\lambda_i = (w_i \mid v_{k+1})/(w_i \mid w_i)$, so $v_{k+1}$ is in the subspace spanned by $w_1, \dots, w_k$. But this is impossible since, by the induction hypothesis $\mathcal{H}(k)$, $w_1, \dots, w_k$ are in the space spanned by $v_1, \dots, v_k$, so $v_{k+1}$ is in this space also; hence there are scalars $\mu_i$ with
\[ v_{k+1} = \sum_{i=1}^k \mu_i v_i, \]
so $\{v_1, \dots, v_k, v_{k+1}\}$ is linearly dependent, a contradiction. Hence $w_{k+1} \neq 0$. Finally, by its definition, $w_{k+1}$ is in the space spanned by $w_1, \dots, w_k$ and $v_{k+1}$, and by the induction hypothesis each $w_i$ ($i \leq k$) is in the space spanned by $v_1, \dots, v_k$; hence $w_{k+1}$ is in the space spanned by $v_1, \dots, v_{k+1}$, and we have proved $\mathcal{H}(k + 1)$.

In particular $\mathcal{H}(n)$ holds, so by Proposition 5.3 the vectors $w_1, \dots, w_n$ are $n$ linearly independent vectors in the $n$-dimensional space $V$, and hence form a basis of $V$. □
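The recursion of Theorem 5.8 translates almost line for line into a short program. What follows is a minimal sketch, not a robust numerical routine: it assumes real coordinate vectors with the standard inner product, uses exact rational arithmetic to avoid rounding, and the names `dot` and `gram_schmidt` are chosen here purely for illustration. It is run on the basis of Example 5.7.

```python
from fractions import Fraction


def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis, inner=dot):
    """Return an orthogonal basis w_1, ..., w_n built from v_1, ..., v_n.

    Each w_{k+1} is v_{k+1} minus its projections onto w_1, ..., w_k.
    The input must be linearly independent, otherwise some w is zero
    and a later division by (w | w) fails.
    """
    ortho = []
    for v in basis:
        w = [Fraction(c) for c in v]
        for u in ortho:
            coeff = inner(u, v) / inner(u, u)   # (w_i | v) / (w_i | w_i)
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        ortho.append(w)
    return ortho


# The ordered basis v1, v2, v3 of Example 5.7.
basis = [(0, 1, 1), (0, 2, 0), (-1, 1, 0)]
w1, w2, w3 = gram_schmidt(basis)
print(w1, w2, w3)
```

Normalizing as in (1) would introduce square roots, so the sketch stops at the orthogonal basis; passing a different `inner` function adapts it to other real inner products.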
Corollary 5.9 Any finite dimensional inner product space $V$ has an orthonormal basis. In fact, given the orthogonal basis $\{w_1, \dots, w_n\}$ from the previous theorem, the set $\{y_1, \dots, y_n\}$ defined by
\[ y_i = \frac{w_i}{\|w_i\|} = \frac{w_i}{\sqrt{(w_i \mid w_i)}} \]
is an orthonormal basis.
Example 5.10 (Legendre polynomials) In the vector space $\mathbb{R}_3[x]$ of all polynomials of degree less than 3, with the inner product
\[ (f \mid g) = \int_{-1}^{1} f(x)g(x)\,dx, \]
take the 'natural' basis $\{1, x, x^2\}$ and apply the Gram-Schmidt process (without normalization). First $(1 \mid 1) = \int_{-1}^{1} 1\,dx = 2$, and $(x \mid 1) = \int_{-1}^{1} x\,dx = 0$, so 1 and $x$ are already orthogonal. Now $(x^2 \mid 1) = \int_{-1}^{1} x^2\,dx = \tfrac{2}{3}$ and $(x^2 \mid x) = 0$, so we replace $x^2$ by $x^2 - \tfrac{2/3}{2}\cdot 1 = x^2 - 1/3$. Indeed, this process can be continued indefinitely, to produce polynomials
\[ 1,\quad x,\quad x^2 - \tfrac{1}{3},\quad x^3 - \tfrac{3}{5}x,\quad x^4 - \tfrac{6}{7}x^2 + \tfrac{3}{35},\quad \dots \]
We can normalize these polynomials so that they have norm 1. First, $(1 \mid 1) = 2$ so we replace 1 by $\sqrt{2}/2$. Next, $(x \mid x) = 2/3$ so we replace $x$ by $\tfrac{\sqrt{6}}{2}x$. Then
\[ (x^2 - \tfrac{1}{3} \mid x^2 - \tfrac{1}{3}) = \tfrac{8}{45}, \]
so we replace $x^2 - \tfrac{1}{3}$ by $\tfrac{3\sqrt{10}}{4}(x^2 - \tfrac{1}{3})$, and so on. These polynomials are scalar multiples of the Legendre polynomials. Since the square roots get rather cumbersome, the Legendre polynomials $P_n(x)$ are usually defined instead by taking the appropriate scalar multiple so that $P_n(1) = 1$. Thus $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = \tfrac{3}{2}x^2 - \tfrac{1}{2}$, $P_3(x) = \tfrac{5}{2}x^3 - \tfrac{3}{2}x$, $P_4(x) = \tfrac{35}{8}x^4 - \tfrac{15}{4}x^2 + \tfrac{3}{8}$, and so on.

There is another way of looking at what we have proved in Corollary 5.9. Suppose $V$ is a finite dimensional real vector space with inner product $(\ \mid\ )$. Suppose $V$ has dimension $n$; then we know from Theorem 2.33 that $V$ and $\mathbb{R}^n$ are isomorphic as real vector spaces. But our vector space $V$ has an inner product, and Theorem 2.33 says nothing about these. Instead, we make the following definition which imposes as a further condition on an isomorphism that it should preserve a bilinear (or sesquilinear) form too.
Definition 5.11 Two real vector spaces $V, W$ with forms $F : V \times V \to \mathbb{R}$ and $G : W \times W \to \mathbb{R}$ respectively are isomorphic if there is a bijection $f : V \to W$ such that
\[ f(u + v) = f(u) + f(v) \tag{2} \]
\[ f(\lambda v) = \lambda f(v) \tag{3} \]
\[ F(u, v) = G(f(u), f(v)) \tag{4} \]
for all $u, v \in V$ and all scalars $\lambda \in \mathbb{R}$.

Similarly, two complex vector spaces $V, W$ with forms $F : V \times V \to \mathbb{C}$ and $G : W \times W \to \mathbb{C}$ respectively are isomorphic if there is a bijection $f : V \to W$ such that
\[ f(u + v) = f(u) + f(v) \tag{5} \]
\[ f(\lambda v) = \lambda f(v) \tag{6} \]
\[ F(u, v) = G(f(u), f(v)) \tag{7} \]
for all $u, v \in V$ and all scalars $\lambda \in \mathbb{C}$.
Corollary 5.12 Let $V$ be a Euclidean space of dimension $n$. Then $V$ is isomorphic to $\mathbb{R}^n$ with the standard inner product as an inner product space. Similarly, each unitary space $V$ of dimension $n$ is isomorphic to $\mathbb{C}^n$ with the standard inner product.

Proof We do the real case, the complex case being identical. Denote the inner product in $V$ by $F$, and the standard inner product in $\mathbb{R}^n$ by $(\ \mid\ )$.

By Corollary 5.9 there is an orthonormal basis $v_1, v_2, \dots, v_n$ of $V$. By the proof of Theorem 2.33,
\[ f(\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n) = (\lambda_1, \lambda_2, \dots, \lambda_n)^T \]
defines an isomorphism of real vector spaces from $V$ to $\mathbb{R}^n$. We must show (4) holds too. Let $v = \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n$ and $w = \mu_1 v_1 + \mu_2 v_2 + \cdots + \mu_n v_n$ be two typical elements of $V$, and let $a = (\lambda_1, \lambda_2, \dots, \lambda_n)^T$ and $b = (\mu_1, \mu_2, \dots, \mu_n)^T$ be the coordinate form of $v, w$ with respect to the orthonormal basis $v_1, v_2, \dots, v_n$ of $V$. Note that $f(v) = a$ and $f(w) = b$. Since the matrix representing the inner product $F$ on $V$ is the identity matrix $I$ (because $v_1, v_2, \dots, v_n$ is orthonormal), by Proposition 4.8 we have
\[ F(v, w) = a^T I b = a^T b = \lambda_1 \mu_1 + \lambda_2 \mu_2 + \cdots + \lambda_n \mu_n = (f(v) \mid f(w)) \]
as required. □
5.3 Properties of orthonormal bases

There are a number of useful results about orthonormal bases, all using the same basic idea we used in proving Proposition 5.3. Most of the results here concern finite dimensional spaces and are rather familiar in the case of $\mathbb{R}^n$ (and probably also $\mathbb{C}^n$): these could be proved by using Corollary 5.12 to show the space $V$ in question is isomorphic to $\mathbb{R}^n$ or $\mathbb{C}^n$, with the isomorphism taking an orthonormal basis to the standard basis of $\mathbb{R}^n$ or $\mathbb{C}^n$, and then proving the result for the standard basis in $\mathbb{R}^n$ or $\mathbb{C}^n$. However, in the cases of these particular results, it is just as easy to prove them directly, and this is what we shall do here, starting with the real case.

Inner product spaces over $\mathbb{R}$. We start by considering spaces over the real numbers.

Proposition 5.13 (Fourier expansion) Suppose that $\{e_1, \dots, e_n\}$ is an orthonormal basis of a Euclidean space $V$. Then for any $v \in V$ we have
\[ v = \sum_{i=1}^n (e_i \mid v)\, e_i. \]

Proof We can write $v$ in terms of the basis, say $v = \sum_{i=1}^n \lambda_i e_i$. Then take the inner product of both sides of this equation with $e_j$, to obtain
\[ (e_j \mid v) = \sum_{i=1}^n \lambda_i (e_j \mid e_i) = \lambda_j. \]
Substituting back $\lambda_i = (e_i \mid v)$ into $v = \sum_{i=1}^n \lambda_i e_i$ we obtain the equation required. □

Example 5.14 You are probably familiar with this result in $\mathbb{R}^3$. If $\{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$ is the standard (orthonormal) basis, $u \cdot v$ is the standard inner product, and if $v \in \mathbb{R}^3$, then
\[ v = (\mathbf{i} \cdot v)\mathbf{i} + (\mathbf{j} \cdot v)\mathbf{j} + (\mathbf{k} \cdot v)\mathbf{k}. \]
That is, $v$ is the sum of its projections onto the three coordinate axes.
Example 5.15 We consider the orthonormal basis $\{e_1, e_2, e_3, e_4\}$ of $\mathbb{R}^4$ with the standard inner product, where
\[ e_1 = (1/2, 1/2, 1/2, 1/2)^T, \quad e_2 = (1/2, 1/2, -1/2, -1/2)^T, \quad e_3 = (1/2, -1/2, 1/2, -1/2)^T, \quad e_4 = (1/2, -1/2, -1/2, 1/2)^T, \]
and express $v = (1, 2, 3, 4)^T$ in terms of this basis. We calculate $(e_1 \mid v) = 5$, $(e_2 \mid v) = -2$, $(e_3 \mid v) = -1$, and $(e_4 \mid v) = 0$, and therefore $v = 5e_1 - 2e_2 - e_3$.
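The arithmetic of Example 5.15 is easy to reproduce by machine. The sketch below is only an illustration: it stores the basis vectors as lists of exact fractions, computes the coefficients $(e_i \mid v)$ of Proposition 5.13, and rebuilds $v$ from them; the helper name `dot` and the variable names are assumptions made here.

```python
from fractions import Fraction

half = Fraction(1, 2)

# The orthonormal basis e1, ..., e4 of R^4 from Example 5.15.
e = [
    [half, half, half, half],
    [half, half, -half, -half],
    [half, -half, half, -half],
    [half, -half, -half, half],
]
v = [1, 2, 3, 4]


def dot(u, w):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, w))


# Fourier coefficients (e_i | v).
coeffs = [dot(ei, v) for ei in e]

# Rebuild v as the sum of (e_i | v) e_i, coordinate by coordinate.
rebuilt = [sum(c * ei[k] for c, ei in zip(coeffs, e)) for k in range(4)]

print(coeffs)
print(rebuilt)
```

The sum of the squares of the coefficients, $25 + 4 + 1 + 0 = 30$, also equals $1 + 4 + 9 + 16$, the squared length of $v$.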
The coefficients $(e_i \mid v)$ in Proposition 5.13 are the coordinates of the vector $v$ with respect to the ordered basis $e_1, \dots, e_n$. They are also sometimes called Fourier coefficients because of their use in Example 3.14. That example, however, is an infinite dimensional space, for which Proposition 5.13 does not hold in general. With infinitely many dimensions, there may be convergence problems for infinite series which you might learn about elsewhere.
Example 5.16 Recall from Example 3.14 that the functions $\sin(kx)$ for different positive integer values of $k$ are mutually orthogonal, with respect to the inner product $(f \mid g) = \int_{-\pi}^{\pi} f(x)g(x)\,dx$, and have norm $\sqrt{\pi}$. Thus if the function $f$ is defined by
\[ f(x) = \sum_{k=1}^n \lambda_k \sin(kx) \]
then we can recover the coefficients $\lambda_k$ from the function $f(x)$ as follows. First normalize the functions to give an orthonormal set of functions $f_k(x)$, where $f_k(x) = (1/\sqrt{\pi})\sin(kx)$, so that $f(x) = \sum_{k=1}^n (\lambda_k\sqrt{\pi})\, f_k(x)$. Then by Proposition 5.13 the coefficients $\lambda_k\sqrt{\pi}$ are given by
\[ \lambda_k\sqrt{\pi} = (f_k \mid f) = \frac{1}{\sqrt{\pi}} \int_{-\pi}^{\pi} f(x)\sin(kx)\,dx, \]
so
\[ \lambda_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\sin(kx)\,dx. \]
Proposition 5.17 (Pythagoras's theorem) Suppose $e_1, \dots, e_n$ is an orthonormal basis of a Euclidean space $V$. Then, for all $v \in V$,
\[ \|v\|^2 = \sum_{i=1}^n (e_i \mid v)^2. \]

Proof From Proposition 5.13, $v = \sum_{i=1}^n (e_i \mid v)e_i$. Now take the inner product of both sides with $v$. □

In words, the square of the length of a vector is equal to the sum of the squares of its coordinates. This should make it clear why it is called Pythagoras's theorem.
Example 5.18 In Example 5.10 we saw that the polynomials $\tfrac{\sqrt{2}}{2}$, $\tfrac{\sqrt{6}}{2}x$, and $\tfrac{3\sqrt{10}}{4}(x^2 - \tfrac{1}{3})$ form an orthonormal basis of the space $\mathbb{R}_3[x]$ of polynomials of degree less than 3 with inner product $(f \mid g) = \int_{-1}^{1} f(x)g(x)\,dx$. Applying this to the polynomial $x^2$, we have
\[ x^2 = \frac{2\sqrt{10}}{15}\left(\frac{3\sqrt{10}}{4}\Bigl(x^2 - \frac{1}{3}\Bigr)\right) + \frac{\sqrt{2}}{3}\cdot\frac{\sqrt{2}}{2}. \]
So $x^2$ has 'coordinates' $\sqrt{2}/3$, $0$, and $2\sqrt{10}/15$. By Pythagoras's theorem
\[ \|x^2\|^2 = \Bigl(\frac{\sqrt{2}}{3}\Bigr)^2 + \Bigl(\frac{2\sqrt{10}}{15}\Bigr)^2 = \frac{2}{9} + \frac{8}{45} = \frac{2}{5}, \]
which you can check directly by computing $\int_{-1}^{1} x^4\,dx$.

Pythagoras's theorem is the special case $v = w$ of the following.

Corollary 5.19 (Parseval's identity) If $\{e_1, \dots, e_n\}$ is an orthonormal basis of the Euclidean space $V$, and $v, w \in V$, then
\[ (v \mid w) = \sum_{i=1}^n (v \mid e_i)(e_i \mid w). \]

Proof From Proposition 5.13, $w = \sum_{i=1}^n (e_i \mid w)e_i$, and taking the inner product of both sides of this with $v$ gives the required identity. □
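Pythagoras's theorem and Parseval's identity for the polynomial inner product of Example 5.18 can be checked in exact arithmetic. The sketch below is illustrative only. To avoid square roots it works with the orthogonal (not normalized) polynomials $1$, $x$, $x^2 - \tfrac13$ from Example 5.10 and divides each term by $(e \mid e)$, which has the same effect as normalizing. Polynomials are stored as coefficient lists, and the names `poly_mul`, `inner` and `parseval_sum`, together with the second polynomial `w`, are hypothetical.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (p[k] multiplies x^k)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def inner(p, q):
    """(p | q) = integral of p(x) q(x) over [-1, 1].

    The integral of x^k over [-1, 1] is 2/(k+1) for even k and 0 for odd k.
    """
    return sum(c * Fraction(2, k + 1)
               for k, c in enumerate(poly_mul(p, q)) if k % 2 == 0)


# Orthogonal basis 1, x, x^2 - 1/3 of the polynomials of degree < 3.
ortho = [[1], [0, 1], [Fraction(-1, 3), 0, 1]]


def parseval_sum(p, q):
    """Sum of (p | e)(e | q) / (e | e) over the orthogonal basis."""
    return sum(inner(p, e) * inner(e, q) / inner(e, e) for e in ortho)


v = [0, 0, 1]   # x^2
w = [1, 1, 1]   # 1 + x + x^2, an arbitrary second polynomial

print(parseval_sum(v, v), inner(v, v))   # Pythagoras: both sides agree
print(parseval_sum(v, w), inner(v, w))   # Parseval: both sides agree
```

The first line reproduces the value $\tfrac29 + \tfrac{8}{45} = \tfrac25$ found in Example 5.18.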
In $\mathbb{R}^3$ with the standard inner product and standard basis, $(v \mid e_i)$ is the $i$th coordinate $v_i$ of $v$. In this case,
\[ (v \mid w) = \sum_{i=1}^3 v_i w_i. \]
In other words, Parseval's identity is another way of stating the isomorphism in Corollary 5.12.

So far, the results concern finite dimensional spaces. The next result is a version of Pythagoras's theorem for infinite dimensional spaces. But since we cannot in general form an infinite sum $\sum_{i=1}^{\infty} (v \mid e_i)^2$, the infinite dimensional version of Pythagoras's theorem becomes an inequality.

Proposition 5.20 (Bessel's inequality) If $\{e_1, \dots, e_k\}$ is an orthonormal set (not necessarily a basis) of vectors in a real inner product space $V$, and $v \in V$, then
\[ \sum_{i=1}^k (e_i \mid v)^2 \leq \|v\|^2. \]
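Proof Let $w = v - \sum_{i=1}^k (e_i \mid v)e_i$, and compute $\|w\|^2$ as follows.
\[ 0 \leq \|w\|^2 = (w \mid w) = \Bigl(v - \sum_{i=1}^k (e_i \mid v)e_i \Bigm| v - \sum_{j=1}^k (e_j \mid v)e_j\Bigr) \]
\[ = (v \mid v) - 2\sum_{i=1}^k (e_i \mid v)^2 + \sum_{i=1}^k \sum_{j=1}^k (e_i \mid v)(e_j \mid v)(e_i \mid e_j) = (v \mid v) - \sum_{i=1}^k (e_i \mid v)^2, \]
since $(e_i \mid e_j) = 0$ except when $i = j$, and $(e_i \mid e_i) = 1$, as required. □

Example 5.21 We can apply Bessel's inequality to Example 5.16. If $g$ is any continuous function on $[-\pi, \pi]$, we obtain
\[ \sum_{k=1}^n \Bigl(\frac{1}{\sqrt{\pi}} \int_{-\pi}^{\pi} g(x)\sin(kx)\,dx\Bigr)^2 \leq \int_{-\pi}^{\pi} g(x)^2\,dx, \]
so
\[ \sum_{k=1}^n \Bigl(\int_{-\pi}^{\pi} g(x)\sin(kx)\,dx\Bigr)^2 \leq \pi \int_{-\pi}^{\pi} g(x)^2\,dx. \]
In the special case when $g(x) = x$, we get
\[ \sum_{k=1}^n \Bigl(\int_{-\pi}^{\pi} x\sin(kx)\,dx\Bigr)^2 \leq \pi \int_{-\pi}^{\pi} x^2\,dx = 2\pi^4/3. \]
Now integrating $\int_{-\pi}^{\pi} x\sin(kx)\,dx$ by parts gives $(-1)^{k+1}\,2\pi/k$, as you can check, so we have proved
\[ \sum_{k=1}^n \frac{4\pi^2}{k^2} \leq \frac{2\pi^4}{3} \quad\text{or}\quad \sum_{k=1}^n \frac{1}{k^2} \leq \frac{\pi^2}{6}. \]
In fact this is one of the special cases when the convergence problems can be overcome (although we will not prove it in this book), and it turns out that $\sum_{k=1}^{\infty} (1/k^2) = \pi^2/6$.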
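The bound obtained in Example 5.21 can be watched numerically. The following sketch, in ordinary floating-point arithmetic and with a hypothetical helper `partial_sum`, prints a few partial sums of $\sum 1/k^2$ alongside their distance from $\pi^2/6$; it illustrates the inequality and the slow convergence, and proves nothing.

```python
import math


def partial_sum(n):
    """Return 1/1^2 + 1/2^2 + ... + 1/n^2 as a float."""
    return sum(1.0 / (k * k) for k in range(1, n + 1))


bound = math.pi ** 2 / 6

for n in (1, 10, 100, 1000):
    s = partial_sum(n)
    # Bessel's inequality says every partial sum stays below the bound.
    print(n, s, bound - s)
```

The gap shrinks roughly like $1/n$, which is why many terms are needed before the partial sums come close to the limit.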
Inner product spaces over $\mathbb{C}$. All of the above results have analogues for spaces over $\mathbb{C}$. In all cases the proofs are the same, perhaps with some complex conjugates added.
Proposition 5.22 If $\{e_1, \dots, e_n\}$ is an orthonormal basis of a complex inner product space $V$, and $v, w \in V$, then
(a) (Fourier expansion)
\[ v = \sum_{i=1}^n (e_i \mid v)\, e_i. \]
(b) (Pythagoras's theorem)
\[ \|v\|^2 = \sum_{i=1}^n |(e_i \mid v)|^2. \]
(c) (Parseval's identity)
\[ (v \mid w) = \sum_{i=1}^n (v \mid e_i)(e_i \mid w) = \sum_{i=1}^n \overline{(e_i \mid v)}\,(e_i \mid w). \]

Proof (a) is exactly as before. For (b), note
\[ \|v\|^2 = (v \mid v) = \sum_{i=1}^n (v \mid e_i)(e_i \mid v) = \sum_{i=1}^n (v \mid e_i)\,\overline{(v \mid e_i)} = \sum_{i=1}^n |(v \mid e_i)|^2. \]
For (c), consider $w = \sum_{i=1}^n (e_i \mid w)e_i$ and take the inner product with $v$ of both sides of this equation. □
Proposition 5.23 (Bessel's inequality) If $\{e_1, \dots, e_k\}$ is an orthonormal set (not necessarily a basis!) in a complex inner product space $V$, and $v \in V$, then
\[ \sum_{i=1}^k |(e_i \mid v)|^2 \leq \|v\|^2. \]

Proof Let $w = v - \sum_{i=1}^k (e_i \mid v)e_i$, and compute $\|w\|^2$ as follows.
\[ \|w\|^2 = (w \mid w) = \Bigl(v - \sum_{i=1}^k (e_i \mid v)e_i \Bigm| v - \sum_{j=1}^k (e_j \mid v)e_j\Bigr) \]
\[ = (v \mid v) - \sum_{i=1}^k \overline{(e_i \mid v)}\,(e_i \mid v) - \sum_{j=1}^k (e_j \mid v)(v \mid e_j) + \sum_{i=1}^k \sum_{j=1}^k \overline{(e_i \mid v)}\,(e_j \mid v)(e_i \mid e_j) = (v \mid v) - \sum_{i=1}^k |(e_i \mid v)|^2, \]
since $(e_i \mid e_j) = 0$ except when $i = j$. The result now follows as $\|w\|^2 \geq 0$. □

5.4 Orthogonal complements
This section describes the structure of vector spaces with an inner product in greater detail, and in particular introduces the notion of orthogonal subspaces. The main result uses the Gram-Schmidt method, and many of the ideas here are used implicitly in Chapter 7, especially in the proof of Theorem 7.8; also, in Chapter 13, Theorem 13.16 gives an alternative short proof using orthogonal complements of results that will be proved by alternative methods. Nothing, however, in this section is actually required later, so the section can be safely skipped on a first reading if necessary.

Fig. 5.2 Orthogonal complements in $\mathbb{R}^2$
Example 5.24 Let $V = \mathbb{R}^2$ and $U = \{(x, x)^T : x \in \mathbb{R}\}$. Then the subspace $W = \{(x, 0)^T : x \in \mathbb{R}\}$ has the property that every vector in $V$ is the sum of a vector from $U$ and a vector from $W$:
\[ (x, y)^T = (y, y)^T + (x - y, 0)^T. \]
The same is true for the subspace $W' = \{(0, x)^T : x \in \mathbb{R}\}$:
\[ (x, y)^T = (x, x)^T + (0, y - x)^T. \]
We will say that $W$ and $W'$ are complements to $U$ in the space $V$. In fact there are infinitely many complements, and in terms of the vector space structure on $V$ there is nothing to choose between them. But in the presence of the standard inner product one complement stands out, namely the perpendicular line $U' = \{(x, -x)^T : x \in \mathbb{R}\}$. (See Figure 5.2.)

We start by giving definitions covering the ideas of 'complement' and 'orthogonal complement'.
Definition 5.25 If $U$ and $W$ are subspaces of a vector space $V$, then the sum of $U$ and $W$ is defined as
\[ U + W = \{u + w : u \in U,\ w \in W\}. \]

Proposition 5.26 With this definition, $U + W$ is a subspace of $V$.

Proof If $v_1$ and $v_2$ are elements of $U + W$, then there are elements $u_1, u_2$ of $U$ and $w_1, w_2$ of $W$ such that $v_1 = u_1 + w_1$ and $v_2 = u_2 + w_2$. Then for any scalar $\alpha$ we have
\[ v_1 + \alpha v_2 = (u_1 + w_1) + \alpha(u_2 + w_2) = (u_1 + \alpha u_2) + (w_1 + \alpha w_2) \in U + W, \]
since $u_1 + \alpha u_2 \in U$ and $w_1 + \alpha w_2 \in W$. □
Example 5.27 Let $V = \mathbb{R}^2$, and consider the two subspaces $U$ and $W$ defined by $U = \{(x, 0) : x \in \mathbb{R}\}$ and $W = \{(0, y) : y \in \mathbb{R}\}$. Then
\[ U + W = \{u + w : u \in U,\ w \in W\} = \{(x, 0) + (0, y) : x \in \mathbb{R},\ y \in \mathbb{R}\} = \{(x, y) : x \in \mathbb{R},\ y \in \mathbb{R}\} = V. \]
On the other hand $U \cup W$ consists only of the points which are on one of the two coordinate axes.
Definition 5.28 If $V$ is a vector space and $U$ is a subspace of $V$, then $W$ is called a complement to $U$ in $V$ if
(a) $W$ is a subspace of $V$,
(b) $V = U + W$, and
(c) $U \cap W = \{0\}$.
If these three conditions are satisfied, we write $V = U \oplus W$, and say that $V$ is the direct sum of $U$ and $W$.
The point of Example 5.24 was to say that although there may be many vector space complements to a subspace $U$, in an inner product space there is a 'special' one, the orthogonal complement. This is defined next.
Definition 5.29 If $V$ is an inner product space and $U$ is a subspace of $V$, we define
\[ U^{\perp} = \{v \in V : (u \mid v) = 0 \text{ for all } u \in U\}. \]
This is called the orthogonal complement to $U$ in $V$, or '$U$ perp' for short.
In view of the name, we ought to prove at once that $U^{\perp}$ is actually a complement to $U$, in the sense of Definition 5.28. In fact, this is not true in general, but it is true if $U$ is finite dimensional. Before we prove this, we prove an easy lemma which will be very useful in practice when calculating orthogonal complements of subspaces.

Lemma 5.30 If $V$ is an inner product space, $U$ is a subspace of $V$, and $U$ has a basis $\{u_1, \dots, u_k\}$, then
\[ U^{\perp} = \{v \in V : (u_i \mid v) = 0 \text{ for all } i \text{ with } 1 \leq i \leq k\}. \]

Proof Let $W = \{v \in V : (u_i \mid v) = 0 \text{ for all } i \text{ with } 1 \leq i \leq k\}$. If $v \in U^{\perp}$ then $(u \mid v) = 0$ for all $u \in U$, so in particular $(u_i \mid v) = 0$ for all $i$, so $v \in W$. Conversely, if $w \in W$, then $(u_i \mid w) = 0$ for all $i$, and if $u \in U$, then $u$ can be written as $u = \sum_{i=1}^k \lambda_i u_i$ for some scalars $\lambda_i$. Thus
\[ (u \mid w) = \Bigl(\sum_{i=1}^k \lambda_i u_i \Bigm| w\Bigr) = \sum_{i=1}^k \overline{\lambda_i}\,(u_i \mid w) = 0, \]
so $w \in U^{\perp}$. (Over $\mathbb{R}$ the conjugates can be ignored.) Hence $U^{\perp} = W$, as required. □

This result means that in order to check whether a vector $w$ is in $U^{\perp}$ it is sufficient to check that $w$ is orthogonal to all vectors in some basis for $U$. We can now use this to show that $U^{\perp}$ is indeed a complement to $U$ in the sense of Definition 5.28.
Proposition 5.31 If $V$ is an inner product space, and $U$ is a finite dimensional subspace of $V$, then
(a) $U^{\perp}$ is a subspace of $V$,
(b) $U \cap U^{\perp} = \{0\}$, and
(c) $U + U^{\perp} = V$.
(Thus $V = U \oplus U^{\perp}$.)

Proof For (a), note that if $v, w \in U^{\perp}$, then $(u \mid v) = (u \mid w) = 0$ for all $u \in U$. Therefore $(u \mid v + \alpha w) = (u \mid v) + \alpha(u \mid w) = 0$ for all $u \in U$ and all scalars $\alpha$, so $v + \alpha w \in U^{\perp}$. So $U^{\perp}$ is a subspace of $V$.

For (b), observe that if $u \in U \cap U^{\perp}$ then in particular $(u \mid u) = 0$, so $u = 0$. Thus $U \cap U^{\perp} = \{0\}$.

We show for (c) that any vector can be resolved into two components, $v_S$ in $U$ and $v_P$ orthogonal to $U$. This is very similar to Lemma 5.6 (see also Figure 5.1). We first use Gram-Schmidt orthonormalization (Theorem 5.8) to find an orthonormal basis for $U$, say $\{u_1, \dots, u_k\}$. Then for any $v \in V$, define
\[ v_S = \sum_{j=1}^k (u_j \mid v)\, u_j \quad\text{and}\quad v_P = v - v_S. \]
By definition $v_S \in U$, and for all $i$ we have
\[ (u_i \mid v_P) = (u_i \mid v) - (u_i \mid v_S) = (u_i \mid v) - \sum_{j=1}^k (u_j \mid v)(u_i \mid u_j) = (u_i \mid v) - (u_i \mid v) = 0. \]
Therefore $v_P \in U^{\perp}$ by Lemma 5.30. Thus we have proved that any vector $v \in V$ can be written as $v = v_S + v_P \in U + U^{\perp}$, so every vector in $V$ is in $U + U^{\perp}$ and therefore $U + U^{\perp} = V$, as required. □

Proposition 5.32 If $V = U \oplus W$, then $\dim(V) = \dim(U) + \dim(W)$.

Proof Choose a basis $\{u_1, \dots, u_m\}$ of $U$ and a basis $\{w_1, \dots, w_n\}$ of $W$. By the hypothesis $V = U + W$, any vector $v$ in $V$ is of the form $v = u + w$ for some $u \in U$ and $w \in W$, so can be written as
\[ v = \sum_{i=1}^m \lambda_i u_i + \sum_{j=1}^n \mu_j w_j. \]
Thus $\{u_1, \dots, u_m, w_1, \dots, w_n\}$ spans $V$. On the other hand, if
\[ \sum_{i=1}^m \lambda_i u_i + \sum_{j=1}^n \mu_j w_j = 0 \]
then
\[ \sum_{i=1}^m \lambda_i u_i = -\sum_{j=1}^n \mu_j w_j \in U \cap W = \{0\}, \]
whence $\lambda_i = 0$ for all $i$ since $\{u_1, \dots, u_m\}$ is a basis of $U$ and $\mu_j = 0$ for all $j$ since $\{w_1, \dots, w_n\}$ is a basis of $W$. Thus $\{u_1, \dots, u_m, w_1, \dots, w_n\}$ is a linearly independent set, so is a basis for $V$, and
\[ \dim(V) = m + n = \dim(U) + \dim(W), \]
as required. □
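The resolution of a vector into a component in $U$ and a component in $U^{\perp}$, used in the proof of Proposition 5.31(c), can be sketched in a few lines. This is an illustration under stated assumptions: $V = \mathbb{R}^3$ with the standard inner product, and $U$ is given by an orthogonal (not normalized) basis, so each coefficient is divided by $(u \mid u)$, which is equivalent to normalizing first. The subspace, the vector and the names `dot` and `resolve` are hypothetical.

```python
from fractions import Fraction


def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def resolve(v, ortho_basis):
    """Split v as v_S + v_P with v_S in U and v_P orthogonal to U.

    ortho_basis must be a mutually orthogonal set of nonzero vectors
    spanning U.
    """
    v_s = [Fraction(0)] * len(v)
    for u in ortho_basis:
        c = Fraction(dot(u, v), dot(u, u))   # (u | v) / (u | u)
        v_s = [s + c * ui for s, ui in zip(v_s, u)]
    v_p = [vi - si for vi, si in zip(v, v_s)]
    return v_s, v_p


# A hypothetical plane U in R^3 with orthogonal basis u1, u2.
U_basis = [(1, 1, 0), (1, -1, 2)]
v = (3, 1, 4)

v_s, v_p = resolve(v, U_basis)
print(v_s, v_p)
# By Lemma 5.30 it is enough to test v_P against a basis of U.
print([dot(u, v_p) for u in U_basis])
```

Here $\dim U = 2$ and $U^{\perp}$ is the line spanned by $v_P$, in line with the dimension count $\dim(U) + \dim(U^{\perp}) = 3$.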
Corollary 5.33 If $V$ is a finite dimensional inner product space, and $U$ is a subspace of $V$, then
(a) $\dim(U) + \dim(U^{\perp}) = \dim(V)$, and
(b) $(U^{\perp})^{\perp} = U$.

Proof The first part is immediate from Propositions 5.31 and 5.32. Now if $x \in U$ then $(x \mid v) = 0$ for all $v \in U^{\perp}$. Therefore $x \in (U^{\perp})^{\perp}$, and hence $U \subseteq (U^{\perp})^{\perp}$. But from the first part applied to $U^{\perp}$, we have $\dim(U^{\perp}) + \dim((U^{\perp})^{\perp}) = \dim(V)$, so $\dim(U) = \dim((U^{\perp})^{\perp})$, and therefore $U = (U^{\perp})^{\perp}$. □
Exercises
Exercise 5 . 1 Use the Gram-Schmidt process with the following inner products, starting with the ordered bases given, to obtain a new basis which is orthogonal. (a) V = B.3 with the standard inner product, and with the basis (2, -2, (b) V = IR2 , with inner product defined by
(1, 1, o)r,
(0, 1, -l)r,
l)r.
(x!y)
=
(1,0)r, (0, l)r. 1, {
(x1 x2 )
(- � - � ) (��) ,
and basis (c) V = IR3 [x] , with inner product (Jig) = f01 defined by f0 ( x ) = !J (x) = x, and h ( x )
f(x)g(x) dx, = x2 .
and basis
fa, !J , h
Exercise 5.2 If $\{e_1, \dots, e_n\}$ is any orthogonal basis of a real inner product space $V$, and $v \in V$, prove that
\[ v = \sum_{i=1}^n \frac{(e_i \mid v)}{(e_i \mid e_i)}\, e_i. \]
Use this to write $x^3$ as a linear combination of the Legendre polynomials $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = \tfrac{3}{2}x^2 - \tfrac{1}{2}$, and $P_3(x) = \tfrac{5}{2}x^3 - \tfrac{3}{2}x$.
IR3 is given by F (x y ) (x 1 X2 x3) ( � � =� ) (��) · -1 -1 3 Y3
Exercise 5 .3 A bilinear form on ,
=
Apply the Gram-Schmidt process to F, to find a basis with respect to which the matrix of F is diagonal. (You could start with the standard basis. Don ' t forget to check that the set of vectors that Gram-Schmidt gives you is a basis.) Is F an inner product? Explain your answer.
Exercise 5.4 Let $E = \{e_1, e_2, \dots\}$ be an infinite orthogonal set of nonzero vectors in a real inner product space $V$. By normalizing $E$ or otherwise, prove that for all $v \in V$
\[ \|v\|^2 \geq \sum_{i=1}^n \frac{(v \mid e_i)^2}{\|e_i\|^2} \]
for all natural numbers $n$.

Exercise 5.5 By considering $g(x) = x^2$, Bessel's inequality, and the method of Example 5.21, find an upper bound for $\sum_{n=1}^N (1/n^4)$ which is independent of $N$.

Exercise 5.6 Let $V$ be the real vector space $\mathbb{R}[x]$ of polynomials in $x$ with real coefficients, and let $(\ \mid\ )$ be an inner product on $V$. Suppose that the sets $\{P_0, P_1, \dots, P_n, \dots\} \subseteq V$ and $\{Q_0, Q_1, \dots, Q_n, \dots\} \subseteq V$ are both orthogonal, with $P_n$ and $Q_n$ both having degree $n$ for each $n \in \mathbb{N}$. Prove that $(P_n \mid R) = (Q_n \mid R) = 0$ for all polynomials $R$ with degree $k < n$. [Hint: consider Fourier expansions of $R$.] Hence show that there are scalars $\lambda_n$ with $P_n = \lambda_n Q_n$.

Exercise 5.7 (Legendre polynomials) Let $V$ be the real inner product space $\mathbb{R}[x]$ of polynomials over $\mathbb{R}$ with inner product
\[ (P(x) \mid Q(x)) = \int_{-1}^{1} P(x)Q(x)\,dx. \]
The Legendre polynomials are defined to be the unique sequence of polynomials $P_n(x) \in V$ with $P_n$ having degree $n$, $(P_i \mid P_j) = 0$ for all $i \neq j$, and $P_n(1) = 1$ for all $n$. (See Exercise 5.6 to see why this specifies them uniquely.)
(a) Prove that for all $n \in \mathbb{N}$ there is $\alpha_n \in \mathbb{R}$ such that
\[ P_{n+1}(x) = \alpha_n x P_n(x) + (1 - \alpha_n)P_{n-1}(x). \]
[Consider the Fourier expansion of $xP_n(x)$.]
(b) Use integration by parts to show that
\[ (xP_n(x) \mid P_n'(x)) = 1 - \tfrac{1}{2}\|P_n\|^2 \quad\text{and}\quad (P_{n-1}(x) \mid P_n'(x)) = 2, \]
where $'$ denotes differentiation with respect to $x$.
(c) Using part (a), show that [...]. Hence deduce from (b) that [...] for $n \geq 1$.
(d) Using part (a) (twice!) and $(xP_n(x)|P_{n+1}(x)) = (P_n(x)|xP_{n+1}(x))$, show that [...]. Hence, by induction on $n$, show that $\|P_n\|^2 = \frac{2}{2n+1}$ and
$$P_{n+1}(x) = \frac{2n+1}{n+1}\, x P_n(x) - \frac{n}{n+1}\, P_{n-1}(x).$$
(e) Show that $(1 - x^2)P_n''(x) - 2xP_n'(x) + n(n+1)P_n(x) = 0$ by taking inner products with $P_i$ for $i \leq n$.

Exercise 5.8 (Hermite polynomials) Let $V$ be the real vector space $\mathbb{R}[x]$ with $(P(x)|Q(x)) = \int_{-\infty}^{\infty} e^{-x^2/2} P(x)Q(x)\,dx$. Let $I_n = (x^n|1)$.
(a) Prove that $I_0 = \sqrt{2\pi}$. [Hint: write $I_0^2 = \iint e^{-(x^2+y^2)/2}\,dx\,dy$ and change to polar coordinates to evaluate this multiple integral.]
(b) Show that $I_{2n+1} = 0$ and $I_{2n} = ((2n)!/(2^n n!))\,I_0$. [Use induction on $n$.] Hence show that $(\ |\ )$ is an inner product on $V$.
(c) The Hermite polynomials are defined to be the unique sequence of polynomials $H_n(x) \in V$ with $H_n$ having degree $n$, $\{H_n : n \in \mathbb{N}\}$ orthogonal for this inner product, and the leading coefficient of $H_n(x)$ equal to 1. Use Gram-Schmidt or otherwise to show that
$H_0(x) = 1$
$H_1(x) = x$
$H_2(x) = x^2 - 1$
$H_3(x) = x^3 - 3x$
$H_4(x) = x^4 - 6x^2 + 3$
$H_5(x) = x^5 - 10x^3 + 15x$.
(d) [...]
(e) Show that
$$(xH_n(x)|H_n'(x)) = \tfrac{1}{2}\left( -\|H_n\|^2 + \|H_{n+1}\|^2 + \mu_n^2 \|H_{n-1}\|^2 \right)$$
and [...] by using integration by parts.
(f) By taking the inner product of the equation in (d) with $H_n'(x)$, show that $\mu_{n+1} = 1 + \mu_n$.
(g) Hence derive the following properties of the polynomials $H_n$:
i. $\|H_n\|^2 = n!\sqrt{2\pi}$;
ii. $H_{n+1}(x) = xH_n(x) - nH_{n-1}(x)$;
iii. $H_n''(x) - xH_n'(x) + nH_n(x) = 0$.

Exercise 5.9 (Laguerre polynomials) Let $V$ be the real vector space $\mathbb{R}[x]$ with $(P(x)|Q(x)) = \int_0^{\infty} e^{-x} P(x)Q(x)\,dx$.
(a) Prove that $\int_0^{\infty} x^n e^{-x}\,dx = n!$ for all $n$.
(b) Hence show that $(\ |\ )$ is an inner product on $V$.
(c) The Laguerre polynomials are the unique sequence of polynomials $L_n(x) \in V$ with $L_n$ having degree $n$, $\{L_n : n \in \mathbb{N}\}$ orthogonal for this inner product, and the leading coefficient of $L_n(x)$ equal to 1. Use Gram-Schmidt to show that
$L_0(x) = 1$
$L_1(x) = x - 1$
$L_2(x) = x^2 - 4x + 2$
$L_3(x) = x^3 - 9x^2 + 18x - 6$
$L_4(x) = x^4 - 16x^3 + 72x^2 - 96x + 24$
$L_5(x) = x^5 - 25x^4 + 200x^3 - 600x^2 + 600x - 120$.
(d) Derive the following properties of the polynomials $L_n$:
i. $\|L_n\|^2 = (n!)^2$;
ii. $L_{n+1}(x) + (2n + 1 - x)L_n(x) + n^2 L_{n-1}(x) = 0$; and
iii. $xL_n''(x) + (1 - x)L_n'(x) + nL_n(x) = 0$.

Exercise 5.10 Prove the following generalization of Proposition 5.32: if $U$ and $W$ are subspaces of a vector space $V$, then $\dim(U) + \dim(W) = \dim(U + W) + \dim(U \cap W)$.
Exercise 5.11 If $U$ and $W$ are subspaces of a finite dimensional inner product space $V$, show that
(a) if $U \subseteq W$, then $W^\perp \subseteq U^\perp$,
(b) $(U + W)^\perp = U^\perp \cap W^\perp$,
(c) $U^\perp + W^\perp \subseteq (U \cap W)^\perp$, and if also $V$ is finite dimensional then $U^\perp + W^\perp = (U \cap W)^\perp$.
Exercise 5.12 In which of the following is $V = U \oplus W$?
C3 , U JR4 , U
{(a,b,c)T : a+bO)T , (0, {(a,b,cf : a = b 2, O) T , 0, 2) T ) ; U
(a) V = = 0}, W = = = c} ; 1 , 0 , 1 ) T ) , = Span ( ( 1 , 0, 1 , (b) V = (0, 1, W = span ( ( 1 , 0, (c) V � [x], = {polynomials in V of even degree} , W = {polynomials in V of odd degree} ; (d) V = IRn [x] , = {polynomials in V of degree less than k } , W = {polynomials i n V divisible by x } . =
U
k
Exercise 5.13 For Exercise 5.12, parts (a) and (b), find $U^\perp$, where the inner product is the standard inner product.

Exercise 5.14 (Hard!) Let $V = \ell^2$ be the set of all real sequences $(a_n)$ such that $\sum_{n=1}^{\infty} a_n^2$ converges.
(a) Prove that $V$ is a vector space under componentwise addition and scalar multiplication.
(b) Define a suitable inner product on $V$, and prove that it is an inner product.
(c) Deduce that if $(a_n)$ and $(b_n)$ are in $V$ then [...]
(d) Let $U$ be the subspace of all finite sequences (i.e. those which are 0 from some point on). Show that $U^\perp = \{0\}$ and $(U^\perp)^\perp = V$, and deduce that $(U^\perp)^\perp \neq U$.
6 When is a form definite?

In the last chapter we used the Gram-Schmidt process to obtain orthonormal bases for vector spaces with inner products. In this chapter we will present several variations of the idea behind Gram-Schmidt, this time applied to a form which need not be positive definite. Sections 6.1 and 6.2 both address the same question: given a real symmetric (or complex conjugate-symmetric) matrix $A$, how can we determine if the form $x^T A x$ is positive definite? And, if it isn't, how can we find a nonzero vector $x$ such that $x^T A x \leq 0$? In this chapter, we consider bilinear forms on vector spaces over $\mathbb{R}$ and sesquilinear forms on vector spaces over $\mathbb{C}$ only, since the notion of 'positive definite' as we have defined it only really applies to these forms. Section 6.2 requires knowledge of determinants, and can be safely omitted or postponed until later if the reader is not familiar with these at this stage.

6.1 The Gram-Schmidt process revisited
In this section we study symmetric bilinear forms (or in the complex case, conjugate-symmetric sesquilinear forms) over a finite dimensional vector space $V$, and try to answer the question: when is it an inner product? The idea behind the proof of these results is as follows: if our bilinear form $F(\ ,\ )$ is an inner product we should be able to find a basis for which $F$ has a diagonal representation, by the orthogonalization process. So we will try to mimic this process and see what we get. Looking back at Theorem 5.8 we see that we have to replace $(w_i|w_i)$ by $F(w_i, w_i)$, and use the formula
$$w_k = v_k - \sum_{i=1}^{k-1} \frac{F(w_i, v_k)}{F(w_i, w_i)}\, w_i. \qquad (1)$$
But in the more general context of symmetric bilinear forms, $F(w_i, w_i)$ could be 0, and we might not be able to divide by it.
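Example 6.1 Let $A$ be the matrix
$$A = \begin{pmatrix} 2 & 1 & -1 \\ 1 & 2 & 0 \\ -1 & 0 & 2/3 \end{pmatrix}$$
and let $F(x, y) = x^T A y$ be the corresponding form on $\mathbb{R}^3$. Take the usual basis $v_1 = (1, 0, 0)^T$, $v_2 = (0, 1, 0)^T$, $v_3 = (0, 0, 1)^T$ of $\mathbb{R}^3$ and apply (1). This gives $w_1 = v_1 = (1, 0, 0)^T$, $F(w_1, w_1) = 2$,
$$w_2 = v_2 - \frac{F(w_1, v_2)}{F(w_1, w_1)}\, w_1 = (0, 1, 0)^T - \tfrac{1}{2}(1, 0, 0)^T = (-\tfrac{1}{2}, 1, 0)^T, \qquad F(w_2, w_2) = \tfrac{3}{2},$$
$$w_3 = v_3 - \frac{F(w_1, v_3)}{F(w_1, w_1)}\, w_1 - \frac{F(w_2, v_3)}{F(w_2, w_2)}\, w_2 = (0, 0, 1)^T - \frac{-1}{2}(1, 0, 0)^T - \frac{1/2}{3/2}(-\tfrac{1}{2}, 1, 0)^T = (\tfrac{2}{3}, -\tfrac{1}{3}, 1)^T.$$
Now, however,
$$F(w_3, w_3) = (\tfrac{2}{3}, -\tfrac{1}{3}, 1)\, A\, (\tfrac{2}{3}, -\tfrac{1}{3}, 1)^T = 0,$$
so we have found a nonzero vector, $w_3 = (\tfrac{2}{3}, -\tfrac{1}{3}, 1)^T$, with $F(w_3, w_3) = 0$, so $F$ is not positive definite.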
Used as in this example, the Gram-Schmidt process gives a method of determining whether a form is positive definite or not; it does not always give a basis though. The formula (1) applies only as long as $F(w_k, w_k) \neq 0$, and when $w_k$ is found with $F(w_k, w_k) = 0$ the Gram-Schmidt process comes to a stop. But before we show that the method just outlined always works, we shall expand our terminology concerning positive definite forms.
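The stopping behaviour of formula (1) can be tried out numerically. The following is a minimal sketch in Python using exact rational arithmetic; the function names `form` and `gram_schmidt_form` are illustrative choices made here, and the matrix is assumed to be the matrix $A$ of Example 6.1 with rows $(2, 1, -1)$, $(1, 2, 0)$, $(-1, 0, 2/3)$.

```python
from fractions import Fraction


def form(A, x, y):
    """Evaluate F(x, y) = x^T A y for a square matrix A given as a list of rows."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def gram_schmidt_form(A, basis):
    """Apply formula (1) until some w_k has F(w_k, w_k) = 0 or the basis runs out.

    Returns the vectors w_1, ..., w_k computed so far, and a flag that is True
    when the process stopped on a vector with F(w_k, w_k) = 0.
    """
    ws = []
    for v in basis:
        w = list(v)
        for u in ws:
            # Every earlier u has F(u, u) != 0, otherwise we would have stopped.
            c = form(A, u, v) / form(A, u, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        ws.append(w)
        if form(A, w, w) == 0:
            return ws, True
    return ws, False


# Assumed matrix of Example 6.1.
A = [[Fraction(2), Fraction(1), Fraction(-1)],
     [Fraction(1), Fraction(2), Fraction(0)],
     [Fraction(-1), Fraction(0), Fraction(2, 3)]]
standard_basis = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]

ws, stopped = gram_schmidt_form(A, standard_basis)
```

With this input the process stops at the third vector, since $F(w_3, w_3) = 0$ there, which is how the method reports that the form is not definite.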
Definition 6.2 A symmetric bilinear form $F$ on a real vector space $V$ is said to be definite if
$$F(v, v) = 0 \iff v = 0$$
for all $v \in V$; in other words, if $F(v, v)$ can only be zero when $v$ is zero. So $F$ is positive definite if it is definite and $F(v, v) \geq 0$ for all $v \in V$; $F$ is said to be negative definite if it is definite and $F(v, v) \leq 0$ for all $v \in V$.

In general, for forms $F$ on complex vector spaces, $F(v, v)$ may not always be real, so it doesn't make sense to say whether the form is positive or negative definite. However, for conjugate-symmetric sesquilinear forms $F$ on complex vector spaces, Definition 6.2 is valid since if $F$ is conjugate-symmetric and $v$ is in $V$ then $F(v, v) = \overline{F(v, v)}$, so $F(v, v)$ is always real. The next lemma provides a useful criterion for proving definiteness.
Lemma 6.3 Suppose $F$ is a symmetric bilinear form on a vector space $V$ over $\mathbb{R}$, or a conjugate-symmetric sesquilinear form on a vector space $V$ over $\mathbb{C}$, and suppose $v_1, \ldots, v_n$ is a basis of $V$.
(a) If $F(v_i, v_i) > 0$ for all $i$ and $F(v_i, v_j) = 0$ for all $i \neq j$, then $F$ is positive definite.
(b) If $F(v_i, v_i) < 0$ for all $i$ and $F(v_i, v_j) = 0$ for all $i \neq j$, then $F$ is negative definite.

Proof For (a), let $v \in V$ be arbitrary. Since the $v_i$ form a basis, $v = \sum_{i=1}^{n} \lambda_i v_i$ for some scalars $\lambda_i$. Then
$$F(v, v) = \sum_{i=1}^{n} \sum_{j=1}^{n} \overline{\lambda_i} \lambda_j F(v_i, v_j) = \sum_{i=1}^{n} |\lambda_i|^2 F(v_i, v_i) \geq 0$$
since each $F(v_i, v_j)$ $(i \neq j)$ is zero, each $F(v_i, v_i) > 0$, and each $|\lambda_i|^2 \geq 0$. Also, since each $F(v_i, v_i) > 0$ the only way $F(v, v)$ could be equal to 0 is if $\lambda_1 = \lambda_2 = \cdots = \lambda_n = 0$, i.e. if $v = 0$. This shows that $F$ is positive definite as required. The argument for (b) is the same, just replacing $>$ with $<$ in the appropriate places. □
Consider the Gram-Schmidt process applied to a symmetric bilinear form $F$ on a real vector space $V$ with basis vectors $v_1, v_2, \ldots, v_n$ (or a conjugate-symmetric sesquilinear form on a complex vector space $V$). We compute vectors
$$w_k = v_k - \sum_{i=1}^{k-1} \frac{F(w_i, v_k)}{F(w_i, w_i)}\, w_i$$
as far as possible, either until we have found some $w_k$ with $F(w_k, w_k) = 0$, or else until we have obtained all $n$ vectors $w_1, w_2, \ldots, w_n$.

Suppose first that $F(w_i, w_i) \neq 0$ for all $i < k$ but $F(w_k, w_k) = 0$. In this case we have $w_i \in \mathrm{span}(v_1, v_2, \ldots, v_i)$ for all $i \leq k$. But then $w_k \neq 0$ since otherwise
$$v_k = \sum_{i=1}^{k-1} \frac{F(w_i, v_k)}{F(w_i, w_i)}\, w_i \in \mathrm{span}(v_1, v_2, \ldots, v_{k-1}),$$
contradicting our assumption that $v_1, v_2, \ldots, v_k$ is linearly independent. Thus $w_k \neq 0$ and $F(w_k, w_k) = 0$, so $F$ is not definite; in particular it is neither positive definite nor negative definite.

Now suppose that the Gram-Schmidt process completes, and we are able to compute $w_1, w_2, \ldots, w_n$ with $F(w_i, w_i) \neq 0$ for all $i \leq n$. In this case, the proof of Theorem 5.8 shows that $w_1, w_2, \ldots, w_n$ is a basis of $V$. By Lemma 6.3, if each $F(w_i, w_i)$ is positive then $F$ is positive definite, and if each $F(w_i, w_i)$ is negative then $F$ is negative definite. So in each case the Gram-Schmidt method will determine whether $F$ is positive definite, negative definite, or neither.
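Example 6.4 We apply the Gram-Schmidt process to the symmetric bilinear form
$$F(x, y) = x^T \begin{pmatrix} -1 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -3 \end{pmatrix} y$$
on $\mathbb{R}^3$. Take the usual basis $v_1 = (1, 0, 0)^T$, $v_2 = (0, 1, 0)^T$, $v_3 = (0, 0, 1)^T$, and compute
$$w_1 = v_1 = (1, 0, 0)^T, \qquad F(w_1, w_1) = -1,$$
$$w_2 = v_2 - \frac{F(w_1, v_2)}{F(w_1, w_1)}\, w_1 = (0, 1, 0)^T + (1, 0, 0)^T = (1, 1, 0)^T, \qquad F(w_2, w_2) = -1,$$
$$w_3 = v_3 - \frac{F(w_1, v_3)}{F(w_1, w_1)}\, w_1 - \frac{F(w_2, v_3)}{F(w_2, w_2)}\, w_2 = (0, 0, 1)^T - 0 + (1, 1, 0)^T = (1, 1, 1)^T, \qquad F(w_3, w_3) = -2.$$
So, with respect to the ordered basis $(1, 0, 0)^T$, $(1, 1, 0)^T$, $(1, 1, 1)^T$, the form $F$ has matrix representation
$$\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -2 \end{pmatrix}$$
and hence is negative definite by Lemma 6.3.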
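A change of basis like the one in Example 6.4 can be checked mechanically by computing $F(w_i, w_j)$ for every pair of basis vectors and then applying the criterion of Lemma 6.3. The sketch below does this in Python; the helper names are made up for illustration, and the matrix entries are an assumption (rows $(-1, 1, 0)$, $(1, -2, 1)$, $(0, 1, -3)$), chosen to be consistent with the basis and the diagonal matrix stated in the example.

```python
def form(A, x, y):
    """Evaluate F(x, y) = x^T A y."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def matrix_in_basis(A, basis):
    """Matrix of F with respect to the given ordered basis: entries F(w_i, w_j)."""
    return [[form(A, u, v) for v in basis] for u in basis]


def classify_by_lemma_6_3(G):
    """Apply Lemma 6.3 to a matrix G of values F(w_i, w_j)."""
    n = len(G)
    if any(G[i][j] != 0 for i in range(n) for j in range(n) if i != j):
        return "basis is not orthogonal for F"
    if all(G[i][i] > 0 for i in range(n)):
        return "positive definite"
    if all(G[i][i] < 0 for i in range(n)):
        return "negative definite"
    return "Lemma 6.3 does not apply"


# Assumed matrix for Example 6.4, and the basis found there.
A = [[-1, 1, 0], [1, -2, 1], [0, 1, -3]]
basis = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]

G = matrix_in_basis(A, basis)
verdict = classify_by_lemma_6_3(G)
```

Here `G` comes out diagonal with diagonal entries $-1, -1, -2$, so the lemma gives negative definiteness.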
The only question left to answer here is whether a form can be definite even if it is neither positive definite nor negative definite. In other words, if the space $V$ has two vectors $v, w$ with $F(v, v) < 0 < F(w, w)$, can $F$ be definite? The answer is no.

Proposition 6.5 Suppose $F$ is a symmetric bilinear form on a real vector space $V$, or a conjugate-symmetric sesquilinear form on a complex vector space $V$, and there are $x, y \in V$ such that $F(x, x) < 0 < F(y, y)$. Then there is $z \in V$ with $z \neq 0$ and $F(z, z) = 0$.

Proof Note first that $\{x, y\}$ is linearly independent, since if $x = \alpha y$ then $F(x, x) = |\alpha|^2 F(y, y)$ and these would have the same sign. Suppose $F(x, x) = \mu < 0$, $F(y, y) = \nu > 0$, and $F(x, y) = \lambda$. The strategy is to work in the subspace spanned by $x, y$, using Gram-Schmidt with normalization to find a basis of this subspace with respect to which the matrix of $F$ is $\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$. We start by applying the Gram-Schmidt formula to the vectors $x, y$, defining
$$x' = x, \qquad y' = y - \frac{F(x, y)}{F(x, x)}\, x = y - \frac{\lambda}{\mu}\, x,$$
so $F(x', x') = \mu < 0$, $F(y', y') = \nu - |\lambda|^2/\mu$, and $F(x', y') = 0$. (Note that $F(y', y') > 0$ as $\mu < 0 < \nu$.) We now normalize, defining
$$x'' = \frac{1}{\sqrt{-\mu}}\, x', \qquad y'' = \frac{1}{\sqrt{\nu - |\lambda|^2/\mu}}\, y',$$
so $F(x'', x'') = -1$, $F(y'', y'') = 1$, and $F(x'', y'') = 0$. Then $x'' + y'' \neq 0$ (in fact, $x'', y''$ are linearly independent) and
$$F(x'' + y'', x'' + y'') = F(x'', x'') + F(y'', y'') = -1 + 1 = 0. \qquad \square$$
The following corollary is immediate.
Corollary 6.6 A form $F$ is definite if and only if either it is positive definite or it is negative definite.

6.2 The leading minor test
There is a well-known and useful test, called the leading minor test, which uses determinants to give an alternative to the Gram-Schmidt method for determining whether a form is positive or negative definite, and which we give now. First, we fix some notation. For simplicity, for the rest of this section we work in the vector space $V = \mathbb{R}^n$ over the reals, and we suppose $A$ is an $n \times n$ symmetric matrix defining the symmetric bilinear form $F(x, y) = x^T A y$. But if you wish, you can think of $A$ as being a conjugate-symmetric matrix, $V = \mathbb{C}^n$, and $F(x, y) = \overline{x}^T A y$, as all the arguments here work unchanged. Note also that by the previous chapter, we could have started with any finite dimensional real vector space with a bilinear form $F$, taken any ordered basis of it, and defined $A$ to be the matrix representing $F$ with respect to this basis. Then all the results here will also apply to $F$, since, for example, $F$ is positive definite if and only if the matrix $A$ is positive definite.

We work throughout with the standard basis $e_1, e_2, \ldots, e_n$ of $V$. Given scalars (i.e. real numbers) $a_{lm}$ for $1 \leq l \leq i - 1$ and $1 \leq m \leq i$, we will use the expression
$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1i} \\ a_{21} & a_{22} & \cdots & a_{2i} \\ \vdots & \vdots & & \vdots \\ a_{i-1\,1} & a_{i-1\,2} & \cdots & a_{i-1\,i} \\ e_1 & e_2 & \cdots & e_i \end{vmatrix} \qquad (2)$$
with vectors in the bottom row of this 'determinant' as a shorthand for the following expansion of a determinant by its last row:
$$e_i \begin{vmatrix} a_{11} & \cdots & a_{1\,i-1} \\ \vdots & & \vdots \\ a_{i-1\,1} & \cdots & a_{i-1\,i-1} \end{vmatrix} - e_{i-1} \begin{vmatrix} a_{11} & \cdots & a_{1\,i-2} & a_{1i} \\ \vdots & & \vdots & \vdots \\ a_{i-1\,1} & \cdots & a_{i-1\,i-2} & a_{i-1\,i} \end{vmatrix} + \cdots + (-1)^{i} e_2 \begin{vmatrix} a_{11} & a_{13} & \cdots & a_{1i} \\ \vdots & \vdots & & \vdots \\ a_{i-1\,1} & a_{i-1\,3} & \cdots & a_{i-1\,i} \end{vmatrix} + (-1)^{i+1} e_1 \begin{vmatrix} a_{12} & \cdots & a_{1i} \\ \vdots & & \vdots \\ a_{i-1\,2} & \cdots & a_{i-1\,i} \end{vmatrix}.$$
Since $e_1, \ldots, e_n$ is the standard basis for $\mathbb{R}^n$, this expression denotes a column vector whose first $i$ entries are the $(i - 1) \times (i - 1)$ determinants indicated, with alternating signs as shown. The remaining $n - i$ entries are all zero.

We have chosen a somewhat unusual way of expanding this determinant to make the work easier later on. Usually the determinant (2) would be expanded by the last row by writing down the $e_1$ term first. However, the sign of this term has to be adjusted by $(-1)^{i+1}$, and we prefer to start the expansion with a term that does not have its sign adjusted in this way.

You should be aware that strictly vectors ought not to appear as the entries of determinants like this. (The determinant operation acts on an array of numbers, not a mixture of numbers and vectors.) But as long as we restrict vectors to the bottom row of a determinant and are careful to interpret our determinants as in the expansion of (2), we do get a valid expression. Of course, it is impossible to interpret the expansion of (2) by any row or column other than the last row! Note also that (2) gives a vector, i.e. an element of $V$, and not a scalar, which is what you'd get if you expanded an ordinary determinant.
Example 6.7 In $\mathbb{R}^3$, the determinant
$$\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ e_1 & e_2 & e_3 \end{vmatrix}$$
is the vector product or cross product $a \times b$, where $a = (a_1, a_2, a_3)^T$ and $b = (b_1, b_2, b_3)^T$, which you may have seen elsewhere.
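The expansion of (2) along its bottom row is easy to carry out by machine. The following Python sketch is one way to do it; `det` and `formal_determinant` are names invented here, and the vectors $a = (1, 2, 3)^T$ and $b = (4, 5, 6)^T$ are sample values chosen for illustration rather than taken from the text.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1  # the empty determinant is 1 by convention
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def formal_determinant(rows, n):
    """Expand the 'determinant' (2) whose bottom row is e_1, ..., e_i.

    rows holds the i - 1 numeric rows, each of length i. The result is a vector
    in R^n whose entries after the i-th are zero.
    """
    i = len(rows) + 1
    v = [0] * n
    for j in range(i):  # 0-based column j carries the basis vector e_{j+1}
        minor = [row[:j] + row[j + 1:] for row in rows]
        # Cofactor sign for the last row (1-based row i, column j + 1).
        v[j] = (-1) ** (i - 1 + j) * det(minor)
    return v


a = [1, 2, 3]
b = [4, 5, 6]
cross = formal_determinant([a, b], 3)
```

For these sample vectors the result agrees with the usual component formula $(a_2 b_3 - a_3 b_2,\ a_3 b_1 - a_1 b_3,\ a_1 b_2 - a_2 b_1)$ for $a \times b$.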
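Proposition 6.8 Let $V = \mathbb{R}^n$, $A = (a_{ij})$ a symmetric $n \times n$ real matrix, and let $F(x, y) = x^T A y$ be the corresponding symmetric bilinear form on $V$ (so $a_{ij} = F(e_i, e_j) = F(e_j, e_i) = a_{ji}$ for all $i, j$). For each $i = 1, \ldots, n$ define
$$v_i = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1i} \\ a_{21} & a_{22} & \cdots & a_{2i} \\ \vdots & \vdots & & \vdots \\ a_{i-1\,1} & a_{i-1\,2} & \cdots & a_{i-1\,i} \\ e_1 & e_2 & \cdots & e_i \end{vmatrix}.$$
Then the $v_i$ are orthogonal with respect to $F$, i.e. $F(v_i, v_j) = 0$ for all $i \neq j$. Moreover, $F(v_1, v_1) = a_{11}$ and
$$F(v_i, v_i) = \begin{vmatrix} a_{11} & \cdots & a_{1\,i-1} \\ \vdots & & \vdots \\ a_{i-1\,1} & \cdots & a_{i-1\,i-1} \end{vmatrix} \cdot \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1i} \\ a_{21} & a_{22} & \cdots & a_{2i} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ii} \end{vmatrix}. \qquad (3)$$
(Note that this proposition does not say whether $F(v_i, v_i)$ is positive, negative, or zero, and does not imply that the $v_i$ form a basis.)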
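Proof First, if $j < i$ then by using the expansion of (2), and using bilinearity, $F(e_j, v_i)$ equals
$$F(e_j, e_i) \begin{vmatrix} a_{11} & \cdots & a_{1\,i-1} \\ \vdots & & \vdots \\ a_{i-1\,1} & \cdots & a_{i-1\,i-1} \end{vmatrix} - F(e_j, e_{i-1}) \begin{vmatrix} a_{11} & \cdots & a_{1\,i-2} & a_{1i} \\ \vdots & & \vdots & \vdots \\ a_{i-1\,1} & \cdots & a_{i-1\,i-2} & a_{i-1\,i} \end{vmatrix} + \cdots \qquad (4)$$
which (using $F(e_j, e_k) = a_{jk}$) equals the determinant
$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1i} \\ \vdots & \vdots & & \vdots \\ a_{i-1\,1} & a_{i-1\,2} & \cdots & a_{i-1\,i} \\ a_{j1} & a_{j2} & \cdots & a_{ji} \end{vmatrix}.$$
But this determinant has two identical rows, so is zero. Thus $F(e_j, v_i) = 0$.

We can use this to prove the orthogonality of the $v_i$. Here, we must show that $F(v_j, v_i) = 0$ if $i \neq j$. Since $F(v_i, v_j) = F(v_j, v_i)$ we might as well assume $j < i$. By (2), $v_j = \beta_j e_j + \cdots + \beta_1 e_1$ for some scalars $\beta_k$ (all being plus or minus certain $(j - 1) \times (j - 1)$ determinants); hence
$$F(v_j, v_i) = F(\beta_j e_j + \cdots + \beta_1 e_1, v_i) = \beta_j F(e_j, v_i) + \cdots + \beta_1 F(e_1, v_i) = 0$$
as required.

Finally, we must compute $F(v_i, v_i)$. First, note that by definition, $v_1 = e_1$ so $F(v_1, v_1) = F(e_1, e_1) = a_{11}$. Now suppose $i > 1$ and note that by (2) again,
$$v_i = \alpha_i e_i + \alpha_{i-1} e_{i-1} + \cdots + \alpha_1 e_1$$
for scalars $\alpha_k$, where $\alpha_i$ is the determinant
$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1\,i-1} \\ a_{21} & a_{22} & \cdots & a_{2\,i-1} \\ \vdots & \vdots & & \vdots \\ a_{i-1\,1} & a_{i-1\,2} & \cdots & a_{i-1\,i-1} \end{vmatrix}$$
and the other $\alpha_k$ are given by other determinants as in (2). So
$$F(v_i, v_i) = F(\alpha_i e_i + \alpha_{i-1} e_{i-1} + \cdots + \alpha_1 e_1, v_i) = \alpha_i F(e_i, v_i) + \alpha_{i-1} F(e_{i-1}, v_i) + \cdots + \alpha_1 F(e_1, v_i) = \alpha_i F(e_i, v_i) + 0 + \cdots + 0$$
and, by a calculation identical to (4),
$$F(e_i, v_i) = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1i} \\ a_{21} & a_{22} & \cdots & a_{2i} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ii} \end{vmatrix}$$
which gives (3). □

This argument gives an alternative to the Gram-Schmidt algorithm.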
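Proposition 6.8 can be checked on a concrete matrix. The sketch below builds each $v_i$ from the expansion of (2) and compares $F(v_i, v_i)$ with the product of consecutive leading determinants from (3). All function names are illustrative, and the symmetric matrix with rows $(-1, 1, 0)$, $(1, -2, 1)$, $(0, 1, -3)$ is a sample chosen for this sketch.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def form(A, x, y):
    """Evaluate F(x, y) = x^T A y."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def v_vector(A, i):
    """The vector v_i of Proposition 6.8 (i is 1-based)."""
    rows = [A[r][:i] for r in range(i - 1)]  # first i - 1 rows, first i columns
    v = [0] * len(A)
    for j in range(i):
        minor = [row[:j] + row[j + 1:] for row in rows]
        v[j] = (-1) ** (i - 1 + j) * det(minor)
    return v


def leading_determinant(A, i):
    """Determinant of the top-left i x i block of A (equal to 1 when i = 0)."""
    return det([row[:i] for row in A[:i]])


A = [[-1, 1, 0], [1, -2, 1], [0, 1, -3]]  # sample symmetric matrix
n = len(A)
vs = [v_vector(A, i) for i in range(1, n + 1)]
D = [leading_determinant(A, i) for i in range(n + 1)]
```

For this sample the vectors are $(1, 0, 0)^T$, $(-1, -1, 0)^T$, $(1, 1, 1)^T$, they are pairwise orthogonal for $F$, and $F(v_i, v_i)$ equals `D[i - 1] * D[i]` in each case.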
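Theorem 6.9 If $F$ is an inner product on the finite dimensional real vector space $V$, $V$ has basis $e_1, \ldots, e_n$, and $a_{ij} = F(e_i, e_j)$, then the vectors
$$v_i = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1i} \\ a_{21} & a_{22} & \cdots & a_{2i} \\ \vdots & \vdots & & \vdots \\ a_{i-1\,1} & a_{i-1\,2} & \cdots & a_{i-1\,i} \\ e_1 & e_2 & \cdots & e_i \end{vmatrix}$$
form an orthogonal basis of $V$.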
Proof Since the $v_i$ are orthogonal, it suffices to show that they are all nonzero, and to do this it suffices by (3) to show that each determinant
$$D_i = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1i} \\ a_{21} & a_{22} & \cdots & a_{2i} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ii} \end{vmatrix}$$
is positive. For $i = 0$, we conventionally define $D_0 = 1$. Now, we use induction on $i$ to show that $D_i > 0$. Assume inductively that $i > 0$ and $D_{i-1} > 0$, and note that by (2)
$$v_i = D_{i-1} e_i + \gamma_{i-1} e_{i-1} + \cdots + \gamma_1 e_1$$
for some scalars $\gamma_k$. If $v_i = 0$ then
$$0 = D_{i-1} e_i + \gamma_{i-1} e_{i-1} + \cdots + \gamma_1 e_1$$
gives a linear dependence between $e_1, \ldots, e_i$ with not all coefficients zero (since by assumption $D_{i-1} > 0$), which is impossible. So $v_i \neq 0$ and by (3)
$$F(v_i, v_i) = D_{i-1} D_i > 0$$
since $F$ is positive definite. Finally, $D_i = D_{i-1} D_i / D_{i-1}$, the quotient of two positive real numbers, so $D_i$ is also positive, as required. Therefore the induction continues. □
Actually, the vectors $v_i$ are not very different from the vectors in the Gram-Schmidt process. If $F$ is an inner product on $\mathbb{R}^n$ and
$$w_k = e_k - \sum_{i=1}^{k-1} \frac{F(w_i, e_k)}{F(w_i, w_i)}\, w_i, \qquad (5)$$
then it turns out that $v_i = D_{i-1} w_i$. See Exercise 6.5.

Our analysis gives necessary and sufficient conditions for a symmetric bilinear form to be positive definite.
Theorem 6.10 (The leading minor test) If $F$ is a symmetric bilinear form on the finite dimensional real vector space $V = \mathbb{R}^n$, $e_1, \ldots, e_n$ is the standard basis of $V$, and $a_{ij} = F(e_i, e_j)$, then $F$ is positive definite (i.e. is an inner product) if and only if the determinants
$$D_i = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1i} \\ a_{21} & a_{22} & \cdots & a_{2i} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ii} \end{vmatrix}$$
are all positive.

Proof If $F$ is positive definite, then the proof of the previous theorem shows that each $D_i > 0$. On the other hand, if each $D_i > 0$, then the vectors $v_i$ defined in Proposition 6.8 are orthogonal and have $F(v_i, v_i) = D_{i-1} D_i > 0$ (where $D_0 = 1$). Therefore each $v_i$ is nonzero and hence $\{v_1, v_2, \ldots, v_n\}$ is an orthogonal basis of $V$. This suffices, by Lemma 6.3. □

The determinants $D_i$ here are called the leading minors of the matrix $A$. In calculations, the Gram-Schmidt vectors are usually easier to compute than the leading minors $D_i$ or the vectors $v_i$ defined here using determinants, but for some matrices $A$ it may be possible to spot values for the leading minors rather quickly.
Example 6.11 We can apply the leading minor test to the matrix
$$A = \begin{pmatrix} 2 & 1 & -1 \\ 1 & 2 & 0 \\ -1 & 0 & 2/3 \end{pmatrix}$$
of Example 6.1; $A$ does not define a positive definite form, since it has determinant $2 \cdot (4/3) - 1 \cdot (2/3) - 1 \cdot 2 = 0$. On the other hand,
B=
(-1i ; -�1) 0
has leading minors 2 , 3 , and 1 as you can check, so does define a positive definite form.
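The leading minor test translates directly into a short computation. The sketch below is a plain Python illustration with invented function names; it assumes the matrix $A$ of Example 6.1 has rows $(2, 1, -1)$, $(1, 2, 0)$, $(-1, 0, 2/3)$, and it uses the $2 \times 2$ matrix with rows $(2, 1)$, $(1, 2)$ as a second sample input.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def leading_minors(A):
    """Return [D_1, ..., D_n], the determinants of the top-left i x i blocks of A."""
    return [det([row[:i] for row in A[:i]]) for i in range(1, len(A) + 1)]


def is_positive_definite(A):
    """Leading minor test (Theorem 6.10) for a real symmetric matrix A."""
    return all(d > 0 for d in leading_minors(A))


# Assumed matrix of Example 6.1; exact fractions avoid rounding in the last minor.
A = [[Fraction(2), Fraction(1), Fraction(-1)],
     [Fraction(1), Fraction(2), Fraction(0)],
     [Fraction(-1), Fraction(0), Fraction(2, 3)]]
S = [[2, 1], [1, 2]]  # sample 2 x 2 matrix

minors_A = leading_minors(A)
minors_S = leading_minors(S)
```

Exact arithmetic matters here: the test hinges on whether the last leading minor of `A` is exactly zero, which floating-point rounding could obscure.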
Theorem 6.12 If $F$ is a symmetric bilinear form on the finite dimensional real vector space $V = \mathbb{R}^n$, $V$ has basis $e_1, \ldots, e_n$, and $a_{ij} = F(e_i, e_j)$, then $F$ is negative definite if and only if the signs of the determinants
$$D_i = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1i} \\ a_{21} & a_{22} & \cdots & a_{2i} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ii} \end{vmatrix}$$
alternate, with $D_i < 0$ for odd $i$ and $D_i > 0$ for even $i$.

Proof If $F$ is negative definite, then for the vectors $v_i$ in Proposition 6.8 we have $F(v_i, v_i) = D_{i-1} D_i$. By convention, $D_0 = 1 > 0$ and $v_1 = e_1 \neq 0$ so $D_1 = a_{11} = F(e_1, e_1) < 0$. The proof of Theorem 6.9 can now be modified to show that the signs of the $D_i$ alternate. Conversely, if the signs of the $D_i$ alternate, then the vectors $v_i$ defined in Proposition 6.8 are orthogonal and have $F(v_i, v_i) = D_{i-1} D_i < 0$, so $F$ is negative definite, by Lemma 6.3. □
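The alternating-sign criterion of Theorem 6.12 can be coded in the same style as the leading minor test. In the sketch below the function names are illustrative, and the matrix with rows $(-1, 1, 0)$, $(1, -2, 1)$, $(0, 1, -3)$ is a sample input assumed for this illustration.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def leading_minors(A):
    """Return [D_1, ..., D_n], the determinants of the top-left i x i blocks of A."""
    return [det([row[:i] for row in A[:i]]) for i in range(1, len(A) + 1)]


def is_negative_definite(A):
    """Theorem 6.12: D_i < 0 for odd i and D_i > 0 for even i."""
    return all((d < 0) if i % 2 == 1 else (d > 0)
               for i, d in enumerate(leading_minors(A), start=1))


A = [[-1, 1, 0], [1, -2, 1], [0, 1, -3]]  # sample symmetric matrix
minors = leading_minors(A)

# Cross-check: F is negative definite exactly when -F is positive definite,
# i.e. when all leading minors of -A are positive.
neg_A = [[-a for a in row] for row in A]
minors_of_negative = leading_minors(neg_A)
```

The cross-check reflects the fact that negating an $i \times i$ block multiplies its determinant by $(-1)^i$, which is where the alternating pattern comes from.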
Exercises
Exercise 6 . 1 Which of the following real symmetric matrices A are positive definite? (a
+! �l -i)
(c)
(b)
G �)
(d)
G n
G�D
Use the Gram-Schmidt process, the leading minor test, and/or row operations. If you find your matrix $A$ is not positive definite, write down a nonzero vector $x$ such that $x^T A x \leq 0$.

Exercise 6.2 A symmetric bilinear form $f$ on a real vector space $V$ is called positive semi-definite if $f(v, v) \geq 0$ for all $v \in V$. Similarly, it is called negative semi-definite if $f(v, v) \leq 0$ for all $v \in V$. Show that if there is a basis of $V$ with respect to which the matrix $A$ of $f$ is diagonal, then
(a) if all diagonal entries of $A$ are nonnegative then $f$ is positive semi-definite;
(b) if all diagonal entries of $A$ are nonpositive then $f$ is negative semi-definite.
Exercise 6.3 For each of the following real symmetric matrices A , determine whether the corresponding symmetric bilinear form is positive definite, negative definite, positive semi-definite, negative semi-definite, or none of these. (a)
(c)
(-� - � -i) (-i -� -�) 4 -7 - 19
-5
1
2
(b)
(d)
H ,�)) (- i
-7 12 0 - 1 -3 7 . 3 7 17 -3
Exercise 6.4 For each of the following complex conjugate-symmetric matrices A, determine whether the corresponding sesquilinear form is positive definite, negative definite, positive semi-definite, negative semi-definite, or none of these. (The definitions of positive and negative semi-definite are exactly the same as for real symmetric matrices above.) (b)
(2 -+13i
1 4+i - 15 3i -3i -4
)
.
Exercise 6.5 Suppose $v$ and $w$ are vectors in $\mathbb{R}^n$, $F$ is an inner product on $\mathbb{R}^n$ given by matrix $A = (a_{ij})$ with leading minors $D_i$, and $F(w, e_i) = F(v, e_i) = 0$ for all $i \leq k$, where $k < n$; also, suppose $v \in \mathrm{span}(e_1, \ldots, e_k, w)$. Show that $v = \lambda w$ for some scalar $\lambda$. Hence deduce for
$$w_k = e_k - \sum_{i=1}^{k-1} \frac{F(w_i, e_k)}{F(w_i, w_i)}\, w_i$$
and
$$v_i = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1i} \\ a_{21} & a_{22} & \cdots & a_{2i} \\ \vdots & \vdots & & \vdots \\ a_{i-1\,1} & a_{i-1\,2} & \cdots & a_{i-1\,i} \\ e_1 & e_2 & \cdots & e_i \end{vmatrix}$$
that $v_i = D_{i-1} w_i$. [Hint: for the first part, consider an orthonormal basis of $\mathrm{span}(e_1, \ldots, e_k)$.]
7 Quadratic forms and Sylvester's law of inertia

Given a symmetric bilinear form $F$ on a finite dimensional real vector space $V$ we attempt to find 'nice' bases where the matrix representation of the form is as simple as possible, in the same sort of way as we did in the Gram-Schmidt orthogonalization process for inner products. The theorem which says that this is possible and which describes all the possible matrix representations one can get is called Sylvester's law of inertia. (Sylvester's law of inertia has nothing to do with inertia: the story goes that Sylvester said, 'If Isaac Newton can have a law of inertia, then so can I', and consequently gave this theorem the name it has been known by ever since.) This 'law' is exactly analogous to Theorem 2.33 which classified all finite dimensional vector spaces using dimension; it turns out that symmetric bilinear forms are classified by the dimension of the underlying vector space and two further numbers, called the rank and the signature of the form.

7.1 Quadratic forms
Given an inner product $(\ |\ )$, we defined a norm $\|\ \|$ by $\|v\|^2 = (v|v)$, so $\|v\|$ is the positive square root of $(v|v)$. We can also reverse this process, in the sense that, given the norm, we can find the inner product that gave rise to it, as follows.
$$\|v + w\|^2 = (v + w|v + w) = (v|v) + 2(v|w) + (w|w)$$
so
$$(v|w) = \tfrac{1}{2}\left( \|v + w\|^2 - \|v\|^2 - \|w\|^2 \right).$$
In the more general context of a symmetric bilinear form $F$, we find that $F(v, v)$ can be negative, so it does not make sense to take its square root. Apart from this technicality, however, we can do the same thing to an arbitrary symmetric bilinear form as we did to an inner product.
Definition 7.1 Given a symmetric bilinear form $F$ on a real vector space $V$, we define a map $Q : V \to \mathbb{R}$ by $Q(v) = F(v, v)$; $Q$ is called the quadratic form associated with the symmetric bilinear form $F$.

Given $Q$ we can again recover $F$, since
$$Q(v + w) = F(v + w, v + w) = F(v, v) + F(v, w) + F(w, v) + F(w, w) = Q(v) + 2F(v, w) + Q(w)$$
so
$$F(v, w) = \tfrac{1}{2}\left( Q(v + w) - Q(v) - Q(w) \right).$$
Thus we can pass freely from the symmetric bilinear form to the corresponding quadratic form and back again. We state this formally.

Lemma 7.2 Let $F$ be a symmetric bilinear form on a vector space $V$, and $Q$ the associated quadratic form. Then for all $v, w \in V$,
$$F(v, w) = \tfrac{1}{2}\left( F(v + w, v + w) - F(v, v) - F(w, w) \right)$$
(i.e. $F(v, w) = \tfrac{1}{2}(Q(v + w) - Q(v) - Q(w))$ where $Q$ is the corresponding quadratic form).

Warning. Lemma 7.2 only works for symmetric bilinear forms (why?). We will see analogues of this lemma for sesquilinear forms over $\mathbb{C}$ later on. (See Lemma 7.20.)
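The identity of Lemma 7.2 is easy to test numerically. The sketch below recovers $F(v, w)$ from values of $Q$ alone; the function names, the symmetric matrix `B`, the non-symmetric matrix `N`, and the sample vectors are all choices made for this illustration and do not come from the text.

```python
from fractions import Fraction


def form(B, x, y):
    """Evaluate F(x, y) = x^T B y."""
    n = len(B)
    return sum(x[i] * B[i][j] * y[j] for i in range(n) for j in range(n))


def recover_from_quadratic(Q, v, w):
    """The right-hand side of Lemma 7.2: (Q(v + w) - Q(v) - Q(w)) / 2."""
    v_plus_w = [a + b for a, b in zip(v, w)]
    return Fraction(Q(v_plus_w) - Q(v) - Q(w), 2)


# A sample symmetric matrix: the identity recovers F exactly.
B = [[1, 2, 0], [2, -1, 3], [0, 3, 4]]
v, w = [1, 2, -1], [0, 1, 1]
direct = form(B, v, w)
recovered = recover_from_quadratic(lambda x: form(B, x, x), v, w)

# A sample non-symmetric matrix: the same formula returns the symmetrized
# value (F(x, y) + F(y, x)) / 2 instead of F(x, y).
N = [[0, 1], [0, 0]]
x, y = [1, 0], [0, 1]
direct_nonsym = form(N, x, y)
recovered_nonsym = recover_from_quadratic(lambda z: form(N, z, z), x, y)
```

The second computation shows what goes wrong without symmetry: the expansion of $Q(v + w)$ contains $F(v, w) + F(w, v)$, which equals $2F(v, w)$ only when the form is symmetric.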
Example 7.3 If $V = \mathbb{R}^3$ and $F$ is the standard inner product, defined by
$$F(x, y) = x_1 y_1 + x_2 y_2 + x_3 y_3,$$
then
$$Q(x) = x_1^2 + x_2^2 + x_3^2.$$
The name 'quadratic form' comes from the fact that $Q$ is given by a homogeneous quadratic function of the coordinates. (Homogeneous means each term has the same degree, in this case degree 2, being $x^2$ or $xy$ for some variables $x$ and $y$.) Indeed, this could be taken as the definition of a quadratic form (see Proposition 7.6).
Example 7.4 Let V = JR3 and define Q : V ---+ IR by
Then
where x = (x1 , x2 , x 3 ) T and y = (y l , Y2 , Y3 ) T , and the two forms F and Q can be represented (with respect to the standard basis) by the matrix
(
1 1 -1
)
B = -� � -� We can work out F(v, w) as vTBw and Q(v ) as vTBv. For example, if v (1, 2, -1)T then
Proposition 7.5 If $Q$ is a quadratic form on a real vector space $V$, then for all $x \in V$ and $\lambda \in \mathbb{R}$ we have $Q(\lambda x) = \lambda^2 Q(x)$.

Proof If $F$ is the symmetric bilinear form associated to $Q$, then $Q(\lambda x) = F(\lambda x, \lambda x) = \lambda^2 F(x, x) = \lambda^2 Q(x)$. □

In fact, every homogeneous quadratic function of the coordinates $x_1, \ldots, x_n$ in $\mathbb{R}^n$ is a quadratic form, and conversely every quadratic form on a finite dimensional space is given by a homogeneous quadratic function of the coordinates.

Proposition 7.6 Let $V = \mathbb{R}^n$. Then every quadratic form on $V$ is given by a homogeneous function of the coordinates, of degree 2. Conversely, every homogeneous function of degree 2 of the coordinates is a quadratic form.

Proof As in the above example, we work out $Q(v)$, where $v = (v_1, \ldots, v_n)^T$, by using the formula
$$Q(v) = v^T B v = \sum_{i=1}^{n} \sum_{j=1}^{n} v_i b_{ij} v_j,$$
and we see that every term in the expansion of the right-hand side is quadratic in the $v_i$. Conversely, if $q(v)$ is a homogeneous quadratic function of the coordinates of $v$, then we can write
$$q(v) = q((v_1, \ldots, v_n)^T) = \sum_{i=1}^{n} b_{ii} v_i^2 + \sum_{i<j} 2 b_{ij} v_i v_j$$
for some constants $b_{ij}$. By defining $b_{ij} = b_{ji}$ whenever $i > j$, we can rewrite this as
$$q(v) = \sum_{i=1}^{n} b_{ii} v_i^2 + \sum_{i \neq j} b_{ij} v_i v_j = \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ij} v_i v_j$$
which is, in matrix notation, again in the familiar form $v^T B v$. □
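The second half of the proof of Proposition 7.6 is a recipe: keep the coefficient of $v_i^2$ on the diagonal and split the coefficient of $v_i v_j$ equally between positions $(i, j)$ and $(j, i)$. A minimal Python sketch follows, with a hypothetical helper `symmetric_matrix` and a sample polynomial $q(v) = v_1^2 + 4 v_1 v_2 - 3 v_2 v_3$ chosen here for illustration.

```python
from fractions import Fraction


def symmetric_matrix(n, coefficients):
    """Build the symmetric matrix B with q(v) = v^T B v.

    coefficients maps a pair (i, j) of 0-based indices with i <= j to the
    coefficient of v_i * v_j in the homogeneous quadratic polynomial q.
    """
    B = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), c in coefficients.items():
        if i == j:
            B[i][i] += Fraction(c)
        else:
            # The coefficient of v_i v_j is 2 * b_ij, so b_ij = b_ji = c / 2.
            B[i][j] += Fraction(c, 2)
            B[j][i] += Fraction(c, 2)
    return B


def quadratic(B, v):
    """Evaluate Q(v) = v^T B v."""
    n = len(B)
    return sum(v[i] * B[i][j] * v[j] for i in range(n) for j in range(n))


# Sample polynomial q(v) = v1^2 + 4 v1 v2 - 3 v2 v3.
coefficients = {(0, 0): 1, (0, 1): 4, (1, 2): -3}
B = symmetric_matrix(3, coefficients)

v = [1, 2, -1]
value_from_matrix = quadratic(B, v)
value_from_polynomial = v[0] ** 2 + 4 * v[0] * v[1] - 3 * v[1] * v[2]
```

Halving the mixed coefficients is what makes `B` symmetric; any other split of the coefficient between the two positions would give the same values of $q$ but a non-symmetric matrix.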
7.2 Sylvester's law of inertia
We return now to considering symmetric bilinear forms, but it should be borne in mind that, because of the previous section, all the results we prove apply equally to the associated quadratic form. There are two parts to Sylvester's law: the first part says that a 'nice' basis can be found, and the second says that the matrix representation of the form looks essentially the same whichever of these ' nice' bases you choose to take.
Example 7.7 In Example 6.1 we obtained vectors $w_1, w_2, w_3$ using the Gram-Schmidt formula. In fact, $w_1, w_2, w_3$ is a basis of $\mathbb{R}^3$, and with respect to this basis the form $F$ has matrix
$$\begin{pmatrix} 2 & 0 & 0 \\ 0 & 3/2 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
The basis can be normalized by setting $w_1' = w_1/\sqrt{2}$, $w_2' = w_2/\sqrt{3/2}$, and $w_3' = w_3$, so that with respect to this normalized basis, the matrix representation of $F$ is
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
The following theorem shows that any symmetric bilinear form can be represented by a diagonal matrix (with respect to some basis) with 0, 1, or $-1$ on the diagonal.
Theorem 7.8 (First part of Sylvester's law of inertia) Let $V$ be an $n$-dimensional real vector space, and let $F$ be a symmetric bilinear form on $V$. Then there are nonnegative integers $k$ and $m$ and a basis $\{w_1, \ldots, w_n\}$ of $V$ such that $F(w_i, w_j) = 0$ for all $i \neq j$, $F(w_i, w_i) = 1$ for $i \leq k$, $F(w_i, w_i) = -1$ for $k < i \leq k + m$, and $F(w_i, w_i) = 0$ for $i > k + m$.

In other words, the matrix representation of $F$ with respect to the ordered basis $w_1, \ldots, w_n$ is
$$\mathrm{diag}(\underbrace{1, \ldots, 1}_{k}, \underbrace{-1, \ldots, -1}_{m}, 0, \ldots, 0)$$
with $k$ entries equal to '1' and $m$ entries equal to '$-1$' on the diagonal, and all other entries 0. One or both of $k$, $m$ may of course be zero.
Before we prove the theorem, we present some simple lemmas. The first says that if a quadratic form is identically zero, then so is the associated symmetric bilinear form.

Lemma 7.9 Let $F$ be a symmetric bilinear form on a real vector space $V$, and suppose $F(v, v) = 0$ for all $v \in V$. Then $F(v, w) = 0$ for all $v, w \in V$ (with $v$ not necessarily equal to $w$).

Proof From Lemma 7.2,
$$F(v, w) = \tfrac{1}{2}\left( F(v + w, v + w) - F(v, v) - F(w, w) \right)$$
so if $F(v, v) = F(w, w) = F(v + w, v + w) = 0$ then $F(v, w) = 0$. □
The basic idea of the proof of Theorem 7.8 is to use the Gram-Schmidt method. However, as we have seen, for certain 'bad' choices of initial basis vectors $v_1, \ldots, v_n$ this process may not complete, since some $w_k$ obtained by the formula could have $F(w_k, w_k) = 0$. The trick we will use is not to work with any particular initial basis $v_1, \ldots, v_n$, but to choose these vectors as we go along. To do this, we need two further lemmas, both being simple variations of results we have already seen for inner products, but here proved in more generality. The following lemma generalizes the fact that, for an inner product, nonzero orthogonal vectors are linearly independent.
Lemma 7.10 Let $F$ be a bilinear form on a real vector space $V$ and suppose that $w_1, w_2, \ldots, w_n$ are vectors from $V$ which are orthogonal for $F$, i.e. $F(w_i, w_j) = 0$ for all $i \neq j$. For all scalars $\lambda_i$ $(i = 1, \ldots, n)$, if
$$\lambda_1 w_1 + \lambda_2 w_2 + \cdots + \lambda_n w_n = 0$$
then $\lambda_i = 0$ for all $i$ such that $F(w_i, w_i) \neq 0$. In particular, if $F(w_i, w_i) \neq 0$ for all $i$ then $\{w_1, w_2, \ldots, w_n\}$ is linearly independent.

Proof This is just as in Proposition 5.3. If
$$\lambda_1 w_1 + \lambda_2 w_2 + \cdots + \lambda_n w_n = 0$$
then
$$0 = F(w_i, 0) = F(w_i, \lambda_1 w_1 + \lambda_2 w_2 + \cdots + \lambda_n w_n) = \sum_{j=1}^{n} \lambda_j F(w_i, w_j) = \lambda_i F(w_i, w_i),$$
since $F(w_i, w_j) = 0$ for $i \neq j$. But if $F(w_i, w_i) \neq 0$ we must have $\lambda_i = 0$. □
The next lemma is just another way of stating the basic idea behind the Gram-Schmidt formula.
Lemma 7.11 Let $F$ be a bilinear form on a real vector space $V$, and suppose $w_1, w_2, \ldots, w_k \in V$ is orthogonal for $F$, i.e. $F(w_i, w_j) = 0$ for all $i \neq j$, and that $F(w_i, w_i) \neq 0$ for all $i \leq k$. Then for all $v \in V$ there is $u \in V$ such that $F(w_i, u) = 0$ for all $i \leq k$, and $v$ is a linear combination of vectors from $w_1, w_2, \ldots, w_k, u$. In other words, $V = \mathrm{span}(U \cup \{w_1, w_2, \ldots, w_k\})$ where
$$U = \{u \in V : F(w_i, u) = 0 \text{ for all } i \leq k\}.$$

Proof Define $u$ by the Gram-Schmidt formula,
$$u = v - \sum_{i=1}^{k} \frac{F(w_i, v)}{F(w_i, w_i)}\, w_i.$$
This expression has a well-defined meaning since each $F(w_i, w_i) \neq 0$. But then
$$F(w_i, u) = F(w_i, v) - \frac{F(w_i, v)}{F(w_i, w_i)}\, F(w_i, w_i) = 0,$$
for each $i$, and
$$v = u + \sum_{i=1}^{k} \frac{F(w_i, v)}{F(w_i, w_i)}\, w_i.$$
So every $v \in V$ is a linear combination of the $w_i$ and some $u \in U$. □
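Repeating the projection step of Lemma 7.11, while choosing the next vector as we go along, gives a procedure for finding a basis that is orthogonal for a symmetric form on $\mathbb{R}^n$, and hence the numbers $k$ and $m$ of Theorem 7.8. The following Python sketch is one possible arrangement of that idea, not a transcription of any algorithm in the text. The function names are invented here; when every remaining basis vector $u$ has $F(u, u) = 0$ it tries a sum $u_i + u_j$, which works because then $F(u_i + u_j, u_i + u_j) = 2F(u_i, u_j)$. The first test matrix is assumed to be the matrix of Example 6.1, and the second is a sample.

```python
from fractions import Fraction


def form(A, x, y):
    """Evaluate F(x, y) = x^T A y."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def orthogonal_basis(A):
    """Return a basis of R^n that is orthogonal for F(x, y) = x^T A y (A symmetric)."""
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    # 'rest' is always a basis of U = {u : F(w, u) = 0 for every chosen w}.
    rest = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    chosen = []
    while rest:
        idx = next((i for i, u in enumerate(rest) if form(A, u, u) != 0), None)
        if idx is None:
            pair = next(((i, j) for i in range(len(rest))
                         for j in range(i + 1, len(rest))
                         if form(A, rest[i], rest[j]) != 0), None)
            if pair is None:
                break  # F vanishes identically on U
            i, j = pair
            rest[i] = [a + b for a, b in zip(rest[i], rest[j])]
            idx = i
        w = rest.pop(idx)
        q = form(A, w, w)
        projected = []
        for u in rest:
            c = form(A, w, u) / q  # projection coefficient, as in Lemma 7.11
            projected.append([a - c * b for a, b in zip(u, w)])
        rest = projected
        chosen.append(w)
    return chosen + rest


def positive_and_negative_counts(A):
    """Return (k, m): how many basis vectors have F(w, w) > 0 and F(w, w) < 0."""
    values = [form(A, w, w) for w in orthogonal_basis(A)]
    return sum(1 for x in values if x > 0), sum(1 for x in values if x < 0)


A = [[2, 1, -1], [1, 2, 0], [-1, 0, Fraction(2, 3)]]  # assumed matrix of Example 6.1
H = [[0, 1], [1, 0]]                                   # sample: F(e_1, e_1) = F(e_2, e_2) = 0
```

Dividing each basis vector $w$ with $F(w, w) \neq 0$ by $\sqrt{|F(w, w)|}$ would then produce diagonal entries $\pm 1$; that step is left out of the sketch because it leaves rational arithmetic.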
Proof of Theorem 7.8 Our object first is to find a basis $w_1, \ldots, w_n$ which is orthogonal for $F$, i.e. one with respect to which the matrix of $F$ is diagonal, and this is done by an inductive argument. We will normalize the basis we get later.

To start things off, choose (if possible) $w_1 \in V$ so that $F(w_1, w_1) \neq 0$. If this is not possible, then it means $F(w, w) = 0$ for all $w$; hence $F(u, v) = 0$ for all $u, v \in V$, by Lemma 7.9, so the matrix of $F$ with respect to any basis of $V$ will be the zero matrix, which is certainly of the form required.

Now suppose inductively that we have found a set $\{w_1, \ldots, w_k\} \subseteq V$, where $k < n$, such that $F(w_i, w_j) = 0$ for all $i \neq j$ and $F(w_i, w_i) \neq 0$ for all $i$; we show how to obtain $w_{k+1}$. Define
\[ U = \{ u \in V : F(w_i, u) = 0 \text{ for all } i \leq k \}. \]
This is a subspace of $V$, since if $u, v \in U$ then $F(w_i, u) = F(w_i, v) = 0$ for all $i \leq k$, so $F(w_i, u + \lambda v) = 0$ for all $i \leq k$ and all scalars $\lambda$; hence $u + \lambda v \in U$.
Case 1. If there is $u \in U$ with $F(u, u) \neq 0$, let $w_{k+1} \in U$ be such a vector $u$, i.e. let $w_{k+1} \in U$ with $F(w_{k+1}, w_{k+1}) \neq 0$. Then as $w_{k+1} \in U$, $\{w_1, \ldots, w_k, w_{k+1}\}$ is orthogonal for $F$, and $F(w_{k+1}, w_{k+1}) \neq 0$ so our induction continues.

Case 2. If $F(u, u) = 0$ for all $u \in U$, we use the Gram-Schmidt idea to get a suitable basis for the whole of $V$. Lemma 7.11 shows that $V = \operatorname{span}(U \cup \{w_1, \ldots, w_k\})$. Note too that since $F(w, w) = 0$ for all $w \in U$ and $U$ is a subspace of $V$, then $F(u, v) = 0$ for all $u, v \in U$, by Lemma 7.9. Now take a basis $w_{k+1}, \ldots, w_l$ of $U$. Since $V = \operatorname{span}(U \cup \{w_1, \ldots, w_k\})$, we see that $V = \operatorname{span}\{w_1, \ldots, w_k, w_{k+1}, \ldots, w_l\}$. We prove that this is a basis of $V$, so in particular $l = n$. If
\[ \sum_{i=1}^{l} \lambda_i w_i = 0 \]
for some scalars $\lambda_i$, then $\lambda_i = 0$ for each $i \leq k$ by Lemma 7.10, so
\[ \sum_{i=k+1}^{l} \lambda_i w_i = 0 \]
and hence also $\lambda_i = 0$ for $i > k$ by the linear independence of $\{w_{k+1}, \ldots, w_l\}$, as required. Since $F(w_i, w_j) = 0$ for $i \leq k < j$ and $F(w_i, w_j) = 0$ for $i, j > k$, it is clear that the matrix of $F$ with respect to $\{w_1, w_2, \ldots, w_n\}$ is diagonal.

Normalization. The construction so far has gone by following 'case 1' above as far as possible, until there are no more vectors $u$ available with $F(u, u) \neq 0$. At this point, 'case 2' completed the basis. All that remains is to normalize the basis $w_1, \ldots, w_n$ that we have found. This is done by setting
\[ w_i' = \begin{cases} \dfrac{1}{\sqrt{F(w_i, w_i)}}\, w_i & \text{if } F(w_i, w_i) > 0, \\[2ex] \dfrac{1}{\sqrt{-F(w_i, w_i)}}\, w_i & \text{if } F(w_i, w_i) < 0, \\[2ex] w_i & \text{if } F(w_i, w_i) = 0. \end{cases} \]
By reordering the basis $w_1', \ldots, w_n'$ so that the vectors $v$ with $F(v, v) > 0$ come first, those with $F(v, v) < 0$ next and those with $F(v, v) = 0$ last, we get the required basis. □

There will in general be many different bases with respect to which $F$ has the form given in Theorem 7.8, but the integers $k$ and $m$ are always the same, regardless of the basis.
Theorem 7.12 (Second part of Sylvester's law of inertia) Suppose $F$ is a symmetric bilinear form on a real vector space $V$ which has two diagonal matrix representations $A$, $A'$ as in Theorem 7.8, with respect to ordered bases $e_1, \ldots, e_n$ and $e_1', \ldots, e_n'$. If $A$ has $k$ positive diagonal entries and $m$ negative diagonal entries, and $A'$ has $k'$ positive diagonal entries and $m'$ negative diagonal entries, then $k = k'$ and $m = m'$.

Proof Let $U$ be the subspace of $V$ of dimension $k$ spanned by $e_1, \ldots, e_k$, and let $W$ be the subspace of dimension $n - k'$ spanned by $e_{k'+1}', \ldots, e_n'$. Then for every vector $u = \sum_{i=1}^{k} u_i e_i$ in $U$, we have
\[ F(u, u) = \sum_{i=1}^{k} u_i^2 \geq 0, \]
and if $F(u, u) = 0$ then $u = 0$ since each $F(e_i, e_i) > 0$, and so $F$ is positive definite on $U$, by Lemma 6.3. On the other hand, for every vector $w = \sum_{i=k'+1}^{n} w_i e_i' \in W$, we have
\[ F(w, w) = -\sum_{i=k'+1}^{k'+m'} w_i^2 \leq 0. \]
Now if $k > k'$, the set
\[ B = \{ e_1, \ldots, e_k, e_{k'+1}', \ldots, e_n' \} \]
is a set of size at least $n + 1$, and $V$ has dimension $n$, so $B$ is linearly dependent. Hence for some scalars $\lambda_1, \ldots, \lambda_k, \mu_{k'+1}, \ldots, \mu_n$ not all zero
\[ \sum_{i=1}^{k} \lambda_i e_i + \sum_{j=k'+1}^{n} \mu_j e_j' = 0. \]
Let
\[ v = \sum_{i=1}^{k} \lambda_i e_i = -\sum_{j=k'+1}^{n} \mu_j e_j'. \]
Since $v \in \operatorname{span}(e_1, \ldots, e_k) = U$, we have $F(v, v) \geq 0$, and similarly, as $v \in \operatorname{span}(e_{k'+1}', \ldots, e_n') = W$, we have $F(v, v) \leq 0$. Hence $F(v, v) = 0$ and so $v = 0$ since $F$ is positive definite on $U$. But then
\[ \sum_{i=1}^{k} \lambda_i e_i = 0 = \sum_{j=k'+1}^{n} \mu_j e_j' \]
with not all the $\lambda_i$ and $\mu_j$ zero, giving either a linear dependence between the $e_i$ or a linear dependence between the $e_j'$, a contradiction. So $k \leq k'$.

The other cases are proved similarly: if $k < k'$ consider the subspaces spanned by $e_1', \ldots, e_{k'}'$ and $e_{k+1}, \ldots, e_n$; if $m < m'$ consider $e_1, \ldots, e_k, e_{k+m+1}, \ldots, e_n$ and $e_{k'+1}', \ldots, e_{k'+m'}'$; and if $m' < m$ consider $e_1', \ldots, e_{k'}', e_{k'+m'+1}', \ldots, e_n'$ and $e_{k+1}, \ldots, e_{k+m}$. In each case, note first whether $F$ is positive definite or negative definite on the spaces spanned by these vectors, using Lemma 6.3. □
This enables us to make the following definition without ambiguity.
Definition 7.13 With the notation of Theorem 7.8, $k + m$ is the rank of $F$, and $k - m$ is the signature.

Proposition 7.14 With the same notation, $F$ is positive definite (i.e. is an inner product) if and only if $k = n$ and $m = 0$. Similarly, $F$ is negative definite if and only if $k = 0$ and $m = n$.

7.3 Examples
Since quadratic forms are nothing but symmetric bilinear forms in disguise, and Sylvester's law of inertia (Section 7.2) classifies symmetric bilinear forms, Sylvester's law tells us something about quadratic forms too.
Example 7.15 We apply the method used to prove Sylvester's law of inertia to Example 7.4. Here we have
\[ Q((x_1, x_2, x_3)^T) = x_1^2 + 2x_1x_2 - 2x_1x_3 + 2x_2x_3 - 3x_3^2 \]
and the corresponding symmetric bilinear form $F$ given above. First, choose the vector $w_1 = (1, 0, 0)^T$, which has $Q((1, 0, 0)^T) = 1$. Now let $U_1$ be the subspace $U_1 = \{ u : F(w_1, u) = 0 \}$. As $F(w_1, (y_1, y_2, y_3)^T) = y_1 + y_2 - y_3$ we have
\[ U_1 = \{ (y_1, y_2, y_3)^T : y_1 + y_2 - y_3 = 0 \}, \]
which is spanned by $(-1, 1, 0)^T$ and $(1, 0, 1)^T$. Next we need a vector $v \in U_1$ with $Q(v) \neq 0$, if there is such a vector. Either spanning vector will do: $v = (1, 0, 1)^T$ gives $Q(v) = -4$, and $v = (-1, 1, 0)^T$ gives $Q(v) = -1$. We take $w_2 = (-1, 1, 0)^T$, and let $U_2$ be the space $U_2 = \{ u : F(w_1, u) = F(w_2, u) = 0 \}$. Then
\[ U_2 = \{ (y_1, y_2, y_3)^T : y_1 + y_2 - y_3 = -y_2 + 2y_3 = 0 \}, \]
which is spanned by $(-1, 2, 1)^T$. Moreover, $Q((-1, 2, 1)^T) = 0$, so we can take $w_3 = (-1, 2, 1)^T$. To sum up, with respect to the basis
\[ w_1 = (1, 0, 0)^T, \quad w_2 = (-1, 1, 0)^T, \quad w_3 = (-1, 2, 1)^T, \]
the quadratic form $Q$ has the following shape:
\[ Q(y_1 w_1 + y_2 w_2 + y_3 w_3) = y_1^2 - y_2^2. \]
In particular, this quadratic form has rank 2 and signature 0.
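The outcome of Example 7.15 can be checked mechanically. The sketch below is illustrative only: the matrix `A` is our own transcription of the quadratic form (diagonal coefficients on the diagonal, half of each cross-term coefficient off it), and the function names are hypothetical. It computes the matrix of $F$ with respect to the basis $w_1, w_2, w_3$ and reads off the rank and signature from the diagonal.

```python
# Symmetric matrix of Q = x1^2 + 2 x1 x2 - 2 x1 x3 + 2 x2 x3 - 3 x3^2
# (assumed transcription: off-diagonal entries are half the cross-term coefficients).
A = [[1, 1, -1],
     [1, 0, 1],
     [-1, 1, -3]]

def F(x, y):
    """Symmetric bilinear form F(x, y) = x^T A y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(3) for j in range(3))

basis = [[1, 0, 0], [-1, 1, 0], [-1, 2, 1]]   # w1, w2, w3 from the example

# Matrix of F with respect to the new basis: entry (i, j) is F(w_i, w_j).
gram = [[F(wi, wj) for wj in basis] for wi in basis]

diagonal = [gram[i][i] for i in range(3)]
k = sum(1 for d in diagonal if d > 0)   # number of positive diagonal entries
m = sum(1 for d in diagonal if d < 0)   # number of negative diagonal entries
rank, signature = k + m, k - m
```

Because the diagonal entries here are already $1$, $-1$ and $0$, no normalization step is needed before counting signs.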
In practice, there are two principal ways to find the rank and signature of a quadratic form. The first is to 'complete the square'. In the above example, the terms involving $x_1$ are
\[ x_1^2 + 2x_1x_2 - 2x_1x_3 = (x_1 + (x_2 - x_3))^2 - (x_2 - x_3)^2, \]
and so
\[ Q(x) = (x_1 + x_2 - x_3)^2 - x_2^2 + 4x_2x_3 - 4x_3^2. \]
Then we collect all remaining terms in $x_2$, to obtain
\[ Q(x) = (x_1 + x_2 - x_3)^2 - (x_2 - 2x_3)^2 + 0x_3^2. \]
To find the corresponding basis for the space in this example, we need to find the three vectors $w_1, w_2, w_3$ such that
\[ (x_1, x_2, x_3)^T = (x_1 + x_2 - x_3)w_1 + (x_2 - 2x_3)w_2 + x_3 w_3 \]
for all $x_1, x_2, x_3$. Putting $x_1 = 1$, $x_2 = x_3 = 0$ we obtain $(1, 0, 0)^T = w_1$. Putting $x_2 = 1$, $x_1 = x_3 = 0$ we obtain $w_1 + w_2 = (0, 1, 0)^T$, so $w_2 = (-1, 1, 0)^T$, and similarly, putting $x_1 = x_2 = 0$, $x_3 = 1$ we obtain $w_3 = (-1, 2, 1)^T$.

The second method uses the matrices instead, but is essentially the same calculation. Recall that a quadratic form, and the associated symmetric bilinear form, can be represented by a symmetric matrix. Changing the basis of the underlying space has the effect of changing the matrix $A$ to $P^TAP$, where $P$ is the base-change matrix. Now pre-multiplication by $P^T$ corresponds to performing certain row operations on $A$. Post-multiplication by $P$ corresponds to performing the same operations on the columns. We are trying to put $A$ into diagonal form $P^TAP$, and to do this, therefore, we must perform certain row operations to clear out each column in turn, simultaneously performing the same column operations to clear out the corresponding row.
Example 7.16 We apply row and column operations to a symmetric $3 \times 3$ matrix $A$, following each row operation by the corresponding column operation: $\rho_2 := \rho_2 + \rho_1$ and $\kappa_2 := \kappa_2 + \kappa_1$; then $\rho_3 := \rho_3 - 3\rho_2$ and $\kappa_3 := \kappa_3 - 3\kappa_2$; and finally a rescaling of row 3 and of column 3 by the same factor.

[The matrix $A$ and the intermediate matrices are illegible in this copy.]

So $A$ has rank 3 and signature 1. To find the corresponding matrix $P^T$ we have to keep track of the elementary row operations in the order in which they were carried out: $P^T$ is the product of the corresponding elementary matrices, with the matrix of the first operation on the right. We can then check that $P^TAP$ is the diagonal matrix obtained above.
This method works fine in most cases, but every so often you will come across a matrix which seems to resist diagonalizing.
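Example 7.17 Let $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. An obvious strategy to try is to swap rows 1 and 2, but this doesn't work:
\[ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \xrightarrow{\rho_1 \leftrightarrow \rho_2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \xrightarrow{\kappa_1 \leftrightarrow \kappa_2} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \]
The problem with this matrix is that for the corresponding quadratic form $Q$ and the usual basis $e_1, e_2$ of $\mathbb{R}^2$ we have $Q(e_1) = Q(e_2) = 0$, so the algorithm for Sylvester's law cannot start with either of these two vectors. What's more, the swap $(\rho_1, \rho_2)$ operation only swaps these two basis vectors around; it doesn't introduce anything new. So we need to find a different vector $v$ with $Q(v) \neq 0$. $e_1 + e_2$ is one such, so we start off with the operation $\rho_1 := \rho_1 + \rho_2$,
\[ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \xrightarrow{\rho_1 := \rho_1 + \rho_2} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \xrightarrow{\kappa_1 := \kappa_1 + \kappa_2} \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix} \xrightarrow{\rho_2 := \rho_2 - \frac12\rho_1} \begin{pmatrix} 2 & 1 \\ 0 & -\frac12 \end{pmatrix} \xrightarrow{\kappa_2 := \kappa_2 - \frac12\kappa_1} \begin{pmatrix} 2 & 0 \\ 0 & -\frac12 \end{pmatrix}. \]

Note that in this case, as in many others, the 'obvious' method of first using row operations to get the matrix into upper triangular form and then doing the corresponding column operations does not work.
\[ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \xrightarrow{\rho_1 := \rho_1 + \rho_2} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \xrightarrow{\rho_2 := \rho_2 - \rho_1} \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} \xrightarrow{\kappa_1 := \kappa_1 + \kappa_2} \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix} \xrightarrow{\kappa_2 := \kappa_2 - \kappa_1} \begin{pmatrix} 2 & -1 \\ -1 & 0 \end{pmatrix}. \]

In general, you need to do the column operation corresponding to the last row operation immediately after that row operation, or else the method will not work.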
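The rule of pairing each row operation with its column operation can be sketched in a few lines. This is an illustration under our own naming (the helper `add_row_and_col` is hypothetical, and indices are 0-based, so row 1 of the text is index 0), applied to the matrix $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ of the quadratic form $2xy$.

```python
from fractions import Fraction

def add_row_and_col(M, target, source, c):
    """Apply rho_target := rho_target + c * rho_source,
    then immediately kappa_target := kappa_target + c * kappa_source.
    Returns a new matrix; M is left unchanged."""
    n = len(M)
    M = [row[:] for row in M]
    for j in range(n):                 # row operation
        M[target][j] += c * M[source][j]
    for i in range(n):                 # matching column operation
        M[i][target] += c * M[i][source]
    return M

A = [[Fraction(0), Fraction(1)],
     [Fraction(1), Fraction(0)]]

# rho_1 := rho_1 + rho_2 followed by kappa_1 := kappa_1 + kappa_2
step1 = add_row_and_col(A, 0, 1, Fraction(1))
# rho_2 := rho_2 - (1/2) rho_1 followed by kappa_2 := kappa_2 - (1/2) kappa_1
step2 = add_row_and_col(step1, 1, 0, Fraction(-1, 2))
```

Each call keeps the matrix symmetric, which is exactly why the paired operations lead to a diagonal matrix while a run of row operations followed by a run of column operations need not.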
Example 7.18 We now do the same example as in Example 7.17, but this time by 'completing the square'. The quadratic form in question here is $Q(x, y) = 2xy$, and there is no obvious way of 'collecting the $x$ terms'. So we 'change basis' to $(x + y)$, $y$, writing $x = (x + y) - y$:
\[ Q(x, y) = 2xy = 2((x + y) - y)y = -2y^2 + 2y(x + y). \]
We can now collect up the $y$ terms as before,
\[ -2y^2 + 2y(x + y) = -2\left(y - \tfrac12(x + y)\right)^2 + \tfrac12(x + y)^2, \]
showing as before that the form has rank 2 and signature 0.
7.4 Applications to surfaces

Sylvester's law, or the leading minor test of the previous chapter, is often useful to determine the nature of stationary points of a function of several variables. We assume that we have a sufficiently smooth function $F : \mathbb{R}^n \to \mathbb{R}$ which is twice differentiable in a neighbourhood of some $a \in \mathbb{R}^n$. The graph of this function is thought of as a (possibly higher dimensional) surface in $\mathbb{R}^{n+1}$. For example, the function $F(x, y) = x^2y^2 - x^2 - y^2 + 1$ has as a 'graph' the surface consisting of all points $(x, y, z)^T \in \mathbb{R}^3$ satisfying $z = x^2y^2 - x^2 - y^2 + 1$.

Just as for a function of a single variable, we say that the function $F$ has a stationary point at $(a_1, \ldots, a_n)^T \in \mathbb{R}^n$ if all of the partial derivatives of $F$ in the directions of the $n$ coordinate axes vanish. For functions of two variables, these stationary points are the places where the tangent plane to the surface is actually horizontal, and for more variables you have to imagine the analogous thing in higher dimensions. In the case of $F(x, y) = x^2y^2 - x^2 - y^2 + 1$, we can calculate $\partial F/\partial x = 2xy^2 - 2x$ and $\partial F/\partial y = 2x^2y - 2y$. Solving $2xy^2 - 2x = 2x^2y - 2y = 0$ we see that these partial derivatives vanish at the points $(0, 0)^T$, $(1, 1)^T$, $(1, -1)^T$, $(-1, 1)^T$, $(-1, -1)^T$, so there are five stationary points of $F$.

The second derivatives of $F$, and Sylvester's law, can help us determine the nature of these stationary points. In the general case of $F(x_1, x_2, \ldots, x_n)$, we shall denote by $F_i(x_1, x_2, \ldots, x_n)$ the first partial derivative $\partial F/\partial x_i$, and by $F_{ij}(x_1, x_2, \ldots, x_n)$ the second partial derivative $\partial^2 F/\partial x_i \partial x_j$. Then the Taylor expansion of the function $F$ about the point $a$ up to quadratic terms is given by
\[ F(a + x) \approx F(a) + (F_1(a), \ldots, F_n(a)) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \frac12 (x_1, \ldots, x_n) \begin{pmatrix} F_{11}(a) & \cdots & F_{1n}(a) \\ \vdots & & \vdots \\ F_{n1}(a) & \cdots & F_{nn}(a) \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \tag{1} \]
for small values $x_i$. (If the reader hasn't seen this before, it is a good exercise to verify that the first- and second-order partial derivatives of the function on the right-hand side of (1) agree with those of $F(a + x)$.) At a stationary point $a$ of $F$, $F_1(a) = \cdots = F_n(a) = 0$ and the middle term of (1) vanishes, giving
\[ F(a + x) \approx F(a) + \frac12 (x_1, \ldots, x_n) \begin{pmatrix} F_{11}(a) & \cdots & F_{1n}(a) \\ \vdots & & \vdots \\ F_{n1}(a) & \cdots & F_{nn}(a) \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]
If the form
\[ Q(x) = (x_1, \ldots, x_n) \begin{pmatrix} F_{11}(a) & \cdots & F_{1n}(a) \\ \vdots & & \vdots \\ F_{n1}(a) & \cdots & F_{nn}(a) \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \]
is positive definite, then for all sufficiently small $x$, third- and higher order terms in $x$ in the Taylor expansion of $F$ can be ignored, and
\[ F(a + x) \approx F(a) + \tfrac12 Q(x) \geq F(a) \]
with strict inequality for all sufficiently small $x \neq 0$. Thus $a$ is a local minimum point of $F$. Similarly, if $Q$ were negative definite, then this point $a$ would be a local maximum point of $F$.

If the diagonal form for $Q$ given by Sylvester's law has both positive and negative values on the diagonal, then in some directions (corresponding to a basis vector with positive entry on the diagonal) the function $F$ will increase as we go away from $a$, and in other directions (corresponding to a negative entry on the diagonal) the function $F$ will decrease, so this point $a$ is neither a local maximum nor a local minimum; it is called a saddle point. Finally, if the form $Q$ has rank $k < n$ and signature $\pm k$, so its diagonal form consists of all nonnegative entries or all nonpositive entries with at least one zero entry on the diagonal, then the method cannot tell us whether the point is a local maximum, minimum, or saddle. In this case, it is necessary to look at higher derivatives in the directions corresponding to the zeros on the diagonal.
Example 7.19 We analyse the five stationary points $(0, 0)^T$, $(1, 1)^T$, $(1, -1)^T$, $(-1, 1)^T$, and $(-1, -1)^T$ of the function $F(x, y) = x^2y^2 - x^2 - y^2 + 1$. The second partial derivatives of $F$ are $\partial^2 F/\partial x^2 = 2y^2 - 2$, $\partial^2 F/\partial x \partial y = 4xy$, and $\partial^2 F/\partial y^2 = 2x^2 - 2$. So, if the stationary point is $(a, b)^T$, we must look at the quadratic form
\[ Q((x, y)^T) = (x, y) \begin{pmatrix} 2b^2 - 2 & 4ab \\ 4ab & 2a^2 - 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \]
Substituting $a = \pm 1$, $b = \pm 1$ we get the matrices $\begin{pmatrix} 0 & 4 \\ 4 & 0 \end{pmatrix}$ (twice) and $\begin{pmatrix} 0 & -4 \\ -4 & 0 \end{pmatrix}$ (twice), and each of these has rank 2 and signature 0, so these four stationary points are saddles. On the other hand, substituting $a = b = 0$ gives the negative definite matrix $\begin{pmatrix} -2 & 0 \\ 0 & -2 \end{pmatrix}$, so the origin is a maximum point. Figure 7.1 shows this surface in more detail.
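The classification of the five stationary points can be reproduced with a short computation. The sketch below is illustrative and restricted to two variables: the function names are our own, and instead of diagonalizing it uses the fact that for a real symmetric $2 \times 2$ matrix the signs of the determinant and trace determine the signs of the diagonal entries in Sylvester's form.

```python
def hessian(a, b):
    """Second partial derivatives of F(x, y) = x^2 y^2 - x^2 - y^2 + 1 at (a, b)."""
    return [[2 * b * b - 2, 4 * a * b],
            [4 * a * b, 2 * a * a - 2]]

def rank_and_signature(H):
    """Rank and signature of a real symmetric 2 x 2 matrix [[p, q], [q, r]]."""
    (p, q), (_, r) = H
    det = p * r - q * q
    if det < 0:                       # one positive and one negative entry
        return 2, 0
    if det > 0:                       # both entries share the sign of p
        return (2, 2) if p > 0 else (2, -2)
    trace = p + r                     # det == 0: at most one nonzero entry
    if trace > 0:
        return 1, 1
    if trace < 0:
        return 1, -1
    return 0, 0

def classify(H):
    rank, signature = rank_and_signature(H)
    if rank == 2 and signature == 2:
        return "local minimum"
    if rank == 2 and signature == -2:
        return "local maximum"
    if rank == 2:
        return "saddle"
    return "inconclusive"             # the second-derivative test says nothing

points = [(0, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]
nature = {pt: classify(hessian(*pt)) for pt in points}
```

For more than two variables one would instead diagonalize the matrix of second derivatives by paired row and column operations and count the signs on the diagonal.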
7.5 Sesquilinear and Hermitian forms
The proof of Sylvester's law given in Section 7.2 successfully classifies conjugate symmetric sesquilinear forms over complex vector spaces too. In this section we describe the results, but do not give full details of the proofs. The vector space V here is a finite dimensional space over the complex numbers throughout.
The analogue of the quadratic form obtained from a symmetric bilinear form is called the Hermitian form $H$ associated with a conjugate-symmetric sesquilinear form $F$. This is defined in the same way: $H(v) = F(v, v)$. Notice first that $H(v) = F(v, v) = \overline{F(v, v)} = \overline{H(v)}$, since $F$ is conjugate-symmetric, and therefore $H(v)$ is always real. Also, if $\lambda$ is any scalar, then $H(\lambda v) = F(\lambda v, \lambda v) = \bar{\lambda}\lambda F(v, v) = |\lambda|^2 H(v)$, giving a result analogous to Proposition 7.5.

We can still recover $F$ from $H$, but it is a bit more complicated than in the real case. There are various possible formulae. Here we present two of them.

Lemma 7.20 If $F : V \times V \to \mathbb{C}$ is a conjugate-symmetric sesquilinear form, and $H$ is the associated Hermitian form, then
(a) $F(v, w) = \frac12 \left( H(v + w) + iH(v - iw) - (1 + i)(H(v) + H(w)) \right)$;
(b) $F(v, w) = \frac14 \left( H(v + w) - H(v - w) + iH(v - iw) - iH(v + iw) \right)$.

Proof Expand the right-hand sides, and observe that everything cancels out except the terms in $F(v, w)$. □
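Both formulae of Lemma 7.20 can be spot-checked numerically. The following sketch assumes the convention $F(x, y) = \bar{x}^T A y$ (conjugate-linear in the first argument) for a conjugate-symmetric matrix $A$; the matrix, the vectors, and the helper names are chosen here purely for illustration.

```python
def F(A, x, y):
    """Sesquilinear form F(x, y) = conj(x)^T A y."""
    n = len(A)
    return sum(x[i].conjugate() * A[i][j] * y[j]
               for i in range(n) for j in range(n))

def H(A, v):
    """Associated Hermitian form H(v) = F(v, v)."""
    return F(A, v, v)

def combine(v, c, w):
    """Return the vector v + c * w."""
    return [vi + c * wi for vi, wi in zip(v, w)]

# Illustrative conjugate-symmetric matrix and vectors (assumptions of this sketch).
A = [[2, 1j],
     [-1j, 3]]
v = [1 + 2j, 3 + 0j]
w = [-1j, 2 + 1j]

direct = F(A, v, w)

# Formula (a) of the lemma.
formula_a = 0.5 * (H(A, combine(v, 1, w))
                   + 1j * H(A, combine(v, -1j, w))
                   - (1 + 1j) * (H(A, v) + H(A, w)))

# Formula (b) of the lemma.
formula_b = 0.25 * (H(A, combine(v, 1, w)) - H(A, combine(v, -1, w))
                    + 1j * H(A, combine(v, -1j, w))
                    - 1j * H(A, combine(v, 1j, w)))
```

With the opposite convention (conjugate-linear in the second argument) the same right-hand sides would recover $F(w, v)$ instead, so it is worth fixing the convention before relying on either formula.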
This gives a lemma analogous to Lemma 7.9.

Lemma 7.21 Let $F$ be a conjugate-symmetric sesquilinear form on a complex vector space $V$, and suppose $F(v, v) = 0$ for all $v \in V$. Then $F(v, w) = 0$ for all $v, w \in V$.

The proof of Sylvester's law now goes through in the complex case exactly as it did in the real case. What's more, the notions of 'rank' and 'signature' make sense in this case, just as they did in the real case.
Theorem 7.22 Let $F$ be a conjugate-symmetric sesquilinear form on an $n$-dimensional complex vector space $V$. Then there is a basis $w_1, \ldots, w_n$ of $V$ and $k, m \leq n$ such that $F(w_i, w_j) = 0$ for all $i \neq j$, $F(w_i, w_i) = 1$ for $i \leq k$, $F(w_i, w_i) = -1$ for $k < i \leq k + m$, and $F(w_i, w_i) = 0$ for $i > k + m$. Moreover, $k$ and $m$ are uniquely determined from $F$ (i.e. are independent of the choice of diagonalizing basis $w_1, \ldots, w_n$ as above).
Example 7.23 We apply row operations to find the rank and signature of a conjugate-symmetric form $F(x, y) = \bar{x}^T A y$ on $\mathbb{C}^3$.

[The matrix $A$ and the intermediate matrices are illegible in this copy.]

The method is as before: apply a row operation and then a corresponding column operation. However, in this case it is necessary to remember that the base-change formula for sesquilinear forms is $B = \bar{P}^T A P$, so each row operation should be followed by the conjugate of the same operation applied to columns. The operations that can still be read are the column operations $\kappa_2 := \kappa_2 - i\kappa_1$ and $\kappa_3 := \kappa_3 - i\kappa_1$, followed by $\rho_3 := \rho_3 + \rho_2$ and $\kappa_3 := \kappa_3 + \kappa_2$, so the form has rank 3 and signature 1.

To determine the base-change matrix, we multiply together the row operation matrices we used, and remember that this is the conjugate-transpose of the base-change matrix.
We conclude this section by linking up Sylvester's law with the leading minor test of the previous chapter. In the proof of the leading minor test, we started with an $n \times n$ matrix $A$ which was real symmetric or complex and conjugate-symmetric and considered the form $F(v, w) = \bar{v}^T A w$. We then defined the leading minors $D_0 = 1, D_1, \ldots, D_n$ of $A$ by saying that $D_i$ is the determinant of the top left $i \times i$ submatrix of $A$, and found vectors $f_1, \ldots, f_n$ in the underlying vector space $V$ ($\mathbb{R}^n$ or $\mathbb{C}^n$) which were orthogonal for the form $F$ and satisfied $F(f_i, f_i) = D_{i-1}D_i$. Note also that, even in the case over the complex numbers, these leading minors $D_i$ are all real. By Lemma 7.10 (which works over $\mathbb{C}$ in just the same way), if all the $D_i$ are nonzero then $f_1, \ldots, f_n$ is a basis of $V$. In this particular case, we can read off the rank and signature of the form directly.

Proposition 7.24 Suppose $A$ is an $n \times n$ symmetric (or, over $\mathbb{C}$, conjugate-symmetric) matrix over $\mathbb{R}$ or $\mathbb{C}$ with leading minors $D_0 = 1, D_1, \ldots, D_n$ all nonzero. Then the form $F$ defined by $F(v, w) = \bar{v}^T A w$ on $\mathbb{R}^n$ or $\mathbb{C}^n$ has rank $n$ and signature equal to the number of $i$ with $0 \leq i < n$ such that $D_i$ and $D_{i+1}$ have the same sign, minus the number of such $i$ for which $D_i$ and $D_{i+1}$ have different signs.

Proof Lemma 7.10 shows the $f_i$ form a basis, and the signature of $F$ is the number of $i < n$ such that $D_i, D_{i+1}$ have the same sign, minus the number of $i < n$ such that $D_i, D_{i+1}$ have different sign. □

This proposition says nothing if one or more of the leading minors are zero. For example, the matrix
\[ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \]
has leading minors $D_1 = 0$ and $D_2 = -1$, so Proposition 7.24 doesn't apply. However, Example 7.17 showed that this matrix has rank 2 and signature 0.

Summary
To sum up, in this part of the book we have studied bilinear forms on both real and complex finite dimensional vector spaces and also sesquilinear forms on finite dimensional complex vector spaces, and went some way to classify them. We showed that they are all given by a square matrix $A$ and are isomorphic to
\[ F(x, y) = x^T A y \]
on $\mathbb{R}^n$, in the case of bilinear forms on vector spaces over $\mathbb{R}$, or to
\[ F(x, y) = \bar{x}^T A y \]
on $\mathbb{C}^n$, in the case of sesquilinear forms on spaces over $\mathbb{C}$. We then studied three important cases in more detail: firstly, inner products, which we showed (using the Gram-Schmidt theorem) are always isomorphic to the usual inner product
\[ F(x, y) = \bar{x}^T I y \]
on $\mathbb{R}^n$ or $\mathbb{C}^n$; next symmetric bilinear forms, which were not all isomorphic, but (by Sylvester's law) could always be diagonalized and could be classified by the dimension of the vector space and the rank and signature of the form. Sylvester's law can be applied to classify quadratic forms too, because of the close connections between quadratic forms and symmetric bilinear forms in Section 7.1. These results were applied to surface sketching in three or more dimensions in Section 7.4. Finally, we discussed conjugate-symmetric sesquilinear forms and Hermitian forms, and showed that Sylvester's law applies to these also.

In the next part of this book, we shall go on to look at another important application of matrices, that of matrices as representing linear transformations, and try to investigate the structure of such linear transformations in detail using matrix methods. There will be many applications too, such as to solving simultaneous differential equations.

Exercises
Exercise 7.1 Which of the following are quadratic forms on $\mathbb{R}^n$ for some $n$? Write down the matrix representation (with respect to the usual basis) of the corresponding symmetric bilinear forms.
(a) $Q((x, y, z)^T) = x^2 + 2y^2 + 3z^2 + xyz$
(b) $Q((x, y, z)^T) = x^2 + xy$
(c) $Q((x, y, z)^T) = (x - y)^2 + (x - z)^2 + (y - z)^2$.

Exercise 7.2 (a) Let $F$ be the symmetric bilinear form on $\mathbb{R}^3$ defined by
where v = (v 1 , v2 , v3 ) T and w = (w 1 , w2 , w 3 ) T . By completing the square, or otherwise, find a basis with respect to which the matrix of F is diagonal, and give this matrix. (b) Do the same for the form
Exercise 7.3 Find a basis of
JR:.4 for which the representation of 1
2
-�)
1 1 3 -1 y -1 -1 -1 0
is diagonal, and write the matrix of $F$ with respect to your basis. (Use row and column operations.)

Exercise 7.4 Find the rank and signature of each of the following quadratic forms. Use both the methods (i) completing the square, and (ii) row and column operations, and verify that they give the same answers.
(a) $Q((a, b, c, d)^T) = 2a^2 - 6ab + 2ac - 8bc - c^2 + 4bd - 2d^2$
Exercises 123 ( b) Q
m�
(
X
y z)
o � Dm
(c) $Q((x, y, z)^T) = 14x^2 + y^2 + 5z^2 - 10xy + 6yz - 54xz$
(d) $Q((x, y, z)^T) = x^2 + y^2 + z^2 - 2xy + 2xz$.
In each case, find a basis with respect to which the matrix of $Q$ is diagonal.

Exercise 7.5 Check that the proof of Theorem 7.22 goes through exactly as in the real case.

Exercise 7.6 Determine the rank and signature of the following conjugate-symmetric matrices.
( a)
C' _',) -z -z
-1 -1 - 1
(b)
G
+)
2 3 1 - 2i
i
(� D i i -2 1
( c)
Exercise 7. 7 For each of the following matrices A, find an invertible matrix P so that P T AP is diagonal.
( a) (c )
u
c�i
1+i -1
)
I+i 5 - 1 ; 3i i - 1 - 3i -z
(b)
)
( 1 -73i
(d )
)
1 + 3i -1 2 2+i 6 1+i 2 i 1-i 3 4 2
-
(�
�)
Exercise 7.8 Suppose $A$ is an $n \times n$ complex conjugate-symmetric matrix, and $F(x, y) = \bar{x}^T A y$ for all $x, y \in \mathbb{C}^n$. Show that if $\det A \neq 0$ then $F$ has rank $n$.

Exercise 7.9 Find all stationary points of the following functions and investigate their nature.
(a) $f(x, y) = x^2y^2 - 4x^2y + 3x^2 - y^2 + 4y$
(b) $f(x, y) = x^3 - x^2y - x^2 + y^2$
(c) $f(x, y) = x^3 + y^3 + 9(x^2 + y^2) + 12xy$
(d) $f(x, y) = x^3 - 15x^2 - 20y^2 + 5$
(e) $f(x, y) = x^2 + y^2 + (2/x) + (2/y)$
(f) $f(x, y) = x^3 + y^3 - 2(x^2 + y^2) + 3xy$
(g) $f(x, y) = y^2 \cos x - \cos x + y$.
The next few exercises discuss skew-symmetric or alternating bilinear forms $F$ on a vector space $V$ over the reals. These forms are defined to be bilinear forms $F$ satisfying $F(v, v) = 0$ for all $v \in V$, or equivalently (see Exercise 4.8) bilinear forms $F$ satisfying $F(x, y) = -F(y, x)$ for all $x, y \in V$.

Exercise 7.10 Let $V$ be a finite dimensional real vector space with a skew-symmetric bilinear form $F$. Suppose $v_1, v_2 \in V$ satisfy $F(v_1, v_2) \neq 0$.
124 Quadratic forms and Sylvester's law of inertia ( a) Show that
{v1,v is linearly independent, and hence extends to a basis {v! ,V2 , · · · ,vn } of2 }V. ( b ) Show that the vectors w1, w2 , , Wn defined by .
•
.
and, for each i ;;: 3,
form a basis of V.
( c ) Compute the first two rows ( and first two columns ) of the matrix of F with
respect to
WI, w2 , . . . , Wn·
Exercise 7 . 1 1 Let F b e a skew-symmetric bilinear form o n a finite dimensional real vector space V . Show that there is a basis v1 , of V such that, with respect to this basis , F has matrix
. . . , Vn
where each submatrix B; is either the 1 x 1 zero matrix
(0) or is the 2
x
2 matrix
( -� �) [Hint: construct a basis inductively using the previous exercise. There are two cases. Given nonzero v 1 , if F(v1, w) 0 for all w E V then we get B 1 (0); else there is v2 with F ( v1, v2 ) -1- 0 and the previous exercise applies. ] 0
=
=
Part III Linear transformations

8 Linear transformations

In this part of the book, we will study linear transformations in the same spirit as we studied inner products and quadratic forms. In this chapter we will see how a linear transformation can be represented by a matrix, with respect to a particular basis, and in later chapters we will discuss how to find the 'best' basis, so that the corresponding matrix is as simple as possible, with applications to the solution of many kinds of differential and difference equations, and to quadratic forms.

Our vector spaces will be over a field $F$. For this chapter, $F$ can be any field whatsoever. If it helps, it is possible to think of $F$ as either $\mathbb{R}$ or $\mathbb{C}$ without much being lost. In later chapters, we will need some further properties of the field, taking the form that certain polynomials have roots; these properties are always true of the field $\mathbb{C}$, so it is safe to think of $F$ as $\mathbb{C}$ throughout the rest of the book.

8.1 Basics
We start with the definition of the basic objects of study in this part of the book.

Definition 8.1 If $V$ and $W$ are two vector spaces over the same field $F$, then a linear transformation from $V$ to $W$ (also called a linear map, or homomorphism) is a map $f : V \to W$ satisfying the condition
\[ f(\lambda u + \mu v) = \lambda f(u) + \mu f(v) \tag{1} \]
for all $u, v \in V$ and all scalars $\lambda, \mu \in F$. The space $V$ is called the domain of $f$, and $W$ is called the codomain of $f$.

Thus a linear transformation is the same as an isomorphism of vector spaces (Definition 2.32), except that we drop the requirements that it be injective and surjective.

Lemma 8.2 A linear transformation $f : V \to W$ satisfies
(a) $f(0) = 0$,
(b) $f(\lambda u) = \lambda f(u)$,
(c) $f(-u) = -f(u)$,
(d) $f(u + v) = f(u) + f(v)$, and
(e) $f\left(\sum_{i=1}^{n} \lambda_i u_i\right) = \sum_{i=1}^{n} \lambda_i f(u_i)$,
for all $u, v \in V$ and all scalars $\lambda, \lambda_i$.

Proof For (a) use $f(0) = f(0u + 0v) = 0f(u) + 0f(v) = 0$ for all vectors $u, v \in V$. For (b), apply Definition 8.1 for $\mu = 0$, and for (c), apply part (b) with $\lambda = -1$. Part (d) is just Definition 8.1 where $\lambda = \mu = 1$, and (e) is by repeated applications of Definition 8.1. □
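Example 8.3 Let $V$ be the real vector space $\mathbb{R}^3$ and let $W$ be $\mathbb{R}^2$. We define linear transformations $f, g : V \to W$ by
\[ f((x, y, z)^T) = (2x + y, \; y + z)^T \]
and
\[ g((x, y, z)^T) = (x, x)^T. \]
It is easy to check that Definition 8.1 is satisfied. For example,
\[ f(\lambda (x, y, z)^T + \mu (x', y', z')^T) = \left( 2(\lambda x + \mu x') + (\lambda y + \mu y'), \; (\lambda y + \mu y') + (\lambda z + \mu z') \right)^T = \lambda f((x, y, z)^T) + \mu f((x', y', z')^T). \]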
Definition 8.4 Given $f : V \to W$ as in Definition 8.1, the image (or range) of $f$ is $\{ f(v) : v \in V \}$ (written $f(V)$ or $\operatorname{im}(f)$). The kernel (or nullspace) of $f$ is $\{ v \in V : f(v) = 0 \}$ (written $\ker(f)$).

Proposition 8.5 If $f : V \to W$ is a linear transformation, then $\operatorname{im}(f)$ is a subspace of $W$ and $\ker(f)$ is a subspace of $V$.

Proof We verify the conditions in Lemma 2.9. If $v, w \in \ker f$ then $f(v + \lambda w) = f(v) + \lambda f(w) = 0 + \lambda 0 = 0$, so $v + \lambda w \in \ker f$. If $v, w \in \operatorname{im} f$ then $v = f(x)$ and $w = f(y)$ for some $x, y \in V$, so $v + \lambda w = f(x) + \lambda f(y) = f(x + \lambda y) \in \operatorname{im} f$. □

Example 8.6 For $f$ and $g$ as in Example 8.3, $f$ is surjective since given $(u, v)^T \in \mathbb{R}^2$ we have $f((u, -u, u + v)^T) = (u, v)^T$, but $g$ is not surjective as for example $(1, -1)^T$ is not equal to any $(x, x)^T$. The kernel of $f$ is
\[ \ker f = \{ (x, y, z)^T : 2x + y = y + z = 0 \}, \]
which as you can check is spanned by the vector $(1, -2, 2)^T$. The kernel of $g$ is spanned by $(0, 1, 0)^T$, $(0, 0, 1)^T$.
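The claims of Example 8.6 are quick to confirm by direct evaluation. In the sketch below the formulas for `f` and `g` are assumptions inferred from the description of $\ker f$ and of the image of $g$ in the example, and the helper `is_linear_on` is a hypothetical spot check of Definition 8.1 on particular vectors, not a proof.

```python
def f(v):
    """f((x, y, z)^T) = (2x + y, y + z)^T, as inferred from ker f in Example 8.6."""
    x, y, z = v
    return (2 * x + y, y + z)

def g(v):
    """g((x, y, z)^T) = (x, x)^T, as inferred from the image of g in Example 8.6."""
    x, y, z = v
    return (x, x)

def is_linear_on(h, u, v, lam, mu):
    """Check h(lam*u + mu*v) == lam*h(u) + mu*h(v) for one choice of data."""
    combo = tuple(lam * a + mu * b for a, b in zip(u, v))
    expected = tuple(lam * a + mu * b for a, b in zip(h(u), h(v)))
    return h(combo) == expected

# The spanning vectors of the kernels are sent to zero.
kernel_f_ok = f((1, -2, 2)) == (0, 0)
kernel_g_ok = g((0, 1, 0)) == (0, 0) and g((0, 0, 1)) == (0, 0)

# Surjectivity of f: (u, -u, u + v) is a preimage of (u, v).
u0, v0 = 7, -3
preimage_ok = f((u0, -u0, u0 + v0)) == (u0, v0)

linear_ok = (is_linear_on(f, (1, 2, 3), (4, 5, 6), 2, -1)
             and is_linear_on(g, (1, 2, 3), (4, 5, 6), 2, -1))
```

Evaluating on spanning vectors is enough to test membership of the kernel, because the kernel is a subspace.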
The following is a particularly useful criterion for testing if a linear transformation is injective.

Proposition 8.7 A linear transformation $f : V \to W$ is injective if and only if $\ker f$ is the zero subspace $\{0\}$ of $V$.

Proof If $v \neq 0$ is in $\ker f$ then $f(v) = f(0) = 0$ so $f$ is not injective. Conversely, if $f(v) = f(w)$ for some $v \neq w$, then $v - w \neq 0$ and $f(v - w) = f(v) - f(w) = 0$ so $v - w \in \ker f$ and hence $\ker f \neq \{0\}$. □

Proposition 8.5 allows us to make the following definition.

Definition 8.8 The rank of $f$ is the dimension of $\operatorname{im}(f)$, and the nullity of $f$ is the dimension of $\ker(f)$. We write $r(f)$ for the rank of $f$, and $n(f)$ for its nullity.

Example 8.9 In Examples 8.3 and 8.6 we have $n(f) = 1$ and $n(g) = 2$. We can calculate the ranks of $f$ and $g$ as follows: firstly, $r(f) = 2$ as $f$ maps $\mathbb{R}^3$ to $\mathbb{R}^2$ surjectively; secondly, $r(g) = 1$ as $(1, 1)^T$ forms a one-element basis of $\operatorname{im} g$. In both cases we check that $r(f) + n(f) = 3 = \dim \mathbb{R}^3$ and $r(g) + n(g) = 3 = \dim \mathbb{R}^3$. This is no accident; in fact, these are just particular cases of the rank-nullity formula.
Theorem 8.10 (The rank-nullity formula) If $f : V \to W$ is a linear map, then
\[ r(f) + n(f) = \dim(V). \]

Proof Choose a basis $\{v_1, \ldots, v_k\}$ for $\ker(f)$, and extend this basis to a basis $\{v_1, \ldots, v_m\}$ of $V$, so that $k = n(f)$ and $m = \dim(V)$. Now any vector $v$ in $V$ is of the form $v = \sum_{i=1}^{m} \lambda_i v_i$, so
\[ f(v) = f\left( \sum_{i=1}^{m} \lambda_i v_i \right) = \sum_{i=1}^{m} \lambda_i f(v_i), \quad \text{since } f \text{ is linear,} \]
\[ \phantom{f(v)} = \sum_{i=k+1}^{m} \lambda_i f(v_i), \quad \text{since } f(v_1) = \cdots = f(v_k) = 0. \]
Thus the vectors $f(v_{k+1}), \ldots, f(v_m)$ span the image of $f$. On the other hand if $\sum_{i=k+1}^{m} \lambda_i f(v_i) = 0$ then
\[ f\left( \sum_{i=k+1}^{m} \lambda_i v_i \right) = 0, \]
so
\[ \sum_{i=k+1}^{m} \lambda_i v_i \in \ker(f), \]
and hence there are scalars $\mu_j$ such that
\[ \sum_{i=k+1}^{m} \lambda_i v_i = \sum_{j=1}^{k} \mu_j v_j. \]
But $\{v_1, \ldots, v_m\}$ is a linearly independent set and
\[ \sum_{j=1}^{k} \mu_j v_j + \sum_{i=k+1}^{m} (-\lambda_i) v_i = 0, \]
so $\lambda_i = 0$ for all $i$. Therefore $\{f(v_{k+1}), \ldots, f(v_m)\}$ is a linearly independent set, and hence a basis for $\operatorname{im}(f)$. So
\[ r(f) = m - k = \dim(V) - n(f) \]
as required. □
Rank and nullity give useful ways of determining if a transformation is in jective or surjective.
Proposition 8.11 If $f : V \to W$ is a linear transformation of finite dimensional vector spaces $V$, $W$ (over the same field $F$) then
(a) $f$ is injective if and only if $n(f) = 0$, and
(b) $f$ is surjective if and only if $r(f) = \dim W$.

Proof The map $f$ is injective if and only if $\ker f = \{0\}$, if and only if $n(f) = 0$ by Proposition 8.7, and $f$ is surjective if and only if $\dim(\operatorname{im} f) = \dim W$ by Corollary 2.28. □

Corollary 8.12 If $f : V \to W$ is a linear transformation of vector spaces $V$, $W$, then $f$ is injective if and only if $r(f) = \dim V$ and $f$ is surjective if and only if $n(f) = \dim V - \dim W$.

Proof By the previous proposition and the rank-nullity formula. □
Just as matrices provided a canonical family of examples of bilinear forms in the previous part of this book, they provide examples of linear transformations too.
Example 8.13 Let $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$, and let $A$ be an $m \times n$ matrix with real entries. Then we may define the linear transformation $f_A : V \to W$ by
\[ f_A(v) = Av. \]
(Note that if $v$ is an $n \times 1$ matrix, $Av$ is an $m \times 1$ matrix, so the matrix multiplication in this definition makes sense.) The transformation $f_A$ is linear by distributivity of matrix multiplication: $A(\lambda u + \mu v) = \lambda Au + \mu Av$.

We can do the same over any field $F$. If $A$ is an $m \times n$ matrix with entries from $F$, we can define the linear transformation $f_A : F^n \to F^m$ by $f_A(v) = Av$. It is convenient to extend the terminology and notation for image, kernel, rank, and nullity to matrices in this way: if $A$ is an $m \times n$ matrix over the field $F$, then $\operatorname{im} A$ denotes the image of $f_A$,
\[ \operatorname{im} A = \{ Ax : x \in F^n \}, \]
$\ker A$ denotes the kernel
\[ \ker A = \{ x \in F^n : Ax = 0 \}, \]
and $r(A)$, $n(A)$ denote the rank and nullity of $A$, i.e. the dimensions of $\operatorname{im} A$ and $\ker A$ respectively.

Exercise 8.1 Show that, for an $m \times n$ matrix $A$ over a field $F$, the subspace $\operatorname{im} A$ of $F^m$ is the subspace spanned by the columns of $A$.
It is interesting to note that the rank of an $m \times n$ matrix $A$ over $\mathbb{R}$, as defined using echelon form in Section 1.5, is the same as the rank of the linear transformation $f_A$ just defined. To see this, consider a sequence of row operations converting $A$ to echelon form $B$. Since each such row operation corresponds to an $m \times m$ matrix $R_i$ over $\mathbb{R}$, we have
\[ B = R_k \cdots R_2 R_1 A. \]
Now, each row operation matrix $R_i$ is invertible, so
\[ A = R_1^{-1} R_2^{-1} \cdots R_k^{-1} B. \]
This means that the linear transformation $f_A$ is equal to the composition of the transformations $f_{R_1^{-1}}, f_{R_2^{-1}}, \ldots, f_{R_k^{-1}}, f_B$. You can verify that each $f_{R_i^{-1}}$ is actually a bijection $\mathbb{R}^m \to \mathbb{R}^m$. It follows that $\dim(\operatorname{im} f_A) = \dim(\operatorname{im} f_B)$. But since $B$ is in echelon form we can spot a basis of $\operatorname{im} f_B$ immediately. If the nonzero rows of $B$ are the first $r$ rows, with leading entries $b_{i n_i} \neq 0$, then the subspace $\operatorname{im} f_B$, which is spanned by the columns of $B$, has basis given by
\[ (1, 0, 0, \ldots, 0)^T, \quad (0, 1, 0, \ldots, 0)^T, \quad (0, 0, 1, \ldots, 0)^T, \quad \ldots \]
(There is one of these vectors for each nonzero row in $B$.) So $\operatorname{im} f_B$ has dimension equal to the number of nonzero rows in $B$, i.e. equal to $\operatorname{rk} A$ as defined in Section 1.5.
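Example 8.14 Let $A$ be the matrix

$$A = \begin{pmatrix} 1 & 1 & 2 & 3 \\ 1 & 2 & -1 & 1 \\ 1 & 3 & -4 & -1 \end{pmatrix}.$$

This is a $3 \times 4$ matrix so represents a linear transformation $\mathbb{R}^4 \to \mathbb{R}^3$. We can perform row operations to get $A$ into echelon form as follows.

$$A \xrightarrow{\rho_2 := \rho_2 - \rho_1} \begin{pmatrix} 1 & 1 & 2 & 3 \\ 0 & 1 & -3 & -2 \\ 1 & 3 & -4 & -1 \end{pmatrix} \xrightarrow{\rho_3 := \rho_3 - \rho_1} \begin{pmatrix} 1 & 1 & 2 & 3 \\ 0 & 1 & -3 & -2 \\ 0 & 2 & -6 & -4 \end{pmatrix} \xrightarrow{\rho_3 := \rho_3 - 2\rho_2} \begin{pmatrix} 1 & 1 & 2 & 3 \\ 0 & 1 & -3 & -2 \\ 0 & 0 & 0 & 0 \end{pmatrix} = B.$$

So $A$ has rank 2, and hence nullity $\dim(\mathbb{R}^4) - 2 = 2$. By multiplying the elementary row operation matrices together we find that $RA = B$, where $R$ is the matrix

$$R = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & -2 & 1 \end{pmatrix}.$$

Now, $\operatorname{im} B$ is spanned by the columns of $B$, so a basis of this space is $(1, 0, 0)^T$ and $(0, 1, 0)^T$. It follows that $\operatorname{im} A$ is spanned by $R^{-1}(1, 0, 0)^T$ and $R^{-1}(0, 1, 0)^T$, and on calculating $R^{-1}$ we find that

$$R^{-1}(1, 0, 0)^T = (1, 1, 1)^T, \qquad R^{-1}(0, 1, 0)^T = (0, 1, 2)^T,$$

so a basis of $\operatorname{im} A$ is formed by $(1, 1, 1)^T$ and $(0, 1, 2)^T$.

To work out the kernels, note that

$$\ker B = \left\{ (x, y, z, w)^T : \begin{array}{l} x + y + 2z + 3w = 0 \\ y - 3z - 2w = 0 \end{array} \right\} = \left\{ (x, y, z, w)^T : \begin{array}{l} x = -2z - 3w - y \\ y = 3z + 2w \end{array} \right\} = \left\{ (-5z - 5w,\ 3z + 2w,\ z,\ w)^T : z, w \in \mathbb{R} \right\},$$

so $\ker B$ is spanned by $(-5, 3, 1, 0)^T$ and $(-5, 2, 0, 1)^T$. Also, $\ker A = \ker B$ since for any vector $v$,

$$Av = 0 \iff RAv = 0 \iff Bv = 0$$

because $R$ is invertible.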
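The calculation in Example 8.14 can be checked mechanically. The following is only a sketch: it assumes that $A$ is the matrix with rows $(1,1,2,3)$, $(1,2,-1,1)$, $(1,3,-4,-1)$ (the reading that agrees with the row operations and kernel equations of the example), and the helper names `echelon`, `rank` and `apply` are our own, not notation from the text. Exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction


def echelon(rows):
    """Return a row-echelon form of the matrix, using exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        # Clear the entries below the pivot.
        for r in range(pivot_row + 1, len(m)):
            factor = m[r][col] / m[pivot_row][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m


def rank(rows):
    """Rank = number of nonzero rows in an echelon form."""
    return sum(1 for row in echelon(rows) if any(x != 0 for x in row))


def apply(rows, v):
    """Matrix-vector product Av."""
    return [sum(a * b for a, b in zip(row, v)) for row in rows]


# Assumed reading of the matrix A of Example 8.14.
A = [[1, 1, 2, 3], [1, 2, -1, 1], [1, 3, -4, -1]]
B = echelon(A)
r = rank(A)
nullity = len(A[0]) - r  # rank-nullity formula with dim V = 4
```

Here `apply(A, [-5, 3, 1, 0])` and `apply(A, [-5, 2, 0, 1])` both return the zero vector, in agreement with the basis of $\ker A$ found above.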
8.2 Arithmetic operations on linear transformations
You will be used to the idea of adding functions together, defining the sum of two functions $f + g$ by

$$(f + g)(x) = f(x) + g(x),$$

and of scaling functions, defining the function $\lambda f$ by

$$(\lambda f)(x) = \lambda(f(x)).$$
For this to make sense, we simply need the codomain of the functions to have an addition and a scalar multiplication defined on it, and this is true in the case of linear transformations since the codomain here is a vector space $W$. It should come as no surprise that we expect these operations on functions to satisfy the vector space axioms. In fact, if we take a set $S$ of functions from any set $X$ to a vector space $W$, such that $S$ is closed under addition and scalar multiplication, then $S$ is automatically a vector space. We do not need any other conditions on the functions, or on the domain $X$ of these functions. To prove this, we simply have to check the vector space axioms: for example, for any $f, g \in S$ and any scalar $\lambda$ we have $\lambda(f + g) = (\lambda f) + (\lambda g)$, since

$$(\lambda(f + g))(x) = \lambda((f + g)(x)) = \lambda(f(x) + g(x)) = \lambda(f(x)) + \lambda(g(x))$$

since $W$ is a vector space, and so

$$(\lambda(f + g))(x) = (\lambda f)(x) + (\lambda g)(x) = ((\lambda f) + (\lambda g))(x).$$
The other axioms are equally easy to check.

Example 8.15 Let $\mathcal{D}[0, 1]$ be the set of all differentiable functions from $[0, 1]$ to $\mathbb{R}$. The codomain of these functions is $\mathbb{R}$, which may be regarded as a one-dimensional vector space over itself. Then $\mathcal{D}[0, 1]$ is a vector space with the operations defined above, since it is closed under addition and scalar multiplication.
Example 8.16 Let $\mathcal{L}(V, W)$ be the set of all linear transformations from a vector space $V$ to a vector space $W$ over the same field $F$. Then $\mathcal{L}(V, W)$ is itself a vector space, over the same field $F$. The zero element in $\mathcal{L}(V, W)$ is the map that takes every vector in $V$ to zero in $W$.
The special case $\mathcal{L}(V, V)$ will be especially important for the study of linear transformations and applications. This space may be given an additional operation, namely composition of functions, by setting $f \circ g$ to be the function defined by

$$(f \circ g)(x) = f(g(x)).$$

Composition is a sort of 'multiplication' of linear transformations from $V$ to $V$, but you should be warned that not all the laws you might expect for multiplication hold in this case. For example, the commutativity law $f \circ g = g \circ f$ is false in general, although the associativity law $(f \circ g) \circ h = f \circ (g \circ h)$ is always true. For the record, we list all the properties of the arithmetic operations on linear transformations defined that hold for all $f, g, h \in \mathcal{L}(V, V)$ and all scalars $\lambda, \mu$.

1. (Associativity.) $(f + g) + h = f + (g + h)$.
2. (Commutativity.) $f + g = g + f$.
3. (Zero.) $0 + f = f + 0 = f$, where $0$ is the zero map defined by $0(x) = 0$.
4. $\lambda(\mu f) = (\lambda\mu) f$.
5. $(\lambda + \mu) f = \lambda f + \mu f$.
6. $0f = 0$.
7. $f + (-1)f = 0$.
8. $(f \circ g) \circ h = f \circ (g \circ h)$.
9. $I \circ f = f \circ I = f$, where $I$ is the identity map defined by $I(x) = x$.
10. $f \circ (g + h) = (f \circ g) + (f \circ h)$.
11. $(g + h) \circ f = (g \circ f) + (h \circ f)$.

A diligent reader will stop to check these properties at this point, but for others, the main idea of the next section provides an alternative proof.

8.3 Representation by matrices
If $f : V \to W$ is a linear map, $v_1, \ldots, v_n$ is a basis for $V$, and $w_1, \ldots, w_m$ is a basis for $W$, then each $f(v_j)$ is in $W$, so can be written as a linear combination of the basis vectors. Thus we have

$$f(v_j) = \sum_{i=1}^{m} a_{ij} w_i$$

for some scalars $a_{ij}$. (These scalars are uniquely determined by Proposition 2.18.) The matrix $A = (a_{ij})$ is called the matrix of $f$ with respect to the ordered bases $v_1, \ldots, v_n$ of $V$ and $w_1, \ldots, w_m$ of $W$.
Example 8.17 (a) The matrix of the zero map $0 : V \to W$ taking any $v \in V$ to $0$ is just the zero matrix $0$. To see this, note that

$$\sum_{i=1}^{m} a_{ij} w_i = 0(v_j) = 0 = \sum_{i=1}^{m} 0\, w_i,$$

so each $a_{ij} = 0$ since these coefficients are unique.

(b) The matrix of the identity map $I : V \to V$ with $I(v) = v$, with respect to the same basis $v_1, \ldots, v_n$ in both domain and codomain, is the identity matrix $I$. Again, to see this we just need to note that

$$I(v_j) = v_j = \sum_{i=1}^{n} \delta_{ij} v_i,$$

where $\delta_{ij} = 0$ if $i \neq j$ and $\delta_{ij} = 1$ if $i = j$.

Given the matrix $A$ of a linear map $f$ with respect to the ordered bases $v_1, \ldots, v_n$ of $V$ and $w_1, \ldots, w_m$ of $W$, we can work out $f(v)$ for any vector $v \in V$, as follows. First express $v$ as a linear combination of the basis vectors of $V$, say $v = \sum_{j=1}^{n} \lambda_j v_j$, so that

$$f(v) = \sum_{j=1}^{n} \lambda_j f(v_j)$$

since $f$ is linear, and therefore

$$f(v) = \sum_{j=1}^{n} \lambda_j \sum_{i=1}^{m} a_{ij} w_i. \qquad (2)$$

We can view this matrix in terms of the coordinates $\lambda_1, \ldots, \lambda_n$ of the vector $v$ with respect to the ordered basis $v_1, \ldots, v_n$ of $V$. (See Section 2.5 for a discussion of coordinates.) Equation (2) shows that the $i$th coordinate $\mu_i$ of $f(v)$ with respect to the ordered basis $w_1, \ldots, w_m$ of $W$ is

$$\mu_i = \sum_{j=1}^{n} a_{ij} \lambda_j,$$

or, in other words, the column vector $\mu = (\mu_1, \ldots, \mu_m)^T$ is related to the column vector $\lambda = (\lambda_1, \ldots, \lambda_n)^T$ and the matrix $A = (a_{ij})$ by matrix multiplication,

$$\mu = A\lambda.$$

Example 8.18 Consider the linear transformations $f, g : \mathbb{R}^3$
f
G)
e+
--+
Ilt2 given by
e:::) ,
The matrices representing these with respect to the usual bases of $\mathbb{R}^3$ and $\mathbb{R}^2$ are
1 1
0 0
and
respectively. To see this note that f
m � G) �
2
�)
w + 0 (�) ,
giving the first column of the matrix for $f$, and so on. Alternatively, note that $(x, y, z)^T$ is the coordinate form for this vector with respect to the usual basis, and that $f$ and $g$ are each given by multiplying this coordinate vector by the corresponding matrix.

The next important question is: what happens to the matrix $A$ if we change either the basis of $V$ or the basis of $W$? First, let us replace the basis $v_1, \ldots, v_n$ by $v'_1, \ldots, v'_n$, related by the base-change matrix $P = (p_{ij})$, so that

$$v'_j = \sum_{i=1}^{n} p_{ij} v_i.$$

Then

$$f(v'_j) = \sum_{i=1}^{n} p_{ij} f(v_i) = \sum_{i=1}^{n} p_{ij} \sum_{k=1}^{m} a_{ki} w_k = \sum_{k=1}^{m} \left( \sum_{i=1}^{n} a_{ki} p_{ij} \right) w_k.$$

Now the sum in brackets in this last expression is simply the $(k, j)$th entry of the matrix $AP$, so the matrix of $f$ with respect to the pair of bases $v'_1, \ldots, v'_n$ and $w_1, \ldots, w_m$ is $AP$.

Similarly, we can replace the basis $w_1, \ldots, w_m$ by $w'_1, \ldots, w'_m$, a new basis related to the old one by the base-change matrix $Q = (q_{ij})$, so that

$$w'_j = \sum_{i=1}^{m} q_{ij} w_i.$$

Let $Q^{-1} = (r_{ij})$, so that

$$w_i = \sum_{j=1}^{m} r_{ji} w'_j.$$

Then

$$f(v_j) = \sum_{i=1}^{m} a_{ij} w_i = \sum_{i=1}^{m} a_{ij} \sum_{k=1}^{m} r_{ki} w'_k = \sum_{k=1}^{m} \left( \sum_{i=1}^{m} r_{ki} a_{ij} \right) w'_k,$$

so that the matrix of $f$ with respect to $v_1, \ldots, v_n$ and $w'_1, \ldots, w'_m$ is $Q^{-1}A$. Putting the two together we obtain the general form $Q^{-1}AP$, where $P$ is the base-change matrix for $V$ and $Q$ is the base-change matrix for $W$.

We are most interested in the case when $V = W$, where it is reasonable to suppose that we will use the same basis for both the domain and codomain. Thus if $f : V \to V$ is a linear map, and $v_1, \ldots, v_n$ is a basis for $V$, we can write

$$f(v_j) = \sum_{i=1}^{n} a_{ij} v_i,$$

and say that $A = (a_{ij})$ is the matrix of $f$ with respect to the ordered basis $v_1, \ldots, v_n$. In this case, if we change basis by $P = (p_{ij})$, so that

$$v'_j = \sum_{i=1}^{n} p_{ij} v_i,$$
then we change both occurrences of the basis, so the matrix of $f$ with respect to $v'_1, \ldots, v'_n$ is $P^{-1}AP$. Let us state this formally for future reference.

Proposition 8.19 Let $V$ be a vector space with ordered bases $\mathcal{B}$ given by $v_1, \ldots, v_n$ and $\mathcal{B}'$ given by $v'_1, \ldots, v'_n$. Let $P$ be the base-change matrix from $\mathcal{B}$ to $\mathcal{B}'$, so

$$v'_j = \sum_{i=1}^{n} p_{ij} v_i.$$

Suppose that $f : V \to V$ is a linear transformation which has matrix $A$ with respect to the basis $\mathcal{B}$ and matrix $B$ with respect to the basis $\mathcal{B}'$. Then $B = P^{-1}AP$.
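As a small illustration of Proposition 8.19, here is a sketch in Python. The matrix $A$ and the new basis are our own hypothetical choices, not taken from the text: $f$ has matrix $A$ with respect to the usual basis of $\mathbb{R}^2$, and the new basis is $v'_1 = (1, 0)^T$, $v'_2 = (1, 1)^T$, whose coordinates form the columns of $P$. The helper names `matmul` and `inverse_2x2` are likewise assumptions of this sketch.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix by the adjugate formula, in exact arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1], [0, 3]]   # hypothetical matrix of f in the usual basis
P = [[1, 1], [0, 1]]   # columns are the new basis vectors v'_1, v'_2

# Matrix of the same map f with respect to the new basis.
B = matmul(inverse_2x2(P), matmul(A, P))
```

With these choices $B$ comes out diagonal, with entries 2 and 3, because $A v'_1 = 2v'_1$ and $A v'_2 = 3v'_2$; the two matrices $A$ and $B$ represent the same transformation in different bases.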
Warning. A matrix in isolation can represent many different things. It does not make sense to talk about changing the basis unless you know what the matrix in question represents. Thus changing the basis for a quadratic form has the effect of changing the representing matrix from $A$ to $P^T AP$, whereas changing the basis for a linear transformation has the effect of changing $A$ to $P^{-1}AP$. These are only the same under the very special circumstances when $P^{-1} = P^T$, circumstances which are examined in more detail in Chapter 13.

Definition 8.20 If $B = P^{-1}AP$ for some invertible matrix $P$, then $A$ and $B$ are called similar matrices. If $B = P^T AP$ for some invertible matrix $P$, then $A$ and $B$ are called congruent.
The final thing that we need to consider with matrix representation of linear transformations is how the matrix representation of the product $\lambda f$ of a scalar $\lambda$ and a linear transformation $f$, or the representation of the sum $f + g$ or composition $f \circ g$ of two linear transformations, can be obtained from the matrix representations of $f, g$. For scalar multiplication and addition, this is quite straightforward.
Proposition 8.21 Let $f, g \in \mathcal{L}(V, W)$ where $V, W$ are finite dimensional vector spaces, and let $f, g$ have matrix representations $A, B$ respectively, with respect to some ordered bases $\mathcal{A}$ and $\mathcal{B}$ of $V, W$ respectively. Then
(a) The matrix representation of $\lambda f$ with respect to $\mathcal{A}, \mathcal{B}$ is the scalar multiple $\lambda A$ of the matrix $A$.
(b) The matrix representation of the sum $f + g$ with respect to $\mathcal{A}, \mathcal{B}$ is the matrix sum $A + B$.

Proof This is almost immediate from the definition. The matrix of $f$ is given by

$$f(v_j) = \sum_{i=1}^{m} a_{ij} w_i$$

where $\mathcal{A}$ is the ordered basis $v_1, \ldots, v_n$ and $\mathcal{B}$ is $w_1, \ldots, w_m$. Then

$$(\lambda f)(v_j) = \lambda \sum_{i=1}^{m} a_{ij} w_i = \sum_{i=1}^{m} (\lambda a_{ij}) w_i,$$

so the matrix for $\lambda f$ is $\lambda A$. Similarly if

$$g(v_j) = \sum_{i=1}^{m} b_{ij} w_i$$

then

$$(f + g)(v_j) = \sum_{i=1}^{m} a_{ij} w_i + \sum_{i=1}^{m} b_{ij} w_i = \sum_{i=1}^{m} (a_{ij} + b_{ij}) w_i,$$

and hence the matrix for $f + g$ is $A + B$. □
Note also that the matrix for the zero transformation $0 \in \mathcal{L}(V, W)$ given by $0(x) = 0$ is the zero matrix of the appropriate size. The other arithmetic operation on linear transformations is composition, and this corresponds to matrix multiplication. Indeed, this perhaps explains why matrix multiplication is defined in the way it is.
Proposition 8.22 (Composition of functions) Let $U, V, W$ be finite dimensional vector spaces, with ordered bases $\mathcal{A}, \mathcal{B}, \mathcal{C}$ respectively, and let $g : U \to V$ and $f : V \to W$ be linear maps. If $f$ and $g$ are represented by the matrices $A$ and $B$ with respect to $\mathcal{A}, \mathcal{B}, \mathcal{C}$, then $f \circ g$ is represented with respect to the same bases by the matrix product $AB$.

Proof If $\mathcal{A}$ is the ordered basis $u_1, \ldots, u_l$, $\mathcal{B}$ is $v_1, \ldots, v_m$, and $\mathcal{C}$ is $w_1, \ldots, w_n$, then

$$(f \circ g)(u_k) = f(g(u_k)) = f\left( \sum_{j=1}^{m} b_{jk} v_j \right) = \sum_{j=1}^{m} b_{jk} f(v_j) = \sum_{j=1}^{m} b_{jk} \sum_{i=1}^{n} a_{ij} w_i = \sum_{i=1}^{n} \left( \sum_{j=1}^{m} a_{ij} b_{jk} \right) w_i.$$

But $\sum_{j=1}^{m} a_{ij} b_{jk}$ is precisely the $(i, k)$th element of the matrix product $AB$, so the matrix representing $f \circ g$ is $AB$. □
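Proposition 8.22 can be tried out numerically. In the sketch below the two maps $g : \mathbb{R}^2 \to \mathbb{R}^3$ and $f : \mathbb{R}^3 \to \mathbb{R}^2$ are hypothetical examples of our own, and `matrix_of` is an assumed helper that builds the matrix of a map with respect to the usual bases by recording the images of the basis vectors as columns, exactly as in the definition at the start of this section.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def matrix_of(func, n):
    """Matrix of a linear map on R^n with respect to the usual bases.

    Column j is the image of the j-th standard basis vector.
    """
    cols = [func([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


def g(u):
    """A hypothetical linear map R^2 -> R^3."""
    x, y = u
    return [x + y, y, x - y]


def f(v):
    """A hypothetical linear map R^3 -> R^2."""
    x, y, z = v
    return [x + 2 * z, 3 * y - z]


A = matrix_of(f, 3)                            # 2 x 3 matrix of f
B = matrix_of(g, 2)                            # 3 x 2 matrix of g
composite = matrix_of(lambda u: f(g(u)), 2)    # 2 x 2 matrix of f o g
```

The matrix `composite` agrees with `matmul(A, B)`. Note that the product $BA$ is a $3 \times 3$ matrix here, so the order of the factors matters even before any entries are compared.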
In the special case of $\mathcal{L}(V, V)$, where $V$ is an $n$-dimensional vector space over a field $F$, what we have proved is this. Given an ordered basis $\mathcal{B}$ of the vector space $V$, we have a map from $\mathcal{L}(V, V)$ to the set $M_{n,n}(F)$ of $n \times n$ matrices with entries from $F$, taking $f$ to the matrix representing $f$. Every linear transformation corresponds to some matrix (which is uniquely determined once we have fixed $\mathcal{B}$), and every matrix is the matrix of some transformation $f$. What's more, the zero transformation corresponds to the zero matrix, the identity transformation $I(x) = x$ corresponds to the identity matrix $I_n$, and the operations of scalar multiplication and addition of linear transformations correspond to scalar multiplication and addition of matrices. In other words, the vector spaces $\mathcal{L}(V, V)$ and $M_{n,n}(F)$ over $F$ are isomorphic. But we can say a little bit more: these vector spaces have 'multiplication' operations. For $M_{n,n}(F)$ this is just multiplication of matrices, and for $\mathcal{L}(V, V)$ it is composition $f \circ g$ of functions, and Proposition 8.22 shows that $M_{n,n}(F)$ and $\mathcal{L}(V, V)$ are isomorphic with this operation too.

Exercises
Exercise 8.2 For each of the following $n \times m$ matrices $A$, give bases in $\mathbb{R}^n$ and $\mathbb{R}^m$ for $\operatorname{im} f_A$ and $\ker f_A$, where $f_A : \mathbb{R}^m \to \mathbb{R}^n$ is left-multiplication by $A$.
(a)
G�
1 1
(b)
( ; �) -1 0 1 1
(c)
(-� i -i) . -1 4
1
Exercise 8.3 For each of the following sets $S$ of vectors in $\mathbb{R}^n$, find a linear map $\mathbb{R}^n \to \mathbb{R}^3$ whose kernel is spanned by $S$.
(a) $S = \{(1, 1, 0), (1, 0, -1)\}$ in $\mathbb{R}^3$.
(b) $S = \{(1, -2, 1, 0)\}$ in $\mathbb{R}^4$.
(c) $S = \{(-2, 1)\}$ in $\mathbb{R}^2$.
(d) $S = \{(-1, 1, 2, -1), (0, 1, 2, 3)\}$ in $\mathbb{R}^4$.

Exercise 8.4 Let $f : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by

$$f((x, y, z)^T) = (-x + y - z,\ x + 2y,\ -y + 3z)^T.$$

Calculate the images of the vectors

$$v_1 = (0, 1, 1), \quad v_2 = (1, -1, 1), \quad v_3 = (2, 1, 0).$$

Verify that $f(v_1) = -v_1 - 2v_2 + v_3$, and derive similar expressions for $f(v_2)$ and $f(v_3)$. Hence write down the matrix of $f$ with respect to the basis $v_1, v_2, v_3$ of $\mathbb{R}^3$.
Exercise 8.5 Do the same for the map $g$ defined by

$$g((x, y, z)^T) = (y - 2z,\ -x + 2y - z,\ x + y + z)^T.$$

Exercise 8.6 Define f :

Exercise 8.7 Let $e_{ij}$ be the $n \times m$ matrix with 1 in the $(i, j)$th position and 0 elsewhere. Show that $\{e_{ij} : 1 \leq i \leq n,\ 1 \leq j \leq m\}$ is a basis for $M_{n,m}(\mathbb{R})$ and deduce the dimension of $M_{n,m}(\mathbb{R})$ as a real vector space.

Exercise 8.8 If $V, W$ are vector spaces of dimensions $m, n$ respectively, what is the dimension of the vector space $\mathcal{L}(V, W)$?
9 Polynomials

In the last chapter we saw how we can add and multiply linear transformations in $\mathcal{L}(V, V)$, i.e. how $\mathcal{L}(V, V)$ is given arithmetic operations analogous to the matrix operations of addition and multiplication. Much of the study of linear transformations $f : V \to V$ concerns their properties under these arithmetic operations, and before we can take this further we must review some material on polynomials and discuss how polynomials can be applied to linear transformations such as $f$.

9.1 Polynomials

We consider polynomials over a field $F$ to be expressions of the form

$$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$

where the $a_r$ are from $F$ (these are called the coefficients of the polynomial) and $x$ is an 'indeterminate'. This polynomial will also be written as $\sum_{r=0}^{n} a_r x^r$. The degree of this polynomial is the largest $r$ such that $a_r \neq 0$. The polynomial is monic, or is a monomial, if $a_r = 1$ for this particular $r$, i.e. if the first term of the polynomial is $x^r$ for some $r$. (If all the $a_r$ are 0, we have the zero polynomial, whose degree may be defined as $-1$, $-\infty$, or not defined at all, according to taste.) Polynomials can be added and multiplied in the familiar way. Moreover, you can divide one polynomial $f(x)$ by another nonzero polynomial $g(x)$ to get a quotient $q(x)$ and a remainder $r(x)$. The essential property of the remainder is that it is 'smaller' than $g(x)$ in the sense that it has smaller degree. Since this is the crux of what follows, we state and prove this formally.

Proposition 9.1 (Division algorithm for polynomials) If $f(x)$ and $g(x)$ are two polynomials, and $g(x)$ is not the zero polynomial, then there exist polynomials $q(x)$ and $r(x)$ such that $f(x) = g(x)q(x) + r(x)$, and either $r(x)$ is the zero polynomial or else $\deg(r) < \deg(g)$.
Proof The proof is by induction on the degree of $f(x)$. If $\deg(f) < \deg(g)$, we take $q(x)$ to be the zero polynomial, and $r(x) = f(x)$. Thus the induction starts. If $\deg(f) \geq \deg(g)$, say $f(x) = \lambda x^n + h(x)$ and $g(x) = \mu x^m + k(x)$, where $\deg(h) < \deg(f) = n$, $\deg(k) < \deg(g) = m$, and $m \leq n$, then

$$f(x) - (\lambda/\mu) x^{n-m} g(x) = \lambda x^n + h(x) - (\lambda/\mu) x^{n-m} \mu x^m - (\lambda/\mu) x^{n-m} k(x) = h(x) - (\lambda/\mu) x^{n-m} k(x),$$

which has degree less than $n$. Therefore, by induction, this polynomial can be written in the form $g(x)s(x) + r(x)$ for some polynomials $r(x)$ and $s(x)$ with either $\deg(r) < \deg(g)$ or $r(x)$ identically zero. It follows immediately that

$$f(x) = \left( (\lambda/\mu) x^{n-m} + s(x) \right) g(x) + r(x),$$

as required. □
In practice, you find the quotient and remainder by long division of polynomials. At each stage you just divide the highest terms into each other, as we did above, and let the smaller terms look after themselves.

Example 9.2 Divide $2x^3 + 4x^2 + 9x + 7$ by $x^2 - 2x$.

                      2x  +  8
    x^2 - 2x ) 2x^3 + 4x^2 +  9x + 7
               2x^3 - 4x^2
                      8x^2 +  9x + 7
                      8x^2 - 16x
                             25x + 7

so we get a quotient of $2x + 8$ and a remainder of $25x + 7$.
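The long division procedure is easy to mechanise. The following sketch follows the proof of Proposition 9.1 step by step: at each stage it divides the leading term of the current remainder by the leading term of $g(x)$. The representation of a polynomial as a list of coefficients, constant term first, and the name `poly_divmod` are conventions of this sketch rather than of the text.

```python
from fractions import Fraction


def poly_divmod(f, g):
    """Divide f by g, returning (quotient, remainder).

    Polynomials are coefficient lists ordered from the constant term upwards,
    so x^2 - 2x is [0, -2, 1].  The zero polynomial is the empty list.
    """
    g = [Fraction(c) for c in g]
    while g and g[-1] == 0:          # strip zero leading coefficients
        g.pop()
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    r = [Fraction(c) for c in f]
    while r and r[-1] == 0:
        r.pop()
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 1)
    while len(r) >= len(g):
        shift = len(r) - len(g)      # this is n - m in the proof
        coeff = r[-1] / g[-1]        # this is lambda / mu in the proof
        q[shift] = coeff
        for i, c in enumerate(g):    # subtract coeff * x^shift * g(x)
            r[shift + i] -= coeff * c
        while r and r[-1] == 0:
            r.pop()
    return q, r


# Example 9.2: divide 2x^3 + 4x^2 + 9x + 7 by x^2 - 2x.
quotient, remainder = poly_divmod([7, 9, 4, 2], [0, -2, 1])
```

For Example 9.2 this gives the coefficient lists `[8, 2]` and `[7, 25]`, that is, quotient $2x + 8$ and remainder $25x + 7$.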
9.2 Evaluating polynomials

If $p(x) = \sum_{r=0}^{n} a_r x^r$, then you know what it means to evaluate $p(\lambda)$ (or to evaluate $p(x)$ at $x = \lambda$), when $\lambda$ is in the same field as the coefficients $a_r$. You simply form the expression $\sum_{r=0}^{n} a_r \lambda^r$ and evaluate it in the field. Now if $A$ is a matrix, what can $p(A)$ mean? Presumably we want $p(A) = \sum_{r=0}^{n} a_r A^r$; can we make sense of this? Well, $A^r$ just means the product $AA \cdots A$ of $r$ copies of $A$, and you can certainly add matrices. Multiplying a matrix by a scalar simply means multiplying all entries by that scalar. Finally, since $x^0$ is interpreted as 1, presumably $A^0$ should be interpreted as the identity matrix of the appropriate size.
A = ( -� �) and p(x) p( A) = -A 2 + 2A + 312 2) + 2 ( 1 - - (-1 - 1 ;)-2 -1 � (
Example 9 . 3 Suppose
.
-
=
=
-x2 + 2x + 3. Then
A
But there appears to be a problem with evaluations like this, which we illustrate with a simple example. Given the polynomial $p(x) = x^2 + 3x + 2$, we know that $p(x) = (x + 2)(x + 1)$, which is to say that $p(x)$ equals the product of the polynomials $x + 2$ and $x + 1$. But $p(x) = (x + 1)(x + 2)$ and $p(x) = (x + 1)^2 + (x + 1)$ also. Now if $A$ is an $n \times n$ square matrix, $p(A)$ is defined above to mean

$$A^2 + 3A + 2I_n,$$

but what are the values of the following?

$$(A + 2I_n)(A + I_n), \qquad (A + I_n)(A + 2I_n), \qquad (A + I_n)^2 + (A + I_n).$$
Since matrix multiplication is not commutative, it is not at all obvious whether these are all equal. Example 9.4 We evaluate the above expressions for the matrix
A = ( - � �) ·
First,
p(A) A2 + 3A + 2I2 ( -1_ 1 =
Then
=
(A + 2I2 )(A + I2 ) ( -� �) (- � n (_ : �) , (A + I2 )(A + 2I2 ) (- � �) ( - � ;) ( _: �) , (A + I2 ) 2 + (A + I2 ) ( - � �r + ( -� D ( _: �) , =
=
=
=
so they are all the same.

In fact, it is generally true for polynomials in a single free variable $x$ that whenever we have a polynomial identity $p(x) = q(x)$, such as $x^2 + 3x + 2 = (x + 2)(x + 1)$, and an $n \times n$ matrix $A$, the matrix equation $p(A) = q(A)$ is true. On the other hand, for polynomials in more than one variable the corresponding statement is definitely false, since the equality $xy = yx$ holds for polynomials of two variables, but it is not true that $AB = BA$ for all $n \times n$ matrices $A, B$. The proof is rather technical, and relies on a precise technical definition of 'polynomial', which we have avoided giving. In fact, the details are not difficult, just rather unenlightening. It's rather more interesting to outline the basic idea, and this is as follows. Addition and multiplication of polynomials satisfy the following laws.

$$\begin{aligned}
(p(x) + q(x)) + r(x) &= p(x) + (q(x) + r(x)) \\
p(x) + q(x) &= q(x) + p(x) \\
(p(x)q(x))r(x) &= p(x)(q(x)r(x)) \\
p(x)q(x) &= q(x)p(x) \\
p(x)(q(x) + r(x)) &= p(x)q(x) + p(x)r(x) \\
(p(x) + q(x))r(x) &= p(x)r(x) + q(x)r(x).
\end{aligned}$$
All of these laws hold equally well for matrix addition and multiplication with the exception of the commutativity of multiplication. However, for the special case we are dealing with here (polynomials $p(A)$, $q(A)$ where $A$ is a fixed matrix) we find that even matrix multiplication commutes. The essential point to note is that for $n, m \geq 0$ and any scalars $\lambda, \mu$, we have

$$\lambda A^n\, \mu A^m = \mu A^m\, \lambda A^n$$

since each side equals $(\lambda\mu) A^{n+m}$. This together with an induction on degree gives the following result, which will be used in several places in the rest of the book.

Proposition 9.5 If $p(x), q(x)$ are polynomials and $A$ is an $n \times n$ matrix, then $p(A)q(A) = q(A)p(A)$.

It is because of this that any valid polynomial identity in a single variable $x$, such as $x^2 + 3x + 2 = (x + 2)(x + 1)$, holds equally well when you substitute a square matrix $A$ for $x$.
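Here is a short sketch of evaluating a polynomial at a matrix and of checking Proposition 9.5 on one instance. The matrices $A$ and $C$ below are hypothetical choices of ours, the polynomial $q(x) = x^2 - 5x - 2$ is borrowed from Exercise 9.1, and `poly_at` is an assumed helper name; polynomials are coefficient lists with the constant term first, and $A^0$ is taken to be the identity matrix as in the text.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(X, Y):
    """Entrywise sum of two matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def scaled_identity(c, n):
    """The matrix c * I_n."""
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]


def poly_at(coeffs, A):
    """Evaluate a_0 I + a_1 A + ... + a_n A^n by Horner's rule.

    coeffs lists the coefficients from the constant term upwards.
    """
    n = len(A)
    result = scaled_identity(0, n)
    for c in reversed(coeffs):
        result = add(matmul(result, A), scaled_identity(c, n))
    return result


A = [[1, 2], [3, 4]]          # a hypothetical 2 x 2 matrix
p = [2, 3, 1]                 # p(x) = x^2 + 3x + 2
q = [-2, -5, 1]               # q(x) = x^2 - 5x - 2

p_of_A = poly_at(p, A)
factored = matmul(poly_at([2, 1], A), poly_at([1, 1], A))   # (A + 2I)(A + I)
commute = matmul(poly_at(p, A), poly_at(q, A)) == matmul(poly_at(q, A), poly_at(p, A))

C = [[0, 1], [1, 0]]          # a second matrix, to show AC != CA in general
```

For this $A$ both `p_of_A` and `factored` are the same matrix, and $p(A)$ commutes with $q(A)$, even though $A$ does not commute with the unrelated matrix $C$.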
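So far, we have stated everything here in terms of matrices, but it makes sense to evaluate polynomials at linear transformations $f \in \mathcal{L}(V, V)$ too, by using composition of functions in place of multiplication. Thus $f^2$ is interpreted as $f \circ f$, $f^0$ is interpreted as the identity transformation, and so on. Then since $\mathcal{L}(V, V)$ with the operations of addition, scalar multiplication, and composition is isomorphic to the set $M_{n,n}(F)$ of $n \times n$ square matrices (where $n = \dim V$) over the appropriate field $F = \mathbb{R}$ or $\mathbb{C}$, we obtain the corresponding result for linear transformations: if $f \in \mathcal{L}(V, V)$ and if $p(x), q(x)$ are polynomials, then

$$p(f)q(f) = q(f)p(f).$$

9.3 Roots of polynomials over $\mathbb{R}$ and $\mathbb{C}$

Suppose $p(x)$ is a polynomial with coefficients from $\mathbb{R}$ or $\mathbb{C}$. A scalar $\alpha$ is called a root of $p(x)$ if $p(\alpha) = 0$, and the key fact is that $\alpha$ is a root of $p(x)$ if and only if $(x - \alpha)$ divides $p(x)$.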
Proof If $(x - \alpha)$ divides $p(x)$, then $p(x) = (x - \alpha)q(x)$ for some polynomial $q(x)$, so $p(\alpha) = (\alpha - \alpha)q(\alpha) = 0$. Conversely, suppose $p(\alpha) = 0$. By the division algorithm there are polynomials $q(x)$ and $r(x)$ such that $p(x) = (x - \alpha)q(x) + r(x)$ and the degree of $r(x)$ is less than 1, i.e. $r(x)$ is a constant, $r$. Substituting $x = \alpha$ we have $p(\alpha) = (\alpha - \alpha)q(\alpha) + r = r$, so if $\alpha$ is a root of $p(x)$, then $r = 0$ and $(x - \alpha)$ divides $p(x)$ exactly. □

From this a very important fact follows.

Corollary 9.8 A polynomial $p(x)$ of degree $d \geq 1$ has at most $d$ roots.

Proof By induction on the degree $d$ of $p(x)$. By the preceding theorem, if $\alpha$ is a root of a polynomial $p(x)$ and $p(x)$ has degree $d$, then $p(x) = (x - \alpha)q(x)$ for some polynomial $q(x)$ of degree $d - 1$. By induction, $q(x)$ has at most $d - 1$ roots, and if $\beta$ is a root of $p(x)$ then $0 = p(\beta) = (\beta - \alpha)q(\beta)$ so $\beta$ is a root of $q(x)$ or else equals $\alpha$. Thus $p(x)$ has at most $d$ roots. □
It may be that a polynomial of degree 2 has only one root. For example, $p(x) = x^2 + 4x + 4$ has only the root $-2$, since $x^2 + 4x + 4 = (x + 2)^2$. In cases like this, the root $-2$ is called a repeated root, in this case of multiplicity 2. On the other hand, the polynomial $x^2 + 1$ has no roots at all in $\mathbb{R}$ since $-1$ does not have a square root in $\mathbb{R}$. But this polynomial does, however, have its maximum possible number of roots in $\mathbb{C}$.

It is a beautiful and significant fact about the complex numbers (usually called the fundamental theorem of algebra, even though it is really a theorem of analysis) that every polynomial $p(x)$ over $\mathbb{C}$ of degree $d \geq 1$ factorizes completely: there are roots $\alpha_1, \ldots, \alpha_d \in \mathbb{C}$ and a constant $c$ with

$$p(x) = c \prod_{j=1}^{d} (x - \alpha_j).$$

Moreover, if $p(x)$ has real coefficients and $\zeta \in \mathbb{C}$ is a root of $p(x)$, then so is its complex conjugate $\bar{\zeta}$, and

$$(x - \zeta)(x - \bar{\zeta}) = x^2 - (\zeta + \bar{\zeta})x + \zeta\bar{\zeta},$$

which has real coefficients. We say that $\mathbb{C}$ is the algebraic closure of $\mathbb{R}$ since it has the two properties that (1) $\mathbb{C}$ is algebraically closed, and (2) every element $\zeta \in \mathbb{C}$ satisfies a polynomial equation over $\mathbb{R}$.

We finish this discussion of roots with an application (a special case of Euclid's algorithm) that will be needed in the proof of the 'primary decomposition theorem' in Section 14.4. If the reader prefers, he or she may skip the proof here until it is needed later.
Proposition 9.9 If $a(x)$ and $b(x)$ are nonzero polynomials over $\mathbb{C}$ which have no root in common, then there are polynomials $s(x)$ and $t(x)$ such that

$$a(x)s(x) + b(x)t(x) = 1.$$

Proof By induction on $\deg(a) + \deg(b)$. Without loss of generality we may assume $\deg(a) \geq \deg(b)$, so by the division algorithm we can write

$$a(x) = b(x)q(x) + r(x)$$

with $\deg(r) < \deg(b)$. We then have $\deg(r) + \deg(b) < \deg(a) + \deg(b)$, and if $r$ and $b$ have a root in common, then $a(x) = b(x)q(x) + r(x)$ has that root in common with them both. This would contradict our assumption, so $b(x)$ and $r(x)$ have no root in common. If $r(x)$ is identically zero, then $a(x) = b(x)q(x)$, so $b(x)$ is a constant polynomial (for otherwise $a(x)$ and $b(x)$ would have a root in common), $b(x) = \lambda$, say, and then

$$a(x) + b(x)(1/\lambda)(1 - a(x)) = a(x) + \lambda(1/\lambda)(1 - a(x)) = 1$$

as required. Otherwise $r(x)$ is nonzero and we can apply the inductive hypothesis. This gives polynomials $f(x)$ and $g(x)$ such that

$$r(x)f(x) + b(x)g(x) = 1.$$

Substituting back for $r(x)$ we obtain

$$(a(x) - b(x)q(x))f(x) + b(x)g(x) = 1$$

and hence

$$a(x)f(x) + b(x)\left( g(x) - q(x)f(x) \right) = 1$$

as required. □
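The proof of Proposition 9.9 is constructive, and the construction can be sketched in code. The version below is an iterative form of Euclid's algorithm rather than a literal transcription of the inductive proof: it repeatedly divides with remainder, keeping track of polynomials $s, t$ with $a s + b t$ equal to the current remainder. All function names are our own, polynomials are coefficient lists with the constant term first, and the example pair $a(x) = x^2 + 1$, $b(x) = x - 1$ is a hypothetical choice with no common root.

```python
from fractions import Fraction


def trim(p):
    """Drop zero leading coefficients; the zero polynomial is []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([x + y for x, y in zip(p, q)])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)


def neg(p):
    return [-c for c in p]


def divmod_poly(f, g):
    """Quotient and remainder of f by a nonzero polynomial g."""
    g = trim(g)
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q, r = [], trim(f)
    while len(r) >= len(g):
        term = [Fraction(0)] * (len(r) - len(g)) + [Fraction(r[-1]) / Fraction(g[-1])]
        q = add(q, term)
        r = add(r, mul(neg(term), g))
    return q, r


def bezout(a, b):
    """Return (s, t) with a*s + b*t = 1, if a and b have no nonconstant common factor."""
    r0, r1 = trim(a), trim(b)
    s0, s1 = [Fraction(1)], []
    t0, t1 = [], [Fraction(1)]
    while r1:                        # invariant: r0 = a*s0 + b*t0, r1 = a*s1 + b*t1
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, mul(neg(q), s1))
        t0, t1 = t1, add(t0, mul(neg(q), t1))
    if len(r0) != 1:
        raise ValueError("the polynomials have a nonconstant common factor")
    inv = 1 / Fraction(r0[0])        # rescale so that the constant is 1
    return [c * inv for c in s0], [c * inv for c in t0]


a = [1, 0, 1]      # x^2 + 1
b = [-1, 1]        # x - 1
s, t = bezout(a, b)
combination = add(mul(a, s), mul(b, t))
```

For this pair the sketch returns $s(x) = \tfrac12$ and $t(x) = -\tfrac12(x + 1)$, and indeed $\tfrac12(x^2 + 1) - \tfrac12(x - 1)(x + 1) = 1$.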
9.4 Roots of polynomials over other fields
Most of the rest of the book can be applied to vector spaces over arbitrary fields. This (optional) section is provided for readers who would like to see how the theory of linear transformations as we set it out applies to vector spaces over fields $F$ other than $\mathbb{C}$ and $\mathbb{R}$. In the chapters that follow, there is essentially very little that needs to be changed when we work over an arbitrary field $F$, except that we often need our polynomials to have a root in the field, or to factorize into linear factors over that field. The simplest way to ensure this is to restrict attention to algebraically closed fields, which are defined to satisfy these two (equivalent) conditions.

Definition 9.10 A field $F$ is algebraically closed if every polynomial with coefficients from $F$ factorizes into linear factors which themselves have coefficients from $F$.

However, many of the results below are true even for fields which are not algebraically closed, although the proofs sometimes require us to work in a larger field than we started with (just as results about $\mathbb{R}$ often require us to work in $\mathbb{C}$). To do this in general, we need the existence of an 'algebraic closure' of our field $F$, which is a 'smallest' algebraically closed field containing $F$. This idea is given formally in the next definition, but a proof that such a field exists is beyond the scope of this book.

Definition 9.11 If $F$ is any field, then $\bar{F}$ is an algebraic closure of $F$ if
(a) every polynomial with coefficients in $\bar{F}$ has all its roots in $\bar{F}$, and
(b) every element of $\bar{F}$ is a root of some polynomial with coefficients from $F$.

Theorem 9.12 If $F$ is any field, then $F$ has an algebraic closure $\bar{F}$. Moreover, $\bar{F}$ is unique (up to isomorphism).
In the case of the field of real numbers, $\mathbb{R}$, its algebraic closure is just the complex number field, $\mathbb{C}$, which is formed from $\mathbb{R}$ by 'adding a square root of $-1$' (normally called $i$), i.e. by adding a root of the polynomial equation $x^2 + 1 = 0$. This may suggest a method for constructing the algebraic closure of an arbitrary field $F$: simply adjoin roots of polynomials one after another until we can go no further. In general there are infinitely many polynomials to consider, and so this is an infinite process. It is not obvious that it can be 'completed' in a sensible way, or that the result is uniquely determined by the original field $F$.

Example 9.13 As indicated in Example 2.41, the field of order 9 may be constructed by adjoining a root of $x^2 + 1$ to the field of order 3.
In general, we may as well assume that we adjoin a root of an irreducible polynomial $f(x)$ (i.e. one which cannot be factorized in any nontrivial way over the original field). It can then be shown that by taking all polynomials modulo $f(x)$, we obtain a field. If the original field had order $q$, and the degree of $f(x)$ is $n$, then the new field will have order $q^n$. (See Exercises 9.6 and 9.7.)

The existence part of Theorem 9.12 is proved by an infinite process as follows. Given a field $F' \supseteq F$, either $F'$ is the algebraic closure of $F$, so there is nothing else to do, or else there is a polynomial equation $f(x) = 0$ over $F'$ with not all its roots in $F'$. By factorizing $f(x)$ if possible, we may assume $f(x)$ is irreducible. Then we may add a root of $f(x)$ to $F'$ by the method of Exercise 9.6, obtaining a new field $F'' \supseteq F'$, and the process continues. It may be that this process finishes rather quickly, as would be the case in constructing $\mathbb{C}$ from $\mathbb{R}$, or it may take infinitely many stages. However, even if infinitely many stages are required, we may take the union of all fields constructed at some stage as our algebraic closure.

The results on the fields $\mathbb{R}$ and $\mathbb{C}$ and polynomials earlier in this chapter apply to other fields too. For instance, we can generalize Euclid's algorithm (Proposition 9.9) immediately to algebraically closed fields, but for general fields it needs to be stated slightly differently.

Proposition 9.14 If $a(x)$ and $b(x)$ are nonzero polynomials with coefficients from an arbitrary field $F$, and $a(x)$ and $b(x)$ have no common factors other than constants, then there are polynomials $s(x)$ and $t(x)$ such that

$$a(x)s(x) + b(x)t(x) = 1.$$

Proof As before, but replacing 'root' by 'nonconstant factor' everywhere in the previous proof. □

Exercises
Exercise 9.1 Evaluate each of the polynomials $p(x) = x^3 - 2x^2 + x + 1$, $q(x) = x^2 - 5x - 2$, and $r(x) = x^2 + 6x + 9$ at each of the matrices A = G �) and B = (-31 -30) .

Exercise 9.2 Evaluate the same polynomials as in the last exercise at the linear transformations $f((a, b)^T) = (2a - b,\ a + b)^T$ and $g((a, b, c)^T) = (a + b,\ b + c,\ c - a)^T$ from $\mathbb{R}^2 \to \mathbb{R}^2$ and $\mathbb{R}^3 \to \mathbb{R}^3$ respectively.

Exercise 9.3 Expand $(A - 3I)(B^2 + 4B + 2I)$ and $(A - 3B)(B^2 + 4B + 3A + 2I)$ where $A$ and $B$ are unknown real $n \times n$ matrices and $I$ is the $n \times n$ identity matrix.

Exercise 9.4 Show that every polynomial of odd degree with coefficients from $\mathbb{R}$ has at least one real root. [Hint: show that if $\zeta$ is a complex root of a polynomial $p(x)$ over $\mathbb{R}$ then $\bar{\zeta}$ is also a root, so nonreal roots occur in pairs.]

Exercise 9.5 Show that every polynomial over $\mathbb{R}$ can be factorized into linear and quadratic factors over $\mathbb{R}$, i.e. written as a product $q_1(x) q_2(x) \cdots q_r(x)$ of polynomials over $\mathbb{R}$, where each $q_i$ has degree at most 2.
Exercise 9.6 Let $F$ be a field, and let $f(x)$ be an irreducible polynomial of degree $k$ over $F$. Use Euclid's algorithm to show that any nonzero polynomial $g(x)$ of degree less than $k$ has a multiplicative inverse modulo $f(x)$; that is, there exists $h(x)$ such that $g(x)h(x) \equiv 1 \pmod{f(x)}$. Deduce that the set of polynomials modulo $f(x)$ is a field.

Exercise 9.7 Let $F$ be a field, $f(x)$ an irreducible polynomial of degree $k$ over $F$, and $G$ be the field defined by adjoining a root of $f(x)$ to $F$. Prove that $G$ is a vector space of dimension $k$ over $F$.

Exercise 9.8 Let $F'$ be a field and let $F$ be a subfield of $F'$ such that the dimension of $F'$ as a vector space over $F$ is finite. (Such field extensions $F' \supseteq F$ are often called finite extensions.) By considering $1, a, a^2, a^3, \ldots, a^n$, or otherwise, show that every $a \in F'$ satisfies a polynomial equation $p(x) = 0$ for some polynomial $p(x)$ with coefficients from $F$.
10 Eige nva l ues a nd eige nvectors 10.1
A n example
Rather than giving the formal definition of eigenvalues and eigenvectors (the subject of this chapter, indeed of the rest of the book) straight away, we shall give a hypothetical example of their use to motivate their study.

We imagine a team of biologists studying a single-celled organism which reproduces by cell division. They have identified two different types of the organism, X and Y, and have noticed that on cell division a type X sometimes mutates into type Y, and sometimes a type Y mutates into type X. Their measurements indicate that both types of the organism reproduce at the same speed, on average doubling in number in unit time. Moreover, starting from a population of 100 type X organisms, after unit time they observe a population of 180 type X and 20 type Y. Similarly, starting from 100 type Y organisms, after unit time 190 type Y and 10 type X are observed. They want to use this data to predict the development of a mixed population over several units of time. If $x_n$ represents the number of type X at time $n$ and $y_n$ represents the number of type Y at time $n$ they suggest these numbers should be related by

\[
\begin{aligned}
x_{n+1} &= 1.8x_n + 0.2y_n, \\
y_{n+1} &= 0.1x_n + 1.9y_n.
\end{aligned}
\]
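Equations like these are called simultaneous difference equations. In matrix form, they are written as

\[
\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix}
= \begin{pmatrix} 1.8 & 0.2 \\ 0.1 & 1.9 \end{pmatrix}
\begin{pmatrix} x_n \\ y_n \end{pmatrix}. \tag{1}
\]

We shall solve these equations by 'pulling a rabbit out of a hat', considering the vectors $(1, 1)^T$ and $(-2, 1)^T$. These vectors are chosen because they have the nice property that

\[
\begin{pmatrix} 1.8 & 0.2 \\ 0.1 & 1.9 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = 2\begin{pmatrix} 1 \\ 1 \end{pmatrix},
\qquad
\begin{pmatrix} 1.8 & 0.2 \\ 0.1 & 1.9 \end{pmatrix}\begin{pmatrix} -2 \\ 1 \end{pmatrix} = 1.7\begin{pmatrix} -2 \\ 1 \end{pmatrix}. \tag{2}
\]

These two vectors are linearly independent so form a basis of $\mathbb{R}^2$. We shall also consider the base-change matrix formed from these two vectors,

\[
P = \begin{pmatrix} 1 & -2 \\ 1 & 1 \end{pmatrix}.
\]

This matrix has inverse

\[
P^{-1} = \frac{1}{3}\begin{pmatrix} 1 & 2 \\ -1 & 1 \end{pmatrix}
\]

and we define numbers $u_n, v_n$ by

\[
\begin{pmatrix} u_n \\ v_n \end{pmatrix} = P^{-1}\begin{pmatrix} x_n \\ y_n \end{pmatrix}.
\]

Now, from (1) we have

\[
\begin{pmatrix} u_{n+1} \\ v_{n+1} \end{pmatrix}
= P^{-1}\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix}
= P^{-1}\begin{pmatrix} 1.8 & 0.2 \\ 0.1 & 1.9 \end{pmatrix}\begin{pmatrix} x_n \\ y_n \end{pmatrix}
= P^{-1}\begin{pmatrix} 1.8 & 0.2 \\ 0.1 & 1.9 \end{pmatrix}PP^{-1}\begin{pmatrix} x_n \\ y_n \end{pmatrix}.
\]

But it follows from (2) that

\[
P^{-1}\begin{pmatrix} 1.8 & 0.2 \\ 0.1 & 1.9 \end{pmatrix}P = \begin{pmatrix} 2 & 0 \\ 0 & 1.7 \end{pmatrix}.
\]

(Alternatively, this matrix multiplication can be checked directly.) We deduce

\[
\begin{pmatrix} u_{n+1} \\ v_{n+1} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 1.7 \end{pmatrix}\begin{pmatrix} u_n \\ v_n \end{pmatrix}
\]

and hence $u_n = 2^n u_0$, $v_n = (1.7)^n v_0$. Also, $(u_0, v_0)^T = P^{-1}(x_0, y_0)^T = \tfrac{1}{3}(x_0 + 2y_0,\; y_0 - x_0)^T$ and

\[
\begin{pmatrix} x_n \\ y_n \end{pmatrix} = P\begin{pmatrix} u_n \\ v_n \end{pmatrix}
= \begin{pmatrix} 1 & -2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 2^n u_0 \\ (1.7)^n v_0 \end{pmatrix},
\]

which can be expanded to give

\[
x_n = \tfrac{1}{3}\bigl(2^n(x_0 + 2y_0) - 2(1.7)^n(y_0 - x_0)\bigr)
\]

and

\[
y_n = \tfrac{1}{3}\bigl(2^n(x_0 + 2y_0) + (1.7)^n(y_0 - x_0)\bigr),
\]

being formulae for the numbers $x_n$, $y_n$ of organisms of type X, Y in terms of the initial populations $x_0$, $y_0$ of types X and Y.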
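The closed form can be checked numerically. The following is a minimal sketch, not part of the original text: it iterates the difference equations (1) with exact rational arithmetic and compares the result with the formulae obtained from $u_n = 2^n u_0$, $v_n = (1.7)^n v_0$. The function names `step`, `iterate` and `closed_form` are illustrative choices.

```python
from fractions import Fraction

# Coefficient matrix of the difference equations (1), kept exact as fractions.
A = [[Fraction(18, 10), Fraction(2, 10)],
     [Fraction(1, 10), Fraction(19, 10)]]


def step(x, y):
    """Apply the difference equations once: (x_n, y_n) -> (x_{n+1}, y_{n+1})."""
    return (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)


def iterate(x0, y0, n):
    """Apply `step` n times starting from the integer populations (x0, y0)."""
    x, y = Fraction(x0), Fraction(y0)
    for _ in range(n):
        x, y = step(x, y)
    return x, y


def closed_form(x0, y0, n):
    """Closed form derived from u_n = 2^n u_0 and v_n = 1.7^n v_0."""
    u0 = Fraction(x0 + 2 * y0, 3)
    v0 = Fraction(y0 - x0, 3)
    un = 2 ** n * u0
    vn = Fraction(17, 10) ** n * v0
    # (x_n, y_n)^T = P (u_n, v_n)^T with P = [[1, -2], [1, 1]].
    return un - 2 * vn, un + vn
```

For any integer starting populations, `iterate(x0, y0, n)` and `closed_form(x0, y0, n)` should agree exactly, since both are computed with fractions rather than floating point.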
10.2 Eigenvalues and eigenvectors

Examples like the one given in the previous section show the importance of vectors such as $(1, 1)^T$ and $(-2, 1)^T$ satisfying properties like those in (2) above. We start by making this into a formal definition.
Definition 10.1 Let $A$ be an $n \times n$ matrix over a field $F$. Then a column vector $x$ in $F^n$ is called an eigenvector of $A$, with eigenvalue $\lambda \in F$, if $x \neq 0$ and $Ax = \lambda x$.

Theorem 10.2 A scalar $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$ if and only if the matrix $A - \lambda I$ has nullity $n(A - \lambda I) > 0$.

Proof If $\lambda$ is an eigenvalue of $A$ with eigenvector $x \neq 0$ then $(A - \lambda I)x = Ax - \lambda Ix = \lambda x - \lambda x = 0$. Since $x \neq 0$ and $x$ lies in the kernel of $A - \lambda I$, $n(A - \lambda I) > 0$. Conversely, if $n(A - \lambda I) > 0$ then there is a nonzero vector $x$ in the kernel of $A - \lambda I$, so $(A - \lambda I)x = 0$ or $Ax = \lambda x$. Hence $\lambda$ is an eigenvalue. □

The proof of the preceding theorem also shows the following.

Theorem 10.3 Suppose $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$. Then the eigenvectors of $A$ having eigenvalue $\lambda$ are precisely the nonzero vectors in $\ker(A - \lambda I) = \{x : (A - \lambda I)x = 0\}$.

Theorem 10.4 Every $n \times n$ matrix $A$ over $F = \mathbb{R}$ or $\mathbb{C}$ has an eigenvalue $\lambda$ in $\mathbb{C}$ and an eigenvector $x$ in $\mathbb{C}^n$ with eigenvalue $\lambda$.

Proof Fix any nonzero vector $v \in \mathbb{C}^n$. The vectors $v, Av, \dots, A^n v$ form a set of $n + 1$ vectors in the $n$-dimensional vector space $\mathbb{C}^n$, so are linearly dependent. In other words there are scalars $a_0, a_1, \dots, a_n \in \mathbb{C}$, not all zero, with

\[
a_0 v + a_1 Av + \dots + a_n A^n v = 0.
\]

Now consider the polynomial $a_n z^n + \dots + a_1 z + a_0$. It is not constant, since otherwise $a_0 v = 0$ with $a_0 \neq 0$, contradicting $v \neq 0$; let $m \geq 1$ be its degree. Since we are working over $\mathbb{C}$ and every polynomial over $\mathbb{C}$ has all its roots in $\mathbb{C}$, there are $r_1, \dots, r_m \in \mathbb{C}$ and a nonzero $c \in \mathbb{C}$ with

\[
a_n z^n + \dots + a_1 z + a_0 = c(z - r_1)\cdots(z - r_m).
\]

Then

\[
0 = (a_n A^n + \dots + a_1 A + a_0 I)v = c(A - r_1 I)\cdots(A - r_m I)v.
\]

Since $v \neq 0$, there is some $i$ such that $(A - r_i I)\cdots(A - r_m I)v = 0$ but $(A - r_{i+1} I)\cdots(A - r_m I)v \neq 0$ (for $i = m$ the second product is empty and the vector is $v$ itself). It follows from this that at least one of the matrices $(A - r_i I)$ has nonzero nullity, so at least one of the $r_i$ is an eigenvalue for $A$. □
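Remark 10.5 In general, the same result holds for a matrix $A$ over an arbitrary field $F$ if we replace $\mathbb{C}$ by the algebraic closure $\bar{F}$ of $F$.

Example 10.6 The proof of Theorem 10.4 can be used to find eigenvalues and eigenvectors. For example, consider the matrix

\[
A = \begin{pmatrix} 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{pmatrix}.
\]

Taking the nonzero vector $e_1 = (1, 0, 0)^T$ we have

\[
Ae_1 = (1, -1, -1)^T, \qquad A^2 e_1 = (3, -1, -1)^T.
\]

In this case, we have the linear dependence

\[
-A^2 e_1 + Ae_1 + 2e_1 = 0,
\]

which yields the polynomial $p(z) = -z^2 + z + 2 = -(z - 2)(z + 1)$. The proof of the theorem now says that one of the roots of $p(z) = 0$, i.e. one of $2$ or $-1$, is an eigenvalue of $A$. In fact in this case both are, for here

\[
A\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} = 2\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \qquad
A\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} = 2\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \qquad
A\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = -\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},
\]

which gives a basis $\{(1, -1, 0)^T, (1, 0, -1)^T, (1, 1, 1)^T\}$ of $\mathbb{R}^3$ of eigenvectors of $A$.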
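The method in the proof of Theorem 10.4 can be traced in a few lines of code. This is a sketch under an assumption: the matrix `A` below is taken to be the symmetric matrix with rows $(1,-1,-1)$, $(-1,1,-1)$, $(-1,-1,1)$, which is consistent with the polynomial $-z^2 + z + 2$ and the eigenvector basis quoted in the example. The helper names are illustrative.

```python
def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Assumed reading of the matrix in the example (see the lead-in).
A = [[1, -1, -1],
     [-1, 1, -1],
     [-1, -1, 1]]

e1 = [1, 0, 0]
Ae1 = mat_vec(A, e1)     # first step of the sequence e1, A e1, A^2 e1
AAe1 = mat_vec(A, Ae1)   # second step

# The linear dependence -A^2 e1 + A e1 + 2 e1 = 0 gives p(z) = -z^2 + z + 2.
dependence = [-s + t + 2 * u for s, t, u in zip(AAe1, Ae1, e1)]

# Following the proof: p(z) = -(z - 2)(z + 1), so (A - 2I)(A + I)e1 = 0.
# The vector w = (A + I)e1 is nonzero, hence an eigenvector with eigenvalue 2.
w = [a + b for a, b in zip(Ae1, e1)]


def is_eigenvector(A, v, lam):
    """True if v is nonzero and A v = lam v."""
    return any(v) and mat_vec(A, v) == [lam * x for x in v]
```

Here `is_eigenvector(A, w, 2)` confirms the eigenvector produced by the proof, and the same function can be applied to each vector of the basis listed in the example.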
We can also define the concepts of eigenvector and eigenvalue in the more abstract context of vector spaces and linear transformations. There is a very close relationship between the two situations.

Definition 10.7 If $f\colon V \to V$ is a linear map, where $V$ is a vector space over a field $F$, and $0 \neq v \in V$ with $f(v) = \lambda v$ for some $\lambda \in F$, then $v$ is an eigenvector of $f$, with eigenvalue $\lambda$.

Suppose that $A = (a_{ij})$ is the matrix of the linear map $f\colon V \to V$ with respect to a basis $v_1, \dots, v_n$ of $V$. Suppose that $v$ is an eigenvector of $f$ with eigenvalue $\lambda$, and write $v$ in terms of the basis vectors as $v = \sum_{i=1}^n \mu_i v_i$. Then we have $f(v) = \lambda v$. Expressing both sides of this equation in terms of the basis vectors, we have

\[
\lambda v = \lambda \sum_{j=1}^n \mu_j v_j = \sum_{j=1}^n (\lambda\mu_j) v_j.
\]

On the other hand,

\[
f(v) = f\Bigl(\sum_{i=1}^n \mu_i v_i\Bigr) = \sum_{i=1}^n \mu_i f(v_i) = \sum_{i=1}^n \mu_i \sum_{j=1}^n a_{ji} v_j = \sum_{j=1}^n \Bigl(\sum_{i=1}^n \mu_i a_{ji}\Bigr) v_j.
\]

So, comparing coefficients of each basis vector, we have $\lambda\mu_j = \sum_{i=1}^n a_{ji}\mu_i$ for each $j$, and therefore

\[
\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}
= \lambda \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix},
\]

which means that $(\mu_1, \dots, \mu_n)^T$ is an eigenvector of $A$ with eigenvalue $\lambda$. Conversely, if $(\mu_1, \dots, \mu_n)^T$ is an eigenvector of $A$ with eigenvalue $\lambda$, then the corresponding vector

\[
v = \sum_{i=1}^n \mu_i v_i
\]

in $V$ is an eigenvector of $f$ with eigenvalue $\lambda$. In particular, this result implies that the eigenvalues of $f$ are the same as the eigenvalues of a matrix representing $f$ with respect to any basis. Therefore any two representing matrices have the same eigenvalues. Using Proposition 8.19 we can restate this as follows.

Proposition 10.8 If $A$, $B$, and $P$ are $n \times n$ matrices related by $B = P^{-1}AP$, then $B$ and $A$ have the same eigenvalues.

10.3 Upper triangular matrices
As an important application of eigenvalues and eigenvectors, we aim to show here that any square matrix over $\mathbb{R}$ or $\mathbb{C}$ is similar to an upper triangular matrix over $\mathbb{C}$. In other words, for any square matrix $A$ over $\mathbb{R}$ or $\mathbb{C}$ there is an invertible matrix $P$ over $\mathbb{C}$ with

\[
P^{-1}AP = \begin{pmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
0 & b_{22} & b_{23} & \cdots & b_{2n} \\
0 & 0 & b_{33} & \cdots & b_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & b_{nn}
\end{pmatrix}.
\]

This theorem isn't quite as simple as it may seem. Firstly, it will turn out that all the diagonal entries in the form above will necessarily be eigenvalues of $A$. Secondly, it isn't always possible to put a real matrix $A$ into a real upper triangular form; in other words, the diagonal entries may turn out to be complex.
The same result is true more generally for a matrix $A$ over a field $F$. It turns out that we can always find an upper triangular matrix $B$ which is similar to $A$, but once again the entries from $B$ may have to be from the algebraic closure $\bar{F}$ of $F$ rather than $F$ itself.

The first lemma we need is quite straightforward, and has nothing to do with eigenvectors.

Lemma 10.9 Suppose $f\colon V \to V$ is a linear transformation of an $n$-dimensional vector space $V$ over a field $F$. If $f$ has nullity at least 1 then there is a basis $v_1, v_2, \dots, v_n$ such that

\[
f(v_j) \in \operatorname{span}(v_1, \dots, v_{n-1})
\]

for all $j = 1, \dots, n$. In other words, the matrix of $f$ with respect to $v_1, v_2, \dots, v_n$ is

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots & \vdots & & \vdots \\
a_{n-1,1} & a_{n-1,2} & \cdots & a_{n-1,n} \\
0 & 0 & \cdots & 0
\end{pmatrix}.
\]

Proof Let $r = r(f)$. By the rank-nullity formula, $r(f) < n$. So suppose $v_1, \dots, v_r$ is a basis for $\operatorname{im} f$, and extend this to a basis $v_1, \dots, v_n$ of the whole of $V$. Then $f(v_j) \in \operatorname{span}(v_1, \dots, v_r) \subseteq \operatorname{span}(v_1, \dots, v_{n-1})$, as required. □
Proposition 10.10 Let $V = \mathbb{C}^n$ be the $n$-dimensional vector space over $\mathbb{C}$, and suppose $f$ is a linear transformation from $V$ to $V$. Then there is a basis $v_1, \dots, v_n$ of $V$ such that, with respect to this basis, the matrix of $f$ is upper triangular.

Proof We use induction on $n$ (the case $n = 1$ being trivial); assume the result is true for all spaces $V$ of dimension $n - 1$ over $\mathbb{C}$. Given $V$ of dimension $n$ and $f\colon V \to V$, let $\lambda$ be an eigenvalue of $f$. Such $\lambda$ exists by Theorem 10.4. Note that $f - \lambda I$ has nullity at least 1, so by the previous lemma, there is a basis $u_1, u_2, \dots, u_n$ such that $(f - \lambda I)(u_i) \in \operatorname{span}(u_1, \dots, u_{n-1})$ for all $i$. Then the matrix of $f$ with respect to this basis is of the form

\[
\begin{pmatrix}
a_{11} & \cdots & a_{1,n-1} & a_{1n} \\
\vdots & & \vdots & \vdots \\
a_{n-1,1} & \cdots & a_{n-1,n-1} & a_{n-1,n} \\
0 & \cdots & 0 & \lambda
\end{pmatrix}.
\]

Let $W$ be the subspace spanned by $u_1, \dots, u_{n-1}$ and note that $f(w) \in W$ for all $w \in W$, so that the restriction of $f$ to $W$ is a linear transformation of $W$. By the induction hypothesis there is a basis $v_1, v_2, \dots, v_{n-1}$ of $W$ such that the matrix of the restriction of $f$ to $W$ with respect to this basis is upper triangular. Put $v_n = u_n$; then $\{v_1, v_2, \dots, v_{n-1}, v_n\}$ is a basis of $V$, since $v_n \notin \operatorname{span}(u_1, \dots, u_{n-1}) = W = \operatorname{span}(v_1, \dots, v_{n-1})$ and hence $\{v_1, \dots, v_n\}$ is a linearly independent set of size $n = \dim V$. Also, $f(v_n) = \lambda v_n + w$ for some $w \in W$; hence the matrix of $f$ with respect to the basis $v_1, v_2, \dots, v_n$ is upper triangular too. □
Remark 10.11 Once again, the proposition remains valid for any algebraically closed field $F$ in place of $\mathbb{C}$, by the same proof.

Example 10.12 Consider the matrix

\[
A = \begin{pmatrix} -1 & -1 & -1 \\ 0 & 0 & 1 \\ 0 & -1 & -2 \end{pmatrix}.
\]

To put $A$ into upper triangular form we first need an eigenvector, and from the first column of $A$ it is obvious that $(1, 0, 0)^T$ is an eigenvector with eigenvalue $-1$. Then

\[
A + I = \begin{pmatrix} 0 & -1 & -1 \\ 0 & 1 & 1 \\ 0 & -1 & -1 \end{pmatrix}
\]

so the image of $A + I$ (considered as a linear transformation on $\mathbb{R}^3$ by left multiplication) has basis $(-1, 1, -1)^T$, which as it happens in this particular case is also an eigenvector of $A$. We extend the vector we have so far to a basis $(-1, 1, -1)^T$, $(1, 0, 0)^T$, $(0, 1, 0)^T$ of $\mathbb{R}^3$. Then the base-change matrix is

\[
P = \begin{pmatrix} -1 & 1 & 0 \\ 1 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix}.
\]

This has inverse

\[
P^{-1} = \begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \end{pmatrix}
\]

and

\[
P^{-1}AP = \begin{pmatrix} -1 & 0 & 1 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix},
\]

which is in upper triangular form.
Example 1 0 . 1 3 B y good fortune, we found the new basis rather easily i n the last example. For a slightly more typical example, consider
A = (--1� � =;) · 0
1, O)T
5
Here (0, is obviously an eigenvector with eigenvalue 4. The subspace which is the image of the linear transformation
- 4I ( = � � =�)1 -1 0 has basis formed from f1 ( -1, -1, - l)T, and f2 ( 1, -3, l)T. We extend this to a basis of the whole space by adjoining f3 ( 1, 0, O)T, and so we have base-change matrix (-1-1-1 -311 p A
=
=
=
=
=
On calculating, we find that
( -� !) which has eigenvalue 4 and eigenvector ( 1 , 1 ) r. Also, B -4I ( = � i ) so a basis for the image of this is ( 1, 1 ) T. We extend this to the basis ( 1, 1 ) T, ( 1, 0 ) T of IR2 . Going back to IR3 what we have done is to replace the basis f , f , f with f1 f , f1 , f , and the base-change matrix for
We now look at B
=
=
,
1 2 3
this operation is
+
2
On calculating, we have q- I p -I APQ
in upper triangular form.
=
,
3
(4� -�1 401)
Example 10.14 Proposition 10.10 may fail for vector spaces over other fields. For example, let $V = \mathbb{R}^2$ over $\mathbb{R}$ and $f$ be the linear transformation $f(x, y)^T = (y, -x)^T$. The matrix of $f$ with respect to the usual basis is $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. Suppose for the sake of obtaining a contradiction that $P = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is a real base-change matrix putting this into upper triangular form,

\[
P^{-1}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}P
= \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} x & y \\ 0 & z \end{pmatrix}.
\]

Multiplying out gives

\[
\frac{1}{ad - bc}\begin{pmatrix} cd + ab & d^2 + b^2 \\ -c^2 - a^2 & -cd - ab \end{pmatrix} = \begin{pmatrix} x & y \\ 0 & z \end{pmatrix}
\]

so $-c^2 - a^2 = 0$, giving $a = c = 0$ (since $a$ and $c$ were supposed to be real) and hence $P$ is singular. We conclude that the original matrix is not similar to any upper triangular matrix over $\mathbb{R}$. Note, however, that

\[
\begin{pmatrix} 1 & 1 \\ i & 0 \end{pmatrix}^{-1}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ i & 0 \end{pmatrix} = \begin{pmatrix} i & i \\ 0 & -i \end{pmatrix}
\]

so the original matrix can be put in upper triangular form over $\mathbb{C}$.

The eigenvalues of this particular example are $i$ and $-i$, which are imaginary and not real. In general, the diagonal entries of any upper triangular form for a matrix $A$ are the eigenvalues of $A$, as shown by the next proposition.

Proposition 10.15 If $A$ is an $n \times n$ upper triangular matrix, then the diagonal entries in $A$ are precisely the eigenvalues of $A$.
Proof Suppose $A = (a_{ij})$ and $\lambda = a_{ii}$ is a diagonal entry of $A$. Then

\[
Ae_j = a_{1j}e_1 + \dots + a_{j-1,j}e_{j-1} + a_{jj}e_j
\]

since $A$ is upper triangular. Using $a_{ii} = \lambda$, we obtain

\[
(A - \lambda I)e_i = a_{1i}e_1 + \dots + a_{i-1,i}e_{i-1} \in \operatorname{span}(e_1, \dots, e_{i-1}).
\]

Thus $A - \lambda I$ defines a linear transformation $T$ from an $i$-dimensional space, $\operatorname{span}(e_1, \dots, e_i)$, into an $(i - 1)$-dimensional space, $\operatorname{span}(e_1, \dots, e_{i-1})$. By the rank-nullity formula, $T$ has nullity at least 1, so there is a nonzero $v \in \operatorname{span}(e_1, \dots, e_i)$ with $T(v) = (A - \lambda I)v = 0$, i.e. $v$ is an eigenvector of $A$ with eigenvalue $\lambda$.

To see that all the eigenvalues of $A$ appear along the diagonal of this representation, suppose $\lambda$ is not equal to any diagonal entry $a_{ii}$. Then $A - \lambda I$ is in upper triangular form and has each diagonal entry nonzero. In other words, $A - \lambda I$ is in echelon form and has rank $n$, and hence nullity 0. Therefore $\lambda$ is not an eigenvalue. □
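The statement can be checked on a small case by computing the nullity of $A - \lambda I$ directly. The sketch below is illustrative only: the matrix `T` is a hypothetical upper triangular matrix chosen for the demonstration, and the rank is found by Gaussian elimination over the rationals.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def nullity_shifted(A, lam):
    """Nullity of A - lam*I for a square matrix A, via rank-nullity."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)


# Hypothetical upper triangular matrix with diagonal entries 2, 3, 2.
T = [[2, 1, 5],
     [0, 3, 4],
     [0, 0, 2]]
```

For this `T`, `nullity_shifted(T, lam)` is positive exactly when `lam` is one of the diagonal entries 2 or 3, in line with Theorem 10.2 and the proposition.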
It would be possible to define the determinant of a matrix $A$ to be the product of the diagonal entries in any upper triangular form $P^{-1}AP$ for $A$. The problems are (1) that it is not immediately obvious that this definition agrees with the usual definition, and (2) in showing that this definition does not depend on the choice of $P$ or the choice of the upper triangular form for $A$. In fact these problems can be got round, and the definition can be made sound, but doing so would take us too far off track.¹

We conclude with a particularly useful observation concerning upper triangular matrices.

¹ The interested reader can follow up the details in 'Down with determinants', by Sheldon Axler, American Math Monthly, February 1995, pp. 139-145.
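Theorem 10.16 If $A$ is any upper triangular $n \times n$ matrix with entries from $\mathbb{R}$ or $\mathbb{C}$, and $\lambda_1, \lambda_2, \dots, \lambda_n$ are the diagonal entries of $A$ including repetitions, then the matrix

\[
(A - \lambda_1 I)(A - \lambda_2 I)\cdots(A - \lambda_n I)
\]

is the zero matrix.

Proof If $n = 1$, this is obvious. We prove the general statement using induction on $n$. Given an upper triangular $n \times n$ matrix, take the standard basis $e_1, \dots, e_n$ of the underlying vector space $\mathbb{R}^n$ or $\mathbb{C}^n$. Observe first that

\[
(A - \lambda_i I)(A - \lambda_j I) = (A - \lambda_j I)(A - \lambda_i I)
\]

from Section 9.2 in the last chapter. Now

\[
(A - \lambda_n I)e_n \in \operatorname{span}(e_1, \dots, e_{n-1}),
\]

since the last coordinate of $(A - \lambda_n I)e_n$ is 0, $A$ being upper triangular with last row equal to $(0, 0, \dots, 0, \lambda_n)$. Also, the linear transformation given by $A$ on the subspace $\operatorname{span}(e_1, \dots, e_{n-1})$ has upper triangular matrix with respect to this basis, with diagonal entries $\lambda_1, \dots, \lambda_{n-1}$, so by the induction hypothesis we have

\[
(A - \lambda_1 I)\cdots(A - \lambda_{n-1} I)e_i = 0
\]

for all $i < n$, and hence $(A - \lambda_1 I)\cdots(A - \lambda_{n-1} I)w = 0$ for every $w \in \operatorname{span}(e_1, \dots, e_{n-1})$. Thus

\[
(A - \lambda_1 I)\cdots(A - \lambda_{n-1} I)(A - \lambda_n I)e_n = 0,
\]

while for $i < n$ the factors commute, so

\[
(A - \lambda_1 I)\cdots(A - \lambda_{n-1} I)(A - \lambda_n I)e_i = (A - \lambda_n I)(A - \lambda_1 I)\cdots(A - \lambda_{n-1} I)e_i = 0,
\]

and therefore

\[
(A - \lambda_1 I)\cdots(A - \lambda_{n-1} I)(A - \lambda_n I)e_i = 0
\]

for all $i \leq n$. We have shown that the matrix

\[
(A - \lambda_1 I)(A - \lambda_2 I)\cdots(A - \lambda_n I)
\]

represents the zero transformation, since multiplying $e_i$ by it gives 0 for all $i \leq n$. Therefore this matrix is zero, as required. □

Exercises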
Exercise 1 0 . 1 Let
A=
(-i -� -i) . 0
1
-1
Given that $A$ has eigenvalues 1, 2, and $-2$, find all the eigenvectors of $A$, and hence write down a basis of $\mathbb{R}^3$ consisting of eigenvectors of $A$.
Exercises Exercise 1 0 . 2 Let
B
=
�3
(�0 0� -3-�) .
Find all the eigenvalues and eigenvectors of $B$, and hence show that there is no basis of $\mathbb{R}^3$ consisting of eigenvectors of $B$.

Exercise 10.3 Find eigenvectors and eigenvalues of the following. (Try to guess the eigenvectors and verify your guess, calculating the eigenvalues, or use the ideas in this chapter.)
(a)
(03 3)0
(b)
( - 31 -3)1
(c)
G
�2
(d)
G n.
Exercise 10.4 (a) Let $T$ be the rotation of the plane by $\pi/4$ in the clockwise direction about the origin. Explain why $T$ doesn't have any real eigenvectors. (b) What are the real eigenvalues and eigenvectors in $\mathbb{R}^3$ of a rotation of $\mathbb{R}^3$ by $\pi/4$ about an axis through the origin perpendicular to the plane given by $2x + y - 5z = 0$?
Exercise 10.5 For each of the following matrices, $A$, find an invertible matrix $P$ over $\mathbb{C}$ such that $P^{-1}AP$ is upper triangular. (You don't have to compute $P^{-1}$.)
(a)
(0 -1) 1 0
(b)
-1 (-1-1 -i
1
1
)
2 i i 2i 2 - i
(c)
(_ �
D·
11 The minimum polynomial

The central result of this chapter is the remarkable and important fact that for any $n \times n$ matrix $A$ there is a nonzero polynomial $p(x)$ of degree at most $n$ such that $p(A)$ is the zero matrix. This can be proved either from results in the previous chapter on upper triangular matrices, or more directly. Because of this, it turns out that we can define a special minimum polynomial $m_A(x)$ (i.e. of least degree) such that $m_A(A) = 0$. The properties of this minimum polynomial will be central to all the material in the remainder of this book.

11.1 The minimum polynomial
This section is devoted to the minimum polynomial and its properties, and this theory underpins the remainder of the book. The rest of this chapter provides further illustrations and methods of calculation which will be chiefly of use in applying these ideas. We start straight off with the key theoretical observation.

Theorem 11.1 Let $f\colon V \to V$ be a linear transformation of a finite dimensional vector space $V$ over a field $F$. Then there is a nonzero polynomial $q(x)$ with coefficients from $F$ such that $q(f) = 0$.

Proof Let $\{e_1, e_2, \dots, e_n\}$ be a basis of $V$. For each basis vector $e_j$ we consider the $n + 1$ vectors

\[
e_j,\ f(e_j),\ f^2(e_j),\ \dots,\ f^n(e_j).
\]

Since these are $n + 1$ vectors in an $n$-dimensional space $V$, they are linearly dependent, i.e.

\[
\lambda_0 e_j + \lambda_1 f(e_j) + \lambda_2 f^2(e_j) + \dots + \lambda_n f^n(e_j) = 0
\]

for some scalars $\lambda_i$, not all zero. Alternatively,

\[
(\lambda_0 I + \lambda_1 f + \lambda_2 f^2 + \dots + \lambda_n f^n)(e_j) = 0.
\]

This shows that there is a nonzero polynomial $q_j(x)$ of degree at most $n$ such that $q_j(f)(e_j) = 0$. We find such a polynomial for each $j$, and multiply them together to give the polynomial

\[
q(x) = q_1(x)q_2(x)\cdots q_n(x).
\]

Then $q(f)(e_j) = 0$ for every basis element $e_j$, so $q(f)$ is the zero transformation, $q(f) = 0$, as required. □

Next, we ask about other polynomials $p(x)$ with $p(f) = 0$. First note that if $p(f) = 0$ and $r(x)$ is any other polynomial then $p(f)r(f) = 0$, so we should look for polynomials $p(x)$ of the smallest possible degree such that $p(f) = 0$. Note too that if $p(f) = 0$ then we can divide the polynomial $p(x)$ by its leading coefficient (i.e. the coefficient of the highest power of $x$) to get another polynomial $q(x)$ with $q(f) = 0$, such that $q(x)$ is monic, i.e. has leading coefficient equal to 1.

Definition 11.2 Given a linear transformation $f\colon V \to V$, the minimum polynomial of $f$ is the monic polynomial $p(x)$ of least degree which has $p(f) = 0$. We write $m_f(x)$ for this minimum polynomial. Similarly, the minimum polynomial of a square matrix $A$, $m_A(x)$, is the monic polynomial $p(x)$ of least degree such that $p(A) = 0$.

We need to show that $m_f(x)$ is well-defined, i.e. that there cannot be two different monic polynomials $p(x)$, $q(x)$ of minimum degree such that $p(f) = q(f) = 0$. The following proposition is stated for linear transformations, but the proof applies equally well to matrices instead.
Proposition 11.3 Let $f\colon V \to V$ be a linear transformation of a finite dimensional vector space $V$ over a field $F$. Suppose $p(x)$ is any polynomial with $p(f) = 0$ and $m(x)$ is any monic polynomial of minimal degree with $m(f) = 0$. Then $p(x)$ is a multiple of $m(x)$, in the sense that there is a polynomial $q(x)$ such that $p(x) = m(x)q(x)$ as polynomials. Moreover, $m_f(x)$ is well-defined.

Proof If $p(f) = 0$, and $m(x)$ is of minimal degree such that $m(f) = 0$, then by the division algorithm we can write $p(x) = m(x)q(x) + r(x)$, where the degree of $r(x)$ is less than the degree of $m(x)$. Then $r(f) = p(f) - m(f)q(f) = 0$. But $m(x)$ is of minimal degree with the property that $m(f) = 0$, so $r(x)$ must be the zero polynomial. Therefore $p(x) = m(x)q(x)$ as required.

Now suppose that polynomials $m_f(x)$ and $k_f(x)$ both satisfy Definition 11.2. Then $k_f(f) = 0$, so by what we have just proved $k_f(x) = m_f(x)q(x)$ for some $q(x)$. Similarly, $m_f(x) = k_f(x)s(x)$, for some polynomial $s(x)$. Therefore both $q(x)$ and $s(x)$ are constant polynomials, and since $m_f(x)$ and $k_f(x)$ were defined to be monic, they must be equal. □

Theorem 11.4 Given a linear transformation $f\colon V \to V$ where $V$ is a vector space over $F$, and given a scalar $\lambda \in F$, then $\lambda$ is an eigenvalue of $f$ if and only if $m_f(\lambda) = 0$, i.e. $\lambda$ is a root of the minimum polynomial $m_f(x)$.

Proof Suppose $\lambda$ is a root of $m_f(x)$. Then $m_f(x) = (x - \lambda)p(x)$ for some polynomial $p(x)$, by the remainder theorem. Now $p(f) \neq 0$, since $p(x)$ has smaller degree than $m_f(x)$, so there is $v \in V$ with $w = p(f)(v) \neq 0$. But $m_f(f)$ is the transformation taking every vector to 0, so

\[
0 = m_f(f)(v) = (f - \lambda I)p(f)(v) = f(w) - \lambda w
\]

so $f(w) = \lambda w$, and hence $\lambda$ is an eigenvalue of $f$.

Conversely, suppose that $v \in V$ is an eigenvector of $f$ with eigenvalue $\lambda$, so that $f(v) = \lambda v$ and $v \neq 0$. Then $m_f(f) = 0$, so $m_f(f)(v)$ is the zero vector, and, writing $m_f(x) = x^r + a_{r-1}x^{r-1} + \dots + a_0$,

\[
\begin{aligned}
0 = m_f(f)(v) &= (f^r + a_{r-1}f^{r-1} + \dots + a_0 I)v \\
&= f^r(v) + a_{r-1}f^{r-1}(v) + \dots + a_0 v \\
&= \lambda^r v + a_{r-1}\lambda^{r-1}v + \dots + a_0 v \\
&= (\lambda^r + a_{r-1}\lambda^{r-1} + \dots + a_0)v \\
&= m_f(\lambda)v.
\end{aligned}
\]

But $v \neq 0$, so the scalar $m_f(\lambda)$ equals 0. □
( 1 A = 02 11 - 1 ) , B = (� � -�) . 0 0 1 0 0 1 Both are upper triangular, with 2, 1 , 1 on the diagonal, so 2 and 1 are the eigen values of both A and B. By Theorem 10.16, we know that p(A) = p(B) = 0, where p(x) = (x - 1) 2 (x - 2), and by Theorem 11. 4 both (x - 1 ) and (x - 2) must divide the minimum polynomials mA(x) and ma(x). So we compute (A - I)(A - 21) = (0� �0 -0�) (0� -0� -1 (0� 0� -�0) , Example 1 1 .5 Consider the two matrices,
- �)
and
(B - I)(B - 21) = (� � -�) (� -� -�) (� � �) . 0 0 0 0 0 -1 0 0 0 So mA(x) = (x - 1) 2 (x - 2) and ma(x) (x - 1)(x - 2). =
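The kind of computation used in this example can be sketched in code. The matrices `A_demo` and `B_demo` below are hypothetical upper triangular matrices with 2, 1, 1 on the diagonal, chosen only to illustrate the two possible outcomes; they are not claimed to be the matrices of the example. The helper `annihilated_by` tests whether a product of factors $(M - \lambda I)$ is the zero matrix.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shift(A, lam):
    """Return the matrix A - lam*I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def is_zero(M):
    """True if every entry of M is zero."""
    return all(x == 0 for row in M for x in row)


def annihilated_by(M, factors):
    """True if the product of (M - lam*I), lam running over `factors`, is zero."""
    n = len(M)
    product = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for lam in factors:
        product = mat_mul(product, shift(M, lam))
    return is_zero(product)


# Hypothetical matrices, both upper triangular with diagonal 2, 1, 1.
A_demo = [[2, 1, 0],
          [0, 1, 1],
          [0, 0, 1]]
B_demo = [[2, 1, 0],
          [0, 1, 0],
          [0, 0, 1]]
```

With these choices `annihilated_by(A_demo, [1, 2])` is false while `annihilated_by(A_demo, [1, 1, 2])` is true, so `A_demo` has minimum polynomial $(x-1)^2(x-2)$; for `B_demo` the shorter product already vanishes, giving $(x-1)(x-2)$.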
11.2 The characteristic polynomial

By Theorem 11.4, the minimum polynomial is an example of a polynomial whose roots are exactly the eigenvalues of a square matrix $A$. Other polynomials with this property can be obtained using the idea of the determinant of a matrix.

Consider Theorem 10.2, which states that a scalar $\lambda$ is an eigenvalue of a matrix $A$ if and only if the nullity of the matrix $A - \lambda I$ is at least 1. Since a matrix has nonzero nullity if and only if it has determinant equal to zero, we can rephrase this result in terms of determinants as follows.

Theorem 11.6 Let $A$ be an $n \times n$ matrix. Then $\lambda$ is an eigenvalue of $A$ if and only if $\det(A - \lambda I) = 0$.
Note too that, since $\det(B) = \det(B^T)$ and $(A - \lambda I)^T = (A^T - \lambda I)$, $\lambda$ is an eigenvalue of the matrix $A$ if and only if $\lambda$ is an eigenvalue of its transpose $A^T$. (But the reader should note that although the eigenvalues of $A$ and $A^T$ are the same, the eigenvectors are in general completely different.)

Definition 11.7 The characteristic polynomial of a square matrix $A$ is the polynomial $\chi_A(x)$ defined by $\chi_A(x) = \det(A - xI)$. The characteristic equation is the equation $\chi_A(x) = 0$.

Thus the eigenvalues are exactly the solutions of the characteristic equation (i.e. the roots of the characteristic polynomial). Once again, the characteristic polynomial of $A^T$ is identical to that of $A$.

Remark 11.8 Many people define $\chi_A(x)$ to be the polynomial $\det(xI - A)$. Since $\det(-B) = (-1)^n \det B$ for any $n \times n$ matrix $B$, we have that

\[
\det(xI - A) = (-1)^n \det(A - xI),
\]

so this alternative definition agrees with ours when $n$ is even, but is the negative of ours for odd $n$. Unfortunately, there doesn't seem to be any convention here, and the reader will need to be aware of this potential source of difficulty.

The main points of interest in the characteristic polynomial $\chi_A(x)$ are its coefficients. The constant term turns out to be the determinant of $A$, and perhaps more surprisingly the trace of $A$ appears in the second coefficient.

Proposition 11.9 For a square matrix $A$,

\[
\chi_A(x) = (-1)^n x^n + (-1)^{n-1}\operatorname{tr}(A)x^{n-1} + \dots + \det(A).
\]

Proof $\chi_A(x) = \det(A - xI)$ is equal to the determinant

\[
\begin{vmatrix}
a_{11} - x & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - x & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} - x
\end{vmatrix}.
\]

The only term in $x^n$ comes from taking $-x$ in each diagonal entry, and so the coefficient of $x^n$ is $(-1)^n$. The terms in $x^{n-1}$ must be obtained by taking $n - 1$ terms of the form $-x$ from the diagonal, so the remaining term must be one of the diagonal constant terms $a_{ii}$. Therefore the term in $x^{n-1}$ is $\sum_{i=1}^n a_{ii}(-x)^{n-1} = (-1)^{n-1}\operatorname{tr}(A)x^{n-1}$. The constant term is obtained by putting $x = 0$, so is $\chi_A(0) = \det(A)$. □
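The coefficients in the proposition can be inspected numerically. The sketch below does not expand the determinant as the proof does; it swaps in the Faddeev-LeVerrier recurrence, a standard method for the coefficients of $\det(xI - A)$, and then multiplies by $(-1)^n$ to match the convention $\chi_A(x) = \det(A - xI)$ used here. The function name `char_poly` is an illustrative choice.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(A - xI) = c_0 + c_1 x + ... + c_n x^n."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    # Faddeev-LeVerrier: det(xI - A) = x^n + d[n-1] x^(n-1) + ... + d[0].
    d = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        # M_k = A M_{k-1} + d[n-k+1] I
        M = [[AM[i][j] + (d[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = mat_mul(A, M)
        d[n - k] = -sum(AM[i][i] for i in range(n)) / k
    sign = (-1) ** n
    return [sign * c for c in d]


# Usage: the 2 x 2 matrix with rows (1, 2) and (3, 4).
coeffs = char_poly([[1, 2], [3, 4]])  # constant term first
```

In the returned list the last entry is $(-1)^n$, the second-to-last is $(-1)^{n-1}\operatorname{tr}(A)$ and the first is $\det(A)$, which is what the proposition predicts.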
Of course, the other coefficients of the characteristic polynomial are of interest too. These coefficients provide useful invariants of the matrix $A$, with many applications and more advanced material outside the scope of this book.

Example 11.10 We can write down the characteristic polynomial of an upper triangular matrix immediately. For example, if
(
2 1 A= 0 1 0 0
then
X A (x) =
�) -1 '
B
=
(� � -�)
0 0 -2 (2 - x) (I - x)( - 1 - x) and xs (x) = x) 2 (-2 - x).
(3 -
Example 1 1 . 1 1 For the matrix
we have Aand
1
xi =
(1 3
X
)
2 4-x '
I
1 - x _2 = (1 - x) (4 - x) - 6 = x 2 - 5x - 2. XA (x) = 4 x
3
Example 11.12 Sometimes it is possible to find the characteristic polynomial without evaluating determinants at all, but using the remainder theorem instead. Take for example

\[
A = \begin{pmatrix} -1 & 2 & 1 \\ 2 & -2 & 2 \\ 1 & 2 & -1 \end{pmatrix},
\]

so

\[
\chi_A(x) = \det(A - xI) = \begin{vmatrix} -1 - x & 2 & 1 \\ 2 & -2 - x & 2 \\ 1 & 2 & -1 - x \end{vmatrix}. \tag{1}
\]

This is a polynomial of degree 3 with the $x^3$ term having coefficient $-1$. So if we can find three roots $\lambda_1, \lambda_2, \lambda_3$ of it, then $\chi_A(x) = (\lambda_1 - x)(\lambda_2 - x)(\lambda_3 - x)$. Now if $x = -2$, the determinant in (1) is zero since the first and the third rows would both be equal to $(1\ 2\ 1)$. If $x = 2$ then the sum of all three rows would be zero, so the determinant is zero here too, and finally, if $x = -4$ then the first and third rows add to give 2 times the second row, so again the determinant is zero. Thus

\[
\chi_A(x) = (2 - x)(-2 - x)(-4 - x).
\]
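The three roots can be confirmed by evaluating the determinant directly. This is a small sketch, assuming the matrix of the example has rows $(-1, 2, 1)$, $(2, -2, 2)$, $(1, 2, -1)$, which is the matrix forced by the row relations described for $x = -2$ and $x = 2$. The names `det3`, `chi` and `factored` are illustrative.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Assumed matrix of the example (see the lead-in).
A = [[-1, 2, 1],
     [2, -2, 2],
     [1, 2, -1]]


def chi(x):
    """chi_A(x) = det(A - xI), evaluated at an integer x."""
    return det3([[A[r][c] - (x if r == c else 0) for c in range(3)]
                 for r in range(3)])


def factored(x):
    """The factorisation obtained from the three roots 2, -2, -4."""
    return (2 - x) * (-2 - x) * (-4 - x)
```

Both `chi` and `factored` are cubic polynomials in `x`, so agreement at more than three integer points shows that they are the same polynomial.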
11.3 The Cayley-Hamilton theorem
We would like now to define the characteristic polynomial of a linear transformation $f\colon V \to V$. In fact, it is possible to define the determinant of a linear map and its characteristic polynomial in a completely direct (and rather abstract) way, but the following is entirely adequate for our purposes here.

Definition 11.13 If $f\colon V \to V$ is a linear map on a finite dimensional vector space, then the determinant of $f$ (written $\det(f)$) is the determinant of any matrix $A$ representing $f$. The characteristic polynomial of $f$ is the polynomial $\chi_f(x) = \det(A - xI)$.

Of course, we now have to prove that these are well-defined. In other words, it doesn't matter which basis we choose. If $A$ and $B$ are two matrices representing $f$ with respect to different bases, then by Proposition 8.19 we have $B = P^{-1}AP$ for some invertible matrix $P$. Therefore

\[
\begin{aligned}
\det(B) &= \det(P^{-1}AP) \\
&= \det(P^{-1})\det(A)\det(P) \\
&= (\det(P))^{-1}\det(A)\det(P) \\
&= \det(A).
\end{aligned}
\]

Thus $\det(f)$ is well-defined. Similarly,

\[
\begin{aligned}
\det(A - xI) &= (\det P)^{-1}\det(A - xI)(\det P) \\
&= \det(P^{-1}(A - xI)P) \\
&= \det(P^{-1}AP - xI) \\
&= \det(B - xI) = \chi_B(x),
\end{aligned}
\]

which shows that the characteristic polynomial of a linear transformation $f$ does not depend on choice of basis either.

These observations have some very important consequences. Firstly, the characteristic polynomial of a matrix is invariant in the sense that if the matrices $A$ and $B$ are similar then $\chi_A(x) = \chi_B(x)$. This combined with Proposition 11.9 shows also that the trace of a matrix is invariant in the same way: $\operatorname{tr}(A) = \operatorname{tr}(P^{-1}AP)$ for any square matrix $A$ and any invertible $P$. Alternatively, this can be proved from first principles; see Exercise 11.10.

The invariance of the characteristic polynomial also enables us to prove the following well-known theorem.

Theorem 11.14 (The Cayley-Hamilton theorem) If $A$ is any square matrix over a field $F = \mathbb{R}$ or $\mathbb{C}$, and $\chi_A(x)$ is its characteristic polynomial, then $\chi_A(A) = 0$.
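Proof By Proposition 10.10 there is an invertible matrix $P$ over $\mathbb{C}$ such that $B = P^{-1}AP$ is upper triangular, and Theorem 10.16 says that if $\lambda_1, \lambda_2, \dots, \lambda_n \in \mathbb{C}$ are the diagonal entries in $B$ then

\[
(B - \lambda_1 I)(B - \lambda_2 I)\cdots(B - \lambda_n I) = 0.
\]

It follows that

\[
\begin{aligned}
(A - \lambda_1 I)(A - \lambda_2 I)\cdots(A - \lambda_n I)
&= PP^{-1}(A - \lambda_1 I)PP^{-1}(A - \lambda_2 I)\cdots PP^{-1}(A - \lambda_n I)PP^{-1} \\
&= P(P^{-1}AP - \lambda_1 I)(P^{-1}AP - \lambda_2 I)\cdots(P^{-1}AP - \lambda_n I)P^{-1} \\
&= P(B - \lambda_1 I)(B - \lambda_2 I)\cdots(B - \lambda_n I)P^{-1} \\
&= 0.
\end{aligned}
\]

Now, since $B$ is upper triangular, we have

\[
\chi_B(x) = (\lambda_1 - x)(\lambda_2 - x)\cdots(\lambda_n - x) = (-1)^n(x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_n).
\]

Since $P^{-1}AP = B$, then $\chi_A(x) = \chi_B(x)$; hence $\chi_A(A) = (-1)^n(A - \lambda_1 I)\cdots(A - \lambda_n I) = 0$ as required. □

As usual, the proof just given works over an arbitrary field $F$ in place of $\mathbb{R}$ or $\mathbb{C}$ by simply replacing $\mathbb{C}$ by the algebraic closure $\bar{F}$ of $F$.

Example 11.15 The matrix

\[
A = \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix}
\]

has characteristic polynomial defined by

\[
\chi_A(x) = \det(A - xI) = \begin{vmatrix} 1 - x & 2 \\ -1 & -x \end{vmatrix} = x^2 - x + 2.
\]

We have

\[
\chi_A(A) = A^2 - A + 2I = \begin{pmatrix} -1 & 2 \\ -1 & -2 \end{pmatrix} - \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} + \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]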
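For $2 \times 2$ matrices the theorem is easy to test directly, because $\det(A - xI) = x^2 - \operatorname{tr}(A)x + \det(A)$. The following sketch evaluates that polynomial at the matrix itself; the function name `cayley_hamilton_2x2` is an illustrative choice, and the check is restricted to the $2 \times 2$ case.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cayley_hamilton_2x2(A):
    """Return A^2 - tr(A) A + det(A) I for a 2 x 2 matrix A.

    By the Cayley-Hamilton theorem this should be the zero matrix.
    """
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    A2 = mat_mul(A, A)
    return [[A2[i][j] - trace * A[i][j] + (det if i == j else 0)
             for j in range(2)] for i in range(2)]


# The matrix with rows (1, 2) and (-1, 0): chi_A(x) = x^2 - x + 2.
result = cayley_hamilton_2x2([[1, 2], [-1, 0]])
```

Here `result` reproduces the computation $A^2 - A + 2I$ and comes out as the zero matrix; any other integer $2 \times 2$ matrix can be passed in the same way.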
The Cayley-Hamilton theorem has the following immediate corollary.

Corollary 11.16 For a square matrix $A$, $m_A(x)$ divides $\chi_A(x)$.

Proof Immediate from Theorem 11.14 and Proposition 11.3. □

What this means is that the characteristic polynomial gives a rather direct way of computing minimum polynomials. Rather than finding all the eigenvalues of a matrix $A$ by the rather long-winded method we have used so far and looking at the product of various polynomials of the form $(x - \lambda)$ where $\lambda$ is an eigenvalue, we now see that it suffices to compute the characteristic polynomial and look at polynomials dividing that.
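Example 11.17 To find the minimum polynomial of

\[
A = \begin{pmatrix} 5 & 3 & 1 \\ -1 & 1 & -1 \\ 2 & 6 & 6 \end{pmatrix},
\]

first find the characteristic polynomial

\[
\chi_A(x) = \det(A - xI) = \begin{vmatrix} 5 - x & 3 & 1 \\ -1 & 1 - x & -1 \\ 2 & 6 & 6 - x \end{vmatrix}.
\]

This is

\[
\begin{aligned}
&(5 - x)(6 - 7x + x^2 + 6) - 3(-6 + x + 2) + (-6 - 2 + 2x) \\
&\quad = 60 - 35x + 5x^2 - 12x + 7x^2 - x^3 + 12 - 3x - 8 + 2x \\
&\quad = 64 - 48x + 12x^2 - x^3 = -(x - 4)^3,
\end{aligned}
\]

so $m_A(x)$ is $(x - 4)$, $(x - 4)^2$, or $(x - 4)^3$. Now

\[
A - 4I = \begin{pmatrix} 1 & 3 & 1 \\ -1 & -3 & -1 \\ 2 & 6 & 2 \end{pmatrix} \neq 0
\]

so $m_A(x)$ is not $(x - 4)$. On the other hand,

\[
(A - 4I)^2 = \begin{pmatrix} 1 & 3 & 1 \\ -1 & -3 & -1 \\ 2 & 6 & 2 \end{pmatrix}\begin{pmatrix} 1 & 3 & 1 \\ -1 & -3 & -1 \\ 2 & 6 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]

so $m_A(x) = (x - 4)^2$.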
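When the characteristic polynomial is a power of a single factor $(x - \lambda)$, the minimum polynomial is $(x - \lambda)^k$ for the smallest $k$ with $(A - \lambda I)^k = 0$. The sketch below searches for that $k$, using the matrix with rows $(5, 3, 1)$, $(-1, 1, -1)$, $(2, 6, 6)$ and $\lambda = 4$; the function name `smallest_annihilating_power` is an illustrative choice.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shift(A, lam):
    """Return the matrix A - lam*I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def is_zero(M):
    """True if every entry of M is zero."""
    return all(x == 0 for row in M for x in row)


def smallest_annihilating_power(A, lam, max_power):
    """Smallest k <= max_power with (A - lam*I)^k = 0, or None if there is none."""
    N = shift(A, lam)
    power = N  # holds N^k at the start of iteration k
    for k in range(1, max_power + 1):
        if is_zero(power):
            return k
        power = mat_mul(power, N)
    return None


A = [[5, 3, 1],
     [-1, 1, -1],
     [2, 6, 6]]
k = smallest_annihilating_power(A, 4, 3)
```

Since the characteristic polynomial has degree 3, it is enough to search up to `max_power = 3`; the value of `k` found is the exponent of $(x - 4)$ in the minimum polynomial.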
In doing calculations of this sort, it is useful to remember Theorem 11.4 which says that every eigenvalue of a matrix $A$ appears as a root of the minimum polynomial. This limits the number of possible polynomials considerably, and can sometimes enable you to identify the minimum polynomial directly without any matrix multiplications at all.

Example 11.18 Consider the matrix
A= We compute
1
I
(-� � -�) . -4 0 -3
2 -X 2 1). X A = (2 - x) 3 _4 -3 - x = - (2 - x)(x - 1) = - (x - 2) (x - 1 ) (x +
Thus $A$ has three distinct eigenvalues, $2$, $1$, $-1$, and $m_A(x) = (x - 2)(x - 1)(x + 1)$.
Example 11.19 Let $A$ be the matrix
G =! D G =� D G =i !) (� � D .
We compute (A - I)(A - 2I) . (A - I) (A - 21)
�
Since A - I -::/:- 0 and A - 2I -::/:- 0, we deduce that that 1 and 2 are the only eigenvalues of A .
mA (x)
=
(x - l ) (x - 2)
and
The following terminology is often useful.

Definition 11.20 The algebraic multiplicity of an eigenvalue $\lambda$ of a linear transformation $f$ or a matrix $A$ is the multiplicity of $\lambda$ as a root of $\chi_f(x)$ or $\chi_A(x)$.
In Example 1 1.17, 4 is an eigenvalue of A of algebraic multiplicity 4, and the eigenvalues in Example 1 1 . 18 all have algebraic multiplicity 1. One of the eigenvalues 1 , 2 in Example 1 1. 19 has algebraic multiplicity 1 and the other has algebraic multiplicity 2�it is not possible to determine which is which without further calculation. We conclude this chapter with another proof of the Cayley-Hamilton the orem, this time without using the notion of algebraic closure of a field. This is the usual proof found in most textbooks. It is rather striking in the way the cancellations appear at the end, and perhaps for this reason only it is well worth reading, but to us it is not as clear an explanation of why the theorem is true as the proof given earlier. Second proof of Theorem 1 1 . 1 4 Let B = A - xi, so that Since this is a polynomial of degree n we can write
X A (x) = det(B).
XA (x) = bo + b 1x + · · · + bn xn . The crux of this proof is to consider the adjugate matrix, adj (B ) , of B , which has the property that B adj(B) = det(B)I. This matrix, adj (B ) , i s the transpose of the matrix of cofactors of B , an n x n matrix whose entries are plus or minus determinants of certain (n - 1 ) x (n - 1) submatrices of B . Each of these determinants is therefore a polynomial of degree at most n - 1, so
(
pu(x)
adj (B )
=
:
Pnl (x) where each PiJ (x) is a polynomial of degree at most n - 1 . Now we can separate out the constant terms of the entries of adj (B) into a constant matrix, and the x terms into x times a constant matrix, and so on, to give
\[
\operatorname{adj}(B) = B_0 + B_1 x + \cdots + B_{n-1} x^{n-1} \qquad (2)
\]
where $B_0, \ldots, B_{n-1}$ are $n \times n$ matrices. We know that $B \operatorname{adj}(B) = \det(B) I$, so
\[
\det(B) I = B \operatorname{adj}(B) = (A - xI) \operatorname{adj}(B) = A \operatorname{adj}(B) - x \operatorname{adj}(B),
\]
and both sides are polynomials in $x$ with matrix coefficients. Now $\det(B) = \chi_A(x) = b_0 + b_1 x + \cdots + b_n x^n$, so we have
\[
\det(B) I = b_0 I + b_1 x I + \cdots + b_n x^n I.
\]
Also, using the expression (2) for $\operatorname{adj}(B)$ above, we have
\[
A \operatorname{adj}(B) - x \operatorname{adj}(B) = AB_0 + AB_1 x + \cdots + AB_{n-1} x^{n-1} - B_0 x - B_1 x^2 - \cdots - B_{n-1} x^n.
\]
Equating coefficients of powers of $x$ in these two expressions gives
\begin{align*}
b_0 I &= AB_0 \\
b_1 I &= -B_0 + AB_1 \\
b_2 I &= -B_1 + AB_2 \\
&\ \ \vdots \\
b_{n-1} I &= -B_{n-2} + AB_{n-1} \\
b_n I &= -B_{n-1}.
\end{align*}
If we multiply these $n + 1$ equations on the left by $I, A, A^2, \ldots, A^n$ respectively, we obtain
\begin{align*}
b_0 I &= AB_0 \\
b_1 A &= -AB_0 + A^2 B_1 \\
b_2 A^2 &= -A^2 B_1 + A^3 B_2 \\
&\ \ \vdots \\
b_{n-1} A^{n-1} &= -A^{n-1} B_{n-2} + A^n B_{n-1} \\
b_n A^n &= -A^n B_{n-1}.
\end{align*}
Adding up these equations, all the terms on the right-hand side cancel out, and we are left with
\[
b_0 I + b_1 A + \cdots + b_n A^n = 0.
\]
In other words, $\chi_A(A) = 0$. □
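As a sanity check on the statement of the theorem (it is not part of the proof), the following Python sketch evaluates a characteristic polynomial at its own matrix. Several things here are assumptions made for illustration: the helper names `mat_mul`, `add_scalar`, `char_poly` and `poly_at_matrix` are ours, the coefficients are found with the Faddeev-LeVerrier recursion rather than with the adjugate argument above, and the polynomial computed is $\det(xI - A)$, which differs from $\chi_A(x) = \det(A - xI)$ only by the factor $(-1)^n$ and so vanishes at $A$ exactly when $\chi_A$ does. The $3 \times 3$ test matrix is an illustrative choice.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add_scalar(X, c):
    """Return X + c*I."""
    n = len(X)
    return [[X[i][j] + (c if i == j else 0) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients c[0..n] with det(xI - A) = c[0] + c[1]x + ... + c[n]x^n,
    found by the Faddeev-LeVerrier recursion (an assumption of this sketch)."""
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        # M_k = A M_{k-1} + c_{n-k+1} I, then c_{n-k} = -tr(A M_k) / k
        M = add_scalar(mat_mul(A, M), c[n - k + 1])
        AM = mat_mul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c


def poly_at_matrix(c, A):
    """Evaluate c[0] I + c[1] A + ... + c[n] A^n by Horner's rule."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(c):
        R = add_scalar(mat_mul(A, R), coeff)
    return R


A = [[3, -4, 2], [1, -1, 1], [1, -2, 2]]   # illustrative test matrix
coeffs = char_poly(A)
value = poly_at_matrix(coeffs, A)
print(coeffs)
print(value)
```

Exact rational arithmetic is used so that the final matrix is exactly zero rather than zero up to rounding.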
Exercises
Exercise 1 1 . 1 Calculate the eigenvalues of the following matrices.
(H �)
(b )
(a )
(c)
1 1
Exercise 1 1 . 2 Calculate the minimum polynomial and the characteristic poly nomial of each of the following matrices.
( a)
(-3- �
1 -4 3
Exercise 1 1 .3 Let
!)
A
c
(b)
-1 0
-3 -3 0 -1 1 -1
be the matrix
(-!
)
- ;)
(c)
-2 -3 4 3 2 3 -2 . -1 -1 4
Compute (A - 4I)(A - 6I) (A - 21) and (A - 4I)(A Say what you can deduce from your answers. Exercise 1 1 .4 Which of the eigenvectors multiplicity 2? Exercise 1 1 . 5 Let
A=
(2;
�
;
2 1
)
1, 2
c
-�)
-3 1 1 -3
- 6I)(A - 21) 2 .
in Example
. Verify that
-1 0
1 1.19
has algebraic
5 is an eigenvalue and find all
other eigenvalues and their algebraic multiplicities. For each eigenvalue .\, find a basis of the subspace ker ( A -
of the real vector space
.\1) = {v E �3 : (A - .\l)v = 0}
�3 .
Exercise 1 1 . 6 Find the roots and their multiplicities of the characteristic equa tion of the 4 x 4 real symmetric matrix A where
(You may find it easier to spot roots from the determinant using row operations rather than calculating the characteristic polynomial. ) Hence find a basis { v 1 , vz, v 3 , v4 } of consisting of eigenvectors of Write down the matrix of the linear transformation T : �4 --+ defined by T ( v) = A v with respect to your basis.
�4
�4
A.
Exercises 173 Exercise 1 1 . 7 Compute
0 0
1 0
0
0 0
1
0
1
0
0
Hence find the characteristic and minimum polynomials of .\ 0
0
f : C3
1 .\
0
0 0
1
0
1
.\
Exercise 11.8 Define $f : \mathbb{C}^3 \to \mathbb{C}^3$ by $f(a, b, c)^T = (a + 3b - c,\ 2a - b,\ b + 2c)^T$. Find the characteristic polynomial $\chi_f(x)$ of $f$, and verify that $\chi_f(f)$ is the zero map. Show that $\chi_f(x)$ has exactly one real root, which is between $-2$ and $-3$, and deduce that the minimum polynomial $m_f(x)$ of $f$ is the negative of the characteristic polynomial.

Exercise 11.9 Let $A$ be an $n \times n$ matrix over $\mathbb{C}$. Use the invariance of trace to prove that the trace of $A$ is equal to the sum of the eigenvalues of $A$, counting each eigenvalue $m$ times where $m$ is its algebraic multiplicity. Similarly, show that the determinant of $A$ is the product of the eigenvalues counting algebraic multiplicity.

Exercise 11.10 Using the definition of the $(j, k)$th entry in a matrix product, show that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ for matrices $A = (a_{ij})$ and $B = (b_{ij})$. Hence deduce $\operatorname{tr}(P^{-1}AP) = \operatorname{tr}(A)$ for any invertible matrix $P$.
12 Diagonalization

Our basic aim is to find as nice a matrix as possible representing a given linear transformation. That is, given a vector space $V$ and a linear transformation $f : V \to V$, we want to find a basis for $V$ with respect to which the matrix of $f$ is 'nice'. In this chapter, 'nice' will mean diagonal, if possible.

12.1 Diagonal matrices
Diagonalization is the process of transforming a matrix $A$ to a matrix $P^{-1}AP$ which is diagonal, or of finding a basis with respect to which the matrix representation of a given linear transformation $f$ is diagonal. We have already seen in examples that this process is often useful or necessary for solving certain kinds of simultaneous equations, such as the example in Section 10.1.

Unfortunately, some matrices or linear transformations cannot be diagonalized in this way, and there are two main restrictions preventing diagonalization. The first restriction is illustrated by Example 10.14. There, we saw a matrix which cannot even be put in upper triangular form over the reals. The problem here is that this matrix does not have real eigenvalues, and as pointed out in Proposition 10.15, the diagonal entries in any upper triangular form for a matrix $A$ are precisely the eigenvalues of $A$.

The eigenvalues of the matrix $A$ can be described as being the roots of a particular polynomial $p(x)$. (Here, $p(x)$ may be taken to be the minimum polynomial of $A$ or the characteristic polynomial of $A$; it doesn't matter.) So, for $A$ to be diagonalizable, it is necessary that $p(x)$ has the maximum number of roots in the field of scalars. For this reason, when we are investigating whether or not $A$ can be diagonalized, we will usually assume these roots always exist; for example, by working over the field of complex numbers or some other algebraically closed field, or by simply assuming that the characteristic polynomial has its maximum number of roots in the field.

It is still not always possible to get a basis with respect to which the matrix is diagonal, even if the matrix in question is upper triangular, as the following example shows.
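Example 12.1 Let $V = \mathbb{C}^2$, and define $f : V \to V$ by
\[
f \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + y \\ y \end{pmatrix}.
\]
With respect to the standard basis of $V$, the matrix of $f$ is $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. So if we apply the base-change matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ (assuming this is a base-change matrix, i.e. is nonsingular, so has determinant $ad - bc \neq 0$) we obtain the new matrix for $f$ as follows.
\[
\frac{1}{ad - bc} \begin{pmatrix} ad + cd - bc & d^2 \\ -c^2 & -bc - cd + ad \end{pmatrix}.
\]
If this is diagonal, then $c^2 = d^2 = 0$, so $c = d = 0$, which contradicts the fact that $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is invertible.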
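The base-change computation in Example 12.1 can be checked mechanically. The sketch below is illustrative only: it assumes that the matrix of $f$ is $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ (the matrix consistent with the formula displayed in the example), and the function names `conjugate` and `formula` and the sample base-change matrices are ours.

```python
from fractions import Fraction


def conjugate(a, b, c, d):
    """Return P^{-1} A P for A = [[1, 1], [0, 1]] and P = [[a, b], [c, d]]."""
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("P must be nonsingular")
    AP = [[a + c, b + d], [c, d]]                       # the product A P
    P_inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse of P
    return [[sum(P_inv[i][k] * AP[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def formula(a, b, c, d):
    """The closed form for P^{-1} A P stated in the example."""
    det = Fraction(a * d - b * c)
    return [[(a * d + c * d - b * c) / det, d * d / det],
            [-c * c / det, (-b * c - c * d + a * d) / det]]


samples = [(1, 0, 0, 1), (2, 1, 1, 1), (1, 2, 3, 4), (0, 1, 1, 0), (5, -2, 3, 7)]
for P in samples:
    new = conjugate(*P)
    # The off-diagonal entries are d^2/det and -c^2/det; they vanish together
    # only if c = d = 0, which is impossible for a nonsingular P.
    print(P, new == formula(*P), (new[0][1], new[1][0]) != (0, 0))
```

A finite list of samples proves nothing in general, of course; the point is only that the displayed formula agrees with a direct computation of $P^{-1}AP$.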
The purpose of this chapter is to investigate this second restriction in more detail, and to show how to find diagonal representations when these exist. We start with a definition of the concept to be studied.

Definition 12.2 A linear transformation $f : V \to V$ is called diagonalizable if there exists a basis of $V$ with respect to which the matrix of $f$ is diagonal. Similarly, a matrix $A$ which represents a linear transformation is called diagonalizable if there exists an invertible matrix $P$ such that $P^{-1}AP$ is diagonal.

This concept is intimately connected with eigenvectors.

Proposition 12.3 A linear transformation $f : V \to V$ is diagonalizable if and only if there is a basis of $V$ consisting of eigenvectors of $f$.

Proof If $\{v_1, \ldots, v_n\}$ is a basis of $V$ consisting of eigenvectors of $f$, then we have $f(v_i) = \lambda_i v_i$ for some scalars $\lambda_i$, and the matrix of $f$ is by definition
\[
\begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}.
\]
Conversely, if the matrix of $f$ with respect to $\{v_1, \ldots, v_n\}$ is as above, then $f(v_i) = \lambda_i v_i$, so the basis vectors $v_i$ are eigenvectors of $f$. □
12.2 A criterion for diagonalizability
This section uses the ideas from the previous chapter to give a precise criterion for when a matrix or linear transformation is diagonalizable. Recall that in the last chapter we identified two polynomials whose roots are precisely the eigenvalues of a linear transformation $f : V \to V$, namely the minimum polynomial $m_f(x)$ and the characteristic polynomial $\chi_f(x)$. If $f$ is diagonalizable, then the minimum polynomial of $f$ takes a particularly simple form.

Theorem 12.4 Let $V$ be a vector space, and let $f : V \to V$ be a linear transformation. If $f$ is diagonalizable, then the minimum polynomial $m_f(x)$ of $f$ has distinct roots.

Proof Assume that $f$ is diagonalizable, so there is a basis of eigenvectors $\{v_1, \ldots, v_n\}$ of $V$. Let $\lambda_1, \ldots, \lambda_r$ be the distinct eigenvalues of $f$, and define the polynomial
\[
p(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_r).
\]
Then $p(f) = (f - \lambda_1 I)(f - \lambda_2 I) \cdots (f - \lambda_r I)$, and the factors $(f - \lambda_i I)$ all commute with one another by Proposition 9.5. Now each basis vector $v_i$ is an eigenvector, with eigenvalue $\lambda_j$ for some $j$, and therefore we have $(f - \lambda_j I)(v_i) = 0$. It follows that
\begin{align*}
p(f)(v_i) &= (f - \lambda_1 I)(f - \lambda_2 I) \cdots (f - \lambda_r I) v_i \\
&= (f - \lambda_1 I) \cdots (f - \lambda_{j-1} I)(f - \lambda_{j+1} I) \cdots (f - \lambda_r I)(f - \lambda_j I) v_i \\
&= 0
\end{align*}
for each $i$, and so $p(f)$ is the zero transformation. Therefore $m_f(x)$ divides $p(x)$, by Proposition 11.3. But $p(x)$ has distinct roots, and therefore $m_f(x)$ has distinct roots. □

In fact, the converse of this result is also true, and we will prove it in due course (see Corollary 12.13). Notice that when we have proved this, we will have a criterion which we can use to determine if a particular linear transformation is diagonalizable. Specifically: $f$ is diagonalizable if and only if $m_f(x)$ has distinct roots.

Recall from Theorem 10.2 that if $f : V \to V$, $\lambda$ is a scalar, and $v \in V$, then $v \in \ker(f - \lambda I)$ if and only if $v$ is either zero or an eigenvector of $f$ with eigenvalue $\lambda$.

Definition 12.5 If $\lambda$ is an eigenvalue of a linear transformation $f : V \to V$, then the subspace $\ker(f - \lambda I)$ is called the $\lambda$-eigenspace of $f$. Its dimension is called the geometric multiplicity of the eigenvalue $\lambda$.
We will prove that if the minimum polynomial of $f$ has distinct roots, then $V$ is the 'direct sum' of these eigenspaces. In other words, if we take whatever basis $B_\lambda$ we like for the $\lambda$-eigenspace $\ker(f - \lambda I)$, and do this for all eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$, then
\[
B_{\lambda_1} \cup B_{\lambda_2} \cup \cdots \cup B_{\lambda_k}
\]
is automatically a basis of $V$, provided $m_f(x)$ has distinct roots.
Definition 12.6 $V$ is the direct sum of subspaces $V_1, \ldots, V_r$, if every vector $v \in V$ can be written uniquely as a sum $v = v_1 + \cdots + v_r$, where $v_i \in V_i$. We write $V = V_1 \oplus \cdots \oplus V_r$ or $V = \bigoplus_{i=1}^r V_i$.
A useful way to think of direct sums is that $V$ is a direct sum $V = \bigoplus_{i=1}^r V_i$ if and only if whenever we have bases $B_i$ of $V_i$, then their union $\bigcup_i B_i$ is a basis of $V$, as the following proposition shows.

Proposition 12.7 If $V_i$ is a subspace of a vector space $V$, where $V = \bigoplus_{i=1}^r V_i$, and if $B_i$ is a basis for $V_i$ for each $i$, then $\bigcup_{i=1}^r B_i$ is a basis for $V$.

Proof First, every vector $v \in V$ can be written as $v = v_1 + \cdots + v_r$, where $v_i \in V_i$. Therefore for each $i$, $v_i$ is a linear combination of the vectors in $B_i$, and so $v$ is a linear combination of the vectors in $\bigcup_{i=1}^r B_i$. Thus $\bigcup_{i=1}^r B_i$ spans the space $V$.

Now suppose that there is a linear dependence among the vectors of $\bigcup_{i=1}^r B_i$. Write $B_1 = \{a_1, \ldots, a_k\}$, $B_2 = \{b_1, \ldots, b_l\}$, ..., $B_r = \{z_1, \ldots, z_t\}$, and suppose the linear dependence is
\[
\lambda_1 a_1 + \cdots + \lambda_k a_k + \mu_1 b_1 + \cdots + \mu_l b_l + \cdots + \zeta_1 z_1 + \cdots + \zeta_t z_t = 0.
\]
Then $v_1 = \lambda_1 a_1 + \cdots + \lambda_k a_k \in V_1$, $v_2 = \mu_1 b_1 + \cdots + \mu_l b_l \in V_2$, and so on. Thus we have written the zero vector as
\[
0 = v_1 + v_2 + \cdots + v_r,
\]
where $v_i \in V_i$. But since $0 = 0 + \cdots + 0$ is the unique way of writing $0$ as a sum of vectors in the $V_i$, we must have $v_i = 0$ for all $i$. This implies that $\lambda_1 a_1 + \cdots + \lambda_k a_k = 0$, so all the $\lambda_i = 0$, since $\{a_1, \ldots, a_k\}$ is a basis for $V_1$. Similarly, $\mu_i = 0$, ..., $\zeta_i = 0$. In other words $\bigcup_{i=1}^r B_i$ is a linearly independent subset. □

Corollary 12.8 If $V = \bigoplus_{i=1}^r V_i$, then $\dim(V) = \sum_{i=1}^r \dim(V_i)$.

Proof Immediate from Proposition 12.7. □
The main theorem of this section is the following, from which we can easily deduce the converse to Theorem 12.4.

Theorem 12.9 Let $V$ be a complex vector space, and let $f : V \to V$ be a linear transformation. Suppose that
\[
m_f(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_r)
\]
with $\lambda_1, \lambda_2, \ldots, \lambda_r$ distinct, and let $V_i$ be the $\lambda_i$-eigenspace of $f$. Then
\[
V = V_1 \oplus V_2 \oplus \cdots \oplus V_r.
\]
According to Definition 12.6, there are two parts to proving $V$ is a direct sum of subspaces: uniqueness (proved below in Corollary 12.11) and existence (Proposition 12.12). The first part uses the fact that eigenvectors with distinct eigenvalues are linearly independent. More formally,

Proposition 12.10 Let $f : V \to V$ be a linear transformation, and suppose that $v_1, \ldots, v_r$ are eigenvectors of $f$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_r$ respectively. Then $\{v_1, \ldots, v_r\}$ is a linearly independent set.

Proof Suppose not, and let $k$ be the smallest integer such that $\{v_1, \ldots, v_k\}$ is linearly dependent. In particular, $\{v_1, \ldots, v_{k-1}\}$ is linearly independent, and there exists a linear dependence
\[
\alpha_1 v_1 + \cdots + \alpha_k v_k = 0
\]
with $\alpha_k \neq 0$. Moreover, $v_k \neq 0$ so at least one other $\alpha_i$ is nonzero ($i < k$). Applying $f$ to both sides of this equation we obtain
\begin{align*}
0 &= f(0) \\
&= f(\alpha_1 v_1 + \cdots + \alpha_k v_k) \\
&= \alpha_1 f(v_1) + \cdots + \alpha_k f(v_k) \\
&= \alpha_1 \lambda_1 v_1 + \cdots + \alpha_k \lambda_k v_k.
\end{align*}
Now subtract $\lambda_k$ times the first equation from the second, to obtain
\[
\alpha_1 (\lambda_1 - \lambda_k) v_1 + \alpha_2 (\lambda_2 - \lambda_k) v_2 + \cdots + \alpha_{k-1} (\lambda_{k-1} - \lambda_k) v_{k-1} = 0.
\]
But $\lambda_i - \lambda_k \neq 0$ for $i < k$ since $\lambda_1, \ldots, \lambda_k$ are distinct, so this is a nontrivial linear dependence among $\{v_1, \ldots, v_{k-1}\}$, contradicting the fact that these vectors are linearly independent. □

Corollary 12.11 Let $f : V \to V$ be a linear transformation, and suppose that $v \in V$ can be written as $v = v_1 + \cdots + v_r$ where $v_i$ is an eigenvector of $f$ with eigenvalue $\lambda_i$, and $\lambda_1, \ldots, \lambda_r$ are distinct. If also $v = w_1 + \cdots + w_r$ with each $w_i$ an eigenvector of $f$ with eigenvalue $\lambda_i$, then $v_i = w_i$ for each $i$.

Proof Otherwise
\[
(v_1 - w_1) + \cdots + (v_r - w_r) = 0
\]
is, after discarding the terms with $v_i = w_i$, a nontrivial linear dependence of eigenvectors with distinct eigenvalues, contradicting Proposition 12.10. □

This proves the uniqueness part of Theorem 12.9. The existence part is a little harder.

Proposition 12.12 Suppose that $f : V \to V$ is a linear transformation with minimum polynomial
\[
m_f(x) = (x - \lambda_1) \cdots (x - \lambda_r)
\]
where $\lambda_1, \ldots, \lambda_r$ are distinct, and suppose that $v \in V$. Then there exist eigenvectors $v_1, \ldots, v_r$ of $f$ such that $v = v_1 + \cdots + v_r$.
Proof For each $j$ in turn consider the polynomial $p_j(x)$ defined by
\[
p_j(x) = \frac{(x - \lambda_1) \cdots (x - \lambda_{j-1})(x - \lambda_{j+1}) \cdots (x - \lambda_r)}{(\lambda_j - \lambda_1) \cdots (\lambda_j - \lambda_{j-1})(\lambda_j - \lambda_{j+1}) \cdots (\lambda_j - \lambda_r)}.
\]
Note that $p_j(x)$ is well-defined (since all the $\lambda_i$ are distinct) and that $p_j(\lambda_j) = 1$, while $p_j(\lambda_k) = 0$ if $k \neq j$. Now consider the polynomial $p(x)$ defined by
\[
p(x) = \sum_{j=1}^r p_j(x).
\]
This has the property that $p(\lambda_i) = 1$ for each $i$ since $p_j(\lambda_i) = 0$ if $i \neq j$ and $p_i(\lambda_i) = 1$. Thus the polynomial $p(x) - 1$ has roots $\lambda_1, \ldots, \lambda_r$. But $p(x) - 1$ has degree at most $r - 1$, since each $p_j(x)$ has degree $r - 1$, so unless it is identically zero $p(x) - 1$ has at most $r - 1$ roots. Since all the $\lambda_i$ are distinct, the only way this can happen is if $p(x) - 1$ is identically 0. Thus $p(x)$ is identically 1. Hence $p(f)$ is the identity linear transformation, and so $p(f)(v) = v$. But
\[
p(f) = \sum_{j=1}^r p_j(f)
\]
so
\[
v = \sum_{j=1}^r p_j(f)(v) = \sum_{j=1}^r v_j
\]
where
\[
v_j = p_j(f)(v) = \frac{(f - \lambda_1 I) \cdots (f - \lambda_{j-1} I)(f - \lambda_{j+1} I) \cdots (f - \lambda_r I)}{(\lambda_j - \lambda_1) \cdots (\lambda_j - \lambda_{j-1})(\lambda_j - \lambda_{j+1}) \cdots (\lambda_j - \lambda_r)} (v).
\]
Applying $f - \lambda_j I$ to both sides of this we conclude that $(f - \lambda_j I)(v_j)$ is a scalar multiple of $m_f(f)(v)$, which is 0. Therefore $f(v_j) = \lambda_j v_j$, so $v_j$ is an eigenvector of $f$ with eigenvalue $\lambda_j$, as required. □

An alternative proof of this proposition can be given by Proposition 9.14 and induction. See also Proposition 12.17 below.

Theorem 12.9 now follows immediately from Corollary 12.11 and Proposition 12.12.
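The construction in the proof of Proposition 12.12 is entirely explicit, so it can be tried on a small example. The following Python sketch is illustrative: the function name `lagrange_pieces` is ours, and the $3 \times 3$ matrix is assumed to have minimum polynomial $(x - 1)(x - 2)$ (it is the coefficient matrix of the difference equations solved in Example 12.18). The function returns the vectors $v_j = p_j(A)v$; it only produces eigenvector components when the listed eigenvalues are exactly the distinct roots of the minimum polynomial, as in the hypothesis of the proposition.

```python
from fractions import Fraction


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(X, v):
    return [sum(X[i][k] * v[k] for k in range(len(v))) for i in range(len(X))]


def lagrange_pieces(A, eigenvalues, v):
    """Return [p_1(A)v, ..., p_r(A)v], where p_j is the polynomial of degree
    r - 1 with p_j(eigenvalues[j]) = 1 and p_j = 0 at the other eigenvalues."""
    n = len(A)
    pieces = []
    for j, lam_j in enumerate(eigenvalues):
        M = [[Fraction(int(i == k)) for k in range(n)] for i in range(n)]
        for m, lam_m in enumerate(eigenvalues):
            if m != j:
                scale = Fraction(1) / (lam_j - lam_m)
                # factor = (A - lam_m I) / (lam_j - lam_m)
                factor = [[(A[i][k] - (lam_m if i == k else 0)) * scale
                           for k in range(n)] for i in range(n)]
                M = mat_mul(M, factor)
        pieces.append(mat_vec(M, v))
    return pieces


A = [[3, -4, 2], [1, -1, 1], [1, -2, 2]]   # assumed: m_A(x) = (x - 1)(x - 2)
eigenvalues = [1, 2]
v = [1, 2, -1]
pieces = lagrange_pieces(A, eigenvalues, v)
print(pieces)
```

For this input the two pieces add up to $v$, and each is an eigenvector for the corresponding eigenvalue, which is what the proof predicts.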
Corollary 12.13 Suppose that $f : V \to V$ is a linear transformation with minimum polynomial
\[
m_f(x) = (x - \lambda_1) \cdots (x - \lambda_r)
\]
where $\lambda_1, \ldots, \lambda_r$ are distinct. Then $f$ is diagonalizable.

Proof By Corollary 12.11 and Proposition 12.12, $V$ is the direct sum of the eigenspaces $V_1, \ldots, V_r$ of $f$. Choose a basis $B_i$ for each eigenspace $V_i$. Then by Proposition 12.7, $\bigcup_{i=1}^r B_i$ is a basis for $V$, and every element of this basis is by definition an eigenvector of $f$. The result follows from Proposition 12.3. □

12.3 Examples
We shall start by discussing some matrices whose minimum polynomials were calculated in the previous chapter.

Example 12.14 In Example 11.17 we showed that the matrix $A$ considered there has minimum polynomial $m_A(x) = x^2 - 8x + 16$. Since this polynomial factorizes completely into linear factors over $\mathbb{R}$, $A$ is similar to an upper triangular matrix, but $m_A(x) = (x - 4)^2$ has 4 as a repeated root, so Theorem 12.4 says that $A$ cannot be diagonalized.

Example 12.15 The matrix

over $\mathbb{R}$ has minimum polynomial $x^2 - 2x + 2$. To see this, it suffices to check that $A^2 - 2A + 2I = 0$, and that $x^2 - 2x + 2$ cannot be factorized over $\mathbb{R}$. However, $m_A(x)$ has no real roots, so $A$ is not similar to any upper triangular matrix, let alone a diagonal one. Over $\mathbb{C}$, the situation is different as $x^2 - 2x + 2 = (x - (1 + i))(x - (1 - i))$ which has two distinct roots in $\mathbb{C}$, so $A$ is diagonalizable over $\mathbb{C}$. In fact, we find that

as you may check.
Example 12.16 Consider the matrix
\[
A = \begin{pmatrix} 3 & 0 & 2 \\ -4 & 2 & -5 \\ -4 & 0 & -3 \end{pmatrix}
\]
of Example 11.18. This has $m_A(x) = (x - 2)(x - 1)(x + 1)$ so the eigenvalues of $A$ are $2, 1, -1$. The minimum polynomial $m_A(x)$ has its maximum number of roots (three) in $\mathbb{R}$ and all these roots are distinct, so $A$ is diagonalizable. In other words, there is an invertible $3 \times 3$ matrix $P$ with real entries so that
\[
P^{-1}AP = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.
\]
To find such a matrix $P$, it suffices to find a basis of eigenvectors of $A$, since $P$ is just the base-change matrix from the usual basis to a basis of eigenvectors. These eigenvectors can be found as usual by solving simultaneous equations. For eigenvalue 2, we need to solve
\[
\begin{pmatrix} 1 & 0 & 2 \\ -4 & 0 & -5 \\ -4 & 0 & -5 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]
The full solution is that $(x, y, z)^T$ is any scalar multiple of $(0, 1, 0)^T$, so we may take $(0, 1, 0)^T$ as our first basis vector. Similarly, $(1, -1, -1)^T$ and $(1, -2, -2)^T$ are eigenvectors with eigenvalues $1, -1$ respectively. Proposition 12.10 says that these three vectors form a basis, and so (taking them in the same order as we took the eigenvalues $2, 1, -1$) we see we may take
\[
P = \begin{pmatrix} 0 & 1 & 1 \\ 1 & -1 & -2 \\ 0 & -1 & -2 \end{pmatrix}.
\]
Note, however, that any three eigenvectors for the three eigenvalues form a basis with respect to which the matrix of the transformation $x \mapsto Ax$ is diagonal, so the choice of $P$ above is by no means unique.
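The claims of Example 12.16 are easy to confirm by machine. The sketch below is illustrative only: it takes $A$ to be the matrix with rows $(3, 0, 2)$, $(-4, 2, -5)$, $(-4, 0, -3)$ and $P$ to have the three eigenvectors named in the example as its columns, and the helper names are ours. Instead of inverting $P$ it checks the equivalent statement $AP = PD$ together with $\det P \neq 0$, and it also checks that $(A - 2I)(A - I)(A + I) = 0$.

```python
def mat_mul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def shift(X, c):
    """Return X - c*I."""
    return [[X[i][j] - (c if i == j else 0) for j in range(len(X))]
            for i in range(len(X))]


def det3(M):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[3, 0, 2], [-4, 2, -5], [-4, 0, -3]]
P = [[0, 1, 1], [1, -1, -2], [0, -1, -2]]   # columns: eigenvectors for 2, 1, -1
D = [[2, 0, 0], [0, 1, 0], [0, 0, -1]]

AP = mat_mul(A, P)
PD = mat_mul(P, D)
min_poly_at_A = mat_mul(shift(A, 2), mat_mul(shift(A, 1), shift(A, -1)))
print(AP == PD, det3(P), min_poly_at_A)
```

Since $\det P \neq 0$, the equation $AP = PD$ is the same as $P^{-1}AP = D$.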
Looking ahead to Chapter 14 and the primary decomposition theorem (Theorem 14.3), we can point out an alternative method for finding eigenvectors other than solving the obvious simultaneous equations. With $A$ as in the previous example, we already identified $m_A(x) = (x - 2)(x - 1)(x + 1)$, so it is clear that $\operatorname{im} B \subseteq \ker C$ where $B = (A - I)(A + I)$ and $C = (A - 2I)$, since the product $CB$ of these two matrices is zero. It turns out in fact that these subspaces are actually equal, $\operatorname{im} B = \ker C$, so to find the eigenvectors with eigenvalue 2 it suffices to compute $B$ and find a basis of its image:
\[
B = (A - I)(A + I) =
\begin{pmatrix} 2 & 0 & 2 \\ -4 & 1 & -5 \\ -4 & 0 & -4 \end{pmatrix}
\begin{pmatrix} 4 & 0 & 2 \\ -4 & 3 & -5 \\ -4 & 0 & -2 \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 3 & -3 \\ 0 & 0 & 0 \end{pmatrix}
\]
so the image is spanned by $(0, 1, 0)^T$. The other two eigenspaces can be computed in a similar way.
In practice, this method seems useful for simpler matrices, especially when the work in identifying the minimum polynomial has already been done. But, in general, calculating bases of the image of a matrix still involves computing echelon forms, so there may not be any real saving in effort for more complicated examples. For the interested reader, we give the result that states that this method works as follows.

Proposition 12.17 For any $n \times n$ matrix $A$ over a field $F$, if $A$ has minimum polynomial $m_A(x) = p(x)q(x)$, where $p(x)$ and $q(x)$ do not have any nonconstant factor in common, then $\operatorname{im}(p(A)) = \ker(q(A))$.

The proof uses the version of Euclid's algorithm given as Proposition 9.14, and is left as an exercise for the reader.

Diagonalization is frequently applied in the solution of simultaneous linear difference equations and simultaneous linear differential equations. For example, if sequences $x_n$, $y_n$ are defined by
\[
x_0 = r, \quad y_0 = s, \qquad x_{n+1} = a_{11} x_n + a_{12} y_n, \quad y_{n+1} = a_{21} x_n + a_{22} y_n,
\]
we would like to find formulae for $x_n$ and $y_n$ in terms of the known quantities $a_{11}, a_{12}, a_{21}, a_{22}, r, s$. Of course, we may write
\[
\begin{pmatrix} x_n \\ y_n \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^n \begin{pmatrix} r \\ s \end{pmatrix}
\]
but this just begs the question of determining a formula for the $n$th power of a square matrix. Diagonalization helps here, since if
\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\quad\text{and}\quad
P^{-1}AP = \begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix}
\]
then
\[
(P^{-1}AP)^n = P^{-1}AP\,P^{-1}AP \cdots P^{-1}AP = P^{-1}A^nP
\quad\text{and}\quad
(P^{-1}AP)^n = \begin{pmatrix} \lambda^n & 0 \\ 0 & \mu^n \end{pmatrix}.
\]
So
\[
A^n = P \begin{pmatrix} \lambda^n & 0 \\ 0 & \mu^n \end{pmatrix} P^{-1}
\]
which gives $A^n$ in terms of the eigenvalues $\lambda, \mu$ of $A$ and a basis of eigenvectors given by $P$. This is what is going on in the example in Section 10.1, and of course there is nothing special about $2 \times 2$ matrices here.
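Example 12.18 We solve the system of difference equations
\begin{align*}
x_{n+1} &= 3x_n - 4y_n + 2z_n, & x_0 &= 1, \\
y_{n+1} &= x_n - y_n + z_n, & y_0 &= 2, \\
z_{n+1} &= x_n - 2y_n + 2z_n, & z_0 &= -1.
\end{align*}
The solution is
\[
\begin{pmatrix} x_n \\ y_n \\ z_n \end{pmatrix} = A^n \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}
\quad\text{where}\quad
A = \begin{pmatrix} 3 & -4 & 2 \\ 1 & -1 & 1 \\ 1 & -2 & 2 \end{pmatrix}.
\]
Now, the minimum polynomial of $A$ was computed in Example 11.19 and found to be $(x - 1)(x - 2)$, which is a product of distinct linear factors, so $A$ is diagonalizable. To find a basis of $\mathbb{R}^3$ of eigenvectors we must solve the simultaneous equations $(A - I)(x, y, z)^T = 0$ and $(A - 2I)(x, y, z)^T = 0$. The first of these is
\[
\begin{pmatrix} 2 & -4 & 2 \\ 1 & -2 & 1 \\ 1 & -2 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]
which has solution space
\[
\ker(A - I) = \operatorname{span}\left((1, 1, 1)^T, (2, 1, 0)^T\right)
\]
as you may check. The second eigenspace is the set of solutions of
\[
\begin{pmatrix} 1 & -4 & 2 \\ 1 & -3 & 1 \\ 1 & -2 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]
which is
\[
\ker(A - 2I) = \operatorname{span}\left((2, 1, 1)^T\right).
\]
Therefore,
\[
P^{-1}AP = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}
\quad\text{where}\quad
P = \begin{pmatrix} 1 & 2 & 2 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}.
\]
(Of course, there are many other bases for the eigenspaces, and so many other suitable base-change matrices one might take.) This gives
\begin{align*}
A^n &= P \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2^n \end{pmatrix} P^{-1}
= \begin{pmatrix} 1 & 2 & 2^{n+1} \\ 1 & 1 & 2^n \\ 1 & 0 & 2^n \end{pmatrix}
\begin{pmatrix} -1 & 2 & 0 \\ 0 & 1 & -1 \\ 1 & -2 & 1 \end{pmatrix} \\
&= \begin{pmatrix} -1 + 2^{n+1} & 4 - 2^{n+2} & -2 + 2^{n+1} \\ -1 + 2^n & 3 - 2^{n+1} & -1 + 2^n \\ -1 + 2^n & 2 - 2^{n+1} & 2^n \end{pmatrix}
\end{align*}
so
\[
\begin{pmatrix} x_n \\ y_n \\ z_n \end{pmatrix} = A^n \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}
= \begin{pmatrix} 9 - 2^{n+3} \\ 6 - 2^{n+2} \\ 3 - 2^{n+2} \end{pmatrix}.
\]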
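A closed form of this kind is easy to test against the recurrence itself. The Python sketch below is illustrative: the function names are ours, and the closed form it uses, $x_n = 9 - 2^{n+3}$, $y_n = 6 - 2^{n+2}$, $z_n = 3 - 2^{n+2}$, is the one obtained by applying $A^n = P \operatorname{diag}(1, 1, 2^n) P^{-1}$ to the initial vector $(1, 2, -1)^T$ of Example 12.18.

```python
def step(state):
    """One step of the difference equations of Example 12.18."""
    x, y, z = state
    return (3 * x - 4 * y + 2 * z, x - y + z, x - 2 * y + 2 * z)


def closed_form(n):
    """Closed form obtained by diagonalization (see the lead-in)."""
    return (9 - 2 ** (n + 3), 6 - 2 ** (n + 2), 3 - 2 ** (n + 2))


def iterate(n):
    """Apply the recurrence n times to the initial values (1, 2, -1)."""
    state = (1, 2, -1)
    for _ in range(n):
        state = step(state)
    return state


agreement = [iterate(n) == closed_form(n) for n in range(12)]
print(all(agreement))
```

Agreement for the first few values of $n$ is a check, not a proof; the proof is the diagonalization argument of the example.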
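Similar methods can be used to solve simultaneous differential equations.

Example 12.19 Suppose variables $u$, $v$ depend on time, $t$, according to the equations
\[
\frac{du}{dt} = u + 3v, \qquad \frac{dv}{dt} = u - v,
\]
which can be written as
\[
\frac{d}{dt} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}.
\]
It turns out that this $2 \times 2$ matrix has eigenvalues 2 and $-2$, so can be diagonalized. In fact
\[
P^{-1} \begin{pmatrix} 1 & 3 \\ 1 & -1 \end{pmatrix} P = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}
\quad\text{where}\quad
P = \begin{pmatrix} 3 & -1 \\ 1 & 1 \end{pmatrix}.
\]
This suggests introducing variables $x$, $y$ with $(x, y)^T = P^{-1}(u, v)^T$, or $x = (u + v)/4$, $y = (-u + 3v)/4$, for then
\[
\frac{d}{dt} \begin{pmatrix} u \\ v \end{pmatrix} = \frac{d}{dt} P \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 1 & -1 \end{pmatrix} P \begin{pmatrix} x \\ y \end{pmatrix},
\]
or
\[
\frac{d}{dt} \begin{pmatrix} x \\ y \end{pmatrix} = P^{-1} \begin{pmatrix} 1 & 3 \\ 1 & -1 \end{pmatrix} P \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.
\]
This gives $dx/dt = 2x$, $dy/dt = -2y$ so $x = Ae^{2t}$, $y = Be^{-2t}$ for some constants $A$, $B$, so the solution is $(u, v)^T = P(x, y)^T$, or
\[
u = 3Ae^{2t} - Be^{-2t}, \qquad v = Ae^{2t} + Be^{-2t}.
\]
The constants $A$, $B$ can be found as usual from boundary conditions. For example, if we are given that $u = u_0$ and $v = v_0$ at time $t = 0$, then $A = (u_0 + v_0)/4$ and $B = (3v_0 - u_0)/4$.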
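The solution of Example 12.19 can be checked by substituting it back into the equations. The sketch below is illustrative: the function names are ours, the solution formula used is $u = 3Ae^{2t} - Be^{-2t}$, $v = Ae^{2t} + Be^{-2t}$ (that is, $(u, v)^T = P(x, y)^T$ with $P$ having columns $(3, 1)^T$ and $(-1, 1)^T$), and the derivatives are written out by hand. The initial values are chosen so that both constants come out negative, which is a reminder that $A$ and $B$ are arbitrary constants, not necessarily positive ones.

```python
import math


def constants_from_initial(u0, v0):
    """A and B determined by u(0) = u0, v(0) = v0."""
    return (u0 + v0) / 4, (3 * v0 - u0) / 4


def solution(t, A, B):
    """(u(t), v(t)) for the general solution of the example."""
    return (3 * A * math.exp(2 * t) - B * math.exp(-2 * t),
            A * math.exp(2 * t) + B * math.exp(-2 * t))


def derivative(t, A, B):
    """(du/dt, dv/dt), differentiating the formulas in `solution` by hand."""
    return (6 * A * math.exp(2 * t) + 2 * B * math.exp(-2 * t),
            2 * A * math.exp(2 * t) - 2 * B * math.exp(-2 * t))


A, B = constants_from_initial(1.0, -3.0)
residuals = []
for t in (0.0, 0.25, 0.5, 1.0):
    u, v = solution(t, A, B)
    du, dv = derivative(t, A, B)
    # Both residuals should vanish: du/dt = u + 3v and dv/dt = u - v.
    residuals.append((du - (u + 3 * v), dv - (u - v)))
print(A, B, residuals)
```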
The exercises following provide more examples, and one or two hints on some useful tricks that can be applied in similar cases. Not all matrices can be diagonalized, though, and when you meet such an example the methods of this chapter cannot be used. Instead, it may be necessary to put the matrix in Jordan normal form, and use the ideas from Chapter 14 below.

Exercises
Exercise 1 2 . 1 Calculate the characteristic polynomials and minimum polyno-
mials of the following matrices.
( a) ( c)
G D 2 1 0
0 I 0 0
( b)
2 0 2 0
(� �)
( d)
Which of these ( if any ) is diagonalizable? Exercise 1 2 . 2 Show that, regarded as 2
(cos B sin B
- sin B cos B
)
x
(i �) 0 1 0
(! i) 2 0 2
0 1 0 0
3
2 matrices over C,
and
are similar. Exercise 1 2 . 3 Which of the following are diagonalizable? Explain your answers,
but try to do as little work as possible, using results from this and previous chapters where applicable.
( a) ( d)
-
-
G H) G D G D ( -� �i ) (-! ;) (b)
2 -4'
over �
Exercise 1 2 . 4 ( a) Find two 2
0 I
0
(e)
4 -I -2
( c)
-1 1 -2
-4
over C .
$\times$ 2 matrices over $\mathbb{R}$ which have the same characteristic polynomial but which are not similar. (b) Find two $3 \times 3$ matrices over $\mathbb{R}$ which have the same minimum polynomial but which are not similar. (c) Find two $4 \times 4$ matrices over $\mathbb{R}$ which have the same minimum polynomial and the same characteristic polynomial, but which are not similar.
Exercise 12.5 The Fibonacci numbers $x_n$ are defined by $x_{n+2} = x_{n+1} + x_n$, $x_0 = x_1 = 1$. Let $u_n = x_{2n}$, $v_n = x_{2n+1}$ and find a matrix $A$ so that
\[
\begin{pmatrix} u_{n+1} \\ v_{n+1} \end{pmatrix} = A \begin{pmatrix} u_n \\ v_n \end{pmatrix}.
\]
Diagonalize $A$ and hence find a formula for $x_n$ in terms of $n$.

Exercise 12.6 Solve $x_{n+1} = -y_n + z_n$; $y_{n+1} = -y_n$; $z_{n+1} = 2x_n - 2y_n + z_n$; with initial values $x_0 = y_0 = 1$, $z_0 = 2$.

Exercise 12.7 Solve
(a) $x_{n+1} = x_n + 2y_n$; $y_{n+1} = 2x_n + y_n + 1$; where $x_0 = y_0 = 1$. [Hint: introduce $z_n$ with $z_{n+1} = z_n$ and $z_0 = 1$.]
(b) $x_{n+1} = 2x_n + 3y_n$; $y_{n+1} = 3x_n + 2y_n + 2^n$; where $x_0 = 1$, $y_0 = 2$. [Hint: introduce some suitable $z_n$.]

Exercise 12.8 Solve
(a) $x_{n+1} = x_n + 4x_n + 1$; $y_{n+1} = x_n + y_n$; $x_0 = y_0 = 1$.
(b) $x_{n+1} = 2x_n + y_n + 1$; $y_{n+1} = x_n + 2y_n$; $x_0 = y_0 = 1$.
[Hint: in each case, introduce $u_n = x_n + an + b$, $v_n = y_n + cn + d$ for certain constants $a, b, c, d$.]

Exercise 12.9 Solve the following systems of differential equations for functions $x(t)$, $y(t)$, and $z(t)$, where a dot denotes differentiation with respect to $t$.
(a) $\dot{x} = -y + z$; $\dot{y} = -y$; $\dot{z} = 2x - 2y + z$; with boundary conditions $x(0) = y(0) = 1$, $z(0) = 2$.
(b) $\dot{x} = x + 2y$; $\dot{y} = 2x + y + 1$; where $x(0) = y(0) = 1$.
(c) $\dot{x} = 2x + 3y$; $\dot{y} = 3x + 2y + e^{2t}$; where $x(0) = 1$, $y(0) = 2$.
(d) $\dot{x} = x + 4x + 1$; $\dot{y} = x + y$; $x(0) = y(0) = 1$.
(e) $\dot{x} = 2x + y + 1$; $\dot{y} = x + 2y$; $x(0) = y(0) = 1$.

Exercise 12.10 Show that $V$ is the direct sum of subspaces $U$, $W$ if and only if
(a) every $v \in V$ is equal to $u + w$ for some $u \in U$ and $w \in W$ and
(b) $U \cap W = 0$.
That is, show that Definition 5.28 and Definition 12.6 agree.

Exercise 12.11 Prove Proposition 12.17. Hence, using induction on dimension, give an alternative proof of Proposition 12.12.
13 Self-adjoint transformations

This chapter combines material concerning quadratic forms with material from the previous chapter on diagonalization. Throughout, $V$ is a finite dimensional vector space over $\mathbb{R}$ or $\mathbb{C}$ with an inner product $(\ |\ )$.

The main goal in this chapter is to understand the nature of quadratic forms, symmetric bilinear forms and conjugate-symmetric sesquilinear forms on a finite dimensional inner product space $V$; in particular, how the form relates to the inner product on $V$. It turns out that the key to describing such a form is an associated linear transformation on $V$. The linear transformations here are of interest in their own right, and have the property of being self-adjoint (as defined below). They can be diagonalized using methods in the last chapter, and this diagonalization provides a complete description of the bilinear or sesquilinear form we are interested in.

13.1 Orthogonal and unitary transformations

In earlier chapters we studied the behaviour of quadratic forms (or equivalently, symmetric bilinear forms) on arbitrary vector spaces over $\mathbb{R}$. The goal was to find a change of basis that diagonalizes the form, or at least makes it look as 'nice' as possible. In many ways, particularly so for applications to geometry, quantum mechanics, etc., it is much more interesting to study forms on inner product spaces. What this means is that a base-change transformation $f$ must preserve the inner product, i.e. must send a vector $v$ to another vector $f(v)$ of the same length as $v$, and send an orthogonal pair of vectors $v$, $w$ to another orthogonal pair $f(v)$, $f(w)$. An equivalent view is that our base-change transformations should send orthonormal bases to orthonormal bases, so instead of allowing ourselves to use arbitrary bases, we only allow orthonormal bases.
Example 13.1 Let us suppose that we are working in $\mathbb{R}^2$ with the standard inner product, and we are considering the quadratic form $Q(x, y) = (x/a)^2 + (y/b)^2$. You should recognise the equation $Q(x, y) = 1$ as the equation of an ellipse. By scaling the coordinates, changing the basis to $(a, 0)^T$, $(0, b)^T$ (which is an orthogonal basis, but not orthonormal), we can write $Q$ as $Q(u, v) = u^2 + v^2$, and the ellipse turns into a unit circle. If on the other hand we only allowed ourselves to use orthonormal bases, then our ellipse would keep its shape, but it might be rotated. For example, changing basis to the orthonormal basis $(1/\sqrt{2}, 1/\sqrt{2})^T$, $(-1/\sqrt{2}, 1/\sqrt{2})^T$ represents a rotation by $\pi/4$ about the origin, and hence preserves orthogonality and length.
From now to the end of this chapter we will deal with the real case and the complex case at the same time by writing complex-conjugate signs where they are required in the complex case. (In the real case, these complex-conjugate signs can always be ignored since the number in question is real.)

Definition 13.2 Let $V$ be a finite dimensional inner product space, and suppose $f$ is a linear transformation $V \to V$. We say $f$ is orthogonal (when $V$ is a vector space over $\mathbb{R}$) or unitary (when $V$ is a vector space over $\mathbb{C}$) if $f$ preserves the inner product, that is, if $(f(v) | f(w)) = (v | w)$ for all $v, w \in V$.
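\begin{align*}
f_1 &= (1/\sqrt{2}, -1/\sqrt{2}, 0)^T, \\
f_2 &= (1/\sqrt{6}, 1/\sqrt{6}, -2/\sqrt{6})^T, \\
f_3 &= (1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3})^T.
\end{align*}
The base-change matrix from $e_1, e_2, e_3$ to $f_1, f_2, f_3$ is
\[
P = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ -1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ 0 & -2/\sqrt{6} & 1/\sqrt{3} \end{pmatrix}
= \frac{1}{\sqrt{6}} \begin{pmatrix} \sqrt{3} & 1 & \sqrt{2} \\ -\sqrt{3} & 1 & \sqrt{2} \\ 0 & -2 & \sqrt{2} \end{pmatrix}.
\]
We now calculate $P^T P$:
\[
P^T P = \frac{1}{6}
\begin{pmatrix} \sqrt{3} & -\sqrt{3} & 0 \\ 1 & 1 & -2 \\ \sqrt{2} & \sqrt{2} & \sqrt{2} \end{pmatrix}
\begin{pmatrix} \sqrt{3} & 1 & \sqrt{2} \\ -\sqrt{3} & 1 & \sqrt{2} \\ 0 & -2 & \sqrt{2} \end{pmatrix}
= I.
\]
So $P^{-1} = P^T$. You should verify the matrix multiplication here, and notice as you do so that each entry in $P^T P$ is precisely the inner product of the vector given by a row of $P^T$ with a column of $P$. Since the columns of $P$ are precisely the elements of the orthonormal basis $f_1, f_2, f_3$ we then see why the inverse of $P$ is $P^T$.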
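The verification suggested in this example can also be done numerically. The following Python sketch is illustrative only (the variable names are ours): it forms the matrix of inner products of the three vectors $(1/\sqrt{2}, -1/\sqrt{2}, 0)^T$, $(1/\sqrt{6}, 1/\sqrt{6}, -2/\sqrt{6})^T$, $(1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3})^T$, which is exactly $P^T P$ when these vectors are the columns of $P$, and compares it with the identity up to floating-point rounding.

```python
import math

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
f1 = (1 / s2, -1 / s2, 0.0)
f2 = (1 / s6, 1 / s6, -2 / s6)
f3 = (1 / s3, 1 / s3, 1 / s3)

# The columns of P are f1, f2, f3, so the rows of P^T are f1, f2, f3 and the
# (i, j) entry of P^T P is the inner product of f_i with f_j.
rows_of_PT = [f1, f2, f3]
gram = [[sum(a * b for a, b in zip(u, w)) for w in rows_of_PT]
        for u in rows_of_PT]

is_identity = all(math.isclose(gram[i][j], float(i == j), abs_tol=1e-12)
                  for i in range(3) for j in range(3))
print(is_identity)
```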
This example was over the real numbers. The same works in general over $\mathbb{R}$ or $\mathbb{C}$.

Proposition 13.5 Let $e_1, \ldots, e_n$ be an orthonormal basis for $V$. Suppose $f : V \to V$ is a linear transformation with matrix $P$ with respect to $e_1, \ldots, e_n$. Then $f$ preserves the inner product $(\ |\ )$ (i.e. is orthogonal or unitary) if and only if $P^{-1}$ exists and equals $\overline{P}^T$.

Proof The matrix $P$ of $f$ with respect to $e_1, \ldots, e_n$ is precisely the matrix whose columns are the column vector representations of $f(e_1), \ldots, f(e_n)$ with respect to the same basis. Since $e_1, \ldots, e_n$ is orthonormal, the $(i, j)$th entry in $\overline{P}^T P$ is just $(f(e_i) | f(e_j))$; hence $\overline{P}^T P = I$ if and only if $\{f(e_1), \ldots, f(e_n)\}$ is orthonormal. The proposition now follows from Proposition 13.3. □
Example 13.6 We consider a slightly less trivial example than that in Example 13.1. Consider the quadratic form $Q(x, y) = 8x^2 + 24xy + y^2$ on $\mathbb{R}^2$ with the usual inner product. Completing the square gives us $Q(x, y) = 2(2x + 3y)^2 - 17y^2$, so this form has rank 2 and signature 0. However, our new basis is not even orthogonal, let alone orthonormal, so this tells us nothing about the shape of the hyperbola $Q(x, y) = 1$. On the other hand, it turns out that there is an orthogonal base-change matrix,
\[
P = \begin{pmatrix} -3/5 & 4/5 \\ 4/5 & 3/5 \end{pmatrix},
\]
such that
\[
P^T A P = \begin{pmatrix} -3/5 & 4/5 \\ 4/5 & 3/5 \end{pmatrix} \begin{pmatrix} 8 & 12 \\ 12 & 1 \end{pmatrix} \begin{pmatrix} -3/5 & 4/5 \\ 4/5 & 3/5 \end{pmatrix} = \begin{pmatrix} -8 & 0 \\ 0 & 17 \end{pmatrix}.
\]
Fig. 13.1 The hyperbola $8x^2 + 24xy + y^2 = 1$

Thus there is an orthonormal basis given by vectors $f_1 = (-3/5, 4/5)^T$ and $f_2 = (4/5, 3/5)^T$ with respect to which $Q$ has the form $-8u^2 + 17v^2$. From this we can calculate the slope of the asymptotes of the hyperbola $Q = 1$, as well as the points where it cuts the $v$-axis. In $u, v$ coordinates, the equations of the asymptotes of $-8u^2 + 17v^2 = 1$ are $u = \pm\sqrt{17/8}\, v$, which on substituting $u = (-3x + 4y)/5$ and $v = (4x + 3y)/5$ become
\[
(4\sqrt{17} + 6\sqrt{2})x = (-3\sqrt{17} + 8\sqrt{2})y
\quad\text{and}\quad
(4\sqrt{17} - 6\sqrt{2})x = (-3\sqrt{17} - 8\sqrt{2})y.
\]
The hyperbola passes through the $v$-axis at $u = 0$, $v = \pm 1/\sqrt{17}$, or in other words at the points $(x, y)$ given by
\[
\left(\frac{4}{5\sqrt{17}}, \frac{3}{5\sqrt{17}}\right)
\quad\text{and}\quad
\left(-\frac{4}{5\sqrt{17}}, -\frac{3}{5\sqrt{17}}\right).
\]
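The matrix identities in Example 13.6 can be confirmed with exact rational arithmetic. In the sketch below (illustrative only; the helper names are ours) $A$ is taken to be the symmetric matrix with rows $(8, 12)$ and $(12, 1)$ representing $Q(x, y) = 8x^2 + 24xy + y^2$, and $P$ has columns $(-3/5, 4/5)^T$ and $(4/5, 3/5)^T$. The code checks that $P^T P = I$, computes $P^T A P$, and compares $Q(x, y)$ with $-8u^2 + 17v^2$ after the substitution $u = (-3x + 4y)/5$, $v = (4x + 3y)/5$.

```python
from fractions import Fraction as Fr


def mat_mul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


A = [[8, 12], [12, 1]]                                 # matrix of Q
P = [[Fr(-3, 5), Fr(4, 5)], [Fr(4, 5), Fr(3, 5)]]      # columns f1, f2

gram = mat_mul(transpose(P), P)
diagonal_form = mat_mul(mat_mul(transpose(P), A), P)


def Q(x, y):
    return 8 * x * x + 24 * x * y + y * y


def Q_new(x, y):
    """Q expressed through the new coordinates u, v (x, y integers here)."""
    u = Fr(-3 * x + 4 * y, 5)
    v = Fr(4 * x + 3 * y, 5)
    return -8 * u * u + 17 * v * v


print(gram, diagonal_form)
```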
(See Figure 13.1.)

13.2 From forms to transformations
Suppose $V$ is a finite dimensional inner product space over $\mathbb{R}$ or $\mathbb{C}$. We saw in the last section that the linear transformations of $V$ that preserve the inner product are precisely the orthogonal transformations. If $e_1, \ldots, e_n$ is an orthonormal basis of $V$, we saw that these transformations have matrices $P$ which satisfy $P^{-1} = \overline{P}^{T}$. Now if $A$ is an $n \times n$ matrix representing a linear transformation $f$ and we consider $P$ as a base-change matrix, then the matrix $A$ is transformed to $P^{-1}AP$ by this base change. On the other hand, if $A$ is an $n \times n$ matrix representing a bilinear form $F$ and we consider $P$ as a base-change matrix, then $A$ should be transformed to $\overline{P}^{T}AP$. But $\overline{P}^{T} = P^{-1}$, so these two transformed matrices are the same. In other words, provided we restrict ourselves to orthogonal base changes, the matrix $A$ can be thought of as either the matrix of a linear transformation or the matrix of a bilinear form.

All this suggests the idea of switching from looking at a bilinear form with matrix $A$ to considering the linear transformation with the same matrix, and back again. Given a bilinear form $F(u, v)$ on a Euclidean space $V$ with orthonormal basis $e_1, e_2, \ldots, e_n$, or a sesquilinear form $F(u, v)$ on a unitary space $V$ with orthonormal basis $e_1, e_2, \ldots, e_n$, we can define an associated map $f : V \to V$ by

\[ f(v) = \sum_{i=1}^{n} F(e_i, v)\, e_i. \tag{1} \]

This is a linear transformation by linearity of $F$ in the second argument, since

\[
\begin{aligned}
f(\lambda v + \mu w) &= \sum_{i=1}^{n} F(e_i, \lambda v + \mu w)\, e_i \\
&= \sum_{i=1}^{n} \bigl(\lambda F(e_i, v) + \mu F(e_i, w)\bigr)\, e_i \\
&= \lambda \sum_{i=1}^{n} F(e_i, v)\, e_i + \mu \sum_{i=1}^{n} F(e_i, w)\, e_i \\
&= \lambda f(v) + \mu f(w).
\end{aligned}
\]
At this stage, it is conceivable that the map $f$ depends on the basis $e_1, \ldots, e_n$ used in (1). It follows from the calculations below, however, that it is actually independent of the choice of basis.

Working in the other direction, suppose that a linear transformation $f : V \to V$ is given. Then we can define a form $F(v, w)$ by

\[ F(v, w) = \langle v \mid f(w) \rangle \tag{2} \]

for all $v, w \in V$. It is easy to see that $F$ is bilinear (or sesquilinear in the complex case) since

\[
\begin{aligned}
F(u, \lambda v + \mu w) &= \langle u \mid f(\lambda v + \mu w) \rangle \\
&= \langle u \mid \lambda f(v) + \mu f(w) \rangle \\
&= \lambda \langle u \mid f(v) \rangle + \mu \langle u \mid f(w) \rangle \\
&= \lambda F(u, v) + \mu F(u, w)
\end{aligned}
\]

and

\[
\begin{aligned}
F(\lambda u + \mu v, w) &= \langle \lambda u + \mu v \mid f(w) \rangle \\
&= \overline{\lambda} \langle u \mid f(w) \rangle + \overline{\mu} \langle v \mid f(w) \rangle \\
&= \overline{\lambda} F(u, w) + \overline{\mu} F(v, w).
\end{aligned}
\]
These two processes, of going from $F$ to $f$, and of going from $f$ to $F$, are inverse to each other; that is, if we start with $F$, say, pass to the associated linear transformation $f$, and then to its associated form, we get

\[
\begin{aligned}
\Bigl\langle u \Bigm| \sum_{i=1}^{n} F(e_i, v)\, e_i \Bigr\rangle &= \sum_{i=1}^{n} F(e_i, v) \langle u \mid e_i \rangle \\
&= \sum_{i=1}^{n} \langle u \mid e_i \rangle F(e_i, v) \\
&= \sum_{i=1}^{n} \overline{\langle e_i \mid u \rangle}\, F(e_i, v) \\
&= F\Bigl(\sum_{i=1}^{n} \langle e_i \mid u \rangle e_i,\ v\Bigr) = F(u, v),
\end{aligned}
\]

by the Fourier expansion formula $u = \sum_i \langle e_i \mid u \rangle e_i$. In the other direction, if we start with $f$, compute its associated form $F$, and the linear transformation associated with that, we get

\[ \sum_{i=1}^{n} F(e_i, v)\, e_i = \sum_{i=1}^{n} \langle e_i \mid f(v) \rangle\, e_i = f(v), \]
by the Fourier expansion formula again. This, incidentally, also shows that the definition of $f$ from the form $F$ is independent of the choice of orthonormal basis $e_1, \ldots, e_n$ used.

We now return to the ideas mentioned at the beginning of this section, and show that $f$ and $F$ are represented by the same matrix.

Proposition 13.7 Suppose that $f : V \to V$ is a linear transformation on an inner product space $V$, suppose that $F$ is the corresponding conjugate-symmetric sesquilinear form, and that $\{e_1, \ldots, e_n\}$ is an orthonormal basis of $V$. Then $f$ and $F$ are represented by the same matrix with respect to the ordered basis $e_1, \ldots, e_n$.

Proof The matrix of $F$ is $A = (a_{ij})$, where $a_{ij} = F(e_i, e_j)$. On the other hand, the matrix of $f$ is $B = (b_{ij})$ where

\[ f(e_j) = \sum_{i=1}^{n} b_{ij}\, e_i. \]

However, our formula for $f$ gives

\[ f(e_j) = \sum_{i=1}^{n} F(e_i, e_j)\, e_i; \]

thus $b_{ij} = F(e_i, e_j) = a_{ij}$, so $A = B$. □
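The passage from a form to its associated map can be checked numerically. The following is a minimal sketch in the real case, assuming the form is given in standard coordinates by $F(u, v) = u^T M v$ for a hypothetical matrix `M`; the helper names and the two orthonormal bases are illustrative choices, not taken from the text. It evaluates formula (1) with two different orthonormal bases and obtains the same vector $Mv$ both times.

```python
from fractions import Fraction as Fr

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def form(M, u, v):
    """Bilinear form F(u, v) = u^T M v in standard coordinates (an assumption of this sketch)."""
    return dot(u, mat_vec(M, v))

def associated_map(M, basis, v):
    """f(v) = sum_i F(e_i, v) e_i, i.e. formula (1), for an orthonormal basis."""
    result = [Fr(0)] * len(v)
    for e in basis:
        coeff = form(M, e, v)
        result = [r + coeff * x for r, x in zip(result, e)]
    return result

# Hypothetical data: a 2 x 2 matrix and two orthonormal bases of R^2.
M = [[Fr(2), Fr(1)], [Fr(1), Fr(3)]]
standard = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
rotated = [[Fr(3, 5), Fr(4, 5)], [Fr(-4, 5), Fr(3, 5)]]
v = [Fr(7), Fr(-2)]

via_standard = associated_map(M, standard, v)
via_rotated = associated_map(M, rotated, v)
print(via_standard, via_rotated, mat_vec(M, v))
```

Exact rational arithmetic is used so that the two results can be compared with `==` rather than with a tolerance.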
In the real case, we are interested in symmetric bilinear forms $F$, i.e. forms satisfying $F(v, w) = F(w, v)$. If $f$ is the corresponding linear transformation, this equation translates into

\[ \langle v \mid f(w) \rangle = F(v, w) = F(w, v) = \langle w \mid f(v) \rangle = \langle f(v) \mid w \rangle \]

for all $v, w \in V$. In the complex case, we are interested in conjugate-symmetric sesquilinear forms $F$, i.e. forms satisfying $F(v, w) = \overline{F(w, v)}$. Again, if $f$ is the corresponding linear transformation, this is equivalent to

\[ \langle v \mid f(w) \rangle = F(v, w) = \overline{F(w, v)} = \overline{\langle w \mid f(v) \rangle} = \langle f(v) \mid w \rangle. \]

In both the real and complex cases we obtain $\langle f(v) \mid w \rangle = \langle v \mid f(w) \rangle$. Linear transformations $f$ with this very special property are called self-adjoint.

Definition 13.8 A linear transformation $f : V \to V$ of an inner product space $V$ (over $\mathbb{R}$ or $\mathbb{C}$) is said to be self-adjoint if $\langle f(v) \mid w \rangle = \langle v \mid f(w) \rangle$ for all $v, w \in V$.
We conclude this section by discussing the properties of matrices representing self-adjoint transformations.

Proposition 13.9 If $f$ is a self-adjoint transformation of an inner product space $V$, and if $\{e_1, \ldots, e_n\}$ is an orthonormal basis of $V$, then the matrix $A$ of $f$ with respect to the ordered basis $e_1, \ldots, e_n$ is conjugate-symmetric, i.e. $\overline{A}^{T} = A$.

Proof Since $f$ is self-adjoint, the corresponding form $F$ is conjugate-symmetric,

\[ F(u, v) = \langle u \mid f(v) \rangle = \langle f(u) \mid v \rangle = \overline{\langle v \mid f(u) \rangle} = \overline{F(v, u)}, \]

so the matrix of $F$ is conjugate-symmetric. But with respect to $e_1, \ldots, e_n$, the matrices of $f$ and $F$ are the same by the last proposition, since $\{e_1, \ldots, e_n\}$ is orthonormal. □

There is a useful converse to this, characterizing self-adjoint transformations.

Proposition 13.10 If $f$ is a linear transformation of an inner product space $V$, and if $e_1, \ldots, e_n$ is an ordered orthonormal basis of $V$ with respect to which the matrix $A$ of $f$ is conjugate-symmetric, then $f$ is self-adjoint.

Proof Let $F$ be the corresponding form $\langle u \mid f(v) \rangle$. Then the matrix of $F$ is the same as that of $f$, and is conjugate-symmetric. Therefore $F$ is conjugate-symmetric, and hence $f$ is self-adjoint. □
Note that we have proved slightly more than was stated: if $f$ has conjugate-symmetric matrix with respect to some orthonormal basis, then $f$ is self-adjoint and so has conjugate-symmetric matrix with respect to every orthonormal basis. This is not true for arbitrary bases, since for example the real $2 \times 2$ matrix $A = (\ldots)$ is not symmetric, but $P^{-1}AP$ is symmetric, where $P = (\ldots)$.

13.3 Eigenvalues and diagonalization
The main aim of this section is to prove that given any conjugate-symmetric sesquilinear form (or real symmetric bilinear form) there is an orthonormal basis with respect to which its matrix is diagonal. Recall that if a matrix $(a_{ij})$ is conjugate-symmetric, then its diagonal entries are real, as $a_{ii} = \overline{a_{ii}}$.

Theorem 13.11 If $f$ is a self-adjoint linear transformation of an inner product space $V$ and $\lambda$ is an eigenvalue of $f$, then $\lambda$ is real.

Proof We have $f(v) = \lambda v$ for some nonzero vector $v$, and

\[ \lambda \langle v \mid v \rangle = \langle v \mid \lambda v \rangle = \langle v \mid f(v) \rangle = \langle f(v) \mid v \rangle = \langle \lambda v \mid v \rangle = \overline{\lambda} \langle v \mid v \rangle, \]

so $(\lambda - \overline{\lambda}) \langle v \mid v \rangle = 0$. But $\langle v \mid v \rangle \neq 0$ since $v \neq 0$, and hence $\lambda = \overline{\lambda}$. □

Corollary 13.12 For any symmetric real $n \times n$ matrix $A$ or any conjugate-symmetric complex $n \times n$ matrix $A$, the characteristic polynomial $\chi_A(x)$ has $n$ real roots (counting multiplicities).

Proof $\chi_A(x)$ has $n$ complex roots, but each of these roots is real by the preceding theorem. □
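For $2 \times 2$ matrices the statement can be tested directly from the quadratic formula. The sketch below is illustrative only: the function name and the two sample matrices are hypothetical, not from the text. It computes both roots of the characteristic polynomial with complex arithmetic, once for a conjugate-symmetric matrix and once for a real matrix that is not symmetric.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of x^2 - (a + d)x + (ad - bc), the characteristic polynomial of [[a, b], [c, d]]."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

# Conjugate-symmetric example: the off-diagonal entries are complex conjugates.
lam1, lam2 = eigenvalues_2x2(2, 1 + 1j, 1 - 1j, 3)

# A real matrix that is not symmetric (rotation through a right angle).
mu1, mu2 = eigenvalues_2x2(0, -1, 1, 0)

print(lam1, lam2)  # both have zero imaginary part
print(mu1, mu2)    # a purely imaginary conjugate pair
```

The second matrix shows that the hypothesis matters: without symmetry the roots need not be real.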
Theorem 13.13 The minimum polynomial $m_f(x)$ of a self-adjoint linear transformation $f : V \to V$ of a finite dimensional inner product space $V$ has no repeated roots.

Proof If not, suppose $m_f(x) = (x - \lambda)^2 p(x)$ for some polynomial $p(x)$. Then $(f - \lambda)p(f) \neq 0$, so there is $v \in V$ with $(f - \lambda)p(f)(v) \neq 0$ and $(f - \lambda)^2 p(f)(v) = 0$. But then

\[ 0 \neq \langle (f - \lambda)p(f)(v) \mid (f - \lambda)p(f)(v) \rangle = \langle p(f)(v) \mid (f - \lambda)(f - \lambda)p(f)(v) \rangle = 0 \]

since $(f - \lambda)$ is self-adjoint. This is a contradiction. □

Corollary 13.14 Any self-adjoint $f : V \to V$ of a finite dimensional inner product space $V$ is diagonalizable.

Proof By the previous theorem and Corollary 12.13. □
In Proposition 12.7 we proved that if $v_1, v_2, \ldots, v_k$ are eigenvectors of a linear transformation $f$, where $f(v_i) = \lambda_i v_i$ and the $\lambda_i$ are all distinct, then $\{v_1, v_2, \ldots, v_k\}$ is linearly independent. For self-adjoint $f$ we can make the stronger statement that the $v_i$ are orthogonal.

Theorem 13.15 Let $f$ be a self-adjoint linear transformation $f : V \to V$, and suppose $v_1, v_2$ are eigenvectors of $f$ with corresponding eigenvalues $\lambda_1, \lambda_2$. If $\lambda_1 \neq \lambda_2$ then $v_1$ and $v_2$ are orthogonal.

Proof We have

\[
\begin{aligned}
\lambda_1 \langle v_1 \mid v_2 \rangle &= \overline{\lambda_1} \langle v_1 \mid v_2 \rangle = \langle \lambda_1 v_1 \mid v_2 \rangle = \langle f(v_1) \mid v_2 \rangle \\
&= \langle v_1 \mid f(v_2) \rangle = \langle v_1 \mid \lambda_2 v_2 \rangle = \lambda_2 \langle v_1 \mid v_2 \rangle
\end{aligned}
\]

since $\lambda_1$ is real, by Theorem 13.11. But then $(\lambda_1 - \lambda_2) \langle v_1 \mid v_2 \rangle = 0$, and hence $\langle v_1 \mid v_2 \rangle = 0$, as $\lambda_1 \neq \lambda_2$. □
What this means is that, for a self-adjoint linear transformation $f$, we can always find an orthogonal basis of eigenvectors. In fact, we can do even better: by normalizing in the usual way there is an orthonormal basis of eigenvectors. To see this, we first find any basis of eigenvectors. Then for each eigenvalue $\lambda$, we take the set of basis vectors which have that eigenvalue, and apply the Gram-Schmidt algorithm to it. The result will be an orthonormal basis for the eigenspace, since any nonzero linear combination of eigenvectors with eigenvalue $\lambda$ is itself an eigenvector with eigenvalue $\lambda$.

The base-change matrix $P$ from the usual orthonormal basis to this orthonormal basis of eigenvectors will be unitary, i.e. $\overline{P}^{T} = P^{-1}$, since the new basis is orthonormal. It follows that, given a real symmetric matrix $A$, or a complex conjugate-symmetric matrix $A$, we can find an orthonormal basis of $\mathbb{R}^n$ or $\mathbb{C}^n$ (respectively) for which both the linear transformation

\[ f(v) = Av \]

and the (symmetric bilinear, or sesquilinear) form

\[ F(v, w) = \overline{v}^{T} A w \]

are represented by the same diagonal matrix.

The proof just given of Corollary 13.14 and Theorem 13.15 is somewhat indirect. Using the notion of orthogonal complement from the 'optional' Section 5.4, it is possible to give a direct proof of these results. We do this now for the benefit of readers who have read the material on orthogonal complements.

Theorem 13.16 Suppose $f : V \to V$ is a self-adjoint linear transformation of a finite dimensional inner product space $V$ over $\mathbb{R}$ or $\mathbb{C}$. Then there is an orthonormal basis $\{v_1, v_2, \ldots, v_n\}$ of $V$ such that each $v_i$ is an eigenvector of $f$.

Proof We use induction on the dimension $n$ of $V$. If $n = 0$ there is nothing to prove as the empty set $\emptyset$ is a suitable basis of $V$. Otherwise, since $f$ has a real eigenvalue $\lambda_1$, there is a nonzero $v_1 \in V$ with $f(v_1) = \lambda_1 v_1$ and $\|v_1\| = 1$. Let $U = \operatorname{span}(v_1)$ and $W = U^{\perp} = \{w \in V : \langle u \mid w \rangle = 0 \text{ for all } u \in U\}$. Then for $w \in W$ and $u \in U$ we have $\langle f(w) \mid u \rangle = \langle w \mid f(u) \rangle = \langle w \mid \lambda_1 u \rangle = \lambda_1 \langle w \mid u \rangle = 0$, so $f(w) \in W$; thus we may regard $f$ as a self-adjoint linear transformation of $W$. Also, $U \oplus W = V$ and $U$ has dimension 1, so $W$ has dimension $n - 1$. By our induction hypothesis, there is an orthonormal basis $v_2, \ldots, v_n$ of $W$ consisting of eigenvectors of $f$, and clearly $v_1, v_2, \ldots, v_n$ is the required basis of $V$. □
13.4 Applications

We shall indicate some of the applications of the results in the previous section here by way of some examples, all of which concern real vector spaces.

One obvious place in which real symmetric matrices arise is as the representing matrix of the symmetric bilinear form corresponding to a quadratic form $Q$ on $\mathbb{R}^n$. For example, given a quadratic form $Q(x, y, z)$ on $\mathbb{R}^3$, the equation $Q(x, y, z) = a$ represents a surface which we might want to describe. Simply completing the square as we did in the last part to find the rank and signature of $Q$ gives some information, but we lose the additional structure given by the usual inner product on $\mathbb{R}^3$ in the process. Somehow, we need to diagonalize the form $Q$ and the usual inner product simultaneously to get a full picture. This is illustrated by the following example.
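Example 13.17 Consider the surface in $\mathbb{R}^3$ defined by the equation

\[ 5x^2 + 5y^2 + 5z^2 - 2xy - 2yz - 2zx = 3. \]

This is $Q(x, y, z) = 3$ where $Q$ is the quadratic form

\[ Q(x, y, z) = (x, y, z) \begin{pmatrix} 5 & -1 & -1 \\ -1 & 5 & -1 \\ -1 & -1 & 5 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]

We now diagonalize the matrix

\[ A = \begin{pmatrix} 5 & -1 & -1 \\ -1 & 5 & -1 \\ -1 & -1 & 5 \end{pmatrix} \]

as if it represented a linear transformation, not a bilinear form. This is legitimate, provided we keep to orthonormal bases, since we know that $Q$ corresponds to a symmetric bilinear form, which in turn corresponds to a self-adjoint linear transformation. The matrix $A$ is symmetric, and hence the corresponding linear transformation $f_A$ is self-adjoint, and is diagonalizable. In fact, it turns out that this matrix has a basis of eigenvectors $(1, 1, 1)^T$, $(1, -1, 0)^T$, $(1, 0, -1)^T$ with corresponding eigenvalues 3, 6, 6, as you can check. However, in this example we want an orthonormal basis of eigenvectors. Using the Gram-Schmidt process to orthogonalize $u_1 = (1, -1, 0)^T$, $u_2 = (1, 0, -1)^T$ we set $v_1 = u_1$ and

\[ v_2 = u_2 - \frac{\langle v_1 \mid u_2 \rangle}{\langle v_1 \mid v_1 \rangle}\, v_1 = (1, 0, -1)^T - \tfrac{1}{2}(1, -1, 0)^T = (\tfrac{1}{2}, \tfrac{1}{2}, -1)^T. \]

This gives the following orthogonal basis of eigenvectors of $A$:

\[ v_1 = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1/2 \\ 1/2 \\ -1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]

Now normalize:

\[ w_1 = \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \\ 0 \end{pmatrix}, \qquad w_2 = \begin{pmatrix} 1/\sqrt{6} \\ 1/\sqrt{6} \\ -2/\sqrt{6} \end{pmatrix}, \qquad w_3 = \begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{pmatrix}. \]

This gives the base-change matrix

\[ P = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ -1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ 0 & -2/\sqrt{6} & 1/\sqrt{3} \end{pmatrix} \]

of Example 13.4. The point of that example was to show that $P^T = P^{-1}$ and so

\[ P^T A P = P^{-1} A P = \begin{pmatrix} 6 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 3 \end{pmatrix}. \]

This matrix is diagonal, so the base-change matrix $P$ diagonalizes the quadratic form $Q$ as well as the linear transformation $f_A$.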
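The arithmetic of this example can be reproduced in a few lines. The following is a sketch only, using floating-point arithmetic and hypothetical helper names: it orthogonalizes the two eigenvectors for the eigenvalue 6 with Gram-Schmidt, normalizes all three eigenvectors, assembles $P$ with these as columns, and computes $P^T A P$.

```python
import math

A = [[5, -1, -1], [-1, 5, -1], [-1, -1, 5]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthogonalize a list of vectors (classical Gram-Schmidt, without normalizing)."""
    ortho = []
    for u in vectors:
        w = list(u)
        for q in ortho:
            coeff = dot(q, u) / dot(q, q)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        ortho.append(w)
    return ortho

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def transpose(X):
    return [list(row) for row in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Eigenvectors for eigenvalue 6 are orthogonalized; (1, 1, 1) belongs to eigenvalue 3.
eig6 = gram_schmidt([[1.0, -1.0, 0.0], [1.0, 0.0, -1.0]])
columns = [normalize(v) for v in eig6] + [normalize([1.0, 1.0, 1.0])]

# P has w_1, w_2, w_3 as its columns.
P = transpose(columns)
D = matmul(transpose(P), matmul(A, P))
for row in D:
    print([round(x, 10) for x in row])
```

Up to rounding error, the printed matrix should be diagonal with entries 6, 6, 3.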
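Now introduce 'new coordinates' $a, b, c$ by the rule

\[ \begin{pmatrix} a \\ b \\ c \end{pmatrix} = P^{-1} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \]

so

\[
\begin{aligned}
Q(x, y, z) &= (x, y, z)\, P\, (P^{-1} A P)\, P^{-1} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \\
&= (a, b, c) \begin{pmatrix} 6 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} \\
&= 6a^2 + 6b^2 + 3c^2,
\end{aligned}
\]

since $(x, y, z)P = (P^T (x, y, z)^T)^T = (P^{-1} (x, y, z)^T)^T$. Thus the surface $Q(x, y, z) = 3$ is given by the equation

\[ 6a^2 + 6b^2 + 3c^2 = 3 \]

with respect to the new coordinates $a, b, c$ in the directions given by the vectors $w_1, w_2, w_3$.

This surface is an ellipsoid, with centre at the origin, and elongated in the $c$ or $w_3$ direction with radius 1 in this direction, and with radius $1/\sqrt{2}$ in the directions orthogonal to this.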
A quadratic form of rank 3 over $\mathbb{R}^3$ can always be diagonalized as $Q(x, y, z) = \pm(x/a)^2 \pm (y/b)^2 \pm (z/c)^2$. The surfaces given by the equation $Q(x, y, z) = 1$ have different shapes according to the signs (i.e. the signature). The surface

\[ Q(x, y, z) = (x/a)^2 + (y/b)^2 + (z/c)^2 = 1 \]

is an ellipsoid, with semi-major axes $a, b, c$. The surface

\[ Q(x, y, z) = (x/a)^2 + (y/b)^2 - (z/c)^2 = 1 \]

is a one-sheet hyperboloid, something like a cooling tower extending to infinity in both directions. The surface

\[ Q(x, y, z) = (x/a)^2 - (y/b)^2 - (z/c)^2 = 1 \]

is a two-sheet hyperboloid, like a hill reflected in the sky. The final equation,

\[ Q(x, y, z) = -(x/a)^2 - (y/b)^2 - (z/c)^2 = 1, \]

clearly has no real solutions. (See Figure 13.2.)
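The case analysis above depends only on how many of the three signs are positive, which can be written as a small lookup. This is an illustrative sketch; the function name and the encoding of the signs as `+1` and `-1` are assumptions of the sketch, not notation from the text.

```python
def classify_rank3(signs):
    """Name the surface s1*(x/a)^2 + s2*(y/b)^2 + s3*(z/c)^2 = 1, given signs = (s1, s2, s3)."""
    if len(signs) != 3 or any(s not in (1, -1) for s in signs):
        raise ValueError("expected three signs, each +1 or -1")
    positives = sum(1 for s in signs if s == 1)
    names = {
        3: "ellipsoid",
        2: "one-sheet hyperboloid",
        1: "two-sheet hyperboloid",
        0: "no real points",
    }
    return names[positives]

print(classify_rank3((1, 1, 1)))
print(classify_rank3((1, 1, -1)))
print(classify_rank3((1, -1, -1)))
print(classify_rank3((-1, -1, -1)))
```

Permuting the signs only relabels the axes, so the count of positive signs is all that matters.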
Fig. 13.2 Surfaces defined by quadratic forms of rank 3: (a) ellipsoid; (b) one-sheet hyperboloid; (c) two-sheet hyperboloid.

Fig. 13.3 Surfaces defined by quadratic forms of rank 2: (a) elliptical cylinder; (b) hyperbolic cylinder.
The degenerate cases, when the quadratic form has smaller rank, are also worth noting. The surface $Q(x, y, z) = (x/a)^2 + (y/b)^2 = 1$ is an elliptical cylinder, while the surface $Q(x, y, z) = (x/a)^2 - (y/b)^2 = 1$ is a hyperbolic cylinder (if that makes sense!). (See Figure 13.3.)

We may also consider surfaces of the form $Q(x, y, z) = 0$. These are degenerate cases of $Q(x, y, z) = \epsilon$ when $\epsilon \to 0$. In particular, the surface $(x/a)^2 - (y/b)^2 - (z/b)^2 = 0$ is that of two cones joined together at their apexes, whereas the more general case $(x/a)^2 - (y/b)^2 - (z/c)^2 = 0$ is similar except the cross sections of the 'cones' are ellipses. The other degenerate case of this type is exemplified by $(x/a)^2 - (y/b)^2 = 0$ which is a pair of planes meeting at the line $x = y = 0$.
Exercises
Exercise 13.1 For each case, sketch the graph of the curve in question and describe all eigenvectors of the matrix $A$ geometrically.

(a) $x^2 + 4xy + y^2 = 7$, and $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$.
(b) $x^2 + 2xy + y^2 = 7$, and $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$.
(c) $5x^2 + 4xy + 5y^2 = 7$, and $A = \begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix}$.
(d) $x^2 + y^2 = 7$, and $A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

Exercise 13.2 Let $T : \mathbb{C}^2 \to \mathbb{C}^2$ be defined by $T((x, y)^T) = (2ix + y, x)^T$.

(a) Write down the matrix $A$ of $T$ with respect to the usual basis of $\mathbb{C}^2$.
(b) Is $A$ symmetric?
(c) Is $A$ conjugate-symmetric?
(d) What are the eigenvalues of $A$?
(e) Is $A$ diagonalizable?

Exercise 13.3 A matrix $A$ is of the form $\begin{pmatrix} a & c \\ c & b \end{pmatrix}$, where $a, b, c \in \mathbb{R}$. Suppose that $A$ has an eigenvalue $\lambda$ of algebraic multiplicity 2. Prove that $a = b$ and calculate the value of $c$.

Exercise 13.4 Sketch the graph of each of the following.

(a) $5x^2 - 8xy + 5y^2 = 9$.
(b) $11x^2 - 24xy + 4y^2 + 6x + 8y = -15$.
(c) $16x^2 - 24xy + 9y^2 - 30x + 40y = 5$.

[Hint: for (b) and (c) diagonalize the matrix for the quadratic form first, then transform the whole equation including the nonquadratic parts.]
Exercise 13.5 Describe the set of points

\[ \{ (x, y, z) \in \mathbb{R}^3 : x^2 - y^2/3 + z^2 - 2xy - 2yz + 2xz = 1 \} \]

mentioning any rotational or translational symmetries that you can find. [Translational symmetry is when a figure looks the same after it has been shifted by a translation vector $v$. Rotational symmetry is similar, but the figure is rotated through an angle $\theta$ about a given axis.]

Exercise 13.6 The form $Q$ is defined on $\mathbb{R}^3$ by

\[ Q((x, y, z)^T) = x^2 + y^2 + 4z^2 + 14xy + 8xz + 8yz. \]

By finding a suitable orthogonal matrix $P$ and defining 'new coordinates' $a, b, c$ by $(a, b, c)^T = P^{-1}(x, y, z)^T$, write $Q((x, y, z)^T)$ as $\lambda a^2 + \mu b^2 + \nu c^2$ for some real constants $\lambda$, $\mu$, and $\nu$. Hence describe the following surfaces.

(a) $Q(x, y, z) = 1$.
(b) $Q(x, y, z) = 0$.
(c) $Q(x, y, z) + x + y - 2z = 1$.
(d) $Q(x, y, z) + x + y + z = 1$.
(e) $Q(x, y, z) + x + y + z = 0$.
(f) $Q(x, y, z) + x = 1$.

[Hint: for some of these, you may find it helpful to change the origin.]

Exercise 13.7 (This exercise is for students wondering why the terminology 'self-adjoint' is used.) Let $V$ be a finite dimensional inner product space, and let $e_1, \ldots, e_n$ be an ordered orthonormal basis of $V$. Suppose $f : V \to V$ is a linear transformation with matrix $A$ with respect to this ordered basis, and define $f^{\dagger} : V \to V$ by

\[ f^{\dagger}(v) = \sum_{i=1}^{n} \overline{\langle v \mid f(e_i) \rangle}\, e_i. \]

$f^{\dagger}$ is called the adjoint of $f$.

(a) Show that $f^{\dagger}$ is a linear transformation of $V$.
(b) Show that
n
n
k=!
k=!
for all $i, j$.
(c) Using the previous part and linearity, show that

\[ \langle u \mid f(v) \rangle = \langle f^{\dagger}(u) \mid v \rangle \]

for all $u, v \in V$.
(d) Deduce that the matrix of $f^{\dagger}$ with respect to the basis $e_1, \ldots, e_n$ is $\overline{A}^{T}$, and that the definition of $f^{\dagger}$ is independent of the orthonormal basis $e_1, \ldots, e_n$ taken.

Exercise 13.8 Let $V$ be an inner product space over $\mathbb{R}$ or $\mathbb{C}$, and suppose that $f$ is a self-adjoint linear transformation from $V$ to $V$. Given $p(x)$, a polynomial with coefficients from the field of scalars for $V$, show that $p(f)$ is a self-adjoint linear transformation $V \to V$.

Exercise 13.9 Let $\alpha, \beta \in \mathcal{L}(V, V)$ be self-adjoint, where $V$ is an inner product space over $\mathbb{R}$ or $\mathbb{C}$. Write $\alpha\beta$ as $\tfrac{1}{2}(\alpha\beta - \beta\alpha) + \tfrac{1}{2}(\alpha\beta + \beta\alpha)$, to show that

\[ |\langle v \mid \alpha\beta(v) \rangle|^2 \ge \tfrac{1}{4}\, |\langle v \mid (\alpha\beta - \beta\alpha)(v) \rangle|^2 \]

for all vectors $v \in V$.
14 The Jordan normal form

If a linear transformation $f : V \to V$ is not diagonalizable, we may still ask for a basis with respect to which the matrix of $f$ is as 'nice as possible'. It turns out that we can always obtain such a basis where this matrix is in Jordan normal form; that is, a special upper triangular form where the only nonzero entries off the diagonal are entries equal to 1 just above repeated eigenvalues. Such forms will enable us to solve a much greater variety of simultaneous difference and differential equations.

14.1 Jordan normal form

We have proved that if $f : V \to V$ is a linear transformation whose minimum polynomial has distinct roots, then $f$ is diagonalizable. We now consider the general case, when $m_f(x)$ may have repeated roots. First we need to generalize the concept of eigenspaces.

Definition 14.1 Suppose $f : V \to V$ has minimum polynomial

\[ m_f(x) = (x - \lambda_1)^{e_1} (x - \lambda_2)^{e_2} \cdots (x - \lambda_r)^{e_r} \]

where $\lambda_1, \ldots, \lambda_r$ are the distinct eigenvalues of $f$. Then the subspaces

\[ \ker\bigl((f - \lambda_i I)^{e_i}\bigr) \]

of $V$ are called the generalized eigenspaces of $f$.

Notice that if $e_i = 1$, i.e. $\lambda_i$ is an eigenvalue which occurs with multiplicity 1 as a root of the minimum polynomial, then this is just the usual eigenspace. The most important result for our purposes is that $V$ is the direct sum of these generalized eigenspaces. This is a generalization of Theorem 12.9, and is stated below as Theorem 14.3 and will be proved in Section 14.4. But before we give this result formally, let us consider an example by way of illustration.
Example 14.2 Let $V = \mathbb{C}^3$, and define the linear map $f : V \to V$ by

\[ f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2x \\ -x - 3y - z \\ -x + 4y + z \end{pmatrix}. \]

Then $f$ is represented with respect to the standard basis by the matrix

\[ A = \begin{pmatrix} 2 & 0 & 0 \\ -1 & -3 & -1 \\ -1 & 4 & 1 \end{pmatrix}, \]

so we can calculate its characteristic polynomial in the usual way, as

\[ \chi_f(x) = \chi_A(x) = (2 - x)(x^2 + 2x + 1) = (2 - x)(x + 1)^2. \]

Thus $m_f(x)$ is either $(x - 2)(x + 1)$ or $(x - 2)(x + 1)^2$. But

\[ (A - 2I)(A + I) = \begin{pmatrix} 0 & 0 & 0 \\ -1 & -5 & -1 \\ -1 & 4 & -1 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 \\ -1 & -2 & -1 \\ -1 & 4 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 3 & 6 & 3 \\ -6 & -12 & -6 \end{pmatrix}, \]

which is not the zero matrix, so $m_f(x) = m_A(x) = (x - 2)(x + 1)^2$.

For the eigenvalue 2, the generalized eigenspace is the same as the ordinary eigenspace, and is just the kernel of the linear map $g = f - 2I$. Now

\[ g\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ -x - 5y - z \\ -x + 4y - z \end{pmatrix}, \]

so the kernel of $g$ is the set of all vectors $(x, y, z)^T$ satisfying the equations $-x - 5y - z = 0$ and $-x + 4y - z = 0$, or equivalently, $y = 0$ and $x = -z$. So

\[ \ker(g) = \{ (-z, 0, z)^T : z \in \mathbb{C} \}. \]

For the eigenvalue $-1$, we work out the ordinary eigenspace in the same way. Writing $h = f + I$ we have

\[ h\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3x \\ -x - 2y - z \\ -x + 4y + 2z \end{pmatrix} \]

and

\[ \ker(h) = \{ (0, y, z)^T : y, z \in \mathbb{C},\ -2y - z = 0,\ 4y + 2z = 0 \} = \{ (0, y, -2y)^T : y \in \mathbb{C} \}. \]

Now

\[ h^2\begin{pmatrix} x \\ y \\ z \end{pmatrix} = h\begin{pmatrix} 3x \\ -x - 2y - z \\ -x + 4y + 2z \end{pmatrix} = \begin{pmatrix} 9x \\ -3x - 2(-x - 2y - z) - (-x + 4y + 2z) \\ -3x + 4(-x - 2y - z) + 2(-x + 4y + 2z) \end{pmatrix} = \begin{pmatrix} 9x \\ 0 \\ -9x \end{pmatrix}, \]

so

\[ \ker(h^2) = \{ (x, y, z)^T : 9x = 0 \} = \{ (0, y, z)^T : y, z \in \mathbb{C} \}, \]

which has dimension 2. Note that the image of $h^2$ is the set of all vectors of the form $(9x, 0, -9x)^T$, which is the same as the kernel of $g$. That is, $\operatorname{im}(h^2) = \ker(g)$. Similarly we have $\operatorname{im}(g) = \ker(h^2)$. (Compare Proposition 12.17 and the example preceding it.)

We now state our promised generalization of Theorem 12.9.

Theorem 14.3 (Primary decomposition) Let $f : V \to V$ be a linear transformation with minimum polynomial

\[ m_f(x) = (x - \lambda_1)^{e_1} (x - \lambda_2)^{e_2} \cdots (x - \lambda_r)^{e_r} \]

where $\lambda_1, \ldots, \lambda_r$ are the distinct eigenvalues of $f$, and $e_1, \ldots, e_r$ are positive integers. Let $V_1, \ldots, V_r$ denote the corresponding generalized eigenspaces, i.e. $V_i = \ker((f - \lambda_i I)^{e_i})$. Then

\[ V = V_1 \oplus V_2 \oplus \cdots \oplus V_r. \]

Proof See Section 14.4, Theorem 14.15. □
This is sometimes called the primary decomposition of $V$ with respect to $f$, and $V_1, \ldots, V_r$ are the primary components. We have already seen an example of this in Example 14.2. In that example, the generalized eigenspaces are

\[ V_1 = \ker(g) = \{ (x, 0, -x)^T : x \in \mathbb{C} \}, \]

and

\[ V_2 = \ker(h^2) = \{ (0, y, z)^T : y, z \in \mathbb{C} \}, \]

and it is easy to check that $V = V_1 \oplus V_2$ in this case. If we now choose a basis $B_1$ for $V_1$ and a basis $B_2$ for $V_2$, then $B = B_1 \cup B_2$ is a basis for $V$, since $V = V_1 \oplus V_2$ is a direct sum, and then we can write $f$ with respect to the basis $B$. For example, take $B_1 = (1, 0, -1)^T$ and $B_2 = (0, 1, 0)^T, (0, 0, 1)^T$, and calculate

\[
\begin{aligned}
f(1, 0, -1)^T &= (2, 0, -2)^T = 2(1, 0, -1)^T \\
f(0, 1, 0)^T &= (0, -3, 4)^T = -3(0, 1, 0)^T + 4(0, 0, 1)^T \\
f(0, 0, 1)^T &= (0, -1, 1)^T = -(0, 1, 0)^T + (0, 0, 1)^T
\end{aligned}
\]

so the matrix of $f$ with respect to the basis $B$ is

\[ \begin{pmatrix} 2 & 0 & 0 \\ 0 & -3 & -1 \\ 0 & 4 & 1 \end{pmatrix}. \]

Observe that this matrix is in block diagonal form, with the blocks corresponding to the different generalized eigenspaces of $f$.
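The calculations of Example 14.2 can be verified with integer matrix arithmetic. The sketch below assumes the matrix $A$ of $f$ is the one determined by the three images $f(1, 0, -1)^T$, $f(0, 1, 0)^T$, $f(0, 0, 1)^T$ quoted above; the helper names are hypothetical. It checks that $(A - 2I)(A + I) \neq 0$ while $(A - 2I)(A + I)^2 = 0$, and that the basis $B$ gives the block diagonal matrix.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def shift(X, c):
    """Return X + c*I for a square integer matrix X."""
    return [[X[i][j] + (c if i == j else 0) for j in range(len(X))] for i in range(len(X))]

# Matrix of f in the standard basis (assumed: columns are f(e_1), f(e_2), f(e_3)).
A = [[2, 0, 0], [-1, -3, -1], [-1, 4, 1]]

g = shift(A, -2)          # A - 2I
h = shift(A, 1)           # A + I
gh = matmul(g, h)         # nonzero, so (x - 2)(x + 1) does not annihilate A
ghh = matmul(gh, h)       # zero, so (x - 2)(x + 1)^2 does

# Columns of S are the basis vectors (1, 0, -1), (0, 1, 0), (0, 0, 1).
S = [[1, 0, 0], [0, 1, 0], [-1, 0, 1]]
block = [[2, 0, 0], [0, -3, -1], [0, 4, 1]]

print(gh)
print(ghh)
print(matmul(A, S) == matmul(S, block))  # A S = S * block means S^{-1} A S = block
```

Checking $AS = S \cdot \text{block}$ avoids computing $S^{-1}$ explicitly.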
Proposition 14.4 With the notation of Theorem 14.3, if $B_i$ is a basis for $V_i$, then $\bigcup_{i=1}^{r} B_i$ is a basis for $V$, and with respect to this basis the matrix of $f$ has block diagonal form, i.e.

\[ \begin{pmatrix} A_1 & & & 0 \\ & A_2 & & \\ & & \ddots & \\ 0 & & & A_r \end{pmatrix} \]

where $A_1, \ldots, A_r$ are square matrices giving the action of $f$ on $V_1, \ldots, V_r$.

Proof We use the fact that $f$ and $(f - \lambda_i I)^{e_i}$ commute with each other, for all $i$. If $v_i \in V_i = \ker((f - \lambda_i I)^{e_i})$, then

\[ (f - \lambda_i I)^{e_i}(v_i) = 0 \]

so

\[ f\bigl((f - \lambda_i I)^{e_i}(v_i)\bigr) = 0 \]

and

\[ (f - \lambda_i I)^{e_i}(f(v_i)) = 0; \]

hence $f(v_i) \in V_i$. □
The primary decomposition uses the factorization of the minimum polynomial of $f$ to give us a block diagonal form for the matrix of $f$. Each block now has a single eigenvalue: the minimum polynomial of an $n \times n$ block is, say, $(x - \lambda)^k$, and the characteristic polynomial is $(x - \lambda)^n$. Our next task is to simplify the shape of these blocks. In other words, we try to find as nice a basis as possible for each generalized eigenspace. We have already seen in Example 12.1 that in general we cannot find a basis with respect to which the matrix is diagonal. However, we can get close. To see the kind of thing that we can do, let us consider an example.

Example 14.5 Let $f : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by

\[ f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -x - y + z \\ 14x + 8y - 7z \\ 10x + 5y - 4z \end{pmatrix}, \]

so that $f$ is represented with respect to the standard basis by the matrix

\[ B = \begin{pmatrix} -1 & -1 & 1 \\ 14 & 8 & -7 \\ 10 & 5 & -4 \end{pmatrix}. \]

First we calculate the characteristic polynomial $\chi_B(x) = (1 - x)^3$ and minimum polynomial $m_B(x) = (x - 1)^2$. In particular there is a single eigenvalue, namely 1, and its algebraic multiplicity is 3. Next we work out the eigenspace, $\ker(B - I)$, which consists of all vectors $(x, y, z)^T$ satisfying

\[ \begin{pmatrix} -2 & -1 & 1 \\ 14 & 7 & -7 \\ 10 & 5 & -5 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]

Solving these equations in the usual way, we obtain a two-dimensional space of solutions, spanned by eigenvectors such as $(1, 0, 2)^T$ and $(0, 1, 1)^T$, for example. (Thus the geometric multiplicity of the eigenvalue is 2.) Moreover, as $(B - I)^2$ is the zero matrix, $\ker(B - I)^2$ is the whole space. To get a nice basis for the space, we first take a basis for $\ker(B - I)$ and then extend to a basis for $\ker(B - I)^2$. For example, we could take the ordered basis

\[ (1, 0, 2)^T,\ (0, 1, 1)^T,\ (1, 0, 0)^T. \]

Applying the corresponding base-change matrix

\[ Q = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 2 & 1 & 0 \end{pmatrix} \]

we obtain the new matrix

\[ Q^{-1} B Q = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & 14 \\ 0 & 0 & 1 \end{pmatrix}, \]

which is now an upper triangular matrix, with the eigenvalues on the diagonal.

But we can do better than this. If we apply $B - I$ to the basis vector $(1, 0, 0)^T$, then the image vector $(-2, 14, 10)^T$ is in $\ker(B - I)$, since $w = (B - I)(1, 0, 0)^T$ has $(B - I)w = (B - I)^2 (1, 0, 0)^T = 0$ as $\ker(B - I)^2 = \mathbb{R}^3$. So let us change our basis of $\ker(B - I)$ to include this vector. For example, we could take our new basis for the whole space to be

\[ (1, 0, 0)^T,\ (-2, 14, 10)^T,\ (0, 1, 1)^T \]
which would give a base-change matrix
and a new matrix
which is in so-called Jordan n ormal form. The only nonzero entries off the diag onal are entries equal to 1 , one place immediately above the diagonal.
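The final step of Example 14.5 can be checked by machine. The sketch below rests on two assumptions that should be read as such: the entries of $B$ are the ones forced by the eigenvectors $(1, 0, 2)^T$, $(0, 1, 1)^T$ and the image $(B - I)(1, 0, 0)^T = (-2, 14, 10)^T$ quoted in the example, and the basis is listed in the order $(-2, 14, 10)^T$, $(1, 0, 0)^T$, $(0, 1, 1)^T$, which is the order that places the off-diagonal 1 immediately above the diagonal. The helper names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Assumed matrix of f (reconstructed from the vectors quoted in the example).
B = [[-1, -1, 1], [14, 8, -7], [10, 5, -4]]
N = [[B[i][j] - (1 if i == j else 0) for j in range(3)] for i in range(3)]  # B - I

N_squared = matmul(N, N)  # expected to be the zero matrix

# Columns of P: w = (B - I)a, then a = (1, 0, 0), then the eigenvector (0, 1, 1).
P = [[-2, 1, 0], [14, 0, 1], [10, 0, 1]]
J = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]  # candidate Jordan normal form

print(N_squared)
print(matmul(B, P) == matmul(P, J))  # B P = P J is equivalent to P^{-1} B P = J
```

Since $P$ has determinant $-4$ it is invertible, so the identity $BP = PJ$ does express $B$ in the form $J$ with respect to the new basis.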
Theorem 14.6 If $f : W \to W$ is a linear transformation with minimum polynomial $m_f(x) = (x - \lambda)^k$, then there is a basis of $W$ with respect to which the matrix of $f$ has $\lambda$ on the diagonal, 1 or 0 in each entry immediately above the diagonal, and 0 elsewhere. That is, the matrix of $f$ has the form

\[
\begin{pmatrix}
\lambda & 1 & & & & & \\
 & \lambda & \ddots & & & & \\
 & & \ddots & 1 & & & \\
 & & & \lambda & 0 & & \\
 & & & & \lambda & 1 & \\
 & & & & & \ddots & \ddots \\
 & & & & & & \lambda
\end{pmatrix}
\]

with zeros everywhere except as indicated.

Proof See Section 14.2. □
This theorem tells you how each of the blocks $A_i$ in Proposition 14.4 can be rewritten. Putting all the blocks together again, we get the Jordan normal form of an arbitrary matrix, which has blocks of the shape given in Theorem 14.6, for various values of $\lambda$. The small blocks which make up this matrix, of the form

\[
E = \begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda & 1 \\
0 & 0 & \cdots & 0 & \lambda
\end{pmatrix}
\]

are called elementary Jordan matrices. If $E$ is a $k \times k$ matrix of this form, then it is easy to show that $(E - \lambda I_k)^k = 0$, but that $(E - \lambda I_k)^{k-1} \neq 0$. Thus if a Jordan matrix $J$ has a $k \times k$ block $E$ as above, then the minimum polynomial must be divisible by $(x - \lambda)^k$. Indeed we have the following result (see Exercise 11.7).

Proposition 14.7 If $f : V \to V$ is a linear transformation and

\[ m_f(x) = (x - \lambda_1)^{e_1} (x - \lambda_2)^{e_2} \cdots (x - \lambda_r)^{e_r} \]

then, in a matrix representation of $f$ in Jordan normal form, the largest elementary Jordan matrix with eigenvalue $\lambda_i$ is an $e_i \times e_i$ matrix.

With the same matrix $E$ as above, suppose that $v = (v_1, \ldots, v_k)^T$ is an eigenvector of $E$. Then $Ev = \lambda v$, i.e.

\[ Ev = (\lambda v_1 + v_2,\ \lambda v_2 + v_3,\ \ldots,\ \lambda v_{k-1} + v_k,\ \lambda v_k)^T = \lambda (v_1, v_2, \ldots, v_k)^T, \]

so $v_2 = v_3 = \cdots = v_k = 0$. So up to a scalar multiple, we have $v = (1, 0, \ldots, 0)^T$. Thus each elementary Jordan matrix has a one-dimensional eigenspace. Putting all these together we obtain the following.

Proposition 14.8 The dimension of the $\lambda$-eigenspace of $f$ (i.e. the geometric multiplicity of $\lambda$) is equal to the number of elementary Jordan matrices for $\lambda$ in the Jordan normal form for $f$.

14.2 Obtaining the Jordan normal form
Here, we will be rather more precise on how the Jordan form of an arbitrary square matrix can be obtained.

Suppose $f : V \to V$ is a linear transformation. First, the primary decomposition theorem (Theorem 14.3 or Theorem 14.15) shows how we can get a block diagonal form for the matrix of $f$, by finding bases of the generalized eigenspaces. (All you need to know to be able to carry out this calculation is the definition of the generalized eigenspaces. In particular, you don't need to know the proof of the primary decomposition theorem.) This reduces the problem to finding a 'nice' representation for each block, i.e. finding a 'nice' basis for each generalized eigenspace. Each block corresponding to one of these generalized eigenspaces has a single eigenvalue. The minimum polynomial of an $n \times n$ block is $(x - \lambda)^k$, say, and the characteristic polynomial is $(x - \lambda)^n$.

We suppose, therefore, that we have a linear transformation $f : V \to V$ with minimum polynomial $m_f(x) = (x - \lambda)^k$, and for simplicity we consider the linear transformation $g = f - \lambda I$ instead. Then we have $m_g(x) = x^k$ and $\chi_g(x) = x^n$, so the only eigenvalue of $g$ is 0. As $m_g(x) = x^k$, $g^k$ is the zero map, so $V = \ker(g^k)$. Now clearly $\ker g^{r+1} \supseteq \ker g^r$ for all $r$, for if $v \in \ker(g^r)$ then $g^r(v) = 0$ and $g^{r+1}(v) = g(g^r(v)) = g(0) = 0$, so $v \in \ker(g^{r+1})$. This means we get a chain of subspaces

\[ V = \ker g^k \supseteq \ker g^{k-1} \supseteq \cdots \supseteq \ker g^2 \supseteq \ker g \supseteq \ker g^0 = \{0\}. \]
The general method for finding a suitable basis of $V$ is as follows. First take a basis $v_1, \ldots, v_{r_1}$ of $\ker g$, extend this to a basis $v_1, \ldots, v_{r_1}, v_{r_1+1}, \ldots, v_{r_2}$ of $\ker g^2$, and so on, until we have a basis

\[ v_1, \ldots, v_{r_1}, v_{r_1+1}, \ldots, v_{r_2}, \ldots, v_{r_{k-1}+1}, \ldots, v_{r_k} \]

of $V = \ker g^k$. We now modify this basis: first write down those basis elements $v_{r_{k-1}+1}, \ldots, v_{r_k}$ of $\ker g^k$ not in $\ker g^{k-1}$ as $a_1 = v_{r_{k-1}+1}, \ldots, a_{n_1} = v_{r_k}$, giving

\[ a_1, \ \ldots, \ a_{n_1}. \]

Next calculate $b_1 = g(a_1), \ldots, b_{n_1} = g(a_{n_1})$ and write these down underneath the $a_i$. These $b_i$ are all elements of $\ker g^{k-1}$ since $g^{k-1}(b_i) = g^k(a_i) = 0$, and it will turn out that all the vectors $a_i, b_j$ form a linearly independent set. Because of this, we can extend the list of the $b_i$ to $b_{n_1+1}, \ldots, b_{n_2}$ so that, together with a basis of $\ker g^{k-2}$, the vectors $b_1, \ldots, b_{n_2}$ form a basis of $\ker g^{k-1}$.
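We then work out $c_i = g(b_i)$ for each $i$, write these underneath, and extend what we have got to a basis of $\operatorname{span}(v_{r_{k-3}+1}, \ldots, v_{r_k})$. When this process is complete, we will have a basis of the whole space $V$ written as a table of the form

\[
\begin{array}{ccccccccccc}
a_1 & \cdots & a_{n_1} \\
b_1 & \cdots & b_{n_1} & b_{n_1+1} & \cdots & b_{n_2} \\
c_1 & \cdots & c_{n_1} & c_{n_1+1} & \cdots & c_{n_2} & c_{n_2+1} & \cdots & c_{n_3} \\
\vdots \\
z_1 & \cdots & z_{n_1} & z_{n_1+1} & \cdots & z_{n_2} & z_{n_2+1} & \cdots & z_{n_3} & \cdots & z_{n_k}
\end{array}
\]

All that is required is to order this basis in a suitable way. To do this, note that $g(a_i) = b_i$, $g(b_i) = c_i$, etc., so we order the basis reading up the columns first, and then left to right, as

\[ z_1, \ldots, c_1, b_1, a_1,\ z_2, \ldots, c_2, b_2, a_2,\ \ldots,\ z_{n_k}. \]

Because $g(a_i) = b_i$, $g(b_i) = c_i$, etc., the matrix of $g$ will be in Jordan normal form, with an elementary Jordan matrix of the form

\[
\begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 1 \\
0 & 0 & \cdots & 0 & 0
\end{pmatrix}
\]

for each column of the table. The matrix of the original linear transformation $f = g + \lambda I$ is then formed of elementary Jordan matrices

\[
\begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda & 1 \\
0 & 0 & \cdots & 0 & \lambda
\end{pmatrix}
\]

as required.

Clearly, the crucial point to this construction (and it is not immediately obvious) is that the basis modification actually does give a basis. The lemma that tells us that it really does work is the following.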
Lemma 14.9 If $\{u_1, \ldots, u_r\}$ is a basis for $\ker(g^j)$, is extended to a basis

\[ \{u_1, \ldots, u_r, v_1, \ldots, v_s\} \]

of $\ker(g^{j+1})$, and to a basis

\[ \{u_1, \ldots, u_r, v_1, \ldots, v_s, w_1, \ldots, w_t\} \]

of $\ker(g^{j+2})$, then $\{u_1, \ldots, u_r, g(w_1), \ldots, g(w_t)\}$ is a linearly independent subset of $\ker(g^{j+1})$.

Proof First note that $g^{j+1}(g(w_i)) = g^{j+2}(w_i) = 0$ so $g(w_i) \in \ker(g^{j+1})$. To show linear independence, suppose we have a linear dependence

\[ \sum_{i=1}^{r} \lambda_i u_i + \sum_{i=1}^{t} \mu_i g(w_i) = 0, \]

so that

\[ g\Bigl(\sum_{i=1}^{t} \mu_i w_i\Bigr) = \sum_{i=1}^{t} \mu_i g(w_i) = -\sum_{i=1}^{r} \lambda_i u_i \in \ker(g^j). \]

Therefore

\[ g^{j+1}\Bigl(\sum_{i=1}^{t} \mu_i w_i\Bigr) = 0, \]

so $\sum_{i=1}^{t} \mu_i w_i \in \ker(g^{j+1})$, which means that it can be written as a linear combination of $\{u_1, \ldots, u_r, v_1, \ldots, v_s\}$. But

\[ \{u_1, \ldots, u_r, v_1, \ldots, v_s, w_1, \ldots, w_t\} \]

is a linearly independent set, so all $\mu_i = 0$. Therefore $\sum_{i=1}^{r} \lambda_i u_i = 0$, so all the $\lambda_i = 0$. □
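The table built by this method has one column per elementary Jordan matrix, and the number of columns of height at least $r$ is $\dim \ker g^r - \dim \ker g^{r-1}$. So the block sizes can be read off from the dimensions of the kernels alone. The following is a sketch of that bookkeeping under stated assumptions: the $5 \times 5$ nilpotent matrix `G` is a hypothetical example chosen for illustration, the function names are not from the text, and ranks are computed by exact Gaussian elimination over the rationals.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank of a matrix by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def kernel_dimensions(G):
    """dim ker G^r for r = 1, 2, ... until the kernel is the whole space (G must be nilpotent)."""
    n = len(G)
    dims, power = [], G
    while True:
        dims.append(n - rank(power))
        if dims[-1] == n:
            return dims
        if len(dims) > n:
            raise ValueError("matrix is not nilpotent")
        power = matmul(power, G)

def block_sizes(G):
    """Sizes of the elementary Jordan matrices of a nilpotent matrix, largest first."""
    dims = [0] + kernel_dimensions(G)
    at_least = [dims[r] - dims[r - 1] for r in range(1, len(dims))]  # blocks of size >= r
    sizes = []
    for r, count in enumerate(at_least, start=1):
        larger = at_least[r] if r < len(at_least) else 0
        sizes += [r] * (count - larger)
    return sorted(sizes, reverse=True)

# Hypothetical nilpotent matrix, playing the role of g = f - lambda*I.
G = [[1, 1, 0, 0, 0],
     [-1, 0, 1, 0, 0],
     [1, 0, -1, 0, 0],
     [-1, 0, 1, 1, 1],
     [1, 0, -1, -1, -1]]

print(kernel_dimensions(G))
print(block_sizes(G))
```

This only determines the shape of the Jordan normal form; finding the basis itself still requires the chain construction described in the text.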
Example 14.10 Let A be the matrix
A=
r-: ( -1 I
1
0
2 0 0 0
1
0 0 0
1 1
3
-1 -1
})
You can calculate that X A = (2 - x ) and rnA (x) = ( x - 2r3 . �ow let g (v) = Bv where
B = A - 21 =
5
I
-1 1 -1 1
0 0 0 0
0 1
-1 1 -1
0 0 0
�l -
1 -1 -1
Calculating kernels as usual, we find that $\ker g$ has basis
$$(1, -1, 1, 0, 0)^T,\ (0, 0, 0, 1, -1)^T,$$
$\ker g^2$ has basis
$$(1, -1, 1, 0, 0)^T,\ (0, 0, 0, 1, -1)^T,\ (1, 0, 0, 0, 0)^T,\ (0, 0, 0, 1, 0)^T,$$
and $\ker g^3 = \mathbb{R}^5$, with basis
$$(1, -1, 1, 0, 0)^T,\ (0, 0, 0, 1, -1)^T,\ (1, 0, 0, 0, 0)^T,\ (0, 0, 0, 1, 0)^T,\ (0, 1, 0, 0, 0)^T.$$
We now modify this basis according to the rules above. First, we set $a_1 = (0, 1, 0, 0, 0)^T$. Next, take $b_1 = g(a_1) = (1, 0, 0, 0, 0)^T$, and extend by adding $b_2 = (0, 0, 0, 1, 0)^T$. Finally, we set $c_1 = g(b_1) = (1, -1, 1, -1, 1)^T$ and $c_2 = g(b_2) = (0, 0, 0, 1, -1)^T$. These vectors are organized in the following way,
$$\begin{array}{cc} a_1 & \\ b_1 & b_2 \\ c_1 & c_2 \end{array}$$
and we can order the basis we have just found reading up columns and across from left to right as $c_1, b_1, a_1, c_2, b_2$. The corresponding base-change matrix $P$ and its inverse are
$$P = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & -1 & 0 \end{pmatrix}, \qquad P^{-1} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}$$
and you can check that
$$P^{-1}AP = \begin{pmatrix} 2 & 1 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix}$$
is in Jordan normal form.
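The arithmetic of Example 14.10 is easy to get wrong by hand, so here is a minimal Python sketch that checks it with exact integer arithmetic. The entries of $A$, $P$ and $P^{-1}$ below are the ones reconstructed from the vectors $a_1, b_1, b_2, c_1, c_2$ and the kernel bases quoted in the example; the helper names `matmul` and `rank` are ours, not part of the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank of M, by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

A = [[3, 1, 0, 0, 0],
     [-1, 2, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [-1, 0, 1, 3, 1],
     [1, 0, -1, -1, 1]]

# B = A - 2I, and the dimensions of ker g, ker g^2, ker g^3 (rank-nullity).
B = [[A[i][j] - (2 if i == j else 0) for j in range(5)] for i in range(5)]
B2 = matmul(B, B)
B3 = matmul(B2, B)
kernel_dims = [5 - rank(M) for M in (B, B2, B3)]

# Columns of P are c1, b1, a1, c2, b2, in that order.
P = [[1, 1, 0, 0, 0],
     [-1, 0, 1, 0, 0],
     [1, 0, 0, 0, 0],
     [-1, 0, 0, 1, 1],
     [1, 0, 0, -1, 0]]
P_inv = [[0, 0, 1, 0, 0],
         [1, 0, -1, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 0, 1, 0, -1],
         [0, 0, 0, 1, 1]]

J = matmul(matmul(P_inv, A), P)
print("dim ker g^k for k = 1, 2, 3:", kernel_dims)
for row in J:
    print(row)
```

The kernel dimensions give the shape of the table (one column of height three and one of height two), and `J` should show one $3 \times 3$ and one $2 \times 2$ elementary Jordan matrix with eigenvalue 2.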
14.3 Applications

Just as with diagonalization, Jordan form gives us a useful method for solving many kinds of simultaneous difference and differential equations. We illustrate the method here with an example of simultaneous difference equations.
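Example 14.11 Solve the equations
$$\begin{aligned} x_{n+1} &= 3x_n + z_n \\ y_{n+1} &= -x_n + y_n - z_n \\ z_{n+1} &= y_n + 2z_n \end{aligned}$$
subject to $x_0 = y_0 = z_0 = 1$.

First, in matrix form this becomes
$$\begin{pmatrix} x_{n+1} \\ y_{n+1} \\ z_{n+1} \end{pmatrix} = A \begin{pmatrix} x_n \\ y_n \\ z_n \end{pmatrix}$$
where
$$A = \begin{pmatrix} 3 & 0 & 1 \\ -1 & 1 & -1 \\ 0 & 1 & 2 \end{pmatrix}.$$
By the usual calculations, we find that
$$\chi_A(x) = (2 - x)^3.$$
Put $g(v) = (A - 2I)v$, so $\ker g$ has basis $(1, 0, -1)^T$, $\ker g^2$ has basis $(1, 0, -1)^T, (1, -1, 0)^T$, and $\ker g^3$ has basis $(1, 0, -1)^T, (1, -1, 0)^T, (1, 0, 0)^T$. Set $a_1 = (1, 0, 0)^T$, $b_1 = g(a_1) = (1, -1, 0)^T$, and $c_1 = g(b_1) = (1, 0, -1)^T$. This gives a base-change matrix $P$ and its inverse given by
$$P = \begin{pmatrix} 1 & 1 & 1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{pmatrix} \quad\text{and}\quad P^{-1} = \begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ 1 & 1 & 1 \end{pmatrix},$$
with
$$P^{-1}AP = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.$$
As usual, we 'change coordinates' to
$$\begin{pmatrix} a_n \\ b_n \\ c_n \end{pmatrix} = P^{-1} \begin{pmatrix} x_n \\ y_n \\ z_n \end{pmatrix},$$
giving
$$\begin{pmatrix} a_{n+1} \\ b_{n+1} \\ c_{n+1} \end{pmatrix} = P^{-1}AP \begin{pmatrix} a_n \\ b_n \\ c_n \end{pmatrix} = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} a_n \\ b_n \\ c_n \end{pmatrix}.$$
Now the general solution to difference equations like this is
$$\begin{aligned} a_n &= 2^n a_0 + n2^{n-1} b_0 + \frac{n(n-1)}{2}\, 2^{n-2} c_0 \\ b_n &= 2^n b_0 + n2^{n-1} c_0 \\ c_n &= 2^n c_0. \end{aligned}$$
Substituting
$$\begin{pmatrix} a_0 \\ b_0 \\ c_0 \end{pmatrix} = P^{-1} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ 3 \end{pmatrix}$$
into this we get
$$\begin{aligned} a_n &= 2^{n-2}(3n^2/2 - 7n/2 - 4) \\ b_n &= 2^{n-1}(3n - 2) \\ c_n &= 2^n \cdot 3, \end{aligned}$$
which gives
$$\begin{aligned} x_n &= 2^{n-2}(3n^2/2 + 5n/2 + 4) \\ y_n &= 2^{n-2}(4 - 6n) \\ z_n &= 2^{n-2}(4 + 7n/2 - 3n^2/2). \end{aligned}$$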
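As a sanity check on Example 14.11, the following Python sketch iterates the three difference equations directly from $x_0 = y_0 = z_0 = 1$ and compares the result with the closed formulas for $x_n, y_n, z_n$. Exact rational arithmetic is used so that the factor $2^{n-2}$ causes no rounding for $n = 0, 1$; the function names `step` and `closed_form` are illustrative only.

```python
from fractions import Fraction

def step(x, y, z):
    """One application of the difference equations of Example 14.11."""
    return 3 * x + z, -x + y - z, y + 2 * z

def closed_form(n):
    """The claimed solution (x_n, y_n, z_n)."""
    scale = Fraction(2) ** (n - 2)
    x = scale * (Fraction(3, 2) * n * n + Fraction(5, 2) * n + 4)
    y = scale * (4 - 6 * n)
    z = scale * (4 + Fraction(7, 2) * n - Fraction(3, 2) * n * n)
    return x, y, z

state = (1, 1, 1)          # x_0 = y_0 = z_0 = 1
agree = True
for n in range(11):
    if closed_form(n) != state:
        agree = False
    state = step(*state)

print("closed form matches iteration for n = 0..10:", agree)
```

Only the first eleven terms are compared; this is a check of the arithmetic, not a proof, which is what Theorem 14.12 below supplies.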
For solving equations like these, the following 'standard forms' are useful.
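Theorem 14.12 The general solution of the difference equations
$$\begin{pmatrix} x_1(i+1) \\ x_2(i+1) \\ x_3(i+1) \\ \vdots \\ x_k(i+1) \end{pmatrix} = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix} \begin{pmatrix} x_1(i) \\ x_2(i) \\ x_3(i) \\ \vdots \\ x_k(i) \end{pmatrix}$$
is
$$\begin{aligned} x_1(n) &= \lambda^n x_1(0) + {}^nC_1 \lambda^{n-1} x_2(0) + \cdots + {}^nC_{k-1} \lambda^{n-k+1} x_k(0) \\ x_2(n) &= \lambda^n x_2(0) + {}^nC_1 \lambda^{n-1} x_3(0) + \cdots + {}^nC_{k-2} \lambda^{n-k+2} x_k(0) \\ &\ \,\vdots \\ x_k(n) &= \lambda^n x_k(0), \end{aligned}$$
where ${}^nC_i$ is the coefficient of $x^i$ in $(x + 1)^n$, so ${}^nC_i = n!/(i!(n - i)!)$, ${}^nC_0 = 1$, ${}^nC_1 = n$, ${}^nC_2 = n(n - 1)/2$, ..., ${}^nC_n = 1$, etc.

Proof Use induction on $n$ together with the familiar identity
$${}^nC_i + {}^nC_{i+1} = {}^{n+1}C_{i+1}$$
to obtain the theorem. □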
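The standard form of Theorem 14.12 can be tested numerically. The sketch below, under the assumption that the solution reads $x_j(n) = \sum_{m \ge 0} {}^nC_m \lambda^{n-m} x_{j+m}(0)$ (with terms dropped once $j + m > k$ or $m > n$), applies the Jordan-block recurrence $n$ times and compares with the binomial formula. The names `iterate` and `closed_form` are ours; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb

def iterate(lam, start, n):
    """Apply x(i+1) = J x(i) n times, where J is the k x k elementary
    Jordan matrix with eigenvalue lam and k = len(start)."""
    x = list(start)
    k = len(x)
    for _ in range(n):
        x = [lam * x[j] + (x[j + 1] if j + 1 < k else 0) for j in range(k)]
    return x

def closed_form(lam, start, n):
    """Binomial formula for x(n); indices are 0-based here."""
    k = len(start)
    return [sum(comb(n, m) * lam ** (n - m) * start[j + m]
                for m in range(min(k - j, n + 1)))
            for j in range(k)]

# A 4 x 4 block with eigenvalue 3 and an arbitrary integer start vector.
checks = [iterate(3, [1, -1, 3, 5], n) == closed_form(3, [1, -1, 3, 5], n)
          for n in range(9)]
print("formula agrees with iteration:", all(checks))

# The 3 x 3 case of Example 14.11: (a_0, b_0, c_0) = (-1, -1, 3), eigenvalue 2.
print(closed_form(2, [-1, -1, 3], 3))
```

The second call reproduces $(a_3, b_3, c_3)$ of Example 14.11, which ties the general statement back to the worked case.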
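The same ideas can be applied to differential equations too.

Example 14.13 Quantities $x(t)$, $y(t)$, $z(t)$ vary with time, $t$, and satisfy the equations
$$\begin{aligned} \frac{dx}{dt} &= 3x + z \\ \frac{dy}{dt} &= -x + y - z \\ \frac{dz}{dt} &= y + 2z \end{aligned}$$
and boundary conditions $x(0) = y(0) = z(0) = 1$. We find $x(t)$, $y(t)$, $z(t)$.

In matrix form, we have
$$\frac{d}{dt} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = A \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$
where
$$A = \begin{pmatrix} 3 & 0 & 1 \\ -1 & 1 & -1 \\ 0 & 1 & 2 \end{pmatrix}$$
is the matrix of the previous example. This suggests using the same base-change matrix $P$, and defining new quantities $u(t)$, $v(t)$, $w(t)$ by
$$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = P^{-1} \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$
Then
$$\frac{d}{dt} \begin{pmatrix} u \\ v \\ w \end{pmatrix} = P^{-1}AP \begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} u \\ v \\ w \end{pmatrix}$$
or
$$\frac{dw}{dt} = 2w, \qquad \frac{dv}{dt} = 2v + w, \qquad \frac{du}{dt} = 2u + v.$$
The solution to this standard system of differential equations is
$$w = Ae^{2t}, \qquad v = (At + B)e^{2t}, \qquad u = (\tfrac{1}{2}At^2 + Bt + C)e^{2t}$$
for constants of integration $A$, $B$, $C$. These constants are found using the boundary conditions
$$\begin{pmatrix} u(0) \\ v(0) \\ w(0) \end{pmatrix} = P^{-1} \begin{pmatrix} x(0) \\ y(0) \\ z(0) \end{pmatrix} = P^{-1} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ 3 \end{pmatrix}.$$
So $A = 3$, $B = -1$, and $C = -1$. This gives the required solution
$$\begin{aligned} x &= \tfrac{3}{2}t^2 e^{2t} + 2te^{2t} + e^{2t} \\ y &= -3te^{2t} + e^{2t} \\ z &= -\tfrac{3}{2}t^2 e^{2t} + te^{2t} + e^{2t} \end{aligned}$$
of the original differential equations.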
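The solution of Example 14.13 can be checked without any numerical integration. Each of $x, y, z$ has the form $p(t)e^{2t}$ with $p$ a polynomial, and $\frac{d}{dt}\bigl(p(t)e^{2t}\bigr) = (p'(t) + 2p(t))e^{2t}$, so it suffices to compare polynomial coefficients. The sketch below does this exactly; the coefficient-list representation and the helper names are our own choices.

```python
from fractions import Fraction

# Coefficients [c0, c1, c2] of the polynomial factors, so x(t) = px(t) e^{2t}, etc.
px = [Fraction(1), Fraction(2), Fraction(3, 2)]
py = [Fraction(1), Fraction(-3), Fraction(0)]
pz = [Fraction(1), Fraction(1), Fraction(-3, 2)]

def derivative_factor(p):
    """Polynomial factor of d/dt (p(t) e^{2t}), namely p'(t) + 2 p(t)."""
    dp = [(k + 1) * p[k + 1] for k in range(len(p) - 1)] + [Fraction(0)]
    return [d + 2 * c for d, c in zip(dp, p)]

def combine(a, b, c):
    """Coefficients of a*px + b*py + c*pz (a right-hand side of the system)."""
    return [a * u + b * v + c * w for u, v, w in zip(px, py, pz)]

ode_holds = (derivative_factor(px) == combine(3, 0, 1)        # dx/dt = 3x + z
             and derivative_factor(py) == combine(-1, 1, -1)  # dy/dt = -x + y - z
             and derivative_factor(pz) == combine(0, 1, 2))   # dz/dt = y + 2z

initial_values = (px[0], py[0], pz[0])   # values at t = 0, since e^0 = 1
print("equations satisfied:", ode_holds)
print("x(0), y(0), z(0) =", initial_values)
```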
14.4 Proof of the primary decomposition theorem
Here we prove the primary decomposition theorem. As will be clear, if the theorem is taken on trust, the proof is not required in calculations. However, the proof is given here for those readers who like to see the complete story.

Suppose that we have a linear map $f : V \to V$ with minimum polynomial $m_f(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_r)^{e_r}$, where $\lambda_1, \ldots, \lambda_r$ are the distinct eigenvalues of $f$, and let $V_1, \ldots, V_r$ be the generalized eigenspaces, defined by
$$V_i = \ker\bigl((f - \lambda_i I)^{e_i}\bigr).$$
We want to prove that $V$ is the direct sum of these generalized eigenspaces. Define $p(x) = (x - \lambda_1)^{e_1}$ and $q(x) = (x - \lambda_2)^{e_2} \cdots (x - \lambda_r)^{e_r}$, so that $m_f(x) = p(x)q(x)$ and $V_1 = \ker(p(f))$. Also define $W_1 = \ker(q(f))$. Our plan is to show that $V_1 \oplus W_1 = V$, and then use induction on $\dim V$ to obtain a decomposition $W_1 = \bigoplus_{i=2}^{r} V_i$. We shall use the result in Proposition 9.9 (or Proposition 9.14) which says there are polynomials $t(x)$ and $s(x)$ such that
$$t(x)p(x) + s(x)q(x) = 1.$$
We shall also use the fact from Section 9.2 that $p(f)q(f) = q(f)p(f)$, etc., throughout.

Lemma 14.14 $V_1 \oplus W_1 = V$.

Proof Let $v \in V$. Then
$$v = Iv = (t(f)p(f) + s(f)q(f))v = t(f)p(f)v + s(f)q(f)v.$$
But $t(f)p(f)v \in W_1$ since $q(f)(t(f)p(f)v) = t(f)(p(f)q(f)v) = 0$ as $p(f)q(f) = 0$. Similarly, $s(f)q(f)v \in V_1$, so $V = V_1 + W_1$.

To show that this sum is direct, suppose $v \in V_1$, $w \in W_1$, and $v + w = 0$. We must show $v = w = 0$. But $v + w = 0$ implies $p(f)v + p(f)w = 0$, and $p(f)v = 0$ as $v \in \ker p(f)$, so $p(f)w = 0$. But this means
$$w = Iw = (t(f)p(f) + s(f)q(f))w = t(f)p(f)w + s(f)q(f)w = 0,$$
as $q(f)w = 0$ since $w \in \ker q(f)$. So $w = 0$. We prove $v = 0$ in exactly the same way. □
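The proof of Lemma 14.14 is constructive: $v \mapsto t(f)p(f)v$ and $v \mapsto s(f)q(f)v$ are the two projections of $V$ onto $W_1$ and $V_1$. The Python sketch below illustrates this on the $3 \times 3$ matrix $A$ that is analysed in detail after Theorem 14.15, with $p(x) = (x - 1)^2$ and $q(x) = x + 1$. The particular choice $t(x) = \tfrac14$, $s(x) = \tfrac14(3 - x)$ is our own computation (one can check $t(x)p(x) + s(x)q(x) = 1$ by expanding), not something stated in the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def combine(c, X, d, Y):
    """Entrywise c*X + d*Y."""
    return [[c * x + d * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ZERO = [[0, 0, 0] for _ in range(3)]
A = [[-1, 0, -1], [4, -1, 7], [4, 0, 3]]

A_minus_I = combine(1, A, -1, I3)
p_of_A = matmul(A_minus_I, A_minus_I)                     # p(x) = (x - 1)^2
q_of_A = combine(1, A, 1, I3)                             # q(x) = x + 1
t_of_A = combine(Fraction(1, 4), I3, 0, I3)               # t(x) = 1/4   (our choice)
s_of_A = combine(Fraction(3, 4), I3, Fraction(-1, 4), A)  # s(x) = (3 - x)/4

to_W1 = matmul(t_of_A, p_of_A)   # v -> t(f)p(f)v, lands in W_1 = ker q(f)
to_V1 = matmul(s_of_A, q_of_A)   # v -> s(f)q(f)v, lands in V_1 = ker p(f)

print("to_W1 =", to_W1)
print("to_V1 =", to_V1)
print("sum is identity:", combine(1, to_W1, 1, to_V1) == I3)
```

That the two matrices sum to the identity is the equation $v = t(f)p(f)v + s(f)q(f)v$; that $q(A)$ kills the first and $p(A)$ kills the second is the statement that the summands lie in $W_1$ and $V_1$ respectively.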
Theorem 14.15 If $f : V \to V$ is a linear map with minimum polynomial
$$m_f(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_r)^{e_r},$$
where $\lambda_1, \ldots, \lambda_r$ are the (distinct) eigenvalues of $f$, let $V_1, \ldots, V_r$ be the generalized eigenspaces, defined by
$$V_i = \ker\bigl((f - \lambda_i I)^{e_i}\bigr).$$
Then $V = V_1 \oplus \cdots \oplus V_r$.

Proof By induction on the number $r$ of distinct eigenvalues of $f$. If $r = 1$ there is nothing to prove. If $r > 1$, we have $V = V_1 \oplus W_1$, and we can define $g : W_1 \to W_1$ by $g(w) = f(w)$. We only need to check that $g(w) \in W_1$. But if $w \in W_1 = \operatorname{im}(p(f))$ then $w = p(f)(v)$ for some $v \in V$, so $g(w) = f(w) = (f \circ p(f))(v) = (p(f) \circ f)(v) = p(f)(f(v)) \in \operatorname{im}(p(f)) = W_1$. Moreover, $W_1 = \ker(q(f))$, so $q(g)$ maps every vector in $W_1$ to $0$. Therefore the minimal polynomial $m_g(x)$ of $g$ divides
$$q(x) = (x - \lambda_2)^{e_2} \cdots (x - \lambda_r)^{e_r}.$$
It follows that $g$ (that is, $f$ restricted to $W_1$) has $r - 1$ eigenvalues, and we can use induction to say that $W_1 = V_2 \oplus \cdots \oplus V_r$, so that $V = V_1 \oplus W_1 = V_1 \oplus V_2 \oplus \cdots \oplus V_r$. □
Let us look at an example in detail to see how this primary decomposition works. Let $V = \mathbb{R}^3$ and suppose that $f : V \to V$ is given by
$$f \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} -a - c \\ 4a - b + 7c \\ 4a + 3c \end{pmatrix},$$
so that
$$f^2 \begin{pmatrix} a \\ b \\ c \end{pmatrix} = f \begin{pmatrix} -a - c \\ 4a - b + 7c \\ 4a + 3c \end{pmatrix} = \begin{pmatrix} -(-a - c) - (4a + 3c) \\ 4(-a - c) - (4a - b + 7c) + 7(4a + 3c) \\ 4(-a - c) + 3(4a + 3c) \end{pmatrix} = \begin{pmatrix} -3a - 2c \\ 20a + b + 10c \\ 8a + 5c \end{pmatrix}.$$
Then $f$ is represented with respect to the standard basis by the matrix
$$A = \begin{pmatrix} -1 & 0 & -1 \\ 4 & -1 & 7 \\ 4 & 0 & 3 \end{pmatrix}$$
and the characteristic polynomial of $f$ is
$$\chi_f(x) = \chi_A(x) = (x - 1)^2(x + 1).$$
Therefore the minimal polynomial of $f$, $m_f(x) = m_A(x)$, is either $(x - 1)(x + 1)$ or $(x - 1)^2(x + 1)$. But $(A - I)(A + I) = A^2 - I \neq 0$, so in fact $m_A(x) = (x - 1)^2(x + 1)$. (Note: throughout these calculations you can work either with the matrix $A$ or with the linear map $f$.)

Now we want to take out one linear factor of $m_A(x)$, to the full power, say $p(x) = (x - 1)^2$, and $q(x)$ is what is left, namely $q(x) = (x + 1)$. By definition we have $m_f(x) = p(x)q(x)$. Substituting $f$ into this identity, we obtain $p(f)q(f) = m_f(f) = 0$, so all vectors get mapped to $0$ by $p(f) \circ q(f)$. Of course, some vectors go to $0$ under $p(f)$ on its own (these are exactly the elements of $\ker(p(f))$), and some go to $0$ under $q(f)$ on its own (these vectors form $\ker(q(f))$).

We have $q(x) = x + 1$, so $q(f) = f + I$ is defined by $(q(f))(v) = f(v) + v$, i.e.
$$q(f) : (a, b, c)^T \mapsto (-c,\ 4a + 7c,\ 4a + 4c)^T.$$
Therefore the image of $q(f)$ is spanned by the vectors $(0, 4, 4)^T$ and $(-1, 7, 4)^T$, or to take a simpler basis, $\{(0, 1, 1)^T, (1, 0, 3)^T\}$. Moreover, the kernel of $q(f)$ consists of all vectors $(a, b, c)^T$ such that $(-c, 4a + 7c, 4a + 4c) = (0, 0, 0)$, in other words $\ker(q(f)) = \operatorname{span}((0, 1, 0)^T)$. It is easy to see now that in this example $V = \operatorname{im}(q(f)) \oplus \ker(q(f))$.

Similarly we have $p(x) = (x - 1)^2$, so $p(f) = f^2 - 2f + I$ and
$$p(f) : (a, b, c)^T \mapsto (0,\ 12a + 4b - 4c,\ 0)^T.$$
Therefore the image of $p(f)$ is spanned by $(0, 1, 0)^T$. Also, the kernel of $p(f)$ consists of all vectors $(a, b, c)^T$ satisfying $12a + 4b - 4c = 0$. This is clearly a two-dimensional space, spanned by $(1, 0, 3)^T$ and $(0, 1, 1)^T$. Thus we see that the image of $p(f)$ is equal to the kernel of $q(f)$, and vice versa. In the notation above we have $V_1 = \operatorname{span}((1, 0, 3)^T, (0, 1, 1)^T)$, $W_1 = \operatorname{span}((0, 1, 0)^T)$, and $V = V_1 \oplus W_1$.
If we choose the new basis
$$v_1 = (1, 0, 3)^T, \quad v_2 = (0, 1, 1)^T, \quad v_3 = (0, 1, 0)^T$$
and write $f$ with respect to this basis, we see that
$$\begin{aligned} f(v_1) &= (-4, 25, 13)^T = -4v_1 + 25v_2 \\ f(v_2) &= (-1, 6, 3)^T = -v_1 + 6v_2 \\ f(v_3) &= (0, -1, 0)^T = -v_3 \end{aligned}$$
so the matrix of $f$ with respect to the ordered basis $v_1, v_2, v_3$ is
$$B = \begin{pmatrix} -4 & -1 & 0 \\ 25 & 6 & 0 \\ 0 & 0 & -1 \end{pmatrix}$$
which is in block diagonal form: we have separated out the different eigenvalues into different blocks.
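The computations in this worked example can be confirmed mechanically. The Python sketch below recomputes $p(A)$ and $q(A)$, checks that $(A - I)(A + I) \neq 0$ while $p(A)q(A) = 0$, and forms $P^{-1}AP$ for the matrix $P$ whose columns are $v_1, v_2, v_3$. The inverse `P_inv` was computed by hand for this sketch and is verified inside the code rather than taken from the text.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ZERO = [[0, 0, 0] for _ in range(3)]
A = [[-1, 0, -1], [4, -1, 7], [4, 0, 3]]

A_minus_I = [[A[i][j] - I3[i][j] for j in range(3)] for i in range(3)]
A_plus_I = [[A[i][j] + I3[i][j] for j in range(3)] for i in range(3)]

p_of_A = matmul(A_minus_I, A_minus_I)   # p(x) = (x - 1)^2
q_of_A = A_plus_I                       # q(x) = x + 1

# Columns of P are v1 = (1,0,3), v2 = (0,1,1), v3 = (0,1,0).
P = [[1, 0, 0], [0, 1, 1], [3, 1, 0]]
P_inv = [[1, 0, 0], [-3, 0, 1], [3, 1, -1]]   # hand-computed; checked below

B = matmul(matmul(P_inv, A), P)

print("p(A) =", p_of_A)
print("(A - I)(A + I) is zero:", matmul(A_minus_I, A_plus_I) == ZERO)
print("p(A) q(A) is zero:", matmul(p_of_A, q_of_A) == ZERO)
print("P_inv * P is identity:", matmul(P_inv, P) == I3)
print("B =", B)
```

The rows of `p_of_A` show directly that $p(f)$ sends $(a, b, c)^T$ to $(0, 12a + 4b - 4c, 0)^T$, and `B` splits into a $2 \times 2$ block for the eigenvalue $1$ and a $1 \times 1$ block for the eigenvalue $-1$.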
Exercises

Exercise 14.1 Using matrices
p
-1
=
�
c
-1
2
2
-1 -1 - 1 -2
2
compute $P^{-1}AP$, where
A�
(�
-
)
;2
-1
P=
-3 - 5 1 -1
-2 -2
-2
-1
0 0 I 1 0 0 2
(� �)
)
3 . 3
�
What are the characteristic and minimum polynomials of $A$? Give bases for each of the generalized eigenspaces $\ker(A - \lambda I)^r$.

Exercise 14.2 For the matrix
A=
(-� � =;) -3
2
-2
1
compute a base-change matrix $P$ such that $P^{-1}AP$ is in Jordan normal form, as follows.
(a) Compute $\chi_A(x)$. Show that it is of the form $-(x - \lambda_1)(x - \lambda_2)^2$ for some distinct $\lambda_1, \lambda_2$.
(b) Find bases $u$ of $\ker(A - \lambda_1 I)$, $v_1$ of $\ker(A - \lambda_2 I)$, and $v_1, v_2$ of $\ker(A - \lambda_2 I)^2$.
(c) Working from first principles, explain why $(A - \lambda_2 I)v_2$ is a scalar multiple of $v_1$. [Hint: what is $(A - \lambda_2 I)(A - \lambda_2 I)v_2$?]
(d) Let $w_1 = u$, $w_2 = (A - \lambda_2 I)v_2$, and $w_3 = v_2$. What is the matrix of the linear transformation $A$ with respect to this basis? Write down the base-change matrix $P$.
Exercise 14.3 Repeat the last exercise (except for part (c)) for the matrix

(This time $\chi_A(x) = (\lambda - x)^3$ for some $\lambda$. Find bases for $\ker(A - \lambda I)^n$ for $n = 1, 2, 3$.)
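Exercise 14.4 Solve

(a)
$$\begin{aligned} x_{n+1} &= 2y_n - z_n \\ y_{n+1} &= y_n \\ z_{n+1} &= x_n - 2y_n + 2z_n \end{aligned} \qquad x_0 = y_0 = z_0 = 1$$

(b)
$$\begin{aligned} x_{n+1} &= 5x_n - 3y_n - 5z_n + 5w_n \\ y_{n+1} &= x_n + y_n - z_n + w_n \\ z_{n+1} &= 2x_n - 2y_n - 2z_n + 3w_n \\ w_{n+1} &= x_n - y_n - 2z_n + 3w_n \end{aligned} \qquad x_0 = y_0 = z_0 = w_0 = 1.$$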
(
Exercise 14.5 Find bases for the primary components of the linear map represented by the matrix
A=
-1 1 1
3
0
-4
0
15
-2 - 7 0 3
J)
and hence find a matrix $P$ such that $P^{-1}AP$ is in block diagonal form.
Exercise 14.6 ( a) Let
Find the characteristic polynomial, eigenvalues, and the minimum polynomial of $A$, and find the algebraic and geometric multiplicities of each eigenvalue. Write down the Jordan normal form $J$ for $A$.
(b ) Do the same for the matrix B =
( � -=_12) . i
Exercise 14.7 Write down all possible Jordan normal forms for matrices with characteristic polynomial $(x - \lambda)^5$. In each case, calculate the minimum polynomial and the geometric multiplicity of the eigenvalue $\lambda$. Verify that this information determines the Jordan normal form.

Exercise 14.8 Do the same for $(x - \lambda)^6$.

Exercise 14.9 Show that there are two $7 \times 7$ matrices in Jordan normal form which have the same minimum polynomial and for which $\lambda$ has the same geometric multiplicity, but which are not similar.

Exercise 14.10 Show that the Jordan normal form of a $2 \times 2$ matrix is determined by its minimum polynomial, but that this is not true for $3 \times 3$ matrices.

Exercise 14.11 Show that the Jordan normal form of a $3 \times 3$ matrix is determined by its minimum polynomial and its characteristic polynomial, but that this is not true for $4 \times 4$ matrices.
Appendix A  A theorem of analysis

If $I$ is an interval of the reals (e.g. $[a, b]$, $(a, b)$, $[a, b)$ where $a < b$ and $a, b$ are either real or $\pm\infty$), a function $f : I \to \mathbb{R}$ or $f : I \to \mathbb{C}$ is said to be continuous at a point $c \in I$ if for all $\varepsilon > 0$ from $\mathbb{R}$ there is $\delta > 0$ in $\mathbb{R}$ with
$$|f(x) - f(c)| < \varepsilon \quad\text{for all } x \in I \text{ with } 0 < |x - c| < \delta,$$
where $|\ |$ is the usual absolute value in $\mathbb{R}$ or $\mathbb{C}$. The function $f : I \to \mathbb{R}$ or $\mathbb{C}$ is continuous if it is continuous at all points $c \in I$.

Continuous functions $f : I \to \mathbb{R}$ or $\mathbb{C}$ can always be integrated; that is, if $r \le s$ are real numbers in $I$ then the integral
$$\int_r^s f(x)\,dx$$
can be defined by some limiting process as 'the area under the line $y = f(x)$'. (Two methods of defining integrals are commonly used. The first is the Riemann integral, the second is the more general and more powerful Lebesgue integral; we don't go into the details, but either method is suitable for the discussion here.) If $r$ or $s$ is required to be infinite, we define the integral as a limit as follows,
$$\int_s^{\infty} f(x)\,dx = \lim_{r \to \infty} \int_s^r f(x)\,dx, \qquad \int_{-\infty}^{r} f(x)\,dx = \lim_{s \to \infty} \int_{-s}^{r} f(x)\,dx,$$
$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{\substack{r \to \infty \\ s \to \infty}} \int_{-s}^{r} f(x)\,dx,$$
when these limits exist.

Here is a proof that $\langle f|g\rangle = \int_a^b f(x)g(x)\,dx$ is an inner product on $\mathscr{C}[a, b]$. (See Example 3.4.)

Theorem A.1 Let $V$ be the vector space $\mathscr{C}[a, b]$ of all continuous functions from the closed interval $[a, b]$ to $\mathbb{R}$, under pointwise addition and scalar multiplication. For any $f, g \in V$, define $\langle f|g\rangle = \int_a^b f(x)g(x)\,dx$. Then $\langle\ |\ \rangle$ is an inner product on $V$.
Proof The symmetry and bilinearity follow immediately from
$$\int_a^b f(x)g(x)\,dx = \int_a^b g(x)f(x)\,dx$$
and
$$\int_a^b (\alpha f(x) + \beta g(x))h(x)\,dx = \alpha \int_a^b f(x)h(x)\,dx + \beta \int_a^b g(x)h(x)\,dx.$$
Moreover, $\langle f|f\rangle = \int_a^b (f(x))^2\,dx \ge 0$ for any $f \in V$. Finally, we have to show that if $\int_a^b (f(x))^2\,dx = 0$ then $f = 0$ (i.e. $f(x) = 0$ for all $x \in [a, b]$). This follows immediately from the following result.

Lemma A.2 If $h : [a, b] \to \mathbb{R}$ is a continuous function, with $h(x) \ge 0$ for all $x \in [a, b]$, and $h$ is not identically zero, then $\int_a^b h(x)\,dx > 0$.

Proof If $h(x) = 0$ for all $x \in (a, b)$, then by continuity $h(a) = h(b) = 0$ so $h$ is identically zero, which is a contradiction. Therefore $\exists c \in (a, b)$ such that $h(c) \neq 0$. Since $h$ is continuous, there is $\delta > 0$ such that $(c - \delta, c + \delta) \subseteq [a, b]$ and if $|x - c| < \delta$ then $|h(x) - h(c)| < \tfrac12 h(c)$ (in particular, $h(x) > \tfrac12 h(c)$). Thus there is a step function $k$ (see diagram)

[Diagram: the graph of $h$ on $[a, b]$, with the levels $h(c)$ and $\tfrac12 h(c)$ marked, and the step function $k$ of height $\tfrac12 h(c)$ on $(c - \delta, c + \delta)$.]

defined by $k(x) = \tfrac12 h(c)$ if $x \in (c - \delta, c + \delta)$ and $k(x) = 0$ otherwise, which is bounded by $h$, that is, $0 \le k(x) \le h(x)$ for all $x \in [a, b]$. So by the definition of the Riemann integral,
$$\int_a^b h(x)\,dx \ge \int_a^b k(x)\,dx = 2\delta \cdot \tfrac12 h(c) > 0$$
as required. □
The proof of this result can be modified to give a proof of the corresponding result for infinite integrals, provided the functions are sufficiently well-behaved for the integrals to exist.
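To make the step-function argument of Lemma A.2 concrete, here is a small numerical illustration in Python. Everything specific in it is an assumption chosen for the illustration: the function $h(x) = x^2$ on $[-1, 1]$, the point $c = 0.5$, the width $\delta = 0.1$, and the use of a midpoint Riemann sum as a stand-in for the integral. It checks that $0 \le k \le h$ at the sample points and that the integral of $h$ exceeds the lower bound $2\delta \cdot \tfrac12 h(c)$.

```python
def integrate(func, a, b, steps=20000):
    """Midpoint Riemann sum approximating the integral of func over [a, b]."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width

a, b = -1.0, 1.0
c, delta = 0.5, 0.1     # on (0.4, 0.6) we have |h(x) - h(c)| < h(c)/2

def h(x):
    """A continuous, non-negative function that is not identically zero."""
    return x * x

def k(x):
    """Step function from the proof: (1/2) h(c) near c, zero elsewhere."""
    return 0.5 * h(c) if c - delta < x < c + delta else 0.0

samples = [a + (b - a) * i / 1000 for i in range(1001)]
k_below_h = all(0.0 <= k(x) <= h(x) for x in samples)

integral_h = integrate(h, a, b)
integral_k = integrate(k, a, b)
lower_bound = 2 * delta * 0.5 * h(c)

print("0 <= k <= h on the samples:", k_below_h)
print("integral of h ~", integral_h)
print("integral of k ~", integral_k, " lower bound:", lower_bound)
```

A numerical sum proves nothing, of course; the point is only to see the three quantities in the inequality $\int h \ge \int k = 2\delta \cdot \tfrac12 h(c) > 0$ side by side.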
Appendix B  Applications to quantum mechanics

Many of the ideas in this book are applied in quantum mechanics. In particular the notions of an inner product and of self-adjoint linear transformations of certain infinite dimensional vector spaces over the complex numbers are essential for the theory of quantum mechanics. We include a very brief discussion of some of the ideas here, since they give useful motivation for many of the topics discussed in the book, especially the importance of inner product spaces over $\mathbb{C}$.

In quantum mechanics, a particle moving along the $x$-axis at a particular moment in time $t$ is represented by a continuous complex-valued function $\psi : \mathbb{R} \to \mathbb{C}$ in the space $V$ of Example 3.24, where the argument $x$ of $\psi(x)$ denotes position along the axis. We further stipulate that for such a function representing the state of a particle we have
$$\|\psi\|^2 = \langle\psi|\psi\rangle = \int_{-\infty}^{\infty} \overline{\psi(x)}\,\psi(x)\,dx = 1. \qquad (1)$$
The real-valued function of $x$, $|\psi(x)|^2$, is interpreted as a probability density, that is, for all $a < b$ in $\mathbb{R}$ the integral
$$\int_a^b |\psi(x)|^2\,dx$$
gives the probability that the particle will be found in the interval $[a, b]$. (The condition $\|\psi\|^2 = 1$ of (1) can be seen as just saying that the probability that the particle is somewhere on the $x$-axis is exactly 1.)

Many other expressions using the inner product have physical interpretations too. For example,
$$\int_{-\infty}^{\infty} \overline{\psi(x)}\,x\,\psi(x)\,dx = \langle\psi|\phi\rangle,$$
where $\phi$ is the function $x \mapsto x\psi(x)$, is the expected or average position of the particle, and is defined whenever the function $\phi : x \mapsto x\psi(x)$ is in $V$. Also, assuming $\psi(x)$ is differentiable with respect to $x$ with continuous derivatives in $V$,
$$\int_{-\infty}^{\infty} \overline{\psi(x)}\,\frac{-ih}{2\pi}\,\frac{d\psi}{dx}\,dx = \Bigl\langle \psi \Bigm| \frac{-ih}{2\pi}\,\psi' \Bigr\rangle$$
is the expected or average momentum of the particle at time $t$, where the physical constant $h \approx 6.63 \times 10^{-34}$ J s is Planck's constant and the prime in $\psi'$ denotes differentiation.

The general form for these 'expected values' is that for a measurable quantity $q$, there is a self-adjoint linear transformation $\mathcal{O}_q : V \to V$ (usually called an 'operator') such that $\langle\psi|\mathcal{O}_q(\psi)\rangle$ gives the expected value for measurements of $q$. Thus the position operator $\mathcal{O}_x$ is pointwise multiplication by the position $x$, taking $\psi$ to the function $x \mapsto x\psi(x)$, and the momentum operator $\mathcal{O}_p$ is $(-ih/2\pi)(d/dx)$, taking $\psi$ to $(-ih/2\pi)\psi'$.
Exercise B.1 Use integration by parts to show that $\langle\psi|i\phi'\rangle = \langle i\psi'|\phi\rangle$ for continuously differentiable $\psi, \phi : \mathbb{R} \to \mathbb{C}$ in the vector space $V$ of Example 3.24. Use $\langle\psi|\phi\rangle = \overline{\langle\phi|\psi\rangle}$ to deduce that $\langle\psi|i\psi'\rangle$ is real. (Note that the imaginary number $i$ is required to make the signs turn out right.)
This last exercise shows that the operator $i(d/dx)$ (which is clearly a linear transformation $V \to V$) is indeed self-adjoint. In quantum mechanics, all 'measurable' quantities are represented by such operators.

We now consider a particle moving along the $x$-axis and described at time $t$ by a function $\psi \in V$ which is sufficiently smooth (differentiable, with continuous derivatives, etc.) for the operators discussed here to make sense. If $q$ is a measurable quantity with corresponding operator $\mathcal{O}_q$, then $e_q = \langle\psi|\mathcal{O}_q(\psi)\rangle$ is the mean or expected value for $q$, and the 'uncertainty' of measurement of $q$ is given by $\Delta_q^2$, the mean square deviation from the mean, i.e. the expected value of $(q - e_q)^2$. In terms of operators, this is given by
$$\Delta_q^2 = \langle\psi|(\mathcal{O}_q - e_q)^2(\psi)\rangle = \langle\psi|\mathcal{O}_q^2(\psi)\rangle - 2e_q\langle\psi|\mathcal{O}_q(\psi)\rangle + \langle\psi|e_q^2\psi\rangle = \langle\psi|\mathcal{O}_q^2(\psi)\rangle - e_q^2,$$
recalling that $e_q = \langle\psi|\mathcal{O}_q(\psi)\rangle$ and $\|\psi\|^2 = 1$. In the special cases of position $x$ and momentum $p$, the operators for $\Delta_x^2$ and $\Delta_p^2$ are
$$\alpha^2 = (\mathcal{O}_x - e_x)^2$$
and
$$\beta^2 = (\mathcal{O}_p - e_p)^2,$$
so
$$\Delta_x^2\Delta_p^2 = \langle\psi|\alpha^2(\psi)\rangle\langle\psi|\beta^2(\psi)\rangle = \langle\alpha(\psi)|\alpha(\psi)\rangle\langle\beta(\psi)|\beta(\psi)\rangle \ge |\langle\alpha(\psi)|\beta(\psi)\rangle|^2 = |\langle\psi|\alpha\beta(\psi)\rangle|^2,$$
using the Cauchy-Schwarz inequality and the facts that $\alpha$ and $\beta$ are self-adjoint. (This is obvious for $\alpha$; for $\beta$ it follows from Exercise B.1.) The expression $|\langle\psi|\alpha\beta(\psi)\rangle|^2$ can be estimated, and it turns out that
$$|\langle\psi|\alpha\beta(\psi)\rangle|^2 \ge \tfrac14 |\langle\psi|(\alpha\beta - \beta\alpha)(\psi)\rangle|^2.$$
(See Exercise 13.9 in Chapter 13.) Also,
$$(\alpha\beta - \beta\alpha)(\psi) = \frac{-ih}{2\pi}\Bigl[x\frac{d\psi}{dx} - \frac{d(x\psi)}{dx}\Bigr] = \frac{ih}{2\pi}\,\psi,$$
so (using $\|\psi\|^2 = 1$ again) we get
$$\Delta_x^2\Delta_p^2 \ge |\langle\psi|\alpha\beta(\psi)\rangle|^2 \ge \tfrac14 |\langle\psi|(\alpha\beta - \beta\alpha)(\psi)\rangle|^2 \ge \tfrac14 |ih/2\pi|^2\,\|\psi\|^4 \ge h^2/16\pi^2$$
or
$$\Delta_x\Delta_p \ge h/4\pi,$$
which is the famous Heisenberg uncertainty principle that describes the theoretical limit of any attempt at the simultaneous measurement of both the momentum and position of a particle.
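The bound $\Delta_x\Delta_p \ge h/4\pi$ can be illustrated numerically. The Python sketch below is only that, an illustration under several assumptions of ours: units are chosen so that $h/2\pi = 1$ (the constant `HBAR`), the state is the real Gaussian $\psi(x) = \pi^{-1/4}e^{-x^2/2}$, integrals over $\mathbb{R}$ are replaced by midpoint sums over $[-10, 10]$, and $\psi'$ is approximated by a central difference. Because $\psi$ is real the mean momentum is zero, and self-adjointness gives $\langle\psi|\mathcal{O}_p^2(\psi)\rangle = \langle\mathcal{O}_p(\psi)|\mathcal{O}_p(\psi)\rangle = (h/2\pi)^2\int|\psi'|^2$.

```python
import math

HBAR = 1.0   # assumption: units in which h / (2*pi) = 1

def psi(x):
    """Normalised real Gaussian wave function (our choice of state)."""
    return math.pi ** -0.25 * math.exp(-x * x / 2.0)

def d_psi(x, eps=1e-4):
    """Central-difference approximation to psi'(x)."""
    return (psi(x + eps) - psi(x - eps)) / (2.0 * eps)

def integrate(func, lo=-10.0, hi=10.0, steps=4000):
    """Midpoint sum standing in for the integral over the whole real line."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width

norm_sq = integrate(lambda x: psi(x) ** 2)                   # condition (1)
mean_x = integrate(lambda x: x * psi(x) ** 2)                # expected position
var_x = integrate(lambda x: (x - mean_x) ** 2 * psi(x) ** 2)
var_p = HBAR ** 2 * integrate(lambda x: d_psi(x) ** 2)       # mean momentum is 0

bound = (HBAR / 2.0) ** 2    # (h / 4 pi)^2 in these units
print("norm squared:", norm_sq)
print("var_x * var_p:", var_x * var_p, " bound:", bound)
```

For this particular state the product of the two variances comes out at the bound itself (up to numerical error), so the Gaussian is a case where the inequality is attained.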