Graduate Texts in Mathematics 135

Editorial Board
S. Axler  F.W. Gehring  K.A. Ribet
Steven Roman
Advanced Linear Algebra Second Edition
Springer
Steven Roman University of California, Irvine Irvine, California 92697-3875 USA
[email protected]
Editorial Board: S. Axler Mathematics Department San Francisco State University San Francisco, CA 94132 USA
[email protected]
F.W. Gehring Mathematics Department East Hall University of Michigan Ann Arbor, MI 48109 USA
[email protected]
K.A. Ribet Mathematics Department University of California, Berkeley Berkeley, CA 94720-3840 USA
[email protected]
Mathematics Subject Classification (2000): 15-xx Library of Congress Cataloging-in-Publication Data Roman, Steven. Advanced linear algebra / Steven Roman.--2nd ed. p. cm. Includes bibliographical references and index. ISBN 0-387-24766-1 (acid-free paper) 1. Algebras, Linear. I. Title. QA184.2.R66 2005 512'.5-- dc22 2005040244 ISBN 0-387-24766-1
Printed on acid-free paper.
© 2005 Steven Roman. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed in the United States of America. springer.com
To Donna and to my poker buddies Rachelle, Carol and Dan
Preface to the Second Edition
Let me begin by thanking the readers of the first edition for their many helpful comments and suggestions. The second edition represents a major change from the first edition. Indeed, one might say that it is a totally new book, with the exception of the general range of topics covered. The text has been completely rewritten. I hope that an additional 12 years and roughly 20 books' worth of experience have enabled me to improve the quality of my exposition. Also, the exercise sets have been completely rewritten. The second edition contains two new chapters: a chapter on convexity, separation and positive solutions to linear systems (Chapter 15) and a chapter on the QR decomposition, singular values and pseudoinverses (Chapter 17). The treatments of tensor products and the umbral calculus have been greatly expanded and I have included discussions of determinants (in the chapter on tensor products), the complexification of a real vector space, Schur's lemma and Geršgorin disks.
Steven Roman
Irvine, California February 2005
Preface to the First Edition
This book is a thorough introduction to linear algebra, for the graduate or advanced undergraduate student. Prerequisites are limited to a knowledge of the basic properties of matrices and determinants. However, since we cover the basics of vector spaces and linear transformations rather rapidly, a prior course in linear algebra (even at the sophomore level), along with a certain measure of “mathematical maturity,” is highly desirable. Chapter 0 contains a summary of certain topics in modern algebra that are required for the sequel. This chapter should be skimmed quickly and then used primarily as a reference. Chapters 1–3 contain a discussion of the basic properties of vector spaces and linear transformations. Chapter 4 is devoted to a discussion of modules, emphasizing a comparison between the properties of modules and those of vector spaces. Chapter 5 provides more on modules. The main goals of this chapter are to prove that any two bases of a free module have the same cardinality and to introduce noetherian modules. However, the instructor may simply skim over this chapter, omitting all proofs. Chapter 6 is devoted to the theory of modules over a principal ideal domain, establishing the cyclic decomposition theorem for finitely generated modules. This theorem is the key to the structure theorems for finite-dimensional linear operators, discussed in Chapters 7 and 8. Chapter 9 is devoted to real and complex inner product spaces. The emphasis here is on the finite-dimensional case, in order to arrive as quickly as possible at the finite-dimensional spectral theorem for normal operators, in Chapter 10. However, we have endeavored to state as many results as is convenient for vector spaces of arbitrary dimension. The second part of the book consists of a collection of independent topics, with the one exception that Chapter 13 requires Chapter 12. Chapter 11 is on metric vector spaces, where we describe the structure of symplectic and orthogonal geometries over various base fields. Chapter 12 contains enough material on metric spaces to allow a unified treatment of topological issues for the basic
Hilbert space theory of Chapter 13. The rather lengthy proof that every metric space can be embedded in its completion may be omitted. Chapter 14 contains a brief introduction to tensor products. In order to motivate the universal property of tensor products, without getting too involved in categorical terminology, we first treat both free vector spaces and the familiar direct sum, in a universal way. Chapter 15 [Chapter 16 in the second edition] is on affine geometry, emphasizing algebraic, rather than geometric, concepts. The final chapter provides an introduction to a relatively new subject, called the umbral calculus. This is an algebraic theory used to study certain types of polynomial functions that play an important role in applied mathematics. We give only a brief introduction to the subject, emphasizing the algebraic aspects, rather than the applications. This is the first time that this subject has appeared in a true textbook. One final comment. Unless otherwise mentioned, omission of a proof in the text is a tacit suggestion that the reader attempt to supply one.
Steven Roman
Irvine, California
Contents
Preface to the Second Edition, vii
Preface to the First Edition, ix
Preliminaries, 1
Part 1 Preliminaries, 1
Part 2 Algebraic Structures, 16
Part I—Basic Linear Algebra, 31

1 Vector Spaces, 33
Vector Spaces, 33
Subspaces, 35
Direct Sums, 38
Spanning Sets and Linear Independence, 41
The Dimension of a Vector Space, 44
Ordered Bases and Coordinate Matrices, 47
The Row and Column Spaces of a Matrix, 48
The Complexification of a Real Vector Space, 49
Exercises, 51
2 Linear Transformations, 55
Linear Transformations, 55
Isomorphisms, 57
The Kernel and Image of a Linear Transformation, 57
Linear Transformations from $F^n$ to $F^m$, 59
The Rank Plus Nullity Theorem, 59
Change of Basis Matrices, 60
The Matrix of a Linear Transformation, 61
Change of Bases for Linear Transformations, 63
Equivalence of Matrices, 64
Similarity of Matrices, 65
Similarity of Operators, 66
Invariant Subspaces and Reducing Pairs, 68
Topological Vector Spaces, 68
Linear Operators on $V^{\mathbb{C}}$, 71
Exercises, 72

3 The Isomorphism Theorems, 75
Quotient Spaces, 75
The Universal Property of Quotients and the First Isomorphism Theorem, 77
Quotient Spaces, Complements and Codimension, 79
Additional Isomorphism Theorems, 80
Linear Functionals, 82
Dual Bases, 83
Reflexivity, 84
Annihilators, 86
Operator Adjoints, 88
Exercises, 90
4 Modules I: Basic Properties, 93
Modules, 93
Motivation, 93
Submodules, 95
Spanning Sets, 96
Linear Independence, 98
Torsion Elements, 99
Annihilators, 99
Free Modules, 99
Homomorphisms, 100
Quotient Modules, 101
The Correspondence and Isomorphism Theorems, 102
Direct Sums and Direct Summands, 102
Modules Are Not As Nice As Vector Spaces, 106
Exercises, 106
5 Modules II: Free and Noetherian Modules, 109
The Rank of a Free Module, 109
Free Modules and Epimorphisms, 114
Noetherian Modules, 115
The Hilbert Basis Theorem, 118
Exercises, 119
6 Modules over a Principal Ideal Domain, 121
Annihilators and Orders, 121
Cyclic Modules, 122
Free Modules over a Principal Ideal Domain, 123
Torsion-Free and Free Modules, 125
Prelude to Decomposition: Cyclic Modules, 126
The First Decomposition, 127
A Look Ahead, 127
The Primary Decomposition, 128
The Cyclic Decomposition of a Primary Module, 130
The Primary Cyclic Decomposition Theorem, 134
The Invariant Factor Decomposition, 135
Exercises, 138

7 The Structure of a Linear Operator, 141
A Brief Review, 141
The Module Associated with a Linear Operator, 142
Orders and the Minimal Polynomial, 144
Cyclic Submodules and Cyclic Subspaces, 145
Summary, 147
The Decomposition of $V_\tau$, 147
The Rational Canonical Form, 148
Exercises, 151
8 Eigenvalues and Eigenvectors, 153
The Characteristic Polynomial of an Operator, 153
Eigenvalues and Eigenvectors, 155
Geometric and Algebraic Multiplicities, 157
The Jordan Canonical Form, 158
Triangularizability and Schur's Lemma, 160
Diagonalizable Operators, 165
Projections, 166
The Algebra of Projections, 167
Resolutions of the Identity, 170
Spectral Resolutions, 172
Projections and Invariance, 173
Exercises, 174
9 Real and Complex Inner Product Spaces, 181
Norm and Distance, 183
Isometries, 186
Orthogonality, 187
Orthogonal and Orthonormal Sets, 188
The Projection Theorem and Best Approximations, 192
Orthogonal Direct Sums, 194
The Riesz Representation Theorem, 195
Exercises, 196
10 Structure Theory for Normal Operators, 201
The Adjoint of a Linear Operator, 201
Unitary Diagonalizability, 204
Normal Operators, 205
Special Types of Normal Operators, 207
Self-Adjoint Operators, 208
Unitary Operators and Isometries, 210
The Structure of Normal Operators, 215
Matrix Versions, 222
Orthogonal Projections, 223
Orthogonal Resolutions of the Identity, 226
The Spectral Theorem, 227
Spectral Resolutions and Functional Calculus, 228
Positive Operators, 230
The Polar Decomposition of an Operator, 232
Exercises, 234
Part II—Topics, 235

11 Metric Vector Spaces: The Theory of Bilinear Forms, 239
Symmetric, Skew-Symmetric and Alternate Forms, 239
The Matrix of a Bilinear Form, 242
Quadratic Forms, 244
Orthogonality, 245
Linear Functionals, 248
Orthogonal Complements and Orthogonal Direct Sums, 249
Isometries, 252
Hyperbolic Spaces, 253
Nonsingular Completions of a Subspace, 254
The Witt Theorems: A Preview, 256
The Classification Problem for Metric Vector Spaces, 257
Symplectic Geometry, 258
The Structure of Orthogonal Geometries: Orthogonal Bases, 264
The Classification of Orthogonal Geometries: Canonical Forms, 266
The Orthogonal Group, 272
The Witt Theorems for Orthogonal Geometries, 275
Maximal Hyperbolic Subspaces of an Orthogonal Geometry, 277
Exercises, 279
12 Metric Spaces, 283
The Definition, 283
Open and Closed Sets, 286
Convergence in a Metric Space, 287
The Closure of a Set, 288
Dense Subsets, 290
Continuity, 292
Completeness, 293
Isometries, 297
The Completion of a Metric Space, 298
Exercises, 303

13 Hilbert Spaces, 307
A Brief Review, 307
Hilbert Spaces, 308
Infinite Series, 312
An Approximation Problem, 313
Hilbert Bases, 317
Fourier Expansions, 318
A Characterization of Hilbert Bases, 328
Hilbert Dimension, 328
A Characterization of Hilbert Spaces, 329
The Riesz Representation Theorem, 331
Exercises, 334
14 Tensor Products, 337
Universality, 337
Bilinear Maps, 341
Tensor Products, 343
When Is a Tensor Product Zero? 348
Coordinate Matrices and Rank, 350
Characterizing Vectors in a Tensor Product, 354
Defining Linear Transformations on a Tensor Product, 355
The Tensor Product of Linear Transformations, 357
Change of Base Field, 359
Multilinear Maps and Iterated Tensor Products, 363
Tensor Spaces, 366
Special Multilinear Maps, 371
Graded Algebras, 372
The Symmetric Tensor Algebra, 374
The Antisymmetric Tensor Algebra: The Exterior Product Space, 380
The Determinant, 387
Exercises, 391
15 Positive Solutions to Linear Systems: Convexity and Separation, 395
Convex, Closed and Compact Sets, 398
Convex Hulls, 399
Linear and Affine Hyperplanes, 400
Separation, 402
Exercises, 407

16 Affine Geometry, 409
Affine Geometry, 409
Affine Combinations, 411
Affine Hulls, 412
The Lattice of Flats, 413
Affine Independence, 416
Affine Transformations, 417
Projective Geometry, 419
Exercises, 423
17 Operator Factorizations: QR and Singular Value, 425
The QR Decomposition, 425
Singular Values, 428
The Moore–Penrose Generalized Inverse, 430
Least Squares Approximation, 433
Exercises, 434
18 The Umbral Calculus, 437
Formal Power Series, 437
The Umbral Algebra, 439
Formal Power Series as Linear Operators, 443
Sheffer Sequences, 446
Examples of Sheffer Sequences, 454
Umbral Operators and Umbral Shifts, 456
Continuous Operators on the Umbral Algebra, 458
Operator Adjoints, 459
Umbral Operators and Automorphisms of the Umbral Algebra, 460
Umbral Shifts and Derivations of the Umbral Algebra, 465
The Transfer Formulas, 470
A Final Remark, 471
Exercises, 472
References, 473
Index, 475
Chapter 0
Preliminaries
In this chapter, we briefly discuss some topics that are needed for the sequel. This chapter should be skimmed quickly and used primarily as a reference.
Part 1 Preliminaries

Multisets
The following simple concept is much more useful than its infrequent appearance would indicate.

Definition Let $S$ be a nonempty set. A multiset $M$ with underlying set $S$ is a set of ordered pairs
$$M = \{(s_i, n_i) \mid s_i \in S,\ n_i \in \mathbb{Z}^+,\ s_i \neq s_j \text{ for } i \neq j\}$$
where $\mathbb{Z}^+ = \{1, 2, \ldots\}$. The number $n_i$ is referred to as the multiplicity of the element $s_i$ in $M$. If the underlying set of a multiset is finite, we say that the multiset is finite. The size of a finite multiset $M$ is the sum of the multiplicities of all of its elements.

For example, $M = \{(a, 2), (b, 3), (c, 1)\}$ is a multiset with underlying set $S = \{a, b, c\}$. The element $a$ has multiplicity $2$. One often writes out the elements of a multiset according to multiplicities, as in $M = \{a, a, b, b, b, c\}$. Of course, two multisets are equal if their underlying sets are equal and if the multiplicity of each element in the common underlying set is the same in both multisets.
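For readers who like to experiment, finite multisets are easy to model on a computer. A minimal sketch in Python, where the standard library's Counter plays the role of a multiset by mapping each element of the underlying set to its multiplicity (the particular multiset is the example above):

```python
from collections import Counter

# The multiset M = {(a, 2), (b, 3), (c, 1)} with underlying set {a, b, c}.
M = Counter({"a": 2, "b": 3, "c": 1})

print(sorted(M.elements()))     # ['a', 'a', 'b', 'b', 'b', 'c']
print(sum(M.values()))          # size of the multiset: 6
print(M == Counter("aabbbc"))   # True: equality compares multiplicities
```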
Matrices
The set of $m \times n$ matrices with entries in a field $F$ is denoted by $\mathcal{M}_{m,n}(F)$, or by $\mathcal{M}_{m,n}$ when the field does not require mention. The set $\mathcal{M}_{n,n}(F)$ is denoted by $\mathcal{M}_n(F)$ or $\mathcal{M}_n$. If $A \in \mathcal{M}_{m,n}$, the $(i,j)$-th entry of $A$ will be denoted by $A_{i,j}$. The identity matrix of size $n \times n$ is denoted by $I_n$. The elements of the base field $F$ are called scalars. We expect that the reader is familiar with the basic properties of matrices, including matrix addition and multiplication.

The main diagonal of an $m \times n$ matrix $A$ is the sequence of entries $A_{1,1}, A_{2,2}, \ldots, A_{k,k}$, where $k = \min\{m, n\}$.

Definition The transpose of $A \in \mathcal{M}_{m,n}$ is the matrix $A^t$ defined by
$$(A^t)_{i,j} = A_{j,i}$$
A matrix $A$ is symmetric if $A = A^t$ and skew-symmetric if $A^t = -A$.

Theorem 0.1 (Properties of the transpose) Let $A, B \in \mathcal{M}_{m,n}$. Then
1) $(A^t)^t = A$
2) $(A + B)^t = A^t + B^t$
3) $(rA)^t = rA^t$ for all $r \in F$
4) $(AB)^t = B^t A^t$, provided that the product $AB$ is defined
5) $\det(A^t) = \det(A)$.

Partitioning and Matrix Multiplication
Let $M$ be a matrix of size $m \times n$. If $B \subseteq \{1, \ldots, m\}$ and $C \subseteq \{1, \ldots, n\}$, then the submatrix $M[B, C]$ is the matrix obtained from $M$ by keeping only the rows with index in $B$ and the columns with index in $C$. Thus, all other rows and columns are discarded and $M[B, C]$ has size $|B| \times |C|$.

Suppose that $M \in \mathcal{M}_{m,n}$ and $N \in \mathcal{M}_{n,k}$. Let
1) $\mathcal{B} = \{B_1, \ldots, B_p\}$ be a partition of $\{1, \ldots, m\}$
2) $\mathcal{C} = \{C_1, \ldots, C_q\}$ be a partition of $\{1, \ldots, n\}$
3) $\mathcal{D} = \{D_1, \ldots, D_r\}$ be a partition of $\{1, \ldots, k\}$
(Partitions are defined formally later in this chapter.) Then it is a very useful fact that matrix multiplication can be performed at the block level as well as at the entry level. In particular, we have
$$[MN][B_i, D_j] = \sum_{C \in \mathcal{C}} M[B_i, C]\, N[C, D_j]$$
When the partitions in question contain only single-element blocks, this is precisely the usual formula for matrix multiplication
$$[MN]_{i,j} = \sum_{\ell=1}^{n} M_{i,\ell} N_{\ell,j}$$
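The block-multiplication formula is easy to verify numerically. A Python sketch using NumPy (the matrix sizes and the particular partitions are arbitrary illustrative choices, written with 0-based indices):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.integers(-5, 5, size=(4, 6))
N = rng.integers(-5, 5, size=(6, 3))

B = [[0, 1], [2, 3]]          # partition of the row indices of M
C = [[0, 1, 2], [3, 4, 5]]    # partition of the shared index set
D = [[0], [1, 2]]             # partition of the column indices of N

# [MN][B_i, D_j] = sum over blocks C_k of M[B_i, C_k] N[C_k, D_j]
for Bi in B:
    for Dj in D:
        block = sum(M[np.ix_(Bi, Ck)] @ N[np.ix_(Ck, Dj)] for Ck in C)
        assert np.array_equal(block, (M @ N)[np.ix_(Bi, Dj)])
```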
Block Matrices
It will be convenient to introduce the notational device of a block matrix. If $B_{i,j}$ are matrices of the appropriate sizes, then by the block matrix
$$M = \begin{pmatrix} B_{1,1} & B_{1,2} & \cdots & B_{1,n} \\ \vdots & \vdots & & \vdots \\ B_{m,1} & B_{m,2} & \cdots & B_{m,n} \end{pmatrix}_{\text{block}}$$
we mean the matrix whose upper left submatrix is $B_{1,1}$, and so on. Thus, the $B_{i,j}$'s are submatrices of $M$ and not entries. A square matrix of the form
$$M = \begin{pmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_k \end{pmatrix}_{\text{block}}$$
where each $B_i$ is square and $0$ is a zero submatrix, is said to be a block diagonal matrix.

Elementary Row Operations
Recall that there are three types of elementary row operations. Type 1 operations consist of multiplying a row of $A$ by a nonzero scalar. Type 2 operations consist of interchanging two rows of $A$. Type 3 operations consist of adding a scalar multiple of one row of $A$ to another row of $A$.

If we perform an elementary operation of type $k$ to an identity matrix $I_n$, the result is called an elementary matrix of type $k$. It is easy to see that all elementary matrices are invertible.

In order to perform an elementary row operation on $A \in \mathcal{M}_{m,n}$, we can perform that operation on the identity $I_m$, to obtain an elementary matrix $E$, and then take the product $EA$. Note that multiplying on the right by $E$ has the effect of performing column operations.

Definition A matrix $R$ is said to be in reduced row echelon form if
1) All rows consisting only of $0$'s appear at the bottom of the matrix.
2) In any nonzero row, the first nonzero entry is a $1$. This entry is called a leading entry.
3) For any two consecutive rows, the leading entry of the lower row is to the right of the leading entry of the upper row.
4) Any column that contains a leading entry has $0$'s in all other positions.
Here are the basic facts concerning reduced row echelon form.
Theorem 0.2 Matrices $A, B \in \mathcal{M}_{m,n}$ are row equivalent, denoted by $A \sim B$, if either one can be obtained from the other by a series of elementary row operations.
1) Row equivalence is an equivalence relation. That is,
a) $A \sim A$
b) $A \sim B \Rightarrow B \sim A$
c) $A \sim B$, $B \sim C \Rightarrow A \sim C$.
2) A matrix $A$ is row equivalent to one and only one matrix $R$ that is in reduced row echelon form. The matrix $R$ is called the reduced row echelon form of $A$. Furthermore,
$$A = E_1 \cdots E_k R$$
where $E_1, \ldots, E_k$ are the elementary matrices required to reduce $A$ to reduced row echelon form.
3) $A$ is invertible if and only if its reduced row echelon form is an identity matrix. Hence, a matrix is invertible if and only if it is the product of elementary matrices.
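As a computational aside, computer algebra systems produce the reduced row echelon form directly, and elementary matrices can be built by applying the corresponding operation to an identity matrix. A sketch in Python using SymPy (the matrix shown is an arbitrary illustration):

```python
from sympy import Matrix

A = Matrix([[1, 2, 1],
            [2, 4, 0],
            [3, 6, 1]])

R, pivots = A.rref()   # reduced row echelon form and pivot columns
print(R)               # Matrix([[1, 2, 0], [0, 0, 1], [0, 0, 0]])
print(pivots)          # (0, 2)

# A type-3 elementary matrix: add -2 times row 0 to row 1 of I_3.
E = Matrix([[1, 0, 0], [-2, 1, 0], [0, 0, 1]])
print(E * A)           # same result as performing the row operation on A
```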
The following definition is probably well known to the reader.

Definition A square matrix is upper triangular if all of its entries below the main diagonal are $0$. Similarly, a square matrix is lower triangular if all of its entries above the main diagonal are $0$. A square matrix is diagonal if all of its entries off the main diagonal are $0$.
Determinants
We assume that the reader is familiar with the following basic properties of determinants.

Theorem 0.3 Let $A \in \mathcal{M}_n(F)$. Then $\det(A)$ is an element of $F$. Furthermore,
1) For any $B \in \mathcal{M}_n(F)$,
$$\det(AB) = \det(A)\det(B)$$
2) $A$ is nonsingular (invertible) if and only if $\det(A) \neq 0$.
3) The determinant of an upper triangular or lower triangular matrix is the product of the entries on its main diagonal.
4) If a square matrix $M$ has the block diagonal form
$$M = \begin{pmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_k \end{pmatrix}_{\text{block}}$$
then $\det(M) = \prod_{i=1}^{k} \det(B_i)$.
Polynomials
The set of all polynomials in the variable $x$ with coefficients from a field $F$ is denoted by $F[x]$. If $p(x) \in F[x]$, we say that $p(x)$ is a polynomial over $F$. If
$$p(x) = a_0 + a_1 x + \cdots + a_n x^n$$
is a polynomial with $a_n \neq 0$, then $a_n$ is called the leading coefficient of $p(x)$ and the degree of $p(x)$ is $n$, written $\deg p(x) = n$. For convenience, the degree of the zero polynomial is $-\infty$. A polynomial is monic if its leading coefficient is $1$.

Theorem 0.4 (Division algorithm) Let $f(x), g(x) \in F[x]$ where $\deg g(x) \geq 0$. Then there exist unique polynomials $q(x), r(x) \in F[x]$ for which
$$f(x) = q(x)g(x) + r(x)$$
where $r(x) = 0$ or $\deg r(x) < \deg g(x)$.
If $p(x)$ divides $q(x)$, that is, if there exists a polynomial $f(x)$ for which $q(x) = f(x)p(x)$, then we write $p(x) \mid q(x)$.

Theorem 0.5 Let $f(x), g(x) \in F[x]$. The greatest common divisor of $f(x)$ and $g(x)$, denoted by $\gcd(f(x), g(x))$, is the unique monic polynomial $d(x)$ over $F$ for which
1) $d(x) \mid f(x)$ and $d(x) \mid g(x)$
2) if $h(x) \mid f(x)$ and $h(x) \mid g(x)$, then $h(x) \mid d(x)$.
Furthermore, there exist polynomials $a(x)$ and $b(x)$ over $F$ for which
$$\gcd(f(x), g(x)) = a(x)f(x) + b(x)g(x)$$

Definition The polynomials $f(x), g(x) \in F[x]$ are relatively prime if $\gcd(f(x), g(x)) = 1$. In particular, $f(x)$ and $g(x)$ are relatively prime if and only if there exist polynomials $a(x)$ and $b(x)$ over $F$ for which
$$a(x)f(x) + b(x)g(x) = 1$$

Definition A nonconstant polynomial $p(x) \in F[x]$ is irreducible if whenever $p(x) = f(x)g(x)$, then one of $f(x)$ and $g(x)$ must be constant.

The following two theorems support the view that irreducible polynomials behave like prime numbers.

Theorem 0.6 A nonconstant polynomial $p(x)$ is irreducible if and only if it has the property that whenever $p(x) \mid f(x)g(x)$, then either $p(x) \mid f(x)$ or $p(x) \mid g(x)$.
Theorem 0.7 Every nonconstant polynomial in $F[x]$ can be written as a product of irreducible polynomials. Moreover, this expression is unique up to the order of the factors and multiplication by a scalar.

Functions
To set our notation, we should make a few comments about functions.

Definition Let $f \colon S \to T$ be a function from a set $S$ to a set $T$.
1) The domain of $f$ is the set $S$.
2) The image or range of $f$ is the set $\operatorname{im}(f) = \{f(s) \mid s \in S\}$.
3) $f$ is injective (one-to-one), or an injection, if $x \neq y \Rightarrow f(x) \neq f(y)$.
4) $f$ is surjective (onto $T$), or a surjection, if $\operatorname{im}(f) = T$.
5) $f$ is bijective, or a bijection, if it is both injective and surjective.
6) Assuming that $0 \in T$, the support of $f$ is
$$\operatorname{supp}(f) = \{s \in S \mid f(s) \neq 0\}$$

If $f \colon S \to T$ is injective, then its inverse $f^{-1} \colon \operatorname{im}(f) \to S$ exists and is well-defined as a function on $\operatorname{im}(f)$.

It will be convenient to apply $f$ to subsets of $S$ and $T$. In particular, if $X \subseteq S$ and if $Y \subseteq T$, we set
$$f(X) = \{f(x) \mid x \in X\} \quad \text{and} \quad f^{-1}(Y) = \{s \in S \mid f(s) \in Y\}$$
Note that the latter is defined even if $f$ is not injective.

Let $f \colon S \to T$. If $A \subseteq S$, the restriction of $f$ to $A$ is the function $f|_A \colon A \to T$ defined by $f|_A(a) = f(a)$ for all $a \in A$. Clearly, the restriction of an injective map is injective.
Equivalence Relations
The concept of an equivalence relation plays a major role in the study of matrices and linear transformations.

Definition Let $S$ be a nonempty set. A binary relation $\sim$ on $S$ is called an equivalence relation on $S$ if it satisfies the following conditions:
1) (Reflexivity) $a \sim a$ for all $a \in S$.
2) (Symmetry) $a \sim b \Rightarrow b \sim a$ for all $a, b \in S$.
3) (Transitivity) $a \sim b$, $b \sim c \Rightarrow a \sim c$ for all $a, b, c \in S$.

Definition Let $\sim$ be an equivalence relation on $S$. For $a \in S$, the set of all elements equivalent to $a$ is denoted by
$$[a] = \{b \in S \mid b \sim a\}$$
and called the equivalence class of $a$.

Theorem 0.8 Let $\sim$ be an equivalence relation on $S$. Then
1) $b \in [a] \Leftrightarrow a \in [b] \Leftrightarrow [a] = [b]$
2) For any $a, b \in S$, we have either $[a] = [b]$ or $[a] \cap [b] = \emptyset$.

Definition A partition of a nonempty set $S$ is a collection $\{A_1, \ldots, A_n\}$ of nonempty subsets of $S$, called the blocks of the partition, for which
1) $A_i \cap A_j = \emptyset$ for all $i \neq j$
2) $S = A_1 \cup \cdots \cup A_n$.

The following theorem sheds considerable light on the concept of an equivalence relation.

Theorem 0.9
1) Let $\sim$ be an equivalence relation on $S$. Then the set of distinct equivalence classes with respect to $\sim$ are the blocks of a partition of $S$.
2) Conversely, if $\mathcal{P}$ is a partition of $S$, the binary relation $\sim$ defined by
$$a \sim b \quad \text{if } a \text{ and } b \text{ lie in the same block of } \mathcal{P}$$
is an equivalence relation on $S$, whose equivalence classes are the blocks of $\mathcal{P}$.
This establishes a one-to-one correspondence between equivalence relations on $S$ and partitions of $S$.
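Theorem 0.9 is easy to visualize computationally: grouping the elements of a finite set by any function produces the partition whose blocks are the equivalence classes of the relation "same value under the function." A small Python illustration (congruence mod 3 on a 12-element set is an arbitrary choice):

```python
# S = {0,...,11}, with a ~ b iff a and b leave the same remainder mod 3.
S = range(12)
blocks = {}
for a in S:
    blocks.setdefault(a % 3, []).append(a)

# The distinct classes are the blocks of a partition of S:
print(list(blocks.values()))
# [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
```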
The most important problem related to equivalence relations is that of finding an efficient way to determine when two elements are equivalent. Unfortunately, in most cases, the definition does not provide an efficient test for equivalence and so we are led to the following concepts.

Definition Let $\sim$ be an equivalence relation on $S$. A function $f \colon S \to T$, where $T$ is any set, is called an invariant of $\sim$ if it is constant on the equivalence classes of $\sim$, that is,
$$a \sim b \Rightarrow f(a) = f(b)$$
and a complete invariant if it is constant and distinct on the equivalence classes of $\sim$, that is,
$$a \sim b \Leftrightarrow f(a) = f(b)$$
A collection $\{f_1, \ldots, f_n\}$ of invariants is called a complete system of invariants if
$$a \sim b \Leftrightarrow f_i(a) = f_i(b) \text{ for all } i = 1, \ldots, n$$

Definition Let $\sim$ be an equivalence relation on $S$. A subset $C \subseteq S$ is said to be a set of canonical forms (or just a canonical form) for $\sim$ if for every $s \in S$, there is exactly one $c \in C$ such that $c \sim s$. Put another way, each equivalence class under $\sim$ contains exactly one member of $C$.

Example 0.1 Define a binary relation $\sim$ on $F[x]$ by letting $p(x) \sim q(x)$ if and only if $p(x) = cq(x)$ for some nonzero constant $c \in F$. This is easily seen to be an equivalence relation. The function that assigns to each polynomial its degree is an invariant, since
$$p(x) \sim q(x) \Rightarrow \deg(p(x)) = \deg(q(x))$$
However, it is not a complete invariant, since there are inequivalent polynomials with the same degree. The set of all monic polynomials is a set of canonical forms for this equivalence relation.
Example 0.2 We have remarked that row equivalence is an equivalence relation on $\mathcal{M}_{m,n}(F)$. Moreover, the subset of reduced row echelon form matrices is a set of canonical forms for row equivalence, since every matrix is row equivalent to a unique matrix in reduced row echelon form.
Example 0.3 Two matrices $A, B \in \mathcal{M}_{m,n}(F)$ are row equivalent if and only if there is an invertible matrix $P$ such that $A = PB$. Similarly, $A$ and $B$ are column equivalent, that is, $A$ can be reduced to $B$ using elementary column operations, if and only if there exists an invertible matrix $Q$ such that $A = BQ$.

Two matrices $A$ and $B$ are said to be equivalent if there exist invertible matrices $P$ and $Q$ for which
$$A = PBQ$$
Put another way, $A$ and $B$ are equivalent if $A$ can be reduced to $B$ by performing a series of elementary row and/or column operations. (The use of the term equivalent is unfortunate, since it applies to all equivalence relations, not just this one. However, the terminology is standard, so we use it here.)

It is not hard to see that an $m \times n$ matrix $R$ that is in both reduced row echelon form and reduced column echelon form must have the block form
$$J_k = \begin{pmatrix} I_k & 0_{k, n-k} \\ 0_{m-k, k} & 0_{m-k, n-k} \end{pmatrix}_{\text{block}}$$
We leave it to the reader to show that every matrix $A$ in $\mathcal{M}_{m,n}$ is equivalent to exactly one matrix of the form $J_k$ and so the set of these matrices is a set of canonical forms for equivalence. Moreover, the function $f$ defined by $f(A) = k$, where $A \sim J_k$, is a complete invariant for equivalence.

Since the rank of $J_k$ is $k$ and since neither row nor column operations affect the rank, we deduce that the rank of $A$ is $k$. Hence, rank is a complete invariant for equivalence. In other words, two matrices are equivalent if and only if they have the same rank.
Example 0.4 Two matrices $A, B \in \mathcal{M}_n(F)$ are said to be similar if there exists an invertible matrix $P$ such that
$$A = PBP^{-1}$$
Similarity is easily seen to be an equivalence relation on $\mathcal{M}_n$. As we will learn, two matrices are similar if and only if they represent the same linear operator on a given $n$-dimensional vector space $V$. Hence, similarity is extremely important for studying the structure of linear operators. One of the main goals of this book is to develop canonical forms for similarity. We leave it to the reader to show that the determinant function and the trace function are invariants for similarity. However, these two invariants do not, in general, form a complete system of invariants.
Example 0.5 Two matrices $A, B \in \mathcal{M}_n(F)$ are said to be congruent if there exists an invertible matrix $P$ for which
$$A = PBP^t$$
where $P^t$ is the transpose of $P$. This relation is easily seen to be an equivalence relation and we will devote some effort to finding canonical forms for congruence. For some base fields $F$ (such as $\mathbb{R}$, $\mathbb{C}$ or a finite field), this is relatively easy to do, but for other base fields (such as $\mathbb{Q}$), it is extremely difficult.
Zorn's Lemma
In order to show that any vector space has a basis, we require a result known as Zorn's lemma. To state this lemma, we need some preliminary definitions.

Definition A partially ordered set is a pair $(P, \leq)$ where $P$ is a nonempty set and $\leq$ is a binary relation called a partial order, read "less than or equal to," with the following properties:
1) (Reflexivity) For all $a \in P$, $a \leq a$.
2) (Antisymmetry) For all $a, b \in P$, $a \leq b$ and $b \leq a$ implies $a = b$.
3) (Transitivity) For all $a, b, c \in P$, $a \leq b$ and $b \leq c$ implies $a \leq c$.
Partially ordered sets are also called posets.

It is customary to use a phrase such as "Let $P$ be a partially ordered set" when the partial order is understood. Here are some key terms related to partially ordered sets.

Definition Let $P$ be a partially ordered set.
1) A maximal element is an element $m \in P$ with the property that there is no larger element in $P$, that is,
$$p \in P,\ m \leq p \Rightarrow m = p$$
2) A minimal element is an element $m \in P$ with the property that there is no smaller element in $P$, that is,
$$p \in P,\ p \leq m \Rightarrow m = p$$
3) Let $a, b \in P$. Then $u \in P$ is an upper bound for $a$ and $b$ if
$$a \leq u \quad \text{and} \quad b \leq u$$
The unique smallest upper bound for $a$ and $b$, if it exists, is called the least upper bound of $a$ and $b$ and is denoted by $\operatorname{lub}\{a, b\}$.
4) Let $a, b \in P$. Then $\ell \in P$ is a lower bound for $a$ and $b$ if
$$\ell \leq a \quad \text{and} \quad \ell \leq b$$
The unique largest lower bound for $a$ and $b$, if it exists, is called the greatest lower bound of $a$ and $b$ and is denoted by $\operatorname{glb}\{a, b\}$.
Let $S$ be a subset of a partially ordered set $P$. We say that an element $u \in P$ is an upper bound for $S$ if $s \leq u$ for all $s \in S$. Lower bounds are defined similarly.

Note that in a partially ordered set, it is possible that not all elements are comparable. In other words, it is possible to have $x, y \in P$ with the property that $x \not\leq y$ and $y \not\leq x$.

Definition A partially ordered set in which every pair of elements is comparable is called a totally ordered set, or a linearly ordered set. Any totally ordered subset of a partially ordered set $P$ is called a chain in $P$.

Example 0.6
1) The set $\mathbb{R}$ of real numbers, with the usual binary relation $\leq$, is a partially ordered set. It is also a totally ordered set. It has no maximal elements.
2) The set $\mathbb{N} = \{1, 2, \ldots\}$ of natural numbers, together with the binary relation of divides, is a partially ordered set. It is customary to write $n \mid m$ to indicate that $n$ divides $m$. The subset $S$ of $\mathbb{N}$ consisting of all powers of $2$ is a totally ordered subset of $\mathbb{N}$, that is, it is a chain in $\mathbb{N}$. The set $P = \{2, 4, 8, 3, 9, 27\}$ is a partially ordered set under $\mid$. It has two maximal elements, namely $8$ and $27$. The subset $Q = \{2, 3, 5, 7, 11\}$ is a partially ordered set in which every element is both maximal and minimal!
3) Let $S$ be any set and let $\mathcal{P}(S)$ be the power set of $S$, that is, the set of all subsets of $S$. Then $\mathcal{P}(S)$, together with the subset relation $\subseteq$, is a partially ordered set.

Now we can state Zorn's lemma, which gives a condition under which a partially ordered set has a maximal element.

Theorem 0.10 (Zorn's lemma) If $P$ is a partially ordered set in which every chain has an upper bound, then $P$ has a maximal element.

We will not prove Zorn's lemma. Indeed, Zorn's lemma is a result that is so fundamental that it cannot be proved or disproved in the context of ordinary set theory. (It is equivalent to the famous Axiom of Choice.) Therefore, Zorn's lemma (along with the Axiom of Choice) must either be accepted or rejected as an axiom of set theory. Since almost all mathematicians accept it, we will do so as well. Indeed, we will use Zorn's lemma to prove that every vector space has a basis.
Cardinality
Two sets $S$ and $T$ have the same cardinality, written
$$|S| = |T|$$
if there is a bijective function (a one-to-one correspondence) between the sets.

The reader is probably aware of the fact that
$$|\mathbb{Z}| = |\mathbb{N}| \quad \text{and} \quad |\mathbb{Q}| = |\mathbb{N}|$$
where $\mathbb{N}$ denotes the natural numbers, $\mathbb{Z}$ the integers and $\mathbb{Q}$ the rational numbers.

If $S$ is in one-to-one correspondence with a subset of $T$, we write $|S| \leq |T|$. If $S$ is in one-to-one correspondence with a proper subset of $T$ but not all of $T$, then we write $|S| < |T|$. The second condition is necessary, since, for instance, $\mathbb{N}$ is in one-to-one correspondence with a proper subset of $\mathbb{Z}$ and yet $\mathbb{N}$ is also in one-to-one correspondence with $\mathbb{Z}$ itself. Hence, $|\mathbb{N}| = |\mathbb{Z}|$.

This is not the place to enter into a detailed discussion of cardinal numbers. The intention here is that the cardinality of a set, whatever that is, represents the "size" of the set. It is actually easier to talk about two sets having the same, or different, size (cardinality) than it is to explicitly define the size (cardinality) of a given set.

Be that as it may, we associate to each set $S$ a cardinal number, denoted by $|S|$ or $\operatorname{card}(S)$, that is intended to measure the size of the set. Actually, cardinal numbers are just very special types of sets. However, we can simply think of them as vague amorphous objects that measure the size of sets.

Definition
1) A set is finite if it can be put in one-to-one correspondence with a set of the form $\mathbb{Z}_n = \{0, 1, \ldots, n-1\}$, for some nonnegative integer $n$. A set that is not finite is infinite. The cardinal number (or cardinality) of a finite set is just the number of elements in the set.
2) The cardinal number of the set $\mathbb{N}$ of natural numbers is $\aleph_0$ (read "aleph nought"), where $\aleph$ is the first letter of the Hebrew alphabet. Hence,
$$|\mathbb{N}| = |\mathbb{Z}| = |\mathbb{Q}| = \aleph_0$$
3) Any set with cardinality $\aleph_0$ is called a countably infinite set and any finite or countably infinite set is called a countable set. An infinite set that is not countable is said to be uncountable.

Since it can be shown that $|\mathbb{R}| > |\mathbb{N}|$, the real numbers are uncountable.

If $S$ and $T$ are finite sets, then it is well known that
$$|S| \leq |T| \text{ and } |T| \leq |S| \Rightarrow |S| = |T|$$
The first part of the next theorem tells us that this is also true for infinite sets. The reader will no doubt recall that the power set $\mathcal{P}(S)$ of a set $S$ is the set of all subsets of $S$. For finite sets, the power set of $S$ is always bigger than the set itself.
In fact,
$$|S| = n \Rightarrow |\mathcal{P}(S)| = 2^n$$
The second part of the next theorem says that the power set of any set $S$ is bigger (has larger cardinality) than $S$ itself. On the other hand, the third part of this theorem says that, for infinite sets $S$, the set of all finite subsets of $S$ is the same size as $S$.

Theorem 0.11
1) (Schröder–Bernstein theorem) For any sets $S$ and $T$,
$$|S| \leq |T| \text{ and } |T| \leq |S| \Rightarrow |S| = |T|$$
2) (Cantor's theorem) If $\mathcal{P}(S)$ denotes the power set of $S$, then
$$|S| < |\mathcal{P}(S)|$$
3) If $\mathcal{P}_0(S)$ denotes the set of all finite subsets of $S$ and if $S$ is an infinite set, then
$$|S| = |\mathcal{P}_0(S)|$$
Proof. We prove only parts 1) and 2). Let $f \colon S \to T$ be an injective function from $S$ into $T$ and let $g \colon T \to S$ be an injective function from $T$ into $S$. We want to use these functions to create a bijective function from $S$ to $T$. For this purpose, we make the following definitions. The descendants of an element $s \in S$ are the elements obtained by repeated alternate applications of the functions $f$ and $g$, namely
$$f(s),\ g(f(s)),\ f(g(f(s))),\ \ldots$$
If $t$ is a descendant of $s$, then $s$ is an ancestor of $t$. Descendants and ancestors of elements of $T$ are defined similarly.

Now, by tracing an element's ancestry to its beginning, we find that there are three possibilities: the element may originate in $S$, or in $T$, or it may have no point of origin. Accordingly, we can write $S$ as the union of three disjoint sets
$$\mathcal{S}_S = \{s \in S \mid s \text{ originates in } S\}$$
$$\mathcal{S}_T = \{s \in S \mid s \text{ originates in } T\}$$
$$\mathcal{S}_\infty = \{s \in S \mid s \text{ has no originator}\}$$
Similarly, $T$ is the disjoint union of $\mathcal{T}_S$, $\mathcal{T}_T$ and $\mathcal{T}_\infty$.

Now, the restriction
$$f|_{\mathcal{S}_S} \colon \mathcal{S}_S \to \mathcal{T}_S$$
is a bijection. To see this, note that if $t \in \mathcal{T}_S$, then $t$ originated in $S$ and therefore must have the form $f(s)$ for some $s \in S$. But $t$ and its ancestor $s$ have the same point of origin and so $t \in \mathcal{T}_S$ implies $s \in \mathcal{S}_S$. Thus, $f|_{\mathcal{S}_S}$ is surjective and hence bijective. We leave it to the reader to show that the functions
$$(g|_{\mathcal{T}_T})^{-1} \colon \mathcal{S}_T \to \mathcal{T}_T \quad \text{and} \quad f|_{\mathcal{S}_\infty} \colon \mathcal{S}_\infty \to \mathcal{T}_\infty$$
are also bijections. Putting these three bijections together gives a bijection between $S$ and $T$. Hence, $|S| = |T|$, as desired.

We now prove Cantor's theorem. The map $\iota \colon S \to \mathcal{P}(S)$ defined by $\iota(s) = \{s\}$ is an injection from $S$ to $\mathcal{P}(S)$ and so $|S| \leq |\mathcal{P}(S)|$. To complete the proof, we must show that no injective map $f \colon S \to \mathcal{P}(S)$ can be surjective. To this end, let
$$X = \{s \in S \mid s \notin f(s)\} \in \mathcal{P}(S)$$
We claim that $X$ is not in $\operatorname{im}(f)$. For suppose that $X = f(x)$ for some $x \in S$. Then if $x \in X$, we have by the definition of $X$ that $x \notin X$. On the other hand, if $x \notin X$, we have again by the definition of $X$ that $x \in X$. This contradiction implies that $X \notin \operatorname{im}(f)$ and so $f$ is not surjective. ∎
Cardinal Arithmetic
Now let us define addition, multiplication and exponentiation of cardinal numbers. If $S$ and $T$ are sets, the cartesian product $S \times T$ is the set of all ordered pairs
$$S \times T = \{(s, t) \mid s \in S,\ t \in T\}$$
The set of all functions from $T$ to $S$ is denoted by $S^T$.

Definition Let $\kappa$ and $\lambda$ denote cardinal numbers. Let $S$ and $T$ be disjoint sets for which $|S| = \kappa$ and $|T| = \lambda$.
1) The sum $\kappa + \lambda$ is the cardinal number of $S \cup T$.
2) The product $\kappa\lambda$ is the cardinal number of $S \times T$.
3) The power $\kappa^\lambda$ is the cardinal number of $S^T$.

We will not go into the details of why these definitions make sense. (For instance, they seem to depend on the sets $S$ and $T$, but in fact they do not.) It can be shown, using these definitions, that cardinal addition and multiplication are associative and commutative and that multiplication distributes over addition.

Theorem 0.12 Let $\kappa$, $\lambda$ and $\mu$ be cardinal numbers. Then the following properties hold:
1) (Associativity) $\kappa + (\lambda + \mu) = (\kappa + \lambda) + \mu$ and $\kappa(\lambda\mu) = (\kappa\lambda)\mu$
2) (Commutativity) $\kappa + \lambda = \lambda + \kappa$ and $\kappa\lambda = \lambda\kappa$
3) (Distributivity) $\kappa(\lambda + \mu) = \kappa\lambda + \kappa\mu$
4) (Properties of exponents)
a) $\kappa^{\lambda + \mu} = \kappa^\lambda \kappa^\mu$
b) $(\kappa^\lambda)^\mu = \kappa^{\lambda\mu}$
c) $(\kappa\lambda)^\mu = \kappa^\mu \lambda^\mu$

On the other hand, the arithmetic of cardinal numbers can seem a bit strange, as the next theorem shows.

Theorem 0.13 Let $\kappa$ and $\lambda$ be cardinal numbers, at least one of which is infinite. Then
$$\kappa + \lambda = \kappa\lambda = \max\{\kappa, \lambda\}$$

It is not hard to see that there is a one-to-one correspondence between the power set $\mathcal{P}(S)$ of a set $S$ and the set $\{0, 1\}^S$ of all functions from $S$ to $\{0, 1\}$. This leads to the following theorem.

Theorem 0.14 For any cardinal $\kappa$,
1) If $|S| = \kappa$, then $|\mathcal{P}(S)| = 2^\kappa$
2) $\kappa < 2^\kappa$.
We have already observed that $|\mathbb{N}| = \aleph_0$. It can be shown that $\aleph_0$ is the smallest infinite cardinal, that is,
$$\kappa < \aleph_0 \Rightarrow \kappa \text{ is a natural number}$$
It can also be shown that the set $\mathbb{R}$ of real numbers is in one-to-one correspondence with the power set $\mathcal{P}(\mathbb{N})$ of the natural numbers. Therefore,
$$|\mathbb{R}| = 2^{\aleph_0}$$
The set of all points on the real line is sometimes called the continuum and so $2^{\aleph_0}$ is sometimes called the power of the continuum and denoted by $c$.

Theorem 0.13 shows that cardinal addition and multiplication have a kind of "absorption" quality, which makes it hard to produce larger cardinals from smaller ones. The next theorem demonstrates this more dramatically.

Theorem 0.15
1) Addition applied a countable number of times or multiplication applied a finite number of times to the cardinal number $\aleph_0$ does not yield anything more than $\aleph_0$. Specifically, for any nonzero $n \in \mathbb{N}$, we have
$$\aleph_0 \cdot \aleph_0 = \aleph_0 \quad \text{and} \quad \aleph_0^n = \aleph_0$$
2) Addition and multiplication, applied a countable number of times to the cardinal number $2^{\aleph_0}$, do not yield more than $2^{\aleph_0}$. Specifically, we have
$$\aleph_0 \cdot 2^{\aleph_0} = 2^{\aleph_0} \quad \text{and} \quad (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0}$$

Using this theorem, we can establish other relationships, such as
$$2^{\aleph_0} \leq (\aleph_0)^{\aleph_0} \leq (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0}$$
which, by the Schröder–Bernstein theorem, implies that
$$(\aleph_0)^{\aleph_0} = 2^{\aleph_0}$$
We mention that the problem of evaluating $\kappa^\lambda$ in general is a very difficult one and would take us far beyond the scope of this book.

We will have use for the following reasonable-sounding result, whose proof is omitted.

Theorem 0.16 Let $\{A_k \mid k \in K\}$ be a collection of sets, indexed by the set $K$, with $|K| = \kappa$. If $|A_k| \leq \lambda$ for all $k \in K$, then
$$\Big|\bigcup_{k \in K} A_k\Big| \leq \kappa\lambda$$

Let us conclude by describing the cardinality of some famous sets.

Theorem 0.17
1) The following sets have cardinality $\aleph_0$:
a) The rational numbers $\mathbb{Q}$.
b) The set of all finite subsets of $\mathbb{N}$.
c) The union of a countable number of countable sets.
d) The set $\mathbb{Z}^n$ of all ordered $n$-tuples of integers.
2) The following sets have cardinality $2^{\aleph_0}$:
a) The set of all points in $\mathbb{R}^n$.
b) The set of all infinite sequences of natural numbers.
c) The set of all infinite sequences of real numbers.
d) The set of all finite subsets of $\mathbb{R}$.
e) The set of all irrational numbers.
Part 2 Algebraic Structures We now turn to a discussion of some of the many algebraic structures that play a role in the study of linear algebra.
Groups
Definition A group is a nonempty set $G$, together with a binary operation denoted by $*$, that satisfies the following properties:
1) (Associativity) For all $a, b, c \in G$,
$$(a * b) * c = a * (b * c)$$
2) (Identity) There exists an element $e \in G$ for which
$$e * a = a * e = a$$
for all $a \in G$.
3) (Inverses) For each $a \in G$, there is an element $a^{-1} \in G$ for which
$$a * a^{-1} = a^{-1} * a = e$$

Definition A group $G$ is abelian, or commutative, if
$$a * b = b * a$$
for all $a, b \in G$. When a group is abelian, it is customary to denote the operation $*$ by $+$, thus writing $a * b$ as $a + b$. It is also customary to refer to the identity as the zero element and to denote the inverse $a^{-1}$ by $-a$, referred to as the negative of $a$.

Example 0.7 The set of all bijective functions from a set $S$ to $S$ is a group under composition of functions. However, in general, it is not abelian.

Example 0.8 The set $\mathcal{M}_{m,n}(F)$ is an abelian group under addition of matrices. The identity is the zero matrix $0_{m,n}$ of size $m \times n$. The set $\mathcal{M}_n(F)$ is not a group under multiplication of matrices, since not all matrices have multiplicative inverses. However, the set of invertible matrices of size $n \times n$ is a (nonabelian) group under multiplication.

A group $G$ is finite if it contains only a finite number of elements. The cardinality of a finite group $G$ is called its order and is denoted by $o(G)$ or simply $|G|$. Thus, for example, $\mathbb{Z}_n = \{0, 1, \ldots, n-1\}$ is a finite group under addition modulo $n$, but $\mathcal{M}_{m,n}(\mathbb{R})$ is not finite.

Definition A subgroup of a group $G$ is a nonempty subset $S$ of $G$ that is a group in its own right, using the same operations as defined on $G$.
Rings
Definition A ring is a nonempty set $R$, together with two binary operations, called addition (denoted by $+$) and multiplication (denoted by juxtaposition), for which the following hold:
1) $R$ is an abelian group under addition.
2) (Associativity) For all $a, b, c \in R$,
$$(ab)c = a(bc)$$
3) (Distributivity) For all $a, b, c \in R$,
$$(a + b)c = ac + bc \quad \text{and} \quad c(a + b) = ca + cb$$
A ring $R$ is said to be commutative if $ab = ba$ for all $a, b \in R$. If a ring $R$ contains an element $1$ with the property that
$$1a = a1 = a$$
for all $a \in R$, we say that $R$ is a ring with identity. The identity is usually denoted by $1$.

Example 0.9 The set $\mathbb{Z}_n = \{0, 1, \ldots, n-1\}$ is a commutative ring under addition and multiplication modulo $n$:
$$a \oplus b = (a + b) \bmod n, \quad a \odot b = ab \bmod n$$
The element $1 \in \mathbb{Z}_n$ is the identity.
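The operations of Example 0.9 are easy to tabulate by machine. A minimal Python sketch (taking $n = 6$ as an arbitrary choice):

```python
n = 6
Zn = range(n)                      # the elements 0, 1, ..., n-1

def add(a, b):                     # a (+) b = (a + b) mod n
    return (a + b) % n

def mul(a, b):                     # a (.) b = ab mod n
    return (a * b) % n

print(add(4, 5))                   # 3, since 9 mod 6 = 3
print(mul(4, 5))                   # 2, since 20 mod 6 = 2
print([mul(1, a) for a in Zn])     # 1 acts as the multiplicative identity
```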
Example 0.10 The set $E$ of even integers is a commutative ring under the usual operations on $\mathbb{Z}$, but it has no identity.

Example 0.11 The set $\mathcal{M}_n(F)$ is a noncommutative ring under matrix addition and multiplication. The identity matrix $I_n$ is the identity for $\mathcal{M}_n(F)$.

Example 0.12 Let $F$ be a field. The set $F[x]$ of all polynomials in a single variable $x$, with coefficients in $F$, is a commutative ring under the usual operations of polynomial addition and multiplication. What is the identity for $F[x]$? Similarly, the set $F[x_1, \ldots, x_n]$ of polynomials in $n$ variables is a commutative ring under the usual addition and multiplication of polynomials.

Definition A subring of a ring $R$ is a subset $S$ of $R$ that is a ring in its own right, using the same operations as defined on $R$ and having the same multiplicative identity as $R$.

The condition that a subring have the same multiplicative identity as $R$ is required. For example, the set $S$ of all $2 \times 2$ matrices of the form
$$A_a = \begin{pmatrix} a & 0 \\ 0 & 0 \end{pmatrix}$$
for $a \in F$ is a ring under addition and multiplication of matrices (isomorphic to $F$). The multiplicative identity in $S$ is the matrix $A_1$, which is not the identity $I_2$ of $\mathcal{M}_2(F)$. Hence, $S$ is a ring under the same operations as $\mathcal{M}_2(F)$ but it is not a subring of $\mathcal{M}_2(F)$.
Applying the definition is not generally the easiest way to show that a subset of a ring is a subring. The following characterization is usually easier to apply.

Theorem 0.18 A nonempty subset $S$ of a ring $R$ is a subring if and only if
1) The multiplicative identity $1_R$ of $R$ is in $S$.
2) $S$ is closed under subtraction, that is,
$$a, b \in S \Rightarrow a - b \in S$$
3) $S$ is closed under multiplication, that is,
$$a, b \in S \Rightarrow ab \in S$$

Ideals
Rings have another important substructure besides subrings.

Definition Let $R$ be a ring. A nonempty subset $\mathfrak{I}$ of $R$ is called an ideal if
1) $\mathfrak{I}$ is a subgroup of the abelian group $R$, that is, $\mathfrak{I}$ is closed under subtraction:
$$a, b \in \mathfrak{I} \Rightarrow a - b \in \mathfrak{I}$$
2) $\mathfrak{I}$ is closed under multiplication by any ring element, that is,
$$a \in \mathfrak{I},\ r \in R \Rightarrow ra \in \mathfrak{I} \text{ and } ar \in \mathfrak{I}$$

Note that if an ideal $\mathfrak{I}$ contains the unit element $1$, then $\mathfrak{I} = R$.

Example 0.13 Let $p(x)$ be a polynomial in $F[x]$. The set of all multiples of $p(x)$,
$$\langle p(x) \rangle = \{q(x)p(x) \mid q(x) \in F[x]\}$$
is an ideal in $F[x]$, called the ideal generated by $p(x)$.

Definition Let $S$ be a subset of a ring $R$ with identity. The set
$$\langle S \rangle = \{r_1 s_1 + \cdots + r_n s_n \mid r_i \in R,\ s_i \in S,\ n \geq 1\}$$
of all finite linear combinations of elements of $S$, with coefficients in $R$, is an ideal in $R$, called the ideal generated by $S$. It is the smallest (in the sense of set inclusion) ideal of $R$ containing $S$. If $S = \{s_1, \ldots, s_n\}$ is a finite set, we write
$$\langle s_1, \ldots, s_n \rangle = \{r_1 s_1 + \cdots + r_n s_n \mid r_i \in R\}$$

Note that in the previous definition, we require that $R$ have an identity. This is to ensure that $S \subseteq \langle S \rangle$.

Theorem 0.19 Let $R$ be a ring.
1) The intersection of any collection $\{\mathfrak{I}_k \mid k \in K\}$ of ideals is an ideal.
2) If $\mathfrak{I}_1 \subseteq \mathfrak{I}_2 \subseteq \cdots$ is an ascending sequence of ideals, each one contained in the next, then the union $\bigcup \mathfrak{I}_i$ is also an ideal.
3) More generally, if $\mathcal{C} = \{\mathfrak{I}_i \mid i \in I\}$ is a chain of ideals in $R$, then the union $\mathfrak{J} = \bigcup_{i \in I} \mathfrak{I}_i$ is also an ideal in $R$.
Proof. To prove 1), let $\mathfrak{J} = \bigcap \mathfrak{I}_k$. Then if $a, b \in \mathfrak{J}$, we have $a, b \in \mathfrak{I}_k$ for all $k \in K$. Hence, $a - b \in \mathfrak{I}_k$ for all $k \in K$ and so $a - b \in \mathfrak{J}$. Hence, $\mathfrak{J}$ is closed under subtraction. Also, if $r \in R$, then $ra \in \mathfrak{I}_k$ for all $k \in K$ and so $ra \in \mathfrak{J}$. Of course, part 2) is a special case of part 3). To prove 3), if $a, b \in \mathfrak{J}$, then $a \in \mathfrak{I}_i$ and $b \in \mathfrak{I}_j$ for some $i, j \in I$. Since one of $\mathfrak{I}_i$ and $\mathfrak{I}_j$ is contained in the other, we may assume that $\mathfrak{I}_i \subseteq \mathfrak{I}_j$. It follows that $a, b \in \mathfrak{I}_j$ and so $a - b \in \mathfrak{I}_j \subseteq \mathfrak{J}$; and if $r \in R$, then $ra \in \mathfrak{I}_j \subseteq \mathfrak{J}$. Thus $\mathfrak{J}$ is an ideal. ∎
Note that in general, the union of ideals is not an ideal. However, as we have just proved, the union of any chain of ideals is an ideal.

Quotient Rings and Maximal Ideals
Let $S$ be a subset of a commutative ring $R$ with identity. Let $\equiv$ be the binary relation on $R$ defined by
$$a \equiv b \Leftrightarrow a - b \in S$$
It is easy to see that $\equiv$ is an equivalence relation. When $a \equiv b$, we say that $a$ and $b$ are congruent modulo $S$. The term "mod" is used as a colloquialism for modulo and $a \equiv b$ is often written
$$a \equiv b \bmod S$$
To see what the equivalence classes look like, observe that
$$[a] = \{b \in R \mid b \equiv a\} = \{b \in R \mid b - a \in S\} = \{b \in R \mid b = a + s \text{ for some } s \in S\} = \{a + s \mid s \in S\} = a + S$$
The set
$$a + S = \{a + s \mid s \in S\}$$
is called a coset of $S$ in $R$. The element $a$ is called a coset representative for $a + S$.

Thus, the equivalence classes for congruence mod $S$ are the cosets $a + S$ of $S$ in $R$. The set of all cosets is denoted by
$$R/S = \{a + S \mid a \in R\}$$
This is read "$R$ mod $S$." We would like to place a ring structure on $R/S$. Indeed, if $S$ is a subgroup of the abelian group $R$, then $R/S$ is easily seen to be an abelian group as well under coset addition defined by
$$(a + S) + (b + S) = (a + b) + S$$
In order for the product
$$(a + S)(b + S) = ab + S$$
to be well-defined, we must have
$$b + S = b' + S \Rightarrow ab + S = ab' + S$$
or, equivalently,
$$b - b' \in S \Rightarrow a(b - b') \in S$$
But $b - b'$ may be any element of $S$ and $a$ may be any element of $R$, and so this condition implies that $S$ must be an ideal. Conversely, if $S$ is an ideal, then coset multiplication is well defined.

Theorem 0.20 Let $R$ be a commutative ring with identity. Then the quotient $R/\mathfrak{I}$ is a ring under coset addition and multiplication if and only if $\mathfrak{I}$ is an ideal of $R$. In this case, $R/\mathfrak{I}$ is called the quotient ring of $R$ modulo $\mathfrak{I}$, where addition and multiplication are defined by
$$(a + \mathfrak{I}) + (b + \mathfrak{I}) = (a + b) + \mathfrak{I}$$
$$(a + \mathfrak{I})(b + \mathfrak{I}) = ab + \mathfrak{I}$$
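Coset arithmetic can be carried out concretely by always reducing to a canonical coset representative. A Python sketch using SymPy (the choice of $R = \mathbb{R}[x]$ and the ideal generated by $x^2 + 1$ is illustrative, and the helper `times` is hypothetical; the canonical representative of a coset $f + \langle p \rangle$ is the remainder of $f$ on division by $p$):

```python
from sympy import symbols, div, expand

x = symbols('x')
p = x**2 + 1                 # generator of the ideal <x^2 + 1> in R[x]

def times(f, g):
    """Multiply the cosets f + <p> and g + <p>, returning the
    canonical representative: the remainder of f*g modulo p."""
    return div(expand(f * g), p, x)[1]

# In R[x]/<x^2 + 1> the coset of x squares to -1, so this quotient
# ring behaves like the complex numbers:
print(times(x, x))           # -1
print(times(1 + x, 1 - x))   # 2
```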
Definition An ideal $\mathfrak{I}$ in a ring $R$ is a maximal ideal if $\mathfrak{I} \neq R$ and if whenever $\mathfrak{J}$ is an ideal satisfying $\mathfrak{I} \subseteq \mathfrak{J} \subseteq R$, then either $\mathfrak{J} = \mathfrak{I}$ or $\mathfrak{J} = R$.

Here is one reason why maximal ideals are important.

Theorem 0.21 Let $R$ be a commutative ring with identity. Then the quotient ring $R/\mathfrak{I}$ is a field if and only if $\mathfrak{I}$ is a maximal ideal.
Proof. First, note that for any ideal $\mathfrak{I}$ of $R$, the ideals of $R/\mathfrak{I}$ are precisely the quotients $\mathfrak{J}/\mathfrak{I}$ where $\mathfrak{J}$ is an ideal for which $\mathfrak{I} \subseteq \mathfrak{J} \subseteq R$. It is clear that $\mathfrak{J}/\mathfrak{I}$ is an ideal of $R/\mathfrak{I}$. Conversely, if $\mathfrak{J}'$ is an ideal of $R/\mathfrak{I}$, then let
$$\mathfrak{J} = \{a \in R \mid a + \mathfrak{I} \in \mathfrak{J}'\}$$
It is easy to see that $\mathfrak{J}$ is an ideal of $R$ for which $\mathfrak{I} \subseteq \mathfrak{J} \subseteq R$.

Next, observe that a commutative ring $S$ with identity is a field if and only if $S$ has no nonzero proper ideals. For if $S$ is a field and $\mathfrak{I}$ is an ideal of $S$ containing a nonzero element $a$, then $1 = a^{-1}a \in \mathfrak{I}$ and so $\mathfrak{I} = S$. Conversely, if $S$ has no nonzero proper ideals and $0 \neq a \in S$, then the ideal $\langle a \rangle$ must be $S$ and so there is a $b \in S$ for which $ab = 1$. Hence, $S$ is a field.

Putting these two facts together proves the theorem. ∎

The following result says that maximal ideals always exist.

Theorem 0.22 Any nonzero commutative ring $R$ with identity contains a maximal ideal.
Proof. Since $R$ is not the zero ring, the ideal $\{0\}$ is a proper ideal of $R$. Hence, the set $\mathcal{I}$ of all proper ideals of $R$ is nonempty. If $\mathcal{C} = \{\mathfrak{I}_i \mid i \in I\}$ is a chain of proper ideals in $R$, then the union $\mathfrak{J} = \bigcup_{i \in I} \mathfrak{I}_i$ is also an ideal. Furthermore, if $\mathfrak{J} = R$ is not proper, then $1 \in \mathfrak{J}$ and so $1 \in \mathfrak{I}_i$ for some $i \in I$, which implies that $\mathfrak{I}_i = R$ is not proper. Hence, $\mathfrak{J} \in \mathcal{I}$. Thus, any chain in $\mathcal{I}$ has an upper bound in $\mathcal{I}$ and so Zorn's lemma implies that $\mathcal{I}$ has a maximal element. This shows that $R$ has a maximal ideal. ∎
Integral Domains
Definition Let $R$ be a ring. A nonzero element $r \in R$ is called a zero divisor if there exists a nonzero $s \in R$ for which $rs = 0$. A commutative ring $R$ with identity is called an integral domain if it contains no zero divisors.

Example 0.14 If $n$ is not a prime number, then the ring $\mathbb{Z}_n$ has zero divisors and so is not an integral domain. To see this, observe that if $n$ is not prime, then $n = ab$ in $\mathbb{Z}$, where $1 < a, b < n$. But in $\mathbb{Z}_n$, we have
$$a \odot b = ab \bmod n = 0$$
and so $a$ and $b$ are both zero divisors. As we will see later, if $n$ is a prime, then $\mathbb{Z}_n$ is a field (which is an integral domain, of course).
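A brute-force search makes Example 0.14 concrete. A small Python sketch (the moduli 6 and 7 are arbitrary choices; the helper zero_divisors is hypothetical):

```python
def zero_divisors(n):
    """Nonzero a in Z_n such that a*b = 0 mod n for some nonzero b."""
    return [a for a in range(1, n)
            if any(a * b % n == 0 for b in range(1, n))]

print(zero_divisors(6))   # [2, 3, 4]: Z_6 is not an integral domain
print(zero_divisors(7))   # []: Z_7 has no zero divisors (7 is prime)
```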
Example 0.15 The ring $F[x]$ is an integral domain, since $f(x)g(x) = 0$ implies that $f(x) = 0$ or $g(x) = 0$.

If $R$ is a ring and $rx = ry$ where $r, x, y \in R$, then we cannot in general cancel the $r$'s and conclude that $x = y$. For instance, in $\mathbb{Z}_6$, we have $2 \cdot 3 = 2 \cdot 0$, but canceling the $2$'s gives the false conclusion $3 = 0$. However, it is precisely the integral domains in which we can cancel. The simple proof is left to the reader.

Theorem 0.23 Let $R$ be a commutative ring with identity. Then $R$ is an integral domain if and only if the cancellation law
$$rx = ry,\ r \neq 0 \Rightarrow x = y$$
holds.
The Field of Quotients of an Integral Domain
Any integral domain $R$ can be embedded in a field. The quotient field (or field of quotients) of $R$ is a field that is constructed from $R$ just as the field of rational numbers is constructed from the ring of integers. In particular, we set
$$R^+ = \{(a, b) \mid a, b \in R,\ b \neq 0\}$$
Thinking of $(a, b)$ as the "fraction" $a/b$, we define addition and multiplication of fractions in the same way as for rational numbers:
$$(a, b) + (c, d) = (ad + bc, bd) \quad \text{and} \quad (a, b) \cdot (c, d) = (ac, bd)$$
It is customary to write $(a, b)$ in the form $a/b$. Note that if $R$ has zero divisors, then these definitions do not make sense, because $bd$ may be $0$ even if $b$ and $d$ are not. This is why we require that $R$ be an integral domain.
Principal Ideal Domains
Definition Let $R$ be a ring with identity and let $a \in R$. The principal ideal generated by $a$ is the ideal
$$\langle a \rangle = \{ra \mid r \in R\}$$
An integral domain $R$ in which every ideal is a principal ideal is called a principal ideal domain.

Theorem 0.24 The integers form a principal ideal domain. In fact, any ideal $\mathfrak{I}$ in $\mathbb{Z}$ is generated by the smallest positive integer that is contained in $\mathfrak{I}$.

Theorem 0.25 The ring $F[x]$ is a principal ideal domain. In fact, any ideal $\mathfrak{I}$ is generated by the unique monic polynomial of smallest degree contained in $\mathfrak{I}$. Moreover, for polynomials $p_1(x), \ldots, p_n(x)$,
$$\langle p_1(x), \ldots, p_n(x) \rangle = \langle \gcd\{p_1(x), \ldots, p_n(x)\} \rangle$$
Proof. Let $\mathfrak{I}$ be an ideal in $F[x]$ and let $m(x)$ be a monic polynomial of smallest degree in $\mathfrak{I}$. First, we observe that there is only one such polynomial in $\mathfrak{I}$. For if $n(x) \in \mathfrak{I}$ is monic and $\deg(n(x)) = \deg(m(x))$, then
$$d(x) = m(x) - n(x) \in \mathfrak{I}$$
and since $\deg(d(x)) < \deg(m(x))$, we must have $d(x) = 0$ and so $n(x) = m(x)$.

We show that $\mathfrak{I} = \langle m(x) \rangle$. Since $m(x) \in \mathfrak{I}$, we have $\langle m(x) \rangle \subseteq \mathfrak{I}$. To establish the reverse inclusion, if $p(x) \in \mathfrak{I}$, then dividing $p(x)$ by $m(x)$ gives
$$p(x) = q(x)m(x) + r(x)$$
where $r(x) = 0$ or $\deg r(x) < \deg m(x)$. But since $\mathfrak{I}$ is an ideal,
$$r(x) = p(x) - q(x)m(x) \in \mathfrak{I}$$
and so $0 \leq \deg r(x) < \deg m(x)$ is impossible. Hence, $r(x) = 0$ and
$$p(x) = q(x)m(x) \in \langle m(x) \rangle$$
This shows that $\mathfrak{I} \subseteq \langle m(x) \rangle$ and so $\mathfrak{I} = \langle m(x) \rangle$.

To prove the second statement, let $\mathfrak{I} = \langle p_1(x), \ldots, p_n(x) \rangle$. Then, by what we have just shown,
$$\mathfrak{I} = \langle p_1(x), \ldots, p_n(x) \rangle = \langle m(x) \rangle$$
where $m(x)$ is the unique monic polynomial in $\mathfrak{I}$ of smallest degree. In particular, since $p_i(x) \in \langle m(x) \rangle$, we have $m(x) \mid p_i(x)$ for each $i = 1, \ldots, n$. In other words, $m(x)$ is a common divisor of the $p_i(x)$'s. Moreover, if $h(x) \mid p_i(x)$ for all $i$, then $p_i(x) \in \langle h(x) \rangle$ for all $i$, which implies that
$$m(x) \in \langle m(x) \rangle = \langle p_1(x), \ldots, p_n(x) \rangle \subseteq \langle h(x) \rangle$$
and so $h(x) \mid m(x)$. This shows that $m(x)$ is the greatest common divisor of the $p_i(x)$'s and completes the proof. ∎
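The gcd appearing in Theorem 0.25, along with the polynomials $a(x)$, $b(x)$ of Theorem 0.5, can be computed by the extended Euclidean algorithm. A Python sketch using SymPy's gcdex (the polynomials chosen are an arbitrary illustration):

```python
from sympy import symbols, gcdex, expand

x = symbols('x')
f = x**4 - 1
g = x**3 - 1

# gcdex returns (a, b, d) with a*f + b*g = d = gcd(f, g)
a, b, d = gcdex(f, g, x)
print(d)                        # x - 1
print(expand(a*f + b*g - d))    # 0, confirming the Bezout identity
```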
Example 0.16 The ring $R = F[x, y]$ of polynomials in two variables $x$ and $y$ is not a principal ideal domain. To see this, observe that the set $\mathfrak{I}$ of all polynomials with zero constant term is an ideal in $R$. Now, suppose that $\mathfrak{I}$ is the principal ideal $\mathfrak{I} = \langle p(x, y) \rangle$. Since $x, y \in \mathfrak{I}$, there exist polynomials $a(x, y)$ and $b(x, y)$ for which

$$x = a(x, y)p(x, y) \quad \text{and} \quad y = b(x, y)p(x, y) \tag{0.1}$$

But $p(x, y)$ cannot be a constant, for then we would have $\mathfrak{I} = R$. Hence, $\deg(p(x, y)) \geq 1$ and so $a(x, y)$ and $b(x, y)$ must both be constants, which implies that (0.1) cannot hold.
Theorem 0.26 Any principal ideal domain $R$ satisfies the ascending chain condition, that is, $R$ cannot have a strictly increasing sequence of ideals
$$\mathfrak{I}_1 \subset \mathfrak{I}_2 \subset \cdots$$
where each ideal is properly contained in the next one.
Proof. Suppose to the contrary that there is such an increasing sequence of ideals. Consider the ideal
$$\mathfrak{U} = \bigcup \mathfrak{I}_k$$
which must have the form $\mathfrak{U} = \langle a \rangle$ for some $a \in \mathfrak{U}$. Since $a \in \mathfrak{I}_k$ for some $k$, we have $\mathfrak{I}_i = \mathfrak{I}_k$ for all $i \geq k$, contradicting the fact that the inclusions are proper. ∎
Prime and Irreducible Elements
We can define the notion of a prime element in any integral domain. For $a, b \in R$, we say that $a$ divides $b$ (written $a \mid b$) if there exists an $x \in R$ for which $b = xa$.
Definition Let $R$ be an integral domain.
1) An invertible element of $R$ is called a unit. Thus, $u \in R$ is a unit if $uv = 1$ for some $v \in R$.
2) Two elements $a, b \in R$ are said to be associates if there exists a unit $u$ for which $a = ub$.
3) A nonzero nonunit $p \in R$ is said to be prime if
$$p \mid ab \Rightarrow p \mid a \text{ or } p \mid b$$
4) A nonzero nonunit $p \in R$ is said to be irreducible if
$$p = ab \Rightarrow a \text{ or } b \text{ is a unit}$$
Note that if $p$ is prime or irreducible then so is $up$ for any unit $u$.
Theorem 0.27 Let $R$ be a ring.
1) An element $u \in R$ is a unit if and only if $\langle u \rangle = R$.
2) $a$ and $b$ are associates if and only if $\langle a \rangle = \langle b \rangle$.
3) $a$ divides $b$ if and only if $\langle b \rangle \subseteq \langle a \rangle$.
4) $a$ properly divides $b$, that is, $b = xa$ where $x$ is not a unit, if and only if $\langle b \rangle \subset \langle a \rangle$.
In the case of the integers, an integer is prime if and only if it is irreducible. In any integral domain, prime elements are irreducible, but the converse need not hold. (In the ring $\mathbb{Z}[\sqrt{-5}] = \{a + b\sqrt{-5} \mid a, b \in \mathbb{Z}\}$, the irreducible element $2$ divides the product $(1 + \sqrt{-5})(1 - \sqrt{-5}) = 6$ but does not divide either factor, and so is not prime.) However, in principal ideal domains, the two concepts are equivalent.
Theorem 0.28 Let $R$ be a principal ideal domain.
1) An $a \in R$ is irreducible if and only if the ideal $\langle a \rangle$ is maximal.
2) An element in $R$ is prime if and only if it is irreducible.
3) The elements $a, b \in R$ are relatively prime, that is, have no common nonunit factors, if and only if there exist $r, s \in R$ for which
$$ra + sb = 1$$
Proof. To prove 1), suppose that $a$ is irreducible and that $\langle a \rangle \subseteq \langle b \rangle \subseteq R$. Then $b \mid a$ and so $a = xb$ for some $x \in R$. The irreducibility of $a$ implies that $b$ or $x$ is a unit. If $b$ is a unit then $\langle b \rangle = R$ and if $x$ is a unit then $\langle b \rangle = \langle xb \rangle = \langle a \rangle$. This shows that $\langle a \rangle$ is maximal. (We have $\langle a \rangle \ne R$, since $a$ is not a unit.)

Conversely, suppose that $a$ is not irreducible, that is, $a = bc$ where neither $b$ nor $c$ is a unit. Then $\langle a \rangle \subseteq \langle b \rangle \subseteq R$. But if $\langle a \rangle = \langle b \rangle$ then $a$ and $b$ are associates, which implies that $c$ is a unit. Hence $\langle a \rangle \ne \langle b \rangle$. Also, if $\langle b \rangle = R$ then $b$ must be a unit. So we conclude that $\langle a \rangle$ is not maximal, as desired.

To prove 2), assume first that $p$ is prime and $p = ab$. Then $p \mid a$ or $p \mid b$. We may assume that $p \mid a$. Therefore, $a = xp = xab$. Canceling $a$'s gives $1 = xb$ and so $b$ is a unit. Hence, $p$ is irreducible. (Note that this argument applies in any integral domain.)

Conversely, suppose that $p$ is irreducible and let $p \mid ab$. We wish to prove that $p \mid a$ or $p \mid b$. The ideal $\langle p \rangle$ is maximal and so $\langle p, a \rangle = \langle p \rangle$ or $\langle p, a \rangle = R$. In the former case, $p \mid a$ and we are done. In the latter case, we have
$$1 = xp + ya$$
for some $x, y \in R$. Thus, $b = xpb + yab$ and since $p$ divides both terms on the right, we have $p \mid b$.

To prove 3), it is clear that if $ra + sb = 1$ then $a$ and $b$ are relatively prime. For the converse, consider the ideal $\langle a, b \rangle$, which must be principal, say $\langle a, b \rangle = \langle x \rangle$. Then $x \mid a$ and $x \mid b$ and so $x$ must be a unit, which implies that $\langle a, b \rangle = R$. Hence, there exist $r, s \in R$ for which $ra + sb = 1$.
Unique Factorization Domains
Definition An integral domain $R$ is said to be a unique factorization domain if it has the following factorization properties:
1) Every nonzero nonunit element $a \in R$ can be written as a product of a finite number of irreducible elements $a = p_1 \cdots p_n$.
2) The factorization into irreducible elements is unique in the sense that if $a = p_1 \cdots p_n$ and $a = q_1 \cdots q_m$ are two such factorizations then $n = m$ and, after a suitable reindexing of the factors, $p_i$ and $q_i$ are associates.
Unique factorization is clearly a desirable property. Fortunately, principal ideal domains have this property.
Theorem 0.29 Every principal ideal domain $R$ is a unique factorization domain.
Proof. Let $a \in R$ be a nonzero nonunit. If $a$ is irreducible then we are done. If not then $a = a_1 b_1$, where neither factor is a unit. If $a_1$ and $b_1$ are irreducible, we are done. If not, suppose that $a_1$ is not irreducible. Then $a_1 = a_2 b_2$, where neither $a_2$ nor $b_2$ is a unit. Continuing in this way, we obtain a factorization of the form (after renumbering if necessary)
$$a = a_1 b_1 = (a_2 b_2) b_1 = (a_3 b_3)(b_2 b_1) = \cdots$$
Each step is a factorization of $a$ into a product of nonunits. However, this process must stop after a finite number of steps, for otherwise it will produce an infinite sequence $a_1, a_2, \dots$ of nonunits of $R$ for which $a_{n+1}$ properly divides $a_n$. But this gives the ascending chain of ideals
$$\langle a_1 \rangle \subset \langle a_2 \rangle \subset \langle a_3 \rangle \subset \cdots$$
where the inclusions are proper. But this contradicts the fact that a principal ideal domain satisfies the ascending chain condition. Thus, we conclude that every nonzero nonunit has a factorization into irreducible elements.

As to uniqueness, if $a = p_1 \cdots p_n$ and $a = q_1 \cdots q_m$ are two such factorizations then because $R$ is an integral domain, we may equate them and cancel like factors, so let us assume this has been done. Thus, $p_i \ne q_j$ for all $i, j$. If there are no factors on either side, we are done. If exactly one side has no factors left then we have expressed $1$ as a product of irreducible elements, which is not possible since irreducible elements are nonunits.

Suppose that both sides have factors left, that is,
$$p_1 \cdots p_n = q_1 \cdots q_m$$
where $p_i \ne q_j$. Then $p_1 \mid q_1 \cdots q_m$, which implies that $p_1 \mid q_j$ for some $j$. We can assume by reindexing if necessary that $j = 1$. Since $q_1$ is irreducible, $q_1 = u p_1$ where $u$ must be a unit. Replacing $q_1$ by $u p_1$ and canceling $p_1$ gives
$$p_2 \cdots p_n = u\, q_2 \cdots q_m$$
This process can be repeated until we run out of $p$'s or $q$'s. If we run out of $p$'s first then we have an equation of the form $u\, q_i \cdots q_m = 1$ where $u$ is a unit, which is not possible since the $q$'s are not units. By the same reasoning, we cannot run out of $q$'s first and so $n = m$ and the $p$'s and $q$'s can be paired off as associates.
Fields For the record, let us give the definition of a field (a concept that we have been using).
Definition A field is a set $F$, containing at least two elements, together with two binary operations, called addition (denoted by $+$) and multiplication (denoted by juxtaposition), for which the following hold:
1) $F$ is an abelian group under addition.
2) The set $F^*$ of all nonzero elements in $F$ is an abelian group under multiplication.
3) (Distributivity) For all $\alpha, \beta, \gamma \in F$,
$$(\alpha + \beta)\gamma = \alpha\gamma + \beta\gamma \quad\text{and}\quad \gamma(\alpha + \beta) = \gamma\alpha + \gamma\beta$$
We require that $F$ have at least two elements to avoid the pathological case in which $0 = 1$.
Example 0.17 The sets $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{C}$, of all rational, real and complex numbers, respectively, are fields under the usual operations of addition and multiplication of numbers.
Example 0.18 The ring $\mathbb{Z}_n$ is a field if and only if $n$ is a prime number. We have already seen that $\mathbb{Z}_n$ is not a field if $n$ is not prime, since a field is also an integral domain. Now suppose that $n = p$ is a prime. We have seen that $\mathbb{Z}_p$ is an integral domain and so it remains to show that every nonzero element in $\mathbb{Z}_p$ has a multiplicative inverse. Let $0 \ne a \in \mathbb{Z}_p$. Since $a < p$, we know that $a$ and $p$ are relatively prime. It follows that there exist integers $u$ and $v$ for which
$$ua + vp = 1$$
Hence, $ua = 1 - vp \equiv 1 \pmod p$ and so $u \odot a = 1$ in $\mathbb{Z}_p$, that is, $u$ is the multiplicative inverse of $a$.
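The proof is constructive: the coefficient $u$ can be computed with the extended Euclidean algorithm. A short Python sketch (the function name is our own):

    def mod_inverse(a, p):
        """Find u with u*a = 1 mod p via the extended Euclidean algorithm,
        which produces u, v with u*a + v*p = gcd(a, p) = 1."""
        old_r, r = a, p
        old_u, u = 1, 0
        while r != 0:
            q = old_r // r
            old_r, r = r, old_r - q * r
            old_u, u = u, old_u - q * u
        if old_r != 1:
            raise ValueError("a and p are not relatively prime")
        return old_u % p

    print(mod_inverse(3, 7))    # 5, since 3*5 = 15 = 1 mod 7
    print(pow(3, -1, 7))        # same answer via Python's built-in (3.8+)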
The previous example shows that not all fields are infinite sets. In fact, finite fields play an extremely important role in many areas of abstract and applied mathematics.

A field $F$ is said to be algebraically closed if every nonconstant polynomial over $F$ has a root in $F$. This is equivalent to saying that every nonconstant polynomial splits into linear factors over $F$. For example, the complex field $\mathbb{C}$ is algebraically closed but the real field $\mathbb{R}$ is not. We mention without proof that every field $F$ is contained in an algebraically closed field $\overline{F}$, called the algebraic closure of $F$.
The Characteristic of a Ring
Let $R$ be a ring with identity. If $n$ is a positive integer then by $n \cdot a$, we simply mean
$$n \cdot a = \underbrace{a + \cdots + a}_{n \text{ terms}}$$
Now, it may happen that there is a positive integer $n$ for which
$$n \cdot 1 = 0$$
For instance, in $\mathbb{Z}_n$, we have $n \cdot 1 = n = 0$. On the other hand, in $\mathbb{Z}$, the equation $n \cdot 1 = 0$ implies $n = 0$ and so no such positive integer exists. Notice that, in any finite ring, there must exist such a positive integer, since the infinite sequence of numbers $1 \cdot 1, 2 \cdot 1, 3 \cdot 1, \dots$ cannot be distinct and so $i \cdot 1 = j \cdot 1$ for some $i < j$, whence $(j - i) \cdot 1 = 0$.

Definition Let $R$ be a ring with identity. The smallest positive integer $c$ for which $c \cdot 1 = 0$ is called the characteristic of $R$. If no such number $c$ exists, we say that $R$ has characteristic $0$. The characteristic of $R$ is denoted by $\operatorname{char}(R)$.

If $\operatorname{char}(R) = c$ then for any $a \in R$, we have
$$c \cdot a = \underbrace{a + \cdots + a}_{c \text{ terms}} = a(\underbrace{1 + \cdots + 1}_{c \text{ terms}}) = a(c \cdot 1) = 0$$
Theorem 0.30 Any finite ring has nonzero characteristic. Any finite field has prime characteristic.
Proof. We have already seen that a finite ring has nonzero characteristic. Let $F$ be a finite field and suppose that $\operatorname{char}(F) = c > 0$. If $c = ab$, where $a, b < c$, then $(ab) \cdot 1 = 0$. Hence, $(a \cdot 1)(b \cdot 1) = 0$, implying that $a \cdot 1 = 0$ or $b \cdot 1 = 0$. In either case, we have a contradiction to the fact that $c$ is the smallest positive integer such that $c \cdot 1 = 0$. Hence, $c$ must be prime.
Notice that in any field $F$ of characteristic $2$, we have $a + a = 0$ for all $a \in F$. Thus, in $F$,
$$a = -a \quad\text{for all } a \in F$$
This property takes a bit of getting used to and makes fields of characteristic $2$ quite exceptional. (As it happens, there are many important uses for fields of characteristic $2$.)
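For instance, in characteristic $2$ the cross term of $(a+b)^2 = a^2 + 2ab + b^2$ vanishes, so $(a+b)^2 = a^2 + b^2$. A tiny Python check over $\mathbb{Z}_2$:

    # In Z_2 (characteristic 2), (a + b)^2 = a^2 + b^2 because 2ab = 0.
    for a in range(2):
        for b in range(2):
            assert (a + b) ** 2 % 2 == (a ** 2 + b ** 2) % 2
    print("(a + b)^2 = a^2 + b^2 holds for all a, b in Z_2")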
Algebras The final algebraic structure of which we will have use is a combination of a vector space and a ring. (We have not yet officially defined vector spaces, but we will do so before needing the following definition, which is placed here for easy reference.)
Definition An algebra $\mathcal{A}$ over a field $F$ is a nonempty set $\mathcal{A}$, together with three operations, called addition (denoted by $+$), multiplication (denoted by juxtaposition) and scalar multiplication (also denoted by juxtaposition), for which the following properties hold:
1) $\mathcal{A}$ is a vector space over $F$ under addition and scalar multiplication.
2) $\mathcal{A}$ is a ring under addition and multiplication.
3) If $r \in F$ and $a, b \in \mathcal{A}$ then
$$r(ab) = (ra)b = a(rb)$$
Thus, an algebra is a vector space in which we can take the product of vectors, or a ring in which we can multiply each element by a scalar (subject, of course, to additional requirements as given in the definition).
Part I—Basic Linear Algebra
Chapter 1
Vector Spaces
Vector Spaces
Let us begin with the definition of one of our principal objects of study.
Definition Let $F$ be a field, whose elements are referred to as scalars. A vector space over $F$ is a nonempty set $V$, whose elements are referred to as vectors, together with two operations. The first operation, called addition and denoted by $+$, assigns to each pair $(u,v)$ of vectors in $V$ a vector $u + v$ in $V$. The second operation, called scalar multiplication and denoted by juxtaposition, assigns to each pair $(r,u) \in F \times V$ a vector $ru$ in $V$. Furthermore, the following properties must be satisfied:
1) (Associativity of addition) For all vectors $u, v, w \in V$,
$$u + (v + w) = (u + v) + w$$
2) (Commutativity of addition) For all vectors $u, v \in V$,
$$u + v = v + u$$
3) (Existence of a zero) There is a vector $0 \in V$ with the property that
$$0 + u = u + 0 = u$$
for all vectors $u \in V$.
4) (Existence of additive inverses) For each vector $u \in V$, there is a vector in $V$, denoted by $-u$, with the property that
$$u + (-u) = (-u) + u = 0$$
5) (Properties of scalar multiplication) For all scalars $r, s \in F$ and for all vectors $u, v \in V$,
$$r(u + v) = ru + rv,\quad (r + s)u = ru + su,\quad (rs)u = r(su),\quad 1u = u$$
Note that the first four properties in the definition of vector space can be summarized by saying that $V$ is an abelian group under addition.

Any expression of the form
$$r_1 v_1 + \cdots + r_n v_n$$
where $r_i \in F$ and $v_i \in V$ for all $i$, is called a linear combination of the vectors $v_1, \dots, v_n$. If at least one of the scalars $r_i$ is nonzero, then the linear combination is nontrivial.

Example 1.1
1) Let $F$ be a field. The set $F^F$ of all functions from $F$ to $F$ is a vector space over $F$, under the operations of ordinary addition and scalar multiplication of functions:
$$(f + g)(x) = f(x) + g(x) \quad\text{and}\quad (af)(x) = a(f(x))$$
2) The set $\mathcal{M}_{m,n}(F)$ of all $m \times n$ matrices with entries in a field $F$ is a vector space over $F$, under the operations of matrix addition and scalar multiplication.
3) The set $F^n$ of all ordered $n$-tuples whose components lie in a field $F$ is a vector space over $F$, with addition and scalar multiplication defined componentwise:
$$(a_1, \dots, a_n) + (b_1, \dots, b_n) = (a_1 + b_1, \dots, a_n + b_n)$$
and
$$c(a_1, \dots, a_n) = (ca_1, \dots, ca_n)$$
When convenient, we will also write the elements of $F^n$ in column form. When $F$ is a finite field $F_q$ with $q$ elements, we write $V(n,q)$ for $F_q^n$.
4) Many sequence spaces are vector spaces. The set $\operatorname{Seq}(F)$ of all infinite sequences with members from a field $F$ is a vector space under componentwise operations:
$$(s_n) + (t_n) = (s_n + t_n)$$
and
$$c(s_n) = (cs_n)$$
In a similar way, the set of all sequences of complex numbers that converge to $0$ is a vector space, as is the set $\ell^\infty$ of all bounded complex sequences. Also, if $p$ is a positive integer then the set $\ell^p$ of all complex sequences $(s_n)$ for which
$$\sum_{n=1}^{\infty} |s_n|^p < \infty$$
is a vector space under componentwise operations. To see that addition is a binary operation on $\ell^p$, one verifies Minkowski's inequality
$$\left( \sum_{n=1}^{\infty} |s_n + t_n|^p \right)^{1/p} \le \left( \sum_{n=1}^{\infty} |s_n|^p \right)^{1/p} + \left( \sum_{n=1}^{\infty} |t_n|^p \right)^{1/p}$$
which we will not do here.
Subspaces
Most algebraic structures contain substructures, and vector spaces are no exception.
Definition A subspace of a vector space $V$ is a subset $S$ of $V$ that is a vector space in its own right under the operations obtained by restricting the operations of $V$ to $S$.
Since many of the properties of addition and scalar multiplication hold a fortiori in a nonempty subset $S$, we can establish that $S$ is a subspace merely by checking that $S$ is closed under the operations of $V$.
Theorem 1.1 A nonempty subset $S$ of a vector space $V$ is a subspace of $V$ if and only if $S$ is closed under addition and scalar multiplication or, equivalently, $S$ is closed under linear combinations, that is,
$$a, b \in F,\ u, v \in S \Rightarrow au + bv \in S$$
Example 1.2 Consider the vector space $V(n,2)$ of all binary $n$-tuples, that is, $n$-tuples of $0$'s and $1$'s. The weight $\mathcal{W}(v)$ of a vector $v \in V(n,2)$ is the number of nonzero coordinates in $v$. For instance, $\mathcal{W}(101010) = 3$. Let $E_n$ be the set of all vectors in $V(n,2)$ of even weight. Then $E_n$ is a subspace of $V(n,2)$. To see this, note that
$$\mathcal{W}(u + v) = \mathcal{W}(u) + \mathcal{W}(v) - 2\mathcal{W}(u \cap v)$$
where $u \cap v$ is the vector in $V(n,2)$ whose $i$th component is the product of the $i$th components of $u$ and $v$, that is,
$$(u \cap v)_i = u_i \cdot v_i$$
Hence, if $\mathcal{W}(u)$ and $\mathcal{W}(v)$ are both even, so is $\mathcal{W}(u + v)$. Finally, scalar multiplication over $F_2$ is trivial and so $E_n$ is a subspace of $V(n,2)$, known as the even weight subspace of $V(n,2)$.
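The weight identity and the closure of $E_n$ can be checked exhaustively for small $n$. A short Python sketch (the helper names are our own):

    from itertools import product

    def weight(v):
        return sum(v)                       # number of nonzero coordinates

    def add(u, v):
        return tuple((a + b) % 2 for a, b in zip(u, v))   # addition in V(n, 2)

    # Verify W(u + v) = W(u) + W(v) - 2 W(u ∩ v) and closure of E_4
    n = 4
    for u in product((0, 1), repeat=n):
        for v in product((0, 1), repeat=n):
            meet = tuple(a * b for a, b in zip(u, v))
            assert weight(add(u, v)) == weight(u) + weight(v) - 2 * weight(meet)
            if weight(u) % 2 == 0 and weight(v) % 2 == 0:
                assert weight(add(u, v)) % 2 == 0   # E_4 is closed under addition
    print("identity and closure verified for V(4, 2)")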
Example 1.3 Any subspace of the vector space $V(n,q)$ is called a linear code. Linear codes are among the most important and most studied types of codes, because their structure allows for efficient encoding and decoding of information.
The Lattice of Subspaces
The set $\mathcal{S}(V)$ of all subspaces of a vector space $V$ is partially ordered by set inclusion. The zero subspace $\{0\}$ is the smallest element in $\mathcal{S}(V)$ and the entire space $V$ is the largest element.

If $S, T \in \mathcal{S}(V)$ then $S \cap T$ is the largest subspace of $V$ that is contained in both $S$ and $T$. In terms of set inclusion, $S \cap T$ is the greatest lower bound of $S$ and $T$:
$$S \cap T = \operatorname{glb}\{S, T\}$$
Similarly, if $\{S_k \mid k \in K\}$ is any collection of subspaces of $V$ then their intersection is the greatest lower bound of the subspaces:
$$\bigcap_{k \in K} S_k = \operatorname{glb}\{S_k \mid k \in K\}$$
On the other hand, if $S, T \in \mathcal{S}(V)$ (and $F$ is infinite) then $S \cup T \in \mathcal{S}(V)$ if and only if $S \subseteq T$ or $T \subseteq S$. Thus, the union of two subspaces is never a subspace in any "interesting" case. We also have the following.

Theorem 1.2 A nontrivial vector space $V$ over an infinite field $F$ is not the union of a finite number of proper subspaces.
Proof. Suppose that $V = S_1 \cup \cdots \cup S_n$, where we may assume that
$$S_1 \not\subseteq S_2 \cup \cdots \cup S_n$$
Let $w \in S_1 \setminus (S_2 \cup \cdots \cup S_n)$ and let $v \notin S_1$. Consider the infinite set
$$A = \{rw + v \mid r \in F\}$$
which is the "line" through $v$, parallel to $w$. We want to show that each $S_i$ contains at most one vector from the infinite set $A$, which is contrary to the fact that $V = S_1 \cup \cdots \cup S_n$. This will prove the theorem.
If $rw + v \in S_1$ for $r \ne 0$ then $w \in S_1$ implies $v \in S_1$, contrary to assumption. Next, suppose that $r_1 w + v \in S_i$ and $r_2 w + v \in S_i$, for $i \ge 2$, where $r_1 \ne r_2$. Then
$$S_i \ni (r_1 w + v) - (r_2 w + v) = (r_1 - r_2)w$$
and so $w \in S_i$, which is also contrary to assumption.
To determine the smallest subspace of $V$ containing the subspaces $S$ and $T$, we make the following definition.
Definition Let $S$ and $T$ be subspaces of $V$. The sum $S + T$ is defined by
$$S + T = \{u + v \mid u \in S,\ v \in T\}$$
More generally, the sum of any collection $\{S_k \mid k \in K\}$ of subspaces is the set of all finite sums of vectors from the union $\bigcup S_k$:
$$\sum_{k \in K} S_k = \left\{ s_1 + \cdots + s_n \;\middle|\; s_i \in \bigcup_{k \in K} S_k \right\}$$

It is not hard to show that the sum of any collection of subspaces of $V$ is a subspace of $V$ and that, in terms of set inclusion, the sum is the least upper bound:
$$S + T = \operatorname{lub}\{S, T\}$$
More generally,
$$\sum_{k \in K} S_k = \operatorname{lub}\{S_k \mid k \in K\}$$

If a partially ordered set $P$ has the property that every pair of elements has a least upper bound and greatest lower bound, then $P$ is called a lattice. If $P$ has a smallest element and a largest element and has the property that every collection of elements has a least upper bound and greatest lower bound, then $P$ is called a complete lattice.

Theorem 1.3 The set $\mathcal{S}(V)$ of all subspaces of a vector space $V$ is a complete lattice under set inclusion, with smallest element $\{0\}$, largest element $V$,
$$\operatorname{glb}\{S_k \mid k \in K\} = \bigcap_{k \in K} S_k$$
and
$$\operatorname{lub}\{S_k \mid k \in K\} = \sum_{k \in K} S_k$$
Direct Sums As we will see, there are many ways to construct new vector spaces from old ones.
External Direct Sums
Definition Let $V_1, \dots, V_n$ be vector spaces over a field $F$. The external direct sum of $V_1, \dots, V_n$, denoted by
$$V = V_1 \boxplus \cdots \boxplus V_n$$
is the vector space $V$ whose elements are ordered $n$-tuples:
$$V = \{(v_1, \dots, v_n) \mid v_i \in V_i,\ i = 1, \dots, n\}$$
with componentwise operations
$$(u_1, \dots, u_n) + (v_1, \dots, v_n) = (u_1 + v_1, \dots, u_n + v_n)$$
and
$$r(v_1, \dots, v_n) = (rv_1, \dots, rv_n)$$
Example 1.4 The vector space $F^n$ is the external direct sum of $n$ copies of $F$, that is,
$$F^n = F \boxplus \cdots \boxplus F$$
where there are $n$ summands on the right-hand side.
This construction can be generalized to any collection of vector spaces by generalizing the idea that an ordered $n$-tuple $(v_1, \dots, v_n)$ is just a function $f: \{1, \dots, n\} \to \bigcup V_i$ from the index set $\{1, \dots, n\}$ to the union of the spaces with the property that $f(i) \in V_i$.
Definition Let $\mathcal{F} = \{V_k \mid k \in K\}$ be any family of vector spaces over $F$. The direct product of $\mathcal{F}$ is the vector space
$$\prod_{k \in K} V_k = \left\{ f: K \to \bigcup_{k \in K} V_k \;\middle|\; f(k) \in V_k \right\}$$
thought of as a subspace of the vector space of all functions from $K$ to $\bigcup V_k$.
It will prove more useful to restrict the set of functions to those with finite support.
Definition Let $\mathcal{F} = \{V_k \mid k \in K\}$ be a family of vector spaces over $F$. The support of a function $f: K \to \bigcup V_k$ is the set
$$\operatorname{supp}(f) = \{k \in K \mid f(k) \ne 0\}$$
Thus, a function $f$ has finite support if $f(k) = 0$ for all but a finite number of $k \in K$.

The external direct sum of the family $\mathcal{F}$ is the vector space
$$\bigoplus_{k \in K}^{\text{ext}} V_k = \left\{ f: K \to \bigcup_{k \in K} V_k \;\middle|\; f(k) \in V_k,\ f \text{ has finite support} \right\}$$
thought of as a subspace of the vector space of all functions from $K$ to $\bigcup V_k$.
An important special case occurs when $V_k = V$ for all $k \in K$. If we let $V^K$ denote the set of all functions from $K$ to $V$ and $(V^K)_0$ denote the set of all functions in $V^K$ that have finite support then
$$\prod_{k \in K} V = V^K \quad\text{and}\quad \bigoplus_{k \in K}^{\text{ext}} V = (V^K)_0$$
Note that the direct product and the external direct sum are the same for a finite family of vector spaces.
Internal Direct Sums
An internal version of the direct sum construction is often more relevant.
Definition Let $V$ be a vector space. We say that $V$ is the (internal) direct sum of a family $\mathcal{F} = \{S_k \mid k \in K\}$ of subspaces of $V$ if every vector $v \in V$ can be written, in a unique way (except for order), as a finite sum of vectors from the subspaces in $\mathcal{F}$, that is, if for all $v \in V$,
$$v = u_1 + \cdots + u_n$$
for $u_i \in S_{k_i}$ and furthermore, if
$$v = w_1 + \cdots + w_m$$
where $w_j \in S_{l_j}$, then $m = n$ and (after reindexing if necessary) $w_i = u_i$ for all $i = 1, \dots, n$. If $V$ is the direct sum of $\mathcal{F}$, we write
$$V = \bigoplus_{k \in K} S_k$$
and refer to each $S_k$ as a direct summand of $V$. If $\mathcal{F} = \{S_1, \dots, S_n\}$ is a finite family, we write
$$V = S_1 \oplus \cdots \oplus S_n$$
If $V = S \oplus T$ then $T$ is called a complement of $S$ in $V$.
Note that a sum is direct if and only if whenever $u_1 + \cdots + u_n = 0$, where $u_i \in S_{k_i}$ and $k_i \ne k_j$ for $i \ne j$, then $u_i = 0$ for all $i$, that is, if and only if $0$ has a unique representation as a sum of vectors from distinct subspaces.

The reader will be asked in a later chapter to show that the concepts of internal and external direct sum are essentially equivalent (isomorphic). For this reason, we often use the term "direct sum" without qualification.

Once we have discussed the concept of a basis, the following theorem can be easily proved.
Theorem 1.4 Any subspace of a vector space has a complement, that is, if $S$ is a subspace of $V$ then there exists a subspace $T$ for which $V = S \oplus T$.
It should be emphasized that a subspace generally has many complements (although they are isomorphic). The reader can easily find examples of this in $\mathbb{R}^2$. We will have more to say about the existence and uniqueness of complements later in the book.

The following characterization of direct sums is quite useful.
Theorem 1.5 A vector space $V$ is the direct sum of a family $\mathcal{F} = \{S_k \mid k \in K\}$ of subspaces if and only if
1) $V$ is the sum of the $S_k$:
$$V = \sum_{k \in K} S_k$$
2) For each $k \in K$,
$$S_k \cap \left( \sum_{j \ne k} S_j \right) = \{0\}$$
Proof. Suppose first that $V$ is the direct sum of $\mathcal{F}$. Then 1) certainly holds and if
$$v \in S_k \cap \left( \sum_{j \ne k} S_j \right)$$
then $v = s_k$ for some $s_k \in S_k$ and
$$v = s_{j_1} + \cdots + s_{j_n}$$
where $s_{j_i} \in S_{j_i}$ and $j_i \ne k$ for all $i = 1, \dots, n$. Hence, by the uniqueness of direct sum representations, all of these terms are $0$ and so $v = 0$. Thus, 2) holds.

For the converse, suppose that 1) and 2) hold. We need only verify the uniqueness condition. If
$$v = s_{k_1} + \cdots + s_{k_n}$$
and
$$v = t_{k_1} + \cdots + t_{k_n}$$
where $s_{k_i} \in S_{k_i}$ and $t_{k_i} \in S_{k_i}$ (by including additional terms equal to $0$, we may assume that the index sets of the two sums are the same set $\{k_1, \dots, k_n\}$), then
$$(s_{k_1} - t_{k_1}) + \cdots + (s_{k_n} - t_{k_n}) = 0$$
Hence, each term $s_{k_u} - t_{k_u} \in S_{k_u}$ is a sum of vectors from subspaces other than $S_{k_u}$, which can happen only if $s_{k_u} - t_{k_u} = 0$. Thus, $s_{k_u} = t_{k_u}$ for all $u$ and $V$ is the direct sum of $\mathcal{F}$.
Example 1.5 Any matrix $A \in \mathcal{M}_n$ can be written in the form
$$A = \tfrac{1}{2}(A + A^t) + \tfrac{1}{2}(A - A^t) = B + C \tag{1.1}$$
where $A^t$ is the transpose of $A$. It is easy to verify that $B$ is symmetric and $C$ is skew-symmetric and so (1.1) is a decomposition of $A$ as the sum of a symmetric matrix and a skew-symmetric matrix.

Since the sets $\operatorname{Sym}$ and $\operatorname{SkewSym}$ of all symmetric and skew-symmetric matrices in $\mathcal{M}_n$ are subspaces of $\mathcal{M}_n$, we have
$$\mathcal{M}_n = \operatorname{Sym} + \operatorname{SkewSym}$$
Furthermore, if $S + T = S' + T'$, where $S$ and $S'$ are symmetric and $T$ and $T'$ are skew-symmetric, then the matrix
$$U = S - S' = T' - T$$
is both symmetric and skew-symmetric. Hence, provided that $\operatorname{char}(F) \ne 2$, we must have $U = 0$ and so $S = S'$ and $T = T'$. Thus,
$$\mathcal{M}_n = \operatorname{Sym} \oplus \operatorname{SkewSym}$$
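In numerical form, the decomposition (1.1) looks as follows (a NumPy sketch; the sample matrix is our own):

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])

    B = (A + A.T) / 2          # symmetric part
    C = (A - A.T) / 2          # skew-symmetric part

    assert np.allclose(B, B.T)        # B is symmetric
    assert np.allclose(C, -C.T)       # C is skew-symmetric
    assert np.allclose(A, B + C)      # A = B + C, the decomposition (1.1)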
Spanning Sets and Linear Independence A set of vectors spans a vector space if every vector can be written as a linear combination of some of the vectors in that set. Here is the formal definition.
Definition The subspace spanned (or subspace generated) by a set $S$ of vectors in $V$ is the set of all linear combinations of vectors from $S$:
$$\langle S \rangle = \operatorname{span}(S) = \{r_1 v_1 + \cdots + r_n v_n \mid r_i \in F,\ v_i \in S\}$$
When $S = \{v_1, \dots, v_n\}$ is a finite set, we use the notation $\langle v_1, \dots, v_n \rangle$ or $\operatorname{span}(v_1, \dots, v_n)$. A set $S$ of vectors in $V$ is said to span $V$, or generate $V$, if $V = \operatorname{span}(S)$, that is, if every vector $v \in V$ can be written in the form
$$v = r_1 v_1 + \cdots + r_n v_n$$
for some scalars $r_1, \dots, r_n$ and vectors $v_1, \dots, v_n \in S$.
It is clear that any superset of a spanning set is also a spanning set. Note also that all vector spaces have spanning sets, since the entire space is a spanning set.
Definition A nonempty set $S$ of vectors in $V$ is linearly independent if for any distinct vectors $v_1, \dots, v_n$ in $S$, we have
$$r_1 v_1 + \cdots + r_n v_n = 0 \Rightarrow r_1 = \cdots = r_n = 0$$
If a set of vectors is not linearly independent, it is said to be linearly dependent.
It follows from the definition that any nonempty subset of a linearly independent set is linearly independent.
Theorem 1.6 Let $S$ be a set of vectors in $V$. The following are equivalent:
1) $S$ is linearly independent.
2) Every vector in $\operatorname{span}(S)$ has a unique expression as a linear combination of the vectors in $S$.
3) No vector in $S$ is a linear combination of the other vectors in $S$.
The following key theorem relates the notions of spanning set and linear independence.
Theorem 1.7 Let $S$ be a set of vectors in $V$. The following are equivalent:
1) $S$ is linearly independent and spans $V$.
2) For every vector $v \in V$, there is a unique set of vectors $v_1, \dots, v_n$ in $S$, along with a unique set of scalars $r_1, \dots, r_n$ in $F$, for which
$$v = r_1 v_1 + \cdots + r_n v_n$$
3) $S$ is a minimal spanning set, that is, $S$ spans $V$ but any proper subset of $S$ does not span $V$.
4) $S$ is a maximal linearly independent set, that is, $S$ is linearly independent, but any proper superset of $S$ is not linearly independent.
Proof. We leave it to the reader to show that 1) and 2) are equivalent. Now suppose 1) holds. Then $S$ is a spanning set. If some proper subset $S'$ of $S$ also
spanned $V$ then any vector in $S - S'$ would be a linear combination of the vectors in $S'$, contradicting the fact that the vectors in $S$ are linearly independent. Hence 1) implies 3). Conversely, if $S$ is a minimal spanning set then it must be linearly independent. For if not, some vector $s \in S$ would be a linear combination of the other vectors in $S$ and so $S - \{s\}$ would be a proper spanning subset of $S$, which is not possible. Hence 3) implies 1).

Suppose again that 1) holds. If $S$ were not maximal, there would be a vector $v \in V - S$ for which the set $S \cup \{v\}$ is linearly independent. But then $v$ is not in the span of $S$, contradicting the fact that $S$ is a spanning set. Hence, $S$ is a maximal linearly independent set and so 1) implies 4). Conversely, if $S$ is a maximal linearly independent set then $S$ must span $V$, for if not, we could find a vector $v \in V - S$ that is not a linear combination of the vectors in $S$. Hence, $S \cup \{v\}$ would be a linearly independent proper superset of $S$, which is a contradiction. Thus, 4) implies 1).
Definition A set of vectors in $V$ that satisfies any (and hence all) of the equivalent conditions in Theorem 1.7 is called a basis for $V$.
Corollary 1.8 A finite set $S = \{v_1, \dots, v_n\}$ of vectors in $V$ is a basis for $V$ if and only if
$$V = \langle v_1 \rangle \oplus \cdots \oplus \langle v_n \rangle$$
Example 1.6 The $i$th standard vector in $F^n$ is the vector $e_i$ that has $0$'s in all coordinate positions except the $i$th, where it has a $1$. Thus,
$$e_1 = (1, 0, \dots, 0),\quad e_2 = (0, 1, \dots, 0),\ \dots,\ e_n = (0, \dots, 0, 1)$$
The set $\{e_1, \dots, e_n\}$ is called the standard basis for $F^n$.
The proof that every nontrivial vector space has a basis is a classic example of the use of Zorn's lemma.
Theorem 1.9 Let $V$ be a nonzero vector space. Let $I$ be a linearly independent set in $V$ and let $S$ be a spanning set in $V$ containing $I$. Then there is a basis $\mathcal{B}$ for $V$ for which $I \subseteq \mathcal{B} \subseteq S$. In particular,
1) Any vector space, except the zero space $\{0\}$, has a basis.
2) Any linearly independent set in $V$ is contained in a basis.
3) Any spanning set in $V$ contains a basis.
Proof. Consider the collection $\mathcal{A}$ of all linearly independent subsets of $V$ containing $I$ and contained in $S$. This collection is not empty, since $I \in \mathcal{A}$. Now, if
$\mathcal{C} = \{I_k \mid k \in K\}$ is a chain in $\mathcal{A}$ then the union
$$U = \bigcup_{k \in K} I_k$$
is linearly independent and satisfies $I \subseteq U \subseteq S$, that is, $U \in \mathcal{A}$. Hence, every chain in $\mathcal{A}$ has an upper bound in $\mathcal{A}$ and according to Zorn's lemma, $\mathcal{A}$ must contain a maximal element $\mathcal{B}$, which is linearly independent.

Now, $\mathcal{B}$ is a basis for the vector space $\langle S \rangle = V$, for if any $s \in S$ is not a linear combination of the elements of $\mathcal{B}$ then $\mathcal{B} \cup \{s\} \subseteq S$ is linearly independent, contradicting the maximality of $\mathcal{B}$. Hence $S \subseteq \langle \mathcal{B} \rangle$ and so $V = \langle S \rangle \subseteq \langle \mathcal{B} \rangle$.
The reader can now show, using Theorem 1.9, that any subspace of a vector space has a complement.
The Dimension of a Vector Space
The next result, with its classical elegant proof, says that if a vector space $V$ has a finite spanning set $S$ then the size of any linearly independent set cannot exceed the size of $S$.
Theorem 1.10 Let $V$ be a vector space and assume that the vectors $v_1, \dots, v_n$ are linearly independent and the vectors $s_1, \dots, s_m$ span $V$. Then $n \le m$.
Proof. First, we list the two sets of vectors: the spanning set followed by the linearly independent set:
$$s_1, \dots, s_m;\ v_1, \dots, v_n$$
Then we move the first vector $v_1$ to the front of the first list:
$$v_1, s_1, \dots, s_m;\ v_2, \dots, v_n$$
Since $s_1, \dots, s_m$ span $V$, $v_1$ is a linear combination of the $s_i$'s. This implies that we may remove one of the $s_i$'s, which by reindexing if necessary can be $s_1$, from the first list and still have a spanning set:
$$v_1, s_2, \dots, s_m;\ v_2, \dots, v_n$$
Note that the first set of vectors still spans $V$ and the second set is still linearly independent.

Now we repeat the process, moving $v_2$ from the second list to the first list:
$$v_2, v_1, s_2, \dots, s_m;\ v_3, \dots, v_n$$
As before, the vectors in the first list are linearly dependent, since they spanned $V$ before the inclusion of $v_2$. However, since the $v_i$'s are linearly independent, any nontrivial linear combination of the vectors in the first list that equals $0$
must involve at least one of the $s_i$'s. Hence, we may remove that vector, which again by reindexing if necessary may be taken to be $s_2$, and still have a spanning set:
$$v_2, v_1, s_3, \dots, s_m;\ v_3, \dots, v_n$$
Once again, the first set of vectors spans $V$ and the second set is still linearly independent. Now, if $n > m$, then this process will eventually exhaust the $s_i$'s and lead to the list
$$v_m, v_{m-1}, \dots, v_1;\ v_{m+1}, \dots, v_n$$
where $v_1, v_2, \dots, v_m$ span $V$, which is clearly not possible since $v_{m+1}$ is not in the span of $v_1, v_2, \dots, v_m$. Hence, $n \le m$.
Corollary 1.11 If $V$ has a finite spanning set then any two bases of $V$ have the same size.
Now let us prove Corollary 1.11 for arbitrary vector spaces.
Theorem 1.12 If $V$ is a vector space then any two bases for $V$ have the same cardinality.
Proof. We may assume that all bases for $V$ are infinite sets, for if any basis is finite then $V$ has a finite spanning set and so Corollary 1.11 applies. Let $\mathcal{B} = \{b_i \mid i \in I\}$ be a basis for $V$ and let $\mathcal{C}$ be another basis for $V$. Then any vector $c \in \mathcal{C}$ can be written as a finite linear combination of the vectors in $\mathcal{B}$, where all of the coefficients are nonzero, say
$$c = \sum_{i \in U_c} r_i b_i$$
But because $\mathcal{C}$ is a basis, we must have
$$\bigcup_{c \in \mathcal{C}} U_c = I$$
for if the vectors in $\mathcal{C}$ can be expressed as finite linear combinations of the vectors in a proper subset $\mathcal{B}'$ of $\mathcal{B}$ then $\mathcal{B}'$ spans $V$, which is not the case. Since $|U_c| \le \aleph_0$ for all $c \in \mathcal{C}$, Theorem 0.16 implies that
$$|\mathcal{B}| = |I| \le \aleph_0 |\mathcal{C}| = |\mathcal{C}|$$
But we may also reverse the roles of $\mathcal{B}$ and $\mathcal{C}$, to conclude that $|\mathcal{C}| \le |\mathcal{B}|$ and so $|\mathcal{B}| = |\mathcal{C}|$ by the Schröder–Bernstein theorem.
Theorem 1.12 allows us to make the following definition.
Definition A vector space $V$ is finite-dimensional if it is the zero space $\{0\}$, or if it has a finite basis. All other vector spaces are infinite-dimensional. The dimension of the zero space is $0$ and the dimension of any nonzero vector space $V$ is the cardinality of any basis for $V$. If a vector space $V$ has a basis of cardinality $\kappa$, we say that $V$ is $\kappa$-dimensional and write $\dim(V) = \kappa$.
It is easy to see that if $S$ is a subspace of $V$ then $\dim(S) \le \dim(V)$. If in addition, $\dim(S) = \dim(V) < \infty$ then $S = V$.
Theorem 1.13 Let $V$ be a vector space.
1) If $\mathcal{B}$ is a basis for $V$ and if $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2$ and $\mathcal{B}_1 \cap \mathcal{B}_2 = \emptyset$ then
$$V = \langle \mathcal{B}_1 \rangle \oplus \langle \mathcal{B}_2 \rangle$$
2) Let $V = S \oplus T$. If $\mathcal{B}_1$ is a basis for $S$ and $\mathcal{B}_2$ is a basis for $T$ then $\mathcal{B}_1 \cap \mathcal{B}_2 = \emptyset$ and $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2$ is a basis for $V$.
Theorem 1.14 Let $S$ and $T$ be subspaces of a vector space $V$. Then
$$\dim(S) + \dim(T) = \dim(S + T) + \dim(S \cap T)$$
In particular, if $T$ is any complement of $S$ in $V$ then
$$\dim(S) + \dim(T) = \dim(V)$$
that is,
$$\dim(S \oplus T) = \dim(S) + \dim(T)$$
Proof. Suppose that $\mathcal{B} = \{b_i \mid i \in I\}$ is a basis for $S \cap T$. Extend this to a basis $\mathcal{A} \cup \mathcal{B}$ for $S$, where $\mathcal{A} = \{a_j \mid j \in J\}$ is disjoint from $\mathcal{B}$. Also, extend $\mathcal{B}$ to a basis $\mathcal{B} \cup \mathcal{C}$ for $T$, where $\mathcal{C} = \{c_k \mid k \in K\}$ is disjoint from $\mathcal{B}$. We claim that $\mathcal{A} \cup \mathcal{B} \cup \mathcal{C}$ is a basis for $S + T$. It is clear that $\langle \mathcal{A} \cup \mathcal{B} \cup \mathcal{C} \rangle = S + T$. To see that $\mathcal{A} \cup \mathcal{B} \cup \mathcal{C}$ is linearly independent, suppose to the contrary that
$$r_1 v_1 + \cdots + r_n v_n = 0$$
where $v_i \in \mathcal{A} \cup \mathcal{B} \cup \mathcal{C}$ and $r_i \ne 0$ for all $i$. There must be vectors $v_i$ in this expression from both $\mathcal{A}$ and $\mathcal{C}$, since $\mathcal{A} \cup \mathcal{B}$ and $\mathcal{B} \cup \mathcal{C}$ are linearly independent. Isolating the terms involving the vectors from $\mathcal{A}$ on one side of the equality shows that there is a nonzero vector $x \in \langle \mathcal{A} \rangle \cap \langle \mathcal{B} \cup \mathcal{C} \rangle$. But then $x \in S \cap T$ and so $x \in \langle \mathcal{A} \rangle \cap \langle \mathcal{B} \rangle$, which implies that $x = 0$, a contradiction. Hence, $\mathcal{A} \cup \mathcal{B} \cup \mathcal{C}$ is linearly independent and a basis for $S + T$.
Now,
$$\dim(S) + \dim(T) = |\mathcal{A} \cup \mathcal{B}| + |\mathcal{B} \cup \mathcal{C}| = |\mathcal{A}| + |\mathcal{B}| + |\mathcal{B}| + |\mathcal{C}| = |\mathcal{A}| + |\mathcal{B}| + |\mathcal{C}| + \dim(S \cap T) = \dim(S + T) + \dim(S \cap T)$$
as desired.
It is worth emphasizing that while the equation
$$\dim(S) + \dim(T) = \dim(S + T) + \dim(S \cap T)$$
holds for all vector spaces, we cannot write
$$\dim(S + T) = \dim(S) + \dim(T) - \dim(S \cap T)$$
unless $S + T$ is finite-dimensional.
Ordered Bases and Coordinate Matrices
It will be convenient to consider bases that have an order imposed upon their members.
Definition Let $V$ be a vector space of dimension $n$. An ordered basis for $V$ is an ordered $n$-tuple $(v_1, \dots, v_n)$ of vectors for which the set $\{v_1, \dots, v_n\}$ is a basis for $V$.
If $\mathcal{B} = (v_1, \dots, v_n)$ is an ordered basis for $V$ then for each $v \in V$ there is a unique ordered $n$-tuple $(r_1, \dots, r_n)$ of scalars for which
$$v = r_1 v_1 + \cdots + r_n v_n$$
Accordingly, we can define the coordinate map $\phi_{\mathcal{B}}: V \to F^n$ by
$$\phi_{\mathcal{B}}(v) = [v]_{\mathcal{B}} = \begin{pmatrix} r_1 \\ \vdots \\ r_n \end{pmatrix} \tag{1.3}$$
where the column matrix $[v]_{\mathcal{B}}$ is known as the coordinate matrix of $v$ with respect to the ordered basis $\mathcal{B}$. Clearly, knowing $[v]_{\mathcal{B}}$ is equivalent to knowing $v$ (assuming knowledge of $\mathcal{B}$). Furthermore, it is easy to see that the coordinate map $\phi_{\mathcal{B}}$ is bijective and preserves the vector space operations, that is,
$$\phi_{\mathcal{B}}(r_1 u_1 + \cdots + r_n u_n) = r_1 \phi_{\mathcal{B}}(u_1) + \cdots + r_n \phi_{\mathcal{B}}(u_n)$$
or equivalently
$$[r_1 u_1 + \cdots + r_n u_n]_{\mathcal{B}} = r_1 [u_1]_{\mathcal{B}} + \cdots + r_n [u_n]_{\mathcal{B}}$$
Functions from one vector space to another that preserve the vector space operations are called linear transformations and form the objects of study in the next chapter.
The Row and Column Spaces of a Matrix
Let $A$ be an $m \times n$ matrix over $F$. The rows of $A$ span a subspace of $F^n$ known as the row space of $A$ and the columns of $A$ span a subspace of $F^m$ known as the column space of $A$. The dimensions of these spaces are called the row rank and column rank, respectively. We denote the row space and row rank by $\operatorname{rs}(A)$ and $\operatorname{rrk}(A)$ and the column space and column rank by $\operatorname{cs}(A)$ and $\operatorname{crk}(A)$.

It is a remarkable and useful fact that the row rank of a matrix is always equal to its column rank, despite the fact that if $m \ne n$, the row space and column space are not even in the same vector space! Our proof of this fact hinges upon the following simple observation about matrices.
Lemma 1.15 Let $A$ be an $m \times n$ matrix. Then elementary column operations do not affect the row rank of $A$. Similarly, elementary row operations do not affect the column rank of $A$.
Proof. The second statement follows from the first by taking transposes. As to the first, the row space of $A$ is
$$\operatorname{rs}(A) = \langle e_1 A, \dots, e_m A \rangle$$
where $e_i$ are the standard basis (row) vectors in $F^m$. Performing an elementary column operation on $A$ is equivalent to multiplying $A$ on the right by an elementary matrix $E$. Hence the row space of $AE$ is
$$\operatorname{rs}(AE) = \langle e_1 AE, \dots, e_m AE \rangle$$
and since $E$ is invertible,
$$\operatorname{rrk}(A) = \dim(\operatorname{rs}(A)) = \dim(\operatorname{rs}(AE)) = \operatorname{rrk}(AE)$$
as desired.
Theorem 1.16 If $A \in \mathcal{M}_{m,n}$ then $\operatorname{rrk}(A) = \operatorname{crk}(A)$. This number is called the rank of $A$ and is denoted by $\operatorname{rk}(A)$.
Proof. According to the previous lemma, we may reduce $A$ to reduced column echelon form without affecting the row rank. But this reduction does not affect the column rank either. Then we may further reduce $A$ to reduced row echelon form without affecting either rank. The resulting matrix $M$ has the same row and column ranks as $A$. But $M$ is a matrix with $1$'s followed by $0$'s on the main
diagonal (entries $M_{1,1}, M_{2,2}, \dots$) and $0$'s elsewhere. Hence,
$$\operatorname{rrk}(A) = \operatorname{rrk}(M) = \operatorname{crk}(M) = \operatorname{crk}(A)$$
as desired.
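Numerically, the equality of the two ranks is easy to observe. A small NumPy sketch (the sample matrix is our own):

    import numpy as np

    A = np.array([[1, 2, 3, 4],
                  [2, 4, 6, 8],       # twice row 1
                  [1, 0, 1, 0]])

    row_rank = np.linalg.matrix_rank(A)      # rank computed from A
    col_rank = np.linalg.matrix_rank(A.T)    # rank of the transpose
    assert row_rank == col_rank == 2
    print("rk(A) =", row_rank)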
The Complexification of a Real Vector Space
If $W$ is a complex vector space (that is, a vector space over $\mathbb{C}$), then we can think of $W$ as a real vector space simply by restricting all scalars to the field $\mathbb{R}$. Let us denote this real vector space by $W_{\mathbb{R}}$ and call it the real version of $W$. On the other hand, to each real vector space $V$, we can associate a complex vector space $V^{\mathbb{C}}$. This "complexification" process will play a useful role when we discuss the structure of linear operators on a real vector space. (Throughout our discussion $V$ will denote a real vector space.)
Definition If $V$ is a real vector space then the set $V^{\mathbb{C}} = V \times V$ of ordered pairs, with componentwise addition
$$(u, v) + (x, y) = (u + x, v + y)$$
and scalar multiplication over $\mathbb{C}$ defined by
$$(a + bi)(u, v) = (au - bv, av + bu)$$
for $a, b \in \mathbb{R}$, is a complex vector space, called the complexification of $V$.

It is convenient to introduce a notation for vectors in $V^{\mathbb{C}}$ that resembles complex numbers. In particular, we denote $(u, v) \in V^{\mathbb{C}}$ by $u + iv$ and so
$$V^{\mathbb{C}} = \{u + iv \mid u, v \in V\}$$
Addition now looks like ordinary addition of complex numbers:
$$(u + iv) + (x + iy) = (u + x) + i(v + y)$$
and scalar multiplication looks like ordinary multiplication of complex numbers:
$$(a + bi)(u + iv) = (au - bv) + i(av + bu)$$
Thus, for example, we immediately have for $a, b \in \mathbb{R}$:
$$a(u + iv) = au + i(av)$$
$$bi(u + iv) = -bv + i(bu)$$
$$(a + bi)u = au + i(bu)$$
$$(a + bi)(iv) = -bv + i(av)$$
The real part of $z = u + iv$ is $u \in V$ and the imaginary part of $z$ is $v \in V$. The essence of the fact that $z = u + iv \in V^{\mathbb{C}}$ is really an ordered pair is that $z$ is $0$ if and only if its real and imaginary parts are both $0$.
We can define the complexification map $\operatorname{cpx}: V \to V^{\mathbb{C}}$ by
$$\operatorname{cpx}(v) = v + i0$$
Let us refer to $v + i0$ as the complexification, or complex version, of $v \in V$. Note that this map is a group homomorphism, that is,
$$\operatorname{cpx}(0) = 0 + i0 \quad\text{and}\quad \operatorname{cpx}(u \pm v) = \operatorname{cpx}(u) \pm \operatorname{cpx}(v)$$
and it is injective:
$$\operatorname{cpx}(u) = \operatorname{cpx}(v) \Leftrightarrow u = v$$
Also, it preserves multiplication by real scalars:
$$\operatorname{cpx}(ru) = ru + i0 = r(u + i0) = r\operatorname{cpx}(u)$$
for $r \in \mathbb{R}$. However, the complexification map is not surjective, since it gives only "real" vectors in $V^{\mathbb{C}}$.

The complexification map is an injective linear transformation from the real vector space $V$ to the real version $(V^{\mathbb{C}})_{\mathbb{R}}$ of the complexification $V^{\mathbb{C}}$, that is, to the complex vector space $V^{\mathbb{C}}$ provided that scalars are restricted to real numbers. In this way, we see that $V^{\mathbb{C}}$ contains an embedded copy of $V$.
The Dimension of $V^{\mathbb{C}}$
The vector-space dimensions of $V$ and $V^{\mathbb{C}}$ are the same. This should not necessarily come as a surprise, because although $V^{\mathbb{C}}$ may seem "bigger" than $V$, the field of scalars is also "bigger."
Theorem 1.17 If $\mathcal{B} = \{v_j \mid j \in J\}$ is a basis for $V$ over $\mathbb{R}$ then the complexification of $\mathcal{B}$,
$$\operatorname{cpx}(\mathcal{B}) = \{v + i0 \mid v \in \mathcal{B}\}$$
is a basis for the vector space $V^{\mathbb{C}}$ over $\mathbb{C}$. Hence,
$$\dim(V^{\mathbb{C}}) = \dim(V)$$
Proof. To see that $\operatorname{cpx}(\mathcal{B})$ spans $V^{\mathbb{C}}$ over $\mathbb{C}$, let $x + iy \in V^{\mathbb{C}}$. Then $x, y \in V$ and so there exist real numbers $r_j$ and $s_j$ (some of which may be $0$) for which
$$x + iy = \sum_{j=1}^{n} r_j v_j + i \left( \sum_{j=1}^{n} s_j v_j \right) = \sum_{j=1}^{n} (r_j v_j + i s_j v_j) = \sum_{j=1}^{n} (r_j + i s_j)(v_j + i0)$$
To see that $\operatorname{cpx}(\mathcal{B})$ is linearly independent, if
$$\sum_{j=1}^{n} (r_j + i s_j)(v_j + i0) = 0 + i0$$
then the previous computations show that
$$\sum_{j=1}^{n} r_j v_j = 0 \quad\text{and}\quad \sum_{j=1}^{n} s_j v_j = 0$$
The independence of $\mathcal{B}$ then implies that $r_j = 0$ and $s_j = 0$ for all $j$.

If $v \in V$ and $\mathcal{B}$ is a basis for $V$ then we may write
$$v = \sum_{j} r_j v_j$$
for $r_j \in \mathbb{R}$. Since the coefficients are real, we have
$$v + i0 = \sum_{j} r_j (v_j + i0)$$
and so the coordinate matrices are equal:
$$[v + i0]_{\operatorname{cpx}(\mathcal{B})} = [v]_{\mathcal{B}}$$
Exercises
1. Let $V$ be a vector space over $F$. Prove that $0v = 0$ and $r0 = 0$ for all $v \in V$ and $r \in F$. Describe the different $0$'s in these equations. Prove that if $rv = 0$ then $r = 0$ or $v = 0$. Prove that $rv = v$ implies that $v = 0$ or $r = 1$.
2. Prove Theorem 1.3.
3. a) Find an abelian group $V$ and a field $F$ for which $V$ is a vector space over $F$ in at least two different ways, that is, there are two different definitions of scalar multiplication making $V$ a vector space over $F$.
   b) Find a vector space $V$ over $F$ and a subset $S$ of $V$ that is (1) a subspace of $V$ and (2) a vector space using operations that differ from those of $V$.
4. Suppose that $V$ is a vector space with basis $\mathcal{B} = \{b_i \mid i \in I\}$ and $S$ is a subspace of $V$. Let $\{B_1, \dots, B_k\}$ be a partition of $\mathcal{B}$. Then is it true that
$$S = \bigoplus_{j=1}^{k} (S \cap \langle B_j \rangle)$$
What if $S \cap \langle B_j \rangle \ne \{0\}$ for all $j$?
5. Prove Corollary 1.8.
6. Let $S, T, U \in \mathcal{S}(V)$. Show that if $U \subseteq S$ then
$$S \cap (T + U) = (S \cap T) + U$$
This is called the modular law for the lattice $\mathcal{S}(V)$.
7. For what vector spaces does the distributive law of subspaces
$$S \cap (T + U) = (S \cap T) + (S \cap U)$$
hold?
8. A vector $v = (r_1, \dots, r_n) \in \mathbb{R}^n$ is called strongly positive if $r_i > 0$ for all $i = 1, \dots, n$.
   a) Suppose that $v$ is strongly positive. Show that any vector that is "close enough" to $v$ is also strongly positive. (Formulate carefully what "close enough" should mean.)
   b) Prove that if a subspace $S$ of $\mathbb{R}^n$ contains a strongly positive vector, then $S$ has a basis of strongly positive vectors.
9. Let $M$ be an $m \times n$ matrix whose rows are linearly independent. Suppose that the $k$ columns $c_{i_1}, \dots, c_{i_k}$ of $M$ span the column space of $M$. Let $C$ be the matrix obtained from $M$ by deleting all columns except $c_{i_1}, \dots, c_{i_k}$. Show that the rows of $C$ are also linearly independent.
10. Prove that the first two statements in Theorem 1.7 are equivalent.
11. Show that if $S$ is a subspace of a vector space $V$ then $\dim(S) \le \dim(V)$. Furthermore, if $\dim(S) = \dim(V) < \infty$ then $S = V$. Give an example to show that the finiteness is required in the second statement.
12. Let $\dim(V) < \infty$ and suppose that $V = U \oplus S_1 = U \oplus S_2$. What can you say about the relationship between $S_1$ and $S_2$? What can you say if $S_1 \subseteq S_2$?
13. What is the relationship between $S \oplus T$ and $T \oplus S$? Is the direct sum operation commutative? Formulate and prove a similar statement concerning associativity. Is there an "identity" for direct sum? What about "negatives"?
14. Let $V$ be a finite-dimensional vector space over an infinite field $F$. Prove that if $S_1, \dots, S_k$ are subspaces of $V$ of equal dimension then there is a subspace $T$ of $V$ for which $V = S_i \oplus T$ for all $i = 1, \dots, k$. In other words, $T$ is a common complement of the subspaces $S_i$.
15. Prove that the vector space $\mathcal{C}$ of all continuous functions from $\mathbb{R}$ to $\mathbb{R}$ is infinite-dimensional.
16. Show that Theorem 1.2 need not hold if the base field $F$ is finite.
17. Let $S$ be a subspace of $V$. The set $v + S = \{v + s \mid s \in S\}$ is called an affine subspace of $V$.
    a) Under what conditions is an affine subspace of $V$ a subspace of $V$?
    b) Show that any two affine subspaces of the form $v + S$ and $w + S$ are either equal or disjoint.
18. If $V$ and $W$ are vector spaces over $F$ for which $|V| = |W|$, then does it follow that $\dim(V) = \dim(W)$?
19. Let $V$ be an $n$-dimensional real vector space and suppose that $S$ is a subspace of $V$ with $\dim(S) = n - 1$. Define an equivalence relation $\equiv$ on the set $V \setminus S$ by $v \equiv w$ if the "line segment"
$$L(v, w) = \{\lambda v + (1 - \lambda)w \mid 0 \le \lambda \le 1\}$$
has the property that $L(v, w) \cap S = \emptyset$. Prove that $\equiv$ is an equivalence relation and that it has exactly two equivalence classes.
20. Let $F$ be a field. A subfield of $F$ is a subset $K$ of $F$ that is a field in its own right using the same operations as defined on $F$.
    a) Show that $F$ is a vector space over any subfield $K$ of $F$.
    b) Suppose that $F$ is an $m$-dimensional vector space over a subfield $K$ of $F$. If $V$ is an $n$-dimensional vector space over $F$, show that $V$ is also a vector space over $K$. What is the dimension of $V$ as a vector space over $K$?
21. Let $F$ be a finite field of size $q$ and let $V$ be an $n$-dimensional vector space over $F$. The purpose of this exercise is to show that the number of subspaces of $V$ of dimension $k$ is
$$\binom{n}{k}_q = \frac{(q^n - 1)\cdots(q^{n-k+1} - 1)}{(q^k - 1)\cdots(q - 1)}$$
The expressions $\binom{n}{k}_q$ are called Gaussian coefficients and have properties similar to those of the binomial coefficients. (A computational check appears after this exercise set.) Let $S(n,k)$ be the number of $k$-dimensional subspaces of $V$.
    a) Let $N(n,k)$ be the number of $k$-tuples of linearly independent vectors $(v_1, \dots, v_k)$ in $V$. Show that
$$N(n,k) = (q^n - 1)(q^n - q)\cdots(q^n - q^{k-1})$$
    b) Now, each of the $k$-tuples in a) can be obtained by first choosing a subspace of $V$ of dimension $k$ and then selecting the vectors from this subspace. Show that for any $k$-dimensional subspace of $V$, the number of $k$-tuples of independent vectors in this subspace is
$$(q^k - 1)(q^k - q)\cdots(q^k - q^{k-1})$$
    c) Show that
$$N(n,k) = S(n,k)(q^k - 1)(q^k - q)\cdots(q^k - q^{k-1})$$
How does this complete the proof?
22. Prove that any subspace $S$ of $\mathbb{R}^n$ is a closed set or, equivalently, that its set complement $S^c = \mathbb{R}^n \setminus S$ is open, that is, for any $x \in S^c$ there is an open ball $B(x, r)$ centered at $x$ with radius $r > 0$ for which $B(x, r) \subseteq S^c$.
23. Let $\mathcal{B} = \{v_1, \dots, v_n\}$ and $\mathcal{C} = \{w_1, \dots, w_n\}$ be bases for a vector space $V$. Let $1 \le k < n$. Show that there is a permutation $\sigma$ of $\{1, \dots, n\}$ such that
$$v_1, \dots, v_k, w_{\sigma(k+1)}, \dots, w_{\sigma(n)} \quad\text{and}\quad w_{\sigma(1)}, \dots, w_{\sigma(k)}, v_{k+1}, \dots, v_n$$
are both bases for $V$.
24. Let $\dim(V) = n$ and suppose that $S_1, \dots, S_k$ are subspaces of $V$ with $\dim(S_i) \le m$. Prove that there is a subspace $T$ of $V$ of dimension $n - m$ for which $T \cap S_i = \{0\}$ for all $i$.
25. What is the dimension of the complexification $V^{\mathbb{C}}$ thought of as a real vector space?
26. (When is a subspace of a complex vector space a complexification?) Let $V$ be a real vector space with complexification $V^{\mathbb{C}}$ and let $U$ be a subspace of $V^{\mathbb{C}}$. Prove that there is a subspace $S$ of $V$ for which
$$U = S^{\mathbb{C}} = \{s + it \mid s, t \in S\}$$
if and only if $U$ is closed under complex conjugation $\sigma: V^{\mathbb{C}} \to V^{\mathbb{C}}$ defined by $\sigma(u + iv) = u - iv$.
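The count in Exercise 21 can be verified by brute force for small parameters. A Python sketch over $F_2$ (the function names are our own):

    from itertools import product

    def gaussian(n, k, q):
        """Gaussian coefficient [n choose k]_q from Exercise 21."""
        num = den = 1
        for i in range(k):
            num *= q ** (n - i) - 1
            den *= q ** (k - i) - 1
        return num // den

    def independent(vectors, n):
        """Over F_2, vectors are independent iff every nontrivial
        0/1 combination of them is nonzero."""
        for coeffs in product((0, 1), repeat=len(vectors)):
            if any(coeffs):
                s = tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) % 2
                          for i in range(n))
                if not any(s):
                    return False
        return True

    n, k, q = 4, 2, 2
    space = list(product((0, 1), repeat=n))
    N = sum(independent(t, n) for t in product(space, repeat=k))
    per_subspace = (q**k - 1) * (q**k - q)           # part b) with k = 2
    assert N == gaussian(n, k, q) * per_subspace     # part c)
    print("number of 2-dimensional subspaces of V(4,2):", gaussian(n, k, q))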
Chapter 2
Linear Transformations
Linear Transformations
Loosely speaking, a linear transformation is a function from one vector space to another that preserves the vector space operations. Let us be more precise.
Definition Let $V$ and $W$ be vector spaces over a field $F$. A function $\tau: V \to W$ is a linear transformation if
$$\tau(ru + sv) = r\tau(u) + s\tau(v)$$
for all scalars $r, s \in F$ and vectors $u, v \in V$. A linear transformation $\tau: V \to V$ is called a linear operator on $V$. The set of all linear transformations from $V$ to $W$ is denoted by $\mathcal{L}(V, W)$ and the set of all linear operators on $V$ is denoted by $\mathcal{L}(V)$.
We should mention that some authors use the term linear operator for any linear transformation from $V$ to $W$.
Definition The following terms are also employed:
1) homomorphism for linear transformation
2) endomorphism for linear operator
3) monomorphism (or embedding) for injective linear transformation
4) epimorphism for surjective linear transformation
5) isomorphism for bijective linear transformation
6) automorphism for bijective linear operator.
Example 2.1
1) The derivative $D: V \to V$ is a linear operator on the vector space $V$ of all infinitely differentiable functions on $\mathbb{R}$.
2) The integral operator $\tau: F[x] \to F[x]$ defined by
$$\tau(f) = \int_0^x f(t)\,dt$$
is a linear operator on $F[x]$.
3) Let $A$ be an $m \times n$ matrix over $F$. The function $\tau_A: F^n \to F^m$ defined by $\tau_A(v) = Av$, where all vectors are written as column vectors, is a linear transformation from $F^n$ to $F^m$. This function is just multiplication by $A$.
4) The coordinate map $\phi: V \to F^n$ of an $n$-dimensional vector space is a linear transformation from $V$ to $F^n$.
The set $\mathcal{L}(V, W)$ is a vector space in its own right and $\mathcal{L}(V)$ has the structure of an algebra, as defined in Chapter 0.
Theorem 2.1
1) The set $\mathcal{L}(V, W)$ is a vector space under ordinary addition of functions and scalar multiplication of functions by elements of $F$.
2) If $\sigma \in \mathcal{L}(U, V)$ and $\tau \in \mathcal{L}(V, W)$ then the composition $\tau\sigma$ is in $\mathcal{L}(U, W)$.
3) If $\tau \in \mathcal{L}(V, W)$ is bijective then $\tau^{-1} \in \mathcal{L}(W, V)$.
4) The vector space $\mathcal{L}(V)$ is an algebra, where multiplication is composition of functions. The identity map $\iota \in \mathcal{L}(V)$ is the multiplicative identity and the zero map $0 \in \mathcal{L}(V)$ is the additive identity.
Proof. We prove only part 3). Let $\tau: V \to W$ be a bijective linear transformation. Then $\tau^{-1}: W \to V$ is a well-defined function and since any two vectors $w_1$ and $w_2$ in $W$ have the form $w_1 = \tau(v_1)$ and $w_2 = \tau(v_2)$, we have
$$\tau^{-1}(rw_1 + sw_2) = \tau^{-1}(r\tau(v_1) + s\tau(v_2)) = \tau^{-1}(\tau(rv_1 + sv_2)) = rv_1 + sv_2 = r\tau^{-1}(w_1) + s\tau^{-1}(w_2)$$
which shows that $\tau^{-1}$ is linear.
One of the easiest ways to define a linear transformation is to give its values on a basis. The following theorem says that we may assign these values arbitrarily and obtain a unique linear transformation by linear extension to the entire domain.
Theorem 2.2 Let $V$ and $W$ be vector spaces and let $\mathcal{B} = \{v_i \mid i \in I\}$ be a basis for $V$. Then we can define a linear transformation $\tau \in \mathcal{L}(V, W)$ by specifying the values of $\tau(v_i) \in W$ arbitrarily for all $v_i \in \mathcal{B}$ and extending the domain of $\tau$ to $V$ using linearity, that is,
$$\tau(r_1 v_1 + \cdots + r_n v_n) = r_1 \tau(v_1) + \cdots + r_n \tau(v_n)$$
This process uniquely defines a linear transformation, that is, if $\tau, \sigma \in \mathcal{L}(V, W)$ satisfy $\tau(v_i) = \sigma(v_i)$ for all $v_i \in \mathcal{B}$ then $\tau = \sigma$.
Proof. The crucial point is that the extension by linearity is well-defined, since each vector in $V$ has a unique representation as a linear combination of a finite number of vectors in $\mathcal{B}$. We leave the details to the reader.
Note that if $\tau \in \mathcal{L}(V, W)$ and if $S$ is a subspace of $V$, then the restriction $\tau|_S$ of $\tau$ to $S$ is a linear transformation from $S$ to $W$.
The Kernel and Image of a Linear Transformation
There are two very important vector spaces associated with a linear transformation $\tau$ from $V$ to $W$.
Definition Let $\tau \in \mathcal{L}(V, W)$. The subspace
$$\ker(\tau) = \{v \in V \mid \tau(v) = 0\}$$
is called the kernel of $\tau$ and the subspace
$$\operatorname{im}(\tau) = \{\tau(v) \mid v \in V\}$$
is called the image of $\tau$. The dimension of $\ker(\tau)$ is called the nullity of $\tau$ and is denoted by $\operatorname{null}(\tau)$. The dimension of $\operatorname{im}(\tau)$ is called the rank of $\tau$ and is denoted by $\operatorname{rk}(\tau)$.
It is routine to show that $\ker(\tau)$ is a subspace of $V$ and $\operatorname{im}(\tau)$ is a subspace of $W$. Moreover, we have the following.
Theorem 2.3 Let $\tau \in \mathcal{L}(V, W)$. Then
1) $\tau$ is surjective if and only if $\operatorname{im}(\tau) = W$
2) $\tau$ is injective if and only if $\ker(\tau) = \{0\}$
Proof. The first statement is merely a restatement of the definition of surjectivity. To see the validity of the second statement, observe that
$$\tau(u) = \tau(v) \Leftrightarrow \tau(u - v) = 0 \Leftrightarrow u - v \in \ker(\tau)$$
Hence, if $\ker(\tau) = \{0\}$ then $\tau(u) = \tau(v) \Leftrightarrow u = v$, which shows that $\tau$ is injective. Conversely, if $\tau$ is injective and $u \in \ker(\tau)$ then $\tau(u) = \tau(0)$ and so $u = 0$. This shows that $\ker(\tau) = \{0\}$.
Isomorphisms
Definition A bijective linear transformation $\tau: V \to W$ is called an isomorphism from $V$ to $W$. When an isomorphism from $V$ to $W$ exists, we say that $V$ and $W$ are isomorphic and write $V \approx W$.
Example 2.2 Let $\dim(V) = n$. For any ordered basis $\mathcal{B}$ of $V$, the coordinate map $\phi_{\mathcal{B}}: V \to F^n$ that sends each vector $v \in V$ to its coordinate matrix $[v]_{\mathcal{B}} \in F^n$ is an isomorphism. Hence, any $n$-dimensional vector space over $F$ is isomorphic to $F^n$.
Isomorphic vector spaces share many properties, as the next theorem shows. If $\tau \in \mathcal{L}(V, W)$ and $S \subseteq V$, we write
$$\tau(S) = \{\tau(s) \mid s \in S\}$$
An isomorphism can be characterized as a linear transformation ¢ = ¦ > that maps a basis for = to a basis for > . Theorem 2.5 A linear transformation B²= Á > ³ is an isomorphism if and only if there is a basis 8 of = for which ²8 ³ is a basis of > . In this case, maps any basis of = to a basis of > .
The following theorem says that, up to isomorphism, there is only one vector space of any given dimension. Theorem 2.6 Let = and > be vector spaces over - . Then = > if and only if dim²= ³ ~ dim²> ³.
In Example 2.2, we saw that any -dimensional vector space is isomorphic to - . Now suppose that ) is a set of cardinality and let ²- ) ³ be the vector space of all functions from ) to - with finite support. We leave it to the reader to show that the functions ²- ) ³ defined for all ) , by ²%³ ~ F
if % ~ if % £
form a basis for ²- ) ³ , called the standard basis. Hence, dim²²- ) ³ ³ ~ () (. It follows that for any cardinal number , there is a vector space of dimension . Also, any vector space of dimension is isomorphic to ²- ) ³ . Theorem 2.7 If is a natural number then any -dimensional vector space over - is isomorphic to - . If is any cardinal number and if ) is a set of cardinality then any -dimensional vector space over - is isomorphic to the vector space ²- ) ³ of all functions from ) to - with finite support.
The Rank Plus Nullity Theorem
Let $\tau \in \mathcal{L}(V, W)$. Since any subspace of $V$ has a complement, we can write
$$V = \ker(\tau) \oplus \ker(\tau)^c$$
where $\ker(\tau)^c$ is a complement of $\ker(\tau)$ in $V$. It follows that
$$\dim(V) = \dim(\ker(\tau)) + \dim(\ker(\tau)^c)$$
Now, the restriction of $\tau$ to $\ker(\tau)^c$,
$$\tau^c: \ker(\tau)^c \to W$$
is injective, since
$$\ker(\tau^c) = \ker(\tau) \cap \ker(\tau)^c = \{0\}$$
Also, $\operatorname{im}(\tau^c) \subseteq \operatorname{im}(\tau)$. For the reverse inclusion, if $\tau(v) \in \operatorname{im}(\tau)$ then since $v = u + w$ for $u \in \ker(\tau)$ and $w \in \ker(\tau)^c$, we have
$$\tau(v) = \tau(u) + \tau(w) = \tau(w) = \tau^c(w) \in \operatorname{im}(\tau^c)$$
Thus $\operatorname{im}(\tau^c) = \operatorname{im}(\tau)$. It follows that
$$\ker(\tau)^c \approx \operatorname{im}(\tau)$$
From this, we deduce the following theorem.
Theorem 2.8 Let $\tau \in \mathcal{L}(V, W)$.
1) Any complement of $\ker(\tau)$ is isomorphic to $\operatorname{im}(\tau)$.
2) (The rank plus nullity theorem)
$$\dim(\ker(\tau)) + \dim(\operatorname{im}(\tau)) = \dim(V)$$
or, in other notation,
$$\operatorname{rk}(\tau) + \operatorname{null}(\tau) = \dim(V)$$
Theorem 2.8 has an important corollary.
Corollary 2.9 Let $\tau \in \mathcal{L}(V, W)$, where $\dim(V) = \dim(W) < \infty$. Then $\tau$ is injective if and only if it is surjective.
Note that this result fails if the vector spaces are not finite-dimensional.
Linear Transformations from $F^n$ to $F^m$
Recall that for any $m \times n$ matrix $A$ over $F$ the multiplication map
$$\tau_A(v) = Av$$
is a linear transformation. In fact, any linear transformation $\tau \in \mathcal{L}(F^n, F^m)$ has this form, that is, $\tau$ is just multiplication by a matrix, for if $v = v_1 e_1 + \cdots + v_n e_n$ then
$$\tau(v) = v_1 \tau(e_1) + \cdots + v_n \tau(e_n) = (\tau(e_1) \cdots \tau(e_n))\,v$$
and so $\tau = \tau_A$ where
$$A = (\tau(e_1) \cdots \tau(e_n))$$
Theorem 2.10
1) If $A$ is an $m \times n$ matrix over $F$ then $\tau_A \in \mathcal{L}(F^n, F^m)$.
2) If $\tau \in \mathcal{L}(F^n, F^m)$ then $\tau = \tau_A$ where
$$A = (\tau(e_1) \cdots \tau(e_n))$$
The matrix $A$ is called the matrix of $\tau$.
Example 2.3 Consider the linear transformation $\tau: F^3 \to F^3$ defined by
$$\tau(x, y, z) = (x - y,\ z,\ x + y + z)$$
Then we have, in column form,
$$\tau \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x - y \\ z \\ x + y + z \end{pmatrix} = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$
and so the standard matrix of $\tau$ is
$$A = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$

If $A \in \mathcal{M}_{m,n}$ then since the image of $\tau_A$ is the column space of $A$, we have
$$\dim(\ker(\tau_A)) + \operatorname{rk}(A) = \dim(F^n)$$
This gives the following useful result.
Theorem 2.11 Let $A$ be an $m \times n$ matrix over $F$.
1) $\tau_A: F^n \to F^m$ is injective if and only if $\operatorname{rk}(A) = n$.
2) $\tau_A: F^n \to F^m$ is surjective if and only if $\operatorname{rk}(A) = m$.
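As an illustration, the standard matrix of Example 2.3 can be assembled column by column from the images of the standard basis vectors. A NumPy sketch (the test vector is our own choice):

    import numpy as np

    def tau(v):
        x, y, z = v
        return np.array([x - y, z, x + y + z])     # the map of Example 2.3

    # The columns of the standard matrix are the images of e1, e2, e3
    A = np.column_stack([tau(e) for e in np.eye(3)])
    print(A)                  # [[ 1 -1  0], [ 0  0  1], [ 1  1  1]]

    v = np.array([2.0, 3.0, 5.0])
    assert np.allclose(A @ v, tau(v))              # A v = tau(v)
    assert np.linalg.matrix_rank(A) == 3           # so tau is bijective (Thm 2.11)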
Change of Basis Matrices
Suppose that $\mathcal{B} = (b_1, \dots, b_n)$ and $\mathcal{C} = (c_1, \dots, c_n)$ are ordered bases for a vector space $V$. It is natural to ask how the coordinate matrices $[v]_{\mathcal{B}}$ and $[v]_{\mathcal{C}}$ are related. The map that takes $[v]_{\mathcal{B}}$ to $[v]_{\mathcal{C}}$ is
$$\mu_{\mathcal{B},\mathcal{C}} = \phi_{\mathcal{C}} \circ \phi_{\mathcal{B}}^{-1}$$
and is called the change of basis operator (or change of coordinates operator). Since $\mu_{\mathcal{B},\mathcal{C}}$ is an operator on $F^n$, it has the form $\tau_A$ where
$$A = (\mu_{\mathcal{B},\mathcal{C}}(e_1), \dots, \mu_{\mathcal{B},\mathcal{C}}(e_n)) = (\phi_{\mathcal{C}}\phi_{\mathcal{B}}^{-1}([b_1]_{\mathcal{B}}), \dots, \phi_{\mathcal{C}}\phi_{\mathcal{B}}^{-1}([b_n]_{\mathcal{B}})) = ([b_1]_{\mathcal{C}}, \dots, [b_n]_{\mathcal{C}})$$
We denote $A$ by $M_{\mathcal{B},\mathcal{C}}$ and call it the change of basis matrix from $\mathcal{B}$ to $\mathcal{C}$.
Theorem 2.12 Let $\mathcal{B} = (b_1, \dots, b_n)$ and $\mathcal{C}$ be ordered bases for a vector space $V$. Then the change of basis operator $\mu_{\mathcal{B},\mathcal{C}} = \phi_{\mathcal{C}} \circ \phi_{\mathcal{B}}^{-1}$ is an automorphism of $F^n$, whose standard matrix is
$$M_{\mathcal{B},\mathcal{C}} = ([b_1]_{\mathcal{C}}, \dots, [b_n]_{\mathcal{C}})$$
Hence
$$[v]_{\mathcal{C}} = M_{\mathcal{B},\mathcal{C}}[v]_{\mathcal{B}}$$
and $M_{\mathcal{C},\mathcal{B}} = M_{\mathcal{B},\mathcal{C}}^{-1}$.
Consider the equation
$$A = M_{\mathcal{B},\mathcal{C}}$$
or equivalently,
$$A = ([b_1]_{\mathcal{C}}, \dots, [b_n]_{\mathcal{C}})$$
Then given any two of $A$ (an invertible $n \times n$ matrix), $\mathcal{B}$ (an ordered basis for $F^n$) and $\mathcal{C}$ (an ordered basis for $F^n$), the third component is uniquely determined by this equation. This is clear if $\mathcal{B}$ and $\mathcal{C}$ are given or if $A$ and $\mathcal{C}$ are given. If $A$ and $\mathcal{B}$ are given then there is a unique $\mathcal{C}$ for which $A^{-1} = M_{\mathcal{C},\mathcal{B}}$ and so there is a unique $\mathcal{C}$ for which $A = M_{\mathcal{B},\mathcal{C}}$.
Theorem 2.13 If we are given any two of the following:
1) An invertible $n \times n$ matrix $A$.
2) An ordered basis $\mathcal{B}$ for $F^n$.
3) An ordered basis $\mathcal{C}$ for $F^n$.
then the third is uniquely determined by the equation
$$A = M_{\mathcal{B},\mathcal{C}}$$
The Matrix of a Linear Transformation
Let $\tau: V \to W$ be a linear transformation, where $\dim(V) = n$ and $\dim(W) = m$, and let $\mathcal{B} = (b_1, \dots, b_n)$ be an ordered basis for $V$ and $\mathcal{C}$ an ordered basis for $W$. Then the map
$$\theta: [v]_{\mathcal{B}} \mapsto [\tau(v)]_{\mathcal{C}}$$
is a representation of $\tau$ as a linear transformation from $F^n$ to $F^m$, in the sense
that knowing $\theta$ (along with $\mathcal{B}$ and $\mathcal{C}$, of course) is equivalent to knowing $\tau$. Of course, this representation depends on the choice of ordered bases $\mathcal{B}$ and $\mathcal{C}$. Since $\theta$ is a linear transformation from $F^n$ to $F^m$, it is just multiplication by an $m \times n$ matrix $A$, that is,
$$[\tau(v)]_{\mathcal{C}} = A[v]_{\mathcal{B}}$$
Indeed, since $[b_i]_{\mathcal{B}} = e_i$, we get the columns of $A$ as follows:
$$A^{(i)} = Ae_i = A[b_i]_{\mathcal{B}} = [\tau(b_i)]_{\mathcal{C}}$$
Theorem 2.14 Let $\tau \in \mathcal{L}(V, W)$ and let $\mathcal{B} = (b_1, \dots, b_n)$ and $\mathcal{C}$ be ordered bases for $V$ and $W$, respectively. Then $\tau$ can be represented with respect to $\mathcal{B}$ and $\mathcal{C}$ as matrix multiplication, that is,
$$[\tau(v)]_{\mathcal{C}} = [\tau]_{\mathcal{B},\mathcal{C}}[v]_{\mathcal{B}}$$
where
$$[\tau]_{\mathcal{B},\mathcal{C}} = ([\tau(b_1)]_{\mathcal{C}} \cdots [\tau(b_n)]_{\mathcal{C}})$$
is called the matrix of $\tau$ with respect to the bases $\mathcal{B}$ and $\mathcal{C}$. When $V = W$ and $\mathcal{B} = \mathcal{C}$, we denote $[\tau]_{\mathcal{B},\mathcal{B}}$ by $[\tau]_{\mathcal{B}}$ and so
$$[\tau(v)]_{\mathcal{B}} = [\tau]_{\mathcal{B}}[v]_{\mathcal{B}}$$
Example 2.4 Let $D: \mathcal{P}_2 \to \mathcal{P}_2$ be the derivative operator, defined on the vector space of all polynomials of degree at most $2$. Let $\mathcal{B} = \mathcal{C} = (1, x, x^2)$. Then
$$[D(1)]_{\mathcal{C}} = [0]_{\mathcal{C}} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},\quad [D(x)]_{\mathcal{C}} = [1]_{\mathcal{C}} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},\quad [D(x^2)]_{\mathcal{C}} = [2x]_{\mathcal{C}} = \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}$$
and so
$$[D]_{\mathcal{B}} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}$$
Hence, for example, if $p(x) = 5 + x + 2x^2$ then
$$[D(p(x))]_{\mathcal{C}} = [D]_{\mathcal{B}}[p(x)]_{\mathcal{B}} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \\ 0 \end{pmatrix}$$
and so $D(p(x)) = 1 + 4x$.
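The same computation in NumPy, with polynomials stored as coordinate vectors relative to $(1, x, x^2)$ (the sample polynomial is our own choice):

    import numpy as np

    # Matrix of D on P_2 relative to the ordered basis (1, x, x^2):
    # columns are [D(1)] = 0, [D(x)] = (1,0,0), [D(x^2)] = (0,2,0)
    D = np.array([[0, 1, 0],
                  [0, 0, 2],
                  [0, 0, 0]])

    p = np.array([5, 1, 2])       # p(x) = 5 + x + 2x^2
    print(D @ p)                  # [1 4 0], i.e. D p(x) = 1 + 4x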
The following result shows that we may work equally well with linear transformations or with the matrices that represent them (with respect to fixed
ordered bases $\mathcal{B}$ and $\mathcal{C}$). This applies not only to addition and scalar multiplication, but also to matrix multiplication.
Theorem 2.15 Let $V$ and $W$ be vector spaces over $F$, with ordered bases $\mathcal{B} = (b_1, \dots, b_n)$ and $\mathcal{C} = (c_1, \dots, c_m)$, respectively.
1) The map $\mu: \mathcal{L}(V, W) \to \mathcal{M}_{m,n}(F)$ defined by $\mu(\tau) = [\tau]_{\mathcal{B},\mathcal{C}}$ is an isomorphism and so $\mathcal{L}(V, W) \approx \mathcal{M}_{m,n}(F)$.
2) If $\sigma \in \mathcal{L}(U, V)$ and $\tau \in \mathcal{L}(V, W)$ and if $\mathcal{B}$, $\mathcal{C}$ and $\mathcal{D}$ are ordered bases for $U$, $V$ and $W$, respectively, then
$$[\tau\sigma]_{\mathcal{B},\mathcal{D}} = [\tau]_{\mathcal{C},\mathcal{D}}[\sigma]_{\mathcal{B},\mathcal{C}}$$
Thus, the matrix of the product (composition) $\tau\sigma$ is the product of the matrices of $\tau$ and $\sigma$. In fact, this is the primary motivation for the definition of matrix multiplication.
Proof. To see that $\mu$ is linear, observe that for all $i$,
$$[s\sigma + t\tau]_{\mathcal{B},\mathcal{C}}[b_i]_{\mathcal{B}} = [(s\sigma + t\tau)(b_i)]_{\mathcal{C}} = [s\sigma(b_i) + t\tau(b_i)]_{\mathcal{C}} = s[\sigma(b_i)]_{\mathcal{C}} + t[\tau(b_i)]_{\mathcal{C}} = (s[\sigma]_{\mathcal{B},\mathcal{C}} + t[\tau]_{\mathcal{B},\mathcal{C}})[b_i]_{\mathcal{B}}$$
and since $[b_i]_{\mathcal{B}} = e_i$ is a standard basis vector, we conclude that
$$[s\sigma + t\tau]_{\mathcal{B},\mathcal{C}} = s[\sigma]_{\mathcal{B},\mathcal{C}} + t[\tau]_{\mathcal{B},\mathcal{C}}$$
and so $\mu$ is linear. If $A \in \mathcal{M}_{m,n}$, we define $\tau$ by the condition $[\tau(b_i)]_{\mathcal{C}} = A^{(i)}$, whence $\mu(\tau) = A$ and $\mu$ is surjective. Since $\dim(\mathcal{L}(V, W)) = \dim(\mathcal{M}_{m,n}(F))$, the map $\mu$ is an isomorphism. To prove part 2), we have
$$[\tau\sigma]_{\mathcal{B},\mathcal{D}}[u]_{\mathcal{B}} = [\tau(\sigma(u))]_{\mathcal{D}} = [\tau]_{\mathcal{C},\mathcal{D}}[\sigma(u)]_{\mathcal{C}} = [\tau]_{\mathcal{C},\mathcal{D}}[\sigma]_{\mathcal{B},\mathcal{C}}[u]_{\mathcal{B}}$$
Change of Bases for Linear Transformations

Since the matrix $[\tau]_{\mathcal{B},\mathcal{C}}$ that represents $\tau$ depends on the ordered bases $\mathcal{B}$ and $\mathcal{C}$, it is natural to wonder how to choose these bases in order to make this matrix as simple as possible. For instance, can we always choose the bases so that $\tau$ is represented by a diagonal matrix? As we will see in Chapter 7, the answer to this question is no. In that chapter, we will take up the general question of how best to represent a linear operator by a matrix. For now, let us take the first step and describe the relationship between the matrices $[\tau]_{\mathcal{B},\mathcal{C}}$ and $[\tau]_{\mathcal{B}',\mathcal{C}'}$ of $\tau$ with respect to two different pairs $(\mathcal{B}, \mathcal{C})$ and $(\mathcal{B}', \mathcal{C}')$ of ordered bases. Multiplication by $[\tau]_{\mathcal{B}',\mathcal{C}'}$ sends $[v]_{\mathcal{B}'}$ to
$[\tau(v)]_{\mathcal{C}'}$. This can be reproduced by first switching from $\mathcal{B}'$ to $\mathcal{B}$, then applying $[\tau]_{\mathcal{B},\mathcal{C}}$ and finally switching from $\mathcal{C}$ to $\mathcal{C}'$, that is,
$$[\tau]_{\mathcal{B}',\mathcal{C}'} = M_{\mathcal{C},\mathcal{C}'}[\tau]_{\mathcal{B},\mathcal{C}}M_{\mathcal{B}',\mathcal{B}} = M_{\mathcal{C},\mathcal{C}'}[\tau]_{\mathcal{B},\mathcal{C}}M_{\mathcal{B},\mathcal{B}'}^{-1}$$
Theorem 2.16 Let $\tau \in \mathcal{L}(V, W)$ and let $(\mathcal{B}, \mathcal{C})$ and $(\mathcal{B}', \mathcal{C}')$ be pairs of ordered bases of $V$ and $W$, respectively. Then
$$[\tau]_{\mathcal{B}',\mathcal{C}'} = M_{\mathcal{C},\mathcal{C}'}[\tau]_{\mathcal{B},\mathcal{C}}M_{\mathcal{B}',\mathcal{B}} \tag{2.1}$$
When $\tau \in \mathcal{L}(V)$ is a linear operator on $V$, it is generally more convenient to represent $\tau$ by matrices of the form $[\tau]_{\mathcal{B}}$, where the ordered bases used to represent vectors in the domain and image are the same. When $\mathcal{B} = \mathcal{C}$, Theorem 2.16 takes the following important form.

Corollary 2.17 Let $\tau \in \mathcal{L}(V)$ and let $\mathcal{B}$ and $\mathcal{C}$ be ordered bases for $V$. Then the matrix of $\tau$ with respect to $\mathcal{C}$ can be expressed in terms of the matrix of $\tau$ with respect to $\mathcal{B}$ as follows:
$$[\tau]_{\mathcal{C}} = M_{\mathcal{B},\mathcal{C}}[\tau]_{\mathcal{B}}M_{\mathcal{B},\mathcal{C}}^{-1} \tag{2.2}$$
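For a concrete instance of (2.2), let $\tau \in \mathcal{L}(\mathbb{R}^2)$ be the reflection $\tau(x, y) = (y, x)$, let $\mathcal{B} = (e_1, e_2)$ and let $\mathcal{C} = ((1,1), (1,-1))$. Then $[\tau]_{\mathcal{B}} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, while $\tau(1,1) = (1,1)$ and $\tau(1,-1) = -(1,-1)$ give
$$[\tau]_{\mathcal{C}} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = M_{\mathcal{B},\mathcal{C}}[\tau]_{\mathcal{B}}M_{\mathcal{B},\mathcal{C}}^{-1}$$
Thus a judicious change of basis can produce a diagonal representation, the theme taken up in Chapter 7.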
Equivalence of Matrices

Since the change of basis matrices are precisely the invertible matrices, (2.1) has the form
$$[\tau]_{\mathcal{B}',\mathcal{C}'} = P[\tau]_{\mathcal{B},\mathcal{C}}Q^{-1}$$
where $P$ and $Q$ are invertible matrices. This motivates the following definition.

Definition Two matrices $A$ and $B$ are equivalent if there exist invertible matrices $P$ and $Q$ for which
$$B = PAQ^{-1}$$
We remarked in Chapter 0 that $B$ is equivalent to $A$ if and only if $B$ can be obtained from $A$ by a series of elementary row and column operations. Performing the row operations is equivalent to multiplying the matrix $A$ on the left by $P$ and performing the column operations is equivalent to multiplying $A$ on the right by $Q^{-1}$. In terms of (2.1), we see that performing row operations (premultiplying by $P$) is equivalent to changing the basis used to represent vectors in the image, and performing column operations (postmultiplying by $Q^{-1}$) is equivalent to changing the basis used to represent vectors in the domain.
According to Theorem 2.16, if $A$ and $B$ are matrices that represent $\tau$ with respect to possibly different ordered bases then $A$ and $B$ are equivalent. The converse of this also holds.

Theorem 2.18 Let $V$ and $W$ be vector spaces with $\dim(V) = n$ and $\dim(W) = m$. Then two $m \times n$ matrices $A$ and $B$ are equivalent if and only if they represent the same linear transformation $\tau \in \mathcal{L}(V, W)$, but possibly with respect to different ordered bases. In this case, $A$ and $B$ represent exactly the same set of linear transformations in $\mathcal{L}(V, W)$.
Proof. If $A$ and $B$ represent $\tau$, that is, if $A = [\tau]_{\mathcal{B},\mathcal{C}}$ and $B = [\tau]_{\mathcal{B}',\mathcal{C}'}$ for ordered bases $\mathcal{B}$, $\mathcal{C}$, $\mathcal{B}'$ and $\mathcal{C}'$, then Theorem 2.16 shows that $A$ and $B$ are equivalent. Now suppose that $A$ and $B$ are equivalent, say
$$B = PAQ^{-1}$$
where $P$ and $Q$ are invertible. Suppose also that $A$ represents a linear transformation $\tau \in \mathcal{L}(V, W)$ for some ordered bases $\mathcal{B}$ and $\mathcal{C}$, that is,
$$A = [\tau]_{\mathcal{B},\mathcal{C}}$$
Theorem 2.13 implies that there is a unique ordered basis $\mathcal{B}'$ for $V$ for which $Q = M_{\mathcal{B},\mathcal{B}'}$ and a unique ordered basis $\mathcal{C}'$ for $W$ for which $P = M_{\mathcal{C},\mathcal{C}'}$. Hence
$$B = M_{\mathcal{C},\mathcal{C}'}[\tau]_{\mathcal{B},\mathcal{C}}M_{\mathcal{B}',\mathcal{B}} = [\tau]_{\mathcal{B}',\mathcal{C}'}$$
Hence, $B$ also represents $\tau$. By symmetry, we see that $A$ and $B$ represent the same set of linear transformations. This completes the proof.
We remarked in Example 0.3 that every matrix is equivalent to exactly one matrix of the block form
$$J = \begin{pmatrix} I_k & 0_{k, n-k} \\ 0_{m-k, k} & 0_{m-k, n-k} \end{pmatrix}_{\text{block}}$$
Hence, the set of these matrices is a set of canonical forms for equivalence. Moreover, the rank is a complete invariant for equivalence. In other words, two matrices are equivalent if and only if they have the same rank.
Similarity of Matrices

When a linear operator $\tau \in \mathcal{L}(V)$ is represented by a matrix of the form $[\tau]_{\mathcal{B}}$, equation (2.2) has the form
$$[\tau]_{\mathcal{B}'} = P[\tau]_{\mathcal{B}}P^{-1}$$
where $P$ is an invertible matrix. This motivates the following definition.
Definition Two matrices $A$ and $B$ are similar if there exists an invertible matrix $P$ for which
$$B = PAP^{-1}$$
The equivalence classes associated with similarity are called similarity classes.
The analog of Theorem 2.18 for square matrices is the following.

Theorem 2.19 Let $V$ be a vector space of dimension $n$. Then two $n \times n$ matrices $A$ and $B$ are similar if and only if they represent the same linear operator $\tau \in \mathcal{L}(V)$, but possibly with respect to different ordered bases. In this case, $A$ and $B$ represent exactly the same set of linear operators in $\mathcal{L}(V)$.
Proof. If $A$ and $B$ represent $\tau \in \mathcal{L}(V)$, that is, if $A = [\tau]_{\mathcal{B}}$ and $B = [\tau]_{\mathcal{C}}$ for ordered bases $\mathcal{B}$ and $\mathcal{C}$, then Corollary 2.17 shows that $A$ and $B$ are similar. Now suppose that $A$ and $B$ are similar, say
$$B = PAP^{-1}$$
Suppose also that $A$ represents a linear operator $\tau \in \mathcal{L}(V)$ for some ordered basis $\mathcal{B}$, that is,
$$A = [\tau]_{\mathcal{B}}$$
Theorem 2.13 implies that there is a unique ordered basis $\mathcal{C}$ for $V$ for which $P = M_{\mathcal{B},\mathcal{C}}$. Hence
$$B = M_{\mathcal{B},\mathcal{C}}[\tau]_{\mathcal{B}}M_{\mathcal{B},\mathcal{C}}^{-1} = [\tau]_{\mathcal{C}}$$
Hence, $B$ also represents $\tau$. By symmetry, we see that $A$ and $B$ represent the same set of linear operators. This completes the proof.
We will devote much effort in Chapter 7 to finding a canonical form for similarity.
Similarity of Operators

We can also define similarity of operators.

Definition Two linear operators $\sigma, \tau \in \mathcal{L}(V)$ are similar if there exists an automorphism $\phi \in \mathcal{L}(V)$ for which
$$\sigma = \phi\tau\phi^{-1}$$
The equivalence classes associated with similarity are called similarity classes.
The analog of Theorem 2.19 in this case is the following.

Theorem 2.20 Let $V$ be a vector space of dimension $n$. Then two linear operators $\sigma$ and $\tau$ on $V$ are similar if and only if there is a matrix $A \in \mathcal{M}_n$ that represents both operators (but with respect to possibly different ordered bases). In this case, $\sigma$ and $\tau$ are represented by exactly the same set of matrices in $\mathcal{M}_n$.
Proof. If $\sigma$ and $\tau$ are represented by $A \in \mathcal{M}_n$, that is, if
$$[\sigma]_{\mathcal{B}} = A = [\tau]_{\mathcal{C}}$$
for ordered bases $\mathcal{B}$ and $\mathcal{C}$, then
$$[\tau]_{\mathcal{C}} = [\sigma]_{\mathcal{B}} = M_{\mathcal{C},\mathcal{B}}[\sigma]_{\mathcal{C}}M_{\mathcal{B},\mathcal{C}}$$
Let $\phi \in \mathcal{L}(V)$ be the automorphism of $V$ defined by $\phi(c_i) = b_i$, where $\mathcal{B} = \{b_1, \dots, b_n\}$ and $\mathcal{C} = \{c_1, \dots, c_n\}$. Then
$$[\phi]_{\mathcal{C}} = ([\phi(c_1)]_{\mathcal{C}} \cdots [\phi(c_n)]_{\mathcal{C}}) = ([b_1]_{\mathcal{C}} \cdots [b_n]_{\mathcal{C}}) = M_{\mathcal{B},\mathcal{C}}$$
and so
$$[\tau]_{\mathcal{C}} = [\phi]_{\mathcal{C}}^{-1}[\sigma]_{\mathcal{C}}[\phi]_{\mathcal{C}} = [\phi^{-1}\sigma\phi]_{\mathcal{C}}$$
from which it follows that $\sigma$ and $\tau$ are similar. Conversely, suppose that $\sigma$ and $\tau$ are similar, say
$$\tau = \phi\sigma\phi^{-1}$$
Suppose also that $\sigma$ is represented by the matrix $A \in \mathcal{M}_n$, that is,
$$A = [\sigma]_{\mathcal{B}}$$
for some ordered basis $\mathcal{B}$. Then
$$[\tau]_{\mathcal{B}} = [\phi\sigma\phi^{-1}]_{\mathcal{B}} = [\phi]_{\mathcal{B}}[\sigma]_{\mathcal{B}}[\phi]_{\mathcal{B}}^{-1}$$
If we set $c_i = \phi(b_i)$ then $\mathcal{C} = (c_1, \dots, c_n)$ is an ordered basis for $V$ and
$$[\phi]_{\mathcal{B}} = ([\phi(b_1)]_{\mathcal{B}} \cdots [\phi(b_n)]_{\mathcal{B}}) = ([c_1]_{\mathcal{B}} \cdots [c_n]_{\mathcal{B}}) = M_{\mathcal{C},\mathcal{B}}$$
Hence
$$[\tau]_{\mathcal{B}} = M_{\mathcal{C},\mathcal{B}}[\sigma]_{\mathcal{B}}M_{\mathcal{C},\mathcal{B}}^{-1}$$
It follows that
$$A = [\sigma]_{\mathcal{B}} = M_{\mathcal{B},\mathcal{C}}[\tau]_{\mathcal{B}}M_{\mathcal{B},\mathcal{C}}^{-1} = [\tau]_{\mathcal{C}}$$
and so $A$ also represents $\tau$. By symmetry, we see that $\sigma$ and $\tau$ are represented by the same set of matrices. This completes the proof.
Invariant Subspaces and Reducing Pairs

The restriction of a linear operator $\tau \in \mathcal{L}(V)$ to a subspace $S$ of $V$ is not necessarily a linear operator on $S$. This prompts the following definition.

Definition Let $\tau \in \mathcal{L}(V)$. A subspace $S$ of $V$ is said to be invariant under $\tau$ or $\tau$-invariant if $\tau(S) \subseteq S$, that is, if $\tau(s) \in S$ for all $s \in S$. Put another way, $S$ is invariant under $\tau$ if the restriction $\tau|_S$ is a linear operator on $S$.
If $V = S \oplus T$ then the fact that $S$ is $\tau$-invariant does not imply that the complement $T$ is also $\tau$-invariant. (The reader may wish to supply a simple example with $V = \mathbb{R}^2$; one such example is sketched after the next definition.)

Definition Let $\tau \in \mathcal{L}(V)$. If $V = S \oplus T$ and if both $S$ and $T$ are $\tau$-invariant, we say that the pair $(S, T)$ reduces $\tau$.
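Here is one such example, with the operator chosen for convenience. Let $\tau \in \mathcal{L}(\mathbb{R}^2)$ be the shear $\tau(x, y) = (x + y, y)$ and let $S = \langle e_1 \rangle$. Then $\tau(e_1) = e_1$, so $S$ is $\tau$-invariant. But any complement of $S$ has the form $T = \langle (a, 1) \rangle$, and
$$\tau(a, 1) = (a + 1, 1) \notin T$$
so no complement of $S$ is $\tau$-invariant and no pair $(S, T)$ reduces $\tau$.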
A reducing pair can be used to decompose a linear operator into a direct sum as follows.

Definition Let $\tau \in \mathcal{L}(V)$. If $(S, T)$ reduces $\tau$ we write
$$\tau = \tau|_S \oplus \tau|_T$$
and call $\tau$ the direct sum of $\tau|_S$ and $\tau|_T$. Thus, the expression
$$\tau = \sigma \oplus \rho$$
means that there exist subspaces $S$ and $T$ of $V$ for which $(S, T)$ reduces $\tau$ and
$$\sigma = \tau|_S \quad\text{and}\quad \rho = \tau|_T$$
The concept of the direct sum of linear operators will play a key role in the study of the structure of a linear operator.
Topological Vector Spaces

This section is for readers with some familiarity with point-set topology. The standard topology on $\mathbb{R}^n$ is the topology for which the set of open rectangles
$$\mathcal{B} = \{I_1 \times \cdots \times I_n \mid I_i \text{ are open intervals in } \mathbb{R}\}$$
is a basis (in the sense of topology), that is, a subset of $\mathbb{R}^n$ is open if and only if it is a union of sets in $\mathcal{B}$. The standard topology is the topology induced by the Euclidean metric on $\mathbb{R}^n$.
The standard topology on $\mathbb{R}^n$ has the property that the addition function
$$+ : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n : (v, w) \mapsto v + w$$
and the scalar multiplication function
$$\cdot : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n : (r, v) \mapsto rv$$
are continuous. As such, $\mathbb{R}^n$ is a topological vector space. Also, any linear functional $f : \mathbb{R}^n \to \mathbb{R}$ is a continuous map. More generally, any real vector space $V$ endowed with a topology $\mathcal{T}$ is called a topological vector space if the operations of addition $+ : V \times V \to V$ and scalar multiplication $\cdot : \mathbb{R} \times V \to V$ are continuous under $\mathcal{T}$.

Let $V$ be a real vector space of dimension $n$ and fix an ordered basis $\mathcal{B} = (v_1, \dots, v_n)$ for $V$. Consider the coordinate map
$$\phi = \phi_{\mathcal{B}} : V \to \mathbb{R}^n : v \mapsto [v]_{\mathcal{B}}$$
and its inverse
$$\psi_{\mathcal{B}} = \phi_{\mathcal{B}}^{-1} : \mathbb{R}^n \to V : (r_1, \dots, r_n) \mapsto \sum r_i v_i$$
We claim that there is precisely one topology $\mathcal{T} = \mathcal{T}_V$ on $V$ for which $V$ becomes a topological vector space and for which all linear functionals are continuous. This is called the natural topology on $V$. In fact, the natural topology is the topology for which $\phi_{\mathcal{B}}$ (and therefore also $\psi_{\mathcal{B}}$) is a homeomorphism, for any basis $\mathcal{B}$. (Recall that a homeomorphism is a bijective map that is continuous and has a continuous inverse.) Once this has been established, it will follow that the open sets in $\mathcal{T}$ are precisely the images of the open sets in $\mathbb{R}^n$ under the map $\psi_{\mathcal{B}}$. A basis for the natural topology is given by
$$\{\psi_{\mathcal{B}}(I_1 \times \cdots \times I_n) \mid I_i \text{ are open intervals in } \mathbb{R}\} = \Big\{\sum r_i v_i \;\Big|\; r_i \in I_i,\ I_i \text{ are open intervals in } \mathbb{R}\Big\}$$
First, we show that if $V$ is a topological vector space under a topology $\mathcal{T}$ then $\psi$ is continuous. Since $\psi = \sum \rho_i$, where $\rho_i : \mathbb{R}^n \to V$ is defined by
$$\rho_i(r_1, \dots, r_n) = r_i v_i$$
it is sufficient to show that these maps are continuous. (The sum of continuous maps is continuous.) Let $O$ be an open set in $\mathcal{T}$. Then
$$\cdot^{-1}(O) = \{(r, x) \in \mathbb{R} \times V \mid rx \in O\}$$
is open in $\mathbb{R} \times V$. We need to show that the set
$$\rho_i^{-1}(O) = \{(r_1, \dots, r_n) \in \mathbb{R}^n \mid r_i v_i \in O\}$$
is open in $\mathbb{R}^n$, so let $(r_1, \dots, r_n) \in \rho_i^{-1}(O)$. Thus, $r_i v_i \in O$. It follows that $(r_i, v_i) \in \cdot^{-1}(O)$, which is open, and so there is an open interval $I_i \subseteq \mathbb{R}$ and an open set $B \in \mathcal{T}$ for which
$$(r_i, v_i) \in I_i \times B \subseteq \cdot^{-1}(O)$$
Then the open set $U = \mathbb{R} \times \cdots \times \mathbb{R} \times I_i \times \mathbb{R} \times \cdots \times \mathbb{R}$, where the factor $I_i$ is in the $i$th position, has the property that $\rho_i(U) \subseteq O$. Thus $(r_1, \dots, r_n) \in U \subseteq \rho_i^{-1}(O)$ and so $\rho_i^{-1}(O)$ is open. Hence, $\rho_i$, and therefore also $\psi$, is continuous.

Next we show that if every linear functional on $V$ is continuous under a topology $\mathcal{T}$ on $V$ then the coordinate map $\phi$ is continuous. If $v \in V$, denote by $[v]_{\mathcal{B},i}$ the $i$th coordinate of $[v]_{\mathcal{B}}$. The map $f_i : V \to \mathbb{R}$ defined by $f_i(v) = [v]_{\mathcal{B},i}$ is a linear functional and so is continuous by assumption. Hence, for any open interval $I_i \subseteq \mathbb{R}$ the set
$$A_i = \{v \in V \mid [v]_{\mathcal{B},i} \in I_i\}$$
is open. Now, if $I_1, \dots, I_n$ are open intervals in $\mathbb{R}$ then
$$\phi^{-1}(I_1 \times \cdots \times I_n) = \{v \in V \mid [v]_{\mathcal{B}} \in I_1 \times \cdots \times I_n\} = \bigcap_{i=1}^n A_i$$
is open. Thus, $\phi$ is continuous.

Thus, if a topology $\mathcal{T}$ has the property that $V$ is a topological vector space and every linear functional is continuous, then $\phi$ and $\psi = \phi^{-1}$ are homeomorphisms. This means that $\mathcal{T}$, if it exists, must be unique. It remains to prove that the topology $\mathcal{T}$ on $V$ that makes $\phi$ a homeomorphism has the property that $V$ is a topological vector space under $\mathcal{T}$ and that any linear functional on $V$ is continuous. As to addition, the maps $\phi : V \to \mathbb{R}^n$ and $(\phi \times \phi) : V \times V \to \mathbb{R}^n \times \mathbb{R}^n$ are homeomorphisms and the map $+' : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous, and so the map $+ : V \times V \to V$, being equal to $\phi^{-1} \circ +' \circ (\phi \times \phi)$, is also continuous. As to scalar multiplication, the maps $\phi : V \to \mathbb{R}^n$ and $(\iota \times \phi) : \mathbb{R} \times V \to \mathbb{R} \times \mathbb{R}^n$ are homeomorphisms and the map $\cdot' : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous, and so the map $\cdot : \mathbb{R} \times V \to V$, being equal to $\phi^{-1} \circ \cdot' \circ (\iota \times \phi)$, is also continuous. Now let $f$ be a linear functional. Since $f$ is continuous if and only if $f \circ \phi^{-1}$ is continuous, we can confine attention to $V = \mathbb{R}^n$. In this case, if $e_1, \dots, e_n$ is the
standard basis for $\mathbb{R}^n$ and $|f(e_i)| \le M$ for all $i$, then for any $x = (r_1, \dots, r_n) \in \mathbb{R}^n$ we have
$$|f(x)| = \Big|\sum r_i f(e_i)\Big| \le \sum |r_i||f(e_i)| \le M \sum |r_i|$$
Now, if $\|x\| < \epsilon/Mn$ then $|r_i| < \epsilon/Mn$ and so $|f(x)| < \epsilon$, which implies that $f$ is continuous. (Alternatively, according to the Riesz representation theorem and the Cauchy-Schwarz inequality, we have
$$|f(x)| \le \|R_f\|\|x\|$$
Hence, $x_n \to 0$ implies $f(x_n) \to 0$ and so by linearity, $x_n \to x$ implies $f(x_n) \to f(x)$ and so $f$ is continuous.)

Theorem 2.21 Let $V$ be a real vector space of dimension $n$. There is a unique topology on $V$, called the natural topology, for which $V$ is a topological vector space and for which all linear functionals on $V$ are continuous. This topology is determined by the fact that the coordinate map $\phi : V \to \mathbb{R}^n$ is a homeomorphism.
Linear Operators on $V^{\mathbb{C}}$

A linear operator $\tau$ on a real vector space $V$ can be extended to a linear operator $\tau^{\mathbb{C}}$ on the complexification $V^{\mathbb{C}}$ by defining
$$\tau^{\mathbb{C}}(u + iv) = \tau(u) + i\tau(v)$$
Here are the basic properties of this complexification of $\tau$.

Theorem 2.22 If $\sigma, \tau \in \mathcal{L}(V)$ then
1) $(s\tau)^{\mathbb{C}} = s\tau^{\mathbb{C}}$ for $s \in \mathbb{R}$
2) $(\sigma + \tau)^{\mathbb{C}} = \sigma^{\mathbb{C}} + \tau^{\mathbb{C}}$
3) $(\sigma\tau)^{\mathbb{C}} = \sigma^{\mathbb{C}}\tau^{\mathbb{C}}$
4) $(\tau(v))^{\mathbb{C}} = \tau^{\mathbb{C}}(v^{\mathbb{C}})$, where $v^{\mathbb{C}} = v + i0$.
Let us recall that for any ordered basis $\mathcal{B}$ for $V$ and any vector $v \in V$ we have
$$[v + i0]_{\text{cpx}(\mathcal{B})} = [v]_{\mathcal{B}}$$
Now, if $\mathcal{B}$ is a basis for $V$, then the $i$th column of $[\tau]_{\mathcal{B}}$ is
$$[\tau(b_i)]_{\mathcal{B}} = [\tau(b_i) + i0]_{\text{cpx}(\mathcal{B})} = [\tau^{\mathbb{C}}(b_i + i0)]_{\text{cpx}(\mathcal{B})}$$
which is the $i$th column of the coordinate matrix of $\tau^{\mathbb{C}}$ with respect to the basis $\text{cpx}(\mathcal{B})$. Thus we have the following theorem.
Theorem 2.23 Let $\tau \in \mathcal{L}(V)$ where $V$ is a real vector space. The matrix of $\tau^{\mathbb{C}}$ with respect to the basis $\text{cpx}(\mathcal{B})$ is equal to the matrix of $\tau$ with respect to the basis $\mathcal{B}$:
$$[\tau^{\mathbb{C}}]_{\text{cpx}(\mathcal{B})} = [\tau]_{\mathcal{B}}$$
Hence, if a real matrix $A$ represents a linear operator $\tau$ on $V$ then $A$ also represents the complexification $\tau^{\mathbb{C}}$ of $\tau$ on $V^{\mathbb{C}}$.
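For example, the operator $\tau \in \mathcal{L}(\mathbb{R}^2)$ with standard matrix
$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$
(rotation by a quarter turn) has no real eigenvalues, but the same matrix $A$, now viewed as the matrix of $\tau^{\mathbb{C}}$ with respect to $\text{cpx}(\mathcal{B})$, has eigenvalues $\pm i$. This is a typical use of Theorem 2.23: pass to the complexification to gain eigenvalues without changing the representing matrix.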
Exercises
1. Let $A \in \mathcal{M}_{m,n}$ have rank $k$. Prove that there are matrices $X \in \mathcal{M}_{m,k}$ and $Y \in \mathcal{M}_{k,n}$, both of rank $k$, for which $A = XY$. Prove that $A$ has rank $1$ if and only if it has the form $A = x^t y$ where $x$ and $y$ are row matrices.
2. Prove Corollary 2.9 and find an example to show that the corollary does not hold without the finiteness condition.
3. Let $\tau \in \mathcal{L}(V, W)$. Prove that $\tau$ is an isomorphism if and only if it carries a basis for $V$ to a basis for $W$.
4. If $\sigma \in \mathcal{L}(V_1, W_1)$ and $\tau \in \mathcal{L}(V_2, W_2)$ we define the external direct sum $\sigma \boxplus \tau \in \mathcal{L}(V_1 \boxplus V_2, W_1 \boxplus W_2)$ by
$$(\sigma \boxplus \tau)((v_1, v_2)) = (\sigma(v_1), \tau(v_2))$$
Show that $\sigma \boxplus \tau$ is a linear transformation.
5. Let $V = S \oplus T$. Prove that $S \oplus T \approx S \boxplus T$. Thus, internal and external direct sums are equivalent up to isomorphism.
6. Let $V = A + B$ and consider the external direct sum $E = A \boxplus B$. Define a map $\tau : A \boxplus B \to V$ by $\tau(v, w) = v + w$. Show that $\tau$ is linear. What is the kernel of $\tau$? When is $\tau$ an isomorphism?
7. Let $\mathcal{F}$ be a subset of $\mathcal{L}(V)$. A subspace $S$ of $V$ is $\mathcal{F}$-invariant if $S$ is invariant under every $\tau \in \mathcal{F}$. Also, $V$ is $\mathcal{F}$-irreducible if the only $\mathcal{F}$-invariant subspaces of $V$ are $\{0\}$ and $V$. Prove the following form of Schur's lemma. Suppose that $\mathcal{F}_V \subseteq \mathcal{L}(V)$ and $\mathcal{F}_W \subseteq \mathcal{L}(W)$ and $V$ is $\mathcal{F}_V$-irreducible and $W$ is $\mathcal{F}_W$-irreducible. Let $\tau \in \mathcal{L}(V, W)$ satisfy $\tau\mathcal{F}_V = \mathcal{F}_W\tau$, that is, for any $\sigma \in \mathcal{F}_V$ there is a $\sigma' \in \mathcal{F}_W$ such that $\tau\sigma = \sigma'\tau$. Prove that $\tau = 0$ or $\tau$ is an isomorphism.
8. Let $\tau \in \mathcal{L}(V)$ where $\dim(V) < \infty$. If $\operatorname{rk}(\tau) = \operatorname{rk}(\tau^2)$ show that $\operatorname{im}(\tau) \cap \ker(\tau) = \{0\}$.
9. Let $\sigma \in \mathcal{L}(U, V)$ and $\tau \in \mathcal{L}(V, W)$. Show that
$$\operatorname{rk}(\tau\sigma) \le \min\{\operatorname{rk}(\tau), \operatorname{rk}(\sigma)\}$$
10. Let $\sigma \in \mathcal{L}(U, V)$ and $\tau \in \mathcal{L}(V, W)$. Show that
$$\operatorname{null}(\tau\sigma) \le \operatorname{null}(\tau) + \operatorname{null}(\sigma)$$
11. Let $\sigma, \tau \in \mathcal{L}(V)$ where $\sigma$ is invertible. Show that
$$\operatorname{rk}(\sigma\tau) = \operatorname{rk}(\tau) = \operatorname{rk}(\tau\sigma)$$
12. Let $\sigma, \tau \in \mathcal{L}(V, W)$. Show that $\operatorname{rk}(\sigma + \tau) \le \operatorname{rk}(\sigma) + \operatorname{rk}(\tau)$.
13. Let $S$ be a subspace of $V$. Show that there is a $\tau \in \mathcal{L}(V)$ for which $\ker(\tau) = S$. Show also that there exists a $\sigma \in \mathcal{L}(V)$ for which $\operatorname{im}(\sigma) = S$.
14. Suppose that $\sigma, \tau \in \mathcal{L}(V)$.
a) Show that $\sigma = \tau\rho$ for some $\rho \in \mathcal{L}(V)$ if and only if $\operatorname{im}(\sigma) \subseteq \operatorname{im}(\tau)$.
b) Show that $\sigma = \rho\tau$ for some $\rho \in \mathcal{L}(V)$ if and only if $\ker(\tau) \subseteq \ker(\sigma)$.
15. Let $V = S_1 \oplus S_2$. Define linear operators $\rho_i$ on $V$ by $\rho_i(s_1 + s_2) = s_i$ for $i = 1, 2$. These are referred to as projection operators. Show that
1) $\rho_i^2 = \rho_i$
2) $\rho_1 + \rho_2 = \iota$, where $\iota$ is the identity map on $V$.
3) $\rho_i\rho_j = 0$ for $i \neq j$, where $0$ is the zero map.
4) $V = \operatorname{im}(\rho_1) \oplus \operatorname{im}(\rho_2)$
16. Let $\dim(V) < \infty$ and suppose that $\tau \in \mathcal{L}(V)$ satisfies $\tau^2 = 0$. Show that $\operatorname{rk}(\tau) \le \frac{1}{2}\dim(V)$.
17. Let $A$ be an $m \times n$ matrix over $F$. What is the relationship between the linear transformation $\tau_A : F^n \to F^m$ and the system of equations $AX = B$? Use your knowledge of linear transformations to state and prove various results concerning the system $AX = B$, especially when $B = 0$.
18. Let $V$ have basis $\mathcal{B} = \{v_1, \dots, v_n\}$. Suppose that for each $i, j$ we define $\tau_{i,j} \in \mathcal{L}(V)$ by
$$\tau_{i,j}(v_k) = \begin{cases} v_k & \text{if } k \neq i \\ v_k + v_j & \text{if } k = i \end{cases}$$
Prove that the $\tau_{i,j}$ are invertible and form a basis for $\mathcal{L}(V)$.
19. Let $\tau \in \mathcal{L}(V)$. If $S$ is a $\tau$-invariant subspace of $V$ must there be a subspace $T$ of $V$ for which $(S, T)$ reduces $\tau$?
20. Find an example of a vector space $V$ and a proper subspace $S$ of $V$ for which $V \approx S$.
21. Let $\dim(V) < \infty$. If $\tau, \sigma \in \mathcal{L}(V)$ prove that $\sigma\tau = \iota$ implies that $\sigma$ and $\tau$ are invertible and that $\sigma = p(\tau)$ for some polynomial $p(x) \in F[x]$.
22. Let $\tau \in \mathcal{L}(V)$ where $\dim(V) < \infty$. If $\sigma\tau = \tau\sigma$ for all $\sigma \in \mathcal{L}(V)$ show that $\tau = r\iota$, for some $r \in F$, where $\iota$ is the identity map.
23. Let $A, B \in \mathcal{M}_n(F)$. Let $K$ be a field containing $F$. Show that if $A$ and $B$ are similar over $K$, that is, if $B = PAP^{-1}$ where $P \in \mathcal{M}_n(K)$, then $A$ and $B$ are also similar over $F$, that is, there exists $Q \in \mathcal{M}_n(F)$ for which $B = QAQ^{-1}$. Hint: consider the equation $XA - BX = 0$ as a homogeneous system of linear equations with coefficients in $F$. Does it have a solution? Where?
24. Let $f : \mathbb{R} \to \mathbb{R}$ be a continuous function with the property that
$$f(x + y) = f(x) + f(y)$$
Prove that $f$ is a linear functional on $\mathbb{R}$.
25. Prove that any linear functional $f : \mathbb{R}^n \to \mathbb{R}$ is a continuous map.
26. Prove that any subspace $S$ of $\mathbb{R}^n$ is a closed set or, equivalently, that $S^c = \mathbb{R}^n \setminus S$ is open, that is, for any $x \in S^c$ there is an open ball $B(x, r)$ centered at $x$ with radius $r$ for which $B(x, r) \subseteq S^c$.
27. Prove that any linear transformation $\tau : V \to W$ is continuous under the natural topologies of $V$ and $W$.
28. Prove that any surjective linear transformation from $V$ to $W$ (both finite-dimensional topological vector spaces under the natural topology) is an open map, that is, maps open sets to open sets.
29. Prove that any subspace $S$ of a finite-dimensional vector space $V$ is a closed set or, equivalently, that $S^c$ is open, that is, for any $x \in S^c$ there is an open ball $B(x, r)$ centered at $x$ with radius $r$ for which $B(x, r) \subseteq S^c$.
30. Let $S$ be a subspace of $V$ with $\dim(V) < \infty$.
a) Show that the subspace topology on $S$ inherited from $V$ is the natural topology.
b) Show that the natural topology on $V/S$ is the topology for which the natural projection map $\pi : V \to V/S$ is continuous and open.
31. If $V$ is a real vector space then $V^{\mathbb{C}}$ is a complex vector space. Thinking of $V^{\mathbb{C}}$ as a vector space $(V^{\mathbb{C}})_{\mathbb{R}}$ over $\mathbb{R}$, show that $(V^{\mathbb{C}})_{\mathbb{R}}$ is isomorphic to the external direct product $V \boxplus V$.
34. (When is a complex linear map a complexification?) Let $V$ be a real vector space with complexification $V^{\mathbb{C}}$ and let $\tau \in \mathcal{L}(V^{\mathbb{C}})$. Prove that $\tau$ is a complexification, that is, $\tau$ has the form $\sigma^{\mathbb{C}}$ for some $\sigma \in \mathcal{L}(V)$, if and only if $\tau$ commutes with the conjugate map $\chi : V^{\mathbb{C}} \to V^{\mathbb{C}}$ defined by $\chi(u + iv) = u - iv$.
35. Let $W$ be a complex vector space.
a) Consider replacing the scalar multiplication on $W$ by the operation $(z, w) \mapsto \bar{z}w$ where $z \in \mathbb{C}$ and $w \in W$. Show that the resulting set with the addition defined for the vector space $W$ and with this scalar multiplication is a complex vector space, which we denote by $\overline{W}$.
b) Show, without using dimension arguments, that $(W_{\mathbb{R}})^{\mathbb{C}} \approx W \boxplus \overline{W}$.
36. a) Let $\theta$ be a linear operator on the real vector space $U$ with the property that $\theta^2 = -\iota$. Define a scalar multiplication on $U$ by complex numbers as follows:
$$(a + bi) \cdot v = av + b\theta(v)$$
for $a, b \in \mathbb{R}$ and $v \in U$. Prove that under the addition of $U$ and this scalar multiplication $U$ is a complex vector space, which we denote by $U^\theta$.
b) What is the relationship between $U^\theta$ and $V^{\mathbb{C}}$? Hint: consider $U = V \boxplus V$ and $\theta(u, v) = (-v, u)$.
Chapter 3
The Isomorphism Theorems
Quotient Spaces

Let $S$ be a subspace of a vector space $V$. It is easy to see that the binary relation on $V$ defined by
$$u \equiv v \Leftrightarrow u - v \in S$$
is an equivalence relation. When $u \equiv v$, we say that $u$ and $v$ are congruent modulo $S$. The term mod is used as a colloquialism for modulo and $u \equiv v$ is often written
$$u \equiv v \bmod S$$
When the subspace in question is clear, we will simply write $u \equiv v$. To see what the equivalence classes look like, observe that
$$[v] = \{u \in V \mid u \equiv v\} = \{u \in V \mid u - v \in S\} = \{u \in V \mid u = v + s \text{ for some } s \in S\} = \{v + s \mid s \in S\} = v + S$$
The set
$$[v] = v + S = \{v + s \mid s \in S\}$$
is called a coset of $S$ in $V$ and $v$ is called a coset representative for $v + S$. (Thus, any member of a coset is a coset representative.) The set of all cosets of $S$ in $V$ is denoted by
$$\frac{V}{S} = \{v + S \mid v \in V\}$$
This is read "$V$ mod $S$" and is called the quotient space of $V$ modulo $S$. Of
course, the term space is a hint that we intend to define vector space operations on $V/S$. Note that congruence modulo $S$ is preserved under the vector space operations on $V$, for if $u_1 \equiv v_1$ and $u_2 \equiv v_2$ then
$$u_1 - v_1 \in S,\; u_2 - v_2 \in S \Rightarrow (u_1 - v_1) + (u_2 - v_2) \in S \Rightarrow (u_1 + u_2) - (v_1 + v_2) \in S \Rightarrow u_1 + u_2 \equiv v_1 + v_2$$
(and similarly for scalar multiplication). A natural choice for vector space operations on $V/S$ is
$$r(u + S) = ru + S$$
$$(u + S) + (v + S) = (u + v) + S$$
However, in order to show that these operations are well-defined, it is necessary to show that they do not depend on the choice of coset representatives, that is, if $u_1 + S = u_2 + S$ and $v_1 + S = v_2 + S$ then
$$ru_1 + S = ru_2 + S$$
$$(u_1 + v_1) + S = (u_2 + v_2) + S$$
The straightforward details of this are left to the reader. Let us summarize.

Theorem 3.1 Let $S$ be a subspace of $V$. The binary relation
$$u \equiv v \Leftrightarrow u - v \in S$$
is an equivalence relation on $V$, whose equivalence classes are the cosets
$$v + S = \{v + s \mid s \in S\}$$
of $S$ in $V$. The set $V/S$ of all cosets of $S$ in $V$, called the quotient space of $V$ modulo $S$, is a vector space under the well-defined operations
$$r(u + S) = ru + S$$
$$(u_1 + S) + (u_2 + S) = (u_1 + u_2) + S$$
The zero vector in $V/S$ is the coset $0 + S = S$.
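For example, let $V = \mathbb{R}^2$ and $S = \{(x, 0) \mid x \in \mathbb{R}\}$. Then $(a, b) + S$ is the horizontal line through $(a, b)$, and $(a, b) + S = (c, d) + S$ if and only if $b = d$. The quotient $V/S$ is the set of horizontal lines in the plane, with
$$[(a, b) + S] + [(c, d) + S] = (a + c, b + d) + S$$
so a line is determined by its height and $V/S$ behaves exactly like a copy of $\mathbb{R}$.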
The Natural Projection and the Correspondence Theorem

If $S$ is a subspace of $V$ then we can define a map $\pi_S : V \to V/S$ by sending each vector to the coset containing it:
$$\pi_S(v) = v + S$$
This map is called the canonical projection or natural projection of $V$ onto $V/S$, or simply projection modulo $S$. It is easily seen to be linear, for we have
(writing $\pi$ for $\pi_S$)
$$\pi(ru + sv) = (ru + sv) + S = r(u + S) + s(v + S) = r\pi(u) + s\pi(v)$$
The canonical projection is clearly surjective. To determine the kernel of $\pi$, note that
$$v \in \ker(\pi) \Leftrightarrow \pi(v) = 0 \Leftrightarrow v + S = S \Leftrightarrow v \in S$$
and so $\ker(\pi) = S$.

Theorem 3.2 The canonical projection $\pi_S : V \to V/S$ defined by $\pi_S(v) = v + S$ is a surjective linear transformation with $\ker(\pi_S) = S$.
If $S$ is a subspace of $V$ then the subspaces of the quotient space $V/S$ have the form $T/S$ for some intermediate subspace $T$ satisfying $S \subseteq T \subseteq V$. In fact, as shown in Figure 3.1, the projection map $\pi_S$ provides a one-to-one correspondence between intermediate subspaces $S \subseteq T \subseteq V$ and subspaces of the quotient space $V/S$. The proof of the following theorem is left as an exercise.
Figure 3.1: The correspondence theorem

Theorem 3.3 (The correspondence theorem) Let $S$ be a subspace of $V$. Then the function that assigns to each intermediate subspace $S \subseteq T \subseteq V$ the subspace $T/S$ of $V/S$ is an order-preserving (with respect to set inclusion) one-to-one correspondence between the set of all subspaces of $V$ containing $S$ and the set of all subspaces of $V/S$.
The Universal Property of Quotients and the First Isomorphism Theorem

Let $S$ be a subspace of $V$. The pair $(V/S, \pi_S)$ has a very special property, known as the universal property, a term that comes from the world of category theory.
Figure 3.2 shows a linear transformation $\tau \in \mathcal{L}(V, W)$, along with the canonical projection $\pi_S$ from $V$ to the quotient space $V/S$.
Figure 3.2: The universal property

The universal property states that if $S \subseteq \ker(\tau)$ then there is a unique $\tau' : V/S \to W$ for which
$$\tau' \circ \pi_S = \tau$$
Another way to say this is that any such $\tau \in \mathcal{L}(V, W)$ can be factored through the canonical projection $\pi_S$.

Theorem 3.4 Let $S$ be a subspace of $V$ and let $\tau \in \mathcal{L}(V, W)$ satisfy $S \subseteq \ker(\tau)$. Then, as pictured in Figure 3.2, there is a unique linear transformation $\tau' : V/S \to W$ with the property that
$$\tau' \circ \pi_S = \tau$$
Moreover, $\ker(\tau') = \ker(\tau)/S$ and $\operatorname{im}(\tau') = \operatorname{im}(\tau)$.
Proof. We have no other choice but to define $\tau'$ by the condition $\tau' \circ \pi_S = \tau$, that is,
$$\tau'(v + S) = \tau(v)$$
This function is well-defined if and only if
$$v + S = u + S \Rightarrow \tau'(v + S) = \tau'(u + S)$$
which is equivalent to each of the following statements:
$$v + S = u + S \Rightarrow \tau(v) = \tau(u)$$
$$v - u \in S \Rightarrow \tau(v - u) = 0$$
$$x \in S \Rightarrow \tau(x) = 0$$
$$S \subseteq \ker(\tau)$$
Thus, $\tau' : V/S \to W$ is well-defined. Also,
$$\operatorname{im}(\tau') = \{\tau'(v + S) \mid v \in V\} = \{\tau(v) \mid v \in V\} = \operatorname{im}(\tau)$$
and
$$\ker(\tau') = \{v + S \mid \tau'(v + S) = 0\} = \{v + S \mid \tau(v) = 0\} = \{v + S \mid v \in \ker(\tau)\} = \ker(\tau)/S$$
The uniqueness of $\tau'$ is evident.
Theorem 3.4 has a very important corollary, which is often called the first isomorphism theorem and is obtained by taking $S = \ker(\tau)$.

Theorem 3.5 (The first isomorphism theorem) Let $\tau : V \to W$ be a linear transformation. Then the linear transformation $\tau' : V/\ker(\tau) \to W$ defined by
$$\tau'(v + \ker(\tau)) = \tau(v)$$
is injective and
$$\frac{V}{\ker(\tau)} \approx \operatorname{im}(\tau)$$
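For example, let $\tau : \mathbb{R}^3 \to \mathbb{R}^2$ be defined by $\tau(x, y, z) = (x, y)$. Then $\ker(\tau) = \langle e_3 \rangle$ and $\operatorname{im}(\tau) = \mathbb{R}^2$, and Theorem 3.5 gives
$$\frac{\mathbb{R}^3}{\langle e_3 \rangle} \approx \mathbb{R}^2$$
with $\tau'((x, y, z) + \langle e_3 \rangle) = (x, y)$: each coset (a vertical line) is sent to the point where it meets the $xy$-plane.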
According to Theorem 3.5, the image of any linear transformation on $V$ is isomorphic to a quotient space of $V$. Conversely, any quotient space $V/S$ of $V$ is the image of a linear transformation on $V$: the canonical projection $\pi_S$. Thus, up to isomorphism, quotient spaces are equivalent to homomorphic images.
Quotient Spaces, Complements and Codimension

The first isomorphism theorem gives some insight into the relationship between complements and quotient spaces. Let $S$ be a subspace of $V$ and let $T$ be a complement of $S$, that is,
$$V = S \oplus T$$
Since every vector $v \in V$ has the form $v = s + t$ for unique vectors $s \in S$ and $t \in T$, we can define a linear operator $\rho : V \to V$ by setting
$$\rho(s + t) = t$$
Because $s$ and $t$ are unique, $\rho$ is well-defined. It is called projection onto $T$ along $S$. (Note the word onto, rather than modulo, in the definition; this is not the same as projection modulo a subspace.) It is clear that
$$\operatorname{im}(\rho) = T \quad\text{and}\quad \ker(\rho) = \{s + t \in V \mid t = 0\} = S$$
Hence, the first isomorphism theorem implies that
$$T \approx \frac{V}{S}$$
Theorem 3.6 Let $S$ be a subspace of $V$. All complements of $S$ in $V$ are isomorphic to $V/S$ and hence to each other.
The previous theorem can be rephrased by writing
$$A \oplus B = A \oplus C \Rightarrow B \approx C$$
On the other hand, quotients and complements do not behave as nicely with respect to isomorphisms as one might casually think. We leave it to the reader to show the following:
1) It is possible that $A \oplus B = C \oplus D$ with $A \approx C$ but $B \not\approx D$. Hence, $A \approx C$ does not imply that a complement of $A$ is isomorphic to a complement of $C$.
2) It is possible that $V \approx W$ and $V = S \oplus B$ and $W = S \oplus D$ but $B \not\approx D$. Hence, $V \approx W$ does not imply that $V/S \approx W/S$. (However, according to the previous theorem, if $V$ equals $W$ then $B \approx D$.)

Corollary 3.7 Let $S$ be a subspace of a vector space $V$. Then
$$\dim(V) = \dim(S) + \dim(V/S)$$
Definition If $S$ is a subspace of $V$ then $\dim(V/S)$ is called the codimension of $S$ in $V$ and is denoted by $\operatorname{codim}(S)$ or $\operatorname{codim}_V(S)$.
Thus, the codimension of $S$ in $V$ is the dimension of any complement of $S$ in $V$ and when $V$ is finite-dimensional, we have
$$\operatorname{codim}_V(S) = \dim(V) - \dim(S)$$
(This makes no sense, in general, if $V$ is not finite-dimensional, since infinite cardinal numbers cannot be subtracted.)
Additional Isomorphism Theorems

There are several other isomorphism theorems that are consequences of the first isomorphism theorem. As we have seen, if $V = S \oplus T$ then $V/T \approx S$. This can be written
$$\frac{S \oplus T}{T} \approx S \approx \frac{S}{S \cap T}$$
This applies to nondirect sums as well.

Theorem 3.8 (The second isomorphism theorem) Let $V$ be a vector space and let $S$ and $T$ be subspaces of $V$. Then
$$\frac{S + T}{T} \approx \frac{S}{S \cap T}$$
Proof. Let $\tau : (S + T) \to S/(S \cap T)$ be defined by
$$\tau(s + t) = s + (S \cap T)$$
We leave it to the reader to show that $\tau$ is a well-defined surjective linear transformation, with kernel $T$. An application of the first isomorphism theorem then completes the proof.
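To illustrate, let $S$ and $T$ be two distinct planes through the origin in $\mathbb{R}^3$. Then $S + T = \mathbb{R}^3$ and $S \cap T$ is a line, and both sides of Theorem 3.8 are one-dimensional:
$$\dim\frac{S + T}{T} = 3 - 2 = 1 = 2 - 1 = \dim\frac{S}{S \cap T}$$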
The following theorem demonstrates one way in which the expression $V/S$ behaves like a fraction.

Theorem 3.9 (The third isomorphism theorem) Let $V$ be a vector space and suppose that $S \subseteq T \subseteq V$ are subspaces of $V$. Then
$$\frac{V/S}{T/S} \approx \frac{V}{T}$$
Proof. Let $\tau : V/S \to V/T$ be defined by $\tau(v + S) = v + T$. We leave it to the reader to show that $\tau$ is a well-defined surjective linear transformation whose kernel is $T/S$. The rest follows from the first isomorphism theorem.
The following theorem demonstrates one way in which the expression $V/S$ does not behave like a fraction.

Theorem 3.10 Let $V$ be a vector space and let $S$ be a subspace of $V$. Suppose that $V = V_1 \oplus V_2$ and $S = S_1 \oplus S_2$ with $S_i \subseteq V_i$. Then
$$\frac{V}{S} = \frac{V_1 \oplus V_2}{S_1 \oplus S_2} \approx \frac{V_1}{S_1} \boxplus \frac{V_2}{S_2}$$
Proof. Let $\tau : V \to (V_1/S_1) \boxplus (V_2/S_2)$ be defined by
$$\tau(v_1 + v_2) = (v_1 + S_1, v_2 + S_2)$$
This map is well-defined, since the sum $V = V_1 \oplus V_2$ is direct. We leave it to the reader to show that $\tau$ is a surjective linear transformation, whose kernel is $S_1 \oplus S_2$. The rest follows from the first isomorphism theorem.
Linear Functionals

Linear transformations from $V$ to the base field $F$ (thought of as a vector space over itself) are extremely important.

Definition Let $V$ be a vector space over $F$. A linear transformation $f \in \mathcal{L}(V, F)$, whose values lie in the base field $F$, is called a linear functional (or simply functional) on $V$. (Some authors use the term linear function.) The vector space of all linear functionals on $V$ is denoted by $V^*$ and is called the algebraic dual space of $V$.
The adjective algebraic is needed here, since there is another type of dual space that is defined on general normed vector spaces, where continuity of linear transformations makes sense. We will discuss the so-called continuous dual space briefly in Chapter 13. However, until then, the term "dual space" will refer to the algebraic dual space. To help distinguish linear functionals from other types of linear transformations, we will usually denote linear functionals by lowercase italic letters, such as $f$, $g$ and $h$.

Example 3.1 The map $f : F[x] \to F$ defined by $f(p(x)) = p(a)$, for a fixed $a \in F$, is a linear functional, known as evaluation at $a$.
Example 3.2 Let $\mathcal{C}[a, b]$ denote the vector space of all continuous functions on $[a, b] \subseteq \mathbb{R}$. Let $f : \mathcal{C}[a, b] \to \mathbb{R}$ be defined by
$$f(g(x)) = \int_a^b g(x)\,dx$$
Then $f \in \mathcal{C}[a, b]^*$.
For any $f \in V^*$, the rank plus nullity theorem is
$$\dim(\ker(f)) + \dim(\operatorname{im}(f)) = \dim(V)$$
But since $\operatorname{im}(f) \subseteq F$, we have either $\operatorname{im}(f) = \{0\}$, in which case $f$ is the zero linear functional, or $\operatorname{im}(f) = F$, in which case $f$ is surjective. In other words, a nonzero linear functional is surjective. Moreover, if $f \neq 0$ then
$$\operatorname{codim}(\ker(f)) = \dim\Big(\frac{V}{\ker(f)}\Big) = 1$$
and if $\dim(V) < \infty$ then
$$\dim(\ker(f)) = \dim(V) - 1$$
Thus, in dimensional terms, the kernel of a linear functional is a very "large" subspace of the domain $V$. The following theorem will prove very useful.

Theorem 3.11
1) For any nonzero vector $v \in V$, there exists a linear functional $f \in V^*$ for which $f(v) \neq 0$.
2) A vector $v \in V$ is zero if and only if $f(v) = 0$ for all $f \in V^*$.
3) Let $f \in V^*$. If $f(x) \neq 0$ then
$$V = \langle x \rangle \oplus \ker(f)$$
4) Two nonzero linear functionals $f, g \in V^*$ have the same kernel if and only if there is a nonzero scalar $r$ such that $g = rf$.
Proof. For part 3), if $0 \neq v \in \langle x \rangle \cap \ker(f)$ then $f(v) = 0$ and $v = rx$ for $0 \neq r \in F$, whence $f(x) = 0$, which is false. Hence, $\langle x \rangle \cap \ker(f) = \{0\}$ and the direct sum $S = \langle x \rangle \oplus \ker(f)$ exists. Also, for any $v \in V$ we have
$$v = \frac{f(v)}{f(x)}x + \Big(v - \frac{f(v)}{f(x)}x\Big) \in \langle x \rangle + \ker(f)$$
and so $V = \langle x \rangle \oplus \ker(f)$. For part 4), if $g = rf$ for $r \neq 0$ then $\ker(f) = \ker(g)$. Conversely, if $K = \ker(f) = \ker(g)$ then for $x \notin K$ we have, by part 3),
$$V = \langle x \rangle \oplus K$$
Of course, $rf|_K = g|_K$ for any $r$. Therefore, if $r = g(x)/f(x)$, it follows that $rf(x) = g(x)$ and hence $rf = g$.
Dual Bases

Let $V$ be a vector space with basis $\mathcal{B} = \{v_i \mid i \in I\}$. For each $i \in I$, we can define a linear functional $v_i^* \in V^*$ by the orthogonality condition
$$v_i^*(v_j) = \delta_{i,j}$$
where $\delta_{i,j}$ is the Kronecker delta function, defined by
$$\delta_{i,j} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
Then the set $\mathcal{B}^* = \{v_i^* \mid i \in I\}$ is linearly independent, since applying the equation
$$0 = r_1 v_{i_1}^* + \cdots + r_n v_{i_n}^*$$
to the basis vector $v_{i_j}$ gives
$$0 = \sum_{k=1}^n r_k v_{i_k}^*(v_{i_j}) = \sum_{k=1}^n r_k \delta_{i_k, i_j} = r_j$$
for all $j$.

Theorem 3.12 Let $V$ be a vector space with basis $\mathcal{B} = \{v_i \mid i \in I\}$.
1) The set $\mathcal{B}^* = \{v_i^* \mid i \in I\}$ is linearly independent.
2) If $V$ is finite-dimensional then $\mathcal{B}^*$ is a basis for $V^*$, called the dual basis of $\mathcal{B}$.
Proof. For part 2), for any $f \in V^*$, we have
$$\sum_i f(v_i)v_i^*(v_j) = \sum_i f(v_i)\delta_{i,j} = f(v_j)$$
and so $f = \sum_i f(v_i)v_i^*$ is in the span of $\mathcal{B}^*$. Hence, $\mathcal{B}^*$ is a basis for $V^*$.
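For example, for the basis $\mathcal{B} = (v_1, v_2)$ of $\mathbb{R}^2$ with $v_1 = (1, 1)$ and $v_2 = (1, -1)$, the dual basis is
$$v_1^*(x, y) = \frac{x + y}{2}, \qquad v_2^*(x, y) = \frac{x - y}{2}$$
since these are the unique linear functionals satisfying $v_i^*(v_j) = \delta_{i,j}$, and any $f \in (\mathbb{R}^2)^*$ can be written $f = f(v_1)v_1^* + f(v_2)v_2^*$.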
Corollary 3.13 If $\dim(V) < \infty$ then $\dim(V^*) = \dim(V)$.
The next example shows that Corollary 3.13 does not hold without the finiteness condition.

Example 3.3 Let $V$ be an infinite-dimensional vector space over the field $F = \mathbb{Z}_2 = \{0, 1\}$, with basis $\mathcal{B}$. Since the only coefficients in $F$ are $0$ and $1$, a finite linear combination over $F$ is just a finite sum. Hence, $V$ is the set of all finite sums of vectors in $\mathcal{B}$ and so according to Theorem 0.11,
$$|V| \le |\mathcal{F}(\mathcal{B})| = |\mathcal{B}|$$
where $\mathcal{F}(\mathcal{B})$ is the set of all finite subsets of $\mathcal{B}$. On the other hand, each linear functional $f \in V^*$ is uniquely defined by specifying its values on the basis $\mathcal{B}$. Since these values must be either $0$ or $1$, specifying a linear functional is equivalent to specifying the subset of $\mathcal{B}$ on which $f$ takes the value $1$. In other words, there is a one-to-one correspondence between linear functionals on $V$ and all subsets of $\mathcal{B}$. Hence,
$$|V^*| = |\mathcal{P}(\mathcal{B})| > |\mathcal{B}| \ge |V|$$
This shows that $V^*$ cannot be isomorphic to $V$, nor to any proper subset of $V$. Hence, $\dim(V^*) > \dim(V)$.
Reflexivity

If $V$ is a vector space then so is the dual space $V^*$. Hence, we may form the double (algebraic) dual space $V^{**}$, which consists of all linear functionals $\Phi : V^* \to F$. In other words, an element of $V^{**}$ is a linear map that assigns a scalar to each linear functional on $V$.
With this firmly in mind, there is one rather obvious way to obtain an element of $V^{**}$. Namely, if $v \in V$, consider the map $\bar{v} : V^* \to F$ defined by
$$\bar{v}(f) = f(v)$$
which sends the linear functional $f$ to the scalar $f(v)$. The map $\bar{v}$ is called evaluation at $v$. To see that $\bar{v} \in V^{**}$, if $f, g \in V^*$ and $r, s \in F$ then
$$\bar{v}(rf + sg) = (rf + sg)(v) = rf(v) + sg(v) = r\bar{v}(f) + s\bar{v}(g)$$
and so $\bar{v}$ is indeed linear. We can now define a map $\tau : V \to V^{**}$ by
$$\tau(v) = \bar{v}$$
This is called the canonical map (or the natural map) from $V$ to $V^{**}$. This map is injective and hence, in the finite-dimensional case, it is also surjective.

Theorem 3.14 The canonical map $\tau : V \to V^{**}$ defined by $\tau(v) = \bar{v}$, where $\bar{v}$ is evaluation at $v$, is a monomorphism. If $V$ is finite-dimensional then $\tau$ is an isomorphism.
Proof. The map $\tau$ is linear since
$$\overline{ru + sv}(f) = f(ru + sv) = rf(u) + sf(v) = (r\bar{u} + s\bar{v})(f)$$
for all $f \in V^*$. To determine the kernel of $\tau$, observe that
$$\tau(v) = 0 \Rightarrow \bar{v} = 0 \Rightarrow \bar{v}(f) = 0 \text{ for all } f \in V^* \Rightarrow f(v) = 0 \text{ for all } f \in V^* \Rightarrow v = 0$$
by Theorem 3.11 and so $\ker(\tau) = \{0\}$. In the finite-dimensional case, since $\dim(V^{**}) = \dim(V^*) = \dim(V)$, it follows that $\tau$ is also surjective, hence an isomorphism.
Note that if $\dim(V) < \infty$ then, since the dimensions of $V$ and $V^{**}$ are the same, we deduce immediately that $V \approx V^{**}$. This is not the point of Theorem 3.14. The point is that the natural map $v \mapsto \bar{v}$ is an isomorphism. Because of this, $V$ is said to be algebraically reflexive. Thus, Theorem 3.14 implies that all finite-dimensional vector spaces are algebraically reflexive. If $V$ is finite-dimensional, it is customary to identify the double dual space $V^{**}$ with $V$ and to think of the elements of $V^{**}$ simply as vectors in $V$. Let us consider an example of a vector space that is not algebraically reflexive.
Example 3.4 Let $V$ be the vector space over $\mathbb{Z}_2$ with basis
$$e_i = (0, \dots, 0, 1, 0, \dots)$$
where the $1$ is in the $i$th position. Thus, $V$ is the set of all infinite binary sequences with a finite number of $1$'s. Define the order $o(v)$ of any $v \in V$ to be the largest coordinate of $v$ with value $1$. Then $o(v) < \infty$ for all $v \in V$. Consider the dual vectors $e_i^*$, defined (as usual) by
$$e_i^*(e_j) = \delta_{i,j}$$
For any $v \in V$, the evaluation functional $\bar{v}$ has the property that
$$\bar{v}(e_i^*) = e_i^*(v) = 0 \quad\text{if } i > o(v)$$
However, since the dual vectors $e_i^*$ are linearly independent, there is a linear functional $\Phi \in V^{**}$ for which $\Phi(e_i^*) = 1$ for all $i$. Hence, $\Phi$ does not have the form $\bar{v}$ for any $v \in V$. This shows that the canonical map is not surjective and so $V$ is not algebraically reflexive.
Annihilators

The functions $f \in V^*$ are defined on vectors in $V$, but we may also define $f$ on subsets $M$ of $V$ by letting
$$f(M) = \{f(v) \mid v \in M\}$$
Definition Let $M$ be a nonempty subset of a vector space $V$. The annihilator $M^0$ of $M$ is
$$M^0 = \{f \in V^* \mid f(M) = \{0\}\}$$

The term annihilator is quite descriptive, since $M^0$ consists of all linear functionals that annihilate (send to $0$) every vector in $M$. It is not hard to see that $M^0$ is a subspace of $V^*$, even when $M$ is not a subspace of $V$. The basic properties of annihilators are contained in the following theorem, whose proof is left to the reader.

Theorem 3.15
1) (Order-reversing) For any subsets $M$ and $N$ of $V$,
$$M \subseteq N \Rightarrow N^0 \subseteq M^0$$
2) If $\dim(V) < \infty$ then we have $M^{00} = \operatorname{span}(M)$ under the natural map. In particular, if $S$ is a subspace of $V$ then $S^{00} = S$.
3) If $\dim(V) < \infty$ and $S$ and $T$ are subspaces of $V$ then
$$(S \cap T)^0 = S^0 + T^0 \quad\text{and}\quad (S + T)^0 = S^0 \cap T^0$$
Consider a direct sum decomposition
$$V = S \oplus T$$
Then any linear functional $f \in T^*$ can be extended to a linear functional $\bar{f}$ on $V$ by setting $\bar{f}(S) = 0$. Let us call this extension by $0$. Clearly, $\bar{f} \in S^0$ and it is easy to see that the extension by $0$ map $f \mapsto \bar{f}$ is an isomorphism from $T^*$ to $S^0$, whose inverse is restriction to $T$.

Theorem 3.16 Let $V = S \oplus T$.
a) The extension by $0$ map is an isomorphism from $T^*$ to $S^0$ and so
$$T^* \approx S^0$$
b) If $V$ is finite-dimensional then
$$\dim(S^0) = \operatorname{codim}_V(S) = \dim(V) - \dim(S)$$
Example 3.5 Part b) of Theorem 3.16 may fail in the infinite-dimensional case, since it may easily happen that $S^0 \approx V^*$. As an example, let $V$ be the vector space over $\mathbb{Z}_2$ with a countably infinite ordered basis $\mathcal{B} = (b_1, b_2, \dots)$. Let $S = \langle b_1 \rangle$ and $T = \langle b_2, b_3, \dots \rangle$. It is easy to see that $S^0 \approx T^* \approx V^*$ and that $\dim(V^*) > \dim(V)$.
The annihilator provides a way to describe the dual space of a direct sum.

Theorem 3.17 A linear functional on the direct sum $V = S \oplus T$ can be written as a direct sum of a linear functional that annihilates $S$ and a linear functional that annihilates $T$, that is,
$$(S \oplus T)^* = S^0 \oplus T^0$$
Proof. Clearly, $S^0 \cap T^0 = \{0\}$, since any functional that annihilates both $S$ and $T$ must annihilate $S \oplus T = V$. Hence, the sum $S^0 + T^0$ is direct. If $f \in V^*$ then, writing $\rho_S$ and $\rho_T$ for the projections onto $S$ and $T$, we can write
$$f = (f \circ \rho_T) + (f \circ \rho_S) \in S^0 \oplus T^0$$
and so $V^* = S^0 \oplus T^0$.
Operator Adjoints

If $\tau \in \mathcal{L}(V, W)$ then we may define a map $\tau^\times : W^* \to V^*$ by
$$\tau^\times(f) = f \circ \tau = f\tau$$
for $f \in W^*$. (We will write composition as juxtaposition.) Thus, for any $v \in V$,
$$[\tau^\times(f)](v) = f(\tau(v))$$
The map $\tau^\times$ is called the operator adjoint of $\tau$ and can be described by the phrase "apply $\tau$ first."

Theorem 3.18 (Properties of the Operator Adjoint)
1) For $\sigma, \tau \in \mathcal{L}(V, W)$ and $r, s \in F$,
$$(r\sigma + s\tau)^\times = r\sigma^\times + s\tau^\times$$
2) For $\tau \in \mathcal{L}(V, W)$ and $\sigma \in \mathcal{L}(W, U)$,
$$(\sigma\tau)^\times = \tau^\times\sigma^\times$$
3) For any invertible $\tau \in \mathcal{L}(V)$,
$$(\tau^{-1})^\times = (\tau^\times)^{-1}$$
Proof. Proof of part 1) is left for the reader. For part 2), we have for all $f \in U^*$,
$$(\sigma\tau)^\times(f) = f(\sigma\tau) = \tau^\times(f\sigma) = \tau^\times(\sigma^\times(f)) = (\tau^\times\sigma^\times)(f)$$
Part 3) follows from part 2) and
$$\tau^\times(\tau^{-1})^\times = (\tau^{-1}\tau)^\times = \iota^\times = \iota$$
and in the same way, $(\tau^{-1})^\times\tau^\times = \iota$. Hence $(\tau^{-1})^\times = (\tau^\times)^{-1}$.

If $\tau \in \mathcal{L}(V, W)$ then $\tau^\times \in \mathcal{L}(W^*, V^*)$ and so $\tau^{\times\times} \in \mathcal{L}(V^{**}, W^{**})$. Of course, $\tau^{\times\times}$ is not equal to $\tau$. However, in the finite-dimensional case, if we use the natural maps to identify $V^{**}$ with $V$ and $W^{**}$ with $W$ then we can think of $\tau^{\times\times}$ as being in $\mathcal{L}(V, W)$. Using these identifications, we do have equality in the finite-dimensional case.

Theorem 3.19 Let $V$ and $W$ be finite-dimensional and let $\tau \in \mathcal{L}(V, W)$. If we identify $V^{**}$ with $V$ and $W^{**}$ with $W$ using the natural maps then $\tau^{\times\times}$ is identified with $\tau$.
Proof. For any $x \in V$ let the corresponding element of $V^{**}$ be denoted by $\bar{x}$ and similarly for $W$. Then before making any identifications, we have for $v \in V$
$$\tau^{\times\times}(\bar{v})(f) = \bar{v}[\tau^\times(f)] = \bar{v}(f\tau) = f(\tau(v)) = \overline{\tau(v)}(f)$$
for all $f \in W^*$ and so
$$\tau^{\times\times}(\bar{v}) = \overline{\tau(v)} \in W^{**}$$
Therefore, using the canonical identifications for both $V^{**}$ and $W^{**}$, we have $\tau^{\times\times}(v) = \tau(v)$ for all $v \in V$.
The next result describes the kernel and image of the operator adjoint.

Theorem 3.20 Let $\tau \in \mathcal{L}(V, W)$. Then
1) $\ker(\tau^\times) = \operatorname{im}(\tau)^0$
2) $\operatorname{im}(\tau^\times) = \ker(\tau)^0$
Proof. For part 1),
$$\ker(\tau^\times) = \{f \in W^* \mid \tau^\times(f) = 0\} = \{f \in W^* \mid f(\tau(V)) = \{0\}\} = \{f \in W^* \mid f(\operatorname{im}(\tau)) = \{0\}\} = \operatorname{im}(\tau)^0$$
For part 2), if $g = f\tau = \tau^\times(f) \in \operatorname{im}(\tau^\times)$ then $\ker(\tau) \subseteq \ker(g)$ and so $g \in \ker(\tau)^0$. For the reverse inclusion, let $g \in \ker(\tau)^0 \subseteq V^*$. On $K = \ker(\tau)$, there is no problem since $g$ and $\tau^\times(f) = f\tau$ agree on $K$ for any $f \in W^*$. Let $S$ be a complement of $\ker(\tau)$. Then $\tau$ maps a basis $\mathcal{B} = \{s_i \mid i \in I\}$ for $S$ to a linearly independent set $\tau(\mathcal{B}) = \{\tau(s_i) \mid i \in I\}$ in $W$ and so we can define $f \in W^*$ any way we want on $\tau(\mathcal{B})$. In particular, let $f \in W^*$ be defined by setting $f(\tau(s_i)) = g(s_i)$ and extending in any manner to all of $W$. Then $g = f\tau = \tau^\times(f)$ on $\mathcal{B}$ and therefore on $S$. Thus, $g = \tau^\times(f) \in \operatorname{im}(\tau^\times)$.
Corollary 3.21 Let $\tau \in \mathcal{L}(V, W)$, where $V$ and $W$ are finite-dimensional. Then $\operatorname{rk}(\tau) = \operatorname{rk}(\tau^\times)$.
In the finite-dimensional case, $\tau$ and $\tau^\times$ can both be represented by matrices. Let $\mathcal{B} = (b_1, \dots, b_n)$ and $\mathcal{C} = (c_1, \dots, c_m)$ be ordered bases for $V$ and $W$, respectively, and let
$$\mathcal{B}^* = (b_1^*, \dots, b_n^*) \quad\text{and}\quad \mathcal{C}^* = (c_1^*, \dots, c_m^*)$$
be the corresponding dual bases. Then
$$([\tau]_{\mathcal{B},\mathcal{C}})_{i,j} = ([\tau(b_j)]_{\mathcal{C}})_i = c_i^*(\tau(b_j))$$
and
$$([\tau^\times]_{\mathcal{C}^*,\mathcal{B}^*})_{i,j} = ([\tau^\times(c_j^*)]_{\mathcal{B}^*})_i = \tau^\times(c_j^*)(b_i) = c_j^*(\tau(b_i))$$
Comparing the last two expressions we see that they are the same except that the roles of $i$ and $j$ are reversed. Hence, the matrices in question are transposes.

Theorem 3.22 Let $\tau \in \mathcal{L}(V, W)$, where $V$ and $W$ are finite-dimensional. If $\mathcal{B}$ and $\mathcal{C}$ are ordered bases for $V$ and $W$, respectively, and $\mathcal{B}^*$ and $\mathcal{C}^*$ are the corresponding dual bases, then
$$[\tau^\times]_{\mathcal{C}^*,\mathcal{B}^*} = ([\tau]_{\mathcal{B},\mathcal{C}})^t$$
In words, the matrices of $\tau$ and its operator adjoint $\tau^\times$ are transposes of one another.
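As a small check of Theorem 3.22, let $\tau : \mathbb{R}^2 \to \mathbb{R}^3$ be given by $\tau(x, y) = (x, y, x + y)$, with $\mathcal{B}$ and $\mathcal{C}$ the standard bases. Then
$$[\tau]_{\mathcal{B},\mathcal{C}} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}$$
and, for instance, $\tau^\times(c_3^*)(x, y) = c_3^*(\tau(x, y)) = x + y$, so $\tau^\times(c_3^*) = b_1^* + b_2^*$, whose coordinate column $(1, 1)^t$ is precisely the third column of $([\tau]_{\mathcal{B},\mathcal{C}})^t$.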
Exercises
1. If $V$ is infinite-dimensional and $S$ is an infinite-dimensional subspace, must the dimension of $V/S$ be finite? Explain.
2. Prove the correspondence theorem.
3. Prove the first isomorphism theorem.
4. Complete the proof of Theorem 3.10.
5. Let $S$ be a subspace of $V$. Starting with a basis $\{s_1, \dots, s_k\}$ for $S$, how would you find a basis for $V/S$?
6. Use the first isomorphism theorem to prove the rank-plus-nullity theorem
$$\operatorname{rk}(\tau) + \operatorname{null}(\tau) = \dim(V)$$
for $\tau \in \mathcal{L}(V, W)$.
7. Let $\tau \in \mathcal{L}(V)$ and suppose that $S$ is a subspace of $V$. Define a map $\tau' : V/S \to V/S$ by
$$\tau'(v + S) = \tau(v) + S$$
When is $\tau'$ well-defined? If $\tau'$ is well-defined, is it a linear transformation? What are $\operatorname{im}(\tau')$ and $\ker(\tau')$?
8. Show that for any nonzero vector $v \in V$, there exists a linear functional $f \in V^*$ for which $f(v) \neq 0$.
9. Show that a vector $v \in V$ is zero if and only if $f(v) = 0$ for all $f \in V^*$.
10. Let $S$ be a proper subspace of a finite-dimensional vector space $V$ and let $v \in V \setminus S$. Show that there is a linear functional $f \in V^*$ for which $f(v) = 1$ and $f(s) = 0$ for all $s \in S$.
11. Find a vector space $V$ and decompositions
$$V = A \oplus B = C \oplus D$$
with $A \approx C$ but $B \not\approx D$. Hence, $A \approx C$ does not imply that $B \approx D$.
12. Find isomorphic vector spaces $V$ and $W$ with $V = S \oplus B$ and $W = S \oplus D$ but $B \not\approx D$. Hence, $V \approx W$ does not imply that $V/S \approx W/S$.
13. Let $V$ be a vector space with
$$V = S_1 \oplus T_1 = S_2 \oplus T_2$$
Prove that if $S_1$ and $S_2$ have finite codimension in $V$ then so does $S_1 \cap S_2$ and
$$\operatorname{codim}(S_1 \cap S_2) \le \dim(T_1) + \dim(T_2)$$
14. Let $V$ be a vector space with
$$V = S_1 \oplus T_1 = S_2 \oplus T_2$$
Suppose that $S_1$ and $S_2$ have finite codimension. Hence, by the previous exercise, so does $S_1 \cap S_2$. Find a direct sum decomposition $V = W \oplus X$ for which (1) $W$ has finite codimension, (2) $W \subseteq S_1 \cap S_2$ and (3) $X \subseteq T_1 + T_2$.
15. Let $\mathcal{B}$ be a basis for an infinite-dimensional vector space $V$ and define, for all $b \in \mathcal{B}$, the map $b' \in V^*$ by $b'(c) = 1$ if $c = b$ and $0$ otherwise. Does $\{b' \mid b \in \mathcal{B}\}$ form a basis for $V^*$? What do you conclude about the concept of a dual basis?
16. Prove that $(S \oplus T)^* \approx S^* \oplus T^*$.
17. Prove that $0^\times = 0$ and $\iota^\times = \iota$, where $0$ is the zero linear operator and $\iota$ is the identity.
18. Let $S$ be a subspace of $V$. Prove that $(V/S)^* \approx S^0$.
19. Verify that
a) $(\sigma + \tau)^\times = \sigma^\times + \tau^\times$ for $\sigma, \tau \in \mathcal{L}(V, W)$.
b) $(r\tau)^\times = r\tau^\times$ for any $r \in F$ and $\tau \in \mathcal{L}(V, W)$.
20. Let $\tau \in \mathcal{L}(V, W)$, where $V$ and $W$ are finite-dimensional. Prove that $\operatorname{rk}(\tau) = \operatorname{rk}(\tau^\times)$.
21. Prove that if $\rho \in \mathcal{L}(V)$ has the property that
$$V = \operatorname{im}(\rho) \oplus \ker(\rho) \quad\text{and}\quad \rho|_{\operatorname{im}(\rho)} = \iota$$
then $\rho$ is projection on $\operatorname{im}(\rho)$ along $\ker(\rho)$.
22. a) Let $\rho : V \to S$ be projection onto a subspace $S$ of $V$ along a subspace $T$ of $V$. Show that $\rho$ is idempotent, that is, $\rho^2 = \rho$.
b) Prove that if $\rho \in \mathcal{L}(V)$ is idempotent then it is a projection.
c) Is the adjoint of a projection also a projection?
Chapter 4
Modules I: Basic Properties
Motivation

Let $V$ be a vector space over a field $F$ and let $\tau \in \mathcal{L}(V)$. Then for any polynomial $p(x) \in F[x]$, the operator $p(\tau)$ is well-defined. For instance, if $p(x) = a + bx + cx^3$ then
$$p(\tau) = a\iota + b\tau + c\tau^3$$
where $\iota$ is the identity operator and $\tau^3$ is the threefold composition $\tau \circ \tau \circ \tau$. Thus, using the operator $\tau$ we can define the product of a polynomial $p(x) \in F[x]$ and a vector $v \in V$ by
$$p(x)v = p(\tau)(v) \tag{4.1}$$
This product satisfies the usual properties of scalar multiplication, namely, for all $p(x), q(x) \in F[x]$ and $u, v \in V$,
$$p(x)(u + v) = p(x)u + p(x)v$$
$$(p(x) + q(x))u = p(x)u + q(x)u$$
$$[p(x)q(x)]u = p(x)[q(x)u]$$
$$1u = u$$
Thus, for a fixed $\tau \in \mathcal{L}(V)$, we can think of $V$ as being endowed with the operations of (the usual) addition along with multiplication of an element of $V$ by a polynomial in $F[x]$. However, since $F[x]$ is not a field, these two operations do not make $V$ into a vector space. Nevertheless, the situation in which the scalars form a ring but not a field is extremely important, not only in our context but in many others.
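For example, if $\tau \in \mathcal{L}(\mathbb{R}^2)$ is the reflection $\tau(x, y) = (y, x)$ and $p(x) = x^2 - 1$ then, since $\tau^2 = \iota$,
$$p(x)v = (\tau^2 - \iota)(v) = 0$$
for every $v \in \mathbb{R}^2$. Thus a nonzero polynomial can multiply every vector to zero, something that cannot happen with scalars from a field; this is the torsion phenomenon defined later in this chapter.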
Modules

Definition Let $R$ be a commutative ring with identity, whose elements are called scalars. An $R$-module (or a module over $R$) is a nonempty set $M$,
together with two operations. The first operation, called addition and denoted by $+$, assigns to each pair $(u, v) \in M \times M$ an element $u + v \in M$. The second operation, denoted by juxtaposition, assigns to each pair $(r, u) \in R \times M$ an element $ru \in M$. Furthermore, the following properties must hold:
1) $M$ is an abelian group under addition.
2) For all $r, s \in R$ and $u, v \in M$,
$$r(u + v) = ru + rv$$
$$(r + s)u = ru + su$$
$$(rs)u = r(su)$$
$$1u = u$$
The ring $R$ is called the base ring of $M$.
Note that vector spaces are just special types of modules: a vector space is a module over a field. When we turn in a later chapter to the study of the structure of a linear transformation $\tau \in \mathcal{L}(V)$, we will think of $V$ as having the structure of a vector space over $F$ as well as a module over $F[x]$. Put another way, $V$ is an abelian group under addition, with two scalar multiplications: one whose scalars are elements of $F$ and one whose scalars are polynomials over $F$. This viewpoint will be of tremendous benefit for the study of $\tau$. For now, we concentrate only on modules.

Example 4.1
1) If $R$ is a ring, the set $R^n$ of all ordered $n$-tuples whose components lie in $R$ is an $R$-module, with addition and scalar multiplication defined componentwise (just as in $F^n$),
$$(a_1, \dots, a_n) + (b_1, \dots, b_n) = (a_1 + b_1, \dots, a_n + b_n)$$
and
$$r(a_1, \dots, a_n) = (ra_1, \dots, ra_n)$$
for $r, a_i, b_i \in R$. For example, $\mathbb{Z}^n$ is the $\mathbb{Z}$-module of all ordered $n$-tuples of integers.
2) If $R$ is a ring, the set $\mathcal{M}_{m,n}(R)$ of all matrices of size $m \times n$ is an $R$-module, under the usual operations of matrix addition and scalar multiplication over $R$. Since $R$ is a ring, we can also take the product of matrices of compatible sizes over $R$. One important example is $R = F[x]$, whence $\mathcal{M}_{m,n}(F[x])$ is the $F[x]$-module of all $m \times n$ matrices whose entries are polynomials.
3) Any commutative ring $R$ with identity is a module over itself, that is, $R$ is an $R$-module. In this case, scalar multiplication is just multiplication by
elements of $R$, that is, scalar multiplication is the ring multiplication. The defining properties of a ring imply that the defining properties of an $R$-module are satisfied. We shall use this example many times in the sequel.
Importance of the Base Ring

Our definition of a module requires that the ring $R$ of scalars be commutative. Modules over noncommutative rings can exhibit quite a bit more unusual behavior than modules over commutative rings. Indeed, as one would expect, the general behavior of $R$-modules improves as we impose more structure on the base ring $R$. If we impose the very strict structure of a field, the result is the very well-behaved vector space structure. To illustrate, if we allow the base ring $R$ to be noncommutative then, as we will see, it is possible for an $R$-module to have bases of different sizes! Since modules over noncommutative rings will not be needed for the sequel, we require commutativity in the definition of module. As another example, if the base ring is an integral domain then whenever $v_1, \dots, v_n$ are linearly independent over $R$ so are $rv_1, \dots, rv_n$ for any nonzero $r \in R$. This fails when $R$ is not an integral domain. We will also consider the property on the base ring $R$ that all of its ideals are finitely generated. In this case, any finitely generated $R$-module $M$ has the desirable property that all of its submodules are also finitely generated. This property of $R$-modules fails if $R$ does not have the stated property. When $R$ is a principal ideal domain (such as $\mathbb{Z}$ or $F[x]$), not only are all of its ideals finitely generated, but each is generated by a single element. In this case, the $R$-modules are "reasonably" well behaved. For instance, in general a module may have a basis but one or more of its submodules may not. However, if $R$ is a principal ideal domain, this cannot happen. Nevertheless, even when $R$ is a principal ideal domain, $R$-modules are less well behaved than vector spaces. For example, there are modules over a principal ideal domain that do not have any linearly independent elements. Of course, such modules cannot have a basis. Many of the basic concepts that we defined for vector spaces can also be defined for modules, although their properties are often quite different. We begin with submodules.
Submodules

The definition of submodule parallels that of subspace.
Definition A submodule of an $R$-module $M$ is a nonempty subset $S$ of $M$ that is an $R$-module in its own right, under the operations obtained by restricting the operations of $M$ to $S$.
Theorem 4.1 A nonempty subset $S$ of an $R$-module $M$ is a submodule if and only if it is closed under the taking of linear combinations, that is,
$$r, s \in R,\; u, v \in S \Rightarrow ru + sv \in S$$

Theorem 4.2 If $S$ and $T$ are submodules of $M$ then $S \cap T$ and $S + T$ are also submodules of $M$.
We have remarked that a commutative ring $R$ with identity is a module over itself. As we will see, this type of module provides some good examples of nonvector-space-like behavior. When we think of a ring $R$ as an $R$-module rather than as a ring, multiplication is treated as scalar multiplication. This has some important implications. In particular, if $S$ is a submodule of $R$ then it is closed under scalar multiplication, which means that it is closed under multiplication by all elements of the ring $R$. In other words, $S$ is an ideal of the ring $R$. Conversely, if $\mathcal{I}$ is an ideal of the ring $R$ then $\mathcal{I}$ is also a submodule of the module $R$. Hence, the submodules of the $R$-module $R$ are precisely the ideals of the ring $R$.
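For example, the submodules of the $\mathbb{Z}$-module $\mathbb{Z}$ are precisely the sets $n\mathbb{Z} = \{nz \mid z \in \mathbb{Z}\}$ for $n \ge 0$, since these are precisely the ideals of the ring $\mathbb{Z}$. Contrast this with a field $F$, viewed as a module (that is, a vector space) over itself, whose only submodules are the trivial ideals $\{0\}$ and $F$.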
Spanning Sets

The concept of spanning set carries over to modules as well.

Definition The submodule spanned (or generated) by a subset $S$ of a module $M$ is the set of all linear combinations of elements of $S$:
$$\langle S \rangle = \operatorname{span}(S) = \{r_1 v_1 + \cdots + r_n v_n \mid r_i \in R,\; v_i \in S,\; n \ge 1\}$$
A subset $S \subseteq M$ is said to span $M$ or generate $M$ if $M = \operatorname{span}(S)$.

One very important point to note is that if a nontrivial linear combination of the elements $v_1, \dots, v_n$ in an $R$-module $M$ is $0$,
$$r_1 v_1 + \cdots + r_n v_n = 0$$
where not all of the coefficients are $0$, then we cannot conclude, as we could in a vector space, that one of the elements $v_i$ is a linear combination of the others. After all, this involves dividing by one of the coefficients, which may not be possible in a ring. For instance, for the $\mathbb{Z}$-module $\mathbb{Z} \times \mathbb{Z}$ we have
$$2(3, 6) - 3(2, 4) = (0, 0)$$
but neither $(3, 6)$ nor $(2, 4)$ is an integer multiple of the other.
Modules I: Basic Properties
97
The following simple submodules play a special role in the theory.

Definition Let $M$ be an $R$-module. A submodule of the form
$$\langle v \rangle = Rv = \{rv \mid r \in R\}$$
for $v \in M$ is called the cyclic submodule generated by $v$.
Of course, any finite-dimensional vector space is the direct sum of cyclic submodules, that is, one-dimensional subspaces. One of our main goals is to show that a finitely generated module over a principal ideal domain has this property as well. For reasons that will become clear soon, we need the following definition.

Definition An $R$-module $M$ is said to be finitely generated if it contains a finite set that generates $M$.
Of course, a vector space is finitely generated if and only if it has a finite basis, that is, if and only if it is finite-dimensional. For modules, life is more complicated. The following is an example of a finitely generated module that has a submodule that is not finitely generated.

Example 4.2 Let $R$ be the ring $F[x_1, x_2, \dots]$ of all polynomials in infinitely many variables over a field $F$. It will be convenient to use $X$ to denote $x_1, x_2, \dots$ and write a polynomial in $R$ in the form $p(X)$. (Each polynomial in $R$, being a finite sum, involves only finitely many variables, however.) Then $R$ is an $R$-module and as such, is finitely generated by the identity element $p(X) = 1$. Now, consider the submodule $S$ of all polynomials with zero constant term. This module is generated by the variables themselves,
$$S = \langle x_1, x_2, \dots \rangle$$
However, $S$ is not finitely generated. To see this, suppose that $G = \{p_1, \dots, p_n\}$ is a finite generating set for $S$. Choose a variable $x_k$ that does not appear in any of the polynomials in $G$. Then no linear combination of the polynomials in $G$ can be equal to $x_k$. For if
$$x_k = \sum_{i=1}^n a_i(X)p_i(X)$$
then let $a_i(X) = x_k q_i(X) + r_i(X)$, where $r_i(X)$ does not involve $x_k$. This gives
$$x_k = \sum_{i=1}^n [x_k q_i(X) + r_i(X)]p_i(X) = x_k \sum_{i=1}^n q_i(X)p_i(X) + \sum_{i=1}^n r_i(X)p_i(X)$$
The last sum does not involve $x_k$ and so it must equal $0$. Hence, the first sum must equal $x_k$, that is, $\sum q_i(X)p_i(X) = 1$, which is not possible since no $p_i(X)$ has a constant term.
Linear Independence

The concept of linear independence also carries over to modules.

Definition A subset $S$ of a module $M$ is linearly independent if for any $v_1, \dots, v_n \in S$ and $r_1, \dots, r_n \in R$, we have
$$r_1 v_1 + \cdots + r_n v_n = 0 \Rightarrow r_i = 0 \text{ for all } i$$
A set $S$ that is not linearly independent is linearly dependent.

It is clear from the definition that any subset of a linearly independent set is linearly independent. Recall that, in a vector space, a set $S$ of vectors is linearly dependent if and only if some vector in $S$ is a linear combination of the other vectors in $S$. For arbitrary modules, this is not true.

Example 4.3 Consider $\mathbb{Z}$ as a $\mathbb{Z}$-module. The elements $2, 3 \in \mathbb{Z}$ are linearly dependent, since
$$3(2) - 2(3) = 0$$
but neither one is a linear combination (i.e., integer multiple) of the other.

The problem in the previous example (as noted earlier) is that
$$r_1 v_1 + \cdots + r_n v_n = 0$$
implies that
$$r_1 v_1 = -r_2 v_2 - \cdots - r_n v_n$$
but, in general, we cannot divide both sides by $r_1$, since it may not have a multiplicative inverse in the ring $R$.
Modules I: Basic Properties
99
Torsion Elements

In a vector space $V$ over a field $F$, singleton sets $\{v\}$ where $v \neq 0$ are linearly independent. Put another way, $r \neq 0$ and $v \neq 0$ imply $rv \neq 0$. However, in a module, this need not be the case.

Example 4.4 The abelian group $\mathbb{Z}_n = \{0, 1, \dots, n - 1\}$ is a $\mathbb{Z}$-module, with scalar multiplication defined by
$$zu = (z \cdot u) \bmod n$$
for all $z \in \mathbb{Z}$ and $u \in \mathbb{Z}_n$. However, since $nu = 0$ for all $u \in \mathbb{Z}_n$, no singleton set $\{u\}$ is linearly independent. Indeed, $\mathbb{Z}_n$ has no linearly independent sets.

This example motivates the following definition.

Definition Let $M$ be an $R$-module. A nonzero element $v \in M$ for which $rv = 0$ for some nonzero $r \in R$ is called a torsion element of $M$. A module that has no nonzero torsion elements is said to be torsion-free. If all elements of $M$ are torsion elements then $M$ is a torsion module. The set of all torsion elements of $M$, together with the zero element, is denoted by $M_{\text{tor}}$.

If $M$ is a module over an integral domain, it is not hard to see that $M_{\text{tor}}$ is a submodule of $M$ and that $M/M_{\text{tor}}$ is torsion-free. (We will define quotient modules shortly: they are defined in the same way as for vector spaces.)
Annihilators

Closely associated with the notion of a torsion element is that of an annihilator.

Definition Let $M$ be an $R$-module. The annihilator of an element $v \in M$ is
$$\operatorname{ann}(v) = \{r \in R \mid rv = 0\}$$
and the annihilator of a submodule $N$ of $M$ is
$$\operatorname{ann}(N) = \{r \in R \mid rN = \{0\}\}$$
where $rN = \{rv \mid v \in N\}$. Annihilators are also called order ideals.

It is easy to see that $\operatorname{ann}(v)$ and $\operatorname{ann}(N)$ are ideals of $R$. Clearly, $v \in M$ is a torsion element if and only if $\operatorname{ann}(v) \neq \{0\}$. Let $M = \langle u_1, \dots, u_n \rangle$ be a finitely generated torsion module over an integral domain $R$. Then for each $i$ there is a nonzero $r_i \in \operatorname{ann}(u_i)$. Hence, the nonzero product $r = r_1 \cdots r_n$ annihilates each generator of $M$ and therefore every element of $M$, that is, $r \in \operatorname{ann}(M)$. This shows that $\operatorname{ann}(M) \neq \{0\}$.
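For example, in the $\mathbb{Z}$-module $\mathbb{Z}_{12}$ we have $\operatorname{ann}(4) = 3\mathbb{Z}$, since $4r \equiv 0 \bmod 12$ if and only if $3 \mid r$, and $\operatorname{ann}(\mathbb{Z}_{12}) = 12\mathbb{Z}$. In particular, every element of $\mathbb{Z}_{12}$ is a torsion element, so $\mathbb{Z}_{12}$ is a torsion $\mathbb{Z}$-module.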
Free Modules

The definition of a basis for a module parallels that of a basis for a vector space.
Definition Let 4 be an 9-module. A subset 8 of 4 is a basis if 8 is linearly independent and spans 4 . An 9-module 4 is said to be free if 4 ~ ¸¹ or if 4 has a basis. If 8 is a basis for 4 , we say that 4 is free on 8 .
Theorem 4.3 A subset 8 of a module 4 is a basis if and only if for every # 4 , there are unique elements # Á Ã Á # 8 and unique scalars Á Ã Á 9 for which # ~ # b Ä b #
In a vector space, a set of vectors is a basis if and only if it is a minimal spanning set, or equivalently, a maximal linearly independent set. For modules, the following is the best we can do in general. We leave proof to the reader. Theorem 4.4 Let 8 be a basis for an 9 -module 4 . Then 1) 8 is a minimal spanning set. 2) 8 is a maximal linearly independent set.
The $\mathbb{Z}$-module $\mathbb{Z}_n$ has no basis since it has no linearly independent sets. But since the entire module is a spanning set, we deduce that a minimal spanning set need not be a basis. In the exercises, the reader is asked to give an example of a module $M$ that has a finite basis, but with the property that not every spanning set in $M$ contains a basis and not every linearly independent set in $M$ is contained in a basis. It follows in this case that a maximal linearly independent set need not be a basis.

The next example shows that even free modules are not very much like vector spaces. It is an example of a free module that has a submodule that is not free.

Example 4.5 The set $\mathbb{Z} \times \mathbb{Z}$ is a free module over itself, using componentwise scalar multiplication
$$(a, b)(c, d) = (ac, bd)$$
with basis $\{(1,1)\}$. But the submodule $\mathbb{Z} \times \{0\}$ is not free since it has no linearly independent elements and hence no basis.
Homomorphisms
The term linear transformation is special to vector spaces. However, the concept applies to most algebraic structures.

Definition Let $M$ and $N$ be $R$-modules. A function $\tau: M \to N$ is an $R$-homomorphism if it preserves the module operations, that is,
$$\tau(ru + sv) = r\tau(u) + s\tau(v)$$
for all $r, s \in R$ and $u, v \in M$. The set of all $R$-homomorphisms from $M$ to $N$ is denoted by $\hom_R(M, N)$. The following terms are also employed:
1) An $R$-endomorphism is an $R$-homomorphism from $M$ to itself.
2) An $R$-monomorphism or $R$-embedding is an injective $R$-homomorphism.
3) An $R$-epimorphism is a surjective $R$-homomorphism.
4) An $R$-isomorphism is a bijective $R$-homomorphism.
It is easy to see that $\hom_R(M, N)$ is itself an $R$-module under addition of functions and scalar multiplication defined by
$$(r\tau)(v) = r(\tau(v)) = \tau(rv)$$

Theorem 4.5 Let $\tau \in \hom_R(M, N)$. The kernel and image of $\tau$, defined as for linear transformations by
$$\ker(\tau) = \{ v \in M \mid \tau(v) = 0 \} \quad\text{and}\quad \operatorname{im}(\tau) = \{ \tau(v) \mid v \in M \}$$
are submodules of $M$ and $N$, respectively. Moreover, $\tau$ is a monomorphism if and only if $\ker(\tau) = \{0\}$.
If $N$ is a submodule of the $R$-module $M$ then the map $\iota: N \to M$ defined by $\iota(v) = v$ is evidently an $R$-monomorphism, called injection of $N$ into $M$.
Quotient Modules
The procedure for defining quotient modules is the same as that for defining quotient vector spaces. We summarize in the following theorem.

Theorem 4.6 Let $S$ be a submodule of an $R$-module $M$. The binary relation
$$u \equiv v \iff u - v \in S$$
is an equivalence relation on $M$, whose equivalence classes are the cosets
$$v + S = \{ v + s \mid s \in S \}$$
of $S$ in $M$. The set $M/S$ of all cosets of $S$ in $M$, called the quotient module of $M$ modulo $S$, is an $R$-module under the well-defined operations
$$(u + S) + (v + S) = (u + v) + S$$
$$r(u + S) = ru + S$$
The zero element in $M/S$ is the coset $0 + S = S$.
One question that immediately comes to mind is whether a quotient module of a free module need be free. As the next example shows, the answer is no.

Example 4.6 As a module over itself, $\mathbb{Z}$ is free on the set $\{1\}$. For any $n > 1$, the set $n\mathbb{Z} = \{ nz \mid z \in \mathbb{Z} \}$ is a free cyclic submodule of $\mathbb{Z}$, but the quotient $\mathbb{Z}$-module $\mathbb{Z}/n\mathbb{Z}$ is isomorphic to $\mathbb{Z}_n$ via the map
$$\tau(u + n\mathbb{Z}) = u \bmod n$$
and since $\mathbb{Z}_n$ is not free as a $\mathbb{Z}$-module, neither is $\mathbb{Z}/n\mathbb{Z}$.
The Correspondence and Isomorphism Theorems
The correspondence and isomorphism theorems for vector spaces have analogs for modules.

Theorem 4.7 (The correspondence theorem) Let $S$ be a submodule of $M$. Then the function that assigns to each intermediate submodule $S \leq T \leq M$ the quotient submodule $T/S$ of $M/S$ is an order-preserving (with respect to set inclusion) one-to-one correspondence between submodules of $M$ containing $S$ and all submodules of $M/S$.
Theorem 4.8 (The first isomorphism theorem) Let $\tau: M \to N$ be an $R$-homomorphism. Then the map $\tau': M/\ker(\tau) \to N$ defined by
$$\tau'(v + \ker(\tau)) = \tau(v)$$
is an $R$-embedding and so
$$\frac{M}{\ker(\tau)} \cong \operatorname{im}(\tau)$$
Theorem 4.9 (The second isomorphism theorem) Let $M$ be an $R$-module and let $S$ and $T$ be submodules of $M$. Then
$$\frac{S + T}{T} \cong \frac{S}{S \cap T}$$
Theorem 4.10 (The third isomorphism theorem) Let $M$ be an $R$-module and suppose that $S \leq T$ are submodules of $M$. Then
$$\frac{M/S}{T/S} \cong \frac{M}{T}$$
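As an illustration of the third isomorphism theorem, take $M = \mathbb{Z}$, $S = 12\mathbb{Z}$ and $T = 3\mathbb{Z}$, so that $S \leq T \leq M$ and
$$\frac{\mathbb{Z}/12\mathbb{Z}}{3\mathbb{Z}/12\mathbb{Z}} \cong \frac{\mathbb{Z}}{3\mathbb{Z}}$$
as can be checked by counting: $3\mathbb{Z}/12\mathbb{Z}$ has $4$ elements, so the left-hand side has $12/4 = 3$ elements.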
Direct Sums and Direct Summands
The definition of direct sum is the same for modules as for vector spaces. We will confine our attention to the direct sum of a finite number of modules.

Definition An $R$-module $M$ is the direct sum of the submodules $S_1, \dots, S_n$, written
$$M = S_1 \oplus \cdots \oplus S_n$$
if every $v \in M$ can be written, in a unique way (except for order), as a sum of one element from each of the submodules $S_i$, that is, there are unique $u_i \in S_i$ for which
$$v = u_1 + \cdots + u_n$$
In this case, each $S_i$ is called a direct summand of $M$. If $M = S \oplus T$ then $S$ is said to be complemented and $T$ is called a complement of $S$ in $M$.
Note that a sum is direct if and only if whenever $u_1 + \cdots + u_n = 0$ where $u_i \in S_i$, then $u_i = 0$ for all $i$, that is, if and only if $0$ has a unique representation as a sum of vectors from distinct submodules. As with vector spaces, we have the following useful characterization of direct sums.

Theorem 4.11 A module $M$ is the direct sum of submodules $S_1, \dots, S_n$ if and only if
1) $M = S_1 + \cdots + S_n$
2) For each $i = 1, \dots, n$,
$$S_i \cap \Big( \sum_{j \neq i} S_j \Big) = \{0\}$$
In the case of vector spaces, every subspace is a direct summand, that is, every subspace has a complement. However, as the next example shows, this is not true for modules.

Example 4.7 The set $\mathbb{Z}$ of integers is a $\mathbb{Z}$-module. Since the submodules of $\mathbb{Z}$ are precisely the ideals of the ring $\mathbb{Z}$ and since $\mathbb{Z}$ is a principal ideal domain, the submodules of $\mathbb{Z}$ are the sets
$$\langle n \rangle = n\mathbb{Z} = \{ nz \mid z \in \mathbb{Z} \}$$
Hence, any two nonzero proper submodules of $\mathbb{Z}$ have nonzero intersection, for if $m, n \neq 0$ then $m\mathbb{Z} \cap n\mathbb{Z} = k\mathbb{Z}$ where $k = \operatorname{lcm}\{m, n\}$. It follows that the only complemented submodules of $\mathbb{Z}$ are $\mathbb{Z}$ and $\{0\}$.
In the case of vector spaces, there is an intimate connection between subspaces and quotient spaces, as we saw in Theorem 3.6. The problem we face in generalizing this to modules is that not all submodules have a complement. However, this is the only problem.

Theorem 4.12 Let $S$ be a complemented submodule of $M$. All complements of $S$ are isomorphic to $M/S$ and hence to each other.
Proof. For any complement $T$ of $S$, the first isomorphism theorem applied to the projection $\rho: M \to T$ onto $T$ along $S$ gives $T \cong M/S$.
Direct Summands and One-Sided Invertibility
The next theorem characterizes direct summands, but first a definition.

Definition A submodule $S$ of an $R$-module $M$ is a retract of $M$ by an $R$-homomorphism $\rho: M \to S$ if $\rho$ fixes each element of $S$, that is, $\rho(s) = s$ for all $s \in S$.
Note that the homomorphism $\rho$ in the definition of a retract is similar to a projection map onto $S$, in that both types of maps are the identity when restricted to $S$. In fact, if $S$ is complemented and $T$ is a complement of $S$ then $S$ is a retract of $M$ by the projection onto $S$ along $T$. Indeed, more can be said.

Theorem 4.13 A submodule $S$ of the $R$-module $M$ is complemented if and only if it is a retract of $M$. In this case, if $S$ is a retract of $M$ by $\rho$ then $\rho$ is projection onto $S$ along $\ker(\rho)$ and so
$$M = S \oplus \ker(\rho) = \operatorname{im}(\rho) \oplus \ker(\rho)$$
Proof. If $M = S \oplus T$ then $S$ is a retract of $M$ by the projection map $\rho_S: M \to S$. Conversely, if $\rho: M \to S$ is an $R$-homomorphism that fixes $S$ then clearly $\rho$ is surjective and $S = \operatorname{im}(\rho)$. Also,
$$v \in S \cap \ker(\rho) \implies v = \rho(v) = 0$$
and for any $v \in M$ we have
$$v = [v - \rho(v)] + \rho(v) \in \ker(\rho) + S$$
Hence, $M = S \oplus \ker(\rho)$.
Definition Let $\tau: A \to B$ be a module homomorphism. Then a left inverse of $\tau$ is a module homomorphism $\tau_L: B \to A$ for which $\tau_L \circ \tau = \iota$. A right inverse of $\tau$ is a module homomorphism $\tau_R: B \to A$ for which $\tau \circ \tau_R = \iota$.
It is easy to see that in order for $\tau$ to have a left inverse $\tau_L$, it must be injective, since
$$\tau(a) = \tau(b) \implies \tau_L \circ \tau(a) = \tau_L \circ \tau(b) \implies a = b$$
and in order for $\tau$ to have a right inverse $\tau_R$, it must be surjective, since if $b \in B$ then $b = \tau[\tau_R(b)] \in \operatorname{im}(\tau)$.

Now, if we were dealing with functions between sets, then the converses of these statements would hold: $\tau$ is left-invertible if and only if it is injective and
$\tau$ is right-invertible if and only if it is surjective. However, for modules things are more complicated.

Theorem 4.14
1) An $R$-homomorphism $\tau: A \to B$ has a left inverse $\tau_L$ if and only if it is injective and $\operatorname{im}(\tau)$ is a direct summand of $B$, in which case
$$B = \operatorname{im}(\tau) \oplus \ker(\tau_L) \cong \operatorname{im}(\tau_L) \oplus \ker(\tau_L)$$
2) An $R$-homomorphism $\tau: A \to B$ has a right inverse $\tau_R$ if and only if it is surjective and $\ker(\tau)$ is a direct summand of $A$, in which case
$$A = \ker(\tau) \oplus \operatorname{im}(\tau_R) \cong \ker(\tau) \oplus \operatorname{im}(\tau)$$
Proof. For part 1), suppose first that $\tau_L \tau = \iota_A$. Then $\tau$ is injective since applying $\tau_L$ to the expression $\tau(x) = \tau(y)$ gives $x = y$. Also, $x \in \operatorname{im}(\tau) \cap \ker(\tau_L)$ implies that $x = \tau(a)$ and
$$0 = \tau_L(x) = \tau_L(\tau(a)) = a$$
and so $x = 0$. Hence, the direct sum $\operatorname{im}(\tau) \oplus \ker(\tau_L)$ exists. For any $b \in B$, we can write
$$b = \tau\tau_L(b) + [b - \tau\tau_L(b)]$$
where $\tau\tau_L(b) \in \operatorname{im}(\tau)$ and $b - \tau\tau_L(b) \in \ker(\tau_L)$, and so $B = \operatorname{im}(\tau) \oplus \ker(\tau_L)$. Conversely, if $\tau$ is injective and $B = K \oplus \operatorname{im}(\tau)$ for some submodule $K$, then let $\tau_L = \tau^{-1} \circ \rho_{\operatorname{im}(\tau)}$, where $\rho_{\operatorname{im}(\tau)}$ is projection onto $\operatorname{im}(\tau)$. This is well-defined since $\tau: A \to \operatorname{im}(\tau)$ is an isomorphism. Then
$$\tau_L \circ \tau(a) = \tau^{-1} \circ \rho_{\operatorname{im}(\tau)} \circ \tau(a) = \tau^{-1} \circ \tau(a) = a$$
and so $\tau_L$ is a left inverse of $\tau$. It is clear that $K = \ker(\tau_L)$ and since $\tau_L: \operatorname{im}(\tau) \to \operatorname{im}(\tau_L)$ is injective, $\operatorname{im}(\tau) \cong \operatorname{im}(\tau_L)$.

For part 2), if $\tau$ has a right inverse $\tau_R$ then $\tau_R$ has a left inverse $\tau$ and so $\tau_R$ must be injective and
$$A = \operatorname{im}(\tau_R) \oplus \ker(\tau) \cong \operatorname{im}(\tau) \oplus \ker(\tau)$$
Conversely, suppose that $A = X \oplus \ker(\tau)$. Since the elements of different cosets of $\ker(\tau)$ are mapped to different elements of $B$, it is natural to define $\tau_R: B \to A$ by taking $\tau_R(b)$ to be a particular element in the coset that is sent to $b$ by $\tau$. However, in general we cannot pick just any element of
the coset and expect to get a module morphism. (We would get a right inverse, but only as a set function.) However, the condition $A = X \oplus \ker(\tau)$ is precisely what we need, because it says that the elements of the submodule $X$ form a set of distinct coset representatives; that is, each $x \in X$ belongs to exactly one coset and each coset contains exactly one element of $X$. In addition, if $\tau(x_1) = b_1$ and $\tau(x_2) = b_2$ for $x_1, x_2 \in X$ then
$$\tau(x_1 + x_2) = \tau(x_1) + \tau(x_2) = b_1 + b_2$$
Thus, we can define $\tau_R$ as follows. For any $b \in B$ there is a unique $x \in X$ for which $\tau(x) = b$. Let $\tau_R(b) = x$. Then we have
$$\tau(\tau_R(b)) = \tau(x) = b$$
and so $\tau \circ \tau_R = \iota_B$. Also,
$$\tau_R(b_1 + b_2) = x_1 + x_2 = \tau_R(b_1) + \tau_R(b_2)$$
and similarly $\tau_R(rb) = r\tau_R(b)$, and so $\tau_R$ is a module morphism. Thus $\tau$ is right-invertible.
The last part of the previous theorem is worth further comment. Recall that if $\tau: V \to W$ is a linear transformation on vector spaces then
$$V \cong \ker(\tau) \oplus \operatorname{im}(\tau)$$
This does not hold in general for modules, but it does hold if $\ker(\tau)$ is a direct summand.
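For instance, the canonical projection $\tau: \mathbb{Z} \to \mathbb{Z}_2$ has $\ker(\tau) = 2\mathbb{Z}$, which is not a direct summand of $\mathbb{Z}$, and indeed
$$\mathbb{Z} \not\cong \ker(\tau) \oplus \operatorname{im}(\tau) = 2\mathbb{Z} \oplus \mathbb{Z}_2$$
since the module on the right has the nonzero torsion element $(0, 1)$ while $\mathbb{Z}$ is torsion-free. Equivalently, $\tau$ is surjective but has no right inverse: any $\mathbb{Z}$-homomorphism $\mathbb{Z}_2 \to \mathbb{Z}$ must send $1$ to an element annihilated by $2$, hence to $0$.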
Modules Are Not As Nice As Vector Spaces
Here is a list of some of the properties of modules (over commutative rings with identity) that emphasize the differences between modules and vector spaces.
1) A submodule of a module need not have a complement.
2) A submodule of a finitely generated module need not be finitely generated.
3) There exist modules with no linearly independent elements and hence with no basis.
4) A minimal spanning set or maximal linearly independent set is not necessarily a basis.
5) There exist free modules with submodules that are not free.
6) There exist free modules with linearly independent sets that are not contained in a basis and spanning sets that do not contain a basis.
Recall also that a module over a noncommutative ring may have bases of different sizes. However, all bases for a free module over a commutative ring with identity have the same size, as we will prove in the next chapter.
Exercises
1. Give the details to show that any commutative ring with identity is a module over itself.
2. Let $S = \{v_1, \dots, v_n\}$ be a subset of a module $M$. Prove that $N = \langle S \rangle$ is the smallest submodule of $M$ containing $S$. First you will need to formulate precisely what it means to be the smallest submodule of $M$ containing $S$.
3. Let $M$ be an $R$-module and let $I$ be an ideal in $R$. Let $IM$ be the set of all finite sums of the form
$$a_1 v_1 + \cdots + a_n v_n$$
where $a_i \in I$ and $v_i \in M$. Is $IM$ a submodule of $M$?
4. Show that if $S$ and $T$ are submodules of $M$ then (with respect to set inclusion)
$$S \cap T = \operatorname{glb}\{S, T\} \quad\text{and}\quad S + T = \operatorname{lub}\{S, T\}$$
5. Let $S_1 \subseteq S_2 \subseteq \cdots$ be an ascending sequence of submodules of an $R$-module $M$. Prove that the union $\bigcup S_i$ is a submodule of $M$.
6. Give an example of a module $M$ that has a finite basis but with the property that not every spanning set in $M$ contains a basis and not every linearly independent set in $M$ is contained in a basis.
7. Show that, just as in the case of vector spaces, an $R$-homomorphism can be defined by assigning arbitrary values on the elements of a basis and extending by linearity.
8. Let $\tau \in \hom_R(M, N)$ be an $R$-isomorphism. If $\mathcal{B}$ is a basis for $M$, prove that $\tau(\mathcal{B}) = \{ \tau(b) \mid b \in \mathcal{B} \}$ is a basis for $N$.
9. Let $M$ be an $R$-module and let $\tau \in \hom_R(M, M)$ be an $R$-endomorphism. If $\tau$ is idempotent, that is, if $\tau^2 = \tau$, show that
$$M = \ker(\tau) \oplus \operatorname{im}(\tau)$$
Does the converse hold?
10. Consider the ring $R = F[x, y]$ of polynomials in two variables. Show that the set $M$ consisting of all polynomials in $R$ that have zero constant term is an $R$-module. Show that $M$ is not a free $R$-module.
11. Prove that $R$ is an integral domain if and only if all $R$-modules $M$ have the following property: if $v_1, \dots, v_n$ is linearly independent over $R$ then so is $rv_1, \dots, rv_n$ for any nonzero $r \in R$.
12. Prove that if a commutative ring $R$ with identity has the property that every finitely generated $R$-module is free then $R$ is a field.
13. Let $M$ and $N$ be $R$-modules. If $S$ is a submodule of $M$ and $T$ is a submodule of $N$, show that
$$\frac{M \oplus N}{S \oplus T} \cong \frac{M}{S} \oplus \frac{N}{T}$$
14. If $R$ is a commutative ring with identity and $I$ is an ideal of $R$ then $I$ is an $R$-module. What is the maximum size of a linearly independent set in $I$? Under what conditions is $I$ free?
15. a) Show that for any module $M$ over an integral domain the set $M_{\mathrm{tor}}$ of all torsion elements in $M$ is a submodule of $M$.
    b) Find an example of a ring $R$ with the property that for some $R$-module $M$ the set $M_{\mathrm{tor}}$ is not a submodule.
    c) Show that for any module $M$ over an integral domain, the quotient module $M/M_{\mathrm{tor}}$ is torsion-free.
16. Fix a prime $p$ and let
$$M = \Big\{ \frac{a}{p^k} \;\Big|\; a \in \mathbb{Z},\ k \geq 0 \Big\}$$
Show that $M$ is a $\mathbb{Z}$-module and that the set $E = \{ 1/p^k \}$ is a minimal spanning set for $M$. Is the set $E$ linearly independent?
17. Let $N$ be an abelian group together with a scalar multiplication over a ring $R$ that satisfies all of the properties of an $R$-module except that $1v$ does not necessarily equal $v$ for all $v \in N$. Show that $N$ can be written as a direct sum of an $R$-module $N_1$ and another "pseudo $R$-module" $N_0$.
18. Prove that $\hom_R(M, N)$ is an $R$-module under addition of functions and scalar multiplication defined by
$$(r\tau)(v) = r(\tau(v)) = \tau(rv)$$
19. Prove that any $R$-module $M$ is isomorphic to the $R$-module $\hom_R(R, M)$.
20. Let $R$ and $S$ be commutative rings with identity and let $\tau: R \to S$ be a ring homomorphism. Show that any $S$-module is also an $R$-module under the scalar multiplication $rv = \tau(r)v$.
21. Prove that $\hom_{\mathbb{Z}}(\mathbb{Z}_n, \mathbb{Z}_m) \cong \mathbb{Z}_d$ where $d = \gcd(n, m)$.
22. Suppose that $R$ is a commutative ring with identity. If $I$ and $J$ are ideals of $R$ for which $R/I \cong R/J$ as $R$-modules, then prove that $I = J$. Is the result true if $R/I \cong R/J$ as rings?
Chapter 5
Modules II: Free and Noetherian Modules
The Rank of a Free Module
Since all bases for a vector space $V$ have the same cardinality, the concept of vector space dimension is well-defined. A similar statement holds for free $R$-modules when the base ring is commutative (but not otherwise).

Theorem 5.1 Let $M$ be a free module over a commutative ring $R$ with identity.
1) Then any two bases of $M$ have the same cardinality.
2) The cardinality of a spanning set is greater than or equal to that of a basis.
Proof. The plan is to find a vector space $V$ with the property that, for any basis for $M$, there is a basis of the same cardinality for $V$. Then we can appeal to the corresponding result for vector spaces. Let $\mathfrak{m}$ be a maximal ideal of $R$, which exists by Theorem 0.22. Then $R/\mathfrak{m}$ is a field. Our first thought might be that $M$ is a vector space over $R/\mathfrak{m}$, but that is not the case. In fact, scalar multiplication using the field $R/\mathfrak{m}$,
$$(r + \mathfrak{m})v = rv$$
is not even well-defined, since this would require that $\mathfrak{m}M = \{0\}$. On the other hand, we can fix precisely this problem by factoring out the submodule
$$\mathfrak{m}M = \{ r_1 v_1 + \cdots + r_n v_n \mid r_i \in \mathfrak{m},\ v_i \in M \}$$
Indeed, $M/\mathfrak{m}M$ is a vector space over $R/\mathfrak{m}$, with scalar multiplication defined by
$$(r + \mathfrak{m})(u + \mathfrak{m}M) = ru + \mathfrak{m}M$$
To see that this is well-defined, we must show that the conditions
$$r + \mathfrak{m} = r' + \mathfrak{m}, \quad u + \mathfrak{m}M = u' + \mathfrak{m}M$$
imply
$$ru + \mathfrak{m}M = r'u' + \mathfrak{m}M$$
But this follows from the fact that
$$ru - r'u' = r(u - u') + (r - r')u' \in \mathfrak{m}M$$
Hence, scalar multiplication is well-defined. We leave it to the reader to show that $M/\mathfrak{m}M$ is a vector space over $R/\mathfrak{m}$.

Consider now a set $\mathcal{B} = \{ b_i \mid i \in I \} \subseteq M$ and the corresponding set
$$\mathcal{B} + \mathfrak{m}M = \{ b_i + \mathfrak{m}M \mid i \in I \} \subseteq \frac{M}{\mathfrak{m}M}$$
If $\mathcal{B}$ spans $M$ over $R$ then $\mathcal{B} + \mathfrak{m}M$ spans $M/\mathfrak{m}M$ over $R/\mathfrak{m}$. To see this, note that any $v \in M$ has the form $v = \sum r_i b_i$ for $r_i \in R$ and so
$$v + \mathfrak{m}M = \Big( \sum r_i b_i \Big) + \mathfrak{m}M = \sum (r_i b_i + \mathfrak{m}M) = \sum (r_i + \mathfrak{m})(b_i + \mathfrak{m}M)$$
which shows that $\mathcal{B} + \mathfrak{m}M$ spans $M/\mathfrak{m}M$.

Now suppose that $\mathcal{B} = \{ b_i \mid i \in I \}$ is a basis for $M$ over $R$. We claim that $\mathcal{B} + \mathfrak{m}M$ is a basis for $M/\mathfrak{m}M$ over $R/\mathfrak{m}$. We have seen that $\mathcal{B} + \mathfrak{m}M$ spans $M/\mathfrak{m}M$. Also, if
$$\sum (r_i + \mathfrak{m})(b_i + \mathfrak{m}M) = \mathfrak{m}M$$
then $\sum r_i b_i \in \mathfrak{m}M$ and so
$$\sum_i r_i b_i = \sum_j a_j v_j$$
where $a_j \in \mathfrak{m}$ and $v_j \in M$. Expressing each $v_j$ in terms of the basis $\mathcal{B}$ and collecting coefficients gives
$$\sum_i r_i b_i = \sum_i t_i b_i$$
where $t_i \in \mathfrak{m}$. From the linear independence of $\mathcal{B}$ we deduce that $r_i = t_i \in \mathfrak{m}$ for all $i$ and so $r_i + \mathfrak{m} = \mathfrak{m}$. Hence $\mathcal{B} + \mathfrak{m}M$ is linearly independent and therefore a basis, as desired.

To see that $|\mathcal{B}| = |\mathcal{B} + \mathfrak{m}M|$, note that if $b_i + \mathfrak{m}M = b_j + \mathfrak{m}M$ then
$$b_i - b_j = \sum_k t_k b_k$$
where $t_k \in \mathfrak{m}$. If $i \neq j$ then comparing coefficients of $b_i$ gives $1 = t_i \in \mathfrak{m}$, which is not possible since $\mathfrak{m}$ is a maximal (hence proper) ideal. Hence, $i = j$. Thus, if $\mathcal{B}$ is a basis for $M$ over $R$ then
$$|\mathcal{B}| = |\mathcal{B} + \mathfrak{m}M| = \dim_{R/\mathfrak{m}}(M/\mathfrak{m}M)$$
and so all bases for $M$ over $R$ have the same cardinality, which proves part 1). Moreover, if $\mathcal{B}$ spans $M$ over $R$ then $\mathcal{B} + \mathfrak{m}M$ spans $M/\mathfrak{m}M$ and so
$$\dim_{R/\mathfrak{m}}(M/\mathfrak{m}M) \leq |\mathcal{B} + \mathfrak{m}M| \leq |\mathcal{B}|$$
Thus, $\mathcal{B}$ has cardinality at least as great as that of any basis for $M$ over $R$.
The previous theorem allows us to define the rank of a free module. (The term dimension is not used for modules in general.)

Definition Let $R$ be a commutative ring with identity. The rank $\operatorname{rk}(M)$ of a nonzero free $R$-module $M$ is the cardinality of any basis for $M$. The rank of the trivial module $\{0\}$ is $0$.
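To illustrate the device used in the proof, take $M = \mathbb{Z}^n$ and the maximal ideal $\mathfrak{m} = p\mathbb{Z}$ of $\mathbb{Z}$, for a prime $p$. Then
$$M/\mathfrak{m}M = \mathbb{Z}^n/p\mathbb{Z}^n \cong (\mathbb{Z}_p)^n$$
which is an $n$-dimensional vector space over the field $\mathbb{Z}_p = \mathbb{Z}/p\mathbb{Z}$, confirming that $\operatorname{rk}(\mathbb{Z}^n) = n$.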
Theorem 5.1 fails if the underlying ring of scalars is not commutative. The next example describes a module over a noncommutative ring that has the remarkable property of possessing a basis of size $n$ for any positive integer $n$.

Example 5.1 Let $V$ be a vector space over $F$ with a countably infinite basis $\mathcal{B} = \{ b_0, b_1, \dots \}$. Let $\mathcal{L}(V)$ be the ring of linear operators on $V$. Observe that $\mathcal{L}(V)$ is not commutative, since composition of functions is not commutative.

The ring $\mathcal{L}(V)$ is an $\mathcal{L}(V)$-module and as such, the identity map $\iota$ forms a basis for $\mathcal{L}(V)$. However, we can also construct a basis for $\mathcal{L}(V)$ of any desired finite size $n$. To understand the idea, consider the case $n = 2$ and define the operators $\alpha_0$ and $\alpha_1$ by
$$\alpha_0(b_{2k}) = b_k, \quad \alpha_0(b_{2k+1}) = 0$$
and
$$\alpha_1(b_{2k}) = 0, \quad \alpha_1(b_{2k+1}) = b_k$$
These operators are linearly independent essentially because they are surjective and their supports are disjoint. In particular, if $\beta_0 \alpha_0 + \beta_1 \alpha_1 = 0$ then
$$0 = (\beta_0 \alpha_0 + \beta_1 \alpha_1)(b_{2k}) = \beta_0(b_k)$$
and
$$0 = (\beta_0 \alpha_0 + \beta_1 \alpha_1)(b_{2k+1}) = \beta_1(b_k)$$
which shows that $\beta_0 = 0$ and $\beta_1 = 0$. Moreover, if $\tau \in \mathcal{L}(V)$ then we define $\beta_0$ and $\beta_1$ by
$$\beta_0(b_k) = \tau(b_{2k}), \quad \beta_1(b_k) = \tau(b_{2k+1})$$
from which it follows easily that
$$\tau = \beta_0 \alpha_0 + \beta_1 \alpha_1$$
which shows that $\{\alpha_0, \alpha_1\}$ is a basis for $\mathcal{L}(V)$.

More generally, we begin by partitioning $\mathcal{B}$ into $n$ blocks. For each $k = 0, \dots, n-1$, let
$$\mathcal{B}_k = \{ b_i \mid i \equiv k \bmod n \}$$
Now we define elements $\alpha_k \in \mathcal{L}(V)$ by
$$\alpha_k(b_{nj+t}) = \delta_{t,k} b_j$$
where $0 \leq t < n$ and where $\delta_{t,k}$ is the Kronecker delta function. These functions are surjective and have disjoint support. It follows that $\{\alpha_0, \dots, \alpha_{n-1}\}$ is linearly independent. For if $\tau_i \in \mathcal{L}(V)$ and
$$\tau_0 \alpha_0 + \cdots + \tau_{n-1} \alpha_{n-1} = 0$$
then applying this to $b_{nj+t}$ gives
$$0 = \tau_t \alpha_t(b_{nj+t}) = \tau_t(b_j)$$
for all $j$. Hence, $\tau_t = 0$. Also, $\{\alpha_0, \dots, \alpha_{n-1}\}$ spans $\mathcal{L}(V)$, for if $\tau \in \mathcal{L}(V)$, we define $\sigma_k \in \mathcal{L}(V)$ by
$$\sigma_k(b_j) = \tau(b_{nj+k})$$
to get
$$(\sigma_0 \alpha_0 + \cdots + \sigma_{n-1} \alpha_{n-1})(b_{nj+t}) = \sigma_t \alpha_t(b_{nj+t}) = \sigma_t(b_j) = \tau(b_{nj+t})$$
and so
$$\tau = \sigma_0 \alpha_0 + \cdots + \sigma_{n-1} \alpha_{n-1}$$
Thus, $\{\alpha_0, \dots, \alpha_{n-1}\}$ is a basis for $\mathcal{L}(V)$ of size $n$.
We have spoken about the cardinality of minimal spanning sets. Let us now speak about the cardinality of maximal linearly independent sets.

Theorem 5.2 Let $R$ be an integral domain and let $M$ be a free $R$-module of finite rank $n$. Then all linearly independent sets have size at most $n = \operatorname{rk}(M)$.
Proof. Since $M \cong R^n$, if we prove the result for $R^n$ it will hold for $M$. Let $R^+$ be the field of quotients of $R$. Then $R^n$ is a subset of the vector space $(R^+)^n$ and
1) $R^n$ is a subgroup of $(R^+)^n$ under addition,
2) scalar multiplication by elements of $R$ is defined and $R^n$ is an $R$-module,
3) scalar multiplication by elements of $R^+$ is defined but $R^n$ is not closed under this scalar multiplication.
Now, if $\mathcal{B} = \{v_1, \dots, v_k\} \subseteq R^n$ is linearly dependent over $R$ then $\mathcal{B}$ is clearly linearly dependent over $R^+$. Conversely, suppose that $\mathcal{B}$ is linearly independent over $R$ and
$$\frac{a_1}{b_1} v_1 + \cdots + \frac{a_k}{b_k} v_k = 0$$
where $b_i \neq 0$ for all $i$. Multiplying by $b = b_1 \cdots b_k$ produces a linear dependency over $R$
$$\frac{b a_1}{b_1} v_1 + \cdots + \frac{b a_k}{b_k} v_k = 0$$
which implies that $(b/b_i) a_i = 0$, and hence $a_i = 0$, for all $i$. Thus $\mathcal{B}$ is linearly dependent over $R$ if and only if it is linearly dependent over $R^+$. Of course, in the vector space $(R^+)^n$ all sets of size $n+1$ or larger are linearly dependent over $R^+$ and hence all subsets of $R^n$ of size $n+1$ or larger are linearly dependent over $R$.
Recall that if $B$ is a basis for a vector space $V$ over $F$ then $V$ is isomorphic to the vector space $(F^B)_0$ of all functions from $B$ to $F$ that have finite support. A similar result holds for free $R$-modules. We begin with the fact that $(R^B)_0$ is a free $R$-module. The simple proof is left to the reader.

Theorem 5.3 Let $B$ be any set and let $R$ be a ring. The set $(R^B)_0$ of all functions from $B$ to $R$ that have finite support is a free $R$-module of rank $|B|$ with basis $\mathcal{B} = \{ \delta_b \mid b \in B \}$ where
$$\delta_b(x) = \begin{cases} 1 & \text{if } x = b \\ 0 & \text{if } x \neq b \end{cases}$$
This basis is referred to as the standard basis for $(R^B)_0$.
Theorem 5.4 Let $M$ be an $R$-module. If $B$ is a basis for $M$ then $M$ is isomorphic to $(R^B)_0$.
Proof. Consider the map $\tau: M \to (R^B)_0$ defined by setting $\tau(b) = \delta_b$, where $\delta_b$ is defined in Theorem 5.3, and extending this to all of $M$ by linearity, that is,
$$\tau(r_1 b_1 + \cdots + r_n b_n) = r_1 \delta_{b_1} + \cdots + r_n \delta_{b_n}$$
Since $\tau$ maps a basis for $M$ to the basis $\mathcal{B} = \{ \delta_b \mid b \in B \}$ for $(R^B)_0$, it follows that $\tau$ is an isomorphism from $M$ to $(R^B)_0$.
Theorem 5.5 Two free $R$-modules (over a commutative ring) are isomorphic if and only if they have the same rank.
Proof. If $M \cong N$ then any isomorphism $\tau$ from $M$ to $N$ maps a basis for $M$ to a basis for $N$. Since $\tau$ is a bijection, we have $\operatorname{rk}(M) = \operatorname{rk}(N)$. Conversely, suppose that $\operatorname{rk}(M) = \operatorname{rk}(N)$. Let $\mathcal{B}$ be a basis for $M$ and let $\mathcal{C}$ be a basis for $N$. Since $|\mathcal{B}| = |\mathcal{C}|$, there is a bijective map $\tau: \mathcal{B} \to \mathcal{C}$. This map can be extended by linearity to an isomorphism of $M$ onto $N$ and so $M \cong N$.
Free Modules and Epimorphisms
Homomorphic images that are free have some behavior reminiscent of vector spaces.

Theorem 5.6
1) If $\tau: M \to F$ is a surjective $R$-homomorphism and $F$ is free, then $\ker(\tau)$ is complemented and
$$M = \ker(\tau) \oplus N \cong \ker(\tau) \oplus F$$
where $N \cong F$.
2) If $S$ is a submodule of $M$ and if $M/S$ is free, then $S$ is complemented and
$$M \cong S \oplus \frac{M}{S}$$
If, in addition, $M$, $S$ and $M/S$ are free, then
$$\operatorname{rk}(M) = \operatorname{rk}(S) + \operatorname{rk}\Big(\frac{M}{S}\Big)$$
and if the ranks are all finite then
$$\operatorname{rk}\Big(\frac{M}{S}\Big) = \operatorname{rk}(M) - \operatorname{rk}(S)$$
Proof. For part 1), we prove that $\tau$ is right-invertible. Let $\mathcal{B} = \{ v_i \mid i \in I \}$ be a basis for $F$. Define $\tau_R: F \to M$ by setting $\tau_R(v_i)$ equal to any member of the nonempty set $\tau^{-1}(v_i)$ and extending $\tau_R$ to an $R$-homomorphism. Then $\tau_R$ is a right inverse of $\tau$ and so Theorem 4.14 implies that $\ker(\tau)$ is a direct summand of $M$ and $M \cong \ker(\tau) \oplus F$. Part 2) follows from part 1), applied to the canonical projection of $M$ onto $M/S$.
Noetherian Modules
One of the most desirable properties of a finitely generated $R$-module $M$ is that all of its submodules be finitely generated. Example 4.2 shows that this is not always the case and leads us to search for conditions on the ring $R$ that will guarantee that all submodules of a finitely generated module are themselves finitely generated.

Definition An $R$-module $M$ is said to satisfy the ascending chain condition on submodules (abbreviated a.c.c.) if any ascending sequence of submodules
$$S_1 \subseteq S_2 \subseteq S_3 \subseteq \cdots$$
of $M$ is eventually constant, that is, there exists an index $k$ for which
$$S_k = S_{k+1} = S_{k+2} = \cdots$$
Modules with the ascending chain condition on submodules are also called noetherian modules (after Emmy Noether, one of the pioneers of module theory).
Theorem 5.7 An $R$-module $M$ is noetherian if and only if every submodule of $M$ is finitely generated.
Proof. Suppose that all submodules of $M$ are finitely generated and that $M$ contains an infinite ascending sequence
$$S_1 \subseteq S_2 \subseteq S_3 \subseteq \cdots \tag{5.1}$$
of submodules. Then the union
$$S = \bigcup_i S_i$$
is easily seen to be a submodule of $M$. Hence, $S$ is finitely generated, say $S = \langle u_1, \dots, u_n \rangle$. Since $u_j \in S$, there exists an index $k_j$ such that $u_j \in S_{k_j}$. Therefore, if $k = \max\{k_1, \dots, k_n\}$, we have $\{u_1, \dots, u_n\} \subseteq S_k$ and so
$$S = \langle u_1, \dots, u_n \rangle \subseteq S_k \subseteq S_{k+1} \subseteq \cdots \subseteq S$$
which shows that the chain (5.1) is eventually constant.

For the converse, suppose that $M$ satisfies the a.c.c. on submodules and let $S$ be a submodule of $M$. Pick $u_1 \in S$ and consider the submodule $S_1 = \langle u_1 \rangle \subseteq S$ generated by $u_1$. If $S_1 = S$ then $S$ is finitely generated. If $S_1 \neq S$ then there is a $u_2 \in S \setminus S_1$. Now let $S_2 = \langle u_1, u_2 \rangle$. If $S_2 = S$ then $S$ is finitely generated. If $S_2 \neq S$ then pick $u_3 \in S \setminus S_2$ and consider the submodule $S_3 = \langle u_1, u_2, u_3 \rangle$. Continuing in this way, we get an ascending chain of submodules
$$\langle u_1 \rangle \subseteq \langle u_1, u_2 \rangle \subseteq \langle u_1, u_2, u_3 \rangle \subseteq \cdots \subseteq S$$
If none of these submodules were equal to $S$, we would have an infinite ascending chain of submodules, each properly contained in the next, which contradicts the fact that $M$ satisfies the a.c.c. on submodules. Hence, $S = \langle u_1, \dots, u_n \rangle$ for some $n$ and so $S$ is finitely generated.
Since a ring $R$ is a module over itself and since the submodules of the module $R$ are precisely the ideals of the ring $R$, the preceding discussion may be formulated for rings as follows.

Definition A ring $R$ is said to satisfy the ascending chain condition on ideals if any ascending sequence
$$\mathcal{I}_1 \subseteq \mathcal{I}_2 \subseteq \mathcal{I}_3 \subseteq \cdots$$
of ideals of $R$ is eventually constant, that is, there exists an index $k$ for which
$$\mathcal{I}_k = \mathcal{I}_{k+1} = \mathcal{I}_{k+2} = \cdots$$
A ring that satisfies the ascending chain condition on ideals is called a noetherian ring.
Theorem 5.8 A ring $R$ is noetherian if and only if every ideal of $R$ is finitely generated.
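For example, $\mathbb{Z}$ is noetherian, since every ideal of $\mathbb{Z}$ has the form $n\mathbb{Z}$ and is thus generated by a single element. On the other hand, the ring $F[x_1, x_2, \dots]$ of polynomials in infinitely many variables over a field $F$ is not noetherian, since the chain of ideals
$$\langle x_1 \rangle \subset \langle x_1, x_2 \rangle \subset \langle x_1, x_2, x_3 \rangle \subset \cdots$$
is strictly increasing.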
Note that a ring $R$ is noetherian as a ring if and only if it is noetherian as a module over itself. More generally, a ring $R$ is noetherian if and only if every finitely generated $R$-module is noetherian.

Theorem 5.9 Let $R$ be a commutative ring with identity.
1) $R$ is noetherian if and only if every finitely generated $R$-module is noetherian.
2) If, in addition, $R$ is a principal ideal domain then if $M$ is generated by $n$ elements, any submodule of $M$ is generated by at most $n$ elements.
Proof. For part 1), one direction is evident. Assume that $R$ is noetherian and let $M = \langle u_1, \dots, u_n \rangle$ be a finitely generated $R$-module. Consider the epimorphism $\tau: R^n \to M$ defined by
$$\tau(r_1, \dots, r_n) = r_1 u_1 + \cdots + r_n u_n$$
Let $S$ be a submodule of $M$. Then
$$\tau^{-1}(S) = \{ u \in R^n \mid \tau(u) \in S \}$$
is a submodule of $R^n$ and $\tau(\tau^{-1}(S)) = S$. If every submodule of $R^n$ is finitely generated, then $\tau^{-1}(S)$ is finitely generated, say $\tau^{-1}(S) = \langle v_1, \dots, v_k \rangle$. Then
$S$ is finitely generated by $\{\tau(v_1), \dots, \tau(v_k)\}$. Hence, it is sufficient to show that every submodule $M$ of $R^n$ is finitely generated. We proceed by induction on $n$. If $n = 1$, then $M$ is an ideal of $R$ and is thus finitely generated by assumption. Assume that every submodule of $R^k$ is finitely generated for all $k < n$ and let $S$ be a submodule of $R^n$. If $n > 1$, we can extract from $S$ something that is isomorphic to an ideal of $R$ and so will be finitely generated. In particular, let $S_n$ be the "last coordinates" in $S$, specifically, let
$$S_n = \{ (0, \dots, 0, r_n) \mid (r_1, \dots, r_{n-1}, r_n) \in S \text{ for some } r_1, \dots, r_{n-1} \in R \}$$
The set $S_n$ is isomorphic to an ideal of $R$ and is therefore finitely generated, say $S_n = \langle V_2 \rangle$, where $V_2 = \{w_1, \dots, w_m\}$ is a finite subset of $S_n$. Also, let
$$S_0 = \{ v \in S \mid v = (r_1, \dots, r_{n-1}, 0) \text{ for some } r_1, \dots, r_{n-1} \in R \}$$
be the set of all elements of $S$ that have last coordinate equal to $0$. Note that $S_0$ is a nonempty submodule of $R^n$ and is isomorphic to a submodule of $R^{n-1}$. Hence, the inductive hypothesis implies that $S_0$ is finitely generated, say $S_0 = \langle V_1 \rangle$, where $V_1$ is a finite subset of $S_0$. By definition of $S_n$, each $w_j \in V_2$ has the form
$$w_j = (0, \dots, 0, a_j)$$
for $a_j \in R$, where there is a $z_j \in S$ of the form
$$z_j = (b_{j,1}, \dots, b_{j,n-1}, a_j)$$
Let $V_3 = \{z_1, \dots, z_m\}$. We claim that $S$ is generated by the finite set $V_1 \cup V_3$. To see this, let $v = (r_1, \dots, r_n) \in S$. Then $(0, \dots, 0, r_n) \in S_n$ and so
$$(0, \dots, 0, r_n) = \sum_{j=1}^m s_j w_j$$
for $s_j \in R$. Consider now the sum
$$w = \sum_{j=1}^m s_j z_j \in \langle V_3 \rangle$$
The last coordinate of this sum is
$$\sum_{j=1}^m s_j a_j = r_n$$
and so the difference $v - w$ has last coordinate $0$ and is thus in $S_0 = \langle V_1 \rangle$. Hence
$$v = (v - w) + w \in \langle V_1 \rangle + \langle V_3 \rangle = \langle V_1 \cup V_3 \rangle$$
as desired. For part 2), we leave it to the reader to review the proof and make the necessary changes. The key fact is that $S_n$ is isomorphic to an ideal of $R$, which is principal. Hence, $S_n$ is generated by a single element.
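As a small illustration of part 2), the submodule
$$S = \{ (a, b) \in \mathbb{Z}^2 \mid a + b \text{ is even} \}$$
of the $\mathbb{Z}$-module $\mathbb{Z}^2$ is generated by the two elements $(1,1)$ and $(2,0)$, since $(a, b) = b(1,1) + \frac{a-b}{2}(2,0)$ whenever $a + b$ (equivalently $a - b$) is even; no submodule of $\mathbb{Z}^2$ requires more than two generators.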
The Hilbert Basis Theorem
Theorem 5.9 naturally leads us to ask which familiar rings are noetherian. The following famous theorem describes one very important case.

Theorem 5.10 (Hilbert basis theorem) If a ring $R$ is noetherian then so is the polynomial ring $R[x]$.
Proof. We wish to show that any ideal $\mathcal{I}$ in $R[x]$ is finitely generated. Let $L$ denote the set of all leading coefficients of polynomials in $\mathcal{I}$, together with the element $0$ of $R$. Then $L$ is an ideal of $R$. To see this, observe that if $\alpha \in L$ is the leading coefficient of $f(x) \in \mathcal{I}$ and if $r \in R$ then either $r\alpha = 0$ or else $r\alpha$ is the leading coefficient of $rf(x) \in \mathcal{I}$. In either case, $r\alpha \in L$. Similarly, suppose that $\beta \in L$ is the leading coefficient of $g(x) \in \mathcal{I}$. We may assume that $\deg f(x) = d$ and $\deg g(x) = e$, with $d \leq e$. Then $f_1(x) = x^{e-d} f(x)$ is in $\mathcal{I}$, has leading coefficient $\alpha$ and has the same degree as $g(x)$. Hence, $\alpha - \beta$ is either $0$ or it is the leading coefficient of $f_1(x) - g(x) \in \mathcal{I}$. In either case $\alpha - \beta \in L$.

Since $L$ is an ideal of the noetherian ring $R$, it must be finitely generated, say $L = \langle \alpha_1, \dots, \alpha_k \rangle$. Since $\alpha_i \in L$, there exist polynomials $f_i(x) \in \mathcal{I}$ with leading coefficient $\alpha_i$. By multiplying each $f_i(x)$ by a suitable power of $x$, we may assume that
$$\deg f_i(x) = d = \max_i \{ \deg f_i(x) \}$$
for all $i = 1, \dots, k$. Now for $m = 0, \dots, d-1$ let $L_m$ be the set of all leading coefficients of polynomials in $\mathcal{I}$ of degree $m$, together with the element $0$ of $R$. A similar argument shows that $L_m$ is an ideal of $R$ and so $L_m$ is also finitely generated. Hence, we can find polynomials
$$P_m = \{ f_{m,1}(x), \dots, f_{m,k_m}(x) \}$$
in $\mathcal{I}$ whose leading coefficients constitute a generating set for $L_m$.

Consider now the finite set
$$P = \Big( \bigcup_{m=0}^{d-1} P_m \Big) \cup \{ f_1(x), \dots, f_k(x) \}$$
If $\mathcal{J}$ is the ideal generated by $P$ then $\mathcal{J} \subseteq \mathcal{I}$. An induction argument can be used to show that $\mathcal{J} = \mathcal{I}$. If $f(x) \in \mathcal{I}$ has degree $0$ then it is a linear combination of the elements of $P_0$ (which are constants) and is thus in $\mathcal{J}$. Assume that any polynomial in $\mathcal{I}$ of degree less than $m$ is in $\mathcal{J}$ and let $f(x) \in \mathcal{I}$ have degree $m$. If $m < d$ then some linear combination $g(x)$ over $R$ of the polynomials in $P_m$ has the same leading coefficient as $f(x)$, and if $m \geq d$ then some linear combination $g(x)$ of the polynomials
$$\{ x^{m-d} f_1(x), \dots, x^{m-d} f_k(x) \} \subseteq \mathcal{J}$$
has the same leading coefficient as $f(x)$. In either case, there is a polynomial $g(x) \in \mathcal{J}$ that has the same leading coefficient and degree as $f(x)$. Since $f(x) - g(x) \in \mathcal{I}$ has degree strictly smaller than that of $f(x)$, the induction hypothesis implies that $f(x) - g(x) \in \mathcal{J}$ and so
$$f(x) = [f(x) - g(x)] + g(x) \in \mathcal{J}$$
This completes the induction and shows that $\mathcal{I} = \mathcal{J}$ is finitely generated.
Exercises
1. If $M$ is a free $R$-module and $\tau: M \to N$ is an epimorphism, then must $N$ also be free?
2. Let $I$ be an ideal of $R$. Prove that if $R/I$ is a free $R$-module then $I$ is the zero ideal.
3. Prove that the union of an ascending chain of submodules is a submodule.
4. Let $S$ be a submodule of an $R$-module $M$. Show that if $M$ is finitely generated, so is the quotient module $M/S$.
5. Let $S$ be a submodule of an $R$-module $M$. Show that if both $S$ and $M/S$ are finitely generated then so is $M$.
6. Show that an $R$-module $M$ satisfies the a.c.c. for submodules if and only if the following condition holds: every nonempty collection $\mathcal{S}$ of submodules of $M$ has a maximal element. That is, for every nonempty collection $\mathcal{S}$ of submodules of $M$ there is an $S \in \mathcal{S}$ with the property that $T \in \mathcal{S}$ and $S \subseteq T$ imply $T = S$.
7. Let $\tau: M \to N$ be an $R$-homomorphism.
   a) Show that if $M$ is finitely generated then so is $\operatorname{im}(\tau)$.
   b) Show that if $\ker(\tau)$ and $\operatorname{im}(\tau)$ are finitely generated then $M = \ker(\tau) + S$ where $S$ is a finitely generated submodule of $M$. Hence, $M$ is finitely generated.
8. If $R$ is noetherian and $I$ is an ideal of $R$, show that $R/I$ is also noetherian.
9. Prove that if $R$ is noetherian then so is $R[x_1, \dots, x_n]$.
10. Find an example of a commutative ring with identity that does not satisfy the ascending chain condition.
11. a) Prove that an $R$-module $M$ is cyclic if and only if it is isomorphic to $R/I$ where $I$ is an ideal of $R$.
    b) Prove that an $R$-module $M$ is simple ($M \neq \{0\}$ and $M$ has no proper nonzero submodules) if and only if it is isomorphic to $R/I$ where $I$ is a maximal ideal of $R$.
    c) Prove that for any nonzero commutative ring $R$ with identity, a simple $R$-module exists.
12. Prove that the condition that $R$ be a principal ideal domain in part 2) of Theorem 5.9 is required.
13. Prove Theorem 5.9 in the following way.
    a) Show that if $T \subseteq S$ are submodules of $M$ and if $T$ and $S/T$ are finitely generated then so is $S$.
    b) The proof is again by induction. Assuming it true for any module generated by $n$ elements, let $M = \langle v_1, \dots, v_{n+1} \rangle$ and let $M' = \langle v_1, \dots, v_n \rangle$. Then let $T = S \cap M'$ in part a).
14. Prove that any $R$-module $M$ is isomorphic to the quotient of a free module $F$. If $M$ is finitely generated then $F$ can also be taken to be finitely generated.
15. Prove that if $S$ and $T$ are isomorphic submodules of a module $M$ it does not necessarily follow that the quotient modules $M/S$ and $M/T$ are isomorphic. Prove also that if $S \oplus T_1 \cong S \oplus T_2$ as modules it does not necessarily follow that $T_1 \cong T_2$. Prove that these statements do hold if all modules are free and have finite rank.
Chapter 6
Modules over a Principal Ideal Domain
We remind the reader of a few of the basic properties of principal ideal domains.

Theorem 6.1 Let $R$ be a principal ideal domain.
1) An element $r \in R$ is irreducible if and only if the ideal $\langle r \rangle$ is maximal.
2) An element in $R$ is prime if and only if it is irreducible.
3) $R$ is a unique factorization domain.
4) $R$ satisfies the ascending chain condition on ideals. Hence, so does any finitely generated $R$-module $M$. Moreover, if $M$ is generated by $n$ elements, any submodule of $M$ is generated by at most $n$ elements.
Annihilators and Orders
When $R$ is a principal ideal domain, all annihilators are generated by a single element. This permits the following definition.

Definition Let $R$ be a principal ideal domain and let $M$ be an $R$-module, with submodule $N$. Any generator of $\operatorname{ann}(N)$ is called an order of $N$. An order of an element $v \in M$ is an order of the submodule $\langle v \rangle$.
Note that any two orders $\mu$ and $\nu$ of $N$ (or of an element $v \in M$) are associates, since $\langle \mu \rangle = \langle \nu \rangle$. Hence, an order of $N$ is uniquely determined up to multiplication by a unit of $R$. For this reason, we may occasionally abuse the terminology and refer to "the" order of an element or submodule. Also, if $A \leq B \leq M$ are submodules of $M$ then $\operatorname{ann}(B) \subseteq \operatorname{ann}(A)$ and so any order of $A$ divides any order of $B$. Thus, just as with finite groups, the order of an element/submodule divides the order of the module.
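For instance, in the $\mathbb{Z}$-module $\mathbb{Z}_{12}$, the element $4$ satisfies
$$\operatorname{ann}(4) = 3\mathbb{Z}, \quad \operatorname{ann}(\mathbb{Z}_{12}) = 12\mathbb{Z}, \quad 3 \mid 12$$
so $4$ has order $3$ (up to sign), in accordance with the divisibility remark above.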
Cyclic Modules
The simplest type of nonzero module is clearly a cyclic module. Despite their simplicity, cyclic modules are extremely important and so we want to explore some of their basic properties.

Theorem 6.2 Let $R$ be a principal ideal domain.
1) If $\langle v \rangle$ is a cyclic $R$-module with $\operatorname{ann}(v) = \langle \mu \rangle$ then the map $\tau: R \to \langle v \rangle$ defined by $\tau(r) = rv$ is a surjective $R$-homomorphism with kernel $\langle \mu \rangle$. Hence
$$\langle v \rangle \cong \frac{R}{\langle \mu \rangle}$$
In other words, cyclic $R$-modules are isomorphic to quotient modules of the base ring $R$. If $\mu$ is a prime then $\langle \mu \rangle$ is a maximal ideal in $R$ and so $R/\langle \mu \rangle$ is a field.
2) Any submodule of a cyclic $R$-module is cyclic.
3) Let $\langle v \rangle$ be a cyclic submodule of $M$ of order $\mu$. Then $\langle rv \rangle$ has order $\mu/\gcd(\mu, r)$. Hence, if $r$ and $\mu$ are relatively prime then $\langle rv \rangle$ also has order $\mu$.
4) If $u_1, \dots, u_n$ are nonzero elements of $M$ with orders $\mu_1, \dots, \mu_n$ that are pairwise relatively prime, then the sum $v = u_1 + \cdots + u_n$ has order $\mu = \mu_1 \cdots \mu_n$. Consequently, if $M$ is an $R$-module and
$$M = A_1 + \cdots + A_n$$
where the submodules $A_i$ have orders that are pairwise relatively prime, then the sum is direct.
Proof. We leave the proof of part 1) as an exercise. Part 2) follows from part 2) of Theorem 5.9. For part 3), we first consider the two extremes: when $r$ is relatively prime to $\mu$ and when $r \mid \mu$. As to the first, let $r$ and $\mu$ be relatively prime. If $s(rv) = 0$ then $\mu \mid sr$. Hence, $\gcd(r, \mu) = 1$ implies that $\mu \mid s$ and so $\mu = \mu/\gcd(\mu, r)$ is an order of $rv$. Next, if $r\alpha = \mu$ then $\alpha(rv) = 0$ and so any order $o$ of $rv$ divides $\alpha$. But if $o$ properly divides $\alpha$ then $ro$ properly divides $r\alpha = \mu$ and yet annihilates $v$, which contradicts the fact that $\mu$ is an order of $v$. Hence $o$ and $\alpha$ are associates and so $\alpha = \mu/r = \mu/\gcd(\mu, r)$ is an order of $rv$. Now we can combine these two extremes to finish the proof. Write $r = a(r/a)$ where $a = \gcd(r, \mu)$ divides $\mu$ and $r/a$ is relatively prime to $\mu$. Using the previous results, we find that $(r/a)v$ has order $\mu$ and so $rv = a(r/a)v$ has order $\mu/a$.
For part 4), since $\mu = \mu_1 \cdots \mu_n$ annihilates $v$, the order of $v$ divides $\mu$. If the order of $v$ is a proper divisor of $\mu$ then for some index $i$, there is a prime $p$ dividing $\mu_i$ for which $\mu/p$ annihilates $v$. But $\mu/p$ annihilates each $u_j$ for $j \neq i$ and so
$$0 = \frac{\mu}{p} v = \frac{\mu}{p} u_i = \frac{\mu_i}{p} \Big( \prod_{j \neq i} \mu_j \Big) u_i$$
However, $\mu_i$ and $\prod_{j \neq i} \mu_j$ are relatively prime and so the order of $(\prod_{j \neq i} \mu_j) u_i$ is equal to the order $\mu_i$ of $u_i$, which contradicts the equation above, since $\mu_i/p$ is a proper divisor of $\mu_i$. Hence, the order of $v$ is $\mu$. Finally, to see that the sum above is direct, note that if
$$v_1 + \cdots + v_n = 0$$
where $v_i \in A_i$, then each $v_i$ must be $0$, for otherwise the order of the sum on the left would be a nonunit and so the sum could not equal $0$.
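Parts 3) and 4) are easy to check in $\mathbb{Z}_{12}$: the element $v = 1$ has order $12$, and
$$8v = 8 \text{ has order } \frac{12}{\gcd(12, 8)} = 3$$
Likewise $3$ has order $4$ and $4$ has order $3$; these orders are relatively prime and the sum $3 + 4 = 7$ has order $4 \cdot 3 = 12$, as part 4) predicts.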
Free Modules over a Principal Ideal Domain
Example 4.5 showed that a submodule of a free module need not be free. (The submodule $\mathbb{Z} \times \{0\}$ of $\mathbb{Z} \times \mathbb{Z}$ is not free.) However, if $R$ is a principal ideal domain this cannot happen.

Theorem 6.3 Let $M$ be a free module over a principal ideal domain $R$. Then any submodule $S$ of $M$ is also free and $\operatorname{rk}(S) \leq \operatorname{rk}(M)$.
Proof. We will give the proof only for modules of finite rank, although the theorem is true for all free modules. Thus, since $M \cong R^n$ where $n = \operatorname{rk}(M)$, we may in fact assume that $M = R^n$. Our plan is to proceed by induction on $n$. For $n = 1$, we have $M = R$ and any submodule $S$ of $R$ is just an ideal of $R$. If $S = \{0\}$ then $S$ is free by definition. Otherwise, $S = \langle \alpha \rangle$ for some $\alpha \neq 0$. But since $R$ is an integral domain, we have $r\alpha \neq 0$ for all $r \neq 0$ and so $\{\alpha\}$ is a basis for $S$. Thus, $S$ is free and $\operatorname{rk}(S) = 1 = \operatorname{rk}(M)$.

Now assume that if $k < n$ then any submodule of $R^k$ is free of rank at most $k$. Let $S$ be a submodule of $R^n$. Let
$$S_0 = \{ v \in S \mid v = (r_1, \dots, r_{n-1}, 0) \text{ for some } r_1, \dots, r_{n-1} \in R \}$$
and
$$S_n = \{ (0, \dots, 0, r_n) \mid (r_1, \dots, r_{n-1}, r_n) \in S \text{ for some } r_1, \dots, r_{n-1} \in R \}$$
Note that $S_0$ and $S_n$ are nonempty. Since $S_0$ is isomorphic to a submodule of $R^{n-1}$, the inductive hypothesis implies that $S_0$ is free. Let $\mathcal{B}$ be a basis for $S_0$ with $|\mathcal{B}| \leq n-1$. If $S_0 = \{0\}$ then take $\mathcal{B} = \emptyset$.

Now, $S_n$ is isomorphic to a submodule (ideal) of $R$ and is therefore also free of rank at most $1$. If $S_n = \{0\}$ then all elements of $S$ have zero final coordinate, which means that $S = S_0$, which is free with rank at most $n-1$, as desired. So assume that $S_n$ is not trivial and let $\{s\}$ be a basis for $S_n$, where
$$s = (0, \dots, 0, a_n)$$
for $a_n \neq 0$ in $R$. Let $z \in S$ satisfy
$$z = (b_1, \dots, b_{n-1}, a_n)$$
We claim that $\mathcal{B} \cup \{z\}$ is a basis for $S$. To see that $\mathcal{B} \cup \{z\}$ generates $S$, let $v = (r_1, \dots, r_n) \in S$. Then $(0, \dots, 0, r_n) \in S_n$ and so
$$(0, \dots, 0, r_n) = \alpha s = \alpha(0, \dots, 0, a_n)$$
for $\alpha \in R$. Thus $r_n = \alpha a_n$ and
$$\alpha z = (\alpha b_1, \dots, \alpha b_{n-1}, r_n)$$
Hence the difference $v - \alpha z$ is in $S_0 = \langle \mathcal{B} \rangle$. We then have
$$v = (v - \alpha z) + \alpha z \in \langle \mathcal{B} \rangle + \langle z \rangle = \langle \mathcal{B} \cup \{z\} \rangle$$
and so $\mathcal{B} \cup \{z\}$ generates $S$. Finally, to see that $\mathcal{B} \cup \{z\}$ is linearly independent, note that if $\mathcal{B} = \{v_1, \dots, v_k\}$ and if
$$r_1 v_1 + \cdots + r_k v_k + rz = 0$$
then comparing $n$th coordinates gives $r a_n = 0$. Since $R$ is an integral domain and $a_n \neq 0$, we deduce that $r = 0$. It follows that $r_i = 0$ for all $i$. Thus $\mathcal{B} \cup \{z\}$ is a basis for $S$ and the proof is complete.
If $V$ is a vector space of dimension $n$ then any set of $n$ linearly independent vectors in $V$ is a basis for $V$. This fails for modules. For example, $\mathbb{Z}$ is a $\mathbb{Z}$-module of rank $1$ but the independent set $\{2\}$ is not a basis. On the other hand, the fact that a spanning set of size $n$ is a basis does hold for modules over a principal ideal domain, as we now show.

Theorem 6.4 Let $M$ be a free $R$-module of rank $n$, where $R$ is a principal ideal domain. Let $S = \{s_1, \dots, s_n\}$ be a spanning set for $M$. Then $S$ is a basis for $M$.
Proof. Let $\mathcal{B} = \{b_1, \dots, b_n\}$ be a basis for $M$ and define the map $\tau: M \to M$ by $\tau(b_i) = s_i$, extended by linearity to a surjective $R$-homomorphism. Since $M$ is free, Theorem 5.6 implies that
$$M \cong \ker(\tau) \oplus \operatorname{im}(\tau) = \ker(\tau) \oplus M$$
Since $\ker(\tau)$ is a submodule of the free module $M$ and since $R$ is a principal ideal domain, we know that $\ker(\tau)$ is free of rank at most $n$. It follows that
$$\operatorname{rk}(M) = \operatorname{rk}(\ker(\tau)) + \operatorname{rk}(M)$$
and so $\operatorname{rk}(\ker(\tau)) = 0$, that is, $\ker(\tau) = \{0\}$, which implies that $\tau$ is an $R$-isomorphism and so $S$ is a basis.
In general, a basis for a submodule of a free module over a principal ideal domain cannot be extended to a basis for the entire module. For example, the set $\{2\}$ is a basis for the submodule $2\mathbb{Z}$ of the $\mathbb{Z}$-module $\mathbb{Z}$, but this set cannot be extended to a basis for $\mathbb{Z}$ itself. We state without proof the following result along these lines.

Theorem 6.5 Let $M$ be a free $R$-module of rank $n$, where $R$ is a principal ideal domain. Let $N$ be a submodule of $M$ that is free of rank $k$. Then there is a basis $\mathcal{B}$ for $M$ that contains a subset $S = \{v_1, \dots, v_k\}$ for which $\{a_1 v_1, \dots, a_k v_k\}$ is a basis for $N$, for some nonzero elements $a_1, \dots, a_k$ of $R$.
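For example, take $M = \mathbb{Z}^2$ and the submodule $N = 2\mathbb{Z} \times 3\mathbb{Z}$. The standard basis $\mathcal{B} = \{(1,0), (0,1)\}$ of $M$ works in Theorem 6.5 with $a_1 = 2$ and $a_2 = 3$, since
$$\{2(1,0),\ 3(0,1)\} = \{(2,0), (0,3)\}$$
is a basis for $N$.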
Torsion-Free and Free Modules
Let us explore the relationship between the concepts of torsion-free and free. It is not hard to see that any free module over an integral domain is torsion-free. The converse does not hold, unless we strengthen the hypotheses by requiring that the module be finitely generated.

Theorem 6.6 Let $M$ be a torsion-free finitely generated module over a principal ideal domain $R$. Then $M$ is free. Thus, a finitely generated module over a principal ideal domain is free if and only if it is torsion-free.
Proof. Let $G = \{v_1, \dots, v_n\}$ be a generating set for $M$. Consider first the case $n = 1$, whence $G = \{v\}$. Then $G$ is a basis for $M$ since singleton sets are linearly independent in a torsion-free module. Hence, $M$ is free.

Now suppose that $G = \{u, v\}$ is a generating set with $u, v \neq 0$. If $G$ is linearly independent, we are done. If not, then there exist nonzero $a, b \in R$ for which $au = bv$. It follows that
$$bM = b\langle u, v \rangle \subseteq \langle u \rangle$$
and so $bM$ is a submodule of a free module and is therefore free by Theorem 6.3. But the map $\tau: M \to bM$ defined by $\tau(x) = bx$ is an isomorphism because $M$ is torsion-free. Thus $M$ is also free.

Now we can do the general case. Write
$$G = \{u_1, \dots, u_k, v_1, \dots, v_{n-k}\}$$
where $S = \{u_1, \dots, u_k\}$ is a maximal linearly independent subset of $G$. (Note that $S$ is nonempty because singleton sets are linearly independent.) For each $v_j$, the set $\{u_1, \dots, u_k, v_j\}$ is linearly dependent and so there exist a nonzero $a_j \in R$ and $r_1, \dots, r_k \in R$ for which
$$a_j v_j + r_1 u_1 + \cdots + r_k u_k = 0$$
If $a = a_1 \cdots a_{n-k}$ then
$$aM = a\langle u_1, \dots, u_k, v_1, \dots, v_{n-k} \rangle \subseteq \langle u_1, \dots, u_k \rangle$$
and since the latter is a free module, so is $aM$, and therefore so is $M$.
Prelude to Decomposition: Cyclic Modules
The following result shows how cyclic modules can be composed and decomposed.

Theorem 6.7 Let $M$ be an $R$-module.
1) If $u_1, \dots, u_n$ are nonzero elements of $M$ with orders $\mu_1, \dots, \mu_n$ that are pairwise relatively prime, then
$$\langle u_1 + \cdots + u_n \rangle = \langle u_1 \rangle \oplus \cdots \oplus \langle u_n \rangle$$
2) If $v \in M$ has order $\mu = \mu_1 \cdots \mu_n$ where $\mu_1, \dots, \mu_n$ are pairwise relatively prime, then $v$ can be written in the form $v = u_1 + \cdots + u_n$ where $u_i$ has order $\mu_i$. Moreover,
$$\langle v \rangle = \langle u_1 \rangle \oplus \cdots \oplus \langle u_n \rangle$$
Proof. According to Theorem 6.2, the order of $u_1 + \cdots + u_n$ is $\mu = \mu_1 \cdots \mu_n$ and the sum on the right is direct. It is clear that $\langle u_1 + \cdots + u_n \rangle \subseteq \langle u_1 \rangle \oplus \cdots \oplus \langle u_n \rangle$. For the reverse inclusion, since $\mu_i$ and $\mu/\mu_i$ are relatively prime, there exist $a, b \in R$ for which
$$a\mu_i + b\frac{\mu}{\mu_i} = 1$$
Hence
$$u_i = \Big( a\mu_i + b\frac{\mu}{\mu_i} \Big) u_i = b\frac{\mu}{\mu_i} u_i = b\frac{\mu}{\mu_i}(u_1 + \cdots + u_n) \in \langle u_1 + \cdots + u_n \rangle$$
since $\mu/\mu_i$ annihilates each $u_j$ for $j \neq i$. Thus $u_i \in \langle u_1 + \cdots + u_n \rangle$ for all $i$ and so we get the reverse inclusion.

For part 2), the scalars $\beta_i = \mu/\mu_i$ are relatively prime and so there exist $a_i \in R$ for which
$$a_1 \beta_1 + \cdots + a_n \beta_n = 1$$
Hence,
$$v = (a_1 \beta_1 + \cdots + a_n \beta_n)v = a_1 \beta_1 v + \cdots + a_n \beta_n v$$
Since the order of $a_i \beta_i v$ divides $\mu_i$, these orders are pairwise relatively prime.
Hence, the order of the sum on the right is the product of the orders of the terms, and since $v$ has order $\mu = \mu_1 \cdots \mu_n$, the term $u_i = a_i \beta_i v$ must have order exactly $\mu_i$. The second statement follows from part 1).
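The two parts of this theorem are visible in $\mathbb{Z}_6$: the element $1$ has order $6 = 2 \cdot 3$ and decomposes as $1 = 3 + 4$, where $3$ has order $2$ and $4$ has order $3$, and accordingly
$$\mathbb{Z}_6 = \langle 1 \rangle = \langle 3 \rangle \oplus \langle 4 \rangle = \{0, 3\} \oplus \{0, 2, 4\}$$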
The First Decomposition
The first step in the decomposition of a finitely generated module $M$ over a principal ideal domain $R$ is an easy one.

Theorem 6.8 Any finitely generated module $M$ over a principal ideal domain $R$ is the direct sum of a free $R$-module and a torsion $R$-module
$$M = M_{\mathrm{free}} \oplus M_{\mathrm{tor}}$$
As to uniqueness, the torsion part $M_{\mathrm{tor}}$ is unique (it must be the set of all torsion elements of $M$) whereas the free part $M_{\mathrm{free}}$ is not unique. However, all possible free summands are isomorphic and thus have the same rank.
Proof. As to existence, the set $M_{\mathrm{tor}}$ of all torsion elements is easily seen to be a submodule of $M$. Since $M$ is finitely generated, so is the torsion-free quotient module $M/M_{\mathrm{tor}}$. Hence, according to Theorem 6.6, $M/M_{\mathrm{tor}}$ is free. Consider now the canonical projection $\pi: M \to M/M_{\mathrm{tor}}$. Since $M/M_{\mathrm{tor}}$ is free, Theorem 5.6 implies that
$$M = M_{\mathrm{tor}} \oplus F$$
where $F \cong M/M_{\mathrm{tor}}$ is free. As to uniqueness, suppose that $M = T \oplus G$ where $T$ is torsion and $G$ is free. Then $T \subseteq M_{\mathrm{tor}}$. But if $v \in M_{\mathrm{tor}}$ and $v = t + g$ where $t \in T$ and $g \in G$, then $rv = 0$ and $st = 0$ for some nonzero $r, s \in R$, and so $(rs)g = 0$, which implies that $g = 0$, that is, $v \in T$. Thus, $T = M_{\mathrm{tor}}$. For the free part, since $M = M_{\mathrm{tor}} \oplus F = M_{\mathrm{tor}} \oplus G$, the submodules $F$ and $G$ are both complements of $M_{\mathrm{tor}}$ and hence are isomorphic by Theorem 4.12. Hence, all free summands are isomorphic and therefore have the same rank.
Note that if $\{w_1, \dots, w_k\}$ is a basis for $M_{\mathrm{free}}$, we can write
$$M = \langle w_1 \rangle \oplus \cdots \oplus \langle w_k \rangle \oplus M_{\mathrm{tor}}$$
where each cyclic submodule $\langle w_i \rangle$ has zero annihilator. This is a partial decomposition of $M$ into a direct sum of cyclic submodules.
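For instance, the $\mathbb{Z}$-module $M = \mathbb{Z}^2 \oplus \mathbb{Z}_4$ decomposes as
$$M = (\mathbb{Z}^2 \oplus \{0\}) \oplus (\{0\} \oplus \mathbb{Z}_4)$$
with free part of rank $2$ and torsion part $M_{\mathrm{tor}} = \{0\} \oplus \mathbb{Z}_4$.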
A Look Ahead
So now we turn our attention to the decomposition of finitely generated torsion modules $M$ over a principal ideal domain. We will develop two decompositions. One decomposition has the form
$$M = \langle v_1 \rangle \oplus \cdots \oplus \langle v_n \rangle$$
where the annihilators of the cyclic submodules form an ascending chain
$$\operatorname{ann}(\langle v_1 \rangle) \subseteq \cdots \subseteq \operatorname{ann}(\langle v_n \rangle)$$
This decomposition is called an invariant factor decomposition of $M$. Although we will not approach it in quite this manner, the second decomposition can be obtained by further decomposing each cyclic submodule in the invariant factor decomposition into cyclic submodules whose annihilators have the form $\langle p^e \rangle$ where $p$ is a prime. Submodules with annihilators of this form are called primary submodules and so the second decomposition is referred to as a primary cyclic decomposition. Our plan will be to derive the primary cyclic decomposition first and then obtain the invariant factor decomposition from the primary cyclic decomposition by a piecing-together process, as described in Theorem 6.7. As we will see, while neither of these decompositions is unique, the sequences of annihilators are unique, that is, these sequences are completely determined by the module $M$.
The Primary Decomposition
The first step in the primary cyclic decomposition is to decompose the torsion module into a direct sum of primary submodules.

Definition Let $p$ be a prime in $R$. A $p$-primary (or just primary) module is a module whose order is a power of $p$.
Note that a $p$-primary module $M$ with order $p^e$ must have an element of order $p^e$.

Theorem 6.9 (The primary decomposition theorem) Let $M$ be a nonzero torsion module over a principal ideal domain $R$, with order
$$\mu = p_1^{e_1} \cdots p_n^{e_n}$$
where the $p_i$'s are distinct nonassociate primes in $R$.
1) Then $M$ is the direct sum
$$M = M_{p_1} \oplus \cdots \oplus M_{p_n}$$
where
$$M_{p_i} = \{ v \in M \mid p_i^{e_i} v = 0 \}$$
is a primary submodule with order $p_i^{e_i}$ and annihilator
$$\operatorname{ann}(M_{p_i}) = \langle p_i^{e_i} \rangle$$
2) This decomposition of $M$ into primary submodules is unique up to order of the summands. That is, if
$$M = N_1 \oplus \cdots \oplus N_m$$
where $N_j$ is primary of order $q_j^{f_j}$ and $q_1, \dots, q_m$ are distinct nonassociate primes, then $m = n$ and, after a suitable reindexing of the summands, we have $N_i = M_{p_i}$. Hence, $q_i$ and $p_i$ are associates and $f_i = e_i$ (and so $\mu = q_1^{f_1} \cdots q_n^{f_n}$ is also a prime factorization of $\mu$).
Proof. For part 1), let us write $\beta_i = \mu/p_i^{e_i}$. We claim that
$$M_{p_i} = \beta_i M = \{ \beta_i v \mid v \in M \}$$
Since $\beta_i v$ is annihilated by $p_i^{e_i}$, we have $\beta_i M \subseteq M_{p_i}$. On the other hand, since $\beta_i$ and $p_i^{e_i}$ are relatively prime, there exist $a, b \in R$ for which
$$a\beta_i + bp_i^{e_i} = 1$$
and so if $x \in M_{p_i}$ then
$$x = 1x = (a\beta_i + bp_i^{e_i})x = a\beta_i x \in \beta_i M$$
Hence $M_{p_i} \subseteq \beta_i M$. Now, since $\gcd(\beta_1, \dots, \beta_n) = 1$, there exist scalars $a_i$ for which
$$a_1\beta_1 + \cdots + a_n\beta_n = 1$$
and so for any $x \in M$
$$x = 1x = (a_1\beta_1 + \cdots + a_n\beta_n)x \in \beta_1 M + \cdots + \beta_n M$$
Moreover, since the order of $\beta_i M$ divides $p_i^{e_i}$ and the $p_i^{e_i}$'s are pairwise relatively prime, it follows that the sum of the submodules $\beta_i M$ is direct, that is,
$$M = \beta_1 M \oplus \cdots \oplus \beta_n M = M_{p_1} \oplus \cdots \oplus M_{p_n}$$
As to the annihilators, it is clear that $\langle p_i^{e_i} \rangle \subseteq \operatorname{ann}(\beta_i M)$. For the reverse inclusion, if $r \in \operatorname{ann}(\beta_i M)$ then $r\beta_i \in \operatorname{ann}(M)$ and so $\mu \mid r\beta_i$, that is, $p_i^{e_i} \mid r$ and so $r \in \langle p_i^{e_i} \rangle$. Thus $\operatorname{ann}(\beta_i M) = \langle p_i^{e_i} \rangle$.

As to uniqueness, we claim that $\nu = q_1^{f_1} \cdots q_m^{f_m}$ is an order of $M$. This follows from the fact that $N_j$ contains an element $u_j$ of order $q_j^{f_j}$ and so the sum $v = u_1 + \cdots + u_m$ has order $\nu$. Hence, $\nu$ divides $\mu$. But $\mu$ divides $\nu$ and so $\mu$ and $\nu$ are associates.
Unique factorization in $R$ now implies that $m = n$ and, after a suitable reindexing, that $f_i = e_i$ and that $q_i$ and $p_i$ are associates. Hence, $N_i$ is primary of order $p_i^{e_i}$. For convenience, we can write $N_i$ as $N_{p_i}$. Hence,
$$N_{p_i} \subseteq \{ v \in M \mid p_i^{e_i} v = 0 \} = M_{p_i}$$
But if
$$N_{p_1} \oplus \cdots \oplus N_{p_n} = M_{p_1} \oplus \cdots \oplus M_{p_n}$$
and $N_{p_i} \subseteq M_{p_i}$ for all $i$, we must have $N_{p_i} = M_{p_i}$ for all $i$.
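The primary decomposition is easy to see in $\mathbb{Z}_{12}$, whose order is $12 = 2^2 \cdot 3$:
$$M_2 = \{ v \mid 4v = 0 \} = \{0, 3, 6, 9\} \quad\text{and}\quad M_3 = \{ v \mid 3v = 0 \} = \{0, 4, 8\}$$
and $\mathbb{Z}_{12} = M_2 \oplus M_3$, with $M_2 \cong \mathbb{Z}_4$ and $M_3 \cong \mathbb{Z}_3$.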
The Cyclic Decomposition of a Primary Module
The next step in the decomposition process is to show that a primary module can be decomposed into a direct sum of cyclic submodules. While this decomposition is not unique (see the exercises), the set of annihilator ideals is unique, as we will see. To establish this uniqueness, we use the following result.

Lemma 6.10 Let $M$ be a module over a principal ideal domain $R$ and let $p \in R$ be a prime.
1) If $pM = \{0\}$ then $M$ is a vector space over the field $R/\langle p \rangle$ with scalar multiplication defined by
$$(r + \langle p \rangle)v = rv$$
for all $v \in M$.
2) For any submodule $S$ of $M$ the set
$$S^{(p)} = \{ v \in S \mid pv = 0 \}$$
is also a submodule of $M$ and if $M = S \oplus T$ then
$$M^{(p)} = S^{(p)} \oplus T^{(p)}$$
Proof. For part 1), since $p$ is prime, the ideal $\langle p \rangle$ is maximal and so $R/\langle p \rangle$ is a field. We leave the proof that $M$ is a vector space over $R/\langle p \rangle$ to the reader. For part 2), it is straightforward to show that $S^{(p)}$ is a submodule of $M$. Since $S^{(p)} \subseteq S$ and $T^{(p)} \subseteq T$, we see that $S^{(p)} \cap T^{(p)} = \{0\}$. Also, if $v \in M^{(p)}$ then $pv = 0$. But $v = s + t$ for some $s \in S$ and $t \in T$ and so $0 = pv = ps + pt$. Since $ps \in S$ and $pt \in T$, we deduce that $ps = pt = 0$, whence $v \in S^{(p)} \oplus T^{(p)}$. Thus, $M^{(p)} \subseteq S^{(p)} \oplus T^{(p)}$ and since the reverse inclusion is manifest, the result is proved.
Theorem 6.11 (The cyclic decomposition theorem for a primary module) Let $M$ be a nonzero primary finitely generated torsion module over a principal ideal domain $R$, with order $p^e$.
1) Then $M$ is the direct sum
$$M = \langle v_1 \rangle \oplus \cdots \oplus \langle v_n \rangle \tag{6.1}$$
of cyclic submodules with annihilators $\operatorname{ann}(\langle v_i \rangle) = \langle p^{e_i} \rangle$ that can be arranged in ascending order
$$\operatorname{ann}(\langle v_1 \rangle) \subseteq \cdots \subseteq \operatorname{ann}(\langle v_n \rangle)$$
or equivalently
$$e = e_1 \geq e_2 \geq \cdots \geq e_n$$
2) As to uniqueness, suppose that $M$ is also the direct sum
$$M = \langle u_1 \rangle \oplus \cdots \oplus \langle u_m \rangle$$
of cyclic submodules with annihilators $\operatorname{ann}(\langle u_i \rangle) = \langle p^{f_i} \rangle$ arranged in ascending order
$$\operatorname{ann}(\langle u_1 \rangle) \subseteq \cdots \subseteq \operatorname{ann}(\langle u_m \rangle)$$
or equivalently
$$f_1 \geq f_2 \geq \cdots \geq f_m$$
Then the two chains of annihilators are identical, that is, $m = n$ and
$$\operatorname{ann}(\langle u_i \rangle) = \operatorname{ann}(\langle v_i \rangle)$$
for all $i$. Thus $p^{e_i}$ and $p^{f_i}$ are associates and $e_i = f_i$ for all $i$.
Proof. Note first that if (6.1) holds then $p^e v_i = 0$. Hence, the order of $v_i$ divides $p^e$ and so must have the form $p^{e_i}$ for $e_i \leq e$. To prove (6.1), let $v_1 \in M$ be an element with order equal to the order of $M$, that is,
$$\operatorname{ann}(v_1) = \operatorname{ann}(M) = \langle p^e \rangle$$
(We remarked earlier that such an element must exist.) If we show that $\langle v_1 \rangle$ is complemented, that is,
$$M = \langle v_1 \rangle \oplus S_1$$
for some submodule $S_1$, then since $S_1$ is also a finitely generated primary torsion module over $R$, we can repeat the process to get
$$M = \langle v_1 \rangle \oplus \langle v_2 \rangle \oplus \cdots \oplus \langle v_k \rangle \oplus S_k$$
where $\operatorname{ann}(v_i) = \langle p^{e_i} \rangle$. We can continue this decomposition as long as $S_k \neq \{0\}$. But the ascending sequence of submodules
$$\langle v_1 \rangle \subseteq \langle v_1 \rangle \oplus \langle v_2 \rangle \subseteq \cdots$$
must terminate since $M$ is noetherian and so eventually we must have $S_n = \{0\}$, giving (6.1).
Now, the direct sum $M_0 = \langle v \rangle \oplus \{0\}$ clearly exists, where $v$ has order equal to that of $M$. Suppose that the direct sum $M_k = \langle v \rangle \oplus S_k$ exists. We claim that if $M_k \neq M$ then it is possible to find a submodule $S_{k+1}$ that properly contains $S_k$ for which the direct sum
$$M_{k+1} = \langle v \rangle \oplus S_{k+1}$$
exists. Once this claim is established, then since $M$ is finitely generated, this process must stop after a finite number of steps, leading to $M = \langle v \rangle \oplus S$ for some submodule $S$, as desired.

If $M_k \neq M$ then there is a $u \in M \setminus M_k$. We claim that for some scalar $a \in R$, we can take $S_{k+1} = \langle S_k, u - av \rangle$, which is a proper superset of $S_k$ since $u \notin \langle v \rangle \oplus S_k$. Thus, we must show that there is an $a \in R$ for which
$$x \in \langle v \rangle \cap \langle S_k, u - av \rangle \implies x = 0$$
Now, there exist scalars $\alpha$ and $\beta$ and an $s \in S_k$ for which
$$x = \alpha v = s + \beta(u - av)$$
What can we say about the scalars $\alpha$ and $\beta$? First, although $u \notin M_k$, we do have
$$\beta u = (\alpha + \beta a)v - s \in \langle v \rangle \oplus S_k$$
So let us consider the ideal of all such scalars
$$\mathcal{J} = \{ \gamma \in R \mid \gamma u \in \langle v \rangle \oplus S_k \}$$
Since $p^e \in \mathcal{J}$ and $\mathcal{J}$ is principal, we have
$$\mathcal{J} = \langle p^d \rangle$$
for some $d \leq e$. Moreover, $d$ is not $0$, since that would imply that $\mathcal{J} = R$ and so $u = 1u \in \langle v \rangle \oplus S_k$, contrary to assumption. Since $p^d \in \mathcal{J}$, we have
$$p^d u = \gamma v + t$$
for some $t \in S_k$. Then
$$0 = p^e u = p^{e-d}(p^d u) = p^{e-d}\gamma v + p^{e-d}t$$
and since $\langle v \rangle \cap S_k = \{0\}$, it follows that $p^{e-d}\gamma v = 0$. Hence $\gamma = p^d \delta$ for some $\delta \in R$ and
$$p^d u = \gamma v + t = p^d \delta v + t$$
Since $\beta \in \mathcal{J}$, we have $\beta = p^d \epsilon$ and so
$$\beta u = \epsilon p^d u = \epsilon p^d \delta v + \epsilon t$$
Thus,
$$x = \alpha v = s + \beta(u - av) = s + \epsilon p^d \delta v + \epsilon t - \beta a v$$
Now it appears that $a = \delta$ would be a good choice, since then $\beta a v = \epsilon p^d \delta v$ and so
$$x = \alpha v = s + \epsilon t \in S_k$$
and since $\langle v \rangle \cap S_k = \{0\}$ we get $x = 0$. This completes the proof of (6.1).

For uniqueness, note first that $M$ has orders $p^{e_1}$ and $p^{f_1}$ and so $p^{e_1}$ and $p^{f_1}$ are associates and $e_1 = f_1$. Next we show that $m = n$. According to part 2) of Lemma 6.10,
$$M^{(p)} = \langle v_1 \rangle^{(p)} \oplus \cdots \oplus \langle v_n \rangle^{(p)}$$
and
$$M^{(p)} = \langle u_1 \rangle^{(p)} \oplus \cdots \oplus \langle u_m \rangle^{(p)}$$
where all summands are nonzero. Since $pM^{(p)} = \{0\}$, it follows from part 1) of Lemma 6.10 that $M^{(p)}$ is a vector space over $R/\langle p \rangle$, and so each of the preceding decompositions expresses $M^{(p)}$ as a direct sum of one-dimensional vector subspaces. Hence, $n = \dim(M^{(p)}) = m$.

Finally, we show that the exponents $e_i$ and $f_i$ are equal using induction on $e_1$. If $e_1 = 1$ then $e_i = 1$ for all $i$ and since $e_1 = f_1$, we also have $f_i = 1$ for all $i$. Suppose the result is true whenever $e_1 \leq e - 1$ and let $e_1 = e$. Suppose that
$$(e_1, \dots, e_n) = (e_1, \dots, e_k, 1, \dots, 1), \quad e_k > 1$$
and
$$(f_1, \dots, f_n) = (f_1, \dots, f_t, 1, \dots, 1), \quad f_t > 1$$
Then
$$pM = \langle p v_1 \rangle \oplus \cdots \oplus \langle p v_k \rangle$$
and
$$pM = \langle p u_1 \rangle \oplus \cdots \oplus \langle p u_t \rangle$$
But $\langle p v_i \rangle$ is a cyclic submodule of $pM$ with annihilator $\langle p^{e_i - 1} \rangle$, and so by the induction hypothesis,
$$k = t \quad\text{and}\quad e_i - 1 = f_i - 1, \quad i = 1, \dots, k$$
which concludes the proof of uniqueness.
The Primary Cyclic Decomposition Theorem
Now we can combine the various decompositions.

Theorem 6.12 (The primary cyclic decomposition theorem) Let $M$ be a nonzero finitely generated module over a principal ideal domain $R$.
1) Then
$$M = M_{\mathrm{free}} \oplus M_{\mathrm{tor}}$$
where $M_{\mathrm{free}}$ is free and $M_{\mathrm{tor}}$ is a torsion module. If $M_{\mathrm{tor}}$ has order
$$\mu = p_1^{e_1} \cdots p_n^{e_n}$$
where the $p_i$'s are distinct nonassociate primes in $R$, then $M_{\mathrm{tor}}$ can be uniquely decomposed (up to the order of the summands) into the direct sum
$$M_{\mathrm{tor}} = M_{p_1} \oplus \cdots \oplus M_{p_n}$$
where
$$M_{p_i} = \{ v \in M_{\mathrm{tor}} \mid p_i^{e_i} v = 0 \}$$
is a primary submodule with annihilator $\langle p_i^{e_i} \rangle$. Finally, each primary submodule $M_{p_i}$ can be written as a direct sum of cyclic submodules, so that
$$M = M_{\mathrm{free}} \oplus \underbrace{[\langle v_{1,1} \rangle \oplus \cdots \oplus \langle v_{1,k_1} \rangle]}_{M_{p_1}} \oplus \cdots \oplus \underbrace{[\langle v_{n,1} \rangle \oplus \cdots \oplus \langle v_{n,k_n} \rangle]}_{M_{p_n}}$$
where $\operatorname{ann}(\langle v_{i,j} \rangle) = \langle p_i^{e_{i,j}} \rangle$ and the terms in each cyclic decomposition can be arranged so that, for each $i$,
$$\operatorname{ann}(\langle v_{i,1} \rangle) \subseteq \cdots \subseteq \operatorname{ann}(\langle v_{i,k_i} \rangle)$$
or, equivalently,
$$e_i = e_{i,1} \geq e_{i,2} \geq \cdots \geq e_{i,k_i}$$
2) As for uniqueness, suppose that
$$M = N_{\mathrm{free}} \oplus \langle x_1 \rangle \oplus \cdots \oplus \langle x_m \rangle$$
is a decomposition of $M$ into the direct sum of a free module $N_{\mathrm{free}}$ and primary cyclic submodules $\langle x_i \rangle$. Then
a) $\operatorname{rk}(N_{\mathrm{free}}) = \operatorname{rk}(M_{\mathrm{free}})$
b) The number of summands is the same in both decompositions, that is,
$$m = k_1 + \cdots + k_n$$
c) The summands in this decomposition can be reordered to get
$$M = N_{\mathrm{free}} \oplus [\langle u_{1,1} \rangle \oplus \cdots \oplus \langle u_{1,k_1} \rangle] \oplus \cdots \oplus [\langle u_{n,1} \rangle \oplus \cdots \oplus \langle u_{n,k_n} \rangle]$$
where the primary submodules are the same,
$$\langle u_{i,1} \rangle \oplus \cdots \oplus \langle u_{i,k_i} \rangle = \langle v_{i,1} \rangle \oplus \cdots \oplus \langle v_{i,k_i} \rangle$$
for $i = 1, \dots, n$, and the annihilator chains are the same, that is,
$$\operatorname{ann}(\langle u_{i,j} \rangle) = \operatorname{ann}(\langle v_{i,j} \rangle)$$
for all $i, j$. In summary, the free rank, primary submodules and annihilator chains are uniquely determined by the module $M$.
Proof. We need only prove the uniqueness. We have seen that $N_{\mathrm{free}}$ and $M_{\mathrm{free}}$ are isomorphic and thus have the same rank. Let us look at the torsion part. Since the order of each primary cyclic submodule $\langle x_i \rangle$ must divide the order $\mu$ of $M_{\mathrm{tor}}$, this order is a power of one of the primes $p_1, \dots, p_n$. Let us group the summands by like primes to get
$$M = N_{\mathrm{free}} \oplus \underbrace{[\langle u_{1,1} \rangle \oplus \cdots \oplus \langle u_{1,k_1} \rangle]}_{N_1 \text{ (primary of order } p_1^{e_1})} \oplus \cdots \oplus \underbrace{[\langle u_{n,1} \rangle \oplus \cdots \oplus \langle u_{n,k_n} \rangle]}_{N_n \text{ (primary of order } p_n^{e_n})}$$
Then each group $N_i$ is a primary submodule of $M$ with order $p_i^{e_i}$. The uniqueness of primary decompositions of $M_{\mathrm{tor}}$ implies that $N_i = M_{p_i}$. Then the uniqueness of cyclic decompositions implies that the annihilator chains for the decompositions of $N_i$ and $M_{p_i}$ are the same.
We have seen that in a primary cyclic decomposition
$$M = M_{\mathrm{free}} \oplus [\langle v_{1,1} \rangle \oplus \cdots \oplus \langle v_{1,k_1} \rangle] \oplus \cdots \oplus [\langle v_{n,1} \rangle \oplus \cdots \oplus \langle v_{n,k_n} \rangle]$$
the chain of annihilators
$$\operatorname{ann}(\langle v_{i,j} \rangle) = \langle p_i^{e_{i,j}} \rangle$$
is unique except for order. The sequence $p_i^{e_{i,j}}$ of generators is uniquely determined up to order and multiplication by units. This sequence is called the sequence of elementary divisors of $M$. Note that the elementary divisors are not quite as unique as the annihilators: the multiset of annihilators is unique but the multiset of generators is not, since if $p_i^{e_{i,j}}$ is a generator then so is $u p_i^{e_{i,j}}$ for any unit $u$ in $R$.
The Invariant Factor Decomposition According to Theorem 6.7, if : and ; are cyclic submodules with relatively prime orders, then : l ; is a cyclic submodule whose order is the product of the orders of : and ; . Accordingly, in the primary cyclic decomposition of 4 4 ~ 4free l ´ º#Á » l Ä l º#Á » µ l Ä l ´ º#Á » l Ä l º#Á » µ 4
4
136
Advanced Linear Algebra
with elementary divisors Á satisfying ~ Á Á Ä Á
(6.4)
we can feel free to regroup and combine cyclic summands with relatively prime orders. One judicious way to do this is to take the leftmost (highest order) cyclic submodules from each group to get + ~ º#Á » l Ä l º#Á » and repeat the process + ~ º#Á » l Ä l º#Á » + ~ º#Á » l Ä l º#Á » Å Of course, some summands may be missing here since the primary modules 4 do not necessarily have the same number of summands. In any case, the result of this regrouping and combining is a decomposition of the form 4 ~ 4free l + l Ä l + which is called an invariant factor decomposition of 4 . For example, suppose that 4 ~ 4free l ´º#Á » l º#Á »µ l ´º#Á »µ l ´º#Á » l º#Á » l º#Á »µ Then the resulting regrouping and combining gives 4 ~ 4free l ´ º#Á » l º#Á » l º#Á » µ l ´ º#Á » l º#Á » µ l ´º# Á »µ +
+
+
As to the orders of the summands, referring to (6.4), if + has order then since the highest powers of each prime are taken for , the second–highest for and so on, we conclude that c Ä or equivalently, ann²+ ³ ann²+ ³ Ä The numbers are called invariant factors of the decomposition. For instance, in the example above suppose that the elementary divisors are Á Á Á Á Á Then the invariant factors are
(6.5)
Modules Over a Principal Ideal Domain
137
~ ~ ~
The process described above that passes from a sequence Á of elementary divisors in order (6.4) to a sequence of invariant factors in order (6.5) is reversible. The inverse process takes a sequence Á Ã Á satisfying (6.5), factors each into a product of distinct nonassociate prime powers with the primes in the same order and then “peels off” like prime powers from the left. (The reader may wish to try it on the example above.) This fact, together with Theorem 6.7, implies that primary cyclic decompositions and invariant factor decompositions are essentially equivalent. In particular, given a primary cyclic decomposition of 4 we can produce an invariant factor decomposition of 4 whose invariant factors are products of the elementary divisors and for which each elementary divisor appears in exactly one invariant factor. Conversely, given an invariant factor decomposition of 4 we can obtain a primary cyclic decomposition of 4 whose elementary divisors are precisely the multiset of prime power factors of the invariant factors. It follows that since the elementary divisors of 4 are unique up to multiplication by units, the invariant factors of 4 are also unique up to multiplication by units. Theorem 6.13 (The invariant factor decomposition theorem) Let 4 be a finitely generated module over a principal ideal domain 9 . Then 4 ~ 4free l + l Ä l + where 4free is a free submodule and D is a cyclic submodule of 4 , with order , where c Ä This decomposition is called an invariant factor decomposition of 4 and the scalars , are called the invariant factors of 4 . The invariant factors are uniquely determined, up to multiplication by a unit, by the module 4 . Also, the rank of 4free is uniquely determined by 4 .
The annihilators of an invariant factor decomposition are called the invariant ideals of 4 . The chain of invariant ideals is unique, as is the chain of annihilators in the primary cyclic decomposition. Note that is an order of 4 , that is ann²4 ³ ~ º » Note also that the product
138
Advanced Linear Algebra
~ Ä of the invariant factors of 4 has some nice properties. For example, is the product of all the elementary divisors of 4 . We will see in a later chapter that in the context of a linear operator on a vector space, is the characteristic polynomial of .
Exercises 1. 2.
3.
4.
5.
6.
Show that any free module over an integral domain is torsion-free. Let 9 be a principal ideal domain and 9 b the field of quotients. Then 9 b is an 9 -module. Prove that any nonzero finitely generated submodule of 9 b is a free module of rank . Let 9 be a principal ideal domain. Let 4 be a finitely generated torson-free 9 -module. Suppose that 5 is a submodule of 4 for which 5 is a free 9 module of rank and 4 °5 is a torsion module. Prove that 4 is a free 9 module of rank . Hint: Use the results of the previous exercise. Show that the primary cyclic decomposition of a torsion module over a principal ideal domain is not unique (even though the elementary divisors are). Show that if 4 is a finitely generated 9 -module where 9 is a principal ideal domain, then the free summand in the decomposition 4 ~ - l 4tor need not be unique. If º#» is a cyclic 9 -module or order show that the map ¢ 9 ¦ º#» defined by ²³ ~ # is a surjective 9 -homomorphism with kernel º» and so º#»
9 º»
If 9 is a ring with the property that all submodules of cyclic 9 -modules are cyclic, show that 9 is a principal ideal domain. 8. Suppose that - is a finite field and let - i be the set of all nonzero elements of - . a) Show that - i is an abelian group under multiplication. b) Show that ²%³ - ´%µ is a nonconstant polynomial over - and if - is a root of ²%³ then % c is a factor of 7 ²%³. c) Prove that a nonconstant polynomial ²%³ - ´%µ of degree can have at most distinct roots in - . d) Use the invariant factor or primary cyclic decomposition of a finite {module to prove that - i is cyclic. 9. Let 9 be a principal ideal domain. Let 4 ~ º#» be a cyclic 9 -module with order . We have seen that any submodule of 4 is cyclic. Prove that for each 9 such that there is a unique submodule of 4 of order . 10. Suppose that 4 is a free module of finite rank over a principal ideal domain 9 . Let 5 be a submodule of 4 . If 4 °5 is torsion, prove that rk²5 ³ ~ rk²4 ³. 7.
Modules Over a Principal Ideal Domain
139
11. Let - ´%µ be the ring of polynomials over a field - and let - Z ´%µ be the ring of all polynomials in - ´%µ that have coefficient of % equal to . Then - ´%µ is an - Z ´%µ-module. Show that - ´%µ is finitely generated and torsion-free but not free. Is - Z ´%µ a principal ideal domain? 12. Show that the rational numbers r form a torsion-free {-module that is not free.
More on Complemented Submodules 13. Let 9 be a principal ideal domain and let 4 be a free 9 -module. a) Prove that a submodule 5 of 4 is complemented if and only if 4 °5 is free. b) If 4 is also finitely generated, prove that 5 is complemented if and only if 4 °5 is torsion-free. 14. Let 4 be a free module of finite rank over a principal ideal domain 9 . a) Prove that if 5 is a complemented submodule of 4 then rk²5 ³ ~ rk²4 ³ if and only if 5 ~ 4 . b) Show that this need not hold if 5 is not complemented. c) Prove that 5 is complemented if and only if any basis for 5 can be extended to a basis for 4 . 15. Let 4 and 5 be free modules of finite rank over a principal ideal domain 9 . Let ¢ 4 ¦ 5 be an 9 -homomorphism. a) Prove that ker² ³ is complemented. b) What about im² ³? c) Prove that rk²4 ³ ~ rk²ker² ³³ b rk²im² ³³ ~ rk²ker² ³³ b rk6
4 7 ker² ³
d) If is surjective then is an isomorphism if and only if rk²4 ³ ~ rk²5 ³. e) If 4 °3 is free then rk6
4 7 ~ rk²4 ³ c rk²3³ 3
16. A submodule 5 of a module 4 is said to be pure in 4 if whenever # ¤ 4 ± 5 then # ¤ 5 for all nonzero 9 . a) Show that 5 is pure if and only if # 5 and # ~ $ for 9 implies $ 5. b) Show that 5 is pure if and only if 4 °5 is torsion-free. c) If 9 is a principal ideal domain and 4 is finitely generated, prove that 5 is pure if and only if 4 °5 is free. d) If 3 and 5 are pure submodules of 4 then so are 3 q 5 and 3 r 5 . What about 3 b 5 ? e) If 5 is pure in 4 then show that 3 q 5 is pure in 3 for any submodule 3 of 4 .
140
Advanced Linear Algebra
17. Let 4 be a free module of finite rank over a principal ideal domain 9 . Let 3 and 5 be submodules of 4 with 3 complemented in 4 . Prove that rk²3 b 5 ³ b rk²3 q 5 ³ ~ rk²3³ b rk²5 ³
Chapter 7
The Structure of a Linear Operator
In this chapter, we study the structure of a linear operator on a finitedimensional vector space, using the powerful module decomposition theorems of the previous chapter. Unless otherwise noted, all vector spaces will be assumed to be finite-dimensional.
A Brief Review We have seen that any linear operator on a finite-dimensional vector space can be represented by matrix multiplication. Let us restate Theorem 2.14 for linear operators. Theorem 7.1 Let B²= ³ and let 8 ~ ² Á Ã Á ³ be an ordered basis for = . Then can be represented by matrix multiplication ´ ²#³µ8 ~ ´ µ8 ´#µ8 where ´ µ8 ~ ²´ ² ³µ8 Ä ´ ² ³µ8 ³
Since the matrix ´ µ8 depends on the ordered basis 8 , it is natural to wonder how to choose this basis in order to make the matrix ´ µ8 as simple as possible. That is the subject of this chapter. Let us also restate the relationship between the matrices of with respect to different ordered bases. Theorem 7.2 Let B²= ³ and let 8 and 8 Z be ordered bases for = . Then the matrix of with respect to 8 Z can be expressed in terms of the matrix of with respect to 8 as follows ´ µ8Z ~ 48Á8Z ´ µ8 ²48Á8Z ³c
142
Advanced Linear Algebra
where 48,8Z ~ ²´ µ8Z Á Ã Á ´ µ8Z ³³
Finally, we recall the definition of similarity and its relevance to the current discussion. Definition Two matrices ( and ) are similar if there exists an invertible matrix 7 for which ) ~ 7 (7 c The equivalence classes associated with similarity are called similarity classes.
Theorem 7.3 Let = be a vector space of dimension . Then two d matrices ( and ) are similar if and only if they represent the same linear operator B²= ³, but possibly with respect to different ordered bases. In this case, ( and ) represent exactly the same set of linear operators in B²= ³.
According to Theorem 7.3, the matrices that represent a given linear operator B²= ³ are precisely the matrices that lie in one particular similarity class. Hence, in order to uniquely represent all linear operators on = we would like to find a simple representative of each similarity class, that is, a set of simple canonical forms for similarity. The simplest type of useful matrices is the diagonal matrices. However, not all linear operators can be represented by diagonal matrices, that is, the set of diagonal matrices does not form a set of canonical forms for similarity. This gives rise to two different directions for further study. First, we can search for a characterization of those linear operators that can be represented by diagonal matrices. Such operators are called diagonalizable. Second, we can search for a different type of “simple” matrix that does provide a set of canonical forms for similarity. We will pursue both of these directions at the same time.
The Module Associated with a Linear Operator Throughout this chapter, we fix a nonzero linear operator B²= ³ and think of = not only as a vector space over a field - but also as a module over - ´%µ, with scalar multiplication defined by ²%³# ~ ² ³²#³ We call = the - ´%µ-module defined by and write = to indicate the dependence on (when necessary). Thus, = and = are modules over the same ring - ´%µ, although the scalar multiplication is different if £ .
The Structure of a Linear Operator
143
Our plan is to interpret the concepts of the previous chapter for the module/vector space = . First, since = is a finite-dimensional vector space, so is B²= ³. It follows that = is a torsion module. To see this, note that since dim²B²= ³³ ~ , the b vectors
Á Á Á Ã Á
are linearly dependent in B²= ³, which implies that ² ³ ~ for some polynomial ²%³ - ´%µ. Hence, ²%³= ~ ¸¹, which shows that ann²= ³ £ ¸¹. Also, since = is finitely generated as a vector space, it is, a fortiori, finitely generated as an - ´%µ-module defined by . Thus, = is a finitely generated torsion module over a principal ideal domain - ´%µ and so we may apply the decomposition theorems of the previous chapter. Next we take a look at the connection between module isomorphisms and vector space isomorphisms. This also describes the connection with similarity. Theorem 7.4 Let and be linear operators on = . Then = and = are isomorphic as - ´%µ-modules if and only if and are similar as linear operators. In particular, a function ¢ = ¦ = is a module isomorphism if and only if it is a vector space automorphism of = satisfying ~ c Proof. Suppose that ¢ = ¦ = is a module isomorphism. Then for # = ²%#³ ~ %²#³ which is equivalent to ² ²#³³ ~ ²²#³³ and since is bijective this is equivalent to ²c ³# ~ ²#³ that is, ~ c . Since a module isomorphism from = to = is a vector space isomorphism as well, the result follows. For the converse, suppose that ~ c for a vector space automorphism on = . This condition is equivalent to ~ and so ²% #³ ~ ² ²#³³ ~ ²²#³³ ~ % ²#³ and by the - -linearity of , for any polynomial ²%³ - ´%µ we have
144
Advanced Linear Algebra
²² ³#³ ~ ²³²#³ which shows that is a module isomorphism from = to = .
Submodules and Invariant Subspaces There is a simple connection between the submodules of the - ´%µ-module = and the subspaces of the vector space = . Recall that a subspace : of = is invariant if ²:³ : . Theorem 7.5 A subset : of = is a submodule of the - ´%µ-module = if and only if it is a -invariant subspace of the vector space = .
Orders and the Minimal Polynomial We have seen that since = is finite-dimensional, the annihilator ann²= ³ ~ ¸²%³ - ´%µ ²%³= ~ ¸¹¹ of = is a nonzero ideal of - ´%µ and since - ´%µ is a principal ideal domain, this ideal is principal, say ann²= ³ ~ º²%³» Since all orders of = are associates and since the units of - ´%µ are precisely the nonzero elements of - , there is a unique monic order of = . Definition Let = be an - ´%µ-module defined by . The unique monic order of = , that is, the unique monic polynomial that generates ann²= ³ is called the minimal polynomial for and is denoted by ²%³ or min² ³. Thus, ann²= ³ ~ º ²%³» and ²%³= ~ ¸¹ ¯ ² ³ ~ ¯ ²%³ ²%³
In treatments of linear algebra that do not emphasize the role of the module = the minimal polynomial of a linear operator is simply defined as the unique monic polynomial ²%³ of smallest degree for which ² ³ ~ . It is not hard to see that this definition is equivalent to the previous definition. The concept of minimal polynomial is also defined for matrices. If ( is a square matrix over - the minimal polynomial A ²%³ of ( is defined as the unique monic polynomial ²%³ - ´%µ of smallest degree for which ²(³ ~ . We leave it to the reader to verify that this concept is well-defined and that the following holds.
The Structure of a Linear Operator
145
Theorem 7.6 1) If ( and ) are similar matrices then ( ²%³ ~ ) ²%³. Thus, the minimal polynomial is an invariant under similarity. 2) The minimal polynomial of B²= ³ is the same as the minimal polynomial of any matrix that represents .
Cyclic Submodules and Cyclic Subspaces For an - ´%µ-module = , consider the cyclic submodule º#» ~ ¸²%³# ²%³ - ´%µ¹ ~ ¸² ³²#³ ²%³ - ´%µ¹ We would like to characterize these simple but important submodules in terms of vector space notions. As we have seen, º#» is a -invariant subspace of = , but more can be said. Let ²%³ be the minimal polynomial of Oº#» and suppose that deg²²%³³ ~ . Any element of º#» has the form ²%³#. Dividing ²%³ by ²%³ gives ²%³ ~ ²%³²%³ b ²%³ where deg ²%³ deg ²%³. Since ²%³# ~ , we have ²%³# ~ ²%³²%³# b ²%³# ~ ²%³# Thus, º#» ~ ¸²%³# deg ²%³ ¹ Put another way, the ordered set 8 ~ ²#Á %#Á à Á %c #³ ~ ²#Á ²#³Á à Á c ²#³³ spans º#» as a vector space over - . But it is also the case that 8 is linearly independent over - , for if # b %# b Ä b c %c # ~ then ²%³# ~ where ²%³ ~ b % b Ä b c %c has degree less than . Hence, ²%³ ~ , that is, ~ for all ~ Á à Á c . Thus, 8 is an ordered basis for º#». To determine the matrix of Oº#» with respect to 8 , write $ ~ ²#³. Then ²$ ³ ~ ² ²#³³ ~ b ²#³ ~ $b for ~ Á à Á c and so simply “shifts” each basis vector in 8 , except the last one, to the next basis vector in 8 . For the last vector $c , if
Advanced Linear Algebra
146
²%³ ~ b % b Ä b c %c b % then since ² ³ ~ we have ~ ² ³ ~ b b Ä b c c b and so ²$c ³ ~ ² c ²#³³ ~ ²#³ ~ c² b b Ä b c c ³²#³ ~ c # c ²#³ c Ä c c c ²#³ ~ c $ c $ c Ä c c $c Hence, the matrix of Oº#» with respect to 8 is v x x *´²%³µ ~ x x Å w
Ä Ä Æ Å Æ Ä
c y c { { Å { { cc2 cc z
This is known as the companion matrix for the polynomial ²%³. Note that companion matrices are defined only for monic polynomials. Definition Let B²= ³. A subspace : of = is -cyclic if there exists a vector # : for which the set ¸#Á ²#³Á à Á c ²#³¹ is a basis for : .
Theorem 7.7 Let = be an - ´%µ-module defined by B²= ³. 1) (Characterization of cyclic submodules) A subset : = is a cyclic submodule of = if and only if it is a -cyclic subspace of the vector space =. 2) Suppose that º#» is a cyclic submodule of = . If the monic order of º#» is ²%³ ~ b % b Ä b c %c b % then 8 ~ ²#Á %#Á à Á %c #³ ~ ²#Á ²#³Á à Á c ²#³³ is an ordered basis for º#» and the matrix ´ Oº#» µ8 is the companion matrix *´²%³µ of ²%³. Hence, dim²º#»³ ~ deg²²%³³
The Structure of a Linear Operator
147
Summary The following table summarizes the connection between the module concepts and the vector space concepts that we have discussed. - ´%µ-Module = Scalar multiplication: ²%³# Submodule of = Annihilator: ann²= ³ ~ ¸²%³ ²%³= ~ ¸¹¹ Monic order ²%³ of = : ann²= ³ ~ º²%³» Cyclic submodule of = : º#» ~ ¸²%³# deg ²%³ deg ²%³¹
- -Vector Space = Action of ² ³: ² ³²#³ -Invariant subspace of = Annihilator: ann²= ³ ~ ¸²%³ ² ³²= ³ ~ ¸¹¹ Minimal polynomial of : ²%³ has smallest deg with ² ³ ~ -cyclic subspace of = : º#» ~ span¸#Á ²#³Á à Á c ²#³¹
The Decomposition of = We are now ready to translate the cyclic decomposition theorem into the language of = . First, we define the elementary divisors and invariant factors of an operator to be the elementary divisors and invariant factors, respectively, of the module = . Also, the elementary divisors and invariant factors of a matrix ( are defined to be the elementary divisors and invariant factors, respectively, of the operator ( . We will soon see that the multiset of elementary divisors and the multiset of invariant factors are complete invariants under similarity and so the multiset of elementary divisors (or invariant factors) of an operator is the same as the multiset of elementary divisors (or invariant factors) of any matrix that represents . Theorem 7.8 (The cyclic decomposition theorem for = ³ Let be a linear operator on a finite-dimensional vector space = . Let ²%³ ~ ²%³Ä ²%³ be the minimal polynomial of , where the monic polynomials ²%³ are distinct and irreducible. 1) (Primary decomposition) The - ´%µ-module = is the direct sum = ~ = l Ä l = where = ~ ¸# = ² ³²#³ ~ ¹ is a primary submodule of = of order ²%³. In vector space terms, = is a
148
Advanced Linear Algebra
-invariant subspace of = and the minimal polynomial of O= is min² O= ³ ~ ²%³ 2) (Cyclic decomposition) Each primary summand = can be decomposed into a direct sum = ~ º#Á » l Ä l º#Á »
of cyclic submodules º#Á » of order Á ²%³ with ~ Á Á Ä Á In vector space terms, º#Á » is a -cyclic subspace of = and the minimal polynomial of Oº#Á » is
min² Oº#Á » ³ ~ Á ²%³ 3) (The complete decomposition) This yields the decomposition of = into a direct sum of -cyclic subspaces = ~ ²º#Á » l Ä l º#Á »³ l Ä l ²º#Á » l Ä l º#Á »³ 4) (Elementary divisors and dimensions) The multiset of elementary divisors ¸ Á ²%³¹ of is uniquely determined by . If deg² Á ²%³³ ~ Á then the -cyclic subspace º#Á » has basis 8Á ~ ²#Á Á ²#Á ³Á à Á Á c ²#Á ³³
and so dim²º#Á »³ ~ deg² Á ³. Hence,
dim²= ³ ~ deg² Á ³
~
The Rational Canonical Form The cyclic decomposition theorem can be used to determine a set of canonical forms for similarity. Recall that if = ~ : l ; and if both : and ; are invariant under , the pair ²:Á ; ³ is said to reduce . Put another way, ²:Á ; ³ reduces if the restrictions O: and O; are linear operators on : and ; , respectively. Recall also that we write ~ l if there exist subspaces : and ; of = for which ²:Á ; ³ reduces and ~ O: and ~ O; If ~ l then any matrix representations of and can be used to construct a matrix representation of .
The Structure of a Linear Operator
149
Theorem 7.9 Suppose that ~ l B²= ³ has a reducing pair ²:Á ; ³. Let 8 ~ ² Á Ã Á Á Á Ã Á ! ³ be an ordered basis for = , where 9 ~ ² Á Ã Á ³ is an ordered basis for : and : ~ ² Á Ã Á ! ³ is an ordered basis for ; . Then the matrix ´ µ8 has the block diagonal form ´ µ8 ~ >
´ µ9
´ µ: ?block
Of course, this theorem may be extended to apply to multiple direct summands and this is especially relevant to our situation, since according to Theorem 7.8 ~ Oº#Á » l Ä l Oº#Á » In particular, if 8Á is an ordered basis for the cyclic submodule º#Á » and if 8 ~ ²8Á Á Ã Á 8Á ³ denotes the ordered basis for = obtained from these ordered bases (as we did in Theorem 7.9) then ´ µ8 ~
v ´Á µ8Á w
Æ
y ´Á µ8Á zblock
where Á ~ Oº#Á » . According to Theorem 7.8, the cyclic submodule º#Á » has ordered basis 8Á ~ ²#Á Á ²#Á ³Á à Á Á c ²#Á ³³
where Á ~ deg² Á ²%³³. Hence, we arrive at the matrix representation of described in the following theorem. Theorem 7.10 (The rational canonical form) Let dim²= ³ B and suppose that B²= ³ has minimal polynomial ²%³ ~ ²%³Ä ²%³ where the monic polynomials ²%³ are distinct and irreducible. Then = has an ordered basis 8 under which
150
Advanced Linear Algebra
Á v *´ ²%³µ x Æ x Á x *´ ²%³µ x ´ µ8 ~ x Æ x Á x *´ ²%³µ x x Æ
w
y { { { { { { { { { *´Á ²%³µ zblock
where the polynomials Á ²%³ are the elementary divisors of . We also write this in the form Á
Á
diag4*´Á ²%³µÁ à Á *´ ²%³µÁ à Á *´
²%³µÁ à Á *´Á ²%³µ5
This block diagonal matrix is said to be in rational canonical form and is called a rational canonical form of . Except for the order of the blocks in the matrix, the rational canonical form is a canonical form for similarity, that is, up to order of the blocks, each similarity class contains exactly one matrix in rational canonical form. Put another way, the multiset of elementary divisors is a complete invariant for similarity. Proof. It remains to prove that if two matrices in rational canonical form are similar, then they must be equal, up to order of the blocks. Let ( be the matrix ´ µ8 above. The ordered basis 8 clearly gives a decomposition of = into a direct sum of primary cyclic submodules for which the elementary divisors are the polynomials Á ²%³. Now suppose that ) is another matrix in rational canonical form
Á
Á
) ~ diag4*´Á ²%³µÁ à Á *´ ²%³µÁ à Á *´
²%³µÁ à Á *´Á ²%³µ5
If ) is similar to ( then we get another primary cyclic decomposition of for which the elementary divisors are the polynomials Á ²%³. It follows that the two sets of elementary divisors are the same and so ( and ) are the same up to the order of their blocks.
Corollary 7.11 1) Any square matrix ( is similar to a unique (except for the order of the blocks on the diagonal) matrix that is in rational canonical form. Any such matrix is called a rational canonical form of (. 2) Two square matrices over the same field - are similar if and only if they have the same multiset of elementary divisors.
Here are some examples of rational canonical forms.
The Structure of a Linear Operator
151
Example 7.1 Let be a linear operator on the vector space s7 and suppose that has minimal polynomial ²%³ ~ ²% c ³²% b ³ Noting that % c and ²% b ³ are elementary divisors and that the sum of the degrees of all elementary divisors must equal , we have two possibilities 1) % c Á ²% b 1³ Á % b 2) % c Á % c Á % c Á ²% b 1³ These correspond to the following rational canonical forms v c x x x x 1) x x x x w
v c x x x x 2) x x x x w
c
c
c c
y { { { { { { { { c z
y { { { { c { { { { c z
Exercises 1.
2. 3.
4.
We have seen that any B²= ³ can be used to make = into an - ´%µmodule. Does every module = over - ´%µ come from some B²= ³? Explain. Show that if ( and ) are block diagonal matrices with the same blocks, but in possibly different order, then ( and ) are similar. Let ( be a square matrix over a field - . Let 2 be the smallest subfield of - containing the entries of (. Prove that any rational canonical form for ( has coefficients in the field 2 . This means that the coefficients of any rational canonical form for ( are “rational” expressions in the coefficients of (, hence the origin of the term “rational canonical form.” Given an operator B²= ³ what is the smallest field 2 for which any rational canonical form must have entries in 2 ? Let 2 be a subfield of - . Prove that two matrices (Á ) C²2³ are similar over 2 if and only if they are similar over - , that is ( ~ 7 )7 c
152
5. 6.
Advanced Linear Algebra for some 7 C²2³ if and only if ( ~ 8)8c for some 8 C²- ³. Hint: Use the results of the previous exercise. Prove that the minimal polynomial of B²= ³ is the least common multiple of its elementary divisors. Let r be the field of rational numbers. Consider the linear operator B²r ³ defined by ² ³ ~ Á ² ³ ~ c . a) Find the minimal polynomial for and show that the rational canonical form for is 9~>
c ?
What are the elementary divisors of ? b) Now consider the map B²d ³ defined by the same rules as , namely, ² ³ ~ Á ² ³ ~ c . Find the minimal polynomial for and the rational canonical form for . What are the elementary divisors of ? c) The invariant factors of are defined using the elementary divisors of in the same way as we did at the end of Chapter 6, for a module 4 . Describe the invariant factors for the operators in parts a) and b). 7. Find all rational canonical forms ²up to the order of the blocks on the diagonal) for a linear operator on s6 having minimal polynomial ²% c 1³ ²% b 1³ . 8. How many possible rational canonical forms (up to order of blocks) are there for linear operators on s6 with minimal polynomial ²% c 1³²% b 1³ ? 9. Prove that if * is the companion matrix of ²%³ then ²*³ ~ and * has minimal polynomial ²%³. 10. Let be a linear operator on - with minimal polynomial ²%³ ~ ²% b 1³²% c 2³. Find the rational canonical form for if - ~ r, - ~ s or - ~ d. 11. Suppose that the minimal polynomial of B²= ³ is irreducible. What can you say about the dimension of = ?
Chapter 8
Eigenvalues and Eigenvectors
Unless otherwise noted, we will assume throughout this chapter that all vector spaces are finite-dimensional.
The Characteristic Polynomial of an Operator It is clear from our discussion of the rational canonical form that elementary divisors and their companion matrices are important. Let *´ ²%³µ be the companion matrix of a monic polynomial ²%Â Á Ã Á c ³ ~ b % b Ä b c %c b % By way of motivation, note that when ~ , we can write the polynomial ²%³ as follows ²%Â Á ³ ~ b % b % ~ %²% b ³ b which looks suspiciously like a determinant, namely, % b ? c ~ det6%0 c > c ?7 ~ det²%0 c *´ ²%³µ³
²%Â Á ³ ~ det>
% c
So, let us define (²%Â Á Ã Á c ³ ~ %0 c *´ ²%³µ Ä v % x c % Ä x ~ x c Æ x Å Å Æ % w Ä c
y { { Å { { c % b c z
154
Advanced Linear Algebra
where % is an independent variable. The determinant of this matrix is a polynomial in % whose degree equals the number of parameters Á Ã Á c . We have just seen that det²(²%Â Á ³³ ~ ²%Â Á ³ and this is also true for ~ . As a basis for induction, suppose that det²(²%Â Á Ã Á c ³³ ~ ²%Â Á Ã Á c ³ Then, expanding along the first row gives det²(²%Á Á Ã Á ³³ v c x ~ % det²(²%Á Á Ã Á ³³ b ²c³ detx Å w ~ % det²(²%Á Á Ã Á ³³ b ~ % ²%Â Á Ã Á ³ b ~ % b % b Ä b % b %b b ~ b ²%Â Á Ã Á ³
% c Å
Ä y Æ { { Æ % Ä c zd
We have proved the following. Lemma 8.1 If *´²%³µ is the companion matrix of the polynomial ²%³, then det²%0 c *´²%³µ³ ~ ²%³
Since the determinant of a block diagonal matrix is the product of the determinants of the blocks on the diagonal, if 9 is a matrix in rational canonical form then
det²%0 c 9³ ~ Á ²%³ Á
is the product of the elementary divisors of . Moreover, if 4 is similar to 9 , say 4 ~ 7 97 c then det²%0 c 4 ³ ~ det²%0 c 7 97 c ³ ~ det ´7 ²%0 c 9³7 c µ ~ det²7 ³det²%0 c 9³det²7 c ³ ~ det²%0 c 9³ and so *4 ²%³ is the product of the elementary divisors of 4 . The polynomial *4 ²%³ ~ det²%0 c 4 ³ is known as the characteristic polynomial of 4 . Since the characteristic polynomial is an invariant under similarity, we have the following.
Eigenvalues and Eigenvectors
155
Theorem 8.2 Let be a linear operator on a finite-dimensional vector space = . The characteristic polynomial * ²%³ of is defined to be the product of the elementary divisors of . If 4 is any matrix that represents , then * ²%³ ~ *4 ²%³ ~ det²%0 c 4 ³
Note that the characteristic polynomial is not a complete invariant under similarity. For example, the matrices (~>
and ) ~ > ?
?
have the same characteristic polynomial but are not similar. (The reader might wish to provide an example of two nonsimilar matrices with the same characteristic and minimal polynomials.) We shall have several occasions to use the fact that the minimal polynomial ²%³ ~ ²%³Ä ²%³ and characteristic polynomial
* ²%³ ~ Á ²%³ Á
of a linear operator B²= ³ have the same set of prime factors. This implies, for example, that these two polynomials have the same set of roots (not counting multiplicity).
Eigenvalues and Eigenvectors Let B²= ³ and let 4 be a matrix that represents . A scalar - is a root of the characteristic polynomial * ²%³ of if and only if det²0 c 4 ³ ~
(8.1)
that is, if and only if the matrix 0 c 4 is singular. In particular, if dim²= ³ ~ then (8.1) holds if and only if there exists a nonzero vector % - for which ²0 c 4 ³% ~ or, equivalently 4 ²%³ ~ % If 4 ~ ´ µ8 and ´#µ8 ~ %, then this is equivalent to ´ µ8 ´#µ8 ~ ´#µ8 or, in operator language
156
Advanced Linear Algebra
²#³ ~ # This prompts the following definition. Definition Let = be a vector space over - . 1) A scalar - is an eigenvalue (or characteristic value) of an operator B²= ³ if there exists a nonzero vector # = for which ²#³ ~ # In this case, # is an eigenvector (or characteristic vector) of associated with . 2) A scalar - is an eigenvalue for a matrix ( if there exists a nonzero column vector % for which (% ~ % In this case, % is an eigenvector (or characteristic vector) for ( associated with . 3) The set of all eigenvectors associated with a given eigenvalue , together with the zero vector, forms a subspace of = , called the eigenspace of , denoted by ; . This applies to both linear operators and matrices. 4) The set of all eigenvalues of an operator or matrix is called the spectrum of the operator or matrix.
The following theorem summarizes some key facts. Theorem 8.3 Let B²= ³ have minimal polynomial m ²%³ and characteristic polynomial * ²%³. 1) The polynomials ²%³ and * ²%³ have the same prime factors and hence the same set of roots, called the spectrum of . 2) (The Cayley–Hamilton theorem) The minimal polynomial divides the characteristic polynomial. Another way to say this is that an operator satisfies its own characteristic polynomial, that is, * ² ³ ~ 3) The eigenvalues of a matrix are invariants under similarity. 4) If is an eigenvalue of a matrix ( then the eigenspace ; is the solution space to the homogeneous system of equations ²0 c (³²%³ ~
One way to compute the eigenvalues of a linear operator is to first represent by a matrix ( and then solve the characteristic equation *( ²%³ ~ Unfortunately, it is quite likely that this equation cannot be solved when
Eigenvalues and Eigenvectors
157
dim²= ³ . As a result, the art of approximating the eigenvalues of a matrix is a very important area of applied linear algebra. The following theorem describes the relationship between eigenspaces and eigenvectors of distinct eigenvalues. Theorem 8.4 Suppose that Á à Á are distinct eigenvalues of a linear operator B²= ³. 1) The eigenspaces meet only in the vector, that is ; q ; ~ ¸¹ ³ Eigenvectors associated with distinct eigenvalues are linearly independent. That is, if # ; then the vectors ¸# Á à Á # ¹ are linearly independent. Proof. We leave the proof of part 1) to the reader. For part 2), assume that the eigenvectors # ; are linearly dependent. By renumbering if necessary, we may also assume that among all nontrivial linear combinations of these vectors that equal , the equation # b Ä b # ~
(8.2)
has the fewest number of terms. Applying gives # b Ä b # ~
(8.3)
Now we multiply (8.2) by and subtract from (8.3), to get ² c ³# b Ä b ² c ³# ~ But this equation has fewer terms than (8.2) and so all of the coefficients must equal . Since the 's are distinct we deduce that ~ for ~ Á Ã Á and so ~ as well. This contradiction implies that the # 's are linearly independent.
Geometric and Algebraic Multiplicities Eigenvalues have two forms of multiplicity, as described in the next definition. Definition Let be an eigenvalue of a linear operator B²= ³. 1) The algebraic multiplicity of is the multiplicity of as a root of the characteristic polynomial * ²%³. 2) The geometric multiplicity of is the dimension of the eigenspace ; .
Theorem 8.5 The geometric multiplicity of an eigenvalue of B²= ³ is less than or equal to its algebraic multiplicity. Proof. Suppose that is an eigenvalue of with eigenspace ; . Given any basis 8 ~ ¸# Á Ã Á # ¹ of ; we can extend it to a basis 8 for = . Since ; is invariant under , the matrix of with respect to 8 has the block form
158
Advanced Linear Algebra
0 ´ µ8 ~ 6
( ) 7block
where ( and ) are matrices of the appropriate sizes and so * ²%³ ~ det²%0 c ´ µ8 ³ ~ det²%0 c 0 ³det²%0c c (³ ~ ²% c ³ det²%0c c (³ (Here is the dimension of = .) Hence, the algebraic multiplicity of is at least , which is the geometric multiplicity of .
The Jordan Canonical Form One of the virtues of the rational canonical form is that every linear operator on a finite-dimensional vector space has a rational canonical form. However, the rational canonical form may be far from the ideal of simplicity that we had in mind for a set of simple canonical forms. We can do better when the minimal polynomial of splits over - , that is, factors into a product of linear factors ²%³ ~ ²% c ³ IJ% c ³
(8.4)
In some sense, the difficulty in the rational canonical form is the basis for the cyclic submodules º#Á ». Recall that since º#Á » is a -cyclic subspace of = we have chosen the ordered basis 8Á ~ 2#Á Á ²#Á ³Á à Á Á c ²#Á ³3
where Á ~ deg² Á ³. With this basis, all of the complexity comes at the end, when we attempt to express ² Á c ²#Á ³³ ~ Á ²#Á ³ as a linear combination of the basis vectors. When the minimal polynomial ²%³ has the form (8.4), the elementary divisors are
Á ²%³ ~ ²% c ³Á In this case, we can choose the ordered basis 9Á ~ 2#Á Á ² c ³²#Á ³Á à Á ² c ³Á c ²#Á ³3 for º#Á ». Denoting the th basis vector in 9Á by , we have for ~ Á à Á Á c ,
Eigenvalues and Eigenvectors
159
² ³ ~ ´² c ³ ²#Á ³µ ~ ² c b ³´² c ³ ²#Á ³µ ~ ² c ³b ²#Á ³ b ² c ³ ²#Á ³ ~ b b For ~ Á c , a similar computation, using the fact that ² c ³b ²#Á ³ ~ ² c ³Á ²#Á ³ ~ gives ²Á c ³ ~ Á c In this case, the complexity is more or less spread out evenly, and the matrix of Oº#Á » with respect to 9Á is the Á d Á matrix v x x @ ² Á Á ³ ~ x x Å w
Æ Ä
Ä Ä y Æ Å { { Æ Æ Å { { Æ Æ z
which is called a Jordan block associated with the scalar . Note that a Jordan block has 's on the main diagonal, 's on the subdiagonal and 's elsewhere. This matrix is, in general, simpler (or at least more aesthetically pleasing) than a companion matrix. Now we can state the analog of Theorem 7.10 for this choice of ordered basis. Theorem 8.6 (The Jordan canonical form) Let dim²= ³ B and suppose that the minimal polynomial of B²= ³ splits over the base field - , that is, ²%³ ~ ²% c ³ IJ% c ³ Then = has an ordered basis 9 under which v @ ² Á Á ³ Æ x x @ ² Á Á ³ x x Æ ´ µ9 ~ x x @ ² Á Á ³ x x w
Æ
y { { { { { { { { @ ² Á Á ³ zblock
where the polynomials ²% c ³Á are the elementary divisors of . This block diagonal matrix is said to be in Jordan canonical form and is called the Jordan canonical form of .
160
Advanced Linear Algebra
If the base field - is algebraically closed, then except for the order of the blocks in the matrix, Jordan canonical form is a canonical form for similarity, that is, up to order of the blocks, each similarity class contains exactly one matrix in Jordan canonical form. Proof. As to the uniqueness, suppose that @ is a matrix in Jordan canonical form that represents the operator with respect to some ordered basis 8 , and that @ has Jordan blocks @ ² Á ³Á à Á @ ² Á ³, where the 's may not be distinct. Then = is the direct sum of -invariant subspaces, that is, submodules of = , say = ~ = l Ä l = Consider a particular submodule = . it is easy to see from the matrix representation that O= satisfies the polynomial ²% c ³ on = , but no polynomial of the form ²% c ³ for , and so the order of = is ²% c ³ . In particular, each = is a primary submodule of = . We claim that = is also a cyclic submodule of = . To see this, let ²# Á à Á # ³ be the ordered basis that gives the Jordan block @ ² Á ³. Then it is easy to see by induction that # is a linear combination of # Á à Á #b , with coefficient of #b equal to or c. Hence, the set ¸# Á # Á # Á à Á c # ¹ is also a basis for = , from which it follows that = is a -cyclic subspace of = , that is, a cyclic submodule of = . Thus, the Jordan matrix @ corresponds to a primary cyclic decomposition of = with elementary divisors ²% c ³ . Since the multiset of elementary divisors is unique, so is the Jordan matrix representation of , up to order of the blocks.
Note that if has Jordan canonical form @ then the diagonal elements of @ are precisely the eigenvalues of , each appearing a number of times equal to its algebraic multiplicity.
Triangularizability and Schur's Lemma We have now discussed two different canonical forms for similarity: the rational canonical form, which applies in all cases and the Jordan canonical form, which applies only when the base field is algebraically closed. Let us now drop the rather strict requirements of canonical forms and look at two classes of matrices that are too large to be canonical forms (the upper triangular matrices and the almost upper triangular matrices) and a class of matrices that is too small to be a canonical form (the diagonal matrices). The upper triangular matrices (or lower triangular matrices) have some nice properties and it is of interest to know when an arbitrary matrix is similar to a
Eigenvalues and Eigenvectors
161
triangular matrix. We confine our attention to upper triangular matrices, since there are direct analogs for lower triangular matrices as well. It will be convenient to make the following, somewhat nonstandard, definition. Definition A linear operator on = is upper triangular with respect to an ordered basis 8 ~ ²# Á Ã Á # ³ if the matrix ´ µ8 is upper triangular, that is, if for all ~ Á Ã Á ²# ³ º# Á Ã Á # » The operator is upper triangularizable if there is an ordered basis with respect to which is upper triangular.
As we will see next, when the base field is algebraically closed, all operators are upper triangularizable. However, since two distinct upper triangular matrices can be similar, the class of upper triangular matrices is not a canonical form for similarity. Simply put, there are just too many upper triangular matrices. Theorem 8.7 (Schur's Lemma) Let = be a finite-dimensional vector space over a field - . 1) If B²= ³ has the property that its characteristic polynomial * ²%³ splits over - then is upper triangularizable. 2) If - is algebraically closed then all operators are upper triangularizable. Proof. Part 2) follows from part 1). The proof of part 1) is most easily accomplished by matrix means, namely, we prove that every square matrix ( 4 ²- ³ whose characteristic polynomial splits over - is similar to an upper triangular matrix. If ~ there is nothing to prove, since all d matrices are upper triangular. Assume the result is true for c and let ( 4 ²- ³. Let # be an eigenvector associated with the eigenvalue - of ( and extend ¸# ¹ to an ordered basis ²# Á ÀÀÀÁ # ³ for s . The matrix of ( with respect to 8 has the form ´(µ8 ~ >
i ( ?block
for some ( 4c ²- ³. Since ´(µ8 and ( are similar, we have det ²%0 c (³ ~ det ²%0 c ´(µ8 ³ ~ ²% c ³ det ²%0 c ( ³ Hence, the characteristic polynomial of ( also splits over - and, by the induction hypothesis, there exists an invertible matrix 7 4c ²- ³ for which < ~ 7 ( 7 c is upper triangular. Hence, if
162
Advanced Linear Algebra
8~>
7 ?block
then 8 is invertible and 8´(µ8 8c ~ >
7 ?>
i ( ?>
~ 7 c ? >
i
is upper triangular. This completes the proof.
When the base field is - ~ s, not all operators are triangularizable. We can, however, achieve a form that is close to triangular. For the sake of the exposition, we make the following nonstandard definition (that is, the reader should not expect to find this definition in other books). Definition A matrix ( 4 ²- ³ is almost upper triangular if it has the form v ( x (~x w
i ( Æ
y { { ( z
where each matrix ( either has size d or else has size d with an irreducible characteristic polynomial. A linear operator B²= ³ is almost upper triangularizable if there is an ordered basis 8 for which ´ µ8 is almost upper triangular.
We will prove that every real linear operator is almost upper triangularizable. In the case of a complex vector space = , any complex linear operator B²= ³ has an eigenvalue and hence = contains a one-dimensional -invariant subspace. The analog for the real case is that for any real linear operator B²= ³, the vector space = contains either a one-dimensional or a “nonreducible” two-dimensional -invariant subspace. Theorem 8.8 Let B²= ³ be a real linear operator. Then = contains at least one of the following: 1) A one-dimensional -invariant subspace, 2) A two-dimensional -invariant subspace > for which ~ O> has the property that ²%³ ~ * ²%³ is an irreducible quadratic. Hence, > is not the direct sum of two one-dimensional -invariant subspaces. Proof. The minimal polynomial ²%³ of factors into a product of linear and quadratic factors over s. If there is a linear factor % c , then is an eigenvalue for and if # ~ # then º#» is the desired one-dimensional -invariant subspace.
Eigenvalues and Eigenvectors
163
Otherwise, let ²%³ ~ % b % b be an irreducible quadratic factor of ²%³ and write ²%³ ~ ²%³²%³ Since ² ³ £ , we may choose a nonzero vector # = such that ² ³# £ . Let > ~ º² ³#Á ² ³#» This subspace is -invariant, for we have ´² ³#µ > and ´ ² ³#µ ~ ² ³# ~ c² b ³² ³# > Hence, ~ O> is a linear operator on > . Also, ² ³> ~ ¸¹ and so has minimal polynomial dividing ²%³. But since ²%³ is irreducible and monic, ²%³ ~ ²%³ is quadratic. It follows that > is two-dimensional, for if ² ³# b ² ³# ~ then b ~ on > , which is not the case. Finally, the characteristic polynomial * ²%³ has degree and is divisible by ²%³, whence * ²%³ ~ ²%³ ~ ²%³ is irreducible. Thus, > satisfies condition 2).
Now we can prove Schur's lemma for real operators. Theorem 8.9 (Schur's lemma: real case) Every real linear operator B²= ³ is almost upper triangularizable. Proof. As with the complex case, it is simpler to proceed using matrices, by showing that any d real matrix ( is similar to an almost upper triangular matrix. The result is clear for ~ or if ( is the zero matrix. For ~ , the characteristic polynomial *²%³ of ( has degree and is divisible by the minimal polynomial ²%³. If ²%³ ~ % c is linear then ( ~ 0 is diagonal. If ²%³ ~ ²% c ³ then ( is similar to an upper triangular matrix with diagonal elements and if ²%³ ~ ²% c ³²% c ³ with £ then ( is similar to a diagonal matrix with diagonal entries and . Finally, if ²%³ ~ *²%³ is irreducible then the result still holds. Assume for the purposes of induction that any square matrix of size less than d is almost upper triangularizable. We wish to show that the same is true for any d matrix (. We may assume that . If ( has an eigenvector # s , then let > ~ º# ». If not, then according to Theorem 8.8, there is a pair of vectors " Á " s for which > ~ º$ Á $ » is
164
Advanced Linear Algebra
two-dimensional and (-invariant and the characteristic and minimal polynomials of ( O> are equal and irreducible. Let < be a complement of > . If dim²> ³ ~ then let 8 ~ ²# Á " Á Ã Á "c ³ be an ordered basis for s and if dim²> ³ ~ then let 8 ~ ²$ Á $ Á " Á Ã Á "c ³ be an ordered basis for s . In either case, ( is similar to a matrix of the form ´(µ8 ~ >
)
i ( ?block
where ) has size d or ) has size d , with irreducible quadratic minimal polynomial. Also, ( has size d , where ~ c or ~ c . Hence, the induction hypothesis applies to ( and there exists an invertible matrix 7 4 for which < ~ 7 ( 7 c is almost upper triangular. Hence, if 8~>
0c
7 ?block
then 8 is invertible and 0 8´(µ8 8c ~ > c
) 7 ?>
0c i ( ?>
) ~ 7 c ? >
i
is almost upper triangular. This completes the proof.
Unitary Triangularizability Although we have not yet discussed inner product spaces and orthonormal bases, the reader is no doubt familiar with these concepts. So let us mention that when = is a real or complex inner product space, then if an operator on = can be triangularized (or almost triangularized) using an ordered basis 8 , it can also be triangularized (or almost triangularized) using an orthonormal ordered basis E. To see this, suppose we apply the Gram–Schmidt orthogonalization process to 8 ~ ²# Á Ã Á # ³. The resulting ordered orthonormal basis E ~ ²" Á Ã Á " ³ has the property that º# Á Ã Á # » ~ º" Á Ã Á " » for all . Since ´ µ8 is upper triangular, that is, ²# ³ º# Á Ã Á # » for all , it follows that ²" ³ º # Á Ã Á # » º# Á Ã Á # » ~ º" Á Ã Á "»
Eigenvalues and Eigenvectors
165
and so the matrix ´ µE is also upper triangular. (A similar argument holds in the almost upper triangular case.) A linear operator is unitarily upper triangularizable if there is an ordered orthonormal basis with respect to which is upper triangular. Accordingly, when = is an inner product space, we can replace the term “upper triangularizable” with “unitarily upper triangularizable” in Schur's lemma. (A similar statement holds for almost upper triangular matrices.)
Diagonalizable Operators A linear operator B²= ³ is diagonalizable if there is an ordered basis 8 for which ´ µ8 is diagonal. In the case of an algebraically closed field, we have seen that all operators are upper triangularizable. However, even for such fields, not all operators are diagonalizable. Our first characterization of diagonalizability amounts to little more than the definitions of the concepts involved. Theorem 8.10 An operator B²= ³ is diagonalizable if and only if there is a basis for = that consists entirely of eigenvectors of , that is, if and only if = ~ ; l Ä l ; where Á Ã Á are the distinct eigenvalues of .
Diagonalizability can also be characterized via minimal polynomials. Suppose that is diagonalizable and that 8 ~ ¸# Á Ã Á # ¹ is a basis for = consisting of eigenvectors of . Let Á Ã Á be a list of the distinct eigenvalues of . Then each basis vector # is an eigenvector for one of these eigenvalues and so
² c ³# ~ ~
for all basis vectors # . Hence, if
²%³ ~ ²% c ³ ~
then ² ³ ~ and so ²%³ ²%³. But every eigenvalue is a root of the minimal polynomial of and so ²%³ ²%³, whence ²%³ ~ ²%³. Conversely, if the minimal polynomial of is a product of distinct linear factors, then the primary decomposition of = looks like = ~ = l Ä l = where
166
Advanced Linear Algebra
= ~ ¸# = ² c ³# ~ ¹ ~ ; By Theorem 8.10, is diagonalizable. We have established the following result. Theorem 8.11 A linear operator B²= ³ on a finite-dimensional vector space is diagonalizable if and only if its minimal polynomial is the product of distinct linear factors.
Projections We have met the following type of operator before. Definition Let = ~ : l ; . The linear operator ¢ = ¦ = defined by ² b !³ ~ where : and ! ; is called projection on : along ; .
The following theorem describes projection operators. Theorem 8.12 1) Let be projection on : along ; . Then a) im²³ ~ : , ker²³ ~ ; b) = ~ im²³ l ker²³ c) # im²³ ¯ ²#³ ~ # Note that the last condition says that a vector is in the image of if and only if it is fixed by . 2) Conversely, if B²= ³ has the property that = ~ im²³ l ker²³ and Oim²³ ~ then is projection on im²³ along ker²³.
Projection operators play a major role in the spectral theory of linear operators, which we will discuss in Chapter 10. Now we turn to some of the basic properties of these operators. Theorem 8.13 A linear operator B²= ³ is a projection if and only if it is idempotent, that is, if and only if ~ . Proof. If is projection on : along ; then for any : and ! ; , ² b !³ ~ ² ³ ~
~ ² b !³
and so ~ . Conversely, suppose that is idempotent. If # im²³ q ker²³ then # ~ ²%³ and so ~ ²#³ ~ ²%³ ~ ²%³ ~ # Hence im²³ q ker²³ ~ ¸¹. Moreover, if # = then
Eigenvalues and Eigenvectors
167
# ~ ´# c ²#³µ b ²#³ ker²³ l im²³ and so = ~ ker²³ l im²³. Finally, ´²%³µ ~ ²%³ and so Oim²³ ~ . Hence, is projection on im²³ along ker²³.
The Algebra of Projections If is projection on : along ; then c is idempotent, since ² c ³ ~ c c b ~ c Hence, c is also a projection. Since ker² c ³ ~ im²³ and im² c ³ ~ ker²³, it follows that c is projection on ; along : .
Orthogonal Projections Definition The projections Á B²= ³ are orthogonal, written , if ~ ~ .
Note that if and only if im²³ ker²³ and im²³ ker²³ The following example shows that it is not enough to have ~ in the definition of orthogonality. In fact, it is possible for ~ and yet is not even a projection. Example 8.1 Let = ~ - and let + ~ ¸²%Á %³ % - ¹ ? ~ ¸²%Á ³ % - ¹ @ ~ ¸²Á &³ & - ¹ Thus, + is the diagonal, ? is the %-axis and @ is the &-axis in - . (The reader may wish to draw pictures in s .³ Using the notation (Á) for the projection on ( along ) , we have +Á? +Á@ ~ +Á@ £ +Á? ~ +Á@ +Á? From this we deduce that if and are projections, it may happen that both products and are projections, but that they are not equal. We leave it to the reader to show that @ Á? ?Á+ ~ (which is a projection), but that ?Á+ @ Á? is not a projection. Thus, it may also happen that is a projection but that is not a projection.
If and are projections, it does not necessarily follow that b , c or is a projection. Let us consider these one by one.
168
Advanced Linear Algebra
The Sum of Projections The sum b is a projection if and only if ² b ³ ~ b or b ~
(8.5)
Of course, this holds if ~ ~ , that is, if . We contend that the converse is also true, namely, that (8.5) implies that . Multiplying (8.5) on the left by and on the right by gives the pair of equations b ~ b ~ Hence ~ which together with (8.5) gives ~ . Therefore, if char²- ³ £ then ~ and therefore ~ , that is, . We have proven that b is a projection if and only if (assuming that - has characteristic different from ). Now suppose that b is a projection. To determine ker² b ³, suppose that ² b ³²#³ ~ Applying and noting that ~ and ~ , we get ²#³ ~ . Similarly, ²#³ ~ and so ker² b ³ ker²³ q ker²³. But the reverse inclusion is obvious and so ker² b ³ ~ ker²³ q ker²³ As to the image of b , we have # im² b ³ ¬ # ~ ² b ³²#³ ~ ²#³ b ²#³ im²³ b im²³ and so im² b ³ im²³ b im²³. But ~ implies that im²³ ker²³ and so the sum is direct and im² b ³ im²³ l im²³ For the reverse inequality, if # ~ b , where im²³ and im²³ then ² b ³²#³ ~ ² b ³²³ b ² b ³² ³ ~ b ~ # and so # im² b ³. Let us summarize. Theorem 8.14 Let Á B²= ³ be projections where = is a vector space over a field of characteristic £ . Then b is a projection if and only if , in which case b is projection on im²³ l im²³ along ker²³ q ker²³.
Eigenvalues and Eigenvectors
169
The Difference of Projections Let and be projections. The difference c is a projection if and only if ~ c ² c ³ ~ ² c ³ b is a projection. Hence, we may apply the previous theorem to deduce that c is a projection if and only if ² c ³ ~ ² c ³ ~ or, equivalently, ~ ~ Moreover, in this case, c ~ c is projection on ker² ³ along im² ³. Theorem 8.14 also implies that im² ³ ~ im² c ³ l im²³ ~ ker²³ l im²³ and ker² ³ ~ ker² c ³ q ker²³ ~ im²³ q ker²³ Theorem 8.15 Let Á B²= ³ be projections where = is a vector space over a field of characteristic £ 2. Then c is a projection if and only if ~ ~ in which case c is projection on im²³ q ker²³ along ker²³ l im²³.
The Product of Projections Finally, let us consider the product of two projections. Theorem 8.16 Let Á B²= ³ be projections. If and commute, that is, if ~ then is a projection. In this case, is projection on im²³ q im²³ along ker²³ b ker²³. (Example 8.1 shows that the converse may be false.) Proof. If ~ then ²³ ~ ~ ~ and so is a projection. To find the image of , observe that if # ~ ²#³ then ²#³ ~ ²#³ ~ ²#³ ~ # and so # im²³. Similarly # im²³ and so im²³ im²³ q im²³ For the reverse inclusion, if % ~ ²#³ ~ ²$³ im²³ q im²³ then
170
Advanced Linear Algebra
²%³ ~ ²$³ ~ ²$³ ~ ²%³ ~ ²#³ ~ ²#³ ~ % and so % im²³. Hence, im²³ ~ im²³ q im²³ Next, we observe that if # ker²³ then ²#³ ~ and so ²#³ ker²³. Hence, # ~ ²#³ b ²# c ²#³³ ker²³ b ker²³ Moreover, if # ~ b ker²³ b ker²³ then ²#³ ~ ² b ³ ~ ²³ b ² ³ ~ b ~ and so # ker²³. Thus, ker²³ ~ ker²³ b ker²³ We should remark that the sum above need not be direct. For example, if ~ then ker²³ ~ ker²³.
Resolutions of the Identity If is a projection then ² c ³ and b ² c ³ ~ Let us generalize this to more than two projections. Definition If Á Ã Á are mutually orthogonal projections, that is, for £ and if b Ä b ~ where is the identity operator then we refer to this sum as a resolution of the identity.
There is a connection between the resolutions of the identity map on = and the decomposition of = . In general, if the linear operators on = satisfy b Ä b ~ then for any # = we have # ~ # ~ ²#³ b Ä b ²#³ im² ³ b Ä b im² ³ and so = ~ im² ³ b Ä b im² ³ However, the sum need not be direct. The next theorem describes when the sum is direct.
Eigenvalues and Eigenvectors
171
Theorem 8.17 Resolutions of the identity correspond to direct sum decompositions of = in the following sense: 1) If b Ä b ~ is a resolution of the identity then = ~ im² ³ l Ä l im² ³ and is projection on im² ³ along ker² ³ ~ im² ³ £
2) Conversely, suppose that = ~ : l Ä l : If is projection on : along the direct sum of the other subspaces : £
then b Ä b ~ is a resolution of the identity. Proof. To prove 1) suppose that b Ä b ~ is a resolution of the identity. Then as we have seen = ~ im² ³ b Ä b im² ³ To see that the sum is direct, if % b Ä b % ~ then applying gives % ~ % ~ for all . Hence, the sum is direct. Finally, we have im² ³ l im² ³ ~ = ~ im² ³ l ker² ³ £
which implies that ker² ³ ~ im² ³ £
To prove part 2), observe that for £ , im² ³ ~ : ker² ³ and similarly im² ³ ker² ³. Hence, . Also, if # ~ where : then # ~
bÄb
~ ²#³ b Ä b ²#³
and so ~ b Ä b is a resolution of the identity.
bÄb
172
Advanced Linear Algebra
Spectral Resolutions Let us try to do something similar to Theorem 8.17 for an arbitrary linear operator on = (rather than just the identity ). Suppose that can be resolved as follows ~ b Ä b where b Ä b ~ is a resolution of the identity and - . Then = ~ im² ³ l Ä l im² ³ Moreover, if # im² ³ then # ~ ²%³ and so ²#³ ~ ² b Ä b ³ ²%³ ~ ²%³ ~ # Hence, im² ³ ; . But the reverse is also true, since the equation ²#³ ~ # is ² b Ä b ³²#³ ~ ² b Ä b ³# or ² c ³ ²#³ b Ä b ² c ³ ²#³ ~ But since ² c ³ ²#³ im² ³, we deduce that ²#³ ~ for £ and so # ~ ² b Ä b ³# ~ # im² ³ Thus, im² ³ ~ ; and we can conclude that = ~ ; l Ä l ; that is, is diagonalizable. The converse also holds, for if = is the direct sum of the eigenspaces of and if is projection on ; along the direct sum of the other eigenspaces then b Ä b ~ But for any # ; , we have ²# ³ ~ # ~ ² b Ä b ³# ~ ² b Ä b ³²# ³ and so ~ b Ä b Theorem 8.18 A linear operator B²= ³ is diagonalizable if and only if it can be written in the form ~ b Ä b
(8.6)
where the 's are distinct and b Ä b ~ is a resolution of the identity. In this case, ¸ Á Ã Á ¹ is the spectrum of and the projections satisfy
Eigenvalues and Eigenvectors
173
im² ³ ~ ; and ker² ³ ~ ; £
Equation (8.6) is referred to as the spectral resolution of .
Projections and Invariance There is a connection between projections and invariant subspaces. Suppose that : is a -invariant subspace of = and let be any projection onto : (along any complement of : ). Then for any # = , we have ²#³ : and so ²²#³³ : . Hence, ²²#³³ is fixed by , that is ²#³ ~ ²#³ Thus ~ . Conversely, if ~ then for any : , we have ~ ² ³, whence ² ³ ~ ² ³ ~ ² ³ ~ ² ³ and so ² ³ is fixed by , from which it follows that ² ³ : . In other words, : is -invariant. Theorem 8.19 Let B²= ³. A subspace : of = is -invariant if and only if ~ for some projection on : .
We also have the following relationship between projections and reducing pairs.
Theorem 8.20 Let $V = S \oplus T$. Then a linear operator $\tau \in \mathcal{L}(V)$ is reduced by the pair $(S, T)$ if and only if $\rho\tau = \tau\rho$, where $\rho$ is projection on $S$ along $T$.
Proof. Suppose first that $\rho\tau = \tau\rho$, where $\rho$ is projection on $S$ along $T$. For $s \in S$ we have
$$\rho\tau(s) = \tau\rho(s) = \tau(s)$$
and so $\rho$ fixes $\tau(s)$, which implies that $\tau(s) \in S$. Hence $S$ is invariant under $\tau$. Also, for $t \in T$,
$$\rho\tau(t) = \tau\rho(t) = 0$$
and so $\tau(t) \in \ker(\rho) = T$. Hence, $T$ is invariant under $\tau$. Conversely, suppose that $(S, T)$ reduces $\tau$. The projection operator $\rho$ fixes vectors in $S$ and sends vectors in $T$ to $0$. Hence, for $s \in S$ and $t \in T$ we have
$$\rho\tau(s) = \tau(s) = \tau\rho(s)$$
and
$$\rho\tau(t) = 0 = \tau\rho(t)$$
which imply that $\rho\tau = \tau\rho$.
Exercises
1. Let $J$ be the $n \times n$ matrix all of whose entries are equal to $1$. Find the minimal polynomial and characteristic polynomial of $J$ and the eigenvalues.
2. A linear operator $\tau \in \mathcal{L}(V)$ is said to be nonderogatory if its minimal polynomial is equal to its characteristic polynomial. Prove that $\tau$ is nonderogatory if and only if $V$ is a cyclic $F[x]$-module.
3. Prove that the eigenvalues of a matrix do not form a complete set of invariants under similarity.
4. Show that $\tau \in \mathcal{L}(V)$ is invertible if and only if $0$ is not an eigenvalue of $\tau$.
5. Let $A$ be an $n \times n$ matrix over a field $F$ that contains all roots of the characteristic polynomial of $A$. Prove that $\det(A)$ is the product of the eigenvalues of $A$, counting multiplicity.
6. Show that if $\lambda$ is an eigenvalue of $\tau$ then $p(\lambda)$ is an eigenvalue of $p(\tau)$, for any polynomial $p(x)$. Also, if $\lambda \ne 0$ then $\lambda^{-1}$ is an eigenvalue of $\tau^{-1}$.
7. An operator $\tau \in \mathcal{L}(V)$ is nilpotent if $\tau^n = 0$ for some positive $n \in \mathbb{N}$.
a) Show that if $\tau$ is nilpotent then the spectrum of $\tau$ is $\{0\}$.
b) Find a nonnilpotent operator $\tau$ with spectrum $\{0\}$.
8. Show that if $\sigma, \tau \in \mathcal{L}(V)$ then $\sigma\tau$ and $\tau\sigma$ have the same eigenvalues.
9. (Halmos)
a) Find a linear operator $\tau$ that is not idempotent but for which $\tau(\iota - \tau)^2 = 0$.
b) Find a linear operator $\tau$ that is not idempotent but for which $\tau^2(\iota - \tau) = 0$.
c) Prove that if $\tau(\iota - \tau)^2 = \tau^2(\iota - \tau) = 0$ then $\tau$ is idempotent.
10. An involution is a linear operator $\sigma$ for which $\sigma^2 = \iota$. If $\tau$ is idempotent what can you say about $2\tau - \iota$? Construct a one-to-one correspondence between the set of idempotents on $V$ and the set of involutions.
11. Let $A, B \in M_n(\mathbb{C})$ and suppose that $A^2 = B^2 = I$ and $ABA = B^{-1}$, but $A \ne \pm I$ and $B \ne \pm I$. Show that if $C \in M_n(\mathbb{C})$ commutes with both $A$ and $B$ then $C = \lambda I$ for some scalar $\lambda \in \mathbb{C}$.
12. Suppose that $J$ and $K$ are matrices in Jordan canonical form. Prove that if $J$ and $K$ are similar then they are the same except for the order of the Jordan blocks. Hence, Jordan form is a canonical form for similarity (up to order of the blocks).
13. Fix $\epsilon \ne 0$. Show that any complex matrix is similar to a matrix that looks just like a Jordan matrix except that the entries that are equal to $1$ are replaced by entries with value $\epsilon$, where $\epsilon$ is any nonzero complex number. Thus, any complex matrix is similar to a matrix that is "almost" diagonal. Hint: consider the fact that
$$\operatorname{diag}(1, \epsilon, \dots, \epsilon^{n-1})^{-1}
\begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix}
\operatorname{diag}(1, \epsilon, \dots, \epsilon^{n-1}) =
\begin{pmatrix} \lambda & \epsilon & & \\ & \lambda & \ddots & \\ & & \ddots & \epsilon \\ & & & \lambda \end{pmatrix}$$
14. Show that the Jordan canonical form is not very robust, in the sense that a small change in the entries of a matrix $A$ may result in a large jump in the entries of the Jordan form $J$. Hint: consider a matrix of the form
$$A_\epsilon = \begin{pmatrix} 0 & 1 \\ \epsilon & 0 \end{pmatrix}$$
What happens to the Jordan form of $A_\epsilon$ as $\epsilon \to 0$?
15. Give an example of a complex nonreal matrix all of whose eigenvalues are real. Show that any such matrix is similar to a real matrix. What about the type of the invertible matrices that are used to bring the matrix to Jordan form?
16. Let $J = [\tau]_\mathcal{B}$ be the Jordan form of a linear operator $\tau \in \mathcal{L}(V)$. For a given Jordan block $J(\lambda, k)$ of $J$, let $U$ be the subspace of $V$ spanned by the basis vectors of $\mathcal{B}$ associated with that block.
a) Show that $\tau|_U$ has a single eigenvalue $\lambda$ with geometric multiplicity $1$. In other words, there is essentially only one eigenvector (up to scalar multiple) associated with each Jordan block. Hence, the geometric multiplicity of $\lambda$ for $\tau$ is the number of Jordan blocks for $\lambda$. Show that the algebraic multiplicity is the sum of the dimensions of the Jordan blocks associated with $\lambda$.
b) Show that the number of Jordan blocks in $J$ is the maximum number of linearly independent eigenvectors of $\tau$.
c) What can you say about the Jordan blocks if the algebraic multiplicity of every eigenvalue is equal to its geometric multiplicity?
17. Assume that the base field $F$ is algebraically closed. Then, assuming that the eigenvalues of $A$ are known, it is possible to determine the Jordan form $J$ of a matrix $A$ by looking at the rank of various matrix powers. A matrix $B$ is nilpotent if $B^k = 0$ for some $k$; the smallest such exponent is called the index of nilpotence.
a) Let $J = J(\lambda, k)$ be a single Jordan block of size $k \times k$. Show that $J - \lambda I$ is nilpotent of index $k$. Thus, $k$ is the smallest integer for which $\operatorname{rk}((J - \lambda I)^k) = 0$.
Now let $J$ be a matrix in Jordan form but possessing only one eigenvalue $\lambda$.
b) Show that $J - \lambda I$ is nilpotent. Let $m$ be its index of nilpotence. Show that $m$ is the maximum size of the Jordan blocks of $J$ and that $\operatorname{rk}((J - \lambda I)^{m-1})$ is the number of Jordan blocks in $J$ of maximum size.
c) Show that $\operatorname{rk}((J - \lambda I)^{m-2})$ is equal to $2$ times the number of Jordan blocks of maximum size plus the number of Jordan blocks of size one less than the maximum.
d) Show that the sequence $\operatorname{rk}((J - \lambda I)^j)$ for $j = 1, \dots, m$ uniquely determines the number and size of all of the Jordan blocks in $J$, that is, it uniquely determines $J$ up to the order of the blocks.
e) Now let $J$ be an arbitrary Jordan matrix. If $\lambda$ is an eigenvalue for $J$, show that the sequence $\operatorname{rk}((J - \lambda I)^j)$ for $j = 1, \dots, m_\lambda$, where $m_\lambda$ is the first integer for which $\operatorname{rk}((J - \lambda I)^{m_\lambda}) = \operatorname{rk}((J - \lambda I)^{m_\lambda + 1})$, uniquely determines the Jordan structure of $J$ for $\lambda$.
f) Prove that for any matrix $A$ with spectrum $\{\lambda_1, \dots, \lambda_k\}$, the sequence $\operatorname{rk}((A - \lambda_i I)^j)$ for $i = 1, \dots, k$ and $j = 1, \dots, m_i$, where $m_i$ is the first integer for which $\operatorname{rk}((A - \lambda_i I)^{m_i}) = \operatorname{rk}((A - \lambda_i I)^{m_i + 1})$, uniquely determines the Jordan matrix $J$ for $A$ up to the order of the blocks.
18. Let $A \in M_n(F)$.
a) If all the roots of the characteristic polynomial of $A$ lie in $F$, prove that $A$ is similar to its transpose $A^t$. Hint: Let $B$ be the matrix
$$B = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & ⋰ & & \vdots \\ 1 & 0 & \cdots & 0 \end{pmatrix}$$
that has $1$'s on the diagonal that moves up from left to right and $0$'s elsewhere. Let $J$ be a Jordan block of the same size as $B$. Show that $BJB^{-1} = J^t$.
b) Let $A, B \in M_n(F)$ and let $K$ be a field containing $F$. Show that if $A$ and $B$ are similar over $K$, that is, if $B = PAP^{-1}$ where $P \in M_n(K)$, then $A$ and $B$ are also similar over $F$, that is, there exists $Q \in M_n(F)$ for which $B = QAQ^{-1}$. Hint: consider the equation $XA - BX = 0$ as a homogeneous system of linear equations with coefficients in $F$. Does it have a solution? Where?
c) Show that any matrix is similar to its transpose.
19. Prove Theorem 8.8 using the complexification of $V$.
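The rank computation described in Exercise 17 is easy to carry out numerically. The following Python sketch is our own illustration (not from the text); it recovers the block sizes for one eigenvalue from the rank sequence, using the fact that the number of blocks of size exactly $j$ is $r_{j-1} - 2r_j + r_{j+1}$, where $r_j = \operatorname{rk}((A - \lambda I)^j)$.

```python
import numpy as np

def jordan_block_sizes(A, lam, tol=1e-8):
    """Sizes of the Jordan blocks of A for eigenvalue lam, recovered from
    the rank sequence of Exercise 17. Illustrative only: floating-point
    rank computations are fragile for ill-conditioned matrices."""
    n = A.shape[0]
    N = A - lam * np.eye(n)
    ranks = [n]                        # rk(N^0) = n
    M = np.eye(n)
    while True:
        M = M @ N
        ranks.append(np.linalg.matrix_rank(M, tol=tol))
        if ranks[-1] == ranks[-2]:     # the ranks have stabilized
            break
    sizes = []
    for j in range(1, len(ranks) - 1):
        count = ranks[j - 1] - 2 * ranks[j] + ranks[j + 1]
        sizes += [j] * count           # blocks of size exactly j
    return sorted(sizes, reverse=True)

# A Jordan matrix with blocks of sizes 2 and 1 for the eigenvalue 3:
J = np.array([[3.0, 1.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 3.0]])
print(jordan_block_sizes(J, 3.0))      # [2, 1]
```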
The Trace of a Matrix
20. Let $A$ be an $n \times n$ matrix over a field $F$. The trace of $A$, denoted by $\operatorname{tr}(A)$, is the sum of the elements on the main diagonal of $A$. Verify the following statements:
a) $\operatorname{tr}(rA) = r\operatorname{tr}(A)$, for $r \in F$
b) $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$
c) $\operatorname{tr}(AB) = \operatorname{tr}(BA)$
d) Prove that $\operatorname{tr}(ABC) = \operatorname{tr}(CAB) = \operatorname{tr}(BCA)$. Find an example to show that $\operatorname{tr}(ABC)$ may not equal $\operatorname{tr}(ACB)$.
e) The trace is an invariant under similarity.
f) If $F$ is algebraically closed then the trace of $A$ is the sum of the eigenvalues of $A$, counting multiplicity.
Formulate a definition of the trace of a linear operator, show that it is well-defined, and relate this concept to the eigenvalues of the operator.
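Parts c) and d) can be illustrated numerically. The sketch below is our own (not from the text); the random matrices are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

# Cyclic invariance from part d): tr(ABC) = tr(CAB) = tr(BCA) ...
t = np.trace(A @ B @ C)
assert np.isclose(t, np.trace(C @ A @ B))
assert np.isclose(t, np.trace(B @ C @ A))
# ... but a non-cyclic permutation generally differs:
print(np.isclose(t, np.trace(A @ C @ B)))   # almost surely False
```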
21. Use the concept of the trace of a matrix, as defined in the previous exercise, to prove that there are no matrices $A, B \in M_n(\mathbb{C})$ for which
$$AB - BA = I$$
22. Let $T : M_n(F) \to F$ be a function with the following properties: for all matrices $A, B \in M_n(F)$ and $r \in F$,
1) $T(rA) = rT(A)$
2) $T(A + B) = T(A) + T(B)$
3) $T(AB) = T(BA)$
Show that there exists $s \in F$ for which $T(A) = s\operatorname{tr}(A)$, for all $A \in M_n(F)$.
Simultaneous Diagonalizability
23. A pair of linear operators $\sigma, \tau \in \mathcal{L}(V)$ is simultaneously diagonalizable if there is an ordered basis $\mathcal{B}$ for $V$ for which $[\sigma]_\mathcal{B}$ and $[\tau]_\mathcal{B}$ are both diagonal, that is, $\mathcal{B}$ is an ordered basis of eigenvectors for both $\sigma$ and $\tau$. Prove that two diagonalizable operators $\sigma$ and $\tau$ are simultaneously diagonalizable if and only if they commute, that is, $\sigma\tau = \tau\sigma$. Hint: If $\sigma\tau = \tau\sigma$ then the eigenspaces of $\tau$ are invariant under $\sigma$.
Common Eigenvectors
It is often of interest to know whether a family $\mathcal{F} = \{\tau_i \in \mathcal{L}(V) \mid i \in I\}$ of linear operators on $V$ has a common eigenvector, that is, a single vector $v \in V$ that is an eigenvector for every operator in $\mathcal{F}$ (the corresponding eigenvalues may be different for each operator, however). A commuting family $\mathcal{F}$ of operators is a family in which each pair of operators commutes, that is, $\sigma, \tau \in \mathcal{F}$ implies $\sigma\tau = \tau\sigma$. We say that a subspace $U$ of $V$ is $\mathcal{F}$-invariant if it is $\tau$-invariant for every $\tau \in \mathcal{F}$.
24. Let $\sigma, \tau \in \mathcal{L}(V)$. Prove that if $\sigma$ and $\tau$ commute then every eigenspace of $\tau$ is $\sigma$-invariant. Thus, if $\mathcal{F}$ is a commuting family then every eigenspace of any member of $\mathcal{F}$ is $\mathcal{F}$-invariant.
25. Let $\mathcal{F}$ be a family of operators in $\mathcal{L}(V)$ with the property that each operator in $\mathcal{F}$ has a full set of eigenvalues in the base field $F$, that is, the characteristic polynomial splits over $F$. Prove that if $\mathcal{F}$ is a commuting family then $\mathcal{F}$ has a common eigenvector $v \in V$.
26. What do the commuting real matrices
$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$
have to do with the issue of common eigenvectors?
Geršgorin Disks
It is generally impossible to determine precisely the eigenvalues of a given complex operator or matrix $A \in M_n(\mathbb{C})$, for if $n \ge 5$ then the characteristic equation has degree at least $5$ and cannot, in general, be solved by radicals. As a result, the approximation of eigenvalues is big business. Here we consider one aspect of this approximation problem, which also has some interesting theoretical consequences.
Let $A \in M_n(\mathbb{C})$ and suppose that $Av = \lambda v$ where $v = (r_1, \dots, r_n)^t \ne 0$. Comparing $k$th rows gives
$$\sum_{j=1}^n a_{kj} r_j = \lambda r_k$$
which can also be written in the form
$$(\lambda - a_{kk}) r_k = \sum_{j \ne k} a_{kj} r_j$$
If $k$ has the property that $|r_k| \ge |r_j|$ for all $j$, we have
$$|\lambda - a_{kk}||r_k| \le \sum_{j \ne k} |a_{kj}||r_j| \le |r_k| \sum_{j \ne k} |a_{kj}|$$
and thus
$$|\lambda - a_{kk}| \le \sum_{j \ne k} |a_{kj}| \tag{8.7}$$
The right-hand side is the sum of the absolute values of all entries in the $k$th row of $A$ except the diagonal entry $a_{kk}$. This sum $R_k(A)$ is the $k$th deleted absolute row sum of $A$. The inequality (8.7) says that, in the complex plane, the eigenvalue $\lambda$ lies in the disk centered at the diagonal entry $a_{kk}$ with radius equal to $R_k(A)$. This disk
$$G_k(A) = \{z \in \mathbb{C} : |z - a_{kk}| \le R_k(A)\}$$
is called the Geršgorin row disk for the $k$th row of $A$. The union of all of the Geršgorin row disks is called the Geršgorin row region for $A$. Since there is no way to know in general which $k$ is the index for which $|r_k|$ is largest, the best we can say in general is that the eigenvalues of $A$ lie in the union of all Geršgorin row disks, that is, in the Geršgorin row region of $A$.
Similar definitions can be made for columns and, since a matrix has the same eigenvalues as its transpose, we can say that the eigenvalues of $A$ also lie in the Geršgorin column region of $A$. The Geršgorin region $G(A)$ of a matrix $A \in M_n(\mathbb{C})$ is the intersection of the Geršgorin row region and the Geršgorin column region, and we can say that all eigenvalues of $A$ lie in the Geršgorin region of $A$. In symbols, $\lambda(A) \subseteq G(A)$.
27. Find and sketch the Geršgorin region and the eigenvalues for a $3 \times 3$ complex matrix of your choosing.
28. A matrix $A \in M_n(\mathbb{C})$ is diagonally dominant if for each $i = 1, \dots, n$,
$$|a_{ii}| \ge R_i(A)$$
and it is strictly diagonally dominant if strict inequality holds. Prove that if $A$ is strictly diagonally dominant then it is invertible.
29. Find a matrix $A \in M_n(\mathbb{C})$ that is diagonally dominant but not invertible.
30. Find a matrix $A \in M_n(\mathbb{C})$ that is invertible but not strictly diagonally dominant.
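The Geršgorin region is straightforward to compute. The following Python sketch is our own illustration for experimenting with Exercises 27 and 28 (the matrix shown is hypothetical, not the one printed in the text):

```python
import numpy as np

def gershgorin_row_disks(A):
    """Return (center, radius) for each Geršgorin row disk of A."""
    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    return list(zip(np.diag(A), radii))

# A hypothetical test matrix.
A = np.array([[1.0, 0.5, 0.0],
              [0.2, 4.0, 0.3],
              [0.0, 0.1, 8.0]])

disks = gershgorin_row_disks(A)
eigs = np.linalg.eigvals(A)
# Every eigenvalue lies in the union of the row disks.
for lam in eigs:
    assert any(abs(lam - c) <= r + 1e-12 for c, r in disks)
```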
Chapter 9
Real and Complex Inner Product Spaces
We now turn to a discussion of real and complex vector spaces that have an additional function defined on them, called an inner product, as described in the upcoming definition. Thus, in this chapter, $F$ will denote either the real or the complex field. If $z$ is a complex number then the complex conjugate of $z$ is denoted by $\bar{z}$.
Definition Let $V$ be a vector space over $F = \mathbb{R}$ or $F = \mathbb{C}$. An inner product on $V$ is a function $\langle \cdot, \cdot \rangle : V \times V \to F$ with the following properties:
1) (Positive definiteness) For all $v \in V$, the inner product $\langle v, v \rangle$ is real and
$$\langle v, v \rangle \ge 0 \quad\text{and}\quad \langle v, v \rangle = 0 \Leftrightarrow v = 0$$
2) For $F = \mathbb{C}$: (Conjugate symmetry) $\langle u, v \rangle = \overline{\langle v, u \rangle}$
For $F = \mathbb{R}$: (Symmetry) $\langle u, v \rangle = \langle v, u \rangle$
3) (Linearity in the first coordinate) For all $u, v, w \in V$ and $r, s \in F$,
$$\langle ru + sv, w \rangle = r\langle u, w \rangle + s\langle v, w \rangle$$
A real (or complex) vector space $V$, together with an inner product, is called a real (or complex) inner product space.
We will study bilinear forms (also called inner products) on vector spaces over fields other than $\mathbb{R}$ or $\mathbb{C}$ in Chapter 11. Note that property 1) implies that the quantity $\langle v, v \rangle$ is always real, even if $V$ is a complex vector space. Combining properties 2) and 3), we get, in the complex case,
$$\langle w, ru + sv \rangle = \overline{\langle ru + sv, w \rangle} = \overline{r\langle u, w \rangle + s\langle v, w \rangle} = \bar{r}\langle w, u \rangle + \bar{s}\langle w, v \rangle$$
This is referred to as conjugate linearity in the second coordinate. Thus, a complex inner product is linear in its first coordinate and conjugate linear in its second coordinate. This is often described by saying that the inner product is sesquilinear. (Sesqui means "one and a half times.") In the real case ($F = \mathbb{R}$), the inner product is linear in both coordinates, a property referred to as bilinearity.
Example 9.1
1) The vector space $\mathbb{R}^n$ is an inner product space under the standard inner product, or dot product, defined by
$$\langle (r_1, \dots, r_n), (s_1, \dots, s_n) \rangle = r_1 s_1 + \cdots + r_n s_n$$
The inner product space $\mathbb{R}^n$ is often called $n$-dimensional Euclidean space.
2) The vector space $\mathbb{C}^n$ is an inner product space under the standard inner product, defined by
$$\langle (r_1, \dots, r_n), (s_1, \dots, s_n) \rangle = r_1 \bar{s}_1 + \cdots + r_n \bar{s}_n$$
This inner product space is often called $n$-dimensional unitary space.
3) The vector space $C[a, b]$ of all continuous complex-valued functions on the closed interval $[a, b]$ is a complex inner product space under the inner product
$$\langle f, g \rangle = \int_a^b f(x)\overline{g(x)}\,dx$$
Example 9.2 One of the most important inner product spaces is the vector space $\ell^2$ of all real (or complex) sequences $(s_n)$ with the property that $\sum |s_n|^2 < \infty$, under the inner product
$$\langle (s_n), (t_n) \rangle = \sum_{n=1}^\infty s_n \bar{t}_n$$
Of course, for this inner product to make sense, the sum on the right must converge. To see this, note that if $(s_n), (t_n) \in \ell^2$ then
$$0 \le (|s_n| - |t_n|)^2 = |s_n|^2 - 2|s_n||t_n| + |t_n|^2$$
and so
$$2|s_n t_n| \le |s_n|^2 + |t_n|^2$$
which gives
$$\left| \sum_{n=1}^\infty s_n \bar{t}_n \right| \le \sum_{n=1}^\infty |s_n t_n| \le \frac{1}{2}\left(\sum_{n=1}^\infty |s_n|^2 + \sum_{n=1}^\infty |t_n|^2\right) < \infty$$
We leave it to the reader to verify that $\ell^2$ is an inner product space.
The following simple result is quite useful and easy to prove.
Lemma 9.1 If $V$ is an inner product space and $\langle u, x \rangle = \langle v, x \rangle$ for all $x \in V$ then $u = v$.
Note that a subspace $S$ of an inner product space $V$ is also an inner product space under the restriction of the inner product of $V$ to $S$.
Norm and Distance
If $V$ is an inner product space, the norm, or length, of $v \in V$ is defined by
$$\|v\| = \sqrt{\langle v, v \rangle} \tag{9.1}$$
A vector $v$ is a unit vector if $\|v\| = 1$. Here are the basic properties of the norm.
Theorem 9.2
1) $\|v\| \ge 0$ and $\|v\| = 0$ if and only if $v = 0$.
2) $\|rv\| = |r|\|v\|$ for all $r \in F$, $v \in V$.
3) (The Cauchy–Schwarz inequality) For all $u, v \in V$,
$$|\langle u, v \rangle| \le \|u\|\|v\|$$
with equality if and only if one of $u$ and $v$ is a scalar multiple of the other.
4) (The triangle inequality) For all $u, v \in V$,
$$\|u + v\| \le \|u\| + \|v\|$$
with equality if and only if one of $u$ and $v$ is a scalar multiple of the other.
5) For all $u, v, x \in V$,
$$\|u - v\| \le \|u - x\| + \|x - v\|$$
6) For all $u, v \in V$,
$$\big|\|u\| - \|v\|\big| \le \|u - v\|$$
7) (The parallelogram law) For all $u, v \in V$,
$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$$
Proof. We prove only Cauchy–Schwarz and the triangle inequality. For Cauchy–Schwarz, if either $u$ or $v$ is zero the result follows, so assume that $u, v \ne 0$. Then, for any scalar $r \in F$,
)" c #) ~ º" c #Á " c #» ~ º"Á "» c º"Á #» c ´º#Á "» c º#Á #»µ Choosing ~ º#Á "»°º#Á #» makes the value in the square brackets equal to and so º"Á "» c
º#Á "»º"Á #» (º"Á #»( ~ )") c º#Á #» )#)
which is equivalent to the Cauchy-Schwarz inequality. Furthermore, equality holds if and only if )" c #) ~ , that is, if and only if " c # ~ , which is equivalent to " and # being scalar multiples of one another. To prove the triangle inequality, the Cauchy-Schwarz inequality gives )" b #) ~ º" b #Á " b #» ~ º"Á "» b º"Á #» b º#Á "» b º#Á #» )") b )"))#) b )#) ~ ²)") b )#)³ from which the triangle inequality follows. The proof of the statement concerning equality is left to the reader.
Any vector space $V$, together with a function $\|\cdot\| : V \to \mathbb{R}$ that satisfies properties 1), 2) and 4) of Theorem 9.2, is called a normed linear space. (And the function $\|\cdot\|$ is called a norm.) Thus, any inner product space is a normed linear space, under the norm given by (9.1).
It is interesting to observe that the inner product on $V$ can be recovered from the norm.
Theorem 9.3 (The polarization identities)
1) If $V$ is a real inner product space, then
$$\langle u, v \rangle = \frac{1}{4}\big(\|u + v\|^2 - \|u - v\|^2\big)$$
2) If $V$ is a complex inner product space, then
$$\langle u, v \rangle = \frac{1}{4}\big(\|u + v\|^2 - \|u - v\|^2\big) + \frac{i}{4}\big(\|u + iv\|^2 - \|u - iv\|^2\big)$$
The formulas in Theorem 9.3 are known as the polarization identities. The norm can be used to define the distance between any two vectors in an inner product space.
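The complex polarization identity can be checked numerically. The following Python sketch is our own illustration, not part of the text; it uses the standard inner product on $\mathbb{C}^4$ (linear in the first coordinate, as in this chapter) with arbitrary random vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

ip = np.vdot(v, u)          # <u, v> = sum u_i * conj(v_i)
n = lambda w: np.linalg.norm(w) ** 2

# Theorem 9.3, part 2):
recovered = (n(u + v) - n(u - v)) / 4 + 1j * (n(u + 1j * v) - n(u - 1j * v)) / 4
assert np.isclose(ip, recovered)
```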
Definition Let $V$ be an inner product space. The distance $d(u, v)$ between any two vectors $u$ and $v$ in $V$ is
$$d(u, v) = \|u - v\| \tag{9.2}$$
Here are the basic properties of distance.
Theorem 9.4
1) $d(u, v) \ge 0$ and $d(u, v) = 0$ if and only if $u = v$
2) (Symmetry) $d(u, v) = d(v, u)$
3) (The triangle inequality) $d(u, v) \le d(u, w) + d(w, v)$
Any nonempty set $V$, together with a function $d : V \times V \to \mathbb{R}$ that satisfies the properties of Theorem 9.4, is called a metric space and the function $d$ is called a metric on $V$. Thus, any inner product space is a metric space under the metric (9.2).
Before continuing, we should make a few remarks about our goals in this and the next chapter. The presence of an inner product (and hence a metric) raises a host of topological issues related to the notion of convergence. We say that a sequence $(v_n)$ of vectors in an inner product space converges to $v \in V$ if
$$\lim_{n \to \infty} d(v_n, v) = 0$$
that is, if
$$\lim_{n \to \infty} \|v_n - v\| = 0$$
Some of the more important concepts related to convergence are closedness and closures, completeness and the continuity of linear operators and linear functionals. In the finite-dimensional case, the situation is very straightforward: all subspaces are closed, all inner product spaces are complete and all linear operators and functionals are continuous. However, in the infinite-dimensional case, things are not as simple.
Our goals in this chapter and the next are to describe some of the basic properties of inner product spaces, both finite and infinite-dimensional, and then discuss certain special types of operators (normal, unitary and self-adjoint) in the finite-dimensional case only. To achieve the latter goal as rapidly as possible, we will postpone a discussion of topological properties until Chapter
13. This means that we must state some results only for the finite-dimensional case in this chapter, deferring the infinite-dimensional case to Chapter 13.
Isometries
An isomorphism of vector spaces preserves the vector space operations. The corresponding concept for inner product spaces is the following.
Definition Let $V$ and $W$ be inner product spaces and let $\tau \in \mathcal{L}(V, W)$.
1) $\tau$ is an isometry if it preserves the inner product, that is, if
$$\langle \tau(u), \tau(v) \rangle = \langle u, v \rangle$$
for all $u, v \in V$.
2) A bijective isometry is called an isometric isomorphism. When $\tau : V \to W$ is a bijective isometry, we say that $V$ and $W$ are isometrically isomorphic.
It is not hard to show that an isometry is injective, and so it is an isometric isomorphism provided it is also surjective. Moreover, if $\dim(V) = \dim(W) < \infty$ then injectivity implies surjectivity, and so the concepts of isometry and isometric isomorphism are equivalent. On the other hand, the following example shows that this is not the case for infinite-dimensional inner product spaces.
Example 9.3 Let $\tau : \ell^2 \to \ell^2$ be defined by
$$\tau(x_1, x_2, x_3, \dots) = (0, x_1, x_2, \dots)$$
(This is the right shift operator.) Then $\tau$ is an isometry, but it is clearly not surjective.
Theorem 9.5 A linear transformation $\tau \in \mathcal{L}(V, W)$ is an isometry if and only if it preserves the norm, that is, if and only if
$$\|\tau(v)\| = \|v\|$$
for all $v \in V$.
Proof. Clearly, an isometry preserves the norm. The converse follows from the polarization identities. In the real case, we have
²) ²"³ b ²#³) c ) ²"³ c ²#³) ³ ~ ²) ²" b #³) c ) ²" c #³) ³ ~ ²)" b #) c )" c #) ³ ~ º"Á #»
º ²"³Á ²#³» ~
and so is an isometry. The complex case is similar.
The next result points out one of the main differences between real and complex inner product spaces.
Theorem 9.6 Let $V$ be an inner product space and let $\tau \in \mathcal{L}(V)$.
1) If $\langle \tau(v), w \rangle = 0$ for all $v, w \in V$ then $\tau = 0$.
2) If $V$ is a complex inner product space and $\langle \tau(v), v \rangle = 0$ for all $v \in V$ then $\tau = 0$.
3) Part 2) does not hold in general for real inner product spaces.
Proof. Part 1) follows directly from Lemma 9.1. As for part 2), let $v = rx + y$, for $x, y \in V$ and $r \in \mathbb{C}$. Then
$$0 = \langle \tau(rx + y), rx + y \rangle = |r|^2\langle \tau(x), x \rangle + \langle \tau(y), y \rangle + r\langle \tau(x), y \rangle + \bar{r}\langle \tau(y), x \rangle = r\langle \tau(x), y \rangle + \bar{r}\langle \tau(y), x \rangle$$
Setting $r = 1$ gives
$$\langle \tau(x), y \rangle + \langle \tau(y), x \rangle = 0$$
and setting $r = i$ gives
$$\langle \tau(x), y \rangle - \langle \tau(y), x \rangle = 0$$
These two equations imply that $\langle \tau(x), y \rangle = 0$ for all $x, y \in V$ and so $\tau = 0$ by part 1). As for part 3), rotation by $\pi/2$ in the real plane $\mathbb{R}^2$ has the property that $\langle \tau(v), v \rangle = 0$ for all $v$, yet $\tau$ is not zero.
Orthogonality
The presence of an inner product allows us to define the concept of orthogonality.
Definition Let $V$ be an inner product space.
1) Two vectors $u, v \in V$ are orthogonal, written $u \perp v$, if $\langle u, v \rangle = 0$.
2) Two subsets $X, Y \subseteq V$ are orthogonal, written $X \perp Y$, if $x \perp y$ for all $x \in X$ and $y \in Y$.
3) The orthogonal complement of a subset $X \subseteq V$ is the set
$$X^\perp = \{v \in V : \{v\} \perp X\}$$
The following result is easily proved.
Theorem 9.7 Let $V$ be an inner product space.
1) The orthogonal complement $X^\perp$ of any subset $X \subseteq V$ is a subspace of $V$.
2) For any subspace $S$ of $V$, $S \cap S^\perp = \{0\}$.
Orthogonal and Orthonormal Sets
Definition A nonempty set $\mathcal{O} = \{u_k : k \in K\}$ of vectors in an inner product space is said to be an orthogonal set if $u_i \perp u_j$ for all $i \ne j \in K$. If, in addition, each vector $u_k$ is a unit vector, the set $\mathcal{O}$ is an orthonormal set. Thus, a set is orthonormal if
$$\langle u_i, u_j \rangle = \delta_{i,j}$$
for all $i, j \in K$, where $\delta_{i,j}$ is the Kronecker delta function.
Of course, given any nonzero vector $v \in V$, we may obtain a unit vector $u$ by multiplying $v$ by the reciprocal of its norm:
$$u = \frac{v}{\|v\|}$$
Thus, it is a simple matter to construct an orthonormal set from an orthogonal set of nonzero vectors. Note that if $u \perp v$ then
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2$$
and the converse holds if $F = \mathbb{R}$.
Theorem 9.8 Any orthogonal set of nonzero vectors in $V$ is linearly independent.
Proof. Let $\mathcal{O} = \{u_k : k \in K\}$ be an orthogonal set of nonzero vectors and suppose that
$$r_1 u_{k_1} + \cdots + r_n u_{k_n} = 0$$
Then, for any $i = 1, \dots, n$,
$$0 = \langle r_1 u_{k_1} + \cdots + r_n u_{k_n}, u_{k_i} \rangle = r_i \langle u_{k_i}, u_{k_i} \rangle$$
and so $r_i = 0$, for all $i$. Hence, $\mathcal{O}$ is linearly independent.
Definition A maximal orthonormal set in an inner product space $V$ is called a Hilbert basis for $V$.
Zorn's lemma can be used to show that any nontrivial inner product space has a Hilbert basis. We leave the details to the reader.
Extreme care must be taken here not to confuse the concepts of a basis for a vector space and a Hilbert basis for an inner product space. To avoid confusion, a vector space basis, that is, a maximal linearly independent set of vectors, is referred to as a Hamel basis. An orthonormal Hamel basis will be called an orthonormal basis, to distinguish it from a Hilbert basis. The following example shows that, in general, the two concepts of basis are not the same.
Example 9.4 Let $V = \ell^2$ and let $M$ be the set of all vectors of the form
$$e_k = (0, \dots, 0, 1, 0, 0, \dots)$$
where $e_k$ has a $1$ in the $k$th coordinate and $0$'s elsewhere. Clearly, $M$ is an orthonormal set. Moreover, it is maximal, for if $v = (x_n) \in \ell^2$ has the property that $v \perp M$ then
$$x_k = \langle v, e_k \rangle = 0$$
for all $k$ and so $v = 0$. Hence, no nonzero vector $v \notin M$ is orthogonal to $M$. This shows that $M$ is a Hilbert basis for the inner product space $\ell^2$. On the other hand, the vector space span of $M$ is the subspace $S$ of all sequences in $\ell^2$ that have finite support, that is, have only a finite number of nonzero terms, and since $\operatorname{span}(M) = S \ne \ell^2$, we see that $M$ is not a Hamel basis for the vector space $\ell^2$.
We will show in Chapter 13 that all Hilbert bases for an inner product space have the same cardinality, and so we can define the Hilbert dimension of an inner product space to be that cardinality. Once again, to avoid confusion, the cardinality of any Hamel basis for $V$ is referred to as the Hamel dimension of $V$. The Hamel dimension is, in general, not equal to the Hilbert dimension. However, as we will now show, they are equal when either dimension is finite.
Theorem 9.9 Let $V$ be an inner product space.
1) (Gram–Schmidt orthogonalization) If $\mathcal{B} = (v_1, v_2, \dots)$ is a linearly independent sequence in $V$, then there is an orthogonal sequence $\mathcal{O} = (u_1, u_2, \dots)$ in $V$ for which
$$\operatorname{span}(u_1, \dots, u_k) = \operatorname{span}(v_1, \dots, v_k)$$
for all $k$.
2) If $\dim(V) = n$ is finite then $V$ has a Hilbert basis of size $n$ and all Hilbert bases for $V$ have size $n$.
3) If $V$ has a finite Hilbert basis of size $n$, then $\dim(V) = n$.
Proof. To prove part 1), first let $u_1 = v_1$. Once the orthogonal set $\{u_1, \dots, u_k\}$ of nonzero vectors has been chosen so that
$$\operatorname{span}(u_1, \dots, u_k) = \operatorname{span}(v_1, \dots, v_k)$$
the next vector $u_{k+1}$ is chosen by setting
$$u_{k+1} = v_{k+1} + r_1 u_1 + \cdots + r_k u_k$$
and requiring that $u_{k+1}$ be orthogonal to each $u_i$ for $i \le k$, that is,
$$0 = \langle u_{k+1}, u_i \rangle = \langle v_{k+1} + r_1 u_1 + \cdots + r_k u_k, u_i \rangle = \langle v_{k+1}, u_i \rangle + r_i \langle u_i, u_i \rangle$$
or, finally,
$$r_i = -\frac{\langle v_{k+1}, u_i \rangle}{\langle u_i, u_i \rangle}$$
for all $i = 1, \dots, k$.
For part 2), applying the Gram–Schmidt orthogonalization process to a Hamel basis gives a Hilbert basis of the same size $n$. Moreover, if $V$ had a Hilbert basis of size greater than $n$, it would also have a Hamel basis of size greater than $n$ (orthonormal sets being linearly independent), which is not possible. Finally, if $V$ had a Hilbert basis $\mathcal{B}$ of size less than $n$ then $\mathcal{B}$ could be extended to a proper superset $\mathcal{R}$ that is also linearly independent. The Gram–Schmidt process applied to $\mathcal{R}$ would give a proper superset of $\mathcal{B}$ that is orthonormal, which is not possible. Hence, all Hilbert bases have size $n$.
For part 3), suppose that $\dim(V) > n$. Since a Hilbert basis $\mathcal{H}$ of size $n$ is linearly independent, we can adjoin a new vector to $\mathcal{H}$ to get a linearly independent set of size $n + 1$. Applying the Gram–Schmidt process to this set gives an orthonormal set that properly contains $\mathcal{H}$, which is not possible.
For reference, let us state the Gram–Schmidt orthogonalization process separately and give an example of its use.
Theorem 9.10 (The Gram–Schmidt orthogonalization process) If $\mathcal{B} = (v_1, v_2, \dots)$ is a sequence of linearly independent vectors in an inner product space $V$, then the sequence $\mathcal{O} = (u_1, u_2, \dots)$ defined by
$$u_k = v_k - \sum_{i=1}^{k-1} \frac{\langle v_k, u_i \rangle}{\langle u_i, u_i \rangle} u_i$$
is an orthogonal sequence in $V$ with the property that
$$\operatorname{span}(u_1, \dots, u_k) = \operatorname{span}(v_1, \dots, v_k)$$
for all $k$.
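The recurrence of Theorem 9.10 translates directly into code. The following Python sketch is our own illustration (real case, standard dot product), not part of the text:

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt, following Theorem 9.10.
    Input: a list of linearly independent real vectors."""
    us = []
    for v in vectors:
        u = v - sum((np.dot(v, w) / np.dot(w, w)) * w for w in us)
        us.append(u)
    return us

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
us = gram_schmidt(vs)
# The output vectors are pairwise orthogonal:
assert all(abs(np.dot(us[i], us[j])) < 1e-12
           for i in range(3) for j in range(i + 1, 3))
```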
Of course, from the orthogonal sequence $(u_k)$, we get the orthonormal sequence $(w_k)$, where $w_k = u_k / \|u_k\|$.
Example 9.5 Consider the inner product space $\mathbb{R}[x]$ of real polynomials, with inner product defined by
$$\langle p(x), q(x) \rangle = \int_{-1}^1 p(x)q(x)\,dx$$
Applying the Gram–Schmidt process to the sequence $\mathcal{B} = (1, x, x^2, x^3, \dots)$ gives
$$u_1(x) = 1$$
$$u_2(x) = x - \frac{\int_{-1}^1 x\,dx}{\int_{-1}^1 dx} \cdot 1 = x$$
$$u_3(x) = x^2 - \frac{\int_{-1}^1 x^2\,dx}{\int_{-1}^1 dx} \cdot 1 - \frac{\int_{-1}^1 x^3\,dx}{\int_{-1}^1 x^2\,dx} \cdot x = x^2 - \frac{1}{3}$$
$$u_4(x) = x^3 - \frac{3}{5}x$$
and so on. The polynomials in this sequence are (at least up to multiplicative constants) the Legendre polynomials.
Orthonormal bases have a great advantage over arbitrary bases. From a computational point of view, if $\mathcal{B} = \{v_1, \dots, v_n\}$ is a basis for $V$ then each $v \in V$ has the form
$$v = r_1 v_1 + \cdots + r_n v_n$$
In general, however, determining the coordinates $r_i$ requires solving a system of $n$ linear equations in $n$ unknowns. On the other hand, if $\mathcal{O} = \{u_1, \dots, u_n\}$ is an orthonormal basis for $V$ and
$$v = r_1 u_1 + \cdots + r_n u_n$$
then the coefficients are quite easily computed:
$$\langle v, u_i \rangle = \langle r_1 u_1 + \cdots + r_n u_n, u_i \rangle = r_i \langle u_i, u_i \rangle = r_i$$
Even if $\mathcal{O} = \{u_1, \dots, u_k\}$ is not a basis (but just an orthonormal set), we can still consider the expansion
$$\hat{v} = \langle v, u_1 \rangle u_1 + \cdots + \langle v, u_k \rangle u_k$$
Proof of the following characterization of orthonormal (Hamel) bases is left to the reader.
Theorem 9.11 Let $\mathcal{O} = \{u_1, \dots, u_k\}$ be an orthonormal set of vectors in a finite-dimensional inner product space $V$. For any $v \in V$, the Fourier expansion of $v$ with respect to $\mathcal{O}$ is
$$\hat{v} = \langle v, u_1 \rangle u_1 + \cdots + \langle v, u_k \rangle u_k$$
In this case, Bessel's inequality holds for all $v \in V$, that is,
$$\|\hat{v}\| \le \|v\|$$
Moreover, the following are equivalent:
1) The set $\mathcal{O}$ is an orthonormal basis for $V$.
2) Every vector is equal to its Fourier expansion, that is, for all $v \in V$,
$$\hat{v} = v$$
3) Bessel's identity holds for all $v \in V$, that is,
$$\|\hat{v}\| = \|v\|$$
4) Parseval's identity holds for all $v, w \in V$, that is,
$$\langle v, w \rangle = \langle v, u_1 \rangle \overline{\langle w, u_1 \rangle} + \cdots + \langle v, u_k \rangle \overline{\langle w, u_k \rangle}$$
The Projection Theorem and Best Approximations
We have seen that if $S$ is a subspace of an inner product space $V$ then $S \cap S^\perp = \{0\}$. This raises the question of whether or not the orthogonal complement $S^\perp$ is a vector space complement of $S$, that is, whether or not $V = S \oplus S^\perp$. If $S$ is a finite-dimensional subspace of $V$, the answer is yes, but for infinite-dimensional subspaces, $S$ must have the topological property of being complete. Hence, in accordance with our goals in this chapter, we will postpone a discussion of the general case to Chapter 13, contenting ourselves here with an example to show that, in general, $V \ne S \oplus S^\perp$.
Example 9.6 As in Example 9.4, let $V = \ell^2$ and let $S$ be the subspace of all sequences of finite support, that is, $S$ is spanned by the vectors
$$e_k = (0, \dots, 0, 1, 0, 0, \dots)$$
where $e_k$ has a $1$ in the $k$th coordinate and $0$'s elsewhere. If $x = (x_n) \perp S$ then
$$x_k = \langle x, e_k \rangle = 0$$
for all $k$ and so $x = 0$. Therefore, $S^\perp = \{0\}$. However,
$$S \oplus S^\perp = S \ne \ell^2$$
As the next theorem shows, in the finite-dimensional case, orthogonal complements are also vector space complements. This theorem is often called the projection theorem, for reasons that will become apparent when we discuss projection operators. (We will discuss the projection theorem in the infinite-dimensional case in Chapter 13.)
Theorem 9.12 (The projection theorem) If $S$ is a finite-dimensional subspace of an inner product space $V$ (which need not be finite-dimensional) then
$$V = S \oplus S^\perp$$
Proof. Let $\mathcal{O} = \{u_1, \dots, u_k\}$ be an orthonormal basis for $S$. For each $v \in V$, consider the Fourier expansion
$$\hat{v} = \langle v, u_1 \rangle u_1 + \cdots + \langle v, u_k \rangle u_k$$
with respect to $\mathcal{O}$. We may write
$$v = \hat{v} + (v - \hat{v})$$
where $\hat{v} \in S$. Moreover, $v - \hat{v} \in S^\perp$, since
$$\langle v - \hat{v}, u_i \rangle = \langle v, u_i \rangle - \langle \hat{v}, u_i \rangle = 0$$
Hence $V = S + S^\perp$. We have already observed that $S \cap S^\perp = \{0\}$ and so $V = S \oplus S^\perp$.
According to the proof of the projection theorem, the component of $v$ that lies in $S$ is just the Fourier expansion of $v$ with respect to any orthonormal basis $\mathcal{O}$ for $S$.
Best Approximations
The projection theorem implies that if $v = \hat{v} + w$, where $\hat{v} \in S$ and $w \in S^\perp$, then $\hat{v}$ is the element of $S$ that is closest to $v$, that is, $\hat{v}$ is the best approximation to $v$ from within $S$. For if $t \in S$ then, since $v - \hat{v} \in S^\perp$, we have $(v - \hat{v}) \perp (\hat{v} - t)$ and so
$$\|v - t\|^2 = \|v - \hat{v} + \hat{v} - t\|^2 = \|v - \hat{v}\|^2 + \|\hat{v} - t\|^2$$
It follows that $\|v - t\|$ is smallest when $t = \hat{v}$. Also, note that $\hat{v}$ is the unique vector in $S$ for which $v - \hat{v} \in S^\perp$. Thus, we can say that the best approximation to $v$ from within $S$ is the unique vector $\hat{v} \in S$ for which $(v - \hat{v}) \in S^\perp$, and that this vector is the Fourier expansion $\hat{v}$ of $v$.
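The best approximation is easy to compute once an orthonormal basis for $S$ is in hand. The following Python sketch is our own illustration (not from the text); the subspace and the competing vector are arbitrary choices.

```python
import numpy as np

# Orthonormal basis for a plane S in R^3, via QR on spanning vectors.
Q = np.linalg.qr(np.array([[1.0, 0.0],
                           [1.0, 1.0],
                           [0.0, 1.0]]))[0]
v = np.array([1.0, 2.0, 3.0])

v_hat = Q @ (Q.T @ v)                 # Fourier expansion of v w.r.t. S
assert np.allclose(Q.T @ (v - v_hat), 0)     # v - v_hat lies in S-perp

# v_hat beats any other element of S, e.g. an arbitrary one:
t = Q @ np.array([0.3, -0.7])
assert np.linalg.norm(v - v_hat) <= np.linalg.norm(v - t)
```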
Orthogonal Direct Sums
Definition Let $V$ be an inner product space and let $S_1, \dots, S_k$ be subspaces of $V$. Then $V$ is the orthogonal direct sum of $S_1, \dots, S_k$, written
$$V = S_1 \odot \cdots \odot S_k$$
if
1) $V = S_1 \oplus \cdots \oplus S_k$
2) $S_i \perp S_j$ for $i \ne j$
In general, to say that the orthogonal direct sum $S_1 \odot \cdots \odot S_k$ of subspaces exists is to say that the direct sum $S_1 \oplus \cdots \oplus S_k$ exists and that 2) holds.
Theorem 9.12 states that $V = S \odot S^\perp$, for any finite-dimensional subspace $S$ of an inner product space $V$. The following simple result is very useful.
Theorem 9.13 Let $V$ be an inner product space. The following are equivalent:
1) $V = S \odot T$
2) $V = S \oplus T$ and $T = S^\perp$
3) $V = S \oplus T$ and $T \subseteq S^\perp$
Proof. Suppose 1) holds. Then $V = S \oplus T$ and $S \perp T$, which implies that $T \subseteq S^\perp$. But if $w \in S^\perp$ then $w = s + t$ for $s \in S$, $t \in T$ and so
$$0 = \langle s, w \rangle = \langle s, s \rangle + \langle s, t \rangle = \langle s, s \rangle$$
Hence $s = 0$ and $w \in T$, which implies that $S^\perp \subseteq T$. Hence, $S^\perp = T$, which gives 2). Of course, 2) implies 3). Finally, if 3) holds then $T \subseteq S^\perp$, which implies that $S \perp T$, and so 1) holds.
Theorem 9.14 Let $V$ be an inner product space.
1) If $\dim(V) < \infty$ and $S$ is a subspace of $V$ then
$$\dim(S^\perp) = \dim(V) - \dim(S)$$
2) If $S$ is a finite-dimensional subspace of $V$ then $S^{\perp\perp} = S$.
3) If $X$ is a subset of $V$ and $\dim(\operatorname{span}(X)) < \infty$ then $X^{\perp\perp} = \operatorname{span}(X)$.
Proof. Since $V = S \oplus S^\perp$, we have $\dim(V) = \dim(S) + \dim(S^\perp)$, which proves part 1). As for part 2), it is clear that $S \subseteq S^{\perp\perp}$. On the other hand, if $v \in S^{\perp\perp}$ then by the projection theorem
$$v = s + s'$$
where $s \in S$ and $s' \in S^\perp$. But $v \in S^{\perp\perp}$ implies that
$$0 = \langle v, s' \rangle = \langle s', s' \rangle$$
and so $s' = 0$, showing that $v \in S$. Therefore, $S^{\perp\perp} \subseteq S$ and $S^{\perp\perp} = S$. We leave the proof of part 3) as an exercise.
The Riesz Representation Theorem
If $x$ is a vector in an inner product space $V$ then the function $f_x : V \to F$ defined by
$$f_x(v) = \langle v, x \rangle$$
is easily seen to be a linear functional on $V$. The following theorem shows that all linear functionals on a finite-dimensional inner product space $V$ have this form. (We will see in Chapter 13 that, in the infinite-dimensional case, all continuous linear functionals on $V$ have this form.)
Theorem 9.15 (The Riesz representation theorem) Let $V$ be a finite-dimensional inner product space and let $f \in V^*$ be a linear functional on $V$. Then there exists a unique vector $x \in V$ for which
$$f(v) = \langle v, x \rangle \tag{9.3}$$
for all $v \in V$. Let us call $x$ the Riesz vector for $f$ and denote it by $R_f$. (This term and notation are not standard.)
Proof. If $f$ is the zero functional, we may take $x = 0$, so let us assume that $f \ne 0$. Then $K = \ker(f)$ has codimension $1$ and so
$$V = \langle w \rangle \odot K$$
for some nonzero $w \perp K$. If $x = cw$ for a scalar $c$, then both sides of (9.3) vanish on $K$, and since any $v \in V$ has the form $v = aw + k$ for $a \in F$ and $k \in K$, equation (9.3) holds for all $v$ if and only if
$$f(w) = \langle w, cw \rangle = \bar{c}\|w\|^2$$
Hence, we may take $\bar{c} = f(w)/\|w\|^2$, that is,
$$x = \frac{\overline{f(w)}}{\|w\|^2}\,w$$
Proof of uniqueness is left as an exercise.
If $V = \mathbb{R}^n$, then it is easy to see that
$$R_f = (f(e_1), \dots, f(e_n))$$
where $(e_1, \dots, e_n)$ is the standard basis for $\mathbb{R}^n$.
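In the complex case the Riesz vector picks up a conjugate. The following Python sketch is our own illustration, not from the text: on $\mathbb{C}^n$ with $\langle u, v \rangle = \sum u_i \bar{v}_i$, a functional $f(v) = \sum a_i v_i$ has Riesz vector $R_f = \bar{a}$.

```python
import numpy as np

a = np.array([1 + 2j, -1j, 0.5])      # hypothetical coefficient vector
f = lambda v: a @ v                   # the linear functional f(v) = sum a_i v_i
R_f = np.conj(a)                      # its Riesz vector

v = np.array([2.0 - 1j, 3.0, 1j])
# <v, R_f> = sum v_i * conj(R_f_i) = sum v_i a_i = f(v)
assert np.isclose(f(v), np.vdot(R_f, v))
```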
Using the Riesz representation theorem, we can define a map $\phi : V^* \to V$ by setting $\phi(f) = R_f$, where $R_f$ is the Riesz vector for $f$. Since
$$\langle v, \phi(rf + sg) \rangle = (rf + sg)(v) = rf(v) + sg(v) = r\langle v, \phi(f) \rangle + s\langle v, \phi(g) \rangle = \langle v, \bar{r}\phi(f) + \bar{s}\phi(g) \rangle$$
for all $v \in V$, we have
$$\phi(rf + sg) = \bar{r}\phi(f) + \bar{s}\phi(g)$$
and so $\phi$ is conjugate linear. Since $\phi$ is bijective, the map $\phi : V^* \to V$ is a "conjugate isomorphism."
Exercises
1. Verify the statement concerning equality in the triangle inequality.
2. Prove the parallelogram law.
3. Prove the Apollonius identity
$$\|w - u\|^2 + \|w - v\|^2 = \frac{1}{2}\|u - v\|^2 + 2\left\|w - \frac{1}{2}(u + v)\right\|^2$$
4. Let $V$ be an inner product space with basis $\mathcal{B}$. Show that the inner product is uniquely defined by the values $\langle u, v \rangle$, for all $u, v \in \mathcal{B}$.
5. Prove that two vectors $u$ and $v$ in a real inner product space $V$ are orthogonal if and only if
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2$$
6. Show that an isometry is injective.
7. Use Zorn's lemma to show that any nontrivial inner product space has a Hilbert basis.
8. Prove Bessel's inequality.
9. Prove that an orthonormal set $\mathcal{O}$ is a basis for $V$ if and only if $\hat{v} = v$, for all $v \in V$.
10. Prove that an orthonormal set $\mathcal{O}$ is a basis for $V$ if and only if Bessel's identity holds for all $v \in V$, that is, if and only if $\|\hat{v}\| = \|v\|$ for all $v \in V$.
11. Prove that an orthonormal set $\mathcal{O}$ is a basis for $V$ if and only if Parseval's identity holds for all $v, w \in V$, that is, if and only if
$$\langle v, w \rangle = \langle v, u_1 \rangle \overline{\langle w, u_1 \rangle} + \cdots + \langle v, u_k \rangle \overline{\langle w, u_k \rangle}$$
for all $v, w \in V$.
12. Let $u = (r_1, \dots, r_n)$ and $v = (s_1, \dots, s_n)$ be in $\mathbb{R}^n$. The Cauchy–Schwarz inequality states that
$$|r_1 s_1 + \cdots + r_n s_n|^2 \le (r_1^2 + \cdots + r_n^2)(s_1^2 + \cdots + s_n^2)$$
Prove that we can do better:
$$(|r_1 s_1| + \cdots + |r_n s_n|)^2 \le (r_1^2 + \cdots + r_n^2)(s_1^2 + \cdots + s_n^2)$$
13. Let $V$ be a finite-dimensional inner product space. Prove that for any subset $X$ of $V$, we have $X^{\perp\perp} = \operatorname{span}(X)$.
14. Let $\mathcal{P}_3$ be the inner product space of all polynomials of degree at most $3$, under the inner product
$$\langle p(x), q(x) \rangle = \int_{-\infty}^\infty p(x)q(x)e^{-x^2}\,dx$$
Apply the Gram–Schmidt process to the basis $\{1, x, x^2, x^3\}$, thereby computing the first four Hermite polynomials (at least up to a multiplicative constant).
15. Verify uniqueness in the Riesz representation theorem.
16. Let $V$ be a complex inner product space and let $S$ be a subspace of $V$. Suppose that $v \in V$ is a vector for which
$$\langle v, s \rangle + \langle s, v \rangle \le \langle s, s \rangle$$
for all $s \in S$. Prove that $v \perp S$.
17. If $V$ and $W$ are inner product spaces, consider the function on $V \times W$ defined by
$$\langle (v_1, w_1), (v_2, w_2) \rangle = \langle v_1, v_2 \rangle + \langle w_1, w_2 \rangle$$
Is this an inner product on $V \times W$?
18. A normed vector space over $\mathbb{R}$ or $\mathbb{C}$ is a vector space (over $\mathbb{R}$ or $\mathbb{C}$) together with a function $\|\cdot\| : V \to \mathbb{R}$ for which, for all $u, v \in V$ and scalars $r$, we have
a) $\|rv\| = |r|\|v\|$
b) $\|u + v\| \le \|u\| + \|v\|$
c) $\|v\| = 0$ if and only if $v = 0$
If $V$ is a real normed space (over $\mathbb{R}$) and if the norm satisfies the parallelogram law
$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$$
prove that the polarization identity
$$\langle u, v \rangle = \frac{1}{4}\big(\|u + v\|^2 - \|u - v\|^2\big)$$
defines an inner product on $V$.
19. Let $S$ be a subspace of an inner product space $V$. Prove that each coset in $V/S$ contains exactly one vector that is orthogonal to $S$.
Extensions of Linear Functionals
20. Let $f$ be a linear functional on a subspace $S$ of a finite-dimensional inner product space $V$, so that $f(v) = \langle v, R_f \rangle$ for all $v \in S$. Suppose that $g \in V^*$ is an extension of $f$, that is, $g|_S = f$. What is the relationship between the Riesz vectors $R_f$ and $R_g$?
21. Let $f$ be a linear functional on a subspace $S$ of a finite-dimensional inner product space $V$ and let $K = \ker(f)$. Show that if $g \in V^*$ is an extension of $f$ then $R_g \in K^\perp$. Moreover, for each vector $u \in K^\perp \setminus S^\perp$ there is exactly one scalar $r$ for which the linear functional $g(x) = \langle x, ru \rangle$ is an extension of $f$.
Positive Linear Functionals on $\mathbb{R}^n$
A vector $v = (r_1, \dots, r_n)$ in $\mathbb{R}^n$ is nonnegative (also called positive), written $v \ge 0$, if $r_i \ge 0$ for all $i$. The vector $v$ is strictly positive, written $v > 0$, if $v$ is nonnegative but not $0$. The set $\mathbb{R}^n_+$ of all nonnegative vectors in $\mathbb{R}^n$ is called the nonnegative orthant in $\mathbb{R}^n$. The vector $v$ is strongly positive, written $v \gg 0$, if $r_i > 0$ for all $i$. The set $\mathbb{R}^n_{++}$ of all strongly positive vectors in $\mathbb{R}^n$ is the strongly positive orthant in $\mathbb{R}^n$.
Let $f : S \to \mathbb{R}$ be a linear functional on a subspace $S$ of $\mathbb{R}^n$. Then $f$ is nonnegative (also called positive), written $f \ge 0$, if
$$v \ge 0 \Rightarrow f(v) \ge 0$$
for all $v \in S$, and $f$ is strictly positive, written $f > 0$, if
$$v > 0 \Rightarrow f(v) > 0$$
for all $v \in S$.
22. Prove that a linear functional $f$ on $\mathbb{R}^n$ is positive if and only if $R_f \ge 0$ and strictly positive if and only if $R_f \gg 0$. If $S$ is a subspace of $\mathbb{R}^n$, is it true that a linear functional $f$ on $S$ is nonnegative if and only if $R_f \ge 0$?
23. Let $f : S \to \mathbb{R}$ be a strictly positive linear functional on a subspace $S$ of $\mathbb{R}^n$. Prove that $f$ has a strictly positive extension to $\mathbb{R}^n$. Use the fact that if $U \cap \mathbb{R}^n_+ = \{0\}$, where $\mathbb{R}^n_+ = \{(r_1, \dots, r_n) : r_i \ge 0 \text{ for all } i\}$ and $U$ is a subspace of $\mathbb{R}^n$, then $U^\perp$ contains a strongly positive vector.
24. If $V$ is a real inner product space, then we can define an inner product on its complexification $V^\mathbb{C}$ as follows (this is the same formula as for the ordinary inner product on a complex vector space):
$$\langle u + iv, x + iy \rangle = \langle u, x \rangle + \langle v, y \rangle + i(\langle v, x \rangle - \langle u, y \rangle)$$
Show that
$$\|u + iv\|^2 = \|u\|^2 + \|v\|^2$$
where the norm on the left is induced by the inner product on $V^\mathbb{C}$ and the norm on the right is induced by the inner product on $V$.
Chapter 10
Structure Theory for Normal Operators
Throughout this chapter, all vector spaces are assumed to be finite-dimensional unless otherwise noted.
The Adjoint of a Linear Operator
The purpose of this chapter is to study the structure of certain special types of linear operators on finite-dimensional inner product spaces. In order to define these operators, we introduce another type of adjoint (different from the operator adjoint of Chapter 3).
Theorem 10.1 Let $V$ and $W$ be finite-dimensional inner product spaces over $F$ and let $\tau \in \mathcal{L}(V, W)$. Then there is a unique function $\tau^* : W \to V$, defined by the condition
$$\langle \tau(v), w \rangle = \langle v, \tau^*(w) \rangle$$
for all $v \in V$ and $w \in W$. This function is in $\mathcal{L}(W, V)$ and is called the adjoint of $\tau$.
Proof. For a fixed $w \in W$, consider the function $f_w : V \to F$ defined by
$$f_w(v) = \langle \tau(v), w \rangle$$
It is easy to verify that $f_w$ is a linear functional on $V$ and so, by the Riesz representation theorem, there exists a unique vector $x \in V$ for which $f_w(v) = \langle v, x \rangle$ for all $v \in V$. Hence, if $\tau^*(w) = x$ then
$$\langle \tau(v), w \rangle = \langle v, \tau^*(w) \rangle$$
for all $v \in V$. This establishes the existence and uniqueness of $\tau^*$. To show that $\tau^*$ is linear, observe that
$$\begin{aligned}
\langle v, \tau^*(rw + su) \rangle &= \langle \tau(v), rw + su \rangle \\
&= \bar{r}\langle \tau(v), w \rangle + \bar{s}\langle \tau(v), u \rangle \\
&= \bar{r}\langle v, \tau^*(w) \rangle + \bar{s}\langle v, \tau^*(u) \rangle \\
&= \langle v, r\tau^*(w) + s\tau^*(u) \rangle
\end{aligned}$$
for all $v \in V$ and so
$$\tau^*(rw + su) = r\tau^*(w) + s\tau^*(u)$$
Hence $\tau^* \in \mathcal{L}(W, V)$.
Here are some of the basic properties of the adjoint. Proof is left to the reader.
Theorem 10.2 Let $V$ and $W$ be finite-dimensional inner product spaces. For every $\sigma, \tau \in \mathcal{L}(V, W)$ and $r \in F$:
1) $(\sigma + \tau)^* = \sigma^* + \tau^*$
2) $(r\tau)^* = \bar{r}\tau^*$
3) $\tau^{**} = \tau$
4) If $V = W$ then $(\sigma\tau)^* = \tau^*\sigma^*$
5) If $\tau$ is invertible then $(\tau^{-1})^* = (\tau^*)^{-1}$
6) If $V = W$ and $p(x) \in \mathbb{R}[x]$ then $p(\tau)^* = p(\tau^*)$
Now let us relate the kernel and image of a linear transformation to those of its adjoint.
Theorem 10.3 Let $\tau \in \mathcal{L}(V, W)$, where $V$ and $W$ are finite-dimensional inner product spaces. Then
1) $\ker(\tau^*) = \operatorname{im}(\tau)^\perp$
2) $\operatorname{im}(\tau^*) = \ker(\tau)^\perp$
3) $\tau$ is injective if and only if $\tau^*$ is surjective.
4) $\tau$ is surjective if and only if $\tau^*$ is injective.
5) $\ker(\tau^*\tau) = \ker(\tau)$
6) $\ker(\tau\tau^*) = \ker(\tau^*)$
7) $\operatorname{im}(\tau^*\tau) = \operatorname{im}(\tau^*)$
8) $\operatorname{im}(\tau\tau^*) = \operatorname{im}(\tau)$
9) If $\rho$ is projection onto $\operatorname{im}(\rho)$ along $\ker(\rho)$ then $\rho^*$ is projection onto $\ker(\rho)^\perp$ along $\operatorname{im}(\rho)^\perp$.
Proof. For part 1),
$$u \in \ker(\tau^*) \Leftrightarrow \tau^*(u) = 0 \Leftrightarrow \langle \tau^*(u), v \rangle = 0 \text{ for all } v \Leftrightarrow \langle u, \tau(v) \rangle = 0 \text{ for all } v \Leftrightarrow u \in \operatorname{im}(\tau)^\perp$$
Part 2) follows from part 1) by replacing $\tau$ by $\tau^*$ and taking orthogonal complements. Parts 3) and 4) follow from parts 1) and 2). For part 5), it is clear that $\tau(u) = 0$ implies that $\tau^*\tau(u) = 0$. For the converse, we have
$$\tau^*\tau(u) = 0 \Rightarrow \langle \tau^*\tau(u), u \rangle = 0 \Rightarrow \langle \tau(u), \tau(u) \rangle = 0 \Rightarrow \tau(u) = 0$$
Part 6) follows from part 5) by replacing $\tau$ with $\tau^*$. We leave parts 7)–9) for the reader.
The Operator Adjoint and the Hilbert Space Adjoint
We should make some remarks about the relationship between the operator adjoint $\tau^\times$ of $\tau$, as defined in Chapter 3, and the adjoint $\tau^*$ that we have just defined, which is sometimes called the Hilbert space adjoint. In the first place, if $\tau : V \to W$ then $\tau^\times$ and $\tau^*$ have different domains and ranges:
$$\tau^\times : W^* \to V^* \quad\text{but}\quad \tau^* : W \to V$$
These maps are shown in Figure 10.1, where $\phi_V : V^* \to V$ and $\phi_W : W^* \to W$ are the conjugate linear maps defined by the conditions
$$f(v) = \langle v, \phi_V(f) \rangle$$
for all $f \in V^*$ and $v \in V$, and
$$g(w) = \langle w, \phi_W(g) \rangle$$
for all $g \in W^*$ and $w \in W$, and whose existence is guaranteed by the Riesz representation theorem.
[Figure 10.1: the square of maps $\tau^\times : W^* \to V^*$ and $\tau^* : W \to V$, with the vertical Riesz maps $\phi_V : V^* \to V$ and $\phi_W : W^* \to W$]
The map $\sigma : W^* \to V^*$ defined by
$$\sigma = (\phi_V)^{-1}\tau^*\phi_W$$
is linear. Moreover, for all $g \in W^*$ and $v \in V$,
$$[\sigma(g)](v) = [(\phi_V)^{-1}\tau^*\phi_W(g)](v) = \langle v, \tau^*\phi_W(g) \rangle = \langle \tau(v), \phi_W(g) \rangle = g(\tau(v)) = [\tau^\times(g)](v)$$
and so $\sigma = \tau^\times$. Hence, the relationship between $\tau^\times$ and $\tau^*$ is
$$\tau^\times = (\phi_V)^{-1}\tau^*\phi_W$$
The functions $\phi_V$ and $\phi_W$ are like "change of variables" functions from linear functionals to vectors, and we can say, loosely speaking, that $\tau^*$ does to Riesz vectors what $\tau^\times$ does to the corresponding linear functionals.
In Chapter 3, we showed that the matrix of the operator adjoint $\tau^\times$ is the transpose of the matrix of the map $\tau$. For Hilbert space adjoints, the situation is slightly different (due to the conjugate linearity of the inner product). Suppose that $\mathcal{B} = (b_1, \dots, b_n)$ is an ordered orthonormal basis for $V$ and $\mathcal{C} = (c_1, \dots, c_m)$ is an ordered orthonormal basis for $W$. Then
$$([\tau^*]_{\mathcal{C},\mathcal{B}})_{i,j} = \langle \tau^*(c_j), b_i \rangle = \langle c_j, \tau(b_i) \rangle = \overline{\langle \tau(b_i), c_j \rangle} = \overline{([\tau]_{\mathcal{B},\mathcal{C}})_{j,i}}$$
and so $[\tau^*]_{\mathcal{C},\mathcal{B}}$ and $[\tau]_{\mathcal{B},\mathcal{C}}$ are conjugate transposes of one another. If $A = (a_{i,j})$ is a matrix over $F$, let us write the conjugate transpose as
$$A^* = (\bar{a}_{i,j})^t$$
Theorem 10.4 Let $\tau \in \mathcal{L}(V, W)$, where $V$ and $W$ are finite-dimensional inner product spaces.
1) The operator adjoint $\tau^\times$ and the Hilbert space adjoint $\tau^*$ are related by
$$\tau^\times = (\phi_V)^{-1}\tau^*\phi_W$$
where $\phi$ maps a linear functional $f$ to its Riesz vector $R_f$.
2) If $\mathcal{B}$ and $\mathcal{C}$ are ordered orthonormal bases for $V$ and $W$, respectively, then
$$[\tau^*]_{\mathcal{C},\mathcal{B}} = ([\tau]_{\mathcal{B},\mathcal{C}})^*$$
In words, the matrix of the adjoint $\tau^*$ is the conjugate transpose of the matrix of $\tau$.
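Theorem 10.4, part 2), is easy to check numerically. The following Python sketch is our own illustration (standard basis of $\mathbb{C}^3$, which is orthonormal; the random matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A_star = A.conj().T                  # the conjugate transpose

v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# Defining property of the adjoint: <Av, w> = <v, A* w>,
# with <x, y> = sum x_i * conj(y_i).
assert np.isclose(np.vdot(w, A @ v), np.vdot(A_star @ w, v))
```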
Unitary Diagonalizability
Recall that a linear operator $\tau \in \mathcal{L}(V)$ on a finite-dimensional vector space $V$ is diagonalizable if and only if $V$ has a basis consisting entirely of eigenvectors of $\tau$ or, equivalently, $V$ can be written as a direct sum of the eigenspaces of $\tau$:
$$V = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}$$
Of course, each eigenspace $E_{\lambda_i}$ has an orthonormal basis $\mathcal{O}_i$, but the union of these bases, while certainly a basis for $V$, need not be orthonormal.
Definition Let $V$ be a finite-dimensional inner product space and let $\tau \in \mathcal{L}(V)$. If there is an orthonormal basis $\mathcal{O}$ for $V$ for which $[\tau]_\mathcal{O}$ is a diagonal matrix, we say that $\tau$ is unitarily diagonalizable when $V$ is complex and orthogonally diagonalizable when $V$ is real.
For simplicity in exposition, we will tend to use the term unitarily diagonalizable for both cases. It is clear that the following statements are equivalent:
1) $\tau$ is unitarily diagonalizable
2) There is an orthonormal basis for $V$ consisting entirely of eigenvectors of $\tau$
3) $V$ can be written as an orthogonal direct sum of eigenspaces
$$V = E_{\lambda_1} \odot \cdots \odot E_{\lambda_k}$$
Since unitarily diagonalizable operators are so well behaved, it is natural to seek a characterization of such operators. Remarkably, there is a simple one. Let us first suppose that $\tau$ is unitarily diagonalizable and that $\mathcal{O}$ is an ordered orthonormal basis of eigenvectors for $\tau$. Then the matrix $[\tau]_\mathcal{O}$ is diagonal:
$$[\tau]_\mathcal{O} = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$$
and so
$$[\tau^*]_\mathcal{O} = \operatorname{diag}(\bar{\lambda}_1, \dots, \bar{\lambda}_n)$$
Clearly, $[\tau]_\mathcal{O}$ and $[\tau^*]_\mathcal{O}$ commute. Hence, $\tau$ and $\tau^*$ also commute, that is,
$$\tau\tau^* = \tau^*\tau$$
It is a surprising fact that the converse also holds on a complex inner product space, that is, if $\tau$ and $\tau^*$ commute then $\tau$ is unitarily diagonalizable. (Something similar holds for real inner product spaces, as we will see.)
Normal Operators
It is clear from the preceding discussion that the following concept is key.
Definition A linear operator $\tau$ on an inner product space $V$ is normal if it commutes with its adjoint:
$$\tau\tau^* = \tau^*\tau$$
Normal operators have some very nice properties.
Theorem 10.5 Let $\mathcal{N}$ be the set of normal operators on a finite-dimensional inner product space $V$. Then $\mathcal{N}$ satisfies the following properties:
1) (Closure under linear combinations of an operator and its adjoint) $r, s \in F$, $\tau \in \mathcal{N} \Rightarrow r\tau + s\tau^* \in \mathcal{N}$
2) (Closure under multiplication, under a commutativity requirement) $\sigma, \tau \in \mathcal{N}$, $\sigma\tau^* = \tau^*\sigma \Rightarrow \sigma\tau \in \mathcal{N}$
3) (Closure under inverses) $\tau \in \mathcal{N}$, $\tau$ invertible $\Rightarrow \tau^{-1} \in \mathcal{N}$
4) (Closure under polynomials) $\tau \in \mathcal{N} \Rightarrow p(\tau) \in \mathcal{N}$ for any $p(x) \in F[x]$
Moreover, if $\tau \in \mathcal{N}$ then
5) $\tau(v) = 0 \Leftrightarrow \tau^*(v) = 0$
6) $\tau^k(v) = 0 \Leftrightarrow \tau(v) = 0$
7) The minimal polynomial of $\tau$ is a product of distinct irreducible monic polynomials.
8) $\tau(v) = \lambda v \Leftrightarrow \tau^*(v) = \bar{\lambda}v$
9) Let $S$ and $T$ be submodules of $V$ whose orders are relatively prime real polynomials. Then $S \perp T$.
10) If $\lambda$ and $\mu$ are distinct eigenvalues of $\tau$ then $E_\lambda \perp E_\mu$.
Proof. We leave parts 1)–4) for the reader. Normality implies that
$$\|\tau(v)\|^2 = \langle \tau(v), \tau(v) \rangle = \langle \tau^*\tau(v), v \rangle = \langle \tau\tau^*(v), v \rangle = \langle \tau^*(v), \tau^*(v) \rangle = \|\tau^*(v)\|^2$$
and so part 5) follows. For part 6), let $\sigma = \tau^*\tau$. Then $\sigma$ has the property that
$$\langle \sigma(v), w \rangle = \langle \tau^*\tau(v), w \rangle = \langle \tau(v), \tau(w) \rangle = \langle v, \tau^*\tau(w) \rangle = \langle v, \sigma(w) \rangle$$
and so $\sigma^* = \sigma$. (We will discuss this property of being self-adjoint in detail later.) Now we can easily prove part 6) for $\sigma$. For if $\sigma^k(v) = 0$ for $k \ge 2$ then
$$0 = \langle \sigma^k(v), \sigma^{k-2}(v) \rangle = \langle \sigma^{k-1}(v), \sigma^{k-1}(v) \rangle$$
and so $\sigma^{k-1}(v) = 0$. Continuing in this way gives $\sigma(v) = 0$. Now, if $\tau^k(v) = 0$ for $k \ge 1$, then the normality of $\tau$ implies that
$$\sigma^k(v) = (\tau^*)^k\tau^k(v) = 0$$
and so $\sigma(v) = 0$. Hence
$$0 = \langle \sigma(v), v \rangle = \langle \tau^*\tau(v), v \rangle = \langle \tau(v), \tau(v) \rangle$$
and so $\tau(v) = 0$.
For part 7), suppose that
$$m_\tau(x) = p_1^{e_1}(x) \cdots p_k^{e_k}(x)$$
where $p_1(x), \dots, p_k(x)$ are distinct, irreducible and monic. If $e_i \ge 2$ then for any $v \in V$,
$$p_i(\tau)^2[q_i(\tau)v] = 0$$
where $q_i(x) = m_\tau(x)/p_i^2(x)$. Hence, since $p_i(\tau)$ is also normal, part 6) implies that
$$p_i(\tau)[q_i(\tau)v] = 0$$
for all $v \in V$ and so $p_i(\tau)q_i(\tau) = 0$, which is false since the polynomial $p_i(x)q_i(x)$ has degree less than that of $m_\tau(x)$. Hence, $e_i = 1$ for all $i$.
For part 8), using part 5) we have
$$\tau(v) = \lambda v \Leftrightarrow (\tau - \lambda\iota)(v) = 0 \Leftrightarrow (\tau - \lambda\iota)^*(v) = 0 \Leftrightarrow \tau^*(v) = \bar{\lambda}v$$
For part 9), let $\operatorname{ann}(S) = \langle p(x) \rangle$ and $\operatorname{ann}(T) = \langle q(x) \rangle$. Then there are real polynomials $a(x)$ and $b(x)$ for which
$$a(x)p(x) + b(x)q(x) = 1$$
If $u \in S$ and $v \in T$ then, since $p(\tau)u = 0$ implies that $p(\tau^*)u = 0$, we have
$$\langle u, v \rangle = \langle [a(\tau^*)p(\tau^*) + b(\tau^*)q(\tau^*)]u, v \rangle = \langle b(\tau^*)q(\tau^*)u, v \rangle = \langle u, q(\tau)b(\tau)v \rangle = 0$$
Hence, $S \perp T$.
For part 10), we have, for $v \in E_\lambda$ and $w \in E_\mu$,
$$\lambda\langle v, w \rangle = \langle \tau(v), w \rangle = \langle v, \tau^*(w) \rangle = \langle v, \bar{\mu}w \rangle = \mu\langle v, w \rangle$$
and since $\lambda \ne \mu$ we get $\langle v, w \rangle = 0$.
Special Types of Normal Operators
Before discussing the structure of normal operators, we want to introduce some special types of normal operators that will play an important role in the theory.
Definition Let $V$ be an inner product space and let $\tau \in \mathcal{L}(V)$.
1) $\tau$ is self-adjoint (also called Hermitian in the complex case and symmetric in the real case) if
$$\tau^* = \tau$$
2) $\tau$ is called skew-Hermitian in the complex case and skew-symmetric in the real case if
$$\tau^* = -\tau$$
3) $\tau$ is called unitary in the complex case and orthogonal in the real case if $\tau$ is invertible and
$$\tau^* = \tau^{-1}$$
There are also matrix versions of these definitions, obtained simply by replacing the operator $\tau$ by a matrix $A$. In the finite-dimensional case, we have seen that
$$[\tau^*]_\mathcal{O} = [\tau]_\mathcal{O}^*$$
for any ordered orthonormal basis $\mathcal{O}$ of $V$, and so if $\tau$ is normal then
$$[\tau]_\mathcal{O}[\tau]_\mathcal{O}^* = [\tau]_\mathcal{O}[\tau^*]_\mathcal{O} = [\tau\tau^*]_\mathcal{O} = [\tau^*\tau]_\mathcal{O} = [\tau^*]_\mathcal{O}[\tau]_\mathcal{O} = [\tau]_\mathcal{O}^*[\tau]_\mathcal{O}$$
which implies that the matrix $[\tau]_\mathcal{O}$ of $\tau$ is normal. The converse holds as well. In fact, we can say that $\tau$ is normal (respectively: Hermitian, symmetric, skew-Hermitian, unitary, orthogonal) if and only if any matrix that represents $\tau$ with respect to an ordered orthonormal basis $\mathcal{O}$ is normal (respectively: Hermitian, symmetric, skew-Hermitian, unitary, orthogonal).
In some sense, square complex matrices are a generalization of complex numbers, and the adjoint (conjugate transpose) of a matrix seems to be a generalization of the complex conjugate. In looking for a tighter analogy, one that will lead to some useful mnemonics, we could consider just the diagonal matrices, but this is a bit too tight. The next logical choice is the normal operators.
Among the complex numbers, there are some special subsets: the real numbers, the positive numbers and the numbers on the unit circle. We will soon see that a normal operator is self-adjoint if and only if its complex eigenvalues are all real. This suggests that the analog of the set of real numbers is the set of self-adjoint operators. Also, we will see that a normal operator is unitary if and only if all of its eigenvalues have norm $1$, so the numbers on the unit circle seem to correspond to the set of unitary operators. Of course, this is just an analogy.
Self-Adjoint Operators
Let us consider some properties of self-adjoint operators. The quadratic form associated with the linear operator $\tau$ is the function $Q_\tau : V \to F$ defined by
$$Q_\tau(v) = \langle \tau(v), v \rangle$$
We have seen that, in a complex inner product space, $\tau = 0$ if and only if $Q_\tau = 0$, but this does not hold, in general, for real inner product spaces. However, it does hold for symmetric operators on a real inner product space.
Theorem 10.6 Let $\mathcal{H}$ be the set of self-adjoint operators on a finite-dimensional inner product space $V$. Then $\mathcal{H}$ satisfies the following properties:
1) (Closure under addition) $\sigma, \tau \in \mathcal{H} \Rightarrow \sigma + \tau \in \mathcal{H}$
2) (Closure under real scalar multiplication) $r \in \mathbb{R}$, $\tau \in \mathcal{H} \Rightarrow r\tau \in \mathcal{H}$
3) (Closure under multiplication if the factors commute) $\sigma, \tau \in \mathcal{H}$, $\sigma\tau = \tau\sigma \Rightarrow \sigma\tau \in \mathcal{H}$
4) (Closure under inverses) $\tau \in \mathcal{H}$, $\tau$ invertible $\Rightarrow \tau^{-1} \in \mathcal{H}$
5) (Closure under real polynomials) $\tau \in \mathcal{H} \Rightarrow p(\tau) \in \mathcal{H}$ for any $p(x) \in \mathbb{R}[x]$
6) A complex operator $\tau$ is Hermitian if and only if $Q_\tau(v)$ is real for all $v \in V$.
7) If $F = \mathbb{C}$, or if $F = \mathbb{R}$ and $\tau$ is symmetric, then $\tau = 0$ if and only if $Q_\tau = 0$.
8) If $\tau$ is self-adjoint, then the characteristic polynomial of $\tau$ splits over $\mathbb{R}$, and so all complex eigenvalues of $\tau$ are real.
Proof. For part 6), if $\tau$ is Hermitian then
$$\overline{\langle \tau(v), v \rangle} = \langle v, \tau(v) \rangle = \langle \tau(v), v \rangle$$
and so $Q_\tau(v) = \langle \tau(v), v \rangle$ is real. Conversely, if $\langle \tau(v), v \rangle \in \mathbb{R}$ then
$$\langle v, \tau(v) \rangle = \overline{\langle \tau(v), v \rangle} = \langle \tau(v), v \rangle = \langle v, \tau^*(v) \rangle$$
and so $\langle v, (\tau - \tau^*)(v) \rangle = 0$ for all $v \in V$, whence $\tau - \tau^* = 0$, which shows that $\tau$ is Hermitian.
As for part 7), the first case ($F = \mathbb{C}$) is just Theorem 9.6, so we need only consider the real case, for which
$$0 = \langle \tau(x + y), x + y \rangle = \langle \tau(x), x \rangle + \langle \tau(y), y \rangle + \langle \tau(x), y \rangle + \langle \tau(y), x \rangle = \langle \tau(x), y \rangle + \langle x, \tau(y) \rangle = 2\langle \tau(x), y \rangle$$
for all $x, y \in V$, and so $\tau = 0$.
For part 8), if $\tau$ is Hermitian ($F = \mathbb{C}$) and $\tau(v) = \lambda v$ with $v \ne 0$ then
$$\lambda\langle v, v \rangle = \langle \tau(v), v \rangle = Q_\tau(v)$$
is real by part 6), and so $\lambda$ must be real. If $\tau$ is symmetric ($F = \mathbb{R}$), we must be a bit careful, since if $\lambda$ is a complex root of $C_\tau(x)$, it does not follow that $\tau(v) = \lambda v$ for some $0 \ne v \in V$. However, we can proceed as follows. Let $\tau$ be represented by the matrix $A$, with respect to some ordered basis for $V$. Then $C_\tau(x) = C_A(x)$. Now, $A$ is a real symmetric matrix, but it can be thought of as a complex Hermitian matrix that happens to have real entries. As such, it represents a Hermitian linear operator on the complex space $\mathbb{C}^n$ and so, by what we have just shown, all (complex) roots of its characteristic polynomial are real. But the characteristic polynomial of $A$ is the same whether we think of $A$ as a real or a complex matrix, and so the result follows.
Unitary Operators and Isometries
We now turn to the basic properties of unitary operators. These are the workhorse operators, in that a unitary operator is precisely a normal operator that maps orthonormal bases to orthonormal bases. Note that $\tau$ is unitary if and only if
$$\langle \tau(v), w \rangle = \langle v, \tau^{-1}(w) \rangle$$
for all $v, w \in V$.
Theorem 10.7 Let $\mathcal{U}$ be the set of unitary operators on a finite-dimensional inner product space $V$. Then $\mathcal{U}$ satisfies the following properties:
1) (Closure under scalar multiplication by complex numbers of norm $1$) $\lambda \in \mathbb{C}$, $|\lambda| = 1$ and $\tau \in \mathcal{U} \Rightarrow \lambda\tau \in \mathcal{U}$
2) (Closure under multiplication) $\sigma, \tau \in \mathcal{U} \Rightarrow \sigma\tau \in \mathcal{U}$
3) (Closure under inverses) $\tau \in \mathcal{U} \Rightarrow \tau^{-1} \in \mathcal{U}$
4) $\tau$ is unitary/orthogonal if and only if it is an isometric isomorphism.
5) $\tau$ is unitary/orthogonal if and only if it takes an orthonormal basis to an orthonormal basis.
6) If $\tau$ is unitary/orthogonal then the eigenvalues of $\tau$ have absolute value $1$.
Proof. We leave the proofs of 1)–3) to the reader. For part 4), a unitary/orthogonal map is injective and, since the range and domain have the same finite dimension, it is also surjective. Moreover, for a bijective linear map $\tau$, we have
$$\begin{aligned}
\tau \text{ is an isometry} &\Leftrightarrow \langle \tau(v), \tau(w) \rangle = \langle v, w \rangle \text{ for all } v, w \in V \\
&\Leftrightarrow \langle v, \tau^*\tau(w) \rangle = \langle v, w \rangle \text{ for all } v, w \in V \\
&\Leftrightarrow \tau^*\tau(w) = w \text{ for all } w \in V \\
&\Leftrightarrow \tau^*\tau = \iota \\
&\Leftrightarrow \tau^* = \tau^{-1} \\
&\Leftrightarrow \tau \text{ is unitary/orthogonal}
\end{aligned}$$
For part 5), suppose that $\tau$ is unitary/orthogonal and that $\mathcal{O} = \{u_1, \dots, u_n\}$ is an orthonormal basis for $V$. Then
$$\langle \tau(u_i), \tau(u_j) \rangle = \langle u_i, u_j \rangle = \delta_{i,j}$$
and so $\tau(\mathcal{O})$ is an orthonormal basis for $V$. Conversely, suppose that $\mathcal{O}$ and $\tau(\mathcal{O})$ are orthonormal bases for $V$. Then
$$\langle \tau(u_i), \tau(u_j) \rangle = \delta_{i,j} = \langle u_i, u_j \rangle$$
Using the conjugate linearity/bilinearity of the inner product, we get
$$\langle \tau(v), \tau(w) \rangle = \langle v, w \rangle$$
for all $v, w \in V$, and so $\tau$ is unitary/orthogonal.
For part 6), if $\tau$ is unitary and $\tau(v) = \lambda v$ with $v \ne 0$ then
$$\langle v, v \rangle = \langle \tau(v), \tau(v) \rangle = |\lambda|^2\langle v, v \rangle$$
and so $|\lambda|^2 = 1$, which implies that $|\lambda| = 1$.
We also have the following theorem concerning unitary (and orthogonal) matrices.
Theorem 10.8 Let $A$ be an $n \times n$ complex matrix.
1) The following are equivalent:
a) $A$ is unitary
b) The columns of $A$ form an orthonormal set in $\mathbb{C}^n$
c) The rows of $A$ form an orthonormal set in $\mathbb{C}^n$
2) If $A$ is unitary then $|\det(A)| = 1$. In particular, if $A$ is orthogonal then $\det(A) = \pm 1$.
Proof. The matrix $A$ is unitary if and only if $AA^* = I_n$, which is equivalent to saying that the rows of $A$ are orthonormal. Similarly, $A$ is unitary if and only if $A^*A = I_n$, which is equivalent to saying that the columns of $A$ are orthonormal. As for part 2), we have
$$AA^* = I_n \Rightarrow \det(A)\det(A^*) = 1 \Rightarrow \det(A)\overline{\det(A)} = 1$$
from which the result follows.
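Both criteria of Theorem 10.8 can be verified numerically. The following Python sketch is our own illustration; the unitary matrix is manufactured via a QR factorization of an arbitrary random matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4))
                    + 1j * rng.standard_normal((4, 4)))

assert np.allclose(Q.conj().T @ Q, np.eye(4))   # columns orthonormal
assert np.allclose(Q @ Q.conj().T, np.eye(4))   # rows orthonormal
assert np.isclose(abs(np.linalg.det(Q)), 1.0)   # |det Q| = 1
```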
Unitary/orthogonal matrices play the role of change of basis matrices when we restrict attention to orthonormal bases. Let us first note that if $\mathcal{B} = (u_1, \dots, u_n)$ is an ordered orthonormal basis and
$$v = r_1 u_1 + \cdots + r_n u_n, \qquad w = s_1 u_1 + \cdots + s_n u_n$$
then
$$\langle v, w \rangle = r_1\bar{s}_1 + \cdots + r_n\bar{s}_n = [v]_\mathcal{B} \cdot [w]_\mathcal{B}$$
and so $v \perp w$ if and only if $[v]_\mathcal{B} \perp [w]_\mathcal{B}$. We can now state the analog of Theorem 2.13.
Theorem 10.9 If we are given any two of the following:
1) A unitary/orthogonal $n \times n$ matrix $A$.
2) An ordered orthonormal basis $\mathcal{B}$ for $F^n$.
3) An ordered orthonormal basis $\mathcal{C}$ for $F^n$.
then the third is uniquely determined by the equation
$$A = M_{\mathcal{B},\mathcal{C}}$$
Unitary Similarity
We have seen that the change of basis formula for operators is given by
$$[\tau]_{\mathcal{B}'} = P[\tau]_\mathcal{B}P^{-1}$$
where $P$ is an invertible matrix. What happens when the bases are orthonormal?
Definition Two complex matrices $A$ and $B$ are unitarily similar (also called unitarily equivalent) if there exists a unitary matrix $U$ for which
$$B = UAU^{-1} = UAU^*$$
The equivalence classes associated with unitary similarity are called unitary similarity classes. Similarly, two real matrices $A$ and $B$ are orthogonally similar (also called orthogonally equivalent) if there exists an orthogonal matrix $O$ for which
$$B = OAO^{-1} = OAO^t$$
The equivalence classes associated with orthogonal similarity are called orthogonal similarity classes.
The analog of Theorem 2.19 is the following.
Theorem 10.10 Let $V$ be an inner product space of dimension $n$. Then two $n \times n$ matrices $A$ and $B$ are unitarily/orthogonally similar if and only if they represent the same linear operator $\tau \in \mathcal{L}(V)$, but possibly with respect to different ordered orthonormal bases. In this case, $A$ and $B$ represent exactly the same set of linear operators in $\mathcal{L}(V)$, when we restrict attention to orthonormal bases.
Proof. If $A$ and $B$ represent $\tau \in \mathcal{L}(V)$, that is, if $A = [\tau]_\mathcal{B}$ and $B = [\tau]_\mathcal{C}$ for ordered orthonormal bases $\mathcal{B}$ and $\mathcal{C}$, then
$$B = M_{\mathcal{B},\mathcal{C}}AM_{\mathcal{C},\mathcal{B}}$$
and according to Theorem 10.9, $M_{\mathcal{B},\mathcal{C}}$ is unitary/orthogonal. Hence, $A$ and $B$ are unitarily/orthogonally similar.
Now suppose that $A$ and $B$ are unitarily/orthogonally similar, say
$$B = UAU^{-1}$$
where $U$ is unitary/orthogonal. Suppose also that $A$ represents a linear operator $\tau \in \mathcal{L}(V)$ for some ordered orthonormal basis $\mathcal{B}$, that is, $A = [\tau]_\mathcal{B}$. Theorem 10.9 implies that there is a unique ordered orthonormal basis $\mathcal{C}$ for $V$ for which $U = M_{\mathcal{B},\mathcal{C}}$. Hence
$$B = M_{\mathcal{B},\mathcal{C}}[\tau]_\mathcal{B}M_{\mathcal{B},\mathcal{C}}^{-1} = [\tau]_\mathcal{C}$$
Hence, $B$ also represents $\tau$. By symmetry, we see that $A$ and $B$ represent the same set of linear operators, under all possible ordered orthonormal bases.
Unfortunately, canonical forms for unitary similarity are rather complicated and not often discussed. We showed in Chapter 8 that any complex matrix $A$ is unitarily similar to an upper triangular matrix, that is, $A$ is unitarily upper triangularizable. (This is Schur's lemma.) However, just as in the nonunitary case, upper triangular matrices do not form a canonical form for unitary similarity. We will soon show that every complex normal matrix is unitarily diagonalizable. However, we will not discuss canonical forms for unitary similarity in this book, but instead refer the reader to the survey article [Sh].
Reflections

The following defines a very special type of unitary operator.

Definition For a nonzero $v \in V$, the unique operator $H_v$ for which
$$H_v(v) = -v, \qquad H_v|_{\langle v\rangle^{\perp}} = \iota$$
is called a reflection or a Householder transformation.
It is easy to verify that
$$H_v(x) = x - \frac{2\langle x, v\rangle}{\langle v, v\rangle}\, v$$
Note also that if $\sigma$ is a reflection then $\sigma = H_x$ if and only if $\sigma(x) = -x$. For if $\sigma = H_v$ and $\sigma(x) = -x$ then we can write $x = rv + z$ where $z \perp v$, and so
$$-(rv + z) = -x = \sigma(x) = H_v(rv + z) = -rv + z$$
which implies that $z = 0$, whence $H_v = H_{rv} = H_x$.
If $H_v$ is a reflection and we extend $v$ to an ordered orthonormal basis $\mathcal{B}$ for $V$, then $[H_v]_{\mathcal{B}}$ is the matrix obtained from the identity matrix by replacing the $(1,1)$ entry by $-1$. Thus, we see that a reflection is unitary, Hermitian and an involution ($H_v^2 = \iota$).

Theorem 10.11 Let $v, w \in V$ be distinct vectors with $\|v\| = \|w\| \ne 0$. Then $H_{v-w}$ is the unique reflection sending $v$ to $w$, that is, $H_{v-w}(v) = w$.
Proof. If $\|v\| = \|w\|$ then $(v - w) \perp (v + w)$ and so
$$H_{v-w}(v - w) = w - v, \qquad H_{v-w}(v + w) = v + w$$
from which it follows that $H_{v-w}(v) = w$. As to uniqueness, suppose $H_x$ is a reflection for which $H_x(v) = w$. Since $H_x^{-1} = H_x$, we have $H_x(w) = v$ and so
$$H_x(v - w) = -(v - w)$$
which implies that $H_x = H_{v-w}$.
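As a numerical aside (ours, not the book's), the following sketch verifies Theorem 10.11 for real vectors, building the Householder matrix from the formula displayed above.

```python
# For real vectors v != w of equal norm, the reflection H_{v-w} maps v to w.
import numpy as np

def householder(u):
    """Matrix of the reflection H_u(x) = x - 2<x,u>/<u,u> u."""
    u = u / np.linalg.norm(u)
    return np.eye(len(u)) - 2.0 * np.outer(u, u)

v = np.array([3.0, 4.0, 0.0])
w = np.array([0.0, 0.0, 5.0])   # same norm as v

H = householder(v - w)
print(np.allclose(H @ v, w))          # H_{v-w} sends v to w
print(np.allclose(H @ H, np.eye(3)))  # H is its own inverse: H^2 = I
print(np.allclose(H, H.T))            # and self-adjoint
```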
Reflections can be used to characterize unitary operators.

Theorem 10.12 An operator $\tau$ on a finite-dimensional inner product space $V$ is unitary (for $F = \mathbb{C}$) or orthogonal (for $F = \mathbb{R}$) if and only if it is a product of reflections.
Proof. Since reflections are unitary (orthogonal) and the product of unitary (orthogonal) operators is unitary, one direction is easy. For the converse, let $\tau$ be unitary. Let $\mathcal{B} = (u_1, \dots, u_n)$ be an orthonormal basis for $V$. Hence $\tau(\mathcal{B})$ is also an orthonormal basis for $V$. We make repeated use of the fact that $H_{x-y}(x) = y$. For example, if $x_1 = \tau(u_1) - u_1$ then $(H_{x_1}\tau)(u_1) = u_1$ and so $H_{x_1}\tau$ is the identity on $\langle u_1\rangle$. Next, if
$$x_2 = (H_{x_1}\tau)(u_2) - u_2$$
then $(H_{x_2}H_{x_1}\tau)(u_2) = u_2$. Also, we claim that $x_2 \perp u_1$. Since $H_{x_1}u_1 = H_{x_1}^{-1}u_1 = \tau(u_1)$, it follows that
$$\langle (H_{x_1}\tau)(u_2) - u_2, u_1\rangle = \langle (H_{x_1}\tau)(u_2), u_1\rangle = \langle \tau(u_2), H_{x_1}u_1\rangle = \langle \tau(u_2), \tau(u_1)\rangle = \langle u_2, u_1\rangle = 0$$
Hence
$$(H_{x_2}H_{x_1}\tau)(u_1) = H_{x_2}(u_1) = u_1$$
and so $H_{x_2}H_{x_1}\tau$ is the identity on $\langle u_1, u_2\rangle$.
Now let us generalize. Assume that for $i \le n$, we have found reflections $H_{x_{i-1}}, \dots, H_{x_1}$ for which $H_{x_{i-1}}\cdots H_{x_1}\tau$ is the identity on $\langle u_1, \dots, u_{i-1}\rangle$. If
$$x_i = (H_{x_{i-1}}\cdots H_{x_1}\tau)(u_i) - u_i$$
then
$$(H_{x_i}\cdots H_{x_1}\tau)(u_i) = u_i$$
Also, we claim that $x_i \perp u_j$ for all $j < i$. Since $H_{x_1}\cdots H_{x_{i-1}}u_j = \tau(u_j)$, it follows that
$$\langle (H_{x_{i-1}}\cdots H_{x_1}\tau)(u_i) - u_i, u_j\rangle = \langle (H_{x_{i-1}}\cdots H_{x_1}\tau)(u_i), u_j\rangle = \langle \tau(u_i), H_{x_1}\cdots H_{x_{i-1}}u_j\rangle = \langle \tau(u_i), \tau(u_j)\rangle = \langle u_i, u_j\rangle = 0$$
Hence
$$(H_{x_i}\cdots H_{x_1}\tau)(u_j) = H_{x_i}(u_j) = u_j$$
and so $H_{x_i}\cdots H_{x_1}\tau$ is the identity on $\langle u_1, \dots, u_i\rangle$. Thus, for $i = n$ we have $H_{x_n}\cdots H_{x_1}\tau = \iota$ and so $\tau = H_{x_1}\cdots H_{x_n}$, as desired.
The Structure of Normal Operators

We are now ready to consider the structure of a normal operator on a finite-dimensional inner product space. According to Theorem 10.5, the minimal
polynomial of $\tau$ has the form
$$m_\tau(x) = p_1(x)\cdots p_k(x)$$
where the $p_i$'s are distinct monic irreducible polynomials. If $F = \mathbb{C}$, then each $p_i(x)$ is linear. Theorem 8.11 then implies that $\tau$ is diagonalizable and
$$V = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}$$
where $\lambda_1, \dots, \lambda_k$ are the distinct eigenvalues of $\tau$. Theorem 10.5 also tells us that if $\lambda_i \ne \lambda_j$ then $E_{\lambda_i} \perp E_{\lambda_j}$ and so
$$V = E_{\lambda_1} \oplus^{\perp} \cdots \oplus^{\perp} E_{\lambda_k}$$
This is equivalent to saying that $V$ has an orthonormal basis of eigenvectors of $\tau$. The converse is also true: if $\mathcal{B} = (u_1, \dots, u_n)$ is an ordered orthonormal basis of eigenvectors of $\tau$, with $\tau(u_j) = \lambda_j u_j$, then
$$\langle \tau^*(u_i), u_j\rangle = \langle u_i, \tau(u_j)\rangle = \overline{\lambda_j}\,\delta_{i,j} = \langle \overline{\lambda_i}\,u_i, u_j\rangle$$
and so $\tau^*(u_i) = \overline{\lambda_i}u_i$. It follows that
$$\tau\tau^*(u_i) = \lambda_i\overline{\lambda_i}\,u_i = \tau^*\tau(u_i)$$
and so $\tau$ is normal.

Theorem 10.13 (The structure theorem for normal operators: complex case) Let $V$ be a finite-dimensional complex inner product space. Then a linear operator $\tau$ on $V$ is normal if and only if $V$ has an orthonormal basis $\mathcal{B}$ consisting entirely of eigenvectors of $\tau$, that is,
$$V = E_{\lambda_1} \oplus^{\perp} \cdots \oplus^{\perp} E_{\lambda_k}$$
where $\{\lambda_1, \dots, \lambda_k\}$ is the spectrum of $\tau$. Put another way, $\tau$ is normal if and only if it is unitarily diagonalizable.
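As an illustration (ours, not the book's), the following sketch exhibits Theorem 10.13 numerically: for a normal matrix, the Schur triangularization is already diagonal. The use of scipy for the Schur decomposition is an assumption of this example.

```python
# A normal matrix that is neither Hermitian nor unitary is still
# unitarily diagonalizable.
import numpy as np
from scipy.linalg import schur  # assumption: scipy is available

tau = np.array([[1 + 1j, 2], [-2, 1 + 1j]])
print(np.allclose(tau @ tau.conj().T, tau.conj().T @ tau))  # tau is normal

T, U = schur(tau, output='complex')          # tau = U T U* with U unitary
print(np.allclose(T, np.diag(np.diag(T))))   # T is diagonal, since tau is normal
print(np.allclose(U @ T @ U.conj().T, tau))
```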
Now let us consider the real case, which is more involved than the complex case. However, we can take advantage of the corresponding result for $F = \mathbb{C}$ by using the complexification process (which we will review in a moment). First, let us observe that when $F = \mathbb{R}$, the minimal polynomial of $\tau$ is a product of distinct real linear and real quadratic factors, say
$$m_\tau(x) = (x - \lambda_1)\cdots(x - \lambda_r)\,p_1(x)\cdots p_s(x)$$
where the $\lambda_i$'s are distinct and the $p_i(x)$'s are distinct real irreducible quadratics.
Hence, according to part 9) of Theorem 10.5, the primary decomposition of $V$ has the form
$$V = E_{\lambda_1} \oplus^{\perp} \cdots \oplus^{\perp} E_{\lambda_r} \oplus^{\perp} W_1 \oplus^{\perp} \cdots \oplus^{\perp} W_s$$
where $\{\lambda_1, \dots, \lambda_r\}$ is the spectrum of $\tau$ and where
$$W_i = \{v \in V \mid p_i(\tau)(v) = 0\}$$
are $\tau$-invariant subspaces. Accordingly, we can focus on the subspace $W = W_1 \oplus^{\perp} \cdots \oplus^{\perp} W_s$, upon which $\tau$ is normal, with a minimal polynomial that is the product of distinct irreducible quadratic factors.
Let us briefly review the complexification process. Recall that if $V$ is a real vector space, then the set
$$V^{\mathbb{C}} = \{u + iv \mid u, v \in V\}$$
is a complex vector space under addition and scalar multiplication "borrowed" from the field of complex numbers, that is,
$$(u + iv) + (x + iy) = (u + x) + i(v + y)$$
$$(a + bi)(u + iv) = (au - bv) + i(av + bu)$$
Recall also that the complexification map $\operatorname{cpx}\colon V \to V^{\mathbb{C}}$ defined by $\operatorname{cpx}(v) = v + i0$ is an injective linear transformation from the real vector space $V$ to the real version $(V^{\mathbb{C}})^{\mathbb{R}}$ of the complexification $V^{\mathbb{C}}$. If $\mathcal{B} = \{v_i \mid i \in I\}$ is a basis for $V$ over $\mathbb{R}$, then the complexification of $\mathcal{B}$,
$$\operatorname{cpx}(\mathcal{B}) = \{v + i0 \mid v \in \mathcal{B}\}$$
is a basis for the vector space $V^{\mathbb{C}}$ over $\mathbb{C}$ and so
$$\dim_{\mathbb{C}}(V^{\mathbb{C}}) = \dim_{\mathbb{R}}(V)$$
For any linear operator $\tau$ on $V$, we can define a linear operator $\tau^{\mathbb{C}}$ on $V^{\mathbb{C}}$ by
$$\tau^{\mathbb{C}}(u + iv) = \tau(u) + i\tau(v)$$
Note that $(\sigma\tau)^{\mathbb{C}} = \sigma^{\mathbb{C}}\tau^{\mathbb{C}}$. Also, if $p(x)$ is a real polynomial then $p(\tau^{\mathbb{C}}) = [p(\tau)]^{\mathbb{C}}$.
For any ordered basis $\mathcal{B}$ of $V$, we have
$$[\tau^{\mathbb{C}}]_{\operatorname{cpx}(\mathcal{B})} = [\tau]_{\mathcal{B}}$$
Hence, if a real matrix $A$ represents a linear operator $\tau$ on $V$ then $A$ also represents the complexification $\tau^{\mathbb{C}}$ of $\tau$ on $V^{\mathbb{C}}$. In particular, the polynomial $c(x) = \det(xI - A)$ is the characteristic polynomial of both $\tau$ and $\tau^{\mathbb{C}}$.
If $V$ is a real inner product space, then we can define an inner product on the complexification $V^{\mathbb{C}}$ as follows (this is the same formula as for the ordinary inner product on a complex vector space):
$$\langle u + iv, x + iy\rangle = \langle u, x\rangle + \langle v, y\rangle + i\left(\langle v, x\rangle - \langle u, y\rangle\right)$$
From this, it follows that if $u, x \in V$ then $\langle u^{\mathbb{C}}, x^{\mathbb{C}}\rangle = \langle u, x\rangle$ and, in particular, $u \perp x$ in $V$ if and only if $u^{\mathbb{C}} \perp x^{\mathbb{C}}$ in $V^{\mathbb{C}}$. Next, we have $(\tau^*)^{\mathbb{C}} = (\tau^{\mathbb{C}})^*$, since
$$\begin{aligned}
\langle u + iv, (\tau^*)^{\mathbb{C}}(x + iy)\rangle &= \langle u + iv, \tau^*(x) + i\tau^*(y)\rangle \\
&= \langle u, \tau^*(x)\rangle + \langle v, \tau^*(y)\rangle + i\left(\langle v, \tau^*(x)\rangle - \langle u, \tau^*(y)\rangle\right) \\
&= \langle \tau(u), x\rangle + \langle \tau(v), y\rangle + i\left(\langle \tau(v), x\rangle - \langle \tau(u), y\rangle\right) \\
&= \langle \tau(u) + i\tau(v), x + iy\rangle \\
&= \langle \tau^{\mathbb{C}}(u + iv), x + iy\rangle \\
&= \langle u + iv, (\tau^{\mathbb{C}})^*(x + iy)\rangle
\end{aligned}$$
It follows that $\tau$ is normal if and only if $\tau^{\mathbb{C}}$ is normal.
Now consider a normal linear operator $\tau$ on a real vector space $V$ and suppose that the minimal polynomial $m_\tau(x)$ of $\tau$ is the product of distinct irreducible quadratic factors
$$m_\tau(x) = p_1(x)\cdots p_s(x)$$
Hence, $m_\tau(x)$ has $2s$ distinct roots, all of which are nonreal, say $\alpha_1, \overline{\alpha_1}, \dots, \alpha_s, \overline{\alpha_s}$, and since the characteristic polynomial $c(x)$ of $\tau$ and $\tau^{\mathbb{C}}$ is a multiple of $m_\tau(x)$, these scalars are characteristic roots of $\tau^{\mathbb{C}}$. Also, since $m_\tau(x)$ is real, it follows that
$$m_\tau(\tau^{\mathbb{C}}) = [m_\tau(\tau)]^{\mathbb{C}} = 0$$
and so $m_{\tau^{\mathbb{C}}}(x) \mid m_\tau(x)$. However, the eigenvalues $\alpha_i, \overline{\alpha_i}$ of $\tau^{\mathbb{C}}$ are roots of $m_{\tau^{\mathbb{C}}}(x)$ and so $m_\tau(x) \mid m_{\tau^{\mathbb{C}}}(x)$. Thus, $m_{\tau^{\mathbb{C}}}(x) = m_\tau(x)$. Since $m_{\tau^{\mathbb{C}}}(x)$ is the product of distinct linear factors over $\mathbb{C}$, we deduce immediately that $\tau^{\mathbb{C}}$ is diagonalizable. Hence, $V^{\mathbb{C}}$ has a basis of eigenvectors of $\tau^{\mathbb{C}}$, that is,
$$V^{\mathbb{C}} = E_{\alpha_1} \oplus^{\perp} E_{\overline{\alpha_1}} \oplus^{\perp} \cdots \oplus^{\perp} E_{\alpha_s} \oplus^{\perp} E_{\overline{\alpha_s}}$$
Note also that since $\tau^{\mathbb{C}}$ is normal, these eigenspaces are orthogonal under the inner product on $V^{\mathbb{C}}$.
Let us consider a particular eigenvalue pair $\alpha$ and $\overline{\alpha}$ and the subspace $E_{\alpha} \oplus^{\perp} E_{\overline{\alpha}}$. (For convenience, we have dropped the subscript.) Suppose that $\alpha = a + bi$ and that
$$\mathcal{O} = (u_1 + iv_1, \dots, u_t + iv_t)$$
is an ordered orthonormal basis for $E_{\alpha}$. Then for any $j = 1, \dots, t$,
$$\tau^{\mathbb{C}}(u_j + iv_j) = (a + bi)(u_j + iv_j)$$
and so
$$\tau(u_j) = au_j - bv_j, \qquad \tau(v_j) = bu_j + av_j$$
It follows that
$$\tau^{\mathbb{C}}(u_j - iv_j) = \tau(u_j) - i\tau(v_j) = au_j - bv_j - i(bu_j + av_j) = (a - bi)(u_j - iv_j) = \overline{\alpha}(u_j - iv_j)$$
which shows that $u_j - iv_j$ is an eigenvector for $\tau^{\mathbb{C}}$ associated with $\overline{\alpha}$ and so
$$\overline{\mathcal{O}} = (u_1 - iv_1, \dots, u_t - iv_t) \subseteq E_{\overline{\alpha}}$$
But the set $\overline{\mathcal{O}}$ is easily seen to be linearly independent and so $\dim(E_{\overline{\alpha}}) \ge \dim(E_{\alpha})$. Using the same argument with $\alpha$ replaced by $\overline{\alpha}$, we see that this inequality is an equality. Hence $\overline{\mathcal{O}}$ is an ordered orthonormal basis for $E_{\overline{\alpha}}$. It follows that
$$E_{\alpha} \oplus^{\perp} E_{\overline{\alpha}} = U_1 \oplus^{\perp} \cdots \oplus^{\perp} U_t$$
where
$$U_j = \operatorname{span}(u_j + iv_j,\, u_j - iv_j)$$
is two-dimensional, because the eigenvectors $u_j + iv_j$ and $u_j - iv_j$ are associated with distinct eigenvalues and are therefore linearly independent. Hence, $V^{\mathbb{C}}$ is the orthogonal direct sum of $\tau^{\mathbb{C}}$-invariant two-dimensional subspaces
$$V^{\mathbb{C}} = U_1 \oplus^{\perp} \cdots \oplus^{\perp} U_m$$
where $2m = \dim(V^{\mathbb{C}}) = \dim(V)$ and where each subspace $U_j$ has the property that
$$\tau(u_j) = a_j u_j - b_j v_j, \qquad \tau(v_j) = b_j u_j + a_j v_j$$
and where the scalars $\alpha_j = a_j + b_j i$ range over the distinct eigenvalues $\alpha_1, \overline{\alpha_1}, \dots, \alpha_s, \overline{\alpha_s}$ of $\tau^{\mathbb{C}}$.
Now we wish to drop down to $V$. For each $j = 1, \dots, m$, let
$$S_j = \operatorname{span}(u_j, v_j)$$
be the subspace of $V$ spanned by the real and imaginary parts of the eigenvectors $u_j + iv_j$ and $u_j - iv_j$ that span $U_j$. To see that $S_j$ is two-dimensional, consider its complexification
$$S_j^{\mathbb{C}} = \{x + iy \mid x, y \in S_j\}$$
Since $U_j \subseteq S_j^{\mathbb{C}}$, we have
$$2 = \dim(U_j) \le \dim(S_j^{\mathbb{C}}) = \dim(S_j) \le 2$$
(This can also be seen directly by applying $\tau$ to a dependence relation $ru_j + sv_j = 0$ and solving the resulting pair of equations in $u_j$ and $v_j$.)
Next, we observe that if $x \in S_j$ and $y \in S_k$ with $j \ne k$, then since $x^{\mathbb{C}} \in U_j$ and $y^{\mathbb{C}} \in U_k$, we have $x^{\mathbb{C}} \perp y^{\mathbb{C}}$ and so $x \perp y$. Thus, $S_j \perp S_k$. In summary, if $\mathcal{B}_j = (u_j, v_j)$, then the subspaces $S_j$ are two-dimensional, $\tau$-invariant and pairwise orthogonal, with matrix
$$[\tau|_{S_j}]_{\mathcal{B}_j} = \begin{pmatrix} a_j & b_j \\ -b_j & a_j \end{pmatrix}$$
It follows that
$$S_1 \oplus^{\perp} \cdots \oplus^{\perp} S_m \subseteq V$$
but since the dimensions of both sides are equal, we have equality:
$$V = S_1 \oplus^{\perp} \cdots \oplus^{\perp} S_m$$
It follows that : p Ä p : = but since the dimensions of both sides are equal, we have equality = ~ : p Ä p : Theorem 10.14 (The structure theorem for normal operators: real case) Let = be a finite-dimensional real inner product space. A linear operator on = is normal if and only if
$$V = E_{\lambda_1} \oplus^{\perp} \cdots \oplus^{\perp} E_{\lambda_r} \oplus^{\perp} S_1 \oplus^{\perp} \cdots \oplus^{\perp} S_m$$
where $\{\lambda_1, \dots, \lambda_r\}$ is the spectrum of $\tau$ and each $S_j$ is a two-dimensional $\tau$-invariant subspace for which there exists an ordered basis $\mathcal{B}_j = (u_j, v_j)$ for which
$$[\tau|_{S_j}]_{\mathcal{B}_j} = \begin{pmatrix} a_j & b_j \\ -b_j & a_j \end{pmatrix}$$
for $a_j, b_j \in \mathbb{R}$.
Proof. We need only show that if $V$ has such a decomposition, then $\tau$ is normal. But since
$$[\tau|_{S_j}]_{\mathcal{B}_j}\left([\tau|_{S_j}]_{\mathcal{B}_j}\right)^t = (a_j^2 + b_j^2)I$$
it is clear that $[\tau|_{S_j}]_{\mathcal{B}_j}$ is real normal. It follows easily that $\tau$ is real normal.
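A quick numerical check (ours, not part of the text) of the "if" direction of Theorem 10.14: assembling eigenvalue blocks and $2 \times 2$ blocks of the displayed form produces a real normal matrix. The scipy helper block_diag is an assumption of this sketch.

```python
# A block-diagonal real matrix with blocks [[a, b], [-b, a]] is real normal.
import numpy as np
from scipy.linalg import block_diag  # assumption: scipy is available

def rot_block(a, b):
    return np.array([[a, b], [-b, a]])

T = block_diag(np.diag([3.0, -1.0]),   # eigenvalue part
               rot_block(1.0, 2.0),    # S_1
               rot_block(0.0, 5.0))    # S_2
print(np.allclose(T @ T.T, T.T @ T))   # T commutes with its transpose
```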
The following theorem includes the structure theorems stated above for the real and complex cases, along with some further refinements related to self-adjoint and unitary/orthogonal operators.

Theorem 10.15 (The structure theorem for normal operators)
1) (Complex case) Let $V$ be a finite-dimensional complex inner product space. Then
   a) An operator $\tau$ on $V$ is normal if and only if $V$ has an orthonormal basis $\mathcal{B}$ consisting entirely of eigenvectors of $\tau$, that is,
   $$V = E_{\lambda_1} \oplus^{\perp} \cdots \oplus^{\perp} E_{\lambda_k}$$
   where $\{\lambda_1, \dots, \lambda_k\}$ is the spectrum of $\tau$. Put another way, $\tau$ is normal if and only if it is unitarily diagonalizable.
   b) Among the normal operators, the Hermitian operators are precisely those for which all complex eigenvalues are real.
   c) Among the normal operators, the unitary operators are precisely those for which all eigenvalues have norm 1.
2) (Real case) Let $V$ be a finite-dimensional real inner product space. Then
   a) A linear operator $\tau$ on $V$ is normal if and only if
   $$V = E_{\lambda_1} \oplus^{\perp} \cdots \oplus^{\perp} E_{\lambda_r} \oplus^{\perp} S_1 \oplus^{\perp} \cdots \oplus^{\perp} S_m$$
   where $\{\lambda_1, \dots, \lambda_r\}$ is the spectrum of $\tau$ and each $S_j$ is a two-dimensional $\tau$-invariant subspace for which there exists an ordered basis $\mathcal{B}_j = (u_j, v_j)$ for which
   $$[\tau|_{S_j}]_{\mathcal{B}_j} = \begin{pmatrix} a_j & b_j \\ -b_j & a_j \end{pmatrix}$$
   for $a_j, b_j \in \mathbb{R}$.
   b) Among the real normal operators, the symmetric operators are precisely those for which there are no subspaces $S_j$ in the decomposition of part 2a). Hence, an operator is symmetric if and only if it is orthogonally diagonalizable.
   c) Among the real normal operators, the orthogonal operators are precisely those for which the eigenvalues $\lambda_i$ are equal to $\pm 1$ and the matrices $[\tau|_{S_j}]_{\mathcal{B}_j}$ described in part 2a) have rows (and columns) of norm 1, that is,
   $$[\tau|_{S_j}]_{\mathcal{B}_j} = \begin{pmatrix} \cos\theta_j & \sin\theta_j \\ -\sin\theta_j & \cos\theta_j \end{pmatrix}$$
   for some $\theta_j \in \mathbb{R}$.
Proof. We have proved part 1a). As to part 1b), it is only necessary to look at the matrix $A$ of $\tau$ with respect to a basis $\mathcal{B}$ consisting of eigenvectors for $\tau$. This matrix is diagonal, and so it is Hermitian ($A = A^*$) if and only if the diagonal entries are real. Similarly, $A$ is unitary ($A^{-1} = A^*$) if and only if the diagonal entries have absolute value equal to 1. We have proved part 2a). Parts 2b) and 2c) are seen to be true by looking at the matrix $A = [\tau]_{\mathcal{B}}$, which is symmetric ($A = A^t$) if and only if $A$ is diagonal, and $A$ is orthogonal if and only if $\lambda_i = \pm 1$ and the $2 \times 2$ blocks have orthonormal rows.
Matrix Versions

We can formulate matrix versions of the structure theorem for normal operators.

Theorem 10.16 (The structure theorem for normal matrices)
1) (Complex case)
   a) A complex matrix $A$ is normal if and only if there is a unitary matrix $U$ for which
   $$UAU^{-1} = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$$
   where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $A$. Put another way, $A$ is normal if and only if it is unitarily diagonalizable.
   b) A complex matrix $A$ is Hermitian if and only if 1a) holds where all eigenvalues are real.
   c) A complex matrix $A$ is unitary if and only if 1a) holds where all eigenvalues have norm 1.
2) (Real case)
   a) A real matrix $A$ is real normal if and only if there is an orthogonal matrix $O$ for which $OAO^{-1}$ has the block diagonal form
   $$OAO^{-1} = \operatorname{diag}\left(\lambda_1, \dots, \lambda_r, \begin{pmatrix} a_1 & b_1 \\ -b_1 & a_1 \end{pmatrix}, \dots, \begin{pmatrix} a_m & b_m \\ -b_m & a_m \end{pmatrix}\right)$$
   b) A real matrix $A$ is symmetric if and only if there is an orthogonal matrix $O$ for which $OAO^{-1}$ has the diagonal form
   $$OAO^{-1} = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$$
   That is, a real matrix $A$ is symmetric if and only if it is orthogonally diagonalizable.
   c) A real matrix $A$ is orthogonal if and only if there is an orthogonal matrix $O$ for which $OAO^{-1}$ has the block diagonal form
   $$OAO^{-1} = \operatorname{diag}\left(\pm 1, \dots, \pm 1, \begin{pmatrix} \cos\theta_1 & \sin\theta_1 \\ -\sin\theta_1 & \cos\theta_1 \end{pmatrix}, \dots, \begin{pmatrix} \cos\theta_m & \sin\theta_m \\ -\sin\theta_m & \cos\theta_m \end{pmatrix}\right)$$
   for some $\theta_1, \dots, \theta_m \in \mathbb{R}$.
Orthogonal Projections

We now wish to characterize unitary diagonalizability in terms of projection operators.

Definition Let $V = S \oplus^{\perp} S^{\perp}$. The projection map $\rho\colon V \to S$ on $S$ along $S^{\perp}$ is called orthogonal projection onto $S$. Put another way, a projection map $\rho$ is an orthogonal projection if $\operatorname{im}(\rho) \perp \ker(\rho)$.
Note that some care must be taken to avoid confusion between the term orthogonal projection and the concept of projections that are orthogonal to each other, that is, for which $\rho\sigma = \sigma\rho = 0$.
We saw in Chapter 8 that an operator is a projection operator if and only if it is idempotent. Here is the analogous characterization of orthogonal projections.

Theorem 10.17 Let $V$ be a finite-dimensional inner product space. The following are equivalent for an operator $\rho$ on $V$:
1) $\rho$ is an orthogonal projection,
2) $\rho$ is idempotent and self-adjoint,
3) $\rho$ is idempotent and does not expand lengths, that is, $\|\rho(v)\| \le \|v\|$ for all $v \in V$.
Proof. To see that 1) and 2) are equivalent, we have
$$\rho = \rho^* \iff \operatorname{im}(\rho) = \operatorname{im}(\rho^*) \text{ and } \ker(\rho) = \ker(\rho^*) \iff \operatorname{im}(\rho) = \ker(\rho)^{\perp} \text{ and } \ker(\rho) = \operatorname{im}(\rho)^{\perp} \iff \operatorname{im}(\rho) \perp \ker(\rho)$$
To prove that 1) implies 3), note that $v = \rho(v) + z$ where $z \in \ker(\rho)$, and since $\rho(v) \perp z$ we have
$$\|v\|^2 = \|\rho(v)\|^2 + \|z\|^2 \ge \|\rho(v)\|^2$$
from which the result follows. Now suppose that 3) holds. We know that $V = \operatorname{im}(\rho) \oplus \ker(\rho)$ and wish to show that this sum is orthogonal. According to Theorem 9.13, it is sufficient to show that $\operatorname{im}(\rho) \subseteq \ker(\rho)^{\perp}$. Let $w \in \operatorname{im}(\rho)$. Then since $V = \ker(\rho) \oplus^{\perp} \ker(\rho)^{\perp}$, we have $w = x + y$, where $x \in \ker(\rho)$ and $y \in \ker(\rho)^{\perp}$, and so
$$w = \rho(w) = \rho(x) + \rho(y) = \rho(y)$$
Hence
$$\|x\|^2 + \|y\|^2 = \|w\|^2 = \|\rho(y)\|^2 \le \|y\|^2$$
which implies that $\|x\| = 0$ and hence that $x = 0$. Thus, $w = y \in \ker(\rho)^{\perp}$ and so $\operatorname{im}(\rho) \subseteq \ker(\rho)^{\perp}$, as desired.
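As a concrete illustration (ours, not from the text), the standard normal-equations formula $P = A(A^tA)^{-1}A^t$ for projecting onto the column space of a full-rank real matrix $A$ produces an operator satisfying all three conditions of Theorem 10.17.

```python
# Orthogonal projection onto the column space of A is idempotent,
# self-adjoint, and does not expand lengths.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))            # columns span the subspace S
P = A @ np.linalg.inv(A.T @ A) @ A.T       # orthogonal projection onto S

print(np.allclose(P @ P, P))               # idempotent
print(np.allclose(P, P.T))                 # self-adjoint
v = rng.standard_normal(5)
print(np.linalg.norm(P @ v) <= np.linalg.norm(v) + 1e-12)  # no expansion
```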
Note that for an orthogonal projection $\rho$, we have
$$\langle v, \rho(v)\rangle = \langle v, \rho^2(v)\rangle = \langle \rho(v), \rho(v)\rangle = \|\rho(v)\|^2 \ge 0$$
Next we give some additional properties of orthogonal projections.

Theorem 10.18 Let $V$ be an inner product space over a field of characteristic $\ne 2$. Let $\rho, \rho_1, \dots, \rho_k$ and $\sigma$ be projections, each of which is orthogonal. Then
1) $\rho\sigma = 0$ if and only if $\sigma\rho = 0$.
2) $\rho + \sigma$ is an orthogonal projection if and only if $\rho\sigma = \sigma\rho = 0$, in which case $\rho + \sigma$ is projection on $\operatorname{im}(\rho) \oplus^{\perp} \operatorname{im}(\sigma)$ along $\ker(\rho) \cap \ker(\sigma)$.
3) $\rho = \rho_1 + \cdots + \rho_k$ is an orthogonal projection if and only if $\rho_i\rho_j = 0$ for all $i \ne j$.
4) $\rho - \sigma$ is an orthogonal projection if and only if $\rho\sigma = \sigma\rho = \sigma$, in which case $\rho - \sigma$ is projection on $\operatorname{im}(\rho) \cap \ker(\sigma)$ along $\ker(\rho) \oplus^{\perp} \operatorname{im}(\sigma)$.
5) If $\rho\sigma = \sigma\rho$ then $\rho\sigma$ is an orthogonal projection. In this case, $\rho\sigma$ is projection on $\operatorname{im}(\rho) \cap \operatorname{im}(\sigma)$ along $\ker(\rho) + \ker(\sigma)$.
6) If $\sigma$ is a projection, not necessarily orthogonal, then:
   a) $\sigma^*$ is projection onto $\ker(\sigma)^{\perp}$ along $\operatorname{im}(\sigma)^{\perp}$,
   b) $\iota - \sigma$ is projection onto $\ker(\sigma)$ along $\operatorname{im}(\sigma)$,
   c) $\iota - \sigma^*$ is projection onto $\operatorname{im}(\sigma)^{\perp}$ along $\ker(\sigma)^{\perp}$.
Proof. We prove only part 3). If the $\rho_i$'s are orthogonal projections and if $\rho_i\rho_j = 0$ for all $i \ne j$, then $\rho_j\rho_i = 0$ for all $i \ne j$ as well, and so it is straightforward to check that $\rho^2 = \rho$ and that $\rho^* = \rho$. Hence, $\rho$ is an orthogonal projection. Conversely, suppose that $\rho$ is an orthogonal projection and that $x \in \operatorname{im}(\rho_j)$ for a fixed $j$. Then $\rho_j(x) = x$ and so
$$\|x\|^2 \ge \|\rho(x)\|^2 = \langle \rho(x), \rho(x)\rangle = \langle \rho(x), x\rangle = \sum_i \langle \rho_i(x), x\rangle = \sum_i \|\rho_i(x)\|^2 \ge \|\rho_j(x)\|^2 = \|x\|^2$$
which implies that $\rho_i(x) = 0$ for $i \ne j$. In other words,
$$\operatorname{im}(\rho_j) \subseteq \ker(\rho_i) = \operatorname{im}(\rho_i)^{\perp}$$
Therefore,
$$0 = \langle \rho_j(v), \rho_i(w)\rangle = \langle \rho_i\rho_j(v), w\rangle$$
for all $v, w \in V$, which shows that $\rho_i\rho_j = 0$, that is, $\rho_i \perp \rho_j$.
For orthogonal projections $\rho$ and $\sigma$, we can define a partial order $\le$ by declaring $\rho \le \sigma$ if $\operatorname{im}(\rho) \subseteq \operatorname{im}(\sigma)$. It is easy to verify that this is a reflexive, antisymmetric, transitive relation on the set of all orthogonal projections. Furthermore, we have the following characterizations.

Theorem 10.19 The following statements are equivalent for orthogonal projections $\rho$ and $\sigma$:
1) $\rho \le \sigma$
2) $\sigma\rho = \rho$
3) $\rho\sigma = \rho$
4) $\|\rho(v)\| \le \|\sigma(v)\|$ for all $v \in V$.
5) $Q_{\sigma - \rho}(v) \ge 0$ for all $v \in V$.
Proof. First, we show that 2) and 3) are equivalent. If 2) holds then
$$\rho = \rho^* = (\sigma\rho)^* = \rho^*\sigma^* = \rho\sigma$$
and so 3) holds. Similarly, 3) implies 2). Next, note that 4) and 5) are equivalent, since
$$Q_{\sigma - \rho}(v) = \langle \sigma(v), v\rangle - \langle \rho(v), v\rangle = \langle \sigma(v), \sigma(v)\rangle - \langle \rho(v), \rho(v)\rangle = \|\sigma(v)\|^2 - \|\rho(v)\|^2$$
Now, 2) is equivalent to the statement that $\sigma$ fixes each element of $\operatorname{im}(\rho)$, which is equivalent to the statement that $\operatorname{im}(\rho) \subseteq \operatorname{im}(\sigma)$, which is 1). Hence, 1)–3) are equivalent. Finally, if 2) holds, then 3) also holds and so by Theorem 10.18, the difference $\sigma - \rho$ is an orthogonal projection, from which it follows that
$$Q_{\sigma - \rho}(v) = \langle (\sigma - \rho)(v), v\rangle = \langle (\sigma - \rho)(v), (\sigma - \rho)(v)\rangle \ge 0$$
which is 5). Also, if 4) holds, then for any $v \in \operatorname{im}(\rho)$, we have $v = x + y$, where
$x \in \operatorname{im}(\sigma)$, $y \in \ker(\sigma)$ and $x \perp y$. Then
$$\|x\|^2 + \|y\|^2 = \|v\|^2 = \|\rho(v)\|^2 \le \|\sigma(v)\|^2 = \|x\|^2$$
and so $y = 0$, that is, $v \in \operatorname{im}(\sigma)$. Hence, 1) holds.
Orthogonal Resolutions of the Identity

Recall that resolutions of the identity $\rho_1 + \cdots + \rho_k = \iota$ correspond to direct sum decompositions of the space $V$. It is the mutual orthogonality of the projections that gives the directness of the sum. If, in addition, the projections are themselves orthogonal projections, then the direct sum is an orthogonal sum.

Definition If $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity and if each $\rho_i$ is an orthogonal projection, then we say that $\rho_1 + \cdots + \rho_k = \iota$ is an orthogonal resolution of the identity.
The following theorem displays a correspondence between orthogonal direct sum decompositions of $V$ and orthogonal resolutions of the identity.

Theorem 10.20
1) If $\rho_1 + \cdots + \rho_k = \iota$ is an orthogonal resolution of the identity then
$$V = \operatorname{im}(\rho_1) \oplus^{\perp} \cdots \oplus^{\perp} \operatorname{im}(\rho_k)$$
2) Conversely, if $V = S_1 \oplus^{\perp} \cdots \oplus^{\perp} S_k$ and $\rho_i$ is projection on $S_i$ along $S_1 \oplus^{\perp} \cdots \oplus^{\perp} \widehat{S_i} \oplus^{\perp} \cdots \oplus^{\perp} S_k$, where the hat $\widehat{\phantom{S}}$ means that the corresponding term is missing from the direct sum, then $\rho_1 + \cdots + \rho_k = \iota$ is an orthogonal resolution of the identity.
Proof. To prove 1), suppose that $\rho_1 + \cdots + \rho_k = \iota$ is an orthogonal resolution of the identity. According to Theorem 8.17, we have
$$V = \operatorname{im}(\rho_1) \oplus \cdots \oplus \operatorname{im}(\rho_k)$$
However, since the $\rho_i$'s are orthogonal projections, they are self-adjoint and so for $i \ne j$,
$$\langle \rho_i(v), \rho_j(w)\rangle = \langle v, \rho_i\rho_j(w)\rangle = \langle v, 0\rangle = 0$$
Hence
$$V = \operatorname{im}(\rho_1) \oplus^{\perp} \cdots \oplus^{\perp} \operatorname{im}(\rho_k)$$
For the converse, we know from Theorem 8.17 that $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity and we need only show that each $\rho_i$ is an orthogonal
projection. But this follows from the fact that
$$\operatorname{im}(\rho_i)^{\perp} = \operatorname{im}(\rho_1) \oplus^{\perp} \cdots \oplus^{\perp} \operatorname{im}(\rho_{i-1}) \oplus^{\perp} \operatorname{im}(\rho_{i+1}) \oplus^{\perp} \cdots \oplus^{\perp} \operatorname{im}(\rho_k) = \ker(\rho_i)$$
The Spectral Theorem

We can now characterize the normal (unitarily diagonalizable) operators on a finite-dimensional complex inner product space using projections.

Theorem 10.21 (The spectral theorem for normal operators) Let $\tau$ be an operator on a finite-dimensional complex inner product space $V$. The following statements are equivalent:
1) $\tau$ is normal.
2) $\tau$ is unitarily diagonalizable, that is, $V = E_{\lambda_1} \oplus^{\perp} \cdots \oplus^{\perp} E_{\lambda_k}$.
3) $\tau$ has the orthogonal spectral resolution
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k \tag{10.1}$$
where $\lambda_i \in \mathbb{C}$ and where $\rho_1 + \cdots + \rho_k = \iota$ is an orthogonal resolution of the identity.
Moreover, if $\tau$ has the form (10.1), where the $\lambda_i$'s are distinct and the $\rho_i$'s are nonzero, then the $\lambda_i$'s are the eigenvalues of $\tau$ and $\operatorname{im}(\rho_i)$ is the eigenspace associated with $\lambda_i$.
Proof. We have seen that 1) and 2) are equivalent. Suppose that $\tau$ is unitarily diagonalizable. Let $\rho_i$ be orthogonal projection onto $E_{\lambda_i}$. Then any $v \in V$ can be written as a sum of orthogonal eigenvectors $v = v_1 + \cdots + v_k$ and so
$$\tau(v) = \lambda_1 v_1 + \cdots + \lambda_k v_k = (\lambda_1\rho_1 + \cdots + \lambda_k\rho_k)(v)$$
Hence, 3) holds. Conversely, if (10.1) holds, we have
$$V = \operatorname{im}(\rho_1) \oplus^{\perp} \cdots \oplus^{\perp} \operatorname{im}(\rho_k)$$
But Theorem 8.18 implies that $\operatorname{im}(\rho_i) = E_{\lambda_i}$ and so $\tau$ is unitarily diagonalizable.
In the real case, we have the following.

Theorem 10.22 (The spectral theorem for self-adjoint operators) Let $\tau$ be an operator on a finite-dimensional real inner product space $V$. The following statements are equivalent:
1) $\tau$ is self-adjoint.
2) $\tau$ is orthogonally diagonalizable, that is, $V = E_{\lambda_1} \oplus^{\perp} \cdots \oplus^{\perp} E_{\lambda_k}$.
3) $\tau$ has the orthogonal spectral resolution
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k \tag{10.2}$$
where $\lambda_i \in \mathbb{R}$ and $\rho_1 + \cdots + \rho_k = \iota$ is an orthogonal resolution of the identity.
Moreover, if $\tau$ has the form (10.2), where the $\lambda_i$'s are distinct and the $\rho_i$'s are nonzero, then the $\lambda_i$'s are the eigenvalues of $\tau$ and $\operatorname{im}(\rho_i)$ is the eigenspace associated with $\lambda_i$.
Spectral Resolutions and Functional Calculus

Let $\tau$ be a linear operator on a finite-dimensional inner product space $V$, and let $\tau$ have spectral resolution
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$
Since each $\rho_i$ is idempotent, we have $\rho_i^m = \rho_i$ for all $m \ge 1$. The mutual orthogonality of the projections means that $\rho_i\rho_j = 0$ for all $i \ne j$ and so
$$\tau^m = (\lambda_1\rho_1 + \cdots + \lambda_k\rho_k)^m = \lambda_1^m\rho_1 + \cdots + \lambda_k^m\rho_k$$
More generally, for any polynomial $p(x)$ over $F$, we have
$$p(\tau) = p(\lambda_1)\rho_1 + \cdots + p(\lambda_k)\rho_k$$
Now, we can extend this further by defining, for any function $f\colon \{\lambda_1, \dots, \lambda_k\} \to F$, the linear operator $f(\tau)$ by setting
$$f(\tau) = f(\lambda_1)\rho_1 + \cdots + f(\lambda_k)\rho_k$$
For example, we may define $\sqrt{\tau}$, $\tau^{-1}$ and so on. Notice, however, that since the spectral resolution of $\tau$ is a finite sum, we gain nothing (but convenience) by using functions other than polynomials, for we can always find a polynomial $p(x)$ for which $p(\lambda_i) = f(\lambda_i)$ for $i = 1, \dots, k$ and so
$$f(\tau) = f(\lambda_1)\rho_1 + \cdots + f(\lambda_k)\rho_k = p(\lambda_1)\rho_1 + \cdots + p(\lambda_k)\rho_k = p(\tau)$$
The study of the properties of functions of an operator is referred to as the functional calculus of $\tau$. According to the spectral theorem, if $V$ is complex and $\tau$ is normal then $f(\tau)$ is a normal operator whose eigenvalues are $f(\lambda_i)$. Similarly, if $V$ is real and $\tau$ is self-adjoint then $f(\tau)$ is self-adjoint, with eigenvalues $f(\lambda_i)$. Let us consider some special cases of this construction.
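Before turning to these special cases, here is a brief numerical illustration (ours, not the book's) of the definition of $f(\tau)$, taking $f = \exp$ on a symmetric matrix and comparing against scipy's matrix exponential; the availability of scipy is an assumption of this sketch.

```python
# f(tau) = sum_i f(lambda_i) p_i, evaluated for f = exp.
import numpy as np
from scipy.linalg import expm  # assumption: scipy is available

tau = np.array([[2.0, 1.0], [1.0, 2.0]])
lam, U = np.linalg.eigh(tau)

f_tau = sum(np.exp(l) * np.outer(U[:, i], U[:, i]) for i, l in enumerate(lam))
print(np.allclose(f_tau, expm(tau)))   # the two constructions agree
```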
If $p_i(x)$ is a polynomial for which $p_i(\lambda_j) = \delta_{i,j}$ for $j = 1, \dots, k$, then $p_i(\tau) = \rho_i$, and so we see that each projection $\rho_i$ in the spectral resolution is a polynomial function of $\tau$. If $\tau$ is invertible then $\lambda_i \ne 0$ for all $i$ and so we may take $f(x) = x^{-1}$, giving
$$\tau^{-1} = \lambda_1^{-1}\rho_1 + \cdots + \lambda_k^{-1}\rho_k$$
as can easily be verified by direct calculation. If $f(\lambda) = \overline{\lambda}$ and if $\tau$ is normal then each $\rho_i$ is self-adjoint and so
$$f(\tau) = \overline{\lambda_1}\rho_1 + \cdots + \overline{\lambda_k}\rho_k = \tau^*$$
Commutativity

The functional calculus can be applied to the study of the commutativity properties of operators. Here are two simple examples.

Theorem 10.23 Let $V$ be a finite-dimensional complex inner product space. For $\sigma, \tau \in \mathcal{L}(V)$, let us write $\sigma \leftrightarrow \tau$ to denote the fact that $\sigma$ and $\tau$ commute. Let $\tau$ and $\sigma$ have spectral resolutions
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k, \qquad \sigma = \mu_1\sigma_1 + \cdots + \mu_m\sigma_m$$
Then
1) For any $\varphi \in \mathcal{L}(V)$, we have $\varphi \leftrightarrow \tau$ if and only if $\varphi \leftrightarrow \rho_i$ for all $i$.
2) $\tau \leftrightarrow \sigma$ if and only if $\rho_i \leftrightarrow \sigma_j$ for all $i, j$.
3) If $f\colon \{\lambda_1, \dots, \lambda_k\} \to \mathbb{C}$ and $g\colon \{\mu_1, \dots, \mu_m\} \to \mathbb{C}$ are injective functions, then $f(\tau) \leftrightarrow g(\sigma)$ if and only if $\tau \leftrightarrow \sigma$.
Proof. The proof is based on the fact that if $\sigma$ and $\tau$ are operators then $\sigma \leftrightarrow \tau$ implies $p(\sigma) \leftrightarrow q(\tau)$ for any polynomials $p(x)$ and $q(x)$, and hence $f(\sigma) \leftrightarrow g(\tau)$ for any functions $f$ and $g$. For 1), it is clear that $\varphi \leftrightarrow \rho_i$ for all $i$ implies $\varphi \leftrightarrow \tau$. The converse follows from the fact that $\rho_i$ is a polynomial in $\tau$. Part 2) is similar. For part 3), clearly $\tau \leftrightarrow \sigma$ implies $f(\tau) \leftrightarrow g(\sigma)$. For the converse, let $\Omega = \{\lambda_1, \dots, \lambda_k\}$. Since $f$ is injective, the inverse function $f^{-1}\colon f(\Omega) \to \Omega$ is well-defined and
$f^{-1}(f(\tau)) = \tau$. Thus, $\tau$ is a function of $f(\tau)$. Similarly, $\sigma$ is a function of $g(\sigma)$. It follows that $f(\tau) \leftrightarrow g(\sigma)$ implies $\tau \leftrightarrow \sigma$.
Theorem 10.24 Let $V$ be a finite-dimensional complex inner product space and let $\sigma$ and $\tau$ be normal operators on $V$. Then $\sigma$ and $\tau$ commute if and only if they have the form
$$\tau = p(r(\tau, \sigma)), \qquad \sigma = q(r(\tau, \sigma))$$
where $p(x)$, $q(x)$ and $r(x, y)$ are polynomials.
Proof. If $\sigma$ and $\tau$ are both polynomials in a common operator then they clearly commute. For the converse, suppose that $\sigma\tau = \tau\sigma$ and let
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k, \qquad \sigma = \mu_1\sigma_1 + \cdots + \mu_m\sigma_m$$
be the orthogonal spectral resolutions of $\tau$ and $\sigma$. Then according to Theorem 10.23, $\rho_i \leftrightarrow \sigma_j$. Now, let us choose any polynomial $r(x, y)$ with the property that the scalars $\gamma_{i,j} = r(\lambda_i, \mu_j)$ are distinct. Since each $\rho_i$ and $\sigma_j$ is self-adjoint and they commute, the products $\rho_i\sigma_j$ are orthogonal projections, and we may set $\psi = r(\tau, \sigma)$ and deduce (after some algebra) that
$$\psi = r(\tau, \sigma) = \sum_{i,j} \gamma_{i,j}\,\rho_i\sigma_j$$
We also choose $p(x)$ and $q(x)$ so that $p(\gamma_{i,j}) = \lambda_i$ for all $j$ and $q(\gamma_{i,j}) = \mu_j$ for all $i$. Then
$$p(\psi) = \sum_{i,j} p(\gamma_{i,j})\,\rho_i\sigma_j = \sum_{i,j} \lambda_i\,\rho_i\sigma_j = \sum_i \lambda_i\rho_i\sum_j \sigma_j = \sum_i \lambda_i\rho_i = \tau$$
and similarly, $q(\psi) = \sigma$.
Positive Operators

One of the most important cases of the functional calculus is when $f(x) = \sqrt{x}$. First, we need some definitions. Recall that the quadratic form associated with a linear operator $\tau$ is
$$Q_\tau(v) = \langle \tau(v), v\rangle$$
Definition A self-adjoint linear operator $\tau \in \mathcal{L}(V)$ is
1) positive if $Q_\tau(v) \ge 0$ for all $v \in V$,
2) positive definite if $Q_\tau(v) > 0$ for all $v \ne 0$.
Theorem 10.25 A self-adjoint operator $\tau$ on a finite-dimensional inner product space is
1) positive if and only if all of its eigenvalues are nonnegative,
2) positive definite if and only if all of its eigenvalues are positive.
Proof. If $Q_\tau(v) \ge 0$ and $\tau(v) = \lambda v$ with $v \ne 0$ then
$$0 \le \langle \tau(v), v\rangle = \lambda\langle v, v\rangle$$
and so $\lambda \ge 0$. Conversely, if all eigenvalues $\lambda_i$ of $\tau$ are nonnegative then we have $\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$ with $\lambda_i \ge 0$, and since the $\rho_i$'s are mutually orthogonal, self-adjoint and idempotent,
$$\langle \tau(v), v\rangle = \sum_i \lambda_i\langle \rho_i(v), \rho_i(v)\rangle = \sum_i \lambda_i\|\rho_i(v)\|^2 \ge 0$$
and so $\tau$ is positive. Part 2) is proved similarly.
If $\tau$ is a positive operator, with spectral resolution
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k, \qquad \lambda_i \ge 0$$
then we may take the positive square root of $\tau$:
$$\sqrt{\tau} = \sqrt{\lambda_1}\,\rho_1 + \cdots + \sqrt{\lambda_k}\,\rho_k$$
where $\sqrt{\lambda_i}$ is the nonnegative square root of $\lambda_i$. It is clear that $(\sqrt{\tau})^2 = \tau$, and it is not hard to see that $\sqrt{\tau}$ is the only positive operator whose square is $\tau$. In other words, every positive operator has a unique positive square root. Conversely, if $\tau$ has a positive square root, that is, if $\tau = \sigma^2$ for some positive operator $\sigma$, then $\tau$ is positive. Hence, an operator is positive if and only if it has a positive square root.
If $\tau$ is positive then $\sqrt{\tau}$ is self-adjoint and so
$$(\sqrt{\tau})^*\sqrt{\tau} = \tau$$
Conversely, if $\tau = \varphi^*\varphi$ for some operator $\varphi$ then $\tau$ is positive, since it is clearly self-adjoint and
$$\langle \tau(v), v\rangle = \langle \varphi^*\varphi(v), v\rangle = \langle \varphi(v), \varphi(v)\rangle \ge 0$$
Thus, $\tau$ is positive if and only if it has the form $\tau = \varphi^*\varphi$ for some operator $\varphi$.
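As a numerical aside (ours, not part of the text), the positive square root can be computed exactly as described, from an orthonormal eigenbasis:

```python
# sqrt(tau) = sum_i sqrt(lambda_i) p_i for a positive matrix tau.
import numpy as np

tau = np.array([[5.0, 2.0], [2.0, 5.0]])     # symmetric, eigenvalues 3 and 7
lam, U = np.linalg.eigh(tau)
root = U @ np.diag(np.sqrt(lam)) @ U.T       # the unique positive square root

print(np.allclose(root @ root, tau))          # (sqrt tau)^2 = tau
print(np.allclose(root, root.T))              # sqrt tau is self-adjoint
print((np.linalg.eigvalsh(root) >= 0).all())  # and positive
```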
Here is an application of square roots.

Theorem 10.26 If $\sigma$ and $\tau$ are positive operators and $\sigma\tau = \tau\sigma$ then $\sigma\tau$ is positive.
Proof. Since $\sigma$ is a positive operator, it has a positive square root $\sqrt{\sigma}$, which is a polynomial in $\sigma$. A similar statement holds for $\tau$. Therefore, since $\sigma$ and $\tau$ commute, so do $\sqrt{\sigma}$ and $\sqrt{\tau}$. Hence,
$$(\sqrt{\sigma}\sqrt{\tau})^2 = (\sqrt{\sigma})^2(\sqrt{\tau})^2 = \sigma\tau$$
Since $\sqrt{\sigma}$ and $\sqrt{\tau}$ are self-adjoint and commute, their product $\sqrt{\sigma}\sqrt{\tau}$ is self-adjoint, and so $\sigma\tau$ is positive.
The Polar Decomposition of an Operator

It is well known that any nonzero complex number $z$ can be written in the polar form $z = re^{i\theta}$, where $r$ is a positive number and $\theta$ is real. We can do the same for any nonzero linear operator on a finite-dimensional complex inner product space.

Theorem 10.27 Let $\tau$ be a nonzero linear operator on a finite-dimensional complex inner product space $V$. Then
1) There exist a positive operator $\rho$ and a unitary operator $\upsilon$ for which $\tau = \upsilon\rho$. Moreover, $\rho$ is unique, and if $\tau$ is invertible then $\upsilon$ is also unique.
2) There exist a positive operator $\rho$ and a unitary operator $\upsilon$ for which $\tau = \rho\upsilon$. Moreover, $\rho$ is unique, and if $\tau$ is invertible then $\upsilon$ is also unique.
Proof. Let us suppose for a moment that $\tau = \upsilon\rho$. Then
$$\tau^* = (\upsilon\rho)^* = \rho^*\upsilon^* = \rho\upsilon^{-1}$$
and so
$$\tau^*\tau = \rho\upsilon^{-1}\upsilon\rho = \rho^2$$
Also, if $v \in V$ then $\tau(v) = \upsilon(\rho(v))$. These equations give us a clue as to how to define $\rho$ and $\upsilon$. Let us define $\rho$ to be the unique positive square root of the positive operator $\tau^*\tau$. Then
$$\|\tau(v)\|^2 = \langle \tau(v), \tau(v)\rangle = \langle \tau^*\tau(v), v\rangle = \langle \rho^2(v), v\rangle = \|\rho(v)\|^2 \tag{10.3}$$
Let us define $\upsilon$ on $\operatorname{im}(\rho)$ by
$$\upsilon(\rho(v)) = \tau(v) \tag{10.4}$$
for all $v \in V$. Equation (10.3) shows that $\rho(x) = \rho(y)$ implies $\tau(x) = \tau(y)$, and so this definition of $\upsilon$ on $\operatorname{im}(\rho)$ is well-defined.
Moreover, $\upsilon$ is an isometry on $\operatorname{im}(\rho)$, since (10.3) gives
$$\|\upsilon(\rho(v))\| = \|\tau(v)\| = \|\rho(v)\|$$
Thus, if $\mathcal{B} = \{b_1, \dots, b_k\}$ is an orthonormal basis for $\operatorname{im}(\rho)$, then $\upsilon(\mathcal{B}) = \{\upsilon(b_1), \dots, \upsilon(b_k)\}$ is an orthonormal basis for $\upsilon(\operatorname{im}(\rho)) = \operatorname{im}(\tau)$. Finally, we may extend both orthonormal bases to orthonormal bases for $V$ and then extend the definition of $\upsilon$ to an isometry on $V$, for which $\tau = \upsilon\rho$.
As for the uniqueness, we have seen that $\rho$ must satisfy $\rho^2 = \tau^*\tau$, and since $\tau^*\tau$ has a unique positive square root, we deduce that $\rho$ is uniquely defined. Finally, if $\tau$ is invertible then so is $\rho$, since $\ker(\rho) \subseteq \ker(\tau)$. Hence, $\upsilon = \tau\rho^{-1}$ is uniquely determined by $\tau$.
Part 2) can be proved by applying the first part to the map $\tau^*$, to get
$$\tau = (\tau^*)^* = (\upsilon\rho)^* = \rho\upsilon^{-1} = \rho\mu$$
where $\mu = \upsilon^{-1}$ is unitary.
We leave it as an exercise to show that any unitary operator $\upsilon$ has the form $\upsilon = e^{i\psi}$, where $\psi$ is a self-adjoint operator. This gives the following corollary.

Corollary 10.27 (Polar decomposition) Let $\tau$ be a nonzero linear operator on a finite-dimensional complex inner product space. Then there exist a positive operator $\rho$ and a self-adjoint operator $\psi$ for which $\tau$ has the polar decomposition
$$\tau = \rho\,e^{i\psi}$$
Moreover, $\rho$ is unique, and if $\tau$ is invertible then $e^{i\psi}$ is also unique.
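A standard way to compute the polar decomposition numerically (an assumption of this sketch, not the book's construction) is via the singular value decomposition:

```python
# From tau = W S X* take the unitary factor u = W X* and the positive
# factor p = X S X*, so that tau = u p with p = sqrt(tau* tau).
import numpy as np

tau = np.array([[1.0, 2.0], [0.0, 1.0]])
W, S, Xh = np.linalg.svd(tau)
u = W @ Xh                          # unitary factor
p = Xh.conj().T @ np.diag(S) @ Xh   # positive factor

print(np.allclose(u @ p, tau))                   # tau = u p
print(np.allclose(u @ u.conj().T, np.eye(2)))    # u is unitary
print(np.allclose(p, p.conj().T), (np.linalg.eigvalsh(p) >= 0).all())
```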
Normal operators can be characterized using the polar decomposition.

Theorem 10.28 Let $\tau = \upsilon\rho$ be a polar decomposition of a nonzero linear operator $\tau$. Then $\tau$ is normal if and only if $\upsilon\rho = \rho\upsilon$.
Proof. Since
$$\tau^*\tau = \rho\upsilon^{-1}\upsilon\rho = \rho^2 \quad\text{and}\quad \tau\tau^* = \upsilon\rho\rho\upsilon^{-1} = \upsilon\rho^2\upsilon^{-1}$$
we see that $\tau$ is normal if and only if
$$\upsilon\rho^2\upsilon^{-1} = \rho^2$$
or equivalently,
$$\upsilon\rho^2 = \rho^2\upsilon \tag{10.5}$$
Now, $\rho$ is a polynomial in $\rho^2$ and $\rho^2$ is a polynomial in $\rho$, and so (10.5) holds if and only if $\upsilon\rho = \rho\upsilon$.
Exercises
1. Let $\tau \in \mathcal{L}(U, V)$. If $\tau$ is surjective, find a formula for a right inverse of $\tau$ in terms of $\tau^*$. If $\tau$ is injective, find a formula for a left inverse of $\tau$ in terms of $\tau^*$. Hint: consider $\ker(\tau\tau^*)$ and $\ker(\tau^*\tau)$.
2. Let $\tau \in \mathcal{L}(V)$ where $V$ is a complex vector space and let
$$\sigma = \frac{1}{2}(\tau + \tau^*) \quad\text{and}\quad \mu = \frac{1}{2i}(\tau - \tau^*)$$
Show that $\sigma$ and $\mu$ are self-adjoint and that
$$\tau = \sigma + i\mu \quad\text{and}\quad \tau^* = \sigma - i\mu$$
What can you say about the uniqueness of these representations of $\tau$ and $\tau^*$?
3. Prove that all of the roots of the characteristic polynomial of a skew-Hermitian matrix are pure imaginary.
4. Give an example of a normal operator that is neither self-adjoint nor unitary.
5. Prove that if $\|\tau(v)\| = \|\tau^*(v)\|$ for all $v \in V$, where $V$ is complex, then $\tau$ is normal.
6. a) Show that if $\tau$ is a normal operator on a finite-dimensional inner product space then $\tau^* = p(\tau)$, for some polynomial $p(x) \in F[x]$.
   b) Show that if $\tau$ is normal and $\tau\sigma = \sigma\tau$ then $\tau^*\sigma = \sigma\tau^*$. In other words, $\tau^*$ commutes with all operators that commute with $\tau$.
7. Show that a linear operator $\tau$ on a finite-dimensional complex inner product space $V$ is normal if and only if whenever $S$ is an invariant subspace under $\tau$, so is $S^{\perp}$.
8. Let $V$ be a finite-dimensional inner product space and let $\tau$ be a normal operator on $V$.
   a) Prove that if $\tau$ is idempotent then it is also self-adjoint.
   b) Prove that if $\tau$ is nilpotent then $\tau = 0$.
   c) Prove that if $\tau^3 = \tau^2$ then $\tau$ is idempotent.
9. Show that if $\tau$ is a normal operator on a finite-dimensional complex inner product space then the algebraic multiplicity is equal to the geometric multiplicity for all eigenvalues of $\tau$.
10. Show that two orthogonal projections $\rho$ and $\sigma$ are orthogonal to each other if and only if $\operatorname{im}(\rho) \perp \operatorname{im}(\sigma)$.
11. Let $\tau$ be a normal operator and let $\sigma$ be any operator on $V$. If the eigenspaces of $\tau$ are $\sigma$-invariant, show that $\tau$ and $\sigma$ commute.
12. Prove that if $\sigma$ and $\tau$ are normal operators on a finite-dimensional complex inner product space and if $\sigma\rho = \rho\tau$ for some operator $\rho$ then $\sigma^*\rho = \rho\tau^*$.
13. Prove that if two normal $n \times n$ complex matrices are similar then they are unitarily similar, that is, similar via a unitary matrix.
14. If $\upsilon$ is a unitary operator on a complex inner product space, show that there exists a self-adjoint operator $\psi$ for which $\upsilon = e^{i\psi}$.
15. Show that a positive operator has a unique positive square root.
16. Let $\lambda_i, \mu_i$ be complex numbers, for $i = 1, \dots, k$, with the $\lambda_i$ distinct. Construct a polynomial $p(x)$ for which $p(\lambda_i) = \mu_i$ for all $i$.
17. Prove that if $\tau$ has a square root, that is, if $\tau = \sigma^2$ for some positive operator $\sigma$, then $\tau$ is positive.
18. Prove that if $\sigma \le \tau$ and if $\rho$ is a positive operator that commutes with both $\sigma$ and $\tau$, then $\sigma\rho \le \tau\rho$.
19. Does every self-adjoint operator on a finite-dimensional real inner product space have a square root?
20. Let $\tau$ be a linear operator on $\mathbb{C}^n$ and let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $\tau$, each one written a number of times equal to its algebraic multiplicity. Show that
$$\sum_i |\lambda_i|^2 \le \operatorname{tr}(\tau^*\tau)$$
where tr is the trace. Show also that equality holds if and only if $\tau$ is normal.
21. If $\tau \in \mathcal{L}(V)$ where $V$ is a real inner product space, show that the Hilbert space adjoint satisfies $(\tau^*)^{\mathbb{C}} = (\tau^{\mathbb{C}})^*$.
Part II—Topics
Chapter 11
Metric Vector Spaces: The Theory of Bilinear Forms
In this chapter, we study vector spaces over arbitrary fields that have a bilinear form defined upon them. Unless otherwise mentioned, all vector spaces are assumed to be finite-dimensional. The symbol $F$ denotes an arbitrary field and $F_q$ denotes a finite field of size $q$.
Symmetric, Skew-Symmetric and Alternate Forms

We begin with the basic definition.

Definition Let $V$ be a vector space over $F$. A mapping $\langle\,,\rangle\colon V \times V \to F$ is called a bilinear form if it is a linear function of each coordinate, that is, if
$$\langle rx + sy, z\rangle = r\langle x, z\rangle + s\langle y, z\rangle \quad\text{and}\quad \langle z, rx + sy\rangle = r\langle z, x\rangle + s\langle z, y\rangle$$
A bilinear form is
1) symmetric if $\langle x, y\rangle = \langle y, x\rangle$ for all $x, y \in V$,
2) skew-symmetric (or antisymmetric) if $\langle x, y\rangle = -\langle y, x\rangle$ for all $x, y \in V$,
3) alternate (or alternating) if $\langle x, x\rangle = 0$ for all $x \in V$.
A bilinear form that is either symmetric, skew-symmetric, or alternate is referred to as an inner product, and a pair $(V, \langle\,,\rangle)$, where $V$ is a vector space and $\langle\,,\rangle$ is an inner product on $V$, is called a metric vector space or inner product space. If $\langle\,,\rangle$ is symmetric then $(V, \langle\,,\rangle)$ (or just $V$) is called an orthogonal geometry over $F$, and if $\langle\,,\rangle$ is alternate then $(V, \langle\,,\rangle)$ (or just $V$) is called a symplectic geometry over $F$.
As an aside, the term symplectic, from the Greek for "intertwined," was introduced in 1939 by the famous mathematician Hermann Weyl in his book The Classical Groups, as a substitute for the term complex. According to the dictionary, symplectic means "relating to or being an intergrowth of two different minerals." An example is ophicalcite, which is marble spotted with green serpentine.

Example 11.1 Minkowski space $M_4$ is the four-dimensional real orthogonal geometry $\mathbb{R}^4$ with inner product defined by
$$\langle e_1, e_1\rangle = \langle e_2, e_2\rangle = \langle e_3, e_3\rangle = 1, \qquad \langle e_4, e_4\rangle = -1$$
$$\langle e_i, e_j\rangle = 0 \ \text{for } i \ne j$$
where $e_1, \dots, e_4$ is the standard basis for $\mathbb{R}^4$.
As is traditional, when the inner product is understood, we will use the phrase "let $V$ be a metric vector space." The real inner products discussed in Chapter 9 are inner products in the present sense and have the additional property of being positive definite, a notion that does not even make sense if the base field is not ordered. Thus, a real inner product space is an orthogonal geometry. On the other hand, the complex inner products of Chapter 9, being sesquilinear, are not inner products in the present sense. For this reason, we prefer to use the term metric vector space rather than inner product space.
If $S$ is a vector subspace of a metric vector space $V$ then $S$ inherits the metric structure from $V$. With this structure, we refer to $S$ as a subspace of $V$.
The concepts of being symmetric, skew-symmetric and alternate are not independent. However, their relationship depends on the characteristic of the base field $F$, as do many other properties of metric vector spaces. In fact, the next theorem tells us that we do not need to consider skew-symmetric forms per
se, since skew-symmetry is always equivalent to either symmetry or alternateness.

Theorem 11.1 Let $V$ be a vector space over a field $F$.
1) If $\operatorname{char}(F) = 2$ then
$$\text{symmetric} \iff \text{skew-symmetric}, \qquad \text{alternate} \implies \text{skew-symmetric}$$
2) If $\operatorname{char}(F) \ne 2$ then
$$\text{alternate} \iff \text{skew-symmetric}$$
Also, the only form that is both alternate and symmetric is the zero form: $\langle x, y\rangle = 0$ for all $x, y \in V$.
Proof. First note that for any base field, if $\langle\,,\rangle$ is alternate then
$$0 = \langle x + y, x + y\rangle = \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle = \langle x, y\rangle + \langle y, x\rangle$$
Thus, $\langle x, y\rangle + \langle y, x\rangle = 0$, or $\langle x, y\rangle = -\langle y, x\rangle$, which shows that $\langle\,,\rangle$ is skew-symmetric. Thus, alternate always implies skew-symmetric.
If $\operatorname{char}(F) = 2$ then $-1 = 1$ and so the definitions of symmetric and skew-symmetric are equivalent. This proves 1). If $\operatorname{char}(F) \ne 2$ and $\langle\,,\rangle$ is skew-symmetric, then for any $x \in V$, we have $\langle x, x\rangle = -\langle x, x\rangle$, or $2\langle x, x\rangle = 0$, which implies that $\langle x, x\rangle = 0$. Hence, $\langle\,,\rangle$ is alternate. Finally, if the form is alternate and symmetric, then it is also skew-symmetric and so $\langle u, v\rangle = -\langle u, v\rangle$ for all $u, v \in V$, whence $\langle u, v\rangle = 0$ for all $u, v \in V$.
Example 11.2 The standard inner product on $V(n, q) = F_q^n$, defined by
$$(x_1, \dots, x_n) \cdot (y_1, \dots, y_n) = x_1y_1 + \cdots + x_ny_n$$
is symmetric, but not alternate, since
$$(1, 0, \dots, 0) \cdot (1, 0, \dots, 0) = 1 \ne 0$$
The Matrix of a Bilinear Form

If $\mathcal{B} = (b_1, \dots, b_n)$ is an ordered basis for a metric vector space $V$ then the form $\langle\,,\rangle$ is completely determined by the $n \times n$ matrix of values
$$M_{\mathcal{B}} = (m_{i,j}) = (\langle b_i, b_j\rangle)$$
This is referred to as the matrix of the form $\langle\,,\rangle$ with respect to the ordered basis $\mathcal{B}$. We also refer to $M_{\mathcal{B}}$ as the matrix of $V$ with respect to $\mathcal{B}$ and write $M_{\mathcal{B}}(V)$ when the space needs emphasis.
Observe that multiplication of the coordinate matrix of a vector by $M_{\mathcal{B}}$ produces a vector of inner products, to wit, if $x = \sum r_i b_i$ then
$$M_{\mathcal{B}}[x]_{\mathcal{B}} = \begin{pmatrix} \langle b_1, x\rangle \\ \vdots \\ \langle b_n, x\rangle \end{pmatrix}$$
and
$$[x]^t_{\mathcal{B}} M_{\mathcal{B}} = \left(\langle x, b_1\rangle \ \cdots \ \langle x, b_n\rangle\right)$$
It follows that if $y = \sum s_i b_i$ then
$$[x]^t_{\mathcal{B}} M_{\mathcal{B}} [y]_{\mathcal{B}} = \left(\langle x, b_1\rangle \ \cdots \ \langle x, b_n\rangle\right)\begin{pmatrix} s_1 \\ \vdots \\ s_n \end{pmatrix} = \langle x, y\rangle$$
and this uniquely defines the matrix $M_{\mathcal{B}}$, that is, if $[x]^t_{\mathcal{B}} A [y]_{\mathcal{B}} = \langle x, y\rangle$ for all $x, y \in V$ then $A = M_{\mathcal{B}}$.
Notice also that a form is symmetric if and only if the matrix $M_{\mathcal{B}}$ is symmetric, skew-symmetric if and only if $M_{\mathcal{B}}$ is skew-symmetric, and alternate if and only if $M_{\mathcal{B}}$ is skew-symmetric and has 0's on the main diagonal. The latter type of matrix is referred to as alternate.
Now let us see how the matrix of a form behaves with respect to a change of basis. Let $\mathcal{C} = (c_1, \dots, c_n)$ be an ordered basis for $V$. Recall from Chapter 2 that the change of basis matrix $M_{\mathcal{C},\mathcal{B}}$, whose $i$th column is $[c_i]_{\mathcal{B}}$, satisfies
$$[v]_{\mathcal{B}} = M_{\mathcal{C},\mathcal{B}}[v]_{\mathcal{C}}$$
Hence,
$$\langle x, y\rangle = [x]^t_{\mathcal{B}} M_{\mathcal{B}} [y]_{\mathcal{B}} = \left([x]^t_{\mathcal{C}} M^t_{\mathcal{C},\mathcal{B}}\right)M_{\mathcal{B}}\left(M_{\mathcal{C},\mathcal{B}}[y]_{\mathcal{C}}\right) = [x]^t_{\mathcal{C}}\left(M^t_{\mathcal{C},\mathcal{B}} M_{\mathcal{B}} M_{\mathcal{C},\mathcal{B}}\right)[y]_{\mathcal{C}}$$
and so
$$M_{\mathcal{C}} = M^t_{\mathcal{C},\mathcal{B}} M_{\mathcal{B}} M_{\mathcal{C},\mathcal{B}}$$
This prompts the following definition.

Definition Two matrices $A, B \in \mathcal{M}_n(F)$ are said to be congruent if there exists an invertible matrix $P$ for which
$$A = P^t B P$$
The equivalence classes under congruence are called congruence classes.
Let us summarize.

Theorem 11.2 If the matrix of a bilinear form on $V$ with respect to an ordered basis $\mathcal{B} = (b_1, \dots, b_n)$ is $M_{\mathcal{B}} = (\langle b_i, b_j\rangle)$ then
$$\langle x, y\rangle = [x]^t_{\mathcal{B}} M_{\mathcal{B}} [y]_{\mathcal{B}}$$
Furthermore, if $\mathcal{C} = (c_1, \dots, c_n)$ is an ordered basis for $V$ then
$$M_{\mathcal{C}} = M^t_{\mathcal{C},\mathcal{B}} M_{\mathcal{B}} M_{\mathcal{C},\mathcal{B}}$$
where $M_{\mathcal{C},\mathcal{B}}$ is the change of basis matrix from $\mathcal{C}$ to $\mathcal{B}$.
We have shown that if two matrices represent the same bilinear form on $V$, they must be congruent. Conversely, congruent matrices represent the same bilinear form on $V$. For suppose that $B = M_{\mathcal{B}}$ represents a bilinear form on $V$ with respect to the ordered basis $\mathcal{B}$, and that $A = P^t B P$ where $P$ is nonsingular. We saw in Chapter 2 that there is an ordered basis $\mathcal{C}$ for $V$ with the property that $P = M_{\mathcal{C},\mathcal{B}}$, and so
$$A = M^t_{\mathcal{C},\mathcal{B}} M_{\mathcal{B}} M_{\mathcal{C},\mathcal{B}}$$
Thus, $A = M_{\mathcal{C}}$ represents the same form with respect to $\mathcal{C}$.

Theorem 11.3 Two matrices $A$ and $B$ represent the same bilinear form $\langle\,,\rangle$ on $V$ if and only if they are congruent, in which case they represent the same set of bilinear forms on $V$.
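As a small numerical illustration (ours), the congruence relation of Theorem 11.2 can be checked directly: the value of the form is independent of the coordinate system used to compute it.

```python
# Under a change of basis with matrix P = M_{C,B}, the Gram matrix
# transforms by congruence: M_C = P^t M_B P.
import numpy as np

M_B = np.array([[1.0, 2.0], [2.0, 0.0]])   # a symmetric form in basis B
P = np.array([[1.0, 1.0], [0.0, 1.0]])     # invertible change of basis
M_C = P.T @ M_B @ P                        # congruent Gram matrix

# <x, y> computed in either coordinate system agrees.
x_C, y_C = np.array([1.0, 2.0]), np.array([3.0, -1.0])
x_B, y_B = P @ x_C, P @ y_C
print(np.isclose(x_B @ M_B @ y_B, x_C @ M_C @ y_C))
```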
In view of the fact that congruent matrices have the same rank, we may define the rank of a bilinear form (or of $V$) to be the rank of any matrix that represents that form.
The Discriminant of a Form

If $A$ and $B$ are congruent matrices then
$$\det(A) = \det(P^t B P) = \det(P)^2\det(B)$$
and so $\det(A)$ and $\det(B)$ differ by a square factor. The discriminant of a bilinear form is the set of all determinants of the matrices that represent the form under all choices of ordered bases. Thus, if $\det(A) = d$ for some matrix $A$ representing the form, then the discriminant of the form is the set
$$\Delta = d(F^*)^2 = \{dr^2 \mid 0 \ne r \in F\}$$
Quadratic Forms

There is a close link between symmetric bilinear forms and another important type of function defined on a vector space.

Definition A quadratic form on a vector space $V$ is a map $Q\colon V \to F$ with the following properties:
1) For all $r \in F$, $v \in V$, $Q(rv) = r^2Q(v)$.
2) The map
$$\langle u, v\rangle_Q = Q(u + v) - Q(u) - Q(v)$$
is a (symmetric) bilinear form.

Thus, every quadratic form $Q$ defines a symmetric bilinear form $\langle u, v\rangle_Q$. On the other hand, if $\operatorname{char}(F) \ne 2$ and if $\langle\,,\rangle$ is a symmetric bilinear form on $V$ then we can define a quadratic form $Q$ by
$$Q(x) = \frac{1}{2}\langle x, x\rangle$$
We leave it to the reader to verify that this is indeed a quadratic form. Moreover, if $Q$ is defined from a bilinear form in this way, then the bilinear form associated with $Q$ is
$$\langle u, v\rangle_Q = Q(u + v) - Q(u) - Q(v) = \frac{1}{2}\left(\langle u + v, u + v\rangle - \langle u, u\rangle - \langle v, v\rangle\right) = \frac{1}{2}\left(\langle u, v\rangle + \langle v, u\rangle\right) = \langle u, v\rangle$$
which is the original bilinear form. In other words, the maps $\langle\,,\rangle \mapsto Q$ and $Q \mapsto \langle\,,\rangle_Q$ are inverses of one another, and so there is a one-to-one correspondence between symmetric bilinear forms on $V$ and quadratic forms on $V$. Put another way, knowing the quadratic form is equivalent to knowing the corresponding bilinear form.
Again assuming that $\operatorname{char}(F) \ne 2$, if $\mathcal{B} = (v_1, \dots, v_n)$ is an ordered basis for an orthogonal geometry $V$ and if the matrix of the symmetric form on $V$ is $M_{\mathcal{B}} = (a_{i,j})$ then for $x = \sum x_i v_i$,
$$Q(x) = \frac{1}{2}\langle x, x\rangle = \frac{1}{2}[x]^t_{\mathcal{B}} M_{\mathcal{B}} [x]_{\mathcal{B}} = \frac{1}{2}\sum_{i,j} a_{i,j}x_ix_j$$
and so $Q(x)$ is a homogeneous polynomial of degree 2 in the coordinates $x_i$. (The term "form" means homogeneous polynomial; hence the term quadratic form.)
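As a brief illustration (ours, not the text's), the correspondence between a symmetric bilinear form and its quadratic form can be checked numerically over $\mathbb{R}$:

```python
# Q(x) = (1/2) x^t M x recovers the form via Q(u+v) - Q(u) - Q(v).
import numpy as np

M = np.array([[2.0, 1.0], [1.0, 4.0]])     # symmetric Gram matrix

def Q(x):
    return 0.5 * x @ M @ x                 # the quadratic form of <,>

u, v = np.array([1.0, 2.0]), np.array([3.0, 1.0])
print(np.isclose(Q(u + v) - Q(u) - Q(v), u @ M @ v))  # <u,v>_Q = <u,v>
print(np.isclose(Q(3 * u), 9 * Q(u)))                 # Q(rv) = r^2 Q(v)
```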
Orthogonality

As we will see, not all metric vector spaces behave as nicely as real inner product spaces, and this necessitates the introduction of a new set of terminology to cover various types of behavior. (The base field $F$ is the culprit, of course.) The most striking differences stem from the possibility that $\langle x, x\rangle = 0$ for a nonzero vector $x \in V$.
The following terminology should be familiar.

Definition A vector $x$ is orthogonal to a vector $y$, written $x \perp y$, if $\langle x, y\rangle = 0$. A vector $x \in V$ is orthogonal to a subset $S$ of $V$, written $x \perp S$, if $\langle x, s\rangle = 0$ for all $s \in S$. A subset $S$ of $V$ is orthogonal to a subset $T$ of $V$, written $S \perp T$, if $\langle s, t\rangle = 0$ for all $s \in S$ and $t \in T$. The orthogonal complement of a subset $X$ of a metric vector space $V$, denoted by $X^{\perp}$, is the subspace
$$X^{\perp} = \{v \in V \mid v \perp X\}$$
Note that regardless of whether the form is symmetric or alternate (and hence skew-symmetric), orthogonality is a symmetric relation, that is, $x \perp y$ implies $y \perp x$. Indeed, this is precisely why we restrict attention to these two types of bilinear forms. We will have more to say about this issue momentarily.
There are two types of degenerate behaviors that a vector may possess: It may be orthogonal to itself or, worse yet, it may be orthogonal to every vector in $V$. With respect to the former, we have the following terminology.

Definition Let $(V, \langle\,,\rangle)$ be a metric vector space.
1) A nonzero $x \in V$ is isotropic (or null) if $\langle x, x\rangle = 0$; otherwise it is nonisotropic.
2) $V$ is nonisotropic (also called anisotropic) if it contains no isotropic vectors.
3) $V$ is isotropic if it contains at least one isotropic vector.
4) $V$ is totally isotropic (that is, symplectic) if all vectors in $V$ are isotropic.
With respect to the latter (and more severe) form of degeneracy, we have the following terminology.

Definition Let $(V, \langle\,,\rangle)$ be a metric vector space.
1) The set $V^{\perp}$ of all degenerate vectors is called the radical of $V$ and written
$$\operatorname{rad}(V) = V^{\perp}$$
2) $V$ is nonsingular, or nondegenerate, if $\operatorname{rad}(V) = \{0\}$.
3) $V$ is singular, or degenerate, if $\operatorname{rad}(V) \ne \{0\}$.
4) $V$ is totally singular, or totally degenerate, if $\operatorname{rad}(V) = V$.
Let us make a few remarks about these terms. Some of the above terminology is not entirely standard, so care should be exercised in reading the literature. Also, it is not hard to see that a metric vector space $V$ is nonsingular if and only if the matrix $M_{\mathcal{B}}$ is nonsingular, for any ordered basis $\mathcal{B}$.
If $v$ is an isotropic vector then so is $rv$ for all $r \in F$. This can be expressed by saying that the set $I$ of isotropic vectors in $V$ is a cone in $V$.
With respect to subspaces, to say that a subspace $S$ of $V$ is totally degenerate, for example, means that $S$ is totally degenerate as a metric vector space in its own right, and so each vector in $S$ is orthogonal to all other vectors in $S$, not necessarily in $V$. In fact, we have
$$\operatorname{rad}(S) = S^{\perp_S} = S \cap S^{\perp_V}$$
where the symbols $\perp_S$ and $\perp_V$ refer to the orthogonal complements with respect to $S$ and $V$, respectively.
Example 11.3 Recall that $V(n, q)$ is the set of all ordered $n$-tuples whose components come from the finite field $F_q$. It is easy to see that the subspace $S = \{0000, 1100, 0011, 1111\}$ of $V(4, 2)$ has the property that $S^{\perp} = S$. Note also that $V(4, 2)$ is nonsingular and yet the subspace $S$ is totally singular.
The following result explains why we restrict attention to symmetric or alternate forms (which includes skew-symmetric forms).

Theorem 11.4 Let $\langle\,,\rangle$ be a bilinear form on $V$. Then orthogonality is a symmetric relation, that is,
$$x \perp y \implies y \perp x \tag{11.1}$$
if and only if $\langle\,,\rangle$ is either symmetric or alternate, that is, if and only if $V$ is a metric vector space.
Proof. It is clear that (11.1) holds if $\langle\,,\rangle$ is symmetric. If $\langle\,,\rangle$ is alternate then it is skew-symmetric and so (11.1) also holds.
For the converse, assume that (11.1) holds. Let us introduce the notation $x \equiv y$ to mean that $\langle x, y\rangle = \langle y, x\rangle$ and the notation $x \equiv V$ to mean that $\langle x, v\rangle = \langle v, x\rangle$ for all $v \in V$. For $x, y, z \in V$, let
$$w = \langle x, y\rangle z - \langle x, z\rangle y$$
Then $x \perp w$ and so by (11.1) we have $w \perp x$, that is,
$$\langle x, y\rangle\langle z, x\rangle - \langle x, z\rangle\langle y, x\rangle = 0$$
From this we deduce that
$$x \equiv y \implies \langle x, y\rangle\left(\langle z, x\rangle - \langle x, z\rangle\right) = 0 \implies \langle x, y\rangle = 0 \ \text{or} \ x \equiv z \ \text{for all } z \in V$$
It follows that
$$x \equiv y \implies x \perp y \ \text{or} \ (x \equiv V \ \text{and} \ y \equiv V) \tag{11.2}$$
Of course, $x \equiv x$ and so for all $x \in V$,
$$x \perp x \ \text{or} \ x \equiv V \tag{11.3}$$
Now we can show that if $V$ is not symmetric, it must be alternate. If $V$ is not symmetric, there exist $u, v \in V$ for which $u \not\equiv v$. Thus, $u \not\equiv V$ and $v \not\equiv V$, which implies by (11.3) that $u$ and $v$ are both isotropic. We wish to show that all vectors in $V$ are isotropic.
According to (11.3), if $w \not\equiv V$, then $w$ is isotropic. On the other hand, if $w \equiv V$, then to see that $w$ is also isotropic, we use the fact that if $x$ and $y$ are orthogonal isotropic vectors, then $x - y$ is also isotropic. In particular, consider the vectors $w + u$ and $u$. We have seen that $u$ is isotropic. The fact that $w \equiv V$ implies $w \equiv u$ and $w \equiv v$, and so (11.2) gives $w \perp u$ and $w \perp v$. Hence, $(w + u) \perp u$. To see that $w + u$ is isotropic, note that
$$\langle w + u, v\rangle = \langle u, v\rangle \ne \langle v, u\rangle = \langle v, w + u\rangle$$
and so $(w + u) \not\equiv V$. Hence, (11.3) implies that $w + u$ is isotropic. Thus, $u$ and $w + u$ are orthogonal isotropic vectors, and so $w = (w + u) - u$ is also isotropic.
Linear Functionals

Recall that the Riesz representation theorem says that for any linear functional $f$ on a finite-dimensional real or complex inner product space $V$, there is a vector $R_f \in V$, which we called the Riesz vector for $f$, that represents $f$, in the sense that $f(v) = \langle v, R_f\rangle$ for all $v \in V$. A similar result holds for nonsingular metric vector spaces.
Let $V$ be a metric vector space over $F$. Let $x \in V$ and consider the "inner product on the right" map $\varphi_x\colon V \to F$ defined by
$$\varphi_x(v) = \langle v, x\rangle$$
This is easily seen to be a linear functional, and so we can define a function $\varphi\colon V \to V^*$ by
$$\varphi(x) = \varphi_x$$
The bilinearity of the form ensures that $\varphi$ is linear, and the kernel of $\varphi$ is
$$\ker(\varphi) = \{x \in V \mid \varphi_x = 0\} = \{x \in V \mid \langle v, x\rangle = 0 \ \text{for all } v \in V\} = V^{\perp}$$
Hence, if $V$ is nonsingular then $\ker(\varphi) = V^{\perp} = \{0\}$ and so $\varphi$ is injective. Moreover, since $\dim(V) = \dim(V^*)$, it follows that $\varphi$ is surjective, and so $\varphi$ is an isomorphism from $V$ onto $V^*$. This implies that every linear functional on $V$ has the form $\varphi_x$, for a unique $x \in V$. We have proved the Riesz representation theorem for finite-dimensional nonsingular metric vector spaces.

Theorem 11.5 (The Riesz representation theorem) Let $V$ be a finite-dimensional nonsingular metric vector space. The linear map $\varphi\colon V \to V^*$ defined by
$$\varphi(x) = \varphi_x$$
where $\varphi_x(v) = \langle v, x\rangle$ for all $v \in V$, is an isomorphism from $V$ to $V^*$. It follows that for each $f \in V^*$ there exists a unique vector $x \in V$ for which $f = \varphi_x$, that is,
$$f(v) = \langle v, x\rangle \ \text{for all } v \in V$$
The requirement that $V$ be nonsingular is necessary. As a simple example, if $V$ is totally singular, then no nonzero linear functional could possibly be represented by an inner product.
We would like to extend the Riesz representation theorem to the case of subspaces of a metric vector space. The Riesz representation theorem applies to nonsingular metric vector spaces. Thus, if $S$ is a nonsingular subspace of $V$, the Riesz representation theorem applies to $S$, and so all linear functionals on $S$ have the form of an inner product by a (unique) element of $S$. This is nothing new. As long as $V$ is nonsingular, even if $S$ is singular, we can still say something very useful. The reason is that any linear functional $f \in S^*$ can be extended to a linear functional $\hat{f}$ on $V$ (perhaps in many ways), and since $V$ is nonsingular, the extension has the form of inner product by a vector in $V$, that is, $\hat{f}(v) = \langle v, x\rangle$ for some $x \in V$. Hence, $f$ also has this form, where its "Riesz vector" is an element of $V$, not necessarily $S$. Here is the formal statement.

Theorem 11.6 Let $V$ be a metric vector space and let $S$ be a subspace of $V$. If either $V$ or $S$ is nonsingular, the linear transformation $\varphi\colon V \to S^*$ defined by
$$\varphi(x) = \varphi_x|_S$$
where $\varphi_x(v) = \langle v, x\rangle$, is surjective. Hence, for any linear functional $f \in S^*$ there is a (not necessarily unique) vector $x \in V$ for which $f(s) = \langle s, x\rangle$ for all $s \in S$. Moreover, if $S$ is nonsingular then $x$ can be taken from $S$, in which case it is unique.
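In coordinates, finding the Riesz vector of Theorem 11.5 amounts to solving a linear system with the Gram matrix. The following sketch (ours) illustrates this for a nonsingular orthogonal geometry:

```python
# With a nonsingular Gram matrix M, the functional f(v) = a . v is
# represented by the Riesz vector x solving M x = a, since then
# <v, x> = v^t M x = v^t a.
import numpy as np

M = np.array([[1.0, 0.0], [0.0, -1.0]])   # nonsingular symmetric form
a = np.array([2.0, 3.0])                  # f(v) = a . v in coordinates
x = np.linalg.solve(M, a)                 # Riesz vector for f

v = np.array([1.0, 5.0])
print(np.isclose(v @ M @ x, a @ v))       # f(v) = <v, x>
```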
Orthogonal Complements and Orthogonal Direct Sums

If $S$ is a subspace of a real inner product space, then the projection theorem says that the orthogonal complement $S^{\perp}$ of $S$ is a true vector space complement of $S$, that is,
$$V = S \oplus^{\perp} S^{\perp}$$
Hence, the term orthogonal complement is justified. However, in general metric vector spaces, an orthogonal complement may not be a vector space
complement. In fact, Example 11.3 shows that we may have the opposite extreme, that is, $S^{\perp} = S$. As we will see, the orthogonal complement of $S$ is a true complement if and only if $S$ is nonsingular.

Definition A metric vector space $V$ is the orthogonal direct sum of the subspaces $S$ and $T$, written
$$V = S \oplus^{\perp} T$$
if $V = S \oplus T$ and $S \perp T$.
In a real inner product space, if $V = S \oplus^{\perp} T$ then $T = S^{\perp}$. However, in a metric vector space in general, we may have a proper inclusion $T \subset S^{\perp}$. (In fact, $S^{\perp}$ may be all of $V$.) Many nice properties of orthogonality in real inner product spaces do carry over to nonsingular metric vector spaces. The next result shows that the restriction to nonsingular spaces is not that severe.

Theorem 11.7 Let $V$ be a metric vector space. Then
$$V = \operatorname{rad}(V) \oplus^{\perp} S$$
where $S$ is nonsingular and $\operatorname{rad}(V)$ is totally singular.
Proof. Ignoring the metric structure for a moment, we know that all subspaces of a vector space, including $\operatorname{rad}(V)$, have a complement, say $V = \operatorname{rad}(V) \oplus S$. But $\operatorname{rad}(V) \perp S$ and so $V = \operatorname{rad}(V) \oplus^{\perp} S$. To see that $S$ is nonsingular, if $x \in \operatorname{rad}(S)$ then $x \perp S$ and so $x \perp V$, which implies that $x \in \operatorname{rad}(V) \cap S = \{0\}$, that is, $x = 0$. Hence, $\operatorname{rad}(S) = \{0\}$ and $S$ is nonsingular.
Under the assumption of nonsingularity of $V$, we get many nice properties, just short of the projection theorem. The first property in the next theorem is key: It says that if $S$ is a subspace of a nonsingular space $V$, then the orthogonal complement of $S$ always has the "correct" dimension, even if it is not well behaved with respect to its intersection with $S$, that is,
$$\dim(S^{\perp}) = \dim(V) - \dim(S)$$
just as in the case of a real inner product space.

Theorem 11.8 Let $V$ be a nonsingular metric vector space and let $S$ be any subspace of $V$. Then
1) $\dim(S) + \dim(S^{\perp}) = \dim(V)$
2) if $V = S + S^{\perp}$ then $V = S \oplus^{\perp} S^{\perp}$
3) $S^{\perp\perp} = S$
4) $\operatorname{rad}(S) = \operatorname{rad}(S^{\perp})$
5) $S$ is nonsingular if and only if $S^{\perp}$ is nonsingular
Proof. For part 1), the map $\varphi\colon V \to S^*$ of Theorem 11.6 is surjective and
$$\ker(\varphi) = \{x \in V \mid \varphi_x|_S = 0\} = \{x \in V \mid \langle s, x\rangle = 0 \ \text{for all } s \in S\} = S^{\perp}$$
Thus, the rank-plus-nullity theorem implies that
$$\dim(S^*) + \dim(S^{\perp}) = \dim(V)$$
However, $\dim(S^*) = \dim(S)$, and so part 1) follows.
For part 2), we have, using part 1),
$$\dim(V) = \dim(S + S^{\perp}) = \dim(S) + \dim(S^{\perp}) - \dim(S \cap S^{\perp}) = \dim(V) - \dim(S \cap S^{\perp})$$
and so $S \cap S^{\perp} = \{0\}$.
For part 3), part 1) implies that
$$\dim(S) + \dim(S^{\perp}) = \dim(V) \quad\text{and}\quad \dim(S^{\perp}) + \dim(S^{\perp\perp}) = \dim(V)$$
and so $\dim(S^{\perp\perp}) = \dim(S)$. But $S \subseteq S^{\perp\perp}$ and so equality holds.
For part 4), we have
$$\operatorname{rad}(S) = S \cap S^{\perp} = S^{\perp\perp} \cap S^{\perp} = \operatorname{rad}(S^{\perp})$$
and part 5) follows from part 4).
The previous theorem cannot in general be strengthened. Consider the two-dimensional metric vector space $V = \operatorname{span}(u, v)$ where
$$\langle u, u\rangle = 1, \qquad \langle u, v\rangle = 0, \qquad \langle v, v\rangle = 0$$
Let $S = \operatorname{span}(u)$. Then $S^{\perp} = \operatorname{span}(v)$. Now, $S$ is nonsingular but $S^{\perp}$ is singular, and so 5) does not hold. Also, $\operatorname{rad}(S) = \{0\}$ and $\operatorname{rad}(S^{\perp}) = S^{\perp}$, and so 4) fails. Finally, $S^{\perp\perp} = V \ne S$, and so 3) fails.
On the other hand, we should note that if $S$ is singular, then so is $S^{\perp}$, regardless of whether $V$ is singular or nonsingular. To see this, note that if $S$ is singular then there is a nonzero $x \in \operatorname{rad}(S) = S \cap S^{\perp}$. Hence, $x \in S \subseteq S^{\perp\perp}$ and so $x \in S^{\perp} \cap S^{\perp\perp} = \operatorname{rad}(S^{\perp})$, which implies that $S^{\perp}$ is singular.
Now let us state the projection theorem for arbitrary metric vector spaces.
Theorem 11.9 Let $S$ be a subspace of a finite-dimensional metric vector space $V$. Then $V = S \oplus^{\perp} S^{\perp}$ if and only if $S$ is nonsingular, that is, if and only if $S \cap S^{\perp} = \{0\}$.
Proof. If $V = S \oplus^{\perp} S^{\perp}$ then by definition of orthogonal direct sum, we have $\operatorname{rad}(S) = S \cap S^{\perp} = \{0\}$ and so $S$ is nonsingular. Conversely, if $S$ is nonsingular, then $S \cap S^{\perp} = \{0\}$ and so $S \oplus^{\perp} S^{\perp}$ exists. Now, the same proof used in part 1) of the previous theorem works if $S$ is nonsingular (even if $V$ is singular). To wit, the map $\varphi\colon V \to S^*$ of Theorem 11.6 is surjective and
$$\ker(\varphi) = \{x \in V \mid \varphi_x|_S = 0\} = \{x \in V \mid \langle s, x\rangle = 0 \ \text{for all } s \in S\} = S^{\perp}$$
Thus, the rank-plus-nullity theorem gives
$$\dim(S^*) + \dim(S^{\perp}) = \dim(V)$$
But $\dim(S^*) = \dim(S)$ and so
$$\dim(S \oplus^{\perp} S^{\perp}) = \dim(S) + \dim(S^{\perp}) = \dim(V)$$
It follows that $V = S \oplus^{\perp} S^{\perp}$.
Isometries

We now turn to a discussion of structure-preserving maps on metric vector spaces.

Definition Let $V$ and $W$ be metric vector spaces. We use the same notation $\langle\,,\rangle$ for the bilinear form on each space. A bijective linear map $\tau\colon V \to W$ is called an isometry if
$$\langle \tau(u), \tau(v)\rangle = \langle u, v\rangle$$
for all vectors $u$ and $v$ in $V$. If an isometry exists from $V$ to $W$, we say that $V$ and $W$ are isometric and write $V \approx W$.

It is evident that the set of all isometries from $V$ to $V$ forms a group under composition. If $V$ is a nonsingular orthogonal geometry, an isometry of $V$ is called an orthogonal transformation. The set $\mathcal{O}(V)$ of all orthogonal transformations on $V$ is a group under composition, known as the orthogonal group of $V$. If $V$ is a nonsingular symplectic geometry, an isometry of $V$ is called a symplectic transformation. The set $\operatorname{Sp}(V)$ of all symplectic transformations on $V$ is a group under composition, known as the symplectic group of $V$.
Here are a few of the basic properties of isometries.

Theorem 11.10 Let $\tau \in \mathcal{L}(V, W)$ be a linear transformation between finite-dimensional metric vector spaces $V$ and $W$.
1) Let $\mathcal{B} = \{v_1, \dots, v_n\}$ be a basis for $V$. Then $\tau$ is an isometry if and only if $\tau$ is bijective and $\langle \tau(v_i), \tau(v_j)\rangle = \langle v_i, v_j\rangle$ for all $i, j$.
2) If $V$ is orthogonal and $\operatorname{char}(F) \ne 2$ then $\tau$ is an isometry if and only if it is bijective and $\langle \tau(v), \tau(v)\rangle = \langle v, v\rangle$ for all $v \in V$.
3) Suppose that $\tau$ is an isometry and $V = S \oplus^{\perp} S^{\perp}$ and $W = T \oplus^{\perp} T^{\perp}$. If $\tau(S) = T$ then $\tau(S^{\perp}) = T^{\perp}$. In particular, if $\tau \in \mathcal{L}(V)$ is an isometry and $V = S \oplus^{\perp} S^{\perp}$, then if $S$ is $\tau$-invariant, so is $S^{\perp}$.
Proof. We prove part 3) only. To see that $\tau(S^{\perp}) = T^{\perp}$, if $z \in S^{\perp}$ and $t \in T$ then since $T = \tau(S)$, we can write $t = \tau(s)$ for some $s \in S$, and so
$$\langle \tau(z), t\rangle = \langle \tau(z), \tau(s)\rangle = \langle z, s\rangle = 0$$
whence $\tau(S^{\perp}) \subseteq T^{\perp}$. But since $\dim(\tau(S^{\perp})) = \dim(T^{\perp})$, we deduce that $\tau(S^{\perp}) = T^{\perp}$.
Hyperbolic Spaces

A special type of two-dimensional metric vector space plays an important role in the structure theory of metric vector spaces.

Definition Let $V$ be a metric vector space. If $u, v \in V$ have the property that
$$\langle u, u\rangle = \langle v, v\rangle = 0, \qquad \langle u, v\rangle = 1$$
the ordered pair $(u, v)$ is called a hyperbolic pair. Note that $\langle v, u\rangle = 1$ if $V$ is an orthogonal geometry and $\langle v, u\rangle = -1$ if $V$ is symplectic. In either case, the subspace $H = \operatorname{span}(u, v)$ is called a hyperbolic plane, and any space of the form
$$\mathcal{H} = H_1 \oplus^{\perp} \cdots \oplus^{\perp} H_k$$
where each $H_i$ is a hyperbolic plane, is called a hyperbolic space. If $(u_i, v_i)$ is a hyperbolic pair for $H_i$ then we refer to the basis $(u_1, v_1, \dots, u_k, v_k)$ for $\mathcal{H}$ as a hyperbolic basis. (In the symplectic case, the usual term is symplectic basis.)
Note that any hyperbolic space > is nonsingular.
254
Advanced Linear Algebra
In the orthogonal case, hyperbolic planes can be characterized by their degree of isotropy, so-to-speak. (In the symplectic case, all spaces are totally isotropic by definition.) Indeed, we leave it as an exercise to prove that a two-dimensional nonsingular orthogonal geometry = is a hyperbolic plane if and only if = contains exactly two one-dimensional totally isotropic (equivalently: totally degenerate) subspaces. Put another way, the cone of isotropic vectors is the union of two one-dimensional subspaces of = .
Nonsingular Completions of a Subspace Let < be a subspace of a nonsingular metric vector space = . If < is singular, it is of interest to find a minimal nonsingular extension of < , that is, minimal nonsingular subspace of = containing < . Such extensions of < are called nonsingular completions of < . Theorem 11.11 (Nonsingular extension theorem) Let = be a nonsingular metric vector space over - . We assume that char²- ³ £ when = is orthogonal. 1) Let : be a subspace of = . For each isotropic vector # : ± : , there is a hyperbolic plane / ~ span²#Á '³ contained in : . Hence, / p : is an extension of : containing #. 2) Let < be a subspace of = and write < ~ rad²< ³ p > where > is nonsingular and ¸# Á Ã Á # ¹ is a basis for rad²< ³. Then there is a hyperbolic space / p Ä p / with hyperbolic basis ²# Á ' Á Ã Á # Á ' ³ for which < ~ / p Ä p / p > is a nonsingular extension of < , called a nonsingular completion of < . Proof. For part 1), the nonsingularity of = implies that : ~ : and so # ¤ : is equivalent to # ¤ : . Hence, there is a vector % : for which º#Á %» £ . If = is symplectic then we can take ' ~ ²°º#Á %»³%. If = is orthogonal, let ' ~ # b %. The conditions defining ²#Á '³ as a hyperbolic pair are (since # is isotropic) ~ º#Á '» ~ º#Á # b %» ~ º#Á %» and ~ º'Á '» ~ º# b %Á # b %» ~ º#Á %» b
º%Á %»
Since º#Á %» £ , the first of these equations can be solved for and since char²- ³ £ , the second can then be solved for . Thus, in either case, a vector ' : exists for which ²#Á '³ is a hyperbolic pair and span²#Á '³ : . For part 2), we proceed by indution on ~ dim²rad²< ³³. If ~ then # is isotropic and # > ± > . Hence, part 1) applied to : ~ > implies that there
Metric Vector Spaces: The Theory of Bilinear Forms
255
is a hyperbolic plane / ~ span²# Á '³ for which / p > is an extension of > containing span²# ³ ~ rad²< ³. Hence, part 2) holds when ~ . Let us assume that the result is true when dim²rad²< ³³ and assume that dim²rad²< ³³ ~ . Let < ~ span²# Á Ã Á # ³ p > Then # is (still) isotropic and # < ± < . Hence, we may apply part 1) to the subspace : ~ < to deduce the existence of a hyperbolic plane / ~ span²# Á ' ³ contained in < . Now, since / is nonsingular, we have = ~ / p / Since / < , it follows that < ~ < / . Thus, we may apply the induction hypothesis to < as a subspace of the nonsingular / , giving a space / p Ä p / p > / containing < . It follows that / p / p Ä p / p > is the desired extension of < .
Note that if < is a nonsingular extension of < then dim²< ³ ~ dim²< ³ b dim²rad²< ³³ Theorem 11.12 Let = be a nonsingular metric vector space and let < be a subspace of = . The following are equivalent: 1) > is a nonsingular completion of < 2) > is a minimal nonsingular extension of < 3) > is a nonsingular extension of < and dim²> ³ ~ dim²< ³ b dim²rad²< ³³ Moreover, any two nonsingular completions of < are isomorphic. Proof. If 1) holds and if < ? > where ? is nonsingular, then we may apply the nonsingular extension theorem to < as a subspace of ? , to obtain a nonsingular extension < Z of < for which <
But < Z and > have the same dimension and so must be equal. Hence, ? ~ > . Thus > is a minimal nonsingular extention of < and 2) holds. If 2) holds then we have < < > where < is a nonsingular completion of < . But the minimality of > implies that < ~ > and so 3) holds. If 3) holds then again we have < < > . But dim²< ³ ~ dim²> ³ and so > ~ < is a nonsingular completion of < and 1) holds.
256
Advanced Linear Algebra
If ? ~ > p > and @ ~ >Z p > are nonsingular completions of < ~ > p rad²< ³ then > and >Z are hyperbolic spaces of the same dimension and are therefore isometric. It follows that ? and @ are isometric.
Extending Isometries to Nonsingular Completions Let = and = Z be isometric nonsingular metric vector spaces and let < ~ > p rad²< ³ be a subspace of = , with nonsingular completion < . If ¢ < ¦ ²< ³ is an isometry, then it is a simple matter to extend to an isometry from < onto a nonsingular completion of ²< ³. To see this, let < ~>p> where > is nonsingular and ²" Á # Á à Á " Á # ³ is a hyperbolic basis for >. Since ²" Á à Á " ³ is a basis for rad²< ³, it follows that ² ²" ³Á à Á ²" ³³ is a basis for rad² ²< ³³. Now we complete ²< ³ ~ ²> ³ p rad² ²> ³³ to get ²< ³ ~ >Z p ²> ³ where >Z has hyperbolic basis ² ²" ³Á ' Á à Á ²" ³Á ' ³. To extend , simply set ²# ³ ~ ' for all ~ Á à Á . Theorem 11.13 Let = and = Z be isometric nonsingular metric vector spaces and let < be a subspace of = , with nonsingular completion < . Any isometry ¢ < ¦ ²< ³ can be extended to an isometry from < onto a nonsingular completion of ²< ³.
The Witt Theorems: A Preview There are two important theorems that are quite easy to prove in the case of real inner product spaces, but require more work in the case of metric vector spaces in general. Let = and = Z be nonsingular isometric metric vector spaces over a field - . We assume that char²- ³ £ if = is orthogonal. The Witt extension theorem says that if : is a subspace of = and ¢ : ¦ ²:³ = Z is an isometry, then can be extended to an isometry from = to = Z . The Witt cancellation theorem says that if = ~ : p :
and = Z ~ ; p ;
then : ; ¬ : ;
Metric Vector Spaces: The Theory of Bilinear Forms
257
We will prove these theorems in both the orthogonal and symplectic cases later in the chapter. For now, we simply want to show that it is easy to prove one Witt theorem from the other. Suppose that the Witt extension theorem holds and assume that = ~ : p :
and = Z ~ ; p ;
and : ; . Then any isometry ¢ : ¦ ; can be extended to an isometry from = to = Z . According to Theorem 11.10, we have ²: ³ ~ ; and so : ; . Hence, the Witt cancellation theorem holds. Conversely, suppose that the Witt cancellation theorem holds and let ¢ : ¦ ²:³ = Z be an isometry. Then we may extend to a nonsingular completion of : . Hence, we may assume that : is nonsingular. Then = ~ : p : Since is an isometry, ²:³ is also nonsingular and we can write = Z ~ ²:³ p ²:³ Since : ²:³, Witt's cancellation theorem implies that : ²:³ . If ¢ : ¦ ²:³ is an isometry then the map ¢ = ¦ = Z defined by ²" b #³ ~ ²"³ b ²#³ for " : and # : is an isometry that extends . Hence Witt's extension theorem holds.
The Classification Problem for Metric Vector Spaces The classification problem for a class of metric vector spaces (such as the orthogonal or symplectic spaces) is the problem of determining when two metric vector spaces in the class are isometric. The classification problem is considered “solved,” at least in a theoretical sense, by finding a set of canonical forms or a complete set of invariants for matrices under congruence. To see why, suppose that ¢ = ¦ > is an isometry and 8 ~ ²# Á à Á # ³ is an ordered basis for = . Then 9 ~ ² ²# ³Á à Á ²# ³³ is an ordered basis for > and 48 ²= ³ ~ ²º# Á # »³ ~ ²º ²# ³Á ²# ³»³ ~ 49 ²> ³ Thus, the congruence class of matrices representing = is identical to the congruence class of matrices representing > . Conversely, suppose that = and > are metric vector spaces with the same congruence class of representing matrices. Then if 8 ~ ²# Á à Á # ³ is an ordered basis for = , there is an ordered basis 9 ~ ²$ Á à Á $ ³ for > for which
258
Advanced Linear Algebra
²º# Á # »³ ~ 48 ²= ³ ~ 49 ²> ³ ~ ²º$ Á $ »³ Hence, the map ¢ = ¦ > defined by ²# ³ ~ $ is an isometry from = to > . We have shown that two metric vector spaces are isometric if and only if they have the same congruence class of representing matrices. Thus, we can determine whether any two metric vector spaces are isometric by representing each space with a matrix and determining if these matrices are congruent, using a set of canonical forms or a set of complete invariants.
Symplectic Geometry We now turn to a study of the structure of orthogonal and symplectic geometries and their isometries. Since the study of the structure (and the structure itself) of symplectic geometries is simpler than that of orthogonal geometries, we begin with the symplectic case. The reader who is interested only in the orthogonal case may omit this section. Throughout this section, let = be a nonsingular symplectic geometry.
The Classification of Symplectic Geometries Among the simplest types of metric vector spaces are those that possess an orthogonal basis, that is, a basis 8 ~ ¸" Á Ã Á " ¹ for which º" Á " » ~ when £ . For in this case, we may write = as an orthogonal direct sum of onedimensional subspaces = ~ span²" ³ p Ä p span²" ³ However, it is easy to see that a symplectic geometry = has an orthogonal basis if and only if it is totally degenerate. For if 8 is an orthogonal basis for = then º" Á " » ~ for ~ since the form is alternate and for £ since the basis is orthogonal. It follows that = is totally degenerate. Thus, no “interesting” symplectic geometries have orthogonal bases. Thus, in searching for an orthogonal decomposition of = , we must look not at one-dimensional subspaces, but at two-dimensional subspaces. Given a nonzero " = , the nonsingularity of = implies that there must exist a vector # = for which º"Á #» ~ £ . Replacing # by c #, we have º"Á "» ~ º#Á #» ~ Á
º"Á #» ~
and º#Á "» ~ c
and so / ~ span²"Á #³ is a hyperbolic plane and the matrix of / with respect to the hyperbolic pair 8 ~ ²"Á #³ is 48 ~ > Since / is nonsingular, we can write
c
?
Metric Vector Spaces: The Theory of Bilinear Forms
259
= ~ / p / where / is also nonsingular. Hence, we may repeat the preceding decomposition in / , eventually obtaining an orthogonal decomposition of = of the form = ~ / p / p Ä p / where each / is a hyperbolic plane. This proves the following structure theorem for symplectic geometries. Theorem 11.14 1) A symplectic geometry has an orthogonal basis if and only if it is totally degenerate. 2) Any nonsingular symplectic geometry = is a hyperbolic space, that is, = ~ / p / p Ä p / where each / is a hyperbolic plane. Thus, there is a basis for = for which the matrix of the form is
@
v x c x x x ~x x x x
c
Æ c
w
y { { { { { { { {
z
In particular, the dimenison of = is even. 3) Any symplectic geometry = has the form = ~ rad²= ³ p > where > is a hyperbolic space and rad²= ³ is a totally degenerate space. The rank of the form is dim²>³ and = is uniquely determined up to isometry by its rank and its dimension. Put another way, up to isometry, there is precisely one symplectic geometry of each rank and dimension.
Symplectic forms are represented by alternate matrices, that is, skew-symmetric matrices with zero diagonal. Moreover, according to Theorem 11.14, each d alternate matrix is congruent to a matrix of the form @ ?Ác ~ >
0c ?block
Since the rank of ?Ác is , no two such matrices are congruent.
260
Advanced Linear Algebra
Theorem 11.15 The set of d matrices of the form ?Ác is a set of canonical forms for alternate matrices under congruence.
The previous theorems solve the classification problem for symplectic geometries by stating that the rank and dimension of = form a complete set of invariants under congruence and that the set of all matrices of the form ?Ác is a set of canonical forms.
Witt's Extension and Cancellation Theorems We now prove the Witt theorems for symplectic geometries. Theorem 11.16 (Witt's extension theorem) Let = and = Z be nonsingular isometric symplectic geometries over a field - . Suppose that : is a subspace of = and ¢ : ¦ ²:³ = Z is an isometry. Then can be extended to an isometry from = to = Z . Proof. According to Theorem 11.13, we can extend to a nonsingular completion of : , so we may simply assume that : and ²:³ are nonsingular. Hence, = ~ : p : and = Z ~ ²:³ p ²:³ To complete the extension of to = , we need only choose a hyperbolic basis ² Á Á Ã Á Á ³ for : and a hyperbolic basis ²Z Á Z Á Ã Á Z Á Z ³ for ²:³ and define the extension by setting ² ³ ~ Z and ² ³ ~ Z .
As a corollary to Witt's extension theorem, we have Witt's cancellation theorem. Theorem 11.17 (Witt's cancellation theorem) Let = and = Z be isometric nonsingular symplectic geometries over a field - . If = ~ : p :
and = Z ~ ; p ;
then : ; ¬ : ;
Metric Vector Spaces: The Theory of Bilinear Forms
261
The Structure of the Symplectic Group: Symplectic Transvections To understand the nature of symplectic transformations on a nonsingular symplectic geometry = , we begin with the following definition. Definition Let = be a nonsingular symplectic geometry over - . Let # = be nonzero and let - . The map #Á ¢ = ¦ = defined by #Á ²%³ ~ % b º%Á #»# is called the symplectic transvection determined by # and .
The first thing to notice about a symplectic transvection #Á is that if ~ then #Á is the identity and if £ then #Á is the identity precisely on the subspace span²#³ , which is very large, in the sense of having codimension . Thus, despite the name, symplectic transvections are not highly complex maps. On the other hand, we should point out that since # is isotropic, the subspace span²#³ is singular and span²#³ q span²#³ ~ span²#³. Hence, span²#³ is not a vector space complement of the space span²#³ upon which #Á is the identity. In other words, while we can write = ~ span²#³ l < where dim²< ³ ~ and Ospan²#³ ~ , we cannot say that < ~ span²#³. Here are the basic properties of symplectic transvections. Theorem 11.18 Let #Á be a symplectic transvection on = . Then 1) #Á is a symplectic transformation. 2) #Á ~ if and only if ~ . 3) If % # then #Á ²%³ ~ %. For £ , % # if and only if #Á ²%³ ~ % 4) #Á #Á ~ #Áb c 5) #Á ~ #Ác 6) For any symplectic transformation , #Á c ~ ²#³Á 7) For - i , #Á ~ #Á
Note that if < is a subspace of = and if "Á is a symplectic transvection on < then, by definition, " < . However, the map "Á can also be thought of as a symplectic transvection on = , defined by the same formula "Á ²%³ ~ % b º%Á "»"
262
Advanced Linear Algebra
where % can be any vector in = . Moreover, for any ' < we have "Á ²'³ ~ ' and so "Á is the identity on < . We now wish to prove that any symplectic transformation on a nonsingular symplectic geometry = is the product of symplectic transvections. The proof is not difficult, but it is a bit lengthy, so we break it up into parts. Our first goal is to show that we can get from any hyperbolic pair to any other hyperbolic pair using a product of symplectic transvections. Let us say that two hyperbolic pairs ²%Á &³ and ²$Á '³ are connected if there is a product of symplectic transvections that carries % to $ and & to ' and write ¢ ²%Á &³ © ²$Á '³ or just ²%Á &³ © ²$Á '³. It is clear that connectedness is an equivalence relation on the set of hyperbolic pairs. Theorem 11.19 Let = be a nonsingular symplectic geometry. 1) For every hyperbolic pair ²"Á #³ and nonzero vector $ = , there is a vector % for which ²"Á #³ © ²$Á %³. 2) Any two hyperbolic pairs ²"Á #³ and ²"Á $³ with the same first coordinate are connected. 3) Every pair ²"Á #³ and ²$Á '³ of hyperbolic pairs is connected. Proof. For part 1), all we need to do is find a product of symplectic transvections for which ²"³ ~ $, because an isometry maps hyperbolic pairs to hyperbolic pairs and so we can simply set % ~ ²#³. If º"Á $» £ then " £ $ and "c$Á ²"³ ~ " b º"Á " c $»²" c $³ ~ " c º"Á $»²" c $³ Taking ~ °º"Á $» gives "c$Á ²"³ ~ $, as desired. Now suppose that º"Á $» ~ . If there is a vector & that is not orthogonal to either " or $, then by what we have just proved, there is a vector % such that ²"Á #³ © ²&Á % ³ and a vector % for which ²&Á % ³ © ²$Á %³. Then transitivity implies that ²"Á #³ © ²$Á %³. But there is a nonzero vector & = that is not orthogonal to either " or $ since there is a linear functional on = for which ²"³ £ and ²$³ £ . But the Riesz representation theorem implies that there is a nonzero vector & such that ²%³ ~ º%Á &» for all % = . For part 2), suppose first that º#Á $» £ . Then # £ $ and since º"Á # c $» ~ , we know that #c$Á ²"³ ~ ". Also,
Metric Vector Spaces: The Theory of Bilinear Forms
263
#c$Á ²#³ ~ # b º#Á # c $»²# c $³ ~ # c º#Á $»²# c $³ Taking ~ °º#Á $» gives #c$Á ²#³ ~ $, as desired. If º#Á $» ~ , then º#Á " b #» £ implies that ²"Á #³ © ²"Á " b #³ and º" b #Á $» £ implies that ²"Á " b #³ © ²"Á $³. It follows by transitivity that ²"Á #³ © ²"Á $³. For part 3), parts 1) and 2) imply that there is a vector % for which ²"Á #³ © ²"Á %³ © ²$Á '³ as desired.
We can now show that the symplectic transvections generate the symplectic group. Theorem 11.20 Every symplectic transformation on a nonsingular symplectic geometry = is the product of symplectic transvections. Proof. Let be a symplectic transformation on = . We proceed by induction on ~ dim²= ³, which must be even. If ~ then = ~ / ~ span²"Á #³ is a hyperbolic plane and by the previous theorem, there is a product of symplectic transvections on = for which ¢ ²"Á #³ © ²²"³Á ²#³³ Hence ~ . This proves the result if ~ . Assume that the result holds for all dimensions less than and let dim²= ³ ~ . Let / ~ span²"Á #³ be a hyperbolic plane in = and write = ~ / p / where / is a nonsingular symplectic geometry of degree less than . Since ²²"³Á ²#³³ is a hyperbolic pair, we again have a product of symplectic transvections on = for which ¢ ²"Á #³ © ²²"³Á ²#³³ Thus ~ on the subspace / . Also, since / is invariant under c , so is / . If we restrict c to / , we may apply the induction hypotheses to get a product of symplectic transvections on / for which c ~ on / . Hence, ~ on / and ~ on / . But since the vectors that define the symplectic transvections making up belong to / , we may extend to = and ~ on / . Thus, ~ on / as well, and we have ~ on = .
264
Advanced Linear Algebra
The Structure of Orthogonal Geometries: Orthogonal Bases We have seen that no interesting (not totally degenerate) symplectic geometries have orthgonal bases. In contradistinction to the symplectic case, almost all interesting orthogonal geometries = have orthogonal bases. The only problem arises when = is also symplectic and char²- ³ ~ . In particular, if = is orthogonal and symplectic, then it cannot possess an orthogonal basis unless it is totally degenerate. When char²- ³ £ , the only orthogonal, symplectic geometries are the totally degenerate ones, since the matrix of = with respect to any basis is both symmetric and skew-symmetric, with zeros on the main diagonal and so must be the zero matrix. However, when char²- ³ ~ , such a nonzero matrix exists, for example 4 ~>
?
Thus, there are orthogonal, symplectic geometries that are not totally degenerate when char²- ³ ~ . These geometries do not have orthogonal bases and we will not consider them further. Once we have an orthogonal basis for = , the natural question is: “How close can we come to obtaining an orthonormal basis?” Clearly, this is possible only if = is nonsingular. As we will see, the answer to this question depends on the nature of the base field, and is different for algebraically closed fields, the real field and finite fields—the three cases that we will consider in this book. We should mention that, even when = has an orthogonal basis, the Gram– Schmidt orthogonalization process may not apply, because even nonsingular orthogonal geometries may have isotropic vectors, and so division by º"Á "» is problematic. For example, consider an orthogonal hyperbolic plane / ~ span²"Á #³ and assume that char²- ³ £ . Thus, " and # are isotropic and º"Á #» ~ º#Á "» ~ . The vector " cannot be extended to an orthogonal basis, as would be possible for a real inner product space, using the Gram–Schmidt process, for it is easy to see that the set ¸"Á " b #¹ cannot be an orthogonal basis for any Á - . However, / has an orthogonal basis, namely, ¸" b #Á " c #¹.
Orthogonal Bases Let = be an orthogonal geometry. If = is also symplectic, then = has an orthogonal basis if and only if it is totally degenerate. Moreover, when char²- ³ £ , these are the only types of orthogonal, symplectic geometries.
Metric Vector Spaces: The Theory of Bilinear Forms
265
Now, let = be an orthogonal geometry that is not symplectic. Hence, = contains a nonisotropic vector " , the subspace span²" ³ is nonsingular and = ~ span²" ³ p = where = ~ span²" ³ . If = is not symplectic, then we may decompose = to get = ~ span²" ³ p span²" ³ p = This process may be continued until we reach a decomposition = ~ span²" ³ p Ä p span²" ³ p < where < is symplectic as well as orthogonal. (This includes the case < ~ ¸¹.) If char²- ³ £ , then < is totally degenerate. Thus, if 8 ~ ²" Á à Á " ³ and 9 is any basis for < , the union 8 r 9 is an orthogonal basis for = . Hence, when char²- ³ £ , any orthogonal geometry has an orthogonal basis. When char²- ³ ~ , we must work a bit harder. Since < is symplectic, it has the form < ~ > p rad²< ³ where > is a hyperbolic space and so = ~ span²" ³ p Ä p span²" ³ p > p D where D is totally degenerate and the " are nonisotropic. If 8 ~ ²" Á à Á " ³ and 9 ~ ²% Á & Á à Á % Á & ³ is a hyperbolic basis for > and : ~ ²' Á à Á ' ³ is an ordered basis for D then the union ; ~ 8 r 9 r : ~ ²" Á à Á " Á % Á & Á à Á % Á & Á ' Á à Á ' ³ is an ordered basis for = . However, we can do better. The following lemma says that, when char²- ³ ~ , a pair of isotropic basis vectors (such as % Á & ³ can be replaced by a pair of nonisotropic basis vectors, in the presence of a nonisotropic basis vector (such as " ). Lemma 11.21 Suppose that char²- ³ ~ . Let > be a three-dimensional orthogonal geometry with ordered basis 8 ~ ²"Á %Á &³ for which the matrix of the form with respect to 8 is 48 ~ where £ . Then the vectors
v w
y z
266
Advanced Linear Algebra
# ~ " b % b & # ~ " b % # ~ " b ² c ³% b & form an orthogonal basis of > consisting of nonisotropic vectors. Proof. It is straightforward to check that the vectors # Á # and # are linearly independent and mutually orthogonal. Details are left to the reader.
Using the previous lemma, we can replace the vectors ¸" Á % Á & ¹ with the nonisotropic vectors ¸# Á #b Á #b ¹ and still have an ordered basis ; ~ ²" Á Ã Á "c Á # Á #b Á #b Á % Á & Á Ã Á % Á & ³ for = . The replacement process can be repeated until the isotropic vectors are absorbed, leaving an orthogonal basis of nonisotropic vectors. Let us summarize. Theorem 11.22 Let = be an orthogonal geometry. 1) If = is also symplectic, then = has an orthogonal basis if and only if it is totally degenerate. (When char²- ³ £ , these are the only types of orthogonal, symplectic geometries. When char²- ³ ~ , orthogonal, symplectic geometries that are not totally degenerate do exist.) 2) If = is not symplectic, then = has an ordered orthogonal basis 8 ~ ²" Á Ã Á " Á ' Á Ã Á ' ³ for which º" Á " » ~ £ and º' Á ' » ~ . Hence, 48 has the diagonal form v x x x 48 ~ x x x w
Æ Æ
y { { { { { { z
with ~ rk²48 ³ nonzero entries and zeros on the diagonal.
As a corollary, we get a nice theorem about symmetric matrices. Corollary 11.23 Let 4 be a symmetric matrix that is not alternate if char²- ³ ~ . Then 4 is congruent to a diagonal matrix.
The Classification of Orthogonal Geometries: Canonical Forms We now want to consider the question of improving upon Theorem 11.22. The diagonal matrices of this theorem do not form a set of canonical forms for congruence. In fact, if Á Ã Á are nonzero scalars, then the matrix of = with
Metric Vector Spaces: The Theory of Bilinear Forms
267
respect to the basis 9 ~ ² " Á Ã Á " Á ' Á Ã Á ' ³ is
v x x x 49 ~ x x x
y { { { { { {
Æ Æ
w
(11.5)
z
Hence, 48 and 49 are congruent diagonal matrices, and by a simple change of basis, we can multiply any diagonal entry by a nonzero square in - . The determination of a set of canonical forms for symmetric (nonalternate when char²- ³ ~ ) matrices under congruence depends on the properties of the base field. Our plan is to consider three types of base fields: algebraically closed fields, the real field s and finite fields. Here is a preview of the forthcoming results. 1) When the base field - is algebraically closed, there is an ordered basis 8 for which
48 ~ AÁ
v x x x ~x x x
y { { { { { {
Æ Æ
w
z
If = is nonsingular, then 48 is an identity matrix and = has an orthonormal basis. 2) Over the real base field, there is an ordered basis 8 for which
48 ~ ZÁÁ
v x x x x x x ~x x x x x x w
Æ c Æ c Æ
y { { { { { { { { { { { { z
268
Advanced Linear Algebra
3) If - is a finite field, there is an ordered basis 8 for which v x x x x 48 ~ ZÁ ²³ ~ x x x x
Æ Æ
w
y { { { { { { { { z
where is unique up to multiplication by a square and if char²- ³ ~ then we can take ~ . Now let us turn to the details.
Algebraically Closed Fields If - is algebraically closed then for every - , the polynomial % c has a root in - , that is, every element of - has a square root in - . Therefore, we may choose ~ °j in (11.5), which leads to the following result. Theorem 11.24 Let = be an orthogonal geometry over an algebraically closed field - . Provided that = is not symplectic as well when char²- ³ ~ , then = has an ordered orthogonal basis 8 ~ ²" Á Ã Á " Á ' Á Ã Á ' ³ for which º" Á " » ~ and º' Á ' » ~ . Hence, 48 has the diagonal form
48 ~ AÁ
v x x x ~x x x
Æ
w
Æ
y { { { { { { z
with ones and zeros on the diagonal. In particular, if = is nonsingular then = has an orthonormal basis.
The matrix version of Theorem 11.24 follows. Theorem 11.25 Let I be the set of all d symmetric matrices over an algebraically closed field - . If char²- ³ ~ , we restrict I to the set of all symmetric matrices with at least one nonzero entry on the main diagonal. 1) Any matrix 4 in I is congruent to a unique matrix of the form ZÁ , in fact, ~ rk²4 ³ and ~ c rk²4 ³. 2) The set of all matrices of the form ZÁ for b ~ , is a set of canonical forms for congruence on I . 3) The rank of a matrix is a complete invariant for congruence on I .
Metric Vector Spaces: The Theory of Bilinear Forms
269
The Real Field s If - ~ s, we can choose ~ °j( (, so that all nonzero diagonal elements in (11.5) will be either , or c. Theorem 11.26 (Sylvester's law of inertia) Any orthogonal geometry = over the real field s has an ordered orthogonal basis 8 ~ ²" Á Ã Á " Á # Á Ã Á # Á ' Á Ã Á ' ³ for which º" Á " » ~ , º# Á # » ~ c and º' Á ' » ~ . Hence, the matrix 48 has the diagonal form
48 ~ ZÁÁ
v x x x x x x ~x x x x x x
Æ c Æ c
w
Æ
y { { { { { { { { { { { { z
with ones, negative ones and zeros on the diagonal.
Here is the matrix version of Theorem 11.26. Theorem 11.27 Let I be the set of all d symmetric matrices over the real field s. 1) Any matrix in I is congruent to a unique matrix of the form ZÁÁ Á for some Á and ~ c c . 2) The set of all matrices of the form ZÁÁ for b b ~ is a set of canonical forms for congruence on I . 3) Let 4 I and let 4 be congruent to AÁÁ . The number b is the rank of 4 , the number c is the signature of 4 and the triple ²Á Á ³ is the inertia of 4 . The pair ²Á ³, or equivalently the pair ² b Á c ³, is a complete invariant under congruence on I . Proof. We need only prove the uniqueness statement in part 1). Let 8 ~ ²" Á à Á " Á # Á à Á # Á ' Á à Á ' ³ and 9 ~ ²"Z Á à Á "ZZ Á #Z Á à Á #ZZ 'Z Á à Á 'Z Z ³ be ordered bases for which the matrices 48 and 49 have the form shown in Theorem 11.26. Since the rank of these matrices must be equal, we have b ~ Z b Z and so ~ Z .
270
Advanced Linear Algebra
If % span²" Á Ã Á " ³ and % £ then º%Á %» ~ L " Á " M ~ º" Á " » ~ Á ~ Á
Á
On the other hand, if & span²#Z Á Ã Á #ZZ ³ and & £ then º&Á &» ~ L #Z Á #Z M ~
Z Z º# Á # »
~ c
Á
Á
~ c
Á
Hence, if & span²#Z Á à Á #ZZ Á 'Z Á à Á 'Z Z ³ then º&Á &» . It follows that span²" Á à Á " ³ q span²#Z Á à Á #ZZ Á 'Z Á à Á 'Z Z ³ ~ ¸¹ and so b ² c Z ³ that is, Z . By symmetry, Z and so ~ Z . Finally, since ~ Z , it follows that ~ Z .
Finite Fields To deal with the case of finite fields, we must know something about the distribution of squares in finite fields, as well as the possible values of the scalars º#Á #». Theorem 11.28 Let - be a finite field with elements. 1) If char²- ³ ~ then every element of - is a square. 2) If char²- ³ £ then exactly half of the nonzero elements of - are squares, that is, there are ² c ³° nonzero squares in - . Moreover, if % is any nonsquare in - then all nonsquares have the form %, for some - . Proof. Write - ~ - , let - i be the subgroup of all nonzero elements in - and let ²- i ³ ~ ¸ - i ¹ be the subgroup of all nonzero squares in - . The Frobenius map ¢ - i ¦ ²- i ³ defined by ²³ ~ is a surjective group homomorphism, with kernel ker²³ ~ ¸ - ~ ¹ ~ ¸cÁ ¹ If char²- ³ ~ , then ker²³ ~ ¸¹ and so is bijective and (- i ( ~ (²- i ³ (, which proves part 1). If char²- ³ £ , then (ker²³( ~ and so (- i ( ~ (²- i ³ (, which proves the first part of part 2). We leave proof of the last statement to the reader.
Metric Vector Spaces: The Theory of Bilinear Forms
271
Definition A bilinear form on = is universal if for any nonzero - there exists a vector # = for which º#Á #» ~ .
Theorem 11.29 Let = be an orthogonal geometry over a finite field - with char²- ³ £ and assume that = has a nonsingular subspace of dimension at least . Then the bilinear form of = is universal. Proof. Theorem 11.22 implies that = contains two linearly independent vectors " and # for which º"Á "» ~ £ Á º#Á #» ~ £ Á º"Á #» ~ Given any - , we want to find and for which ~ º" b #Á " b #» ~ b or ~ c If ( ~ ¸ - ¹ then ((( ~ ² b ³°, since there are ² c ³° nonzero squares and also we must consider ~ . Also, if ) ~ ¸ c - ¹ then for the same reasons () ( ~ ² b ³°. It follows that ( q ) cannot be the empty set and so there are and for which ~ c , as desired.
Now we can proceed with the business at hand. Theorem 11.30 Let = be an orthogonal geometry over a finite field - and assume that = is not symplectic if char²- ³ ~ . If char²- ³ £ then let be a fixed nonsquare in - . For any nonzero - , write v x x x x ? ²³ ~ x x x x w
Æ Æ
y { { { { { { { { z
where rk²? ²³³ ~ . 1) If char²- ³ ~ then there is an ordered basis 8 for which 48 ~ ? ²³. 2) If char²- ³ £ , then there is an ordered basis 8 for which 48 equals ? ²³ or ? ²³. Proof. We can dispose of the case char²- ³ ~ quite easily: Referring to (11.5), since every element of - has a square root, we may take ~ ²j ³c . If char²- ³ £ , then Theorem 11.22 implies that there is an ordered orthogonal basis
272
Advanced Linear Algebra
8 ~ ²" Á Ã Á " Á ' Á Ã Á ' ³ for which º" Á " » ~ £ and º' Á ' » ~ . Hence, 48 has the diagonal form v x x x 48 ~ x x x w
y { { { { { {
Æ Æ
z
Now, consider the nonsingular orthogonal geometry = ~ span²" Á " ³. According to Theorem 11.29, the form is universal when restricted to = . Hence, there exists a # = for which º# Á # » ~ . Now, # ~ " b " for Á - not both , and we may swap " and " if necessary to ensure that £ . Hence, 8 ~ ²# Á " Á Ã Á " Á ' Á Ã Á ' ³ is an ordered basis for = for which the matrix 48 is diagonal and has a in the upper left entry. We can repeat the process with the subspace = ~ span²# Á # ³. Continuing in this way, we can find an ordered basis 9 ~ ²# Á # Á Ã Á # Á ' Á Ã Á ' ³ for which 49 ~ ? ²³ for some nonzero - . Now, if is a square in then we can replace # by ²°j³# to get a basis : for which 4: ~ ? ²³. If is not a square in - , then ~ for some - and so replacing # by ²°³# gives a basis : for which 4" ~ ? ²³.
Theorem 11.31 Let I be the set of all d symmetric matrices over a finite field - . If char²- ³ ~ , we restrict I to the set of all symmetric matrices with at least one nonzero entry on the main diagonal. 1) If char²- ³ ~ then any matrix in I is congruent to a unique matrix of the form ? ²³ and the matrices ¸? ²³ ~ Á à Á ¹ form a set of canonical forms for I under congruence. Also, the rank is a complete invariant. 2) If char²- ³ £ , let be a fixed nonsquare in - . Then any matrix I is congruent to a unique matrix of the form ? ²³ or ? ²³. The set ¸? ²³Á ? ²³ ~ Á à Á ¹ is a set of canonical forms for congruence on I . (Thus, there are exactly two congruence classes for each rank .)
The Orthogonal Group Having “settled” the classification question for orthogonal geometries over certain types of fields, let us turn to a discussion of the structure-preserving maps, that is, the isometries.
Metric Vector Spaces: The Theory of Bilinear Forms
273
Rotations and Reflections We have seen that if 8 is an ordered basis for = , then for any %Á & = º%Á &» ~ ´%µ!8 48 ´&µ8 Now, for any B²= ³, we have º %Á &» ~ ´ %µ!8 48 ´ &µ8 ~ ´%µ!8 ²´ µ!8 48 ´ µ8 ³´&µ8 and so is an isometry if and only if ´ µ!8 48 ´ µ8 ~ 48 Taking determinants gives det²48 ³ ~ det²´ µ8 ³ det²48 ³ Therefore, if = is nonsingular then det²´ µ8 ³ ~ f Since the determinant is an invariant under similarity, we can make the following definition. Definition Let be an isometry on a nonsingular orthogonal geometry = . The determinant of is the determinant of any matrix ´ µ8 representing . If det² ³ ~ then is called a rotation and if det² ³ ~ c then is called a reflection.
The set Eb ²= ³ of rotations forms a subgroup of the orthogonal group E²= ³ and the surjective determinant map det¢ E²= ³ ¦ ¸cÁ ¹ has kernel Eb ²= ³. Hence, if char²- ³ £ , then Eb ²= ³ is a normal subgroup of E²= ³ of index .
Symmetries Recall that for a real (or complex) inner product space = , we defined a reflection to be a linear map /# for which /# # ~ c#Á ²/# ³Oº#» ~ The term symmetry is often used in the context of general orthogonal geometries. In particular, suppose that = is a nonsingular orthogonal geometry over - , where char²- ³ £ and let " = be nonisotropic. Write = ~ span²"³ p span²"³ Then there is a unique isometry " with the properties
274
Advanced Linear Algebra
1) " ²"³ ~ c" 2) " ²%³ ~ % for all % span²"³ We can also write " ~ c p , that is " ²% b &³ ~ c% b & for all % span²"³ and & span²"³ . It is easy to see that " ²#³ ~ # c
º#Á "» " º"Á "»
The map " is called the symmetry determined by ". Note that the requirement that " be nonisotropic is required, since otherwise we would have " span²"³ and so c" ~ " ²"³ ~ ", which implies that " ~ . (Thus, symplectic geometries do not have symmetries.) In the context of real inner product spaces, Theorem 10.11 says that if )#) ~ )$) £ , then /#c$ is the unique reflection sending # to $, that is, /#c$ ²#³ ~ $. In the present context, we must be careful, since symmetries are defined for nonisotropic vectors only. Here is what we can say. Theorem 11.32 Let = be a nonsingular orthogonal geometry over a field - , with char²- ³ £ . If "Á # = have the same nonzero “length,” that is, if º"Á "» ~ º#Á #» £ then there exists a symmetry for which ²"³ ~ #
or ²"³ ~ c#
Proof. In general, if % and & are orthogonal isotropic vectors, then % b & and % c & are also isotropic. Hence, since " and # are not isotropic, it follows that one of " c # and " b # must be nonisotropic. If " b # is nonisotropic, then "b# ²" b #³ ~ c²" b #³ and "b# ²" c #³ ~ " c # Combining these two gives "b# ²"³ ~ c#. On the other hand, if " c # is nonisotropic, then "c# ²" c #³ ~ c²" c #³ and "c# ²" b #³ ~ " b #
Metric Vector Spaces: The Theory of Bilinear Forms
275
These equations give "c# ²"³ ~ #.
Recall that an operator on a real inner product space is unitary if and only if it is a product of reflections. Here is the generalization to nonsingular orthogonal geometries. Theorem 11.33 Let = be a nonsingular orthogonal geometry over a field with char²- ³ £ . A linear transformation on = is an orthogonal transformation (an isometry) if and only if is the product of symmetries on = . Proof. We proceed by induction on ~ dim²= ³. If ~ then = ~ span²#³ where º#Á #» £ . Let ²#³ ~ # where - . Since is unitary º#Á #» ~ º#Á #» ~ º ²#³Á ²#³» ~ º#Á #» and so ~ f. If ~ then is the identity, which is equal to # . On the other hand, if ~ c then ~ # . In either case, is a product of symmetries. Assume now that the theorem is true for dimensions less than and let dim²= ³ ~ . Let # = be nonisotropic. Since º ²#³Á ²#³» ~ º#Á #» £ , Theorem 11.32 implies the existence of a symmetry on = for which ² ²#³³ ~ # where ~ f . Thus, ~ f on span²#³. Since Theorem 11.15 implies that span²#³ is -invariant, we may apply the induction hypothesis to on span²#³ to get Ospan²#³ ~ $ Ä$ ~ where $ span²#³ and $ is a symmetry on span²#³ . But each $ can be extended to a symmetry on = by setting $ ²#³ ~ #. Assume that is the extension of to = , where ~ on span²#³. Hence, ~ on span²#³ and ~ on span²#³. If ~ then ~ on = and so ~ , which completes the proof. If ~ c then ~ # on span²#³ since # is the identity on span²#³ and ~ # on span²#³. Hence, ~ # on = and so ~ # on = .
The Witt's Theorems for Orthogonal Geometries We are now ready to consider the Witt theorems for orthogonal geometries. Theorem 11.34 (Witt's cancellation theorem) Let = and > be isometric nonsingular orthogonal geometries over a field - with char²- ³ £ . Suppose that = ~ : p :
and > ~ ; p ;
276
Advanced Linear Algebra
Then : ; ¬ : ; Proof. First, we prove that it is sufficient to consider the case = ~ > . Suppose that the result holds when = ~ > and that ¢ = ¦ > is an isometry. Then ²:³ p ²: ³ ~ ²: p : ³ ~ ²= ³ ~ > ~ ; p ; Furthermore, ²:³ : ; . We can therefore apply the theorem to > to get : ²: ³ ; as desired. To prove the theorem when = ~ > , assume that = ~ : p : ~ ; p ; where : and ; are nonsingular and : ; . Let ¢ : ¦ ; be an isometry. We proceed by induction on dim²:³. Suppose first that dim²:³ ~ and that : ~ span² ³. Since º ² ³Á ² ³» ~ º Á » £ Theorem 11.32 implies that there is a symmetry for which ² ³ ~ ² ³ where ~ f . Hence, is an isometry of = for which ; ~ ²:³ and Theorem 11.10 implies that ; ~ ²: ³. Thus, O: is the desired isometry. Now suppose the theorem is true for dim²:³ and let dim²:³ ~ . Let ¢ : ¦ ; be an isometry. Since : is nonsingular, we can choose a nonisotropic vector : and write : ~ span² ³ p < , where < is nonsingular. It follows that = ~ : p : ~ span² ³ p < p : and = ~ ; p ; ~ ²span² ³³ p ²< ³ p ; Now we may apply the one-dimensional case to deduce that < p : ²< ³ p ; If ¢ < p : ¦ ²< ³ p ; is an isometry then ²< ³ p ²: ³ ~ ²< p : ³ ~ ²< ³ p ; But ²< ³ ²< ³ and since dim²²< ³³ ~ dim²< ³ , the induction hypothesis implies that : ²: ³ ; .
Metric Vector Spaces: The Theory of Bilinear Forms
277
As we have seen, Witt's extension theorem is a corollary of Witt's cancellation theorem. Theorem 11.35 (Witt's extension theorem) Let = and = Z be isometric nonsingular orthogonal geometries over a field - , with char²- ³ £ . Suppose that < is a subspace of = and ¢ < ¦ ²< ³ = Z is an isometry. Then can be extended to an isometry from = to = Z .
Maximal Hyperbolic Subspaces of an Orthogonal Geometry We have seen that any orthogonal geometry = can be written in the form = ~ < p rad²= ³ where < is nonsingular. Nonsingular spaces are better behaved than singular ones, but they can still possess isotropic vectors. We can improve upon the preceding decomposition by noticing that if " < is isotropic, then span²"³ is totally degenerate and so it can be “captured” in a hyperbolic plane / ~ span²"Á %³, namely, the nonsingular extension of span²"³. Then we can write = ~ / p / < p rad²= ³ where / < is the orthogonal complement of / in < and has “one fewer” isotropic vector. In order to generalize this process, we first discuss maximal totally degenerate subspaces.
Maximal Totally Degenerate Subspaces Let = be a nonsingular orthogonal geometry over a field - , with char²- ³ £ . Suppose that < and < Z are maximal totally degenerate subspaces of = . We claim that dim²< ³ ~ dim²< Z ³. For if dim²< ³ dim²< Z ³, then there is a vector space isomorphism ¢ < ¦ ²< ³ < Z , which is also an isometry, since < and < Z are totally degenerate. Thus, Witt's extension theorem implies the existence of an isometry ¢ = ¦ = that extends . In particular, c ²< Z ³ is a totally degenerate space that contains < and so c ²< Z ³ ~ < , which shows that dim²< ³ ~ dim²< Z ³. We have proved the following. Theorem 11.36 Let = be a nonsingular orthogonal geometry over a field - , with char²- ³ £ .
278
Advanced Linear Algebra
1) All maximal totally degenerate subspaces of = have the same dimension, which is called the Witt index of = and is denoted by $²= ³. 2) Any totally degenerate subspace of = of dimension $²= ³ is maximal.
Maximal Hyperbolic Subspaces We can prove by a similar argument that all maximal hyperbolic subspaces of = have the same dimension. Let > ~ / p Ä p / and A ~ 2 p Ä p 2 be maximal hyperbolic subspaces of = and suppose that / ~ span²" Á # ³ and 2 ~ span²% Á & ³. We may assume that dim²>³ dim²A³. The linear map ¢ > ¦ A defined by ²" ³ ~ % Á ²# ³ ~ & is clearly an isometry from > to ²>³. Thus, Witt's extension theorem implies the existence of an isometry ¢ = ¦ = that extends . In particular, c ²A³ is a hyperbolic space that contains > and so c ²A³ ~ >. It follows that dim²A³ ~ dim²>³. It is not hard to see that the maximum dimension ²= ³ of a hyperbolic subspace of = is $²= ³, where $²= ³ is the Witt index of = . First, the nonsingular extension of a maximal totally degenerate subspace <$ of = is a hyperbolic space of dimension $²= ³ and so ²= ³ $²= ³. On the other hand, there is a totally degenerate subspace < contained in any hyperbolic space > and so $²= ³, that is, dim²> ³ $²= ³. Hence ²= ³ $²= ³ and so ²= ³ ~ $²= ³. Theorem 11.37 Let = be a nonsingular orthogonal geometry over a field - , with char²- ³ £ . 1) All maximal hyperbolic subspaces of = have dimension $²= ³. 2) Any hyperbolic subspace of dimenison $²= ³ must be maximal. 3) The Witt index of a hyperbolic space > is .
The Anisotropic Decomposition of an Orthogonal Geometry If > is a maximal hyperbolic subspace of = then = ~ > p > Since > is maximal, > is anisotropic, for if " > is isotropic then the nonsingular extension of > p span²"³ would be a hyperbolic space strictly larger than >.
Metric Vector Spaces: The Theory of Bilinear Forms
279
Thus, we arrive at the following decomposition theorem for orthogonal geometries. Theorem 11.38 (The anisotropic decomposition of an orthogonal geometry) Let = ~ < p rad²= ³ be an orthogonal geometry over - , with char²- ³ £ . Let > be a maximal hyperbolic subspace of < , where > ~ ¸¹ if < has no isotropic vectors. Then = ~ : p > p rad²= ³ where : is anisotropic, > is hyperbolic of dimension $²= ³ and rad²= ³ is totally degenerate.
Exercises 1.
2.
3.
4. 5.
6.
Let < Á > be subspaces of a metric vector space = . Show that a) < > ¬ > < b) < < c) < ~ < Let < Á > be subspaces of a metric vector space = . Show that a) ²< b > ³ ~ < q > b) ²< q > ³ ~ < b > Prove that the following are equivalent: a) = is nonsingular b) º"Á %» ~ º#Á %» for all % = implies " ~ # Show that a metric vector space = is nonsingular if and only if the matrix 48 of the form is nonsingular, for every ordered basis 8 . Let = be a finite-dimensional vector space with a bilinear form ºÁ ». We do not assume that the form is symmetric or alternate. Show that the following are equivalent: a) ¸# = º#Á $» ~ for all $ = ¹ ~ b) ¸# = º$Á #» ~ for all $ = ¹ ~ Hint: Consider the singularity of the matrix of the form. Find a diagonal matrix congruent to v w
7.
y c z
Prove that the matrices 0 ~ >
and 4 ~ > ?
?
are congruent over the base field - ~ r of rational numbers. Find an invertible matrix 7 such that 7 ! 0 7 ~ 4 .
280
8.
Advanced Linear Algebra
Let = be an orthogonal geometry over a field - with char²- ³ £ . We wish to construct an orthogonal basis E ~ ²" Á Ã Á " ³ for = , starting with any generating set = ~ ²# Á Ã Á # ³. Justify the following steps, essentially due to Lagrange. We may assume that = is not totally degenerate. a) If º# Á # » £ for some then let " ~ # . Otherwise, there are indices £ for which º# Á # » £ . Let " ~ # b # . b) Assume we have found an ordered set of vectors E ~ ²" Á Ã Á " ³ that form an orthogonal basis for a subspace = of = and that none of the " 's are isotropic. Then = ~ = p = . c) For each # = , let
$ ~ # c ~
9.
10. 11. 12. 13. 14.
º# Á " » " º" Á " »
Then the vectors $ span = . If = is totally degenerate, take any basis for = and append it to E . Otherwise, repeat step a) on = to get another vector "b and let Eb ~ ²" Á à Á "b ³. Eventually, we arrive at an orthogonal basis E for = . Prove that orthogonal hyperbolic planes may be characterized as twodimensional nonsingular orthogonal geometries that have exactly two onedimensional totally isotropic (equivalently: totally degenerate) subspaces. Prove that a two-dimensional nonsingular orthogonal geometry is a hyperbolic plane if and only if its discriminant is - ² c ³. Does Minkowski space contain any isotropic vectors? If so, find them. Is Minkowski space isometric to Euclidean space s ? If ºÁ » is a symmetric bilinear form on = and char²- ³ £ , show that 8²%³ ~ º%Á %»° is a quadratic form. Let = be a vector space over a field - , with ordered basis 8 ~ ²# Á à Á # ³. Let ²% Á à Á % ³ be a homogeneous polynomial of degree over - , that is, a polynomial each of whose terms has degree . The -form defined by is the function from = to - defined as follows. If # ~ ' # then ²#³ ~ ² Á à Á ³
(We use the same notation for the form and the polynomial.) Prove that forms are the same as quadratic forms. 15. Show that is an isometry on = if and only if 8² ²#³³ ~ 8²#³ where 8 is the quadratic form associated with the bilinear form on = . (Assume that char²- ³ £ .³ 16. Show that a quadratic form 8 on = satisfies the parallelogram law: 8²% b &³ b 8²% c &³ ~ ´8²%³ b 8²&³µ 17. Show that if = is a nonsingular orthogonal geometry over a field - , with char²- ³ £ then any totally isotropic subspace of = is also a totally degenerate space. 18. Is it true that = ~ rad²= ³ p rad²= ³ ?
Metric Vector Spaces: The Theory of Bilinear Forms
281
19. Let = be a nonsingular symplectic geometry and let #Á be a symplectic transvection. Prove that a) #Á #Á ~ #Áb b) For any symplectic transformation , #Á c ~ ²#³Á c)
For - i , #Á ~ #Á
20.
21. 22. 23. 24.
25.
26. 27.
28.
29. 30. 31.
d) For a fixed # £ , the map ª #Á is an isomorphism from the additive group of - onto the group ¸#Á - ¹ Sp²= ³. Prove that if % is any nonsquare in a finite field - then all nonsquares have the form %, for some - . Hence, the product of any two nonsquares in - is a square. Formulate Sylvester's law of inertia in terms of quadratic forms on = . Show that a two-dimensional space is a hyperbolic plane if and only if it is nonsingular and contains an isotropic vector. Assume that char²- ³ £ . Prove directly that a hyperbolic plane in an orthogonal geometry cannot have an orthogonal basis when char²- ³ ~ . a) Let < be a subspace of = . Show that the inner product º% b < Á & b < » ~ º%Á &» on the quotient space = °< is well-defined if and only if < rad²= ³. b) If < rad²= ³, when is = °< nonsingular? Let = ~ 5 p : , where 5 is a totally degenerate space. a) Prove that 5 ~ rad²= ³ if and only if : is nonsingular. b) If : is nonsingular, prove that : = °rad²= ³. Let dim²= ³ ~ dim²> ³. Prove that = °rad²= ³ > °rad²> ³ implies = >. Let = ~ : p ; . Prove that a) rad²= ³ ~ rad²:³ p rad²; ³ b) = °rad²= ³ :°rad²:³ p ; °rad²; ³ c) dim²rad²= ³³ ~ dim²rad²:³³ b dim²rad²; ³³ d) = is nonsingular if and only if : and ; are both nonsingular. Let = be a nonsingular metric vector space. Because the Riesz representation theorem is valid in = , we can define the adjoint i of a linear map B²= ³ exactly as in the case of real inner product spaces. Prove that is an isometry if and only if it is bijective and unitary (that is, i ~ ). If char²- ³ £ , prove that B²= Á > ³ is an isometry if and only if it is bijective and º ²#³Á ²#³» ~ º#Á #» for all # = . Let 8 ~ ¸# Á à Á # ¹ be a basis for = . Prove that B²= Á > ³ is an isometry if and only if it is bijective and º # Á # » ~ º# Á # » for all Á . Let be a linear operator on a metric vector space = . Let 8 ~ ²# Á à Á # ³ be an ordered basis for = and let 48 be the matrix of the form relative to
282
Advanced Linear Algebra
8 . Prove that is an isometry if and only if ´ µ!8 48 ´ µ8 ~ 48 32. Let = be a nonsingular orthogonal geometry and let B²= ³ be an isometry. a) Show that dim²ker² c ³³ ~ dim²im² c ³ ³. b) Show that ker² c ³ ~ im² c ³ . How would you describe ker² c ³ in words? c) If is a symmetry, what is dim²ker² c ³³? d) Can you characterize symmetries by means of dim²ker² c ³³? 33. A linear transformation B²= ³ is called unipotent if c is nilpotent. Suppose that = is a nonisotropic metric vector space and that is unipotent and isometric. Show that ~ . 34. Let = be a hyperbolic space of dimension and let < be a hyperbolic subspace of = of dimension . Show that for each , there is a hyperbolic subspace > of = for which < > = . 35. Let char²- ³ £ . Prove that if ? is a totally degenerate subspace of an orthgonal geometry = then dim²?³ dim²= ³°. 36. Prove that an orthogonal geometry = of dimension is a hyperbolic space if and only if = is nonsingular, is even and = contains a totally degenerate subspace of dimension °. 37. Prove that a symplectic transformation has determinant equal to .
Chapter 12
Metric Spaces
The Definition In Chapter 9, we studied the basic properties of real and complex inner product spaces. Much of what we did does not depend on whether the space in question is finite or infinite-dimensional. However, as we discussed in Chapter 9, the presence of an inner product and hence a metric, on a vector space, raises a host of new issues related to convergence. In this chapter, we discuss briefly the concept of a metric space. This will enable us to study the convergence properties of real and complex inner product spaces. A metric space is not an algebraic structure. Rather it is designed to model the abstract properties of distance. Definition A metric space is a pair ²4 Á ³, where 4 is a nonempty set and ¢ 4 d 4 ¦ s is a real-valued function, called a metric on 4 , with the following properties. The expression ²%Á &³ is read “the distance from % to & .” 1) (Positive definiteness) For all %Á & 4 , ²%Á &³ and ²%Á &³ ~ if and only if % ~ &. 2) (Symmetry) For all %Á & 4 , ²%Á &³ ~ ²&Á %³ 3) (Triangle inequality) For all %Á &Á ' 4 , ²%Á &³ ²%Á '³ b ²'Á &³
As is customary, when there is no cause for confusion, we simply say “let 4 be a metric space.”
284
Advanced Linear Algebra
Example 12.1 Any nonempty set 4 is a metric space under the discrete metric, defined by ²%Á &³ ~ F
if % ~ & if % £ &
Example 12.2 1) The set s is a metric space, under the metric defined for % ~ ²% Á Ã Á % ³ and & ~ ²& Á Ã Á & ³ by ²%Á &³ ~ j²% c & ³ b Ä b ²% c & ³ This is called the Euclidean metric on s . We note that s is also a metric space under the metric ²%Á &³ ~ (% c & ( b Ä b (% c & ( Of course, ²s Á ³ and ²s Á ³ are different metric spaces. 2) The set d is a metric space under the unitary metric ²%Á &³ ~ k(% c & ( b Ä b (% c & ( where % ~ ²% Á Ã Á % ³ and & ~ ²& Á Ã Á & ³ are in d .
Example 12.3 1) The set *´Á µ of all real-valued (or complex-valued) continuous functions on ´Á µ is a metric space, under the metric ² Á ³ ~ sup ( ²%³ c ²%³( %´Áµ
We refer to this metric as the sup metric. 2) The set *´Á µ of all real-valued ²or complex-valued) continuous functions on ´Á µ is a metric space, under the metric
² ²%³Á ²%³³ ~ ( ²%³ c ²%³( dx
a
Example 12.4 Many important sequence spaces are metric spaces. We will often use boldface roman letters to denote sequences, as in % ~ ²% ³ and & ~ ²& ³. 1) The set MB s of all bounded sequences of real numbers is a metric space under the metric defined by ²%Á &³ ~ sup(% c & (
The set MB d of all bounded complex sequences, with the same metric, is also a metric space. As is customary, we will usually denote both of these spaces by MB .
Metric Spaces
285
2) For , let M be the set of all sequences % ~ ²% ³ of real ²or complex) numbers for which B
(% ( B ~
We define the -norm of % by °
B
)%) ~ 8 (% ( 9 ~
Then M is a metric space, under the metric °
B
²%Á &³ ~ )% c &) ~ 8 (% c & ( 9 ~
The fact that M is a metric follows from some rather famous results about sequences of real or complex numbers, whose proofs we leave as (wellhinted) exercises. ¨ Holder's inequality Let Á and b ~ . If % M and & M then the product sequence %& ~ ²% & ³ is in M and )%&) )%) )&) that is, B
°
B
(% & ( 8 (% ( 9 ~
~
B
°
8 (& ( 9 ~
A special case of this (with ~ ~ 2) is the Cauchy-Schwarz inequality B
B
B
~
~
~
(% & ( m (% ( m (& ( Minkowski's inequality For , if %Á & M then the sum % b & ~ ²% b & ³ is in M and ) % b & ) ) %) b ) & ) that is, B
8 (% b & ( 9 ~
°
B
8(% ( 9 ~
°
B
b 8 (& ( 9 ~
°
286
Advanced Linear Algebra
If 4 is a metric space under a metric then any nonempty subset : of 4 is also a metric under the restriction of to : d : . The metric space : thus obtained is called a subspace of 4 .
Open and Closed Sets Definition Let 4 be a metric space. Let % 4 and let be a positive real number. 1) The open ball centered at % , with radius , is )²% Á ³ ~ ¸% 4 ²%Á % ³ ¹ 2) The closed ball centered at % , with radius , is )²% Á ³ ~ ¸% 4 ²%Á % ³ ¹ 3) The sphere centered at % , with radius , is :²% Á ³ ~ ¸% 4 ²%Á % ³ ~ ¹
Definition A subset : of a metric space 4 is said to be open if each point of : is the center of an open ball that is contained completely in : . More specifically, : is open if for all % : , there exists an such that )²%Á ³ : . Note that the empty set is open. A set ; 4 is closed if its complement ; in 4 is open.
It is easy to show that an open ball is an open set and a closed ball is a closed set. If % 4 , we refer to any open set : containing % as an open neighborhood of %. It is also easy to see that a set is open if and only if it contains an open neighborhood of each of its points. The next example shows that it is possible for a set to be both open and closed, or neither open nor closed. Example 12.5 In the metric space s with the usual Euclidean metric, the open balls are just the open intervals )²% Á ³ ~ ²% c Á % b ³ and the closed balls are the closed intervals )²% Á ³ ~ ´% c Á % b µ Consider the half-open interval : ~ ²Á µ, for a . This set is not open, since it contains no open ball centered at : and it is not closed, since its complement : ~ ²cBÁ µ r ²Á B³ is not open, since it contains no open ball about .
Metric Spaces
287
Observe also that the empty set is both open and closed, as is the entire space s. (Although we will not do so, it is possible to show that these are the only two sets that are both open and closed in s.³
It is not our intention to enter into a detailed discussion of open and closed sets, the subject of which belongs to the branch of mathematics known as topology. In order to put these concepts in perspective, however, we have the following result, whose proof is left to the reader. Theorem 12.1 The collection E of all open subsets of a metric space 4 has the following properties: 1) J E, 4 E 2) If : , ; E then : q ; E 3) If ¸: 2¹ is any collection of open sets then 2 : E .
These three properties form the basis for an axiom system that is designed to generalize notions such as convergence and continuity and leads to the following definition. Definition Let ? be a nonempty set. A collection E of subsets of ? is called a topology for ? if it has the following properties: 1) J EÁ ? E 2) If :Á ; E then : q ; E 3) If ¸: 2¹ is any collection of sets in E then : E . 2
We refer to subsets in E as open sets and the pair ²?Á E³ as a topological space.
According to Theorem 12.1, the open sets (as we defined them earlier) in a metric space 4 form a topology for 4 , called the topology induced by the metric. Topological spaces are the most general setting in which we can define concepts such as convergence and continuity, which is why these concepts are called topological concepts. However, since the topologies with which we will be dealing are induced by a metric, we will generally phrase the definitions of the topological properties that we will need directly in terms of the metric.
Convergence in a Metric Space Convergence of sequences in a metric space is defined as follows. Definition A sequence ²% ³ in a metric space 4 converges to % 4 , written ²% ³ ¦ %, if lim ²% Á %³ ~
¦B
Equivalently, ²% ³ ¦ % if for any , there exists an 5 such that
288
Advanced Linear Algebra
5 ¬ ²% Á %³ or, equivalently 5 ¬ % )²%Á ³ In this case, % is called the limit of the sequence ²% ³.
If 4 is a metric space and : is a subset of 4 , by a sequence in : , we mean a sequence whose terms all lie in : . We next characterize closed sets and therefore also open sets, using convergence. Theorem 12.2 Let 4 be a metric space. A subset : 4 is closed if and only if whenever ²% ³ is a sequence in : and ²% ³ ¦ % then % : . In loose terms, a subset : is closed if it is closed under the taking of sequential limits. Proof. Suppose that : is closed and let ²% ³ ¦ %, where % : for all . Suppose that % ¤ : . Then since % : and : is open, there exists an for which % )²%Á ³ : . But this implies that )²%Á ³ q ¸% ¹ ~ J which contradicts the fact that ²% ³ ¦ %. Hence, % : . Conversely, suppose that : is closed under the taking of limits. We show that : is open. Let % : and suppose to the contrary that no open ball about % is contained in : . Consider the open balls )²%Á °³, for all . Since none of these balls is contained in : , for each , there is an % : q )²%Á °³. It is clear that ²% ³ ¦ % and so % : . But % cannot be in both : and : . This contradiction implies that : is open. Thus, : is closed.
The Closure of a Set
Definition Let $S$ be any subset of a metric space $M$. The closure of $S$, denoted by $\mathrm{cl}(S)$, is the smallest closed set containing $S$.
We should hasten to add that, since the entire space $M$ is closed and since the intersection of any collection of closed sets is closed (exercise), the closure of any set $S$ does exist and is the intersection of all closed sets containing $S$. The following definition will allow us to characterize the closure in another way.
Definition Let $S$ be a nonempty subset of a metric space $M$. An element $x \in M$ is said to be a limit point, or accumulation point, of $S$ if every open ball centered at $x$ meets $S$ at a point other than $x$ itself. Let us denote the set of all limit points of $S$ by $L(S)$.
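For example, in $\mathbb{R}$ with the usual metric, the set $S = (0,1]$ has $L(S) = [0,1]$: every ball about a point of $[0,1]$ meets $(0,1]$ in a point other than its center, while no point outside $[0,1]$ has this property. Note that the limit point $0$ does not belong to $S$.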
Here are some key facts concerning limit points and closures.
Theorem 12.3 Let $S$ be a nonempty subset of a metric space $M$.
1) $x \in L(S)$ if and only if there is a sequence $(x_n)$ in $S$ for which $x_n \ne x$ for all $n$ and $(x_n) \to x$.
2) $S$ is closed if and only if $L(S) \subseteq S$. In words, $S$ is closed if and only if it contains all of its limit points.
3) $\mathrm{cl}(S) = S \cup L(S)$.
4) $x \in \mathrm{cl}(S)$ if and only if there is a sequence $(x_n)$ in $S$ for which $(x_n) \to x$.
Proof. For part 1), assume first that $x \in L(S)$. For each $n$, there exists a point $x_n \ne x$ such that $x_n \in B(x, 1/n) \cap S$. Thus, we have $d(x_n, x) < 1/n$ and so $(x_n) \to x$. For the converse, suppose that $(x_n) \to x$, where $x \ne x_n \in S$. If $B(x,\epsilon)$ is any ball centered at $x$ then there is some $N$ such that $n \ge N$ implies $x_n \in B(x,\epsilon)$. Hence, for any ball $B(x,\epsilon)$ centered at $x$, there is a point $x_n \ne x$ such that $x_n \in S \cap B(x,\epsilon)$. Thus, $x$ is a limit point of $S$.
As for part 2), if $S$ is closed then by part 1), any $x \in L(S)$ is the limit of a sequence $(x_n)$ in $S$ and so must be in $S$. Hence, $L(S) \subseteq S$. Conversely, if $L(S) \subseteq S$ then $S$ is closed. For if $(x_n)$ is any sequence in $S$ and $(x_n) \to x$ then there are two possibilities. First, we might have $x_n = x$ for some $n$, in which case $x = x_n \in S$. Second, we might have $x_n \ne x$ for all $n$, in which case $(x_n) \to x$ implies that $x \in L(S) \subseteq S$. In either case, $x \in S$ and so $S$ is closed under the taking of limits, which implies that $S$ is closed.
For part 3), let $T = S \cup L(S)$. Clearly, $S \subseteq T$. To show that $T$ is closed, we show that it contains all of its limit points. So let $x \in L(T)$. Hence, there is a sequence $(x_n)$ in $T$ for which $x_n \ne x$ and $(x_n) \to x$. Of course, each $x_n$ is either in $S$, or is a limit point of $S$. We must show that $x \in T$, that is, that $x$ is either in $S$ or is a limit point of $S$. Suppose for the purposes of contradiction that $x \notin S$ and $x \notin L(S)$. Then there is a ball $B(x,\epsilon)$ for which $B(x,\epsilon) \cap S = \emptyset$. However, since $(x_n) \to x$, there must be an $x_n \in B(x,\epsilon)$. Since $x_n$ cannot be in $S$, it must be a limit point of $S$. Referring to Figure 12.1, if $d(x_n, x) = d$ then consider the ball $B(x_n, (\epsilon - d)/2)$. This ball is completely contained in $B(x,\epsilon)$ and must contain an element $y$ of $S$, since its center $x_n$ is a limit point of $S$. But then $y \in S \cap B(x,\epsilon)$, a contradiction. Hence, $x \in S$ or $x \in L(S)$. In either case, $x \in T = S \cup L(S)$ and so $T$ is closed. Thus, $T$ is closed and contains $S$ and so $\mathrm{cl}(S) \subseteq T$. On the other hand, $T = S \cup L(S) \subseteq \mathrm{cl}(S)$ and so $\mathrm{cl}(S) = T$.
Figure 12.1
For part 4), if $x \in \mathrm{cl}(S)$ then there are two possibilities. If $x \in S$ then the constant sequence $(x_n)$, with $x_n = x$ for all $n$, is a sequence in $S$ that converges to $x$. If $x \notin S$ then $x \in L(S)$ and so there is a sequence $(x_n)$ in $S$ for which $x_n \ne x$ and $(x_n) \to x$. In either case, there is a sequence in $S$ converging to $x$. Conversely, if there is a sequence $(x_n)$ in $S$ for which $(x_n) \to x$ then either $x_n = x$ for some $n$, in which case $x \in S \subseteq \mathrm{cl}(S)$, or else $x_n \ne x$ for all $n$, in which case $x \in L(S) \subseteq \mathrm{cl}(S)$.
Dense Subsets
The following concept is meant to convey the idea of a subset $S \subseteq M$ being "arbitrarily close" to every point in $M$.
Definition A subset $S$ of a metric space $M$ is dense in $M$ if $\mathrm{cl}(S) = M$. A metric space is said to be separable if it contains a countable dense subset.
Thus, a subset $S$ of $M$ is dense if and only if every open ball about any point $x \in M$ contains at least one point of $S$. Certainly, any metric space contains a dense subset, namely, the space itself. However, as the next examples show, not every metric space contains a countable dense subset.
Example 12.6
1) The real line $\mathbb{R}$ is separable, since the rational numbers $\mathbb{Q}$ form a countable dense subset. Similarly, $\mathbb{R}^n$ is separable, since the set $\mathbb{Q}^n$ is countable and dense.
2) The complex plane $\mathbb{C}$ is separable, as is $\mathbb{C}^n$ for all $n$.
3) A discrete metric space is separable if and only if it is countable. We leave proof of this as an exercise.
Example 12.7 The space $\ell^\infty$ is not separable. Recall that $\ell^\infty$ is the set of all bounded sequences of real numbers (or complex numbers), with metric
$$d(x,y) = \sup_n |x_n - y_n|$$
To see that this space is not separable, consider the set $S$ of all binary sequences
$$S = \{(x_n) \mid x_n = 0 \text{ or } 1 \text{ for all } n\}$$
This set is in one-to-one correspondence with the set of all subsets of $\mathbb{N}$ and so is uncountable. (It has cardinality $2^{\aleph_0} > \aleph_0$.) Now, each sequence in $S$ is certainly bounded and so lies in $\ell^\infty$. Moreover, if $x \ne y$ in $S$ then the two sequences must differ in at least one position and so $d(x,y) = 1$. In other words, we have a subset $S$ of $\ell^\infty$ that is uncountable and for which the distance between any two distinct elements is $1$. This implies that the uncountable collection of balls $\{B(s, 1/2) \mid s \in S\}$ is mutually disjoint. Hence, no countable set can meet every ball, which implies that no countable set can be dense in $\ell^\infty$.
Example 12.8 The metric spaces $\ell^p$ are separable, for $1 \le p < \infty$. The set $S$ of all sequences of the form $s = (r_1, \dots, r_n, 0, 0, \dots)$ for all $n$, where the $r_k$'s are rational, is a countable set. Let us show that it is dense in $\ell^p$. Any $x = (x_n) \in \ell^p$ satisfies
$$\sum_{n=1}^\infty |x_n|^p < \infty$$
Hence, for any $\epsilon > 0$, there exists an $N$ such that
$$\sum_{n=N+1}^\infty |x_n|^p < \epsilon$$
Since the rational numbers are dense in $\mathbb{R}$, we can find rational numbers $r_k$ for which $|x_k - r_k|^p < \epsilon/N$ for all $k = 1, \dots, N$. Hence, if $s = (r_1, \dots, r_N, 0, 0, \dots)$ then
$$d(x,s)^p = \sum_{k=1}^N |x_k - r_k|^p + \sum_{n=N+1}^\infty |x_n|^p < \epsilon + \epsilon = 2\epsilon$$
which shows that there is an element of $S$ arbitrarily close to any element of $\ell^p$. Thus, $S$ is dense in $\ell^p$ and so $\ell^p$ is separable.
Continuity
Continuity plays a central role in the study of linear operators on infinite-dimensional inner product spaces.
Definition Let $f: M \to M'$ be a function from the metric space $(M,d)$ to the metric space $(M',d')$. We say that $f$ is continuous at $x_0 \in M$ if for any $\epsilon > 0$, there exists a $\delta > 0$ such that
$$d(x,x_0) < \delta \Rightarrow d'(f(x), f(x_0)) < \epsilon$$
or, equivalently,
$$f(B(x_0,\delta)) \subseteq B(f(x_0),\epsilon)$$
(See Figure 12.2.) A function is continuous if it is continuous at every $x_0 \in M$.
Figure 12.2
We can use the notion of convergence to characterize continuity for functions between metric spaces.
Theorem 12.4 A function $f: M \to M'$ is continuous if and only if whenever $(x_n)$ is a sequence in $M$ that converges to $x \in M$ then the sequence $(f(x_n))$ converges to $f(x)$; in short,
$$(x_n) \to x \Rightarrow (f(x_n)) \to f(x)$$
Proof. Suppose first that $f$ is continuous at $x_0$ and let $(x_n) \to x_0$. Then, given $\epsilon > 0$, the continuity of $f$ implies the existence of a $\delta > 0$ such that
$$f(B(x_0,\delta)) \subseteq B(f(x_0),\epsilon)$$
Since $(x_n) \to x_0$, there exists an $N$ such that $x_n \in B(x_0,\delta)$ for $n \ge N$ and so
$$n \ge N \Rightarrow f(x_n) \in B(f(x_0),\epsilon)$$
Thus, $(f(x_n)) \to f(x_0)$.
Conversely, suppose that $(x_n) \to x_0$ implies $(f(x_n)) \to f(x_0)$. Suppose, for the purposes of contradiction, that $f$ is not continuous at $x_0$. Then there exists an $\epsilon > 0$
such that, for all $\delta > 0$,
$$f(B(x_0,\delta)) \not\subseteq B(f(x_0),\epsilon)$$
Thus, for all $n \ge 1$,
$$f\left(B\left(x_0, \tfrac{1}{n}\right)\right) \not\subseteq B(f(x_0),\epsilon)$$
and so we may construct a sequence $(x_n)$ by choosing each term $x_n$ with the property that
$$x_n \in B\left(x_0, \tfrac{1}{n}\right), \text{ but } f(x_n) \notin B(f(x_0),\epsilon)$$
Hence, $(x_n) \to x_0$, but $(f(x_n))$ does not converge to $f(x_0)$. This contradiction implies that $f$ must be continuous at $x_0$.
The next theorem says that the distance function is a continuous function in both variables.
Theorem 12.5 Let $(M,d)$ be a metric space. If $(x_n) \to x$ and $(y_n) \to y$ then $d(x_n,y_n) \to d(x,y)$.
Proof. We leave it as an exercise to show that
$$|d(x_n,y_n) - d(x,y)| \le d(x_n,x) + d(y_n,y)$$
But the right side tends to $0$ as $n \to \infty$ and so $d(x_n,y_n) \to d(x,y)$.
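(As a hint for the exercise: two applications of the triangle inequality give
$$d(x_n,y_n) \le d(x_n,x) + d(x,y) + d(y,y_n)$$
so that $d(x_n,y_n) - d(x,y) \le d(x_n,x) + d(y_n,y)$; exchanging the roles of $(x_n,y_n)$ and $(x,y)$ gives the same bound for $d(x,y) - d(x_n,y_n)$, and together these yield the displayed inequality.)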
Completeness
The reader who has studied analysis will recognize the following definitions.
Definition A sequence $(x_n)$ in a metric space $M$ is a Cauchy sequence if, for any $\epsilon > 0$, there exists an $N$ for which
$$n, m \ge N \Rightarrow d(x_n, x_m) < \epsilon$$
We leave it to the reader to show that any convergent sequence is a Cauchy sequence. When the converse holds, the space is said to be complete.
Definition Let $M$ be a metric space.
1) $M$ is said to be complete if every Cauchy sequence in $M$ converges in $M$.
2) A subspace $S$ of $M$ is complete if it is complete as a metric space. Thus, $S$ is complete if every Cauchy sequence $(s_n)$ in $S$ converges to an element in $S$.
Before considering examples, we prove a very useful result about completeness of subspaces.
Theorem 12.6 Let $M$ be a metric space.
1) Any complete subspace of $M$ is closed.
2) If $M$ is complete then a subspace $S$ of $M$ is complete if and only if it is closed.
Proof. To prove 1), assume that $S$ is a complete subspace of $M$. Let $(x_n)$ be a sequence in $S$ for which $(x_n) \to x \in M$. Then $(x_n)$ is a Cauchy sequence in $S$ and since $S$ is complete, $(x_n)$ must converge to an element of $S$. Since limits of sequences are unique, we have $x \in S$. Hence, $S$ is closed.
To prove part 2), first assume that $S$ is complete. Then part 1) shows that $S$ is closed. Conversely, suppose that $S$ is closed and let $(x_n)$ be a Cauchy sequence in $S$. Since $(x_n)$ is also a Cauchy sequence in the complete space $M$, it must converge to some $x \in M$. But since $S$ is closed, we have $(x_n) \to x \in S$. Hence, $S$ is complete.
Now let us consider some examples of complete (and incomplete) metric spaces.
Example 12.9 It is well known that the metric space $\mathbb{R}$ is complete. (However, a proof of this fact would lead us outside the scope of this book.) Similarly, the complex numbers $\mathbb{C}$ are complete.
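By contrast, the subspace $\mathbb{Q}$ of $\mathbb{R}$ is not complete: the truncated decimal expansions $1, 1.4, 1.41, 1.414, \dots$ of $\sqrt{2}$ form a Cauchy sequence of rational numbers with no limit in $\mathbb{Q}$.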
Example 12.10 The Euclidean space $\mathbb{R}^n$ and the unitary space $\mathbb{C}^n$ are complete. Let us prove this for $\mathbb{R}^n$. Suppose that $(x_k)$ is a Cauchy sequence in $\mathbb{R}^n$, where
$$x_k = (x_{k,1}, \dots, x_{k,n})$$
Thus,
$$d(x_k, x_m)^2 = \sum_{i=1}^n (x_{k,i} - x_{m,i})^2 \to 0 \text{ as } k, m \to \infty$$
and so, for each coordinate position $i$,
$$(x_{k,i} - x_{m,i})^2 \le d(x_k, x_m)^2 \to 0$$
which shows that the sequence $(x_{k,i})_{k=1,2,\dots}$ of $i$th coordinates is a Cauchy sequence in $\mathbb{R}$. Since $\mathbb{R}$ is complete, we must have $(x_{k,i}) \to y_i$ as $k \to \infty$. If $y = (y_1, \dots, y_n)$ then
$$d(x_k, y)^2 = \sum_{i=1}^n (x_{k,i} - y_i)^2 \to 0 \text{ as } k \to \infty$$
and so $(x_k) \to y \in \mathbb{R}^n$. This proves that $\mathbb{R}^n$ is complete.
Example 12.11 The metric space $(C[a,b], d)$ of all real-valued (or complex-valued) continuous functions on $[a,b]$, with metric
$$d(f,g) = \sup_{x \in [a,b]} |f(x) - g(x)|$$
is complete. To see this, we first observe that the limit with respect to $d$ is the uniform limit on $[a,b]$, that is, $(f_n) \to f$ if and only if for any $\epsilon > 0$, there is an $N > 0$ for which
$$n \ge N \Rightarrow |f_n(x) - f(x)| < \epsilon \text{ for all } x \in [a,b]$$
Now, let $(f_n)$ be a Cauchy sequence in $(C[a,b], d)$. Thus, for any $\epsilon > 0$, there is an $N$ for which
$$n, m \ge N \Rightarrow |f_n(x) - f_m(x)| < \epsilon \text{ for all } x \in [a,b] \tag{12.1}$$
This implies that, for each $x \in [a,b]$, the sequence $(f_n(x))$ is a Cauchy sequence of real (or complex) numbers and so it converges. We can therefore define a function $f$ on $[a,b]$ by
$$f(x) = \lim_{n \to \infty} f_n(x)$$
Letting $m \to \infty$ in (12.1), we get
$$n \ge N \Rightarrow |f_n(x) - f(x)| \le \epsilon \text{ for all } x \in [a,b]$$
Thus, $f_n(x)$ converges to $f(x)$ uniformly. It is well known that the uniform limit of continuous functions is continuous and so $f \in C[a,b]$. Thus, $(f_n) \to f \in C[a,b]$ and so $(C[a,b], d)$ is complete.
Example 12.12 The metric space $(C[a,b], d)$ of all real-valued (or complex-valued) continuous functions on $[a,b]$, with metric
$$d(f,g) = \int_a^b |f(x) - g(x)|\, dx$$
is not complete. For convenience, we take $[a,b] = [0,1]$ and leave the general case for the reader. Consider the sequence of functions $f_n(x)$ whose graphs are shown in Figure 12.3. (The definition of $f_n(x)$ should be clear from the graph.)
Figure 12.3
We leave it to the reader to show that the sequence $(f_n(x))$ is Cauchy, but does not converge in $(C[0,1], d)$. (The sequence converges to a function that is not continuous.)
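(The original graphs are not reproduced here; one standard choice of such a sequence, consistent with the description above, is
$$f_n(x) = \begin{cases} 0 & 0 \le x \le \tfrac12 \\ n\left(x - \tfrac12\right) & \tfrac12 < x < \tfrac12 + \tfrac1n \\ 1 & \tfrac12 + \tfrac1n \le x \le 1 \end{cases}$$
Each $f_n$ is continuous, the sequence is Cauchy in the integral metric since $d(f_n, f_m) = \tfrac12\left|\tfrac1n - \tfrac1m\right|$, and it converges pointwise to the discontinuous function that is $0$ on $[0,\tfrac12]$ and $1$ on $(\tfrac12, 1]$.)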
Example 12.13 The metric space $\ell^\infty$ is complete. To see this, suppose that $(x_k)$ is a Cauchy sequence in $\ell^\infty$, where
$$x_k = (x_{k,1}, x_{k,2}, \dots)$$
Then, for each coordinate position $i$, we have
$$|x_{k,i} - x_{m,i}| \le \sup_i |x_{k,i} - x_{m,i}| \to 0 \text{ as } k, m \to \infty \tag{12.2}$$
Hence, for each $i$, the sequence $(x_{k,i})_k$ of $i$th coordinates is a Cauchy sequence in $\mathbb{R}$ (or $\mathbb{C}$). Since $\mathbb{R}$ (or $\mathbb{C}$) is complete, we have $(x_{k,i}) \to y_i$ as $k \to \infty$ for each coordinate position $i$. We want to show that $y = (y_i) \in \ell^\infty$ and that $(x_k) \to y$. Letting $m \to \infty$ in (12.2) gives
$$\sup_i |x_{k,i} - y_i| \to 0 \text{ as } k \to \infty \tag{12.3}$$
and so, for some $k$,
$$|x_{k,i} - y_i| \le 1 \text{ for all } i$$
and so
$$|y_i| \le 1 + |x_{k,i}| \text{ for all } i$$
But since $x_k \in \ell^\infty$, it is a bounded sequence and therefore so is $(y_i)$. That is, $y = (y_i) \in \ell^\infty$. Since (12.3) implies that $(x_k) \to y$, we see that $\ell^\infty$ is complete.
Example 12.14 The metric space $\ell^p$ is complete. To prove this, let $(x_k)$ be a Cauchy sequence in $\ell^p$, where
$$x_k = (x_{k,1}, x_{k,2}, \dots)$$
Then, for each coordinate position $i$,
$$|x_{k,i} - x_{m,i}|^p \le \sum_{j=1}^\infty |x_{k,j} - x_{m,j}|^p = d(x_k, x_m)^p \to 0$$
which shows that the sequence $(x_{k,i})_k$ of $i$th coordinates is a Cauchy sequence in $\mathbb{R}$ (or $\mathbb{C}$). Since $\mathbb{R}$ (or $\mathbb{C}$) is complete, we have $(x_{k,i}) \to y_i$ as $k \to \infty$. We want to show that $y = (y_i) \in \ell^p$ and that $(x_k) \to y$. To this end, observe that for any $\epsilon > 0$, there is an $N$ for which
$$k, m \ge N \Rightarrow \sum_{i=1}^n |x_{k,i} - x_{m,i}|^p < \epsilon$$
for all $n$. Now, we let $m \to \infty$, to get
$$k \ge N \Rightarrow \sum_{i=1}^n |x_{k,i} - y_i|^p \le \epsilon$$
for all $n$. Letting $n \to \infty$, we get, for any $k \ge N$,
$$\sum_{i=1}^\infty |x_{k,i} - y_i|^p \le \epsilon$$
which implies that $x_k - y \in \ell^p$ and so $y = y - x_k + x_k \in \ell^p$ and, in addition, $(x_k) \to y$.
As we will see in the next chapter, the property of completeness plays a major role in the theory of inner product spaces. Inner product spaces for which the induced metric space is complete are called Hilbert spaces.
Isometries
A function between two metric spaces that preserves distance is called an isometry. Here is the formal definition.
Definition Let $(M,d)$ and $(M',d')$ be metric spaces. A function $f: M \to M'$ is called an isometry if
$$d'(f(x), f(y)) = d(x,y)$$
for all $x, y \in M$. If $f: M \to M'$ is a bijective isometry from $M$ to $M'$, we say that $M$ and $M'$ are isometric and write $M \approx M'$.
Theorem 12.7 Let $f: (M,d) \to (M',d')$ be an isometry. Then
1) $f$ is injective
2) $f$ is continuous
3) $f^{-1}: f(M) \to M$ is also an isometry and hence also continuous.
Proof. To prove 1), we observe that
$$f(x) = f(y) \Leftrightarrow d'(f(x), f(y)) = 0 \Leftrightarrow d(x,y) = 0 \Leftrightarrow x = y$$
To prove 2), let $(x_n) \to x$ in $M$; then
$$d'(f(x_n), f(x)) = d(x_n, x) \to 0 \text{ as } n \to \infty$$
and so $(f(x_n)) \to f(x)$, which proves that $f$ is continuous. Finally, we have
$$d(f^{-1}(f(x)), f^{-1}(f(y))) = d(x,y) = d'(f(x), f(y))$$
and so $f^{-1}: f(M) \to M$ is an isometry.
The Completion of a Metric Space
While not all metric spaces are complete, any metric space can be embedded in a complete metric space. To be more specific, we have the following important theorem.
Theorem 12.8 Let $(M,d)$ be any metric space. Then there is a complete metric space $(M',d')$ and an isometry $\tau: M \to \tau(M) \subseteq M'$ for which $\tau(M)$ is dense in $M'$. The metric space $(M',d')$ is called a completion of $(M,d)$. Moreover, $(M',d')$ is unique, up to bijective isometry.
Proof. The proof is a bit lengthy, so we divide it into various parts. We can simplify the notation considerably by thinking of sequences $(x_n)$ in $M$ as functions $f: \mathbb{N} \to M$, where $f(n) = x_n$.
Cauchy Sequences in $M$
The basic idea is to let the elements of $M'$ be equivalence classes of Cauchy sequences in $M$. So let $\mathrm{CS}(M)$ denote the set of all Cauchy sequences in $M$. If $f, g \in \mathrm{CS}(M)$ then, intuitively speaking, the terms $f(n)$ get closer together as $n \to \infty$ and so do the terms $g(n)$. Therefore, it seems reasonable that $d(f(n), g(n))$ should approach a finite limit as $n \to \infty$. Indeed, since
$$|d(f(n), g(n)) - d(f(m), g(m))| \le d(f(n), f(m)) + d(g(n), g(m)) \to 0$$
as $n, m \to \infty$, it follows that $(d(f(n), g(n)))$ is a Cauchy sequence of real numbers, which implies that
$$\lim_{n \to \infty} d(f(n), g(n)) < \infty \tag{12.4}$$
(That is, the limit exists and is finite.)
Equivalence Classes of Cauchy Sequences in $M$
We would like to define a metric $d'$ on the set $\mathrm{CS}(M)$ by
$$d'(f,g) = \lim_{n \to \infty} d(f(n), g(n))$$
However, it is possible that
$$\lim_{n \to \infty} d(f(n), g(n)) = 0$$
for distinct sequences $f$ and $g$, so this does not define a metric. Thus, we are led to define an equivalence relation on $\mathrm{CS}(M)$ by
$$f \equiv g \Leftrightarrow \lim_{n \to \infty} d(f(n), g(n)) = 0$$
Let $M'$ be the set of all equivalence classes of Cauchy sequences and define, for $\overline{f}, \overline{g} \in M'$,
$$d'(\overline{f}, \overline{g}) = \lim_{n \to \infty} d(f(n), g(n)) \tag{12.5}$$
where $f \in \overline{f}$ and $g \in \overline{g}$. To see that $d'$ is well-defined, suppose that $f' \in \overline{f}$ and $g' \in \overline{g}$. Then since $f \equiv f'$ and $g \equiv g'$, we have
$$|d(f'(n), g'(n)) - d(f(n), g(n))| \le d(f'(n), f(n)) + d(g'(n), g(n)) \to 0$$
as $n \to \infty$. Thus,
$$f \equiv f' \text{ and } g \equiv g' \Rightarrow \lim_{n \to \infty} d(f'(n), g'(n)) = \lim_{n \to \infty} d(f(n), g(n))$$
which shows that $d'$ is well-defined. To see that $d'$ is a metric, we verify the triangle inequality, leaving the rest to the reader. If $f$, $g$ and $h$ are Cauchy sequences then
$$d(f(n), g(n)) \le d(f(n), h(n)) + d(h(n), g(n))$$
Taking limits gives
$$\lim_{n \to \infty} d(f(n), g(n)) \le \lim_{n \to \infty} d(f(n), h(n)) + \lim_{n \to \infty} d(h(n), g(n))$$
and so
$$d'(\overline{f}, \overline{g}) \le d'(\overline{f}, \overline{h}) + d'(\overline{h}, \overline{g})$$
Embedding $(M,d)$ in $(M',d')$
For each $x \in M$, consider the constant Cauchy sequence $[x]$, where $[x](n) = x$ for all $n$. The map $\tau: M \to M'$ defined by $\tau(x) = \overline{[x]}$ is an isometry, since
$$d'(\tau(x), \tau(y)) = d'(\overline{[x]}, \overline{[y]}) = \lim_{n \to \infty} d([x](n), [y](n)) = d(x,y)$$
Moreover, $\tau(M)$ is dense in $M'$. This follows from the fact that we can approximate any Cauchy sequence in $M$ by a constant sequence. In particular, let $\overline{f} \in M'$. Since $f$ is a Cauchy sequence, for any $\epsilon > 0$, there exists an $N$ such that
$$n, m \ge N \Rightarrow d(f(n), f(m)) < \epsilon$$
Now, for the constant sequence $[f(N)]$ we have
$$d'\left(\overline{[f(N)]}, \overline{f}\right) = \lim_{n \to \infty} d(f(N), f(n)) \le \epsilon$$
and so $\tau(M)$ is dense in $M'$.
$(M',d')$ Is Complete
Suppose that $\overline{f_1}, \overline{f_2}, \overline{f_3}, \dots$ is a Cauchy sequence in $M'$. We wish to find a Cauchy sequence $g$ in $M$ for which
$$d'(\overline{f_k}, \overline{g}) = \lim_{n \to \infty} d(f_k(n), g(n)) \to 0 \text{ as } k \to \infty$$
Since $\overline{f_k} \in M'$ and since $\tau(M)$ is dense in $M'$, there is a constant sequence $[a_k] = (a_k, a_k, \dots)$ for which
$$d'(\overline{f_k}, \overline{[a_k]}) < \frac{1}{k}$$
We can think of $a_k$ as a constant approximation to $f_k$, with error at most $1/k$. Let $g$ be the sequence of these constant approximations: $g(k) = a_k$. This is a Cauchy sequence in $M$. Intuitively speaking, since the $f_k$'s get closer to each other as $k \to \infty$, so do the constant approximations. In particular, we have
$$d(a_j, a_k) = d'(\overline{[a_j]}, \overline{[a_k]}) \le d'(\overline{[a_j]}, \overline{f_j}) + d'(\overline{f_j}, \overline{f_k}) + d'(\overline{f_k}, \overline{[a_k]}) \le \frac{1}{j} + d'(\overline{f_j}, \overline{f_k}) + \frac{1}{k} \to 0$$
as $j, k \to \infty$. To see that $\overline{f_k}$ converges to $\overline{g}$, observe that
$$d'(\overline{f_k}, \overline{g}) \le d'(\overline{f_k}, \overline{[a_k]}) + d'(\overline{[a_k]}, \overline{g}) \le \frac{1}{k} + \lim_{n \to \infty} d(a_k, a_n)$$
Now, since $g$ is a Cauchy sequence, for any $\epsilon > 0$, there is an $N$ such that
$$n, k \ge N \Rightarrow d(a_k, a_n) < \epsilon$$
In particular,
$$k \ge N \Rightarrow \lim_{n \to \infty} d(a_k, a_n) \le \epsilon$$
and so
$$k \ge N \Rightarrow d'(\overline{f_k}, \overline{g}) \le \frac{1}{k} + \epsilon$$
which implies that $\overline{f_k} \to \overline{g}$, as desired.
Uniqueness
Finally, we must show that if $(M',d')$ and $(M'',d'')$ are both completions of $(M,d)$ then $M' \approx M''$. Note that we have bijective isometries
$$\tau: M \to \tau(M) \subseteq M' \quad \text{and} \quad \mu: M \to \mu(M) \subseteq M''$$
Hence, the map $\sigma = \mu \circ \tau^{-1}: \tau(M) \to \mu(M)$ is a bijective isometry from $\tau(M)$ onto $\mu(M)$, where $\tau(M)$ is dense in $M'$. (See Figure 12.4.)
Figure 12.4
Our goal is to show that $\sigma$ can be extended to a bijective isometry from $M'$ to $M''$. Let $x \in M'$. Then there is a sequence $(s_n)$ in $\tau(M)$ for which $(s_n) \to x$. Since $(s_n)$ is a Cauchy sequence in $\tau(M)$, $(\sigma(s_n))$ is a Cauchy sequence in $\mu(M) \subseteq M''$ and since $M''$ is complete, we have $(\sigma(s_n)) \to y$ for some $y \in M''$. Let us define $\overline{\sigma}(x) = y$. To see that $\overline{\sigma}$ is well-defined, suppose that $(s_n) \to x$ and $(t_n) \to x$, where both sequences lie in $\tau(M)$. Then
$$d''(\sigma(s_n), \sigma(t_n)) = d'(s_n, t_n) \to 0$$
as $n \to \infty$ and so $(\sigma(s_n))$ and $(\sigma(t_n))$ converge to the same element of $M''$, which implies that $\overline{\sigma}(x)$ does not depend on the choice of sequence in $\tau(M)$ converging to $x$. Thus, $\overline{\sigma}$ is well-defined. Moreover, if $s \in \tau(M)$ then the constant sequence with value $s$ converges to $s$ and so $\overline{\sigma}(s) = \lim \sigma(s) = \sigma(s)$, which shows that $\overline{\sigma}$ is an extension of $\sigma$.
To see that $\overline{\sigma}$ is an isometry, suppose that $(s_n) \to x$ and $(t_n) \to y$. Then $(\sigma(s_n)) \to \overline{\sigma}(x)$ and $(\sigma(t_n)) \to \overline{\sigma}(y)$ and since $d''$ is continuous, we have
$$d''(\overline{\sigma}(x), \overline{\sigma}(y)) = \lim_{n \to \infty} d''(\sigma(s_n), \sigma(t_n)) = \lim_{n \to \infty} d'(s_n, t_n) = d'(x,y)$$
Thus, we need only show that $\overline{\sigma}$ is surjective. Note first that $\mu(M) = \mathrm{im}(\sigma) \subseteq \mathrm{im}(\overline{\sigma})$. Thus, if $\mathrm{im}(\overline{\sigma})$ is closed, we can deduce from the fact that $\mu(M)$ is dense in $M''$ that $\mathrm{im}(\overline{\sigma}) = M''$. So, suppose that $(\overline{\sigma}(x_n))$ is a sequence in $\mathrm{im}(\overline{\sigma})$ and $(\overline{\sigma}(x_n)) \to z$. Then $(\overline{\sigma}(x_n))$ is a Cauchy sequence and, since $\overline{\sigma}$ is an isometry, so is $(x_n)$. Thus, $(x_n) \to x \in M'$, since $M'$ is complete. But $\overline{\sigma}$ is continuous and so $(\overline{\sigma}(x_n)) \to \overline{\sigma}(x)$, which implies that $\overline{\sigma}(x) = z$ and so $z \in \mathrm{im}(\overline{\sigma})$. Hence, $\overline{\sigma}$ is surjective and $M' \approx M''$.
Exercises
1. Prove the generalized triangle inequality
$$d(x_1, x_n) \le d(x_1, x_2) + d(x_2, x_3) + \dots + d(x_{n-1}, x_n)$$
2. a) Use the triangle inequality to prove that
$$|d(x,y) - d(a,b)| \le d(x,a) + d(y,b)$$
   b) Prove that
$$|d(x,z) - d(y,z)| \le d(x,y)$$
3. Let $S \subseteq \ell^\infty$ be the subspace of all binary sequences (sequences of $0$'s and $1$'s). Describe the metric on $S$.
4. Let $M = \{0,1\}^n$ be the set of all binary $n$-tuples. Define a function $d: M \times M \to \mathbb{R}$ by letting $d(x,y)$ be the number of positions in which $x$ and $y$ differ. For example, $d((1,1,0,1,0), (0,1,0,0,1)) = 3$. Prove that $d$ is a metric. (It is called the Hamming distance function and plays an important role in the theory of error-correcting codes.)
5. Let $1 \le p < \infty$.
   a) If $x = (x_n) \in \ell^p$, show that $x_n \to 0$.
   b) Find a sequence that converges to $0$ but is not an element of any $\ell^p$ for $1 \le p < \infty$.
6. a) Show that if $x = (x_n) \in \ell^p$ then $x \in \ell^q$ for all $q > p$.
   b) Find a sequence $x = (x_n)$ that is in $\ell^q$ for all $q > p$, but is not in $\ell^p$.
7. Show that a subset $S$ of a metric space $M$ is open if and only if $S$ contains an open neighborhood of each of its points.
8. Show that the intersection of any collection of closed sets in a metric space is closed.
9. Let $(M,d)$ be a metric space. The diameter of a nonempty subset $S \subseteq M$ is
$$\delta(S) = \sup_{x,y \in S} d(x,y)$$
A set $S$ is bounded if $\delta(S) < \infty$.
   a) Prove that $S$ is bounded if and only if there is some $x \in M$ and $r \in \mathbb{R}$ for which $S \subseteq B(x,r)$.
   b) Prove that $\delta(S) = 0$ if and only if $S$ consists of a single point.
   c) Prove that $S \subseteq T$ implies $\delta(S) \le \delta(T)$.
   d) If $S$ and $T$ are bounded, show that $S \cup T$ is also bounded.
10. Let $(M,d)$ be a metric space. Let $d'$ be the function defined by
$$d'(x,y) = \frac{d(x,y)}{1 + d(x,y)}$$
   a) Show that $(M,d')$ is a metric space and that $M$ is bounded under this metric, even if it is not bounded under the metric $d$.
   b) Show that the metric spaces $(M,d)$ and $(M,d')$ have the same open sets.
11. If $S$ and $T$ are subsets of a metric space $(M,d)$, we define the distance between $S$ and $T$ by
$$d(S,T) = \inf_{s \in S,\, t \in T} d(s,t)$$
   a) Is it true that $d(S,T) = 0$ if and only if $S = T$? Is $d$ a metric?
   b) Show that $x \in \mathrm{cl}(S)$ if and only if $d(\{x\}, S) = 0$.
12. Prove that $x \in M$ is a limit point of $S \subseteq M$ if and only if every neighborhood of $x$ meets $S$ in a point other than $x$ itself.
13. Prove that $x \in M$ is a limit point of $S \subseteq M$ if and only if every open ball $B(x,\epsilon)$ contains infinitely many points of $S$.
14. Prove that limits are unique, that is, $(x_n) \to x$ and $(x_n) \to y$ implies that $x = y$.
15. Let $S$ be a subset of a metric space $M$. Prove that $x \in \mathrm{cl}(S)$ if and only if there exists a sequence $(x_n)$ in $S$ that converges to $x$.
16. Prove that the closure has the following properties:
   a) $S \subseteq \mathrm{cl}(S)$
   b) $\mathrm{cl}(\mathrm{cl}(S)) = \mathrm{cl}(S)$
   c) $\mathrm{cl}(S \cup T) = \mathrm{cl}(S) \cup \mathrm{cl}(T)$
   d) $\mathrm{cl}(S \cap T) \subseteq \mathrm{cl}(S) \cap \mathrm{cl}(T)$
   Can the last part be strengthened to equality?
17. a) Prove that the closed ball $\overline{B}(x,\epsilon)$ is always a closed subset.
   b) Find an example of a metric space in which the closure of an open ball $B(x,\epsilon)$ is not equal to the closed ball $\overline{B}(x,\epsilon)$.
18. Provide the details to show that $\mathbb{R}^n$ is separable.
19. Prove that $\mathbb{C}^n$ is separable.
20. Prove that a discrete metric space is separable if and only if it is countable.
21. Prove that the metric space $B[a,b]$ of all bounded functions on $[a,b]$, with metric
$$d(f,g) = \sup_{x \in [a,b]} |f(x) - g(x)|$$
is not separable.
22. Show that a function $f: (M,d) \to (M',d')$ is continuous if and only if the inverse image of any open set is open, that is, if and only if
$$f^{-1}(U) = \{x \in M \mid f(x) \in U\}$$
is open in $M$ whenever $U$ is an open set in $M'$.
23. Repeat the previous exercise, replacing the word open by the word closed.
24. Give an example to show that if $f: (M,d) \to (M',d')$ is a continuous function and $U$ is an open set in $M$, it need not be the case that $f(U)$ is open in $M'$.
25. Show that any convergent sequence is a Cauchy sequence.
26. If $(x_n) \to x$ in a metric space $M$, show that any subsequence $(x_{n_k})$ of $(x_n)$ also converges to $x$.
27. Suppose that $(x_n)$ is a Cauchy sequence in a metric space $M$ and that some subsequence $(x_{n_k})$ of $(x_n)$ converges. Prove that $(x_n)$ converges to the same limit as the subsequence.
28. Prove that if $(x_n)$ is a Cauchy sequence then the set $\{x_n\}$ is bounded. What about the converse? Is a bounded sequence necessarily a Cauchy sequence?
29. Let $(x_n)$ and $(y_n)$ be Cauchy sequences in a metric space $M$. Prove that the sequence $d_n = d(x_n, y_n)$ converges.
30. Show that the space of all convergent sequences of real numbers (or complex numbers) is complete as a subspace of $\ell^\infty$.
31. Let $\mathcal{P}$ denote the metric space of all polynomials over $\mathbb{C}$, with metric
$$d(p,q) = \sup_{x \in [a,b]} |p(x) - q(x)|$$
Is $\mathcal{P}$ complete?
32. Let $S \subseteq \ell^\infty$ be the subspace of all sequences with finite support (that is, with a finite number of nonzero terms). Is $S$ complete?
33. Prove that the metric space $\mathbb{Z}$ of all integers, with metric $d(n,m) = |n - m|$, is complete.
34. Show that the subspace $S$ of the metric space $C[a,b]$ (under the sup metric) consisting of all functions $f \in C[a,b]$ for which $f(a) = f(b)$ is complete.
35. If $M \approx M'$ and $M$ is complete, show that $M'$ is also complete.
36. Show that the metric spaces $C[a,b]$ and $C[c,d]$, under the sup metric, are isometric.
37. Prove Hölder's inequality (where $p, q > 1$ and $\frac1p + \frac1q = 1$):
$$\sum_{n=1}^\infty |x_n y_n| \le \left(\sum_{n=1}^\infty |x_n|^p\right)^{1/p} \left(\sum_{n=1}^\infty |y_n|^q\right)^{1/q}$$
as follows.
   a) Show that $s = t^{p-1} \Rightarrow t = s^{q-1}$.
   b) Let $u$ and $v$ be positive real numbers and consider the rectangle $R$ in $\mathbb{R}^2$ with corners $(0,0)$, $(u,0)$, $(0,v)$ and $(u,v)$, with area $uv$. Argue geometrically (that is, draw a picture) to show that
$$uv \le \int_0^u t^{p-1}\, dt + \int_0^v s^{q-1}\, ds$$
and so
$$uv \le \frac{u^p}{p} + \frac{v^q}{q}$$
   c) Now let $X = \sum |x_n|^p < \infty$ and $Y = \sum |y_n|^q < \infty$. Apply the result of part b) to
$$u = \frac{|x_n|}{X^{1/p}}, \qquad v = \frac{|y_n|}{Y^{1/q}}$$
and then sum on $n$ to deduce Hölder's inequality.
38. Prove Minkowski's inequality (for $1 \le p < \infty$):
$$\left(\sum_{n=1}^\infty |x_n + y_n|^p\right)^{1/p} \le \left(\sum_{n=1}^\infty |x_n|^p\right)^{1/p} + \left(\sum_{n=1}^\infty |y_n|^p\right)^{1/p}$$
as follows.
   a) Prove it for $p = 1$ first.
   b) Assume $p > 1$. Show that
$$|x_n + y_n|^p \le |x_n| |x_n + y_n|^{p-1} + |y_n| |x_n + y_n|^{p-1}$$
   c) Sum this from $n = 1$ to $m$ and apply Hölder's inequality to each sum on the right, to get
$$\sum_{n=1}^m |x_n + y_n|^p \le \left\{\left(\sum_{n=1}^m |x_n|^p\right)^{1/p} + \left(\sum_{n=1}^m |y_n|^p\right)^{1/p}\right\} \left(\sum_{n=1}^m |x_n + y_n|^p\right)^{1/q}$$
Divide both sides of this by the last factor on the right and let $m \to \infty$ to deduce Minkowski's inequality.
39. Prove that $\ell^p$ is a metric space.
Chapter 13
Hilbert Spaces
Now that we have the necessary background on the topological properties of metric spaces, we can resume our study of inner product spaces without qualification as to dimension. As in Chapter 9, we restrict attention to real and complex inner product spaces. Hence $F$ will denote either $\mathbb{R}$ or $\mathbb{C}$.
A Brief Review
Let us begin by reviewing some of the results from Chapter 9. Recall that an inner product space $V$ over $F$ is a vector space $V$, together with an inner product $\langle \cdot, \cdot \rangle: V \times V \to F$. If $F = \mathbb{R}$ then the inner product is bilinear and if $F = \mathbb{C}$, the inner product is sesquilinear. An inner product induces a norm on $V$, defined by
$$\|v\| = \sqrt{\langle v, v \rangle}$$
We recall in particular the following properties of the norm.
Theorem 13.1
1) (The Cauchy-Schwarz inequality) For all $u, v \in V$,
$$|\langle u, v \rangle| \le \|u\| \|v\|$$
with equality if and only if $u = rv$ for some $r \in F$.
2) (The triangle inequality) For all $u, v \in V$,
$$\|u + v\| \le \|u\| + \|v\|$$
with equality if and only if one of $u$, $v$ is a nonnegative real multiple of the other.
3) (The parallelogram law)
$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$$
We have seen that the inner product can be recovered from the norm, as follows.
Theorem 13.2
1) If $V$ is a real inner product space then
$$\langle u, v \rangle = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right)$$
2) If $V$ is a complex inner product space then
$$\langle u, v \rangle = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right) + \frac{i}{4}\left(\|u + iv\|^2 - \|u - iv\|^2\right)$$
The inner product also induces a metric on $V$ defined by
$$d(u,v) = \|u - v\|$$
Thus, any inner product space is a metric space.
Definition Let $V$ and $W$ be inner product spaces and let $\tau \in \mathcal{L}(V,W)$.
1) $\tau$ is an isometry if it preserves the inner product, that is, if
$$\langle \tau(u), \tau(v) \rangle = \langle u, v \rangle$$
for all $u, v \in V$.
2) A bijective isometry is called an isometric isomorphism. When $\tau: V \to W$ is an isometric isomorphism, we say that $V$ and $W$ are isometrically isomorphic.
It is easy to see that an isometry is always injective but need not be surjective, even if $V = W$. (See Example 10.3.)
Theorem 13.3 A linear transformation $\tau \in \mathcal{L}(V,W)$ is an isometry if and only if it preserves the norm, that is, if and only if
$$\|\tau(v)\| = \|v\|$$
for all $v \in V$.
The following result points out one of the main differences between real and complex inner product spaces.
Theorem 13.4 Let $V$ be an inner product space and let $\tau \in \mathcal{L}(V)$.
1) If $\langle \tau(v), w \rangle = 0$ for all $v, w \in V$ then $\tau = 0$.
2) If $V$ is a complex inner product space and $\langle \tau(v), v \rangle = 0$ for all $v \in V$ then $\tau = 0$.
3) Part 2) does not hold in general for real inner product spaces.
Hilbert Spaces
Since an inner product space is a metric space, all that we learned about metric spaces applies to inner product spaces. In particular, if $(x_n)$ is a sequence of
vectors in an inner product space $V$ then $(x_n) \to x$ if and only if
$$\|x_n - x\| \to 0 \text{ as } n \to \infty$$
The fact that the inner product is continuous as a function of either of its coordinates is extremely useful.
Theorem 13.5 Let $V$ be an inner product space. Then
1) $(x_n) \to x$, $(y_n) \to y$ $\Rightarrow$ $\langle x_n, y_n \rangle \to \langle x, y \rangle$
2) $(x_n) \to x$ $\Rightarrow$ $\|x_n\| \to \|x\|$
Complete inner product spaces play an especially important role in both theory and practice. Definition An inner product space that is complete under the metric induced by the inner product is said to be a Hilbert space.
Example 13.1 One of the most important examples of a Hilbert space is the space $\ell^2$ of Example 10.2. Recall that the inner product is defined by
$$\langle x, y \rangle = \sum_{n=1}^\infty x_n \overline{y_n}$$
(In the real case, the conjugate is unnecessary.) The metric induced by this inner product is
$$d(x,y) = \|x - y\| = \left(\sum_{n=1}^\infty |x_n - y_n|^2\right)^{1/2}$$
which agrees with the definition of the metric space $\ell^2$ given in Chapter 12. In other words, the metric in Chapter 12 is induced by this inner product. As we saw in Chapter 12, this inner product space is complete and so it is a Hilbert space. (In fact, it is the prototype of all Hilbert spaces, introduced by David Hilbert in 1912, even before the axiomatic definition of Hilbert space was given by John von Neumann in 1927.)
The previous example raises the question of whether or not the other metric spaces $\ell^p$ ($p \ne 2$), with distance given by
$$d(x,y) = \|x - y\| = \left(\sum_{n=1}^\infty |x_n - y_n|^p\right)^{1/p} \tag{13.1}$$
are complete inner product spaces. The fact is that they are not even inner product spaces! More specifically, there is no inner product whose induced metric is given by (13.1). To see this, observe that, according to Theorem 13.1,
any norm that comes from an inner product must satisfy the parallelogram law
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$$
But the norm in (13.1) does not satisfy this law. To see this, take $x = (1, 1, 0, \dots)$ and $y = (1, -1, 0, \dots)$. Then
$$\|x + y\| = 2, \qquad \|x - y\| = 2$$
and
$$\|x\| = 2^{1/p}, \qquad \|y\| = 2^{1/p}$$
Thus, the left side of the parallelogram law is $8$ and the right side is $4 \cdot 2^{2/p}$, which equals $8$ if and only if $p = 2$.
Just as any metric space has a completion, so does any inner product space.
Theorem 13.6 Let $V$ be an inner product space. Then there exists a Hilbert space $H$ and an isometry $\tau: V \to H$ for which $\tau(V)$ is dense in $H$. Moreover, $H$ is unique up to isometric isomorphism.
Proof. We know that the metric space $(V,d)$, where $d$ is induced by the inner product, has a unique completion $(V',d')$, which consists of equivalence classes of Cauchy sequences in $V$. If $\overline{(x_n)} \in V'$ and $\overline{(y_n)} \in V'$ then we set
$$\overline{(x_n)} + \overline{(y_n)} = \overline{(x_n + y_n)}, \qquad r\overline{(x_n)} = \overline{(rx_n)}$$
and
$$\langle \overline{(x_n)}, \overline{(y_n)} \rangle = \lim_{n \to \infty} \langle x_n, y_n \rangle$$
It is easy to see that, since $(x_n)$ and $(y_n)$ are Cauchy sequences, so are $(x_n + y_n)$ and $(rx_n)$. In addition, these definitions are well-defined, that is, they are independent of the choice of representative from each equivalence class. For instance, if $(\hat{x}_n) \equiv (x_n)$ then
$$\lim_{n \to \infty} \|x_n - \hat{x}_n\| = 0$$
and so
$$|\langle x_n, y_n \rangle - \langle \hat{x}_n, y_n \rangle| = |\langle x_n - \hat{x}_n, y_n \rangle| \le \|x_n - \hat{x}_n\| \|y_n\| \to 0$$
(The Cauchy sequence $(y_n)$ is bounded.) Hence,
$$\langle \overline{(x_n)}, \overline{(y_n)} \rangle = \lim_{n \to \infty} \langle x_n, y_n \rangle = \lim_{n \to \infty} \langle \hat{x}_n, y_n \rangle = \langle \overline{(\hat{x}_n)}, \overline{(y_n)} \rangle$$
We leave it to the reader to show that $V'$ is an inner product space under these operations.
Moreover, the inner product on $V'$ induces the metric $d'$, since
$$\langle \overline{(x_n - y_n)}, \overline{(x_n - y_n)} \rangle = \lim_{n \to \infty} \langle x_n - y_n, x_n - y_n \rangle = \lim_{n \to \infty} d(x_n, y_n)^2 = d'(\overline{(x_n)}, \overline{(y_n)})^2$$
Hence, the metric space isometry $\tau: V \to V'$ is an isometry of inner product spaces, since
$$\langle \tau(x), \tau(y) \rangle = \lim_{n \to \infty} \langle x, y \rangle = \langle x, y \rangle$$
Thus, $V'$ is a complete inner product space and $\tau(V)$ is a dense subspace of $V'$ that is isometrically isomorphic to $V$. We leave the issue of uniqueness to the reader.
The next result concerns subspaces of inner product spaces.
Theorem 13.7
1) Any complete subspace of an inner product space is closed.
2) A subspace of a Hilbert space is a Hilbert space if and only if it is closed.
3) Any finite-dimensional subspace of an inner product space is closed and complete.
Proof. Parts 1) and 2) follow from Theorem 12.6. Let us prove that a finite-dimensional subspace $S$ of an inner product space $V$ is closed. Suppose that $(x_n)$ is a sequence in $S$, $(x_n) \to x$ and $x \notin S$. Let $\mathcal{B} = \{u_1, \dots, u_k\}$ be an orthonormal Hamel basis for $S$. The Fourier expansion
$$\hat{x} = \sum_{i=1}^k \langle x, u_i \rangle u_i$$
in $S$ has the property that $x - \hat{x} \ne 0$ but
$$\langle x - \hat{x}, u_i \rangle = \langle x, u_i \rangle - \langle \hat{x}, u_i \rangle = 0$$
Thus, if we write $y = x - \hat{x}$ and $y_n = x_n - \hat{x} \in S$, the sequence $(y_n)$, which is in $S$, converges to a vector $y$ that is orthogonal to $S$. But this is impossible, because $y_n \perp y$ implies that
$$\|y_n - y\|^2 = \|y_n\|^2 + \|y\|^2 \ge \|y\|^2 > 0$$
This proves that $S$ is closed.
To see that any finite-dimensional subspace $S$ of an inner product space is complete, let us embed $S$ (as an inner product space in its own right) in its completion $S'$. Then $S$ (or rather an isometric copy of $S$) is a finite-dimensional
subspace of a complete inner product space $S'$ and as such it is closed. However, $S$ is dense in $S'$ and so $S = S'$, which shows that $S$ is complete.
Infinite Series
Since an inner product space allows both addition of vectors and convergence of sequences, we can define the concept of infinite sums, or infinite series.
Definition Let $V$ be an inner product space. The $n$th partial sum of the sequence $(x_k)$ in $V$ is
$$s_n = x_1 + \dots + x_n$$
If the sequence $(s_n)$ of partial sums converges to a vector $s \in V$, that is, if
$$\|s_n - s\| \to 0 \text{ as } n \to \infty$$
then we say that the series $\sum x_k$ converges to $s$ and write
$$\sum_{k=1}^\infty x_k = s$$
We can also define absolute convergence.
Definition A series $\sum x_k$ is said to be absolutely convergent if the series
$$\sum_{k=1}^\infty \|x_k\|$$
converges.
The key relationship between convergence and absolute convergence is given in the next theorem. Note that completeness is required to guarantee that absolute convergence implies convergence.
Theorem 13.8 Let $V$ be an inner product space. Then $V$ is complete if and only if absolute convergence of a series implies convergence.
Proof. Suppose that $V$ is complete and that $\sum \|x_k\| < \infty$. Then the sequence of partial sums is a Cauchy sequence, for if $m > n$, we have
$$\|s_m - s_n\| = \left\|\sum_{k=n+1}^m x_k\right\| \le \sum_{k=n+1}^m \|x_k\| \to 0$$
Hence, the sequence $(s_n)$ converges, that is, the series $\sum x_k$ converges.
Conversely, suppose that absolute convergence implies convergence and let $(x_n)$ be a Cauchy sequence in $V$. We wish to show that this sequence converges. Since $(x_n)$ is a Cauchy sequence, for each $k > 0$, there exists an $N_k$
with the property that
$$n, m \ge N_k \Rightarrow \|x_n - x_m\| < \frac{1}{2^k}$$
Clearly, we can choose $N_1 < N_2 < \cdots$, in which case
$$\|x_{N_{k+1}} - x_{N_k}\| < \frac{1}{2^k}$$
and so
$$\sum_{k=1}^\infty \|x_{N_{k+1}} - x_{N_k}\| \le \sum_{k=1}^\infty \frac{1}{2^k} < \infty$$
Thus, according to hypothesis, the series
$$\sum_{k=1}^\infty (x_{N_{k+1}} - x_{N_k})$$
converges. But this is a telescoping series, whose $k$th partial sum is $x_{N_{k+1}} - x_{N_1}$ and so the subsequence $(x_{N_k})$ converges. Since any Cauchy sequence that has a convergent subsequence must itself converge, the sequence $(x_n)$ converges and so $V$ is complete.
An Approximation Problem
Suppose that $V$ is an inner product space and that $S$ is a subset of $V$. It is of considerable interest to be able to find, for any $x \in V$, a vector in $S$ that is closest to $x$ in the metric induced by the inner product, should such a vector exist. This is the approximation problem for $V$. Suppose that $x \in V$ and let
$$\delta = \inf_{s \in S} \|x - s\|$$
Then there is a sequence $(s_n)$ in $S$ for which
$$\delta_n = \|x - s_n\| \to \delta$$
as shown in Figure 13.1.
Figure 13.1
Let us see what we can learn about this sequence. First, if we let $y_n = x - s_n$ then according to the parallelogram law
$$\|y_m + y_n\|^2 + \|y_m - y_n\|^2 = 2(\|y_m\|^2 + \|y_n\|^2)$$
or
$$\|y_m - y_n\|^2 = 2(\|y_m\|^2 + \|y_n\|^2) - 4\left\|\frac{y_m + y_n}{2}\right\|^2 \tag{13.2}$$
Now, if the set $S$ is convex, that is, if
$$x, y \in S \Rightarrow tx + (1-t)y \in S \text{ for all } 0 \le t \le 1$$
(in words: $S$ contains the line segment between any two of its points) then $(s_m + s_n)/2 \in S$ and so
$$\left\|\frac{y_m + y_n}{2}\right\| = \left\|x - \frac{s_m + s_n}{2}\right\| \ge \delta$$
Thus, (13.2) gives
$$\|y_m - y_n\|^2 \le 2(\|y_m\|^2 + \|y_n\|^2) - 4\delta^2 \to 0$$
as $m, n \to \infty$. Hence, if $S$ is convex then the sequence $(y_n) = (x - s_n)$ is a Cauchy sequence and therefore so is $(s_n)$.
If we also require that $S$ be complete then the Cauchy sequence $(s_n)$ converges to a vector $\hat{x} \in S$ and, by the continuity of the norm, we must have $\|x - \hat{x}\| = \delta$. Let us summarize and add a remark about uniqueness.
Theorem 13.9 Let $V$ be an inner product space and let $S$ be a complete convex subset of $V$. Then for any $x \in V$, there exists a unique $\hat{x} \in S$ for which
$$\|x - \hat{x}\| = \inf_{s \in S} \|x - s\|$$
The vector $\hat{x}$ is called the best approximation to $x$ in $S$.
Proof. Only the uniqueness remains to be established. Suppose that
$$\|x - \hat{x}\| = \delta = \|x - x'\|$$
Then, by the parallelogram law,
$$\begin{aligned}
\|\hat{x} - x'\|^2 &= \|(x - x') - (x - \hat{x})\|^2 \\
&= 2\|x - \hat{x}\|^2 + 2\|x - x'\|^2 - \|(x - x') + (x - \hat{x})\|^2 \\
&= 2\|x - \hat{x}\|^2 + 2\|x - x'\|^2 - 4\left\|x - \frac{\hat{x} + x'}{2}\right\|^2 \\
&\le 2\delta^2 + 2\delta^2 - 4\delta^2 = 0
\end{aligned}$$
and so $\hat{x} = x'$.
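For example, let $S$ be the closed unit ball $\{s \in H \mid \|s\| \le 1\}$ of a Hilbert space $H$, a complete convex set that is not a subspace. For $\|x\| > 1$, the best approximation to $x$ in $S$ is $\hat{x} = x/\|x\|$, since for any $s \in S$,
$$\|x - s\| \ge \|x\| - \|s\| \ge \|x\| - 1 = \|x - \hat{x}\|$$
(and for $\|x\| \le 1$ we simply have $\hat{x} = x$).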
Since any subspace $S$ of an inner product space $V$ is convex, Theorem 13.9 applies to complete subspaces. However, in this case, we can say more.
Theorem 13.10 Let $V$ be an inner product space and let $S$ be a complete subspace of $V$. Then for any $x \in V$, the best approximation to $x$ in $S$ is the unique vector $x' \in S$ for which $x - x' \perp S$.
Proof. Suppose that $x - x' \perp S$, where $x' \in S$. Then for any $s \in S$, we have $x - x' \perp x' - s$ and so
$$\|x - s\|^2 = \|x - x'\|^2 + \|x' - s\|^2 \ge \|x - x'\|^2$$
Hence $x' = \hat{x}$ is the best approximation to $x$ in $S$. Now we need only show that $x - \hat{x} \perp S$, where $\hat{x}$ is the best approximation to $x$ in $S$. For any $0 \ne s \in S$ and any scalar $r$, a little computation reminiscent of completing the square gives
$$\begin{aligned}
\|x - rs\|^2 &= \langle x - rs, x - rs \rangle \\
&= \|x\|^2 - \overline{r}\langle x, s \rangle - r\langle s, x \rangle + |r|^2 \|s\|^2 \\
&= \|x\|^2 + \|s\|^2 \left(r - \frac{\langle x, s \rangle}{\|s\|^2}\right)\overline{\left(r - \frac{\langle x, s \rangle}{\|s\|^2}\right)} - \frac{|\langle x, s \rangle|^2}{\|s\|^2} \\
&= \|x\|^2 + \|s\|^2 \left|r - \frac{\langle x, s \rangle}{\|s\|^2}\right|^2 - \frac{|\langle x, s \rangle|^2}{\|s\|^2}
\end{aligned}$$
Now, this is smallest when
$$r = \frac{\langle x, s \rangle}{\|s\|^2}$$
in which case
$$\|x - rs\|^2 = \|x\|^2 - \frac{|\langle x, s \rangle|^2}{\|s\|^2}$$
Replacing $x$ by $x - \hat{x}$ gives
$$\|x - \hat{x} - rs\|^2 = \|x - \hat{x}\|^2 - \frac{|\langle x - \hat{x}, s \rangle|^2}{\|s\|^2}$$
But $\hat{x}$ is the best approximation to $x$ in $S$ and since $\hat{x} + rs \in S$, we must have
$$\|x - \hat{x} - rs\| \ge \|x - \hat{x}\|$$
Hence,
$$\frac{|\langle x - \hat{x}, s \rangle|^2}{\|s\|^2} = 0$$
or, equivalently, $\langle x - \hat{x}, s \rangle = 0$. Hence, $x - \hat{x} \perp S$.
According to Theorem 13.10, if $S$ is a complete subspace of an inner product space $V$ then for any $x \in V$, we may write
$$x = \hat{x} + (x - \hat{x})$$
where $\hat{x} \in S$ and $x - \hat{x} \in S^\perp$. Hence, $V = S + S^\perp$ and since $S \cap S^\perp = \{0\}$, we also have $V = S \oplus S^\perp$. This is the projection theorem for arbitrary inner product spaces.
Theorem 13.11 (The projection theorem) If $S$ is a complete subspace of an inner product space $V$ then
$$V = S \oplus S^\perp$$
In particular, if $S$ is a closed subspace of a Hilbert space $H$ then
$$H = S \oplus S^\perp$$
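For example, in $\mathbb{R}^3$ with $S = \mathrm{span}\{e_1, e_2\}$ (a closed subspace), the decomposition of $x = (x_1, x_2, x_3)$ is
$$x = (x_1, x_2, 0) + (0, 0, x_3)$$
with $(x_1, x_2, 0) \in S$ and $(0, 0, x_3) \in S^\perp$.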
Theorem 13.12 Let $S$, $T$ and $T'$ be subspaces of an inner product space $V$.
1) If $V = S \oplus T$ (an orthogonal direct sum) then $T = S^\perp$.
2) If $S \oplus T = S \oplus T'$ (orthogonal direct sums) then $T = T'$.
Proof. If $V = S \oplus T$ then $T \subseteq S^\perp$ by definition of orthogonal direct sum. On the other hand, if $z \in S^\perp$ then $z = s + t$, for some $s \in S$ and $t \in T$. Hence,
$$0 = \langle z, s \rangle = \langle s, s \rangle + \langle t, s \rangle = \langle s, s \rangle$$
and so $s = 0$, implying that $z = t \in T$. Thus, $S^\perp \subseteq T$. Part 2) follows from part 1).
Let us denote the closure of the span of a set $S$ of vectors by $\mathrm{cspan}(S)$.
Theorem 13.13 Let $H$ be a Hilbert space.
1) If $A$ is a subset of $H$ then
$$\mathrm{cspan}(A) = A^{\perp\perp}$$
2) If $S$ is a subspace of $H$ then
$$\mathrm{cl}(S) = S^{\perp\perp}$$
3) If $K$ is a closed subspace of $H$ then
$$K = K^{\perp\perp}$$
Proof. We leave it as an exercise to show that $[\mathrm{cspan}(A)]^\perp = A^\perp$. Hence
$$H = \mathrm{cspan}(A) \oplus [\mathrm{cspan}(A)]^\perp = \mathrm{cspan}(A) \oplus A^\perp$$
But since $A^{\perp\perp}$ is closed, we also have
$$H = A^{\perp\perp} \oplus A^\perp$$
and so by Theorem 13.12, $\mathrm{cspan}(A) = A^{\perp\perp}$. The rest follows easily from part 1).
In the exercises, we provide an example of a closed subspace $K$ of an inner product space $V$ for which $K \ne K^{\perp\perp}$. Hence, we cannot drop the requirement that $H$ be a Hilbert space in Theorem 13.13.
Corollary 13.14 If $A$ is a subset of a Hilbert space $H$ then $\mathrm{span}(A)$ is dense in $H$ if and only if $A^\perp = \{0\}$.
Proof. As in the previous proof,
$$H = \mathrm{cspan}(A) \oplus A^\perp$$
and so $A^\perp = \{0\}$ if and only if $H = \mathrm{cspan}(A)$.
Hilbert Bases
We recall the following definition from Chapter 9.
Definition A maximal orthonormal set in a Hilbert space $H$ is called a Hilbert basis for $H$.
Zorn's lemma can be used to show that any nontrivial Hilbert space has a Hilbert basis. Again, we should mention that the concepts of Hilbert basis and Hamel basis (a maximal linearly independent set) are quite different. We will show
later in this chapter that any two Hilbert bases for a Hilbert space have the same cardinality.
Since an orthonormal set $\mathcal{O}$ is maximal if and only if $\mathcal{O}^\perp = \{0\}$, Corollary 13.14 gives the following characterization of Hilbert bases.
Theorem 13.15 Let $\mathcal{O}$ be an orthonormal subset of a Hilbert space $H$. The following are equivalent:
1) $\mathcal{O}$ is a Hilbert basis
2) $\mathcal{O}^\perp = \{0\}$
3) $\mathcal{O}$ is a total subset of $H$, that is, $\mathrm{cspan}(\mathcal{O}) = H$.
Part 3) of this theorem says that a subset of a Hilbert space is a Hilbert basis if and only if it is a total orthonormal set.
Fourier Expansions
We now want to take a closer look at best approximations. Our goal is to find an explicit expression for the best approximation to any vector $x$ from within a closed subspace $S$ of a Hilbert space $H$. We will find it convenient to consider three cases, depending on whether $S$ has finite, countably infinite, or uncountable dimension.
The Finite-Dimensional Case
Suppose that $\mathcal{O} = \{u_1, \dots, u_n\}$ is an orthonormal set in a Hilbert space $H$. Recall that the Fourier expansion of any $x \in H$, with respect to $\mathcal{O}$, is given by
$$\hat{x} = \sum_{k=1}^n \langle x, u_k \rangle u_k$$
where $\langle x, u_k \rangle$ is the Fourier coefficient of $x$ with respect to $u_k$. Observe that
$$\langle x - \hat{x}, u_k \rangle = \langle x, u_k \rangle - \langle \hat{x}, u_k \rangle = 0$$
and so $x - \hat{x} \perp \mathrm{span}(\mathcal{O})$. Thus, according to Theorem 13.10, the Fourier expansion $\hat{x}$ is the best approximation to $x$ in $\mathrm{span}(\mathcal{O})$. Moreover, since $x - \hat{x} \perp \hat{x}$, we have
$$\|\hat{x}\|^2 = \|x\|^2 - \|x - \hat{x}\|^2 \le \|x\|^2$$
and so $\|\hat{x}\| \le \|x\|$ with equality if and only if $x = \hat{x}$, which happens if and only if $x \in \mathrm{span}(\mathcal{O})$. Let us summarize.
Theorem 13.16 Let $\mathcal{O} = \{u_1, \dots, u_n\}$ be a finite orthonormal set in a Hilbert space $H$. For any $x \in H$, the Fourier expansion $\hat{x}$ of $x$ is the best approximation to $x$ in $\mathrm{span}(\mathcal{O})$. We also have Bessel's inequality
$$\|\hat{x}\| \le \|x\|$$
or, equivalently,
$$\sum_{k=1}^n |\langle x, u_k \rangle|^2 \le \|x\|^2 \tag{13.3}$$
with equality if and only if $x \in \mathrm{span}(\mathcal{O})$.
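As a concrete illustration, take $H = \mathbb{R}^3$ with the standard inner product, $\mathcal{O} = \{e_1, e_2\}$ and $x = (3,4,5)$. Then $\hat{x} = 3e_1 + 4e_2 = (3,4,0)$ is the closest point to $x$ in the $e_1 e_2$-plane, and Bessel's inequality reads
$$3^2 + 4^2 = 25 \le 50 = \|x\|^2$$
with strict inequality because $x \notin \mathrm{span}(\mathcal{O})$.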
The Countably Infinite-Dimensional Case
In the countably infinite case, we will be dealing with infinite sums and so questions of convergence will arise. Thus, we begin with the following.
Theorem 13.17 Let $\mathcal{O} = \{u_1, u_2, \dots\}$ be a countably infinite orthonormal set in a Hilbert space $H$. The series
$$\sum_{k=1}^\infty r_k u_k \tag{13.4}$$
converges in $H$ if and only if the series
$$\sum_{k=1}^\infty |r_k|^2 \tag{13.5}$$
converges in $\mathbb{R}$. If these series converge then they converge unconditionally (that is, any series formed by rearranging the order of the terms also converges). Finally, if the series (13.4) converges then
$$\left\|\sum_{k=1}^\infty r_k u_k\right\|^2 = \sum_{k=1}^\infty |r_k|^2$$
Proof. Denote the partial sums of the first series by $s_n$ and the partial sums of the second series by $t_n$. Then for $m > n$
$$\|s_m - s_n\|^2 = \left\|\sum_{k=n+1}^m r_k u_k\right\|^2 = \sum_{k=n+1}^m |r_k|^2 = |t_m - t_n|$$
Hence $(s_n)$ is a Cauchy sequence in $H$ if and only if $(t_n)$ is a Cauchy sequence in $\mathbb{R}$. Since both $H$ and $\mathbb{R}$ are complete, $(s_n)$ converges if and only if $(t_n)$ converges. If the series (13.5) converges then it converges absolutely and hence unconditionally. (A real series converges unconditionally if and only if it
converges absolutely.) But if (13.5) converges unconditionally then so does (13.4). The last part of the theorem follows from the continuity of the norm.
Now let $\mathcal{O} = \{u_1, u_2, \dots\}$ be a countably infinite orthonormal set in $H$. The Fourier expansion of a vector $x \in H$ is defined to be the sum
$$\hat{x} = \sum_{k=1}^\infty \langle x, u_k \rangle u_k \tag{13.6}$$
To see that this sum converges, observe that, for any $n$, (13.3) gives
$$\sum_{k=1}^n |\langle x, u_k \rangle|^2 \le \|x\|^2$$
and so
$$\sum_{k=1}^\infty |\langle x, u_k \rangle|^2 \le \|x\|^2$$
which shows that the series on the left converges. Hence, according to Theorem 13.17, the Fourier expansion (13.6) converges unconditionally. Moreover, since the inner product is continuous,
$$\langle x - \hat{x}, u_k \rangle = \langle x, u_k \rangle - \langle \hat{x}, u_k \rangle = 0$$
and so $x - \hat{x} \in [\mathrm{span}(\mathcal{O})]^\perp = [\mathrm{cspan}(\mathcal{O})]^\perp$. Hence, $\hat{x}$ is the best approximation to $x$ in $\mathrm{cspan}(\mathcal{O})$. Finally, since $x - \hat{x} \perp \hat{x}$, we again have
$$\|\hat{x}\|^2 = \|x\|^2 - \|x - \hat{x}\|^2 \le \|x\|^2$$
and so $\|\hat{x}\| \le \|x\|$ with equality if and only if $x = \hat{x}$, which happens if and only if $x \in \mathrm{cspan}(\mathcal{O})$. Thus, the following analog of Theorem 13.16 holds.
Theorem 13.18 Let $\mathcal{O} = \{u_1, u_2, \dots\}$ be a countably infinite orthonormal set in a Hilbert space $H$. For any $x \in H$, the Fourier expansion
$$\hat{x} = \sum_{k=1}^\infty \langle x, u_k \rangle u_k$$
of $x$ converges unconditionally and is the best approximation to $x$ in $\mathrm{cspan}(\mathcal{O})$. We also have Bessel's inequality
$$\|\hat{x}\| \le \|x\|$$
or, equivalently,
$$\sum_{k=1}^\infty |\langle x, u_k \rangle|^2 \le \|x\|^2$$
with equality if and only if $x \in \mathrm{cspan}(\mathcal{O})$.
The Arbitrary Case
To discuss the case of an arbitrary orthonormal set $\mathcal{O} = \{u_k \mid k \in K\}$, let us first define and discuss the concept of the sum of an arbitrary number of terms. (This is a bit of a digression, since we could proceed without all of the coming details, but they are interesting.)
Definition Let $\mathcal{A} = \{x_k \mid k \in K\}$ be an arbitrary family of vectors in an inner product space $V$. The sum
$$\sum_{k \in K} x_k$$
is said to converge to a vector $x \in V$, and we write
$$x = \sum_{k \in K} x_k \tag{13.7}$$
if for any $\epsilon > 0$, there exists a finite set $S \subseteq K$ for which
$$T \supseteq S,\ T \text{ finite} \Rightarrow \left\|\sum_{k \in T} x_k - x\right\| < \epsilon$$
For those readers familiar with the language of convergence of nets, the set $\mathcal{F}(K)$ of all finite subsets of $K$ is a directed set under inclusion (for every $A, B \in \mathcal{F}(K)$ there is a $C \in \mathcal{F}(K)$ containing $A$ and $B$) and the function
$$S \mapsto \sum_{k \in S} x_k$$
is a net in $V$. Convergence of (13.7) is convergence of this net. In any case, we will refer to the preceding definition as the net definition of convergence. It is not hard to verify the following basic properties of net convergence for arbitrary sums.
Theorem 13.19 Let $\{x_k \mid k \in K\}$ and $\{y_k \mid k \in K\}$ be arbitrary families of vectors in an inner product space $V$. If
$$x = \sum_{k \in K} x_k \quad \text{and} \quad y = \sum_{k \in K} y_k$$
then
1) (Linearity)
$$\sum_{k \in K} (r x_k + s y_k) = rx + sy$$
for any scalars $r, s$.
2) (Continuity)
$$\sum_{k \in K} \langle x_k, y \rangle = \langle x, y \rangle \quad \text{and} \quad \sum_{k \in K} \langle y, x_k \rangle = \langle y, x \rangle$$
converges then for any , there exists a finite set 0 2 such that 1 q 0 ~ JÁ 1 finite ¬ i % i 1
2) If = is a Hilbert space then the converse of 1) also holds. Proof. For part 1), given , let : 2 , : finite, be such that ; :Á ; finite ¬ i% c %i ;
2
If 1 q : ~ J, 1 finite then i % i ~ i²% b % c %³ c ²% c %³i 1
1
:
:
i% c %i b i% c %i 1 r:
:
b ~ 2 2
As for part 2), for each , let 0 2 be a finite set for which 1 q 0 ~ JÁ 1 finite ¬ i% i 1
and let & ~ % 0
Then $(y_n)$ is a Cauchy sequence, since
$$\|y_n - y_m\| = \left\|\sum_{k \in I_n} x_k - \sum_{k \in I_m} x_k\right\| = \left\|\sum_{k \in I_n - I_m} x_k - \sum_{k \in I_m - I_n} x_k\right\| \le \left\|\sum_{k \in I_n - I_m} x_k\right\| + \left\|\sum_{k \in I_m - I_n} x_k\right\| \le \frac{1}{m} + \frac{1}{n} \to 0$$
Since $V$ is assumed complete, we have $(y_n) \to y$. Now, given $\epsilon > 0$, there exists an $N$ such that
$$n \ge N \Rightarrow \|y_n - y\| = \left\|\sum_{k \in I_n} x_k - y\right\| < \frac{\epsilon}{2}$$
Taking $n \ge \max\{N, 2/\epsilon\}$ gives, for $T \supseteq I_n$, $T$ finite,
$$\left\|\sum_{k \in T} x_k - y\right\| = \left\|\sum_{k \in I_n} x_k - y + \sum_{k \in T - I_n} x_k\right\| \le \left\|\sum_{k \in I_n} x_k - y\right\| + \left\|\sum_{k \in T - I_n} x_k\right\| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$
and so $\sum_{k \in K} x_k$ converges to $y$.
The following theorem tells us that convergence of an arbitrary sum implies that only countably many terms can be nonzero so, in some sense, there is no such thing as a nontrivial uncountable sum.
Theorem 13.21 Let $\mathcal{A} = \{x_k \mid k \in K\}$ be an arbitrary family of vectors in an inner product space $V$. If the sum
$$\sum_{k \in K} x_k$$
converges then at most a countable number of terms $x_k$ can be nonzero.
Proof. According to Theorem 13.20, for each $n > 0$, we can let $I_n \subseteq K$, $I_n$ finite, be such that
$$J \cap I_n = \emptyset,\ J \text{ finite} \Rightarrow \left\|\sum_{k \in J} x_k\right\| < \frac{1}{n}$$
Let $I = \bigcup I_n$. Then $I$ is countable and
$$k \notin I \Rightarrow \{k\} \cap I_n = \emptyset \text{ for all } n \Rightarrow \|x_k\| < \frac{1}{n} \text{ for all } n \Rightarrow x_k = 0$$
Here is the analog of Theorem 13.17.
Theorem 13.22 Let $\mathcal{O} = \{u_k \mid k \in K\}$ be an arbitrary orthonormal family of vectors in a Hilbert space $H$. The two series
$$\sum_{k \in K} r_k u_k \quad \text{and} \quad \sum_{k \in K} |r_k|^2$$
converge or diverge together. If these series converge then
$$\left\|\sum_{k \in K} r_k u_k\right\|^2 = \sum_{k \in K} |r_k|^2$$
Proof. The first series converges if and only if for any $\epsilon > 0$, there exists a finite set $I \subseteq K$ such that
$$J \cap I = \emptyset,\ J \text{ finite} \Rightarrow \left\|\sum_{k \in J} r_k u_k\right\|^2 < \epsilon$$
or, equivalently,
$$J \cap I = \emptyset,\ J \text{ finite} \Rightarrow \sum_{k \in J} |r_k|^2 < \epsilon$$
and this is precisely what it means for the second series to converge. We leave proof of the remaining statement to the reader.
The following is a useful characterization of arbitrary sums of nonnegative real terms.
Theorem 13.23 Let $\{r_k \mid k \in K\}$ be a collection of nonnegative real numbers. Then
$$\sum_{k \in K} r_k = \sup_{\substack{J \subseteq K \\ J \text{ finite}}} \sum_{k \in J} r_k \tag{13.8}$$
provided that either of the preceding expressions is finite.
Proof. Suppose that
$$\sup_{\substack{J \subseteq K \\ J \text{ finite}}} \sum_{k \in J} r_k = R < \infty$$
Then, for any $\epsilon > 0$, there exists a finite set $S \subseteq K$ such that
$$R \ge \sum_{k \in S} r_k > R - \epsilon$$
Hence, if $T \subseteq K$ is a finite set for which $T \supseteq S$ then since $r_k \ge 0$,
$$R \ge \sum_{k \in T} r_k \ge \sum_{k \in S} r_k > R - \epsilon$$
and so
$$\left|R - \sum_{k \in T} r_k\right| < \epsilon$$
which shows that $\sum_{k \in K} r_k$ converges to $R$. Finally, if the sum on the left of (13.8) converges then the supremum on the right is finite and so (13.8) holds.
The reader may have noticed that we have two definitions of convergence for countably infinite series: the net version and the traditional version involving the limit of partial sums. Let us write
$$\sum_{k \in \mathbb{Z}^+} x_k \quad \text{and} \quad \sum_{k=1}^\infty x_k$$
for the net version and the partial sum version, respectively. Here is the relationship between these two definitions.
Theorem 13.24 Let $H$ be a Hilbert space. If $x_k \in H$ then the following are equivalent:
1) $\sum_{k \in \mathbb{Z}^+} x_k$ converges (net version) to $x$
2) $\sum_{k=1}^\infty x_k$ converges unconditionally to $x$
Proof. Assume that 1) holds. Suppose that $\pi$ is any permutation of $\mathbb{Z}^+$. Given any $\epsilon > 0$, there is a finite set $S \subseteq \mathbb{Z}^+$ for which
$$T \supseteq S,\ T \text{ finite} \Rightarrow \left\|\sum_{k \in T} x_k - x\right\| < \epsilon$$
Let us denote the set of integers $\{1, \dots, n\}$ by $I_n$ and choose a positive integer $m$ such that $\pi(I_m) \supseteq S$. Then for $n \ge m$ we have
$$\pi(I_n) \supseteq \pi(I_m) \supseteq S \Rightarrow \left\|\sum_{k=1}^n x_{\pi(k)} - x\right\| = \left\|\sum_{k \in \pi(I_n)} x_k - x\right\| < \epsilon$$
and so 2) holds.
Next, assume that 2) holds, but that the series in 1) does not converge. Then there exists an $\epsilon > 0$ such that, for any finite subset $I \subseteq \mathbb{Z}^+$, there exists a finite subset $J$ with $J \cap I = \emptyset$ for which
$$\left\|\sum_{k \in J} x_k\right\| \ge \epsilon$$
From this, we deduce the existence of a countably infinite sequence $J_1, J_2, \dots$ of mutually disjoint finite subsets of $\mathbb{Z}^+$ with the property that
$$\max(J_n) = M_n < m_{n+1} = \min(J_{n+1})$$
and
$$\left\|\sum_{k \in J_n} x_k\right\| \ge \epsilon$$
Now, we choose any permutation $\pi: \mathbb{Z}^+ \to \mathbb{Z}^+$ with the following properties:
1) $\pi([m_n, M_n]) \subseteq [m_n, M_n]$
2) if $J_n = \{j_{n,1}, \dots, j_{n,u_n}\}$ then
$$\pi(j_{n,1}) = j_{n,1}, \quad \pi(j_{n,1} + 1) = j_{n,2}, \quad \dots, \quad \pi(j_{n,1} + u_n - 1) = j_{n,u_n}$$
The intention in property 2) is that, for each $n$, $\pi$ takes a set of consecutive integers to the integers in $J_n$. For any such permutation $\pi$, we have
$$\left\|\sum_{k = j_{n,1}}^{j_{n,1} + u_n - 1} x_{\pi(k)}\right\| = \left\|\sum_{k \in J_n} x_k\right\| \ge \epsilon$$
which shows that the sequence of partial sums of the series
$$\sum_{k=1}^\infty x_{\pi(k)}$$
is not Cauchy and so this series does not converge. This contradicts 2) and shows that 2) implies at least that 1) converges. But if 1) converges to $y \in H$ then since 1) implies 2) and since unconditional limits are unique, we have $y = x$. Hence, 2) implies 1).
Now we can return to the discussion of Fourier expansions. Let $\mathcal{O} = \{u_k \mid k \in K\}$ be an arbitrary orthonormal set in a Hilbert space $H$. Given any $x \in H$, we may apply Theorem 13.16 to all finite subsets of $\mathcal{O}$, to deduce
that
$$\sup_{\substack{J \subseteq K \\ J \text{ finite}}} \sum_{k \in J} |\langle x, u_k \rangle|^2 \le \|x\|^2$$
and so Theorem 13.23 tells us that the sum
$$\sum_{k \in K} |\langle x, u_k \rangle|^2$$
converges. Hence, according to Theorem 13.22, the Fourier expansion
$$\hat{x} = \sum_{k \in K} \langle x, u_k \rangle u_k$$
of $x$ also converges and
$$\|\hat{x}\|^2 = \sum_{k \in K} |\langle x, u_k \rangle|^2$$
Note that, according to Theorem 13.21, $\hat{x}$ is a countably infinite sum of terms of the form $\langle x, u_k \rangle u_k$ and so is in $\mathrm{cspan}(\mathcal{O})$. The continuity of infinite sums with respect to the inner product (Theorem 13.19) implies that
$$\langle x - \hat{x}, u_k \rangle = \langle x, u_k \rangle - \langle \hat{x}, u_k \rangle = 0$$
and so $x - \hat{x} \in [\mathrm{span}(\mathcal{O})]^\perp = [\mathrm{cspan}(\mathcal{O})]^\perp$. Hence, Theorem 13.10 tells us that $\hat{x}$ is the best approximation to $x$ in $\mathrm{cspan}(\mathcal{O})$. Finally, since $x - \hat{x} \perp \hat{x}$, we again have
$$\|\hat{x}\|^2 = \|x\|^2 - \|x - \hat{x}\|^2 \le \|x\|^2$$
and so $\|\hat{x}\| \le \|x\|$ with equality if and only if $x = \hat{x}$, which happens if and only if $x \in \mathrm{cspan}(\mathcal{O})$. Thus, we arrive at the most general form of a key theorem about Hilbert spaces.
Theorem 13.25 Let $\mathcal{O} = \{u_k \mid k \in K\}$ be an orthonormal family of vectors in a Hilbert space $H$. For any $x \in H$, the Fourier expansion
$$\hat{x} = \sum_{k \in K} \langle x, u_k \rangle u_k$$
of $x$ converges in $H$ and is the unique best approximation to $x$ in $\mathrm{cspan}(\mathcal{O})$. Moreover, we have Bessel's inequality
$$\|\hat{x}\| \le \|x\|$$
or, equivalently,
$$\sum_{k \in K} |\langle x, u_k \rangle|^2 \le \|x\|^2$$
with equality if and only if $x \in \mathrm{cspan}(\mathcal{O})$.
A Characterization of Hilbert Bases
Recall from Theorem 13.15 that an orthonormal set $\mathcal{O} = \{u_k \mid k \in K\}$ in a Hilbert space $H$ is a Hilbert basis if and only if
$$\mathrm{cspan}(\mathcal{O}) = H$$
Theorem 13.25 then leads to the following characterization of Hilbert bases.
Theorem 13.26 Let $\mathcal{O} = \{u_k \mid k \in K\}$ be an orthonormal family in a Hilbert space $H$. The following are equivalent:
1) $\mathcal{O}$ is a Hilbert basis (a maximal orthonormal set)
2) $\mathcal{O}^\perp = \{0\}$
3) $\mathcal{O}$ is total (that is, $\mathrm{cspan}(\mathcal{O}) = H$)
4) $x = \hat{x}$ for all $x \in H$
5) Equality holds in Bessel's inequality for all $x \in H$, that is, $\|x\| = \|\hat{x}\|$ for all $x \in H$.
6) Parseval's identity $\langle x, y \rangle = \langle \hat{x}, \hat{y} \rangle$ holds for all $x, y \in H$, that is,
$$\langle x, y \rangle = \sum_{k \in K} \langle x, u_k \rangle \overline{\langle y, u_k \rangle}$$
Proof. Parts 1), 2) and 3) are equivalent by Theorem 13.15. Part 4) implies part 3), since $\hat{x} \in \mathrm{cspan}(\mathcal{O})$, and 3) implies 4) since the unique best approximation of any $x \in \mathrm{cspan}(\mathcal{O})$ is $x$ itself and so $x = \hat{x}$. Parts 3) and 5) are equivalent by Theorem 13.25. Parseval's identity follows from part 4) using Theorem 13.19. Finally, Parseval's identity for $y = x$ implies that equality holds in Bessel's inequality.
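For example, the standard vectors $e_n$ form a Hilbert basis for $\ell^2$, and for $x = (1, \tfrac12, \tfrac14, \dots)$ we have $\langle x, e_n \rangle = 2^{-(n-1)}$ and
$$\sum_{n=1}^\infty |\langle x, e_n \rangle|^2 = \sum_{n=1}^\infty 4^{-(n-1)} = \frac{4}{3} = \|x\|^2$$
so equality holds in Bessel's inequality, as part 5) requires.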
Hilbert Dimension
We now wish to show that all Hilbert bases for a Hilbert space $H$ have the same cardinality, and so we can define the Hilbert dimension of $H$ to be that cardinality.
Theorem 13.27 All Hilbert bases for a Hilbert space $H$ have the same cardinality. This cardinality is called the Hilbert dimension of $H$, which we denote by $\mathrm{hdim}(H)$.
Proof. If $H$ has a finite Hilbert basis then that set is also a Hamel basis and so all finite Hilbert bases have size $\dim(H)$. Suppose next that $\mathcal{B} = \{b_k \mid k \in K\}$ and $\mathcal{C} = \{c_j \mid j \in J\}$ are infinite Hilbert bases for $H$. Then for each $k \in K$, we have
$$b_k = \sum_{j \in J_k} \langle b_k, c_j \rangle c_j$$
where $J_k$ is the countable set $\{j \in J \mid \langle b_k, c_j \rangle \ne 0\}$. Moreover, since no $c_j$ can be orthogonal to every $b_k$, we have
$$\bigcup_{k \in K} J_k = J$$
Thus, since each $J_k$ is countable, we have
$$|J| = \left|\bigcup_{k \in K} J_k\right| \le \aleph_0 |K| = |K|$$
By symmetry, we also have $|K| \le |J|$ and so the Schröder-Bernstein theorem implies that $|J| = |K|$.
Theorem 13.28 Two Hilbert spaces are isometrically isomorphic if and only if they have the same Hilbert dimension.
Proof. Suppose that $\mathrm{hdim}(H_1) = \mathrm{hdim}(H_2)$. Let $\mathcal{O}_1 = \{u_k \mid k \in K\}$ be a Hilbert basis for $H_1$ and $\mathcal{O}_2 = \{v_k \mid k \in K\}$ a Hilbert basis for $H_2$. We may define a map $\tau: H_1 \to H_2$ as follows:
$$\tau\left(\sum_{k \in K} r_k u_k\right) = \sum_{k \in K} r_k v_k$$
We leave it as an exercise to verify that $\tau$ is a bijective isometry. The converse is also left as an exercise.
A Characterization of Hilbert Spaces
We have seen that any vector space $V$ is isomorphic to a vector space $(F^B)_0$ of all functions from $B$ to $F$ that have finite support. There is a corresponding result for Hilbert spaces. Let $K$ be any nonempty set and let
$$\ell^2(K) = \left\{f: K \to \mathbb{C} \;\middle|\; \sum_{k \in K} |f(k)|^2 < \infty\right\}$$
The functions in $\ell^2(K)$ are referred to as square summable functions. (We can also define a real version of this set by replacing $\mathbb{C}$ by $\mathbb{R}$.) We define an inner product on $\ell^2(K)$ by
$$\langle f, g \rangle = \sum_{k \in K} f(k)\overline{g(k)}$$
The proof that $\ell^2(K)$ is a Hilbert space is quite similar to the proof that
$\ell^2 = \ell^2(\mathbb{N})$ is a Hilbert space and the details are left to the reader. If we define $\delta_k \in \ell^2(K)$ by
$$\delta_k(j) = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{if } j \ne k \end{cases}$$
then the collection $\mathcal{O} = \{\delta_k \mid k \in K\}$ is a Hilbert basis for $\ell^2(K)$, of cardinality $|K|$. To see this, observe that
$$\langle \delta_k, \delta_j \rangle = \sum_{i \in K} \delta_k(i)\delta_j(i) = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{if } j \ne k \end{cases}$$
and so $\mathcal{O}$ is orthonormal. Moreover, if $f \in \ell^2(K)$ then $f(k) \ne 0$ for only a countable number of $k \in K$, say $\{k_1, k_2, \dots\}$. If we define $f'$ by
$$f' = \sum_{n=1}^\infty f(k_n)\delta_{k_n}$$
then $f' \in \mathrm{cspan}(\mathcal{O})$ and $f'(k) = f(k)$ for all $k \in K$, which implies that $f = f'$. This shows that $\ell^2(K) = \mathrm{cspan}(\mathcal{O})$ and so $\mathcal{O}$ is a total orthonormal set, that is, a Hilbert basis for $\ell^2(K)$.
Now let $H$ be a Hilbert space, with Hilbert basis $\mathcal{B} = \{u_k \mid k \in K\}$. We define a map $\phi: H \to \ell^2(K)$ as follows. Since $\mathcal{B}$ is a Hilbert basis, any $x \in H$ has the form
$$x = \sum_{k \in K} \langle x, u_k \rangle u_k$$
Since the series on the right converges, Theorem 13.22 implies that the series
$$\sum_{k \in K} |\langle x, u_k \rangle|^2$$
converges. Hence, another application of Theorem 13.22 implies that the following series converges:
$$\phi(x) = \sum_{k \in K} \langle x, u_k \rangle \delta_k$$
It follows from Theorem 13.19 that $\phi$ is linear and it is not hard to see that it is also bijective. Notice that $\phi(u_k) = \delta_k$ and so $\phi$ takes the Hilbert basis $\mathcal{B}$ for $H$ to the Hilbert basis $\mathcal{O}$ for $\ell^2(K)$.
Notice also that
$$\|\phi(x)\|^2 = \langle \phi(x), \phi(x) \rangle = \sum_{k \in K} |\langle x, u_k \rangle|^2 = \left\|\sum_{k \in K} \langle x, u_k \rangle u_k\right\|^2 = \|x\|^2$$
and so $\phi$ is an isometric isomorphism. We have proved the following theorem.
Theorem 13.29 If $H$ is a Hilbert space of Hilbert dimension $\kappa$ and if $K$ is any set of cardinality $\kappa$ then $H$ is isometrically isomorphic to $\ell^2(K)$.
The Riesz Representation Theorem
We conclude our discussion of Hilbert spaces by discussing the Riesz representation theorem. As it happens, not all linear functionals on a Hilbert space have the form "take the inner product with…", as in the finite-dimensional case. To see this, observe that if $y \in H$ then the function
$$f_y(x) = \langle x, y \rangle$$
is certainly a linear functional on $H$. However, it has a special property. In particular, the Cauchy-Schwarz inequality gives, for all $x \in H$,
$$|f_y(x)| = |\langle x, y \rangle| \le \|x\| \|y\|$$
or, for all $x \ne 0$,
$$\frac{|f_y(x)|}{\|x\|} \le \|y\|$$
Noticing that equality holds if $x = y$, we have
$$\sup_{x \ne 0} \frac{|f_y(x)|}{\|x\|} = \|y\|$$
This prompts us to make the following definition, which we do for linear transformations between Hilbert spaces (this covers the case of linear functionals).
Definition Let $\tau: H_1 \to H_2$ be a linear transformation from $H_1$ to $H_2$. Then $\tau$ is said to be bounded if
$$\sup_{x \ne 0} \frac{\|\tau(x)\|}{\|x\|} < \infty$$
If the supremum on the left is finite, we denote it by $\|\tau\|$ and call it the norm of $\tau$.
332
Advanced Linear Algebra
Of course, if ¢ / ¦ - is a bounded linear functional on / then ( ²%³( %£ )%)
) ) ~ sup
The set of all bounded linear functionals on a Hilbert space / is called the continuous dual space, or conjugate space, of / and denoted by / i . Note that this differs from the algebraic dual of / , which is the set of all linear functionals on / . In the finite-dimensional case, however, since all linear functionals are bounded (exercise), the two concepts agree. (Unfortunately, there is no universal agreement on the notation for the algebraic dual versus the continuous dual. Since we will discuss only the continuous dual in this section, no confusion should arise.³ The following theorem gives some simple reformulations of the definition of norm. Theorem 13.30 Let ¢ / ¦ / be a bounded linear transformation. 1) ) ) ~ sup ) ²%³) )%)~
2) ) ) ~ sup ) ²%³) )%)
3) ) ) ~ inf¸ s ) ²%³) )%) for all % /¹ The following theorem explains transformations.
the
importance
of bounded linear
Theorem 13.31 Let ¢ / ¦ / be a linear transformation. The following are equivalent: 1) is bounded 2) is continuous at any point % / 3) is continuous. Proof. Suppose that is bounded. Then ) ²%³ c ²% ³) ~ ) ²% c % ³) ) ))% c % ) ¦ as % ¦ % . Hence, is continuous at % . Thus, 1) implies 2). If 2) holds then for any & / , we have ) ²%³ c ²&³) ~ ) ²% c & b % ³ c ²% ³) ¦ as % ¦ &, since is continuous at % and % c & b % ¦ % as & ¦ %. Hence, is continuous at any & / and 3) holds. Finally, suppose that 3) holds. Thus, is continuous at and so there exists a such that )%) ¬ ) ²%³)
Hilbert Spaces
333
In particular, )%) ~ ¬
) ²%³) ) %)
and so )%) ~ ¬ ) % ) ~ ¬
) ² %³) ) ²%³) ¬ ) %) )%)
Thus, is bounded.
Now we can state and prove the Riesz representation theorem. Theorem 13.32 (The Riesz representation theorem) Let / be a Hilbert space. For any bounded linear functional on / , there is a unique ' / such that ²%³ ~ º%Á ' » for all % / . Moreover, )' ) ~ ) ). Proof. If ~ , we may take ' ~ , so let us assume that £ . Hence, 2 ~ ker² ³ £ / and since is continuous, 2 is closed. Thus / ~ 2 p 2 Now, the first isomorphism theorem, applied to the linear functional ¢ / ¦ - , implies that /°2 - (as vector spaces). In addition, Theorem 3.5 implies that /°2 2 and so 2 - . In particular, dim²2 ³ ~ . For any ' 2 , we have % 2 ¬ ²%³ ~ ~ º%Á '» Since dim²2 ³ ~ , all we need do is find a £ ' 2 for which ²'³ ~ º'Á '» for then ²'³ ~ ²'³ ~ º'Á '» ~ º'Á '» for all - , showing that ²%³ ~ º%Á '» for % 2 as well. But if £ ' 2 then ' ~
²'³ ' º'Á '»
has this property, as can be easily checked. The fact that )' ) ~ ) ) has already been established.
334
Advanced Linear Algebra
Exercises 1.
2. 3.
4. 5. 6.
7.
Prove that the sup metric on the metric space *´Á µ of continuous functions on ´Á µ does not come from an inner product. Hint: let ²!³ ~ and ²!³ ~ ²! c a³°² c a³ and consider the parallelogram law. Prove that any Cauchy sequence that has a convergent subsequence must itself converge. Let = be an inner product space and let ( and ) be subsets of = . Show that a) ( ) ¬ ) ( b) ( is a closed subspace of = c) ´cspan²(³µ ~ ( Let = be an inner product space and : = . Under what conditions is : ~ : ? Prove that a subspace : of a Hilbert space / is closed if and only if : ~ : . Let = be the subspace of M consisting of all sequences of real numbers, with the property that each sequence has only a finite number of nonzero terms. Thus, = is an inner product space. Let 2 be the subspace of = consisting of all sequences % ~ ²% ³ in = with the property that '% ° ~ . Show that 2 is closed, but that 2 £ 2 . Hint: For the latter, show that 2 ~ ¸¹ by considering the sequences " ~ ²Á à Á cÁ à ³, where the term c is in the nth coordinate position. Let E ~ ¸" Á " Á à ¹ be an orthonormal set in / . If % ~ ' " converges, show that B
)%) ~ ( ( ~
8.
Prove that if an infinite series B
% ~
9.
converges absolutely in a Hilbert space / then it also converges in the sense of the “net” definition given in this section. Let ¸ 2¹ be a collection of nonnegative real numbers. If the sum on the left below converges, show that ~ sup 2
1 finite 1 2 1
10. Find a countably infinite sum of real numbers that converges in the sense of partial sums, but not in the sense of nets. 11. Prove that if a Hilbert space / has infinite Hilbert dimension then no Hilbert basis for / is a Hamel basis. 12. Prove that M ²2³ is a Hilbert space for any nonempty set 2 .
Hilbert Spaces
335
13. Prove that any linear transformation between finite-dimensional Hilbert spaces is bounded. 14. Prove that if / i then ker² ³ is a closed subspace of / . 15. Prove that a Hilbert space is separable if and only if hdim²/³ L . 16. Can a Hilbert space have countably infinite Hamel dimension? 17. What is the Hamel dimension of M ²o³? 18. Let and be bounded linear operators on / . Verify the following: a) ) ) ~ (() ) b) ) b ) ) ) b )) c) )) ) ))) 19. Use the Riesz representation theorem to show that / i / for any Hilbert space / .
Chapter 14
Tensor Products
In the preceding chapters, we have seen several ways to construct new vector spaces from old ones. Two of the most important such constructions are the direct sum < l = and the vector space B²< Á = ³ of all linear transformations from < to = . In this chapter, we consider another very important construction, known as the tensor product.
Universality We begin by describing a general type of universality that will help motivate the definition of tensor product. Our description is strongly related to the formal notion of a universal arrow (or universal element) in category theory, but we will be somewhat less formal to avoid the need to formally define categorical concepts. Accordingly, the terminology that we shall introduce is not standard (but does not contradict any standard terminology). Referring to Figure 14.1, consider a set ( and two functions and , with domain (. f A S
W
g X Figure 14.1
Suppose that this diagram commutes, that is, that there exists a unique function ¢ : ¦ ? for which ~ k What does this say about the relationship between the functions and ?
338
Advanced Linear Algebra
Let us think of the “information” contained in a function ¢ ( ¦ ) to be the way in which distinguishes elements of ( using labels from ) . The relationship above implies that ²³ £ ²³ ¬ ²³ £ ²³ This can be phrased by saying that whatever ability has to distinguish elements of ( is also possessed by . Put another way, except for labeling differences, any information contained in is also contained in . This is sometimes expressed by saying that can be factored through . If happens to be injective, then the only difference between and is the values of the labels. That is, the two functions have equal ability to distinguish elements of (. However, in general, is not required to be injective and so may contain more information than . Suppose now that for all sets ? in some family I of sets that includes : and for all functions ¢ ( ¦ ? in some family < of functions that includes , the diagram in Figure 14.1 commutes. This says that the information contained in every function in < is also contained in . In other words, captures and preserves the information in every member of < . In this sense, ¢ ( ¦ : is universal among all functions ¢ ( ¦ ? in < . Moreover, since is a member of the family < , we can also say that contains the information in < but no more. In other words, the information in the universal function is precisely the same as the information in the entire family < . In this way, a single function ¢ ( ¦ : (more precisely, a single pair ²:Á ³) can capture a concept, as described by a family of functions, such as the concepts of basis, quotient space, direct sum and bilinearity (as we will see)! Let us make a formal definition. Definition Referring to Figure 14.2, let ( be a set and let I be a family of sets. Let < be a family of functions from ( to members of I . Let > be a family of functions on members of I . We assume that > has the following structure: 1) > contains the identity function for each member of I 2) > is closed under composition of functions 3) Composition of functions in > is associative. We also assume that for any > and < , the composition k is defined and is a member of < .
Tensor Products
339
S1 A
f1 f2
S2
f3
W1
W2
W3 S3
Figure 14.2 Let us refer to > as the measuring set and its members as measuring functions. A pair ²:Á ¢ ( ¦ :³, where : I and < has the universal property for < as measured by >, or is a universal pair for ²< Á >³, if for any ? I and any ¢ ( ¦ ? in < , there is a unique ¢ : ¦ ? in > for which the diagram in Figure 14.1 commutes, that is, ~ k Another way to express this is to say that any < can be factored through , or that any < can be lifted to a function > on : .
Universal pairs are essentially unique, as the following describes. Theorem 14.1 Let ²:Á ¢ ( ¦ :³ and ²; Á ¢ ( ¦ ; ³ be universal pairs for ²< Á >³. Then there is a bijective measuring function > for which ²:³ ~ ; . Proof. With reference to Figure 14.3, there are unique measuring functions ¢ : ¦ ; and ¢ ; ¦ : for which ~ k ~k Hence, ~ ² k ³ k ~ ² k ³ k However, referring to the third diagram in Figure 14.3, both k ¢ : ¦ : and the identity map ¢ : ¦ : are members of > that make the diagram commute. Hence, the uniqueness requirement implies that k ~ . Similarly k ~ and so and are inverses of one another, making ~ the desired bijection.
340
Advanced Linear Algebra f
A
S W
g
g
A
T V
f S
T
f
A
S VW L
f S
Figure 14.3 Now let us look at some examples of the universal property. Let Vect²- ³ denote the family of all vector spaces over the base field - . (We use the term family informally to represent what in set theory is formally referred to as a class. A class is a “collection” that is too large to be considered a set. For example, Vect²- ³ is a class.) Example 14.1 (Bases: the universal property for set functions from a set ( into a vector space, as measured by linearity) Let ( be a nonempty set. Let I ~ Vect²- ³ and let < be the family of set functions with domain (. The measuring set > is the family of linear transformations. If =( is the vector space with basis (, that is, the set of all formal linear combinations of members of ( with coefficients in - , then the pair ²=( Á ¢ ( ¦ =( ³ where is the injection map ²³ ~ , is universal for ²< Á >³. To see this, note that the condition that < can be factored through ~ k is equivalent to the statement that ²³ ~ ²³ for each basis vector (. But this uniquely defines a linear transformation . Note also that Theorem 14.1 implies that if ²> Á ¢ ( ¦ > ³ is also universal for ²< Á >³, then there is a bijective measuring function from =( to > , that is, > and =( are isomorphic.
Example 14.2 (Quotient spaces and canonical projections: the universal property for linear transformations from = whose kernel contains a particular subspace 2 of = , as measured by linearity) Let = be a vector space with subspace 2 . Let I ~ Vect²- ³. Let < be the family of linear transformations with domain = whose kernel contains 2 . The measuring set > is the family of linear transformations. Then Theorem 3.4 says precisely that the pair ²= °2Á ¢ = ¦ = °2³, where is the canonical projection map, has the universal property for < as measured by >.
Example 14.3 (Direct sums: the universal property for pairs ² Á ³ of linear maps with the same range, where has domain < and has domain = , as measured by linearity) Let < and = be vector spaces and consider the ordered
Tensor Products
341
pair ²< Á = ³. Let I ~ Vect²- ³. A member of < is an ordered pair ² ¢ < ¦ > Á ¢ = ¦ > ³ of linear transformations, written ² Á ³¢ ²< Á = ³ ¦ > for which ² Á ³²"Á #³ ~ ²"³ b ²#³ The measure set > is the set of all linear transformations. For > and ² Á ³ < , we set k ² Á ³ ~ ² k Á k ³ We claim that the pair ²< ^ = Á ² Á ³¢ ²< Á = ³ ¦ < ^ = ³, where ²"³ ~ ²"Á ³ ²#³ ~ ²Á #³ are the canonical injections, has the universal property for ²< Á >³. To see this, observe that for any function ² Á ³¢ ²< Á = ³ ¦ > , that is, for any pair of linear transformations ¢ < ¦ > and ¢ = ¦ > , the condition ² Á ³ ~ k ² Á ³ is equivalent to ² Á ³ ~ ² k Á k ³ or ²"³ b ²#³ ~ ²"Á ³ b ²Á #³ ~ ²"Á #³ But the condition ²"Á #³ ~ ²"³ b ²#³ does indeed define a unique linear transformation ¢ < ^ = ¦ > . (For those familiar with category theory, we have essentially defined the coproduct of vector spaces, which is equivalent to the product, or direct sum.)
It should be clear from these examples that the notion of universal property is, well, universal. In fact, it happens that the most useful definition of tensor product is through its universal property.
Bilinear Maps The universality that defines tensor products rests on the notion of a bilinear map. Definition Let < , = and > be vector spaces over - . Let < d = be the cartesian product of < and = as sets. A set function
342
Advanced Linear Algebra
¢ < d = ¦ > is bilinear if it is linear in both variables separately, that is, if ²" b "Z Á #³ ~ ²"Á #³ b ²"Z Á #³ and ²"Á # b #Z ³ ~ ²"Á #³ b ²"Á #Z ³ The set of all bilinear functions from < d = to > is denoted by hom- ²< Á = Â > ³. A bilinear function ¢ < d = ¦ - , with values in the base field - , is called a bilinear form on < d = .
It is important to emphasize that, in the definition of bilinear function, < d = is the cartesian product of sets, not the direct product of vector spaces. In other words, we do not consider any algebraic structure on < d = when defining bilinear functions, so equations like ²%Á &³ b ²'Á $³ ~ ²% b &Á ' b $³ and ²%Á &³ ~ ²%Á &³ are false. In fact, if = is a vector space, we have two classes of functions from = d = to > , the linear maps B²= d = Á > ³ where = d = is the direct product of vector spaces, and the bilinear maps hom²= Á =  > ³, where = d = is just the cartesian product of sets. We leave it as an exercise to show that these two classes have only the zero map in common. In other words, the only map that is both linear and bilinear is the zero map. We made a thorough study of bilinear forms on a finite-dimensional vector space = in Chapter 11 (although this material is not assumed here). However, bilinearity is far more important and far reaching than its application to metric vector spaces, as the following examples show. Indeed, both multiplication and evaluation are bilinear. Example 14.4 (Multiplication is bilinear) If ( is an algebra, the product map ¢ ( d ( ¦ ( defined by ²Á ³ ~ is bilinear. Put another way, multiplication is linear in each position.
Example 14.5 (Evaluation is bilinear) If = and > are vector spaces, then the evaluation map ¢ B²= Á > ³ d = ¦ > defined by
Tensor Products
343
² Á #³ ~ ²#³ is bilinear. In particular, the evaluation map ¢ = i d = ¦ - defined by ² Á #³ ~ ²#³ is a bilinear form on = i d = .
Example 14.6 If = and > are vector spaces, and = i and > i then the product map ¢ = d > ¦ - defined by ²#Á $³ ~ ²#³²$³ is bilinear. Dually, if # = and $ > then the map ¢ = i d > i ¦ - defined by ² Á ³ ~ ²#³²$³ is bilinear.
It is precisely the tensor product that will allow us to generalize the previous example. In particular, if B²< Á > ³ and B²= Á > ³ then we would like to consider a “product” map ¢ < d = ¦ > defined by ²"Á #³ ~ ²"³ ? ²#³ The tensor product n is just the thing to replace the question mark, because it has the desired bilinearity property, as we will see. In fact, the tensor product is bilinear and nothing else, so it is exactly what we need!
Tensor Products Let < and = be vector spaces. Our guide for the definition of the tensor product < n = will be the desire to have a universal property for bilinear functions, as measured by linearity. Put another way, we want < n = to embody the notion of bilinearity but nothing more, that is, we want it to be universal for bilinearity. Referring to Figure 14.4, we seek to define a vector space ; and a bilinear map !¢ < d = ¦ ; so that any bilinear map with domain < d = can be factored through !. Intuitively speaking, ! is the most “general” or “universal” bilinear map with domain < d = .
U
t
V f
bilinear
T W
bilinear
W Figure 14.4
linear
344
Advanced Linear Algebra
Definition Let < d = be the cartesian product of two vector spaces over - . Let I ~ Vect²- ³. Let < ~ ¸hom- ²< Á = Â > ¹ > I ¹ be the family of all bilinear maps from < d = to a vector space > . The measuring set > is the family of all linear transformations. A pair ²; Á !¢ < d = ¦ ; ³ is universal for bilinearity if it is universal for ²< Á >³, that is, if for every bilinear map ¢ < d = ¦ > , there is a unique linear transformation ¢ < n = ¦ > for which ~ k!
Let us now turn to the question of the existence of a universal pair for bilinearity.
Existence I: Intuitive but Not Coordinate Free The universal property for bilinearity captures the essence of bilinearity and nothing more (as is the intent for all universal properties). To understand better how this can be done, let 8 ~ ¸ 0¹ be a basis for < and let 9 ~ ¸ 1 ¹ be a basis for = . Then a bilinear map ! on < d = is uniquely determined by assigning arbitrary values to the “basis” pairs ² Á ³. How can we do this and nothing more? The answer is that we should define ! on the pairs ² Á ³ in such a way that the images !² Á ³ do not interact and then extend by bilinearity. In particular, for each ordered pair ² Á ³, we invent a new formal symbol, say n and define ; to be the vector space with basis : ~ ¸ n 8 Á 9¹ Then define the map ! by setting !² Á ³ ~ n and extending by bilinearity. This uniquely defines a bilinear map ! that is as “universal” as possible among bilinear maps. Indeed, if ¢ < d = ¦ > is bilinear, the condition ~ k ! is equivalent to ² n ³ ~ ² Á ³ which uniquely defines a linear map ¢ ; ¦ > . Hence, ²; Á !³ has the universal property for bilinearity.
Tensor Products
345
A typical element of ; is a finite linear combination
Á ² n ³ Á~
and if " ~ and # ~ then " n # ~ !²"Á #³ ~ !4 Á 5 ~ ² n ³ Note that, as is customary, we have used the notation " n # for the image of any pair ²"Á #³ under !. Strictly speaking, this is an abuse of the notation n as we have defined it. While it may seem innocent, it does contribute to the reputation that tensor products have for being a bit difficult to fathom. Confusion may arise because while the elements " n # form a basis for ; (by definition), the larger set of elements of the form " n # span ; , but are definitely not linearly independent. This raises various questions, such as when a sum of the form " n # is equal to , or when we can define a map on ; by specifying the values ²" n #³ arbitrarily. The first question seems more obviously challenging when we phrase it by asking when a sum of the form !²" Á # ³ is , since there is no algebraic structure on the cartesian product < d = , and so there is nothing “obvious” that we can do with this sum. The second question is not difficult to answer when we keep in mind that the set ¸" n #¹ is not linearly independent. The notation n is used in yet another way: ; is generally denoted by < n = and called the tensor product of < and = . The elements of < n = are called tensors and a tensor of the form " n # is said to be decomposable. For example, in s n s , the tensor ²Á ³ n ²Á ³ is decomposable but the tensor ²Á ³ n ²Á ³ b ²Á ³ n ²Á ³ is not. It is also worth emphasizing that the tensor product n is not a product in the sense of a binary operation on a set, as is the case in rings and fields, for example. In fact, even when = ~ < , the tensor product " n " is not in < , but rather in < n < . It is wise to remember that the decomposable tensor " n # is nothing more than the image of the ordered pair ²"Á #³ under the bilinear map !, as are the basis elements n .
Existence II: Coordinate Free The previous definition of tensor product is about as intuitive as possible, but has the disadvantage of not being coordinate free. The following customary approach to defining the tensor product does not require the choice of a basis. Let -< d= be the vector space over - with basis < d = . Let : be the subspace of -< d= generated by all vectors of the form
346
Advanced Linear Algebra
²"Á $³ b ²#Á $³ c ²" b #Á $³
(14.1)
²"Á #³ b ²"Á $³ c ²"Á # b $³
(14.2)
and
where Á - and "Á # and $ are in the appropriate spaces. Note that these vectors are if we replace the ordered pairs by tensors according to our previous definition. The quotient space < n= ~
-< d= :
is also called the tensor product of < and = . The elements of < n = have the form 4 ²" Á # ³5 b : ~ ²" Á # ³ b : ! However, since ²"Á #³ c ²"Á #³ : and ²"Á #³ c ²"Á #³ : , we can absorb the scalar in either coordinate, that is, ´²"Á #³ b :µ ~ ²"Á #³ b : ~ ²"Á #³ b : and so the elements of < n = can be written simply as ´²" Á # ³ b :µ It is customary to denote the coset ²"Á #³ b : by " n # and so any element of < n = has the form " n # as before. Finally, the map !¢ < d = ¦ < n = is defined by !²"Á #³ ~ " n # The proof that the pair ²< n = Á !¢ < d = ¦ < n = ³ is universal for bilinearity is a bit more tedious when < n = is defined as a quotient space. Theorem 14.2 Let < and = be vector spaces. The pair ²< n = Á !¢ < d = ¦ < n = ³ has the universal property for bilinearity, as measured by linearity. Proof. Consider the diagram in Figure 14.5. Here -< d= is the vector space with basis < d = .
Tensor Products
347
t U
j
V f
FU
V
V
S
U
V
W
W Figure 14.5 Since !²"Á #³ ~ " n # ~ k ²"Á #³, we have !~k The universal property of vector spaces described in Example 14.1 implies that there is a unique linear transformation ¢ -< d= ¦ > for which k~ Note that sends any of the vectors (14.1) and (14.2) that generate : to the zero vector and so : ker²³. For example, ´²"Á $³ b ²#Á $³ c ²" b #Á $³µ ~ ´²"Á $³ b ²#Á $³ c ²" b #Á $³µ ~ ²"Á $³ b ²#Á $³ c ²" b #Á $³ ~ ²"Á $³ b ²#Á $³ c ²" b #Á $³ ~ and similarly for the second coordinate. Hence, we may apply Theorem 3.4 (the universal property described in Example 14.2), to deduce the existence of a unique linear transformation ¢ < n = ¦ > for which k ~ Hence, k!~ kk~k~ As to uniqueness, if Z k ! ~ then Z ~ Z k satisfies Z k ~ Z k k ~ Z k ! ~ The uniqueness of then implies that Z ~ , which in turn implies that Z k ~ Z ~ ~ k and the uniqueness of implies that Z ~ .
We now have two definitions of tensor product that are equivalent, since under the second definition the tensors
348
Advanced Linear Algebra
² Á ³ b : ~ n form a basis for -< d= °: , as we will prove a bit later. Accordingly, we no longer need to make a distinction between the two definitions.
Bilinearity on < d = Equals Linearity on < n = The universal property for bilinearity says that to each bilinear function ¢ < d = ¦ > , there corresponds a unique linear function ¢ < n = ¦ > . This establishes a correspondence ¢ hom²< Á = Â > ³ ¦ B²< n = Á > ³ given by ² ³ ~ . In other words, ² ³¢ < n = ¦ > is the unique linear map for which ² ³²" n #³ ~ ²"Á #³ Observe that is linear, since if Á hom²< Á = Â > ³ then ´² ³ b ²³µ²" n #³ ~ ²"Á #³ b ²"Á #³ ~ ² b ³²"Á #³ and so the uniqueness part of the universal property implies that ² ³ b ²³ ~ ² b ³ Also, is surjective, since if ¢ < n = ¦ > is any linear map then ~ k !¢ < d = ¦ > is bilinear and by the uniqueness part of the universal property, we have ² ³ ~ . Finally, is injective, for if ² ³ ~ then ~ ² ³ k ! ~ . We have established the following result. Theorem 14.3 Let < , = and > be vector spaces over - . Then the map ¢ hom²< Á = Â > ³ ¦ B²< n = Á > ³, where ² ³ is the unique linear map satisfying ~ ² ³ k !, is an isomorphism. Thus, hom²< Á = Â > ³ B²< n = Á > ³
Armed with the definition and the universal property, we can now discuss some of the basic properties of tensor products.
When Is a Tensor Product Zero? Let us consider the question of when a tensor " n # is zero. The universal property proves to be very helpful in deciding this question. First, note that the bilinearity of the tensor product gives n # ~ ² b ³ n # ~ n # b n # and so n # ~ . Similarly, " n ~ .
Tensor Products
349
Now suppose that " n # ~
where we may assume that none of the vectors " and # are . According to the universal property of the tensor product, for any bilinear function ¢ < d = ¦ > , there is a unique linear transformation ¢ < n = ¦ > for which k ! ~ . Hence ~ 4 " n # 5 ~ ² k !³²" Á # ³ ~ ²" Á # ³
The key point is that this holds for any bilinear function ¢ < d = ¦ > . One possibility for is to take two linear functionals < i and = i and multiply them ²"Á #³ ~ ²"³ ²#³ which is easily seen to be bilinear and gives ²" ³ ²# ³ ~
If, for example, the vectors " are linearly independent, then we can consider the dual vectors "i , for which "i ²" ³ ~ Á . Setting ~ "i gives ~ "i ²" ³ ²# ³ ~ ²# ³
for all linear functionals = i . This implies that # ~ . We have proved the following useful result. Theorem 14.4 If " Á Ã Á " are linearly independent vectors in < and # Á Ã Á # are arbitrary vectors in = then " n # ~ ¬ # ~ for all In particular, " n # ~ if and only if " ~ or # ~ .
The next result says that we can get a basis for the tensor product < n = simply by “tensoring” any bases for each coordinate space. As promised, this shows that the two definitions of tensor product are essentially equivalent. Theorem 14.5 Let 8 ~ ¸" 0¹ be a basis for < and let 9 ~ ¸# 1 ¹ be a basis for = . Then the set : ~ ¸" n # 0Á 1 ¹ is a basis for < n = .
350
Advanced Linear Algebra
Proof. To see that the : is linearly independent, suppose that Á ²" n # ³ ~ Á
This can be written " n 4 Á # 5 ~
and so, by Theorem 14.4, we must have Á # ~
for all and hence Á ~ for all and . To see that : spans < n = , let " n # < n = . Then since " ~ " and # ~ # , we have " n # ~ " n #
~ 4" n # 5
~ 4 ²" n # ³5
~ ²" n # ³ Á
Hence, any sum of elements of the form " n # is a linear combination of the vectors " n # , as desired.
Corollary 14.6 For finite-dimensional vector spaces, dim²< n = ³ ~ dim²< ³ h dim²= ³
Coordinate Matrices and Rank If 8 ~ ¸" 0¹ is a basis for < and 9 ~ ¸# 1 ¹ is a basis for = , then any vector ' < n = has a unique expression as a sum ' ~ Á ²" n # ³ 0 1
where only a finite number of the coefficients Á are nonzero. In fact, for a fixed ' < n = , we may reindex the bases so that
' ~ Á ²" n # ³ ~ ~
Tensor Products
351
where none of the rows or columns of the matrix 9 ~ ²Á ³ consists only of 's. The matrix 9 ~ ²Á ³ is called a coordinate matrix of ' with respect to the bases 8 and 9. Note that a coordinate matrix 9 is determined only up to the order of its rows and columns. We could remove this ambiguity by considering ordered bases, but this is not necessary for our discussion and adds a complication since the bases may be infinite. Suppose that M ~ ¸$ 0¹ and N ~ ¸% 1 ¹ are also bases for < and = , respectively and that
' ~
Á ²$
n % ³
~ ~
where : ~ ² Á ³ is a coordinate matrix of ' with respect to these bases. We claim that the coordinate matrices 9 and : have the same rank, which can then be defined as the rank of the tensor ' < n = . Each $ Á Ã Á $ is a finite linear combination of basis vectors in 8 , perhaps involving some of " Á Ã Á " and perhaps involving other vectors in 8 . We can further reindex 8 so that each $ is a linear combination of the vectors " Á Ã Á " , where and set < ~ span²" Á Ã Á " ³ Next, extend ¸$ Á Ã Á $ ¹ to a basis M Z ~ ¸$ Á Ã Á $ Á $b Á Ã Á $ ¹ for < . (Since we no longer need the rest of the basis M , we have commandeered the symbols $b Á Ã Á $ , for simplicity.) Hence
$ ~ Á " for ~ Á Ã Á ~
where ( ~ ²Á ³ is invertible of size d . Now repeat this process on the second coordinate. Reindex the basis 9 so that the subspace = ~ span²# Á à Á # ³ contains % Á à Á % and extend to a basis N Z ~ ¸% Á à Á % Á %b Á à Á % ¹ for = . Then
% ~ Á # for ~ Á Ã Á ~
where ) ~ ²Á ³ is invertible of size d .
352
Advanced Linear Algebra
Next, write
' ~ Á ²" n # ³ ~ ~
by setting Á ~ for or . Thus, the d matrix 9 ~ ²Á ³ comes from 9 by adding c rows of 's to the bottom and then c columns of 's. In particular, 9 and 9 have the same rank. The expression for ' in terms of the basis vectors $ Á à Á $ and % Á à Á % can also be extended using coefficients to
' ~
Á ²$
n % ³
~ ~
where the d matrix : ~ ²
Á ³
has the same rank as : .
Now at last, we can compute
Á ²$
n % ³ ~
Á 8 Á " ~ ~ ~
~ ~
~ Á
n Á # 9 ~
Á Á "
n #
~ ~ ~ ~
~ ²! Á
Á ³Á "
n #
~ ~ ~ ~
~ ²(! : ³Á Á " n # ~ ~ ~
~ ²(! : )³Á " n # ~ ~
and so
Á ²" n # ³ ~ ²(! : )³Á " n # ~ ~
~ ~
It follows that 9 ~ (! : ) . Since ( and ) are invertible, we deduce that rk²9³ ~ rk²9 ³ ~ rk²: ³ ~ rk²:³ In block terms 9 ~ > and
9
and ?
: ~ >
:
?
Tensor Products
(! ~ >
! (Á i
i ? and i
) ) ~ > Á i
353
i i?
Then 9 ~ (! : ) implies that, for the original coordinate matrices, ! 9 ~ (Á :)Á ! where rk²(Á ³ rk²9³ and rk²)Á ³ rk²9³.
We shall soon have use for the following special case. If
' ~ " n # ~ $ n % ~
(14.3)
~
then, in the preceding argument, ~ ~ ~ ~ and 9 ~ : ~ 0 and 0 9 ~ >
and ?
0 : ~ >
?
! Hence, the equation 9 ~ (Á :)Á becomes ! 0 ~ (Á )Á
and we further have
$ ~ Á " for ~ Á Ã Á ~
where (Á ~ ²Á ³ and
% ~ Á # for ~ Á Ã Á ~
where )Á ~ ²Á ³.
The Rank of a Decomposable Tensor Recall that a tensor of the form " n # is said to be decomposable. If ¸" 0¹ is a basis for < and ¸# 1 ¹ is a basis for = then any decomposable vector has the form " n # ~ ²" n # ³ Á
Hence, the rank of a decomposable vector is . This implies that the set of decomposable vectors is quite “small” in < n = , as long as neither vector space has dimension .
354
Advanced Linear Algebra
Characterizing Vectors in a Tensor Product There are several very useful representations of the tensors in < n = . Theorem 14.7 Let ¸" 0¹ be a basis for < and let ¸# 1 ¹ be a basis for = . By a “unique” sum, we mean unique up to order and presence of zero terms. Then 1) Every ' < n = has a unique expression as a finite sum of the form Á " n # Á
where Á - . 2) Every ' < n = has a unique expression as a finite sum of the form " n &
where & = . 3) Every ' < n = has a unique expression as a finite sum of the form % n #
where % < . 4) Every nonzero ' < n = has an expression of the form
% n & ~
where ¸% ¹ < and ¸& ¹ = are linearly independent sets. As to uniqueness, is the rank of ' and so it is unique. Also, we have
% n & ~ $ n ' ~
~
where ¸$ ¹ < and ¸' ¹ = are linearly independent sets, if and only if there exist d matrices ( ~ ²Á ³ and ) ~ ²Á ³ for which (! ) ~ 0 and
$ ~ Á %
and ' ~ Á &
~
~
for ~ Á Ã Á . Proof. Part 1) merely expresses the fact that ¸" n # ¹ is a basis for < n = . From part 1), we write Á " n # ~ @" n Á # A ~ " n & Á
Tensor Products
355
which is part 2). Uniqueness follows from Theorem 14.4. Part 3) is proved similarly. As to part 4), we start with the expression from part 2)
" n & ~
where we may assume that none of the & 's are . If the set ¸& ¹ is linearly independent, we are done. If not, then we may suppose (after reindexing if necessary) that c
& ~ & ~
Then
c
c
" n & ~ " n & b 8" n & 9 ~
~ c
~
c
~ " n & b " n & ~ c
~
~ ²" b " ³ n & ~
But the vectors ¸" b " c ¹ are linearly independent. This reduction can be repeated until the second coordinates are linearly independent. Moreover, the identity matrix 0 is a coordinate matrix for ' and so ~ rk²0 ³ ~ rk²'³. As to uniqueness, one direction was proved earlier; see (14.3) and the other direction is left to the reader.
Defining Linear Transformations on a Tensor Product One of the simplest and most useful ways to define a linear transformation on the tensor product < n = is through the universal property, for this property says precisely that a bilinear function on < d = gives rise to a unique (and well-defined) linear transformation on < n = . The proof of the following theorem illustrates this well. It says that a linear functional on the tensor product is nothing more or less than a tensor product of linear functionals. Theorem 14.8 Let < and = be finite-dimensional vector spaces. Then < i n = i ²< n = ³i via the isomorphism ¢ < i n = i ¦ ²< n = ³i given by ² n ³²" n #³ ~ ²"³²#³ Proof. Informally, for fixed and , the function ²"Á #³ ¦ ²"³²#³ is bilinear in " and # and so there is a unique linear map Á taking " n # to ²"³²#³.
356
Advanced Linear Algebra
The function ² Á ³ ¦ Á is bilinear in and since, as functions, bÁ ~ Á b Á and so there is a unique linear map taking n to Á . A bit more formally, for fixed and , the map - Á ¢ < d = ¦ - defined by - Á ²"Á #³ ~ ²"³²#³ is bilinear and so the universal property of tensor products implies that there exists a unique linear functional Á on < n = for which Á ²" n #³ ~ - Á ²"Á #³ ~ ²"³²#³ Next, the map .¢ < i d = i ¦ ²< n = ³i defined by .² Á ³ ~ Á is bilinear since, for example, .² b Á ³²" n #³ ~ b Á ²" n #³ ~ ² b ³²"³ h ²#³ ~ ²"³²#³ b ²"³²#³ ~ ² Á b Á ³²"Á #³ ~ ².² Á ³ b .²Á ³³²" n #³ and so .² b Á ³ ~ .² Á ³ b .²Á ³ which shows that . is linear in its first coordinate. Hence, the universal property implies that there exists a unique linear map ¢ < i n = i ¦ ²< n = ³i for which ² n ³ ~ .² Á ³ ~ Á that is, ² n ³²" n #³ ~ Á ²" n #³ ~ ²"³²#³ Finally, we must show that is bijective. Let ¸ ¹ be a basis for < , with dual basis ¸ ¹ and let ¸ ¹ be a basis for = , with dual basis ¸ ¹. Then ² n ³²" n # ³ ~ ²" ³ ²# ³ ~ Á" Á# ~ ²Á³Á²"Á#³ and so ¸ ² n ³¹ ²< n = ³i is the dual basis to the basis ¸" n # ¹ for < n = . Thus, takes the basis ¸ n ¹ for < i n = i to the basis ¸ ² n ³¹ and is therefore bijective.
Tensor Products
357
Combining the isomorphisms of Theorem 14.3 and Theorem 14.8, we have, for finite-dimensional vector spaces < and = , < i n = i ²< n = ³i hom²< Á = Â - ³
The Tensor Product of Linear Transformations We wish to generalize Theorem 14.8 to arbitrary linear transformations. Let B²< Á < Z ³ and B²= Á = Z ³. While the product ²"³²#³ does not make sense, the tensor product ²"³ n ²#³ does and is bilinear in " and # ²"Á #³ ~ ²"³ n ²#³ The same informal argument that we used in the proof of Theorem 14.8 will work here. Namely, the expression ²"³ n ²#³ < Z n = Z is bilinear in " and # and so there is a unique linear map, say ² p ³¢ < n = ¦ < Z n = Z for which ² p ³²" n #³ ~ ²"³ n ²#³ Since p B²< n = Á < Z n = Z ³, we have a function ¢ B²< Á < Z ³ d B²= Á = Z ³ ¦ B²< n = Á < Z n = Z ³ defined by ² Á ³ ~ p But is bilinear, since ²² b ³ p ³²"Á #³ ~ ² b ³²"³ n ²#³ ~ ² ²"³ b ²"³³ n ²#³ ~ ´ ²"³ n ²#³µ b ´²"³ n ²#³µ ~ ² p ³²"Á #³ b ² p ³²"Á #³ ~ ²² p ³ b ² p ³³²"Á #³ and similarly for the second coordinate. Hence, there is a unique linear transformation ¢ B²< Á < Z ³ n B²= Á = Z ³ ¦ B²< n = Á < Z n = Z ³ satisfying ² n ³ ~ p that is, ´ ² n ³µ²" n #³ ~ ²"³ n ²#³ To see that is injective, if ² n ³ ~ then ²"³ n ²#³ ~ for all " < and # = . If ~ then n ~ . If £ , then there is a # = for which ²#³ £ . But ²"³ n ²#³ ~ implies that one of the factors is and so
358
Advanced Linear Algebra
²"³ ~ for all " < , that is, ~ . Hence, n ~ . In either case, we see that is injective. Thus, is an embedding (injective linear transformation) and if all vector spaces are finite-dimensional, then dim²B²< Á < Z ³ n B²= Á = Z ³³ ~ dim²B²< n = Á < Z n = Z ³³ and so is also surjective and hence an isomorphism. The embedding of B²< Á < Z ³ n B²= Á = Z ³ into B²< n = Á < Z n = Z ³ means that each n can be thought of as the linear transformation p from < n = to < Z n = Z , defined by ² p ³²" n #³ ~ ²"³ n ²#³ In fact, the notation n is often used to denote both the tensor product of vectors (linear transformations) and the linear map p , and we will do this as well. In summary, we can say that the tensor product n of linear transformations is a linear transformation on tensor products. Theorem 14.9 The linear transformation ¢ B²< Á < Z ³ n B²= Á = Z ³ ¦ B²< n = Á < Z n = Z ³ defined by ² n ³ ~ p where ² p ³²" n #³ ~ ²"³ n ²#³ is an embedding (injective linear transformation), and is an isomorphism if all vector spaces are finite-dimensional. Thus, the tensor product n of linear transformations is (via this embedding) a linear transformation on tensor products.
There are several special cases of this result that are of importance.
Corollary 14.10 Let us use the symbol ? Æ @ to denote the fact that there is an embedding of ? into @ that is an isomorphism if ? and @ are finitedimensional. 1) Taking < Z ~ - gives
< i n B²= Á = Z ³ Æ B²< n = Á = Z ³ where ² n ³²" n #³ ~ ²"³²#³ for < i .
Tensor Products
359
2) Taking < Z ~ - and = Z ~ - gives
< i n = i Æ ²< n = ³i where ² n ³²" n #³ ~ ²"³²#³ 3) Taking = ~ - and noting that B²- Á = Z ³ = Z and < n - < gives (letting > ~ = Z )
B²< Á < Z ³ n > Æ B²< Á < Z n > ³ where ² n $³²"³ ~ ²"³ n $ 4) Taking < Z ~ - and = ~ - gives (letting > ~ = Z )
< i n > Æ B²< Á > ³ where ² n $³²"³ ~ ²"³$
Change of Base Field The tensor product gives us a convenient way to extend the base field of a vector space. (We have already discussed the complexification of a real vector space.) For convenience, let us refer to a vector space over a field - as an - space and write =- . Actually, there are several approaches to “upgrading” the base field of a vector space. For instance, suppose that 2 is an extension field of - , that is, - 2 . If ¸ ¹ is a basis for =- then every % =- has the form % ~ where - . We can define an 2 -space =2 simply by taking all formal linear combinations of the form % ~ where 2 . Note that the dimension of =2 as a 2 -space is the same as the dimension of =- as an - -space. Also, =2 is an - -space (just restrict the scalars to - ) and as such, the inclusion map ¢ =- ¦ =2 sending % =- to ²%³ ~ % =2 , is an - -monomorphism. The approach described in the previous paragraph uses an arbitrarily chosen basis for =- and is therefore not coordinate free. However, we can give a coordinate–free approach using tensor products as follows. Since 2 is a vector
360
Advanced Linear Algebra
space over - , we can consider the tensor product >- ~ 2 n - =It is customary to include the subscript - on n - to denote the fact that the tensor product is taken with respect to the base field - . (All relevant maps are - -bilinear and - -linear.³ However, since =- is not a 2 -space, the only tensor product that makes sense in 2 n =- is the - -tensor product and so we will drop the subscript - . The vector space >- is an - -space by definition of tensor product, but we may make it into a 2 -space as follows. For 2 , the temptation is to “absorb” the scalar into the first coordinate ² n #³ ~ ² ³ n # But we must be certain that this is well-defined, that is, that n # ~ n $ ¬ ² ³ n # ~ ² ³ n $ This becomes easy if we turn to the universal property for bilinearity. In particular, consider the map ¢ ²2 d =- ³ ¦ ²2 n =- ³ defined by ² Á #³ ~ ² ³ n # This map is obviously well-defined and since it is also bilinear, the universal property of tensor products implies that there is a unique (and well-defined!) - linear map ¢ ²2 n =- ³ ¦ ²2 n =- ³ for which ² n #³ ~ ² ³ n # Note also that since is - -linear, it is additive and so ´² n #³ b ² n $³µ ~ ² n #³ b ² n $³ that is, ´² n #³ b ² n $³µ ~ ² n #³ b ² n $³ which is one of the properties required of a scalar multiplication. Since the other defining properties of scalar multiplication are satisfied, the set 2 n =- is indeed a 2 -space under this operation (and addition), which we denote by >2 . To be absolutely clear, we have three distinct vector spaces: the - -spaces =and >- ~ 2 n =- and the 2 -space >2 ~ 2 n =- , where the tensor product in both cases is with respect to - . The spaces >- and >2 are identical as sets and as abelian groups. It is only the “permission to multiply by” that is different. Even though in >- we can multiply only by scalars from - , we still get the same set of vectors. Accordingly, we can recover >- from >2 simply by restricting scalar multiplication to scalars from - .
Tensor Products
361
It follows that we can speak of “- -linear” maps from =- into >2 , with the expected meaning, that is, ²" b #³ ~ ²"³ b ²#³ for all scalars Á - (not in 2 ). If the dimension of 2 as a vector space over - is then dim²>- ³ ~ dim²2 n =- ³ ~ dim²2³ h dim²=- ³ ~ h dim²=- ³ As to the dimension of >2 , it is not hard to see that if ¸ ¹ is a basis for =then ¸ n ¹ is a basis for >2 . Hence dim²>2 ³ ~ dim²=- ³ even when =- is infinite-dimensional. The map ¢ =- ¦ >- defined by ²#³ ~ n # is easily seen to be injective and - -linear and so >- contains an isomorphic copy of =- . We can also think of as mapping =- into >2 , in which case is called the 2 -extension map of =- . This map has a universal property of its own, as described in the next theorem. Theorem 14.11 The 2 -extension map ¢ =- ¦ >2 has the universal property for the family of all - -linear maps with domain =- and range a 2 -space, as measured by 2 -linear maps. In particular, for any - -linear map ¢ =- ¦ @ , where @ is a 2 -space, there exists a unique 2 -linear map ¢ >2 ¦ @ for which the diagram in Figure 14.6 commutes, that is, k~ Proof. If such a map ¢ 2 n =- ¦ > is to exist then it must satisfy ² n #³ ~ ² n #³ ~ ²#³ ~ ²#³
(14.4)
This shows that if exists, it is uniquely determined by . As usual, when searching for a linear map on a tensor product such as >2 ~ 2 n =- , we look for a bilinear map. Let ¢ ²2 d =- ³ ¦ @ be defined by ² Á #³ ~ ²#³ Since this is bilinear, there exists a unique - -linear map for which (14.4) holds. It is easy to see that is also 2 -linear, since if 2 then ´² n #³µ ~ ² n #³ ~ ²#³ ~ ² n #³
362
Advanced Linear Algebra
WK
VF
W
f
Y Figure 14.6 Theorem 14.11 is the key to describing how to extend an - -linear map to a 2 linear map. Figure 14.7 shows an - -linear map ¢ = ¦ > between - -spaces = and > . It also shows the 2 -extensions for both spaces, where 2 n = and 2 n > are 2 -spaces.
W
V
W PW
PV K
V
W
K
W
Figure 14.7 If there is a unique 2 -linear map that makes the diagram in Figure 14.7 commute, then this would be the obvious choice for the extension of the - linear map " to a 2 -linear map. Consider the - -linear map ~ ²> k ³¢ = ¦ 2 n > into the 2 -space 2 n > . Theorem 14.11 implies that there is a unique 2 -linear map ¢ 2 n = ¦ 2 n > for which k = ~ that is, k = ~ > k Now, satisfies ² n #³ ~ ² n #³ ~ ² k = ³²#³ ~ ²> k ³²#³ ~ ² n ²#³³ ~ n ²#³ ~ ²2 n ³² n #³ and so ~ 2 n .
Tensor Products
363
Theorem 14.12 Let = and > be - -spaces, with 2 -extension maps = and > , respectively. (See Figure 14.7.³ Then for any - -linear map ¢ = ¦ > , the map ~ 2 n is the unique 2 -linear map that makes the diagram in Figure 14.7 commute, that is, for which k ~ k
Multilinear Maps and Iterated Tensor Products The tensor product operation can easily be extended to more than two vector spaces. We begin with the extension of the concept of bilinearity. Definition If = Á Ã Á = and > are vector spaces over - , a function ¢ = d Ä d = ¦ > is said to be multilinear if it is linear in each variable separately, that is, if ²" Á Ã Á "c Á # b #Z Á "b Á Ã Á " ³ ~ ²" Á Ã Á "c Á #Á "b Á Ã Á " ³ b ²" Á Ã Á "c Á #Z Á "b Á Ã Á "³ for all ~ Á Ã Á . A multilinear function of variables is also referred to as an -linear function. The set of all -linear functions as defined above will be denoted by hom²= Á Ã Á = Â > ³. A multilinear function from = d Ä d = to the base field - is called a multilinear form or -form.
Example 14.7 1) If ( is an algebra then the product map ¢ ( d Ä d ( ¦ ( defined by ² Á Ã Á ³ ~ Ä is -linear. 2) The determinant function det¢ C ¦ - is an -linear form on the columns of the matrices in C .
We can extend the quotient space definition of the tensor product to -linear functions as follows. Let 8 ~ ¸Á 1 ¹ be a basis for = for ~ Á à Á . For each ordered tuple ²Á Á à Á Á ³, we invent a new formal symbol Á n Ä n Á and define ; to be the vector space with basis : ~ ¸Á n Ä n Á Á 8 ¹ Then define the map ! by setting !²Á Á à Á Á ³ ~ Á n Ä n Á and extending by multilinearity. This uniquely defines a multilinear map ! that is as “universal” as possible among multilinear maps. Indeed, if ¢ = d Ä d = ¦ > is multilinear, the condition ~ k ! is equivalent to
364
Advanced Linear Algebra
²Á n Ä n Á ³ ~ ²Á Á à Á Á ³ which uniquely defines a linear map ¢ ; ¦ > . Hence, ²; Á !³ has the universal property for bilinearity. Alternatively, we may take the coordinate-free quotient space approach as follows. Definition Let = Á à Á = be vector spaces over - and let ; be the subspace of the free vector space < on = d Ä d = , generated by all vectors of the form ²# Á à Á #c Á "Á #b Á à Á # ³ b ²# Á à Á #c Á "Z Á #b Á à Á #³ c ²# Á à Á #c Á " b "Z Á #b Á à Á # ³ for all Á - , "Á "Z < and # Á à Á # = . The quotient space < °; is called the tensor product of = Á à Á = and denoted by = n Ä n = .
As before, we denote the coset ²# Á Ã Á # ³ b ; by # n Ä n # and so any element of = n Ä n = is a sum of decomposable tensors, that is, # n Ä n # where the vector space operations are linear in each variable. Let us formally state the universal property for multilinear functions. Theorem 14.13 (The universal property for multilinear functions as measured by linearity) Let = Á Ã Á = be vector spaces over the field - . The pair ²= n Ä n = Á !³, where !¢ = d Ä d = ¦ = n Ä n = is the multilinear map defined by !²# Á Ã Á # ³ ~ # n Ä n # has the following property. If ¢ = d Ä d = ¦ > is any multilinear function from = d Ä d = to a vector space > over - then there is a unique linear transformation ¢ = n Ä n = ¦ > that makes the diagram in Figure 14.8 commute, that is, for which k!~ Moreover, = n Ä n = is unique in the sense that if a pair ²?Á ³ also has this property then ? is isomorphic to = n Ä n = .
Tensor Products
t
V1uuVn
365
V1
Vn
f
W W
Figure 14.8 Here are some of the basic properties of multiple tensor products. Proof is left to the reader. Theorem 14.14 The tensor product has the following properties. Note that all vector spaces are over the same field - . 1) (Associativity) There exists an isomorphism ¢ ²= n Ä n = ³ n ²> n Ä n > ³ ¦ = n Ä n = n > n Ä n > for which ´²# n Ä n # ³ n ²$ n Ä n $ ³µ ~ # n Ä n # n $ n Ä n $ In particular, ²< n = ³ n > < n ²= n > ³ < n = n > 2) (Commutativity) Let be any permutation of the indices ¸Á à Á ¹. Then there is an isomorphism ¢ = n Ä n = ¦ =²³ n Ä n =²³ for which ²# n Ä n # ³ ~ #²³ n Ä n #²³ 3) There is an isomorphism ¢ - n = ¦ = for which ² n #³ ~ # and similarly, there is an isomorphism ¢ = n - ¦ = for which ²# n ³ ~ # Hence, - n = = = n - .
The analog of Theorem 14.3 is the following. Theorem 14.15 Let = Á Ã Á = and > be vector spaces over - . Then the map ¢ hom²= Á Ã Á = Â > ³ ¦ B²= n Ä n = Á > ³, defined by the fact that ² ³ is the unique linear map for which ~ ² ³ k !, is an isomorphism. Thus,
Advanced Linear Algebra
366
hom²= Á Ã Á = Â > ³ B²= n Ä n = Á > ³ Moreover, if all vector spaces are finite-dimensional then
dim´hom²= Á Ã Á = Â > ³µ ~ dim²> ³ h dim²= ³
~
Theorem 14.9 and its corollary can also be extended. Theorem 14.16 The linear transformation ¢ B²< Á <Z ³ n Ä n B²< Á <Z ³ ¦ B²< n Ä n < Á <Z n Ä n <Z ³ defined by ² n Ä n ³²" n Ä n " ³ ~ ²" ³ n Ä n ²" ³ is an embedding, and is an isomorphism if all vector spaces are finitedimensional. Thus, the tensor product n Ä n of linear transformations is (via this embedding) a linear transformation on tensor products. Two important special cases of this are
<i n Ä n <i Æ ²< n Ä n < ³i where ² n Ä n ³²" n Ä n " ³ ~ ²" ³Ä ²" ³ and
<i n Ä n <i n = Æ B²< n Ä n < Á = ³ where ² n Ä n n #³²" n Ä n " ³ ~ ²" ³Ä ²" ³#
Tensor Spaces Let = be a finite-dimensional vector space. For nonnegative integers and , the tensor product ; ²= ³ ~ = n Ä n = n = i n Ä n = i ~ = n n ²= i ³n factors
factors
is called the space of tensors of type ²Á ³, where is the contravariant type and is the covariant type. If ~ ~ then ; ²= ³ ~ - , the base field. Here we use the notation = n for the -fold tensor product of = with itself. We will also write = d for the -fold cartesian product of = with itself.
Tensor Products
367
Since all vector spaces are finite-dimensional, = and = ii are isomorphic and so ; ²= ³ ~ = n n ²= i ³n ²²= i ³n n = n ³i hom- ²²= i ³d d = d Á - ³ This is the space of all multilinear functionals on = i d Ä d = i d = dÄd= factors
factors
In fact, tensors of type ²Á ³ are often defined as multilinear functionals in this way. Note that dim²; ²= ³³ ~ ´dim²= ³µb Also, the associativity and commutativity of tensor products allows us to write b ; ²= ³ n ; ²= ³ ~ ;b ²= ³
at least up to isomorphism. Tensors of type ²Á ³ are called contravariant tensors ; ²= ³ ~ ; ²= ³ ~ = nÄn= factors
and tensors of type ²Á ³ are called covariant tensors ; ²= ³ ~ ; ²= ³ ~ =inÄn=i factors
Tensors with both contravariant and covariant indices are called mixed tensors. In general, a tensor can be interpreted in a variety of ways as a multilinear map on a cartesian product, or a linear map on a tensor product. (The interpretation we mentioned above that is sometimes used as the definition is only one possibility.) We simply need to decide how many of the contravariant factors and how many of the covariant factors should be “active participants” and how many should be “passive participants.” More specifically, consider a tensor of type ²Á ³, written # n Ä n # n Ä n # n n Ä n n Ä n ; ²= ³ where and . Here we are choosing the first vectors and the first linear functionals as active participants. This determines the number of arguments of the map. In fact, we define a map from the cartesian product
368
Advanced Linear Algebra
= i d Ä d = i d = dÄd= factors
factors
to the tensor product = n Ä n = n =inÄn=i c factors
c factors
of the remaining factors by ²# n Ä n # n n Ä n ³² Á à Á Á % Á à Á % ³ ~ ²# ³Ä ²# ³ ²% ³Ä ²% ³#b n Ä n # n b n Ä n In words, the first group # n Ä n # of (active) vectors interacts with the first set Á à Á of arguments to produce the scalar ²# ³Ä ²# ³. The first group n Ä n of (active) functionals interacts with the second group % Á à Á % of arguments to produce the scalar ²% ³Ä ²% ³. The remaining (passive) vectors #b n Ä n # and functionals b n Ä n are just “copied” to the image vector. It is easy to see that this map is multilinear and so there is a unique linear map from the tensor product = i n Ä n = i n = nÄn= factors
factors
to the tensor product = n Ä n = n =inÄn=i c factors
c factors
defined by ²# n Ä n # n n Ä n ³² n Ä n n % n Ä n % ³ ~ ²# ³Ä ²# ³ ²% ³Ä ²% ³#b n Ä n # n b n Ä n (What justifies the notation # n Ä n # n n Ä n for this map?) Let us look at some special cases. For a contravariant tensor of type ²Á ³ # n Ä n # ; ²= ³ we get a linear map = i n Ä n = i ¦ = nÄn= factors
c factors
(where ³ defined by ²# n Ä n # ³² n Ä n ³ ~ ²# ³Ä ²# ³#b n Ä n #
Tensor Products
369
For a covariant tensor of type ²Á ³ n Ä n ; ²= ³ we get a linear map from = n Ä n = ¦ =inÄn=i factors
c factors
(where ) defined by ² n Ä n ³²% n Ä n % ³ ~ ²% ³Ä ²% ³b n Ä n The special case ~ gives a linear functional on = n , that is, each element of ²= i ³n is a distinct member of ²= n ³i , whence the embedding
=i n Ä n =i Æ ²= n Ä n = ³i that we described earlier. Let us consider some small values of and . For a mixed tensor # n of type ²Á ³ here are the possibilities. When ~ and ~ we get the linear map ²# n ³¢ = ¦ = defined by ²# n ³²$³ ~ ²$³# When ~ and ~ we get the linear map ²# n ³¢ = i ¦ = i defined by ²# n ³²³ ~ ²#³ Finally, when ~ ~ we get a multilinear form ²# n ³¢ = i d = ¦ defined by ²# n ³²Á $³ ~ ²#³ ²$³ Consider also a tensor n of type ²Á ³. When ~ ~ we get a multilinear functional n ¢ ²= d = ³ ¦ - defined by ² n ³²#Á $³ ~ ²#³²$³ This is just a bilinear form on = . When ~ we get a multilinear map ² n ³¢ = ¦ = i defined by ² n ³²#³ ~ ²#³
Contraction Covariant and contravariant factors can be “combined” in the following way. Consider the map c ¢ = d d ²= i ³d ¦ ;c ²= ³
370
Advanced Linear Algebra
defined by ²# Á Ã Á # Á Á Ã Á ³ ~ ²# ³²# n Ä n # n n Ä n ³ This is easily seen to be multilinear and so there is a unique linear map c ¢ ; ²= ³ ¦ ;c ²= ³
defined by ²# n Ä n # n n Ä n ³ ~ ²# ³²# n Ä n # n n Ä n ³ This is called the contraction in the contravariant index and covariant index . Of course, contraction in other indices (one contravariant and one covariant) can be defined similarly. Example 14.8 Consider the tensor space ; ²= ³, which is isomorphic to B²= ³ via the fact that ²# n ³²$³ ~ ²$³# For ~ ~ , the contraction takes the form ²# n ³ ~ ²#³ Now, for # £ , the operator # n has kernel equal to ker² ³, which has codimension and so there is a nonzero vector " = for which = ~ º"» l ker² ³. Now, if ²#³ £ then = ~ º#» l ker² ³ and ²# n ³²#³ ~ ²#³# and so # is an eigenvector for the nonzero eigenvalue ²#³. Hence, = ~ ; ²#³ l ; and so the trace of # n is precisely ²#³. Since the trace is linear, we deduce that the trace of any linear operator on = is the contraction of the corresponding vector in ; ²= ³.
The Tensor Algebra of = Consider the contravariant tensor spaces ; ²= ³ ~ ; ²= ³ ~ = n For ~ we take ; ²= ³ ~ - . The external direct sum B
; ²= ³ ~ ; ²= ³ ~
of these tensor spaces is a vector space with the property that
Tensor Products
371
; ²= ³ n ; ²= ³ ~ ; b ²= ³ This is an example of a graded algebra, where ; ²= ³ are the elements of grade . The graded algebra ; ²= ³ is called the tensor algebra over = . (We will formally define graded structures a bit later in the chapter.) Since ; ²= ³ ~ = i n Ä n = i ~ ; ²= i ³ factors
there is no need to look separately at ; ²= ³.
Special Multilinear Maps The following definitions describe some special types of multilinear maps. Definition 1) A multilinear map ¢ = ¦ > is symmetric if interchanging any two coordinate positions changes nothing, that is, if ²# Á Ã Á # Á Ã Á # Á Ã Á # ³ ~ ²# Á Ã Á #Á Ã Á #Á Ã Á #³ for any £ . 2) A multilinear map ¢ = ¦ > is antisymmetric or skew-symmetric if interchanging any two coordinate positions introduces a factor of c, that is, if ²# Á Ã Á # Á Ã Á # Á Ã Á # ³ ~ c ²# Á Ã Á #Á Ã Á #Á Ã Á #³ for £ . 3) A multilinear map ¢ = ¦ > is alternate or alternating if ²# Á Ã Á # ³ ~ whenever any two of the vectors # are equal.
As in the case of bilinear forms, we have some relationships between these concepts. In particular, if char²- ³ ~ then alternate ¬ symmetric ¯ skew-symmetric and if char²- ³ £ then alternate ¯ skew-symmetric A few remarks about permutations, with which the reader may very well be familiar, are in order. A permutation of the set 5 ~ ¸Á à Á ¹ is a bijective function ¢ 5 ¦ 5 . We denote the group (under composition) of all such permutations by : . This is the symmetric group on symbols. A cycle of
372
Advanced Linear Algebra
length is a permutation of the form ² Á Á à Á ³, which sends to Á to 3 Á à Á c to and to . All other elements of 5 are left fixed. Every permutation is the product (composition) of disjoint cycles. A transposition is a cycle ²Á ³ of length . Every cycle (and therefore every permutation) is the product of transpositions. In general, a permutation can be expressed as a product of transpositions in many ways. However, no matter how one represents a given permutation as such a product, the number of transpositions is either always even or always odd. Therefore, we can define the parity of a permutation : to be the parity of the number of transpositions in any decomposition of as a product of transpositions. The sign of a permutation is defined by sg²³ ~ ²c³parity²³ If sg²³ ~ then is an even permutation and if sg²³ ~ c then is an odd permutation. The sign of is often written ²c³ . With these facts in mind, it is apparent that is symmetric if and only if ²# Á à Á # ³ ~ ²#²1) Á à Á #²³ ³ for all permutations : and that is alternating if and only if ²# Á à Á # ³ ~ ²c³ ²#²1) Á à Á #²³ ³ for all permutations : .
Graded Algebras We need to pause for a few definitions that are useful when discussing tensor algebra. An algebra ( over - is said to be a graded algebra if as a vector space over - , ( can be written in the form B
( ~ ( ~
for subspaces ( of (, and where multiplication behaves nicely, that is, ( ( (b The elements of ( are said to be homogeneous of degree . If ( is written ~ b Ä b for ( , £ , then is called the homogeneous component of of degree . The ring of polynomials - ´%µ provides a prime example of a graded algebra, since
Tensor Products
373
B
- ´%µ ~ - ´%µ ~
where - ´%µ is the subspace of - ´%µ consisting of all scalar multiples of % . More generally, the ring - ´% Á Ã Á % µ of polynomials in several variables is a graded algebra, since it is the direct sum of the subspaces of homogeneous polynomials of degree . (A polynomial is homogeneous of degree if each term has degree . For example, ~ % % b % % % is homogeneous of degree .)
Graded Ideals A graded ideal 0 in a graded algebra ( ~ ( is an ideal 0 for which, as a subspace of (, B
0 ~ ²0 q ( ³ ~
(Note that 0 q ( is not, in general, an ideal.) For example, the ideal 0 of - ´%µ consisting of all polynomials with zero constant term is graded. However, the ideal 1 ~ º b %» ~ ¸²%³² b %³ ²%³ - ´%µ¹ generated by b % is not graded, since - ´%µ contains only monomials and so 1 q - ´%µ ~ ¸¹. Theorem 14.17 Let ( be a graded algebra. An ideal 0 of ( is graded if and only if it is generated by homogeneous elements of (. Proof. If 0 is graded then it is generated by the elements of the direct summands 0 q ( , which are homogeneous. Conversely, suppose that 0 ~ º 2» where each is homogeneous. Any " 0 has the form " ~ " #
where " Á # (. Since ( is graded, each " and # is a sum of homogeneous terms and we can expand " # into a sum of homogeneous terms of the form where and (and ³ are homogeneous. Hence, if deg² ³ ~ deg² ³ h deg² ³ h deg² ³ ~ then ( q 0 and so
374
Advanced Linear Algebra
B
0 ~ ²0 q ( ³ ~
is graded.
If 0 is a graded ideal in (, then the quotient ring (°0 is also graded, since it is easy to show that B ( ( b 0 ~ 0 0 ~
Moreover, for %Á & 0 and ( , ´² b %³ b 0µ´² b &³ b 0µ ~ ² b 0³² b 0³ ~ b 0
( b 0 0
The Symmetric Tensor Algebra We wish to study tensors in ; ²= ³ for that enjoy a symmetry property. Let : be the symmetric group on ¸Á à Á ¹. For each : , the multilinear map ¢ = d ¦ ; ²= ³ defined by ²# Á à Á # ³ ~ #²³ n Ä n #²³ determines (by universality) a unique linear operator on ; ²= ³ for which ²# n Ä n # ³ ~ #²³ n Ä n #²³ Let 8 ~ ¸ Á à Á ¹ be a basis for = . Since the set 8 ~ ¸ n Ä n 8 ¹ is a basis for ; ²= ³ and is a bijection of 8 , it follows that is an isomorphism of ; ²= ³. A tensor ! ; ²= ³ is symmetric if ²!³ ~ ! for all permutations : . A word of caution is in order with respect to the definition of . The permutation permutes the coordinate positions in a decomposable tensor, not the indices. Suppose, for example, that ~ and ~ ²³. If ¸ Á ¹ is a basis for = then ²³ ² n ³ ~ n because permutes the positions, not the indices. Thus, the following is not true no
²³ ² n ³ ~ ²³ n ²³ ~ n The set of all symmetric tensors
Tensor Products
375
:; ²= ³ ~ ¸! ; ²= ³ ²!³ ~ ! for all : ¹ is a subspace of ; ²= ³. To study :; ²= ³ in more detail, let Á Ã Á be a basis for ; ²= ³. Any tensor # ; ²= ³ has the form
#~
ÁÃÁ n Ä n ÁÃÁ ~
where ÁÃÁ £ . It will help if we group the terms in such a sum according to the multiset of indices. Specifically, for each nonempty subset : of indices ¸Á à Á ¹ and each multiset 4 ~ ¸ Á à Á ¹ of size with underlying set : , let .4 consist of all possible decomposable tensors n Ä n where ² Á à Á ³ is a permutation of ¸ Á à Á ¹. For example, if 4 ~ ¸Á Á ¹ then .4 ~ ¸ n n Á n n Á n n ¹ Now, ignoring coefficients, the terms in the expression for # can be organized into the groups .4 . Let us denote the set of terms of # (without coefficients) that lie in .4 by .4 ²#³. For example, if # ~ n n b n n b n n then .¸ÁÁ¹ ²#³ ~ ¸ n n Á n n ¹ and .¸ÁÁ¹ ²#³ ~ ¸ n n ¹ Further, let :4 ²#³ denote the sum of the terms in # that belong to .4 ²#³ (including the coefficients). For example, :¸ÁÁ¹ ²#³ ~ n n b n n Thus, #~
:4 ²#³
multisets 4
Note that each permutation is a permutation of the elements of .4 , for each multiset 4 . It follows that # is a symmetric tensor if and only if the following conditions are satisfied:
376
Advanced Linear Algebra
1) (All or none) For each multiset 4 of size with underlying set : ¸Á à Á ¹, we have .4 ²#³ ~ J
or .4 ²#³ ~ .4
2) If .4 ²#³ ~ .4 , then the coefficients in the sum :4 ²#³ are the same and so :4 ²#³ ~ 4 ²#³ h ! !.4
where 4 ²#³ is the common coefficient. Hence, for a symmetric tensor #, we have
#~
multisets 4
84 ²#³ !9 !.4
Now, symmetric tensors act as though the tensor product was commutative. Of course, it is not, but we can deal with this as follows. Let ¢ ; ²= ³ ¦ - ´ Á Ã Á µ be the function from ; ²= ³ to the vector space - ´ Á Ã Á µ of all homogeneous polynomials of degree in the formal variables Á Ã Á , defined by 4 ÁÃÁ n Ä n 5 ~ ÁÃÁ Ä In this context, the product in - ´ Á Ã Á µ is often denoted by the symbol v , so we have 4 ÁÃÁ n Ä n 5 ~ ÁÃÁ ² v Ä v ³ It is clear that is well-defined, linear and surjective. We want to use to explore the properties of symmetric tensors, first as a subspace of ; ²= ³ and then as a quotient space. (The subspace approach is a bit simpler, but works only when char²- ³ ~ .)
The Case char²- ³ ~ Note that takes every member of a group .4 to the same monomial, whose indices are precisely 4 . Hence, if # :; ²= ³ is symmetric then ²#³ ~
multisets 4
84 ²#³ ²!³9 !.4
~ ²4 ²#³(.4 (³² v Ä v ³ Ä
Tensor Products
377
(Here we identify each multiset 4 with the nondecreasing sequence Ä of its members.) As to the kernel of , if ²#³ ~ for # :; ²= ³ then 4 ²#³(.4 ( ~ for all multisets 4 and so, if char²- ³ ~ , we may conclude that 4 ²#³ ~ for all multisets 4 , that is, # ~ . Hence, if char²- ³ ~ , the restricted map O:; ²= ³ is injective and so it is an isomorphism. We have proved the following. Theorem 14.18 Let = be a finite-dimensional vector space over a field - with char²- ³ ~ . Then the vector space :; ²= ³ of symmetric tensors of degree is isomorphic to the vector space of homogeneous polynomials - ´ Á Ã Á µ, via the isomorphism 4 ÁÃÁ n Ä n 5 ~ ÁÃÁ ² v Ä v ³
The vector space :; ²= ³ of symmetric tensors of degree is often called the symmetric tensor space of degree for = . However, this term is also used for an isomorphic vector space that we will study next. The direct sum B
:; ²= ³ ~ :; ²= ³ ~
is sometimes called the symmetric tensor algebra of = , although this term is also used for a slightly different (but isomorphic) algebra that we will define momentarily. We can use the vector space isomorphisms described in the previous theorem to move the product from the algebra of polynomials - ´ Á Ã Á µ to the symmetric tensor space :; ²= ³. In other words, if char²- ³ ~ then :; ²= ³ is a graded algebra isomorphic to the algebra of polynomials - ´ Á Ã Á µ.
The Arbitrary Case We can define the symmetric tensor space in a different, although perhaps slightly more complex, manner that holds regardless of the characteristic of the base field. This is important, since many important fields (such as finite fields) have nonzero characteristic. Consider again the kernel of the map , but this time as defined on all of ; ²= ³ , not just :; ²= ³. The map sends elements of different groups .4 ²#³ to different monomials in - ´ Á Ã Á µ, and so # ker² ³ if and only if ²:4 ²#³³ ~ Hence, the sum of the coefficients of the elements in .4 ²#³ must be .
378
Advanced Linear Algebra
Conversely, if the sum of the coefficients of the elements in .4 ²#³ is for all multisets 4 , then # ker² ³. Suppose that 4 ~ ¸ Á Ã Á ¹ is a multiset for which ! ~ n Ä n .4 ²#³ Then each decomposable tensor in .4 ²#³ is a permutation of ! and so :4 ²#³ may be written in the form :4 ²#³ ~ n Ä n b ² n Ä n ³
where the sum is over a subset of the symmetric group : , corresponding to the terms that appear in :4 ²#³ and where b ~
Substituting for in the expression for :4 ²#³ gives :4 ²#³ ~ ´ ² n Ä n ³ c ² n Ä n ³µ
It follows that # is in the subspace 0 of ; ²= ³ generated by tensors of the form ²!³ c !, that is 0 ~ º ²!³ c ! ! ; ²= ³Á : » and so ker² ³ 0 . Conversely, ² ² n Ä n ³ c ² n Ä n ³³ ~ and so 0 ker² ³. Theorem 14.19 Let = be a finite-dimensional vector space over a field - . For , the surjective linear map ¢ ; ²= ³ ¦ - ´ Á à Á µ defined by 4 ÁÃÁ n Ä n 5 ~ ÁÃÁ v Ä v has kernel 0 ~ º ²!³ c ! ! ; ²= ³Á : » and so ; ²= ³ - ´ Á à Á µ 0 The vector space ; ²= ³°0 is also referred to as the symmetric tensor space of degree of = . The ideal of ; ²= ³ defined by
Tensor Products
379
0 ~ º ²!³ c ! ! ; ²= ³Á : Á » being generated by homogeneous elements, is graded, so that B
0 ~ 0 ~
where 0 ~ ¸¹. The graded algebra B ; ²= ³ ; ²= ³ b 0 ~ 0 0 ~
is also called the symmetric tensor algebra for = and is isomorphic to - ´ Á Ã Á µ.
Before proceeding to the universal property, we note that the dimension of the symmetric tensor space :; ²= ³ is equal to the number of monomials of degree in the variables Á Ã Á and this is dim²:; ²= ³³ ~ 6
bc 7
The Universal Property for Symmetric -Linear Maps The vector space - ´% Á à Á % µ of homogeneous polynomials, and therefore also the isomorphic spaces of symmetric tensors :; ²= ³ and ; ²= ³°0 , have the universal property for symmetric -linear maps. Theorem 14.20 (The universal property for symmetric multilinear maps, as measured by linearity) Let = be a finite-dimensional vector space. Then the pair ²- ´% Á à Á % µÁ !¢ = d ¦ - ´% Á à Á % µ³, where !²# Á à Á # ³ ~ # v Ä v # has the universal property for symmetric -linear maps with domain = d , as measured by linearity. That is, for any symmetric -linear map ¢ = d ¦ < where < is a vector space, there is a unique linear map ¢ - ´% Á à Á % µ ¦ < for which ²# v Ä v # ³ ~ ²# Á à Á # ³ for any vectors # = . Proof. The universal property requires that ² v Ä v ³ ~ ² Á à Á ³ and this does indeed uniquely define a linear transformation , provided that it is well-defined. However,
380
Advanced Linear Algebra
v Ä v ~ v Ä v if and only if the multisets ¸ Á Ã Á ¹ and ¸ Á Ã Á ¹ are the same, which implies that ² Á Ã Á ³ ~ ² Á Ã Á ³, since is symmetric.
The Symmetrization Map When char²- ³ ~ , we can define a linear map :¢ ; ²= ³ ¦ :; ²= ³, called the symmetrization map, by :²!³ ~
²!³ [ :
(Since char²- ³ ~ we have [ £ .) Since ~ , we have ²:²!³³ ~
²!³ ~ ²!³ ~ ²!³ ~ :²!³ [ : [ : [ :
and so :²!³ is, in fact, symmetric. The reason for the factor °[ is that if # is a symmetric tensor, then ²#³ ~ # and so :²#³ ~
²#³ ~ # ~ # [ : [ :
that is, the symmetrization map fixes all symmetric tensors. It follows that for any tensor ! ; ²= ³ : ²!³ ~ :²:²!³³ ~ :²!³ Thus, : is idempotent and is therefore the projection map of ; ²= ³ onto im²:³ ~ :; ²= ³
The Antisymmetric Tensor Algebra: The Exterior Product Space Let us repeat our discussion for antisymmetric tensors. Before beginning officially, we want to introduce a very useful and very simple concept. Definition Let , ~ ¸ 0¹ be a set, which we refer to as an alphabet. A word, or string over , of finite length is a sequence $ ~ % Ä% where % , . There is one word of length over , , denoted by and called the empty word. Let M ²,³ be the set of all words over , of length and let M(E) be the set of all words over , (of finite length).
Tensor Products
381
Concatenation of words is done by placing one word after another: If # ~ & Ä& then $# ~ % Ä% & Ä& . Also, $ ~ $ ~ $. If the alphabet , is an ordered set, we say that a word $ over , is in ascending order if each , appears at most once in $ and if the order of the letters in $ is that given by the order of , . (For example, is in ascending order but and are not.) The empty word is in ascending order by definition. Let 7 ²,³ be the set of all words in ascending order over , of length and let 7(E) be the set of all words in ascending order over , .
We will assume throughout that char²- ³ £ . For each : , the multilinear map ¢ = d ¦ ; ²= ³ defined by ²# Á à Á # ³ ~ ²c³ #²³ n Ä n #²³ where ²c³ is the sign of , determines (by universality) a unique linear operator on ; ²= ³ for which ²# n Ä n # ³ ~ ²c³ #²³ n Ä n #²³ Note that if # ~ # for some £ , then for ~ ²Á ³, we have ²³ ²# n Ä n # n Ä n # n Ä n # ³ ~ c# n Ä n # n Ä n # n Ä n # ~ c# n Ä n # n Ä n # n Ä n # and since char²- ³ £ , we conclude that ²³ ²# n Ä n # ³ ~ . Let 8 ~ ¸ Á à Á ¹ be a basis for = . Since the set 8 ~ ¸ n Ä n 8 ¹ is a basis for ; ²= ³ and is a bijection of 8 , it follows that is an isomorphism of ; ²= ³. A tensor ! ; ²= ³ is antisymmetric if ²!³ ~ ²c³ ! for all permutations : . The set of all antisymmetric tensors (: ²= ³ ~ ¸! ; ²= ³ ²!³ ~ ²c³ ! for all : ¹ is a subspace of ; ²= ³. Let Á à Á be a basis for ; ²= ³. Any tensor # ; ²= ³ has the form
#~
ÁÃÁ n Ä n ÁÃÁ ~
where ÁÃÁ £ . As before, we define the groups .4 ²#³ and the sums
382
Advanced Linear Algebra
:4 ²#³. Each permutation sends an element ! .4 to another element of . , multiplied by ²c³ . It follows that # is an antisymmetric tensor if and only if the following hold: 1) If 4 is a multiset of size with underlying set : ¸Á à Á ¹ and if at least one element of 4 has multiplicity greater than , that is, if 4 is not a set, then .4 ²#³ ~ J. 2) For each subset 4 of size of ¸Á à Á ¹, we have .4 ²#³ ~ J
or .4 ²#³ ~ .4
3) If .4 ²#³ ~ .4 , then since 4 is a set, there is a unique member " ~ n Ä n of .4 for which Ä . If "Á! denotes the permutation in : for which "Á! takes " to !, then :4 ²#³ ~ 4 ²#³ h sg²"Á! ³! !.4
where 4 ²#³ is the absolute value of the coefficient of ". Hence, for an antisymmetric tensor #, we have # ~ 84 ²#³ sg²"Á! ³!9 4
!.4
Next, we need the counterpart of the polynomials - ´ Á à Á µ in which multiplication acts anticommutatively, that is, ~ c . To this end, we define a map ¢ M ²,³ ¦ 7 ²,³ r ¸¹ as follows. For ! ~ Ä , let ²!³ ~ if ! has any repeated variables. Otherwise, there is exactly one permutation of positions that will reorder ! in ascending order. If the resulting word in ascending order is ", denote this permutation by !Á" . Set ²!³ ~ sg²!Á" ³!Á" ²!³ ~ sg²!Á" ³" We can now define our anticommutative “polynomials.” Definition Let , ~ ² Á à Á ³ be a sequence of independent variables. Let -c ´ Á à Á µ be the vector space over - generated by all words over , in ascending order. For ~ , this is - , which we identify with - . Define a multiplication on the direct sum B
- c ~ - c ´ Á Ã Á µ ~ -c ~
as follows. For monomials ~ % Ä% -c and ~ & Ä& -c set
Tensor Products
383
~ ²% Ä% & Ä& ³ and extend by distributivity to - c . The resulting multiplication makes - c ´ Á Ã Á µ into a (noncommutative) algebra over - .
It is customary to use the notation w for the product in - c ´ Á Ã Á µ. This product is called the wedge product or exterior product. We will not name the algebra - c , but we will name an isomorphic algebra to be defined soon. Now we can define a function ¢ ; ²= ³ ¦ -c ´ Á Ã Á µ by 4 ÁÃÁ n Ä n 5 ~ ÁÃÁ ² w Ä w ³ It is clear that is well-defined, linear and surjective.
The Case char²- ³ ~ Just as with the symmetric tensors, if char²- ³ £ , we can benefit by restricting to the space (; ²= ³. Let ~ n Ä n
and ! ~ n Ä n
belong to the same group .4 and suppose that " ~ n Ä n .4 is in ascending order, that is, Ä . If # (; ²= ³ is antisymmetric, then ²#³ ~ 84 ²#³ sg²"Á! ³ ²!³9 4
!.4
~ 84 ²#³ sg²"Á! ³sg²!Á" ³"9 4
!.4
~ 84 ²#³ "9 4
!.4
~ 4 ²#³(.4 (" 4
Now, if ²#³ ~ for # (; ²= ³ then 4 ²#³(.4 ( ~ for all sets 4 and so, if char²- ³ ~ , we may conclude that 4 ²#³ ~ for all sets 4 , that is, # ~ . Hence, if char²- ³ ~ , the restricted map O(; ²= ³ is injective and so it is an isomorphism. We have proved the following. Theorem 14.21 Let = be a finite-dimensional vector space over a field - with char²- ³ ~ . Then the vector space (; ²= ³ of antisymmetric tensors of degree
384
Advanced Linear Algebra
is isomorphic to the vector space -c ´ Á Ã Á µ, via the isomorphism 4 ÁÃÁ n Ä n 5 ~ ÁÃÁ w Ä w
The vector space (; ²= ³ of antisymmetric tensors of degree is called the antisymmetric tensor space of degree for = or the exterior product space of degree over = . The direct sum B
(; ²= ³ ~ (; ²= ³ ~
is called the antisymmetric tensor algebra of = or the exterior algebra of = . We can use the vector space isomorphisms described in the previous theorem to move the product from the algebra - c ´ Á Ã Á µ to the antisymmetric tensor space (; ²= ³. In other words, if char²- ³ ~ then (; ²= ³ is a graded algebra isomorphic to the algebra - c ´ Á Ã Á µ.
The Arbitrary Case We can define the antisymmetric tensor space in a different manner that holds regardless of the characteristic of the base field. Consider the kernel of the map , as defined on all of ; ²= ³. Suppose that # ker² ³. Since sends elements of different groups .4 ²#³ to different monomials in - ´ Á Ã Á µ, it follows that must send each sum :4 ²#³ to ²:4 ²#³³ ~ Hence, the sum of the coefficients of the elements in .4 ²#³ must be . Conversely, if the sum of the coefficients of the elements in .4 ²#³ is for all multisets 4 , then # ker² ³. Suppose that 4 ~ ¸ Á Ã Á ¹ is a set for which " ~ n Ä n .4 ²#³ Then :4 ²#³ ~ n Ä n b ²c³ ² n Ä n ³
where the sum is over a subset of the symmetric group : , corresponding to the terms that appear in :4 ²#³ and where
Tensor Products
385
b ²c³ ~
Substituting for in the expression for :4 ²#³ gives :4 ²#³ ~ ´²c³ ² n Ä n ³ c ² n Ä n ³µ
It follows that # is in the subspace 0 of ; ²= ³ generated by tensors of the form ²c³ ²!³ c !, that is 0 ~ º²c³ ²!³ c ! ! ; ²= ³Á : » and so ker² ³ 0 . Conversely, ² ² n Ä n ³ c ² n Ä n ³³ ~ and so 0 ker² ³. Theorem 14.22 Let = be a finite-dimensional vector space over a field - . For , the surjective linear map ¢ ; ²= ³ ¦ -c ´ Á à Á µ defined by 4 ÁÃÁ n Ä n 5 ~ ÁÃÁ w Ä w has kernel 0 ~ º²c³ ²!³ c ! ! ; ²= ³Á : » and so ; ²= ³ -c ´ Á à Á µ 0 The vector space ; ²= ³°0 is also referred to as the antisymmetric tensor space of degree of = or the exterior algebra of degree of = . The ideal of ; ²= ³ defined by 0 ~ º²c³ ²!³ c ! ! ; ²= ³Á : Á » being generated by homogeneous elements, is graded, so that B
B
0 ~ 0 ~ 0 ~
~
where 0 ~ ¸¹. The graded algebra (; ²= ³ ~
B ; ²= ³ ; ²= ³ b 0 ~ 0 0 ~
386
Advanced Linear Algebra
is also called the antisymmetric tensor space of = or the exterior algebra of = and is isomorphic to - c ´ Á Ã Á µ.
The isomorphic exterior spaces (; ²= ³ and ; ²= ³°0 are usually denoted by = and the isomorphic exterior algebras (; ²= ³ and ; ²= ³°0 are usually denoted by = . Before proceding to the universal property, we note that the dimension of the exterior tensor space ²= ³ is equal to the number of words of length in ascending order over the alphabet , ~ ¸ Á Ã Á ¹ and this is dim² ²= ³³ ~ 6 7
The Universal Property for Antisymmetric -Linear Maps The vector space -c ´% Á à Á % µ and therefore also the isomorphic spaces of antisymmetric tensors ²= ³ and ; ²= ³°0 , have the universal property for symmetric -linear maps. Theorem 14.23 (The universal property for antisymmetric multilinear maps, as measured by linearity) Let = be a finite-dimensional vector space over a field - . Then the pair ²-c ´% Á à Á % µÁ !¢ = d ¦ -c ´% Á à Á % µ³ where !²# Á à Á # ³ ~ # w Ä w # has the universal property for antisymmetric -linear maps with domain = d , as measured by linearity. That is, for any antisymmetric -linear map ¢ = d ¦ < where < is a vector space, there is a unique linear map ¢ -c ´% Á à Á % µ ¦ < for which ²# w Ä w # ³ ~ ²# Á à Á # ³ for any vectors # = . Proof. Since is antisymmetric, it is completely determined by the fact that it is alternate and by its values on ascending words w Ä w , where Ä . Accordingly, we can define by ² w Ä w ³ ~ ² Á à Á ³ and this does indeed uniquely define a well-defined linear transformation .
Tensor Products
387
The Determinant The universal property for antisymmetric multilinear maps has the following corollary. Corollary 14.24 Let = be a vector space of dimension over a field - . Let , ~ ² Á Ã Á ³ be an ordered basis for = . Then there is a unique antisymmetric -linear form ¢ = d ¦ - for which ² Á Ã Á ³ ~ Proof. According to the universal property for antisymmetric -linear forms, for every such form ¢ = d ¦ - , there is a unique linear map ¢ = ¦ - for which ² w Ä w ³ ~ ² Á Ã Á ³ ~ But the dimension of = is 2 3 ~ and ¸ w Ä w ¹ is a basis for ²= ³. Hence, there is only one linear map ¢ = ¦ - with ² w Ä w ³ ~ . It follows that if and are two such forms, then ² Á Ã Á ³ ~ ² w Ä w ³ ~ ² Á Ã Á ³ and the antisymmetry of and imply that and agree on every permutation of ² Á Ã Á ³. Since and are multilinear, we must have ~ .
We now wish to construct the unique antisymmetric form guaranteed by the previous result. For any # = , write ´#µ,Á for the th coordinate of the coordinate matrix ´#µ, . Thus, # ~ ´#µ,Á
For clarity, and since we will not change the basis, let us write ´#µ for ´#µ,Á . Consider the map ¢ = d ¦ - defined by ²# Á Ã Á # ³ ~ ²c³ ´# µ() Ä´# µ() :
Then is multilinear since
388
Advanced Linear Algebra
²# b " Á Ã Á # ³ ~ ²c³ ´# b " µ() Ä´# µ() :
~ ²c³ ²´# µ²³ b ´" µ() ³Ä´# µ() :
~ ²c³ ´# µ²³ Ä´# µ() :
b ²c³ ´" µ²³ Ä´# µ() :
~ ²# Á à Á # ³ b ²" Á # Á à Á # ³ and similarily for any coordinate position. The map is alternating, and therefore antisymmetric since char²- ³ £ . To see this, suppose for instance that # ~ # . For any permutation : , let ²³ ~ and ²³ ~ . Then the permutation Z ~ ²³ satisfies 1) For % £ and % £ , Z ²%³ ~ ²%³ 2) Z ²³ ~ ²³²³ ~ ²³²³ ~ ~ ²³ 3) Z ²³ ~ ²³²³ ~ ²³²³ ~ ~ ²³. Hence, Z £ and it is easy to check that ²Z ³Z ~ . It follows that if the sets ¸Á Z ¹ and ¸Á Z ¹ intersect, then they are identical. In other words, the distinct sets ¸Á Z ¹ form a partition of : . Hence, ²# Á # Á à Á # ³ ~ ²c³ ´# µ() ´# µ() Ä´#µ() :
~
>²c³ ´# µ() ´# µ() Ä´# µ()
pairs ¸ÁZ ¹ Z
b ²c³ ´# µZ () ´# µZ () Ä´# µZ () ? But ´# µ() ´# µ() ~ ´# µZ () ´# µZ () Z
and since ²c³ ~ c²c³ , the sum of the two terms involving the pair ¸Á Z ¹ is . Hence, ²# Á # Á à Á # ³ ~ . A similar argument holds for any coordinate pair.
Tensor Products
389
Finally, we have ² Á Ã Á ³ ~ ²c³ ´ µ() Ä´ µ() :
~ ²c³ Á²³ ÄÁ²³ :
~ Thus, the map is indeed the unique antisymmetric -linear form on = d for which ² Á Ã Á ³ ~ . Given the ordered basis , ~ ² Á Ã Á ³, we can view = as the space - of coordinate vectors and view = d as the space 4 ²- ³ of d matrices, via the isomorphism ²# Á Ã Á # ³ ª
v ´# µ Å w ´# µ
Ä
´# µ y Å Ä ´# µ z
where all coordinate matrices are with respect to , . With this viewpoint, becomes an antisymmetric -form on the columns of a matrix ( ~ ²Á ³ given by ²(³ ~ ²c³ Á²³ ÄÁ²³ :
This is called the determinant of the matrix (.
Properties of the Determinant Let us explore some of the properties of the determinant function. Theorem 14.25 If ( 4 ²- ³ then ²(³ ~ ²(! ³. Proof. We know that ²(³ ~ ²c³ Á²³ ÄÁ²³ :
which can be written in the form ²(³ ~ ²c³ c ²²³³Á²³ Äc ²²³³Á²³ :
But we can reorder the factors in each term so that the second indices are in ascending order, giving
390
Advanced Linear Algebra
²(³ ~ ²c³ c ²³Á Äc ²³Á : c
~ ²c³ c ²³Á Äc ²³Á c :
~ ²c³ ²³Á IJ³Á :
~ ²(! ³ as desired.
Theorem 14.26 If (Á ) 4 ²- ³ then ²()³ ~ ²(³²)³. Proof. Consider the map ( ¢ 4 ²- ³ ¦ - defined by ( ²?³ ~ ²(?³ We can consider ( as a function on the columns of ? and write ( ¢ ²? ²³ Á Ã Á ? ²³ ³ ª ²(? ²³ Á Ã Á (? ²³ ³ ª ²(?³ Now, this map is multilinear since multiplication by ( is distributive and the determinant is multilinear. For example, let & - and let ? Z come from ? by replacing the first column by &. Then ( ²? ²³ b &Á Ã Á ? ²³ ³ ª ²(? ²³ b (&Á Ã Á (? ²³ ³ ª ²(?³ b ²(? Z ³ ~ ( ²?³ b ( ²? Z ³ The map ( is also alternating since is alternating and interchanging two coordinates in ²? ²³ Á Ã Á ? ²³ ³ is equivalent to interchanging the corresponding columns of (? . Thus, ( is an antisymmetric -linear form and so must be a scalar multiple of the determinant function, say ( ²?³ ~ ²?³. Then ²(?³ ~ ( ²?³ ~ ²?³ Setting ? ~ 0 gives ²(³ ~ and so ²(?³ ~ ²(³²?³ as desired.
If 7 4 ²- ³ is invertible, then 7 7 c ~ 0 and so ²7 ³²7 c ³ ~ which shows that ²7 ³ £ and ²7 c ³ ~ °²7 ³.
Tensor Products
391
But any matrix ( 4 ²- ³ is equivalent to a diagonal matrix ( ~ 7 +8 where 7 and 8 are invertible and + is diagonal with 's and 's on the main diagonal. Hence, ²(³ ~ ²7 ³²+³²8³ and so if ²(³ £ then ²+³ £ . But this can happen only if + ~ 0 , whence ( is invertible. We have proved the following. Theorem 14.27 A matrix ( 4 ²- ³ is invertible if and only if ²(³ £ .
Exercises Show that if ¢ > ¦ ? is a linear map and ¢ < d = ¦ > is bilinear then k ¢ < d = ¦ ? is bilinear. 2. Show that the only map that is both linear and -linear (for ) is the zero map. 3. Find an example of a bilinear map ¢ = d = ¦ > whose image im² ³ ~ ¸ ²"Á #³ "Á # = ¹ is not a subspace of > . 4. Prove that the universal property of tensor products defines the tensor product up to isomorphism only. That is, if a pair ²?Á ¢ < d = ¦ ?³ has the universal property then ? is isomorphic to < n = . 5. Prove that the following property of a pair ²> Á ¢ < d = ¦ > ³ with bilinear characterizes the tensor product ²< n = Á !¢ < d = ¦ < n = ³ up to isomorphism, and thus could have been used as the definition of tensor product: For a pair ²> Á ¢ < d = ¦ > ³ with bilinear if ¸" ¹ is a basis for < and ¸# ¹ is a basis for = then ¸²" Á # ³¹ is a basis for > . 6. Prove that < n = = n < . 7. Let ? and @ be nonempty sets. Use the universal property of tensor products to prove that can be extended to a linear function ¢ < n = ¦ > . Deduce that the function can be extended in a unique way to a bilinear map V ¢ < d = ¦ > . Show that all bilinear maps are obtained in this way. 10. Let : Á : be subspaces of < . Show that 1.
²: n = ³ q ²: n = ³ ²: q : ³ n = 11. Let : < and ; = be subspaces of vector spaces < and = , respectively. Show that ²: n = ³ q ²< n ; ³ : n ;
392
Advanced Linear Algebra
12. Let : Á : < and ; Á ; = be subspaces of < and = , respectively. Show that ²: n ; ³ q ²: n ; ³ ²: q : ³ n ²; n ; ³ 13. Find an example of two vector spaces < and = and a nonzero vector % < n = that has at least two distinct (not including order of the terms) representations of the form
% ~ " n # ~
where the " 's are linearly independent and so are the # 's. 14. Let ? denote the identity operator on a vector space ? . Prove that = p > ~ = n> . 15. Suppose that ¢ < ¦ = Á ¢ = ¦ > and ¢ < Z ¦ =2 Á ¢ =2 ¦ > Z . Prove that ² k ³ p ² k ³ ~ ² p ³ k ² p ³ 16. Connect the two approaches to extending the base field of an - -space = to 2 (at least in the finite-dimensional case) by showing that - n - 2 ²2³ . 17. Prove that in a tensor product < n < for which dim²< ³ not all vectors have the form " n # for some "Á # < . Hint: Suppose that "Á # < are linearly independent and consider " n # b # n ". 18. Prove that for the block matrix 4 ~>
(
) * ?block
we have ²4 ³ ~ ²(³²*³. 19. Let (Á ) 4 ²- ³. Prove that if either ( or ) is invertible, then the matrices ( b ) are invertible except for a finite number of 's.
The Tensor Product of Matrices 20. Let ( ~ ²Á ³ be the matrix of a linear operator B²= ³ with respect to the ordered basis 7 ~ ²" Á à Á " ³. Let ) ~ ²Á ³ be the matrix of a linear operator B²= ³ with respect to the ordered basis 8 ~ ²# Á à Á # ³. Consider the ordered basis 9 ~ ²" n # ³ ordered by lexicographic order, that is " n # "M n # if M or ~ M and . Show that the matrix of n with respect to 9 is p Á ) r ) ( n ) ~ r Á Å q Á )
Á ) Á ) Å Á )
Ä Ä
Á ) s Á ) u u Å Ä Á ) tblock
Tensor Products
21. 22. 23. 24.
25. 26.
393
This matrix is called the tensor product, Kronecker product or direct product of the matrix ( with the matrix ) . Show that the tensor product is not, in general, commutative. Show that the tensor product ( n ) is bilinear in both ( and ) . Show that ( n ) ~ if and only if ( ~ or ) ~ . Show that a) ²( n )³! ~ (! n ) ! b) ²( n )³i ~ (i n ) i (when - ~ d) Show that if "Á # - then (as row vectors) "! # ~ "! n #. Suppose that (Á Á )Á Á *Á and +Á are matrices of the given sizes. Prove that ²( n )³²* n +³ ~ ²(*³ n ²)+³
Discuss the case ~ ~ . 27. Prove that if ( and ) are nonsingular, then so is ( n ) and ²( n )³c ~ (c n ) c 28. Prove that tr²( n )³ ~ tr²(³ h tr²)³ 29. Suppose that - is algebraically closed. Prove that if ( has eigenvalues Á Ã Á and ) has eigenvalues Á Ã Á both lists including multiplicity then ( n ) has eigenvalues ¸ Á ¹, again counting multiplicity. 30. Prove that det²(Á n )Á ³ ~ ²det²(Á ³³ ²det²)Á ³³ .
Chapter 15
Positive Solutions to Linear Systems: Convexity and Separation
Given a matrix ( CÁ ²s³ consider the homogeneous system of linear equations (% ~ It is of obvious interest to determine conditions that guarantee the existence of positive solutions to this system, in a manner made precise by the following definition. Definition Let # ~ ² Á Ã Á ³ s . Then 1) # is nonnegative, written # if for all ~ Á Ã Á (Note that the term positive is also used in the literature for this property.) The set of all nonnegative vectors in s is the nonnegative orthant in s À 2) # is strictly positive, written # if # is nonnegative but not , that is, if for all ~ Á Ã Á and for at least one ~ Á Ã Á The set sb of all strictly positive vectors in s is the strictly positive orthant in s À 3) # is strongly positive, written # if for all ~ Á Ã Á The set sbb of all strongly positive vectors in s is the strongly positive orthant in s À
We are interested in conditions under which the system (% ~ has strictly positive or strongly positive solutions. Since the strictly and strongly positive orthants in s are not subspaces of s, it is difficult to use strictly linear
396
Advanced Linear Algebra
methods in studying this issue: we must also use geometric methods, in particular, methods of convexity. Let us pause briefly to consider an important application of strictly positive solutions to the system (% ~ . If ? ~ ²% Á Ã Á % ³ is a strictly positive solution then so is the vector &~
?~ ²% Á Ã Á % ³ ~ ² Á Ã Á ³ '% '%
which is a probability distribution. Note that if we replace “strictly” with “strongly” then the probability distribution has the property that each probability is positive. Now, the product (& is the expected value of the columns of ( with respect to the probability distribution &. Hence, (% ~ has a strictly (strongly) positive solution if and only if there is a strictly (strongly) positive probability distribution for which the columns of ( have expected value . If these columns represent the payoffs from a game of chance then the game is fair when the expected value of the columns is . Thus, (% ~ has a strictly (strongly) positive solution if and only if the “game” (, where in the strongly positive case, all outcomes are possible, is fair. As another (related) example, in discrete option pricing models of mathematical finance, the absence of arbitrage opportunities in the model is equivalent to the fact that a certain vector describing the gains in a portfolio does not intersect the strictly positive orthant in s . As we will see in this chapter, this is equivalent to the existence of a strongly positive solution to a homogeneous system of equations. This solution, when normalized to a probability distribution, is called a martingale measure. Of course, the equation (% ~ has a strictly positive solution if and only if ker²(³ contains a strictly positive vector, that is, if and only if ker²(³ ~ RowSpace²(³ meets the strictly positive orthant in s . Thus, we wish to characterize the subspaces : of s for which : meets the strictly positive orthant in s , in symbols : q sb £ J for these are precisely the row spaces of the matrices ( for which (% ~ has a strictly positive solution. A similar statement holds for strongly positive solutions. Looking at the real plane s , we can divine the answer with a picture. A onedimensional subspace : of s has the property that its orthogonal complement
Positive Solutions to Linear Systems: Convexity and Separation
397
: meets the strictly positive orthant (quadrant) in s if and only if : is the %axis, the &-axis or a line with negative slope. For the case of the strongly positive orthant, : must have negative slope. Our task is to generalize this to s . This will lead us to the following results, which are quite intuitive in s and s : q sbb £ J if and only if : q sb ~J
(15.1)
: q sb £ J if and only if : q sbb ~ J
(15.2)
and
Let us apply this to the matrix equation (% ~ . If : ~ RowSpace²(³ then : ~ ker²(³ and so we have ker²(³ q sbb £ J if and only if RowSpace²(³ q sb ~J
and ker²(³ q sb £ J if and only if RowSpace²(³ q sbb ~J
Now, RowSpace²(³ q sb ~ ¸#( #( ¹ and RowSpace²(³ q sbb ~ ¸#( #( ¹ and so these statements become (% ~ has a strongly positive solution if and only if ¸#( #( ¹ ~ J and (% ~ has a strictly positive solution if and only if ¸#( #( ¹ ~ J We can rephrase these results in the form of a theorem of the alternative, that is, a theorem that says that exactly one of two conditions holds. Theorem 15.1 Let ( CÁ ²s³. 1) Exactly one of the following holds: a) (" ~ for some strongly positive " s b) #( for some # s 2) Exactly one of the following holds: a) (" ~ for some strictly positive " s b) #( for some # s .
Before proving statements (15.1) and (15.2), we require some background.
398
Advanced Linear Algebra
Convex, Closed and Compact Sets We shall need the following concepts. Definition 1) Let % Á Ã Á % s . Any linear combination of the form ! % b Ä b ! % where ! b Ä b ! ~ Á ! is called a convex combination of the vectors % Á Ã Á % . 2) A subset ? s is convex if whenever %Á & ? then the entire line segment between % and & also lies in ? , in symbols ¸ % b !& b ! ~ Á Á ! ¹ ? 3) A subset ? s is closed if whenever ²% ³ is a convergent sequence of elements of ? , then the limit is also in ? . Simply put, a subset is closed if it is closed under the taking of limits. 4) A subset ? s is compact if it is both closed and bounded. 5) A subset ? s is a cone if % ? implies that % ? for all .
We will also have need of the following facts from analysis. 1) A continuous function that is defined on a compact set ? in s takes on its maximum and minimum values at some points within the set ? . 2) A subset ? of s is compact if and only if every sequence in ? has a subsequence that converges in ? . Theorem 15.2 Let ? and @ be subsets of s . Define ? b @ ~ ¸ b ?Á @ ¹ 1) If ? and @ are convex then so is ? b @ 2) If ? is compact and @ is closed then ? b @ is closed. Proof. For 1) let % b & and % b & be in ? b @ . The line segment between these two points is !²% b & ³ b ² c !³²% b & ³ ~ !% b ² c !³% b !& b ² c !³& ?b@ where ! and so ? b @ is convex. For part 2) let % b & be a convergent sequence in ? b @ . Suppose that % b & ¦ ' . We must show that ' ? b @ . Since % is a sequence in the compact set ? , it has a convergent subsequence % whose limit % lies in ? . Since b ¦ ' and ¦ % we can conclude that ¦ ' c %. Since @ is
Positive Solutions to Linear Systems: Convexity and Separation
399
closed, we must have ' c % @ and so ' ~ % b ²' c %³ ? b @ , as desired.
Convex Hulls We will have use for the notion of convex hull. Definition The convex hull of a set : ~ ¸% Á Ã Á % ¹ of vectors in s is the smallest convex set in s that contains the vectors % Á Ã Á % . We denote the convex hull of : by 9²:³.
Here is a characterization of convex hulls. Theorem 15.3 Let : ~ ¸% Á Ã Á % ¹ be a set of vectors in s . Then the convex hull 9²:³ is the set " of all convex combinations of vectors in : , that is, 9²:³ ~ " ¸! % b Ä b ! % ! Á '! ~ ¹ Proof. First, we show that " is convex. Let ? ~ ! % b Ä b ! % @ ~ % b Ä b % be convex combinations of : and let b ~ Á Á . Then ? b @ ~ ²! % b Ä b ! % ³ b ² % b Ä b ~ ²! b ³% b Ä b ²! b ³%
% ³
But this is also a convex combination of the vectors in : because ! b
² b ³max² Á ! ³ ~ max² Á ! ³
and
²! b ³ ~ ! b ~
~
~b ~
~
Thus, ?Á @ " ¬ ? b @ " which says that " is convex. Since : ", we have 9²:³ ". Clearly, if + is a convex set that contains : then + also contains ". Hence " 9²:³.
Theorem 15.4 The convex hull 9²:³ of a finite set : ~ ¸% Á Ã Á % ¹ of vectors in s is a compact set.
400
Advanced Linear Algebra
Proof. Let + ~ H²! Á Ã Á ! ³ ! and ! ~ I
and define a function ¢ + ¦ s as follows. If ! ~ ²! Á Ã Á ! ³ then ²!³ ~ ! % b Ä b ! % To see that is continuous, let , if ) c !) °4 then
~ ² Á Ã Á
³
( c ! ( ) c !)
and let 4 ~ max²)% )³. For
4
and so ) ² ³ c ²!³) ~ )² c ! ³% b Ä b ² c ! ³% ) ( c ! ()% ) b Ä b ( c ! ()% ) 4 ) c !) ~ Finally, since maps the compact set + onto 9²:³, we deduce that 9²:³ is compact.
Linear and Affine Hyperplanes We next discuss hyperplanes in s . A linear hyperplane in s is an ² c ³dimensional subspace of s . As such, it is the solution set of a linear equation of the form % b Ä b % ~ or º5 Á %» ~ where 5 ~ ² Á Ã Á ³ is nonzero and % ~ ²% Á Ã Á % ³. Geometrically speaking, this is the set of all vectors in s that are perpendicular (normal) to the vector 5 . An (affine) hyperplane is a linear hyperplane that has been translated by a vector. Thus, it is the solution set to an equation of the form ²% c ³ b Ä b ²% c ³ ~ or % b Ä b % ~ b Ä or finally
Positive Solutions to Linear Systems: Convexity and Separation
401
º5 Á %» ~ º5 Á )» where ) ~ ² Á Ã Á ³. Let us write >²5 Á ³, where 5 s and s, to denote the hyperplane >²5 Á ³ ~ ¸% s º5 Á %» ~ ¹ Note that the hyperplane >²5 Á )5 ) ³ ~ ¸% s º5 Á %» ~ )5 ) ¹ contains the point 5 , which is the point of >²5 Á ³ closest to the origin, since Cauchy's inequality gives )5 ) ~ º5 Á %» )5 ))%) and so )5 ) )%) for all % >²5 Á )5 ) ³. Moreover, any hyperplane has the form >²5 Á )5 ) ³ for any appropriate vector 5 . A hyperplane defines two (nondisjoint) closed half-spaces >b ²5 Á ³ ~ ¸% s º5 Á %» ¹ >c ²5 Á ³ ~ ¸% s º5 Á %» ¹ and two (disjoint) open half-spaces >kb ²5 Á ³ ~ ¸% s º5 Á %» ¹ >kc ²5 Á ³ ~ ¸% s º5 Á %» ¹ It is not hard to show that >b ²5 Á ³ q >c ²5 Á ³ ~ >²5 Á ³ k and that >kb ²5 Á ³, >c ²5 Á ³ and >²5 Á ³ are pairwise disjoint and k ²5 Á ³ r >²5 Á ³ ~ s >kb ²5 Á ³ r >c
Definition The subsets ? and @ of s are strictly separated by a hyperplane >²5 Á ³ if ? lies in one open half-space determined by >²5 Á ³ and @ lies in the other. Thus, one of the following holds: 1) º5 Á %» º5 Á &» for all % ?Á & @ 2) º5 Á &» º5 Á %» for all % ?Á & @ .
Note that 1) holds for 5 and if and only if 2) holds for c5 and c , and so we need only consider one of the conditions to demonstrate that two sets ? and @ are not stricctly separated. In particular, suppose that 1) fails for all 5 and . Then the condition
402
Advanced Linear Algebra
ºc5 Á &» c ºc5 Á %» also fails and so 1) and 2) both fail for all 5 and and ? and @ are not strictly separated. The following type of separation is stronger than strict separation. Definition The subsets ? and @ of s are strongly separated by a hyperplane >²5 Á ³ if there is an for which one of the following holds: 1) º5 Á %» c b º5 Á &» for all % ?Á & @ 2) º5 Á &» c b º5 Á %» for all % ?Á & @
Note that, as before, we need only consider one of the conditions to show that two sets are not strongly separated.
Separation Now that we have the preliminaries out of the way, we can get down to some theorems. The first is a well known separation theorem that is the basis for many other separation theorems. It says that if a closed convex set * in s does not contain a vector , then * can be strongly separated from by a hyperplane. Theorem 15.5 Let * be a closed convex subset of s . 1) * contains a unique vector 5 of minimum norm, that is, there is a unique vector 5 * for which ) 5 ) )% ) for all % *Á % £ 5 . 2) If * does not contain the origin then * lies in the closed half-space º5 Á %» )5 ) where 5 £ is the vector in * of minimum norm. Hence, and * are strongly separated by the hyperplane >²5 Á )5 ) °³. 3) If ¤ * then and * are strongly separated. Proof. For part 1), we first show that * contains a vector 5 of minimum norm. Recall that the Euclidean norm (distance) is a continuous function. Although * need not be compact, if we choose a real number such that the closed ball ) ²³ ~ ¸' s )' ) ¹ intersects * , then that intersection * Z ~ * q ) ²³ is both closed and bounded and so is compact. The distance function therefore achieves its minimum on * Z , say at the point 5 * Z * . It is clear that if for some # * we have )#) )5 ) then # ))5 ) ²³ * Z , which is a contradiction to the minimality of 5 . Hence, 5 is a vector of minimum norm in * . Let us write )5 ) ~ .
Positive Solutions to Linear Systems: Convexity and Separation
403
Suppose now that % £ 5 is another vector in * with )%) ~ . Since * is convex, the line segment from 5 to % must be contained in * . In particular, the vector ' ~ ²°³²% b 5 ³ is in * . Since % cannot be a scalar multiple of 5 , the Cauchy-Schwarz inequality is strict º5 Á %» )5 ))%) ~ Hence )% b 5 ) ~ ²P5 P b º5 Á %» b P%P ³ ~ ² b º5 Á %»³
P'P ~
But this contradicts the minimality of and so % ~ 5 . Thus, * has a unique vector of minimum norm. For part 2), if there is an % * for which º5 Á %» )5 ) ~ then again setting ' ~ ²°³²% b 5 ³ * we find that ' has norm less than , which is not possible. Hence, º5 Á %» )5 ) for all % * . For part 3), if ¤ * is not the origin, then is not in the closed convex set * c ¸¹ ~ ¸ c *¹ Hence, by part 2), there is a nonzero 5 * for which º5 Á %» )5 ) for all % * c ¸¹. But as % ranges over * c ¸¹, the vector % c ranges over * so we have º5 Á % c » )5 ) for all % * . This can be written º5 Á %» )5 ) b º5 Á » for all % * . Hence
404
Advanced Linear Algebra
º5 Á » )5 ) b º5 Á » º5 Á %» from which it follows that and * are strongly separated.
The next result brings us closer to our goal by replacing the origin with a subspace : disjoint from * . However, we must strengthen the requirements on * a bit. Theorem 15.6 Let * be a compact convex subset of s and let : be a subspace of s such that * q : ~ J. Then there exists a nonzero 5 : such that º5 Á %» )5 ) for all % * . Hence, the hyperplane >²5 Á )5 ) °³ strongly separates : and *. Proof. Consider the set : b * , which is closed since : is closed and * is compact. It is also convex since : and * are convex. Furthermore, ¤ : b * because if ~ b then ~ c would be in * q : ~ J. According to Theorem 15.5, the set : b * can be strongly separated from the origin. Hence, there is a nonzero 5 s such that º5 Á %» )5 ) for all % ~ b : b * , that is, º5 Á » b º5 Á » ~ º5 Á b » )5 ) for all : and * . Now, if º5 Á » is nonzero for any : , we can replace by an appropriate scalar multiple of to make the left side negative, which is impossible. Hence, we must have º5 Á » ~ for all : . Thus, 5 : and º5 Á » )5 ) for all * , as desired.
We can now prove (15.1) and (15.2). Theorem 15.7 Let : be a subspace of s . Then 1) : q sb ~ J if and only if : q sbb £ J 2) : q sbb ~ J if and only if : q sb £J Proof. For part 1), it is clear that there cannot exist vectors " sbb and # sb that are orthogonal. Hence, : q sb and : q sbb cannot both be nonempty, so if : q sbb £ J then : q sb ~ J. The converse is more interesting.
Positive Solutions to Linear Systems: Convexity and Separation
405
Suppose that : q sb ~ J. A good candidate for an element of : q sbb would be a normal to a hyperplane that separates : from a subset of sb . Note that our theorems do not allow us to separate : from sb , because it is not compact. So consider instead the convex hull " of the standard basis vectors Á Ã Á in sb " ~ ¸! b Ä b ! ! Á '! ~ ¹ It is clear that " is convex and " sb and so " q : ~ J. Also, " is closed and bounded and therefore compact. Hence, by Theorem 15.6, there is a nonzero vector 5 ~ ² Á Ã Á ³ : such that º5 Á » )5 ) for all "À Taking ~ gives ~ º5 Á » )5 ) and so 5 : q sbb , which is therefore nonempty. To prove part 2), again we note that there cannot exist vectors " sbb and # sb that are orthogonal. Hence, : q sbb and : q sb cannot both be nonempty, so if : q sb £ J then : q sbb ~ J. To prove that : q sbb ~ J ¬ : q sb £J
note first that a subspace contains a strictly positive vector 5 if and only if it contains a strictly positive vector whose coordinates sum to . Let 8 ~ ¸) Á à Á ) ¹ be a basis for : and consider the matrix 4 ~ ²Á ³ ~ ²) ) Ä ) ³ whose columns are the basis vectors in 8 . Let the rows of 4 be denoted by 9 Á à Á 9 . Note that 9 s , where ~ dim²:³. Now, : contains a strictly positive vector 5 ~ ² Á à Á ³ if and only if 9 b Ä b 9 ~ for coefficients satisfying ' ~ , that is, if and only if is contained in the convex hull * of the vectors 9 Á à Á 9 in s . Hence, : q sb £ J if and only if * Thus, we wish to prove that : q sbb ~ J ¬ * or, equivalently,
406
Advanced Linear Algebra
£ * ¬ : q sbb £ J Now we have something to separate. Since * is closed and convex, it follows from Theorem 15.5 that there is a nonzero vector ) ~ ² Á Ã Á ³ s for which º)Á %» )) ) for all % * . Consider the vector # ~ ) b Ä b ) : The th coordinate of # is Á b Ä b Á ~ º)Á 9 » )) ) and so # is strongly positive. Hence, # : q sbb and so this set is nonempty. This completes the proof.
Nonhomogeneous Systems We now turn our attention to nonhomogeneous systems (% ~ The following lemma is required. Lemma 15.8 Let ( CÁ ²s³. Then the set * ~ ¸(& & s Á & ¹ is a closed, convex cone. Proof. We leave it as an exercise to prove that * is a convex cone and omit the proof that * is closed.
Theorem 15.9 (Farkas's lemma) Let ( CÁ ²s³ and let s be nonzero. Then exactly one of the following holds: 1) There is a strictly positive solution " s to the system (% ~ . 2) There is a vector # s for which #( and º#Á » . Proof. Suppose first that 1) holds. If 2) also holds, then ²#(³" ~ #²("³ ~ º#Á » However, #( and " imply that ²#(³" . This contradiction implies that 2) cannot hold. Assume now that 1) fails to hold. By Lemma 15.8, the set * ~ ¸(& & s Á & ¹ s is closed and convex. The fact that 1) fails to hold is equivalent to ¤ * .
Positive Solutions to Linear Systems: Convexity and Separation
407
Hence, there is a hyperplane that strongly separates and * . All we require is that and * be strictly separated, that is, for some s and # s º#Á %» º#Á » for all % * Since * it follows that and so º#Á » . Also, the first inequality is equivalent to º#Á (&» , that is, º(! #Á &» for all & s Á & . We claim that this implies that (! # cannot have any positive coordinates and thus #( . For if the th coordinate ²(! #³ is positive, then taking & ~ for we get ²(! #³ which does not hold for large . Thus, 2) holds.
In the exercises, we ask the reader to show that the previous result cannot be improved by replacing #( in statement 2) with #( .
Exercises 1. 2. 3.
If ( is an d matrix prove that the set ¸(% % s Á % ¹ is a convex cone in s . If ( and ) are strictly separated subsets of s and if ( is finite, prove that ( and ) are strongly separated as well. Let = be a vector space over a field - with char²- ³ £ . Show that a subset ? of = is closed under the taking of convex combinations of any two of its points if and only if ? is closed under the taking of arbitrary convex combinations, that is, for all
% Á Ã Á % ?Á ~ Á ¬ % ? ~
4. 5À
~
Explain why an ² c ³-dimensional subspace of s is the solution set of a linear equation of the form % b Ä b % ~ . Show that >b ²5 Á ³ q >c ²5 Á ³ ~ >²5 Á ³ k and that >kb ²5 Á ³, >c ²5 Á ³ and >²5 Á ³ are pairwise disjoint and k ²5 Á ³ r >²5 Á ³ ~ s >kb ²5 Á ³ r >c
6.
A function ; ¢ s ¦ s is affine if it has the form ; ²#³ ~ ²#³ b for s , where B²s Á s ³. Prove that if * s is convex then so is ; ²*³ s .
408
7.
8. 9.
10.
11. 12. 13.
Advanced Linear Algebra Find a cone in s that is not convex. Prove that a subset ? of s is a convex cone if and only if %Á & ? implies that % b & ? for all Á . Prove that the convex hull of a set ¸% Á Ã Á % ¹ in s is bounded, without using the fact that it is compact. Suppose that a vector % s has two distinct representations as convex combinations of the vectors # Á Ã Á # . Prove that the vectors # c # Á Ã Á # c # are linearly dependent. Suppose that * is a nonempty convex subset of s and that >²5 Á ³ is a hyperplane disjoint from * . Prove that * lies in one of the open half-spaces determined by >²5 Á ³. Prove that the conclusion of Theorem 15.6 may fail if we assume only that * is closed and convex. Find two nonempty convex subsets of s that are strictly separated but not strongly separated. Prove that ? and @ are strongly separated by >²5 Á ³ if and only if º5 Á %Z » for all %Z ? and º5 Á &Z » for all &Z @
where ? ~ ? b )²Á ³ and @ ~ @ b )²Á ³ and where )²Á ³ is the closed unit ball. 14. Show that Farkas's lemma cannot be improved by replacing #( in statement 2) with #( . Hint: A nice counterexample exists for ~ Á ~ .
Chapter 16
Affine Geometry
In this chapter, we will study the geometry of a finite-dimensional vector space = , along with its structure-preserving maps. Throughout this chapter, all vector spaces are assumed to be finite-dimensional.
Affine Geometry The cosets of a quotient space have a special geometric name. Definition Let : be a subspace of a vector space = . The coset # b : ~ ¸# b
:¹
is called a flat in = with base : . We also refer to # b : as a translate of : . The set 7²= ³ of all flats in = is called the affine geometry of = . The dimension dim²7²= ³³ of 7²= ³ is defined to be dim²= ³.
Here are some simple yet useful observations about flats. 1) A flat ? ~ % b : is a subspace if and only if % ~ , that is, if and only if ? ~ :. 2) A subset ? is a flat if and only if for any % ? the translate : ~ c% b ? is a subspace. 3) If ? ~ % b : is a flat and ' b ? then ' b ? ~ : . Definition Two flats % ~ % b : and @ ~ & b ; are said to be parallel if : ; or ; : . This is denoted by ? @ .
We will denote subspaces of = by the letters :Á ; Á Ã and flats in = by ?Á @ Á Ã . Here are some of the basic intersection properties of flats.
410
Advanced Linear Algebra
Theorem 16.1 Let : and ; be subspaces of = and let ? ~ % b : and @ ~ & b ; be flats in = . 1) The following are equivalent: a) % b : ~ & b : ) % & b : c) % c & : 2) The following are equivalent: a) $ b ? @ for some $ = b) # b : ; for some # = c) : ; 3) The following are equivalent: a) $ b ? ~ @ for some $ = b) # b : ~ ; for some # = c) : ~ ; 4) ? q @ £ JÁ : ; ¯ ? @ 5) ? q @ £ JÁ : ~ ; ¯ ? ~ @ 6) If ? @ then ? @ , @ ? or ? q @ ~ J 7) ? @ if and only if some translation of one of these flats is contained in the other. Proof. We leave proof of part 1) for the reader. To prove 2), if 2a) holds then c& b $ b % b : ; and so 2b) holds. Conversely, if 2b) holds then & b # c % b ²% b :³ & b ; ~ @ and so 2a) holds. Now, if 2b) holds then # ~ # b ; and so : c# b ; ; , which is 2c). Finally, if : ; then just take # ~ to get 2b). Part 3) is proved in a similar manner. For part 4), we know that $ b ? @ for some $ = . However, if ' ? q @ then $ b ' @ and so $ c' b @ ~ @ , whence ? @ . Part 5) follows similarly. We leave proof of 6) and 7) to the reader.
Part 1) of the previous theorem says that, in general, a flat can be represented in many ways as a translate of the base : . If ? ~ % b : , then % is called a flat representative, or coset representative of ? . Any element of a flat is a flat representative. On the other hand, if % b : ~ & b ; then % c & b : ~ ; and the previous theorem tells us that : ~ ; . Thus, the base of a flat is uniquely determined by the flat and we can make the following definition. Definition The dimension of a flat % b : is dim²:³. A flat of dimension is called a -flat. A -flat is a point, a -flat is a line and a 2-flat is a plane. A flat of dimension dim²7²= ³³ c is called a hyperplane.
Affine Geometry
411
Affine Combinations If - and b Ä b ~ then the linear combination % b Ä b % is referred to as an affine combination of the vectors % Á Ã Á % . Our immediate goal is to show that, while the subspaces of = are precisely the subsets of = that are closed under the taking of linear combinations, the flats of = are precisely the subsets of = that are closed under the taking of affine combinations. First, we need the following. Theorem 16.2 If char²- ³ £ , then the following are equivalent for a subset ? of = . 1) ? is closed under the taking of affine combinations of any two of its points, that is, %Á & ? ¬ % b ² c ³& ? 2) ? is closed under the taking of arbitrary affine combinations, that is, % Á Ã Á % ?Á b Ä b ~ ¬ % b Ä b % ? Proof. It is clear that 2) implies 1). For the converse, we proceed by induction on . Part 1) is the case ~ . Assume the result true for c and consider the affine combination ' ~ % b Ä b % If one of or is different from , say £ then we may write ' ~ % b ² c ³4
% b Ä b % 5 c c
and since the sum of the coefficients inside the large parentheses is , the induction hypothesis implies that this sum is in ? . Then 1) shows that ' ? . On the other hand, if ~ ~ then since char²- ³ £ 2, we may write ' ~ > % b % ? b 3 % 3 b Ä b % and since 1) implies that ²°³% b ²°³% ? , we may again deduce from the induction hypothesis that ' ? . In any case, ' ? and so 2) holds.
Note that the requirement char²- ³ £ is necessary, for if - ~ { then the subset
412
Advanced Linear Algebra
? ~ ¸²Á ³Á ²Á ³Á ²Á ³¹ satisfies condition 1) but not condition 2). We can now characterize flats. Theorem 16.3 1) A subset ? of = is a flat in = if and only if it is closed under the taking of affine combinations, that is, if and only if % Á à Á % ?Á b Ä b ~ ¬ % b Ä b % ? 2) If char²- ³ £ 2, a subset ? of = is a flat if and only if % contains the line through any two of its points, that is, if and only if %Á & ? ¬ % b ² c ³& ? Proof. Suppose that ? ~ % b : is a flat and % Á à Á % ? . Then % ~ % b , for : and so if ' ~ , we have % ~ ²% b ³ ~ % b
%b:
and so ? is closed under affine combinations. Conversely, suppose that ? is closed under the taking of affine combinations. It is sufficient (and necessary) to show that for a given % ? , the set : ~ c% b ? is a subspace of = . But if c% b % Á c% b % : then for any Á ²c% b % ³ b ²c% b % ³ ~ c² b ³% b % b % ~ c% b ´² c c ³% b % b % µ c% b ? Hence, : is a subspace of = . Part 2) follows from part 1) and Theorem 16.2.
Affine Hulls The following definition gives the analog of the subspace spanned by a collection of vectors. Definition Let * be a nonempty set of vectors in = . The affine hull of * , denoted by hull²*³, is the smallest flat containing * .
Theorem 16.4 The affine hull of a nonempty subset * of = is the affine span of * , that is, the set of all affine combinations of vectors in *
hull²*³ ~ D % c Á % Á Ã Á % *Á ~ E ~
~
Affine Geometry
413
Proof. According to Theorem 16.3, any flat containing * must contain all affine combinations of vectors in * . It remains to show that the set ? of all affine combinations of * is a flat, or equivalently, that for any & ? , the set : ~?c& is a subspace of = . To this end, let
& ~ Á % Á ~
& ~ Á %
and & ~ 2Á %
~
~
where % * and 'Á ~ 'Á ~ 'Á ~ . Hence, any linear combination of & c & and & c & has the form ' ~ ²& c &³ b !²& c &³
~ Á % b !2Á % c ² b !³& ~
~
~ ² Á b !2Á ³% c ² b ! c 1)& c & ~
~ 4 Á b !2Á c ² b ! c 1)Á 5% c & ~
But, since 'Á ~ 'Á ~ 'Á ~ , the sum of the coefficients of % is equal to and so the sum is an affine sum, which shows that ' : . Hence, : is a subspace of = .
The affine hull of a finite set of vectors is denoted by hull¸% Á Ã Á % ¹. We leave it as an exercise to show that for any hull¸% Á Ã Á % ¹ ~ % b º% c % Á Ã Á %c c % Á %b c % Á Ã Á % c % »
(16.1)
where º» denotes the subspace spanned by the vectors within the brackets. It follows that dim²hull¸% Á à Á % ¹³ c The affine hull of a pair of distinct points is the line through those points, denoted by %& ~ ¸% b ² c ³& - ¹ ~ & b º% c &»
The Lattice of Flats Since flats are subsets of = , they are partially ordered by set inclusion.
414
Advanced Linear Algebra
Theorem 16.5 The intersection of a nonempty collection 9 ~ ¸% b : 2¹ of flats in = is either empty or is a flat. If the intersection is nonempty, then
²% b : ³ ~ % b : 2
2
for any vector % in the intersection. In other words, the base of the intersection is the intersection of the bases. Proof. If % ²% b : ³ 2
then % b : ~ % b : for all 2 and so
²% b : ³ ~ ²% b : ³ ~ % b : 2
2
2
Definition The join of a nonempty collection 9 ~ ¸% b : 2¹ of flats in = is the smallest flat containing all flats in 9. We denote the join of the collection 9 of flats by 9 , or by ²% b : ³ 2
The join of two flats is written ²% b :³ v ²& b ; ³.
Theorem 16.6 Let 9 ~ ¸% b : 2¹ be a nonempty collection of flats in the vector space = . 1) 9 is the intersection of all flats that contain all flats in 9 . 2) 9 is hull²9³, where 9 is the union of all flats in 9 .
Theorem 16.7 Let ? ~ % b : and @ ~ & b ; be flats in = . Then 1) ? v @ ~ % b ²º% c &» b : b ; ³ 2) If ? q @ £ J then ? v @ ~ % b ²: b ; ³ Proof. For part 1), let ? v @ ~ ²% b :³ v ²& b ; ³ ~ ' b < for some ' = and subspace < of = . Of course, : b ; < . But since %Á & ' b < , we also have % c & < . (Note that % c & is not necessarily in : b ; .) Let > ~ º% c &» b : b ; . Then > < and so % b > % b < ~ ? v @ . On the other hand
Affine Geometry
415
? ~%b: %b> and @ ~ & b ; ~ % c ²% c &³ b ; % b > and so ? v @ % b > . Thus, ? v @ ~ % b > , as desired. For part 2), if ? q @ £ J then we may take the flat representatives for ? and @ to be any element ' ? q @ , in which case part 1) gives ? v @ ~ ' b ²º' c '» b : b ; ³ ~ ' b : b ; and since % ? v @ we also have ? v @ ~ % b : b ; .
We can now describe the dimension of the join of two flats. Theorem 16.8 Let ? ~ % b : and @ ~ & b ; be flats in = . 1) If ? q @ £ J then dim²? v @ ³ ~ dim²: b ; ³ ~ dim²?³ b dim²@ ³ c dim²? q @ ³ 2) If ? q @ ~ J then dim²? v @ ³ ~ dim²: b ; ³ b Proof. According to Theorem 16.7, if ? q @ £ J then ?v@ ~%b: b; and so by definition of the dimension of a flat dim²? v @ ³ ~ dim²: b ; ³ On the other hand, if ? q @ ~ J then ? v @ ~ % b º% c &» b : b ; and since dim²º% c &»³ ~ , we get dim²? v @ ³ ~ dim²: b ; ³ b Finally, we have dim²: b ; ³ ~ dim²:³ b dim²; ³ c dim²: q ; ³ Therefore, if ' ²% b :³ q ²& b ; ³ then % b : ~ ' b : and & b ; ~ ' b ; and so dim²: q ; ³ ~ dim²' b ´: q ; µ³ ~ dim²´' b :µ q ´' b ; µ³ ~ dim²? q @ ³
416
Advanced Linear Algebra
Affine Independence We now discuss the affine counterpart of linear independence. Theorem 16.9 Let ? ~ ¸% Á Ã Á % ¹ be a nonempty set of vectors in = . The following are equivalent: 1) / ~ hull¸% Á Ã Á % ¹ has dimension c . 2) The set ¸% c % Á Ã Á %c c % Á %b c % Á Ã Á % c % ¹ is linearly independent for all ~ Á Ã Á . 3) % ¤ hull¸% Á Ã Á %c Á %b Á Ã Á % ¹ for all ~ Á Ã Á . 4) If 'r % and 's % are affine combinations then % ~ % ¬ ~
for all
A set ? ~ ¸% Á Ã Á % ¹ of vectors satisfying any (any hence all) of these conditions is said to be affinely independent. Proof. The fact that 1) and 2) are equivalent follows directly from (16.1). If 3) does not hold, we have hull¸% Á Ã Á % ¹ ~ hull¸% Á Ã Á %c Á %b Á Ã Á % ¹ where by (16.1), the latter has dimension at most c 2. Hence, 1) cannot hold and so 1) implies 3). Next we show that 3) implies 4). Suppose that 3) holds and that ' % ~ ' % . Setting ! ~ c gives ! % ~ and ! ~
But if any of the ! 's are nonzero, say ! £ then dividing by ! gives % b ²! °! ³% ~
or % ~ c²! °! ³%
where c²! °! ³ ~
Affine Geometry
417
Hence, % hull¸% Á Ã Á % ¹. This contradiction implies that ! ~ for all , that is, ~ for all . Thus, 3) implies 4). Finally, we show that 4) implies 2). For concreteness, let us show that 4) implies that ¸% c % Á Ã Á % c % ¹ is linearly independent. Indeed, if Á Ã Á and ' ~ then ²% c % ³ ~ ¬ % ~ % ¬ ² c ³% b % ~ % 2
2
2
But the latter is an equality between two affine combinations and so corresponding coefficients must be equal, which implies that ~ for all ~ Á Ã Á . This shows that 4) implies 2).
Affinely independent sets enjoy some of the basic properties of linearly independent sets. For example, a nonempty subset of an affinely independent set is affinely independent. Also, any nonempty set ? contains an affinely independent set. Since the affine hull / ~ hull²?³ of an affinely independent set ? is not the affine hull of any proper subset of ? , we deduce that ? is a minimal affine spanning set of its affine hull. Note that if / ~ hull²?³ where ? ~ ¸% Á Ã Á % ¹ is affinely independent then for any , the set ¸% c % Á Ã Á %c c % Á %b c % Á Ã Á % c % ¹ is a basis for the base subspace of / . Conversely, if / ~ % b : £ : and ) ~ ¸ Á Ã Á ¹ is a basis for : then ) Z ~ ¸%Á % b Á Ã Á % b ¹ is affinely independent and since hull²) Z ³ has dimension and is contained in % b : we must have ) Z ~ / . This provides a way to go between “affine bases” ) Z of a flat and linear bases ) of the base subspace of the flat. Theorem 16.10 If ? is a flat of dimension then there exist b vectors % Á Ã Á %b for which every vector % ? has a unique expression as an affine combination % ~ % b Ä b b %b The coefficients are called the barycentric coordinates of % with respect to the vectors % Á Ã Á %b .
Affine Transformations Now let us discuss some properties of maps that preserve affine structure.
418
Advanced Linear Algebra
Definition A function ¢ = ¦ = that preserves affine combinations, that is, for which ~ ¬ 4 % 5 ~ ²% ³
is called an affine transformation (or affine map, or affinity).
We should mention that some authors require that be bijective in order to be an affine map. The following theorem is the analog of Theorem 16.2. Theorem 16.11 If char²- ³ £ then a function ¢ = ¦ = is an affine transformation if and only if it preserves affine combinations of any two of its points, that is, if and only if ²% b ² c ³&³ ~ r ²%³ b ² c ³ ²&³
Thus, if char²- ³ £ then a map is an affine transformation if and only if it sends the line through % and & to the line through ²%³ and ²&³. It is clear that linear transformations are affine transformations. So are the following maps. Definition Let # = . The affine map ;# ¢ = ¦ = defined by ;# ²%³ ~ % b # for all % = , is called translation by #.
It is not hard to see that any map of the form “linear operator followed by translation,” that is, ;# k , where B²= ³, is affine. Conversely, any affine map must have this form. Theorem 16.12 A function ¢ = ¦ = is an affine transformation if and only if it is a linear operator followed by a translation, ~ ;# k where # = and B²= ³. Proof. We leave proof that ;# k is an affine transformation to the reader. Conversely, suppose that is an affine map. If we expect to have the form ;# k then ²³ will equal ;# k ²³ ~ ;# ²³ ~ #. So let # ~ ²³. We must show that ~ ;c ²³ k is a linear operator on = . However, for any % = ²%³ ~ ²%³ c ²³ and so
Affine Geometry
419
²" b #³ ~ ²" b #³ c ²³ ~ ²" b # b ² c c ³³ c ²³ ~ ²"³ b ²#³ b ² c c ³ ²³ c ²³ ~ ²"³ b ²#³ Thus, is linear.
Corollary 16.13 1) The composition of two affine transformations is an affine transformation. 2) An affine transformation ~ ;# k is bijective if and only if is bijective. 3) The set aff²= ³ of all bijective affine transformations on = is a group under composition of maps, called the affine group of = .
Let us make a few group-theoretic remarks about aff²= ³. The set trans²= ³ of all translations of = is a subgroup of aff²= ³. We can define a function ¢ aff²= ³ ¦ B²= ³ by ²;# k ³ ~ It is not hard to see that is a well-defined group homomorphism from aff²= ³ onto B²= ³, with kernel trans²= ³. Hence, trans²= ³ is a normal subgroup of aff²= ³ and aff²= ³ B²= ³ trans²= ³
Projective Geometry If dim²= ³ ~ , the join (affine hull) of any two distinct points in = is a line. On the other hand, it is not the case that the intersection of any two lines is a point, since the lines may be parallel. Thus, there is a certain asymmetry between the concepts of points and lines in = . This asymmetry can be removed by constructing the projective plane. Our plan here is to very briefly describe one possible construction of projective geometries of all dimensions. By way of motivation, let us consider Figure 16.1.
420
Advanced Linear Algebra
Figure 16.1 Note that / is a hyperplane in a 3-dimensional vector space = and that ¤ / . Now, the set 7²/³ of all flats of = that lie in / is an affine geometry of dimension . (According to our definition of affine geometry, / must be a vector space in order to define 7²/³. However, we hereby extend the definition of affine geometry to include the collection of all flats contained in a flat of = .³ Figure 16.1 shows a one-dimensional flat ? and its linear span º?», as well as a zero-dimensional flat @ and its span º@ ». Note that, for any flat ? in / , we have dim²º?»³ ~ dim²?³ b Note also that if 3 and 3 are any two distinct lines in / , the corresponding planes º3 and º3 » have the property that their intersection is a line through the origin, even if the lines are parallel. We are now ready to define projective geometries. Let = be a vector space of any dimension and let / be a hyperplane / in = not containing the origin. To each flat ? in / , we associate the subspace º?» of = generated by ? . Thus, the linear span function from 7 ¢ 7²/³ ¦ I ²= ³ maps affine subspaces ? of / to subspaces º?» of = . The span function is not surjective: Its image is the set of all subspaces that are not contained in the base subspace 2 of the flat / . The linear span function is one-to-one and its inverse is intersection with / 7 c ²< ³ ~ < q / for any subspace < not contained in 2 .
Affine Geometry
421
The affine geometry 7²/³ is, as we have remarked, somewhat incomplete. In the case dim²/³ ~ every pair of points determines a line but not every pair of lines determines a point. Now, since the linear span function 7 is injective, we can identify 7²/³ with its image 7 ²7²/³³, which is the set of all subspaces of = not contained in the base subspace 2 . This view of 7²/³ allows us to “complete” 7²/³ by including the base subspace 2 . In the three-dimensional case of Figure 16.1, the base plane, in effect, adds a projective line at infinity. With this inclusion, every pair of lines intersects, parallel lines intersecting at a point on the line at infinity. This two-dimensional projective geometry is called the projective plane. Definition Let = be a vector space. The set I ²= ³ of all subspaces of = is called the projective geometry of = . The projective dimension pdim²:³ of : I ²= ³ is defined as pdim²:³ ~ dim²:³ c The projective dimension of F²= ³ is defined to be pdim²= ³ ~ dim²= ³ c . A subspace of projective dimension is called a projective point and a subspace of projective dimension is called a projective line.
Thus, referring to Figure 16.1, a projective point is a line through the origin and, provided that it is not contained in the base plane 2 , it meets / in an affine point. Similarly, a projective line is a plane through the origin and, provided that it is not 2 , it will meet / in an affine line. In short, span²affine point³ ~ line through the origin ~ projective point span²affine line³ ~ plane through the origin ~ projective line The linear span function has the following properties. Theorem 16.14 The linear span function 7 ¢ 7²/³ ¦ I ²= ³ from the affine geometry 7²/³ to the projective geometry I ²= ³ defined by 7 ²?³ ~ º?» satisfies the following properties: 1) The linear span function is injective, with inverse given by 7 c ²< ³ ~ < q / for all subspaces < not contained in the base subspace 2 of / . 2) The image of the span function is the set of all subspaces of = that are not contained in the base subspace 2 of / . 3) ? @ if and only if º?» º@ » 4) If ? are flats in / with nonempty intersection then span4 ? 5 ~ span²? ³ 2
2
422
Advanced Linear Algebra
5) For any collection of flats in / , span8 ? 9 ~ span²? ³ 2
2
6) The linear span function preserves dimension, in the sense that pdim²span²?³³ ~ dim²?³ 7) ? @ if and only if one of º?» q 2 and º@ » q 2 is contained in the other. Proof. To prove part 1), let % b : be a flat in / . Then % / and so / ~ % b 2 , which implies that : 2 . Note also that º% b :» ~ º%» b : and ' º% b :» q / ~ ²º%» b :³ q ²% b 2³ ¬ ' ~ % b ~ % b for some : , 2 and - . This implies that ² c ³% 2 , which implies that either % 2 or ~ . But % / implies % ¤ 2 and so ~ , which implies that ' ~ % b % b : . In other words, º% b :» q / % b : Since the reverse inclusion is clear, we have º% b :» q / ~ % b : This establishes 1). To prove 2), let < be a subspace of = that is not contained in 2 . We wish to show that < is in the image of the linear span function. Note first that since < 2 and dim²2³ ~ dim²= ³ c , we have < b 2 ~ = and so dim²< q 2³ ~ dim²< ³ b dim²2³ c dim²< b 2³ ~ dim²< ³ c Now, let £ % < c 2 . Then % ¤ 2 ¬ º%» b 2 ~ = ¬ % b / for some £ - Á 2 ¬ % / Thus, % < q / for some £ - . Hence, the flat % b ²< q 2³ lies in / and dim²% b ²< q 2³³ ~ dim²< q 2³ ~ dim²< ³ c which implies that span²% b ²< q 2³³ ~ º%» b ²< q 2³ lies in < and has the same dimension as < . In other words, span²% b ²< q 2³³ ~ º%» b ²< q 2³ ~ < We leave proof of the remaining parts of the theorem as exercises.
Affine Geometry
423
Exercises Show that if % Á à Á % = then the set : ~ ¸' % ' ~ ¹ is a subspace of = . 2. Prove that hull¸% Á à Á % ¹ ~ % b º% c % Á à Á % c % ». 3. Prove that the set ? ~ ¸²Á ³Á ²Á ³Á ²Á ³¹ in ²{ ³ is closed under the formation of lines, but not affine hulls. 4. Prove that a flat contains the origin if and only if it is a subspace. 5. Prove that a flat ? is a subspace if and only if for some % ? we have % ? for some £ - . 6. Show that the join of a collection 9 ~ ¸% b : 2¹ of flats in = is the intersection of all flats that contain all flats in 9 . 7. Is the collection of all flats in = a lattice under set inclusion? If not, how can you “fix” this? 8. Suppose that ? ~ % b : and @ ~ & b ; . Prove that if dim²?³ ~ dim²@ ³ and ? @ then : ~ ; . 9. Suppose that ? ~ % b : and @ ~ & b ; are disjoint hyperplanes in = . Show that : ~ ; . 10. (The parallel postulate³ Let ? be a flat in = and # ¤ ? . Show that there is exactly one flat containing #, parallel to ? and having the same dimension as ? . 11. a) Find an example to show that the join ? v @ of two flats may not be the set of all lines connecting all points in the union of these flats. b) Show that if ? and @ are flats with ? q @ £ J then ? v @ is the union of all lines %& where % ? and & @ . 12. Show that if ? @ and ? q @ ~ J then 1.
dim²? v @ ³ ~ max¸dim²?³Á dim²@ ³¹ b 13. Let dim²= ³ ~ . Prove the following: a) The join of any two distinct points is a line. b) The intersection of any two nonparallel lines is a point. 14. Let dim²= ³ ~ . Prove the following: a) The join of any two distinct points is a line. b) The intersection of any two nonparallel planes is a line. c) The join of any two lines whose intersection is a point is a plane. d) The intersection of two coplanar nonparallel lines is a point. e) The join of any two distinct parallel lines is a plane. f) The join of a line and a point not on that line is a plane. g) The intersection of a plane and a line not on that plane is a point. 15. Prove that ¢ = ¦ = is a surjective affine transformation if and only if ~ k ;$ for some $ = and B²= ³. 16. Verify the group-theoretic remarks about the group homomorphism ¢ aff²= ³ ¦ B²= ³ and the subgroup trans²= ³ of aff²= ³.
Chapter 17
Operator Factorizations: QR and Singular Value
The QR Decomposition Let = be a finite-dimensional inner product space over - , where - ~ s or - ~ d. Let us recall a definition. Definition A linear operator on = is upper triangular with respect to an ordered basis 8 ~ ²# Á Ã Á # ³ if the matrix ´ µ8 is upper triangular, that is, if for all ~ Á Ã Á ²# ³ º# Á Ã Á # » The operator is upper triangularizable if there is an ordered basis with respect to which is upper triangular.
Given any orthonormal basis 8 for = , it is possible to find a unitary operator for which is upper triangular with respect to 8 . In matrix terms, this is equivalent to the fact that any matrix ( can be factored into a product ( ~ 89 where 8 is unitary (orthogonal) and 9 is upper triangular. This is the well known QR factorization of a matrix. Before proving this fact, let us repeat one more definition. Definition For a nonzero # = , the unique operator /# for which /# # ~ c#Á ²/# ³Oº#» ~ is called a reflection or a Householder transformation.
According to Theorem 10.11, if )#) ~ )$) £ , then /#c$ is the unique reflection sending # to $, that is, /#c$ ²#³ ~ $.
426
Advanced Linear Algebra
Theorem 17.1 (QR-Factorization of an operator) Let be a linear operator on a finite-dimensional real or complex vector space = . Then for any ordered orthonormal basis 8 ~ ²" Á à Á " ³ for = , there is a unitary (orthogonal) operator and an operator that is upper triangular with respect to 8 , that is, ²" ³ º" Á à Á " » for all ~ Á à Á , for which ~ k Moreover, if is invertible, then can be chosen with positive eigenvalues, in which case both and are unique. Proof. Let ~ ) " ). If % ~ " c " then ²/% k ³²" ³ ~ ²/ " c " k ³²" ³ ~ " º" » where, if is invertible then is positive. Assume for the purposes of induction that, for a given , we have found reflections /% Á à Á /% for which, setting / ²³ ~ /% Ä/% ²/ ²³ k ³" º" Á à Á " » for all . Assume also that if is invertible, the coefficient of " in ²/ ²³ k ³" is positive. We seek a reflection /%b for which ²/%b k / ²³ k ³" º" Á à Á " » for b and for which, if is invertible, the coefficient of " in ²/%b k / ²³ k ³" is positive. Note that if %b º"b Á à Á " » then /%b is the identity on º" Á à Á " » and so, at least for we have ²/%b k / ²³ k ³" /%b ²º" Á à Á " »³ ~ º" Á à Á " » as desired. But we also want to choose %b so that ²/%b k / ²³ k ³"b º" Á à Á "b » Let us write ²/ ²³ k ³"b ~ # b $ where # º" Á à Á " » and $ º"b Á à Á " ». We can accomplish our goal by reflecting $ onto the subspace º"b ». In particular, let %b ~ $ c )$)"b .
Operator Factorizations: QR and Singular Value
427
Since %b º"b Á Ã Á " », the operator /%b is the identity on º" Á Ã Á " » and so as noted earlier ²/%b k / ²³ k ³" º" Á Ã Á " », for Also ²/%b k / ²³ k ³"b ~ /$c)$)"b ²# b $³ ~ # b )$)"b º" Á Ã Á "b » and if is invertible, then $ £ and so )$) . Thus, we have found / ²b³ and by induction, ~ /% Ä/% is upper triangular with respect to 8 , which proves the first part of the theorem. It remains only to prove the uniqueness statement. Suppose that is invertible and that ~ ~ and that the coefficients of " in " and " are positive. Then ~ c ~ c is both unitary and upper triangular with respect to 8 and the coefficient of " in " is positive. We leave it to the reader to show that must be the identity and so ~ and ~ .
Here is the matrix version of the preceding theorem. Theorem 17.2 (The QR factorization) Any real or complex matrix ( can be written in the form ( ~ 89 where 8 is unitary (orthogonal) and 9 is upper triangular. Moreover, if ( is nonsingular then the diagonal entries of 9 may be taken to be positive, in which case the factorization is unique. Proof. According to Theorem 17.1, there is a unitary (orthogonal) operator < for which ´< ( µ; ~ 9 is upper triangular. Hence ( ~ ´( µ; ~ ´< i µ; 9 ~ 89 where 8 is a unitary (orthogonal) matrix.
The QR decomposition has important applications. For example, a system of linear equations (% ~ " can be written in the form 89% ~ " c
and since 8
i
~ 8 , we have 9% ~ 8i "
This is an upper triangular system, which is easily solved by back substitution that is, starting from the bottom and working up.
Advanced Linear Algebra
428
Singular Values Let < and = be finite-dimensional inner product spaces over d or s. The spectral theorem can be of considerable help in understanding the relationship between a linear transformation B²< Á = ³ and its adjoint i B²= Á < ³. This relationship is shown in Figure 17.1. (We assume that < and = are finitedimensional.)
im(W*)
ONB of eigenvectors for W*W
im(W) W
u1
ds1
ur
dsr
v1
W W
vr
W
ur+1 W 0 W vr+1
ONB of eigenvectors for WW*
un W 0 W vm
ker(W)
ker(W*)
Figure 17.1 We begin with a simple observation: If B²< Á = ³ then i B²< ³ is a positive Hermitian operator. Hence, if ~ rk² ³ ~ rk² i ³ then < has an ordered orthonormal basis 8 ~ ²" Á Ã Á " Á "b Á Ã Á " ³ of eigenvectors for i , where the corresponding (not necessarily unique) eigenvalues satisfy Ä ~ b ~ Ä ~ The numbers ~ bj , for ~ Á Ã Á are called the singular values of and for ~ Á Ã Á we have i " ~ where
"
~ for .
It is not hard to show that ²"b Á Ã Á " ³ is an ordered orthonormal basis for ker² ³ and so ²" Á Ã Á " ³ is an ordered orthonormal basis for ker² ³ ~ im² i ³. For if then º " Á " » ~ º i " Á " » ~
Operator Factorizations: QR and Singular Value
429
and so ²" ³ ~ , that is, " ker² ³. On the other hand, if % ~ ' " is in ker² ³ then ~ º ²%³Á ²%³» ~ L ²" ³Á ²" ³M ~ b b
and so ~ for and so ker² ³ º"b Á Ã Á " ». We can achieve some “symmetry” here between and i by setting # ~ ²° ³ " for each , giving #
"
" ~ F and i # ~ F
The vectors # Á Ã Á # are orthonormal, since if Á then º# Á # » ~
º " Á " » ~
º i " Á " » ~
º" Á " » ~ Á
Hence, ²# Á Ã Á # ³ is an orthonormal basis for im² ³ ~ ker² i ³ , which can be extended to an orthonormal basis 9 ~ ²# Á Ã Á # ³ for = , the extension ²#b Á Ã Á # ³ being an orthonormal basis for ker² i ³. The vectors " are called the right singular vectors for and the vectors # are called the left singular vectors for . Moreover, since i # ~
"
~
#
the vectors # Á Ã Á # are eigenvectors for i with the same eigenvalues ~ as for i . This completes the picture in Figure 17.1. Theorem 17.3 Let < and = be finite-dimensional inner product spaces over d or s and let B²< Á = ³ have rank . Then there is an ordered orthonormal basis 8 ~ ²" Á Ã Á " Á "b Á Ã Á " ³ of < and an ordered orthonormal basis 9 ~ ²# Á Ã Á # Á #b Á Ã Á # ³ of = with the following properties: 1) 8 ~ ²" Á Ã Á " ³ is an orthonormal basis for ker² ³ ~ im² i ³ 2) ²"b Á Ã Á " ³ is an orthonormal basis for ker² ³ 3) 9 ~ ²# Á Ã Á # ³ is an orthonormal basis for ker² i ³ ~ im² ³ 4) ²#b Á Ã Á # ³ is an orthonormal basis for ker² i ³ 5) The operators and i behave “symmetrically” on 8 and 9 , specifically, for ,
430
Advanced Linear Algebra
²" ³ ~ i ²# ³ ~
# "
where are called the singular values of . The vectors " are called the right singular vectors for and the vectors # are called the left singular vectors for .
The matrix version of the previous discussion leads to the well known singular value decomposition of a matrix. The matrix of under the ordered orthonormal bases 8 ~ ²" Á Ã Á " ³ and 9 ~ ²# Á Ã Á # ³ is ´ µ8Á9 ~ ' ~ diag² Á
Á Ã Á Á Á Ã Á ³
Given any matrix ( CÁ of rank , let ~ ( be multiplication by (. Then ( ~ ´( µ; Á; where ; and ; are the standard bases for < and = , respectively. By changing orthonormal bases to 8 and 9 we get ( ~ ´( µ; Á; ~ 49Á; ´( µ8Á9 4; Á8 ~ 7 '8i where 7 ~ 49Á; is unitary (orthogonal for - ~ s) with th column equal to ´# µ; and 8 ~ 48Á; is unitary (orthogonal for - ~ s) with th column equal to ´" µ; . As to uniqueness, if ( ~ 7 '8i is a singular value decomposition then (i ( ~ ²7 '8i ³i 7 '8i ~ 8'i '8i and since 'i ' ~ diag² Á Á Ã Á Á Á Ã Á ³, it follows that is an eigenvalue of (i (. Hence, since , we deduce that the singular values are uniquely determined by (. We state without proof the following uniqueness facts. For a proof, the reader may wish to consult reference [HJ1]. If and if the eigenvalues are distinct then 7 is uniquely determined up to multiplication on the right by a diagonal matrix of the form + ~ diag²' Á Ã Á ' ³ with (' ( ~ . If then 8 is never uniquely determined. If ~ ~ then for any given 7 there is a unique 8. Thus, we see that, in general, the singular value decomposition is not unique.
The Moore–Penrose Generalized Inverse Singular values lead to a generalization of the inverse of an operator that applies to all linear transformations. The setup is the same as in Figure 17.1. Referring to that figure, we are prompted to define a linear transformation b ¢ = ¦ < by b # ~ F
"
for for
Operator Factorizations: QR and Singular Value
431
for then ² b ³Oº" ÁÃÁ" » ~ ² b ³Oº"b ÁÃÁ" » ~ and ² b ³Oº# ÁÃÁ# » ~ ² b ³Oº#b ÁÃÁ# » ~ Hence, if ~ ~ then b ~ c . The transformation b is called the Moore–Penrose generalized inverse or Moore–Penrose pseudoinverse of . We abbreviate this as MP inverse. Note that the composition b is the identity on the largest possible subspace of < upon which any composition of the form could be the identity, namely, the orthogonal complement of the kernel of . A similar statement holds for the composition b . Hence, b is as “close” to an inverse for as is possible. We have said that if is invertible then b ~ c . More is true: If is injective then b ~ and so b is a left inverse for . Also, if is surjective then b is a right inverse for . Hence the MP inverse b generalizes the one-sided inverses as well. Here is a characterization of the MP inverse. Theorem 17.4 Let B²< Á = ³. The MP inverse b of is completely characterized by the following four properties: 1) b ~ 2) b b ~ b 3) b is Hermitian 4) b is Hermitian Proof. We leave it to the reader to show that b does indeed satisfy conditions 1)–4) and prove only the uniqueness. Suppose that and satisfy 1)–4) when substituted for b . Then ~ ~ ² ³ i ~ i i ~ ² ³ i i ~ i i i i ~ ² ³ i i i ~ i i ~ ~
432
Advanced Linear Algebra
and ~ ~ ² ³ i ~ i i ~ i ² ³i ~ i i i i ~ i i ² ³i ~ i i ~ ~ which shows that ~ .
The MP inverse can also be defined for matrices. In particular, if ( 4Á ²- ³ then the matrix operator ( has an MP inverse (b . Since this is a linear transformation from - to - , it is just multiplication by a matrix (b ~ ) . This matrix ) is the MP inverse for ( and is denoted by (b . Since (b ~ (b and () ~ ( ) , the matrix version of Theorem 17.4 implies that (b is completely characterized by the four conditions 1) 2) 3) 4)
((b ( ~ ( (b ((b ~ (b ((b is Hermitian (b ( is Hermitian
Moreover, if ( ~ < '<i is the singular value decomposition of the matrix ( then (b ~ < 'Z <i where 'Z is obtained from ' by replacing all nonzero entries by their multiplicative inverses. This follows from the characterization above and also from the fact that, for < 'Z <i # ~ < 'Z ~
c <
~
and for < 'Z <i # ~ < 'Z ~
c "
Operator Factorizations: QR and Singular Value
433
Least Squares Approximation Let us now discuss the most important use of the MP inverse. Consider the system of linear equations (% ~ # where ( 4Á ²- ³. (As usual, - ~ d or - ~ s.) Of course, this system has a solution if and only if # im²( ³. If the system has no solution, then it is of considerable practical importance to be able to solve the system (% ~ V# where V# is the unique vector in im²( ³ that is closest to #, as measured by the unitary (or Euclidean) distance. This problem is called the linear least squares problem. Any solution to the system (% ~ V# is called a least squares solution to the system (% ~ #. Put another way, a least squares solution to (% ~ # is a vector % for which )(% c #) is minimized. Suppose that $ and ' are least squares solutions to (% ~ #. Then ($ ~ V# ~ (' and so $ c ' ker²(³. (We will write ( for ( .) Thus, if $ is a particular least squares solution, then the set of all least squares solutions is $ b ker²(³. Among all solutions, the most interesting is the solution of minimum norm. Note that if there is a least squares solution $ that lies in ker²(³ , then for any ' ker²(³, we have )$ b ' ) ~ )$) b )' ) )$) and so $ will be the unique least squares solution of minimum norm. Before proceeding, we remind the reader of our discussion related to the projection theorem (Theorem 9.12) to the effect that if : is a subspace of a finite-dimensional inner product space = , then the best approximation to a vector # = from within : is the unique vector V# : for which # c V# : . Now we can see how the MP inverse comes into play. Theorem 17.5 Let ( 4Á ²- ³. Among the least squares solutions to the system (% ~ V# there is a unique solution of minimum norm, given by (b #, where (b is the MP inverse of (. Proof. A vector $ is a least squares solution if and only if ($ ~ V#. Using the characterization of the best approximation V#, we see that $ is a solution to ($ ~ V# if and only if
Advanced Linear Algebra
434
($ c # im²(³ Since im²(³ ~ ker²(i ³ this is equivalent to (i ²($ c #³ ~ or (i ($ ~ (i # This system of equations is called the normal equations for (% ~ #. Its solutions are precisely the least squares solutions to the system (% ~ #. To see that $ ~ (b # is a least squares solution, recall that, in the notation of Figure 17.1 ((b # ~ F
#
and so (i (²(b # ³ ~ F
(i #
~ (i #
and since 9 ~ ²# Á Ã Á # ³ is a basis for = we conclude that (b # satisfies the normal equations. Finally, since (b # ker²(³ , we deduce by the preceding remarks that (b # is the unique least squares solution of minimum norm.
Exercises QR-Factorization 1.
2.
3. 4.
Suppose that is unitary and upper triangular with respect to an orthonormal basis 8 and that the coefficient of " in " is positive. Show that must be the identity. Assume that is a nonsingular operator on a finite-dimensional inner product space. Use the Gram–Schmidt process to obtain the QRfactorization of . Prove that for reflections, /" ~ /# if and only if " is a scalar multiple of #. For any nonzero # - , show that the reflection /# is given by /# ~ 0 c
5.
##i )#)
Use the QR factorization to show that any matrix that is similar to an upper triangular matrix is also similar to an upper triangular matrix via a unitary (orthogonal) matrix.
Operator Factorizations: QR and Singular Value
6.
435
Let :Á B²= ³ and suppose that : ~ : . Let Á à Á be the eigenvalues for : and Á à Á be the eigenvalues for . Show that the eigenvalues of : b are b Á à Á b where ² Á à Á ³ is a permutation of ²Á à Á ³. Hence, for commuting operators, ²: b ³ ²:³ b ² ³
7.
8.
Let :Á B²= ³ be commuting operators. Let Á Ã Á be the eigenvalues for : and Á Ã Á be the eigenvalues for . Using the previous exercise, show that if all of the sums b are nonzero, then : b is invertible. Let 1 be the matrix v xÅ 1 ~x w
9.
Ä y Ç { { Ç Ç Å Ä z
that has 's on the diagonal that moves up from left to right and 's elsewhere. Find 1 c and 1 i . Compare 1 ( with (. Compare (1 with (. Compare 1 (1 i with (. Show that any upper triangular matrix is unitarily equivalent to a lower triangular matrix. If B²= ³ and 8 ~ ²" Á Ã Á " ³ is a basis for which " º" Á Ã Á " » then find a basis 9 ~ ²% Á Ã Á % ³ for which % º% Á Ã Á % »
10. (Cholsky decomposition) We have seen that a linear operator is positive if and only if it has the form ~ i for some operator . Using the QRfactorization of , prove the following result, known as the Cholsky decomposition. A linear operator B²= ³ is positive if and only if it has the form ~ i where is upper triangularizable. Moreover, if is invertible then can be chosen with positive eigenvalues, in which case the factorization is unique.
Singular Values 11. Let B²< ³. Show that the singular values of i are the same as those of . 12. Find the singular values and the singular value decomposition of the matrix
436
Advanced Linear Algebra
(~>
?
Find (b . 13. Find the singular values and the singular value decomposition of the matrix (~>
?
Find (b . Hint: is it better to work with (i ( or ((i ? 14. Let ? ~ ²% % Ä % ³! be a column matrix over d. Find a singular value decomposition of ? . 15. Let ( 4Á ²- ³ and let ) 4bÁb ²- ³ be the square matrix )~>
(i
( ?block
Show that, counting multiplicity, the nonzero eigenvalues of ) are precisely the singular values of ( together with their negatives. Hint: Let ( ~ < '<i be a singular–value decomposition of ( and try factoring ) into a product < :< i where < is unitary. Do not read the following second hint unless you get stuck. Second Hint: verify the block factorization )~>
<
< ?> '
'i ?> <i
<i ?
What are the eigenvalues of the middle factor on the right? (Try b b and c b .) 16. Use the results of the previous exercise to show that a matrix ( 4Á ²- ³, its adjoint (i , its transpose (! and its conjugate ( all have the same singular values. Show also that if < and < Z are unitary then ( and < (< Z have the same singular values. 17. Let ( 4 ²- ³ be nonsingular. Show that the following procedure produces a singular-value decomposition ( ~ < '<i of (. a) Write ( ~ < +< i where + ~ diag² Á à Á ³ and the 's are positive and the columns of < form an orthonormal basis of eigenvectors for (. (We never said that this was a practical procedure.) ° ° b) Let ' ~ diag² Á à Á ³ where the square roots are nonnegative. Also let < ~ < and U ~ (i < 'c . 18. If ( ~ ²Á ³ is an d matrix then the Frobenius norm of ( is ° )()- ~ 8 Á 9 Á
Show that )()- ~ (.
is the sum of the squares of the singular values of
Chapter 18
The Umbral Calculus
In this chapter, we give a brief introduction to an area called the umbral calculus. This is a linear-algebraic theory used to study certain types of polynomial functions that play an important role in applied mathematics. We give only a brief introduction to the subject, emphasizing the algebraic aspects rather than the applications. For more on the umbral calculus, may we suggest The Umbral Calculus, by Roman ´1984µ? One bit of notation: The lower factorial numbers are defined by ²³ ~ ² c ³Ä² c b ³
Formal Power Series We begin with a few remarks concerning formal power series. Let < denote the algebra of formal power series in the variable !, with complex coefficients. Thus, < is the set of all formal sums of the form B
²!³ ~ !
(18.1)
~
where d (the complex numbers). Addition and multiplication are purely formal B
B
B
! b ! ~ ² b ³! ~
~
~
and B
B
B
4 ! 54 ! 5 ~ 4 c 5 ! ~
~
~
~
The order ² ³ of is the smallest exponent of ! that appears with a nonzero coefficient. The order of the zero series is defined to be b B. Note that a series
438
Advanced Linear Algebra
has a multiplicative inverse, denoted by c , if and only if ² ³ ~ . We leave it to the reader to show that ² ³ ~ ² ³ b ²³ and ² b ³ min¸² ³Á ²³¹ If is a sequence in < with ² ³ ¦ B as ¦ then for any series B
²!³ ~ ! ~
we may substitute for ! to get the series B
²!³ ~ ²!³ ~
which is well-defined since the coefficient of each power of ! is a finite sum. In particular, if ² ³ then ² ³ ¦ B and so the composition B
² k ³²!³ ~ ² ²!³³ ~ ²!³ ~
is well-defined. It is easy to see that ² k ³ ~ ²³² ³. If ² ³ ~ then has a compositional inverse, denoted by and satisfying ² k ³²!³ ~ ² k ³²!³ ~ ! A series with ² ³ ~ is called a delta series. The sequence of powers of a delta series forms a pseudobasis for < , in the sense that for any < , there exists a unique sequence of constants for which B
²!³ ~ ²!³ ~
Finally, we note that the formal derivative of the series (18.1) is given by B
C! ²!³ ~ Z ²!³ ~ !c ~
The operator C! is a derivation, that is, C! ² ³ ~ C! ² ³ b C! ²³
The Umbral Calculus
439
The Umbral Algebra Let F ~ d´%µ denote the algebra of polynomials in a single variable % over the complex field. One of the starting points of the umbral calculus is the fact that any formal power series in < can play three different roles: as a formal power series, as a linear functional on F and as a linear operator on F . Let us first explore the connection between formal power series and linear functionals. Let F i denote the vector space of all linear functionals on F . Note that F i is the algebraic dual space of F , as defined in Chapter 2. It will be convenient to denote the action of 3 F i on ²%³ F by º3 ²%³» (This is the “bra-ket” notation of Paul Dirac.) The vector space operations on F i then take the form º3 b 4 ²%³» ~ º3 ²%³» b º4 ²%³» and º3 ²%³» ~ º3 ²%³»Á d Note also that since any linear functional on F is uniquely determined by its values on a basis for F Á the functional 3 F i is uniquely determined by the values º3 % » for . Now, any formal series in < can be written in the form B
²!³ ~ ~
! !
and we can use this to define a linear functional ²!³ by setting º ²!³ % » ~ for . In other words, the linear functional ²!³ is defined by B
º ²!³ % » ! ! ~
²!³ ~
where the expression ²!³ on the left is just a formal power series. Note in particular that º! % » ~ !Á where Á is the Kronecker delta function. This implies that º! ²%³» ~ ²³ ²³ and so ! is the functional “ th derivative at .” Also, ! is evaluation at .
440
Advanced Linear Algebra
As it happens, any linear functional 3 on F has the form ²!³. To see this, we simply note that if B
º3 % » ! ! ~
3 ²!³ ~ then
º3 ²!³ % » ~ º3 % » for all and so as linear functionals, 3 ~ 3 ²!³. Thus, we can define a map ¢ F i ¦ < by ²3³ ~ 3 ²!³. Theorem 18.1 The map ¢ F i ¦ < defined by ²3³ ~ 3 ²!³ is a vector space isomorphism from F i onto < . Proof. To see that is injective, note that 3 ²!³ ~ 4 ²!³ ¬ º3 % » ~ º4 % » for all ¬ 3 ~ 4 Moreover, the map is surjective, since for any < , the linear functional 3 ~ ²!³ has the property that ²3³ ~ 3 ²!³ ~ ²!³. Finally, B
º3 b 4 % » ! ! ~
²3 b 4 ³ ~
B
B º3 % » º4 % » ! b ! ! ! ~ ~
~
~ ²3³ b ²4 ³
i
From now on, we shall identify the vector space F with the vector space < , using the isomorphism ¢ F i ¦ < . Thus, we think of linear functionals on F simply as formal power series. The advantage of this approach is that < is more than just a vector space—it is an algebra. Hence, we have automatically defined a multiplication of linear functionals, namely, the product of formal power series. The algebra < , when thought of as both the algebra of formal power series and the algebra of linear functionals on F , is called the umbral algebra. Let us consider an example. Example 18.1 For d, the evaluation functional F i is defined by º ²%³» ~ ²³
The Umbral Calculus
441
In particular, º % » ~ and so the formal power series representation for this functional is B
B º % » ! ~ ! ~ ! ! ! ~ ~
²!³ ~
which is the exponential series. If ! is evaluation at then ! ! ~ ²b³! and so the product of evaluation at and evaluation at is evaluation at b .
When we are thinking of a delta series < as a linear functional, we refer to it as a delta functional. Similarly, an invertible series < is referred to as an invertible functional. Here are some simple consequences of the development so far. Theorem 18.2 1) For any < , B
º ²!³ % » ! [ ~
²!³ ~ 2) For any F ,
º! ²%³» % [
²%³ ~ 3) For any Á < ,
º ²!³²!³ % » ~ 4 5 º ²!³ % »º²!³ %c » ~
4) ² ²!³³ deg ²%³ ¬ º ²!³ ²%³» ~ 5) If ² ³ ~ for all then B
L ²!³c²%³M ~ º ²!³ ²%³» ~
where the sum on the right is a finite one. 6) If ² ³ ~ for all then º ²!³ ²%³» ~ º ²!³ ²%³» for all ¬ ²%³ ~ ²%³ 7) If deg ²%³ ~ for all then º ²!³ ²%³» ~ º²!³ ²%³» for all ¬ ²!³ ~ ²!³
442
Advanced Linear Algebra
Proof. We prove only part 3). Let B
²!³ ~ ~
B ! and ²!³ ~ ! ! ! ~
Then B
²!³²!³ ~ 4 ~
4 5 c 5! ! ~
and applying both sides of this ²as linear functionals) to % gives º ²!³²!³ % » ~ 4 5 c ~
The result now follows from the fact that part 1) implies ~ º ²!³ % » and c ~ º²!³ %c ».
We can now present our first “umbral” result. Theorem 18.3 For any ²!³ < and ²%³ F , º ²!³ %²%³» ~ ºC! ²!³ ²%³» Proof. By linearity, we need only establish this for ²%³ ~ % . But, if B
²!³ ~ ~
! !
then B
!c c% M ! ² c ³ ~
ºC! ²!³ % » ~ L B
cÁ ² c ³! ~
~
~ b ~ º ²!³ %b »
Let us consider a few examples of important linear functionals and their power series representations. Example 18.2 1) We have already encountered the evaluation functional ! , satisfying º! ²%³» ~ ²³
The Umbral Calculus
443
2) The forward difference functional is the delta functional ! c , satisfying º! c ²%³» ~ ²³ c ²³ 3) The Abel functional is the delta functional !e! , satisfying º!e! ²%³» ~ Z ²³ 4) The invertible functional ² c !³c satisfies º² c !³c ²%³» ~
B
²"³c" "
as can be seen by setting ²%³ ~ % and expanding the expression ² c !³c . 5) To determine the linear functional satisfying
º ²!³ ²%³» ~ ²"³ "
we observe that B
B º ²!³ % » b !! c ! ~ ! ~ ! ² b ³[ ! ~ ~
²!³ ~
The inverse !°²! c ³ of this functional is associated with the Bernoulli polynomials, which play a very important role in mathematics and its applications. In fact, the numbers ) ~ L
! c% M ! c
are known as the Bernoulli numbers.
Formal Power Series as Linear Operators We now turn to the connection between formal power series and linear operators on F . Let us denote the th derivative operator on F by ! . Thus, ! ²%³ ~ ²³ ²%³ We can then extend this to formal series in ! B
²!³ ~ ~
! !
(18.2)
444
Advanced Linear Algebra
by defining the linear operator ²!³¢ F ¦ F by B
²!³²%³ ~ ~
´! ²%³µ ~ ²³ ²%³ ! !
the latter sum being a finite one. Note in particular that ²!³% ~ 4 5 %c ~
(18.3)
With this definition, we see that each formal power series < plays three roles in the umbral calculus, namely, as a formal power series, as a linear functional and as a linear operator. The two notations º ²!³ ²%³» and ²!³²%³ will make it clear whether we are thinking of as a functional or as an operator. It is important to note that ~ in < if and only if ~ as linear functionals, which holds if and only if ~ as linear operators. It is also worth noting that ´ ²!³²!³µ²%³ ~ ²!³´²!³²%³µ and so we may write ²!³²!³²%³ without ambiguity. In addition, ²!³²!³²%³ ~ ²!³ ²!³²%³ for all Á < and F . When we are thinking of a delta series as an operator, we call it a delta operator. The following theorem describes the key relationship between linear functionals and linear operators of the form ²!³. Theorem 18.4 If Á < then º ²!³²!³ ²%³» ~ º ²!³ ²!³²%³» for all polynomials ²%³ F . Proof. If has the form (18.2) then by (18.3), º! (!³% » ~ L! c4 5 %c M ~ ~ º ²!³ % » ~
²18.4)
By linearity, this holds for % replaced by any polynomial ²%³. Hence, applying this to the product gives º ²!³²!³ ²%³» ~ º! ²!³²!³²%³» ~ º! ²!³´²!³²%³µ» ~ º ²!³ ²!³²%³»
Equation (18.4) shows that applying the linear functional (!³ is equivalent to applying the operator ²!³ and then following by evaluation at % ~ .
The Umbral Calculus
445
Here are the operator versions of the functionals in Example 18.2. Example 18.3 1) The operator ! satisfies B
! % ~ 4 5 %c ~ ²% b ³ ! ~ ~
! % ~ and so
! ²%³ ~ ²% b ³ for all F . Thus ! is a translation operator. 2) The forward difference operator is the delta operator ! c , where ²! c )²%³ ~ ²% b ³ c ²³ 3) The Abel operator is the delta operator !e! , where !e! ²%³ ~ Z ²% b ³ 4) The invertible operator ² c !³c satisfies ² c !³c ²%³ ~
B
²% b "³c" du
5) The operator ²! c )°! is easily seen to satisfy %b ! c ²%³ ~ ²"³ " ! %
We have seen that all linear functionals on F have the form ²!³, for < . However, not all linear operators on F have this form. To see this, observe that deg ´ ²!³²%³µ deg ²%³ but the linear operator ¢ F ¦ F defined by ²²%³³ ~ %²%³ does not have this property. Let us characterize the linear operators of the form ²!³. First, we need a lemma. Lemma 18.5 If ; is a linear operator on 7 and ; ²!³ ~ ²!³; for some delta series ²!³ then deg²; ²%³³ deg²²%³³. Proof. For any deg²; % ³ c ~ deg² ²!³; % ³ ~ deg²; ²!³% ³ and so
446
Advanced Linear Algebra
deg²; % ³ ~ deg²; ²!³% ³ b Since deg² ²!³% ³ ~ c we have the basis for an induction. When ~ we get deg²; ³ ~ . Assume that the result is true for c . Then deg²; % ³ ~ deg²; ²!³% ³ b c b ~
Theorem 18.6 The following are equivalent for a linear operator ; ¢ F ¦ F . 1) ; has the form ²!³, that is, there exists an < for which ; ~ ²!³, as linear operators. 2) ; commutes with the derivative operator, that is, ; ! ~ !; . 3) ; commutes with any delta operator ²!³, that is, ; ²!³ ~ ²!³; . 4) ; commutes with any translation operator, that is, ; ! ~ ! ; . Proof. It is clear that 1) implies 2). For the converse, let B
º! ; % » ! ! ~
²!³ ~ Then
º²!³ % » ~ º! ; % » Now, since ; commutes with !, we have º! ; % » ~ º! ! ; % » ~ º! ; ! % » ~ ²³ º! ; %c » ~ ²³ º! ²!³%c » ~ º! ²!³% » and since this holds for all and we get ; ~ ²!³. We leave the rest of the proof as an exercise.
Sheffer Sequences We can now define the principal object of study in the umbral calculus. When referring to a sequence ²%³ in F , we shall always assume that deg ²%³ ~ for all . Theorem 18.7 Let be a delta series, let be an invertible series and consider the geometric sequence Á Á Á Á Ã in < . Then there is a unique sequence conditions
²%³
in F satisfying the orthogonality
The Umbral Calculus
º²!³ ²!³
²%³»
~ [Á
447
(18.5)
for all Á . Proof. The uniqueness follows from Theorem 18.2. For the existence, if we set ²%³ ~ Á % ~
and B
²!³ ²!³ ~ Á ! ~
where Á £ then (18.5) is B
[Á ~ L Á ! c Á % M ~ B
~
~ Á Á º! c% » ~ ~
~ Á Á [ ~
Taking ~ we get Á ~
Á
For ~ c we have ~ cÁc Ác ² c ³[ b cÁ Á [ and using the fact that Á ~ °Á we can solve this for Ác . By successively taking ~ Á c Á c Á Ã we can solve the resulting equations for the coefficients Á of the sequence ²%³.
Definition The sequence ²%³ in (18.5) is called the Sheffer sequence for the ordered pair ²²!³Á ²!³³. We shorten this by saying that ²%³ is Sheffer for ²²!³Á ²!³³.
Two special types of Sheffer sequences deserve explicit mention. Definition The Sheffer sequence for a pair of the form ²Á ²!³³ is called the associated sequence for ²!³. The Sheffer sequence for a pair of the form ²²!³Á !³ is called the Appell sequence for ²!³.
448
Advanced Linear Algebra
Note that the sequence
²%³
is Sheffer for ²²!³Á ²!³³ if and only if
º²!³ ²!³
²%³»
~ [Á
which is equivalent to º ²!³ ²!³ ²%³» ~ [Á which, in turn, is equivalent to saying that the sequence ²%³ ~ ²!³ ²%³ is the associated sequence for ²!³. Theorem 18.8 The sequence ²%³ is Sheffer for ²²!³Á ²!³³ if and only if the sequence ²%³ ~ ²!³ ²%³ is the associated sequence for ²!³.
Before considering examples, we wish to describe several characterizations of Sheffer sequences. First, we require a key result. Theorem 18.9 (The expansion theorems) Let 1) For any < ,
²%³
be Sheffer for ²²!³Á ²!³³.
B
º²!³ ²%³» ²!³ ²!³ ! ~
²!³ ~ 2) For any F ,
º²!³ ²!³ ²%³ !
²%³ ~
²%³
Proof. Part 1) follows from Theorem 18.2, since B
B º²!³ ²%³» º²!³ ²%³» ²!³ ²!³c ²%³M ~ !Á ! ! ~ ~
L
~ º²!³
²%³»
Part 2) follows in a similar way from Theorem 18.2.
We can now begin our characterization of Sheffer sequences, starting with the generating function. The idea of a generating function is quite simple. If ²%³ is a sequence of polynomials, we may define a formal power series of the form B
²%³ ! ! ~
²!Á %³ ~
This is referred to as the (exponential) generating function for the sequence ²%³. (The term exponential refers to the presence of ! in this series. When this is not present, we have an ordinary generating function.³ Since the series is a formal one, knowing ²!Á %³ is equivalent (in theory, if not always in practice)
The Umbral Calculus
449
to knowing the polynomials ²%³. Moreover, a knowledge of the generating function of a sequence of polynomials can often lead to a deeper understanding of the sequence itself, that might not be otherwise easily accessible. For this reason, generating functions are studied quite extensively. For the proofs of the following characterizations, we refer the reader to Roman ´1984µ. Theorem 18.10 (Generating function) 1) The sequence ²%³ is the associated sequence for a delta series ²!³ if and only if B
²&³ ! ! ~
& ²!³ ~
where ²!³ is the compositional inverse of ²!³. 2) The sequence ²%³ is Sheffer for ²²!³Á ²!³³ if and only if B ²&³ & ²!³ ~ ! ! ² ²!³³ ~
The sum on the right is called the generating function of ²%³. Proof. Part 1) is a special case of part 2). For part 2), the expression above is equivalent to B &! ²&³ ~ ²!³ ²!³ ! ~
which is equivalent to B
&! ~ ~
²&³
!
²!³ ²!³
But if ²%³ is Sheffer for ² ²!³Á ²!³³ then this is just the expansion theorem for &! . Conversely, this expression implies that B ²&³
~ º&!
²%³»
~ ~
and so º²!³ ²!³ ² Á ³.
²%³»
²&³
!
º²!³ ²!³
~ [Á , which says that
We can now give a representation for Sheffer sequences. Theorem 18.11 (Conjugate representation)
²%³»
²%³
is Sheffer for
450
Advanced Linear Algebra
1) A sequence ²%³ is the associated sequence for ²!³ if and only if
º ²!³ % »% [ ~
²%³ ~ 2) A sequence
²%³
is Sheffer for ²²!³Á ²!³³ if and only if
º² ²!³³c ²!³ % »% [ ~
²%³ ~
Proof. We need only prove part 2). We know that ²²!³Á ²!³³ if and only if
²%³
is Sheffer for
B ²&³ & ²!³ ~ ! ! ² ²!³³ ~
But this is equivalent to B ²&³ & ²!³ % ~ ! c% M ~ L N O ! ² ²!³³ ~
²&³
Expanding the exponential on the left gives B
B º² ²!³³c ²!³ % » ²&³ & ~ L ! c% M ~ [ ! ~ ~
²&³
Replacing & by % gives the result.
Sheffer sequences can also be characterized by means of linear operators. Theorem 18.12 ²Operator characterization) 1) A sequence ²%³ is the associated sequence for ²!³ if and only if a) ²³ ~ Á b) ²!³ ²%³ ~ c ²%³ for 2) A sequence ²%³ is Sheffer for ²²!³Á ²!³³, for some invertible series ²!³ if and only if ²!³ ²%³ ~
c ²%³
for all . Proof. For part 1), if ²%³ is associated with ²!³ then ²³ ~ º! ²%³» ~ º ²!³ ²%³» ~ [Á and
The Umbral Calculus
451
º ²!³ ²!³ ²%³» ~ º ²!³b ²%³» ~ [Áb ~ ² c ³[cÁ ~ º ²!³ c ²%³» and since this holds for all we get 1b). Conversely, if 1a) and 1b) hold then º ²!³ ²%³» ~ º! ²!³ ²%³» ~ ²³ c ²³ ~ ²³ cÁ ~ [Á and so ²%³ is the associated sequence for ²!³. As for part 2), if
²%³
is Sheffer for ²²!³Á ²!³³ then
º²!³ ²!³ ²!³ ²%³» ~ º²!³ ²!³b ²%³» ~ [Áb ~ ² c ³[cÁ ~ º²!³ ²!³ c ²%³» and so ²!³ ²%³ ~
c ²%³,
as desired. Conversely, suppose that
²!³ ²%³ ~
c ²%³
and let ²%³ be the associated sequence for ²!³. Let ; be the invertible linear operator on = defined by ;
²%³
~ ²%³
Then ; ²!³ ²%³ ~ ;
c ²%³
~ c ²%³ ~ ²!³ ²%³ ~ ²!³;
²%³
and so Theorem 18.5 implies that ; ~ ²!³ for some invertible series ²!³. Then º²!³ ²!³
and so
²%³
²%³»
~ º ²!³ ²!³ ²%³» ~ º! ²!³ ²%³» ~ ²³ c ²³ ~ ²³ cÁ ~ [Á
is Sheffer for ²²!³Á ²!³³.
We next give a formula for the action of a linear operator ²!³ on a Sheffer sequence.
452
Advanced Linear Algebra
Theorem 18.13 Let ²%³ be a Sheffer sequence for ²²!³Á ²!³³ and let ²%³ be associated with ²!³. Then for any ²!³ we have ²!³ ²%³ ~ 4 5º²!³ ~
²%³»c ²%³
Proof. By the expansion theorem B
º²!³ ²%³» ²!³ ²!³ ! ~
²!³ ~ we have B
º²!³ ²%³» ²!³ ²!³ ²%³ ! ~
²!³ ²%³ ~ B
º²!³ ²%³» ²³ c ²%³ ! ~
~
which is the desired formula.
Theorem 18.14 1) (The binomial identity) A sequence ²%³ is the associated sequence for a delta series ²!³ if and only if it is of binomial type, that is, if and only if it satisfies the identity ²% b &³ ~ 4 5 ²&³c ²%³ ~
for all & d. 2) (The Sheffer identity) A sequence some invertible ²!³ if and only if ²%
²%³
is Sheffer for ²²!³Á ²!³³, for
b &³ ~ 4 5 ²&³ ~
c ²%³
for all & d, where ²%³ is the associated sequence for ²!³. Proof. To prove part 1), if ²%³ is an associated sequence then taking ²!³ ~ &! in Theorem 18.13 gives the binomial identity. Conversely, suppose that the sequence ²%³ is of binomial type. We will use the operator characterization to show that ²%³ is an associated sequence. Taking % ~ & ~ we have for ~ ²³ ~ ²³ ²³ and so ²³ ~ . Also,
The Umbral Calculus
453
²³ ~ ²³ ²³ b ²³ ²³ ~ ²³ and so ²³ ~ . Assuming that ²³ ~ for ~ Á à Á c we have ²³ ~ ²³ ²³ b ²³ ²³ ~ ²³ and so ²³ ~ . Thus, ²³ ~ Á . Next, define a linear functional ²!³ by º ²!³ ²%³» ~ Á Since º ²!³ » ~ º ²!³ ²%³» ~ and º ²!³ ²%³» ~ £ we deduce that ²!³ is a delta series. Now, the binomial identity gives º ²!³ &! ²%³» ~ 4 5 ²&³º ²!³ c ²%³» ~ ~ 4 5 ²&³cÁ ~
~ c ²&³ and so º&! ²!³ ²%³» ~ º&! c ²%³» and since this holds for all &, we get ²!³ ²%³ ~ c ²%³. Thus, ²%³ is the associated sequence for ²!³. For part 2), if ²%³ is a Sheffer sequence then taking ²!³ ~ &! in Theorem 18.13 gives the Sheffer identity. Conversely, suppose that the Sheffer identity holds, where ²%³ is the associated sequence for ²!³. It suffices to show that ²!³ ²%³ ~ ²%³ for some invertible ²!³. Define a linear operator ; by ;
²%³
~ ²%³
Then &! ;
²%³
~ &! ²%³ ~ ²% b &³
and by the Sheffer identity ; &! ²%³ ~ 4 5 ²&³; ~
c ²%³
~ 4 5 ²&³c ²%³ ~
and the two are equal by part 1). Hence, ; commutes with &! and is therefore of the form ²!³, as desired.
454
Advanced Linear Algebra
Examples of Sheffer Sequences We can now give some examples of Sheffer sequences. While it is often a relatively straightforward matter to verify that a given sequence is Sheffer for a given pair ²²!³Á ²!³³, it is quite another matter to find the Sheffer sequence for a given pair. The umbral calculus provides two formulas for this purpose, one of which is direct, but requires the usually very difficult computation of the series ² ²!³°!³c . The other is a recurrence relation that expresses each ²%³ in terms of previous terms in the Sheffer sequence. Unfortunately, space does not permit us to discuss these formulas in detail. However, we will discuss the recurrence formula for associated sequences later in this chapter. Example 18.4 The sequence ²%³ ~ % is the associated sequence for the delta series ²!³ ~ !. The generating function for this sequence is B
&! ~ ~
& ! [
and the binomial identity is the well known binomial formula ²% b &³ ~ 4 5 % &c ~
Example 18.5 The lower factorial polynomials ²%³ ~ %²% c ³Ä²% c b ³ form the associated sequence for the forward difference functional ²!³ ~ ! c discussed in Example 18.2. To see this, we simply compute, using Theorem 18.12. Since ²³ is defined to be , we have ²³ ~ Á . Also, ²! c ³²%³ ~ ²% b ³ c ²%³ ~ ´²% b ³%²% c ³Ä²% c b ³µ c ´%²% c ³Ä²% c b ³µ ~ %²% c ³Ä²% c b ³´²% b ³ c ²% c b ³µ ~ %²% c ³Ä²% c b ³ ~ ²%³c The generating function for the lower factorial polynomials is B
²&³ ! [ ~
& log²b!³ ~
The Umbral Calculus
455
which can be rewritten in the more familiar form B & ² b !³& ~ 4 5 ! ~
Of course, this is a formal identity, so there is no need to make any restrictions on !. The binomial identity in this case is ²% b &³ ~ 4 5²%³ ²&³c ~
which can also be written in the form 4
%b& % & 5 ~ 4 54 5 c ~
This is known as the Vandermonde convolution formula. Example 18.6 The Abel polynomials ( ²%Â ³ ~ %²% c ³c form the associated sequence for the Abel functional ²!³ ~ !e! also discussed in Example 18.2. We leave verification of this to the reader. The generating function for the Abel polynomials is B
&²& c ³c ! [ ~
& ²!³ ~
Taking the formal derivative of this with respect to & gives B
²& c ³²& c ³c ! [ ~
²!³& ²!³ ~
which, for & ~ , gives a formula for the compositional inverse of the series ²!³ ~ !! , B
²c³ c ! ²c³[ ~
²!³ ~
Example 18.7 The famous Hermite polynomials / ²%³ form the Appell sequence for the invertible functional
²!³ ~ ! °
456
Advanced Linear Algebra
We ask the reader to show that ²%³ is the Appell sequence for ²!³ if and only if ²%³ ~ ²!³c % . Using this fact, we get ²³ c / ²%³ ~ c! ° % ~ ²c ³ % [ The generating function for the Hermite polynomials is B
/ ²&³ ! [ ~
&!c! ° ~ and the Sheffer identity is
/ ²% b &³ ~ 4 5/ ²%³&c ~
We should remark that the Hermite polynomials, as defined in the literature, often differ from our definition by a multiplicative constant.
²³
Example 18.8 The well known and important Laguerre polynomials 3 ²%³ of order form the Sheffer sequence for the pair ²!³ ~ ² c !³cc Á ²!³ ~
! !c
It is possible to show ²although we will not do so here³ that
[ b 4 5²c%³ [ c ~
3²³ ²%³ ~
The generating function of the Laguerre polynomials is ²³
B 3 ²%³ &!°²!c³ ~ ! ² c !³b [ ~
As with the Hermite polynomials, some definitions of the Laguerre polynomials differ by a multiplicative constant.
We presume that the few examples we have given here indicate that the umbral calculus applies to a significant range of important polynomial sequences. In Roman ´1984µ, we discuss approximately 30 different sequences of polynomials that are (or are closely related to) Sheffer sequences.
Umbral Operators and Umbral Shifts We have now established the basic framework of the umbral calculus. As we have seen, the umbral algebra plays three roles: as the algebra of formal power series in a single variable, as the algebra of all linear functionals on F and as the
The Umbral Calculus
457
algebra of all linear operators on F that commute with the derivative operator. Moreover, since < is an algebra, we can consider geometric sequences Á Á Á Á à in < , where ²³ ~ and ² ³ ~ . We have seen by example that the orthogonality conditions º²!³ ²!³
²%³»
~ [Á
define important families of polynomial sequences. While the machinery that we have developed so far does unify a number of topics from the classical study of polynomial sequences (for example, special cases of the expansion theorem include Taylor's expansion, the EulerMacLaurin formula and Boole's summation formula), it does not provide much new insight into their study. Our plan now is to take a brief look at some of the deeper results in the umbral calculus, which center around the interplay between operators on F and their adjoints, which are operators on the umbral algebra < ~ F i. We begin by defining two important operators on F associated with each Sheffer sequence. Definition Let ²%³ be Sheffer for ²²!³Á ²!³³. The linear operator Á ¢ F ¦ F defined by Á ²% ³ ~
²%³
is called the Sheffer operator for the pair ²²!³Á ²!³³, or for the sequence ²%³. If ²%³ is the associated sequence for ²!³, the Sheffer operator ²% ³ ~ ²%³ is called the umbral operator for ²!³, or for ²%³.
Definition Let ²%³ be Sheffer for ²²!³Á ²!³³. The linear operator Á ¢ F ¦ F defined by Á ´ ²%³µ ~
b ²%³
is called the Sheffer shift for the pair ²²!³Á ²!³³, or for the sequence ²%³ is the associated sequence for ²!³, the Sheffer operator ´ ²%³µ ~ b ²%³ is called the umbral shift for ²!³, or for ²%³.
²%³.
If
458
Advanced Linear Algebra
It is clear that each Sheffer sequence uniquely determines a Sheffer operator and vice versa. Hence, knowing the Sheffer operator of a sequence is equivalent to knowing the sequence.
Continuous Operators on the Umbral Algebra It is clearly desirable that a linear operator ; on the umbral algebra < pass under infinite sums, that is, that B
B
; 4 ²!³5 ~ ; ´ ²!³µ ~
(18.6)
~
whenever the sum on the left is defined, which is precisely when ² ²!³³ ¦ B as ¦ B. Not all operators on < have this property, which leads to the following definition. Definition A linear operator ; on the umbral algebra < is continuous if it satisfies ²18.6).
The term continuous can be justified by defining a topology on < . However, since no additional topological concepts will be needed, we will not do so here. Note that in order for (18.6) to make sense, we must have ²; ´ ²!³µ³ ¦ B. It turns out that this condition is also sufficient. Theorem 18.15 A linear operator ; on < is continuous if and only if ² ³ ¦ B ¬ ²; ² ³³ ¦ B
(18.7)
Proof. The necessity is clear. Suppose that (18.7) holds and that ² ³ ¦ B. For any , we have B
L; ²!³c% M ~ L; ²!³c% M b L; ²!³c% M (18.8) ~
~
Since 8 ²!³9 ¦ B
(18.7) implies that we may choose large enough so that 4; ²!³5
and ²; ´ ²!³µ³ for
The Umbral Calculus
459
Hence, (18.8) gives B
L; ²!³c% M ~ L; ²!³c% M ~
~
~ L ; ´ ²!³µc% M ~ B
~ L ; ´ ²!³µc% M ~
which implies the desired result.
Operator Adjoints If ¢ F ¦ F is a linear operator on F then its (operator) adjoint d is an operator on F i ~ < defined by d ´²!³µ ~ ²!³ k In the symbolism of the umbral calculus, this is º d ²!³ ²%³» ~ º²!³ ²%³» (We have reduced the number of parentheses used to aid clarity.³ Let us recall the basic properties of the adjoint from Chapter 3. Theorem 18.16 For Á B²F ³, 1) ² b ³d ~ d b d 2) ² ³d ~ d for any d 3) ²³d ~ d d 4) ² c ³d ~ ² d ³c for any invertible B²F ³
Thus, the map $\phi\colon \mathcal{L}(\mathcal{P}) \to \mathcal{L}(\mathcal{U})$ that sends $\tau\colon \mathcal{P} \to \mathcal{P}$ to its adjoint $\tau^\times\colon \mathcal{U} \to \mathcal{U}$ is a linear transformation from $\mathcal{L}(\mathcal{P})$ to $\mathcal{L}(\mathcal{U})$. Moreover, $\tau^\times = 0$ implies that $\langle f(t) \mid \tau p(x)\rangle = 0$ for all $f(t) \in \mathcal{U}$ and $p(x) \in \mathcal{P}$, which in turn implies that $\tau = 0$, and so we deduce that $\phi$ is injective. The next theorem describes the range of $\phi$.

Theorem 18.17 A linear operator $T \in \mathcal{L}(\mathcal{U})$ is the adjoint of a linear operator $\tau \in \mathcal{L}(\mathcal{P})$ if and only if $T$ is continuous.
Proof. First, suppose that $T = \tau^\times$ for some $\tau \in \mathcal{L}(\mathcal{P})$ and let $o(f_k(t)) \to \infty$. For any $n \ge 0$ and all $k$ we have
$$
\langle \tau^\times f_k(t) \mid x^n\rangle = \langle f_k(t) \mid \tau x^n\rangle
$$
and so it is only necessary to take $k$ large enough so that $o(f_k(t)) > \deg(\tau x^n)$, whence
$$
\langle \tau^\times f_k(t) \mid x^n\rangle = 0
$$
for all such $k$, and so $o(\tau^\times f_k(t)) > n$. Thus, $o(\tau^\times f_k(t)) \to \infty$ and $\tau^\times$ is continuous. For the converse, assume that $T$ is continuous. If $T$ is to have the form $\tau^\times$ then
$$
\langle T t^k \mid x^n\rangle = \langle \tau^\times t^k \mid x^n\rangle = \langle t^k \mid \tau x^n\rangle
$$
and since
$$
\tau x^n = \sum_{k \ge 0} \frac{\langle t^k \mid \tau x^n\rangle}{k!}\, x^k
$$
we are prompted to define $\tau$ by
$$
\tau x^n = \sum_{k \ge 0} \frac{\langle T t^k \mid x^n\rangle}{k!}\, x^k
$$
This makes sense, since $o(T t^k) \to \infty$ as $k \to \infty$ and so the sum on the right is a finite sum. Then
$$
\langle \tau^\times t^j \mid x^n\rangle = \langle t^j \mid \tau x^n\rangle = \sum_{k \ge 0} \frac{\langle T t^k \mid x^n\rangle}{k!}\, \langle t^j \mid x^k\rangle = \langle T t^j \mid x^n\rangle
$$
which implies that $T t^j = \tau^\times t^j$ for all $j$. Finally, since $T$ and $\tau^\times$ are both continuous, we have $T = \tau^\times$.
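The defining identity of the adjoint is easy to test by machine. As an informal aside (a sketch using the sympy library, not part of the text), the following Python lines check $\langle \tau^\times f(t) \mid p(x)\rangle = \langle f(t) \mid \tau p(x)\rangle$ for the derivative operator $\tau = d/dx$ on $\mathcal{P}$, whose adjoint is multiplication by $t$; here we use the evaluation $\langle t^k \mid p(x)\rangle = p^{(k)}(0)$, extended linearly:

import sympy as sp

x, t = sp.symbols('x t')

def bracket(f, p):
    # <f(t) | p(x)> for f a polynomial in t: sum_k a_k p^(k)(0)
    f = sp.Poly(sp.expand(f), t)
    return sum(c * sp.diff(p, x, k).subs(x, 0) for (k,), c in f.terms())

p = 3*x**4 - x**2 + 7*x
f = 1 + 2*t + 5*t**3
tau_p = sp.diff(p, x)       # tau = d/dx acting on the polynomial
adj_f = sp.expand(t*f)      # the adjoint of d/dx is multiplication by t
assert bracket(adj_f, p) == bracket(f, tau_p)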
Umbral Operators and Automorphisms of the Umbral Algebra
Figure 18.1 shows the map $\phi$, which is an isomorphism from the vector space $\mathcal{L}(\mathcal{P})$ onto the space of all continuous linear operators on $\mathcal{U}$. We are interested in determining the images under this isomorphism of the set of umbral operators and the set of umbral shifts, as pictured in Figure 18.1.

Figure 18.1

Let us begin with umbral operators. Suppose that $\lambda_f$ is the umbral operator for the associated sequence $p_n(x)$, with delta series $f(t) \in \mathcal{U}$. Then
$$
\langle \lambda_f^\times f(t)^k \mid x^n\rangle = \langle f(t)^k \mid \lambda_f x^n\rangle = \langle f(t)^k \mid p_n(x)\rangle = n!\,\delta_{n,k} = \langle t^k \mid x^n\rangle
$$
for all $n$ and $k$. Hence, $\lambda_f^\times f(t)^k = t^k$ and the continuity of $\lambda_f^\times$ implies that $\lambda_f^\times t^k = \bar{f}(t)^k$, where $\bar{f}(t)$ is the compositional inverse of $f(t)$. More generally, for any $g(t) \in \mathcal{U}$,
$$
\lambda_f^\times g(t) = g(\bar{f}(t)) \qquad (18.9)
$$
In words, $\lambda_f^\times$ is composition by $\bar{f}(t)$. From (18.9), we deduce that $\lambda_f^\times$ is a vector space isomorphism and that
$$
\lambda_f^\times[g(t)h(t)] = g(\bar{f}(t))\,h(\bar{f}(t)) = \lambda_f^\times g(t)\,\lambda_f^\times h(t)
$$
Hence, $\lambda_f^\times$ is an automorphism of the umbral algebra $\mathcal{U}$. It is a pleasant fact that this characterizes umbral operators. The first step in the proof of this is the following, whose proof is left as an exercise.

Theorem 18.18 If $T$ is an automorphism of the umbral algebra then $T$ preserves order, that is, $o(Tf(t)) = o(f(t))$. In particular, $T$ is continuous.
Theorem 18.19 A linear operator $\lambda$ on $\mathcal{P}$ is an umbral operator if and only if its adjoint $\lambda^\times$ is an automorphism of the umbral algebra $\mathcal{U}$. Moreover, if $\lambda = \lambda_f$ is an umbral operator then
$$
\lambda_f^\times g(t) = g(\bar{f}(t))
$$
for all $g(t) \in \mathcal{U}$. In particular, $\lambda_f^\times f(t) = t$.
Proof. We have already shown that the adjoint of an umbral operator is an automorphism satisfying (18.9). For the converse, suppose that $\lambda^\times$ is an automorphism of $\mathcal{U}$. Since $\lambda^\times$ is surjective, there is a unique series $f(t)$ for which $\lambda^\times f(t) = t$. Moreover, Theorem 18.18 implies that $f(t)$ is a delta series. Thus,
$$
n!\,\delta_{n,k} = \langle t^k \mid x^n\rangle = \langle \lambda^\times f(t)^k \mid x^n\rangle = \langle f(t)^k \mid \lambda x^n\rangle
$$
which shows that $\lambda x^n$ is the associated sequence for $f(t)$ and hence that $\lambda$ is an umbral operator.
Theorem 18.19 allows us to fill in one of the boxes on the right side of Figure 18.1. Let us see how we might use Theorem 18.19 to advantage in the study of associated sequences. We have seen that the isomorphism $\tau \mapsto \tau^\times$ maps the set $\Lambda$ of umbral operators on $\mathcal{P}$ onto the set $\mathrm{aut}(\mathcal{U})$ of automorphisms of $\mathcal{U} = \mathcal{P}^*$. But $\mathrm{aut}(\mathcal{U})$ is a group under composition. So if $\lambda_f\colon x^n \mapsto p_n(x)$ and $\lambda_g\colon x^n \mapsto q_n(x)$ are umbral operators then, since $(\lambda_f \circ \lambda_g)^\times = \lambda_g^\times \circ \lambda_f^\times$ is an automorphism of $\mathcal{U}$, it follows that the composition $\lambda_f \circ \lambda_g$ is an umbral operator. In fact, since
$$
(\lambda_f \circ \lambda_g)^\times(g(f(t))) = \lambda_g^\times \circ \lambda_f^\times(g(f(t))) = \lambda_g^\times g(t) = t
$$
we deduce that $\lambda_f \circ \lambda_g = \lambda_{g \circ f}$. Also, since
$$
\lambda_f \circ \lambda_{\bar{f}} = \lambda_{\bar{f} \circ f} = \lambda_t = \iota
$$
we have $\lambda_f^{-1} = \lambda_{\bar{f}}$. Thus, the set $\Lambda$ of umbral operators is a group under composition, with
$$
\lambda_f \circ \lambda_g = \lambda_{g \circ f} \quad\text{and}\quad \lambda_f^{-1} = \lambda_{\bar{f}}
$$
Let us see how this plays out with respect to associated sequences. If the associated sequence for $g(t)$ is
$$
q_n(x) = \sum_{k=0}^{n} q_{n,k}\, x^k
$$
then $\lambda_g\colon x^n \mapsto q_n(x)$ and so $\lambda_{g \circ f} = \lambda_f \circ \lambda_g$ is the umbral operator for the associated sequence
$$
(\lambda_f \circ \lambda_g)x^n = \lambda_f q_n(x) = \sum_{k=0}^{n} q_{n,k}\,\lambda_f x^k = \sum_{k=0}^{n} q_{n,k}\, p_k(x)
$$
This sequence, denoted by
$$
q_n(p(x)) = \sum_{k=0}^{n} q_{n,k}\, p_k(x) \qquad (18.10)
$$
is called the umbral composition of $q_n(x)$ with $p_n(x)$. The umbral operator $\lambda_{\bar{f}} = \lambda_f^{-1}$ is the umbral operator for the associated sequence $r_n(x) = \sum_k r_{n,k} x^k$, where $\lambda_f^{-1} x^n = r_n(x)$, and so
$$
x^n = \lambda_f r_n(x) = \sum_{k=0}^{n} r_{n,k}\, p_k(x) = r_n(p(x))
$$
Let us summarize.

Theorem 18.20
1) The set $\Lambda$ of umbral operators on $\mathcal{P}$ is a group under composition, with
$$
\lambda_f \circ \lambda_g = \lambda_{g \circ f} \quad\text{and}\quad \lambda_f^{-1} = \lambda_{\bar{f}}
$$
2) The set of associated sequences forms a group under umbral composition
$$
q_n(p(x)) = \sum_{k=0}^{n} q_{n,k}\, p_k(x)
$$
In particular, the umbral composition $q_n(p(x))$ is the associated sequence for the composition $g \circ f$, that is,
$$
\lambda_{g \circ f}\colon x^n \mapsto q_n(p(x))
$$
The identity is the sequence $x^n$ and the inverse of $p_n(x)$ is the associated sequence for the compositional inverse $\bar{f}(t)$.
3) Let $\lambda_f \in \Lambda$ and $g(t) \in \mathcal{U}$. Then as operators on $\mathcal{P}$,
$$
\lambda_f^\times g(t) = \lambda_f^{-1}\, g(t)\, \lambda_f
$$
4) Let $\lambda_f \in \Lambda$ and $g(t) \in \mathcal{U}$. Then as operators on $\mathcal{P}$,
$$
\lambda_f\, g(\bar{f}(t)) = g(t)\, \lambda_f
$$
Proof. We prove 3) as follows. For any $h(t) \in \mathcal{U}$ and $p(x) \in \mathcal{P}$,
$$
\begin{aligned}
\langle h(t) \mid [\lambda_f^\times g(t)]\, p(x)\rangle &= \langle [\lambda_f^\times g(t)]\, h(t) \mid p(x)\rangle \\
&= \langle \lambda_f^\times[g(t)\,(\lambda_f^{-1})^\times h(t)] \mid p(x)\rangle \\
&= \langle g(t)\,(\lambda_f^{-1})^\times h(t) \mid \lambda_f\, p(x)\rangle \\
&= \langle (\lambda_f^{-1})^\times h(t) \mid g(t)\,\lambda_f\, p(x)\rangle \\
&= \langle h(t) \mid \lambda_f^{-1}\, g(t)\,\lambda_f\, p(x)\rangle
\end{aligned}
$$
which gives the desired result. Part 4) follows immediately from part 3), since $\lambda_f^\times$ is composition by $\bar{f}(t)$.
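As an informal computational aside (not part of the text), part 2) of Theorem 18.20 can be checked with the sympy library: inverting the coefficient matrix of an associated sequence produces the coefficients of its inverse under umbral composition. For the lower factorial polynomials, the inverse sequence turns out to be the exponential polynomials of Example 18.10 below:

import sympy as sp

x = sp.symbols('x')
N = 6
p = [sp.expand(sp.ff(x, n)) for n in range(N)]       # (x)_n
A = sp.Matrix(N, N, lambda k, n: p[n].coeff(x, k))   # column n = coefficients of p_n
B = A.inv()                                          # coefficients of the inverse sequence
r = [sp.expand(sum(B[k, n]*x**k for k in range(N))) for n in range(N)]
for n in range(N):
    # umbral composition r_n(p(x)) = sum_k r_{n,k} p_k(x) recovers x^n
    assert sp.expand(sum(r[n].coeff(x, k)*p[k] for k in range(N))) == x**n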
Sheffer Operators
If
$$
s_n(x) = \sum_{k=0}^{n} s_{n,k}\, x^k
$$
is Sheffer for $(g(t), f(t))$ then the linear operator $\lambda_{g,f}$ defined by
$$
\lambda_{g,f}(x^n) = s_n(x)
$$
is called a Sheffer operator. Sheffer operators are closely related to umbral operators, since if $p_n(x)$ is associated with $f(t)$ then
$$
s_n(x) = g(t)^{-1} p_n(x) = g(t)^{-1}\lambda_f\, x^n
$$
and so
$$
\lambda_{g,f} = g(t)^{-1}\lambda_f
$$
It follows (using part 4) of Theorem 18.20, which gives $\lambda_f\, h(t)^{-1} = h(f(t))^{-1}\lambda_f$) that the Sheffer operators form a group, with composition
$$
\lambda_{g,f} \circ \lambda_{h,l} = g(t)^{-1}\lambda_f\, h(t)^{-1}\lambda_l = g(t)^{-1} h(f(t))^{-1}\lambda_f\,\lambda_l = [g(t)\,h(f(t))]^{-1}\lambda_{l \circ f} = \lambda_{g\cdot(h\circ f),\; l\circ f}
$$
and inverse
$$
\lambda_{g,f}^{-1} = \lambda_{g(\bar{f})^{-1},\; \bar{f}}
$$
From this, we deduce that the umbral composition of Sheffer sequences is a Sheffer sequence. In particular, if $s_n(x)$ is Sheffer for $(g(t), f(t))$ and $t_n(x) = \sum_k t_{n,k} x^k$ is Sheffer for $(h(t), l(t))$ then
$$
\lambda_{g,f} \circ \lambda_{h,l}(x^n) = \sum_{k=0}^{n} t_{n,k}\,\lambda_{g,f}\, x^k = \sum_{k=0}^{n} t_{n,k}\, s_k(x) = t_n(s(x))
$$
is Sheffer for $(g\cdot(h\circ f),\; l\circ f)$.
Umbral Shifts and Derivations of the Umbral Algebra
We have seen that an operator on $\mathcal{P}$ is an umbral operator if and only if its adjoint is an automorphism of $\mathcal{U}$. Now suppose that $\theta_f \in \mathcal{L}(\mathcal{P})$ is the umbral shift for the associated sequence $p_n(x)$, associated with the delta series $f(t) \in \mathcal{U}$. Then
$$
\langle \theta_f^\times f(t)^k \mid p_n(x)\rangle = \langle f(t)^k \mid \theta_f\, p_n(x)\rangle = \langle f(t)^k \mid p_{n+1}(x)\rangle = (n+1)!\,\delta_{n+1,k} = k\cdot n!\,\delta_{n,k-1} = \langle k f(t)^{k-1} \mid p_n(x)\rangle
$$
and so
$$
\theta_f^\times f(t)^k = k f(t)^{k-1} \qquad (18.11)
$$
This implies that
$$
\theta_f^\times[f(t)^j f(t)^k] = \theta_f^\times[f(t)^j]\, f(t)^k + f(t)^j\,\theta_f^\times[f(t)^k] \qquad (18.12)
$$
and further, by continuity, that
$$
\theta_f^\times[g(t)h(t)] = [\theta_f^\times g(t)]\,h(t) + g(t)\,[\theta_f^\times h(t)] \qquad (18.13)
$$
Let us pause for a definition.

Definition Let $\mathcal{A}$ be an algebra. A linear operator $\partial$ on $\mathcal{A}$ is a derivation if
$$
\partial(ab) = (\partial a)b + a(\partial b)
$$
for all $a, b \in \mathcal{A}$.

Thus, we have shown that the adjoint of an umbral shift is a derivation of the umbral algebra $\mathcal{U}$. Moreover, the expansion theorem and (18.11) show that $\theta_f^\times$ is surjective. This characterizes umbral shifts. First we need a preliminary result on surjective derivations.
Theorem 18.21 Let $\partial$ be a surjective derivation on the umbral algebra $\mathcal{U}$. Then $\partial a = 0$ for any constant $a \in \mathcal{U}$ and $o(\partial f(t)) = o(f(t)) - 1$ if $o(f(t)) \ge 1$. In particular, $\partial$ is continuous.
Proof. We begin by noting that
$$
\partial 1 = \partial(1\cdot 1) = (\partial 1)1 + 1(\partial 1) = 2\,\partial 1
$$
and so $\partial 1 = 0$, whence $\partial a = a\,\partial 1 = 0$ for all constants $a \in \mathcal{U}$. Since $\partial$ is surjective, there must exist an $f(t) \in \mathcal{U}$ for which
$$
\partial f(t) = 1
$$
Writing $f(t) = a + t g(t)$, we have
$$
1 = \partial[a + t g(t)] = (\partial t)\,g(t) + t\,\partial g(t)
$$
which implies that $o(\partial t) = 0$. Finally, if $o(h(t)) = k \ge 1$ then $h(t) = t^k g(t)$, where $o(g(t)) = 0$ and so
$$
o[\partial h(t)] = o[\partial(t^k g(t))] = o[t^k\,\partial g(t) + k t^{k-1} g(t)\,\partial t] = k - 1
$$
Theorem 18.22 A linear operator $\theta$ on $\mathcal{P}$ is an umbral shift if and only if its adjoint is a surjective derivation of the umbral algebra $\mathcal{U}$. Moreover, if $\theta = \theta_f$ is an umbral shift then $\theta_f^\times = \partial_f$ is derivation with respect to $f(t)$, that is,
$$
\theta_f^\times f(t)^k = k f(t)^{k-1}
$$
for all $k \ge 0$. In particular, $\theta_f^\times f(t) = 1$.
Proof. We have already seen that $\theta_f^\times$ is derivation with respect to $f(t)$. For the converse, suppose that $\theta^\times$ is a surjective derivation. Theorem 18.21 implies that there is a delta functional $f(t)$ such that $\theta^\times f(t) = 1$. If $p_n(x)$ is the associated sequence for $f(t)$ then
$$
\langle f(t)^k \mid \theta\, p_n(x)\rangle = \langle \theta^\times f(t)^k \mid p_n(x)\rangle = \langle k f(t)^{k-1} \mid p_n(x)\rangle = k\cdot n!\,\delta_{n,k-1} = (n+1)!\,\delta_{n+1,k} = \langle f(t)^k \mid p_{n+1}(x)\rangle
$$
Hence, $\theta p_n(x) = p_{n+1}(x)$, that is, $\theta = \theta_f$ is the umbral shift for $p_n(x)$.
We have seen that the fact that the set of all automorphisms of $\mathcal{U}$ is a group under composition implies that the set of all associated sequences is a group under umbral composition. The set of all surjective derivations on $\mathcal{U}$ does not form a group. However, we do have the chain rule for derivations.
Theorem 18.23 (The chain rule) Let $\partial_f$ and $\partial_g$ be surjective derivations on $\mathcal{U}$. Then
$$
\partial_g = (\partial_g f(t))\,\partial_f
$$
Proof. This follows from
$$
\partial_g f(t)^k = k f(t)^{k-1}\,\partial_g f(t) = (\partial_g f(t))\,\partial_f f(t)^k
$$
and so continuity implies the result.
The chain rule leads to the following umbral result.

Theorem 18.24 If $\theta_f$ and $\theta_g$ are umbral shifts then
$$
\theta_g = \theta_f \circ \partial_g f(t)
$$
where $\partial_g f(t)$ acts as a linear operator on $\mathcal{P}$.
Proof. Multiplication by $\partial_g f(t)$ in $\mathcal{U}$ is the adjoint of the operator $\partial_g f(t)$ on $\mathcal{P}$, and so taking adjoints in the chain rule gives
$$
\theta_g^\times = (\partial_g f(t))^\times \circ \theta_f^\times = (\theta_f \circ \partial_g f(t))^\times
$$
whence $\theta_g = \theta_f \circ \partial_g f(t)$.
We leave it as an exercise to show that
$$
\partial_g f(t) = [\partial_f g(t)]^{-1}
$$
Now, by taking $g(t) = t$ in Theorem 18.24 and observing that $\theta_t x^n = x^{n+1}$, so that $\theta_t$ is multiplication by $x$, we get
$$
\theta_f = x\,\partial_f t = x\,[\partial_t f(t)]^{-1} = x\,[f'(t)]^{-1}
$$
Applying this to the associated sequence $p_n(x)$ for $f(t)$ gives the following important recurrence relation for $p_n(x)$.

Theorem 18.25 (The recurrence formula) Let $p_n(x)$ be the associated sequence for $f(t)$. Then
1) $p_{n+1}(x) = x\,[f'(t)]^{-1}\, p_n(x)$
2) $p_{n+1}(x) = x\,\lambda_f\,[\bar{f}(t)]'\, x^n$
Proof. The first part is proved above. As to the second, using Theorem 18.20 we have
$$
p_{n+1}(x) = x\,[f'(t)]^{-1} p_n(x) = x\,[f'(t)]^{-1}\lambda_f\, x^n = x\,\lambda_f\,[f'(\bar{f}(t))]^{-1} x^n = x\,\lambda_f\,[\bar{f}(t)]'\, x^n
$$

Example 18.9 The recurrence relation can be used to find the associated sequence for the forward difference functional $f(t) = e^t - 1$. Since $f'(t) = e^t$, the recurrence relation is
$$
p_{n+1}(x) = x\, e^{-t} p_n(x) = x\, p_n(x-1)
$$
Using the fact that $p_0(x) = 1$, we have
$$
p_1(x) = x, \quad p_2(x) = x(x-1), \quad p_3(x) = x(x-1)(x-2)
$$
and so on, leading easily to the lower factorial polynomials
$$
p_n(x) = x(x-1)\cdots(x-n+1) = (x)_n
$$
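This iteration is easy to verify with a computer algebra system. As an informal sketch (using sympy; not part of the text), the recurrence $p_{n+1}(x) = x\,p_n(x-1)$ does generate the lower factorial polynomials:

import sympy as sp

x = sp.symbols('x')
p = sp.Integer(1)                         # p_0(x) = 1
for n in range(1, 6):
    p = sp.expand(x * p.subs(x, x - 1))   # p_{n+1}(x) = x p_n(x - 1)
    assert p == sp.expand(sp.ff(x, n))    # matches (x)_n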
Example 18.10 Consider the delta functional
$$
f(t) = \log(1+t)
$$
Since $\bar{f}(t) = e^t - 1$ is the forward difference functional, Theorem 18.20 implies that the associated sequence $\phi_n(x)$ for $f(t)$ is the inverse, under umbral composition, of the lower factorial polynomials. Thus, if we write
$$
\phi_n(x) = \sum_{k=0}^{n} S(n,k)\, x^k
$$
then
$$
x^n = \sum_{k=0}^{n} S(n,k)\, (x)_k
$$
The coefficients $S(n,k)$ in this equation are known as the Stirling numbers of the second kind and have great combinatorial significance. In fact, $S(n,k)$ is the number of partitions of a set of size $n$ into $k$ blocks. The polynomials $\phi_n(x)$ are called the exponential polynomials. The recurrence relation for the exponential polynomials is
$$
\phi_{n+1}(x) = x(1+t)\,\phi_n(x) = x(\phi_n(x) + \phi_n'(x))
$$
Equating coefficients of $x^k$ on both sides of this gives the well-known formula for the Stirling numbers
$$
S(n+1,k) = S(n,k-1) + k\,S(n,k)
$$
Many other properties of the Stirling numbers can be derived by umbral means.
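For instance, the recurrence and the defining relation can be confirmed together in a few lines of Python (an informal sympy sketch, not part of the text):

import sympy as sp

x = sp.symbols('x')
N = 7
# build S(n,k) from S(n+1,k) = S(n,k-1) + k S(n,k), with S(0,0) = 1
S = [[0]*(N + 1) for _ in range(N + 1)]
S[0][0] = 1
for n in range(N):
    for k in range(1, n + 2):
        S[n + 1][k] = S[n][k - 1] + k*S[n][k]
# check the defining relation x^n = sum_k S(n,k) (x)_k
for n in range(N + 1):
    assert sp.expand(sum(S[n][k]*sp.ff(x, k) for k in range(n + 1))) == x**n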
Now we have the analog of part 3) of Theorem 18.20.

Theorem 18.26 Let $\theta_f$ be an umbral shift. Then as operators on $\mathcal{P}$,
$$
\theta_f^\times g(t) = g(t)\,\theta_f - \theta_f\, g(t)
$$
Proof. We have
$$
\begin{aligned}
\langle h(t) \mid [\theta_f^\times g(t)]\, p(x)\rangle &= \langle [\theta_f^\times g(t)]\, h(t) \mid p(x)\rangle \\
&= \langle \theta_f^\times[g(t)h(t)] - g(t)\,\theta_f^\times h(t) \mid p(x)\rangle \\
&= \langle \theta_f^\times[g(t)h(t)] \mid p(x)\rangle - \langle g(t)\,\theta_f^\times h(t) \mid p(x)\rangle \\
&= \langle g(t)h(t) \mid \theta_f\, p(x)\rangle - \langle \theta_f^\times h(t) \mid g(t)\, p(x)\rangle \\
&= \langle h(t) \mid g(t)\,\theta_f\, p(x)\rangle - \langle h(t) \mid \theta_f\, g(t)\, p(x)\rangle
\end{aligned}
$$
from which the result follows.
If $f(t) = t$ then $\theta_t$ is multiplication by $x$ and $\theta_t^\times$ is the derivative with respect to $t$, and so the previous result becomes
$$
g'(t) = g(t)\,x - x\,g(t)
$$
as operators on $\mathcal{P}$. The right side of this is called the Pincherle derivative of the operator $g(t)$. (See [Pin].)
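The Pincherle derivative formula is simple to test directly. In the following informal sketch (sympy, not part of the text), a series $h(t)$ acts on a polynomial by $h(t)p(x) = \sum_k a_k p^{(k)}(x)$, and we take $g(t)$ to be a polynomial in $t$ so that all computations are exact:

import sympy as sp

x, t = sp.symbols('x t')

def act(h, p):
    # action of h(t) = sum_k a_k t^k on p(x): sum_k a_k p^(k)(x)
    h = sp.Poly(sp.expand(h), t)
    return sp.expand(sum(c * sp.diff(p, x, k) for (k,), c in h.terms()))

g = 2 + t + 4*t**2 + t**5
p = x**4 - 3*x + 1
lhs = act(sp.diff(g, t), p)                   # g'(t) p(x)
rhs = sp.expand(act(g, x*p) - x*act(g, p))    # (g(t)x - x g(t)) p(x)
assert lhs == rhs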
Sheffer Shifts
Recall that the linear map $\theta_{g,f}$ defined by
$$
\theta_{g,f}[s_n(x)] = s_{n+1}(x)
$$
where $s_n(x)$ is Sheffer for $(g(t), f(t))$, is called a Sheffer shift. If $p_n(x)$ is associated with $f(t)$ then
$$
g(t)\, s_n(x) = p_n(x)
$$
and so
$$
g(t)^{-1} p_{n+1}(x) = \theta_{g,f}[g(t)^{-1} p_n(x)]
$$
and so
$$
\theta_{g,f} = g(t)^{-1}\,\theta_f\, g(t)
$$
From Theorem 18.26, the recurrence formula and the chain rule, we have
$$
\begin{aligned}
\theta_{g,f} &= g(t)^{-1}\,\theta_f\, g(t) \\
&= g(t)^{-1}[g(t)\,\theta_f - \theta_f^\times g(t)] \\
&= \theta_f - g(t)^{-1}\,\partial_f\, g(t) \\
&= \theta_f - g(t)^{-1}\,(\partial_f t)\,\partial_t\, g(t) \\
&= x\,[f'(t)]^{-1} - g(t)^{-1}[f'(t)]^{-1} g'(t) \\
&= \Bigl[x - \frac{g'(t)}{g(t)}\Bigr]\frac{1}{f'(t)}
\end{aligned}
$$
We have proved the following.
Theorem 18.27 Let $\theta_{g,f}$ be a Sheffer shift. Then
1) $\displaystyle \theta_{g,f} = \Bigl[x - \frac{g'(t)}{g(t)}\Bigr]\frac{1}{f'(t)}$
2) $\displaystyle s_{n+1}(x) = \Bigl[x - \frac{g'(t)}{g(t)}\Bigr]\frac{1}{f'(t)}\, s_n(x)$
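As an informal aside (not part of the text), part 2) can be exercised in the Appell case $f(t) = t$, $g(t) = (e^t - 1)/t$, for which the Sheffer sequence is the sequence of Bernoulli polynomials (an assumption of this sketch, based on the standard generating function, rather than a result proved here); the recurrence then reads $s_{n+1}(x) = x\,s_n(x) - (g'(t)/g(t))\,s_n(x)$:

import sympy as sp

x, t = sp.symbols('x t')
N = 6
g = (sp.exp(t) - 1)/t
ratio = sp.series(sp.diff(g, t)/g, t, 0, N).removeO()   # g'(t)/g(t), truncated

def act(h, p):
    # action of a (truncated) series h(t) on the polynomial p(x)
    h = sp.Poly(sp.expand(h), t)
    return sp.expand(sum(c * sp.diff(p, x, k) for (k,), c in h.terms()))

s = sp.Integer(1)                              # s_0(x) = 1
for n in range(1, N):
    s = sp.expand(x*s - act(ratio, s))         # Theorem 18.27, part 2), f(t) = t
    assert s == sp.expand(sp.bernoulli(n, x))  # the Bernoulli polynomials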
The Transfer Formulas
We conclude with a pair of formulas for the computation of associated sequences.

Theorem 18.28 (The transfer formulas) Let $p_n(x)$ be the associated sequence for $f(t)$. Then
1) $\displaystyle p_n(x) = f'(t)\Bigl(\frac{f(t)}{t}\Bigr)^{-n-1} x^n$
2) $\displaystyle p_n(x) = x\Bigl(\frac{f(t)}{t}\Bigr)^{-n} x^{n-1}$
Proof. First we show that 1) and 2) are equivalent. Write $g(t) = f(t)/t$. Then
$$
\begin{aligned}
f'(t)\,g(t)^{-n-1} x^n &= [t g(t)]'\, g(t)^{-n-1} x^n \\
&= g(t)^{-n} x^n + t\, g'(t)\, g(t)^{-n-1} x^n \\
&= g(t)^{-n} x^n + n\, g'(t)\, g(t)^{-n-1} x^{n-1} \\
&= g(t)^{-n} x^n - [g(t)^{-n}]'\, x^{n-1} \\
&= g(t)^{-n} x^n - [g(t)^{-n} x - x\, g(t)^{-n}]\, x^{n-1} \\
&= x\, g(t)^{-n} x^{n-1}
\end{aligned}
$$
where the next-to-last equality uses the Pincherle derivative. To prove 1), we verify the operational conditions for an associated sequence for the sequence $p_n(x) = f'(t)g(t)^{-n-1} x^n$. First, for $n \ge 1$ the fourth equality above gives
$$
\langle t^0 \mid p_n(x)\rangle = \langle t^0 \mid g(t)^{-n} x^n - [g(t)^{-n}]' x^{n-1}\rangle = \langle g(t)^{-n} \mid x^n\rangle - \langle [g(t)^{-n}]' \mid x^{n-1}\rangle = \langle g(t)^{-n} \mid x^n\rangle - \langle g(t)^{-n} \mid x^n\rangle = 0
$$
If $n = 0$ then $\langle t^0 \mid p_0(x)\rangle = 1$ and so, in general, we have $\langle t^0 \mid p_n(x)\rangle = \delta_{n,0}$, as required. For the second required condition,
$$
f(t)\, p_n(x) = f(t) f'(t) g(t)^{-n-1} x^n = t\, g(t) f'(t) g(t)^{-n-1} x^n = f'(t) g(t)^{-n}\, t\, x^n = n\, f'(t) g(t)^{-n} x^{n-1} = n\, p_{n-1}(x)
$$
Thus, $p_n(x)$ is the associated sequence for $f(t)$.
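As a final informal sketch (sympy, not part of the text), transfer formula 1) can be tested for the forward difference series $f(t) = e^t - 1$ of Example 18.9, where the associated sequence is known to be the lower factorials:

import sympy as sp

x, t = sp.symbols('x t')

def act(h, p, order):
    # action of h(t) on p(x), with h expanded as a series to the given order
    h = sp.Poly(sp.expand(sp.series(h, t, 0, order).removeO()), t)
    return sp.expand(sum(c * sp.diff(p, x, k) for (k,), c in h.terms()))

f = sp.exp(t) - 1
for n in range(6):
    op = sp.diff(f, t) * (f/t)**(-n - 1)     # f'(t) (f(t)/t)^{-n-1}
    assert act(op, x**n, n + 1) == sp.expand(sp.ff(x, n))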
A Final Remark
Unfortunately, space does not permit a detailed discussion of examples of Sheffer sequences, nor of the application of the umbral calculus to various classical problems. In [Rom1], one can find a discussion of the following polynomial sequences:

The lower factorial polynomials and Stirling numbers
The exponential polynomials and Dobinski's formula
The Gould polynomials
The central factorial polynomials
The Abel polynomials
The Mittag–Leffler polynomials
The Bessel polynomials
The Bell polynomials
The Hermite polynomials
The Bernoulli polynomials and the Euler–Maclaurin expansion
The Euler polynomials
The Laguerre polynomials
The Bernoulli polynomials of the second kind
The Poisson–Charlier polynomials
The actuarial polynomials
The Meixner polynomials of the first and second kinds
The Pidduck polynomials
The Narumi polynomials
The Boole polynomials
The Peters polynomials
The squared Hermite polynomials
The Stirling polynomials
The Mahler polynomials
The Mott polynomials

and more. In [Rom1], we also find a discussion of how the umbral calculus can be used to approach the following types of problems:

The connection constants problem
Duplication formulas
The Lagrange inversion formula
Cross sequences
Steffensen sequences
Operational formulas
Inverse relations
Sheffer sequence solutions to recurrence relations
Binomial convolution
Finally, it is possible to generalize the classical umbral calculus that we have described in this chapter to provide a context for studying polynomial sequences such as those bearing the names Gegenbauer, Chebyshev and Jacobi. Also, there is a q-version of the umbral calculus that involves the q-binomial coefficients (also known as the Gaussian coefficients)
$$
\binom{n}{k}_q = \frac{(q^n - 1)\cdots(q - 1)}{[(q^k - 1)\cdots(q - 1)][(q^{n-k} - 1)\cdots(q - 1)]}
$$
in place of the binomial coefficients. There is also a logarithmic version of the umbral calculus, which studies the harmonic logarithms and sequences of logarithmic type. For more on these topics, please see [LR], [Rom2] and [Rom3].
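For concreteness, here is an informal Python sketch (not part of the text) of the Gaussian coefficient defined above; at $q = 1$ it reduces to the ordinary binomial coefficient:

import sympy as sp

q = sp.symbols('q')

def gauss_binomial(n, k):
    # the q-binomial coefficient, computed directly from the formula above
    num = sp.Integer(1)
    for j in range(1, n + 1):
        num *= q**j - 1
    den = sp.Integer(1)
    for j in range(1, k + 1):
        den *= q**j - 1
    for j in range(1, n - k + 1):
        den *= q**j - 1
    return sp.cancel(num/den)

print(gauss_binomial(4, 2))     # q**4 + q**3 + 2*q**2 + q + 1
assert gauss_binomial(4, 2).subs(q, 1) == sp.binomial(4, 2)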
Exercises
1. Prove that $o(f(t)g(t)) = o(f(t)) + o(g(t))$ for any $f, g \in \mathcal{U}$.
2. Prove that $o(f(t) + g(t)) \ge \min\{o(f(t)), o(g(t))\}$ for any $f, g \in \mathcal{U}$.
3. Show that any delta series has a compositional inverse.
4. Show that for any delta series $f(t)$, the sequence $f(t)^k$ is a pseudobasis.
5. Prove that $\partial_t$ is a derivation.
6. Show that $f \in \mathcal{U}$ is a delta functional if and only if $\langle f \mid 1\rangle = 0$ and $\langle f \mid x\rangle \ne 0$.
7. Show that $f \in \mathcal{U}$ is invertible if and only if $\langle f \mid 1\rangle \ne 0$.
8. Show that $\langle f(at) \mid p(x)\rangle = \langle f(t) \mid p(ax)\rangle$ for any scalar $a$, $f \in \mathcal{U}$ and $p \in \mathcal{P}$.
9. Show that $\langle t e^{at} \mid p(x)\rangle = p'(a)$ for any polynomial $p(x) \in \mathcal{P}$.
10. Show that $f = g$ in $\mathcal{U}$ if and only if $f = g$ as linear functionals, which holds if and only if $f = g$ as linear operators.
11. Prove that if $s_n(x)$ is Sheffer for $(g(t), f(t))$ then $f(t)s_n(x) = n\, s_{n-1}(x)$. Hint: Apply the functionals $g(t)f(t)^k$ to both sides.
12. Verify that the Abel polynomials form the associated sequence for the Abel functional.
13. Show that a sequence $s_n(x)$ is the Appell sequence for $g(t)$ if and only if $s_n(x) = g(t)^{-1} x^n$.
14. If $f$ is a delta series, show that the adjoint $\lambda_f^\times$ of the umbral operator $\lambda_f$ is a vector space isomorphism of $\mathcal{U}$.
15. Prove that if $T$ is an automorphism of the umbral algebra then $T$ preserves order, that is, $o(Tf(t)) = o(f(t))$. In particular, $T$ is continuous.
16. Show that an umbral operator maps associated sequences to associated sequences.
17. Let $p_n(x)$ and $q_n(x)$ be associated sequences. Define a linear operator $\lambda$ by $\lambda\colon p_n(x) \mapsto q_n(x)$. Show that $\lambda$ is an umbral operator.
18. Prove that if $\partial_f$ and $\partial_g$ are surjective derivations on $\mathcal{U}$ then $\partial_g f(t) = [\partial_f g(t)]^{-1}$.
References
General References
[HJ1] Horn, R. and Johnson, C., Matrix Analysis, Cambridge University Press, 1985.
[HJ2] Horn, R. and Johnson, C., Topics in Matrix Analysis, Cambridge University Press, 1991.
[J1] Jacobson, N., Basic Algebra I, Second Edition, W.H. Freeman, 1985.
[KM] Kostrikin, A. and Manin, Y., Linear Algebra and Geometry, Gordon and Breach Science Publishers, 1997.
[MM1] Marcus, M., Finite Dimensional Multilinear Algebra, Part I, Marcel Dekker, 1971.
[MM2] Marcus, M., Finite Dimensional Multilinear Algebra, Part II, Marcel Dekker, 1975.
[Sh] Shapiro, H., A survey of canonical forms and invariants for unitary similarity, Linear Algebra and Its Applications 147 (1991) 101–167.
[ST] Snapper, E. and Troyer, R., Metric Affine Geometry, Dover Publications, 1971.
References on the Umbral Calculus
[LR] Loeb, D. and Rota, G.-C., Formal power series of logarithmic type, Advances in Mathematics 75, No. 1 (May 1989) 1–118.
[Pin] Pincherle, S., Operatori lineari e coefficienti di fattoriali, Atti Accad. Naz. Lincei, Rend. Cl. Sci. Fis. Mat. Nat. (6) 18 (1933) 417–519.
[Rom1] Roman, S., The Umbral Calculus, Pure and Applied Mathematics vol. 111, Academic Press, 1984.
[Rom2] Roman, S., The logarithmic binomial formula, American Mathematical Monthly 99 (1992) 641–648.
[Rom3] Roman, S., The harmonic logarithms and the binomial formula, Journal of Combinatorial Theory, Series A 63 (1992) 143–163.
Index
Abel functional 443,455 Abel operator 445 Abel polynomials 455 abelian 17 absolutely convergent 312 accumulation point 288 addition 17,28,30,33,94 adjoint 201 affine 407 affine combination 411 affine geometry 409 affine group 419 affine hull 412 affine map 418 affine span 412 affine subspace 53 affine transformation 418 affinely independent 416 affinity 418 algebra 30 algebraic closure 28 algebraic dual space 82 algebraic multiplicity 157 algebraically closed 28 algebraically reflexive 85 alphabet 380 alternate 240,242,371 alternating 240,371 ancestor 13 anisotropic 246 annihilator 86,99 antisymmetric 239,371,381 antisymmetric tensor algebra 384 antisymmetric tensor space 384,385,386 Antisymmetry 10 Appell sequence 447 Apollonius identity 196 approximation problem 313 ascending chain condition 24,115,116
ascending order 381 associated sequence 447 associates 25 automorphism 55 barycentric coordinates 417 base 409 base ring 94 basis 43,100 Bernoulli numbers 443 Bernstein theorem 13 Bessel's identity 192 Bessel's inequality 192,319,320,327 best approximation 193,314 bijection 6 bijective 6 bilinear 342 bilinear form 239,342 bilinearity 182 binomial identity 452 binomial type 452 block diagonal matrix 3 block matrix 3 blocks 7 bounded 303,331 canonical forms 8 canonical map 85 canonical projection 76 Cantor's theorem 13 cardinal number 12 cardinality 11,12 cartesian product 14 Cauchy sequence 293 Cayley-Hamilton theorem 156 chain 11 chain rule 467 change of basis matrix 61 change of basis operator 60 change of coordinates operator 60
characteristic 29 characteristic equation 156 characteristic polynomial 154,155 characteristic value 156 characteristic vector 156 Characterization of cyclic submodules 146 Cholesky decomposition 435 classification problem 257 closed 286,398 closed ball 286 closure 288 codimension 80 column equivalent 8 column rank 48 column space 48 common eigenvector 177 commutative 17,18 commuting family 177 compact 398 companion matrix 146 complement 39,103 complemented 103 complete 37,293 complete decomposition 148 complete invariant 8 complete system of invariants 8 completion 298 complexification 49,50,71 complexification map 50 composition 438 cone 246,398 congruence classes 243 congruent 9,243 congruent modulo 20,75 conjugate linear 196 conjugate linearity 182 conjugate representation 449 conjugate space 332 conjugate symmetry 181 connected 262
continuous dual space 332 continuum 15 contraction 370 contravariant tensors 367 contravariant type 366 converge 321 converges 185,287,312 convex 314,398 convex combination 398 convex hull 399 coordinate map 47 coordinate matrix 47,351 correspondence theorem 77,102 coset 20,75 coset representative 20,75,410 cosets 76,101 countable 12 countably infinite 12 covariant tensors 367 covariant type 366 cycle 371 cyclic 146 cyclic decomposition 130, 148 cyclic submodule 97 decomposable 345 defined by 142 degenerate 246 deleted absolute row sum 178 delta functional 441 delta operator 444 delta series 438 dense 290 derivation 465 descendants 13 determinant 273,389 diagonal 4 diagonalizable 165 diagonally dominant 179 diameter 303 dimension 409,410 direct product 38,393 direct sum 39,68,102 direct summand 39,103 discrete metric 284 discriminant 244
distance 185,304 divides 5,25 division algorithm 5 domain 6 dot product 182 double (algebraic) dual space 84 dual basis 84 eigenspace 156 eigenvalue 156 eigenvector 156 elementary divisors 135,147 elementary matrix 3 embedding 55,101 empty word 380 endomorphism 55,101 epimorphism 55,101 equivalence class 7 equivalence relation 6 equivalent 8,64 Euclidean metric 284 evaluation at 82,85 evaluation functional 440,442 even permutation 372 even weight subspace 36 expansion theorems 448 exponential 448 exponential polynomials 468 extension by 87 extension map 361 exterior algebra 384,385,386 exterior product 383 exterior product space 384 external direct sum 38,39 factored through 338,339 Farkas's lemma 406 field 28 field of quotients 23 finite 1,12,17 finite support 39 finitely generated 97 first isomorphism theorem 79,102 flat 409 flat representative 410 forward difference functional 443
forward difference operator 445 Fourier expansion 192,320,327 free 100 Frobenius norm 436 functional 82 functional calculus 228 Gaussian coefficients 53,472 generate 42 generated 96 generating function 448,449 geometric multiplicity 157 Gersgorin region 179 Gersgorin row disk 178 Gersgorin row region 178 graded algebra 372 graded ideal 373 Gram-Schmidt orthogonalization 189 greatest common divisor 5 greatest lower bound 10 group 17 Hamel basis 189 Hamel dimension 189 Hamming distance function 303 Hermite polynomials 197,455 Hermitian 207 Hilbert basis 189,317 Hilbert basis theorem 118 Hilbert dimension 189,329 Hilbert space 309 Hilbert space adjoint 203 Hilbert spaces 297 homeomorphism 69 homogeneous component 372 homogeneous of degree 372,373 homomorphism 55,100 Householder transformation 213,425 hyperbolic basis 253 hyperbolic pair 253 hyperbolic plane 253 hyperbolic space 253 hyperplane 410
ideal 19 idempotent 91,107 Identity 17 image 6,57 imaginary part 49 index of nilpotence 175 induced 287 inertia 269 injection 6,101 injective 6 inner product 181,240 inner product space 181,240 integral domain 22 internal 39 invariant 8,68,72 invariant factor decomposition 137 invariant factor decomposition theorem 137 invariant factors 137,147 invariant ideals 137 invariant under 68 Inverses 17 invertible functional 441 involution 174 irreducible 5,25,72 isometric 252,298 isometric isomorphism 186,308 isometrically isomorphic 186,308 isometry 186,252,297,308 isomorphic 57 isomorphism 55,57,101 isotropic 246 join 414 Jordan block 159 Jordan canonical form 159 Jordan canonical form of 159 kernel 57 Kronecker delta function 83 Kronecker product 393 Laguerre polynomials 456 lattice 37 leading coefficient 5 leading entry 3
least upper bound 10 left inverse 104 left singular vectors 430 Legendre polynomials 191 length 183 lifted 339 limit 288 limit point 288 line 410 linear code 36 linear combination 34 linear combinations 96 linear functional 82 linear hyperplane 400 linear operator 55 linear transformation 55 linearity 322 linearly dependent 42,98 linearly independent 42,98 linearly ordered set 11 lower bound 10 lower factorial numbers 437 lower factorial polynomials 454 lower triangular 4 main diagonal 2 matrix 60 matrix of 62,242 matrix of the form 242 maximal element 10 maximal ideal 21 maximal linearly independent set 42 maximal orthonormal set 189 measured by 339 measuring functions 339 measuring set 339 metric 185,283 metric space 185,283 metric vector space 240 minimal element 10 minimal polynomial 144 minimal spanning set 42 Minkowski space 240 Minkowski's inequality 35,285 mixed tensors 367
modular law 52 module 147 monic 5 monomorphism 55,101 multilinear 363 multilinear form 363 multiplication 17,28,30 multiplicity 1 multiset 1 natural map 85 natural projection 76 natural topology 69,71 negative 17 net definition 321 nilpotent 174,175 noetherian 115 noetherian ring 116 nondegenerate 246 nonderogatory 174 nonisotropic 246 nonnegative 198,395 nonnegative orthant 198,395 nonsingular 246 nonsingular completion 254 nonsingular completions 254 nonsingular extension theorem 254 nontrivial 34 norm 183,184,331 normal 205 normed linear space 184 normed vector space 197 null 246 nullity 57 odd permutation 372 onto 6 open 286 open ball 286 open neighborhood 286 open rectangles 68 open sets 287 operator adjoint 88 order 17,86,121,437 order ideals 99 ordered basis 47
orthogonal 167,187,208,245 orthogonal complement 188,245 orthogonal direct sum 194,250 orthogonal geometry 240 orthogonal group 252 orthogonal projection 223 orthogonal resolution of the identity 226 orthogonal set 188 orthogonal similarity classes 212 orthogonal spectral resolution 227,228 orthogonal transformation 252 orthogonality conditions 446 orthogonally diagonalizable 205 orthogonally equivalent 212 orthogonally similar 212 orthonormal basis 189 orthonormal set 188 parallel 409 parallelogram law 183,307 parity 372 Parseval's identity 192,328 partial order 10 partial sum 312 partially ordered set 10 partition 7 permutation 371 Pincherle derivative 469 plane 410 point 410 polar decomposition 233 polarization identities 184 posets 10 positive definite 230 positive definiteness 181,283 positive square root 231 power 14 power of the continuum 15 power set 12 primary 128 primary cyclic decomposition theorem 134 primary decomposition 147 primary decomposition theorem
128 prime 25 principal ideal 23 principal ideal domain 23 product 14 projection 166 projection modulo 76 projection onto 79 projection operators 73 projection theorem 193,316 projective dimension 421 projective geometry 421 projective line 421 projective plane 421 projective point 421 properly divides 25 pseudobasis 438 pure in 139 QR factorization 425 quadratic form 208,244 quotient field 23 quotient module 101 quotient ring 21 quotient space 76 quotient space of 75 radical 246 range 6 rank 48,57,111,244,351 rank plus nullity theorem 59 rational canonical form 149,150 rational canonical form of 150 recurrence formula 467 reduce 148 reduced row echelon form 3,4 reduces 68 reflection 213,273,425 reflexivity 7,10 relatively prime 5,26 resolution of the identity 170 restriction 6 retract 104 Riesz representation theorem 195,248,333 Riesz vector 195
right inverse 104 right singular vectors 430 ring 17 ring with identity 18 rotation 273 row equivalent 4 row rank 48 row space 48 scalar multiplication 30,33 scalars 33,93 separable 290 sesquilinear 182 Sheffer for 447 Sheffer identity 452 Sheffer operator 457,464 Sheffer sequence 447 Sheffer shift 457 sign 372 signature 269 similar 9,66,142 similarity classes 66,142 simple 120 simultaneously diagonalizable 177 singular 246 singular value decomposition 430 singular values 430 size 1 space 359 span 42,96 spectral resolution 173 spectral theorem for normal operators 227 spectrum 156 sphere 286 splits 158 square summable functions 329 standard basis 43,58,113 standard inner product 182 standard topology 68 standard vector 43 Stirling numbers of the second kind 468 strictly diagonally dominant 179 strictly positive 198,395 strictly positive orthant 395 strictly separated 401
string 380 strongly positive 52,198,395 strongly positive orthant 198,395 strongly separated 402 structure theorem for normal matrices 222 structure theorem for normal operators 221 structure theorem for normal operators: complex case 216 structure theorem for normal operators: real case 220 subfield 53 subgroup 17 submatrix 2 submodule 96 submodule spanned 96 subring 18 subspace 35,240,286 subspace generated 42 subspace spanned 42 sum 14,37 sup metric 284 support 6,38 surjection 6 surjective 6 Sylvester's law of inertia 269 symmetric 2,207,239,371,374 symmetric group 371 symmetric tensor algebra 377,379 symmetric tensor space 377,378 symmetrization map 380 Symmetry 7,181,185,274,283 symplectic basis 253 symplectic geometry 240 symplectic group 252 symplectic transformation 252 symplectic transvection 261 tensor algebra 371 tensor product 345,346,364,393 tensors 345 tensors of type 366 third isomorphism theorem 81,102 topological space 287 topological vector space 69
topology 287 torsion element 99 torsion module 99 total subset 318 totally degenerate 246 totally isotropic 246 totally ordered set 11 totally singular 246 trace 176 translate 409 translation 418 translation operator 445 transpose 2 transposition 372 triangle inequality 183,185,283,307 umbral algebra 440 umbral composition 463 umbral operator 457 umbral shift 457 uncountable 12 underlying set 1 unipotent 282 unique factorization domain 26 unit 25 unit vector 183 unitarily diagonalizable 205 unitarily equivalent 212 unitarily similar 212 unitarily upper triangularizable 165 unitary 208 unitary metric 284 unitary similarity classes 212 universal 271 universal for bilinearity 344 universal pair for 339 upper bound 10,11 upper triangular 4,161,425 upper triangularizable 161,425 Vandermonde convolution formula 455 vector space 33,147 vectors 33
wedge product 383 weight 35 with respect to 192 with respect to the bases 62 Witt index 278 Witt's cancellation theorem 260,275 Witt's extension theorem 260,277 word 380 zero divisor 22 zero element 17 zero subspace 36 Zorn's lemma 11